author     Kristian Lyngstol <kly@kly.no>    2019-01-29 21:48:54 +0100
committer  Kristian Lyngstol <kly@kly.no>    2019-01-29 21:48:54 +0100
commit     68d31e02b28487cb5dd552c68efd10b4973f4169 (patch)
tree       24eb002a4966b9ce47aef6c8bcc4c891cb1bb97d /doc
parent     3b1ff674784205218c215212fd19d9cffd2ac708 (diff)
parent     4306bc4f9c5ff40a5d56f700a2d753345188605f (diff)
Merge branch 'master' of github.com:tech-server/gondul
Diffstat (limited to 'doc')
-rw-r--r--   doc/gondul-git-split.rst    62
-rw-r--r--   doc/gondul-receiver.rst    185
2 files changed, 247 insertions, 0 deletions
diff --git a/doc/gondul-git-split.rst b/doc/gondul-git-split.rst
new file mode 100644
index 0000000..e4fd155
--- /dev/null
+++ b/doc/gondul-git-split.rst
@@ -0,0 +1,62 @@
+========================
+New repository structure
+========================
+
+Motivation
+==========
+
+We want to split the gondul repository into smaller blocks. The purpose
+behind the split is several-fold:
+
+- A clearer separation between otherwise independent components
+- Simpler development for everyone
+- Deployment is simplified by keeping it in its own repo, and each repo can
+  deliver an actual package that is installed if desired.
+- Easier to freeze individual components ahead of an event.
+
+
+New repositories
+================
+
+- Templating
+- Front - includes web/{js,img,fonts,css} and web/index.html
+- lib - includes include/ - The goal is probably to change this, since
+  there is not really a huge overlap between the API and the collectors,
+  but for now it is its own repo.
+- api - includes web/api
+- collectors - includes collectors/
+- gondul/ - includes ansible, documentation, default config.
+
+In the long run the goal is for the API to be the only thing talking to
+postgres, but for now the collectors will keep talking to it directly. The
+collectors could in principle be split up further if desired, but that
+would mean a lot of tiny repos.
+
+Names:
+
+- gondul-templating
+- gondul-frontend
+- gondul-api
+- gondul-collectors
+- gondul
+
+The repo that keeps the name "gondul" becomes the "master repo" and a kind
+of integration repo. It may potentially be split further down the road to
+separate the ansible bits from documentation and so on. We are holding off
+on that to avoid an unusual amount of fragmentation.
+
+Installation
+============
+
+Everything is installed in /opt/$gondul-repo by default - it is up to the
+master repo to tie things together. That will typically mean that
+apache/nginx is set up to serve static content for the frontend and, for
+now, CGI for the API, while templating is set up on its own port - Varnish
+will then take care of the actual routing.
+
+Everything "deployment"-related goes in the "gondul" repo, but each
+individual repo may also want to provide routines for isolated installation
+of the kind that belongs in, for example, a python package or a debian
+package.
+
+
diff --git a/doc/gondul-receiver.rst b/doc/gondul-receiver.rst
new file mode 100644
index 0000000..e663e19
--- /dev/null
+++ b/doc/gondul-receiver.rst
@@ -0,0 +1,185 @@
+
+=================================
+API for receiving time-based data
+=================================
+
+Background
+==========
+
+Today, Gondul has three different "collectors": the ping-collector, the
+snmp-collector and the dhcp log-tailer.
+
+They all write data directly to the postgres backend.
+
+Over the years we've tried different methods of storing time series data
+for actual graphs. To support this, we've stored some data in two sources.
+Most recently, we've stored data in both postgres and influxdb.
+
+In addition to storing this data in different locations, we sometimes need
+to "massage" the data or change the database schema. A prime example for
+SNMP is establishing a tree-structure for port-data by picking up ifTable
+and ifXTable and building a "ports"-tree using ifIndex. Another example is
+normalization of MAC addresses.
+
+We also need to do something with the Virtual Chassis MIBs.
+
+While we've been able to do all this, the fact that these collectors all
+write directly to the postgres database creates a strong cross-dependency
+between the collectors, the database schema and the API. It has also
+created a strong dependency on time series database tools.
+
+This has made it difficult to safely experiment with enriching input-data
+without introducing critical bugs in collectors or breaking the north-bound
+API.
+
+This document outlines a way to reduce this problem.
+
+Concept
+=======
+
+The concept is to create a generic "time-based" API for all time-oriented
+data. The API will cover high-frequency producers like the ping collector,
+but also low-frequency producers like the operations log or the DHCP
+tailer.
+
+While the API will be generic, it will provide just enough data to allow
+the receiver to identify the type of data, apply enrichments to it and thus
+treat it differently. By default, the data posted will simply be written to
+postgres, but through enrichment add-ons we can also choose to split a
+single SNMP poll into multiple entries (e.g. individual entries for virtual
+chassis nodes), or re-arrange the data to produce an interface mapping.
+
+The enrichment will also be able to do basically anything with the data,
+including sending it to multiple other APIs - e.g. influx.
+
+While the first version does not deal with authentication, future versions
+should.
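+
+To make the idea a little more concrete, here is a minimal sketch of what
+such an enrichment hook could look like. This is purely illustrative - the
+``ENRICHERS`` registry, the function names and the "vc_member" field below
+are assumptions made for the example, not existing Gondul code::
+
+    # Registry of enrichment hooks, keyed by the "src" of the posted data.
+    # Each hook takes (metadata, data) and returns a list of
+    # (metadata, data) tuples, so a single post can be split into several
+    # stored entries.
+    ENRICHERS = {}
+
+    def enricher(src):
+        def register(func):
+            ENRICHERS[src] = func
+            return func
+        return register
+
+    @enricher("snmp")
+    def split_virtual_chassis(metadata, data):
+        # Hypothetical example: split one SNMP poll into one entry per
+        # virtual chassis member, so members can be stored and graphed
+        # individually.
+        entries = []
+        for row in data:
+            member = row.get("vc_member", 0)
+            entries.append((dict(metadata, vc_member=member), [row]))
+        return entries
+
+    def enrich(src, metadata, data):
+        # Default behaviour: store the post as-is.
+        hook = ENRICHERS.get(src)
+        return hook(metadata, data) if hook else [(metadata, data)]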
+
+Core API
+========
+
+The core API accepts one or more metrics in a single post.
+
+The core of the API will accept 3 fields:
+
+- `src` - a text string identifying the source of the data, e.g. "dhcp",
+  "ping", "snmp". This should be sent either as a json text field, or as
+  part of the url. E.g., allow posting to
+  ``https://user:pass@gondul/api/write/gtimes/dhcp`` . The benefit of
+  linking this with the URL is that it will simplify authentication in the
+  future, allowing "write-only" accounts.
+- `metadata` - a generic JSON object that contains a number of fields that
+  will be indexed upon or used by enrichment. Example: ``{ "server":
+  "dhcpserver1", "time": "2019-01-05T15:00:10Z" }``.
+- `data` - an array of json objects. Each object in the array must either
+  have a "time" field, or the "metadata" object must have one.
+
+
+Examples
+========
+
+Example 1, dhcp::
+
+    {
+      "src": "dhcp",
+      "metadata": {
+        "server": "dhcpserver1"
+      },
+      "data": [
+        {
+          "type": "assignment",
+          "time": "2001-01-01T15:12:01Z",
+          "ip": "2001:db8::1",
+          "circuit": "vlan123:e3-1:ge-0/0/1",
+          "msg": "blatti foo"
+        },
+        {
+          "type": "renew",
+          "time": "2001-01-01T15:32:01Z",
+          "ip": "2001:db8::1",
+          "circuit": "vlan123:e3-1:ge-0/0/1",
+          "msg": "blatti foo something"
+        }
+      ]
+    }
+
+Example 2, ping::
+
+    {
+      "src": "ping",
+      "metadata": {
+        "time": "2019-05-01T15:01:12Z"
+      },
+      "data": [
+        { "s": "e1-3", "l": 0.91211 },
+        { "s": "e1-2", "l": 0.12211 },
+        { "s": "e1-1", "l": 0.12311 },
+        { "s": "e3-1", "l": 1.12111 },
+        { "s": "e3-2", "l": null },
+        { "s": "e3-3", "l": 0.91211 },
+        { "s": "e3-4", "l": 0.91211 }
+      ]
+    }
+
+Example 3, oplog::
+
+    {
+      "src": "oplog",
+      "data": [
+        {
+          "system": "floor",
+          "user": "kristian",
+          "message": "lol",
+          "time": "2019-04-19T15:00:10Z"
+        }
+      ]
+    }
+
+Note that "metadata" is optional.
+
+Implementation plan
+===================
+
+The plan would be to start small. The first candidate is the dhcp log
+tailer, which needs to support IPv6 and thus needs a change anyway.
+
+The first implementation would be a "hard-coded" perl API, since that is
+what we already have. There is no current plan to migrate other producers
+to the new API at this time.
+
+The first implementation would not offer much in the way of generic storage
+for users other than the dhcp collector.
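+
+For illustration, a log-tailer client posting to this API could be as small
+as the sketch below. This is a sketch only, written in Python with the
+``requests`` library: the endpoint URL follows the example from the Core
+API section above, the payload mirrors Example 1, and none of it is
+existing Gondul code::
+
+    import datetime
+    import requests
+
+    GONDUL_WRITE_URL = "https://user:pass@gondul/api/write/gtimes/dhcp"
+
+    def post_dhcp_event(event_type, ip, circuit, msg):
+        """Post a single dhcp event to the (hypothetical) write endpoint."""
+        payload = {
+            "src": "dhcp",
+            "metadata": {"server": "dhcpserver1"},
+            "data": [{
+                "type": event_type,
+                "time": datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ"),
+                "ip": ip,
+                "circuit": circuit,
+                "msg": msg,
+            }],
+        }
+        response = requests.post(GONDUL_WRITE_URL, json=payload, timeout=10)
+        response.raise_for_status()
+
+    post_dhcp_event("assignment", "2001:db8::1", "vlan123:e3-1:ge-0/0/1",
+                    "lease assigned")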
+
+Since the ping collector in particular can produce quite a lot of data,
+some care might be needed to support it. This will most likely require a
+different approach than the old CGI-based perl way of doing things.
+
+To allow a flexible enrichment scheme, it might be necessary to implement a
+separate service in a more modern language. There are currently three
+worthy alternatives:
+
+Node.js has the benefit of using JavaScript, which is already heavily used
+in Gondul, and is fairly fault-tolerant. There are also already plans to
+utilize node.js to do server-side parsing of health data. However, I'm
+unsure if it offers the speed or integration we need.
+
+Python is another alternative, which is also already used. It is slightly
+more mature than Node.js, but doesn't really offer much else.
+
+The third alternative is Go, which will certainly provide us with the speed
+we need, but might not allow the development pace we require during an
+event.
+
+No conclusion is offered. At any rate, no plans to actually implement such
+a service exist, nor should one be made until we have more experience from
+the DHCP-collector implementation.
+
+Storage
+=======
+
+Storage is deliberately left OUT of the API definition, but for
+implementation purposes we should assume postgres as the primary target,
+with influx as a secondary target. Details of how this is done are
+intentionally left out of this document, as they should not be relevant to
+any user of the API.