aboutsummaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
-rw-r--r--doc/api-2.0.rst331
1 files changed, 331 insertions, 0 deletions
diff --git a/doc/api-2.0.rst b/doc/api-2.0.rst
new file mode 100644
index 0000000..4d76098
--- /dev/null
+++ b/doc/api-2.0.rst
@@ -0,0 +1,331 @@
+New approach to Gondul API
+==========================
+
+The current api is split in three/four:
+
+- /api/read - read/only access for sensitive data
+- /api/public - read/only access for public data
+- /api/write - write-only (authenticated)
+- /templating or similar - for templating (read/sort-of-write-but-not-quite, sensitive)
+
+Today
+-----
+
+(/a/ = /api/, /a/w/ = /api/write/, etc)
+
+- "all" API endponts for reading data supports?when=(date) to adjust what
+ the defintion of "now" is, to enable historic review. All end-points
+ return ETags calculated using the content of the data returned.
+
+- /a/p/config - this provides information mainly used to determine if this
+ is the public variant or not. There was an idea originally to extend this
+ with more configuration-data, but it never materialized.
+
+- /a/p/dhcp - returns the dhcp-specific data for each network/traffic VLAN,
+ used. Returns both the most recent timestamp and a count of seen leases.
+
+- /a/p/dhcp-summary - returns total number of dhcp leases seen recently.
+ Used mainly to show total number of active clients.
+
+- /a/p/distro-tree - returns two structures, distro-tree-phy maps distros
+ and their physical ports to access switches ("distro5": { "ge-0/0/4":
+ "e13-1" } ), and "distro-tree-sys" maps distros to sysnames and ports
+ ("distro5": { "e13-1": "ge-0-0/4" } ). Required to be able to easily look
+ up both how ports are connected and how switches are connected.
+
+- /a/p/location - Uses the source-ip of the request to return a HTML page
+ that determines which switch the request is made for. Used for "dhcp
+ testing"/"dhcp-løp" to ensure switches are actually hooked up correctly
+ (e.g.: Hook up to a switch, visit the page, verify that what it tells you
+ matches the physical label of the switch)
+
+- /a/p/ping - Returns latency stats for all switches
+
+- /a/p/switches - Returns a subset of the information we have for all
+ switches. Only returns public-data.
+
+- /a/p/switch-state - Returns a subset of data from SNMP, parsed and
+ filtered, including summaries for port groups like "clients" vs
+ "uplinks". Has a good bit of logic for filtering what should and
+ shouldn't be shown to the general public.
+
+- /a/r/networks - lists all networks/vlans/layer2 domains we have
+
+- /a/r/oplog - shows the oplog
+
+- /a/r/snmp - show raw snmp-data, no filtering
+
+- /a/r/switches-management - shows config for switches - the unfilitered
+ variant of /a/p/switches
+
+- /a/r/template-list - lists all available templates
+
+- /a/w/collector - simple skogul interface written to receive DHCP log
+ data. Should be replaced by an actual skogul instance.
+
+- /a/w/config - write-endpoint for updating event config, rarely used.
+
+- /a/w/networks - add/update networks
+
+- /a/w/oplog - add oplog entries
+
+- /a/w/switches - add/update switches
+
+Changes
+-------
+
+The big changes suggested is:
+
+- Remove public interfaces from the "native" API. Consider adding public
+ nms as a filtered variant on top instead.
+
+- Do not use paths to distinguish write from read.
+
+- Do not natively deliver "two" data sets. If rates are needed. Make that
+ instead.
+
+- Create a new "ifmetrics" concept to extract interface metrics from SNMP
+ data, since it can also come from telemetry and other sources.
+
+- Leave all "rate" calculation out of the API. Instead, add integration
+ with influxdb under, e.g., /api/rates.
+
+- Option: Support ?then=now-5m or similar, which will then be cacheable,
+ and the client can then do two request (implicit ?then=now and a
+ ?then=now-1m) and compare.
+
+
+Actual suggestion
+-----------------
+
+I'm fairly convinced about:
+
+- /api/switches - Read/write interface for getting, updating and adding
+ switches. Read interface should be as identical to write interface as
+ possible.
+
+- /api/switches/some-switch - Similar, but for a single switch.
+
+- /api/networks - Ditto as switches
+
+- /api/networks/some-net - Similar, but for a single net
+
+- /api/oplog - Ditto
+
+- /api/snmp - GET all SNMP data available.
+
+- /api/snmp/some-switch - Get all SNMP data for a single switch
+
+I'm somewhat convinced about:
+
+- /api/ifmetrics - Get all interface metrics - regardless of source. Also
+ integrates the logic of "switch-state". If possible: Get "rates" for
+ relevant counters.
+
+- /api/ifmetrics/some-switch - Get all interface metrics for a single
+ switch
+
+- /api/ifmetrics/some-switch/port - Get metrics for a specific interface
+ for a specific switch.
+
+Less sure:
+
+- /api/templates/ - List all templates (in JSON format)
+
+- /api/templates/some-template - GET uncompiled template. Should optionally
+ support "Accept: application/json" to provide the data json-encoded as
+ well as "Accept: text/plain" for plain text/raw (default).
+
+- /templating/ GET the compiled template (uses templating.py)
+
+- /api/collector/{name} - POST url for relevant collector. Uses Skogul
+ JSON format (and implementation).
+
+- /api/collector/{dhcp,snmp,telemetry,ping,generic} - Some examples, where
+ "generic" will allow us to accept any data, and just stick it in some
+ general-purpose format or something. I have some more ideas about that.
+
+We could also consider implementing https://grafana.com/grafana/plugins/grafana-simple-json-datasource
+
+Progress
+--------
+
+We should get a basic API up in GO pretty fast, focusing on a single
+end-point and get it right. E.g.: Get /api/switches right from the start.
+All the 1-to-1 API-to-DB-table interfaces should be pretty much identical
+code-wise.
+
+Next up is probably ping, simply because it is, well, simple. It means
+re-factoring the collector to do HTTP POST, but that's a minor issue.
+
+Then I believe tackling SNMP and interfaces is important.
+
+Ifmetrics example
+-----------------
+
+Interface metrics should be agnostic to SNMP vs Telemetry vs Magic. It will
+therefore have a subset of curated fields. A spec needs to be written and
+maintained that defines what is and isn't REQUIRED, so front-ends can
+gracefully reduce functionality.
+
+Example, which WILL change during implementation::
+
+ {
+ "e13-1": {
+ "ge-0/0/1": {
+ "name": "ge-0/0/1",
+ "snmp_if_index": 1234,
+ "ifHighSpeed": 10000,
+ "if_operational_status": "UP",
+ "parent_ae_name": "ae95",
+ "description": "alias|name?",
+ "ingress": {
+ "octets": 125,
+ "errors": 5,
+ "discards": 0,
+ ...
+ },
+ "egress": {
+ "octets": 125,
+ "errors": 5,
+ "discards": 0,
+ ...
+ },
+ "rates": {
+ "ingress": {
+ "octets": 125,
+ "errors": 5,
+ "discards": 0,
+ ...
+ },
+ "egress": {
+ "octets": 125,
+ "errors": 5,
+ "discards": 0,
+ ...
+ }
+ }
+ },
+ "ge-0/0/2": {....}
+ },
+ "e15-1": {
+ "ge-0/0/1": {
+ "name": "ge-0/0/1",
+ "snmp_if_index": 1234,
+ "ifHighSpeed": 10000,
+ "if_operational_status": "UP",
+ "parent_ae_name": "ae95",
+ "description": "alias|name?",
+ "ingress": {
+ "octets": 125,
+ "errors": 5,
+ "discards": 0,
+ ...
+ },
+ "egress": {
+ "octets": 125,
+ "errors": 5,
+ "discards": 0,
+ ...
+ },
+ "rates": {
+ "ingress": {
+ "octets": 125,
+ "errors": 5,
+ "discards": 0,
+ ...
+ },
+ "egress": {
+ "octets": 125,
+ "errors": 5,
+ "discards": 0,
+ ...
+ }
+ }
+ },
+ "ge-0/0/2": {....}
+ }
+ }
+
+Requesting /api/ifmetrics/e15-1 would give::
+
+ {
+ "ge-0/0/1": {
+ "name": "ge-0/0/1",
+ "snmp_if_index": 1234,
+ "ifHighSpeed": 10000,
+ "if_operational_status": "UP",
+ "parent_ae_name": "ae95",
+ "description": "alias|name?",
+ "ingress": {
+ "octets": 125,
+ "errors": 5,
+ "discards": 0,
+ ...
+ },
+ "egress": {
+ "octets": 125,
+ "errors": 5,
+ "discards": 0,
+ ...
+ },
+ "rates": {
+ "ingress": {
+ "octets": 125,
+ "errors": 5,
+ "discards": 0,
+ ...
+ },
+ "egress": {
+ "octets": 125,
+ "errors": 5,
+ "discards": 0,
+ ...
+ }
+ }
+ },
+ "ge-0/0/2": {....}
+ }
+
+And /api/ifmetrics/e15-1/ge-0/0/1 ::
+
+ {
+ "name": "ge-0/0/1",
+ "snmp_if_index": 1234,
+ "ifHighSpeed": 10000,
+ "if_operational_status": "UP",
+ "parent_ae_name": "ae95",
+ "description": "alias|name?",
+ "ingress": {
+ "octets": 125,
+ "errors": 5,
+ "discards": 0,
+ ...
+ },
+ "egress": {
+ "octets": 125,
+ "errors": 5,
+ "discards": 0,
+ ...
+ },
+ "rates": {
+ "ingress": {
+ "octets": 125,
+ "errors": 5,
+ "discards": 0,
+ ...
+ },
+ "egress": {
+ "octets": 125,
+ "errors": 5,
+ "discards": 0,
+ ...
+ }
+ }
+ }
+
+Some issues remains: There should be an idea of totals, for convenience.
+Some metadata regarding precision of rates (e.g.: number of measurements or
+something), and various other enrichments. So the exact details here might
+need some refinement.
+