• Menu
    • Ping map
    • Uplink map
    • Temperature map
    • Traffic map
    • Comment spotter
    • Total switch traffic
    • DISCO
    • Travel in time
    • Replay TG
    • View
    • Toggle Night Mode
    • Tweak Night Mode blur
    • Set layer visibility
    • Map scale
    • Help
    • About TG15 data
    • About NMS
    • About Performance
    • Keyboard Shortcuts
    • Debug timers

About the TG15 data

The data you see from The Gathering 2015 will seem "broken up". This is not because we don't have data from the first day, but because the backend was re-written on day 1/2 and this web app only uses the new API.

NMS was set up on March 30th (Monday). Data started pouring in on the same day.

Ping data is available for the entire event with 1 second resolution. We "lost" data from the 30th because we re-inserted the switches: we still have the ping data, but not the mapping between switch ID number and actual switch.

DHCP data is available only for the last detected DHCP ack (no history, except extensive text-based logs).

Uplink status is available for most of the event, but not exposed here. We only expose traffic-based uplink state here, which, again, is based on the new API.

Traffic status was temporarily bugged, but is available from late on day 2.

Temperature data is available from day 2.

Plans are being made to ensure that we don't have gaps like these in the future.

It is also worth mentioning that things like switch positions are not logged historically, so you see the final position on the map.

Performance

Outstanding AJAX requests
Overflowed AJAX requests

NMS performance is surprisingly complex. It's split into several parts and dealt with differently.

Poller performance is a matter of efficiently collecting data and is mostly handled in the Perl code (and ensuring we use sensible database schemas).

Backend performance for the GUI is mostly about not killing the database server. We do NOT try to protect against malicious clients directly, since this is a management system and not public-facing, but Varnish is used to cache requests. To be able to do that properly, we need to use absolute time when reviewing past events (so "2015-04-02 17:30:00", not "2 hours ago"). We've also tried to minimize the stupidity in the queries. There's still work to be done here, though, as we need to split up a few large backend requests (port-state.pl).
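
Why absolute time helps the cache can be sketched roughly like this. This is only an illustration, not the actual NMS code: the function name absoluteTimeQuery, the "when" query parameter and the 5-minute rounding are all assumptions. The point is that two clients asking for "2 hours ago" a few seconds apart end up requesting the exact same URL, which a cache like Varnish can serve once.

```javascript
// Hypothetical helper (not part of NMS): turn a relative request such as
// "2 hours ago" into an absolute, rounded timestamp so that identical URLs
// can be served from cache. The "when" parameter name is an assumption.
function absoluteTimeQuery(nowMs, secondsAgo, bucketSeconds) {
  var bucketMs = bucketSeconds * 1000;
  var targetMs = nowMs - secondsAgo * 1000;
  // Round down so every client asking within the same bucket shares a URL.
  var rounded = Math.floor(targetMs / bucketMs) * bucketMs;
  // "2015-04-02T17:30:00.000Z" becomes "2015-04-02 17:30:00" (UTC).
  var stamp = new Date(rounded).toISOString().slice(0, 19).replace("T", " ");
  return "port-state.pl?when=" + encodeURIComponent(stamp);
}
```

With a 300 second bucket, a request made at 19:32:10 UTC and one made at 19:34:59 UTC for "2 hours ago" both map to "2015-04-02 17:30:00".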

Front-end performance is mostly about drawing things sensibly and not completely bombing the memory usage, and about gracefully handling slow backends. This will affect you. For example, if you are reviewing past events and the DB is struggling, we'll simply skip a backend request if we have too many outstanding requests. That means you may jump from "17:00" to "18:30" instead of going through "17:30" and "18:00" too. This is working as intended. It also means that you can happily spam the forward/backward keyboard bindings to jump 18 hours forward: the extra AJAX requests for the individual steps will overflow, but you'll land at the right time when you let go. There could be a 1 second delay, though (or more if the backend really struggles), since you'll have to rely on the periodic backend requests instead of the explicit ones triggered on hitting a button.
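
The "skip instead of queue" behaviour could look something like the sketch below. The names (createRequestGate, outstanding, overflowed) and the limit are made up for illustration and do not necessarily match what nms.js does. The idea is just that a request which arrives while too many are in flight is counted as overflowed and dropped.

```javascript
// Hypothetical request gate (names and limit are assumptions, not NMS code).
function createRequestGate(maxOutstanding) {
  var gate = { outstanding: 0, overflowed: 0 };
  // send(done) must start the request and call done() when it completes,
  // whether it succeeded or failed.
  gate.request = function (send) {
    if (gate.outstanding >= maxOutstanding) {
      gate.overflowed++; // dropped, not queued
      return false;
    }
    gate.outstanding++;
    var finished = false;
    send(function done() {
      if (finished) return; // ignore a second call to done()
      finished = true;
      gate.outstanding--;
    });
    return true;
  };
  return gate;
}
```

In a browser, send would wrap the actual AJAX call and invoke done from both the success and the error callback, so a failed request does not block the gate forever.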

Note that the counters on top are updated on a timer, and this timer is set up at the same time as everything else. It is therefore likely to fire at the same time as we send AJAX requests, so the 'outstanding AJAX requests' counter might show almost constantly 3 or almost constantly 0, depending on which timer happens to fire first. This does NOT mean that NMS has 3 requests outstanding all the time, just that we're checking right after we fire off AJAX requests every time.

NMS also tries to handle drawing sensibly, which is why things are split into different HTML5 canvases. Blur and text are particularly expensive to draw, but there's no reason to re-paint them all the time.
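
A rough sketch of the "only re-paint what changed" idea follows. It is an illustration under assumptions, not the real drawing code: createLayers and its methods are hypothetical, and each draw function stands in for whatever paints one canvas.

```javascript
// Hypothetical layer manager (not NMS code): each layer represents one
// canvas and is only repainted when something has marked it dirty.
function createLayers(drawFunctions) {
  var dirty = {};
  Object.keys(drawFunctions).forEach(function (name) {
    dirty[name] = true; // everything needs an initial paint
  });
  return {
    invalidate: function (name) {
      dirty[name] = true;
    },
    // Repaint only the dirty layers; returns the names that were redrawn.
    repaint: function () {
      var redrawn = [];
      Object.keys(drawFunctions).forEach(function (name) {
        if (dirty[name]) {
          drawFunctions[name]();
          dirty[name] = false;
          redrawn.push(name);
        }
      });
      return redrawn;
    }
  };
}
```

In the browser, each draw function would paint onto the 2D context of its own canvas element. An expensive layer such as blur is then only redrawn when it has actually been invalidated, while a cheap layer such as the timestamp can be invalidated on every update.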

The basic performance experiments are done on TG15 data using a laptop and a VM with 6GB of memory, so it should hold up quite well on "proper" hardware.

Keyboard Shortcuts

Key  Description
?    Toggle navigation bar
n    Toggle night mode
1    View ping map
2    View uplink map
3    View temperature map
4    View uplink traffic map
5    View comment spotter map
6    View total switch traffic map
7    View DISCO map
h    Step 1 hour back in time
j    Step 5 minutes back in time
k    Step 5 minutes forward in time
l    Step 1 hour forward in time
p    Toggle playback (1 hour per second)
r    Return to real time
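
The time-stepping keys in the table above boil down to fixed offsets. A minimal sketch of that mapping is shown here. The helper stepTime, the use of null for "real time" and the clamping to the current time are assumptions for illustration, not the actual key handler.

```javascript
// Offsets in seconds for the time-stepping keys listed in the table above.
var TIME_STEPS = { h: -3600, j: -300, k: 300, l: 3600 };

// Hypothetical helper (not NMS code): returns the new point in time in
// milliseconds since the epoch, or null to mean "real time".
// Snapping back to real time when stepping past "now" is an assumption.
function stepTime(currentMs, key, nowMs) {
  if (key === "r") return null;
  if (!Object.prototype.hasOwnProperty.call(TIME_STEPS, key)) return currentMs;
  var base = currentMs === null ? nowMs : currentMs;
  var next = base + TIME_STEPS[key] * 1000;
  return next >= nowMs ? null : next;
}
```

Pressing "h" while in real time gives a point one hour in the past. Pressing "l" from there lands on "now" again, which the sketch treats as going back to real time.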

Time travel

Some features do not have time travel support (comment spotting and DHCP map at the moment). We also lack compatible SNMP data for the first day or so, so you'll only have ping data for the first day of TG15.

It can take some time to load a specific point in time for the first time. See "About Performance" under the help menu for more information.

You can also step backwards and forwards in time, stop and start replay and go back to real time using keyboard shortcuts. See the help menu for an overview of keyboard shortcuts.

Welcome to NMS

Cool stuff:

  • Click a switch for more info
  • Rewind: You can check out state at a specific time or replay from the beginning of the event. Only works for data where we keep time-series (so not for comments)
  • Press '?' to toggle the menu.
  • Auto-scaling the viewport/canvas
  • Total client speed (top right)
  • Generic(-ish) map handlers: provide a name, init-function and an update-function and the nms lib does the rest as far as integration goes.
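
To give an idea of what "provide a name, init-function and an update-function" can mean in practice, here is a small sketch. It is not the real nms lib: createHandlerRegistry and its methods are hypothetical, and only the handler shape (name, init, update) comes from the list above.

```javascript
// Hypothetical registry (not the real nms lib): a map handler is an object
// with a name, an init function and an update function.
function createHandlerRegistry() {
  var handlers = {};
  var active = null;
  return {
    register: function (handler) {
      handlers[handler.name] = handler;
    },
    // Switch to another map: run the handler's init on activation.
    activate: function (name) {
      if (!handlers[name]) throw new Error("Unknown map handler: " + name);
      active = handlers[name];
      active.init();
      return active.name;
    },
    // Called whenever fresh data arrives; forwarded to the active map only.
    update: function (data) {
      if (active) active.update(data);
    }
  };
}

// Usage: register a handler, activate it, then feed it data.
var registry = createHandlerRegistry();
registry.register({
  name: "ping",
  init: function () { /* set up legend, colors, etc. */ },
  update: function (data) { /* recolor switches based on data */ }
});
registry.activate("ping");
```

The appeal of this shape is that adding a new map only means writing one more handler object. The integration code does not need to know anything about what the map shows.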

Todo list front end:

  • Polish time travel UI (Allow playing from a given time at a given speed, play/pause buttons, etc)
  • Better "popup" boxes: It's growing out of control.
  • Toggle auto-scale on/off
  • Clean up various global variables
  • Create name spaces in nms.*: It's just barely better than global stuff now.
  • Add DHCP map
  • More info on switches: Port state, possibly link time trends
  • Moving switches around (like ping.html + edit)
  • Split nms.js into multiple components to unclutter the code
  • Comments: Fix UTF8 garbligash caused by $dbh->quote()

Todo for backend:

  • IPv6 support
  • Provide public APIs
  • Investigate a json tree filter/massager
  • Close SQL injections (IT'S WIDE OPEN BECAUSE WHY NOT THAT'S NEVER A PROBLEM)
  • Split port-state.pl into multiple appropriate pieces. Right now it mixes heavy time-critical data with less time-critical and cheap computation.
  • Rip comments out of port-state.pl completely so it's not bound by the same cache issues and can be reliably refreshed.
  • Consider time log of DHCP (right now it just stores the most recent timestamp, making time travel impossible)
  • Fix SNMP-fetcher so it gets ifXTable and at least ifOperStatus from ifTable. Don't request the entire ifXTable if we can avoid it. Possibly other tweaks.
  • Support for adding switches through an API, not just pure SQL.
  • Integrate with FAP
  • Clean up old interfaces
  • Review various agents/tools
  • Improve cache headers
  • Cache invalidation of comments? (Probably not needed)
  • Re-test the SQL schema. It's been modified and works fine on my laptop, but I need to dump it, commit it and test it.
  • Munin plugin for ports.

Blur tweaks

Debug timers (e.g.: Break stuff! FAST!)

These are internal timers for the NMS frontend. They are provided mainly to debug the frontend. Setting AJAX-triggering counters to ridiculous numbers is not advised (mainly because it causes server load).

Set layer visibility

Background
Linknets
Blur
Switches
Text
TextInfo
Timestamp