Measure All The Things (And How!)

But Why?


  • Provides insight into how the infrastructure works
  • Easier to model and visualize data flows
  • Allows for faster, more effective troubleshooting
  • Facilitates design decisions
  • Lets the business move faster with greater confidence

What Should We Measure?


  • Application response times
  • Server resource usage (CPU, memory, disk)
  • Database statistics (collection sizes, disk usage)
  • Network traffic (bandwidth, where it's going)
  • Log statistics (error rate, success rate)
  • Celery task execution times
  • Celery task execution frequency
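A collector such as Diamond samples items like these on a fixed interval. Below is a minimal sketch of a single sample, using only the Python standard library, for two of the server resources above: load average (a rough CPU proxy) and disk usage. It assumes a Unix host (`os.getloadavg()` is not available on Windows). The metric names are hypothetical, and memory and network are left to a real collector.

```python
import os
import shutil
import time


def sample_host_metrics(path="/"):
    """Take one snapshot of basic host resource usage (Unix only)."""
    load1, load5, load15 = os.getloadavg()  # run-queue averages over 1, 5, 15 minutes
    disk = shutil.disk_usage(path)          # named tuple: total, used, free (bytes)
    return {
        "timestamp": int(time.time()),
        "loadavg.01": load1,
        "loadavg.05": load5,
        "loadavg.15": load15,
        "disk.used_percent": 100.0 * disk.used / disk.total,
        "disk.free_bytes": disk.free,
    }


if __name__ == "__main__":
    print(sample_host_metrics())
```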

How Can We Do It?


In Pieces


Stage 1

Hosted Metrics

  • Use DataDog as a fast and easy way to get metrics
  • Supports Ubuntu, MongoDB, Elasticsearch, etc.
  • Attractive UI with effective dashboards
  • Able to push custom metrics
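Custom metrics usually reach DataDog through DogStatsD, the StatsD-compatible UDP listener bundled with the DataDog agent (default port 8125). The sketch below only illustrates the plain-text datagram format, `name:value|type|#tags`, using the standard library. DataDog's official `datadog` Python package is the usual client. The metric and tag names are hypothetical, and the agent is assumed to be running locally.

```python
import socket


def format_dogstatsd(name, value, metric_type="g", tags=None):
    """Build a DogStatsD datagram: <name>:<value>|<type>[|#tag1,tag2]

    Common types: "g" gauge, "c" counter, "ms" timer, "h" histogram.
    """
    line = f"{name}:{value}|{metric_type}"
    if tags:
        line += "|#" + ",".join(tags)
    return line


def send_metric(name, value, metric_type="g", tags=None,
                host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send to a local agent (assumed on the default port)."""
    payload = format_dogstatsd(name, value, metric_type, tags).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

With an agent listening, `send_metric("celery.task.completed", 1, "c", tags=["task:send_email"])` would increment a counter. UDP keeps the call cheap for the application, at the cost of silently dropping metrics when nothing is listening.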

Stage 2

In-House Solution

  • Comprehensive metrics and logging
  • Able to visualize log data
  • Able to gather and monitor a wide range of metrics
  • Extensible and highly customizable

In-House Metrics Pipeline


Collection

  • Diamond for gathering host metrics
      • Has a plugin infrastructure
      • Existing plugins for MongoDB, Elasticsearch, Celery, etc.
      • Written in Python and easy to extend
  • PacketBeat for network statistics
      • Able to track which applications are sending how much data
      • Visualize network map and bandwidth usage
  • Custom decorator for sending app metrics
  • GELF (Graylog Extended Log Format) log handler for the app (e.g. GrayPy)
  • Syslog forwarding
  • Possibly Beaver, for shipping log files to Logstash
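The slides don't show the custom decorator itself. Here is a minimal sketch of what such a decorator might look like. It records a call count and wall-clock duration per function, which also covers the Celery task execution time and frequency items listed earlier. `InMemorySink`, `timed`, and the metric names are hypothetical. In practice the sink would forward to DogStatsD, Riemann, or InfluxDB rather than keep values in memory.

```python
import functools
import time
from collections import defaultdict


class InMemorySink:
    """Hypothetical sink that just records metrics; a real one would forward them."""

    def __init__(self):
        self.timings = defaultdict(list)  # metric name -> durations in seconds
        self.counts = defaultdict(int)    # metric name -> number of calls

    def timing(self, name, seconds):
        self.timings[name].append(seconds)

    def increment(self, name):
        self.counts[name] += 1


def timed(sink, name=None):
    """Decorator that reports call count and wall-clock duration to `sink`."""
    def decorator(func):
        metric = name or f"{func.__module__}.{func.__qualname__}"

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs even if func raises, so failed calls are still counted and timed.
                sink.increment(metric + ".calls")
                sink.timing(metric + ".duration", time.perf_counter() - start)
        return wrapper
    return decorator


sink = InMemorySink()


@timed(sink, name="tasks.resize_image")
def resize_image(width, height):
    time.sleep(0.01)  # stand-in for real work
    return width * height


if __name__ == "__main__":
    resize_image(640, 480)
    print(sink.counts["tasks.resize_image.calls"])
```

For Celery specifically, the `task_prerun` and `task_postrun` signals offer a hook that avoids decorating every task by hand.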

In-House Metrics Pipeline (Cont.)


Aggregation

  • Logstash for log data and statistics
  • Riemann for metric data and statistics
  • Riemann can also be used to drive alerts
  • Logstash can feed into Riemann for metrics derived from logs
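Riemann processes event streams using Clojure configuration, and that API isn't reproduced here. As a language-neutral illustration of deriving a metric from log data and alerting on it, this hypothetical Python class tracks the error rate of log lines over a sliding time window. It flags an alert when that rate crosses a threshold. The window length, threshold, and level names are assumptions.

```python
import time
from collections import deque


class ErrorRateMonitor:
    """Fraction of error-level log lines over a sliding time window, with a
    threshold check (a toy version of what a Riemann stream could do)."""

    def __init__(self, window_seconds=60.0, threshold=0.05):
        self.window = window_seconds
        self.threshold = threshold
        self.events = deque()  # (timestamp, is_error) pairs, oldest first

    def observe(self, level, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        self.events.append((ts, level.upper() in ("ERROR", "CRITICAL")))
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] < ts - self.window:
            self.events.popleft()

    def error_rate(self):
        if not self.events:
            return 0.0
        errors = sum(1 for _, is_error in self.events if is_error)
        return errors / len(self.events)

    def should_alert(self):
        return self.error_rate() > self.threshold
```

Because the check uses a strict `>`, a rate exactly at the threshold does not alert. A production setup would also want some debouncing so that a rate hovering around the threshold doesn't fire repeatedly.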

In-House Metrics Pipeline (Cont.)


Storage

  • Elasticsearch - stores log data, as well as output from PacketBeat
  • InfluxDB - stores metrics data
      • A distributed time-series database written in Go
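To make the storage step concrete, here is a sketch of how a single point could be formatted in InfluxDB's line protocol (`measurement,tag=value field=value timestamp`). One assumption to flag: line protocol is the write format of later InfluxDB releases (1.x onward), and the release current when these slides were written may have expected a JSON payload instead. Escaping follows the line protocol rules for commas, spaces, equals signs, and quotes. NaN and infinite floats are not handled.

```python
def _escape(text, chars):
    """Backslash-escape every occurrence of each character in `chars`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(measurement, fields, tags=None, timestamp_ns=None):
    """Format one point: measurement[,tag=value...] field=value[,...] [timestamp]"""
    line = _escape(measurement, ", ")
    for key in sorted(tags or {}):  # InfluxDB recommends sorting tags by key
        line += ",{}={}".format(_escape(key, ",= "), _escape(str(tags[key]), ",= "))
    parts = []
    for key, value in fields.items():
        if isinstance(value, bool):  # check bool before int (bool subclasses int)
            rendered = "true" if value else "false"
        elif isinstance(value, int):
            rendered = f"{value}i"   # 'i' suffix marks an integer field
        elif isinstance(value, float):
            rendered = repr(value)
        else:                        # string fields are double-quoted
            rendered = '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'
        parts.append(f"{_escape(key, ',= ')}={rendered}")
    line += " " + ",".join(parts)
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line
```

For example, `to_line_protocol("cpu", {"usage": 0.64}, {"host": "web01"}, 1000)` produces `cpu,host=web01 usage=0.64 1000`.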

In-House Metrics Pipeline (Cont.)


Visualization

  • Kibana - pure JS web interface to Elasticsearch
      • Used to visualize and analyze log data
      • Extended by PacketBeat for displaying network statistics
  • Grafana - pure JS web interface to Graphite and InfluxDB
      • Used to visualize and analyze metrics data
      • Build custom dashboards and graphs
      • Ad hoc querying of InfluxDB and display of results
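Most Grafana graphs over InfluxDB are downsampled. Raw points are averaged into fixed-width time buckets, roughly what an InfluxQL query like `SELECT mean(value) FROM cpu WHERE time > now() - 1h GROUP BY time(1m)` does. Here is a minimal Python sketch of that bucketing, assuming (timestamp_seconds, value) pairs. The function name is hypothetical. Unlike InfluxDB's `fill()` options, this sketch simply omits empty buckets.

```python
from collections import defaultdict


def downsample_mean(points, bucket_seconds=60):
    """Average (timestamp_seconds, value) points into fixed-width time buckets.

    Returns (bucket_start, mean_value) pairs sorted by time; empty buckets
    are omitted rather than filled.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in points:
        bucket = int(ts // bucket_seconds) * bucket_seconds  # align to bucket start
        sums[bucket] += value
        counts[bucket] += 1
    return [(b, sums[b] / counts[b]) for b in sorted(sums)]
```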

Questions?


References


  • Beaver - https://github.com/josegonzalez/beaver
  • DataDog - https://www.datadoghq.com/
  • Diamond - https://github.com/BrightcoveOS/Diamond
  • Elasticsearch - http://www.elasticsearch.org/
  • Grafana - http://grafana.org/
  • GrayPy - https://pypi.python.org/pypi/graypy/0.2.9
  • InfluxDB - http://influxdb.org/
  • Kibana - http://www.elasticsearch.org/overview/kibana/
  • Logstash - http://logstash.net/
  • PacketBeat - http://packetbeat.com/
  • Riemann - http://riemann.io/
  • riemann_wrapper - https://pypi.python.org/pypi/riemann_wrapper/0.5.8

By blarghmatey
