Measure All The Things (And How!)
But Why?
- Provides insight into how the infrastructure works
- Easier to model and visualize data flows
- Allows for faster, more effective troubleshooting
- Facilitates design decisions
- Lets the business move faster with greater confidence
What Should We Measure?
- Application response times
- Server resource usage (CPU, memory, disk)
- Database statistics (collection sizes, disk usage)
- Network traffic (bandwidth, where it's going)
- Log statistics (error rate, success rate)
- Celery task execution times
- Celery task execution frequency
How Can We Do It?
In pieces
Stage 1
Hosted Metrics
- Use DataDog as a fast and easy way to get metrics
- Supports Ubuntu, MongoDB, Elasticsearch, etc.
- Attractive UI with effective dashboards
- Able to push custom metrics
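As a rough illustration of pushing a custom metric, the sketch below builds a StatsD-style datagram with tags and sends it over UDP. It assumes a metrics agent is listening for StatsD datagrams on `127.0.0.1:8125`; the metric names, tags, and the helper functions `format_metric` and `send_metric` are hypothetical and not taken from the slides.

```python
import socket


def format_metric(name, value, metric_type="g", tags=None):
    """Build a StatsD-style datagram: name:value|type[|#tag1,tag2]."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_metric(name, value, metric_type="g", tags=None,
                host="127.0.0.1", port=8125):
    """Send one datagram over UDP (fire-and-forget, no delivery guarantee).

    The host and port are assumptions about where the agent listens.
    """
    payload = format_metric(name, value, metric_type, tags).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload


if __name__ == "__main__":
    # Hypothetical metric: a histogram sample of one response time.
    send_metric("app.response_time", 0.25, "h", tags=["env:dev"])
```

UDP keeps the application from blocking on the metrics backend, at the cost of silently dropping data if nothing is listening.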
Stage 2
In-House Solution
- Comprehensive Metrics and Logging
- Able to visualize log data
- Able to gather and monitor a wide range of metrics
- Extensible and highly customizable
In House Metrics Pipeline
Collection
- Diamond for gathering host metrics
- Has plugin infrastructure
- Existing plugins for MongoDB, Elasticsearch, Celery, etc.
- Written in Python and easy to extend
- PacketBeat for network statistics
- Able to track which applications are sending how much data
- Visualize network map and bandwidth usage
- Custom decorator for sending app metrics
- GELF (Graylog Extended Log Format) log handler for app
- Syslog forwarding
- Beaver(?)
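One way the custom decorator for app metrics could look is sketched below. It times each call and hands the duration to a pluggable `emit` callable, so the same decorator could forward to Riemann, a StatsD agent, or a test list. The names `timed`, `emit`, `recorded`, and `handle_request` are hypothetical; this is not the riemann_wrapper API.

```python
import functools
import time


def timed(metric_name, emit):
    """Decorator factory: report how long each call takes.

    `emit` is any callable taking (metric_name, seconds). Where the value
    ends up (Riemann, StatsD, a log line) is left to the caller.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on success and on exceptions, so failures are timed too.
                emit(metric_name, time.perf_counter() - start)
        return wrapper
    return decorator


# Example sink for illustration: collect samples in memory.
recorded = []


@timed("app.handle_request.seconds",
       lambda name, value: recorded.append((name, value)))
def handle_request(n):
    """Stand-in for real application work."""
    return sum(range(n))
```

The same wrapper could be applied to Celery task functions to capture execution time and, by counting emitted samples, execution frequency.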
In House Metrics Pipeline (Cont.)
Aggregation
- Logstash for log data and statistics
- Riemann for metric data and statistics
- Riemann can also be used to drive alerts
- Logstash can feed into Riemann for metrics around logs
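To make "metrics around logs" concrete, here is a small sketch of the kind of calculation an aggregator performs: an error rate over a sliding time window of log events. It is plain Python, not Riemann or Logstash configuration, and it assumes events arrive in non-decreasing timestamp order. The class name `ErrorRateWindow` and the choice of which levels count as errors are assumptions.

```python
from collections import deque


class ErrorRateWindow:
    """Track the fraction of error-level log events in the last N seconds."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, is_error), oldest first
        self._errors = 0

    def add(self, timestamp, level):
        """Record one log event, then drop events older than the window."""
        is_error = level.upper() in ("ERROR", "CRITICAL")
        self._events.append((timestamp, is_error))
        self._errors += is_error
        self._expire(timestamp)

    def _expire(self, now):
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, was_error = self._events.popleft()
            self._errors -= was_error

    def rate(self):
        """Errors divided by total events in the window (0.0 if empty)."""
        if not self._events:
            return 0.0
        return self._errors / len(self._events)
```

An alert could then be a simple threshold check on `rate()`, which is the sort of rule the aggregation layer would evaluate continuously.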
In House Metrics Pipeline (Cont.)
Storage
- Elasticsearch - Stores log data, as well as output from PacketBeat
- InfluxDB - Stores metrics data
- InfluxDB is a distributed time-series database written in Go
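For a sense of what a stored metric point looks like, the sketch below formats one in InfluxDB's line protocol (`measurement,tags fields timestamp`). This assumes an InfluxDB version that accepts the line protocol; older releases used a JSON write format instead. The helper name `to_line` and the sample measurement and tag names are hypothetical.

```python
def _escape(text, chars):
    """Backslash-escape each of the given special characters."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, strings."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, fields, tags=None, timestamp_ns=None):
    """Format one point as a line-protocol string."""
    if not fields:
        raise ValueError("at least one field is required")
    line = _escape(measurement, ", ")
    # Tags are sorted by key so identical series always serialize the same way.
    for key, value in sorted((tags or {}).items()):
        line += "," + _escape(key, ",= ") + "=" + _escape(str(value), ",= ")
    line += " " + ",".join(
        _escape(key, ",= ") + "=" + _format_field(value)
        for key, value in sorted(fields.items())
    )
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line
```

Tags identify the series (host, service) while fields carry the measured values, which is what lets a dashboard group and filter metrics later.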
In House Metrics Pipeline (Cont.)
Visualization
- Kibana - pure JS web interface to Elasticsearch
- Used to visualize and analyze log data
- Extended by PacketBeat for displaying network statistics
- Grafana - pure JS web interface to Graphite and InfluxDB
- Used to visualize and analyze metrics data
- Build custom dashboards and graphs
- Ad hoc querying of InfluxDB and display of results
Questions?
![](http://en.hdyo.org/assets/ask-question-1-ca45a12e5206bae44014e11cd3ced9f1.jpg)
References
- Beaver - https://github.com/josegonzalez/beaver
- DataDog - https://www.datadoghq.com/
- Diamond - https://github.com/BrightcoveOS/Diamond
- Elasticsearch - http://www.elasticsearch.org/
- Grafana - http://grafana.org/
- GrayPy - https://pypi.python.org/pypi/graypy/0.2.9
- InfluxDB - http://influxdb.org/
- Kibana - http://www.elasticsearch.org/overview/kibana/
- Logstash - http://logstash.net/
- PacketBeat - http://packetbeat.com/
- Riemann - http://riemann.io/
- riemann_wrapper -
By blarghmatey