Application Metrics & DevOps Awesomeness with Graphite and Grafana
Logging Analytics
Observability
Application Metrics
Who am I?
Torkel Ödegaard
@torkelo
github.com/torkelo
Coding Instinct
"We are survival machines - robot vehicles blindly programmed to preserve the selfish molecules known as genes" (Richard Dawkins, The Selfish Gene)
Grafana: open source metrics dashboard and graph editor for Graphite, InfluxDB and OpenTSDB
Sponsors
Continuous delivery
- Monitoring
- Logging
- Alerting
- Analytics
Distributed systems
- Isolated sub-systems / applications
- Async messaging via queues
- Many servers
Standard metrics solution (Windows)
Performance Counters
Metric vs Log Event
Metric: key, value, timestamp
Graphite
- Open source, scalable time series database
- Composed of 3 components:
  - Carbon - receives and records metrics
  - Whisper - storage engine
  - Graphite-web - HTTP frontend
- Large community
- Written in Python
Input
prod.apps.server-1.counter.login.count 10 1398969187
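The input line above is Carbon's plaintext protocol. Each line is a metric path, a value and a Unix timestamp in seconds, separated by spaces and ending in a newline. Below is a minimal Python sketch of a sender. It assumes Carbon is listening on its default plaintext port, 2003. The helper names are hypothetical.

```python
import socket
import time
from typing import List, Optional


def format_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Build one Carbon plaintext line '<path> <value> <timestamp>' plus a newline."""
    if " " in path:
        raise ValueError("metric paths must not contain spaces")
    if timestamp is None:
        timestamp = int(time.time())  # Unix time in seconds
    return f"{path} {value} {timestamp}\n"


def send_lines(lines: List[str], host: str, port: int = 2003) -> None:
    """Push already formatted lines to Carbon's plaintext listener over TCP."""
    payload = "".join(lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

Usage, against a hypothetical host: `send_lines([format_line("prod.apps.server-1.counter.login.count", 10)], "graphite.example.com")`. Batching several lines into one connection is cheaper than opening a connection per metric.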
Query
prod.apps.*.counter.login.count
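In a query path, `*` stands for exactly one dot-separated node, so `prod.apps.*.counter.login.count` matches every server but nothing deeper. The following is a rough Python sketch of that rule. It handles only `*`, `?` and `[...]` within a node, not Graphite's `{a,b}` lists, and it is not Graphite's own matcher.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def matches(pattern: str, path: str) -> bool:
    """True if a dotted metric path matches a glob where '*' stays inside one node."""
    pattern_nodes = pattern.split(".")
    path_nodes = path.split(".")
    if len(pattern_nodes) != len(path_nodes):
        return False  # a wildcard never spans several nodes
    return all(fnmatchcase(node, pat) for node, pat in zip(path_nodes, pattern_nodes))


def find(pattern: str, paths: Iterable[str]) -> List[str]:
    """Return the stored paths that the query pattern selects."""
    return [p for p in paths if matches(pattern, p)]
```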
Functions!
sumSeries(apps.mysite.*.counter.login.count)
summarize(apps.mysite.*.counter.login.count, '1h')
movingAverage(apps.mysite.*.counter.login.count, 10)
timeShift(apps.mysite.*.counter.login.count, '7d')
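To show what these four functions compute, here are simplified Python re-implementations over lists of `(timestamp, value)` points, with `None` marking a gap. They assume the input series are already aligned on the same timestamps. They are illustrations, not Graphite's code. `summarize` uses `sum`, which is its default aggregation, and aligns buckets to multiples of the interval.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[int, Optional[float]]  # (unix timestamp, value or None for a gap)
Series = List[Point]


def sum_series(*series: Series) -> Series:
    """Like sumSeries: add aligned series point by point, skipping gaps."""
    result: Series = []
    for points in zip(*series):
        values = [v for _, v in points if v is not None]
        result.append((points[0][0], sum(values) if values else None))
    return result


def moving_average(series: Series, window: int) -> Series:
    """Like movingAverage(series, N): mean of the last N points, ignoring gaps."""
    result: Series = []
    for i, (ts, _) in enumerate(series):
        recent = [v for _, v in series[max(0, i - window + 1): i + 1] if v is not None]
        result.append((ts, sum(recent) / len(recent) if recent else None))
    return result


def summarize(series: Series, bucket_seconds: int) -> Series:
    """Like summarize(series, '1h'): total per time bucket (None if the bucket is empty)."""
    buckets: Dict[int, List[float]] = {}
    for ts, v in series:
        values = buckets.setdefault(ts - ts % bucket_seconds, [])
        if v is not None:
            values.append(v)
    return [(start, sum(vals) if vals else None) for start, vals in sorted(buckets.items())]


def time_shift(series: Series, shift_seconds: int) -> Series:
    """Like timeShift(series, '7d'): move points recorded shift_seconds ago onto today's axis."""
    return [(ts + shift_seconds, v) for ts, v in series]
```

With `time_shift(last_week, 7 * 86400)`, last week's data can be drawn on top of this week's, which is the usual way to compare against the same day a week earlier.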
Metric Libraries
- Codahale Metrics (Java)
- metrics-net (.NET)
- Ostrich (Scala)
- StatsD (client libraries for most languages)
Metric.Increment("user.login");
Metric.Time("auction_search", 142);
Metric.Time("auction_search", () => search());
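The calls above follow a common pattern: count events and record durations, either already measured or measured around a function call. Below is a Python sketch of such a client. The `Metric` class is hypothetical and is not the API of any library listed above. Python has no method overloading, so timing a call gets its own `time_call` method.

```python
from collections import defaultdict
from time import perf_counter
from typing import Callable, DefaultDict, List, TypeVar

T = TypeVar("T")


class Metric:
    """Hypothetical process-wide registry mirroring Metric.Increment / Metric.Time."""

    counters: DefaultDict[str, int] = defaultdict(int)
    timers: DefaultDict[str, List[float]] = defaultdict(list)

    @classmethod
    def increment(cls, name: str, by: int = 1) -> None:
        """Count an event, e.g. a user login."""
        cls.counters[name] += by

    @classmethod
    def time(cls, name: str, elapsed_ms: float) -> None:
        """Record a duration the caller has already measured, in milliseconds."""
        cls.timers[name].append(elapsed_ms)

    @classmethod
    def time_call(cls, name: str, fn: Callable[[], T]) -> T:
        """Run fn, record its duration in milliseconds (even if it raises) and return its result."""
        start = perf_counter()
        try:
            return fn()
        finally:
            cls.time(name, (perf_counter() - start) * 1000.0)
```

Usage mirrors the slide: `Metric.increment("user.login")`, `Metric.time("auction_search", 142)` and `Metric.time_call("auction_search", search)`. A background writer would then periodically turn these counters and samples into Graphite lines.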
Graphite writer
apps.devsum.server-01.counters.auction_search.count 15 123123123131
apps.devsum.server-02.counters.auction_search.count 1 123123123131
apps.devsum.server-03.counters.auction_search.count 35 123123123131
apps.devsum.server-01.timers.auction_search.count 5 123123123131
apps.devsum.server-01.timers.auction_search.mean 10 123123123131
apps.devsum.server-01.timers.auction_search.max 50 123123123131
apps.devsum.server-01.timers.auction_search.min 2 123123123131
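Each flush turns one interval's counters and timer samples into lines named `<prefix>.counters.<name>.count` and `<prefix>.timers.<name>.<stat>`, as above. Below is a minimal Python sketch of that step. The function name and its inputs are assumptions for illustration.

```python
from typing import Dict, List


def to_graphite_lines(prefix: str, counters: Dict[str, int],
                      timers: Dict[str, List[float]], timestamp: int) -> List[str]:
    """Flatten one flush interval into Graphite plaintext lines (hypothetical helper)."""
    lines = [f"{prefix}.counters.{name}.count {value} {timestamp}"
             for name, value in sorted(counters.items())]
    for name, samples in sorted(timers.items()):
        if not samples:
            continue  # nothing measured for this timer in this interval
        stats = {
            "count": len(samples),
            "mean": sum(samples) / len(samples),
            "max": max(samples),
            "min": min(samples),
        }
        for stat, value in stats.items():
            lines.append(f"{prefix}.timers.{name}.{stat} {value} {timestamp}")
    return lines
```

Putting the server name in the prefix keeps one series per server. `sumSeries(apps.devsum.*.counters.auction_search.count)` can then add them up at query time.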
Graphite configuration (storage-schemas.conf)
[apps]
pattern = ^apps.*
retentions = 10s:6h,1min:7d,10min:5y
[highres]
pattern = ^highres.*
retentions = 1s:6h,1min:1d
[statsd]
pattern = ^statsd.*
retentions = 1min:1d,10min:1y
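Each section is a named rule. Carbon applies the first rule whose pattern matches a new metric, and the retentions string defines a chain of archives (`seconds per point:how long to keep`). The Python sketch below picks a rule, parses it and estimates the size of one Whisper file. It assumes Whisper's layout of a 16-byte header, 12 bytes per archive descriptor and 12 bytes per point. It also assumes a year is 365 days and handles only the units used above plus weeks.

```python
import re
from typing import List, Tuple

UNIT_SECONDS = {"s": 1, "min": 60, "h": 3600, "d": 86400, "w": 7 * 86400, "y": 365 * 86400}
DURATION_RE = re.compile(r"^(\d+)([a-z]+)$")

SCHEMAS = [  # (section, pattern, retentions) in file order; first match wins
    ("apps", r"^apps.*", "10s:6h,1min:7d,10min:5y"),
    ("highres", r"^highres.*", "1s:6h,1min:1d"),
    ("statsd", r"^statsd.*", "1min:1d,10min:1y"),
]


def to_seconds(text: str) -> int:
    """'10min' -> 600."""
    m = DURATION_RE.match(text)
    if not m or m.group(2) not in UNIT_SECONDS:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(m.group(1)) * UNIT_SECONDS[m.group(2)]


def parse_retentions(spec: str) -> List[Tuple[int, int]]:
    """'10s:6h,1min:7d' -> [(seconds_per_point, number_of_points), ...]."""
    archives = []
    for part in spec.split(","):
        precision, retention = (to_seconds(x) for x in part.split(":"))
        archives.append((precision, retention // precision))
    return archives


def schema_for(metric: str) -> str:
    """Retentions of the first section whose pattern matches the metric name."""
    for _section, pattern, retentions in SCHEMAS:
        if re.search(pattern, metric):
            return retentions
    raise LookupError(f"no storage schema matches {metric!r}")


def whisper_file_bytes(archives: List[Tuple[int, int]]) -> int:
    """Pre-allocated file size under the assumed Whisper layout."""
    return 16 + 12 * len(archives) + 12 * sum(points for _, points in archives)
```

For the `apps` rule this gives 2160 + 10080 + 262800 points, a file of 3,300,532 bytes (about 3.3 MB) per metric, allocated up front. That is worth knowing before sending thousands of metrics at 10-second resolution.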
Time measurements
Average is not good enough!
5, 7, 2, 7, 2400, 20, 15, 10000, 4, 2
Avg = 1246
Percentiles
Sorted: 10000, 2400, 20, 15, 7, 7, 5, 4, 2, 2
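upper 20 = 2 (at least 20% of the values are ≤ 2)
upper 50 = 7 (at least 50% of the values are ≤ 7)
upper 70 = 15 (at least 70% of the values are ≤ 15)
upper 90 = 2400 (at least 90% of the values are ≤ 2400)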
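The numbers above can be reproduced in a few lines of Python. The code uses the nearest-rank definition: sort the values, then take the one at rank ceil(p/100 × n). This is one common definition. Other tools round the rank slightly differently, but for these ten values the results agree. Note how the two outliers pull the mean to about 1246 while half of the requests took 7 or less.

```python
import math
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def upper_percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the largest value among the lowest pct% of samples."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank; multiply first to stay exact
    return ordered[max(rank, 1) - 1]


timings = [5, 7, 2, 7, 2400, 20, 15, 10000, 4, 2]
```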
More demo
Functions
timeShift
percent
summarize
integral
derivative
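`derivative` and `integral` are inverses of a sort. `derivative` turns an ever-growing counter into the change per interval, and `integral` turns per-interval counts into a running total. Below is a simplified Python sketch over one value per time step, with `None` for gaps. It illustrates the idea and is not Graphite's source.

```python
from typing import List, Optional

Values = List[Optional[float]]  # one value per time step; None marks a gap


def derivative(values: Values) -> Values:
    """Change from the previous point; None when either point is missing."""
    result: Values = []
    prev: Optional[float] = None
    for cur in values:
        result.append(None if prev is None or cur is None else cur - prev)
        prev = cur
    return result


def integral(values: Values) -> Values:
    """Running total; gaps stay None and do not reset the total."""
    result: Values = []
    total = 0.0
    for v in values:
        if v is None:
            result.append(None)
        else:
            total += v
            result.append(total)
    return result
```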
Display options
templated
annotations
Future of metrics
- Metrics 2.0
- Alerting
- Resolution
Metrics 2.0
prod.eu-01.webapp-01.counters.images.upload_bytes.count
Problems
- Finding metrics
- Understanding metrics
- Metric unit?
- Write rate?
- Metadata
- Change Agent
Metrics 2.0
prod.eu-01.webapp-01.counters.images.upload_bytes.count
{
  "server": "webapp-01",
  "datacenter": "eu-01",
  "unit": "bytes",
  "rate": "10s",
  "metric_type": "counter",
  "stat": "images.upload"
}
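Going from the dotted key to the tag set above means knowing what each position in the key stands for. The Python sketch below assumes a hypothetical fixed layout, `env.datacenter.server.<type>s.<stat path>.<name>_<unit>.count`, and ignores `prod` to mirror the example. The write rate is not in the key at all and has to come from elsewhere, which illustrates why the key alone is hard to interpret.

```python
from typing import Dict


def key_to_tags(key: str, rate: str) -> Dict[str, str]:
    """Split a dotted key into the tag set shown above (hypothetical positional layout)."""
    _env, datacenter, server, type_plural, *stat_path, last, _count = key.split(".")
    name, sep, unit = last.rpartition("_")
    if not sep:  # no '_<unit>' suffix in this key
        name, unit = last, "unknown"
    metric_type = type_plural[:-1] if type_plural.endswith("s") else type_plural
    return {
        "server": server,
        "datacenter": datacenter,
        "unit": unit,
        "rate": rate,  # not encoded in the key; supplied by the caller
        "metric_type": metric_type,
        "stat": ".".join(stat_path + [name]),
    }
```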
Metrics 2.0
Conceptual model vs. wire protocol vs. storage
Metric resolution and alerting
Thanks!
@torkelo
@grafana
grafana.org
github.com/grafana/grafana