Data analysis with Graphite

Graphite

  • Storage
  • UI
  • Passive
  • Math

StatsD

  • Timeless metrics
  • Raw -> Aggregate
  • Statistical
  • UDP*
  • RAM only
some.metric.name:2|c|@0.1
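
As a minimal sketch of the line above: the snippet builds a StatsD counter line and sends it over UDP. The helper names are hypothetical, and the host and port (8125, the usual StatsD default) are assumptions about the local setup. The last helper shows how a sampled count is scaled back up on the server side.

```python
import socket


def format_statsd_counter(name, value, sample_rate=1.0):
    """Build a counter line such as 'some.metric.name:2|c|@0.1'."""
    line = f"{name}:{value}|c"
    if sample_rate < 1.0:
        # The sample rate tells the server only a fraction of events was sent.
        line += f"|@{sample_rate}"
    return line


def send_statsd(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send (assumed host/port); delivery is not confirmed."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


def effective_count(value, sample_rate):
    """Scale a sampled counter value back to the estimated real count."""
    return value / sample_rate


line = format_statsd_counter("some.metric.name", 2, sample_rate=0.1)
estimated = effective_count(2, 0.1)
# send_statsd(line)  # uncomment when a StatsD daemon is listening
```

Because the transport is UDP, a lost packet is silently dropped, which is the trade-off behind the asterisk on the slide.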

Riemann, Hekad

  • Arbitrary aggregations
  • Metrics from events

Dashboards

  • Grafana
  • GraphExplorer
  • Tessera
  • Cabot, Syren

Too many to list

Short intro to Metrics

What's a metric?

  • Timestamp
  • Name
  • Value
  • Maybe other stuff (tags?)

Raw metric

  • High data rate
  • Expensive to transmit and store
  • Can use for any calculation

Aggregate

  • Compressed
  • Biased towards certain usage
  • Accuracy
  • Data loss
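
To illustrate the raw versus aggregate trade-off, here is a small sketch with made-up latency values. It reduces one interval of raw points to a handful of summary numbers, then computes the median from the raw points, which the aggregate alone can no longer provide.

```python
from statistics import mean, median

# Made-up raw latencies (ms) collected during one interval.
raw = [12, 15, 11, 240, 13, 14]

# The aggregate is much smaller, but only answers the questions it was built for.
aggregate = {
    "count": len(raw),
    "sum": sum(raw),
    "min": min(raw),
    "max": max(raw),
    "mean": mean(raw),
}

# The median needs the raw points; it cannot be derived from the aggregate.
raw_median = median(raw)
```

Here the mean is pulled up by a single slow request while the median stays near the typical value, and only the raw data can show both.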

Sampling

Gauges

Counters

Timers

Graphite

Short intro

Architecture

Protocols

  • Line protocol - TCP, UDP
  • Pickle protocol
  • AMQP
host.service.subservice.something.blah 231 1438182493
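
A rough sketch of the first two protocols: the line protocol is "path value timestamp" terminated by a newline, and the pickle protocol is a pickled list of (path, (timestamp, value)) tuples preceded by a 4-byte big-endian length header. The function names are hypothetical; carbon conventionally listens on port 2003 for lines and 2004 for pickle, but check the local configuration.

```python
import pickle
import struct


def plaintext_line(path, value, timestamp):
    """One metric in the line protocol: 'path value timestamp\\n'."""
    return f"{path} {value} {timestamp}\n"


def pickle_message(points):
    """Batch of metrics in the pickle protocol.

    points is a list of (path, (timestamp, value)) tuples.
    """
    payload = pickle.dumps(points, protocol=2)
    header = struct.pack("!L", len(payload))  # 4-byte big-endian length
    return header + payload


path = "host.service.subservice.something.blah"
line = plaintext_line(path, 231, 1438182493)
message = pickle_message([(path, (1438182493, 231))])
```

The pickle form is mainly useful for sending many points per connection; for a handful of metrics the line protocol is simpler to debug.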

What if we have multiple points in the same interval?

Graphite takes the last one

Use prefixes to protect
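
The overwrite behaviour can be sketched with a toy in-memory store; this is an illustration, not Graphite's storage code. Timestamps are aligned down to the start of their interval, so two points for the same name in the same interval land in one slot and the later one wins. The example assumes "prefixes" means a per-sender prefix in the metric name, and all names and values are made up.

```python
def store(points, step):
    """Toy store: one value per (metric name, interval slot)."""
    series = {}
    for path, timestamp, value in points:
        slot = timestamp - (timestamp % step)  # align to interval start
        series.setdefault(path, {})[slot] = value  # later point overwrites
    return series


# Two senders share one name: the first value is lost.
shared = store(
    [("app.requests", 1438182485, 100), ("app.requests", 1438182493, 231)],
    step=15,
)

# A per-sender prefix keeps both values in separate series.
prefixed = store(
    [("web1.app.requests", 1438182485, 100), ("web2.app.requests", 1438182493, 231)],
    step=15,
)
```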

Storage schema

  • Multiple periods and resolutions
  • Downscale by aggregate function
  • AVG, MIN, MAX, LAST
  • 12 bytes per point (+ change)
  • Preallocated
# storage-aggregation.conf
[all_min]
pattern = \.min$
xFilesFactor = 0.1
aggregationMethod = min

# storage-schemas.conf
[apache_busyWorkers]
pattern = ^servers\.www.*\.workers\.busyWorkers$
retentions = 15s:7d,1m:21d,15m:5y
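
Since the files are preallocated at 12 bytes per point, the disk cost of a retention string can be estimated up front. The sketch below is a simplified parser written for this example (not Whisper's own), assumes a year is 365 days, and ignores the small file header (the "+ change").

```python
# Seconds per unit; a year is assumed to be 365 days.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}
POINT_SIZE = 12  # bytes per stored point


def to_seconds(token):
    """Convert a token such as '15s' or '5y' to seconds."""
    return int(token[:-1]) * UNIT_SECONDS[token[-1]]


def archive_points(retentions):
    """Number of points in each archive of a retention string."""
    points = []
    for archive in retentions.split(","):
        precision, duration = archive.strip().split(":")
        points.append(to_seconds(duration) // to_seconds(precision))
    return points


points = archive_points("15s:7d,1m:21d,15m:5y")
data_bytes = sum(points) * POINT_SIZE
```

For the schema above this comes to roughly 2.8 MiB per metric, allocated as soon as the first point arrives.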

Finding a needle in a haystack

Average is mean to me

Percentiles, StdDev

  • The birthday paradox
  • p99
  • p50 - median
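
A small sketch with made-up latencies shows why the average can mislead. It uses the nearest-rank definition of a percentile, which is one of several common definitions and not necessarily the one a given tool implements.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up data: 98 fast requests and 2 very slow ones (ms).
latencies = [10] * 98 + [2000, 3000]

avg = mean(latencies)
p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
```

The average lands near 60 ms, a latency no request actually had, while p50 shows the typical request (10 ms) and p99 exposes the slow tail (2000 ms).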

Using graphite

How to get stuff out

  • json
  • csv
  • pickle
  • svg
  • png
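
As a sketch of pulling data out through the render endpoint: the snippet builds a /render URL and parses a JSON response. The host name and the sample response body are made up for illustration; the JSON shape (a list of series, each with "target" and [value, timestamp] "datapoints") follows the render API.

```python
import json
from urllib.parse import urlencode


def render_url(base, target, fmt="json", frm="-1h", until="now"):
    """Build a render URL; 'base' is a hypothetical Graphite host."""
    query = urlencode({"target": target, "from": frm, "until": until, "format": fmt})
    return f"{base}/render?{query}"


def non_null_points(body):
    """Return {target: [(timestamp, value), ...]} skipping empty slots."""
    result = {}
    for series in json.loads(body):
        result[series["target"]] = [
            (ts, value) for value, ts in series["datapoints"] if value is not None
        ]
    return result


url = render_url("http://graphite.example.com", "servers.www01.workers.busyWorkers")

# Made-up response body for illustration.
sample = '[{"target": "a.b", "datapoints": [[1.0, 1438182480], [null, 1438182495]]}]'
parsed = non_null_points(sample)
```

Switching the format parameter to csv, pickle, svg or png returns the same data in the other forms listed above.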

Events

  • Events API
  • drawAsInfinite() + timeseries
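
A hedged sketch of both approaches: the first helper builds a JSON body of the kind posted to the events endpoint (/events/), the second builds a render target that draws tagged events as vertical lines. The event text, tag and data values are made up, and whether tags are sent as a list or a space-separated string depends on the Graphite version, so treat the payload shape as an assumption to verify.

```python
import json


def event_payload(what, tags, data=""):
    """JSON body for an event; the tags format is version-dependent (assumed list here)."""
    return json.dumps({"what": what, "tags": tags, "data": data})


def event_target(tag):
    """Render target that marks each tagged event with a vertical line."""
    return f'drawAsInfinite(events("{tag}"))'


payload = event_payload("deploy v1.2", ["deploy"], "hypothetical release note")
target = event_target("deploy")
```

The same drawAsInfinite() wrapper works on an ordinary series that is non-zero only when something happened, which is the "timeseries" variant on the slide.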

Downsampling issues

consolidateBy()
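
When a graph has more points than pixels, neighbouring points are merged, by default with an average. The sketch below imitates that merge on made-up data to show how a spike disappears under averaging and survives under max; it is an illustration of the idea, not Graphite's implementation.

```python
def consolidate(values, bucket_size, func):
    """Merge every bucket_size neighbouring points into one using func."""
    return [
        func(values[i : i + bucket_size])
        for i in range(0, len(values), bucket_size)
    ]


def average(bucket):
    return sum(bucket) / len(bucket)


series = [1, 1, 1, 97, 1, 1, 1, 1]  # one short spike

by_avg = consolidate(series, 4, average)  # spike is flattened
by_max = consolidate(series, 4, max)      # spike is preserved
```

In a target this corresponds to wrapping the series, for example consolidateBy(some.metric.name, 'max').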

Cleanup noise

movingAverage()
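
A minimal sketch of the smoothing idea on made-up data: each output point is the mean of the last few input points. Unlike Graphite, which can look back before the requested time range to fill the first window, this version simply leaves the first points empty.

```python
def moving_average(values, window):
    """Trailing moving average; positions without a full window are None."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result


noisy = [10, 14, 9, 15, 10, 14]
smooth = moving_average(noisy, 3)
```

A larger window gives a smoother line but reacts later to real changes.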

Drawing threshold

constantLine()

Correlating

  • second Y axis

  • flot

  • scaling

Working with counters

  • derivative()
  • nonNegativeDerivative()
  • scaleToSeconds()
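
The three functions can be approximated in a few lines. The sketch below works on a made-up counter that resets once (as after a process restart) and assumes a 60-second step; it mirrors the behaviour described in the function docs rather than reproducing Graphite's code.

```python
def derivative(values):
    """Difference between consecutive points; the first point has no predecessor."""
    out = [None]
    for prev, cur in zip(values, values[1:]):
        out.append(None if prev is None or cur is None else cur - prev)
    return out


def non_negative_derivative(values):
    """Like derivative(), but a drop (counter reset) yields None instead of a negative spike."""
    return [None if d is None or d < 0 else d for d in derivative(values)]


def scale_to_seconds(deltas, step, seconds=1):
    """Convert per-step deltas into a rate per 'seconds'."""
    return [None if d is None else d * seconds / step for d in deltas]


counter = [100, 160, 220, 10, 70]  # reset between 220 and 10

plain = derivative(counter)
clean = non_negative_derivative(counter)
per_second = scale_to_seconds(clean, step=60)
```

The plain derivative shows a large negative spike at the reset, which is why the non-negative variant is usually the better choice for counters.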

Correlating multiple series

  • mostDeviant()
  • highestCurrent(), highestMax(), highestAverage()
  • lowestCurrent(), lowestAverage()
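
As a rough illustration of these filters: given many series, keep only the few that vary the most or that rank highest or lowest by some summary. The sketch ranks made-up series by population standard deviation and by average; the exact statistics Graphite uses may differ, so treat this as an approximation.

```python
from statistics import mean, pstdev


def most_deviant(series, n):
    """Names of the n series with the largest standard deviation."""
    return sorted(series, key=lambda name: pstdev(series[name]), reverse=True)[:n]


def highest_average(series, n):
    """Names of the n series with the largest mean."""
    return sorted(series, key=lambda name: mean(series[name]), reverse=True)[:n]


def lowest_average(series, n):
    """Names of the n series with the smallest mean."""
    return sorted(series, key=lambda name: mean(series[name]))[:n]


# Made-up series keyed by host name.
series = {
    "web1": [10, 10, 11, 10],
    "web2": [5, 40, 2, 45],
    "web3": [30, 31, 30, 29],
}

deviant = most_deviant(series, 1)
highest = highest_average(series, 1)
lowest = lowest_average(series, 1)
```

With hundreds of hosts on one graph, filters like these reduce the picture to the series worth looking at.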