Application Metrics & DevOps Awesomeness

With Graphite and Grafana






Logging Analytics


Observability


Application Metrics





Who am I? 




Torkel Ödegaard




@torkelo
github.com/torkelo


Stockholm



Sweden





Coding Instinct

"we are survival machines - robot vehicles blindly programmed to preserve the selfish molecules known as genes" 



Open source metrics dashboard and graph editor for
Graphite, InfluxDB and OpenTSDB

Sponsors




Why?


Continuous delivery

  • Monitoring
  • Logging
  • Alerting
  • Analytics

Distributed systems


  • Isolated sub-systems / applications
  • Async messaging via queues
  • Many servers


Standard logging solution



log4net
log4j
NLog


FileAppender
DatabaseAppender
MailAppender
TcpAppender
EventLog










SELECT * FROM Logs WHERE ....

Standard metrics solution (Windows)


Performance Counters











Are there better options?



  • Fast centralized logging analytics
  • Trends in errors, servers, application behavior
  • High resolution live visualization of application behavior
  • Detailed application performance metrics 
  • Long term trends / comparisons of user & application behavior



Elasticsearch







Kibana




Log -> Elasticsearch






LogStash


input {
    tcp {
        type => "log4j"
        port => 3333
    } 
}
filter {
    grok {
        type => "log4j"
        pattern => "%{LOGLEVEL:severity}\s+%{WORD:category} ..."    
        add_tag => "log4j"
    }
    date {
        type => "log4j"
        timestamp => "MM-dd-yyyy hh:mm:ss.SSS a Z"                
    }
}
output {
    elasticsearch { host => "my-elastic-server" }
}

Inputs


input {
    file {
        path => "/var/log/apache2/*.log"
        type => "apache-logs"
    }

    redis {
        host => "127.0.0.1"
        type => "redis-input" 
        data_type => "list"
        key => "logstash"    
        message_format => "json_event"
    }
}

Filters


filter {
  grok {
    pattern => "%{COMBINEDAPACHELOG}"
    singles => true
  }
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    locale => "en"
  }
}
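
As a rough illustration of what the date filter above does with an Apache timestamp, here is a minimal Python sketch. The function name is hypothetical and not part of Logstash. The strptime format is an approximate equivalent of "dd/MMM/yyyy:HH:mm:ss Z", and %b assumes an English locale, much like the locale => "en" setting.

```python
from datetime import datetime, timezone

# Approximate Python equivalent of the pattern "dd/MMM/yyyy:HH:mm:ss Z".
APACHE_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_apache_timestamp(text):
    """Turn an Apache access-log timestamp into a timezone-aware datetime."""
    return datetime.strptime(text, APACHE_TIME_FORMAT)


# Normalise to UTC so events from different servers share one timeline.
parsed = parse_apache_timestamp("10/Oct/2000:13:55:36 -0700")
as_utc = parsed.astimezone(timezone.utc)
```

Storing the parsed time instead of the arrival time means late or replayed log lines still land at the right place in the timeline.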

filter {
  geoip {
    source => "clientip"    
  }
  useragent {
    source => "agent"    
  }
}

Outputs


output {
    elasticsearch { 
        host => "my-elastic-server" 
    }
    
    redis {
    }
    
    mongodb {
    }
    
    rabbitmq {
    }
}



Demo



Metrics / Measurements

Metric vs Log Event



MetricKey    Value   Timestamp





Graphite


  • Open source scalable time series database
  • Composed of 3 components
    • Carbon - receives and records metrics
    • Whisper - storage engine
    • Graphite-web - HTTP frontend
  • Large community
  • Written in Python

http://graphite.readthedocs.org

https://github.com/graphite-project




Input

prod.apps.server-1.counter.login.count   10    1398969187
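
The input line above is Carbon's plaintext protocol: path, value and Unix timestamp separated by spaces, one metric per line. A minimal Python sketch of a sender follows. The function names and the "localhost" host are assumptions for illustration, and 2003 is Carbon's default plaintext port.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(line, host="localhost", port=2003):
    """Send one line to Carbon over TCP (host is a placeholder)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


line = format_metric("prod.apps.server-1.counter.login.count", 10, 1398969187)
# send_metric(line)  # uncomment when a Carbon instance is listening
```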

Query
prod.apps.*.counter.login.count
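
To show how the wildcard in the query selects series, here is a small Python sketch that matches a pattern against metric names one dot-separated node at a time. It is a simplification: the function name is hypothetical, and only the wildcards that Python's fnmatch understands (*, ?, [...]) are handled.

```python
from fnmatch import fnmatchcase


def matches(pattern, metric_path):
    """Return True when every node of the path matches the same node of the pattern."""
    pattern_nodes = pattern.split(".")
    path_nodes = metric_path.split(".")
    if len(pattern_nodes) != len(path_nodes):
        return False  # a '*' stands for exactly one node, it never spans dots
    return all(fnmatchcase(node, pat) for pat, node in zip(pattern_nodes, path_nodes))


query = "prod.apps.*.counter.login.count"
stored = [
    "prod.apps.server-1.counter.login.count",
    "prod.apps.server-2.counter.login.count",
    "prod.apps.server-1.counter.logout.count",
]
selected = [name for name in stored if matches(query, name)]
```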



Functions!


sumSeries(apps.mysite.*.counter.login.count)

summarize(apps.mysite.*.counter.login.count, '1h')

movingAverage(apps.mysite.*.counter.login.count, 10)

timeShift(apps.mysite.*.counter.login.count, '7d')
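
To give a feel for what these functions compute, here is a simplified Python sketch over plain lists of data points. It is not Graphite's implementation: the function names are hypothetical, series are assumed to be aligned in time, and None stands for a missing point.

```python
def sum_series(*series):
    """Point-wise sum of aligned series, ignoring missing (None) points."""
    result = []
    for points in zip(*series):
        present = [p for p in points if p is not None]
        result.append(sum(present) if present else None)
    return result


def summarize_sum(series, points_per_bucket):
    """Collapse consecutive points into buckets and sum each bucket."""
    buckets = []
    for start in range(0, len(series), points_per_bucket):
        chunk = [p for p in series[start:start + points_per_bucket] if p is not None]
        buckets.append(sum(chunk) if chunk else None)
    return buckets


def moving_average(series, window):
    """Average of the trailing `window` points, including the current one."""
    averaged = []
    for i in range(len(series)):
        chunk = [p for p in series[max(0, i - window + 1):i + 1] if p is not None]
        averaged.append(sum(chunk) / len(chunk) if chunk else None)
    return averaged


server_1 = [1, 2, 3, 4]
server_2 = [10, None, 30, 40]
total = sum_series(server_1, server_2)
```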

Metric Libraries







Metric types


  • Counters
  • Timers
  • Gauges


Metric.Increment("user.login");            


Metric.Time("auction_search", 142);            


Metric.Time("auction_search", () => search());            
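
The calls above come from a .NET metrics library. As a rough Python analogue, a counter and timer registry could be sketched as below. The class and method names are hypothetical and do not belong to any particular library.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Metrics:
    """In-memory registry that collects counters and timer samples."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timers = defaultdict(list)

    def increment(self, key, amount=1):
        self.counters[key] += amount

    def time(self, key, millis):
        """Record an already measured duration in milliseconds."""
        self.timers[key].append(millis)

    @contextmanager
    def timed(self, key):
        """Measure the wrapped block and record its duration in milliseconds."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.time(key, (time.perf_counter() - start) * 1000.0)


metrics = Metrics()
metrics.increment("user.login")
metrics.time("auction_search", 142)
with metrics.timed("auction_search"):
    sum(range(1000))  # stand-in for the real search call
```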
    
Graphite writer

apps.devsum.server-01.counters.auction_search.count   15    123123123131
apps.devsum.server-02.counters.auction_search.count    1    123123123131
apps.devsum.server-03.counters.auction_search.count   35    123123123131

apps.devsum.server-01.timers.auction_search.count    5    123123123131
apps.devsum.server-01.timers.auction_search.mean    10    123123123131
apps.devsum.server-01.timers.auction_search.max     50    123123123131
apps.devsum.server-01.timers.auction_search.min      2    123123123131
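
A writer that produces lines of this shape could look roughly like the following Python sketch. The function name, the sample values and the choice of statistics are assumptions made for illustration; a real library may flush different aggregates.

```python
def graphite_lines(prefix, counters, timers, timestamp):
    """Render collected counters and timer samples as Graphite plaintext lines."""
    lines = []
    for key, count in sorted(counters.items()):
        lines.append(f"{prefix}.counters.{key}.count {count} {timestamp}")
    for key, samples in sorted(timers.items()):
        if not samples:
            continue  # nothing was measured during this flush interval
        stats = {
            "count": len(samples),
            "mean": sum(samples) / len(samples),
            "max": max(samples),
            "min": min(samples),
        }
        for name, value in stats.items():
            lines.append(f"{prefix}.timers.{key}.{name} {value:g} {timestamp}")
    return lines


lines = graphite_lines(
    "apps.devsum.server-01",
    counters={"auction_search": 15},
    timers={"auction_search": [2, 4, 6, 8, 30]},  # made-up samples in ms
    timestamp=1398969187,
)
```

Flushing aggregates once per interval keeps the number of writes constant no matter how often the application calls the metric API.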
    

    

Demo



Graphite intro





play.grafana.org

Graphite configuration



[apps]
pattern = ^apps\.
retentions = 10s:6h,1min:7d,10min:5y

[highres]
pattern = ^highres\.
retentions = 1s:6h,1min:1d

[statsd]
pattern = ^statsd\.
retentions = 1min:1d,10min:1y
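
Each retention entry reads as precision:duration, so the number of stored points per archive is simply duration divided by precision. The Python sketch below works this out. It only understands the units used on this slide, the function names are hypothetical, and the size estimate assumes roughly 12 bytes per point and ignores the small file header.

```python
import re

# Only the units that appear in the retentions shown here (a year is 365 days).
UNIT_SECONDS = {"s": 1, "min": 60, "h": 3600, "d": 86400, "y": 365 * 86400}


def to_seconds(text):
    """Convert a value such as '10s' or '5y' to seconds."""
    match = re.fullmatch(r"(\d+)([a-z]+)", text)
    if match is None or match.group(2) not in UNIT_SECONDS:
        raise ValueError(f"unsupported time value: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def archive_points(retentions):
    """Return (seconds_per_point, number_of_points) for each archive."""
    archives = []
    for spec in retentions.split(","):
        precision, duration = spec.strip().split(":")
        step = to_seconds(precision)
        archives.append((step, to_seconds(duration) // step))
    return archives


archives = archive_points("10s:6h,1min:7d,10min:5y")
total_points = sum(points for _, points in archives)
approx_bytes = total_points * 12  # rough per-metric file size
```

With these settings every metric under apps needs about 3.3 MB on disk, which is worth knowing before creating thousands of series.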



    

Time measurements



Average is not good enough!


5
7
2
7
2400
20
15
10000
4
2

Avg = 1246
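
A quick Python check of the numbers above shows how two slow requests dominate the average, while the median still describes the typical request. The variable names are only for illustration.

```python
from statistics import mean, median

# The ten response times from the slide, in milliseconds.
timings = [5, 7, 2, 7, 2400, 20, 15, 10000, 4, 2]

average = mean(timings)    # pulled up by the 2400 and 10000 outliers
typical = median(timings)  # middle of the sorted values

# How many requests were actually slower than the average?
slower_than_average = [t for t in timings if t > average]
```

Only two of the ten requests are slower than the average, so the average describes almost none of them.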

Percentiles


5
7
2
7
2400
20
15
10000
4
2

Percentiles (sorted, highest first)

10000
2400
20
15
7
7
5
4
2
2

upper 20 = 2
upper 50 = 7
upper 70 = 15
upper 90 = 2400
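
The "upper N" values above can be reproduced with a short Python sketch: sort the samples, keep the lowest N percent, and report the largest value that is left. The function name is hypothetical, and the rounding rule (half rounds up) is an assumption about how the cut-off is chosen.

```python
def upper(values, pct):
    """Largest value among the lowest `pct` percent of the samples."""
    ordered = sorted(values)
    # Integer arithmetic avoids float rounding surprises; .5 rounds up.
    keep = (pct * len(ordered) + 50) // 100
    if keep == 0:
        return None
    return ordered[keep - 1]


timings = [5, 7, 2, 7, 2400, 20, 15, 10000, 4, 2]
result = {pct: upper(timings, pct) for pct in (20, 50, 70, 90)}
```

Read "upper 90 = 2400" as: 90 percent of requests finished in 2400 ms or less.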





StatsD


github.com/etsy/statsd/
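
StatsD accepts small UDP datagrams of the form name:value|type, where c is a counter, ms a timer and g a gauge. A minimal Python client sketch follows. The function names and the "localhost" host are assumptions, and 8125 is StatsD's default port.

```python
import socket


def statsd_packet(name, value, metric_type):
    """Encode one metric in the StatsD line format '<name>:<value>|<type>'."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send(packet, host="localhost", port=8125):
    """Fire-and-forget UDP send; the application never waits for StatsD."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))


login_counter = statsd_packet("user.login", 1, "c")
search_timer = statsd_packet("auction_search", 142, "ms")
# send(login_counter)  # uncomment when a StatsD daemon is running
```

Because UDP is connectionless, a missing or overloaded StatsD daemon costs the application at most a lost datagram.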


More demo


Functions

timeShift
percent
summarize
integral
derivative

Display options

templated
annotations


Thanks!



@torkelo

@grafana

grafana.org

github.com/grafana/grafana


