ELK Workshop
What's wrong with our logs?
- Logs are spread across many files and servers
- Inconsistent/missing structure
- grep is too slow and inconvenient
- Hard to graph/count/analyze data
- Can't do stream processing
Workshop goals
- Collect logs from real-world applications
- Apply a consistent structure to our logs
- Analyze them
- Gain experience with ELK
What is ELK?
- Elasticsearch - Lucene based search server
- Logstash - Log pipeline and processor
- Kibana - Log analytics frontend for Elasticsearch
Prerequisites
- Vagrant
- VirtualBox
- ELK Vagrant box
- Workshop git repository (https://github.com/nir0s/elk-workshop.git)
FIRST
~/elk/logstash/bin/logstash -e 'input { stdin { } } output { stdout {} }'
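Type a line and hit enter; Logstash echoes it back as an event. The exact output format depends on the version and stdout codec, but a session might look roughly like this (hostname will differ):
hello world
2014-08-04T10:59:09.262+0000 precise64 hello world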
Now, head over to http://logstash.net/
Simple log collection
Collect logs from a file and index them in Elasticsearch for easy browsing
input {
file {
path => ["/home/vagrant/elk-workshop/generated.log"]
}
}
output {
elasticsearch {
host => "localhost"
}
}
git checkout 01-the-file-input
mouth feed -f ApacheAccessEx -t File -m 1000 -g 0.001
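To verify that events actually reached Elasticsearch, query its REST API directly (port 9200 by default):
curl 'http://localhost:9200/logstash-*/_search?q=*&size=1&pretty'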
Codecs
Codecs parse logs directly within an input plugin using a predefined format or serializer, so no separate filter stage is needed.
input {
file {
path => ["/home/vagrant/pylog/generated.log"]
codec => json
}
}
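For example, if the application writes one JSON object per line (a hypothetical entry):
{"level": "INFO", "message": "user logged in", "user": "alice"}
the json codec decodes it at the input stage, so level, message and user arrive as first-class event fields.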
Parsing logs using grok
filter {
grok {
match => ["message", "%{COMBINEDAPACHELOG}"]
}
}
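Given a standard combined-format access line such as this made-up one:
127.0.0.1 - frank [04/Aug/2014:10:59:09 +0000] "GET /index.html HTTP/1.1" 200 2326 "http://example.com/start" "Mozilla/5.0"
the COMBINEDAPACHELOG pattern splits it into named fields: clientip, ident, auth, timestamp, verb, request, httpversion, response, bytes, referrer and agent.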
git checkout 02-the-grok-filter
mouth feed -f ApacheAccessEx -t File -m 1000 -g 0.001
Basic Kibana usage
- Search for logs
- Filter by field
- Zoom in/out
Advanced Kibana usage
- Widgets
- Saving/loading/sharing dashboards
- Preparing dashboards for the big screen
Multi-line logs
filter {
multiline {
type => "catalina_out"
pattern => "^\s"
what => "previous"
}
}
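With pattern => "^\s" and what => "previous", any line that starts with whitespace is appended to the event before it. In this made-up catalina.out fragment, the two indented "at ..." lines get folded into the java.lang.NullPointerException event:
java.lang.NullPointerException
    at com.example.MyServlet.doGet(MyServlet.java:42)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:621)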
Building a log pipeline using RabbitMQ
RabbitMQ is an advanced message broker with queuing capabilities. We can use it as a buffer between shippers and indexers to build a more elaborate ELK pipeline.
input {
rabbitmq {
host => "localhost"
codec => "json"
queue => "logstash"
durable => true
auto_delete => true
exclusive => false
}
}
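On the other side of the broker, the shipper needs a matching rabbitmq output. A minimal sketch, assuming a direct exchange named "logstash" bound to the queue above (the workshop's actual exchange settings may differ):
output {
  rabbitmq {
    host => "localhost"
    exchange => "logstash"
    exchange_type => "direct"
    key => "logstash"
    codec => "json"
  }
}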
RabbitMQ reference: https://www.rabbitmq.com/
git checkout 03-rabbitmq-as-a-broker
mouth feed -f ApacheAccessEx -t test_amqp -m 1000 -g 0.001 -c resources/feeder_config.py
Adding data to logs
The geoip filter
filter {
geoip {
source => "clientip"
}
}
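geoip looks clientip up in its bundled GeoIP database and nests the results under a geoip field; illustrative (made-up) values:
"geoip" => {
  "country_name" => "United States",
  "city_name"    => "Mountain View",
  "location"     => [ -122.0574, 37.4192 ]
}
The location lon/lat pair is what Kibana's map widget plots.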
git checkout 04-geoip-to-kibana-map
mouth feed -f ApacheAccessEx -t test_amqp -m 1000 -g 0.001 -c resources/feeder_config.py
Manipulation
The translate filter can replace data within a message
filter {
translate {
# the source field whose value is looked up (required)
field => "response"
dictionary => [ "100", "Continue",
"101", "Switching Protocols",
"merci", "thank you",
"old version", "new version" ]
}
}
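By default the looked-up value lands in a new translation field (point destination back at the source field, with override => true, to replace it in place):
"response"    => "101",
"translation" => "Switching Protocols"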
The date filter (and field removal)
filter {
date {
# e.g. 04/Aug/2014:10:59:09 +0000
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
target => "@timestamp"
remove_field => [ "timestamp" ]
}
}
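After this filter runs, @timestamp reflects when the request happened rather than when Logstash ingested it, and the now-redundant source field is dropped:
timestamp: 04/Aug/2014:10:59:09 +0000  ->  @timestamp: 2014-08-04T10:59:09.000Z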
git checkout 05-the-date-filter
mouth feed -f ApacheAccessEx -t test_amqp -m 1000 -g 0.001 -c resources/feeder_config.py
Deduping logs
We can dedupe logs in Elasticsearch, removing duplicate log entries to save space and clean up the index
filter {
fingerprint {
source => ["message"]
target => "fingerprint"
}
}
output {
elasticsearch {
document_id => "%{fingerprint}"
}
}
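Since the fingerprint doubles as the document ID, indexing the same line twice overwrites one document rather than storing two. Watching the document count plateau while duplicate events stream in confirms it works:
curl 'http://localhost:9200/logstash-*/_count?pretty'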
Counting events
filter {
metrics {
meter => ["messages"]
add_tag => "metric"
}
}
output {
if "metric" in ["tags"] {
graphite {
fields_are_metrics => true
include_metrics => ["messages\.rate_[0-9]m"]
metrics_format => "logstash.*"
}
}
}
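Every flush interval (5 seconds by default) the metrics filter emits a synthetic event carrying the meter's counters and rates, roughly (made-up numbers):
"messages.count"    => 1000,
"messages.rate_1m"  => 16.6,
"messages.rate_5m"  => 3.3,
"messages.rate_15m" => 1.1,
"tags"              => [ "metric" ]
which is why the conditional above routes only metric-tagged events to Graphite.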
Multiple outputs
We might also want to output to a file so we can analyze the logs later on (you know... Big Data and all)
output {
elasticsearch {
host => "localhost"
document_id => "%{fingerprint}"
}
file {
path => "/home/vagrant/elk-workshop/analyzed.log"
}
}
git checkout 06-output-to-file
mouth feed -f ApacheAccessEx -t test_amqp -m 1000 -g 0.001 -c resources/feeder_config.py
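Tail the file to confirm events are landing; the file output typically writes one JSON document per line (the exact format depends on the codec):
tail -n 2 /home/vagrant/elk-workshop/analyzed.log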
Output logs to additional places
- Files
- Pub/sub
- Hipchat, IRC
- Nagios/Zabbix
- statsd
- Librato/datadog
A few words about clustering
Final word
What is a log really?
Any (timestamped?) event stream
Thanks for participating!
Where to go next
ELK Workshop
By Nir Cohen