ELK Workshop



What's wrong with our logs?

  • Logs are spread across many files and servers
  • Inconsistent/missing structure
  • grep is too slow and inconvenient
  • Hard to graph/count/analyze data
  • Can't do stream processing



Workshop goals

  • Collect logs from real world applications
  • Apply a consistent structure to our logs
  • Analyze them
  • Gain experience with ELK



What is ELK?

  • Elasticsearch - Lucene-based search server
  • Logstash - Log pipeline and processor
  • Kibana - Log analytics frontend for Elasticsearch



Prerequisites

  • Vagrant
  • VirtualBox
  • ELK Vagrant box
  • Workshop git repository (https://github.com/nir0s/elk-workshop.git)



FIRST

Let's see that it works:

~/elk/logstash/bin/logstash -e 'input { stdin { } } output { stdout {} }'

Simple log collection

Collect logs from a file and index them in Elasticsearch for easy browsing


input {
    file {
        path => ["/home/vagrant/elk-workshop/generated.log"]
    }
}
output {
    elasticsearch {
        host => "localhost"
    }
}

git checkout 01-the-file-input
mouth feed -f ApacheAccessEx -t File -m 1000 -g 0.001


Codecs

Codecs parse logs directly within an input plugin, using a pre-defined format or serializer.


input {
    file {
        path => ["/home/vagrant/pylog/generated.log"]
        codec => json
    }
}
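What the json codec does is roughly equivalent to this Python sketch: each raw line is deserialized into a structured event before any filters run (the field names here are illustrative):

```python
import json

# A structured log line, as an application using a JSON log formatter might emit it.
line = '{"level": "INFO", "message": "user logged in", "user_id": 42}'

# The json codec turns the raw line into an event with named fields,
# so no separate grok/parse step is needed downstream.
event = json.loads(line)
print(event["level"])    # INFO
print(event["user_id"])  # 42
```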




Parsing logs using grok


filter {
    grok {
        match => ["message", "%{COMBINEDAPACHELOG}"]
    }
}
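Under the hood, %{COMBINEDAPACHELOG} is essentially a large named-group regex. A trimmed-down, illustrative Python equivalent (the real grok pattern extracts more fields, such as ident, auth, referrer and agent, with stricter sub-patterns):

```python
import re

# Simplified subset of the COMBINEDAPACHELOG pattern.
APACHE_RE = re.compile(
    r'(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) \S+" (?P<response>\d{3}) (?P<bytes>\d+)'
)

line = '127.0.0.1 - - [04/Aug/14:10:59:09 +0000] "GET /index.html HTTP/1.1" 200 1234'
fields = APACHE_RE.match(line).groupdict()
print(fields["clientip"], fields["verb"], fields["response"])  # 127.0.0.1 GET 200
```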


git checkout 02-the-grok-filter
mouth feed -f ApacheAccessEx -t File -m 1000 -g 0.001



Basic Kibana usage

  • Search for logs
  • Filter by field
  • Zoom in/out



Advanced Kibana usage

  • Widgets
  • Saving/loading/sharing dashboards
  • Preparing dashboards for the big screen



Multi-line logs


filter {
    multiline {
        type => "catalina_out"
        pattern => "^\s"
        what => "previous"
    }
}
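The logic of pattern `^\s` with `what => "previous"` — any line starting with whitespace is glued onto the event before it — can be sketched in Python (sample stack-trace lines are made up):

```python
import re

raw = [
    "SEVERE: request failed",
    "  java.lang.NullPointerException",
    "    at com.example.Handler.handle(Handler.java:42)",
    "INFO: next request ok",
]

pattern = re.compile(r"^\s")
events = []
for line in raw:
    if pattern.match(line) and events:
        events[-1] += "\n" + line   # continuation: append to the previous event
    else:
        events.append(line)         # a new event starts here

print(len(events))  # 2 -- the stack trace collapsed into one event
```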

Building a log pipeline using RabbitMQ

RabbitMQ is an advanced message broker with queuing abilities. We can use it to build an elaborate pipeline with ELK.

input {
    rabbitmq {
        host => "localhost"
        codec => "json"
        queue => "logstash"
        durable => "true"
        auto_delete => "true"
        exclusive => "false"
    }
}
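The point of putting a broker in the middle is decoupling: the shipper and the indexer run at their own pace, and the indexer can fall behind or restart without losing in-flight events. A stand-in Python sketch using an in-process queue instead of RabbitMQ:

```python
import queue
import threading

# Stand-in for RabbitMQ: a bounded in-process queue between producer and consumer.
broker = queue.Queue(maxsize=100)
indexed = []

def shipper():
    # Produces events, like Logstash's rabbitmq output / a log shipper would.
    for i in range(5):
        broker.put({"message": f"log line {i}"})
    broker.put(None)  # sentinel: no more events

def indexer():
    # Consumes events, like the rabbitmq input above; would bulk-index into ES.
    while (event := broker.get()) is not None:
        indexed.append(event)

t1, t2 = threading.Thread(target=shipper), threading.Thread(target=indexer)
t1.start(); t2.start(); t1.join(); t2.join()
print(len(indexed))  # 5
```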

git checkout 03-rabbitmq-as-a-broker
mouth feed -f ApacheAccessEx -t test_amqp -m 1000 -g 0.001 -c resources/feeder_config.py

Adding data to logs

The geoip filter
filter {
    geoip {
        source => "clientip"
    }
}
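Conceptually, geoip looks the clientip up in a GeoIP database (MaxMind data, bundled with the filter) and attaches the result as new fields. A toy Python sketch with a hypothetical hard-coded lookup table standing in for the real database:

```python
# Hypothetical in-memory table; the real filter queries the MaxMind database.
GEO_DB = {
    "8.8.8.8": {"country_name": "United States"},
}

def geoip_filter(event, source="clientip"):
    """Mimic the geoip filter: enrich the event in place if the IP is known."""
    geo = GEO_DB.get(event.get(source))
    if geo:
        event["geoip"] = geo
    return event

event = geoip_filter({"clientip": "8.8.8.8", "response": "200"})
print(event["geoip"]["country_name"])  # United States
```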





git checkout 04-geoip-to-kibana-map
mouth feed -f ApacheAccessEx -t test_amqp -m 1000 -g 0.001 -c resources/feeder_config.py


Manipulation

The translate filter can replace data within a message:

filter {
    translate {
        # the field whose value is looked up in the dictionary
        field => "response"
        dictionary => [ "100", "Continue",
                        "101", "Switching Protocols",
                        "merci", "thank you",
                        "old version", "new version" ]
    }
}

This is helpful, for instance, when you want to replace HTTP status codes with their textual descriptions.
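The dictionary is a flat key/value lookup; in Python terms it behaves like this (same mapping as above, with the translated value written back to the field):

```python
# Same mapping as the translate dictionary above, as a plain Python dict.
STATUS_TEXT = {
    "100": "Continue",
    "101": "Switching Protocols",
    "merci": "thank you",
    "old version": "new version",
}

event = {"response": "101"}
# Look the field's value up; leave it untouched if there is no match.
event["response"] = STATUS_TEXT.get(event["response"], event["response"])
print(event["response"])  # Switching Protocols
```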


The date filter (and field removal)

filter {
    date {
        # 04/Aug/14:10:59:09 +0000
        match => [ "timestamp", "dd/MMM/YY:HH:mm:ss +0000" ]
        target => "@timestamp"
        remove_field => [ "timestamp" ]
    }
}
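In plain Python, the same parse-then-replace step looks like this (strptime's %z handles the +0000 offset):

```python
from datetime import datetime

event = {"timestamp": "04/Aug/14:10:59:09 +0000", "message": "GET /index.html"}

# Parse the Apache-style timestamp into a real datetime ...
parsed = datetime.strptime(event["timestamp"], "%d/%b/%y:%H:%M:%S %z")
# ... promote it to the canonical @timestamp field, and drop the original.
event["@timestamp"] = parsed.isoformat()
del event["timestamp"]

print(event["@timestamp"])  # 2014-08-04T10:59:09+00:00
```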



git checkout 05-the-date-filter
mouth feed -f ApacheAccessEx -t test_amqp -m 1000 -g 0.001 -c resources/feeder_config.py


Deduping logs

We can dedupe logs in Elasticsearch, removing duplicate entries to save space and clean up the logs.

filter {
    fingerprint {
        source => ["message"]
        target => "fingerprint"
    }
}
output {
    elasticsearch {
        document_id => "%{fingerprint}"
    }
}
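The dedup works because Elasticsearch treats a write with an existing document_id as an update rather than a new document. The fingerprint itself is just a hash of the message; duplicate lines hash identically (SHA1 shown here, though the filter supports several hash methods):

```python
import hashlib

def fingerprint(message: str) -> str:
    # Identical messages always produce the same id, so indexing with
    # document_id == fingerprint overwrites instead of duplicating.
    return hashlib.sha1(message.encode("utf-8")).hexdigest()

a = fingerprint('127.0.0.1 - - "GET / HTTP/1.1" 200 1234')
b = fingerprint('127.0.0.1 - - "GET / HTTP/1.1" 200 1234')
c = fingerprint('10.0.0.5 - - "GET / HTTP/1.1" 404 0')

print(a == b)  # True  -> the second copy would overwrite the first
print(a == c)  # False -> distinct log lines keep distinct ids
```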


Counting events

filter {
    metrics {
        meter => ["messages"]
        add_tag => "metric"
    }
}
output {
    if "metric" in [tags] {
        graphite {
            fields_are_metrics => true
            include_metrics => ["messages\.rate_[0-9]m"]
            metrics_format => "logstash.*"
        }
    }
}
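The metrics filter keeps moving rates per meter and periodically flushes a synthetic event tagged "metric", which the conditional above routes to Graphite. A much-simplified Python sketch of a one-minute event counter (the real filter tracks 1/5/15-minute exponentially weighted rates):

```python
from collections import deque

class Meter:
    """Toy sliding-window rate meter."""

    def __init__(self, window_seconds=60):
        self.window = window_seconds
        self.events = deque()  # timestamps of recent events

    def mark(self, now):
        self.events.append(now)

    def rate_1m(self, now):
        # Drop events that fell out of the window, then count what's left.
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events)

meter = Meter()
for t in range(0, 90):      # one event per second for 90 seconds
    meter.mark(t)
print(meter.rate_1m(90))    # 60 -- only the last minute of events counts
```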


Multiple outputs

We might also want to output to a file so that we can analyze the logs later on (you know... Big Data and all).
output {
    elasticsearch {
        host => "localhost"
        document_id => "%{fingerprint}"
    }
    file {
        path => "/home/vagrant/elk-workshop/analyzed.log"
    }
}

git checkout 06-output-to-file
mouth feed -f ApacheAccessEx -t test_amqp -m 1000 -g 0.001 -c resources/feeder_config.py


Output logs to additional places

  • files
  • Pub/sub
  • Hipchat, IRC
  • Nagios/Zabbix
  • statsd
  • Librato/datadog



A few words about clustering

Final word

What is a log really?


Any (timestamped?) event stream

Thanks for participating!

Where to go next

ELK Workshop

By Nir Cohen
