ELK Workshop



What's wrong with our logs?

  • Logs are spread across many files and servers
  • Inconsistent/missing structure
  • grep is too slow and inconvenient
  • Hard to graph/count/analyze data
  • Can't do stream processing



Workshop goals

  • Collect logs from real world applications
  • Apply a certain structure to our logs
  • Analyze them
  • Gain experience with ELK



What is ELK?

  • Elasticsearch - Lucene-based search server
  • Logstash - Log pipeline and processor
  • Kibana - Log analytics frontend for Elasticsearch



Prerequisites

  • Vagrant
  • VirtualBox
  • ELK Vagrant box
  • Workshop git repository (https://github.com/nir0s/elk-workshop.git)



First

~/elk/logstash/bin/logstash -e 'input { stdin { } } output { stdout {} }'
Let's check that it works: type a line and Logstash should echo it back as an event.

Simple log collection

Collect logs from a file and index them in Elasticsearch for easy browsing


input {
	file {
	    path => ["/home/vagrant/elk-workshop/generated.log"]
	}
}
output {
	elasticsearch {
	    host => "localhost"
	}
}
                    

git checkout 01-the-file-input
mouth feed -f ApacheAccessEx -t File -m 1000 -g 0.001


Codecs

Codecs parse logs directly within an input plugin using a pre-defined format or serializer


input {
	file {
	 	path => ["/home/vagrant/pylog/generated.log"]
	 	codec => json
	}
}
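
To make the idea concrete, here is a minimal Python sketch of roughly what a JSON codec does to each incoming line. It is an illustration only, not Logstash's implementation: the function name `decode_json_line` is hypothetical, and the fallback behaviour (keep the raw text and tag the event) is an assumption for this sketch.

```python
import json

def decode_json_line(line):
    """Turn one log line into an event dict, roughly as a JSON codec would."""
    try:
        event = json.loads(line)
    except ValueError:
        event = None
    if not isinstance(event, dict):
        # Assumed fallback for this sketch: keep the raw text and tag the event.
        return {"message": line, "tags": ["_jsonparsefailure"]}
    return event

good = decode_json_line('{"level": "ERROR", "message": "disk full"}')
bad = decode_json_line("plain text line")
print(good["level"])
print(bad["tags"])
```

The point is that structured fields arrive ready to use, so no separate parsing filter is needed for logs that are already JSON.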
                		




Parsing logs using grok


filter {
	grok {
	    match => ["message", "%{COMBINEDAPACHELOG}"]
	}
}
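
Under the hood, grok patterns are named regular expressions. The Python sketch below shows the kind of field extraction the filter above performs on an Apache access log line. The regex is a simplified stand-in, not the real COMBINEDAPACHELOG definition, and the sample line and the function name `parse_apache_line` are made up for illustration.

```python
import re

# Simplified approximation of a combined Apache access log pattern.
# This is NOT the actual COMBINEDAPACHELOG grok definition.
APACHE_COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_apache_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = APACHE_COMBINED.match(line)
    return match.groupdict() if match else None

# Hypothetical sample line, using the timestamp style shown in this workshop.
sample = ('83.149.9.216 - - [04/Aug/14:10:59:09 +0000] '
          '"GET /index.html HTTP/1.1" 200 1024 '
          '"http://example.com/" "Mozilla/5.0"')
fields = parse_apache_line(sample)
print(fields["clientip"], fields["verb"], fields["response"])
```

Once the message is split into fields like these, each one can be searched, filtered and graphed on its own in Kibana.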
                		


git checkout 02-the-grok-filter
mouth feed -f ApacheAccessEx -t File -m 1000 -g 0.001



Basic Kibana usage

  • Search for logs
  • Filter by field
  • Zoom in/out



Advanced Kibana usage

  • Widgets
  • Saving/loading/sharing dashboards
  • Preparing dashboards for the big screen



Multi-line logs


filter {
	multiline {
		type => "catalina_out"
		pattern => "^\s"
		what => "previous"
	}
}
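
The settings above say: any line that starts with whitespace belongs to the previous event. A minimal Python sketch of that merging rule, with a hypothetical helper `merge_multiline` and made-up sample lines (it ignores streaming and timeout concerns that a real pipeline has to handle):

```python
import re

def merge_multiline(lines, pattern=r"^\s"):
    """Append every line matching `pattern` to the event before it."""
    continuation = re.compile(pattern)
    events = []
    for line in lines:
        if events and continuation.search(line):
            # Continuation line: glue it onto the previous event.
            events[-1] += "\n" + line
        else:
            # Anything else starts a new event.
            events.append(line)
    return events

lines = [
    "SEVERE: Servlet threw exception",
    "\tat com.example.Foo.bar(Foo.java:42)",
    "    at com.example.Main.main(Main.java:7)",
    "INFO: Server startup in 1234 ms",
]
events = merge_multiline(lines)
print(len(events))
```

This is why a Java stack trace, whose frames are indented, ends up as a single event instead of one event per line.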
                		

Building a log pipeline using RabbitMQ

RabbitMQ is an advanced message broker with queuing abilities. We can use it to build an elaborate pipeline with ELK

input {
    rabbitmq {
        host => "localhost"
        codec => "json"
        queue => "logstash"
        durable => "true"
        auto_delete => "true"
        exclusive => "false"
    }
}								               		
RabbitMQ reference: link

git checkout 03-rabbitmq-as-a-broker
mouth feed -f ApacheAccessEx -t test_amqp -m 1000 -g 0.001 -c resources/feeder_config.py

Adding data to logs

The geoip filter
filter {
     geoip {
         source => "clientip"
     }
}





git checkout 04-geoip-to-kibana-map
mouth feed -f ApacheAccessEx -t test_amqp -m 1000 -g 0.001 -c resources/feeder_config.py


Manipulation

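The translate filter looks up a field's value in a dictionary and writes the matching replacement
filter {
    translate {
        # Field to look up. "response" is an assumption here: the HTTP status field extracted by grok.
        field => "response"
        dictionary => [ "100", "Continue",
                        "101", "Switching Protocols",
                        "merci", "thank you",
                        "old version", "new version" ]
    }
}
This is helpful, for instance, if you want to replace HTTP status codes with their verbal description.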
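
Conceptually this is a plain dictionary lookup. The Python sketch below mirrors the flat key/value list used in the config. The helper names, the `response` field and the `translation` destination field are assumptions made for illustration, not a description of the plugin's internals.

```python
def build_dictionary(flat_pairs):
    """Convert a flat [key, value, key, value, ...] list into a dict."""
    return dict(zip(flat_pairs[0::2], flat_pairs[1::2]))

def translate(event, field, dictionary, destination="translation"):
    """Store the dictionary value for event[field], if any, under `destination`."""
    value = event.get(field)
    if value in dictionary:
        event[destination] = dictionary[value]
    return event

dictionary = build_dictionary([
    "100", "Continue",
    "101", "Switching Protocols",
    "merci", "thank you",
    "old version", "new version",
])
# "response" is a hypothetical field name holding the HTTP status code.
event = translate({"response": "101"}, "response", dictionary)
print(event)
```

Values that are not in the dictionary leave the event untouched in this sketch.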


The date filter (and field removal)

filter { 
    date {
        # 04/Aug/14:10:59:09 +0000
        match => [ "timestamp", "dd/MMM/YY:HH:mm:ss +0000" ]
        target => "@timestamp"
        remove_field => [ "timestamp" ]
    }
}
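
What the filter above does to an event can be sketched in a few lines of Python: parse the text timestamp, store it as the event time, then drop the original field. The function name `apply_date_filter` is hypothetical, and the `strptime` format is my translation of the pattern in the config, assuming the default C locale for month names.

```python
from datetime import datetime

def apply_date_filter(event, field="timestamp", fmt="%d/%b/%y:%H:%M:%S %z"):
    """Parse event[field], set @timestamp from it and remove the source field."""
    parsed = datetime.strptime(event[field], fmt)
    updated = dict(event)
    updated["@timestamp"] = parsed.isoformat()
    # The source field is only removed once parsing has succeeded.
    del updated[field]
    return updated

event = {"timestamp": "04/Aug/14:10:59:09 +0000", "verb": "GET"}
result = apply_date_filter(event)
print(result)
```

Without this step, events are stamped with the time Logstash read them, which skews graphs when old log files are replayed.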



git checkout 05-the-date-filter
mouth feed -f ApacheAccessEx -t test_amqp -m 1000 -g 0.001 -c resources/feeder_config.py


Deduping logs

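We can dedup logs in Elasticsearch, removing duplicate log entries to save space and clean up the logs. Using a fingerprint of the message as the document ID means an identical message overwrites the existing document instead of creating a new one.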

filter {
	fingerprint {
		source => ["message"]
		target => "fingerprint"
	}
}	
output {
	elasticsearch {
		document_id => "%{fingerprint}"
	}
}								
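
The trick is that the document ID is derived from the content, so writing the same message twice hits the same ID. A minimal Python sketch of the idea, using a dict as a stand-in for the Elasticsearch index and a plain SHA-1 digest as an assumed hash (the plugin's actual hashing method and options may differ):

```python
import hashlib

def fingerprint(message):
    """Derive a stable ID from the message text (plain SHA-1, an assumption)."""
    return hashlib.sha1(message.encode("utf-8")).hexdigest()

def index_events(messages):
    """Store each message under its fingerprint; duplicates overwrite each other."""
    index = {}
    for message in messages:
        index[fingerprint(message)] = {"message": message}
    return index

index = index_events(["disk full", "login failed", "disk full"])
print(len(index))
```

Note that this only dedups messages that are byte-for-byte identical, so a message that embeds its own timestamp will rarely collide.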


Counting events

filter {
	metrics {
		meter => ["messages"]
		add_tag => "metric"
	}
}
output {
	if "metric" in [tags] {
		graphite {
			fields_are_metrics => true
			include_metrics => "messages\.rate_[0-9]m"
			metrics_format => "logstash.*"
		}
	}
}              		


Multiple outputs

We might also want to output to a file so that we can analyze the logs later on (you know... Big Data and all)
output {
    elasticsearch {
        host => "localhost"
        document_id => "%{fingerprint}"
    }
    file {
        path => "/home/vagrant/elk-workshop/analyzed.log"
    }
}

git checkout 06-output-to-file
mouth feed -f ApacheAccessEx -t test_amqp -m 1000 -g 0.001 -c resources/feeder_config.py


Output logs to additional places

  • Files
  • Pub/sub
  • HipChat, IRC
  • Nagios/Zabbix
  • statsd
  • Librato/Datadog



A few words about clustering

Final word

What is a log really?


Any (timestamped?) event stream

Thanks for participating!

Where to go next

ELK Workshop

By Nir Cohen
