LOGSTASH






Aurélien Rougemont
Nicolas Szalay

logstash facts




Logstash is open source (Apache 2.0 license)


Logstash is distributed as a jar


Logstash is written in Ruby and runs on JRuby


Unix pipes on steroids




Inputs | Codecs | Filters | Outputs
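
The four stages chain together like a shell pipeline. Below is a minimal Python sketch of that idea, not Logstash code: every function name is hypothetical and the in-memory source stands in for a real input such as tcp or file.

```python
import json

def input_lines():
    # Input stage: hypothetical in-memory source standing in for tcp, file, redis...
    yield '{"message": "hello", "level": "info"}'

def json_codec(raw_lines):
    # Codec stage: decode each raw payload into an event (a dict).
    for raw in raw_lines:
        yield json.loads(raw)

def add_field_filter(events, key, value):
    # Filter stage: enrich every event with an extra field.
    for event in events:
        event[key] = value
        yield event

def collect_output(events):
    # Output stage: here we simply collect events into a list.
    return list(events)

events = collect_output(
    add_field_filter(json_codec(input_lines()), "lsprocessed", "eventworker1")
)
```

Each stage only consumes what the previous one yields, which is the "pipe" part of the analogy.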

INPUTS |


about 30 input plugins:

  • tcp
  • udp
  • syslog
  • amqp
  • file
  • redis
  • [...]



| CODECS |



a growing number of codecs:
  • graphite
  • json
  • msgpack
  • multiline
  • netflow
  • plain
  • rubydebug
  • [...]

| FILTERS |


about forty filters:

  • date
  • grok
  • geoip
  • useragent
  • mutate
  • noop
  • [...]

| OUTPUTS



last but not least, about fifty output plugins:

  • elasticsearch
  • redis
  • amqp
  • syslog
  • riemann
  • nagios
  • [...]

A log is ...





an event.







an event is ...



 EVENT = [ DATETIME ] + [ DATA ] 
or
[ DATETIME ] + [ STRUCTURED DATA ] 

Use a standard datetime format such as ISO 8601

 2013-12-01T23:28:45.000Z
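
As an illustration of the format above, here is a small Python sketch that renders a timezone-aware datetime as a UTC ISO 8601 string with millisecond precision. The helper name `to_iso8601` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def to_iso8601(dt):
    # Convert to UTC first so every event shares one reference timezone.
    utc = dt.astimezone(timezone.utc)
    # Millisecond precision followed by the "Z" (UTC) designator.
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + ".%03dZ" % (utc.microsecond // 1000)

# An event is a timestamp plus data.
event = {
    "@timestamp": to_iso8601(datetime(2013, 12, 1, 23, 28, 45, tzinfo=timezone.utc)),
    "message": "hello",
}
```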

GROK




is a "regexps for dummies" engine


logstash embeds over 120 predefined grok patterns

grok syntax


given the log line
55.3.244.1 GET /index.html 15824 0.043

logstash.conf should contain
filter {
grok {
match => [ "message", "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" ]
}
}

and produces (grok captures are strings by default)
{
"client" => "55.3.244.1",
"method" => "GET",
"request" => "/index.html",
"bytes" => "15824",
"duration" => "0.043"
}
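
To show what a grok expression boils down to, here is a rough Python sketch that expands `%{PATTERN:field}` into named regex groups. The `PATTERNS` table is a deliberately simplified assumption, far looser than the patterns shipped with Logstash, and `grok_to_regex` is a hypothetical helper.

```python
import re

# Simplified stand-ins for the real grok patterns (assumption, not the shipped ones).
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "URIPATHPARAM": r"\S+",
    "NUMBER": r"\d+(?:\.\d+)?",
}

def grok_to_regex(expression):
    # Replace each %{PATTERN:field} with a named capture group.
    def expand(match):
        pattern, field = match.group(1), match.group(2)
        return "(?P<%s>%s)" % (field, PATTERNS[pattern])
    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, expression))

regex = grok_to_regex(
    "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
)
fields = regex.match("55.3.244.1 GET /index.html 15824 0.043").groupdict()
```

In this sketch every captured value is a string; converting `bytes` or `duration` to numbers would be a separate step.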




real life setups

GANDI

fotolia




configuration examples




Syslog |

input { 
syslog {
port => 1337
type => "syslog"
tags => [ "global" ]
}
}

filter {
noop {
add_field => [ "lsprocessed" , "eventworker1" ]
}
}

output {
stdout { debug => true codec => "json" }
}

SYSLOG |

Dec  1 23:31:48 thrain su[5610]: FAILED su for root by beorn
logstash(SYSLOG)
 
{
"message" => "FAILED su for root by beorn",
"@timestamp" => "2013-12-01T22:31:48.000Z",
"@version" => "1",
"type" => "syslog",
"tags" => [
[0] "global"
],
"host" => "127.0.0.1",
"priority" => 13,
"timestamp" => "Dec 1 23:31:48",
"logsource" => "thrain",
"program" => "su",
"pid" => "5610",
"severity" => 5,
"facility" => 1,
"facility_label" => "user-level",
"severity_label" => "Notice",
"lsprocessed" => "eventworker1"
}
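
The priority, facility and severity fields are related: in the syslog protocol, priority = facility * 8 + severity. A short Python sketch of that decoding follows; `decode_priority` is a hypothetical name and the labels use the standard syslog severity names.

```python
# Standard syslog severity names, indexed by severity code 0..7.
SEVERITY_LABELS = [
    "Emergency", "Alert", "Critical", "Error",
    "Warning", "Notice", "Informational", "Debug",
]

def decode_priority(priority):
    # priority = facility * 8 + severity
    facility, severity = divmod(priority, 8)
    return {
        "facility": facility,
        "severity": severity,
        "severity_label": SEVERITY_LABELS[severity],
    }

decoded = decode_priority(13)
```

With priority 13 this gives facility 1 and severity 5, matching the event shown above.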



APACHE | logger

input { 
syslog {
port => 1337
type => "syslog"
tags => [ "global" ]
}
}

filter {
noop {
add_field => [ "lsprocessed" , "eventworker1" ]
}
}

output {
stdout { debug => true codec => "json" }
}

APACHE | logger

Dec  1 23:48:15 thrain sysadmin5: 127.0.0.1 - - [01/Dec/2013:23:48:15 +0100] "GET / HTTP/1.1" 200 482 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0 Iceweasel/24.0"
 
logstash(APACHE | logger)
{
"message" => "127.0.0.1 - - [01/Dec/2013:23:48:15 +0100] \"GET / HTTP/1.1\" 200 482 \"-\" \"Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0 Iceweasel/24.0\"",
"@timestamp" => "2013-12-01T22:48:15.000Z",
"@version" => "1",
"type" => "syslog",
"tags" => [
[0] "global"
],
"host" => "127.0.0.1",
"priority" => 13,
"timestamp" => "Dec 1 23:48:15",
"logsource" => "thrain",
"program" => "sysadmin5",
"severity" => 5,
"facility" => 1,
"facility_label" => "user-level",
"severity_label" => "Notice",
"lsprocessed" => "eventworker1"
}

apache | JSON






 LogFormat "{ \
\"@timestamp\": \"%{%Y-%m-%dT%H:%M:%S%z}t\", \
\"@version\": \"1\", \
\"clientip\": \"%a\", \
\"duration\": %D, \
\"status\": %>s, \
\"message\": \"%U%q\", \
\"urlpath\": \"%U\", \
\"urlquery\": \"%q\", \
\"bytes\": %B, \
\"method\": \"%m\", \
\"referer\": \"%{Referer}i\", \
\"useragent\": \"%{User-agent}i\", \
\"platform\": \"website\", \
\"role\": \"frontend\", \
\"environment\": \"prod\", \
\"vhost\": \"sysadmin5.binaries.fr\" }" logstash_json


apache | JSON

input { 
syslog {
port => 1337
type => "syslog"
tags => [ "global" ]
}
}
filter {
noop {
add_field => [ "lsprocessed" , "eventworker1" ]
}
json {
source => "message"
}
}
output {
stdout { debug => true codec => "json" }
}


apache | json | logger

 Dec  2 00:12:02 thrain sysadmin5: {             "@timestamp": "2013-12-02T00:12:02+0100",             "@version": "1",             "clientip": "127.0.0.1",             "duration": 1774,             "status": 200,             "message": "/index.html",             "urlpath": "/index.html",             "urlquery": "",             "bytes": 146,             "method": "GET",             "referer": "-",             "useragent": "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0 Iceweasel/24.0",             "platform": "website",             "role": "frontend",             "environment": "prod",             "vhost": "sysadmin5.binaries.fr" }

logstash(apache | json | logger)
{
"message" => "/index.html",
"@timestamp" => "2013-12-01T23:12:02.000Z",
"@version" => "1",
"type" => "syslog",
"tags" => [
[0] "global"
],
"host" => "127.0.0.1",
"priority" => 13,
"timestamp" => "Dec 2 00:12:02",
"logsource" => "thrain",
"program" => "sysadmin5",
"severity" => 5,
"facility" => 1,
"facility_label" => "user-level",
"severity_label" => "Notice",
"lsprocessed" => "eventworker1",
"clientip" => "127.0.0.1",
"duration" => 1774,
"status" => 200,
"urlpath" => "/index.html",
"urlquery" => "",
"bytes" => 146,
"method" => "GET",
"referer" => "-",
"useragent" => "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0 Iceweasel/24.0",
"platform" => "website",
"role" => "frontend",
"environment" => "prod",
"vhost" => "sysadmin5.binaries.fr"
}
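
Conceptually, the json filter parses the text held in the source field and merges the resulting keys into the event. A minimal Python sketch of that behaviour, under the assumption that parsed keys overwrite existing ones (which is what the "message" field above suggests); `json_filter` is a hypothetical function, not the plugin's code.

```python
import json

def json_filter(event, source="message"):
    # Parse the JSON text stored in the source field.
    parsed = json.loads(event[source])
    # Merge parsed keys at the top level; parsed values win on conflict.
    merged = dict(event)
    merged.update(parsed)
    return merged

event = {
    "type": "syslog",
    "message": '{"clientip": "127.0.0.1", "status": 200, "message": "/index.html"}',
}
result = json_filter(event)
```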

apache | json | logger

logstash++(apache | json | logger)
{
"message" => "/index.html",
"@timestamp" => "2013-12-01T23:12:02.000Z",
"@version" => "1",
"type" => "syslog",
"tags" => [
[0] "global"
],
"host" => "127.0.0.1",
"priority" => 13,
"timestamp" => "Dec 2 00:12:02",
"logsource" => "thrain",
"program" => "sysadmin5",
"severity" => 5,
"facility" => 1,
"facility_label" => "user-level",
"severity_label" => "Notice",
"lsprocessed" => "eventworker1",
"clientip" => "127.0.0.1",
"duration" => 1774,
"status" => 200,
"urlpath" => "/index.html",
"urlquery" => "",
"bytes" => 146,
"method" => "GET",
"referer" => "-",
"useragent" => "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0 Iceweasel/24.0",
"platform" => "website",
"role" => "frontend",
"environment" => "prod",
"vhost" => "sysadmin5.binaries.fr",
"geoip" => {
"ip" => "127.0.0.1",
"country_code" => 0,
"country_code2" => "--",
"country_code3" => "--",
"country_name" => "N/A",
"continent_code" => "--"
},
"ua" => {
"name" => "Iceweasel",
"os" => "Linux",
"os_name" => "Linux",
"device" => "Other",
"major" => "24",
"minor" => "0"
}
}


apache | json | fleece

CustomLog "|| /usr/bin/fleece --host logstash --port 1338" logstash_json

ErrorLog "|| /usr/bin/fleece --host logstash --port 1339 --field vhost=sysadmin5.binaries.fr --field role=frontend --field environment=prod --field platform=webmail"

Fleece is a lightweight, non-blocking UDP JSONifier
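
To illustrate the idea only (this is not Fleece's code, and the payload layout is an assumption), here is a Python sketch that wraps a log line and a few extra fields into JSON and sends it as one UDP datagram without ever blocking the caller. All function names are hypothetical.

```python
import json
import socket

def build_payload(message, fields):
    # Assumed wire format: one JSON object per datagram.
    event = dict(fields)
    event["message"] = message
    return json.dumps(event).encode("utf-8")

def make_sender():
    # Non-blocking UDP socket: sending never stalls the web server.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(False)
    return sock

def send_event(sock, address, message, fields):
    try:
        sock.sendto(build_payload(message, fields), address)
        return True
    except BlockingIOError:
        # Send buffer full: drop the event rather than block.
        return False

payload = build_payload("File does not exist", {"role": "frontend", "environment": "prod"})
```

Dropping events under pressure is the trade-off that keeps the logging path from slowing down the application.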

Data mining




The most natural indexed storage engine for logstash is Elasticsearch


Kibana


is an AJAX web interface to Elasticsearch (ES)



is an easy way to build and share dashboards



queries look like:

 message: "/index.htm" AND tags: "apache" AND tags: "fleece"
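
A query string like this one can also be sent to Elasticsearch directly, using its `query_string` query. The Python sketch below only builds the request body; the helper name and the default `size` are assumptions, and sending it to a search endpoint is left out.

```python
import json

def build_search_body(query, size=10):
    # Wrap a Lucene-style query string in an Elasticsearch query_string query.
    return {"query": {"query_string": {"query": query}}, "size": size}

body = build_search_body('message: "/index.htm" AND tags: "apache" AND tags: "fleece"')
body_json = json.dumps(body)
```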

KIBANA


a few numbers


Gandi

2000-3000 events/s steady

120 000 000 events / day

200 ms / day of search


Fotolia

1000 events/s steady

90 GB / day of data indexed


Feedback


  • KISS
  • start with capacity planning
  • Logstash's documentation has room for improvement
  • read the code linked from the documentation
  • secure your elasticsearch cluster
  • understand how elasticsearch works (indices, mapping...)
  • use grok the right way
  • make consistent choices
  • tune the jvm
  • tune the IP stack (especially net_backlog)

Debug cmdline


 /usr/bin/java \
-Dcom.sun.management.jmxremote.port=7199 \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.authenticate=false \
-Xmx1024m \
-Djava.io.tmpdir=/var/lib/logstash/ \
-jar /usr/share/logstash/logstash-1.2.2-flatjar.jar agent \
-f /etc/logstash/ \
--log /var/log/logstash/logstash.log \
--filterworkers 8 \
-vv


questions ?




really ?





Thanks for your attention





nico@fotolia.com
beorn@binaries.fr

logstash

By Aurélien ROUGEMONT


Presentation made for the Sysadmin #5 conference at IRCAM, Paris
