Grzesiek Miklaszewski
free source of information, set up by default
debugging
business analytics
statistics
performance and benchmarking
trends
big data
security
Rails default:
Started GET "/" for 127.0.0.1 at 2012-03-10 14:28:14 +0100
Processing by HomeController#index as HTML
Rendered text template within layouts/application (0.0ms)
Rendered layouts/_assets.html.erb (2.0ms)
Rendered layouts/_top.html.erb (2.6ms)
Rendered layouts/_about.html.erb (0.3ms)
Rendered layouts/_google_analytics.html.erb (0.4ms)
Completed 200 OK in 79ms (Views: 78.8ms | ActiveRecord: 0.0ms)

Using lograge:
method=GET path=/jobs/833552.json format=json controller=jobs action=show status=200 duration=58.33 view=40.43 db=15.26
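The single-line key=value format is what makes lograge output easy to process mechanically. A minimal Ruby sketch, assuming values never contain spaces (true for the sample line above); `parse_lograge_line` is a hypothetical helper, not part of lograge:

```ruby
# Hypothetical helper: split a lograge key=value line into a Hash.
# Assumes values contain no spaces, as in the sample line above.
def parse_lograge_line(line)
  line.split(' ').each_with_object({}) do |pair, fields|
    key, value = pair.split('=', 2)
    fields[key] = value
  end
end

line = 'method=GET path=/jobs/833552.json format=json controller=jobs ' \
       'action=show status=200 duration=58.33 view=40.43 db=15.26'
fields = parse_lograge_line(line)

puts fields['path']            # request path
puts fields['duration'].to_f   # values arrive as strings, convert as needed
```

The multi-line Rails default output would need stateful parsing to recover the same fields.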
Fluentd is an open source data collector for unified logging layer
filtering/processing
open source gem
written in Ruby and C
plugin architecture
300+ plugins available as gems
JSON objects
memory and file-based buffering
Tag: Where an event comes from. For message routing.
Time: When an event happens. Epoch time.
Record: Actual log content. JSON object.
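The three parts of an event can be pictured as a small value object. This is only an illustration in plain Ruby; the `Event` struct is an assumption made for this sketch and is not a Fluentd class:

```ruby
require 'json'

# Illustrative only: a Struct mirroring the three parts of a Fluentd event.
Event = Struct.new(:tag, :time, :record)

event = Event.new(
  'apache.info',                          # tag: origin of the event, used for routing
  Time.utc(2014, 7, 7, 19, 39, 1).to_i,   # time: epoch seconds
  { 'method' => 'GET', 'path' => '/' }    # record: the log content, a JSON object
)

puts "#{event.time} #{event.tag}: #{JSON.generate(event.record)}"
```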
apache
apache.info
apache.info.mobile
rails.info.mobile
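Dot-separated tags like these are what match patterns are tested against. In Fluentd's documented pattern syntax, `*` matches exactly one tag part and `**` matches zero or more parts. The sketch below is a simplified re-implementation of just those two wildcards for illustration (`tag_match?` is a hypothetical helper, and other pattern forms such as `{a,b}` are left out):

```ruby
# Simplified illustration of Fluentd-style tag matching.
# Supports only literal parts, '*' (one part) and '**' (zero or more parts).
def tag_match?(pattern, tag)
  match_parts?(pattern.split('.'), tag.split('.'))
end

def match_parts?(pat, parts)
  return parts.empty? if pat.empty?

  head, *rest = pat
  case head
  when '**'
    # Try letting '**' consume 0, 1, 2, ... of the remaining tag parts.
    (0..parts.length).any? { |n| match_parts?(rest, parts.drop(n)) }
  when '*'
    !parts.empty? && match_parts?(rest, parts.drop(1))
  else
    !parts.empty? && parts.first == head && match_parts?(rest, parts.drop(1))
  end
end

puts tag_match?('apache.*', 'apache.info')          # true
puts tag_match?('apache.*', 'apache.info.mobile')   # false
puts tag_match?('apache.**', 'apache.info.mobile')  # true
puts tag_match?('*.info.mobile', 'rails.info.mobile') # true
```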
# Receive events from 24224/tcp
# This is used by log forwarding and the fluent-cat command
<source>
  @type forward
  port 24224
</source>

# http://this.host:9880/myapp.access?json={"event":"data"}
<source>
  @type http
  port 9880
</source>

# Match events tagged with "myapp.access" and
# store them to /var/log/fluent/access.%Y-%m-%d
# Of course, you can control how you partition your data
# with the time_slice_format option.
<match myapp.access>
  @type file
  path /var/log/fluent/access
</match>

# http://this.host:9880/myapp.access?json={"event":"data"}
<source>
  @type http
  port 9880
</source>

<filter myapp.access>
  @type record_transformer
  <record>
    host_param "#{Socket.gethostname}"
  </record>
</filter>

<match myapp.access>
  @type file
  path /var/log/fluent/access
</match>

Fluentd config:
<source>
  @type forward
  port 24224
</source>

<match **>
  @type stdout
</match>
Gems:
gem 'act-fluent-logger-rails'
gem 'lograge'
config/environments/production.rb:
config.log_level = :info
config.logger = ActFluentLoggerRails::Logger.new
config.lograge.enabled = true
config.lograge.formatter = Lograge::Formatters::Json.new
config/fluent-logger.yml:
production:
  fluent_host: '127.0.0.1'
  fluent_port: 24224
  tag: 'foo'
  messages_type: 'string'
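The YAML file is keyed by Rails environment, so each environment can point at a different Fluentd host. A sketch of how such a file might be read with Ruby's standard library; this illustrates the file layout only and is not necessarily how act-fluent-logger-rails loads it (the inline `config_text` stands in for the file contents):

```ruby
require 'yaml'

# Stand-in for the contents of config/fluent-logger.yml.
config_text = <<~YAML
  production:
    fluent_host: '127.0.0.1'
    fluent_port: 24224
    tag: 'foo'
    messages_type: 'string'
YAML

env = 'production'  # in a Rails app this would typically come from Rails.env
settings = YAML.safe_load(config_text).fetch(env)

puts settings['fluent_host']
puts settings['fluent_port']
```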
Fluentd output:
2014-07-07 19:39:01 +0000 foo: {"messages":"{\"method\":\"GET\",
\"path\":\"/\",\"format\":\"*/*\",\"controller\":\"static_pages\",
\"action\":\"home\",\"status\":200,\"duration\":550.14,
\"view\":462.89,\"db\":1.2}","level":"INFO"}

After using fluent-plugin-parser:
2014-07-07 19:39:01 +0000 rails: {"method":"GET","path":"/",
"format":"*/*","controller":"static_pages","action":"home",
"status":200,"duration":550.14,"view":462.89,"db":1.2}
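The difference between the two outputs is that the lograge JSON first arrives as an escaped string inside the "messages" field, and only becomes queryable once that string is parsed into top-level keys. A plain Ruby sketch of that transformation (an illustration of the idea, not the plugin's code; the record is abbreviated from the output above):

```ruby
require 'json'

# Record as emitted with messages_type 'string': the lograge JSON is
# nested inside the "messages" field as an escaped string.
raw = {
  'messages' => '{"method":"GET","path":"/","status":200,"duration":550.14}',
  'level' => 'INFO'
}

# Parse the inner string so its keys become top-level fields.
# Like the second output above, this drops the outer "level" field.
parsed = JSON.parse(raw['messages'])

puts parsed['status']
puts parsed['duration']
```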
Fluentd config:
<source>
  @type tail
  format nginx
  tag nginx.access
  path /var/log/nginx/access.log
</source>

<match nginx.access>
  @type stdout
</match>
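A `format` such as nginx turns each raw log line into a structured record, conceptually via a regular expression with named captures. The sketch below uses a simplified pattern written for this example; it is an assumption, not the exact regexp Fluentd ships, and the sample line is hypothetical:

```ruby
# Simplified pattern for an nginx "combined" access log line.
# Assumption: an illustration, not the exact regexp Fluentd uses.
NGINX_LINE = /\A(?<remote>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) \S+" (?<code>\d{3}) (?<size>\S+)/

# Hypothetical sample line.
line = '127.0.0.1 - - [10/Mar/2012:14:28:14 +0100] "GET / HTTP/1.1" 200 612 "-" "curl/7.0"'

# named_captures returns a Hash of capture name => matched text.
record = NGINX_LINE.match(line).named_captures

puts record['path']
puts record['code']
```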
apache2
apache_error
nginx
csv/tsv
ltsv (Labeled Tab-Separated Value)
JSON
multiline
custom
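module Fluent
  class SomeInput < Input
    # Register the plugin under the name used with @type in the config
    Fluent::Plugin.register_input('NAME', self)

    # Declares @port, parsed as an integer, with a default value
    config_param :port, :integer, :default => 8888

    def configure(conf)
      super
      # config_param has already set @port from conf at this point
      ...
    end

    def start
      super
      ...
    end

    def shutdown
      ...
    end
  end
end

Input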
Parser
Filter
Formatter
Output
Buffer
# Configuration
<match app.message>
  @type rewrite_tag_filter
  rewriterule1 message ^\[(\w+)\] $1.${tag}
</match>

+----------------------------------------+
| original record                        |
|----------------------------------------|
| app.message {"message":"[info]: ..."}  |
| app.message {"message":"[warn]: ..."}  |
| app.message {"message":"[crit]: ..."}  |
| app.message {"message":"[alert]: ..."} |
+----------------------------------------+

+----------------------------------------------+
| rewritten tag record                         |
|----------------------------------------------|
| info.app.message {"message":"[info]: ..."}   |
| warn.app.message {"message":"[warn]: ..."}   |
| crit.app.message {"message":"[crit]: ..."}   |
| alert.app.message {"message":"[alert]: ..."} |
+----------------------------------------------+

Google Compute Engine
Docker
BigQuery API
Google Compute Engine
n1-highcpu-4 instance, 4 cores, 3.6 GB RAM
https, 10 events in JSON batch
750 requests/sec
4 Puma processes, 4 threads each
7500 events/sec
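The events/sec figure follows directly from the request rate and the batch size, as this small Ruby calculation shows (all numbers are taken from the benchmark above):

```ruby
requests_per_sec = 750
events_per_batch = 10

# Each HTTPS request carries one JSON batch of events.
events_per_sec = requests_per_sec * events_per_batch
puts events_per_sec        # prints 7500

puma_processes = 4
threads_per_process = 4

# Total Puma threads available to serve requests concurrently.
puma_threads = puma_processes * threads_per_process
puts puma_threads          # prints 16
```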
Can we squeeze more from one instance?
More cores?
More Puma processes?
The Fluentd engine is a bottleneck, uses a single core
fluent-plugin-multiprocess: runs multiple fluentd processes
Other solutions?
BigQuery is a RESTful web service that enables interactive analysis of massively large datasets working in conjunction with Google Storage. It is an Infrastructure as a Service (IaaS) that may be used complementarily with MapReduce.
Super-fast SQL queries
Google App Engine
REST API
SQL-style query syntax
JSON schemas
Nested fields
Datasets and tables
[
  {
    "name": "fullName",
    "type": "string",
    "mode": "required"
  },
  {
    "name": "age",
    "type": "integer",
    "mode": "nullable"
  },
  {
    "name": "phoneNumber",
    "type": "record",
    "mode": "nullable",
    "fields": [
      {
        "name": "areaCode",
        "type": "integer",
        "mode": "nullable"
      },
      {
        "name": "number",
        "type": "integer",
        "mode": "nullable"
      }
    ]
  }
]

Store JSON as string
Use JSON_EXTRACT
SELECT JSON_EXTRACT('{"a": 1, "b": [4, 5]}', '$.b') AS str;
[4,5]
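For comparison, the same extraction can be reproduced outside BigQuery. A plain Ruby sketch of what the query above computes, handling only this one top-level path rather than general JSONPath:

```ruby
require 'json'

json_string = '{"a": 1, "b": [4, 5]}'

# Equivalent of the JSONPath '$.b': take key "b" from the top-level object.
extracted = JSON.parse(json_string)['b']

# Serialise the result back to a JSON string, as the query returns a string.
puts JSON.generate(extracted)   # prints [4,5]
```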
<match **>
@type bigquery
auth_method compute_engine
method insert # stream events
project "logs-beta"
dataset log
tables events
fetch_schema true
</match>
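With `method insert`, events are streamed to BigQuery rather than loaded in bulk. BigQuery's streaming API (tabledata.insertAll) takes a JSON body with a list of rows, each holding the record under a "json" key plus an optional "insertId" used for de-duplication. The sketch below only builds such a body; `insert_all_body` is a hypothetical helper, no request is sent, and how the plugin itself assigns insert IDs is not covered here:

```ruby
require 'json'
require 'securerandom'

# Hypothetical helper: wrap records in the shape of a BigQuery
# tabledata.insertAll request body. No request is sent here.
def insert_all_body(records)
  rows = records.map do |record|
    # Assumption: a random UUID per row; any unique string would do.
    { 'insertId' => SecureRandom.uuid, 'json' => record }
  end
  { 'rows' => rows }
end

events = [
  { 'method' => 'GET', 'path' => '/', 'status' => 200 },
  { 'method' => 'GET', 'path' => '/jobs/833552.json', 'status' => 200 }
]

body = insert_all_body(events)
puts JSON.pretty_generate(body)
```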
You can use R for complex reports
Centralised App Logging
IoT Data Logger
Data Archiving
Data Analytics
Google uses it!
Apache Flume
Logstash
Flowgger (Rust!)
Heka
Fluentd-forwarder (written in Go!)
Rsyslog
NXLOG
Graylog2
Google Pub/Sub
Google Dataflow
Google BigQuery
Fluentd
http://www.slideshare.net/treasure-data/the-basics-of-fluentd-35681111
https://github.com/actindi/act-fluent-logger-rails
https://github.com/fluent/fluentd-docker-image
http://jasonwilder.com/blog/2013/07/16/centralized-logging-architecture/
https://en.wikipedia.org/wiki/Extract,_transform,_load
https://cloud.google.com/bigquery/loading-data-into-bigquery
https://cloud.google.com/solutions/real-time/fluentd-bigquery
We are hiring!
RoR
QA
Scrum Master
gmiklaszewski@gmail.com