Portal Webserver ELK Logging Architecture
Central place for portal-based logs
ELK stack implementation to ship logs from all of our portal webservers to one central location
No more searching each individual server in a cluster for a needle in a haystack!
See the actual input to Phoenix APIs!
ELK Stack
Stands for:
Elasticsearch
Logstash
Kibana
Currently:
We have the E and L implemented. K is a goal of ours
Logstash and Elasticsearch organize and store the data for retrieval
Kibana is just a frontend for querying Elasticsearch
Logging and Filebeat
Filebeat is another service in the ELK stack, though it doesn't get a letter in the acronym
It helps, but it isn't necessary for the stack
Still, it's the easiest solution
Filebeat monitors logs that it's configured to monitor and sends the diff of any updates to Logstash
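To make "sends the diff of any updates" concrete, here is a minimal Python sketch of the idea: remember a byte offset per file and read only what was appended since the last poll. This is an illustration, not Filebeat's actual implementation. The function name read_new_lines and the offset handling are assumptions made for this example.

```python
def read_new_lines(path, offset):
    """Return (complete lines appended since `offset`, new offset).

    Hypothetical helper illustrating offset-based log shipping;
    Filebeat itself tracks file state in its own registry.
    """
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Only ship complete lines; a partial trailing line waits for the next poll.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

Calling it repeatedly with the returned offset yields each new log line once, which is the behavior the shipper needs.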
In summary
Our software (e.g. the Phoenix API) creates some log in a plaintext file, readable on disk by Filebeat
Filebeat monitors these logs and sends updates to Logstash
Logstash transforms the logs into an Elasticsearch-friendly format and sends them to Elasticsearch
Logs are then accessible via a REST API on the Elasticsearch endpoint
More info
Many portal webservers
web01-web08.[cluster].[datacenter].synacor.com
iterate through each number and cluster
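As a rough illustration of "iterate through each number and cluster", the hostname pattern above can be expanded like this. The cluster and datacenter values are placeholders to be filled in by the caller, and portal_hosts is a hypothetical helper name, not an existing tool.

```python
def portal_hosts(clusters, datacenter, count=8):
    """Build web01..web08 hostnames for each cluster in one datacenter.

    `clusters` and `datacenter` are placeholders supplied by the caller;
    no real cluster or datacenter names are assumed here.
    """
    return [
        f"web{n:02d}.{cluster}.{datacenter}.synacor.com"
        for cluster in clusters
        for n in range(1, count + 1)
    ]
```

This is the list of machines you previously had to search one at a time.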
Each of them running Filebeat locally
Shipping logs to one central Logstash location
Which then sends logs to one central Elasticsearch location
So what can you do?
Currently, we log all Phoenix calls this way
Instead of grepping through 8 servers to find the exact log entry you're looking for, search in one spot on Elasticsearch!
Currently only available via command line and with access to credentials
In the future:
Log other portal-based output
Log other services adjacent to the portal
Improve logging format
Actually stand up an instance of Kibana so we don't need to use command line queries anymore
Elasticsearch API Credentials
The DBA team wants us to keep them secret and secure
Do not transmit them over email or any on-the-record chat service (like HipChat)
A few people have the username/password for PS' ES user
This is why I'd rather get Kibana stood up sooner rather than later -- more secure that way
Limitation:
Hard disk space
Currently we're storing 14 days' worth of logs. A cron job runs nightly to delete anything older than that.
If we add more services, we need to be mindful of how much space each service consumes
Might just be able to ask for more storage as we need it, though
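The 14-day retention rule can be sketched as follows. This is not the actual cron job: it assumes logs land in daily indices named like logstash-YYYY.MM.DD (a common Logstash convention, but an assumption here), and expired_indices is a hypothetical helper that only decides which names are past the cutoff.

```python
from datetime import date, timedelta


def expired_indices(index_names, today, keep_days=14, prefix="logstash-"):
    """Return the daily index names older than `keep_days` days.

    Assumes names like 'logstash-2020.01.05'; anything that does not
    match that shape is skipped rather than deleted.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            year, month, day = (int(p) for p in name[len(prefix):].split("."))
            index_day = date(year, month, day)
        except ValueError:
            continue  # not a daily index in the assumed format
        if index_day < cutoff:
            expired.append(name)
    return expired
```

Keeping the selection logic separate from the deletion itself makes it easy to dry-run before anything is removed.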
Practical example:
TSS ticket for the CI on-call rotation
They ask when a user was deleted and why
Now we have the ability to look back in time and extract what the parameters to the API call were
Given a username to search for:
TODO: insert command line query to do this here
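Until the real query is filled in, here is a hedged Python sketch of what such a search request might look like. It is not the team's actual command. It assumes the Logstash default fields message and @timestamp, an index pattern passed in by the caller, and HTTP basic auth; the base URL, index pattern, and build_search_request name are all hypothetical. Credentials should come from a secure source, never be hard-coded or pasted into chat.

```python
import base64
import json
import urllib.request


def build_search_request(base_url, index_pattern, username,
                         es_user, es_password, days=14):
    """Build (but do not send) an Elasticsearch _search request.

    Field names 'message' and '@timestamp' are assumptions based on
    Logstash defaults; adjust them to the real log format.
    """
    body = {
        "size": 50,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                # Entries whose message mentions the username...
                "must": [{"match_phrase": {"message": username}}],
                # ...within the retention window.
                "filter": [{"range": {"@timestamp": {"gte": f"now-{days}d"}}}],
            }
        },
    }
    token = base64.b64encode(
        f"{es_user}:{es_password}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index_pattern}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; the matching log entries would then show the parameters of the API call.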
Documentation:
SysArch review:
https://wiki.corp.synacor.com:8443/display/SysArch/ELK+Phoenix+Logging+SysArch+Review
Runbook:
https://wiki.corp.synacor.com:8443/display/ClientEngineering/ELK+Runbook
JIRA Epic:
https://jira.corp.synacor.com/browse/CE-525