[2015/10/2] Sensu Deep Talks #2
Yu Yamanaka (@yuurelx)
DevOps engineer at peroli, Inc.
About 15-80 alerts
Yes, that is not a surprising number.
(of course, we are making efforts to reduce alerts!)
I get an urgent call,
but cannot do
anything ...
(Today's Main Topic)
Subtopics
Our company's project teams
Ad Platform
MERY
Our system's alert flow
A check definition is shared by every project
## /etc/sensu/conf.d/checks.conf
...
    "unicorn": {
      "command": "/etc/sensu/plugins/check-procs.rb -p 'unicorn master' -C 1",
      "interval": 60,
      "occurrences": 4,
      "subscribers": [
        "mery-web",
        "adpf-admin"
      ],
      "handlers": [
        "pagerduty",
        "force_restart_unicorn"
      ]
    },
...
Handlers are set per check
Subscribers are set per check
A check cannot map each subscriber to its own handler
[Diagram: one check sits between the subscribers (Cluster A, B, C) and the handlers (Team A, B, C); which handler should receive which cluster's alert is left as "?"]
You can resolve that by duplicating checks
[Diagram: the check is copied three times, so that the subscribers (Cluster A, B, C) can be routed to the handlers (Team A, B, C)]
This is not DRY...
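To make the duplication concrete, here is a minimal sketch of what per-team copies of the unicorn check would look like. The check names (unicorn_mery, unicorn_adpf) and the per-team handler names (pagerduty_mery, pagerduty_adpf) are hypothetical and only serve as illustration; the command, interval and occurrences come from the check definition shown earlier.

```ruby
require 'json'

# The command is identical in every copy; only names and routing differ.
COMMAND = "/etc/sensu/plugins/check-procs.rb -p 'unicorn master' -C 1"

# Builds one copy of the check per team.
# teams maps a team name to its subscriber, e.g. 'mery' => 'mery-web'.
def duplicated_checks(teams)
  teams.each_with_object({}) do |(team, subscriber), checks|
    checks["unicorn_#{team}"] = {            # hypothetical check name
      'command'     => COMMAND,
      'interval'    => 60,
      'occurrences' => 4,
      'subscribers' => [subscriber],
      'handlers'    => ["pagerduty_#{team}"] # hypothetical per-team handler
    }
  end
end

checks = duplicated_checks('mery' => 'mery-web', 'adpf' => 'adpf-admin')
puts JSON.pretty_generate('checks' => checks)
```

Every new team adds another near-identical block, and a change to the command has to be repeated in each copy.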
Custom definition attributes
Solution
[Diagram: each subscriber (Cluster A, B, C) carries custom attrs; a single check and a single handler remain; the handler fetches the key by attrs from a config that holds Team A, B, C's API keys]
Client side config
## /etc/sensu/client/config.json
...
  "client": {
    "name": "<%= node[:ec2][:instance_id] %>",
    "address": "<%= node[:machinename] %>",
    <%= %Q("service": "#{node[:service]}",) %>  # e.g. mery, adpf
    <%= %Q("environment": "#{node[:environment]}",) %>  # e.g. staging, production
    "keepalive": {
      "thresholds": {
        "warning": 40,
...
(deployed with a tool such as Chef)
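As a rough illustration of what the template produces, the sketch below renders a cut-down version of the client config with Ruby's standard ERB library. The node hash is a hypothetical stand-in for Chef's node object and all of its values are made up; the real template also contains the keepalive settings, which are left out here.

```ruby
require 'erb'
require 'json'

# Hypothetical stand-in for Chef's node object (values are made up).
node = {
  ec2: { instance_id: 'i-00000000' },
  machinename: 'web01.example.com',
  service: 'mery',
  environment: 'production'
}

# Cut-down client template: only the name, address and the two custom attributes.
template = <<~'TEMPLATE'
  {
    "client": {
      "name": "<%= node[:ec2][:instance_id] %>",
      "address": "<%= node[:machinename] %>",
      "service": "<%= node[:service] %>",
      "environment": "<%= node[:environment] %>"
    }
  }
TEMPLATE

rendered = ERB.new(template).result_with_hash(node: node)
client = JSON.parse(rendered)['client']

# The custom attributes travel with every event from this client,
# so a handler can read them from the event's client section.
puts client['service']
puts client['environment']
```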
Server side config
## /etc/sensu/conf.d/handlers.json
...
  "pagerduty": {
    "mery": {
      "production": { "api_key": "xxxxx" },
      "staging": { "api_key": "yyyyy" }
    },
    "adpf": {
      "production": { "api_key": "aaaaa" },
      "staging": { "api_key": "bbbbb" }
    }
  }
...
Diff of PagerDuty plugin
## diff of /etc/sensu/handlers/pagerduty.rb
...
   def handle
     if @event['check']['pager_team']
       api_key = settings['pagerduty'][@event['check']['pager_team']]['api_key']
+    elsif @event['client']['service'] && @event['client']['environment']
+      api_key = settings['pagerduty'][@event['client']['service']][@event['client']['environment']]['api_key']
     else
       api_key = settings['pagerduty']['api_key']
     end
...
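The key lookup can be tried outside Sensu with a small standalone sketch. resolve_api_key is a hypothetical helper, not part of the plugin, and the top-level 'default-key' entry is a made-up value. It follows the same three branches as the patched handler, with one deliberate difference that is an assumption of this sketch: it uses Hash#dig, so an unknown service or environment falls back to the default key instead of raising NoMethodError.

```ruby
# Sample settings shaped like the server-side config above.
# The top-level 'api_key' value is hypothetical.
SETTINGS = {
  'pagerduty' => {
    'api_key' => 'default-key',
    'mery' => {
      'production' => { 'api_key' => 'xxxxx' },
      'staging'    => { 'api_key' => 'yyyyy' }
    },
    'adpf' => {
      'production' => { 'api_key' => 'aaaaa' },
      'staging'    => { 'api_key' => 'bbbbb' }
    }
  }
}.freeze

# Hypothetical helper: picks the PagerDuty API key for an event.
# Order: check's pager_team, then client's service/environment, then default.
def resolve_api_key(settings, event)
  pagerduty = settings.fetch('pagerduty', {})
  check     = event.fetch('check', {})
  client    = event.fetch('client', {})

  by_team = pagerduty.dig(check['pager_team'], 'api_key') if check['pager_team']

  by_client = nil
  if client['service'] && client['environment']
    by_client = pagerduty.dig(client['service'], client['environment'], 'api_key')
  end

  by_team || by_client || pagerduty['api_key']
end

staging_event = {
  'check'  => { 'name' => 'unicorn' },
  'client' => { 'service' => 'mery', 'environment' => 'staging' }
}
puts resolve_api_key(SETTINGS, staging_event)
```

Because the key is resolved per service and environment, production and staging alerts for the same check can go to different PagerDuty services.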
Create services on PagerDuty
Finally,
we are no longer woken up by other teams' alerts!!
or by alerts from staging environments
Route alerts properly.