Sensu alerts

for multiple projects

[2015/10/2] Sensu Deep Talks #2

Yu Yamanaka (@yuurelx)

DevOps engineer at peroli, Inc.

The No. 1 curation platform for women in Japan!

By the way,

How many alerts do you receive in a week?

In the case of MERY

About 15-80 alerts

Yes, that is not a surprising number.

But since we use Sensu with PagerDuty,

engineers' sleep is disrupted by alerts...

That can't be helped if we want to protect the user experience

(of course, we are making efforts to reduce alerts!)

On the other hand, what if you got another team's alerts?

I got an urgent call,

but I couldn't do

anything...

The problem must

be resolved!!

(Today's Main Topic)

Subtopics

1. Our teams & alerts flow
2. Why we can't split alerts by team
3. How to resolve the issue

1. Our teams & alerts flow

Our company's project teams

(Diagram: MERY, Ad Platform, and two other teams shown only as "?")

Our system's alert flow

  1. A system failure occurs

  2. The Sensu client on the server detects it and reports it

  3. The Sensu server creates an incident on PagerDuty through its API

  4. PagerDuty posts a notification to Slack and calls engineers (not only ops but also devs)
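Step 3 is where the routing decision happens, because the API call carries the key of the PagerDuty service that gets paged. A minimal sketch, assuming PagerDuty's legacy Events API v1 (the API Sensu handlers of this period typically called); the helper name, the `incident_key` format and the sample event values are hypothetical:

```ruby
require 'json'

# Legacy PagerDuty Events API v1 endpoint; this sketch only builds the body.
EVENTS_V1_URL = 'https://events.pagerduty.com/generic/2010-04-15/create_event.json'

# Hypothetical helper: the JSON body a handler could POST to EVENTS_V1_URL.
def pagerduty_trigger_payload(event, service_key)
  client = event['client']['name']
  check  = event['check']['name']
  {
    'service_key'  => service_key,           # decides which PagerDuty service is paged
    'event_type'   => 'trigger',             # open (or update) an incident
    'incident_key' => "#{client}/#{check}",  # assumed format; de-duplicates repeats
    'description'  => "#{client}/#{check}: #{event['check']['output'].to_s.strip}",
    'details'      => event                  # full Sensu event for context
  }.to_json
end

# Made-up sample event.
event = {
  'client' => { 'name' => 'i-0123abcd' },
  'check'  => { 'name' => 'unicorn', 'output' => "unicorn master not running\n" }
}
puts pagerduty_trigger_payload(event, 'xxxxx')
```

Whatever `service_key` is passed here is the only thing that decides who gets woken up, which is why the rest of this talk is about choosing that key per project.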

2. Why we can't split alerts by team

A single check definition is shared by all projects

## /etc/sensu/conf.d/checks.conf
...

"unicorn": {
  "command": "/etc/sensu/plugins/check-procs.rb -p 'unicorn master' -C 1",
  "interval": 60,
  "occurrences": 4,
  "subscribers": [
    "mery-web",
    "adpf-admin"
  ],
  "handlers": [
    "pagerduty",
    "force_restart_unicorn"
  ]
},

...

Handlers are set per check

Subscribers are set per check

So a check cannot route each subscriber's alerts to a different handler

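(Diagram: one check with subscribers in Cluster A, Cluster B and Cluster C, whose alerts should go to Team A, Team B and Team C respectively; with only one list of handlers per check, which handler should each alert use? ?)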
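To make the limitation concrete, here is a simplified model of how the "unicorn" check's handlers are chosen. It is an assumption for illustration, not Sensu's actual code: a client runs the check if it shares a subscription with it, and the resulting event goes to the check's handlers regardless of which client produced it.

```ruby
require 'json'

# The "unicorn" check, reduced to the fields that matter here.
CHECK = JSON.parse(<<~JSON)
  {
    "subscribers": ["mery-web", "adpf-admin"],
    "handlers": ["pagerduty", "force_restart_unicorn"]
  }
JSON

# Simplified model: handlers come from the check, never from the client.
def handlers_for(check, client_subscriptions)
  return [] if (check['subscribers'] & client_subscriptions).empty?
  check['handlers']
end

mery_client = ['mery-web']
adpf_client = ['adpf-admin']

p handlers_for(CHECK, mery_client) # => ["pagerduty", "force_restart_unicorn"]
p handlers_for(CHECK, adpf_client) # => ["pagerduty", "force_restart_unicorn"]
```

Both projects end up with the same `pagerduty` handler, and therefore the same PagerDuty service and the same on-call engineers.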

You can work around this by duplicating checks

(Diagram: one copy of the check per cluster (A, B, C), each with its own subscriber and a handler for its own team (Team A, B, C))

This is not DRY...
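The duplication workaround can be sketched by generating one copy of the check per subscription. The per-team handler names (`pagerduty_mery`, `pagerduty_adpf`) and the `<check>_<subscription>` naming scheme are hypothetical:

```ruby
require 'json'

# Hypothetical mapping from subscription to a handler that pages that team.
TEAM_HANDLERS = {
  'mery-web'   => 'pagerduty_mery',
  'adpf-admin' => 'pagerduty_adpf'
}

# Fields shared by every copy (taken from the "unicorn" check).
BASE_CHECK = {
  'command'     => "/etc/sensu/plugins/check-procs.rb -p 'unicorn master' -C 1",
  'interval'    => 60,
  'occurrences' => 4
}

# One check per subscription, differing only in subscribers and handlers.
def duplicated_checks(name, base, team_handlers)
  team_handlers.each_with_object({}) do |(subscription, handler), checks|
    checks["#{name}_#{subscription}"] = base.merge(
      'subscribers' => [subscription],
      'handlers'    => [handler, 'force_restart_unicorn']
    )
  end
end

puts JSON.pretty_generate(duplicated_checks('unicorn', BASE_CHECK, TEAM_HANDLERS))
```

Every new project adds another near-identical copy of every check it uses, which is exactly the repetition the slide complains about.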

3. How to resolve the issue

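Solution: custom definition attributes

Sensu passes custom attributes set in a client definition through to the event data, so a handler can read them.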

(Diagram: one check and one handler; the clients in Cluster A, B and C each carry their own attributes, and the server config holds Team A, B and C's API keys)

Fetch the API key using the client's attributes

Client-side config

## /etc/sensu/client/config.json
...

"client": {
  "name": "<%= node[:ec2][:instance_id] %>",
  "address": "<%= node[:machinename] %>",
  "service": "<%= node[:service] %>",          <%# e.g. mery, adpf %>
  "environment": "<%= node[:environment] %>",  <%# e.g. staging, production %>
  "keepalive": {
    "thresholds": {
      "warning": 40,

...

(deployed with a configuration management tool such as Chef)
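One way to check that the template produces valid JSON before Chef deploys it is to render it locally. A minimal sketch: the template is a reduced version of the client config (keepalive section and example annotations left out), and a plain nested Hash with made-up values stands in for Chef's `node` object:

```ruby
require 'erb'
require 'json'

# Reduced client config template.
TEMPLATE = <<~'ERB'
  {
    "client": {
      "name": "<%= node[:ec2][:instance_id] %>",
      "address": "<%= node[:machinename] %>",
      "service": "<%= node[:service] %>",
      "environment": "<%= node[:environment] %>"
    }
  }
ERB

# `binding` exposes the local `node` to the template, as Chef does for its node.
def render_client_config(node)
  ERB.new(TEMPLATE).result(binding)
end

node = {
  ec2: { instance_id: 'i-0123abcd' },
  machinename: 'mery-web-01',
  service: 'mery',
  environment: 'production'
}
config = JSON.parse(render_client_config(node))
p config['client'].values_at('service', 'environment') # => ["mery", "production"]
```

`JSON.parse` raises on malformed output, so a stray comma or leftover annotation in the template shows up here rather than on the Sensu client.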

Server-side config

## /etc/sensu/conf.d/handlers.json
...

  "pagerduty": {
    "mery": {
      "production": { "api_key": "xxxxx" },
      "staging": { "api_key": "yyyyy" }
    },
    "adpf": {
      "production": { "api_key": "aaaaa" },
      "staging": { "api_key": "bbbbb" }
    }
  }
...
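Because the handler looks keys up by service and environment, a missing pair in this file only surfaces when an alert fires. A small pre-deploy check can catch that earlier. This is a hypothetical helper, not part of Sensu; the key values are the placeholders from the slide:

```ruby
require 'json'

# Settings shaped like handlers.json.
SETTINGS = JSON.parse(<<~JSON)
  {
    "pagerduty": {
      "mery": {
        "production": { "api_key": "xxxxx" },
        "staging":    { "api_key": "yyyyy" }
      },
      "adpf": {
        "production": { "api_key": "aaaaa" },
        "staging":    { "api_key": "bbbbb" }
      }
    }
  }
JSON

# Builds "service/environment" => api_key and raises on any pair without a key.
def pagerduty_routes(settings)
  settings.fetch('pagerduty').each_with_object({}) do |(service, envs), routes|
    next unless envs.is_a?(Hash)    # skip a scalar default "api_key"
    next if envs.key?('api_key')    # skip pager_team-style { "api_key": ... } entries
    envs.each do |env, conf|
      key = conf['api_key'] if conf.is_a?(Hash)
      raise "missing api_key for #{service}/#{env}" if key.to_s.empty?
      routes["#{service}/#{env}"] = key
    end
  end
end

p pagerduty_routes(SETTINGS).keys
# => ["mery/production", "mery/staging", "adpf/production", "adpf/staging"]
```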

Diff of the PagerDuty handler plugin

## diff of /etc/sensu/handlers/pagerduty.rb
...

   def handle
     if @event['check']['pager_team']
       api_key = settings['pagerduty'][@event['check']['pager_team']]['api_key']
+    elsif @event['client']['service'] && @event['client']['environment']
+      api_key = settings['pagerduty'][@event['client']['service']][@event['client']['environment']]['api_key']
     else
       api_key = settings['pagerduty']['api_key']
     end
...
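The patched `handle` chains `[]` lookups, so an event from a client whose service/environment pair is missing from handlers.json would raise `NoMethodError` on `nil`. A minimal sketch of the same selection order as a standalone function, assuming a variation that falls back to the default key in that case (this fallback is a design choice of the sketch, not part of the original diff):

```ruby
# Same order as the patched handler:
#   1. check-level "pager_team"
#   2. client-level "service" + "environment" (the new branch)
#   3. the global default api_key
def select_api_key(event, settings)
  pd      = settings.fetch('pagerduty')
  team    = event.dig('check', 'pager_team')
  service = event.dig('client', 'service')
  env     = event.dig('client', 'environment')

  if team
    pd.dig(team, 'api_key')
  elsif service && env
    # Hash#dig returns nil for an unconfigured pair instead of raising.
    pd.dig(service, env, 'api_key') || pd['api_key']
  else
    pd['api_key']
  end
end

settings = { 'pagerduty' => {
  'api_key' => 'default',
  'mery'    => { 'production' => { 'api_key' => 'xxxxx' } }
} }
p select_api_key({ 'client' => { 'service' => 'mery', 'environment' => 'production' }, 'check' => {} }, settings) # => "xxxxx"
p select_api_key({ 'client' => { 'service' => 'mery', 'environment' => 'qa' }, 'check' => {} }, settings)         # => "default"
```

Whether to fall back silently or fail loudly is a judgment call: the fallback keeps alerts flowing, but they go to whoever owns the default key.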

Create a service on PagerDuty for each project and environment, and put each service's API key in the server config

Finally,

we are no longer woken up by another team's alerts!!

(nor by alerts from staging environments)

Route alerts properly.

Thank you for your attention!
