Watch Dog

                __
               /\/'-,
       ,--'''''   /"
 ____,'.  )       \___
'"""""------'"""`-----'

Problem

Service health

Number of services

 

Environment

 

Unaware of problems

Solution

  • Monitor services every minute, every day.
  • Alert on-call engineers when any service is not in its desired state.
  • Escalate unacknowledged alerts.
  • Automate weekly on-call rotation.
  • Identify patterns of outages.
  • Provide visibility to concerned parties.
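The desired-state check in the list above could be sketched roughly as follows. This is a minimal illustration, not Watch Dog's actual implementation: the function name `services_out_of_state` and the state strings are assumptions made for the example.

```python
from typing import Dict, List


def services_out_of_state(desired: Dict[str, str],
                          observed: Dict[str, str]) -> List[str]:
    """Return the names of services whose observed state differs from the desired one.

    A service that is missing from `observed` counts as out of state,
    because nothing confirms it is healthy.
    """
    return sorted(
        name for name, want in desired.items()
        if observed.get(name) != want
    )


if __name__ == "__main__":
    # Hypothetical service names and states, for illustration only.
    desired = {"web": "running", "db": "running", "batch": "stopped"}
    observed = {"web": "running", "db": "stopped"}
    print(services_out_of_state(desired, observed))
```

A scheduler would run a check like this once a minute and hand the resulting list to the alerting step.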

Know when something breaks

 

Instant Alerts!

  • Avoids nagging about the same issue.
  • Notifies only on a change of state.
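One way to notify only on a change of state is to remember the last state seen for each service. The sketch below assumes a hypothetical `StateChangeNotifier` class and assumes a service starts out "up"; neither comes from Watch Dog itself.

```python
from typing import Dict, List, Tuple


class StateChangeNotifier:
    """Records a notification only when a service's state changes."""

    def __init__(self, initial_state: str = "up") -> None:
        # Assumption: a service is considered "up" until observed otherwise.
        self._initial_state = initial_state
        self._last_state: Dict[str, str] = {}
        # Each entry is (service, previous_state, new_state).
        self.sent: List[Tuple[str, str, str]] = []

    def observe(self, service: str, state: str) -> bool:
        """Record an observation; return True if a notification was recorded."""
        previous = self._last_state.get(service, self._initial_state)
        self._last_state[service] = state
        if state == previous:
            return False  # same state as last check, so stay quiet
        self.sent.append((service, previous, state))
        return True


if __name__ == "__main__":
    notifier = StateChangeNotifier()
    for state in ["up", "down", "down", "down", "up"]:
        notifier.observe("api", state)
    # Five checks, but only the two transitions are recorded.
    print(notifier.sent)
```

Repeated "down" checks produce no further notifications; the next message goes out only when the service recovers.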

Reports to prove your point

  • Graphical reporting for a clean-cut overview.

  • Keep track of your service's performance.

  • View historical data for your service in easy-to-understand reports.

  • Identify repeating patterns of outages over time. 

  • Make data-driven decisions on improvements.
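Spotting repeating outage patterns can start with something as simple as counting outages by hour of day. The sketch below assumes outage start times are available as `datetime` values; the function names are illustrative, not part of Watch Dog.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple


def outages_by_hour(outage_starts: Iterable[datetime]) -> Counter:
    """Count outages by the hour of day (0-23) in which they started."""
    return Counter(ts.hour for ts in outage_starts)


def top_outage_hours(outage_starts: Iterable[datetime],
                     n: int = 3) -> List[Tuple[int, int]]:
    """Return up to n (hour, count) pairs, most frequent first."""
    return outages_by_hour(outage_starts).most_common(n)


if __name__ == "__main__":
    # Made-up timestamps, for illustration only.
    starts = [
        datetime(2024, 1, 1, 2, 5),
        datetime(2024, 1, 2, 2, 40),
        datetime(2024, 1, 3, 14, 0),
    ]
    print(top_outage_hours(starts, n=1))
```

A cluster of outages in the same hour often points at a scheduled job or a traffic peak, which is the kind of evidence a report can surface.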

Ensure your alerts are always answered

  • Schedule-based on-call alerting, so everyone gets time off.

  • Configure a 24-hour support model.

  • Alert based on time of day: alert India-based engineers during India daytime and US-based engineers during US daytime.

  • Rotate the on-call engineer on a weekly or bi-weekly basis.

  • Assign a backup on-call engineer.
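The rotation and time-of-day routing above could be sketched as follows. Everything specific here is an assumption for illustration: the function names, the 08:00 to 20:00 India coverage window, and the choice of the next engineer in the list as backup. India Standard Time is a fixed UTC+5:30 offset, so no time zone database is needed.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Sequence, Tuple

# India Standard Time is UTC+5:30 year-round (no daylight saving).
IST = timezone(timedelta(hours=5, minutes=30))


def on_call_for(day: date, engineers: Sequence[str], rotation_start: date,
                weeks_per_shift: int = 1) -> Tuple[str, str]:
    """Return (primary, backup) for the given day.

    Shifts last `weeks_per_shift` weeks (1 = weekly, 2 = bi-weekly).
    Assumption: the backup is the next engineer in the rotation.
    """
    shift_index = (day - rotation_start).days // (7 * weeks_per_shift)
    primary = engineers[shift_index % len(engineers)]
    backup = engineers[(shift_index + 1) % len(engineers)]
    return primary, backup


def region_for(now: datetime) -> str:
    """Pick the team to alert: India during its daytime, otherwise US.

    Assumption: India covers 08:00-20:00 IST and the US team covers the rest,
    which together gives 24-hour coverage.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    india_local = now.astimezone(IST)
    return "india" if 8 <= india_local.hour < 20 else "us"


if __name__ == "__main__":
    team = ["engineer-a", "engineer-b", "engineer-c"]  # placeholder names
    print(on_call_for(date(2024, 1, 8), team, rotation_start=date(2024, 1, 1)))
    print(region_for(datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)))
```

Deriving the shift from a fixed start date keeps the schedule stateless: any day maps to the same primary and backup no matter when it is computed.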

Escalation and Rotation Policies

 

  • Escalation rules protect you against accidentally overlooked incidents.

  • When an incident is triggered, Watch Dog first tries to contact the on-call engineer.

  • If the alert goes unacknowledged, it automatically escalates to the backup engineer, then the manager, and so on.

  • Specify as many escalation levels as you need; the escalation delay is user-adjustable.
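An escalation policy with any number of levels and an adjustable delay might look like the sketch below. The `EscalationPolicy` class, its field names, and the 15-minute default are assumptions for illustration rather than Watch Dog's real configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EscalationPolicy:
    """Ordered contacts plus the delay before moving to the next one."""

    levels: List[str]          # e.g. on-call engineer, backup, manager
    delay_minutes: int = 15    # assumed default; adjustable per policy

    def contact_at(self, minutes_unacknowledged: int) -> str:
        """Return who should be contacted after the alert has gone
        unacknowledged for the given number of minutes."""
        if not self.levels:
            raise ValueError("policy needs at least one escalation level")
        level = minutes_unacknowledged // self.delay_minutes
        # Stay on the last level once the chain is exhausted.
        return self.levels[min(level, len(self.levels) - 1)]


if __name__ == "__main__":
    policy = EscalationPolicy(["on-call", "backup", "manager"], delay_minutes=10)
    for minutes in (0, 10, 25):
        print(minutes, policy.contact_at(minutes))
```

Adding a level is a matter of appending to `levels`, and acknowledging an alert would simply stop further calls to `contact_at`.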

You don’t need to worry about accidentally missing a critical alert.

Technology Stack

Contributors

Aakash Thakkar

Amar Mattey

Armando Soto

Harpreet Hira

Houston Harris

Sagar Dutta

Sonali Kalgaonkar

and

 

Help Us Move this Forward

Watch Dog

By Harpreet Hira