Watch Dog
Problem
Service health
Number of services
Environment
Unaware of problems
Solution
- Monitor services every minute, every day.
- Alert on-call engineers when any service is not in its desired state.
- Escalate unacknowledged alerts.
- Automate the weekly on-call rotation.
- Identify patterns of outages.
- Provide visibility to concerned parties.
Know when something breaks
Instant Alerts!
- Avoids nagging about the same issue.
- Notifies only on a change of state.
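The "notify only on change of state" idea can be sketched as follows. This is a minimal illustration, not Watch Dog's actual code: the class name `StateChangeNotifier`, the state strings "up" and "down", and the assumption that a service counts as "up" before its first check are all hypothetical.

```python
from typing import Callable, Dict


class StateChangeNotifier:
    """Hypothetical helper: remembers each service's last state and
    notifies only when a new observation differs from it."""

    def __init__(self, notify: Callable[[str, str, str], None]) -> None:
        self._last: Dict[str, str] = {}
        self._notify = notify

    def observe(self, service: str, state: str) -> bool:
        # Assumption: a service never seen before is treated as "up",
        # so a first observation of "down" still raises an alert.
        previous = self._last.get(service, "up")
        self._last[service] = state
        if previous == state:
            return False  # same state as last check: stay quiet
        self._notify(service, previous, state)
        return True


sent = []
notifier = StateChangeNotifier(lambda svc, old, new: sent.append((svc, old, new)))
for state in ["up", "down", "down", "down", "up"]:
    notifier.observe("api", state)
# Five checks produce two notifications: up -> down and down -> up.
```

Repeated "down" checks are absorbed silently, which is what keeps the tool from nagging about an outage that has already been reported.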
Reports to prove your point
- Graphical reporting for a clean-cut overview.
- Keep track of your service's performance.
- View historical data for your service in easy-to-understand reports.
- Identify repeating patterns of outages over time.
- Make data-driven decisions on improvements.
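One simple way to surface repeating outage patterns is to bucket outage start times by weekday and hour and look at the most frequent buckets. The sketch below assumes outages are available as a list of `datetime` start times; the function name `outage_hotspots` is hypothetical and the sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def outage_hotspots(
    outage_starts: Iterable[datetime], top: int = 3
) -> List[Tuple[Tuple[str, int], int]]:
    """Count outages per (weekday, hour) bucket and return the busiest ones."""
    counts = Counter(
        (WEEKDAYS[ts.weekday()], ts.hour) for ts in outage_starts
    )
    return counts.most_common(top)


# Illustrative data only: two outages on Mondays around 02:00, one elsewhere.
starts = [
    datetime(2024, 1, 1, 2, 15),
    datetime(2024, 1, 8, 2, 40),
    datetime(2024, 1, 3, 14, 0),
]
hotspots = outage_hotspots(starts, top=1)
```

A bucket that keeps reappearing at the top (for example, early Monday morning) is a hint to look at whatever runs at that time, such as a scheduled job or a deployment window.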
Ensure your alerts are always answered
- Schedule-based on-call alerting, so everyone gets time off.
- Configure a 24-hour support model.
- Alert based on time of day: India-based engineers during India daytime and US-based engineers during US daytime.
- Rotate the on-call engineer on a weekly or bi-weekly basis.
- Assign a backup on-call engineer.
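The schedule rules above (time-of-day routing, weekly or bi-weekly rotation, and a backup) can be sketched roughly as below. Everything here is an assumption for illustration: the roster entries, the 08:00 to 20:00 India shift window, the rotation start date, and the choice of the next person in the roster as backup. The US team simply takes the remaining hours.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Tuple

IST = timezone(timedelta(hours=5, minutes=30))  # India Standard Time
ROTATION_START = date(2024, 1, 1)               # hypothetical start (a Monday)

# Hypothetical rosters; real names would come from configuration.
INDIA_TEAM = ["india-eng-1", "india-eng-2", "india-eng-3"]
US_TEAM = ["us-eng-1", "us-eng-2"]


def on_call(now_utc: datetime, rotation_weeks: int = 1) -> Tuple[str, str]:
    """Return (primary, backup) for a timezone-aware UTC timestamp."""
    india_local = now_utc.astimezone(IST).time()
    # Assumed shift window: India covers 08:00-20:00 IST, the US team the rest.
    india_shift = time(8, 0) <= india_local < time(20, 0)
    roster = INDIA_TEAM if india_shift else US_TEAM

    # Whole rotation periods elapsed since the start date (1 = weekly, 2 = bi-weekly).
    period = (now_utc.date() - ROTATION_START).days // (7 * rotation_weeks)
    primary = roster[period % len(roster)]
    backup = roster[(period + 1) % len(roster)]  # next in line acts as backup
    return primary, backup


first_week = on_call(datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc))
second_week = on_call(datetime(2024, 1, 8, 6, 0, tzinfo=timezone.utc))
us_hours = on_call(datetime(2024, 1, 1, 18, 0, tzinfo=timezone.utc))
```

Counting periods from a fixed start date, rather than using calendar week numbers, keeps the rotation even across year boundaries.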
Escalation and Rotation Policies
- Escalation rules protect you against accidentally overlooked incidents.
- When an incident is triggered, Watch Dog first tries to contact the on-call engineer.
- If the alert goes unacknowledged, it automatically escalates to the backup engineer, then the manager, and so on.
- Specify as many escalation levels as you need; the delay before each escalation is adjustable.
You don’t need to worry about accidentally missing a critical alert.
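An escalation policy of this kind can be modeled as an ordered list of levels, each with its own delay. The sketch below is illustrative only: `EscalationLevel`, `who_to_notify`, the contact labels, and the delay values are hypothetical, not taken from Watch Dog.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EscalationLevel:
    contact: str
    delay_minutes: int  # how long to wait for an acknowledgement at this level


def who_to_notify(policy: List[EscalationLevel],
                  minutes_unacknowledged: float) -> List[str]:
    """Return every contact that should have been alerted so far."""
    notified: List[str] = []
    threshold = 0  # minutes after which the current level is reached
    for level in policy:
        if minutes_unacknowledged < threshold:
            break
        notified.append(level.contact)
        threshold += level.delay_minutes
    return notified


# Hypothetical three-level policy with adjustable delays.
policy = [
    EscalationLevel("on-call engineer", 15),
    EscalationLevel("backup engineer", 15),
    EscalationLevel("manager", 30),
]
at_start = who_to_notify(policy, 0)
after_15 = who_to_notify(policy, 15)
after_40 = who_to_notify(policy, 40)
```

With these example delays, the on-call engineer is contacted immediately, the backup after 15 minutes without acknowledgement, and the manager after 30. Adding a level is just another entry in the list.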
Technology Stack
Contributors
Aakash Thakkar
Amar Mattey
Armando Soto
Harpreet Hira
Houston Harris
Sagar Dutta
Sonali Kalgaonkar
and
Help Us Move this Forward
Watch Dog
By Harpreet Hira