Developer at marmelab
@Kmaschta
kevin@marmelab.com
kmaschta.me
Error Budget
An SRE Principle
Deployment
Reliability
Monitoring
Benjamin Treynor Sloss, VP of Engineering, Google
Source: https://irishtugofwar.com/gallery-2/
Source: http://jonathan-marks.com/complexity-cooperation/
Weather API: Availability
Database: Data Consistency
Stock Exchange App: Response Time
Unrealistic & unreachable
Do more harm than good
99% ("two nines"): 3.65 days of downtime per year
99.9% ("three nines"): 8.77 hours of downtime per year
99.99% ("four nines"): 52.60 minutes of downtime per year
99.999% ("five nines"): 5.26 minutes of downtime per year
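The downtime figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch in Python, assuming the figures are yearly and based on an average year of 365.25 days (an assumption inferred from the numbers, not stated on the slide). The function name `allowed_downtime_hours` is illustrative.

```python
# Assumption: an average year of 365.25 days, which matches the slide's figures.
HOURS_PER_YEAR = 365.25 * 24


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the yearly downtime, in hours, permitted by an availability target."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * HOURS_PER_YEAR


# Print the allowance for each target listed on the slide.
for target in (99.0, 99.9, 99.99, 99.999):
    hours = allowed_downtime_hours(target)
    print(f"{target}%: {hours:.2f} hours ({hours * 60:.2f} minutes) per year")
```

Each extra "nine" divides the allowed downtime by ten, which is why the higher targets quickly become unrealistic.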
Focus on unplanned downtime
if (budget > 0 && !friday)
if (budget <= 0 || friday)
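The two conditions above are exact opposites of each other, so they can be read as a single deploy gate. Here is a hedged sketch in Python of what such a gate might look like. The function name `can_deploy` and the idea of passing the remaining budget as a number are assumptions for illustration; the slide only gives the conditions.

```python
from datetime import date

FRIDAY = 4  # date.weekday() numbers Monday as 0, so Friday is 4


def can_deploy(budget_remaining: float, today: date) -> bool:
    """Mirror `budget > 0 && !friday`: deploy only with budget left, never on a Friday."""
    is_friday = today.weekday() == FRIDAY
    return budget_remaining > 0 and not is_friday


if __name__ == "__main__":
    # Hypothetical value: 40% of the error budget is still unspent.
    if can_deploy(0.4, date.today()):
        print("Deploy allowed")
    else:
        print("Hold the deploy")
```

When `can_deploy` returns `False`, the opposite branch `budget <= 0 || friday` applies.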
1. Get To Know What Really Matters For Your Users
2. Measure it (SLI)
3. Choose A Realistic Objective (SLO)
4. Align Team Behavior With The Error Budget
5. Iterate and goto 1
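Steps 2 to 4 above can be sketched numerically. The example below assumes an availability SLI defined as the share of successful requests, which is only one possible choice of indicator. The names `BudgetReport` and `error_budget`, and the request counts, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BudgetReport:
    sli: float                 # measured share of successful requests
    allowed_failures: float    # failures the SLO tolerates over the window
    remaining_failures: float  # budget left; negative means overspent


def error_budget(total_requests: int, failed_requests: int, slo: float) -> BudgetReport:
    """Compare a measured SLI against an SLO and report the remaining error budget."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    sli = (total_requests - failed_requests) / total_requests
    # The error budget is whatever the objective leaves out: 1 - SLO.
    allowed_failures = total_requests * (1 - slo)
    return BudgetReport(sli, allowed_failures, allowed_failures - failed_requests)


if __name__ == "__main__":
    # Hypothetical window: one million requests, 250 failures, 99.9% objective.
    report = error_budget(1_000_000, 250, slo=0.999)
    print(f"SLI: {report.sli:.5f}")
    print(f"Failures left in budget: {report.remaining_failures:.0f}")
```

A positive remainder leaves room for riskier changes, while a negative one signals that the team should shift toward stability work.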
- Risky
- Not Risky
Focus On Stability
Focus On Velocity
https://github.com/Kmaschta/monitoring-example
https://slides.com/kmaschta/errors-budget-sre/
By Kevin Maschtaler
A short introduction to the error budget method, or how to reconcile devs and sysadmins thanks to SRE principles.