Developer at marmelab
Error Budget
An SRE Principle
Deployment
Reliability
Monitoring
Benjamin Treynor Sloss, VP of Engineering, Google
Weather API
Availability
Database
Data Consistency
Stock Exchange App
Response Time
Unrealistic & unreachable
Do more harm than good
99% ("two nines"): 3.65 days of downtime per year
99.9% ("three nines"): 8.77 hours of downtime per year
99.99% ("four nines"): 52.60 minutes of downtime per year
99.999% ("five nines"): 5.26 minutes of downtime per year
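The downtime figures above follow from simple arithmetic. A minimal sketch, assuming an average year of 365.25 days (which is what the rounded figures appear to use); the function name `allowed_downtime_hours` is hypothetical:

```python
# Assumption: an average year of 365.25 days, i.e. 8766 hours.
HOURS_PER_YEAR = 365.25 * 24


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the yearly downtime, in hours, that an availability target tolerates."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        hours = allowed_downtime_hours(target)
        print(f"{target}% -> {hours:.2f} hours ({hours * 60:.2f} minutes) per year")
```

Each extra nine divides the tolerated downtime by ten.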
Focus on unplanned downtime
if (budget > 0 && !friday)
if (budget <= 0 || friday)
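One way to read the two conditions above is as a deployment gate: ship only while some budget remains and it is not Friday. A minimal sketch; `can_deploy` and its parameters are hypothetical names, not part of any real tool:

```python
from datetime import date


def can_deploy(remaining_budget: float, today: date) -> bool:
    """Allow a deployment only if budget remains and today is not a Friday."""
    is_friday = today.weekday() == 4  # date.weekday(): Monday is 0, Friday is 4
    return remaining_budget > 0 and not is_friday


if __name__ == "__main__":
    # Hypothetical values: 12.5 units of budget left, checked against today's date.
    print(can_deploy(12.5, date.today()))
```

The second condition in the slide is simply the negation of the first, so a single boolean covers both cases.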
1. Get To Know What Really Matters For Your Users
2. Measure it (SLI)
3. Choose A Realistic Objective (SLO)
4. Align Team Behavior With The Error Budget
5. Iterate and goto 1
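Steps 2 to 4 can be sketched with a request-based SLI. This is an illustration under stated assumptions, not a definitive implementation: the SLI is taken as the share of good events, the budget as the number of failures the SLO tolerates, and the names `measure_sli` and `remaining_error_budget` are hypothetical.

```python
def measure_sli(good_events: int, total_events: int) -> float:
    """Step 2: the SLI as the fraction of events that met the user's expectation."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def remaining_error_budget(good_events: int, total_events: int, slo: float) -> float:
    """Steps 3 and 4: failures still tolerated by the SLO (negative once it is blown)."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_failures = (1 - slo) * total_events
    actual_failures = total_events - good_events
    return allowed_failures - actual_failures


if __name__ == "__main__":
    # Hypothetical numbers: one million requests, 500 failures, 99.9% objective.
    print(measure_sli(999_500, 1_000_000))
    print(remaining_error_budget(999_500, 1_000_000, slo=0.999))
```

A positive result leaves room for risky changes; a negative one suggests spending the next iteration on stability.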
- Risky
- Not Risky
Focus On Stability
Focus On Velocity