Fashion Shows is our pilot engagement. Through it we will define a process by which other product teams can hand their products over to SRE once feature development is discontinued.
1. Architecture Deep Dive
Production Readiness Checklist process
Away day
Running through troubleshooting process with team
Understanding likely failure scenarios
2. Separation of services to support
Website (blue-steel/blue-steel-api)
Editorial (front-row/polaroid)
3. Addressing Observability Gaps
Blue-steel unformatted logging
Better visibility of deployments
Measuring time for photo upload in Instant shows
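The logging and upload-timing gaps above can be illustrated with a minimal Python sketch. It assumes nothing about how blue-steel is actually implemented: the `build_logger` helper, the `photo_upload` event name and the `show_id` field are all hypothetical. It emits each log record as one JSON object and records how long a wrapped step takes.

```python
import json
import logging
import time
from contextlib import contextmanager


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via extra={"fields": {...}} are merged into the output.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


@contextmanager
def timed(logger, event, **fields):
    """Log the wall-clock duration of the wrapped block in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        duration_ms = round((time.perf_counter() - start) * 1000.0, 2)
        logger.info(event, extra={"fields": {**fields, "duration_ms": duration_ms}})


def build_logger(name="blue-steel", stream=None):
    """Hypothetical helper: a logger that writes JSON lines to `stream`.

    When `stream` is None the handler falls back to standard error.
    """
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Wrapping an upload in `with timed(logger, "photo_upload", show_id=...):` would then produce one machine-parseable line per upload, which a log pipeline could aggregate into a photo upload time metric.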
4. SLIs/SLOs
Asking Product Managers
Suggested SLOs based on Availability, Latency and Photo upload time
Tooling for collecting better latency metrics
Working on visibility over backends from Fastly metrics/logs
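As a rough illustration of how SLOs on availability, latency and photo upload time could be evaluated, the sketch below computes an SLI as the fraction of good events and reports how much error budget remains. The targets and thresholds used with it are placeholders, not the values agreed with Product Managers, and the `Slo` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    name: str
    target: float  # e.g. 0.999 means 99.9% of events must be good


def sli(good_events, total_events):
    """Fraction of good events; treated as 1.0 when there was no traffic."""
    if total_events == 0:
        return 1.0
    return good_events / total_events


def latency_sli(latencies_ms, threshold_ms):
    """Fraction of requests (or uploads) that finished within the threshold."""
    good = sum(1 for value in latencies_ms if value <= threshold_ms)
    return sli(good, len(latencies_ms))


def error_budget_remaining(slo, good_events, total_events):
    """Share of the error budget still unspent (negative when overspent)."""
    allowed_bad = (1.0 - slo.target) * total_events
    actual_bad = total_events - good_events
    if allowed_bad == 0:
        # No traffic, or a 100% target: any bad event exhausts the budget.
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad
```

The same `latency_sli` function covers both request latency and photo upload time, since both reduce to "how many events finished under a threshold". The good and total counts could come from Fastly metrics or logs once backend visibility is in place.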
5. Going on call
Observing
Quiet Fashion Shows season
6. Planning Game day
"Inject latency into blue steel" attack
Black-holing the Fashion Shows API for one or more pods/containers
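The two game day scenarios above would normally be run with a fault-injection tool at the network or container level. Purely to show the behaviour each one exercises, here is an in-process Python sketch. The decorator names are hypothetical, and the `sleep` and `rng` parameters exist only so the faults can be tested without real delays.

```python
import functools
import random
import time


def inject_latency(delay_s, probability=1.0, rng=random.random, sleep=time.sleep):
    """Decorator: delay a fraction of calls by `delay_s` seconds."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # rng() returns a float in [0, 1), so probability=1.0 always delays
            # and probability=0.0 never does.
            if rng() < probability:
                sleep(delay_s)
            return func(*args, **kwargs)
        return wrapper
    return decorator


def black_hole(timeout_s, sleep=time.sleep):
    """Decorator: never reach the wrapped call; wait, then time out.

    This mimics what a client sees when traffic to a backend is dropped.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            sleep(timeout_s)
            raise TimeoutError(f"{func.__name__} black-holed after {timeout_s}s")
        return wrapper
    return decorator
```

During a game day the interesting observations are on the caller's side: whether timeouts and retries behave sensibly, and whether the latency and availability alerts fire when expected.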
Learnings
Our pilot engagement has helped us understand what an engagement with a dev/product team looks like, and it will serve as the basis for our standardized process.
It also encourages us to make more reusable observability improvements that benefit other teams, such as:
- Graceful shutdown logging library
- Deployment events on Circle CI
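To make the first item concrete, a graceful shutdown logging helper could look roughly like the following Python sketch. It is an assumption about the shape of such a library, not the one the team built: the function names and the `shutdown_requested` event are hypothetical.

```python
import json
import signal
import sys
import time


def make_shutdown_handler(service, out=sys.stderr, on_shutdown=None):
    """Build a signal handler that logs a structured shutdown event."""
    def handler(signum, frame):
        event = {
            "event": "shutdown_requested",
            "service": service,
            "signal": signal.Signals(signum).name,
            "ts": time.time(),
        }
        out.write(json.dumps(event) + "\n")
        out.flush()
        # Optional hook for draining connections, flushing queues, etc.
        if on_shutdown is not None:
            on_shutdown()
    return handler


def install(service, **kwargs):
    """Register the handler for SIGTERM and SIGINT (main thread only)."""
    handler = make_shutdown_handler(service, **kwargs)
    signal.signal(signal.SIGTERM, handler)
    signal.signal(signal.SIGINT, handler)
    return handler
```

Logging the signal name makes it possible to tell a deployment-driven restart (typically SIGTERM from the orchestrator) apart from a crash, where no such line appears.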