SRE Meetup, Paris, 10 July 2018
Cloud Architect
Automation enthusiast
Conway's Law
Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.
Forget Conway’s Law, distributed systems at scale follow Murphy’s Law: “Anything that can go wrong, will go wrong.”
"If you can't monitor a service, you don't know what's happening, and if you're blind to what's happening, you can't be reliable" SRE, Google
The physical servers (owned by the company or rented from cloud providers)
Databases (dedicated and/or shared)
The operating system
Resource isolation and abstraction
Configuration management
Host-level monitoring
Host-level logging
Internal service outages
External (third-party) service outages
Internal library failures
External (third-party) library failures
A dependency failing to meet its SLA
API endpoint deprecation
API endpoint decommissioning
Microservice deprecation
Microservice decommissioning
Interface or endpoint deprecation
Timeouts to a downstream service
Timeouts to an external dependency
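The two timeout items above can be made concrete with a small sketch. It assumes the caller can tolerate a fallback value when a dependency is slow. The names `call_with_timeout`, `fast_dependency` and `slow_dependency` are hypothetical and exist only for illustration; a real service would more likely rely on the timeout options of its HTTP or RPC client.

```python
import concurrent.futures
import time


def call_with_timeout(fn, timeout_s, fallback):
    """Run fn() in a worker thread and return fallback if it takes longer than timeout_s.

    Exceptions raised by fn itself are not swallowed; they propagate to the caller.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return fallback
    finally:
        # Do not block on the slow call; the worker thread finishes in the background.
        pool.shutdown(wait=False)


def fast_dependency():
    # Hypothetical dependency that answers immediately.
    return "fresh"


def slow_dependency():
    # Hypothetical dependency that answers too late.
    time.sleep(0.5)
    return "fresh"


fast_result = call_with_timeout(fast_dependency, timeout_s=1.0, fallback="cached")
slow_result = call_with_timeout(slow_dependency, timeout_s=0.05, fallback="cached")
```

Note that this sketch only stops waiting; it does not cancel the slow call, which keeps consuming a thread until it returns.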
Incomplete code reviews
Poor architecture and design
Lack of proper unit and integration tests
Bad deployments
Lack of proper monitoring
Improper error and exception handling
Database failure
Scalability limitations
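As one possible illustration of the error and exception handling item above, the sketch below retries only failures that are known to be transient, with a bounded number of attempts, and lets every other exception surface immediately. `TransientError`, `retry` and `make_flaky` are hypothetical names introduced for this example, not part of any library.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def retry(fn, attempts=3, base_delay_s=0.01):
    """Call fn(), retrying on TransientError with exponential backoff.

    Any other exception propagates at once, and the last TransientError
    is re-raised when all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay_s * 2 ** attempt)


def make_flaky(failures):
    """Build a hypothetical call that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def flaky():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("temporary failure")
        return "ok"

    return flaky, state


flaky, state = make_flaky(failures=2)
result = retry(flaky)
```

The design choice here is to catch a narrow exception type rather than a bare `except`, so that programming errors are not hidden behind retries.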