Resilience Engineering: A few useful models for Complex Systems
Thomas Depierre
Safety Science in Complex Domains
Safety II
Safety Differently
Resilience Engineering
Cognitive Systems Engineering
We build Complex Systems
Complex systems are systems whose behaviour is intrinsically difficult to model due to the dependencies, relationships, or interactions between their parts or between a given system and its environment.
fault tolerance isn’t composable
Peter Alvaro
Fun >>>>>>>> Business Critical >>>>>> Safety Critical
Reliability
- Doing what is specified
- Repeatedly
- TDD, type systems, proofs, etc.
Safety
- Avoiding Loss Events
- Financial, Assets, Human life
- Systemic property
- Not only a technical problem!
Humans create safety continuously through normal work
The Rasmussen Model
Drift into Failure
We build Dynamic Systems
Law of Stretched Systems
every system is stretched to operate at its capacity; as soon as there is some improvement, for example in the form of new technology, it will be exploited to achieve a new intensity and tempo of activity.
Larry Hirschhorn
The above-the-line/below-the-line framework
Mental Model
A few more models if we have time
- ETTO
- Efficiency-Thoroughness Trade-Off
- If you wait until you have found the perfect solution, the system will have changed too much
- Better to go with a partial one for now
- Work-As-Imagined vs Work-As-Done
- An incident is your system telling you where these two mental models diverge
- Incidents as untyped pointers
- Why do things go right?
- Incidents as unexpected investments/probes/experiments
- You already paid the price; at least get some sense-making out of it
- ...
Where to go for more?
- https://github.com/lorin/resilience-engineering/blob/master/intro.md
- http://resiliencepapers.club
- Velocity 2013: Johan Bergström "What, Where And When Is Risk In System Design?" https://www.youtube.com/watch?v=Pb_zYs8G6Co
- Velocity NY 2013: Richard Cook, "Resilience In Complex Adaptive Systems" https://www.youtube.com/watch?v=PGLYEDpNu60
- The Field Guide to Understanding Human Error, Sidney Dekker
- Learning Reviews. A lot of them
- https://www.learningfromincidents.io/
- https://www.jeli.io/howie-the-post-incident-guide/
- ...
Resilience Engineering Onsite Portugal
By di4nao