How we kill people:
What if we brought Safety into Software?
We build Dynamic Systems
Law of Stretched Systems
Every system is stretched to operate at its capacity; as soon as there is some improvement, for example in the form of new technology, it will be exploited to achieve a new intensity and tempo of activity.
Larry Hirschhorn
The Knight Capital Group Accident
In layman’s terms, Knight Capital Group realised a $460 million loss in 45 minutes. Remember, Knight only had $365 million in cash and equivalents. In 45 minutes Knight went from being the largest trader in US equities and a major market maker on the NYSE and NASDAQ to effectively bankrupt.
We build Complex Systems
Complex systems are systems whose behaviour is intrinsically difficult to model due to the dependencies, relationships, or interactions between their parts or between a given system and its environment.
Fault tolerance isn’t composable
Peter Alvaro
Your "Hello, world!" is incredibly complex too.
This is a systemic problem.
All software is affected.
Yes, your small tool too.
Yes, your "non-critical" software too.
Fun >>>>>>>> Business Critical >>>>>> Safety Critical
Reliability
- Doing what is specified
- Repeatedly
- TDD, type systems, proofs, etc.
Safety
- Avoiding Loss Events
- Financial, Assets, Human life
- Systemic property
- Not only a technical problem!
Safe >>>>>>>> Reliable
Software Systems are SocioTechnical
- Live in an environment
- Operators
- Users
- Writers/Engineers
- Humans are part of the system
- A software system is not a purely technical entity
Solutions?
- I do not have a perfect solution to offer.
- But I can offer the same path that other engineering fields use for complex sociotechnical systems.
Cross Field Knowledge
Safety II
Safety Differently
Resilience Engineering
Cognitive Systems Engineering
Humans create safety continuously through normal work
How can we support them?
Attitude
Learning
Tooling
Accept that failure will happen
Stop lying to ourselves with things like "it runs on my machine" or "that is not my problem"
Attitude
Stop doing Root Cause Analysis
Stop doing 5 Whys
Learning Reviews
Learning
Observability as a first-class citizen
Ergonomics. Please
Operable software
Tooling
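As one possible illustration of observability as a first-class concern, here is a minimal sketch using only the Python standard library. It emits one structured JSON event per operation so that an operator can later ask questions the author did not anticipate. Everything named here is an assumption made for the example, not something from the talk: the `orders` logger name, the `place_order` operation, the `context` field convention, and the field names.

```python
import json
import logging
import time
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        event = {
            "ts": record.created,
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Hypothetical convention: structured fields travel in `record.context`.
        event.update(getattr(record, "context", {}))
        return json.dumps(event, sort_keys=True)


def build_logger(stream=None):
    """Create a logger that writes JSON events to `stream` (stderr by default)."""
    logger = logging.getLogger("orders")  # hypothetical service name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def place_order(logger, order_id, quantity):
    """Hypothetical operation that reports what it did and how long it took."""
    request_id = str(uuid.uuid4())
    started = time.monotonic()
    outcome = "accepted" if quantity > 0 else "rejected"
    logger.info(
        "order_processed",
        extra={"context": {
            "request_id": request_id,
            "order_id": order_id,
            "quantity": quantity,
            "outcome": outcome,
            "duration_ms": round((time.monotonic() - started) * 1000, 3),
        }},
    )
    return outcome
```

Calling `place_order(build_logger(), "A1", 3)` writes a single JSON line to stderr. The design choice worth noting is that context (who, what, outcome, duration) is recorded as fields rather than baked into a message string, which tends to make the software easier to operate and to learn from after an incident.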
"Ten challenges for making automation a 'team player' in joint human-agent activity"
Where to go for more?
- https://github.com/lorin/resilience-engineering/blob/master/intro.md
- http://resiliencepapers.club
- Velocity 2013: Johan Bergström "What, Where And When Is Risk In System Design?" https://www.youtube.com/watch?v=Pb_zYs8G6Co
- Velocity NY 2013: Richard Cook, "Resilience In Complex Adaptive Systems" https://www.youtube.com/watch?v=PGLYEDpNu60
- The Field Guide to Understanding Human Error, Sidney Dekker
- Postmortems. A lot of them
- ...
Why you are killing people and a possible path to a safer world
By di4nao
Yes, software does kill people and companies. We then look at Systems Thinking and Human Factors as a possible path to building safer systems.