Why you are killing people and companies and a possible path to a safer world
Thomas Depierre
@DianaO
Diana Olympos
Twitter :
Github :
How this talk works
- 20 mins for me
- 10 mins for YOU
- Just ask me and switch
- If you do not use your time, I will
- Anyone can answer or say anything.
We build Complex Systems
Yes even "Hello World"
Complex systems are systems whose behaviour is intrinsically difficult to model due to the dependencies, relationships, or interactions between their parts or between a given system and its environment.
Cross Field Knowledge
Complex Systems
Human Factors and Ergonomics
The Knight Capital Group Accident
In layman’s terms, Knight Capital Group realised a $460 million loss in 45 minutes. Remember, Knight only has $365 million in cash and equivalents. In 45 minutes Knight went from being the largest trader in US equities and a major market maker in the NYSE and NASDAQ to bankrupt.
Complex Systems are SocioTechnical
- Live in an environment
- Operators
- Users
- Writers/Engineers
- Humans are part of the system
- A Complex System is not a purely technical entity
Your "Hello World!" is incredibly complex too.
It is more visible in distributed systems
- More error cases (network fallacies)
- More load: rare events happen more often
- More impact, higher stakes
This is a systemic problem.
All software is affected.
Yes, your small tool too
Now a bit of Systems Thinking
Human Factors and Ergonomics use the same terminology and are just as applicable.
There are reasons for that.
Reliability
- Doing what is specified
- Repeatedly
- TDD, type systems, proofs, etc.
- From the ground up
Safety
- Avoiding Loss Events
- Financial, Assets, Human life
- Systemic property
- Not only a technical problem!
Safe >>>>>>>> Reliable
There is not only anecdotal but some hard data to support the hypothesis that safety problems in software stem from requirements flaws and not coding errors.
Nancy Leveson, Engineering a Safer World
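To make the Reliable vs Safe distinction concrete, here is a toy Python sketch. Everything in it is hypothetical and only for illustration: the names `order_cost`, `guarded_order_cost` and `ExposureLimitExceeded` are invented, and the limit simply reuses the cash figure quoted on the Knight Capital slide. The first function is reliable (it does exactly what it was specified to do, every time), yet nothing in its specification prevents a loss event. The second adds a system-level constraint, which is closer to a safety requirement than to a coding fix.

```python
# Figure quoted on the Knight Capital slide; used here only as an example limit.
CASH_AND_EQUIVALENTS = 365_000_000


class ExposureLimitExceeded(Exception):
    """Raised when an order would commit more money than the system can afford."""


def order_cost(quantity: int, unit_price: int) -> int:
    # Reliable: always returns exactly what the (hypothetical) spec asks for.
    # The spec says nothing about how much money may be committed.
    return quantity * unit_price


def guarded_order_cost(
    quantity: int,
    unit_price: int,
    already_committed: int,
    limit: int = CASH_AND_EQUIVALENTS,
) -> int:
    # Safety constraint: a requirement about the whole system (total exposure),
    # not about the correctness of the multiplication above.
    cost = order_cost(quantity, unit_price)
    if already_committed + cost > limit:
        raise ExposureLimitExceeded(
            f"order of {cost} on top of {already_committed} exceeds {limit}"
        )
    return cost
```

Both functions pass their unit tests. Only the second one encodes a requirement about loss, and that requirement had to come from someone thinking about the system, not from the code.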
Solutions?
- I do not have a perfect solution to offer.
- But I can offer the same path that other engineering fields use for Complex SocioTechnical systems
- Accept that failure will happen
- Bulkhead
- Deal with failure at a higher level (Supervise)
- Stop doing Root Cause Analysis
- Blameless Postmortem
- Debugging as a first-class citizen
- Visibility as a first-class citizen
- Stop lying to ourselves about things like "it runs on my machine" or "that is not my problem"
- Ergonomics. Please
- Distributed Systems already try to deal with that. Most problems happen at the interfaces.
- Sometimes.
- But in general we do not handle it.
- Security is just an aspect of Safety, and the same principles apply there too
- Maybe a way to introduce more of that in your company or team
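As a rough illustration of "accept that failure will happen", "Bulkhead" and "Supervise" from the list above, here is a minimal Python sketch. It is not taken from the talk and does not reproduce any real supervision library: `supervise`, `run_isolated`, `RestartLimitExceeded` and the example workers are all hypothetical names. Each worker is restarted a bounded number of times by something above it, one worker's failure does not stop the others, and every crash is recorded so it stays visible.

```python
from typing import Callable, Dict, List, Optional, Tuple


class RestartLimitExceeded(Exception):
    """Raised when a worker keeps failing and its supervisor gives up."""


def supervise(
    name: str, worker: Callable[[], str], max_restarts: int, events: List[str]
) -> str:
    # Failure is handled one level above the worker: the worker just crashes,
    # the supervisor decides whether to restart it.
    for attempt in range(max_restarts + 1):
        try:
            result = worker()
            events.append(f"{name}: ok after {attempt} restart(s)")
            return result
        except Exception as exc:
            events.append(f"{name}: crashed ({exc!r})")  # keep failures visible
    raise RestartLimitExceeded(name)


def run_isolated(
    workers: Dict[str, Callable[[], str]], max_restarts: int = 2
) -> Tuple[Dict[str, Optional[str]], List[str]]:
    # Bulkhead: each worker is supervised on its own, so a worker that never
    # recovers cannot take the others down with it.
    events: List[str] = []
    results: Dict[str, Optional[str]] = {}
    for name, worker in workers.items():
        try:
            results[name] = supervise(name, worker, max_restarts, events)
        except RestartLimitExceeded:
            results[name] = None
            events.append(f"{name}: gave up, escalating")
    return results, events


def make_flaky(failures: int) -> Callable[[], str]:
    # Example worker that fails a fixed number of times, then succeeds.
    state = {"left": failures}

    def worker() -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("transient failure")
        return "done"

    return worker


def always_broken() -> str:
    # Example worker that no amount of restarting will fix.
    raise ValueError("flawed requirement")


def healthy() -> str:
    return "done"


results, events = run_isolated(
    {"flaky": make_flaky(2), "broken": always_broken, "healthy": healthy}
)
```

Restarting helps with the transient failure and does nothing for the worker that is wrong by design. The event log is what a blameless postmortem would start from.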
What to read for more?
- How Complex Systems Fail, Richard Cook
- John Allspaw's blog (The danger of the 5 Whys)
- The Field Guide to Understanding Human Error, Sidney Dekker
- Engineering a Safer World, Nancy Leveson
- Postmortems. A lot of them
- ...
By di4nao
Yes, software does kill people and companies. We will then look at System Thinking and Human Factors as a possible solution to build safer systems.