Why you are killing people and companies and a possible path to a safer world

Thomas Depierre

@DianaO

Diana Olympos

 

Twitter :

Github :

How this talk works

  • 20 mins for me

  • 10 mins for YOU

  • Just ask me and switch

  • If you do not use your time, I will

  • Anyone can answer or say anything.

We build Complex Systems
Yes even "Hello World"

Complex systems are systems whose behaviour is intrinsically difficult to model due to the dependencies, relationships, or interactions between their parts or between a given system and its environment.

Cross Field Knowledge

Complex Systems

Human Factors and Ergonomics

The Knight Capital Group Accident

In layman’s terms, Knight Capital Group realised a $460 million loss in 45 minutes. Remember, Knight only had $365 million in cash and equivalents. In 45 minutes Knight went from being the largest trader in US equities and a major market maker in the NYSE and NASDAQ to bankrupt.

Complex Systems are SocioTechnical

  • Live in an environment
  • Operators
  • Users
  • Writers/Engineers
  • Humans are part of the system
  • A Complex System is not a purely technical entity

Your "Hello world!" is incredibly complex too.

It is more visible in distributed systems

  • More error cases (network fallacies)

  • More load: rare events happen more often

  • More impact, higher stakes

This is a systemic problem.

 

All software is affected.

Yes, your small tool too

Now a bit of Systems Thinking

Human Factors and Ergonomics use the same terminology and are just as applicable.

 

There are reasons for that.

Reliability

  • Doing what is specified
  • Repeatedly
  • TDD, type systems, proofs, etc.
  • From the ground up

Safety

  • Avoiding Loss Events
  • Financial, Assets, Human life
  • Systemic property
  • Not only a technical problem!

Safe >>>>>>>> Reliable

There is not only anecdotal but some hard data to support the hypothesis that safety problems in software stem from requirements flaws and not coding errors.

Nancy Leveson, Engineering a Safer World

https://mitpress.mit.edu/books/engineering-safer-world
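
To picture the gap between "reliable" and "safe", here is a minimal Python sketch. Everything in it is an assumption made for illustration: the infusion-pump setting, the function names and the numbers are invented and are not clinical values. The first function does exactly what its (hypothetical) specification says, every time, so it is reliable. The loss event comes from what the specification forgot, which is the kind of requirements flaw the quote above points at.

```python
# Hypothetical spec: "rate = dose / duration". Nothing more.
def infusion_rate(dose_mg: float, hours: float) -> float:
    """Reliable: implements the written spec exactly, every time."""
    return dose_mg / hours


# Invented limit, standing in for a safety requirement that lives
# outside the code: in the environment, the operators, the users.
MAX_RATE_MG_PER_H = 50.0


def checked_infusion_rate(dose_mg: float, hours: float) -> float:
    """Same computation, plus the requirement the spec left out."""
    rate = infusion_rate(dose_mg, hours)
    if rate > MAX_RATE_MG_PER_H:
        raise ValueError(f"rate {rate} mg/h exceeds limit {MAX_RATE_MG_PER_H} mg/h")
    return rate


# The intended entry: 100 mg over 5 hours.
intended = infusion_rate(100.0, 5.0)

# An operator slip: 0.5 hours instead of 5. The code is still "correct".
slipped = infusion_rate(100.0, 0.5)

# With the missing requirement made explicit, the slip is caught.
try:
    checked_infusion_rate(100.0, 0.5)
    caught = False
except ValueError:
    caught = True
```

No test of `infusion_rate` against its own spec would ever fail here. The unsafe behaviour only shows up once the operator and the missing limit are treated as part of the system.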

Solutions?

  • I do not have a perfect solution to offer.

  • But I can offer the same path that other engineering fields use for Complex SocioTechnical systems

  • Accept that failure will happen
  • Bulkhead
  • Deal with failure at a higher level (Supervise)
  • Stop doing Root Cause Analysis
  • Blameless Postmortem
  • Debugging as first class
  • Visibility as first class citizen
  • Stop lying to ourselves about things like "works on my machine" or "that is not my problem"
  • Ergonomics. Please
  • Distributed Systems already try to deal with that. Most problems happen at the interfaces.
  • Sometimes.
  • But in general we do not handle it.
  • Security is just an aspect of Safety, and the same principles apply there too
  • Maybe a way to introduce more of that in your company or team
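
The first bullets of the list above (accept failure, bulkhead, supervise, visibility) can be sketched in a few lines of Python. This is a toy under stated assumptions, not a production supervisor: the `Supervisor` class, its `max_restarts` parameter and the example tasks are all hypothetical names invented here. It only shows the shape of the idea, which is that a failing part is contained, retried a bounded number of times by something above it, and logged rather than hidden.

```python
import logging
from typing import Callable, Dict

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("supervisor")


class Supervisor:
    """Runs named tasks in isolation and restarts each a bounded number of times."""

    def __init__(self, max_restarts: int = 3) -> None:
        self.max_restarts = max_restarts
        self.report: Dict[str, str] = {}

    def run(self, name: str, task: Callable[[], object]) -> None:
        for attempt in range(1, self.max_restarts + 1):
            try:
                task()
                self.report[name] = f"ok after {attempt} attempt(s)"
                return
            except Exception:
                # Visibility: log the failure with its traceback.
                log.exception("task %r failed (attempt %d)", name, attempt)
        # Bulkhead: give up on this task only; the others still run.
        self.report[name] = "gave up"


def make_flaky() -> Callable[[], None]:
    """Hypothetical task that fails twice, then succeeds."""
    calls = {"n": 0}

    def flaky() -> None:
        calls["n"] += 1
        if calls["n"] < 3:
            raise RuntimeError("transient failure")

    return flaky


def always_broken() -> None:
    """Hypothetical task that no amount of restarting will fix."""
    raise ValueError("bad config")


sup = Supervisor(max_restarts=3)
sup.run("flaky", make_flaky())
sup.run("broken", always_broken)
sup.run("healthy", lambda: None)
print(sup.report)
```

The design choice worth noticing is that the tasks contain no error handling of their own. Failure is dealt with one level up, and the permanently broken task does not take the healthy one down with it.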

What to read to learn more?