Break more things with Chaos Engineering
Karim Alibhai
github.com/karimsa
Richard Ison
github.com/richardison
npm install karimsa
npm install richard
Delivering the art & science of retail execution
We're hiring
fokoretail.com
Requirements
- Know some Node.js
- Have Node.js installed (any version is okay) - use nvm
- Have internet access
- You're building an application
- There will be errors
- There will be unexpected errors
- The errors are not the user's problem, they're your problem
How do we aim for high availability?
Availability = Time to Failure / (Time to Failure + Time to Recovery)
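To make the formula concrete, here is a small worked example. The numbers are hypothetical and only illustrate that shortening recovery time raises availability even when failures are just as frequent.

```javascript
// Availability as the fraction of time the system is up.
// Both arguments must use the same unit (hours in this example).
function availability(timeToFailure, timeToRecovery) {
  return timeToFailure / (timeToFailure + timeToRecovery);
}

// Hypothetical: a failure every 999 hours, 1 hour to recover.
const slowRecovery = availability(999, 1);
// Same failure rate, but recovery takes 6 minutes (0.1 hours).
const fastRecovery = availability(999, 0.1);

console.log(slowRecovery.toFixed(4)); // 0.9990
console.log(fastRecovery.toFixed(4)); // 0.9999
```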
Break more things!
(Ah, okay, that's why it's called that)
- Define a steady state.
- Define a testable hypothesis.
- Introduce real-world events that challenge your steady state.
- Try to disprove your hypothesis through the events.
Principles of Chaos
principlesofchaos.org
- Write some code that works.
- Hope that it will always work.
- Unleash some chaos.
- See if things break.
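The four steps can be sketched as a tiny experiment. Everything here is made up for illustration: `createService`, `breakCache`, and the greeting logic stand in for a real system and a real failure event.

```javascript
// Hypothetical system under test: a lookup that falls back when its cache is down.
function createService() {
  let cacheAvailable = true;
  return {
    breakCache() { cacheAvailable = false; },
    getGreeting() {
      return cacheAvailable ? 'hello from cache' : 'hello';
    },
  };
}

// Steady state: the service always answers with some greeting.
function steadyState(service) {
  const reply = service.getGreeting();
  return typeof reply === 'string' && reply.startsWith('hello');
}

// Hypothesis: the steady state still holds after the cache fails.
function runExperiment() {
  const service = createService();
  if (!steadyState(service)) {
    throw new Error('not in steady state before chaos');
  }
  service.breakCache();        // the event that challenges the steady state
  return steadyState(service); // false means the hypothesis was disproved
}

console.log(runExperiment() ? 'hypothesis held' : 'hypothesis disproved');
```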
Coding Break
git clone -b step_0 https://github.com/FoKo/chaos-client.git
Chat Design
connect
- Attempt a connection
- If connected, make application available
- If not connected, die
Solution: try again.
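One way "try again" might look is a retry loop with exponential backoff. This is a sketch, not the workshop's solution: `connectWithRetry`, its option names, and `flakyConnect` are invented for the example.

```javascript
// Retry an async operation, waiting longer after each failure.
// `connect` is any function that returns a promise.
async function connectWithRetry(connect, { retries = 5, baseDelayMs = 100 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      if (attempt === retries) break;
      // Delay doubles each time: baseDelayMs, 2x, 4x, ...
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}

// Hypothetical connection that fails twice before succeeding.
let calls = 0;
const flakyConnect = async () => {
  calls += 1;
  if (calls < 3) throw new Error('connection refused');
  return 'connected';
};

connectWithRetry(flakyConnect, { baseDelayMs: 10 }).then(console.log);
```

The retry count is capped so that a permanently unreachable server still produces an error instead of looping forever.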
frenzie
node app.js  =>  node -r frenzie app.js
- frenzie will trigger random failures in your application
- It requires zero application changes between a non-chaos and chaos environment
- It's super simple to use
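To give a feel for what "random failures" means, here is a hand-rolled failure injector. It does not use frenzie and makes no claim about how frenzie works internally; `withChaos`, `failureRate`, and `random` are names invented for this sketch. The `-r` flag is Node's `--require` preload option, so a tool loaded that way gets to run before your application code does.

```javascript
// Wrap a function so that some calls throw instead of running.
// `random` is injectable so the behavior can be made deterministic in tests.
function withChaos(fn, { failureRate = 0.2, random = Math.random } = {}) {
  return function chaotic(...args) {
    if (random() < failureRate) {
      throw new Error('chaos: injected failure');
    }
    return fn.apply(this, args);
  };
}

const add = (a, b) => a + b;
const chaoticAdd = withChaos(add, { failureRate: 0.5 });

// Roughly half of these calls fail; which ones varies from run to run.
for (let i = 0; i < 5; i++) {
  try {
    console.log('result:', chaoticAdd(i, 1));
  } catch (err) {
    console.log('failed:', err.message);
  }
}
```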
Chaos Break!
Circuit Breakers!
Open State
Circuit Breakers!
Closed State
Circuit Breakers!
Half-Open State
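In the usual formulation of the pattern, a closed circuit passes calls through, an open circuit fails fast without calling the dependency, and a half-open circuit lets a trial call through to check whether the dependency has recovered. The class below is a minimal sketch of those three states; the option names (`failureThreshold`, `resetTimeoutMs`, `now`) are assumptions made for this example.

```javascript
class CircuitBreaker {
  constructor(action, { failureThreshold = 3, resetTimeoutMs = 1000, now = Date.now } = {}) {
    this.action = action;
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now; // injectable clock, handy for tests
    this.state = 'closed';
    this.failures = 0;
    this.openedAt = 0;
  }

  async call(...args) {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('circuit open: failing fast');
      }
      // Timeout elapsed: allow a trial call. (Simplification: concurrent
      // callers may all get through while half-open.)
      this.state = 'half-open';
    }
    try {
      const result = await this.action(...args);
      // Any success closes the circuit and resets the failure count.
      this.state = 'closed';
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial call, or too many failures in a row, opens the circuit.
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Usage would look like `const breaker = new CircuitBreaker(sendRequest)` followed by `await breaker.call(payload)`, where `sendRequest` is whatever async function talks to the unreliable dependency.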
What does this look like @ Foko/Real World?
Coding Break!
Let's talk about disconnect
- "The network is always reliable."
- It's not.
- A disconnect/timeout can occur at any time
- Message sending & connection attempts should be isolated
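One way to isolate sending from connecting is an outbox: callers only ever enqueue, and the connection layer drains the queue whenever a transport is available. This is a sketch under assumptions; the `Outbox` class and the `transport.write` interface are invented for the example.

```javascript
class Outbox {
  constructor() {
    this.queue = [];
    this.transport = null; // non-null only while connected
  }

  // Callers never touch the connection; they only enqueue.
  send(message) {
    this.queue.push(message);
    this.flush();
  }

  // Called by the connection layer when a connection is established.
  onConnect(transport) {
    this.transport = transport;
    this.flush();
  }

  // Called by the connection layer on disconnect or timeout.
  onDisconnect() {
    this.transport = null;
  }

  flush() {
    while (this.transport && this.queue.length > 0) {
      try {
        this.transport.write(this.queue[0]);
      } catch (err) {
        // Treat a failed write as a disconnect; the message stays queued.
        this.onDisconnect();
        return;
      }
      this.queue.shift();
    }
  }
}

const outbox = new Outbox();
outbox.send('hello'); // queued: no connection yet
const delivered = [];
outbox.onConnect({ write: (m) => delivered.push(m) });
outbox.send('world');
console.log(delivered); // [ 'hello', 'world' ]
```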
The Broker Pattern
How do we use this @ Foko Retail?
- All major units of business logic are isolated into "jobs"
- We ask the Broker to execute a job for us and pass it a "context"
- The broker ensures that the job chain eventually succeeds
- The caller gets no guarantee of success and does not block waiting for the job
- All work is independent of other work, so failure is isolated
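A much-reduced sketch of the idea follows. It is not Foko's broker: the `Broker` class, its method names, and the retry options are assumptions, and it caps retries where a production broker would more likely persist jobs and keep going.

```javascript
class Broker {
  constructor({ retryDelayMs = 100, maxAttempts = 5, onGiveUp = () => {} } = {}) {
    this.jobs = new Map();
    this.retryDelayMs = retryDelayMs;
    this.maxAttempts = maxAttempts;
    this.onGiveUp = onGiveUp;
  }

  // A job is a named unit of business logic that receives a context.
  register(name, handler) {
    this.jobs.set(name, handler);
  }

  // Fire-and-forget: the caller gets no result and is never blocked.
  request(name, context) {
    if (!this.jobs.has(name)) throw new Error(`unknown job: ${name}`);
    this.run(name, context, 1);
  }

  async run(name, context, attempt) {
    try {
      await this.jobs.get(name)(context);
    } catch (err) {
      if (attempt >= this.maxAttempts) {
        this.onGiveUp(name, context, err);
        return;
      }
      // The failure stays inside the broker; schedule another attempt.
      setTimeout(() => this.run(name, context, attempt + 1), this.retryDelayMs);
    }
  }
}

const broker = new Broker({ retryDelayMs: 10 });
broker.register('sendMessage', async (context) => {
  console.log('sending', context.text);
});
broker.request('sendMessage', { text: 'hi' });
```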
Coding Break!
Questions?
Break more things (workshop)
By Karim Alibhai
A workshop about resiliency patterns & failure injection in Node.js.