Break more things with Chaos Engineering
Karim Alibhai
github.com/karimsa
Richard Ison
github.com/richardison
npm install karimsa
npm install richard
Delivering the art & science of retail execution
We're hiring
fokoretail.com
Requirements
- Know some Node.js
- Have Node.js installed (any version is okay; use nvm)
- Have internet access
- You're building an application
- There will be errors
- There will be unexpected errors
- The errors are not the user's problem, they're your problem
How do we aim for high availability?
Availability = Time to Failure / (Time to Failure + Time to Recovery)
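To make the formula concrete, here is a minimal sketch that computes availability from two averages. The function name and the sample numbers are illustrative assumptions, not figures from the talk.

```javascript
// Availability = timeToFailure / (timeToFailure + timeToRecovery).
// Both arguments must use the same time unit (hours here).
function availability(timeToFailure, timeToRecovery) {
  if (timeToFailure < 0 || timeToRecovery < 0) {
    throw new RangeError('durations must be non-negative');
  }
  const total = timeToFailure + timeToRecovery;
  if (total === 0) {
    throw new RangeError('at least one duration must be positive');
  }
  return timeToFailure / total;
}

// Hypothetical service that fails about once every 720 hours.
// Shrinking recovery time from 8 hours to 30 minutes raises availability
// without making failures any rarer.
console.log(availability(720, 8));   // roughly 0.989
console.log(availability(720, 0.5)); // roughly 0.9993
```

The second call suggests why practicing failure pays off: the faster you recover, the less each failure costs.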
Break more things!
(Ah, okay, that's why it's called that)
- Define a steady state.
- Define a testable hypothesis.
- Introduce real-world events that challenge your steady state.
- Try to disprove your hypothesis through the events.
Principles of Chaos
principlesofchaos.org
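The four steps can be sketched as a tiny experiment runner. This is one possible reading of the principles, not an official tool; `runExperiment`, its option names, and the toy replica set are all assumptions made for illustration.

```javascript
// Runs one chaos experiment:
// 1. confirm the steady state holds before touching anything,
// 2. inject a real-world event,
// 3. check whether the steady state still holds (the hypothesis),
// 4. always undo the event afterwards.
async function runExperiment({ steadyState, injectEvent, rollback }) {
  if (!(await steadyState())) {
    throw new Error('system is not in its steady state; aborting experiment');
  }
  await injectEvent();
  try {
    return { hypothesisHeld: await steadyState() };
  } finally {
    await rollback();
  }
}

// Toy system: a service backed by two replicas.
// Hypothesis: losing one replica keeps at least one available.
const replicas = new Set(['replica-a', 'replica-b']);
runExperiment({
  steadyState: async () => replicas.size >= 1,
  injectEvent: async () => { replicas.delete('replica-a'); },
  rollback: async () => { replicas.add('replica-a'); },
}).then((result) => console.log(result));
```

If `hypothesisHeld` comes back `false`, the experiment has disproved the hypothesis, which is the useful outcome: you found a weakness before your users did.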
Principles of Chaos
principlesofchaos.org
- Write some code that works.
- Hope that it will always work.
- Unleash some chaos.
- See if things break.
Coding Break
git clone -b step_0 https://github.com/FoKo/chaos-client.git
Chat Design
connect
- Attempt a connection
- If connected, make application available
- If not connected, die
Solution: try again.
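"Try again" could look like the sketch below: retry the connection a bounded number of times, waiting longer between attempts. The helper name `connectWithRetry`, its options, and the exponential backoff are assumptions; the workshop repository may solve this differently.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls `connect` until it succeeds or the retries run out.
// `connect` is any function returning a promise for a connection.
// The wait doubles after each failed attempt: 100ms, 200ms, 400ms, ...
async function connectWithRetry(
  connect,
  { retries = 5, baseDelayMs = 100, wait = sleep } = {}
) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        await wait(baseDelayMs * 2 ** attempt);
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

Bounding the retries matters: an application that retries forever hides the outage instead of reporting it.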
frenzie
node app.js => node -r frenzie app.js
- frenzie will trigger random failures in your application
- It requires zero application changes between a non-chaos and chaos environment
- It's super simple to use
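The slides do not show how frenzie injects failures, so the sketch below is not its API. It only illustrates the general idea of random failure injection with a hypothetical wrapper, `withChaos`. Node's `-r` flag preloads a module before the application starts, which is how a tool can hook in without application changes.

```javascript
// Wraps an async function so that a fraction of calls fail on purpose.
// `failureRate` is the probability of an injected failure (0 to 1).
// `random` is injectable so the behaviour can be made deterministic in tests.
function withChaos(fn, { failureRate = 0.2, random = Math.random } = {}) {
  return async function chaotic(...args) {
    if (random() < failureRate) {
      throw new Error('chaos: injected failure');
    }
    return fn(...args);
  };
}

// Hypothetical usage: roughly one in five lookups fails.
const fetchUser = async (id) => ({ id, name: 'example' });
const flakyFetchUser = withChaos(fetchUser, { failureRate: 0.2 });

flakyFetchUser(1)
  .then((user) => console.log('ok', user))
  .catch((err) => console.log('failed:', err.message));
```

Unlike a preloaded module, this wrapper does require touching application code, so treat it as a learning aid rather than a replacement for the tool.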
Chaos Break!
Circuit Breakers!
Open State
Circuit Breakers!
Closed State
Circuit Breakers!
Half-Open State
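The three states can be sketched in a small class. In the usual reading of the pattern, a closed circuit lets calls through, an open circuit fails fast without calling the dependency, and a half-open circuit lets a trial call through to see whether the dependency has recovered. The class name, thresholds, and option names below are assumptions, not the workshop's implementation.

```javascript
class CircuitBreaker {
  constructor(action, { failureThreshold = 3, resetTimeoutMs = 1000, now = Date.now } = {}) {
    this.action = action;
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now; // injectable clock, handy for tests
    this.state = 'closed';
    this.failures = 0;
    this.openedAt = 0;
  }

  async call(...args) {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        // Open: fail fast and give the dependency time to recover.
        throw new Error('circuit open: failing fast');
      }
      // Timeout elapsed: allow a trial call.
      this.state = 'half-open';
    }
    try {
      const result = await this.action(...args);
      // Success closes the circuit and clears the failure count.
      this.state = 'closed';
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial call, or too many failures in a row, opens the circuit.
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

This sketch does not limit concurrent trial calls while half-open; a production breaker would usually let only one through at a time.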
What does this look like @ Foko/Real World?
Coding Break!
Let's talk about disconnect
- "The network is always reliable."
- It's not.
- A disconnect/timeout can occur at any time
- Message sending & connection attempts should be isolated
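One way to isolate sending from connecting is an outbox: callers only enqueue messages, and the queue drains whenever a connection is available. The `Outbox` class and its method names are assumptions for illustration, not the workshop's code.

```javascript
class Outbox {
  constructor() {
    this.queue = [];       // messages waiting to be delivered, oldest first
    this.send = null;      // set while connected, null while disconnected
    this.flushing = false; // guards against two drains running at once
  }

  // Callers never touch the connection; they only enqueue.
  enqueue(message) {
    this.queue.push(message);
    return this.flush();
  }

  // The connection logic reports state changes here.
  onConnect(send) {
    this.send = send;
    return this.flush();
  }

  onDisconnect() {
    this.send = null;
  }

  async flush() {
    if (this.flushing) return;
    this.flushing = true;
    try {
      while (this.send && this.queue.length > 0) {
        try {
          await this.send(this.queue[0]);
        } catch (err) {
          // Treat a failed send as a disconnect; the message stays queued.
          this.send = null;
          break;
        }
        this.queue.shift();
      }
    } finally {
      this.flushing = false;
    }
  }
}
```

A message leaves the queue only after its send resolves, so a disconnect in the middle of a send delays delivery rather than losing the message. The trade-off is that a message may be delivered twice if the send succeeded but the acknowledgement was lost.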
The Broker Pattern
How do we use this @ Foko Retail?
- All major units of business logic are isolated into "jobs"
- We request the Broker to execute a job for us and pass it a "context"
- The broker ensures that the job chain eventually succeeds
- The caller receives no guarantee of success and is not blocked waiting for the job
- All work is independent of other work, so failure is isolated
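The bullets above can be sketched as a small in-memory broker. This is an illustration only and not Foko's implementation: the `Broker` class, its options, and the job name are assumptions, and a real broker would persist jobs so they survive a restart.

```javascript
class Broker {
  constructor({ maxAttempts = 5, retryDelayMs = 100, onGiveUp = () => {} } = {}) {
    this.jobs = new Map();
    this.maxAttempts = maxAttempts;
    this.retryDelayMs = retryDelayMs;
    this.onGiveUp = onGiveUp; // called when a job exhausts its attempts
  }

  // A job is a named async function that receives a context object.
  register(name, handler) {
    this.jobs.set(name, handler);
  }

  // Fire and forget: returns immediately, with no promise of success.
  request(name, context) {
    if (!this.jobs.has(name)) {
      throw new Error(`unknown job: ${name}`);
    }
    setImmediate(() => this.run(name, context, 1));
  }

  // Retries a failing job on a timer, so one failure never blocks the
  // caller or any other job.
  async run(name, context, attempt) {
    try {
      await this.jobs.get(name)(context, this);
    } catch (err) {
      if (attempt >= this.maxAttempts) {
        this.onGiveUp(name, context, err);
        return;
      }
      setTimeout(() => this.run(name, context, attempt + 1), this.retryDelayMs);
    }
  }
}

// Hypothetical usage: the caller moves on right after the request.
const broker = new Broker();
broker.register('send-welcome-message', async (context) => {
  console.log(`welcoming user ${context.userId}`);
});
broker.request('send-welcome-message', { userId: 42 });
```

Handlers receive the broker as a second argument so a job can request the next job in its chain. The attempt limit is a simplification: "eventually succeeds" in the slides implies retrying for as long as it takes.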
Coding Break!
Questions?
Break more things (workshop)
By Karim Alibhai
A workshop about resiliency patterns & failure injection in Node.js.