Break more things with Chaos Engineering

Karim Alibhai

github.com/karimsa

Richard Ison

github.com/richardison

npm install karimsa
npm install richard

Delivering the art & science of retail execution

We're hiring

fokoretail.com

Requirements

  • Know some Node.js

  • Have Node.js installed (any version is okay) - use nvm

  • Have internet access

  • You're building an application

  • There will be errors

  • There will be unexpected errors

  • The errors are not the user's problem, they're your problem

How do we aim for high availability?

Availability = Time to Failure / (Time to Failure + Time to Recovery)
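As a worked example of the ratio above, here is a small sketch. The function name and the sample numbers are illustrative assumptions, not figures from the workshop. It shows that, for the same failure rate, shortening recovery time raises availability.

```javascript
// Availability as the fraction of time the system is up.
// Both arguments must use the same unit (hours in this example).
function availability(timeToFailure, timeToRecovery) {
  return timeToFailure / (timeToFailure + timeToRecovery);
}

// Hypothetical numbers: one failure every 99 hours, 1 hour to recover.
const slowRecovery = availability(99, 1);

// Same failure rate, but recovery takes 6 minutes (0.1 hours).
const fastRecovery = availability(99, 0.1);

console.log(slowRecovery.toFixed(4)); // 0.9900
console.log(fastRecovery.toFixed(4)); // 0.9990
```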

Break more things!

(Ah, okay, that's why it's called that)

  1. Define a steady state.

  2. Define a testable hypothesis.

  3. Introduce real-world events that challenge your steady state.

  4. Try to disprove your hypothesis through those events.
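The four steps above can be sketched as a tiny experiment runner. Everything here is hypothetical (the runExperiment function, the toy "replicas" system, and the injected failure). It only illustrates the shape of an experiment, not a real chaos tool.

```javascript
// Hypothetical experiment runner; none of these names come from the workshop code.
function runExperiment({ system, steadyState, injectFailure }) {
  // Step 1: the steady state must hold before we break anything.
  if (!steadyState(system)) {
    throw new Error('system is not in a steady state to begin with');
  }
  // Step 3: introduce an event that challenges the steady state.
  injectFailure(system);
  // Steps 2 and 4: the hypothesis is "the steady state still holds".
  return { hypothesisHeld: steadyState(system) };
}

// Toy system (an assumption for illustration): two replicas,
// where "steady" means at least one replica is healthy.
const system = { replicas: [{ healthy: true }, { healthy: true }] };

const result = runExperiment({
  system,
  steadyState: (s) => s.replicas.some((r) => r.healthy),
  injectFailure: (s) => { s.replicas[0].healthy = false; },
});

console.log(result);
```

If the hypothesis does not hold, the experiment has found a weakness worth fixing, which is the point of running it.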

Principles of Chaos

principlesofchaos.org

  1. Write some code that works.

  2. Hope that it will always work.

  3. Unleash some chaos.

  4. See if things break.

Coding Break

git clone -b step_0 https://github.com/FoKo/chaos-client.git

Chat Design

connect

  1. Attempt a connection

  2. If connected, make application available

  3. If not connected, die

Solution: try again.
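One way to "try again" is a retry loop with exponential backoff. This is a minimal sketch, not the code from the chaos-client repository. The names connectWithRetry and flakyConnect, the retry count, and the delays are all assumptions made for illustration.

```javascript
// Retry an async connection attempt with exponential backoff.
// `connect` is any function that returns a promise.
async function connectWithRetry(connect, { retries = 5, baseDelayMs = 100 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      if (attempt === retries) break;
      // Wait baseDelayMs, then 2x, then 4x, ... before the next attempt.
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  // Every attempt failed: surface the last error instead of dying silently.
  throw lastError;
}

// Usage with a fake connection that fails twice before succeeding.
let attempts = 0;
const flakyConnect = async () => {
  attempts += 1;
  if (attempts < 3) throw new Error('connection refused');
  return { connected: true };
};

connectWithRetry(flakyConnect, { baseDelayMs: 10 })
  .then(() => console.log(`connected after ${attempts} attempts`));
```

Bounding the number of retries keeps a permanently unreachable server from turning into an infinite loop.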

frenzie

node app.js
=>
node -r frenzie app.js

  • frenzie will trigger random failures in your application
  • It requires zero application changes between a non-chaos and chaos environment
  • It's super simple to use
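To give a feel for how failure injection without application changes can work, here is a sketch of a wrapper that makes a method fail at random. This is NOT frenzie's implementation or API. The injectFailures helper and the toy service object are hypothetical.

```javascript
// Hypothetical failure injector (not frenzie's actual code).
// Replaces target[methodName] with a wrapper that throws with
// probability `failureRate`, and returns a function that undoes the patch.
function injectFailures(target, methodName, failureRate, random = Math.random) {
  const original = target[methodName];
  target[methodName] = function chaosWrapper(...args) {
    if (random() < failureRate) {
      throw new Error(`chaos: injected failure in ${methodName}`);
    }
    return original.apply(this, args);
  };
  return () => { target[methodName] = original; };
}

// Toy service standing in for real application code.
const service = { fetchUser: (id) => ({ id }) };

// A failure rate of 1 means "always fail", which makes the demo deterministic.
const restore = injectFailures(service, 'fetchUser', 1);
try {
  service.fetchUser(1);
} catch (err) {
  console.log(err.message);
}

// After restoring, the original behaviour is back.
restore();
console.log(service.fetchUser(1));
```

Node's -r (--require) flag preloads a module before the application starts, so a file like this could patch shared modules ahead of time while app.js itself stays unchanged.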

Chaos Break!

Circuit Breakers!

Open State

Closed State

Half-Open State
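The three states can be sketched as a small class. This is a minimal illustration under stated assumptions, not the workshop's or Foko's implementation. The CircuitBreaker name, the failure threshold, and the reset timeout are all hypothetical. Closed lets calls through, open fails fast, and half-open lets a trial call through after the timeout to probe for recovery.

```javascript
// Minimal circuit breaker sketch (hypothetical names and defaults).
class CircuitBreaker {
  constructor(action, { failureThreshold = 3, resetTimeoutMs = 1000, now = Date.now } = {}) {
    this.action = action;
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now; // injectable clock, handy for tests
    this.state = 'closed';
    this.failures = 0;
    this.openedAt = 0;
  }

  async call(...args) {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        // Open: do not touch the failing dependency at all.
        throw new Error('circuit open: failing fast');
      }
      // Timeout elapsed: allow a trial call through.
      this.state = 'half-open';
    }
    try {
      const result = await this.action(...args);
      // Success closes the circuit and clears the failure count.
      this.state = 'closed';
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial call, or too many failures, opens the circuit.
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}

// Usage: wrap any promise-returning function.
const breaker = new CircuitBreaker(async () => 'pong');
breaker.call().then((reply) => console.log(reply, breaker.state));
```

This sketch does not limit concurrent trial calls in the half-open state. A production breaker would usually let only one through at a time.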

What does this look like @ Foko/Real World?

Coding Break!

Let's talk about disconnect

  • "The network is always reliable."

  • It's not.

  • A disconnect/timeout can occur at any time

  • Message sending & connection attempts should be isolated
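One way to isolate message sending from connection attempts is an outbox: callers only enqueue, and the queue drains whenever a connection is available. This is a sketch under assumptions. The Outbox class and the fake socket with a write() method are hypothetical and not taken from the chaos-client code.

```javascript
// Hypothetical outbox: sending never depends on the connection being up.
class Outbox {
  constructor() {
    this.queue = [];
    this.socket = null;
  }

  // Callers enqueue and return immediately, connected or not.
  send(message) {
    this.queue.push(message);
    this.flush();
  }

  // The connection logic reports state changes separately.
  onConnect(socket) {
    this.socket = socket;
    this.flush();
  }

  onDisconnect() {
    this.socket = null;
  }

  flush() {
    while (this.socket && this.queue.length > 0) {
      try {
        this.socket.write(this.queue[0]);
      } catch (err) {
        // Treat a failed write as a disconnect and keep the message queued.
        this.socket = null;
        return;
      }
      // Only drop the message once the write did not throw.
      this.queue.shift();
    }
  }
}

// Usage with a fake socket that records what was written.
const sent = [];
const fakeSocket = { write: (message) => sent.push(message) };

const outbox = new Outbox();
outbox.send('hello');          // queued: no connection yet
outbox.onConnect(fakeSocket);  // drains the queue
outbox.onDisconnect();
outbox.send('still here?');    // queued again until the next connect
console.log(sent, outbox.queue);
```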

The Broker Pattern

How do we use this @ Foko Retail?

  • All major units of business logic are isolated into "jobs"

  • We ask the broker to execute a job for us and pass it a "context"

  • The broker ensures that the job chain eventually succeeds

  • The caller receives no guarantee of success and is not blocked waiting for the job

  • All work is independent of other work - failure is isolated
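The points above can be sketched as a small in-process broker. This is an illustration only, not Foko's broker. The Broker class, its method names, the retry settings, and the 'resize-image' job are hypothetical. A real broker would typically persist jobs so that they survive a process restart.

```javascript
// Hypothetical in-memory broker: callers dispatch and move on,
// the broker keeps retrying the job in the background.
class Broker {
  constructor({ retryDelayMs = 100, maxAttempts = 5 } = {}) {
    this.jobs = new Map();
    this.retryDelayMs = retryDelayMs;
    this.maxAttempts = maxAttempts;
  }

  register(name, handler) {
    this.jobs.set(name, handler);
  }

  // Fire-and-forget: returns before the job has run.
  dispatch(name, context) {
    if (!this.jobs.has(name)) throw new Error(`unknown job: ${name}`);
    setImmediate(() => this.run(name, context, 1));
  }

  async run(name, context, attempt) {
    try {
      await this.jobs.get(name)(context);
    } catch (err) {
      if (attempt >= this.maxAttempts) {
        // Bounded in this sketch so a broken job cannot retry forever.
        console.error(`job ${name} gave up after ${attempt} attempts:`, err.message);
        return;
      }
      // The failure stays inside the broker; the caller never sees it.
      setTimeout(() => this.run(name, context, attempt + 1), this.retryDelayMs);
    }
  }
}

// Usage: a job that fails once, then succeeds on the retry.
const broker = new Broker({ retryDelayMs: 10 });
let tries = 0;
broker.register('resize-image', async (context) => {
  tries += 1;
  if (tries === 1) throw new Error('temporary failure');
  console.log(`resized ${context.file} on attempt ${tries}`);
});

broker.dispatch('resize-image', { file: 'photo.png' });
console.log('caller continues without waiting');
```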

Coding Break!

Questions?

Break more things (workshop)

By Karim Alibhai

A workshop about resiliency patterns & failure injection in Node.js.
