Circuit breaker pattern

Radim Štěpaník

⚡️Circuit breaker

When you should consider it

  • you run in a distributed environment
  • you want to fail fast
  • you need fault tolerance

Mistakes are everywhere

  • application-level errors
  • network errors
  • service outages
  • datacenter outages

Typical scenario

[Diagram: a gateway (GW) together with services A, B and C; one part of the system catches fire 🔥]

Something starts burning

Retry

  • Just retry, regardless of the conditions
  • Improvement: spread the retries over time and wait a while between attempts
  • It works well as long as the calls eventually succeed
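
The "wait for a while" improvement is often done with exponential backoff plus random jitter. The following is a minimal sketch under that assumption, not something shown on the slides; `backoffDelay`, `retryWithBackoff` and all option names are hypothetical.

```javascript
// Default sleep helper: resolves after `ms` milliseconds.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before the given retry attempt (0-based): exponential growth, capped,
// with random jitter so that many clients do not retry at the same instant.
function backoffDelay(attempt, { baseDelayMs = 100, maxDelayMs = 5000, random = Math.random } = {}) {
    const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
    return Math.floor(random() * ceiling);
}

// Calls `operation` until it succeeds or `retries` extra attempts are used up.
async function retryWithBackoff(operation, { retries = 3, sleep = defaultSleep, ...delayOptions } = {}) {
    let lastError;
    for (let attempt = 0; attempt <= retries; attempt++) {
        try {
            return await operation();
        } catch (error) {
            lastError = error;
            if (attempt < retries) {
                await sleep(backoffDelay(attempt, delayOptions));
            }
        }
    }
    throw lastError; // every attempt failed
}
```

Even with backoff, every caller keeps sending requests to a service that is already struggling, which is the problem the following slides illustrate.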

[Diagram: retries 💥 👉 keep hitting GW and services A, B and C while one part is on fire 🔥]

How will it turn out?

[Diagram sequence: with every step more fires 🔥 appear around GW, A, B and C, until the whole system is burning]

Complete disaster

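What could CB do for you?

[Diagram sequence: GW with services A, B and C. While something is on fire 🔥 the breaker is Open; later it goes Half-open and then Closed again.]

Open: we are waiting for recovery

After a while the CB goes half-open, lets a trial request through, and closes itself again if that request succeeds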
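
The Open, Half-open and Closed states can be sketched as a small state tracker. This is an illustrative "own implementation", not the API of any library mentioned in this deck; `BreakerState`, `failureThreshold`, `resetTimeoutMs` and the injectable `now` clock are hypothetical names, and a simple failure count stands in for a real error-rate calculation.

```javascript
// Minimal circuit breaker state tracker (hypothetical sketch).
class BreakerState {
    constructor({ failureThreshold = 3, resetTimeoutMs = 30000, now = Date.now } = {}) {
        this.failureThreshold = failureThreshold;
        this.resetTimeoutMs = resetTimeoutMs;
        this.now = now;
        this.state = 'closed';
        this.failures = 0;
        this.openedAt = 0;
    }

    // Should the next call be let through to the protected service?
    allowRequest() {
        if (this.state === 'open' && this.now() - this.openedAt >= this.resetTimeoutMs) {
            this.state = 'half-open'; // reset timeout elapsed: allow trial requests
            return true;
        }
        return this.state !== 'open';
    }

    // A successful call closes the breaker and clears the failure count.
    recordSuccess() {
        this.failures = 0;
        this.state = 'closed';
    }

    // A failed trial call, or too many failures, opens the breaker.
    recordFailure() {
        this.failures += 1;
        if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
            this.state = 'open';
            this.openedAt = this.now();
        }
    }
}
```

A caller would check `allowRequest()` before each call, skip the call (or use a fallback) when it returns `false`, and report the outcome with `recordSuccess()` or `recordFailure()`.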

Circuit breaker

  • State of CB
    • errors, timeouts
  • Operation - the function being protected
  • Fallback
  • Settings
    • error threshold
    • timeout
    • ....protection parameters - time,
      number of errors, etc.
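
How the operation, the timeout setting and the fallback fit together can be sketched without any library. This is a simplified illustration that leaves out the state tracking; `withTimeout`, `callWithFallback` and `timeoutMs` are hypothetical names.

```javascript
// Rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
    });
    // Whichever settles first wins; the timer is always cleaned up.
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Runs `operation` with a time limit and uses `fallback` on any failure.
async function callWithFallback(operation, { timeoutMs = 3000, fallback } = {}) {
    try {
        return await withTimeout(operation(), timeoutMs);
    } catch (error) {
        if (typeof fallback === 'function') {
            return fallback(error); // degraded but usable result
        }
        throw error; // no fallback configured: propagate the failure
    }
}
```

Note that the timeout only stops the caller from waiting; the underlying operation is not cancelled in this sketch.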

Support

  • your own implementation
  • libraries
    • Node.js - https://www.npmjs.com/package/opossum
    • Java - https://github.com/resilience4j/resilience4j
    • PHP - https://github.com/upwork/phystrix
    • ....

Code example

import CircuitBreaker from 'opossum';

function asyncFunctionThatCouldFail(x, y) {
    return new Promise((resolve, reject) => {
        // Do something, maybe on the network or a disk
    });
}

const options = {
    timeout: 3000, // If our function takes longer than 3 seconds, trigger a failure
    errorThresholdPercentage: 50, // When 50% of requests fail, trip the circuit
    resetTimeout: 30000, // After 30 seconds, try again.
};
const breaker = new CircuitBreaker(asyncFunctionThatCouldFail, options);

// x and y are placeholders for the arguments of the wrapped function
breaker.fire(x, y).then(console.log).catch(console.error);

Code example

import CircuitBreaker from 'opossum';

function asyncFunctionThatCouldFail(x, y) {
    return new Promise((resolve, reject) => {
        // Do something, maybe on the network or a disk
    });
}

const options = {
    timeout: 3000, // If our function takes longer than 3 seconds, trigger a failure
    errorThresholdPercentage: 50, // When 50% of requests fail, trip the circuit
    resetTimeout: 30000, // After 30 seconds, try again.
};
const breaker = new CircuitBreaker(asyncFunctionThatCouldFail, options)
    .fallback(() => 'do something else');    

// x and y are placeholders; with a fallback set, its value is returned when the call fails
breaker.fire(x, y).then(console.log).catch(console.error);

Use cases

  • IO operations
    • HTTP clients
    • database operations - e.g. Redis
    • reading from a shared file
  • High-level logic
    • setting up fallback mechanisms
    • 💡Usage with GraphQL?

Things you can't rely on

Use case

[Diagram: Products, Categories and Banners are served through a GraphQL gateway (GQL GW) backed by 50+ services, including search, customer, cart, bestsellers, banners, menu, category, products, filters and content]

    # Calls category service
    getCategory(categoryUrl: $categoryUrl) {
      id
      howto
      # Calls content service
      ... on ContentCategory {
        content {
          id
          title
          body
        }
      }
      # Calls product service
      ... on ProductCategory {
        productCollection {
          items {
            ... on Product {
              ...productForList
            }
            # Calls estimated delivery service
            ... on BonusSet {
              estimatedDeliveries {
                ...productEstimatedDeliveryFragment
              }
            }
            # Calls banners service
            ... on SectionBannerSlideImage {
              ...sectionBannerSlide
            }
          }
        }
      }
    }
  
// 👉 every resolver has a unique name, e.g. Query:category or Category:productCollection
const resolvers = {
    Query: {
        category: resolveCategory,
    },
    Category: {
        productCollection: resolveProductCollection,
    },
};

// 🌍 global configuration of specific resolvers
export const resolverConfig: Map<string, ResolverConfig> = new Map<string, ResolverConfig>()
    .set('Query:category', { timeout: 3000 })
    .set('Category:productCollection', { timeout: 3000, fallback: () => [] });

// 👷 A higher-order function wraps every resolver in a circuit breaker
const resolversWithCircuitBreakers = mapResolvers(
    resolvers,
    // called for each resolver together with its unique name
    ({ resolver, name }: CreateResolverInput): IFieldResolver<unknown, unknown, unknown> => {
        const { ...breakerDefaultConfig } = resolverConfig.get(name) ?? {};
        const config = { name, ...breakerDefaultConfig };
        const breaker = createCircuitBreaker(config, resolver);

        return async (...params: ResolverParams): Promise<unknown> => breaker.fire(...params);
    },
);
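
The slide does not show `mapResolvers` itself. Below is a minimal sketch of what such a helper could look like, assuming a plain `{ TypeName: { fieldName: resolver } }` map and names of the form `Type:field`. To stay self-contained it uses a simple try/catch wrapper instead of a real circuit breaker; `exampleConfig` and `withFallback` are hypothetical.

```javascript
// Hypothetical mapResolvers: walks { TypeName: { fieldName: resolver } } and
// replaces each resolver with whatever `wrap` returns for it.
function mapResolvers(resolvers, wrap) {
    const wrapped = {};
    for (const [typeName, fields] of Object.entries(resolvers)) {
        wrapped[typeName] = {};
        for (const [fieldName, resolver] of Object.entries(fields)) {
            const name = `${typeName}:${fieldName}`; // e.g. "Query:category"
            wrapped[typeName][fieldName] = wrap({ resolver, name });
        }
    }
    return wrapped;
}

// Per-resolver configuration looked up by the unique name.
const exampleConfig = new Map([['Category:productCollection', { fallback: () => [] }]]);

// Stand-in for the circuit breaker: applies the configured fallback on failure.
const withFallback = ({ resolver, name }) => async (...params) => {
    try {
        return await resolver(...params);
    } catch (error) {
        const { fallback } = exampleConfig.get(name) ?? {};
        if (!fallback) throw error;
        return fallback();
    }
};
```

With this shape, a failing `productCollection` resolver yields an empty list while the rest of the category response can still be returned.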

Failsafe solution

Monitoring Hystrix

Monitoring Prometheus/Grafana

  • SMS notifications
  • Teams notifications
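
One possible way to feed such dashboards and alerts is to count breaker events and expose them in the Prometheus text format. The sketch below is an assumption about how that could be done, not the setup used in the talk; the `BreakerMetrics` class, the metric name `circuit_breaker_events_total` and its labels are made up for illustration.

```javascript
// Counts breaker events per breaker name and renders them in the Prometheus
// text exposition format. Label values are not escaped in this sketch.
class BreakerMetrics {
    constructor() {
        this.counts = new Map(); // key "name|event" -> number of occurrences
    }

    // Record one event (e.g. "open", "fallback") for the named breaker.
    record(name, event) {
        const key = `${name}|${event}`;
        this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
    }

    // Text body that a /metrics endpoint could return.
    render() {
        const lines = [
            '# HELP circuit_breaker_events_total Circuit breaker events by breaker and type.',
            '# TYPE circuit_breaker_events_total counter',
        ];
        for (const [key, count] of this.counts) {
            const [name, event] = key.split('|');
            lines.push(`circuit_breaker_events_total{breaker="${name}",event="${event}"} ${count}`);
        }
        return lines.join('\n') + '\n';
    }
}
```

An alert rule on the rate of `open` events could then drive the SMS or Teams notifications.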

Is it the answer to everything?

  • increases application complexity
    • timeouts must be set correctly
    • must be defined in the code
  • increases processing requirements

👋 Thanks for your attention

Allegro Circuit breaker

By Radim Štěpaník
