Tackling transactions in Microservices applications

Bio

Rubén Pérez

Software Engineer at Schibsted Spain

~~Java champion~~

@bakwrau

Author of 0 books

Agenda

ACID Transactions
- They are not an option in a distributed system

Sagas
- Why they are a good option

Two Phase Commit
- Why we should avoid it

Monolith

Transactions in a Monolith

What is

ACID?

ACID

Set of properties of database transactions intended to guarantee validity even in the event of errors

ACID

Atomicity

Consistency

Isolation

Durability
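
As a minimal sketch of what atomicity gives us inside a single database, the example below uses Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper and the account names are illustrative assumptions, not something from the talk.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # debit fails, credit is undone
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

The second transfer credits `bob` first and only then fails on the debit, so the unchanged balance of `bob` is what shows the rollback at work.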

So far so good...

If the product is successful, it will keep growing.

And getting worse…


Problems arise

High Coupling

Too large for a single developer to understand

Slow day to day development

Spaghetti code / big ball of mud

Delayed deployments

Reliability

Long term commitment to a technology stack

We would like

Smaller code base

Less code complexity, faster to develop and easier to understand

Minimize cost of change

Different responsibilities are placed in different services

Deployed independently

Better scaling

So we move to Microservices

Microservices

Everything is a trade-off

We don't have ACID anymore

2 Phase Commit

Specialized type of consensus protocol

It is a distributed algorithm that coordinates all the processes that participate in a distributed atomic transaction on whether to commit or abort (rollback) the transaction

Commit Request Phase

Coordinator

Query to commit

Commit Phase

Coordinator

Query to commit

Yes

Commit

Ack

Commit Phase

Coordinator

Query to commit

Yes

Rollback

Ack
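
The two phases above can be sketched in a few lines. This is an in-process toy under stated assumptions: the `Participant` class and `two_phase_commit` function are hypothetical names, and there is no network, no timeout handling and no coordinator recovery, which is exactly where the real difficulties lie.

```python
class Participant:
    """Hypothetical participant that votes in phase 1 and obeys in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Commit request phase: vote, then hold locks until the outcome arrives.
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self):
        self.state = "committed"
        return "ack"

    def rollback(self):
        self.state = "rolled_back"
        return "ack"


def two_phase_commit(participants):
    """Coordinator: query every participant, then broadcast one decision."""
    votes = [p.prepare() for p in participants]      # phase 1: query to commit
    if all(votes):
        acks = [p.commit() for p in participants]    # phase 2: commit
        outcome = "committed"
    else:
        acks = [p.rollback() for p in participants]  # phase 2: rollback
        outcome = "rolled_back"
    assert all(ack == "ack" for ack in acks)
    return outcome


happy = [Participant("orders"), Participant("payments")]
unhappy = [Participant("orders"), Participant("payments", can_commit=False)]
happy_outcome = two_phase_commit(happy)
unhappy_outcome = two_phase_commit(unhappy)
```

Note that every prepared participant stays blocked between `prepare` and the final decision, which is the cost discussed in the cons.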

2 Phase Commit Pros

Provides atomicity: either every participant commits, or no commit is executed at all

Distributed transactions are very appealing from a developer’s point of view

2 Phase Commit Cons

It’s a blocking protocol

2PC coordinator is a Single Point of Failure

O(n^2) messages worst case

Reduced throughput due to locks, and the whole transaction is as slow as the slowest machine

2PC impacts availability (the availability of a transaction is the product of the availabilities of all its participants)
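
To make the availability point concrete, here is a small calculation. It assumes participants fail independently and uses an illustrative figure of 99.9% per participant; `transaction_availability` is a hypothetical helper name.

```python
import math


def transaction_availability(participant_availabilities):
    """Product of the participants' availabilities (assumes independent failures)."""
    return math.prod(participant_availabilities)


three = transaction_availability([0.999] * 3)   # roughly 99.7%
ten = transaction_availability([0.999] * 10)    # roughly 99.0%
```

Under these assumptions, every participant added to the transaction lowers the availability of the whole.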

Can we do better?

Sagas

1987

Sagas are long-lived transactions that can be broken up into a sequence of relatively independent sub-transactions that can be interleaved

Either all transactions in the sequence complete successfully, or compensating transactions are run to amend a partial execution

To amend partial executions, each saga transaction $T_i$ should be provided with a compensating transaction $C_i$

The compensating transaction semantically undoes any of the actions performed by $T_i$

Guarantee: one of these two sequences is executed

$T_1$, $T_2$, ... $T_n$

$T_1$, $T_2$, ... $T_j$, $C_j$, ... $C_2$, $C_1$

0 <= j < n
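
The guarantee can be illustrated with a tiny in-process runner. This is a sketch under stated assumptions, not the talk's implementation: `run_saga`, the step names and the trace format are all hypothetical, and the failing step is assumed to leave nothing behind that needs compensating.

```python
def run_saga(steps):
    """Run (name, transaction, compensation) steps in order.

    Returns a trace that is either T_1..T_n or T_1..T_j, C_j..C_1.
    """
    trace = []
    completed = []
    for name, transaction, compensation in steps:
        try:
            transaction()
        except Exception:
            # T_{j+1} failed: compensate the completed steps in reverse order.
            for done_name, done_compensation in reversed(completed):
                done_compensation()
                trace.append(f"C:{done_name}")
            return trace
        trace.append(f"T:{name}")
        completed.append((name, compensation))
    return trace


def fail():
    raise RuntimeError("no seats left")


def noop():
    return None


trace = run_saga([
    ("car", noop, noop),
    ("hotel", noop, noop),
    ("flight", fail, noop),   # fails, so j = 2 here
    ("payment", noop, noop),  # never reached
])
```

Here `trace` ends up as `T:car`, `T:hotel`, `C:hotel`, `C:car`, which is the second sequence with j = 2 and n = 4.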

Trips

Sagas are a

Failure Management Pattern

Sagas vs 2PC

A saga does not have ACID guarantees

Is not atomic
Does not provide strict serializability

The trade-off -> availability

Choreography vs Orchestration

After $T_i$ completes, some code has to decide what to execute next

$$ T_{i+1}$$
$$ C_{i-1}$$

Choreography

Distributed decision making

Pros:
- No extra component
- Flexibility to add new saga (context passing)

Cons:
- Each service has to know what to do next (beyond its scope)
- Service coupling: the logic about the saga is scattered throughout the system
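
A choreographed saga can be sketched with an in-memory event bus. Everything here is an assumption made for illustration: the `EventBus` class, the event names and the `wire_services` helper stand in for real services talking over a message broker.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical in-memory stand-in for a message broker."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.history = []

    def subscribe(self, event, handler):
        self.handlers[event].append(handler)

    def publish(self, event, payload):
        self.history.append(event)
        for handler in list(self.handlers[event]):
            handler(payload)


def wire_services(bus, flight_available=True):
    # Each "service" only reacts to events and emits its own.
    def book_car(trip):
        bus.publish("CarBooked", trip)

    def book_hotel(trip):
        bus.publish("HotelBooked", trip)

    def book_flight(trip):
        bus.publish("FlightBooked" if flight_available else "FlightFailed", trip)

    def cancel_hotel(trip):
        bus.publish("HotelCancelled", trip)

    def cancel_car(trip):
        bus.publish("CarCancelled", trip)

    # The saga itself exists only as this scattered set of subscriptions.
    bus.subscribe("TripRequested", book_car)
    bus.subscribe("CarBooked", book_hotel)
    bus.subscribe("HotelBooked", book_flight)
    bus.subscribe("FlightFailed", cancel_hotel)
    bus.subscribe("HotelCancelled", cancel_car)


bus = EventBus()
wire_services(bus, flight_available=False)
bus.publish("TripRequested", {"trip_id": 1})
```

No component owns the flow: the hotel service has to know that a failed flight means it must cancel, which is the coupling listed in the cons.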

Orchestration

Centralized decision making component

Pros:
- Visibility of processes
- Ease of management
- Cohesion
- Better composability

Cons:
- Need for a new component/implementation
- Need to write new code in that component in order to support new sagas

Two typical ways of implementing this:

The orchestrator is in an already existing component

The orchestrator is a brand new component

Saga Execution Coordinator

SEC

Distributed/Durable Log
- Fault tolerant and highly available

SEC

Saga Execution Coordinator

Compensating requests:

Must be idempotent

Cannot abort (they must always complete; they cannot report that the task was not done)

Requests:

Should be commutative with the compensating requests

Can abort

Must be idempotent *

Book

Cancel

Book
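
One way to get these properties is to key every request by an identifier and let a cancellation win regardless of arrival order. The sketch below is an assumption-laden illustration: `HotelService`, `request_id` and the status strings are hypothetical names.

```python
class HotelService:
    """Hypothetical service whose book/cancel requests tolerate retries and reordering."""

    def __init__(self):
        self.status = {}  # request_id -> "booked" | "cancelled"

    def book(self, request_id):
        # Idempotent: a repeated delivery leaves the state unchanged.
        # Commutative with cancel: a booking that arrives after its
        # cancellation does not resurrect the reservation.
        if self.status.get(request_id) != "cancelled":
            self.status[request_id] = "booked"
        return self.status[request_id]

    def cancel(self, request_id):
        # Cannot abort: succeeds even if the booking was never seen.
        self.status[request_id] = "cancelled"
        return self.status[request_id]


in_order = HotelService()
in_order.book("r1")
in_order.book("r1")      # duplicate delivery
in_order.cancel("r1")
in_order.cancel("r1")    # duplicate delivery

reordered = HotelService()
reordered.cancel("r2")   # the cancel overtakes the booking
reordered.book("r2")     # late booking is ignored
```

Both services end in the same state for their request, so the coordinator can safely retry or send a compensation before knowing whether the original request arrived.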

START SAGA

START CAR

END CAR

START HOTEL

END HOTEL

START FLIGHT

END FLIGHT

START PAYMENT

END PAYMENT

END SAGA

SEC

START SAGA

START CAR

END CAR

START HOTEL

END HOTEL

START FLIGHT

ABORT FLIGHT

COMP HOTEL

COMP CAR

END SAGA

SEC

START SAGA

START CAR

END CAR

START HOTEL

END HOTEL

START FLIGHT

END FLIGHT

START PAYMENT

END PAYMENT

END SAGA

SEC

START SAGA

START CAR

END CAR

START HOTEL

START FLIGHT

ABORT FLIGHT

END HOTEL

COMP HOTEL

COMP CAR

END SAGA

SEC

SEC Failure

It is not a Single Point of Failure

The state is in the log, not in the SEC

SEC Failure

Just spin up new machines.

All executed $T_i$ have completed (Start & End logged)

To resume previous work:

Any executed $T_i$ not completed (Start but not End logged)

Any Aborted $T_i$
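
Because the state lives in the log, a fresh SEC can decide what to do just by reading it. The function below is one possible reading of these rules, not the talk's implementation: `recovery_plan` is a hypothetical name, the log entries reuse the `START`/`END`/`ABORT`/`COMP` format of the earlier slides, and it assumes that an unfinished or aborted request triggers a rollback.

```python
def recovery_plan(log):
    """Return (mode, requests_to_compensate) for a saga log."""
    if "END SAGA" in log:
        return ("done", [])
    started, ended, aborted, compensated = [], set(), set(), set()
    for entry in log:
        verb, _, name = entry.partition(" ")
        if name == "SAGA":
            continue
        if verb == "START":
            started.append(name)
        elif verb == "END":
            ended.add(name)
        elif verb == "ABORT":
            aborted.add(name)
        elif verb == "COMP":
            compensated.add(name)
    unfinished = [n for n in started if n not in ended and n not in aborted]
    if not unfinished and not aborted:
        # Every started request has ended: safe to carry on with the next one.
        return ("forward", [])
    # Otherwise roll back: compensate, newest first, everything that may have
    # run. This relies on compensations being idempotent and commutative.
    pending = [
        n for n in reversed(started)
        if n not in aborted and n not in compensated
    ]
    return ("backward", pending)


safe = recovery_plan(["START SAGA", "START CAR", "END CAR"])
unsafe = recovery_plan(["START SAGA", "START CAR", "END CAR", "START HOTEL"])
mid_rollback = recovery_plan([
    "START SAGA", "START CAR", "END CAR", "START HOTEL", "END HOTEL",
    "START FLIGHT", "ABORT FLIGHT", "COMP HOTEL",
])
```

In the `unsafe` case the SEC cannot know whether the hotel request was applied, so it compensates it anyway.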

Wrap up

Higher Cohesion & Composable Services

SEC

Isolation of Complex Code

SEC

Trips

Sagas

Transaction-like flows

Isolation of complex code

Composable service

Higher cohesion in our system