Tackling transactions in Microservices applications
Bio
Rubén Pérez
Software Engineer at Schibsted Spain
Java champion
@bakwrau
- Author of 0 books
Agenda
- ACID Transactions
  - They are not an option in a distributed system
- Sagas
  - Why they are a good option
- Two Phase Commit
  - Why we should avoid it
Monolith
Transactions in a Monolith
What is
ACID?
ACID
Set of properties of database transactions intended to guarantee validity even in the event of errors
ACID
- Atomicity
- Consistency
- Isolation
- Durability
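A minimal sketch of what atomicity gives us inside a monolith with a single database. The `accounts` table, the `transfer` function and the account names are made up for illustration; only the standard `sqlite3` module is used.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts inside one local transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        row = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
        if row is None or row[0] < 0:
            # Raising here undoes the debit above: all or nothing
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

transfer(conn, "alice", "bob", 30)       # succeeds and is committed
try:
    transfer(conn, "alice", "bob", 500)  # fails and is rolled back as a whole
except ValueError:
    pass
```

The failed transfer leaves no partial debit behind. This is the property we lose once the two accounts live in different services with different databases.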
So far so good...
If the product is successful, it will keep growing.
And getting worse…
Problems arise
- High Coupling
- Too large for a single developer to understand
- Slow day to day development
- Spaghetti code / big ball of mud
- Delayed deployments
- Reliability
- Long term commitment to a technology stack
- ...
We would like
- Smaller code base
- Less code complexity, faster to develop and easier to understand
- Minimize cost of change
- Different responsibilities are placed in different services
- Deployed independently
- Better scaling
- ...
So we move to Microservices
Microservices
Everything is a trade-off
We don't have ACID anymore
2 Phase Commit
2 Phase Commit
- It is a distributed algorithm that coordinates all the processes participating in a distributed atomic transaction on whether to commit or abort (roll back) the transaction
Commit Request Phase
- The coordinator sends "Query to commit" to every participant
Commit Phase (all participants vote Yes)
- Every participant answers "Yes"
- The coordinator sends "Commit" to every participant
- Every participant replies "Ack"
Commit Phase (one participant votes No)
- One participant answers "No", the rest answer "Yes"
- The coordinator sends "Rollback" to every participant
- Every participant replies "Ack"
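The two phases can be sketched as an in-memory simulation. This is an illustration under simplifying assumptions, not a real protocol implementation: there is no network, no timeouts, no locks and no durable state, and the `Participant` class and `two_phase_commit` function are hypothetical names.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Commit request phase: vote Yes or No.
        # A real participant would take locks and persist its vote here.
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self):
        self.state = "committed"
        return "ack"

    def rollback(self):
        self.state = "rolled_back"
        return "ack"


def two_phase_commit(participants):
    # Phase 1: "Query to commit" goes to every participant and all votes are collected
    votes = [p.prepare() for p in participants]

    # Phase 2: commit only if every vote was Yes, otherwise roll everyone back
    if all(votes):
        acks = [p.commit() for p in participants]
        decision = "commit"
    else:
        acks = [p.rollback() for p in participants]
        decision = "rollback"

    # The coordinator is done once every participant has acknowledged
    assert all(ack == "ack" for ack in acks)
    return decision


happy = [Participant(n) for n in ("a", "b", "c", "d")]
unhappy = [Participant("a"), Participant("b"), Participant("c", can_commit=False), Participant("d")]
happy_decision = two_phase_commit(happy)
unhappy_decision = two_phase_commit(unhappy)
```

In the second run a single "No" vote is enough to roll back all four participants, including the three that voted "Yes".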
2 Phase Commit Pros
- Provides atomicity because every commit is applied at the same time, or no commit is executed at all
- Distributed transactions are very appealing from a developer’s point of view
2 Phase Commit Cons
- It’s a blocking protocol
- 2PC coordinator is a Single Point of Failure
- O(n^2) messages worst case
- Reduced throughput due to locks, and depending on the slowest machine
- 2PC impacts availability (availability is the product of the availability of all the participants in the transaction)
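To put a number on the last point, here is a small worked example. It assumes participants fail independently, and the 99.9% figure is made up for illustration; `transaction_availability` is a hypothetical helper.

```python
import math

def transaction_availability(participant_availabilities):
    """Availability of a transaction that needs every participant to be up."""
    return math.prod(participant_availabilities)

# Four participants, each available 99.9% of the time
combined = transaction_availability([0.999] * 4)
print(f"{combined:.4%}")
```

Four participants at 99.9% each give roughly 99.6% for the transaction as a whole, and every extra participant lowers it further.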
Can we do better?
Sagas
1987
Sagas are long-lived transactions that can be broken up into a sequence of relatively independent sub-transactions that can be interleaved
Either all transactions in the sequence complete successfully, or compensating transactions are run to amend a partial execution
To amend partial executions, each saga transaction \( T_i \) should be provided with a compensating transaction \( C_i \)
The compensating transaction semantically undoes any of the actions performed by \( T_i \)
Guarantee: one of these two sequences is executed
- \( T_1 \), \( T_2 \), ... \( T_n \)
- \( T_1 \), \( T_2 \), ... \( T_j \), \( C_j \), ... \( C_2 \), \( C_1 \) for some \( 0 \le j < n \)
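The guarantee above can be sketched as a tiny sequential executor: run each \( T_i \) in order and, on the first failure, run the compensations of the completed steps in reverse. `run_saga` and `step` are hypothetical helpers, and the sketch assumes compensations never fail.

```python
def run_saga(steps):
    """steps: list of (name, action, compensation). Returns True if all T_i ran."""
    completed = []
    for name, action, compensation in steps:
        try:
            action()  # T_i
        except Exception:
            # T_{j+1} failed: run C_j, ..., C_2, C_1
            for _done_name, done_compensation in reversed(completed):
                done_compensation()
            return False
        completed.append((name, compensation))
    return True


events = []

def step(name, fail=False):
    def action():
        if fail:
            raise RuntimeError(f"{name} failed")
        events.append(f"T:{name}")

    def compensation():
        events.append(f"C:{name}")

    return name, action, compensation


ok = run_saga([step("car"), step("hotel"), step("flight", fail=True), step("payment")])
```

Here the flight step fails, so the recorded sequence is T:car, T:hotel, C:hotel, C:car and the payment step never starts.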
Trips
Sagas are a Failure Management Pattern
Sagas vs 2PC
A saga does not have ACID guarantees
- Is not atomic
- Does not provide strict serializability
The trade-off: we give up those guarantees to gain availability
Choreography vs Orchestration
Two typical ways of implementing orchestration:
- The orchestrator is in an already existing component
- The orchestrator is a brand new component
Saga Execution Coordinator (SEC)
- Distributed/Durable Log
- Fault tolerant and highly available
Compensating requests:
- Must be idempotent
- Cannot abort (they are not allowed to refuse to complete the task)
Requests:
- Should be commutative with the compensating requests
- Can abort
- Must be idempotent
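A minimal sketch of a service whose request and compensating request satisfy the properties above. `HotelService`, `book` and `cancel` are hypothetical names, and keeping a per-booking status is just one possible way to get idempotency and commutativity.

```python
class HotelService:
    def __init__(self):
        self.bookings = {}  # booking_id -> "booked" | "cancelled"

    def book(self, booking_id):
        """Request: idempotent, and commutative with cancel."""
        if self.bookings.get(booking_id) == "cancelled":
            # The cancel arrived first (e.g. a delayed retry): stay cancelled
            return "cancelled"
        self.bookings[booking_id] = "booked"  # repeating the call changes nothing
        return "booked"

    def cancel(self, booking_id):
        """Compensating request: idempotent and never aborts."""
        # Cancelling an unknown or already cancelled booking still succeeds
        self.bookings[booking_id] = "cancelled"
        return "cancelled"


in_order = HotelService()
in_order.book("trip-1")
in_order.book("trip-1")       # duplicate delivery
in_order.cancel("trip-1")
in_order.cancel("trip-1")     # duplicate delivery

out_of_order = HotelService()
out_of_order.cancel("trip-1")  # compensation overtakes the request
out_of_order.book("trip-1")
```

Both services end with `trip-1` cancelled, regardless of duplicates or ordering. That is what lets a coordinator retry blindly.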
START SAGA
START CAR
END CAR
START HOTEL
END HOTEL
START FLIGHT
END FLIGHT
START PAYMENT
END PAYMENT
END SAGA
SEC
START SAGA
START CAR
END CAR
START HOTEL
END HOTEL
START FLIGHT
ABORT FLIGHT
COMP HOTEL
COMP CAR
END SAGA
SEC
START SAGA
START CAR
END CAR
START HOTEL
START FLIGHT
ABORT FLIGHT
END HOTEL
COMP HOTEL
COMP CAR
END SAGA
SEC
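The sequential logs above can be reproduced with a small sketch of a coordinator that appends to a log before and after every request. A plain Python list stands in for the distributed, durable log, requests are assumed to return `True` or `False`, and `execute_saga` is a hypothetical name. The interleaved log (HOTEL and FLIGHT in flight at the same time) would need concurrent execution, which this sketch leaves out.

```python
def execute_saga(log, steps):
    """steps: list of (NAME, request, compensating_request)."""
    log.append("START SAGA")
    ended = []
    for name, request, compensate in steps:
        log.append(f"START {name}")
        if request():
            log.append(f"END {name}")
            ended.append((name, compensate))
        else:
            log.append(f"ABORT {name}")
            # Compensate the completed requests in reverse order
            for done_name, done_compensate in reversed(ended):
                done_compensate()
                log.append(f"COMP {done_name}")
            break
    log.append("END SAGA")
    return log


def succeed():
    return True

def fail():
    return False

def noop():
    return None

success_log = execute_saga([], [
    ("CAR", succeed, noop), ("HOTEL", succeed, noop),
    ("FLIGHT", succeed, noop), ("PAYMENT", succeed, noop),
])
failure_log = execute_saga([], [
    ("CAR", succeed, noop), ("HOTEL", succeed, noop),
    ("FLIGHT", fail, noop), ("PAYMENT", succeed, noop),
])
```

Because every decision is written to the log first, the log alone describes how far the saga got.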
SEC Failure
It is not a Single Point of Failure
The state is in the log, not in the SEC
SEC Failure
Just spin up new machines.
- All executed \(T_i\) have completed (Start & End logged)
To resume previous work:
- Any executed \(T_i\) not completed (Start but not End logged)
- Any Aborted \(T_i\)
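A sketch of how a freshly started SEC could read the log and sort the steps into the three states listed above. It assumes the log entry format used in the earlier log slides (`START X`, `END X`, `ABORT X`); `classify_steps` is a hypothetical helper, and what the new SEC does with each group is deliberately left out.

```python
def classify_steps(log):
    started, ended, aborted = [], set(), set()
    for entry in log:
        verb, _, name = entry.partition(" ")
        if name == "SAGA":
            continue  # START SAGA / END SAGA mark the saga itself, not a step
        if verb == "START":
            started.append(name)
        elif verb == "END":
            ended.add(name)
        elif verb == "ABORT":
            aborted.add(name)
        # COMP entries are ignored in this sketch
    return {
        # Start and End logged
        "completed": [n for n in started if n in ended],
        # Start logged, but neither End nor Abort
        "not_completed": [n for n in started if n not in ended and n not in aborted],
        # Abort logged
        "aborted": [n for n in started if n in aborted],
    }


crashed_mid_hotel = classify_steps(["START SAGA", "START CAR", "END CAR", "START HOTEL"])
crashed_after_abort = classify_steps(
    ["START SAGA", "START CAR", "END CAR", "START FLIGHT", "ABORT FLIGHT"]
)
```

Nothing here depends on the memory of the machine that crashed, which is why a new SEC can take over.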
Wrap up
Higher Cohesion & Composable Services
Isolation of Complex Code
Sagas
- Transaction-like flows
- Isolation of complex code
- Composable service
- Higher cohesion in our system
Sagas
Avoid transactions across service boundaries if you can
Sagas - Commit 2018
By Rubén Pérez