Consistency in Distributed Systems

Where do the problems start?

- Replicated objects
- Transactions (txns) involving related updates to different objects
Weak Consistency, Bro

When we need to access data fast
- An update occurs at the primary copy first
- Until the update propagates, the other replicas can still be read from (but the value may be wrong/undefined)
- This is sometimes OK (Twitter) but sometimes not (banks)
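
To make the stale-read window concrete, here is a minimal sketch of a primary copy with lazily updated replicas. The class name `WeaklyConsistentStore` and its methods are hypothetical, made up for this illustration; a real system would propagate updates over the network rather than through an explicit `propagate()` call.

```python
class WeaklyConsistentStore:
    """Toy model: one primary copy plus replicas that are updated lazily."""

    def __init__(self, replica_count, initial):
        self.primary = initial
        self.replicas = [initial] * replica_count
        self.pending = []  # updates not yet applied to the replicas

    def write(self, value):
        # The update lands on the primary copy immediately...
        self.primary = value
        # ...but is only queued for the replicas.
        self.pending.append(value)

    def read_replica(self, index):
        # Fast, but may return a stale value.
        return self.replicas[index]

    def propagate(self):
        # Apply the queued updates to every replica, in order.
        for value in self.pending:
            self.replicas = [value] * len(self.replicas)
        self.pending.clear()


store = WeaklyConsistentStore(replica_count=2, initial="balance=100")
store.write("balance=50")
stale = store.read_replica(0)  # read before propagation: old value
store.propagate()
fresh = store.read_replica(0)  # read after propagation: matches the primary
```

The read taken before `propagate()` is the one that is fine for a timeline and not fine for an account balance.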
Strong Consistency!

Ensures that ALL data being read is the most up-to-date
Need to get enough replicas (nodes) to agree...
Think of each node as a "vote" for the system
Let's define a quorum as the number of nodes that must be present at decision time
Nodes can only participate in one quorum at a time
Quorum Assembly

- Assume n replicas of a data set (DB)
- Replicas can only participate in one quorum at a time
- Define a Write Quorum (QW) and a Read Quorum (QR) such that:
    QW > n/2
    QR + QW > n
Notice that:
- At most one write quorum can be assembled at one time (because QW > n/2)
- Every read quorum overlaps every write quorum, so it contains at least one up-to-date replica (because QR + QW > n)
For example,
    QW = n, QR = 1
means that
- All replicas must be locked & updated together
- You can read from any one replica
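
The quorum rules can be checked mechanically. The sketch below assumes the usual conditions QW > n/2 and QR + QW > n; the function name `is_valid_quorum` is hypothetical.

```python
def is_valid_quorum(n, qw, qr):
    """Return True if (n, qw, qr) satisfies both quorum conditions."""
    in_range = 1 <= qw <= n and 1 <= qr <= n
    one_writer_at_a_time = 2 * qw > n      # QW > n/2, kept in integers
    reads_see_latest_write = qr + qw > n   # QR + QW > n
    return in_range and one_writer_at_a_time and reads_see_latest_write


write_all_read_one = is_valid_quorum(n=10, qw=10, qr=1)   # lock everything to write
two_writers_possible = is_valid_quorum(n=10, qw=5, qr=6)  # QW is not a majority
```

With QW = 5 out of 10, two disjoint groups of five replicas could each assemble a write quorum at the same time, which is why that configuration is rejected.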
Quorum Assembly

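Let's think about n = 10, QW = 6, QR = 5
- 6 > 10/2, so two write quorums can never be assembled at once
- 6 + 5 > 10, so every read quorum overlaps the latest write quorum
- WRITE: a minimum of SIX replicas must be locked to update
- READ: a minimum of FIVE replicas must be locked to be read from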
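
A small simulation shows why the overlap matters for n = 10, QW = 6, QR = 5. This is a sketch under one added assumption: each replica stores a version number next to its value, so a reader can tell which copy in its quorum is newest. The names `quorum_write` and `quorum_read` are hypothetical.

```python
import random

N, QW, QR = 10, 6, 5

# Each replica holds (version, value); every replica starts at version 0.
replicas = [(0, "initial")] * N


def quorum_write(value, rng):
    """Lock any QW replicas and store the value under a newer version."""
    chosen = rng.sample(range(N), QW)
    # Any two write quorums overlap, so the newest version is among `chosen`.
    new_version = max(replicas[i][0] for i in chosen) + 1
    for i in chosen:
        replicas[i] = (new_version, value)


def quorum_read(rng):
    """Lock any QR replicas and return the value with the highest version."""
    chosen = rng.sample(range(N), QR)
    return max(replicas[i] for i in chosen)[1]


rng = random.Random(42)
for k in range(1, 4):
    quorum_write(f"value-{k}", rng)
    # Whichever five replicas are picked, at least one saw the last write.
    assert quorum_read(rng) == f"value-{k}"
```

The quorums are chosen at random on purpose: the read is correct for every choice of five replicas, not just a lucky one.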
Distributed Atomic Updates
Two-Phase Commit

- Updates must be atomic - all nodes either commit or abort
- One node acts as the Commit Manager (CM) and the others as Participating Sites (PS)
PHASE 1
- CM requests votes from all nodes in the transaction
- All nodes write their data to local persistent storage and record that 2PC is in progress, then send their vote (commit or abort)
PHASE 2
- CM tallies the votes. If there is at least one "abort", the txn is aborted; otherwise it is committed
- CM propagates the decision to the involved participating sites
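
The two phases can be sketched in a few lines of Python. This is a single-process illustration, not a production protocol: the class `ParticipatingSite`, the function `two_phase_commit`, and the in-memory `log` standing in for the persistent store are all hypothetical, and timeouts and crash recovery are left out.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ABORT = "abort"


class ParticipatingSite:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []        # stands in for the local persistent store
        self.state = "idle"

    def prepare(self, update):
        # Phase 1: persist the update and the in-progress marker, then vote.
        if not self.can_commit:
            self.state = "aborted"
            return Vote.ABORT
        self.log.append(("prepared", update))
        self.state = "prepared"
        return Vote.COMMIT

    def finish(self, decision):
        # Phase 2: record and apply the commit manager's decision.
        self.log.append((decision.value, None))
        self.state = "committed" if decision is Vote.COMMIT else "aborted"


def two_phase_commit(sites, update):
    """Act as the commit manager: collect votes, decide, propagate."""
    votes = [site.prepare(update) for site in sites]  # phase 1
    unanimous = all(vote is Vote.COMMIT for vote in votes)
    decision = Vote.COMMIT if unanimous else Vote.ABORT
    for site in sites:                                # phase 2
        site.finish(decision)
    return decision


healthy = [ParticipatingSite("a"), ParticipatingSite("b")]
committed = two_phase_commit(healthy, {"x": 1})

mixed = [ParticipatingSite("a"), ParticipatingSite("b", can_commit=False)]
aborted = two_phase_commit(mixed, {"x": 2})
```

A single "abort" vote is enough to abort the txn at every site, including the ones that voted to commit.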
How is this used in real life?
Distributed databases provide this out-of-the-box
- CockroachDB, etcd (Kubernetes state store); RocksDB is an embedded storage engine that such systems can build on, not a distributed database itself
Services to manage this
- Apache ZooKeeper
Apache ZooKeeper
Let's Run It!

- Software to manage co-ordination in distributed systems
- Primitives are nodes (znodes) that can have data and children
[Diagram: znode tree rooted at /, with /node1, /node2, /node3, children /node1/child1, /node2/child1, /node2/child2, . . . ; znodes hold data such as "FOO", "BAR", "BAZ"; /ZooKeeper holds metadata]
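
The znode hierarchy can be pictured as a small in-memory tree. The sketch below is a stand-in for illustration only and is not the ZooKeeper client API; the class `ZNodeTree` is hypothetical, and which data value belongs to which znode is an assumption, since the slide does not say.

```python
class ZNodeTree:
    """In-memory stand-in for a znode hierarchy (not a ZooKeeper client)."""

    def __init__(self):
        self.data = {"/": b""}  # path -> data stored at that znode

    def create(self, path, value=b""):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.data:
            raise KeyError(f"parent {parent} does not exist")
        if path in self.data:
            raise KeyError(f"{path} already exists")
        self.data[path] = value

    def get(self, path):
        return self.data[path]

    def children(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.data
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = ZNodeTree()
tree.create("/node1", b"FOO")  # data-to-node assignment is assumed
tree.create("/node2", b"BAR")
tree.create("/node3", b"BAZ")
tree.create("/node1/child1")
tree.create("/node2/child1")
tree.create("/node2/child2")
```

Each znode can carry data and have children at the same time, which is what separates this model from a plain file system where a path is either a file or a directory.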
demo=# CREATE TABLE demo_table(
    a integer,
    b integer
);
CREATE TABLE
demo=# BEGIN; -- start a transaction
BEGIN
demo=# INSERT INTO demo_table VALUES ( 6, 9 ); -- the row is not visible to other sessions yet
INSERT 0 1
demo=# PREPARE TRANSACTION 'demo'; -- prepare the transaction
PREPARE TRANSACTION
demo=# SELECT * FROM demo_table; -- the table still looks empty; the row is waiting for the commit
a | b
---+---
(0 rows)
demo=# COMMIT PREPARED 'demo'; -- do the commit
COMMIT PREPARED
demo=# SELECT * FROM demo_table; -- data is visible
a | b
---+---
6 | 9
(1 row)
Consistency in Distributed Systems
By Corey Brooks