Consistency in Distributed Systems

Where do the problems start?

  • Replicated objects
  • Txns involving related updates to different objects

Weak Consistency, Bro

When we need to access data fast

PRIMARY COPY

UPDATE OCCURS

Updated Primary Copy

Replicas can still be read from (but their data may be stale/undefined)

This is sometimes OK (Twitter) but sometimes not (banks)
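
To make the stale read concrete, here is a minimal Python sketch of primary-copy replication with lazy propagation. The class `PrimaryCopyStore` and its methods are hypothetical names made up for this illustration, not the API of any real system.

```python
class PrimaryCopyStore:
    """Toy primary-copy replication with lazy (asynchronous) propagation."""

    def __init__(self, replica_count):
        self.primary = None
        self.replicas = [None] * replica_count

    def write(self, value):
        # Only the primary copy is updated; the write returns immediately.
        self.primary = value

    def read(self, replica_index):
        # Fast, but may return a value the primary has already replaced.
        return self.replicas[replica_index]

    def propagate(self):
        # Background step that eventually brings every replica up to date.
        self.replicas = [self.primary] * len(self.replicas)


store = PrimaryCopyStore(replica_count=3)
store.write("balance=100")
store.propagate()
store.write("balance=40")   # update occurs at the primary copy
stale = store.read(0)       # replica has not seen the update yet
store.propagate()
fresh = store.read(0)       # replica has caught up
```

Between the second write and the next `propagate()`, a reader sees the old balance. That is tolerable for a timeline, less so for a bank account.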

Strong Consistency!

Ensures that ALL data being read is the most up-to-date

 

Need to get enough replicas (nodes) to agree...

 

Think of each node as a "vote" for the system

Let's define a quorum as the number of nodes that must be present at decision time

Nodes can only participate in one quorum at a time

Quorum Assembly

  • Assume n replicas of a data set (DB)
  • Replicas can only participate in one quorum at a time
  • Define a Write Quorum (QW) and a Read Quorum (QR):
      QW > n/2
      QW + QR > n

Notice that:

  • At most one write quorum can be assembled at one time
  • Every quorum (read or write) overlaps the most recent write quorum, so it contains at least one up-to-date replica
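
The two inequalities and the overlap they guarantee can be checked by brute force for small n. This is a sketch only; the function names `is_valid_quorum_config` and `quorums_always_overlap` are invented for this example.

```python
from itertools import combinations


def is_valid_quorum_config(n, qw, qr):
    """True when QW > n/2 and QW + QR > n."""
    return 2 * qw > n and qw + qr > n


def quorums_always_overlap(n, size_a, size_b):
    """Brute-force check: every size_a set of replicas meets every size_b set."""
    replicas = range(n)
    return all(
        set(a) & set(b)
        for a in combinations(replicas, size_a)
        for b in combinations(replicas, size_b)
    )


good = is_valid_quorum_config(5, 3, 3)          # both inequalities hold
bad = is_valid_quorum_config(5, 3, 2)           # QW + QR = n, not > n
write_write = quorums_always_overlap(5, 3, 3)   # two write quorums must collide
read_write_gap = quorums_always_overlap(5, 3, 2)  # a read can miss the write
```

With n = 5, QW = 3, QR = 2, the read quorum {3, 4} misses the write quorum {0, 1, 2} entirely, which is exactly the case the second inequality rules out.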

 

For example,

      QW := n ; QR := 1

means that

  • All replicas must be locked & updated together
  • You can read from any one replica

Let's think about n = 10, QW = 6, QR = 5 (valid, since 6 > 10/2 and 6 + 5 > 10)

Minimum SIX replicas must be locked to update

Minimum FIVE replicas must be locked to be read from
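
A small Python simulation of this configuration, as a sketch under assumptions: each replica stores a (version, value) pair, quorum members are picked at random, and locking is implied rather than implemented. The version numbers and the helper names `quorum_write` and `quorum_read` are not from the slides.

```python
import random

N, QW, QR = 10, 6, 5  # values from the example above


def quorum_write(replicas, value, rng):
    """Pick QW replicas and install the value under a new version number."""
    members = rng.sample(range(len(replicas)), QW)
    # Any two write quorums overlap, so the highest version is in this quorum.
    new_version = max(replicas[i][0] for i in members) + 1
    for i in members:
        replicas[i] = (new_version, value)


def quorum_read(replicas, rng):
    """Pick QR replicas and return the value with the highest version."""
    members = rng.sample(range(len(replicas)), QR)
    version, value = max((replicas[i] for i in members), key=lambda r: r[0])
    return value


rng = random.Random(0)           # fixed seed only to make the run repeatable
replicas = [(0, "initial")] * N  # each replica holds (version, value)
quorum_write(replicas, "first", rng)
quorum_write(replicas, "second", rng)
latest = quorum_read(replicas, rng)
```

Because 6 + 5 > 10, the five replicas read always include at least one of the six most recently written, so `latest` is "second" no matter which replicas are picked.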

Distributed Atomic Updates

Two-Phase Commit

 

  • Updates must be atomic - all nodes either commit or abort
  • One node acts as the Commit Manager (CM) and the others as Participating Sites (PS)

 

PHASE 1

  • CM requests votes from all nodes in the transaction
  • All nodes write their data to local persistent store and record that 2PC is in progress, then send their vote

PHASE 2

  • CM tallies the votes. If there is at least one "abort", the txn is aborted; otherwise it is committed
  • CM propagates the decision to the involved participating sites
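
The two phases can be sketched in a few lines of Python. This is an in-memory illustration only: `Participant`, `two_phase_commit`, and the `can_commit` flag are hypothetical names, the `log` list stands in for the local persistent store, and timeouts and crash recovery are left out.

```python
class Participant:
    """A participating site (PS) in the transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # assumption: decided locally by the site
        self.log = []                 # stand-in for the local persistent store
        self.state = "idle"

    def prepare(self, txn_id, data):
        # Phase 1: persist the data and the fact that 2PC is in progress, then vote.
        if not self.can_commit:
            self.state = "aborted"
            return "abort"
        self.log.append(("prepared", txn_id, data))
        self.state = "prepared"
        return "commit"

    def finish(self, txn_id, decision):
        # Phase 2: apply the commit manager's decision.
        self.log.append((decision, txn_id))
        self.state = "committed" if decision == "commit" else "aborted"


def two_phase_commit(txn_id, data, participants):
    """Play the commit manager (CM): collect votes, then propagate the decision."""
    votes = [p.prepare(txn_id, data) for p in participants]
    decision = "commit" if all(v == "commit" for v in votes) else "abort"
    for p in participants:
        p.finish(txn_id, decision)
    return decision


happy = [Participant("a"), Participant("b"), Participant("c")]
happy_decision = two_phase_commit("t1", {"x": 1}, happy)

unhappy = [Participant("a"), Participant("b", can_commit=False)]
unhappy_decision = two_phase_commit("t2", {"x": 2}, unhappy)
```

In the second run a single "abort" vote is enough to abort the txn on every site, including the one that voted "commit".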

How is this used in real life?

Distributed databases provide this out-of-the-box

- CockroachDB, RocksDB, etcd (the Kubernetes state store)

 

Services to manage this

- Apache ZooKeeper

Apache ZooKeeper

Let's Run It!

- Software to manage co-ordination in distributed systems

- Primitives are nodes (znodes) that can have data and children

Example znode tree (each znode is addressed by its full path):

/
    /node1
        /node1/child1
    /node2
        /node2/child1
        /node2/child2
    /node3
    . . .
    /ZooKeeper   (metadata)

Data held by znodes: "FOO", "BAR", "BAZ"
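
The znode idea (a path-addressed tree where every node can carry both data and children) can be modelled in memory as below. This is a toy sketch, not the ZooKeeper client API: `ZNode`, `ZNodeTree`, and the pairing of "FOO", "BAR", "BAZ" with particular nodes are assumptions made for the example.

```python
class ZNode:
    """A node that can hold data and have children."""

    def __init__(self, data=None):
        self.data = data
        self.children = {}


class ZNodeTree:
    """Toy in-memory tree addressed by slash-separated paths."""

    def __init__(self):
        self.root = ZNode()

    def _walk(self, path):
        node = self.root
        for part in path.strip("/").split("/"):
            if part:
                node = node.children[part]  # KeyError if the path is missing
        return node

    def create(self, path, data=None):
        # The parent must already exist, as with real znodes.
        parent_path, _, name = path.rstrip("/").rpartition("/")
        parent = self._walk(parent_path)
        if name in parent.children:
            raise KeyError(f"{path} already exists")
        parent.children[name] = ZNode(data)

    def get(self, path):
        return self._walk(path).data

    def get_children(self, path):
        return sorted(self._walk(path).children)


tree = ZNodeTree()
tree.create("/node1", "FOO")
tree.create("/node2", "BAR")
tree.create("/node3", "BAZ")
tree.create("/node1/child1")
tree.create("/node2/child1")
tree.create("/node2/child2")
top_level = tree.get_children("/")
node2_children = tree.get_children("/node2")
```

Unlike a file system, an inner node such as /node2 holds data ("BAR" here) and children at the same time.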

Two-phase commit in PostgreSQL (requires max_prepared_transactions > 0):

demo=# CREATE TABLE demo_table(
  a integer,
  b integer
);
CREATE TABLE

demo=# BEGIN; -- start a transaction
BEGIN

demo=# INSERT INTO demo_table VALUES ( 6, 9 ); -- not visible outside the transaction yet
INSERT 0 1

demo=# PREPARE TRANSACTION 'demo'; -- phase 1: persist the transaction, ready to commit
PREPARE TRANSACTION

demo=# SELECT * FROM demo_table; -- the table is still empty; the row is waiting for commit
 a | b 
---+---
(0 rows)

demo=# COMMIT PREPARED 'demo'; -- phase 2: do the commit
COMMIT PREPARED

demo=# SELECT * FROM demo_table; -- data is visible
 a | b 
---+---
 6 | 9
(1 row)

Consistency in Distributed Systems

By Corey Brooks
