Conflict-free Replicated Data Types

Bartosz Sypytkowski

@horusiath

b.sypytkowski@gmail.com

http://bartoszsypytkowski.com

Let's describe the problem with a use case

View counter for globally accessible resources (a.k.a. YouTube view counter)

We need replication

But how to keep state in sync?

We need replication

Publishing a video

We need replication

Updating view counter

X = 1

X = 2

X = ?

?

CAP Theorem

Partition tolerance

Consistency

Availability

CAP Theorem

X = 2

X = 3

X = 1

X = 1

Consistency

X = 2

X = 3

X = 1

X = 1

Availability

X = 2

X = 3

X = 2

X = 3

CRDTs

Highly available / Eventually consistent

It's all about Math

x • y = y • x   (Commutative)

(x • y) • z = x • (y • z)   (Associative)

x • x = x   (Idempotent)

Why does it matter?

So which operations meet those criteria?

Number addition?

  • x + y = y + x
  • (x + y) + z = x + (y + z)
  • x + x ≠ x

So which operations meet those criteria?

Number max

  • max(x, y) = max(y, x)
  • max(max(x, y), z) = max(x, max(y, z))
  • max(x, x) = x

So which operations meet those criteria?

Set union

  • x ∪ y = y ∪ x
  • (x ∪ y) ∪ z = x ∪ (y ∪ z)
  • x ∪ x = x
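
To make the three properties concrete, here is a small Python sketch that brute-force checks them over a handful of sample values. The helper names (`check`, `is_commutative` and so on) and the sample values are illustrative assumptions; passing on a few samples is evidence, not a proof.

```python
from itertools import product

def is_commutative(op, samples):
    # x • y = y • x for every pair of samples
    return all(op(x, y) == op(y, x) for x, y in product(samples, repeat=2))

def is_associative(op, samples):
    # (x • y) • z = x • (y • z) for every triple of samples
    return all(op(op(x, y), z) == op(x, op(y, z))
               for x, y, z in product(samples, repeat=3))

def is_idempotent(op, samples):
    # x • x = x for every sample
    return all(op(x, x) == x for x in samples)

def check(op, samples):
    # returns (commutative, associative, idempotent)
    return (is_commutative(op, samples),
            is_associative(op, samples),
            is_idempotent(op, samples))

numbers = [0, 1, 2, 5]
sets = [frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({3})]

addition = check(lambda x, y: x + y, numbers)  # fails idempotency: 1 + 1 != 1
maximum = check(max, numbers)
union = check(lambda x, y: x | y, sets)
```

Addition passes the first two checks but not the third, which is why a plain sum cannot be merged safely when the same update may be delivered twice.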

CRDT types

  • Commutative

  • Convergent

CRDT libs/DBs

Collection types

  • G-Counter (increment-only counter)
  • PN-Counter (increment/decrement counter)
  • G-Set (growing-only set)
  • 2P-Set (add/remove set)
  • OR-Set (observed remove set)
  • LWW-Register (last write wins register)
  • MV-Register (multi value register)
  • RGA (replicated growable array)
  • L-Seq (modifiable ordered sequence)
  • ... and more

G-Counter

Increment-only counter

G-Counter

// empty value
empty → { }

// increment - every replica increments 
// only its own replica entry
inc({ a: 1, b: 1 }) → { a: 2, b: 1 }   // replica a ++

// merge - max number of all replicas
{ a: 2, b: 1 } ∪ { a: 1, b: 2 } → { a: 2, b: 2 }

// value - sum of all replicas
value({ a: 1, b: 2 }) → 3
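
The pseudocode above could be sketched in Python roughly as follows. The function names and the choice of a plain dict keyed by replica id are assumptions made for illustration, not a reference implementation.

```python
def gcounter_empty():
    # empty value: no replica has incremented yet
    return {}

def gcounter_inc(counter, replica, amount=1):
    # a replica only ever bumps its own entry; negative amounts are rejected
    if amount < 0:
        raise ValueError("a G-Counter can only grow")
    updated = dict(counter)
    updated[replica] = updated.get(replica, 0) + amount
    return updated

def gcounter_merge(left, right):
    # merge: per-replica maximum over the entries of both sides
    return {replica: max(left.get(replica, 0), right.get(replica, 0))
            for replica in left.keys() | right.keys()}

def gcounter_value(counter):
    # value: sum of all replica entries
    return sum(counter.values())
```

Because the merge is a per-entry `max`, it inherits commutativity, associativity and idempotency from `max`, so replicas can exchange state in any order and any number of times.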

PN-Counter

Increment/decrement counter

PN-Counter

Implementation

Just compose 2 G-Counters:

  • increments
  • decrements
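
A minimal Python sketch of that composition, assuming each half is a dict of per-replica counts merged by maximum. All names here (`pn_empty`, `pn_increment` and so on) are hypothetical.

```python
def _bump(counter, replica):
    # increment this replica's own entry in a grow-only counter
    updated = dict(counter)
    updated[replica] = updated.get(replica, 0) + 1
    return updated

def _merge(left, right):
    # grow-only counter merge: per-replica maximum
    return {replica: max(left.get(replica, 0), right.get(replica, 0))
            for replica in left.keys() | right.keys()}

def pn_empty():
    return {"inc": {}, "dec": {}}

def pn_increment(counter, replica):
    return {"inc": _bump(counter["inc"], replica), "dec": dict(counter["dec"])}

def pn_decrement(counter, replica):
    # a decrement is recorded as growth of the "dec" half
    return {"inc": dict(counter["inc"]), "dec": _bump(counter["dec"], replica)}

def pn_merge(left, right):
    # merge each half independently
    return {"inc": _merge(left["inc"], right["inc"]),
            "dec": _merge(left["dec"], right["dec"])}

def pn_value(counter):
    # value: total increments minus total decrements
    return sum(counter["inc"].values()) - sum(counter["dec"].values())
```

Neither half ever shrinks, so the merge stays a per-entry maximum even though the visible value can go down.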

G-Set

Growing-only set

Nothing to see here... just an ordinary set

// empty value - empty set
empty → {}

// add element
add({}, 123) → { 123 }

// merge - union sets
{ 123, 234 } ∪ { 234, 345 } → { 123, 234, 345 }

// value - just return a set
value({ 123, 234 }) → { 123, 234 }

What if we want to remove an element?

// state of a G-Set on replicas A & B
A(X) → { 123, 234 }
B(X) → { 123, 234 }

// remove 123 on replica A
A(X) → { 234 } 
B(X) → { 123, 234 }

// merge both replicas
A(X) ∪ B(X) → { 123, 234 }  // WRONG!

2P-Set

2 Phase (add/remove) set

Again... just compose 2 sets:

  • add set
  • rem set  a.k.a. tombstones

// empty value - 2 empty sets
empty → { add: {}, rem: {} }

// add element
X = add(empty, 123) → { add: { 123 }, rem: {} }
X = add(X, 234) → { add: { 123, 234 }, rem: {} }

// remove element
X = rem(X, 123) → { add: { 123, 234 }, rem: { 123 } }

// value - diff between add & rem
value(X) → { 234 }

// merge - union corresponding add/rem
X → { add: { 123, 234 }, rem: { 123 } }
Y → { add: { 123, 345 }, rem: {} }
X ∪ Y → { add: { 123, 234, 345 }, rem: { 123 } }
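
The same operations as a runnable Python sketch. The function names and the dict-of-frozensets layout are assumptions chosen to mirror the pseudocode above.

```python
def twop_empty():
    # empty value: two empty sets
    return {"add": frozenset(), "rem": frozenset()}

def twop_add(state, element):
    return {"add": state["add"] | {element}, "rem": state["rem"]}

def twop_remove(state, element):
    # removal never deletes anything; it only records a tombstone
    return {"add": state["add"], "rem": state["rem"] | {element}}

def twop_value(state):
    # value: everything added that has not been tombstoned
    return state["add"] - state["rem"]

def twop_merge(left, right):
    # merge: union the corresponding add and rem sets
    return {"add": left["add"] | right["add"],
            "rem": left["rem"] | right["rem"]}
```

Both halves are grow-only sets, so the merge is simply two set unions.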

1. What if tombstones grow big?

2. What if we want to add a removed element again?

OR-Set

Observed Remove Set

// initial state
add: { A/1, B/2 }
rem: { }

// Insert: C/3
add: { A/1, B/2, C/3 }
rem: { }

// Remove: C/4
add: { A/1, B/2 }
rem: { C/4 }

// Insert: C/5
add: { A/1, B/2, C/5 }
rem: { }
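
For comparison, here is a sketch of a common OR-Set formulation in Python. It is an assumption-laden illustration and not necessarily the exact variant pictured on the slide: instead of one number per element, every add carries a unique tag, and a remove tombstones only the tags it has observed. The function names and the `(replica, sequence)` tag format are hypothetical.

```python
def orset_empty():
    return {"add": frozenset(), "rem": frozenset()}

def orset_add(state, element, tag):
    # tag must be unique per add, e.g. (replica id, local sequence number)
    return {"add": state["add"] | {(element, tag)}, "rem": state["rem"]}

def orset_remove(state, element):
    # tombstone only the (element, tag) pairs this replica has observed
    observed = {pair for pair in state["add"] if pair[0] == element}
    return {"add": state["add"], "rem": state["rem"] | observed}

def orset_merge(left, right):
    # merge: union of both halves
    return {"add": left["add"] | right["add"],
            "rem": left["rem"] | right["rem"]}

def orset_value(state):
    # an element is present if at least one of its tags is not tombstoned
    return {element for element, tag in state["add"] - state["rem"]}

# Replica b removes "C" while replica a concurrently re-adds it.
base = orset_add(orset_empty(), "C", ("a", 1))
removed_on_b = orset_remove(base, "C")
readded_on_a = orset_add(base, "C", ("a", 2))
merged = orset_merge(removed_on_b, readded_on_a)
```

After the merge `"C"` is present, because the remove never observed the tag `("a", 2)`. Re-adding a removed element therefore works, although tombstones still accumulate in this simple version.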

LWW-Register

Last Write Wins Register

Value

Timestamp
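
A register that pairs a value with a timestamp can be merged by keeping the entry with the higher timestamp. The Python sketch below assumes integer timestamps and adds a replica id purely as a tie-breaker; that tie-breaking rule and all names are assumptions, not something the slides specify.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class LWWRegister:
    value: Any
    timestamp: int
    replica: str  # assumption: used only to break timestamp ties

def lww_merge(left, right):
    # compare (timestamp, replica) so every replica picks the same winner
    if (left.timestamp, left.replica) >= (right.timestamp, right.replica):
        return left
    return right
```

Without a deterministic tie-breaker, two writes with equal timestamps could be resolved differently on different replicas and the register would not converge.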

LWW-Map

Last Write Wins Key-Value Map

LWW-Register

Key-Value pair

OR-Set

With composability we can do a lot!

What would you say for subset of SQL?

Version vector

Should we rely on system timestamps?

Version vector

Logical time

[Diagram: event timelines for replicas A, B and C; each event is labelled with its version vector, from { C:1 } up to { A:4, B:5, C:5 }]

Version vector

Comparison operator

Less:        { a:1, b:0, c:0 } < { a:2, b:1, c:2 }

Equal:       { a:1, b:0, c:0 } = { a:1, b:0, c:0 }

Greater:     { a:4, b:1, c:2 } > { a:3, b:0, c:0 }

Concurrent:  { a:4, b:1, c:2 } ?! { a:3, b:2, c:2 }
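
The four outcomes can be computed by comparing the vectors entry by entry. This is a minimal Python sketch, assuming vectors are dicts from replica id to counter and that a missing entry counts as 0; the function name and the string results are illustrative.

```python
def vv_compare(left, right):
    replicas = left.keys() | right.keys()
    # is any entry strictly larger on one side than on the other?
    left_ahead = any(left.get(r, 0) > right.get(r, 0) for r in replicas)
    right_ahead = any(right.get(r, 0) > left.get(r, 0) for r in replicas)
    if left_ahead and right_ahead:
        return "concurrent"  # each side has seen something the other has not
    if left_ahead:
        return "greater"
    if right_ahead:
        return "less"
    return "equal"
```

A "concurrent" result is the signal that neither update happened before the other, so the conflict has to be resolved by a merge rather than by picking the newer one.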

Highly Available Transactions

Problem: how to keep consistency between different keys?

A

B

set(x=1)

set(y=1)

Highly Available Transactions

Read Atomic Transaction Isolation

  • Synchronization independence
  • Partition independence

RAMP-Fast

Introduction

Each record consists of:

  • key
  • value
  • metadata (dependencies)
  • timestamp

RAMP-Fast

Writes

Prepare {X=1, ts:1, dep: [Y]}

Prepare {Y=1, ts:1, dep: [X]}

Prepared

Prepared

Commit {ts:1}

Commit {ts:1}

Committed

Committed

RAMP-Fast

Reads (happy case)

Get {X}

{X=1, ts:1, dep: [Y]}

{Y=1, ts:1, dep: [X]}

Get {Y}

RAMP-Fast

Reads (repair)

Get {X}

{X=1, ts:1, dep: [Y]}

{Y=0, ts:0, dep: []}

Get {Y}

Mismatch

Get {Y, ts:1}

{Y=1, ts:1, dep: [X]}
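
The repair read above can be imitated with a simplified, single-process Python sketch. It is not a full RAMP-Fast implementation: there is no networking, the `Partition` class and `read_atomic` function are hypothetical names, and every key is assumed to have an initial committed version at `ts` 0.

```python
class Partition:
    """In-memory stand-in for one partition holding a single key's versions."""

    def __init__(self):
        self.versions = {}     # (key, ts) -> record, kept from Prepare onwards
        self.last_commit = {}  # key -> highest committed ts

    def prepare(self, key, value, ts, deps):
        # store the version; it is not yet visible to plain reads
        self.versions[(key, ts)] = {"key": key, "value": value,
                                    "ts": ts, "deps": set(deps)}

    def commit(self, key, ts):
        # make the version the one returned by plain reads
        self.last_commit[key] = max(self.last_commit.get(key, 0), ts)

    def get(self, key, ts=None):
        # without ts: latest committed version; with ts: that exact version
        if ts is None:
            ts = self.last_commit.get(key, 0)
        return self.versions[(key, ts)]

def read_atomic(partitions, keys):
    # round 1: latest committed version of every key
    first = {key: partitions[key].get(key) for key in keys}
    result = dict(first)
    for key in keys:
        # highest ts among returned writes that name this key as a dependency
        required = max((rec["ts"] for rec in first.values() if key in rec["deps"]),
                       default=0)
        if required > first[key]["ts"]:
            # round 2 (repair): fetch the exact missing version
            result[key] = partitions[key].get(key, required)
    return result

partitions = {"x": Partition(), "y": Partition()}
for key, partition in partitions.items():
    partition.prepare(key, 0, 0, [])
    partition.commit(key, 0)

partitions["x"].prepare("x", 1, 1, ["y"])
partitions["y"].prepare("y", 1, 1, ["x"])
partitions["x"].commit("x", 1)  # the commit for y has not arrived yet

snapshot = read_atomic(partitions, ["x", "y"])
```

The first round sees `x` at `ts` 1 and `y` at `ts` 0; since `x` names `y` as a dependency, the reader asks for `y` at `ts` 1, which is available because it was already prepared.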

Resources

End
