Conflict-free Replicated Data Types

Bartosz Sypytkowski

@horusiath

b.sypytkowski@gmail.com

http://bartoszsypytkowski.com

Let's describe the problem with a use case

View counter for globally accessible resources (e.g. the YouTube view counter)

We need replication

But how to keep state in sync?

We need replication

Publishing a video

We need replication

Updating view counter

X = 1

X = 2

X = ?

?

CAP Theorem

Partition tolerance

Consistency

Availability

CAP Theorem

X = 2

X = 3

X = 1

Consistency

X = 2

X = 3

X = 1

Availability

X = 2

X = 3

X = 2

X = 3

CRDTs

Highly available / Eventually consistent

It's all about Math

Commutative: x • y = y • x

Associative: (x • y) • z = x • (y • z)

Idempotent: x • x = x

Why does it matter?

So which operations meet those criteria?

Number addition?

x + y = y + x
(x + y) + z = x + (y + z)
x + x ≠ x

So which operations meet those criteria?

Number max

max(x, y) = max(y, x)
max(max(x, y), z) = max(x, max(y, z))
max(x, x) = x

So which operations meet those criteria?

Set union

x ∪ y = y ∪ x
(x ∪ y) ∪ z = x ∪ (y ∪ z)
x ∪ x = x
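The three properties above can be spot-checked mechanically. The following is a minimal sketch, not a proof: it only tests a handful of sample values, and the helper names (`is_commutative`, `is_associative`, `is_idempotent`, `check`) are made up for this illustration.

```python
from itertools import product

def is_commutative(op, samples):
    # x • y = y • x for every pair of samples
    return all(op(x, y) == op(y, x) for x, y in product(samples, repeat=2))

def is_associative(op, samples):
    # (x • y) • z = x • (y • z) for every triple of samples
    return all(op(op(x, y), z) == op(x, op(y, z))
               for x, y, z in product(samples, repeat=3))

def is_idempotent(op, samples):
    # x • x = x for every sample
    return all(op(x, x) == x for x in samples)

def check(op, samples):
    # Returns (commutative, associative, idempotent)
    return (is_commutative(op, samples),
            is_associative(op, samples),
            is_idempotent(op, samples))

numbers = [0, 1, 2, 5]
sets = [frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({3})]

addition = check(lambda x, y: x + y, numbers)  # fails idempotency
maximum = check(max, numbers)                  # passes all three
union = check(lambda x, y: x | y, sets)        # passes all three
```

Addition fails only the idempotency check, which is why a plain integer cannot be merged safely when the same update may be delivered twice.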

CRDT types

Commutative (operation-based, CmRDT)
Convergent (state-based, CvRDT)

CRDT libs/DBs

(JVM) akka.ddata
(Erlang VM) Riak
(Elixir / Erlang VM) Phoenix Presence
(Go) Roshi
(Erlang VM) AntidoteDB
(JVM) eventuate

Collection types

G-Counter (increment-only counter)
PN-Counter (increment/decrement counter)
G-Set (growing-only set)
2P-Set (add/remove set)
OR-Set (observed remove set)
LWW-Register (last write wins register)
MV-Register (multi value register)
RGA (replicated growable array)
L-Seq (modifiable ordered sequence)
... and more

G-Counter

Increment-only counter

G-Counter

// empty value
empty → { }

// increment - every replica increments 
// only its own replica entry
inc({ a: 1, b: 1 }) → { a: 2, b: 1 }   // replica a ++

// merge - take the max of each replica's entry
{ a: 2, b: 1 } ∪ { a: 1, b: 2 } → { a: 2, b: 2 }

// value - sum of all replicas
value({ a: 1, b: 2 }) → 3
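The four operations above could be sketched in Python roughly as follows. This is a minimal state-based illustration that uses a plain dict keyed by replica id; the function names (`g_empty`, `g_inc`, `g_merge`, `g_value`) are assumptions for this example and do not come from any library.

```python
from typing import Dict

GCounter = Dict[str, int]  # replica id -> number of increments seen from it

def g_empty() -> GCounter:
    return {}

def g_inc(counter: GCounter, replica: str, by: int = 1) -> GCounter:
    # A replica only ever bumps its own entry.
    updated = dict(counter)
    updated[replica] = updated.get(replica, 0) + by
    return updated

def g_merge(left: GCounter, right: GCounter) -> GCounter:
    # Per-replica maximum: commutative, associative and idempotent.
    return {r: max(left.get(r, 0), right.get(r, 0))
            for r in left.keys() | right.keys()}

def g_value(counter: GCounter) -> int:
    # The observable value is the sum over all replica entries.
    return sum(counter.values())
```

Because merge is a per-entry max, merging the same state twice or in a different order gives the same result.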

PN-Counter

Increment/decrement counter

PN-Counter

Implementation

Just compose 2 G-Counters:

increments
decrements
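One way to express that composition is sketched below. It assumes each G-Counter is a dict of per-replica counts merged by per-entry max; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

def _merge_max(left: Dict[str, int], right: Dict[str, int]) -> Dict[str, int]:
    # G-Counter merge: per-replica maximum.
    return {r: max(left.get(r, 0), right.get(r, 0))
            for r in left.keys() | right.keys()}

@dataclass
class PNCounter:
    increments: Dict[str, int] = field(default_factory=dict)  # G-Counter of ++
    decrements: Dict[str, int] = field(default_factory=dict)  # G-Counter of --

    def inc(self, replica: str) -> "PNCounter":
        bumped = {**self.increments, replica: self.increments.get(replica, 0) + 1}
        return PNCounter(bumped, dict(self.decrements))

    def dec(self, replica: str) -> "PNCounter":
        bumped = {**self.decrements, replica: self.decrements.get(replica, 0) + 1}
        return PNCounter(dict(self.increments), bumped)

    def merge(self, other: "PNCounter") -> "PNCounter":
        # Merge each underlying G-Counter independently.
        return PNCounter(_merge_max(self.increments, other.increments),
                         _merge_max(self.decrements, other.decrements))

    def value(self) -> int:
        return sum(self.increments.values()) - sum(self.decrements.values())
```

A decrement never lowers a stored number. It raises the `decrements` counter instead, so both halves stay grow-only and keep the G-Counter merge properties.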

G-Set

Growing-only set

Nothing to see here... just an ordinary set

// empty value - empty set
empty → {}

// add element
add({}, 123) → { 123 }

// merge - union sets
{ 123, 234 } ∪ { 234, 345 } → { 123, 234, 345 }

// value - just return a set
value({ 123, 234 }) → { 123, 234 }

What if we want to remove an element?

// state of a G-Set on replicas A & B
A(X) → { 123, 234 }
B(X) → { 123, 234 }

// remove 123 on replica A
A(X) → { 234 } 
B(X) → { 123, 234 }

// merge both replicas
A(X) ∪ B(X) → { 123, 234 }  // WRONG!

2P-Set

2 Phase (add/remove) set

Again... just compose 2 sets:

add set
rem set a.k.a. tombstones

// empty value - 2 empty sets
empty → { add: {}, rem: {} }

// add element
X = add(empty, 123) → { add: { 123 }, rem: {} }
X = add(X, 234) → { add: { 123, 234 }, rem: {} }

// remove element
X = rem(X, 123) → { add: { 123, 234 }, rem: { 123 } }

// value - diff between add & rem
value(X) → { 234 }

// merge - union corresponding add/rem
X → { add: { 123, 234 }, rem: { 123 } }
Y → { add: { 123, 345 }, rem: {} }
X ∪ Y → { add: { 123, 234, 345 }, rem: { 123 } }
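The same operations could be written in Python along these lines. This is a minimal sketch with hypothetical names (`TwoPhaseSet`, `added`, `removed`), and it skips the usual precondition that an element must be present before it is removed.

```python
from dataclasses import dataclass
from typing import FrozenSet, Hashable

@dataclass(frozen=True)
class TwoPhaseSet:
    added: FrozenSet[Hashable] = frozenset()
    removed: FrozenSet[Hashable] = frozenset()  # tombstones

    def add(self, item: Hashable) -> "TwoPhaseSet":
        return TwoPhaseSet(self.added | {item}, self.removed)

    def rem(self, item: Hashable) -> "TwoPhaseSet":
        return TwoPhaseSet(self.added, self.removed | {item})

    def value(self) -> FrozenSet[Hashable]:
        # Visible elements: everything added that has no tombstone.
        return self.added - self.removed

    def merge(self, other: "TwoPhaseSet") -> "TwoPhaseSet":
        # Union the corresponding sets; both only ever grow.
        return TwoPhaseSet(self.added | other.added,
                           self.removed | other.removed)

x = TwoPhaseSet().add(123).add(234).rem(123)
y = TwoPhaseSet().add(123).add(345)
merged = x.merge(y)
```

Here `merged.value()` contains 234 and 345. Calling `merged.add(123)` afterwards has no visible effect, because the tombstone for 123 is permanent.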

1. What if the tombstone set grows large?

2. What if we want to re-add a removed element?

OR-Set

Observed Remove Set

Initial state:  add: { A/1, B/2 }         rem: { }

Insert: C/3  →  add: { A/1, B/2, C/3 }    rem: { }

Remove: C/4  →  add: { A/1, B/2 }         rem: { C/4 }

Insert: C/5  →  add: { A/1, B/2, C/5 }    rem: { }
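The element/timestamp notation above suggests one possible implementation, sketched here under several assumptions. Timestamps are plain integers and a remove wins a tie. Both maps are kept in full and filtered in `value`, rather than pruned as in the example states. Production OR-Sets normally tag elements with unique identifiers or version vectors, so with plain integers this behaves much like a last-write-wins element set. All names are hypothetical.

```python
from typing import Dict, Optional, Set

def _merge_max(left: Dict[str, int], right: Dict[str, int]) -> Dict[str, int]:
    # Keep the newest timestamp seen for each element.
    return {k: max(left.get(k, 0), right.get(k, 0))
            for k in left.keys() | right.keys()}

class ORSet:
    def __init__(self, add: Optional[Dict[str, int]] = None,
                 rem: Optional[Dict[str, int]] = None) -> None:
        self.add = dict(add or {})  # element -> timestamp of latest insert
        self.rem = dict(rem or {})  # element -> timestamp of latest remove

    def insert(self, element: str, ts: int) -> None:
        self.add[element] = max(ts, self.add.get(element, 0))

    def remove(self, element: str, ts: int) -> None:
        self.rem[element] = max(ts, self.rem.get(element, 0))

    def merge(self, other: "ORSet") -> "ORSet":
        return ORSet(_merge_max(self.add, other.add),
                     _merge_max(self.rem, other.rem))

    def value(self) -> Set[str]:
        # Present if the latest insert is newer than the latest remove.
        return {e for e, ts in self.add.items() if ts > self.rem.get(e, 0)}

s = ORSet({"A": 1, "B": 2})
s.insert("C", 3)
after_insert = s.value()
s.remove("C", 4)
after_remove = s.value()
s.insert("C", 5)
after_reinsert = s.value()
```

Unlike the 2P-Set, a removed element can be added again, because a later insert outranks the earlier remove.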

LWW-Register

Last Write Wins Register

Value

Timestamp
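A register holding a value and a timestamp could be merged as in the sketch below. The `replica` field is an assumption added here so that two writes with equal timestamps resolve the same way on every node; the names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class LWWRegister:
    value: Any
    timestamp: int
    replica: str = ""  # assumed tie-breaker for equal timestamps

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Keep whichever write has the greater (timestamp, replica) pair.
        mine = (self.timestamp, self.replica)
        theirs = (other.timestamp, other.replica)
        return other if theirs > mine else self

    def set(self, value: Any, timestamp: int, replica: str = "") -> "LWWRegister":
        # A local write is a merge with a freshly stamped register.
        return self.merge(LWWRegister(value, timestamp, replica))

a = LWWRegister("draft", 1, "a")
b = LWWRegister("final", 2, "b")
winner = a.merge(b)
```

Both `a.merge(b)` and `b.merge(a)` return the write with timestamp 2, so replicas converge regardless of delivery order. The older write is silently discarded.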

LWW-Map

Last Write Wins Key-Value Map

LWW-Register

Key-Value pair

OR-Set

With composability we can do a lot!

What would you say to a subset of SQL?

Version vector

Should we rely on system timestamps?

Version vector

Logical time

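Version vectors shown in the diagram, in order of appearance:

{ C:1 }
{ B:1, C:1 }
{ B:2, C:1 }
{ A:1, B:2, C:1 }
{ A:2, B:2, C:1 }
{ B:3, C:1 }
{ A:2, B:4, C:1 }
{ B:3, C:2 }
{ A:2, B:5, C:1 }
{ A:2, B:5, C:4 }
{ B:3, C:3 }
{ A:3, B:3, C:3 }
{ A:2, B:5, C:5 }
{ A:4, B:5, C:5 }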

Version vector

Comparison operator

Less:        { a:1, b:0, c:0 }  <   { a:2, b:1, c:2 }

Equal:       { a:1, b:0, c:0 }  =   { a:1, b:0, c:0 }

Greater:     { a:4, b:1, c:2 }  >   { a:3, b:0, c:0 }

Concurrent:  { a:4, b:1, c:2 }  ?!  { a:3, b:2, c:2 }
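The four outcomes of the comparison can be computed entry by entry. The sketch below treats a missing entry as 0 and uses hypothetical names (`Ordering`, `compare`, `merge`).

```python
from enum import Enum
from typing import Dict

VersionVector = Dict[str, int]  # replica id -> logical counter

class Ordering(Enum):
    LESS = "<"
    EQUAL = "="
    GREATER = ">"
    CONCURRENT = "?!"

def compare(left: VersionVector, right: VersionVector) -> Ordering:
    keys = left.keys() | right.keys()
    some_less = any(left.get(k, 0) < right.get(k, 0) for k in keys)
    some_greater = any(left.get(k, 0) > right.get(k, 0) for k in keys)
    if some_less and some_greater:
        return Ordering.CONCURRENT  # neither vector dominates the other
    if some_less:
        return Ordering.LESS
    if some_greater:
        return Ordering.GREATER
    return Ordering.EQUAL

def merge(left: VersionVector, right: VersionVector) -> VersionVector:
    # Entry-wise maximum: the smallest vector that dominates both inputs.
    return {k: max(left.get(k, 0), right.get(k, 0))
            for k in left.keys() | right.keys()}
```

A concurrent result means neither update happened before the other, which is exactly the situation a CRDT merge function has to resolve.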

Highly Available Transactions

Problem: how to keep consistency between different keys?

set(x=1)

set(y=1)

Highly Available Transactions

Read Atomic Transaction Isolation

Synchronization independence
Partition independence

RAMP-Fast

Introduction

Each record consists of:

key
value
metadata (dependencies)
timestamp

RAMP-Fast

Writes

Prepare {X=1, ts:1, dep: [Y]}

Prepare {Y=1, ts:1, dep: [X]}

Prepared

Commit {ts:1}

Committed

RAMP-Fast

Reads (happy case)

Get {X}

{X=1, ts:1, dep: [Y]}

{Y=1, ts:1, dep: [X]}

Get {Y}

RAMP-Fast

Reads (repair)

Get {X}

{X=1, ts:1, dep: [Y]}

{Y=0, ts:0, dep: []}

Get {Y}

Mismatch

Get {Y, ts:1}

{Y=1, ts:1, dep: [X]}
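The prepare/commit writes and the two-round read can be simulated in memory. The sketch below is a simplified illustration of the idea, not the algorithm as specified in the RAMP paper: there is one partition object per key, there is no networking or garbage collection, and all names (`Version`, `Partition`, `write_all`, `read_all`) are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, NamedTuple, Optional, Tuple

class Version(NamedTuple):
    value: int
    ts: int
    deps: FrozenSet[str]  # sibling keys written by the same transaction

INITIAL = Version(value=0, ts=0, deps=frozenset())

class Partition:
    def __init__(self) -> None:
        self.versions: Dict[Tuple[str, int], Version] = {}
        self.last_commit: Dict[str, int] = {}

    def prepare(self, key: str, version: Version) -> None:
        # Store the version without making it visible to plain reads.
        self.versions[(key, version.ts)] = version

    def commit(self, key: str, ts: int) -> None:
        self.last_commit[key] = max(ts, self.last_commit.get(key, 0))

    def get(self, key: str, ts: Optional[int] = None) -> Version:
        # Without ts: latest committed version. With ts: that exact version.
        wanted = self.last_commit.get(key, 0) if ts is None else ts
        return self.versions.get((key, wanted), INITIAL)

def write_all(partitions: Dict[str, Partition],
              writes: Dict[str, int], ts: int) -> None:
    keys = frozenset(writes)
    for key, value in writes.items():  # phase 1: prepare everywhere
        partitions[key].prepare(key, Version(value, ts, keys - {key}))
    for key in writes:                 # phase 2: commit everywhere
        partitions[key].commit(key, ts)

def read_all(partitions: Dict[str, Partition],
             keys: Iterable[str]) -> Dict[str, Version]:
    result = {key: partitions[key].get(key) for key in keys}  # first round
    required: Dict[str, int] = {}
    for version in list(result.values()):
        for dep in version.deps:
            if dep in result:
                required[dep] = max(required.get(dep, 0), version.ts)
    for key, ts in required.items():
        if ts > result[key].ts:  # mismatch: a sibling write is missing
            result[key] = partitions[key].get(key, ts)  # second round (repair)
    return result
```

The repair case from the slides corresponds to preparing both keys, committing only X, and then reading both. The first round returns Y at ts 0, X's metadata reveals the mismatch, and the second round fetches Y at ts 1. This works because a version is committed only after all of its siblings have been prepared.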

Resources

Paper: http://hal.upmc.fr/inria-00555588/document
Practical Demystification of CRDT: https://www.youtube.com/watch?v=PQzNW8uQ_Y4
Consistency without consensus: https://www.infoq.com/presentations/crdt-soundcloud
RAMP Transactions: www.bailis.org/papers/ramp-sigmod2014.pdf
RAMP made easy: rustyrazorblade.com/2015/11/ramp-made-easy/
