Conflict-free Replicated Data Types
Bartosz Sypytkowski
@horusiath
b.sypytkowski@gmail.com
http://bartoszsypytkowski.com
Let's describe the problem with a use case
View counter for globally accessible resources (a.k.a. YouTube views counter)
We need replication
But how to keep state in sync?
We need replication
Publishing a video
We need replication
Updating view counter
X = 1
X = 2
X = ?
?
CAP Theorem
Partition tolerance
Consistency
Availability
CAP Theorem
X = 2
X = 3
X = 1
X = 1
Consistency
X = 2
X = 3
X = 1
X = 1
Availability
X = 2
X = 3
X = 2
X = 3
CRDTs
Highly available / Eventually consistent
It's all about Math
Commutative: x • y = y • x
Associative: (x • y) • z = x • (y • z)
Idempotent: x • x = x
Why does it matter?
So which operations meet those criteria?
Number addition?
- x + y = y + x
- (x + y) + z = x + (y + z)
- x + x ≠ x
So which operations meet those criteria?
Number max
- max(x, y) = max(y, x)
- max(max(x, y), z) = max(x, max(y, z))
- max(x, x) = x
So which operations meet those criteria?
Set union
- x ∪ y = y ∪ x
- (x ∪ y) ∪ z = x ∪ (y ∪ z)
- x ∪ x = x
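The three checks above can be replayed mechanically. The following is a minimal sketch, not a proof: it only tests the laws over a handful of sample values, and the helper name `check` is made up for this illustration.

```python
from itertools import product

def check(op, samples):
    """Report which of the three merge laws hold for `op` over `samples`."""
    commutative = all(op(x, y) == op(y, x) for x, y in product(samples, repeat=2))
    associative = all(
        op(op(x, y), z) == op(x, op(y, z))
        for x, y, z in product(samples, repeat=3)
    )
    idempotent = all(op(x, x) == x for x in samples)
    return {
        "commutative": commutative,
        "associative": associative,
        "idempotent": idempotent,
    }

numbers = [0, 1, 2, 5]
sets = [frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({3})]

addition = check(lambda x, y: x + y, numbers)   # fails only idempotence
maximum = check(max, numbers)                   # passes all three
union = check(lambda x, y: x | y, sets)         # passes all three
```

Addition fails idempotence, which is why a plain "add the numbers" merge would double-count whenever the same update is delivered twice.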
CRDT types
- Commutative
- Convergent
CRDT libs/DBs
Collection types
- G-Counter (increment-only counter)
- PN-Counter (increment/decrement counter)
- G-Set (growing-only set)
- 2P-Set (add/remove set)
- OR-Set (observed remove set)
- LWW-Register (last write wins register)
- MV-Register (multi value register)
- RGA (replicated growable array)
- L-Seq (modifiable ordered sequence)
- ... and more
G-Counter
Increment-only counter
G-Counter
// empty value
empty → { }
// increment - every replica increments
// only its own replica entry
inc({ a: 1, b: 1 }) → { a: 2, b: 1 } // replica a ++
// merge - max number of all replicas
{ a: 2, b: 1 } ∪ { a: 1, b: 2 } → { a: 2, b: 2 }
// value - sum of all replicas
value({ a: 1, b: 2 }) → 3
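The pseudocode above translates almost directly into runnable code. This is a minimal sketch that assumes a counter is a plain dict from replica id to count; the function names (`g_inc`, `g_merge`, `g_value`) are illustrative, not taken from any library.

```python
def g_inc(counter, replica, n=1):
    """Return a copy of `counter` with the replica's own entry raised by n."""
    updated = dict(counter)
    updated[replica] = updated.get(replica, 0) + n
    return updated

def g_merge(a, b):
    """Merge two counters by taking the per-replica maximum."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}

def g_value(counter):
    """The observed value is the sum over all replica entries."""
    return sum(counter.values())

empty = {}
incremented = g_inc({"a": 1, "b": 1}, "a")
merged = g_merge({"a": 2, "b": 1}, {"a": 1, "b": 2})
total = g_value({"a": 1, "b": 2})
```

Because the merge is a per-entry max, merging the same state twice, or in a different order, yields the same result.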
PN-Counter
Increment/decrement counter
PN-Counter
Implementation
Just compose 2 G-Counters:
- increments
- decrements
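A minimal sketch of that composition, assuming the same dict-based G-Counter representation as in the pseudocode earlier. All function names and the `inc`/`dec` field names are hypothetical.

```python
def g_inc(counter, replica, n=1):
    """G-Counter increment: bump only the replica's own entry."""
    updated = dict(counter)
    updated[replica] = updated.get(replica, 0) + n
    return updated

def g_merge(a, b):
    """G-Counter merge: per-replica maximum."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}

def pn_empty():
    return {"inc": {}, "dec": {}}

def pn_increment(pn, replica, n=1):
    return {"inc": g_inc(pn["inc"], replica, n), "dec": dict(pn["dec"])}

def pn_decrement(pn, replica, n=1):
    # A decrement is recorded as an increment of the "dec" G-Counter.
    return {"inc": dict(pn["inc"]), "dec": g_inc(pn["dec"], replica, n)}

def pn_merge(a, b):
    """Merge each inner G-Counter independently."""
    return {"inc": g_merge(a["inc"], b["inc"]), "dec": g_merge(a["dec"], b["dec"])}

def pn_value(pn):
    return sum(pn["inc"].values()) - sum(pn["dec"].values())

# Replica a increments twice, replica b decrements once, then they sync.
on_a = pn_increment(pn_increment(pn_empty(), "a"), "a")
on_b = pn_decrement(pn_empty(), "b")
synced = pn_merge(on_a, on_b)
```

Neither inner counter ever shrinks, so the merge stays a plain per-entry max even though the observed value can go down.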
G-Set
Growing-only set
Nothing to see here... just an ordinary set
// empty value - empty set
empty → {}
// add element
add({}, 123) → { 123 }
// merge - union sets
{ 123, 234 } ∪ { 234, 345 } → { 123, 234, 345 }
// value - just return a set
value({ 123, 234 }) → { 123, 234 }
What if we want to remove an element?
// state of a G-Set on replicas A & B
A(X) → { 123, 234 }
B(X) → { 123, 234 }
// remove 123 on replica A
A(X) → { 234 }
B(X) → { 123, 234 }
// merge both replicas
A(X) ∪ B(X) → { 123, 234 } // WRONG!
2P-Set
2 Phase (add/remove) set
Again... just compose 2 sets:
- add set
- rem set a.k.a. tombstones
// empty value - 2 empty sets
empty → { add: {}, rem: {} }
// add element
X = add(empty, 123) → { add: { 123 }, rem: {} }
X = add(X, 234) → { add: { 123, 234 }, rem: {} }
// remove element
X = rem(X, 123) → { add: { 123, 234 }, rem: { 123 } }
// value - diff between add & rem
value(X) → { 234 }
// merge - union corresponding add/rem
X → { add: { 123, 234 }, rem: { 123 } }
Y → { add: { 123, 345 }, rem: {} }
X ∪ Y → { add: { 123, 234, 345 }, rem: { 123 } }
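The same operations as a runnable sketch, assuming the state is a dict holding two frozensets. The function names are illustrative only.

```python
def tp_empty():
    return {"add": frozenset(), "rem": frozenset()}

def tp_add(s, elem):
    return {"add": s["add"] | {elem}, "rem": s["rem"]}

def tp_rem(s, elem):
    # Removal never deletes anything; it only records a tombstone.
    return {"add": s["add"], "rem": s["rem"] | {elem}}

def tp_value(s):
    """Visible elements: everything added that has no tombstone."""
    return s["add"] - s["rem"]

def tp_merge(a, b):
    """Union the add sets and the tombstone sets separately."""
    return {"add": a["add"] | b["add"], "rem": a["rem"] | b["rem"]}

x = tp_rem(tp_add(tp_add(tp_empty(), 123), 234), 123)
y = tp_add(tp_add(tp_empty(), 123), 345)
merged = tp_merge(x, y)
readded = tp_add(merged, 123)   # the tombstone for 123 still wins
```

Both inner sets only grow, so the merge is just two unions. The price is that a tombstoned element stays removed even if it is added again.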
1. What if tombstones grow big?
2. What if we want to re-add a removed element?
OR-Set
Observed Remove Set
Initial state:
add: { A/1, B/2 }
rem: { }
Insert: C/3
add: { A/1, B/2, C/3 }
rem: { }
Remove: C/4
add: { A/1, B/2 }
rem: { C/4 }
Insert: C/5
add: { A/1, B/2, C/5 }
rem: { }
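The insert/remove sequence on this slide can be replayed with a small sketch. It assumes the `X/n` notation means "element X with timestamp n" and that, per element, the entry with the highest timestamp wins. That makes it behave much like a last-write-wins element set; production OR-Sets typically track unique tags or version vectors instead. All names below are hypothetical.

```python
def or_merge(a, b):
    """Per element, keep the entry with the highest (timestamp, present) pair."""
    merged = dict(a)
    for elem, entry in b.items():
        if elem not in merged or entry > merged[elem]:
            merged[elem] = entry
    return merged

def or_insert(state, elem, ts):
    return or_merge(state, {elem: (ts, True)})

def or_remove(state, elem, ts):
    return or_merge(state, {elem: (ts, False)})

def or_view(state):
    """Split the state into the slide's add and rem parts."""
    add = {e: ts for e, (ts, present) in state.items() if present}
    rem = {e: ts for e, (ts, present) in state.items() if not present}
    return add, rem

initial = {"A": (1, True), "B": (2, True)}
s1 = or_insert(initial, "C", 3)
s2 = or_remove(s1, "C", 4)
s3 = or_insert(s2, "C", 5)
```

Here `or_view(s2)` puts C/4 in the rem part, and `or_view(s3)` brings C back as C/5 with an empty rem part, so a removed element can be added again, unlike in the 2P-Set.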
LWW-Register
Last Write Wins Register
Value
Timestamp
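A minimal sketch of such a register: a value paired with a timestamp, where merge keeps the pair with the newer timestamp. The `replica` tie-breaker field is an assumption added here so that equal timestamps still resolve the same way on every replica; it is not on the slide.

```python
from typing import Any, NamedTuple

class LWWRegister(NamedTuple):
    value: Any
    timestamp: float
    replica: str = ""   # assumed tie-breaker, not part of the slide

def lww_merge(a, b):
    """Keep whichever write has the greater (timestamp, replica) pair."""
    return a if (a.timestamp, a.replica) >= (b.timestamp, b.replica) else b

def lww_set(reg, value, timestamp, replica=""):
    """A local write is just a merge with a freshly stamped register."""
    return lww_merge(reg, LWWRegister(value, timestamp, replica))

old = LWWRegister("draft", 1, "a")
new = LWWRegister("final", 2, "b")
tie_a = LWWRegister("from-a", 5, "a")
tie_b = LWWRegister("from-b", 5, "b")
```

The sketch assumes a single replica never issues two different writes with the same timestamp; without that assumption the merge would not be commutative.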
LWW-Map
Last Write Wins Key-Value Map
LWW-Register
Key-Value pair
OR-Set
With composability we can do a lot!
What would you say to a subset of SQL?
Version vector
Should we rely on system timestamps?
Version vector
Logical time
C:1
B:1 C:1
B:2 C:1
A:1 B:2 C:1
A:2 B:2 C:1
B:3 C:1
A:2 B:4 C:1
B:3 C:2
A:2 B:5 C:1
A:2 B:5 C:4
B:3 C:3
A:3 B:3 C:3
A:2 B:5 C:5
A:4 B:5 C:5
A
B
C
Version vector
Comparison operator
Less
Greater
Equal
Concurrent
Less:       a:1 b:0 c:0  <   a:2 b:1 c:2
Equal:      a:1 b:0 c:0  =   a:1 b:0 c:0
Greater:    a:4 b:1 c:2  >   a:3 b:0 c:0
Concurrent: a:4 b:1 c:2  ?!  a:3 b:2 c:2
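The four outcomes can be computed by comparing the vectors entry by entry. This is a minimal sketch, assuming a version vector is a dict from replica id to counter and that a missing entry counts as 0. The function name `vv_compare` is illustrative.

```python
def vv_compare(a, b):
    """Return 'less', 'greater', 'equal' or 'concurrent' for vectors a and b."""
    keys = a.keys() | b.keys()
    a_behind = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    b_behind = any(a.get(k, 0) > b.get(k, 0) for k in keys)
    if a_behind and b_behind:
        return "concurrent"   # each side has seen something the other has not
    if a_behind:
        return "less"
    if b_behind:
        return "greater"
    return "equal"

less = vv_compare({"a": 1, "b": 0, "c": 0}, {"a": 2, "b": 1, "c": 2})
equal = vv_compare({"a": 1, "b": 0, "c": 0}, {"a": 1, "b": 0, "c": 0})
greater = vv_compare({"a": 4, "b": 1, "c": 2}, {"a": 3, "b": 0, "c": 0})
concurrent = vv_compare({"a": 4, "b": 1, "c": 2}, {"a": 3, "b": 2, "c": 2})
```

In the last pair the first vector is ahead on `a` and the second is ahead on `b`, so neither update happened before the other.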
Highly Available Transactions
Problem: how to keep consistency between different keys?
A
B
set(x=1)
set(y=1)
Highly Available Transactions
Read Atomic Transaction Isolation
- Synchronization independence
- Partition independence
RAMP-Fast
Introduction
Each record is identified by:
- key
- value
- metadata (dependencies)
- timestamp
RAMP-Fast
Writes
Prepare {X=1, ts:1, dep: [Y]}
Prepare {Y=1, ts:1, dep: [X]}
Prepared
Prepared
Commit {ts:1}
Commit {ts:1}
Committed
Committed
RAMP-Fast
Reads (happy case)
Get {X}
{X=1, ts:1, dep: [Y]}
{Y=1, ts:1, dep: [X]}
Get {Y}
RAMP-Fast
Reads (repair)
Get {X}
{X=1, ts:1, dep: [Y]}
{Y=0, ts:0, dep: []}
Get {Y}
Mismatch
Get {Y, ts:1}
{Y=1, ts:1, dep: [X]}
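The write and read flows above can be simulated in memory. The following is a simplified sketch of the idea, not the algorithm as specified in the RAMP paper: it assumes one partition per key, a single client, no failures and unique transaction timestamps. The `Partition` class and all function names are hypothetical.

```python
class Partition:
    """One in-memory partition holding every version of the keys it owns."""

    def __init__(self):
        self.versions = {}      # (key, ts) -> (value, deps)
        self.last_commit = {}   # key -> highest committed ts

    def prepare(self, key, value, ts, deps):
        # A prepared version is stored but not yet visible to plain reads.
        self.versions[(key, ts)] = (value, frozenset(deps))

    def commit(self, key, ts):
        self.last_commit[key] = max(self.last_commit.get(key, 0), ts)

    def get(self, key, ts=None):
        # Without ts: latest committed version. With ts: that exact version.
        if ts is None:
            ts = self.last_commit.get(key, 0)
        value, deps = self.versions.get((key, ts), (None, frozenset()))
        return {"value": value, "ts": ts, "deps": deps}

def read_all(partitions, keys):
    """First-round reads, then repair any key that a sibling proves is stale."""
    result = {k: partitions[k].get(k) for k in keys}
    required = {k: result[k]["ts"] for k in keys}
    for rec in result.values():
        for dep in rec["deps"]:
            if dep in required:
                required[dep] = max(required[dep], rec["ts"])
    for k in keys:
        if required[k] > result[k]["ts"]:       # mismatch: fetch by timestamp
            result[k] = partitions[k].get(k, required[k])
    return {k: rec["value"] for k, rec in result.items()}

partitions = {"X": Partition(), "Y": Partition()}
for key in partitions:                          # initial versions at ts 0
    partitions[key].prepare(key, 0, 0, [])
    partitions[key].commit(key, 0)

# Transaction ts:1 is prepared on both partitions but committed only on X.
partitions["X"].prepare("X", 1, 1, ["Y"])
partitions["Y"].prepare("Y", 1, 1, ["X"])
partitions["X"].commit("X", 1)

stale_y = partitions["Y"].get("Y")["value"]     # plain read still sees Y=0
repaired = read_all(partitions, ["X", "Y"])
```

The first-round read of Y still returns the ts:0 version, but the X record lists Y as a dependency at ts:1, so the reader asks Y's partition for that exact version and sees both writes of the transaction together.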
Resources
- Paper: http://hal.upmc.fr/inria-00555588/document
- Practical Demystification of CRDT: https://www.youtube.com/watch?v=PQzNW8uQ_Y4
- Consistency without consensus: https://www.infoq.com/presentations/crdt-soundcloud
- RAMP Transactions: www.bailis.org/papers/ramp-sigmod2014.pdf
- RAMP made easy: rustyrazorblade.com/2015/11/ramp-made-easy/
End