Fernanda Mora
Luis Román
Corbett, James C., et al. "Spanner: Google’s globally distributed database." ACM Transactions on Computer Systems (TOCS) 31.3 (2013): 8.
Spanner: Time will never be the same again
-Gustavo Fring
To build a transactional storage system replicated globally
[Figure: Data centers 1, 2, and 3, each running many spanservers, with replication between data centers.]
Millions of nodes, hundreds of datacenters, trillions of database rows
(key:string, timestamp:int64) -> string
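The mapping above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class name `VersionedStore` and its methods are hypothetical, timestamps are plain integers, and nothing here reflects how Spanner actually stores data on disk.

```python
from bisect import bisect_right
from collections import defaultdict


class VersionedStore:
    """Toy in-memory model of (key, timestamp) -> value (hypothetical)."""

    def __init__(self):
        # Per key: parallel lists of timestamps (sorted) and values.
        self._timestamps = defaultdict(list)
        self._values = defaultdict(list)

    def write(self, key: str, timestamp: int, value: str) -> None:
        ts = self._timestamps[key]
        if ts and timestamp <= ts[-1]:
            raise ValueError("timestamps for a key must strictly increase")
        ts.append(timestamp)
        self._values[key].append(value)

    def read(self, key: str, timestamp: int):
        """Return the newest value written at or before `timestamp`."""
        ts = self._timestamps.get(key, [])
        idx = bisect_right(ts, timestamp)
        return self._values[key][idx - 1] if idx else None


store = VersionedStore()
store.write("user/1", 10, "alice")
store.write("user/1", 20, "alicia")
old_value = store.read("user/1", 15)  # version written at timestamp 10
```

Keeping every version, rather than overwriting in place, is what makes it possible to read the data as of a past timestamp.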
Paxos Group
Synchronization algorithms
Implementation
Practical use
Global scale
But we can't distinguish concurrent events!
Only partial order: what about concurrent events?!
GPS receivers
Atomic clocks
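TrueTime exposes time as an interval rather than a single instant, so that clock uncertainty is explicit. The sketch below imitates the shape of that API (`now`, `after`, `before`) as described in the Spanner paper. The class names, the fixed `epsilon` value, and the use of the local system clock are assumptions for illustration; the real system derives its bounds from GPS receivers and atomic clocks.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class TrueTime:
    """Toy TrueTime: a local clock widened by an assumed uncertainty."""

    def __init__(self, epsilon: float = 0.004, clock=time.time):
        self.epsilon = epsilon  # assumed bound on clock error, in seconds
        self._clock = clock

    def now(self) -> TTInterval:
        t = self._clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t: float) -> bool:
        # True only if t has definitely passed.
        return t < self.now().earliest

    def before(self, t: float) -> bool:
        # True only if t has definitely not arrived yet.
        return t > self.now().latest
```

Two events whose intervals do not overlap can be ordered with certainty; overlapping intervals are exactly the case where the system has to wait out the uncertainty.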
How to use TrueTime to guarantee:
Timestamp Management
Operation | Concurrency Control | Replica Required |
---|---|---|
Read-write | pessimistic | leader |
Read-only | lock-free | leader for timestamp |
Snapshot read | lock-free | any |
Spanner's Paxos implementation uses timed leases to make leadership long-lived
Lease starts: the leader discovers it has a quorum of lease votes
Lease ends: the leader no longer has a quorum of lease votes
Spanner depends on the following invariant: for each Paxos group, each Paxos leader's lease interval is disjoint from every other leader's
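The disjointness invariant can be stated as a simple check over lease intervals. This is a minimal sketch, assuming each lease is a half-open interval `(start, end)` on a shared timeline; the function name `leases_disjoint` is hypothetical and the check is a test oracle, not the protocol that enforces the invariant.

```python
from typing import List, Tuple

Lease = Tuple[float, float]  # (start, end), treated as half-open [start, end)


def leases_disjoint(leases: List[Lease]) -> bool:
    """Return True if no two lease intervals in one Paxos group overlap."""
    ordered = sorted(leases)
    return all(
        prev_end <= next_start
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    )
```

For example, `leases_disjoint([(0, 10), (10, 20)])` holds, while `leases_disjoint([(0, 10), (5, 15)])` does not, because two leaders would both believe they hold the lease between 5 and 10.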
Transactional reads and writes use two-phase locking. As a result, timestamps can be assigned at any time after the locks have been acquired but before they've been released.
Monotonicity Invariant: Spanner assigns timestamps to Paxos writes in monotonically increasing order, even across leaders.
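According to the paper, monotonicity across leaders follows from lease disjointness: a leader only assigns timestamps inside its own lease interval. A minimal sketch of that idea, assuming integer timestamps and a hypothetical `LeaderTimestamps` helper:

```python
class LeaderTimestamps:
    """Assigns increasing timestamps, restricted to this leader's lease."""

    def __init__(self, lease_start: int, lease_end: int):
        self.lease_start = lease_start
        self.lease_end = lease_end  # exclusive
        self._last = lease_start - 1

    def assign(self, proposed: int) -> int:
        # Never go backwards, and never go below the start of the lease.
        ts = max(proposed, self._last + 1)
        if not (self.lease_start <= ts < self.lease_end):
            raise RuntimeError("timestamp falls outside this leader's lease")
        self._last = ts
        return ts
```

Because leases of successive leaders do not overlap, every timestamp handed out by a later leader is larger than every timestamp handed out by an earlier one.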
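Spanner also enforces the following external consistency invariant: if the start of a transaction $T_2$ occurs after the commit of a transaction $T_1$, then the commit timestamp of $T_2$ must be greater than the commit timestamp of $T_1$. Define the start and commit events for transaction $T_i$ by $e_i^{start}$ and $e_i^{commit}$,
and the commit timestamp as $s_i$. The invariant becomes $t_{abs}(e_1^{commit}) < t_{abs}(e_2^{start}) \Rightarrow s_1 < s_2$.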
Enforced by two rules:
1.- Start: the coordinator leader for a write $T_i$ assigns a commit timestamp $s_i$ no less than the value of TT.now().latest, computed after $e_i^{server}$, the arrival of the commit request at the coordinator leader.
2.- Commit wait: the coordinator leader ensures that clients cannot see any data committed by $T_i$ until TT.after($s_i$) is true.
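The two rules can be sketched with a simulated clock. Everything below is an assumption for illustration: `SimulatedTrueTime` is a hypothetical stand-in for TrueTime with a manually advanced clock, and `commit_with_wait` and `make_visible` are made-up names. A real system would sleep instead of advancing the clock itself.

```python
class SimulatedTrueTime:
    """Hypothetical stand-in for TrueTime driven by a manual clock."""

    def __init__(self, epsilon: float, start: float = 0.0):
        self.epsilon = epsilon
        self.clock = start

    def now(self):
        # (earliest, latest) bounds on absolute time.
        return (self.clock - self.epsilon, self.clock + self.epsilon)

    def after(self, t: float) -> bool:
        # True only if t has definitely passed.
        return t < self.now()[0]

    def advance(self, dt: float) -> None:
        self.clock += dt


def commit_with_wait(tt, make_visible, step: float = 0.5) -> float:
    # Rule 1 (start): choose a commit timestamp no less than TT.now().latest.
    s_i = tt.now()[1]
    # Rule 2 (commit wait): hold the data back until s_i has definitely passed.
    while not tt.after(s_i):
        tt.advance(step)  # simulation only; a real system would sleep
    make_visible(s_i)
    return s_i
```

With an uncertainty of `epsilon`, the wait lasts roughly `2 * epsilon`, which is why keeping the clock uncertainty small matters for write latency.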
Serving reads at a timestamp:
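Every replica tracks a value called safe time, $t_{safe}$, which is the maximum timestamp at which the replica is up-to-date. A replica can satisfy a read at timestamp $t$ if $t \le t_{safe}$, where $t_{safe} = \min(t_{safe}^{Paxos}, t_{safe}^{TM})$.
$t_{safe}^{Paxos}$: timestamp of the highest-applied Paxos write.
$t_{safe}^{TM}$: $\min_i(s_{i,g}^{prepare}) - 1$ over all transactions prepared but not yet committed at group $g$ (infinity if there are none), where $s_{i,g}^{prepare}$ is the prepare timestamp assigned by the participant leader to transaction $T_i$ in group $g$.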
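The safe-time rule from the paper can be written as a short function. This is a sketch assuming integer timestamps; the function and parameter names are hypothetical.

```python
import math
from typing import Iterable


def safe_time(paxos_safe: int, prepared_timestamps: Iterable[int]) -> float:
    """Combine the Paxos and transaction-manager bounds on safe time."""
    prepared = list(prepared_timestamps)
    # With no prepared-but-uncommitted transactions, the transaction
    # manager imposes no bound.
    tm_safe = min(prepared) - 1 if prepared else math.inf
    return min(paxos_safe, tm_safe)


def can_serve_read(t: int, paxos_safe: int, prepared_timestamps: Iterable[int]) -> bool:
    """A replica may serve a read at t only if t does not exceed safe time."""
    return t <= safe_time(paxos_safe, prepared_timestamps)
```

A prepared transaction holds safe time back because its eventual commit timestamp is only known to be at least its prepare timestamp, so the replica cannot yet serve reads at or beyond that point.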
Read-only transactions:
A read-only transaction executes in two phases:
1.- assign a timestamp $s_{read}$
2.- execute the transaction's reads as snapshot reads at $s_{read}$
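The two phases can be sketched as follows. The sketch assumes the simple assignment mentioned in the paper, where the read timestamp is TT.now().latest, passed in here as `now_latest`. The function name and the `versions` layout (a dict from key to a list of `(timestamp, value)` pairs) are hypothetical, and the wait for a replica's safe time to catch up is left out.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Versions = Dict[str, List[Tuple[int, str]]]  # key -> [(timestamp, value), ...]


def read_only_transaction(
    now_latest: int, versions: Versions, keys: Iterable[str]
) -> Tuple[int, Dict[str, Optional[str]]]:
    # Phase 1: assign the read timestamp.
    s_read = now_latest
    # Phase 2: run every read as a lock-free snapshot read at s_read.
    result: Dict[str, Optional[str]] = {}
    for key in keys:
        visible = [(ts, v) for ts, v in versions.get(key, []) if ts <= s_read]
        result[key] = max(visible, key=lambda pair: pair[0])[1] if visible else None
    return s_read, result
```

Because all reads use the same timestamp, the transaction sees one consistent snapshot even if writes commit while it is running.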
Details
[Figure: two-phase commit timeline across three groups. Stages shown: acquired locks, compute ts, start logging, done logging, prepared + ts, commit overall ts, commit wait done, release locks.]
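The "commit overall ts" step can be sketched from the constraints given in the paper: the overall commit timestamp must be at least every prepare timestamp, greater than TT.now().latest when the coordinator received the commit message, and greater than any timestamp the leader assigned earlier. The function below is only an illustration with integer timestamps and hypothetical names.

```python
from typing import Iterable


def choose_commit_timestamp(
    prepare_timestamps: Iterable[int], now_latest: int, last_assigned: int
) -> int:
    """Pick the smallest integer timestamp satisfying all three constraints."""
    highest_prepare = max(prepare_timestamps, default=0)
    return max(highest_prepare, now_latest + 1, last_assigned + 1)
```

For example, with prepare timestamps `[10, 12]`, `now_latest = 11`, and `last_assigned = 9`, the coordinator would pick 12; commit wait then applies to that value before locks are released.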
Distribution of TrueTime ε values, sampled right after the timeslave daemon polls the time masters: 90th, 99th, and 99.9th percentiles
Two-phase commit (2PC) scalability: mean and standard deviation over 10 runs
Effect of killing servers on throughput
Global scale database with strict transactional guarantees
Easy-to-use
Semi-relational interface
SQL-based query language
Scalability
Automatic sharding
Fault tolerance
Consistent replication
External consistency
Wide-area distribution