Banyan: Coordination-free Distributed Transactions over Mergeable Types
Synopsis by
Shashank Shekhar Dubey
Guide: Dr. KC Sivaramakrishnan
RISE Lab
Dept. of Computer Science & Engineering
IIT Madras
A brief discussion of Banyan: the problem at hand and the proposed solution
Operational semantics
Why build distributed systems?
Scalability
Fault-tolerance
High availability
High throughput
Low latency
Latency
Problem with eventual consistency
Conflicts while merging replicas
Mergeable Replicated Data Types (MRDT)
Distributed variant of ordinary data types
Inbuilt ability to reconcile conflicts
Challenges:
Data distribution
Recursive merge
Storage requirement
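The "inbuilt ability to reconcile conflicts" can be illustrated with a three-way merge, where two diverged replica states are combined relative to their lowest common ancestor. The sketch below is a minimal illustration, not Banyan's implementation; the function names `merge_counter` and `merge_set` are hypothetical.

```python
def merge_counter(lca: int, a: int, b: int) -> int:
    """Three-way merge of a counter: apply both branches' deltas to the ancestor."""
    return lca + (a - lca) + (b - lca)


def merge_set(lca: frozenset, a: frozenset, b: frozenset) -> frozenset:
    """Three-way merge of a set: keep what both sides kept, plus what either side added."""
    return (lca & a & b) | (a - lca) | (b - lca)


# Ancestor 5; one replica added 2, the other subtracted 1.
print(merge_counter(5, 7, 4))  # prints 6

# Ancestor {1, 2}; one replica added 3, the other removed 1.
print(sorted(merge_set(frozenset({1, 2}), frozenset({1, 2, 3}), frozenset({2}))))  # prints [2, 3]
```

Both functions return the same result when `a` and `b` are swapped, so the order in which replicas merge does not matter.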
Operational Semantics
Banyan
Operational Semantics
Core language => Operations
Mathematical rules => Transactions
Operational Semantics
Understanding of system
Verification
Programming Model
Public Branch
Private Branch
Isolated R/W operations at each private branch
Remote Refresh
Publish
Refresh
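The branch operations listed above can be sketched as a toy model. This is an illustration under stated assumptions, not Banyan's API: it assumes publish moves private updates to the public branch, refresh pulls the public branch into the private one, and remote refresh pulls another replica's public branch into the local public branch. To keep the merge trivial, the state is a grow-only set merged by union. The class `Replica` and its method names are hypothetical.

```python
class Replica:
    """Toy replica with one public and one private branch (grow-only set state)."""

    def __init__(self) -> None:
        self.public: set = set()
        self.private: set = set()

    def write(self, item) -> None:
        # Writes touch only the private branch, so they are isolated.
        self.private.add(item)

    def read(self) -> set:
        # Reads see only the private branch.
        return set(self.private)

    def publish(self) -> None:
        # Assumed direction: private -> public.
        self.public |= self.private

    def refresh(self) -> None:
        # Assumed direction: public -> private.
        self.private |= self.public

    def remote_refresh(self, other: "Replica") -> None:
        # Assumed direction: remote public -> local public.
        self.public |= other.public


r1, r2 = Replica(), Replica()
r1.write("x")
r1.publish()
r2.remote_refresh(r1)
print(r2.read())  # prints set(): the private branch has not been refreshed yet
r2.refresh()
print(r2.read())  # prints {'x'}
```

In this model a remote update becomes visible to a client only after an explicit refresh, which is what keeps reads and writes on a private branch isolated.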
Banyan objects:
Blob
Tree
Commit
Key-Value : [a;b;c] -> v
Initial state:
[Figure: nodes labelled a, lb, V1, C1]
write B1 [a] v1:
[Figure: branch B1, Tag Store, Block Store]
write B1 [b;c] v2:
write B1 [b;d] v3:
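The three writes above can be traced with a small content-addressed store. This is a sketch under assumptions, not Banyan's storage code: it assumes the tag store maps a branch name to its latest commit, and the block store maps a hash to a blob, tree, or commit object. The helpers `put`, `set_path`, `write`, and `read` and the JSON encoding are hypothetical.

```python
import hashlib
import json

block_store = {}  # hash -> blob, tree, or commit object
tag_store = {}    # branch name -> hash of the latest commit


def put(obj: dict) -> str:
    """Store an object under the hash of its content and return that hash."""
    key = hashlib.sha1(json.dumps(obj, sort_keys=True).encode()).hexdigest()
    block_store[key] = obj
    return key


def set_path(tree_hash, path, value) -> str:
    """Return the hash of a new tree with `value` stored at `path`."""
    entries = dict(block_store[tree_hash]["entries"]) if tree_hash is not None else {}
    head, rest = path[0], path[1:]
    if rest:
        child = entries.get(head)
        # Only descend into an existing subtree; anything else is replaced.
        if child is not None and block_store[child]["kind"] != "tree":
            child = None
        entries[head] = set_path(child, rest, value)
    else:
        entries[head] = put({"kind": "blob", "value": value})
    return put({"kind": "tree", "entries": entries})


def write(branch: str, path: list, value: str) -> None:
    """Create a new commit on `branch`; older objects are never modified."""
    parent = tag_store.get(branch)
    root = block_store[parent]["tree"] if parent is not None else None
    commit = {"kind": "commit", "tree": set_path(root, path, value),
              "parents": [parent] if parent is not None else []}
    tag_store[branch] = put(commit)


def read(branch: str, path: list) -> str:
    """Follow `path` from the branch's latest commit down to a blob."""
    node = block_store[tag_store[branch]]["tree"]
    for step in path:
        node = block_store[node]["entries"][step]
    return block_store[node]["value"]


write("B1", ["a"], "v1")
write("B1", ["b", "c"], "v2")
write("B1", ["b", "d"], "v3")
print(read("B1", ["a"]), read("B1", ["b", "c"]), read("B1", ["b", "d"]))  # prints v1 v2 v3
```

Every write adds new tree and commit objects while the old ones stay in the block store, which is one way to see why storage grows quickly without garbage collection.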
Garbage Collection
Space usage: Cassandra vs Banyan
Cassandra: 4.9 MB
Banyan: 1.8 GB (about 376x more)
Garbage Collection
Usual approach : Node reachability
Our approach : Node accessibility
Bugs found in Irmin
Preferring shorter path when merging conflicting updates
Non-commutative merge in case of modify/delete conflict
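Why a non-commutative merge is a problem can be shown with a small check: if the result depends on argument order, two replicas that merge each other's state can end up with different values. The sketch below does not reproduce Irmin's code; the policies `first_wins` and `modify_wins` and the helper names are hypothetical, and `None` stands for a deleted key.

```python
def merge_key(lca, a, b, on_conflict):
    """Three-way merge of one key's value; None means the key was deleted."""
    if a == b:
        return a
    if a == lca:
        return b  # only b changed
    if b == lca:
        return a  # only a changed
    return on_conflict(a, b)  # both changed in different ways


def first_wins(a, b):
    """Order-dependent policy: always keep the first argument."""
    return a


def modify_wins(a, b):
    """Order-independent policy: a modification beats a deletion."""
    if a is None:
        return b
    if b is None:
        return a
    return max(a, b)  # deterministic tie-break for modify/modify


def is_commutative_on_modify_delete(policy) -> bool:
    """Check one modify/delete conflict in both argument orders."""
    return merge_key("v0", "v1", None, policy) == merge_key("v0", None, "v1", policy)


print(is_commutative_on_modify_delete(first_wins))   # prints False
print(is_commutative_on_modify_delete(modify_wins))  # prints True
```

With `first_wins`, one replica keeps `v1` and the other deletes the key, so the replicas diverge even though both have seen the same updates.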
Thank you