Dynamo Random Partitioning Must Die
BOP (ByteOrderedPartitioner) FTW
Random partitioning is suboptimal
Random partitioning puts locality optimization at odds with performance and stability.
Not all workloads require more than k/v, k/row, or modest-sized k/partition locality.
But a significant number of important workloads would benefit from more (and more deliberate) locality.
Problematic Workloads
Time Series, with 4+ orders of magnitude difference in write rates between different sources.
Graph (e.g. Titan/TinkerPop), where the amount of forest locality directly impacts the node hop count of a query.
Any poorly modeled workload where a small number of partitions dominates the load.
Thought Experiment
BOP with vnodes
BOP with many vnodes
BOP with few vnodes
BOP with a dynamic number of vnodes
BOP with vnodes that can dynamically change in response to load
Dynamic VNode API
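// MergeError is a placeholder error type that the slide leaves undefined.
trait VNode: Sized {
    // Split this vnode into two vnodes that together cover the same range.
    fn split(&self) -> (Self, Self);
    // Merge two vnodes into one, failing if they cannot be combined.
    fn merge(vnode1: Self, vnode2: Self) -> Result<Self, MergeError>;
}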
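A minimal sketch of what split and merge could look like on a concrete vnode. Everything here is an assumption made for illustration: the `RangeVNode` type, the `MergeError` enum, the use of `u64` tokens in place of BOP's byte-string keys, and the midpoint split (a real implementation would more likely split where the load divides evenly).

```rust
/// Hypothetical vnode covering the half-open token range [start, end).
/// Real BOP tokens are byte strings; u64 keeps the sketch short.
#[derive(Debug, Clone, PartialEq)]
pub struct RangeVNode {
    pub start: u64,
    pub end: u64,
}

/// Hypothetical error type; the slides do not define one.
#[derive(Debug, PartialEq)]
pub enum MergeError {
    /// The two ranges do not share a boundary.
    NotAdjacent,
}

impl RangeVNode {
    /// Split at the midpoint into two adjacent vnodes.
    /// Assumes start <= end.
    pub fn split(&self) -> (RangeVNode, RangeVNode) {
        let mid = self.start + (self.end - self.start) / 2;
        (
            RangeVNode { start: self.start, end: mid },
            RangeVNode { start: mid, end: self.end },
        )
    }

    /// Merge two vnodes, given in either order, if they are adjacent.
    pub fn merge(a: RangeVNode, b: RangeVNode) -> Result<RangeVNode, MergeError> {
        let (lo, hi) = if a.start <= b.start { (a, b) } else { (b, a) };
        if lo.end == hi.start {
            Ok(RangeVNode { start: lo.start, end: hi.end })
        } else {
            Err(MergeError::NotAdjacent)
        }
    }
}
```

Splitting and then merging the two halves returns the original range, which is the property that lets vnodes grow and shrink with load without losing coverage.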
Node Level Negotiation API
trait Node {
    fn shed_load(&self) -> PerfProfile;
    fn acquire_load(&self) -> PerfProfile;
}
//yes, these might actually be the same method
Node Level Negotiation API
trait Node {
    fn offer_vnode(&self, nodeX: Node);
    fn accept_vnode(&self) -> PerfProfile;
    fn move_vnode(&self, nodeX: Node, vnode
}
//yes, these might actually be the same method
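One way to picture a single negotiation round between two nodes, as a hedged sketch. The `PerfProfile` fields, the `Node` struct, the `negotiate` function, and the "move the vnode that best narrows the load gap" rule are all assumptions; the slides only name the API and do not specify a policy.

```rust
/// Hypothetical per-node summary; the slides do not define PerfProfile.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PerfProfile {
    pub load: u64, // e.g. writes per second summed over owned vnodes
}

/// Hypothetical node that tracks one load figure per owned vnode.
#[derive(Debug)]
pub struct Node {
    pub id: u32,
    pub vnode_loads: Vec<u64>,
}

impl Node {
    pub fn profile(&self) -> PerfProfile {
        PerfProfile { load: self.vnode_loads.iter().sum() }
    }
}

/// One negotiation round: `busy` offers a vnode and `idle` accepts it.
/// Moving a vnode of load l changes the gap g to |g - 2l|, so only vnodes
/// with 0 < l < g are candidates, and the one closest to g / 2 is moved.
/// Returns false when no move would narrow the gap.
pub fn negotiate(busy: &mut Node, idle: &mut Node) -> bool {
    let gap = busy.profile().load.saturating_sub(idle.profile().load);
    let candidate = busy
        .vnode_loads
        .iter()
        .enumerate()
        .filter(|(_, load)| **load > 0 && **load < gap)
        .min_by_key(|(_, load)| gap.abs_diff(2 * **load))
        .map(|(idx, _)| idx);
    match candidate {
        Some(idx) => {
            let moved = busy.vnode_loads.swap_remove(idx);
            idle.vnode_loads.push(moved);
            true
        }
        None => false,
    }
}
```

Calling `negotiate` repeatedly until it returns false converges on a local balance between the pair; a cluster-wide scheme would need to choose which pairs negotiate, which this sketch does not address.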
Spanning Rope*
An arbitrary name, but inspired by both Spanning Trees and Ropes
*Rope: a data structure composed of smaller strings that is used for efficiently storing and manipulating a very long string.
Trivially merges large empty ranges into a single vnode
Can function as a linked list of segments to allow trivial in-order reading across segments
Segments are dynamically splittable, mergeable, and movable.
Enables optimization algorithms (e.g. simplex) to balance multiple competing constraints
Manages vnode size and placement through multiple heuristics/optimization algorithms
Can be tuned to a steady state of one contiguous, balanced vnode per node for maximum locality
Can be tuned to have thousands of vnodes for maximum nimbleness and flexibility
Would benefit substantially from range-segmented SSTables, which would make merging, splitting, and transferring vnodes between nodes very efficient
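The segment behaviour described above (merging empty ranges, in-order reads across segments) can be sketched as follows. This is an illustration under stated assumptions: `Segment`, `SpanningRope`, `merge_empty`, and `scan_owners` are hypothetical names, tokens are `u64` rather than byte strings, and a sorted `Vec` stands in for the linked list of segments.

```rust
/// Hypothetical segment (vnode) covering the half-open range [start, end).
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub start: u64,
    pub end: u64,
    pub owner: u32, // hypothetical node id
    pub rows: u64,  // rows currently stored in this range
}

/// Segments are kept sorted and contiguous, so an in-order read is a walk.
pub struct SpanningRope {
    pub segments: Vec<Segment>,
}

impl SpanningRope {
    /// Collapse each run of adjacent empty segments into a single segment.
    /// The merged segment stays with the owner of the first one in the run.
    pub fn merge_empty(&mut self) {
        let old = std::mem::take(&mut self.segments);
        let mut merged: Vec<Segment> = Vec::with_capacity(old.len());
        for seg in old {
            match merged.last_mut() {
                Some(prev) if prev.rows == 0 && seg.rows == 0 && prev.end == seg.start => {
                    prev.end = seg.end;
                }
                _ => merged.push(seg),
            }
        }
        self.segments = merged;
    }

    /// Owners visited, in key order, by a range scan over [from, to).
    pub fn scan_owners(&self, from: u64, to: u64) -> Vec<u32> {
        self.segments
            .iter()
            .filter(|s| s.start < to && s.end > from)
            .map(|s| s.owner)
            .collect()
    }
}
```

Fewer owner changes along a scan means fewer node hops for the query, which is the locality argument the deck makes for ordered partitioning.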
Optimize/constrain around
- Disk space
- CPU utilization
- Range fragmentation
- Average latency
- P## latency
- etc., because the allocation/optimization algorithms are pluggable
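A rough sketch of what "pluggable" could mean for the constraints listed above. The `CostModel` trait, `NodeStats` fields, `WeightedSum` weights, and `best_target` function are all hypothetical, and the weighted sum is a deliberately simple stand-in, not a simplex solver.

```rust
/// Hypothetical per-node metrics matching the constraints listed above.
#[derive(Debug, Clone, Copy)]
pub struct NodeStats {
    pub disk_used: f64,      // fraction of disk in use, 0.0..=1.0
    pub cpu_util: f64,       // fraction of CPU in use, 0.0..=1.0
    pub fragments: u32,      // number of non-contiguous ranges owned
    pub p99_latency_ms: f64, // one example of a "P## latency"
}

/// The pluggable part: any scoring rule can implement this trait.
pub trait CostModel {
    fn cost(&self, stats: &NodeStats) -> f64;
}

/// Simple stand-in model: a weighted sum of the metrics.
pub struct WeightedSum {
    pub disk: f64,
    pub cpu: f64,
    pub fragmentation: f64,
    pub latency: f64,
}

impl CostModel for WeightedSum {
    fn cost(&self, s: &NodeStats) -> f64 {
        self.disk * s.disk_used
            + self.cpu * s.cpu_util
            + self.fragmentation * f64::from(s.fragments)
            + self.latency * s.p99_latency_ms
    }
}

/// Index of the lowest-cost node, i.e. the preferred target for a vnode.
pub fn best_target(model: &dyn CostModel, nodes: &[NodeStats]) -> Option<usize> {
    nodes
        .iter()
        .enumerate()
        .min_by(|(_, a), (_, b)| model.cost(a).total_cmp(&model.cost(b)))
        .map(|(idx, _)| idx)
}
```

Swapping the weights, or the whole `CostModel` implementation, changes which node is preferred without touching the placement code, which is the point of keeping the algorithm pluggable.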
Somebody asked for a pony

NGCC-We Come Not to Praise Dynamo
By Tupshin Harper