Dynamo Random Partitioning Must Die

BOP (ByteOrderedPartitioner) FTW

Random partitioning is suboptimal

Random partitioning puts locality optimization at odds with performance and stability.

Random partitioning is suboptimal

Not all workloads require more than k/v, k/row, or modest-sized k/partition locality.

Random partitioning is suboptimal

But a significant number of important workloads would benefit from more (and more deliberate) locality.

Problematic Workloads

Time Series, with 4+ orders of magnitude difference in write rates between different sources.

Problematic Workloads

Graph (e.g. Titan/TinkerPop), where the amount of forest locality directly impacts query node hop count.

Problematic Workloads

Any poorly modeled workload where a small number of partitions dominate the load.

Thought Experiment

BOP with vnodes

Thought Experiment

BOP with many vnodes

Thought Experiment

BOP with few vnodes

Thought Experiment

BOP with a dynamic number of vnodes

Thought Experiment

BOP with vnodes that can dynamically change in response to load
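
To make "change in response to load" concrete, here is a minimal sketch of one rebalancing pass that halves any vnode hotter than a threshold. Everything in it is an assumption for illustration: `LoadedVNode`, `split_hot`, `u64` tokens instead of raw key bytes, and the idea that load divides evenly across a midpoint split.

```rust
/// Hypothetical vnode: a half-open token range [start, end) plus an
/// observed load figure. Not taken from any real implementation.
#[derive(Debug, Clone, PartialEq)]
pub struct LoadedVNode {
    pub start: u64,
    pub end: u64,
    pub ops_per_sec: u64,
}

/// One rebalancing pass: every vnode hotter than `split_above` is halved.
/// Assumption: load divides evenly at the midpoint, which real key
/// distributions rarely do; a real system would split by observed keys.
pub fn split_hot(vnodes: &[LoadedVNode], split_above: u64) -> Vec<LoadedVNode> {
    let mut out = Vec::with_capacity(vnodes.len());
    for v in vnodes {
        // A range covering a single token cannot be split further.
        if v.ops_per_sec > split_above && v.end - v.start > 1 {
            let mid = v.start + (v.end - v.start) / 2;
            let half = v.ops_per_sec / 2;
            out.push(LoadedVNode { start: v.start, end: mid, ops_per_sec: half });
            out.push(LoadedVNode { start: mid, end: v.end, ops_per_sec: v.ops_per_sec - half });
        } else {
            out.push(v.clone());
        }
    }
    out
}
```

Running the pass repeatedly keeps splitting a hot range until each piece falls under the threshold; a matching pass for cold neighbours would merge them back.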

Dynamic VNode API

trait VNode: Sized {
    type Error; // placeholder: why a merge can fail is left open
    fn split(&self) -> (Self, Self);
    fn merge(vnode1: Self, vnode2: Self) -> Result<Self, Self::Error>;
}
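
One way the split/merge pair could look on a concrete type, as a sketch only. `RangeVNode`, `MergeError`, the `u64` token type, and the midpoint split policy are all assumptions made here, not part of the proposal.

```rust
/// Hypothetical vnode owning the half-open token range [start, end).
/// Tokens are u64 for brevity; a byte-ordered partitioner would compare
/// raw key bytes instead.
#[derive(Debug, Clone, PartialEq)]
pub struct RangeVNode {
    pub start: u64,
    pub end: u64,
}

/// Assumed failure mode for merge; the slide does not specify one.
#[derive(Debug, PartialEq)]
pub enum MergeError {
    /// The two ranges do not share a boundary.
    NotAdjacent,
}

impl RangeVNode {
    /// Split at the midpoint of the range (assumed policy; splitting at
    /// the median key or by observed load is more plausible in practice).
    pub fn split(&self) -> (RangeVNode, RangeVNode) {
        let mid = self.start + (self.end - self.start) / 2;
        (
            RangeVNode { start: self.start, end: mid },
            RangeVNode { start: mid, end: self.end },
        )
    }

    /// Merge two vnodes, given in either order, if their ranges touch.
    pub fn merge(a: RangeVNode, b: RangeVNode) -> Result<RangeVNode, MergeError> {
        let (lo, hi) = if a.start <= b.start { (a, b) } else { (b, a) };
        if lo.end == hi.start {
            Ok(RangeVNode { start: lo.start, end: hi.end })
        } else {
            Err(MergeError::NotAdjacent)
        }
    }
}
```

Splitting a vnode and merging the halves returns the original range, which is the property that makes the two operations safe to apply speculatively.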

Node Level Negotiation API

trait Node {
    fn shed_load(&self) -> PerfProfile;
    fn acquire_load(&self) -> PerfProfile;
}

 

//yes, these might actually be the same method

Node Level Negotiation API

trait Node {
    fn offer_vnode(&self, nodeX:Node);
    fn accept_vnode(&self) -> PerfProfile;
    fn move_vnode(&self, node:nodeX, vnode
}

 

//yes, these might actually be the same method
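
A rough sketch of how an offer/accept exchange between two nodes might play out. `PerfProfile` is not defined in the slides, so a single load number per vnode stands in for it; `NodeLoad`, `offer_and_accept`, and the "largest vnode smaller than the gap" rule are all hypothetical.

```rust
/// Hypothetical per-node load summary standing in for PerfProfile.
#[derive(Debug, Clone, PartialEq)]
pub struct NodeLoad {
    pub name: String,
    /// (vnode id, load units) for each vnode hosted on this node.
    pub vnodes: Vec<(u32, u64)>,
}

impl NodeLoad {
    /// Total load across all hosted vnodes.
    pub fn total(&self) -> u64 {
        self.vnodes.iter().map(|v| v.1).sum()
    }
}

/// Move one vnode from `from` to `to` if that narrows the load gap.
/// Returns the id of the moved vnode, or None if no move helps.
pub fn offer_and_accept(from: &mut NodeLoad, to: &mut NodeLoad) -> Option<u32> {
    // Nothing to shed if `from` is not the busier node.
    let gap = from.total().checked_sub(to.total())?;
    // Moving load L changes the gap to |gap - 2L|, which shrinks exactly
    // when 0 < L < gap. Pick the largest such vnode (assumed heuristic).
    let idx = from
        .vnodes
        .iter()
        .enumerate()
        .filter(|(_, v)| v.1 > 0 && v.1 < gap)
        .max_by_key(|(_, v)| v.1)
        .map(|(i, _)| i)?;
    let moved = from.vnodes.remove(idx);
    to.vnodes.push(moved);
    Some(moved.0)
}
```

Because the same function serves the shedding side and the acquiring side, it fits the remark that the two methods might actually be one.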

Spanning Rope*

An arbitrary name, but inspired by both Spanning Trees and Ropes

*Rope: a data structure composed of smaller strings that is used for efficiently storing and manipulating a very long string.

Spanning Rope

Trivially merges large empty ranges into a single vnode
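
A minimal sketch of what merging empty ranges could mean in code, assuming segments are kept sorted and contiguous. `Segment`, its `rows` counter, and `coalesce_empty` are illustrative names, not part of the proposal.

```rust
/// Hypothetical segment of the token space: [start, end) plus a row count.
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub start: u64,
    pub end: u64,
    pub rows: u64,
}

/// Collapse each run of adjacent empty segments into a single segment.
/// Assumes the input is sorted by `start`.
pub fn coalesce_empty(segments: &[Segment]) -> Vec<Segment> {
    let mut out: Vec<Segment> = Vec::new();
    for seg in segments {
        // Extend the previous segment only if both are empty and they touch.
        let extend = matches!(
            out.last(),
            Some(prev) if prev.rows == 0 && seg.rows == 0 && prev.end == seg.start
        );
        if extend {
            if let Some(prev) = out.last_mut() {
                prev.end = seg.end;
            }
        } else {
            out.push(seg.clone());
        }
    }
    out
}
```

Merging empty ranges moves no data, which is why the operation is cheap compared with merging populated vnodes.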

Spanning Rope

Can function as a linked list of segments to allow trivial in-order reading across segments
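
The in-order read can be sketched as follows. This version swaps the linked list for an ordered map keyed by start token, which gives the same left-to-right traversal plus a fast lookup of the first segment. `Rope`, `insert`, and `scan` are hypothetical names, and tokens are `u64` for brevity.

```rust
use std::collections::BTreeMap;

/// Hypothetical rope: segments keyed by start token. Each entry stores
/// the segment's exclusive end token and the name of the owning node.
pub struct Rope {
    segments: BTreeMap<u64, (u64, String)>,
}

impl Rope {
    pub fn new() -> Self {
        Rope { segments: BTreeMap::new() }
    }

    /// Register the segment [start, end) as owned by `node`.
    pub fn insert(&mut self, start: u64, end: u64, node: &str) {
        self.segments.insert(start, (end, node.to_string()));
    }

    /// Owners of every segment overlapping [from, to), in token order.
    pub fn scan(&self, from: u64, to: u64) -> Vec<&str> {
        if from >= to {
            return Vec::new();
        }
        // The segment containing `from` is the last one that starts at
        // or before it; fall back to `from` if no such segment exists.
        let first = self
            .segments
            .range(..=from)
            .next_back()
            .map(|(start, _)| *start)
            .unwrap_or(from);
        self.segments
            .range(first..to)
            .filter(|(_, (end, _))| *end > from)
            .map(|(_, (_, node))| node.as_str())
            .collect()
    }
}
```

A range query then touches only the nodes whose segments overlap it, in order, which is the locality that random partitioning gives up.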

Spanning Rope

Segments are dynamically splittable, mergeable, and movable.

Spanning Rope

Enables optimization algorithms (e.g. simplex) to optimize against multiple competing constraints

Spanning Rope

Manages vnode size and placement through multiple heuristics/optimization algorithms

Spanning Rope

Can be tuned to steady state of 1 contiguous balanced vnode per node for maximum locality

Spanning Rope

Can be tuned to have thousands of vnodes for maximum nimbleness and flexibility

Spanning Rope

Would benefit substantially from range-segmented-sstables to allow merge, split, and transfer of vnodes between nodes to be very efficient

Spanning Rope

Optimize/constrain around

  • Disk space
  • CPU utilization
  • Range fragmentation
  • Average latency
  • P## latency
  • etc., because the allocation/optimization algorithms are pluggable
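
A sketch of what "pluggable" might look like: the objective sits behind a trait, and the placement chooser knows nothing about which metrics it weighs. The names (`NodeStats`, `PlacementCost`, `WeightedWorstCase`, `best_placement`) and the weighted worst-case formula are assumptions; this compares a handful of candidate placements exhaustively rather than running simplex or any real solver.

```rust
/// Hypothetical per-node metrics an allocator might observe.
#[derive(Debug, Clone)]
pub struct NodeStats {
    pub disk_used_frac: f64,
    pub cpu_frac: f64,
    pub fragments: u32,
    pub p99_ms: f64,
}

/// Pluggable objective over a whole placement: lower is better.
pub trait PlacementCost {
    fn cost(&self, nodes: &[NodeStats]) -> f64;
}

/// Example objective: weighted sum of the worst value of each metric.
pub struct WeightedWorstCase {
    pub w_disk: f64,
    pub w_cpu: f64,
    pub w_frag: f64,
    pub w_p99: f64,
}

/// Largest value of `metric` across all nodes (0.0 for an empty slice).
fn worst(nodes: &[NodeStats], metric: impl Fn(&NodeStats) -> f64) -> f64 {
    nodes.iter().map(|n| metric(n)).fold(0.0, f64::max)
}

impl PlacementCost for WeightedWorstCase {
    fn cost(&self, nodes: &[NodeStats]) -> f64 {
        self.w_disk * worst(nodes, |n| n.disk_used_frac)
            + self.w_cpu * worst(nodes, |n| n.cpu_frac)
            + self.w_frag * worst(nodes, |n| n.fragments as f64)
            + self.w_p99 * worst(nodes, |n| n.p99_ms)
    }
}

/// Pick the cheapest candidate placement under any cost function.
pub fn best_placement<'a>(
    cost: &dyn PlacementCost,
    candidates: &'a [Vec<NodeStats>],
) -> Option<&'a Vec<NodeStats>> {
    candidates
        .iter()
        .min_by(|a, b| cost.cost(a).total_cmp(&cost.cost(b)))
}
```

Swapping in a different `PlacementCost` (for example, one that heavily penalizes range fragmentation) changes the steady state without touching the split, merge, or move mechanics.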

Somebody asked for a pony

NGCC-We Come Not to Praise Dynamo

By Tupshin Harper
