Chord

Ólafur Helgason

VP of Engineering @ OZ

 

Papers We Love

Reykjavik University

19 Nov 2014

Why Chord?

  • Super-cool and simple
  • Theory & practice
  • Influential paper & topic
  • Personal reasons - nostalgia

Presentation overview

  • Background
  • Distributed Hash Tables
  • Chord
  • Lookup and churn in Chord
  • Applications
  • Discussion - yes, you!

2001

Interest in Distributed Systems

Distributed Hash Table

  • Same interface as normal hash table
    • keys map to values
  • Keys are assigned to nodes
  • Node stores all values for its set of keys
    • Hash table buckets are nodes in a network (sketch below)
  • Chord was one of the first implementations
    • Others: CAN, Tapestry, Pastry, Kademlia
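
A toy sketch of this mapping in Python (chord_id, successor and the tiny m=6 ring are assumptions made here for illustration, not the paper's code):

import hashlib

m = 6  # identifiers are m-bit, so the ring is [0, 2**m)

def chord_id(name: str) -> int:
    """Hash an arbitrary name (a key or a node address) onto the ring."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

def successor(key_id: int, node_ids: list[int]) -> int:
    """First node clockwise at or after key_id; that node stores the key."""
    for n in sorted(node_ids):
        if n >= key_id:
            return n
    return min(node_ids)  # wrap around past zero

nodes = [chord_id(f"node-{i}") for i in range(10)]
print(successor(chord_id("some-key"), nodes))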

Wtf do I care?

  • Benefits
    • Decentralized
    • Scalability
    • Fault tolerance
    • Reliability
  • Drawbacks/challenges
    • Nodes leave/fail
    • Nodes join

Chord: One ring to rule them all

lookup(key) -> node

One ring to rule them all

  • Each node is responsible for ~1/N of the keyspace
  • On join, a node takes over part of its neighbour's keyspace
  • On leave, a node's keyspace is handed over to its successor
  • Simple lookup (sketch below)
    • Each node forwards the query to its successor
    • Node state (successor) ~ O(1)
    • Lookup ~ O(N)
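
A sketch of that naive lookup, assuming every node only keeps a pointer to its successor (Node, in_half_open and simple_lookup are illustrative names, not the paper's pseudocode):

class Node:
    """One ring member that only knows its successor."""
    def __init__(self, node_id: int):
        self.id = node_id
        self.successor: "Node" = self  # set when the ring is built

def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b], wrapping past zero."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

def simple_lookup(start: Node, key_id: int) -> Node:
    """Walk the ring one successor at a time: O(N) hops."""
    n = start
    while not in_half_open(key_id, n.id, n.successor.id):
        n = n.successor
    return n.successor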

Example: m=6, 10 nodes, 5 keys

Simple lookup (non-optimal)

Improve lookup performance

  • Finger table (sketch below)
    • At most m entries, ~O(log N) of them distinct
      • identifier space [0, 2^m)
    • Entry i
      • successor((n + 2^(i-1)) mod 2^m)
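
A sketch of how the table could be built, assuming a global view of the ring for illustration (a real node fills its fingers via lookups; the 10-node example ring is arbitrary, not taken from the paper):

m = 6  # identifier space [0, 2**m)

def successor(key_id: int, node_ids: list[int]) -> int:
    """First node at or after key_id on the ring (wraps past zero)."""
    for nid in sorted(node_ids):
        if nid >= key_id:
            return nid
    return min(node_ids)

def finger_table(n: int, node_ids: list[int]) -> list[int]:
    """Entry i (1-indexed) points at successor((n + 2**(i-1)) mod 2**m)."""
    return [successor((n + 2 ** (i - 1)) % (2 ** m), node_ids)
            for i in range(1, m + 1)]

# An arbitrary 10-node example ring with m = 6:
print(finger_table(8, [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]))
# -> [14, 14, 14, 21, 32, 42]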

Finger table

Improved lookup
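
A sketch of the finger-based lookup: forward the query to the closest finger preceding the key, so each hop roughly halves the remaining ring distance (~O(log N) hops). The class and helper names are illustrative, and the finger table is assumed to be populated (with its first entry equal to the successor) before lookups run:

class Node:
    """Ring member with a finger table (fingers[0] is the successor)."""
    def __init__(self, node_id: int):
        self.id = node_id
        self.successor: "Node" = self
        self.fingers: list["Node"] = []  # fill before looking up

    def closest_preceding_finger(self, key_id: int) -> "Node":
        """Highest finger that falls strictly between us and the key."""
        for f in reversed(self.fingers):
            if in_open(f.id, self.id, key_id):
                return f
        return self

def in_open(x: int, a: int, b: int) -> bool:
    """x in the open ring interval (a, b), wrapping past zero."""
    if a < b:
        return a < x < b
    return x > a or x < b

def in_half_open(x: int, a: int, b: int) -> bool:
    """x in the half-open ring interval (a, b], wrapping past zero."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

def find_successor(n: Node, key_id: int) -> Node:
    """Each hop jumps to the closest preceding finger: ~O(log N) hops."""
    while not in_half_open(key_id, n.id, n.successor.id):
        n = n.closest_preceding_finger(key_id)
    return n.successor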

Churn

  • Join (sketch below)
    • new node (N26) finds its successor
    • receives its keys from the successor
    • each node periodically refreshes its successor and finger table (stabilization)
  • Leave
    • Transfer keys to successor & notify predecessor
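
A sketch of join, periodic stabilization and graceful leave, with key transfer reduced to a dictionary hand-off. This is a simplified single-process model under my own naming, not the paper's pseudocode; successor lists and failure handling are left out:

def in_open(x: int, a: int, b: int) -> bool:
    """x in the open ring interval (a, b), wrapping past zero."""
    if a < b:
        return a < x < b
    return x > a or x < b

def in_half_open(x: int, a: int, b: int) -> bool:
    """x in the half-open ring interval (a, b], wrapping past zero."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

class Node:
    """Ring member with just enough state for join/leave."""
    def __init__(self, node_id: int):
        self.id = node_id
        self.successor: "Node" = self
        self.predecessor: "Node | None" = None
        self.store: dict[int, object] = {}  # keys this node is responsible for

    def join(self, bootstrap: "Node") -> None:
        """Find our successor via any existing node, then take over our keys."""
        n = bootstrap
        while not in_half_open(self.id, n.id, n.successor.id):
            n = n.successor  # naive walk; finger lookups work the same way
        self.successor = n.successor
        # Keys the successor holds that are no longer in (self, successor]
        # are now our responsibility.
        moved = {k: v for k, v in self.successor.store.items()
                 if not in_half_open(k, self.id, self.successor.id)}
        self.store.update(moved)
        for k in moved:
            del self.successor.store[k]

    def stabilize(self) -> None:
        """Run periodically: adopt a newly joined node as successor, if any."""
        x = self.successor.predecessor
        if x is not None and in_open(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate: "Node") -> None:
        """candidate believes it may be our predecessor."""
        if (self.predecessor is None
                or in_open(candidate.id, self.predecessor.id, self.id)):
            self.predecessor = candidate

    def leave(self) -> None:
        """Graceful leave: hand keys to the successor and splice the ring."""
        self.successor.store.update(self.store)
        self.store.clear()
        if self.predecessor is not None:
            self.predecessor.successor = self.successor
        self.successor.predecessor = self.predecessor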

Overview

  • Simple: lookup(key) -> node
  • Scalable
    • State per node ~ O(log(N))
    • lookup performance ~ O(log(N))
  • Provable correctness
    • Even under churn

DHT Applications/implementations

  • BitTorrent distributed tracker 
  • Overlay multicast
  • Coral CDN
  • Amazon Dynamo
    • Cassandra, Riak

Amazon Dynamo

  • Each virtual node gets a random ring position
  • Data is assigned to a vnode such that
    • vnode = successor(key)
  • Data is replicated to the next N-1 successors (sketch below)
    • N is the number of replicas
    • The coordinator skips vnodes so replicas land on distinct physical nodes
  • Fault tolerance
    • Nodes can fail
    • Reads and writes still work under network partitioning
      • A write may not be propagated to all nodes - inconsistency!
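
A sketch of Dynamo-style placement on the ring: the coordinator vnode is successor(key) and copies go to the following successors, skipping vnodes whose physical host already holds a replica. VNode and preference_list are names made up for this illustration, not Dynamo's API:

from typing import NamedTuple

class VNode(NamedTuple):
    position: int   # ring position of this virtual node
    physical: str   # physical host that runs this vnode

def preference_list(key_id: int, ring: list[VNode], n_replicas: int) -> list[VNode]:
    """Coordinator = successor(key); then walk the ring collecting replicas,
    skipping vnodes whose physical host already holds a copy."""
    ring = sorted(ring, key=lambda v: v.position)
    start = next((i for i, v in enumerate(ring) if v.position >= key_id), 0)
    chosen: list[VNode] = []
    hosts: set[str] = set()
    for step in range(len(ring)):
        v = ring[(start + step) % len(ring)]
        if v.physical in hosts:
            continue  # keep replicas on distinct physical machines
        chosen.append(v)
        hosts.add(v.physical)
        if len(chosen) == n_replicas:
            break
    return chosen

# Example: 3 replicas of a key on a small ring of vnodes
ring = [VNode(5, "a"), VNode(17, "b"), VNode(29, "a"), VNode(41, "c"), VNode(55, "b")]
print(preference_list(20, ring, 3))  # coordinator VNode(29, 'a'), then 'c' and 'b'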