An ode to 17 databases in N minutes

Featuring a variety of fairly mangled and variously mixed metaphors.

And animated gifs.
And probably star wars.
And some D&D. For no real reason.
Because I am the walking embodiment of a stereotype.



Hi

I'm Toby


@tobyhede

Dramatis Personae

  1. BigTable
  2. Cassandra
  3. Couch
  4. DynamoDb
  5. ElasticSearch
  6. Hyperdex
  7. Hadoop
  8. Datomic
  9. PostgreSQL
  10. Memcache
  11. MongoDb
  12. MySQL
  13. Neo4j
  14. Redis
  15. RethinkDB
  16. Riak
  17. UnQLite

Distributed Systems are Hard

Databases are Fun

Distributed Systems are Fun

Databases are Hard


NoSQL

NewSQL

PostSQL

Post-Rock Ambient Punk SQL

ACID

Atomic
Consistent
Isolated
Durable

CAPtain murica 

CAP Theorem
Brewer's Conjecture

CAP

Consistency
Availability
Partition Tolerance

Conjecture

and 

Formal Proof

Almost, but not quite, entirely unalike



Theorem 1 
It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties:

  • Availability;
  • Atomic consistency in all fair executions (including those in which messages are lost).

Nancy Lynch & Seth Gilbert, Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services, ACM SIGACT News, Volume 33 Issue 2 (2002), pg. 51-59

BASE

Basically
Available
Soft
State

Everything is Distributed


Almost, but not quite, entirely untrue

P ? C : A



Cluster of Nodes

SCIENCE!


OMG

jepsen



Note to self:

Probably 

don't do 

the next bit

Links



What's the deal with Owl Bears?

PostgreSQL

MySQL for hipsters
jk, it's totes awesome
Name PostgreSQL
Class Relational
Alignment Consistent
API SQL over TCP/IP
Feats Hstore JSON & PL/V8 Foreign Data Wrappers
Saving Throws
  • +2 Rails
  • -1 Distribution
  • -1 pg_hba.conf

Hstore


  CREATE TABLE characters (
     id serial PRIMARY KEY,
     name TEXT,
     attributes hstore
  );
      
  INSERT INTO characters (name, attributes) VALUES (
    'postgresql',
    'class    => "relational",
     skills   => joins,
     api      => SQL over TCP/IP'
  );      
      

activerecord-postgres-hstore


  class Programmer < ActiveRecord::Base
    serialize :languages, ActiveRecord::Coders::Hstore
  end

  # Contains key 'ruby'
  Programmer.where("languages ? 'ruby'")

  # Key 'ruby' has value '2.0'
  Programmer.where("languages -> 'ruby' = '2.0'")

  # And slightly different for no extra charge
  Programmer.where("languages @> 'ruby=>2.0'")

  # And likes
  Programmer.where("languages -> 'ruby' LIKE '%1.9%'")


JSON

MySQL

The same as PostgreSQL
but controlled by an evil overlord
“One thing is for certain: there is no stopping them; the enterprise will soon be here. And I for one welcome our new oracular overlords. I’d like to remind them that as a trusted software engineer, I can be helpful in rounding up others to toil in their enterprise big data caves.”
Name MySQL
Class Relational
Alignment Consistent
API SQL over TCP/IP*
Feats Storage Engines Everywhere! HandlerSocket
Saving Throws
  • +2 PHP
  • +1 NoSQL Memcached Api
  • -1 NoSQL Memcached Api

See also

MariaDB
Percona Server (with XtraDB)
TokuDB

Dynamo/Db

The paper that launched a thousand at least 3 ships databases
Name DynamoDb
Class Key/Value
Alignment Available, "Strongly" Consistent
API JSON over HTTP 
Feats Predictable Latency Pricing
Scale
Saving Throws
  • +3 Giant Shoulders
  • +2 Scale
  • -1 Queries

Links

Riak

engineering ftw

Artist's impression
Actual results may vary

Name Riak
Class Key/Value 
Alignment Available, Tunable Consistency
API REST/Protocol Buffers/Erlang
Feats Clustering Tunable
Cloud Storage
Saving Throws
  • +2 Operations
  • +1 Full-Text Search 
  • +1 JSON
  • +1 Convergent Replicated Data Types

Consistent Hashing

and

Vritual Nodes

Simple API


  # Read an Object
  GET /buckets/{bucket}/keys/{key}

  # Store an Object with Generated Key
  POST /buckets/{bucket}/keys  

  # Store an Object with Specified Key
  PUT /buckets/{bucket}/keys/{key} 

  # Counters (CRDT)
  POST /buckets/{bucket}/counters/{key}
  

Ripple


  require 'ripple'

  # Documents are stored as JSON objects in Riak but have rich
  # semantics, including validations and associations.
  class Email
    include Ripple::Document
    property :from,    String, :presence => true
    property :to,      String, :presence => true
    property :sent,    Time,   :default => proc { Time.now }
    property :body,    String
  end

  email = Email.find("37458abc752f8413e")  # GET /riak/emails/37458abc752f8413e
  email.from = "toby@ninefold.com"
  email.save                               # PUT /riak/emails/37458abc752f8413e

Vector Clocks

How do they even?


Links

BigTable

something something dark side
Name BigTable
Class Column
Alignment Availability
API Proprietary
Feats App Engine Massive

Saving Throws
  • +2 sparse, distributed multi-dimensional sorted map
  • -1 f*cking google


Largest BigTable Cluster 

  • 100s PB data
  • sustained: 30M ops/sec
  • 100+ GB/s I/O


Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: non-blocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Spanner.

GPS

and an

Atomic Clock

Cassandra

more columns than a greek architecture
eventually, it's awesome


Eventually Persistent

Name Cassandra
Class Column
Alignment
Available, Tunable Consistency
API CQL / Thrift
Feats Distributed Write Performance
CQL
Saving Throws
  • +2 sparse, distributed multi-dimensional sorted map
  • +1 You no longer have XML descriptions of your data model that need a reboot if you change
  • +1 Light-weight transactions with Paxos

Title

Column 
The smallest increment of data, which contains a name, a value and a timestamp. 

 Row 
Columns that have the same primary key. 

 Primary Key 
The partition key. One or more columns that uniquely identify a row in a table 

 Table Column Family 
A collection of ordered (by name) columns fetched by row. A row consists of columns and have a primary key. The first part of the key is a column name. Subsequent parts of a compound key are other column names that define the order of columns in the table.

CREATE TABLE playlists (
  id uuid,
  song_order int,
  song_id uuid,
  title text,
  album text,
  artist text,
  PRIMARY KEY  (id, song_order ) 
);

INSERT INTO playlists (id, song_order, title, artist, album)
VALUES (8a172618-b121-4136-bb10-f665cfc469eb, 1, 'Elevator of the Machine', 'Mount Fuji Doomjazz Corporation', 'Egor');

INSERT INTO playlists (id, song_order, title, artist, album)
VALUES (8a172618-b121-4136-bb10-f665cfc469eb, 2, 'Knock by the Stairs', 'Mount Fuji Doomjazz Corporation', 'Egor');

INSERT INTO playlists (id, song_order, title, artist, album)
VALUES (8a172618-b121-4136-bb10-f665cfc469eb, 3, 'Oktober', 'Bersarin Quartet', 'Bersarin Quartet');




Links


Memcache

The simplest thing that can possibly work

I'm a cache, not a database!



Name Memcache
Class Key-Value
Alignment
Cache
API "Memcache" over TCP
Feats Ubiquity
Simplicity Workableness
Saving Throws
  • +3 Simplicity
+1 Memcache as Standard

Simple

  • set 
  • add 
  • replace 
  • append 
  • prepend 
  • cas 
  • get 
  • gets 
  • delete 
  • incr/decr

Client Server

Client controls hash
Server handles data

No cluster state


Rails Fragment Cache

config.action_controller.perform_caching = true
config.cache_store = :mem_cache_store, "1.cache.com", "2.cache.com"

<% cache(cache_key_for_products) do %>
  All available products:
  <% Product.all.each do |p| %>
    <% cache(p) do %>
      <%= link_to p.name, product_url(p) %>
    <% end %>
  <% end %>
<% end %>    

Rails with IdentityCache

class Product < ActiveRecord::Base
  include IdentityCache

  has_many :images

  cache_has_many :images, :embed => true
end

@product = Product.fetch(id)

@images = @product.fetch_images

Redis

the queue you have when you willfully refuse to admit you need a queue
Name Redis
Class Key/Value
Alignment
Consistent, mostly
API Plain-Text over TCP
Feats Versatile
Fast Queueing
Saving Throws
  • +1 Open Source as a Service
  • +1 Lua
  • -2 Distribution
  • +1 Blazing Speed

Data Structures

# strings
SET akey "Hello"
GET akey

# hashes
HSET anhash greeting "Hello" 
HGET anhash greeting

# lists
RPUSH alist "hello"
RPUSH alist "world"
LRANGE alist 0 -1

# sets
# sorted sets
# pub/sub

Optimistic Locking

MULTI
INCR foo
INCR bar
EXEC
MULTI, EXEC, DISCARD and WATCH

Check and Set

WATCH mykey
val = GET mykey
val = val + 1
MULTI
SET mykey $val
EXEC

Resque

class ImageConversionJob
  def work
    # convert some kind of image here
  end
end

resque = Resque.new
resque << ImageConversionJob.new

Gratuitous Aside

Lua!
OpenResty!

Lua Scripting


    EVAL "local result = redis.call('ZREVRANGE', 'myset', 0, 10, 'WITHSCORES'); 
redis.call('ZREMRANGEBYRANK','myset',0, 9); 
return result;" 0
    



I've just reinvented stored procedures!

Also

Maybe use a queue

RabbitMQ
Kafka

Neo4j

All your graphs are belong to us
Name Neo4j
Class Graph
Alignment
Consistent
API REST, Java
Feats Acid O/R Impedance Cypher
Saving Throws
  • +2 Transactions
  • +2 Traverse complex relationships



Almost, but not quite, entirely unlike a RDBMS


Optimized for highly connected data

(rather than aggregated data)







MATCH (m:Movie {title:"Star Wars"})<-[:ACTS_IN]-(actor)
RETURN actor;


Modelling a graph of Movies and Actors.



Links


(Totes fun)

Couch

Awesome for data you don't want to easily get to ever again but you want it replicated everywhere
Name CouchDb
Class Document
Alignment
Available
API REST, Java
Feats MVCC Master-Master WebApps
Saving Throws
  • +3 Replication
  • +2 Map/Reduce
  • -3 Ad-hoc queries

MongoDB

Webscale
If you think of it as Redis for JSON, you're probably ok

Name MongoDB
Class Document (BSON)
Skills Consistent-ish
API BSON 
JS Query Language
Feats Minimal Impedance WebScale Sharding
Saving Throws
  • +2 JSON
  • -1 Data
  • -1 Locks!

To recap: MongoDB is neither AP nor CP. The defaults can cause significant loss of acknowledged writes. The strongest consistency offered has bugs which cause false acknowledgements, and even if they're fixed, doesn't prevent false failures.


Consistency and fault-tolerance are hard. Genuinely hard. The devil is always in the details, and it takes a concentrated mind to navigate through these waters correctly

Mongoid

class Artist
  include Mongoid::Document
  field :name, type: String
  embeds_many :instruments
end

class Instrument
  include Mongoid::Document
  field :name, type: String
  embedded_in :artist
end

syd = Artist.where(name: "Syd Vicious").between(age: 18..25).first
syd.instruments.create(name: "Bass")
syd.with(database: "bands", session: "backup").save!
Totes from the docs

RethinkDB

MongoDB 2.0

Name RethinkDB
Class Document
Skills Consistent
API JSON 
Feats Minimal Impedance Admin
Sharding
Saving Throws
  • +1 Operations
  • +1 block-level multiversion concurrency control
  • -1 Performance
  • -2 Real World?


The commercial fringe

Couchbase

Aerospike

FoundationDB

MemSQL


Hyperdex

You had me at HyperSpace Hashing
Name Hyperdex
Class Key/Value
Skills Consistent
API C
Feats HyperSpace Transactions Scale
Saving Throws
  • +2 Performance
  • +1 Query attributes
  • -1 Community, real-world usage? 
  • +3 Science!

Hyperspace hashing deterministically maps objects to servers to enable efficient object insertion and retrieval. But it differs from these techniques because it takes into account the secondary attributes of an object when determining the mapping for an object. Specifcally, it maps objects to coordinates in a multi-dimensional Euclidean space - a hyperspace - which has axes defined by the objects' attributes. Each server in the system is mapped onto a region of the same hyperspace, and owns the objects that fall within its region. Clients use this mapping to deterministically insert, remove, and search for objects.









  require "hyperdex"
  client = HyperDex::Client::Client.new("127.0.0.1", 1982)

  client.put(:space, 'hello', {:v => "Hello World!"})
  puts client.get(:space, 'hello')
  
  client.atomic_add(:space, 'votes', {upvotes: 1})
    




  current_balance = client.get('accounts', 'account-number')['balance']

  raise InsufficientFunds if current_balance < amount

  balance = mybalance - amount
  
  # cond_put(spacename, key, predicates, attributes)
  success = client.cond_put('accounts', 'account-number', {balance: current_balance}, {balance: balance})
  

HBase

Hadoop

for the data problem you wish you had


Name HBase
Class Column
Skills Consistent
API ******
Feats BIG REALLY BIG
Saving Throws
  • +3 Complexity as a service
-1 Gartner Magic Quandrant

  • Hadoop
  • Hadoop Distributed File System
  • Zookeeper
  • Hive
  • HiveQL
  • Mahout
  • Map/Reduce

Small-to-medium Data

In 2010 Facebook claimed that they had the largest Hadoop cluster in the world with 21 PB of storage.
On July 27, 2011 they announced the data had grown to 30 PB. 
On June 13, 2012 they announced the data had grown to 100 PB. 
On November 8, 2012 they announced the warehouse grows by roughly half a PB per day.
Skynet begins to learn at a geometric rate. It becomes self-aware at 2:14 a.m. Eastern time, August 29th. In a panic, they try to pull the plug.
Berkeley Data Analytics Stack
But if you wanna appear really cool and underground then I reckon the next big thing is the Berkeley Data Analytics Stack. They are making really awesome shit.

UnQLite

NoSQLite
Name UnQLite
Class Document & Key/value
Skills N/A
API C/C++
Feats Embedded Single File Jx9
Saving Throws
  • +1 No Server
  • +1 Development
  • -3 Production

Elasticsearch

Oh why can't I make you my datastore?
Name ElasticSearch
Class Document
Alignment Consistent
API RESTful JSON
Feats Facets River Percolators
Saving Throws
  • +3 Logstash / Logging
  • +2 Query excellence
  • +1 APIs Java/Thrift/Memcached
  • -1 Eventuality
  • -2 Configurationability



[C]\      ,-[AMQP]
    \     |
[C]--[LB]---[AMQP]
    / |   |
[C]/  |   '-[AMQP]
      |
     [LS]---[ES]
     [LS]   [ES]
            [ES]

Tire


  s = Tire.search(search_index) do
    filter :range, {
      "@timestamp" => {
        "from" => from.iso8601,
        "to" => to.iso8601
      }
    }
    query {
      string "@tags:unicorn"
    }
  end

  s.results
  

  {
    "filter": {
      "and": [
        {
          "range": {
            "@timestamp": {
              "from": "2013-09-01T00:00:00+00:00",
              "to": "2013-09-26T23:59:59+00:00"
            }
          }
        },
        {
          "terms": {
            "@tags": ["error", "migration"],
            "execution": "and"
          }
        },
        {
          "not": {
            "filter": {
              "term": { "@source": "client.log" }
            }
          }
        }
      ]
    },
    "sort": [
      {
        "@timestamp": "desc"
      }
    ]
  }
  


curl -XPUT 'http://localhost:9200/events/event' -d '{
    "event": {
        "user" : "toby",
        "post_date" : "2014-02-18T20:00:00",
        "event" : "page_view"
    }
}'


GET _search
{
    "query" : {
        "term" : { "user" : "toby" }
    },
   "facets" : {
      "event" : { 
        "terms" : {"field" : "event"},
        "facet_filter" : {
            "term" : { "user" : "toby" }
        },
        "global": true
      }
    }
}

{
   "took": 8,
   "timed_out": false,
   "_shards": {
      "total": 10,
      "successful": 10,
      "failed": 0
   },
    "facets": {
      "event": {
         "_type": "terms",
         "missing": 0,
         "total": 5,
         "other": 0,
         "terms": [
            {
               "term": "page_view",
               "count": 5
            }
         ]
      }
    }
}

Links


 



Nailed it


Like Chess

With X-Wings
And Tie Fighters
And all your money

The End
@tobyhede

An ode to 17 database in N minutes

By tobyhede

An ode to 17 database in N minutes

  • 3,422