@tobyhede
Distributed Systems are Hard
Databases are Fun
Distributed Systems are Fun
Databases are Hard
Theorem 1It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties:
- Availability;
- Atomic consistency in all fair executions (including those in which messages are lost).
Name | PostgreSQL |
---|---|
Class | Relational |
Alignment | Consistent |
API | SQL over TCP/IP |
Feats | Hstore | JSON & PL/V8 | Foreign Data Wrappers |
---|---|---|---|
Saving Throws |
|
CREATE TABLE characters (
id serial PRIMARY KEY,
name TEXT,
attributes hstore
);
INSERT INTO characters (name, attributes) VALUES (
'postgresql',
'class => "relational",
skills => joins,
api => SQL over TCP/IP'
);
class Programmer < ActiveRecord::Base serialize :languages, ActiveRecord::Coders::Hstore end # Contains key 'ruby' Programmer.where("languages ? 'ruby'") # Key 'ruby' has value '2.0' Programmer.where("languages -> 'ruby' = '2.0'") # And slightly different for no extra charge Programmer.where("languages @> 'ruby=>2.0'") # And likes Programmer.where("languages -> 'ruby' LIKE '%1.9%'")
Name | MySQL |
---|---|
Class | Relational |
Alignment | Consistent |
API | SQL over TCP/IP* |
Feats | Storage Engines | Everywhere! | HandlerSocket |
---|---|---|---|
Saving Throws |
|
Name | DynamoDb |
---|---|
Class | Key/Value |
Alignment | Available, "Strongly" Consistent |
API | JSON over HTTP |
Feats | Predictable Latency |
Pricing |
Scale |
---|---|---|---|
Saving Throws |
|
Name | Riak |
---|---|
Class | Key/Value |
Alignment | Available, Tunable Consistency |
API | REST/Protocol Buffers/Erlang |
Feats | Clustering |
Tunable
|
Cloud Storage
|
---|---|---|---|
Saving Throws |
|
# Read an Object
GET /buckets/{bucket}/keys/{key}
# Store an Object with Generated Key
POST /buckets/{bucket}/keys
# Store an Object with Specified Key
PUT /buckets/{bucket}/keys/{key}
# Counters (CRDT)
POST /buckets/{bucket}/counters/{key}
require 'ripple'
# Documents are stored as JSON objects in Riak but have rich
# semantics, including validations and associations.
class Email
include Ripple::Document
property :from, String, :presence => true
property :to, String, :presence => true
property :sent, Time, :default => proc { Time.now }
property :body, String
end
email = Email.find("37458abc752f8413e") # GET /riak/emails/37458abc752f8413e
email.from = "toby@ninefold.com"
email.save # PUT /riak/emails/37458abc752f8413e
Name | BigTable |
---|---|
Class | Column |
Alignment | Availability |
API | Proprietary |
Feats | App Engine |
Massive |
|
---|---|---|---|
Saving Throws |
|
Eventually Persistent
Name | Cassandra |
---|---|
Class | Column |
Alignment
|
Available, Tunable Consistency
|
API | CQL / Thrift |
Feats | Distributed |
Write Performance |
CQL |
---|---|---|---|
Saving Throws |
|
CREATE TABLE playlists (
id uuid,
song_order int,
song_id uuid,
title text,
album text,
artist text,
PRIMARY KEY (id, song_order )
);
INSERT INTO playlists (id, song_order, title, artist, album)
VALUES (8a172618-b121-4136-bb10-f665cfc469eb, 1, 'Elevator of the Machine', 'Mount Fuji Doomjazz Corporation', 'Egor');
INSERT INTO playlists (id, song_order, title, artist, album)
VALUES (8a172618-b121-4136-bb10-f665cfc469eb, 2, 'Knock by the Stairs', 'Mount Fuji Doomjazz Corporation', 'Egor');
INSERT INTO playlists (id, song_order, title, artist, album)
VALUES (8a172618-b121-4136-bb10-f665cfc469eb, 3, 'Oktober', 'Bersarin Quartet', 'Bersarin Quartet');
Name | Memcache |
---|---|
Class | Key-Value |
Alignment |
Cache |
API | "Memcache" over TCP |
Feats |
Ubiquity |
Simplicity | Workableness |
---|---|---|---|
Saving Throws |
+1 Memcache as Standard
|
config.action_controller.perform_caching = true
config.cache_store = :mem_cache_store, "1.cache.com", "2.cache.com"
<% cache(cache_key_for_products) do %>
All available products:
<% Product.all.each do |p| %>
<% cache(p) do %>
<%= link_to p.name, product_url(p) %>
<% end %>
<% end %>
<% end %>
class Product < ActiveRecord::Base
include IdentityCache
has_many :images
cache_has_many :images, :embed => true
end
@product = Product.fetch(id)
@images = @product.fetch_images
Name | Redis |
---|---|
Class | Key/Value |
Alignment |
Consistent, mostly |
API | Plain-Text over TCP |
Feats |
Versatile |
Fast |
Queueing |
---|---|---|---|
Saving Throws |
|
# strings
SET akey "Hello"
GET akey
# hashes
HSET anhash greeting "Hello"
HGET anhash greeting
# lists
RPUSH alist "hello"
RPUSH alist "world"
LRANGE alist 0 -1
# sets
# sorted sets
# pub/sub
MULTI
INCR foo
INCR bar
EXEC
WATCH mykey
val = GET mykey
val = val + 1
MULTI
SET mykey $val
EXEC
class ImageConversionJob
def work
# convert some kind of image here
end
end
resque = Resque.new
resque << ImageConversionJob.new
EVAL "local result = redis.call('ZREVRANGE', 'myset', 0, 10, 'WITHSCORES');
redis.call('ZREMRANGEBYRANK','myset',0, 9);
return result;" 0
Name | Neo4j |
---|---|
Class | Graph |
Alignment
|
Consistent |
API | REST, Java |
Feats | Acid | O/R Impedance | Cypher |
---|---|---|---|
Saving Throws |
|
Optimized for highly connected data
MATCH (m:Movie {title:"Star Wars"})<-[:ACTS_IN]-(actor)
RETURN actor;
Name | CouchDb |
---|---|
Class | Document |
Alignment
|
Available |
API | REST, Java |
Feats | MVCC | Master-Master | WebApps |
---|---|---|---|
Saving Throws |
|
Name | MongoDB |
---|---|
Class | Document (BSON) |
Skills | Consistent-ish |
API | BSON JS Query Language |
Feats | Minimal Impedance | WebScale | Sharding |
---|---|---|---|
Saving Throws |
|
To recap: MongoDB is neither AP nor CP. The defaults can cause significant loss of acknowledged writes. The strongest consistency offered has bugs which cause false acknowledgements, and even if they're fixed, doesn't prevent false failures.
Consistency and fault-tolerance are hard. Genuinely hard. The devil is always in the details, and it takes a concentrated mind to navigate through these waters correctly
class Artist
include Mongoid::Document
field :name, type: String
embeds_many :instruments
end
class Instrument
include Mongoid::Document
field :name, type: String
embedded_in :artist
end
syd = Artist.where(name: "Syd Vicious").between(age: 18..25).first
syd.instruments.create(name: "Bass")
syd.with(database: "bands", session: "backup").save!
Name | RethinkDB |
---|---|
Class | Document |
Skills | Consistent |
API | JSON |
Feats | Minimal Impedance |
Admin
|
Sharding |
---|---|---|---|
Saving Throws |
|
Name | Hyperdex |
---|---|
Class | Key/Value |
Skills | Consistent |
API | C |
Feats | HyperSpace | Transactions | Scale |
---|---|---|---|
Saving Throws |
|
Hyperspace hashing deterministically maps objects to servers to enable efficient object insertion and retrieval. But it differs from these techniques because it takes into account the secondary attributes of an object when determining the mapping for an object. Specifcally, it maps objects to coordinates in a multi-dimensional Euclidean space - a hyperspace - which has axes defined by the objects' attributes. Each server in the system is mapped onto a region of the same hyperspace, and owns the objects that fall within its region. Clients use this mapping to deterministically insert, remove, and search for objects.
require "hyperdex"
client = HyperDex::Client::Client.new("127.0.0.1", 1982)
client.put(:space, 'hello', {:v => "Hello World!"})
puts client.get(:space, 'hello')
client.atomic_add(:space, 'votes', {upvotes: 1})
current_balance = client.get('accounts', 'account-number')['balance']
raise InsufficientFunds if current_balance < amount
balance = mybalance - amount
# cond_put(spacename, key, predicates, attributes)
success = client.cond_put('accounts', 'account-number', {balance: current_balance}, {balance: balance})
Name | HBase |
---|---|
Class | Column |
Skills | Consistent |
API | ****** |
Feats | BIG | REALLY BIG |
|
---|---|---|---|
Saving Throws |
-1 Gartner Magic Quandrant
|
In 2010 Facebook claimed that they had the largest Hadoop cluster in the world with 21 PB of storage.
On July 27, 2011 they announced the data had grown to 30 PB.
On June 13, 2012 they announced the data had grown to 100 PB.
On November 8, 2012 they announced the warehouse grows by roughly half a PB per day.
Skynet begins to learn at a geometric rate. It becomes self-aware at 2:14 a.m. Eastern time, August 29th. In a panic, they try to pull the plug.
Name | UnQLite |
---|---|
Class | Document & Key/value |
Skills | N/A |
API | C/C++ |
Feats | Embedded | Single File | Jx9 |
---|---|---|---|
Saving Throws |
|
Name | ElasticSearch |
---|---|
Class | Document |
Alignment | Consistent |
API |
RESTful JSON |
Feats | Facets | River | Percolators |
---|---|---|---|
Saving Throws |
|
[C]\ ,-[AMQP] \ | [C]--[LB]---[AMQP] / | | [C]/ | '-[AMQP] | [LS]---[ES] [LS] [ES]
[ES]
s = Tire.search(search_index) do
filter :range, {
"@timestamp" => {
"from" => from.iso8601,
"to" => to.iso8601
}
}
query {
string "@tags:unicorn"
}
end
s.results
{
"filter": {
"and": [
{
"range": {
"@timestamp": {
"from": "2013-09-01T00:00:00+00:00",
"to": "2013-09-26T23:59:59+00:00"
}
}
},
{
"terms": {
"@tags": ["error", "migration"],
"execution": "and"
}
},
{
"not": {
"filter": {
"term": { "@source": "client.log" }
}
}
}
]
},
"sort": [
{
"@timestamp": "desc"
}
]
}
curl -XPUT 'http://localhost:9200/events/event' -d '{
"event": {
"user" : "toby",
"post_date" : "2014-02-18T20:00:00",
"event" : "page_view"
}
}'
GET _search
{
"query" : {
"term" : { "user" : "toby" }
},
"facets" : {
"event" : {
"terms" : {"field" : "event"},
"facet_filter" : {
"term" : { "user" : "toby" }
},
"global": true
}
}
}
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 10,
"successful": 10,
"failed": 0
},
"facets": {
"event": {
"_type": "terms",
"missing": 0,
"total": 5,
"other": 0,
"terms": [
{
"term": "page_view",
"count": 5
}
]
}
}
}