The Fluid Architecture

Your Host Tonight

Image source: my 6 yo daughter

Alex Fernández

Developer with 15+ years of experience

@pinchito

What We Will Cover


How to Flow


Flow with Requirements


Flow with Operational Constraints


Some Migration Strategies


Don't Stop Flowing

How to Flow


Turbulent Flow is Irreversible


It can be fun, but we don't want that kind of fun

Laminar Flow is Reversible


And we like when things flow smoothly!

Migrations are hard?


Thermo to the rescue!


Reversible processes are optimal:

  • no turbulence,
  • minimal entropy,
  • less complexity,
  • reduced headaches!

Change without pain


Go from A to B in a reversible way

(mostly)


Find your cruise velocity


Prepare a reversal strategy

Flow with Requirements

Circumstances Change

And you should adapt to them

Modern Architecture


Or so we thought

Slightly More Modern Architecture

Spot the seven differences

Fashion in Architecture


80s: minicomputers & terminals


90s: client-server


00s: three tiers


10s: NoSQL

The Perfect Architecture

Does not exist

Flow With Constraints


MediaSmart Mobile


Serve mobile ads


Performance and branding campaigns


150K+ requests / second

15M+ impressions / day

50+ servers

30+ countries

700M+ profiles

Guilty!



We help pay for your entertainment

Flow with capacity planning


From 4 to 150+ krps in 2.5 years

Flow for Operational Stability

From 38 to 112 krps in one day

Flow to Lower Costs


How Fast Can You Migrate


to a new cloud provider?


to a new hosting company?


to your own datacenter?


It matters because costs escalate quickly

Database migrations

are painful efforts

but shouldn't be!

How to Migrate Your Database


Build a compatibility layer


Avoid downtime if at all possible


Treat access and data separately


Have a reverse migration strategy


but try not to use it

Compatibility Layer


Adapter pattern (remember those?)


Reduced feature set


Don't use new features


Fake missing features


Adapter

Redis to Memcached driver:

exports.RedisAdapter = function(name, address) {
    // self-reference
    var self = this;

    // attributes
    var client = driver.getClient(address);

    // read a key and parse the stored value
    self.get = function(key, callback) {
        runCommand('get', key, function(error, result) {
            if (error) return callback('Could not get ' + key + ': ' + error);
            return callback(null, parse(key, result));
        });
    };

    // store a value as JSON, with an optional expiration
    self.set = function(key, value, expiration, callback) {
        if (expiration) {
            return runCommand('set', key, JSON.stringify(value), 'EX', getExpiration(expiration), callback);
        }
        return runCommand('set', key, JSON.stringify(value), callback);
    };
};

Adapter in Use

// './redis.js' is an assumed module path, mirroring './memcached.js'
var RedisAdapter = require('./redis.js').RedisAdapter;
var MemcachedAdapter = require('./memcached.js').MemcachedAdapter;
var settings = require('./settings.js');

var db = {
    main: getAdapter('main', settings.MAIN_DB_ADDRESS),
};

db.main.get('hi', function(error, result) {
    ...
});

function getAdapter(name, address) {
    if (address.indexOf('redis:') === 0) {
        return new RedisAdapter(name, address);
    } else {
        return new MemcachedAdapter(name, address);
    }
}    

Each database configured to point at Redis or Memcached

Case Studies

Surprisingly hard to find

Warning: may not apply to your situation

Migrate and migrate again


We have used the following databases:
  • Couchbase
  • Memcached
  • Redis
  • DynamoDB
  • PostgreSQL
  • Redshift


Different systems have different trade-offs

and show different failure modes


Migration Strategies

Strategies or Patterns?


Battle-tested strategies


Not an exhaustive collection


Just some ideas for migrations


Several options for the same requirements


Different reversibility behavior

Server: Stop and Migrate




  • Stop the system
  • Make a cold copy
  • Point clients to new database
  • Start again

Server: Stop and Migrate

settings.js:

module.exports = {
    redisAddress: 'redis.mydomain.com',
};

db.js:

var settings = require('./settings.js');
exports.db = {
    current: new RedisAdapter(settings.redisAddress),
};

user:

var db = require('./db.js').db;
db.current.get(key, function(error, result) {
    ...
});

Server: Stop and Migrate


Most basic migration


Requires downtime


Reversal:

  • Just point your settings to the old address
  • Stop again and migrate back


Not really reversible

Case Study: MediaSmart VPC


Migration to Amazon virtual private cloud


Tried on 2015-03-03

Reversed on 2015-03-05

Due to an unrelated failure (!)


Tried again on 2015-03-11

Migrated EU on Friday the 13th

because who's afraid of superstitions?

Server: Read-only Version



  • Switch to read-only
  • Make a hot copy
  • Change to new database
  • Switch back to read and write
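
The read-only switch can live in the compatibility layer. A minimal sketch, assuming a hypothetical `ReadOnlyGuard` wrapper and an in-memory `FakeAdapter` that stands in for a real driver (neither name comes from the talk):

```javascript
// Hypothetical in-memory adapter, standing in for a real database driver.
function FakeAdapter() {
    var self = this;
    var data = {};
    self.get = function(key, callback) {
        return callback(null, data[key]);
    };
    self.set = function(key, value, expiration, callback) {
        data[key] = value;
        return callback(null);
    };
}

// Hypothetical wrapper: rejects writes while the read-only flag is set.
function ReadOnlyGuard(adapter) {
    var self = this;
    self.readOnly = false;
    self.get = function(key, callback) {
        return adapter.get(key, callback);
    };
    self.set = function(key, value, expiration, callback) {
        if (self.readOnly) return callback('Database is read-only');
        return adapter.set(key, value, expiration, callback);
    };
}

var guarded = new ReadOnlyGuard(new FakeAdapter());
guarded.set('hi', 'there', 0, function(error) {
    console.log('write before switch:', error || 'ok');
});
// Switch to read-only while the hot copy runs.
guarded.readOnly = true;
guarded.set('hi', 'again', 0, function(error) {
    console.log('write during copy:', error);
});
guarded.get('hi', function(error, result) {
    console.log('read during copy:', result);
});
```

Setting `readOnly` back to `false` after pointing the wrapper at the new database completes the sequence.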

Server: Read-only Version


Read-only is not always admissible


A hot copy takes longer than a cold one


Reverse migration: switch to read-only again, migrate

Server: synchronize



  • Make a hot copy
  • Synchronize all writes
  • Switch to new copy when ready

Server: synchronize


Depends on server mechanism


No downtime: cool!


Reversal strategy: synchronize back


Full synchronization is hard!

Case Study: MediaSmart Daystats


Migration from Redis to Amazon's Redshift


Set up daily migration of customer stats

Moved old data at our leisure

Query data from one or the other
depending on the date range queried

Trivial reversal
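
Choosing the store by date range could look like the following. This is only a sketch: the `MIGRATED_FROM` setting and the `selectStore` function are hypothetical, and it assumes that the old store keeps all its data until the move is finished.

```javascript
// Hypothetical setting: every day from this date onwards is already in the new store.
// It moves back in time as old data gets migrated.
var MIGRATED_FROM = '2015-01-01';

// Returns which store should answer a query that starts at rangeStart ('YYYY-MM-DD').
// ISO dates compare correctly as plain strings.
function selectStore(rangeStart) {
    if (rangeStart >= MIGRATED_FROM) return 'new';
    // Assumption: the old store still holds the complete history.
    return 'old';
}

console.log(selectStore('2015-02-01'));
console.log(selectStore('2014-11-15'));
```

Reversal then amounts to moving `MIGRATED_FROM` into the future, so that every query goes to the old store again.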

Server: Double Copy



  • Make a hot copy
  • Switch to new database
  • Make another hot copy

Server: Double Copy


Some data loss is admissible


A timestamp is very valuable


Some data loss is inevitable


Reversal: prepare a reverse copy
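
Why the timestamp matters: the second copy only needs to move what changed after the first copy started. A minimal sketch with plain objects as databases; the record shape and the `copySince` helper are assumptions for illustration:

```javascript
// Hypothetical record shape: {value: ..., updated: <timestamp in ms>}.
// Copies every record modified at or after `since` from source to target.
function copySince(source, target, since) {
    var copied = 0;
    Object.keys(source).forEach(function(key) {
        var record = source[key];
        if (record.updated < since) return;
        // Do not clobber a record that was written to the target more recently.
        if (target[key] && target[key].updated > record.updated) return;
        target[key] = record;
        copied += 1;
    });
    return copied;
}

var oldDb = {
    a: {value: 1, updated: 100},
    b: {value: 2, updated: 200},
};
var newDb = {};

// First hot copy: everything.
var firstCopyStart = 250;
copySince(oldDb, newDb, 0);
// A write reaches the old database while the first copy is running.
oldDb.b = {value: 3, updated: 300};
// After switching clients, the second copy moves only the stragglers.
var moved = copySince(oldDb, newDb, firstCopyStart);
console.log(moved, newDb.b.value);
```

Writes that hit the old database after the second copy are still lost, which is the data loss the strategy accepts.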

Case Study: MediaSmart Profiles


Migration of ~120M profiles on 2015-02-16


Moved data from Redis to Amazon's DynamoDB

Lower cost, reasonable memory footprint

Some data loss is admissible


Trivial reversal: use old profiles

Client: Decorator


Pass all queries through an intermediary


Use any condition to select backend


Can be used to balance load

Client: Decorator

Just a clever adapter:

var Memcached = require('memcached');

exports.CleverAdapter = function(name, address) {
    // self-reference
    var self = this;
    
    // attributes
    var oldAdapter = new Memcached(address + ':11211');
    var newAdapter = new RedisAdapter(address);
        
    self.get = function(key, callback) {
        if (badWeather()) {
            return oldAdapter.get(key, callback);
        }
        return newAdapter.get(key, callback);
    };
};

Downside: a few µs more per query

Client: Dual Lookup



  • Read from one database
  • If not present, try to read from second database

Client: Dual Lookup


db.js:

exports.db = {
    v1: new RedisAdapter(settings.oldRedis),
    v2: new RedisAdapter(settings.newRedis),
};


client:
function get(key, callback) {
    db.v1.get(key, function(error, result) {
        if (error || result) return callback(error, result);
        db.v2.get(key, callback);
    });
}

Client: Dual Lookup


Migrate your servers at your leisure


Reversible by design


Now you're talking!


Bad latency issues

from old + new databases

Client: Dual Write



Similar to dual lookup


Latency may not be important
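
A dual write could be sketched like this. `dualSet` and the in-memory `FakeAdapter` are hypothetical names; the adapters follow the same `get` / `set` signature as the adapter shown earlier.

```javascript
// Hypothetical in-memory adapter, standing in for a real database driver.
function FakeAdapter() {
    var self = this;
    var data = {};
    self.get = function(key, callback) {
        return callback(null, data[key]);
    };
    self.set = function(key, value, expiration, callback) {
        data[key] = value;
        return callback(null);
    };
}

// Writes to the old database first, then to the new one.
// Stops at the first error, so the caller sees which write failed.
function dualSet(oldDb, newDb, key, value, expiration, callback) {
    oldDb.set(key, value, expiration, function(error) {
        if (error) return callback(error);
        return newDb.set(key, value, expiration, callback);
    });
}

var v1 = new FakeAdapter();
var v2 = new FakeAdapter();
dualSet(v1, v2, 'hi', 'there', 0, function(error) {
    console.log('dual write:', error || 'ok');
});
```

The writes run one after the other, so write latency is the sum of both databases; that is the price of keeping them in step.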

Client: Timed Rollover



  • Date < cutoff: go to the old database
  • Date > cutoff: go to the new database
  • May need some kind of copy


Client: Timed Rollover


client:

var CUTOFF_DATE = '2015-05-13';

function get(key, callback) {
    // the date is the part of the key after '#'
    var date = key.substring(key.indexOf('#') + 1);
    if (date < CUTOFF_DATE) {
        return db.v1.get(key, callback);
    }
    return db.v2.get(key, callback);
}

function set(key, value, expiration, callback) {
    if (new Date().toISOString() < CUTOFF_DATE) {
        return db.v1.set(key, value, expiration, callback);
    }
    return db.v2.set(key, value, expiration, callback);
}

Client: Timed Rollover


Useful for sequential data

E.g. statistics, counters


No manual intervention is required


Reversal strategy:

  • change time limit,
  • possibly migrate data,
  • redeploy

Case Study: MediaSmart Mobile


Adding aggregates to daily stats

Improved common queries ~20x


Started adding aggregates in March 2015
If date > 2015-03-25: use aggregates
If date < 2015-03-25: do not use aggregates

Trivial reversal: change setting

Client: In-Place Conversion



  • Read value
  • If in old format, convert and write


Client: In-Place Conversion


Degenerate case: old database == new database


Can change driver, structure, format


Not concurrent


Reversal strategy:

  • Read value
  • If in new format, convert and write
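
The read, check and convert steps can be sketched as follows. The two formats, `isOldFormat`, `convert` and the in-memory `FakeAdapter` are all assumptions made up for the example:

```javascript
// Hypothetical in-memory adapter, standing in for a real database driver.
function FakeAdapter() {
    var self = this;
    var data = {};
    self.get = function(key, callback) {
        return callback(null, data[key]);
    };
    self.set = function(key, value, expiration, callback) {
        data[key] = value;
        return callback(null);
    };
}

// Hypothetical formats: old values are plain strings such as 'es',
// new values are objects such as {version: 2, country: 'es'}.
function isOldFormat(value) {
    return typeof value === 'string';
}

function convert(value) {
    return {version: 2, country: value};
}

// Reads a value; if it is in the old format, converts it and writes it back.
function getAndConvert(db, key, callback) {
    db.get(key, function(error, value) {
        if (error) return callback(error);
        if (!isOldFormat(value)) return callback(null, value);
        var converted = convert(value);
        db.set(key, converted, 0, function(error) {
            if (error) return callback(error);
            return callback(null, converted);
        });
    });
}

var db = new FakeAdapter();
db.set('user#1', 'es', 0, function() {
    getAndConvert(db, 'user#1', function(error, result) {
        console.log(result);
    });
});
```

Two clients converting the same key at once can overwrite each other, which is why the strategy is not concurrent. The reverse conversion swaps the check and the `convert` function.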

Broker: Proxied Access

  • Read from or write to a proxy
  • Proxy decides where to access each time

Broker: Proxied Access


Can be used with other migration strategies


Typical case: access a RESTful API


Another piece to maintain


Increased latency


Use with care

Case Study: Instagram


Migrated from AWS to Facebook datacenters


Year-long effort, from 2013-03 to ~2014-03


Had to go through AWS VPC first

Neti — a dynamic iptables manipulation daemon in Python

Three weeks into VPC, two weeks to FB


Bare minimum approach (!)

Case Study: Adtech Company X


Also migrating datacenters


Will start sending 10% of traffic

from new datacenter


No published materials


Inter-datacenter latencies

Broker: Queued Write



  • Read from first database
  • Write to queue
  • Write to both databases

Broker: Queued Write


Can be used with other migration strategies


Again, typically a RESTful API


Avoids high write latencies


Helps ease migrations
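
A queued write might be sketched like this, with an in-process array as the queue. `QueuedWriter` and `FakeAdapter` are hypothetical; a real broker would sit behind an API and use a durable queue.

```javascript
// Hypothetical in-memory adapter, standing in for a real database driver.
function FakeAdapter() {
    var self = this;
    var data = {};
    self.get = function(key, callback) {
        return callback(null, data[key]);
    };
    self.set = function(key, value, expiration, callback) {
        data[key] = value;
        return callback(null);
    };
}

// Hypothetical broker: queues writes and later applies them to both databases.
function QueuedWriter(firstDb, secondDb) {
    var self = this;
    var queue = [];

    // Returns immediately: the caller does not wait for either database.
    self.set = function(key, value, expiration) {
        queue.push({key: key, value: value, expiration: expiration});
    };

    // Reads always go to the first database.
    self.get = function(key, callback) {
        return firstDb.get(key, callback);
    };

    // Drains the queue in order; an entry is removed only after both writes succeed.
    self.flush = function(callback) {
        if (!queue.length) return callback(null);
        var entry = queue[0];
        firstDb.set(entry.key, entry.value, entry.expiration, function(error) {
            if (error) return callback(error);
            secondDb.set(entry.key, entry.value, entry.expiration, function(error) {
                if (error) return callback(error);
                queue.shift();
                return self.flush(callback);
            });
        });
    };
}

var writer = new QueuedWriter(new FakeAdapter(), new FakeAdapter());
writer.set('hi', 'there', 0);
writer.flush(function(error) {
    console.log('flushed:', error || 'ok');
});
```

Reads issued before the flush do not see queued writes yet, so this fits data that tolerates a short delay.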

Don't Stop Flowing

Strategies work for other things


Adapt them to your situation

How to Migrate Anything


Build one or more compatibility layers


Downtime is just bad engineering


Have a reverse migration strategy


and try not to use it

There Will Be Mistakes


Get over it

Unstable Equilibrium

The only way to fly supersonic fighters

Living with Instabilities


Use safe defaults


Fail safely


Use a canary


Always monitor

Canary Example


Adtech: Real Time Bidding

Statistics are processed in a queue

Queue writes a canary
that expires in a few minutes

If no canary, stop bidding
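
The canary check can be approximated in a single process as below. In production the canary would more likely be a database key with an expiration; `Canary` and `shouldBid` are hypothetical names, and the five-minute lifetime is an assumed value.

```javascript
// Hypothetical canary: the stats queue refreshes it after each processed batch.
function Canary(ttlMs) {
    var self = this;
    var lastSeen = null;

    // Called by the queue worker. `now` can be injected to keep the sketch testable.
    self.write = function(now) {
        lastSeen = (now === undefined) ? Date.now() : now;
    };

    // The canary is alive only if it was written less than ttlMs ago.
    self.isAlive = function(now) {
        if (lastSeen === null) return false;
        var current = (now === undefined) ? Date.now() : now;
        return current - lastSeen < ttlMs;
    };
}

// Hypothetical bidder guard: no canary, no bid.
function shouldBid(canary, now) {
    return canary.isAlive(now);
}

var canary = new Canary(5 * 60 * 1000);
canary.write(0);
console.log(shouldBid(canary, 60 * 1000));      // one minute later: true
console.log(shouldBid(canary, 10 * 60 * 1000)); // ten minutes later: false
```

Failing towards "do not bid" is the safe default: if the stats pipeline stalls, spending stops instead of running unaccounted.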

Move Fast, Break Things

Or stay put and never break anything

Or anything in between

Thanks!

@pinchito

The Fluid Architecture

By Alex Fernández

Talk for EnterJS Darmstadt, 2015-06-18.