aws && docker && chef


adam meghji


Co-Founder and CTO at @universe

Entrepreneur & hacker.
Rails APIs, EmberJS, DevOps.
http://djmarmalade.com.
Always inspired!



The Social Marketplace For Events

Ticketing platform enabling event organizers to sell directly on their page(s), and through Universe.

Smart social integrations & cross-sell effect boost ticket sales.

Sell more tickets through better tools :)

today's talk

How Universe uses AWS AutoScaling  + Docker + Chef
for happy DevOps

GOAL: Share one possible infrastructure
architecture & the necessary tooling

ACTUAL GOAL: Inspiration!
Go home and play around.


the universe

tech stack




CHALLENGES for devs


Different languages: Ruby, Node, Python
Different frameworks: Rails, Sinatra, Express, Flask 

 Each Microservice API runs slightly differently!

Jumping in requires learning each API's unique configuration

We achieved consistency through Containers
 


CHALLENGEs for ops


Ticketing  business suffers from flash traffic surges

Need to scale microservice APIs independently

Microservice Architectures need to tolerate failure

It's the cloud -- sh*t happens!

We solve this through AWS AutoScaling Groups


CONTAINERIZING

YOUR APP


Why containerize?


Fast & reliable server provisioning

Consistency in development & production

Dependencies are organized
(private keys, certificates, libs)

Changes to depencies can be tested & staged


5 tips when containerizing


1. All app servers should be ephemeral

2. Separate your app servers & databases

3. Encrypt your secrets!

4. Team should connect via a Jump Server

5. Get a wildcard SSL certificate



TL;DR


The Twelve-Factor App
http://12factor.net

(covers ~90%? of what you need)



the original idea:

fat containers

FAT CONTAINERS


"Container runs a fully-baked app server VM"

1. CI server builds a Docker image if build passes

2. CI server tags the image by git commit

3. To deploy, pull specific image by commit tag, and restart container.


.. didn't work well!


Way too heavy!

Containers aren't VMs.

CI container is ephemeral, so could not incrementally build the images.

Added 20m to each CI job!!

docker ADD, push, and pull are SLOWWW






a better idea:

thin containers

THIN CONTAINERS


"Container runs a fully-baked app server PROCESS"

DOES NOT include application code.

DOES NOT include application gems, node_modules, etc.

Instead, app code & vendorized libs reside on host instance's FS,
and exposed to container's process via shared volumes.

Containers are much lighter, and infrequently built.




docker tip: baseimage-docker


http://phusion.github.io/baseimage-docker

Provides Ubuntu 14.04 LTS as base system

Provides a  correct init process
(init @ PID 1, not docker CMD)

Includes patches for open issues (apt incompatibilities, /etc/hosts, etc)

Helpful daemons & tools: syslog, cron, runit, setuser, etc.

docker tip: passenger-docker


https://github.com/phusion/passenger-docker

baseimage-docker

Ruby 1.9.3, 2.0.0, 2.1.5, and 2.2.0; JRuby 1.7.18 (optional)
JRuby + OpenJDK 8 from the openjdk-r PPA (optional)
Python 2.7, 3.0 (optional)
NodeJS 0.10 (optional)
nginx (optional)
Passenger 4 (optional)

example docker run

docker run --name web_staging -v /home/ubuntu/apps/web/staging:/home/app/web -w /home/app/web -p 80:80 uniiverse/web-staging    

--name web_staging, makes it easy to work with running container
docker exec web_staging tail -f /some/log/file 

  -v <host path>:<container path>, mounts codebase on host 

  -w <container path>, sets the cwd to the app root

  -p 80:80, expose nginx in container to port 80 on host VM


app SERVER

RELOADS

The challenge


Application code & packages are stored on host VM's FS

During deployment, changes happen to app code & packages

The app server process running within the Docker container needs to reload the app responsibly:

  • For HTTP, need zero downtime
  • For job workers, need safe restarts


ZERO-DOWNTIME RELOADS:

PASSENGER


Super easy to enable via passenger-docker image

Passenger handles app reloads automatically! 

Supports Ruby, Node, Python



ZERO-DOWNTIME RELOADS:

OTHER WEB SERVERS


When not handled automatically,
manually issue a reload via webserver CLI

i.e. for puma
docker exec web_staging bundle exec pumactl reload

SAFE RESTARTS: SIDEKIQ, KUE, etc. 


Send a SIGTERM, then SIGINT, then SIGKILL

rerun gem  launches your program, then watches the filesystem. If a relevant file changes, then it restarts your program.

i.e. sidekiq
docker run web_staging rerun --background "bundle exec sidekiq"

i.e. kue
docker run kaiju_staging rerun --background "node kue.js"


awS

autoscaling

aws autoscaling in 1 slide


AUTO SCALING GROUP:

CloudWatch events
+
EC2 Launch Configuration
+
Elastic Load Balancer (optional)

  == healthy EC2 instances


UNIVERSE AUTOSCALING


AUTO SCALING GROUP

1 per <app>_<environment>
(same as Docker images)

Defines which ELB is used for new instances

Defines min/max cluster size, and CloudWatch events which trigger scaling up/down


universe autoscaling


LAUNCH CONFIGURATION

Specifies machine type, disk config, etc.

Associates instance with IAM role

user_data.sh, executed on first boot




UNIVERSE AUTOSCALING


EC2 INSTANCES

Multi-AZ

Tagged with Name=<app>_<environment>

Runs the Docker container against app codebase


UNIVERSE AUTOSCALING


ELASTIC LOAD BALANCER

1 per <app>_<environment>

Terminates SSL via wildcard cert

New EC2 instances are automatically linked

(optional, only for HTTP services)

USER_data.sh


#!/bin/bash -v

# install pre-requisites
curl -L https://www.opscode.com/chef/install.sh | bash
DEBIAN_FRONTEND=noninteractive apt-get update -y
DEBIAN_FRONTEND=noninteractive apt-get install -y awscli
aws s3 --region=us-east-1 cp s3://uniiverse-bucket/uniiverse-validator.pem /etc/chef/

# write first-boot.json
(
cat << 'EOF'
{"run_list": ["role[boxoffice]"]}
EOF
) > /etc/chef/first-boot.json

# write client.rb
(
cat << 'EOF'
environment 'production'
log_level :info
log_location STDOUT
client_key '/etc/chef/uniiverse.pem'
chef_server_url 'https://api.opscode.com/organizations/uniiverse'
validation_client_name 'uniiverse-validator'
validation_key '/etc/chef/uniiverse-validator.pem'
EOF
) > /etc/chef/client.rb

# bootstrap via chef
chef-client -j /etc/chef/first-boot.json

USER_DATA.SH


1. installs Chef Client

2. writes Chef Client configuration to /etc/chef
(incl. Chef Role and Environment)

3. installs awscli (for S3 access)

4. S3 GET Chef private key via IAM role
(set in Launch Configuration)

5. runs Chef Client!


BOOTSTRAPPING AN inSTANCE



LET'S ZOOM IN ..



CHEF COOKBOOK: unii-base


1. Hardens a new instance (SSH configuration, firewalling, etc.)

2. Applies any security updates (HeartBleed, etc.)

3. Configures helpful daemons (fail2ban, log aggregation, etc.)

Consistently applied to all servers across any roles and environments


CHEF COOKBOOK: unii-DOCKER


1. Installs Docker daemon

2. Pulls private keys to auth with Private Repository



CHEF COOKBOOK: unii-app-server


1. Determines the `docker run` command


2. Pulls latest Docker image for <app>_<environment>


3. Downloads <app>_<environment>.tgz from S3 and extracts


4. Generates upstart configuration


5. Starts service, which launches Docker container


Zooming out again ..


DEPLOYMENTS

RE-APPLY UNII-APP-SERVER!

HIPCHAT + HUBOT CHATOPS

Deployments are executed by "Uniiverse ☃"



DEPLOYMENT:

BEHIND THE SCENES

DEPLOY SCRIPT: 4 PHASES


1. BUILD

an app.tgz

2. UPLOAD
to S3 bucket

3. SYNC
with all app servers in cluster

4. MIGRATE
any data 

STEP 1: BUILD

if Gemfile?
bundle install --path=vendor/bundle
(for vendorized gems)

if "app/assets/"?
RAILS_ENV=staging RAILS_GROUPS=assets bundle exec rake assets:precompile 
  (i.e. Rails assets)

if package.json?
NODE_ENV=staging make dist
  (builds node_modules/, compiles coffeescript, etc)


STEP 2: UPLOAD


1. Creates app.tgz archive, including app codebase, vendorized gems, etc. from STEP 1.

2. Excludes certain stuff (logs, caches, temp directories, etc)

3. Uploads app.tgz to special S3 bucket accessible only by app servers via IAM role


STEP 3: SYNC


1. Discovers all EC2 instances tagged with
Name=<app>_<environment>

2. Issues 1 Ansible ad-hoc command to reapply chef cookbooks

ansible all -i server1,server2, -a "chef-client -j /etc/chef/app-server.json" 


STEP 4: MIGRATE


if "db/migrate/"?
ansible all -i server3 -a "sudo docker exec web_staging bundle exec rake db:migrate"
  (i.e. runs any Rails migrations)


Only runs on 1 randomly-chosen server in the cluster


helpful

devops tools

./ssh.sh


quick CLI to SSH into any instance:
./ssh.sh <app> <environment> [command]

./ssh.sh web staging
./ssh.sh web staging free -m 

Resolves EC2 instances by Name tag

Solves the problem of server discovery for remote access
Uses the Jump Server to access the instance

./attach.sh


quick CLI to SSH into any app container:
./attach.sh <app> <environment> [command]

./attach.sh web staging
./attach.sh web staging bundle exec rails console staging
./attach.sh web staging bundle exec rake cache:clear 

Uses ./ssh.sh and docker exec

Simplifies connecting to the running container.
Perfect for opening an interactive console, etc.





THE

OUTCOME?

BENEFITS


A consistent way to ship microservices in Containers despite underlying language, framework, or dependencies

Services are multi-region, can auto-scale with traffic, and auto-heal during failure

Developers have 1 way of shipping code, with gory details neatly abstracted.  No longer daunting to add a new microservice.

Helpful tools and scripts can be written once and reused everywhere.


AREA OF OPTIMIZATION #1


OPTIMIZE COST:
Requires lots of ELB instances
(2 environments * N microservices)

ONE SOLUTION?
1 ELB & 1 ASG of HAProxy machines
    incl. subdomain routing,  health detection, etc.



AREA OF OPTIMIZATION #2


OPTIMIZE UTILIZATION:
EC2 instances are single-purpose
(only run 1 docker container)

ONE SOLUTION?
Marathon: execute long-running tasks via Mesos & REST API
All instance CPU & RAM is pooled, and tasks (i.e. containers) are evenly distributed by resource utilization



AREA OF OPTIMIZATION #3


A FEW NEW POINTS OF FAILURE:
  • Hosted Chef
  • Dockerhub Private Repo
  • Github

SOLUTION:
Build redundancy into the 3rd party services you rely on

IS THIS totally over-engineered?


We spiked working implementations of these awesome alternatives, all of which support Docker.

  • AWS Elastic Beanstalk
  • AWS OpsWorks
  • AWS EC2 Container Service
  • Google Container Engine (via kubernetes)
  • Marathon

We cherry-picked the parts we liked and rolled out own!
For us, it makes sense.  YMMV


happy

hacking!


THANK YOU :)


https://universe.com

@AdamMeghji
adam@universe.com
Made with Slides.com