AWS

Using Docker

Ron Kurr

Winter 2017

What

  • Review of AWS concepts
  • Possible AWS architecture for AL and others
  • Demonstration

Why

  • AL is nearing completion
  • AWS and Docker landscapes have evolved over the past 12 months
  • We need a path to get AL and others into Amazon
  • See how much we can avoid writing ourselves

How

  • Review of AWS concepts
  • Showcase of a possible AWS architecture
  • Quick comparison against other non-Amazon stacks
  • Q & A

Region

  • Virginia
  • Ohio
  • California
  • Oregon
  • Montreal
  • Ireland
  • Frankfurt
  • London
  • Singapore
  • Sydney
  • Seoul
  • Tokyo
  • Mumbai
  • São Paulo

Availability Zone

  • different physical location in the same geographic region
  • high speed interconnects between AZs
  • for fault tolerance
  • AZs do go down
  • entire regions almost never
  • all regions have at least 2 AZs
  • North American regions have at least 3

Virtual Private Cloud (VPC)

  • your own personal class A network
  • separated from other networks
  • complete control over networking
  • complete control over security
  • easily constructed and destroyed
  • popular to have separate VPCs for development, QA, staging and production
  • possible to reproduce network setups right down to IP addresses

Subnet

  • standard networking construct
  • 10.10.0.0/24
  • can be internet accessible (public)
  • can be hidden from internet traffic (private)
  • a subnet is bound to an AZ
  • web servers and other web facing services go into the public subnet
  • databases and other security conscious services reside in the private subnet
  • network traffic flow is completely controllable via route tables and network ACLs (subnet creation sketched below)
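
A minimal CLI sketch of carving a VPC into a public/private subnet pair; the CIDRs, AZ name and IDs are all invented:

    # create the VPC, then carve subnets out of it (IDs are placeholders)
    aws ec2 create-vpc --cidr-block 10.10.0.0/16
    # a subnet intended to be public...
    aws ec2 create-subnet --vpc-id vpc-11111111 --cidr-block 10.10.0.0/24 --availability-zone us-west-2a
    # ...and a private one alongside it in the same AZ
    aws ec2 create-subnet --vpc-id vpc-11111111 --cidr-block 10.10.1.0/24 --availability-zone us-west-2a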

Gateways

  • by default, traffic cannot enter or exit the VPC
  • internet access requires that an Internet Gateway be attached to the VPC
  • public subnets can then access the internet
  • NAT Gateways are required for private subnets to gain access to the internet
  • each AZ should have its own NAT gateway (CLI sketch below)
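
A hedged sketch of the gateway wiring, assuming the VPC above; every ID is a placeholder:

    # the internet gateway gives the public subnets a way out
    aws ec2 create-internet-gateway
    aws ec2 attach-internet-gateway --internet-gateway-id igw-11111111 --vpc-id vpc-11111111
    # a NAT gateway sits in a public subnet and needs an Elastic IP
    aws ec2 allocate-address --domain vpc
    aws ec2 create-nat-gateway --subnet-id subnet-11111111 --allocation-id eipalloc-11111111
    # the private subnets' route table sends outbound traffic through the NAT gateway
    aws ec2 create-route --route-table-id rtb-11111111 --destination-cidr-block 0.0.0.0/0 --nat-gateway-id nat-11111111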

Internet Gateway

NAT Gateway

Instance

  • virtual machine
  • created from Amazon Machine Image (AMI)
  • Amazon Linux, Ubuntu, Red Hat and SUSE
  • multiple types (cores, RAM, I/O)
  • coded, e.g. m4.large
  • charged by the hour -- 1 minute of usage is billed as a full hour
  • access is only done via SSH keys (launch sketch below)
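
Launching one might look like this; the AMI ID, key name and subnet are assumptions:

    # one Ubuntu box, m4.large, reachable only with the named SSH key
    aws ec2 run-instances --image-id ami-11111111 \
                          --instance-type m4.large \
                          --key-name my-ssh-key \
                          --subnet-id subnet-11111111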

EC2 Instance

AMI

Spot Instance

Security Group

  • networking "blanket" wrapped around instance
  • "only let traffic on port 22 in if it comes from 46.222.174.146"
  • "accept all traffic from anything coming from any of the public subnets"
  • "accept all traffic from any resource using the https-only security group"
  • easier to grok than routing tables and network ACLs (rule sketch below)
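
The quoted rules translate into CLI calls along these lines; the group IDs are made up:

    # "only let traffic on port 22 in if it comes from 46.222.174.146"
    aws ec2 authorize-security-group-ingress --group-id sg-11111111 \
                                             --protocol tcp --port 22 --cidr 46.222.174.146/32
    # "accept all traffic from any resource using the https-only security group"
    aws ec2 authorize-security-group-ingress --group-id sg-11111111 \
                                             --protocol -1 --source-group sg-22222222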

Bastion Server

  • aka Jump Box
  • relays SSH traffic from internet to boxes inside the VPC
  • only port 22 is open
  • usually restricted to a particular range of IP addresses
  • SSH is very flexible and can proxy any port
  • run your SQL browser on your desktop and connect to the MySQL server on a private subnet (tunnel sketch below)
  • must be in a public subnet
  • want them in each AZ in case of an outage
  • perfect for auto scaling groups
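
The SQL browser trick is plain SSH port forwarding; the host names and key file are hypothetical:

    # forward local port 3306 through the bastion to a MySQL box on a private subnet
    ssh -i bastion-key.pem -N -L 3306:10.10.1.50:3306 ec2-user@bastion.example.com
    # now point the SQL browser at localhost:3306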

Auto Scaling

  • the ability to grow or shrink a collection of EC2 instances dynamically (CLI sketch below)
  • "I want at least 1 instance running at all times but prefer 4, if possible"
  • "turn off the group of instances after midnight and turn them back on in the morning"
  • "traffic is spiking so spin up an additional box"
  • "traffic has subsided so turn off a box"

Auto Scaling Group

Application Load Balancer (ALB)

  • public "face" of your application
  • forwards traffic to your application
  • terminates TLS so your applications don't have to deal with certificates
  • performs health checking so traffic flows only to healthy instances
  • logs all access
  • provides proxy headers to applications
  • layer 7 aware (understands HTTP) -- creation sketched below
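
Standing one up from the CLI might look like this; names and ARNs are placeholders:

    # an ALB spans at least two public subnets
    aws elbv2 create-load-balancer --name my-alb \
                                   --subnets subnet-11111111 subnet-22222222 \
                                   --security-groups sg-11111111
    # the HTTPS listener terminates TLS with a certificate the applications never see
    aws elbv2 create-listener --load-balancer-arn <alb-arn> \
                              --protocol HTTPS --port 443 \
                              --certificates CertificateArn=<certificate-arn> \
                              --default-actions Type=forward,TargetGroupArn=<target-group-arn>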

Application Load Balancer

Web Application Firewall (WAF)

  • inspects web traffic before handing it over to the ALB
  • SQL injection attacks
  • cross-site scripting
  • can generate events/alarms using custom rules
  • "let me know if one ip sends more than 100 requests within 10 seconds"
  • "count all accesses from this ip address"
  • "put this range of addresses on the black list"
  • "block any payloads larger than 1mb"

WAF

CloudFront (CDN)

  • improves user experience by globally caching content
  • 50+ edge locations
  • handles both downloads and uploads
  • integrates with WAF
  • adds additional proxy headers
  • access is logged
  • terminates TLS so you don't have to
  • works with both static and dynamic content
  • can restrict access based on geography

CloudFront

Distribution

Edge Location

Route 53

  • DNS and traffic management
  • availability monitoring via health checks
  • will remove "dead" instances from resolution
  • knows about AWS resources so TTL issues aren't a problem
  • resources dynamically added/removed
  • complex scenarios supported -- fail over to another region/VPC, move Blue to Green or 10% of traffic goes to Blue (weighted example below)
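
The "10% of traffic goes to Blue" case uses weighted record sets; the zone ID, names and targets are made up:

    # roughly 10% of lookups resolve to Blue, the rest to Green
    aws route53 change-resource-record-sets --hosted-zone-id Z1111111111 --change-batch '{
      "Changes": [
        {"Action": "UPSERT", "ResourceRecordSet": {"Name": "app.example.com", "Type": "CNAME", "TTL": 60,
         "SetIdentifier": "blue",  "Weight": 10, "ResourceRecords": [{"Value": "blue.example.com"}]}},
        {"Action": "UPSERT", "ResourceRecordSet": {"Name": "app.example.com", "Type": "CNAME", "TTL": 60,
         "SetIdentifier": "green", "Weight": 90, "ResourceRecords": [{"Value": "green.example.com"}]}}
      ]}'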

Route 53

Hosted Zone

Route Table

Elastic Container Service (ECS)

  • run Docker containers in AWS
  • requires EC2 instances (Amazon Linux simplest to set up)
  • special agents on each instance control Docker
  • auto registration with ALB for health checking and routing
  • similar level of control to Docker Compose
  • placement "hints" are supported
  • scaling up/down based on load is possible (deployment sketched below)
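
Deploying a service boils down to two calls; the cluster, names, ARNs and task definition file are assumptions:

    # describe the container(s) once...
    aws ecs register-task-definition --cli-input-json file://service-a-task.json
    # ...then ask ECS to keep two copies running behind the ALB's target group
    aws ecs create-service --cluster my-cluster --service-name service-a \
                           --task-definition service-a --desired-count 2 \
                           --role ecsServiceRole \
                           --load-balancers targetGroupArn=<target-group-arn>,containerName=service-a,containerPort=8080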

Registry

ECS

Container

Proposed Architecture

Not everything lives within an AZ

Proposed Architecture

6 subnets spread across 3 AZs

Sample HTTP Traffic

http --verbose https://d1phq4yrkrmw68.cloudfront.net/alpha/ elb==internal-Phoen-LoadB-ZMUPOXFQ3RGN-477287101.us-west-2.elb.amazonaws.com port==80 endpoint==/bravo/

GET /alpha/?elb=internal-Phoen-LoadB-ZMUPOXFQ3RGN-477287101.us-west-2.elb.amazonaws.com&port=80&endpoint=%2Fbravo%2F HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: d1phq4yrkrmw68.cloudfront.net
User-Agent: HTTPie/0.9.9



HTTP/1.1 200 
Connection: keep-alive
Content-Type: application/json;charset=UTF-8
Date: Wed, 01 Mar 2017 21:10:47 GMT
Transfer-Encoding: chunked
Via: 1.1 4ddddf0243e9305f37605c71001e5dd7.cloudfront.net (CloudFront)
X-Amz-Cf-Id: yQv2ex7TZJ70PUtWDQecKravhqull3g5Nl13R0dwRrZ8ibxc42vxvg==
X-Application-Context: application
X-Cache: Miss from cloudfront

{
    "calculated-return-path": "http://d1phq4yrkrmw68.cloudfront.net/bravo", 
    "incoming-headers": {
        "accept": "*/*", 
        "accept-encoding": "gzip, deflate", 
        "cloudfront-forwarded-proto": "https", 
        "cloudfront-is-desktop-viewer": "true", 
        "cloudfront-is-mobile-viewer": "false", 
        "cloudfront-is-smarttv-viewer": "false", 
        "cloudfront-is-tablet-viewer": "false", 
        "cloudfront-viewer-country": "US", 
        "host": "internal-phoen-loadb-zmupoxfq3rgn-477287101.us-west-2.elb.amazonaws.com", 
        "user-agent": "HTTPie/0.9.9", 
        "x-amz-cf-id": "xV8kpuwQPkRGzd4Ovc4kxXfWyMLs8ho-mdzfkd6vcFo6FOexuq6ezA==", 
        "x-amzn-trace-id": "Self=1-58b738d7-7b8c79b3681047004e7014d3;Root=1-58b738d7-4f77d80103ec7114311376d6", 
        "x-forwarded-host": "d1phq4yrkrmw68.cloudfront.net", 
        "x-forwarded-port": "80", 
        "x-forwarded-proto": "http"
    }, 
    "served-by": "ip-10-0-50-205.us-west-2.compute.internal", 
    "status-code": 200, 
    "timestamp": "2017-03-01T21:10:47.534Z"
}

Need to translate AWS-specific proxy headers

Mechanics

  • a service description is pushed to ECS
  • ECS registers containers with the ALB
  • containers are assigned random ports so no port conflicts to worry about (see the fragment below)
  • /service-a on ALB gets routed to /service-a on the proper container -- remapping currently not supported*
  • path-based routing recently added -- might be able to remap
  • both internal and external load balancers can be used
  • ALB will deregister a container with a failing health check
  • ECS will move containers around when a VM fails
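
The random host ports come from mapping the host port to 0 in the task definition; this fragment of the hypothetical service-a-task.json from the earlier sketch shows the idea:

    {
        "family": "service-a",
        "containerDefinitions": [{
            "name": "service-a",
            "image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/service-a:latest",
            "memory": 256,
            "portMappings": [{"containerPort": 8080, "hostPort": 0}]
        }]
    }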

The Pitch

  • use CloudFormation to build VPCs to specification
  • insertion of databases should also be automated
  • use a combination of on-demand and spot instances to create an ECS fleet
  • tool Bamboo to emit a vendor-neutral deployment descriptor that can be converted into an ECS service
  • rework code to understand the standard proxy headers so that links work regardless of changing CDN or ALB domain names
  • have all interactions go through the CDN/WAF combination
  • learn how to leverage Route 53's capabilities

Pros

  • far less infrastructure to write ourselves
  • embraces AWS best practices
  • can be scripted by a variety of tools: CloudFormation, the AWS CLI, Ansible and Terraform
  • ECS builds on the Docker we already understand and simply adds scheduling
  • replacement of dead containers is automatic
  • telemetry via CloudWatch
  • centralized logging via CloudWatch Logs
  • metrics and alarms can turn into automated recovery actions
  • already using Amazon's Docker Registry
  • probably more secure than what we currently do
  • all Amazon tooling -- no 2nd credit card bill

Cons

  • lots of moving pieces to understand
  • alarms and triggers need to be thought about and created
  • unsure how log searching compares to Elasticsearch
  • current ALB limitation of 10 routes per port
  • GUI deployment isn't viable
  • risk of vendor lock-in (everyone else is gravitating to Kubernetes for scheduling)
  • troubleshooting faulty containers can be painful
  • no service discovery mechanism
  • cloud only -- no on-premises equivalent for development

10 Route Limit Workaround

listener:8000 <=== this port for 10 services
    /one
    /two
    /three
    /four
    /five
    /six
    /seven
    /eight
    /nine
    /ten
...
listener:8010 <=== this port for another 10 services
    /ninety-one
    /ninety-two
    /ninety-three
    /ninety-four
    /ninety-five
    /ninety-six
    /ninety-seven
    /ninety-eight
    /ninety-nine
    /one-hundred
  • Amazon says increases are coming (increased 04/05/17)
  • CloudFront would require a separate registration per port

Available On GitHub

  • Fully automated using CloudFormation
  • https://github.com/kurron/cloud-formation-full-stack
  • https://github.com/kurron/cloud-formation-vpc
  • https://github.com/kurron/cloud-formation-ecs
  • https://github.com/kurron/cloud-formation-elb
  • https://github.com/kurron/cloud-formation-cdn
  • https://github.com/kurron/cloud-formation-waf
  • https://github.com/kurron/cloud-formation-ecs-service
  • https://github.com/kurron/cloud-formation-rds
  • https://github.com/kurron/cloud-formation-mongodb
  • https://github.com/kurron/cloud-formation-elasticache
  • https://github.com/kurron/cloud-formation-elasticsearch

Technology Radar

"The Elastic Container Service (ECS) is AWS’ entry into the multihost Docker space. Although there is a lot of competition in this area, there aren’t many off-premises managed solutions out there yet. Although ECS seems like a good first step, we are worried that it is overly complicated at the moment and lacks a good abstraction layer. If you want to run Docker on AWS, though, this tool should certainly be high on your list. Just don’t expect it to be easy to get started with. Assess."

Technology Radar

"CoreOS is a Linux distribution designed to run large, scalable systems. All applications deployed on a CoreOS instance are run in separate Docker containers, and CoreOS provides a suite of tools to help manage them, including etcd, their own distributed configuration store. Newer services, such as fleet, help cluster management by ensuring that a specific number of service instances are always kept running. FastPatch allows atomic CoreOS upgrades using an active-passive root partition scheme and helps with quick rollback in case of problems. These new developments make CoreOS well worth looking into if you are already comfortable with Docker. Assess."

Technology Radar

"Kubernetes is Google's answer to the problem of deploying containers into a cluster of machines, which is becoming an increasingly common scenario. It is not the solution used by Google internally but an open source project that originated at Google and has seen a fair number of external contributions. Since we mentioned Kubernetes on the previous Radar, our initial positive impressions have been confirmed, and we are seeing successful use of Kubernetes in production at our clients. Trial."

Technology Radar

"HashiCorp continues to turn out interesting software. The latest to catch our attention is Nomad, which is competing in the ever-more-populated scheduler arena. Major selling points include not being limited to containerized workloads and operating in multi–data center / multiregion deployments. Assess."

Technology Radar

"The emerging Containers as a Service (CaaS) space is seeing a lot of movement and provides a useful option between basic IaaS (Infrastructure as a Service) and more opinionated PaaS (Platform as a Service). While Rancher creates less noise than some other players, we have enjoyed the simplicity that it brings to running Docker containers in production. It can run stand-alone as a full solution or in conjunction with tools like Kubernetes. Trial."

Technology Radar

"We've continued to have positive experiences deploying the Apache Mesos platform to manage cluster resources for highly distributed systems. Mesos abstracts out underlying computing resources such as CPU and storage, aiming to provide efficient utilization while maintaining isolation. Mesos includes Chronos for distributed and fault-tolerant execution of scheduled jobs, and Marathon for orchestrating long-running processes in containers. Trial."

Technology Radar

Currently, there is no opinion on Docker Enterprise.

$75–200/month per node

Technology Radar

Currently, there is no opinion on Apprenda Platform.

Pricing: contact us.

  • Policy Driven
  • Multi-tenancy
  • Auto Scaling
  • High Availability
  • Metering
  • Self-Service Ops & Dev Portal
  • Logging and Monitoring
  • Service Catalog

Technology Radar

Currently, there is no opinion on Deis.

  • Recently acquired by Microsoft
  • Workflow - K8S native platform
  • Helm - K8S package manager
  • Steward - K8S native service broker

Technology Radar

Currently, there is no opinion on Canonical Distribution of Kubernetes.

  • packaging of upstream K8S bits
  • best practices baked into installer/upgrader
  • multi-cloud installation
  • VMware installation
  • self-support is free
  • can pay for support, consulting and customizations
  • will host the cluster and transfer it to us
  • Prometheus baked into their distribution
  • runs on our development VMs

Technology Radar

Currently, there is no opinion on Red Hat's support of Kubernetes.

Origin is the upstream community project that powers OpenShift. Built around a core of Docker container packaging and Kubernetes container cluster management, Origin is also augmented by application lifecycle management functionality and DevOps tooling. Origin provides a complete open source container application platform.

Conveniences built atop K8S

Kubernetes, Mesos, and Swarm: Comparing the Rancher Orchestration Engine Options

Docker Native gives you the quickest ramp-up with little to no vendor lock-in beyond dependence on Docker. However, Docker Native is very bare bones at the moment and if you need to get complicated, larger-scale applications to production you need to choose one of Mesos/Marathon or Kubernetes.

If you are doing a green field implementation and either don’t have strong opinions about how to layout clusters, or your opinions agree with those of Google, then Kubernetes is a better choice.

Rancher and Tectonic sit atop Kubernetes, providing UIs, catalogs, etc.

Options

  • ECS

    • no GUI for deployment

    • only need to learn Amazon stuff

    • no on-premises version

    • will have to craft some pieces ourselves

  • Kubernetes

    • no licensing fee

    • no GUI (work in progress)

    • more stuff to learn

    • can work on premises

    • popular and continues to gain mind share

    • understands Amazon (ALBs, Route 53)

Options

  • Rancher Labs

    • lots of goodness

    • unsure of support cost (contact us)

    • push button AWS set up

    • Swarm/Kubernetes/Mesos compatible

  • CoreOS

    • lots of goodness

    • only uses Kubernetes

    • 10 nodes free (contact us)

  • Mesosphere

    • enterprise support (contact us)

    • proven to scale better than K8S

Options

  • Canonical K8S Distribution

    • easy installation

    • free, purchase support as needed

    • works in AWS

    • works in VMware

    • encapsulates best practices in installer

    • handles the quarterly K8S upgrades

  • Deis

    • gone dark after the acquisition

    • consulting seems to be the focus

  • Docker Swarm

    • AWS support unknown

    • smaller community

Options

  • OpenShift Origin
    • open source and free
    • built atop K8S
    • simplifies developer and ops experience
    • biased towards RHEL/CentOS via Atomic Host
    • claims to stay current with K8S releases
    • Books available

Recommendations

  • stay with Amazon
  • design around the limitations until they are removed
  • multiple VPCs to keep sandboxes separate
  • developers will have to deploy to their own cluster but we can use Spot Instances to control cost
  • we have to acquire AWS skills anyway for other things (CloudFront, WAF, CloudWatch, Lambda, etc) so why not use them to bolster any shortcomings in the current ECS implementation?
  • stay with a vendor-neutral descriptor that we transform into something vendor specific -- leave the door open for changes
  • learn K8S in the background -- I suspect we'll eventually arrive there anyway

Recommendations

  • bet on Kubernetes
  • use DIY distributions first, exploring commercial tools later
  • investigate using Ingress to replace expensive ALBs*
  • investigate using spot instances for nodes. Support is in the "experimental" phase
  • investigate using Helm charts to simplify stack deployments and rollbacks (example below)
  • needs a domain that AWS can control
  • use a single cluster for development, test and production, relying on namespaces to keep things sorted
  • could use federated clusters to keep production in its own cluster or for disaster recovery/prevention
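
A Helm (v2-era) sketch of the deployment/rollback idea; the chart and release name are hypothetical:

    # Tiller, Helm's server side, is installed into the cluster once
    helm init
    # deploy a stack from a chart, upgrade it, and roll back if the upgrade misbehaves
    helm install stable/mysql --name my-database
    helm upgrade my-database stable/mysql
    helm rollback my-database 1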

Local K8S Installation Options

Questions
