AWS

Using Docker

Ron Kurr

Winter 2017

What

  • Review of AWS concepts
  • Possible AWS architecture for AL and others
  • Demonstration

Why

  • AL is nearing completion
  • AWS and Docker landscapes have evolved over the past 12 months
  • We need a path to get AL and others into Amazon
  • See how much we can avoid writing ourselves

How

  • Review of AWS concepts
  • Showcase of a possible AWS architecture
  • Quick comparison against other non-Amazon stacks
  • Q & A

Region

  • Virginia
  • Ohio
  • California
  • Oregon
  • Montreal
  • Ireland
  • Frankfurt
  • London
  • Singapore
  • Sydney
  • Seoul
  • Tokyo
  • Mumbai
  • São Paulo

Availability Zone

  • different physical location in the same geographic region
  • high speed interconnects between AZs
  • for fault tolerance
  • AZs do go down
  • entire regions almost never
  • all regions have at least 2 AZs
  • North American regions have at least 3

Virtual Private Cloud (VPC)

  • your own personal class A network
  • separated from other networks
  • complete control over networking
  • complete control over security
  • easily constructed and destroyed
  • popular to have separate VPCs for development, QA, staging and production
  • possible to reproduce network setups right down to IP addresses

Subnet

  • standard networking construct
  • 10.10.0.0/24
  • can be internet accessible (public)
  • can be hidden from internet traffic (private)
  • a subnet is bound to an AZ
  • web servers and other web facing services go into the public subnet
  • databases and other security conscious services reside in the private subnet
  • network traffic flow is completely controllable via route tables and network ACLs (subnet creation sketched below)
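
A minimal CLI sketch of carving a VPC into a public/private subnet pair; the CIDRs, AZ name and IDs are all invented:

    # create the VPC, then carve subnets out of it (IDs are placeholders)
    aws ec2 create-vpc --cidr-block 10.10.0.0/16
    # a subnet intended to be public...
    aws ec2 create-subnet --vpc-id vpc-11111111 --cidr-block 10.10.0.0/24 --availability-zone us-west-2a
    # ...and a private one alongside it in the same AZ
    aws ec2 create-subnet --vpc-id vpc-11111111 --cidr-block 10.10.1.0/24 --availability-zone us-west-2a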

Gateways

  • by default, traffic cannot enter or exit the VPC
  • internet access requires that an Internet Gateway be attached to the VPC
  • public subnets can then access the internet
  • NAT Gateways are required for private subnets to gain access to the internet
  • each AZ should have its own NAT gateway (CLI sketch below)
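
A hedged sketch of the gateway wiring, assuming the VPC above; every ID is a placeholder:

    # the internet gateway gives the public subnets a way out
    aws ec2 create-internet-gateway
    aws ec2 attach-internet-gateway --internet-gateway-id igw-11111111 --vpc-id vpc-11111111
    # a NAT gateway sits in a public subnet and needs an Elastic IP
    aws ec2 allocate-address --domain vpc
    aws ec2 create-nat-gateway --subnet-id subnet-11111111 --allocation-id eipalloc-11111111
    # the private subnets' route table sends outbound traffic through the NAT gateway
    aws ec2 create-route --route-table-id rtb-11111111 --destination-cidr-block 0.0.0.0/0 --nat-gateway-id nat-11111111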

Internet Gateway

NAT Gateway

Instance

  • virtual machine
  • created from Amazon Machine Image (AMI)
  • Amazon Linux, Ubuntu, Red Hat and SUSE
  • multiple types (cores, RAM, I/O)
  • coded, e.g. m4.large
  • charged by the hour -- 1 minute of usage is billed as a full hour
  • access is only done via SSH keys (launch sketch below)
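
Launching one might look like this; the AMI ID, key name and subnet are assumptions:

    # one Ubuntu box, m4.large, reachable only with the named SSH key
    aws ec2 run-instances --image-id ami-11111111 \
                          --instance-type m4.large \
                          --key-name my-ssh-key \
                          --subnet-id subnet-11111111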

EC2 Instance

AMI

Spot Instance

Security Group

  • networking "blanket" wrapped around instance
  • "only let traffic on port 22 in if it comes from 46.222.174.146"
  • "accept all traffic from anything coming from any of the public subnets"
  • "accept all traffic from any resource using the https-only security group"
  • easier to grok than routing tables and network ACLs (rule sketch below)
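
The quoted rules translate into CLI calls along these lines; the group IDs are made up:

    # "only let traffic on port 22 in if it comes from 46.222.174.146"
    aws ec2 authorize-security-group-ingress --group-id sg-11111111 \
                                             --protocol tcp --port 22 --cidr 46.222.174.146/32
    # "accept all traffic from any resource using the https-only security group"
    aws ec2 authorize-security-group-ingress --group-id sg-11111111 \
                                             --protocol -1 --source-group sg-22222222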

Bastion Server

  • aka Jump Box
  • relays SSH traffic from internet to boxes inside the VPC
  • only port 22 is open
  • usually restricted to a particular range of IP addresses
  • SSH is very flexible and can proxy any port
  • run your SQL browser on your desktop and connect to the MySQL server on a private subnet (tunnel sketch below)
  • must be in a public subnet
  • want them in each AZ in case of an outage
  • perfect for auto scaling groups
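
The SQL browser trick is plain SSH port forwarding; the host names and key file are hypothetical:

    # forward local port 3306 through the bastion to a MySQL box on a private subnet
    ssh -i bastion-key.pem -N -L 3306:10.10.1.50:3306 ec2-user@bastion.example.com
    # now point the SQL browser at localhost:3306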

Auto Scaling

  • the ability to grow or shrink a collection of EC2 instances dynamically (CLI sketch below)
  • "I want at least 1 instance running at all times but prefer 4, if possible"
  • "turn off the group of instances after midnight and turn them back on in the morning"
  • "traffic is spiking so spin up an additional box"
  • "traffic has subsided so turn off a box"

Auto Scaling Group

Application Load Balancer (ALB)

  • public "face" of your application
  • forwards traffic to your application
  • terminates TLS so your applications don't have to deal with certificates
  • performs health checking so traffic flows only to healthy instances
  • logs all access
  • provides proxy headers to applications
  • layer 7 aware (understands HTTP) -- creation sketched below
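
Standing one up from the CLI might look like this; names and ARNs are placeholders:

    # an ALB spans at least two public subnets
    aws elbv2 create-load-balancer --name my-alb \
                                   --subnets subnet-11111111 subnet-22222222 \
                                   --security-groups sg-11111111
    # the HTTPS listener terminates TLS with a certificate the applications never see
    aws elbv2 create-listener --load-balancer-arn <alb-arn> \
                              --protocol HTTPS --port 443 \
                              --certificates CertificateArn=<certificate-arn> \
                              --default-actions Type=forward,TargetGroupArn=<target-group-arn>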

Application Load Balancer

Web Application Firewall (WAF)

  • inspects web traffic before handing it over to the ALB
  • SQL injection attacks
  • cross-site scripting
  • can generate events/alarms using custom rules
  • "let me know if one ip sends more than 100 requests within 10 seconds"
  • "count all accesses from this ip address"
  • "put this range of addresses on the black list"
  • "block any payloads larger than 1mb"

WAF

CloudFront (CDN)

  • improves user experience by globally caching content
  • 50+ edge locations
  • handles both downloads and uploads
  • integrates with WAF
  • adds additional proxy headers
  • access is logged
  • terminates TLS so you don't have to
  • works with both static and dynamic content
  • can restrict access based on geography

CloudFront

Distribution

Edge Location

Route 53

  • DNS and traffic management
  • availability monitoring via health checks
  • will remove "dead" instances from resolution
  • knows about AWS resources so TTL issues aren't a problem
  • resources dynamically added/removed
  • complex scenarios supported -- fail over to another region/VPC, move Blue to Green or 10% of traffic goes to Blue (weighted example below)
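
The "10% of traffic goes to Blue" case uses weighted record sets; the zone ID, names and targets are made up:

    # roughly 10% of lookups resolve to Blue, the rest to Green
    aws route53 change-resource-record-sets --hosted-zone-id Z1111111111 --change-batch '{
      "Changes": [
        {"Action": "UPSERT", "ResourceRecordSet": {"Name": "app.example.com", "Type": "CNAME", "TTL": 60,
         "SetIdentifier": "blue",  "Weight": 10, "ResourceRecords": [{"Value": "blue.example.com"}]}},
        {"Action": "UPSERT", "ResourceRecordSet": {"Name": "app.example.com", "Type": "CNAME", "TTL": 60,
         "SetIdentifier": "green", "Weight": 90, "ResourceRecords": [{"Value": "green.example.com"}]}}
      ]}'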

Route 53

Hosted Zone

Route Table

Elastic Container Service (ECS)

  • run Docker containers in AWS
  • requires EC2 instances (Amazon Linux simplest to set up)
  • special agents on each instance control Docker
  • auto registration with ALB for health checking and routing
  • similar level of control to Docker Compose
  • placement "hints" are supported
  • scaling up/down based on load is possible (deployment sketched below)
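
Deploying a service boils down to two calls; the cluster, names, ARNs and task definition file are assumptions:

    # describe the container(s) once...
    aws ecs register-task-definition --cli-input-json file://service-a-task.json
    # ...then ask ECS to keep two copies running behind the ALB's target group
    aws ecs create-service --cluster my-cluster --service-name service-a \
                           --task-definition service-a --desired-count 2 \
                           --role ecsServiceRole \
                           --load-balancers targetGroupArn=<target-group-arn>,containerName=service-a,containerPort=8080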

Registry

ECS

Container

Proposed Architecture

Not everything lives within an AZ

Proposed Architecture

6 subnets spread across 3 AZs

Sample HTTP Traffic

http --verbose https://d1phq4yrkrmw68.cloudfront.net/alpha/ elb==internal-Phoen-LoadB-ZMUPOXFQ3RGN-477287101.us-west-2.elb.amazonaws.com port==80 endpoint==/bravo/

GET /alpha/?elb=internal-Phoen-LoadB-ZMUPOXFQ3RGN-477287101.us-west-2.elb.amazonaws.com&port=80&endpoint=%2Fbravo%2F HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: d1phq4yrkrmw68.cloudfront.net
User-Agent: HTTPie/0.9.9



HTTP/1.1 200 
Connection: keep-alive
Content-Type: application/json;charset=UTF-8
Date: Wed, 01 Mar 2017 21:10:47 GMT
Transfer-Encoding: chunked
Via: 1.1 4ddddf0243e9305f37605c71001e5dd7.cloudfront.net (CloudFront)
X-Amz-Cf-Id: yQv2ex7TZJ70PUtWDQecKravhqull3g5Nl13R0dwRrZ8ibxc42vxvg==
X-Application-Context: application
X-Cache: Miss from cloudfront

{
    "calculated-return-path": "http://d1phq4yrkrmw68.cloudfront.net/bravo", 
    "incoming-headers": {
        "accept": "*/*", 
        "accept-encoding": "gzip, deflate", 
        "cloudfront-forwarded-proto": "https", 
        "cloudfront-is-desktop-viewer": "true", 
        "cloudfront-is-mobile-viewer": "false", 
        "cloudfront-is-smarttv-viewer": "false", 
        "cloudfront-is-tablet-viewer": "false", 
        "cloudfront-viewer-country": "US", 
        "host": "internal-phoen-loadb-zmupoxfq3rgn-477287101.us-west-2.elb.amazonaws.com", 
        "user-agent": "HTTPie/0.9.9", 
        "x-amz-cf-id": "xV8kpuwQPkRGzd4Ovc4kxXfWyMLs8ho-mdzfkd6vcFo6FOexuq6ezA==", 
        "x-amzn-trace-id": "Self=1-58b738d7-7b8c79b3681047004e7014d3;Root=1-58b738d7-4f77d80103ec7114311376d6", 
        "x-forwarded-host": "d1phq4yrkrmw68.cloudfront.net", 
        "x-forwarded-port": "80", 
        "x-forwarded-proto": "http"
    }, 
    "served-by": "ip-10-0-50-205.us-west-2.compute.internal", 
    "status-code": 200, 
    "timestamp": "2017-03-01T21:10:47.534Z"
}

Need to translate AWS-specific proxy headers

Mechanics

  • a service description is pushed to ECS
  • ECS registers containers with the ALB
  • containers are assigned random ports so no port conflicts to worry about (see the fragment below)
  • /service-a on ALB gets routed to /service-a on the proper container -- remapping currently not supported*
  • path-based routing recently added -- might be able to remap
  • both internal and external load balancers can be used
  • ALB will deregister a container with a failing health check
  • ECS will move containers around when a VM fails
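
The random host ports come from mapping the host port to 0 in the task definition; this fragment of the hypothetical service-a-task.json from the earlier sketch shows the idea:

    {
        "family": "service-a",
        "containerDefinitions": [{
            "name": "service-a",
            "image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/service-a:latest",
            "memory": 256,
            "portMappings": [{"containerPort": 8080, "hostPort": 0}]
        }]
    }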

The Pitch

  • use CloudFormation to build VPCs to specification
  • insertion of databases should also be automated
  • use a combination of on-demand and spot instances to create an ECS fleet
  • tool Bamboo to emit a vendor-neutral deployment descriptor that can be converted into an ECS service
  • rework code to understand the standard proxy headers so that links work regardless of changing CDN or ALB domain names
  • have all interactions go through the CDN/WAF combination
  • learn how to leverage Route 53's capabilities

Pros

  • far less infrastructure to write ourselves
  • embraces AWS best practices
  • can be scripted by a variety of tools: CloudFormation, the AWS CLI, Ansible and Terraform
  • ECS builds on the Docker we already understand and simply adds scheduling
  • replacement of dead containers is automatic
  • telemetry via CloudWatch
  • centralized logging via CloudWatch Logs
  • metrics and alarms can turn into automated recovery actions
  • already using Amazon's Docker Registry
  • probably more secure than what we currently do
  • all Amazon tooling -- no 2nd credit card bill

Cons

  • lots of moving pieces to understand
  • alarms and triggers need to be thought about and created
  • unsure how log searching compares to Elasticsearch
  • current ALB limitation of 10 routes per port
  • GUI deployment isn't viable
  • risk of vendor lock-in (everyone else is gravitating to Kubernetes for scheduling)
  • troubleshooting faulty containers can be painful
  • no service discovery mechanism
  • cloud only -- no on-premises equivalent for development

10 Route Limit Workaround

listener:8000 <=== this port for 10 services
    /one
    /two
    /three
    /four
    /five
    /six
    /seven
    /eight
    /nine
    /ten
...
listener:8010 <=== this port for another 10 services
    /ninety-one
    /ninety-two
    /ninety-three
    /ninety-four
    /ninety-five
    /ninety-six
    /ninety-seven
    /ninety-eight
    /ninety-nine
    /one-hundred
  • Amazon says increases are coming (increased 04/05/17)
  • CloudFront would require a separate registration per port

Available On GitHub

  • Fully automated using CloudFormation
  • https://github.com/kurron/cloud-formation-full-stack
  • https://github.com/kurron/cloud-formation-vpc
  • https://github.com/kurron/cloud-formation-ecs
  • https://github.com/kurron/cloud-formation-elb
  • https://github.com/kurron/cloud-formation-cdn
  • https://github.com/kurron/cloud-formation-waf
  • https://github.com/kurron/cloud-formation-ecs-service
  • https://github.com/kurron/cloud-formation-rds
  • https://github.com/kurron/cloud-formation-mongodb
  • https://github.com/kurron/cloud-formation-elasticache
  • https://github.com/kurron/cloud-formation-elasticsearch

Technology Radar

"The Elastic Container Service (ECS) is AWS’ entry into the multihost Docker space. Although there is a lot of competition in this area, there aren’t many off-premises managed solutions out there yet. Although ECS seems like a good first step, we are worried that it is overly complicated at the moment and lacks a good abstraction layer. If you want to run Docker on AWS, though, this tool should certainly be high on your list. Just don’t expect it to be easy to get started with. Assess."

Technology Radar

"CoreOS is a Linux distribution designed to run large, scalable systems. All applications deployed on a CoreOS instance are run in separate Docker containers, and CoreOS provides a suite of tools to help manage them, including etcd, their own distributed configuration store. Newer services, such as fleet, help cluster management by ensuring that a specific number of service instances are always kept running. FastPatch allows atomic CoreOS upgrades using an active-passive root partition scheme and helps with quick rollback in case of problems. These new developments make CoreOS well worth looking into if you are already comfortable with Docker. Assess."

Technology Radar

"Kubernetes is Google's answer to the problem of deploying containers into a cluster of machines, which is becoming an increasingly common scenario. It is not the solution used by Google internally but an open source project that originated at Google and has seen a fair number of external contributions. Since we mentioned Kubernetes on the previous Radar, our initial positive impressions have been confirmed, and we are seeing successful use of Kubernetes in production at our clients. Trial."

Technology Radar

"HashiCorp continues to turn out interesting software. The latest to catch our attention is Nomad, which is competing in the ever-more-populated scheduler arena. Major selling points include not being limited to containerized workloads and operating in multi–data center / multiregion deployments. Assess."

Technology Radar

"The emerging Containers as a Service (CaaS) space is seeing a lot of movement and provides a useful option between basic IaaS (Infrastructure as a Service) and more opinionated PaaS (Platform as a Service). While Rancher creates less noise than some other players, we have enjoyed the simplicity that it brings to running Docker containers in production. It can run stand-alone as a full solution or in conjunction with tools like Kubernetes. Trial."

Technology Radar

"We've continued to have positive experiences deploying the Apache Mesos platform to manage cluster resources for highly distributed systems. Mesos abstracts out underlying computing resources such as CPU and storage, aiming to provide efficient utilization while maintaining isolation. Mesos includes Chronos for distributed and fault-tolerant execution of scheduled jobs, and Marathon for orchestrating long-running processes in containers. Trial."

Technology Radar

Currently, there is no opinion on Docker Enterprise.

$75–200/month per node

Technology Radar

Currently, there is no opinion on Apprenda Platform.

Pricing: contact us.

  • Policy Driven
  • Multi-tenancy
  • Auto Scaling
  • High Availability
  • Metering
  • Self-Service Ops & Dev Portal
  • Logging and Monitoring
  • Service Catalog

Technology Radar

Currently, there is no opinion on Deis.

  • Recently acquired by Microsoft
  • Workflow - K8S native platform
  • Helm - K8S package manager
  • Steward - K8S native service broker

Technology Radar

Currently, there is no opinion on Canonical Distribution of Kubernetes.

  • packaging of upstream K8S bits
  • best practices baked into installer/upgrader
  • multi-cloud installation
  • VMware installation
  • self-support is free
  • can pay for support, consulting and customizations
  • will host the cluster and transfer it to us
  • Prometheus baked into their distribution
  • runs on our development VMs

Technology Radar

Currently, there is no opinion on Red Hat's support of Kubernetes.

Origin is the upstream community project that powers OpenShift. Built around a core of Docker container packaging and Kubernetes container cluster management, Origin is also augmented by application lifecycle management functionality and DevOps tooling. Origin provides a complete open source container application platform.

Conveniences built atop K8S

Kubernetes, Mesos, and Swarm: Comparing the Rancher Orchestration Engine Options

Docker Native gives you the quickest ramp-up with little to no vendor lock-in beyond dependence on Docker. However, Docker Native is very bare bones at the moment and if you need to get complicated, larger-scale applications to production you need to choose one of Mesos/Marathon or Kubernetes.

If you are doing a green field implementation and either don’t have strong opinions about how to layout clusters, or your opinions agree with those of Google, then Kubernetes is a better choice.

Rancher and Tectonic sit atop Kubernetes, providing UIs, catalogs, etc.

Options

  • ECS

    • no GUI for deployment

    • only need to learn Amazon stuff

    • no on-premises version

    • will have to craft some pieces ourselves

  • Kubernetes

    • no licensing fee

    • no GUI (work in progress)

    • more stuff to learn

    • can work on premises

    • popular and continues to gain mind share

    • understands Amazon (ALBs, Route 53)

Options

  • Rancher Labs

    • lots of goodness

    • unsure of support cost (contact us)

    • push button AWS set up

    • Swarm/Kubernetes/Mesos compatible

  • CoreOS

    • lots of goodness

    • only uses Kubernetes

    • 10 nodes free (contact us)

  • Mesosphere

    • enterprise support (contact us)

    • proven to scale better than K8S

Options

  • Canonical K8S Distribution

    • easy installation

    • free, purchase support as needed

    • works in AWS

    • works in VMware

    • encapsulates best practices in installer

    • handles the quarterly K8S upgrades

  • Deis

    • gone dark after the acquisition

    • consulting seems to be the focus

  • Docker Swarm

    • AWS support unknown

    • smaller community

Options

  • OpenShift Origin
    • open source and free
    • built atop K8S
    • simplifies developer and ops experience
    • biased towards RHEL/CentOS via Atomic Host
    • claims to stay current with K8S releases
    • Books available

Recommendations

  • stay with Amazon
  • design around the limitations until they are removed
  • multiple VPCs to keep sandboxes separate
  • developers will have to deploy to their own cluster but we can use Spot Instances to control cost
  • we have to acquire AWS skills anyway for other things (CloudFront, WAF, CloudWatch, Lambda, etc) so why not use them to bolster any shortcomings in the current ECS implementation?
  • stay with a vendor-neutral descriptor that we transform into something vendor specific -- leave the door open for changes
  • learn K8S in the background -- I suspect we'll eventually arrive there anyway

Recommendations

  • bet on Kubernetes
  • use DIY distributions first, exploring commercial tools later
  • investigate using Ingress to replace expensive ALBs*
  • investigate using spot instances for nodes. Support is in the "experimental" phase
  • investigate using Helm charts to simplify stack deployments and rollbacks (example below)
  • needs a domain that AWS can control
  • use a single cluster for development, test and production, relying on namespaces to keep things sorted
  • could use federated clusters to keep production in its own cluster or for disaster recovery/prevention
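
A Helm (v2-era) sketch of the deployment/rollback idea; the chart and release name are hypothetical:

    # Tiller, Helm's server side, is installed into the cluster once
    helm init
    # deploy a stack from a chart, upgrade it, and roll back if the upgrade misbehaves
    helm install stable/mysql --name my-database
    helm upgrade my-database stable/mysql
    helm rollback my-database 1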

Local K8S Installation Options

Questions
