Cloud Design Patterns

Snapshot Pattern

Create an S3-backed, point-in-time snapshot of a running instance from the AWS console.


  • can take seconds or minutes, depending on amount to be saved
  • snapshots can be used as root volumes on new EC2 instances, creating a near-clone of the snapshotted instance

Stamp Pattern

Create an image of an operating system that is pre-configured for a particular purpose


  • quickly create multiple instances using the same "stamp"
  • in AWS parlance, the "stamp" is a custom AMI
  • Packer is a great tool for creating custom AMIs
  • Vagrant is another option but I've never built them that way
  • AMIs are region specific so you'll need to build several
  • our AMIs come pre-loaded with Docker and Loggly support

Scale-Up Pattern

The scale up pattern is a method that allows a server to change size and specifications dynamically, and as needed.

  • fancy way of saying we can move from a smaller EC2 box to a larger one by stopping it, changing the instance type and starting it again
  • possible to automate and schedule the change
  • only makes sense if CPU or RAM are the bottleneck
  • does require down time
  • Terraform can be used to automate this

Scale-Out Pattern

Tie 2 or more instances together to add processing power without incurring downtime

  • Create an elastic load balancer with forwarding ports and health checks
  • Create a launch configuration for the instance
  • Create an auto scaling group with configured CloudWatch alarms and scaling policies
  • Can react to changes in CPU and other metrics
  • Will scale up and down, depending on the current environment
  • Will be smart about picking the cheapest instance to terminate
  • Can be automated with Terraform

On-Demand Disk Pattern

Increase the disk on an already running EC2 instance

  • requires down time
  • normally done manually
  • might be automated
  • allows for striping via software RAID
  • the amount of down time required might set off scaling and CloudWatch alarms
  • can add a new disk and begin using that along with the old one
  • can add a new disk, copy the data from the old disk, and discard the old disk
  • can change from magnetic storage to SSD
  • use of LVM from the beginning can simplify things

Multi-Server Pattern

A variation of the scale-out pattern where availability is the primary concern and the number of instances is fixed rather than dynamic

  • if you need one server to handle a load, install 2
  • allows for brown outs, hiccups and other anomalies without affecting user experience
  • shared session state should be stored in ElastiCache
  • persistent data should be backed by a redundant cluster
  • same recipe as in Scale-Out but we don't use a scaling group

Multi-Datacenter Pattern

A nuanced version of the multi-server pattern where multiple availability zones are used

  • load balancers must live in multiple AZs
  • EC2 instances must live in multiple AZs
  • managed services, such as RDS, support multiple AZs
  • by-hand services, such as MongoDB, need to account for multiple AZs as best they can

Floating IP Pattern

Provides a way to apply zero downtime updates when a load balancer is not being used.


  1. Create an EC2 instance that serves HTTP content to an end user
  2. Assign a floating IP address to the EC2 instance
  3. Create a secondary EC2 instance that serves HTTP content to the end users
  4. Swap the floating IP address to the secondary EC2 instance
  5. Perform modifications to the original EC2 instance
  6. Swap the floating IP address back to the original EC2 instance once modifications are complete

Deep Health Check Pattern

The deep health check pattern lets the instances behind the load balancer report on the health of downstream dependencies that the load balancer cannot check itself.


  • Only detects downstream failures, doesn't repair them
  • Example downstream services include databases and Google API calls
  • JVM projects typically have this pattern baked in via Spring Boot
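
A minimal sketch of what such an endpoint might do, in plain Python. The check functions and their names are hypothetical stand-ins for real probes of a database or third-party API; the only idea taken from the pattern is that the instance answers 200 when every downstream check passes and 503 otherwise, so the load balancer's ordinary HTTP health check picks up the failure.

```python
from typing import Callable, Dict, Tuple


def deep_health_check(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, Dict[str, str]]:
    """Run every downstream check and return an HTTP-style status plus details."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "up" if check() else "down"
        except Exception:
            # A check that raises is treated the same as a failed check.
            results[name] = "down"
    status = 200 if all(state == "up" for state in results.values()) else 503
    return status, results


def database_ok() -> bool:
    # Hypothetical: a real check might run a trivial query against the database.
    return True


def geocoder_ok() -> bool:
    # Hypothetical: simulates a third-party API that is not responding.
    raise TimeoutError("no response")


status, details = deep_health_check({"database": database_ok, "geocoder": geocoder_ok})
print(status, details)
```

As the bullets say, this only detects the failure; nothing here repairs the downstream service.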

High Availability Storage Pattern

Leverage S3 to store static text and binary web assets, letting Amazon deal with the redundancy, encryption and failover


  1. upload static assets to an S3 bucket
  2. create an S3 bucket policy that allows internet users to access and download the assets
  3. use EC2 to run a web server instance that hosts the links to the assets
  4. configure IAM accounts if particular security constraints are desired

Direct Storage Hosting Pattern

For static web sites, remove the web server from the High Availability Storage pattern and let S3 serve up the entire site.


  • any JavaScript hosted in this manner will have to use CORS in order to talk to services (they won't live in the S3 domain)

Private Data Delivery Pattern

Secure access to S3 assets using a time-sensitive URL


  • expiring URLs are a feature of S3
  • time limit is configurable
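
To show the idea behind an expiring URL, here is a standard-library sketch that signs a path together with an expiry time and rejects the link once the time has passed. It is not S3's actual signing scheme; the secret, the query parameter names and the helper functions are all assumptions made for illustration. With real S3 you would let the SDK generate the link (boto3's `generate_presigned_url`, for example) rather than roll your own.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical signing key, never hard-code a real one


def _signature(path: str, expires: int) -> str:
    message = f"{path}:{expires}".encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return the path with an expiry timestamp and signature attached."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)
    query = urlencode({"expires": expires, "signature": _signature(path, expires)})
    return f"{path}?{query}"


def is_valid(url: str, now: Optional[float] = None) -> bool:
    """Accept the URL only if it is unexpired and the signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(signature, _signature(parsed.path, expires))


url = sign_url("/reports/q3.pdf", ttl_seconds=60, now=1000)
print(is_valid(url, now=1030))  # inside the 60 second window
print(is_valid(url, now=1061))  # after the window has closed
```

Because the expiry is part of the signed message, a client cannot extend the window by editing the query string.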

Content Delivery Network Pattern

Ensure that data does not have to travel far to reach the end user, reducing latency and improving the end user experience.

  • CloudFront is Amazon's CDN that does this
  • Content is optimized either by latency or geographic location
  • The URL is the "key" used in CF's caching mechanism

Rename Distribution Pattern

Gets new versions of data out to the user in such a way as to ensure that old data is not being used.  Also known as "cache busting"


  • lower the cache timeouts in CloudFront
  • use expiring S3 URLs
  • new URLs mean CloudFront will have new "keys" into its cache and won't serve up old content
  • the book was light on implementation details other than "use a URL shortening service to make things easier"
  • Spring provides cache busting out of the box
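
Since the book was light on implementation, here is one common way to get new URLs, sketched in Python: put a short hash of the file's content into its name. This is my assumption about a reasonable approach, not the book's method, and the function name and digest length are made up for the example.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path: str, content: bytes, digest_length: int = 8) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_length]
    original = PurePosixPath(path)
    return str(original.with_name(f"{original.stem}.{digest}{original.suffix}"))


old_name = versioned_name("css/site.css", b"body { color: black; }")
new_name = versioned_name("css/site.css", b"body { color: navy; }")
print(old_name)
print(new_name)
```

Changed content yields a different name, so the CDN sees a new cache key, while unchanged files keep their name and stay cached.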

Clone Server Pattern

Clones of a static master server are created dynamically as needed.


  • create a single master instance that never goes away
  • put a load balancer in front of the master instance
  • create an AMI based on the master instance
  • create a launch configuration for the secondary instances based on the AMI
  • spin up secondary instances using the launch configuration
  • use rsync to copy current data from the master server
  • register the new instances with the ELB

NFS Sharing Pattern

Similar to the Clone Server pattern but we replace rsync with NFS

  • use of NFS ensures each instance sees the same data on the file system
  • HDFS or GlusterFS are viable alternatives to NFS
  • Amazon Elastic File System (EFS) is a managed version of this solution

State Sharing Pattern

Share session information between applications via a fast, in-memory cache.


  • Redis and Memcached are typical solutions
  • Amazon's ElastiCache is a managed service offering those two APIs
  • data is looked up by key, usually a session identifier
  • data can be aged out using TTL settings
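
The lookup-by-key and TTL behaviour can be sketched with an in-process stand-in. This is not Redis or ElastiCache, just a dictionary with expiry times; the class name and the injectable clock are assumptions added so the example can be run without waiting.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class SessionStore:
    """Toy stand-in for a shared cache: values are looked up by key and aged out."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def put(self, session_id: str, data: Any) -> None:
        self._items[session_id] = (self._clock() + self._ttl, data)

    def get(self, session_id: str) -> Optional[Any]:
        entry = self._items.get(session_id)
        if entry is None:
            return None
        expires_at, data = entry
        if self._clock() >= expires_at:
            # The entry has outlived its TTL, so drop it and report a miss.
            del self._items[session_id]
            return None
        return data


now = [0.0]  # fake clock so the example does not have to sleep
store = SessionStore(ttl_seconds=30, clock=lambda: now[0])
store.put("session-abc", {"user": "alice"})
fresh = store.get("session-abc")
now[0] = 31.0
stale = store.get("session-abc")
print(fresh, stale)
```

In the real pattern the store lives outside the application processes, which is what lets any instance behind the load balancer pick up the session.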

URL Rewriting Pattern

Improve upon the Clone Server and NFS Sharing patterns by storing data in S3.


  • files are uploaded to an application
  • the application stores the data in S3
  • when a client asks for the data, the server returns a 301 status code and redirects them to the S3 URL
  • no longer have to manage file systems

Cache Proxy Pattern

If CloudFront is taking too long replicating your data out to its edge nodes, cache the data yourself


  • use Varnish or Squid to cache the data
  • unclear if the author intended for the application and cache pair to run in different regions around the globe
  • for availability, need to run a cluster of caches
  • author suggests running a Consul cluster to help with Varnish cluster registrations
  • author also suggested having Varnish use S3 for its backing store and possibly avoiding having to run a cache cluster

Write Proxy Pattern

Instead of using HTML form uploads to add data, use alternative protocols to upload the data and have the web front-end read it back from S3.


  • run an EC2 instance to handle the uploads via the protocol of choice
  • the instance ultimately stores the data in S3
  • front-end server fetches data either from S3 directly or from a cache backed by S3

Storage Index Pattern

To avoid unnecessary S3 API calls, and the cost associated with them, keep an index of meta-data associated with each uploaded item.


  • author uses Amazon's RDS to store meta-data but you can use anything you want
  • the write proxy (upload server) will also write meta-data to the index
  • the front-end serving the data consults the index for necessary information, including the direct S3 coordinates
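
A small sketch of the index, using an in-memory SQLite database in place of RDS. The table layout, column names and helper functions are assumptions for illustration; the point is that the front-end answers "where is this item?" from the index without calling S3.

```python
import sqlite3
from typing import Optional

# In-memory SQLite stands in for the RDS instance used by the author.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE uploads ("
    "item_id TEXT PRIMARY KEY, bucket TEXT, object_key TEXT, size_bytes INTEGER)"
)


def record_upload(item_id: str, bucket: str, object_key: str, size_bytes: int) -> None:
    """Called by the write proxy after it has stored the object in S3."""
    with conn:
        conn.execute(
            "INSERT INTO uploads VALUES (?, ?, ?, ?)",
            (item_id, bucket, object_key, size_bytes),
        )


def locate(item_id: str) -> Optional[str]:
    """Called by the front-end to find the S3 coordinates without touching S3."""
    row = conn.execute(
        "SELECT bucket, object_key FROM uploads WHERE item_id = ?", (item_id,)
    ).fetchone()
    return None if row is None else f"s3://{row[0]}/{row[1]}"


record_upload("invoice-17", "example-uploads", "2016/invoice-17.pdf", 48213)
print(locate("invoice-17"))
print(locate("invoice-99"))
```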

Direct Object Upload Pattern

Leverage S3's robust HTTP upload support and allow direct uploads to S3.


  • provide a front-end that hosts an upload form
  • the front-end does some data validation and constructs an S3 upload URL
  • the client is redirected to the generated URL and the bits are uploaded directly to S3
  • after the upload is completed, the client gets redirected to whatever completion page you want

Database Replication Pattern

Create two instances of your database and use its master-slave support to deal with network partitions and server crashes


  • create a master MySQL instance in one AZ
  • create a slave MySQL instance in a different AZ
  • let MySQL replication logic keep the data in sync
  • using Amazon RDS is probably a better choice
  • not designed to increase capacity
  • propagation only occurs from master to slave

Read Replica Pattern

Variation of the Database Replication pattern where the secondary is used for read-only operations


  • set up the master and slave instances
  • install MySQL Proxy on the master instance and configure it to route read-only operations to the slave node
  • HAProxy, MySQL Fabric and Galera are other options
  • Amazon RDS is probably a better solution
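
The routing decision a proxy makes can be sketched in a few lines of Python. This is a deliberately naive illustration, not how MySQL Proxy or HAProxy actually classify traffic; the verb list and the "master"/"replica" labels are assumptions.

```python
READ_VERBS = {"select", "show"}


def route(sql: str) -> str:
    """Return "replica" for plain reads and "master" for everything else."""
    words = sql.split(None, 1)
    verb = words[0].lower() if words else ""
    # SELECT ... FOR UPDATE takes locks, so it has to run on the master.
    locking_read = "for update" in sql.lower()
    return "replica" if verb in READ_VERBS and not locking_read else "master"


print(route("SELECT name FROM users WHERE id = 7"))
print(route("UPDATE users SET name = 'bob' WHERE id = 7"))
```

Keep in mind that replication lags, so a read sent to the replica right after a write may not see that write yet.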

In-Memory Cache Pattern

Applications avoid hitting the database by caching previous query results


  • Redis and other key-value stores can be used
  • Complexity added on the application side
  • ElastiCache provides Redis API
  • DynamoDB is a key-value store
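
The application-side complexity mentioned above looks roughly like this cache-aside sketch. A dictionary stands in for Redis and `fake_db` stands in for the real database; both, along with the class name, are hypothetical.

```python
from typing import Callable, Dict, List


class QueryCache:
    """Cache-aside: look in the cache first, fall back to the database on a miss."""

    def __init__(self, run_query: Callable[[str], list]):
        self._run_query = run_query
        self._cache: Dict[str, list] = {}

    def fetch(self, sql: str) -> list:
        if sql not in self._cache:
            self._cache[sql] = self._run_query(sql)
        return self._cache[sql]

    def invalidate(self, sql: str) -> None:
        # The application has to call this when the underlying rows change.
        self._cache.pop(sql, None)


calls: List[str] = []


def fake_db(sql: str) -> list:
    calls.append(sql)  # record every query that actually reaches the database
    return [("alice",), ("bob",)]


cache = QueryCache(fake_db)
first = cache.fetch("SELECT name FROM users")
second = cache.fetch("SELECT name FROM users")
print(len(calls))  # only the first fetch reached the database
```

Deciding when to invalidate is the hard part, and it is the main cost of this pattern.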

Sharding Write Pattern

Improve the Read Replica pattern by sending writes to multiple databases


  • install MySQL Fabric to handle the details
  • carefully select a sharding key -- hard to change later
  • can improve write throughput
  • may impact read throughput -- data lives in multiple places
  • Amazon RDS might be a better choice
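
To see why the sharding key is hard to change later, here is a sketch of the simplest scheme: hash the key and take it modulo the number of shards. This is an assumption for illustration, not what MySQL Fabric does internally, and the shard names are made up.

```python
import hashlib
from typing import List


def shard_for(key: str, shards: List[str]) -> str:
    """Pick a shard deterministically from a hash of the sharding key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


shards = ["db-0", "db-1", "db-2"]
placement = {customer: shard_for(customer, shards) for customer in ["acme", "globex", "initech"]}
print(placement)
```

The same key always maps to the same shard, but adding a fourth shard changes the modulus and moves most keys, which is the migration pain the bullet warns about.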

Queuing Chain Pattern

Move work through the system via a series of message queues.


  • in the old days, we called this SEDA (staged event driven architecture)
  • use SQS to handle message queuing
  • incoming message is pulled from queue A, processed and results are published to queue B
  • pattern is repeated until the final solution is produced
  • can probably leverage Lambda for the processing pieces
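
The chain can be sketched with in-process queues. Python's `queue.Queue` stands in for SQS here and the two stage functions are arbitrary placeholders; a real consumer would poll SQS and delete each message only after processing succeeds.

```python
import queue
from typing import Callable


def run_stage(source: queue.Queue, sink: queue.Queue, work: Callable[[str], str]) -> None:
    """Drain the source queue, process each message and publish the result."""
    while True:
        try:
            message = source.get_nowait()
        except queue.Empty:
            return
        sink.put(work(message))


queue_a: queue.Queue = queue.Queue()
queue_b: queue.Queue = queue.Queue()
queue_c: queue.Queue = queue.Queue()

for raw in ["  hello ", " world  "]:
    queue_a.put(raw)

run_stage(queue_a, queue_b, str.strip)  # stage 1: queue A -> queue B
run_stage(queue_b, queue_c, str.upper)  # stage 2: queue B -> queue C

results = [queue_c.get_nowait() for _ in range(queue_c.qsize())]
print(results)  # ['HELLO', 'WORLD']
```

Each stage only knows its input and output queue, so stages can be scaled or replaced independently.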

Priority Queue Pattern

Variation of the Queuing Chain pattern where certain queues are considered more important than others and get priority attention


  • message consumers monitor multiple queues
  • higher priority queues are always serviced first, lower priority queues when there is no higher priority work
  • requires 3rd party library to provide priorities to SQS queues
  • maybe Lambda can work here?
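
The consumer side of this is simple enough to sketch without a library: check the queues in priority order and take the first message found. Deques stand in for SQS queues and the message names are invented.

```python
from collections import deque
from typing import Deque, List, Optional


def next_message(queues_by_priority: List[Deque[str]]) -> Optional[str]:
    """Return a message from the highest-priority queue that has one."""
    for pending in queues_by_priority:
        if pending:
            return pending.popleft()
    return None


high: Deque[str] = deque(["pay-1"])
low: Deque[str] = deque(["report-1", "report-2"])

handled = [next_message([high, low]), next_message([high, low])]
high.append("pay-2")  # urgent work arrives while low-priority work is pending
handled.append(next_message([high, low]))
handled.append(next_message([high, low]))
print(handled)  # ['pay-1', 'report-1', 'pay-2', 'report-2']
```

One consequence of "always serviced first" is that a steady stream of high-priority work can starve the low-priority queue.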

Job Observer Pattern

Dynamically scale message consumers by leveraging CloudWatch


  • similar idea to Scale Out pattern
  • add a CloudWatch alarm to the priority queue
  • if the queue depth gets too deep, the auto-scaling group gets triggered and more consumers are created
  • once the queue depth gets reduced, the auto-scaling group can destroy the extra consumers
  • can be automated via Terraform
  • Lambda might obviate this pattern
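
The scaling decision boils down to arithmetic on the queue depth. The sketch below is my own simplification: in AWS the CloudWatch alarm and scaling policy make this call, and the per-consumer capacity and the bounds are assumed numbers.

```python
import math


def desired_consumers(queue_depth: int, messages_per_consumer: int,
                      minimum: int = 1, maximum: int = 10) -> int:
    """Size the consumer pool to the backlog, within the group's bounds."""
    needed = math.ceil(queue_depth / messages_per_consumer)
    return max(minimum, min(maximum, needed))


for depth in (0, 450, 5000):
    print(depth, desired_consumers(depth, messages_per_consumer=100))
```

An empty queue falls back to the minimum, a backlog of 450 asks for 5 consumers, and a backlog of 5000 is capped at the maximum of 10.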

Bootstrap Pattern

Enhancement to the Stamp pattern where customizations are applied at instance start up


  • EC2 has the notion of "user data" which are Bash scripts run at boot time
  • I've used it to start important Docker containers, such as Nomad agents, at boot time
  • can use S3 to grab the most current logic and data, such as a digital certificate or instance launch script
  • amount of work done impacts start up time
  • unclear what happens when the script fails

Cloud Dependency Injection Pattern

Variation of the Bootstrap pattern where the boot script behavior is controlled by the context of the instance being launched


  • the AMI used is general purpose
  • the EC2 instance has special meta-data applied to it (tags)
  • at boot time, the instance consults those tags to determine where in S3 it should obtain its start up script
  • allows for changes in instance configuration without having to bake a new AMI
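
The tag lookup can be sketched as a pure function from tags to an S3 location. The tag names (`Role`, `Environment`), the bucket name and the key layout are all hypothetical; a real boot script would first read the instance's tags through the AWS API or CLI and then download the object.

```python
from typing import Dict


def boot_script_location(tags: Dict[str, str], bucket: str = "example-bootstrap-bucket") -> str:
    """Map instance tags to the S3 object holding that instance's start up script."""
    role = tags.get("Role", "default")
    environment = tags.get("Environment", "dev")
    return f"s3://{bucket}/{environment}/{role}/boot.sh"


print(boot_script_location({"Role": "api", "Environment": "prod"}))
print(boot_script_location({}))
```

Changing an instance's behaviour then means changing a tag or the script in S3, with no new AMI.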

Stack Deployment Pattern

Use CloudFormation to generate your application stack from a customized template or existing catalog of templates


  • templates are JSON files
  • similar idea to Terraform
  • the entire stack can easily be destroyed with a single command
  • unclear how rich the templates can be

Monitoring Integration Pattern

Use CloudWatch to monitor your infrastructure and alarm you when things become abnormal

  • Amazon only retains 2 weeks of data so it isn't suitable for an operations database
  • Nagios, Zabbix and Cacti are other tools in this space
  • author does not offer any concrete examples but does mention security audits, QoS agreements and deep health checks as reasons to put monitoring in place

Web Storage Archive Pattern

The idea that logs are rotated out of the instance into S3


  • author says that there are hundreds of tools that help in this space
  • Loggly is one of those tools
  • S3 provides for aging of old logs to cold storage (Glacier)

Weighted Transition Pattern

When rolling out new bits, direct a small portion of the traffic to the new bits to prove them out


  • old system and new system run at the same time
  • Route53 is configured to route a small percentage of traffic to the new system
  • monitor the new system for abnormalities
  • slowly move all the traffic to the new system
  • the hard part is merging the old database into the new database
  • might be automated by Terraform
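
Weighted routing sends each target a share of traffic equal to its weight divided by the sum of the weights. A quick simulation in Python shows the effect; the weights and backend names are assumptions, and in the real pattern Route53 makes the pick at DNS resolution time rather than per request.

```python
import random
from collections import Counter
from typing import Dict


def pick_backend(weights: Dict[str, int], rng: random.Random) -> str:
    """Choose a backend with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=1)[0]


rng = random.Random(42)  # seeded so the run is repeatable
weights = {"old": 95, "new": 5}
counts = Counter(pick_backend(weights, rng) for _ in range(10_000))
print(counts)
```

Roughly 5% of the simulated requests land on the new system. Moving all the traffic over is then a matter of raising the new weight in steps while watching the monitors.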

Hybrid Backup Pattern

Not covered in depth but the idea is to back up on-premises application data to the cloud


  • use S3 for fast storage of data
  • use Glacier for cold storage
  • S3 also has a "less frequently accessed" mode
  • data can also be encrypted at rest

On-Demand NAT Pattern

For security reasons, keep your instances off the internet and in a private subnet.  Use a NAT router to temporarily enable internet access as needed.


  • place EC2 instance in a private subnet
  • bring up another instance to use as a NAT gateway
  • internal instances route public traffic through the gateway
  • turn off the NAT instance when done
  • can't get to S3, ELB and any other AWS services that require traversal via the internet
  • controlling the Internet Gateway might be an alternative


Management Network Pattern

Use dual NICs to control data meant for the internet and data meant for the corporate network


  • sometimes called a backnet or management network
  • easier to understand the flow of traffic and apply security groups
  • useful in migration because cloud-based assets can contact on-premises assets
  • requires a VPN on the management network 

Functional Firewall Pattern

Stack individual security groups to achieve firewall rules that are easier to manage


  • insecure http (80) from internet to internal network
  • secure http (443) from internet to internal network
  • MySQL (3306) from instances using either the insecure or secure http group
  • Redis (6379) from instance using either insecure or secure http group
  • reuse individual groups and stack them, keeping things DRY 
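
The stacking can be modelled as data to make the rules easier to reason about. This is a toy model, not the EC2 security group API; the group names and the `allowed` helper are assumptions. Each rule admits a port from either the internet or from members of another group.

```python
from typing import Dict, List, NamedTuple, Set


class Rule(NamedTuple):
    port: int
    source: str  # "internet" or the name of another group


GROUPS: Dict[str, List[Rule]] = {
    "http": [Rule(80, "internet")],
    "https": [Rule(443, "internet")],
    "mysql": [Rule(3306, "http"), Rule(3306, "https")],
    "redis": [Rule(6379, "http"), Rule(6379, "https")],
}


def allowed(port: int, source_labels: Set[str], target_groups: Set[str]) -> bool:
    """True if any group stacked on the target admits the port from the source."""
    return any(
        rule.port == port and rule.source in source_labels
        for group in target_groups
        for rule in GROUPS[group]
    )


web_server = {"http", "https"}  # groups stacked on the web instances
database = {"mysql"}            # group applied to the database instances

print(allowed(443, {"internet"}, web_server))  # browser to web server
print(allowed(3306, web_server, database))     # web server to MySQL
print(allowed(3306, {"internet"}, database))   # internet to MySQL
```

The first two checks pass and the third is refused, because the MySQL group only trusts members of the two HTTP groups.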

Operational Firewall Pattern

Stricter version of the Functional Firewall that restricts the incoming traffic from a specific source


  • restrict to a network
  • restrict to a specific IP
  • restrict to a particular client certificate

Web Application Firewall Pattern

Not cloud-specific but a general approach.


  • use a web application firewall to understand the context of the data being transferred, e.g. ModSecurity
  • stateful packet inspection rules are not good enough
  • firewalls live outside of the instances they are trying to protect
  • author light on specifics

AWS WAF is a web application firewall that helps protect your web applications from common web exploits that could affect application availability, compromise security, or consume excessive resources. AWS WAF gives you control over which traffic to allow or block to your web applications by defining customizable web security rules. You can use AWS WAF to create custom rules that block common attack patterns, such as SQL injection or cross-site scripting, and rules that are designed for your specific application. New rules can be deployed within minutes, letting you respond quickly to changing traffic patterns. 

Multi-Load Balancer Pattern

Variation of the Operational Firewall


  • use multiple ELBs
  • each ELB does TLS termination (new certificate manager makes it even easier to do this now)
  • each ELB has client-specific security groups applied to control inbound traffic
  • instances behind the ELBs use the simpler insecure HTTP
  • standard security group stacking is applied to ensure instances are properly restricted

Infrastructure As Code Pattern

Apply the same techniques to operational concerns that you do to production code


  • automate and store scripts in source control
  • CloudFormation to construct an entire stack
  • Packer to build AMIs, Docker containers and Droplets
  • Fugue is a new YAML based tool aimed at AWS

Temporary Development Environment Pattern

Use a virtualized environment coupled with automated AWS scripting to develop in a near-production environment

  • if you deploy in Linux/AWS, develop in Linux/AWS to avoid "gotchas"
  • Vagrant can spin up VirtualBox
  • Terraform can spin up a private, temporary AWS area
  • keep control logic under source control
  • CI environments should use something similar

Continuous Integration Pattern

Leverage AWS to run your CI builds


  • spin up build agents as needed
  • alternative is to auto-scale them during working hours
  • can keep the master node on premises
  • gives you the ability to build and test under different environments and configurations
  • unsure if we have enough Bamboo licenses to make this viable

Cloud Design Patterns for AWS

By Ronald Kurr

A summary of the patterns described in the book "Implementing Cloud Design Patterns for AWS"