CMSC389L

Week 2

AWS + S3 + CloudFront

September 8, 2017

Recap

  • Piazza
  • Codelab 1, due Sunday 11:59PM
  • New policy: up to 1 in-class worksheet may be dropped, for any reason
    • For hackathons, interviews, and university-approved excuses, provide documentation and turn in the missed worksheet the following week
  • Office Hours:
    • Tuesdays 4-5PM (AVW 4101)
    • Fridays 2-3PM (AVW 4101)

AWS Concepts

  • Availability Zones (AZs) are isolated locations within a region
    • Each AZ consists of one or more discrete, interconnected datacenters
  • Regions are geographically separate areas, each made up of 2+ AZs
    • us-east-1 is Northern Virginia
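AZ names are built from their region's code plus a one-letter suffix (e.g., us-east-1a is an AZ in us-east-1). A minimal sketch that recovers the region from an AZ name, assuming that naming pattern holds; the function name is hypothetical and this is string parsing only, not an AWS API call:

```python
import re

# Assumed pattern: "<region code>" + one lowercase letter, e.g. "us-east-1a"
AZ_PATTERN = re.compile(r"^([a-z]{2}(?:-[a-z]+)+-\d+)([a-z])$")

def region_of(az_name: str) -> str:
    """Return the region code for an AZ name such as 'us-east-1a'."""
    match = AZ_PATTERN.match(az_name)
    if match is None:
        raise ValueError(f"not an AZ name: {az_name!r}")
    return match.group(1)  # the region part, without the AZ letter

print(region_of("us-east-1a"))  # us-east-1
```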

Worksheet: AWS

Complete the first part of the worksheet, on AWS.

S3

  • Simple Storage Service (S3): Key-value store for object storage at scale
  • Object Storage: An object is any sequence of bytes (photos, videos, source code, ...)
  • Durability: designed for 99.999999999% ("eleven nines") of objects over a given year
    • Replicated across data centers and AZs
  • Features: ACLs, Metadata, Versioning, Encryption, ...
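To make "eleven nines" concrete, here is the arithmetic as a small sketch, assuming each object is lost independently with probability 1 − 0.99999999999 per year (a simplification). Exact fractions avoid floating-point rounding:

```python
from fractions import Fraction

# Designed annual durability of S3 Standard, as an exact fraction
DURABILITY = Fraction("0.99999999999")

def expected_annual_loss(num_objects: int) -> Fraction:
    """Expected number of objects lost per year, assuming independent losses."""
    return num_objects * (1 - DURABILITY)

loss = expected_annual_loss(10_000_000)
print(loss)      # 1/10000 objects per year
print(1 / loss)  # 10000, i.e. about one lost object every 10,000 years
```

With 10 million objects, that works out to one expected loss every 10,000 years.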

When to use S3?

  • S3: Think "file storage"
    • Support for large objects, up to 5TB each
    • Integration with CloudFront CDN
    • Support for archiving data (to Glacier)
    • Examples: website static content (HTML/CSS/etc.), log files
  • Databases: Think "queryable data"
    • DBMS guarantees (e.g., transactions, strong consistency)
    • Faster reads/writes
      • Index support
    • Limits on value size (e.g., 400KB per item in DynamoDB)
    • Examples: user profile data, credentials

S3: Case Study

  • Mapbox: 
    • Used by Airbnb, Strava, Washington Post, etc.
  • 100M miles of telemetry data / day
  • 250M users, 11 countries, 10 regions
  • Petabytes of map and imagery data
    • Rendered at 60FPS
    • Globally distributed with CloudFront

S3 Concepts

  • Object: Fundamental entity in S3
    • Consists of object data + metadata
    • Metadata: name-value pairs that describe the object
      • date last modified, content-type, etc.
  • Bucket: Container for objects stored in S3
    • Bucket name must be globally unique
    • Can store an unlimited number of objects
  • Key: Unique identifier for an object within a bucket
    • Each object has exactly one key; bucket + key uniquely identifies an object

 

Example: https://s3.amazonaws.com/cmsc389l/apple.png (bucket: cmsc389l, key: apple.png)
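In a path-style URL like the one above, the first path segment is the bucket and everything after it is the key. A minimal sketch that splits such a URL, assuming path-style URLs only (virtual-hosted URLs such as bucket.s3.amazonaws.com put the bucket in the hostname instead) and ignoring URL-encoded characters; the function name is hypothetical:

```python
from urllib.parse import urlparse

def parse_path_style_url(url: str) -> tuple[str, str]:
    """Split https://s3.amazonaws.com/<bucket>/<key> into (bucket, key)."""
    path = urlparse(url).path.lstrip("/")
    bucket, _, key = path.partition("/")  # keys may themselves contain "/"
    return bucket, key

print(parse_path_style_url("https://s3.amazonaws.com/cmsc389l/apple.png"))
# ('cmsc389l', 'apple.png')
```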

S3 Operations

Common Operations:

  • Create Bucket: Creates a bucket in a specified region
  • Write Object: Stores data at a given key, creating a new object or overwriting the existing one
  • Read Object: Returns an object, given a key
  • Delete Object: Deletes an object, given a key
  • List Keys: Lists all keys under a given prefix
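The five operations above can be modeled as a dictionary of dictionaries (bucket name → key → bytes). This is a hypothetical in-memory toy for intuition, not the AWS SDK: method names are made up, and it ignores regions, metadata, ACLs, and consistency:

```python
class ToyS3:
    """Hypothetical in-memory model of the common S3 operations."""

    def __init__(self):
        self.buckets = {}  # bucket name -> {key: bytes}
        self.regions = {}  # bucket name -> region (recorded, otherwise unused)

    def create_bucket(self, name: str, region: str) -> None:
        # Bucket names are global: no two buckets may share a name
        if name in self.buckets:
            raise ValueError(f"bucket {name!r} already exists")
        self.buckets[name] = {}
        self.regions[name] = region

    def put_object(self, bucket: str, key: str, data: bytes) -> None:
        # Creates the object, or overwrites it if the key already exists
        self.buckets[bucket][key] = bytes(data)

    def get_object(self, bucket: str, key: str) -> bytes:
        return self.buckets[bucket][key]  # raises KeyError if the key is missing

    def delete_object(self, bucket: str, key: str) -> None:
        self.buckets[bucket].pop(key, None)  # deleting a missing key is a no-op

    def list_keys(self, bucket: str, prefix: str = "") -> list[str]:
        return sorted(k for k in self.buckets[bucket] if k.startswith(prefix))

s3 = ToyS3()
s3.create_bucket("cmsc389l", "us-east-1")
s3.put_object("cmsc389l", "logs/a.txt", b"a")
s3.put_object("cmsc389l", "logs/b.txt", b"b")
s3.put_object("cmsc389l", "apple.png", b"\x89PNG")
print(s3.list_keys("cmsc389l", prefix="logs/"))  # ['logs/a.txt', 'logs/b.txt']
```

Listing by prefix is what lets flat keys like logs/a.txt behave like folders.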

S3 Guarantees

  • Atomic: A read of a recently updated object returns either the new version or the previous one, never partial or corrupted data
  • Eventual Consistency: After an overwrite or delete, reads may return stale data until the change has fully replicated
    • For new keys, reads have read-after-write consistency
  • No Locks: S3 does not lock objects; if two clients write the same key at once, the request with the latest timestamp wins, so clients must implement their own locking to prevent simultaneous updates
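A toy model can make these guarantees concrete. It uses replicas that receive writes late (stale reads after an overwrite), whole-value replacement (no partial reads), and last-writer-wins by timestamp (no locks). This is a hypothetical sketch, not how S3 is implemented. It does not model deletes or the read-after-write rule for new keys, and timestamps are passed in by hand:

```python
class ToyReplicatedStore:
    """Hypothetical model: writes land on replica 0 and reach the others later."""

    def __init__(self, num_replicas: int = 3):
        self.replicas = [{} for _ in range(num_replicas)]  # key -> (timestamp, data)
        self.pending = []  # writes not yet copied to replicas 1..n-1

    @staticmethod
    def _apply(replica: dict, key: str, timestamp: int, data: bytes) -> None:
        current = replica.get(key)
        # Last-writer-wins: keep whichever write carries the latest timestamp
        if current is None or timestamp > current[0]:
            replica[key] = (timestamp, data)  # whole value swapped at once

    def put(self, key: str, timestamp: int, data: bytes) -> None:
        self._apply(self.replicas[0], key, timestamp, data)
        self.pending.append((key, timestamp, data))

    def get(self, key: str, replica: int = 0):
        entry = self.replicas[replica].get(key)
        return None if entry is None else entry[1]

    def propagate(self) -> None:
        for key, timestamp, data in self.pending:
            for replica in self.replicas[1:]:
                self._apply(replica, key, timestamp, data)
        self.pending.clear()

store = ToyReplicatedStore()
store.put("apple.png", timestamp=1, data=b"v1")
store.propagate()
store.put("apple.png", timestamp=2, data=b"v2")
print(store.get("apple.png", replica=1))  # b'v1' (stale until propagation)
store.propagate()
print(store.get("apple.png", replica=1))  # b'v2'

# Two simultaneous writers: the later timestamp wins even though it arrived first
store.put("report.csv", timestamp=5, data=b"writer B")
store.put("report.csv", timestamp=4, data=b"writer A")
print(store.get("report.csv"))  # b'writer B'
```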

S3 Storage Types

  • Frequently Accessed
    • Standard (default)
      • 99.999999999% durability, 99.99% availability
      • Most expensive storage class
    • Reduced Redundancy Storage (S3-RRS)
      • Reduced durability (99.99%)
  • Infrequently Accessed
    • Standard-Infrequent Access (S3-IA)
      • Reduced availability (99.9%) + retrieval fee
    • Glacier
      • No real-time access: retrievals take minutes to hours

S3 Features

  • Versioning: If enabled, S3 stores all versions of a file
  • Bucket Policies: Use policies to specify which principals (users, accounts, services) can access a bucket or key prefix
  • Cross-Region Replication: Easily enable automatic data replication to other regions
  • Lifecycle Management: Configurable rules based on object age; supports transitioning objects to cheaper storage classes and expiring (deleting) them
  • Built-in Monitoring: via CloudWatch (GetRequests, BucketSizeBytes, 4xxErrors, FirstByteLatency, etc.)
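A lifecycle rule pairs a key filter with age thresholds. The sketch below writes one rule as a Python dict shaped like the LifecycleConfiguration argument of boto3's put_bucket_lifecycle_configuration, then evaluates it locally. Nothing is sent to AWS. The rule ID, the logs/ prefix, and the 30/90/365-day thresholds are hypothetical, and the evaluator is a simplified stand-in for S3's own rule processing:

```python
# Hypothetical rule: logs/ objects go to IA at 30 days, Glacier at 90, deleted at 365
LIFECYCLE = {
    "Rules": [
        {
            "ID": "archive-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}

def lifecycle_state(key: str, age_days: int, config: dict = LIFECYCLE) -> str:
    """Return the storage class (or 'EXPIRED') an object of this age would be in."""
    state = "STANDARD"
    for rule in config["Rules"]:
        prefix = rule.get("Filter", {}).get("Prefix", "")
        if rule["Status"] != "Enabled" or not key.startswith(prefix):
            continue
        expiration = rule.get("Expiration")
        if expiration and age_days >= expiration["Days"]:
            return "EXPIRED"
        # Apply transitions in order of age; the last one reached wins
        for transition in sorted(rule.get("Transitions", []), key=lambda t: t["Days"]):
            if age_days >= transition["Days"]:
                state = transition["StorageClass"]
    return state

print(lifecycle_state("logs/2017-09-08.log", 45))  # STANDARD_IA
```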

S3 Costs

  • Storage
    • Standard: $0.023 / GB-month
    • IA: $0.0125 / GB-month
    • Glacier: $0.004 / GB-month
  • Data Transfer
    • Into S3: Free
    • Out of S3 to:
      • us-east-1: $0.010 / GB
      • Internet: $0.090 / GB
  • Request Fees:
    • Writes: $0.05 / 10k requests
    • Reads: $0.004 / 10k requests
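Using the prices above, a monthly bill is storage + outbound transfer + request fees. A rough sketch, assuming the flat first-tier prices listed on this slide (real pricing is tiered and changes over time) and integer or Decimal inputs; the workload in the example is hypothetical:

```python
from decimal import Decimal

# Prices from this slide (2017, first tier)
STORAGE_PER_GB_MONTH = {
    "STANDARD": Decimal("0.023"),
    "IA": Decimal("0.0125"),
    "GLACIER": Decimal("0.004"),
}
INTERNET_OUT_PER_GB = Decimal("0.090")
WRITES_PER_10K = Decimal("0.05")
READS_PER_10K = Decimal("0.004")

def monthly_s3_cost(storage_gb, storage_class, internet_out_gb, writes, reads) -> Decimal:
    """Estimated monthly cost in dollars (transfer into S3 is free, so it is omitted)."""
    storage = STORAGE_PER_GB_MONTH[storage_class] * storage_gb
    transfer = INTERNET_OUT_PER_GB * internet_out_gb
    requests = WRITES_PER_10K * writes / 10_000 + READS_PER_10K * reads / 10_000
    return storage + transfer + requests

# 100 GB in Standard, 50 GB served to the Internet, 10k writes, 1M reads:
# $2.30 storage + $4.50 transfer + $0.45 requests = $7.25
cost = monthly_s3_cost(100, "STANDARD", 50, 10_000, 1_000_000)
```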

Worksheet: S3

Complete the second part of the worksheet, on S3.

CloudFront

CloudFront Concepts

  • Content Delivery Network (CDN): a globally distributed network of proxy servers that cache content close to users
    • Use cases: web streaming, static content acceleration

CDN Metrics

  • Latency: Time until the first byte reaches the client (time to first byte)
  • Data Transfer Rate: Rate at which data is transferred to the client
  • Cache Hit Ratio: The percentage of requests served from the cache
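The cache hit ratio is hits ÷ total requests. A minimal sketch, assuming a hypothetical per-request log where each entry is simply "Hit" or "Miss":

```python
def cache_hit_ratio(results: list[str]) -> float:
    """Fraction of requests served from cache; each entry is 'Hit' or 'Miss'."""
    if not results:
        return 0.0  # no requests yet, so report 0 rather than divide by zero
    hits = sum(1 for r in results if r == "Hit")
    return hits / len(results)

print(cache_hit_ratio(["Hit", "Miss", "Hit", "Hit"]))  # 0.75
```

A higher ratio means fewer trips back to the origin server, which usually lowers latency.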

CloudFront Concepts

  • Edge Locations: Proxy servers in the CloudFront network that serve clients directly
  • Regional Edge Caches: Proxy servers with larger caches that sit between the origin server and edge locations

CloudFront Concepts

  • Origin Server: The definitive store of content that CloudFront will accelerate (e.g., an S3 bucket or a web server on EC2)
  • Distribution: Configuration specifying your origin server, caching rules, etc.
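The request path through these tiers is edge location → regional edge cache → origin. The sketch below models it as nested dictionary lookups that record where each request was served from. It is a hypothetical toy: the edge names are made up, and there are no TTLs, evictions, or invalidations:

```python
class TieredCache:
    """Hypothetical model: edge location -> regional edge cache -> origin server."""

    def __init__(self, origin: dict):
        self.origin = origin  # the definitive copy of the content, e.g. an S3 bucket
        self.regional = {}    # one regional cache shared by many edges
        self.edges = {}       # edge name -> that edge's own cache
        self.log = []         # which tier served each request

    def get(self, edge: str, key: str) -> bytes:
        cache = self.edges.setdefault(edge, {})
        if key in cache:
            self.log.append("edge")
            return cache[key]
        if key in self.regional:
            self.log.append("regional")
        else:
            self.log.append("origin")
            self.regional[key] = self.origin[key]  # fill the regional cache
        cache[key] = self.regional[key]            # fill the edge cache
        return cache[key]

cdn = TieredCache({"index.html": b"<h1>CMSC389L</h1>"})
cdn.get("edge-a", "index.html")  # first request anywhere: fetched from origin
cdn.get("edge-a", "index.html")  # same edge again: edge hit
cdn.get("edge-b", "index.html")  # new edge: served by the regional cache
print(cdn.log)  # ['origin', 'edge', 'regional']
```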

CloudFront Costs

  • Transfer from CloudFront to Internet: $0.085 / GB
    • Varies by region and monthly volume
      • Drops as low as $0.020 / GB at the highest volume tiers
  • Transfer from AWS (S3, etc.) to CloudFront: Free
  • Per-request fee: $0.01 / 10k requests
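With the figures above, a CloudFront bill is outbound transfer plus request fees. Origin fetches from S3 to CloudFront are free, so they do not appear here. A rough sketch, assuming the flat first-tier prices on this slide and a hypothetical workload:

```python
from decimal import Decimal

# Prices from this slide (first tier; actual prices vary by region and volume)
TRANSFER_OUT_PER_GB = Decimal("0.085")
PER_10K_REQUESTS = Decimal("0.01")

def monthly_cloudfront_cost(gb_out: int, requests: int) -> Decimal:
    """Estimated monthly cost in dollars for edge-to-Internet transfer plus requests."""
    return TRANSFER_OUT_PER_GB * gb_out + PER_10K_REQUESTS * requests / 10_000

# 50 GB served, 1M requests: $4.25 transfer + $1.00 requests = $5.25
cost = monthly_cloudfront_cost(50, 1_000_000)
```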

Worksheet: CloudFront

Complete the final part of the worksheet, on CloudFront.

Closing Notes

  • Codelab 1 -- due Sunday
  • Codelab 2 (CloudFront + S3) -- due next Friday
  • Project 1 -- will be released after the next class
  • Join Piazza
  • Turn in your worksheets

CMSC389L Week 2

By Colin King
