CMSC389L
Week 2
AWS + S3 + CloudFront
September 8, 2017
Recap
- Piazza
- Codelab 1, due Sunday 11:59PM
- New Policy: Up to 1 dropped in-class worksheet, for any reason
- For hackathons, interviews, and university-approved excuses, provide documentation and turn the worksheet in the following week
- Office Hours:
- Tuesdays 4-5PM (AVW 4101)
- Fridays 2-3PM (AVW 4101)
AWS Concepts
- Availability Zones (AZs) are individual isolated locations
  - Each AZ consists of multiple interconnected datacenters
- Regions are geographically distributed groups of 2+ AZs
  - us-east-1 is Northern Virginia
Worksheet: AWS
Complete the first part of the worksheet, on AWS.
S3
- Simple Storage Service (S3): Key-value store for object storage at scale
- Object Storage: Any sequence of bytes (photos, videos, source code, ...)
- Durability: 99.999999999%
  - Replicated across data centers and AZs
- Features: ACLs, Metadata, Versioning, Encryption, ...
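
To put the durability figure in perspective, here is a minimal arithmetic sketch. It assumes the figure is an annual, per-object survival probability, and the 10 million object workload is a hypothetical example rather than a number from the slides.

```python
from fractions import Fraction

# Durability figure from the slide (eleven nines), kept exact as a fraction.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected objects lost per year, assuming durability is a yearly
    per-object survival probability (an assumption of this sketch)."""
    return object_count * (1 - DURABILITY)


# Hypothetical workload: 10 million stored objects.
losses = expected_annual_losses(10_000_000)
years_per_loss = 1 / losses

print(f"Expected losses per year: {float(losses)}")
print(f"Roughly one lost object every {int(years_per_loss)} years")
```

Under these assumptions, storing 10 million objects works out to an expected loss of one object every 10,000 years.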
When to use S3?
- S3: Think "file storage"
  - Support for large files, up to 5 TB
  - Integration with CloudFront CDN
  - Support for archiving data (to Glacier)
  - Examples: website static content (HTML/CSS/etc.), log files
- Databases: Think "queryable data"
  - DBMS guarantees
  - Faster reads/writes
  - Index support
  - Limits on value size (e.g., 400 KB per item for DynamoDB)
  - Examples: user profile data, credentials
S3: Case Study
- Mapbox:
- Used by Airbnb, Strava, Washington Post, etc.
- 100M miles of telemetry data / day
- 250M users, 11 countries, 10 regions
- Petabytes of map and imagery data
- Rendered at 60FPS
- Globally distributed with CloudFront
S3 Concepts
- Object: Fundamental entity in S3
  - Consists of object data + metadata
- Metadata: Name-value pairs that describe the object
  - Date last modified, content-type, etc.
- Bucket: Container for objects stored in S3
  - Bucket name must be globally unique
  - Can store unlimited objects
- Key: Unique identifier for an object within a bucket
  - Within a bucket, there is a 1-1 relationship between keys and objects
S3 Operations
Common Operations:
- Create Bucket: Creates a bucket in a specified region
- Write Object: Stores data at a given key, either by creating or overwriting
- Read Object: Returns an object, given a key
- Delete Object: Deletes an object, given a key
- List Keys: Lists all keys under a given prefix
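
The concepts and operations above can be sketched with a toy in-memory model. This is not the AWS SDK: `ToyBucket`, `S3Object`, and every method name are hypothetical, chosen only to mirror the operation names on this slide.

```python
from dataclasses import dataclass, field


@dataclass
class S3Object:
    """An object is its data plus name-value metadata."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class ToyBucket:
    """Hypothetical in-memory stand-in for a bucket (not a real AWS API)."""

    def __init__(self, name: str, region: str):
        self.name = name          # would need to be globally unique in S3
        self.region = region
        self._objects = {}        # key -> S3Object, one object per key

    def write_object(self, key: str, data: bytes, **metadata) -> None:
        # Creates the object, or overwrites whatever was stored at this key.
        self._objects[key] = S3Object(data, dict(metadata))

    def read_object(self, key: str) -> S3Object:
        return self._objects[key]

    def delete_object(self, key: str) -> None:
        self._objects.pop(key, None)

    def list_keys(self, prefix: str = "") -> list:
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = ToyBucket("cmsc389l-example", region="us-east-1")
bucket.write_object("logs/2017-09-08.txt", b"first", content_type="text/plain")
bucket.write_object("logs/2017-09-08.txt", b"second", content_type="text/plain")
bucket.write_object("site/index.html", b"<h1>hi</h1>", content_type="text/html")

log_keys = bucket.list_keys("logs/")      # only keys under the prefix
bucket.delete_object("site/index.html")
```

Note that keys are flat strings: the "logs/" prefix behaves like a folder only because List Keys filters on it.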
S3 Guarantees
- Atomic: A read of a recently updated object returns either the updated object or the previous one, never partial or corrupted data.
- Eventual Consistency: Updates and deletes may not be visible to reads until they are fully replicated.
  - For new keys, reads have read-after-write consistency
- No Locks: Clients are responsible for implementing locks to prevent simultaneous updates
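
Why the lack of locks matters can be seen in a small sketch of a lost update. The dictionary below is a hypothetical stand-in for a bucket, and the two "clients" are simulated by interleaving their steps by hand.

```python
# Toy stand-in for a bucket holding a single counter object.
bucket = {"counter.txt": "10"}


def read(key: str) -> str:
    return bucket[key]


def write(key: str, value: str) -> None:
    bucket[key] = value  # last writer wins, nothing stops a second writer


# Two clients each want to increment the counter.
# Both read before either one writes, so both see the same value.
seen_by_a = int(read("counter.txt"))
seen_by_b = int(read("counter.txt"))

write("counter.txt", str(seen_by_a + 1))
write("counter.txt", str(seen_by_b + 1))  # silently overwrites A's update

final_value = read("counter.txt")
print(final_value)  # "11": one of the two increments was lost
```

Avoiding this requires coordination outside the store, for example a lock that both clients agree to acquire before the read-modify-write.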
S3 Storage Types
- Frequently Accessed
  - Standard (default)
    - 99.999999999% durability, 99.99% availability
    - Most expensive storage class
  - Reduced Redundancy Storage (S3-RRS)
    - Reduced durability (99.99%)
- Infrequently Accessed
  - Infrequently Accessed (S3-IA)
    - Reduced availability (99.9%) + retrieval fee
  - Glacier
    - No real-time access; retrieval takes minutes to hours
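
The availability percentages are easier to compare as allowed downtime. A minimal sketch, assuming availability is measured over a 365-day year (the measurement window is an assumption, not something stated on the slide):

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def max_downtime_hours(availability_percent: float) -> float:
    """Hours per year a service may be unavailable at this availability."""
    return HOURS_PER_YEAR * (100 - availability_percent) / 100


standard = max_downtime_hours(99.99)    # Standard
infrequent = max_downtime_hours(99.9)   # S3-IA

print(f"Standard: about {standard * 60:.0f} minutes per year")
print(f"S3-IA:    about {infrequent:.1f} hours per year")
```

Dropping one nine, from 99.99% to 99.9%, allows ten times as much downtime: under an hour per year versus most of a working day.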
S3 Features
- Versioning: If enabled, S3 stores all versions of a file
- Bucket Policies: Use policies to specify which services can access a bucket or key prefix
- Cross-Region Replication: Easily enable automatic data replication to other regions
- Lifecycle Management: Configurable rules based on object age. Supports transitioning and expiration
- Built-in Monitoring: via CloudWatch (GetRequests, BucketSizeBytes, 4xxErrors, FirstByteLatency, etc.)
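
Lifecycle rules can be thought of as a function from object age to storage class. The sketch below illustrates the idea only: the day thresholds are hypothetical, and real rules are configured on the bucket rather than written as code.

```python
from typing import Optional

# Hypothetical thresholds, in days since the object was created.
TRANSITION_TO_IA_DAYS = 30
TRANSITION_TO_GLACIER_DAYS = 90
EXPIRE_DAYS = 365


def storage_class_for_age(age_days: int) -> Optional[str]:
    """Return the storage class for an object of this age, or None once expired."""
    if age_days >= EXPIRE_DAYS:
        return None               # expiration: the object is deleted
    if age_days >= TRANSITION_TO_GLACIER_DAYS:
        return "GLACIER"          # transition to archival storage
    if age_days >= TRANSITION_TO_IA_DAYS:
        return "STANDARD_IA"      # transition to infrequent access
    return "STANDARD"


classes = [storage_class_for_age(days) for days in (0, 45, 120, 400)]
print(classes)
```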
S3 Costs
- Storage
  - Standard: $0.023 / GB-month
  - IA: $0.0125 / GB-month
  - Glacier: $0.004 / GB-month
- Data Transfer
  - Into S3: Free
  - Out of S3 to:
    - us-east-1: $0.010 / GB
    - Internet: $0.090 / GB
- Request Fee:
  - Writes: $0.05 / 10k requests
  - Reads: $0.004 / 10k requests
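
A rough monthly bill can be estimated from these prices. This is a minimal sketch using the rates on this slide; the workload (100 GB stored, 50 GB served to the Internet, 100k writes, 1M reads) is a hypothetical example, and retrieval fees and pricing tiers are ignored.

```python
from decimal import Decimal

# Prices from the slide (USD).
STORAGE_PER_GB_MONTH = {
    "standard": Decimal("0.023"),
    "ia": Decimal("0.0125"),
    "glacier": Decimal("0.004"),
}
TRANSFER_OUT_PER_GB = Decimal("0.090")   # out of S3 to the Internet
WRITE_PER_10K = Decimal("0.05")
READ_PER_10K = Decimal("0.004")


def monthly_cost(stored_gb, storage_class, internet_gb_out, writes, reads):
    """Estimate one month's S3 bill: storage + transfer out + request fees."""
    storage = Decimal(stored_gb) * STORAGE_PER_GB_MONTH[storage_class]
    transfer = Decimal(internet_gb_out) * TRANSFER_OUT_PER_GB
    requests = (Decimal(writes) / 10_000) * WRITE_PER_10K \
        + (Decimal(reads) / 10_000) * READ_PER_10K
    return storage + transfer + requests


# Hypothetical workload: $2.30 storage + $4.50 transfer + $0.90 requests.
total = monthly_cost(100, "standard", 50, 100_000, 1_000_000)
print(f"${total:.2f}")
```

For this workload, data transfer costs more than storage, which is typical for content that is read far more often than it is written.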
Worksheet: S3
Complete the second part of the worksheet, on S3.
CloudFront
CloudFront Concepts
- Content Delivery Network (CDN): A globally distributed network of proxy servers that cache content
  - Use cases: web streaming, static content acceleration
CDN Metrics
- Latency: Time taken until the first byte is downloaded
- Data Transfer Rates: Rate at which data is transferred to the client
- Cache Hit Ratio: The percentage of requests that can be served from the cache
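
Cache hit ratio can be computed by replaying a request trace against a cache. A minimal sketch, assuming an unbounded cache that never expires or evicts entries (real edge caches do both); the request paths are hypothetical.

```python
def cache_hit_ratio(requests):
    """Fraction of requests served from cache, for an unbounded cache."""
    cache = set()
    hits = 0
    for path in requests:
        if path in cache:
            hits += 1        # served from the cache
        else:
            cache.add(path)  # miss: fetch from the origin, then cache it
    return hits / len(requests) if requests else 0.0


# Hypothetical trace: three distinct files, six requests.
trace = ["/a.css", "/b.js", "/a.css", "/c.png", "/a.css", "/b.js"]
ratio = cache_hit_ratio(trace)
print(f"{ratio:.0%}")
```

The first request for each file is always a miss, so the ratio improves as the same content is requested more often.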
CloudFront Concepts
- Edge Locations: Proxy servers in the CloudFront network
- Regional Edge Caches: Proxy servers that sit between the origin server and the edge locations
- Origin Server: The definitive store of content that CloudFront will accelerate (e.g., S3 or an EC2 web server)
- Distribution: Configuration specifying your origin server, cache invalidation rules, etc.
CloudFront Costs
- Transfer from CloudFront to Internet: $0.085 / GB
  - Drops to $0.020 / GB
  - Varies by region and throughput
- Transfer from AWS (S3, etc.) to CloudFront: Free
- Per-request fee: $0.01 / 10k requests
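
To compare serving content directly from S3 with serving it through CloudFront, here is a rough sketch using the first-tier prices from these slides. The workload (1,000 GB and 1M requests per month) is hypothetical, and the S3 request fees that CloudFront incurs on cache misses are ignored.

```python
from decimal import Decimal


def s3_direct_cost(gb_out, requests):
    """Serve straight from S3: Internet transfer plus S3 read request fees."""
    return Decimal(gb_out) * Decimal("0.090") \
        + Decimal(requests) / 10_000 * Decimal("0.004")


def cloudfront_cost(gb_out, requests, per_gb=Decimal("0.085")):
    """Serve via CloudFront: transfer from S3 to CloudFront is free."""
    return Decimal(gb_out) * per_gb \
        + Decimal(requests) / 10_000 * Decimal("0.01")


# Hypothetical month: 1,000 GB delivered over 1M requests.
direct = s3_direct_cost(1000, 1_000_000)     # 90.00 + 0.40
via_cdn = cloudfront_cost(1000, 1_000_000)   # 85.00 + 1.00
print(f"S3 direct: ${direct:.2f}, CloudFront: ${via_cdn:.2f}")
```

CloudFront's per-request fee is higher, so the comparison depends on object size: many tiny objects favor S3 direct, while larger objects favor CloudFront.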
Worksheet: CloudFront
Complete the final part of the worksheet, on CloudFront.
Closing Notes
- Codelab 1 -- due Sunday
- Codelab 2 (CloudFront + S3) -- due next Friday
- Project 1 -- will be released after the next class
- Join Piazza
- Turn in your worksheets
CMSC389L Week 2
By Colin King