CMSC389L

Week 11

Databases + DynamoDB

Friday, April 13, 2018

Databases

Why Use a Database?

  • Use one when you need to persist data and query it later

 

  • Usually small + frequently accessed data
    • Game State
    • Login Credentials
    • A/B Test Configurations
    • Product Catalogs
    • Queryable File Metadata
    • Messages
    • ...

AWS Databases

DynamoDB

DynamoDB Overview

"Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale. It is a fully managed cloud database and supports both document and key-value store models."

DynamoDB Today

  • Basics
    • Tables + Items
    • Attributes
      • Primary Keys
    • Indexing
      • Querying
      • Secondary Indexes
    • API
    • Provisioning Capacity
      • Auto Scaling
  • Advanced
    • Partitioning
    • Read Consistency
    • DAX
    • WIF
    • Streams
      • Lambda Triggers
    • Best Practices

Basics

  • Let's create a music database
  • It should store a list of all songs
    • w/ song metadata, ratings, etc.

 

  • List of songs -> Table
  • Song -> Item
  • Song metadata -> Attributes

Attributes

  • Attributes are the key:value pairs in an item.
  • Values: 3 Types
    • Scalar: Number, String, Binary, Boolean, Null
      • Number: Inventory Quantity
      • String: Name or Timestamp
      • Binary: Encrypted Text Messages
      • Boolean: Released album or not
    • Nested: JSON (lists, maps)
      • Example: Address
    • Set: Number/String/Binary Sets
      • Example: Set of artists in a band
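
As a rough illustration of the three value categories, the sketch below maps plain Python values to DynamoDB's type descriptors (S, N, B, BOOL, NULL, L, M, SS, NS, BS). The `dynamodb_type` helper and the sample `song` item are hypothetical names made up for this example; they are not part of any AWS SDK.

```python
from decimal import Decimal


def dynamodb_type(value):
    """Return the DynamoDB type descriptor for a Python value (illustrative helper)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOL"
    if isinstance(value, (int, Decimal)):
        return "N"
    if isinstance(value, str):
        return "S"
    if isinstance(value, bytes):
        return "B"
    if isinstance(value, list):
        return "L"  # nested: list
    if isinstance(value, dict):
        return "M"  # nested: map
    if isinstance(value, (set, frozenset)) and value:
        member_types = {dynamodb_type(member) for member in value}
        if member_types in ({"S"}, {"N"}, {"B"}):
            return member_types.pop() + "S"  # SS, NS, or BS
    raise TypeError(f"unsupported attribute value: {value!r}")


# Hypothetical item for the music table.
song = {
    "Artist": "The Beatles",                            # scalar: String
    "SongTitle": "Yesterday",                           # scalar: String
    "Plays": 1200,                                      # scalar: Number
    "Released": True,                                   # scalar: Boolean
    "Remaster": None,                                   # scalar: Null
    "Cover": b"\x89PNG",                                # scalar: Binary
    "Label": {"Name": "Parlophone", "Country": "UK"},   # nested: map
    "Ratings": [5, 4, 5],                               # nested: list
    "Writers": {"Lennon", "McCartney"},                 # set of strings
}
types = {name: dynamodb_type(value) for name, value in song.items()}
```

Sets must be non-empty and hold a single scalar type, which is why the helper rejects empty or mixed sets.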

Attributes (cont.)

  • Schema-less!
    • Easy to add/remove attributes
    • No need for complex schema migrations

 

  • One exception: primary keys

Primary Key

  • Primary Key: An attribute that uniquely identifies an item in a table.
    • An identifier (e.g., Alexa device ID)
    • Or date (daily sales metrics, etc.)
    • etc.
  • Two types of primary key:
    • Simple Primary Key: one attribute (partition key)
    • Composite Primary Key: two attributes (partition key, sort key)
  • Key attributes must be scalar (String, Number, or Binary)!
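
A minimal sketch of how a key schema might be declared, assuming a hypothetical `Music` table keyed by `Artist` and `SongTitle`. The `table_definition` helper is invented for this example; the dictionary it returns follows the shape of DynamoDB's CreateTable request, where the partition key has key type `HASH` and the sort key has key type `RANGE`.

```python
def table_definition(table_name, partition_key, sort_key=None, rcu=5, wcu=5):
    """Build CreateTable-style arguments.

    partition_key and sort_key are (attribute_name, attribute_type) pairs,
    where attribute_type is "S" (String), "N" (Number), or "B" (Binary).
    """
    keys = [(partition_key, "HASH")]
    if sort_key is not None:
        keys.append((sort_key, "RANGE"))
    for (name, attr_type), _ in keys:
        if attr_type not in ("S", "N", "B"):
            raise ValueError(f"key attribute {name!r} must be scalar (S, N, or B)")
    return {
        "TableName": table_name,
        "KeySchema": [
            {"AttributeName": name, "KeyType": key_type}
            for (name, _), key_type in keys
        ],
        # Only key attributes are declared up front; everything else is schema-less.
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": attr_type}
            for (name, attr_type), _ in keys
        ],
        "ProvisionedThroughput": {"ReadCapacityUnits": rcu, "WriteCapacityUnits": wcu},
    }


music = table_definition("Music", ("Artist", "S"), ("SongTitle", "S"))
```

With boto3 installed and credentials configured, a dictionary like `music` could be passed along as `boto3.client("dynamodb").create_table(**music)`.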

Partition Key

  • Partition Key: Fed into an internal hash function that decides which partition stores each item.
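
The idea can be sketched as follows. DynamoDB's real hash function and partition count are internal and not exposed, so SHA-256 and the fixed `num_partitions` below are stand-in assumptions for illustration only.

```python
import hashlib


def partition_for(partition_key_value, num_partitions):
    """Map a partition key value to a partition index (illustrative hash, not DynamoDB's)."""
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


artists = ["The Beatles", "Queen", "Daft Punk", "Nina Simone"]
placement = {artist: partition_for(artist, 4) for artist in artists}
```

Every item with the same partition key value lands on the same partition, which is why a key with few distinct values concentrates traffic on few partitions.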

Composite Primary Key

  • Composite Primary Key: Adds a "sort key" which orders the items within a partition.
    • The partition key alone does not have to be unique; the (partition key, sort key) pair does.
    • Important for range queries on the sort key (where inventory_count > 0)
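
To see why a sort key makes range queries cheap, here is a small in-memory model of the items that share one partition key, kept ordered by sort key. The `Partition` class is a teaching sketch, not a description of DynamoDB's storage engine.

```python
import bisect


class Partition:
    """Items sharing one partition key, kept ordered by sort key."""

    def __init__(self):
        self._sort_keys = []
        self._items = []

    def put(self, sort_key, item):
        i = bisect.bisect_left(self._sort_keys, sort_key)
        if i < len(self._sort_keys) and self._sort_keys[i] == sort_key:
            self._items[i] = item  # same full primary key: overwrite
        else:
            self._sort_keys.insert(i, sort_key)
            self._items.insert(i, item)

    def range(self, low, high):
        """Return items with low <= sort key <= high, in sort-key order."""
        start = bisect.bisect_left(self._sort_keys, low)
        end = bisect.bisect_right(self._sort_keys, high)
        return self._items[start:end]


beatles = Partition()
for title in ["Yesterday", "Help!", "Let It Be"]:
    beatles.put(title, {"Artist": "The Beatles", "SongTitle": title})

h_to_m = [item["SongTitle"] for item in beatles.range("H", "M")]
```

Because the items are already ordered, a range lookup is two binary searches plus a slice rather than a scan over every item.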

Querying

  • What if you want all songs by an artist?
    • Query by partition key
  • What if you want a specific song?
    • Query by partition key, sort key
  • What if you want all songs in an album?
    • Can't do this with a Query: album is not part of the primary key!
    • Need a secondary index.
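
The three questions above could be expressed as request parameters roughly like this. The sketch assumes a hypothetical `Music` table with partition key `Artist` and sort key `SongTitle`, plus a non-key `AlbumTitle` attribute; the dictionaries follow the parameter shape of the low-level Query and Scan operations, but nothing here calls AWS.

```python
TABLE = "Music"  # hypothetical table name


def songs_by_artist(artist):
    """Query parameters: every item that shares one partition key."""
    return {
        "TableName": TABLE,
        "KeyConditionExpression": "Artist = :artist",
        "ExpressionAttributeValues": {":artist": {"S": artist}},
    }


def specific_song(artist, title):
    """Query parameters: partition key plus an exact sort key."""
    return {
        "TableName": TABLE,
        "KeyConditionExpression": "Artist = :artist AND SongTitle = :title",
        "ExpressionAttributeValues": {
            ":artist": {"S": artist},
            ":title": {"S": title},
        },
    }


def songs_in_album_by_scan(album):
    """Scan parameters: album is not a key, so without an index every item is read
    and the filter is applied afterwards."""
    return {
        "TableName": TABLE,
        "FilterExpression": "AlbumTitle = :album",
        "ExpressionAttributeValues": {":album": {"S": album}},
    }


by_artist = songs_by_artist("The Beatles")
one_song = specific_song("The Beatles", "Yesterday")
by_album = songs_in_album_by_scan("Help!")
```

The first two dictionaries could be passed to a client's `query` call and the third to `scan`; the scan still consumes read capacity for every item it examines, which is the motivation for an index.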

Secondary Indexes

  • Secondary Index: A copy of your table (all or a subset of its attributes) with a different partition key and/or sort key.
    • Queryable by the new partition/sort key
    • Automatically maintained for you
  • Two Types: Global and Local
    • Local: Different sort key, same partition key
      • Shares resources with your base table
    • Global: Different partition key [and optionally different sort key]
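
Conceptually, an index is the same items regrouped under new keys. The `build_index` helper below is a hypothetical in-memory sketch of that idea, re-keying songs by album so that "all songs in an album" becomes a single lookup; DynamoDB maintains real indexes for you on every write.

```python
from collections import defaultdict


def build_index(items, partition_key, sort_key):
    """Regroup items by a new partition key, ordered by a new sort key."""
    index = defaultdict(list)
    for item in items:
        # Items missing either index key attribute are simply left out of the index.
        if partition_key in item and sort_key in item:
            index[item[partition_key]].append(item)
    for bucket in index.values():
        bucket.sort(key=lambda entry: entry[sort_key])
    return dict(index)


songs = [
    {"Artist": "The Beatles", "SongTitle": "Yesterday", "AlbumTitle": "Help!"},
    {"Artist": "The Beatles", "SongTitle": "Ticket to Ride", "AlbumTitle": "Help!"},
    {"Artist": "The Beatles", "SongTitle": "Let It Be", "AlbumTitle": "Let It Be"},
    {"Artist": "Queen", "SongTitle": "Unreleased Demo"},  # no album attribute
]

by_album = build_index(songs, "AlbumTitle", "SongTitle")
help_titles = [song["SongTitle"] for song in by_album["Help!"]]
```

This one uses a different partition key than the base table, so it corresponds to a global secondary index.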

API

  • Barebones CRUD API
    • PutItem: Create a new item in a table.
      • Must specify primary key
    • GetItem: Fetch a single item from a table by primary key.
    • UpdateItem: Modify attributes in an item.
    • DeleteItem: Remove a single item.
  • Batch methods:
    • BatchGetItem, BatchWriteItem, ...
  • Other:
    • Query: Return all items with a given partition key, optionally narrowed by a sort key condition
    • Scan: Read all items in a table/index
      • Can apply an attribute filter
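
The semantics of these operations can be sketched with a toy in-memory table. `TinyTable` is a hypothetical class written for this example, and its method names only mirror the API operations listed above; it is not an SDK client.

```python
class TinyTable:
    """Toy table with a composite primary key (partition key, sort key)."""

    def __init__(self, partition_key, sort_key):
        self.key_names = (partition_key, sort_key)
        self.items = {}

    def _key(self, source):
        return tuple(source[name] for name in self.key_names)

    def put_item(self, item):
        self.items[self._key(item)] = dict(item)  # replaces any existing item wholesale

    def get_item(self, key):
        item = self.items.get(self._key(key))
        return dict(item) if item is not None else None

    def update_item(self, key, changes):
        # Modifies only the named attributes; creates the item if it is missing.
        item = self.items.setdefault(self._key(key), dict(key))
        item.update(changes)

    def delete_item(self, key):
        self.items.pop(self._key(key), None)

    def query(self, partition_value, sort_condition=lambda sort_value: True):
        matches = [
            dict(item)
            for (pk, sk), item in self.items.items()
            if pk == partition_value and sort_condition(sk)
        ]
        return sorted(matches, key=lambda item: item[self.key_names[1]])

    def scan(self, attribute_filter=lambda item: True):
        return [dict(item) for item in self.items.values() if attribute_filter(item)]


table = TinyTable("Artist", "SongTitle")
table.put_item({"Artist": "Queen", "SongTitle": "Bohemian Rhapsody", "Plays": 10})
table.put_item({"Artist": "Queen", "SongTitle": "Another One Bites the Dust", "Plays": 4})
table.update_item({"Artist": "Queen", "SongTitle": "Bohemian Rhapsody"}, {"Plays": 11})
```

Note the difference between `put_item`, which replaces the whole item, and `update_item`, which touches only the attributes it names.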

Provisioning Capacity

  • DynamoDB requires that you provision capacity
    • Read Capacity Units (RCUs)
      • 1 RCU = 2 eventually consistent reads (or 1 strongly consistent read) of up to 4KB / second
    • Write Capacity Units (WCUs)
      • 1 WCU = 1 write of up to 1KB / second
    • Must provision for table AND indexes
  • DynamoDB applies throttling to requests
    • The AWS SDKs retry throttled requests automatically (with exponential backoff)
  • Burst Capacity available
    • Unused capacity from the previous 5m will be applied to bursty traffic
    • Not reliable!
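
A worked example of the capacity arithmetic, as a sketch. It assumes item sizes round up to the next 4KB (reads) or 1KB (writes), and that an eventually consistent read costs half of a strongly consistent one; the function names are made up for this example.

```python
def _ceil_div(numerator, denominator):
    """Integer ceiling division."""
    return -(-numerator // denominator)


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=False):
    """RCUs needed to sustain the given read rate."""
    units_per_read = _ceil_div(item_size_bytes, 4096)  # round up to 4KB blocks
    total = units_per_read * reads_per_second
    return total if strongly_consistent else _ceil_div(total, 2)


def write_capacity_units(item_size_bytes, writes_per_second):
    """WCUs needed to sustain the given write rate."""
    return _ceil_div(item_size_bytes, 1024) * writes_per_second  # round up to 1KB blocks


# 6,000-byte items read 10 times per second, 2,500-byte items written 5 times per second.
eventual_rcus = read_capacity_units(6000, 10)
strong_rcus = read_capacity_units(6000, 10, strongly_consistent=True)
wcus = write_capacity_units(2500, 5)
```

Here a 6,000-byte item spans two 4KB blocks, so 10 reads per second need 20 RCUs when strongly consistent and 10 when eventually consistent; a 2,500-byte item spans three 1KB blocks, so 5 writes per second need 15 WCUs.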

Scaling Capacity

  • Manually
    • Increase as many times as you need
    • Decrease up to 9 times / day
      • 4 / day + 1 per 4 hours
    • Often inefficient!

Scaling Capacity (cont.)

  • Automatically
    • Just set auto-scaling range + target utilization
    • Good for gradually changing traffic; scaling takes minutes and decreases are rate-limited, so sudden bursts can still be throttled
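
The range and target utilization interact roughly as in the sketch below. This is a simplified model written for illustration: the real auto scaling service reacts to sustained CloudWatch metrics rather than a single reading, and `target_capacity` is a hypothetical name.

```python
def target_capacity(consumed_units, target_utilization_pct, min_units, max_units):
    """Provisioned capacity that would put consumption at the target utilization,
    clamped to the configured auto-scaling range."""
    desired = -(-consumed_units * 100 // target_utilization_pct)  # ceiling division
    return max(min_units, min(max_units, desired))


steady = target_capacity(70, 70, min_units=5, max_units=500)   # 70 consumed at a 70% target
quiet = target_capacity(1, 70, min_units=5, max_units=500)     # clamped up to the minimum
spike = target_capacity(1000, 70, min_units=5, max_units=500)  # clamped down to the maximum
```

The last case shows the limit mentioned above: once demand exceeds the configured maximum, requests are throttled no matter what the target is.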

Partitioning

  • Data is stored in partitions
  • Each partition consists of multiple physical servers
    • Effectively replicates across AZs
    • Each server has a 10GB SSD
  • Occasionally requires partition splitting
    • Done in the background -- with no downtime!

Consistency

  • Eventual Consistency
    • Reads may not reflect most up-to-date result
      • Recent write operations may not have fully propagated
    • Asks a random server in the correct partition
    • Usually consistent within <1s
  • Strong Consistency
    • Reads reflect all prior successful write operations
    • Checks the majority of servers in the correct partition
    • Uses 2x the RCUs
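
The difference between the two read modes can be sketched with a toy replicated partition. The model follows the description above (one random server versus a majority of servers); it is a teaching aid with made-up names, not a description of DynamoDB's internal replication protocol.

```python
class ReplicatedPartition:
    """Toy partition with several replicas, each storing (version, value) per key."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.version = 0

    def write(self, key, value, lagging=()):
        """Apply a write to every replica except the lagging ones (a majority must accept)."""
        accepted = [r for i, r in enumerate(self.replicas) if i not in lagging]
        if len(accepted) <= len(self.replicas) // 2:
            raise RuntimeError("write needs a majority of replicas")
        self.version += 1
        for replica in accepted:
            replica[key] = (self.version, value)

    def propagate(self):
        """Bring every replica up to date (the 'eventual' in eventual consistency)."""
        for key in {k for replica in self.replicas for k in replica}:
            newest = max((r[key] for r in self.replicas if key in r), key=lambda e: e[0])
            for replica in self.replicas:
                replica[key] = newest

    def eventual_read(self, key, replica_index):
        """Ask a single replica; it may not have seen the latest write yet."""
        entry = self.replicas[replica_index].get(key)
        return entry[1] if entry else None

    def strong_read(self, key, replica_indexes):
        """Ask a majority and keep the newest version; costs more work per read."""
        if len(replica_indexes) <= len(self.replicas) // 2:
            raise ValueError("strong read needs a majority of replicas")
        entries = [self.replicas[i].get(key) for i in replica_indexes]
        newest = max((e for e in entries if e), key=lambda e: e[0], default=None)
        return newest[1] if newest else None


partition = ReplicatedPartition()
partition.write("plays", 1)
partition.write("plays", 2, lagging=(2,))          # replica 2 has not caught up yet
stale = partition.eventual_read("plays", 2)        # may still see the old value
fresh = partition.strong_read("plays", (1, 2))     # a majority always includes the new value
```

Any majority of readers overlaps any majority of writers, which is why the strong read always sees the latest successful write in this model.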

DAX

  • Latency
    • DynamoDB: single-digit milliseconds
    • DynamoDB Accelerator (DAX): microseconds
      • 10x latency improvement
  • DAX: in-memory cache
    • Good for:
      • Real-time bidding
      • Multiplayer Gaming
      • Stock Trading
      • ...
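
The caching idea behind DAX can be sketched as a read-through cache in front of a slower store. `ReadThroughCache` and the `slow_lookup` function are hypothetical stand-ins; DAX itself is a managed service that sits in front of DynamoDB and is reached through its own client.

```python
class ReadThroughCache:
    """Serve repeated reads from memory; fall through to the backing store on a miss."""

    def __init__(self, fetch):
        self._fetch = fetch  # function that reads from the backing store
        self._cache = {}
        self.misses = 0

    def get(self, key):
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]

    def invalidate(self, key):
        """Drop a cached entry, for example after the underlying item changes."""
        self._cache.pop(key, None)


database = {"bid:42": 17}


def slow_lookup(key):
    """Stand-in for a table read that takes milliseconds."""
    return database[key]


cache = ReadThroughCache(slow_lookup)
first = cache.get("bid:42")   # miss: goes to the backing store
second = cache.get("bid:42")  # hit: served from memory
```

The trade-off is staleness: a cached value can lag behind the table until it is invalidated or expires.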

WIF

  • Authentication
    • Want to allow a client to directly access your database?
      • Dangerous!
    • Instead, restrict access via IAM roles
  • Web Identity Federation (WIF)
    • Social login (Amazon, Facebook, Google)
    • Provides temporary Amazon credentials to access DB from client
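
One way the IAM restriction might look, as a sketch: a policy attached to the role that federated users assume, limiting each user to items whose partition key equals their own Login with Amazon user ID. The `dynamodb:LeadingKeys` condition key and the `${www.amazon.com:user_id}` policy variable are documented IAM features; the table ARN, account ID, and the chosen list of actions are placeholder assumptions.

```python
import json


def per_user_policy(table_arn):
    """IAM policy document restricting access to items keyed by the caller's own user ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                    "dynamodb:Query",
                ],
                "Resource": [table_arn],
                "Condition": {
                    "ForAllValues:StringEquals": {
                        # Partition key value must match the federated user's ID.
                        "dynamodb:LeadingKeys": ["${www.amazon.com:user_id}"]
                    }
                },
            }
        ],
    }


# Placeholder ARN: region, account ID, and table name are made up.
policy = per_user_policy("arn:aws:dynamodb:us-east-1:123456789012:table/GameScores")
policy_json = json.dumps(policy, indent=2)
```

With a policy like this, the temporary credentials handed to a client only work for that user's own items, even though the client talks to the table directly.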

Streams

  • DynamoDB Stream: Chronologically ordered log of stream records.
    • Stream record: Defines a modification to an item
      • New items, removed items, modified items, ...
  • Used for:
    • Cross region replication
    • Real-time metrics
    • Notifications/Emails
  • Important: stream records expire after 24 hours
  • Can configure a Lambda trigger on a DynamoDB stream

Streams (Example)
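
A minimal sketch of a Lambda function triggered by a DynamoDB stream. It assumes the standard stream event shape (a `Records` list whose entries carry an `eventName` of `INSERT`, `MODIFY`, or `REMOVE` and a `dynamodb` section with `Keys` and, depending on the stream view type, `NewImage`); the `Artist` and `SongTitle` key names and the sample event are hypothetical.

```python
def handler(event, context):
    """Count stream records by type and collect the titles of newly inserted songs."""
    summary = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0, "new_songs": []}
    for record in event.get("Records", []):
        event_name = record["eventName"]
        summary[event_name] += 1
        if event_name == "INSERT":
            # Attribute values arrive in DynamoDB's typed format, e.g. {"S": "Yesterday"}.
            new_image = record["dynamodb"].get("NewImage", {})
            title = new_image.get("SongTitle", {}).get("S")
            if title is not None:
                summary["new_songs"].append(title)
    return summary


# Hand-written sample event for local testing.
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"Artist": {"S": "The Beatles"}, "SongTitle": {"S": "Yesterday"}},
                "NewImage": {"Artist": {"S": "The Beatles"}, "SongTitle": {"S": "Yesterday"}},
            },
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {"Keys": {"Artist": {"S": "Queen"}, "SongTitle": {"S": "Demo"}}},
        },
    ]
}
result = handler(sample_event, None)
```

A real handler would typically send a notification or update a metric inside the loop instead of building a summary.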

Best Practices

  • Choose a good primary key!
    • Pick a high-cardinality partition key: many distinct values relative to the number of items
  • Use a cache (like ElastiCache)
  • Compress large values, such as messages or logs
    • Store as binary
    • Or use S3!
  • Avoid indexes where possible
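
The "compress large values" advice might look like this in practice, sketched with the standard library's `zlib`. The helper names are made up for this example; the resulting `bytes` value would be stored as a Binary attribute.

```python
import zlib


def compress_text(text):
    """Compress a large string into bytes suitable for a Binary attribute."""
    return zlib.compress(text.encode("utf-8"))


def decompress_text(blob):
    """Reverse compress_text after reading the attribute back."""
    return zlib.decompress(blob).decode("utf-8")


log = "GET /songs 200\n" * 500   # repetitive text compresses well
blob = compress_text(log)
restored = decompress_text(blob)
```

Smaller items cost fewer capacity units per read and write, at the price of losing the ability to filter on the compressed attribute's contents.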

Pricing

  • WCUs: $0.47 / WCU / month
  • RCUs: $0.09 / RCU / month
  • Data Transfer Out: $0.09 / GB
  • Storage: $0.25 / GB-month
  • Stream Reads: $0.02 / 100k reads
    • Lambda triggered reads are free!

 

Free Tier: 25 WCU, 25 RCU, 1GB out, 25GB storage, 2.5M stream reads
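
A back-of-the-envelope estimate using the prices above ($0.47 per WCU, $0.09 per RCU, $0.25 per GB stored, each per month) and the free tier allowances for capacity and storage. This sketch works in whole cents to avoid rounding surprises and leaves out data transfer and stream reads; the function name is hypothetical.

```python
WCU_CENTS = 47             # per WCU per month
RCU_CENTS = 9              # per RCU per month
STORAGE_CENTS_PER_GB = 25  # per GB per month

FREE_WCUS = 25
FREE_RCUS = 25
FREE_STORAGE_GB = 25


def monthly_cost_cents(wcus, rcus, storage_gb, apply_free_tier=True):
    """Estimated monthly bill in cents for provisioned capacity and storage."""
    if apply_free_tier:
        wcus = max(0, wcus - FREE_WCUS)
        rcus = max(0, rcus - FREE_RCUS)
        storage_gb = max(0, storage_gb - FREE_STORAGE_GB)
    return wcus * WCU_CENTS + rcus * RCU_CENTS + storage_gb * STORAGE_CENTS_PER_GB


# 100 WCUs, 100 RCUs, and 30GB of data.
with_free_tier = monthly_cost_cents(100, 100, 30)
without_free_tier = monthly_cost_cents(100, 100, 30, apply_free_tier=False)
```

For this workload the estimate comes to $43.25 per month with the free tier and $63.50 without it; writes dominate the bill because a WCU costs about five times as much as an RCU.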

Wrapping Up

Lambda codelab due next Thursday

 

Checkpoint #1 due Sunday

 

Feedback form: ter.ps/feedback11

CMSC389L S18: DynamoDB

By Colin King
