DynamoDB
For the Rest of Us

The NoSQL Debate
How NoSQL databases work
Why they are thought about in the wrong way
Using Dynamo as a case study
Intro
Martin McKeaveney
TODO

Co-founder @
What is DynamoDB?
Amazon DynamoDB is a fully managed, multi-region key-value and document database, providing single-digit millisecond performance at any scale.

Relational Databases - A Primer
- Simple
- Common
- Flexible
- JOINS
- An RDBMS queries each table, scans the rows and JOINs the results together into a view returned to the client
- Referential integrity
- Vertically Scalable
- JOINS are the bottleneck
- Unpredictable at scale
- SQL queries are unbounded
- It's very easy to write a bad SQL query
- Horizontal Scaling is hard

NoSQL Databases
- NoSQL Databases were invented to handle scale and throughput that SQL cannot
- No JOINS
- Data segmented
- Storage Optimised -> CPU Optimised

How NoSQL Works: Writes
- Architecture of most NoSQL Databases
- Use of partition keys and "segments"
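
A minimal sketch of how a partition key can route a write to a segment. The `partitionFor` helper, the MD5 hash and the partition count of 4 are assumptions for illustration only; real NoSQL databases use their own hash functions and partitioning schemes.

```javascript
const crypto = require("crypto");

// Hypothetical helper: map a partition key to one of N partitions by hashing it.
function partitionFor(partitionKey, partitionCount) {
  const digest = crypto.createHash("md5").update(String(partitionKey)).digest();
  // Read the first 4 bytes as an unsigned integer, then reduce modulo the partition count.
  return digest.readUInt32BE(0) % partitionCount;
}

// The same key always lands on the same partition, so a later read
// knows exactly which segment to ask.
for (const key of ["Belfast:1", "Belfast:2", "Dublin:1"]) {
  console.log(key, "-> partition", partitionFor(key, 4));
}
```

Because the hash is deterministic, every item sharing a partition key is stored together, which is what makes key-based reads cheap.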


How NoSQL Works: Reads

Faux-SQL vs NoSQL
- NoSQL is not "worse": people give up because they aren't used to it and try to apply relational patterns to it
- "Flexibility" is ultimately a sales pitch for NoSQL
- Extra work upfront
- Define your access patterns
- OLAP vs OLTP - Aggregations and rollups
- You know what’s really good at maintaining referential integrity and optimizing JOINs? A relational database.

DynamoDB Then
- Amazon created Dynamo to outscale their Oracle infrastructure
- Most operations were key-value, or only operated on single tables
- Amazon built Dynamo in house
- Dynamo paper published in 2007
- Inspired many other databases of its kind
- The managed DynamoDB service was launched in 2012

DynamoDB Now
- DynamoDB serves over 1 trillion requests per day
- Lyft uses it to store GPS locations for all their rides
- Tinder uses it to store millions of user profiles and make billions of matches
- BMW uses it to run its car-as-a-sensor service, which can scale up and down by two orders of magnitude within 24 hours
- Under Armour uses it to support its connected fitness community of 200 million users
- Toyota Racing uses it to make real-time decisions on pit stops, tire changes, and race strategy
- Another 100,000+ AWS customers use it for a wide variety of high-scale, high-performance use cases
Benefits
- Durable and Highly-Available
- Performant
- Fully Managed
- Adaptive Capacity
- Access Control
- Bounded Queries
"DynamoDB won't let you write a bad query"
The Basics

Tables
A grouping of data records.

Think Tables in SQL
Items
An item is a single data record in a table, uniquely identified by the table's primary key.

Think Rows in SQL
Attributes
Attributes are pieces of data attached to a single item.
Attributes must have a type.
Think columns in SQL

- Scalar: these types represent a single value, and include number, string, binary, Boolean, and null.
- Document: these types represent a complex structure possessing nested attributes, and include lists and maps.
- Set: these types represent multiple scalars, and include string sets, number sets, and binary sets.
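
To make the three type families concrete, here is a sketch of one item written in DynamoDB's low-level attribute-value format, where each value is wrapped in a type descriptor (`S`, `N`, `BOOL`, `NULL`, `M`, `L`, `SS`, `NS`). The attribute names and values are made up for illustration.

```javascript
// A hypothetical item in DynamoDB's low-level attribute-value format.
const item = {
  EmployeeId: { S: "emp-123" },                // scalar: string
  Age: { N: "34" },                            // scalar: number (sent as a string)
  Active: { BOOL: true },                      // scalar: Boolean
  Manager: { NULL: true },                     // scalar: null
  Address: { M: { City: { S: "Belfast" } } },  // document: map
  Skills: { L: [{ S: "Go" }, { N: "5" }] },    // document: list (mixed types allowed)
  Tags: { SS: ["backend", "oncall"] },         // set: string set
  Scores: { NS: ["1", "2", "3"] },             // set: number set
};

// Illustrative helper: report the type descriptor of an attribute value.
function typeOf(attributeValue) {
  return Object.keys(attributeValue)[0];
}

console.log(typeOf(item.Tags)); // "SS"
```

Higher-level clients such as the SDK's DocumentClient hide these descriptors and accept plain JavaScript values instead.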
Primary Key: Simple

Two types: Simple or Composite
Primary Key: Composite
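
A sketch of what a composite primary key looks like in a table definition, using the `MOTAppointment` table and `CentreGate` partition key from the MOT example in this deck. The sort key name `AppointmentTime` and the on-demand billing mode are assumptions; the deck does not name them.

```javascript
// Builds createTable parameters for a table with a composite primary key.
function motTableParams() {
  return {
    TableName: "MOTAppointment",
    AttributeDefinitions: [
      { AttributeName: "CentreGate", AttributeType: "S" },
      { AttributeName: "AppointmentTime", AttributeType: "S" }, // hypothetical name
    ],
    KeySchema: [
      { AttributeName: "CentreGate", KeyType: "HASH" },       // partition key
      { AttributeName: "AppointmentTime", KeyType: "RANGE" }, // sort key
    ],
    BillingMode: "PAY_PER_REQUEST", // assumption: on-demand capacity
  };
}

console.log(JSON.stringify(motTableParams(), null, 2));
```

A simple primary key would carry only the `HASH` entry in `KeySchema`. With the AWS SDK v2, an object like this would be passed to `new AWS.DynamoDB().createTable(params).promise()`.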

Example

MOT Example

"Give me all the MOT appointments at the Belfast test centre, gate 2 ordered by time."
MOT Example
// async function abstraction
// Assumes the AWS SDK for JavaScript v2 and its DocumentClient
const AWS = require('aws-sdk')
const docClient = new AWS.DynamoDB.DocumentClient()

async function queryItems(centreGate) {
  const params = {
    TableName: 'MOTAppointment',
    ExpressionAttributeNames: { '#key': 'CentreGate' },
    ExpressionAttributeValues: { ':centregate': centreGate },
    // this is the actual query
    KeyConditionExpression: '#key = :centregate',
  }
  // Errors propagate to the caller instead of being returned as if they were data
  const data = await docClient.query(params).promise()
  return data
}

queryItems('Belfast:2').then(console.log).catch(console.error)
// returns all MOT appointments at gate 2 in Belfast
// ordered by time ascending

Consistency

Cost
- Read request unit: A strongly consistent read request of up to 4 KB requires one read request unit. For items up to 4 KB in size, an eventually consistent read request requires one-half read request unit.
- Write request unit: A standard write request unit can write an item up to 1 KB. For items larger than 1 KB, additional write request units are required.
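
The arithmetic above can be sketched as two small functions. The function names are hypothetical, and the sketch assumes item sizes round up to the next 4 KB (reads) or 1 KB (writes) boundary; transactional requests are not covered.

```javascript
const KB = 1024;

// Read request units for one item: one unit per 4 KB (rounded up),
// halved for an eventually consistent read.
function readRequestUnits(itemSizeBytes, { stronglyConsistent = true } = {}) {
  const units = Math.max(1, Math.ceil(itemSizeBytes / (4 * KB)));
  return stronglyConsistent ? units : units / 2;
}

// Write request units for one item: one unit per 1 KB (rounded up).
function writeRequestUnits(itemSizeBytes) {
  return Math.max(1, Math.ceil(itemSizeBytes / KB));
}

console.log(readRequestUnits(3 * KB));                                // 1
console.log(readRequestUnits(3 * KB, { stronglyConsistent: false })); // 0.5
console.log(readRequestUnits(6 * KB));                                // 2
console.log(writeRequestUnits(2.5 * KB));                             // 3
```

Small items and eventually consistent reads are the cheapest combination, which is why item size matters as much as request count.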
Querying Dynamo
You can only QUERY indexed data in DynamoDB. This is how it stays fast and scalable.
We can't query non-key attributes without doing a filter.
A filter narrows the results AFTER the query has read them, meaning you pay for every item the query reads, not just the items the filter returns.
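
A sketch of what a filtered query looks like, reusing the `MOTAppointment` table and `CentreGate` key from the MOT example. The non-key attribute `Status` and the `filteredQueryParams` helper are hypothetical.

```javascript
// Builds query parameters that filter on a non-key attribute.
function filteredQueryParams(centreGate, status) {
  return {
    TableName: "MOTAppointment",
    // Key condition: decides which items are read (and billed).
    KeyConditionExpression: "#key = :centregate",
    // Filter: applied after the read, only trims what is returned.
    FilterExpression: "#status = :status",
    ExpressionAttributeNames: { "#key": "CentreGate", "#status": "Status" },
    ExpressionAttributeValues: { ":centregate": centreGate, ":status": status },
    // Ask DynamoDB to report the capacity the request consumed.
    ReturnConsumedCapacity: "TOTAL",
  };
}

console.log(filteredQueryParams("Belfast:2", "CONFIRMED"));
```

In the query response, `ScannedCount` is the number of items read before the filter and `Count` is the number returned after it. A large gap between the two suggests the access pattern deserves its own index.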

Indexes
- Secondary Indexes let you specify alternate key structures
- Increase query flexibility
- Avoid scans and filters
- You will also pay for extra writes on the indexes

Local Secondary Indexes
Can only be used on a table with a composite primary key.
You specify an index with the same PARTITION key but a different SORT key for a table.

Must be defined on table creation
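
A sketch of a local secondary index declared at table creation, again using the `MOTAppointment` table. The index name `ByRegistration` and the attributes `AppointmentTime` and `VehicleRegistration` are hypothetical.

```javascript
// Builds createTable parameters that include a local secondary index.
function motTableWithLsiParams() {
  return {
    TableName: "MOTAppointment",
    AttributeDefinitions: [
      { AttributeName: "CentreGate", AttributeType: "S" },
      { AttributeName: "AppointmentTime", AttributeType: "S" },
      { AttributeName: "VehicleRegistration", AttributeType: "S" },
    ],
    KeySchema: [
      { AttributeName: "CentreGate", KeyType: "HASH" },
      { AttributeName: "AppointmentTime", KeyType: "RANGE" },
    ],
    LocalSecondaryIndexes: [
      {
        IndexName: "ByRegistration", // hypothetical name
        KeySchema: [
          { AttributeName: "CentreGate", KeyType: "HASH" },           // same partition key
          { AttributeName: "VehicleRegistration", KeyType: "RANGE" }, // different sort key
        ],
        Projection: { ProjectionType: "ALL" },
      },
    ],
    BillingMode: "PAY_PER_REQUEST", // assumption: on-demand capacity
  };
}

console.log(motTableWithLsiParams().LocalSecondaryIndexes[0].KeySchema);
```

The index shares the table's partition key, so it only changes the order in which items within one partition can be queried.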
Global Secondary Indexes
Global secondary indexes can be used to specify a completely different key structure for a table.
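
A sketch of a global secondary index definition, using the `WorldwideEmployees` table and the `Company` / `DepartmentOfficeRole` keys from the data modelling example in this deck. The base key `EmployeeId` and the index name `CompanyIndex` are assumptions.

```javascript
// Builds createTable parameters that include a global secondary index.
function worldwideEmployeesTableParams() {
  return {
    TableName: "WorldwideEmployees",
    AttributeDefinitions: [
      { AttributeName: "EmployeeId", AttributeType: "S" }, // hypothetical name
      { AttributeName: "Company", AttributeType: "S" },
      { AttributeName: "DepartmentOfficeRole", AttributeType: "S" },
    ],
    // Base table: simple primary key.
    KeySchema: [{ AttributeName: "EmployeeId", KeyType: "HASH" }],
    GlobalSecondaryIndexes: [
      {
        IndexName: "CompanyIndex", // hypothetical name
        // A completely different key structure from the base table.
        KeySchema: [
          { AttributeName: "Company", KeyType: "HASH" },
          { AttributeName: "DepartmentOfficeRole", KeyType: "RANGE" },
        ],
        Projection: { ProjectionType: "ALL" },
      },
    ],
    BillingMode: "PAY_PER_REQUEST", // assumption: on-demand capacity
  };
}

console.log(worldwideEmployeesTableParams().GlobalSecondaryIndexes[0].IndexName);
```

A query against the index would then pass the index name as `IndexName` alongside the usual key condition.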

Best Practices
- Know your access patterns up front
- "Pre Join"
- Single Table Design

- Selecting Indexes
- You should generally go for eventually consistent reads
- Avoid filters and scans
- Abuse IDs
Data Modelling: Hierarchical
- Common data model expressed with JOINs in SQL databases

Step 1: Identify Access Patterns
- Retrieve a single person by their Employee ID
- Gather all people in a particular company
- Gather all people from a certain department
- Gather all people in a certain office
- Gather all people with a certain job role
Step 2: Primary Keys
Choosing a good primary key:
- it enables you to uniquely identify each item for writes & updates
- it evenly distributes your data across partitions

Step 3: Indexes
- We need a GSI
- a PARTITION key of Company
- a SORT key named DepartmentOfficeRole, e.g. Engineering#Dublin#VP
Step 4: Query
// async function abstraction
// Assumes docClient is an AWS SDK v2 DynamoDB.DocumentClient created elsewhere
async function queryItems(company, depOfficeRole) {
  const params = {
    TableName: "WorldwideEmployees",
    // If these keys live on the GSI rather than the base table,
    // an IndexName must also be set here
    ExpressionAttributeNames: { "#company": "Company" },
    KeyConditionExpression: "#company = :company",
    ExpressionAttributeValues: { ":company": company },
  };
  // Only add the sort key condition when a prefix is supplied
  if (depOfficeRole) {
    params.ExpressionAttributeNames["#depOfficeRole"] = "DepartmentOfficeRole";
    params.KeyConditionExpression +=
      " and begins_with(#depOfficeRole, :depOfficeRole)";
    params.ExpressionAttributeValues[":depOfficeRole"] = depOfficeRole;
  }
  // Errors propagate to the caller instead of being returned as if they were data
  const data = await docClient.query(params).promise();
  return data;
}
// Give me everyone who works at Stripe,
// organised by department, office and job role
queryItems("Stripe");
// Give me everyone who works in Engineering at Google's Zurich office
queryItems("Google", "Engineering#Zurich");
// Give me all the Designers in Intercom's San Francisco office
queryItems("Intercom", "UX#SanFrancisco#Designer");
Query Results

Thanks!
DynamoDB For the Rest of Us
By Martin McKeaveney