Microsoft Samurai

Data Architecture

In The Cloud

By @ErikRalston

Overview

  • Challenges in large-scale applications

  • Flavors of Databases available now

  • Strategies for "Polyglot Persistence" in web apps

  • Offerings from Cloud Platforms

Audience

  • Looking for general principles in "Big"

  • Web developers scaling to "Big"

  • SQL developers afraid of NoSQL

Sources

INSERT INTO ROUND_HOLE VALUES ('SQUARE_PEG')

What Is

NoSQL?!

Not Only SQL

CAP Theorem

  • Consistency: every read returns the most recent write
  • Availability: every node that has not failed always responds to queries
  • Partition tolerance: the system keeps operating even when connections between nodes are down

Pick 2

C

A

P

SQL

NoSQL

NoSQL

SQL Server

MySQL

Oracle

PostgreSQL

CouchDB

Cassandra

Riak

Dynamo

MongoDB

Redis

ACID

  • Atomic
  • Consistent
  • Isolated
  • Durable

BASE

  • Basically Available
  • Soft state
  • Eventual consistency
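A minimal sketch of what "eventual consistency" means in practice. The `Replica` and `EventuallyConsistentStore` classes are hypothetical teaching toys, not the API of any real database: a write is accepted on one replica right away, and the other replica only catches up when `sync()` runs.

```python
class Replica:
    """One node holding its own copy of the data."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: writes land on one replica and reach the others later."""

    def __init__(self, replica_count=2):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # writes not yet copied to every replica

    def write(self, key, value):
        # Accept the write immediately on replica 0 so the store stays available.
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index):
        # A read can hit any replica, so it may return stale data.
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        # Stand-in for background replication: apply pending writes in order.
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("rsvp_count", 5)
stale = store.read("rsvp_count", replica_index=1)  # replica 1 has not seen the write
store.sync()
fresh = store.read("rsvp_count", replica_index=1)  # replica 1 has caught up
```

Between `write` and `sync` the two replicas disagree (the "soft state"); after `sync` they converge.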

Volume Velocity Variety

Searching

Reporting

Processing

$

Relational

Document

Column Family

Key-Value

Graph

Relational

SQL Server, MySQL, PostgreSQL

Data is held as rows in tables with columns

Rows can connect to each other via foreign keys

ACID Transactions
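A small sketch of rows, foreign keys, and an atomic transaction, using Python's built-in `sqlite3` module as a stand-in for the servers named above. The `customers` and `orders` tables are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total REAL NOT NULL)"
)

# "with conn" commits if the block succeeds and rolls back if it raises.
with conn:
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Erik')")
    conn.execute("INSERT INTO orders (customer_id, total) VALUES (1, 19.99)")

# Atomicity: the second insert points at a customer that does not exist,
# so the first insert in the same transaction is rolled back as well.
try:
    with conn:
        conn.execute("INSERT INTO orders (customer_id, total) VALUES (1, 5.00)")
        conn.execute("INSERT INTO orders (customer_id, total) VALUES (999, 7.50)")
except sqlite3.IntegrityError:
    pass

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Only the order from the first, successful transaction remains.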

Document

Azure DocumentDB, Couchbase, CouchDB, MongoDB

Data is held in documents with no consistent schema

Documents may be "linked", but the links are usually loose associations that the database does not enforce

Usually a hierarchy of Database > Collections > Document

Variety & Velocity

{
     "id" : "12345",
     "Name" : "Erik",
     "Title" : "Presenter"    
}
{
     "id" : "123",
     "Name" : "Tanya",
     "Title" : "Attendee"    
}
{
     "id" : "123",
     "Name" : "Tech Tuesday",
     "Title" : "Event",
     "Attendees" : [ "Chad", "Andre", "Mike" ],
     "RSVPCount" : 5
}
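A rough sketch of how documents with different shapes can share one collection. Plain Python dictionaries stand in for a document store, and `find` is a hypothetical helper, not a real driver call. The `id` fields are left out to keep the example short.

```python
def find(collection, **criteria):
    """Return the documents whose fields equal every given criterion."""
    missing = object()  # sentinel so an absent field never matches
    return [
        doc
        for doc in collection
        if all(doc.get(field, missing) == value for field, value in criteria.items())
    ]


# One collection, no shared schema: only the event has Attendees and RSVPCount.
collection = [
    {"Name": "Erik", "Title": "Presenter"},
    {"Name": "Tanya", "Title": "Attendee"},
    {
        "Name": "Tech Tuesday",
        "Title": "Event",
        "Attendees": ["Chad", "Andre", "Mike"],
        "RSVPCount": 5,
    },
]

events = find(collection, Title="Event")
names_with_rsvp = [doc["Name"] for doc in collection if "RSVPCount" in doc]
```

Queries have to tolerate missing fields, which is the price of the flexible schema.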

Diagram: a Collection (uniquely identified with an ID) containing three Documents

Column Family

Azure Table Storage, Cassandra

Data is held in tables, with partitions and rows, no consistent columns

Table > Partition > Row

Volume & Variety

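Diagram: a Table (uniquely identified with an ID) contains Partitions (each uniquely identified with a Partition Key), and each Partition contains Rows

Diagram: Table 1 is split across machines, with Partition X and Partition Y placed on Computer A and Computer B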
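A simplified sketch of the Table > Partition > Row layout and of how a partition key can decide which machine stores a row. The `Table` class and `node_for` function are hypothetical, and real systems use more careful placement schemes than this single-byte hash.

```python
import hashlib

NODES = ["Computer A", "Computer B"]


def node_for(partition_key):
    """Map a partition key to a node with a hash that is stable across runs."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return NODES[digest[0] % len(NODES)]


class Table:
    def __init__(self, name):
        self.name = name
        # node -> partition key -> row key -> row (a dict with any columns)
        self.storage = {node: {} for node in NODES}

    def insert(self, partition_key, row_key, **columns):
        partitions = self.storage[node_for(partition_key)]
        partitions.setdefault(partition_key, {})[row_key] = columns

    def get(self, partition_key, row_key):
        partitions = self.storage[node_for(partition_key)]
        return partitions.get(partition_key, {}).get(row_key)


table = Table("Table 1")
table.insert("Partition X", "row-1", stars=5)
table.insert("Partition X", "row-2", stars=3, comment="Solid")  # extra column is fine
table.insert("Partition Y", "row-1", stars=4, verified=True)

row = table.get("Partition X", "row-2")
```

Every row in a partition lands on the same node, so a lookup by partition key and row key touches a single machine.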

Problem:
E-Commerce

Product Catalog

Ratings

Orders

Complex Reporting

User Asynchronous Processing

Variety of products

No Reporting, Minimal Processing

Searching!

Large Volume

Basic Reporting

User Synchronous

Relational

Document

Column
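A sketch of the polyglot persistence idea for this problem. The slide's diagram did not survive, so the pairing of workloads to stores below is an assumed reading of the labels, and the `Workload` fields and `choose_store` rules are hypothetical simplifications rather than a recommended decision procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    complex_reporting: bool = False
    varied_schema: bool = False
    large_volume: bool = False


def choose_store(workload):
    """Pick a database flavor from a workload's dominant trait (assumed rules)."""
    if workload.complex_reporting:
        return "Relational"  # joins and transactions for reporting
    if workload.varied_schema:
        return "Document"  # each item can carry its own fields
    if workload.large_volume:
        return "Column"  # partitioned rows spread across machines
    return "Relational"  # conservative default


workloads = [
    Workload("Product Catalog", varied_schema=True),
    Workload("Ratings", large_volume=True),
    Workload("Orders", complex_reporting=True),
]

plan = {w.name: choose_store(w) for w in workloads}
```

The point is that each part of the application gets the store that fits its access pattern instead of forcing everything into one database.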

Challenges

  • Backup & Restore
  • Availability
  • Skills

Thank
You

Microsoft Samurai: Data Architecture in the Cloud

By Erik Ralston

An introduction to blending database strategies in cloud-hosted web applications
