Storage Systems II

INFO 253B: Backend Web Architecture

Kay Ashaolu

Typical Three Tier Web Architecture

  • Reduce load on database by placing cheap copies in front of DB
  • Problem?
  • Have to keep cache(s) up to date
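
One common way to keep a cache up to date is the cache-aside pattern: reads fill the cache on a miss, and writes invalidate the cached copy. The following is a minimal sketch, assuming plain dicts stand in for the database and the cache; the names `db`, `cache`, `stats`, `read`, and `write` are illustrative and not part of the course material.

```python
# Hypothetical stand-ins: plain dicts play the role of the database and the cache.
db = {"user:1": "Ada"}
cache = {}
stats = {"db_reads": 0}  # counts how often a read falls through to the database


def read(key):
    """Return the value for key, going to the database only on a cache miss."""
    if key in cache:
        return cache[key]
    stats["db_reads"] += 1
    value = db.get(key)
    cache[key] = value  # populate the cache for later reads
    return value


def write(key, value):
    """Update the database, then invalidate the cached copy so it cannot go stale."""
    db[key] = value
    cache.pop(key, None)
```

Repeated reads of the same key hit the database once; a write forces the next read back to the database. With several caches, every one of them would need the same invalidation, which is where the difficulty starts.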

Caching

Horizontal scale-out

Scaling

  • What if data can’t fit on a single server?
  • What if a server goes down?
  • What if a machine fails completely?

Replication

  • Provides durability: don’t lose data
  • Provides capacity: multiple servers
  • Leads to many interesting challenges
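
The durability claim can be illustrated with a toy model. This is a sketch only, assuming in-memory replicas and a write-to-all, read-from-any policy; the `Replica` and `ReplicatedStore` classes are hypothetical and do not describe any particular database.

```python
class Replica:
    """Hypothetical in-memory server that can be marked as failed."""

    def __init__(self):
        self.data = {}
        self.alive = True


class ReplicatedStore:
    """Writes go to every live replica; reads use the first live replica holding the key."""

    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]

    def put(self, key, value):
        for replica in self.replicas:
            if replica.alive:
                replica.data[key] = value

    def get(self, key):
        for replica in self.replicas:
            if replica.alive and key in replica.data:
                return replica.data[key]
        raise KeyError(key)
```

With three replicas, a value written while all are alive survives two failures. A replica that fails and later returns has missed the writes made in between, which is one of the challenges the last bullet refers to.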

Typical Replication

Data Placement

  • Which server gets data?
  • Example: assign student records to servers based on age

Data Placement

  • Which servers get what data?
  • Range vs. Hash vs. ?
  • How many copies of the data?
    • Durability: how many failures?
    • Capacity: how many requests?
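
Range and hash placement can be compared side by side. The sketch below assumes three servers, student age as the range key, and SHA-256 as the hash; the server names, the age boundaries, and the function names are all made up for illustration.

```python
import hashlib
from bisect import bisect_right

SERVERS = ["server-0", "server-1", "server-2"]  # hypothetical server names


def range_place(age, boundaries=(20, 25)):
    """Range partitioning: age < 20 -> server-0, 20..24 -> server-1, >= 25 -> server-2."""
    return SERVERS[bisect_right(boundaries, age)]


def hash_place(key):
    """Hash partitioning: a stable hash of the key, modulo the number of servers."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SERVERS[int.from_bytes(digest[:8], "big") % len(SERVERS)]
```

Range placement keeps neighboring keys on the same server, which helps range queries but can overload one server if most students fall in the same age band. Hash placement tends to spread keys evenly but scatters any range of keys across servers.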

Consistency

  • Need to keep replicas up to date
  • May be slow or impossible!
  • Very expensive if servers are located around the world!
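
A toy model shows why this is hard. The sketch assumes asynchronous replication, where the primary acknowledges a write before the replica has seen it; the `Primary` class and its `sync` method are hypothetical names used only for this example.

```python
class Primary:
    """Hypothetical primary that applies writes locally and queues them for a replica."""

    def __init__(self):
        self.data = {}
        self.pending = []  # writes not yet shipped to the replica

    def put(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def sync(self, replica):
        """Ship queued writes to the replica (a plain dict here), in order."""
        for key, value in self.pending:
            replica[key] = value
        self.pending.clear()
```

Between a `put` and the next `sync`, a reader of the replica sees the old value. Making `put` wait for the replica removes the stale read, but every write then pays the round trip to the replica, which is the expensive case when servers are far apart.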

NoSQL

  • Different approach to data storage
  • Simple, predictable data models
  • Often have to build own features
  • Designed for massive scale-out

Pros

  • Simple API
  • Easy to understand performance
  • Easy to scale and use
  • Examples: Redis, Amazon DynamoDB

Cons

  • Simple API
  • Must handle own schema management
  • May need to manually implement search features
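
As an illustration of implementing search by hand, the sketch below maintains a secondary index next to a key-value store. It assumes every record is a dict with a `city` field, which is itself a schema the application has to enforce; the names `store`, `index_by_city`, `put_user`, and `find_by_city` are hypothetical.

```python
store = {}          # primary data: key -> record (a dict)
index_by_city = {}  # hand-built secondary index: city -> set of keys


def put_user(key, record):
    """Store a record and keep the hand-built index in step with it."""
    old = store.get(key)
    if old is not None:
        index_by_city[old["city"]].discard(key)  # drop the stale index entry
    store[key] = record
    index_by_city.setdefault(record["city"], set()).add(key)


def find_by_city(city):
    """Look up keys by city through the index instead of scanning every record."""
    return sorted(index_by_city.get(city, set()))
```

Every write now has to update two structures, and a bug or crash between the two updates leaves the index out of step with the data. A relational database does this bookkeeping itself.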

Key-Value Store

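_store = {}  # in-memory stand-in for the backing store

def put(key, value):
    # Associate value with key, overwriting any previous value
    _store[key] = value

def get(key):
    # Return the value stored under key, or None if the key is absent
    return _store.get(key)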

Pros

  • No predefined schema
  • Store handles layout of arbitrary fields
  • Typically supports queries over document fields
  • Easy to scale and use
  • Examples: MongoDB, CouchDB

Cons

  • No schema safeguards: the store does not validate document structure
  • May need to implement complex join logic
  • Can have different documents with different schemas

Document Store

{
	"long_url": "http://www.google.com",
	"short_url": "qwelmw",
	"hit_count": 2
}
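
Because nothing enforces a schema, application code usually has to tolerate missing fields and perform joins itself. The sketch below assumes lists of dicts stand in for collections; the second URL document, the `users` collection, and the `owner_id` field are hypothetical additions to the example above.

```python
# Hypothetical collections: lists of dicts stand in for a document store.
urls = [
    {"long_url": "http://www.google.com", "short_url": "qwelmw",
     "hit_count": 2, "owner_id": 1},
    {"long_url": "http://www.example.com", "short_url": "zxcvbn"},  # fields missing
]
users = [{"id": 1, "name": "Ada"}]


def total_hits(docs):
    """Sum hit counts, treating a missing hit_count as zero."""
    return sum(doc.get("hit_count", 0) for doc in docs)


def urls_with_owner(url_docs, user_docs):
    """Application-side join: pair each short URL with its owner's name, or None."""
    names = {user["id"]: user["name"] for user in user_docs}
    return [(doc["short_url"], names.get(doc.get("owner_id"))) for doc in url_docs]
```

Both functions have to decide what a missing field means, a decision that a relational schema would have forced at write time.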

Summary

  • Databases are designed to solve many common data storage problems
  • Storage comes in many flavors; right choice is often specific to use case
  • When in doubt, start simple!
  • My opinion: start with an RDBMS and learn about your data, then move to a DB that better suits your use case

Questions?