Storage Systems

INFO 253B: Backend Web Architecture

Kay Ashaolu

Why Data Storage?

 

  • When we make a web request, where do we get the data from?
  • When we create data, where do we put it?
  • Where do "resources" live?

Example: bit.ly

  • Lots of data to store
    • Shortcut to url mapping
    • Statistics about links
    • Information about users

Example: bit.ly

long url http://news.google.com
short url http://bit.ly/awekl
hit count 482240

long url http://facebook.com/user/profile
short url http://bit.ly/czasw
hit count 11023

long url http://msnbc.com/news/article/
short url http://bit.ly/olkjpl
hit count 1232

Data Storage Design

  • What is the storage format?
  • How do we lay out data?
  • How do we access data?

Why use a file?

  • What are the pros and cons?
http://news.google.com, http://bit.ly/awekl, 482240
http://facebook.com/user/profile, http://bit.ly/czasw, 11023
http://msnbc.com/news/article, http://bit.ly/olkjpl, 1232

Problems with files

  • What if we want to add another field?
  • What if we want to query different parts of data? How efficient is this?
  • What if we have concurrent accesses?
  • What data structures should we use?

Data Independence

  • Databases: apps shouldn’t have to worry about these problems!
  • Underlying storage format independent of application-level logic

Relational Data Stores

  • RDBMS: Relational Database Management System
  • Invented in the 1970s
  • e.g., Oracle, MySQL, Postgres, IBM DB2, Microsoft SQL Server

Relational Model

  • Reason about sets of facts, or "tables"
  • Each fact is a "row"
  • Attributes are "columns" of row

For example

For example

SQL Query Language

  • High-level query language over tables
  • Declarative: say "what" you want computed, not "how"
  • Why is this good?

SQLite - RDBMS uses SQL

  • SQLLite is one DB that can be both relational and non relational
  • One of the most popular/successful databases
  • Amazingly enough, it stores data in a single file, but gives the ability to use the SQL language to query and store information
  • SQLite is used in more applications than you would think, including your mobile phone

Install SQLite to local

  • pip3 install pysqlite3 (may need to use sudo)
  • Now can run "sqlite3 [filename]" on terminal to run sql commands and have the database persisted in the file specified

Create Example

CREATE TABLE links (
    long_url VARCHAR(1000) NOT NULL,
    short_url VARCHAR(20) NOT NULL,
    hit_count BIGINT DEFAULT 0,
    created DATE
);

INSERT INTO links (long_url, short_url, hit_count, created) VALUES ("http://www.google.com", "qwelmw", 2, "2016-04-05"), 
("http://www.twitter.com", "adfer", 45, "2016-08-05");

Select Example

SELECT * FROM links;
long_url
 
short_url hit_count created
http://www.google.com
 
qwelmw 2 2016-4-5
http://www.facebook.com
 
adfer 45 2016-8-5

Select Example

SELECT * FROM links WHERE hit_count < 20;
long_url
 
short_url hit_count created
http://www.google.com
 
qwelmw 2 2016-4-5

Insert Example

SELECT * FROM links WHERE hit_count < 20;
long_url short_url hit_count created
http://www.google.com qwelmw 2 2016-4-5
http://www.twitter.com eovle 9 2016-10-28
INSERT INTO links VALUES ("http://www.twitter.com", "eovle", 9, "2016-10-28");

Update Example

SELECT * FROM links;
long_url short_url hit_count created
http://www.google.com qwelmw 0 2016-4-5
http://www.facebook.com adfer 0 2016-8-5
http://www.twitter.com eovle 9 2016-10-28
UPDATE links SET hit_count = '0' WHERE created < '2016-10-22';

Useful Properties

  • Atomicity: all updates happen or none do
  • Consistency: easy to reason about database
  • Isolation: operations are separated from each other
  • Durability: updates won’t disappear

Pros

  • High-level query language
  • High Data integrity
  • Data independence

Cons

  • Have to define schema at start
  • Scaling can be complicated
  • Can become slow

Useful Properties

Let's persist the dictionary API

  • Atomicity: all updates happen or none do
  • Consistency: easy to reason about database
  • Isolation: operations are separated from each other
  • Durability: updates won’t disappear

Dictionary Example

Text

from flask import Flask, request, Response, g
import json
import logging
import sqlite3

DATABASE = './database.db'

app = Flask(__name__)

@app.route('/save_def', methods=["POST"])
def save_def():
    if request.headers['Content-Type'] == 'application/json':
      
        arguments = request.get_json()
        word = arguments.get("word")
        definition = arguments.get("definition")

        db = get_db()
        query = "INSERT INTO dictionary (word, definition) VALUES (?, ?)"
        cur = db.execute(query, [word, definition])
        db.commit()
        cur.close()
        
        logging.warning("Word {} with definition {} saved".format(word, definition))

    else:
        logging.warning("Invalid content type: only application/json is allowed")

    resp = Response('')
    return resp

Dictionary Example

Text

@app.route('/get_def/<word>', methods=["GET"])
def get_def(word):
    # Note for GET Request, we get input parameters from URL, not
    # application/json
    # request body


    db = get_db()
    query = "SELECT definition from dictionary where word = ?"
    cur = db.execute(query, [word])
    rv = cur.fetchall()
    cur.close()

    if not rv:
        definition = "Not Found"
        logging.warning("{} not found in dictionary".format(word))
    else:
        definition = rv[0][0]

    data = {"word": word, "definition": definition}
    resp = Response(json.dumps(data), mimetype='application/json')

    return resp

Dictionary Example

Text

def get_db():
    db = getattr(g, '_database', None)
    if db is None:
        db = g._database = sqlite3.connect(DATABASE)
    return db

@app.teardown_appcontext
def close_connection(exception):
    db = getattr(g, '_database', None)
    if db is not None:
        db.close()

def init_db():
    with app.app_context():
        db = get_db()
        schema_sql = """
          CREATE TABLE IF NOT EXISTS dictionary (
            id INT primary key,
            word text not null,
            definition text not null
          );
        """
        cur = db.execute(schema_sql)
        db.commit()
        cur.close()


if __name__ == "__main__":
  init_db()
  logging.warning("Table created or already exists")
  app.run(host="localhost", port=8000, debug=True)

Questions?