Storage Systems II

INFO 153B/253B: Backend Web Architecture

Kay Ashaolu

Properties of Databases (ACID)

  • Atomic: all updates happen or none do
  • Consistent: rules placed on database are honored
  • Isolation: operations are separated from each other
  • Durable: updates won’t disappear

There is no such thing as a database that does it all

  • Another area where tradeoffs are made based on your application's needs
  • It is critical to understand the pros and cons of the different kinds of databases so that you can choose the best one for your applicaiton

Pros

  • Full ACID capabilities
  • High-level query language (SQL)
  • High Data integrity
  • Data independence

Cons

  • Have to define schema at start
  • Scaling can be complicated
  • Can become slow

RDBMS

CREATE TABLE links (
    long_url VARCHAR(1000) NOT NULL,
    short_url VARCHAR(20) NOT NULL,
    hit_count BIGINT DEFAULT 0,
    created DATE
);

NoSQL or (Not only SQL) Databases

  • Different approach to data storage
  • Simple but predictable data models
  • Often have to build own features
  • Designed for massive scale-out

Pros

  • Simple API
  • Easy to understand performance
  • Easy to scale and use
  • Examples: Redis, Amazon DynamoDB

Cons

  • Simple API
  • Must handle own schema management
  • May need to manually implement search features

Key-Value Store

def put(key, value):
    pass

def get(key):
    return value

Pros

  • No predefined schema
  • Store handles layout of arbitrary fields
  • Typically can run search capabilities
  • Easy to scale and use
  • Examples: MongoDB, CouchDB

Cons

  • No safeguards to schema
  • May need to implement complex join logic
  • Can have different documents with different schemas

Document Store

{
	"long_url": "http://www.google.com",
	"short_url": "qwelmw",
	"hit_count": 2
}

Thoughts on databases

  • Databases designed to solve many common data storage problems
  • Storage comes in many flavors; right choice is often specific to use case
  • When in doubt, start simple!
  • My opinion: start with a RDBMS and learn about your data, move to a DB that better suites your use case afterwards

Time for an example

  • The full code for this example is in our spring-23 folder if you would like to follow along:
    • https://github.com/UCB-INFO-BACKEND-WEBARCH/spring-23
    • Under the "Example Docker Compose and Postgress DB/demo folder
  • Now we are going to go over an example that shows how you can add a database to your system
  • There's a few new libraries and tools we are introducing in these chapters, as well as a full working example, so please take a note of this
  • You may find some documentation online to be conflicting, so please stick with what's in these slides and the code

Quote server, persisted

  • We are now going to take our quote server that we have containerized earlier in this class and use a database
  • We are going to store all of the quotes into the database when the database starts up, and then we are going to use sql to retrieve the right quote when someone sends us the proper GET or POST request
  • But before we dive into code, we need to get into some new concepts

Communication between containers

  • Now that we are creating multiple docker containers, we will need to be mindful about how we communicate between the containers
  • We could map and open a port for each container, but there are some problems with that approach
  • Can you think of any?

Docker networks

  • In order to prevent direct communications with containers that do not need to be public, docker has a concept called networks
  • A docker network is managed by docker and enables every container that is started connected to a network to be able to directly communicate with each otehr
  • All docker containers within a network can access each other via their container name
docker create network my-network
(within container: http://quote-service-container/quote/sunday)

Docker compose and YAML

  • The next concept we are going to introduce today is the use of a docker compose yaml file
  • A yaml file is another structured data file type (like JSON) that stores data
  • It is used most often for configuration files
  • Not as verbose as JSON, but easy to get wrong so be careful
version: "3.7"
services:
  db:
    image: postgres:latest
    volumes: 
     - postgres-volume:/var/lib/postgresql/data
     - ./init.db.sql:/docker-entrypoint-initdb.d/init.db.sql

Docker compose and YAML

  • A docker compose yaml file is a configuration file that enables you to define all of the containers, images, volumes, and networks you need to create your entire system
  • All of your containers will be spun up and down all at once with the following commands: 
docker-compose up 
# spins up all defined containers, images, networks, and containers in yaml file

docker-compse down 
# shuts down all defined  containers, images, networks, and containers in yaml file

docker-compose down --rmi all -v --remove-orphans
# shuts down and deletes all defined  containers, images, networks, and 
# containers in yaml file. Use this if you are editing code and want to restart
# your containers

Example file for today

version: "3.7"
services:
  db:
    image: postgres:latest
    volumes: 
     - postgres-volume:/var/lib/postgresql/data
     - ./init.db.sql:/docker-entrypoint-initdb.d/init.db.sql
    environment:
      POSTGRES_USERNAME: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: quote-db 
  quote-service:
    build: ./quote-service/
    image: quote-service-image
    ports:
      - "5050:5000"
volumes:
  postgres-volume: {}
  

Example file for today

  • Today we are attaching a postgres database to our quote server
  • So the previous slide had two services (containers) that are going to be created: one named db (postgres database) and one named quote-service (our quote service).
  • If you look at the configuration, you will see that many of the items we have done on the command line are present in this configuration, including ports, volumes, environment variables, and networks 

Couple of things to note

  • The second volume set for the db service is a characteristic of the postgres container
  • Any files that have SQL in them that are placed in the container /docker-entrypoint-initdb.d/ on startup will execute on container load
  • This is a good place to put tables that create the tables you will need in your applications if it exists, and insert any 

Initial sql script

CREATE TABLE IF NOT EXISTS quotes (
    day_of_week VARCHAR(20) NOT NULL,
    quote VARCHAR(2000) NOT NULL
    );

INSERT INTO quotes (day_of_week, quote) VALUES
    ('sunday', 'Life is about making an impact, not making an income. -Kevin Kruse'),
    ('monday', 'Whatever the mind of man can conceive and believe, it can achieve. -Napoleon Hill'),
    ('tuesday', 'Strive not to be a success, but rather to be of value. -Albert Einstein'),
    ('wednesday', 'You miss 100% of the shots you dont take. -Wayne Gretzky'),
    ('thursday', 'Every strike brings me closer to the next home run. -Babe Ruth'),
    ('friday', 'We become what we think about. -Earl Nightingale'),
    ('saturday', 'Life is what happens to you while you are busy making other plans. -John Lennon');

Docker compose and networks

  • A new version of docker compose has released
  • All containers created in a docker compose are automatically added in a single network
  • So we don't have to define a network unless we want multiple containers that are isolated from each other
  • TLDR; no need to create a network within docker compose

Now let's go to the changes in the quote service

  • Found in demo/quote-service/quote.py
  • Notice that we needed to do the following:
    • (1) initialization to connect our quote service (quote-service) to our postgres db (db)
    • (2) Delete the python dictionary and write the Python and SQL to retrieve the quote from the database

quote.py

  • Found in demo/quote-service/quote.py
  • Notice that we needed to do the following:
    • (1) initialization to connect our quote service (quote-service) to our postgres db (db)
    • (2) Delete the python dictionary and write the Python and SQL to retrieve the quote from the database

quote.py (1/2)

from flask import Flask, request
from flask_sqlalchemy import SQLAlchemy
from sqlalchemy import text
import json

app = Flask(__name__)
db = SQLAlchemy()
app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://postgres:postgres@db:5432/quote-db"
db.init_app(app)

def gen_response(day_of_week):
    if not day_of_week:
        response = {"message": "We need the day_of_week in order to send a quote"}
        response_code = 400
    else:
        stmt = text("SELECT quote FROM quotes WHERE day_of_week = :x")
        stmt = stmt.bindparams(x=day_of_week)
        result = db.session.execute(stmt).all()

        if not result:
             response = {"message": "Sorry we don't know that day of the week" }
             response_code = 404
        else:
            response = {"day": day_of_week, "quote": result[0].quote}
            response_code = 200

    return json.dumps(response), response_code

quote.py (2/2)

@app.route('/quote/<day_of_week>')
def quote_of_the_day(day_of_week):
    return gen_response(day_of_week)

@app.route('/quote', methods = ['GET'])
@app.route('/quote/', methods = ['GET'])
def quote_of_the_day_get():
    day_of_week = request.args.get("day_of_week")
    return gen_response(day_of_week)

@app.route('/quote', methods = ['POST'])
def quote_of_the_day_post():
    data = request.get_json()
    day_of_week = data.get("day_of_week")
    return gen_response(day_of_week)

Caveats/Notes

  • We are using two new python libraries: SQLAlchemy and FLASK-SQLAlchemy
  • SQLAlchemy is a database agnostic library that enables us to write python that can send SQL commands to a database. Technically if we wanted to use another RDBMS (like MYSQL or SQLite) we could just change the connections string and keep going
  • FLASK-SQLAlchemy is a library that makes it more straightforward to use SQLAlchemy within a FLASK app
  • The URI: postgresql://postgres:postgres@db:5432/quote-db" is saying: use the postgresql protocol to log into the host named "db" with username postgres and password postgres. The default port is 5432, and the db that we are using is quote-db

Caveats/Notes

  • We want to send a SELECT statement to the database in order to retrieve the right quote that was requested
  • The text() class from SQLAlchemy checks our SQL to make sure its right
  • In order to pass a parameter safely to our sql, we use the colon notation ":x" or ":anything-you-want"
  • You then need to bind the parameter with a value from python
  • And then you execute the statement and execute the .all() function
  • The .all() function gives you an array of each row that is returned

Questions?