Storage Systems I

INFO 153B/253B: Backend Web Architecture

Kay Ashaolu

Why Data Storage?

When we make a web request, where do we get the data from?
When we create data, where do we put it?
Where are "resources" stored?

Example: bit.ly

Lots of data to store
- Shortcut to url mapping
- Statistics about links
- Information about users

Example: bit.ly

long url http://news.google.com
short url http://bit.ly/awekl
hit count 482240

long url http://facebook.com/user/profile
short url http://bit.ly/czasw
hit count 11023

long url http://msnbc.com/news/article/
short url http://bit.ly/olkjpl
hit count 1232

Data Storage Design

What is the storage format?
How do we lay out data?
How do we access data?

Why use a file?

What are the pros and cons?

http://news.google.com, http://bit.ly/awekl, 482240
http://facebook.com/user/profile, http://bit.ly/czasw, 11023
http://msnbc.com/news/article, http://bit.ly/olkjpl, 1232

Problems with files

What if we want to add another field?
What if we want to query different parts of data? How efficient is this?
What if we have concurrent accesses?
What data structures should we use?

Data Independence

Databases: apps shouldn’t have to worry about these problems!
Underlying storage format independent of application-level logic

Relational Data Stores

RDBMS: Relational Database Management System
Invented in the 1970s
e.g., Oracle, MySQL, Postgres, IBM DB2, Microsoft SQL Server
We are going to use Postgres in this class

Relational Model

Reason about sets of facts, or "tables"
Each fact is a "row"
Attributes are "columns" of row

For example

SQL Query Language

High-level query language over tables
Declarative: say "what" you want computed, not "how"
Why is this good?

RDBMS uses SQL

Postgres is one of the most popular and versatile databases that are widely used
If you ever used shared hosting, you probably had the option of creating a Postgres databases
Traditionally a RDBMS, but it's object relational/extensible nature makes it great for surprising non-relational use cases

We are going to use Docker to run Postgres!

One of the great benefits of containerization is that you can easily spin up an instance of any service (database, queue, rest api, etc.) as long as the developers maintain a docker image of it
And Postgres has done that: most open source applications have some sort of docker image available (typically at hub.docker.com)

Before we start, a word about persistence

Right now anything that we save in our docker containers are lost once we stop the container
That is because containers are inherently empherial: they primary use CPU (processing) and RAM (temporary storage) to run
They can save files to disk (e.g. your code files), but those files are deleted when the container is removed

Before we start, a word about persistence

This isn't going to work with most use cases of databases 😀
The entire point of a database is to store data and make it available to be retrieved and edited at a later time
Depending on a container to remain running in order to keep data is a very dangerous proposition
Imagine if all of the data on your computer was deleted every time you shut it down?

Docker's solution: volumes

A volume is an independent concept in Docker (just like images and containers)
"A Docker volume is an independent file system entirely managed by Docker and exists as a normal file or directory on the host, where data is persisted." (from https://earthly.dev/blog/docker-volumes/)
In essence you are telling docker to create an entity that refers to a folder on your local computer
This folder can then be mapped to a folder inside a running container
Anything saved in that folder inside the container is saved in the mapped local folder
Now we can run SQL commands:

Let's use volumes to start up a Postgres container

# Create a volume managed by docker. By default it is stored in a 
# subdirectory where docker is stored on your local computer
docker volume create demo-postgres-vol

# Create a container from the image postgres
docker run --name demo-postgres \
-e POSTGRES_USERNAME=postgres \
-e POSTGRES_PASSWORD=postgres \
-e POSTGRES_DB=demo_db \
-v demo-postgres-vol:/var/lib/postgresql/data \
-dit postgres

# See the running logs of your database container
docker logs -f demo-postgres

What's happening?

We first created a volume called "demo-postgres-vol" that points to a subdirectory where docker itself is installed
We then created a container running Postgres with the following characteristics:
- The name is "demo-postgres"
- We set three environment variables. "Environment variables are predetermined values that are typically used to provide the ability to configure a value in your code from outside of your application."

What's happening?

We first created a volume called "demo-postgres-vol" that points to a subdirectory where docker itself is installed
- We mapped this volume to a specific path where Postgres saves all data from their databases. As in, if you installed Postgres on your computer, it would save database data in the /var/lib/postgresql/data folder. So persisting that folder by connecting that path to the docker volume ensures that the database data is persisted

What's happening?

We then created a container running Postgres with the following characteristics:
- The name is "demo-postgres"
- We set three environment variables. "Environment variables are predetermined values that are typically used to provide the ability to configure a value in your code from outside of your application."

What's happening?

The docker image postgres provides the ability to configure the postgres instance inside the container dependent on the environment variables present on startup
In our command:
- POSTGRES_USERNAME sets the username of the user that is configured for the database
- POSTGRES_PASSWORD sets the password for the user that is configured for the database
- POSTGRES_DB is the name of the database that will be automatically created on container startup

Awesome! Now how do I access the database

We are going to use another docker command "docker exec" to execute a command inside the demo-postgres container to execute some SQL commands.
Execute this in a new terminal

docker exec -it demo-postgres /bin/bash

This commands executes the "bssh" command inside the demo-postgres container. bash is the terminal application where you can execute commands just like you are used to in the terminal

Awesome! Now how do I access the database

Now that we are able to send commands to the Postgres docker container, we need to execute an application called "psql" that will enable us to interactively write SQL commands to the database

psql -U postgres demo_db

We now have an interactive terminal to send SQL commands. We are logged into with the "postgres" user that we created on docker container startup, and we are inside the "demo_db" database that was also created
We can now send SQL commands to the database

Create Example

CREATE TABLE links (
    long_url VARCHAR(1000) NOT NULL,
    short_url VARCHAR(20) NOT NULL,
    hit_count BIGINT DEFAULT 0,
    created DATE
);

INSERT INTO links (long_url, short_url, hit_count, created) VALUES ('http://www.google.com', 'qwelmw', 2, '2016-04-05'), 
('http://www.twitter.com', 'adfer', 45, '2016-08-05');

Select Example

SELECT * FROM links;

long_url	short_url	hit_count	created
http://www.google.com	qwelmw	2	2016-4-5
http://www.facebook.com	adfer	45	2016-8-5

Select Example

SELECT * FROM links WHERE hit_count < 20;

long_url	short_url	hit_count	created
http://www.google.com	qwelmw	2	2016-4-5

Insert Example

SELECT * FROM links WHERE hit_count < 20;

long_url	short_url	hit_count	created
http://www.google.com	qwelmw	2	2016-4-5
http://www.twitter.com	eovle	9	2016-10-28

INSERT INTO links VALUES 
	('http://www.twitter.com', 'eovle', 9, '2016-10-28');

Update Example

SELECT long_url, short_url, hit_count FROM links;

long_url	short_url	hit_count
http://www.google.com	qwelmw	0
http://www.facebook.com	adfer	0
http://www.twitter.com	eovle	9

UPDATE links SET hit_count = '0' WHERE created < '2016-10-22';

Storage Systems I

Why Data Storage?

Example: bit.ly

Example: bit.ly

Data Storage Design

Why use a file?

Problems with files

Data Independence

Relational Data Stores

Relational Model

For example

For example

SQL Query Language

RDBMS uses SQL

We are going to use Docker to run Postgres!

Before we start, a word about persistence

Before we start, a word about persistence

Docker's solution: volumes

Let's use volumes to start up a Postgres container

What's happening?

What's happening?

What's happening?

What's happening?

Awesome! Now how do I access the database

Awesome! Now how do I access the database

Create Example

Select Example

Select Example

Insert Example

Update Example

Questions?