Storage Systems I
INFO 153B/253B: Backend Web Architecture
Kay Ashaolu
Why Data Storage?
- When we make a web request, where do we get the data from?
- When we create data, where do we put it?
- Where are "resources" stored?
Example: bit.ly
-
Lots of data to store
- Shortcut to url mapping
- Statistics about links
- Information about users
Example: bit.ly
long url http://news.google.com
short url http://bit.ly/awekl
hit count 482240
long url http://facebook.com/user/profile
short url http://bit.ly/czasw
hit count 11023
long url http://msnbc.com/news/article/
short url http://bit.ly/olkjpl
hit count 1232
Data Storage Design
- What is the storage format?
- How do we lay out data?
- How do we access data?
Why use a file?
- What are the pros and cons?
http://news.google.com, http://bit.ly/awekl, 482240
http://facebook.com/user/profile, http://bit.ly/czasw, 11023
http://msnbc.com/news/article, http://bit.ly/olkjpl, 1232
Problems with files
- What if we want to add another field?
- What if we want to query different parts of data? How efficient is this?
- What if we have concurrent accesses?
- What data structures should we use?
Data Independence
- Databases: apps shouldn’t have to worry about these problems!
- Underlying storage format independent of application-level logic
Relational Data Stores
- RDBMS: Relational Database Management System
- Invented in the 1970s
- e.g., Oracle, MySQL, Postgres, IBM DB2, Microsoft SQL Server
- We are going to use Postgres in this class
Relational Model
- Reason about sets of facts, or "tables"
- Each fact is a "row"
- Attributes are "columns" of row
For example
For example
SQL Query Language
- High-level query language over tables
- Declarative: say "what" you want computed, not "how"
- Why is this good?
RDBMS uses SQL
- Postgres is one of the most popular and versatile databases that are widely used
- If you ever used shared hosting, you probably had the option of creating a Postgres databases
- Traditionally a RDBMS, but it's object relational/extensible nature makes it great for surprising non-relational use cases
We are going to use Docker to run Postgres!
- One of the great benefits of containerization is that you can easily spin up an instance of any service (database, queue, rest api, etc.) as long as the developers maintain a docker image of it
- And Postgres has done that: most open source applications have some sort of docker image available (typically at hub.docker.com)
Before we start, a word about persistence
- Right now anything that we save in our docker containers are lost once we stop the container
- That is because containers are inherently empherial: they primary use CPU (processing) and RAM (temporary storage) to run
- They can save files to disk (e.g. your code files), but those files are deleted when the container is removed
Before we start, a word about persistence
- This isn't going to work with most use cases of databases 😀
- The entire point of a database is to store data and make it available to be retrieved and edited at a later time
- Depending on a container to remain running in order to keep data is a very dangerous proposition
- Imagine if all of the data on your computer was deleted every time you shut it down?
Docker's solution: volumes
- A volume is an independent concept in Docker (just like images and containers)
- "A Docker volume is an independent file system entirely managed by Docker and exists as a normal file or directory on the host, where data is persisted." (from https://earthly.dev/blog/docker-volumes/)
- In essence you are telling docker to create an entity that refers to a folder on your local computer
- This folder can then be mapped to a folder inside a running container
- Anything saved in that folder inside the container is saved in the mapped local folder
- Now we can run SQL commands:
Let's use volumes to start up a Postgres container
# Create a volume managed by docker. By default it is stored in a
# subdirectory where docker is stored on your local computer
docker volume create demo-postgres-vol
# Create a container from the image postgres
docker run --name demo-postgres \
-e POSTGRES_USERNAME=postgres \
-e POSTGRES_PASSWORD=postgres \
-e POSTGRES_DB=demo_db \
-v demo-postgres-vol:/var/lib/postgresql/data \
-dit postgres
# See the running logs of your database container
docker logs -f demo-postgres
What's happening?
What's happening?
What's happening?
What's happening?
Awesome! Now how do I access the database
docker exec -it demo-postgres /bin/bash
Awesome! Now how do I access the database
psql -U postgres demo_db
Create Example
CREATE TABLE links (
long_url VARCHAR(1000) NOT NULL,
short_url VARCHAR(20) NOT NULL,
hit_count BIGINT DEFAULT 0,
created DATE
);
INSERT INTO links (long_url, short_url, hit_count, created) VALUES ('http://www.google.com', 'qwelmw', 2, '2016-04-05'),
('http://www.twitter.com', 'adfer', 45, '2016-08-05');
Select Example
SELECT * FROM links;
long_url |
short_url | hit_count | created |
---|---|---|---|
http://www.google.com |
qwelmw | 2 | 2016-4-5 |
http://www.facebook.com |
adfer | 45 | 2016-8-5 |
Select Example
SELECT * FROM links WHERE hit_count < 20;
long_url |
short_url | hit_count | created |
---|---|---|---|
http://www.google.com |
qwelmw | 2 | 2016-4-5 |
Insert Example
SELECT * FROM links WHERE hit_count < 20;
long_url | short_url | hit_count | created |
---|---|---|---|
http://www.google.com | qwelmw | 2 | 2016-4-5 |
http://www.twitter.com | eovle | 9 | 2016-10-28 |
INSERT INTO links VALUES
('http://www.twitter.com', 'eovle', 9, '2016-10-28');
Update Example
SELECT long_url, short_url, hit_count FROM links;
long_url | short_url | hit_count |
---|---|---|
http://www.google.com | qwelmw | 0 |
http://www.facebook.com | adfer | 0 |
http://www.twitter.com | eovle | 9 |
UPDATE links SET hit_count = '0' WHERE created < '2016-10-22';
Questions?
Storage Systems I - Backend Webarch
By kayashaolu
Storage Systems I - Backend Webarch
Course Website: https://www.ischool.berkeley.edu/courses/info/253b
- 1,504