Database and SQL Fundamentals

Who am I?

Columbus State Alumni
BSBA and MBA from THE Ohio State University
Over 10 years of IT Experience in Columbus
Currently working for Chase on their Knowledge Management application

What we will learn

What is a database
How to interact with a relational database
What is on the horizon

Important Concepts

Don't Repeat Yourself (Pragmatic Programmer)
Convention over Configuration (Ruby)
Be opinionated but not judgmental (Personal/ Opinionated Software)
Standards Scale(IEEE)
Be yourself (Life)

Required Technologies

Git
Postgres
Node

Other important tools

AWS(RDS, EC2)
IDE (IJ, Eclipse, SublimeText, VSCode)
Git viewer (GitKrakken, SourceViewer)

Review Blackboard

What Is a Database?

Day 1

Feedback

Issues: Create Github Issue
Suggestions: Create Github PR
Help: https://cscc7730.slack.com

Lets get started!

How is data stored?

File System

HFS(+)

Spreadsheets

Microsoft Access

Limitations

Scalability
Performance
Locking

1960s Data Modeling

1970s Enter SQL

1990s Enter RDBMS

RDBMS

(Relational DataBase Management System)

Relation

RDBMS

SQL

User

Other applications

accept SQL!

RDBMS Structure

Relations/Tables
Rows/Tuples
Attributes

Relations are orderless!

PSQL

Lab: Setup Local Database

VS

Lab: Connect Via Gui

SQL Fiddle

SQL Fiddle Lab

2010+ Cloud

2020+ Blockchain

Basic Database Interaction Pt 1

Day 2

Stand Up

Questions

Self-Hosted Servers

Tried and True

Hosted Dedicated Servers

Cheaper but still expensive
No infra cost
More reliable

Hosted Shared Servers

Cheaper still
Only get a portion of resources
Might have performance issues

PAAS Servers

Only pay for what you use
Slightly pricier if run all the time
Allow for either dedicated or shared servers

RDS

(AWS) Relational Database Service

Connecting Using PSQL

Local:

psql

Remote (RDS)

psql -h <address> -p <port> -U <user> <db>

Install Git And Download Repo

Why Caps?

Creating A Schema

To allow many users to use one database without interfering with each other.
To organize database objects into logical groups to make them more manageable.
Third-party applications can be put into separate schemas so they do not collide with the names of other objects.

https://www.postgresql.org/docs/9.1/static/ddl-schemas.html

Nouns == Relation

Creating A Relation

(Table)

Break It Down

Create <THING> <NAME>

Create TABLE THING(

NAME VARCHAR(50)

)

Create TABLE THING(

<ATTRIBUTE> <ATTR_TYPE> (,)

)

Attribute Constraints

Not Null
Reference

CREATE TABLE FOOBAR(

FOO INT NOT NULL,

BAR INT

)

INSERT INTO FOOBAR VALUES (1,0)

INSERT INTO FOOBAR (BAR) VALUES (1)

More on this later!

Attribute Data Types

TLDR

Numeric
Varchar
Boolean
Other

Numeric

Scale Matters

INT

4 Bytes

-2147483648 to +2147483647

DECIMAL

Variable

Casting Numerics

CAST(5.00 as MONEY)

Varchar

Variable Character Array

Boolean

Other

Examples

Byte String
Points
Custom
Composite
more

Other RDBMS Data Types

MySQL Types

Altering A Table

Be Careful!

Create Table With Constraint
Alter Table to add new attribute
Drop new attribute

Basic Create And Delete Lab

Inserting Into a Table

Try Inserting A Tuple

Updating Table Values

UPDATE <TABLE> SET <COLUMN>=<VALUE>

Selecting Data

SELECT retrieves rows from zero or more tables

Simple Select

SELECT 1+1;

From Clause

SELECT * FROM <TABLE>;

Where Clause

SELECT * FROM <TABLE> WHERE <COLUMN> = <VALUE>;

Logical Operators

AND
OR
NOT

Comparison Operators

=
< <=
> >=
<> !=

Comparison Operators

Removing Table Values

Select Lab

Basic Database Interaction Pt 2

Day 3

Stand Up

Questions

Review

Creating Relations

Adding Data

Selecting And Filtering Data

Person

Address

Customer

Profile

Employee

Info

Joins

Joins Are Links Declared In The Query

Join Types

Cross
- Repeated once per record (on both sides)
- With no Where same as Cartesian Product
Qualified joins
- Has restrictions
- Inner is default

Cross Join

"Kind of" From Both Tables

SQL Fiddle

Qualified Join

T1 { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2 ON boolean_expression

Inner Join

What we have in common

SQL Fiddle

Directional

Which side are you interested in?
Sides
- Left
- Right

Left Join

Look to the left

Right Join

Look to the right

Full Outer

Join

From Both Tables

Example

Constraint

Check
Not Null
Unique
Keys

Allows for more fine grained control over types.

Check

Limits Values Allowed

Not Null

Must Have Value

Unique

Basic Constraints Lab

Primary Key

Serial

Foreign Key

References

Cascade

No Action (Default): Check end of transaction
Restrict: Check now
Cascade: Delete referencing row as well
Set Null: Set the reference to null
Set Default: Set all to a value

Cascade

Keys, Joins and Cascade Lab

Composite Keys

Returning

Running SQL Script Files

Functions

Built-in
Declared

Mathematical Operators

+ - / * %

abs() pi() sqrt()

cos() sin() tan()

GROUP BY

Create Receipts Lab

Advanced Database Interactions Pt 1

Day 4

Standup

Questions

Like Clause

Fiddle

Posix Regex

REGEXP_MATCHES(NAME, '(.+)(e)$')

Comparison Predicates

HAVING

As

SELECT NAME AS PRODUCT_NAME

FROM PRODUCT

Other Important

COALESCE
NULLIF
Limit / Order By
Left / Right
BEGIN/END

fiddle

Complex Search Lab

Case

Fiddle

Subqueries

Challenge!

Using nested selects, query all of the items purchased which were over $3 per Person. The result should be Person Name, Product Purchased and Price. There should be only 1 record for Person/Product Combo

Custom SQL Functions

Create Function

Fiddle

CREATE FUNCTION <Function Name>(<Param Type>...) RETURNS <Return Type> AS <Function Query>
LANGUAGE <Lang>

`OR REPLACE`

"It is not possible to change the name or argument types of a function this way (if you tried, you would actually be creating a new, distinct function)"

Function Overloading

Now Lets Create A Function!

Use Case

We to take the following schema...

CREATE TABLE ADDRESS(
  ID SERIAL PRIMARY KEY,
  ADDRESS_LINE_1 VARCHAR(50),
  ADDRESS_LINE_2 VARCHAR(50),
  CITY VARCHAR(50),
  STATE VARCHAR(2),
  ZIPCODE VARCHAR(5)
);
INSERT INTO ADDRESS (ADDRESS_LINE_1, CITY, STATE, ZIPCODE) VALUES ('301 W. 2nd Ave', 'Columbus', 'OH', '43201');

And create a function to concat the values

|| concats a string so `SELECT 'A' || 'B';` returns `AB`

Variable Naming

$$ vs '

Create Function Lab

Backup

What is Important?

Schema

Data

SQL Dump
File Level Archive
Continuous Archiving

SQL Dump

Standard Out

Cons

Large databases may require info to partitioned across multiple files.
Issues query so can lock DB tables

Pros

Super easy for small DBs
Can be specific by using params like --table public.tablename dbname

File System Backup

Database MUST be STOPPED

To back up an individual table copy "pg_clog/*" as well!

Other issues if clustered

Cons

Pros

Doesn't require any Postgres specific interfaces
Easily scripted and automated
No SQL to Restore
No DB Locks

Very space intensive
High I/O on storage
Cannot be specific

Write Ahead Log

Postgres

Logs Query

Log

Storage and Playback

Stores Query For Backup

Cons

Pros

Allows for sets of commands to be "rolled" back or forward in time.
Can provide the ability to "fix" a broken transaction.

Very complex to implement this robustly
Does not provide a "snapshot" of existing data
Most space intensive option

Custom Types

Composite
Enumerated
Range
Base
Array

Custom Type Lab

Views

View Creation Lab

Advanced Database Interactions Pt 2

Day 5

Standup

Relationships

One to One
One to Many
Many to Many

One To One

Used for organizing data
Not easy to insert at the same time
Sometimes used for metadata
Red flag

Person

Customer

Transaction

TransactionProduct

Product

AKA Mapping Table

RDBMSs

Struggle With Multi-Level Relationships

Speaking of Performance!

Indexing

Review

Relations are Orderless Unless You Specify In Query

OR....

You Declare An Index

Pros

Speeds up most selects if applied properly

Cons

Tends to slow down inserts and updates.

Types of Indices

B-tree
Hash
GiST
SP-GiST
GIN
BRIN

B-Tree

Default
Mathematically Complex balanced tree

Large Objects

Provides ability to store large objects
Objects are split into chunks in a B-Tree
End of the day there are better options

Transactions

"The essential point of a transaction is that it bundles multiple steps into a single, all-or-nothing operation."

BEGIN;
<Do Something>
SAVEPOINT <Savepoint_Name>;
<Do More Stuff>
ROLLBACK TO <Savepoint_Name>;
<Do Even More Stuff>
COMMIT;

tomicity

A

C

I

D

urability

solation

onsistency

Atomicity

All or Nothing.

Transaction is a unit.

Consistency

Database goes from a valid state before the transaction to a valid state after the transaction.

Consistency

Database goes from a valid state before the transaction to a valid state after the transaction.

Isolation

Transactions happen in isolation of other transactions.

Durability

Once committed a transaction remains committed.

Postgres is ACID Compliant

Other DBs might not be fully complaint

Security

Authentication
Acess

Trust Authentication

Everyone gets in

Password Authentication

Stores passwords in PostGres

LDAP, ETC Authentication

Stores Password in LDAP

Roles

Grant

Roles Lab

Statistics

Clustering

Using Data

Day 7

Triggers

`When Validation Isn't Enough`

Triggers

Can be used with tables or views
Calls a function

Also Interesting

If multiple triggers of the same kind are defined for the same event, they will be fired in alphabetical order by name.
SELECT does not modify any rows so you cannot create SELECT triggers.

Trigger Lab

LibPQ

C Library For Postgres Communication

...

#include <libpq-fe.h>

...

PGconn *conn = PQconnectdb("user=postgres dbname=postgres");

if (PQstatus(conn) == CONNECTION_BAD) {

...

}

PGresult *res = PQexec(
  conn, 
  "DROP TABLE IF EXISTS Cars"
);
if (PQresultStatus(res) != PGRES_COMMAND_OK) {
    ...
}

ODBC

JDBC

Native server driver, must be installed per OS

Same drive all OSes but must have Java

ORM and JPA

Object

Relational

Mapping

Complex SQL

JSON Object

ORM

Spring

Java Persistence API

JPA == ORM for Java

REST Controller

Mapping Service

JPA Repo

JPA Lab

JPA

@Entity
@Id
@OneToMany
@ManyToOne

SQL

Table
Primary Key
Foreign Key
Mapping Table

spring.jpa.show-sql
spring.jpa.hibernate.ddl-auto
spring.jpa.database-platform

@Qualifier

@Entity("Schema1")

...

@Qualifier("Schema1")

@Entity("Schema2")

...

@Qualifier("Schema2")

Schema1.User

Schema2.User

Also Works With Spring

SQLObject

SQLAlchemy

SQL Injection

"SQL statement that you will unknowingly run on your database"

Mitigation

Open Web Application Security Project (OWASP)

Prepared Statements

Escape Input

Visualization

Tableau

Tableau Lab

D3

Examples

SQL in IoT

On Disk
In Memory

SQLLite

Basic Advantage:

Deals with streams of real time data

Secret Sauce:
In Memory

What are the factors the affect your needs?

Capstone

Create a real-world use case

Use a Postgres DB to store the data for this use-case

Give a presentation on the Use Case

Create a schema
Create at least 4 tables
Create at least 1 Function
Create at least 1 view
SQL Schema
PG_DUMP
JPA Code
Tableau Graph (Optional)
10 minute Presentation

Deliverables

The Future of Databases

Final Class

Review

Relational Databases are the equivalent of structured Binary Trees

NoSQL

"No SQL" or Non-Relational

Real Time Data Streams

Extremely Large Datasets

Neural Network

Tables

Why do We Care?

Link

Document Oriented Database

Pros

Really fast if you are looking for a particular item.
2nd most popular
Allows for storing rather large documents
Highly scalable
Resilient

Cons

Searches are slow
Need to "manage" relationships.
More verbose
Doesn't line up exactly with certain paradigms

AWS Dynamo

Real Time Data Streams

Common Use Cases

IoT
Market Making
Issue Mitigation

Topic

Subscriber

Pros

Can go up and down
Can be replayed (stream capture)
Easily filtered
Super fast processing

Cons

Verbose
Chatty (Network)
Not reliable
Doesn't natively maintain a record

"Big Data"

Extremely large datasets

Distributed or parallel

HDFS: Distributed File System

Mapreduce: Distributed Computational Engine

2 Parts

Mapreduce

Spark

Flatmap Vs Spack Map

Allows for parallel processing