Database and SQL Fundamentals

Who am I?

  • Columbus State Alumni
  • BSBA and MBA from THE Ohio State University
  • Over 10 years of IT Experience in Columbus
  • Currently working for Chase on their Knowledge Management application

What we will learn

  • What is a database
  • How to interact with a relational database
  • What is on the horizon

Important Concepts

Required Technologies

  • Git
  • Postgres
  • Node

Other important tools

  • AWS(RDS, EC2)
  • IDE (IJ, Eclipse, SublimeText, VSCode)
  • Git viewer (GitKrakken, SourceViewer)

Review Blackboard

What Is a Database?

Day 1

Feedback

  • Issues: Create Github Issue
  • Suggestions: Create Github PR
  • Help: https://cscc7730.slack.com

Lets get started!

How is data stored?

 

File System

HFS(+)

Spreadsheets

Microsoft Access

Limitations

  • Scalability
  • Performance
  • Locking

1960s Data Modeling

1970s Enter SQL

1990s Enter RDBMS

RDBMS

(Relational DataBase Management System)

Relation

Relation

RDBMS

SQL

User

Other applications

accept SQL!

RDBMS Structure

  • Relations/Tables
  • Rows/Tuples
  • Attributes

Relations are orderless!

PSQL

Lab: Setup Local Database

VS

Lab: Connect Via Gui

SQL Fiddle

SQL Fiddle Lab

2010+ Cloud

2020+ Blockchain

Basic Database Interaction Pt 1

Day 2

Stand Up

Questions

Self-Hosted Servers

Tried and True

Hosted Dedicated Servers

  • Cheaper but still expensive

  • No infra cost

  • More reliable

Hosted Shared Servers

  • Cheaper still

  • Only get a portion of resources

  • Might have performance issues

PAAS Servers

  • Only pay for what you use

  • Slightly pricier if run all the time

  • Allow for either dedicated or shared servers

RDS

(AWS) Relational Database Service

Connecting Using PSQL

Local:

psql

Remote (RDS)

psql -h <address> -p <port> -U <user> <db>

Install Git And Download Repo

Why Caps?

Creating A Schema

  • To allow many users to use one database without interfering with each other.

  • To organize database objects into logical groups to make them more manageable.

  • Third-party applications can be put into separate schemas so they do not collide with the names of other objects.

Nouns == Relation

Creating A Relation

(Table)

Break It Down

Create <THING> <NAME>

Create TABLE THING(

  NAME VARCHAR(50)

)

Create TABLE THING(

  <ATTRIBUTE> <ATTR_TYPE> (,) 

)

Attribute Constraints

  • Not Null
  • Reference

CREATE TABLE FOOBAR(

    FOO INT NOT NULL,

    BAR INT 

)

  

INSERT INTO FOOBAR VALUES (1,0)

INSERT INTO FOOBAR (BAR) VALUES (1)

More on this later!

Attribute Data Types

Attribute Data Types

TLDR

  • Numeric
  • Varchar
  • Boolean
  • Other

Numeric

Scale Matters

INT

    4 Bytes

        -2147483648 to +2147483647

DECIMAL

    Variable

 

Casting Numerics

CAST(5.00 as MONEY)

Varchar

Variable Character Array

Boolean

Other

Examples 

  • Byte String
  • Points
  • Custom
  • Composite
  • more

Other RDBMS Data Types

Altering A Table

Be Careful!

  1. Create Table With Constraint

  2. Alter Table to add new attribute

  3. Drop new attribute

Basic Create And Delete Lab

Inserting Into a Table

Try Inserting A Tuple

Updating Table Values

UPDATE <TABLE> SET <COLUMN>=<VALUE>

Selecting Data

SELECT retrieves rows from zero or more tables

Simple Select

SELECT 1+1;

From Clause

SELECT * FROM <TABLE>;

Where Clause

SELECT * FROM <TABLE> WHERE <COLUMN> = <VALUE>;

Logical Operators

  • AND
  • OR
  • NOT

Comparison Operators

  • =
  • < <=
  • > >=
  • <> !=

Comparison Operators

Removing Table Values

Select Lab

Basic Database Interaction Pt 2

Day 3

Stand Up

Questions

Review

Creating Relations

Adding Data

Selecting And Filtering Data

Person

Address

Customer

Profile

Employee

Info

Joins

Joins Are Links Declared In The Query

Join Types

  • Cross
    • Repeated once per record (on both sides)
    • With no Where same as Cartesian Product
  • Qualified joins
    • Has restrictions
    • Inner is default

Cross Join

"Kind of" From Both Tables

Qualified Join

T1 { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2 ON boolean_expression

Inner Join

What we have in common

Directional

  • Which side are you interested in?
  • Sides
    • Left
    • Right

Left Join

Look to the left

Right Join

Look to the right

Full Outer

Join

From Both Tables

Constraint

  • Check
  • Not Null
  • Unique
  • Keys

Allows for more fine grained control over types.

Check

Limits Values Allowed

Not Null

Must Have Value

Unique

Basic Constraints Lab

Primary Key

Serial

Foreign Key

References

Cascade

  • No Action (Default): Check end of transaction
  • Restrict: Check now
  • Cascade: Delete referencing row as well
  • Set Null: Set the reference to null
  • Set Default: Set all to a value

Cascade

Keys, Joins and Cascade Lab

Composite Keys

Returning

Running SQL Script Files

Functions

  • Built-in
  • Declared

Mathematical Operators

+ - / * %

abs() pi() sqrt()

cos() sin() tan()

GROUP BY

Create Receipts Lab

Advanced Database Interactions Pt 1

Day 4

Standup

Questions

Like Clause

Posix Regex

REGEXP_MATCHES(NAME, '(.+)(e)$')

Comparison Predicates

HAVING

As

SELECT NAME AS PRODUCT_NAME

FROM PRODUCT

Other Important

  • COALESCE
  • NULLIF
  • Limit / Order By
  • Left / Right
  • BEGIN/END

 

fiddle

Complex Search Lab

Case

Subqueries

Challenge!

Using nested selects, query all of the items purchased which were over $3 per Person. The result should be Person Name, Product Purchased and Price. There should be only 1 record for Person/Product Combo

Custom SQL Functions

Create Function

CREATE FUNCTION <Function Name>(<Param Type>...) RETURNS <Return Type> AS <Function Query>
LANGUAGE <Lang>

`OR REPLACE`

"It is not possible to change the name or argument types of a function this way (if you tried, you would actually be creating a new, distinct function)"

Function Overloading

Now Lets Create A Function!

Use Case

We to take the following schema...

CREATE TABLE ADDRESS(
  ID SERIAL PRIMARY KEY,
  ADDRESS_LINE_1 VARCHAR(50),
  ADDRESS_LINE_2 VARCHAR(50),
  CITY VARCHAR(50),
  STATE VARCHAR(2),
  ZIPCODE VARCHAR(5)
);
INSERT INTO ADDRESS (ADDRESS_LINE_1, CITY, STATE, ZIPCODE) VALUES ('301 W. 2nd Ave', 'Columbus', 'OH', '43201');

And create a function to concat the values

|| concats a string so `SELECT 'A' || 'B';` returns `AB`

Variable Naming

$$ vs '

Create Function Lab

Backup

What is Important?

Schema

Data

Data

  • SQL Dump
  • File Level Archive
  • Continuous Archiving

SQL Dump

Standard Out

Cons

  • Large databases may require info to partitioned across multiple files.
  • Issues query so can lock DB tables

Pros

  • Super easy for small DBs
  • Can be specific by using params like --table public.tablename dbname

File System Backup

Database MUST be STOPPED

To back up an individual table copy "pg_clog/*" as well!

Other issues if clustered

Cons

Pros

  • Doesn't require any Postgres specific interfaces
  • Easily scripted and automated
  • No SQL to Restore
  • No DB Locks
  • Very space intensive
  • High I/O on storage
  • Cannot be specific

Write Ahead Log

Postgres

Logs Query

Log

Storage and Playback

Stores Query For Backup

Cons

Pros

  • Allows for sets of commands to be "rolled" back or forward in time.
  • Can provide the ability to "fix" a broken transaction.
  • Very complex to implement this robustly
  • Does not provide a "snapshot" of existing data
  • Most space intensive option

Custom Types

  • Composite
  • Enumerated
  • Range
  • Base
  • Array

Custom Type Lab

Views

View Creation Lab

Advanced Database Interactions Pt 2

Day 5

Standup

Relationships

  • One to One
  • One to Many
  • Many to Many

One To One

  • Used for organizing data
  • Not easy to insert at the same time
  • Sometimes used for metadata
  • Red flag

Person

Customer

Customer

Customer

Transaction

Transaction

Transaction

TransactionProduct

Product

Product

Product

AKA Mapping Table

RDBMSs

Struggle With Multi-Level Relationships

More on this later

Speaking of Performance!

Indexing

Review

Relations are Orderless Unless You Specify In Query

OR....

You Declare An Index

Pros

  • Speeds up most selects if applied properly

Cons

  • Tends to slow down inserts and updates. 

Types of Indices

  • B-tree
  • Hash
  • GiST
  • SP-GiST
  • GIN
  • BRIN

B-Tree

  • Default
  • Mathematically Complex balanced tree

Large Objects

  • Provides ability to store large objects
  • Objects are split into chunks in a B-Tree
  • End of the day there are better options

Transactions

  • "The essential point of a transaction is that it bundles multiple steps into a single, all-or-nothing operation."

BEGIN;
<Do Something>
SAVEPOINT <Savepoint_Name>;
<Do More Stuff>
ROLLBACK TO <Savepoint_Name>;
<Do Even More Stuff>
COMMIT;

tomicity

A

C

I

D

urability

solation

onsistency

Atomicity

All or Nothing.

Transaction is a unit.

Consistency

Database goes from a valid state before the transaction to a valid state after the transaction.

Consistency

Database goes from a valid state before the transaction to a valid state after the transaction.

Isolation

Transactions happen in isolation of other transactions.

Durability

Once committed a transaction remains committed.

Postgres is ACID Compliant

Other DBs might not be fully complaint

Security

  • Authentication
  • Acess

Trust Authentication

Everyone gets in

Password Authentication

Stores passwords in PostGres

LDAP, ETC Authentication

Stores Password in LDAP 

Roles

Grant

Roles Lab

Statistics

Clustering

Normalization and Data Modeling

Day 6

Using Data

Day 7

Triggers

`When Validation Isn't Enough`

Triggers

  • Can be used with tables or views
  • Calls a function

Also Interesting

  • If multiple triggers of the same kind are defined for the same event, they will be fired in alphabetical order by name.
  • SELECT does not modify any rows so you cannot create SELECT triggers.

Trigger Lab

LibPQ 

C Library For Postgres Communication

...

#include <libpq-fe.h>

...

PGconn *conn = PQconnectdb("user=postgres dbname=postgres");
if (PQstatus(conn) == CONNECTION_BAD) {

    ...

}

PGresult *res = PQexec(
  conn, 
  "DROP TABLE IF EXISTS Cars"
);
if (PQresultStatus(res) != PGRES_COMMAND_OK) {
    ...
}

ODBC

JDBC

Native server driver, must be installed per OS

Same drive all OSes but must have Java

ORM and JPA

Object

Relational

Mapping

 

Complex SQL

Complex SQL

JSON Object

ORM

Spring

Java Persistence API

JPA == ORM for Java

REST Controller

Mapping Service

JPA Repo

JPA Lab

JPA

  • @Entity
  • @Id
  • @OneToMany
  • @ManyToOne

SQL

  • Table
  • Primary Key
  • Foreign Key
  • Mapping Table
  • spring.jpa.show-sql
  • spring.jpa.hibernate.ddl-auto
  • spring.jpa.database-platform

@Qualifier

@Entity("Schema1")

...

@Qualifier("Schema1")

@Entity("Schema2")

...

@Qualifier("Schema2")

Schema1.User

Schema2.User

Also Works With Spring

SQLObject

SQLAlchemy

SQL Injection

"SQL statement that you will unknowingly run on your database"

Mitigation

Open Web Application Security Project (OWASP)

Prepared Statements

Escape Input

Visualization

Tableau

Tableau Lab

D3

SQL in IoT

  • On Disk
  • In Memory

SQLLite

Basic Advantage:

Deals with streams of real time data

Secret Sauce:
In Memory

What are the factors the affect your needs?

Capstone

Create a real-world use case

Use a Postgres DB to store the data for this use-case

Give a presentation on the Use Case

  • Create a schema
  • Create at least 4 tables
  • Create at least 1 Function
  • Create at least 1 view
  • SQL Schema
  • PG_DUMP
  • JPA Code
  • Tableau Graph (Optional)
  • 10 minute Presentation

Deliverables

The Future of Databases

Final Class

Review

Relational Databases are the equivalent of structured Binary Trees

NoSQL

"No SQL" or Non-Relational

Real Time Data Streams

Extremely Large Datasets

Neural Network

Tables

Why do We Care?

Document Oriented Database

Pros

  • Really fast if you are looking for a particular item.
  • 2nd most popular
  • Allows for storing rather large documents
  • Highly scalable 
  • Resilient

Cons

  • Searches are slow
  • Need to "manage" relationships.
  • More verbose
  • Doesn't line up exactly with certain paradigms

AWS Dynamo

Real Time Data Streams

Common Use Cases

  • IoT
  • Market Making
  • Issue Mitigation

Topic

Subscriber

Subscriber

Subscriber

Pros

  • Can go up and down
  • Can be replayed (stream capture)
  • Easily filtered
  • Super fast processing

Cons

  • Verbose
  • Chatty (Network)
  • Not reliable
  • Doesn't natively maintain a record

"Big Data"

Extremely large datasets

Distributed or parallel

HDFS: Distributed File System

 

 

Mapreduce: Distributed Computational Engine

2 Parts

Mapreduce

Spark

Flatmap Vs Spack Map

Allows for parallel processing

Neural Networks

Graph Theory

G = (V, E)

G: Graph

V: Vertices

E: Edges

Blockchain

Qubits

Presentations

What we will learned

  • What is a database
  • How to interact with a relational database
  • What is on the horizon

What is a Database

  • A Repository for information
  • A BTree Representation
  • Not Magic

Interacting with RDBMS

  • Connect and Create DBs and Schemas
  • Add data to tables within the schema
  • Use this data to answer questions using SQL

What is on the horizon

  • Using visualization tools
  • Working with non-relational dbs
  • Integrating technologies together

Thanks

You guys did great!

SQL Class

By Jackie Gleason

SQL Class

  • 1,257