Database and SQL Fundamentals
Who am I?
- Columbus State Alumni
- BSBA and MBA from THE Ohio State University
- Over 10 years of IT Experience in Columbus
- Currently working for Chase on their Knowledge Management application
What we will learn
- What is a database
- How to interact with a relational database
- What is on the horizon
Important Concepts
- Don't Repeat Yourself (Pragmatic Programmer)
- Convention over Configuration (Ruby)
- Be opinionated but not judgmental (Personal/ Opinionated Software)
- Standards Scale(IEEE)
- Be yourself (Life)
Required Technologies
- Git
- Postgres
- Node
Other important tools
- AWS(RDS, EC2)
- IDE (IJ, Eclipse, SublimeText, VSCode)
- Git viewer (GitKrakken, SourceViewer)
Review Blackboard
What Is a Database?
Day 1
Feedback
- Issues: Create Github Issue
- Suggestions: Create Github PR
- Help: https://cscc7730.slack.com
Lets get started!
How is data stored?
File System
HFS(+)
Spreadsheets
Microsoft Access
Limitations
- Scalability
- Performance
- Locking
1960s Data Modeling
1970s Enter SQL
1990s Enter RDBMS
RDBMS
(Relational DataBase Management System)
Relation
Relation
RDBMS
SQL
User
Other applications
accept SQL!
RDBMS Structure
- Relations/Tables
- Rows/Tuples
- Attributes
Relations are orderless!
PSQL
Lab: Setup Local Database
VS
Lab: Connect Via Gui
SQL Fiddle
SQL Fiddle Lab
2010+ Cloud
2020+ Blockchain
Basic Database Interaction Pt 1
Day 2
Stand Up
Questions
Self-Hosted Servers
Tried and True
Hosted Dedicated Servers
-
Cheaper but still expensive
-
No infra cost
-
More reliable
Hosted Shared Servers
-
Cheaper still
-
Only get a portion of resources
-
Might have performance issues
PAAS Servers
-
Only pay for what you use
-
Slightly pricier if run all the time
-
Allow for either dedicated or shared servers
RDS
(AWS) Relational Database Service
Connecting Using PSQL
Local:
psql
Remote (RDS)
psql -h <address> -p <port> -U <user> <db>
Install Git And Download Repo
Why Caps?
Creating A Schema
-
To allow many users to use one database without interfering with each other.
-
To organize database objects into logical groups to make them more manageable.
-
Third-party applications can be put into separate schemas so they do not collide with the names of other objects.
Nouns == Relation
Creating A Relation
(Table)
Break It Down
Create <THING> <NAME>
Create TABLE THING(
NAME VARCHAR(50)
)
Create TABLE THING(
<ATTRIBUTE> <ATTR_TYPE> (,)
)
Attribute Constraints
- Not Null
- Reference
CREATE TABLE FOOBAR(
FOO INT NOT NULL,
BAR INT
)
INSERT INTO FOOBAR VALUES (1,0)
INSERT INTO FOOBAR (BAR) VALUES (1)
Attribute Data Types
Attribute Data Types
TLDR
- Numeric
- Varchar
- Boolean
- Other
Numeric
Scale Matters
INT
4 Bytes
-2147483648 to +2147483647
DECIMAL
Variable
Casting Numerics
CAST(5.00 as MONEY)
Varchar
Variable Character Array
Boolean
Other
Examples
- Byte String
- Points
- Custom
- Composite
- more
Other RDBMS Data Types
Altering A Table
Be Careful!
-
Create Table With Constraint
-
Alter Table to add new attribute
-
Drop new attribute
Basic Create And Delete Lab
Inserting Into a Table
Try Inserting A Tuple
Updating Table Values
UPDATE <TABLE> SET <COLUMN>=<VALUE>
Selecting Data
SELECT retrieves rows from zero or more tables
Simple Select
SELECT 1+1;
From Clause
SELECT * FROM <TABLE>;
Where Clause
SELECT * FROM <TABLE> WHERE <COLUMN> = <VALUE>;
Logical Operators
- AND
- OR
- NOT
Comparison Operators
- =
- < <=
- > >=
- <> !=
Comparison Operators
Removing Table Values
Select Lab
Basic Database Interaction Pt 2
Day 3
Stand Up
Questions
Review
Creating Relations
Adding Data
Selecting And Filtering Data
Person
Address
Customer
Profile
Employee
Info
Joins
Joins Are Links Declared In The Query
Join Types
- Cross
- Repeated once per record (on both sides)
- With no Where same as Cartesian Product
- Qualified joins
- Has restrictions
- Inner is default
Cross Join
"Kind of" From Both Tables
Qualified Join
T1 { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2 ON boolean_expression
Inner Join
What we have in common
Directional
- Which side are you interested in?
- Sides
- Left
- Right
Left Join
Look to the left
Right Join
Look to the right
Full Outer
Join
From Both Tables
Constraint
- Check
- Not Null
- Unique
- Keys
Allows for more fine grained control over types.
Check
Limits Values Allowed
Not Null
Must Have Value
Unique
Basic Constraints Lab
Primary Key
Serial
Foreign Key
References
Cascade
- No Action (Default): Check end of transaction
- Restrict: Check now
- Cascade: Delete referencing row as well
- Set Null: Set the reference to null
- Set Default: Set all to a value
Cascade
Keys, Joins and Cascade Lab
Composite Keys
Returning
Running SQL Script Files
Functions
- Built-in
- Declared
Mathematical Operators
+ - / * %
abs() pi() sqrt()
cos() sin() tan()
GROUP BY
Create Receipts Lab
Advanced Database Interactions Pt 1
Day 4
Standup
Questions
Like Clause
Posix Regex
REGEXP_MATCHES(NAME, '(.+)(e)$')
Comparison Predicates
HAVING
As
SELECT NAME AS PRODUCT_NAME
FROM PRODUCT
Other Important
Complex Search Lab
Case
Subqueries
Challenge!
Using nested selects, query all of the items purchased which were over $3 per Person. The result should be Person Name, Product Purchased and Price. There should be only 1 record for Person/Product Combo
Custom SQL Functions
Create Function
CREATE FUNCTION <Function Name>(<Param Type>...) RETURNS <Return Type> AS <Function Query>
LANGUAGE <Lang>
`OR REPLACE`
"It is not possible to change the name or argument types of a function this way (if you tried, you would actually be creating a new, distinct function)"
Function Overloading
Now Lets Create A Function!
Use Case
We to take the following schema...
CREATE TABLE ADDRESS(
ID SERIAL PRIMARY KEY,
ADDRESS_LINE_1 VARCHAR(50),
ADDRESS_LINE_2 VARCHAR(50),
CITY VARCHAR(50),
STATE VARCHAR(2),
ZIPCODE VARCHAR(5)
);
INSERT INTO ADDRESS (ADDRESS_LINE_1, CITY, STATE, ZIPCODE) VALUES ('301 W. 2nd Ave', 'Columbus', 'OH', '43201');
And create a function to concat the values
|| concats a string so `SELECT 'A' || 'B';` returns `AB`
Variable Naming
$$ vs '
Create Function Lab
Backup
What is Important?
Schema
Data
Data
- SQL Dump
- File Level Archive
- Continuous Archiving
SQL Dump
Standard Out
Cons
- Large databases may require info to partitioned across multiple files.
- Issues query so can lock DB tables
Pros
- Super easy for small DBs
- Can be specific by using params like --table public.tablename dbname
File System Backup
Database MUST be STOPPED
To back up an individual table copy "pg_clog/*" as well!
Other issues if clustered
Cons
Pros
- Doesn't require any Postgres specific interfaces
- Easily scripted and automated
- No SQL to Restore
- No DB Locks
- Very space intensive
- High I/O on storage
- Cannot be specific
Write Ahead Log
Postgres
Logs Query
Log
Storage and Playback
Stores Query For Backup
Cons
Pros
- Allows for sets of commands to be "rolled" back or forward in time.
- Can provide the ability to "fix" a broken transaction.
- Very complex to implement this robustly
- Does not provide a "snapshot" of existing data
- Most space intensive option
Custom Types
- Composite
- Enumerated
- Range
- Base
- Array
Custom Type Lab
Views
View Creation Lab
Advanced Database Interactions Pt 2
Day 5
Standup
Relationships
- One to One
- One to Many
- Many to Many
One To One
- Used for organizing data
- Not easy to insert at the same time
- Sometimes used for metadata
- Red flag
Person
Customer
Customer
Customer
Transaction
Transaction
Transaction
TransactionProduct
Product
Product
Product
AKA Mapping Table
RDBMSs
Struggle With Multi-Level Relationships
Speaking of Performance!
Indexing
Review
Relations are Orderless Unless You Specify In Query
OR....
You Declare An Index
Pros
- Speeds up most selects if applied properly
Cons
- Tends to slow down inserts and updates.
Types of Indices
- B-tree
- Hash
- GiST
- SP-GiST
- GIN
- BRIN
B-Tree
- Default
- Mathematically Complex balanced tree
Large Objects
- Provides ability to store large objects
- Objects are split into chunks in a B-Tree
- End of the day there are better options
Transactions
- "The essential point of a transaction is that it bundles multiple steps into a single, all-or-nothing operation."
BEGIN;
<Do Something>
SAVEPOINT <Savepoint_Name>;
<Do More Stuff>
ROLLBACK TO <Savepoint_Name>;
<Do Even More Stuff>
COMMIT;
tomicity
A
C
I
D
urability
solation
onsistency
Atomicity
All or Nothing.
Transaction is a unit.
Consistency
Database goes from a valid state before the transaction to a valid state after the transaction.
Consistency
Database goes from a valid state before the transaction to a valid state after the transaction.
Isolation
Transactions happen in isolation of other transactions.
Durability
Once committed a transaction remains committed.
Postgres is ACID Compliant
Other DBs might not be fully complaint
Security
- Authentication
- Acess
Trust Authentication
Everyone gets in
Password Authentication
Stores passwords in PostGres
LDAP, ETC Authentication
Stores Password in LDAP
Roles
Grant
Roles Lab
Statistics
Clustering
Normalization and Data Modeling
Day 6
Using Data
Day 7
Triggers
`When Validation Isn't Enough`
Triggers
- Can be used with tables or views
- Calls a function
Also Interesting
- If multiple triggers of the same kind are defined for the same event, they will be fired in alphabetical order by name.
- SELECT does not modify any rows so you cannot create SELECT triggers.
Trigger Lab
LibPQ
C Library For Postgres Communication
...
#include <libpq-fe.h>
...
PGconn *conn = PQconnectdb("user=postgres dbname=postgres");
if (PQstatus(conn) == CONNECTION_BAD) {
...
}
PGresult *res = PQexec( conn, "DROP TABLE IF EXISTS Cars" ); if (PQresultStatus(res) != PGRES_COMMAND_OK) { ... }
ODBC
JDBC
Native server driver, must be installed per OS
Same drive all OSes but must have Java
ORM and JPA
Object
Relational
Mapping
Complex SQL
Complex SQL
JSON Object
ORM
Spring
Java Persistence API
JPA == ORM for Java
REST Controller
Mapping Service
JPA Repo
JPA Lab
JPA
- @Entity
- @Id
- @OneToMany
- @ManyToOne
SQL
- Table
- Primary Key
- Foreign Key
- Mapping Table
- spring.jpa.show-sql
- spring.jpa.hibernate.ddl-auto
- spring.jpa.database-platform
@Qualifier
@Entity("Schema1")
...
@Qualifier("Schema1")
@Entity("Schema2")
...
@Qualifier("Schema2")
Schema1.User
Schema2.User
Also Works With Spring
SQLObject
SQLAlchemy
SQL Injection
"SQL statement that you will unknowingly run on your database"
Mitigation
Open Web Application Security Project (OWASP)
Prepared Statements
Escape Input
Visualization
Tableau
Tableau Lab
D3
SQL in IoT
- On Disk
- In Memory
SQLLite
Basic Advantage:
Deals with streams of real time data
Secret Sauce:
In Memory
What are the factors the affect your needs?
Capstone
Create a real-world use case
Use a Postgres DB to store the data for this use-case
Give a presentation on the Use Case
- Create a schema
- Create at least 4 tables
- Create at least 1 Function
- Create at least 1 view
- SQL Schema
- PG_DUMP
- JPA Code
- Tableau Graph (Optional)
- 10 minute Presentation
Deliverables
The Future of Databases
Final Class
Review
Relational Databases are the equivalent of structured Binary Trees
NoSQL
"No SQL" or Non-Relational
Real Time Data Streams
Extremely Large Datasets
Neural Network
Tables
Why do We Care?
Document Oriented Database
Pros
- Really fast if you are looking for a particular item.
- 2nd most popular
- Allows for storing rather large documents
- Highly scalable
- Resilient
Cons
- Searches are slow
- Need to "manage" relationships.
- More verbose
- Doesn't line up exactly with certain paradigms
AWS Dynamo
Real Time Data Streams
Common Use Cases
- IoT
- Market Making
- Issue Mitigation
Topic
Subscriber
Subscriber
Subscriber
Pros
- Can go up and down
- Can be replayed (stream capture)
- Easily filtered
- Super fast processing
Cons
- Verbose
- Chatty (Network)
- Not reliable
- Doesn't natively maintain a record
"Big Data"
Extremely large datasets
Distributed or parallel
HDFS: Distributed File System
Mapreduce: Distributed Computational Engine
2 Parts
Mapreduce
Spark
Flatmap Vs Spack Map
Allows for parallel processing
Neural Networks
Graph Theory
G = (V, E)
G: Graph
V: Vertices
E: Edges
Blockchain
Qubits
Presentations
What we will learned
- What is a database
- How to interact with a relational database
- What is on the horizon
What is a Database
- A Repository for information
- A BTree Representation
- Not Magic
Interacting with RDBMS
- Connect and Create DBs and Schemas
- Add data to tables within the schema
- Use this data to answer questions using SQL
What is on the horizon
- Using visualization tools
- Working with non-relational dbs
- Integrating technologies together
Thanks
You guys did great!
SQL Class
By Jackie Gleason
SQL Class
- 1,257