Intro to SQL
Structured Query Language
aka
aka
Yay! Data.
What is SQL?
...is a special-purpose programming language designed for managing data held in a relational database management system (RDBMS)
As per Wikipedia: https://en.wikipedia.org/wiki/SQL
SQL is the language that we use to "talk" to our databases to store and retrieve data.
SQL is a standard language in that it has syntax, logic and operators to perform operations.
Database Structure
The structure of our database is a collection of tables consisting of a collection of rows and columns.
Visually we can think of a table much like a spreadsheet in that we have columns representing a consistent type of data value and rows consisting of unique values of this type.
One major difference and benefit of a relational database is that we define strict types of data in our columns.
CRUD
When we think of SQL we generally think of 4 operations that we perform regularly against our databases:
-
Create
- SQL: INSERT
-
Read
- SQL: SELECT
-
Update
- SQL: UPDATE
-
Delete
- SQL: DELETE
CRUD - Create/Insert
To create new records in our database we use the INSERT command to tell the database that we are "inserting" new data into our database.
-- Insert a single new record into the database
INSERT INTO users (first_name, last_name, email, phone)
VALUES ('Jay', 'Won', 'jaywon@dark.net', '555-5555')
This assumes that we have a table in our database named users with columns:
- first_name
- last_name
- phone
CRUD - Read/Select
To select pre-existing record(s) from our database we use the SELECT command to tell the database what it is we are "looking" for in our database.
--Select all rows from all matching records
SELECT *
FROM users
WHERE email = 'jaywon@dark.net'
--Select only certain columns from all matching records
SELECT id, first_name, last_name
FROM users
WHERE email = 'jaywon@dark.net'
--More general search for multiple records with likeness on some value
SELECT *
FROM orders
WHERE grand_total > 100
SELECT and it's counterparts are our primary tool for getting information from the data in our database.
CRUD - Update
To update existing records in our database we use the UPDATE command to tell the database that we are "updating" data that already exists in our database.
--Single record update
UPDATE users
SET first_name = 'Jason',
last_name = 'Sewell'
WHERE id = 1
--Multiple record update
UPDATE users
SET last_logged_in = null
WHERE id > 0
Here we make use of the WHERE clause that specifies a matching condition for which record(s) we want to update and SET to specify which columns we want to update with their new values.
CRUD - Delete
To remove existing records in our database we use the DELETE command to tell the database that we are "removing" data from our database.
--Single record delete
DELETE
FROM users
WHERE id = 1
--Multiple record delete
DELETE
FROM users
WHERE id > 0
Here we make use of the WHERE clause that specifies a matching condition for which record(s) we want to update and SET to specify which columns we want to update with their new values.
Aggregate Queries
Not only can we use SELECT queries to retrieve the data itself but we can also use aggregate functions to get information ABOUT our data from the database.
- AVG
- COUNT
- MIN
- MAX
- SUM
The following functions allow us to perform common simple math operations against our data:
NOTE: null values are commonly ignored in aggregate calculations
AVG, COUNT, SUM
Often times we want to perform simple math on our data to derive information from our data.
--Determine the number of users above age 35
SELECT COUNT(*) users_above_35
FROM users
WHERE age > 35
--Determine the number of users between a certain age
SELECT COUNT(*) users_between_30_35
FROM users
WHERE age BETWEEN 30 AND 35
--Determine the average age of our users
SELECT AVG(age) avg_user_age
FROM users
--Determine a total amount of products sold
SELECT SUM(total) total_purchases_amount
FROM purchases
--Determine a total amount of a certain kind of product sold
SELECT SUM(total) total_shamwow_purchases
FROM purchases
WHERE id = 3
MIN, MAX
Often times we want to perform simple math on our data to derive information from our data.
--Determine the least/most visits to our site
SELECT MIN(site_visits) min_visits
FROM users
SELECT MAX(site_visits) max_visits
FROM users
--Determine the least/most visits to our site between a certain age
SELECT MIN(site_visits)
FROM users
WHERE age BETWEEN 30 AND 35 min_visits_30_35
SELECT MAX(site_visits) max_visits_30_35
FROM users
WHERE age BETWEEN 30 and 35
More Powerful Aggregate Queries using GROUP
The GROUP clause is a powerful tool for performing aggregate queries against an entire table but while grouping records together on a common value.
-- Get total sales of each individual product as opposed to all products
SELECT product_id, SUM(total) total_product_sales
FROM orders
GROUP BY product_id
-- Get total sales on all products by date
SELECT purchase_date, SUM(total) total_sales
FROM orders
GROUP BY purchase_date
-- Get a list of users in order of total purchases to determine ballers
SELECT user_id, SUM(total) total_user_purchases
FROM orders
GROUP BY user_id
ORDER BY total_user_purchases DESC
HAVING vs WHERE
When using GROUP BY to perform aggregate operations with some kind of filtering specification based on the aggregate we need to use HAVING instead of WHERE but we can think of them in the same way.
-- Get a list of users that are only considered ballers
SELECT user_id, SUM(total) total_user_purchases
FROM orders
GROUP BY user_id
HAVING SUM(total) > 100000
ORDER BY total_user_purchases DESC
(Advanced) SQL
While we covered the basic CRUD operations our databases derive most of their value to us and our applications with more advanced query operations, especially in the context of an RDBMS.
What is an RDBMS?
An RDBMS or Relational Database Management System is a database that is compliant with the relational model theory.
The easiest way to understand relational model theory or an RDBMS is that much of our data, while unique in it's own right has meaningful relationships to other data stored in our database and an RDBMS helps us manage that these relationships are defined, exist, are accurate and are enforced.
Modeling an RDBMS
Much like creating objects in OOP that have properties that are specific to that type of object, we model our tables in a similar way.
If we are creating a table to represent users in our system, we only want to create properties that are specific to our user even if our application has broader context like an e-commerce application.
Also, much like programming we want our data to be "DRY" or prevent duplication or redundancy in our data.
Ex. A user can have many purchases but we don't want a record for every purchase that duplicates the user information.
Primary Keys
One of the key factors of a relational database is that we have uniqueness of records so that we can later identify single unique records.
Many records may have duplicate values in their columns so most times we need a unique identifier that only identifies a single row in a table.
To achieve this, we define a primary key on our table that will be guaranteed to be unique to that table and can be referenced from other tables directly.
This is often an INTEGER that we name id and we usually set this up to be AUTO INCREMENTING so that we don't have to determine the new value when records are inserted.
Foreign Keys
A secondary function of defining primary keys is that we can create our relationships from other tables based on the unique primary key our primary table.
These relationships are called constraints in that we say a certain value MUST exist in our primary table for our secondary table records to have proper context.
Foreign key relationships must ALWAYS match a non-primary key column in one table to the primary key column of another table.
Ex. A user can have many purchases but what value is a purchase record to us if we don't know WHO made the purchase.
Constraints
Constraints are "rules" we define on our database to enfore the INTEGRITY of our database.
We expect our database to always allow us to infer proper meaning from the huge amounts of data that can be stored in our database and it's inner relationships.
Like a human, if our database loses it's integrity, how can we trust it.
With our relationships we are ensuring that data in different tables doesn't get lost or lose it's context with relation to other data.
Cascading
With our constraints we can also have cascading effects on our relationships when primary key records are modified(only PK) or deleted.
Ex. When we delete a user from the database also delete all purchases that this user made.
This allows us to keep the integrity of our database in that we don't have purchases in our database that don't have a corresponding user record.
NOTE: Often we don't want to TRULY delete data, as it often continues to have value past visible meaning in your application. Always ask yourself do you want to truly delete or "archive".
JOINS
Since we have created related data in different tables we obviously need a way to "piece" these records back together to have meaning when querying our database.
Good thing we defined our foreign key relationships because now we can simply use the power of JOIN to retrieve meaningful results from our fragmented database!
INNER JOIN
There are multiple types of JOIN(s) but we will start with the most common which is the INNER JOIN.
--Query our purchases table but join our users so that the purchases have more context
SELECT purchases.*, user.id, user.email
FROM purchases
INNER JOIN users ON purchases.user_id = user.id
--Query our purchases table and join our users applying a filter for a certain user
SELECT purchases.*, user.id, user.email
FROM purchases
INNER JOIN users ON purchases.user_id = user.id
WHERE users.id = 3
This will give us a set of rows as a result with columns specified from both tables as a single row.
We removed duplication at the database level but we can query our database in ways that show duplication in our queries without storing data that way.
Transactions
Another benefit of using an RDBMS is that they support transactions.
A transaction is a series of queries you run against your database that performs CRUD operations against multiple tables or records in your database.
The purpose of running these queries within a transaction is that if any of the queries fail for any reason none of the changes made are saved or persisted.
"All or Nothing"
Tranasactions and the ACID Test
An RDBMS is said to support transactions properly if a single transaction meets the following criteria known as the ACID test:
-
Atomicity
- That a transaction completes as either "all or nothing".
-
Consistency
- Before and after the database must be in a reliable state meaning constraints and other rules are preserved.
-
Isolation
- The results of all operations in a transaction are the same as if the queries were run individually.
-
Durability
- Once the transaction is completed there will be no loss of changes to database.
Resources
Intro to SQL
By Jason Sewell
Intro to SQL
- 2,435