ZZEN9313

Webinar 2

 

Relational Databases

 

ZZEN9313 20H4

Author: Hayden Smith

Relational databases are essentially just structured tables that are related to one another.


LicenseNumber  Name          Expiry
15063558       Harry Pogson  2020-12-20
22531555       Selina Chua   2021-06-07
13541235       Yu Hou        2020-07-07

Row/Record/Entry

Column/Field

Unique Column

Table/Entity/Relation

CREATE TABLE Citizens (
    LicenseNumber TEXT PRIMARY KEY,
    Name TEXT,
    Expiry DATE
);

Schema
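As a minimal sketch, the schema above can be tried out with Python's built-in sqlite3 module. SQLite is an assumption here (the slides do not name a database engine), and the in-memory database is only for illustration. The rows are the three shown in the table above.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

conn.execute("""
    CREATE TABLE Citizens (
        LicenseNumber TEXT PRIMARY KEY,
        Name TEXT,
        Expiry DATE
    )
""")

# The three rows from the example table.
rows = [
    ("15063558", "Harry Pogson", "2020-12-20"),
    ("22531555", "Selina Chua", "2021-06-07"),
    ("13541235", "Yu Hou", "2020-07-07"),
]
conn.executemany("INSERT INTO Citizens VALUES (?, ?, ?)", rows)

# The primary key (the "unique column") rejects a second row
# that reuses an existing LicenseNumber.
try:
    conn.execute(
        "INSERT INTO Citizens VALUES ('15063558', 'Someone Else', '2022-01-01')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Citizens").fetchone()[0]
print(row_count, duplicate_rejected)
```

The failed insert leaves the table with its original three rows, which is the guarantee a primary key is meant to give.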

Relational Databases

An important part of the most basic relational table is that columns/fields must be atomic. Atomic means it's a single value with a single, known type.

 

The definition of atomic is not always a technical one; sometimes it's a semantic one. For example, is "Mr Hayden Smith" atomic, or is it really a title, a first name, and a last name?


Linking tables


For example, what happens if "Peter" wants to update his name to "Paul"?

Something we don't like about this table: Redundancy

 

Redundancy means that when we mutate a database (i.e. insert a row, update a row, or delete a row) we may have to update it in multiple places.
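The problem with redundancy can be sketched with a small, hypothetical table (the Orders table, its columns, and its rows are assumptions for illustration, not the table from the slide). Peter's name is stored on every one of his rows, so updating only one of them leaves the database disagreeing with itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical un-normalised table: the customer's name is repeated on every order.
conn.execute(
    "CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, CustomerName TEXT, Item TEXT)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, "Peter", "Keyboard"), (2, "Peter", "Mouse"), (3, "Mary", "Monitor")],
)

# Renaming Peter by touching only one of his rows leaves the data inconsistent:
# order 1 now says "Paul" while order 2 still says "Peter".
conn.execute("UPDATE Orders SET CustomerName = 'Paul' WHERE OrderId = 1")

names = [row[0] for row in conn.execute(
    "SELECT CustomerName FROM Orders ORDER BY OrderId"
)]
print(names)  # ['Paul', 'Peter', 'Mary']
```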


We can make this less redundant by normalising the database.

 

We have now removed the redundancy. However, the tables are not yet linked to one another, so we need to link them.


Now we need to link these tables together somehow

 

Primary Key

Foreign Key

Adding a primary key to the table
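A minimal sketch of linking two tables with a primary key and a foreign key follows. The Customers and Orders tables, their columns, and their rows are hypothetical, and SQLite is assumed as the engine. Note that SQLite only enforces foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Each customer is stored once, identified by a primary key.
conn.execute("CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY, Name TEXT)")

# Each order refers to its customer through a foreign key.
conn.execute("""
    CREATE TABLE Orders (
        OrderId INTEGER PRIMARY KEY,
        CustomerId INTEGER REFERENCES Customers(CustomerId),
        Item TEXT
    )
""")
conn.execute("INSERT INTO Customers VALUES (1, 'Peter')")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 1, "Keyboard"), (2, 1, "Mouse")],
)

# One update in one place now changes the name seen by every order.
conn.execute("UPDATE Customers SET Name = 'Paul' WHERE CustomerId = 1")
joined = conn.execute("""
    SELECT o.OrderId, c.Name, o.Item
    FROM Orders AS o
    JOIN Customers AS c ON c.CustomerId = o.CustomerId
    ORDER BY o.OrderId
""").fetchall()

# The foreign key rejects an order that points at a customer who does not exist.
try:
    conn.execute("INSERT INTO Orders VALUES (3, 99, 'Monitor')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(joined, orphan_rejected)
```

The name now lives in exactly one row, so the "Peter to Paul" change is a single update, and the join reassembles the combined view when it is needed.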

Normal Forms

In this course we will explore different levels of normalised databases:

  • First normal form
  • Second normal form
  • Third normal form


BCNF, 4NF, and 5NF can be explored further through independent learning.

1NF: First Normal Form

  • All values are atomic
  • There are no duplicate rows (i.e. you should probably include a primary key)


In this case, we've made all phone numbers atomic. We've also split every multi-valued field so that each row holds a single value (though an explicit primary key isn't defined).

 

source: https://www.youtube.com/watch?v=ABwD8IYByfk
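The move to 1NF can be sketched in a few lines of Python. The names and phone numbers below are made up for illustration and are not the data from the slide: each multi-valued phone field is split into one row per number, and a set drops any duplicate rows.

```python
# Hypothetical rows that break 1NF: the phones field holds several values at once.
raw = [
    ("Alice", "0400 000 001, 0400 000 002"),
    ("Bob", "0400 000 003"),
]

# One atomic phone number per row; the set removes any duplicate rows.
first_normal_form = sorted(
    {(name, phone.strip()) for name, phones in raw for phone in phones.split(",")}
)

for row in first_normal_form:
    print(row)
```

After the split, (name, phone) together identify a row, which is one candidate for the primary key the table still lacks.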

2NF: Second Normal Form

  • Is in 1NF
  • Has no partial dependencies
    • Partial dependency: When a non-key attribute depends on part of the primary key


In this table, { Employee ID, Department ID } make up the primary key. But Office Location (a non-key attribute) depends on Department ID only, which is a partial dependency.

source: https://www.youtube.com/watch?v=ABwD8IYByfk
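One way to remove a partial dependency like this is to move the dependent column into a table keyed by just the part of the key it depends on. The sketch below assumes SQLite; the table names and the rows are hypothetical, and only the column names follow the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Before: OfficeLocation depends only on DepartmentId, which is just part of the key.
conn.execute("""
    CREATE TABLE EmployeeDepartment (
        EmployeeId INTEGER,
        DepartmentId INTEGER,
        OfficeLocation TEXT,
        PRIMARY KEY (EmployeeId, DepartmentId)
    )
""")
conn.executemany(
    "INSERT INTO EmployeeDepartment VALUES (?, ?, ?)",
    [(1, 10, "Sydney"), (2, 10, "Sydney"), (2, 20, "Melbourne")],
)

# After: OfficeLocation lives in a table keyed by DepartmentId alone...
conn.execute(
    "CREATE TABLE Department (DepartmentId INTEGER PRIMARY KEY, OfficeLocation TEXT)"
)
conn.execute("""
    INSERT INTO Department
    SELECT DISTINCT DepartmentId, OfficeLocation FROM EmployeeDepartment
""")

# ...and the link table keeps only the composite key.
conn.execute("""
    CREATE TABLE EmployeeDepartmentLink (
        EmployeeId INTEGER,
        DepartmentId INTEGER REFERENCES Department(DepartmentId),
        PRIMARY KEY (EmployeeId, DepartmentId)
    )
""")
conn.execute("""
    INSERT INTO EmployeeDepartmentLink
    SELECT EmployeeId, DepartmentId FROM EmployeeDepartment
""")

departments = conn.execute(
    "SELECT * FROM Department ORDER BY DepartmentId"
).fetchall()
link_count = conn.execute("SELECT COUNT(*) FROM EmployeeDepartmentLink").fetchone()[0]
print(departments, link_count)
```

Each office location is now stored once per department rather than once per employee in that department.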

3NF: Third Normal Form

  • Is in 2NF
  • Has no transitive dependencies
    • Transitive dependency: When a non-key attribute depends on another non-key attribute, and so depends on the primary key only indirectly


Since { MembershipID => Full Name } and { Full Name => Salutation }, Salutation depends on MembershipID only transitively, so we need to separate it out into its own table.

source: https://www.guru99.com/database-normalization.html
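The sketch below applies the textbook decomposition to the two dependencies stated above: one table for MembershipID => Full Name and one for Full Name => Salutation. SQLite, the table names, and the rows are assumptions for illustration; the referenced source may split the table differently (for example with a separate salutation ID).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Before: Salutation is determined by FullName, not directly by the key MembershipId.
conn.execute(
    "CREATE TABLE Members (MembershipId INTEGER PRIMARY KEY, FullName TEXT, Salutation TEXT)"
)
conn.executemany(
    "INSERT INTO Members VALUES (?, ?, ?)",
    [(1, "Ana Lee", "Ms"), (2, "Ben Ng", "Mr"), (3, "Ben Ng", "Mr")],
)

# After: one table per dependency.
conn.execute("CREATE TABLE MemberNames (MembershipId INTEGER PRIMARY KEY, FullName TEXT)")
conn.execute("INSERT INTO MemberNames SELECT MembershipId, FullName FROM Members")
conn.execute("CREATE TABLE NameSalutations (FullName TEXT PRIMARY KEY, Salutation TEXT)")
conn.execute(
    "INSERT INTO NameSalutations SELECT DISTINCT FullName, Salutation FROM Members"
)

# Joining the two tables recovers the original rows, so no information is lost.
rebuilt = conn.execute("""
    SELECT m.MembershipId, m.FullName, s.Salutation
    FROM MemberNames AS m
    JOIN NameSalutations AS s ON s.FullName = m.FullName
    ORDER BY m.MembershipId
""").fetchall()
original = conn.execute("SELECT * FROM Members ORDER BY MembershipId").fetchall()
salutation_rows = conn.execute("SELECT COUNT(*) FROM NameSalutations").fetchone()[0]
print(rebuilt == original, salutation_rows)
```

The salutation for a given name is now stored once, however many members share that name.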

Data Warehouses

  • Operational databases are optimised for OLTP (Online Transactional Processing)
  • Data warehouse databases are optimised for OLAP (Online Analytical Processing)

 

The typical process is that, at a regular interval (e.g. daily), data is copied from the operational database into the data warehouse.

 

Analysis is then done "offline" on the data warehouse. Since this database is offline and analysis often entails processing huge amounts of data, there are two things to consider:

 

  1. Normalisation comes at a cost for large queries
  2. Since it's offline, there are limited insert/update/deletions, so the risk of anomalies is low

 

This tends to motivate us to use denormalised structures for data warehouses.


Denormalisation

Denormalisation is essentially the reverse process of normalisation: reducing the number of tables and increasing the amount of redundancy.

 

For data warehouses, a common method of denormalisation is to produce a star schema. A star schema contains a central fact table and a set of dimension tables, each of which joins directly to the fact table, so a query never has to follow a chain of joins more than one table away from the fact table.
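A minimal star-schema sketch follows, assuming SQLite and a hypothetical sales warehouse (FactSales, DimProduct, DimDate, and all rows are made up for illustration). Note that Category is stored as plain text in DimProduct rather than in its own table: that redundancy is the denormalisation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes, one row per product or date.
    CREATE TABLE DimProduct (ProductId INTEGER PRIMARY KEY, Name TEXT, Category TEXT);
    CREATE TABLE DimDate (DateId INTEGER PRIMARY KEY, Day TEXT, Month TEXT);

    -- Fact table: one row per sale, pointing directly at every dimension.
    CREATE TABLE FactSales (
        SaleId INTEGER PRIMARY KEY,
        ProductId INTEGER REFERENCES DimProduct(ProductId),
        DateId INTEGER REFERENCES DimDate(DateId),
        Amount REAL
    );

    INSERT INTO DimProduct VALUES
        (1, 'Keyboard', 'Hardware'), (2, 'Mouse', 'Hardware'), (3, 'Editor', 'Software');
    INSERT INTO DimDate VALUES
        (1, '2020-07-01', '2020-07'), (2, '2020-08-01', '2020-08');
    INSERT INTO FactSales VALUES
        (1, 1, 1, 50.0), (2, 2, 1, 20.0), (3, 3, 2, 30.0), (4, 1, 2, 50.0);
""")

# Every join goes straight from the fact table to a dimension table.
totals = conn.execute("""
    SELECT p.Category, d.Month, SUM(f.Amount)
    FROM FactSales AS f
    JOIN DimProduct AS p ON p.ProductId = f.ProductId
    JOIN DimDate AS d ON d.DateId = f.DateId
    GROUP BY p.Category, d.Month
    ORDER BY p.Category, d.Month
""").fetchall()

for row in totals:
    print(row)
```

In a fully normalised operational database, Category would typically sit in its own table behind DimProduct, adding a second hop to this query; the star schema trades that extra table for repeated category text.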


Operational Database

Data Warehouse Database

Assessment 2

This assignment is due Monday 5pm in week 3.

 

The assignment is split up into two parts:

  1. Part 1 focuses on building a normalised database from a very raw table of data. Building occurs in the form of drawing a diagram.
  2. Part 2 focuses on converting an operational database into a star schema.
