Database and data modelling

  • show understanding of the limitations of using a file-based approach for the storage and retrieval of data
  • describe the features of a relational database which address the limitations of a file-based approach
  • show understanding of the features provided by a DBMS to address the issues of
    • data management, including maintaining a data dictionary
    • data modelling
    • logical schema
    • data integrity
    • data security, including backup procedures and the use of access rights to individuals/groups of users

Objectives

Data stored in database should ... 

  • Ensure
    • Data Integrity
    • Data Independence
    • Data Consistency
  • Allows
    • Data Concurrency
  • Minimal
    • Data Redundancy

Think Pair Share

  1. Research on Integrity, independence, consistency
  2. For each of the above feature:
    1. Give an example (scenario/database) why this is important
    2. Write in the notebook

File-based approach

  • Data stored in a file
  • E.g. as plain-text file like CSV
  • Limitations to this approach
    • The structure of the data file is fixed once created
    • Difficult to edit stored data
    • Data redundancy same piece of data may stored in different part of file / different files
    • Privacy cannot restrict people to read a particular section of the file
    • If same data file is being opened and edited in different computers, data might be inconsistent
    • Dependency since data is stored in a particular structure, the program to consume it will be dependent on the physical structure

DB Management System (DBMS)

  • DBMS is a software to control access to data in a database
  • E.g. MySQL, Access, Oracle Database, Amazon Web Services
  • DBMS manage how physically the data is stored
    • Programmer define the data structure
    • DBMS will store and retrieve the data in physical storage
    • Programmer don't need to handle "how" the data is stored

DBMS abstracts programmer (application programs) from physical files

What DBMS provides

  • High level data modelling
  • Concurrent access of data
    • Multi-user can access the same data-set while DBMS make sure the data integrity
    • One way to achieve is defining lock in database, once a user is editing a particular part of data, others will not able to retrieve/modify same piece of data
  • Querying of large collection of data
    • Very efficient to "search" and retrieve data from DBMS
    • Common language for query is called SQL
  • Backup of data
  • Security
    • Multilevel access control to what a user can retrieve / write
  • Database Administrator (DBA) is the user that manages the database (similar to administrator account in windows)

Relational database

  • Relational database is one of the most common/popular type of databases
  • Each type of item (entity) is stored as a database table
  • Each table will have certain attributes (fields)
  • Tuples or records are each item in a table

Table notation

  • The following notation to describe design of a table
  • Underlined attribute indicates primary key
MEMBERS( MemberId, GivenName, FamilyName, BandName )

BANDS( BandName, AgentId)

Agents( AgentId, Name, ContactName, Telephone)

Primary Key

  • Unique attribute in a table
  • Only one per table
  • To identify the tuple in a table
  • BANDS.BandName, MEMBERS.MemberId and AGENTS.AgentId in the previous example
  • Primary key will also be the primary index - that helps the database to retrieve data faster
    • Similar to the glossary at the end of textbooks, it helps you to locate which page is the term mentioned
    • You can have more than one indexed attribute, they will be called secondary index / key

Foreign Key

  • An attribute of a database table, that refers to the primary key of another table
  • E.g. BANDS.AgentId in the previous example. It stores the primary key from AGENTS table or MEMBERS.BandName
  • Foreign Key need not to be unique. E.g. in the previous example, a band can have more than one member or an agent can represents more than one band. 

Entity-Relationship diagram (ERD)

  • Identify the entity (representing objects/data stored) and the relationships among them

Stepwise refinement approach in ER modelling

  1. Decide the entities within your database (Nouns)
  2. Identify the relationships
  3. Decide the cardinalities of the relationships
    1. One to one (1:1)
    2. One to many (1:m)
    3. Many to many (m:n)
  4. Include the minimum amount:
    1. Zero
    2. One
    3. More than one
  5. Create the full ER diagram by describing the relationships

*(refer to p.162 for example)

ER Symbols

Entity

Cardinality

zero

one

many

Cardinality with minimum

One and only one

Many (at least one)

Many (can be zero)

Many (must be 2 or more)

At most one

Logical ER model

  • Some relationship defined in Conceptual ER Model cannot be implemented in certain database system
  • For Relational database, we cannot model M:N relationships, thus we need to add an additional Link Entity to resolve it. (read p.164) so that it became two 1:M relationships

Normalisation

  • A method to refine design of the database
  • Helps to minimize potential error in database

Following slides extracted from: https://www.1keydata.com/database-normalization

Steps in normalisation

  • Identify the table and columns, find the Primary Key
  • Convert the tables in 1NF
  • Convert tables to 2NF
  • Finally convert them to 3NF
  • Further example: https://www.sqa.org.uk/e-learning/MDBS01CD/page_26.htm

Unnormalised Form (UNF)

  • Contains non-atomic data
  • Repeating groups of data and redundant data
  • No unique key can be identified
    • Note: Key can be composite key (combination of multiple items)

Non-Atomic Data

  • The value of a field, can be split into multiple, same type of values
  • e.g. if a field is called "scores" and contains "95, 80" as value, it is non-atomic since it is containing multiple scores in one single field
  • e.g. Address field contains "MAC, Taipa, Macau" is atomic data, even though it can be split into MAC, Taipa and Macau but they are not same type of value

First normal form (1NF)

  • A Table is in 1NF given that:
    • Contains only atomic value
    • No repeating attributes in the data
      • If we change the table to have 2 fields called color1 and color2, they are repeating groups

1NF

Unnormalised

Second normal form (2NF)

  • Must be 1NF
  • Non-key attributes are only dependent to the primary key
  • Following example is 1NF since no repeating attributes, but not 2NF because Location is not dependent to CustomerID

1NF only

2NF

Third normal form (3NF)

  • Must be 2NF
  • No transitive dependency on attributes to PK
  • Following example is 2NF since GenreType depends on GenreID, and GenreID depends on BookID, so GenreType has transitive dependency to BookID (PK)

2NF only

3NF

Structured Query Language

  • SQL - Programming language provided by DBMS
  • Supports all operations with the database
  • Consists of:
    • Data Definition Language (DDL)
      • Create or alter table structure
    • Data Manipulation Language (DML)
      • Manipulate data (INSERT, UPDATE, DELETE)
      • Query data (SELECT)
  • Practice SQL:
    • http://www.compileonline.com/execute_sql_online.php

DDL - Create database

CREATE DATABASE BandBooking;

DDL - Create table

CREATE TABLE Members 
(
  MemberID  VARCHAR(10),
  Name VARCHAR(50),
  BandName  VARCHAR(30)
);

DDL - Change definition

ALTER TABLE Members ADD Age INTEGER;
-- This is comment
-- primary key

ALTER TABLE Members ADD PRIMARY KEY (MemberID);

-- Foreign key, Note Band table must be created

ALTER TABLE Members ADD FOREIGN KEY (BandName REFERENCES Band(BandName);

DDL - ADD Key

DDL - Delete attribute

ALTER TABLE Members DELETE Age;
INSERT INTO Members(MemberId, Name, BandName)
VALUES ( '123456', 'John Lennon', 'Beatles');

DML - INSERT

SELECT *
FROM Members
WHERE BandName = 'Beatles'
ORDER BY Name

DML - QUERY

-- Note about the use of WHERE

UPDATE Members
SET MemberId='jl1234'
WHERE Name='John Lennon';

DELETE FROM Members
WHERE Name='Justin Bieber';

DML - UPDATE / DELETE

Made with Slides.com