Database Normalization for Developers
Practice should always be based upon a sound knowledge of theory.
~ Leonardo da Vinci
Rosina Bignall
Blog: http://rosinabignall.com
Twitter: @rosinabignall
JoindIn: https://joind.in/13249
Slides: http://goo.gl/fGnlsq
Assistance Dog Zaakir
http://tamarr.org
Database Normalization
The process of efficiently organizing data in the database
Goals of Normalization
- Arranging data in logical groupings
- Eliminating redundant data
- Ensuring data dependencies make sense
- Access and manipulate the data quickly and efficiently
Who needs it
- Online Transactional Processing
Data vs Information
Business Rules
Relationships
Accuracy and Performance
Poor Normalization causes problems such as
- Excessive Disk I/O
- Poor Performance
- Inaccurate Data
- Incorrect or Missing Data
Definitions
Heading
A set of attributes
Definition
Definition
Tuple
A set of ordered pairs <A,v>, one pair for each attribute in the heading
Definition
Row
Essentially the implementation of a tuple
Definition
Relation
An ordered pair <H,h>, where h is the set of tuples all having heading H
Definition
Table
Essentially the implementation of a relation
Definition
Functional Dependency (FD)
An expression of the form X->Y where X and Y are subsets of the attributes then whenever two tuples have the same value for X they also have the same value for Y
Functional Dependencies
{ CITY } -> { STATUS}
First Normal Form (1NF)
All relations are by definition in 1NF
1NF simply means that each tuple in the relation contains exactly one value for each attribute
Second Normal Form (2NF)
A relation is in second normal form (2NF) if and only if for every nontrivial FD X->Y at least one of the following is true: (a) X is a superkey; (b) Y is a subkey; (c)X is not a subkey.
No repeating groups
Third Normal Form (3NF)
A relation is in third normal form (3NF) if and only if for every nontrivial FD X->Y either (a) X is a superkey; (b) Y is a subkey.
Every attribute must depend on the primary key
Boyce/Codd Normal Form (BCNF)
Every attribute must depend on the key, the whole key and nothing but the key
A relation is in Boyce/Codd normal form (BCNF) if and only if, for every nontrivial FD X->Y, X is a superkey.
Invoice Example
Procedure
- Determine Attributes
- Determine Functional Dependencies
- For every set of functional dependencies with the same left side (X) add a relation with heading containing the left side (X) and all the right sides (Y).
Attributes
- Date
- Inv_no
- Cust_no
- Cust_Address
- Item_Id
- Description
- Unit_price
- Quantity
- Amount
- Total
Functional Dependencies (FDs)
- Inv_no -> Date
- Inv_no -> Cust_no
- Inv_no -> Total
- Cust_no -> Cust_Address
- Item_Id -> Description
- Item_Id -> Unit_price
- Inv_no, Item_Id -> Quantity
- Inv_no, Item_Id -> Amount
Procedure
Determine AttributesDetermine Functional Dependencies- For every set of functional dependencies with the same left side (X) add a relation with heading containing the left side (X) and all the right sides (Y).
Functional Dependencies (FDs)
- Inv_no -> Date
- Inv_no -> Cust_no
- Inv_no -> Total
- Cust_no -> Cust_Address
- Item_Id -> Description
- Item_Id -> Unit_price
- Inv_no, Item_Id -> Quantity
- Inv_no, Item_Id -> Amount
Invoice Relation
- Inv_no -> Date
- Inv_no -> Cust_no
- Inv_no -> Total
Functional Dependencies (FDs)
Inv_no -> DateInv_no -> Cust_noInv_no -> Total- Cust_no -> Cust_Address
- Item_Id -> Description
- Item_Id -> Unit_price
- Inv_no, Item_Id -> Quantity
- Inv_no, Item_Id -> Amount
Customer Relation
- Cust_no -> Cust_Address
Functional Dependencies (FDs)
Inv_no -> DateInv_no -> Cust_noInv_no -> TotalCust_no -> Cust_Address- Item_Id -> Description
- Item_Id -> Unit_price
- Inv_no, Item_Id -> Quantity
- Inv_no, Item_Id -> Amount
Item Relation
- Item_Id -> Description
- Item_Id -> Unit_price
Functional Dependencies (FDs)
Inv_no -> DateInv_no -> Cust_noInv_no -> TotalCust_no -> Cust_AddressItem_Id -> DescriptionItem_Id -> Unit_price- Inv_no, Item_Id -> Quantity
- Inv_no, Item_Id -> Amount
Invoice Detail Relation
- Inv_no, Item_Id -> Quantity
- Inv_no, Item_Id -> Amount
Resulting Relations
Another way
Invoice
- Inv_no
- Date
- Cust_no
- Cust_Address
- Item_Id
- Description
- Unit_Price
- Quantity
- Amount
- Total
1st Normal Form
Invoice
- Inv_no (k)
- Cust_no
- Cust_Address
- Date
- Total
Invoice_Detail
- Inv_no (k)
- Item_Id (k)
- Description
- Unit_price
- Quantity
- Amount
2nd Normal Form
Invoice
- Inv_no (k)
- Cust_no
- Cust_Address
- Date
- Total
Invoice_Detail
- Inv_no (k)
- Item_Id (k)
- Quantity
- Amount
3rd Normal Form
Item
- Item_Id (k)
- Description
- Unit_price
Invoice
- Inv_no (k)
- Cust_no
- Date
- Total
Invoice_Detail
- Inv_no (k)
- Item_Id (k)
- Quantity
- Amount
Boyce/Codd Normal Form
Item
- Item_Id (k)
- Description
- Unit_price
Customer
- Cust_no (k)
- Cust_Address
Resulting Relations
Should you Normalize?
Resources
- Database Design & Relational Theory: Normal Forms & All That Jazz, C.J. Date
- Database In Depth: Relational Theory for Practitioners, C.J. Date
- Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design, Michael J. Hernandez
- An introduction to database systems, 7th ed
Rosina Bignall
Blog: http://rosinabignall.com
Twitter: @rosinabignall
Slides: http://goo.gl/fGnlsq
Feedback please!
https://joind.in/13249
Database Normalization for Developers
By Rosina Bignall
Database Normalization for Developers
A talk for Confoo 2015
- 1,961