Crafting SQL Databases(DDL)

Normalization and Denormalization Strategies

Learning Outcome

5

Decide when denormalization is needed

4

Detect data redundancy and anomalies

3

Identify functional dependencies

2

Explain 1NF, 2NF, 3NF, and BCNF

1

Understand why Normalization is required

Recall

Tables need Primary Keys to uniquely identify records

Foreign Keys connect related tables

Composite Keys handle many-to-many relationships

Indexes improve read performance

Instagram stored post data like this:

But the problem is

  • Username repeats again and again
  • Comments are stored together
  • Updating username requires multiple changes
  • Risk of inconsistency and errors

Even with Primary & Foreign Keys, data can still be:

  • Redundant
  • Difficult to update
  • Error-prone

Databases need rules to:

  • Organize data properly
  • Reduce duplication
  • Avoid anomalies

 These rules are called Normalization

Normalization

Normalization is the process of removing duplicate data from a table.

Normalization divides a large table into smaller tables and links them using relationships.

The main reason for normalization is removing anomalies.

Then ....

The side effect of duplicate data or large tables is called an anomaly.

Types of Anomalies

 1. Insertion Anomaly

Insertion anomaly occurs when we cannot insert data due to missing required values.

2. Updation Anomaly

Update anomaly occurs when updating one value requires multiple updates.

3. Deletion Anomaly

Deletion anomaly occurs when deleting one record unintentionally deletes important data.

Example :

Imagine Instagram stores everything in ONE table:

New user Rahul signs up.

He has no posts yet.

But this table requires:

  • post_id

  • post_caption

So you cannot insert Rahul unless you add fake/NULL post data.

 That’s Insertion Anomaly

(You can't insert user without post.)

Hardik has 500 followers.

But his data appears in two rows.

Now database becomes inconsistent

That’s Update Anomaly

(Same data stored in multiple places → risky updates.)

If his followers become 600,

you must update BOTH rows.

If you update only one row:

Suppose Priya deletes her post:

If this row is deleted...

We also lose:

  • Priya’s username

  • Her follower count

Even though we only wanted to delete her post.

That’s Deletion Anomaly

(Deleting one thing accidentally deletes important data.)

Solution:

To remove anomalies, we divide data into smaller tables.

Users Table

Posts Table

Less duplication

Better data integrity

This process is called Normalization

Types of Dependencies

Relationship between attributes where one attribute determines another.

1. Total Functional Dependency

If all non-key attributes depend on a single key attribute, it is a total functional dependency.

Instagram Example :

User(user_id,username, email, phone)

  • user_id → username

  • user_id → email

  • user_id → phone

All depend on Primary Key (user_id)

2. Partial Functional Dependency

Occurs when non-key attributes depend on part of a composite key.

Instagram Example :

Like(user_id, post_id, username)

  • Composite Key: (user_id, post_id)

  • username depends only on user_id ❌

This creates redundancy

3. Transitive Functional Dependency

Occurs when a non-key attribute depends on another non-key attribute.

Instagram Example :

User(user_id, city_id, city_name)

  • user_id → city_id

  • city_id → city_name

city_name indirectly depends on user_id

Normal Forms

1NF

2NF

3NF

First Normal Form

Second Normal Form

Third Normal Form

First Normal Form (1NF)

A table is in 1NF if:

It has Primary Key

No duplicate data

All values are atomic

Not 1NF

1NF

Example :

post_id ​comments
5001 nice, wow, cool
post_id ​comments
5001 nice
5001 wow
5001 cool

Second Normal Form (2NF)

A table is in 2NF if:

It is already in 1NF

It has no partial dependency

Not 2NF

2NF

Example :

user_id ​post_id username
101 9001 Hardik
101 9002 Hardik
102 9001 Priya
user_id username
101 Hardik
102 Priya
user_id post_id
101 9001
101 9002
102 9001

User Table

Like Table

Third Normal Form (3NF)

A table is in 3NF if:

It is already in 2NF

It has no transitive dependency

Not 3NF

3NF

Example :

user_id city_id city_name
101 1 Mumbai
102 2 Pune
103 1 Mumbai
user_id city_id
101 1
102 2
103 1
city_id city_name
1 Mumbai
2 Pune

User Table

City Table

BCNF (Boyce–Codd Normal Form)

Rule

Stronger version of 3NF

Every determinant must be a candidate key

When needed:

Multiple candidate keys exist

Complex business rules

 What is Denormalization?

  • Intentionally breaking normalization
  • To improve performance

Example:

  • Store username inside Post table
  • Avoid joins for fast reads

Used when:

  • System is read-heavy
  • Performance is more important than redundancy

Summary

5

Denormalization is a performance choice, not a mistake

4

BCNF → Strong dependency rules

3

3NF → No transitive dependency

2

1NF → Atomic values and 2NF → No partial dependency

1

Normalization organizes data logically

Quiz

A table is in First Normal Form (1NF) if:

A. It has no foreign keys

B. It has atomic values and a primary key

C. It has composite keys only

D. It allows multi-valued attributes

Quiz-Answer

A table is in First Normal Form (1NF) if:

A. It has no foreign keys

B. It has atomic values and a primary key

C. It has composite keys only

D. It allows multi-valued attributes

SQL : Normalization-and-denormalization-strategies

By Content ITV

SQL : Normalization-and-denormalization-strategies

  • 39