Normalization in SQL to Reduce Data Redundancy

By:   |   Updated: 2024-07-15   |   Comments (1)   |   Related: > Database Design


Problem

Various levels of normalization in SQL can be used to reduce data redundancy and have a better-structured relational data model. This tutorial looks at these various levels with explanations and examples in Microsoft SQL Server for beginners.

Solution

In this tutorial, we will talk about normalization, a process to help organize your relational database design.

What is Normalization in SQL?

It is a process that reduces redundant data, improves data integrity, simplifies table structure, and fixes dependency issues. Normalization has different levels which are covered below.

Database Normalization Levels in a DBMS

The following table summarizes the levels of normalization in a relational model:

Normalization Level Explanation
First Normal Form (1NF) Ensures that the columns are atomic (indivisible) and all values in the column have the same data type. It means that the values cannot be divided into smaller units. For example, if we have several phone numbers inside the column, the value can be divided into smaller units.
Second Normal Form (2NF) Eliminates partial dependencies and makes sure that the non-key attributes are completely and functionally dependent on the primary key. You can have partial dependency when some of the columns partially depend on the primary key. For example, if you have a composed primary key of CustomerID and SalesID and you have the column SalesMonth which depends on the column SalesID, but not the CustomerID column.
Third Normal Form (3NF) Eliminates transitive dependencies. In other words, the non-primary keys do not depend on another non-primary key through a third attribute. The transitive dependency means that an attribute indirectly depends on another attribute.
Boyce-Codd Normal Form (BCNF) Ensures that all determinants of a table are candidate keys. It is also known as normalization 3.5. It ensures that the values determined by another value in the same table must be based on the complete key and not part of it.
Fourth Normal Form (4NF) Handles multi-valued dependencies by checking that there are no independent sets of multi-valued dependencies within a table. The multi-valued dependency exists between two columns when a value in the first column correspond to multiple values in the second one.
Fifth Normal Form (5NF) Verifies that the table information can be derived by joining two or more tables. All the information in a table should be able to be reconstructed joining smaller tables without data loss.

First Normal Form (1NF)

For a database table to comply with 1NF, it should meet the following criteria:

  • A cell cannot have more than a single value (must be indivisible).
  • The rows should contain a primary key to identify each row.
  • There is no top-to-bottom nor left-to-right order.
  • No duplicated rows are allowed.

1NF Example

The following example shows a table that is not in 1NF:

Player Information Team Info Game Stats Arena Info Sponsor
LeBron James, Lakers Frank Vogel, Los Angeles 25, 8, 7, 2023-01-15 Staples Center, Los Angeles Nike
LeBron James, Lakers Frank Vogel, Los Angeles 30, 9, 6, 2023-01-17 Staples Center, Los Angeles Nike
Stephen Curry, Warriors Steve Kerr, San Francisco 32, 5, 4, 2023-01-15 Chase Center, San Francisco Under Armour
Stephen Curry, Warriors Steve Kerr, San Francisco 28, 7, 5, 2023-01-17 Chase Center, San Francisco Under Armour

The game stats are not atomic, we can divide the data. The following example shows how to change this denormalized table to the 1NF:

Player Name Team Name Coach Name Team City Points Assists Rebounds Game Date Arena Arena City Sponsor
LeBron James Lakers Frank Vogel Los Angeles 25 8 7 15/01/2023 Staples Center Los Angeles Nike
LeBron James Lakers Frank Vogel Los Angeles 30 9 6 17/01/2023 Staples Center Los Angeles Nike
Stephen Curry Warriors Steve Kerr San Francisco 32 5 4 15/01/2023 Chase Center San Francisco Under Armour
Stephen Curry Warriors Steve Kerr San Francisco 28 7 5 17/01/2023 Chase Center San Francisco Under Armour

We divided the game stats in points, assists, rebounds and the game date.

Second Normal Form (2NF)

For the 2NF, the database table must accomplish the 1NF and eliminate partial dependencies. The partial dependency occurs when the columns that are not primary keys are partially dependent on a primary key. In other words, if the non-primary keys are dependent on the primary key, it is in the 2NF.

2NF Example

Let's say that we have the previous table with basketball statistics in the 1NF:

Player Name Team Name Coach Name Team City Points Assists Rebounds Game Date Arena Arena City Sponsor
LeBron James Lakers Frank Vogel Los Angeles 25 8 7 15/01/2023 Staples Center Los Angeles Nike
LeBron James Lakers Frank Vogel Los Angeles 30 9 6 17/01/2023 Staples Center Los Angeles Nike
Stephen Curry Warriors Steve Kerr San Francisco 32 5 4 15/01/2023 Chase Center San Francisco Under Armour
Stephen Curry Warriors Steve Kerr San Francisco 28 7 5 17/01/2023 Chase Center San Francisco Under Armour

As you can see, there is some dependency between Team Name, Team City, and Coach Name. There are also dependencies between Arena Name and Arena City. Sponsor does not have a dependency with the other columns, so it can be in a separate table. Finally, the player statistics can also be separated in another table because they do have a dependency with other columns.

To achieve the 2NF, we will divide the table into the following tables:

Players

Player ID Player Name
1 LeBron James
2 Stephen Curry

Teams

Team ID Team Name Coach Name Team City
1 Lakers Frank Vogel Los Angeles
2 Warriors Steve Kerr San Francisco

Arenas

Arena ID Arena Name Arena City
1 Staples Center Los Angeles
2 Chase Center San Francisco

Sponsors

Sponsor ID Sponsor Name
1 Nike
2 Under Armour

PlayerStats

Player ID Team ID Arena ID Sponsor ID Points Assists Rebounds Game Date
1 1 1 1 25 8 7 15/01/2023
1 1 1 1 30 9 6 17/01/2023
2 2 2 2 32 5 4 15/01/2023
2 2 2 2 28 7 5 17/01/2023

Third Second Normal Form (3NF)

The table is in the 3NF when two conditions are accomplished:

  • The table is in 2NF.
  • The transitive dependency should be removed.

The transitive dependency occurs when a non-key attribute depends on another with no key. It should depend on the primary key instead.

3NF Example

Let's take a look at an example to understand the 3NF.

To convert from the 2NF to the 3NF, we need to remove the transitive dependencies first. In this example, the Coach Name and Team City depend on the Team Name. To fix this, we will convert to the 3NF by doing this:

Teams

Team ID Team Name
1 Lakers
2 Warriors

TeamDetails

Team ID Coach Name Team City
1 Frank Vogel Los Angeles
2 Steve Kerr San Francisco

Dividing the table players into teams and team details will remove the transitive dependencies.

The Boyce-Codd Normal Form (BCNF) Normalization

The BCNF was created to fix some normalization problems that were not fixed in the 2NF. A table is in BCNF if all redundancy based on functional dependency is removed. The following example illustrates the BCNF.

BCNF Example

The following example shows how to convert the tables to the BCNF. We create a table PlayerSponsors to make sure that every determinant is a candidate key and we remove the Sponsor ID column from the PlayerStats table.

PlayerSponsors

Player ID Sponsor ID
1 1
2 2

PlayerStats

Player ID Team ID Arena ID Points Assists Rebounds Game Date
1 1 1 25 8 7 15/01/2023
1 1 1 30 9 6 17/01/2023
2 2 2 32 5 4 15/01/2023
2 2 2 28 7 5 17/01/2023

Fourth Normal Form (4NF)

The 4NF is related to the multivalued dependency. A multivalued dependency occurs when one attribute of the table uniquely determines another attribute, and both attributes are independent of each other.

4NF Example

The players can have several sponsors and arenas in different games. That can lead to a multivalued dependency. To fix that, we will modify the PlayerArenas table like this:

PlayerArenas

Player ID Arena ID Game Date
1 1 15/01/2023
1 1 17/01/2023
2 2 15/01/2023
2 2 17/01/2023

We will also remove the Arena ID from the PlayerStats table.

PlayerStats

Player ID Team ID Points Assists Rebounds Game Date
1 1 25 8 7 15/01/2023
1 1 30 9 6 17/01/2023
2 2 32 5 4 15/01/2023
2 2 28 7 5 17/01/2023

Fifth Normal Form (5NF)

Finally, we have the 5NF. This normalization is also called the PJ AND NF (projection and join normal form). This form removes redundancy by recording multi-valued facts by the isolation of semantically-related multiple relationships. In other words, a table is in the 5NF if every non-trivial join dependency in the table is implied by the primary key. To understand this 5NF, we will use an example.

5NF Example

In this example, we have two tables and we will decompose the PlayerStats table into 2 tables:

PlayerPerformance

Player ID Team ID Arena ID Game Date Points Assists Rebounds
1 1 1 15/01/2023 25 8 7
1 1 1 17/01/2023 30 9 6
2 2 2 15/01/2023 32 5 4
2 2 2 17/01/2023 28 7 5

PlayerTeams

Player ID Team ID Game Date
1 1 15/01/2023
1 1 17/01/2023
2 2 15/01/2023
2 2 17/01/2023

The decomposition into two tables ensures that the information can be reconstructed and that the join dependencies are consequences of the candidate keys.

Next Steps

For more related information, refer to the following links:



sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Daniel Calbimonte Daniel Calbimonte is a Microsoft SQL Server MVP, Microsoft Certified Trainer and 6-time Microsoft Certified IT Professional. Daniel started his career in 2001 and has worked with SQL Server 6.0 to 2022. Daniel is a DBA as well as specializes in Business Intelligence (SSIS, SSAS, SSRS) technologies.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2024-07-15

Comments For This Article




Tuesday, July 16, 2024 - 7:24:33 AM - Selcuk K Back To Top (92393)
Under 1NF you remarked: "The rows should contain a primary key to identify each row" But there is no primary key. You did not put it. Why ?














get free sql tips
agree to terms