By: Daniel Calbimonte | Updated: 2024-07-15 | Comments (1) | Related: > Database Design
Problem
Various levels of normalization in SQL can be used to reduce data redundancy and have a better-structured relational data model. This tutorial looks at these various levels with explanations and examples in Microsoft SQL Server for beginners.
Solution
In this tutorial, we will talk about normalization, a process to help organize your relational database design.
What is Normalization in SQL?
It is a process that reduces redundant data, improves data integrity, simplifies table structure, and fixes dependency issues. Normalization has different levels which are covered below.
Database Normalization Levels in a DBMS
The following table summarizes the levels of normalization in a relational model:
Normalization Level | Explanation |
---|---|
First Normal Form (1NF) | Ensures that the columns are atomic (indivisible) and all values in the column have the same data type. It means that the values cannot be divided into smaller units. For example, if we have several phone numbers inside the column, the value can be divided into smaller units. |
Second Normal Form (2NF) | Eliminates partial dependencies and makes sure that the non-key attributes are completely and functionally dependent on the primary key. You can have partial dependency when some of the columns partially depend on the primary key. For example, if you have a composed primary key of CustomerID and SalesID and you have the column SalesMonth which depends on the column SalesID, but not the CustomerID column. |
Third Normal Form (3NF) | Eliminates transitive dependencies. In other words, the non-primary keys do not depend on another non-primary key through a third attribute. The transitive dependency means that an attribute indirectly depends on another attribute. |
Boyce-Codd Normal Form (BCNF) | Ensures that all determinants of a table are candidate keys. It is also known as normalization 3.5. It ensures that the values determined by another value in the same table must be based on the complete key and not part of it. |
Fourth Normal Form (4NF) | Handles multi-valued dependencies by checking that there are no independent sets of multi-valued dependencies within a table. The multi-valued dependency exists between two columns when a value in the first column correspond to multiple values in the second one. |
Fifth Normal Form (5NF) | Verifies that the table information can be derived by joining two or more tables. All the information in a table should be able to be reconstructed joining smaller tables without data loss. |
First Normal Form (1NF)
For a database table to comply with 1NF, it should meet the following criteria:
- A cell cannot have more than a single value (must be indivisible).
- The rows should contain a primary key to identify each row.
- There is no top-to-bottom nor left-to-right order.
- No duplicated rows are allowed.
1NF Example
The following example shows a table that is not in 1NF:
Player Information | Team Info | Game Stats | Arena Info | Sponsor |
---|---|---|---|---|
LeBron James, Lakers | Frank Vogel, Los Angeles | 25, 8, 7, 2023-01-15 | Staples Center, Los Angeles | Nike |
LeBron James, Lakers | Frank Vogel, Los Angeles | 30, 9, 6, 2023-01-17 | Staples Center, Los Angeles | Nike |
Stephen Curry, Warriors | Steve Kerr, San Francisco | 32, 5, 4, 2023-01-15 | Chase Center, San Francisco | Under Armour |
Stephen Curry, Warriors | Steve Kerr, San Francisco | 28, 7, 5, 2023-01-17 | Chase Center, San Francisco | Under Armour |
The game stats are not atomic, we can divide the data. The following example shows how to change this denormalized table to the 1NF:
Player Name | Team Name | Coach Name | Team City | Points | Assists | Rebounds | Game Date | Arena | Arena City | Sponsor |
---|---|---|---|---|---|---|---|---|---|---|
LeBron James | Lakers | Frank Vogel | Los Angeles | 25 | 8 | 7 | 15/01/2023 | Staples Center | Los Angeles | Nike |
LeBron James | Lakers | Frank Vogel | Los Angeles | 30 | 9 | 6 | 17/01/2023 | Staples Center | Los Angeles | Nike |
Stephen Curry | Warriors | Steve Kerr | San Francisco | 32 | 5 | 4 | 15/01/2023 | Chase Center | San Francisco | Under Armour |
Stephen Curry | Warriors | Steve Kerr | San Francisco | 28 | 7 | 5 | 17/01/2023 | Chase Center | San Francisco | Under Armour |
We divided the game stats in points, assists, rebounds and the game date.
Second Normal Form (2NF)
For the 2NF, the database table must accomplish the 1NF and eliminate partial dependencies. The partial dependency occurs when the columns that are not primary keys are partially dependent on a primary key. In other words, if the non-primary keys are dependent on the primary key, it is in the 2NF.
2NF Example
Let's say that we have the previous table with basketball statistics in the 1NF:
Player Name | Team Name | Coach Name | Team City | Points | Assists | Rebounds | Game Date | Arena | Arena City | Sponsor |
---|---|---|---|---|---|---|---|---|---|---|
LeBron James | Lakers | Frank Vogel | Los Angeles | 25 | 8 | 7 | 15/01/2023 | Staples Center | Los Angeles | Nike |
LeBron James | Lakers | Frank Vogel | Los Angeles | 30 | 9 | 6 | 17/01/2023 | Staples Center | Los Angeles | Nike |
Stephen Curry | Warriors | Steve Kerr | San Francisco | 32 | 5 | 4 | 15/01/2023 | Chase Center | San Francisco | Under Armour |
Stephen Curry | Warriors | Steve Kerr | San Francisco | 28 | 7 | 5 | 17/01/2023 | Chase Center | San Francisco | Under Armour |
As you can see, there is some dependency between Team Name, Team City, and Coach Name. There are also dependencies between Arena Name and Arena City. Sponsor does not have a dependency with the other columns, so it can be in a separate table. Finally, the player statistics can also be separated in another table because they do have a dependency with other columns.
To achieve the 2NF, we will divide the table into the following tables:
Players
Player ID | Player Name |
---|---|
1 | LeBron James |
2 | Stephen Curry |
Teams
Team ID | Team Name | Coach Name | Team City |
---|---|---|---|
1 | Lakers | Frank Vogel | Los Angeles |
2 | Warriors | Steve Kerr | San Francisco |
Arenas
Arena ID | Arena Name | Arena City |
---|---|---|
1 | Staples Center | Los Angeles |
2 | Chase Center | San Francisco |
Sponsors
Sponsor ID | Sponsor Name |
---|---|
1 | Nike |
2 | Under Armour |
PlayerStats
Player ID | Team ID | Arena ID | Sponsor ID | Points | Assists | Rebounds | Game Date |
---|---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 25 | 8 | 7 | 15/01/2023 |
1 | 1 | 1 | 1 | 30 | 9 | 6 | 17/01/2023 |
2 | 2 | 2 | 2 | 32 | 5 | 4 | 15/01/2023 |
2 | 2 | 2 | 2 | 28 | 7 | 5 | 17/01/2023 |
Third Second Normal Form (3NF)
The table is in the 3NF when two conditions are accomplished:
- The table is in 2NF.
- The transitive dependency should be removed.
The transitive dependency occurs when a non-key attribute depends on another with no key. It should depend on the primary key instead.
3NF Example
Let's take a look at an example to understand the 3NF.
To convert from the 2NF to the 3NF, we need to remove the transitive dependencies first. In this example, the Coach Name and Team City depend on the Team Name. To fix this, we will convert to the 3NF by doing this:
Teams
Team ID | Team Name |
---|---|
1 | Lakers |
2 | Warriors |
TeamDetails
Team ID | Coach Name | Team City |
---|---|---|
1 | Frank Vogel | Los Angeles |
2 | Steve Kerr | San Francisco |
Dividing the table players into teams and team details will remove the transitive dependencies.
The Boyce-Codd Normal Form (BCNF) Normalization
The BCNF was created to fix some normalization problems that were not fixed in the 2NF. A table is in BCNF if all redundancy based on functional dependency is removed. The following example illustrates the BCNF.
BCNF Example
The following example shows how to convert the tables to the BCNF. We create a table PlayerSponsors to make sure that every determinant is a candidate key and we remove the Sponsor ID column from the PlayerStats table.
PlayerSponsors
Player ID | Sponsor ID |
---|---|
1 | 1 |
2 | 2 |
PlayerStats
Player ID | Team ID | Arena ID | Points | Assists | Rebounds | Game Date |
---|---|---|---|---|---|---|
1 | 1 | 1 | 25 | 8 | 7 | 15/01/2023 |
1 | 1 | 1 | 30 | 9 | 6 | 17/01/2023 |
2 | 2 | 2 | 32 | 5 | 4 | 15/01/2023 |
2 | 2 | 2 | 28 | 7 | 5 | 17/01/2023 |
Fourth Normal Form (4NF)
The 4NF is related to the multivalued dependency. A multivalued dependency occurs when one attribute of the table uniquely determines another attribute, and both attributes are independent of each other.
4NF Example
The players can have several sponsors and arenas in different games. That can lead to a multivalued dependency. To fix that, we will modify the PlayerArenas table like this:
PlayerArenas
Player ID | Arena ID | Game Date |
---|---|---|
1 | 1 | 15/01/2023 |
1 | 1 | 17/01/2023 |
2 | 2 | 15/01/2023 |
2 | 2 | 17/01/2023 |
We will also remove the Arena ID from the PlayerStats table.
PlayerStats
Player ID | Team ID | Points | Assists | Rebounds | Game Date |
---|---|---|---|---|---|
1 | 1 | 25 | 8 | 7 | 15/01/2023 |
1 | 1 | 30 | 9 | 6 | 17/01/2023 |
2 | 2 | 32 | 5 | 4 | 15/01/2023 |
2 | 2 | 28 | 7 | 5 | 17/01/2023 |
Fifth Normal Form (5NF)
Finally, we have the 5NF. This normalization is also called the PJ AND NF (projection and join normal form). This form removes redundancy by recording multi-valued facts by the isolation of semantically-related multiple relationships. In other words, a table is in the 5NF if every non-trivial join dependency in the table is implied by the primary key. To understand this 5NF, we will use an example.
5NF Example
In this example, we have two tables and we will decompose the PlayerStats table into 2 tables:
PlayerPerformance
Player ID | Team ID | Arena ID | Game Date | Points | Assists | Rebounds |
---|---|---|---|---|---|---|
1 | 1 | 1 | 15/01/2023 | 25 | 8 | 7 |
1 | 1 | 1 | 17/01/2023 | 30 | 9 | 6 |
2 | 2 | 2 | 15/01/2023 | 32 | 5 | 4 |
2 | 2 | 2 | 17/01/2023 | 28 | 7 | 5 |
PlayerTeams
Player ID | Team ID | Game Date |
---|---|---|
1 | 1 | 15/01/2023 |
1 | 1 | 17/01/2023 |
2 | 2 | 15/01/2023 |
2 | 2 | 17/01/2023 |
The decomposition into two tables ensures that the information can be reconstructed and that the join dependencies are consequences of the candidate keys.
Next Steps
For more related information, refer to the following links:
- Implement Dimension Denormalization Using Synapse Mapping Data Flow
- Understanding First Normal Form in SQL Server
- Explore the Role of Normal Forms in Dimensional Modeling
- How to create a SQL Server foreign key
- Drop and Re-Create All Foreign Key Constraints in SQL Server
- SQL Server Foreign Key Update and Delete Rules
- SQL Queries Tutorial
- SQL Quick Reference Guide
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2024-07-15