Introduction to relational database normalization.
The goal of database design is to generate a set of relation schemas that allows us to store information without unnecessary redundancy or duplication. Before generating tables we need to make sure that our design is optimal and does not allow redundant or duplicate data because redundancy increases the size of the database and complicates our insert, update and delete operations.
To accomplish such a task normalization comes in to picture. Normalization is a set of predefined rules that prevents data duplication. There are 9 normal forms, each normal form is applied if the previous normal form is satisfied. For almost 99% of applications, we need to apply the first three normal forms.
Types of database normalization.
There are various types of normal forms. Our focus will be on the first three normal forms.
First normal form(1NF): First Normal form says each cell in the table should contain an atomic value. In the below example, initially, the instructor column of the first row was containing two values. Then the table was changed to the first normal form.
Second normal form(2NF): Every table should describe only one entity and every column in that table should describe that entity. In the below example, the customer name is duplicated. To remove duplication a separate customers table is created. Although the Id of the customer is duplicated, integral value takes less space than a long string name.
Third normal form(3NF): The table should be in 2ND normal form. The third normal form says a column in the table should not be dependent on another column. In the below example, the balance column is dependent on the total invoice and total payment. So we should drop the balance column. It will be calculated while writing SQL queries.
Thus various types of normal forms help us to design schemas in such a way so that data duplication is avoided, cost of storing data is minimized. Also performing operations like insert, update and delete is less tedious.