Difference Between Normalization and Denormalization

Normalization vs Denormalization

Relational databases are made up of relations (related tables). Tables are made up of columns. If the tables are two large (i.e. too many columns in one table), then database anomalies can occur. If the tables are two small (i.e. database is made up of many smaller tables), it would be inefficient for querying. Normalization and Denormalization are two processes that are used to optimize the performance of the database. Normalization minimizes the redundancies that are present in data tables. Denormalization (reverse of normalization) adds redundant data or group data.

What is Normalization?

Normalization is a process that is carried out to minimize the redundancies that are present in data in relational databases. This process will mainly divide large tables in to smaller tables with fewer redundancies (called “Normal forms”). These smaller tables will be related to each other through well defined relationships. In a well normalized database, any alteration or modification in data will requires modifying only a single table. First Normal Form (1NF), Second Normal Form (2NF), and the Third Normal Form (3NF) were introduced by Edgar F. Codd. Boyce-Codd Normal Form (BCNF) was introduced in 1974 by Codd and Raymond F. Boyce. Higher Normal Forms (4NF, 5NF and 6NF) have been defined, but they are being used rarely.

A table that complies with 1NF assures that it actually represents a relation (i.e. it does not contain any records that are repeating), and does not contain any attributes that are relational valued (i.e. all the attributes should have atomic values). For a table to comply with 2NF, it should be complied with 1NF and any attribute that is not a part of any candidate key (i.e. non-prime attributes) should fully depend on any of the candidate keys in the table. According to the Codd’s definition, a table is said to be in 3NF, if and only if, that table is in the second normal form (2NF) and every attribute in the table that do not belong to a candidate key should directly depend on every candidate key of that table. BCNF (also known as 3.5NF) captures some the anomalies that are not addressed by the 3NF.

What is Denormalization?

Denormalization is the reverse process of the normalization process. Denormalization works by adding redundant data or grouping data to optimize the performance. Even though, adding redundant data sounds counter-productive, sometimes denormalization is a very important process to overcome some of the shortcomings in the relational database software that may incur heavy performance penalties with normalized databases (even tuned for higher performance). This is because joining several relations (which are results of normalizing) to produce a result to a query can sometimes be slow depending on the actual physical implementation of the database systems.

What is the difference between Normalization and Denormalization?

- Normalization and denormalization are two processes that are completely opposite.

- Normalization is the process of dividing larger tables in to smaller ones reducing the redundant data, while denormalization is the process of adding redundant data to optimize performance.

- Normalization is carried out to prevent databases anomalies.

- Denormalization is usually carried out to improve the read performance of the database, but due to the additional constraints used for denormalization, writes (i.e. insert, update and delete operations) can become slower. Therefore, a denormalized database can offer worse write performance than a normalized database.

- It is often recommended that you should “normalize until it hurts, denormalize until it works”.