Recall that 1NF sets the very basic rules for an organized database:
Manager |
Subordinate1 |
Subordinate2 |
Subordinate3 |
Subordinate4 |
Bob |
Jim |
Mary |
Beth |
|
Mary |
Mike |
Jason |
Carol |
Mark |
Jim |
Alan |
|
|
|
However, recall the first
rule imposed by 1NF: eliminate duplicative columns from the same table.
Clearly, the Subordinate1-Subordinate4 columns are duplicative.
Take a moment and ponder the problems raised by this scenario. Jim only has one subordinate - the
Subordinate2-Subordinate4 columns are simply wasted storage space (a precious
database commodity). Furthermore,
Mary already has 4 subordinates - what happens if she takes on another
employee? The whole table structure would require modification.
At this point, a second
bright idea usually occurs to database novices: We don't want to have more
than one column and we want to allow for a flexible amount of data storage.
Let's try something like this:
Manager |
Subordinates |
Bob |
Jim, Mary, Beth |
Mary |
Mike, Jason, Carol,
Mark |
Jim |
Alan |
This solution is closer,
but it also falls short of the mark. The
subordinates column is still duplicative and non-atomic.
What happens when we need to add or remove a subordinate?
We need to read and write the entire contents of the table.
That's not a big deal in this situation, but what if one manager had
one hundred employees? Also, it
complicates the process of selecting data from the database in future queries.
Here's a table that
satisfies the first rule of 1NF:
Manager |
Subordinate |
Bob |
Jim |
Bob |
Mary |
Bob |
Beth |
Mary |
Mike |
Mary |
Jason |
Mary |
Carol |
Mary |
Mark |
Jim |
Alan |
Now, what about the second
rule: identify each row with a unique column or set of columns (the primary
key)? You might take a look at the
table above and suggest the use of the subordinate column as a primary key.
In fact, the subordinate column is a good candidate for a primary key due
to the fact that our business rules specified that each subordinate may have
only one manager. However, the data
that we've chosen to store in our table makes this a less than ideal solution. What happens if we hire another employee named Jim?
How do we store his manager-subordinate relationship in the database?
It's best to use a truly
unique identifier (like an employee ID or SSN) as a primary key.
Our final table would look like this:
Manager |
Subordinate |
182 |
143 |
182 |
201 |
182 |
123 |
201 |
156 |
201 |
041 |
201 |
187 |
201 |
196 |
143 |
202 |
Now, the table is in first normal form.
On to ... Second Normal Form