Jump to content

First normal form

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 80.62.117.218 (talk) at 09:45, 8 June 2021 (Explain normalization process, with reference to Codd). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form if and only if the domain of each attribute contains only atomic (indivisible) values, and the value of each attribute contains only a single value from that domain.[1] The first definition of the term, in a 1971 conference paper by Edgar Codd, defined a relation to be in first normal form when none of its domains have any sets as elements.[2] These two definitions do not conflict with each other.[citation needed]

First normal form is an essential property of a relation in a relational database. Database normalization is the process of representing a database in terms of relations in standard normal forms, where first normal is a minimal requirement.

Overview

In a hierarchical database like the IBM Information Management System a record can have fields which themselves contain sets of records. These are known as repeating groups or non-simple domains. First normal form eliminates nested records by turning them into separate relations associated through foreign keys rather than through containment.

The purpose of first normal form is to ensure that all data values can be uniquely identified just by relation name and attribute name. Nested records require more complex addressing, where a data language have to navigate through the hierarchy to identity a data item. This also couples addressing tightly to the structure of the relationships. With first normal form, addressing is separated from the structure of the relationships which makes the data language simpler and more resilient to changes in the data model. [3]

Most relational databases management systems does not support nested records, so tables are in first normal form by default. Normalization to first normal form would therefore be a necessary step when moving data from a hierarchical database to a relational database.

First normal form was introduced by E.F. Codd in the paper "A Relational Model of Data for Large Shared Data Banks", although it was initially just called "Normal Form". It was renamed to "First Normal Form" when additional normal forms were introduced in the paper Further Normalization of the Relational Model [4]

Examples

The following scenarios first illustrate how a database design might violate first normal form, followed by examples that comply.

Designs that violate 1NF

Below is a table that stores the names and telephone numbers of customers. One requirement, though, is to retain multiple telephone numbers for some customers. In database supporting non-simple domains, the "Telephone Number" field could contain a set of phone numbers:

Customer
Customer ID First Name Surname Telephone Number
123 Pooja Singh
555-861-2025
192-122-1111
456 San Zhang
(555) 403-1659 Ext. 53
182-929-2929
789 John Doe
555-808-9633

This violates the first normal form because some attributes (Telephone Number) contains sets of values rather than atomic values.

Designs that comply with 1NF

To bring the model into the first normal form, we can perform normalization. Normalization (to first normal form) is a process where attributes with non-simple domains are extracted to separate stand-alone relations. The extracted relations are amended with foreign keys referring to the primary key of the relation which contained it. The process can be applied recursively to non-simple domains nested in multiple levels.[5]

In this example, Customer ID is the primary key of the containing relations and will therefore be appended as foreign key to the new relation:

Customer Name
Customer ID First Name Surname
123 Pooja Singh
456 San Zhang
789 John Doe
Customer Telephone Number
Customer ID Telephone Number
123 555-861-2025
123 192-122-1111
456 (555) 403-1659 Ext. 53
456 182-929-2929
789 555-808-9633

Columns do not contain more than one telephone number in this design. Instead, each Customer-to-Telephone Number link appears on its own row. Using Customer ID as key, a one-to-many relationship exists between the name and the number tables. A row in the "parent" table, Customer Name, can be associated with many telephone number rows in the "child" table, Customer Telephone Number, but each telephone number belongs to one, and only one customer. (In the "real" world, this would not be a good assumption.) It is worth noting that this design meets the additional requirements for second and third normal form.

Atomicity

Edgar F. Codd's definition of 1NF makes reference to the concept of 'atomicity'. Codd states that the "values in the domains on which each relation is defined are required to be atomic with respect to the DBMS."[6] Codd defines an atomic value as one that "cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions)"[7] meaning a column should not be divided into parts with more than one kind of data in it such that what one part means to the DBMS depends on another part of the same column.

Hugh Darwen and Chris Date have suggested that Codd's concept of an "atomic value" is ambiguous, and that this ambiguity has led to widespread confusion about how 1NF should be understood.[8][9] In particular, the notion of a "value that cannot be decomposed" is problematic, as it would seem to imply that few, if any, data types are atomic:

  • A character string would seem not to be atomic, as the RDBMS typically provides operators to decompose it into substrings.
  • A fixed-point number would seem not to be atomic, as the RDBMS typically provides operators to decompose it into integer and fractional components.
  • An ISBN would seem not to be atomic, as it includes language and publisher identifier.

Date suggests that "the notion of atomicity has no absolute meaning":[10][11] a value may be considered atomic for some purposes, but may be considered an assemblage of more basic elements for other purposes. If this position is accepted, 1NF cannot be defined with reference to atomicity. Columns of any conceivable data type (from string types and numeric types to array types and table types) are then acceptable in a 1NF table—although perhaps not always desirable; for example, it may be more desirable to separate a Customer Name column into two separate columns as First Name, Surname.

1NF tables as representations of relations

According to Date's definition, a table is in first normal form if and only if it is "isomorphic to some relation", which means, specifically, that it satisfies the following five conditions:[12]

  1. There's no top-to-bottom ordering to the rows.
  2. There's no left-to-right ordering to the columns.
  3. There are no duplicate rows.
  4. Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
  5. All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].

Violation of any of these conditions would mean that the table is not strictly relational, and therefore that it is not in first normal form.

Examples of tables (or views) that would not meet this definition of first normal form are:

  • A table that lacks a unique key constraint. Such a table would be able to accommodate duplicate rows, in violation of condition 3.
  • A view whose definition mandates that results be returned in a particular order, so that the row-ordering is an intrinsic and meaningful aspect of the view. (Such views cannot be created using SQL that conforms to the SQL:2003 standard.) This violates condition 1. The tuples in true relations are not ordered with respect to each other.
  • A table with at least one nullable attribute. A nullable attribute would be in violation of condition 4, which requires every column to contain exactly one value from its column's domain. This aspect of condition 4 is controversial. It marks an important departure from Codd's later vision of the relational model,[13] which made explicit provision for nulls.[14] First normal form, as defined by Chris Date, permits relation-valued attributes (tables within tables). Date argues that relation-valued attributes, by means of which a column within a table can contain a table, are useful in rare cases.[15]

See also

References

  1. ^ Elmasri, Ramez; Navathe, Shamkant B. (July 2003). Fundamentals of Database Systems (Fourth ed.). Pearson. p. 315. ISBN 0321204484. It states that the domain of an attribute must include only atomic (simple, indivisible) values and that the value of any attribute in a tuple must be a single value from the domain of that attribute.
  2. ^ Codd, E. F. (October 1972). Further normalization of the database relational model. Courant Institute: Prentice-Hall. ISBN 013196741X. A relation is in first normal form if it has the property that none of its domains has elements which are themselves sets. {{cite book}}: |work= ignored (help)
  3. ^ Codd, E.F (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM. Classics. 13 (6): 377–87.
  4. ^ Codd, E. F. (1971). Further Normalization of the Relational Model. Courant Computer Science Symposium 6 in Data Base Systems edited by Rustin, R.
  5. ^ Codd, E.F (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM. Classics. 13 (6): 377–87. p. 381
  6. ^ Codd, E. F. The Relational Model for Database Management Version 2 (Addison-Wesley, 1990).
  7. ^ Codd, E. F. The Relational Model for Database Management Version 2 (Addison-Wesley, 1990), p. 6.
  8. ^ Darwen, Hugh. "Relation-Valued Attributes; or, Will the Real First Normal Form Please Stand Up?", in C. J. Date and Hugh Darwen, Relational Database Writings 1989-1991 (Addison-Wesley, 1992).
  9. ^ Date, C. J. (2007). What First Normal Form Really Means. Apress. p. 108. ISBN 978-1-4842-2029-0. '[F]or many years,' writes Date, 'I was as confused as anyone else. What's worse, I did my best (worst?) to spread that confusion through my writings, seminars, and other presentations.' {{cite book}}: |work= ignored (help)
  10. ^ Date, C. J. (2007). What First Normal Form Really Means. Apress. p. 112. ISBN 978-1-4842-2029-0. {{cite book}}: |work= ignored (help)
  11. ^ Date, C. J. (6 November 2015). SQL and Relational Theory: How to Write Accurate SQL Code. O'Reilly Media. pp. 50–. ISBN 978-1-4919-4115-7. Retrieved 31 October 2018.
  12. ^ Date, C. J. (2007). What First Normal Form Really Means. Apress. pp. 127–128. ISBN 978-1-4842-2029-0. {{cite book}}: |work= ignored (help)
  13. ^ Date, C. J. (2009). "Appendix A.2". SQL and Relational Theory. O'Reilly. Codd first defined the relational model in 1969 and didn't introduce nulls until 1979
  14. ^ Date, C. J. (October 14, 1985). "Is Your DBMS Really Relational?". Computerworld. Null values ... [must be] supported in a fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type. (the third of Codd's 12 rules)
  15. ^ Date, C. J. (2007). What First Normal Form Really Means. Apress. pp. 121–126. ISBN 978-1-4842-2029-0. {{cite book}}: |work= ignored (help)

Further reading