Sixth normal form

From Wikipedia, the free encyclopedia
Jump to: navigation, search

Sixth normal form (6NF) is a term in relational database theory, used in two different ways.

6NF (C. Date's definition)[edit]

A book by Christopher J. Date and others on temporal databases,[1] defined sixth normal form as a normal form for databases based on an extension of the relational algebra.

In this work, the relational operators, such as join, are generalized to support a natural treatment of interval data, such as sequences of dates or moments in time.[2] Sixth normal form is then based on this generalized join, as follows:

A relvar R [table] is in sixth normal form (abbreviated 6NF) if and only if it satisfies no nontrivial join dependencies at all — where, as before, a join dependency is trivial if and only if at least one of the projections (possibly U_projections) involved is taken over the set of all attributes of the relvar [table] concerned.[Date et al.][3]

-Also a new definition - not as clear as the first* - has been given by the same authors (2014):

Relvar R is in sixth normal form (6NF) if and only if every JD of R is trivial — where a JD is trivial if and only if one of its components is equal to the pertinent heading in its entirety. [Date et al.] [4]

(Where JD state for join dependency. *We can wonder if a unary relvar has a trivial join dependency; also trivial refer to another ambiguous term, knowing: "pertinent".)

Any relation in 6NF is also in 5NF.

Sixth normal form is intended to decompose relation variables to irreducible components. Though this may be relatively unimportant for non-temporal relation variables, it can be important when dealing with temporal variables or other interval data. For instance, if a relation comprises a supplier's name, status, and city, we may also want to add temporal data, such as the time during which these values are, or were, valid (e.g., for historical data) but the three values may vary independently of each other and at different rates. We may, for instance, wish to trace the history of changes to Status.

For further discussion on Temporal Aggregation in SQL, see also Zimanyi.[5] For a different approach, see TSQL2.[6]

DKNF[edit]

Some authors use the term sixth normal form differently, namely, as a synonym for Domain/key normal form (DKNF). This usage predates Date et al.'s work.[7]

Usage[edit]

The sixth normal form is currently being used in some data warehouses where the benefits outweigh the drawbacks,[8] for example using Anchor Modeling. Although using 6NF leads to an explosion of tables, modern databases can prune the tables from select queries (using a process called 'table elimination') where they are not required and thus speed up queries that only access several attributes.

Examples[edit]

In order for a table to be in 6NF, it has to comply with the 5NF first and then it requires that each table satisfies only trivial join dependencies. Let’s take a simple example[9] with a table already in 5NF: Here, in the users table, every attribute is non null and the primary key is the username:

Users_table

Username Department Status

This table is in 5NF because each join dependency is implied by the candidate key. More specific, the only possible join dependencies are: {username, status}, {username,department}.

The 6NF version would look like this:

Users

Username Status

Users_dept

Username Department

Nevertheless you need to think very much before trying to apply the 6NF normalization because this implies an explosion of tables and may not suit your needs.

Another example in which we can demonstrate the 6NF is when we look at the space occupied. For this we chose the healthcare domain with this table:

TABLE 1

Medic Name Occupation Type Practice in years
Smith James orthopedic specialist 23
Miller Michael orthopedic probationer 4
Thomas Linda neurologist probationer 5
Scott Nancy orthopedic resident 1
Allen Brian neurologist specialist 12
Turner Steven ophthalmologist probationer 3
Collins Kevin ophthalmologist specialist 7
King Donald neurologist resident 1
Harris Sarah ophthalmologist resident 2

The healthcare domain contains several specializations up until the maximum developement in this domain. These are: - resident - probationer - specialist

To get promoted to a higher position, it will take several years for someone to obtain his or her proper training. If a doctor has practiced in the field less than the required period, he or she won’t be able to advance in rank. For example: If Michael Miller, orthopedic probationer, has worked in the medical field for 3 years and 11 months, he won’t be able to become an orthopedic specialist because the minimum period to promote from probationare to specialist is 4 years.

The transition from one position to another is based on an exam. The exam, required to make the progress from one grade to another (for example: from probationer to specialist), can be taken after a period of 4 years.

The next step in applying the 6NF for the Table 1, is to eliminate all non-trivial join dependencies.

TABLE 2.1

Medic Name Occupation Practice in years
Smith James orthopedic 23
Miller Michael orthopedic 4
Thomas Linda neurologist 5
Scott Nancy orthopedic 1
Allen Brian neurologist 12
Turner Steven ophthalmologist 3
Collins Kevin ophthalmologist 7
King Donald neurologist 1
Harris Sarah ophthalmologist 2

TABLE 2.2

Type Min practice Max practice
resident 0 2
probationer 3 5
specialist 6 45

We will show now that passing from 5NF to 6NF also reduces the space occupied by the table. In the brackets, it is indicated how much space each field of the table occupies (in bytes).

Medic Name Occupation Type Practice in years
Smith James[12] orthopedic[11] specialist[11] 23[4]
Miller Michael[15] orthopedic[11] probationer[12] 4[4]
Thomas Linda[13] neurologist[12] probationer[12] 5[4]
Scott Nancy[12] orthopedic[11] resident[9] 1[4]
Allen Brian[12] neurologist[12] specialist[11] 12[4]
Turner Steven[14] ophthalmologist[16] probationer[12] 3[4]
Collins Kevin[14] ophthalmologist[16] specialist[11] 7[4]
King Donald[12] neurologist[12] resident[9] 1[4]
Harris Sarah[13] ophthalmologist[16] resident[9] 2[4]

TABLE 1 = [366] (bytes)

We can see that Table 1, which is in 5NF, occupies, in total, 366 bytes. This table translated into 6NF will consist of tables Table 2.1 and Table 2.2. The last 2 will occupy together 326 bytes.

Medic Name Occupation Practice in years
Smith James[12] orthopedic[11] 23[4]
Miller Michael[15] orthopedic[11] 4[4]
Thomas Linda[13] neurologist[12] 5[4]
Scott Nancy[12] orthopedic[11] 1[4]
Allen Brian[12] neurologist[12] 12[4]
Turner Steven[14] ophthalmologist[16] 3[4]
Collins Kevin[14] ophthalmologist[16] 7[4]
King Donald[12] neurologist[12] 1[4]
Harris Sarah[13] ophthalmologist[16] 2[4]

TABLE 2.1 = [270]

Type Min practice Max practice
resident[9] 0[4] 2[4]
probationer[12] 3[4] 5[4]
specialist[11] 6[4] 45[4]

TABLE 2.2 = [56] => TABLE 2.1 + TABLE 2.2 = [326] (bytes)

We can see that, in this example, 6NF occupies less than 5NF (more specific, less with 40 bytes). Going into the 6NF reduces the occupied space. If the initial table is larger, after going into the 6NF, the reduced space will also be larger.

References[edit]

  1. ^ Date et al., 2003
  2. ^ op. cit., chapter 9: Generalizing the relational operators
  3. ^ op. cit., section 10.4, p. 176
  4. ^ op. cit., chapter 12, p. 213
  5. ^ Zimanyi 2005
  6. ^ Snodgrass, Richard T. TSQL2 Temporal Query Language. Describes history, gives references to standard and original book.
  7. ^ See www.dbdebunk.com for a discussion on this topic
  8. ^ See the Anchor Modeling website for a website that describes a data warehouse modelling method based on the sixth normal form
  9. ^ Example provided by: http://www.anattatechnologies.com/q/2011/07/normalization-6nf/

Further reading[edit]

  • Date, C.J. (2006). The relational database dictionary: a comprehensive glossary of relational terms and concepts, with illustrative examples. O'Reilly Series Pocket references. O'Reilly Media, Inc. p. 90. ISBN 978-0-596-52798-3. 
  • Date, Chris J.; Hugh Darwen; Nikos A. Lorentzos (January 2003). Temporal Data and the Relational Model: A Detailed Investigation into the Application of Interval and Relation Theory to the Problem of Temporal Database Management. Oxford: Elsevier LTD. ISBN 1-55860-855-9. 
  • Date, Chris J.; Hugh Darwen; Nikos A. Lorentzos (12 August 2014). Time and relational theory - Temporal databases in the relational model and SQL. Elsevier-Morgan Kaufmann. ISBN 9780128006313. 
  • Zimanyi,, E. (June 2006). "Temporal Aggregates and Temporal Universal Quantification in Standard SQL" (PDF). ACM SIGMOD Record, volume 35, number 2, page 16. ACM.