Functional dependency

From Wikipedia, the free encyclopedia

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database.

Given a relation R, a set of attributes X in R is said to functionally determine another set of attributes Y, also in R, (written X → Y) if, and only if, each X value is associated with precisely one Y value; R is then said to satisfy the functional dependency X → Y. Equivalently, the projection \pi_{X,Y}R is a function, i.e. Y is a function of X.[1][2] In simple words, if the values for the X attributes are known (say they are x), then the values for the Y attributes corresponding to x can be determined by looking them up in any tuple of R containing x. Customarily X is called the determinant set and Y the dependent set. A functional dependency FD: X → Y is called trivial if Y is a subset of X.

The determination of functional dependencies is an important part of designing databases in the relational model, and in database normalization and denormalization. A simple application of functional dependencies is Heath's theorem; it says that a relation R over an attribute set U and satisfying a functional dependency X → Y can be safely split into two relations having the lossless-join decomposition property, namely \pi_{XY}(R)\bowtie\pi_{XZ}(R) = R, where Z = U − XY is the rest of the attributes. (Unions of attribute sets are customarily denoted by mere juxtapositions in database theory.) An important notion in this context is a candidate key, defined as a minimal set of attributes that functionally determine all of the attributes in a relation. The functional dependencies, along with the attribute domains, are selected so as to generate constraints that would exclude as much data inappropriate to the user domain from the system as possible.

A notion of logical implication is defined for functional dependencies in the following way: a set of functional dependencies \Sigma logically implies another set of dependencies \Gamma, if any relation R satisfying all dependencies from \Sigma also satisfies all dependencies from \Gamma; this is usually written \Sigma \models \Gamma. The notion of logical implication for functional dependencies admits a sound and complete finite axiomatization, known as Armstrong's axioms.

Examples

Cars

Suppose one is designing a system to track vehicles and the capacity of their engines. Each vehicle has a unique vehicle identification number (VIN). One would write VIN → EngineCapacity because it would be inappropriate for a vehicle's engine to have more than one capacity. (Assuming, in this case, that vehicles only have one engine.) However, EngineCapacity → VIN is incorrect because there could be many vehicles with the same engine capacity.

This functional dependency may suggest that the attribute EngineCapacity be placed in a relation with candidate key VIN. However, that may not always be appropriate. For example, if that functional dependency occurs as a result of the transitive functional dependencies VIN → VehicleModel and VehicleModel → EngineCapacity then that would not result in a normalized relation.

Lectures

This example illustrates the concept of functional dependency. The situation modelled is that of college students attending one or more lectures, in each of which they are assigned a teaching assistant (TA). Let's further assume that every student is in some semester and is identified by a unique integer ID.

StudentID  Semester  Lecture            TA
1234       6         Numerical Methods  Azhar
2380       4         Numerical Methods  Peter
1234       6         Visual Computing   Ahmed
1201       4         Numerical Methods  Peter
1201       4         Physics II         Simone

We notice that whenever two rows in this table feature the same StudentID, they also necessarily have the same Semester values. This basic fact can be expressed by a functional dependency:

  • StudentID → Semester.

Other nontrivial functional dependencies can be identified, for example:

  • {StudentID, Lecture} → TA
  • {StudentID, Lecture} → {TA, Semester}

The latter expresses the fact that the set {StudentID, Lecture} is a superkey of the relation.
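
The dependencies listed above can be checked mechanically against the table. The following is a minimal sketch, assuming the relation is held in memory as a list of dictionaries; the function name satisfies_fd and the constants ATTRS and ROWS are illustrative and not part of any standard library. Note that such a check only shows that this particular table instance satisfies a dependency; whether the dependency holds for the schema in general is a design decision.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if each lhs value is associated with exactly one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value;
        # any later, different rhs value violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# The Lectures table from the text (hypothetical in-memory representation).
ATTRS = ("StudentID", "Semester", "Lecture", "TA")
ROWS = [dict(zip(ATTRS, r)) for r in [
    (1234, 6, "Numerical Methods", "Azhar"),
    (2380, 4, "Numerical Methods", "Peter"),
    (1234, 6, "Visual Computing", "Ahmed"),
    (1201, 4, "Numerical Methods", "Peter"),
    (1201, 4, "Physics II", "Simone"),
]]

print(satisfies_fd(ROWS, ["StudentID"], ["Semester"]))                   # True
print(satisfies_fd(ROWS, ["StudentID", "Lecture"], ["TA", "Semester"]))  # True
print(satisfies_fd(ROWS, ["Lecture"], ["TA"]))                           # False
```

The last call fails because Numerical Methods appears with two different teaching assistants.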

Properties and axiomatization of functional dependencies

Given that X, Y, and Z are sets of attributes in a relation R, one can derive several properties of functional dependencies. Among the most important are the following, usually called Armstrong's axioms:[3]

  • Reflexivity: If Y is a subset of X, then X → Y
  • Augmentation: If X → Y, then XZ → YZ
  • Transitivity: If X → Y and Y → Z, then X → Z

"Reflexivity" can be weakened to just X \rightarrow \varnothing, i.e. it is an actual axiom, where the other two are proper inference rules, more precisely giving rise to the following rules of syntactic consequence:[4]

\vdash X \rightarrow \varnothing
X \rightarrow Y \vdash XZ \rightarrow YZ
X \rightarrow Y, Y \rightarrow Z \vdash X \rightarrow Z.

These three rules are a sound and complete axiomatization of functional dependencies. This axiomatization is sometimes described as finite because the number of inference rules is finite,[5] with the caveat that the axiom and rules of inference are all schemata, meaning that the X, Y and Z range over all ground terms (attribute sets).[4]

From these rules, we can derive these secondary rules:[3]

  • Union: If X → Y and X → Z, then X → YZ
  • Decomposition: If X → YZ, then X → Y and X → Z
  • Pseudotransitivity: If X → Y and WY → Z, then WX → Z

The union and decomposition rules can be combined in a logical equivalence stating that X → YZ holds iff X → Y and X → Z. This is sometimes called the splitting/combining rule.[6]
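
As an illustration of how a secondary rule follows from the three axioms, the union rule can be derived in five steps. This is one possible derivation, written with juxtaposition for union as elsewhere in the article; it relies only on the augmentation and transitivity rules stated above.

```latex
\begin{align*}
1.\;& X \rightarrow Y   && \text{(given)} \\
2.\;& X \rightarrow XY  && \text{(augmentation of 1 with } X \text{, using } XX = X\text{)} \\
3.\;& X \rightarrow Z   && \text{(given)} \\
4.\;& XY \rightarrow YZ && \text{(augmentation of 3 with } Y\text{)} \\
5.\;& X \rightarrow YZ  && \text{(transitivity on 2 and 4)}
\end{align*}
```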

Another rule that is sometimes handy is:[7]

  • Composition: If X → Y and Z → W, then XZ → YW


Closure of Functional Dependency

The closure is essentially the full set of attributes that can be determined from a set of known attributes of a given relation using its functional dependencies. Armstrong's axioms (reflexivity, augmentation and transitivity) are used to derive it.

Given a relation R and a set F of FDs that hold in R, the closure of F in R (denoted F+) is the set of all FDs that are logically implied by F.

Closure of a set of attributes

The closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X using F+.

Example

Consider the following list of FDs. We are going to calculate the closure of A under them.

1. A → B
2. B → C
3. AB → D

The closure would be as follows:

a) A → A (by Armstrong's reflexivity)
b) A → AB (by 1 and (a))
c) A → ABD (by (b), 3, and Armstrong's transitivity)
d) A → ABCD (by (c) and 2)

The closure of A is therefore ABCD, i.e. A → ABCD. Assuming the relation has no attributes other than A, B, C and D, this shows that A is a candidate key, since its closure contains every attribute of the relation.
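
The derivation above can be automated with a simple fixed-point loop: keep adding the right-hand side of every FD whose left-hand side is already in the closure, until nothing changes. The sketch below assumes single-character attribute names, so that a string such as "AB" stands for the attribute set {A, B}; the name attribute_closure and the representation of FDs as (left, right) pairs are illustrative choices.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its whole left side is already known
            # and it still contributes at least one new attribute.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


# The three FDs of the example: A → B, B → C, AB → D.
FDS = [("A", "B"), ("B", "C"), ("AB", "D")]

print(sorted(attribute_closure("A", FDS)))  # ['A', 'B', 'C', 'D']
print(sorted(attribute_closure("B", FDS)))  # ['B', 'C']
```

The loop terminates because the closure can only grow and is bounded by the finite set of attributes mentioned in the FDs.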

Covers and Equivalence

Covers

Definition: F covers G if every FD in G can be inferred from F. Equivalently, F covers G if G+ ⊆ F+.
Every set of functional dependencies has a canonical cover.

Equivalence of two sets of FDs

Two sets of FDs F and G over schema R are equivalent, written F ≡ G, if F+ = G+. If F ≡ G, then F is a cover for G and vice versa. In other words, equivalent sets of functional dependencies are called covers of each other.
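
Computing F+ and G+ explicitly is expensive, since a closure can contain exponentially many FDs. A common shortcut is to test each FD X → Y of one set by checking that Y is contained in the attribute closure of X under the other set. The sketch below takes that approach; it assumes single-character attribute names and FDs given as (left, right) string pairs, and the function names are illustrative.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def covers(f, g):
    """F covers G if every FD X → Y in G has Y inside the closure of X under F."""
    return all(set(rhs) <= attribute_closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """Two FD sets are equivalent if each covers the other."""
    return covers(f, g) and covers(g, f)


F = [("A", "B"), ("B", "C")]   # A → B, B → C
G = [("A", "BC"), ("B", "C")]  # A → BC, B → C
H = [("A", "B")]               # A → B only

print(equivalent(F, G))            # True
print(covers(F, H), covers(H, F))  # True False
```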

Non-redundant Covers

A set F of FDs is nonredundant if there is no proper subset F' of F with F' ≡ F. If such an F' exists, F is redundant. F is a nonredundant cover for G if F is a cover for G and F is nonredundant.
An alternative characterization of nonredundancy is that F is nonredundant if there is no FD X → Y in F such that F - {X → Y} \models X → Y. Call an FD X → Y in F redundant in F if F - {X → Y} \models X → Y.
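
The alternative characterization suggests a direct procedure: visit each FD in turn and drop it if the remaining FDs still imply it. The following sketch does this with an attribute-closure test, assuming single-character attribute names and FDs given as (left, right) string pairs; the function names are illustrative. The result is a nonredundant cover of the input, but which one is obtained can depend on the order in which the FDs are visited.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def nonredundant_cover(fds):
    """Drop FDs that are implied by the others, one at a time."""
    result = list(fds)
    i = 0
    while i < len(result):
        lhs, rhs = result[i]
        rest = result[:i] + result[i + 1:]
        if set(rhs) <= attribute_closure(lhs, rest):
            result = rest  # redundant: remove it and re-examine index i
        else:
            i += 1
    return result


F = [("A", "B"), ("B", "C"), ("A", "C")]  # A → C follows from the other two
print(nonredundant_cover(F))  # [('A', 'B'), ('B', 'C')]
```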

Applications to normalization

Heath's theorem

An important property (yielding an immediate application) of functional dependencies is that if R is a relation with columns named from some set of attributes U and R satisfies some functional dependency X → Y, then R = \pi_{XY}(R)\bowtie\pi_{XZ}(R) where Z = U − XY. Intuitively, if a functional dependency X → Y holds in R, then the relation can be safely split into two relations along the column X (which is a key for \pi_{XY}(R)), ensuring that when the two parts are joined back no data is lost, i.e. a functional dependency provides a simple way to construct a lossless-join decomposition of R into two smaller relations. This fact is sometimes called Heath's theorem; it is one of the early results in database theory.[8]

Heath's theorem effectively says we can pull the values of Y out of the big relation R and store them in a separate relation, \pi_{XY}(R), which has no repeated values in the column for X and is effectively a lookup table for Y keyed by X. Consequently, there is only one place to update the Y corresponding to each X, unlike in the "big" relation R, where there are potentially many copies of each X, each with its own copy of Y, which need to be kept synchronized on updates. (This elimination of redundancy is an advantage in OLTP contexts, where many changes are expected, but not so much in OLAP contexts, which involve mostly queries.) Heath's decomposition leaves only X to act as a foreign key in the remainder of the big table, \pi_{XZ}(R).
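
The theorem can be observed on the Lectures table from the examples, taking X = StudentID, Y = Semester and Z = {Lecture, TA}. The sketch below is only an illustration on one table instance, not a proof; the helper names project, natural_join and as_set are hypothetical, and relations are assumed to be lists of dictionaries. For contrast, it also splits the table along Lecture with Y = TA, where the dependency Lecture → TA does not hold, and the join then produces spurious tuples.

```python
def project(rows, attrs):
    """Projection onto attrs, with duplicate tuples removed."""
    seen, out = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out


def natural_join(left, right):
    """Natural join: combine rows that agree on all shared attributes."""
    out = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                out.append({**l, **r})
    return out


def as_set(rows):
    """Order-independent view of a relation, for comparison."""
    return {frozenset(row.items()) for row in rows}


ATTRS = ("StudentID", "Semester", "Lecture", "TA")
ROWS = [dict(zip(ATTRS, r)) for r in [
    (1234, 6, "Numerical Methods", "Azhar"),
    (2380, 4, "Numerical Methods", "Peter"),
    (1234, 6, "Visual Computing", "Ahmed"),
    (1201, 4, "Numerical Methods", "Peter"),
    (1201, 4, "Physics II", "Simone"),
]]

# StudentID → Semester holds, so this split is lossless.
xy = project(ROWS, ["StudentID", "Semester"])
xz = project(ROWS, ["StudentID", "Lecture", "TA"])
print(as_set(natural_join(xy, xz)) == as_set(ROWS))  # True

# Lecture → TA does not hold, so this split adds spurious tuples.
bad_xy = project(ROWS, ["Lecture", "TA"])
bad_xz = project(ROWS, ["Lecture", "StudentID", "Semester"])
print(len(natural_join(bad_xy, bad_xz)))  # 8, versus 5 rows in the original
```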

Functional dependencies, however, should not be confused with inclusion dependencies, which are the formalism for foreign keys; even though they are used for normalization, functional dependencies express constraints over one relation (schema), whereas inclusion dependencies express constraints between relation schemas in a database schema. Furthermore, the two notions do not even intersect in the classification of dependencies: functional dependencies are equality-generating dependencies whereas inclusion dependencies are tuple-generating dependencies. Enforcing referential constraints after relation schema decomposition (normalization) requires a new formalism, i.e. inclusion dependencies. In the decomposition resulting from Heath's theorem, there is nothing preventing the insertion of tuples in \pi_{XZ}(R) having some value of X not found in \pi_{XY}(R).

Normal forms

Normal forms are database normalization levels which determine the "goodness" of a table. Generally, the third normal form is considered to be a "good" standard for a relational database.[citation needed]

Normalization aims to free the database from update, insertion and deletion anomalies. It also ensures that when a new value is introduced into the relation, it has minimal effect on the database, and thus minimal effect on the applications using the database.[citation needed]

Irreducible functional dependency set

A set S of functional dependencies is irreducible if it has the following three properties:

  1. The right-hand side of each functional dependency in S contains only one attribute.
  2. The left-hand side of each functional dependency in S is irreducible, meaning that removing any attribute from it changes the closure of S (S loses some information).
  3. Removing any functional dependency from S changes the closure of S.

Sets of functional dependencies (FDs) with these properties are also called canonical or minimal.
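
One common way to obtain a set with these three properties from an arbitrary set of FDs is to enforce them one after another: split the right-hand sides, then shrink the left-hand sides, then drop implied FDs. The sketch below follows that outline under the assumption of single-character attribute names and FDs given as (left, right) string pairs; the function names are illustrative, and the result may depend on the order in which attributes and FDs are examined.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def minimal_cover(fds):
    # Property 1: split right-hand sides into single attributes.
    work = [(set(lhs), {a}) for lhs, rhs in fds for a in rhs]

    # Property 2: drop each left-hand attribute whose removal still lets
    # the rest of the left-hand side determine the right-hand side.
    for lhs, rhs in work:
        for b in sorted(lhs):
            if rhs <= attribute_closure(lhs - {b}, work):
                lhs.discard(b)

    # Property 3: drop each FD that is implied by the remaining ones.
    i = 0
    while i < len(work):
        lhs, rhs = work[i]
        rest = work[:i] + work[i + 1:]
        if rhs <= attribute_closure(lhs, rest):
            work = rest
        else:
            i += 1
    return [("".join(sorted(l)), "".join(sorted(r))) for l, r in work]


# A → BC, B → C, A → B, AB → C
F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
print(minimal_cover(F))  # [('A', 'B'), ('B', 'C')]
```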

References

  1. ^ Terry Halpin (2008). Information Modeling and Relational Databases (2nd ed.). Morgan Kaufmann. p. 140. ISBN 978-0-12-373568-3. 
  2. ^ Chris Date (2012). Database Design and Relational Theory: Normal Forms and All That Jazz. O'Reilly Media, Inc. p. 21. ISBN 978-1-4493-2801-6. 
  3. ^ a b Abraham Silberschatz; Henry Korth; S. Sudarshan (2010). Database System Concepts (6th ed.). McGraw-Hill. p. 339. ISBN 978-0-07-352332-3. 
  4. ^ a b M. Y. Vardi. Fundamentals of dependency theory. In E. Borger, editor, Trends in Theoretical Computer Science, pages 171–224. Computer Science Press, Rockville, MD, 1987. ISBN 0881750840
  5. ^ Abiteboul, Serge; Hull, Richard B.; Vianu, Victor (1995), Foundations of Databases, Addison-Wesley, pp. 164–168, ISBN 0-201-53771-0 
  6. ^ Hector Garcia-Molina; Jeffrey D. Ullman; Jennifer Widom (2009). Database systems: the complete book (2nd ed.). Pearson Prentice Hall. p. 73. ISBN 978-0-13-187325-4. 
  7. ^ S. K. Singh (2009) [2006]. Database Systems: Concepts, Design & Applications. Pearson Education India. p. 323. ISBN 978-81-7758-567-4. 
  8. ^ Heath, I. J. (1971). "Unacceptable file operations in a relational data base". Proceedings of the 1971 ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control - SIGFIDET '71. pp. 19–33. doi:10.1145/1734714.1734717.