Codd's theorem
Codd's theorem states that relational algebra and the domain-independent relational calculus queries, two well-known foundational query languages for the relational model, are precisely equivalent in expressive power. That is, a database query can be formulated in one language if and only if it can be expressed in the other.
The theorem is named after Edgar F. Codd, the father of the relational model for database management.
The domain-independent relational calculus queries are precisely those relational calculus queries whose result does not depend on which values the underlying domain contains beyond those appearing in the database itself. That is, queries that may return different results for different domains are excluded. An example of such a forbidden query is "select all tuples other than those occurring in relation R", where R is a relation in the database. Under different domains, i.e., different sets of atomic data items from which tuples can be constructed, this query returns different results and is therefore not domain independent.
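The dependence on the domain can be made concrete with a small sketch. The relation R, the two domains, and the helper names `complement` and `first_column` below are illustrative assumptions, not part of any standard library; the sketch evaluates the forbidden complement query and, for contrast, a projection query over two different domains.

```python
from itertools import product

def complement(relation, domain, arity):
    """Return all tuples of the given arity over `domain` that are not in `relation`."""
    return set(product(domain, repeat=arity)) - set(relation)

def first_column(relation, domain):
    """Projection onto the first column; `domain` is accepted but never consulted."""
    return {(a,) for (a, _b) in relation}

# Hypothetical binary relation and two candidate domains containing its values.
R = {(1, 2), (2, 3)}
small_domain = {1, 2, 3}
large_domain = {1, 2, 3, 4}

# The complement query changes when the domain grows: not domain independent.
small_result = complement(R, small_domain, 2)
large_result = complement(R, large_domain, 2)
print(len(small_result), len(large_result))

# The projection query returns the same answer under both domains.
print(first_column(R, small_domain) == first_column(R, large_domain))
```

The complement grows as soon as a value outside the database is added to the domain, whereas the projection depends only on the tuples stored in R.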
Codd's theorem is notable because it establishes the equivalence of two syntactically quite dissimilar languages: relational algebra is a variable-free language, while relational calculus is a logical language with variables and quantification.
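As a rough illustration of the two styles, the sketch below writes one query twice: once as a variable-free composition of algebra operators, and once as a calculus-style formula with variables and an existential quantifier, evaluated over the values that occur in the database. The relations R and S and the helpers `select`, `project`, and `cartesian` are hypothetical names chosen for this example; agreement on one instance illustrates the theorem but does not prove it.

```python
from itertools import product

def select(rel, pred):
    """Selection: keep the tuples satisfying the predicate."""
    return {t for t in rel if pred(t)}

def project(rel, cols):
    """Projection: keep the listed column positions."""
    return {tuple(t[i] for i in cols) for t in rel}

def cartesian(r, s):
    """Cartesian product: concatenate every pair of tuples."""
    return {a + b for a, b in product(r, s)}

# Hypothetical binary relations R(a, b) and S(b, c).
R = {(1, 2), (2, 3)}
S = {(2, 10), (3, 20), (4, 30)}

# Algebra style: project columns 0 and 3 of the selection (column 1 = column 2) of R x S.
algebra_result = project(select(cartesian(R, S), lambda t: t[1] == t[2]), (0, 3))

# Calculus style: {(a, c) | exists b. R(a, b) and S(b, c)},
# with variables ranging over the values occurring in the database.
adom = {v for t in R | S for v in t}
calculus_result = {
    (a, c)
    for a in adom
    for c in adom
    if any((a, b) in R and (b, c) in S for b in adom)
}

print(algebra_result == calculus_result)
```

Restricting the variables to values that occur in the database is safe here because the query is domain independent.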
Relational calculus is essentially equivalent to first-order logic, and indeed, Codd's theorem had been known to logicians since the late 1940s.[1][2]
Query languages that are equivalent in expressive power to relational algebra were called relationally complete by Codd. By Codd's theorem, this includes relational calculus. Relational completeness does not imply that every interesting database query can be expressed in a relationally complete language. Well-known examples of inexpressible queries include simple aggregations (counting tuples, or summing up values occurring in tuples, which are operations expressible in SQL but not in relational algebra) and computing the transitive closure of a graph given by its binary edge relation (see also expressive power). Codd's theorem also does not consider SQL nulls and the three-valued logic they entail; the logical treatment of nulls remains mired in controversy. (For work extending Codd's theorem in this direction, see the 2012 paper of Franconi and Tessaris.[3]) Additionally, SQL allows duplicate rows, that is, it has multiset semantics. Nevertheless, relational completeness constitutes an important yardstick by which the expressive power of query languages can be compared.
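The transitive closure example can be sketched as follows. The edge relation and the helper names `compose` and `transitive_closure` are assumptions made for illustration. The point is that the loop runs for a number of rounds that depends on the data, while any single algebra expression applies a fixed number of joins; the code shows this on one instance and is not a proof of inexpressibility.

```python
def compose(r, s):
    """Relational composition: join r and s on r's second and s's first column."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}

def transitive_closure(edges):
    """Iterate composition until no new pairs appear (a least fixpoint)."""
    closure = set(edges)
    while True:
        extended = closure | compose(closure, edges)
        if extended == closure:
            return closure
        closure = extended

# Hypothetical edge relation: a path 1 -> 2 -> 3 -> 4.
edges = {(1, 2), (2, 3), (3, 4)}

tc = transitive_closure(edges)

# A fixed expression with a single join only finds paths of length at most 2.
paths_up_to_2 = edges | compose(edges, edges)

print((1, 4) in tc, (1, 4) in paths_up_to_2)
```

Adding more joins to the fixed expression covers longer paths, but a longer input path always exceeds any fixed bound.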
Notes
- L.H. Chin and A. Tarski. Remarks on Projective Algebras. Bulletin of the AMS, 54:80-81, 1948.
- A. Tarski and F.B. Thompson. Some general properties of cylindric algebras. Bulletin of the AMS, 58:65, 1952.
- Enrico Franconi and Sergio Tessaris. On the Logic of SQL Nulls. Proceedings of the 6th Alberto Mendelzon International Workshop on Foundations of Data Management, Ouro Preto, Brazil, June 27-30, 2012, pp. 114-128.
References
- Serge Abiteboul, Richard B. Hull, and Victor Vianu: Foundations of Databases. Addison-Wesley, 1995.
- E. F. Codd, "Relational completeness of data base sublanguages", in R. Rustin, (ed.) Data Base Systems, Proceedings of 6th Courant Computer Science Symposium (May 24-25, 1971: New York, N.Y.), pp. 65-98, Prentice-Hall, 1972, ISBN 013196741X