Talk:Relational algebra

From Wikipedia, the free encyclopedia
Jump to: navigation, search

Reworked Introduction[edit]

Today I pretty well completely rewrote the introduction. The most signficant changes I made are as follows:

  • I emphasise the possible use of relational algebra in database query languages (the previous introduction gave the impression it was only for internal purposes!), and I give examples of concrete syntaxes that have been devised and implemented on computers.
  • I hope this isn't controversial, but I changed all uses of "operation" to "operator" and "binary operation" to "dyadic operator". If it is controversial, I am willing to bow to a superior authority.
    • I think binary operator is a more generally understood term than dyadic operator. I would be in favor of changing that use of terminology back to the way it was. I do, however, agree that operator is more appropriate than operation. Davidfstr 18:48, 26 September 2007 (UTC)
  • I tried to clarify Codd's contribution.
  • I mentioned the all-important connections with predicate calculus.
  • I partially corrected the previous statement about completeness with respect to predicate calculus, but I left behind the bit about Horn clauses without recursion and negation because I didn't understand it well enough to work out if it is correct. I believe it is not correct, because negation is supported to some extent (via minus) and also because I thought that recursion was missing from FOPC anyway! I will ask around about this and possibly revisit this section later.
  • Oh, and I made the Introduction into a section with that heading.

I would welcome comments on this work.

AndrewWarden 18:48, 2 February 2006 (UTC)


Trying to split each of the operations into their own articles, as follows:

I notice that this operator is strangely missing from the article. AndrewWarden 17:16, 31 January 2006 (UTC)
I notice that this operator is strangely missing from the article. AndrewWarden 17:16, 31 January 2006 (UTC)
I hope this won't give Codd's division, which turned out to be a degenerate case of a much more interesting operator, first defined by Stephen Todd for ISBL and further developed by C.J. Date and myself. AndrewWarden 17:16, 31 January 2006 (UTC)

are you thinking of keeping this page as a good overview? this page was quiet helpful when I was going through intro to DB class.

I would like to support the comment above : the page as it is provides a very useful reference for students, especially those who have previously studied (but might now have forgotten) relational algebra.

I agree that having a central Relational algebra page is very useful. I wouldn't mind though if it gave only a brief overview of its key parts, linking to the above proposed articles for more in-depth coverage. Would this still be useful to students? -David Owen

Yeah, page stays as overview while each article of the different operations explains in depth. Joseph | Talk 04:33, 11 October 2005 (UTC)

I agree having a brief overview page and seperate articles for sub-topics. Makes eveything easier to find and those who are just starting will get the overview using loose terms like "Relational Algebra" GarethWatson 13:34, 27 October 2005 (UTC)

I agree that it was right to move the primative operations to separate pages, but it makes the page hard to follow now. Plenty of Wikipedia pages have a short description accompanied by a link to the full page, and I think that that solution would be best here. 14:44, 4 March 2006 (UTC)

N+1th vote for keeping short-ish descriptions on this page. I think the current state of the Outer Join section (chosen exclusively because I'm looking at it right now) is probably perfect, it's terse and accurate, and the simple example can be a significant aid to understanding. Groxx (talk) 14:52, 3 August 2010 (UTC)

Correspondence with first-order logic[edit]

The statement currently describing the expressive power of the given operators where it equates these with First-order logic can not be true. Look at the page for First-order logic. You will find that First-order logic is undecidable. I believe that with the relational operators given we have something isomorphic to Datalog or first order logic without complex function symbols, i.e. only zero-ary function symbols.

--Gavin Mendel-Gleason

Actualy Datalog is more expressive than relational algebra applyied to DB. This is due to the introduction of operators expressin the complement in term of negation as failure. Anyway it is true that Datalog is the logic programming language more close to relational algebra.

--Paolo Ceravolo

Satisfiability of relational algebra expressions is also undecidable in the sense that you cannot decide whether the result is always empty, but nevertheless you are right that the power of FOL with function symbols is greater than that of the relational algebra. -- Jan Hidders 10:34, 24 January 2006 (UTC)

I don't think it's incorrect, but could perhaps stand to be fleshed out/clarified. The full story requires talk about the interpretation/domain. In addition, a key element in the statement is the restriction to the "safe" parts of FOL (i.e. Codd's safety restrictions), although it is perhaps misleading to identify that restricted language as essentially the same as FOL.


A suitable relation algebra is known to be equivalent to FOL on infinite domains (this is a result of Tarski and Givant). However, the algebra described in this article is not the standard relation algebra used by mathematians; I'm not certain of the precise strength of this scheme. Randall Holmes 01:50, 28 January 2006 (UTC)

The statement in the form currently (Aug 17, 2008) made on the page is incorrect or at least strongly misleading. Relational algebra is in fact equivalent to relational calculus/first-order logic, however, there are some pitfalls. First, let me clarify that the observation that first-order logic is undecidable made above is orthogonal to this discussion: The decision problem of FO is indeed undecidable, but that problem asks whether a first-order sentence is a theorem (i.e., true on all possible databases over the given schema, including infinite ones, which we want to exclude). Equivalently, asking whether a Boolean (nullary) query in relational algebra is true on all possible databases is undecidable. But that's not the problem that we are interested in here at all. Query evaluation is easy for both first-order logic and relational algebra: in the case of first-order logic, this problem is usually referred to as the model checking problem, to distinguish it from the undecidable decision problem. The fact that relational algebra and calculus are equivalent is well known to be true, and is called Codd's Theorem. I just created the entry, see there. It is true that relational calculus requires restrictions to match relational algebra, but this is to exclude the domain independent queries. This restriction however is not needed because of the undecidability of FO but to clarify the meaning of the (infinite, when unrestricted) domain. First-order logic remains undecidable in the finite, so this is not really a restriction to manage undecidability.

-- cdrdata —Preceding unsigned comment added by Cdrdata (talkcontribs) 09:31, 17 August 2008 (UTC)

There are so many problems with this paragraph that I've removed it. First, it conflated the mathematical meanings with the relational database meanings (see the section Tuple ≠ tuple, relation ≠ relation below. The extension of a predicate is not a relation in the sense of the relational model. Then, logical conjunction is an operation on truth values, not on predicates. In standard mathematics, you can have the pointwise conjunction of two predicates, but applied to has-father and has-mother, (for example Abel has-father Adam, and Abel has-mother Eve), you then get the everywhere-false predicate. The join of a FATHER relation with a tuple (child: Abel, father: Adam) and a MOTHER relation with a tuple (child: Abel, mother: Eve), on the other hand, contains a tuple (child: Abel, father: Adam, mother: Eve).  --Lambiam 20:03, 24 August 2008 (UTC)

I find it quite bizarre that there is nothing in the article mentioning "first-order logic" or any of its synonyms. One needn't state that FOL and RA are isomorphic, but it seems one _must_ mention FOL's relationship, if not "correspondence" with RA. Jim Bowery (talk) 15:52, 17 April 2017 (UTC)

Indeed, the equivalence between relational algebra and relational calculus should be detailed. Feel free to do so :) Rp (talk) 21:30, 18 April 2017 (UTC)

Circular inclusion dependencies / foreign keys[edit]

The first example given between the employee and dept is not that great: it suggest a loop in the dependencies for the foreign keys: employee depends on dept, dept depends on employee. Loops in dependencies have to be avoided as they make the entities hard to save and the foreign key fields have to be nullable in all but one entity participating in the loop depencency. Please correct this example, by for example removing Manager from Dept and adding a column 'ManagesDept' to Employee which is another FK to dept.

--Frans Bouma

These considerations don't seem very relevant to me. The purpose of the example is just to illustratie the operations, not how to model. Besides, avoiding circular dependencies is not really part of the theoretical model as it is an artefact of the limitations of certain SQL DBMSs. Moreover, your fix is actually not that uncontroversial either as it introduces a column that must have null values. -- Jan Hidders 10:34, 24 January 2006 (UTC)

Natural join symbol[edit]

The symbol of the natural join in this article doesn't appear correctly in some machines (it appears as a square), as the author initially used some special symbol fonts that some systems may not have. Perhaps the author should consider changing the natural join symbol into an image, so that it can be displayed correctly in all system (previous comment originally left on main article page by (talk · contribs))

Or, we could use ×. It looks almost like the bow-tie symbol, and fits with the =×= syntax for outer joins. -- Mikeblas 00:39, 28 January 2006 (UTC)
I am unable to see the symbol as well; I am using Chrome and it is unable to auto-detect a good encoding. What character set is the symbol under? -- J.Dong820 (talk) 14:26, 17 December 2011 (UTC)


But I still don't see a definition of the relational algebra operator called projection, which should be equated to existential quantification in predicate logic. AndrewWarden 18:52, 1 February 2006 (UTC)
AW: I've just noticed projection (relational algebra) for the first time, with a history indicating that it's been there for quite some time. I'm puzzled, as I couldn't find it until today. Sorry for any confusion. Now that I've seen it, I see that there are some minor changes I'd like to make. The SQL reference isn't quite right, and there ought to be a note about implementations of projection in computer languages (in particular, the possibility to mention the attributes to be excluded rather than those to be included, the exclusion case being slightly closer, psychologically, to existential quantification). But I'll hang fire until I hear that you've finished what you want to do.
  • JA: I don't think I'll be touching the relational algebra or relational model articles, except cosmetically, anytime soon, as I'm still refreshing my memory and reviewing newer literature on these more concrete aspects that I last worked with many odd years ago. The relation (mathematics), projection (set theory), and related articles will be pretty standard stuff, from a slightly combinatorial perspective, and so should be pretty independent of what you need here, though I will eventually want to think a little more about the relation between what I think of as "abstract types", that have shapes like X_1 x … x X_k, and "concrete types", where you have meaningful attributes to worry about. But it's way too soon for that. Jon Awbrey 16:22, 2 February 2006 (UTC)
  • MVC: I am possibly getting this wrong, but is projection really distributive over intersection? Example: but . Feel free to erase this comment if it is not correct, or if this is not the right place to put it (this is my first time editing wikipedia, sorry). Thanks —Preceding unsigned comment added by (talk) 12:20, 13 May 2009 (UTC)

Joins, natural and otherwise[edit]

From Right Outer Join: The right outer join can be simulated using the natural join and set union as follows:

R X= S  =  S ∪ (RS)

From Set Operators: Although three of the six basic operators are taken from set theory, there are additional constraints that are present in their relational algebra counterparts: for set union and set difference, the two relations involved must be union-compatible — that is, the two relations must have the same set of attributes.

The two clauses above are incompatible. What is the correct way of simulating the outer joins with the more primitive operators?

Jon Thoroddsen 14:42, 24 February 2006 (UTC)

Voila! Jon Awbrey 18:01, 31 January 2006 (UTC)

Use \bowtie: Drowne | Talk 00:00, 7 July 2006 (UTC)

Joins, natural or else[edit]

  • JA: Here's the best that I could do for the θ-join after a day's fiddling with TeX. For some reason the negative space trick does not work in a matrix format, so the center does not hold the two vortices together!

  • It's called \bowtie.
  • JA: Thanx, and a tip o'th' hat! I knew what it was called, I just didn't know how to tie one! Jon Awbrey 22:24, 1 February 2006 (UTC)

left outer join[edit]

I think that results shown in example of left outer join in the table is wrong - two rows are missing - those with DeptName = "Sales"

Missing Symbols[edit]

Hello, I'm reading with IE7 (unusual?!) and many of the symbols do not appear.

eg In "Set operators", I see

  • R × S = {r BOX s| r BOX R, s BOX S}

(I guess that the last two boxes are espilons, but what is the first?)

In "Natural join", I see

RS = { t BOX s : t BOX R, s BOX S, fun (t BOX s) }

And again in the equations throughout the article.

I would go through and fix this, but, as I say, I'm not sure what symbol was intended in some places (and mozilla's not working here at the moment). Sam Staton 20:51, 13 February 2007 (UTC)

I have tried to fix this problem wherever I could spot instances of it on the page. Hope this solves the problem for IE7 users.
--Dessources 13:21, 17 August 2007 (UTC)


I think the sentence "and we wish to find the branch name with the highest balance, we would write Branch_NameGMax(Balance)(Account)" is incorrect. From what I understand, this will find the max balance in each branch. —The preceding unsigned comment was added by (talk) 03:15, 19 February 2007 (UTC).

Agree, and changed it (revision 10:01, 21 April 2008). Jonas Wagner

Division Results[edit]

Something seems wrong -- Ulricp (talk) 09:46, 17 May 2016 (UTC)

A (Database1, Database2) x B (Fred, Sara) does NOT give us C (Completed)! Original contributor, please fix that.

Why is Eugene missing? --Weedrat 14:18, 6 May 2007 (UTC)

The division (Completed ÷ DBProject) gives the Students who have entries (in the Completed table) for all the tasks in the DBProject table. In this case the DBProject table only contains two tasks: Database1 and Database2. You see that (Fred, Database1) and (Fred, Database2) are both in the Completed table, therefore Fred is in the result. Similarly, (Sara, Database1) and (Sara, Database2) are both in the Completed table, therefore Sara is in the result. Eugene doesn't have an entry for Database2, so Eugene is not in the result. Egriffin 22:01, 9 November 2007 (UTC)
So, there's a mistake here, then? The result table shows Fred and Eugene, not Fred and Sara. I'll change it. (talk) 05:57, 8 February 2011 (UTC)

Semi Join and anti join[edit]

Shouldn't the "Manager" column be there as well or am I wrong here? -Weedrat 13:34, 8 May 2007 (UTC)

Unicode code points[edit]

Is the code point for a bowtie really notable? It's the only occurrence of "Unicode" and "U+" in the article. Should we mention code points for every symbol used? I don't think it belongs in an article about mathematics — I can't recall other articles about maths mentioning code points (equally we could have "In LaTeX, it's \bowtie"). ⇌Elektron 14:44, 13 September 2007 (UTC)

SQL Analogies[edit]

Relational algebra has numerous analogies in SQL implementations. As a large number of database developers are well-versed in the SQL language, and that it's likely that this page would be visited by people who are very familiar with industry language knowledge, it would be useful, where direct analogies exist, to add SQL equivalents to the relational algebra examples.

I would be happy to do this pending approval of each example here in "Talk."

Please note that this is a property of SQL itself, given the semantics of SQL, and not specifically of SQL implementations. The term "analogy" is a bit out of place. Languages like FORTRAN and Java incorporate operations from arithmetic like addition and multiplication; these are not "analogies" of the arithmetic operations. Likewise, SQL incorporates operations from the theory of relational databases, which is based on relational algebra.  --Lambiam 08:59, 17 January 2008 (UTC)

Tuple ≠ tuple, relation ≠ relation[edit]

In a mathematical context, a tuple is an element of some Cartesian product, and a relation is a set of tuples forming a subset of some Cartesian product.

In relational database theory, these terms have related but different meanings. Each of the components of a tuple (also known as its fields) has a unique field name associated with it. Where (34, true) is a mathematical tuple, (age: 34, male: true) is a database tuple, also known as "record" or "row". It can be formally defined as a finite function that maps names to values, essentially the same as an assignment. A relation in relational database theory is then a set of such rows, all with the same set of component names, and for a given component name normally all with the same type of value. Where {(34, true), (40, false), (66, true)} is a mathematical relation, {(age: 34, male: true), (age: 40, male: false), (age: 66, male: true)} is a database relation.

Unfortunately, the article does not carefully distinguish between these meanings where it should.  --Lambiam 19:37, 24 August 2008 (UTC)

An important distinction worth correcting, I'll see what I can do. Kareeser|Talk! 04:29, 5 February 2010 (UTC)
Actually there is both a named and unnamed perspective for tuples in the relational model. See the "Alice" book [1], pp. 32-33. There's caveat though: "this leads to different but equivalent sets of natural primitive algebra operators for the two perspectives." Tijfo098 (talk) 17:46, 9 November 2012 (UTC)


In the section on "Selection and projection", "elided" seems like a rather obscure word to use in the sentence "since projection may produce fewer tuples due to the elimination of duplicates resulting from elided fields". Would it not be clearer to simply use the word "omitted"? —Preceding unsigned comment added by (talk) 00:07, 30 January 2009 (UTC)

Done. Tijfo098 (talk) 02:33, 22 November 2012 (UTC)

Theta join[edit]

The article (as of 13 Jan 2011) says that the theta-join is only defined if the attributes of R and S are disjoint. This may not be the case, since any particular theta-join on R and S will simply produce a subset of the relational Cartesian product R x S. Usually columns in R and S that share the same name are disambiguated in the new relation (R x S), by naming columns R.colname, S.colname, so the issue where the attributes of R and S being disjoint doesn't seem to arise at all.

See for example: —Preceding unsigned comment added by (talk) 21:59, 13 January 2011 (UTC)

SQL is "loosely based" on relational algebra?[edit]

The article states that SQL is loosely based on relational algebra. I'm an extreme beginner using SQL (I've only ever used mySQL) and am now taking a course in relational algebra and as far as I can see, SQL owes a lot of its foundation to RA with only the syntax being largely different. I wouldn't even consider myself a novice with either so I could be hilariously wrong but I'd love to see someone expand on the "Interpretations" heading in the article. Then again, this isn't a SQL article so I'd understand if no one else agreed. (talk) 20:13, 13 October 2011 (UTC)

Yes, SQL is based on relational algebra; "loosely" because it does many things that deviate from that foundation, and that "soil" its principles. Examples:
  • SQL tables and query results can contain duplicate rows; relations cannot;
  • SQL can employ the order of rows in a table (e.g. with the `LIMIT` clause); in relations no order exists;
  • SQL treats NULL in a fairly ad hoc way;
  • SQL views aren't defined in a relational way
etcetera. Rp (talk) 17:15, 26 October 2011 (UTC)
I've added something to article about bags. Tijfo098 (talk) 21:09, 22 November 2012 (UTC)

Some comments[edit]

[I am Hugh Darwen, who is mentioned in the implementations section of this article. I see that I did some work on this article back in 2006 but I had forgotten that when I wrote these comments.]

The article is quite good but I have some comments that I think should be addressed. I might have a go at editing the article myself but I'm not sure I'm up to that yet because the notation it uses is awkward for me.

1. The section on natural join is incorrect in equating that operator to "equijoin". I think the statement should just be deleted. Equijoin is correctly covered in the section on theta-join.

2. Extension (like rename and others, added by ISBL) is arguably needed for completeness--just as arguably as selection is, for that matter. Without it, we would have to achieve the same effect using natural join with relations representing operators such as "+", and such relations are infinite (or at best hopelessly large if we assume finite domains). If extension is taken as primitive, then attribute renaming can be expressed in terms of extension and projection.

3. Selection is the term originally used by Codd but since SQL came along and used the key word SELECT for a different purpose, this operator is usually referred to as restriction (and spelled WHERE in English notations).

4. I think the mention of foreign keys in the section on natural join is de trop. The mention of the use of rename to make common attributes is good, but the impression should not be given that it's applicable only when the common attributes constitute a foreign key. In any case, relational algebra is not concerned with database constraints.

5. The example given for selection is needlessly clumsy. "=true" can simply be omitted wherever it appears. It would be much clearer to use a simple comparison such as "birthdate<hiredate" for the restriction condition.

6. If non-primitives such as outer joins and semijoins are to be included, then I would add composition, where the composition of r1 and r2 is their natural join projected on all but the common attributes.

7. The section on projection fails to mention its connection with existential quantification in logic. Also, it should perhaps be mentioned that in a computer language we would expect projection to be alternatively expressible in terms of attributes to be excluded (corresponding to predicate variables being bound by quantification), as it was in ISBL.

8. In general, the operator definitions are rather imprecise. For example, the definition of projection seems to state that each tuple in the result is a set of attribute names (rather than attribute values).

9. The formal definition for natural join puzzles me where it states "where Fun is a predicate that is true for a relation r if and only if r is a function." The argument to the invocation of Fun in the definition appears to be a tuple (the union of tuples t and s), not a relation. I normally teach that the union of those tuples must be a tuple, meaning that it can't contain more than one attribute value for the same attribute name. I suppose it is regarding a tuple as a binary relation mapping attribute names to attribute values, in which case we do require that relation to be a function of attribute names, but the definition would allow it to be a function of attribute values, which is neither sufficient nor necessary. In any case it is a bit confusing to treat a tuple as both a relation and a member of a relation.

10. The section on natural join uses the term "columns" when referring to attributes.

Enough for now. I haven't scrutinised all of the text and I may come back with more later.

Hugh Darwen (AndrewWarden (talk) 12:48, 15 March 2012 (UTC))

Deleted section on EXTEND[edit]

That's apparently a Tutorial D keyword [2] (also PL/SQL one, but it does something else in PL/SQL: extends collections, which are obviously not relations). The Tutorial D keyword is stuff nobody cares about but Date. Tijfo098 (talk) 02:05, 22 November 2012 (UTC)

Actually, the section did have a point; I hope I have expressed it more clearly now. Tijfo098 (talk) 21:09, 22 November 2012 (UTC)

Outer Joins[edit]

The Symbols for the three different types of outer join do not display in the latest version of Chrome. I don't know Wikipedia well enough to know if there's a potential way to fix this, perhaps someone else does? They display fine in IE. ChrisInUK (talk) 11:56, 4 April 2013 (UTC)

Wrong formula[edit]

In section Set operators I believe the formula for Cartesian product might be wrong:

R × S = {(r1, r2, ..., rn, s1, s2, ..., sm) | (r1, r2, ..., rn) ∈ R, (s1, s2, ..., sm) ∈ S}

I believe it instead ought to be:

R × S = {r1s1, r1s2, ..., r1sm, r2s1, r2s2, ..., r2sm, ..., rns1, rns2, ..., rnsm | r1, r2, ..., rnR, s1, s2, ..., smS}

which simply isn't a practical representation. More practical would perhaps be:

R × S = { rs | rR, sS}

Rursus dixit. (mbork3!) 04:51, 30 September 2013 (UTC)

I don't understand why you might say that. — Arthur Rubin (talk) 07:18, 30 September 2013 (UTC)
I don't understand what you don't understand, so I cannot let that objection guide me in what I feel is wrong. Cartesian_product#A_two-dimensional_coordinate_system. Let's take an example on what's a Cartesian product is in set algebra:
Let R={a, b, c} and S={d, e, f}, then R × S is:
R × S = {(a,d),(a,e),(a,f),(b,d),(b,e),(b,f),(c,d),(c,e),(c,f)}
The difference between set algebra and relational algebra is that the tuple (x,y) (note that the parentheses are tuple constructors!) is not used but instead the union x∪y. This indicates that in difference from the set algebra Cartesian product R×S={(r,s)|r∈R∧s∈S} the relational operation should be R×S={r∪s|r∈R∧s∈S}. Rursus dixit. (mbork3!) 07:36, 30 September 2013 (UTC)
I retract. After some consideration I now realize the thinking behind
R × S = {(r1, r2, ..., rn, s1, s2, ..., sm) | (r1, r2, ..., rn) ∈ R, (s1, s2, ..., sm) ∈ S},
I think it is an odd and misleading way to define Cartesian product, avoiding to mention the fact that each integer position in the tuples (r1, ... rn, s1, ... sm) versus (r1, ..., rn) versus (s1, s2, ..., sm) represents different column labels, alternatively that each element ri, sj contains the cell label, f.ex. r12 = (name: Jim Jones), then name being the label of the cell item r12.
An improvement somehow would be welcome. I'll ponder ... Rursus dixit. (mbork3!) 08:22, 30 September 2013 (UTC)
You describe tuples as lists of their member values (which the article calls the unnamed perspective), while it's more common to describe tuples as mappings from column names to values (the named perspective). In the named perspective, members of tuples do not have a position, only a name. It is only in the named perspective that the cartesian product of two relations can be expressed as the pairwise union of their tuples; in the unnamed perspective, it is the pairwise concatenation of their tuples. In neither case, the union of the members of tuples is ever taken, like you propose. Does this clear things up? Rp (talk) 10:30, 1 October 2013 (UTC)

Definition of semijoin[edit]

The formal definition of semijoin given does not seem like a well-formed set description:

R S = { t R, s S, Fun (t s) }

Would not it be better to define semijoin like the following, i.e. as antijoin with unnegated existence quantifier (which also seems to connect well with the informal description of semijoin given in the text)?

R S = { t : t R s S : Fun (t s) }

KarlPettersson81 (talk) 20:18, 21 January 2014 (UTC)

deleted content in introduction[edit]

I deleted two paragraphs that were out of context and related to best practices around schema design, rather than relational algebra specifically. — Preceding unsigned comment added by WKALT (talkcontribs) 04:09, 25 March 2015 (UTC)