Talk:Candidate key

From Wikipedia, the free encyclopedia
Jump to: navigation, search
WikiProject Computing (Rated Start-class)
WikiProject icon This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Start-Class article Start  This article has been rated as Start-Class on the project's quality scale.
 ???  This article has not yet received a rating on the project's importance scale.
WikiProject Databases / Computer science  (Rated Start-class, Mid-importance)
WikiProject icon This article is within the scope of WikiProject Databases, a collaborative effort to improve the coverage of database related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Start-Class article Start  This article has been rated as Start-Class on the project's quality scale.
 Mid  This article has been rated as Mid-importance on the project's importance scale.
Taskforce icon
This article is supported by WikiProject Computer science (marked as Mid-importance).


A candidate key is a concept in the relational model of data. Any set of attributes upon which all attributes of a relation are functionally dependant is a candidate of this key is made primary key.

That's not the usual definition of candidate key, but of superkey. Who gave this definition? -- Jan Hidders 23:10, 25 Apr 2004 (UTC)

You cannot infer candidate keys from a single instance of a relation. It is too easy to insert a tuple and blow the keys good-bye. This example should go away.

Just deleting it would be a bit blunt, as it does make an important point. I have put it back with some explanation. -- Jan Hidders 14:50, 2 Nov 2004 (UTC)
From AndrewWarden 16:33, 3 February 2006 (UTC):
I noticed that yesterday somebody actually deleted the example and placed a line of gibberish in its place, resulting in that action being cancelled out by an editor. I wonder if the reason for the attempted deletion was the logical error I found and tried to correct. The error was an observation that you might be able to conclude that a certain subset of the heading of a certain "instance", having the uniqueness property, is not a CK because it in turn has some proper subset having the uniqueness property in that "instance". That's quite wrong, as explained in my replacement text.
I do completely agree with Ejrrjs's remarks below, but the observation that you can sometimes infer that some subset is not a CK is valid, so I've left it in. I don't think it's a very useful observation as far as database design is concerned, but it might be useful educationally. I would not object to deleting it.

Suppose that the relation also has the following instance:

a1 b1 c1 d1
a1 b1 c1 d2
a1 b1 c2 d1
a1 b2 c1 d1
a2 b1 c1 d1

Now the only viable key for the ABCD schema is ABCD. This means that we have to restrict somehow which instances are valid. We do that by means of functional dependencies. But these FDs have a semantic value, as in the real world A, B, C, D stand for product#, year, color, size, etc. Hence A->B is a property of the real world and cannot be inferred from a sample of instances, it has to be developed from a business model that states what makes sense to the user(s) by means of whatever elicitation technique is useful for that user situation. Afterwards, from your (complete) set of basic FDs you can automatically infer, using Armstrong rules, all valid FDs for the schema AND all valid candidate keys. Am I making myself clear? This is why I think it is misleading to suggest that you can obtain the candidate keys from one isolated instance of that schema. If you need references, Ullman, Helman, Navathe,'ll get them.

Kindly, Ejrrjs 23:07, 2 Nov 2004 (UTC)

Er, yes, that hadn't escaped me, but the point was that instead of just deleting the example and trying to impress me with your expertise it would be so much more nicer if you used that expertise to improve the example. It's not every day that someone with a PhD in database theory passes by here, you know. :-) So tell me, how do you like the article now? -- Jan Hidders 00:04, 4 Nov 2004 (UTC)

This example in the article isn't too good in my opinion. An example of a sample normal table could have explained it better.-- 07:47, 31 October 2006 (UTC)

I also feel that the example provided is not simple to foll. as Mr. suggests, maybe example with real-life data will help —Preceding unsigned comment added by (talk) 12:11, 18 February 2009 (UTC)

The examples for candidate key actually finally cleared up some confusion related to defining them in terms of functional dependencies, as done in the 2NF and 3NF entries. By removing the FD part of the definition and giving an example based strictly on two given relations, the my questions were answered. Please do not remove this helpful definition of candidate key. --RS — Preceding unsigned comment added by (talk) 19:22, 7 June 2012 (UTC)

Foreign keys[edit]

I'm a bit scratchy on my theory here, but I think the following statement is wrong:

Foreign keys, for example, are usually required to reference such a primary key and not any of the other candidate keys.

If I understand candidate keys correctly, then most RDBMS do allows foreign key relationships with any candidate key that has a unique or primary key defined. John Vandenberg 08:50, 20 February 2007 (UTC)

Incomprehensible sentence about RDBMS[edit]

I am a computer scientist, but still I found this article particularly hard to undertand. I have tried to make the lead section (at least) a bit less opaque, but the following paragraph got me confused:

Since in some RDBMSs tables may also represent multisets (which strictly means these DBMSs are not relational),

But, the lead paragraph has defined "candidate key" for *relations*. The lead section also observed that relations are sets, hence they cannot have repeated tuples, hence they always admit a candidate key. But then, if an RDBMS is not a relation, the definition does not apply, and this observation is false. So either the RDBMS should not be mentioned here, or the lead paragraph should be rewritten so that the definition can be used for them too. The paragraph goes on to say

it is an important design rule to specify explicitly at least one candidate key for each [RDBMS] relation.

But that is obviously impossible if the relation is a multiset!
So what shall we make of this? All the best, --Jorge Stolfi (talk) 01:52, 11 August 2009 (UTC)

I agree that this is written at an academic level and not very useful to most database practitioners. It's possible to understand database design and build well-formed, functional databases without a tenured chair is relational theory. It would be nice to have a second section devoted to the practical meaning —Preceding unsigned comment added by (talk) 18:28, 10 November 2010 (UTC)

Recommend merging with Unique key[edit]

I recommend that this article should be merged with the Unique key article, as a subsection of that article. (talk) 03:01, 27 March 2010 (UTC)

I disagree with this idea. Both a primary key (PK) and a unique (or alternate) key (UK or AK) are both candidate keys. Said differently, candidate key is an encompassing idea to those two subset ideas. It would be a mistake to simply merge this with UK, without also including PK in a new single entry. As there are commonalities and differences among these three concepts, and with candidate key not part of actual DBMS definitions (which both PK and UK share), it seems difficult to imagine what a single entry would look like.

Keeping separate entries for the three keys, properly linked, is a best approach (IMO). —Preceding unsigned comment added by (talk) 18:54, 30 March 2011 (UTC)

I also disagree with the merge. --Wayiran (talk) 02:13, 13 April 2011 (UTC)

Relation vs. Relvar[edit]

The article intro implies that what possesses a candidate key is a relation. Whereas the body of the article says that what possesses a candidate key is a relvar - "We have to consider all the relations that might be assigned to a relvar to determine whether a certain set of attributes is a candidate key."

I certainly side with the "relvar" interpretation. Though confusingly, in Codd's definition of "candidate key" in his "Further Normalization of the Relational Model" article, candidate keys are defined with respect to relations. (I'm aware the term "relvar" didn't exist back then.)

It's a problem. We get a lot of discussion-page comments on the NF articles where people express confusion about why the candidate keys in our example tables are what they are, given the data in those tables. I tell them that the data currently in the tables doesn't matter; you can't deduce candidate keys from that. And yet Codd's original definition could be seen to reinforce their (from my point of view) mistaken interpretation.

Perhaps the article should acknowledge that this is an area of confusion, and address it explicitly. --Nabav (talk) 09:17, 15 May 2010 (UTC)

poor example[edit]

The example shown is largely visual in nature, and is incomprehensible to someone using alternate display technology (such as adaptive technology for the blind). It needs explanatory text added. — Preceding unsigned comment added by (talk) 02:15, 14 April 2012 (UTC)