Jump to content

Talk:First normal form

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 65.117.116.130 (talk) at 21:34, 29 December 2010 (What if NULL is part of the applicable domain?). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Primary keys with informational content

A commonly overlooked violation of 1NF is building intelligence into an identifier. The VIN for an automobile is not atomic, besides a sequence number it contains:

1) A code identifying the region where the vehicle was manufactured;

2) A code identifying the manufacturer;

3) Several codes identifying attributes of the vehicle;

4) A code indicating the vehicle model year;

5) A code indicating the plant where the vehicle was made;

Sadly, the VIN is probably the primary key to many tables in existence today. Business persons have a natural tendency to build inteligent keys. There is no mention of this in the main article, it only addresses the classic textbook example of eliminating redundant columns. Mooredc 17:56, 16 August 2006 (UTC)[reply]

GUIDs contain (or, did in the past) a mac address and a timestamp, does that make a GUID a bad key? Isn’t the question more to do with whether that “intelligence” is actually used for anything? —Random832 20:33, 19 September 2007 (UTC)[reply]
In August 2006 when Mooredc wrote this, the article did not contain a section on atomicity; now it does, and hopefully it gives a sense of why Date and others consider it a mistake to hang one's definition of 1NF on the notion of atomicity. 1NF aside, there are many situations in which building "intelligence" into a domain - particularly when it's a primary key column's domain - is a bad idea. If CAR_ID = "RED16", and the "RED" substring refers to the colour of the car in question, then we have a dilemma when someone paints the car blue. The identifier isn't much of an identifier if we are compelled to alter its value from time to time; on the other hand, if we keep its value the same in these types of situations, its meaning becomes muddled. Further, if knowing the colour of the car is important, we may as well expose the colour to the RDBMS by making the colour a column in its own right. You are exactly right when you say that what matters is whether the intelligence is used for anything - if it is, we get problems of the type described; if it isn't, we don't. --Nabav 22:03, 2 October 2007 (UTC)[reply]

I don't know if this is the right place to add my two pence, but I was wondering if it wouldn't be a good idea to have some kind of links in the bottom of the article to move from 1NF to 2NF to 3NF and so on, instead of having to go back to the normalization in order to move from the article on one normal form to the other. ray 17:04, 21 August 2006 (UTC)[reply]

Glad I'm not the only person thinking that. I'll get on with that template.--VinceBowdren 18:04, 21 August 2006 (UTC)[reply]
Obviously, your comment here did get noticed. I've been treating Talk:Database normalization as the place for discussion of this set of articles in general, though I and the others working in this area have all the articles on our watchlists anyway. I don't think its worth setting up a WikiProject for this few linked articles.--VinceBowdren 19:12, 21 August 2006 (UTC)[reply]

What if NULL is part of the applicable domain?

I mean, take this:

First Name Last Name Middle Name
John Public Q
J Hacker Random
John Doe

“Has no middle name” is a valid answer, yet having a nullable column seems to violate that requirement —Random832 20:28, 19 September 2007 (UTC)[reply]

The first point to make is that Null is never part of the applicable domain. The Null (SQL) article puts this succinctly: 'Since Null is not a member of any data domain, it is not considered a "value", but rather a marker (or placeholder) indicating the absence of value.'
Second, although you are completely right in saying that "has no middle name" is a valid answer, it does not follow that a Null in a Middle Name column is the only way of expressing the proposition. There are other ways of doing it. A logically-sound way that does not involve Nulls is:
Person ID First Name Last Name
1 John Public
2 J Hacker
3 John Doe
4 George Bush
Person ID Middle Name Sequence Num Middle Name
1 1 Q
2 1 Random
4 1 Herbert
4 2 Walker
Notice that with this method we also allow a person to have any number of middle names.
The two tables above are in 1NF even by Date's stringent definition. --Nabav 06:51, 2 October 2007 (UTC)[reply]
Agreed. Also even the table in the primary example needn't necessarily contain a Null value in the sense mentioned in the Null (SQL) article to present as such. It could, in fact, contain instead a single character string comprised solely of the "nul" character (\0). Nul is a legitimate member of the domain the larger of characters (generally serving to mark the end of a variable length string) and is logically separate of the "Null" value (i.e. an empty value set) MerlinYoda (not signed in) 65.117.116.130 (talk) 21:34, 29 December 2010 (UTC)[reply]

Seems like the use of the word tuple is inconsistent

The article contains the following:

A view whose definition mandates that results be returned in a particular order, so that the row-ordering is an intrinsic and meaningful aspect of the view.[5] This violates condition 1. The tuples in true relations are not ordered.

However, when I followed the link for tuple, the formal definition said that tuples are ordered. —Preceding unsigned comment added by 84.75.117.176 (talk) 10:44, 30 September 2007 (UTC)[reply]

The formal definition is talking about how the values within a tuple are ordered, whereas here we're talking about something different, namely the possibility of tuples being ordered with respect to each other. I've changed the wording to make this more clear. --Nabav 22:49, 1 October 2007 (UTC)[reply]

1NF Example is not appropriate

Please revise the 1NF Example as it can be an example of 2NF also. —Preceding unsigned comment added by 122.164.254.49 (talk) 11:14, 1 October 2007 (UTC)[reply]

done.Heathcliff (talk) 03:39, 8 February 2008 (UTC)[reply]
There was nothing wrong with the 1NF example being in 2NF as well. As has been explained before on the discussion pages of some of the other NF articles, and in the main database normalisation article, normalisation is not an iterative process: we do not normalise to 1NF, then to 2NF, then to 3NF, etc. On the contrary, if a design problem prevents a table from meeting Nth Normal Form, correction of the problem typically causes the revised table(s) to meet not only Nth Normal Form, but the higher normal forms as well (thus for example the revised table(s) will be overwhelmingly likely to be in 5th Normal Form). Thus I've reverted the example back. Having said all this, I believe the article DOES need some additional material that briefly notes that some 1NF tables suffer from problems, and that the job of 2NF is to address those problems. An example of such a 1NF table could be given. All of this would be in a separate section called something like "Normalization Beyond 1NF", and, of course, a link to the 2NF article would be given within that section. --Nabav (talk) 18:49, 29 February 2008 (UTC)[reply]
I've added the "Normalization Beyond 1NF" section now. --Nabav (talk) 19:41, 28 April 2008 (UTC)[reply]

How is the example given under the heading "Normalization Beyond 1NF" a 1NF table? IF you assume the customer ID is the primary key here, then isn't there a violation of having unque key in rows 2 and 3? - Henry —Preceding unsigned comment added by 192.193.164.8 (talk) 07:52, 7 August 2008 (UTC)[reply]

To assume that the customer ID is the key is to assume that Telephone Number must be functionally dependent on Customer. It isn't. The reason it isn't is that multiple telephone numbers can legitimately correspond to a single customer in our example. Upshot: the key is {Customer ID, Telephone Number}. I've made this explicit in the article now. --Nabav (talk) 21:02, 9 August 2008 (UTC)[reply]
As someone new to 1NF I find this confusing, because your example here is essentially identical to the original problem in the 1st example. If you are saying that another acceptable solution to make the table conform to 1NF is to alter the key, then that needs to be explicitly stated in the article. —Preceding unsigned comment added by 67.85.254.145 (talk) 16:57, 29 March 2009 (UTC)[reply]
Point taken, it was confusing. I've changed the example. --Nabav (talk) 16:50, 31 March 2009 (UTC)[reply]

Atomicity

Seems like there should be some mention of the importance of atomicity instead of just explaining that Date disagrees with Codd and listing the ways atomicity can be taken to extremes. Ostensibly, Codd was trying to say that putting a field in a field isn't generally a good idea. For example, you wouldn't want a flight number field to contain the codes for the airports of departure and arrival. That's the common sense way to interpret atomicity. Seems like that should be mentioned first instead of not at all. Then talk about all the theoretical mumbo jumbo that has limited practical value to someone just learning about 1NF. --Trweiss (talk) 22:09, 12 May 2008 (UTC)[reply]

Cases in which somebody wants to place more than one of the same type of thing in a single field have been covered in the "Repeating Groups" section of the article. The case you mention, involving placing multiple different types of things (i.e. departure airport and arrival airport) in the same field, arguably hasn't been covered. And covering it might be a good way to introduce atomicity. As an intro to the Atomicity section, we could mention something like the departure airport / arrival airport case, and make the point that this is the sort of thing people are trying to discourage when they encourage "atomicity". So in summary: I agree! We just need to be careful not to give the reader the impression that atomicity is a clearer and more well-defined concept than it actaully is. Atomicity is in the eye of the beholder (a string field can contain multiple letters of the alphabet combined together into words, for example, and no one complains about that) ... If the smaller components of the data within the field are not worth separating out as fields in their own right, then don't; if they are, then do. --Nabav (talk) 08:10, 13 May 2008 (UTC)[reply]
Yes Date disagreed with Codd about atomicity, but he expresses what he means no less obscurely than Codd did. His 4th condition "Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else)" is just Codd's atomicity condition (with something intended to help justify his exclusion of nulls tacked onto the end in partentheses) and I am sure Codd would have agreed with Date's wording (apart perhaps from the parenthetical ending). Codd said that a value in the domain should not be decomposable except by some special functions; Date said decomposable values are usually not allowed but are allowed sometimes. I actually found Codd's ideas less vague and undefined than Date's, because it was quite clear that Codd knew what he wanted to forbid (and, in my opinion, quite clear what that was). It's not at all clear what Date wanted to forbid, or even that he had any fixed idea about what he wanted to forbid (apart from forbidding NULL). Nabav's last sentence above seems to be quite a good description of Codd's objectives in atomicity, and also of what I think Date was probably trying to say, and probably ought to be in the article.MichealT (talk) 13:21, 7 March 2010 (UTC)[reply]

1NF tables as representations of relations

The requirement that tables represent mathematical relations (no duplication or implicit ordering of rows) is not part of 1NF, but is prior to the idea of 1NF; if Date says otherwise he is wrong. The article should reflect Codd's original article, which proceeds in two steps: first, the proposal to base data modeling on mathematical relations, and second, the proposal of "normalization", later called 1NF, which is the systematic elimination of "non-simple" value domains. The article should not confuse these two things like it does now. Rp (talk) 14:50, 3 February 2009 (UTC)[reply]

In general, our NF articles reflect the original text in which the NF was introduced. The original text typically provides an absolutely rigorous definition of the NF. Not so in the case of 1NF, however. Codd never supplied a rigorous definition of 1NF - whereas, in his "Further Normalization..." article, he DID supply rigorous definitions of 2NF and 3NF.
What you get in Codd's "Relational Model of Data for Large Shared Data Banks" article is a rather informal description of a procedure which Codd called normalization. The description of the procedure is reliant on highly ambiguous notions such as that of a "non-simple domain", which is related to the equally ambiguous notion of "atomicity". Because of this ambiguity, it is quite impossible to make definite judgments as to whether a table is in (Codd's) 1NF or not.
Date's definition is not an attempt to express Codd's definition in different language. Rather, it is an attempt to define a version of 1NF that rests on unambiguous criteria and is (from Date's point of view) sensible. Fidelity to Codd's intentions is not particularly relevant here. Date disagrees with Codd on certain points, and would not claim that his definition of 1NF is in line with all of Codd's intentions (whatever they may have been).
Incidentally, I believe that in stipulating as part of his 1NF that a table has to be isomorphic to a relation, Date IS in agreement with Codd. The distinction you are making - between "a requirement prior to X" and "a requirement of X" - cannot hold. "A requirement prior to X" clearly IS a requirement of X. For example, the property of "being a fisherman" is a property of human beings, and I'm sure you'd agree that the idea of being human is prior to the idea of being a fisherman. But we speak the truth if we cite "being human" as one of the criteria for being a fisherman. Most definitions would not bother mentioning it, but a definition that does mention it is not wrong. ("A human being who fishes by profession" - this would not be wrong.)
This is exactly what Date is doing. And one can see why. Both he and Codd think that 1NF is a property of relations - well then, anything that is not a relation clearly cannot be in 1NF. A table with duplicate rows is not a relation. Therefore a table with duplicate rows is not in 1NF. Of course, at the time Codd was writing, there was no point even mentioning this, because SQL tables (which of course can accommodate duplicate rows if no unique key constraint has been defined) did not exist at that time! But they certainly do exist now, which is why it makes sense to say explicitly that a table that can accommodate duplicate rows is not in 1NF. --Nabav (talk) 15:21, 6 February 2009 (UTC)[reply]
I disagree. Regardless of his inaccurate description, Codd's notion of 1NF is a property of mathematical relations, namely the absence of "non-simple" attribute values, and this is how mathematical treatments of relational databases use the term. Therefore, Date's extension of the notion, which includes certain requirements on implementations of mathematical relations, is confusing, and the article should be amended to correct this. Wikipedia should explain terms as they are normally used, and not restrict itself to particular ways of using them, especially when they are not the most common. At the same time, Date's books are so popular that his use of the term must also be explained here. Rp (talk) 10:03, 9 February 2009 (UTC)[reply]
Perhaps you are right that the article currently doesn't devote enough attention to Codd's original account of 1NF. This can be corrected. However, I'd just sound one note of caution. When you say that Wikipedia should explain terms as they are "normally" used, I basically agree with you; but I don't agree that the term "first normal form" is normally used in either the way Codd uses it in his "Relational Model of Data for Large Shared Data Banks" article, or in the way Date uses it. The normal usage of the term, the most popular usage I have come across, is as a piece of practical database design guidance, according to which (for instance) examples 2 and 3 in this article are "violations" of 1NF - even though none of the attributes in those examples are of type RELATION or TABLE. The best one can say about such examples is that they somehow contravene the spirit of 1NF, if not the letter. Getting the right balance in an article about 1NF is very tricky, because the article must do justice not only to popular examples of that kind, but also to Codd's original characterization of 1NF, and also to Date's new definition of 1NF - all with a view to illuminating the actual implications of 1NF for databases, because those implications are what give 1NF its importance in the first place. Each of those three understandings of 1NF is controversial in its own way. Nevertheless, all we can do is continue to tweak the article until we do get the balance right! --Nabav (talk) 11:23, 9 February 2009 (UTC)[reply]

Isomorphic to a relation

There is a blatant contradiction between the between "isomorphic to a relation" and number 2 in the list of 5 conditions. A relation is nothing more or less that a subset of the cartesian product of several domains: each element is therefore an ordered finite sequence of values. The attributes don't have names. If a table is to be isomorphic to a relation, the columns must be ordered. This is an error on Date's part.

A better statement would be that a table must be a map between attribute names and values. One could also say that when a total order is assigned to the map's set of attribute names and is used to order the values in each each map element so as to produce a tuple (sequence of values) for each map element: the set of such tuples is required to be a relation (a fairly meaningless statement, as it's easy to show that any such construction from a finite map will generate a relation).

Of course it's possible to get the attribute names into a relation (replace each attribute domain by the cartesian product of the singleton set containing the attribute name and the original attribute domain) but the description is rather far-fetched and requires a lot of mental gymnastics to explain how one can handle the concept os the domain with attribute-name "pet" in one table holding a value equal to one in the domain with attribute-name "dog" in another table (redefining equality is such fun) and even then the order is still there - there's no way of eliminating it from a relation. So I prefer to say that a table must be (isomorphic to) a set-theoretic map, so that the fields are accessed by name and not by order.MichealT (talk) 13:04, 7 March 2010 (UTC)[reply]