Talk:First normal form

From Wikipedia, the free encyclopedia
Jump to: navigation, search
WikiProject Databases / Computer science   
WikiProject icon This article is within the scope of WikiProject Databases, a collaborative effort to improve the coverage of database related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
 ???  This article has not yet received a rating on the project's quality scale.
 ???  This article has not yet received a rating on the project's importance scale.
Taskforce icon
This article is supported by WikiProject Computer science.
 

Primary keys with informational content[edit]

A commonly overlooked violation of 1NF is building intelligence into an identifier. The VIN for an automobile is not atomic, besides a sequence number it contains:

1) A code identifying the region where the vehicle was manufactured;

2) A code identifying the manufacturer;

3) Several codes identifying attributes of the vehicle;

4) A code indicating the vehicle model year;

5) A code indicating the plant where the vehicle was made;

Sadly, the VIN is probably the primary key to many tables in existence today. Business persons have a natural tendency to build inteligent keys. There is no mention of this in the main article, it only addresses the classic textbook example of eliminating redundant columns. Mooredc 17:56, 16 August 2006 (UTC)

GUIDs contain (or, did in the past) a mac address and a timestamp, does that make a GUID a bad key? Isn’t the question more to do with whether that “intelligence” is actually used for anything? —Random832 20:33, 19 September 2007 (UTC)
In August 2006 when Mooredc wrote this, the article did not contain a section on atomicity; now it does, and hopefully it gives a sense of why Date and others consider it a mistake to hang one's definition of 1NF on the notion of atomicity. 1NF aside, there are many situations in which building "intelligence" into a domain - particularly when it's a primary key column's domain - is a bad idea. If CAR_ID = "RED16", and the "RED" substring refers to the colour of the car in question, then we have a dilemma when someone paints the car blue. The identifier isn't much of an identifier if we are compelled to alter its value from time to time; on the other hand, if we keep its value the same in these types of situations, its meaning becomes muddled. Further, if knowing the colour of the car is important, we may as well expose the colour to the RDBMS by making the colour a column in its own right. You are exactly right when you say that what matters is whether the intelligence is used for anything - if it is, we get problems of the type described; if it isn't, we don't. --Nabav 22:03, 2 October 2007 (UTC)

Links to other forms articles[edit]

I don't know if this is the right place to add my two pence, but I was wondering if it wouldn't be a good idea to have some kind of links in the bottom of the article to move from 1NF to 2NF to 3NF and so on, instead of having to go back to the normalization in order to move from the article on one normal form to the other. ray 17:04, 21 August 2006 (UTC)

Glad I'm not the only person thinking that. I'll get on with that template.--VinceBowdren 18:04, 21 August 2006 (UTC)
Obviously, your comment here did get noticed. I've been treating Talk:Database normalization as the place for discussion of this set of articles in general, though I and the others working in this area have all the articles on our watchlists anyway. I don't think its worth setting up a WikiProject for this few linked articles.--VinceBowdren 19:12, 21 August 2006 (UTC)

What if NULL is part of the applicable domain?[edit]

I mean, take this:

First Name Last Name Middle Name
John Public Q
J Hacker Random
John Doe

“Has no middle name” is a valid answer, yet having a nullable column seems to violate that requirement —Random832 20:28, 19 September 2007 (UTC)

The first point to make is that Null is never part of the applicable domain. The Null (SQL) article puts this succinctly: 'Since Null is not a member of any data domain, it is not considered a "value", but rather a marker (or placeholder) indicating the absence of value.'
Second, although you are completely right in saying that "has no middle name" is a valid answer, it does not follow that a Null in a Middle Name column is the only way of expressing the proposition. There are other ways of doing it. A logically-sound way that does not involve Nulls is:
Person ID First Name Last Name
1 John Public
2 J Hacker
3 John Doe
4 George Bush
Person ID Middle Name Sequence Num Middle Name
1 1 Q
2 1 Random
4 1 Herbert
4 2 Walker
Notice that with this method we also allow a person to have any number of middle names.
The two tables above are in 1NF even by Date's stringent definition. --Nabav 06:51, 2 October 2007 (UTC)
Agreed. Also even the table in the primary example needn't necessarily contain a Null value in the sense mentioned in the Null (SQL) article to present as such. It could, in fact, contain instead a single character string comprised solely of the "nul" character (\0). Nul is a legitimate member of the set of possible characters in a string (generally serving to mark the end of a variable length string) and is therefore separate of the "Null" value (i.e. the empty data set) MerlinYoda (not signed in) 65.117.116.130 (talk) 21:34, 29 December 2010 (UTC)
Is this "logically-sound way that does not involve Nulls", in which individual words of a person's name are stored in separate rows of some relation, ever implemented in a practical system? If so, which system? And if not, why not? --Damian Yerrick (talk | stalk) 15:58, 5 October 2011 (UTC)
The trouble with this discussion is that the example of a domain which contains null is very wrong and all the discussion is about this straw man. The statement that null is never part of the applicable domain should not be left unchallenged. The definition of NULL in SQL has nothing to do with relational algebra and much to do with SQL. Christopher Strachey (the first director of the OU Programming Research Group, and Oxford's first full professor of computer science) worked on the use of domain theory to obtain a semantics for computation beginning in about 1967 and in later years continued this work in collaboration with Dana Scott when Scott was at Oxford (from 1972 onwards) until his death in 1975, after which Scott continued to elaborate the theory. The result of this collaboration was of course the denotational semantics for which the names "Scott and Strachey" are famous in both the computer science world and the mathematics world. The domains Strachey was using (and those which Scott used when he joined in) always had a NULL element (usually called "bottom"); in fact bottom has become very much a standard concept in the field of declarative programming (which most relational database theorists claims includes their relational algebra). The bottom element of a domain is that element which provides no information other than it is a member of the domain concerned. Since relational database theory (and normalisation too) are, unlike SQL, part of computer science and mathematics, domains in relational theory can and usually do include NULL, although it is extremely good practise to avoid its use wherever possible (which, luckily, is quite often). So now tell me: if I want to represent these domains, including NULLs, why can't I do so and how does it conflict with first normal form? Michealt (talk) 21:55, 1 May 2011 (UTC)

Seems like the use of the word tuple is inconsistent[edit]

The article contains the following:

A view whose definition mandates that results be returned in a particular order, so that the row-ordering is an intrinsic and meaningful aspect of the view.[5] This violates condition 1. The tuples in true relations are not ordered.

However, when I followed the link for tuple, the formal definition said that tuples are ordered. —Preceding unsigned comment added by 84.75.117.176 (talk) 10:44, 30 September 2007 (UTC)

The formal definition is talking about how the values within a tuple are ordered, whereas here we're talking about something different, namely the possibility of tuples being ordered with respect to each other. I've changed the wording to make this more clear. --Nabav 22:49, 1 October 2007 (UTC)

So the link for tuple contradicts the second point of the definition - that there is no left to right column ordering - by saying that the elements of a tuple have an intrinsic order. Michealt (talk) 23:57, 16 August 2011 (UTC)

1NF Example is not appropriate[edit]

Please revise the 1NF Example as it can be an example of 2NF also. —Preceding unsigned comment added by 122.164.254.49 (talk) 11:14, 1 October 2007 (UTC)

done.Heathcliff (talk) 03:39, 8 February 2008 (UTC)
There was nothing wrong with the 1NF example being in 2NF as well. As has been explained before on the discussion pages of some of the other NF articles, and in the main database normalisation article, normalisation is not an iterative process: we do not normalise to 1NF, then to 2NF, then to 3NF, etc. On the contrary, if a design problem prevents a table from meeting Nth Normal Form, correction of the problem typically causes the revised table(s) to meet not only Nth Normal Form, but the higher normal forms as well (thus for example the revised table(s) will be overwhelmingly likely to be in 5th Normal Form). Thus I've reverted the example back. Having said all this, I believe the article DOES need some additional material that briefly notes that some 1NF tables suffer from problems, and that the job of 2NF is to address those problems. An example of such a 1NF table could be given. All of this would be in a separate section called something like "Normalization Beyond 1NF", and, of course, a link to the 2NF article would be given within that section. --Nabav (talk) 18:49, 29 February 2008 (UTC)
I've added the "Normalization Beyond 1NF" section now. --Nabav (talk) 19:41, 28 April 2008 (UTC)

How is the example given under the heading "Normalization Beyond 1NF" a 1NF table? IF you assume the customer ID is the primary key here, then isn't there a violation of having unque key in rows 2 and 3? - Henry —Preceding unsigned comment added by 192.193.164.8 (talk) 07:52, 7 August 2008 (UTC)

To assume that the customer ID is the key is to assume that Telephone Number must be functionally dependent on Customer. It isn't. The reason it isn't is that multiple telephone numbers can legitimately correspond to a single customer in our example. Upshot: the key is {Customer ID, Telephone Number}. I've made this explicit in the article now. --Nabav (talk) 21:02, 9 August 2008 (UTC)
As someone new to 1NF I find this confusing, because your example here is essentially identical to the original problem in the 1st example. If you are saying that another acceptable solution to make the table conform to 1NF is to alter the key, then that needs to be explicitly stated in the article. —Preceding unsigned comment added by 67.85.254.145 (talk) 16:57, 29 March 2009 (UTC)
Point taken, it was confusing. I've changed the example. --Nabav (talk) 16:50, 31 March 2009 (UTC)

Pure 1NF Example mention seems to me necessary:

1NF Simple Solution of Repeating groups[edit]

CUSTOMER_TELEPHONE
Customer ID First Name Surname Telephone Number
123 Robert Ingram 555-861-2025
456 Jane Wright 555-403-1659
456 Jane Wright 555-776-4100
789 Maria Fernandez 555-808-9633

This 1NF design, not in 2NF, is relational and allows further normalization. The new PK is (CustomerId,TelephoneNr). EnriqueVillar Jan 12, 2011 —Preceding unsigned comment added by 88.31.77.59 (talk) 17:32, 12 January 2011 (UTC)

Atomicity[edit]

Seems like there should be some mention of the importance of atomicity instead of just explaining that Date disagrees with Codd and listing the ways atomicity can be taken to extremes. Ostensibly, Codd was trying to say that putting a field in a field isn't generally a good idea. For example, you wouldn't want a flight number field to contain the codes for the airports of departure and arrival. That's the common sense way to interpret atomicity. Seems like that should be mentioned first instead of not at all. Then talk about all the theoretical mumbo jumbo that has limited practical value to someone just learning about 1NF. --Trweiss (talk) 22:09, 12 May 2008 (UTC)

Cases in which somebody wants to place more than one of the same type of thing in a single field have been covered in the "Repeating Groups" section of the article. The case you mention, involving placing multiple different types of things (i.e. departure airport and arrival airport) in the same field, arguably hasn't been covered. And covering it might be a good way to introduce atomicity. As an intro to the Atomicity section, we could mention something like the departure airport / arrival airport case, and make the point that this is the sort of thing people are trying to discourage when they encourage "atomicity". So in summary: I agree! We just need to be careful not to give the reader the impression that atomicity is a clearer and more well-defined concept than it actaully is. Atomicity is in the eye of the beholder (a string field can contain multiple letters of the alphabet combined together into words, for example, and no one complains about that) ... If the smaller components of the data within the field are not worth separating out as fields in their own right, then don't; if they are, then do. --Nabav (talk) 08:10, 13 May 2008 (UTC)
Yes Date disagreed with Codd about atomicity, but he expresses what he means no less obscurely than Codd did. His 4th condition "Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else)" is just Codd's atomicity condition (with something intended to help justify his exclusion of nulls tacked onto the end in partentheses) and I am sure Codd would have agreed with Date's wording (apart perhaps from the parenthetical ending). Codd said that a value in the domain should not be decomposable except by some special functions; Date said decomposable values are usually not allowed but are allowed sometimes. I actually found Codd's ideas less vague and undefined than Date's, because it was quite clear that Codd knew what he wanted to forbid (and, in my opinion, quite clear what that was). It's not at all clear what Date wanted to forbid, or even that he had any fixed idea about what he wanted to forbid (apart from forbidding NULL). Nabav's last sentence above seems to be quite a good description of Codd's objectives in atomicity, and also of what I think Date was probably trying to say, and probably ought to be in the article.MichealT (talk) 13:21, 7 March 2010 (UTC)

1NF tables as representations of relations[edit]

The requirement that tables represent mathematical relations (no duplication or implicit ordering of rows) is not part of 1NF, but is prior to the idea of 1NF; if Date says otherwise he is wrong. The article should reflect Codd's original article, which proceeds in two steps: first, the proposal to base data modeling on mathematical relations, and second, the proposal of "normalization", later called 1NF, which is the systematic elimination of "non-simple" value domains. The article should not confuse these two things like it does now. Rp (talk) 14:50, 3 February 2009 (UTC)

In general, our NF articles reflect the original text in which the NF was introduced. The original text typically provides an absolutely rigorous definition of the NF. Not so in the case of 1NF, however. Codd never supplied a rigorous definition of 1NF - whereas, in his "Further Normalization..." article, he DID supply rigorous definitions of 2NF and 3NF.
What you get in Codd's "Relational Model of Data for Large Shared Data Banks" article is a rather informal description of a procedure which Codd called normalization. The description of the procedure is reliant on highly ambiguous notions such as that of a "non-simple domain", which is related to the equally ambiguous notion of "atomicity". Because of this ambiguity, it is quite impossible to make definite judgments as to whether a table is in (Codd's) 1NF or not.
Date's definition is not an attempt to express Codd's definition in different language. Rather, it is an attempt to define a version of 1NF that rests on unambiguous criteria and is (from Date's point of view) sensible. Fidelity to Codd's intentions is not particularly relevant here. Date disagrees with Codd on certain points, and would not claim that his definition of 1NF is in line with all of Codd's intentions (whatever they may have been).
Incidentally, I believe that in stipulating as part of his 1NF that a table has to be isomorphic to a relation, Date IS in agreement with Codd. The distinction you are making - between "a requirement prior to X" and "a requirement of X" - cannot hold. "A requirement prior to X" clearly IS a requirement of X. For example, the property of "being a fisherman" is a property of human beings, and I'm sure you'd agree that the idea of being human is prior to the idea of being a fisherman. But we speak the truth if we cite "being human" as one of the criteria for being a fisherman. Most definitions would not bother mentioning it, but a definition that does mention it is not wrong. ("A human being who fishes by profession" - this would not be wrong.)
This is exactly what Date is doing. And one can see why. Both he and Codd think that 1NF is a property of relations - well then, anything that is not a relation clearly cannot be in 1NF. A table with duplicate rows is not a relation. Therefore a table with duplicate rows is not in 1NF. Of course, at the time Codd was writing, there was no point even mentioning this, because SQL tables (which of course can accommodate duplicate rows if no unique key constraint has been defined) did not exist at that time! But they certainly do exist now, which is why it makes sense to say explicitly that a table that can accommodate duplicate rows is not in 1NF. --Nabav (talk) 15:21, 6 February 2009 (UTC)
I disagree. Regardless of his inaccurate description, Codd's notion of 1NF is a property of mathematical relations, namely the absence of "non-simple" attribute values, and this is how mathematical treatments of relational databases use the term. Therefore, Date's extension of the notion, which includes certain requirements on implementations of mathematical relations, is confusing, and the article should be amended to correct this. Wikipedia should explain terms as they are normally used, and not restrict itself to particular ways of using them, especially when they are not the most common. At the same time, Date's books are so popular that his use of the term must also be explained here. Rp (talk) 10:03, 9 February 2009 (UTC)
Perhaps you are right that the article currently doesn't devote enough attention to Codd's original account of 1NF. This can be corrected. However, I'd just sound one note of caution. When you say that Wikipedia should explain terms as they are "normally" used, I basically agree with you; but I don't agree that the term "first normal form" is normally used in either the way Codd uses it in his "Relational Model of Data for Large Shared Data Banks" article, or in the way Date uses it. The normal usage of the term, the most popular usage I have come across, is as a piece of practical database design guidance, according to which (for instance) examples 2 and 3 in this article are "violations" of 1NF - even though none of the attributes in those examples are of type RELATION or TABLE. The best one can say about such examples is that they somehow contravene the spirit of 1NF, if not the letter. Getting the right balance in an article about 1NF is very tricky, because the article must do justice not only to popular examples of that kind, but also to Codd's original characterization of 1NF, and also to Date's new definition of 1NF - all with a view to illuminating the actual implications of 1NF for databases, because those implications are what give 1NF its importance in the first place. Each of those three understandings of 1NF is controversial in its own way. Nevertheless, all we can do is continue to tweak the article until we do get the balance right! --Nabav (talk) 11:23, 9 February 2009 (UTC)

Isomorphic to a relation[edit]

There is a blatant contradiction between the between "isomorphic to a relation" and number 2 in the list of 5 conditions. A relation is nothing more or less that a subset of the cartesian product of several domains: each element is therefore an ordered finite sequence of values. The attributes don't have names. If a table is to be isomorphic to a relation, the columns must be ordered. This is an error on Date's part.

A better statement would be that a table must be a map between attribute names and values. One could also say that when a total order is assigned to the map's set of attribute names and is used to order the values in each each map element so as to produce a tuple (sequence of values) for each map element: the set of such tuples is required to be a relation (a fairly meaningless statement, as it's easy to show that any such construction from a finite map will generate a relation).

Of course it's possible to get the attribute names into a relation (replace each attribute domain by the cartesian product of the singleton set containing the attribute name and the original attribute domain) but the description is rather far-fetched and requires a lot of mental gymnastics to explain how one can handle the concept os the domain with attribute-name "pet" in one table holding a value equal to one in the domain with attribute-name "dog" in another table (redefining equality is such fun) and even then the order is still there - there's no way of eliminating it from a relation. So I prefer to say that a table must be (isomorphic to) a set-theoretic map, so that the fields are accessed by name and not by order.MichealT (talk) 13:04, 7 March 2010 (UTC)

Inaccessible lead needs improvement[edit]

The lead paragraph is too technical. It is my understanding that an 1NF can roughly be summarized as a table where the fields only contain one piece of information (no lists items). This statement may not be technically rigorous but it at least tries to capture the gist of the situation so that beginners can gain a basic understanding of the topic. The sentence, "A [1NF] table is a faithful representation of a relation and that it is free of repeating groups" absolutely inaccessible to the bulk of the readership. It is an instant turnoff that belongs nowhere in the lead. Technical jargon should be appear after a more down-to-earth introduction. I don't feel like I have the background to wisely word a less technical lead, so hopefully one of you can do it. Jason Quinn (talk) 14:48, 16 June 2011 (UTC)

Perhaps Date puts it better, when he writes

"At every row-and-column position within the relation [table], there is always exactly one data value, never a set of multiple values. In other words, first normal form just means no repeating groups (loosely speaking).

 :Of course, even this could be misleading to a beginner, because a value may be a relation value (and may therefore contain more than one primitive values); so to give beginners a basic understanding of the topic (i.e. not lead them into believing things which appear obvious but are fundamentally incorrect), we might—counterintuitively— need to give more, rather than fewer, technical details. This is because the beginner might think it is obvious what "data value" means, where it is not.--Boson (talk) 20:37, 16 June 2011 (UTC)
Physics articles have a similar issue of trying to make technical topics accessible. This is done with some success there. Between the two, database concepts are far less complex. I see no reason why simple introductions cannot be given for the various normal forms. From my own reading of beginner's material on databases, I think that obfuscation through jargon is endemic to the whole field. It is almost as if the authors are trying to make the subject more complex than it is. It's just bad writing. I took courses in mathematics where the foundational mathematical concepts that underlie relational databases were presented with far more clarity and ease of access than some introductory database books manage to achieve through their "practical" approach. That's ass backwards. I do not see giving a non-precise "working" definition before a precise one as a problem. (Imagine if we did that with the real numbers: "Hey kids, the real numbers are the unique complete Archimedean ordered field up to isomorphism".) From a pedagogical perspective, it is necessary to motivate a complex topic by using less-complex examples. If some precision is lost because of this, it can always be clarified later. The value of learning by example cannot be underestimated. There's no good reason that examples of 1NF and non-1NF tables cannot be given before the technical sections. The human brain is a learning machine. We are able to extract abstract concepts from concrete examples even when the cause of the differences is not known. The examples that are given in the "Repeating groups" section might as well not exist at all because the technical jargon surrounding them has already alienated most of the audience likely to read this article. They should be moved up and outside of that section. Jason Quinn (talk) 02:06, 17 June 2011 (UTC)

Primary Keys[edit]

One big difference between the Date and Codd formulations of first normal form is that Codd regarded primary keys as crucial - every relation must have one or more primary keys (today those would be called candidate keys) and the definition of the relation must specify which of these primary keys is to be the primary key. The importance of the primary key is that it, together with an identification of the relation, is the means by which a row is identified by a user of the relational system - so the primary key is something meaningful to the user. As a consequence of the requirement for a primary key, a relation can not contain duplicate rows. The Date view seems to be that it is fundamental that there are no duplicate rows; so there is at least one superkey (the whole row) and so there must be at least one minimal superkey, that is at least one candidate key, and any one of those can be picked as the primary key. When it comes to views, since duplicates are eliminated from projections there is still guaranteed to be a primary key. Superficially, these two approaches look the same; in fact they turn out to have very different consequences, and probably explain why Codd could envisage a relation permitting attributes that could be NULL, while this is anathema to Date (Codd in his later work doiesn't require derived relations to have primary keys - but Date's starting from the position that there can never be duplicates and needing the existence of a primary key to be a consequence of this for base relations, can't allow nulls); it also leads to a difference of views about whether the meanigfulness (to users) of primary keys is or is not fundamental. It is a great pity that teh article does not treat primary keys at all - but I suppose it is an inevitable consequence of it s being written from a perspective based on Date's approach, instead of Codd's. I think the article could be greatly improved by inclusion of both approaches, instead of just one.Michealt (talk) 00:21, 17 August 2011 (UTC)

Link to 'normal forms' in lead sentence[edit]

Hello Jerome Potts. The new lead sentence references normal form. This is likely to confuse new readers who will feel the need to read the Database normalization#Normal forms section in order to understand what first normal form is. It isn't easy to understand the linked section, and it isn't necessary in order to understand the definition. It is sufficient to say 'First normal form is a property of a relation', and reference the normal forms section later, for further reading. - Crosbie 18:43, 21 March 2013 (UTC)

Statement of First Normal Form is Too Narrow[edit]

The second sentence of the main article reads: "A relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain."

There are two problems with this:

  1. Neither the relational model nor any of the well-known databases allows anything other than an atomic value as the domain. (For the present purposes, consider the newest additions to SQL, such as row and array values, to be atomic.) As the main article says, "A typical relational database management system will not allow fields in a table to contain multiple values in this way." Therefore, a reader would puzzled as to why he or she is being warned against doing something that isn't possible. (Like the sign on an elevator that says: "Out of order. Do not use.")
  2. The real problem is with tables that have columns that seem to repeat, such as a Department table with columns like Employee1, Employee2, etc. This badly designed table satisfied 1NF as the rule is stated here. This is because Employee1, Employee2, etc., are distinct columns and the rule is exclusively about what goes on in a single column. Also, Employee1, Employee2, etc., are distinct attributes. In many writings on relational database theory, attributes are given as A1, A2, etc., a naming scheme that suggests that the Ai are a list, just as the Employeei are.

So, something should be done to this page, which I'm willing to do, but I thought some discussion might be in order first. The choices that come to mind are:

  • Add a note that 1NF can never be violated, and that the bad design (Employee1, Employee2, etc.) that is normally thought to be in violation of it in fact is not, but that bad design is still bad, as are many other kinds of bad designs that happen to be normalized but are still bad.
  • Change the statement of 1NF so it somehow applies to the case where the columns (attributes) are distinct, but in violation of the rule because they appear to be the same attribute repeated.

It will be hard to phrase my second choice because while Employee1, Employee2, etc., appear to be a repeated column, in fact they are not. An example that better illustrates the problem would be a table with columns FirstName, MiddleName, LastName, which is very frequent. Is every table in the world with these columns in violation of 1NF? Probably most people would say they are not. But, suppose it's observed that first, middle, and last names don't apply to many cultures, and therefore better column names would be Name1, Name2, Name2. Not it looks like it's in violation of 1NF. But it's a bad rule that depends on how the columns are named. A good rule should be about the structure of the table, not what choices are made for column names (as long as they're distinct, that is). — Preceding unsigned comment added by Rochkind (talkcontribs) 14:02, 7 May 2013 (UTC)

The current definition is sourced to Fundamentals of Database Systems, Fourth Edition. If we tinker with the definition it is no longer reliably sourced - it is original research. As you say, the article already states 'A typical relational database management system will not allow fields in a table to contain multiple values in this way'. It would even better to have a source for this statement also. - Crosbie 18:06, 7 May 2013 (UTC)
So you'd go for my first bullet point? Rochkind (talk) —Preceding undated comment added 21:21, 7 May 2013 (UTC)
Perhaps it is analogous to stating that a properly made three-legged stool has three legs, but I think we need to be very careful. Interpretation of "atomic" may also vary (historically). And I think we should be careful (probably more careful than the article currently is) to treat normalization as conceptually independent of (and prior to) an actual database. After all, relations do not contain nulls (because nulls are not values), but "relational" databases may do so. I think it is OK to state that relations are always in first normal form, but - at the stage of design where normalization takes place - it is something that has to be achieved deliberately. In other words, you can use some notation for depicting a relation and come up with something that is not in first normal form. Checking against the rules for first normal form is one of the tests to see if it is a valid relation. You don't check to see that it is a relation and then conclude that it must therefore be in first normal form. I take your point about "Employee1" and "Employee2", but I think we need to consider the concepts. We need to relate "Employee1" and "Employee2 to attributes of a conceptual data model and define the concepts in a meaningful way. And we need to avoid nulls. --Boson (talk) 23:25, 7 May 2013 (UTC)
Agree. But the problem is that someone might be told that a database with employee1, employee2, etc., is not in First Normal Form (what just about everyone would say), and then going to Wikipedia, would find out that it IS in First Normal Form, even though it is obviously a bad design. If First Normal Form is not the rule that one uses against employee1, employee2, etc., then what is? Normalization may be about relations (conceptual model), but in practice it is also used on tables (physical model). Is this article only about mathematics, or is it also about computer-system development? Perhaps I should add a section about First Normal Form in Practice and see what people think of it? Rochkind (talk) —Preceding undated comment added 23:32, 7 May 2013 (UTC)
There may be no absolute definition of atomicity, but if your data model has an entity "Employee" with, say, a one-to-many relationship between Department and Employee, and your database is meant to allow normal queries about employees (which implies the use of relational operations by the DBMS when implementing queries) then I would say that use of Employee1, Employee2 etc. (as foreign keys) is (normally) denormalization, so it does violate 1NF. I don't think that is changed by the fact that you can trick the DBMS into not noticing what you are really doing at the logical level. It is different if the "columns" really represent different concepts (as with Name1, Name2 used for the names written in different positions of the full name), and it is different if a set of employees is meant to be treated only as a unit. --Boson (talk) 11:35, 8 May 2013 (UTC)
Couldn't agree more. But don't you think it's a problem that our Wikipedia page here says nothing whatsoever about this interpretation of 1NF? Most database-design books and articles discuss what's wrong with my table under the heading of 1NF. Rochkind (talk) 15:27, 8 May 2013 (UTC)
I think I now better understand what you are saying. I agree that it is a problem. --Boson (talk) 23:43, 8 May 2013 (UTC)
Rochkind - when you say 'Most database-design books and articles', it would be useful to provide a reference or a link to an example. Similarly, any changes to the article should be well-sourced. - Crosbie 05:16, 9 May 2013 (UTC)
They will be. Rochkind (talk) 01:12, 11 May 2013 (UTC)

I think it will be difficult to explain this to the average reader without discussing the relationship between the real world and the physical tables, via the conceptual data model and the logical (ER) data model, etc. It is easy to see that

  • Each employee is a person
  • An employee is either an active employee or an inactive employee
  • An active employee is employed in exactly one department

can lead to different types of employee, but - unless the table is seen in isolation - it is not easy to see how you can end up with a table containing a column Employee1 and a column Employee2, both of which must in effect be foreign keys referring to the same table, unless the two references derive from two different relationships (for instance if Employee2 expresses a relationship like "manages/is managed by"). To keep the discussion at the level of the relation, perhaps it would be useful to introduce the concept of "external predicates", the word "external" indicating that we are talking about "what relations mean to the user, rather than to the system". The "heading" of a relation "can be regarded as denoting a certain predicate . . . [and ] each tuple in the body . . . can be regarded as denoting a certain proposition (i.e., a statement that is unconditionally either true or false)" (this can be sourced, for instance, to Date, Darwen, Lorentzos: Temporal Data and the Relational Model).
Another thing that occurs to me is the historical perspective, which may help explain why certain ideas about atomicity may have evolved. I'm not sure how much can be stated without involving original research, but I suspect that the development of object-oriented systems served by relational back-ends and the "integration" of data models and object models may have helped emphasize the idea of single devoper-defined operations on complicated data types, supporting a different understanding of atomicity.
Perhaps we also need to separate different perspectives:

  • that of the "relational mathematician"
  • that of the system designer, explaining things in terms of the data model
  • that of the database implementor, explaining things in terms of tables, foreign keys, etc.
  • that of the man in the street (or on the Clapham omnibus), who needs the above perspectives explained in layman's terms

We might need to think about what information goes in which articles (including Database normalization, Relation (database), and the various normal forms). After all, this article and the other nNF articles are sub-articles of Database normalization. Perhaps all the articles should put more emphasis on the process of normalization - stressing that you don't start with the final database, but without giving the impression that one actually (as opposed to notionally?) progresses through different normal forms. At the moment I would be hard-pressed to define the readership that this article is targeted at. --Boson (talk) 15:40, 11 May 2013 (UTC)

Is tree representation not in 1NF?[edit]

As per the article, Date’s definition states that there’s no top-to-bottom ordering of the rows. If that’s the case, is a common representation of a tree structure, such as the table below, not in 1NF?

Person
ID Name Mother ID
1 Jane Doe
2 Alice Doe 1
3 Bob Doe 1
4 Edgar Doe 2

I find it hard to believe so there must be something else going on. It would be good if someone explicitly mentioned this or gave an example of a table which is not in 1NF because of the top-to-bottom ordering.— mina86 (talk) 16:16, 9 September 2015 (UTC)

I would understand the definition to mean that the table that you gave is the same as this table:
Person
ID Name Mother ID
3 Bob Doe 1
4 Edgar Doe 2
1 Jane Doe
2 Alice Doe 1
--Boson (talk) 21:40, 9 September 2015 (UTC)
PS: The null value for Mother ID would also be a problem, but that is a different issue. --Boson (talk) 21:47, 9 September 2015 (UTC)
D'oh! So it was just a case of me being silly. Thanks; and forget I brought it up.— mina86 (talk) 23:06, 9 September 2015 (UTC)