Talk:Database normalization

From Wikipedia, the free encyclopedia
Jump to: navigation, search
WikiProject Databases / Computer science  (Rated Start-class, Top-importance)
WikiProject icon This article is within the scope of WikiProject Databases, a collaborative effort to improve the coverage of database related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
 Start  This article has been rated as Start-Class on the project's quality scale.
 Top  This article has been rated as Top-importance on the project's importance scale.
Taskforce icon
This article is supported by WikiProject Computer science (marked as Top-importance).
 
WikiProject Computing (Rated Start-class)
WikiProject icon This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
 Start  This article has been rated as Start-Class on the project's quality scale.
 ???  This article has not yet received a rating on the project's importance scale.
 

Archives: 1 2 3

Contents

[edit] Karnaugh map

How would I know if I had reached the most normalised/optimised stage? Is there any tool like Karnaugh map that lists all permutations/combinations?Anwar (talk) 13:01, 22 May 2008 (UTC)

I don't know, but for small schemas (say, 20 relations or less) this is easy to see. A related issue is that there may be implicit dependencies that have not been marked explicitly in the database schema - determining those is not so easy, see e.g. [1] Rp (talk) 08:49, 7 July 2009 (UTC)
We can use the concept of normalization only for the time of removing the redundancy data in the databases. When the data gets the unique data in the records, may get reach the normalised stage. —Preceding unsigned comment added by 210.212.230.196 (talk) 09:52, 18 December 2010 (UTC)
Normal forms are defined as conditions; you can simply check whether the required conditions hold to know whether a database schema is in a particular normal form. Rp (talk) 23:38, 26 December 2010 (UTC)

[edit] Trade-off

The article does not explain the trade-offs suffered with normalisation. For instance, a highly normalised database needs more tables each stripped to the bare minimum. So serving a singular query would require pulling data from several tables. This costs time and money. In a way, the business process is not optimised (though the database is!).Anwar (talk) 13:18, 22 May 2008 (UTC)

Do you have an article or a computing paper where this is explained, so we can add it better to the article? --Enric Naval (talk) 11:43, 24 May 2008 (UTC)
A normalization, and thus pulling data from several tables doesn't cost significantly more time. In fact, a database that is not normalized will take much more CPU time and Disk I/O to process, requiring more resources as query load increases. Then your business will either need to purchase more hardware, or re-architect it's database and code base, and trust me, that costs alot more. Common sense always dictates that you "Do it right the first time" 71.231.132.75 (talk) 14:38, 24 December 2009 (UTC)
This is not correct - certainly deviations from normal form can cost a lot of space and query time, but denormalization can also help performance - for instance, if certain joins are very frequent, it may pay off to denormalize the database over those joins. Rp (talk) 21:56, 26 December 2009 (UTC)

[edit] Denormalization

The statement, "It has never been proven that this denormalization itself provides any increase in performance, or if the concurrent removal of data constraints is what increases the performance." needs more backup on that argument or needs to be revised or removed. Volomike (talk) 12:53, 17 January 2009 (UTC)

It's wrong. Denormalization doesn't remove data constraints. It just means that instead of maintaining two tables, you maintain their join. This can be faster if you often need that join. Rp (talk) 21:13, 1 July 2009 (UTC)
PS the article should bother to explain this. Rp (talk) 21:19, 1 July 2009 (UTC)
Agreed. 99.60.1.164 (talk) 17:13, 22 August 2009 (UTC)

[edit] Request for normalization

Please help with http://strategy.wikimedia.org/wiki/Proposal:Assessment_content and http://strategy.wikimedia.org/wiki/Proposal_talk:Assessment_content#Normalization_of_assessment_items_.28questions.29_in_database 99.60.1.164 (talk) 01:37, 23 August 2009 (UTC)

Please note that [2] contains a list per quesion with each element containing two timestamps, so this is definetly a 6NF-level problem. An easier, and perhaps more important sub-requirement is the review system in [3], in particular, this example outlines a sub-schema related to
a selection of text or a url (to a permanent aritcle version or diff, etc.) could be an item for which multiple, randomly-selected reviewers chosen for their stated familiarity with a topic area would be selected. Those reviewers could be shown that text or url (perhaps as part of a list of such items) in a securely authenticated and captcha-ed channel. They would be asked to vote on the accuracy of the item, and have the opportunity to fully explain their votes in comments. If a statistically significant number of votes are in agreement, then the item could be approved as to veracity or rejected. When the votes are not in agreement, then additional voter(s) would perform a tie-breaking function. Each voter's track record in terms of agreement with other voters could be recorded secretly and used to (1) weight their vote to nullify defective voters, and/or (2) used to select whether the voter is allowed to become a tie-breaker.
The fields required to support that need to be added to [4]. 99.35.130.5 (talk) 18:00, 10 September 2009 (UTC)

[edit] ugly tables

this article has the ugliest tables i've seen —Preceding unsigned comment added by 67.187.187.128 (talk) 00:42, 27 November 2009 (UTC)

You noticed that too? 71.231.132.75 (talk) 14:40, 24 December 2009 (UTC)

[edit] Elementary Key Normal Form

This article should include EKNF (given this name in 1982 by Zaniolo, in his paper "A new normal form for the design of database schemata [5]), a normal form which is stronger (more faithful to the principal of separation) than 3NF and which has the "complete representation" property proposed in 1976 by Philip A Bernstein (Bernstein P.A., "Synthesizing third normal form relations from functional dependencies" [6]) as a criterion for schema synthesis algorithms.MichealT (talk) 11:57, 7 March 2010 (UTC)

Please add it - and while you're at it, perhaps you can remove Date's 6NF which really doesn't fit in with the rest, but I hesitate to touch it. Rp (talk) 23:40, 26 December 2010 (UTC)

[edit] Normalization

In researching the area of normalization, one thing that I often see mentioned is the "insert" or "delete" anomaly. I don't understand how this is an anomaly, let me explain.

Assume a system intended to store information about students and registered courses. If a logical schema presented the following requirements:

 -each customer shall enroll in one or more courses and
 -each course shall have one or more students

Then one can legitimately arrive at a single table

  (S#, SNAME, C#, CNAME)

which would face the update anomaly and redundancy.

But, the "insertion" anomaly and "delete" anomaly are a result of the statement of the logical requirements that state that no course may have zero students.

By changing the requirements to be:

 -each customer shall enroll in zero or more courses
 -each course shall have zero or more students

The supposed "update" and "insert" anomalies may be eliminated.

In the original paper by Codd about normalization, I see reference to "redundancy" and attendant update anomalies but where did insert and delete anomalies come from?

130.215.36.61 (talk) 11:01, 16 September 2010 (UTC) amrith

You are mistaken. The anomaly has nothing to do with requirements (whether courses without students may occur), but with expressiveness: in this table, courses without students cannot be represented, unless we introduce NULL student IDs and names (and NULL is notoriously difficult to deal with consistently). So one may argue that here, the purpose of normalization is to avoid NULLs. Rp (talk) 11:14, 16 September 2010 (UTC)

[edit] Non-repeating groups

If I understand "repeating groups" correctly, having no repeating groups sort of just means "don't have separate tables for stuff that could be grouped in a table with the same number of fields as the separate tables". Is that "more or less" correct? I kind of find the example confusing. It's currently presented as:

Customer Transactions
Jones
Tr. ID Date Amount
12890 14-Oct-2003 −87
12904 15-Oct-2003 −50
Wilkins
Tr. ID Date Amount
12898 14-Oct-2003 −21
Stevens
Tr. ID Date Amount
12907 15-Oct-2003 −18
14920 20-Nov-2003 −70
15003 27-Nov-2003 −60

I'm guessing that this is supposed to represent three different tables in the database that "each do the same thing". Would it be better if the tables were presented like this:

Customer Jones Table Customer Wilkins Table Customer Stevens Table
Tr. ID Date Amount
12890 14-Oct-2003 −87
12904 15-Oct-2003 −50
Tr. ID Date Amount
12898 14-Oct-2003 −21
Tr. ID Date Amount
12907 15-Oct-2003 −18
14920 20-Nov-2003 −70
15003 27-Nov-2003 −60

Do you think this more clearly illustrate "repeating groups" than the present example? (And with less chance of confusion?) Jason Quinn (talk) 10:44, 17 June 2011 (UTC)

I'm not sure I really understand what you intended to say, but the example with repeating groups is meant to be an example of "what not to do", i.e. an example of a relation that is not in first normal form. Think of a repeating group as an array within one row; in the example, there is a column that contains a variable number of transactions (intended to illustrate a clear-cut case, I presume). I am not sure if you are using the word 'table' in the sense of an array rather than representing a relation.--Boson (talk) 12:42, 17 June 2011 (UTC)
I don't know if I understood the example correctly now. Are the tables given in the example a single table displayed in three separate sections or three separate tables? I was under the impression they were three separate tables. This now illustrates exactly why I think the current nested presentation is ambiguous and confusing. Jason Quinn (talk) 02:38, 18 June 2011 (UTC)
As I understand the example, it shows (and must show) a single table with two columns (Customer and Transactions). Each Transactions "field" contains an "array" of individual transactions (possibly with a varying number of logical occurrences). I am using terms rather loosely. It may help to remember that we are really talking logical design, not physical implementation. --Boson (talk) 10:07, 18 June 2011 (UTC)
If this is the case, the example is really terrible and ought to be completely replaced. It seems like most books and websites, use an example of a field that contain multiple phone numbers as a table that is not in 1NF. Conceptually that is so much better than there's no comparison. Jason Quinn (talk) 09:50, 19 June 2011 (UTC)
I don't really see why phone numbers should be better than transactions. I don't know who created the example, but they may have wanted to indicate that a repeating group containing more than one attribute is more typical for a non-1NF database. For that reason an example with only one attribute is, perhaps, not very good and might make the uninitiated think of an (atypical) "repeating attribute". Can you think of a better way of illustrating a repeating group containing more than one (visible) attribute per occurrence.--Boson (talk) 14:41, 19 June 2011 (UTC)
A similar example of normalization is contained in Date's An Introduction to Database Systems, but is is perhaps clearer because the sub-column headings are shown only once, in the heading row (immediately under Transactions), though, I suppose, it could be argued that that is not as good, because a relation includes the headings and we are effectively talking about relation-valued domains. I don't know if this is the case in later editions of the book; my copy is from 1990. By 2003, of course, Date was saying explicitly that attributes could be relation-valued. I am not expert enough to judge if there really is a contradiction between his earlier and later books, but we need to be careful about how we define 1NF and what examples we use. By the way, the current example seems to go back to a rewrite of the article in January 2009 by Nabav (talk · contribs).
--Boson (talk) 16:17, 19 June 2011 (UTC)
Hi, just to clarify: my intention in the Customers & Transactions example was to do what Boson described, i.e. depict a data structure in which each Customer is associated with "package" of transactions. Any number of transactions can be contained within a given package.
I would like to avoid the implication that the data structure as a whole is a table. Also I'd like to avoid the implication that the "package" of transactions is a table. Before Codd came on the scene, structures like these were implemented in, for example, hierarchical databases (rather than in relational tables, which didn't exist). One of the most fundamental characteristics of relational tables and 1NF is that they allow us to get by without complex structures like the one in the example.
To avoid people thinking of the example in terms of tables, perhaps we could make the example look less tabular. This could be done by arranging the transactions for each customer HORIZONTALLY, rather than one on top of the other. --Nabav (talk) 13:34, 12 July 2011 (UTC)

[edit] The Design Exercise

From where I stand, there is an important aspect of Data Normalization that is completely missing in this article. That aspect is the use of these procedures in the design process of any database - or even for the evaluation of how well classes have been partitioned in object-oriented coding.
I have been programming professionally since 1971 and have found these NF definitions useful since I have started using them - decades now.
The point of DN is not that it is an implementation goal, but that it is an powerful design tool. Any data base, regardless of how it is to be implemented, should be fully normalized (at a minimum, 3NF) during the design process because that discipline will reduce the chances that the purpose behind the data fields is being missed. DN forces the designer to ask the right kind of questions and collect the required data so that the system, once implemented, will perform productively.
In the case where there is an existing system which is simply being automated, when interviewing the users of that system, they will described the data fields in the common tougue - very loose terms that barely touch on the purpose of a field. Only on careful investigation can keys and dependencies be determined, and once determined they will commonly result in discoveries that result in more user interviews. As that information is collected, it needs to be recoded and the "common tongue" is not good enough. The simplest comprehensive form for describing the database at this stage in the design process is by describing it in terms of a normalized database.
Once the database has been documented in this form, the next stage of the desgn process is how it will be implemented. At this point the designer needs to consider precisely how and when backups and restores will be done, whether the harware will support various query and update transactions, etc. At this stage, the database implementation will be described and, except for rare or trivial cases, substantial denormalizations will be applied.

Scott Bowden (talk) 16:08, 9 February 2012 (UTC)

I would say that probably belongs mainly under Data model, Data modeling, Entity-relationship model, etc., but there doesn't appear to be much in the way of linking to those articles. Perhaps a short section here with a hatnote pointing to those articles? Adding links wouldn't hurt, anyway. --Boson (talk) 20:42, 9 February 2012 (UTC)
Personal tools
Namespaces
Variants
Actions
Navigation
Interaction
Toolbox
Print/export