|WikiProject Computing||(Rated Start-class, High-importance)|
- 1 Weasel Words
- 2 Old discussion(s)
- 3 Usage in English
- 4 Meaning of data and information
- 5 Data WAS the plural of datum
- 6 Inaccurate pronunciations
- 7 Data synonym for information
- 8 Data: verb or noun?
- 9 Data as plural
- 10 mass, plural and determiners
- 11 Citation to add
- 12 Data refers to a collection of organised information
- 13 A matter of personal importance
- 14 Consistency in this Article
- 15 dictionary
- 16 Times Quote
- 17 Data (computing)
- 18 Data in different contexts
Weasel Words
It looks like there are some disagreements on the usage of data as a mass noun. It is excellent that we have many citations on its usage, but instead of stating statistics from the sources, we are having minor edit wars on words such as "many"/"most", "often"/"usually", etc. Perhaps we should instead cite the statistics from the source and leave speculation to the reader. Gsonnenf (talk) 11:36, 2 April 2009 (UTC)
I'm going to try clearing out the weasel words from the Usage in English section and see what is allowed to stick. Citation needed tag added because it's so specific, not that it's disputed. Acronymsical (talk) 16:59, 28 February 2011 (UTC)
Old discussion(s)
I find it difficult to understand "A datum is a statement accepted at face value."
What do you think about a definition and explanation like this:
I think it is good when a definition uses other Wikipedia terms, not just plain English. Kenny sh 08:30, 10 May 2004 (UTC)
- Hello. There are a couple of serious problems with the above definition. The main problem is that it says data has to do with "information systems", "data processing", and "information". Either it's assumed these terms have to do with computers, in which case this definition is much too narrow, or not, in which case it's needlessly vague. A secondary problem is that the definition can't be understood without looking up some other terms. The existing definition, which uses only ordinary English words, is terse, comprehensible, and yet quite general. The proposed new definition does not have these merits. Regards, Wile E. Heresiarch 14:06, 10 May 2004 (UTC)
A separate page for datum is needed. In geology/cartography/geography and surveying a datum is a reference surface. For instance, sea-level is often used as a datum below which depths (or above which heights) are measured.
Hello COMPATT, to address your comments about the distinction between data and information -- I agree that programs are a form of data, but I think it's important to keep in mind that the word "data" has a history of usage that goes back much farther than computer science. The distinction between data and information, which is made in the article, is that information is derived from an interpretation of data. Some data don't have any obvious interpretation, and so we might noodle over ancient inscriptions for a long time, but some other data have such an immediate interpretation, especially in a given cultural context, that the interpretation is held to be the same as the data -- for example if I look at a photograph, I might immediately see "a dog" instead of "a pattern of silver particles which suggests a dog". I think the interpretation aspect, and its dependence on context, might be emphasized in the article. Well, I've rambled on long enough! Have a great day, Wile E. Heresiarch 14:33, 18 Mar 2004 (UTC)
Hello, as a comment on the edit that I just made. I put a new, short intro paragraph at the beginning, to hopefully get straight to the point. (The article was noodling around in etymology a little too much before getting to the punch line. Hopefully that's corrected now.) As the term "data" is rather general, I've attempted to give a general definition, and then immediately describe one of the most-used types of data (measurements & observations). I'm hoping that there is a right level of generality now. Happy editing, Wile E. Heresiarch 15:44, 19 Mar 2004 (UTC)
Usage in English
There's another meaning of the singular datum. In the US Navy, the term is applied to the last known position of a submarine whose precise location is no longer known. I don't think I ever heard it used in the plural; there just aren't that many submarines and there's a great deal of seawater under which to spread them. Dick Kimball (talk) 18:20, 2 April 2008 (UTC)
- I inserted the most general and shortest functional definition of data (see function definition).
Referring to the sentence "this is all the data from the experiment", the assertion that "this usage is inconsistent with the rules of Latin grammar and traditional English" seems odd. If the word data is being treated as a mass noun, then surely the sentence is consistent with "traditional English".
Meaning of data and information
I changed it. In my opinion, there was too much information noise (uncertainty on the author's part?) in this paragraph:
- As it is, the phone number is not actionable - you know it is a phone number, but it is of no use. This information becomes knowledge when you can act on this information, either to solve a problem (for example, to call Helen, whose phone number it is), or to gain insight into an issue (e.g. by noting that other phone numbers have the same exchange). People or computers can find patterns in and between data to perceive relationships between information, creating or enhancing knowledge. Since knowledge is prerequisite to wisdom, we always want more data and information. But, as modern societies verge on information overload, we especially need better ways to find patterns.
This is not about data; it is an unnecessary digression, so I removed it.
See also: http://en.wikipedia.org/wiki/Talk:Knowledge about DIKW.
I did not find (on the Web) any articles that confirm the suggested interpretation of the DIKW model.
--Adam M. Gadomski 18:01, 4 November 2005 (UTC)
- Adam, please read again Wikipedia:No original research. You are linking extensively to your own research. Wikipedia is not the place to publish your original research. Also see Wikipedia:Guide_to_writing_better_articles. You seem to write in a heavy duty academic prose style, which isn't really used here. Some of what you write might have been OK but I can't tell it apart. Sbwoodside 22:30, 4 November 2005 (UTC)
Simon, your reply is a meta-response. Is it a style of "Space-invaders"? You copy original research without proper references; is that correct?
You (and only you) inserted DIKW in Wikipedia in a few articles.
Why do you do it?
- I see that your self-promotion on the Web is perfect, my congratulations, but I would like to see your sc.publications too - maybe this information could clear my doubts why "you are linking extensively" to and "update" this subject.
--Adam M. Gadomski 16:41, 24 November 2005 (UTC)
Data WAS the plural of datum
The first line of this article needs to change. Data WAS the plural of datum, but no one uses it this way. In fact, in surveying, datum and data are two completely different words. Datum is a coordinate system for locating a point on the earth, while surveyors use data to mean what everyone else does. The plural of survey datum is datums, since data has a completely different meaning.
English does not follow the rules of a dead language that it happened to borrow a word from. See the back-formation article for numerous examples. You'll note that no one ever complains that "asset" is incorrect usage.
- Well, Datum has its own article, but I guess you're right that for this article probably the first line could be rewritten because in this context I think most people just talk about data and rarely use "datum" (not enough to justify the first sentence position). The first sentence / intro should summarize the article :-) Sbwoodside 19:22, 22 September 2006 (UTC)
- By the way, lots of people have been talking about changing the intro, why not be bold? WP:BOLD Sbwoodside 19:23, 22 September 2006 (UTC)
- I can't think of the last time I saw "datum" used, even by people who routinely treat "data" as plural ("data are"). I'd say that in popular usage, people tend to treat data as a mass noun as they would information (and use them interchangeably). And, for good or ill, this popular usage seems to be crowding out the traditional academic/professional treatment. At work yesterday I reviewed a draft policy document regarding the handling of sensitive data; one paragraph used "data are" and the next used "data is." I pointed this out and the second paragraph was changed to "data are" as the correct construction. Go ask a sample of demographers, social scientists, physicists, doctors, market researchers, and other people who work with data professionally and a significant majority will say that the "data are" construction is correct (and the others are wrong ;-) XKL 16:08, 26 May 2007 (UTC)
- Educated folk have no problem using "datum" in the singular and "data" in the plural in English sentences. This whole discussion is an attempt to justify Newspeak, and is little more than a sorry excuse for mental laziness. The English Wikipedia wasn't to be written in Ebonics; that Wiki is yet to be created. —QuicksilverT @ 22:58, 5 December 2007 (UTC)
- I have rewritten the intro for the article in an attempt to capture the meaning and usage of the word without introducing the controversy in the first sentence. Quicksilver, you are arguing ad hominem with your "Educated folk" remark. This wikipedia article is (should be) attempting to reflect reality, and diversity of opinion within it, not your own view. Personally, as an "educated folk" myself, I am strongly of the opinion that English is defined as far as possible by the people who speak it, and that examination of usage indicates a strong preference for regarding data as a mass noun (eg. the formation "database"). However I am content to have that debate elsewhere. Joffan (talk) 00:40, 4 January 2008 (UTC)
The statement "but these are English sentences, so Latin grammar rules do not apply" seems to be an unencyclopaedic opinion tagged on to an otherwise neutral sentence stating the status of the word as plural in Latin. The rules applied in English sentences are clearly rules of English grammar, not Latin, but English happens to have the same rule as Latin in this instance, i.e., that a plural noun requires a verb in the plural. The debate is not whether Latin rules should apply to English, but whether the word data is plural or singular in English, based on etymology and usage. I propose to delete the clause "but these are English sentences..." if there is no further discussion. GKantaris (talk) 15:45, 2 January 2008 (UTC)
- OK, as there is no discussion, I've deleted the clause. GKantaris (talk) 16:21, 14 January 2008 (UTC)
The problem is not one of right vs wrong but of precision. In general English usage, 'data' is used interchangeably with 'information' so it feels more natural to use it as a mass noun. For more technical use, 'data' must be pluralised to distinguish it from 'datum' and 'information'. (15 (a datum) is part of 15-08-65 (data), which is my birthday (information).)
Many words, such as 'average', 'intelligent' or 'fruit' have precise technical meanings that differ from the way they are used in everyday speech and there is nothing wrong with this. -- Preceding unsigned comment added by 220.127.116.11 (talk) 14:33, 24 November 2008 (UTC)
- While I don't curse the traditional use of a plural verb form with data, I more often than not find it awkward, and I think most modern speakers treat data as a mass noun (like water, air, information), regardless of their education level. Below is the usage note at the entry for data in the American Heritage Dictionary of the English Language (3rd edition, 1992). I think it presents the issue well:
Data originated as the plural of Latin datum, "something given," and many maintain that it must still be treated as a plural form. The New York Times, for example, adheres to the traditional rule in this headline: "Data Are Elusive on the Homeless." But while data comes from a Latin plural form, the practice of treating data as plural in English often does not correspond to its meaning, given an understanding of what counts as data in modern research. We know, for example, what "data on the homeless" would consist of — surveys, case histories, statistical analyses, and so forth — but it would be a vain exercise to try to sort all of these out into sets of individual facts, each of them a "datum" on the homeless. (Does a case history count as a single datum, or as a collection of them? Is a correlation between rates of homelessness and unemployment itself a datum, or is it an abstraction over a number of data?) Since scientists and researchers think of data as a singular mass entity like information, it is entirely natural that they should have come to talk about it as such and that others should defer to their practice. Sixty percent of the Usage Panel accepts the use of data with a singular verb and pronoun in the sentence Once the data is in, we can begin to analyze it. A still larger number, 77 percent, accepts the sentence We have very little data on the efficacy of such programs, where the singularity of data is implicit in the use of the quantifier very little (contrast the oddness of We have very little facts on the efficacy of such programs).
- The Boston Globe review of the AHD gives insight into the philosophy of that dictionary's editors. Eric talk 14:08, 2 October 2009 (UTC)
- Here is some of what Merriam-Webster's Dictionary of English Usage, 1994, USA, pp 317-318 says on the subject:
To summarize, data has never been a plural of a count noun in English. It is used in two constructions — plural, with plural apparatus, and singular, as a mass noun, with singular apparatus. Both constructions are fully standard at any level of formality. The plural construction is more common.
Inaccurate pronunciations
Pronounced "Day-Ta" (US) and "Dar-Tar" (AU & UK*)
Living in the UK, I've only ever heard it pronounced as the former, "Day-ta"; only from Americans have I heard the latter, "Dar-Tar".
- Living in the Southern and Mid-Atlantic U.S., I've only heard it pronounced "Day-ta". JD Lambert(T|C) 01:54, 15 July 2007 (UTC)
I've lived in many states in the US, from the west coast to the east coast to the midwest. I've never heard anyone say dar-tar. I've heard day-ta and daa-ta (like Dagwood). Never dar-tar. Entbark 03:48, 23 July 2007 (UTC)
- Entbark, you may not have been to Massachusetts, or may not have heard someone from the Boston area, as they seem to be fond of injecting gratuitous "r"s into their speech. For example, listen to Norm Abram on The New Yankee Workshop. —QuicksilverT @ 23:41, 5 December 2007 (UTC)
Data synonym for information
Someone changed the page to say data is not a synonym for information. They should look it up in the dictionary: http://www.dict.org/bin/Dict?Form=Dict1&Query=data&Strategy=*&Database=* Daniel.Cardenas 15:34, 25 April 2007 (UTC)
- How can you post a reference which denies your own statement? From your link:
Data on its own has no meaning, only when interpreted by some kind of data processing system does it take on meaning and become information.
1234567.89 is data.
"Your bank balance has jumped 8087% to $1234567.89" is information.
Bob Novak 06:42, 26 April 2007 (UTC)
- I would also like to point you to some introductory material on information theory, like the one at MIT open course ware - Information and Entropy, where concepts like information, data and code are explained. Bob Novak 07:57, 26 April 2007 (UTC)
That is classroom material applicable to computer science people and the like, but not 100% applicable to the rest of the world. Thanks for the link. Daniel.Cardenas 15:04, 27 April 2007 (UTC)
Data: verb or noun?
This statement, 'The word data is the plural of Latin datum, neuter past participle of dare, "to give", hence "something given",' is a little confusing. If datum and data are both nouns, they cannot also be past participles since participles are verb forms. That statement makes it sound like the noun datum is a participle of dare. Nouns cannot be participles. The same word can be used as both a noun and a verb (e.g., "I scream" and "I heard a scream"), but a noun is NOT a participle EVER.
Oh, and I found where that phrase was taken from: http://www.johntcullen.com/sharpwriter/content/data_is.htm. Hardly a trustworthy source. He doesn't list any references, much less know the difference between a verb and noun.
Entbark 19:49, 12 July 2007 (UTC)
So, if no one is opposed to me changing it, I will modify the etymology section in a few days. Entbark 03:53, 23 July 2007 (UTC)
- The English usage section is still confused. Rather than try and win a debate, this needs to take a NPOV stance and observe there are two viewpoints:
- 1. That this is a Latin neuter noun and therefore the rules for a Latin plural apply.
- 2. That this is an uncounted noun and legitimately used in the singular.
- Clearly, we need a convention for this article. Common usage is the uncounted or mass noun. This seems to be backed up by the OED, which has this note on usage: "Traditionally and in technical use data is treated as a plural, as in Latin it is the plural of datum. In modern non-scientific use, however, it is often treated as a singular, and sentences such as 'data was collected over a number of years' are now acceptable." The etymology seems a little suspect though, as we are told it is actually derived from a verb, yet the arguments used are that it takes the form of a Latin singular neuter noun. Also, we know that datums is a legitimate plural usage of geological datum and people accept this; it is odd that the use of datums is not derided there through etymological argument. Spenny 13:59, 11 September 2007 (UTC)
- It is a declined form of the past participle of the Latin verb dare, "to give". The Latin "data" would translate as an adjective, "given", or as a noun, "given things"; it is equivalent. Because it is a participle, it grammatically functions as a noun or an adjective, and so follows the same pluralization rules as nouns and adjectives: singular -um, plural -a. --Nucleusboy (talk) 03:00, 28 November 2007 (UTC)
Data as plural
mass, plural and determiners
Grammatical rules dictate that a mass or uncountable noun, when appended to a determiner, must choose a determiner of the same type.
So if data is treated as a mass noun, one would ask "How much data was collected?" On the other hand, if data is treated strictly as a countable noun, one would ask "How many data were collected?"
- I agree completely! "How much data was collected" does sound awkward and a little childish. Dave (djkernen)|Talk to me|Please help! 20:24, 6 December 2011 (UTC)
Citation to add
The current page says:
Data is the lowest level of abstraction, information is the next level, and finally, knowledge is the highest level among all three. For example, the height of Mt. Everest is generally considered as "data", a book on Mt. Everest geological characteristics may be considered as "information", and a report containing practical information on the best way to reach Mt. Everest's peak may be considered as "knowledge".
For the needed citation I propose the following.
Most frequently the data - information - knowledge - [wisdom] hierarchy is attributed to Ackoff:
Ackoff, Russell L (1989). “From Data to Wisdom” Journal of Applied Systems Analysis, v. 16 pp. 3-9
but it has been presented by earlier authors:
Kochen, Manfred (1974) Principles of Information Retrieval John Wiley & Sons Inc. (Ch 3)
I think that a careful reading of Kochen or Ackoff would lead one to argue that knowledge resides within the human mind (as soon as it is written down it becomes information). Thus I would change the example by deleting
"and a report containing practical information on the best way to reach Mt. Everest's peak may be considered as "knowledge"."
and inserting
"and the practical understanding of an experienced climber of the best way to reach Mt. Everest's peak may be considered as "knowledge"."
- Hmm. Under the usual rules of information theory, information seems to be an even lower level of abstraction than data; for example "e34t q3y y5i39.53yq3 53y q53q" would contain information, but not data, since it doesn't mean anything, whereas "4353675.7436" is data because it indicates a specific number. On the other hand, strings are data, whereas they only contain information; so in that sense information is more abstract than data. Ben Standeven (talk) 17:50, 24 August 2008 (UTC)
Data refers to a collection of organised information
I think this statement is incorrect. I have always understood data to be raw and unprocessed, whereas information is derived from the data. Yet this statement seems to be saying that information is the raw form and data is the processed form. I have checked on Google and it seems that this article is the only place which refers to data as the processed form; see
I agree, as this is the academic interpretation I have always heard. I was surprised to find it reversed in the article. —Preceding unsigned comment added by 18.104.22.168 (talk) 18:37, 9 October 2008 (UTC)
A matter of personal importance
It is plainly evident that the question of whether "data" is a plural or is a mass noun is relevant only to the argument itself. This discussion exists to perpetuate the sense of correctness felt by the arguers on either side, nothing more. Struhs (talk) 18:56, 29 September 2009 (UTC)
Though there exists some sense of correctness felt by arguers, as exists in all debates, etymology and word usage ARE within the domain of encyclopedic knowledge. The debate is important to linguists, authors of style guides, and academic sources. Many etymologists may even find the evolution of the word striking and exciting. 22.214.171.124 (talk) -- Preceding undated comment added 07:44, 9 July 2010 (UTC).
Consistency in this Article
I know that the whole data is/data are debate is a hot button on this page, but Wiki policy does require one to at least be consistent. Since the article opens with the assertion that data is the plural of datum, I believe we should use it that way consistently in this article, except of course where we are presenting examples of its usage as a singular noun. So I went through the few places where its use was inconsistent and, um, regularized it. Dave (djkernen)|Talk to me|Please help! 20:28, 6 December 2011 (UTC)
- The problem with this is that the article also goes on to say that the most common usage is that of the singular, and Wikipedia's policies all go towards common usage rather than correct usage. Plus, the statement that data is the plural of datum, is, as you've noted, controversial, and thus its inclusion without qualifiers means it violates Wikipedia's Neutral Point of View policy. It would not be a good idea to make a choice based on a violation.
Times Quote
The article says: 'Some major newspapers such as The New York Times use it either in the singular or plural. In the New York Times the phrases "the survey data are still being analyzed" and "the first year for which data is available" have appeared within one day.' However the author of this sentence is parsing the second example incorrectly. The verb 'is' refers to 'the first year,' definitely singular, and not to the word 'data.'
Data (computing)
Data (computing), in an operational definition, states: "Data are the quantities, characters, or symbols on which operations are performed by a computer...." Characters/symbols include what is commonly referred to as texts. Examples of texts where operations are performed by a computer include: computer programs (say written in COBOL), word processing, and a Google search of millions of web pages. As I understand the theoretical definition of data given here, texts in general are neither qualitative nor quantitative, thus texts in general are not data (some texts, "male/female" for example, may be qualitative data).
If texts in general are not data, then is the following sentence correct? "Data and texts are the quantities, characters, or symbols on which operations are performed by a computer...." Rather than simply adding "and texts", is there a better correction? It would be awkward to have an article titled Data (computing) that in its first sentence expands the article beyond just data.
Data in different contexts
I am a newbie; please advise if there is a better way to go about what I am trying to do, which is to suggest that there are fundamental problems with the Data entry, in the hope that it may be improved.
(1) There is an entry for "Data" and an entry for "Data (computing)", but the talk page for "Data" refers to "WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology". This sounds more like a talk page for "Data (computing)" than for "Data". I would have thought that the entry for "Data" would encompass contexts other than computing. Examples follow.
(2) Consider for example data in the physical sciences, the life sciences, the social sciences, in statistics and in "official" statistics.
(2a) Data in the physical sciences tends to be measurements, e.g., of the positions of stars that were the motivation for Gauss' development of the normal distribution. Note Wikipedia entry Accuracy and precision.
(2b) Data in life and social sciences often consists of counts, e.g., numbers of persons in a population or numbers of births during a time period. Note Wikipedia entry Population biology.
(2c) Official statistics refers to data produced by governments, from which various kinds of statistics are derived, e.g., economic statistics, demographic and social statistics, and environmental statistics. The United Nations Statistics Division website http://unstats.un.org contains extensive information on official statistics.
(3) As the word is used in all these contexts, "Data" does indeed refer to "a collection of organised information". This shows that usage in these areas is not consistent with the usage given by http://www.diffen.com/difference/Data_vs_Information. Either one accepts that the same word is used with different meanings in different contexts, or one makes a choice for one meaning or the other. I suggest that in this instance extensive and at least roughly consistent usage of the word in Life sciences, Social science, Official statistics and Statistics ought to override the usage proposed in http://www.diffen.com/difference/Data_vs_Information.
(4) As the word is used in these contexts, the characterization can be sharpened beyond "organized information", which is so broad as to encompass nearly anything. "Data" is used more specifically to refer to systematic information about entities in some well-defined aggregate. "Systematic" signifies that the same information is provided for every entity in the aggregate, undefined values (age at first marriage for never married persons) and missing values excepted. "Well-defined" signifies conditions that define membership in the aggregate ("Emperor penguins in Antarctica at midnight 31 December 2013/1 January 2014"). Data in this sense is more specific than Information.
(5) From this perspective, at least, the sentence with which the Data entry begins, "Data are values of qualitative or quantitative variables that belong to a set" is deeply confused. Data provides values of variables, and the variables it provides values for constitute a set, but there is no reference to the set of entities the variables refer to.
(6) Data in this sense may or may not be "raw". The "raw" data captured from Population census (this redirects to Census, which is far more general) census questionnaires is processed by "editing" to produce "clean" data. The processes are described in detail in the United Nations Principles and Recommendations for Population and Housing Censuses and Handbook on Population and Housing Census Editing.
(7) Data in this sense is information, but of a very specific kind. Information is far more general than data in this sense.
(8) "Data" probably encompasses too much to manage with a single meaning for all contexts. The challenge is to identify a manageable number of meanings and characterize them well. The characterization sketched above may not be able to accommodate literary texts regarded as data, for example, and this may be a well established and defensible usage. It is probably necessary to say a good deal more about Data structure, though not only in the context of computing. The content of the Data and Data (computing) entries does not to me justify the distinction.
(9) This discussion is pertinent to improving the Data quality assessment entry, currently in a primitive state. Considering data quality assessment issues might be useful for clarifying what "data" is.