Talk:Data dredging

WikiProject Statistics (Rated C-class, Mid-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as C-Class on the quality scale.
This article has been rated as Mid-importance on the importance scale.
 

Merge

I support the merge of this page with the page on Data Dredging. These are essentially the same concept by two different names. They should be on the same page. Maybe a disambiguation entry can be posted to differentiate these concepts. AjeetKhurana (talk) —Preceding undated comment added 13:58, 20 May 2009 (UTC).

Support merge - Data dredging is probably the best title (comment by John Quiggin, forgot to sign).

Do not merge - Bias through incorrect data-snooping is essentially different from the problem created by testing a hypothesis with the same data-set. For example, data-snooping bias may occur when dealing with a highly fluctuating set of data where every removal of a datapoint results in a new extreme, and so on. (Pc100935 11:59, 18 December 2006 (UTC))

Splitting a data-set into parts A and B and then using part B to test a hypothesis formulated using part A is not recommended, since the two parts can be highly correlated. Best practice is to formulate a hypothesis before looking at the data and then use the data to test it. If a hypothesis is based on existing data, it should only be tested by collecting new, independent data. (Pc100935 11:59, 18 December 2006 (UTC))
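
As a rough illustration of the point about testing on new independent data, here is a minimal Python sketch. It is not taken from the article or from any commenter; the function names `pearson` and `dredge`, the sample size, the number of candidate variables, and the seed are all assumptions chosen for the example. Every variable is pure noise, so any correlation found by searching is spurious. The sketch picks the candidate that correlates best with the outcome on the existing data, then measures the same kind of relationship on freshly generated data.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def dredge(n_obs=30, n_candidates=200, seed=0):
    """Search noise for the 'best' predictor, then retest on new noise.

    Returns (r_dredged, r_fresh): the correlation of the selected
    candidate on the data used to select it, and the correlation of a
    new independent draw of candidate and outcome.
    """
    rng = random.Random(seed)

    # Existing data: an outcome and many unrelated candidate variables.
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    candidates = [
        [rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_candidates)
    ]

    # The hypothesis is formed after looking at the data: keep whichever
    # candidate happens to correlate most strongly with the outcome.
    best = max(candidates, key=lambda c: abs(pearson(c, outcome)))
    r_dredged = pearson(best, outcome)

    # New independent data. Since nothing is truly related, the "winning"
    # variable is just another noise series here.
    new_outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    new_candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
    r_fresh = pearson(new_candidate, new_outcome)

    return r_dredged, r_fresh


if __name__ == "__main__":
    r_dredged, r_fresh = dredge()
    print(f"correlation on the data used to pick the hypothesis: {r_dredged:+.2f}")
    print(f"correlation on new independent data:                 {r_fresh:+.2f}")
```

Under these assumptions, the selected candidate shows a sizeable correlation only because it was the best of many tries, while a single fresh draw has no such selection advantage. The sketch does not model the separate concern about parts A and B of one data-set being correlated with each other.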

Support. Why hasn't the merge been made already? SweetNightmares (talk) 04:45, 21 January 2010 (UTC)

Support merge - while slightly different concepts are introduced by the two articles, there is no reason why that can't be written into one more coherent article that covers both. 94.195.129.125 (talk) 20:58, 5 April 2010 (UTC)

Merge done. 16:13, 30 November 2010 (UTC)

Global tag justified?

I don't think the global tag is justified: it's applied to an example of illegitimate hypothesis formation, but the example doesn't need to be universal! Richard Pinch (talk) 18:57, 11 June 2008 (UTC)

Well, it could certainly be rephrased in a more international manner, but I think the true problem is that the sentence is too long and not very clear, and doesn't bring a clear conclusion (why would it be wrong?) Calimo (talk) 15:11, 13 January 2009 (UTC)

Circumventing the scientific approach?

"Circumventing the traditional scientific approach of conducting an experiment without a hypothesis can lead to premature conclusions."

I believe the traditional scientific approach is to form a hypothesis before conducting an experiment, so the sentence should be rewritten to say, "Circumventing the traditional scientific approach by conducting an experiment without a hypothesis can lead to premature conclusions."

Unless someone knows better and objects, I will make the change. —Blanchette (talk) 06:57, 9 May 2011 (UTC)

Done. —Blanchette (talk) 21:19, 20 May 2011 (UTC)

Topics for new articles?

p-hacking, data peeking, and the replication crisis are related topics that probably deserve articles of their own. -- The Anome (talk) 10:26, 20 May 2014 (UTC)

Vague

This article has been watered down since I last read it, apparently in an effort to cast the topic in a more "neutral" light. Just the opening paragraph, for instance, now says: "Data dredging ... is the use of data mining to uncover relationships in data." That does not seem sufficient to me at all. All data mining is used for uncovering relationships in data; the paragraph is almost tautological. The opening paragraph should instead succinctly define data dredging as it differs from other ways of using data. If I can find reasonable sources, I may go ahead and rewrite some of it.--Anders Feder (talk) 10:03, 4 September 2014 (UTC)