Talk:Data dredging

From Wikipedia, the free encyclopedia
WikiProject Statistics (Rated C-class, Mid-importance)

This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as C-Class on the quality scale.
This article has been rated as Mid-importance on the importance scale.


I support the merge of this page with the page on Data Dredging. These are essentially the same concept by two different names. They should be on the same page. Maybe a disambiguation entry can be posted to differentiate these concepts. AjeetKhurana (talk) —Preceding undated comment added 13:58, 20 May 2009 (UTC).

Support merge - Data dredging is probably the best title (comment by John Quiggin, forgot to sign).

Do not merge - Bias through incorrect data-snooping is essentially different from the problem created by testing a hypothesis with the same data-set. For example, data-snooping bias may occur when dealing with a highly fluctuating set of data where every removal of a datapoint results in a new extreme, and so on. (Pc100935 11:59, 18 December 2006 (UTC))

Splitting a data-set into parts A and B and then using part B to test a hypothesis formulated using part A is not recommended, since the two parts can be highly correlated. Best practice is to formulate a hypothesis before looking at the data and then use the data to test it. If a hypothesis is based on existing data, it should only be tested by collecting new, independent data. (Pc100935 11:59, 18 December 2006 (UTC))
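The last point, that a hypothesis suggested by the data should be tested only on new, independent data, can be illustrated with a small simulation. This is only a sketch under stated assumptions and is not taken from the article or from the comment above: every variable is pure Gaussian noise, and the function names (`pearson`, `one_trial`, `simulate`) and the sizes `n=50`, `k=40` are made up for illustration. It does not attempt to model the concern about correlated split samples.

```python
import random
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def noise(n, rng):
    """n independent standard-normal draws (assumption: no real effect exists)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def one_trial(rng, n=50, k=40):
    """Dredge k noise predictors for the one most correlated with a noise outcome."""
    y = noise(n, rng)
    predictors = [noise(n, rng) for _ in range(k)]

    # "Hypothesis" formed by looking at the data: the best-looking predictor.
    best = max(range(k), key=lambda j: abs(pearson(predictors[j], y)))

    # Testing on the same data reuses the correlation that drove the selection.
    same_data_r = abs(pearson(predictors[best], y))

    # Testing on new independent data: since everything is noise, a fresh
    # sample of the chosen predictor and the outcome is just two new noise draws.
    fresh_r = abs(pearson(noise(n, rng), noise(n, rng)))
    return same_data_r, fresh_r


def simulate(trials=200, seed=0):
    """Average |r| over many trials for same-data and fresh-data testing."""
    rng = random.Random(seed)
    results = [one_trial(rng) for _ in range(trials)]
    same = statistics.fmean([r[0] for r in results])
    fresh = statistics.fmean([r[1] for r in results])
    return same, fresh


if __name__ == "__main__":
    same, fresh = simulate()
    print(f"mean |r| tested on the same data: {same:.3f}")
    print(f"mean |r| tested on fresh data:    {fresh:.3f}")
```

Under these assumptions the selected predictor looks clearly correlated with the outcome when it is evaluated on the data that suggested it, while the same "hypothesis" evaluated on freshly drawn data shows only the small correlation expected from chance.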

Support. Why hasn't the merge been made already? SweetNightmares (talk) 04:45, 21 January 2010 (UTC)

Support merge - while slightly different concepts are introduced by the two articles, there is no reason why that can't be written into one more coherent article that covers both. (talk) 20:58, 5 April 2010 (UTC)

Merge done. 16:13, 30 November 2010 (UTC)

Global tag justified?

I don't think the global tag is justified: it's applied to an example of illegitimate hypothesis formation, but the example doesn't need to be universal! Richard Pinch (talk) 18:57, 11 June 2008 (UTC)

Well, it could certainly be rephrased in a more international manner, but I think the real problem is that the sentence is too long, not very clear, and doesn't reach a clear conclusion (why would it be wrong?). Calimo (talk) 15:11, 13 January 2009 (UTC)

Circumventing the scientific approach?

"Circumventing the traditional scientific approach of conducting an experiment without a hypothesis can lead to premature conclusions."

I believe the traditional scientific approach is to form a hypothesis before conducting an experiment, so the sentence should be rewritten to say, "Circumventing the traditional scientific approach by conducting an experiment without a hypothesis can lead to premature conclusions."

Unless someone knows better and objects, I will make the change. —Blanchette (talk) 06:57, 9 May 2011 (UTC)

Done. —Blanchette (talk) 21:19, 20 May 2011 (UTC)