WikiProject Statistics (Rated C-class, Mid-importance)
I support the merge of this page with the page on Data Dredging. These are essentially the same concept by two different names. They should be on the same page. Maybe a disambiguation entry can be posted to differentiate these concepts. AjeetKhurana (talk) —Preceding undated comment added 13:58, 20 May 2009 (UTC).
Support merge - Data dredging is probably the best title (comment by John Quiggin, forgot to sign).
Do not merge - Bias through incorrect data-snooping is essentially different from the problem created by testing a hypothesis with the same data set that suggested it. For example, data-snooping bias may occur when dealing with a highly fluctuating set of data where every removal of a data point results in a new extreme, and so on. (Pc100935 11:59, 18 December 2006 (UTC))
Splitting a data set into parts A and B and then using part B to test a hypothesis formulated using part A is not recommended, since these data sets can be highly correlated. Best practice is to formulate a hypothesis before looking at the data and then use the data to test the hypothesis. If a hypothesis is based on existing data, it should only be tested by collecting new, independent data. (Pc100935 11:59, 18 December 2006 (UTC))
Support merge - while slightly different concepts are introduced by the two articles, there is no reason why that can't be written into one more coherent article that covers both. 126.96.36.199 (talk) 20:58, 5 April 2010 (UTC)
Merge done. 16:13, 30 November 2010 (UTC)
Global tag justified?
I don't think the global tag is justified: it's applied to an example of illegitimate hypothesis formation, but the example doesn't need to be universal! Richard Pinch (talk) 18:57, 11 June 2008 (UTC)
- Well, it could certainly be rephrased in a more international manner, but I think the true problem is that the sentence is too long, not very clear, and doesn't reach a clear conclusion (why would it be wrong?) Calimo (talk) 15:11, 13 January 2009 (UTC)
Circumventing the scientific approach?
"Circumventing the traditional scientific approach of conducting an experiment without a hypothesis can lead to premature conclusions."
I believe the traditional scientific approach is to form a hypothesis before conducting an experiment, so the sentence should be rewritten to say, "Circumventing the traditional scientific approach by conducting an experiment without a hypothesis can lead to premature conclusions."
Topics for new articles?
Apparently p-hacking was written again in 2014, and in fact I would be in favour of that. Data-dredging is a more sophisticated approach, whereas P-hacking is much easier and mindlessly done. Viguarda (talk) 11:30, 20 January 2015 (UTC)
- My first thoughts on reading this are that it doesn't seem to mention, or possibly distinguish between, the intentional and accidental cases. One could search through a data set for any statistically significant event, or for specific conclusions. Note that the latter is different from cherry picking; maybe cherry tree picking would be a better analogy. One selects the tree with the best fruit, picks all the fruit (so as not to be accused of cherry picking), and then presents the results. Gah4 (talk) 00:03, 15 October 2016 (UTC)
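- As a rough illustration of the "search for any statistically significant event" case, here is a minimal Python sketch (not taken from the article or any source). It compares pairs of groups drawn from the same distribution, so every "significant" difference is spurious by construction. The function name `count_spurious_hits`, the group size of 30, and the cutoff of 2.0 (an approximate two-sided 5% critical value for a t statistic at this sample size) are all assumptions chosen for illustration.

```python
import random
import statistics


def count_spurious_hits(n_tests=200, n=30, t_crit=2.0, seed=0):
    """Count comparisons that look significant although no real effect exists."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        # Both groups come from the same N(0, 1) distribution (the null is true).
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        # Welch-style t statistic for the difference in means.
        se = (statistics.variance(a) / n + statistics.variance(b) / n) ** 0.5
        t = (statistics.mean(a) - statistics.mean(b)) / se
        # t_crit = 2.0 is an assumed, approximate 5% two-sided cutoff.
        if abs(t) > t_crit:
            hits += 1
    return hits


hits = count_spurious_hits()
print(f"{hits} of 200 null comparisons look significant")
```

With a 5% cutoff one would expect roughly one comparison in twenty to pass by chance, so reporting only the ones that pass, without saying how many were tried, is the situation the comment above describes.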
This article has been watered down since I last read it, apparently in an effort to cast the topic in a more "neutral" light. Just the opening paragraph, for instance, now says: "Data dredging ... is the use of data mining to uncover relationships in data." That does not seem sufficient to me at all. All data mining is used for uncovering relationships in data; the paragraph is almost tautological. The opening paragraph should instead succinctly define data dredging as it differs from other ways of using data. If I can find reasonable sources, I may go ahead and rewrite some of it.--Anders Feder (talk) 10:03, 4 September 2014 (UTC)
I was thinking of using an image from this site as a headline image since it explains the idea really well (it's Creative Commons Attribution-licensed). Any thoughts on this, or suggestions for a particularly ridiculous one? Blythwood (talk) 22:48, 3 June 2015 (UTC)
Second Vague, also second no merge
The article appears too vague to me as well, and it combines multiple problems that should be treated separately. More thorough mathematical derivations would be helpful in my opinion. Some core statements are known to be wrong, although they are still frequently repeated as urban legends in the social sciences (e.g., testing multiple stochastically independent hypotheses on the same data set is no problem; the page does not make that distinction so far). I suggest a rewrite. — Preceding unsigned comment added by Timo von Oertzen (talk • contribs) 15:05, 30 January 2017 (UTC)