Talk:P-value

Misconception on p-values in intro?

I'm just reading about many of the misconceptions about p-values, and as far as I can see one of them is reproduced in the intro. The intro states:

When the p-value is calculated correctly, this test guarantees that the Type I error rate is at most α.

Whereas Steve Goodman states:

"Misconception #9: P = .05 means that if you reject the null hypothesis, the probability of a type I error is only 5%. Now we are getting into logical quicksand. This statement is equivalent to Misconception #1, although that can be hard to see immediately. A type I error is a “false positive,” a conclusion that there is a difference when no difference exists. If such a conclusion represents an error, then by definition there is no difference. So a 5% chance of a false rejection is equivalent to saying that there is a 5% chance that the null hypothesis is true, which is Misconception #1."

Goodman, Steven. "A dirty dozen: twelve p-value misconceptions." Seminars in hematology. Vol. 45. No. 3. WB Saunders, 2008. http://www.perfendo.org/docs/BayesProbability/twelvePvaluemisconceptions.pdf

Should we erase the sentence or do I misunderstand it and it tells us a slightly different thing? Hildensia (talk) 20:19, 19 September 2016 (UTC)
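
The distinction Goodman draws can be made concrete with a small simulation sketch (Python; every number in it is hypothetical and chosen only for illustration): among simulated studies where the null is true, about 5% are rejected at α = 0.05, which is the Type I error rate the quoted sentence refers to, yet among the rejected studies the fraction where the null is actually true is nowhere near 5%.

  # Sketch: P(reject | H0 true) versus P(H0 true | reject). Every number here is
  # hypothetical: a one-sided z-test on the mean of n = 25 unit-variance observations,
  # alpha = 0.05, 90% of studies with a true null (mu = 0), 10% with a true mean of 0.5.
  import numpy as np
  from scipy.stats import norm

  rng = np.random.default_rng(0)
  alpha, n, n_studies = 0.05, 25, 100_000

  null_is_true = rng.random(n_studies) < 0.9
  true_mean = np.where(null_is_true, 0.0, 0.5)
  xbar = rng.normal(true_mean, 1.0 / np.sqrt(n))   # observed sample means
  p_values = norm.sf(xbar * np.sqrt(n))            # one-sided p-value for H0: mean = 0
  reject = p_values <= alpha

  print("P(reject | H0 true) ~", reject[null_is_true].mean())   # about 0.05: the Type I error rate
  print("P(H0 true | reject) ~", null_is_true[reject].mean())   # about 0.36 here, not 0.05

The sketch is only meant to show that the conditional probability in the quoted sentence and the one in Goodman's Misconception #9 are different quantities.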

Misleading examples

The examples given are rather misleading. For example, in the section about the rolling of two dice, the article says: "In this case, a single roll provides a very weak basis (that is, insufficient data) to draw a meaningful conclusion about the dice."

However, it makes no attempt to explain why this is so, and a slight alteration of the conditions of the experiment renders this statement false.

Consider a hustler/gambler who has two sets of apparently identical dice, one of which is loaded and the other fair. If he forgets which is which, then rolls one set and immediately gets two sixes, it is quite clear that he has identified the loaded set.

The example relies on the underlying assumption that dice are almost always fair, and that it would therefore take more than a single roll to convince you that they are not. However, this assumption is never made explicit, which might mislead people into supposing that a 0.05 p-value would never be sufficient to establish statistical significance. Richard Cant — Preceding unsigned comment added by 152.71.70.77 (talk)
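
To make the role of that unstated assumption concrete, here is a back-of-the-envelope Bayes calculation for the hustler scenario (Python; the numbers are purely hypothetical: the loaded pair is assumed to show a double six one time in four, the fair pair one time in 36, and the prior that the rolled set is the loaded one is 1/2, since the hustler has forgotten which is which):

  # Posterior probability that the rolled set is the loaded one, given a double six.
  # Hypothetical inputs: P(double six | loaded) = 1/4, P(double six | fair) = 1/36.
  def posterior_loaded(prior_loaded, p66_loaded=1/4, p66_fair=1/36):
      evidence = p66_loaded * prior_loaded + p66_fair * (1 - prior_loaded)
      return p66_loaded * prior_loaded / evidence

  print(posterior_loaded(0.5))    # 0.9    -- hustler scenario: one roll is nearly decisive
  print(posterior_loaded(0.001))  # ~0.009 -- if loaded dice are assumed very rare,
                                  #           the same double six is almost meaningless

The same double six is nearly decisive under a 50/50 prior and almost meaningless if loaded dice are assumed to be rare, which is the point made above: the evidence from the single roll does not change, only the prior does.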

Edit in need of discussion

An edit [1] introduced a change that is worth discussion, replacing, in the overview section, the text:

In frequentist inference, the p-value is widely used in statistical hypothesis testing, specifically in null hypothesis significance testing. In this method, as part of experimental design, before performing the experiment, one first chooses a model (the null hypothesis) and a threshold value for p, called the significance level of the test, traditionally 5% or 1% [1] and denoted as α. If the p-value is less than or equal to the chosen significance level (α), the test suggests that the observed data is inconsistent with the null hypothesis, so the null hypothesis must be rejected. However, that does not prove that the tested hypothesis is true. When the p-value is calculated correctly, this test guarantees that the Type I error rate is at most α. For typical analysis, using the standard α = 0.05 cutoff, a widely used interpretation is:
  • A small p-value (≤ 0.05) indicates strong evidence against the null hypothesis, so it is rejected.
  • A large p-value (> 0.05) indicates weak evidence against the null hypothesis (fail to reject).
  • p-values very close to the cutoff (~ 0.05) are considered to be marginal (need attention).
So, the analysis must always report the p-value, so readers can draw their own conclusions.

with

In frequentist inference, the p-value is widely used in statistical hypothesis testing, specifically in null hypothesis significance testing. In this method, as part of experimental design, before performing the experiment, one first chooses a model (the null hypothesis) and a threshold value for p, called the significance level of the test, traditionally 5% or 1% [1] and denoted as α. If the p-value is less than or equal to the chosen significance level (α), the test suggests that the observed data is inconsistent with the null hypothesis, so the null hypothesis must be rejected. However, that does not prove that the tested hypothesis is true. When the p-value is calculated correctly, this test guarantees that the Type I error rate is at most α. For typical analysis, using the standard α = 0.05 cutoff, the null hypothesis is rejected when p < .05 and not rejected when p > .05. The p-value does not in itself support reasoning about the probabilities of hypotheses but is only a tool for deciding whether to reject the null hypothesis.

The edit does indeed replace a fluffy unsourced discussion with something sharper, but maybe it is worth having some sort of case-by-case account of how to handle p-values. Thoughts? Can we source the interpretation? — Charles Stewart (talk) 22:08, 29 April 2016 (UTC)
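
For reference, the procedure described in both quoted versions reduces to a fixed-cutoff decision rule along the lines of the sketch below (Python; the one-sample z-test with known unit variance is a hypothetical stand-in for whatever model the article's "null hypothesis" refers to, and the two versions differ only on whether p exactly equal to α triggers rejection):

  # Sketch of the null-hypothesis-significance-testing recipe described above,
  # with a hypothetical one-sample z-test (known unit variance) standing in for the model.
  import math

  def one_sample_z_pvalue(sample_mean, n, mu0=0.0, sigma=1.0):
      """Two-sided p-value for H0: the population mean equals mu0."""
      z = (sample_mean - mu0) * math.sqrt(n) / sigma
      return math.erfc(abs(z) / math.sqrt(2))   # equals 2 * P(Z > |z|)

  alpha = 0.05                                  # chosen before seeing the data
  p = one_sample_z_pvalue(sample_mean=0.45, n=25)
  decision = "reject H0" if p <= alpha else "fail to reject H0"
  print(round(p, 4), decision)                  # ~0.0244, reject H0
  # The output is only a reject / fail-to-reject decision; p itself is not the
  # probability that H0 is true, which is the point of the replacement text.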

Thanks for the input. I may be one of the people who wrote this "fluffy" text, which is certainly sourced (Nature, volume 506, issue 7487, pages 150–152).
My concern for this page, as for many others, is that a random user seeking a quick answer will certainly be discouraged by the amount of ultra-detailed information, which from an information-theory point of view would amount to a large amount of noise covering the useful information ;-).
This is an encyclopedia; it must be useful. This page IMO is now nearly useless. JPLeRouzic (talk) 05:33, 9 August 2016 (UTC)
    1. Nuzzo, R. (2014). "Scientific method: Statistical errors". Nature. 506 (7487): 150–152. doi:10.1038/506150a.