Talk:Statistical significance

From Wikipedia, the free encyclopedia
WikiProject Statistics (Rated C-class, High-importance)
WikiProject Mathematics (Rated C-class, Mid-importance)

Comment on Dubious

The citation (USEPA December 1992) contains numerous statistical tests, some presented as p-values and some as confidence intervals. Figures 5-1 through 5-4 show some of the test statistics used in the citation. In two of the figures the statistics are for individual studies and can be assumed to be prospective. In the other two, the statistics are for pooled studies and can be assumed to be retrospective. Table 5-9 includes the results of the test of the hypothesis RR = 1 versus RR > 1 for individual studies and for pooled studies. In the cited report, no distinction between prospective tests and retrospective tests was made. This is a departure from the traditional scientific method, which makes a strict distinction between predictions of the future and explanations of the past. Gjsis (talk)

Journals banning significance testing

There is a small movement among some journals to ban significance testing as justification of results. This is largely in subfields where significance testing has been overused or misinterpreted. For instance, Basic and Applied Social Psychology, back in early 2015. I think this is worth mentioning somewhere in the article. Thoughts? – SJ + 19:15, 24 October 2016 (UTC)

I guess the issue would be due weight (wp:weight). Are there prominent secondary sources (e.g., review articles) that comment on and encourage this movement? Or is this just an editorial policy of a handful of journals? If the latter, I recommend holding off. danielkueh (talk) 20:18, 24 October 2016 (UTC)
Yes, it seems to be a big deal to some secondary sources. Some suggest that using null hypothesis significance testing to estimate the importance of a result is controversial. Here's Nature noting the controversy, here is Science News calling the method flawed, and here's an overview of the argument over P-values from a stats prof. All highlight the decision by BASP as a critical point in this field-wide discussion. – SJ + 04:05, 4 November 2016 (UTC)

Part of the debate: Why Most Published Research Findings Are False: [1]. Isambard Kingdom (talk) 20:28, 24 October 2016 (UTC)

Thanks for sharing, but the PLoS article doesn't recommend or encourage the banning of statistical significance. Instead, it recommends that researchers not just "chase statistical significance" but also improve other factors related to sample size and experimental design. danielkueh (talk) 20:40, 24 October 2016 (UTC)


This article may be really helpful as a citation in the Reproducibility section. This could help to provide insight as to why it is sometimes so difficult to reproduce a study when the original researchers "chased" statistical significance. 148.85.225.112 (talk) 01:23, 7 November 2016 (UTC)

Timeline for introduction of 'null hypothesis' as concept

The null hypothesis wasn't given that name until 1935 (per Lady_tasting_tea); perhaps there is a way to describe the original definition / the Neyman-Pearson results in the history section without using that term. At the least this could include a ref to Fisher's work clarifying the concept. – SJ + 04:24, 4 November 2016 (UTC)

Suggested rework of the first paragraph

The first paragraph needs work. Revisiting the earlier discussion, updated to account for feedback and interim changes to the current lede:

Current first paragraph:

In statistical hypothesis testing, statistical significance (or a statistically significant result)
is attained whenever the observed p-value of a test statistic is less than the significance level defined for the study.
The p-value is the probability of obtaining results at least as extreme as those observed, given that the null hypothesis is true.
The significance level, α, is the probability of rejecting the null hypothesis, given that it is true.

Proposed paragraph, broken into logical segments:

1. In statistical hypothesis testing, a result has statistical significance when
it is very unlikely to have occurred given the null hypothesis.[1]
2. More precisely, the significance level defined for a study, α,
is the probability of the study rejecting the null hypothesis, given that it were true;
3. and the p-value of a result, p,
is the probability of obtaining a result at least as extreme, given that the null hypothesis were true.
4. The result is statistically significant, by the standards of the study, when p < α.

References

  1. ^ Myers, et al. 2010

Suggestions and comments welcome. Please – SJ + 12:54, 2 March 2017 (UTC)
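The relationships in the proposed paragraph can be checked numerically. Below is a minimal sketch using a made-up coin-flip study (my invention, not from the article): it computes an exact one-sided p-value under the null hypothesis of a fair coin and compares it against a pre-set α (clauses 3 and 4), then simulates many studies with a true null to show that the rejection rate stays at or below α (clause 2; it equals α exactly only for a continuous test statistic).

```python
import math
import random

def binom_p_value(heads, n):
    """Exact one-sided p-value under H0 'the coin is fair':
    probability of observing at least `heads` heads in n flips."""
    return sum(math.comb(n, k) for k in range(heads, n + 1)) / 2 ** n

alpha = 0.05                      # significance level, fixed before the study

# Clauses 3-4: p-value of one observed result, compared against alpha.
p = binom_p_value(16, 20)         # 16 heads in 20 flips
print(f"p = {p:.4f}; significant: {p < alpha}")  # p = 0.0059; significant: True

# Clause 2: when H0 is true, the test rejects at most alpha of the time
# (only at most, because the binomial test statistic is discrete).
rng = random.Random(1)
studies = 20_000
rejections = sum(
    binom_p_value(sum(rng.random() < 0.5 for _ in range(20)), 20) < alpha
    for _ in range(studies)
)
print(f"rejection rate under a true H0: {rejections / studies:.3f}")
```

Note the asymmetry the discussion keeps circling: α is a property of the study design, chosen in advance, while p is a property of one observed result.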

We have already discussed this and there is no consensus for this proposed change. Please see the archives (e.g., Archive 2) for details of the inverse probability fallacy. Plus, the null hypothesis is always assumed to be true (conditional probability). danielkueh (talk) 14:14, 2 March 2017 (UTC)
Hello Daniel, how are you? This proposed first paragraph is different from the previous proposal, and has incorporated all feedback from the earlier discussion. Please look at the details. I have numbered the clauses to make this easier.
I believe you are taking issue with clause 1a. In that case, how would you complete that sentence without using p and α? "A statistically significant result of a study is... "
Finally, I'm not sure where you are going with your comment about the null hypothesis: getting a significant result often leads to concluding the null hypothesis is false.
Hello, Sj, I'm fine. Thanks for asking. Here is my list of issues:
  • First, to be technically correct, you would either "reject" or "retain" a null hypothesis. You would not conclude that it is false (see [[2]]). This approach is based on conditional probability, which is best explained with an example. Suppose you performed an experiment comparing two groups of runners and found that the difference in speed between the two groups was about 10 m/s, with a p-value of 0.02. You would read that as "the probability of finding a mean difference of 10 m/s, given that the null is true, is p = 0.02." Thus, the null is assumed to be true. All we're doing is setting a threshold that would allow us to either retain or reject the null. This is not an easy concept to grasp, and if your goal is readability, I don't see the benefit of introducing it so early in the lead paragraph. danielkueh (talk)
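The conditional-probability reading in the runner example can be made concrete with a small permutation test. This is only an illustrative sketch: the speeds below are made up (with a ~0.3 m/s gap rather than the 10 m/s in the comment), chosen so the p-value lands near the 0.02 in the example. It counts how often randomly relabeling the pooled runners produces a mean difference at least as large as the observed one, i.e. the probability of a gap that extreme given that the null is true.

```python
import random
from statistics import mean

def permutation_p_value(a, b, trials=10_000, seed=0):
    """One-sided permutation p-value: the probability, assuming H0
    (group labels don't matter), of a mean difference at least as
    large as the observed one."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = a + b                      # copy; originals untouched
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        if mean(pooled[:len(a)]) - mean(pooled[len(a):]) >= observed:
            hits += 1
    return hits / trials

# Hypothetical running speeds in m/s (illustrative data, not real).
group_a = [9.8, 10.1, 9.9, 10.3, 10.0]
group_b = [9.7, 9.6, 9.9, 9.8, 9.5]
p = permutation_p_value(group_a, group_b)
print(f"p = {p:.3f}")  # close to the 0.02 in the example
```

Retaining or rejecting the null then amounts to comparing p with a pre-set α such as 0.05; nothing here asserts that the null is false, only that the observed difference would be rare if it were true.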
I see what you mean, thanks. Does my revised wording above avoid that? – SJ +
If you insist on going this route, at the very least, change "likely" to "very likely," and cite Myers et al. (2010) who states the following:
"First, a statistically significant result means that the value of a test statistic has occurred that is very unlikely if H_0 is true."
danielkueh (talk) 14:06, 8 March 2017 (UTC)
Because somebody says "very" we should say it too? Isambard Kingdom (talk) 14:19, 8 March 2017 (UTC)
No, because "likely" and "unlikely" are often understood as probabilities greater and less than fifty percent, respectively. In statistical significance, we often deal with small probabilities (5% or less). And yes, we should be consistent with the sources, especially if it's written by "somebody" whose work is a reliable source in the field (wp:v). danielkueh (talk) 14:31, 8 March 2017 (UTC)
  • What does "unlikely (segment 1)" mean? Is it measurable? Or is it just subjective probability? There is no transition from 1a to 2. I know "likelihood" in this context is measured by the p-value. An "unlikely result" is one in which the p-value is less than the pre-set alpha. I know that only because I have learned and used statistics. But to a naive reader, that is not clear. Wouldn't it be much simpler to just say a significant result is one in which the p-value is less than the alpha? Removing all ambiguity?
Added a transition: "More precisely," . The point of the rest of the paragraph is to provide that clarification. – SJ +
  • If you were to look at past discussions on this talk page (starting with Archive 2), you will notice that there is no clear consensus on what statistical significance really is. Your proposed paragraph attempts to at least define it as a result that is "unlikely" as observed by a p-value that is less than alpha. I myself used to think it is just p-value less than alpha and I am sympathetic to defining a concept more concretely. But some editors have argued that statistical significance is not something concrete like a number. They argue that it is a "concept" or a decision-making judgment aid of some kind. After much discussion, I am somewhat agnostic about what it is. I have dug deep to try to find a canonical reference that would settle it one way or another. But I have yet to find one. That is why we have settled on the imperfect arrangement of just saying that statistical significance (and the less controversial statistically significant result), whatever it may be, is attained whenever the p-value is less than alpha. Until we can find a canonical reference that provides a definitive definition, it is not up to WP to settle the issue by arbitrarily defining it one way or another.
Aha. I will have to look at the earliest instances more closely. As long as α << 0.5, it seems accurate to say that p < α describes a result that would be unlikely (happening a small % of the time) under the null hypothesis. But could you say α=1 and declare that all results for an experiment are significant? My gut says no, but the current definition in this article says yes. – SJ +
In principle, yes. The choice of the alpha is arbitrary and is based on convention. A blackjack player may arbitrarily change the rules of the game by setting the upper limit to 51 instead of 21, but he or she is not likely to find any takers. In any event, this is beside the point of this issue. Remove "is a result" as it is redundant and muddles the definition. Instead, just say "statistical significance (or a statistically significant result) is very unlikely to occur, ....." By the way, if you intend to omit "statistical significance" in favor of "statistically significant result," then you should propose a change to the article title. Otherwise, the title should be included in the lead sentence (WP:lead). danielkueh (talk) 14:06, 8 March 2017 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── Good points. Revised to include the title properly. – SJ + 21:29, 11 March 2017 (UTC)

  • I am all about improving readability but not at the expense of accuracy or precision. There is nothing wrong with using the technical terms (or jargon) in this article. It is, after all, a high-level scientific concept, and a very narrow component of a much larger concept or approach (hypothesis testing). As an example, if you were to take a look at the article "Chi Square," you will notice that it is defined as "the distribution of a sum of the squares of k independent standard normal random variables." Is it "jargon laden?" Yes, it is. But you know what, the chi square article is not an introductory article to statistics where it has to define for the reader every technical term such as distribution, sum of squares, k independent variables, etc. If readers want to know more, they can follow the wikilinks or read a more basic introductory article.
  • Overall, and with due respect, I don't find the newly proposed first lead paragraph to be an improvement. It attempts to squeeze too much into so little space. Part of it, such as the setting of alpha, is already explained in the current second lead paragraph. Why try to squeeze it into the first lead paragraph? There is no reason to do that. If anything, the entire lead can be expanded.
Best, danielkueh (talk) 05:54, 3 March 2017 (UTC)
The chi square definition looks fine to me, and not opaque: all of the terms it uses are understood outside the context of that concept. In contrast, the definition here tries to introduce two new variables to the reader before concluding the first sentence, neither of which exists outside the scope of that definition. I hope we can do better. Early statisticians had a notion of significance, which they then embodied in this definition; we can describe that notion without using the jargon they developed to make the concept precise, before adding detail.
Additionally, one issue here is that "statistical significance" is used in a tremendous number of non-technical documents. Many people looking for an explanation of it, unlike those looking for chi squared details, will not have a stats background. – SJ + 09:02, 8 March 2017 (UTC)
The term "DNA" is used in a tremendous number of non-technical publications and lots of non-biologists are curious about it. But we don't define DNA as a "goopy substance" do we? I have used chi square multiple times and I myself find the WP lead sentence on that topic to be barely comprehensible. I also find it somewhat disingenuous to say that it is less opaque than the lead sentence of this article given that this article introduces only two variables and actually defines them, not to mention an entire second paragraph that explains the context. In any event, I have said pretty much all I have to say about the proposal. Other editors, especially the stats-savvy ones should weigh in. Where I draw the line is that the lead should NOT change without consensus (wp:consensus). And if there is one, then so be it. But even if you do get a change, be prepared for a potential barrage of pushback. It will keep you busy. :) danielkueh (talk) 14:06, 8 March 2017 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── The opening sentence of DNA is a perfect example of clarity and perspective. It explains why the topic is important, and why one might have encountered it, without going into technical details (which become more specific, step by step, in the following sentences). – SJ + 21:29, 11 March 2017 (UTC)

Well, I think "molecule" and "development" are technical details. But then again, I'm just a silly biologist. The first lead paragraph of this article used to state the importance of statistical significance to statistical hypothesis testing but that was abruptly removed. Anyway, the importance of statistical significance is explained in the second lead paragraph of this article, which is not addressed by the present or newly proposed first lead paragraph. And that's ok, because statistical significance is a threshold (or finish line) that requires quite a bit of explaining. danielkueh (talk) 01:12, 12 March 2017 (UTC)
Sj, Very clear text. I support the change. Thank you. Isambard Kingdom (talk) 14:18, 2 March 2017 (UTC)
Thank you kindly, Isambard. I would like to refine it a bit further, but I do feel leading with a jargon-free summary, and introducing p and α before using them to define significance, will make this clearer to many. – SJ + 04:50, 3 March 2017 (UTC)


I've revised the proposal based on Daniel's feedback so far, and removed the last sentence which has been shifted to the second paragraph. – SJ + 09:09, 8 March 2017 (UTC) And again. – SJ +

For the purpose of this discussion, I recommend writing out new revisions of just the proposed draft as opposed to making changes directly to the proposal where it first appeared so that other interested editors can follow the changes that have been made. But if you still want to make changes to the original draft, then use strikethroughs and inserts. As for this latest version, it still feels a little rushed from the first to the second sentence. But that's minor. Overall, it looks a lot better. No objections from me for this version. Other interested editors should weigh in. danielkueh (talk) 01:12, 12 March 2017 (UTC)
Ok, I'll do that for any further updates. So far it's only the three of us weighing in; I'll wait a while for other feedback. Warmly, – SJ + 18:53, 18 March 2017 (UTC)

It's been a couple of weeks; any objections to trying this new lede paragraph out? – SJ + 17:48, 31 March 2017 (UTC)

Give it a shot. :) danielkueh (talk) 19:26, 31 March 2017 (UTC)
Okay, see what you think in context. I reused a later Myers cite; not sure if the page numbers apply. Cheers, – SJ + 09:09, 21 April 2017 (UTC)
It looks fine. Good job. I corrected the right Myers reference so that it points to the right chapter. danielkueh (talk) 11:41, 21 April 2017 (UTC)
Statistically significant versus highly statistically significant
I agree that the first paragraph could do with a rewrite; this might give it greater clarity. However, I also wish to add that I think the opening could distinguish "a statistically significant result" from a "highly significant result". My understanding is that the former is what you get if the probability of getting your results by chance alone is less than 1 in 20, i.e. p < 0.05, whereas the latter is what you get if the probability of getting your results by chance alone is less than 1 in 100, i.e. p < 0.01. This article could also distinguish Type One errors from Type Two errors. Vorbee (talk) 16:51, 12 August 2017 (UTC)
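The cut-offs Vorbee describes are conventions, and whether one is met is a mechanical comparison. A hypothetical helper (names and thresholds written out here for illustration, not taken from any source) makes the distinction explicit:

```python
def label_result(p, alpha=0.05, strong_alpha=0.01):
    """Label a p-value using the conventional cut-offs from the comment.
    The thresholds are conventions, not universal rules."""
    if p < strong_alpha:
        return "highly significant"
    if p < alpha:
        return "significant"
    return "not significant"

print(label_result(0.03))    # significant
print(label_result(0.004))   # highly significant
print(label_result(0.20))    # not significant
```

On the last point: a Type One error (rejecting a true null) occurs at rate α by design, while a Type Two error (retaining a false null) depends on the test's power, which these thresholds alone do not control.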