In the design and analysis of experiments, post-hoc analysis (from Latin post hoc, "after this") consists of looking at the data, after the experiment has concluded, for patterns that were not specified a priori. Critics sometimes call it data dredging, to evoke the sense that the more one looks, the more likely something will be found. More subtly, each time a pattern in the data is considered, a statistical test is effectively performed. This greatly inflates the total number of statistical tests and necessitates the use of multiple testing procedures to compensate. However, this is difficult to do precisely, and in practice most results of post-hoc analyses are reported as they are, with unadjusted p-values. These p-values must be interpreted in light of the fact that they are a small, selected subset of a potentially large group of p-values. Results of post-hoc analyses should be explicitly labeled as such in reports and publications to avoid misleading readers.
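To make the inflation concrete, the following minimal sketch computes the chance of at least one false positive across m tests. It rests on a simplifying assumption that the text does not state: the tests are independent and every null hypothesis is true. The function names are illustrative only.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one false positive) across m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_level(alpha: float, m: int) -> float:
    """Per-test level that keeps the familywise rate at or below alpha."""
    return alpha / m


# Show how quickly the familywise rate grows with the number of tests.
for m in (1, 5, 20, 100):
    unadjusted = familywise_error_rate(0.05, m)
    adjusted = familywise_error_rate(bonferroni_level(0.05, m), m)
    print(f"m={m:3d}  unadjusted={unadjusted:.3f}  bonferroni={adjusted:.3f}")
```

Under these assumptions, twenty looks at the data at α = 0.05 already give roughly a 64% chance of at least one spurious "finding", which is why an isolated unadjusted p-value from a post-hoc search carries little weight.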
In practice, post-hoc analyses are usually concerned with finding patterns or relationships between subgroups of sampled populations that would remain undetected if a scientific community relied strictly on a priori statistical methods. Post-hoc tests, also known as a posteriori tests, greatly expand the range and capability of methods that can be applied in exploratory research. Post-hoc examination strengthens induction by limiting the probability that significant effects will seem to have been discovered between subgroups of a population when none actually exist. As it is, many scientific papers are published without adequate, preventative post-hoc control of the type I error rate.
Post-hoc analysis is an important procedure: without it, multivariate hypothesis testing would greatly suffer, because the chance of reporting false positives would be unacceptably high. Ultimately, post-hoc testing creates better-informed scientists, who can therefore formulate better, more efficient a priori hypotheses and research designs.
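A small simulation can illustrate how subgroup searches produce apparent effects when none exist. The sketch below is hypothetical throughout: it assumes ten subgroups of twenty observations each, data drawn from a standard normal distribution with no true effect, and a z-test with known unit variance. It counts how often the smallest subgroup p-value falls below 0.05, with and without a Bonferroni adjustment.

```python
import math
import random


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate(n_experiments=2000, n_subgroups=10, n_per_group=20,
             alpha=0.05, seed=0):
    """Return (unadjusted, Bonferroni) rates of at least one false positive."""
    rng = random.Random(seed)
    unadjusted_hits = 0
    bonferroni_hits = 0
    for _ in range(n_experiments):
        p_values = []
        for _ in range(n_subgroups):
            # Null data: every subgroup has true mean 0 and variance 1.
            sample = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            z = (sum(sample) / n_per_group) * math.sqrt(n_per_group)
            p_values.append(two_sided_p(z))
        smallest = min(p_values)  # the "best" subgroup a dredger would report
        unadjusted_hits += smallest < alpha
        bonferroni_hits += smallest < alpha / n_subgroups
    return unadjusted_hits / n_experiments, bonferroni_hits / n_experiments


unadjusted_rate, bonferroni_rate = simulate()
print(f"unadjusted: {unadjusted_rate:.3f}  bonferroni: {bonferroni_rate:.3f}")
```

The unadjusted rate should land near the theoretical 1 − 0.95^10 ≈ 0.40 for ten independent tests, while the adjusted rate stays close to 0.05.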
Student–Newman–Keuls post-hoc ANOVA
The Student–Newman–Keuls and related tests are often referred to as post hoc. However, an experimenter often plans to test all pairwise comparisons before seeing the data. Therefore these tests are better categorized as a priori.
An example of an analysis often mislabeled as a post-hoc analysis is the Newman–Keuls method: "A different approach to evaluating a posteriori pairwise comparisons stems from the work of Student (1927), Newman (1939), and Keuls (1952). The Newman–Keuls procedure is based on a stepwise or layer approach to significance testing. Sample means are ordered from the smallest to the largest. The largest difference, which involves means that are r = p steps apart, is tested first at α level of significance; if significant, means that are r = p − 1 steps apart are tested at α level of significance and so on. The Newman–Keuls procedure provides an r-mean significance level equal to α for each group of r ordered means, that is, the probability of falsely rejecting the hypothesis that all means in an ordered group are equal is α. It follows that the concept of error rate applies neither on an experimentwise nor on a per comparison basis; the actual error rate falls somewhere between the two. The Newman–Keuls procedure, like Tukey's procedure, requires equal sample n's.
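The critical difference $\hat{\psi}(W_r)$ that two means separated by r steps must exceed to be declared significant is, according to the Newman–Keuls procedure,
$$\hat{\psi}(W_r) = q_{\alpha;\,r,\,\nu}\sqrt{\frac{MS_{\text{error}}}{n}},$$
where $q_{\alpha;\,r,\,\nu}$ is the upper-α critical value of the studentized range for r means and ν error degrees of freedom, $MS_{\text{error}}$ is the error mean square, and n is the common sample size.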
The Newman–Keuls and Tukey procedures require the same critical difference for the first comparison that is tested. The Tukey procedure uses this critical difference for all the remaining tests, whereas the Newman–Keuls procedure reduces the size of the critical difference, depending on the number of steps separating the ordered means. As a result, the Newman–Keuls test is more powerful than Tukey's test. Remember, however, that the Newman–Keuls procedure does not control the experimentwise error rate at α.
Frequently a test of the overall null hypothesis μ1 = μ2 = … = μp is performed with an F statistic in ANOVA rather than with a range statistic. If the F statistic is significant, Shaffer (1979) recommends using the critical difference $\hat{\psi}(W_{r-1})$ instead of $\hat{\psi}(W_r)$ to evaluate the largest pairwise comparison at the first step of the testing procedure. The testing procedure for all subsequent steps is unchanged. She has shown that the modified procedure leads to greater power at the first step without affecting control of the type I error rate. This makes dissonances, in which the overall null hypothesis is rejected by an F test without rejecting any one of the proper subsets of comparison, less likely."
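The stepwise logic described in the quotation can be sketched in a few lines. This is an illustrative sketch, not a reference implementation: the group means, sample size, and error mean square are invented, and the critical values in Q_05_DF20 are assumed to be rounded entries from a standard studentized-range table for α = 0.05 and 20 error degrees of freedom. The Tukey function is included only to contrast its single critical difference with the shrinking one used by Newman–Keuls.

```python
from math import sqrt

# Assumed table values: rounded upper 5% points of the studentized range for
# 20 error degrees of freedom, keyed by r (number of ordered means spanned).
Q_05_DF20 = {2: 2.95, 3: 3.58, 4: 3.96}


def newman_keuls(means, n, ms_error, q_table):
    """Return the set of index pairs (i, j), i < j, declared different."""
    p = len(means)
    order = sorted(range(p), key=lambda i: means[i])  # smallest to largest
    se = sqrt(ms_error / n)
    significant = set()
    retained = []  # (lo, hi) ranks of ordered groups judged homogeneous
    for r in range(p, 1, -1):          # widest range first
        w_r = q_table[r] * se          # critical difference for r steps
        for lo in range(p - r + 1):
            hi = lo + r - 1
            # A range nested inside a retained group is not tested further.
            if any(a <= lo and hi <= b for a, b in retained):
                continue
            if means[order[hi]] - means[order[lo]] > w_r:
                significant.add(tuple(sorted((order[lo], order[hi]))))
            else:
                retained.append((lo, hi))
    return significant


def tukey(means, n, ms_error, q_table):
    """Same comparisons, but every pair uses the critical difference for r = p."""
    p = len(means)
    w_p = q_table[p] * sqrt(ms_error / n)
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(means[i] - means[j]) > w_p}


# Invented example: 4 groups of 6 observations, so 24 - 4 = 20 error df.
group_means = [10.0, 12.0, 15.5, 16.0]
nk = newman_keuls(group_means, n=6, ms_error=6.0, q_table=Q_05_DF20)
hsd = tukey(group_means, n=6, ms_error=6.0, q_table=Q_05_DF20)
print("Newman-Keuls:", sorted(nk))
print("Tukey:       ", sorted(hsd))
print("Only N-K:    ", sorted(nk - hsd))
```

With these invented data the Newman–Keuls sketch flags one pair (the second and third means) that the Tukey sketch does not, which is consistent with the quoted statement that Newman–Keuls is the more powerful of the two, at the cost of not controlling the experimentwise error rate at α.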
List of post hoc tests
- Fisher's least significant difference (LSD)
- Bonferroni correction (This is properly used with planned, not post hoc, contrasts.)
- False discovery rate
- Duncan's new multiple range test
- Newman–Keuls method
- Rodger's method
- Scheffé's method
- Tukey's range test
- Dunnett's test
An examination of the statistical power characteristics of all but two of the tests listed above (six post hoc procedures and three planned ones) is provided by Rodger and Roberts (2013).
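As a rough illustration of how two of the listed approaches differ, the sketch below applies a Bonferroni correction and a false discovery rate procedure to the same set of p-values. The source does not say which false discovery rate procedure is meant; the Benjamini–Hochberg step-up rule is assumed here as one common choice, and the p-values are invented.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Step-up rule: reject the k smallest p-values, where k is the largest
    rank with p_(k) <= k * q / m."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


# Invented p-values from six hypothetical comparisons.
p_values = [0.001, 0.008, 0.012, 0.041, 0.20, 0.74]
print("Bonferroni:        ", bonferroni(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg(p_values))
```

On this example the Bonferroni rule rejects two hypotheses and the Benjamini–Hochberg rule rejects three, reflecting the general trade-off: controlling the false discovery rate is less strict than controlling the familywise error rate, so it tends to reject more.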
See also
- Multiple comparisons
- The significance level α (alpha) in statistical hypothesis testing
- Subgroup analysis
- Post hoc ergo propter hoc
- Tukey's range test
References
- Jaccard, J.; Becker, M. A.; Wood, G. (1984). "Pairwise multiple comparison procedures: A review". Psychological Bulletin 96 (3): 589. doi:10.1037/0033-2909.96.3.589.
- Student (1927). "Errors of routine analysis". Biometrika 19: 151–164.
- Newman, D. (1939). "The distribution of range in samples from a normal population, expressed in terms of an independent estimate of standard deviation". Biometrika 31 (1): 20–30. doi:10.1093/biomet/31.1-2.20.
- Keuls, M. (1952). "The use of the 'studentized range' in connection with an analysis of variance". Euphytica 1: 112–122.
- Hayter, A. J. (1986). "The Maximum Familywise Error Rate of Fisher's Least Significant Difference Test". Journal of the American Statistical Association 81 (396): 1000–1004. doi:10.2307/2289074.
- Rodger, R. S.; Roberts, M. (2013). "Comparison of power for multiple comparison procedures". Journal of Methods and Measurement in the Social Sciences 4 (1): 20–47.
- Carlson, James E.; et al. (1975). "The Distribution of the Test Statistic Used in the Newman–Keuls Multiple Comparison Technique". Annual Meeting of the American Educational Research Association (Washington, D.C., March 30–April 3, 1975).
- Klockars, A. J.; Hancock, G. R. (2000). "Scheffé's More Powerful F-Protected Post Hoc Procedure". Journal of Educational and Behavioral Statistics 25 (1): 13–19. doi:10.2307/1165310.