Foundations of statistics
The foundations of statistics concern the epistemological debate in statistics over how one should conduct inductive inference from data. Among the issues considered are Bayesian inference versus frequentist inference, the distinction between Fisher's "significance testing" and Neyman-Pearson "hypothesis testing", and whether the likelihood principle should be followed. Some of these issues have been debated for up to 200 years without resolution.
Bandyopadhyay & Forster describe four statistical paradigms: "(i) classical statistics or error statistics, (ii) Bayesian statistics, (iii) likelihood-based statistics, and (iv) the Akaikean-Information Criterion-based statistics".
It is unanimously agreed that statistics depends somehow on probability. But, as to what probability is and how it is connected with statistics, there has seldom been such complete disagreement and breakdown of communication since the Tower of Babel. Doubtless, much of the disagreement is merely terminological and would disappear under sufficiently sharp analysis.
Fisher's "significance testing" vs Neyman-Pearson "hypothesis testing"
In the development of classical statistics in the second quarter of the 20th century, two competing models of inductive statistical testing were developed. Their relative merits were hotly debated for over 25 years, until Fisher's death. While a hybrid of the two methods is widely taught and used, the philosophical questions raised in the debate have not been resolved.
Fisher popularized significance testing, primarily in two highly influential books. His writing style in these books was strong on examples and relatively weak on explanations. The books lacked proofs or derivations of significance test statistics (which placed statistical practice in advance of statistical theory). Fisher's more explanatory and philosophical writing came much later, and there appear to be some differences between his earlier practices and his later opinions.
Fisher was motivated to obtain scientific experimental results without the explicit influence of prior opinion. The significance test is a probabilistic version of modus tollens, a classic form of deductive inference. The significance test might be simplistically stated, "If the evidence is sufficiently discordant with the hypothesis, reject the hypothesis". In application, a statistic is calculated from the experimental data, the probability of exceeding that statistic under the null hypothesis is determined, and that probability is compared to a threshold. The threshold (the numeric version of "sufficiently discordant") is arbitrary, usually decided by convention. A common application of the method is deciding whether a treatment has a reportable effect based on a comparative experiment. Statistical significance is a measure of probability, not of practical importance. It can be regarded as a requirement placed on the statistical signal-to-noise ratio. The method is based on the assumed existence of an imaginary infinite population corresponding to the null hypothesis.
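The procedure can be illustrated with a minimal sketch (Python, with a hypothetical data set and the conventional 5% threshold; a one-sample z-test with known noise is assumed purely for simplicity):

```python
import math

def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for the null hypothesis that the population
    mean equals mu0, assuming a known standard deviation sigma."""
    n = len(sample)
    mean = sum(sample) / n
    z = (mean - mu0) / (sigma / math.sqrt(n))
    # Probability, under the null, of a statistic at least this extreme
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical comparative-experiment data: is the treatment effect
# distinguishable from mu0 = 0 given noise sigma = 1?
data = [0.9, 1.4, 0.3, 1.1, 0.8, 1.2, 0.7, 1.0]
p = z_test_p_value(data, mu0=0.0, sigma=1.0)
reject = p < 0.05  # "sufficiently discordant" at the conventional threshold
```

Note that rejecting at p < 0.05 speaks only to statistical signal-to-noise, not to the practical importance of the effect.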
The significance test requires only one hypothesis. The result of the test is to reject the hypothesis (or not) – a simple dichotomy. The test does not distinguish between truth of the hypothesis and insufficiency of evidence to disprove it; it is like a criminal trial in which the defendant is assumed innocent until proven guilty.
Neyman & Pearson collaborated on a different, but related, problem – selecting among competing hypotheses based on the experimental evidence alone. Of their joint papers, the most cited was from 1933. The famous result of that paper is the Neyman-Pearson lemma, which says that a ratio of probabilities is an excellent criterion for selecting a hypothesis (with the threshold for comparison being arbitrary). The paper also proved the optimality of Student's t-test (one of the significance tests). Neyman expressed the opinion that hypothesis testing was a generalization of, and an improvement on, significance testing. The rationale for their methods is found in their joint papers.
Hypothesis testing requires multiple hypotheses. A hypothesis is always selected, a multiple choice. A lack of evidence is not an immediate consideration. The method is based on the assumption of a repeated sampling of the same population (the classical frequentist assumption).
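A minimal sketch of the lemma's criterion (Python; two simple hypotheses about a normal mean with known unit variance, and hypothetical data – the threshold k is arbitrary, as noted above):

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood_ratio(sample, mu0, mu1):
    """Ratio of the likelihoods of two simple hypotheses H1 (mean mu1)
    and H0 (mean mu0) -- the Neyman-Pearson criterion for selection."""
    l0 = math.prod(normal_pdf(x, mu0) for x in sample)
    l1 = math.prod(normal_pdf(x, mu1) for x in sample)
    return l1 / l0

sample = [1.2, 0.8, 1.5, 0.9]         # hypothetical observations
k = 1.0                               # arbitrary threshold for comparison
ratio = likelihood_ratio(sample, mu0=0.0, mu1=1.0)
choice = "H1" if ratio > k else "H0"  # a hypothesis is always selected
```

Unlike the significance test, this procedure always selects one of the hypotheses; a lack of evidence is not an immediate consideration.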
Grounds of disagreement
The length of the dispute allowed the debate of a wide range of issues regarded as foundational to statistics.
| Fisher's attack | Neyman's rebuttal | Discussion |
|---|---|---|
| Repeated sampling of the same population | Fisher's theory of fiducial inference is flawed | Fisher's attack on the basis of frequentist probability failed, but was not without result. He identified a specific case (the 2×2 table) where the two schools of testing reach different results. This case is one of several that are still troubling. Commentators believe that the "right" answer is context dependent. Fiducial probability has not fared well, being virtually without advocates, while frequentist probability remains a mainstream interpretation. |
| Type II errors | A purely probabilistic theory of tests requires an alternative hypothesis | Fisher's attack on type II errors has faded with time. In the intervening years statistics has separated the exploratory from the confirmatory. In the current environment, the concept of type II errors is used in power calculations for confirmatory hypothesis test sample size determination. |
| Inductive behavior | | Fisher's attack on inductive behavior has been largely successful because of his selection of the field of battle. While operational decisions are routinely made on a variety of criteria (such as cost), scientific conclusions from experimentation are typically made on the basis of probability alone. |
In this exchange Fisher also discussed the requirements for inductive inference, with specific criticism of cost functions penalizing faulty judgments. Neyman countered that Gauss and Laplace used them. This exchange of arguments occurred 15 years after textbooks began teaching a hybrid theory of statistical testing.
Fisher and Neyman were in disagreement about the foundations of statistics (although united in opposition to the Bayesian view):
- The interpretation of probability
- The disagreement over Fisher's inductive reasoning vs Neyman's inductive behavior contained elements of the Bayesian/Frequentist divide. Fisher was willing to alter his opinion (reaching a provisional conclusion) on the basis of a calculated probability while Neyman was more willing to change his observable behavior (making a decision) on the basis of a computed cost.
- The proper formulation of scientific questions with special concern for modeling
- Whether it is reasonable to reject a hypothesis based on a low probability without knowing the probability of an alternative
- Whether a hypothesis could ever be accepted on the basis of data
- In mathematics, deduction proves, counter-examples disprove
- In the Popperian philosophy of science, advancements are made when theories are disproven
- Subjectivity: While Fisher and Neyman struggled to minimize subjectivity, both acknowledged the importance of "good judgment". Each accused the other of subjectivity.
- Fisher subjectively chose the null hypothesis.
- Neyman-Pearson subjectively chose the criterion for selection (which was not limited to a probability).
- Both subjectively determined numeric thresholds.
Fisher and Neyman were separated by attitudes and perhaps language. Fisher was a scientist and an intuitive mathematician; inductive reasoning was natural to him. Neyman was a rigorous mathematician who was convinced by deductive reasoning rather than by a probability calculation based on an experiment. Thus there was an underlying clash between applied and theoretical, between science and mathematics.
Neyman, who had occupied the same building in England as Fisher, accepted a position on the west coast of the United States of America in 1938. His move effectively ended his collaboration with Pearson and their development of hypothesis testing. Further development was continued by others.
Textbooks provided a hybrid version of significance and hypothesis testing by 1940. None of the principals had any known personal involvement in the further development of the hybrid taught in introductory statistics today.
Statistics later developed in different directions including decision theory (and possibly game theory), Bayesian statistics, exploratory data analysis, robust statistics and nonparametric statistics. Neyman-Pearson hypothesis testing contributed strongly to decision theory which is very heavily used (in statistical quality control for example). Hypothesis testing readily generalized to accept prior probabilities which gave it a Bayesian flavor. Neyman-Pearson hypothesis testing has become an abstract mathematical subject taught in post-graduate statistics, while most of what is taught to under-graduates and used under the banner of hypothesis testing is from Fisher.
No major battles between the two classical schools of testing have erupted for decades, but sniping continues (perhaps encouraged by partisans of other controversies). After generations of dispute, there is virtually no chance that either statistical testing theory will replace the other in the foreseeable future.
The hybrid of the two competing schools of testing can be viewed very differently – as the imperfect union of two mathematically complementary ideas or as the fundamentally flawed union of philosophically incompatible ideas. Fisher enjoyed some philosophical advantage, while Neyman & Pearson employed the more rigorous mathematics. Hypothesis testing is controversial among some users, but the most popular alternative (confidence intervals) is based on the same mathematics.
The history of the development left testing without a single citable authoritative source for the hybrid theory that reflects common statistical practice. The merged terminology is also somewhat inconsistent. There is strong empirical evidence that the graduates (and instructors) of an introductory statistics class have a weak understanding of the meaning of hypothesis testing.
- The interpretation of probability has not been resolved (but fiducial probability is an orphan).
- Neither test method has been rejected. Both are heavily used for different purposes.
- Texts have merged the two test methods under the term hypothesis testing.
- Mathematicians claim (with some exceptions) that significance tests are a special case of hypothesis tests.
- Others treat the problems and methods as distinct (or incompatible).
- The dispute has adversely affected statistical education.
Bayesian inference versus frequentist inference
Two different interpretations of probability (based on objective evidence and subjective degrees of belief) have long existed. Gauss and Laplace could have debated alternatives more than 200 years ago. Two competing schools of statistics have developed as a consequence. Classical inferential statistics was largely developed in the second quarter of the 20th century, much of it in reaction to the (Bayesian) probability of the time, which utilized the controversial principle of indifference to establish prior probabilities. The rehabilitation of Bayesian inference was a reaction to the limitations of frequentist probability, and more reactions followed. While the philosophical interpretations are old, the statistical terminology is not: the current terms "Bayesian" and "frequentist" stabilized in the second half of the 20th century. The (philosophical, mathematical, scientific, statistical) terminology is confusing: the "classical" interpretation of probability is Bayesian while "classical" statistics is frequentist, and "frequentist" also has varying interpretations – different in philosophy than in physics.
The nuances of philosophical probability interpretations are discussed elsewhere. In statistics the alternative interpretations enable the analysis of different data using different methods based on different models to achieve slightly different goals. Any statistical comparison of the competing schools considers pragmatic criteria beyond the philosophical.
Two major contributors to frequentist (classical) methods were Fisher and Neyman. Fisher's interpretation of probability was idiosyncratic (but strongly non-Bayesian). Neyman's views were rigorously frequentist. Three major contributors to 20th century Bayesian statistical philosophy, mathematics and methods were de Finetti, Jeffreys and Savage. Savage popularized de Finetti's ideas in the English-speaking world and made Bayesian mathematics rigorous. In 1965, Dennis Lindley's two-volume work Introduction to Probability and Statistics from a Bayesian Viewpoint brought Bayesian methods to a wide audience. Statistics has advanced over the past three generations; the "authoritative" views of the early contributors are not all current.
Frequentist inference is partially and tersely described above (see Fisher's "significance testing" vs Neyman-Pearson "hypothesis testing"). Frequentist inference combines several different views. The result is capable of supporting scientific conclusions, making operational decisions and estimating parameters with or without confidence intervals. Frequentist inference is based solely on the evidence at hand (a single set of data).
A classical frequency distribution describes the probability of the data. The use of Bayes' theorem allows a more abstract concept – the probability of a hypothesis (corresponding to a theory) given the data. The concept was once known as "inverse probability". Bayesian inference updates the probability estimate for a hypothesis as additional evidence is acquired. Bayesian inference is explicitly based on the evidence and prior opinion, which allows it to be based on multiple sets of evidence.
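The updating step can be sketched as follows (Python; a hypothetical two-hypothesis coin example – the hypotheses, prior, and data are all invented for illustration):

```python
def bayes_update(prior, likelihoods, evidence):
    """Apply Bayes' theorem once per item of evidence, so the posterior
    after one data set becomes the prior for the next."""
    post = dict(prior)
    for e in evidence:
        post = {h: p * likelihoods[h](e) for h, p in post.items()}
        total = sum(post.values())                # normalizing constant
        post = {h: p / total for h, p in post.items()}
    return post

# Is a coin fair, or biased toward heads?  (Hypothetical likelihoods.)
likelihoods = {
    "fair":   lambda e: 0.5,
    "biased": lambda e: 0.8 if e == "H" else 0.2,
}
posterior = bayes_update({"fair": 0.5, "biased": 0.5}, likelihoods, "HHHTH")
```

Because each posterior serves as the next prior, evidence can accumulate across multiple data sets, which is the point made above.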
Comparisons of characteristics
Frequentists and Bayesians use different models of probability. Frequentists often consider parameters to be fixed but unknown while Bayesians assign probability distributions to similar parameters. Consequently, Bayesians speak of probabilities that don't exist for frequentists; a Bayesian speaks of the probability of a theory while a true frequentist can speak only of the consistency of the evidence with the theory. For example, a frequentist does not say that there is a 95% probability that the true value of a parameter lies within a confidence interval, saying instead that 95% of confidence intervals contain the true value.
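The frequentist reading of a confidence interval can be checked by simulation – a sketch (Python; the true mean, noise level, and sample size are hypothetical):

```python
import math
import random
import statistics

random.seed(0)
TRUE_MU, SIGMA, N = 10.0, 2.0, 25
HALF_WIDTH = 1.96 * SIGMA / math.sqrt(N)   # 95% interval, known sigma

trials, covered = 2000, 0
for _ in range(trials):
    sample = [random.gauss(TRUE_MU, SIGMA) for _ in range(N)]
    m = statistics.fmean(sample)
    if m - HALF_WIDTH <= TRUE_MU <= m + HALF_WIDTH:
        covered += 1

coverage = covered / trials   # close to 0.95 in the long run
```

The 95% is a property of the procedure over repeated samples, not a probability statement about any one computed interval.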
Neither school is immune from mathematical criticism, and neither accepts it without a struggle. Stein's paradox (for example) illustrated that finding a "flat" or "uninformative" prior probability distribution in high dimensions is subtle. Bayesians regard that as peripheral to the core of their philosophy while finding frequentism to be riddled with inconsistencies, paradoxes and bad mathematical behavior. Frequentists can explain most of them. Some of the "bad" examples are extreme situations – such as estimating the weight of a herd of elephants from measuring the weight of one ("Basu's elephants"), which allows no statistical estimate of the variability of weights. The likelihood principle has been a battleground.
Both schools have achieved impressive results in solving real-world problems. Classical statistics effectively has the longer record because numerous results were obtained with mechanical calculators and printed tables of special statistical functions. Bayesian methods have been highly successful in the analysis of information that is naturally sequentially sampled (radar and sonar). Many Bayesian methods and some recent frequentist methods (such as the bootstrap) require the computational power widely available only in the last several decades.
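A sketch of one such computationally heavy frequentist method, the bootstrap (Python; the data are hypothetical):

```python
import random
import statistics

def bootstrap_se(data, stat=statistics.fmean, reps=5000, seed=1):
    """Estimate the standard error of a statistic by recomputing it on
    many resamples of the data drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [stat([rng.choice(data) for _ in range(n)]) for _ in range(reps)]
    return statistics.stdev(replicates)

data = [2.1, 3.4, 2.9, 4.0, 3.1, 2.5, 3.8, 3.3]  # hypothetical measurements
se = bootstrap_se(data)  # thousands of resamples: trivial today,
                         # impossible on a mechanical calculator
```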
There is a hint that Bayesian philosophy is "book smart" compared to frequentist "street smarts": Bayesian philosophy has sometimes been silent on shuffling the cards, while the "design of experiments" literature teaches the importance of the source of statistical data. Fisher was a major contributor to that theory.
Bayesians are united in opposition to the limitations of frequentism, but are philosophically divided into numerous camps (empirical, hierarchical, objective, personal, subjective), each with a different emphasis. One (frequentist) philosopher of statistics has noted a retreat from the statistical field to philosophical probability interpretations over the last two generations. There is a perception that successes in Bayesian applications do not justify the supporting philosophy. Bayesian methods often create useful models that are not used for traditional inference and which owe little to philosophy. None of the philosophical interpretations of probability (frequentist or Bayesian) appears robust. The frequentist view is too rigid and limiting while the Bayesian view can be simultaneously objective and subjective, etc.
- "carefully used, the frequentist approach yields broadly applicable if sometimes clumsy answers"
- "To insist on unbiased [frequentist] techniques may lead to negative (but unbiased) estimates of a variance; the use of p-values in multiple tests may lead to blatant contradictions; conventional 0.95-confidence regions may actually consist of the whole real line. No wonder that mathematicians find it often difficult to believe that conventional statistical methods are a branch of mathematics."
- "Bayesianism is a neat and fully principled philosophy, while frequentism is a grab-bag of opportunistic, individually optimal, methods."
- "in multiparameter problems flat priors can yield very bad answers"
- "[Bayes' rule] says there is a simple, elegant way to combine current information with prior experience in order to state how much is known. It implies that sufficiently good data will bring previously disparate observers to agreement. It makes full use of available information, and it produces decisions having the least possible error rate."
- "Bayesian statistics is about making probability statements, frequentist statistics is about evaluating probability statements."
- "[S]tatisticians are often put in a setting reminiscent of Arrow’s paradox, where we are asked to provide estimates that are informative and unbiased and confidence statements that are correct conditional on the data and also on the underlying true parameter." (These are conflicting requirements.)
- "formal inferential aspects are often a relatively small part of statistical analysis"
- "The two philosophies, Bayesian and frequentist, are more orthogonal than antithetical."
- "An hypothesis that may be true is rejected because it has failed to predict observable results that have not occurred. This seems a remarkable procedure."
- Bayesian theory has a mathematical advantage
- Frequentist probability has existence and consistency problems
- But, finding good priors to apply Bayesian theory remains (very?) difficult
- Both theories have impressive records of successful application
- Neither supporting philosophical interpretation of probability is robust
- There is increasing skepticism of the connection between application and philosophy
- Some statisticians are recommending active collaboration (beyond a cease fire)
The likelihood principle
Likelihood is a synonym for probability in common usage. In statistics it is reserved for probabilities that fail to meet the frequentist definition. A probability refers to variable data for a fixed hypothesis while a likelihood refers to variable hypotheses for a fixed set of data. Repeated measurements of a fixed length with a ruler generate a set of observations. Each fixed set of observational conditions is associated with a probability distribution and each set of observations can be interpreted as a sample from that distribution – the frequentist view of probability. Alternatively a set of observations may result from sampling any of a number of distributions (each resulting from a set of observational conditions). The probabilistic relationship between a fixed sample and a variable distribution (resulting from a variable hypothesis) is termed likelihood – a Bayesian view of probability. A set of length measurements may imply readings taken by careful, sober, rested, motivated observers in good lighting.
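The distinction can be made concrete (Python; hypothetical ruler readings – the data are held fixed while the hypothesized mean varies):

```python
import math

observations = [9.8, 10.2, 10.1, 9.9]   # FIXED set of ruler readings

def likelihood(mu, sigma=0.2):
    """Density of the fixed data under a VARIABLE hypothesis mu.
    Read as a function of mu, this is the likelihood function."""
    return math.prod(
        math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        for x in observations
    )

# Among candidate hypotheses, the one nearest the sample mean is most likely.
best = max([9.5, 10.0, 10.5], key=likelihood)
```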
A likelihood is a probability (or not) by another name which exists because of the limited frequentist definition of probability. Likelihood is a concept introduced and advanced by Fisher for more than 40 years (although prior references to the concept exist and Fisher's support was half-hearted). The concept was accepted and substantially changed by Jeffreys. In 1962 Birnbaum "proved" the likelihood principle from premises acceptable to most statisticians. The "proof" has been disputed by statisticians and philosophers. The principle says that all of the information in a sample is contained in the likelihood function, which is accepted as a valid probability distribution by Bayesians (but not by frequentists).
Some (frequentist) significance tests are not consistent with the likelihood principle. Bayesians accept the principle which is consistent with their philosophy (perhaps encouraged by the discomfiture of frequentists). "[T]he likelihood approach is compatible with Bayesian statistical inference in the sense that the posterior Bayes distribution for a parameter is, by Bayes’s Theorem, found by multiplying the prior distribution by the likelihood function." Frequentists interpret the principle adversely to Bayesians as implying no concern about the reliability of evidence. "The likelihood principle of Bayesian statistics implies that information about the experimental design from which evidence is collected does not enter into the statistical analysis of the data." Many Bayesians (Savage for example) recognize that implication as a vulnerability.
The likelihood principle has become an embarrassment to both major philosophical schools of statistics; it has weakened both rather than favoring either. Its strongest supporters claim that it offers a better foundation for statistics than either of the two schools. "[L]ikelihood looks very good indeed when it is compared with these [Bayesian and frequentist] alternatives." These supporters include statisticians and philosophers of science. The concept needs further development before it can be regarded as a serious challenge to either existing school, but it seems to offer a promising compromise position. While Bayesians acknowledge the importance of likelihood for calculation, they believe that the posterior probability distribution is the proper basis for inference.
Modeling
Inferential statistics is based on models. Much of classical hypothesis testing, for example, was based on the assumed normality of the data. Robust and nonparametric statistics were developed to reduce the dependence on that assumption. Bayesian statistics interprets new observations from the perspective of prior knowledge – assuming a modeled continuity between past and present. The design of experiments assumes some knowledge of those factors to be controlled, varied, randomized and observed. Statisticians are well aware of the difficulties in proving causation (more of a modeling limitation than a mathematical one), saying "correlation does not imply causation".
More complex statistics utilizes more complex models, often with the intent of finding a latent structure underlying a set of variables. As models and data sets have grown in complexity, foundational questions have been raised about the justification of the models and the validity of inferences drawn from them. The range of conflicting opinion expressed about modeling is large.
- Models can be based on scientific theory or on ad-hoc data analysis. The approaches use different methods. There are advocates of each.
- Model complexity is a compromise. The Akaikean information criterion and Bayesian information criterion are two less subjective approaches to achieving that compromise.
- Fundamental reservations have been expressed about even simple regression models used in the social sciences. A long list of assumptions inherent to the validity of a model is typically neither mentioned nor checked. A favorable comparison between observations and model is often considered sufficient.
- Bayesian statistics focuses so tightly on the posterior probability that it ignores the fundamental comparison of observations and model.
- Traditional observation-based models are inadequate to solve many important problems. A much wider range of models, including algorithmic models, must be utilized. "If the model is a poor emulation of nature, the conclusions may be wrong."
- Modeling is often poorly done (the wrong methods are used) and poorly reported.
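The information-criterion compromise mentioned in the list above can be sketched as follows (Python; the log-likelihoods and parameter counts are invented for illustration – lower values are preferred by both criteria):

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: fit reward minus a penalty of 2 per parameter."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion: the penalty grows with sample size n."""
    return k * math.log(n) - 2 * log_likelihood

n = 100
simple = {"ll": -210.0, "k": 2}    # hypothetical fitted models:
complex_ = {"ll": -208.5, "k": 5}  # the richer model fits slightly better
simple_wins = aic(simple["ll"], simple["k"]) < aic(complex_["ll"], complex_["k"])
```

Here the small improvement in fit does not pay for the three extra parameters, so both criteria prefer the simpler model; with a much larger improvement the verdict would flip.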
In the absence of a strong philosophical consensus on statistical modeling, many statisticians accept the cautionary words of statistician George Box: "All models are wrong, but some are useful."
Other reading
For a short introduction to the foundations of statistics, see ch. 8 ("Probability and statistical inference") of Kendall's Advanced Theory of Statistics (6th edition, 1994).
In his book Statistics As Principled Argument, Robert P. Abelson articulates the position that statistics serves as a standardized means of settling disputes between scientists who could otherwise each argue the merits of their own positions ad infinitum. From this point of view, statistics is a form of rhetoric; as with any means of settling disputes, statistical methods can succeed only as long as all parties agree on the approach used.
Notes
- Efron 1978.
- Bandyopadhyay & Forster 2011.
- Citations of Savage (1972)
- Lehmann 2011.
- Gigerenzer et al. 1989.
- Louçã 2008.
- Fisher 1925.
- Fisher 1935.
- Fisher 1956.
- Neyman & Pearson 1933.
- Neyman & Pearson 1967.
- Fisher 1955.
- Neyman 1956.
- Lehmann 1993.
- Lenhard 2006.
- Halpin & Stam 2006.
- Lehmann & Romano 2005.
- Hubbard & Bayarri c. 2003.
- Sotos et al. 2007.
- Fienberg 2006.
- de Finetti 1964.
- Jeffreys 1939.
- Savage 1954.
- Efron 2013.
- Little 2006.
- Yu 2009.
- Berger 2003.
- Mayo 2013.
- Senn 2011.
- Gelman & Shalizi 2012.
- Cox 2005.
- Bernardo 2008.
- Kass c. 2012.
- Gelman 2008.
- Edwards 1999.
- Aldrich 2002.
- Birnbaum 1962.
- Backe 1999.
- Savage 1960, p. 585.
- Forster & Sober 2001.
- Royall 1997.
- Lindley 2000.
- Some large models attempt to predict the behavior of voters in the United States of America. The population is around 300 million. Each voter may be influenced by many factors. For some of the complications of voter behavior (most easily understood by the natives) see: http://www.stat.columbia.edu/~gelman/presentations/redbluetalkubc.pdf
- Efron mentions millions of data points and thousands of parameters from scientific studies.
- Tabachnick & Fidell 1996.
- Forster & Sober 1994.
- Freedman 1995.
- Breiman 2001.
- Chin n.d.
References
- Abelson, Robert P. (1995). Statistics as Principled Argument. Lawrence Erlbaum Associates. ISBN 0-8058-0528-1.
... the purpose of statistics is to organize a useful argument from quantitative evidence, using a form of principled rhetoric.
- Aldrich, John (2002). "How likelihood and identification went Bayesian". International Statistical Review. 70 (1): 79–98. doi:10.1111/j.1751-5823.2002.tb00350.x.
- Backe, Andrew (1999). "The likelihood principle and the reliability of experiments". Philosophy of Science. 66: S354–S361. doi:10.1086/392737.
- Bandyopadhyay, Prasanta; Forster, Malcolm, eds. (2011). Philosophy of statistics. Handbook of the Philosophy of Science. 7. Oxford: North-Holland. ISBN 978-0444518620. The text is a collection of essays.
- Berger, James O. (2003). "Could Fisher, Jeffreys and Neyman Have Agreed on Testing?". Statistical Science. 18 (1): 1–32. doi:10.1214/ss/1056397485.
- Bernardo, Jose M. (2008). "Comment on Article by Gelman". Bayesian Analysis. 3 (3): 453. doi:10.1214/08-BA318REJ.
- Birnbaum, A. (1962). "On the foundations of statistical inference". J. Amer. Statist. Ass. 57: 269–326.
- Breiman, Leo (2001). "Statistical Modeling: The Two Cultures". Statistical Science. 16 (3): 199–231. doi:10.1214/ss/1009213726.
- Chin, Wynne W. (n.d.). "Structural Equation Modeling in IS Research - Understanding the LISREL and PLS perspective". University of Houston lecture notes?
- Cox, D. R. (2005). "Frequentist and Bayesian Statistics: a Critique". Statistical Problems in Particle Physics, Astrophysics and Cosmology. PHYSTAT05.
- de Finetti, Bruno (1964). "Foresight: its Logical Laws, its Subjective Sources". In Kyburg, H. E.; Smokler, H. E. (eds.). Studies in Subjective Probability. New York: Wiley. pp. 93–158. Translation of the 1937 French original with later notes added.
- Edwards, A.W.F. (1999). "Likelihood". Preliminary version of an article for the International Encyclopedia of the Social and Behavioral Sciences.
- Efron, Bradley (2013). "A 250-Year Argument: Belief, Behavior, and the Bootstrap". Bulletin (new series) of the American Mathematical Society. 50 (1): 129–146. doi:10.1090/s0273-0979-2012-01374-5.
- Efron, Bradley (1978). "Controversies in the foundations of statistics" (PDF). The American Mathematical Monthly. 85 (4): 231–246. doi:10.2307/2321163.
- Fienberg, Stephen E. (2006). "When did Bayesian inference become "Bayesian"?". Bayesian Analysis. 1 (1): 1–40. doi:10.1214/06-ba101.
- Fisher, R. A. (1925). Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.
- Fisher, Sir Ronald A. (1935). Design of Experiments. Edinburgh: Oliver and Boyd.
- Fisher, R (1955). "Statistical Methods and Scientific Induction" (PDF). Journal of the Royal Statistical Society, Series B. 17 (1): 69–78.
- Fisher, Sir Ronald A. (1956). Statistical Methods and Scientific Inference. Edinburgh: Oliver and Boyd.
- Forster, Malcolm; Sober, Elliott (1994). "How to Tell when Simpler, More Unified, or Less Ad Hoc Theories will Provide More Accurate Predictions". British Journal for the Philosophy of Science (45): 1–36.
- Forster, Malcolm; Sober, Elliott (2001). "Why likelihood". Likelihood and evidence: 89–99.
- Freedman, David (March 1995). "Some issues in the foundation of statistics". Foundations of Science. 1 (1): 19–39.
- Gelman, Andrew (2008). "Rejoinder". Bayesian Analysis. 3 (3): 467–478. doi:10.1214/08-BA318REJ. A joke escalated into a serious discussion of Bayesian problems by 5 authors (Gelman, Bernardo, Kadane, Senn, Wasserman) on pages 445-478.
- Gelman, Andrew; Shalizi, Cosma Rohilla (2012). "Philosophy and the practice of Bayesian statistics". British Journal of Mathematical and Statistical Psychology. 66: 8–38. doi:10.1111/j.2044-8317.2011.02037.x.
- Gigerenzer, Gerd; Swijtink, Zeno; Porter, Theodore; Daston, Lorraine; Beatty, John; Kruger, Lorenz (1989). "Part 3: The Inference Experts". The Empire of Chance: How Probability Changed Science and Everyday Life. Cambridge University Press. pp. 70–122. ISBN 978-0-521-39838-1.
- Halpin, P F; Stam, HJ (Winter 2006). "Inductive Inference or Inductive Behavior: Fisher and Neyman: Pearson Approaches to Statistical Testing in Psychological Research (1940–1960)". The American Journal of Psychology. 119 (4): 625–653. doi:10.2307/20445367. JSTOR 20445367. PMID 17286092.
- Hubbard, Raymond; Bayarri, M. J. (c. 2003). "P Values are not Error Probabilities" (PDF). A working paper that explains the difference between Fisher's evidential p-value and the Neyman–Pearson Type I error rate .
- Jeffreys, H. (1939). The theory of probability. Oxford University Press.
- Kass (c. 2012). "Why is it that Bayes' rule has not only captured the attention of so many people but inspired a religious devotion and contentiousness, repeatedly across many years?" (PDF).
- Lehmann, E. L. (December 1993). "The Fisher, Neyman-Pearson Theories of Testing Hypotheses: One Theory or Two?". Journal of the American Statistical Association. 88 (424): 1242–1249. doi:10.1080/01621459.1993.10476404.
- Lehmann, E. L. (2011). Fisher, Neyman, and the creation of classical statistics. New York: Springer. ISBN 978-1441994998.
- Lehmann, E.L.; Romano, Joseph P. (2005). Testing Statistical Hypotheses (3E ed.). New York: Springer. ISBN 0-387-98864-5.
- Lenhard, Johannes (2006). "Models and Statistical Inference: The Controversy between Fisher and Neyman–Pearson". Brit. J. Phil. Sci. 57: 69–91. doi:10.1093/bjps/axi152.
- Lindley, D.V. (2000). The philosophy of statistics. Journal of the Royal Statistical Society, Series D. 49. pp. 293–337. doi:10.1111/1467-9884.00238.
- Little, Roderick J. (2006). "Calibrated Bayes: A Bayes/Frequentist Roadmap". The American Statistician. 60 (3).
- Louçã, Francisco (2008). "The Widest Cleft in Statistics: How and Why Fisher Opposed Neyman and Pearson" (PDF). Working paper; contains numerous quotations from the original sources of the dispute.
- Mayo, Deborah G. (February 2013). "Discussion: Bayesian Methods: Applied? Yes. Philosophical Defense? In Flux". The American Statistician. 67 (1): 11–15. doi:10.1080/00031305.2012.752410.
- Neyman, J; Pearson, E. S. (January 1, 1933). "On the Problem of the most Efficient Tests of Statistical Hypotheses". Phil. Trans. R. Soc. Lond. A. 231 (694–706): 289–337. doi:10.1098/rsta.1933.0009.
- Neyman, J.; Pearson, E. S. (1967). Joint Statistical Papers of J. Neyman and E. S. Pearson. Cambridge University Press.
- Neyman, Jerzy (1956). "Note on an Article by Sir Ronald Fisher". Journal of the Royal Statistical Society, Series B. 18 (2): 288–294.
- Royall, Richard (1997). Statistical evidence : a likelihood paradigm. London New York: Chapman & Hall. ISBN 978-0412044113.
- Savage, L.J. (1972). Foundations of Statistics (second ed.).
- Senn, Stephen (2011). "You May Believe You Are a Bayesian But You Are Probably Wrong". RMM. 2: 48–66.
- Sotos, Ana Elisa Castro; Vanhoof, Stijn; Noortgate, Wim Van den; Onghena, Patrick (2007). "Students' Misconceptions of Statistical Inference: A Review of the Empirical Evidence from Research on Statistics Education". Educational Research Review. 2: 98–113. doi:10.1016/j.edurev.2007.04.001.
- Stuart, A.; Ord, J. K. (1994). Kendall's Advanced Theory of Statistics, Volume I: Distribution Theory (6th ed.). Edward Arnold.
- Tabachnick, Barbara G.; Fidell, Linda S. (1996). Using Multivariate Statistics (3rd ed.). ISBN 0-673-99414-7. "Principal components is an empirical approach while factor analysis and structural equation modeling tend to be theoretical approaches." p 27
- Yu, Yue (2009). "Bayesian vs. Frequentist" (pdf). Lecture notes? University of Illinois at Chicago
Further reading
- Barnett, Vic (1999). Comparative Statistical Inference (3rd ed.). Wiley. ISBN 978-0-471-97643-1.
- Cox, David R. (2006). Principles of Statistical Inference. Cambridge University Press. ISBN 978-0-521-68567-2.
- Efron, Bradley (1986). Why Isn't Everyone a Bayesian? (with discussion). The American Statistician. 40. pp. 1–11. doi:10.2307/2683105. JSTOR 2683105.
- Good, I. J. (1988). The Interface Between Statistics and Philosophy of Science. Statistical Science. 3. pp. 386–397. doi:10.1214/ss/1177012754. JSTOR 2245388.
- Kadane J.B., Schervish M.J., Seidenfeld T. (1999), Rethinking the Foundations of Statistics (Cambridge University Press). [Bayesian.]
- Mayo, Deborah G. (1992). Did Pearson reject the Neyman-Pearson philosophy of statistics?. Synthese. 90. pp. 233–262. doi:10.1007/BF00485352.