Statistical proof is the rational demonstration of the degree of certainty for a proposition, hypothesis, or theory. It is used to convince others after a statistical test of the supporting evidence, and it rests on the types of inference that can be drawn from the test scores. Statistical methods are used to increase understanding of the facts, and the proof demonstrates the validity and logic of inference with explicit reference to a hypothesis, the experimental data, the facts, the test, and the odds. Proof has two essential aims: the first is to convince and the second is to explain the proposition through peer and public review.
The burden of proof rests on the demonstrable application of the statistical method, the disclosure of the assumptions, and the relevance of the test to a genuine understanding of the data relative to the external world. There are adherents to several different statistical philosophies of inference, such as Bayes' theorem versus the likelihood function, or positivism versus critical rationalism. These methods of reasoning have direct bearing on statistical proof and its interpretations in the broader philosophy of science.
A common demarcation between science and non-science is the hypothetico-deductive proof of falsification developed by Karl Popper, which is a well-established practice in the tradition of statistics. Other modes of inference, however, include the inductive and abductive modes of proof. Scientists do not use statistical proof as a means to attain certainty, but to falsify claims and explain theory. Science cannot achieve absolute certainty, nor is it a continuous march toward an objective truth, as the vernacular (as opposed to the scientific) meaning of the term "proof" might imply. Statistical proof offers a kind of proof of a theory's falsity and the means to learn heuristically through repeated statistical trials and experimental error. Statistical proof also has applications in legal matters, with implications for the legal burden of proof.
There are two kinds of axioms: (1) conventions that are taken as true, which should be avoided because they cannot be tested, and (2) hypotheses. Proof in the theory of probability was built on four axioms developed in the late 17th century:
- The probability of a hypothesis is a non-negative real number: $\Pr(h) \ge 0$;
- The probability of necessary truth equals one: $\Pr(t) = 1$;
- If two hypotheses h1 and h2 are mutually exclusive, then the sum of their probabilities is equal to the probability of their disjunction: $\Pr(h_1) + \Pr(h_2) = \Pr(h_1 \lor h_2)$;
- The conditional probability of h1 given h2 is equal to the unconditional probability of the conjunction h1 and h2, divided by the unconditional probability of h2 where that probability is positive: $\Pr(h_1 \mid h_2) = \Pr(h_1 \land h_2) / \Pr(h_2)$, where $\Pr(h_2) > 0$.
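The four axioms can be checked numerically on a small example. The sketch below is only an illustration: the sample space (two fair dice) and the hypotheses `sum_is_7`, `sum_is_11`, and `first_is_6` are assumptions introduced here, not part of the original text. Exact fractions are used so that the equalities hold without rounding error.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: two fair dice, every outcome equally probable.
OUTCOMES = list(product(range(1, 7), repeat=2))


def pr(event):
    """Probability of an event, given as a predicate over outcomes."""
    favourable = sum(1 for outcome in OUTCOMES if event(outcome))
    return Fraction(favourable, len(OUTCOMES))


def pr_given(h1, h2):
    """Axiom 4: Pr(h1 | h2) = Pr(h1 and h2) / Pr(h2), only when Pr(h2) > 0."""
    p_h2 = pr(h2)
    if p_h2 == 0:
        raise ValueError("conditional probability is undefined when Pr(h2) = 0")
    return pr(lambda o: h1(o) and h2(o)) / p_h2


# Illustrative hypotheses (assumptions made for this sketch only).
def sum_is_7(o):
    return o[0] + o[1] == 7


def sum_is_11(o):
    return o[0] + o[1] == 11


def first_is_6(o):
    return o[0] == 6


def necessary_truth(o):
    return True


# Axiom 1: probabilities are non-negative.
axiom1 = all(pr(h) >= 0 for h in (sum_is_7, sum_is_11, first_is_6))
# Axiom 2: a necessary truth has probability one.
axiom2 = pr(necessary_truth) == 1
# Axiom 3: for mutually exclusive hypotheses, probabilities add.
axiom3 = pr(lambda o: sum_is_7(o) or sum_is_11(o)) == pr(sum_is_7) + pr(sum_is_11)
# Axiom 4: conditional probability as a ratio.
conditional = pr_given(sum_is_7, first_is_6)

print(axiom1, axiom2, axiom3, conditional)
```

Such a check does not prove the axioms; it only shows that a simple equiprobable model satisfies them.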
The preceding axioms provide the statistical proof and basis for the laws of randomness, or objective chance, from which modern statistical theory has advanced. Experimental data, however, can never prove that a hypothesis (h) is true; instead, the inference is inductive and measures the probability of the hypothesis relative to the empirical data. The proof is in the rational demonstration of using the logic of inference, mathematics, testing, and deductive reasoning of significance.
Test and proof
The term proof descends from its Latin roots (provable, probable, probare L.), meaning to test. Hence, proof is a form of inference by means of a statistical test. Statistical tests are formulated on models that generate probability distributions. Examples of probability distributions include the binomial, normal, and Poisson distributions, which give exact descriptions of variables that behave according to natural laws of random chance. When a statistical test is applied to samples of a population, the test determines whether the sample statistics are significantly different from the assumed null-model. True values of a population, which are unknowable in practice, are called parameters of the population. Researchers sample from populations to calculate statistics such as the mean or standard deviation, which provide estimates of the parameters. If the entire population is sampled, then the sample mean and distribution coincide with the parametric mean and distribution.
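The relationship between a sample statistic and a population parameter can be illustrated with a short simulation. This is a sketch under stated assumptions: the population below is synthetic (10,000 values drawn from a normal distribution with mean 50 and standard deviation 10), and the names `population`, `parameter_mean`, and `sample_mean` are hypothetical.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical finite population of 10,000 measurements.
population = [random.gauss(50.0, 10.0) for _ in range(10_000)]

# The parameter: the mean of the whole population (normally unknowable).
parameter_mean = statistics.fmean(population)


def sample_mean(n):
    """Mean of a simple random sample of size n, drawn without replacement."""
    return statistics.fmean(random.sample(population, n))


# Estimates from progressively larger samples, ending with a full census.
for n in (10, 100, 1_000, len(population)):
    estimate = sample_mean(n)
    error = abs(estimate - parameter_mean)
    print(f"n={n:>6}  estimate={estimate:8.3f}  |error|={error:.3f}")

# Sampling every member of the population recovers the parameter itself.
full_census_error = abs(sample_mean(len(population)) - parameter_mean)
```

Smaller samples give estimates that scatter around the parameter, while a sample that covers the whole population reproduces it.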
Using the scientific method of falsification, the probability threshold at which the sample statistic is judged to differ from the null-model by more than chance alone can explain is set before the test. Most statisticians set this significance level at 0.05 or 0.1, which means that if a discrepancy at least as large as the one observed would arise under the null-model fewer than 5 (or 10) times out of 100, then the discrepancy is unlikely to be explained by chance alone and the null-hypothesis is rejected. Statistical models provide exact outcomes for the parameters and estimates from the sample statistics. Hence, the burden of proof rests in the sample statistics that provide estimates of a statistical model. Statistical models contain the mathematical proof of the parametric values and their probability distributions.
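The decision rule described above can be sketched with an exact binomial test. The data here (15 heads in 20 tosses) and the null-model (a fair coin) are hypothetical, as are the function names; the two-sided p-value is computed by summing the probabilities of all outcomes that are no more probable than the observed one, which is one common convention among several.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(observed, n, p_null=0.5):
    """Total null probability of outcomes at most as probable as the observed one."""
    pmf = [binomial_pmf(k, n, p_null) for k in range(n + 1)]
    cutoff = pmf[observed] * (1 + 1e-9)  # small tolerance for floating-point ties
    return sum(p for p in pmf if p <= cutoff)


ALPHA = 0.05  # significance level, fixed before looking at the data

# Hypothetical experiment: 15 heads in 20 tosses, null-model is a fair coin.
p_value = two_sided_p_value(observed=15, n=20)
reject_null = p_value < ALPHA

print(f"p-value = {p_value:.4f}, reject null-hypothesis: {reject_null}")
```

With these numbers the p-value is about 0.041, so the null-hypothesis would be rejected at the 0.05 level but not at a stricter level such as 0.01.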
Bayesian inference rests on Bayes' theorem:

$\Pr[\text{Parameter} \mid \text{Data}] = \dfrac{\Pr[\text{Data} \mid \text{Parameter}] \times \Pr[\text{Parameter}]}{\Pr[\text{Data}]}$

The formula is read as the probability of the parameter (or hypothesis = h, as used in the notation on axioms) "given" the data (or empirical observation), where the vertical bar means "given". The right-hand side of the formula combines the prior probability of a statistical model (Pr [Parameter]) with the likelihood (Pr [Data | Parameter]) to produce a posterior probability distribution of the parameter (Pr [Parameter | Data]). The posterior probability is the probability that the parameter is correct given the observed data or sample statistics. Hypotheses can be compared using Bayesian inference by means of the Bayes factor, which is the ratio of the posterior odds to the prior odds. It measures whether the data have increased or decreased the odds of one hypothesis relative to another.
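A worked example may make the roles of prior, likelihood, posterior, and Bayes factor more concrete. The sketch below is illustrative only: the two hypotheses about a coin (`fair` with heads probability 1/2 and `biased` with heads probability 3/4), the equal priors, and the data (15 heads in 20 tosses) are all assumptions introduced here.

```python
from fractions import Fraction
from math import comb


def likelihood(theta, heads, tosses):
    """Pr[Data | Parameter]: binomial probability of the data for heads probability theta."""
    return comb(tosses, heads) * theta**heads * (1 - theta) ** (tosses - heads)


# Hypothetical setup: two candidate parameter values with equal priors.
priors = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
thetas = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}
heads, tosses = 15, 20

likelihoods = {h: likelihood(thetas[h], heads, tosses) for h in priors}

# Pr[Data]: total probability of the data over all hypotheses considered.
evidence = sum(likelihoods[h] * priors[h] for h in priors)

# Bayes' theorem: posterior = likelihood * prior / evidence.
posteriors = {h: likelihoods[h] * priors[h] / evidence for h in priors}

# Bayes factor: ratio of posterior odds to prior odds (biased versus fair).
prior_odds = priors["biased"] / priors["fair"]
posterior_odds = posteriors["biased"] / posteriors["fair"]
bayes_factor = posterior_odds / prior_odds

print(f"posterior Pr[biased | data] = {float(posteriors['biased']):.3f}")
print(f"Bayes factor (biased vs fair) = {float(bayes_factor):.2f}")
```

The Bayes factor here equals the ratio of the two likelihoods (about 13.7), so it does not depend on the priors chosen; changing the priors changes the posterior probabilities but not how strongly the data favour one hypothesis over the other.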
The statistical proof is the Bayesian demonstration that one hypothesis has a higher (weak, strong, positive) likelihood. There is considerable debate over whether the Bayesian method aligns with Karl Popper's method of proof of falsification; some have suggested that "...there is no such thing as "accepting" hypotheses at all. All that one does in science is assign degrees of belief..." (p. 180). According to Popper, hypotheses that have withstood testing and have yet to be falsified are not verified but corroborated. Some researchers have suggested that Popper's quest to define corroboration on the premise of probability put his philosophy in line with the Bayesian approach. In this context, the likelihood of one hypothesis relative to another may be an index of corroboration, not confirmation, and thus statistically proven through rigorous objective standing.
In legal proceedings
Statistical proof in a legal proceeding can be sorted into three categories of evidence:
- The occurrence of an event, act, or type of conduct;
- The identity of the individual(s) responsible;
- The intent or psychological responsibility.
Statistical proof was not regularly applied in decisions concerning United States legal proceedings until the mid-1970s, following a landmark jury discrimination case, Castaneda v. Partida. The US Supreme Court ruled that gross statistical disparities constitute "prima facie proof" of discrimination, resulting in a shift of the burden of proof from plaintiff to defendant. Since that ruling, statistical proof has been used in many other cases on inequality, discrimination, and DNA evidence. However, there is not a one-to-one correspondence between statistical proof and the legal burden of proof. "The Supreme Court has stated that the degrees of rigor required in the fact finding processes of law and science do not necessarily correspond." (p. 1533)
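One common way to quantify a statistical disparity is to express the gap between an observed count and its expected value in standard deviations under a binomial null-model of random selection. The sketch below uses entirely hypothetical numbers (a group that makes up 60% of the eligible population but only 180 of 400 people selected) and a hypothetical function name; it is not drawn from any case record.

```python
from math import sqrt


def disparity_in_standard_deviations(selected_from_group, total_selected, group_share):
    """Gap between observed and expected counts, in binomial standard deviations."""
    expected = total_selected * group_share
    std_dev = sqrt(total_selected * group_share * (1 - group_share))
    return (selected_from_group - expected) / std_dev


# Hypothetical figures, for illustration only.
z = disparity_in_standard_deviations(
    selected_from_group=180,
    total_selected=400,
    group_share=0.60,
)

print(f"disparity = {z:.2f} standard deviations")
```

Here the expected count is 240, so the observed count falls roughly six standard deviations below it. Whether a disparity of a given size meets a legal standard is a separate question from the arithmetic.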
In an example of a death row sentence (McCleskey v. Kemp[nb 2]) concerning racial discrimination, the petitioner, a black man named McCleskey, was charged with the murder of a white police officer during a robbery. Expert testimony for McCleskey introduced a statistical proof showing that "defendants charged with killing white victims were 4.3 times as likely to receive a death sentence as [defendants] charged with killing blacks" (p. 595). Nonetheless, the statistics were insufficient "to prove that the decisionmakers in his case acted with discriminatory purpose" (p. 596). It was further argued that there were "inherent limitations of the statistical proof" (p. 596), because it did not refer to the specifics of the individual. Despite the statistical demonstration of an increased probability of discrimination, the legal burden of proof (it was argued) had to be examined on a case-by-case basis.
- Gold, B.; Simons, R. A. (2008). Proof and other dilemmas: Mathematics and philosophy. Mathematical Association of America. ISBN 0-88385-567-4.
- Gattei, S. (2008). Thomas Kuhn's "Linguistic Turn" and the Legacy of Logical Empiricism: Incommensurability, Rationality and the Search for Truth. Ashgate Pub Co. p. 277. ISBN 0-7546-6160-1.
- Pedemonte, B. (2007). "How can the relationship between argumentation and proof be analysed?". Educational Studies in Mathematics. 66 (1): 23–41. doi:10.1007/s10649-006-9057-x.
- Meier, P. (1986). "Damned Liars and Expert Witnesses". Journal of the American Statistical Association. 81 (394): 269–276. doi:10.1080/01621459.1986.10478270.
- Wiley, E. O. (1975). "Karl R. Popper, Systematics, and Classification: A Reply to Walter Bock and Other Evolutionary Taxonomists". Systematic Zoology. Society of Systematic Biologists. 24 (2): 233–43. ISSN 0039-7989. JSTOR 2412764. doi:10.2307/2412764.
- Howson, Colin; Urbach, Peter (1991). "Bayesian reasoning in science". Nature. 350 (6317): 371–4. ISSN 1476-4687. doi:10.1038/350371a0.
- Sundholm, G. "Proof-Theoretical Semantics and Fregean Identity Criteria for Propositions". The Monist. 77 (3): 294–314. doi:10.5840/monist199477315.
- Bissell, D. (1996). "Statisticians have a Word for it". Teaching Statistics. 18 (3): 87–89. doi:10.1111/j.1467-9639.1996.tb00300.x.
- Sokal, R. R.; Rohlf, F. J. (1995). Biometry (3rd ed.). W.H. Freeman & Company. p. 887. ISBN 0-7167-2411-1.
- Heath, David (1995). An introduction to experimental design and statistics for biology. CRC Press. ISBN 1-85728-132-2.
- Hald, Anders (2006). A History of Parametric Statistical Inference from Bernoulli to Fisher, 1713-1935. Springer. p. 260. ISBN 0-387-46408-5.
- Huelsenbeck, J. P.; Ronquist, F.; Bollback, J. P. (2001). "Bayesian Inference of Phylogeny and Its Impact on Evolutionary Biology". Science. 294 (5550): 2310–2314. doi:10.1126/science.1065889.
- Wade, P. R. (2000). "Bayesian methods in conservation biology". Conservation Biology. 14 (5): 1308–1316. doi:10.1046/j.1523-1739.2000.99415.x.
- Sober, E. (1991). Reconstructing the Past: Parsimony, Evolution, and Inference. A Bradford Book. p. 284. ISBN 0-262-69144-2.
- Helfenbein, K. G.; DeSalle, R. (2005). "Falsifications and corroborations: Karl Popper's influence on systematics". Molecular Phylogenetics and Evolution. 35: 271–280. doi:10.1016/j.ympev.2005.01.003.
- Fienberg, S. E.; Kadane, J. B. "The presentation of Bayesian statistical analyses in legal proceedings". Journal of the Royal Statistical Society, Series D. 32 (1/2): 88–98. JSTOR 2987595. doi:10.2307/2987595.
- Garaud, M. C. (1990). "Legal Standards and Statistical Proof in Title VII Litigation: In Search of a Coherent Disparate Impact Model". University of Pennsylvania Law Review. 139 (2): 455–503. JSTOR 3312286.
- The Harvard Law Review Association (1995). "Developments in the Law: Confronting the New Challenges of Scientific Evidence". Harvard Law Review. 108 (7): 1481–1605. JSTOR 1341808. doi:10.2307/1341808.
- Faigman, D. L. (1991). ""Normative Constitutional Fact-Finding": Exploring the Empirical Component of Constitutional Interpretation". University of Pennsylvania Law Review. 139 (3): 541–613. JSTOR 3312337.
- Supreme Court of the United States, Castaneda v. Partida (1977), cited in Meier (1986), who states: "Thus, in the space of less than half a year, the Supreme Court had moved from the traditional legal disdain for statistical proof to a strong endorsement of it as being capable, on its own, of establishing a prima facie case against a defendant."
- 481 U.S. 279 (1987).