# Bayesian probability

Bayesian probability is one interpretation of the concept of probability. The Bayesian interpretation of probability can be seen as an extension of propositional logic that enables reasoning with hypotheses, that is, propositions whose truth or falsity is uncertain.

Bayesian probability belongs to the category of evidential probabilities; to evaluate the probability of a hypothesis, the Bayesian probabilist specifies some prior probability, which is then updated in the light of new, relevant data (evidence).[1] The Bayesian interpretation provides a standard set of procedures and formulae to perform this calculation.
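The update described above is usually written as Bayes' theorem. As a reminder, with $H$ denoting a hypothesis and $E$ the evidence (symbols chosen here purely for illustration):

$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$

Here $P(H)$ is the prior probability, $P(E \mid H)$ is the likelihood of the evidence under the hypothesis, $P(E)$ is the marginal probability of the evidence, and $P(H \mid E)$ is the posterior probability.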

In contrast to interpreting probability as the "frequency" or "propensity" of some phenomenon, Bayesian probability is a quantity that we assign for the purpose of representing a state of knowledge,[2] or a state of belief.[3] In the Bayesian view, a probability is assigned to a hypothesis, whereas under the frequentist view, a hypothesis is typically tested without being assigned a probability.

The term "Bayesian" refers to the 18th-century mathematician and theologian Thomas Bayes, who provided the first mathematical treatment of a non-trivial problem of Bayesian inference.[4] Mathematician Pierre-Simon Laplace pioneered and popularised what is now called Bayesian probability.[5]

Broadly speaking, there are two views on Bayesian probability that interpret the probability concept in different ways. According to the objectivist view, the rules of Bayesian statistics can be justified by requirements of rationality and consistency and interpreted as an extension of logic.[2][6] According to the subjectivist view, probability quantifies a "personal belief".[3]

## Bayesian methodology

Bayesian methods are characterized by the following concepts and procedures:

• The use of random variables, or, more generally, unknown quantities,[7] to model all sources of uncertainty in statistical models. This also includes uncertainty resulting from lack of information (see also aleatoric and epistemic uncertainty).
• The need to determine the prior probability distribution taking into account the available (prior) information.
• The sequential use of Bayes' formula: when more data becomes available, calculate the posterior distribution using Bayes' formula; subsequently, the posterior distribution becomes the next prior.
• For the frequentist, a hypothesis is a proposition (which must be either true or false), so the frequentist probability of a hypothesis is either 0 or 1. In Bayesian statistics, a hypothesis can be assigned a probability other than 0 or 1 if its truth value is uncertain.
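The sequential use of Bayes' formula described in the list above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the two coin hypotheses, their likelihoods, and the function name `update` are assumptions made up for the example.

```python
from fractions import Fraction

def update(prior, likelihoods, observation):
    """Return the posterior over hypotheses after one observation.

    prior:       dict hypothesis -> P(hypothesis)
    likelihoods: dict hypothesis -> dict observation -> P(observation | hypothesis)
    """
    unnormalized = {h: prior[h] * likelihoods[h][observation] for h in prior}
    evidence = sum(unnormalized.values())  # P(observation)
    if evidence == 0:
        raise ValueError("observation has zero probability under every hypothesis")
    return {h: p / evidence for h, p in unnormalized.items()}

# Hypothetical example: is a coin fair, or biased towards heads?
likelihoods = {
    "fair":   {"H": Fraction(1, 2), "T": Fraction(1, 2)},
    "biased": {"H": Fraction(3, 4), "T": Fraction(1, 4)},
}
belief = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}  # assumed prior

for obs in ["H", "H", "T"]:
    # The posterior from one step becomes the prior for the next.
    belief = update(belief, likelihoods, obs)

print(belief)  # fair: 8/17, biased: 9/17
```

Exact fractions are used so that the result can be checked by hand: processing the three observations one at a time gives the same posterior as applying Bayes' formula once to the whole sequence.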

## Objective and subjective Bayesian probabilities

Broadly speaking, there are two views on Bayesian probability that interpret the concept of probability in different ways. For objectivists, probability objectively measures the plausibility of propositions: the probability of a proposition corresponds to the reasonable belief that everyone (even a "robot") with the same knowledge should hold, in accordance with the rules of Bayesian statistics, which can be justified by requirements of rationality and consistency.[2][6] For subjectivists, probability corresponds to a "personal belief";[3] rationality and coherence constrain the probabilities a subject may have, but allow for substantial variation within those constraints. The objective and subjective variants of Bayesian probability differ mainly in their interpretation and construction of the prior probability.

## History

The term Bayesian refers to Thomas Bayes (1702–1761), who proved a special case of what is now called Bayes' theorem in a paper titled "An Essay towards solving a Problem in the Doctrine of Chances".[8] In that special case, the prior and posterior distributions were Beta distributions and the data came from Bernoulli trials. It was Pierre-Simon Laplace (1749–1827) who introduced a general version of the theorem and used it to approach problems in celestial mechanics, medical statistics, reliability, and jurisprudence.[9] Early Bayesian inference, which used uniform priors following Laplace's principle of insufficient reason, was called "inverse probability" (because it infers backwards from observations to parameters, or from effects to causes).[10] After the 1920s, "inverse probability" was largely supplanted by a collection of methods that came to be called frequentist statistics.[10]
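The special case mentioned above, a Beta prior updated with data from Bernoulli trials, has a simple closed form: a Beta(α, β) prior combined with s successes and f failures gives a Beta(α + s, β + f) posterior. The sketch below illustrates this standard conjugate update; the function names and the sample data are hypothetical, and the uniform prior is written as Beta(1, 1).

```python
from fractions import Fraction

def beta_bernoulli_update(alpha, beta, trials):
    """Update a Beta(alpha, beta) prior with a sequence of 0/1 outcomes."""
    successes = sum(trials)
    failures = len(trials) - successes
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return Fraction(alpha, alpha + beta)

# A uniform prior on the success probability is Beta(1, 1).
alpha, beta = beta_bernoulli_update(1, 1, [1, 1, 0, 1])  # made-up outcomes

print(alpha, beta)             # 4 2
print(beta_mean(alpha, beta))  # 2/3
```

With a uniform prior, the posterior mean after s successes in n trials is (s + 1)/(n + 2), which is 4/6 = 2/3 in this example.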

In the 20th century, the ideas of Laplace were further developed in two different directions, giving rise to objective and subjective currents in Bayesian practice. Harold Jeffreys' Theory of Probability (first published in 1939) played an important role in the revival of the Bayesian view of probability, followed by works by Abraham Wald (1950) and Leonard J. Savage (1954). The adjective Bayesian itself dates to the 1950s; the derived terms Bayesianism and neo-Bayesianism are of 1960s coinage.[11] In the objectivist stream, the statistical analysis depends only on the model assumed and the data analysed;[12] no subjective decisions need to be involved. In contrast, "subjectivist" statisticians deny the possibility of fully objective analysis for the general case.

In the 1980s, there was a dramatic growth in research and applications of Bayesian methods, mostly attributed to the discovery of Markov chain Monte Carlo methods, which removed many of the computational problems, and to an increasing interest in nonstandard, complex applications.[13] Despite the growth of Bayesian research, most undergraduate teaching is still based on frequentist statistics.[14] Nonetheless, Bayesian methods are widely accepted and used, for example in the field of machine learning.[15]

## Justification of Bayesian probabilities

The use of Bayesian probabilities as the basis of Bayesian inference has been supported by several arguments, such as the Cox axioms, the Dutch book argument, arguments based on decision theory and de Finetti's theorem.

### Axiomatic approach

Richard T. Cox showed that Bayesian updating follows from several axioms, including two functional equations and a controversial hypothesis of differentiability.[6] Cox's 1961 development (mainly copied by Jaynes) is known to be non-rigorous, and Halpern has in fact found a counterexample.[16] The assumption of differentiability or even continuity is questionable, since the Boolean algebra of statements may be only finite.[7] Other axiomatizations have been suggested by various authors to make the theory more rigorous.[7]

### Dutch book approach

The Dutch book argument was proposed by de Finetti, and is based on betting. A Dutch book is made when a clever gambler places a set of bets that guarantee a profit, no matter what the outcome of the bets. If a bookmaker follows the rules of the Bayesian calculus in the construction of his odds, a Dutch book cannot be made.
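A toy example may make the idea concrete. The sketch below assumes a simplified setting that is not taken from the sources cited here: the bookmaker sells tickets that pay 1 unit if an event occurs and 0 otherwise, and the ticket price is read as the bookmaker's probability for that event. The function name and the prices are hypothetical.

```python
from fractions import Fraction

def profit_from_buying_both(price_a, price_not_a):
    """Gambler's profit in each outcome after buying one ticket on A and one on not-A.

    Each ticket pays 1 if its event occurs and 0 otherwise.
    Returns (profit if A occurs, profit if A does not occur).
    """
    cost = price_a + price_not_a
    profit_if_a = 1 - cost      # ticket on A pays 1, ticket on not-A pays 0
    profit_if_not_a = 1 - cost  # ticket on not-A pays 1, ticket on A pays 0
    return profit_if_a, profit_if_not_a

# Incoherent prices: P(A) + P(not A) = 2/5 + 1/2 = 9/10, which is less than 1.
print(profit_from_buying_both(Fraction(2, 5), Fraction(1, 2)))  # 1/10 in both outcomes

# Coherent prices: P(A) + P(not A) = 1, so buying both tickets yields no sure profit.
print(profit_from_buying_both(Fraction(2, 5), Fraction(3, 5)))  # 0 in both outcomes
```

If the two prices summed to more than 1, the gambler could instead sell both tickets at those prices and again lock in a profit. Only prices that obey the addition rule for complementary events leave no such opening in this toy setting.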

However, Ian Hacking noted that traditional Dutch book arguments did not specify Bayesian updating: they left open the possibility that non-Bayesian updating rules could avoid Dutch books. For example, Hacking writes[17] "And neither the Dutch book argument, nor any other in the personalist arsenal of proofs of the probability axioms, entails the dynamic assumption. Not one entails Bayesianism. So the personalist requires the dynamic assumption to be Bayesian. It is true that in consistency a personalist could abandon the Bayesian model of learning from experience. Salt could lose its savour."

In fact, there are non-Bayesian updating rules that also avoid Dutch books (as discussed in the literature on "probability kinematics" following the publication of Richard C. Jeffrey's rule, which is itself regarded as Bayesian[18]). The additional hypotheses sufficient to (uniquely) specify Bayesian updating are substantial, complicated, and unsatisfactory.[19]
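For orientation, Jeffrey's rule (often called Jeffrey conditioning) handles evidence that shifts the probabilities of a partition E_1, ..., E_k to new values q_1, ..., q_k without making any cell certain. The updated probability of a hypothesis A is the sum over i of P(A | E_i) q_i. The sketch below is an illustration under assumed inputs; the weather example, the variable names, and the function name are hypothetical.

```python
from fractions import Fraction

def jeffrey_update(joint, new_evidence_probs):
    """Jeffrey conditioning on a partition of evidence cells.

    joint:              dict (hypothesis, cell) -> prior joint probability
    new_evidence_probs: dict cell -> new probability q_i (the q_i sum to 1)
    Returns dict hypothesis -> sum_i P(hypothesis | cell_i) * q_i.
    """
    # Prior probability of each evidence cell.
    cell_totals = {}
    for (_, cell), p in joint.items():
        cell_totals[cell] = cell_totals.get(cell, 0) + p

    updated = {}
    for (hypothesis, cell), p in joint.items():
        conditional = p / cell_totals[cell]  # P(hypothesis | cell) under the prior
        updated[hypothesis] = updated.get(hypothesis, 0) + conditional * new_evidence_probs[cell]
    return updated

# Hypothetical prior over (weather, sky) pairs.
joint = {
    ("rain", "cloudy"): Fraction(3, 10),
    ("dry", "cloudy"): Fraction(2, 10),
    ("rain", "clear"): Fraction(1, 10),
    ("dry", "clear"): Fraction(4, 10),
}

# A glimpse of the sky raises P(cloudy) from 1/2 to 4/5 without making it certain.
print(jeffrey_update(joint, {"cloudy": Fraction(4, 5), "clear": Fraction(1, 5)}))
# rain: 13/25, dry: 12/25
```

When one cell receives probability 1, the rule reduces to ordinary conditioning: with `{"cloudy": 1, "clear": 0}` the result for rain is P(rain | cloudy) = 3/5.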

### Decision theory approach

A decision-theoretic justification of the use of Bayesian inference (and hence of Bayesian probabilities) was given by Abraham Wald, who proved that every admissible statistical procedure is either a Bayesian procedure or a limit of Bayesian procedures.[20] Conversely, every Bayesian procedure is admissible.[21]

## Personal probabilities and objective methods for constructing priors

Following the work on expected utility theory of Ramsey and von Neumann, decision-theorists have accounted for rational behavior using a probability distribution for the agent. Johann Pfanzagl completed the Theory of Games and Economic Behavior by providing an axiomatization of subjective probability and utility, a task left uncompleted by von Neumann and Oskar Morgenstern: their original theory supposed that all the agents had the same probability distribution, as a convenience.[22] Pfanzagl's axiomatization was endorsed by Oskar Morgenstern: "Von Neumann and I have anticipated" the question whether probabilities "might, perhaps more typically, be subjective and have stated specifically that in the latter case axioms could be found from which could derive the desired numerical utility together with a number for the probabilities (cf. p. 19 of The Theory of Games and Economic Behavior). We did not carry this out; it was demonstrated by Pfanzagl ... with all the necessary rigor".[23]

Ramsey and Savage noted that the individual agent's probability distribution could be objectively studied in experiments. The role of judgment and disagreement in science has been recognized since Aristotle and even more clearly with Francis Bacon. The objectivity of science lies not in the psychology of individual scientists, but in the process of science and especially in statistical methods, as noted by C. S. Peirce.[24] Objective methods for falsifying propositions about personal probabilities have been in use for half a century. Procedures for testing hypotheses about probabilities (using finite samples) are due to Ramsey (1931) and de Finetti (1931, 1937, 1964, 1970). Both Bruno de Finetti and Frank P. Ramsey acknowledged their debts to pragmatic philosophy, particularly (for Ramsey) to Charles S. Peirce.

The "Ramsey test" for evaluating probability distributions is implementable in theory, and has kept experimental psychologists occupied for a half century.[25] This work demonstrates that Bayesian-probability propositions can be falsified, and so meet an empirical criterion of Charles S. Peirce, whose work inspired Ramsey. (This falsifiability-criterion was popularized by Karl Popper.[26][27])

Modern work on the experimental evaluation of personal probabilities uses the randomization, blinding, and Boolean-decision procedures of the Peirce-Jastrow experiment.[28] Since individuals act according to different probability judgments, these agents' probabilities are "personal" (but amenable to objective study).

Personal probabilities are problematic for science and for some applications where decision-makers lack the knowledge or time to specify an informed probability-distribution (on which they are prepared to act). To meet the needs of science and of human limitations, Bayesian statisticians have developed "objective" methods for specifying prior probabilities.

Indeed, some Bayesians have argued that the prior state of knowledge defines the (unique) prior probability distribution for "regular" statistical problems; cf. well-posed problems. Finding the right method for constructing such "objective" priors (for appropriate classes of regular problems) has been the quest of statistical theorists from Laplace to John Maynard Keynes, Harold Jeffreys, and Edwin Thompson Jaynes. These theorists and their successors have suggested several methods for constructing "objective" priors.

Each of these methods contributes useful priors for "regular" one-parameter problems, and each prior can handle some challenging statistical models (with "irregularity" or several parameters). Each of these methods has been useful in Bayesian practice. Indeed, methods for constructing "objective" (alternatively, "default" or "ignorance") priors have been developed by avowed subjective (or "personal") Bayesians like James Berger (Duke University) and José-Miguel Bernardo (Universitat de València), simply because such priors are needed for Bayesian practice, particularly in science.[29] The quest for "the universal method for constructing priors" continues to attract statistical theorists.[29]

Thus, the Bayesian statistician needs either to use informed priors (using relevant expertise or previous data) or to choose among the competing methods for constructing "objective" priors.

## Bayesian average

A Bayesian average is a method of estimating the mean of a population that is consistent with the Bayesian interpretation: instead of estimating the mean strictly from the available data set, other existing information related to that data set is also incorporated into the calculation, in order to minimize the impact of large deviations or to assert a default value when the data set is small.

Calculating the Bayesian average uses the prior mean m and a constant C. C is assigned a value that is proportional to the typical data set size. The value is larger when the expected variation between data sets (within the larger population) is small, and smaller when the data sets are expected to vary substantially from one another. For a data set of n values $x_1, \dots, x_n$, the Bayesian average is:

$\bar{x} = {Cm + \sum_{i=1}^n{x_i} \over C + n}$[30]
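The formula can be sketched directly in Python. The function name, the rating values, and the choices m = 3 and C = 10 below are illustrative assumptions, not values taken from the cited source.

```python
def bayesian_average(values, prior_mean, c):
    """Return (C*m + sum(x_i)) / (C + n) for the observations in `values`."""
    n = len(values)
    if c + n == 0:
        raise ValueError("C + n must be non-zero")
    return (c * prior_mean + sum(values)) / (c + n)

# Hypothetical ratings on a 1-5 scale, with assumed prior mean m = 3 and C = 10.
few_ratings = [5, 5]
many_ratings = [5] * 50

print(bayesian_average(few_ratings, prior_mean=3, c=10))   # 40/12, about 3.33
print(bayesian_average(many_ratings, prior_mean=3, c=10))  # 280/60, about 4.67
print(bayesian_average([], prior_mean=3, c=10))            # 3.0, the prior mean
```

With only two observations the estimate stays close to the prior mean, while fifty observations pull it most of the way to the sample mean. An empty data set returns the prior mean itself, which is the default-value behaviour described above.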

## References

1. Paulos, John Allen. The Mathematics of Changing Your Mind, New York Times (US). August 5, 2011; retrieved 2011-08-06
2. Jaynes, E.T. "Bayesian Methods: General Background." In Maximum-Entropy and Bayesian Methods in Applied Statistics, by J. H. Justice (ed.). Cambridge: Cambridge Univ. Press, 1986
3. de Finetti, B. (1974) Theory of probability (2 vols.), J. Wiley & Sons, Inc., New York
4. Stigler, Stephen M. (1986) The history of statistics. Harvard University Press. p. 131.
5. Stigler, Stephen M. (1986) The history of statistics. Harvard University Press. pp. 97-98, 131.
6. Cox, Richard T. Algebra of Probable Inference, The Johns Hopkins University Press, 2001
7. Dupré, Maurice J., Tipler, Frank J. New Axioms For Bayesian Probability, Bayesian Analysis (2009), Number 3, pp. 599-606
8. McGrayne, Sharon Bertsch. (2011). The Theory That Would Not Die, p. 10.
9. Stigler, Stephen M. (1986) The history of statistics. Harvard University Press. Chapter 3.
10. Fienberg, Stephen E. (2006) When did Bayesian Inference become "Bayesian"? Bayesian Analysis, 1 (1), 1–40. See page 5.
11. "The works of Wald, Statistical Decision Functions (1950) and Savage, The Foundation of Statistics (1954) are commonly regarded starting points for current Bayesian approaches"; "Recent developments of the so-called Bayesian approach to statistics" Marshall Dees Harris, Legal-economic research, University of Iowa. Agricultural Law Center (1959), p. 125 (fn. 52); p. 126. "This revolution, which may or may not succeed, is neo-Bayesianism. Jeffreys tried to introduce this approach, but did not succeed at the time in giving it general appeal." Annals of the Computation Laboratory of Harvard University 31 (1962), p. 180. "It is curious that even in its activities unrelated to ethics, humanity searches for a religion. At the present time, the religion being 'pushed' the hardest is Bayesianism." Oscar Kempthorne, 'The Classical Problem of Inference—Goodness of Fit', Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (1967), p. 235.
12. Bernardo, J.M. (2005), Reference analysis, Handbook of statistics, 25, 17–90
13. Wolpert, R.L. (2004) A conversation with James O. Berger, Statistical science, 9, 205–218
14. Bernardo, José M. (2006) A Bayesian mathematical statistics primer. ICOTS-7
15. Bishop, C.M. Pattern Recognition and Machine Learning. Springer, 2007
16. Halpern, J. A counterexample to theorems of Cox and Fine, Journal of Artificial Intelligence Research, 10: 67-85.
17. Hacking (1967, Section 3, page 316), Hacking (1988, page 124)
18. "Bayes' Theorem". stanford.edu.
19. van Fraassen, B. (1989) Laws and Symmetry, Oxford University Press. ISBN 0-19-824860-1
20. Wald, Abraham. Statistical Decision Functions. Wiley 1950.
21. Bernardo, José M., Smith, Adrian F.M. Bayesian Theory. John Wiley 1994. ISBN 0-471-92416-4.
22. Pfanzagl (1967, 1968)
23. Morgenstern (1976, page 65)
24. Stigler, Stephen M. (1978). "Mathematical statistics in the early States". Annals of Statistics 6 (March): 239–265 esp. p. 248. doi:10.1214/aos/1176344123. JSTOR 2958876. MR 483118.
25. Davidson et al. (1957)
26. "Karl Popper" in Stanford Encyclopedia of Philosophy
27. Popper, Karl. (2002) The Logic of Scientific Discovery 2nd Edition, Routledge ISBN 0-415-27843-0 (Reprint of 1959 translation of 1935 original) Page 57.
28. Peirce & Jastrow (1885)
29. Bernardo, J. M. (2005). Reference Analysis. Handbook of Statistics 25 (D. K. Dey and C. R. Rao eds). Amsterdam: Elsevier, 17-90
30. Yang, Xiao; Zhang, Zhaoxin (2013). "Combining Prestige and Relevance Ranking for Personalized Recommendation". Proceedings of the 22nd ACM international conference on information & knowledge management (CIKM): 1877–1880. doi:10.1145/2505515.2507885.