Bayesian probability
Bayesian probability is one of the different interpretations of the concept of probability and belongs to the category of evidential probabilities. The Bayesian interpretation of probability can be seen as an extension of logic that enables reasoning with uncertain statements. To evaluate the probability of a hypothesis, the Bayesian probabilist specifies some prior probability, which is then updated in the light of new relevant data. The Bayesian interpretation provides a standard set of procedures and formulae to perform this calculation. Bayesian probability interprets the concept of probability as "a measure of a state of knowledge",[1] in contrast to interpreting it as a frequency or a physical property of a system.
"Bayesian" refers to the 18th century statistician Thomas Bayes (1702–1761), who provided an early, mathematically incomplete treatment of a problem in Bayesian inference.[2] Ironically, Bayes was a minor figure in the history of science, who had little or no impact on the early development of statistics; it was the French mathematician Pierre-Simon Laplace (1749–1827) who pioneered and popularized what is now called Bayesian probability.[3]
Broadly speaking, there are two views on Bayesian probability that interpret the state of knowledge concept in different ways. According to the objectivist view, the rules of Bayesian statistics can be justified by requirements of rationality and consistency and interpreted as an extension of logic.[1][4] According to the subjectivist view, the state of knowledge measures a "personal belief".[5] Many modern machine learning methods are based on objectivist Bayesian principles.[6] One of the crucial features of the Bayesian view is that a probability is assigned to a hypothesis, whereas under the frequentist view, a hypothesis is typically rejected or not rejected without directly assigning a probability.
Calculations with Bayesian probabilities
Bayes' theorem is one of the main tools for manipulating probabilities of any kind; that is, it is applicable no matter what interpretation is being placed on the probabilities being manipulated. Bayesian inference is a formal approach to making statistical inferences in cases where some of the probabilities are interpreted as representing beliefs, or knowledge, rather than having a frequency-based interpretation. While "Bayesian inference" makes use of Bayes' theorem, not all cases where Bayes' theorem is applied should be labelled as "Bayesian statistics" or "Bayesian inference".
The use of Bayes' theorem in Bayesian inference may be described as follows. Let H denote a hypothesis: for example, that a certain statement of supposed fact is true, or that a statistical parameter takes a certain value. Before observing data from a given experiment, one starts with some belief about whether the hypothesis H is true, expressed in the form of a probability, usually called the prior probability. Bayes' theorem is used to determine what one's probability for the hypothesis should be once the outcome D of the experiment is known. The phrase "should be" is important here, as Bayes' theorem is a condensation of the rules that anyone should apply when updating beliefs, provided that they are acting according to reasonable requirements of rationality and consistency.[1][4] The probability of the hypothesis once the outcome of the experiment is known is called the posterior probability.
The posterior probability is proportional to the likelihood of the observed data, multiplied by the prior probability, and is given by Bayes' theorem. Thus
$$P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}$$
where
$P(H)$ is the prior probability of H: the probability that H is correct before the data D are seen.
$P(D \mid H)$ is the conditional probability of seeing the data D given that the hypothesis H is true. This conditional probability is called the likelihood.
$P(D)$ is the marginal probability of D.
$P(H \mid D)$ is the posterior probability: the probability that the hypothesis is true, given the data and the previous state of belief about the hypothesis.
The quantity $P(D)$ is the prior probability of witnessing the data D under all possible hypotheses, and it depends on the prior probabilities given to each of these other possible hypotheses. Given any exhaustive set of mutually exclusive hypotheses $H_i$,
$$P(D) = \sum_i P(D, H_i) = \sum_i P(D \mid H_i)\, P(H_i).$$
Here i can be considered to index alternative cases, of which exactly one is actually valid, and $H_i$ is the hypothesis that case i is valid. Then $P(D, H_i) = P(D \mid H_i)\, P(H_i)$ is the probability that both case i is valid and that the data from the experiment turn out to be what was observed. Since the set of alternative cases is assumed to be mutually exclusive and exhaustive, the above formula is a case of the law of total probability. In many cases, $P(D)$, which is a normalizing constant, need not be evaluated. As a result, Bayes' formula is often simplified to:
$$P(H \mid D) \propto P(D \mid H)\, P(H)$$
where $\propto$ denotes proportionality.
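As an illustration, the calculation described above (multiply each prior by its likelihood, then divide by the marginal probability of the data) can be sketched in a few lines of Python. The function name `posterior` and the two-urn numbers are hypothetical choices made for this example, and the hypotheses are assumed to be mutually exclusive and exhaustive.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return P(H_i | D) for each hypothesis H_i.

    priors[i] is P(H_i) and likelihoods[i] is P(D | H_i).
    The hypotheses are assumed mutually exclusive and exhaustive.
    """
    # Joint probabilities P(D, H_i) = P(D | H_i) * P(H_i)
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    # Marginal probability P(D), by the law of total probability
    evidence = sum(joint)
    if evidence == 0:
        raise ValueError("the data have zero probability under every hypothesis")
    return [j / evidence for j in joint]


# Hypothetical example: a ball is drawn from one of two equally likely urns.
# Urn 1 yields a red ball with probability 3/4, urn 2 with probability 1/4.
priors = [Fraction(1, 2), Fraction(1, 2)]
likelihoods = [Fraction(3, 4), Fraction(1, 4)]  # P(red | urn)
post = posterior(priors, likelihoods)
print(post)  # [Fraction(3, 4), Fraction(1, 4)]
```

Exact fractions are used only to keep the arithmetic transparent; floating-point numbers would work the same way. Dropping the division by `evidence` gives the unnormalized form, in which the posterior is known only up to proportionality.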
Bayesian methodology
In general, Bayesian methods are characterized by the following concepts and procedures:
- The use of hierarchical models and the marginalization over the values of nuisance parameters. In most cases, the computation is intractable, but good approximations can be obtained using Markov chain Monte Carlo methods.
- The sequential use of Bayes' formula: when more data become available after calculating a posterior distribution, the posterior becomes the next prior.
- In frequentist statistics, a hypothesis is a proposition (which must be either true or false), so that the (frequentist) probability of a frequentist hypothesis is either one or zero. In Bayesian statistics, a probability can be assigned to a hypothesis.
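The sequential use of Bayes' formula listed above can be illustrated with a small sketch. It assumes a hypothetical coin whose probability of heads is one of three candidate values, and flips that are independent given that value; the names `update`, `biases` and `flips` are invented for the example. Under those assumptions, updating one observation at a time gives the same posterior as a single update on all the data.

```python
from fractions import Fraction


def update(prior, likelihood):
    """One application of Bayes' formula over a finite set of hypotheses."""
    joint = [lik * pri for lik, pri in zip(likelihood, prior)]
    total = sum(joint)  # marginal probability of the observation
    return [j / total for j in joint]


# Hypothetical setup: three candidate values for the probability of heads.
biases = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
prior = [Fraction(1, 3)] * 3
flips = "HHTH"  # illustrative data, not taken from any experiment

# Sequential use: the posterior after each flip becomes the next prior.
belief = prior
for flip in flips:
    likelihood = [b if flip == "H" else 1 - b for b in biases]
    belief = update(belief, likelihood)

# Batch use: a single update with the likelihood of the whole sequence.
heads = flips.count("H")
tails = len(flips) - heads
batch = update(prior, [b**heads * (1 - b) ** tails for b in biases])

print(belief == batch)  # True
```

The agreement between the two results depends on the independence assumption: the likelihood of the whole sequence must factor into the per-flip likelihoods.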
Objective and subjective Bayesian probabilities
Broadly speaking, there are two views on Bayesian probability that interpret the 'state of knowledge' concept in different ways. For objectivists, the rules of Bayesian statistics can be justified by requirements of rationality and consistency.[1][4] Such requirements are also important for subjectivists, for whom the state of knowledge corresponds to a 'personal belief' (rather than the objective state of knowledge in the world).[5] For subjectivists, however, rationality and consistency constrain the probabilities a subject may have, but allow for substantial variation within those constraints. The objective and subjective variants of Bayesian probability differ mainly in their interpretation and construction of the prior probability.
History
The term Bayesian refers to Thomas Bayes (1702–1761), who proved a special case of what is now called Bayes' theorem. However, it was Pierre-Simon Laplace (1749–1827) who introduced a general version of the theorem and used it to approach problems in celestial mechanics, medical statistics, reliability, and jurisprudence.[7] Early Bayesian inference, which used uniform priors following Laplace's principle of insufficient reason, was called "inverse probability" (because it infers backwards from observations to parameters, or from effects to causes[8]). After the 1920s, "inverse probability" was largely supplanted by a collection of methods that came to be called frequentist statistics.[8]
In the 20th century, the ideas of Laplace were further developed in two different directions, giving rise to objective and subjective currents in Bayesian practice. In the objectivist stream, the statistical analysis depends only on the model assumed and the data analysed.[9] No subjective decisions need to be involved. In contrast, "subjectivist" statisticians deny the possibility of fully objective analysis for the general case.
In the 1980s, there was a dramatic growth in research and applications of Bayesian methods, mostly attributed to the discovery of Markov chain Monte Carlo methods, which removed many of the computational problems, and to an increasing interest in nonstandard, complex applications.[10] Despite the growth of Bayesian research, most undergraduate teaching is still based on frequentist statistics.[11] Nonetheless, Bayesian methods are widely accepted and used, for example in the field of machine learning.[6]
Justification of Bayesian probabilities
The use of Bayesian probabilities as the basis of Bayesian inference has been supported by several arguments, such as the Cox axioms, the Dutch book argument, arguments based on decision theory and de Finetti's theorem.
Axiomatic approach
Richard T. Cox showed[4] that Bayesian updating follows from several axioms, including two functional equations and the controversial hypothesis that probability should be treated as a continuous function. Here "continuity" is equivalent to countable additivity, as proved in measure-theoretic probability books. The countable additivity requirement is rejected by Bruno de Finetti, for example, on the grounds that it is non-falsifiable.
Dutch book approach
The Dutch book argument was proposed by de Finetti, and is based on betting. A Dutch book is made when a clever gambler places a set of bets that guarantees a profit, whatever the outcome of the bets. If a bookmaker follows the rules of the Bayesian calculus in the construction of his odds, a Dutch book cannot be made.
However, Ian Hacking noted that traditional Dutch book arguments did not specify Bayesian updating: they left open the possibility that non-Bayesian updating rules could avoid Dutch books. For example, Hacking writes[12] "And neither the Dutch book argument, nor any other in the personalist arsenal of proofs of the probability axioms, entails the dynamic assumption. Not one entails Bayesianism. So the personalist requires the dynamic assumption in order to be Bayesian. It is true that in consistency a personalist could abandon the Bayesian model of learning from experience. Salt could lose its savour."
In fact, there are non-Bayesian updating rules that also avoid Dutch books (as discussed in the literature on "probability kinematics" following the publication of Richard C. Jeffrey's rule). The additional hypotheses sufficient to (uniquely) specify Bayesian updating are substantial, complicated, and unsatisfactory.[13]
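The static part of the Dutch book argument, which concerns the consistency of odds at a single moment rather than updating, can be illustrated with a deliberately small sketch. It considers only two complementary bets, and it assumes that the bookmaker will both buy and sell tickets at the posted prices; the function `sure_profit` and the prices are hypothetical. It is not a proof of the general result.

```python
from fractions import Fraction


def sure_profit(price_a, price_not_a):
    """Gambler's guaranteed profit per pair of tickets, or 0 if there is none.

    price_a and price_not_a are the bookmaker's prices for tickets paying
    1 unit if A occurs and if A does not occur, respectively.
    Assumption: the bookmaker both buys and sells at these prices.
    """
    total = price_a + price_not_a
    # Exactly one of the two tickets pays 1, so the pair is worth exactly 1.
    # If total < 1 the gambler buys both; if total > 1 the gambler sells both.
    return abs(1 - total)


incoherent = sure_profit(Fraction(2, 5), Fraction(1, 2))  # prices sum to 9/10
coherent = sure_profit(Fraction(2, 5), Fraction(3, 5))    # prices sum to 1
print(incoherent, coherent)  # 1/10 0
```

When the two prices sum to one, as the probability calculus requires, neither buying nor selling the pair yields a guaranteed gain. Any other total lets the gambler lock in the difference whichever way the event turns out.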
Decision theory approach
A decision-theoretic justification of the use of Bayesian inference (and hence of Bayesian probabilities) was given by Abraham Wald,[citation needed] who proved that every Bayesian procedure is admissible.[citation needed] Conversely, every admissible statistical procedure is either a Bayesian procedure or a limit of Bayesian procedures.[14]
Personal probabilities and objective methods for constructing priors
Following the work on expected utility theory of Ramsey and von Neumann, decision-theorists have accounted for rational behavior using a probability distribution for the agent. Johann Pfanzagl completed The Theory of Games and Economic Behavior by providing an axiomatization of subjective probability and utility, a task left uncompleted by von Neumann and Oskar Morgenstern: their original theory supposed that all the agents had the same probability distribution, as a convenience.[15] Pfanzagl's axiomatization was endorsed by Oskar Morgenstern: "Von Neumann and I have anticipated" the question whether probabilities "might, perhaps more typically, be subjective and have stated specifically that in the latter case axioms could be found from which could derive the desired numerical utility together with a number for the probabilities (cf. p. 19 of The Theory of Games and Economic Behavior). We did not carry this out; it was demonstrated by Pfanzagl ... with all the necessary rigor".[16]
Ramsey and Savage noted that the individual agent's probability distribution could be objectively studied in experiments. The role of judgment and disagreement in science has been recognized since Aristotle and even more clearly with Francis Bacon. The objectivity of science lies not in the psychology of individual scientists, but in the process of science and especially in statistical methods, as noted by C. S. Peirce.[citation needed] Recall that the objective methods for falsifying propositions about personal probabilities have been used for a half century, as noted previously. Procedures for testing hypotheses about probabilities (using finite samples) are due to Ramsey (1931) and de Finetti (1931, 1937, 1964, 1970). Both Bruno de Finetti and Frank P. Ramsey acknowledge[citation needed] their debts to pragmatic philosophy, particularly (for Ramsey) to Charles S. Peirce.
The "Ramsey test" for evaluating probability distributions is implementable in theory, and has kept experimental psychologists occupied for a half century.[17] This work demonstrates that Bayesian-probability propositions can be falsified, and so meet an empirical criterion of Charles S. Peirce, whose work inspired Ramsey. (This falsifiability-criterion was popularized by Karl Popper.[citation needed])
Modern work on the experimental evaluation of personal probabilities uses the randomization, blinding, and Boolean-decision procedures of the Peirce–Jastrow experiment.[18] Since individuals act according to different probability judgements, these agents' probabilities are "personal" (but amenable to objective study).
Personal probabilities are problematic for science and for some applications where decision-makers lack the knowledge or time to specify an informed probability-distribution (on which they are prepared to act). To meet the needs of science and of human limitations, Bayesian statisticians have developed "objective" methods for specifying prior probabilities.
Indeed, some Bayesians have argued the prior state of knowledge defines the (unique) prior probability-distribution for "regular" statistical problems; cf. well-posed problems. Finding the right method for constructing such "objective" priors (for appropriate classes of regular problems) has been the quest of statistical theorists from Laplace to John Maynard Keynes, Harold Jeffreys, and Edwin Thompson Jaynes: These theorists and their successors have suggested several methods for constructing "objective" priors:
Each of these methods contributes useful priors for "regular" one-parameter problems, and each prior can handle some challenging statistical models (with "irregularity" or several parameters). Each of these methods has been useful in Bayesian practice. Indeed, methods for constructing "objective" (alternatively, "default" or "ignorance") priors have been developed by avowed subjective (or "personal") Bayesians like James Berger (Duke University) and José-Miguel Bernardo (Universitat de València), simply because[citation needed] such priors are needed for Bayesian practice, particularly in science. Each of these methods gives implausible priors for some problems, and so the quest for "the universal method for constructing priors" continues to attract statistical theorists.[citation needed]
Thus, the Bayesian statistician needs either to use informed priors (using relevant expertise or previous data) or to choose among the competing methods for constructing "objective" priors.
See also
- Bayesian brain – the application of Bayesian theory to the functioning of the brain
- Bayesian experimental design
- Bayesian inference – Statistical inference and methods using Bayesian probability
- Bayesian Kepler periodogram – a method used to find new exoplanets
- Bayesian network – Bayesian reasoning for multiple variables in the presence of conditional independencies
- Bertrand's paradox – a paradox in classical probability, solved by Bayesian methods
- Expected utility
- De Finetti's game – a procedure for evaluating someone's subjective probability
- Empirical Bayes method
- Fiducial inference – Fisher's attempt to produce probability-distributions on the parameter space without using a prior.
- Frequency probability – the main alternative to the Bayesian view
- Inference
- Likelihood function
- Maximum entropy thermodynamics – a Bayesian view of thermodynamics due to Edwin T. Jaynes
- Predictive inference
- Probability interpretations
- Uncertainty
- Dempster–Shafer theory – Generalization of Bayesian theory
Footnotes
- E. T. Jaynes. Probability Theory: The Logic of Science. Cambridge University Press, 2003. ISBN 0-521-59271-2
- Stephen M. Stigler (1986) The History of Statistics. Harvard University Press. p. 131.
- Stephen M. Stigler (1986) The History of Statistics. Harvard University Press. pp. 97–98, 131.
- Richard T. Cox, Algebra of Probable Inference, The Johns Hopkins University Press, 2001
- de Finetti, B. (1974) Theory of Probability (2 vols.), J. Wiley & Sons, Inc., New York
- Bishop, C. M., Pattern Recognition and Machine Learning. Springer, 2007
- Stephen M. Stigler (1986) The History of Statistics. Harvard University Press. Chapter 3.
- Stephen E. Fienberg (2006) "When did Bayesian Inference become 'Bayesian'?" Bayesian Analysis, 1 (1), 1–40. See page 5.
- J. M. Bernardo (2005), "Reference analysis", Handbook of Statistics, 25, 17–90
- Wolpert, R. L. (2004) "A conversation with James O. Berger", Statistical Science, 19, 205–218
- José M. Bernardo (2006) A Bayesian mathematical statistics primer. ICOTS-7
- Hacking (1967, Section 3, page 316), Hacking (1988, page 124)
- van Fraassen, B. (1989) Laws and Symmetry, Oxford University Press. ISBN 0198248601
- Bickel & Doksum (2001, page 32)
- Pfanzagl (1967, 1968)
- Morgenstern (1976, page 65)
- Davidson et al. (1957)
- Peirce & Jastrow (1885)
References
- Bickel, Peter J. and Doksum, Kjell A. (2001). Mathematical Statistics: Basic and Selected Topics, Volume 1 (Second (updated printing 2007) ed.). Pearson Prentice–Hall.
- Box, G.E.P. and Tiao, G.C. (1973) Bayesian Inference in Statistical Analysis, Wiley, ISBN 0-471-57428-7
- Donald Davidson, Patrick Suppes and Sidney Siegel (1957). Decision-Making: An Experimental Approach. Stanford University Press.
- de Finetti, Bruno. "Probabilism: A Critical Essay on the Theory of Probability and on the Value of Science," (translation of 1931 article) in Erkenntnis, volume 31, September 1989.
- de Finetti, Bruno (1937) “La Prévision: ses lois logiques, ses sources subjectives,” Annales de l'Institut Henri Poincaré,
- de Finetti, Bruno. "Foresight: its Logical Laws, Its Subjective Sources," (translation of the 1937 article in French) in H. E. Kyburg and H. E. Smokler (eds), Studies in Subjective Probability, New York: Wiley, 1964.
- de Finetti, Bruno . Theory of Probability, (translation by AFM Smith of 1970 book) 2 volumes, New York: Wiley, 1974–5.
- DeGroot, Morris (2004) Optimal Statistical Decisions. Wiley Classics Library. (Originally published 1970.) ISBN 0-471-68029-X.
- Edwards, Ward (1968). "Conservatism in Human Information Processing". in Kleinmuntz, B. Formal Representation of Human Judgment. Wiley.
- Edwards, Ward (1982). "Conservatism in Human Information Processing (excerpted)". in Daniel Kahneman, Paul Slovic and Amos Tversky. Judgment under uncertainty: Heuristics and biases. Cambridge University Press.
- Edwards, Ward (October 2008). Jie W. Weiss and David J. Weiss. ed. A Science of Decision Making:The Legacy of Ward Edwards. Oxford University Press. pp. 536. ISBN 9780195322989.
- Hald, Anders (1998). A History of Mathematical Statistics from 1750 to 1930. New York: Wiley. ISBN 0471179124.
- Peirce, C.S. and Jastrow, J. (1885). "On Small Differences in Sensation". Memoirs of the National Academy of Sciences 3: pp. 73–83.
- Ramsey, Frank Plumpton (1931) “Truth and Probability” (PDF), Chapter VII in The Foundations of Mathematics and other Logical Essays, Reprinted 2001, Routledge. ISBN 0415225469,
- Pfanzagl, J (1967). "Subjective Probability Derived from the Morgenstern-von Neumann Utility Theory". in Martin Shubik. Essays in Mathematical Economics In Honor of Oskar Morgenstern. Princeton University Press. pp. 237–251.
- Pfanzagl, J. in cooperation with V. Baumann and H. Huber (1968). "Events, Utility and Subjective Probability". Theory of Measurement. Wiley. pp. 195–220.
- Morgenstern, Oskar (1976). "Some Reflections on Utility". in Andrew Schotter. Selected Economic Writings of Oskar Morgenstern. New York University Press. pp. 65–70. ISBN 9780814777718.
- Gelman, Andrew, Carlin, John B., Stern, Hal S. and Rubin, Donald B. (2003). Bayesian Data Analysis, Second Edition. Boca Raton, FL: Chapman and Hall/CRC. ISBN 1-584-88388-X.
- Carlin, Bradley P. and Louis, Thomas A. (2008). Bayesian Methods for Data Analysis, Third Edition. Boca Raton, FL: Chapman and Hall/CRC. ISBN 1-584-88697-8.
- Berger, James O (1985). Statistical Decision Theory and Bayesian Analysis. Springer Series in Statistics (Second ed.). Springer-Verlag. ISBN 0-387-96098-8.
- Bernardo, José M. and Smith, Adrian F. M. (1994). Bayesian Theory. Wiley. ISBN 047149464x.
- Robert, Christian P (1996). The Bayesian Choice – A Decision-Theoretic Motivation. Springer. ISBN 3540942963.
- Jaynes E.T. (2003) Probability Theory: The Logic of Science, CUP. ISBN 9780521592710 (Link to Fragmentary Edition of March 1996).
- Howson, C. and Urbach, P. (2005). Scientific Reasoning: the Bayesian Approach (3rd ed.). Open Court Publishing Company. ISBN 978-0812695786.
- Hacking, Ian (December 1967). "Slightly More Realistic Personal Probability". Philosophy of Science 34 (4): pp. 311–325. JSTOR 186120. http://www.jstor.org/stable/186120.
- Hacking, I (1988) "Slightly More Realistic Personal Probability". 1967 article partly reprinted in: Gärdenfors, Peter and Sahlin, Nils-Eric. (1988) Decision, Probability, and Utility: Selected Readings. Cambridge University Press. ISBN 0521336589
- Hajek, A. and Hartmann, S. (2010): "Bayesian Epistemology", in: Dancy, J., Sosa, E., Steup, M. (Eds.) (2001) A Companion to Epistemology, Wiley. ISBN 1405139005 Preprint
- Hartmann, S. and Sprenger, J. (2011) "Bayesian Epistemology", in: Bernecker, S. and Pritchard, D. (Eds.) (2011) Routledge Companion to Epistemology. Routledge. ISBN 10415962196 (Preprint)
- Phillips, L.D.; Edwards, W. (October 2008). "Chapter 6: Conservatism in a simple probability inference task (Journal of Experimental Psychology (1966) 72: 346-354)". in Jie W. Weiss and David J. Weiss. A Science of Decision Making:The Legacy of Ward Edwards. Oxford University Press. pp. 536. ISBN 9780195322989.
- Stigler, Stephen M. (1990). The History of Statistics: The Measurement of Uncertainty before 1900. Belknap Press/Harvard University Press. ISBN 0-674-40341-X.
- Stigler, Stephen M. (1999) Statistics on the Table: The History of Statistical Concepts and Methods. Harvard University Press. ISBN 0-674-83601-4
External links
- On-line textbook: Information Theory, Inference, and Learning Algorithms, by David MacKay, has many chapters on Bayesian methods, including introductory examples; arguments in favour of Bayesian methods (in the style of Edwin Jaynes); state-of-the-art Monte Carlo methods, message-passing methods, and variational methods; and examples illustrating the intimate connections between Bayesian inference and data compression.
- An Intuitive Explanation of Bayesian Reasoning A very gentle introduction by Eliezer Yudkowsky
- An on-line introductory tutorial to Bayesian probability from Queen Mary University of London
- Bretthorst, G. Larry, 1988, Bayesian Spectrum Analysis and Parameter Estimation in Lecture Notes in Statistics, 48, Springer-Verlag, New York, New York;
- James Franklin The Science of Conjecture: Evidence and Probability Before Pascal, history from a Bayesian point of view.


