Bayes' rule

From Wikipedia, the free encyclopedia
Jump to: navigation, search

In probability theory and applications, Bayes' rule relates the odds of event A_1 to event A_2, before (prior to) and after (posterior to) conditioning on another event B. The odds on A_1 to event A_2 is simply the ratio of the probabilities of the two events. The prior odds is the ratio of the unconditional or prior probabilities, the posterior odds is the ratio of conditional or posterior probabilities given the event B. The relationship is expressed in terms of the likelihood ratio or Bayes factor, \Lambda. By definition, this is the ratio of the conditional probabilities of the event B given that A_1 is the case or that A_2 is the case, respectively. The rule simply states: posterior odds equals prior odds times Bayes factor (Gelman et al., 2005, Chapter 1).

When arbitrarily many events A are of interest, not just two, the rule can be rephrased as posterior is proportional to prior times likelihood, P(A|B)\propto P(A) P(B|A) where the proportionality symbol means that the left hand side is proportional to (i.e., equals a constant times) the right hand side as A varies, for fixed or given B (Lee, 2012; Bertsch McGrayne, 2012). In this form it goes back to Laplace (1774) and to Cournot (1843); see Fienberg (2005).

Bayes' rule is an equivalent way to formulate Bayes' theorem. If we know the odds for and against A we also know the probabilities of A. It may be preferred to Bayes' theorem in practice for a number of reasons.

Bayes' rule is widely used in statistics, science and engineering, for instance in model selection, probabilistic expert systems based on Bayes networks, statistical proof in legal proceedings, email spam filters, and so on (Rosenthal, 2005; Bertsch McGrayne, 2012). As an elementary fact from the calculus of probability, Bayes' rule tells us how unconditional and conditional probabilities are related whether we work with a frequentist interpretation of probability or a Bayesian interpretation of probability. Under the Bayesian interpretation it is frequently applied in the situation where A_1 and A_2 are competing hypotheses, and B is some observed evidence. The rule shows how one's judgement on whether A_1 or A_2 is true should be updated on observing the evidence B (Gelman et al., 2003).

The rule[edit]

Single event[edit]

Given events A_1, A_2 and B, Bayes' rule states that the conditional odds of A_1:A_2 given B are equal to the marginal odds of A_1:A_2 multiplied by the Bayes factor or likelihood ratio \Lambda:

O(A_1:A_2|B) = \Lambda(A_1:A_2|B) \cdot O(A_1:A_2) ,

where

\Lambda(A_1:A_2|B) = \frac{P(B|A_1)}{P(B|A_2)}.

Here, the odds and conditional odds, also known as prior odds and posterior odds, are defined by

O(A_1:A_2) = \frac{P(A_1)}{P(A_2)},
O(A_1:A_2|B) = \frac{P(A_1|B)}{P(A_2|B)}.

In the special case that A_1 = A and A_2 = \neg A, one writes O(A)=O(A:\neg A), and uses a similar abbreviation for the Bayes factor and for the conditional odds. The odds on A is by definition the odds for and against A. Bayes' rule can then be written in the abbreviated form

O(A|B) = O(A)  \cdot \Lambda(A|B) ,

or in words: the posterior odds on A equals the prior odds on A times the likelihood ratio for A given information B. In short, posterior odds equals prior odds times likelihood ratio.

The rule is frequently applied when A_1 = A and A_2 = \neg A are two competing hypotheses concerning the cause of some event B. The prior odds on A, in other words, the odds between  A and  \neg A, expresses our initial beliefs concerning whether or not  A is true. The event B represents some evidence, information, data, or observations. The likelihood ratio is the ratio of the chances of observing B under the two hypotheses A and \neg A. The rule tells us how our prior beliefs concerning whether or not  A is true needs to be updated on receiving the information B.

Many events[edit]

If we think of A as arbitrary and B as fixed then we can rewrite Bayes' theorem P(A|B)=P(A)P(B|A)/P(B) in the form P(A|B) \propto P(A)P(B|A) where the proportionality symbol means that, as A varies but keeping B fixed, the left hand side is equal to a constant times the right hand side.

In words posterior is proportional to prior times likelihood. This version of Bayes' theorem was first called "Bayes' rule" by Cournot (1843). Cournot popularized the earlier work of Laplace (1774) who had independently discovered Bayes' rule. The work of Bayes was published posthumously (1763) but remained more or less unknown till Cournot drew attention to it; see Fienberg (2006).

Bayes' rule may be preferred to the usual statement of Bayes' theorem for a number of reasons. One is that it is intuitively simpler to understand. Another reason is that normalizing probabilities is sometimes unnecessary: one sometimes only needs to know ratios of probabilities. Finally, doing the normalization is often easier to do after simplifying the product of prior and likelihood by deleting any factors which do not depend on A, so we do not need to actually compute the denominator P(B) in the usual statement of Bayes' theorem P(A|B)=P(A)P(B|A)/P(B).

In Bayesian statistics, Bayes' rule is often applied with a so-called improper prior, for instance, a uniform probability distribution over all real numbers. In that case, the prior distribution does not exist as a probability measure within conventional probability theory, and Bayes' theorem itself is not available.

Series of events[edit]

Bayes' rule may be applied a number of times. Each time we observe a new event, we update the odds between the events of interest, say A_1 and A_2 by taking account of the new information. For two events (information, evidence) B and C,

 O(A_1:A_2|B \cap C) = \Lambda(A_1:A_2|B \cap C) \cdot \Lambda(A_1:A_2|B) \cdot O(A_1:A_2) ,

where

\Lambda(A_1:A_2|B) = \frac{P(B|A_1)}{P(B|A_2)} ,
\Lambda(A_1:A_2|B \cap C) = \frac{P(C|A_1 \cap B)}{P(C|A_2 \cap B)} .

In the special case of two complementary events A and \neg A, the equivalent notation is

 O(A|B,C) = \Lambda(A|B \cap C) \cdot \Lambda(B|A) \cdot O(A).

Derivation[edit]

Consider two instances of Bayes' theorem:

P(A_1|B) = \frac{1}{P(B)} \cdot P(B|A_1) \cdot P(A_1),
P(A_2|B) = \frac{1}{P(B)} \cdot P(B|A_2) \cdot P(A_2).

Combining these gives

\frac{P(A_1|B)}{P(A_2|B)} = \frac{P(B|A_1)}{P(B|A_2)} \cdot \frac{P(A_1)}{P(A_2)}.

Now defining

O(A_1:A_2|B)  \triangleq \frac{P(A_1|B)}{P(A_2|B)}
O(A_1:A_2) \triangleq \frac{P(A_1)}{P(A_2)}
\Lambda(A_1:A_2|B) \triangleq  \frac{P(B|A_1)}{P(B|A_2)},

this implies

O(A_1:A_2|B) = \Lambda(A_1:A_2|B) \cdot O(A_1:A_2).

A similar derivation applies for conditioning on multiple events, using the appropriate extension of Bayes' theorem

Examples[edit]

Frequentist example[edit]

Consider the drug testing example in the article on Bayes' theorem.

The same results may be obtained using Bayes' rule. The prior odds on an individual being a drug-user are 199 to 1 against, as \textstyle 0.5%=\frac{1}{200} and \textstyle 99.5%=\frac{199}{200}. The Bayes factor when an individual tests positive is \textstyle \frac{0.99}{0.01} = 99:1 in favour of being a drug-user: this is the ratio of the probability of a drug-user testing positive, to the probability of a non-drug user testing positive. The posterior odds on being a drug user are therefore \textstyle 1 \times 99 : 199 \times 1 = 99:199, which is very close to \textstyle 100:200 = 1:2. In round numbers, only one in three of those testing positive are actually drug-users.

Model selection[edit]

External links[edit]

  • Sharon Bertsch McGrayne (2012), "The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy", Yale University Press.
  • Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin (2003), "Bayesian Data Analysis", Second Edition, CRC Press.
  • Stephen E. Fienberg (2006), "When did Bayesian inference become "Bayesian"?"", Bayesian analysis vol. 1, nr. 1, pp. 1-40.
  • Peter M. Lee (2012), "Bayesian Statistics: An Introduction", Wiley.
  • The on-line textbook: Information Theory, Inference, and Learning Algorithms, by David J.C. MacKay, discusses Bayesian model comparison in Chapters 3 and 28.
  • Rosenthal, Jeffrey S. (2005): Struck by Lightning: the Curious World of Probabilities. Harper Collings 2005, ISBN 978-0-00-200791-7.
  • Stone, JV (2013). Chapter 1 of book "Bayes’ Rule: A Tutorial Introduction", University of Sheffield, Psychology.
  • Pierre Bessière, Emmanuel Mazer, Juan-Manuel Ahuactzin and Kamel Mekhnacha (2013), "Bayesian Programming", CRC Press