Conditional probability
In probability theory, the "conditional probability of A given B" is the probability of A if B is known to occur. It is commonly notated P(A | B), and sometimes $P_B(A)$. (The vertical line should not be mistaken for logical OR.) P(A | B) can be visualised as the probability of event A when the sample space is restricted to event B. Mathematically, it is defined for $P(B) > 0$ as

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
Formally, P(A | B) is defined as the probability of A according to a new probability function on the sample space, such that outcomes not in B have probability 0 and that it is consistent with all original probability measures. The above definition follows (see Formal derivation).[1]
Definition
Conditioning on an event
Given two events A and B in the same probability space with P(B) > 0, the conditional probability of A given B is defined as the quotient of the unconditional joint probability of A and B, and the unconditional probability of B:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
The above definition is how conditional probabilities are introduced by Kolmogorov. However, other authors such as De Finetti prefer to introduce conditional probability as an axiom of probability. Although mathematically equivalent, this may be preferred philosophically; under major probability interpretations such as the subjective theory, conditional probability is considered a primitive entity. Further, this "multiplication axiom" introduces a symmetry with the summation axiom[2]:
Multiplication axiom: $P(A \cap B) = P(A \mid B)\,P(B)$
Summation axiom (A and B mutually exclusive): $P(A \cup B) = P(A) + P(B)$
Definition with σ-algebra
If P(B) = 0, then the simple definition of P(A | B) is undefined. However, it is possible to define a conditional probability with respect to a σ-algebra of such events (such as those arising from a continuous random variable).
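For example, if X and Y are non-degenerate and jointly continuous random variables with density $f_{X,Y}(x, y)$ then, if B has positive measure,

$$P(X \in A \mid Y \in B) = \frac{\int_{y \in B} \int_{x \in A} f_{X,Y}(x, y)\,dx\,dy}{\int_{y \in B} \int_{x \in \Omega} f_{X,Y}(x, y)\,dx\,dy}.$$

The case where B has zero measure can only be dealt with directly in the case that $B = \{y_0\}$, representing a single point, in which case

$$P(X \in A \mid Y = y_0) = \frac{\int_{x \in A} f_{X,Y}(x, y_0)\,dx}{\int_{x \in \Omega} f_{X,Y}(x, y_0)\,dx}.$$

If A has measure zero then the conditional probability is zero. An indication of why the more general case of zero measure cannot be dealt with in a similar way can be seen by noting that the limit, as all $\delta y_i$ approach zero, of

$$P(X \in A \mid Y \in \cup_i [y_i, y_i + \delta y_i]) \approxeq \frac{\sum_{i} \int_{x \in A} f_{X,Y}(x, y_i)\,dx\,\delta y_i}{\sum_{i} \int_{x \in \Omega} f_{X,Y}(x, y_i)\,dx\,\delta y_i},$$

depends on their relationship as they approach zero. See conditional expectation for more information.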
Conditioning on a random variable
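Conditioning on an event may be generalized to conditioning on a random variable. Let X be a random variable taking some value from $\{x_n\}$. Let A be an event. The conditional probability of A given X is defined as the random variable, written P(A | X), that takes the value

$$P(A \mid X = x_n)$$

whenever $X = x_n$. More formally:

$$P(A \mid X)(\omega) = P(A \mid X = X(\omega)).$$

The conditional probability P(A | X) is a function of X, i.e. if the function g is defined as

$$g(x) = P(A \mid X = x),$$

then

$$P(A \mid X) = g \circ X.$$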
Note that P(A | X) and X are now both random variables. From the law of total probability, the expected value of P(A | X) is equal to the unconditional probability of A.
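As a rough illustration of the last statement, the following Python sketch computes g(x) = P(A | X = x) on a two-dice sample space and checks that the expected value of g(X) equals P(A). The choice of X (the value on die 1) and of A (the two dice sum to at most 5) is an assumption made here for illustration only; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (die 1, die 2), all equally likely.
outcomes = list(product(range(1, 7), repeat=2))
p_outcome = Fraction(1, len(outcomes))


def in_A(outcome):
    """Illustrative event A: the two dice sum to at most 5."""
    return sum(outcome) <= 5


def X(outcome):
    """Illustrative random variable X: the value shown on die 1."""
    return outcome[0]


def g(x):
    """g(x) = P(A | X = x), found by restricting the sample space to {X = x}."""
    restricted = [w for w in outcomes if X(w) == x]
    return Fraction(sum(1 for w in restricted if in_A(w)), len(restricted))


# P(A | X) is the random variable g(X); its expectation averages g(X) over outcomes.
expected_conditional = sum(g(X(w)) * p_outcome for w in outcomes)
unconditional = sum(p_outcome for w in outcomes if in_A(w))

print({x: g(x) for x in range(1, 7)})
print(expected_conditional, unconditional)
```

Exact fractions are used so that the two quantities can be compared for equality rather than approximately.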
Example
Consider the rolling of two fair six-sided dice.
- Let A be the value rolled on die 1
- Let B be the value rolled on die 2
- Let $A_n$ be the event that A = n
- Let $\Sigma_m$ be the event that $A + B \le m$

Suppose we roll A and B. What is the probability that A = 2? Table 1 shows the sample space, with each cell giving the sum A + B. A = 2 in 6 of the 36 outcomes, so $P(A_2) = 6/36 = 1/6$.
| + | B=1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| A=1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| 6 | 7 | 8 | 9 | 10 | 11 | 12 |
Suppose however that somebody else rolls the dice in secret, revealing only that $A + B \le 5$. Table 2 shows that $A + B \le 5$ for 10 outcomes. A = 2 in 3 of these. The probability that A = 2 given that $A + B \le 5$ is therefore $3/10 = 0.3$. This is a conditional probability, because it has a condition that limits the sample space. In more compact notation, $P(A_2 \mid \Sigma_5) = 0.3$.
| + | B=1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| A=1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| 6 | 7 | 8 | 9 | 10 | 11 | 12 |
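The counting in this example can be checked by brute-force enumeration. The sketch below assumes that the revealed condition is A + B ≤ 5, which is the reading consistent with the 10 outcomes and 3 favourable cases stated in the example; the helper `prob` is a hypothetical name introduced here.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (A, B) pairs for two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event, given=lambda w: True):
    """P(event | given), obtained by counting within the restricted sample space."""
    restricted = [w for w in outcomes if given(w)]
    if not restricted:
        raise ValueError("conditioning event has probability zero")
    return Fraction(sum(1 for w in restricted if event(w)), len(restricted))


def a_is_2(w):
    return w[0] == 2


def sum_at_most_5(w):
    return w[0] + w[1] <= 5


p_a2 = prob(a_is_2)                               # unconditional
p_a2_given_sum = prob(a_is_2, given=sum_at_most_5)  # conditional

print(p_a2, p_a2_given_sum)  # 1/6 3/10
```

Restricting the list of outcomes before counting mirrors the description of conditioning as limiting the sample space.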
Statistical independence
If two events A and B are statistically independent, the occurrence of A does not affect the probability of B, and vice versa. That is,

$$P(A \mid B) = P(A)$$

and

$$P(B \mid A) = P(B).$$

Using the definition of conditional probability, it follows from either formula that

$$P(A \cap B) = P(A)\,P(B).$$

This is the definition of statistical independence. This form is the preferred definition, as it is symmetrical in A and B, and no values are undefined if P(A) or P(B) is 0.
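A minimal Python sketch of the product form of the independence test, again on two fair dice. The specific events are illustrative assumptions, not taken from the text. Because the test only multiplies probabilities, it still gives an answer when one event has probability 0, where a conditional probability would be undefined.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # (die 1, die 2)


def prob(event):
    """Probability of an event by counting equally likely outcomes."""
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))


def independent(e1, e2):
    """Product test: P(e1 and e2) == P(e1) * P(e2)."""
    joint = prob(lambda w: e1(w) and e2(w))
    return joint == prob(e1) * prob(e2)


def die1_is_2(w):
    return w[0] == 2


def die2_is_3(w):
    return w[1] == 3


def sum_at_most_5(w):
    return w[0] + w[1] <= 5


def sum_is_1(w):
    return w[0] + w[1] == 1  # impossible event, probability 0


print(independent(die1_is_2, die2_is_3))      # True
print(independent(die1_is_2, sum_at_most_5))  # False
print(independent(die1_is_2, sum_is_1))       # True
```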
Common fallacies
- These fallacies should not be confused with Robert K. Shope's 1978 "conditional fallacy", which deals with counterfactual examples that beg the question.
Assuming conditional probability is of similar size to its inverse
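In general, it cannot be assumed that $P(A \mid B) \approx P(B \mid A)$. This can be an insidious error, even for those who are highly conversant with statistics.[3] The relationship between P(A | B) and P(B | A) is given by Bayes' theorem:

$$P(B \mid A) = P(A \mid B)\,\frac{P(B)}{P(A)}.$$

That is, $P(A \mid B) \approx P(B \mid A)$ only if $P(B)/P(A) \approx 1$, or equivalently, $P(A) \approx P(B)$.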
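To see how far apart a conditional probability and its inverse can be, the following sketch applies Bayes' theorem to made-up numbers. All three input probabilities are hypothetical values chosen for illustration (A could be read as "has a rare condition" and B as "tests positive"); none come from the text.

```python
from fractions import Fraction

# Hypothetical inputs, chosen only for illustration.
p_a = Fraction(1, 100)              # P(A)
p_b_given_a = Fraction(95, 100)     # P(B | A)
p_b_given_not_a = Fraction(5, 100)  # P(B | not A)

# Law of total probability over the partition {A, not A}.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B).
p_a_given_b = p_b_given_a * p_a / p_b

print(float(p_b_given_a), float(p_a_given_b))

# The ratio of the two conditionals equals P(B) / P(A).
print(p_b_given_a / p_a_given_b == p_b / p_a)  # True
```

With these inputs P(B | A) is 0.95 while P(A | B) is 19/118, roughly 0.16, because P(B) is about six times P(A).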
Assuming marginal and conditional probabilities are of similar size
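In general, it cannot be assumed that $P(A) \approx P(A \mid B)$. These probabilities are linked through the formula for total probability:

$$P(A) = \sum_n P(A \cap B_n) = \sum_n P(A \mid B_n)\,P(B_n),$$

where the events $B_n$ form a countable partition of the sample space.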
This fallacy may arise through selection bias.[4] For example, in the context of a medical claim, let $S_C$ be the event that a sequela S occurs as a consequence of circumstance C. Let H be the event that an individual seeks medical help. Suppose that in most cases C does not cause S, so $P(S_C)$ is low. Suppose also that medical attention is sought only if S has occurred. From experience of patients, a doctor may therefore erroneously conclude that $P(S_C)$ is high. The probability actually observed by the doctor is $P(S_C \mid H)$.
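The gap between the marginal and the observed conditional probability can be made concrete with a small calculation. The numbers below are hypothetical, and the model is a simplification assumed here: help-seeking depends only on whether $S_C$ occurred. The function name is also an illustrative choice.

```python
from fractions import Fraction


def p_sc_given_h(p_sc, p_h_given_sc, p_h_given_not_sc):
    """P(S_C | H) from Bayes' theorem and the law of total probability."""
    p_h = p_h_given_sc * p_sc + p_h_given_not_sc * (1 - p_sc)
    if p_h == 0:
        raise ValueError("P(H) = 0: conditional probability undefined")
    return p_h_given_sc * p_sc / p_h


p_sc = Fraction(1, 100)  # hypothetical: the sequela is rare

# Help is sought only if the sequela occurred, as in the example above.
strict = p_sc_given_h(p_sc, Fraction(9, 10), Fraction(0))

# Variation (an assumption, not in the text): a few people seek help anyway.
relaxed = p_sc_given_h(p_sc, Fraction(9, 10), Fraction(1, 1000))

print(p_sc, strict, relaxed)  # 1/100 1 100/111
```

In both cases the doctor's sample is dominated by patients with the sequela, even though only 1 in 100 of the underlying population has it.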
Formal derivation
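This section is based on the derivation given in Grinstead and Snell's Introduction to Probability.[5]
Let Ω be a sample space with elementary events {ω}. Suppose we are told the event $B \subseteq \Omega$ has occurred. A new probability distribution (denoted by the conditional notation) is to be assigned on {ω} to reflect this. For events in B, it is reasonable to assume that the relative magnitudes of the probabilities will be preserved. For some constant scale factor α, the new distribution will therefore satisfy:

1. $\omega \in B : P(\omega \mid B) = \alpha P(\omega)$
2. $\omega \notin B : P(\omega \mid B) = 0$
3. $\sum_{\omega \in \Omega} P(\omega \mid B) = 1$

Substituting 1 and 2 into 3 to select α:

$$\sum_{\omega \in \Omega} P(\omega \mid B) = \sum_{\omega \in B} \alpha P(\omega) + \sum_{\omega \notin B} 0 = \alpha \sum_{\omega \in B} P(\omega) = \alpha\,P(B) = 1 \quad\Rightarrow\quad \alpha = \frac{1}{P(B)}.$$

So the new probability distribution is

1. $\omega \in B : P(\omega \mid B) = \dfrac{P(\omega)}{P(B)}$
2. $\omega \notin B : P(\omega \mid B) = 0$

Now for a general event A,

$$P(A \mid B) = \sum_{\omega \in A \cap B} P(\omega \mid B) + \sum_{\omega \in A \cap B^c} P(\omega \mid B) = \sum_{\omega \in A \cap B} \frac{P(\omega)}{P(B)} = \frac{P(A \cap B)}{P(B)}.$$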
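The rescaling argument can be sketched for a finite sample space as follows. The loaded-die distribution is a hypothetical example, and `condition` is an illustrative helper name; the sketch assumes probabilities are given per elementary event.

```python
from fractions import Fraction


def condition(dist, B):
    """Return P(. | B): zero outside B, rescaled by alpha = 1 / P(B) inside B."""
    p_b = sum(p for w, p in dist.items() if w in B)
    if p_b == 0:
        raise ValueError("P(B) = 0: conditional distribution undefined")
    alpha = 1 / p_b
    return {w: (alpha * p if w in B else Fraction(0)) for w, p in dist.items()}


# Hypothetical loaded die: face 6 comes up half the time.
die = {face: Fraction(1, 10) for face in range(1, 6)}
die[6] = Fraction(1, 2)

B = {2, 4, 6}  # told that the roll is even
A = {4, 5, 6}  # event of interest

die_given_b = condition(die, B)

# P(A | B) from the new distribution, and from the quotient formula.
p_a_given_b = sum(die_given_b[w] for w in A)
quotient = sum(die[w] for w in A & B) / sum(die[w] for w in B)

print(die_given_b)
print(p_a_given_b, quotient)  # 6/7 6/7
```

Summing the conditioned distribution over A gives the same value as the quotient P(A ∩ B) / P(B), which is the conclusion of the derivation.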
See also
- Borel–Kolmogorov paradox
- Chain rule (probability)
- Posterior probability
- Conditioning (probability)
- Joint probability distribution
- Conditional probability distribution
- Class membership probabilities
References
- ^ George Casella and Roger L. Berger (1990), Statistical Inference, Duxbury Press, ISBN 0534119581 (p. 18 et seq.)
- ^ Gillies, Donald (2000); "Philosophical Theories of Probability"; Routledge; Chapter 4 "The subjective theory"
- ^ Paulos, J.A. (1988) Innumeracy: Mathematical Illiteracy and its Consequences, Hill and Wang. ISBN 0809074478 (p. 63 et seq.)
- ^ Thomas Bruss, F; Der Wyatt Earp Effekt; Spektrum der Wissenschaft; March 2007
- ^ Grinstead and Snell's Introduction to Probability, p. 134
External links
- Weisstein, Eric W., "Conditional Probability" from MathWorld.
- F. Thomas Bruss Der Wyatt-Earp-Effekt oder die betörende Macht kleiner Wahrscheinlichkeiten (in German), Spektrum der Wissenschaft (German Edition of Scientific American), Vol 2, 110–113, (2007).
- Conditional Probability Problems with Solutions