Conditional probability
[Figure: events drawn as regions of the sample space. Assuming probability is proportional to area, the unconditional probability is $P(A) \approx 0.33$, whereas the conditional probability of $A$ given one of the other events shown is $\approx 0.85$.]

In probability theory, the "conditional probability of $A$ given $B$" is the probability of $A$ if $B$ is known to occur. It is commonly denoted $P(A \mid B)$, and sometimes $P_B(A)$. (The vertical line should not be mistaken for logical OR.) $P(A \mid B)$ can be visualised as the probability of event $A$ when the sample space is restricted to event $B$. Mathematically, it is defined for $P(B) > 0$ as
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Formally, $P(A \mid B)$ is defined as the probability of $A$ according to a new probability function on the sample space, such that outcomes not in $B$ have probability 0 and that it is consistent with all original probability measures. The above definition follows (see Formal derivation).[1]
Definition
Conditioning on an event
Given two events $A$ and $B$ in the same probability space with $P(B) > 0$, the conditional probability of $A$ given $B$ is defined as the quotient of the unconditional joint probability of $A$ and $B$, and the unconditional probability of $B$:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
The above definition is how conditional probabilities are introduced by Kolmogorov. However, other authors such as De Finetti prefer to introduce conditional probability as an axiom of probability. Although mathematically equivalent, this may be preferred philosophically; under major probability interpretations such as the subjective theory, conditional probability is considered a primitive entity. Further, this "multiplication axiom" introduces a symmetry with the summation axiom[2]:
Multiplication axiom: $P(A \cap B) = P(A \mid B)\,P(B)$
Summation axiom (A and B mutually exclusive): $P(A \cup B) = P(A) + P(B)$
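Definition with σ-algebra
If $P(B) = 0$, then the simple definition of $P(A \mid B)$ is undefined. However, it is possible to define a conditional probability with respect to a σ-algebra of such events (such as those arising from a continuous random variable).
For example, if $X$ and $Y$ are non-degenerate and jointly continuous random variables with density $f_{X,Y}(x, y)$ then, if $B$ has positive measure,
$$P(X \in A \mid Y \in B) = \frac{\int_{y \in B} \int_{x \in A} f_{X,Y}(x,y)\,dx\,dy}{\int_{y \in B} \int_{x \in \Omega} f_{X,Y}(x,y)\,dx\,dy}$$
The case where $B$ has zero measure can only be dealt with directly in the case that $B = \{y_0\}$, representing a single point, in which case
$$P(X \in A \mid Y = y_0) = \frac{\int_{x \in A} f_{X,Y}(x,y_0)\,dx}{\int_{x \in \Omega} f_{X,Y}(x,y_0)\,dx}$$
If $A$ has measure zero then the conditional probability is zero. An indication of why the more general case of zero measure cannot be dealt with in a similar way can be seen by noting that the limit, as all $\delta y_i$ approach zero, of
$$P(X \in A \mid Y \in \cup_i [y_i, y_i + \delta y_i]) \approxeq \frac{\sum_{i} \int_{x \in A} f_{X,Y}(x,y_i)\,dx\,\delta y_i}{\sum_{i} \int_{x \in \Omega} f_{X,Y}(x,y_i)\,dx\,\delta y_i}$$
depends on their relationship as they approach zero. See conditional expectation for more information.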
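As a rough numerical illustration of the two directly computable cases, the sketch below approximates the ratio of integrals with a midpoint rule. The density $f(x, y) = x + y$ on the unit square, the function names and the grid size are all assumptions chosen for the example, not part of the definition.

```python
def f(x, y):
    # Assumed example density on the unit square; it integrates to 1.
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def integrate(func, lo, hi, n=200):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

def p_x_given_y_interval(a, b, c, d):
    """P(X in [a, b] | Y in [c, d]) for an interval [c, d] of positive measure."""
    num = integrate(lambda y: integrate(lambda x: f(x, y), a, b), c, d)
    den = integrate(lambda y: integrate(lambda x: f(x, y), 0.0, 1.0), c, d)
    return num / den

def p_x_given_y_point(a, b, y0):
    """P(X in [a, b] | Y = y0): ratio of one-dimensional integrals at y0."""
    num = integrate(lambda x: f(x, y0), a, b)
    den = integrate(lambda x: f(x, y0), 0.0, 1.0)
    return num / den

interval_case = p_x_given_y_interval(0.0, 0.5, 0.0, 0.5)
point_case = p_x_given_y_point(0.0, 0.5, 0.5)
print(interval_case, point_case)
```

For this particular density the exact values are 1/3 for the interval case and 3/8 for the single-point case, so the printed numbers should agree with those up to floating-point error.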
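Conditioning on a random variable
Conditioning on an event may be generalized to conditioning on a random variable. Let $X$ be a random variable taking some value from $\{x_n\}$. Let $A$ be an event. The conditional probability of $A$ given $X$ is defined as the random variable
$$P(A \mid X) = \begin{cases} P(A \mid X = x_0) & \text{if } X = x_0 \\ P(A \mid X = x_1) & \text{if } X = x_1 \\ \quad\vdots \end{cases}$$
More formally:
$$P(A \mid X)(\omega) = P(A \mid X = X(\omega))$$
The conditional probability $P(A \mid X)$ is a function of $X$, i.e. if the function $g$ is defined as
$$g(x) = P(A \mid X = x),$$
then
$$P(A \mid X) = g(X).$$
Note that $P(A \mid X)$ and $X$ are now both random variables. From the law of total probability, the expected value of $P(A \mid X)$ is equal to the unconditional probability of $A$.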
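The last statement can be checked on a small finite example. The sketch below is only an illustration under assumed choices: the random variable X is the value of the first of two fair dice, and the event A (hypothetical, picked for the example) is that the two dice sum to at most 5. It builds g(x) = P(A | X = x) and compares the expected value of g(X) with P(A).

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # (die 1, die 2), equally likely

def in_A(o):
    # Hypothetical event A: the two dice sum to at most 5.
    return o[0] + o[1] <= 5

def g(x):
    """g(x) = P(A | X = x), where X is the value of die 1."""
    matching = [o for o in outcomes if o[0] == x]
    return Fraction(sum(in_A(o) for o in matching), len(matching))

p_X = {x: Fraction(1, 6) for x in range(1, 7)}   # distribution of X
expected_g = sum(p_X[x] * g(x) for x in p_X)     # E[P(A | X)]
p_A = Fraction(sum(in_A(o) for o in outcomes), len(outcomes))

print([str(g(x)) for x in p_X])
print(expected_g, p_A)
```

Here g takes a different value for each x, so P(A | X) = g(X) is itself random, while its expectation collapses to the single number P(A).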
Example
Consider the rolling of two fair six-sided dice.
- Let $A$ be the value rolled on die 1
- Let $B$ be the value rolled on die 2
- Let $A_n$ be the event that $A = n$
- Let $\Sigma_m$ be the event that $A + B \le m$
Suppose we roll $A$ and $B$. What is the probability that $A = 2$? Table 1 shows the sample space. $A = 2$ in 6 of the 36 outcomes, so $P(A_2) = \tfrac{6}{36} = \tfrac{1}{6}$.
| + | B=1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| A=1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| 6 | 7 | 8 | 9 | 10 | 11 | 12 |
Suppose however that somebody else rolls the dice in secret, revealing only that $A + B \le 5$. Table 2 shows that $A + B \le 5$ for 10 outcomes. $A = 2$ in 3 of these. The probability that $A = 2$ given that $A + B \le 5$ is therefore $\tfrac{3}{10} = 0.3$. This is a conditional probability, because it has a condition that limits the sample space. In more compact notation, $P(A_2 \mid \Sigma_5) = 0.3$.
| + | B=1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| A=1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| 6 | 7 | 8 | 9 | 10 | 11 | 12 |
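The counting argument in this example can be reproduced by brute-force enumeration. The following is a minimal sketch; the helper `prob` and the event names are illustrative choices, and it assumes every one of the 36 ordered outcomes is equally likely.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (die 1, die 2), each equally likely.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event, given=lambda o: True):
    """P(event | given), found by counting equally likely outcomes."""
    restricted = [o for o in outcomes if given(o)]
    favourable = [o for o in restricted if event(o)]
    return Fraction(len(favourable), len(restricted))

die1_is_2 = lambda o: o[0] == 2
sum_at_most_5 = lambda o: o[0] + o[1] <= 5

p_unconditional = prob(die1_is_2)                     # 6 of 36 outcomes
p_conditional = prob(die1_is_2, given=sum_at_most_5)  # 3 of 10 outcomes
print(p_unconditional, p_conditional)
```

Restricting the list of outcomes before counting is the "limited sample space" view of conditioning; the helper raises `ZeroDivisionError` if the condition is impossible, mirroring the fact that the ratio is undefined in that case.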
Statistical independence
If two events $A$ and $B$ are statistically independent, the occurrence of $A$ does not affect the probability of $B$, and vice versa. That is,
$$P(A \mid B) = P(A)$$
$$P(B \mid A) = P(B).$$
Using the definition of conditional probability, it follows from either formula that
$$P(A \cap B) = P(A)\,P(B)$$
This is the definition of statistical independence. This form is the preferred definition, as it is symmetrical in $A$ and $B$, and no values are undefined if $P(A)$ or $P(B)$ is 0.
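A short sketch of the product test, again on two fair dice. The events below are hypothetical examples chosen for illustration; the point is that the product form can be evaluated even when one of the events has probability 0, where a conditional probability would be undefined.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # (die 1, die 2), equally likely

def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def independent(e1, e2):
    """Product test: P(E1 and E2) == P(E1) * P(E2)."""
    joint = prob(lambda o: e1(o) and e2(o))
    return joint == prob(e1) * prob(e2)

die1_even = lambda o: o[0] % 2 == 0
die2_high = lambda o: o[1] >= 5
sum_high = lambda o: o[0] + o[1] >= 10
impossible = lambda o: False

ind_separate_dice = independent(die1_even, die2_high)  # events on different dice
ind_with_sum = independent(die1_even, sum_high)        # the sum depends on die 1
ind_with_null = independent(die1_even, impossible)     # still defined when P = 0
print(ind_separate_dice, ind_with_sum, ind_with_null)
```

Exact fractions are used so that the equality test is not disturbed by floating-point rounding.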
Common fallacies
- These fallacies should not be confused with Robert K. Shope's 1978 "conditional fallacy", which deals with counterfactual examples that beg the question.
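Assuming conditional probability is of similar size to its inverse
In general, it cannot be assumed that $P(A \mid B) \approx P(B \mid A)$. This can be an insidious error, even for those who are highly conversant with statistics.[3] The relationship between $P(A \mid B)$ and $P(B \mid A)$ is given by Bayes' theorem:
$$P(B \mid A) = P(A \mid B)\,\frac{P(B)}{P(A)}$$
That is, $P(A \mid B) \approx P(B \mid A)$ only if $\frac{P(B)}{P(A)} \approx 1$, or equivalently, $P(A) \approx P(B)$.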
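To see how far apart a conditional probability and its inverse can be, here is a small worked calculation with Bayes' theorem. All numbers are made up for illustration (a rare condition B and a test result A); none of them come from the sources cited in this article.

```python
from fractions import Fraction

# Illustrative, made-up numbers (not from any study):
p_B = Fraction(1, 100)             # P(B): prior probability of the condition
p_A_given_B = Fraction(9, 10)      # P(A | B): positive test given the condition
p_A_given_notB = Fraction(5, 100)  # P(A | not B): false-positive rate

# Law of total probability for the marginal P(A).
p_A = p_A_given_B * p_B + p_A_given_notB * (1 - p_B)

# Bayes' theorem: P(B | A) = P(A | B) * P(B) / P(A).
p_B_given_A = p_A_given_B * p_B / p_A

print(float(p_A_given_B), float(p_B_given_A))
```

With these assumed inputs P(A | B) is 0.9 while P(B | A) is roughly 0.15, because P(B)/P(A) is far from 1.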
Assuming marginal and conditional probabilities are of similar size
In general, it cannot be assumed that $P(A) \approx P(A \mid B)$. These probabilities are linked through the formula for total probability:
$$P(A) = \sum_n P(A \cap B_n) = \sum_n P(A \mid B_n)\,P(B_n).$$
This fallacy may arise through selection bias.[4] For example, in the context of a medical claim, let $S_C$ be the event that a sequela $S$ occurs as a consequence of circumstance $C$. Let $H$ be the event that an individual seeks medical help. Suppose that in most cases, $C$ does not cause $S$, so $P(S_C)$ is low. Suppose also that medical attention is only sought if $S$ has occurred. From experience of patients, a doctor may therefore erroneously conclude that $P(S_C)$ is high. The actual probability observed by the doctor is $P(S_C \mid H)$.
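The selection effect in the medical example can be made concrete with a toy population count. The figures below (population size, number of people who develop the sequela, and the share of them who seek help) are assumptions invented for this sketch.

```python
from fractions import Fraction

# Hypothetical population in which everyone has experienced circumstance C.
population = 10_000
with_sequela = 200               # people for whom the sequela occurred (assumed)
seek_help_rate = Fraction(4, 5)  # assumed share of those people who see a doctor

# Medical attention is only sought if the sequela has occurred,
# so nobody without it appears among the doctor's patients.
patients_with_sequela = int(with_sequela * seek_help_rate)
patients_without_sequela = 0

p_sequela = Fraction(with_sequela, population)
p_sequela_given_help = Fraction(
    patients_with_sequela,
    patients_with_sequela + patients_without_sequela,
)
print(p_sequela, p_sequela_given_help)
```

In the whole population the sequela is rare (1 in 50 here), yet every patient the doctor sees has it, so the probability conditional on seeking help is 1.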
Over- or under-weighting priors
Failing to take the prior probability into account, either partially or completely, is called base rate neglect. The reverse, insufficient adjustment from the prior probability, is called conservatism.
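Formal derivation
This section is based on the derivation given in Grinstead and Snell's Introduction to Probability.[5]
Let $\Omega$ be a sample space with elementary events $\{\omega\}$. Suppose we are told the event $B \subseteq \Omega$ has occurred. A new probability distribution (denoted by the conditional notation) is to be assigned on $\{\omega\}$ to reflect this. For events in $B$, it is reasonable to assume that the relative magnitudes of the probabilities will be preserved. For some constant scale factor $\alpha$, the new distribution will therefore satisfy:
1. $\omega \in B : P(\omega \mid B) = \alpha P(\omega)$
2. $\omega \notin B : P(\omega \mid B) = 0$
3. $\sum_{\omega \in \Omega} P(\omega \mid B) = 1$
Substituting 1 and 2 into 3 to select $\alpha$:
$$\sum_{\omega \in \Omega} P(\omega \mid B) = \sum_{\omega \in B} \alpha P(\omega) + \sum_{\omega \notin B} 0 = \alpha \sum_{\omega \in B} P(\omega) = \alpha \cdot P(B) = 1 \quad\Rightarrow\quad \alpha = \frac{1}{P(B)}$$
So the new probability distribution is
1. $\omega \in B : P(\omega \mid B) = \dfrac{P(\omega)}{P(B)}$
2. $\omega \notin B : P(\omega \mid B) = 0$
Now for a general event $A$,
$$P(A \mid B) = \sum_{\omega \in A \cap B} P(\omega \mid B) + \sum_{\omega \in A \cap B^c} P(\omega \mid B) = \sum_{\omega \in A \cap B} \frac{P(\omega)}{P(B)} + 0 = \frac{P(A \cap B)}{P(B)}$$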
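The rescaling step of the derivation can be mirrored directly in code. This is a sketch under stated assumptions: the sample space is finite, the distribution is given as a dictionary, and the loaded four-sided die and the events A and B are hypothetical values chosen for illustration.

```python
from fractions import Fraction

def condition(dist, B):
    """Return the distribution P(. | B) from a dict {outcome: probability}."""
    p_B = sum(p for w, p in dist.items() if w in B)
    if p_B == 0:
        raise ValueError("P(B) = 0: conditional distribution undefined")
    alpha = 1 / p_B  # scale factor chosen so the new probabilities sum to 1
    return {w: (alpha * p if w in B else Fraction(0)) for w, p in dist.items()}

# Hypothetical loaded four-sided die.
P = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}
B = {2, 3, 4}
A = {1, 2}

P_given_B = condition(P, B)
p_A_given_B = sum(P_given_B[w] for w in A)  # sum over A of the new distribution
p_A_and_B = sum(P[w] for w in A & B)
p_B = sum(P[w] for w in B)
print(p_A_given_B, p_A_and_B / p_B)
```

Outcomes outside B receive probability 0, outcomes inside B keep their relative magnitudes, and summing the new distribution over A gives the same value as the ratio P(A ∩ B)/P(B).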
See also
- Borel–Kolmogorov paradox
- Chain rule (probability)
- Posterior probability
- Conditioning (probability)
- Joint probability distribution
- Conditional probability distribution
- Class membership probabilities
- Monty Hall problem
References
1. George Casella and Roger L. Berger (1990), Statistical Inference, Duxbury Press, ISBN 0534119581 (p. 18 et seq.)
2. Donald Gillies (2000), Philosophical Theories of Probability, Routledge, Chapter 4 "The subjective theory"
3. J. A. Paulos (1988), Innumeracy: Mathematical Illiteracy and its Consequences, Hill and Wang, ISBN 0809074478 (p. 63 et seq.)
4. F. Thomas Bruss (March 2007), Der Wyatt Earp Effekt, Spektrum der Wissenschaft
5. Grinstead and Snell's Introduction to Probability, p. 134
External links
- Weisstein, Eric W., "Conditional Probability" from MathWorld.
- F. Thomas Bruss Der Wyatt-Earp-Effekt oder die betörende Macht kleiner Wahrscheinlichkeiten (in German), Spektrum der Wissenschaft (German Edition of Scientific American), Vol 2, 110–113, (2007).
- Conditional Probablity Problems with Solutions





