Elementary description
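If $A$ and $B$ are events such that $P(B) > 0$, the conditional probability of the event $A$ given $B$ is defined by
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
If $B$ is fixed, the mapping $A \mapsto P(A \mid B)$ is a conditional probability distribution given the event $B$.
If also $P(A) > 0$, then also
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)},$$
and so
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)},$$
which is known as Bayes' theorem.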
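The definitions can be checked on a small finite example. The sketch below assumes two fair dice (a hypothetical example, not from the text). It computes $P(A \mid B)$ directly from the definition using exact fractions, then confirms that Bayes' theorem gives the same value.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: ordered outcomes of two fair dice, each with probability 1/36.
OMEGA = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) for an event given as a predicate on outcomes."""
    hits = sum(1 for w in OMEGA if event(w))
    return Fraction(hits, len(OMEGA))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    pb = prob(b)
    if pb == 0:
        raise ValueError("P(B) must be positive")
    return prob(lambda w: a(w) and b(w)) / pb

def A(w):
    return w[0] + w[1] == 8   # the sum of the dice is 8

def B(w):
    return w[0] == 3          # the first die shows 3

p_a_given_b = cond_prob(A, B)
# Bayes' theorem: P(A | B) = P(B | A) P(A) / P(B)
bayes = cond_prob(B, A) * prob(A) / prob(B)
print(p_a_given_b, bayes)  # 1/6 1/6
```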
Conditioning of discrete random variables
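If $X$ is a discrete real random variable (that is, attaining only values $x_k$, $k = 1, 2, \ldots$), then the conditional probability of an event $A$ given that $X = x_k$, where $P(X = x_k) > 0$, is
$$P(A \mid X = x_k) = \frac{P(A \cap \{X = x_k\})}{P(X = x_k)}.$$
The mapping $A \mapsto P(A \mid X = x_k)$ defines a conditional probability distribution given that $X = x_k$.
Note that $P(A \mid X = x_k)$ is a number, that is, a deterministic quantity. If we allow $x_k$ to be a realization of the random variable $X$, we obtain the conditional probability of the event $A$ given the random variable $X$, denoted by $P(A \mid X)$, which is itself a random variable. The conditional probability $P(A \mid X)$ attains the value $P(A \mid X = x_k)$ with probability $P(X = x_k)$.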
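Now suppose $X$ and $Y$ are two discrete real random variables with a joint distribution, $X$ attaining the values $x_k$ and $Y$ the values $y_j$. Then the conditional probability distribution of $Y$ given $X = x_k$ is
$$P(Y = y_j \mid X = x_k) = \frac{P(Y = y_j,\, X = x_k)}{P(X = x_k)}.$$
If we allow $x_k$ to be a realization of the random variable $X$, we obtain the conditional distribution of the random variable $Y$ given the random variable $X$. For a given $y_j$, this is the random variable $P(Y = y_j \mid X)$, which attains the value $P(Y = y_j \mid X = x_k)$ with probability $P(X = x_k)$.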
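The random variables $X$ and $Y$ are independent when the events $\{X = x_k\}$ and $\{Y = y_j\}$ are independent for all $k$ and $j$, that is,
$$P(Y = y_j,\, X = x_k) = P(Y = y_j)\, P(X = x_k) \quad \text{for all } j, k.$$
Clearly, this is equivalent to
$$P(Y = y_j \mid X = x_k) = P(Y = y_j) \quad \text{for all } j \text{ and all } k \text{ with } P(X = x_k) > 0.$$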
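The independence criterion can be tested mechanically on a finite joint table. The minimal sketch below assumes the joint pmf is stored as a dictionary `{(x, y): p}`. The helper names and the two example tables are hypothetical. The code compares each conditional distribution of $Y$ with the marginal of $Y$.

```python
from fractions import Fraction

def marginals(joint):
    """Marginal pmfs of X and Y from a joint pmf given as {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return px, py

def conditional_of_y(joint, x):
    """P(Y = y | X = x) for each y listed with x; requires P(X = x) > 0."""
    px, _ = marginals(joint)
    return {y: p / px[x] for (xx, y), p in joint.items() if xx == x}

def independent(joint):
    """True iff P(Y = y | X = x) == P(Y = y) for all y and all x with P(X = x) > 0."""
    px, py = marginals(joint)
    return all(
        conditional_of_y(joint, x).get(y, 0) == py[y]
        for x in px
        if px[x] > 0
        for y in py
    )

F = Fraction
# X uniform on {0, 1}, Y uniform on {0, 1, 2}; the joint pmf is the product.
indep = {(x, y): F(1, 2) * F(1, 3) for x in (0, 1) for y in (0, 1, 2)}
# X = Y, each uniform on {0, 1}: knowing X determines Y.
dep = {(0, 0): F(1, 2), (1, 1): F(1, 2)}

print(independent(indep), independent(dep))  # True False
```

Exact fractions are used so that the equality test in `independent` is not affected by floating-point rounding.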
The conditional expectation of $Y$ given the value $X = x_k$ is
$$E(Y \mid X = x_k) = \sum_j y_j\, P(Y = y_j \mid X = x_k),$$
which is defined whenever the marginal probability $P(X = x_k) > 0$ (and the sum converges absolutely).
This is a description common in statistics [1]. Note that $E(Y \mid X = x_k)$ is a number, that is, a deterministic quantity, and the particular value of $x_k$ does not matter; only the probabilities $P(Y = y_j \mid X = x_k)$ do.
If we allow $x_k$ to be a realization of the random variable $X$, we obtain the conditional expectation of the random variable $Y$ given the random variable $X$, denoted by $E(Y \mid X)$. This form is closer to the mathematical form favored by probabilists (described in more detail below), and it is itself a random variable. The conditional expectation $E(Y \mid X)$ attains the value $E(Y \mid X = x_k)$ with probability $P(X = x_k)$.
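To make the distinction between the number $E(Y \mid X = x_k)$ and the random variable $E(Y \mid X)$ concrete, the sketch below computes the distribution of $E(Y \mid X)$ from a small joint pmf. The table and function names are hypothetical. As a check, the mean of this random variable equals $E(Y)$, which is the law of total expectation.

```python
from fractions import Fraction

def cond_expectation_values(joint):
    """Return ({x: E(Y | X = x)}, {x: P(X = x)}) for a joint pmf {(x, y): p}."""
    px, weighted = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        weighted[x] = weighted.get(x, 0) + y * p  # sum of y * P(X = x, Y = y)
    values = {x: weighted[x] / px[x] for x in px if px[x] > 0}
    return values, px

def distribution_of_cond_expectation(joint):
    """E(Y | X) takes the value E(Y | X = x) with probability P(X = x).
    Different x may give the same value, so their probabilities are added."""
    values, px = cond_expectation_values(joint)
    dist = {}
    for x, v in values.items():
        dist[v] = dist.get(v, 0) + px[x]
    return dist

F = Fraction
# Hypothetical joint pmf: four equally likely (x, y) pairs.
joint = {(0, 1): F(1, 4), (0, 3): F(1, 4), (1, 2): F(1, 4), (1, 4): F(1, 4)}

dist = distribution_of_cond_expectation(joint)
mean_of_cond = sum(v * p for v, p in dist.items())
mean_y = sum(y * p for (_, y), p in joint.items())
print(mean_of_cond, mean_y)  # 5/2 5/2
```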
Conditioning of continuous random variables
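For continuous random variables $X$, $Y$ with joint density $f_{X,Y}(x, y)$, the conditional probability density of $Y$ given that $X = x$ is
$$f_{Y \mid X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}, \quad f_X(x) > 0,$$
where
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy$$
is the marginal density of $X$. The conventional notation $f_{Y \mid X}(y \mid x)$ is often used to mean the same as $f_{Y \mid X}(y, x)$, that is, a function of the two variables $x$ and $y$. The notation $f(y \mid x)$, often used in practice, is ambiguous: if $x$ and $y$ are replaced by something else (such as specific numbers), the information about what $f$ means is lost.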
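The continuous random variables $X$, $Y$ are independent if, for all $x$ and $y$, the events $\{X \le x\}$ and $\{Y \le y\}$ are independent. This can be proved to be equivalent to
$$f_{X,Y}(x, y) = f_X(x)\, f_Y(y) \quad \text{for almost all } (x, y).$$
This is clearly equivalent to
$$f_{Y \mid X}(y \mid x) = f_Y(y) \quad \text{for almost all } (x, y) \text{ with } f_X(x) > 0.$$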
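The conditional probability density of $Y$ given $X$ is the random function $y \mapsto f_{Y \mid X}(y \mid X)$. The conditional expectation of $Y$ given the value $X = x$ is
$$E(Y \mid X = x) = \int_{-\infty}^{\infty} y\, f_{Y \mid X}(y \mid x)\, dy,$$
and the conditional expectation of $Y$ given $X$ is the random variable
$$E(Y \mid X) = \int_{-\infty}^{\infty} y\, f_{Y \mid X}(y \mid X)\, dy,$$
which depends on the value of $X$.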
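These formulas can be evaluated numerically. The sketch below assumes a hypothetical standard bivariate normal pair with correlation $\rho = 0.6$. It approximates $f_X(x)$ and $E(Y \mid X = x)$ with a trapezoidal rule on the truncated range $[-12, 12]$, which is an approximation choice rather than part of the text. For this distribution the closed form $E(Y \mid X = x) = \rho x$ is known and serves as a check.

```python
import math

RHO = 0.6  # assumed correlation of a standard bivariate normal pair (X, Y)

def joint_density(x, y, rho=RHO):
    """Joint density f_{X,Y}(x, y) of a standard bivariate normal."""
    s = 1.0 - rho * rho
    q = (x * x - 2.0 * rho * x * y + y * y) / s
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(s))

def trapezoid(f, a, b, n=4000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

def marginal_x(x, lim=12.0):
    """f_X(x) = integral of f_{X,Y}(x, y) dy, truncated to [-lim, lim]."""
    return trapezoid(lambda y: joint_density(x, y), -lim, lim)

def conditional_density(y, x):
    """f_{Y|X}(y | x) = f_{X,Y}(x, y) / f_X(x), valid where f_X(x) > 0."""
    return joint_density(x, y) / marginal_x(x)

def conditional_mean(x, lim=12.0):
    """E(Y | X = x) = integral of y f_{Y|X}(y | x) dy."""
    numerator = trapezoid(lambda y: y * joint_density(x, y), -lim, lim)
    return numerator / marginal_x(x, lim)

for x in (-1.0, 0.0, 1.5):
    print(x, conditional_mean(x), RHO * x)  # the last two columns should agree closely
```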
Unfortunately, in the literature, especially in more elementary statistics texts, authors do not always distinguish properly between conditioning given the value of a random variable (the result is a number) and conditioning given the random variable (the result is a random variable). Confusingly enough, the words "given the random variable" can therefore mean either.
Mathematical synopsis
This section follows [2]. In probability theory, a conditional expectation (also known as conditional expected value or conditional mean) is the expected value of a random variable with respect to a conditional probability distribution, defined as follows.
If $X$ is a real random variable, and $B$ is an event with positive probability, then the conditional probability distribution of $X$ given $B$ assigns the probability $P(X \in S \mid B) = P(\{X \in S\} \cap B)/P(B)$ to the Borel set $S$. The mean (if it exists) of this conditional probability distribution of $X$ is denoted by $E(X \mid B)$ and called the conditional expectation of $X$ given the event $B$.
If $Y$ is another random variable, then the conditional expectation of $X$ given that $Y = y$ is a function of $y$, say $g(y) = E(X \mid Y = y)$. An argument using the Radon-Nikodym theorem is needed to define $g$ properly, because the event $\{Y = y\}$ may have probability zero. Also, $g(y)$ is defined only for almost all $y$, with respect to the distribution of $Y$. The conditional expectation of $X$ given the random variable $Y$, denoted by $E(X \mid Y)$, is the random variable $g(Y)$.
It turns out that the conditional expectation $E(X \mid Y)$ is a function only of the σ-algebra, say $\mathcal{G}$, generated by the events $\{Y \in B\}$ for Borel sets $B$, rather than of the particular values of $Y$. For a σ-algebra $\mathcal{G}$, the conditional expectation of $X$ given $\mathcal{G}$ is a random variable $E(X \mid \mathcal{G})$ that is $\mathcal{G}$-measurable and whose integral over any $\mathcal{G}$-measurable set is the same as the integral of $X$ over the same set. The existence of this conditional expectation is proved from the Radon-Nikodym theorem. If $X$ happens to be $\mathcal{G}$-measurable, then $E(X \mid \mathcal{G}) = X$.
If $X$ has an expected value, then the conditional expectation $E(X \mid \mathcal{G})$ also has an expected value, which is the same as that of $X$: $E\bigl(E(X \mid \mathcal{G})\bigr) = E(X)$. This is the law of total expectation.
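On a finite sample space every σ-algebra is generated by a partition of the space into atoms. Conditional expectation then reduces to averaging over each atom. The sketch below uses a hypothetical six-point space and partition. It checks the defining property, the law of total expectation, and the fact that a $\mathcal{G}$-measurable variable is its own conditional expectation.

```python
from fractions import Fraction

# Hypothetical finite probability space: six equally likely outcomes 0..5.
P = {w: Fraction(1, 6) for w in range(6)}
X = {0: 4, 1: -2, 2: 7, 3: 1, 4: 0, 5: 2}

# Hypothetical choice: G is generated by the atoms {0, 1}, {2, 3, 4}, {5}.
ATOMS = [{0, 1}, {2, 3, 4}, {5}]

def cond_expectation(x, atoms, p):
    """E(X | G): on each atom (of positive probability), the P-weighted average of X."""
    result = {}
    for atom in atoms:
        mass = sum(p[w] for w in atom)
        average = sum(x[w] * p[w] for w in atom) / mass
        for w in atom:
            result[w] = average
    return result

def expectation(x, p):
    """E(X) = sum over outcomes of X(w) P({w})."""
    return sum(x[w] * p[w] for w in p)

E = cond_expectation(X, ATOMS, P)

# Defining property: the same integral as X over every atom, hence over every set in G.
for atom in ATOMS:
    assert sum(E[w] * P[w] for w in atom) == sum(X[w] * P[w] for w in atom)

# Law of total expectation: E(E(X | G)) = E(X).
print(expectation(E, P), expectation(X, P))  # 2 2

# A G-measurable variable (constant on each atom) is its own conditional expectation.
Z = {0: 5, 1: 5, 2: -1, 3: -1, 4: -1, 5: 3}
print(cond_expectation(Z, ATOMS, P) == Z)  # True
```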
For simplicity, the presentation here is done for real-valued random variables, but generalization to probability on more general spaces, such as $\mathbb{R}^n$ or normed metric spaces equipped with a probability measure, is immediate.
Mathematical prerequisites
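Recall that a probability space is a triple $(\Omega, \mathcal{F}, P)$, where $\mathcal{F}$ is a σ-algebra of subsets of $\Omega$, and $P$ is a probability measure on the measurable sets. A random variable on the space $(\Omega, \mathcal{F}, P)$ is an $\mathcal{F}$-measurable function $X \colon \Omega \to \mathbb{R}$. The σ-algebra of all Borel sets in $\mathbb{R}$ is denoted by $\mathcal{B}$. If $B \in \mathcal{B}$ is a set and $X$ a random variable, $X \in B$ or $\{X \in B\}$ are common shorthands for the event $\{\omega \in \Omega : X(\omega) \in B\}$.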
Probability conditional on the value of a random variable
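Let $(\Omega, \mathcal{F}, P)$ be a probability space, $X$ an $\mathcal{F}$-measurable random variable with values in $\mathbb{R}$, $A \in \mathcal{F}$ (i.e., an event not necessarily independent of $X$), and $x \in \mathbb{R}$. For $B \in \mathcal{B}$ with $P(X \in B) > 0$, the conditional probability of $A$ given $X \in B$ is by definition
$$P(A \mid X \in B) = \frac{P(A \cap \{X \in B\})}{P(X \in B)}.$$
We wish to attach a meaning to the conditional probability of $A$ given $X = x$ even when $P(X = x) = 0$. The following argument follows Wilks [3], who attributes it to Kolmogorov [4]. Fix $A$ and define
$$P_X(B) = P(X \in B), \quad B \in \mathcal{B}.$$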
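Since $X$ is $\mathcal{F}$-measurable, the set function $P_X$ is a measure on Borel sets $B \in \mathcal{B}$. Define another measure $Q_A$ on $\mathcal{B}$ by
$$Q_A(B) = P(A \cap \{X \in B\}), \quad B \in \mathcal{B}.$$
Clearly,
$$Q_A(B) \le P_X(B), \quad B \in \mathcal{B},$$
and hence $P_X(B) = 0$ implies $Q_A(B) = 0$. Thus the measure $Q_A$ is absolutely continuous with respect to the measure $P_X$, and by the Radon-Nikodym theorem there exists a real-valued $\mathcal{B}$-measurable function $g_A$ such that
$$Q_A(B) = \int_B g_A\, dP_X \quad \text{for all } B \in \mathcal{B}.$$
We interpret the function $g_A$ as the conditional probability of $A$ given $X = x$,
$$P(A \mid X = x) = g_A(x).$$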
Once the conditional probability is defined, other concepts of probability follow, such as expectation and density.
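One way to justify this interpretation is to view the conditional probability of $A$ given $X = x$ as the limit of the probability of $A$ conditioned on the value of $X$ being in a small neighborhood of $x$. Set $B_\varepsilon = (x - \varepsilon, x + \varepsilon)$ (a neighborhood of $x$ with radius $\varepsilon$, assumed to satisfy $P_X(B_\varepsilon) > 0$) to get
$$P(A \mid X \in B_\varepsilon) = \frac{P(A \cap \{X \in B_\varepsilon\})}{P(X \in B_\varepsilon)} = \frac{Q_A(B_\varepsilon)}{P_X(B_\varepsilon)} = \frac{1}{P_X(B_\varepsilon)} \int_{B_\varepsilon} g_A\, dP_X,$$
and using the fact that $\frac{1}{P_X(B_\varepsilon)} \int_{B_\varepsilon} dP_X = 1$, we have
$$P(A \mid X \in B_\varepsilon) - g_A(x) = \frac{1}{P_X(B_\varepsilon)} \int_{B_\varepsilon} \bigl(g_A(t) - g_A(x)\bigr)\, dP_X(t),$$
so
$$\lim_{\varepsilon \to 0} P(A \mid X \in B_\varepsilon) = g_A(x)$$
for almost all $x$ in the measure $P_X$. (Note: I do not know how to prove this without additional assumptions on $g_A$, such as continuity. Wilks [3] claims that the limit a.e. "can" be proved, though he does not proceed this way, and neglects to mention that a.e. is in the measure $P_X$.)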
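A concrete case where the limit can be computed exactly is sketched below. It assumes, hypothetically, that $X$ is uniform on $(0, 1)$ and that $A = \{U < X^2\}$ for a uniform $U$ independent of $X$. Then $P(A \mid X = x) = x^2$ and $P(A \cap \{X \in B\}) = \int_B t^2\, dt$. The ratio over the neighborhood $(x - \varepsilon, x + \varepsilon)$ works out to $x^2 + \varepsilon^2/3$, which tends to $x^2$. Because $x^2$ is continuous here, this example does not touch the a.e. subtlety mentioned in the note above.

```python
from fractions import Fraction

def q_a(lo, hi):
    """Q_A((lo, hi)) = P(A and X in (lo, hi)) = integral of t^2 over (lo, hi)."""
    return (hi ** 3 - lo ** 3) / 3

def p_x(lo, hi):
    """P_X((lo, hi)) = hi - lo for X uniform on (0, 1) and 0 <= lo < hi <= 1."""
    return hi - lo

def cond_prob_near(x, eps):
    """P(A | X in (x - eps, x + eps)) = Q_A(B) / P_X(B)."""
    lo, hi = x - eps, x + eps
    return q_a(lo, hi) / p_x(lo, hi)

x = Fraction(1, 2)
for k in range(1, 5):
    eps = Fraction(1, 10 ** k)
    # Exact rational arithmetic: the gap to x**2 is eps**2 / 3.
    print(eps, cond_prob_near(x, eps), x ** 2)
```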
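As another illustration and justification for understanding $g_A(x)$ as the conditional probability of $A$ given $X = x$, we now show what happens when the random variable $X$ is discrete. Suppose $X$ attains only values $x_k$, $k = 1, 2, \ldots$, with $P(X = x_k) > 0$. Then
$$P_X(B) = \sum_{k:\, x_k \in B} P(X = x_k), \quad B \in \mathcal{B}.$$
Choose $k$ and let $B$ be a neighborhood of $x_k$ with radius so small that $B$ does not contain any other $x_j$, $j \neq k$ (if the values $x_j$ accumulate at $x_k$, take $B = \{x_k\}$ instead). Then
$$Q_A(B) = P(A \cap \{X \in B\}) = P(A \cap \{X = x_k\})$$
by the definition of $Q_A$, and from the definition of $g_A$ as a Radon-Nikodym derivative,
$$Q_A(B) = \int_B g_A\, dP_X = g_A(x_k)\, P(X = x_k).$$
This gives, for $k = 1, 2, \ldots$,
$$g_A(x_k) = \frac{P(A \cap \{X = x_k\})}{P(X = x_k)} = P(A \mid X = x_k)$$
by the definition of conditional probability. The function $g_A$ is thus determined only on the set $\{x_1, x_2, \ldots\}$. Because the random variable $X$ is concentrated on that set, this determines $g_A$ almost everywhere in the measure $P_X$.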
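In the discrete case the Radon-Nikodym derivative is simply the ratio of point masses. The sketch below uses a hypothetical four-point space. It computes $g_A$ that way, then verifies $Q_A(B) = \int_B g_A\, dP_X$ for every subset $B$ of the values of $X$. Since $P_X$ and $Q_A$ live on finitely many atoms, checking these subsets is enough.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite space: outcome w has probability P[w]; X and the event A are given.
P = {0: Fraction(1, 8), 1: Fraction(1, 8), 2: Fraction(1, 4), 3: Fraction(1, 2)}
X = {0: 10, 1: 20, 2: 20, 3: 30}
A = {1, 3}

def p_x(values):
    """P_X(B) = P(X in B) for a set B of values of X."""
    return sum(P[w] for w in P if X[w] in values)

def q_a(values):
    """Q_A(B) = P(A and X in B)."""
    return sum(P[w] for w in A if X[w] in values)

def g_a(xk):
    """Radon-Nikodym derivative dQ_A/dP_X at an atom: Q_A({xk}) / P_X({xk})."""
    return q_a({xk}) / p_x({xk})

support = sorted(set(X.values()))
# Check Q_A(B) = integral over B of g_A dP_X for every subset B of the support.
for r in range(len(support) + 1):
    for B in combinations(support, r):
        assert q_a(set(B)) == sum(g_a(xk) * p_x({xk}) for xk in B)

print({xk: str(g_a(xk)) for xk in support})  # {10: '0', 20: '1/3', 30: '1'}
```

Note that $g_A(20) = 1/3$ equals the elementary $P(A \mid X = 20) = (1/8)/(3/8)$, as the derivation predicts.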
Expectation conditional on the value of a random variable
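Suppose that $X$ and $Y$ are random variables, with $Y$ integrable. Define again the measure $P_X$ on $\mathcal{B}$ generated by the random variable $X$,
$$P_X(B) = P(X \in B), \quad B \in \mathcal{B},$$
and a finite signed measure $Q_Y$ on $\mathcal{B}$,
$$Q_Y(B) = E\bigl(Y\, \mathbf{1}_{\{X \in B\}}\bigr) = \int_{\{X \in B\}} Y\, dP, \quad B \in \mathcal{B}.$$
Here, $\mathbf{1}_{\{X \in B\}}$ is the indicator function of the event $\{X \in B\}$, so $\mathbf{1}_{\{X \in B\}}(\omega) = 1$ if $X(\omega) \in B$ and zero otherwise. Since
$$|Q_Y(B)| \le \int_{\{X \in B\}} |Y|\, dP$$
and the integral on the right vanishes whenever $P(X \in B) = 0$, we have that $P_X(B) = 0$ implies $Q_Y(B) = 0$, so $Q_Y$ is absolutely continuous with respect to $P_X$. Consequently, there exists a Radon-Nikodym derivative $g_Y$ such that
$$Q_Y(B) = \int_B g_Y\, dP_X \quad \text{for all } B \in \mathcal{B}.$$
The value $g_Y(x)$ is the conditional expectation of $Y$ given $X = x$ and is denoted by $E(Y \mid X = x)$. The result can then be written as
$$E(Y \mid X = x) = \frac{dQ_Y}{dP_X}(x)$$
for almost all $x$ in the measure $P_X$ generated by the random variable $X$.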
This definition is consistent with that of conditional probability: the conditional probability of $A$ given $X = x$ is the same as the conditional mean of the indicator function $\mathbf{1}_A$ of $A$ given $X = x$, because $E\bigl(\mathbf{1}_A \mathbf{1}_{\{X \in B\}}\bigr) = P(A \cap \{X \in B\})$. The proof is also exactly the same. In fact, we did not need to treat conditional probability separately at all; it is just the special case $Y = \mathbf{1}_A$ of conditional expectation.
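The consistency claim can be checked on a finite example. The sketch below uses a hypothetical four-point space and a signed $Y$. It computes $E(Z \mid X = x)$ as the ratio $Q_Z(\{x\}) / P_X(\{x\})$ for a general integrable $Z$. Applied to the indicator $\mathbf{1}_A$, it reproduces the elementary $P(A \mid X = x)$.

```python
from fractions import Fraction

# Hypothetical finite space with four equally likely outcomes.
P = {w: Fraction(1, 4) for w in range(4)}
X = {0: 0, 1: 0, 2: 1, 3: 1}
Y = {0: -3, 1: 1, 2: 2, 3: 2}  # signed, so Q_Y can take negative values

def p_x(x):
    """P_X({x}) = P(X = x)."""
    return sum(P[w] for w in P if X[w] == x)

def q(z, x):
    """Q_Z({x}) = E(Z 1{X = x})."""
    return sum(z[w] * P[w] for w in P if X[w] == x)

def cond_exp(z, x):
    """E(Z | X = x) = dQ_Z/dP_X at the atom x (requires P(X = x) > 0)."""
    return q(z, x) / p_x(x)

A = {1, 2}
indicator_A = {w: 1 if w in A else 0 for w in P}

def cond_prob(x):
    """Elementary P(A | X = x) = P(A and X = x) / P(X = x)."""
    return sum(P[w] for w in A if X[w] == x) / p_x(x)

for x in (0, 1):
    print(x, cond_exp(Y, x), cond_exp(indicator_A, x), cond_prob(x))
```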
Expectation conditional on a random variable and on a σ-algebra
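Let $g(x) = E(Y \mid X = x)$ be the conditional expectation of the random variable $Y$ given that the random variable $X = x$. Here $x$ is a fixed, deterministic value. Now take $x$ random, namely the value of the random variable $X$, $x = X(\omega)$. The result is called the conditional expectation of $Y$ given $X$, which is the random variable
$$E(Y \mid X)(\omega) = g\bigl(X(\omega)\bigr), \quad \omega \in \Omega.$$
So now we have the conditional expectation given $X$ in terms of the sample space $\Omega$ rather than in terms of $\mathbb{R}$, the range space of the random variable $X$. It will turn out that after this change of the independent variable, the particular values attained by the random variable $X$ do not matter much; rather, it is the granularity of $X$ that is important. The granularity of $X$ can be expressed in terms of the σ-algebra generated by the random variable $X$, which is
$$\sigma(X) = \bigl\{\{X \in B\} : B \in \mathcal{B}\bigr\}.$$
By substitution, the conditional expectation $E(Y \mid X)$ satisfies
$$\int_{\{X \in B\}} Y\, dP = \int_B g\, dP_X = \int_{\{X \in B\}} E(Y \mid X)\, dP \quad \text{for all } B \in \mathcal{B},$$
which, by writing
$$G = \{X \in B\} \in \sigma(X),$$
is seen to be the same as
$$\int_G Y\, dP = \int_G E(Y \mid X)\, dP \quad \text{for all } G \in \sigma(X).$$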
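It can be proved that for any σ-algebra $\mathcal{G} \subseteq \mathcal{F}$ there exists a $\mathcal{G}$-measurable random variable $E(Y \mid \mathcal{G})$ such that
$$\int_G Y\, dP = \int_G E(Y \mid \mathcal{G})\, dP \quad \text{for all } G \in \mathcal{G},$$
and that it is defined by this equation uniquely, up to equality a.e. in $P$ [5]. The random variable $E(Y \mid \mathcal{G})$ is called the conditional expectation of $Y$ given the σ-algebra $\mathcal{G}$. It can be interpreted as a sort of averaging of the random variable $Y$ to the granularity given by the σ-algebra $\mathcal{G}$ [6].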
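The claim that only the granularity of $X$ matters can be seen on a finite space. The sketch below uses a hypothetical six-point space. It computes $E(Y \mid X)$ by averaging $Y$ over the level sets of $X$ and verifies the defining equation for every event in $\sigma(X)$, enumerated as all unions of level sets. It then shows that relabeling the values of $X$ by a bijection leaves $E(Y \mid X)$ unchanged.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite space with six equally likely outcomes.
P = {w: Fraction(1, 6) for w in range(6)}
X = {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5: 3}
Y = {0: 3, 1: 5, 2: 0, 3: 6, 4: 3, 5: -1}

def cond_exp_given(x, y, p):
    """E(Y | X) as a function of w: the average of Y over the level set {X = X(w)}."""
    out = {}
    for w in p:
        level = [v for v in p if x[v] == x[w]]
        mass = sum(p[v] for v in level)
        out[w] = sum(y[v] * p[v] for v in level) / mass
    return out

def sigma_algebra(x, p):
    """All events in sigma(X) on a finite space: unions of level sets {X = c}."""
    levels = {}
    for w in p:
        levels.setdefault(x[w], set()).add(w)
    blocks = list(levels.values())
    return [set().union(*chosen)
            for r in range(len(blocks) + 1)
            for chosen in combinations(blocks, r)]

e1 = cond_exp_given(X, Y, P)
for G in sigma_algebra(X, P):
    assert sum(Y[w] * P[w] for w in G) == sum(e1[w] * P[w] for w in G)

# Relabel the values of X by a bijection; sigma(X), hence E(Y | X), does not change.
X2 = {w: 10 ** x for w, x in X.items()}
e2 = cond_exp_given(X2, Y, P)
print(e1 == e2)  # True
```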
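The conditional probability of an event (that is, a set) $A \in \mathcal{F}$ given the σ-algebra $\mathcal{G}$ is obtained by substituting $Y = \mathbf{1}_A$. This gives $P(A \mid \mathcal{G}) = E(\mathbf{1}_A \mid \mathcal{G})$, characterized by
$$P(A \cap G) = \int_G P(A \mid \mathcal{G})\, dP \quad \text{for all } G \in \mathcal{G}.$$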
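An event $A$ is defined to be independent of a σ-algebra $\mathcal{G}$ if $A$ and any $G \in \mathcal{G}$ are independent. It is easy to see that $A$ is independent of the σ-algebra $\mathcal{G}$ if and only if
$$\int_G P(A \mid \mathcal{G})\, dP = P(A \cap G) = P(A)\, P(G) = \int_G P(A)\, dP \quad \text{for all } G \in \mathcal{G},$$
that is, if and only if $P(A \mid \mathcal{G}) = P(A)$ a.s. (which is a particularly obscure way to write independence, given how complicated the definitions are).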
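The criterion $P(A \mid \mathcal{G}) = P(A)$ a.s. can be checked directly on a finite space. The sketch below assumes two fair coin tosses (a hypothetical example) and takes $\mathcal{G}$ to be the σ-algebra generated by the first toss. Every set in this $\mathcal{G}$ is a union of its two atoms, so comparing values atom by atom is enough.

```python
from fractions import Fraction

OMEGA = ["HH", "HT", "TH", "TT"]
P = {w: Fraction(1, 4) for w in OMEGA}
# G = sigma(first toss): the atoms group outcomes by their first letter.
ATOMS = [{"HH", "HT"}, {"TH", "TT"}]

def cond_prob_given_sigma(event, atoms, p):
    """P(A | G)(w): on each atom G_i of positive probability, P(A and G_i) / P(G_i)."""
    out = {}
    for atom in atoms:
        mass = sum(p[w] for w in atom)
        value = sum(p[w] for w in atom if w in event) / mass
        for w in atom:
            out[w] = value
    return out

def independent_of_sigma(event, atoms, p):
    """A is independent of G iff P(A | G) = P(A) almost surely."""
    pa = sum(p[w] for w in event)
    cp = cond_prob_given_sigma(event, atoms, p)
    return all(cp[w] == pa for w in p if p[w] > 0)

second_heads = {"HH", "TH"}
both_heads = {"HH"}
print(independent_of_sigma(second_heads, ATOMS, P))  # True
print(independent_of_sigma(both_heads, ATOMS, P))    # False
```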
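Two random variables $X$, $Y$ are said to be independent if
$$P(\{X \in B\} \cap \{Y \in C\}) = P(X \in B)\, P(Y \in C) \quad \text{for all } B, C \in \mathcal{B},$$
which is now seen to be the same as
$$P(Y \in C \mid X) = P(Y \in C) \text{ a.s.} \quad \text{for all } C \in \mathcal{B},$$
where $P(Y \in C \mid X) = P(\{Y \in C\} \mid \sigma(X))$.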
Properties of conditional expectation
To be done.
Conditional density and likelihood
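Now that we have $P(A \mid X = x)$ for an arbitrary event $A$, we can define the conditional probability $P(Y \in B \mid X = x)$ for a random variable $Y$ and a Borel set $B$. Thus we can define the conditional density as the Radon-Nikodym derivative
$$f_{Y \mid X}(y \mid x) = \frac{d\, P(Y \in \cdot \mid X = x)}{d\lambda}(y),$$
where $\lambda$ is the Lebesgue measure. In the conditional density $f_{Y \mid X}(y \mid x)$, $X$ and $Y$ are the random variables that identify the density function, and $x$ and $y$ are the arguments of the density function.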
Note that in general $f_{Y \mid X}(y \mid x)$ is defined only for almost all $y$ (in Lebesgue measure) and almost all $x$ (in the measure $P_X$ generated by the random variable $X$). Under reasonable additional conditions (for example, it is enough to assume that the joint density $f_{X,Y}$ is continuous at $(x, y)$ and that $f_X(x) > 0$), the density of $Y$ conditional on $X = x$ satisfies
$$f_{Y \mid X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}.$$
Note that this density is a deterministic function.
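The density of a random variable $Y$ conditional on a random variable $X$ is
$$f_{Y \mid X}(y \mid X) = \frac{f_{X,Y}(X, y)}{f_X(X)}.$$
It is a function-valued random variable obtained from the deterministic function $f_{Y \mid X}(y \mid x)$ by taking $x$ to be the value of the random variable $X$.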
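A common shorthand for the conditional density $f_{Y \mid X}(y \mid x)$ is
$$f(y \mid x).$$
This abuse of notation identifies a function by the symbols for its arguments, which is incorrect. Imagine that we wish to evaluate the conditional density of $Y$ at $y = 1$ given $X = 2$. Then $f(y \mid x)$ becomes $f(1 \mid 2)$, which is nonsense, since nothing in it indicates which random variables are involved.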
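When the value of $x$ is fixed, the function $y \mapsto f_{Y \mid X}(y \mid x)$ is a probability density function of $Y$ (conditional on $X = x$). When the value of $y$ is fixed, the function $x \mapsto f_{Y \mid X}(y \mid x)$ is called the likelihood function.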
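The difference between the two readings shows up numerically. The sketch below assumes a hypothetical standard bivariate normal pair with correlation $\rho = 0.5$, for which $Y$ given $X = x$ is normal with mean $\rho x$ and variance $1 - \rho^2$. Integrating over $y$ for fixed $x$ gives 1, as for any density. Integrating the likelihood over $x$ for fixed $y$ gives $1/|\rho| = 2$, so the likelihood is not a probability density in $x$. The trapezoidal rule on $[-30, 30]$ is an approximation choice.

```python
import math

RHO = 0.5  # assumed correlation of a standard bivariate normal pair (X, Y)

def cond_density(y, x, rho=RHO):
    """f_{Y|X}(y | x) for a standard bivariate normal: N(rho * x, 1 - rho^2) in y."""
    var = 1.0 - rho * rho
    return math.exp(-(y - rho * x) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def trapezoid(f, a, b, n=6000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

x0, y0 = 1.0, 1.0
density_mass = trapezoid(lambda y: cond_density(y, x0), -30.0, 30.0)     # fixed x: a density in y
likelihood_mass = trapezoid(lambda x: cond_density(y0, x), -30.0, 30.0)  # fixed y: likelihood in x
print(density_mass, likelihood_mass)  # approximately 1.0 and 2.0
```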
- [1] William Feller. An introduction to probability theory and its applications. Vol. I. Third edition. John Wiley & Sons Inc., New York, 1968.
- [2] Wikipedia. Conditional expectation. Version as of 18:29, 28 March 2007 (UTC), 2007.
- [3] Samuel S. Wilks. Mathematical statistics. A Wiley Publication in Mathematical Statistics. John Wiley & Sons Inc., New York, 1962.
- [4] A. N. Kolmogorov. Foundations of the theory of probability. Chelsea Publishing Co., New York, 1956. Translation edited by Nathan Morrison, with an added bibliography by A. T. Bharucha-Reid.
- [5] Claude Dellacherie and Paul-André Meyer. Probabilities and potential, volume 29 of North-Holland Mathematics Studies. North-Holland Publishing Co., Amsterdam, 1978.
- [6] S. R. S. Varadhan. Probability theory, volume 7 of Courant Lecture Notes in Mathematics. New York University Courant Institute of Mathematical Sciences, New York, 2001.