Empirical measure
In probability theory, an empirical measure is a random measure arising from a particular realization of a (usually finite) sequence of random variables. The precise definition is found below. Empirical measures are relevant to mathematical statistics.
The motivation for studying empirical measures is that it is often impossible to know the true underlying probability measure $P$. We collect observations $X_1, X_2, \dots, X_n$ and compute relative frequencies. We can estimate $P$, or a related distribution function $F$, by means of the empirical measure or the empirical distribution function, respectively. Under certain conditions, for example when the class of sets involved is Glivenko–Cantelli, these estimates are uniformly good. Theorems in the area of empirical processes provide rates of this convergence.
Definition
Let $X_1, X_2, \dots$ be a sequence of independent identically distributed random variables taking values in the state space $S$ and having common distribution $P$.
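Definition
The empirical measure $P_n$ is defined for measurable subsets $A$ of $S$ and given by
$$P_n(A) = \frac{1}{n}\sum_{i=1}^n I_A(X_i) = \frac{1}{n}\sum_{i=1}^n \delta_{X_i}(A),$$
where $I_A$ is the indicator function of $A$ and $\delta_{X_i}$ is the Dirac measure at $X_i$.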
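A minimal sketch in Python, assuming the observations are held in a list and that a measurable set $A$ is represented either by a membership predicate or by a finite Python `set` (representation choices made here for illustration only). The two functions mirror the two expressions in the definition: averaging the indicator $I_A(X_i)$, and summing point masses of weight $1/n$ placed at the observed values.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def empirical_measure(sample: Sequence[Hashable], in_A: Callable[[Hashable], bool]) -> float:
    """P_n(A) = (1/n) * sum_i I_A(X_i), with A given as a membership predicate."""
    n = len(sample)
    return sum(1 for x in sample if in_A(x)) / n


def empirical_measure_dirac(sample: Sequence[Hashable], A: set) -> float:
    """P_n(A) = (1/n) * sum_i delta_{X_i}(A): each observation puts mass 1/n on its value."""
    n = len(sample)
    masses = Counter(sample)  # observed value -> number of observations equal to it
    return sum(count for value, count in masses.items() if value in A) / n


# Hypothetical observations of a die roll.
sample = [3, 6, 1, 6, 2, 5, 6, 4]
A = {5, 6}
print(empirical_measure(sample, lambda x: x in A))  # 0.5
print(empirical_measure_dirac(sample, A))           # 0.5
```

Both forms count the four observations that fall in $A$ and divide by $n = 8$.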
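For a fixed measurable set $A$, $nP_n(A) = \sum_{i=1}^n I_A(X_i)$ counts the observations falling in $A$. It is a sum of $n$ independent Bernoulli variables with success probability $P(A)$, hence a binomial random variable with mean $nP(A)$ and variance $nP(A)(1 - P(A))$. In particular, $P_n(A)$ is an unbiased estimator of $P(A)$.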
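The binomial claim can be checked exactly on a small discrete example. The sketch below assumes, purely for illustration, that $S=\{1,\dots,6\}$ with $P$ uniform (a fair die), $A=\{5,6\}$ and $n=4$; it enumerates all $6^4$ equally likely samples with exact rational arithmetic and compares the law of $nP_n(A)$ with the binomial probabilities.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Illustrative assumption: S = {1, ..., 6} with P uniform (a fair die), A = {5, 6}, n = 4.
S = range(1, 7)
A = {5, 6}
n = 4
p_A = Fraction(len(A), len(S))  # P(A) = 1/3

# Exact law of n * P_n(A) = #{i : X_i in A}; each of the 6**n samples has probability 6**-n.
law = {k: Fraction(0) for k in range(n + 1)}
for sample in product(S, repeat=n):
    k = sum(1 for x in sample if x in A)
    law[k] += Fraction(1, len(S) ** n)

binomial = {k: comb(n, k) * p_A**k * (1 - p_A) ** (n - k) for k in range(n + 1)}
mean = sum(k * q for k, q in law.items())
var = sum((k - mean) ** 2 * q for k, q in law.items())

print(law == binomial)             # True: n P_n(A) ~ Binomial(n, P(A))
print(mean == n * p_A)             # True: E[n P_n(A)] = n P(A)
print(var == n * p_A * (1 - p_A))  # True: Var[n P_n(A)] = n P(A)(1 - P(A))
print(mean / n == p_A)             # True: P_n(A) is unbiased for P(A)
```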
Definition
$(P_n(c))_{c\in\mathcal{C}}$ is the empirical measure indexed by $\mathcal{C}$, a collection of measurable subsets of $S$.
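When $\mathcal{C}$ is a finite collection of half-open intervals $[a,b)$ of the real line (an assumption made here for illustration), the whole indexed family can be computed after sorting the sample once, since $nP_n([a,b))$ is the number of observations below $b$ minus the number below $a$. The function name `indexed_empirical_measure` is hypothetical.

```python
from bisect import bisect_left


def indexed_empirical_measure(sample, intervals):
    """Return {(a, b): P_n([a, b))} for every interval [a, b) in the collection.

    Sorting once costs O(n log n); each P_n([a, b)) then costs O(log n),
    because bisect_left(xs, t) is the number of observations strictly below t.
    """
    xs = sorted(sample)
    n = len(xs)
    return {(a, b): (bisect_left(xs, b) - bisect_left(xs, a)) / n for (a, b) in intervals}


# Hypothetical sample and collection C of three intervals.
sample = [0.12, 0.55, 0.31, 0.97, 0.48, 0.73, 0.05, 0.66]
C = [(0.0, 0.5), (0.5, 1.0), (0.25, 0.75)]
print(indexed_empirical_measure(sample, C))
# {(0.0, 0.5): 0.5, (0.5, 1.0): 0.5, (0.25, 0.75): 0.625}
```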
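To generalize this notion further, observe that the empirical measure $P_n$ maps measurable functions $f\colon S \to \mathbb{R}$ to their empirical mean,
$$f \mapsto P_n f = \int_S f \, dP_n = \frac{1}{n}\sum_{i=1}^n f(X_i).$$
In particular, the empirical measure of $A$ is simply the empirical mean of the indicator function, $P_n(A) = P_n I_A$.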
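A short sketch of this functional view, assuming real-valued observations: `empirical_mean` (a hypothetical helper name) evaluates $P_n f$ for any function `f`, and applying it to an indicator function recovers $P_n(A) = P_n I_A$.

```python
from statistics import fmean
from typing import Callable, Sequence


def empirical_mean(sample: Sequence[float], f: Callable[[float], float]) -> float:
    """P_n f = (1/n) * sum_i f(X_i), the integral of f with respect to P_n."""
    return fmean(f(x) for x in sample)


def indicator(in_A: Callable[[float], bool]) -> Callable[[float], float]:
    """Turn a membership predicate for A into the indicator function I_A."""
    return lambda x: 1.0 if in_A(x) else 0.0


def in_A(x: float) -> bool:
    """Hypothetical set A = (1, infinity)."""
    return x > 1.0


sample = [2.0, -1.0, 3.5, 0.5]
print(empirical_mean(sample, lambda x: x))      # 1.25 (the sample mean)
print(empirical_mean(sample, lambda x: x * x))  # 4.375 (second empirical moment)
print(empirical_mean(sample, indicator(in_A)))  # 0.5 = P_n(A)
```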
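For a fixed measurable function $f$ with $\mathbb{E}f(X_1)^2 < \infty$, $P_n f$ is a random variable with mean $\mathbb{E}f(X_1)$ and variance $\frac{1}{n}\mathbb{E}\big(f(X_1) - \mathbb{E}f(X_1)\big)^2$.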
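The mean and variance formulas can be verified exactly by enumeration. The setup is an illustrative assumption: a fair die, $f(x)=x^2$ and $n=3$, so that all $6^3$ samples are equally likely.

```python
from fractions import Fraction
from itertools import product

# Illustrative assumption: a fair die (S = {1, ..., 6}), f(x) = x**2 and n = 3.
S = range(1, 7)
n = 3


def f(x: int) -> int:
    return x * x


# Moments of f(X_1) under the uniform law on S.
Ef = Fraction(sum(f(x) for x in S), len(S))       # E f(X_1) = 91/6
var_f = sum((f(x) - Ef) ** 2 for x in S) / len(S)  # E (f(X_1) - E f(X_1))**2

# Exact law of P_n f: each of the 6**n samples has probability 6**-n.
prob = Fraction(1, len(S) ** n)
values = [Fraction(sum(f(x) for x in sample), n) for sample in product(S, repeat=n)]
mean = sum(values) * prob
var = sum((v - mean) ** 2 for v in values) * prob

print(mean == Ef)        # True: E[P_n f] = E f(X_1)
print(var == var_f / n)  # True: Var[P_n f] = Var f(X_1) / n
```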
By the strong law of large numbers, $P_n(A)$ converges to $P(A)$ almost surely for fixed $A$. Similarly, $P_n f$ converges to $\mathbb{E}f(X_1)$ almost surely for a fixed measurable function $f$ with $\mathbb{E}|f(X_1)| < \infty$. The problem of uniform convergence of $P_n$ to $P$ over general classes of sets was open until Vapnik and Chervonenkis solved it in 1968.[1]
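A seeded simulation illustrates the almost sure convergence of $P_n f$ for one fixed $f$. It assumes, for illustration, $X_i \sim \mathrm{Uniform}(0,1)$ and $f(x)=x^2$, so that $\mathbb{E}f(X_1)=1/3$. The printed errors should typically shrink as $n$ grows, although a single run only illustrates the theorem and does not prove it.

```python
import random
from typing import Callable, Dict, Set


def running_empirical_mean(
    rng: random.Random,
    f: Callable[[float], float],
    draw: Callable[[random.Random], float],
    checkpoints: Set[int],
) -> Dict[int, float]:
    """Follow one growing i.i.d. sample X_1, X_2, ... and record P_n f at each n in checkpoints."""
    total = 0.0
    out: Dict[int, float] = {}
    for n in range(1, max(checkpoints) + 1):
        total += f(draw(rng))  # add f(X_n) for a fresh draw X_n
        if n in checkpoints:
            out[n] = total / n
    return out


# Illustrative assumption: X_i ~ Uniform(0, 1) and f(x) = x**2, so E f(X_1) = 1/3.
rng = random.Random(2011)  # fixed seed for reproducibility
estimates = running_empirical_mean(rng, lambda x: x * x, lambda r: r.random(), {10, 1_000, 100_000})
for n, est in sorted(estimates.items()):
    print(f"n={n:>7}  P_n f={est:.5f}  error={abs(est - 1 / 3):.5f}")
```

The same sample path is reused for every checkpoint, which matches the almost sure statement: it concerns one infinite sequence $X_1, X_2, \dots$, not independent samples of each size.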
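If the class $\mathcal{C}$ (or a class $\mathcal{F}$ of measurable functions) is Glivenko–Cantelli with respect to $P$, then $P_n$ converges to $P$ uniformly over $c \in \mathcal{C}$ (or $f \in \mathcal{F}$). In other words, with probability 1 we have
$$\|P_n - P\|_{\mathcal{C}} = \sup_{c\in\mathcal{C}} |P_n(c) - P(c)| \to 0,$$
$$\|P_n - P\|_{\mathcal{F}} = \sup_{f\in\mathcal{F}} |P_n f - \mathbb{E}f(X_1)| \to 0.$$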
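For a finite state space and $\mathcal{C}$ equal to the class of all subsets of $S$, the supremum $\sup_{c\in\mathcal{C}}|P_n(c)-P(c)|$ can be computed by brute force, and it coincides with the total variation distance $\tfrac12\sum_{s\in S}|P_n(\{s\})-P(\{s\})|$. The fair-die model and the twelve rolls below are hypothetical data chosen for illustration.

```python
from fractions import Fraction
from itertools import chain, combinations


def sup_over_all_subsets(p_n, p):
    """sup over all subsets A of S of |P_n(A) - P(A)|, by brute force over the 2**|S| subsets."""
    states = list(p)
    subsets = chain.from_iterable(combinations(states, r) for r in range(len(states) + 1))
    return max(abs(sum(p_n[s] - p[s] for s in A)) for A in subsets)


def half_l1(p_n, p):
    """Closed form of the same supremum: (1/2) * sum over s of |P_n({s}) - P({s})|."""
    return sum(abs(p_n[s] - p[s]) for s in p) / 2


# Hypothetical data: twelve rolls of a die assumed fair, so P({s}) = 1/6.
rolls = [1, 3, 3, 6, 2, 5, 6, 6, 4, 1, 3, 6]
p = {s: Fraction(1, 6) for s in range(1, 7)}
p_n = {s: Fraction(rolls.count(s), len(rolls)) for s in p}

print(sup_over_all_subsets(p_n, p))  # 1/4, attained e.g. at A = {3, 6}
print(half_l1(p_n, p))               # 1/4
```

Because this class is finite, applying the strong law of large numbers to each of its finitely many sets already shows that it is Glivenko–Cantelli.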
Empirical distribution function
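The empirical distribution function provides an example of empirical measures. For real-valued iid random variables $X_1, \dots, X_n$ it is given by
$$F_n(x) = P_n((-\infty, x]) = P_n I_{(-\infty, x]}.$$
In this case, empirical measures are indexed by the class $\mathcal{C} = \{(-\infty, x] : x \in \mathbb{R}\}$.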
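It has been shown that $\mathcal{C}$ is a uniform Glivenko–Cantelli class. In particular, whatever the distribution function $F$ of the $X_i$,
$$\|F_n - F\|_\infty = \sup_{x\in\mathbb{R}} |F_n(x) - F(x)| \to 0$$
with probability 1, and this convergence is uniform in $F$: for every $\varepsilon > 0$,
$$\sup_F \Pr\Big(\sup_{m\ge n} \|F_m - F\|_\infty > \varepsilon\Big) \to 0 \quad \text{as } n \to \infty.$$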
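A sketch of $F_n$ and of the distance $\|F_n - F\|_\infty$, assuming real-valued data and a continuous $F$. For continuous $F$ the supremum can be evaluated exactly at the order statistics $x_{(1)}\le\dots\le x_{(n)}$ as $\max_i \max\{i/n - F(x_{(i)}),\, F(x_{(i)}) - (i-1)/n\}$, which is the Kolmogorov–Smirnov statistic. The uniform law on $(0,1)$ and the seed are illustrative assumptions.

```python
import random
from bisect import bisect_right
from typing import Callable, Sequence


def ecdf(sample: Sequence[float]) -> Callable[[float], float]:
    """Return F_n, where F_n(x) = P_n((-inf, x]) = #{i : X_i <= x} / n."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def sup_distance(sample: Sequence[float], F: Callable[[float], float]) -> float:
    """||F_n - F||_inf for a continuous distribution function F.

    F_n is a right-continuous step function, so the supremum is reached at a jump
    x_(i) or approached just to its left: compare F(x_(i)) with i/n and (i - 1)/n.
    """
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - F(x), F(x) - (i - 1) / n) for i, x in enumerate(xs, start=1))


def uniform_cdf(x: float) -> float:
    """Distribution function of Uniform(0, 1), the illustrative choice of F here."""
    return min(max(x, 0.0), 1.0)


rng = random.Random(1968)  # fixed seed for reproducibility
for n in (10, 100, 10_000):
    sample = [rng.random() for _ in range(n)]
    print(n, round(sup_distance(sample, uniform_cdf), 4))
```

Evaluating $|F_n(x) - F(x)|$ on a grid would only give a lower bound on the supremum; the order-statistic formula avoids that by checking both sides of every jump.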
References
- [1] V. Vapnik and A. Chervonenkis, Uniform convergence of frequencies of occurrence of events to their probabilities, Dokl. Akad. Nauk SSSR, 181, 1968.
- P. Billingsley, Probability and Measure, John Wiley and Sons, New York, third edition, 1995.
- M.D. Donsker, Justification and extension of Doob's heuristic approach to the Kolmogorov–Smirnov theorems, Annals of Mathematical Statistics, 23:277–281, 1952.
- R.M. Dudley, Central limit theorems for empirical measures, Annals of Probability, 6(6):899–929, 1978.
- R.M. Dudley, Uniform Central Limit Theorems, Cambridge Studies in Advanced Mathematics, 63, Cambridge University Press, Cambridge, UK, 1999.
- J. Wolfowitz, Generalization of the theorem of Glivenko–Cantelli, Annals of Mathematical Statistics, 25:131–138, 1954.

