User:Igny/empirical measure

From Wikipedia, the free encyclopedia

In probability theory, an empirical measure is a measure arising from a particular realization of a (usually finite) sequence of random variables. The precise definition is found below. Empirical measures are relevant to mathematical statistics.

The motivation for studying empirical measures is that it is often impossible to know the true underlying probability measure $P$. We collect observations $X_1, X_2, \dots, X_n$ and compute relative frequencies. We can estimate $P$, or a related distribution function $F$, by means of the empirical measure or empirical distribution function, respectively. These are uniformly good estimates under certain conditions. Theorems in the area of empirical processes provide rates of this convergence.

Definition

Let $X_1, X_2, \dots$ be a sequence of independent identically distributed random variables with values in the state space $S$ with probability measure $P$.

Definition

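The empirical measure $P_n$ is defined for measurable subsets $A$ of $S$ and given by

$$P_n(A) = \frac{1}{n} \sum_{i=1}^{n} I_A(X_i) = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}(A),$$

where $I_A$ is the indicator function and $\delta_X$ is the Dirac measure.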
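As an illustration of the definition, the following minimal Python sketch computes the empirical measure of a set as the fraction of observations that fall in it. The function name `empirical_measure`, the representation of the set by a membership predicate, and the sample values are assumptions made for this example only.

```python
from typing import Callable, Sequence


def empirical_measure(sample: Sequence[float],
                      indicator: Callable[[float], bool]) -> float:
    """Return P_n(A) = (1/n) * sum_i I_A(X_i), with A given by `indicator`."""
    if not sample:
        raise ValueError("sample must be non-empty")
    return sum(1 for x in sample if indicator(x)) / len(sample)


# A fixed realization x_1, ..., x_5 (illustrative values, not from the article).
observations = [0.2, 1.5, -0.7, 3.1, 0.9]

# A = [0, 1]: two of the five observations (0.2 and 0.9) fall in A.
p_n = empirical_measure(observations, lambda x: 0.0 <= x <= 1.0)
print(p_n)  # 0.4
```

Each observation contributes mass 1/n, which is the sum of Dirac measures in the definition read off pointwise.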

Definition

$\bigl(P_n(c)\bigr)_{c \in \mathcal{C}}$ is the empirical measure indexed by $\mathcal{C}$, a collection of measurable subsets of $S$.

For a fixed measurable set $A$, $nP_n(A)$ is a binomial random variable with mean $nP(A)$ and variance $nP(A)(1-P(A))$.
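The binomial claim can be checked numerically. The sketch below is a simulation under stated assumptions, not part of the theory: the observations are taken to be Uniform(0, 1), the set is A = [0, p) so that P(A) = p, and the values of n, p, the seed, and the number of repetitions are arbitrary choices. It counts how many of n observations fall in A, repeats this many times, and compares the simulated mean and variance of the count with nP(A) and nP(A)(1-P(A)).

```python
import random
from statistics import fmean, pvariance


def count_in_set(rng: random.Random, n: int, p: float) -> int:
    """Return n * P_n(A) for n iid Uniform(0, 1) draws and A = [0, p)."""
    return sum(1 for _ in range(n) if rng.random() < p)


rng = random.Random(0)  # fixed seed so the run is reproducible
n, p, repetitions = 50, 0.3, 2000

counts = [count_in_set(rng, n, p) for _ in range(repetitions)]

print("simulated mean    :", fmean(counts), "| theory:", n * p)
print("simulated variance:", pvariance(counts), "| theory:", n * p * (1 - p))
```

The simulated values should be close to, but not exactly equal to, the theoretical ones, since they are themselves estimates from finitely many repetitions.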

By the strong law of large numbers, $P_n(A)$ converges to $P(A)$ almost surely for fixed $A$. The problem of uniform convergence of $P_n$ to $P$ was open until Vapnik and Chervonenkis solved it in 1968. If the class $\mathcal{C}$ is Glivenko-Cantelli with respect to $P$ then $P_n$ converges to $P$ uniformly over $c \in \mathcal{C}$. In other words, with probability 1 we have

$$\|P_n - P\|_{\mathcal{C}} = \sup_{c \in \mathcal{C}} |P_n(c) - P(c)| \to 0.$$

Empirical mean

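To generalize this notion further, observe that the empirical measure $P_n$ maps measurable functions $f : S \to \mathbb{R}$ to their empirical mean,

$$f \mapsto P_n f = \int_S f \, dP_n = \frac{1}{n} \sum_{i=1}^{n} f(X_i).$$

In particular, the empirical measure of $A$ is simply the empirical mean of the indicator function, $P_n(A) = P_n I_A$.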
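The map from functions to empirical means can be sketched in a few lines of Python. The name `empirical_mean`, the choice of test functions, and the sample values are assumptions for illustration. The last lines check that applying the map to an indicator function gives the relative frequency of the corresponding set.

```python
from typing import Callable, Sequence


def empirical_mean(sample: Sequence[float],
                   f: Callable[[float], float]) -> float:
    """Return P_n f = (1/n) * sum_i f(X_i)."""
    if not sample:
        raise ValueError("sample must be non-empty")
    return sum(f(x) for x in sample) / len(sample)


def indicator_unit_interval(x: float) -> float:
    """Indicator function I_A of the set A = [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


# Illustrative observations (not from the article).
observations = [0.2, 1.5, -0.7, 3.1, 0.9]

# Empirical mean of f(x) = x^2, i.e. the empirical second moment.
second_moment = empirical_mean(observations, lambda x: x * x)

# Empirical mean of I_A is the fraction of observations in A (2 of 5 here).
p_n_A = empirical_mean(observations, indicator_unit_interval)

print(second_moment)
print(p_n_A)  # 0.4
```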

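For a fixed measurable function $f$, $P_n f$ is a random variable with mean $\mathbb{E}f$ and variance $\frac{1}{n}\mathbb{E}(f - \mathbb{E}f)^2$. Similarly, $P_n f$ converges to $\mathbb{E}f$ almost surely for a fixed measurable function $f$. If the class $\mathcal{F}$ of measurable functions is Glivenko-Cantelli then with probability 1 we have

$$\|P_n - P\|_{\mathcal{F}} = \sup_{f \in \mathcal{F}} |P_n f - \mathbb{E}f| \to 0.$$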

Empirical distribution function

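The empirical distribution function provides an example of empirical measures. For real-valued iid random variables $X_1, \dots, X_n$ it is given by

$$F_n(x) = P_n((-\infty, x]) = P_n I_{(-\infty, x]}.$$

In this case, empirical measures are indexed by a class $\mathcal{C} = \{(-\infty, x] : x \in \mathbb{R}\}$. It has been shown that $\mathcal{C}$ is a uniform Glivenko-Cantelli class, in particular,

$$\|F_n - F\|_\infty = \sup_{x \in \mathbb{R}} |F_n(x) - F(x)| \to 0$$

with probability 1.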
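The uniform convergence of the empirical distribution function can be observed numerically. The following sketch builds F_n from a sample and computes the largest gap between F_n and the true distribution function. As an assumption for this example, the observations are Uniform(0, 1), so the true distribution function is F(x) = x on [0, 1]; the function names, the seed, and the sample sizes are illustrative choices.

```python
import bisect
import random
from typing import Callable, Sequence


def ecdf(sample: Sequence[float]) -> Callable[[float], float]:
    """Return F_n, where F_n(x) is the fraction of observations <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def F_n(x: float) -> float:
        # bisect_right counts the observations that are <= x.
        return bisect.bisect_right(ordered, x) / n

    return F_n


def sup_distance_to_uniform(sample: Sequence[float]) -> float:
    """Return sup_x |F_n(x) - x| for observations in [0, 1].

    F_n is a step function that jumps at each order statistic, so the
    supremum is reached at a jump or immediately before one.
    """
    ordered = sorted(sample)
    n = len(ordered)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(ordered))


F_3 = ecdf([0.5, 0.1, 0.9])
print(F_3(0.5))  # two of the three observations are <= 0.5

rng = random.Random(0)  # fixed seed so the run is reproducible
distances = {
    n: sup_distance_to_uniform([rng.random() for _ in range(n)])
    for n in (100, 10_000)
}
for n, d in distances.items():
    print(n, d)
```

For a typical realization the distance printed for the larger sample is noticeably smaller, in line with the convergence stated above.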

Kernel estimation

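For a metric space $S$, the Dirac measure $\delta_{X_i}$ can be replaced by an arbitrary measure $\mu_{X_i}$ centered at $X_i$:

$$\tilde{P}_n(A) = \frac{1}{n} \sum_{i=1}^{n} \mu_{X_i}(A).$$

This corresponds to the following map of measurable functions:

$$f \mapsto \tilde{P}_n f = \frac{1}{n} \sum_{i=1}^{n} \int_S f \, d\mu_{X_i}.$$

For $S = \mathbb{R}$ the measures $\mu_{X_i}$ are usually chosen to have a density with respect to the Lebesgue measure $dx$, that is

$$\mu_{X_i}(dx) = \frac{1}{h} K\!\left(\frac{x - X_i}{h}\right) dx,$$

where $K$ is a kernel with a bandwidth $h$. See kernel density estimation for more details.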
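A minimal sketch of this smoothing on the real line, assuming a Gaussian kernel (the text above does not fix a particular kernel): each point mass at an observation is replaced by a normal distribution centered at that observation with standard deviation h. The function names, the sample values, and the bandwidths are illustrative assumptions.

```python
import math
from typing import Sequence


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def standard_normal_cdf(u: float) -> float:
    """Distribution function of the standard normal law."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def kernel_density(sample: Sequence[float], x: float, h: float) -> float:
    """Density of the smoothed measure at x: (1/(n*h)) * sum_i K((x - X_i)/h)."""
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


def smoothed_measure(sample: Sequence[float], a: float, b: float, h: float) -> float:
    """Mass that the smoothed measure assigns to the interval (a, b]."""
    return sum(
        standard_normal_cdf((b - xi) / h) - standard_normal_cdf((a - xi) / h)
        for xi in sample
    ) / len(sample)


# Illustrative observations (not from the article).
observations = [0.2, 1.5, -0.7, 3.1, 0.9]

print(kernel_density(observations, 0.5, h=0.5))
print(smoothed_measure(observations, 0.0, 1.0, h=0.5))
# With a very small bandwidth the smoothed mass of [0, 1] approaches the
# empirical measure of [0, 1], which is 2/5 for these observations.
print(smoothed_measure(observations, 0.0, 1.0, h=1e-3))
```

The bandwidth controls how far each observation's mass is spread: a large h gives a smoother but more biased estimate, while a small h stays close to the unsmoothed empirical measure.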

References

  • P. Billingsley, Probability and Measure, John Wiley and Sons, New York, third edition, 1995.
  • M.D. Donsker, Justification and extension of Doob's heuristic approach to the Kolmogorov-Smirnov theorems, Annals of Mathematical Statistics, 23: 277–281, 1952.
  • R.M. Dudley, Central limit theorems for empirical measures, Annals of Probability, 6(6): 899–929, 1978.
  • R.M. Dudley, Uniform Central Limit Theorems, Cambridge Studies in Advanced Mathematics, 63, Cambridge University Press, Cambridge, UK, 1999.
  • J. Wolfowitz, Generalization of the theorem of Glivenko-Cantelli, Annals of Mathematical Statistics, 25: 131–138, 1954.
