Empirical process

In probability theory, an empirical process is a stochastic process that characterizes the deviation of the empirical distribution function from its expectation. In mean field theory, one studies limit theorems as the number of objects becomes large; these generalise the central limit theorem for empirical measures. Applications of the theory of empirical processes arise in non-parametric statistics.^[1]

Definition

For independent and identically distributed random variables X₁, X₂, ..., X_n in R with common cumulative distribution function F, the empirical distribution function is defined by

$$F_{n}(x)={\frac {1}{n}}\sum _{i=1}^{n}I_{(-\infty ,x]}(X_{i}),$$

where I_C is the indicator function of the set C.

For every fixed x, F_n(x), n = 1, 2, ..., is a sequence of random variables that converges to F(x) almost surely by the strong law of large numbers. That is, F_n converges to F pointwise. Glivenko and Cantelli strengthened this result by proving that the convergence is uniform in x, that is, sup_x |F_n(x) − F(x)| converges to 0 almost surely. This result is known as the Glivenko–Cantelli theorem.^[2]
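The pointwise and uniform statements can be illustrated numerically. The following is a minimal sketch, assuming Uniform(0, 1) data so that F(x) = x on [0, 1]; the function names `ecdf` and `sup_distance_uniform`, the seed, and the sample sizes are illustrative choices, not part of the theory.

```python
import bisect
import random


def ecdf(sample):
    """Return the empirical distribution function F_n of the sample."""
    ordered = sorted(sample)
    n = len(ordered)

    def F_n(x):
        # bisect_right counts the observations X_i with X_i <= x.
        return bisect.bisect_right(ordered, x) / n

    return F_n


def sup_distance_uniform(sample):
    """sup_x |F_n(x) - F(x)| when F is the Uniform(0, 1) CDF, F(x) = x."""
    ordered = sorted(sample)
    n = len(ordered)
    # F_n only jumps at the order statistics, so it is enough to compare
    # F with F_n at each jump and just before it.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(ordered))


rng = random.Random(0)  # arbitrary fixed seed for reproducibility
for n in (100, 10_000, 100_000):
    sample = [rng.random() for _ in range(n)]
    print(n, sup_distance_uniform(sample))
```

Under these assumptions the printed distances should shrink as n grows, roughly at the rate 1/√n, which is the scaling that motivates the centered and scaled process.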

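More generally, let the X_i take values in a measurable space S with common law P, and let P_n denote the empirical measure of the sample, which assigns mass 1/n to each observation X_i. A centered and scaled version of the empirical measure is the signed measure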

$$G_{n}(A)={\sqrt {n}}(P_{n}(A)-P(A)).$$

It induces a map on measurable functions f given by

$$f\mapsto G_{n}f={\sqrt {n}}(P_{n}-P)f={\sqrt {n}}\left({\frac {1}{n}}\sum _{i=1}^{n}f(X_{i})-\mathbb {E} f\right).$$

By the central limit theorem, $G_{n}(A)$ converges in distribution to a normal random variable $N(0,P(A)(1-P(A)))$ for a fixed measurable set A. Similarly, for a fixed function f, $G_{n}f$ converges in distribution to a normal random variable $N(0,\mathbb {E} (f-\mathbb {E} f)^{2})$, provided that $\mathbb {E} f$ and $\mathbb {E} f^{2}$ exist.
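A small simulation can make the limiting variance concrete. This is a sketch under stated assumptions: the data are Uniform(0, 1), the set is A = (−∞, 0.3] so that P(A) = 0.3, and the helper name `G_n`, the seed, and the values of `n` and `reps` are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def G_n(sample, f, mean_f):
    """G_n f = sqrt(n) * (P_n f - P f), with mean_f = E f(X) supplied by the caller."""
    n = len(sample)
    return math.sqrt(n) * (sum(f(x) for x in sample) / n - mean_f)


rng = random.Random(1)  # arbitrary fixed seed for reproducibility
p = 0.3


def indicator_A(x):
    # Indicator of A = (-inf, 0.3]; under Uniform(0, 1), P(A) = 0.3.
    return 1.0 if x <= p else 0.0


n, reps = 500, 2000
draws = [
    G_n([rng.random() for _ in range(n)], indicator_A, p) for _ in range(reps)
]

print("sample mean of G_n(A)    :", statistics.fmean(draws))
print("sample variance of G_n(A):", statistics.variance(draws))
print("P(A)(1 - P(A))           :", p * (1 - p))
```

Taking f to be an indicator recovers G_n(A); passing any other function with a known mean gives G_n f in the same way.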

Definition

$\bigl(G_{n}(c)\bigr)_{c\in {\mathcal {C}}}$ is called an empirical process indexed by ${\mathcal {C}}$, a collection of measurable subsets of S.

$\bigl(G_{n}f\bigr)_{f\in {\mathcal {F}}}$ is called an empirical process indexed by ${\mathcal {F}}$, a collection of measurable functions from S to $\mathbb {R}$.
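The two versions of the definition are linked: a set-indexed process is the special case of a function-indexed process obtained by taking indicator functions. A short check, using only the formulas given above for G_n(A) and G_n f:

```latex
G_n I_A = \sqrt{n}\,(P_n - P) I_A
        = \sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n} I_A(X_i) - \mathbb{E}\, I_A(X_1)\right)
        = \sqrt{n}\,\bigl(P_n(A) - P(A)\bigr)
        = G_n(A).
```

So indexing by a collection of sets amounts to indexing by the corresponding collection of indicator functions.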

A significant result in the area of empirical processes is Donsker's theorem. It has led to a study of Donsker classes: sets of functions with the useful property that empirical processes indexed by these classes converge weakly to a certain Gaussian process. While it can be shown that Donsker classes are Glivenko–Cantelli classes, the converse is not true in general.
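The limiting Gaussian process can be made a little more concrete through its covariance. The computation below is a sketch, assuming f and g are square-integrable under P and using only the independence of the X_i; the notation P(fg) for the expectation of f(X)g(X) is introduced here for brevity.

```latex
\operatorname{Cov}(G_n f, G_n g)
  = n \operatorname{Cov}(P_n f, P_n g)
  = \frac{1}{n}\sum_{i=1}^{n}\operatorname{Cov}\bigl(f(X_i), g(X_i)\bigr)
  = P(fg) - (Pf)(Pg).
```

The covariance does not depend on n, and the Gaussian limit for a Donsker class is usually described as a mean-zero Gaussian process with this covariance. For indicators of the half-lines (−∞, s] and (−∞, t] it equals F(min(s, t)) − F(s)F(t), which is the covariance of a Brownian bridge evaluated at F(s) and F(t).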

Example

As an example, consider empirical distribution functions. For real-valued iid random variables X₁, X₂, ..., X_n they are given by

$$F_{n}(x)=P_{n}((-\infty ,x])=P_{n}I_{(-\infty ,x]}.$$

In this case, empirical processes are indexed by the class ${\mathcal {C}}=\{(-\infty ,x]:x\in \mathbb {R} \}$. It has been shown that ${\mathcal {C}}$ is a Donsker class; in particular, ${\sqrt {n}}(F_{n}(x)-F(x))$ converges weakly in $\ell ^{\infty }(\mathbb {R} )$ to a Brownian bridge $B(F(x))$.
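One consequence of this weak convergence, via the continuous mapping theorem, is that the supremum of |√n (F_n − F)| converges in distribution to the supremum of the absolute value of a Brownian bridge when F is continuous. The sketch below compares a simulation with the classical series for that limit law (the Kolmogorov distribution), which is a standard fact not stated in the text above. It assumes Uniform(0, 1) data; the function names, seed, and the values of `n`, `reps`, and `x0` are hypothetical choices for illustration.

```python
import math
import random


def kolmogorov_cdf(x, terms=100):
    """P(sup_t |B(t)| <= x) for a standard Brownian bridge B on [0, 1].

    Uses the series 1 - 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 x^2),
    truncated after `terms` terms; the truncation is intended for
    moderate x, not for x very close to 0.
    """
    if x <= 0:
        return 0.0
    series = sum(
        (-1) ** (k - 1) * math.exp(-2 * k * k * x * x)
        for k in range(1, terms + 1)
    )
    return 1.0 - 2.0 * series


def sup_empirical_process_uniform(sample):
    """sup_x |sqrt(n) * (F_n(x) - F(x))| for Uniform(0, 1) data, F(x) = x."""
    ordered = sorted(sample)
    n = len(ordered)
    # The supremum is attained at a jump of F_n or just before one.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(ordered))
    return math.sqrt(n) * d


rng = random.Random(2)  # arbitrary fixed seed for reproducibility
n, reps, x0 = 500, 2000, 1.0
stats = [
    sup_empirical_process_uniform([rng.random() for _ in range(n)])
    for _ in range(reps)
]
simulated = sum(1 for s in stats if s <= x0) / reps

print("simulated P(sup <= x0):", simulated)
print("Kolmogorov CDF at x0  :", kolmogorov_cdf(x0))
```

The two printed numbers should be close but not identical, since the simulation uses a finite n and a finite number of replications.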

References

[1] Mojirsheibani, M. (2007). "Nonparametric curve estimation with missing data: A general empirical process approach". Journal of Statistical Planning and Inference. 137 (9): 2733–2758. doi:10.1016/j.jspi.2006.02.016.
[2] Wolfowitz, J. (1954). "Generalization of the Theorem of Glivenko-Cantelli". The Annals of Mathematical Statistics. 25: 131–138. doi:10.1214/aoms/1177728852.

External links

Empirical Processes: Theory and Applications, by David Pollard, a textbook available online.
Introduction to Empirical Processes and Semiparametric Inference, by Michael Kosorok, another textbook available online.

