Chebyshev's inequality

From Wikipedia, the free encyclopedia
In probability theory, Chebyshev's inequality (also spelled as Tchebysheff's inequality) guarantees that in any data sample or probability distribution, "nearly all" values are close to the mean — the precise statement being that no more than 1/k² of the distribution's values can be more than k standard deviations away from the mean. The inequality has great utility because it can be applied to completely arbitrary distributions (unknown except for mean and variance); for example, it can be used to prove the weak law of large numbers.

The term Chebyshev's inequality may also refer to Markov's inequality, especially in the context of analysis.

History

The theorem is named after Russian mathematician Pafnuty Chebyshev, although it was first formulated by his friend and colleague Irénée-Jules Bienaymé.[1] The theorem was first stated without proof by Chebyshev in 1874.[2] His student Andrey Markov provided a proof in 1884 in his PhD thesis.[3]

Statement

The inequality can be stated quite generally using measure theory; the statement in the language of probability theory then follows as a particular case, for a space of measure 1.

Measure-theoretic statement

Let (X, Σ, μ) be a measure space, and let f be an extended real-valued measurable function defined on X. Then for any real number t > 0,

:<math>\mu(\{x\in X \,:\, |f(x)|\geq t\}) \leq \frac{1}{t^2}\int_X f^2 \, d\mu.</math>

More generally, if g is an extended real-valued measurable function, nonnegative and nondecreasing on the range of f, then

:<math>\mu(\{x\in X \,:\, f(x)\geq t\}) \leq \frac{1}{g(t)}\int_X g\circ f \, d\mu.</math>

The previous statement then follows by defining g(t) as

:<math>g(t)=\begin{cases}t^2 & \text{if }t\geq 0\\ 0 & \text{otherwise,}\end{cases}</math>

and taking |f| instead of f.
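
For instance, on a probability space (a measure space of total measure 1), taking f = X − E[X] (and |f| as above), g(t) = t² for t ≥ 0, and t = kσ specialises this to the probabilistic statement given in the next subsection:

:<math>\Pr\left(|X - \operatorname{E}[X]| \geq k\sigma\right) \leq \frac{\operatorname{E}\left[(X - \operatorname{E}[X])^2\right]}{(k\sigma)^2} = \frac{\sigma^2}{k^2\sigma^2} = \frac{1}{k^2}.</math>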

Probabilistic statement

Let X be a random variable with finite expected value μ and non-zero variance σ². Then for any real number k > 0,

:<math>\Pr(|X-\mu|\geq k\sigma) \leq \frac{1}{k^2}.</math>

Only the case k > 1 provides useful information (when k ≤ 1 the right-hand side is greater than or equal to one, so the inequality becomes vacuous, as the probability of any event cannot be greater than one). As an example, using k = √2 shows that at least half of the values lie in the interval (μ − √2 σ, μ + √2 σ).

Because it can be applied to completely arbitrary distributions (unknown except for mean and variance), the inequality generally gives a poor bound compared to what might be possible if something is known about the distribution involved.

For example, suppose we randomly select a journal article from a source with an average of 1000 words per article, with a standard deviation of 200 words. We can then infer that the probability that it has between 600 and 1400 words (i.e. within k = 2 standard deviations of the mean) must be at least 75%, because there is at most a 1/k² = 1/4 chance of being outside that range, by Chebyshev's inequality. But if we additionally know that the distribution is normal, we can say there is a 75% chance the word count is between 770 and 1230 (which is an even tighter bound).
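
This comparison is easy to check numerically. The following is a minimal illustrative sketch (the normal-distribution figures assume the word count is exactly normal with the stated mean and standard deviation; the standard-library error function supplies the normal CDF):

<syntaxhighlight lang="python">
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    # Cumulative distribution function of the normal distribution
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

mu, sigma, k = 1000.0, 200.0, 2.0

# Chebyshev: at least 1 - 1/k^2 of the probability lies within k standard deviations
chebyshev_lower_bound = 1.0 - 1.0 / k ** 2

# If the distribution is normal, the mass within +/- 2 sigma is much larger...
normal_within_2sd = normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)

# ...and 75% of the mass already lies in the narrower interval (770, 1230)
normal_770_1230 = normal_cdf(1230.0, mu, sigma) - normal_cdf(770.0, mu, sigma)

print(chebyshev_lower_bound)          # 0.75
print(round(normal_within_2sd, 3))    # ~0.954
print(round(normal_770_1230, 3))      # ~0.75
</syntaxhighlight>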

As demonstrated in the example above, the theorem will typically provide rather loose bounds. However, the bounds provided by Chebyshev's inequality cannot, in general (remaining sound for variables of arbitrary distribution), be improved upon. For example, for any k ≥ 1, the following distribution meets the bound exactly:

:<math>\Pr(X=-1) = \Pr(X=1) = \frac{1}{2k^2}, \qquad \Pr(X=0) = 1 - \frac{1}{k^2}.</math>

For this distribution, mean μ = 0 and standard deviation σ = 1/k, so

:<math>\Pr(|X-\mu|\geq k\sigma) = \Pr(|X|\geq 1) = \frac{1}{k^2}.</math>

Equality holds only for distributions that are a linear transformation of this one.

Variant: One-sided Chebyshev inequality

A one-tailed variant, with k > 0, is[4]

:<math>\Pr(X-\mu\geq k\sigma) \leq \frac{1}{1+k^2}.</math>

The one-sided version of the Chebyshev inequality is called Cantelli's inequality, and is due to Francesco Paolo Cantelli.

An application: distance between the mean and the median

The one-sided variant can be used to prove the proposition that for probability distributions having an expected value and a median, the mean (i.e., the expected value) and the median can never differ from each other by more than one standard deviation. To express this in mathematical notation, let μ, m, and σ be respectively the mean, the median, and the standard deviation. Then

:<math>\left|\mu - m\right| \leq \sigma.</math>

(There is no need to rely on an assumption that the variance exists, i.e., is finite. Unlike the situation with the expected value, saying the variance exists is equivalent to saying the variance is finite. But this inequality is trivially true if the variance is infinite.)

The proof is as follows. Setting k = 1 in the statement for the one-sided inequality gives:

:<math>\Pr(X-\mu\geq\sigma) \leq \frac{1}{2}.</math>

By changing the sign of X and so of μ, we get

:<math>\Pr(X-\mu\leq-\sigma) \leq \frac{1}{2}.</math>

Thus the median is within one standard deviation of the mean.
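
As a concrete check, for an exponential distribution with rate 1 the mean is 1, the median is ln 2, and the standard deviation is 1, so

:<math>|\mu - m| = 1 - \ln 2 \approx 0.31 \leq 1 = \sigma.</math>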

For a proof using Jensen's inequality see An inequality relating means and medians.

Proof (of the two-sided Chebyshev's inequality)

Measure-theoretic proof

Fix t and let A_t be defined as A_t := {x ∈ X | f(x) ≥ t}, and let 1_{A_t} be the indicator function of the set A_t. Then, it is easy to check that, for any x,

:<math>0 \leq g(t)1_{A_t}(x) \leq g(f(x))\,1_{A_t}(x) \leq g(f(x)),</math>

since g is nondecreasing on the range of f, and therefore,

:<math>g(t)\mu(A_t) = \int_X g(t)1_{A_t}\,d\mu \leq \int_{A_t} g\circ f\,d\mu \leq \int_X g\circ f\,d\mu.</math>

The desired inequality follows from dividing the above inequality by g(t).

Probabilistic proof

Markov's inequality states that for any real-valued random variable Y and any positive number a, we have Pr(|Y| > a) ≤ E(|Y|)/a. One way to prove Chebyshev's inequality is to apply Markov's inequality to the random variable Y = (X − μ)2 with a = (σk)2.
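
Written out, applying Markov's inequality to Y = (X − μ)² with a = (kσ)² gives

:<math>\Pr(|X-\mu|\geq k\sigma) = \Pr\left((X-\mu)^2 \geq k^2\sigma^2\right) \leq \frac{\operatorname{E}\left[(X-\mu)^2\right]}{k^2\sigma^2} = \frac{\sigma^2}{k^2\sigma^2} = \frac{1}{k^2}.</math>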

It can also be proved directly. For any event A, let I_A be the indicator random variable of A, i.e. I_A equals 1 if A occurs and 0 otherwise. Then

:<math>\Pr(|X-\mu|\geq k\sigma) = \operatorname{E}\left[I_{|X-\mu|\geq k\sigma}\right] \leq \operatorname{E}\left[\left(\frac{X-\mu}{k\sigma}\right)^2 I_{|X-\mu|\geq k\sigma}\right] \leq \operatorname{E}\left[\left(\frac{X-\mu}{k\sigma}\right)^2\right] = \frac{1}{k^2}\,\frac{\operatorname{E}\left[(X-\mu)^2\right]}{\sigma^2} = \frac{1}{k^2}.</math>

The direct proof shows why the bounds are quite loose in typical cases: the number 1 to the left of "≥" is replaced by [(X − μ)/(kσ)]² to the right of "≥" whenever the latter exceeds 1. In some cases it exceeds 1 by a very wide margin.

Chebyshev's Inequality (More General)

Several extensions of Chebyshev's inequality have been developed.

A more general form

A more general version of Chebyshev's Inequality states that, for any ε > 0,

:<math>\Pr(|X| \geq \varepsilon) \leq \frac{\operatorname{E}[X^2]}{\varepsilon^2}.</math>

This version can be proved from Markov's inequality. Also, this version can be used to derive the more specific statement above.
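
Explicitly, since |X| ≥ ε exactly when X² ≥ ε², applying Markov's inequality to X² gives

:<math>\Pr(|X| \geq \varepsilon) = \Pr(X^2 \geq \varepsilon^2) \leq \frac{\operatorname{E}[X^2]}{\varepsilon^2},</math>

and applying this bound to X − μ with ε = kσ recovers the more specific statement above.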

Exponential Chebyshev's inequality

A related inequality sometimes known as the exponential Chebyshev's inequality is the inequality

:<math>\Pr(X \geq \varepsilon) \leq e^{-t\varepsilon}\,\operatorname{E}\left[e^{tX}\right],</math>

where t > 0.


Let K(t) be the cumulant generating function of X,

:<math>K(t) = \log \operatorname{E}\left[e^{tX}\right].</math>

Taking the Legendre–Fenchel transform of K(t) and using the exponential Chebyshev's inequality we have

:<math>\Pr(X \geq \varepsilon) \leq \exp\left(-\sup_{t > 0}\left(t\varepsilon - K(t)\right)\right).</math>

This inequality may be useful in obtaining exponential inequalities for unbounded variables.
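
As an illustrative special case, if X is standard normal then K(t) = t²/2 and the supremum is attained at t = ε, giving

:<math>\Pr(X \geq \varepsilon) \leq \exp\left(-\sup_{t>0}\left(t\varepsilon - \tfrac{t^2}{2}\right)\right) = e^{-\varepsilon^2/2},</math>

which decays exponentially in ε, whereas the basic Chebyshev inequality only gives a bound of order 1/ε².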

Higher moments

An extension to higher moments is also possible:

:<math>\Pr(|X - \operatorname{E}[X]| \geq \lambda) \leq \frac{\operatorname{E}\left[|X - \operatorname{E}[X]|^n\right]}{\lambda^n},</math>

where λ > 0 and n ≥ 2.
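
As a simple illustration (not from the cited sources), for X uniformly distributed on [−1, 1] the central moments are E[X²] = 1/3 and E[X⁴] = 1/5, and for λ = 0.9 the fourth-moment bound is the tighter of the two; a short Python check:

<syntaxhighlight lang="python">
# Compare the n = 2 and n = 4 moment bounds for X uniform on [-1, 1].
# For this distribution E[X] = 0, E[X^2] = 1/3 and E[X^4] = 1/5.
lam = 0.9

second_moment_bound = (1.0 / 3.0) / lam ** 2   # Chebyshev (n = 2): ~0.412
fourth_moment_bound = (1.0 / 5.0) / lam ** 4   # higher moment (n = 4): ~0.305
exact_probability = 1.0 - lam                  # P(|X| >= 0.9) = 0.1 for this distribution

print(second_moment_bound, fourth_moment_bound, exact_probability)
</syntaxhighlight>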

Asymmetric two sided case

An asymmetric two-sided version of this inequality is also known.[5]

When the distribution is known to be symmetric

where σ² is the variance.

When the distribution is asymmetric or is unknown

where σ² is the variance and μ is the mean.

Bivariate case

A version for the bivariate case is also known.[6]

Let

Sharpened bounds

Standardised variables

Sharpened bounds can be derived by first standardising the random variable.[7] Let X be a random variable with finite variance Var(X). Let Z be the standardised form defined as

:<math>Z = \frac{X - \operatorname{E}[X]}{\sqrt{\operatorname{Var}(X)}}.</math>

Cantelli's lemma is then

:<math>\Pr(Z \geq k) \leq \frac{1}{1 + k^2}.</math>

This inequality is sharp, and is attained when Z takes the values k and −1/k with probabilities 1/(1 + k²) and k²/(1 + k²) respectively.
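
One can check directly that this two-point distribution has mean zero and unit variance and attains the bound:

:<math>\operatorname{E}[Z] = k\cdot\frac{1}{1+k^2} - \frac{1}{k}\cdot\frac{k^2}{1+k^2} = 0, \qquad \operatorname{E}[Z^2] = k^2\cdot\frac{1}{1+k^2} + \frac{1}{k^2}\cdot\frac{k^2}{1+k^2} = 1, \qquad \Pr(Z \geq k) = \frac{1}{1+k^2}.</math>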

If k > 1 and the distribution of X is symmetric then we have

:<math>\Pr(Z \geq k) \leq \frac{1}{2k^2}.</math>

Equality holds if and only if Z = −k, 0 or k with probabilities 1/(2k²), 1 − 1/k² and 1/(2k²) respectively.[7]

An extension to a two sided inequality is also possible.

Let u, v > 0. Then we have[7]

Semivariances

An alternative method of obtaining sharper bounds is through the use of semivariances (partial moments). The upper (σ₊²) and lower (σ₋²) semivariances are defined

where m is the arithmetic mean of the sample, n₊ (n₋) is the number of elements greater (less) than the mean, and the sums are taken over the elements greater (less) than the mean respectively.

The variance of the sample is the sum of the two semivariances:

:<math>\sigma^2 = \sigma_+^2 + \sigma_-^2.</math>

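A minimal numerical sketch of this decomposition (illustrative only; it assumes the convention that each sum of squares is divided by the sample size n, so that the two semivariances add up to the variance by construction; the cited treatments may use a different denominator):

<syntaxhighlight lang="python">
# Illustrative computation of upper and lower semivariances for a small sample.
# Assumed convention: each sum of squares is divided by n, the sample size, so
# that variance == upper semivariance + lower semivariance by construction.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
m = sum(data) / n                                                   # arithmetic mean (here 5.0)

upper_semivariance = sum((x - m) ** 2 for x in data if x > m) / n   # 2.5
lower_semivariance = sum((x - m) ** 2 for x in data if x < m) / n   # 1.5
variance = sum((x - m) ** 2 for x in data) / n                      # 4.0

# Terms equal to the mean contribute zero, so the two halves add up to the variance.
assert abs(variance - (upper_semivariance + lower_semivariance)) < 1e-12
print(m, variance, upper_semivariance, lower_semivariance)
</syntaxhighlight>
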
In terms of the lower semivariance Chebyshev's inequality can be written[8]

Putting

Chebyshev's inequality can now be written

A similar result can also be derived for the upper semivariance. If we put

Chebyshev's inequality can be written

Because σ₋² ≤ σ², use of the semivariance sharpens the original inequality.

If the distribution is known to be symmetric

and

Note: The inequality with the lower semivariance has been found to be of use in estimating downside risk in finance and agriculture.

Specific distributions

DasGupta has determined a set of best possible bounds for a normal distribution for this inequality.[9]

Steliga and Szynal have extended these bounds to the Pareto distribution.[5]

See also

References

  1. ^ Donald Knuth, "The Art of Computer Programming", 3rd ed., vol. 1, 1997, p.98
  2. ^ Chebyshev P (1874) Sur les valeurs limites des intégrales, J Math Pures Appl 19: 157–160
  3. ^ Markov A (1884) On certain applications of algebraic continued fractions, Ph.D. thesis, St Petersburg
  4. ^ Grimmett and Stirzaker, problem 7.11.9. Several proofs of this result can be found here.
  5. ^ a b Steliga K, Szynal D (2010) Int J Pure App Math 58 (2) 137-152
  6. ^ Ferentinos K (1982) On Tchebycheff type inequalities. Trabajos Estadíst Investigación Oper 33: 125-132
  7. ^ a b c Ion RA (2001) Ph.D. thesis. Sharp Chebyshev-Type Inequalities. Chapter 4 [full citation needed][clarification needed]
  8. ^ Berck P, Hihn JM (1982) Using the semivariance to estimate safety-first rules. Am J Agric Econ 64: 298-300
  9. ^ DasGupta A (2000) Best constants in Chebyshev inequalities with various applications. Metrika, 51: 185-200

Further reading

  • A. Papoulis (1991), Probability, Random Variables, and Stochastic Processes, 3rd ed. McGraw-Hill. ISBN 0-07-100870-5. pp. 113–114.
  • G. Grimmett and D. Stirzaker (2001), Probability and Random Processes, 3rd ed. Oxford. ISBN 0-19-857222-0. Section 7.3.