# Chebyshev's inequality

In probability theory, Chebyshev's inequality (also called the Bienaymé–Chebyshev inequality) guarantees that, for a wide class of probability distributions, no more than a certain fraction of values can be more than a certain distance from the mean. Specifically, no more than 1/k² of the distribution's values can be more than k standard deviations away from the mean (or equivalently, at least 1 − 1/k² of the distribution's values are within k standard deviations of the mean). In statistics, this rule about the range of standard deviations around the mean is often called Chebyshev's theorem. The inequality has great utility because it can be applied to any probability distribution in which the mean and variance are defined. For example, it can be used to prove the weak law of large numbers.

In practical usage, in contrast to the 68–95–99.7 rule, which applies to normal distributions, Chebyshev's inequality is weaker, stating that a minimum of just 75% of values must lie within two standard deviations of the mean and 88.89% within three standard deviations.[1][2]

The term Chebyshev's inequality may also refer to Markov's inequality, especially in the context of analysis. They are closely related, and some authors refer to Markov's inequality as "Chebyshev's First Inequality," and the similar one referred to on this page as "Chebyshev's Second Inequality."

## History

The theorem is named after Russian mathematician Pafnuty Chebyshev, although it was first formulated by his friend and colleague Irénée-Jules Bienaymé.[3]:98 The theorem was first stated without proof by Bienaymé in 1853[4] and later proved by Chebyshev in 1867.[5] His student Andrey Markov provided another proof in his 1884 Ph.D. thesis.[6]

## Statement

Chebyshev's inequality is usually stated for random variables, but can be generalized to a statement about measure spaces.

### Probabilistic statement

Let X be an integrable random variable with finite expected value μ and finite non-zero variance σ². Then for any real number k > 0,

${\displaystyle \Pr(|X-\mu |\geq k\sigma )\leq {\frac {1}{k^{2}}}.}$

Only the case ${\displaystyle k>1}$ is useful. When ${\displaystyle k\leq 1}$ the right-hand side ${\displaystyle {\frac {1}{k^{2}}}\geq 1}$ and the inequality is trivial as all probabilities are ≤ 1.

As an example, using ${\displaystyle k={\sqrt {2}}}$ shows that the probability that values lie outside the interval ${\displaystyle (\mu -{\sqrt {2}}\sigma ,\mu +{\sqrt {2}}\sigma )}$ does not exceed ${\displaystyle {\frac {1}{2}}}$.

Because it can be applied to completely arbitrary distributions provided they have a known finite mean and variance, the inequality generally gives a poor bound compared to what might be deduced if more aspects are known about the distribution involved.

| k | Min. % within k standard deviations of mean | Max. % beyond k standard deviations from mean |
|---|---|---|
| 1 | 0% | 100% |
| √2 | 50% | 50% |
| 1.5 | 55.56% | 44.44% |
| 2 | 75% | 25% |
| 2√2 | 87.5% | 12.5% |
| 3 | 88.8889% | 11.1111% |
| 4 | 93.75% | 6.25% |
| 5 | 96% | 4% |
| 6 | 97.2222% | 2.7778% |
| 7 | 97.9592% | 2.0408% |
| 8 | 98.4375% | 1.5625% |
| 9 | 98.7654% | 1.2346% |
| 10 | 99% | 1% |
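The tabulated percentages follow directly from the bound 1/k². As a minimal sketch (the function name `chebyshev_tail_bound` is illustrative and not taken from any library), they can be reproduced in Python:

```python
import math


def chebyshev_tail_bound(k: float) -> float:
    """Upper bound on Pr(|X - mu| >= k*sigma), capped at 1 for k <= 1."""
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, 1.0 / k**2)


for k in (1, math.sqrt(2), 1.5, 2, 3, 10):
    beyond = chebyshev_tail_bound(k)
    print(f"k={k:.4g}: at least {1 - beyond:.4%} within, at most {beyond:.4%} beyond")
```

The cap at 1 only matters for k ≤ 1, where the inequality is trivial.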

### Measure-theoretic statement

Let (X, Σ, μ) be a measure space, and let f be an extended real-valued measurable function defined on X. Then for any real number t > 0 and 0 < p < ∞,[7]

${\displaystyle \mu (\{x\in X\,:\,\,|f(x)|\geq t\})\leq {1 \over t^{p}}\int _{|f|\geq t}|f|^{p}\,d\mu .}$

More generally, if g is an extended real-valued measurable function, nonnegative and nondecreasing, with ${\displaystyle g(t)\neq 0}$ then:[citation needed]

${\displaystyle \mu (\{x\in X\,:\,\,f(x)\geq t\})\leq {1 \over g(t)}\int _{X}g\circ f\,d\mu .}$

The previous statement then follows by defining ${\displaystyle g(x)}$ as ${\displaystyle |x|^{p}}$ if ${\displaystyle x\geq t}$ and ${\displaystyle 0}$ otherwise.

## Example

Suppose we randomly select a journal article from a source with an average of 1000 words per article, with a standard deviation of 200 words. We can then infer that the probability that it has between 600 and 1400 words (i.e. within k = 2 standard deviations of the mean) must be at least 75%, because there is no more than 1/k² = 1/4 chance to be outside that range, by Chebyshev's inequality. But if we additionally know that the distribution is normal, we can say there is a 75% chance the word count is between 770 and 1230 (which is an even tighter bound).
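The two intervals in this example can be recomputed with the Python standard library. This is only a sketch under the stated assumptions (mean 1000, standard deviation 200, and, for the second interval, exact normality); the variable names are illustrative.

```python
from statistics import NormalDist

mean, sd = 1000, 200

# Distribution-free guarantee: at least 1 - 1/k^2 within k standard deviations.
k = 2
chebyshev_interval = (mean - k * sd, mean + k * sd)
chebyshev_coverage = 1 - 1 / k**2

# Under the extra assumption of normality, find the central 75% interval.
normal = NormalDist(mu=mean, sigma=sd)
normal_interval = (normal.inv_cdf(0.125), normal.inv_cdf(0.875))

print(chebyshev_interval, chebyshev_coverage)
print(tuple(round(x) for x in normal_interval))
```

The normal interval is much narrower for the same coverage, which is the price Chebyshev's inequality pays for making no distributional assumption.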

## Sharpness of bounds

As shown in the example above, the theorem typically provides rather loose bounds. However, these bounds cannot in general (remaining true for arbitrary distributions) be improved upon. The bounds are sharp for the following example: for any k ≥ 1,

${\displaystyle X={\begin{cases}-1,&{\text{with probability }}{\frac {1}{2k^{2}}}\\0,&{\text{with probability }}1-{\frac {1}{k^{2}}}\\1,&{\text{with probability }}{\frac {1}{2k^{2}}}\end{cases}}}$

For this distribution, the mean μ = 0 and the standard deviation σ = 1/k, so

${\displaystyle \Pr(|X-\mu |\geq k\sigma )=\Pr(|X|\geq 1)={\frac {1}{k^{2}}}.}$

Chebyshev's inequality is an equality for precisely those distributions that are a linear transformation of this example.
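The equality case can be verified with exact rational arithmetic. The sketch below builds the three-point distribution for a given k and returns its variance and the tail probability; the helper name `three_point_check` is hypothetical.

```python
from fractions import Fraction


def three_point_check(k):
    """Return (sigma^2, Pr(|X - mu| >= k*sigma)) for the three-point law."""
    k = Fraction(k)
    tail = Fraction(1, 2) / k**2
    dist = {-1: tail, 0: 1 - 2 * tail, 1: tail}
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    # Compare squared distances with (k*sigma)^2 to avoid square roots.
    prob = sum(p for x, p in dist.items() if (x - mu) ** 2 >= k**2 * var)
    return var, prob


for k in (1, Fraction(3, 2), 2, 5):
    print(k, three_point_check(k))
```

For every k ≥ 1 both returned values equal 1/k², so the tail probability meets the Chebyshev bound exactly.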

## Proof (of the two-sided version)

### Probabilistic proof

Markov's inequality states that for any real-valued random variable Y and any positive number a, we have Pr(|Y| ≥ a) ≤ E(|Y|)/a. One way to prove Chebyshev's inequality is to apply Markov's inequality to the random variable Y = (X − μ)² with a = (kσ)².

It can also be proved directly using conditional expectation:

${\displaystyle {\begin{aligned}\sigma ^{2}&=\mathbb {E} [(X-\mu )^{2}]\\&=\mathbb {E} [(X-\mu )^{2}\mid k\sigma \leq |X-\mu |]\Pr[k\sigma \leq |X-\mu |]+\mathbb {E} [(X-\mu )^{2}\mid k\sigma >|X-\mu |]\Pr[k\sigma >|X-\mu |]\\&\geq (k\sigma )^{2}\Pr[k\sigma \leq |X-\mu |]+0\cdot \Pr[k\sigma >|X-\mu |]\\&=k^{2}\sigma ^{2}\Pr[k\sigma \leq |X-\mu |]\end{aligned}}}$

Chebyshev's inequality then follows by dividing by k²σ².

This proof also shows why the bounds are quite loose in typical cases: the conditional expectation on the event where |X − μ| < kσ is thrown away, and the lower bound of k²σ² on the event |X − μ| ≥ kσ can be quite poor.

### Measure-theoretic proof

Fix ${\displaystyle t}$ and let ${\displaystyle A_{t}}$ be defined as ${\displaystyle A_{t}=\{x\in X\mid f(x)\geq t\}}$, and let ${\displaystyle 1_{A_{t}}}$ be the indicator function of the set ${\displaystyle A_{t}}$. Then, it is easy to check that, for any ${\displaystyle x}$,

${\displaystyle g(t)1_{A_{t}}(x)\leq g(f(x))\,1_{A_{t}}(x),}$

since g is nondecreasing, and therefore,

${\displaystyle {\begin{aligned}g(t)\mu (A_{t})&=\int _{X}g(t)1_{A_{t}}\,d\mu \\&\leq \int _{A_{t}}g\circ f\,d\mu \\&\leq \int _{X}g\circ f\,d\mu ,\end{aligned}}}$

where the last inequality is justified by the non-negativity of g. The desired inequality follows from dividing the above inequality by g(t).

### Proof assuming random variable X is continuous

Using the definitions of the probability density function f(x) and the variance Var(X):

${\displaystyle {\begin{aligned}\Pr(a\leq X\leq b)=\int _{a}^{b}f(x)\,dx,\end{aligned}}}$
${\displaystyle {\begin{aligned}\operatorname {Var} (X)=\sigma ^{2}&=\int _{\mathbb {R} }(x-\mu )^{2}f(x)\,dx,\end{aligned}}}$

we have:

${\displaystyle {\begin{aligned}\Pr(|X-\mu |\geq k\sigma )&=\int _{|x-\mu |\geq k\sigma }f(x)\,dx\\&\leq \int _{|x-\mu |\geq k\sigma }{\frac {|x-\mu |}{k\sigma }}f(x)\,dx\\&\leq \int _{|x-\mu |\geq k\sigma }{\frac {(x-\mu )^{2}}{k^{2}\sigma ^{2}}}f(x)\,dx\\&=\int _{|x-\mu |\geq k\sigma }{\frac {1}{k^{2}\sigma ^{2}}}(x-\mu )^{2}f(x)\,dx\\&={\frac {1}{k^{2}\sigma ^{2}}}\int _{|x-\mu |\geq k\sigma }(x-\mu )^{2}f(x)\,dx\\&\leq {\frac {1}{k^{2}\sigma ^{2}}}\int _{-\infty }^{\infty }(x-\mu )^{2}f(x)\,dx\\&={\frac {1}{k^{2}\sigma ^{2}}}\sigma ^{2}\\&={\frac {1}{k^{2}}}.\end{aligned}}}$

Replacing kσ with ε, where k = ε/σ, we have another form of Chebyshev's inequality:

${\displaystyle {\begin{aligned}\Pr(|X-\mu |\geq \epsilon )\leq {\frac {\sigma ^{2}}{\epsilon ^{2}}},\end{aligned}}}$

or, equivalently, taking the complementary event,

${\displaystyle {\begin{aligned}\Pr(|X-\mu |<\epsilon )\geq 1-{\frac {\sigma ^{2}}{\epsilon ^{2}}},\end{aligned}}}$

where ε, like k, is any positive real number.

## Extensions

Several extensions of Chebyshev's inequality have been developed.

### Asymmetric two-sided

If X has mean μ and variance σ², then

${\displaystyle \Pr(l<X<u)\geq {\frac {4[(\mu -l)(u-\mu )-\sigma ^{2}]}{(u-l)^{2}}}.}$[8]

This reduces to Chebyshev's inequality in the symmetric case (l and u equidistant from the mean).

#### Bivariate generalization

Let X1, X2 be two random variables with means μ1, μ2 and finite variances σ1², σ2² respectively. Then a union bound shows that

${\displaystyle \Pr \left(l_{1}\leq {\frac {X_{1}-\mu _{1}}{\sigma _{1}}}\leq u_{1},l_{2}\leq {\frac {X_{2}-\mu _{2}}{\sigma _{2}}}\leq u_{2}\right)\geq 1-{\frac {4+(u_{1}+l_{1})^{2}}{(u_{1}-l_{1})^{2}}}-{\frac {4+(u_{2}+l_{2})^{2}}{(u_{2}-l_{2})^{2}}}}$

This bound does not require X1 and X2 to be independent.[9]

### Bivariate, known correlation

Berge derived an inequality for two correlated variables X1, X2.[10] Let ρ be the correlation coefficient between X1 and X2 and let σi² be the variance of Xi. Then

${\displaystyle \Pr \left(\bigcap _{i=1}^{2}\left[{\frac {|X_{i}-\mu _{i}|}{\sigma _{i}}}<k\right]\right)\geq 1-{\frac {1+{\sqrt {1-\rho ^{2}}}}{k^{2}}}.}$

Lal later obtained an alternative bound[11]

${\displaystyle \Pr \left(\bigcap _{i=1}^{2}\left[{\frac {|X_{i}-\mu _{i}|}{\sigma _{i}}}\leq k_{i}\right]\right)\geq 1-{\frac {k_{1}^{2}+k_{2}^{2}+{\sqrt {(k_{1}^{2}+k_{2}^{2})^{2}-4k_{1}^{2}k_{2}^{2}\rho }}}{2(k_{1}k_{2})^{2}}}}$

Isii derived a further generalisation.[12] Let

${\displaystyle Z=\Pr \left(\left(-k_{1}<X_{1}<k_{2}\right)\cap \left(-k_{1}<X_{2}<k_{2}\right)\right),\qquad 0<k_{1}\leq k_{2}.}$

and define:

${\displaystyle \lambda ={\frac {k_{1}(1+\rho )+{\sqrt {(1-\rho ^{2})(k_{1}^{2}+\rho )}}}{2k_{1}-1+\rho }}}$

There are now three cases.

• Case A: If ${\displaystyle 2k_{1}^{2}>1-\rho }$ and ${\displaystyle k_{2}-k_{1}\geq 2\lambda }$ then
${\displaystyle Z\leq {\frac {2\lambda ^{2}}{2\lambda ^{2}+1+\rho }}.}$
• Case B: If the conditions in case A are not met but k1k2 ≥ 1 and
${\displaystyle 2(k_{1}k_{2}-1)^{2}\geq 2(1-\rho ^{2})+(1-\rho )(k_{2}-k_{1})^{2}}$
then
${\displaystyle Z\leq {\frac {(k_{2}-k_{1})^{2}+4+{\sqrt {16(1-\rho ^{2})+8(1-\rho )(k_{2}-k_{1})}}}{(k_{1}+k_{2})^{2}}}.}$
• Case C: If none of the conditions in cases A or B are satisfied then there is no universal bound other than 1.

### Multivariate

The general case is known as the Birnbaum–Raymond–Zuckerman inequality after the authors who proved it for two dimensions.[13]

${\displaystyle \Pr \left(\sum _{i=1}^{n}{\frac {(X_{i}-\mu _{i})^{2}}{\sigma _{i}^{2}t_{i}^{2}}}\geq k^{2}\right)\leq {\frac {1}{k^{2}}}\sum _{i=1}^{n}{\frac {1}{t_{i}^{2}}}}$

where Xi is the i-th random variable, μi is the i-th mean and σi² is the i-th variance.

If the variables are independent this inequality can be sharpened.[14]

${\displaystyle \Pr \left(\bigcap _{i=1}^{n}{\frac {|X_{i}-\mu _{i}|}{\sigma _{i}}}\leq k_{i}\right)\geq \prod _{i=1}^{n}\left(1-{\frac {1}{k_{i}^{2}}}\right)}$

Olkin and Pratt derived an inequality for n correlated variables.[15]

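${\displaystyle \Pr \left(\bigcap _{i=1}^{n}{\frac {|X_{i}-\mu _{i}|}{\sigma _{i}}}<k_{i}\right)\geq 1-{\frac {1}{n^{2}}}\left({\sqrt {u}}+{\sqrt {n-1}}{\sqrt {n\sum _{i}{\frac {1}{k_{i}^{2}}}-u}}\right)^{2}}$

where the sum is taken over the n variables and

${\displaystyle u=\sum _{i=1}^{n}{\frac {1}{k_{i}^{2}}}+2\sum _{i=1}^{n}\sum _{j<i}{\frac {\rho _{ij}}{k_{i}k_{j}}}}$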

where ρij is the correlation between Xi and Xj.

Olkin and Pratt's inequality was subsequently generalised by Godwin.[16]

### Finite-dimensional vector

Ferentinos[9] has shown that for a vector X = (x1, x2, ...) with mean μ = (μ1, μ2, ...), standard deviation σ = (σ1, σ2, ...) and the Euclidean norm || ⋅ || that

${\displaystyle \Pr(\|X-\mu \|\geq k\|\sigma \|)\leq {\frac {1}{k^{2}}}.}$

A second related inequality has also been derived by Chen.[17] Let n be the dimension of the stochastic vector X and let E(X) be the mean of X. Let S be the covariance matrix and k > 0. Then

${\displaystyle \Pr \left((X-\operatorname {E} (X))^{T}S^{-1}(X-\operatorname {E} (X))<k\right)\geq 1-{\frac {n}{k}}}$

where Y^T denotes the transpose of Y. A simple proof was given by Navarro[18] as follows:

${\displaystyle Z=(X-\operatorname {E} (X))^{T}S^{-1}(X-\operatorname {E} (X))=(X-\operatorname {E} (X))^{T}S^{-1/2}S^{-1/2}(X-\operatorname {E} (X))=Y^{T}Y\geq 0}$

where

${\displaystyle Y=(Y_{1},...,Y_{n})^{T}=S^{-1/2}(X-\operatorname {E} (X))}$

and ${\displaystyle S^{-1/2}}$ is a symmetric invertible matrix such that: ${\displaystyle S^{-1/2}S^{-1/2}=S^{-1}}$. Hence ${\displaystyle \operatorname {E} (Y)=(0,\ldots ,0)^{T}}$ and ${\displaystyle \operatorname {Cov} (Y)=I_{n}}$ where ${\displaystyle I_{n}}$ represents the identity matrix of dimension n. Then ${\displaystyle \operatorname {E} (Y_{i}^{2})=\operatorname {Var} (Y_{i})=1}$ and

${\displaystyle \operatorname {E} (Z)=\operatorname {E} (Y^{T}Y)=\sum _{i=1}^{n}\operatorname {E} (Y_{i}^{2})=n}$

Finally, by applying Markov's inequality to Z we get

${\displaystyle \Pr \left(Z\geq k\right)=\Pr \left((X-\operatorname {E} (X))^{T}S^{-1}(X-\operatorname {E} (X))\geq k\right)\leq {\frac {\operatorname {E} (Z)}{k}}={\frac {n}{k}}}$

and so the desired inequality holds.

The inequality can be written in terms of the Mahalanobis distance as

${\displaystyle \Pr \left(d_{S}^{2}(X,\operatorname {E} (X))<k\right)\geq 1-{\frac {n}{k}}}$

where the Mahalanobis distance based on S is defined by

${\displaystyle d_{S}(x,y)={\sqrt {(x-y)^{T}S^{-1}(x-y)}}}$
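The bound Pr(Z ≥ k) ≤ n/k obtained in the proof above can be checked exactly on a small discrete example. The sketch below assumes a hypothetical two-dimensional random vector that is uniform on the four points (±1, ±1); it computes the mean, the covariance matrix and the squared Mahalanobis distance with rational arithmetic, inverting the 2×2 matrix by hand so that no external library is needed.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: X uniform on the four points (+-1, +-1) in R^2.
points = list(product((-1, 1), repeat=2))
prob = Fraction(1, len(points))
n = 2

mean = [sum(p[i] for p in points) * prob for i in range(n)]
cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) * prob
        for j in range(n)] for i in range(n)]

# Inverse of a 2x2 matrix via the adjugate formula.
det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
inv = [[cov[1][1] / det, -cov[0][1] / det],
       [-cov[1][0] / det, cov[0][0] / det]]


def mahalanobis_sq(x):
    """Squared Mahalanobis distance of x from the mean, based on cov."""
    d = [x[i] - mean[i] for i in range(n)]
    return sum(d[i] * inv[i][j] * d[j] for i in range(n) for j in range(n))


def tail(k):
    """Exact Pr(Z >= k), where Z is the squared Mahalanobis distance."""
    return sum(prob for p in points if mahalanobis_sq(p) >= k)


for k in (2, 3, 4):
    print(k, tail(k), Fraction(n, k))
```

In this example every point lies at squared distance 2 from the mean, so the bound n/k is attained at k = 2 and is loose for larger k.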

Navarro[19] proved that these bounds are sharp, that is, they are the best possible bounds for those regions when only the mean and the covariance matrix of X are known.

Stellato et al.[20] showed that this multivariate version of the Chebyshev inequality can be easily derived analytically as a special case of Vandenberghe et al.[21] where the bound is computed by solving a semidefinite program (SDP).

### Infinite dimensions

There is a straightforward extension of the vector version of Chebyshev's inequality to infinite dimensional settings. Let X be a random variable which takes values in a Fréchet space ${\displaystyle {\mathcal {X}}}$ (equipped with seminorms || ⋅ ||α). This includes most common settings of vector-valued random variables, e.g., when ${\displaystyle {\mathcal {X}}}$ is a Banach space (equipped with a single norm), a Hilbert space, or the finite-dimensional setting as described above.

Suppose that X is of "strong order two", meaning that

${\displaystyle \operatorname {E} \left(\|X\|_{\alpha }^{2}\right)<\infty }$

for every seminorm || ⋅ ||α. This is a generalization of the requirement that X have finite variance, and is necessary for this strong form of Chebyshev's inequality in infinite dimensions. The terminology "strong order two" is due to Vakhania.[22]

Let ${\displaystyle \mu \in {\mathcal {X}}}$ be the Pettis integral of X (i.e., the vector generalization of the mean), and let

${\displaystyle \sigma _{\alpha }:={\sqrt {\operatorname {E} \|X-\mu \|_{\alpha }^{2}}}}$

be the standard deviation with respect to the seminorm || ⋅ ||α. In this setting we can state the following:

General version of Chebyshev's inequality. ${\displaystyle \forall k>0:\quad \Pr \left(\|X-\mu \|_{\alpha }\geq k\sigma _{\alpha }\right)\leq {\frac {1}{k^{2}}}.}$

Proof. The proof is straightforward, and essentially the same as the finitary version. If σα = 0, then X is constant (and equal to μ) almost surely, so the inequality is trivial.

If

${\displaystyle \|X-\mu \|_{\alpha }\geq k\sigma _{\alpha }}$

then ||X − μ||α > 0, so we may safely divide by ||X − μ||α. The crucial trick in Chebyshev's inequality is to recognize that ${\displaystyle 1={\tfrac {\|X-\mu \|_{\alpha }^{2}}{\|X-\mu \|_{\alpha }^{2}}}}$.

The following calculations complete the proof:

${\displaystyle {\begin{aligned}\Pr \left(\|X-\mu \|_{\alpha }\geq k\sigma _{\alpha }\right)&=\int _{\Omega }\mathbf {1} _{\|X-\mu \|_{\alpha }\geq k\sigma _{\alpha }}\,\mathrm {d} \Pr \\&=\int _{\Omega }\left({\frac {\|X-\mu \|_{\alpha }^{2}}{\|X-\mu \|_{\alpha }^{2}}}\right)\cdot \mathbf {1} _{\|X-\mu \|_{\alpha }\geq k\sigma _{\alpha }}\,\mathrm {d} \Pr \\[6pt]&\leq \int _{\Omega }\left({\frac {\|X-\mu \|_{\alpha }^{2}}{(k\sigma _{\alpha })^{2}}}\right)\cdot \mathbf {1} _{\|X-\mu \|_{\alpha }\geq k\sigma _{\alpha }}\,\mathrm {d} \Pr \\[6pt]&\leq {\frac {1}{k^{2}\sigma _{\alpha }^{2}}}\int _{\Omega }\|X-\mu \|_{\alpha }^{2}\,\mathrm {d} \Pr &&\mathbf {1} _{\|X-\mu \|_{\alpha }\geq k\sigma _{\alpha }}\leq 1\\[6pt]&={\frac {1}{k^{2}\sigma _{\alpha }^{2}}}\left(\operatorname {E} \|X-\mu \|_{\alpha }^{2}\right)\\[6pt]&={\frac {1}{k^{2}\sigma _{\alpha }^{2}}}\left(\sigma _{\alpha }^{2}\right)\\[6pt]&={\frac {1}{k^{2}}}\end{aligned}}}$

### Higher moments

An extension to higher moments is also possible:

${\displaystyle \Pr \left(|X-\operatorname {E} (X)|\geq k\operatorname {E} (|X-\operatorname {E} (X)|^{n})^{\frac {1}{n}}\right)\leq {\frac {1}{k^{n}}},\qquad k>0,n\geq 2.}$

### Exponential moment

A related inequality sometimes known as the exponential Chebyshev's inequality[23] is the inequality

${\displaystyle \Pr(X\geq \varepsilon )\leq e^{-t\varepsilon }\operatorname {E} \left(e^{tX}\right),\qquad t>0.}$

Let K(t) be the cumulant generating function,

${\displaystyle K(t)=\log \left(\operatorname {E} \left(e^{tX}\right)\right).}$

Taking the Legendre–Fenchel transformation[clarification needed] of K(t) and using the exponential Chebyshev's inequality we have

${\displaystyle -\log(\Pr(X\geq \varepsilon ))\geq \sup _{t}(t\varepsilon -K(t)).}$

This inequality may be used to obtain exponential inequalities for unbounded variables.[24]
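As an illustration, the exponential bound can be minimised numerically over t. The sketch below is an assumption-laden example rather than a general recipe: it takes a standard normal variable, whose cumulant generating function is K(t) = t²/2, and searches a fixed grid of t values. The function name `chernoff_bound` and the grid are illustrative choices.

```python
import math


def chernoff_bound(eps, log_mgf, t_grid):
    """Smallest value of exp(K(t) - t*eps) over the supplied t > 0."""
    return min(math.exp(log_mgf(t) - t * eps) for t in t_grid)


# Assumed example: standard normal, whose cumulant generating function is t^2/2.
def normal_cgf(t):
    return t * t / 2


t_grid = [i / 1000 for i in range(1, 10001)]  # t in (0, 10]
eps = 2.0
bound = chernoff_bound(eps, normal_cgf, t_grid)
exact = 0.5 * math.erfc(eps / math.sqrt(2))  # exact normal tail Pr(X >= eps)
print(bound, exact)
```

For this example the supremum of tε − K(t) is ε²/2, attained at t = ε, so the optimised bound is exp(−ε²/2), which decays far faster in ε than the 1/k² rate of Chebyshev's inequality.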

### Bounded variables

If P(x) has finite support based on the interval [a, b], let M = max(|a|, |b|) where |x| is the absolute value of x. If the mean of P(x) is zero then for all k > 0[25]

${\displaystyle {\frac {\operatorname {E} (|X|^{r})-k^{r}}{M^{r}}}\leq \Pr(|X|\geq k)\leq {\frac {\operatorname {E} (|X|^{r})}{k^{r}}}.}$

The second of these inequalities with r = 2 is the Chebyshev bound. The first provides a lower bound for the value of P(x).

Sharp bounds for a bounded variate have been proposed by Niemitalo, although without proof.[26]

Let 0 ≤ X ≤ M where M > 0. Then

• Case 1:
${\displaystyle \Pr(Xk\quad {\text{and}}\quad \operatorname {E} (X^{2})
• Case 2:
${\displaystyle \Pr(Xk\quad {\text{and}}\quad \operatorname {E} (X^{2})\geq k\operatorname {E} (X)+M\operatorname {E} (X)-kM\\\qquad \qquad \qquad {\text{or}}\\\operatorname {E} (X)\leq k\quad {\text{and}}\quad \operatorname {E} (X^{2})\geq k\operatorname {E} (X)\end{cases}}}$
• Case 3:
${\displaystyle \Pr(X

## Finite samples

### Univariate case

Saw et al. extended Chebyshev's inequality to cases where the population mean and variance are not known and may not exist, but the sample mean and sample standard deviation from N samples are to be employed to bound the expected value of a new drawing from the same distribution.[27]

${\displaystyle P(|X-m|\geq ks)\leq {\frac {g_{N+1}\left({\frac {Nk^{2}}{N-1+k^{2}}}\right)}{N+1}}\left({\frac {N}{N+1}}\right)^{1/2}}$

where X is a random variable which we have sampled N times, m is the sample mean, k is a constant and s is the sample standard deviation. g(x) is defined as follows:

Let x ≥ 1, Q = N + 1, and R be the greatest integer less than Q/x. Let

${\displaystyle a^{2}={\frac {Q(Q-R)}{1+R(Q-R)}}.}$

Now

${\displaystyle g_{Q}(x)={\begin{cases}R&{\text{if }}R{\text{ is even,}}\\R&{\text{if }}R{\text{ is odd and }}x<a^{2},\\R-1&{\text{if }}R{\text{ is odd and }}x\geq a^{2}.\end{cases}}}$

This inequality holds even when the population moments do not exist, and when the sample is only weakly exchangeably distributed; this criterion is met for randomised sampling. A table of values for the Saw–Yang–Mo inequality for finite sample sizes (N < 100) has been determined by Konijn.[28] The table allows the calculation of various confidence intervals for the mean, based on multiples, C, of the standard error of the mean as calculated from the sample. For example, Konijn shows that for N = 59, the 95 percent confidence interval for the mean m is (m − Cs, m + Cs) where C = 4.447 × 1.006 = 4.47 (this is 2.28 times larger than the value found on the assumption of normality, showing the loss of precision resulting from ignorance of the precise nature of the distribution).

Kabán gives a somewhat less complex version of this inequality.[29]

${\displaystyle P(|X-m|\geq ks)\leq {\frac {1}{N+1}}\left\lfloor {\frac {N+1}{N}}\left({\frac {N-1}{k^{2}}}+1\right)\right\rfloor }$
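The right-hand side of this simplified bound is straightforward to evaluate. The following sketch uses exact rational arithmetic so that the floor is not affected by rounding; the function name `kaban_bound` is illustrative and not taken from the cited paper.

```python
from fractions import Fraction
from math import floor


def kaban_bound(n_samples: int, k) -> Fraction:
    """Right-hand side of the simplified finite-sample bound above."""
    if n_samples < 2 or k <= 0:
        raise ValueError("need n_samples >= 2 and k > 0")
    k = Fraction(k)
    inner = Fraction(n_samples + 1, n_samples) * (Fraction(n_samples - 1) / k**2 + 1)
    return Fraction(floor(inner), n_samples + 1)


bound = kaban_bound(10, 2)
print(bound, float(bound))
```

As written, the expression is not capped at 1, so for small k it can exceed 1 and is then uninformative.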

If the standard deviation is a multiple of the mean then a further inequality can be derived,[29]

${\displaystyle P(|X-m|\geq ks)\leq {\frac {N-1}{N}}{\frac {1}{k^{2}}}{\frac {s^{2}}{m^{2}}}+{\frac {1}{N}}.}$

For fixed N and large m the Saw–Yang–Mo inequality is approximately[30]

${\displaystyle P(|X-m|\geq ks)\leq {\frac {1}{N+1}}.}$

Beasley et al. have suggested a modification of this inequality[30]

${\displaystyle P(|X-m|\geq ks)\leq {\frac {1}{k^{2}(N+1)}}.}$

In empirical testing this modification is conservative but appears to have low statistical power. Its theoretical basis currently remains unexplored.

#### Dependence on sample size

The bounds these inequalities give on a finite sample are less tight than those the Chebyshev inequality gives for a distribution. To illustrate this let the sample size N = 100 and let k = 3. Chebyshev's inequality states that at most approximately 11.11% of the distribution will lie at least three standard deviations away from the mean. Kabán's version of the inequality for a finite sample states that at most approximately 11.88% of the sample lies outside these limits. The dependence of the confidence intervals on sample size is further illustrated below.

For N = 10, the 95% confidence interval is approximately ±13.5789 standard deviations.

For N = 100 the 95% confidence interval is approximately ±4.9595 standard deviations; the 99% confidence interval is approximately ±140.0 standard deviations.

For N = 500 the 95% confidence interval is approximately ±4.5574 standard deviations; the 99% confidence interval is approximately ±11.1620 standard deviations.

For N = 1000 the 95% and 99% confidence intervals are approximately ±4.5141 and approximately ±10.5330 standard deviations respectively.

The Chebyshev inequality for the distribution gives 95% and 99% confidence intervals of approximately ±4.472 standard deviations and ±10 standard deviations respectively.

#### Samuelson's inequality

Although Chebyshev's inequality is the best possible bound for an arbitrary distribution, this is not necessarily true for finite samples. Samuelson's inequality states that all values of a sample will lie within √(N − 1) standard deviations of the mean. Chebyshev's bound improves as the sample size increases.

When N = 10, Samuelson's inequality states that all members of the sample lie within 3 standard deviations of the mean; in contrast, Chebyshev's inequality states that 95% of the sample lies within 13.5789 standard deviations of the mean.

When N = 100, Samuelson's inequality states that all members of the sample lie within approximately 9.9499 standard deviations of the mean; Chebyshev's inequality states that 99% of the sample lies within 10 standard deviations of the mean.

When N = 500, Samuelson's inequality states that all members of the sample lie within approximately 22.3383 standard deviations of the mean; Chebyshev's inequality states that 99% of the sample lies within 10 standard deviations of the mean.
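The Samuelson limits quoted above (3, 9.9499 and 22.3383) correspond to √(N − 1). The sketch below recomputes them and checks an extreme sample in which a single value differs from the rest. It assumes the standard deviation is computed with divisor N (`statistics.pstdev`), which is the convention under which the √(N − 1) limit is attained; the function names are illustrative.

```python
import math
from statistics import fmean, pstdev


def samuelson_limit(n_samples: int) -> float:
    """Largest possible |x_i - mean| in units of the standard deviation."""
    return math.sqrt(n_samples - 1)


def max_standardised_deviation(sample) -> float:
    """Largest |x_i - mean| / s in the sample, with s using divisor N (assumed)."""
    m, s = fmean(sample), pstdev(sample)
    return max(abs(x - m) for x in sample) / s


for n in (10, 100, 500):
    print(n, round(samuelson_limit(n), 4))

# Extreme case: nine equal values and one outlier reach the limit for N = 10.
print(max_standardised_deviation([0] * 9 + [1]))
```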

### Multivariate case

Stellato et al.[20] simplified the notation and extended the empirical Chebyshev inequality from Saw et al.[27] to the multivariate case. Let ${\textstyle \xi \in \mathbb {R} ^{n_{\xi }}}$ be a random variable and let ${\textstyle N\in \mathbb {Z} _{\geq n_{\xi }}}$. We draw ${\textstyle N+1}$ iid samples of ${\textstyle \xi }$ denoted as ${\textstyle \xi ^{(1)},\dots ,\xi ^{(N)},\xi ^{(N+1)}\in \mathbb {R} ^{n_{\xi }}}$. Based on the first ${\textstyle N}$ samples, we define the empirical mean as ${\textstyle \mu _{N}={\frac {1}{N}}\sum _{i=1}^{N}\xi ^{(i)}}$ and the unbiased empirical covariance as ${\textstyle \Sigma _{N}={\frac {1}{N}}\sum _{i=1}^{N}(\xi ^{(i)}-\mu _{N})(\xi ^{(i)}-\mu _{N})^{\top }}$. If ${\displaystyle \Sigma _{N}}$ is nonsingular, then for all ${\displaystyle \lambda \in \mathbb {R} _{\geq 0}}$

${\displaystyle {\begin{aligned}&P^{N+1}\left((\xi ^{(N+1)}-\mu _{N})^{\top }\Sigma _{N}^{-1}(\xi ^{(N+1)}-\mu _{N})\geq \lambda ^{2}\right)\\[8pt]\leq {}&\min \left\{1,{\frac {1}{N+1}}\left\lfloor {\frac {n_{\xi }(N+1)(N^{2}-1+N\lambda ^{2})}{N^{2}\lambda ^{2}}}\right\rfloor \right\}.\end{aligned}}}$

#### Remarks

In the univariate case, i.e. ${\textstyle n_{\xi }=1}$, this inequality corresponds to the one from Saw et al.[27] Moreover, the right-hand side can be simplified by upper bounding the floor function by its argument

${\displaystyle P^{N+1}\left((\xi ^{(N+1)}-\mu _{N})^{\top }\Sigma _{N}^{-1}(\xi ^{(N+1)}-\mu _{N})\geq \lambda ^{2}\right)\leq \min \left\{1,{\frac {n_{\xi }(N^{2}-1+N\lambda ^{2})}{N^{2}\lambda ^{2}}}\right\}.}$

As ${\textstyle N\to \infty }$, the right-hand side tends to ${\textstyle \min \left\{1,{\frac {n_{\xi }}{\lambda ^{2}}}\right\}}$ which corresponds to the multivariate Chebyshev inequality over ellipsoids shaped according to ${\textstyle \Sigma }$ and centered in ${\textstyle \mu }$.

## Sharpened bounds

Chebyshev's inequality is important because of its applicability to any distribution. As a result of its generality it may not (and usually does not) provide as sharp a bound as alternative methods that can be used if the distribution of the random variable is known. To improve the sharpness of the bounds provided by Chebyshev's inequality a number of methods have been developed; for a review see e.g.[31]

### Standardised variables

Sharpened bounds can be derived by first standardising the random variable.[32]

Let X be a random variable with finite variance Var(X). Let Z be the standardised form defined as

${\displaystyle Z={\frac {X-\operatorname {E} (X)}{\operatorname {Var} (X)^{1/2}}}.}$

Cantelli's lemma is then

${\displaystyle P(Z\geq k)\leq {\frac {1}{1+k^{2}}}.}$

This inequality is sharp and is attained by the distribution taking the values k and −1/k with probability 1/(1 + k²) and k²/(1 + k²) respectively.

If k > 1 and the distribution of X is symmetric then we have

${\displaystyle P(Z\geq k)\leq {\frac {1}{2k^{2}}}.}$

Equality holds if and only if Z = −k, 0 or k with probabilities 1/(2k²), 1 − 1/k² and 1/(2k²) respectively.[32] An extension to a two-sided inequality is also possible.

Let u, v > 0. Then we have[32]

${\displaystyle P(Z\leq -u{\text{ or }}Z\geq v)\leq {\frac {4+(u-v)^{2}}{(u+v)^{2}}}.}$
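A short sketch of this two-sided bound, using exact rational arithmetic, makes it easy to confirm that it reduces to the usual 1/k² when u = v = k. The function name `two_sided_bound` is illustrative.

```python
from fractions import Fraction


def two_sided_bound(u, v) -> Fraction:
    """Upper bound on Pr(Z <= -u or Z >= v) for a standardised variable Z."""
    if u <= 0 or v <= 0:
        raise ValueError("u and v must be positive")
    u, v = Fraction(u), Fraction(v)
    return (4 + (u - v) ** 2) / (u + v) ** 2


print(two_sided_bound(3, 3))  # symmetric case, equals 1/k^2 with k = 3
print(two_sided_bound(2, 4))  # asymmetric case
```

The value is not capped, so for small u and v it can be 1 or larger, in which case the bound carries no information.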

### Semivariances

An alternative method of obtaining sharper bounds is through the use of semivariances (partial variances). The upper (σ₊²) and lower (σ₋²) semivariances are defined as

${\displaystyle \sigma _{+}^{2}={\frac {\sum _{x>m}(x-m)^{2}}{n-1}},}$
${\displaystyle \sigma _{-}^{2}={\frac {\sum _{x<m}(x-m)^{2}}{n-1}},}$

where m is the arithmetic mean of the sample and n is the number of elements in the sample.

The variance of the sample is the sum of the two semivariances:

${\displaystyle \sigma ^{2}=\sigma _{+}^{2}+\sigma _{-}^{2}.}$

In terms of the lower semivariance Chebyshev's inequality can be written[33]

${\displaystyle \Pr(x\leq m-a\sigma _{-})\leq {\frac {1}{a^{2}}}.}$

Putting

${\displaystyle a={\frac {k\sigma }{\sigma _{-}}}.}$

Chebyshev's inequality can now be written

${\displaystyle \Pr(x\leq m-k\sigma )\leq {\frac {1}{k^{2}}}{\frac {\sigma _{-}^{2}}{\sigma ^{2}}}.}$

A similar result can also be derived for the upper semivariance.

If we put

${\displaystyle \sigma _{u}^{2}=\max(\sigma _{-}^{2},\sigma _{+}^{2}),}$

Chebyshev's inequality can be written

${\displaystyle \Pr(|x\leq m-k\sigma |)\leq {\frac {1}{k^{2}}}{\frac {\sigma _{u}^{2}}{\sigma ^{2}}}.}$

Because σu² ≤ σ², use of the semivariance sharpens the original inequality.

If the distribution is known to be symmetric, then

${\displaystyle \sigma _{+}^{2}=\sigma _{-}^{2}={\frac {1}{2}}\sigma ^{2}}$

and

${\displaystyle \Pr(x\leq m-k\sigma )\leq {\frac {1}{2k^{2}}}.}$

This result agrees with that derived using standardised variables.
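The semivariance bound for the lower tail can be sketched for a sample as follows. The lower semivariance sums squared deviations over observations below the mean and the upper one over observations above it, both with the n − 1 divisor; the function names and the sample values are illustrative only.

```python
from statistics import fmean, variance


def semivariances(sample):
    """Return (lower, upper) semivariances with the n - 1 divisor."""
    m = fmean(sample)
    n = len(sample)
    lower = sum((x - m) ** 2 for x in sample if x < m) / (n - 1)
    upper = sum((x - m) ** 2 for x in sample if x > m) / (n - 1)
    return lower, upper


def lower_tail_bound(sample, k):
    """Bound on Pr(x <= m - k*sigma): (1/k^2) * (lower semivariance / variance)."""
    lower, upper = semivariances(sample)
    return (lower / (lower + upper)) / k**2


sample = [1, 2, 2, 3, 10]  # right-skewed illustrative data
lower, upper = semivariances(sample)
print(lower, upper, variance(sample))
print(lower_tail_bound(sample, 2), 1 / 2**2)
```

For right-skewed data such as this, most of the variance sits in the upper semivariance, so the lower-tail bound is considerably smaller than 1/k².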

Note
The inequality with the lower semivariance has been found to be of use in estimating downside risk in finance and agriculture.[33][34][35]

### Selberg's inequality

Selberg derived an inequality for P(x) when a ≤ x ≤ b.[36] To simplify the notation let

${\displaystyle Y=\alpha X+\beta }$

where

${\displaystyle \alpha ={\frac {2k}{b-a}}}$

and

${\displaystyle \beta ={\frac {-(b+a)k}{b-a}}.}$

The result of this linear transformation is to make P(a ≤ X ≤ b) equal to P(|Y| ≤ k).

The mean (μX) and variance (σX²) of X are related to the mean (μY) and variance (σY²) of Y:

${\displaystyle \mu _{Y}=\alpha \mu _{X}+\beta }$
${\displaystyle \sigma _{Y}^{2}=\alpha ^{2}\sigma _{X}^{2}.}$

With this notation Selberg's inequality states that

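${\displaystyle \Pr(|Y|<k)\geq {\frac {(k-\mu _{Y})^{2}}{(k-\mu _{Y})^{2}+\sigma _{Y}^{2}}}\quad {\text{ if }}\quad \sigma _{Y}^{2}\leq \mu _{Y}(k-\mu _{Y}),}$
${\displaystyle \Pr(|Y|<k)\geq 1-{\frac {\sigma _{Y}^{2}+\mu _{Y}^{2}}{k^{2}}}\quad {\text{ if }}\quad \mu _{Y}(k-\mu _{Y})\leq \sigma _{Y}^{2}\leq k^{2}-\mu _{Y}^{2},}$
${\displaystyle P(|Y|<k)\geq 0\quad {\text{ if }}\quad k^{2}-\mu _{Y}^{2}\leq \sigma _{Y}^{2}.}$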

These are known to be the best possible bounds.[37]

### Cantelli's inequality

Cantelli's inequality,[38] due to Francesco Paolo Cantelli, states that for a real random variable (X) with mean (μ) and variance (σ²)

${\displaystyle P(X-\mu \geq a)\leq {\frac {\sigma ^{2}}{\sigma ^{2}+a^{2}}}}$

where a ≥ 0.

This inequality can be used to prove a one tailed variant of Chebyshev's inequality with k > 0[39]

${\displaystyle \Pr(X-\mu \geq k\sigma )\leq {\frac {1}{1+k^{2}}}.}$

The bound on the one tailed variant is known to be sharp. To see this consider the random variable X that takes the values

${\displaystyle X=1}$ with probability ${\displaystyle {\frac {\sigma ^{2}}{1+\sigma ^{2}}}}$
${\displaystyle X=-\sigma ^{2}}$ with probability ${\displaystyle {\frac {1}{1+\sigma ^{2}}}.}$

Then E(X) = 0 and E(X²) = σ² and P(X < 1) = 1 / (1 + σ²).
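The moments of this two-point distribution, and the fact that it attains Cantelli's bound at a = 1, can be confirmed with exact arithmetic. In the sketch below `cantelli_extremal` is a hypothetical helper name and `var` plays the role of σ².

```python
from fractions import Fraction


def cantelli_extremal(var):
    """Return (mean, second moment, Pr(X - mean >= 1), Cantelli bound at a = 1)."""
    var = Fraction(var)
    dist = {Fraction(1): var / (1 + var), -var: 1 / (1 + var)}
    mean = sum(x * p for x, p in dist.items())
    second = sum(x * x * p for x, p in dist.items())
    tail = sum(p for x, p in dist.items() if x - mean >= 1)
    return mean, second, tail, var / (var + 1)


for var in (Fraction(1, 4), 1, 9):
    print(var, cantelli_extremal(var))
```

In each case the mean is 0, the second moment equals σ², and the upper-tail probability coincides with σ²/(σ² + 1).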

#### An application - distance between the mean and the median

The one-sided variant can be used to prove the proposition that for probability distributions having an expected value and a median, the mean and the median can never differ from each other by more than one standard deviation. To express this in symbols let μ, ν, and σ be respectively the mean, the median, and the standard deviation. Then

${\displaystyle \left|\mu -\nu \right|\leq \sigma .}$

There is no need to assume that the variance is finite because this inequality is trivially true if the variance is infinite.

The proof is as follows. Setting k = 1 in the statement for the one-sided inequality gives:

${\displaystyle \Pr(X-\mu \geq \sigma )\leq {\frac {1}{2}}\implies \Pr(X\geq \mu +\sigma )\leq {\frac {1}{2}}.}$

Changing the sign of X and of μ, we get

${\displaystyle \Pr(X\leq \mu -\sigma )\leq {\frac {1}{2}}.}$

As the median is by definition any real number m that satisfies the inequalities

${\displaystyle \operatorname {P} (X\leq m)\geq {\frac {1}{2}}{\text{ and }}\operatorname {P} (X\geq m)\geq {\frac {1}{2}}}$

this implies that the median lies within one standard deviation of the mean. A proof using Jensen's inequality also exists.
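Because the result holds for any distribution with a finite variance, it applies in particular to the empirical distribution of a sample, where the standard deviation is taken with divisor N. A small sketch with deliberately skewed, made-up data:

```python
from statistics import fmean, median, pstdev


def mean_median_gap(sample):
    """Return (|mean - median|, standard deviation) of the empirical distribution."""
    return abs(fmean(sample) - median(sample)), pstdev(sample)


skewed = [1, 1, 1, 2, 2, 3, 50]
gap, sd = mean_median_gap(skewed)
print(gap, sd)
```

Even with one extreme observation pulling the mean far above the median, the gap stays below one standard deviation, because the same outlier inflates the standard deviation.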

### Bhattacharyya's inequality

Bhattacharyya[40] extended Cantelli's inequality using the third and fourth moments of the distribution.

Let μ = 0 and σ² be the variance. Let γ = E(X³)/σ³ and κ = E(X⁴)/σ⁴.

If k² − kγ − 1 > 0 then

${\displaystyle P(X>k\sigma )\leq {\frac {\kappa -\gamma ^{2}-1}{(\kappa -\gamma ^{2}-1)(1+k^{2})+(k^{2}-k\gamma -1)}}.}$

The necessity of k² − kγ − 1 > 0 requires that k be reasonably large.

### Mitzenmacher and Upfal's inequality

Mitzenmacher and Upfal[41] note that

${\displaystyle (X-\operatorname {E} [X])^{2k}\geq 0}$

for any integer k > 0 and that

${\displaystyle \operatorname {E} [(X-\operatorname {E} (X))^{2k}]}$

is the 2kth central moment. They then show that for t > 0

${\displaystyle \Pr \left(|X-\operatorname {E} [X]|>t\operatorname {E} [(X-\operatorname {E} [X])^{2k}]^{1/2k}\right)\leq {\frac {1}{t^{2k}}}.}$

For k = 1 we obtain Chebyshev's inequality. For t ≥ 1, k > 2 and assuming that the kth moment exists, this bound is tighter than Chebyshev's inequality.

## Related inequalities

Several other related inequalities are also known.

### Zelen's inequality

Zelen has shown that[42]

${\displaystyle \Pr(X-\mu \geq k\sigma )\leq \left[1+k^{2}+{\frac {\left(k^{2}-k\theta _{3}-1\right)^{2}}{\theta _{4}-\theta _{3}^{2}-1}}\right]^{-1}}$

with

${\displaystyle k\geq {\frac {\theta _{3}+{\sqrt {\theta _{3}^{2}+4}}}{2}},\qquad \theta _{m}={\frac {M_{m}}{\sigma }}}$

where Mm is the m-th moment[clarification needed] and σ is the standard deviation.
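A minimal sketch of how the bound might be evaluated, assuming θ3 and θ4 are supplied as dimensionless (standardized) third and fourth moments. That reading, and the function names `zelen_bound` and `cantelli_bound`, are assumptions for illustration. Cantelli's one-sided bound 1/(1 + k²) is included for comparison.

```python
import math

def zelen_bound(k, theta3, theta4):
    """Upper bound on Pr(X - mu >= k*sigma) from the formula quoted above."""
    k_min = (theta3 + math.sqrt(theta3 ** 2 + 4)) / 2
    if k < k_min:
        raise ValueError("formula is stated only for k >= %.4f" % k_min)
    excess = theta4 - theta3 ** 2 - 1
    if excess <= 0:
        raise ValueError("requires theta4 - theta3**2 - 1 > 0")
    return 1.0 / (1 + k ** 2 + (k ** 2 - k * theta3 - 1) ** 2 / excess)

def cantelli_bound(k):
    """One-sided Chebyshev (Cantelli) bound, which uses the variance only."""
    return 1.0 / (1 + k ** 2)

# theta3 = 0 and theta4 = 3 are the standardized moments of a normal distribution.
print(zelen_bound(2.0, 0.0, 3.0), cantelli_bound(2.0))
```

At the smallest admissible k the squared term vanishes, so the two bounds coincide there; for larger k the extra term makes Zelen's bound the smaller of the two.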

### He, Zhang and Zhang's inequality

For any collection of n non-negative independent random variables Xi with expectation 1,[43]

${\displaystyle \Pr \left({\frac {\sum _{i=1}^{n}X_{i}}{n}}-1\geq {\frac {1}{n}}\right)\leq {\frac {7}{8}}.}$

### Hoeffding's lemma

Let X be a random variable with a ≤ X ≤ b and E[X] = 0. Then for any s > 0, we have

${\displaystyle E\left[e^{sX}\right]\leq e^{{\frac {1}{8}}s^{2}(b-a)^{2}}.}$
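The lemma can be illustrated with a two-point distribution on {a, b} whose probabilities are chosen so that the mean is zero. The sketch below compares the exact moment generating function with the bound; the function names and the values a = −1, b = 3 are assumptions made for the example.

```python
import math

def two_point_mgf(a, b, s):
    """E[exp(sX)] for X taking values a and b with E[X] = 0 (requires a < 0 < b)."""
    p_b = -a / (b - a)   # Pr(X = b), chosen so that the mean is zero
    p_a = b / (b - a)    # Pr(X = a)
    return p_a * math.exp(s * a) + p_b * math.exp(s * b)

def hoeffding_bound(a, b, s):
    """Right-hand side of Hoeffding's lemma."""
    return math.exp(s ** 2 * (b - a) ** 2 / 8)

for s in (0.1, 0.5, 1.0, 2.0):
    print(s, two_point_mgf(-1.0, 3.0, s), hoeffding_bound(-1.0, 3.0, s))
```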

### Van Zuijlen's bound

Let Xi be a set of independent Rademacher random variables: Pr(Xi = 1) = Pr(Xi = −1) = 0.5. Then[44]

${\displaystyle \Pr \left(\left|{\frac {\sum _{i=1}^{n}X_{i}}{\sqrt {n}}}\right|\leq 1\right)\geq 0.5.}$

The bound is sharp and better than that which can be derived from the normal distribution (approximately Pr > 0.31).
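Because a sum of n Rademacher variables is a shifted binomial variable, the probability in the bound can be computed exactly for small n. The following sketch does so with integer arithmetic; the function name `rademacher_central_prob` is an assumption for illustration.

```python
from fractions import Fraction
from math import comb

def rademacher_central_prob(n):
    """Exact Pr(|X_1 + ... + X_n| <= sqrt(n)) for independent Rademacher X_i."""
    # The sum equals 2*j - n when exactly j of the variables are +1;
    # comparing squares avoids floating-point square roots.
    favourable = sum(comb(n, j) for j in range(n + 1) if (2 * j - n) ** 2 <= n)
    return Fraction(favourable, 2 ** n)

for n in (1, 2, 3, 10, 50):
    print(n, float(rademacher_central_prob(n)))
```

For n = 2 the probability is exactly 1/2, which is the case showing that the bound is sharp.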

## Unimodal distributions

A distribution with cumulative distribution function F is unimodal at ν if F is convex on (−∞, ν) and concave on (ν, ∞).[45] An empirical distribution can be tested for unimodality with the dip test.[46]

In 1823 Gauss showed that for a unimodal distribution with a mode of zero[47]

${\displaystyle P(|X|\geq k)\leq {\frac {4\operatorname {E} (X^{2})}{9k^{2}}}\quad {\text{if}}\quad k^{2}\geq {\frac {4}{3}}\operatorname {E} (X^{2}),}$
${\displaystyle P(|X|\geq k)\leq 1-{\frac {k}{\sqrt {3\operatorname {E} (X^{2})}}}\quad {\text{if}}\quad k^{2}\leq {\frac {4}{3}}\operatorname {E} (X^{2}).}$

If the mode is not zero and the mean (μ) and standard deviation (σ) are both finite, then denoting the median as ν and the root mean square deviation from the mode by ω, we have[citation needed]

${\displaystyle \sigma \leq \omega \leq 2\sigma }$

and

${\displaystyle |\nu -\mu |\leq {\sqrt {\frac {3}{4}}}\omega .}$

Winkler in 1866 extended Gauss' inequality to r-th moments,[48] where r > 0 and the distribution is unimodal with a mode of zero:

${\displaystyle P(|X|\geq k)\leq \left({\frac {r}{r+1}}\right)^{r}{\frac {\operatorname {E} (|X|^{r})}{k^{r}}}\quad {\text{if}}\quad k^{r}\geq {\frac {r^{r}}{(r+1)^{r-1}}}\operatorname {E} (|X|^{r}),}$
${\displaystyle P(|X|\geq k)\leq \left(1-\left[{\frac {k^{r}}{(r+1)\operatorname {E} (|X|^{r})}}\right]^{1/r}\right)\quad {\text{if}}\quad k^{r}\leq {\frac {r^{r}}{(r+1)^{r-1}}}\operatorname {E} (|X|^{r}).}$

Gauss' bound has subsequently been sharpened and extended to apply to departures from the mean rather than the mode by the Vysochanskiï–Petunin inequality. The latter has been extended by Dharmadhikari and Joag-Dev:[49]

${\displaystyle P(|X|>k)\leq \max \left(\left[{\frac {r}{(r+1)k}}\right]^{r}E|X^{r}|,{\frac {s}{(s-1)k^{r}}}E|X^{r}|-{\frac {1}{s-1}}\right)}$

where s is a constant satisfying both s > r + 1 and s(s − r − 1) = r^r, and r > 0.

It can be shown that these inequalities are the best possible and that further sharpening of the bounds requires that additional restrictions be placed on the distributions.

### Unimodal symmetrical distributions

The bounds on this inequality can also be sharpened if the distribution is both unimodal and symmetrical.[50] An empirical distribution can be tested for symmetry with a number of tests including McWilliam's R*.[51] It is known that the variance of a unimodal symmetrical distribution with finite support [a, b] is less than or equal to (b − a)² / 12.[52]

Let the distribution be supported on the finite interval [−N, N] and the variance be finite. Let the mode of the distribution be zero and rescale the variance to 1. Let k > 0 and assume k < 2N/3. Then[50]

${\displaystyle P(X\geq k)\leq {\frac {1}{2}}-{\frac {k}{2{\sqrt {3}}}}\quad {\text{if}}\quad 0\leq k\leq {\frac {2}{\sqrt {3}}},}$
${\displaystyle P(X\geq k)\leq {\frac {2}{9k^{2}}}\quad {\text{if}}\quad {\frac {2}{\sqrt {3}}}\leq k\leq {\frac {2N}{3}}.}$

If 0 < k ≤ 2/√3 the bounds are reached with the density[50]

${\displaystyle f(x)={\frac {1}{2{\sqrt {3}}}}\quad {\text{if}}\quad |x|<{\sqrt {3}}}$
${\displaystyle f(x)=0\quad {\text{if}}\quad |x|\geq {\sqrt {3}}.}$

If 2/√3 < k ≤ 2N/3 the bounds are attained by the distribution

${\displaystyle (1-\beta _{k})\delta _{0}(x)+\beta _{k}f_{k}(x),}$

where βk = 4/(3k²), δ0 is the Dirac delta function and where

${\displaystyle f_{k}(x)={\frac {1}{3k}}\quad {\text{if}}\quad |x|<{\frac {3k}{2}},}$
${\displaystyle f_{k}(x)=0\quad {\text{if}}\quad |x|\geq {\frac {3k}{2}}.}$

The existence of these densities shows that the bounds are optimal. Since N is arbitrary these bounds apply to any value of N.
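That these two distributions have unit variance and attain the stated bounds can be verified from their closed forms. The sketch below does this, taking the split point between the two cases as 2/√3 (to match the ranges of the bounds above) and the mixing weight as βk = 4/(3k²); the function names are assumptions for illustration.

```python
import math

def uniform_case(k):
    """Variance and Pr(X >= k) for the uniform density on (-sqrt(3), sqrt(3))."""
    half_width = math.sqrt(3)
    variance = half_width ** 2 / 3          # variance of Uniform(-h, h) is h^2 / 3
    tail = max(0.0, (half_width - k) / (2 * half_width))
    return variance, tail

def mixture_case(k):
    """Variance and Pr(X >= k) for (1 - beta)*delta_0 + beta*Uniform(-3k/2, 3k/2)."""
    beta = 4 / (3 * k ** 2)                 # a valid weight only when k >= 2/sqrt(3)
    half_width = 1.5 * k
    variance = beta * half_width ** 2 / 3   # the point mass at 0 adds no variance
    tail = beta * (half_width - k) / (2 * half_width)
    return variance, tail

print(uniform_case(0.5), 0.5 - 0.5 / (2 * math.sqrt(3)))
print(mixture_case(2.0), 2 / (9 * 2.0 ** 2))
```

In each printed line the second component of the tuple agrees with the bound printed after it, and the first component is the unit variance.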

The Camp–Meidell inequality is a related inequality.[53] For an absolutely continuous unimodal and symmetrical distribution

${\displaystyle P(|X-\mu |\geq k\sigma )\leq 1-{\frac {k}{\sqrt {3}}}\quad {\text{if}}\quad k\leq {\frac {2}{\sqrt {3}}},}$
${\displaystyle P(|X-\mu |\geq k\sigma )\leq {\frac {4}{9k^{2}}}\quad {\text{if}}\quad k>{\frac {2}{\sqrt {3}}}.}$

DasGupta has shown that if the distribution is known to be normal[54]

${\displaystyle P(|X-\mu |\geq k\sigma )\leq {\frac {1}{3k^{2}}}.}$

### Notes

#### Effects of symmetry and unimodality

Symmetry of the distribution decreases the inequality's bounds by a factor of 2 while unimodality sharpens the bounds by a factor of 4/9.[citation needed]

Because the mean and the mode in a unimodal distribution differ by at most √3 standard deviations,[55] at most 5% of a symmetrical unimodal distribution lies outside (2√10 + 3√3)/3 standard deviations of the mean (approximately 3.840 standard deviations). This is sharper than the bounds provided by the Chebyshev inequality (approximately 4.472 standard deviations).

These bounds on the mean are less sharp than those that can be derived from symmetry of the distribution alone, which shows that at most 5% of the distribution lies outside approximately 3.162 standard deviations of the mean. The Vysochanskiï–Petunin inequality further sharpens this bound by showing that, for such a distribution, at most 5% of the distribution lies outside 4√5/3 (approximately 2.981) standard deviations of the mean.

#### Symmetrical unimodal distributions

For any symmetrical unimodal distribution[citation needed]

• at most approximately 5.784% of the distribution lies outside 1.96 standard deviations of the mode
• at most 5% of the distribution lies outside 2√10/3 (approximately 2.11) standard deviations of the mode

#### Normal distributions

DasGupta's inequality states that for a normal distribution at least 95% lies within approximately 2.582 standard deviations of the mean. This is less sharp than the true figure (approximately 1.96 standard deviations of the mean).
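The 5% thresholds quoted in these notes can be reproduced by solving c/k² = 0.05 for each bound of the form c/k². The sketch below assumes the constants c = 1 (Chebyshev), 1/2 (symmetry alone), 4/9 (Vysochanskiï–Petunin) and 1/3 (DasGupta, normal case); the dictionary and function names are illustrative.

```python
import math

# Each bound has the form c / k**2; solving c / k**2 = alpha gives k = sqrt(c / alpha).
BOUND_CONSTANTS = {
    "Chebyshev (any distribution)": 1.0,
    "symmetry alone": 1.0 / 2.0,
    "Vysochanskii-Petunin (unimodal)": 4.0 / 9.0,
    "DasGupta (normal)": 1.0 / 3.0,
}

def threshold(c, alpha=0.05):
    """Number of standard deviations k at which the bound c / k**2 equals alpha."""
    return math.sqrt(c / alpha)

for name, c in BOUND_CONSTANTS.items():
    print(f"{name}: {threshold(c):.3f}")
```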

## Bounds for specific distributions

• DasGupta has determined a set of best possible bounds for a normal distribution for this inequality.[54]
• Steliga and Szynal have extended these bounds to the Pareto distribution.[8]
• Grechuk et al. developed a general method for deriving the best possible bounds in Chebyshev's inequality for any family of distributions, and any deviation risk measure in place of standard deviation. In particular, they derived a Chebyshev inequality for distributions with log-concave densities.[56]

## Zero means

When the mean (μ) is zero, Chebyshev's inequality takes a simple form. Let σ² be the variance. Then

${\displaystyle P(|X|\geq 1)\leq \sigma ^{2}.}$

With the same conditions Cantelli's inequality takes the form

${\displaystyle P(X\geq 1)\leq {\frac {\sigma ^{2}}{1+\sigma ^{2}}}.}$
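Both forms follow from the general statements by choosing the deviation to be 1. As a short worked step, assuming σ > 0 and taking the two-sided inequality as Pr(|X − μ| ≥ kσ) ≤ 1/k² and the one-sided inequality as Pr(X − μ ≥ kσ) ≤ 1/(1 + k²), setting μ = 0 and k = 1/σ gives

${\displaystyle P(|X|\geq 1)=P\left(|X|\geq {\frac {1}{\sigma }}\,\sigma \right)\leq {\frac {1}{(1/\sigma )^{2}}}=\sigma ^{2},\qquad P(X\geq 1)\leq {\frac {1}{1+(1/\sigma )^{2}}}={\frac {\sigma ^{2}}{1+\sigma ^{2}}}.}$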

### Unit variance

If in addition E(X²) = 1 and E(X⁴) = ψ then for any 0 ≤ ε ≤ 1[57]

${\displaystyle \Pr(|X|>\varepsilon )\geq {\frac {(1-\varepsilon ^{2})^{2}}{\psi -1+(1-\varepsilon ^{2})^{2}}}\geq {\frac {(1-\varepsilon ^{2})^{2}}{\psi }}.}$

The first inequality is sharp. This is known as the Paley–Zygmund inequality.
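A small numerical illustration of the first (sharper) lower bound is sketched below. The three-point test distribution and the function name `fourth_moment_lower_bound` are assumptions chosen for the example; the distribution is symmetric, so its mean is zero as required.

```python
def fourth_moment_lower_bound(eps, psi):
    """Lower bound on Pr(|X| > eps), assuming E(X^2) = 1, E(X^4) = psi > 1, 0 <= eps <= 1."""
    gap = (1 - eps ** 2) ** 2
    return gap / (psi - 1 + gap)

# Hypothetical test case: X = 0 with probability 1 - p and X = +/- 1/sqrt(p) with
# probability p/2 each, so E(X^2) = 1, E(X^4) = 1/p and Pr(|X| > eps) = p for eps <= 1.
p = 0.25
psi = 1 / p
for eps in (0.0, 0.5, 0.9):
    print(eps, fourth_moment_lower_bound(eps, psi), "<=", p)
```

At ε = 0 the bound equals p for this distribution, which illustrates the sharpness of the first inequality.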

It is also known that, for a random variable obeying the above conditions,[58]

${\displaystyle P(X\geq \varepsilon )\geq {\frac {C_{0}}{\psi }}-{\frac {C_{1}}{\sqrt {\psi }}}\varepsilon +{\frac {C_{2}}{\psi {\sqrt {\psi }}}}\varepsilon }$

where

${\displaystyle C_{0}=2{\sqrt {3}}-3\quad (\approxeq 0.464),}$
${\displaystyle C_{1}=1.397,}$
${\displaystyle C_{2}=0.0231.}$

It is also known that[58]

${\displaystyle \Pr(X>0)\geq {\frac {C_{0}}{\psi }}.}$

The value of C0 is optimal and the bounds are sharp if

${\displaystyle \psi \geq {\frac {3}{{\sqrt {3}}+1}}\quad (\approxeq 1.098).}$

If

${\displaystyle \psi \leq {\frac {3}{{\sqrt {3}}+1}}}$

then the sharp bound is

${\displaystyle P(X>0)\geq {\frac {2}{3+\psi +{\sqrt {(1+\psi )^{2}-4}}}}.}$

## Integral Chebyshev inequality

There is a second (less well known) inequality also named after Chebyshev.[59]

If f, g : [a, b] → R are two monotonic functions of the same monotonicity, then

${\displaystyle {\frac {1}{b-a}}\int _{a}^{b}\!f(x)g(x)\,dx\geq \left[{\frac {1}{b-a}}\int _{a}^{b}\!f(x)\,dx\right]\left[{\frac {1}{b-a}}\int _{a}^{b}\!g(x)\,dx\right].}$

If f and g are of opposite monotonicity, then the inequality is reversed.
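Both directions can be checked numerically with a simple quadrature rule. The sketch below approximates the averages with the midpoint rule; the helper names and the test functions x, x² and −x² on [0, 1] are assumptions made for the example.

```python
def mean_value(func, a, b, n=10_000):
    """Midpoint-rule approximation of (1/(b-a)) times the integral of func over [a, b]."""
    h = (b - a) / n
    return sum(func(a + (i + 0.5) * h) for i in range(n)) / n

def chebyshev_integral_sides(f, g, a, b):
    """Return (average of f*g, product of the averages of f and g)."""
    lhs = mean_value(lambda x: f(x) * g(x), a, b)
    rhs = mean_value(f, a, b) * mean_value(g, a, b)
    return lhs, rhs

# Both increasing on [0, 1]: the left side (about 1/4) exceeds the right (about 1/6).
print(chebyshev_integral_sides(lambda x: x, lambda x: x * x, 0.0, 1.0))
# Opposite monotonicity: the comparison is reversed.
print(chebyshev_integral_sides(lambda x: x, lambda x: -x * x, 0.0, 1.0))
```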

This inequality is related to Jensen's inequality,[60] Kantorovich's inequality,[61] the Hermite–Hadamard inequality[61] and Walter's conjecture.[62]

### Other inequalities

There are also a number of other inequalities associated with Chebyshev:

## Haldane's transformation

One use of Chebyshev's inequality in applications is to create confidence intervals for variates with an unknown distribution. Haldane noted,[63] using an equation derived by Kendall,[64] that if a variate (x) has a zero mean, unit variance and both finite skewness (γ) and kurtosis (κ) then the variate can be converted to a normally distributed standard score (z):

${\displaystyle z=x-{\frac {\gamma }{6}}(x^{2}-1)+{\frac {x}{72}}[2\gamma ^{2}(4x^{2}-7)-3\kappa (x^{2}-3)]+\cdots }$

This transformation may be useful as an alternative to Chebyshev's inequality or as an adjunct to it for deriving confidence intervals for variates with unknown distributions.

While this transformation may be useful for moderately skewed and/or kurtotic distributions, it performs poorly when the distribution is markedly skewed and/or kurtotic.
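A minimal sketch of the transformation, keeping only the terms written out above and dropping everything after the ellipsis, might look as follows. The function name `haldane_z` is hypothetical, and κ is assumed to be measured so that it is zero for a normal variate (so that z = x when γ = κ = 0).

```python
def haldane_z(x, skewness, kurtosis):
    """Truncated transformation of a variate x with zero mean and unit variance."""
    first = (skewness / 6.0) * (x ** 2 - 1)
    second = (x / 72.0) * (
        2 * skewness ** 2 * (4 * x ** 2 - 7) - 3 * kurtosis * (x ** 2 - 3)
    )
    return x - first + second

print(haldane_z(1.5, 0.0, 0.0))   # no correction when skewness and kurtosis are zero
print(haldane_z(1.5, 0.5, 0.3))
```

Because the series is truncated, the sketch inherits the limitation noted above and should not be relied on for markedly skewed or kurtotic data.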

## Notes

The Environmental Protection Agency has suggested best practices for the use of Chebyshev's inequality for estimating confidence intervals.[65] This caution appears to be justified as its use in this context may be seriously misleading.[66]

## References

1. ^ Kvanli, Alan H.; Pavur, Robert J.; Keeling, Kellie B. (2006). Concise Managerial Statistics. Cengage Learning. pp. 81–82. ISBN 9780324223880.
2. ^ Chernick, Michael R. (2011). The Essentials of Biostatistics for Physicians, Nurses, and Clinicians. John Wiley & Sons. pp. 49–50. ISBN 9780470641859.
3. ^ Knuth, Donald (1997). The Art of Computer Programming: Fundamental Algorithms, Volume 1 (3rd ed.). Reading, Massachusetts: Addison–Wesley. ISBN 978-0-201-89683-1. Retrieved 1 October 2012.
4. ^ Bienaymé, I.-J. (1853). "Considérations à l'appui de la découverte de Laplace". Comptes Rendus de l'Académie des Sciences. 37: 309–324.
5. ^ Tchebichef, P. (1867). "Des valeurs moyennes". Journal de Mathématiques Pures et Appliquées. 2. 12: 177–184.
6. ^ Markov A. (1884) On certain applications of algebraic continued fractions, Ph.D. thesis, St. Petersburg
7. ^ Grafakos, Lukas (2004). Classical and Modern Fourier Analysis. Pearson Education Inc. p. 5.
8. ^ a b Steliga, Katarzyna; Szynal, Dominik (2010). "On Markov-Type Inequalities" (PDF). International Journal of Pure and Applied Mathematics. 58 (2): 137–152. ISSN 1311-8080. Retrieved 10 October 2012.
9. ^ a b Ferentinos, K (1982). "On Tchebycheff type inequalities". Trabajos Estadíst Investigacion Oper. 33: 125–132. doi:10.1007/BF02888707.
10. ^ Berge, P. O. (1938). "A note on a form of Tchebycheff's theorem for two variables". Biometrika. 29 (3/4): 405–406. doi:10.2307/2332015. JSTOR 2332015.
11. ^ Lal D. N. (1955) A note on a form of Tchebycheff's inequality for two or more variables. Sankhya 15(3):317–320
12. ^ Isii K. (1959) On a method for generalizations of Tchebycheff's inequality. Ann Inst Stat Math 10: 65–88
13. ^ Birnbaum, Z. W.; Raymond, J.; Zuckerman, H. S. (1947). "A Generalization of Tshebyshev's Inequality to Two Dimensions". The Annals of Mathematical Statistics. 18 (1): 70–79. doi:10.1214/aoms/1177730493. ISSN 0003-4851. MR 0019849. Zbl 0032.03402. Retrieved 7 October 2012.
14. ^ Kotz, Samuel; Balakrishnan, N.; Johnson, Norman L. (2000). Continuous Multivariate Distributions, Volume 1, Models and Applications (2nd ed.). Boston [u.a.]: Houghton Mifflin. ISBN 978-0-471-18387-7. Retrieved 7 October 2012.
15. ^ Olkin, Ingram; Pratt, John W. (1958). "A Multivariate Tchebycheff Inequality". The Annals of Mathematical Statistics. 29 (1): 226–234. doi:10.1214/aoms/1177706720. MR 0093865. Zbl 0085.35204.
16. ^ Godwin H. J. (1964) Inequalities on distribution functions. New York, Hafner Pub. Co.
17. ^ Xinjia Chen (2007). "A New Generalization of Chebyshev Inequality for Random Vectors". arXiv:0707.0805v2 [math.ST].
18. ^ Jorge Navarro (2016). "A very simple proof of the multivariate Chebyshev's inequality". Communications in Statistics – Theory and Methods. 45 (12): 3458–3463. doi:10.1080/03610926.2013.873135.
19. ^ Jorge Navarro (2014). "Can the bounds in the multivariate Chebyshev inequality be attained?". Statistics and Probability Letters. 91: 1–5. doi:10.1016/j.spl.2014.03.028.
20. ^ a b Stellato, Bartolomeo; Parys, Bart P. G. Van; Goulart, Paul J. (2016-05-31). "Multivariate Chebyshev Inequality with Estimated Mean and Variance". The American Statistician. 0 (ja): 123–127. arXiv:1509.08398. doi:10.1080/00031305.2016.1186559. ISSN 0003-1305.
21. ^ Vandenberghe, L.; Boyd, S.; Comanor, K. (2007-01-01). "Generalized Chebyshev Bounds via Semidefinite Programming". SIAM Review. 49 (1): 52–64. Bibcode:2007SIAMR..49...52V. CiteSeerX 10.1.1.126.9105. doi:10.1137/S0036144504440543. ISSN 0036-1445.
22. ^ Vakhania, Nikolai Nikolaevich. Probability distributions on linear spaces. New York: North Holland, 1981.
23. ^ Section 2.1 Archived April 30, 2015, at the Wayback Machine
24. ^ Baranoski, Gladimir V. G.; Rokne, Jon G.; Xu, Guangwu (15 May 2001). "Applying the exponential Chebyshev inequality to the nondeterministic computation of form factors". Journal of Quantitative Spectroscopy and Radiative Transfer. 69 (4): 199–200. Bibcode:2001JQSRT..69..447B. doi:10.1016/S0022-4073(00)00095-9. (the references for this article are corrected by Baranoski, Gladimir V. G.; Rokne, Jon G.; Guangwu Xu (15 January 2002). "Corrigendum to: 'Applying the exponential Chebyshev inequality to the nondeterministic computation of form factors'". Journal of Quantitative Spectroscopy and Radiative Transfer. 72 (2): 199–200. Bibcode:2002JQSRT..72..199B. doi:10.1016/S0022-4073(01)00171-6.)
25. ^ Dufour (2003) Properties of moments of random variables
26. ^ Niemitalo O. (2012) One-sided Chebyshev-type inequalities for bounded probability distributions.
27. ^ a b c Saw, John G.; Yang, Mark C. K.; Mo, Tse Chin (1984). "Chebyshev Inequality with Estimated Mean and Variance". The American Statistician. 38 (2): 130–2. doi:10.2307/2683249. ISSN 0003-1305. JSTOR 2683249.
28. ^ a b Konijn, Hendrik S. (February 1987). "Distribution-Free and Other Prediction Intervals". The American Statistician. 41 (1): 11–15. doi:10.2307/2684311. JSTOR 2684311.
29. ^ a b Kabán, Ata (2012). "Non-parametric detection of meaningless distances in high dimensional data". Statistics and Computing. 22 (2): 375–85. doi:10.1007/s11222-011-9229-0.
30. ^ a b Beasley, T. Mark; Page, Grier P.; Brand, Jaap P. L.; Gadbury, Gary L.; Mountz, John D.; Allison, David B. (January 2004). "Chebyshev's inequality for nonparametric testing with small N and α in microarray research". Journal of the Royal Statistical Society. C (Applied Statistics). 53 (1): 95–108. doi:10.1111/j.1467-9876.2004.00428.x. ISSN 1467-9876.
31. ^ Savage, I. Richard. "Probability inequalities of the Tchebycheff type." Journal of Research of the National Bureau of Standards-B. Mathematics and Mathematical Physics B 65 (1961): 211-222
32. ^ a b c Ion, Roxana Alice (2001). "Chapter 4: Sharp Chebyshev-type inequalities". Nonparametric Statistical Process Control. Universiteit van Amsterdam. ISBN 978-9057760761. Retrieved 1 October 2012.
33. ^ a b Berck, Peter; Hihn, Jairus M. (May 1982). "Using the Semivariance to Estimate Safety-First Rules". American Journal of Agricultural Economics. 64 (2): 298–300. doi:10.2307/1241139. ISSN 0002-9092. JSTOR 1241139. Retrieved 8 October 2012.
34. ^ Nantell, Timothy J.; Price, Barbara (June 1979). "An Analytical Comparison of Variance and Semivariance Capital Market Theories". The Journal of Financial and Quantitative Analysis. 14 (2): 221–42. doi:10.2307/2330500. JSTOR 2330500.
35. ^ Neave, Edwin H.; Ross, Michael N.; Yang, Jun (2009). "Distinguishing upside potential from downside risk". Management Research News. 32 (1): 26–36. doi:10.1108/01409170910922005. ISSN 0140-9174.
36. ^ Selberg, Henrik L. (1940). "Zwei Ungleichungen zur Ergänzung des Tchebycheffschen Lemmas" [Two Inequalities Supplementing the Tchebycheff Lemma]. Skandinavisk Aktuarietidskrift (Scandinavian Actuarial Journal) (in German). 1940 (3–4): 121–125. doi:10.1080/03461238.1940.10404804. ISSN 0346-1238. OCLC 610399869.
37. ^ Conlon, J.; Dulá, J. H. "A geometric derivation and interpretation of Tchebyscheff's Inequality" (PDF). Retrieved 2 October 2012.
38. ^ Cantelli F. (1910) Intorno ad un teorema fondamentale della teoria del rischio. Bolletino dell Associazione degli Attuari Italiani
39. ^ Grimmett and Stirzaker, problem 7.11.9. Several proofs of this result can be found in Chebyshev's Inequalities by A. G. McDowell.
40. ^ Bhattacharyya, B. B. (1987). "One-sided chebyshev inequality when the first four moments are known". Communications in Statistics – Theory and Methods. 16 (9): 2789–91. doi:10.1080/03610928708829540. ISSN 0361-0926.
41. ^ Mitzenmacher, Michael; Upfal, Eli (January 2005). Probability and Computing: Randomized Algorithms and Probabilistic Analysis (Repr. ed.). Cambridge [u.a.]: Cambridge Univ. Press. ISBN 9780521835404. Retrieved 6 October 2012.
42. ^ Zelen M. (1954) Bounds on a distribution function that are functions of moments to order four. J Res Nat Bur Stand 53:377–381
43. ^ He, S.; Zhang, J.; Zhang, S. (2010). "Bounding probability of small deviation: A fourth moment approach". Mathematics of Operations Research. 35 (1): 208–232. doi:10.1287/moor.1090.0438. S2CID 11298475.
44. ^ Martien C. A. van Zuijlen (2011) On a conjecture concerning the sum of independent Rademacher random variables
45. ^ Feller, William (1966). An Introduction to Probability Theory and Its Applications, Volume 2 (2 ed.). Wiley. p. 155. Retrieved 6 October 2012.
46. ^ Hartigan, J. A.; Hartigan, P. M. (1985). "The Dip Test of Unimodality". The Annals of Statistics. 13: 70–84. doi:10.1214/aos/1176346577. MR 0773153.
47. ^ Gauss C. F. Theoria Combinationis Observationum Erroribus Minimis Obnoxiae. Pars Prior. Pars Posterior. Supplementum. Theory of the Combination of Observations Least Subject to Errors. Part One. Part Two. Supplement. 1995. Translated by G. W. Stewart. Classics in Applied Mathematics Series, Society for Industrial and Applied Mathematics, Philadelphia
48. ^ Winkler A. (1886) Math-Natur theorie Kl. Akad. Wiss Wien Zweite Abt 53, 6–41
49. ^ Dharmadhikari, S. W.; Joag-Dev, K. (1985). "The Gauss–Tchebyshev inequality for unimodal distributions" (PDF). Teoriya Veroyatnostei i ee Primeneniya. 30 (4): 817–820.
50. ^ a b c Clarkson, Eric; Denny, J. L.; Shepp, Larry (2009). "ROC and the bounds on tail probabilities via theorems of Dubins and F. Riesz". The Annals of Applied Probability. 19 (1): 467–76. arXiv:0903.0518. Bibcode:2009arXiv0903.0518C. doi:10.1214/08-AAP536. PMC 2828638. PMID 20191100.
51. ^ McWilliams, Thomas P. (1990). "A Distribution-Free Test for Symmetry Based on a Runs Statistic". Journal of the American Statistical Association. 85 (412): 1130–3. doi:10.2307/2289611. ISSN 0162-1459. JSTOR 2289611.
52. ^ Seaman, John W., Jr.; Young, Dean M.; Odell, Patrick L. (1987). "Improving small sample variance estimators for bounded random variables". Industrial Mathematics. 37: 65–75. ISSN 0019-8528. Zbl 0637.62024.
53. ^ Bickel, Peter J.; Krieger, Abba M. (1992). "Extensions of Chebyshev's Inequality with Applications" (PDF). Probability and Mathematical Statistics. 13 (2): 293–310. ISSN 0208-4147. Retrieved 6 October 2012.
54. ^ a b DasGupta, A (2000). "Best constants in Chebychev inequalities with various applications". Metrika. 5 (1): 185–200. doi:10.1007/s184-000-8316-9.
55. ^ "More thoughts on a one tailed version of Chebyshev's inequality – by Henry Bottomley". se16.info. Retrieved 2012-06-12.
56. ^ Grechuk, B., Molyboha, A., Zabarankin, M. (2010). Chebyshev Inequalities with Law Invariant Deviation Measures, Probability in the Engineering and Informational Sciences, 24(1), 145-170.
57. ^ Godwin H. J. (1964) Inequalities on distribution functions. (Chapter 3) New York, Hafner Pub. Co.
58. ^ a b Lesley F. D., Rotar V. I. (2003) Some remarks on lower bounds of Chebyshev's type for half-lines. J Inequalities Pure Appl Math 4(5) Art 96
59. ^ Fink, A. M.; Jodeit, Max, Jr. (1984). "On Chebyshev's other inequality". In Tong, Y. L.; Gupta, Shanti S. (eds.). Inequalities in Statistics and Probability. Institute of Mathematical Statistics Lecture Notes - Monograph Series. 5. pp. 115–120. doi:10.1214/lnms/1215465637. ISBN 978-0-940600-04-1. MR 0789242. Retrieved 7 October 2012.
60. ^ Niculescu, Constantin P. (2001). "An extension of Chebyshev's inequality and its connection with Jensen's inequality". Journal of Inequalities and Applications. 6 (4): 451–462. CiteSeerX 10.1.1.612.7056. doi:10.1155/S1025583401000273. ISSN 1025-5834. Retrieved 6 October 2012.
61. ^ a b Niculescu, Constantin P.; Pečarić, Josip (2010). "The Equivalence of Chebyshev's Inequality to the Hermite–Hadamard Inequality" (PDF). Mathematical Reports. 12 (62): 145–156. ISSN 1582-3067. Retrieved 6 October 2012.
62. ^ Malamud, S. M. (15 February 2001). "Some complements to the Jensen and Chebyshev inequalities and a problem of W. Walter". Proceedings of the American Mathematical Society. 129 (9): 2671–2678. doi:10.1090/S0002-9939-01-05849-X. ISSN 0002-9939. MR 1838791. Retrieved 7 October 2012.
63. ^ Haldane, J. B. (1952). "Simple tests for bimodality and bitangentiality". Annals of Eugenics. 16 (4): 359–364. doi:10.1111/j.1469-1809.1951.tb02488.x. PMID 14953132.
64. ^ Kendall M. G. (1943) The Advanced Theory of Statistics, 1. London
65. ^ Calculating Upper Confidence Limits for Exposure Point Concentrations at hazardous Waste Sites (Report). Office of Emergency and Remedial Response of the U.S. Environmental Protection Agency. December 2002. Retrieved 5 August 2016.
66. ^ "Statistical Tests: The Chebyshev UCL Proposal". Quantitative Decisions. 25 March 2001. Retrieved 26 November 2015.