# Binomial distribution

| Property | Value |
|---|---|
| Notation | ${\displaystyle B(n,p)}$ |
| Parameters | n ∈ ℕ0 — number of trials; p ∈ [0,1] — success probability in each trial |
| Support | k ∈ { 0, …, n } — number of successes |
| PMF | ${\displaystyle \textstyle {n \choose k}\,p^{k}(1-p)^{n-k}}$ |
| CDF | ${\displaystyle \textstyle I_{1-p}(n-k,1+k)}$ |
| Mean | ${\displaystyle np}$ |
| Median | ${\displaystyle \lfloor np\rfloor }$ or ${\displaystyle \lceil np\rceil }$ |
| Mode | ${\displaystyle \lfloor (n+1)p\rfloor }$ or ${\displaystyle \lceil (n+1)p\rceil -1}$ |
| Variance | ${\displaystyle np(1-p)}$ |
| Skewness | ${\displaystyle {\frac {1-2p}{\sqrt {np(1-p)}}}}$ |
| Excess kurtosis | ${\displaystyle {\frac {1-6p(1-p)}{np(1-p)}}}$ |
| Entropy | ${\displaystyle {\frac {1}{2}}\log _{2}{\big (}2\pi e\,np(1-p){\big )}+O\left({\frac {1}{n}}\right)}$ in shannons; for nats, use the natural log |
| MGF | ${\displaystyle (1-p+pe^{t})^{n}}$ |
| CF | ${\displaystyle (1-p+pe^{it})^{n}}$ |
| PGF | ${\displaystyle G(z)=\left[(1-p)+pz\right]^{n}}$ |
| Fisher information | ${\displaystyle g_{n}(p)={\frac {n}{p(1-p)}}}$ (for fixed ${\displaystyle n}$) |
Binomial distribution for ${\displaystyle p=0.5}$ with n and k as in Pascal's triangle

The probability that a ball in a Galton box with 8 layers (n = 8) ends up in the central bin (k = 4) is ${\displaystyle 70/256}$.

In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome: a random variable containing a single bit of information: success/yes/true/one (with probability p) or failure/no/false/zero (with probability q = 1 − p). A single success/failure experiment is also called a Bernoulli trial or Bernoulli experiment, and a sequence of outcomes is called a Bernoulli process; for a single trial, i.e., n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.

The binomial distribution is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution, not a binomial one. However, for N much larger than n, the binomial distribution remains a good approximation, and is widely used.

## Specification

### Probability mass function

In general, if the random variable X follows the binomial distribution with parameters n ∈ ℕ and p ∈ [0,1], we write X ~ B(n, p). The probability of getting exactly k successes in n trials is given by the probability mass function:

${\displaystyle Pr(k;n,p)=\Pr(X=k)={n \choose k}p^{k}(1-p)^{n-k}}$

for k = 0, 1, 2, ..., n, where

${\displaystyle {\binom {n}{k}}={\frac {n!}{k!(n-k)!}}}$

is the binomial coefficient, hence the name of the distribution. The formula can be understood as follows: k successes occur with probability ${\displaystyle p^{k}}$ and n − k failures occur with probability ${\displaystyle (1-p)^{n-k}}$. However, the k successes can occur anywhere among the n trials, and there are ${\displaystyle {n \choose k}}$ different ways of distributing k successes in a sequence of n trials.

In creating reference tables for binomial distribution probability, usually the table is filled in up to n/2 values. This is because for k > n/2, the probability can be calculated by its complement as

${\displaystyle f(k,n,p)=f(n-k,n,1-p).}$
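The mass function and the complement identity above can be checked numerically. The following is an illustrative Python sketch; the function name `binom_pmf` is our own, not a standard API:

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability mass function C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The Galton-box example from above: n = 8, k = 4 gives 70/256.
assert abs(binom_pmf(4, 8, 0.5) - 70 / 256) < 1e-12

# The complement identity f(k, n, p) = f(n - k, n, 1 - p) lets a reference
# table stop at n/2; entries for k > n/2 are read off the mirrored row.
n, p = 10, 0.3
for k in range(n + 1):
    assert abs(binom_pmf(k, n, p) - binom_pmf(n - k, n, 1 - p)) < 1e-12
```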

The probability mass function satisfies the following recurrence relation, for every ${\displaystyle n,p}$:

${\displaystyle \left\{{\begin{array}{l}p(n-k)f(k,n,p)=(k+1)(1-p)f(k+1,n,p),\\[10pt]f(0,n,p)=(1-p)^{n}\end{array}}\right\}}$

Looking at the expression f(k, n, p) as a function of k, there is a value of k that maximizes it. This value can be found by calculating

${\displaystyle {\frac {f(k+1,n,p)}{f(k,n,p)}}={\frac {(n-k)p}{(k+1)(1-p)}}}$

and comparing it to 1. There is always an integer M that satisfies

${\displaystyle (n+1)p-1\leq M<(n+1)p.}$

f(k, n, p) is monotone increasing for k < M and monotone decreasing for k > M, with the exception of the case where (n + 1)p is an integer. In that case, there are two values for which f is maximal: (n + 1)p and (n + 1)p − 1. M is the most probable (most likely) outcome of the Bernoulli trials and is called the mode. Note that the probability of it occurring can be fairly small.
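The ratio test above gives a simple procedure for locating the mode: increase k while the ratio exceeds 1. A minimal Python sketch (the function name is ours; when there is a tie it returns the smaller of the two modes):

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def mode_by_ratio(n, p):
    """Walk k upward while f(k+1)/f(k) = (n-k)p / ((k+1)(1-p)) exceeds 1."""
    k = 0
    while k < n and (n - k) * p > (k + 1) * (1 - p):
        k += 1
    return k

# (n + 1)p = 2.1 here, so the mode should be floor((n + 1)p) = 2.
assert mode_by_ratio(6, 0.3) == 2
assert binom_pmf(2, 6, 0.3) == max(binom_pmf(k, 6, 0.3) for k in range(7))

# When (n + 1)p is an integer there are two modes; this returns the smaller.
assert mode_by_ratio(5, 0.5) == 2
assert binom_pmf(2, 5, 0.5) == binom_pmf(3, 5, 0.5)
```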

### Cumulative distribution function

The cumulative distribution function can be expressed as:

${\displaystyle F(k;n,p)=\Pr(X\leq k)=\sum _{i=0}^{\lfloor k\rfloor }{n \choose i}p^{i}(1-p)^{n-i}}$

where ${\displaystyle \scriptstyle \lfloor k\rfloor \,}$ is the "floor" under k, i.e. the greatest integer less than or equal to k.

It can also be represented in terms of the regularized incomplete beta function, as follows:[1]

{\displaystyle {\begin{aligned}F(k;n,p)&=\Pr(X\leq k)\\&=I_{1-p}(n-k,k+1)\\&=(n-k){n \choose k}\int _{0}^{1-p}t^{n-k-1}(1-t)^{k}\,dt.\end{aligned}}}

Some closed-form bounds for the cumulative distribution function are given below.
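The equivalence between the summation form of the CDF and the incomplete-beta form can be verified numerically. The sketch below (helper names are ours) evaluates the beta-form integral with a simple midpoint rule rather than a library routine:

```python
from math import comb

def binom_cdf(k, n, p):
    """F(k; n, p) = sum of the PMF for i = 0..floor(k)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(int(k) + 1))

def beta_form(k, n, p, steps=100_000):
    """(n-k) C(n,k) * integral_0^{1-p} t^(n-k-1) (1-t)^k dt, midpoint rule."""
    upper = 1 - p
    h = upper / steps
    integral = sum(((i + 0.5) * h) ** (n - k - 1) * (1 - (i + 0.5) * h) ** k
                   for i in range(steps)) * h
    return (n - k) * comb(n, k) * integral

# The two representations agree to numerical-integration accuracy.
assert abs(binom_cdf(3, 10, 0.4) - beta_form(3, 10, 0.4)) < 1e-6
```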

## Example

Suppose a biased coin comes up heads with probability 0.3 when tossed. What is the probability of achieving 0, 1,..., 6 heads after six tosses?

${\displaystyle \Pr(0{\text{ heads}})=f(0)=\Pr(X=0)={6 \choose 0}0.3^{0}(1-0.3)^{6-0}\approx 0.1176}$
${\displaystyle \Pr(1{\text{ heads}})=f(1)=\Pr(X=1)={6 \choose 1}0.3^{1}(1-0.3)^{6-1}\approx 0.3025}$
${\displaystyle \Pr(2{\text{ heads}})=f(2)=\Pr(X=2)={6 \choose 2}0.3^{2}(1-0.3)^{6-2}\approx 0.3241}$
${\displaystyle \Pr(3{\text{ heads}})=f(3)=\Pr(X=3)={6 \choose 3}0.3^{3}(1-0.3)^{6-3}\approx 0.1852}$
${\displaystyle \Pr(4{\text{ heads}})=f(4)=\Pr(X=4)={6 \choose 4}0.3^{4}(1-0.3)^{6-4}\approx 0.0595}$
${\displaystyle \Pr(5{\text{ heads}})=f(5)=\Pr(X=5)={6 \choose 5}0.3^{5}(1-0.3)^{6-5}\approx 0.0102}$
${\displaystyle \Pr(6{\text{ heads}})=f(6)=\Pr(X=6)={6 \choose 6}0.3^{6}(1-0.3)^{6-6}\approx 0.0007}$[2]
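The table above can be reproduced programmatically; a short check using only the standard library:

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The seven probabilities listed above, rounded to four decimal places.
expected = [0.1176, 0.3025, 0.3241, 0.1852, 0.0595, 0.0102, 0.0007]
for k, target in enumerate(expected):
    assert abs(binom_pmf(k, 6, 0.3) - target) < 5e-4

# The full distribution sums to 1, as the binomial theorem guarantees.
assert abs(sum(binom_pmf(k, 6, 0.3) for k in range(7)) - 1.0) < 1e-12
```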

## Mean

If X ~ B(n, p), that is, X is a binomially distributed random variable, n being the total number of experiments and p the probability of each experiment yielding a successful result, then the expected value of X is:[3]

${\displaystyle \operatorname {E} [X]=np.}$

For example, if n = 100 and p = 1/4, then the average number of successful results will be 25.

Proof: We calculate the mean, μ, directly from its definition

${\displaystyle \mu =\sum _{i=1}^{n}x_{i}p_{i},}$

and the binomial theorem:

{\displaystyle {\begin{aligned}\mu &=\sum _{k=0}^{n}k{\binom {n}{k}}p^{k}(1-p)^{n-k}\\&=np\sum _{k=0}^{n}k{\frac {(n-1)!}{(n-k)!k!}}p^{k-1}(1-p)^{(n-1)-(k-1)}\\&=np\sum _{k=1}^{n}{\frac {(n-1)!}{((n-1)-(k-1))!(k-1)!}}p^{k-1}(1-p)^{(n-1)-(k-1)}\\&=np\sum _{k=1}^{n}{\binom {n-1}{k-1}}p^{k-1}(1-p)^{(n-1)-(k-1)}\\&=np\sum _{\ell =0}^{n-1}{\binom {n-1}{\ell }}p^{\ell }(1-p)^{(n-1)-\ell }&&{\text{with }}\ell :=k-1\\&=np\sum _{\ell =0}^{m}{\binom {m}{\ell }}p^{\ell }(1-p)^{m-\ell }&&{\text{with }}m:=n-1\\&=np(p+(1-p))^{m}\\&=np\end{aligned}}}

It is also possible to deduce the mean from the equation ${\displaystyle X=X_{1}+\cdots +X_{n}}$ whereby all ${\displaystyle X_{i}}$ are Bernoulli distributed random variables with ${\displaystyle E[X_{i}]=p}$. We get

${\displaystyle E[X]=E[X_{1}+\cdots +X_{n}]=E[X_{1}]+\cdots +E[X_{n}]=\underbrace {p+\cdots +p} _{n{\text{ times}}}=np}$
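The closed form E[X] = np can be checked against the defining sum μ = Σ k Pr(X = k) directly; the following sketch (the helper name `binom_mean` is ours) does so for the n = 100, p = 1/4 example above:

```python
from math import comb

def binom_mean(n, p):
    """E[X] computed directly from the definition sum of k * Pr(X = k)."""
    return sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

# Matches the closed form np in both examples.
assert abs(binom_mean(100, 0.25) - 25.0) < 1e-9
assert abs(binom_mean(6, 0.3) - 6 * 0.3) < 1e-12
```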

## Variance

The variance is:

${\displaystyle \operatorname {Var} (X)=np(1-p).}$

Proof: Let ${\displaystyle X=X_{1}+\cdots +X_{n}}$ where all ${\displaystyle X_{i}}$ are independently Bernoulli distributed random variables. Since ${\displaystyle \operatorname {Var} (X_{i})=p(1-p)}$, we get:

${\displaystyle \operatorname {Var} (X)=\operatorname {Var} (X_{1}+\cdots +X_{n})=\operatorname {Var} (X_{1})+\cdots +\operatorname {Var} (X_{n})=n\operatorname {Var} (X_{1})=np(1-p).}$
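Likewise, Var(X) = np(1 − p) can be checked against the definition Var(X) = Σ (k − μ)² Pr(X = k); a minimal sketch with our own helper name:

```python
from math import comb

def binom_moments(n, p):
    """Return (mean, variance) computed from the PMF definition."""
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var

mean, var = binom_moments(20, 0.4)
assert abs(mean - 20 * 0.4) < 1e-10          # np = 8
assert abs(var - 20 * 0.4 * 0.6) < 1e-10     # np(1 - p) = 4.8
```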

## Mode

Usually the mode of a binomial B(n, p) distribution is equal to ${\displaystyle \lfloor (n+1)p\rfloor }$, where ${\displaystyle \lfloor \cdot \rfloor }$ is the floor function. However, when (n + 1)p is an integer and p is neither 0 nor 1, then the distribution has two modes: (n + 1)p and (n + 1)p − 1. When p is equal to 0 or 1, the mode will be 0 and n correspondingly. These cases can be summarized as follows:

${\displaystyle {\text{mode}}={\begin{cases}\lfloor (n+1)\,p\rfloor &{\text{if }}(n+1)p{\text{ is 0 or a noninteger}},\\(n+1)\,p\ {\text{ and }}\ (n+1)\,p-1&{\text{if }}(n+1)p\in \{1,\dots ,n\},\\n&{\text{if }}(n+1)p=n+1.\end{cases}}}$

Proof: Let

${\displaystyle f(k)={\binom {n}{k}}p^{k}q^{n-k}.}$

For ${\displaystyle p=0}$ only ${\displaystyle f(0)}$ has a nonzero value with ${\displaystyle f(0)=1}$. For ${\displaystyle p=1}$ we find ${\displaystyle f(n)=1}$ and ${\displaystyle f(k)=0}$ for ${\displaystyle k\neq n}$. This proves that the mode is 0 for ${\displaystyle p=0}$ and ${\displaystyle n}$ for ${\displaystyle p=1}$.

Let ${\displaystyle 0<p<1}$. We find

${\displaystyle {\frac {f(k+1)}{f(k)}}={\frac {(n-k)p}{(k+1)(1-p)}}}$.

From this follows

{\displaystyle {\begin{aligned}k>(n+1)p-1&\Rightarrow f(k+1)<f(k)\\k=(n+1)p-1&\Rightarrow f(k+1)=f(k)\\k<(n+1)p-1&\Rightarrow f(k+1)>f(k)\end{aligned}}}

So when ${\displaystyle (n+1)p-1}$ is an integer, then ${\displaystyle (n+1)p-1}$ and ${\displaystyle (n+1)p}$ are both modes. In the case that ${\displaystyle (n+1)p-1\notin \mathbb {Z} }$, only ${\displaystyle \lfloor (n+1)p-1\rfloor +1=\lfloor (n+1)p\rfloor }$ is a mode.[4]

## Median

In general, there is no single formula to find the median of a binomial distribution, and it may even be non-unique. However, several special results have been established:

• If np is an integer, then the mean, median, and mode coincide and equal np.[5][6]
• Any median m must lie within the interval ⌊np⌋ ≤ m ≤ ⌈np⌉.[7]
• A median m cannot lie too far away from the mean: |m − np| ≤ min{ ln 2, max{p, 1 − p} }.[8]
• The median is unique and equal to m = round(np) in cases when either p ≤ 1 − ln 2 or p ≥ ln 2 or |m − np| ≤ min{p, 1 − p} (except for the case when p = ½ and n is odd).[7][8]
• When p = 1/2 and n is odd, any number m in the interval ½(n − 1) ≤ m ≤ ½(n + 1) is a median of the binomial distribution. If p = 1/2 and n is even, then m = n/2 is the unique median.
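These facts can be checked with a short routine that returns the smallest median; the function name is our own:

```python
from math import comb, floor, ceil

def binom_median(n, p):
    """Smallest m with Pr(X <= m) >= 1/2; always a median of B(n, p)."""
    cdf = 0.0
    for m in range(n + 1):
        cdf += comb(n, m) * p**m * (1 - p)**(n - m)
        if cdf >= 0.5:
            return m

# When np is an integer, mean, median, and mode coincide at np.
assert binom_median(10, 0.5) == 5

# A median always lies in the interval [floor(np), ceil(np)].
for n, p in [(7, 0.3), (12, 0.45), (9, 0.8)]:
    assert floor(n * p) <= binom_median(n, p) <= ceil(n * p)
```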

## Covariance between two binomials

If two binomially distributed random variables X and Y are observed together, estimating their covariance can be useful. Using the definition of covariance, in the case n = 1 (thus being Bernoulli trials) we have

${\displaystyle \operatorname {Cov} (X,Y)=\operatorname {E} (XY)-\mu _{X}\mu _{Y}.}$

The first term is non-zero only when both X and Y are one, and μX and μY are equal to the two probabilities. Defining pB as the probability of both happening at the same time, this gives

${\displaystyle \operatorname {Cov} (X,Y)=p_{B}-p_{X}p_{Y},}$

and for n independent trials of such pairs

${\displaystyle \operatorname {Cov} (X,Y)_{n}=n(p_{B}-p_{X}p_{Y}).}$

If X and Y are the same variable, this reduces to the variance formula given above.
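A small numeric illustration of these formulas, with illustrative probabilities pX, pY, and pB:

```python
# Single Bernoulli pair: Cov(X, Y) = pB - pX * pY, with pB = Pr(X=1, Y=1).
pX, pY, pB = 0.5, 0.4, 0.3      # illustrative values, pB <= min(pX, pY)
cov_single = pB - pX * pY
assert abs(cov_single - 0.10) < 1e-12

# Over n = 25 independent trials of such pairs the covariance scales by n.
assert abs(25 * cov_single - 2.5) < 1e-12

# Sanity check: if Y is the same variable as X, then pB = pX and the
# formula reduces to the Bernoulli variance pX * (1 - pX).
p = 0.3
assert abs((p - p * p) - p * (1 - p)) < 1e-12
```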

## Related distributions

### Sums of binomials

If X ~ B(n, p) and Y ~ B(m, p) are independent binomial variables with the same probability p, then X + Y is again a binomial variable; its distribution is[citation needed] Z = X + Y ~ B(n + m, p):

{\displaystyle {\begin{aligned}\operatorname {P} (Z=k)&=\sum _{i=0}^{k}\left[{\binom {n}{i}}p^{i}(1-p)^{n-i}\right]\left[{\binom {m}{k-i}}p^{k-i}(1-p)^{m-k+i}\right]\\&={\binom {n+m}{k}}p^{k}(1-p)^{n+m-k}\end{aligned}}}

However, if X and Y do not have the same probability p, then the variance of the sum will be smaller than the variance of a binomial variable distributed as ${\displaystyle B(n+m,{\bar {p}}).\,}$

### Conditional binomials

If X ~ B(n, p) and, conditional on X, Y ~ B(X, q), then Y is a simple binomial variable with distribution[citation needed]

${\displaystyle Y\sim B(n,pq).}$

For example, imagine throwing n balls into a basket UX and taking the balls that hit and throwing them into another basket UY. If p is the probability of hitting UX, then X ~ B(n, p) is the number of balls that hit UX. If q is the probability of hitting UY, then the number of balls that hit UY is Y ~ B(X, q), and therefore Y ~ B(n, pq).
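The identity Y ~ B(n, pq) can be verified by the law of total probability, summing over the intermediate count X; a short sketch with illustrative parameter values:

```python
from math import comb

def pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p, q = 8, 0.6, 0.5
# Law of total probability: Pr(Y = j) = sum_x Pr(X = x) * Pr(Y = j | X = x),
# and this marginal matches the B(n, pq) mass function exactly.
for j in range(n + 1):
    total = sum(pmf(x, n, p) * pmf(j, x, q) for x in range(j, n + 1))
    assert abs(total - pmf(j, n, p * q)) < 1e-12
```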

### Bernoulli distribution

The Bernoulli distribution is a special case of the binomial distribution, where n = 1. Symbolically, X ~ B(1, p) has the same meaning as X ~ B(p). Conversely, any binomial distribution, B(n, p), is the distribution of the sum of n Bernoulli trials, B(p), each with the same probability p.[citation needed]

### Poisson binomial distribution

The binomial distribution is a special case of the Poisson binomial distribution, or general binomial distribution, which is the distribution of a sum of n independent non-identical Bernoulli trials B(pi).[9]

### Normal approximation

Binomial probability mass function and normal probability density function approximation for n = 6 and p = 0.5

If n is large enough, then the skew of the distribution is not too great. In this case a reasonable approximation to B(n, p) is given by the normal distribution

${\displaystyle {\mathcal {N}}(np,\,np(1-p)),}$

and this basic approximation can be improved in a simple way by using a suitable continuity correction. The basic approximation generally improves as n increases (at least 20) and is better when p is not near to 0 or 1.[10] Various rules of thumb may be used to decide whether n is large enough, and p is far enough from the extremes of zero or one:

• One rule[10] is that for n > 5 the normal approximation is adequate if the absolute value of the skewness is strictly less than 1/3; that is, if
${\displaystyle {\frac {|1-2p|}{\sqrt {np(1-p)}}}={\frac {1}{\sqrt {n}}}\left|{\sqrt {\frac {1-p}{p}}}-{\sqrt {\frac {p}{1-p}}}\,\right|<{\frac {1}{3}}.}$
• A stronger rule states that the normal approximation is appropriate only if everything within 3 standard deviations of its mean is within the range of possible values; that is, only if
${\displaystyle \mu \pm 3\sigma =np\pm 3{\sqrt {np(1-p)}}\in (0,n).}$
This 3-standard-deviation rule is equivalent to the following conditions, which also imply the first rule above.
${\displaystyle n>9\,{\frac {1-p}{p}}\quad {\hbox{and}}\quad n>9\,{\frac {p}{1-p}}.}$
• Another commonly used rule is that both values ${\displaystyle np}$ and ${\displaystyle n(1-p)}$ must be greater than 5. However, the specific number varies from source to source, and depends on how good an approximation one wants. Some sources[citation needed] suggest 9 instead of 5, which gives the same result stated in the previous rule.

The following is an example of applying a continuity correction. Suppose one wishes to calculate Pr(X ≤ 8) for a binomial random variable X. If Y has a distribution given by the normal approximation, then Pr(X ≤ 8) is approximated by Pr(Y ≤ 8.5). The addition of 0.5 is the continuity correction; the uncorrected normal approximation gives considerably less accurate results.

This approximation, known as the de Moivre–Laplace theorem, is a huge time-saver when undertaking calculations by hand (exact calculations with large n are very onerous); historically, it was the first use of the normal distribution, introduced in Abraham de Moivre's book The Doctrine of Chances in 1738. Nowadays, it can be seen as a consequence of the central limit theorem, since B(n, p) is a sum of n independent, identically distributed Bernoulli variables with parameter p. This fact is the basis of a hypothesis test, a "proportion z-test", for the value of p using x/n, the sample proportion and estimator of p, in a common test statistic.[11]

For example, suppose one randomly samples n people out of a large population and asks them whether they agree with a certain statement. The proportion of people who agree will of course depend on the sample. If groups of n people were sampled repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion p of agreement in the population and with standard deviation ${\displaystyle \sigma ={\sqrt {\frac {p(1-p)}{n}}}}$.

### Poisson approximation

The binomial distribution converges towards the Poisson distribution as the number of trials goes to infinity while the product np remains fixed or at least p tends to zero. Therefore, the Poisson distribution with parameter λ = np can be used as an approximation to the binomial distribution B(n, p) if n is sufficiently large and p is sufficiently small. According to two rules of thumb, this approximation is good if n ≥ 20 and p ≤ 0.05, or if n ≥ 100 and np ≤ 10.[12]
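A quick numerical check of the rule of thumb, with the illustrative choice n = 100 and p = 0.02 (so λ = 2), comparing the two mass functions by total variation distance:

```python
from math import comb, exp, factorial

n, p = 100, 0.02           # n >= 20 and p <= 0.05: the rule of thumb applies
lam = n * p                # Poisson parameter lambda = np = 2

binom = lambda k: comb(n, k) * p**k * (1 - p)**(n - k)
poisson = lambda k: exp(-lam) * lam**k / factorial(k)

# Total variation distance between the two distributions is small here.
tv = 0.5 * sum(abs(binom(k) - poisson(k)) for k in range(n + 1))
assert tv < 0.02
```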

Concerning the accuracy of Poisson approximation, see Novak,[13] ch. 4, and references therein.

### Limiting distributions

As n approaches ∞, the distribution of
${\displaystyle {\frac {X-np}{\sqrt {np(1-p)}}}}$
approaches the normal distribution with expected value 0 and variance 1.[citation needed] This result is sometimes loosely stated by saying that the distribution of X is asymptotically normal with expected value np and variance np(1 − p). This result is a specific case of the central limit theorem.

### Beta distribution

Beta distributions provide a family of prior probability distributions for binomial distributions in Bayesian inference:[14]

${\displaystyle P(p;\alpha ,\beta )={\frac {p^{\alpha -1}(1-p)^{\beta -1}}{\mathrm {B} (\alpha ,\beta )}}}$.

## Confidence intervals

Even for quite large values of n, the actual distribution of the mean is significantly non-normal.[15] Because of this problem, several methods to estimate confidence intervals have been proposed.

Let n1 be the number of successes out of n, the total number of trials, and let

${\displaystyle {\hat {p}}={\frac {n_{1}}{n}}}$

be the proportion of successes. Let zα/2 be the 100(1 − α/2)th percentile of the standard normal distribution.

• Wald method
${\displaystyle {\hat {p}}\pm z_{\frac {\alpha }{2}}{\sqrt {\frac {{\hat {p}}(1-{\hat {p}})}{n}}}.}$
A continuity correction of 0.5/n may be added.[clarification needed]
• Agresti-Coull method[16]
${\displaystyle {\tilde {p}}\pm z_{\frac {\alpha }{2}}{\sqrt {\frac {{\tilde {p}}(1-{\tilde {p}})}{n+z_{\frac {\alpha }{2}}^{2}}}}.}$
Here the estimate of p is modified to
${\displaystyle {\tilde {p}}={\frac {n_{1}+{\frac {1}{2}}z_{\frac {\alpha }{2}}^{2}}{n+z_{\frac {\alpha }{2}}^{2}}}}$
• ArcSine method[17]
${\displaystyle \sin ^{2}\left(\arcsin \left({\sqrt {\hat {p}}}\right)\pm {\frac {z}{2{\sqrt {n}}}}\right)}$
• Wilson (score) method[18]
${\displaystyle {\frac {{\hat {p}}+{\frac {1}{2n}}z_{1-{\frac {\alpha }{2}}}^{2}\pm {\frac {1}{2n}}z_{1-{\frac {\alpha }{2}}}{\sqrt {4n{\hat {p}}(1-{\hat {p}})+z_{1-{\frac {\alpha }{2}}}^{2}}}}{1+{\frac {1}{n}}z_{1-{\frac {\alpha }{2}}}^{2}}}.}$

The exact (Clopper–Pearson) method is the most conservative.[15] The Wald method, although commonly recommended in textbooks, is the most biased.[clarification needed]
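As an illustration, the Wald and Wilson intervals can be computed side by side; the sketch below uses z ≈ 1.96 (roughly the 97.5th percentile, giving 95% confidence), and the function names are our own:

```python
from math import sqrt

def wald_interval(n1, n, z=1.96):
    p_hat = n1 / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def wilson_interval(n1, n, z=1.96):
    p_hat = n1 / n
    centre = p_hat + z * z / (2 * n)
    half = (z / (2 * n)) * sqrt(4 * n * p_hat * (1 - p_hat) + z * z)
    denom = 1 + z * z / n
    return (centre - half) / denom, (centre + half) / denom

# The Wilson interval always stays inside (0, 1) and covers p_hat.
lo, hi = wilson_interval(7, 50)
assert 0 < lo < 7 / 50 < hi < 1

# The Wald interval can escape [0, 1]: here its lower limit is negative.
w_lo, _ = wald_interval(1, 50)
assert w_lo < 0
```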

## Generating binomial random variates

Methods for random number generation where the marginal distribution is a binomial distribution are well-established.[19][20]

One way to generate random samples from a binomial distribution is to use an inversion algorithm. To do so, one must calculate the probability P(X = k) for every value k from 0 through n. (These probabilities should sum to a value close to one, in order to encompass the entire sample space.) Then, using a pseudorandom number generator to produce samples uniformly between 0 and 1, one can transform these uniform samples into discrete numbers by using the probabilities calculated in the first step.
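The inversion procedure just described can be sketched in Python; `binomial_variate` is our own illustrative name, not a library routine:

```python
import random
from math import comb

def binomial_variate(n, p, rng=random):
    """Inversion sampling: draw U ~ Uniform(0, 1) and return the smallest k
    whose cumulative probability reaches U."""
    u = rng.random()
    cdf = 0.0
    for k in range(n + 1):
        cdf += comb(n, k) * p**k * (1 - p)**(n - k)
        if u <= cdf:
            return k
    return n  # guard against floating-point shortfall in the final CDF value

rng = random.Random(42)  # fixed seed for reproducibility
samples = [binomial_variate(12, 0.3, rng) for _ in range(20_000)]
assert all(0 <= s <= 12 for s in samples)
# The sample mean should be close to np = 3.6.
assert abs(sum(samples) / len(samples) - 3.6) < 0.1
```

For large n this per-sample summation is slow; production generators use smarter algorithms such as BTPE (see the Kachitvichyanukul–Schmeiser reference).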

## Tail bounds

For k ≤ np, upper bounds for the lower tail of the distribution function can be derived. Recall that ${\displaystyle F(k;n,p)=\Pr(X\leq k)}$ is the probability that there are at most k successes.

Hoeffding's inequality yields the bound

${\displaystyle F(k;n,p)\leq \exp \left(-2{\frac {(np-k)^{2}}{n}}\right),\!}$

and Chernoff's inequality can be used to derive the bound

${\displaystyle F(k;n,p)\leq \exp \left(-{\frac {1}{2\,p}}{\frac {(np-k)^{2}}{n}}\right).\!}$

Moreover, these bounds are reasonably tight when p = 1/2, since the following expression holds for all k ≥ 3n/8:[21]

${\displaystyle F(k;n,{\tfrac {1}{2}})\leq {\frac {14}{15}}\exp \left(-{\frac {16({\frac {n}{2}}-k)^{2}}{n}}\right).\!}$

However, the bounds do not work well for extreme values of p. In particular, as p ${\displaystyle \rightarrow }$ 1, the value F(k; n, p) goes to zero (for fixed k, n with k < n) while the upper bound above goes to a positive constant. In this case, a better bound is given by[22]

${\displaystyle F(k;n,p)\leq \exp \left(-nD\left({\frac {k}{n}}\left|\right|p\right)\right)\quad \quad {\mbox{if }}0<{\frac {k}{n}}<p,\!}$

where D(a || p) is the relative entropy between an a-coin and a p-coin (i.e. between the Bernoulli(a) and Bernoulli(p) distribution):

${\displaystyle D(a||p)=(a)\log {\frac {a}{p}}+(1-a)\log {\frac {1-a}{1-p}}.\!}$

Asymptotically, this bound is reasonably tight; see [22] for details. An equivalent formulation of the bound is

${\displaystyle \Pr(X\geq k)=F(n-k;n,1-p)\leq \exp \left(-nD\left({\frac {k}{n}}\left|\right|p\right)\right)\quad \quad {\mbox{if }}p<{\frac {k}{n}}<1.\!}$

Both these bounds are derived directly from the Chernoff bound. It can also be shown that,

${\displaystyle \Pr(X\geq k)=F(n-k;n,1-p)\geq {\frac {1}{(n+1)^{2}}}\exp \left(-nD\left({\frac {k}{n}}\left|\right|p\right)\right)\quad \quad {\mbox{if }}p<{\frac {k}{n}}<1.\!}$

This is proved using the method of types (see for example chapter 12 of Elements of Information Theory by Cover and Thomas [23]).

We can also change the ${\displaystyle (n+1)^{2}}$ in the denominator to ${\displaystyle {\sqrt {2n}}}$ by approximating the binomial coefficient with Stirling's formula.[24]
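Both the Hoeffding bound and the relative-entropy bound above can be checked numerically against the exact lower tail; a sketch with all helper names our own:

```python
from math import comb, exp, log

def binom_cdf(k, n, p):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def hoeffding_bound(k, n, p):
    """exp(-2 (np - k)^2 / n), valid for k <= np."""
    return exp(-2 * (n * p - k) ** 2 / n)

def entropy_bound(k, n, p):
    """exp(-n D(k/n || p)) for 0 < k/n < p, with D the relative entropy."""
    a = k / n
    d = a * log(a / p) + (1 - a) * log((1 - a) / (1 - p))
    return exp(-n * d)

n, p = 100, 0.5
for k in range(1, 50):          # 0 < k/n < p, so both bounds apply
    tail = binom_cdf(k, n, p)
    assert tail <= hoeffding_bound(k, n, p) + 1e-15
    assert tail <= entropy_bound(k, n, p) + 1e-15
```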

## References

1. ^ Wadsworth, G. P. (1960). Introduction to Probability and Random Variables. New York: McGraw-Hill. p. 52.
2. ^ Hamilton Institute. "The Binomial Distribution" October 20, 2010.
3. ^ See Proof Wiki
4. ^
5. ^ Neumann, P. (1966). "Über den Median der Binomial- und Poissonverteilung". Wissenschaftliche Zeitschrift der Technischen Universität Dresden (in German). 19: 29–33.
6. ^ Lord, Nick. (July 2010). "Binomial averages when the mean is an integer", The Mathematical Gazette 94, 331-332.
7. ^ a b Kaas, R.; Buhrman, J.M. (1980). "Mean, Median and Mode in Binomial Distributions". Statistica Neerlandica. 34 (1): 13–18. doi:10.1111/j.1467-9574.1980.tb00681.x.
8. ^ a b Hamza, K. (1995). "The smallest uniform upper bound on the distance between the mean and the median of the binomial and Poisson distributions". Statistics & Probability Letters. 23: 21–25. doi:10.1016/0167-7152(94)00090-U.
9. ^ Wang, Y. H. (1993). "On the number of successes in independent trials" (PDF). Statistica Sinica. 3 (2): 295–312.
10. ^ a b Box, Hunter and Hunter (1978). Statistics for experimenters. Wiley. p. 130.
11. ^ NIST/SEMATECH, "7.2.4. Does the proportion of defectives meet requirements?" e-Handbook of Statistical Methods.
12. ^ a b NIST/SEMATECH, "6.3.3.1. Counts Control Charts", e-Handbook of Statistical Methods.
13. ^ Novak S.Y. (2011) Extreme Value Methods with Applications to Finance. London: CRC/Chapman & Hall/Taylor & Francis. ISBN 978-1-4398-3574-6.
14. ^ MacKay, David (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press; First Edition. ISBN 978-0521642989.
15. ^ a b Brown, Lawrence D.; Cai, T. Tony; DasGupta, Anirban (2001), "Interval Estimation for a Binomial Proportion", Statistical Science, 16 (2): 101–133, doi:10.1214/ss/1009213286, retrieved 2015-01-05
16. ^ Agresti, Alan; Coull, Brent A. (May 1998), "Approximate is better than 'exact' for interval estimation of binomial proportions" (PDF), The American Statistician, 52 (2): 119–126, doi:10.2307/2685469, retrieved 2015-01-05
17. ^
18. ^ Wilson, Edwin B. (June 1927), "Probable inference, the law of succession, and statistical inference" (PDF), J. American Statistical Association, 22 (158): 209–212, doi:10.2307/2276774, retrieved 2015-01-05
19. ^ Devroye, Luc (1986) Non-Uniform Random Variate Generation, New York: Springer-Verlag. (See especially Chapter X, Discrete Univariate Distributions)
20. ^ Kachitvichyanukul, V.; Schmeiser, B. W. (1988). "Binomial random variate generation". Communications of the ACM. 31 (2): 216–222. doi:10.1145/42372.42381.
21. ^ Matoušek, J, Vondrak, J: The Probabilistic Method (lecture notes) [1].
22. ^ a b R. Arratia and L. Gordon: Tutorial on large deviations for the binomial distribution, Bulletin of Mathematical Biology 51(1) (1989), 125–131 [2].
23. ^ Theorem 11.1.3 in Cover, T.; Thomas, J. (2006). Elements of Information Theory (2nd ed.). Wiley. p. 350.
24. ^ http://math.stackexchange.com/questions/1548940/sharper-lower-bounds-for-binomial-chernoff-tails/1564088#1564088
25. ^ Mandelbrot, B. B., Fisher, A. J., & Calvet, L. E. (1997). A multifractal model of asset returns. 3.2 The Binomial Measure is the Simplest Example of a Multifractal