Binomial distribution: Difference between revisions

Binomial
	[Figure: probability mass function]
	[Figure: cumulative distribution function]
Notation	B(n, p)
Parameters	n ∈ N0 — number of trials; p ∈ [0,1] — success probability in each trial
Support	k ∈ { 0, …, n } — number of successes
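PMF	${n \choose k}p^{k}(1-p)^{n-k}$
CDF	$I_{1-p}(n-k,\,k+1)$
Mean	$np$
Median	$\lfloor np\rfloor$ or $\lceil np\rceil$
Mode	$\lfloor (n+1)p\rfloor$ or $\lceil (n+1)p\rceil -1$
Variance	$np(1-p)$
Skewness	$\frac{1-2p}{\sqrt{np(1-p)}}$
Excess kurtosis	$\frac{1-6p(1-p)}{np(1-p)}$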
Entropy	$\frac12 \log_2 \big( 2\pi e\, np(1-p) \big) + O \left( \frac{1}{n} \right)$ in shannons. For nats, use the natural log in place of $\log_2$.
MGF	$(1-p+pe^{t})^{n}$
CF	$(1-p+pe^{it})^{n}$
PGF	$G(z)=[(1-p)+pz]^{n}$
Fisher information	$g_{n}(p)={\frac {n}{p(1-p)}}$ (for fixed $n$)


Revision as of 15:02, 2 April 2015

In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. A success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.

The binomial distribution is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent, so the resulting distribution is a hypergeometric distribution, not a binomial one. However, for N much larger than n, the binomial distribution is a good approximation and is widely used.

Specification

Probability mass function

In general, if the random variable X follows the binomial distribution with parameters n and p, we write X ~ B(n, p). The probability of getting exactly k successes in n trials is given by the probability mass function:

f(k;n,p)=\Pr(X=k)={n \choose k}p^{k}(1-p)^{n-k}

for k = 0, 1, 2, ..., n, where

{n \choose k}={\frac {n!}{k!(n-k)!}}

is the binomial coefficient, hence the name of the distribution. The formula can be understood as follows: we want exactly k successes, which has probability $p^{k}$, and n − k failures, which has probability $(1-p)^{n-k}$. However, the k successes can occur anywhere among the n trials, and there are ${n \choose k}$ different ways of distributing k successes in a sequence of n trials.

Reference tables of binomial probabilities are usually filled in only up to k = n/2. This is because for k > n/2, the probability can be obtained from the complementary case as

f(k,n,p)=f(n-k,n,1-p).
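The probability mass function and the complement identity above can be sketched in a few lines of Python. This is only an illustration; the function name `binom_pmf` is an assumption and does not come from the article.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    if not 0 <= k <= n:
        return 0.0  # outside the support {0, ..., n}
    # comb(n, k) is the binomial coefficient n! / (k! (n - k)!)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Complement identity used for tables: f(k, n, p) = f(n - k, n, 1 - p)
left = binom_pmf(2, 6, 0.3)
right = binom_pmf(4, 6, 0.7)
print(left, right)
```

The two printed values should agree up to floating-point rounding.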

Looking at the expression ƒ(k, n, p) as a function of k, there is a k value that maximizes it. This k value can be found by calculating

{\frac {f(k+1,n,p)}{f(k,n,p)}}={\frac {(n-k)p}{(k+1)(1-p)}}

and comparing it to 1. There is always an integer M that satisfies

(n+1)p-1\leq M<(n+1)p.

ƒ(k, n, p) is monotone increasing for k < M and monotone decreasing for k > M, with the exception of the case where (n + 1)p is an integer. In this case, there are two values for which ƒ is maximal: (n + 1)p and (n + 1)p − 1. M is the most probable (most likely) outcome of the Bernoulli trials and is called the mode. Note that the probability of it occurring can be fairly small.

Recurrence relation

$\left\{p(n-k){\text{Prob}}(k)+(k+1)(p-1){\text{Prob}}(k+1)=0,{\text{Prob}}(0)=(1-p)^{n}\right\}$
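Solving the recurrence for Prob(k + 1) gives Prob(k + 1) = Prob(k) · p(n − k) / ((k + 1)(1 − p)), which allows all probabilities to be built up from Prob(0). A minimal sketch, assuming 0 ≤ p < 1 so that the division is defined (the function name is hypothetical):

```python
from math import comb

def pmf_by_recurrence(n: int, p: float) -> list[float]:
    """Return [Prob(0), ..., Prob(n)] from the recurrence; assumes 0 <= p < 1."""
    probs = [(1 - p) ** n]  # initial condition Prob(0) = (1 - p)^n
    for k in range(n):
        # Rearranged recurrence: Prob(k+1) = Prob(k) * p (n - k) / ((k + 1)(1 - p))
        probs.append(probs[-1] * p * (n - k) / ((k + 1) * (1 - p)))
    return probs

# Compare against the closed-form probability mass function for n = 6, p = 0.3
direct = [comb(6, k) * 0.3**k * 0.7 ** (6 - k) for k in range(7)]
recursive = pmf_by_recurrence(6, 0.3)
```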

Cumulative distribution function

The cumulative distribution function can be expressed as:

F(k;n,p)=\Pr(X\leq k)=\sum _{i=0}^{\lfloor k\rfloor }{n \choose i}p^{i}(1-p)^{n-i}

where $\scriptstyle \lfloor k\rfloor \,$ is the "floor" under k, i.e. the greatest integer less than or equal to k.

It can also be represented in terms of the regularized incomplete beta function, as follows:^[1]

{\begin{aligned}F(k;n,p)&=\Pr(X\leq k)\\&=I_{1-p}(n-k,k+1)\\&=(n-k){n \choose k}\int _{0}^{1-p}t^{n-k-1}(1-t)^{k}\,dt.\end{aligned}}
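As a rough numerical check, the finite sum and the integral form can be compared directly. The sketch below evaluates the integral with composite Simpson's rule, which is an illustrative choice rather than anything prescribed by the source; both function names are assumptions.

```python
from math import comb, floor

def binom_cdf(k: float, n: int, p: float) -> float:
    """F(k; n, p) as a finite sum up to floor(k)."""
    top = min(floor(k), n)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(top + 1))

def binom_cdf_integral(k: int, n: int, p: float, steps: int = 2000) -> float:
    """Integral form of F(k; n, p) for integer 0 <= k < n (steps must be even)."""
    upper = 1.0 - p
    h = upper / steps

    def g(t: float) -> float:
        return t ** (n - k - 1) * (1 - t) ** k  # integrand of the beta integral

    total = g(0.0) + g(upper)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * g(j * h)  # Simpson weights 4, 2, 4, ...
    return (n - k) * comb(n, k) * total * h / 3
```

For example, `binom_cdf(2, 6, 0.3)` and `binom_cdf_integral(2, 6, 0.3)` should agree to many decimal places.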

Some closed-form bounds for the cumulative distribution function are given below.

Example

Suppose a biased coin comes up heads with probability 0.3 when tossed. What is the probability of achieving 0, 1,..., 6 heads after six tosses?

\Pr(0{\text{ heads}})=f(0)=\Pr(X=0)={6 \choose 0}0.3^{0}(1-0.3)^{6-0}\approx 0.1176

\Pr(1{\text{ heads}})=f(1)=\Pr(X=1)={6 \choose 1}0.3^{1}(1-0.3)^{6-1}\approx 0.3025

\Pr(2{\text{ heads}})=f(2)=\Pr(X=2)={6 \choose 2}0.3^{2}(1-0.3)^{6-2}\approx 0.3241

\Pr(3{\text{ heads}})=f(3)=\Pr(X=3)={6 \choose 3}0.3^{3}(1-0.3)^{6-3}\approx 0.1852

\Pr(4{\text{ heads}})=f(4)=\Pr(X=4)={6 \choose 4}0.3^{4}(1-0.3)^{6-4}\approx 0.0595

\Pr(5{\text{ heads}})=f(5)=\Pr(X=5)={6 \choose 5}0.3^{5}(1-0.3)^{6-5}\approx 0.0102

\Pr(6{\text{ heads}})=f(6)=\Pr(X=6)={6 \choose 6}0.3^{6}(1-0.3)^{6-6}\approx 0.0007
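The seven values above can be reproduced with a short script. This is only a sketch that evaluates the probability mass function directly for n = 6 and p = 0.3; the variable names are arbitrary.

```python
from math import comb

n, p = 6, 0.3  # six tosses, heads with probability 0.3
table = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

for k, prob in table.items():
    print(f"Pr({k} heads) = {prob:.4f}")
```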

^[2]

Mean and variance

If X ~ B(n, p), that is, X is a binomially distributed random variable, n being the total number of experiments and p the probability of each experiment yielding a successful result, then the expected value of X is:^[3]

\operatorname {E} [X]=np.

For example, if n = 100 and p = 1/4, then the expected number of successful results is 25.

The variance is:

\operatorname {Var} [X]=np(1-p).
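Both formulas can be checked numerically by summing over the probability mass function. The following sketch does this for n = 100 and p = 1/4; the helper name `moments` is an assumption.

```python
from math import comb

def moments(n: int, p: float) -> tuple[float, float]:
    """Mean and variance computed directly from the probability mass function."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * f for k, f in enumerate(pmf))
    var = sum((k - mean) ** 2 * f for k, f in enumerate(pmf))
    return mean, var

mean, var = moments(100, 0.25)
# Compare with the closed forms np and np(1 - p)
print(mean, 100 * 0.25)
print(var, 100 * 0.25 * 0.75)
```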

Mode and median

Usually the mode of a binomial B(n, p) distribution is equal to $\lfloor (n+1)p\rfloor$, where $\lfloor \cdot \rfloor$ is the floor function. However, when (n + 1)p is an integer and p is neither 0 nor 1, the distribution has two modes: (n + 1)p and (n + 1)p − 1. When p is equal to 0 or 1, the mode is 0 or n, respectively. These cases can be summarized as follows:

{\text{mode}}={\begin{cases}\lfloor (n+1)\,p\rfloor &{\text{if }}(n+1)p{\text{ is 0 or a noninteger}},\\(n+1)\,p\ {\text{ and }}\ (n+1)\,p-1&{\text{if }}(n+1)p\in \{1,\dots ,n\},\\n&{\text{if }}(n+1)p=n+1.\end{cases}}
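The case distinction can be sketched as follows. Because the test "(n + 1)p is an integer" is unreliable in floating point, this illustration assumes p is supplied as an exact `Fraction`; the function name is hypothetical.

```python
from fractions import Fraction
from math import floor

def binom_modes(n: int, p: Fraction) -> list[int]:
    """Return the mode(s) of B(n, p) following the three cases above."""
    x = (n + 1) * p
    if p == 1:
        return [n]                   # case (n + 1)p = n + 1
    if x == 0 or x.denominator != 1:
        return [floor(x)]            # case (n + 1)p is 0 or a noninteger
    return [int(x) - 1, int(x)]      # case (n + 1)p in {1, ..., n}: two modes
```

For instance, `binom_modes(9, Fraction(1, 2))` falls into the two-mode case because (n + 1)p = 5 is an integer.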

In general, there is no single formula to find the median for a binomial distribution, and it may even be non-unique. However several special results have been established:

If np is an integer, then the mean, median, and mode coincide and equal np.^[4]^[5]
Any median m must lie within the interval ⌊np⌋ ≤ m ≤ ⌈np⌉.^[6]
A median m cannot lie too far away from the mean: |m − np| ≤ min{ ln 2, max{p, 1 − p} }.^[7]
The median is unique and equal to m = round(np) in cases when either p ≤ 1 − ln 2 or p ≥ ln 2 or |m − np| ≤ min{p, 1 − p} (except for the case when p = ½ and n is odd).^[6]^[7]
When p = 1/2 and n is odd, any number m in the interval ½(n − 1) ≤ m ≤ ½(n + 1) is a median of the binomial distribution. If p = 1/2 and n is even, then m = n/2 is the unique median.
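A median can also be found numerically by accumulating the probability mass function until it reaches 1/2. The sketch below returns the smallest such value, which is one valid median; it uses floating point, so it is not reliable in boundary cases where the cumulative probability equals 1/2 exactly (such as p = 1/2 with n odd). The function name is an assumption.

```python
from math import comb

def binom_median(n: int, p: float) -> int:
    """Smallest m with Pr(X <= m) >= 1/2, which is a median of B(n, p)."""
    total = 0.0
    for k in range(n + 1):
        total += comb(n, k) * p**k * (1 - p) ** (n - k)
        if total >= 0.5:
            return k
    return n  # only reached if rounding keeps the running sum below 1/2
```

With n = 10 and p = 0.3, np = 3 is an integer, so the result should coincide with the mean.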

Covariance between two binomials

If two binomially distributed random variables X and Y are observed together, estimating their covariance can be useful. Using the definition of covariance, in the case n = 1 (thus being Bernoulli trials) we have

\operatorname {Cov} (X,Y)=\operatorname {E} (XY)-\mu _{X}\mu _{Y}.

The first term is non-zero only when both X and Y are one, and μ_X and μ_Y are equal to the two probabilities. Defining p_B as the probability of both happening at the same time, this gives

\operatorname {Cov} (X,Y)=p_{B}-p_{X}p_{Y},

and for n independent pairwise trials

\operatorname {Cov} (X,Y)_{n}=n(p_{B}-p_{X}p_{Y}).

If X and Y are the same variable, this reduces to the variance formula given above.
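To spell out that reduction: when X and Y are the same variable, the joint success probability equals the single success probability, p_B = p_X = p_Y = p, so that

```latex
\operatorname{Cov}(X,X)_{n} = n\,(p_{B}-p_{X}p_{Y}) = n\,(p-p^{2}) = np(1-p) = \operatorname{Var}[X].
```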

Related distributions

Sums of binomials

If X ~ B(n, p) and Y ~ B(m, p) are independent binomial variables with the same probability p, then X + Y is again a binomial variable; its distribution is^{[citation needed]}

X+Y\sim B(n+m,p).\,
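This closure property can be verified numerically, since the distribution of a sum of independent variables is the convolution of their probability mass functions. A small sketch with hypothetical helper names:

```python
from math import comb

def pmf(n: int, p: float) -> list[float]:
    """Probability mass function of B(n, p) as a list indexed by k."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def convolve(a: list[float], b: list[float]) -> list[float]:
    """Distribution of the sum of two independent integer-valued variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

sum_pmf = convolve(pmf(3, 0.4), pmf(5, 0.4))  # X ~ B(3, 0.4), Y ~ B(5, 0.4)
target = pmf(8, 0.4)                          # B(3 + 5, 0.4)
```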

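However, if X and Y do not have the same success probability, say X ~ B(n, p_X) and Y ~ B(m, p_Y) with p_X ≠ p_Y, then the variance of the sum will be smaller than the variance of a binomial variable distributed as

B(n+m,{\bar {p}}),\,

where ${\bar {p}}=(np_{X}+mp_{Y})/(n+m)$ is the average success probability over all n + m trials.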

Conditional binomials

If X ~ B(n, p) and, conditional on X, Y ~ B(X, q), then Y is a simple binomial variable with distribution^{[citation needed]}

Y\sim B(n,pq).

For example, imagine throwing n balls at a basket U_X, then taking the balls that hit and throwing them at another basket U_Y. If p is the probability of hitting U_X, then X ~ B(n, p) is the number of balls that hit U_X. If q is the probability of hitting U_Y, then the number of balls that hit U_Y is Y ~ B(X, q), and therefore Y ~ B(n, pq).
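The two-stage result can be checked exactly with the law of total probability, Pr(Y = k) = Σ_x Pr(X = x) Pr(Y = k | X = x). The sketch below uses illustrative parameter values (n = 8, p = 0.6, q = 0.5) and hypothetical function names.

```python
from math import comb

def pmf(k: int, n: int, p: float) -> float:
    """Probability mass function of B(n, p) at k."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_stage_pmf(k: int, n: int, p: float, q: float) -> float:
    """Pr(Y = k) when X ~ B(n, p) and, given X, Y ~ B(X, q)."""
    # Y = k requires at least k first-stage hits, so x runs from k to n
    return sum(pmf(x, n, p) * pmf(k, x, q) for x in range(k, n + 1))

n, p, q = 8, 0.6, 0.5
gaps = [abs(two_stage_pmf(k, n, p, q) - pmf(k, n, p * q)) for k in range(n + 1)]
```

Every entry of `gaps` should be zero up to rounding error.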

Bernoulli distribution

The Bernoulli distribution is a special case of the binomial distribution, where n = 1. Symbolically, X ~ B(1, p) has the same meaning as X ~ Bern(p). Conversely, any binomial distribution, B(n, p), is the distribution of the sum of n Bernoulli trials, Bern(p), each with the same probability p.^{[citation needed]}

Poisson binomial distribution

The binomial distribution is a special case of the Poisson binomial distribution, which is a sum of n independent non-identical Bernoulli trials Bern(p_i).^{[citation needed]} If X has the Poisson binomial distribution with p₁ = … = p_n = p, then X ~ B(n, p).

Normal approximation

If n is large enough, then the skew of the distribution is not too great. In this case a reasonable approximation to B(n, p) is given by the normal distribution

{\mathcal {N}}(np,\,np(1-p)),

and this basic approximation can be improved in a simple way by using a suitable continuity correction. The basic approximation generally improves as n increases (at least 20) and is better when p is not near to 0 or 1.^[8] Various rules of thumb may be used to decide whether n is large enough, and p is far enough from the extremes of zero or one:

One rule is that both x = np and n(1 − p) must be greater than 5. However, the specific number varies from source to source and depends on how good an approximation one wants; some sources give 10, which gives virtually the same results as the following rule until n is very large (e.g. x = 11, n = 7752).

A second rule^[8] is that for n > 5 the normal approximation is adequate if

\left|\left({\frac {1}{\sqrt {n}}}\right)\left({\sqrt {\frac {1-p}{p}}}-{\sqrt {\frac {p}{1-p}}}\right)\right|<0.3

Another commonly used rule holds that the normal approximation is appropriate only if everything within 3 standard deviations of its mean is within the range of possible values,^{[citation needed]} that is if

\mu \pm 3\sigma =np\pm 3{\sqrt {np(1-p)}}\in [0,n].

The following is an example of applying a continuity correction. Suppose one wishes to calculate Pr(X ≤ 8) for a binomial random variable X. If Y has a distribution given by the normal approximation, then Pr(X ≤ 8) is approximated by Pr(Y ≤ 8.5). The addition of 0.5 is the continuity correction; the uncorrected normal approximation gives considerably less accurate results.
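The effect of the continuity correction can be seen numerically. The sketch below compares the exact value of Pr(X ≤ 8) with both normal approximations; the parameters n = 20 and p = 0.4 are chosen only for illustration and are not taken from the source.

```python
from math import comb, erf, sqrt

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

n, p, k = 20, 0.4, 8  # hypothetical example values
exact = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

mu, sigma = n * p, sqrt(n * p * (1 - p))
uncorrected = normal_cdf(k, mu, sigma)      # Pr(Y <= 8)
corrected = normal_cdf(k + 0.5, mu, sigma)  # Pr(Y <= 8.5)

print(exact, uncorrected, corrected)
```

For these values the corrected approximation lies much closer to the exact probability than the uncorrected one.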

This approximation, known as the de Moivre–Laplace theorem, is a huge time-saver when undertaking calculations by hand (exact calculations with large n are very onerous); historically, it was the first use of the normal distribution, introduced in Abraham de Moivre's book The Doctrine of Chances in 1738. Nowadays, it can be seen as a consequence of the central limit theorem, since B(n, p) is a sum of n independent, identically distributed Bernoulli variables with parameter p. This fact is the basis of a hypothesis test, a "proportion z-test", for the value of p using x/n, the sample proportion and estimator of p, in a common test statistic.^[9]

For example, suppose one randomly samples n people out of a large population and asks them whether they agree with a certain statement. The proportion of people who agree will of course depend on the sample. If groups of n people were sampled repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion p of agreement in the population and with standard deviation $\sigma ={\sqrt {p(1-p)/n}}$.

Poisson approximation

The binomial distribution converges towards the Poisson distribution as the number of trials goes to infinity while the product np remains fixed. Therefore the Poisson distribution with parameter λ = np can be used as an approximation to the binomial distribution B(n, p) if n is sufficiently large and p is sufficiently small. According to two rules of thumb, this approximation is good if n ≥ 20 and p ≤ 0.05, or if n ≥ 100 and np ≤ 10.^[10]
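To get a feel for the quality of this approximation, one can compare the two probability mass functions point by point. The sketch below uses n = 100 and p = 0.05, values chosen here only because they satisfy both rules of thumb; the function names are assumptions.

```python
from math import comb, exp, factorial

def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    return exp(-lam) * lam**k / factorial(k)

n, p = 100, 0.05
lam = n * p  # Poisson parameter lambda = np
max_gap = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(n + 1))
print(max_gap)
```

The largest pointwise difference is small compared with the probabilities near the mode, which are of order 0.18 for these parameters.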

Limiting distributions

Poisson limit theorem: As n approaches ∞ and p approaches 0 while np remains fixed at λ > 0 or at least np approaches λ > 0, then the Binomial(n, p) distribution approaches the Poisson distribution with expected value λ.^[10]

de Moivre–Laplace theorem: As n approaches ∞ while p remains fixed, the distribution of

{\frac {X-np}{\sqrt {np(1-p)}}}

approaches the normal distribution with expected value 0 and variance 1.^{[citation needed]} This result is sometimes loosely stated by saying that the distribution of X is asymptotically normal with expected value np and variance np(1 − p). This result is a specific case of the central limit theorem.

Beta distribution

Beta distributions provide a family of conjugate prior probability distributions for binomial distributions in Bayesian inference. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value p:^[11]

P(p;\alpha ,\beta )={\frac {p^{\alpha -1}(1-p)^{\beta -1}}{\mathrm {B} (\alpha ,\beta )}}.
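Conjugacy can be made concrete with a short derivation. Suppose, as an illustration, that k successes are observed in n trials (the symbols k and n here follow the notation of the probability mass function). Multiplying the beta prior by the binomial likelihood gives

```latex
P(p \mid k) \;\propto\; p^{k}(1-p)^{n-k}\cdot p^{\alpha-1}(1-p)^{\beta-1}
            \;=\; p^{\alpha+k-1}(1-p)^{\beta+n-k-1},
```

which is again of beta form, with parameters α + k and β + n − k.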

Confidence intervals

Even for quite large values of n, the actual distribution of the mean is significantly nonnormal.^[12] Because of this problem several methods to estimate confidence intervals have been proposed.

Let n₁ be the number of successes out of n, the total number of trials, and let

{\hat {p}}={\frac {n_{1}}{n}}

be the proportion of successes. Let $z_{\alpha /2}$ be the 100(1 − α/2)th percentile of the standard normal distribution.

Wald method

{\hat {p}}\pm z_{\frac {\alpha }{2}}{\sqrt {\frac {{\hat {p}}(1-{\hat {p}})}{n}}}.

A continuity correction of 0.5/n may be added.^{[clarification needed]}

Agresti-Coull method^[13]

{\tilde {p}}\pm z_{\frac {\alpha }{2}}{\sqrt {\frac {{\tilde {p}}(1-{\tilde {p}})}{n+z_{\frac {\alpha }{2}}^{2}}}}.

Here the estimate of p is modified to

{\tilde {p}}={\frac {n_{1}+{\frac {1}{2}}z_{\frac {\alpha }{2}}^{2}}{n+z_{\frac {\alpha }{2}}^{2}}}

ArcSine method^[14]

\sin ^{2}\left(\arcsin \left({\sqrt {\hat {p}}}\right)\pm {\frac {z_{\frac {\alpha }{2}}}{2{\sqrt {n}}}}\right)

Wilson (score) method^[15]

{\frac {{\hat {p}}+{\frac {1}{2n}}z_{\frac {\alpha }{2}}^{2}\pm {\frac {1}{2n}}z_{\frac {\alpha }{2}}{\sqrt {4n{\hat {p}}(1-{\hat {p}})+z_{\frac {\alpha }{2}}^{2}}}}{1+{\frac {1}{n}}z_{\frac {\alpha }{2}}^{2}}}.

The exact (Clopper-Pearson) method is the most conservative.^[12] The Wald method, although commonly recommended in textbooks, is the most biased.^{[clarification needed]}
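The four closed-form intervals above translate directly into code. The following is a minimal sketch, assuming z is the 100(1 − α/2)th percentile of the standard normal distribution as defined in this section; the function names are hypothetical, no continuity correction is applied, and the results are not clamped to [0, 1].

```python
from math import asin, sin, sqrt
from statistics import NormalDist

def z_value(alpha: float) -> float:
    """100(1 - alpha/2)th percentile of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def wald(n1: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    p_hat = n1 / n
    half = z_value(alpha) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def agresti_coull(n1: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    z = z_value(alpha)
    p_tilde = (n1 + z * z / 2) / (n + z * z)  # modified estimate of p
    half = z * sqrt(p_tilde * (1 - p_tilde) / (n + z * z))
    return p_tilde - half, p_tilde + half

def arcsine(n1: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    z = z_value(alpha)
    centre = asin(sqrt(n1 / n))
    shift = z / (2 * sqrt(n))
    return sin(centre - shift) ** 2, sin(centre + shift) ** 2

def wilson(n1: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    z = z_value(alpha)
    p_hat = n1 / n
    centre = p_hat + z * z / (2 * n)
    half = z / (2 * n) * sqrt(4 * n * p_hat * (1 - p_hat) + z * z)
    denom = 1 + z * z / n
    return (centre - half) / denom, (centre + half) / denom
```

For example, `wald(50, 100)` and `wilson(50, 100)` are both centred on 0.5, with the Wilson interval slightly narrower.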

Generating binomial random variates

Methods for random number generation where the marginal distribution is a binomial distribution are well-established.^[16]^[17]

One way to generate random samples from a binomial distribution is to use an inversion algorithm. To do so, one must calculate the probability P(X = k) for all values k from 0 through n. (These probabilities should sum to a value close to one, in order to encompass the entire sample space.) Then, using a pseudorandom number generator such as a linear congruential generator to draw samples uniformly between 0 and 1, one can transform each sample U[0,1] into a discrete number by using the probabilities calculated in the first step.
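The inversion algorithm can be sketched as follows. Note one substitution: Python's `random` module uses the Mersenne Twister rather than a linear congruential generator, which does not change the inversion step itself. The function name `make_binomial_sampler` is an assumption.

```python
import random
from bisect import bisect_right
from itertools import accumulate
from math import comb

def make_binomial_sampler(n: int, p: float, seed=None):
    """Return a function that draws one B(n, p) variate per call by inversion."""
    # Step 1: probabilities P(X = k) for k = 0..n and their running sums
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    cdf = list(accumulate(pmf))
    rng = random.Random(seed)

    def sample() -> int:
        u = rng.random()  # uniform on [0, 1)
        # Step 2: first k whose cumulative probability exceeds u;
        # min() guards against the last cdf entry rounding slightly below 1
        return min(bisect_right(cdf, u), n)

    return sample

draw = make_binomial_sampler(10, 0.3, seed=1)
samples = [draw() for _ in range(10_000)]
```

The sample mean of `samples` should be close to np = 3.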

Tail bounds

For k ≤ np, upper bounds for the lower tail of the distribution function can be derived. In particular, Hoeffding's inequality yields the bound

F(k;n,p)\leq \exp \left(-2{\frac {(np-k)^{2}}{n}}\right),\!

and Chernoff's inequality can be used to derive the bound

F(k;n,p)\leq \exp \left(-{\frac {1}{2\,p}}{\frac {(np-k)^{2}}{n}}\right).\!

Moreover, these bounds are reasonably tight when p = 1/2, since the following expression holds for all k ≥ 3n/8^[18]

F(k;n,{\tfrac {1}{2}})\geq {\frac {1}{15}}\exp \left(-{\frac {16({\frac {n}{2}}-k)^{2}}{n}}\right).\!

However, the bounds do not work well for extreme values of p. In particular, as p → 1, the value F(k;n,p) goes to zero (for fixed k, n with k < n) while the upper bound above goes to a positive constant. In this case a better bound is given by^[19]

F(k;n,p)\leq \exp \left(-nD\left({\frac {k}{n}}\left|\right|p\right)\right)\quad \quad {\mbox{if }}0<{\frac {k}{n}}<p\!

where D(a || p) is the relative entropy between an a-coin and a p-coin (i.e. between the Bernoulli(a) and Bernoulli(p) distributions):

D(a||p)=(a)\log {\frac {a}{p}}+(1-a)\log {\frac {1-a}{1-p}}.\!

Asymptotically, this bound is reasonably tight; see ^[19] for details. An equivalent formulation of the bound is

\Pr(X\geq k)=F(n-k;n,1-p)\leq \exp \left(-nD\left({\frac {k}{n}}\left|\right|p\right)\right)\quad \quad {\mbox{if }}p<{\frac {k}{n}}<1.\!

Both these bounds are derived directly from the Chernoff bound. It can also be shown that,

\Pr(X\geq k)=F(n-k;n,1-p)\geq {\frac {1}{(n+1)^{2}}}\exp \left(-nD\left({\frac {k}{n}}\left|\right|p\right)\right)\quad \quad {\mbox{if }}p<{\frac {k}{n}}<1.\!

This is proved using the method of types (see for example chapter 12 of Elements of Information Theory by Cover and Thomas ^[20]).
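The upper bounds on the lower tail given in this section can be compared numerically with the exact value of F(k; n, p). The sketch below uses the illustrative values n = 100, p = 1/2 and k = 40, which are not from the source; the function names are assumptions, and natural logarithms are used in the relative entropy.

```python
from math import comb, exp, log

def cdf(k: int, n: int, p: float) -> float:
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def hoeffding_bound(k: int, n: int, p: float) -> float:
    return exp(-2 * (n * p - k) ** 2 / n)

def chernoff_bound(k: int, n: int, p: float) -> float:
    return exp(-((n * p - k) ** 2) / (2 * p * n))

def relative_entropy_bound(k: int, n: int, p: float) -> float:
    """exp(-n D(k/n || p)), valid for 0 < k/n < p."""
    a = k / n
    d = a * log(a / p) + (1 - a) * log((1 - a) / (1 - p))
    return exp(-n * d)

n, p, k = 100, 0.5, 40  # hypothetical example with k <= np
exact = cdf(k, n, p)
bounds = {
    "Hoeffding": hoeffding_bound(k, n, p),
    "Chernoff": chernoff_bound(k, n, p),
    "relative entropy": relative_entropy_bound(k, n, p),
}
```

Each entry of `bounds` should be at least as large as `exact`, with the relative-entropy bound the smallest of the three for these values.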

References

^ Wadsworth, G. P. (1960). Introduction to probability and random variables. USA: McGraw-Hill New York. p. 52.
^ Hamilton Institute. "The Binomial Distribution" October 20, 2010.
^ See Proof Wiki
^ Neumann, P. (1966). "Über den Median der Binomial- and Poissonverteilung". Wissenschaftliche Zeitschrift der Technischen Universität Dresden (in German). 19: 29–33.
^ Lord, Nick. (July 2010). "Binomial averages when the mean is an integer", The Mathematical Gazette 94, 331-332.
^ ^a ^b Kaas, R.; Buhrman, J.M. (1980). "Mean, Median and Mode in Binomial Distributions". Statistica Neerlandica. 34 (1): 13–18. doi:10.1111/j.1467-9574.1980.tb00681.x.
^ ^a ^b Hamza. doi:10.1016/0167-7152(94)00090-U.
^ ^a ^b Box, Hunter and Hunter (1978). Statistics for experimenters. Wiley. p. 130.
^ NIST/SEMATECH, "7.2.4. Does the proportion of defectives meet requirements?" e-Handbook of Statistical Methods.
^ ^a ^b NIST/SEMATECH, "6.3.3.1. Counts Control Charts", e-Handbook of Statistical Methods.
^ MacKay, David (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press; First Edition. ISBN 978-0521642989.
^ ^a ^b Brown, Lawrence D.; Cai, T. Tony; DasGupta, Anirban (2001), "Interval Estimation for a Binomial Proportion", Statistical Science, 16 (2): 101–133, retrieved 2015-01-05
^ Agresti, Alan; Coull, Brent A. (May 1998), "Approximate is better than 'exact' for interval estimation of binomial proportions" (PDF), The American Statistician, 52 (2): 119–126, retrieved 2015-01-05
^ Pires MA Confidence intervals for a binomial proportion: comparison of methods and software evaluation.
^ Wilson, Edwin B. (June 1927), "Probable inference, the law of succession, and statistical inference" (PDF), J. American Statistical Association, 22 (158): 209–212, retrieved 2015-01-05
^ Devroye, Luc (1986) Non-Uniform Random Variate Generation, New York: Springer-Verlag. (See especially Chapter X, Discrete Univariate Distributions)
^ doi:10.1145/42372.42381.
^ Matoušek, J.; Vondrak, J. The Probabilistic Method (lecture notes).
^ ^a ^b Arratia, R.; Gordon, L. (1989). "Tutorial on large deviations for the binomial distribution". Bulletin of Mathematical Biology. 51 (1): 125–131.
^ T. Cover and J. Thomas, "Elements of Information Theory, 2nd Edition", Wiley 2006

External links

Interactive graphic: Univariate Distribution Relationships
Binomial distribution formula calculator
Binomial distribution calculator
Difference of two binomial variables: X-Y or |X-Y|


@@ Line 19: / Line 19: @@
   | skewness   = <math>\frac{1-2p}{\sqrt{np(1-p)}}</math>
   | kurtosis   = <math>\frac{1-6p(1-p)}{np(1-p)}</math>
-  | entropy    = <math>\frac12 \log_2 \big( 2\pi e\, np(1-p) \big) + O \left( \frac{1}{n} \right)</math>
+  | entropy    = <math>\frac12 \log_2 \big( 2\pi e\, np(1-p) \big) + O \left( \frac{1}{n} \right)</math></br> in [[Shannon (unit)|shannons]]. For [[nat (unit)|nats]], use the natural log, and omit the factor of <math>e</math> in the log.
   | mgf        = <math>(1-p + pe^t)^n \!</math>
   | char       = <math>(1-p + pe^{it})^n \!</math>