Poisson binomial distribution

From Wikipedia, the free encyclopedia
Jump to: navigation, search
Poisson binomial
Parameters \mathbf{p}\in [0,1]^n — success probabilities for each of the n trials
Support k ∈ { 0, …, n }
pmf \sum\limits_{A\in F_k} \prod\limits_{i\in A} p_i \prod\limits_{j\in A^c} (1-p_j)
CDF \sum\limits_{l=0}^k \sum\limits_{A\in F_l} \prod\limits_{i\in A} p_i \prod\limits_{j\in A^c}{(1-p_j)}
Mean \sum\limits_{i=1}^n p_i
Variance  \sigma^2 =\sum\limits_{i=1}^n (1 - {p_i}){p_i}
Skewness \frac{1}{\sigma^3}\sum\limits_{i=1}^n {\left( 1-2{p_i} \right)\left( 1-{{p}_{i}} \right){{p}_{i}}}
Ex. kurtosis \frac{1}{\sigma^4}\sum\limits_{i=1}^n {\left( 1 - 6(1 - p_i){p_i} \right)\left( 1 - p_i \right)p_i}
MGF \prod\limits_{j=1}^n (1-{p_j}+{p_j}{e^{t}})
CF \prod\limits_{j=1}^n (1-{p_j}+{p_j}{e^{it}})

In probability theory and statistics, the Poisson binomial distribution is the discrete probability distribution of a sum of independent Bernoulli trials that are not necessarily identically distributed. The concept is named after Siméon Denis Poisson.

In other words, it is the probability distribution of the number of successes in a sequence of n independent yes/no experiments with success probabilities p_1, p_2, \dots , p_n. The ordinary binomial distribution is a special case of the Poisson binomial distribution, when all success probabilities are the same, that is p_1 = p_2 = \dots = p_n .

Mean and variance[edit]

Since a Poisson binomial distributed variable is a sum of n independent Bernoulli distributed variables, its mean and variance will simply be sums of the mean and variance of the n Bernoulli distributions:

\mu = \sum\limits_{i=1}^n p_i
\sigma^2 =\sum\limits_{i=1}^n (1-p_i) p_i

For a fixed values of the mean and of n the variance is maximal when all success probabilities are equal and we have a binomial distribution. For fixed mean the variance is upper bounded by the variance of the Poisson distribution with the same mean which is attained asymptotically as n tends to infinity.

Probability mass function[edit]

The probability of having k successful trials out of a total of n can be written as the sum [1]

Pr(K=k) = \sum\limits_{A\in {{F}_{k}}}{\prod\limits_{i\in A}{{{p}_{i}}}\prod\limits_{j\in {{A}^{c}}}{(1-{{p}_{j}})}}

where F_k is the set of all subsets of k integers that can be selected from {1,2,3,...,n}. For example, if n = 3, then {{F}_{2}}=\left\{ \{1,2\},\{1,3\},\{2,3\} \right\}. A^c is the complement of A, i.e. A^c =\{1,2,3,\dots,n\}\backslash A.

F_k will contain n!/(n-k)!k! elements, the sum over which is infeasible to compute in practice unless the number of trials n is small (e.g. if n=30, F_{15} contains over 1020 elements). There are luckily more efficient ways to calculate Pr(K=k).

As long as none of the success probabilities are equal to one, one can calculate the probability of k successes using the recursive formula [2] [3]

\Pr (K=k)=\left\{ \begin{align}
  & \prod\limits_{i=1}^{n}{(1-{{p}_{i}})}  \qquad k=0 \\ 
 & \frac{1}{k}\sum\limits_{i=1}^{k}{{{(-1)}^{i-1}}\Pr (K=k-i)T(i)} \qquad k>0 \\ 
\end{align} \right.

where T(i)=\sum\limits_{j=1}^{n}{{{\left( \frac{{{p}_{j}}}{1-{{p}_{j}}} \right)}^{i}}} .

The recursive formula is not numerically stable, and should be avoided if n is greater than approximately 20. Another possibility is using the discrete Fourier transform [4]

\Pr (K=k)=\frac{1}{n+1}\sum\limits_{l=0}^{n}{{{C}^{-lk}}\prod\limits_{m=1}^{n}{\left( 1+({{C}^{l}}-1){{p}_{m}} \right)}}

where C=\exp \left( \frac{2i\pi }{n+1} \right)

where i=\sqrt{-1}

Still other methods are described in .[5]

Entropy[edit]

There is no simple formula for the entropy of a Poisson binomial distribution, but the entropy can be upper bounded by that entropy of a binomial distribution with the same number parameter and the same mean. Therefore the entropy can also be upper bounded by the entropy of a Poisson distribution with the same mean. [6]

See also[edit]

References[edit]

  1. ^ Wang, Y. H. (1993). "On the number of successes in independent trials". Statistica Sinica 3 (2): 295–312. 
  2. ^ Shah, B. K. (1994). "On the distribution of the sum of independent integer valued random variables". American Statistician 27 (3): 123–124. JSTOR 2683639. 
  3. ^ Chen, X. H.; A. P. Dempster, J. S. Liu (1994). "Weighted finite population sampling to maximize entropy". Biometrika 81 (3): 457. doi:10.1093/biomet/81.3.457. 
  4. ^ Fernandez, M.; S. Williams (2010). "Closed-Form Expression for the Poisson-Binomial Probability Density Function". IEEE Transactions on Aerospace Electronic Systems 46: 803–817. doi:10.1109/TAES.2010.5461658. 
  5. ^ Chen, S. X.; J. S. Liu (1997). "Statistical Applications of the Poisson-Binomial and conditional Bernoulli distributions". Statistica Sinica 7: 875–892. 
  6. ^ Harremoës, P. (2001). "Binomial and Poisson distributions as maximum entropy distributions". IEEE Transactions on Information Theory 47 (5): 2039–2041. doi:10.1109/18.930936.