= Beta negative binomial distribution =

- Moment-generating function: does not exist
- Characteristic function: ${}_{2}F_{1}(\beta,r;\alpha+\beta+r;e^{it}) \frac{(\alpha)^{(r)}}{(\alpha+\beta)^{(r)}}$, where $(x)^{(r)} = \frac{\Gamma(x+r)}{\Gamma(x)}$ is the Pochhammer symbol and ${}_{2}F_{1}$ is the hypergeometric function
- Probability-generating function: ${}_{2}F_{1}(\beta,r;\alpha+\beta+r;z) \frac{(\alpha)^{(r)}}{(\alpha+\beta)^{(r)}}$

In probability theory, a beta negative binomial distribution is the probability distribution of a discrete random variable $X$ equal to the number of failures needed to get $r$ successes in a sequence of independent Bernoulli trials. The probability $p$ of success on each trial stays constant within any given experiment but varies across different experiments following a beta distribution. Thus the distribution is a compound probability distribution.

This distribution has also been called the inverse Markov-Pólya distribution and the generalized Waring distribution, and is also abbreviated as the BNB distribution. A shifted form of the distribution has been called the beta-Pascal distribution.

If parameters of the beta distribution are $\alpha$ and $\beta$, and if
$X \mid p \sim \mathrm{NB}(r,p),$
where
$p \sim \textrm{B}(\alpha,\beta),$
then the marginal distribution of $X$ (i.e. the posterior predictive distribution) is a beta negative binomial distribution:
$X \sim \mathrm{BNB}(r,\alpha,\beta).$

In the above, $\mathrm{NB}(r,p)$ is the negative binomial distribution and $\textrm{B}(\alpha,\beta)$ is the beta distribution.
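
The two-stage construction above can be sketched directly as a sampler. This is a minimal illustration rather than a reference implementation: the function name `sample_bnb` and the parameter values in the usage note are hypothetical, and $r$ is assumed to be a positive integer so that the Bernoulli trials can be simulated one at a time.

```python
import random

def sample_bnb(r, alpha, beta, rng=random):
    """Draw one BNB(r, alpha, beta) variate by compounding (integer r assumed)."""
    # Stage 1: the success probability of this experiment, p ~ Beta(alpha, beta).
    p = rng.betavariate(alpha, beta)
    # Stage 2: count failures until the r-th success in Bernoulli(p) trials.
    successes = 0
    failures = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures
```

For example, with `rng = random.Random(0)` and many draws of `sample_bnb(3, 5.0, 2.0, rng)`, the sample mean should be close to $r\beta/(\alpha-1) = 1.5$.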

==Definition and derivation==
Denoting by $f_{X|p}(k|r,q)$ and $f_{p}(q|\alpha,\beta)$ the probability mass function of the negative binomial distribution and the density of the beta distribution, respectively, we obtain the PMF $f(k|\alpha,\beta,r)$ of the BNB distribution by marginalization:
$\begin{align}
f(k|\alpha,\beta,r) \; =& \; \int_0^1 f_{X|p}(k|r,q) \cdot f_{p}(q|\alpha,\beta) \mathrm{d} q \\
 =& \; \int_0^1 \binom{k+r-1}{k} (1-q)^k q^r \cdot \frac{q^{\alpha-1}(1-q)^{\beta-1}} {\Beta(\alpha,\beta)} \mathrm{d} q \\
 =& \; \frac{1}{\Beta(\alpha,\beta)} \binom{k+r-1}{k} \int_0^1 q^{\alpha+r-1}(1-q)^{\beta+k-1} \mathrm{d} q
\end{align}$

Noting that the integral evaluates to:
$\int_0^1 q^{\alpha+r-1}(1-q)^{\beta+k-1} \mathrm{d} q = \frac{\Gamma(\alpha+r)\Gamma(\beta+k)}{\Gamma(\alpha+\beta+k+r)}$
we can arrive at the following formulas by relatively simple manipulations.

If $r$ is an integer, then the PMF can be written in terms of the beta function:
$f(k|\alpha,\beta,r)=\binom{r+k-1}k\frac{\Beta(\alpha+r,\beta+k)}{\Beta(\alpha,\beta)}$.
More generally, the PMF can be written
$f(k|\alpha,\beta,r)=\frac{\Gamma(r+k)}{k!\;\Gamma(r)}\frac{\Beta(\alpha+r,\beta+k)}{\Beta(\alpha,\beta)}$
or
$f(k|\alpha,\beta,r)=\frac{\Beta(r+k,\alpha+\beta)}{\Beta(r,\alpha)}\frac{\Gamma(k+\beta)}{k!\;\Gamma(\beta)}$.
===PMF expressed with Gamma===
Using the properties of the Beta function, the PMF with integer $r$ can be rewritten as:
$f(k|\alpha,\beta,r)=\binom{r+k-1}k\frac{\Gamma(\alpha+r)\Gamma(\beta+k)\Gamma(\alpha+\beta)}{\Gamma(\alpha+r+\beta+k)\Gamma(\alpha)\Gamma(\beta)}$.

More generally, the PMF can be written as
$f(k|\alpha,\beta,r)=\frac{\Gamma(r+k)}{k!\;\Gamma(r)}\frac{\Gamma(\alpha+r)\Gamma(\beta+k)\Gamma(\alpha+\beta)}{\Gamma(\alpha+r+\beta+k)\Gamma(\alpha)\Gamma(\beta)}$.
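
The Gamma-function form lends itself to numerical evaluation in log space, which avoids overflow for large arguments. The following is a minimal sketch using only the Python standard library; the function name `bnb_pmf` is an illustrative choice, not an established API.

```python
from math import lgamma, exp

def bnb_pmf(k, r, alpha, beta):
    """PMF of BNB(r, alpha, beta) at integer k >= 0, for real r > 0."""
    log_pmf = (
        # Gamma(r+k) / (k! Gamma(r))
        lgamma(r + k) - lgamma(k + 1) - lgamma(r)
        # Gamma(alpha+r) Gamma(beta+k) Gamma(alpha+beta)
        + lgamma(alpha + r) + lgamma(beta + k) + lgamma(alpha + beta)
        # divided by Gamma(alpha+r+beta+k) Gamma(alpha) Gamma(beta)
        - lgamma(alpha + r + beta + k) - lgamma(alpha) - lgamma(beta)
    )
    return exp(log_pmf)
```

As a sanity check, `bnb_pmf(0, 1, 1.0, 1.0)` should be close to $1/2$, since with $r=1$ and a uniform prior on $p$ the probability of no failures is $\operatorname{E}[p]$.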

===PMF expressed with the rising Pochhammer symbol===
The PMF is often also presented in terms of the Pochhammer symbol for integer $r$:
$f(k|\alpha,\beta,r)=\frac{r^{(k)}\alpha^{(r)}\beta^{(k)}}{k!(\alpha+\beta)^{(r+k)}}$
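
As a consistency check, the Pochhammer form can be expanded with the rising factorial written as $x^{(n)} = \Gamma(x+n)/\Gamma(x)$, the same convention used for the characteristic function. Substituting factor by factor gives

$\begin{align}
\frac{r^{(k)}\alpha^{(r)}\beta^{(k)}}{k!\,(\alpha+\beta)^{(r+k)}} \; =& \; \frac{1}{k!} \cdot \frac{\Gamma(r+k)}{\Gamma(r)} \cdot \frac{\Gamma(\alpha+r)}{\Gamma(\alpha)} \cdot \frac{\Gamma(\beta+k)}{\Gamma(\beta)} \cdot \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha+\beta+r+k)} \\
 =& \; \frac{\Gamma(r+k)}{k!\;\Gamma(r)}\frac{\Gamma(\alpha+r)\Gamma(\beta+k)\Gamma(\alpha+\beta)}{\Gamma(\alpha+r+\beta+k)\Gamma(\alpha)\Gamma(\beta)}
\end{align}$

which agrees term by term with the Gamma-function form of the PMF.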

==Properties==
===Factorial moments===
The $k$-th factorial moment of a beta negative binomial random variable $X$ is defined for $k < \alpha$ and in this case is equal to

$\operatorname{E}\bigl[(X)_k\bigr] = \frac{\Gamma(r+k)}{\Gamma(r)}\frac{\Gamma(\beta+k)}{\Gamma(\beta)}\frac{\Gamma(\alpha-k)}{\Gamma(\alpha)}.$
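
The mean and variance follow from the first two factorial moments, since $\operatorname{Var}(X) = \operatorname{E}[X(X-1)] + \operatorname{E}[X] - \operatorname{E}[X]^2$. The sketch below evaluates the formula above in log space; the function names are illustrative, and `bnb_mean_var` assumes $\alpha > 2$ so that both moments are finite.

```python
from math import lgamma, exp

def bnb_factorial_moment(k, r, alpha, beta):
    """E[(X)_k] for X ~ BNB(r, alpha, beta); returns inf when k >= alpha."""
    if k >= alpha:
        return float("inf")
    return exp(
        lgamma(r + k) - lgamma(r)
        + lgamma(beta + k) - lgamma(beta)
        + lgamma(alpha - k) - lgamma(alpha)
    )

def bnb_mean_var(r, alpha, beta):
    """Mean and variance from the first two factorial moments (assumes alpha > 2)."""
    m1 = bnb_factorial_moment(1, r, alpha, beta)  # E[X]
    m2 = bnb_factorial_moment(2, r, alpha, beta)  # E[X(X-1)]
    return m1, m2 + m1 - m1 * m1
```

For $r=3$, $\alpha=5$, $\beta=2$ the first factorial moment is $r\beta/(\alpha-1) = 1.5$ and the second is $r(r+1)\beta(\beta+1)/((\alpha-1)(\alpha-2)) = 6$, so the variance is approximately $5.25$.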

===Non-identifiable===
The beta negative binomial distribution is non-identifiable, which can be seen by swapping $r$ and $\beta$ in the above density or characteristic function and noting that it is unchanged. Thus estimation demands that a constraint be placed on $r$, $\beta$ or both.
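
The symmetry can be checked numerically. In the sketch below the PMF is grouped so that the factors exchanged by the swap sit next to each other; the function names and the parameter values in the usage note are hypothetical and chosen only for illustration.

```python
from math import lgamma, exp, isclose

def bnb_pmf(k, r, alpha, beta):
    """BNB PMF, grouped so that the symmetry in r and beta is visible."""
    log_pmf = (
        # Symmetric pair: Gamma(r+k)/Gamma(r) and Gamma(beta+k)/Gamma(beta).
        lgamma(r + k) - lgamma(r)
        + lgamma(beta + k) - lgamma(beta)
        # The remaining factors are unchanged when r and beta are swapped.
        + lgamma(alpha + r) + lgamma(alpha + beta)
        - lgamma(alpha) - lgamma(alpha + r + beta + k)
        - lgamma(k + 1)
    )
    return exp(log_pmf)

def swap_invariant(r, alpha, beta, k_max=50):
    """True if BNB(r, alpha, beta) and BNB(beta, alpha, r) agree on 0..k_max-1."""
    return all(
        isclose(bnb_pmf(k, r, alpha, beta), bnb_pmf(k, beta, alpha, r), rel_tol=1e-9)
        for k in range(k_max)
    )
```

For instance, `swap_invariant(2.5, 3.0, 7.0)` compares $\mathrm{BNB}(2.5, 3, 7)$ with $\mathrm{BNB}(7, 3, 2.5)$; data alone cannot distinguish the two parameter settings.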

===Relation to other distributions===
The beta negative binomial distribution contains the beta geometric distribution as a special case when either $r=1$ or $\beta=1$. It can therefore approximate the geometric distribution arbitrarily well. It also approximates the negative binomial distribution arbitrarily well for large $\alpha$ and $\beta$. It can therefore approximate the Poisson distribution arbitrarily well for large $\alpha$, $\beta$ and $r$.

===Heavy tailed===
By Stirling's approximation to the beta function, it can be shown that for large $k$
$f(k|\alpha,\beta,r) \sim \frac{\Gamma(\alpha+r)}{\Gamma(r)\Beta(\alpha,\beta)}\frac{k^{r-1}}{(\beta+k)^{r+\alpha}}$
so the PMF decays like $k^{-(\alpha+1)}$. This implies that the beta negative binomial distribution is heavy tailed and that moments of order greater than or equal to $\alpha$ do not exist.
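
The quality of the large-$k$ approximation can be inspected numerically by taking the ratio of the exact PMF to the asymptotic expression, which should approach 1 as $k$ grows. This is a rough sketch in log space; the function names are illustrative and the parameter values in the usage note are arbitrary.

```python
from math import lgamma, log, exp

def log_bnb_pmf(k, r, alpha, beta):
    """Logarithm of the exact BNB PMF (Gamma-function form)."""
    return (
        lgamma(r + k) - lgamma(k + 1) - lgamma(r)
        + lgamma(alpha + r) + lgamma(beta + k) + lgamma(alpha + beta)
        - lgamma(alpha + r + beta + k) - lgamma(alpha) - lgamma(beta)
    )

def log_tail_approx(k, r, alpha, beta):
    """Logarithm of Gamma(alpha+r) / (Gamma(r) B(alpha, beta)) * k^(r-1) / (beta+k)^(r+alpha)."""
    log_beta_ab = lgamma(alpha) + lgamma(beta) - lgamma(alpha + beta)
    return (
        lgamma(alpha + r) - lgamma(r) - log_beta_ab
        + (r - 1) * log(k) - (r + alpha) * log(beta + k)
    )

def tail_ratio(k, r, alpha, beta):
    """Exact PMF divided by its large-k approximation."""
    return exp(log_bnb_pmf(k, r, alpha, beta) - log_tail_approx(k, r, alpha, beta))
```

Evaluating `tail_ratio(k, 3, 5.0, 2.0)` for increasing `k` (say $10^2$, $10^4$, $10^6$) shows the ratio moving toward 1, with an error that shrinks roughly in proportion to $1/k$.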

==Beta geometric distribution==
The beta geometric distribution is an important special case of the beta negative binomial distribution occurring for $r=1$. In this case the PMF simplifies to

$f(k|\alpha,\beta)=\frac{\mathrm{B}(\alpha+1,\beta+k)} {\mathrm{B}(\alpha,\beta)}$.

This distribution is used in some Buy Till you Die (BTYD) models.

Further, when $\beta=1$ the beta geometric reduces to the Yule–Simon distribution. However, it is more common to define the Yule–Simon distribution in terms of a shifted version of the beta geometric. In particular, if $X \sim \mathrm{BG}(\alpha,1)$ then $X+1 \sim \mathrm{YS}(\alpha)$.
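
The shift relation can be verified numerically. The sketch below assumes the usual parameterization of the Yule–Simon PMF, $\rho\,\mathrm{B}(k,\rho+1)$ for $k = 1, 2, \ldots$, which is not stated in the text above; the function names are illustrative.

```python
from math import lgamma, exp, isclose

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_geometric_pmf(k, alpha, beta):
    """P(X = k) for k = 0, 1, 2, ...: the BNB PMF with r = 1."""
    return exp(log_beta(alpha + 1, beta + k) - log_beta(alpha, beta))

def yule_simon_pmf(k, rho):
    """P(Y = k) = rho * B(k, rho + 1) for k = 1, 2, ... (assumed parameterization)."""
    return rho * exp(log_beta(k, rho + 1))

def shift_relation_holds(alpha, k_max=100):
    """Check P(X = k) == P(Y = k + 1) for X ~ BG(alpha, 1), Y ~ YS(alpha)."""
    return all(
        isclose(beta_geometric_pmf(k, alpha, 1.0), yule_simon_pmf(k + 1, alpha), rel_tol=1e-9)
        for k in range(k_max)
    )
```

Both sides reduce to $\alpha\,\mathrm{B}(\alpha+1,k+1)$, because $\mathrm{B}(\alpha,1) = 1/\alpha$ and the beta function is symmetric in its arguments.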

==Beta negative binomial as a Pólya urn model==

When the three parameters $r$, $\alpha$ and $\beta$ are positive integers, the beta negative binomial distribution can also be motivated by an urn model, specifically a basic Pólya urn model. Consider an urn initially containing $\alpha$ red balls (the stopping color) and $\beta$ blue balls. At each step, a ball is drawn at random from the urn and returned to it along with one additional ball of the same color. The process is repeated until $r$ red balls have been drawn. The number $X$ of blue balls drawn is then distributed according to $\mathrm{BNB}(r, \alpha, \beta)$. At the end of the experiment, the urn always contains the fixed number $r+\alpha$ of red balls and the random number $X+\beta$ of blue balls.

By the non-identifiability property, $X$ can be equivalently generated with the urn initially containing $\alpha$ red balls (the stopping color) and $r$ blue balls and stopping when $\beta$ red balls are observed.
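
The urn scheme is easy to simulate, which gives an informal check of the claim. This is a sketch under the stated assumption that $r$, $\alpha$ and $\beta$ are positive integers; the function name `polya_urn_bnb` and the example values are hypothetical.

```python
import random

def polya_urn_bnb(r, alpha, beta, rng=random):
    """Number of blue draws before the r-th red draw in a Polya urn.

    The urn starts with alpha red and beta blue balls; each drawn ball is
    returned together with one extra ball of the same color.
    """
    red, blue = alpha, beta
    red_drawn = 0
    blue_drawn = 0
    while red_drawn < r:
        # Draw red with probability red / (red + blue).
        if rng.random() * (red + blue) < red:
            red += 1
            red_drawn += 1
        else:
            blue += 1
            blue_drawn += 1
    return blue_drawn
```

With $r=3$, $\alpha=5$, $\beta=2$, the probability of drawing no blue ball is $\tfrac{5}{7}\cdot\tfrac{6}{8}\cdot\tfrac{7}{9} = \tfrac{5}{12}$, which matches the BNB PMF at $k=0$; the empirical frequency of zeros over many runs of `polya_urn_bnb(3, 5, 2)` should be close to this value.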

==See also==
- Negative binomial distribution
- Dirichlet negative multinomial distribution
