= Negative multinomial distribution =

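The probability mass function of $\mathbf{X} = (X_1,\ldots,X_m) \sim \mathrm{NM}(x_0, \mathbf{p})$, with $x_0 > 0$, $\mathbf{p} = (p_1,\ldots,p_m)$ and $p_0 = 1 - \sum_{i=1}^m p_i > 0$, is, for non-negative integers $x_1,\ldots,x_m$,
$\Pr(X_1 = x_1,\ldots,X_m = x_m) = \Gamma\!\left(\sum_{i=0}^m x_i\right)\frac{p_0^{x_0}}{\Gamma(x_0)} \prod_{i=1}^m{\frac{p_i^{x_i}}{x_i!}},$
where Γ(x) is the Gamma function. Its main summaries are:
- Mean: $\tfrac{x_0}{p_0}\,\mathbf{p}$
- Variance: $\tfrac{x_0}{p_0^2}\,\mathbf{pp}' + \tfrac{x_0}{p_0}\,\operatorname{diag}(\mathbf{p})$
- Moment-generating function: $\bigg(\frac{p_0}{1 - \sum_{j=1}^m p_j e^{t_j}}\bigg)^{\!x_0}$
- Characteristic function: $\bigg(\frac{p_0}{1 - \sum_{j=1}^m p_j e^{it_j}}\bigg)^{\!x_0}$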

In probability theory and statistics, the negative multinomial distribution is a generalization of the negative binomial distribution $\mathrm{NB}(x_0, p)$ to more than two outcomes.

As with the univariate negative binomial distribution, if the parameter $x_0$ is a positive integer, the negative multinomial distribution has an urn model interpretation. Suppose an experiment has $m+1 \ge 2$ possible outcomes, occurring with non-negative probabilities $p_0,\ldots,p_m$ respectively, and let $X_0,\ldots,X_m$ count how often each outcome is observed. If sampling continued until a fixed number $n$ of observations had been made, then $(X_0,\ldots,X_m)$ would be multinomially distributed. If instead the experiment stops as soon as $X_0$ reaches the predetermined value $x_0$, then the $m$-tuple $(X_1,\ldots,X_m)$ has a negative multinomial distribution. These variables are not multinomially distributed because their sum $X_1+\cdots+X_m$ is not fixed; it is itself a draw from a negative binomial distribution.
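The urn model can be checked by direct simulation. The sketch below assumes $x_0$ is a positive integer; the function names and the parameter values are hypothetical illustrations. It draws outcomes until outcome 0 has occurred $x_0$ times and compares the average counts with the mean $\tfrac{x_0}{p_0}\mathbf{p}$.

```python
import random

def sample_urn(x0, probs, rng):
    """Draw outcomes 0..m with probabilities probs = (p_0, ..., p_m) until
    outcome 0 has occurred x0 times; return the counts (X_1, ..., X_m)."""
    outcomes = list(range(len(probs)))
    counts = [0] * len(probs)
    while counts[0] < x0:
        k = rng.choices(outcomes, weights=probs)[0]
        counts[k] += 1
    return counts[1:]

def empirical_mean(x0, probs, trials=20000, seed=1):
    """Average of (X_1, ..., X_m) over repeated runs of the urn experiment."""
    rng = random.Random(seed)
    totals = [0] * (len(probs) - 1)
    for _ in range(trials):
        for i, c in enumerate(sample_urn(x0, probs, rng)):
            totals[i] += c
    return [t / trials for t in totals]

x0, probs = 3, [0.5, 0.3, 0.2]                  # probs = (p_0, p_1, p_2)
print(empirical_mean(x0, probs))                # simulated mean
print([x0 * p / probs[0] for p in probs[1:]])   # theoretical x_0 p_i / p_0
```

With these values the simulated means should land near 1.8 and 1.2; the agreement improves as `trials` grows.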

==Properties==

===Marginal distributions===

If the $m$-dimensional vector $\mathbf{X}$ is partitioned as
$\mathbf{X} = \begin{bmatrix} \mathbf{X}^{(1)} \\ \mathbf{X}^{(2)} \end{bmatrix} \text{ with sizes }\begin{bmatrix} n \times 1 \\ (m-n) \times 1 \end{bmatrix}$
and $\boldsymbol{p}$ is partitioned accordingly,
$\boldsymbol p = \begin{bmatrix} \boldsymbol p^{(1)} \\ \boldsymbol p^{(2)} \end{bmatrix} \text{ with sizes }\begin{bmatrix} n \times 1 \\ (m-n) \times 1 \end{bmatrix},$
let
$q = 1-\sum_i p_i^{(2)} = p_0+\sum_i p_i^{(1)}.$

The marginal distribution of $\boldsymbol X^{(1)}$ is $\mathrm{NM}(x_0,p_0/q, \boldsymbol p^{(1)}/q )$. That is, the marginal distribution is also negative multinomial, with $\boldsymbol p^{(2)}$ removed and the remaining probabilities rescaled by $q$ so that $p_0/q + \sum_i p_i^{(1)}/q = 1$.

The univariate marginal ($n=1$) has a negative binomial distribution.
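The marginal result can be checked numerically by summing the joint probability mass function over the dropped component. In this sketch `nm_pmf` and `marginal_params` are hypothetical helper names, the parameters are illustrative, and the infinite sum over $x_2$ is truncated at 300 terms, which is far more than enough for these values.

```python
from math import exp, lgamma, log

def nm_pmf(x, x0, p):
    """Negative multinomial pmf at counts x = (x_1, ..., x_m).

    x0 is the shape parameter and p = (p_1, ..., p_m); the failure
    probability is p_0 = 1 - sum(p). Computed on the log scale.
    """
    p0 = 1.0 - sum(p)
    log_f = lgamma(x0 + sum(x)) - lgamma(x0) + x0 * log(p0)
    for xi, pi in zip(x, p):
        log_f += xi * log(pi) - lgamma(xi + 1)
    return exp(log_f)

def marginal_params(x0, p, keep):
    """Parameters (x0, p^(1)/q) of the marginal of the components in `keep`,
    where q = p_0 + sum of the kept p_i."""
    p0 = 1.0 - sum(p)
    q = p0 + sum(p[i] for i in keep)
    return x0, [p[i] / q for i in keep]

# Marginal of X_1 when m = 2: sum the joint pmf over x_2 (truncated)
x0, p = 2.5, [0.2, 0.3]
mx0, mp = marginal_params(x0, p, keep=[0])
for k in range(5):
    summed = sum(nm_pmf((k, j), x0, p) for j in range(300))
    print(k, summed, nm_pmf((k,), mx0, mp))
```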

===Conditional distributions===

The conditional distribution of $\mathbf{X}^{(1)}$ given $\mathbf{X}^{(2)}=\mathbf{x}^{(2)}$ is $\mathrm{NM}(x_0+\sum{x_i^{(2)}},\mathbf{p}^{(1)})$, whose failure probability is $1-\sum_i p_i^{(1)} = p_0+\sum_i p_i^{(2)}$. That is,
$\Pr(\mathbf{x}^{(1)}\mid \mathbf{x}^{(2)}, x_0, \mathbf{p} )= \Gamma\!\left(\sum_{i=0}^m{x_i}\right)\frac{(1-\sum_{i=1}^n{p_i^{(1)}})^{x_0+\sum_{i=1}^{m-n}x_i^{(2)}}}{\Gamma(x_0+\sum_{i=1}^{m-n}x_i^{(2)})}\prod_{i=1}^n{\frac{(p_i^{(1)})^{x_i^{(1)}}}{(x_i^{(1)})!}}.$
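A quick consistency check: dividing the joint pmf by the marginal pmf of $\mathbf{X}^{(2)}$ should reproduce the conditional pmf above. The sketch takes $m=3$, $n=1$ with illustrative parameters; `nm_pmf` and `conditional_pmf` are hypothetical helpers, and the marginal of $\mathbf{X}^{(2)}$ is rescaled by $p_0 + \sum_i p_i^{(2)}$ as in the marginal result.

```python
from math import exp, lgamma, log

def nm_pmf(x, x0, p):
    """Negative multinomial pmf; p = (p_1, ..., p_m), p_0 = 1 - sum(p)."""
    p0 = 1.0 - sum(p)
    log_f = lgamma(x0 + sum(x)) - lgamma(x0) + x0 * log(p0)
    for xi, pi in zip(x, p):
        log_f += xi * log(pi) - lgamma(xi + 1)
    return exp(log_f)

def conditional_pmf(x1, x2, x0, p1):
    """Pr(X^(1) = x1 | X^(2) = x2), i.e. the NM(x0 + sum(x2), p1) pmf at x1."""
    return nm_pmf(x1, x0 + sum(x2), p1)

# m = 3, n = 1: compare with joint / marginal of X^(2) = (X_2, X_3)
x0, p = 1.7, [0.15, 0.2, 0.25]
p0 = 1.0 - sum(p)
q2 = p0 + p[1] + p[2]               # rescaling for the marginal of X^(2)
x1, x2 = (3,), (2, 4)
joint = nm_pmf(x1 + x2, x0, p)
marg = nm_pmf(x2, x0, [p[1] / q2, p[2] / q2])
print(joint / marg, conditional_pmf(x1, x2, x0, [p[0]]))
```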

===Independent sums===
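If $\mathbf{X}_1 \sim \mathrm{NM}(r_1, \mathbf{p})$ and $\mathbf{X}_2 \sim \mathrm{NM}(r_2, \mathbf{p})$ are independent, then
$\mathbf{X}_1+\mathbf{X}_2 \sim \mathrm{NM}(r_1+r_2, \mathbf{p})$. This follows from the characteristic function, which has the form $\varphi(\mathbf{t})^{x_0}$ with $\varphi$ not depending on $x_0$, so multiplying the characteristic functions adds the shape parameters. Conversely, for every positive integer $k$, $\mathrm{NM}(x_0, \mathbf{p})$ is the distribution of the sum of $k$ independent $\mathrm{NM}(x_0/k, \mathbf{p})$ vectors, so the negative multinomial distribution is infinitely divisible.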
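The additivity can be verified exactly at individual points, because the pmf of a sum of independent count vectors is a finite convolution. This sketch uses $m=2$ and illustrative shape parameters; `nm_pmf` and `convolved_pmf` are hypothetical helper names.

```python
from math import exp, lgamma, log

def nm_pmf(x, x0, p):
    """Negative multinomial pmf; p = (p_1, ..., p_m), p_0 = 1 - sum(p)."""
    p0 = 1.0 - sum(p)
    log_f = lgamma(x0 + sum(x)) - lgamma(x0) + x0 * log(p0)
    for xi, pi in zip(x, p):
        log_f += xi * log(pi) - lgamma(xi + 1)
    return exp(log_f)

def convolved_pmf(a, b, r1, r2, p):
    """Pr(X_1 + X_2 = (a, b)) for independent X_1 ~ NM(r1, p), X_2 ~ NM(r2, p)."""
    return sum(nm_pmf((i, j), r1, p) * nm_pmf((a - i, b - j), r2, p)
               for i in range(a + 1) for j in range(b + 1))

r1, r2, p = 1.3, 2.2, [0.25, 0.35]
for point in [(0, 0), (2, 1), (4, 3)]:
    # convolution of the two pmfs versus the NM(r1 + r2, p) pmf
    print(point, convolved_pmf(*point, r1, r2, p), nm_pmf(point, r1 + r2, p))
```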

===Aggregation===
If
$\mathbf{X} = (X_1, \ldots, X_m)\sim\operatorname{NM}(x_0, (p_1,\ldots,p_m))$
and the random variables with subscripts $i$ and $j$ are dropped from the vector and replaced by their sum, then
$\mathbf{X}' = (X_1, \ldots, X_i + X_j, \ldots, X_m)\sim\operatorname{NM} (x_0, (p_1, \ldots, p_i + p_j, \ldots, p_m)).$

This aggregation property may be used to derive the marginal distribution of $X_i$ mentioned above.
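The aggregation property can be checked pointwise by summing the joint pmf over all ways of splitting $X_1 + X_2 = s$. The sketch uses $m=3$ with illustrative probabilities; `nm_pmf` and `aggregated_pmf` are hypothetical helper names.

```python
from math import exp, lgamma, log

def nm_pmf(x, x0, p):
    """Negative multinomial pmf; p = (p_1, ..., p_m), p_0 = 1 - sum(p)."""
    p0 = 1.0 - sum(p)
    log_f = lgamma(x0 + sum(x)) - lgamma(x0) + x0 * log(p0)
    for xi, pi in zip(x, p):
        log_f += xi * log(pi) - lgamma(xi + 1)
    return exp(log_f)

def aggregated_pmf(s, c, x0, p):
    """Pr(X_1 + X_2 = s, X_3 = c) for X ~ NM(x0, p) with m = 3,
    obtained by summing the joint pmf over every split of s."""
    return sum(nm_pmf((i, s - i, c), x0, p) for i in range(s + 1))

x0, p = 2.0, [0.1, 0.2, 0.3]
merged = [p[0] + p[1], p[2]]        # (p_1 + p_2, p_3); p_0 is unchanged
for s, c in [(0, 0), (3, 1), (5, 2)]:
    print((s, c), aggregated_pmf(s, c, x0, p), nm_pmf((s, c), x0, merged))
```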

===Correlation matrix===
The entries of the correlation matrix are
$\rho(X_i,X_i) = 1$
and, for $i \neq j$,
$\rho(X_i,X_j) = \frac{\operatorname{cov}(X_i,X_j)}{\sqrt{\operatorname{var}(X_i)\operatorname{var}(X_j)}} = \sqrt{\frac{p_i p_j}{(p_0+p_i)(p_0+p_j)}}.$
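The closed form follows from the covariance matrix $\tfrac{x_0}{p_0^2}\mathbf{pp}' + \tfrac{x_0}{p_0}\operatorname{diag}(\mathbf{p})$, and the factor $x_0$ cancels. The sketch below builds that matrix, normalizes it, and compares the result with the closed form; the helper names and parameter values are hypothetical.

```python
from math import sqrt

def nm_covariance(x0, p):
    """Covariance matrix (x0/p0^2) p p' + (x0/p0) diag(p) as nested lists."""
    p0 = 1.0 - sum(p)
    m = len(p)
    return [[x0 / p0**2 * p[i] * p[j] + (x0 / p0 * p[i] if i == j else 0.0)
             for j in range(m)] for i in range(m)]

def correlation_from_cov(cov):
    """Normalize a covariance matrix into a correlation matrix."""
    m = len(cov)
    return [[cov[i][j] / sqrt(cov[i][i] * cov[j][j]) for j in range(m)]
            for i in range(m)]

def closed_form_corr(p):
    """rho_ij = sqrt(p_i p_j / ((p0 + p_i)(p0 + p_j))) off the diagonal, 1 on it."""
    p0 = 1.0 - sum(p)
    m = len(p)
    return [[1.0 if i == j else sqrt(p[i] * p[j] / ((p0 + p[i]) * (p0 + p[j])))
             for j in range(m)] for i in range(m)]

x0, p = 3.0, [0.1, 0.2, 0.3]
print(correlation_from_cov(nm_covariance(x0, p)))
print(closed_form_corr(p))
```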

==Parameter estimation==

===Method of moments===

If we let the mean vector of the negative multinomial be
$\boldsymbol{\mu}=\frac{x_0}{p_0}\mathbf{p}$
and covariance matrix
$\boldsymbol{\Sigma}=\tfrac{x_0}{p_0^2}\,\mathbf{p}\mathbf{p}' + \tfrac{x_0}{p_0}\,\operatorname{diag}(\mathbf{p}),$
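then, since $\boldsymbol{\Sigma}$ is the diagonal matrix $\tfrac{x_0}{p_0}\operatorname{diag}(\mathbf{p}) = \operatorname{diag}(\boldsymbol{\mu})$ plus a rank-one term, the matrix determinant lemma gives $|\boldsymbol{\Sigma}| = \prod_{i=1}^m{\mu_i}\left(1 + \frac{\sum_i p_i}{p_0}\right) = \frac{1}{p_0}\prod_{i=1}^m{\mu_i}$, using $p_0 + \sum_i p_i = 1$. Combining this with $\sum_i \mu_i = x_0(1-p_0)/p_0$ and solving for $x_0$ and $\mathbf{p}$ gives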
$x_0=\frac{\sum{\mu_i}\prod{\mu_i}}{|\boldsymbol{\Sigma}|-\prod{\mu_i}}$
and
$\mathbf{p}= \frac{|\boldsymbol{\Sigma}|-\prod{\mu_i}}{|\boldsymbol{\Sigma}|\sum{\mu_i}}\boldsymbol{\mu}.$
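These identities can be confirmed numerically with NumPy: build $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ from chosen parameters, check the determinant formula, then invert the moment equations. The parameter values here are hypothetical.

```python
import numpy as np

# Hypothetical example parameters
x0 = 2.5
p = np.array([0.1, 0.25, 0.15])
p0 = 1.0 - p.sum()

# Population mean vector and covariance matrix
mu = x0 / p0 * p
Sigma = x0 / p0**2 * np.outer(p, p) + x0 / p0 * np.diag(p)

det_Sigma = np.linalg.det(Sigma)
prod_mu = np.prod(mu)
print(det_Sigma, prod_mu / p0)      # |Sigma| = prod(mu) / p0, up to rounding

# Invert the moment equations
x0_rec = mu.sum() * prod_mu / (det_Sigma - prod_mu)
p_rec = (det_Sigma - prod_mu) / (det_Sigma * mu.sum()) * mu
print(x0_rec, p_rec)                # recovers x0 and p
```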

Substituting the sample mean vector $\bar{\mathbf{x}} = (\bar{x}_1,\ldots,\bar{x}_m)$ for $\boldsymbol{\mu}$ and the sample covariance matrix $\mathbf{S}$ for $\boldsymbol{\Sigma}$ yields the method of moments estimates
$\hat{x}_0=\frac{\left(\sum_{i=1}^{m}\bar{x}_i\right)\prod_{i=1}^{m}\bar{x}_i}{|\mathbf{S}|-\prod_{i=1}^{m}\bar{x}_i}$
and
$\hat{\mathbf{p}}=\left(\frac{|\mathbf{S}|-\prod_{i=1}^{m}\bar{x}_i}{|\mathbf{S}|\sum_{i=1}^{m}\bar{x}_i}\right)\bar{\mathbf{x}}.$
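A minimal sketch of these estimators on simulated data follows. The data are drawn with the gamma-Poisson mixture representation of the negative multinomial ($G \sim \mathrm{Gamma}(x_0, 1)$, then independent $X_i \mid G \sim \mathrm{Poisson}(G\,p_i/p_0)$), which is a simulation choice not described in the text above. The helper names, seed, sample size, and parameters are hypothetical. The estimates only make sense when $|\mathbf{S}| > \prod_i \bar{x}_i$.

```python
import numpy as np

def mom_estimate(X):
    """Method-of-moments estimates (x0_hat, p_hat) from an (N, m) count array."""
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)          # sample covariance, divisor N - 1
    det_S = np.linalg.det(S)
    prod_xbar = np.prod(xbar)
    # Only meaningful when det_S > prod_xbar (an overdispersed sample)
    x0_hat = xbar.sum() * prod_xbar / (det_S - prod_xbar)
    p_hat = (det_S - prod_xbar) / (det_S * xbar.sum()) * xbar
    return x0_hat, p_hat

def simulate_nm(x0, p, n, rng):
    """Draw n vectors from NM(x0, p) via the gamma-Poisson mixture."""
    p = np.asarray(p, dtype=float)
    p0 = 1.0 - p.sum()
    g = rng.gamma(shape=x0, scale=1.0, size=n)   # one gamma draw per vector
    return rng.poisson(np.outer(g, p / p0))      # Poisson counts given g

rng = np.random.default_rng(0)
X = simulate_nm(2.5, [0.1, 0.25, 0.15], 200_000, rng)
x0_hat, p_hat = mom_estimate(X)
print(x0_hat, p_hat)   # should be near 2.5 and [0.1, 0.25, 0.15]
```

Because $\hat{x}_0$ depends on the difference $|\mathbf{S}|-\prod_i \bar{x}_i$, it can be noticeably noisier than $\hat{\mathbf{p}}$ for small samples.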

==Related distributions==
- Negative binomial distribution
- Multinomial distribution
- Inverted Dirichlet distribution, a conjugate prior for the negative multinomial
- Dirichlet negative multinomial distribution
