Dirichlet negative multinomial distribution: Difference between revisions

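Notation	$\operatorname {DNM} (x_{0},\alpha _{0},{\boldsymbol {\alpha }})$
Parameters	$x_{0}>0$ , $\alpha _{0}>0$ , ${\boldsymbol {\alpha }}=(\alpha _{1},\ldots ,\alpha _{m})$ with $\alpha _{i}>0$
Support	$x_{i}\in \{0,1,2,\ldots \}$ for $1\leq i\leq m$
PDF	${\frac {\Gamma (x_{\bullet })}{\Gamma (x_{0})\prod _{i=1}^{m}\Gamma (x_{i}+1)}}{\frac {\Gamma (\alpha _{\bullet })}{\prod _{i=0}^{m}\Gamma (\alpha _{i})}}{\frac {\prod _{i=0}^{m}\Gamma (x_{i}+\alpha _{i})}{\Gamma (x_{\bullet }+\alpha _{\bullet })}}$ with $x_{\bullet }=\sum _{i=0}^{m}x_{i}$ and $\alpha _{\bullet }=\sum _{i=0}^{m}\alpha _{i}$ ; where Γ(x) is the Gamma function.
Mean	finite for $\alpha _{0}>1$
Variance	finite for $\alpha _{0}>2$
MGF	undefined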

Revision as of 19:16, 8 May 2020

In probability theory and statistics, the Dirichlet negative multinomial distribution is a multivariate distribution on the non-negative integers. It is a multivariate extension of the beta negative binomial distribution. It is also a generalization of the negative multinomial distribution ( $\operatorname {NM} (x_{0},\mathbf {p} )$ ) that allows for heterogeneity or overdispersion in the probability vector. It is used in quantitative marketing research to flexibly model the number of household transactions across several brands.

Let $\alpha _{0}$ and ${\boldsymbol {\alpha }}$ be the parameters of a Dirichlet distribution. If

X\mid \mathbf {p} \sim \operatorname {NM} (x_{0},\mathbf {p} ),

where

\mathbf {p} \sim \operatorname {Dir} (\alpha _{0},{\boldsymbol {\alpha }}),

then the marginal distribution of X is a Dirichlet negative multinomial distribution:

X\sim \operatorname {DNM} (x_{0},\alpha _{0},{\boldsymbol {\alpha }}).

In the above, $\operatorname {NM} (x_{0},\mathbf {p} )$ is the negative multinomial distribution and $\operatorname {Dir} (\alpha _{0},{\boldsymbol {\alpha }})$ is the Dirichlet distribution.
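The two-stage definition above can be illustrated with a small sampler. This is only a sketch: it assumes $x_{0}$ is a positive integer (so that the negative multinomial stage can be simulated as repeated trials), the function name `sample_dnm` is made up for this example, and only the Python standard library is used.

```python
import random

def sample_dnm(x0, alpha0, alpha, rng=random):
    """Draw one vector from DNM(x0, alpha0, alpha); x0 must be a positive integer here."""
    # Stage 1: p ~ Dir(alpha0, alpha), obtained by normalising gamma variates.
    gammas = [rng.gammavariate(a, 1.0) for a in [alpha0] + list(alpha)]
    total = sum(gammas)
    p = [g / total for g in gammas]
    # Stage 2: X | p ~ NM(x0, p). Run multinomial trials until outcome 0
    # (the "success") has occurred x0 times, counting the other outcomes.
    counts = [0] * len(alpha)
    successes = 0
    while successes < x0:
        k = rng.choices(range(len(p)), weights=p)[0]
        if k == 0:
            successes += 1
        else:
            counts[k - 1] += 1
    return counts
```

Because the distribution is heavy tailed, a single draw can occasionally take many trials when $\alpha _{0}$ is small.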

Motivation

Dirichlet negative multinomial as a compound distribution

The Dirichlet distribution is a conjugate prior for the probability vector of the negative multinomial distribution. This fact leads to an analytically tractable compound distribution. For a random vector of category counts $\mathbf {x} =(x_{1},\dots ,x_{m})$ , distributed according to a negative multinomial distribution, the compound distribution is obtained by integrating over the distribution of p, which can be thought of as a random vector following a Dirichlet distribution:

\Pr(\mathbf {x} \mid x_{0},\alpha _{0},{\boldsymbol {\alpha }})=\int _{\mathbf {p} }\Pr(\mathbf {x} \mid x_{0},\mathbf {p} )\Pr(\mathbf {p} \mid \alpha _{0},{\boldsymbol {\alpha }}){\textrm {d}}\mathbf {p}

\Pr(\mathbf {x} \mid x_{0},\alpha _{0},{\boldsymbol {\alpha }})=\int _{\mathbf {p} }\left[\Gamma \!\left(\sum _{i=0}^{m}{x_{i}}\right){\frac {p_{0}^{x_{0}}}{\Gamma (x_{0})}}\prod _{i=1}^{m}{\frac {p_{i}^{x_{i}}}{x_{i}!}}\right]{\frac {1}{\mathrm {B} ({\boldsymbol {\alpha }}_{+})}}\prod _{i=0}^{m}p_{i}^{\alpha _{i}-1}{\textrm {d}}\mathbf {p}

\Pr(\mathbf {x} \mid x_{0},\alpha _{0},{\boldsymbol {\alpha }})={\frac {\Gamma \left(\sum _{i=0}^{m}{x_{i}}\right)}{\Gamma (x_{0})\prod _{i=1}^{m}x_{i}!}}{\frac {1}{\mathrm {B} ({\boldsymbol {\alpha }}_{+})}}\int _{\mathbf {p} }\prod _{i=0}^{m}p_{i}^{x_{i}+\alpha _{i}-1}{\textrm {d}}\mathbf {p}

which results in the following formula:

\Pr(\mathbf {x} \mid x_{0},\alpha _{0},{\boldsymbol {\alpha }})={\frac {\Gamma \left(\sum _{i=0}^{m}{x_{i}}\right)}{\Gamma (x_{0})\prod _{i=1}^{m}x_{i}!}}{\frac {{\mathrm {B} }(\mathbf {x_{+}} +{\boldsymbol {\alpha }}_{+})}{\mathrm {B} ({\boldsymbol {\alpha }}_{+})}}

where $\mathbf {x_{+}}$ and ${\boldsymbol {\alpha }}_{+}$ are the $m+1$ dimensional vectors created by appending the scalars $x_{0}$ and $\alpha _{0}$ to the $m$ dimensional vectors $\mathbf {x}$ and ${\boldsymbol {\alpha }}$ respectively. We can write this equation explicitly as

\Pr(\mathbf {x} \mid x_{0},\alpha _{0},{\boldsymbol {\alpha }})=x_{0}{\frac {\Gamma (\sum _{i=0}^{m}x_{i})\Gamma (\sum _{i=0}^{m}\alpha _{i})}{\Gamma (\sum _{i=0}^{m}(x_{i}+\alpha _{i}))}}\prod _{i=0}^{m}{\frac {\Gamma (x_{i}+\alpha _{i})}{\Gamma (x_{i}+1)\Gamma (\alpha _{i})}}.

Alternative formulations exist. One convenient representation^[1] is

\Pr(\mathbf {x} \mid x_{0},\alpha _{0},{\boldsymbol {\alpha }})={\frac {\Gamma (x_{\bullet })}{\Gamma (x_{0})\prod _{i=1}^{m}\Gamma (x_{i}+1)}}\times {\frac {\Gamma (\alpha _{\bullet })}{\prod _{i=0}^{m}\Gamma (\alpha _{i})}}\times {\frac {\prod _{i=0}^{m}\Gamma (x_{i}+\alpha _{i})}{\Gamma (x_{\bullet }+\alpha _{\bullet })}}

where $x_{\bullet }=x_{0}+x_{1}+\cdots +x_{m}$ and $\alpha _{\bullet }=\alpha _{0}+\alpha _{1}+\cdots +\alpha _{m}$ .
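The representation above translates directly into code. The following is a minimal sketch that evaluates the probability mass function on the log scale with `math.lgamma` to avoid overflow; the function names are chosen for this example only.

```python
from math import lgamma, exp

def dnm_logpmf(x, x0, alpha0, alpha):
    """Log-probability of the count vector x under DNM(x0, alpha0, alpha)."""
    xs = [x0] + list(x)              # x_+ : x0 followed by the counts
    al = [alpha0] + list(alpha)      # alpha_+ : alpha0 followed by alpha
    x_dot, a_dot = sum(xs), sum(al)
    # Gamma(x_dot) / (Gamma(x0) * prod_{i>=1} Gamma(x_i + 1))
    out = lgamma(x_dot) - lgamma(x0) - sum(lgamma(xi + 1) for xi in x)
    # Gamma(alpha_dot) / prod_{i>=0} Gamma(alpha_i)
    out += lgamma(a_dot) - sum(lgamma(a) for a in al)
    # prod_{i>=0} Gamma(x_i + alpha_i) / Gamma(x_dot + alpha_dot)
    out += sum(lgamma(xi + ai) for xi, ai in zip(xs, al)) - lgamma(x_dot + a_dot)
    return out

def dnm_pmf(x, x0, alpha0, alpha):
    return exp(dnm_logpmf(x, x0, alpha0, alpha))
```

As a sanity check, with $m=1$, $x_{0}=1$, $\alpha _{0}=2$ and $\alpha _{1}=1$ the probability of observing no failures is the expected value of $p_{0}$ , that is $\alpha _{0}/(\alpha _{0}+\alpha _{1})=2/3$ .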

Properties

Marginal distributions

To obtain the marginal distribution over a subset of Dirichlet negative multinomial random variables, one only needs to drop the irrelevant $\alpha _{i}$ 's (the variables that one wants to marginalize out) from the ${\boldsymbol {\alpha }}$ vector. The joint distribution of the remaining random variates is $\mathrm {DNM} (x_{0},\alpha _{0},{\boldsymbol {\alpha _{-}}})$ where ${\boldsymbol {\alpha _{-}}}$ is the vector with the removed $\alpha _{i}$ 's.
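The marginalization rule can be checked numerically. The sketch below sums the joint probability over the second coordinate for $m=2$ and compares the result with the distribution obtained by dropping $\alpha _{2}$ . The parameter values are arbitrary illustrations, and the infinite sum is truncated at 5000 terms on the assumption that the remaining tail is negligible for these values.

```python
from math import lgamma, exp

def dnm_pmf(x, x0, alpha0, alpha):
    """Probability of the count vector x under DNM(x0, alpha0, alpha)."""
    xs, al = [x0] + list(x), [alpha0] + list(alpha)
    x_dot, a_dot = sum(xs), sum(al)
    out = lgamma(x_dot) - lgamma(x0) - sum(lgamma(xi + 1) for xi in x)
    out += lgamma(a_dot) - sum(lgamma(a) for a in al)
    out += sum(lgamma(xi + ai) for xi, ai in zip(xs, al)) - lgamma(x_dot + a_dot)
    return exp(out)

def marginal_by_summation(x1, x0, alpha0, alpha, terms=5000):
    """Sum the joint pmf of (x1, x2) over x2 = 0, ..., terms - 1."""
    return sum(dnm_pmf([x1, x2], x0, alpha0, alpha) for x2 in range(terms))

# Illustrative (hypothetical) parameter values.
x0, alpha0, alpha = 2.0, 4.0, [1.5, 0.8]
summed = marginal_by_summation(3, x0, alpha0, alpha)
direct = dnm_pmf([3], x0, alpha0, [alpha[0]])   # alpha_2 dropped
```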

Conditional distributions

If the $m$ -dimensional vector $\mathbf {x}$ is partitioned as

\mathbf {x} ={\begin{bmatrix}\mathbf {x} ^{(1)}\\\mathbf {x} ^{(2)}\end{bmatrix}}{\text{ with sizes }}{\begin{bmatrix}q\times 1\\(m-q)\times 1\end{bmatrix}}

and ${\boldsymbol {\alpha }}$ is partitioned accordingly as

{\boldsymbol {\alpha }}={\begin{bmatrix}{\boldsymbol {\alpha }}^{(1)}\\{\boldsymbol {\alpha }}^{(2)}\end{bmatrix}}{\text{ with sizes }}{\begin{bmatrix}q\times 1\\(m-q)\times 1\end{bmatrix}}

then the conditional distribution of $\mathbf {x} ^{(1)}$ given $\mathbf {x} ^{(2)}$ is $\mathrm {DNM} (x_{0}^{\prime },\alpha _{0}^{\prime },{\boldsymbol {\alpha }}^{(1)})$ where

x_{0}^{\prime }=x_{0}+\sum _{i=1}^{m-q}x_{i}^{(2)}

and

\alpha _{0}^{\prime }=\alpha _{0}+\sum _{i=1}^{m-q}\alpha _{i}^{(2)}.
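The conditional rule can be verified exactly for small cases, because the conditional probability is a ratio of the joint probability to a marginal probability and the marginal is again Dirichlet negative multinomial. The sketch below does this for $m=2$ and $q=1$ ; the helper names and parameter values are illustrative only.

```python
from math import lgamma, exp

def dnm_pmf(x, x0, alpha0, alpha):
    """Probability of the count vector x under DNM(x0, alpha0, alpha)."""
    xs, al = [x0] + list(x), [alpha0] + list(alpha)
    x_dot, a_dot = sum(xs), sum(al)
    out = lgamma(x_dot) - lgamma(x0) - sum(lgamma(xi + 1) for xi in x)
    out += lgamma(a_dot) - sum(lgamma(a) for a in al)
    out += sum(lgamma(xi + ai) for xi, ai in zip(xs, al)) - lgamma(x_dot + a_dot)
    return exp(out)

def conditional_by_ratio(x1, x2, x0, alpha0, a1, a2):
    """P(x1 | x2) computed as joint / marginal of x2."""
    joint = dnm_pmf([x1, x2], x0, alpha0, [a1, a2])
    marginal = dnm_pmf([x2], x0, alpha0, [a2])
    return joint / marginal

def conditional_by_rule(x1, x2, x0, alpha0, a1, a2):
    """P(x1 | x2) from the rule: DNM(x0 + x2, alpha0 + a2, a1)."""
    return dnm_pmf([x1], x0 + x2, alpha0 + a2, [a1])
```

For any non-negative integers `x1` and `x2` the two functions should agree up to floating-point error.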

Conditional on the sum

The conditional distribution of a Dirichlet negative multinomial distribution given $\sum _{i=1}^{m}x_{i}=n$ is a Dirichlet-multinomial distribution with parameters $n$ and ${\boldsymbol {\alpha }}$ . That is,

\Pr(\mathbf {x} \mid \sum _{i=1}^{m}x_{i}=n,x_{0},\alpha _{0},{\boldsymbol {\alpha }})={\frac {\left(n!\right)\Gamma \left(\sum _{i=1}^{m}\alpha _{i}\right)}{\Gamma \left(n+\sum _{i=1}^{m}\alpha _{i}\right)}}\prod _{i=1}^{m}{\frac {\Gamma (x_{i}+\alpha _{i})}{\left(x_{i}!\right)\Gamma (\alpha _{i})}}.

Notice that the equation does not depend on $x_{0}$ or $\alpha _{0}$ .
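A brute-force check for $m=2$ illustrates both the Dirichlet-multinomial form and the independence from $x_{0}$ and $\alpha _{0}$ . This is a sketch with made-up function names: the conditional probability is obtained by normalizing the joint probabilities over all pairs with the same sum.

```python
from math import lgamma, exp

def dnm_pmf(x, x0, alpha0, alpha):
    """Probability of the count vector x under DNM(x0, alpha0, alpha)."""
    xs, al = [x0] + list(x), [alpha0] + list(alpha)
    x_dot, a_dot = sum(xs), sum(al)
    out = lgamma(x_dot) - lgamma(x0) - sum(lgamma(xi + 1) for xi in x)
    out += lgamma(a_dot) - sum(lgamma(a) for a in al)
    out += sum(lgamma(xi + ai) for xi, ai in zip(xs, al)) - lgamma(x_dot + a_dot)
    return exp(out)

def dirichlet_multinomial_pmf(x, alpha):
    """Dirichlet-multinomial probability with n = sum(x), as in the formula above."""
    n, a = sum(x), sum(alpha)
    out = lgamma(n + 1) + lgamma(a) - lgamma(n + a)
    out += sum(lgamma(xi + ai) - lgamma(xi + 1) - lgamma(ai)
               for xi, ai in zip(x, alpha))
    return exp(out)

def conditional_on_sum(x, x0, alpha0, alpha):
    """P(x | x_1 + x_2 = n) for m = 2, by normalising over all pairs with sum n."""
    n = sum(x)
    norm = sum(dnm_pmf([k, n - k], x0, alpha0, alpha) for k in range(n + 1))
    return dnm_pmf(x, x0, alpha0, alpha) / norm
```

Calling `conditional_on_sum` with different values of `x0` and `alpha0` but the same `x` and `alpha` should return the same number, equal to `dirichlet_multinomial_pmf(x, alpha)`.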

Correlation matrix

For $\alpha _{0}>2$ the diagonal entries of the correlation matrix are

\rho (X_{i},X_{i})=1

and the off-diagonal entries ( $i\neq j$ ) are

\rho (X_{i},X_{j})={\frac {\operatorname {cov} (X_{i},X_{j})}{\sqrt {\operatorname {var} (X_{i})\operatorname {var} (X_{j})}}}={\sqrt {\frac {\alpha _{i}\alpha _{j}}{(\alpha _{0}+\alpha _{i}-1)(\alpha _{0}+\alpha _{j}-1)}}}.
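The correlation formula depends only on $\alpha _{0}$ and ${\boldsymbol {\alpha }}$ , not on $x_{0}$ . A minimal sketch that builds the full matrix as a list of lists (the function name is chosen for this example):

```python
from math import sqrt

def dnm_correlation(alpha0, alpha):
    """Correlation matrix of DNM(x0, alpha0, alpha); requires alpha0 > 2."""
    if alpha0 <= 2:
        raise ValueError("the correlation matrix is only defined for alpha0 > 2")
    m = len(alpha)
    corr = [[1.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i != j:
                corr[i][j] = sqrt(
                    alpha[i] * alpha[j]
                    / ((alpha0 + alpha[i] - 1) * (alpha0 + alpha[j] - 1))
                )
    return corr
```

For example, with $\alpha _{0}=3$ and $\alpha _{1}=\alpha _{2}=1$ the off-diagonal entry is ${\sqrt {1/9}}=1/3$ . All off-diagonal correlations are positive, in contrast to the negative correlations of the multinomial distribution.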

Heavy tailed

The Dirichlet negative multinomial is a heavy-tailed distribution. It does not have a finite mean for $\alpha _{0}\leq 1$ and it has an infinite covariance matrix for $\alpha _{0}\leq 2$ . Its moment generating function is therefore undefined.

Aggregation

If

X=(X_{1},\ldots ,X_{m})\sim \operatorname {DNM} (x_{0},\alpha _{0},\alpha _{1},\ldots ,\alpha _{m})

then, if the random variables with positive subscripts i and j are dropped from the vector and replaced by their sum,

X'=(X_{1},\ldots ,X_{i}+X_{j},\ldots ,X_{m})\sim \operatorname {DNM} \left(x_{0},\alpha _{0},\alpha _{1},\ldots ,\alpha _{i}+\alpha _{j},\ldots ,\alpha _{m}\right).
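The aggregation property can be confirmed exactly for small counts, since the probability that two coordinates sum to a given value is a finite sum of joint probabilities. The sketch below merges the first two of $m=3$ coordinates; the function names and the parameter values used in any call are illustrative only.

```python
from math import lgamma, exp

def dnm_pmf(x, x0, alpha0, alpha):
    """Probability of the count vector x under DNM(x0, alpha0, alpha)."""
    xs, al = [x0] + list(x), [alpha0] + list(alpha)
    x_dot, a_dot = sum(xs), sum(al)
    out = lgamma(x_dot) - lgamma(x0) - sum(lgamma(xi + 1) for xi in x)
    out += lgamma(a_dot) - sum(lgamma(a) for a in al)
    out += sum(lgamma(xi + ai) for xi, ai in zip(xs, al)) - lgamma(x_dot + a_dot)
    return exp(out)

def merged_by_summation(s, x3, x0, alpha0, alpha):
    """P(X1 + X2 = s, X3 = x3), summing the joint pmf over all splits of s."""
    return sum(dnm_pmf([k, s - k, x3], x0, alpha0, alpha) for k in range(s + 1))

def merged_by_rule(s, x3, x0, alpha0, alpha):
    """Same probability from the rule: alpha_1 and alpha_2 are added together."""
    return dnm_pmf([s, x3], x0, alpha0, [alpha[0] + alpha[1], alpha[2]])
```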

Applications

Dirichlet negative multinomial as an urn model

The Dirichlet negative multinomial can also be motivated by an urn model in the case when $x_{0}$ is a positive integer. Consider a sequence of independent and identically distributed multinomial trials, each of which has $1+m$ outcomes. Call one of the outcomes a "success", and suppose it has probability $p_{0}$ . The other $m$ outcomes, called "failures", have probabilities $p_{1},p_{2},\ldots ,p_{m}$ . If the vector $(x_{1},x_{2},\ldots ,x_{m})$ counts the $m$ types of failures observed before the $x_{0}$ -th success, then $(x_{1},x_{2},\ldots ,x_{m})$ has a negative multinomial distribution with parameters $(x_{0},p_{1},\ldots ,p_{m})$ .

If the parameters $(p_{0},p_{1},\ldots ,p_{m})$ are themselves sampled from a Dirichlet distribution with parameters $(\alpha _{0},\alpha _{1},\ldots ,\alpha _{m})$ , then the resulting distribution of $(x_{1},x_{2},\ldots ,x_{m})$ is Dirichlet negative multinomial. The resultant distribution has $2+m$ parameters.

If we imagine an urn containing $\alpha _{0}$ white balls (successes) and balls of $m$ other colors numbering $(\alpha _{1},\alpha _{2},\ldots ,\alpha _{m})$ respectively (all positive integers in this case), and we draw successively from the urn until $x_{0}$ white balls have been observed, each time returning two balls of the observed color to the urn (a Pólya urn scheme), then the number of draws of each of the $m$ failure colors follows a Dirichlet negative multinomial distribution. In the case where all parameters are positive integers, the distribution can be written purely in terms of factorials rather than gamma functions.
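The urn scheme can be simulated directly. The sketch below assumes that the urn starts with $\alpha _{0}$ white balls, matching the Dirichlet parameter of the success category, and $\alpha _{i}$ balls of each failure color, and that each drawn ball is returned together with one extra ball of the same color. The function name is made up for this example.

```python
import random

def polya_urn_failures(x0, alpha0, alpha, rng):
    """Count draws of each failure colour before the x0-th white ball."""
    balls = [alpha0] + list(alpha)   # current ball counts; index 0 is white
    failures = [0] * len(alpha)
    whites = 0
    while whites < x0:
        # Draw a colour with probability proportional to its current count.
        k = rng.choices(range(len(balls)), weights=balls)[0]
        balls[k] += 1                # the drawn ball goes back with one extra
        if k == 0:
            whites += 1
        else:
            failures[k - 1] += 1
    return failures

rng = random.Random(1)
draws = [polya_urn_failures(1, 2, [1], rng)[0] for _ in range(20000)]
share_zero = draws.count(0) / len(draws)   # estimate of P(X_1 = 0)
share_one = draws.count(1) / len(draws)    # estimate of P(X_1 = 1)
```

With $x_{0}=1$ , $\alpha _{0}=2$ and $\alpha _{1}=1$ the exact probabilities are $2/3$ for no failures (the first draw is white) and $(1/3)(2/4)=1/6$ for exactly one failure, so the simulated shares should be close to these values.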

References

[1] Farewell, Daniel & Farewell, Vernon. (2012). Dirichlet negative multinomial regression for overdispersed correlated count data. Biostatistics (Oxford, England). 14. 10.1093/biostatistics/kxs050.

@@ Line 132: / Line 132: @@
 If we imagine an urn containing <math>x_0</math> white balls (success) and <math>m</math> colors of balls numbering <math>(\alpha_1, \alpha_2, \ldots, \alpha_m) </math> respectively (also non-negative integers in this case) and we successively draw from the urn until we observe <math>x_0</math> white balls - and each time we draw from the urn and observe the color of the drawn ball we return two balls of the observed color back to the urn (in a [[Pólya urn model|Pólya urn scheme]]), then the resulting distribution of the number of observed m failure colors is Dirichlet negative multinomial. In the case where all parameters are positive integers, the distribution can clearly be written in purely in terms of [[factorials]] as opposed to [[gamma function]]s.
-===Dirichlet negative multinomial as a model for consumer behavior===
-One application of the DNM distribution is for observed consumer transactions of m distinct brands with no special groupings. The model considered by Goodhardt, Ehrenberg and Chatfield<ref>Goodhardt, G.J., Ehrenberg, A.S.C. and Chatfield, C.(1984). The Dirichlet: a comprehensive model of buying
-behaviour. Journal of the Royal Statistical Society. Series A (General) 147, 621–655.</ref> assumes
-# Purchasing of a product class takes the form of a [[Poisson process]] for each consumer
-# There is heterogeneity in purchase rates among the consumers following a [[gamma distribution]]
-# The choice of m brands for each consumer follows a [[multinomial distribution]]
-# There is heterogeneity of the brand choice among the consumers follows a [[Dirichlet distribution]]
-The model has been found to account for a number of empirical generalisations, including [[Double jeopardy (marketing)|double jeopardy]], the duplication of purchase law, and natural monopoly. It has been shown to hold over different product categories, countries, time, and for both subscription and repertoire repeat-purchase markets. It has been described as one of the most famous empirical generalisations in marketing,
 ==See also==