WikiProject Statistics (Rated Start-class, Mid-importance)
WikiProject Mathematics (Rated Start-class, Mid-importance)
I have rewritten some of this article to address the following problem: the previous version did not clearly distinguish between the joint probability of a sequence of draws and the probability of the counts over such a sequence. See for example Tom Minka's paper at http://research.microsoft.com/en-us/um/people/minka/papers/multinomial.html, where this problem is highlighted. The introduction of the multinomial distribution article also mentions this problem.
See also http://research.microsoft.com/en-us/um/people/minka/papers/dirichlet/ by Minka, where he defines the Dirichlet compound multinomial (DCM), without the multinomial coefficient. In other words, Minka's DCM models particular sequences, while the version in the present Wikipedia article models counts.
The above comment is confused. None of the Minka papers mentioned contains a definition of the Dirichlet-multinomial (DM). One of the papers gives a likelihood function of a sample without the normalizing constant (aka "nuisance parameter" of the likelihood), because the normalizing constant was not pertinent to the estimation of the Dirichlet. But the likelihood function is not a distribution unless it is summed over all permutations of the sample, which is what the normalizing constant is for. That is, the above author wrote an expression that is not even a density, since a density must sum to 1. The DM is not two different distributions between which a distinction needs to be drawn, as the above author contends. It is a well-known single distribution that comes with a normalizing constant. The DM is a multivariate generalization of the beta-binomial (which comes with a normalizing constant) and, as Minka points out, the DM approaches the multinomial distribution, which also has a normalizing constant, as both Wikipedia articles correctly state. — Preceding unsigned comment added by 22.214.171.124 (talk) 20:30, 7 April 2016 (UTC)
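For readers following this thread, the two expressions under discussion can be compared numerically. The following is only a minimal sketch, assuming the standard closed form Γ(A)/Γ(A+N) ∏ Γ(α_k+n_k)/Γ(α_k) for one particular ordered sequence; the function names and the example values of `alpha` and `n` are hypothetical and come from neither the article nor Minka's papers. It shows that the count form equals the sequence form times the multinomial coefficient, and that each form sums to 1 only over its own sample space (count vectors versus ordered sequences).

```python
from itertools import product
from math import exp, factorial, lgamma


def log_seq_prob(counts, alpha):
    """Log-probability of ONE particular ordered sequence with these category counts."""
    n = sum(counts)
    a0 = sum(alpha)
    out = lgamma(a0) - lgamma(a0 + n)
    for n_k, a_k in zip(counts, alpha):
        out += lgamma(a_k + n_k) - lgamma(a_k)
    return out


def multinomial_coefficient(counts):
    """Number of distinct ordered sequences that share the given counts."""
    coef = factorial(sum(counts))
    for n_k in counts:
        coef //= factorial(n_k)
    return coef


def seq_prob(counts, alpha):
    """Probability of one ordered sequence (no multinomial coefficient)."""
    return exp(log_seq_prob(counts, alpha))


def count_prob(counts, alpha):
    """Probability of the count vector (with the multinomial coefficient)."""
    return multinomial_coefficient(counts) * seq_prob(counts, alpha)


def count_vectors(n, k):
    """Yield every length-k tuple of non-negative integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in count_vectors(n - first, k - 1):
            yield (first,) + rest


# Hypothetical example values, chosen only for illustration.
alpha = (0.5, 1.0, 2.0)
n = 4
k = len(alpha)

# Sum of the count form over all count vectors.
total_counts = sum(count_prob(c, alpha) for c in count_vectors(n, k))

# Sum of the sequence form over all k**n ordered sequences.
total_seqs = sum(
    seq_prob(tuple(seq.count(j) for j in range(k)), alpha)
    for seq in product(range(k), repeat=n)
)

print(total_counts, total_seqs)
```

Under these assumptions both totals come out at 1 up to floating-point error, whereas summing the sequence form over count vectors alone would not.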
The term "categorical distribution" is not used everywhere. Bishop does not use that name, but his book (Pattern Recognition and Machine Learning) does give different formulas for the multinomial and categorical distributions. (Minka simply calls both multinomial.) Since Wikipedia already has an article called categorical distribution, I thought it would help to use this terminology here and elsewhere in Wikipedia. CalvynkW (talk) 12:03, 14 April 2011 (UTC)
Hi, does anyone have the original Pólya reference on this topic? From some searching, it appears to have been done in the twenties. — Preceding unsigned comment added by 126.96.36.199 (talk) 17:20, 15 December 2011 (UTC)
Despite the comment above, the distinction wasn't clearly made between the two forms of the DCM. I have almost totally rewritten and greatly expanded the page, and it should hopefully now make the distinction clear enough. The page now mostly focuses on the simpler form (without the multinomial constant). Benwing (talk) 21:35, 17 March 2012 (UTC)
I think the current exposition without the multinomial constant should specify exactly what the index runs over. If it is a categorical variable in Bishop's sense, namely a vector with exactly one entry equal to 1, then this does NOT model counts. Feel free to correct my derivation: https://math.stackexchange.com/questions/709959/how-to-derive-the-dirichlet-multinomial/ Anne van Rossum (talk) 21:25, 21 March 2014 (UTC)
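To illustrate the indexing question, here is a small sketch assuming the one-hot (1-of-K) encoding for a single categorical draw; the helper names and the example draws are hypothetical. Each draw is a vector with a single 1, and a count vector only arises once those vectors are summed over all draws in the sequence.

```python
def one_hot(category, k):
    """1-of-K encoding of a single categorical draw."""
    return [1 if j == category else 0 for j in range(k)]


def counts_from_draws(draws, k):
    """Sum the one-hot vectors of all draws to obtain per-category counts."""
    counts = [0] * k
    for z in draws:
        for j, bit in enumerate(one_hot(z, k)):
            counts[j] += bit
    return counts


# Hypothetical sequence of N = 5 draws over K = 3 categories.
draws = [0, 2, 2, 1, 2]
counts = counts_from_draws(draws, 3)
print(counts)  # [1, 1, 3]
```

In this sketch a product over draws indexes the sequence `draws`, while a product over categories indexes `counts`, which is the distinction the comment asks the article to state.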