# Talk:Categorical distribution

WikiProject Statistics (Rated Start-class, Low-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as Start-Class on the quality scale.
This article has been rated as Low-importance on the importance scale.
WikiProject Mathematics (Rated Start-class, Low-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating: Start-class, Low-importance
Field: Probability and statistics

Is the name categorical distribution a standard and official name? I only heard it in the context of BUGS. Albmont 11:11, 16 March 2007 (UTC)

## Error in equation display

A lot of the equations display only as text codes preceded by "Failed to parse" errors -- saw this on 2014-02-14 — Preceding unsigned comment added by 160.36.130.252 (talk) 21:50, 14 February 2014 (UTC)

## multinomial distribution

"It should not be confused with the multinomial distribution."

But I've seen in several papers the usage of the term "multinomial distribution" for this distribution. For example, the Latent Dirichlet allocation paper. Are they wrong? Took (talk) 13:13, 10 July 2008 (UTC)

The categorical distribution is equivalent to a multinomial distribution with the number of trials equal to one. In the computer science literature, this distribution is almost always referred to as the multinomial distribution. The Bernoulli distribution "exists" because there is often a need to talk about it in its own right, even though it, too, is equivalent to a binomial distribution with the number of trials equal to one. It would seem necessary, therefore, to have a name for this distribution as well. --Rhaertel80 (talk) 19:50, 23 March 2009 (UTC)
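As a rough illustration of the equivalence described above, the following sketch checks that a multinomial probability mass function with a single trial reduces to the categorical one. The function names and the probability vector `probs` are made up for this example and do not come from the article.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of the count vector `counts` under a multinomial
    with sum(counts) trials and category probabilities `probs`."""
    n_trials = sum(counts)
    coeff = factorial(n_trials)
    for c in counts:
        coeff //= factorial(c)  # each partial quotient is an integer
    return coeff * prod(p ** c for p, c in zip(probs, counts))


def categorical_pmf(k, probs):
    """Probability of category k (0-based) under a categorical distribution."""
    return probs[k]


# Hypothetical probabilities for three categories.
probs = [0.2, 0.5, 0.3]

for k in range(len(probs)):
    # A single trial that lands in category k, written as a count vector.
    one_hot = [1 if i == k else 0 for i in range(len(probs))]
    assert abs(multinomial_pmf(one_hot, probs) - categorical_pmf(k, probs)) < 1e-12
```

With one trial the multinomial coefficient is 1, and the product collapses to the single probability of the chosen category.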

## Analogy to Multinomial Does not Hold

This first part seems clear:

"A categorical distribution is the most general distribution whose sample space is the set {1, 2, ..., n}. It is the generalization of the Bernoulli distribution for a categorical random variable."

In other words, while the Bernoulli is a flip of a 2-sided coin labeled {0, 1}, the Categorical is the roll of an n-sided die labeled {0, 1, ..., n-1}.

This second part must be wrong:

"Consider the extended analogy: Bernoulli random variable : Bernoulli distribution : Binomial distribution :: Categorical random variable : Categorical distribution : Multinomial distribution"

Ok, so since I know that Binomial is a sum of N Bernoulli random variables, this analogy seems to say that Multinomial is a sum of N Categorical random variables - but this is just not right. Take N=2 and n=3. A Multinomial would count X_1 = 0, X_2 = 2 together with X_1 = 2, X_2 = 0 as giving the same sum X_1 + X_2 = 2; but a sum of two Categorical Distributions would also include X_1 = 1, X_2 = 1 (which a Multinomial would not include). —Preceding unsigned comment added by DiagonalArg (talkcontribs) 09:28, 6 September 2008 (UTC)

Good point, but I think with a small change you can see that the analogy does hold. Instead of thinking of a categorical variable as a scalar with domain [0, n-1], we could equivalently think of it as a vector of size n with exactly one element being 1 and all others being 0. Then the multinomial distribution is a distribution over the pointwise sum of such vectors. Another part of the analogy is that the parameters of a Bernoulli and a binomial distribution are the same, as are the parameters of a categorical and a multinomial. And a Bernoulli distribution is the special case of the binomial with a single trial, just as the categorical is the special case of the multinomial with a single trial. Strictly speaking, this would make a categorical variable a vector as previously described, but it is more convenient to treat it as a scalar. --Rhaertel80 (talk) 19:50, 23 March 2009 (UTC)
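To make the two readings in this thread concrete, here is a small sketch for the case raised above (two draws, three categories). It enumerates every pair of draws and compares the scalar sum with the pointwise sum of one-hot vectors. The helper names are invented for this example.

```python
from itertools import product


def one_hot(k, n):
    """Encode category k (0-based) as a length-n vector with a single 1."""
    return tuple(1 if i == k else 0 for i in range(n))


def vector_sum(vectors):
    """Pointwise (element-by-element) sum of equal-length vectors."""
    return tuple(sum(column) for column in zip(*vectors))


n_categories, n_trials = 3, 2

scalar_sums = {}    # scalar sum   -> list of draw sequences producing it
count_vectors = {}  # count vector -> list of draw sequences producing it

for draws in product(range(n_categories), repeat=n_trials):
    scalar_sums.setdefault(sum(draws), []).append(draws)
    counts = vector_sum([one_hot(k, n_categories) for k in draws])
    count_vectors.setdefault(counts, []).append(draws)

# Scalar sums lump (0, 2), (1, 1) and (2, 0) together ...
print(scalar_sums[2])
# ... whereas the one-hot sums keep (1, 1) apart from (0, 2) and (2, 0).
print(count_vectors[(1, 0, 1)], count_vectors[(0, 2, 0)])
```

Under the one-hot reading, the distinct sums are exactly the count vectors that a multinomial with two trials ranges over (six of them here), which is the sense in which the analogy holds.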

## Original research?

This "categorical distribution" seems to be nothing but a standard discrete probability distribution. Does the name exist in the literature? The analogy to multinomial does not hold. A multinomially distributed random variable is a sum of n independent random variables that each attains the unit vectors e_1,...,e_k with probabilities p_1,...,p_k. Joakimekstrom March 21, 12pm EST —Preceding undated comment added 15:50, 22 March 2009 (UTC).

In the statistics literature, I have seen this distribution referred to as a discrete distribution. However, there is reason to dislike this. Technically speaking, Poisson and other countably infinite distributions are also discrete, as are the Bernoulli, binomial, multinomial, etc. (see, for instance, the list of distributions at the bottom of most pages on distributions under "Discrete univariate with finite support" and also "...with infinite support"). At times, there is a need to specifically refer to this distribution by a unique name, much like there is a need to refer to the Bernoulli distribution. It could be the difference between a discrete distribution and the discrete distribution, but this is still potentially confusing, especially for beginners. --Rhaertel80 (talk) 19:50, 23 March 2009 (UTC)

Good edits. The formulation by Bishop is also used by Mardia (1970). Joakimekstrom March 25, 5 am EST —Preceding undated comment added 09:10, 25 March 2009 (UTC).
Your edits made the article much cleaner. In my experience, the more common use of the categorical distribution is the (equivalent) non-1-of-K-encoded form. I think it might still be worth a mention. We have found the generation code to be helpful as well. --128.187.80.2 (talk) 16:36, 25 March 2009 (UTC)

## Bayesian Statistics

The article says that "if the frequency of each outcome is Ei and one begins with a uniform prior, then the posterior distribution is the function Dir(E1,...,En)." However, isn't the uniform prior Dir(1,...,1), similar to a uniform distribution being beta(1, 1)? That would make the posterior distribution Dir(E1+1,...,En+1). Ummonk (talk) 06:14, 25 April 2009 (UTC)

Yes, you're right. The observed frequencies need to be added to the parameters of the prior to form the parameters of the posterior. The uniform prior Dir(1,...,1) is a sensible noninformative prior to use in many cases, although there are others, such as the Perks (1947) prior, which is also uniform and arguably makes more sense: Dir(1/n,...,1/n), where n is the number of categories. Take your pick. --88.109.217.134 (talk) 11:47, 16 September 2009 (UTC)
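The update rule described above (add the observed frequencies to the prior parameters) can be sketched as follows. The counts are hypothetical, and the function names are chosen for this example only.

```python
def dirichlet_posterior(prior_alpha, counts):
    """Posterior Dirichlet parameters: prior parameters plus observed counts."""
    if len(prior_alpha) != len(counts):
        raise ValueError("prior and counts must have the same length")
    return [a + c for a, c in zip(prior_alpha, counts)]


def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution: each parameter over the total."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Hypothetical observed frequencies E_1, ..., E_n for n = 3 categories.
counts = [3, 0, 7]
n = len(counts)

# Uniform prior Dir(1, ..., 1) gives the posterior Dir(E_1 + 1, ..., E_n + 1).
uniform_post = dirichlet_posterior([1.0] * n, counts)

# Prior Dir(1/n, ..., 1/n), as mentioned above, gives Dir(E_i + 1/n).
perks_post = dirichlet_posterior([1.0 / n] * n, counts)

print(uniform_post, dirichlet_mean(uniform_post))
print(perks_post, dirichlet_mean(perks_post))
```

Note that the category with zero observations still receives positive posterior mass under either prior, only less of it under Dir(1/n,...,1/n).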

## Another multinormal?

As far as I know, in probability theory multinormal distributions are continuous. Is this word used for discrete distributions in another discipline? If so, the article should not say "In probability theory and..." Boris Tsirelson (talk) 07:13, 1 October 2010 (UTC)

Where? I don't see "multinormal" here, except in the paragraph immediately above. Melcombe (talk) 08:37, 1 October 2010 (UTC)
Oops... You are right. Sorry. It is "multinomial", not "multinormal". But still, the article is rather unclear. Is "categorical distribution" exactly the same as a distribution on a finite set? Or is the finite set supposed to be a product set, that is, the set of finite sequences ("vectors") of something? Boris Tsirelson (talk) 08:58, 1 October 2010 (UTC)
Yes, it's just a general distribution on a finite set. The intro was horribly, horribly written, as is the case with so many of the statistics articles in Wikipedia; it should be better now. (As an example of bad writing, currently the first paragraph of normal distribution says "In probability theory and statistics, the normal distribution, or Gaussian distribution, is an absolutely continuous probability distribution whose cumulants of all orders above two are zero." How in the hell is this a remotely useful intro?) Benwing (talk) 05:00, 2 October 2010 (UTC)