# Talk:Generalized Dirichlet distribution

WikiProject Statistics (Rated Start-class)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as Start-Class on the quality scale.

The article would become a lot more interesting and informative if the difference with the usual Dirichlet distribution could be clarified. Tomixdf 10:20, 25 August 2007 (UTC)

Hello Tomixdf. The condition for the GDD to reduce to the DD is stated just after the first displayed equation. I will emphasise it. Do you have any idea what Wong's "two wooden box" experiment means? I sure don't. Best wishes, Robinh 21:43, 25 August 2007 (UTC)
No, I have no idea what the "two wooden box experiment" is about :-). I'll look into the GDD though, it looks quite interesting. What I actually meant was: in which situations would a GDD perform better than an ordinary DD? A simple, real world example would be nice, IMO. Some plots of samples drawn from GDDs with different parameters would be great, for example. Tomixdf 23:24, 25 August 2007 (UTC)
Hello again Tomixdf. Unfortunately, the best real world example of GDD being better than the DD is Wong's two wooden box experiment. Which I do not understand (not for lack of trying). I have Wong 1998 right in front of me, open at the right page, which I have been staring at for hours and hours. And I still don't understand it properly. Connor and Mosimann discuss two nice little examples (one of percentages of bone constituents and one of areas of turtle scutes) but these are better discussed on the Neutral vector page, IMO. Let me know if you think this is inappropriate! I'll stick them in when I get a minute. Best wishes, Robinh 21:37, 26 August 2007 (UTC)
I'm also a bit puzzled by Wong's boxes :-D. But there is some pseudocode to generate samples from GDD, so it should be easy to make some plots showing the different shapes of DD and GDD for different parameter sets. I'll try to look into it one of these days. Tomixdf 18:18, 29 August 2007 (UTC)
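For anyone wanting to make such plots, here is a minimal sketch of one way to draw GDD samples. It assumes the Connor and Mosimann construction, in which each component is a fraction $z_j \sim \mathrm{Beta}(\alpha_j, \beta_j)$ of whatever is left after the earlier components; it is not necessarily the pseudocode referred to above. The function names `sample_gdd` and `dirichlet_as_gdd` are made up for illustration.

```python
import numpy as np

def sample_gdd(alpha, beta, size, rng=None):
    """Draw `size` samples from GD(alpha; beta); returns an array of shape (size, k+1)."""
    rng = np.random.default_rng() if rng is None else rng
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # Independent fractions z_j ~ Beta(alpha_j, beta_j), one column per j.
    z = rng.beta(alpha, beta, size=(size, alpha.size))
    # remaining[:, j] = prod_{i <= j} (1 - z_i): what is left after component j.
    remaining = np.cumprod(1.0 - z, axis=1)
    x = z.copy()
    x[:, 1:] *= remaining[:, :-1]   # x_j = z_j * prod_{i < j} (1 - z_i)
    # The final column is x_{k+1} = 1 - x_1 - ... - x_k.
    return np.hstack([x, remaining[:, -1:]])

def dirichlet_as_gdd(alpha_full):
    """GD parameters that reproduce an ordinary Dirichlet(alpha_1, ..., alpha_{k+1})."""
    alpha_full = np.asarray(alpha_full, dtype=float)
    tail = np.cumsum(alpha_full[::-1])[::-1]   # tail[j] = sum of alpha_i for i >= j
    return alpha_full[:-1], tail[1:]           # beta_j = sum of alpha_i for i > j
```

Calling `sample_gdd(*dirichlet_as_gdd([2, 3, 4, 5]), size=1000)` should give ordinary Dirichlet samples. Changing any single `beta` value away from that pattern gives a GDD that is not a DD, which is the comparison the plots would need.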

The article needs some editing for logic. There are problems with thought experiment 2. I have copied the original material in the paragraph below, changing only the emphasis on not coloured and can not remember colour and adding the EDITING REQUIRED note to alert the reader that the material may require some editing to be correct.

EDITING REQUIRED. Note: Experiment 2, below, seems to contain an error or typo because "The marbles are not coloured", but then it is said he "cannot remember which box contains which colour marbles." Experiment 2. Analyst 2 believes that $X$ follows a generalized Dirichlet distribution: $X\sim GD(\alpha_1,\ldots,\alpha_k;\beta_1,\ldots,\beta_k)$. All parameters are again assumed to be positive integers. The analyst makes $k+1$ wooden boxes. The boxes have two areas: one for balls and one for marbles. The marbles are not coloured. Then for $j=1,\ldots,k$, he puts $\alpha_j$ marbles of colour $j$, and $\beta_j$ marbles, in to box $j$. He then puts a ball of colour $k+1$ in box $k+1$. The analyst then draws a marble from the urn. Because the boxes are wood, the analyst cannot tell which box to put the marble in (as he could in experiment 1 above); he also has a poor memory and cannot remember which box contains which colour marbles. He has to discover which box is the correct one to put the marble in. He does this by opening box 1 and comparing the marbles in it to the drawn marble. If the colours differ, the box is the wrong one. The analyst puts a marble in box 1 and proceeds to box 2. He repeats the process until the marbles in the box match the drawn marble, at which point he puts the marble in the box with the other marbles of matching colour. The analyst then draws another marble from the urn and repeats until $n$ balls are drawn. The posterior is then generalized Dirichlet with parameters $\alpha$ being the number of balls, and $\beta$ the number of marbles, in each box.

Note that in experiment 2, changing the order of the boxes has a non-trivial effect, unlike experiment 1. —Preceding unsigned comment added by 63.251.211.5 (talk) 02:26, 30 September 2008 (UTC)
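The bookkeeping in experiment 2 can be written out as a short sketch. It assumes the reading in which a draw of colour $j$ adds one to $\beta_i$ for every wrong box $i < j$ opened first, and one to $\alpha_j$ for the matching box. The function name `gdd_posterior` and the numbers in the example are hypothetical.

```python
def gdd_posterior(alpha, beta, counts):
    """Update GD(alpha; beta) with observed counts for colours 1..k+1.

    alpha, beta: lists of length k (prior parameters, one pair per box 1..k).
    counts: list of length k+1 (how many times each colour was drawn).
    """
    k = len(alpha)
    if len(beta) != k or len(counts) != k + 1:
        raise ValueError("need len(alpha) == len(beta) == len(counts) - 1")
    # Box j gains one "ball" for every draw of colour j.
    new_alpha = [a + n for a, n in zip(alpha, counts[:k])]
    # Box j gains one "marble" for every draw of a later colour,
    # because box j was opened and found not to match.
    new_beta = [beta[j] + sum(counts[j + 1:]) for j in range(k)]
    return new_alpha, new_beta

# Hypothetical example with k = 2 boxes plus the final box:
# colour 1 drawn three times, colour 2 never, colour 3 twice.
print(gdd_posterior([1, 1], [2, 1], [3, 0, 2]))   # ([4, 1], [4, 3])
```

Because each $\beta_j$ only counts draws of later colours, relabelling the boxes changes the posterior parameters. This is one way to see why the order of the boxes matters here.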

Hello. Thanks for your comments. Problems sorted. To me, 'ball' and 'marble' are near-synonyms. I used these words because Wong used them. You are quite right about the order of the boxes: the GD distribution is neutral, but not completely neutral. Best wishes, Robinh (talk) 07:27, 30 September 2008 (UTC)

## How does the GDD arise?

One of the most useful facts to know about a distribution is how it arises in practice, i.e. what sort of real world situations are well modelled by the distribution.

The ordinary Dirichlet distribution arises in situations such as rolling a (possibly biased) die. Following a Bayesian approach, you start with a Dirichlet prior, say $\alpha = (1,1,1,1,1,1)$ if you think the die is fair (one prior count for each face), then roll the die numerous times, updating the face counts. The resulting Dirichlet reflects one's posterior belief about the die taking into account the experimental evidence. This example is helpful if it follows an introductory example of coin-tossing using the beta distribution. The Dirichlet is understood as being like a beta but with more options. The marginal distribution for one die face is intuitively seen to be beta, because the face corresponds to "heads" and all the other faces, lumped together, correspond to "tails".
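As a small worked sketch of the die example (the roll counts are made up, and the helper names `dirichlet_posterior` and `marginal_beta` are only for illustration):

```python
def dirichlet_posterior(prior, counts):
    """Add the observed face counts to the prior pseudo-counts."""
    return [a + n for a, n in zip(prior, counts)]

def marginal_beta(alpha, face):
    """Beta parameters for one face: that face versus all other faces lumped together."""
    a = alpha[face]
    return a, sum(alpha) - a

prior = [1, 1, 1, 1, 1, 1]        # one prior count per face
counts = [3, 5, 4, 2, 6, 10]      # hypothetical tallies from 30 rolls
posterior = dirichlet_posterior(prior, counts)
a, b = marginal_beta(posterior, 5)   # face "six" is index 5
print(posterior)                  # [4, 6, 5, 3, 7, 11]
print(a, b, a / (a + b))          # 11 25 0.3055...
```

The marginal for face six is Beta(11, 25), with mean 11/36. This is the "heads versus tails" reading described above.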

However, what sort of situation gives rise to the GDD? The urn thought-experiment seems contrived and unlike any situation I can think of. Why are the boxes opaque? What's the role of the marbles? Why does the order of the boxes matter?

Presumably the GDD was devised out of necessity, not as a pure maths exercise, so it would be very helpful to have some real world examples of where it applies. (I've always found urns and balls particularly unhelpful; it's not something one does every day.)

--88.109.216.145 (talk) 12:42, 5 November 2009 (UTC)

Hello. I too have wondered what sort of real-world example might arise. Such examples appear to be rare. I'm working on it. As you say, the urn thought experiment is a little contrived, but unfortunately the GDD is what it is, and the urn experiment does correspond to the way that the distribution is defined. I find Wong 1998 to be short on motivation, and the GDD is poorly cited in the more recent literature.

I'm not sure the GDD was devised out of necessity. The real issue is neutrality as per Mosimann, and neutral vectors are GDD. The interesting thing is that there is an analytical expression for the normalizing constant of the GDD. YMMV! Best wishes, Robinh (talk) 23:00, 6 November 2009 (UTC)
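For reference, the density in the parametrization I believe Wong 1998 uses (please check it against the article before relying on it) shows the normalizing constant as a product of beta functions:

```latex
f(x_1,\ldots,x_k)
  = \prod_{i=1}^{k} \frac{1}{B(\alpha_i,\beta_i)}\,
    x_i^{\alpha_i-1}\,\bigl(1 - x_1 - \cdots - x_i\bigr)^{\gamma_i},
\qquad
\gamma_i =
\begin{cases}
  \beta_i - \alpha_{i+1} - \beta_{i+1}, & i = 1,\ldots,k-1,\\
  \beta_k - 1, & i = k.
\end{cases}
```

When $\gamma_i = 0$ for all $i < k$, that is $\beta_i = \alpha_{i+1} + \beta_{i+1}$, only the last factor $(1 - x_1 - \cdots - x_k)^{\beta_k - 1}$ remains and the density is that of an ordinary Dirichlet with parameters $(\alpha_1,\ldots,\alpha_k,\beta_k)$.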

Wouldn't the GDD arise from the die example if the die were imperfect in a certain way, such that two of its sides were related? Say the dice were made with varying thickness in one dimension, so that two of the faces formed a front and back with the same square dimensions, while the other four sides were identical rectangles, and only the thickness of the object varied. The four sides wouldn't be neutral, and the front and back wouldn't be neutral. —Preceding unsigned comment added by 204.134.43.171 (talk) 15:03, 14 October 2010 (UTC)