Talk:Negative multinomial distribution

WikiProject Statistics (Rated Start-class, Low-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.


Including an explicit reference to the UCLA NMD calculator

Expert probability, statistics and education opinions are solicited for this NMD article.

Please comment below on the encyclopedic value of adding an explicit reference to the UCLA NMD calculator in this Negative multinomial distribution article.

The editor that started the article (User:Iwaterpolo) is associated with UCLA and declared his association when he created the article. The editor that removed the link and the image illustrating the results of the NMD example (User:O18) argues that there is a conflict of interest and removed these materials from the article.

There are no other publicly available online resources for computing with NMD. User:Iwaterpolo argues that this link, and this image, which demonstrates NMD data analysis, add significant value to the material and should be included. User:O18 argues that the links and figure should be excluded because User:Iwaterpolo is associated with UCLA, the institution that developed these unique applet resources, and has a Wikipedia:Conflict_of_interest. Specifically, User:O18 removed the following from the article:

Iwaterpolo (talk) 05:25, 9 November 2009 (UTC)

Iwaterpolo, what I wanted was for you to propose any edits related to SOCR on the article's talk page before making them, getting general consensus that they should be made and having someone else make the edits. Since you appear to have started that process now, I'm happy. Thanks! O18 (talk) 15:10, 9 November 2009 (UTC)
I saw the addition of the link and screenshot as a non sequitur. It was almost along the lines of creating an article about luxury cars and putting a link to a Lexus dealership in the middle of the article. Sure, you're not actually selling anything, but I'm still concerned about the continuing attempt to promote the tool in Wikipedia. -- Atama 20:36, 11 November 2009 (UTC)
Your analogy of the luxury cars article and a Lexus dealership is not exactly correct, as there are a number of luxury cars and any article about them should *not* single out any one car. In this case, there is only one tool that would allow you to compute the NMD, and it was referenced in the article. If anyone wants to test, analyze example data, or fit an NMD model, currently there is only one resource that provides that functionality, and it is available directly online. But I do agree with you that, as I'm involved with this development, it'd be better if someone else were to add such a reference figure, if others find it useful. Thanks. Iwaterpolo (talk) 20:54, 11 November 2009 (UTC)
Iwaterpolo, the analogy (like all analogies) is not perfect, but the point remains that the link struck the two of us as not necessary and even confusing. But the larger point is that you should have proposed adding these links on the discussion page instead of adding them. If you have more questions about COI, I think Atama has offered to answer them on your talk page. O18 (talk) 01:37, 12 November 2009 (UTC)

request clarification of example

The example text now reads, "The Negative Multinomial distribution may be used to model the sites cancer rates and help measure some of the cancer type dependencies within each location." Why can it be used? Why not just use the multinomial? What is the default case? O18 (talk) 20:45, 10 November 2009 (UTC)

The 3 questions you pose are:
Why can NMD be used (as a model for these 2-factor contingency tables)?
A variety of models are appropriate for such contingency tables. Examples include Chi-square tests, the Multinomial distribution, Fisher's exact test, NMD inference, etc. NMD is certainly not the only possible modeling distribution. It all depends on the context of use.
Why not just use Multinomial distribution in this case?
Random variables with NMD and MD distributions have different interpretations: If ${\displaystyle X=\{X_{0},X_{1}\}\sim NMD}$, then the distribution of X represents the likelihood of the total number of experiments (n) needed to observe ${\displaystyle x_{0}=k_{0}}$ and ${\displaystyle x_{1}=n-k_{0}}$ outcomes (n varies, n>0, ${\displaystyle k_{0}}$ is fixed). Whereas if ${\displaystyle Y=\{Y_{0},Y_{1}\}\sim MD}$, then the distribution of Y represents the likelihood of observing ${\displaystyle y_{0}=k_{0}}$ and ${\displaystyle y_{1}=n-k_{0}}$ outcomes, when n is fixed but ${\displaystyle k_{0}}$ varies!
What is the default case?
I may have missed the point here - not sure what default case refers to.
Iwaterpolo (talk) 21:30, 11 November 2009 (UTC)
Okay, maybe it would help if you gave a three-case example of the NMD instead of the negative binomial example. I thought from the text on the page now that the default case was the case that had to get to k0; maybe I am mistaken. Also, my question was: why choose the NMD, and what is its advantage in this instance? It looks more like a case for MD to me, so that is why I was asking. The reason it looks like a case for the MD is that the total number is fixed, not the number in any one column. Am I missing something? O18 (talk) 01:18, 12 November 2009 (UTC)
Okay, I read the article referenced and clarified the text on the article to agree with the reference (yes, you sample until X0 = k0). The following question remains: in the example, why is this a good model, or a better model than the MD? O18 (talk) 01:32, 12 November 2009 (UTC)
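To make the sampling difference discussed in this section concrete, here is a minimal simulation sketch in Python. It assumes the "sample until X0 = k0" reading of the NMD mentioned above; the function names, the probabilities (0.5, 0.3, 0.2), k0 = 5 and n = 20 are all hypothetical choices for illustration and are not taken from the article or its example.

```python
import random


def sample_negative_multinomial(k0, p, rng):
    """Draw one vector (X1, ..., Xm) by repeating trials until outcome 0
    has occurred exactly k0 times.

    p = (p1, ..., pm) are the probabilities of outcomes 1..m (assumed input);
    p0 = 1 - sum(p) is the probability of the stopping outcome 0.
    """
    p0 = 1.0 - sum(p)
    weights = [p0] + list(p)
    counts = [0] * len(weights)
    while counts[0] < k0:  # the total number of trials is random
        outcome = rng.choices(range(len(weights)), weights=weights)[0]
        counts[outcome] += 1
    return counts[1:]  # counts[0] equals k0 by construction


def sample_multinomial(n, probs, rng):
    """Draw one multinomial vector: exactly n trials, all counts random."""
    counts = [0] * len(probs)
    for outcome in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[outcome] += 1
    return counts


rng = random.Random(0)  # fixed seed so the sketch is reproducible
nmd_draws = [sample_negative_multinomial(5, (0.3, 0.2), rng) for _ in range(5)]
md_draws = [sample_multinomial(20, (0.5, 0.3, 0.2), rng) for _ in range(5)]
print("NMD draws (row totals vary):", nmd_draws)
print("MD draws (each row sums to 20):", md_draws)
```

In the NMD draws the stopping count is fixed and the grand total varies from draw to draw; in the MD draws the grand total is fixed. This is why a pre-set grand total in a data set points towards multinomial sampling.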

How many outcomes, m or m+1?

The introduction states:

Suppose we have an experiment that generates ${\displaystyle m\geq 1}$ possible outcomes, ${\displaystyle \{X_{0},\cdots ,X_{m}\}}$.

Looks like there are ${\displaystyle m+1}$ outcomes. Tayste (talk - contrib) 18:58, 7 December 2009 (UTC)

It is even worse than that. Sometimes the p vector includes p0 and sometimes it does not. Sometimes X0 is included and sometimes it is not. 018 (talk) 15:05, 8 December 2009 (UTC)
Okay, the ref makes it clear that all of the p vector and the X vector should be included in the parameters. 018 (talk) 15:13, 8 December 2009 (UTC)
Looks like the single Bernoulli-type trial has m+1 possible outcomes, whereas the Negative multinomial r.v. itself is m-dimensional, since its “0th” component is predetermined to be equal to k0 and thus of little interest to us. I’d also tend to think that p0 should be excluded from the parameters, and instead we just have NMD(k0, p), with p0 = 1 − (p1+…+pm). In that way it looks more similar to the negative binomial distribution.  … stpasha »  16:54, 8 December 2009 (UTC)
Is there another text that disagrees with the reference provided? If not, I think we should go with the canonical definition 018 (talk) 18:21, 8 December 2009 (UTC)
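For reference in this thread, the m-dimensional parameterisation NMD(k0, p) suggested above can be written out as follows. This is a sketch of the standard probability mass function under the assumption that p = (p1, …, pm) and p0 = 1 − (p1+…+pm); it is not copied from the article or from the cited reference.

```latex
\Pr(X_{1}=k_{1},\ldots ,X_{m}=k_{m})
  = \frac{\Gamma\!\left(k_{0}+\sum_{i=1}^{m}k_{i}\right)}{\Gamma(k_{0})\,\prod_{i=1}^{m}k_{i}!}\;
    p_{0}^{\,k_{0}}\prod_{i=1}^{m}p_{i}^{\,k_{i}},
\qquad p_{0}=1-\sum_{i=1}^{m}p_{i}
```

With m = 1 this reduces to the negative binomial probability mass function, which is consistent with the remark that this form looks more like the negative binomial distribution.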

Explain why the example is NMD?

What was the stopping criterion ${\displaystyle k_{0}}$ for data collection in the skin cancer example? Did they stop when they had reached exactly 68 Head and Neck observations? The round figure grand total of 400 is suspicious for NMD. I think the explanation needs to distinguish this clearly from an ordinary Multinomial situation. Tayste (talk - contrib) 19:30, 7 December 2009 (UTC)

This is the obvious question, and the editor who added this text has not edited since I asked it about a month ago. I imagine it is supposed to be more flexible than the multinomial distribution, not that the sampling dictated this. Perhaps we should try to find an example from the literature that explains this choice in more detail. 018 (talk) 15:01, 8 December 2009 (UTC)
See section below. Iwaterpolo (talk) 18:39, 29 November 2010 (UTC)
This doesn't really answer the question. Can you please try to answer the question? 018 (talk) 15:57, 30 November 2010 (UTC)
After spending a while pondering the example, it seems that NMD was selected because it has the desired property of positive correlations, and not because of applicability. 88.113.108.40 (talk) 06:08, 15 January 2015 (UTC)
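The positive-correlation property mentioned above can be made explicit. As a sketch, assuming the NMD(k0, p) parameterisation with p0 = 1 − (p1+…+pm), and writing n and q1, …, qm for the number of trials and cell probabilities of a multinomial vector Y (symbols introduced here only for the comparison):

```latex
\operatorname{Var}(X_{i}) = \frac{k_{0}\,p_{i}\,(p_{0}+p_{i})}{p_{0}^{2}},
\qquad
\operatorname{Cov}(X_{i},X_{j}) = \frac{k_{0}\,p_{i}\,p_{j}}{p_{0}^{2}} > 0
\quad (i\neq j),
\qquad\text{whereas}\qquad
\operatorname{Cov}(Y_{i},Y_{j}) = -\,n\,q_{i}\,q_{j} < 0 .
```

So counts under the NMD move together, because a long run before the k0-th stopping outcome inflates every category, while multinomial counts compete for a fixed total.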

Merge?

If “Polya distribution” is the same as the negative binomial, wouldn't the Multivariate Polya distribution be the same as the negative multinomial? If so then the two articles should be merged.  // stpasha »  20:01, 12 April 2010 (UTC)

Having glanced at the article, this is at the least a specific case. 018 (talk) 20:08, 12 April 2010 (UTC)
This is a good question, however, the Multivariate Polya distribution and the Negative multinomial distribution are quite distinct. In general, you can refer to each of the distribution descriptions to see why, but these differences are clear by looking at the univariate cases (1-dimension). The Beta-binomial distribution is the univariate case of Multivariate Polya distribution, whereas Negative binomial distribution is the univariate case of Negative multinomial distribution. These distribution articles should not be merged. Iwaterpolo (talk) 20:14, 29 November 2010 (UTC)
But my comment remains valid (and applies to your degenerate cases as well). The question is, does a degenerate case of another distribution deserve its own page? I'd say yes, but we should note it. 018 (talk) 16:06, 30 November 2010 (UTC)

Content editing by non-experts

Please consider technical content edits of any Wikipedia article only if you have the necessary expertise. This should be obvious for most editors; however, occasionally it may be overlooked in our collective efforts to police the content (undoubtedly a critical component of the process of ensuring validity, notability and relevance of Wikipedia content). Several recent edits by User:O18 and User:Stpasha (e.g., 325357097 and 330324768) introduced a number of technical errors in the description and interpretation of the Negative Multinomial Distribution article. Please refrain from modifying (or trying to simplify) technical content that you are not absolutely certain about or for which you may not have the necessary expertise, and either solicit expert review/opinions or use the discussion pages, as appropriate. Some of these prior edits now require a complete review of the history and appropriate revisions by experts. One notable example of a recent erroneous revision replaced the critical "${\displaystyle \{X_{0},X_{1},\cdots ,X_{m}\}}$ occur exactly ${\displaystyle \{k_{0},k_{1},\cdots ,k_{m}\}}$ times" statement with a simpler (but incorrect) "until n observations are made, then ${\displaystyle \{X_{0},\cdots ,X_{m}\}}$" (there are no n-observations for NMD; this is a common confusion with the Negative Binomial Distribution). This confusion is also clear from some of the rudimentary questions asked earlier on this talk page. Many of your revisions have significantly improved the article; however, if you are uncertain about some details (e.g., "I am not totally able to understand this, I hope this makes it clearer and is right.") it may be better to avoid editing and rely on subject experts. Thanks. Iwaterpolo (talk) 18:39, 29 November 2010 (UTC)

It would help if those who consider themselves expert could: (i) bother to give adequate citations for supposed results; (ii) use standard sources in preference to, or as well as, obscure journal articles (where available); (iii) follow Wikipedia conventions for things like capitalization and article formatting; (iv) avoid unexplained notation, particularly where this conflicts with notation used in the related Wikipedia articles (the supposed characteristic function is a case in point). Melcombe (talk) 11:11, 30 November 2010 (UTC)
Iwaterpolo, in the edit you linked above I wrote in the comment, "update definition to agree with reference." I think that making the page agree with the reference is generally a good thing. In this case, the last sentence of the first paragraph of section 2 of Le Gall 2006 reads,

If independant trials are repeated until E0 occurs exactly k0 times, the numbers of occurrences (k1,…,kr) of {E1,…,Er} during these trials have a NMD(k0,p) with parameters k0 and p (Kotz and Johnson, 1982): ...

Here, I will point out that not only is that the definition in the reference, but the reference itself cites a source (Kotz and Johnson, 1982) for that definition, suggesting it is well agreed upon.
I think that instead of saying someone is an expert or not, we should focus on the content of the page. 018 (talk) 15:56, 30 November 2010 (UTC)
Absolutely right, focus should be on content - and as discussed here, content means correctness and confidence in our collective contributions. Iwaterpolo (talk) 19:29, 30 November 2010 (UTC)
Let's talk about the article on the article's talk page. 018 (talk) 19:54, 30 November 2010 (UTC)

cleanup tag

There are many outstanding questions on how the example relates to the NMD (see above). This article needs cleanup until these are addressed. 018 (talk) 20:21, 7 December 2010 (UTC)

reference to Mathematica

The reference to Mathematica is arbitrary. Any CAS can solve a cubic equation, so a mention of a particular one is a kind of advertisement. In particular, choosing a \$3000 piece of software for a menial task is a bad idea for the free encyclopedia: how can I check its correctness?!? — Preceding unsigned comment added by 90.164.109.129 (talk) 16:08, 13 May 2012 (UTC)
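To illustrate the point that no particular commercial package is needed, here is a minimal sketch in plain Python that finds a real root of a cubic by bisection and checks it by substitution. The coefficients (x^3 − 2x − 5) are a hypothetical textbook example, not the cubic from the article, and the helper names are made up for this sketch.

```python
def cubic(x, a, b, c, d):
    """Evaluate a*x^3 + b*x^2 + c*x + d using Horner's scheme."""
    return ((a * x + b) * x + c) * x + d


def real_root_by_bisection(a, b, c, d, iterations=200):
    """Return one real root of a cubic with a != 0.

    A cubic changes sign between large negative and large positive x,
    so the bracket is doubled until the end points differ in sign.
    """
    lo, hi = -1.0, 1.0
    while cubic(lo, a, b, c, d) * cubic(hi, a, b, c, d) > 0:
        lo *= 2.0
        hi *= 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if cubic(lo, a, b, c, d) * cubic(mid, a, b, c, d) <= 0:
            hi = mid  # sign change (or exact root) lies in [lo, mid]
        else:
            lo = mid  # otherwise it lies in [mid, hi]
    return (lo + hi) / 2.0


# Hypothetical example: x^3 - 2x - 5 = 0
root = real_root_by_bisection(1.0, 0.0, -2.0, -5.0)
print(root, cubic(root, 1.0, 0.0, -2.0, -5.0))  # the residual should be near zero
```

Substituting the returned value back into the polynomial is an answer to "how can I check its correctness": the check does not depend on which tool produced the root.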