# Talk:Negative binomial distribution

WikiProject Statistics (Rated B-class, Mid-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as B-Class on the quality scale.
This article has been rated as Mid-importance on the importance scale.

## Negative binomial regression

There isn't yet an article on negative binomial regression in Wikipedia: maybe I'll write one if I get the time. This application of the distribution uses a reparameterization in terms of the mean and dispersion, so I have added a bullet point to clarify what this form is, and the various terms used, with some additional references. I moved Joe Hilbe's book into the references (updating to the second edition), and so deleted the section on additional reading. Peterwlane (talk) 06:30, 30 May 2013 (UTC)

## What's up with the pmf?

Wolfram MathWorld, as well as the statistics textbooks I've consulted, list the pmf as having k+r-1 choose k-1, but in this page it is consistently choose r. Why is this? I can't find any difference in convention. In all cases p is the probability of success and k is the desired number of failures. I would really like an explanation because as it stands I perceive it as an error. http://mathworld.wolfram.com/NegativeBinomialDistribution.html — Preceding unsigned comment added by Doublepluswit (talkcontribs) 18:17, 7 May 2013 (UTC)

On this page, k is the number of successes, not the number of failures, while on the Wolfram page, r-1 is the number of successes. Notice that k can be zero, but r cannot. This leads to the valid difference. I would be interested as to the reasoning for using the convention that is used on this page, however. Wolfram's/Ross's convention is more natural and convenient from my perspective at least. Machi4velli (talk) 09:31, 1 July 2013 (UTC)
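Part of the apparent disagreement may be purely notational. By the symmetry of binomial coefficients, choosing k out of k+r−1 is the same as choosing r−1 out of k+r−1, so a pmf written with either coefficient gives identical values as long as k and r play the same roles. A minimal Python sketch of that check, assuming the form with k successes before the rth failure (the function names are illustrative only and not taken from any source):

```python
from math import comb

def pmf_choose_k(k, r, p):
    """pmf written as C(k+r-1, k) (1-p)^r p^k."""
    return comb(k + r - 1, k) * (1 - p) ** r * p ** k

def pmf_choose_r_minus_1(k, r, p):
    """The same pmf written with C(k+r-1, r-1) instead."""
    return comb(k + r - 1, r - 1) * (1 - p) ** r * p ** k

def max_discrepancy(r, p, k_max=50):
    """Largest absolute difference between the two forms for k = 0..k_max-1."""
    return max(
        abs(pmf_choose_k(k, r, p) - pmf_choose_r_minus_1(k, r, p))
        for k in range(k_max)
    )

print(max_discrepancy(r=3, p=0.4))
```

Any remaining difference between sources then comes from what k, r and p are taken to mean, not from the coefficient.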

## Example with real-valued r

In the case of an integer valued r one may correctly write:

In probability theory and statistics, the negative binomial distribution is a discrete probability distribution of the number of successes in a sequence of Bernoulli trials before a specified (non-random) number r of failures occurs. For example, if one throws a die repeatedly until the third time “1” appears, then the probability distribution of the number of non-“1”s that had appeared will be negative binomial.

How would this example be in the case of a real valued r?
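For non-integer r there is no direct "count until the rth failure" story, but the pmf itself is still well defined once the binomial coefficient is written with gamma functions. A minimal sketch, assuming the parameterization with success probability p and support k = 0, 1, 2, ... (the function name and the values r = 2.5, p = 0.4 are arbitrary choices for illustration):

```python
from math import exp, lgamma, log

def nb_pmf_real_r(k, r, p):
    """pmf at k = 0, 1, 2, ... for real r > 0.

    Uses Gamma(k + r) / (k! * Gamma(r)) * (1 - p)^r * p^k, computed on the
    log scale for numerical stability.
    """
    log_coeff = lgamma(k + r) - lgamma(k + 1) - lgamma(r)
    return exp(log_coeff + r * log(1 - p) + k * log(p))

r, p = 2.5, 0.4
probs = [nb_pmf_real_r(k, r, p) for k in range(400)]
total = sum(probs)                               # should be close to 1
mean = sum(k * q for k, q in enumerate(probs))   # should be close to r*p/(1-p)
print(total, mean)
```

The probabilities still sum to one and the mean still follows r·p/(1−p), so the distribution is valid even though the dice-throwing description no longer applies literally.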

## Beta negative binomial mixture

Would not it be a good idea to discuss also the beta negative binomial mixture (see among others Wang, 2011)?

Reference

Wang, Z. (2011). One Mixed Negative Binomial Distribution with Application. Journal of Statistical Planning and Inference, 141, 1153-1160. — Preceding unsigned comment added by Ad van der Ven (talkcontribs) 10:34, 17 August 2011 (UTC)

## Wrong p in Sampling and point estimation of p?

The formula in the section "Sampling and point estimation of p" seems to give the probability of failure, which is not the definition we're using. For example, if you observe k=0, you saw no successes and r failures, so the probability of success (p) should be low. But the formula gives p=1. Should it be changed to k / (r + k)? Martin (talk) 15:03, 28 November 2010 (UTC)
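One way to test the suggestion is to maximize the likelihood numerically. The sketch below assumes a single observation of k successes before the rth failure, with p the success probability; it computes a maximum-likelihood estimate by grid search, which is not necessarily the estimator the article's section has in mind (the function names are hypothetical):

```python
from math import log

def log_likelihood(p, k, r):
    """Log-likelihood of p, up to an additive constant, for one observation
    of k successes before the r-th failure (p = success probability)."""
    return k * log(p) + r * log(1 - p)

def grid_mle(k, r, steps=1000):
    """Grid-search maximizer of the log-likelihood over p in (0, 1)."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: log_likelihood(p, k, r))

k, r = 7, 3
p_hat = grid_mle(k, r)
print(p_hat, k / (r + k))
```

Under these assumptions the maximizer agrees with k / (r + k), while r / (r + k) would be the corresponding estimate of the failure probability.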

## German version is better

I can barely read any German, yet the article here made more sense than this one...74.59.244.25 (talk) 03:26, 18 February 2008 (UTC)

I'm just noticing this comment now. I'll look at the German version. Michael Hardy (talk) 17:48, 14 April 2010 (UTC)

## request for introduction

This article needs a proper introduction that can help a layman understand what the term means and what it entails when used in text or conversation. Currently this is not feasible: if you had no previous knowledge of mathematics or statistics at all, you'd have to scroll down a long way and start reading the examples to even begin to understand. I'm putting this at the top, as I think it's a more vital issue than any concerning the mathematical/statistical content of the page. Starting with

In probability and statistics the negative binomial distribution is a discrete probability distribution.

does not explain what the negative binomial distribution is, or what separates it from other discrete probability distributions. I personally think this should be attempted as the highest priority; obviously I'm not able to do it (or I wouldn't be writing here, eh). Assuming anyone (not to mention everyone) is able to understand mathematical formulas that incorporate Greek letters is IMHO pedagogically unsound. --Asherett 12:17, 13 September 2007 (UTC)

Well, obviously the statement that "In probability and statistics the negative binomial distribution is a discrete probability distribution" does not say WHICH discrete probability distribution it is; that comes later in the article. As for making it clear to someone who knows NOTHING AT ALL about mathematics or statistics: that may not be so easy. Perhaps making it clear to a broader audience can be done, with some effort, though. Michael Hardy 19:15, 13 September 2007 (UTC)

## reversion

I have reverted the most recent edit to negative binomial distribution for the following reason.

• Sometimes one defines the negative binomial distribution to be the distribution of the number of failures before the rth success. In that case, the statement that the expected value is r(1 − p)/p is correct.
• But sometimes, and in particular in the present article, one defines it to be the distribution of the number of trials needed to get r successes. In that case, the statement is wrong.

If you're going to edit one part of the article to be consistent with the former definition, you need to be consistent and change the definition. Michael Hardy 17:40, 7 Jul 2004 (UTC)
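The two definitions in the bullets above differ only by a shift of r, and so do their means. A small sketch that checks this by direct summation, assuming p is the success probability (the function names and the truncation point `k_max` are illustrative choices, not part of the article):

```python
from math import comb

def failures_pmf(k, r, p):
    """P(K = k): k failures before the r-th success, success probability p."""
    return comb(k + r - 1, k) * p ** r * (1 - p) ** k

def mean_failures(r, p, k_max=2000):
    """Truncated sum for E[K]; should approach r(1 - p)/p."""
    return sum(k * failures_pmf(k, r, p) for k in range(k_max))

def mean_trials(r, p, k_max=2000):
    """Truncated sum for E[K + r], the number of trials; should approach r/p."""
    return sum((k + r) * failures_pmf(k, r, p) for k in range(k_max))

r, p = 3, 0.25
print(mean_failures(r, p), r * (1 - p) / p)
print(mean_trials(r, p), r / p)
```

Editing the mean without also changing the definition therefore produces a formula that is off by exactly r.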

## etymology

Shouldn't there be a sentence or two saying why this is named negative binomial and what it has to do with the binomial, especially for the layman? —Preceding unsigned comment added by 164.67.59.174 (talk) 19:17, 2 September 2009 (UTC)

## Equivalence?

If $X_r$ is the number of trials needed to get r successes, and $Y_s$ is the number of successes in s trials, then

$\operatorname{Pr}(X_r \leq s)=\operatorname{Pr}(Y_s \geq r).$

The article went from there to say the following:

Every question about probabilities of negative binomial variables can be translated into an equivalent one about binomial variables.

I removed it. I tentatively propose this as a counterexample: Suppose $W_r$ is the number of failures before the r successes have been achieved. Then $W_r$ has a negative binomial distribution according to the second convention in this article, and it is clear that this distribution is just the negative binomial distribution according to the first convention, translated r units to the left. This probability distribution is infinitely divisible, a fact now explained in the article. That means that for any positive integer m, no matter how big, there is some probability distribution F such that if $U_1, \dots, U_m$ are independent random variables distributed according to F, then $U_1 + \dots + U_m$ has the same distribution that $W_r$ has.

So how can the question of whether the negative binomial distribution is infinitely divisible be "translated into an equivalent one about binomial variables"? Michael Hardy 01:43, 27 Aug 2004 (UTC)

Removing the bit about "every question" seems OK to me; the important point is the relation between binomial and negative binomial probabilities. But Mike, it wasn't put in there for the purpose of annoying you. You might consider using the edit summary to say something about the edit rather than your state of mind -- how about rm questionable claim about "every question" instead of I am removing a statement that has long irritated me. Wile E. Heresiarch 15:23, 6 Nov 2004 (UTC)
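The relation between binomial and negative binomial probabilities quoted at the start of this section can be checked numerically. A minimal sketch, assuming $X_r$ counts trials up to and including the rth success and $Y_s$ is binomial with s trials, both with success probability p (the function names are illustrative):

```python
from math import comb

def prob_trials_at_most(s, r, p):
    """Pr(X_r <= s), where X_r is the number of trials needed for r successes."""
    return sum(
        comb(n - 1, r - 1) * p ** r * (1 - p) ** (n - r)
        for n in range(r, s + 1)
    )

def prob_successes_at_least(r, s, p):
    """Pr(Y_s >= r), where Y_s ~ Binomial(s, p)."""
    return sum(
        comb(s, j) * p ** j * (1 - p) ** (s - j)
        for j in range(r, s + 1)
    )

r, s, p = 3, 10, 0.3
print(prob_trials_at_most(s, r, p), prob_successes_at_least(r, s, p))
```

Both events say the same thing in words: the rth success has occurred by trial s.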

## Major reorganization

Trying to be bold, I've just committed several major changes. I found the previous version somewhat confusing, since it talked about three slightly different but closely related "conventions" for the negative binomial, and it never became fully clear to me which convention was in use at which point in the subsequent discussion. I've replaced the definition with what I consider to be the most natural version (the previous convention #3). The reason that definition is "natural" is that it arises naturally as the Gamma-Poisson mixture, converges in distribution to the Poisson, etc. The shifted negative binomial (previous convention #1) can still be derived (see the worked example of the candy pusher). Now we have a single, consistent (hopefully!) definition of the negative binomial instead of three similar-yet-different conventions. I'm painfully aware that all of the previous three conventions are in use and sometimes referred to as the negative binomial; but then again, that doesn't even begin to exhaust the variations on this distribution that can be found in the wild, so why not pick one reasonable definition and stick to that here? --MarkSweep 12:04, 5 Nov 2004 (UTC)

Well, if we were writing a textbook, we would certainly want to pick one defn and stick to it. However, we're here to document stuff as it is used by others. If there are multiple defns in common use, I don't see that we have the option to pick and choose. Sometimes multiple defns can be collapsed by saying "#2 is a special case of #1 with A always a blurfle" and then describing only #1. I don't know if that's feasible here. Regards & happy editing, Wile E. Heresiarch 15:09, 6 Nov 2004 (UTC)
Yes, that was basically the case here. The previous "convention #2" was the Pascal distribution, which is a special case of the general negative binomial (previous "convention #3"). This didn't become fully clear in the previous revision, where the discussion of the Pascal distribution seemed more like an afterthought. The previous "convention #1" appeared to be simply a Pascal distribution shifted by a fixed amount. There is still a discussion of that in the worked example, but that could arguably be moved to the front and made more explicit. --MarkSweep 23:25, 6 Nov 2004 (UTC)
Hi, just found this page and I don't like that the starting point is the more general formula that has r being a strictly positive real. I think that 99% of the time somebody is interested in this distribution, r is going to be an integer. Which isn't to say that we should purge this more complete definition, just that there is a lot to be said for following the way the present article on the Binomial distribution is written (since this is closely related) and because that one is a heck of a lot clearer. I would suggest using one variable where r is an integer and a separate variable where it is a real (to keep them straight). Along the same lines, I also think that starting to talk about Bernoulli trials so far down the page is not a good idea; I'd like to see it up top. Is this what you two are talking about? Oh, wait, those dates are 2004! Oh well, I'll still wait to see if anyone cares b/c this is a big edit. --O18 07:13, 9 November 2005 (UTC)
I support the previous comment. I am a graduating maths/computer science student, but the first definition was absolutely non-intuitive for me and only the "Occurrence" section made it clear. I doubt whether the generalization is more important than the fact that this distribution is derived from the Pascal distribution. —The preceding unsigned comment was added by 85.206.197.19 (talk) 20:10, 4 May 2007 (UTC).

## Plots?

Is it possible to get some plots of what this looks like? I got sent here from the mosquito page, and anyone reading that probably doesn't want to wade through many lines of math, just see a picture of what it means. --zandperl 04:10, 30 August 2005 (UTC)

• One year later, exactly the same issue. Remarkably, the mosquito page still links here, but there's no plot. Anyone?

Sketch-The-Fox 23:21, 19 August 2006 (UTC)

The datapoints and datalines in the animated plot are all mistakenly right-shifted by one. The support begins at k=0, not k=1. The bar charts in some of the other languages (Spanish, Arabic, French, Polish, Slovenian, Turkish, Chinese) are ambiguous, since each bar extends a full unit, so perhaps the creator of the animated plot misinterpreted which side of the bars to assign the values to. The correct values are:

μ=10, r=1, p=0.909091: {{0, 0.0909091}, {1, 0.0826446}, {2, 0.0751315}, {3, 0.0683013}, {4, 0.0620921}, {5, 0.0564474}, {6, 0.0513158}, {7, 0.0466507}, {8, 0.0424098}, {9, 0.0385543}, {10, 0.0350494}, {11, 0.0318631}, {12, 0.0289664}, {13, 0.0263331}, {14, 0.0239392}, {15, 0.0217629}, {16, 0.0197845}, {17, 0.0179859}, {18, 0.0163508}, {19, 0.0148644}, {20, 0.0135131}, {21, 0.0122846}, {22, 0.0111678}, {23, 0.0101526}, {24, 0.0092296}}

μ=10, r=2, p=0.833333: {{0, 0.0277778}, {1, 0.0462963}, {2, 0.0578704}, {3, 0.0643004}, {4, 0.0669796}, {5, 0.0669796}, {6, 0.0651191}, {7, 0.0620181}, {8, 0.058142}, {9, 0.0538352}, {10, 0.0493489}, {11, 0.0448627}, {12, 0.040501}, {13, 0.0363471}, {14, 0.0324527}, {15, 0.0288469}, {16, 0.0255415}, {17, 0.0225366}, {18, 0.0198239}, {19, 0.0173894}, {20, 0.0152157}, {21, 0.0132835}, {22, 0.0115728}, {23, 0.0100633}, {24, 0.0087355}}

μ=10, r=3, p=0.769231: {{0, 0.0122895}, {1, 0.0283604}, {2, 0.0436313}, {3, 0.0559376}, {4, 0.0645434}, {5, 0.0695082}, {6, 0.0712905}, {7, 0.0705071}, {8, 0.0677953}, {9, 0.0637391}, {10, 0.0588361}, {11, 0.0534874}, {12, 0.0480015}, {13, 0.0426049}, {14, 0.0374548}, {15, 0.0326529}, {16, 0.0282574}, {17, 0.0242937}, {18, 0.0207638}, {19, 0.0176534}, {20, 0.0149375}, {21, 0.0125847}, {22, 0.0105606}, {23, 0.00882994}, {24, 0.00735829}}

μ=10, r=4, p=0.714286: {{0, 0.00666389}, {1, 0.0190397}, {2, 0.0339994}, {3, 0.0485706}, {4, 0.0607133}, {5, 0.0693866}, {6, 0.0743428}, {7, 0.07586}, {8, 0.0745054}, {9, 0.0709575}, {10, 0.0658891}, {11, 0.0598992}, {12, 0.0534814}, {13, 0.0470166}, {14, 0.0407797}, {15, 0.034954}, {16, 0.0296485}, {17, 0.0249147}, {18, 0.0207623}, {19, 0.0171718}, {20, 0.0141054}, {21, 0.0115146}, {22, 0.00934628}, {23, 0.00754669}, {24, 0.0060643}}

μ=10, r=5, p=0.666667: {{0, 0.00411523}, {1, 0.0137174}, {2, 0.0274348}, {3, 0.0426764}, {4, 0.0569019}, {5, 0.0682823}, {6, 0.0758692}, {7, 0.079482}, {8, 0.079482}, {9, 0.0765382}, {10, 0.0714357}, {11, 0.0649415}, {12, 0.0577258}, {13, 0.0503251}, {14, 0.0431358}, {15, 0.0364258}, {16, 0.0303548}, {17, 0.0249981}, {18, 0.0203688}, {19, 0.016438}, {20, 0.0131504}, {21, 0.0104368}, {22, 0.00822294}, {23, 0.00643535}, {24, 0.00500527}}

μ=10, r=10, p=0.5: {{0, 0.000976562}, {1, 0.00488281}, {2, 0.0134277}, {3, 0.0268555}, {4, 0.0436401}, {5, 0.0610962}, {6, 0.0763702}, {7, 0.0872803}, {8, 0.0927353}, {9, 0.0927353}, {10, 0.0880985}, {11, 0.0800896}, {12, 0.0700784}, {13, 0.0592971}, {14, 0.0487083}, {15, 0.0389667}, {16, 0.0304427}, {17, 0.0232797}, {18, 0.0174598}, {19, 0.0128651}, {20, 0.0093272}, {21, 0.00666229}, {22, 0.00469388}, {23, 0.00326531}, {24, 0.0022449}}

μ=10, r=20, p=0.333333: {{0, 0.000300729}, {1, 0.00200486}, {2, 0.007017}, {3, 0.0171527}, {4, 0.032876}, {5, 0.0526015}, {6, 0.0730577}, {7, 0.0904524}, {8, 0.101759}, {9, 0.105528}, {10, 0.10201}, {11, 0.0927365}, {12, 0.0798564}, {13, 0.0655232}, {14, 0.0514825}, {15, 0.0388979}, {16, 0.0283631}, {17, 0.020021}, {18, 0.0137181}, {19, 0.00914539}, {20, 0.0059445}, {21, 0.00377429}, {22, 0.00234463}, {23, 0.00142717}, {24, 0.000852336}}

μ=10, r=40, p=0.2: {{0, 0.000132923}, {1, 0.00106338}, {2, 0.00435987}, {3, 0.0122076}, {4, 0.0262464}, {5, 0.0461937}, {6, 0.0692905}, {7, 0.0910675}, {8, 0.107004}, {9, 0.114138}, {10, 0.111855}, {11, 0.101687}, {12, 0.0864336}, {13, 0.0691469}, {14, 0.052354}, {15, 0.0376949}, {16, 0.0259153}, {17, 0.0170736}, {18, 0.0108133}, {19, 0.00660178}, {20, 0.00389505}, {21, 0.00222574}, {22, 0.00123428}, {23, 0.000665436}, {24, 0.000349354}}

This is visually confirmed in the last two frames of the animation, where the mean of the lopsided plot is obviously to the right of the intended mean of 10.

Note that the Italian version also uses this animated plot. AndreasWittenstein (talk) 17:30, 4 February 2011 (UTC)

Fixed.  // stpasha »  23:06, 4 February 2011 (UTC)

Wow, that was quick! Thanks, Pasha. AndreasWittenstein (talk) 00:07, 6 February 2011 (UTC)
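The values listed above can be spot-checked from the pmf. The sketch below assumes the parameterization implied by the listed p values, namely p = μ/(μ + r) with pmf C(k+r−1, k)(1−p)^r p^k on k = 0, 1, 2, ...; the dictionary holds a handful of the listed numbers, keyed by (r, k):

```python
from math import comb

def nb_pmf_from_mean(k, r, mu):
    """pmf at k for integer r, parameterized by the mean mu via p = mu/(mu + r)."""
    p = mu / (mu + r)
    return comb(k + r - 1, k) * (1 - p) ** r * p ** k

# A few of the values listed above, keyed by (r, k), all with mu = 10.
listed = {
    (1, 0): 0.0909091,
    (1, 1): 0.0826446,
    (2, 1): 0.0462963,
    (3, 0): 0.0122895,
    (5, 0): 0.00411523,
    (10, 0): 0.000976562,
    (10, 1): 0.00488281,
    (40, 0): 0.000132923,
}

max_error = max(
    abs(nb_pmf_from_mean(k, r, 10) - value) for (r, k), value in listed.items()
)
print(max_error)
```

The discrepancy stays at the level of rounding in the listed digits, which supports reading the support as starting at k = 0.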

## the mean is wrong

should be (1-p)r/p, surely

UM According to 'A First Course in Probability' by Sheldon Ross, the mean is r/p

Wrong. Look, how many times do so many of us have to keep repeating this? Sheldon Ross's book CORRECTLY gives the mean of what Sheldon Ross's book calls the negative binomial distribution. But there are (as this article explains) at least two conventions concerning WHICH distribution should be called that. Sheesh. Michael Hardy 21:42, 29 November 2006 (UTC)

Correct mean and variance. The mean for the distribution as defined on the page should be r*(1-p)/p, and the variance should be r*(1-p)/p^2. An easy way to verify these are correct is to plot them together with the pmf (using the same values for r and p). —Preceding unsigned comment added by Cstein (talkcontribs) 12:41, 15 June 2010 (UTC)

Please check again! Other sources, e.g., Wolfram Alpha and the German article, also say that the mean is r*(1-p)/p, but they use a different p. If you define

$f(k) = {k+r-1 \choose r-1} (1-p)^r p^k$

then the mean is r*p/(1-p), and the variance r*p/(1-p)^2. Gogol-Döring (talk) 09:44, 21 July 2010 (UTC)
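These two formulas can be verified by summing the pmf directly. A minimal sketch using exactly the f(k) defined above, with arbitrary illustrative values r = 4 and p = 0.3 and a truncation point chosen so that the neglected tail is negligible:

```python
from math import comb

def f(k, r, p):
    """f(k) = C(k+r-1, r-1) (1-p)^r p^k, as defined in the comment above."""
    return comb(k + r - 1, r - 1) * (1 - p) ** r * p ** k

def moments(r, p, k_max=500):
    """Truncated-sum mean and variance of the distribution with pmf f."""
    mean = sum(k * f(k, r, p) for k in range(k_max))
    second = sum(k * k * f(k, r, p) for k in range(k_max))
    return mean, second - mean ** 2

r, p = 4, 0.3
mean, variance = moments(r, p)
print(mean, r * p / (1 - p))
print(variance, r * p / (1 - p) ** 2)
```

Swapping the roles of p and 1−p in f gives the other pair of formulas, r(1−p)/p and r(1−p)/p², which is why both appear in different sources.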

If p is the positive probability, as the page states, then the mean is r*(1-p)/p. This needs to be fixed! — Preceding unsigned comment added by 71.163.43.88 (talk) 21:53, 13 March 2013 (UTC)

— The book I use is "Statistical Distributions, 2nd Edition" by Evans, Hastings, and Peacock. They define r as the number of successes, and p as P(success). They also define q=(1-p) which shortens all the formulas. They say the mean is rq/p, and the variance is mean/p. This makes sense to me. Suppose "success" is "being a genius". Suppose p is 10^-6 or one in a million. That means if you want r geniuses, you need about r/10^-6 = r × 10^6 = r million people. So the smaller p is, the bigger the mean has to be. And of course, the smaller p is, the less relevant q is, because it's basically one.

I can see that if you say you're looking for r failures, rather than r successes, you could get what this article says.

MikeDunlavey (talk) 14:01, 11 April 2015 (UTC)

## the mgf is wrong

The numerator should be pe^t instead of p. The following link can support this http://www.math.tntech.edu/ISR/Introduction_to_Probability/Discrete_Distributions/thispage/newnode10.html

The bottom of that page gives the mgf of negative binomial distribution. I verified it. —Preceding unsigned comment added by 136.142.163.158 (talkcontribs)

The version of the distribution described in this article is supported on the set
{ 0, 1, 2, 3, ... }
whereas the one on the web page you cite is supported on the set
{ r, r + 1, r + 2, ... }
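The effect of the different supports on the mgf can be seen numerically: shifting a variable by r multiplies its mgf by e^{tr}, which is where a factor e^t in the numerator comes from. The sketch below assumes p is the success probability and K counts failures before the rth success; the closed forms are the standard ones for that convention and may not match the article's notation exactly:

```python
from math import comb, exp

def mgf_failures_numeric(t, r, p, k_max=3000):
    """E[exp(tK)] by truncated summation; K has support {0, 1, 2, ...}."""
    ratio = (1 - p) * exp(t)  # the series converges only when ratio < 1
    return sum(comb(k + r - 1, k) * p ** r * ratio ** k for k in range(k_max))

def mgf_failures_closed(t, r, p):
    """Closed form for support {0, 1, 2, ...}: numerator is p."""
    return (p / (1 - (1 - p) * exp(t))) ** r

def mgf_trials_closed(t, r, p):
    """Closed form for N = K + r, support {r, r+1, ...}: numerator is p e^t."""
    return (p * exp(t) / (1 - (1 - p) * exp(t))) ** r

t, r, p = 0.2, 3, 0.4
print(mgf_failures_numeric(t, r, p), mgf_failures_closed(t, r, p))
print(mgf_trials_closed(t, r, p), exp(t * r) * mgf_failures_closed(t, r, p))
```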

## Use of gamma function for a discrete distribution

Is it the convention among probability literature to represent the negative binomial with the gamma function? In Sheldon Ross's introductory text, the distribution is introduced without it (although that is an alternative representation of the distribution). I am not objecting but as a beginner am curious why this is how it is represented. --reddaly

I think either adding this way of writing it: $\binom{n-1}{r-1}p^{r}(1-p)^{n-r}$, or specifying that $\Gamma(x + 1) = x!$ would be beneficial. Some people start running when they see the gamma function.

Good idea. It would be easier on the eyes for those who haven't yet discovered how to love the Γ function. Aastrup 22:24, 18 July 2007 (UTC)
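For integer r the two notations agree, because $\Gamma(m) = (m-1)!$ for every positive integer m. Written out for the trials-to-rth-success form quoted above (n trials, r successes):

$\binom{n-1}{r-1} = \frac{(n-1)!}{(r-1)!\,(n-r)!} = \frac{\Gamma(n)}{\Gamma(r)\,\Gamma(n-r+1)}$

The gamma form remains defined when r is not an integer, which is one likely reason for using it.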

## Expected Value derivation

The classic derivation of the mean of the NBD should be on this page, as it is on the binomial distribution page. --Vince |Talk| 04:44, 12 May 2007 (UTC)

I agree. Aastrup 22:24, 18 July 2007 (UTC)
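A sketch of the usual derivation, assuming the convention in which K counts failures before the rth success and p is the success probability (other conventions discussed on this page change the result). Write K as a sum of r independent geometric counts $G_1, \dots, G_r$, where $G_i$ is the number of failures between success i−1 and success i, and set $q = 1 - p$:

$E[G_i] = \sum_{j=0}^{\infty} j\,q^{j}\,p = p\,q\,\frac{d}{dq}\sum_{j=0}^{\infty} q^{j} = \frac{p\,q}{(1-q)^{2}} = \frac{q}{p}$

$E[K] = \sum_{i=1}^{r} E[G_i] = \frac{r(1-p)}{p}$

Counting trials instead of failures adds r, giving $E[K + r] = r/p$.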

## MLE

This article lacks Maximum Likelihood, and especially Anscombe's Conjecture (which has been proven). Aastrup 22:24, 18 July 2007 (UTC)

## overdispersed Poisson

I recently added a note about how the Poisson distribution with a dispersion parameter is more general than the negative binomial distribution and would make more sense when one is simply looking for a Poisson distribution with a dispersion parameter. I think it's important to realize that the Poisson distribution with a dispersion parameter described by M&N is more general in that the variance can take any positive value instead of being restricted to values greater than the mean. There certainly are situations where the negative binomial distribution makes sense, but if one is just looking for a Poisson with a dispersion parameter, why beat around the bush with this other distribution and not just go for the real thing? O18 (talk) 17:38, 26 January 2008 (UTC)

There is no such thing as "overdispersed Poisson", because if it is overdispersed, then it is not Poisson. If "the Poisson distribution with a dispersion parameter described by M&N" is important, then go ahead and describe it in some other article, perhaps in a new article. This article is about the negative binomial distribution only. The (positive) binomial distributions have variance < mean, and the Poisson distribution has variance = mean, and the negative binomial distribution has variance > mean. Bo Jacoby (talk) 22:36, 26 January 2008 (UTC).

## First paragraph

Among several objections I have to the edits done on April 1st by 128.103.233.11, is this: the rest of the article is about the distribution of the number of failures before the rth success, not about the one that counts the number of trials up to and including the rth success. Thus, in the experiment that that user described, the distribution should have started at 0, not at 2. This matters because (1) we want to include the case where r is not an integer, because (2) we want to be able to see the infinite divisibility of this distribution. Michael Hardy (talk) 16:23, 4 April 2009 (UTC)

## Trials up to rth success

This page should be updated to include a column in the side table for the version of the negative binomial for "number of trials to rth success". This is the most intuitive, if not the most common, version of this distribution. It answers the question "how many batches should I run if I want r successes?" I think the page on the geometric distribution handles this nicely; there is no reason the exact analog cannot be done here. Until this is done, I predict endless waves of people claiming that the mean is r/p. As it is, this page is currently unreadable. Formivore (talk) 22:05, 6 April 2009 (UTC)

Yeah, what situation is the distribution as described on this page useful for? I've only ever encountered the NB distribution that is "numbers of trials to rth success". O18 (talk) 05:39, 26 July 2009 (UTC)
Really, what's so difficult about this? The negative binomial distribution builds upon a sequence of Bernoulli trials. Each trial has a binary outcome: two possibilities. The words “success” and “failure” are just labels we arbitrarily attach to those 2 outcomes. Say, if your trials consist of flipping the coin, would you call Heads the “success” or Tails? If the trial consists of people voting for a democratic or a republican party, which one should be called the success (okay, you might have a personal opinion on this account :)? If the trial is a survey question with answers Yes/No — which one is success? and so on...
“Number of trials to rth failure” is just as valid an interpretation as the opposite one. For example, suppose in a hospital a doctor gets fired after the 3rd patient who dies from his error. “A patient dying” we'll call the failure (well, it would be awkward to call it a success). So how many patients will the doctor have until he gets fired? That would be our negative binomial distribution.  // stpasha »  20:47, 15 April 2010 (UTC)
stpasha, I think you just highlighted the point. The distribution on the page is the number of successes before the rth failure. But you said the number of patients. For the distribution on the page, it would be the number of patients that don't die until the doctor gets fired. But that is a much less natural parameterization. Think about a manufacturing process: you want to know how many widgets you have to make before you get, say, three that work. The distribution on the page would be how many bad widgets you have to make before you get 3 good ones, but you really want to know how many total widgets you have to make. 018 (talk) 23:12, 15 April 2010 (UTC)

The definition on this page ("the number of successes in a sequence of Bernoulli trials before a specified (non-random) number of failures") is not one I can recall seeing in any probability and statistics textbook. I have about 30 on my bookshelf; of the six I sampled, five defined NB(r,p) as the number of trials before the rth success, and one defined it as the number of failures before the rth success. None used the definition of this page. One of the "classic" probability texts (Feller, An Introduction to Probability Theory and its Applications, vol. 1, page 165) uses the number-of-failures definition. I too would suggest that this page describe those two competing definitions (similar to what is done on the Geometric distribution page). The current definition should either be scrapped, or (if someone can point out a source that uses that definition) perhaps retained as an alternate definition in a separate section. DarrylNester (talk) 16:30, 18 February 2013 (UTC)

## Little match girl

In the setup of TLMG, Pat must empty her box of matches or face child abuse. In Dr. Evans' example, Pat must empty her box of candy bars or face child abuse. What is the probability that Pat freezes to death? --Damian Yerrick (talk | stalk) 13:10, 25 July 2009 (UTC)
So, in explaining about Fisher's exact test, do you think it would be inappropriate to add a link to the problem of adding the milk while the tea is still steeping (were such an article to exist)? In one sense it is not apropos, in another, it is just part of the canon regardless of how interesting it looks when you don't know the history. O18 (talk) 20:12, 25 July 2009 (UTC)
I think that the link firstly is far too tenuous; both myself and an IP have no idea what you are on about w.r.t. the link. Secondly, I would remind you of WP:EGG (no easter egg links). Finally, should we then link integer to Hansel and Gretel, temperature, porridge and bed to Little Red Riding Hood? Follicle_(anatomy) to Rapunzel? I consider those links no more bizarre than this. User A1 (talk) 01:45, 26 July 2009 (UTC)
Just to be clear, I believe Damian is being sarcastic? User A1 (talk) 01:47, 26 July 2009 (UTC)
Sorry, still going: in explaining about Fisher's exact test, do you think it would be inappropriate to add a link to the problem of adding the milk while the tea is still steeping. Sure that's fine, as it is a good example of the applicability of the mathematics, but I wouldn't then link that to waltzing matilda, on the pretext that the swagman steeps his tea. User A1 (talk) 01:49, 26 July 2009 (UTC)
After using Google for a while, it appears that the little match girl / negative binomial link was only on websites that use the text as it used to appear in this article. I don't understand why you couldn't see why the link was related, but given that it is 2 to 1, and the 1 doesn't really care, I say let's just axe it and be done. O18 (talk) 05:30, 26 July 2009 (UTC)

## Major Changes

I have added the alternate formulation of the negative binomial that describes the probability of k **trials** given r successes to the side table and to the body section describing the pmf. This presentation of both formulations follows e.g. Casella, and I believe is justified both by the record of this talk page as well as by theoretical considerations. While the trials to r successes formulation has some disadvantages (parameter-dependent support) it has the big advantage of actually being the waiting time distribution of a Bernoulli process. The two-column side table was taken from the page on the geometric distribution; in fact the two geometric distributions are just the cases of the two neg. binomials with r=1. If it's worth doing there (where the difference is a factor of (1-p) fer cryin' out loud), it's worth doing here.

I have not modified any other sections. I believe everything else on the page is still valid after this change (since the original pmf is still there). Some of it may now not be needed and could be removed. If anyone has cleaner way of doing this presentation (which is a bit clunky) go ahead. However, I would appreciate it if this change was not reverted without a good argument against it. Formivore (talk) 23:44, 16 October 2009 (UTC)
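The relationship between the two formulations, and their reduction to the two geometric distributions at r=1, can be sketched as follows. This assumes p is the success probability; the function names are illustrative and not taken from the article:

```python
from math import comb

def nb_failures_pmf(k, r, p):
    """P(k failures before the r-th success), k = 0, 1, 2, ..."""
    return comb(k + r - 1, k) * p ** r * (1 - p) ** k

def nb_trials_pmf(n, r, p):
    """P(the r-th success occurs on trial n), n = r, r+1, ..."""
    return comb(n - 1, r - 1) * p ** r * (1 - p) ** (n - r)

def max_shift_error(r, p, n_max=60):
    """Largest difference between the trials pmf at n and the failures pmf at n - r."""
    return max(
        abs(nb_trials_pmf(n, r, p) - nb_failures_pmf(n - r, r, p))
        for n in range(r, n_max)
    )

p = 0.3
print(max_shift_error(4, p))
# With r = 1 the two forms are the two geometric pmfs p(1-p)^k and p(1-p)^(n-1).
print(nb_failures_pmf(2, 1, p), p * (1 - p) ** 2)
print(nb_trials_pmf(3, 1, p), p * (1 - p) ** 2)
```

The two columns of the side table therefore describe the same distribution up to a shift of the support by r.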

I believe the second formulation (the number of trials before r-th success) should be removed as a second column of the infobox. A person who doesn’t know what the NB distribution is and comes to this page will likely get confused by the fact that there seem to be two different(?) distributions by the same name, and will never realize that they only differ up to a shift by a constant r. Btw Casella starts with the informal description of what is called “2nd formulation” here, but later on redefines it into our “1st formulation” and says that “unless specified otherwise, later on we will always be understanding this definition when we use term ‘negative binomial’”.
If we leave only one definition (leaving the other one as a short subsection describing the differences in the alternative formulation), it has following advantages: (1) the reader will never get confused regarding which definition is used on the page, (2) this definition can be properly generalized to the negative multinomial distribution, (3) this definition is infinitely divisible, arises as a mixture of gamma-poisson, and other things mentioned on this talk page.
It will also be beneficial to recast the parameter p as the probability of failure not of success (or alternatively, to swap around what we consider failures and what are successes here). E.g. we may define the NB distribution as “probability of having k=0,1,2,… successes before a fixed number r of failures occur”? That way the definition sounds more naturally, and extends to the multinomial case gracefully. stpasha » 10:44, 8 December 2009 (UTC)
Both of these are couched in terms of trials and r appears in the "choose" function, but r is stated to be a real. Maybe we should start simple and then get more complicated later? What loss is there to having r be an integer and then having a section that allows for otherwise and then states the pdf with gamma functions (I'm assuming that is what takes the place of the choose)? 018 (talk) 14:50, 8 December 2009 (UTC)
I have to disagree Stpasha. If someone comes to this page not knowing what the NB is, there is a good chance they will have the wrong distribution in mind leading to more confusion, not less. Roughly half of this talk page is taken up with confusions of this sort. Maybe it's not best to have the double side table, but there should at least be a very clear explanation at the top of the article of the two formulations.
Successes and failures are defined as they are to generalize the geometric distribution. This is a more important analogy than the negative multinomial, which is fairly obscure. That said, I don't see how one way is more natural than the other for the NB. Formivore (talk) 07:42, 12 December 2009 (UTC)
Well, the “success” and “failure” are just arbitrary labels we assign to two possible outcomes of a Bernoulli trial. Say, if we consider an individual who has small chance p of dying in each day (so that the lifespan has geometrical distribution), then the event of his death will be called “success”.
In order to have consistency we might as well reparametrize the geometric distribution as well, so that its pmf is $f(k) = p^{k-1}(1-p)$. This expression actually looks simpler than $f(k) = p(1-p)^{k-1}$ (although of course they are quite the same). stpasha » 12:36, 12 December 2009 (UTC)

Looking at this article is looking at social failure. All kinds of flotsam have accumulated. Regardless of whether the side table should have one or two columns, this article should be revised to remove redundancies and sections that are not notable. I'd propose the following changes:
1) Move the "Limiting Case" and "Gamma-Poisson mixture" subsections further down in the article. I don't know if the parameterization used to arrive at the limit is broadly applicable, or if it is only used for this derivation. If the former is true, then this should be explained. Otherwise this should be moved to the "Related Distributions" section. The mixture derivation does not describe a specification at all and should be moved to the "Occurrence" section. This section should just describe what this distribution is, that's it.
2) The "Relation to other distributions" subsection describes, in a derivation involving the incomplete gamma function, the k trials to r successes stuff that the wrangling has been about. A good explication at the beginning of the article will obviate this section.
3) The "Example" at the end of the article is unnecessary and poorly written. There is also a much shorter example in the "Waiting time in a Bernoulli process" subsection that does not involve candy bars. Formivore (talk) 08:59, 12 December 2009 (UTC)

The article titled geometric distribution has two columns: one for the number of trials before the first success, and one of the number of trials including the first success.

It was necessary to do that because before it appeared that way, reckless, irresponsible editors kept coming along saying "I CAN'T BELIEVE THIS ARTICLE MAKES SUCH A CLUMSY MISTAKE!!!! MY TEXTBOOK SAYS....." and then recording information that's correct for one of the two distributions and wrong for the other, and failing to notice that there are two of them, even though the article clearly said so.

We cannot omit the negative binomial distribution of the number of trials before the rth success because

• That's the one that's infinitely divisible;
• That's the one that arises as a compound Poisson distribution;
• That's the one that allows r to be real rather than necessarily an integer.

Michael Hardy (talk) 19:55, 12 December 2009 (UTC)

OK, it seems that either I or Michael (or both of us) are confused here, which only reinforces the point that the entire situation is utterly befuddling. The first column is not the “number of trials before the r-th success”, but rather the number of “failures” before the r-th success. So the difference between the two columns is not in before/including, but rather in whether we count only the failures, or both the failures and the successes. The two definitions differ by a shift constant r, so it's no biggie.
Oh, and I'm not saying we should omit the definition of negative binomial as the number of failures before the rth success, that's the one I'm suggesting to keep, while the other one to scratch out (the one whose support starts from r). stpasha » 09:36, 13 December 2009 (UTC)
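
A minimal numerical sketch of the shift relation mentioned above, assuming p is the success probability and r is a positive integer. The function names and the values of r and p are illustrative only and are not taken from the article:

```python
from math import comb

def pmf_failures(k, r, p):
    """P(K = k): k failures occur before the r-th success (support 0, 1, 2, ...)."""
    return comb(k + r - 1, k) * p**r * (1 - p)**k

def pmf_trials(n, r, p):
    """P(N = n): n trials in total are needed to reach the r-th success (support r, r+1, ...)."""
    return comb(n - 1, r - 1) * p**r * (1 - p)**(n - r)

r, p = 3, 0.4  # illustrative values
for k in range(6):
    # The two columns agree once the trial count is shifted by r: N = K + r.
    print(k, pmf_failures(k, r, p), pmf_trials(k + r, r, p))
```

Both functions describe the same experiment; only the quantity being counted differs, which is why the supports start at 0 and at r respectively.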

OK, I haven't looked at this discussion for a while. I was hasty with language; what I meant was:

• One distribution is that of the number of trials needed to get a specified number of successes; and
• One distribution is that of the number of failures before a specified number of successes.

The latter allows the "specified number" to be a non-integer, and is infinitely divisible. If we're going to keep only one, it should be that one. Michael Hardy (talk) 03:16, 21 December 2009 (UTC)

Michael, I think that would be a great idea for a text book, but I would rather see the page be, well, encyclopedic in its coverage. One thing I think is certain, if we want to state the non-integer case, it should be in another section, not in the bar on the right. 018 (talk) 16:49, 21 December 2009 (UTC)

I never said we should have a "bar" for the specifically non-integer case. But we should have one for the case that's supported on {0, 1, 2, ...}. And it should state a parameter space that includes non-integers. Somewhere in the text of the article that should be explained (possibly in its own section). Michael Hardy (talk) 20:48, 21 December 2009 (UTC)

## More dumbing down needed?

The recent edits by user:24.127.43.26 and by user:Phantomofthesea make me wonder if we need to dumb this down again to rid ourselves of irresponsible editors who edit without paying attention to what they're reading or what they're writing. Michael Hardy (talk) 02:52, 19 February 2010 (UTC)

Maybe we should reject the “success/failure” terminology altogether, and instead use something more neutral, like “0/1”. That way whenever a person reads this page he/she would have to stop for 3 seconds and think how our 0/1 maps to his/her textbook’s success/failure.  // stpasha »  20:51, 15 April 2010 (UTC)

## a question

I don't want to mess with the entry, but according to Casella & Berger, the pmf listed here is incorrect. The p and (1-p) are switched. It should be (p^r)(1-p)^k. I haven't looked through to see how that mistake affects the rest of the article, if at all, so I'll leave it to someone with more knowledge of this article than me to correct. —Preceding unsigned comment added by 128.186.4.160 (talkcontribs)

Sigh............. not this comment again. The article says:
Different texts adopt slightly different definitions for the negative binomial distribution.
OK? You need to read what it says! Michael Hardy (talk) 21:25, 13 April 2010 (UTC)

Ok, I read it more carefully and I concede that what is written here is technically correct. However why not just stick with the Casella Berger definition on here? I'd argue that the Casella/Berger book is the most widely used of its kind, so defining the pmf this way is just confusing to most people. —Preceding unsigned comment added by 68.42.50.243 (talkcontribs) 21:00, 13 April 2010

I don't see where this decision was discussed above. In the first instance that I see of such a discussion Michael Hardy is saying it has happened before, so I guess there must be an unlinked archive? 018 (talk) 02:58, 14 April 2010 (UTC)
Sorry, I've now looked at this talk page, and it looks to me like MH pointed out, in the section titled "reversion", that any change would require the entire page to be changed. Since then, in the section "Trials up to rth success", Formivore correctly predicts endless waves of people correcting it because of the more intuitive interpretation of the alternative specification. There has also been a more lengthy discussion in the section titled "Major Changes", where Formivore tries to update the article and describes it as being in disarray and confusing. Formivore appears to have given up. In the end MH likes that one can make a (somehow useful?) change of support for one of the parameters for the less intuitive parameterization. 018 (talk) 04:47, 14 April 2010 (UTC)
To focus my ramble into a question, what is the value of being able to treat r as real and not just integer valued? Why do we care? Also, even if we do want this, might it make more sense to give that formulation a separate section that starts by reparameterizing, showing the new cdf/pdf and then explaining why it is useful. 018 (talk) 17:02, 14 April 2010 (UTC)

We want to treat r as real because it shows that this is an infinitely divisible distribution and that there's a corresponding Levy process. Michael Hardy (talk) 17:46, 14 April 2010 (UTC)

Okay, so (1) why is there no mention of "Levy process", and (2) why does this trump the overall understandability of the article? Would you agree that this parameterization could be moved to its own section? 018 (talk) 18:09, 14 April 2010 (UTC)

## an idea

We could put the whole box in a transcluded page to make it somewhat more difficult to quickly edit it. It is a bit extreme, but there have been many well intentioned (if somewhat inattentive) incorrect edits to it. 018 (talk) 17:07, 29 April 2010 (UTC)

Sounds great.  // stpasha »  18:02, 29 April 2010 (UTC)
And we need to do the same thing with dice/die...  // stpasha »
Okay, I did it here. Let's see how it goes. 018 (talk) 19:23, 30 April 2010 (UTC)

## Error in definition section equations.

In the summary on the right hand side we have:

r - is the number of failures

pmf: c \times (1-p)^r p^k

which makes sense.

Then in the definition section we have, success is p, failure is (1-p) but then the pmf function is given as

c \times (1-p)^k p^r

which is the wrong way round - this gives the probability of k failures in r+k trials. This error continues throughout the definition section. However, in the related distributions section, when the more common version of the pmf is written using \lambda (more commonly \theta in my experience), we have (1-p)^r p^k, which is as it should be following the textual definitions given previously. —Preceding unsigned comment added by 193.63.46.63 (talk) 10:24, 4 October 2010 (UTC)

## Very serious issues with File:Negbinomial.gif

I see there are several very serious issues with the article's picture Negbinomial.gif:

I am surprised no one has caught this after all the time this picture (or its previous incarnations) has been shown in the title page.

We know that the mean of the distribution is $\mu = \frac{pr}{1-p}$. To have a constant mean value of 10, then p and r have to be related as $p=\frac{10}{10+r}$, which is the wrong thing to do, as p and r should be completely independent of one another. For example, for r=10, then p=1/2. But if r=20, then p=1/3. Since p and r are the exogenous variables of the distribution, then we should show a picture of a distribution that keeps one or the other constant. We should NOT show a picture that varies both simultaneously, since this does not show the true behavior of the function.

Also, if p and r are set this way, then there is no way that the standard deviation will be constant. Since we know that $\sigma^2=\frac{pr}{(1-p)^2}$, then we find that $\sigma^2=\frac{10(10+r)}{r}$, a non-constant function of r. And it is odd that the author of the picture chose to show the standard deviation as a horizontal segment. It should be in the same domain as the mean (i.e., a vertical line).

This picture has so many issues that I must recommend that it not be shown and that a new one (hopefully a correct one) be developed. In an earlier post, someone mentioned that the German language version of the article is a better one. Like that person, I don't read German either, but I can tell that the sample picture used there is a correct one. Perhaps an equivalent picture for the English language can be developed to replace Negbinomial.gif. If no one comes up with one in the next few days, I'll just bite the bullet and make my own and upload it. Bruno talk 15:58, 25 May 2011 (UTC)
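
For reference, the relations quoted above can be derived in two lines. This is a sketch that assumes the page's convention $\mu = pr/(1-p)$ and $\sigma^2 = pr/(1-p)^2$, with $\mu$ standing for the fixed mean (10 in the animation):

```latex
\begin{align*}
\mu = \frac{pr}{1-p} \;&\Longrightarrow\; p = \frac{\mu}{\mu + r}, \qquad 1 - p = \frac{r}{\mu + r},\\
\sigma^2 = \frac{pr}{(1-p)^2} &= \frac{\mu}{1-p} = \frac{\mu(\mu + r)}{r} = \mu + \frac{\mu^2}{r}.
\end{align*}
```

With $\mu = 10$ this gives $p = 10/(10+r)$ and $\sigma^2 = 10(10+r)/r$, so under these assumptions the variance shrinks toward 10 as r grows and cannot be constant.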

This is not an error, just an expositional bit you don't like. I very much prefer the explanation on the page to one where the mean is changing, though it might be worth adding p to the graph as well as r to make it clear that both parameters are changing. One could, of course, reparameterize the NB so that the graphs shown were not only changing one parameter, in some sense it is arbitrary. In any case, the pmf graph should probably have p labeled on it even if it were not changing. 018 (talk) 16:33, 17 August 2011 (UTC)
Yes, while it may not be an error, I surely don't like it as you point out, and neither should you. I frankly do not see how presenting a function where its main parameters vary simultaneously contributes to clarity. While the animation looks pretty, the reader cannot get a clear understanding as to how the function actually works. Either you vary one parameter at a time and do the animation that way, or you show a static picture like in the German language article. Yes, it is not wrong (save for the standard deviation point), but it is not right either.
There is already a lot of fodder for confusion by presenting the material in the article in a way that deviates from the standard texts. The picture contributes to this. Again, it is not wrong either, but it is certainly not right, in that readers are left wondering what is going on. We see evidence of this elsewhere in this Talk page, where less-than-careful readers get into pitfalls, and other writers feel compelled to point out the deviations. I don't see why the article should be rife with these problems, as there are much better ways to present the material. This is a substandard article, starting from the picture. I'd offer to rewrite the whole thing but I know I will certainly run into the same resistance I am running into by pointing out deficiencies in the graphic that could easily be corrected. Expositional bit, my foot! Bruno talk 13:47, 18 August 2011 (UTC)
The mean being constant is what I think makes it clear. I've never gotten much out of drawings where the centrality parameter is changing. Do you agree that this figure could be made clearer by adding all parameters? 018 (talk) 19:29, 18 August 2011 (UTC)

I think the figure here does a poor job of providing a visual idiom for the negative binomial distribution that distinguishes it from monotone distributions like the geometric with a single parameter. — Preceding unsigned comment added by 65.127.74.2 (talk) 15:41, 19 May 2013 (UTC)

## Concrete outcomes vs subjective values

The outcomes "success" and "failure" are concrete and can be mapped to a set {1,0} by an indicator function. The values "good" and "bad" are subjective values based on social constructions, experiences and personal preferences (concepts that may not even exist in concrete form). The comparison here between "success"/"failure" and "good"/"bad" does not make any sense: "When applied to real-world situations, the words success and failure need not necessarily be associated with outcomes which we see as good or bad." The point of an experiment is *not* to have subjective biases, so why would the experimenter see the outcomes as good or bad? Runestone1 (talk) 00:44, 2 June 2011 (UTC)

The quote agrees with you... — Preceding unsigned comment added by Machi4velli (talkcontribs) 09:43, 1 July 2013 (UTC)

## Correct equation?

In the "Extension to real-valued r" section, I see a denominator of "x! gamma(x)". Is that right, or should it be "x gamma(x)"?

It's correct. Note that if x were a positive integer, we would have gamma(x) = (x-1)!, and you'll see that it would reconstruct the binomial coefficient in the integer case Machi4velli (talk) 10:00, 1 July 2013 (UTC)
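
A small sketch of the point made in the reply, assuming the real-valued extension uses the coefficient Γ(k + r) / (k! Γ(r)). The helper name `nb_coefficient` is hypothetical:

```python
from math import comb, factorial, gamma

def nb_coefficient(k, r):
    """Gamma(k + r) / (k! * Gamma(r)); defined for any real r > 0 and integer k >= 0."""
    return gamma(k + r) / (factorial(k) * gamma(r))

# For integer r, Gamma(r) = (r - 1)!, so this reduces to the binomial coefficient C(k + r - 1, k).
for k, r in [(0, 1), (3, 4), (5, 2)]:
    print(k, r, nb_coefficient(k, r), comb(k + r - 1, k))

# A non-integer r is also accepted, which the "choose" form cannot express directly.
print(nb_coefficient(3, 2.5))
```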

## Mode does not appear to be correct

The current mode given doesn't seem to match other sources for negative binomial mode. I've checked under the multiple parameterizations and haven't come up with that formula. Can someone else confirm or deny this? — Preceding unsigned comment added by 108.246.235.64 (talk) 23:40, 19 July 2013 (UTC)


## Gamma-Poisson mixture parameter confused

I changed the second sentence of the section "Gamma–Poisson mixture", which had suggested that negativebinomial(r,p) ~ poisson(gamma(shape = r, scale = p/(1 − p))). Instead, it is the rate parameter that should have that value. Here's some R code that makes it obvious:

```r
many = 10000
r = 15    # size (target number of successes); alternative: trunc(runif(1, 2, 20))
p = 0.56  # success probability; alternative: runif(1)

# Direct negative binomial draws
x = rnbinom(many, r, p)

# negativebinomial(r, p) ~ poisson(gamma(shape = r, rate = p/(1 - p)))
lambda = rgamma(many, shape = r, rate = p / (1 - p))
y = rpois(many, lambda)

# negativebinomial(r, p) ~ poisson(gamma(shape = r, scale = p/(1 - p)))
lambda = rgamma(many, shape = r, scale = p / (1 - p))
z = rpois(many, lambda)

# Empirical CDFs: green = direct draws, blue = rate version, red = scale version
plot(sort(x), 1:many / many, xlim = range(c(x, y)), ylim = c(0, 1), col = 'green', lwd = 3, type = 'l')
lines(sort(y), 1:many / many, col = 'blue')
lines(sort(z), 1:many / many, col = 'red')
```

Scwarebang (talk) 00:24, 12 October 2013 (UTC)
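
The simulation can be backed by a short derivation. This sketch assumes the convention of R's `rnbinom`, where p is the success probability and the pmf is $\frac{\Gamma(k+r)}{k!\,\Gamma(r)} p^r (1-p)^k$, and writes $\beta$ for the gamma rate:

```latex
\begin{align*}
P(K = k)
  &= \int_0^\infty \frac{\lambda^k e^{-\lambda}}{k!}\,
     \frac{\beta^r \lambda^{r-1} e^{-\beta\lambda}}{\Gamma(r)}\, d\lambda
   = \frac{\beta^r}{k!\,\Gamma(r)} \cdot \frac{\Gamma(k+r)}{(1+\beta)^{k+r}} \\
  &= \frac{\Gamma(k+r)}{k!\,\Gamma(r)}
     \left(\frac{\beta}{1+\beta}\right)^{r} \left(\frac{1}{1+\beta}\right)^{k}.
\end{align*}
```

Setting $\beta = p/(1-p)$ gives $\beta/(1+\beta) = p$ and $1/(1+\beta) = 1-p$, which recovers the pmf above. Under this convention the value $p/(1-p)$ therefore belongs to the rate, not to the scale.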

## An error in PMF, Mean and Variance formulas

I have always assumed that the formulas given in Wikipedia were correct. I will not do that anymore... I have no idea why such a basic mistake prevailed for so long. Maybe because the mistake is in the sidebar, which is not straightforward to edit (I have no idea how to do it). Here are the correct entries:

$pmf = \begin{cases} \binom{k+r-1}{r-1} (1-p)^k p^r & k \geq 0 \\ 0 & k < 0 \end{cases}$

Mean = $r \left(\frac{1}{p}-1\right)$

Variance = $\frac{r}{p^2} \cdot \left(1-p\right)$

The results were checked manually in Wolfram's Mathematica, and then checked again to rule out an error on Mathematica's side. — Preceding unsigned comment added by Adam Ryczkowski (talkcontribs) 10:21, 23 October 2013 (UTC)
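
A sketch comparing the entries above with the sidebar formulas quoted elsewhere on this talk page (pmf proportional to (1-p)^r p^k, mean pr/(1-p), variance pr/(1-p)^2). The function names and the values of r and p are illustrative assumptions. The check suggests the two sets of formulas describe the same distribution once p and 1-p are swapped:

```python
from math import comb

def pmf_sidebar(k, r, p):
    """Sidebar convention: k successes before r failures, p = success probability."""
    return comb(k + r - 1, k) * (1 - p)**r * p**k

def pmf_comment(k, r, p):
    """Convention in the comment above: k failures before r successes, p = success probability."""
    return comb(k + r - 1, r - 1) * (1 - p)**k * p**r

r, p = 4, 0.3  # illustrative values
q = 1 - p      # the other convention's parameter

# Largest pointwise difference between the two pmfs after swapping p and 1 - p.
max_pmf_gap = max(abs(pmf_sidebar(k, r, p) - pmf_comment(k, r, q)) for k in range(50))

mean_sidebar, var_sidebar = p * r / (1 - p), p * r / (1 - p)**2
mean_comment, var_comment = r * (1 / q - 1), r * (1 - q) / q**2

print(max_pmf_gap)
print(mean_sidebar, mean_comment)
print(var_sidebar, var_comment)
```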

## Isn't the CDF wrong?

CDF: $1-I_p(k+1,\,r)$, the regularized incomplete beta function.

According to the regularized incomplete beta function $I_x(a,b) = \sum_{j=a}^\infty \binom{a+b-1}{j} x^j (1-x)^{a+b-1-j}$.

So, $CDF=1-\sum_{j=k+1}^\infty \binom{k+r}{j} p^j (1-p)^{k+r-j}$. But we know that $CDF=1-\sum_{j=k+1}^\infty \binom{j+r}{j} p^j (1-p)^{r}$.
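
A numerical sketch of this question, under two assumptions: r is a positive integer, and the pmf is the article's $\binom{j+r-1}{j} p^j (1-p)^r$ (j successes before r failures). For integer arguments the regularized incomplete beta function is a finite binomial tail sum, so only the standard library is needed. The function names are hypothetical:

```python
from math import comb

def nb_cdf_direct(k, r, p):
    """P(K <= k), summing the pmf C(j + r - 1, j) p^j (1 - p)^r over j = 0..k."""
    return sum(comb(j + r - 1, j) * p**j * (1 - p)**r for j in range(k + 1))

def reg_inc_beta_int(x, a, b):
    """I_x(a, b) for positive integers a, b, as a binomial tail sum with n = a + b - 1."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

for k, r, p in [(0, 1, 0.5), (3, 2, 0.3), (7, 5, 0.6)]:
    # The individual terms of the two sums differ, but the totals agree.
    print(k, r, p, nb_cdf_direct(k, r, p), 1 - reg_inc_beta_int(p, k + 1, r))
```

One way to read the agreement: at most k successes occur before the r-th failure exactly when the first k + r trials contain at most k successes, which is a binomial event.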

## MLE section uses wrong pmf?

Doesn't the MLE section use the wrong pmf (where r is the number of successes instead of failures)? This is inconsistent with the rest of the article. — Preceding unsigned comment added by Nicole.wp (talkcontribs) 23:35, 27 May 2014 (UTC)

I thought so, and changed it. Apologies if I should have discussed first. Clay Spence (talk) 20:45, 8 October 2014 (UTC)

## Inconsistent introduction

The case with r real contradicts the initial definition "The number of successes before r failures in independent Bernoulli trials". It should be made clear at the outset that this case is a special case.

The reason for putting this case first seems to have been that it's the most common in practice. I doubt this, although it might, perhaps, be the most common in elementary texts (and is all that the Wolfram Math World article discusses).

The real case occurs often when the assumptions for the Poisson distribution are not quite met.

For example, fatal road accidents tend to follow the Poisson distribution, while deaths do not because more than one may occur in a single accident. We get the negative binomial if the number of deaths per accident follows a logarithmic series distribution.

Similarly, people can have different rates for non-fatal accidents, so that repeated accidents to the same person are not Poisson distributed.

I remember the second case from my first statistics course in my first year at university (1964/65). Greenwood and Yule (1920) studied accidents to women manufacturing high explosive shells in World War I. Letting the Poisson parameters for individual workers (representing accident proneness) be gamma distributed gives the negative binomial.

(In our course we started with the negative binomial and an accident proneness variable with an arbitrary distribution. This then gives rise to an integral equation for this distribution which the gamma fitted. High school calculus was a prerequisite for B.Sc. and the course discussed the beta and gamma distributions, but we would not have been able to solve the integral equation except by trying all the continuous distributions we knew.)

I suggest starting with a description such as "the negative binomial is a distribution on the integers 0, 1, 2, 3, ... with two parameters often denoted by p and r where p is between 0 and 1 and r is positive."

Then continue with "The special case when r is an integer is known as the Pascal distribution and represents the number of successes in independent Bernoulli trials before obtaining r failures." I also think the different definitions in the Pascal case should be mentioned here and not relegated to a side bar.

Continue "The more general case (also known as the Polya distribution) occurs in at least two ways." Then list those ways as I described above.

We should also mention in the article (but not in the introduction) that alternative parametrizations for the Polya case are often more useful, for example the mean = pr/(1-p) = $\mu$ and r or odds ratio = p/(1-p) = $\theta$ and r.

TerryM--re (talk) 00:30, 6 October 2014 (UTC)
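
A minimal sketch of the alternative parametrization mentioned above, assuming the page's convention mean = pr/(1-p) and variance = pr/(1-p)^2. The helper names and the values of mu and r are illustrative only:

```python
def p_from_mean(mu, r):
    """Solve mu = p * r / (1 - p) for p, given mean mu > 0 and real r > 0."""
    return mu / (mu + r)

def mean_and_variance(r, p):
    """Mean and variance in the (r, p) parametrization used on the page."""
    return p * r / (1 - p), p * r / (1 - p)**2

mu, r = 10.0, 2.5           # illustrative values; r need not be an integer
p = p_from_mean(mu, r)
theta = p / (1 - p)         # the odds p / (1 - p), which equals mu / r
mean, var = mean_and_variance(r, p)

print(p, theta)
print(mean, var, mu + mu**2 / r)  # the variance can be written as mu + mu^2 / r
```

In this form the overdispersion relative to a Poisson distribution with the same mean is the extra term mu^2 / r, which is one reason the mean-and-r form is convenient in regression settings.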