Talk:Poisson distribution

WikiProject Mathematics (Rated B+ class, High-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
 Field: Probability and statistics
One of the 500 most frequently viewed mathematics articles.
WikiProject Statistics (Rated B-class, Top-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.




I think there is a misprint in the 'Entropy' section of the right panel, in the formula (a png file) that gives the large-lambda asymptotic for the entropy: the logarithm symbol should be 'ln' rather than 'log'. Otherwise, the expression valid for any lambda and its limiting case appear to contradict each other. The present notation creates confusion, since definitions of entropy may use different bases of logarithm.

Torogran (talk) 10:13, 19 March 2008 (UTC)

I agree that it is confusing to have both log and ln in the same box, although mathematicians use the notation 'log' in the case of the natural logarithm whereas life science types tend to use 'ln'. I will change it to make it consistent. Plasmidmap (talk) 17:29, 11 July 2008 (UTC)

Yup, I agree that it is confusing to have both log and ln in the same box, although mathematicians use the notation 'log' in the case of the natural logarithm whereas life science types tend to use 'ln'. I will change it to make it consistent. Lina-- (talk) 03:11, 28 March 2010 (UTC)


I'm pretty sure there's an error in this article. Einstein demonstrated the existence of photons while investigating the photoelectric effect, not blackbody radiation. Planck had already dealt with blackbody radiation a few years earlier.

If you know that to be a fact, go ahead and change it. (Such a fact isn't really essential to the topic of this article.) Michael Hardy 21:19, 6 December 2006 (UTC)

Is the number of errors in a Wikipedia page really Poisson distributed? With certain assumptions about the process producing the errors, it might be for pages of the same length, but hardly for all pages. Or maybe this was a troll? —Preceding unsigned comment added by (talk) 19:29, 26 May 2008 (UTC)


Strangely, λ doesn't display as \lambda\, on my computer and I don't have a clue what the \, is for.

Also, I moved the normal distribution approx. into the connections to other dist. section to be consistent with the binomial distribution.

Frobnitzem 21:04, 7 September 2006 (UTC)

The \, causes it to render properly on some browsers. Michael Hardy 21:06, 5 February 2007 (UTC)

On a very unrelated note, it seems as if The Economist has taken the graphics for the Poisson/Erlang/Power law/Gaussian distributions from Wikipedia and published them in an article: Article: [1] and image: [2]

The limit of the binomial distribution isn't so much how the Poisson distribution arises as one example of a physical situation that the Poisson distribution can model fairly well. It far more often arises as the limit of a wide number of independent processes, which can in turn be modelled by the binomial distribution - but the model isn't the thing.

As it happens, it's a lot more illuminating and a better look at the causality to examine this limit of a wide number of independent processes using differential equations and generating functions, but it's simpler to use the binomial distribution approach. PML.

The comment above definitely could bear elaboration! Michael Hardy 01:45 Feb 5, 2003 (UTC)

Well, for instance consider how many breaks a power line of length l might have after a storm. Suppose there is an independent probability lambda delta l of a break in any stretch of length delta l. (We know this is crawling with assumptions; if we do this right - like the better sort of economist - in any real case we will check the theory back to outcomes to see if it was really like that in the first place.)

Anyhow, we pretend we already have a general formula and put it in the form of a Probability generating function P(lambda, l, x). Then we get an expression for P(lambda, l + delta l , x) in terms of P(lambda, l, x) and P(lambda, delta l , x). When we take the limit of this we get a differential equation which we can solve to get the Poisson distribution.

If people already know the slightly more advanced concept of a Cumulant generating function we can rearrange the problem in that form, and then the result almost jumps out at you without needing to solve anything (a Cumulant generating function is what you get when you take the logarithm of a probability generating function).

Actually, the cumulant-generating function is the logarithm of the moment-generating function. Michael Hardy 22:05, 2 Apr 2004 (UTC)
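For what it's worth, the differential-equation route sketched above can be written out in a few lines. This is only a sketch under the stated assumptions (breaks in disjoint stretches are independent, a stretch of length \delta l contains one break with probability \lambda\,\delta l and more than one with negligible probability). G(l, x) is shorthand introduced here for the generating function P(lambda, l, x); it is not notation from the comments above.

\begin{align}
G(l+\delta l,\, x) &= G(l, x)\, G(\delta l, x), \\
G(\delta l, x) &= (1 - \lambda\,\delta l) + \lambda\,\delta l\, x + o(\delta l), \\
\frac{\partial G}{\partial l} &= \lambda (x - 1)\, G, \qquad G(0, x) = 1, \\
G(l, x) &= e^{\lambda l (x-1)} = \sum_{k=0}^{\infty} e^{-\lambda l}\, \frac{(\lambda l)^k}{k!}\, x^k .
\end{align}

The coefficient of x^k is the Poisson probability with mean \lambda l. Taking the logarithm gives \lambda l (x - 1); substituting x = e^t turns this into the cumulant-generating function \lambda l (e^t - 1), whose expansion in t shows that every cumulant equals \lambda l.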

I have heard that the empirical data that was first used for this formula was the annual number of deaths of German soldiers from horse kicks in the 19th century. PML.

  • I'm not sure that this isn't just the same as what is on the page, just with different maths. I disagree with PML (but am open to being convinced otherwise) and think the binomial is a great place to start a derivation of the Poisson distribution from. It is exactly the appropriate approximation for nuclear decay, phones ringing, et cetera. I would also use it for the above example. --Pdbailey 13:21, 31 Aug 2004 (UTC)

Concerning the source of the horse-kick data, see Ladislaus Bortkiewicz; it was his book The Law of Small Numbers that made that data-set famous. 02:25 Feb 5, 2003 (UTC)

I've seen this approach via differential equations before, but I don't think it's a reason not to include the limit theorem. For that matter, I still think an account of the limit theorem should appear earlier in the article than anything about differential equations or cumulant-generating functions. Michael Hardy 02:31 Feb 5, 2003 (UTC)

The word "arise" really only tells us that we can do the algebra this way, not that the process is itself like this.

My concern was that the wording suggests that it all somehow comes out of the Binomial distribution, when that is simply yet another thing that can describe/model the same sort of underlying processes. You would expect the limit of the binomial distribution to work, but only because it is itself modelling the same processes; but it only does that when you plug the right things in, i.e. taking the limit while you keep the expected values where you want them. You can have a binomial distribution that converges to other limits under other constraints. PML.

None of which looks to me like a reason why the limit theorem should not be given prominence before cumulants or differential equations are mentioned. I agree that the "constraints" do need to be emphasized. Michael Hardy 02:41 Feb 5, 2003 (UTC)

I think you're missing my point. I'm not saying you shouldn't mention these things early on. Only, you shouldn't make them look like where the Poisson distribution comes from, the underlying mechanism. You could easily use these things to show how to calculate it, to get to the algebraic formula, while stating that these are merely applying underlying things which will be brought out later. It's the word "arise" in the subtopic introduction I'm uncomfortable with, not what you're doing after that.

An analogy: it's a lot easier to state a formula for Fibonacci numbers, and prove that the formula works with mathematical induction, than to derive it in the first place - and it was probably derived in the first place by using generating functions. So you introduce the subject with the easy bit but you don't make it look like where you're coming from. PML.

I don't know the history, but to me it is plausible that the limit theorem I stated on this page is how the distribution was first discovered. And if you talk about phone calls arriving at a switchboard, it's not so implausible to think of each second that passes as having many opportunities for a phone call to arrive and few opportunities actually realized, so that limit theorem does seem to describe the mechanism. Michael Hardy 17:20 Feb 5, 2003 (UTC)

I am a dunce, but wouldn't the number of mutations in a given stretch of DNA be a binomial distribution, since you have discrete units? You couldn't very well have a nice Poisson process with a DNA stretch of only 4 base pairs... on the other hand maybe I don't know what I'm talking about... Graft 21:14, 2 Apr 2004 (UTC)

It would be well-approximated by a Poisson distribution if the number of "discrete units" is large, and using a Poisson distribution is simpler. Michael Hardy 21:23, 2 Apr 2004 (UTC)
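A quick numerical illustration of the point about the number of "discrete units": the sketch below compares the Binomial(n, lambda/n) mass function with the Poisson(lambda) one as n grows. The function names and the choice lambda = 2 are arbitrary and only for illustration; nothing here is specific to DNA.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


def max_abs_difference(n, lam, k_max=20):
    """Largest gap between the two mass functions over k = 0..min(n, k_max)."""
    p = lam / n
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
        for k in range(min(n, k_max) + 1)
    )


if __name__ == "__main__":
    for n in (4, 40, 400, 4000):
        print(n, max_abs_difference(n, lam=2.0))
```

With only 4 units the two mass functions differ visibly; the gap shrinks steadily as n increases with the expected count held fixed.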

I've been developing a new distribution curve to describe the number of correctly ordered random events when the order of each event is relative to the other events. In other words, 'A' comes before 'B', but there may be any distance between 'A' and 'B'. The pattern also demonstrates that when given a portion of the relative sequence, the probability of getting the unknown portion correct increases by an amount dependent upon the distance between the given events. In fact, given only one relative order, you have a better chance at getting the rest of the sequence correct, when the known relative order includes the endpoints of the sequence. The least valuable given would be consecutive events. I believe that this distribution curve will have value when analyzing DNA sequences. I've also determined that the Binomial Distribution is not appropriate for assessing ordered events (i.e. Grading a student's list of presidents in historical order). If anyone is interested, I am willing to discuss my work and provide my argument against use of the Binomial Distribution to compare the homology of DNA sequences. You may contact me through, and begin the title with "Rhonda give this to John". My wife has taken over my email account. After establishing contact, I can give you a better means of contacting me. User: JNLII, May 8, 2008. —Preceding unsigned comment added by JNLII (talkcontribs) 16:27, 8 May 2008 (UTC)

Waiting time to next event.

In the waiting time to the next event

P(T>t)=P(N_t=0)=e^{-\lambda t}.\,

This looks like it isn't normalized, since there should be a \lambda out in front. Am I wrong? Pdbailey 03:47, 11 Jan 2005 (UTC)

Yes; you're wrong. The normalizing constant should appear in the probability density function, but not in this expression, which is 1 minus the cumulative distribution function. Michael Hardy 03:50, 11 Jan 2005 (UTC)
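To spell out the distinction made in the reply: the survival function has no constant in front, and the factor \lambda only appears once it is differentiated to get the density.

\begin{align}
P(T \le t) &= 1 - P(T > t) = 1 - e^{-\lambda t}, \\
f_T(t) &= \frac{d}{dt}\left(1 - e^{-\lambda t}\right) = \lambda e^{-\lambda t}, \\
\int_0^\infty \lambda e^{-\lambda t}\, dt &= 1 .
\end{align}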

Parameter estimation

I'm confused about the recent edits to the MLE section. I'm under the distinct impression that the sample mean is the minimum-variance unbiased estimator for λ, but a combination of ignorance and laziness prevents me from investigating this myself. Could someone please enlighten me? --MarkSweep 07:07, 15 May 2005 (UTC)

Evidently when I wrote it, I was also confused. I think it's right this time; please check the derivation. I didn't put in the part about "minimum variance" because I can't prove it quickly, and I haven't got a source that says that, but it would be a good thing to add. PAR 14:07, 15 May 2005 (UTC)
This MLE is unbiased, and is the MVUE. MLEs in general are often biased. Michael Hardy 22:42, 15 May 2005 (UTC)
This proof definitely has to be in here. I made an attempt at proving it and it seems I've been successful at it. I have no experience nor time to learn the math formatting on wikipedia so I'll just put it here and I hope someone will once put it in the article:
The lower bound is reached when the variance of the estimator equals the Cramér-Rao lower bound. The variance of the estimator equals the variance of the sample mean, which equals (1/n^2) * n * Var(X_i) = Var(X_i) / n = lambda / n (n is the number of samples). Now we have to find the Cramér-Rao lower bound and hope it's the same.
The Cramér-Rao lower bound equals 1/(n * E{ (diff(ln(f(X,lambda)), lambda))^2 } ). Here n is the number of samples, diff(func, var) means the partial derivative of func with respect to the variable var, ln is the natural logarithm, and f(X, lambda) is the probability mass function.
Working it out step by step: ln(f(x,lambda)) = - lambda + ln(lambda^x) - ln(x!), where x! means x factorial. The next step is taking the derivative: diff(ln(f(x,lambda)), lambda) = (x - lambda) / lambda. The next step is taking the expected value of the square of this expression. You can put 1/lambda^2 up front so that what remains inside the expectation operator is the variance of the Poisson distribution. So you get that the Cramér-Rao lower bound equals 1/(n * 1/lambda^2 * Var(X_i)) = 1/(n * 1/lambda) = lambda / n. It's the same. QED. Aphexer (talk) 13:24, 5 June 2009 (UTC)
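In case it helps whoever moves this into the article, here is the same argument typeset. It follows the steps of the comment above, using \bar{X} for the sample mean of n independent Poisson(\lambda) observations.

\begin{align}
\ln f(x;\lambda) &= -\lambda + x \ln\lambda - \ln x!, \\
\frac{\partial}{\partial\lambda} \ln f(x;\lambda) &= \frac{x - \lambda}{\lambda}, \\
I(\lambda) &= \operatorname{E}\!\left[\left(\frac{X-\lambda}{\lambda}\right)^{2}\right] = \frac{\operatorname{Var}(X)}{\lambda^{2}} = \frac{1}{\lambda}, \\
\operatorname{Var}(\bar{X}) &= \frac{\lambda}{n} = \frac{1}{n\, I(\lambda)} .
\end{align}

Since \bar{X} is unbiased and its variance equals the Cramér-Rao bound, no unbiased estimator of \lambda can have smaller variance.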

Poisson Distribution for Crime Analysis?

Is a Poisson distribution the best one for describing the frequency of crime? Before I add it as an example on the main page, I’d like to post this for discussion.

Recently, I've been trying to use the normal distribution to approximate the monthly statistics of the eight "Part I" crimes in the ten police districts of San Francisco. But the normal distribution is continuous and not discrete like the Poisson. It also doesn't seem appropriate for situations where the value of a crime like homicide is zero for several weeks.

My goal is to approximate the occurrences of crime with the appropriate distribution, and then use this distribution to determine whether a change in crime from one week to the next is statistically significant or not.

Distinguishing between significant change and predictable variations might help deploy police resources more effectively. Knowing the mean and standard deviation of the historical crime data, I can compare a new week’s data to the mean, and - given the correct distribution - assess the significance of any change that has occurred. But is the Poisson distribution the one to use?

Also, how do I take into account trends? Does the Poisson distribution assume that the underlying process does not change? This may be a problem because crime has been going down for years.

- Tom Feledy

Well, IANAS, but my advice would be to first set up a simple Poisson model and assess its goodness of fit. My guess is there could easily be several problems with a simple Poisson model: First of all, it has only a single parameter, so you cannot adjust the mean independently of the variance; you may want to look into a Poisson mixture like the negative binomial distribution as an alternative with more parameters. Second, as you point out yourself, zero counts (fortunately) dominate for many types of crimes. This suggests that you need a zero-inflated or "adjusted" distribution, like a zero-inflated Poisson model in the simplest case. Finally, if you have independent variables that could potentially explain differences in the frequency of certain crimes, then a conditional model (e.g. Poisson regression analysis) will be more appropriate than a model that ignores background information and trends. --MarkSweep 02:26, 31 May 2005 (UTC)
You might also look at a non-constant rate parameter. But estimating that might be delicate. Michael Hardy 02:52, 31 May 2005 (UTC)
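As a first, informal check before fitting anything, one can compare the sample variance with the sample mean (they should be roughly equal under a simple Poisson model) and compare the observed share of zero counts with exp(-mean). The sketch below does just that; the function names and the weekly counts are made up for illustration, and it is not a substitute for a proper goodness-of-fit test.

```python
from math import exp
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean; near 1 under a Poisson model."""
    m = mean(counts)
    if m == 0:
        raise ValueError("all counts are zero; the index is undefined")
    return variance(counts) / m


def zero_fraction_excess(counts):
    """Observed share of zeros minus the share exp(-mean) a Poisson model predicts."""
    observed = sum(1 for c in counts if c == 0) / len(counts)
    return observed - exp(-mean(counts))


# Hypothetical weekly counts, invented purely for illustration.
weekly_counts = [0, 0, 1, 0, 3, 0, 0, 2, 0, 0, 5, 0]

if __name__ == "__main__":
    print("dispersion index:", dispersion_index(weekly_counts))
    print("excess zeros:", zero_fraction_excess(weekly_counts))
```

An index well above 1 points towards an overdispersed alternative such as the negative binomial, and a clearly positive zero excess points towards a zero-inflated model, as suggested above.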

Two-argument gamma function?

The article as it stands uses a two-argument function called Γ to define the CDF. The only gamma function Wikipedia knows about takes only one argument. What is this two-argument function? Thanks! — ciphergoth

It's the incomplete Gamma function. The Poisson CDF can be expressed as
\Pr[X\leq k] = Q(k+1, \lambda) = \frac{\Gamma(k+1,\lambda)}{k!} \!
where Q is the upper regularized Gamma function and \Gamma is the upper incomplete Gamma function. Given that
\Gamma(1,\lambda) = \exp(-\lambda)\!
\Gamma(k+1,\lambda) = k\,\Gamma(k,\lambda) + \lambda^k \exp(-\lambda)\!
one can easily show by induction that
\sum_{j=0}^k \Pr[X=j] = \frac{\Gamma(k+1,\lambda)}{k!}\!
holds. --MarkSweep 16:30, 14 October 2005 (UTC)
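The induction can also be checked numerically without any special-function library: the sketch below builds \Gamma(k+1, \lambda) for integer k from the two identities quoted above and compares \Gamma(k+1, \lambda)/k! with the directly summed probabilities. The function names are illustrative only.

```python
from math import exp, factorial, isclose


def upper_incomplete_gamma(k_plus_1, lam):
    """Gamma(k+1, lam) for integer k >= 0, built from the recurrence above."""
    value = exp(-lam)  # Gamma(1, lam) = exp(-lam)
    for j in range(1, k_plus_1):
        # Gamma(j+1, lam) = j * Gamma(j, lam) + lam^j * exp(-lam)
        value = j * value + lam**j * exp(-lam)
    return value


def poisson_cdf_by_sum(k, lam):
    """P(X <= k) as the sum of the mass function over 0..k."""
    return sum(exp(-lam) * lam**j / factorial(j) for j in range(k + 1))


def poisson_cdf_by_gamma(k, lam):
    """P(X <= k) as Gamma(k+1, lam) / k!."""
    return upper_incomplete_gamma(k + 1, lam) / factorial(k)


if __name__ == "__main__":
    for k in (0, 1, 5, 10):
        print(k, poisson_cdf_by_sum(k, 3.0), poisson_cdf_by_gamma(k, 3.0))
```

Both columns agree to rounding error, and for fixed k both tend to zero as lambda grows, which is the behaviour expected of the upper (not the lower) incomplete function.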

Hah! I had exactly the same question. It took me ages to find the answer - via the Wolfram Mathematica website among others - so I've updated the page at that point. I hope consensus is it goes well there. [User: count_ludwig (not yet registered)] 18:30, 17 July 2007 (UTC)

I am still doubting the accuracy of this CDF. I tried in Matlab and it is actually the lower incomplete function which gives the same values as the built-in CDF. Moreover, I agree with, by looking at the bounds of the integral, the lower incomplete function makes more sense than the upper one. Could MarkSweep provide the complete proof? Nicogla 11:46, 21 September 2007 (UTC)

Indeed I agree with this last comment. The cdf is usually defined as the integral from 0 to x; capital gamma (at least as defined in Wiki) is integral from x to infinity, YouRang 11:, 22 08 August 2008

Ack -- made some edits, then reverted them. The issue here is the distinction between the "integral" in the CDF, and the "integral" in the definition of the Gamma function; the limits of the latter are parametrized by lambda, not by k. Unfortunately, numerical maths can be inconsistent in which argument place refers to which thing (and also in things like normalization of the Gamma). You can see that the "upper incomplete" is the correct one by letting lambda go to infinity; the value of the CDF for fixed k should go to zero. Sdedeo (tips) 13:22, 15 October 2008 (UTC)

It's true that this is a form for the CDF of a Poisson:

\Pr[X\leq k] = Q(k+1, \lambda) = \frac{\Gamma(k+1,\lambda)}{k!}

However, note that this reduces to the much simpler form of


where \gamma() is the lower incomplete gamma function. Easy breezy. I would strongly recommend adding this form to the main article, as it's simpler to understand. Borky (talk) 17:34, 15 December 2009 (UTC)


When I was studying statistics (few years back now), the notation used in the independent references we worked from identified the distribution as Po(λ) rather than Poisson(λ). Of course, if someone disagrees, feel free to put it back as it was. Chris talk back 01:58, 31 October 2005 (UTC)

Actually, I do disagree. To a certain extent it's an arbitrary decision, but consider the following factors: (1) I think neither "Po" nor "Poisson" is an established convention, so there is no reason to prefer one over the other; (2) "Poisson" is more descriptive and less confusing; (3) "Poisson" is what we use in a number of other articles (e.g. negative binomial distribution). I'd say there are no reasons to prefer "Po", at least one good reason to prefer "Poisson", plus a not-so-good reason (inertia) to stick with "Poisson". --MarkSweep (call me collect) 04:59, 31 October 2005 (UTC)
When I've seen it abbreviated, I think I've usually seen "X ~ Poi(λ)", with three letters. I'm not militant about it, but I prefer writing out the whole thing. Michael Hardy 22:20, 31 October 2005 (UTC)
Whatever. Personally I think Poi just doesn't look right, but that's a matter of opinion. Chris talk back 23:29, 1 November 2005 (UTC)

Erlang Distribution

There's a reference to the Erlang distribution, but the article does not mention the mutual dependence between the Erlang distribution and the Poisson distribution. That is, the number of occurrences within a given interval follows a Poisson distribution iff the time between occurrences follows an exponential distribution. (unsigned by user:Oobyduby)

CDF is defined for all reals

It has to be a piecewise constant function with jumps at integers. —The preceding unsigned comment was added by PBH (talkcontribs) .

I don't see why. Most books I have referenced (Casella and Berger's Statistical Inference, for example) give the range as non-negative integers. Why should it be piecewise constant? --TeaDrinker 16:12, 30 May 2006 (UTC) Ah, looking at the graph again I see the error. Indeed the CDF should be piecewise constant, not interpolated as has been done. My mistake. --TeaDrinker 16:15, 30 May 2006 (UTC)
How does this look?
It does not quite look like the other (pdf) plot. However it does do the stepwise progression. Cheers, --TeaDrinker 16:32, 30 May 2006 (UTC)
I would do away with the vertical pieces. If you do it in MATLAB, you could probably use something like plot( x, y, '.' ); In any case, this is much better, at least mathematically if not aesthetically. PBH 16:56, 30 May 2006 (UTC)

To me, the mass function seems far easier to grasp intuitively than the cdf, so I wouldn't mind if no cdf graph appeared. In the mean time, I've commented out the incorrect one that appeared. Michael Hardy 02:02, 31 May 2006 (UTC)

I've posted a CDF and then removed one that was grossly misleading. The problem with the pdf and cdf plots here is that it isn't clear that the lines are eye guides and do not represent actual mass. This error is more problematic in the case of the CDF because there is no reason for the eye guide: the cdf (unlike the pdf) has support on the positive real line. The plot I posted also has problems: there should be no vertical lines, and there should be open circles on the right edges of each horizontal line and closed circles at the left edge. Pdbailey 00:17, 2 June 2006 (UTC)

okay, I added these features. If you want to post one that you think looks prettier, please be sure that it meets the definition of the CDF. Pdbailey 02:46, 2 June 2006 (UTC)
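For anyone regenerating the plot, a minimal sketch of the CDF as a function of a real argument may be useful. It is zero below 0, jumps at each non-negative integer, and is constant in between; the value at an integer includes the jump, which is why the closed circle belongs on the left end of each step. The function name is illustrative.

```python
from math import exp, floor


def poisson_cdf(x, lam):
    """P(X <= x) for any real x when X ~ Poisson(lam)."""
    if x < 0:
        return 0.0
    term = exp(-lam)  # P(X = 0)
    total = term
    for j in range(1, floor(x) + 1):
        term *= lam / j  # P(X = j) obtained from P(X = j - 1)
        total += term
    return total


if __name__ == "__main__":
    for x in (-0.5, 0.0, 0.7, 1.0, 1.999, 2.0):
        print(x, poisson_cdf(x, 2.0))
```

Evaluating this on a fine grid and plotting points only (no connecting lines) gives the stepwise picture without vertical pieces.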

Parameter estimation

In the parameter estimation section it is surely not necessary to appeal to the characteristic function?

Expectation is a linear operator and the expectation of each k_i is lambda. Therefore the sum of the expectations of N of them chosen randomly is N lambda and the 1/N factor gives our answer. Surely the characteristic function here is needless obfuscation? --Richard Clegg 14:49, 14 September 2006 (UTC)

I've fixed that. It was very very silly at best. Someone actually wrote that if something is an unbiased estimator, it is efficient and achieves the Cramer-Rao lower bound. Not only is it trivially easy to give examples of unbiased estimators that come nowhere near the CR lower bound, but one always does so when doing routine applications of the Rao-Blackwell theorem. Michael Hardy 20:44, 14 September 2006 (UTC)


The Poisson graphs don't look right. Shouldn't the mean be lambda? It doesn't look like it from the graphs if so. —Preceding unsigned comment added by (talkcontribs)

Well, it's quite hard to visually tell the mean from a function plot, but fortunately in this case the mode is also floor(λ), and in the case of λ an integer there is a second mode at λ−1. I don't see anything that's visually off in Image:Poisson distribution PMF.png. --MarkSweep (call me collect) 07:58, 5 December 2006 (UTC)

Poisson model question

Does a material requisition filling process fit a Poisson model? A wrong requisition is hardly ever generated, so p is very small. X = "Requisitions with errors" —The preceding unsigned comment was added by (talk) 12:40, 19 December 2006 (UTC).

Poisson median formula source and correctness

Implementing the Poisson distribution in C++, I find that quantile(1/2) does not agree with the formula given for the median. The median is about 1 greater than quantile(1/2). Is this formula correct? What is its provenance? Other suggestions? Thanks

Paul A Bristow 16:52, 19 December 2006 (UTC) Paul A. Bristow

Have you tried with the GSL (GNU Scientific library): [3] and [4]? --Denis Arnaud (talk with me) 18:36, 22 March 2007 (UTC+1)
I have checked it by numerical calculation via a self-written program and using formulae from the Numerical Recipes. The formula in the table is almost correct, but 0.2 has to be replaced with 0.02. Then it is fairly correct (the absolute error is less than about 0.001, the relative one even smaller).--SiriusB 14:06, 13 June 2007 (UTC)
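A check of this kind is easy to reproduce. The sketch below finds the median by direct summation (taking the median to be the smallest k with P(X <= k) >= 1/2, which is an assumption about the convention) and compares it with floor(lambda + 1/3 - 0.02/lambda). Function names are illustrative, and the direct summation is only meant for moderate lambda, since exp(-lambda) underflows for very large values.

```python
from math import exp, floor


def poisson_median(lam):
    """Smallest k with P(X <= k) >= 1/2, found by direct summation."""
    k = 0
    term = exp(-lam)  # P(X = 0)
    total = term
    while total < 0.5:
        k += 1
        term *= lam / k  # P(X = k) from P(X = k - 1)
        total += term
    return k


def approximate_median(lam):
    """The table formula with the 0.02 constant suggested above."""
    return floor(lam + 1.0 / 3.0 - 0.02 / lam)


if __name__ == "__main__":
    for lam in (0.5, 1.0, 2.5, 5.0, 10.0):
        print(lam, poisson_median(lam), approximate_median(lam))
```

The formula is an approximation, so this only shows agreement at the values tried; a different quantile convention (for example, strict versus non-strict inequality) could also explain an off-by-one discrepancy like the one reported above.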

UPPER incomplete gamma funct?

Doesn't it make sense that the cdf would be the lower incomplete gamma function rather than the upper? Am I missing something? 23:27, 4 February 2007 (UTC)blinka


Isn't the mode both the floor and if lambda is an integer, the next lower integer as well? Pdbailey 22:23, 26 March 2007 (UTC)

I've added this several times and it has been deleted without comment in the edit summary; please post here if you disagree! Pdbailey (talk) 02:48, 12 February 2008 (UTC)
By the way, if that is the case, then another way to write it is simply \lceil{\lambda-1}\rceil. Chutzpan (talk) 16:32, 23 March 2008 (UTC)
Chutzpan, I can see what you are saying and it is notationally smaller when typeset, but this property is mainly cherished over clarity by mathematicians. I think it is clearer to point out that the distribution is only bimodal when lambda is an integer since otherwise the reader has to take a minute to figure that out. Pdbailey (talk) 17:10, 23 March 2008 (UTC)
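For reference, both forms follow from the ratio of consecutive probabilities:

\begin{align}
\frac{P(X = k+1)}{P(X = k)} &= \frac{\lambda}{k+1}, \\
\frac{\lambda}{k+1} \ge 1 &\iff k \le \lambda - 1 .
\end{align}

So the mass function rises while k < \lambda - 1 and falls once k > \lambda - 1. If \lambda is not an integer, the single mode is \lfloor\lambda\rfloor = \lceil\lambda\rceil - 1. If \lambda is an integer, the ratio equals 1 at k = \lambda - 1, so P(X = \lambda - 1) = P(X = \lambda) and both values are modes; \lceil\lambda\rceil - 1 then names only the smaller of the two.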


"Albert Einstein used Poisson noise to show that matter was composed of discrete atoms and to estimate Avogadro's number; he also used Poisson noise in treating blackbody radiation to demonstrate that electromagnetic radiation was composed of discrete photons."

These claims need their respective citations. They are far from being "common knowledge" about Einstein, at least in the specific wording that the claims use. I am removing them until the proper citations are given.

Even with citations, this is too specific for this article. Many thousands of scientific endeavors use Poisson processes of one kind or another. McKay 04:03, 12 April 2007 (UTC)
I think that, if the citations support the claims, the claims are historically interesting. However, I'm not sure if the claims are fully supported. For example, did Einstein's 1905 Brownian motion article talk about a Poisson noise or rather about a Gaussian noise? Was the editor referring to this or to another article? And with regard to the claim about the blackbody radiation, the first entry in this talk page had already doubted about its validity. I will ask editors of Wikipedia's Albert Einstein article anyway. (Sorry, I forgot to sign last time. Another Wikipedian 05:36, 12 April 2007 (UTC))

Formula in complex analysis

I know very little about statistics, but it seems to me the article does not discuss the Poisson formula in complex analysis, which I know. I am thinking of renaming the newly created Schwarz formula to Poisson formula, replacing the redirect. Any feedback? -- Taku 09:57, 28 April 2007 (UTC)

section order

I propose that the first section after the introduction, regarding shot noise, should be folded into the examples section as a bullet. It is already covered very thoroughly in the article on shot noise, and I'm not sure what's so much more interesting about this example than any of the others. Pdbailey 13:59, 17 May 2007 (UTC)

Posterior of a Gamma-distributed prior for a poisson parameter

Somehow, the formula given for the posterior distribution makes little sense (in the sense that it is not stable - it will not yield agreeing distributions if fed with data specifically tailored to meet the prior). I'd have to check with my textbooks (I will when time allows), but it seems to me that there's a typo - browsing the net, I found a similar (yet different) one:


That came from here, if you're interested (and I don't really know how trustworthy that page is... so I'd still check some textbooks). In any case, that new formula clearly fits better with the assertion that

The posterior mean E[λ] approaches the maximum likelihood estimate \widehat{\lambda}_\mathrm{MLE} in the limit as \alpha\to 0,\ \beta\to 0.



(when n\to \infty, which is another change)

Correct posterior is \lambda \sim \mathrm{Gamma}(\alpha + \sum_{i=1}^n k_i, \beta + n). \! Note that Gamma is usually parameterised by shape and rate (inverse scale) when used as a conjugate prior for the Poisson, not by shape and scale. —Preceding unsigned comment added by (talk) 07:18, 30 May 2008 (UTC)

That whole section is nonsense. There is no reason that the prior needs to be a gamma distribution and there are many applications where this flatly doesn't hold. Driving the parameters of the prior to zero returns us to exactly where we started: an assertion that the MLE(lambda)=k. So why did we bother with the Bayesian inference? - (talk) 20:46, 13 February 2011 (UTC)
No, it's not nonsense. The prior doesn't need to be a gamma, but, as it says, the conjugate prior is a gamma. You wouldn't usually use zero parameters in practice for Bayesian inference. Qwfp (talk) 08:13, 14 February 2011 (UTC)
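To make the update concrete, here is a small sketch of the conjugate computation using the shape/rate form of the posterior quoted above, Gamma(alpha + sum of k_i, beta + n). The function names and the sample counts are invented for illustration.

```python
def gamma_posterior(alpha, beta, counts):
    """Shape and rate of the Gamma posterior given a Gamma(alpha, beta) prior
    (shape/rate parameterisation) and observed Poisson counts."""
    return alpha + sum(counts), beta + len(counts)


def posterior_mean(alpha, beta, counts):
    """Mean of the Gamma posterior, which is shape divided by rate."""
    shape, rate = gamma_posterior(alpha, beta, counts)
    return shape / rate


# Made-up observations, for illustration only; their sample mean (the MLE) is 2.
counts = [3, 0, 2, 4, 1]

if __name__ == "__main__":
    print(gamma_posterior(4, 1, counts))         # informative prior pulls the mean up
    print(posterior_mean(4, 1, counts))
    print(posterior_mean(1e-9, 1e-9, counts))    # nearly flat prior: close to the MLE
```

With an informative prior the posterior mean sits between the prior mean and the sample mean; it collapses to the sample mean only as alpha and beta go to zero, which is the limiting statement in the article rather than a recommendation for practice.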

I added a reference to A Compendium of Conjugate Priors, by Fink to establish that the conjugate prior is a gamma. Found via:, which is a helpful overview of the relationships. The paper is at: — Preceding unsigned comment added by (talk) 17:04, 4 November 2012 (UTC)

Web server example: Repeat visitors vs. first-time visitors

The "Occurrence" section currently reads:

Examples of events that may be modelled as a Poisson distribution include: ...
• The number of times a web server is accessed per minute.

Since website visitors tend to click around a multi-page website at a click-rate which differs from the arrival rate, may I suggest any of the following amendments:

  1. The number of times a web page is accessed per minute.
  2. The number of times a web server is accessed per minute by new, unique visitors.

-- JEBrown87544 04:18, 2 September 2007 (UTC)

Maximum of the distribution

It would be useful to add information on calculating the maximum of the distribution.

The Poisson distribution has its maximum between (lambda - 1) and lambda, because poisson(x, lambda) = poisson(x + 1, lambda) gives the result x = lambda - 1. We look for two equal probability values that are distant from one another by 1. This gives us a pretty good clue where the maximum is.

Since lambda doesn't have to be an integer, we have to consider floor(lambda - 1) and ceil(lambda) as the possible values for the maximum. Also (floor(lambda - 1) + 1) can be the maximum, so we consider this too. It seems safe to assume that there are three values for the maximum to consider:

floor(lambda - 1), floor(lambda - 1) + 1, floor(lambda - 1) + 2

But we need to make sure that (\lambda - 1) is not negative.

The C code for calculating the maximum is:

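 #include <assert.h>
 #include <gsl/gsl_randist.h>

 /* Return a mode (a value of k with maximal probability) of Poisson(lambda). */
 int poisson_max(double lambda) {
   assert(lambda > 0);
   /* The maximum lies between lambda - 1 and lambda; start just below it. */
   int k_ini = (int)(lambda - 1);
   if (k_ini < 0)
     k_ini = 0;
   int k_max = k_ini;
   double f_max = gsl_ran_poisson_pdf(k_max, lambda);
   /* We choose the max of k_ini, (k_ini + 1), and (k_ini + 2). */
   for (int k = k_ini + 1; k <= k_ini + 2; ++k) {
     double f = gsl_ran_poisson_pdf(k, lambda);
     if (f > f_max) {
       f_max = f;
       k_max = k;
     }
   }
   return k_max;
 }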


Thanks & best, 19:16, 6 October 2007 (UTC)Irek

This is called the mode and it is on the page as such. It is lambda or lambda and the number one less if it is an integer. Pdbailey 20:39, 6 October 2007 (UTC)

Thanks for the info! —Preceding unsigned comment added by (talk) 21:01, 6 October 2007 (UTC)

Sorry, that should be floor(lambda), or lambda and lambda - 1 in the case of an integer. Pdbailey 21:30, 6 October 2007 (UTC)


I think the article should mention pronunciation of poisson. —Preceding unsigned comment added by (talk) 22:04, 9 December 2007 (UTC)

Is it really pronounced with a nasal? In French yes, but in English? --Jirka6 (talk) 08:46, 15 February 2013 (UTC)


The introduction to this article is excellent, but given the importance of the Poisson distribution to many fields such as, for example, call-center management (where the average practitioner may not necessarily have a mathematical background), it would be desirable to make the rest of the article more comprehensible. (talk) 12:52, 12 January 2008 (UTC)

I agree, too technical. Article needs a simpler example kind of a thing. - Niri / ನಿರಿ 11:21, 26 December 2010 (UTC)

May I suggest a modification to the graphs, one that presents it in terms of concrete things rather than abstract symbols (k and lambda) that need to be looked up?

Imagine that this is being read by someone who isn't even familiar with the convention of expressing probabilities as a proportion of 1. Only one axis is labeled. One page further down, it is explained that "k is the number of occurrences of an event". What? When? Where?

If we need to have k and lambda in there, let's have axis labels and a legend that define them:

lambda = number of events we expect to observe (on average)

k = number of events we really observe

p = probability of observing k events

We could have a caption something like

"Example of use: If one event is expected in the observations (e.g. the event happens on average once every decade, and your observations cover a decade) then the chance of having no events in the observations is 0.37 (37%), the probability of seeing one event is 0.37, the probability of seeing two events is 0.18, etc.."

For the second graph, the same, only let it read "one or zero events (k<=1),... two events or fewer... three events or fewer..."

This implies all that the current caption says. By character count it's twice as long, though. Any other suggestions?

If people get the basic idea upfront, from the graphs, they will get much more out of the article.

HLHJ (talk) 19:04, 20 May 2008 (UTC)


Why is the mode not stated as \lceil\lambda - 1\rceil? It is simpler than the currently stated formula and, as far as I understand, equivalent. Uzytkownik (talk) 12:04, 26 March 2008 (UTC)

I've answered this question in the first section on the talk page that is titled, "mode." Pdbailey (talk) 01:58, 27 March 2008 (UTC)


The definition given here seems very time-centric:

In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a number of events occurring in a fixed period of time if these events occur with a known average rate and independently of the time since the last event. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.

Are there other dimensions besides time and space that could apply? If not I suggest simply saying 'time or space' (volume, area and distance all being spatial). In any case, surely a broader definition should be given. Richard001 (talk) 09:27, 3 April 2008 (UTC)

Richard001, it's a little harder with space because the event usually already occurred. i.e. the number of stars within a certain portion of the sky. Pdbailey (talk) 03:09, 10 May 2008 (UTC)


Many of the examples given in the "Occurence" section are probably not Poisson. It might be better to have a much shorter list of easily defensible examples. OliAtlason (talk) 23:02, 18 April 2008 (UTC)

Agreed, be bold! Pdbailey (talk) 03:07, 10 May 2008 (UTC)

Schwarz formula

Why is Schwarz formula in the See Also section ? How is it relevant to the poisson distribution ? Is someone confusing poisson kernel with poisson distribution ? —Preceding unsigned comment added by (talk) 04:09, 14 May 2008 (UTC)

Well spotted, I've (finally) removed it Quietbritishjim (talk) 14:46, 3 January 2010 (UTC)

The law of rare events

I tried to clarify the meaning of "rare" in this term -- it applies to the (very many) individual Bernoulli variables being summed, not the result. The later "law of small numbers" should probably be moved into this section by the way, but I'm not sure how to do it cleanly. Quietbritishjim (talk) 15:32, 9 June 2008 (UTC)

Error (typo style)?

The text in case reads:

since the current fluctuations should be of the order \scriptstyle\sigma_{I} = e\sqrt{N/t\  } (i.e. the variance of the Poisson process),

The equation appears to describe the standard deviation rather than the variance as suggested by the text in parentheses. Is this a typo? —Preceding unsigned comment added by (talk) 06:55, 27 June 2008 (UTC)

I concur - it's the standard deviation. Not quite sure enough to fix it, though. Bryanclair (talk) 07:29, 13 October 2008 (UTC)

That was the standard deviation. The sentence is very imprecise, but now it doesn't disagree with the rest of the text in this way. Pdbailey (talk) 15:15, 13 October 2008 (UTC)

Related distributions

I think that the statement ( the generalization ) in the second item of "Related distributions" is only true if  X_1, X_2, ..., X_n are independent.

Dr. Francisco Javier Molina Lopez —Preceding unsigned comment added by (talk) 08:31, 20 July 2008 (UTC)

Square root

The article states:

  • Variance stabilizing transformation: When a variable is Poisson distributed, its square root is approximately normally distributed with expected value of about \sqrt \lambda and variance of about 1/4. Under this transformation, the convergence to normality is far faster than the untransformed variable.

Can we have a reference for that? Also, in what sense is the convergence faster? McKay (talk) 00:36, 2 November 2008 (UTC)

McKay, I put in a reference for the variance stabilizing transformation claim. It is in many places so I used the textbook that is most verbose on the subject. The convergence rate comes from experience, I recognize a better reference is required. Give me some time. PDBailey (talk) 01:50, 2 November 2008 (UTC)
McKay, regarding the convergence rate: There exists some weight outside of the support for the Poisson when you use the normal approximation. Because of this, the square root is better whenever this is important (i.e. small lambda in the distribution case, or small observed counts in the data case). The square root approximation also is better for use with data because it is a variance stabilizing transformation. When you get a draw with 89 counts, you not only don't know the value of the mean, you also do not know the value of the variance. In contrast, in the transformed space, you have very precise knowledge of the variance and can construct better confidence intervals. That said, thinking of Y as an RV, the 95% confidence intervals for the two approximations form approximate bounds for the 95% confidence interval of the Poisson distribution.
With all this said, I'd be fine to remove the rate claim. I did not look for a reference, nor do I care to. PDBailey (talk) 19:10, 2 November 2008 (UTC)

I propose to add the following online reference in addition to ref. 2, where this issue is discussed: (talk) 22:49, 10 December 2008 (UTC)

Abraham de Moivre

According to

it was A. de Moivre who "publishes [in 1711] a (largely overlooked) derivation of the Poisson distribution ( Poisson's better-known derivation was published in 1837)."

I find this worth mentioning. —Preceding unsigned comment added by Howeworth (talkcontribs) 23:50, 17 January 2009 (UTC)

on the CDF

Bikasuishin just updated the CDF to remove a leading 1 minus that was added by about a week prior. So that after Bikasuishin's edit, it is.

\frac{\Gamma(\lfloor k+1\rfloor, \lambda)}{\lfloor k\rfloor !}\!\text{ for }k\ge 0

Since this appears to be a point of contention, it is worth noting that (using the definition of the Incomplete gamma function and using only integer values of k)

\frac{k! \ e^{-\lambda} \sum_{i=0}^{k} \frac{\lambda^i}{i!}}{k !}

which simplifies to

e^{-\lambda} \sum_{i=0}^{k} \frac{\lambda^i}{i!}

Isn't this a vastly simpler CDF function? Indeed, one can easily see that it is increasing in k and even calculate it without clicking around to other pages. PDBailey (talk) 23:37, 8 February 2009 (UTC)

That's arguably simpler, but it's a lot less useful to the reader who wants to check a quick way to compute the CDF in a practical situation (which, by the way, is the reason I came to this page). It's much more efficient to use the representation in terms of the incomplete gamma function than the discrete sum. Convincing oneself that the corresponding formula is correct is a simple matter (provided one knows partial integration, at any rate). In case it's not immediately obvious, we can provide a derivation of the "gamma" version, but that's the one that belongs in the infobox. Bikasuishin (talk) 23:56, 8 February 2009 (UTC)
Both are CDFs and correct, but I am not really sure how one can integrate a function that is only non-zero at singularities without some familiarity of at least one of the non-Riemann integration concepts. The sum and floor function has the advantage of not requiring that in addition to being immediately derivable. I would propose that the CDF used for fast calculation needs a note in the text, not the other way around. PDBailey (talk) 04:23, 9 February 2009 (UTC)
What do you mean by "only non-zero at singularities"? You can't find much smoother than a function like t^k e^{-t}, and the integral is certainly a convergent indefinite integral in the Riemann sense. But that's beside the point. To compute the incomplete Gamma function, you don't write a numerical integration routine (that's not the proper way to do it anyway). You do the same as you would for the error function: you fire up your favorite math software package (SAGE, Maple, whatever). Bikasuishin (talk) 09:56, 9 February 2009 (UTC)
What is the pdf of the Poisson at any point in k \in (3,4)? also, R, for example, does not have a built in incomplete gamma function. You could back one out (at the integers) using the CDF of the Poisson though. PDBailey (talk) 13:59, 9 February 2009 (UTC)

The leading "1-" is correct. A cumulative distribution rises steadily from 0 to 1. The Gamma function peaks at 0, so the present form is impossible!

This seems to be a confusion due to two extant definitions of the incomplete gamma function; see the upper and lower definitions at incomplete gamma function. If we are consistent and use a capital Γ for the upper function, then the "1-" is indeed required. McKay (talk) 03:54, 5 September 2014 (UTC)

Zero-deleted or doubly-truncated

I can't seem to find anything on a zero deleted Poisson distribution. Could it be called something else? —Preceding unsigned comment added by (talk) 07:00, 13 March 2009 (UTC)

Simpler Derivation of Poisson Distribution PDF

The current derivation of the PDF from the binomial distribution seems a little too lengthy to me. The following is a little more succinct, comments welcome.


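\begin{align}
\lim_{n\to\infty} P(X=k)&=\lim_{n\to\infty}{n \choose k} p^k (1-p)^{n-k} \\
&=\lim_{n\to\infty}{n! \over (n-k)!k!} \left({\lambda \over n}\right)^k \left(1-{\lambda\over n}\right)^{n-k}\\
&=\frac{\lambda^{k}}{k!}\lim_{n\to\infty}\left(1-\frac{\lambda}{n}\right)^{n}\cdot\lim_{n\to\infty}\left(1-\frac{\lambda}{n}\right)^{-k}\cdot\lim_{n\to\infty}\frac{n!}{(n-k)!\,n^{k}} \\
&=\frac{\lambda^{k}}{k!}e^{-\lambda}\cdot 1\cdot\lim_{n\to\infty}\frac{\sqrt{2\pi n}\,n^{n}e^{-n}}{\sqrt{2\pi (n-k)}\,(n-k)^{n-k}e^{-(n-k)}\,n^{k}} \\
&=\frac{\lambda^{k}}{k!}e^{-\lambda}\cdot 1\cdot e^{-k}\lim_{n\to\infty}\sqrt{\frac{n}{n-k}}\left(\frac{1}{1-\frac{k}{n}}\right)^{n-k} \\
&=\frac{\lambda^{k}}{k!}e^{-\lambda}\cdot 1\cdot e^{-k}\cdot e^{k}\lim_{n\to\infty}\left(\frac{1}{1-\frac{k}{n}}\right)^{-k}\\
&=\frac{\lambda^{k}}{k!}e^{-\lambda}\cdot 1\cdot e^{-k}\cdot e^{k}=\frac{\lambda^{k}}{k!}e^{-\lambda}
\end{align}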


RyanC. (talk) 00:22, 8 October 2009 (UTC)

You are using Stirling Formula with a compensation term (which gets canceled out in the division). Otherwise both proofs are exactly the same (the one in the article is more verbose to explain what is going on). --Bart weisser (talk) 11:46, 21 October 2009 (UTC)
One thing I forgot to use is the squeeze theorem or something similar to show that if the limit of the approximated term equals one then so does the exact term. Maybe I'll get around to doing that when I have time, but either this derivation or the existing one should probably include that to complete the proof, otherwise the result is technically just an approximation of the limiting case of the binomial distribution.
RyanC. (talk) 02:24, 27 October 2009 (UTC)
I don't see how you can pull the lambda out of the limit since lambda=pn. Proof in the main page seems to have the same problem unless I'm missing something. Mbroshi (talk) 21:20, 26 July 2011 (UTC)
Nevermind--I was missing something. Mbroshi (talk) 21:31, 26 July 2011 (UTC)

Graphs: continuous vs discrete

I see that the graphs' authors noted that the index/X-values should be discrete and the lines are for visual aid, but I had to read the description to see that. I suggest a graph with only discrete values...there are enough points that I think the average user should be able to see the trend. This can clear up one of the more confusing aspects of various distributions (discrete vs continuous) at an immediate glance of the figure. -- Bubbachuck (talk) 01:21, 12 October 2009 (UTC)

You have my vote on this (for the PDF)! --Keilandreas (talk) 02:41, 25 October 2010 (UTC)
I put a plot without guidelines here: -- I don't like it as much. Skbkekas (talk) 15:26, 29 October 2010 (UTC)
Hey, thanks! I clearly prefer the new version without guidelines. It avoids confusion by directly pointing the viewer to the fact that the distribution is only defined for integers. I definitely vote for replacing the old (guidelined) version. The color of the data points clearly shows the relationship among them. And after all, I believe that guidelines are not a valid tool when representing functions in graphs. --Keilandreas (talk) 18:26, 3 November 2010 (UTC)

Poisson law of large numbers - question

This name has been added as an alternative in the lead (with a citation). Anyone fully informed about this? A quick web search finds a mixture of meanings, half of which seem to be saying this is equivalent to the Poisson distribution, and half saying it means something which is at least close to law of large numbers and not a distribution at all. So, is this just a mistake which has been transmitted to several places, or is there a strong basis for this? If it is worth including, should it be so near the start? Melcombe (talk) 17:32, 11 November 2009 (UTC)

The law of rare events

I've updated this section. I've really kept the same ideas and layout, but updated the presentation considerably. Most of all I've removed the junk in the proof, such as the several occasions that an equality would have a limit as n\to\infty on one side and an expression depending on n on the other; or for example the nonsense "If the binomial probability can be defined such that p = λ / n". I also added a citation needed tag to the generalisation at the end of this section. Previously PlanetMath had been cited, but it didn't give a proof (its proof "sketch" certainly isn't convincing) and as it's user-contributed it's not reliable enough for us to just take its word for it. I think this section is much better now, but it's still far from perfect. Quietbritishjim (talk) 14:52, 3 January 2010 (UTC)

I know you are a mathematician, but I believe that, by nature, the asymptotic equivalence of Poisson and Binomial distributions is contingent on the fact that n\to\infty. The other implication, resulting from this, is that p = λ / n. So this begs the question of how these two assumptions can be considered, from your point of view, redundant? (Bart Weisser) —Preceding unsigned comment added by (talk) 17:15, 7 January 2010 (UTC)

(No need to acknowledge me as a mathematician, I'm only a research student and everyone's free to discuss and edit anyway.) As I said, I didn't change any content, only its presentation, so p = λ / n and n\to\infty were in there both before and after my edit.

  • If you're talking about the sentence I quoted above, then it was a bit harsh of me to call it nonsense (although I misread it at first so it's certainly not clear), but it's an awkward way of thinking about it: it says "if we have some binomial variables and their p 's just happen to equal λ / n for some λ", whereas I've said "let us define some binomial variables with their p 's chosen to be λ / n".
  • If you're talking about my removal of some instances of n\to\infty, that's because sometimes to calculate the limit of a sequence we first do some manipulation of the sequence for finite n. For example, before my edit the article included the following statement:

\lim_{n\to\infty}\log\left(F\right) =
\log\left(n!\right) - k\log\left(n\right) - \log\left[\left(n-k\right)!\right],
(F was the old notation for A_n; I changed it to show the n dependence and so it looked less like the CDF.) This is wrong because the left hand side =\infty, so it doesn't make sense to say it equals something in terms of n. What the author(s) meant was that this holds for all _finite_ n, which is useful in finding the limit, although not enough on its own (if we tried applying the properties of limits to this we would end up trying to calculate \infty-\infty).

Quietbritishjim (talk) 11:22, 11 January 2010 (UTC)

Thanks for clarifying. I thought about it shortly after I posted the reply, and I guess it makes sense, that this is definitely not a limit. For the sake of formalism, the "as n goes large" statement should be enough (Bart Weisser) —Preceding unsigned comment added by (talk) 21:24, 18 January 2010 (UTC)
This section is complete bull shit. Poisson distribution exists in this much simpler way: when the arrival time is exponential distributed, then the number of arrival is Poisson distributed. And the relation is precise. There is no need to do approximation. Jackzhp (talk) 22:28, 20 November 2010 (UTC)

Poisson statistics article needed

This article is the redirect for Poisson statistics, but it is actually not a very good discussion of statistics; it is mostly aimed at math. I think it might be useful to put in a separate article that is actually about Poisson statistics, which would link to this article for the mathematical details. Geoffrey.landis (talk) 16:50, 14 December 2010 (UTC)

Example Needed

Come on, people - there is not one single example of the simple usage of the Poisson distribution in this article!!! New Thought (talk) 12:16, 27 March 2011 (UTC)

Lambda = zero?

Why can't lambda equal zero? I notice the article for Skellam distribution specifies that its two means - which correspond to individual Poisson lambdas - may equal zero, so I'm just checking if there's not an oversight. It would seem nice for generality, as a process can of course have a rate of zero. --Iae (talk) 23:21, 14 July 2011 (UTC)

tail probability

recent edits added the phrase, "tail probability" what the heck is a tail probability? 018 (talk) 00:31, 14 August 2011 (UTC)

Copyright problem removed

Prior content in this article duplicated one or more previously published sources. The material was copied from: Infringing material has been rewritten or removed and must not be restored, unless it is duly released under a compatible license. (For more information, please see "using copyrighted works from others" if you are not the copyright holder of this material, or "donating copyrighted materials" if you are.) For legal reasons, we cannot accept copyrighted text or images borrowed from other web sites or published material; such additions will be deleted. Contributors may use copyrighted publications as a source of information, but not as a source of sentences or phrases. Accordingly, the material may be rewritten, but only if it does not infringe on the copyright of the original or plagiarize from that source. Please see our guideline on non-free text for how to properly implement limited quotations of copyrighted text. Wikipedia takes copyright violations very seriously, and persistent violators will be blocked from editing. While we appreciate contributions, we must require all contributors to understand and comply with these policies. Thank you. Danger (talk) 11:48, 9 October 2011 (UTC)

Derivation of the Poisson Distribution from the Exponential Distribution

The Poisson Distribution can also be derived from the Exponential Distribution.

Using the football analogy, let f(t)=\lambda e^{-\lambda t} ,t\ge 0 be the probability density function for scoring a goal at time t.

A match is a time interval of unit length (0, 1]. A team scores on average λ goals per match.

Then let p(k;\lambda ) be the probability of scoring exactly k goals in a match.

To score no goals in a match means you don’t score in the interval (0, 1]. Therefore,

p(0;\lambda )=\int _{1}^{\infty }\lambda  e^{-\lambda t} dt=-\left[e^{-\lambda t} \right]_{1}^{\infty } =e^{-\lambda }

To score exactly one goal at time x, where 0 < x ≤ 1, means you cannot then score again in the interval (x, 1]. Because the probability density function is “memoryless”, this is calculated as not scoring in the interval (0, 1-x]. Therefore,

p(1;\lambda )=\int _{0}^{1}\lambda e^{-\lambda x}  \left(\int _{1-x}^{\infty }\lambda e^{-\lambda t}  dt\right)dx=\int _{0}^{1}\lambda e^{-\lambda x}  e^{-\lambda (1-x)} dx=\lambda e^{-\lambda }

To score the first goal at time y, where 0 < y ≤ 1, and the second goal at time x + y, where x > 0 and x + y ≤ 1, means you cannot then score again in the interval (x + y, 1]. Therefore,

p(2;\lambda )=\int _{0}^{1}\lambda e^{-\lambda y}  \left\{\int _{0}^{1-y}\lambda e^{-\lambda x} \left(\int _{1-x-y}^{\infty }\lambda e^{-\lambda t}  dt\right)dx \right\}dy=\int _{0}^{1}\lambda e^{-\lambda y}  \left\{\int _{0}^{1-y}\lambda e^{-\lambda x} e^{-\lambda (1-x-y)} dx \right\}dy=\int _{0}^{1}\lambda ^{2} e^{-\lambda } (1-y) dy=\frac{\lambda ^{2} }{2} e^{-\lambda }

To score the first goal at time z, where 0 < z ≤ 1, and the second goal at time y + z, where y > 0 and y + z ≤ 1, and the third goal at time x + y + z, where x > 0 and x + y + z ≤ 1, means you cannot then score again in the interval (x + y + z, 1]. Therefore,

p(3;\lambda )=\int _{0}^{1}\lambda e^{-\lambda z}  \left\langle \int _{0}^{1-z}\lambda e^{-\lambda y}  \left\{\int _{0}^{1-y-z}\lambda e^{-\lambda x} \left(\int _{1-x-y-z}^{\infty }\lambda e^{-\lambda t}  dt\right)dx \right\}dy\right\rangle dz=...=\frac{\lambda ^{3} }{3!} e^{-\lambda }

The above formulae can be seen to be following the generic pattern:

p(k;\lambda )=\frac{\lambda ^{k} }{k!} e^{-\lambda }

Stuart.winter02 (talk) 17:16, 12 March 2012 (UTC)

Evaluating the Poisson Distribution

This edit '' removed a section on how to evaluate the poisson distribution, 'as relevance unexplained and unsourced'. Fair enough. The issue remains however that a naive evaluation of the Poisson distribution may lead to a serious or complete loss of precision. So something to address this is needed. Lklundin (talk) 11:09, 5 May 2012 (UTC)

Why is something needed? There are no similar sections for articles on other distributions. There is nothing to the problem of numerical evaluation that is specific to the Poisson distribution, or is there? Possibly the problem would be better addressed in an article on numerical computations. Is there a specific need for values of the pmf, compared to values of the cumulative distribution function ... the latter may be better implemented via existing algorithms for the incomplete gamma function (via the route between the Poisson and chi-squared distribution functions now included in the article).
The numerical stability (i.e. how accurate a straight-forward evaluation is on an actual computer) is very different for different distributions. Numerical evaluation of for example the normal distribution does not run into problems nearly as easy as does the Poisson distribution. The problem with the straight-forward evaluation of the Poisson distribution on an actual computer is that the dividend and divisor can quite easily reach values that exceed what is representable on a normal computer, causing the subsequent division to yield an inaccurate or even meaningless result. It is easy to verify that actual (e.g. online) Poisson distribution calculators do not simply perform a straight-forward evaluation. I agree that addressing the evaluation only for the Poisson distribution is not optimal. I think on the other hand that it is useful that the Poisson distribution article has this information available, either directly in the article, or via a link. Although the evaluation example of (150,150) in the now-removed section was original research, the much improved numerical stability of the easily derived alternative evaluation method is easy to see. I could not quickly google a reference to an improved evaluation method for the Poisson distribution, which makes me think that this is too trivial. If this is really the case, I don't think this implies that the topic of the evaluation is unsuitable for the article. Lklundin (talk) 20:28, 5 May 2012 (UTC)
Wikipedia has standards for what is includable ..."notability" and, if something is "too trivial" to have been mentioned in publications, then it is clearly not notable. But I have found a source: Johnson, N.L., Kotz, S., Kemp, A.W. (1993) Univariate Discrete Distributions (2nd edition). Wiley. ISBN 0-471-54897-9, p165 ....this gives some computational formulae, including the recursive calculation of the log-probabilities and provides references to comparisons and seemingly even better computational algorithms. Specifically, they reference (I haven't seen this): Fox BL & Glynn PW (1988) "Computing Poisson Probabilities", Communications of the ACM, 31, 440-445. Given this, there is a possibility of finding source code in the usual places such as netlib (I haven't looked).
Similar problems do occur for other distributions, for example the binomial and hypergeometric distributions. Computational formula are presented in binomial coefficient, and not in binomial distribution. There seems to be no problem here specific to the Poisson distribution but rather exactly the same sort of problem arises in many instances of converting an algebraic formula for numerical calculation. Melcombe (talk) 23:46, 5 May 2012 (UTC)
According to the documentation, the R function dpois for Poisson density is based on C code contributed by Catherine Loader. The algorithm is described in loader2000Fast.pdf. The source code can be found in the R sources standalone math library (in RHOME/src/nmath/dpois.c):

double attribute_hidden dpois_raw(double x, double lambda, int give_log)
{
    /*       x >= 0 ; integer for dpois(), but not e.g. for pgamma()!
        lambda >= 0
    */
    if (lambda == 0) return( (x == 0) ? R_D__1 : R_D__0 );
    if (!R_FINITE(lambda)) return R_D__0;
    if (x < 0) return( R_D__0 );
    if (x <= lambda * DBL_MIN) return(R_D_exp(-lambda) );
    if (lambda < x * DBL_MIN) return(R_D_exp(-lambda + x*log(lambda) -lgammafn(x+1)));
    return(R_D_fexp( M_2PI*x, -stirlerr(x)-bd0(x,lambda) ));
}

double dpois(double x, double lambda, int give_log)
{
#ifdef IEEE_754
    if(ISNAN(x) || ISNAN(lambda))
        return x + lambda;
#endif

    if (lambda < 0) ML_ERR_return_NAN;
    if (x < 0 || !R_FINITE(x))
        return R_D__0;

    x = R_D_forceint(x);

    return( dpois_raw(x,lambda,give_log) );
}

Most of this is argument checking. The calculation of the density in function dpois_raw:

  R_D_exp(-lambda + x*log(lambda) -lgammafn(x+1))  

is not complicated. The idea of computing ratios of large (or small) numbers using logarithms to avoid overflow or underflow is a standard method, and does not seem to require special mention in the Poisson article. If it did, then every distribution that involves a gamma function or factorial would require similar sections. Mathstat (talk) 12:06, 6 May 2012 (UTC)

Small numbers vs Large numbers

I know it is often talked of the "law of small numbers". However the specific reference given is about the Poisson law of large numbers.

Here I link a quick shot of the book page: [5] Magister Mathematicae (talk) (Gullberg, 1997) 04:58, 20 October 2012 (UTC)

Informal term "Poisson mean"?

I was confused by this term as it is not defined. If it is common practice to refer to the mean of a "SomeName" distribution as "SomeName mean", perhaps this could be clarified? If it isn't common practice, why not just refer to a "mean of the Poisson distribution"? Craniator (talk) 03:29, 18 March 2013 (UTC)

Confusing wording in section "Confidence Interval"

This section refers to "chi-square deviate with lower tail area p and degrees of freedom n". From googling and wikipediaing, "deviate" isn't a well defined term. Also an authoritative notation for 2-parameter chi-squared distribution is difficult to find online, and I was under the impression that p is the area to the right of a threshold. Here, p is referred to as the lower tail area. Is this in fact the case? Craniator (talk) 23:16, 17 March 2013 (UTC)

A probabilist would say "the p-quantile of the chi-square distribution with n degrees of freedom". But statisticians have their language dialect. :-) Boris Tsirelson (talk) 16:42, 18 March 2013 (UTC)

Prime number section

In my view the section on prime numbers does not belong in this article. It is a slightly interesting fact about prime numbers, but does not provide information on the Poisson distribution. It is also perilously close to a copyvio of [6]. McKay (talk) 02:05, 21 March 2013 (UTC)

Looking at this again, I notice that the given source does not even prove it. The source says only that it follows from an unproved conjecture of Hardy and Littlewood. This makes it even less appropriate for this page and I'm deleting it. McKay (talk) 06:24, 24 September 2013 (UTC)

Definition query

Can some expand on or explain the statement:

  • \lambda=\lambda T, when the number of events occurring will be observed in the time interval T=1.

and how this differs from saying something as banal as

  • X=X Y, when  Y=1.

Chuunen Baka (talkcontribs) 11:20, 25 July 2013 (UTC)

Indeed that phrase makes no sense; I delete it. Boris Tsirelson (talk) 11:55, 25 July 2013 (UTC)

Error in the relationship with chi-squared distribution

There was an error in the statement of the relationship to chi-squared distribution. The following was on the previous version of the page:

F_\text{Poisson}(k;\lambda)  = F_{\chi^2}(2\lambda;2(k+1))  \quad\quad \text{ integer } k,

However, the correct relationship is this:

F_\text{Poisson}(k;\lambda)  = 1-F_{\chi^2}(2\lambda;2(k+1))  \quad\quad \text{ integer } k,

My source is Section 4.5 (page 167 in my electronic version) of the 3rd edition (2005) of Johnson, Kotz, and Kemp's "Univariate Discrete Distributions". The previous version of the article cites (for the incorrect result) the 2nd edition (1993) of the same book. I do not know if the error is present in the 2nd edition of Johnson, Kotz and Kemp since I don't have it. I left the reference unchanged since I couldn't figure out how to do that (I am a *very* occasional Wikipedia editor -- could someone with more skill please fix that?). However, I did correct the statement relating the p.m.f. of Poisson random variable to chi-squared distribution that immediately follows the relationship between the two distributions.

Bullmoose953 (talk) 03:51, 27 July 2013 (UTC)

Degree of spread measured in... what units?

Sorry, I'm still struggling with understanding the concept, and I'd hoped for the description to be more precise. Compare this to other things that measure spread, for example, STD or SE: they use the same units as the variable being measured. I.e. if you measure the height of a human population in centimetres, then the standard error will measure the spread of values in your measurements in centimetres as well. Poisson distribution doesn't seem to use the same units... or does it? I'm struggling to make sense of the numbers I'm getting from a formula, and it just doesn't add up to anything :( sorry. (talk) 19:14, 25 May 2014 (UTC)

Poisson distribution, in contrast to the normal (and many other) distributions, is for integers only. Thus the notion of units is not applicable; "someone typically gets 4 pieces of mail per day" — could you say this in different units? meters? kilograms? No, never. "Piece" is not a unit. In physics (and chemistry, etc) such quantities are treated as dimensionless. They are the same in all systems of units (SI, CGSE etc). Boris Tsirelson (talk) 21:03, 25 May 2014 (UTC)
Well, you misunderstood me. I don't mean it has to be some special unit (as in physics), it has to relate to the units of the feature being researched. The example you quote: someone gets 4 pieces of mail - then the spread could be measured in: a) pieces of mail. b) ratio of how many pieces of mail are being predicted to some etalon ratio, such as maximum entropy for example. These are two explanations that I've been pondering, but as I've said - none of them makes sense, if you plug in the numbers. To be more concrete, the example I stole from it goes like this: *The average number of homes sold by the Acme Realty company is 2 homes per day. What is the probability that exactly 3 homes will be sold tomorrow?*. And plugging the numbers into the formula I get 0.18045. All hunky-dory, but I don't know what this number means when applied to spread. When interpreted as frequency or probability - its units are "chances" or "units of frequency", so intuitively I can interpret it as "if I were to sell property, then approximately one time in six I'd sell three items whereas on average I'd sell two" or something to that effect. But when it measures spread - shouldn't it reflect on how the error scales with the measurement? I can't tell from looking at 0.18045 what the error is, and whether the data is widely scattered or is dense, because I'm lacking the ability to interpret this number in any such way. Hope I make it clear! (talk) 22:19, 25 May 2014 (UTC)
Not very clear to me. It seems, you really do not want a distribution (that is, collection of probabilities) but rather STD. Then, what is the problem? STD for Poisson is the square root of lambda. Boris Tsirelson (talk) 05:44, 26 May 2014 (UTC)
Or do you want to say that, in your opinion, the phrase "it predicts the degree of spread around a known average rate of occurrence" is misleading? Then just say so. Boris Tsirelson (talk) 05:48, 26 May 2014 (UTC)
Yes, I see now. I'll try a different approach. STD is a measure of spread. Let's assume the "worst" case of the normal curve where the "shoulders" are on the same level as the peak. I.e. the curve degenerated to a straight line (with STD approaching infinity). This gives me the worst predictive power possible in this setting. The more I progress towards "lower shoulders" and "narrower peak", the better I get at predicting the outcomes. Thus I can compare spreads: the one with lower shoulders and narrower peak is "more dense" - it gives me better predictive power. Now take my previous example with Acme Realty. Suppose now I'm selling on average 5 pieces of property and want to predict selling exactly 6. This gives me the chance of 0.14622. What I don't know is am I now better (more precise) at predicting the outcome or not? And if I can tell, then how do I know?
Oh, I just saw your other comment. Well, kind of. I don't really understand what that sentence says. I thought that the degree of spread would be something along the lines I described above in STD example. (talk) 06:00, 26 May 2014 (UTC)
Not sure it answers your question, but anyway: the normal distribution has two parameters, the mean and the spread; in contrast, Poisson distribution has only one parameter; its STD is necessarily the square root of its mean. Boris Tsirelson (talk) 07:02, 26 May 2014 (UTC)
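
The two Acme Realty numbers quoted above, and the point that the standard deviation is tied to the mean, can be reproduced with a short sketch. It assumes the usual p.m.f. formula exp(-lambda) * lambda^k / k!; the helper names and the truncation point k_max are arbitrary choices made for illustration.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def mean_and_std(lam, k_max=100):
    """Mean and standard deviation computed from the p.m.f., truncated at k_max."""
    ks = range(k_max + 1)
    mean = sum(k * poisson_pmf(k, lam) for k in ks)
    var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
    return mean, math.sqrt(var)

# (lambda, k) pairs from the discussion: 2 homes/day and exactly 3 sold,
# then 5 homes/day and exactly 6 sold.
for lam, k in [(2, 3), (5, 6)]:
    mean, std = mean_and_std(lam)
    print(f"lambda={lam}: P(X={k})={poisson_pmf(k, lam):.5f}, "
          f"mean={mean:.4f}, std={std:.4f}, sqrt(lambda)={math.sqrt(lam):.4f}, "
          f"std/mean={std / mean:.4f}")
```

The single probability P(X=k) says nothing about spread by itself. The spread is carried by the standard deviation sqrt(lambda), in the same counting units as the variable (homes per day here), and the relative spread std/mean equals 1/sqrt(lambda), so it shrinks as the mean grows.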
Well, if that's the case, I'd say that the wording of the "it predicts the degree of spread around a known average rate of occurrence" isn't good. In a sense it always predicts the same degree of spread. It's not even useful for predicting this quality of spread. Why not just say that it predicts the chance of a Poisson random variable to receive a given value? Because to me that's what it does... (talk) 16:37, 26 May 2014 (UTC)
True, taken literally it is, of course, "the chance of a Poisson random variable to receive a given value". And this is not specific to this distribution; the same may be said about every other distribution. But on the other hand, a distribution (Poisson or other) determines (somewhat indirectly) all numeric characteristics of location (expectation, median ...), spread (mean square deviation, interquartile range, ...), asymmetry and whatever. Looking at probabilities you easily get an idea which deviation is a rare event and which is not. Boris Tsirelson (talk) 19:04, 26 May 2014 (UTC)
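
As a rough illustration of how other measures of location and spread follow from the probabilities alone, the sketch below reads the quartiles and an upper-tail probability off the Poisson p.m.f. The quantile convention used (smallest k whose cumulative probability reaches q) is one common choice and is an assumption here, as are the function names.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_quantile(q, lam):
    """Smallest k such that P(X <= k) >= q."""
    k, cdf = 0, poisson_pmf(0, lam)
    while cdf < q:
        k += 1
        cdf += poisson_pmf(k, lam)
    return k

def upper_tail(k, lam):
    """P(X >= k), i.e. how rare a count of k or more is."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))

for lam in (2, 5):
    q1, median, q3 = (poisson_quantile(q, lam) for q in (0.25, 0.5, 0.75))
    print(f"lambda={lam}: Q1={q1}, median={median}, Q3={q3}, IQR={q3 - q1}, "
          f"P(X >= 3*lambda)={upper_tail(3 * lam, lam):.5f}")
```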

I happen to notice this discussion, and yes, the referred sentence better could read something like: it describes the variation of the numbers around the mean. Nijdam (talk) 09:43, 28 May 2014 (UTC)

That would work for me too. The other thing about this particular distribution being able to describe spread is that it is too trivial - it just tells you back what you've already told it. I wasn't concerned with rarity of events. The way I understand what degree of spread measures it is how good is my prediction. I.e. it tells me that I've found a function whose values are at some good distance from actual observed values, and the degree is thus some relative units measured from some baseline (say, when my prediction is always correct) to some possible limit (my prediction is wrong half of the time exactly). In other words, try to pose yourself a question: "If Poisson distribution measures the degree of the spread, then what is the numerical value of the degree of the spread given my previous example of Acme Realty?" (talk) 11:47, 30 May 2014 (UTC)
Sorry, I cannot understand your phrases "I wasn't concerned with rarity of events" and "when my prediction is always correct". Even dealing with the normal (rather than Poisson) distribution, the only always correct prediction is "somewhere between minus infinity and plus infinity". Then you write "my prediction is wrong half of the time exactly"! But this is the opposite attitude. The "half" measures rarity (and therefore you are concerned). Boris Tsirelson (talk) 15:31, 30 May 2014 (UTC)
Sorry it took me a while to reply. Let's simplify it and concentrate on the following: what is the numerical value of the degree of spread given the Acme Realty example? I can explain what I mean by other things you quoted, but it will drive the discussion away. I'd rather concentrate on answering this particular question. Just to expand on my motivation for resolving this question: my background in statistics is that I took a semester-long course in mathematical statistics, which is not much, but I imagine that I should be able to understand the description of some not very complex statistical structure, especially since I can understand how it works and what it does. Yet the sentence is completely opaque to me. It's as if someone chained a handful of words in a syntactically valid way, but lacking any meaning. It is this meaning that I'm trying to discover, or to discover that the meaning was lost due to the bad wording. (talk) 12:55, 7 June 2014 (UTC)
OK, now I see; you do not like the generic term "degree of spread"; instead you want to see something specific (be it mean square deviation, interquartile range or whatever). As for me, I interpret this "degree of spread" as a vague idea that hints to all these specific statistics, and to the vague intuitive idea of spread as well. However, the phrase in the article is not my formulation, and I am not really defending it. If you can write it better, be bold. Boris Tsirelson (talk) 14:54, 7 June 2014 (UTC)

Introductory example

I think the introductory example is poorly explained/chosen.

The important thing about receiving mail that is not highlighted in the text, but that is the feature that makes mail per day _likely_ to be Poisson distributed, is the fact that each mail event occurs, to a good approximation for most people, independently of each other mail event. Saying that 'assuming the process or mix of processes that produces the event flow is essentially random' is, in my opinion, too vague and unclear. If we're talking about a probability distribution then of course we're talking about something that is 'essentially random'. What is likely meant here by 'essentially random' is independence of waiting times between events, but this could be stated explicitly. As it reads currently it sounds like the most important thing that makes mail/day a Poisson process is just that it's a random process described only by its average rate. But this is of course not the only distribution on the natural numbers that could be described only by its mean. Take waiting for a bus versus waiting for a taxi. Both taxis and buses may happen to pass you 4 times an hour, on average, but the distribution of 'taxis/hour' is much more likely to be Poisson than 'buses/hour' (depending on the city you live in...).

I would suggest:

For instance, an individual keeping track of the amount of mail they receive each day may notice that they receive an average number of 4 letters per day. It is reasonable to assume that receiving one piece of mail does not affect the arrival time of future pieces of mail, that pieces of mail from a wide range of sources arrive independently of one another, so the number of pieces of mail received per day follows a Poisson distribution. Other examples might include: number of phone calls received by a call center per hour, number of decay events per second from a radioactive source, number of taxis passing a particular street corner per hour.

Everybody knows this is nowhere (talk) 06:20, 13 September 2014 (UTC)
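
The taxi versus bus contrast can be sketched with a small simulation. Everything here is a hypothetical model chosen for illustration: taxis arrive with independent exponential waiting times, buses follow a fixed timetable with a small random delay, and both average 4 per hour. The variance-to-mean ratio of the hourly counts is close to 1 for a Poisson count and well below 1 for the timetabled arrivals.

```python
import random
import statistics

def counts_per_hour(arrival_times, hours):
    """Number of arrivals falling in each whole hour [0,1), [1,2), ..."""
    counts = [0] * hours
    for t in arrival_times:
        if 0 <= t < hours:
            counts[int(t)] += 1
    return counts

def taxi_arrivals(rate, hours, rng):
    """Independent exponential waiting times between arrivals (a Poisson process)."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t >= hours:
            return times
        times.append(t)

def bus_arrivals(rate, hours, rng, jitter=0.2):
    """Fixed timetable, one bus every 1/rate hours, each shifted by up to +/- jitter hours."""
    interval = 1.0 / rate
    return [(i + 0.5) * interval + rng.uniform(-jitter, jitter)
            for i in range(int(rate * hours))]

def variance_to_mean(counts):
    """Variance-to-mean ratio of the counts; equals 1 for an exact Poisson distribution."""
    return statistics.pvariance(counts) / statistics.mean(counts)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
hours, rate = 10000, 4
taxis = counts_per_hour(taxi_arrivals(rate, hours, rng), hours)
buses = counts_per_hour(bus_arrivals(rate, hours, rng), hours)
print(f"taxis: mean={statistics.mean(taxis):.3f}, variance/mean={variance_to_mean(taxis):.3f}")
print(f"buses: mean={statistics.mean(buses):.3f}, variance/mean={variance_to_mean(buses):.3f}")
```

Both streams have nearly the same mean, which is the point of the comment above: the average rate alone does not make a count Poisson; the independence of the arrivals does.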

Law of rare events

In the explanation for law of rare events, the article claims that

"With this assumption one can derive the Poisson distribution from the Binomial one, given only the information of expected number of total events in the whole interval."

which is true; the distribution depends only on the expected total number of events in the whole interval, not on how the event probability is distributed within the interval. However, the explanation then goes ahead and assumes that the probability of an event happening in a subinterval I_i is \lambda/n, which is true only if events occur at a constant rate across the interval. That contradicts the claim that we need only the expected total number and not the distribution of the rate within the interval. --Sprlzrd (talk) 23:51, 29 April 2015 (UTC)
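
A small numerical sketch of the point raised here: the Poisson limit does not need equal subinterval probabilities. The code computes the exact distribution of a sum of independent Bernoulli trials with unequal probabilities that add up to lambda, and compares it with the Poisson p.m.f. The linearly rising rate profile is a made-up example, and the comparison against twice the sum of squared probabilities refers to Le Cam's inequality; none of this is taken from the article's derivation.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def sum_of_bernoullis_pmf(probs):
    """Exact p.m.f. of the number of successes among independent Bernoulli trials."""
    pmf = [1.0]
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1 - p)      # this trial fails
            new[k + 1] += mass * p        # this trial succeeds
        pmf = new
    return pmf

def nonuniform_probs(lam, n):
    """Hypothetical rate profile: p_i proportional to i, scaled so the p_i sum to lam."""
    total = n * (n + 1) / 2
    return [lam * i / total for i in range(1, n + 1)]

def l1_distance_to_poisson(probs, lam):
    """Sum over k of |P(S = k) - Poisson(k)|, including the Poisson mass beyond n."""
    pmf = sum_of_bernoullis_pmf(probs)
    poisson = [poisson_pmf(k, lam) for k in range(len(pmf))]
    tail = max(0.0, 1.0 - sum(poisson))
    return sum(abs(a - b) for a, b in zip(pmf, poisson)) + tail

lam = 2.0
for n in (10, 100, 1000):
    probs = nonuniform_probs(lam, n)
    bound = 2 * sum(p * p for p in probs)
    print(f"n={n}: L1 distance={l1_distance_to_poisson(probs, lam):.5f}, "
          f"2*sum(p_i^2)={bound:.5f}")
```

The distance shrinks as the subintervals get finer even though the p_i are unequal, which suggests the \lambda/n assumption is a convenience of the derivation rather than a requirement; what matters is that every p_i becomes small while their sum stays at lambda.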