Talk:Cumulative distribution function/Archive 1


Distribution function

This page had a redirect from distribution function, which I've now made into its own article describing a related but distinct concept in physics. I'll try to modify the pages pointing here through that redirect so that the net change in Wikipedia is minimal. SMesser 16:12, 24 Feb 2005 (UTC)

I added a reference to here on the "distribution" page so that "distribution function" appears separately for statistics and for physics. Melcombe (talk) 16:35, 21 February 2008 (UTC)

Cumulative density function

I originally created the redirect cumulative density function in March to point to this article. Why? A simple Google test for cumulative density function shows 41,000 hits while cumulative distribution function shows 327,000 hits. Michael Hardy's contention is that "cumulative density" is patent nonsense (see deletion log) and that a redirect shouldn't exist.

Regardless of the correctness of "cumulative density", there still is a significant usage of it in reference to this article and its content. "Cumulative density function" is even used in a doctoral thesis. Hardly patent nonsense.

Even if "cumulative density function" is incorrect, someone may still look for it, find nothing, and create an article paralleling this one. If you don't buy the "it's not patent nonsense, or even just nonsense" argument, then I invoke (from WP:R#When should we delete a redirect?) the point that deleting the redirect increases accidental linking, and therefore it should not be deleted.

Michael, if you have a problem with the correctness of "cumulative density" then by all means add a section here or change the redirect to an article and explain it there. Either way, cumulative density function needs to be a valid link. Cburnett 14:42, 14 December 2005 (UTC)

I just saw this debate now. I've changed the redirect page into a navigation page explaining the severe confusion. Michael Hardy 21:59, 20 July 2007 (UTC)


Please be consistent! In probability theory, the integral of the "probability density function" (PDF) is called the "cumulative density function" (CDF) or simply the "distribution function"; hence the adjective "cumulative".


The term "cumulative distribution function" is nonsense because it implies the integral of the integral of the PDF. Utter nonsense! Please correct this link! User:lese 4 Nov 2007.

"Cumulative distribution function" appears in Everitt's Dictionary of Statistics while "cumulative density function" does not. Similarly in the Unwin Dictionary of Mathematics. Melcombe (talk) 16:42, 21 February 2008 (UTC)

How is this a debate?

The term "cumulative distribution function" is used in many elementary books. It is a pretty stupid term, but we are stuck with it. The best we can do is acknowledge that the term is out there, that it should simply be "distribution function", and that its definition MUST be with <= or else many tables, software routines, etc. will be used incorrectly. —Preceding unsigned comment added by Jmsteele (talkcontribs)

I don't think it's a stupid term and I have no problem with it. On the other hand, "cumulative density function" is a horribly stupid term. Michael Hardy 21:59, 20 July 2007 (UTC)

Doesn't make sense

"Note that in the definition above, the "less or equal" sign, '≤' could be replaced with "strictly less" '<'. This would yield a different function, but either of the two functions can be readily derived from the other. The only thing to remember is to stick to either definition as mixing them will lead to incorrect results. In English-speaking countries the convention that uses the weak inequality (≤) rather than the strict inequality (<) is nearly always used."

Surely it doesn't matter at all! Since the probability of any single value is 0, the two interval boundaries can be included or excluded.

If you're only interested in integrals. Shinobu 22:50, 7 June 2006 (UTC)
The convention in the entire world is to use '≤' and it matters HUGELY for the binomial, Poisson, negative binomial, etc. To use anything else and to rely upon the formulas in any text would lead to substantial errors, say when one is using a table of the binomial distribution. Jmsteele 01:18, 21 October 2006 (UTC)
I'm not sure about that. The definition: F(x) = P(X <= x)
Because P(X <= x) = P(X < x) + P(X = x), F(x) = P(X < x) + P(X = x)
Now for normal functions (the kind of functions you mention) P(X = x) = 0.
Of course, there are things like delta functions, but that's not what you're talking about. Shinobu 16:27, 27 October 2006 (UTC)

Please consider some very important distributions: the binomial, Poisson, and hypergeometric. You simply MUST use the definition F(x) = P(X <= x) or else all software packages and all tables will be misunderstood. PS: I am a professor of statistics, so cut me some slack here. This is not a matter of delta functions; it is a matter of sums of coin flips ... very basic stuff.
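To make the point about discrete distributions concrete, here is a minimal sketch in C that computes a binomial CDF under both conventions. The function names (binom_pmf, binom_cdf_le, binom_cdf_lt) are hypothetical and chosen only for this illustration; the code assumes 0 < p < 1.

```c
#include <assert.h>

/* P(X = k) for X ~ Binomial(n, p), built up with the recurrence
   P(X = j+1) = P(X = j) * (n - j)/(j + 1) * p/(1 - p). Assumes 0 < p < 1. */
double binom_pmf(int n, double p, int k)
{
    assert(n >= 0 && p > 0.0 && p < 1.0);
    if (k < 0 || k > n)
        return 0.0;
    double term = 1.0;
    for (int j = 0; j < n; j++)
        term *= 1.0 - p;                    /* term = P(X = 0) = (1 - p)^n */
    for (int j = 0; j < k; j++)
        term *= (double)(n - j) / (j + 1) * p / (1.0 - p);
    return term;
}

/* Weak-inequality convention: F(k) = P(X <= k). */
double binom_cdf_le(int n, double p, int k)
{
    double sum = 0.0;
    for (int j = 0; j <= k && j <= n; j++)
        sum += binom_pmf(n, p, j);
    return sum;
}

/* Strict-inequality variant: P(X < k) = P(X <= k) - P(X = k). */
double binom_cdf_lt(int n, double p, int k)
{
    return binom_cdf_le(n, p, k) - binom_pmf(n, p, k);
}
```

For two fair coin flips (n = 2, p = 0.5) the two conventions give P(X <= 1) = 0.75 but P(X < 1) = 0.25, so reading a table built with one convention as if it used the other gives a very different answer.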

F(x) vs Phi(x)

I completely disagree with "It is conventional to use a capital F for a cumulative distribution function, in contrast to the lower-case f used for probability density functions and probability mass functions." From all the literature I have read, Φ is the cumulative distribution function and φ is used for probability density/mass functions. Where's the reference to make such a bold claim that F and f are convention? See the probit article, which uses Φ⁻¹ for the inverse of the cdf. -- Thoreaulylazy 19:13, 3 October 2006 (UTC)

There is no such convention - you can pick any symbol you like, of course. It is common practice to use the capital for the cdf, because it's the primitive of the df. I've seen phi in quantum mechanical books, but I've also seen f and rho. Shinobu 22:58, 3 October 2006 (UTC)
From all the literature I have read, the pair of F and f was the convention. I don't mean to say that Φ and φ are wrong, but how can you be so sure as to call something else a bold claim? Different fields have different notational conventions, and we just have to accept that. Musiphil 07:03, 3 December 2006 (UTC)

This is a collapsed distinction. One uses Phi for the normal distribution and phi for the normal density. These are reserved symbols for these purposes; see any statistics book. One uses F and f for generic distributions and densities, but these are not reserved. In many books and papers one will find G and g, H and h, etc., each time with the capital representing the distribution and the lower case the density.

Programming algorithm

I've been looking for a better algorithm to generate a random value based on an arbitrary CDF (better than the one I wrote). For example, if one would like to obtain a random value with a "flat" distribution, one can use the 'rand()' function from C's stdlib.h. However, I wrote this function, which takes an arbitrary function and uses it to generate the random value:

#include <stdlib.h>
                        // xmin and xmax are the range of outputs you want
                        // ymin and ymax are the actual limits of the function you want
                        // function is a function pointer to the curve being sampled
                        // (the accept/reject test below treats it as a density, not a CDF)
long double randfunc(double xmin, double xmax,  long double (*function)(long double), double ymin, double ymax)
{       long double val;
        for (;;)
        {       // draw a candidate x, then a uniform height; accept x if the height lies under the curve
                val = (xmax-xmin)*( rand()/((long double)RAND_MAX + 1)) + xmin;
                if( (ymax-ymin)*( rand()/((long double)RAND_MAX + 1)) + ymin  < function(val))
                {        return val;
                }
        }
}

I was trying to find a way to do it faster/better. If anyone knows of anything.. let me know. Fresheneesz 07:53, 27 December 2006 (UTC)

Wikipedia really isn't the place to ask these sorts of questions. The talk pages are more for discussion of the articles themselves. Anyway, I will tell you that the textbook Numerical Recipes in C (Press et al.) has a good discussion of random number generation and also includes algorithms. I would further advise that you read the text rather than just implement the algorithms listed there; it's quite good! User A1 13:48, 12 March 2007 (UTC)
I think it'd be nice to have something on algorithms on this page. I have actually found a better answer. It involves either integrating the CDF and using the definite integral instead of an indefinite integral, or, if no definite integral is possible, preintegrating the function and using the numbers prerendered in memory. Fresheneesz 02:39, 13 March 2007 (UTC)
This is a well-understood problem that has been solved many times over. However, to really appreciate the solutions, I recommend that you pick up a graduate-level textbook on random variables (for example, the book by Papoulis and Pillai). I think you'll find that given an arbitrary CDF and a random variable that is uniformly distributed from 0 to 1, the inverse of the CDF will transform the uniformly distributed random variable into a random variable with that CDF. That is, if your desired CDF is F, the function F⁻¹ will transform a random variable uniformly distributed between 0 and 1 into a random variable distributed according to F. This can be used to motivate such algorithms. HOWEVER, I think you'll find that if you know more about the particular random variable you are generating, there are much more efficient ways to generate that random variable from a uniform random variable. Again, an understanding of the underlying probability will greatly simplify the construction of such algorithms. (Students of probability are often asked to write such algorithms as homework problems in, for example, MATLAB.) --TedPavlic 21:19, 8 April 2007 (UTC)
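The inverse-CDF idea described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the CDF is continuous and nondecreasing on a known interval [xmin, xmax], and F⁻¹ is found numerically by bisection rather than in closed form. The names inverse_cdf, sample_from_cdf and example_cdf are hypothetical, and example_cdf (F(x) = x² on [0, 1]) is just a stand-in target.

```c
#include <assert.h>
#include <stdlib.h>

/* Solve F(x) = u for x in [xmin, xmax] by bisection.
   Assumes F is continuous and nondecreasing with F(xmin) <= u <= F(xmax). */
long double inverse_cdf(long double (*F)(long double), long double u,
                        long double xmin, long double xmax)
{
    assert(xmin < xmax && u >= 0.0L && u <= 1.0L);
    for (int i = 0; i < 100; i++) {
        long double mid = 0.5L * (xmin + xmax);
        if (F(mid) < u)
            xmin = mid;     /* solution lies in the upper half */
        else
            xmax = mid;     /* solution lies in the lower half */
    }
    return 0.5L * (xmin + xmax);
}

/* Draw one sample with CDF F by pushing a uniform u in [0, 1) through F^-1. */
long double sample_from_cdf(long double (*F)(long double),
                            long double xmin, long double xmax)
{
    long double u = rand() / ((long double)RAND_MAX + 1);
    return inverse_cdf(F, u, xmin, xmax);
}

/* Example CDF (hypothetical): F(x) = x^2 on [0, 1], i.e. density 2x. */
long double example_cdf(long double x)
{
    if (x < 0.0L) return 0.0L;
    if (x > 1.0L) return 1.0L;
    return x * x;
}
```

Calling sample_from_cdf(example_cdf, 0.0L, 1.0L) repeatedly should give values concentrated toward 1, as the density 2x suggests. Unlike an accept/reject loop, each sample uses exactly one uniform draw, at the cost of a fixed number of CDF evaluations; when a closed-form inverse is known it is cheaper still.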

cdf vs pdf


I removed the comment that probability distribution function is the same as CDF, which I assert to be wrong. My reference is "Probability and Statistics for Engineering and the Sciences", p. 140 (J. Devore). The PDF is the same as the probability density function, not the CDF. The CDF is the integral of the PDF, not the PDF itself.

Please comment. 05:28, 12 March 2007 (UTC)

The PDF is the probability density function, since "PDF" stands for "probability density function", NOT for "probability distribution function". Michael Hardy (talk) 20:10, 3 March 2009 (UTC)
The "probability distribution function" is the same as the "cumulative distribution function" (CDF) and the "distribution function". The "probability density function" (PDF) is the derivative of the probability distribution function. See e.g. [1]. --X-Bert 22:22, 7 April 2007 (UTC)
For more information, you should look into measure theory, which is the basis for probability. Originally, the word probability was prepended to measure theoretic concepts to imply a special structure of the measure being used. However, because probability is now used by many who do not have the mathematical sophistication for measure theory, lots of other terms have been introduced (often accidentally) to hide the roots of probability. Thus, the language is now quite sloppy. Reviewing the measure theoretic roots of probability clears up any confusion about why the terms that are used in probability have the names that they do. --TedPavlic 21:10, 8 April 2007 (UTC)

I would also like to see probability distribution function removed as an alternative name to CDF, as it only confuses readers: probability distribution function can refer to the probability mass/density fxn ( ) or the cumulative dist fxn, depending on the author. If it is included as an alternative name, then I think the different interpretations need to be pointed out in the same paragraph. Wiki me (talk) 17:31, 3 March 2009 (UTC)

In standard usage, "PDF" stands for "probability density function", not for "probability distribution function". The latter is synonymous with "CDF"—cumulative distribution function. If Devore said "probability density function" is the same as "PDF", then Devore is right because "PDF" stands for "probability density function". Michael Hardy (talk) 18:45, 3 March 2009 (UTC)
Michael Hardy, while the phrase you added is technically correct, I think the PDF is introduced much more naturally further down in the lead. Specifically, the introduction with PDF in the first paragraph makes one wonder if it is a synonym for CDF or what it is and when it will be defined. Introducing it at the same time it is defined and its relationship with the CDF is given makes the most sense. PDBailey (talk) 18:48, 3 March 2009 (UTC)
I didn't "add" the phrase; I merely undid the anon's recent deletion of the phrase. Michael Hardy (talk) 20:26, 3 March 2009 (UTC)

I've looked this up in Devore

Sixth edition, page 147:

probability density function (pdf)

Fifth edition, page 145:

probability density function (pdf)

So Devore agrees with everyone else that pdf stands for probability density function.

It appears that our anonymous user who cites Devore simply saw the words "probability density function (pdf)", made a big flying leap to the conclusion that the "d" in "pdf" stands for "distribution" rather than "density", and thought Devore was saying the probability density function is the same thing as the probability distribution function. Lousy reading comprehension. Michael Hardy (talk) 20:26, 3 March 2009 (UTC)

Properties Notation

The formula after 'If X is a discrete random variable, then it attains values x_1, x_2, ... with probability p_i = P(x_i),' has the final sum of p(x_i); why would this not be the sum of p_i, since we have already introduced p_i? Also, I think p_i = P(x_i) should be introduced as p_i = P(X = x_i) for clarity. Chrislawrence5 17:31, 16 April 2007 (UTC)

Limits of integration of F(x)

I'll preface my comment by saying that I am not a mathematician, so I may be off base. However, should there be some sort of reminder statement that the limits (especially the lower limit, −∞) of integration for the expression

F(x) = ∫_{−∞}^{x} f(t) dt

should also be compatible with the range of applicability of f(t)? Some distributions are not defined over the entire real line. I was scratching my head confirming the CDF for the Pareto distribution starting with the PDF and couldn't get the listed answer until I realized this. Perhaps this would be obvious to some, but I suggest it to others who are more up on this stuff as a possible point of clarification. I will defer this change, however, to someone who is more of an authority on this. --Lacomj (talk) 21:07, 20 December 2008 (UTC)

The way it is defined, the Pareto distribution's pdf is zero below x_m in the notation of the Wikipedia page on the distribution, so you have to see it as

F(x) = ∫_{x_m}^{x} f(t) dt for x ≥ x_m.

I think it would be most correct to point out on that page that the Pareto pdf is zero off its support, but this might appear redundant to others. PDBailey (talk) 00:04, 3 March 2009 (UTC)
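For anyone retracing the Pareto calculation, here is a worked version. It assumes the usual parameterization with scale x_m > 0 and shape α > 0; the symbol α does not appear in this thread and is taken from the standard form of the Pareto density.

```latex
f(t) =
\begin{cases}
  \dfrac{\alpha\, x_m^{\alpha}}{t^{\alpha+1}}, & t \ge x_m, \\[1ex]
  0, & t < x_m,
\end{cases}
\qquad
F(x) = \int_{-\infty}^{x} f(t)\,dt
     = \int_{x_m}^{x} \frac{\alpha\, x_m^{\alpha}}{t^{\alpha+1}}\,dt
     = \left[ -\left(\frac{x_m}{t}\right)^{\alpha} \right]_{x_m}^{x}
     = 1 - \left(\frac{x_m}{x}\right)^{\alpha},
\qquad x \ge x_m .
```

For x < x_m the integrand is zero everywhere on the range of integration, so F(x) = 0 there. Integrating the nonzero branch of the density from −∞ instead of from x_m is what gives the wrong answer.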

The reasons

What are the reasons for defining F(x)=P[X<=x]? Why not F(x)=P[X>=x]? Is it just convention, or something else? Jackzhp (talk) 15:24, 2 March 2009 (UTC)

Yes, convention. PDBailey (talk) 23:57, 2 March 2009 (UTC)

CDF as expectation

There is a formula expressing the CDF as an expectation of an indicator function:

F(x) = E[1{X ≤ x}]

Maybe it should be included somewhere, as it is probably not so obvious for people without statistical background. The formula is particularly useful for numerical estimation of function F (in which case we simply replace expectation with sample mean), and in analysis of limiting properties of estimated cdf through the use of central limit theorem. —Preceding unsigned comment added by Stpasha (talkcontribs) 00:03, 19 June 2009 (UTC)

Be bold and add it; you might want to link to the Kolmogorov–Smirnov test. PDBailey (talk) 21:33, 20 June 2009 (UTC)
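As a rough illustration of the numerical estimate mentioned above (replacing the expectation of the indicator with a sample mean), a minimal C sketch might look like this. The function name empirical_cdf is hypothetical and not taken from any library.

```c
#include <assert.h>
#include <stddef.h>

/* Empirical CDF at x: the sample mean of the indicator 1{sample[i] <= x}. */
double empirical_cdf(const double *sample, size_t n, double x)
{
    assert(sample != NULL && n > 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (sample[i] <= x)     /* weak inequality, matching F(x) = P(X <= x) */
            count++;
    return (double)count / (double)n;
}
```

With the sample {0.5, 1.0, 2.0, 3.5}, two of the four observations are at or below 1.0, so the estimate of F(1.0) is 0.5.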