Talk:Probability distribution

WikiProject Mathematics (Rated B-class, Top-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating: B Class, Top Importance. Field: Probability and statistics
One of the 500 most frequently viewed mathematics articles.
WikiProject Statistics (Rated C-class, Top-importance)
This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as C-Class on the quality scale.
This article has been rated as Top-importance on the importance scale.

Lead is confusing

"random variable may be attributed to a function defined on a state space equipped with a probability distribution that assigns a probability to every subset ... of its state space..." A function defined on a state space should assign something to points of the state space. If it assigns something to sets, it is rather a SET FUNCTION. Also, if the state space is already equipped with a probability distribution, what is the role of the random variable?

"A random variable then defines a probability measure on the sample space by assigning a subset of the sample space the probability of its inverse image in the state space." Really? The two spaces are swapped here. On the sample space a measure is given from the beginning; on the state space it appears due to the random variable.

"In other words the probability distribution of a random variable is the push forward measure of the probability distribution on the state space." I do not understand this phrase; once again, on which space is the measure given from the beginning? Boris Tsirelson (talk) 16:03, 12 December 2008 (UTC)

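A minimal sketch of the point at issue, on a hypothetical finite example: the measure P is given on the sample space from the start, and the random variable X carries it over to the state space through P_X(B) = P(X^{-1}(B)). The two-coin sample space and the names `P`, `X` and `pushforward` are illustrative assumptions, not taken from the article.

```python
from fractions import Fraction

# Hypothetical sample space: two fair coin tosses, with P given from the start.
P = {"HH": Fraction(1, 4), "HT": Fraction(1, 4),
     "TH": Fraction(1, 4), "TT": Fraction(1, 4)}

def X(omega):
    """Random variable: number of heads, a point function on the sample space."""
    return omega.count("H")

def pushforward(P, X, B):
    """P_X(B) = P(X^{-1}(B)): total mass of the outcomes that X maps into B."""
    return sum(p for omega, p in P.items() if X(omega) in B)

# Distribution of X on the state space {0, 1, 2}, induced by P through X.
dist = {k: pushforward(P, X, {k}) for k in (0, 1, 2)}
```

Here X assigns a value to each point of the sample space, while the induced distribution `dist` is a set function on the state space, which is the distinction the quoted sentences blur.
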
Organization of the article: subsection Terminology etc

Perhaps the Terminology subsection, with its current content, is superfluous. Repeating the discrete vs. continuous distinction seems confusing (the reader could think he overlooked something he didn't). The notion of support should appear in the article on measures. Measure, probability and possibly a few other concepts should be referenced at the bottom of the article. More thought is needed about how to organize the vast legion of important links to related articles.

As the continuous counterpart of the formula for a probability distribution, the one with an integral of the density function should be used (since it resembles the discrete version very well: just think of the density as a discrete histogram with a dense set of values). After presenting discrete and absolutely continuous distributions in separate sections, a third section should unify both into the general theory, with a formula using the Lebesgue integral w.r.t. the axiomatic probability measure P (the formula which now horribly serves as the definition in the special case of an absolutely continuous random variable, which is unacceptable for a general-purpose encyclopedia). Then some properties, examples and graphs should follow (independently of the famous and important distributions described in dedicated articles). Compare e.g. with some good stylistic and logistic ideas applied to the article on expected value. --Megaloxantha (talk) 02:37, 31 December 2008 (UTC)

New "Some Properties" section

Can we do something sensible about the stuff presently in this section, which says:

  • The probability density function of the sum of two independent random variables is the convolution of each of their density functions.
  • The probability density function of the difference of two random variables is the cross-correlation of each of their density functions.

In the first bullet point, there is a need to say this for general distributions, not just those which have densities. Unfortunately the article on convolution does not seem to give a sensible formula in terms of cumulative distribution functions. In the second bullet point, there is again the problem of dealing with general distributions but, in addition, the word "cross-correlation" would need to be given the interpretation used in the linked article, which is very different from the common one in statistics.

Melcombe (talk) 10:38, 31 December 2008 (UTC)
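
A hedged sketch of how both bullet points look without densities, using probability mass functions. For independent X and Y, the sum has the convolution p(z) = sum over y of p_X(z - y) p_Y(y), the difference has the (signal-processing) cross-correlation p(d) = sum over y of p_X(d + y) p_Y(y), and the CDF of the sum can be written F(z) = sum over y of F_X(z - y) p_Y(y). Independence is assumed in both cases; the fair die and the helper names `pmf_of` and `cdf` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def pmf_of(op, p_x, p_y):
    """PMF of op(X, Y) for independent X and Y with the given PMFs."""
    out = {}
    for (x, px), (y, py) in product(p_x.items(), p_y.items()):
        out[op(x, y)] = out.get(op(x, y), 0) + px * py
    return out

def cdf(pmf, z):
    """P(V <= z) for a variable V with the given PMF."""
    return sum(p for v, p in pmf.items() if v <= z)

die = {k: Fraction(1, 6) for k in range(1, 7)}  # hypothetical fair die

p_sum = pmf_of(lambda x, y: x + y, die, die)    # discrete convolution
p_diff = pmf_of(lambda x, y: x - y, die, die)   # discrete cross-correlation

# CDF form of the convolution: F_{X+Y}(z) = sum_y F_X(z - y) * p_Y(y)
z = 7
cdf_via_formula = sum(cdf(die, z - y) * py for y, py in die.items())
```

The CDF form needs no density for either variable, which is the kind of statement the first bullet point would need for general distributions.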

Proposal to archive discussion

Given that this article has had a major revamp, much of the old discussion is irrelevant to the present content. I have reordered the threads a little to put newer stuff towards the end, but I am suggesting that all the stuff now before the section headed "Lead is confusing" be archived. Any thoughts? Melcombe (talk) 12:15, 31 December 2008 (UTC)

Discussion archived as above. Melcombe (talk) 14:21, 20 May 2009 (UTC)

Observation Space

In the formal definition, we have:

"A random variable is defined as a measurable function X from a probability space to its observation space ."

Can we have someone put up a definition of the "observation space"? I'd do it myself but I'm not able enough. --WestwoodMatt (talk) 15:41, 20 March 2010 (UTC)

Formally, it is just a measurable space. Informally, it is the set of all possible values (or a larger set, if more convenient). I also wonder, is "observation space" a standard word, a neologism, or what? Boris Tsirelson (talk) 16:30, 20 March 2010 (UTC)
I'm assuming it's the same thing as the image of X. But I'm not used to coming at this from the direction of defining the image as a measurable space - the only treatments I'm familiar with regard the image of X as just being a subset of . Hence I'm slightly out of my depth, and although I could work through it myself step by step, I'd be unsure as to whether I'd done it right - I'm new to measure theory. --WestwoodMatt (talk) 23:33, 20 March 2010 (UTC)
"Observation space" is used sometimes, see for example here. I did not find the definition, but probably it means the codomain, not just the image. Boris Tsirelson (talk) 07:01, 21 March 2010 (UTC)
An explanation added; please look now. Boris Tsirelson (talk) 10:13, 21 March 2010 (UTC)
I suspect "Observation space" is not standard in the literature, and should be omitted. At least as a PhD student in Stochastic Analysis with a decent background in probability and measure theory, I've never seen it used before. -- Some random passerby
Also I, an old professor-probabilist, did not see it before (that is before 21 March 2010). Probably, for now it is used mostly by non-mathematicians. But does it mean that it should be omitted? Boris Tsirelson (talk)
I think that the use of non-standard terminology is unfortunate because it makes it harder to read and understand. Especially so in definitions. I also think measure-theoretic definitions are mostly of interest to mathematicians, and thus that terminology should be taken from there. -- The same random passerby.

See also Wikipedia talk:WikiProject Mathematics#Codomain of a random variable: observation space?. Boris Tsirelson (talk) 16:50, 27 March 2010 (UTC)

The term seems self-explanatory when used the way it's used in the article. Someone used the word "unfortunate" above without rigorously defining it. I don't have a problem with that. Michael Hardy (talk) 18:50, 27 March 2010 (UTC)

When you're in a section called "formal definition" it is necessary to define stuff down to this level. Okay then, so although I can make a stab at defining "unfortunate" to a non-mathematically-inclined person (it describes an observation of a random variable whose outcome is contrary to the desires of one's motivational consciousness) I would not be able (in this context) to determine rigorously what an "observation space" is. When something is described as "self-explanatory" I always suspect that this is because the person so describing it lacks the ability to define it. As a logician I cannot accept this as an answer. And as a statistician (i.e. I'm not one, I'm just learning this stuff as I go along) I don't understand what an "observation space" is in terms of the mathematical objects that a probability space is defined in. --WestwoodMatt (talk) 21:35, 27 March 2010 (UTC)
Formal definitions are supposed to be formal. Also, adding definitions not used in the literature is probably a breach of the "no original research" criterion for Wikipedia articles.

Enough is enough. Observation space is gone.

Add "mode", "tail", "inflection", etc. to terminology?

Shouldn't the basic terminology used to discuss/describe a distribution be included here? It would not only fill out the basic presentation of the material but also provide an anchor for references to such terms in other articles. Jojalozzo 03:02, 25 July 2011 (UTC)

Strange terminology

As far as I know, "probability distribution" is, in general, a probability measure (rather than this or that function). In some (but not all) cases it can be described by a cumulative distribution function. Sometimes also by the probability mass function; sometimes also by the probability density function. But the lead says "a probability mass, probability density, or probability distribution is a function..." Or is it meant that a measure is also a kind of function (namely, a set function)? But no, this is not written in the sections. Boris Tsirelson (talk) 05:57, 2 July 2012 (UTC)

I have rewritten the lead to try to overcome this problem. But the rest of the article is extremely short of anything understandable about probability measure. Melcombe (talk) 19:20, 3 July 2012 (UTC)
Nice! Much better than before. (But it would be enough to link "probability measure" in the lead only once.) Boris Tsirelson (talk) 21:09, 3 July 2012 (UTC)

Normal distribution

Why does this article refer to the Gaussian distribution as "the most important distribution"? Isn't this kind of arbitrary? — Preceding unsigned comment added by 2607:4000:200:13:1A03:73FF:FEB3:B07C (talk) 05:46, 27 August 2012 (UTC)

Good question but no it's not arbitrary: there's a theorem somewhere that says that given a large enough sample size, all distributions tend towards the Gaussian in the limit. --Matt Westwood 09:46, 27 August 2012 (UTC)
A quote:
"Gaussian random variables and processes always played a central role in the probability theory and statistics. The modern theory of Gaussian measures combines methods from probability theory, analysis, geometry and topology and is closely connected with diverse applications in functional analysis, statistical physics, quantum field theory, financial mathematics and other areas."
R. Latala, "On some inequalities for Gaussian measures". Proceedings of the International Congress of Mathematicians (2002), 813-822. arXiv:math.PR/0304343.
Boris Tsirelson (talk) 12:41, 27 August 2012 (UTC)
An example: The normal distribution in Rn is the unique (up to scaling) rotation-invariant probability measure with independent components. This purely mathematical result is fundamental to the Kinetic theory of gases. --Rainald62 (talk) 21:36, 29 April 2013 (UTC)
Yes... You mean Maxwell–Boltzmann distribution. Boris Tsirelson (talk) 05:49, 30 April 2013 (UTC)
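
The theorem alluded to earlier in this thread is presumably the central limit theorem. As a caveat, it concerns the standardized sum (or mean) of many independent, identically distributed variables with finite variance, rather than "all distributions". A minimal deterministic sketch, assuming a skewed Bernoulli(0.1) summand chosen only for illustration; `normal_cdf` and `clt_gap` are hypothetical helper names.

```python
import math

def normal_cdf(x):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def clt_gap(n, p=0.1):
    """Largest gap, over the points k = 0..n, between the exact CDF of a sum
    of n i.i.d. Bernoulli(p) variables and the normal CDF with the same
    mean and variance."""
    mean, sd = n * p, math.sqrt(n * p * (1.0 - p))
    cdf, gap = 0.0, 0.0
    for k in range(n + 1):
        cdf += math.comb(n, k) * p**k * (1.0 - p)**(n - k)  # binomial PMF
        gap = max(gap, abs(cdf - normal_cdf((k - mean) / sd)))
    return gap

# The gap shrinks as the number of summands grows.
gaps = {n: clt_gap(n) for n in (10, 100, 1000)}
```

The summand itself is far from Gaussian; only the distribution of the sum approaches the normal shape as n grows.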

generic statistical distributions (especially sample distributions)

Right now there seem to be somewhat strange redirects. What article should I link to for a plain old distribution of occurrences as observed in a finite sample?

Nanite (talk) 18:45, 21 January 2014 (UTC)

(Non-)Zero Probabilities for Continuous random variables

I believe "In contrast, when a random variable takes values from a continuum, probabilities can be nonzero only if they refer to intervals" is incorrect. Imagine a random variable whose image is [0,1]. Let the point 1 have a 50% chance of occurring, and the rest of the probability mass is uniformly distributed amongst [0,1). Is there anything wrong with this counter-example? Note that obviously this counter-example extends to letting any finite number of points of the image of a continuous random variable have non-zero probability. (talk) 17:23, 23 September 2014 (UTC)

Right. But that phrase occurs in Introduction, and probably is not meant to be a theorem. Maybe "In contrast, when a random variable takes values from a continuum then, typically, probabilities can be nonzero only if they refer to intervals"? Boris Tsirelson (talk) 20:23, 23 September 2014 (UTC)
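
The counterexample above can be sketched through its cumulative distribution function: mass 1/2 spread uniformly over [0, 1) and an atom of mass 1/2 at the point 1. The probability of a single point is the jump F(x) - F(x-); the names `cdf`, `point_mass` and the tolerance `eps` are illustrative assumptions, and the left limit is only approximated numerically.

```python
def cdf(x):
    """CDF of the counterexample: mass 1/2 uniform on [0, 1), atom of 1/2 at 1."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.5 * x
    return 1.0

def point_mass(x, eps=1e-9):
    """Approximate P(X = x) = F(x) - F(x-) using a small left offset eps."""
    return cdf(x) - cdf(x - eps)

jump_at_one = point_mass(1.0)    # close to 0.5: a single point with positive probability
jump_inside = point_mass(0.5)    # close to 0: no atom in the interior
```

So the variable takes values in a continuum, yet the single point 1 carries probability 1/2, which is why a qualifier such as "typically" is needed in the quoted sentence.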

Probability in function spaces

I feel that this article is incomplete because it deals only with 'random variables' (scalar or not). However, the notion of a 'random element' can be applied to any set. Specifically, a probability measure on a function space is a well-defined concept under the Kolmogorov measure-theoretic point of view. Nevertheless, there are several items in the article that do not apply easily to the case of random functions. For example, the concept of a probability density for random functions does not exist ([1]); the concept of a cumulative distribution function is replaced, in most presentations, by an infinite hierarchy of cumulative distribution functions; etc. Crodrigue1 (talk) 21:36, 12 October 2016 (UTC)

True. This is briefly mentioned in the "Kolmogorov definition" section. This is more often called "probability measure" than "probability distribution". And, specifically for function spaces, this is usually treated in "Random processes". Boris Tsirelson (talk) 06:40, 13 October 2016 (UTC)

Mode: a doubt

See Talk:Mode (statistics)#Different treatment of discrete and continuous? Boris Tsirelson (talk) 18:58, 26 June 2017 (UTC)


There is some confusion about the term 'continuous r.v.'. In this article a continuous r.v. is defined as having a continuous cumulative distribution function, hence it doesn't need to have a density. The article on the probability density function, however, says that a density belongs to a continuous r.v. Madyno (talk) 08:49, 23 July 2017 (UTC)
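
One standard example behind this distinction is the Cantor distribution: its CDF (the Cantor function) is continuous, yet it is constant on every removed middle third, so its derivative vanishes almost everywhere and no density exists. A minimal sketch that evaluates the CDF from the ternary digits of x; `cantor_cdf` and `depth` are illustrative names, and the result is only accurate to about 2^-depth.

```python
def cantor_cdf(x, depth=40):
    """Approximate the Cantor function by reading ternary digits of x."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    total, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        digit = int(x)          # next ternary digit of x
        x -= digit
        if digit == 1:          # x lies in a removed middle third: CDF is flat here
            return total + scale
        total += scale * (digit // 2)   # ternary digit 2 becomes binary digit 1
        scale /= 2
    return total

flat_left, flat_right = cantor_cdf(0.4), cantor_cdf(0.6)   # same value on (1/3, 2/3)
quarter = cantor_cdf(0.25)                                  # close to 1/3
```

Under the definition used in this article such a variable counts as continuous, while under the density-based usage it does not.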

  1. ^ Novikov, 1965; Functionals and the random force method in turbulence theory; Soviet Physics JETP, 20(5), pp. 1290 - 1294