Template talk:Infobox probability distribution

WikiProject Statistics (Rated Template-class)

This template is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This template does not require a rating on the quality scale.
 

Usage

To use this template, put this in the article and fill it in as appropriate (see the field descriptions below the code for details):

{{Infobox probability distribution
| name       = 
| type       = 
| pdf_image  = 
| cdf_image  = 
| parameters = 
| support    = 
| pdf        = 
| cdf        = 
| mean       = 
| median     = 
| mode       = 
| variance   = 
| skewness   = 
| kurtosis   = 
| entropy    = 
| mgf        = 
| char       = 
}}

Fields (data goes between the equals sign and the pipe):

  • "name" should be the name of the distribution without "distribution" in it (e.g., "Normal", "Exponential")
  • "type" should be either "density" or "mass", which corresponds to probability density function and probability mass function
  • "pdf_image" should be the full wikicode for an image (including the "[[File: ...]]")
  • "cdf_image" same as "pdf_image"
  • The following should all be TeX equations and exclude any function labels (exclude the function portion like f(x; \mu, \sigma^2); brevity is key)
    • "parameters" should be the parameters for the distribution (such as \mu and \sigma^2 for the normal distribution)
    • "support" should be the support of the distribution, which may depend on the parameters. Specify this as "<math>x \in some set</math>" for continuous distributions, and as "<math>k \in some set</math>" for discrete distributions.
    • "pdf" the pdf/pmf
    • "cdf" the cdf
    • "mean" the mean
    • "median" the median
    • "mode" the mode
    • "variance" the variance
    • "skewness" the skewness
    • "kurtosis" the kurtosis excess
    • "entropy" the information entropy
    • "mgf" the moment generating function
    • "char" the characteristic function

If any of these don't exist, then put "Does not exist" (or something to the same effect); leave blank if unknown.

Standard Plots

  • Construction - Standard plots should now be produced in Scalable Vector Graphics (SVG) format, approximately 1300 pixels wide by 975 pixels high, using PostScript Times font size 10 pixels and a line thickness of 3.6 pixels. Symbols should be entered using their Unicode codes; e.g., #x03b8 for lowercase theta (lowercase lambda would be #x03bb). The axes of the plot should be in a ratio of 4 wide to 3 high. The size of the image should also be in a ratio of 4 wide to 3 high. The image should consist of a few distinct colors only, different curves having different colors. When multiple curves are plotted, a table of parameter values and associated colors should be included on the plot. If one of the curves is "prototypical" or "standard", it should be in black. We want these plots to be usable in any language, so only numerals and mathematical symbols should appear in the image. Axes should be labelled with the appropriate symbols (x and p(x)/P(x) for PDF/CDF plots, k and pk/Pk for PMF/CMF plots). Further explanation should be made in the text caption for the image. Continuous distributions should be done as solid lines. Display of discrete distributions should use points connected by lines, with a short explanation in the caption (e.g., "connecting lines do not imply continuity"), unless a single plot is used, in which case "impulse" plots are best.
  • Upload Procedure - Images should be uploaded to Wikimedia Commons. The file names should be of the form XXX_distribution_ZZZ.svg where XXX is the distribution name and ZZZ is either "PDF", "CDF", "PMF", or "CMF" depending on which function is plotted. The description page for the plots should contain a short description, the GFDL tag, and a link to Category:Probability distributions images. When a plot is made with gnuplot, the gnuplot instructions used should be included.
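The file naming convention in the upload procedure can be sketched as a small helper. This is an illustration only: the function name plot_file_name is hypothetical and not part of any Wikipedia tooling, and replacing spaces in the distribution name with underscores is an assumption.

```python
# Hypothetical helper illustrating the XXX_distribution_ZZZ.svg convention.
VALID_FUNCTIONS = ("PDF", "CDF", "PMF", "CMF")


def plot_file_name(distribution: str, function: str) -> str:
    """Build a file name of the form XXX_distribution_ZZZ.svg."""
    function = function.upper()
    if function not in VALID_FUNCTIONS:
        raise ValueError(f"function must be one of {VALID_FUNCTIONS}")
    # Assumption: spaces in the distribution name become underscores.
    name = "_".join(distribution.split())
    return f"{name}_distribution_{function}.svg"


print(plot_file_name("Normal", "PDF"))  # Normal_distribution_PDF.svg
```

Rejecting anything other than the four listed labels keeps file names consistent with the convention stated above.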

Discussion

(I favor visible points connected by lines, with an explanation in the caption that the lines don't imply continuity. I know this goes against the "no caption" idea, but I really don't like losing the freedom to clarify things with a caption. Especially if we don't use axis labels! PAR 05:12, 10 Apr 2005 (UTC))

Comments: 1) In the past, we've used filenames of the form FOO_distribution_PDF.png (additional underscore/space before "PDF"). 2) Gnuplot "linespoints" style for discrete PMFs is a good idea. The alternative, using "impulses" or something similar, makes it difficult to display more than one distribution at a time. CDFs can be plotted using the floor function and one of the "steps" styles. Alternatively, if only a single PMF is plotted, "impulses" is probably the best style to use. 3) I don't think including any kind of descriptive text is a good idea, because it renders the plots less useful for other editions of Wikipedia, which may want to include their own descriptive text in French, Japanese, etc. A brief legend with formulas is fine, of course. --MarkSweep 06:07, 10 Apr 2005 (UTC)

I included the file naming convention you mentioned in the rewrite. I took out the "no caption" requirement in the template description of the pdf field, because by making the requirement of no text in the caption and no text in the image, we're boxed into having no explanation capabilities at all, and we definitely need the freedom to explain. I also added that the axes of an image need to be labelled (math symbols only). I notice that the mathematical symbols vary among the pages. Should there be some slight standardization, like p(a,b,c;x) for the PDF and P(a,b,c;x) for the CDF, with parameters a,b,c? I tentatively threw these into the specification too. PAR 12:36, 10 Apr 2005 (UTC)

Regarding captions, they can be added below plots outside the image. Have a look at normal distribution or my recent edit to Poisson distribution.
About the formulas, I've seen f(x\,|\,a,b,c) used for the PDF/PMF and F(x\,|\,a,b,c) for the CDF in the articles themselves. Or perhaps use a more descriptive and/or conventional name instead of f, like N(x\,|\,0,1) for the standard normal PDF and \Phi(x\,|\,0,1) for the standard normal CDF. I've also used things like \mathrm{Gamma}(\lambda\,|\,\alpha,\beta) (see exponential distribution#Bayesian inference for an example).
I'm not sure if axes need to be labeled. We've been fairly consistent in using x for continuous distributions and k for discrete distributions. So the label of the horizontal axis should be obvious. (Though I realize that repeating the obvious wouldn't hurt either.) --MarkSweep 19:39, 10 Apr 2005 (UTC)

I like that caption method you used. I prefer p(x) and P(x) (p for probability) but if you have any argument against it, let's do f(x) and F(x). For the parameters, f(x\,|\,a,b,c) looks fine to me. PAR 20:26, 10 Apr 2005 (UTC)

I would prefer p(x) for pdf/pmf and P(x) for cdf and to use the captions I made for Normal (external to image in small font). Cburnett 22:37, Apr 10, 2005 (UTC)

I will leave the specification as it stands, then, using p and P. PAR 00:04, 11 Apr 2005 (UTC)

Suits me. --MarkSweep 00:24, 11 Apr 2005 (UTC)
Another thing though: I saw you added a CMF plot for the Poisson distribution with essentially the same caption as for the PMF saying that the function is only defined for integer values. I don't think that's strictly true: I would have expected to see a step function that's constant almost everywhere except for non-continuous jumps at integers 0 to n. --MarkSweep 00:24, 11 Apr 2005 (UTC)

I don't think that discrete distributions even deal in the real number system, just integers (or integral multiples of something), at least as far as the random variate is concerned. The CDF is undefined between the integers because the random variates are not selected from the real number system, but from the integer number system (or equivalently some real number times the integers). I looked at the CDF article and it only talks about continuous distributions. Maybe we should write an article for CMFs. PAR 01:58, 11 Apr 2005 (UTC)

For a discrete random variable X it's customary to define the CDF as
P(x) = \Pr[X\leq x] = \sum_{k\leq x} p(k).
That way, x can be a real number and P is then a step function P: \mathbb{R} \to [0,1]. I don't think there is a need to define a separate notion of a CMF. --MarkSweep 02:48, 11 Apr 2005 (UTC)
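The definition just given can be sketched numerically. The example below uses the Poisson distribution purely as an illustration; the function names are hypothetical. It shows that, under this definition, the CDF is constant between integers and jumps at each integer.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """p(k) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_cdf(x: float, lam: float) -> float:
    """P(x) = Pr[X <= x], the sum of p(k) over integers 0 <= k <= x."""
    if x < 0:
        return 0.0
    return sum(poisson_pmf(k, lam) for k in range(math.floor(x) + 1))


# Any real x between 3 and 4 gives the same value as x = 3.
print(poisson_cdf(3.0, 2.0) == poisson_cdf(3.7, 2.0))  # True
```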

I really think that is wrong. X and x have to be from the same set. X is discrete, x must be discrete. I mean it's wrong by dimensional analysis. X and x must have the same dimensions. For example, when dealing with income distribution, there are N people ranked by income, and X is R/N which means XN has units of people. If x is just any real number then we could have xN=3.7. 3.7 what? 3.7 people, but there's no such thing as 3.7 people. In Poisson statistics there is no such thing as 3.7 counts. It's like saying we need to define the CDF over the complex number plane.

P(x) = \Pr[X\leq \Re(x)] = \sum_{k\leq \Re(x)} p(k).

That way, x can be a complex number and P is then a step function P: \mathbb{C} \to [0,1]. Not only that, it screws up the plots :) PAR 16:10, 11 Apr 2005 (UTC)

Casella & Berger's Statistical Inference (ISBN 0-534-24312-6) defines the pmf — pedantically — as

f_X(x) = P(X = x)
=
\left \{
 \begin{matrix}
  (1-p)^{x-1} p & \mbox{for x=1,2,...} \\
  0             & \mbox{otherwise}
 \end{matrix}
\right.
Under this definition f_X is defined on the whole real line, so the same definition of the cdf can be used as for continuous RVs. But they seem to generally not write the "otherwise" condition. Cburnett 00:23, Apr 12, 2005 (UTC)
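As a worked example under that convention (the pmf quoted above is the geometric distribution on 1, 2, ...), summing the pmf up to x gives a cdf that is defined for every real x and is a step function. The derivation below is a sketch added for illustration, not a quotation from Casella & Berger:

```latex
F_X(x) = P(X \le x)
= \sum_{k=1}^{\lfloor x \rfloor} (1-p)^{k-1} p
=
\left \{
 \begin{matrix}
  1 - (1-p)^{\lfloor x \rfloor} & \mbox{for } x \ge 1 \\
  0                             & \mbox{for } x < 1
 \end{matrix}
\right.
```

The closed form follows from the finite geometric series \sum_{k=1}^{n} (1-p)^{k-1} p = 1 - (1-p)^n with n = \lfloor x \rfloor.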

--

Ok, I went to check my books and found the name of the plot is a "cumulative frequency polygon" or a "cumulative ogive". Please google these terms. With regard to the definition of the cumulative distribution function for discrete variables, the relevant books I checked say:

Guenther, "Concepts of Statistical Inference": "Pr(X\le r) means the probability that the experiment yields a value less than or equal to r... Almost always r or x will be one of the numerical values which the experiment can generate." All cumulative distribution functions for discrete variables are given as lists at the values of X. No plots.
Parsons, "Statistical Analysis": Section 2.4 is titled "Graphic representations of frequency distributions" and lists only the "cumulative ogive" as a method of plotting the cumulative distribution function. Again, the cumulative distribution function examples are given as lists in X.
Lindgren, "Statistical Theory": There's no concise quote, but it's clear that the CDF is defined as a continuous function on the real number line. Plots are done accordingly.

Basically, there is some disagreement as to the proper definition of the CDF and how to plot it. There is, however, ample justification for the use of the "cumulative ogive", and since it is desirable to have multiple plots of the CDF that are easily readable, I favor the ogive plots.

Also, every reference I checked uses f(x) and F(x) as the PDF and CDF, so I think I will change my mind on that. PAR 05:11, 12 Apr 2005 (UTC)


I was about to make some Zipf and Zeta distribution plots, and I was thinking it would be very informative to plot these PMF's on a log-log scale, where they become straight lines. Assuming there's an explanation in the caption, does this sound like a good idea? PAR 21:07, 20 Apr 2005 (UTC)

Absolutely. I did the same for the Yule-Simon distribution (which I should re-do to match the standard style). Perhaps do both linear scale and double log scale plots for these three distributions? --MarkSweep 23:07, 20 Apr 2005 (UTC)

I uploaded the Zipf plots, but inadvertently labelled the CMF horizontal axis with k. Before I fix it, can anyone remind me of the reason for not labelling axes? Is it just to maintain flexibility in the text notation? PAR 01:33, 23 Apr 2005 (UTC)

Use of color

Discussion moved here from User talk:MarkSweep.

Thanks for your work on these graphics; they look great. I have a gripe though (sorry): could you guys use dashed/dotted/marked lines for the different colors in recognition of the needs of color blind people? Or at least make a link to a color blind version? The most common color blindnesses by far are protanopia and deuteranopia. Protanopes require some distinguishing scheme for green/yellow, green/orange, green/brown, blue/purple, and cyan/grey. I'm not sure about deuteranopes, but I imagine they would have trouble with red/orange, red/yellow, blue/cyan, and purple/grey. --Chinasaur 01:25, 11 Apr 2005 (UTC)

Right, I'm peripherally aware of the issue, especially concerning red/green color blindness (forget what it's called, if only I had an encyclopedia…). Unfortunately, the choice of colors provided by gnuplot is extremely limited and not at all compatible with color deficient vision. I seem to recall that it is possible to use a small set of colors that most people can distinguish easily. Do you have any advice on which colors to use? This is assuming that we can get gnuplot to use colors specified by arbitrary RGB triples. Failing that, one could use dashed and dotted lines. In any case, we're rapidly approaching the point where we need to automate the creation of these plots. Does anyone have experience with Gimp scripting? --MarkSweep 02:05, 11 Apr 2005 (UTC)
My understanding of color blindness isn't the lack of ability to see colors, just that shades of red/green (or whatever) appear to be the same. So unless you have someone with color blindness on hand to determine if two colors appear the same then I don't see it as worth the time *guessing* what they *might* see. Also, with providing source then anyone could generate their own plots (though not everyone will be able to, at least it's a start). Cburnett 03:42, Apr 11, 2005 (UTC)
Side note: Even considering that I personally have run octave at some point in my life, I think that assuming everyone (anyone?) will be able and willing to generate his/her own plots given the source is absurdly optimistic :)...
Main point: I gave you some suggestions above about colors that are difficult for protanopes (someone missing the "red", i.e. long wavelength cone cell) to distinguish; I am protanopic, so this is accurate. In general the principle is simple: if you take any color and change the R value in its RGB, this change will be hard for a protanope to notice. It's a little trickier for the deuteranopes because (to simplify grossly) the green of RGB does not so well match the "green" cone cell that they are missing. However, the corresponding principle should be adequate for your purposes.
For example, in the plots at Beta distribution, I believe there are blue (RGB:0 0 1) and purple (1 0 1) lines that I can barely distinguish, red (1 0 0) and black (0 0 0) lines that I can distinguish slightly but not well, and a light colored line that could be either green (0 1 0) or yellow (1 1 0). As you can see, the difficulties encountered by a real, live protanope are predictable by the principle given above.
My suggestions are:
  1. For confusing colors, differ them in saturation and brightness in addition to differing them in hue. For example, rather than just blue=(0 0 1) and purple=(1 0 1), use blue=(0 0 .5) purple=(1 .5 1). For red and black, use (1 .25 .25) and (0 0 0). Etc.
  2. Alternatively, for confusing colors differ the line style, so for blue and purple, make one dashed. Likewise for red and black, yellow green and orange, etc.
Whether you want to muck up your current graphs or create alternative colorblind versions and then somehow link to those, up to you.
These suggestions cover cases of dichromats, people missing one cone entirely. Another common form of red/green color blindness is anomalous trichromacy, people with all three cones but messed up spectra. I don't think your plots should be too problematic for anomalous trichromats.
Sorry this is so pedantic; probably there should be a more central WP color blindness styleguide for creating graphics; then I would only have to rant about this in one place. If anyone knows where and how to make this happen I will be happy to contribute. --Chinasaur 10:12, 11 Apr 2005 (UTC)
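The principle described above (a change in only the R value of an RGB color is hard for a protanope to notice) can be sketched as a simple check on pairs of plot colors. This is a rough illustration, not a model of color vision: the function name and the tolerance value are assumptions.

```python
def red_only_difference(c1, c2, tol=0.1):
    """Return True if two RGB triples (components in 0..1) differ in R
    while staying within `tol` of each other in both G and B.

    Such pairs are flagged as likely hard for a protanope to tell apart.
    The tolerance 0.1 is an arbitrary illustrative choice.
    """
    dr = abs(c1[0] - c2[0])
    dg = abs(c1[1] - c2[1])
    db = abs(c1[2] - c2[2])
    return dr > tol and dg <= tol and db <= tol


# Pairs mentioned in the discussion of the Beta distribution plots.
print(red_only_difference((0, 0, 1), (1, 0, 1)))     # blue vs purple: True
print(red_only_difference((0, 0, 0.5), (1, 0.5, 1)))  # suggested fix: False
```

The suggested replacement colors pass the check because they also differ in the G and B components.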
Regarding side point: most people that would be genuinely interested in generated pdfs/pmfs and cdfs of distributions are likely to know of a way to generate plots (either through gnuplot or matlab or some of the statistical packages).
Regarding main point: finally, someone complaining about color choice that is color blind! :) How's this for an idea: on each image page (i.e., Image:Normal distribution cdf.png) you/me/whomever picks a point (abscissa or ordinate) and then relates the order they appear to the legend on the graph. So for the normal cdf, I could say something like "at ordinate=0.3, the order of plots from left-to-right matches the legend top-to-bottom."? For most plots, there's some point where you can do this. For the pdf of the normal: "the plots with the peak at 0 are the top 3 plots in the legend in the same order; the bottom legend entry is the plot with peak at -2" Basically describing the plots instead of marking up the plots.
Though, now that I think of it, isn't it easy in gnuplot to add marks to lines and they show up in the legend? (By marks I mean symbols on top of a solid line, not using dotted or dashed lines). Cburnett 17:40, Apr 13, 2005 (UTC)
Yes, one could use "linespoints" style. However, that would look very similar to the plots of PMFs we have now and could be confusing. Probably best to either switch on "dashed" in gnuplot's PostScript "terminal", or to use the method you describe. Alternatively, one can put arbitrary labels on plots, which would require manual intervention. Overall I think "dashed" output and/or using a safer set of colors that vary on more than one dimension would be the best choice. --MarkSweep 19:35, 13 Apr 2005 (UTC)
Will anyone still read this...? Cburnett, good point about the nerdiness of people reading these articles. Your colors solution is clever, but looking at the figures and imagining the caption you would have to add it seems a little laborious. My suggestion is the use of two linestyles, solid and dashed. The colors that are likely to be confused are not that numerous, so you should be able to cover most confusing groups with two linestyles; use solid/dashed for pairs like red/black, blue/purple, green/yellow, red/brown, grey/cyan, grey/purple (see that's already way more lines than you need). --Chinasaur 11:57, 15 July 2005 (UTC)
  1. A guide for creating plots has been started at Wikipedia:How to create graphs for Wikipedia articles. It is mostly about gnuplot so far.
  2. It is not too difficult to alter the colors of a PostScript file after the fact in a text editor. Maybe someone could come up with a palette of colors that is distinguishable by pretty much anyone, and we could convert the gnuplot-generated colors to that palette? A .ps file text-conversion script could probably be made by someone who knew what they were doing.
  3. I really don't like dashes or dotted lines. Sorry.  :-) In my opinion they should only be used for special situations, like asymptotes of a function or whatever. Maybe we could come up with another scheme that doesn't look bad, like notating each plot with a symbol or something? - Omegatron 22:44, July 24, 2005 (UTC)

I have updated the instructions for Standard Plots to reflect Wikimedia's preference for Scalable Vector Graphics (SVG). These are actually easier to create in Gnuplot using terminal svg. They are more useful than png or other rasterized formats. Lovibond (talk) 16:10, 16 April 2009 (UTC)

Status of usage

The following is a list of probability distribution pages, as classified by the probability distribution page. Following the list is an additional category "unclassified" which have not yet been entered into the probability distribution page, and need to be. The status of each page is given by the letters following the name of the page

  • A - has an infobox
  • B - has standardized plots of PDF/PMF and CDF/CMF
  • C - has all relevant infobox entries filled other than images
  • D - has gnuplot code in the above image description pages
  • E - uses "standard" notation

The status is not up to date!!! Please bring it up to date as you check out the pages.

List

Testing

I put this template on:

to test how it looked.

The purpose of the template is to consolidate the basic information in one spot, since it seems rather spotty and inconsistent across the distribution articles. Please give me some feedback. Cburnett 02:19, 10 Mar 2005 (UTC)

  • I like it. I've been meaning to work on these articles and to fill in all those details. Let's start with the most important distributions and also create some missing articles about the less common ones. --MarkSweep 07:17, 10 Mar 2005 (UTC)
  • I think it is an excellent idea, and have added it to a number of probability distribution pages that I am interested in. I have a problem with the idea of simply entering the name of the distribution without adding the words "distribution" after it. For example, the title of the exponential distribution infobox should read "exponential distribution" not just "exponential". Replicating what I wrote on the exponential distribution discussion page:
I understand that the infobox text reads "Name: Exponential" and that makes sense, but what is displayed is just "Exponential" and that makes no sense; it's an adjective that's missing a noun to modify. Alternatively, we could change the infobox to display "{{{name}}} distribution"?
I mean, if you were writing a paper on the exponential distribution, would you title it "exponential"? Even if it's a name, like Poisson, it's a modifier. Are there hidden benefits to having a poorly written title? PAR 17:29, 1 Apr 2005 (UTC)
P.S. to Cburnett - sorry if there was any aggravation, I didn't know this page existed.
Discussion is more relevant on this talk page instead of an individual distributions page.
My primary reasons for not wanting "distribution" in the infobox are that a) it takes up width in the template and will most likely cause it to wrap (ugh) and b) I don't see it as wholly necessary since "distribution" is already at the top of the page. Further point on b, I commonly say (and hear) things like "Let X be gamma" or "Let X be gaussian" since "distribution" is understood and, thus, implicit. I have no qualms with excluding "distribution" from the infobox.
Actually, I would sooner drop the name from the infobox than clutter it up... Cburnett 19:03, 1 Apr 2005 (UTC)
Well, we could make it a smaller font, put in line breaks for long ones, etc., but I favor keeping it rather than dropping it. My qualms remain, but it's a style issue, not a true-false issue, so I'm not fanatical. Can we try to put another interested contributor on the spot as a tie breaker, e.g. MarkSweep? PAR 19:30, 1 Apr 2005 (UTC)
I'm inclined to side with Cburnett on this, purely for reasons of space. We have articles with long titles like Scaled-inverse-chi-squared distribution, and I don't see how the full title would fit in an infobox, which shouldn't be wider than 350px. --MarkSweep 04:15, 2 Apr 2005 (UTC)
Ok, I accept the will of the majority but I reserve the right to complain endlessly about it. PAR 06:33, 2 Apr 2005 (UTC)

Addition field: support

I definitely think Support (mathematics) should be added right above parameters. Though, I think Support (statistics) or Support (mathematics)#Statistics should be created to specifically address pdf/pmf supports. (Posting this here since I won't get to it for a bit.) Cburnett 02:57, 21 Mar 2005 (UTC)

I concur. I'm adding this now. --MarkSweep 21:48, 23 Mar 2005 (UTC)
Actually, I changed the order: parameters first; then support (which may depend on the parameters, e.g. for the binomial distribution); then the pdf and cdf formulas, which depend both on the parameters and the support/domain. --MarkSweep 22:01, 23 Mar 2005 (UTC)


Now, how about a yes/no answer to "In exponential family?" Can't imagine that it'd be fun/easy to work that into each article. Easier to be in the infobox. Cburnett 00:28, 24 Mar 2005 (UTC)

Good idea. Perhaps a combination of that and a link to the conjugate prior distribution, if available? --MarkSweep 23:49, 7 Apr 2005 (UTC)
Oh, and also sufficient statistic etc. --MarkSweep 01:42, 9 Apr 2005 (UTC)
Perhaps we should get everything else done first then worry about adding stuff? :) With the number of distributions I think it'll still be a fair amount of work just to get them all using the template with distribution plots and somewhat cohesive articles to boot. Otherwise, I'd like to see expo family, conjugate prior if it has one, sufficient statistic for N samples, and anything else we can think of.
Though I have talked to some people on IRC and know it won't happen any time soon, but it'd be neat to have an online pdf/cdf generator using gnuplot or something. Put in the distribution and parameters and plot it. Cburnett 03:54, Apr 11, 2005 (UTC)

Standard Layout for Probability Distribution Pages

I suggest we continue this on Wikipedia talk:WikiProject Probability. --MarkSweep 03:22, 19 August 2005 (UTC)


Subpages

These two are used in the template to only show "PDF" or "PMF" depending on the distribution:

Inverse-gamma

Can anyone verify the pdf image at Inverse-gamma distribution? I don't think I've ever actually seen a plot of the pdf so I'm not sure it's correct. Cburnett 05:44, Apr 7, 2005 (UTC)

I plotted them and, by eye, they look fine. PAR 12:07, 7 Apr 2005 (UTC)

Italics or not?

Should this be italicized or not?

X \sim N(\mu, \sigma^2) or X \sim \mbox{N}(\mu, \sigma^2)
X \sim Gamma(\alpha,\beta) or X \sim \mbox{Gamma}(\alpha,\beta)
X \sim Inv-Gamma(\alpha,\beta) or X \sim Inv\mbox{-}Gamma(\alpha,\beta) or X \sim \mbox{Inv-Gamma}(\alpha,\beta)

I guess I prefer the italics, but TeX interprets the hyphen as subtraction (first inv-gamma), whereas putting the hyphen into an \mbox{} makes it look better (second inv-gamma). So going with italics would mildly complicate names unless we drop the hyphen altogether:

X \sim InvGamma(\alpha,\beta) or X \sim \mbox{InvGamma}(\alpha,\beta)

Cburnett 17:25, Apr 13, 2005 (UTC)

Well, since no one gave input: I'm using no italics using \mathrm or \mbox (if there's a hyphen). Cburnett 05:14, Apr 24, 2005 (UTC)
That sounds fine. On a related topic, should there be an entry in the infobox to inform about these names? Perhaps the first entry after the plots could be "Formula: \mathrm{Binom}(n, p)", for example. --MarkSweep 05:47, 24 Apr 2005 (UTC)
To be pedantic, I would prefer something like X \sim \mathrm{Binomial}(k; n, p) which explicitly states the parameters, their order, and the variable used for the support. Though, for binomial I've also seen "Bin". It's just whatever we want to set, I guess.
Maybe it's time to start Wikipedia:WikiProject Probability distributions or move this all to Wikipedia:WikiProject Probability (though all this about distributions is just a subset of probability so I think it merits its own project...but not if we're going to set notation of distributions). Cburnett 06:41, Apr 24, 2005 (UTC)

Uniform Distribution

We need some input on the Talk:Uniform distribution (continuous) page. Michael Hardy and I have had a running discussion on the values of the uniform distribution at the transition points. I think we have settled the text aspect of the problem, but the PDF plot is at issue now. Most of the discussion page is devoted to our back and forth, so if people could read the discussion and put in their two cents, I think we can settle this issue. PAR 03:10, 25 Apr 2005 (UTC)

Entropy

Is this the right kind of entropy in this context? Maybe "free entropy" should be used instead? (Michael Hardy)

I think MarkSweep or CBurnett added this to the template, we should ask them what they had in mind. I have had a question about this too, because the entropy is defined in the template as:
S=\int_{-\infty}^\infty f(x)\ln(f(x))\,dx
and I don't see that definition explicitly in the information entropy article. By the way, what is free entropy? PAR 11:10, 21 May 2005 (UTC)
I think it was me who added that entry to the infobox template. The definition of the entropy functional I've been using is the following:
\mathrm{\Eta}(f) = - \int_{-\infty}^{\infty} f(x)\,\ln(f(x))\,dx\!
with the added convention that 0\,\ln(0) = 0. This is how entropy is defined in the information entropy article, except that that article uses discrete distributions in its introductory examples and doesn't explicitly mention this integral. The above integral is also the definition that has been used in all other infoboxes (when it exists and can be expressed compactly). I don't see a reason for using a different notion of entropy for the Wigner semicircle distribution, just because it arises primarily in physics. --MarkSweep 20:59, 21 May 2005 (UTC)
One problem with the definition of entropy in the infobox template is that there is a lack of clarity regarding the units – i.e., the scaling. Especially for discrete pdfs, most people tend to think of entropy as something to be measured in units of bits; however, it is sometimes more convenient to use units of nats, which is what is done in the formula above. The information entropy article uses a mixture of the two, so linking to it in the infobox label does not clarify which is intended. Note, for example, that the geometric distribution and binomial distribution articles seem to be currently reporting entropy in units of bits, whereas the Bernoulli distribution and exponential distribution articles seem to be using units of nats. I suggest to expressly state "(in bits)" or "(in nats)" in the label within the infobox template, and then try to fix the articles that aren't using the agreed units.
Another problem is that the definition (or meaning) of entropy for the discrete and continuous pdf cases is really rather different. For a discrete random variable, the entropy is the minimum average amount of information necessary to exactly represent the value of the random variable. Thus, in this case, the entropy can be interpreted as the amount of information conveyed per sample of the random variable. For the continuous case, entropy is defined as differential entropy (as given above) and is somewhat different. In this case, the expected amount of information necessary to exactly represent the value of the random variable is generally infinite. Thus, in the continuous case, the entropy must not be interpreted as the amount of information conveyed per sample. The interpretation of what "entropy" means in the continuous case is more difficult and obtuse (involving rate–distortion theory).
2001:4898:E0:2019:B581:94D:4C40:2BDC (talk) 20:55, 8 May 2013 (UTC)
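The units issue raised above comes down to the base of the logarithm: an entropy computed with the natural logarithm is in nats, and dividing by ln 2 converts it to bits. A minimal sketch for a Bernoulli variable follows; the function names are hypothetical.

```python
import math


def bernoulli_entropy_nats(p: float) -> float:
    """Entropy of a Bernoulli(p) variable in nats, using 0 ln 0 = 0."""
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0)


def nats_to_bits(h_nats: float) -> float:
    """Convert an entropy from nats to bits (1 bit = ln 2 nats)."""
    return h_nats / math.log(2)


h = bernoulli_entropy_nats(0.5)
print(h)                # ln 2, about 0.693 nats
print(nats_to_bits(h))  # 1 bit for a fair coin
```

The two printed values describe the same distribution, which is why an infobox entry is ambiguous unless the unit is stated.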

Compound Poisson - help

I was playing around with a compound Poisson distribution, which is the sum of a number of identically distributed variables Xi. The number of elements in the sum is a Poisson-distributed variable. For every probability distribution, there will be a compound Poisson version. The one I was working on was one in which the Xi are zeta-distributed. My question is: what is the name of this distribution? "Compound Poisson/zeta" or what? In general, if XXX is the name of a distribution, what's the name of the compound Poisson distribution over XXX? Thanks - PAR 01:15, 7 Jun 2005 (UTC)
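The construction described above (a Poisson-distributed number of identically distributed terms, summed) can be sketched as a sampler. This is only an illustration: the function names are hypothetical, the terms are additionally assumed to be independent, and the term sampler is passed in as a parameter rather than implementing a zeta sampler.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    # Multiply uniforms until the product drops to the threshold or below.
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_compound_poisson(lam, draw_term, rng):
    """Sum N independent terms, where N ~ Poisson(lam).

    draw_term(rng) returns one sample of the term distribution Xi.
    """
    n = sample_poisson(lam, rng)
    return sum(draw_term(rng) for _ in range(n))


rng = random.Random(0)
# Example with exponentially distributed terms (any sampler could be used).
print(sample_compound_poisson(3.0, lambda r: r.expovariate(1.0), rng))
```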

additional field? Exponential family

Many distributions are in the Exponential family and I think including the exponential family form would be a nice addition. It is true that even the exponential family page has only one distribution in the exponential family form, but I think this would be a worthwhile endeavor for Wikipedia. I'm not sure how the specification of the natural parameter should be handled: in another field or in the exponential family form field. Pdbailey 17:02, 24 April 2006 (UTC)

Template needs repairing

One of the box items points to a non-existing page Template:Probability distribution/link. The template therefore needs repairing. I don't know what was intended here, so could someone who does know please make the necessary correction. DFH 10:40, 31 March 2007 (UTC)

I think what is happening here is that this construction splices the entry in the "type" field (i.e., "mass" or "density") into "Probability <blank> function", which is then made into a wikilink. When there is no type entry or one besides those two (e.g., Cantor distribution), the construction fails to parse to anything linkable. How to fix it I don't know, but I hope this helps someone who does. Baccyak4H (Yak!) 20:10, 21 August 2007 (UTC)
I've just noticed that the key to this is the #subpages section above (Bit late for DFH I suspect). It should be straightforward to make another subpage e.g. for singular distributions, if someone really wants to. --Qwfp (talk) 11:53, 10 February 2008 (UTC)

background color clashes with math png

Whenever one of the fields is done using LaTeX math markup and the result converted to PNG rather than to HTML, the background is white, which clashes with the table background color. Is it possible to change either the table background or the rendering math background to be consistent or at least less clashing?

I believe I tried to find a way to change the math background some time ago but never succeeded, so I suspect the table is where a change would be possible. Any thoughts or suggestions? Baccyak4H (Yak!) 18:45, 15 April 2008 (UTC)

Probability generating function?

Per a discussion on the math ref desk, I had the idea of allowing a field in this template for the probability-generating function for discrete (i.e., type=mass) distributions. It certainly may be useful, and relevant, but it may make the template a little big and bulky. Thoughts? Baccyak4H (Yak!) 14:27, 4 September 2008 (UTC)

Yes, and the cumulant generating function is also worth considering for the template. Bo Jacoby (talk) 15:07, 4 September 2008 (UTC).
It would be worthwhile to add these fields, especially since for some distributions the pgf cannot be found anywhere in the article (e.g., Poisson distribution), even if it is, more or less, trivial to compute. --130.92.9.57 (talk) 14:40, 27 February 2009 (UTC)
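
As an illustration of what such fields would hold, the Poisson case mentioned above works out as follows. The symbols G, K, z, t and λ are notation assumed here; the result is the standard one for a Poisson distribution with rate λ.

```latex
\begin{align}
  % Probability-generating function: G(z) = E[z^X]
  G(z) &= \sum_{k=0}^{\infty} z^{k}\,\frac{\lambda^{k} e^{-\lambda}}{k!}
        = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda z)^{k}}{k!}
        = e^{\lambda (z - 1)} \\
  % Cumulant generating function: K(t) = \log E[e^{tX}] = \log G(e^{t})
  K(t) &= \lambda \left( e^{t} - 1 \right)
\end{align}
```

Since K(t) is the logarithm of G(e^t), the two proposed fields carry closely related information for discrete distributions.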

Narrowing right-side margin for wider text

22-Feb-2009: I am setting the right-hand margin of the infobox to zero, by using the style coding "margin-right: 0em". Formerly, the margin had shifted the infobox to the left by about 4 letters (characters), squeezing the left-side text by the same amount. This change should allow most articles more text to the left side of the infobox. -Wikid77 (talk) 15:25, 22 February 2009 (UTC)

New parameters: box_width and marginleft

22-Feb-2009: I am adding a new parameter "box_width" (default 325px) so that the infobox can be narrowed to allow more left-side text. Also, the new parameter "marginleft" sets the left-hand margin outside the infobox, by using the style coding "margin-left: 1em". Formerly, the infobox had always been separated from the left-side text by about 4 letters (characters), squeezing the left-side text by the same amount. These changes should not affect any article until the new parameters are specified within it, to allow more text to the left side of the infobox. -Wikid77 (talk) 15:25, 22 February 2009 (UTC)

Put some documentation inside template

22-Feb-2009: I have put a limited amount of documentation inside the template, to be displayed (as typical) during the template stand-alone mode. New versions of the MediaWiki software skip the documentation when formatting pages, and older versions could allow at least 10 copies of a medium-sized template to be used within one article, before the template-processing buffers filled with coding & documentation. The English Wikipedia has been upgraded to skip template-documentation text since January 2008. Documentation text has always been omitted before displaying an article page over the Internet (unless editing the template).

Feel free to revise or reduce that documentation in the template. -Wikid77 (talk) 15:25, 22 February 2009 (UTC)

Formatting of support intervals: why the semi-colon?

Presently the template uses the notation such as "[0;4]" to indicate an interval from 0 to 4 inclusive. I am not familiar with a semi-colon being used to denote an interval. I have always seen commas used (at least in English textbooks) such that the previous interval would be denoted by "[0,4]". Is the semi-colon in common use in statistics? My experience in analysis is such that I have only come across the comma. Jason Quinn (talk) 19:33, 9 May 2010 (UTC)

The template doesn't enforce any particular formatting for the support intervals (and in fact, support is rarely an interval). If you feel the commas are more appropriate, feel free to edit the support value in the infobox.  // stpasha »  19:44, 9 May 2010 (UTC)
You are correct that it is not the template itself. I don't know what I was thinking when I wrote that. Still, I am curious why ";" has been used so consistently in the support intervals when people have used this template. If it is not common practice, I may end up changing it in the future, but I'll wait for more input. Jason Quinn (talk) 20:53, 9 May 2010 (UTC)

Quantile

I guess I'm out of practice with templates. I put in the quantile function as an option, entered it in the Exponential distribution, but it did not show up. How to fix this? PAR (talk) 01:06, 11 June 2014 (UTC)

Added.--Mpaa (talk) 11:09, 11 June 2014 (UTC)
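
For reference, the quantile function for the exponential distribution follows from inverting the cdf. The sketch below assumes the usual rate parameterization with rate λ; the symbols F, Q and p are notation assumed here, not taken from the template.

```latex
\begin{align}
  % cdf of the exponential distribution with rate \lambda, for x \ge 0
  F(x) &= 1 - e^{-\lambda x} \\
  % Set F(x) = p and solve for x to obtain the quantile function
  Q(p) &= -\frac{\ln(1 - p)}{\lambda}, \qquad 0 \le p < 1
\end{align}
```

Setting p = 1/2 gives the median ln(2)/λ, which is a quick consistency check against the "median" field.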