Talk:Pareto distribution

From Wikipedia, the free encyclopedia
WikiProject Mathematics (Rated B-class, Low-priority)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating: B-Class, Low priority. Field: Probability and statistics.
WikiProject Statistics (Rated B-class, Mid-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as B-Class on the quality scale and as Mid-importance on the importance scale.

See also: Archive 1.


For those visual thinkers among us, can we have an example graph of this?

A very dull graph: starting at the minimum value x_m, the density falls as x increases and the cumulative distribution rises, each with a slope which becomes shallower for large x.
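To make the "dull graph" concrete without an image, here is a minimal Python sketch that tabulates the density and the CDF. It assumes the Type I form with minimum x_m and exponent k, i.e. density k·x_m^k / x^(k+1) and CDF 1 − (x_m/x)^k for x ≥ x_m; the function names are illustrative, not taken from the article.

```python
def pareto_pdf(x, k, x_m):
    """Density of a Type I Pareto distribution; zero below the minimum x_m."""
    if x < x_m:
        return 0.0
    return k * x_m ** k / x ** (k + 1)


def pareto_cdf(x, k, x_m):
    """Cumulative distribution function: P(X <= x) = 1 - (x_m / x)^k for x >= x_m."""
    if x < x_m:
        return 0.0
    return 1.0 - (x_m / x) ** k


if __name__ == "__main__":
    # Tabulate a few points instead of drawing the graph (k = 2, x_m = 1).
    for x in [1.0, 1.5, 2.0, 3.0, 5.0, 10.0]:
        print(f"x={x:5.1f}  pdf={pareto_pdf(x, 2, 1.0):.4f}  cdf={pareto_cdf(x, 2, 1.0):.4f}")
```

The density is largest (k/x_m) right at x_m and decays like a power of x, while the CDF climbs towards 1 ever more slowly.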
I have removed the statement "If the value of k is chosen judiciously then the Pareto distribution obeys the '80-20 rule'", since it depends on a right truncation which this distribution doesn't have; allowing such truncation judiciously would mean that most distributions met the "80-20" rule. --Henrygb 00:13, 6 Aug 2004 (UTC)

Technical cleanup tag

It was commented to me that articles like this are not and should not be aimed at non-technical readers. WikiProject Science and other communal efforts I've seen generally have the goal of making the first part of the article accessible to the general public, but allowing for later parts which may be intelligible only to technical readers. That's certainly possible to do in this case.

I'm the one who commented, and I didn't say "articles like this"; I said articles on probability distributions, and I didn't just say "non-technical readers"; I said readers not familiar with the mathematical theory of probability. Michael Hardy 02:47, 6 Mar 2005 (UTC)

Because Pareto distributions are used in economics and sociology with regard to political issues of public interest, it's entirely likely that non-technical readers will arrive at this article needing to know what this thing is. Not necessarily in precise detail, but in vague outline, at least.

This article isn't very accessible even to many technical readers. I have a degree from MIT, and I've taken math up through differential equations.

Make any article as accessible as possible to as many readers as possible. Why not? When your son or daughter asks you a question, do you ever answer "this is not something you should know or understand"? I hope not. There is always something that can be said about even very remote, abstract, technical, difficult ... subject matter. Often I find that a really thoughtful answer to a child comprises exactly the salient points a lay reader would be helped by as well. In the current instance I have been studying 1/f noise, power-law distributions, critical phase transitions ... and related subjects, and a mention of the 80-20 rule brought me to this article. The article is opaque to me and no help to an interested reader with a weak math background. Suggestions for improving this article: explain in English the relationship to power laws; explain the scale invariance of the 80-20 rule; speak in English first, then write the math formulas, which are apparently unclear even to PhD mathematicians but may be helpful to some. Yes, many of us want a "vague outline." The belief that "... generally, articles on mathematics shouldn't need to be comprehensible to everyone ..." (see below) is both right and wrong in some ways, but it is definitely not the correct attitude for contributors. The statement is correct in that full understanding of a mathematical intricacy will be unavailable to many, but it is wrong in that some understanding is almost always available, and when we contribute to Wikipedia we should strive to impart what understanding we can to all readers. These lay readers may be the Newtons and Laplaces of the future. It is worth noting that the "lay" user group is the largest user group. Most folks are not mathematicians, although we may still love math. — Preceding unsigned comment added by (talk) 16:41, 6 June 2014 (UTC)
But the relevant question is whether you've studied probability theory. Course 18.440 at MIT, titled Probability and Random Variables, does not require "up through differential equations", but only first-year calculus, which most MIT students have before entering as freshmen (and "first-year" is construed differently at MIT in this case). Anyone who's studied continuous probability distributions (and not just at MIT!) can understand this article. But yes, probably some things could be said for "lay" readers initially. Perhaps because of this distribution's occurrence in social sciences, that would make more of a difference in this case than with most probability distributions. But generally, articles on mathematics shouldn't need to be comprehensible to everyone who's studied only high-school math. Michael Hardy 02:47, 6 Mar 2005 (UTC)

I could make a graph (either mentally, digitally, or on paper) that plots a typical Pareto distribution, but that would be a lot of work that I shouldn't really have to do. I'm sure there are many scientists and computer engineers who would benefit from a better introduction.

... actually, if I were trying to write an introduction for non-mathematically inclined social scientists, a graph wouldn't be the first thing I would attend to. Maybe I'll work on this at some point .... Michael Hardy 02:47, 6 Mar 2005 (UTC)

Fortunately, I think all this article needs to be much more widely accessible is a graph or two of typical Pareto distributions, with labels and a brief explanation. -- Beland 02:19, 6 Mar 2005 (UTC)

I did study probability theory, back in the day, and found the article a bit terse. What I hoped to see at the start was a few extra paragraphs:

1. A brief general introductory paragraph or two pitched at people with only a craps or texas-hold-em knowledge of probability - why it matters, the elevator speech statement of what it means, etc.

2. Move the short section on things claimed to match a Pareto distribution from the bottom of the article to the top, perhaps with a few hard numbers added to it. (The usual: for k=1, x% will be <=3, with similar figures for k=2 or 3. This is still fluffy, but it gives a numerical feel to that graph and to the fluffy stuff in the first paragraph.)

Pretty much the same content, but with the take home goodies near the top. --ScottEllsworth 08:00, 20 Mar 2005 (UTC)
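The kind of "hard numbers" suggested in point 2 can be sketched in a few lines of Python. This assumes x_m = 1 and uses the threshold 3 from the comment; with those choices the fraction of values at most 3 is 1 − 3^(−k), roughly 66.7%, 88.9% and 96.3% for k = 1, 2, 3. The function name is illustrative.

```python
def fraction_at_most(threshold, k, x_m=1.0):
    """P(X <= threshold) for a Type I Pareto with exponent k and minimum x_m."""
    if threshold < x_m:
        return 0.0
    return 1.0 - (x_m / threshold) ** k


if __name__ == "__main__":
    for k in (1, 2, 3):
        # With x_m = 1 this is 1 - 3**(-k).
        print(f"k={k}: {100 * fraction_at_most(3.0, k):.1f}% of values are <= 3")
```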

Pareto density at xmin

Don't we need to come up with a value at the transition point x = x_min? I'm in favor of p(x_min) = (1/2) k/x_min, because it allows definition in terms of, say, the Heaviside step function, and F^{-1}(F(p))(x) = p(x) uniformly, where F is the Fourier transform. Whatever we come up with, I will alter the graphic accordingly. Paul Reiser 23:27, 15 Mar 2005 (UTC)

For purposes of probability theory, the value of a density at a boundary point does not matter since it does not affect the value of any integral. But for purposes of maximum likelihood estimation in statistical inference, you'd probably want to make it the maximum. Therefore I would not use half the maximum. But of course, the inverse Fourier transform of the Fourier transform of the density may give you half the maximum. Michael Hardy 00:08, 16 Mar 2005 (UTC)
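A minimal Python sketch of the maximum-likelihood point, assuming the density is taken as positive on the closed interval [x_m, ∞): the likelihood grows with x_m up to the smallest observation, so the estimate of x_m is the sample minimum, and the estimate of the exponent then follows in closed form. The helper name `pareto_mle`, the seed and the parameter values are illustrative assumptions; `random.paretovariate` is the standard-library Pareto generator with minimum 1.

```python
import math
import random


def pareto_mle(samples):
    """Maximum likelihood estimates (alpha_hat, x_m_hat) for a Type I Pareto sample.

    The likelihood alpha^n * x_m^(n*alpha) / prod(x_i^(alpha+1)) increases with x_m,
    so it is maximised at the largest admissible value, the sample minimum. That is
    only admissible if the density is positive at the boundary point x = x_m.
    """
    x_m_hat = min(samples)
    log_sum = sum(math.log(x / x_m_hat) for x in samples)
    alpha_hat = len(samples) / log_sum
    return alpha_hat, x_m_hat


if __name__ == "__main__":
    rng = random.Random(1)
    true_alpha, true_x_m = 2.5, 3.0
    # paretovariate draws from a Pareto with minimum 1; rescale to x_m.
    data = [true_x_m * rng.paretovariate(true_alpha) for _ in range(50_000)]
    print(pareto_mle(data))
```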

Comment by an actuary

Why is the exponent called "k" ? This tends to make one think that the parameter only takes on integral values, which is not true. European actuaries use alpha, Americans use "Q"--either would be better.

Consider mentioning that the Pareto is often shifted so its support starts at 0; put in a reference to "shifted distribution."

Note that the conditional distribution of X given X > t (for any t ≥ x_m) is also Pareto, with the same exponent and with minimum t.
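A short Python sketch of that property, assuming a Type I Pareto with minimum x_m: for s ≥ t ≥ x_m, P(X > s | X > t) = (t/s)^α, which is again Pareto with the same exponent and with minimum t, and no longer depends on x_m. The function names and the simulation parameters are hypothetical.

```python
import random


def survival(x, alpha, x_m):
    """P(X > x) for a Type I Pareto distribution."""
    return 1.0 if x <= x_m else (x_m / x) ** alpha


def conditional_survival(s, t, alpha, x_m):
    """P(X > s | X > t) for s >= t >= x_m; equals (t / s)^alpha."""
    return survival(s, alpha, x_m) / survival(t, alpha, x_m)


if __name__ == "__main__":
    rng = random.Random(0)
    alpha, x_m, t, s = 1.7, 1.0, 4.0, 10.0
    draws = [x_m * rng.paretovariate(alpha) for _ in range(200_000)]
    above_t = [x for x in draws if x > t]
    empirical = sum(x > s for x in above_t) / len(above_t)
    # Empirical conditional tail vs. the formula (t / s) ** alpha.
    print(empirical, conditional_survival(s, t, alpha, x_m))
```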

Maybe note that Method of Moments parameter estimation doesn't work (even more so than usual!). (Because setting the mean equal to the sample mean implies an assumption that the exponent is at least one.)
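To illustrate the method-of-moments problem, here is a Python sketch that assumes x_m is known and solves mean = α·x_m/(α − 1) for α. Since every observation is at least x_m, the sample mean exceeds x_m and the estimate is always above 1, even for data simulated with α = 0.8, where the population mean is infinite. The helper name and seed are illustrative.

```python
import random


def pareto_mom_alpha(samples, x_m):
    """Method-of-moments estimate of alpha with x_m known.

    Solves mean = alpha * x_m / (alpha - 1) for alpha. Because every sample is >= x_m,
    the sample mean exceeds x_m and the estimate is always > 1.
    """
    mean = sum(samples) / len(samples)
    return mean / (mean - x_m)


if __name__ == "__main__":
    rng = random.Random(42)
    # True alpha = 0.8: the population mean is infinite, yet the estimate is finite and > 1.
    data = [rng.paretovariate(0.8) for _ in range(10_000)]
    print(pareto_mom_alpha(data, 1.0))
```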

Asymptotic theory says that the tails of distributions without finite support look, asymptotically, exponential or Pareto. This should be linked. —Preceding unsigned comment added by (talkcontribs)

Alternative R code

The provided R code for random sample generation does not translate from the origin to lambda, and thus yields numbers lower than lambda. A good alternative that provides the wanted values directly can be found in [1].

Generalized Pareto distribution

I have just changed Generalized Pareto Distribution so that it redirects here rather than to Generalized extreme value distribution, which was incorrect. Now we need someone to expand the new section (perhaps with reference to [2]). Any volunteers? DFH 18:59, 23 December 2006 (UTC)

The page Generalized Pareto distribution has been created and I changed your redirect to direct there insteadPurple Post-its (talk) 16:19, 29 June 2011 (UTC)

Relationship to the exponential distribution

I'm not quite sure what the relationship is, but how it is defined in this article is rather ambiguous. The exponential random variable has one parameter, but the formula implies that there are two parameters for the exponential distribution. There's a relationship between the Pareto distribution and the uniform distribution, as described in Statistical Distributions, Second Edition, by Evans, Hastings, and Peacock. Perhaps this is a simpler and more meaningful relationship. Steve Simon 23:30, 22 January 2007 (UTC)

A probability distribution does not have any parameters; rather a family of probability distributions may be parameterized. The usually-seen family of exponential distributions has just one parameter. However, one may speak of an exponential distribution on an interval (a, ∞), and the minimum point a is itself a parameter. This is the conditional probability distribution of an exponential distribution on (0, ∞), given the event of being ≥ a. One then gets a more extensive family of exponential distributions, parameterized by two real parameters. Michael Hardy 20:00, 23 January 2007 (UTC)
So which parameter corresponds to the value of "a" in your example? Is it k? Is it x_m? Also, if the two-parameter family of exponential distributions is an important one, perhaps that should be incorporated on the Exponential distribution page. Steve Simon 21:31, 25 January 2007 (UTC)
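One way to see the two-parameter exponential connection, sketched in Python under the standard Type I parameterization: if X is Pareto with exponent α and minimum x_m, then log X is exponential on (log x_m, ∞) with rate α, so log x_m plays the role of the minimum point a. The seed and parameter values below are arbitrary choices for illustration.

```python
import math
import random

rng = random.Random(7)
alpha, x_m = 2.0, 5.0
draws = [x_m * rng.paretovariate(alpha) for _ in range(100_000)]

# log X is supported on (log x_m, inf), so log x_m plays the role of the location "a".
logs = [math.log(x) for x in draws]
print(min(logs), math.log(x_m))

# The excess log X - log x_m should behave like an exponential with rate alpha (mean 1/alpha).
mean_excess = sum(v - math.log(x_m) for v in logs) / len(logs)
print(mean_excess, 1 / alpha)
```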

Distribution example: The standardized price returns on individual stocks

Price returns on individual stocks can be, as we have recently discovered, negative as well as positive. I do not see how they can be described by a Pareto distribution.

Even if you take the absolute value I have not read anything in the financial literature that says it is Pareto distributed. And we are speaking of price returns over what time period (a second, a day, a year ?) by the way.

Doobliebop (talk) 20:05, 21 July 2009 (UTC) You can fit the distribution to the positive and negative returns independently (yielding two gamma and alpha coefficients) or to the absolute value. Take a look at "the (mis)behavior of markets" by B. Mandelbrot for an easy introduction, or for an academic paper "On fitting the Pareto-Levy distribution to stock market index data: selecting a suitable cutoff value" by H.F. Coronel-Brizio and A.R. Hernandez-Montoya. In this paper they fit the negative and positive returns independently. Mandelbrot's book implies that the time period is not important... historical, weekly, minute etc...


This passage purports to tell us how to generate a Pareto-distributed random variable:

The process is quite simple; one has to generate numbers from an exponential distribution with its λ equal to a random generated sample from a gamma distribution
This process generates data starting at 0, so then we need to add x_m.

I have a high tolerance for unclear writing and a Ph.D. in statistics, but I can't follow this. It says "one has to generate numbers from an exponential distribution[...]". That seems clear. Then it says "with its λ equal to a random generated sample from a gamma distribution ". There it's lost me. λ is either the expected value or its reciprocal; since conventions vary, you should specify which. And it doesn't specify which. But then it says "equal to a random generated sample from a gamma distribution". Does that mean λ itself is a random variable? I'm not at all sure. And what does the line of TeX mean? I'd guess it's specifying the shape parameter of the gamma distribution and saying it's equal to the Pareto index. But again I'm not sure. Then after the word "and" I can't figure out what's being said at all. Finally it says we need to add x_m.

There really is a simple way to generate Pareto-distributed random variables, already described elsewhere in this article: Let Y be exponentially distributed with intensity α. Then x_m e^Y is Pareto-distributed. That's all it takes! Yet we have this long, complicated, opaque discussion that someone with a Ph.D. in statistics can't figure out, when we should be writing at a level that undergraduates will easily understand. And I do think undergraduates will easily understand what I wrote in this paragraph. Michael Hardy (talk) 02:34, 23 July 2009 (UTC)
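The recipe in the preceding paragraph translates directly into Python. This is a sketch, assuming "intensity α" means the rate, i.e. Y has mean 1/α; Python's `random.expovariate` likewise takes the rate. The function name is illustrative.

```python
import math
import random


def pareto_via_exponential(alpha, x_m, rng=random):
    """One Type I Pareto(alpha, x_m) draw as x_m * exp(Y), where Y ~ Exponential(rate alpha)."""
    y = rng.expovariate(alpha)  # expovariate's argument is the rate (1 / mean), not the mean
    return x_m * math.exp(y)


if __name__ == "__main__":
    rng = random.Random(3)
    draws = [pareto_via_exponential(2.5, 1.0, rng) for _ in range(100_000)]
    # P(X > 2) should be close to (1 / 2) ** 2.5, about 0.177.
    print(sum(x > 2.0 for x in draws) / len(draws))
```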


T = xm*U^(-1/alpha), U ~ uniform(0,1), T ~ Pareto(alpha,xm). Of course, exponential deviates can easily be transformed. No need to discuss the gamma distribution here. —Preceding unsigned comment added by Dblord (talkcontribs) 12:18, 16 August 2009 (UTC)
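The same inverse-transform formula as a Python sketch. One detail worth handling: `random.random()` returns values in [0, 1), so this version uses U = 1 − random(), which lies in (0, 1] and avoids raising 0 to a negative power. The function name, seed and parameters are illustrative.

```python
import random


def pareto_inverse_transform(alpha, x_m, rng=random):
    """One Type I Pareto(alpha, x_m) draw via T = x_m * U^(-1/alpha)."""
    # random() is in [0, 1); 1 - random() is in (0, 1], so U**(-1/alpha) is always finite.
    u = 1.0 - rng.random()
    return x_m * u ** (-1.0 / alpha)


if __name__ == "__main__":
    rng = random.Random(9)
    alpha, x_m = 3.0, 2.0
    draws = [pareto_inverse_transform(alpha, x_m, rng) for _ in range(100_000)]
    # The sample mean should be near alpha * x_m / (alpha - 1) = 3.0.
    print(sum(draws) / len(draws))
```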

Yes, I think that also works. Michael Hardy (talk) 12:36, 16 August 2009 (UTC)
That's what the Python library does. (Python is a programming language with "batteries included.") The source code cites "Jain pg. 495," probably referring to something by Raj Jain. Jive Dadson (talk) 03:29, 15 October 2009 (UTC)
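For reference, Python's standard library exposes this as `random.paretovariate(alpha)`, which takes only the shape parameter and returns values with minimum 1; a general minimum x_m can be applied by multiplying. The wrapper below is a hypothetical convenience function, not part of the library.

```python
import random
import statistics


def pareto_sample(alpha, x_m, n, seed=None):
    """n Type I Pareto(alpha, x_m) draws using the standard library.

    random.paretovariate(alpha) takes only the shape and has minimum 1,
    so the scale x_m is applied by multiplication.
    """
    rng = random.Random(seed)
    return [x_m * rng.paretovariate(alpha) for _ in range(n)]


if __name__ == "__main__":
    draws = pareto_sample(alpha=3.0, x_m=2.0, n=100_000, seed=11)
    # The median of a Pareto(alpha, x_m) is x_m * 2 ** (1 / alpha), about 2.52 here.
    print(statistics.median(draws), 2.0 * 2 ** (1 / 3.0))
```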

But where do Pareto variables come from?

Although the transformation of exponential deviates is a much clearer method for generating random variables, I like the gamma method because it explains an important way that Pareto variables can arise. An exponential distribution with an uncertain rate parameter yields a Pareto distribution. This is analogous to how a Student's t-distribution is a normal distribution with an uncertain variance. Perhaps the process isn't a good random number generator, but it's nice as a generative model. Bscan (talk) 14:53, 25 May 2016 (UTC)
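A sketch of the gamma-mixture generative model in Python, with the parameterization spelled out since the original passage left it unclear. Assumption: the rate λ of the exponential is drawn from a gamma distribution with shape α and rate x_m. Marginally, Y then has P(Y > y) = (x_m/(x_m + y))^α, a Lomax (Pareto Type II) distribution starting at 0, and X = Y + x_m has P(X > x) = (x_m/x)^α, a Type I Pareto. Note that `random.gammavariate` takes a scale, so the rate x_m is passed as 1/x_m. The function name is illustrative.

```python
import random


def pareto_via_gamma_mixture(alpha, x_m, rng=random):
    """Exponential with a Gamma-distributed rate, shifted by x_m.

    rate ~ Gamma(shape=alpha, rate=x_m); Y | rate ~ Exponential(rate).
    Marginally Y is Lomax, and Y + x_m is Type I Pareto(alpha, x_m).
    """
    rate = rng.gammavariate(alpha, 1.0 / x_m)  # second argument is the scale, i.e. 1 / rate
    y = rng.expovariate(rate)
    return y + x_m


if __name__ == "__main__":
    rng = random.Random(5)
    draws = [pareto_via_gamma_mixture(2.5, 1.0, rng) for _ in range(100_000)]
    # P(X > 2) should again be close to 2 ** -2.5.
    print(sum(x > 2.0 for x in draws) / len(draws), 2 ** -2.5)
```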

Graph labels

All the graphs are labeled with values of k, whereas the formulas use α. That should be fixed for consistency. (talk) 21:45, 6 December 2009 (UTC)

It is 2013 now and this mismatch still persists. Cerberus (talk) 16:19, 24 April 2013 (UTC)

There are three graphs that use κ or k for the parameter α. For the first two, source code in R is available so it should be easy to fix it for someone with R programming experience. The third graph should be changed to vector graphics in any case and the labeling could be changed at the same time. Isheden (talk) 12:07, 25 April 2013 (UTC)

The y-axis label of the very first plot needs to be changed. It is not Pr(X==x); it is a density, so the appropriate label is f_X(x). — Preceding unsigned comment added by Austrartsua (talkcontribs) 23:36, 18 February 2014 (UTC)

I've uploaded a new version of the Lorenz distributions that uses α instead of k, and also changed the format to SVG. Tkmckenzie (talk) 17:15, 13 August 2014 (UTC)

Citing one of my papers

I have added to this article a reference to a paper that I wrote, so I could be suspected of simple self-promotion. However, I think it improves the article, and in particular, it answers a question that has been asked in (maybe more than one?) discussion page of a Pareto-related Wikipedia article (I don't remember if it was this article, or Pareto principle, or one of the others). The added material appears below. Michael Hardy (talk) 20:38, 15 September 2010 (UTC)

Relation to the "Pareto principle"

The "80-20 law", according to which 20% of all people receive 80% of all income, and 20% of the most affluent 20% receive 80% of that 80%, and so on, holds precisely when the Pareto index is α = log₄ 5 (about 1.161). Moreover, the following have been shown[1] to be mathematically equivalent:

  • Income is distributed according to a Pareto distribution with index α > 1.
  • There is some number 0 ≤ p ≤ 1/2 such that 100p% of all people receive 100(1 − p)% of all income, and similarly for every real (not necessarily integer) n > 0, 100p^n% of all people receive 100(1 − p)^n% of all income.

This does not apply only to income, but also to wealth, or to anything else that can be modeled by this distribution.

This excludes Pareto distributions in which 0 < α ≤ 1, which, as noted above, have infinite expected value, and so cannot reasonably model income distribution.
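The α = log₄ 5 claim can be checked numerically. Under a Pareto distribution with α > 1, the richest fraction p of people receives the share p^(1 − 1/α) of total income, independent of x_m; the Python sketch below (function name illustrative) evaluates that share for p = 0.2, 0.04 and 0.008, which should give 80%, 64% and 51.2%.

```python
import math


def top_share(p, alpha):
    """Share of total income received by the richest fraction p under a Pareto with alpha > 1.

    Equals p^(1 - 1/alpha), independent of x_m.
    """
    return p ** (1.0 - 1.0 / alpha)


alpha_80_20 = math.log(5) / math.log(4)  # log base 4 of 5, about 1.161
print(alpha_80_20)
for n in (1, 2, 3):
    # Top 20%, 4%, 0.8% of people: shares 0.8, 0.64, 0.512 of all income.
    print(0.2 ** n, top_share(0.2 ** n, alpha_80_20))
```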

(end of excerpt)

Examples should have citations

The article gives a nice list of examples where the experimental distributions "are sometimes seen as approximately Pareto-distributed." This should have citations to indicate where these examples may be verified/studied further. Equating social distributions to the distribution of sand grain sizes or meteor sizes suggests an inevitability that may incite the curious to investigate further. A good citation can help such curious readers (such as myself). —Preceding unsigned comment added by Wilkus (talkcontribs) 04:15, 29 September 2010 (UTC)

Infinite divisibility

As far as I understand, the Pareto distribution is infinitely divisible. What are these distributions, i.e. what is the general form of the distributions which converge towards the Pareto distribution? -- Zz (talk) 11:03, 14 February 2011 (UTC)

Pareto II or Lomax Distribution

Wikipedia is sorely in need of material on the Lomax distribution (aka Pareto type II distribution). There is a sentence in this article that indicates we are discussing type I distributions, but the different types are nowhere to be found..... Purple Post-its (talk) 16:26, 29 June 2011 (UTC)

Where would this go? I wrote a draft of Pareto II, III, IV, and Feller-Pareto generalizations, but it is not clear where to insert it. This does not seem to fit under 'Variants' or under 'Relation to other distributions', yet if inserted in the middle of the Pareto Type I details it would be confusing. How about a new section entitled 'Pareto generalizations', or should the various generalizations and variants, including the bounded Pareto, become a separate article? Mathstat (talk) 13:01, 23 February 2012 (UTC)
Your draft comprises a very interesting contribution, but I agree that it does not fit very well in this article. I'd suggest moving it to the existing article Generalized Pareto distribution. Actually, the section you contributed here was more or less what I asked for on the talk page in that article. What do you think? Isheden (talk) 20:44, 3 March 2012 (UTC)
It isn't clear whether the GPD contains all the types i-iv and FP as special cases. I used the term "Generalized Pareto distributions" (plural) as in Arnold. Also the references state that "Pareto" distribution often means Pareto I or Pareto II, so it may be important to see Pareto II on the same page as Pareto I. Mathstat (talk) 21:33, 3 March 2012 (UTC)
That page starts out with "The family of generalized Pareto distributions (GPD)" so I think that page could be renamed to the plural form following Arnold. An alternative would be to add a section Generalizations in this article after Variants. Isheden (talk) 22:08, 3 March 2012 (UTC)
Kleiber and Kotz (2003) state on p. 60 "In his pioneering contributions at the end of the nineteenth century, Pareto (1895, 1896, 1897a) suggested three variants of his distribution." These are the types I, II, III. So "Pareto" distribution does not necessarily refer to Type I (not even by Pareto). Mathstat (talk) 21:58, 3 March 2012 (UTC)
Perhaps it is a good idea then to turn Pareto distribution into a disambiguation page? Isheden (talk) 22:08, 3 March 2012 (UTC)
Or should Lomax distribution be merged into this page? Isheden (talk) 22:09, 3 March 2012 (UTC)

I posted a discussion related to this on Wikipedia talk:WikiProject Statistics. Any comments would be appreciated. Isheden (talk) 13:35, 13 March 2012 (UTC)

Merge from Pareto index

I think it would be more natural to discuss the Pareto index in this article, similar to how the rate parameter is discussed in the article exponential distribution. Isheden (talk) 16:02, 4 January 2012 (UTC)

I'd have to disagree, as the Pareto index has implications of economic inequity, along with the Gini Coefficient. As an encyclopedic effort, I find the Pareto index page much more comprehensible, and useful as it is. Merging it with this page would simply dilute its value and make it more difficult to find and use well. In other words, I find the one page a layperson's resource for economics, and the other a highly technical discussion of mathematics. Just to be absurd, why not merge both pages with Bradford's law? Sorry, but that best illustrates my point. TheLastWordSword (talk) 19:34, 3 March 2012 (UTC)

OK, those are valid points. The term Pareto index seems to be used mainly in economics and that page is more applied. I'll add some other terms for the parameter in this article and remove the merge proposal. Isheden (talk) 20:38, 3 March 2012 (UTC)

Fix the cases where the skewness, the excess kurtosis, etc. exist, but are infinite

Before this writing, the authors of this article left the expected value, the variance, and some other moments undefined in some cases where they are in fact defined but infinite (as values on the extended real line). At the time of this writing, I have corrected the cases for the expected value and the variance, but the cases for the skewness, the excess kurtosis, etc. still need to be corrected. — Preceding unsigned comment added by (talkcontribs)
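A minimal Python sketch of how the infinite cases can be encoded, using the standard raw-moment formula E[X^n] = α·x_m^n/(α − n) for α > n and +∞ otherwise. Returning `nan` for the variance when α ≤ 1 (where the mean itself is infinite) is an assumption of this sketch, not a convention taken from the article; the function names are illustrative.

```python
import math


def pareto_raw_moment(n, alpha, x_m):
    """E[X^n] for a Type I Pareto(alpha, x_m): alpha * x_m^n / (alpha - n) if alpha > n, else +inf."""
    if alpha <= n:
        return math.inf  # the integral of x^n * alpha * x_m^alpha / x^(alpha+1) diverges
    return alpha * x_m ** n / (alpha - n)


def pareto_variance(alpha, x_m):
    """Variance: finite for alpha > 2, +inf for 1 < alpha <= 2.

    For alpha <= 1 the mean itself is infinite; this sketch returns nan there
    rather than pick a convention.
    """
    if alpha <= 1:
        return math.nan
    if alpha <= 2:
        return math.inf
    return x_m ** 2 * alpha / ((alpha - 1) ** 2 * (alpha - 2))


if __name__ == "__main__":
    for a in (0.5, 1.5, 2.5, 4.0):
        print(a, pareto_raw_moment(1, a, 1.0), pareto_variance(a, 1.0))
```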

Degenerate case property: Superfluous?

The degenerate distribution (referred to in the article as a "Dirac delta function") is a special or degenerate case of all common probability models. Its mention in this article is therefore arguably superfluous. I submit that the subsection be deleted from the article. Lovibond (talk) 03:11, 29 September 2012 (UTC)

There being no objections in five weeks, the superfluous material is excised. Lovibond (talk) 19:06, 2 November 2012 (UTC)

Clarification needed

1. In the Section "Pareto Types I–IV" it reads: "The Pareto distribution of the second kind is also known as the Lomax distribution."
In the ensuing table one finds:
Type II -
Lomax -
It follows that: (1) Lomax is not Type II and (2) the Pareto distribution of the "second kind" is not Type II.
The meaning of "second kind" remains unclear. This needs elaboration by an expert.

2. In addition, in the article Lomax distribution one reads: "The Lomax distribution, also called the Pareto Type II distribution, is ...... ". This seems conflicting with the previous conclusion from the given table that Lomax is not Type II. How is that?

3. Further it seems inconsistent to use in the table alternately exponent (outside the [ ] brackets) and exponent twice (without [ ] ). More uniformity might help the reader to spot the essential difference more easily. Also the alternate use of σ and C seems to complicate the comparison unnecessarily. What about writing Lomax as . Then the difference with Type II (the term -μ) and the other types in the table becomes obvious.

4. Another thing: in the section "Definition" the symbol xm is used instead of the σ in the table. Should this not be made uniform?

5. In probability theory, the symbols σ and μ have a routine meaning, which is different from the meaning used in the table mentioned. Would it not be wise to change the symbol σ into λ (as in Lomax distribution) or xm and the symbol μ into η, or whatever, to avoid confusion?

6. Also, in the following table concerning moments, the Lomax distribution is missing. Would it not be logical to add it there?

Hopefully there is someone who can sort this all out. Asitgoes (talk) 08:28, 19 October 2012 (UTC)

I agree that this is a confusing issue. Probably the inconsistencies reflect that different sources use different notation and also slightly different definitions. A quick check on Google books gives support for both definitions: [3] [4] In the table, Pareto type II also has a location parameter μ but it's not clear if this is standard usage of the term. I also agree that the notation for the scale parameter should be made uniform. Again, different sources use different notation but σ seems to be used typically for various generalizations of the Pareto distribution. The notation for the shape parameter should also be consistent. Isheden (talk) 09:01, 19 October 2012 (UTC)
There is one more matter. In the table, the support of the Lomax distribution is given as x≥0 and C>0. It seems that x>-C is also supportable, which is ampler. Why not change this? Asitgoes (talk) 09:21, 19 October 2012 (UTC)
Both sources in my previous post give x>0 as support. Are there reliable sources that define the support the way you propose? Isheden (talk) 09:34, 19 October 2012 (UTC)
Not being a Pareto/Lomax expert I do not know a reliable source, but (simple) mathematics suggests that if x+C>0 all is well. Asitgoes (talk) 11:10, 19 October 2012 (UTC)

I have changed the expression for the Lomax distribution in line with the suggestion in 3. above. Some of the other points may be more relevant for the article on the Lomax distribution, see talk:Lomax distribution. Isheden (talk) 11:05, 19 October 2012 (UTC)

For reasons of logic and clarity, in the first lines of the section "Pareto Types I–IV", I have put the condition μ=0 for the Pareto distribution Type II to be the same as the Lomax distribution, and also I have made a note that the previously used symbol xm is being replaced by σ. Asitgoes (talk) 21:33, 21 October 2012 (UTC)

Bradford's Law

Removed a sentence in the lede which had been tagged as needing a citation: "Outside the field of economics it" (Pareto distribution) "is sometimes referred to as the Bradford distribution.[citation needed]" The Bradford distribution of Bradford's law is applied in bibliometrics. It is probably more accurate to say that in bibliometrics Bradford's law is sometimes known as a power law; actually, it is Lotka's law that is more like a power law. See e.g. Distributions in Information Science – Making the Case for Logarithmic Binning, page 2. Note that Bradford's law is already in the "See also" section, which seems to be the right place to mention it. Mathstat (talk) 18:32, 16 December 2012 (UTC)

Power Law?

Many characterize the Pareto distribution as a power law.

Power laws are of the form: but Pareto is essentially of the form . These are distinct functional forms and neither can be used to approximate the other. In what sense is the Pareto distribution a manifestation of a power law? — Preceding unsigned comment added by 2604:6000:6FC0:1E:18C8:C97E:9EBF:9913 (talk) 21:58, 18 November 2015 (UTC)

It's not a power law according to that definition, but it involves a power in a similar way. That's all. Eric Kvaalen (talk) 16:08, 24 January 2016 (UTC)

Low and negative wealth

I have reinserted the sentences "Note that the Pareto distribution is not realistic for wealth for the lower end. In fact, net worth may even be negative." These were recently removed by user:Wingedsubmariner with the comment "Removed uncited statement about applying pareto distribution to low-end wealth, incorrect information on handling negative values". It's not incorrect. In the Pareto distribution there is nothing below a certain minimum (xm). But for real people there is no such minimum wealth, and some people have negative net worth because they have debts or liabilities greater than their assets. In fact, even for income distribution the Pareto is surely not realistic because there is no positive minimum income. Some people have zero income (in some societies at least). Eric Kvaalen (talk) 16:08, 24 January 2016 (UTC)

Moment generating function

This article gives a moment generating function for the Pareto distribution. But the existence of the mgf implies that all moments exist, which is not true. So is the given mgf correct? Is there any reference? This looks fishy. Kjetil B Halvorsen 15:24, 27 January 2016 (UTC) — Preceding unsigned comment added by Kjetil1001 (talkcontribs)

(The variance in the table on the right should be interpreted as the second moment).

This sentence spreads FUD to the reader. First, the Moment (mathematics) article, clearly states that "the second central moment is the variance". Second, I checked empirically and the formula given on the table on the right does return the expected variance. Unless there are any objections, I would remove this doubt in the article. Cristiklein (talk) 08:26, 17 October 2016 (UTC)
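An empirical check of the kind described can be sketched in Python. It assumes the table's formula is x_m²·α/((α − 1)²(α − 2)) for α > 2, and uses α = 6 so that the sample variance itself has a reasonably well-behaved sampling distribution; the function name, seed and sample size are arbitrary choices.

```python
import random


def table_variance(alpha, x_m):
    """Closed-form Pareto variance x_m^2 * alpha / ((alpha - 1)^2 * (alpha - 2)), valid for alpha > 2."""
    return x_m ** 2 * alpha / ((alpha - 1) ** 2 * (alpha - 2))


rng = random.Random(2016)
alpha, x_m = 6.0, 1.0
draws = [x_m * rng.paretovariate(alpha) for _ in range(200_000)]
mean = sum(draws) / len(draws)
sample_var = sum((x - mean) ** 2 for x in draws) / (len(draws) - 1)
print(sample_var, table_variance(alpha, x_m))  # the formula gives 0.06 for these parameters
```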

Existence of mean and variance

The article states the mean exists for and the variance for . Possibly I'm making a silly mistake due to being too tired, but I think the values on the RHSs should be 2 and 3 respectively. I came to this page from , which states this. Here is a reference in agreement: Pagw (talk) 21:49, 30 July 2017 (UTC)