Talk:Pareto distribution
| WikiProject Mathematics (Rated B-Class) |
|---|
| This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks. |
| Mathematics rating: B Class, Low Priority. Field: Probability and statistics |
| WikiProject Statistics |
[edit] Graph
For those visual thinkers among us, can we have an example graph of this?
- A very dull graph: starting at the minimum value of x, the density falls as x increases and the cumulative distribution rises, each with a slope which becomes shallower for large x.
- I have removed the statement "If the value of k is chosen judiciously then the Pareto distribution obeys the '80-20 rule'" since it depends on a right truncation which this distribution doesn't have; allowing such truncation judiciously would mean most distributions met the "80-20" rule. --Henrygb 00:13, 6 Aug 2004 (UTC)
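For anyone who wants numbers behind that description, here is a minimal sketch of the density and cumulative distribution. The function names and the parameter values (minimum 1, k = 2) are my own choices for illustration, not taken from the article.

```python
def pareto_pdf(x, x_min, k):
    """Density of a Pareto distribution with minimum x_min and index k."""
    if x < x_min:
        return 0.0
    return k * x_min**k / x ** (k + 1)


def pareto_cdf(x, x_min, k):
    """Probability that a Pareto(x_min, k) value is at most x."""
    if x < x_min:
        return 0.0
    return 1.0 - (x_min / x) ** k


# Tabulate both functions at a few points: the density column falls,
# the cumulative column rises, and both flatten out as x grows.
for x in (1, 2, 4, 8, 16):
    print(x, pareto_pdf(x, 1.0, 2.0), pareto_cdf(x, 1.0, 2.0))
```

Plotting the two columns against x should give the "very dull graph" described above.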
[edit] Technical cleanup tag
It was commented to me that articles like this are not and should not be aimed at non-technical readers. WikiProject Science and other communal efforts I've seen generally have the goal of making the first part of the article accessible to the general public, but allowing for later parts which may be intelligible only to technical readers. That's certainly possible to do in this case.
- I'm the one who commented, and I didn't say "articles like this"; I said articles on probability distributions, and I didn't just say "non-technical readers"; I said readers not familiar with the mathematical theory of probability. Michael Hardy 02:47, 6 Mar 2005 (UTC)
Because Pareto distributions are used in economics and sociology with regard to political issues of public interest, it's entirely likely that non-technical readers will arrive at this article needing to know what this thing is. Not necessarily in precise detail, but in vague outline, at least.
This article isn't very accessible even to many technical readers. I have a degree from MIT, and I've taken math up through differential equations.
- But the relevant question is whether you've studied probability theory. Course 18.440 at MIT, titled Probability and Random Variables, does not require "up through differential equations", but only first-year calculus, which most MIT students have before entering as freshmen (and "first-year" is construed differently at MIT in this case). Anyone who's studied continuous probability distributions (and not just at MIT!) can understand this article. But yes, probably some things could be said for "lay" readers initially. Perhaps because of this distribution's occurrence in social sciences, that would make more of a difference in this case than with most probability distributions. But generally, articles on mathematics shouldn't need to be comprehensible to everyone who's studied only high-school math. Michael Hardy 02:47, 6 Mar 2005 (UTC)
I could make a graph (either mentally, digitally, or on paper) that plots a typical Pareto distribution, but that would be a lot of work that I shouldn't really have to do. I'm sure there are many scientists and computer engineers who would benefit from a better introduction.
- ... actually, if I were trying to write an introduction for non-mathematically inclined social scientists, a graph wouldn't be the first thing I would attend to. Maybe I'll work on this at some point .... Michael Hardy 02:47, 6 Mar 2005 (UTC)
Fortunately, I think all this article needs to be much more widely accessible is a graph or two of typical Pareto distributions, with labels and a brief explanation. -- Beland 02:19, 6 Mar 2005 (UTC)
I did study probability theory, back in the day, and found the article a bit terse. What I hoped to see at the start was a few extra paragraphs:
1. A brief general introductory paragraph or two pitched at people with only a craps or texas-hold-em knowledge of probability - why it matters, the elevator speech statement of what it means, etc.
2. Move the short section on things claimed to match a Pareto from the bottom of the article, with perhaps a few hard numbers added to it. (The usual - for k=1, x% will be <=3, with similar for k=2 or 3. This is still fluffy, but gives a numerical feel to that graph and the fluffy stuff in the first paragraph.)
Pretty much the same content, but with the take home goodies near the top. --ScottEllsworth 08:00, 20 Mar 2005 (UTC)
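The "hard numbers" suggested in point 2 are easy to produce. A small sketch, assuming the minimum value is 1 (that choice, and the function name, are mine):

```python
def prob_at_most(x, k, x_min=1.0):
    """P(X <= x) for a Pareto distribution with minimum x_min and index k."""
    if x < x_min:
        return 0.0
    return 1.0 - (x_min / x) ** k


# Fraction of values that are at most 3, for the indices mentioned above.
for k in (1, 2, 3):
    print(f"k={k}: {100 * prob_at_most(3, k):.1f}% of values are <= 3")
```

With a minimum of 1 this works out to about 66.7%, 88.9% and 96.3% for k = 1, 2 and 3.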
[edit] Pareto density at xmin
Don't we need to come up with a value at the transition point x = xmin? I'm in favor of p(xmin) = (1/2) k/xmin because it allows definition in terms of, say, the Heaviside step function, and F^{-1}(F(p(x))) = p(x) uniformly, where F is the Fourier transform. Whatever we come up with, I will alter the graphic accordingly. Paul Reiser 23:27, 15 Mar 2005 (UTC)
- For purposes of probability theory, the value of a density at a boundary point does not matter since it does not affect the value of any integral. But for purposes of maximum likelihood estimation in statistical inference, you'd probably want to make it the maximum. Therefore I would not use half the maximum. But of course, the inverse Fourier transform of the Fourier transform of the density may give you half the maximum. Michael Hardy 00:08, 16 Mar 2005 (UTC)
- Comment by an actuary
Why is the exponent called "k" ? This tends to make one think that the parameter only takes on integral values, which is not true. European actuaries use alpha, Americans use "Q"--either would be better.
Technically, when the exponent is < 1, the mean "does not exist"; "is infinity" is slightly off. Similar comment for < 2, variance.
Consider mentioning that the Pareto is often shifted so its support starts at 0; put in a reference to "shifted distribution."
Note that conditional distribution is also Pareto with the same exponent.
Maybe note that Method of Moments parameter estimation doesn't work (even more so than usual!). (Because setting the mean equal to the sample mean implies an assumption that the exponent is at least one.)
Asymptotic theory says that asymptotically, tails of distributions (if not of finite support) look exponential, or Pareto. Should link. —Preceding unsigned comment added by 65.217.188.20 (talk • contribs)
- I don't think "is infinity" is "slightly off". There are distributions like the Cauchy distribution for which the expected value does not exist even if one allows ∞ as the expected value, and there are those like the Pareto distribution for which "is ∞" makes sense. Michael Hardy (talk) 00:21, 22 July 2009 (UTC)
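The points above about the mean, the conditional distribution, and the shift to zero can all be read off the survival function. A sketch, assuming the Type I form with minimum x_m > 0 and exponent α > 0 (the symbols are my choice; the actuary's comment calls the exponent k):

```latex
\begin{align*}
\Pr(X > x) &= \left(\frac{x_m}{x}\right)^{\alpha}, \qquad x \ge x_m, \\
\operatorname{E}[X] &= \int_{x_m}^{\infty} x \,\frac{\alpha x_m^{\alpha}}{x^{\alpha+1}}\,dx
  = \begin{cases} \dfrac{\alpha x_m}{\alpha - 1}, & \alpha > 1, \\[1ex] +\infty, & \alpha \le 1, \end{cases} \\
\Pr(X > x \mid X > c) &= \frac{(x_m/x)^{\alpha}}{(x_m/c)^{\alpha}} = \left(\frac{c}{x}\right)^{\alpha}, \qquad x \ge c \ge x_m, \\
\Pr(X - x_m > y) &= \left(1 + \frac{y}{x_m}\right)^{-\alpha}, \qquad y \ge 0.
\end{align*}
```

Because the integrand is positive, the integral for α ≤ 1 diverges to +∞ rather than being undefined in the way the Cauchy mean is. The third line is again a Pareto distribution with the same exponent and minimum c, and the fourth is the shifted form whose support starts at 0.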
[edit] Error in CDF formula for Pareto
I believe that the expression for the cumulative distribution function has an error. It currently reads
cdf =
and should read
cdf =
This is perhaps part of the confusion arising out of not shifting the origin to x_m. There should also be a reference to the excellent (highly technical) article in mathworld: http://mathworld.wolfram.com/ParetoDistribution.html
Unless I get, within a short period of time, some indication that I am wrong, I will change it in the main article.
69.15.90.194 (talk) 21:03, 22 July 2009 (UTC)
Can anyone expand on this? Is the xm in the denominator a shifting term for the origin? I can't find any references to this anywhere...
- It's not clear to me what changes are proposed. What is written above says the article "now" gives the version in which the lower bound is zero. That may have been the case in the past, but it's not now. Michael Hardy (talk) 21:23, 22 July 2009 (UTC)
[edit] I got the wrong PDF?
I changed the
cdf =
to this one
cdf =
I got the result from integrating a pdf which is a bit different from the one given; it is essentially the same one, but mine did not shift the origin to x_m. This is not a fake result or anything. Something should indicate this apparent conflict between two versions of the same probability function. But I will definitely remove my mistake, only because the current cdf does not reflect the shifting nature of the pdf. I'll specify in the generating topic that it will generate a random sample from a non-shifted Pareto distribution.
Cyberyder 04:24, 6 April 2006 (UTC)
[edit] Alternative R code
The provided R code for random sample generation does not translate from the origin to lambda, and thus yields numbers lower than lambda. A good alternative that provides the wanted values directly can be found in [1].
[edit] Generalized Pareto distribution
I have just changed Generalized Pareto Distribution so that it redirects here rather than to Generalized extreme value distribution, which was incorrect. Now we need someone to expand the new section (perhaps with reference to [2]). Any volunteers? DFH 18:59, 23 December 2006 (UTC)
The page Generalized Pareto distribution has been created and I changed your redirect to direct there instead. Purple Post-its (talk) 16:19, 29 June 2011 (UTC)
[edit] Relationship to the exponential distribution
I'm not quite sure what the relationship is, but how it is defined in this article is rather ambiguous. The exponential random variable has one parameter, but the formula implies that there are two parameters for the exponential distribution. There's a relationship between the Pareto distribution and the uniform distribution, as described in Statistical Distributions, Second Edition, by Evans, Hastings, and Peacock. Perhaps this is a simpler and more meaningful relationship. Steve Simon 23:30, 22 January 2007 (UTC)
- A probability distribution does not have any parameters; rather a family of probability distributions may be parameterized. The usually-seen family of exponential distributions has just one parameter. However, one may speak of an exponential distribution on an interval (a, ∞), and the minimum point a is itself a parameter. This is the conditional probability distribution of an exponential distribution on (0, ∞), given the event of being ≥ a. One then gets a more extensive family of exponential distributions, parameterized by two real parameters. Michael Hardy 20:00, 23 January 2007 (UTC)
- So which parameter corresponds to the value of "a" in your example? Is it k? Is it x_m? Also, if the two-parameter family of exponential distributions is an important one, perhaps that should be incorporated on the Exponential distribution page. Steve Simon 21:31, 25 January 2007 (UTC)
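One way to make the correspondence concrete, offered as a sketch rather than as what the article text intends: if X is Pareto with minimum x_m and index k, then log X has the two-parameter exponential distribution described above, with rate k and minimum point a = log x_m.

```latex
\begin{align*}
\Pr(\log X > y) &= \Pr\!\left(X > e^{y}\right) = \left(\frac{x_m}{e^{y}}\right)^{k}
  = e^{-k\,(y - \log x_m)}, \qquad y \ge \log x_m .
\end{align*}
```

Under this reading k plays the role of the rate and log x_m plays the role of a. Equivalently, log(X/x_m) is an ordinary exponential variable on (0, ∞) with rate k.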
[edit] Many types of Pareto
Hi :) I'm studying for my Actuarial Exam 4/C along with Loss Models and I can't help but notice that the Pareto distribution listed in Wikipedia is the 1-parameter form of the distribution. In fact this distribution is pretty much the same as the 2-parameter one, with Xm instead. I was wondering if it was possible to change the name of the article to "Single parameter Pareto distribution", and I will eventually add a subtopic for the 2-parameter distribution. 64.18.166.93 18:16, 7 April 2007 (UTC)
[edit] Distribution Example: size of sand particles?
Is it true that the size of sand particles (probably from the same part of the same beach) are pareto-distributed? I would have thought they'd be gaussian-normal-distributed...
I never looked at sand particles under a microscope, though, I just thought they were all the same size...
<--- Feels Educated now. :)
[edit] Distribution example: The standardized price returns on individual stocks
Price returns on individual stocks can be, as we have recently discovered, negative as well as positive. I do not see how they can be described by a Pareto distribution.
Even if you take the absolute value I have not read anything in the financial literature that says it is Pareto distributed. And we are speaking of price returns over what time period (a second, a day, a year ?) by the way.
Doobliebop (talk) 20:05, 21 July 2009 (UTC)
- You can fit the distribution to the positive and negative returns independently (yielding two gamma and alpha coefficients) or to the absolute value. Take a look at "The (Mis)behavior of Markets" by B. Mandelbrot for an easy introduction, or, for an academic paper, "On fitting the Pareto-Levy distribution to stock market index data: selecting a suitable cutoff value" by H.F. Coronel-Brizio and A.R. Hernandez-Montoya. In this paper they fit the negative and positive returns independently. Mandelbrot's book implies that the time period is not important: historical, weekly, minute, etc.
[edit] Huh??
This passage purports to tell us how to generate a Pareto-distributed random variable:
- The process is quite simple; one has to generate numbers from an exponential distribution with its λ equal to a random generated sample from a gamma distribution
-
- and
-
- This process generates data starting at 0, so then we need to add xm.
I have a high tolerance for unclear writing and a Ph.D. in statistics, but I can't follow this. It says "one has to generate numbers from an exponential distribution[...]". That seems clear. Then it says "with its λ equal to a random generated sample from a gamma distribution". There it's lost me. λ is either the expected value or its reciprocal; since conventions vary, you should specify which. And it doesn't specify which. But then it says "equal to a random generated sample from a gamma distribution". Does that mean λ itself is a random variable? I'm not at all sure. And what does the line of TeX mean? I'd guess it's specifying the shape parameter of the gamma distribution and saying it's equal to the Pareto index. But again I'm not sure. Then after the word "and" I can't figure out what's being said at all. Finally it says we need to add xm.
There really is a simple way to generate Pareto-distributed random variables, already described elsewhere in this article: Let Y be exponentially distributed with intensity α. Then x_m e^Y is Pareto-distributed. That's all it takes! Yet we have this long, complicated, opaque discussion that someone with a Ph.D. in statistics can't figure out, when we should be writing at a level that undergraduates will easily understand. And I do think undergraduates will easily understand what I wrote in this paragraph. Michael Hardy (talk) 02:34, 23 July 2009 (UTC)
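The simple recipe in the last paragraph can be sketched in a few lines of Python. The function name, seed, and parameter values are illustrative assumptions; `random.expovariate` takes the rate (the reciprocal of the mean), which is the "intensity" reading of α used above.

```python
import math
import random


def pareto_from_exponential(x_m, alpha, rng=random):
    """Draw one Pareto(x_m, alpha) value as x_m * exp(Y), Y ~ Exponential(rate=alpha)."""
    y = rng.expovariate(alpha)  # argument is the rate, i.e. 1 / mean
    return x_m * math.exp(y)


rng = random.Random(0)  # fixed seed so the run is repeatable
sample = [pareto_from_exponential(2.0, 3.0, rng) for _ in range(100_000)]

print(min(sample))                # no draw falls below x_m = 2.0
print(sum(sample) / len(sample))  # should land near alpha * x_m / (alpha - 1) = 3.0
```

No gamma distribution and no final shift are needed: since Y ≥ 0, every draw is already at least x_m.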
[edit] right!
T = xm*U^(-1/alpha), U ~ uniform(0,1), T ~ Pareto(alpha,xm). Of course, exponential deviates can easily be transformed. No need to discuss the gamma distribution here. —Preceding unsigned comment added by Dblord (talk • contribs) 12:18, 16 August 2009 (UTC)
- Yes, I think that also works. Michael Hardy (talk) 12:36, 16 August 2009 (UTC)
- That's what the Python library does. (Python is a programming language with "batteries included.") The source code cites "Jain pg. 495," probably referring to something by Raj Jain. Jive Dadson (talk) 03:29, 15 October 2009 (UTC)
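For comparison, the uniform-based formula from the start of this thread can be sketched as follows. The function name and parameters are mine; using `1.0 - rng.random()` is a small precaution I added so that the base is never exactly zero.

```python
import random


def pareto_from_uniform(x_m, alpha, rng=random):
    """Draw one Pareto(x_m, alpha) value as x_m * U ** (-1 / alpha), U uniform."""
    u = 1.0 - rng.random()  # lies in (0, 1], so u is never zero
    return x_m * u ** (-1.0 / alpha)


rng = random.Random(1)  # fixed seed so the run is repeatable
draws = [pareto_from_uniform(1.0, 2.0, rng) for _ in range(100_000)]

# Empirical check against the CDF: P(X <= 2) = 1 - (1/2)**2 = 0.75.
frac = sum(d <= 2.0 for d in draws) / len(draws)
print(frac)
```

The standard library's `random.paretovariate(alpha)` returns draws with minimum 1, so multiplying its result by x_m should give the same distribution as this function.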
[edit] Graph labels
All the graphs are labeled with values of k, whereas the formulas use α. That should be fixed for consistency. 128.164.79.202 (talk) 21:45, 6 December 2009 (UTC)
[edit] Citing one of my papers
I have added to this article a reference to a paper that I wrote, so I could be suspected of simple self-promotion. However, I think it improves the article, and in particular, it answers a question that has been asked in (maybe more than one?) discussion page of a Pareto-related Wikipedia article (I don't remember if it was this article, or Pareto principle, or one of the others). The added material appears below. Michael Hardy (talk) 20:38, 15 September 2010 (UTC)
[edit] Relation to the "Pareto principle"
The "80-20 law", according to which 20% of all people receive 80% of all income, and 20% of the most affluent 20% receive 80% of that 80%, and so on, holds precisely when the Pareto index is α = log_4 5. Moreover, the following have been shown[1] to be mathematically equivalent:
- Income is distributed according to a Pareto distribution with index α > 1.
- There is some number 0 ≤ p ≤ 1/2 such that 100p% of all people receive 100(1 − p)% of all income, and similarly for every real (not necessarily integer) n > 0, 100p^n% of all people receive 100(1 − p)^n% of all income.
This does not apply only to income, but also to wealth, or to anything else that can be modeled by this distribution.
This excludes Pareto distributions in which 0 ≤ α ≤ 1, which, as noted above, have infinite expected value, and so cannot reasonably model income distribution.
- [1] Michael Hardy, "Pareto's Law", Mathematical Intelligencer, volume 32, number 3, pages 38–43.
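A quick numerical check of the 80-20 statement, as a sketch. It relies on the standard Lorenz-curve result that under a Pareto distribution with index α > 1 the richest fraction p holds a share p^(1 − 1/α) of the total; that formula and the function name are my additions, not part of the excerpt.

```python
import math


def top_share(p, alpha):
    """Share of the total held by the richest fraction p under Pareto(alpha), alpha > 1."""
    if alpha <= 1:
        raise ValueError("the mean is infinite for alpha <= 1")
    return p ** (1.0 - 1.0 / alpha)


alpha = math.log(5) / math.log(4)  # log base 4 of 5, roughly 1.161

# Richest 20%, richest 20% of those, and so on.
for n in (1, 2, 3):
    p_n = 0.2 ** n
    print(f"top {100 * p_n:.1f}% hold {100 * top_share(p_n, alpha):.1f}%")
```

With α = log_4 5 the shares come out as 80%, 64% and 51.2% up to rounding, which is the pattern 100p^n% of people receiving 100(1 − p)^n% of income with p = 0.2.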
[edit] (end of excerpt)
[edit] Examples should have citations
The article gives a nice list of examples where the experimental distributions "are sometimes seen as approximately Pareto-distributed." This should have citations to indicate where these examples may be verified/studied further. Equating social distributions to the distribution of sand grain sizes or meteor sizes suggests an inevitability that may incite the curious to investigate further. A good citation can help such curious readers (such as myself). —Preceding unsigned comment added by Wilkus (talk • contribs) 04:15, 29 September 2010 (UTC)
[edit] Infinite divisibility
As far as I understand, the Pareto distribution is infinitely divisible. What are these distributions, i.e. what is the general form of the distributions which converge towards the Pareto distribution? -- Zz (talk) 11:03, 14 February 2011 (UTC)
[edit] Pareto II or Lomax Distribution
Wikipedia is sorely in need of material on the Lomax distribution (aka Pareto type II distribution). There is a sentence in this article that indicates we are discussing type I distributions, but the different types are nowhere to be found..... Purple Post-its (talk) 16:26, 29 June 2011 (UTC)
- Where would this go? I wrote a draft of Pareto II, III, IV, and Feller-Pareto generalizations (Pareto generalizations) but it is not clear where to insert it. This does not seem to fit under 'Variants' and not under 'Relation to other distributions', yet if inserted in the middle of Pareto Type I details it would be confusing. How about a new section entitled 'Pareto generalizations', or should the various generalizations and variants including bounded Pareto become a separate article? Mathstat (talk) 13:01, 23 February 2012 (UTC)
- Your draft comprises a very interesting contribution, but I agree that it does not fit very well in this article. I'd suggest moving it to the existing article Generalized Pareto distribution. Actually, the section you contributed here was more or less what I asked for on the talk page in that article. What do you think? Isheden (talk) 20:44, 3 March 2012 (UTC)
- It isn't clear whether the GPD contains all the types i-iv and FP as special cases. I used the term "Generalized Pareto distributions" (plural) as in Arnold. Also the references state that "Pareto" distribution often means Pareto I or Pareto II, so it may be important to see Pareto II on the same page as Pareto I. Mathstat (talk) 21:33, 3 March 2012 (UTC)
- Kleiber and Kotz (2003) state on p. 60 "In his pioneering contributions at the end of the nineteenth century, Pareto (1895, 1896, 1897a) suggested three variants of his distribution." These are the types I, II, III. So "Pareto" distribution does not necessarily refer to Type I (not even by Pareto). Mathstat (talk) 21:58, 3 March 2012 (UTC)
- Perhaps it is a good idea then to turn Pareto distribution into a disambiguation page? Isheden (talk) 22:08, 3 March 2012 (UTC)
- Or should Lomax distribution be merged into this page? Isheden (talk) 22:09, 3 March 2012 (UTC)
I posted a discussion related to this on Wikipedia talk:WikiProject Statistics. Any comments would be appreciated. Isheden (talk) 13:35, 13 March 2012 (UTC)
[edit] Merge from Pareto index
I think it would be more natural to discuss the Pareto index in this article, similar to how the rate parameter is discussed in the article exponential distribution. Isheden (talk) 16:02, 4 January 2012 (UTC)
I'd have to disagree, as the Pareto index has implications of economic inequity, along with the Gini Coefficient. As an encyclopedic effort, I find the Pareto index page much more comprehensible, and useful as it is. Merging it with this page would simply dilute its value and make it more difficult to find and use well. In other words, I find the one page a layperson's resource for economics, and the other a highly technical discussion of mathematics. Just to be absurd, why not merge both pages with Bradford's law? Sorry, but that best illustrates my point. TheLastWordSword (talk) 19:34, 3 March 2012 (UTC)
OK, those are valid points. The term Pareto index seems to be used mainly in economics and that page is more applied. I'll add some other terms for the parameter in this article and remove the merge proposal. Isheden (talk) 20:38, 3 March 2012 (UTC)