Talk:Gini coefficient

This article is of interest to the following WikiProjects:
WikiProject Mathematics (Rated C-class, Mid-importance; field: Probability and statistics). One of the 500 most frequently viewed mathematics articles.
WikiProject Economics (Rated C-class, Mid-importance)
WikiProject Sociology (Rated C-class, High-importance)
WikiProject Statistics (Rated C-class, Low-importance)
WikiProject Globalization (Rated C-class, Mid-importance)
This article has an assessment summary page.

Limitations not convincing

Many of the arguments under the "limitations" section boil down to: the Gini measures inequality, not poverty (e.g. "for example, while both Bangladesh (per capita income of $1,693) and the Netherlands (per capita income of $42,183) had an income Gini index of 0.31 in 2010,[54] the quality of life, economic opportunity and absolute income in these countries are very different"; also the "Gini coefficient falls yet the poor get poorer, Gini coefficient rises yet everyone getting richer" section). But of course, it's *meant* to measure inequality, not poverty. I don't think this is a limitation of the measure. Perhaps it's a critique of people misusing the measure, but I'm not sure I've actually seen anyone misusing it in this way.

Countries by Gini Index

Main article: List of countries by income equality

A Gini coefficient above 50 is considered high; countries like Chile, USA, Russia, China, Bolivia, Mexico and Central America countries can be found in this category. A Gini coefficient of 30 or above is considered medium; countries like USA and Venezuela can be found in this category. A Gini coefficient lower than 30 is considered low; countries like Austria and Denmark can be found in this category.[55]

"30 or above" should be changed to "30 to 50". The USA should then appear only in the high Gini index list.

Known distribution function

How can it be that the uniform distribution and the delta distribution produce the same index? Indeed how can it be that the explanation for the delta distributions says:

   The Dirac delta function represents the case where everyone has the same wealth (or income)

If the argument of the delta is the fraction of the population, then the explanation is incorrect. If the argument of the delta is the income, then the explanation is correct. So it seems that the relation between the tables is an exchange of the axes. Whatever it is, it would be good for an expert to revise these tables and explain properly what each of them is. Kakila (talk) 21:55, 28 October 2015 (UTC)

Good catch. I would ignore it: the delta function assumes the whole distribution is concentrated at one point. Assume that point is at the upper end, meaning that while the integral of the density is 1, the excess over that exact point is 0. This gives an index of 0, but it is absurd, both practically and mathematically. Limit-theorem (talk) 22:48, 28 October 2015 (UTC)
In the table, the Dirac distribution and the uniform distribution are not equal, so I don’t understand that problem.
I agree that the explanation of how a probability distribution can be used to represent an income distribution is rather vague. I have tried to fix this in this article and the Lorenz curve article.
I also agree that if the probability distribution is a Dirac delta, deriving the Lorenz curve is problematic. However, it states in the Lorenz curve article:

The inverse x(F) may not exist because the cumulative distribution function has intervals of constant values. However, the previous formula can still apply by generalizing the definition of x(F):

x(F1) = inf {y : F(y) ≥ F1}
I have not verified that this “fixes” the problem with the Dirac delta, but I do know that for a number of probability functions f(x,a) (e.g. normal distribution, Chi-square distribution) which tend to a Dirac delta for extreme values of a, the Lorenz curve tends to the expected straight line L(F)=F. In other words, I expect that, given the above generalization of x(F), the statement that the Dirac distribution yields a Gini Coefficient of zero is not absurd. PAR (talk) 00:25, 1 November 2015 (UTC)
I removed the Dirac Delta. Too confusing and not applicable for wealth since it concentrates on a point; given that we have natural numbers it cannot apply in a measurement. Limit-theorem (talk) 07:17, 1 November 2015 (UTC)
I don't think it should be removed. Understanding the delta function will remove the confusion; deleting references to it will remove good information. Yes, with real populations, we deal with discrete, natural numbers, but dealing with real numbers is a good approximation when the natural numbers are huge. If we limit ourselves to natural numbers, then we need to wipe out every reference and definition in the article which uses real numbers and functions of real numbers. Not a good idea. So once we accept the continuum approximation involving real numbers, we have continuous probability distributions of income or whatever. Many of these distributions will tend to a Dirac delta function for extreme values of some parameter, so it's reasonable to ask what happens to the Gini coefficient when this happens, and the answer is not ambiguous or confusing. The bottom line is that if we are dealing with huge numbers of people and their incomes (for example), the Dirac delta function is a continuous approximation to everybody having nothing except for one person having everything. An unlikely, but "measurable" situation. The resulting Gini coefficient is a good approximation to the discrete Gini coefficient that we would then calculate. PAR (talk) 09:50, 18 November 2015 (UTC)

Formula in the definition

The formula for the Gini coefficient reported in the "definition" section is a little confusing.

As of now it reads: G = \frac{\displaystyle{\sum_i \sum_j \left| x_i - x_j \right|}}{\displaystyle{2 \sum_i \sum_j x_i}}

But it's strange to have the double summation in the denominator because:

  1. only the i index appears on the x in the denominator
  2. the 2 in the denominator should already take care of the double summation of absolute differences that is done in the numerator. Having 2 times this double sum may suggest that you sum *4* rather than 2 times all frequencies x.

This seems more accurate instead: G = \frac{\displaystyle{\sum_i \sum_j \left| x_i - x_j \right|}}{\displaystyle{2 \sum_i x_i}} — Preceding unsigned comment added by Brucap (talk • contribs) 10:35, 9 December 2015 (UTC)

It's correct as it stands. The second summation in the denominator evaluates to a factor n (not 2). Compare to the definitions in mean absolute difference.
G = \frac{1}{2} \frac{\frac{1}{n^2}{\sum_i \sum_j \left| x_i - x_j \right|}}{{\frac{1}{n} \sum_i x_i}} = \frac{\sum_i \sum_j \left| x_i - x_j \right|}{2n \sum_i x_i}
Woodstone (talk) 11:31, 9 December 2015 (UTC)
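As a quick sanity check of the point above, here is a minimal Python sketch showing that the double summation in the denominator just contributes a factor n. The function names and the sample incomes are made up for illustration; they are not taken from the article.

```python
def gini_double_sum(x):
    """Gini with the denominator written as 2 * sum_i sum_j x_i."""
    n = len(x)
    numerator = sum(abs(xi - xj) for xi in x for xj in x)
    # The inner index j does not appear in the summand, so each x_i is counted n times.
    denominator = 2 * sum(xi for xi in x for _ in range(n))
    return numerator / denominator


def gini_2n(x):
    """Gini with the denominator written as 2 * n * sum_i x_i."""
    n = len(x)
    numerator = sum(abs(xi - xj) for xi in x for xj in x)
    return numerator / (2 * n * sum(x))


incomes = [1.0, 2.0, 3.0, 4.0]  # hypothetical sample
print(gini_double_sum(incomes), gini_2n(incomes))
```

Both functions should return the same value for any non-empty list with a positive total, which is the equivalence stated in the reply above.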

I see: now it's clear. But then I would actually write 2 n \sum_i x_i instead of the double summation: it's more transparent. — Preceding unsigned comment added by 130.60.140.212 (talk) 16:11, 10 December 2015 (UTC)

Income distribution function

Above the table, the (income) probability distribution is defined as:

  • the function f(x) where f(x)dx is the fraction of the population with income between x and x + dx.

In case of total equality, everyone has the same income x0, so for every value of x ≠ x0, one has f(x) = 0. However, the integral of f(x) over all x must be 1. Together, these make f(x) a Dirac delta function centered on x0.

A homogeneous distribution between a and b means that the chance of having an income of any value between a and b is equal. So the same number of people have income a as have income b. As long as a ≠ b, that is not total equality.

Woodstone (talk) 16:22, 26 January 2016 (UTC)

I kept the Dirac as is, but one needs to be extra careful with Dirac because it concentrates on one point (which necessarily implies a continuous distribution) and excludes mass elsewhere. It is what we call a degenerate case implying no probability distribution. Limit-theorem (talk) 16:41, 26 January 2016 (UTC)
The statement "The Dirac delta distribution represents the case where everyone has the same wealth (or income) but it implies that there are no variations at all between incomes." repeats itself.
If everyone has the same income, then it follows that there are no variations at all between incomes. I will revert this unless it can be explained. PAR (talk) 20:53, 26 January 2016 (UTC)
Fixed. The deeper problem is that the Dirac Delta is not a probability distribution, hence the Gini cannot be technically computed. I have removed all references to Dirac Delta before and it keeps being put in. We should decide. Limit-theorem (talk) 22:53, 26 January 2016 (UTC)
Yes, I keep putting them back in because there is no problem. Yes, the Dirac delta is a distribution, not a function, but that does not mean it is useless. Technically, the Gini *can* be computed. Take for example the uniform distribution
f(x,a,b)=\frac{1}{b-a}  for   \,\,a \le x \le b  and zero otherwise.
The Gini coefficient is
\textrm{Gini}(a,b)=\frac{b-a}{3(b+a)}
Clearly,
\lim_{a\to b}f(x,a,b)=\delta(x-b)
where δ(.) is the Dirac delta function and even more clearly:
\lim_{a\to b}\textrm{Gini}(a,b)=0
THIS IS ALWAYS THE CASE. Whenever the probability distribution f(x,p) (where p is a bunch of parameters) approaches the Dirac delta function as the parameters approach some critical value or values, the Gini coefficient will approach zero. No problem, no puzzle, and technically computable. With an understanding of the meaning of the delta function, it stands as a valid probability distribution, yielding Gini coefficient zero. PAR (talk) 01:58, 27 January 2016 (UTC)
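To make the limit above concrete, here is a rough numerical sketch in Python. It approximates a uniform distribution on [a, b] by n equally spaced quantile midpoints and compares the discrete Gini coefficient with the closed form (b-a)/(3(b+a)) as a approaches b. The helper names, the choice n=2000 and the test values of a are assumptions for illustration only.

```python
def gini_sorted(values):
    """Discrete Gini via the sorted-rank form of the mean absolute difference."""
    xs = sorted(values)
    n = len(xs)
    # sum_i sum_j |x_i - x_j| equals 2 * sum_i (2i - n - 1) * x_(i) for sorted x.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * sum(xs))


def uniform_quantiles(a, b, n=2000):
    """n equally spaced midpoints standing in for a uniform population on [a, b]."""
    step = (b - a) / n
    return [a + (k + 0.5) * step for k in range(n)]


def gini_uniform_closed_form(a, b):
    """Closed form quoted above: (b - a) / (3 (b + a))."""
    return (b - a) / (3 * (b + a))


b = 10.0
for a in (0.0, 5.0, 9.0, 9.9, 9.999):
    numeric = gini_sorted(uniform_quantiles(a, b))
    exact = gini_uniform_closed_form(a, b)
    print(f"a={a:<6} numeric={numeric:.6f} closed_form={exact:.6f}")
```

The two columns should agree closely and both shrink toward zero as a gets close to b, which is the behaviour claimed for the delta limit.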
Here is a proof using the delta function alone, it's easier than I thought. For the continuous case, the Gini coefficient is:

\textrm{Gini}=1-\frac{\int_0^\infty (1-F(t))^2 \,dt}{\int_0^\infty t f(t)\,dt}
where f(.) is the probability density function (PDF) and F(.) is its cumulative distribution function (CDF). Let's take the case where the PDF is a delta function: δ(x-m) where m is not zero. The CDF is H(x-m) where H(.) is the Heaviside step function. It's easily found that:

\int_0^\infty (1-H(t-m))^2 \,dt = m
and

\int_{-\infty}^\infty t\, \delta(t-m) \,dt = m
and so Gini=1-m/m=0. If we need a proof for m=0, then we can do a change of variables y=x+1. Then by the same type of reasoning, we can prove that Gini = 1-(m+1)/(m+1) and Gini is zero when m=0. PAR (talk) 06:21, 27 January 2016 (UTC)
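The integral formula above can also be checked numerically. The following Python sketch evaluates the integral of (1-F(t))^2 with a midpoint rule, once for the Heaviside CDF of a delta at m and once for a uniform CDF as a cross-check against the closed form (b-a)/(3(b+a)). The function names, the values m=3, a=2, b=6, and the integration cutoff are assumptions; the means are supplied analytically rather than integrated.

```python
def integral_one_minus_cdf_squared(cdf, upper, steps=100_000):
    """Midpoint-rule estimate of the integral of (1 - F(t))^2 over [0, upper]."""
    dt = upper / steps
    return sum((1.0 - cdf((k + 0.5) * dt)) ** 2 for k in range(steps)) * dt


def gini_from_cdf(cdf, mean, upper):
    """Gini = 1 - (integral of (1 - F)^2) / mean, assuming F(t) = 1 for t >= upper."""
    return 1.0 - integral_one_minus_cdf_squared(cdf, upper) / mean


def heaviside_cdf(m):
    """CDF of a point mass at m: H(t - m)."""
    return lambda t: 1.0 if t >= m else 0.0


def uniform_cdf(a, b):
    """CDF of the uniform distribution on [a, b]."""
    return lambda t: 0.0 if t < a else (1.0 if t > b else (t - a) / (b - a))


m = 3.0
gini_delta = gini_from_cdf(heaviside_cdf(m), mean=m, upper=10.0)

a, b = 2.0, 6.0
gini_uniform = gini_from_cdf(uniform_cdf(a, b), mean=(a + b) / 2, upper=10.0)

print(gini_delta, gini_uniform, (b - a) / (3 * (b + a)))
```

Under these assumptions the delta case should come out at zero up to rounding, and the uniform case should match the closed form.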
I think the derivation is right, but you need a reference as Wikipedia is not original research. My reservation comes from the fact that Delta is not a probability distribution hence many other requirements fail. Perhaps mention that "in the degenerate case, we get...". Also you can more rigorously get your result if you took the limit of a Bernoulli with p approaching 1, rather than just start with the limiting case and integrating. Limit-theorem (talk) 09:58, 28 January 2016 (UTC)
I fail to see why the Dirac delta is not a probability distribution. It exactly describes the case where only one value (of income) is realised, in other words where there is no variation (in income) at all. Yes, strictly only applies to infinite population, but is the limit for the case that everyone has the same income. −Woodstone (talk) 11:55, 28 January 2016 (UTC)
Yes. Its integral is one and it's everywhere non-negative (in the sense of a measure), and that's all it needs to be a probability density. It has all its moments (unlike some), it has a mean and variance and entropy, a cumulative distribution function and a characteristic function. It's listed in the "probability distribution" template, etc. With regard to the limit of a Bernoulli, that's a good point; it's also the limit of many discrete distributions. For example, it's the limit of the Binomial distribution as n->infinity and x=1/N.
I spent some time thinking about that "proof", and I didn't mean to be a dick about it, I wanted to explain it to myself as well. I don't think it rises to the level of "research" though. Nevertheless, I will try to find a reference. PAR (talk) 14:47, 28 January 2016 (UTC)
Thanks. Incidentally you can get the result by taking the limit of the Normal Distribution with standard deviation at 0. But the Normal is not appropriate for Gini because it is on the real line, you need a distribution with positive support. Limit-theorem (talk) 00:35, 29 January 2016 (UTC)
Half normal distribution would be good, then, and I think it goes to a delta for σ->0. I think it would be good to add it at some point. PAR (talk) 01:23, 29 January 2016 (UTC)
Actually, for the lognormal, G = \operatorname{erf}\left(\frac{\sigma }{2 }\right), so you get the same result. I will add the lognormal later as it is a common income distribution assumption. Limit-theorem (talk) 06:24, 29 January 2016 (UTC)
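A tiny Python sketch of the lognormal case, taking the formula G = erf(σ/2) from the comment above as given. The function name and the list of σ values are made up for illustration.

```python
import math


def lognormal_gini(sigma):
    """Gini of a lognormal distribution, using G = erf(sigma / 2) as quoted above."""
    return math.erf(sigma / 2.0)


# Shrinking sigma concentrates the distribution around a single income value.
for sigma in (2.0, 1.0, 0.5, 0.1, 0.01, 0.0):
    print(f"sigma={sigma:<5} Gini={lognormal_gini(sigma):.6f}")
```

The printed values should decrease toward zero as σ shrinks, consistent with the delta limit discussed in this thread.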