Talk:Gini coefficient: Difference between revisions

From Wikipedia, the free encyclopedia
Revision as of 18:07, 11 July 2021


Inconsistent?

Maybe I am missing something, but as I read the text, the index goes from zero (0) to one (1), yet the two maps have values in the double digits (25 to 66). Values cannot exceed the index. Shouldn't the text be changed to read 0~100 or the maps to 0.25~0.66? Kdammers (talk) 17:25, 30 April 2020 (UTC)

The Gini index can be expressed as a percentage. So a Gini index of 0.3 is written as 30%. Publications occasionally omit the percent sign in these cases.−Woodstone (talk) 07:21, 1 May 2020 (UTC)
But why isn't the article consistent in its use? One hundred is not the same as 100%. I checked again: I can't see anything about percent in the map legend. Kdammers (talk) 14:59, 1 May 2020 (UTC)
The picture is probably from an external source and not editable. A Gini coefficient of less than 1% is theoretically possible, but never occurs in reality. So if you see a Gini coefficient displayed as a number over 1, you can safely interpret it as a percentage, and in reverse, a stated G under one can be safely interpreted as not a percentage.−Woodstone (talk) 12:54, 2 May 2020 (UTC)

Some suggested Gini-introduction wording and the Total Inequality subject

I've noticed that the formula for Total Inequality that I posted gives a value of .5 when the Gini = 0. When there's no inequality, then of course the Total Inequality should be zero, and so there must have been some error when I derived that formula. So I've deleted that post.

What remains below is something that I'd like to add to the top of the Gini article, along with a brief graphical explanation of a derivation of the formula for Gini, with respect to B.

I'll wait a month for that, in case there are objections.

Along with the graphical derivation, I'd like to follow it with an algebraic derivation.

If I find out what's wrong with my derivation of the Total Inequality formula that I posted here, I'll correct it and post it.

Here's some wording that I suggest to be placed at the top of the Gini article:

Given the importance of, and interest in, how much of its equal share the cumulative population up to some percentile (such as the poorest 10%) has, then...

...the Gini is important and of interest as the sum, over all percentiles, of the shortfall (from equal-share of income) of the cumulative population up to each percentile.

...as a single number that sums that shortfall over all percentiles. — Preceding unsigned comment added by 71.84.136.105 (talk) 23:34, 16 October 2020 (UTC)

Suggested Additions to Gini Definition and Explanation.

I'd like to add an obvious, natural, intuitive explanation for the motivation for the Gini, an explanation that leads directly, naturally and easily to the graphical definition of the Gini.

Below is a start on a composition of that explanation, with more text and re-arrangement to be added. I'll wait about a month before I add it, in case there might be objection:

The Gini is the sum, over all population-percentiles, of the shortfall, from equal-share, of the cumulative-income up to each percentile. ...with that summed shortfall divided by the greatest value that it could have, with complete inequality (1 person having it all). — Preceding unsigned comment added by 71.84.136.105 (talk) 18:24, 21 October 2020 (UTC)
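For illustration, the verbal definition above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function name `gini_from_shortfalls` is hypothetical, incomes are assumed non-negative with a positive total, and the "percentiles" are taken to be the n discrete population steps k/n. Dividing by the complete-inequality maximum, as the comment describes, gives n/(n-1) times the usual population Gini, so the two agree only for large n.

```python
from fractions import Fraction
from itertools import accumulate


def gini_from_shortfalls(incomes):
    """Summed shortfall of cumulative income share, divided by its maximum."""
    ys = sorted(incomes)
    n = len(ys)
    total = sum(ys)
    if n < 2 or total <= 0:
        raise ValueError("need at least two incomes and a positive total")
    # Cumulative income of the poorest k people, for k = 1..n.
    cumulative = list(accumulate(ys))
    # Shortfall at step k: equal share k/n minus the actual cumulative share.
    shortfall_sum = sum(
        Fraction(k, n) - Fraction(c) / Fraction(total)
        for k, c in enumerate(cumulative, start=1)
    )
    # With one person holding everything, the cumulative shares are 0 for
    # k < n and 1 for k = n, so the summed shortfall is (n - 1) / 2.
    max_shortfall_sum = Fraction(n - 1, 2)
    return shortfall_sum / max_shortfall_sum


print(gini_from_shortfalls([1, 1, 1, 1]))  # 0   (complete equality)
print(gini_from_shortfalls([0, 0, 0, 1]))  # 1   (one person has it all)
print(gini_from_shortfalls([0, 0, 1, 1]))  # 2/3
```

Exact fractions are used so that the boundary cases come out as exactly 0 and 1 rather than as rounded floats.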

And, because a household's income's departure from the mean, and any change in it, is counted in many more cumulative-income totals if the household is at a low percentile, the Gini weights the departure from the mean of a low-percentile household more strongly than that of a high-percentile household.

And, to answer the objection that the Gini overcounts the middle, and that (like all indices that look at all the incomes) the Gini doesn't tell where the departure from the mean occurred, I'd suggest that the difference between the Lorenz curve and the 45 degree line be integrated only up to the mean, or only to the 10th percentile.

For the version integrating to the mean, I'd call it "Gini to Mean", "Robin Hood Gini", or "Social Gini" (because what happens to the poorest is more socially important).

For the version that integrates to the 10th percentile, I'd call it "Gini to .1", or "Social Gini".

Of course, for those indices, the integral would be divided by the highest value that it could have. — Preceding unsigned comment added by 71.84.136.105 (talk) 19:15, 17 October 2020 (UTC)

The anonymous editor who wrote the above should be aware that WP is not the place to add one's own inventions and proposals. WP describes what is currently used widely in the world supported by reliable sources. −Woodstone (talk) 07:24, 23 October 2020 (UTC)

The article rightly points out that the Gini (like all the inequality indices that look at all incomes and summarize the inequality with one number) doesn't tell anything about what the departure from the mean is over any particular region of population-percentile. It isn't a new "invention" or original proposal, or original research, to mention that the above-stated objection is obviously answered if the distance between the Lorenz curve and the 45-degree line is integrated only up to, say, the 10th population-percentile, or to the 40th percentile, or to the population-percentile at which the mean income is had. ...and that summed shortfall is divided by the greatest value that it could have, as is usual with the Gini.

That's just the application of the already-universally-used inequality-index to a specific segment of the population, instead of applying it over the entire population. ...in order to show something more specific.

So yes, you're right that I shouldn't name that as if it were a new inequality-index, because of course it isn't one. It's just a use of the already-universal index, as an obvious answer to a common criticism of the Gini as currently used.


When I post, to this talk-page, text that I propose to be added, the integration of the Lorenz curve's shortfall from the equality-line within a lower-percentile region will only be mentioned pursuant to objections and issues raised in Wikipedia articles, including this Gini article.

It won't be presented or offered as a proposal.

I won't add it to the article unless, for a month after I post the proposed text to this talk-page, there's no objection. If there's objection or disagreement, then I'll abide by the Wikipedia procedure for disagreements.

— Preceding unsigned comment added by 71.84.136.105 (talk) 23:11, 23 October 2020 (UTC)


Wording That I Propose To Add, Regarding The Gini's Motivation:


I propose to add the following wording after my brief definition paragraph and before the section with the graphical definition with the Lorenz diagram.

Motivation leading to the Gini:

(Initially, until otherwise stated, for this discussion, the percentile x is a percentile below the percentile at which the mean income occurs.)

A sometimes-used inequality-measure is mean income divided by income at the 10th percentile. It could be replaced by its reciprocal, the income at the 10th percentile, or any percentile x, divided by what it would be with equality (the mean income).

That is,

Ix/Iav, where Ix is income at percentile x, and Iav is average income. But, instead...

1 - Ix/Iav

...is used, for which some justifications are given a few lines below.

Multiplying the above expression by the constant Iav (constant in any particular income distribution) gives:

Iav - Ix. ...the shortfall of Ix from what it would be with equality.

Advantages of using Iav - Ix, the shortfall from equality-value, instead of Ix/Iav include the following:

In a sum of differences, if a number in one of the differences is changed, the magnitude of change in the overall sum is equal to the magnitude of the change in the number that was changed in one of the differences. Of course that isn't true if a number in one of the quotients in a sum of quotients is changed.

It results in an index that varies from 0 to 1 as inequality varies from none to maximum.

The shortfall from equal-share is what is used.

Of course, as an inequality-measure, the shortfall from equal-share of income at one percentile has the disadvantage that it doesn't look at, count, or react to changes in any income other than that one income.


From that, a first improvement in breadth, a first generalization by averaging, would be the average income shortfall, over the percentile-range from 0 to x. That can also be said as: The shortfall from equal-share of the cumulative income up to x percentile.

But that, too, has a disadvantage, a shortcoming: It doesn't register or react to even the most drastic changes in the income distribution within that percentile-range from 0 to x.

e.g. For x = .1, the people in the 9th to 10th percentile range could take everything away from all the people in the 0 to 9th percentile range, and the index wouldn't react or change at all.
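A small numerical check of that example, as a sketch only: the population of 100 and the income values are made up for illustration, and `bottom_share` is a hypothetical helper name. The bottom-ten share, and hence its shortfall from equal share, is the same before and after the redistribution inside the bottom ten.

```python
from fractions import Fraction


def bottom_share(incomes, k):
    """Share of total income held by the k poorest people."""
    ys = sorted(incomes)
    return Fraction(sum(ys[:k]), sum(ys))


# Hypothetical population of 100: the bottom ten have 10 each.
before = [10] * 10 + [200] * 90
# One of the bottom ten takes everything from the other nine.
after = [0] * 9 + [100] + [200] * 90

share_before = bottom_share(before, 10)
share_after = bottom_share(after, 10)

# Shortfall from equal share, as a fraction of the equal share (10/100).
shortfall_before = 1 - share_before / Fraction(10, 100)
shortfall_after = 1 - share_after / Fraction(10, 100)

print(share_before, share_after)          # 1/181 1/181
print(shortfall_before, shortfall_after)  # 171/181 171/181
```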


So, take the averaging-generalization another step: Average, over all percentiles from 0 to x, the shortfall from equal-share of the cumulative income up to each percentile from 0 to x.

That, explicitly from the wording, registers, counts the shortfall from equal-share of the cumulative income up to each percentile from 0 to x.

If the distribution from percentiles 0 to x changes in a way that increases the inequality within the 0 to x range, the index reacts to that by increasing.

That average, over all percentiles from 0 to x, of the shortfall of the cumulative income up to each percentile from 0 to x, is divided by the greatest value that it could have. ...the value that it would have if there were complete inequality (1 person having all the income).

Of course, when divided by its maximum possible value, the result is the same whether it's the mean or the sum, of those shortfalls of cumulative incomes, that is divided. In practice the sum is used, but the result can correctly be described as the result of dividing the mean by the largest value that it could have.


That's the Gini, evaluated for the percentile range from 0 to x.

It's an average of averages of individual income shortfalls.
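As a sketch of the construction just described (the commenter's own construction, not an established index that a source is cited for), the summed shortfall can be restricted to the poorest m of n people and divided by its largest possible value. The name `shortfall_index_up_to`, the discrete steps k/n, and the sample incomes are all assumptions made for illustration.

```python
from fractions import Fraction
from itertools import accumulate


def shortfall_index_up_to(incomes, m):
    """Summed cumulative-share shortfall over the poorest m people,
    divided by the value it takes under complete inequality."""
    ys = sorted(incomes)
    n = len(ys)
    total = sum(ys)
    if n < 2 or not 1 <= m <= n or total <= 0:
        raise ValueError("need n >= 2, 1 <= m <= n and a positive total")
    cumulative = list(accumulate(ys))
    shortfall_sum = sum(
        Fraction(k, n) - Fraction(cumulative[k - 1]) / Fraction(total)
        for k in range(1, m + 1)
    )
    # Under complete inequality the shortfall at step k is k/n, except at
    # k = n, where the cumulative share is 1 and the shortfall is 0.
    last = m if m < n else n - 1
    max_sum = Fraction(last * (last + 1), 2 * n)
    return shortfall_sum / max_sum


# Same hypothetical data as a redistribution inside the bottom ten.
before = [10] * 10 + [200] * 90
after = [0] * 9 + [100] + [200] * 90

print(shortfall_index_up_to(before, 10))  # 171/181
print(shortfall_index_up_to(after, 10))   # 1971/1991
```

Unlike the single bottom-ten share, this restricted sum rises (from about 0.945 to about 0.990) when income is concentrated within the bottom ten. With m equal to n it reduces to the whole-range summed shortfall divided by (n - 1)/2.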


Of course, in practice, as currently used, the Gini is evaluated over the whole percentile-range. i.e., from 0 to 1.

For x values up to the percentile at which the mean income occurs, the Gini is motivated and justified as described above.

But it has justification above the mean as well.

It could be argued that we don't care about the shortfall of cumulative incomes that include incomes above the mean. Incomes above the mean aren't being shorted, and their "shortfall from equal-share" is negative.

Yes, but the shortfall below equal-share of cumulative income up to a percentile, x, even if some of those incomes in that range are above the mean, shows the degree to which income is unequally-concentrated among percentiles above x.

So the above-mean part of the Gini isn't without value or meaning, because it counts income-concentration at the top.

Like CV (also called RSD, Relative Standard-Deviation) the Gini is intended to summarize both the equality-shortfall of lower incomes, and the concentration of income at the top. Hence the justification of evaluating the Gini all the way to the top.

...and therefore, like CV, the Gini has the disadvantage of not telling where in the population the departures from the mean occur. It is to be emphasized that the Gini has that shortcoming in common with any and every index that looks at all the incomes and reports a single number.

...and that the Gini has that lack of specificity only because specificity is intentionally traded for an overall single number that counts both the shortfall below the mean and the concentration at the top.

It is to be emphasized, then, that the Gini's lack of specificity is entirely the result of a choice of generality instead of specificity. ...the choice to apply it over the entire population for an overall summary, rather than, only for some subset of the population, up to, say, the 10th percentile, the 40th percentile, or the percentile corresponding to the mean income, to report shortfall in that particular region.

The choice of generality over specificity.


Including the Gini, there are several inequality-measures that look at all incomes, and return a single number. The various such inequality-measures differ in how they weight changes in various parts of the income-distribution. That difference in weighting or emphasis of different parts of the income distribution doesn't mean that some measures are right and others are wrong. Choice between them is just a matter of which weighting or emphasis an individual prefers.

The Gini most strongly weights changes in incomes at the low end of the income-distribution. Here's why:

An income at a low percentile is counted in all the cumulative-incomes up to all the percentiles above that income's percentile. e.g. If the poorest person's income is raised by a dollar, that raises the cumulative income to every percentile by a dollar. The entire Lorenz curve is raised by the amount by which the lowest income is raised. Changes to incomes at higher percentiles change fewer cumulative-incomes.

Because the change to a low-percentile income is applied to more cumulative incomes, the Gini is changed more when a low-percentile income is changed by a given amount.

Therefore, changes in an income are weighted proportionally to the distance of that income's percentile from the 100th percentile.
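The counting argument can be checked numerically. The sketch below uses made-up incomes and a hypothetical helper `sum_of_cumulative`; it measures only how much the sum of cumulative incomes (in dollars) moves when one income rises by a dollar. The Gini itself also divides by total income, which changes too, so this illustrates the rank weights rather than the exact change in the Gini.

```python
from itertools import accumulate


def sum_of_cumulative(incomes):
    """Sum, over k = 1..n, of the total income of the k poorest people."""
    return sum(accumulate(sorted(incomes)))


base = [10, 20, 30, 40, 50]  # hypothetical incomes, already in rank order
effects = []
for i in range(len(base)):
    bumped = list(base)
    bumped[i] += 1  # one extra dollar; small enough to keep the ranking
    effects.append(sum_of_cumulative(bumped) - sum_of_cumulative(base))

print(effects)  # [5, 4, 3, 2, 1]
```

The dollar given to the poorest person is counted in all five cumulative totals, and the dollar given to the richest in only one.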

...making the Gini particularly desirable, among the indices that look at all incomes and return one number, for those who are most interested in comparisons of how the lower-income population-percentiles are treated in an income-distribution.

(But obviously, if one is interested only in comparison of how a particular percentile-region of the population is treated, then that suggests the desirability of an inequality measure that looks specifically only at that percentile-region.)


— Preceding unsigned comment added by 71.84.136.105 (talk) 22:29, 25 October 2020 (UTC)

Much of the above wording, without reference to reliable sources, remains a statement of personal ideas and proposals. Therefore it does not qualify for inclusion in WP. Please read WP:NOR.

Woodstone (talk) 10:19, 26 October 2020 (UTC)


I've replaced the proposed text-addition with a version from which I've removed or changed the criticizable or challengeable parts.

The WP:NOR page clearly states that basic arithmetic isn't original research. I don't say anything in that proposed wording that isn't supported by basic arithmetic. Thus the standard of verifiability is met too, because every statement is verified by basic arithmetic.

But if something that I say in that proposed section lacks verification, or might be untrue, then of course I want to hear *specifically* about it, so that I can either remove or verify it. Was there a particular unverified statement in my proposed section that you were referring to?

I wasn't aware of my proposed text containing a proposal. What proposal?

The text here is a lot of rambling in incomplete sentences. But a key point seems to be limiting the calculation to a partial sum. Can you point to sources that discuss this way of extending the definition?? −Woodstone (talk) 09:29, 27 October 2020 (UTC)

"The text here is a lot of rambling..."

This use of "rambling" is a vague, referentless angry-noise. There's nothing disordered or rambling in my text. ...but of course I'd welcome constructive suggestions if clarity could be improved.

I started with a familiar, in-use, way to say something about an income-description, one mentioned in Wikipedia's "Inequality-metrics" article.

I mentioned the obvious fact that an obvious limitation of it is remedied by an averaging, resulting in another popular and familiar thing to say about an income-distribution.

Then I pointed out that the resulting measure of cumulative income to x percentile overlooks what happens within that region. ...and that, obviously, that shortcoming is remedied by averaging the cumulative-incomes up to all the percentiles up to x, instead of just to x. ...an obvious repetition of averaging. ...resulting in another familiar measure, the one that's universally used, with 100 percentile as x. The Gini. Having shown a motivation for that measure below the mean, I then told why it also says something relevant above the mean, as the Gini Index as actually used. Rather than rambling, those topics were covered in a useful and meaningful order.

"in incomplete sentences."

You're referring to my use of ellipsis. I do that for clarity. Long sentences become unclear, just by their length. So I like to write a shorter sentence. ...followed by an ellipsis-delimited continuation that adds information or clarifies meaning without confusingly jamming onto the short and clear sentence. The use of ellipsis in that manner isn't incorrect.

"But a key point seems to be limiting the calculation to a partial sum."

...when showing a motivation of the Gini, to remedy well-known shortcomings of certain familiar low-percentile income measures by repeated averaging.

...and to answer the unspecificity-criticism by clarifying that it isn't an inherent inevitable attribute of the Gini summation, but rather is only the result of an intentional choice for generality over specificity.

"Can you point to sources that discuss this way of extending the definition??"

My current text doesn't advocate any proposal. The averaging that I speak of is basic arithmetic, and, in fact, is already familiar and popular when it looks at the income-share of the bottom 10%, as opposed to the income at the 10th percentile.

That discussion based on already-familiar-and-popular uses of averaging, basic arithmetic, is much too obvious to need citation of Notable-Sources to verify it.

And basic arithmetic, especially when already in popular use for inequality-measures, isn't Original-Research.

Here's a quote from Wikipedia's "Inequality-Metrics" article. ...a quote that shows the lack of understanding of the Gini's motivation and meaning:

"The Gini index is the most frequently used inequality index. The reason for its popularity is that it is easy to understand how to compute the Gini index as a ratio of two areas in Lorenz curve diagrams. As a disadvantage, the Gini index only maps a number to the properties of a diagram, but the diagram itself is not based on any model of a distribution process. The "meaning" of the Gini index only can be understood empirically. Additionally the Gini does not capture where in the distribution the inequality occurs. As a result, two very different distributions of income can have the same Gini index. "
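The last sentence of that quote is easy to check with a toy example. The sketch below assumes integer incomes and uses a hypothetical helper, `gini_mean_difference`, which computes the population Gini as the sum of absolute differences over all ordered pairs divided by 2n times total income. The two four-person distributions are made up for illustration: in one, half the population has nothing; in the other, everyone has something but one person has three quarters of the total.

```python
from fractions import Fraction


def gini_mean_difference(incomes):
    """Population Gini: sum of |a - b| over all ordered pairs,
    divided by 2 * n * (total income)."""
    n = len(incomes)
    total = sum(incomes)
    diff_sum = sum(abs(a - b) for a in incomes for b in incomes)
    return Fraction(diff_sum, 2 * n * total)


half_have_nothing = [0, 0, 1, 1]
one_has_most = [1, 1, 1, 9]

print(gini_mean_difference(half_have_nothing))  # 1/2
print(gini_mean_difference(one_has_most))       # 1/2
```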

The author of that implies that the Gini is without "meaning", and is about nothing but "the properties of a diagram."

The author implies that failure to show "where in the distribution the inequality occurs" is a specifically Gini problem (while actually it's an intentional choice for generality over specificity, made when applying the summation to the entire distribution and outputting a single number).

My text addresses those two misconceptions. If the Gini has an obvious motivation and meaning (It does), then the reader has a right to hear about it.

...via a use of basic arithmetic that's already familiarly and popularly in use for measuring inequality.


I've added a few paragraphs to my proposed article-addition, about the weighting of income-changes at different population-percentiles.

I won't add my proposed added-text to the article unless the objection to the addition has been resolved. ...in which case I'll add the text with whatever reasonable changes have been suggested by others. ...in whatever reasonable form has been suggested by others.

After a calendar-month, starting from today, has elapsed, if there hasn't been such a resolution, and if there evidently won't be one, then I guess the matter will go to a vote, or to arbitration based on official Wikipedia rules. It seems to me that, in that circumstance, a vote would be better, at least at first. Hopefully that vote could include any uninvolved and uncommitted visitors, so as to indicate what readers would prefer.

One purpose of that month's delay is to ensure that there's plenty of time for anyone to suggest improvements, especially for clarity, in the proposed added text.

...but also as an opportunity for objections to be expressed, and for me to state my answers to them. — Preceding unsigned comment added by 71.84.136.105 (talk) 19:37, 27 October 2020 (UTC)


Just one more thing here: It occurs to me that maybe a fairly big block of proposed added-text should be posted to the sandbox instead of to this talk-page. I don't know--As I understand it, posts to the sandbox are only there temporarily, so there wouldn't be much time for getting opinions. Anyway, I won't post more until I've written a complete final version of what I suggest.

It will include:

1. The Gini-motivation that I've described

2. Some answers to the objection about lack of information about where, in the distribution, the inequality is. Those answers will include a suggestion from a notable-source; and my clarification that that non-specificity is the result of an intentional choice for it, rather than an inherent Gini problem; and my mention of the Gini's obvious strong weighting of low-percentile incomes, which obviously means that the Gini _does_ say something about what's happening at a particular end of the distribution. (In addition to telling an obvious reason for that weighting, I'll also refer to a place in the Wikipedia Gini article that quotes a Gini-calculation-formula from a notable-source...a formula that clearly weights each household's income by its rank-number (larger for lower incomes) ).
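The comment above does not say which formula is meant. As an assumption, the sketch below uses the commonly stated rank-weighted form for incomes sorted in non-decreasing order, G = (n + 1 - 2 * sum((n + 1 - i) * y_i) / sum(y_i)) / n, in which the poorest household (i = 1) gets the largest weight n. The function name `gini_rank_weighted` is hypothetical, and integer incomes with a positive total are assumed.

```python
from fractions import Fraction


def gini_rank_weighted(incomes):
    """Population Gini from rank-weighted incomes (poorest weighted most)."""
    ys = sorted(incomes)
    n = len(ys)
    total = sum(ys)
    # Weight n for the lowest income, down to weight 1 for the highest.
    weighted = sum((n + 1 - i) * y for i, y in enumerate(ys, start=1))
    return (n + 1 - Fraction(2 * weighted, total)) / n


print(gini_rank_weighted([1, 1, 1, 1]))  # 0
print(gini_rank_weighted([0, 0, 1, 1]))  # 1/2
print(gini_rank_weighted([0, 0, 0, 1]))  # 3/4
```

The last value shows that this population form tops out at (n - 1)/n rather than 1 for a finite population.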


Incidentally, of course all the indices that attempt to summarize the whole overall inequality have some weighting of the incomes of the different parts of the population, and changes in those incomes. There's no one right weighting. If there were a demonstrably right weighting of the importance of inequality between various population-segments, then it would make sense to use a number based on that weighting of the inequalities. But, because there's no such right weighting of importances, then obviously it can't be meaningful, or make any sense, to give some aggregation-number based on some weighting. To meaningfully tell about what's happening at both top and bottom, then it would be necessary to give separate numbers for those different facts. — Preceding unsigned comment added by 71.84.136.105 (talk) 18:19, 29 October 2020 (UTC)


Gini index other uses

Professor @Limit-theorem:, I see you deleted my contribution under other uses of the Gini index with the following comment: "Does not seem relevant or well spread a method". I can see that you have a solid mathematical/statistical background, so I wanted to make a case for my changes, in which the Gini-type coefficient was introduced in reliability theory. As you can see in FIGURE1 conceptually it is very similar to the original Gini coefficient but has a totally different application. Also, the Gini index now has far more applications than just in economics. After its introduction in that paper, the Gini-type coefficient received multiple applications. This link is for proceedings from the last Annual Reliability and Maintainability Symposium in May 2021; The applications include but are not limited to:

  • M. Parsa, A. Di Crescenzo, and H. Jabbari, On Gini-type index applications in reliability analysis, Reliability Theory and its Applications, Mashhad, Iran, 2017.
  • M. Parsa, A. Di Crescenzo, and H. Jabbari, Analysis of reliability systems via Gini-type index, European Journal of Operational Research, pp. 340–353, 2018.
  • N.L. Johnson and S. Kotz, A vector multivariate hazard rate, Journal of Multivariate Analysis, 5, 53–66, 1975.
  • A. Păun, C. Chandler, C.B. Leangsuksun, and M. Păun, A failure index for HPC applications, Journal of Parallel and Distrib. Comput., Elsevier, 2016.
  • M. Kaminskiy, Gini-Type Index for Aging/Rejuvenating Populations, arXiv preprint, arXiv:1408.2724, 2014.

It would be remiss if this fundamental application of the Gini Coefficient is not included in the Wiki article. Where else do you think it should be added if not under other uses? Sarouk7 (talk) 13:37, 11 July 2021 (UTC)

Dear editor, thank you for your kind effort. Please note that these papers have no impact, the most cited appears to have 9 citations on Google Scholar, the other one has 4. This is not encyclopedia level. As to the "Parsa, Motahareh, Antonio Di Crescenzo, and Hadi Jabbari Nooghabi. "Comparison of Systems Ageing Properties by Gini-type Index." 13th Iranian Statistics Conference. 2016.", it has no citations. Limit-theorem (talk) 18:07, 11 July 2021 (UTC)