Gini coefficient: Difference between revisions

Rogmike (talk | contribs)
m It was stated that there is no simple connection between the GINI coefficient of statistical dispersion and the GINI coefficient of a classifier. Since this connection exists, I changed that sentence and added a published reference [90]
Tags: Reverted Visual edit
Restored revision 1181481272 by Onyema Omenuwa (talk): COI / selfpromo
Line 428:


== Relation to other statistical measures ==
There is a summary measure of the diagnostic ability of a binary classifier system that is also called the ''Gini coefficient'', which is defined as twice the area between the [[receiver operating characteristic]] (ROC) curve and its diagonal. It is related to the [[Receiver operating characteristic#Area under curve|AUC]] ([[Integral|Area Under]] the ROC Curve) measure of performance given by <math>AUC = (G+1)/2</math><ref name=hand>{{cite journal|last1=Hand|first1=David J.|first2=Robert J.|last2=Till|title=A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems|journal=Machine Learning|year=2001|volume=45|issue=2|pages=171–186|doi=10.1023/A:1010920819831|s2cid=43144161|url=https://link.springer.com/content/pdf/10.1023%2FA%3A1010920819831.pdf |archive-url=https://web.archive.org/web/20130810150809/http://link.springer.com/content/pdf/10.1023/A:1010920819831.pdf |archive-date=2013-08-10 |url-status=live|doi-access=free}}</ref> and to [[Mann–Whitney U]]. Both Gini coefficients are defined as areas between certain curves and share certain properties, and there is a simple direct relationship between the Gini coefficient of statistical dispersion (GINIs) and the Gini coefficient of a classifier (GINIc). If the ratio of "events" to "non-events" in the dataset is '''α''', then '''GINIs = GINIc / (1 + α)'''.<ref>{{Cite web |last=Roginsky |first=Michael |date=November 1, 2023 |title=On connection of two GINI coefficients |url=https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4620262}}</ref>
There is a summary measure of the diagnostic ability of a binary classifier system that is also called the ''Gini coefficient'', which is defined as twice the area between the [[receiver operating characteristic]] (ROC) curve and its diagonal. It is related to the [[Receiver operating characteristic#Area under curve|AUC]] ([[Integral|Area Under]] the ROC Curve) measure of performance given by <math>AUC = (G+1)/2</math><ref name=hand>{{cite journal|last1=Hand|first1=David J.|first2=Robert J.|last2=Till|title=A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems|journal=Machine Learning|year=2001|volume=45|issue=2|pages=171–186|doi=10.1023/A:1010920819831|s2cid=43144161|url=https://link.springer.com/content/pdf/10.1023%2FA%3A1010920819831.pdf |archive-url=https://web.archive.org/web/20130810150809/http://link.springer.com/content/pdf/10.1023/A:1010920819831.pdf |archive-date=2013-08-10 |url-status=live|doi-access=free}}</ref> and to [[Mann–Whitney U]]. Although both Gini coefficients are defined as areas between certain curves and share certain properties, there is no simple direct relationship between the Gini coefficient of statistical dispersion and the Gini coefficient of a classifier.
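As an illustration of the relation <math>AUC = (G+1)/2</math>, the following minimal sketch computes the AUC through its Mann–Whitney interpretation (the fraction of positive/negative score pairs that are ranked correctly, with ties counted as one half) and then derives the classifier Gini coefficient as <math>G = 2 \cdot AUC - 1</math>. The function names and the sample scores are hypothetical and chosen only for this example; they do not come from the cited sources.

```python
def auc_mann_whitney(scores_pos, scores_neg):
    """Estimate AUC as the share of (positive, negative) pairs in which
    the positive case receives the higher score; ties count one half.
    This equals the Mann-Whitney U statistic divided by n_pos * n_neg."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def classifier_gini(scores_pos, scores_neg):
    """Classifier Gini coefficient, obtained by rearranging AUC = (G + 1) / 2."""
    return 2.0 * auc_mann_whitney(scores_pos, scores_neg) - 1.0


# Hypothetical classifier scores for three positive and three negative cases.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]

auc = auc_mann_whitney(pos, neg)   # 8 of 9 pairs are ranked correctly
gini = classifier_gini(pos, neg)   # 2 * (8/9) - 1 = 7/9
print(f"AUC = {auc:.4f}, Gini = {gini:.4f}")
```

A classifier that separates the two classes perfectly has AUC 1 and Gini 1, while one whose scores carry no information has AUC 0.5 and Gini 0.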
<references />
The Gini index is also related to the Pietra index — both of which measure statistical heterogeneity and are derived from the Lorenz curve and the diagonal line.<ref>{{cite journal|first1=Iddo I.|last1=Eliazar |first2=Igor M.|last2=Sokolov |year=2010 |title=Measuring statistical heterogeneity: The Pietra index|journal=Physica A: Statistical Mechanics and Its Applications|volume=389|issue= 1|pages= 117–125|doi= 10.1016/j.physa.2009.08.006|bibcode=2010PhyA..389..117E}}</ref><ref>{{cite journal|title=Probabilistic Analysis of Global Performances of Diagnostic Tests: Interpreting the Lorenz Curve-Based Summary Measures|first=Wen-Chung|last=Lee|journal=Statistics in Medicine|volume=18|pages=455–471|year=1999|url=http://ntur.lib.ntu.edu.tw/bitstream/246246/160620/1/37.pdf|doi=10.1002/(SICI)1097-0258(19990228)18:4<455::AID-SIM44>3.0.CO;2-A|pmid=10070686|issue=4|access-date=1 August 2012|archive-date=3 August 2012|archive-url=https://web.archive.org/web/20120803160558/http://ntur.lib.ntu.edu.tw/bitstream/246246/160620/1/37.pdf|url-status=dead}}</ref><ref name="McDonald1974">{{cite journal |last1=McDonald |first1=James B |last2=Jensen |first2=Bartell C. |date=December 1979 |title=An Analysis of Some Properties of Alternative Measures of Income Inequality Based on the Gamma Distribution Function |url= |journal=Journal of the American Statistical Association |volume=74 |issue=368 |pages=856–860 |doi= 10.1080/01621459.1979.10481042|access-date=}}</ref>
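A short sketch may help show how both indices can be read off the same empirical Lorenz curve. It assumes the common definitions in which the Gini index is twice the area between the diagonal and the Lorenz curve, and the Pietra index is the largest vertical gap between the two. The helper names and the sample data are hypothetical and are not taken from the cited references.

```python
def lorenz_points(values):
    """Vertices (population share, cumulative value share) of the empirical
    Lorenz curve, starting at (0, 0). Assumes non-negative values with a
    positive total."""
    xs = sorted(values)
    total = sum(xs)
    n = len(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def gini_index(values):
    """Twice the area between the diagonal and the Lorenz curve,
    using the trapezoid rule for the area under the curve."""
    pts = lorenz_points(values)
    area_under = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area_under += (x1 - x0) * (y0 + y1) / 2.0
    return 1.0 - 2.0 * area_under


def pietra_index(values):
    """Largest vertical gap between the diagonal and the Lorenz curve.
    The gap is piecewise linear, so its maximum lies at a vertex."""
    return max(x - y for x, y in lorenz_points(values))


# Hypothetical sample of four incomes.
incomes = [1, 1, 2, 4]
print(gini_index(incomes))    # 0.3125
print(pietra_index(incomes))  # 0.25
```

For a perfectly equal sample both indices are zero, since the Lorenz curve then coincides with the diagonal.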