Somers' D

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 2601:445:4001:4514:c16c:ffd0:5f43:21b5 (talk) at 17:46, 5 January 2016. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In statistics, Somers’ D, sometimes incorrectly referred to as Somer’s D, is a measure of ordinal association between two variables $X$ and $Y$. Somers’ D takes values between $-1$, when all pairs of the variables disagree, and $1$, when all pairs of the variables agree. Somers’ D is named after R. H. Somers, who proposed it in 1962.[1]

Somers’ D plays a central role in rank statistics and is the parameter behind many nonparametric methods.[2] It is also used as a quality measure of logistic regressions and credit scoring models.

Somers’ D for sample

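We say that two pairs $(x_i, y_i)$ and $(x_j, y_j)$ are concordant if the ranks of both elements agree, that is, if $x_i > x_j$ and $y_i > y_j$, or if $x_i < x_j$ and $y_i < y_j$. We say that two pairs $(x_i, y_i)$ and $(x_j, y_j)$ are discordant if the ranks of both elements disagree, that is, if $x_i > x_j$ and $y_i < y_j$, or if $x_i < x_j$ and $y_i > y_j$. If $x_i = x_j$ or $y_i = y_j$, the pair is neither concordant nor discordant.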

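Let $(x_1, y_1), \ldots, (x_n, y_n)$ be a set of observations of two possibly dependent random variables $X$ and $Y$. Define the Kendall tau rank correlation coefficient $\tau$ as

$$\tau(X, Y) = \frac{N_C - N_D}{n(n-1)/2},$$

where $N_C$ is the number of concordant pairs and $N_D$ is the number of discordant pairs. Somers’ D of $Y$ with respect to $X$ is defined as $D_{YX} = \tau(X, Y) / \tau(X, X)$.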

Note that Kendall's tau is symmetric in $X$ and $Y$, whereas Somers’ D is asymmetric in $X$ and $Y$.
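The sample definition can be illustrated with a brute-force count over all pairs. The following is a minimal sketch, not a reference implementation: the function names `sign`, `kendall_tau_a` and `somers_d` and the toy data are illustrative assumptions, and the quadratic loop is only suitable for small samples.

```python
from itertools import combinations


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_a(xs, ys):
    """Kendall's tau computed as (N_C - N_D) / (n(n-1)/2)."""
    n = len(xs)
    total = 0
    for i, j in combinations(range(n), 2):
        # +1 for a concordant pair, -1 for a discordant pair, 0 for a tie.
        total += sign(xs[i] - xs[j]) * sign(ys[i] - ys[j])
    return total / (n * (n - 1) / 2)


def somers_d(xs, ys):
    """Somers' D of ys with respect to xs: tau(X, Y) / tau(X, X).

    Raises ZeroDivisionError if all values in xs are equal.
    """
    return kendall_tau_a(xs, ys) / kendall_tau_a(xs, xs)


# Hypothetical toy data: x has ties, y does not.
x = [1, 1, 1, 2, 3]
y = [1, 2, 3, 4, 5]

d_yx = somers_d(x, y)  # Y with respect to X
d_xy = somers_d(y, x)  # X with respect to Y
print(kendall_tau_a(x, y), d_yx, d_xy)
```

With this data every pair that is not tied on x is concordant, so the two directions of Somers’ D differ while Kendall's tau stays the same whichever variable is listed first, which illustrates the asymmetry.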

Somers’ D for distribution

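Let two bivariate random variables $(X_1, Y_1)$ and $(X_2, Y_2)$ be drawn independently from the same probability distribution $P_{XY}$. Again, Somers’ D can be defined through Kendall's tau

$$\tau(X, Y) = \operatorname{E}\bigl[\operatorname{sgn}(X_1 - X_2)\operatorname{sgn}(Y_1 - Y_2)\bigr] = P\bigl(\operatorname{sgn}(X_1 - X_2)\operatorname{sgn}(Y_1 - Y_2) = 1\bigr) - P\bigl(\operatorname{sgn}(X_1 - X_2)\operatorname{sgn}(Y_1 - Y_2) = -1\bigr),$$

or the difference between the probabilities of concordance and discordance. Somers’ D of $Y$ with respect to $X$ is defined as $D_{YX} = \tau(X, Y) / \tau(X, X)$. Thus, $D_{YX}$ is the difference between the two corresponding probabilities, conditional on the $X$ values not being equal. If $X$ has a continuous CDF, then $\tau(X, X) = 1$ and Kendall's tau and Somers’ D coincide. Somers’ D normalizes Kendall's tau for possible mass points of variable $X$.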
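The conditional-probability reading follows in one step from the sign-based definition of Kendall's tau. As a short sketch, write $(X_1, Y_1)$ and $(X_2, Y_2)$ for two independent draws from the joint distribution, and let $C$ and $D$ be shorthand (introduced here only for illustration) for the events that the two draws are concordant and discordant, respectively:

$$\begin{aligned}
\tau(X, X) &= \operatorname{E}\bigl[\operatorname{sgn}(X_1 - X_2)^2\bigr] = P(X_1 \neq X_2), \\
D_{YX} &= \frac{\tau(X, Y)}{\tau(X, X)} = \frac{P(C) - P(D)}{P(X_1 \neq X_2)} = P(C \mid X_1 \neq X_2) - P(D \mid X_1 \neq X_2).
\end{aligned}$$

The last equality uses the fact that both $C$ and $D$ imply $X_1 \neq X_2$. When $X$ has no mass points, $P(X_1 \neq X_2) = 1$ and the denominator drops out.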

If $X$ and $Y$ are both binary with values 0 and 1, then Somers’ D is the difference between two probabilities:

$$D_{YX} = P(Y = 1 \mid X = 1) - P(Y = 1 \mid X = 0).$$
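The binary case, where Somers’ D of $Y$ with respect to $X$ reduces to $P(Y = 1 \mid X = 1) - P(Y = 1 \mid X = 0)$, can be checked numerically from a joint probability table. The sketch below evaluates Kendall's tau as an expectation over two independent draws. The joint distribution `pmf` and the helper names are hypothetical and chosen only for illustration.

```python
from operator import itemgetter


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def tau_from_pmf(pmf, f, g):
    """E[sgn(f(A) - f(B)) * sgn(g(A) - g(B))] for independent draws A, B.

    pmf maps (x, y) outcomes to probabilities; f and g select a coordinate.
    """
    total = 0.0
    for a, pa in pmf.items():
        for b, pb in pmf.items():
            total += pa * pb * sign(f(a) - f(b)) * sign(g(a) - g(b))
    return total


def somers_d_yx(pmf):
    """Somers' D of Y with respect to X: tau(X, Y) / tau(X, X)."""
    get_x, get_y = itemgetter(0), itemgetter(1)
    return tau_from_pmf(pmf, get_x, get_y) / tau_from_pmf(pmf, get_x, get_x)


# Hypothetical joint distribution of binary (X, Y); probabilities sum to 1.
pmf = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

p_y1_given_x1 = pmf[(1, 1)] / (pmf[(1, 0)] + pmf[(1, 1)])
p_y1_given_x0 = pmf[(0, 1)] / (pmf[(0, 0)] + pmf[(0, 1)])

d = somers_d_yx(pmf)
print(d, p_y1_given_x1 - p_y1_given_x0)
```

Both printed values agree up to floating-point rounding, as the formula predicts.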

Somers’ D for logistic regression

Several statistics can be used to measure the quality of logistic regressions: AUC or c-statistic, Goodman and Kruskal's gamma, Kendall's tau (Tau-a), Somers’ D, etc. Somers’ D is probably the most widely used of the available rank order correlation statistics.[3] For $X$ being the predicted probability of the outcome and $Y$ being the outcome, Somers’ D for logistic regression can be rewritten as

$$D_{XY} = \frac{N_C - N_D}{N_C + N_D + N_T},$$

where $N_T$ is the number of pairs tied on variable $X$ but not on $Y$.

In logistic regressions, Somers’ D is related to the well-known area under the receiver operating characteristic curve (AUC): $\mathrm{AUC} = (D_{XY} + 1)/2$.
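The pair counts for a binary outcome, and the relation between Somers’ D and the area under the ROC curve, can be sketched as follows. This is an illustration under stated assumptions: the scores and outcomes are made-up data, the function names are hypothetical, and the area under the curve is computed directly as the probability that an event receives a higher score than a non-event, with tied scores counted as one half.

```python
def pair_counts(scores, outcomes):
    """Count (N_C, N_D, N_T) over all (event, non-event) pairs.

    A pair is concordant if the event has the higher score, discordant if
    it has the lower score, and tied if the two scores are equal.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    n_c = sum(p > q for p in pos for q in neg)
    n_d = sum(p < q for p in pos for q in neg)
    n_t = sum(p == q for p in pos for q in neg)
    return n_c, n_d, n_t


def somers_d_xy(scores, outcomes):
    """(N_C - N_D) / (N_C + N_D + N_T) for score X and binary outcome Y."""
    n_c, n_d, n_t = pair_counts(scores, outcomes)
    return (n_c - n_d) / (n_c + n_d + n_t)


def auc(scores, outcomes):
    """Area under the ROC curve via pair counting, ties counted as 1/2."""
    n_c, n_d, n_t = pair_counts(scores, outcomes)
    return (n_c + 0.5 * n_t) / (n_c + n_d + n_t)


# Hypothetical predicted probabilities and observed binary outcomes.
scores = [0.1, 0.4, 0.4, 0.6, 0.8, 0.9]
outcomes = [0, 0, 1, 0, 1, 1]

d = somers_d_xy(scores, outcomes)
print(pair_counts(scores, outcomes), d, auc(scores, outcomes), (d + 1) / 2)
```

The last two printed values coincide, since $(N_C + N_T/2)/(N_C + N_D + N_T) = (D_{XY} + 1)/2$ whenever $D_{XY} = (N_C - N_D)/(N_C + N_D + N_T)$.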

References

  1. ^ Somers, R. H. (1962). "A new asymmetric measure of association for ordinal variables". American Sociological Review. 27: 799–811.
  2. ^ Newson, Roger (2002). "Parameters behind "nonparametric" statistics: Kendall's tau, Somers' D and median differences". Stata Journal. 2 (1): 45–64.
  3. ^ O'Connell, A. A. (2005). Logistic Regression Models for Ordinal Response Variables. Quantitative Applications in the Social Sciences. Ohio State University, USA.