Talk:Kendall rank correlation coefficient
|WikiProject Statistics||(Rated Start-class, High-importance)|
|WikiProject Mathematics||(Rated Start-class, Low-importance)|
Suggest reformulation of Kendall's tau definition
I notice there's been some debate here (more than 4 years ago!) over the definition of Kendall's tau. While the current definition is technically correct, it's not the most intuitive one. The way I've usually seen it defined is
$$\tau = \frac{n_c - n_d}{n_c + n_d},$$
where $n_c$ and $n_d$ are the numbers of concordant and discordant pairs.
This is the same as what's currently given because the sum of concordant and discordant pairs is just the total number of pairs, given by N choose 2 and equal to $N(N-1)/2$. See e.g. http://ir.cis.udel.edu/~carteret/papers/sigir09.pdf
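- It's not equivalent, because if there exists a pair such that $x_i = x_j$ or $y_i = y_j$, then the number of concordant pairs plus the number of discordant pairs is strictly less than $\binom{N}{2}$.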
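The disagreement above can be checked numerically. The following is a minimal sketch, not taken from the article or the linked paper: the function names and the toy data are made up for illustration, and tied pairs are assumed to count as neither concordant nor discordant.

```python
from itertools import combinations


def count_pairs(x, y):
    """Return (concordant, discordant) counts; tied pairs count as neither."""
    concordant = discordant = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = (xi - xj) * (yi - yj)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return concordant, discordant


def tau_over_all_pairs(x, y):
    """(C - D) divided by the total number of pairs, n(n-1)/2."""
    n = len(x)
    c, d = count_pairs(x, y)
    return (c - d) / (n * (n - 1) / 2)


def tau_over_untied_pairs(x, y):
    """(C - D) divided by C + D, i.e. only the pairs that are not tied."""
    c, d = count_pairs(x, y)
    return (c - d) / (c + d)


# Hypothetical data: the first pair of rankings has no ties, the second has one tie in x.
no_ties = ([1, 2, 3, 4], [1, 3, 2, 4])
with_tie = ([1, 2, 2, 4], [1, 3, 2, 4])

print(tau_over_all_pairs(*no_ties), tau_over_untied_pairs(*no_ties))
print(tau_over_all_pairs(*with_tie), tau_over_untied_pairs(*with_tie))
```

Without ties the two denominators coincide, so both functions return the same value. With the tied pair, C + D drops to 5 while the total number of pairs stays at 6, and the two formulas give different results.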
Replace the current formulation with the following...?
I found the current formulation and explanation to be wrong and misleading. I've prepared the following change:
"Kendall tau coefficient is defined
$$\tau = \frac{n_c - n_d}{\tfrac{1}{2} n (n-1)}$$
where $n_c$ is the number of concordant pairs, $n_d$ is the number of discordant pairs, and $n$ is the number of items.
The denominator in the definition of $\tau$ can be interpreted as the total number of pairs of items. So, a high value in the numerator means that most pairs are concordant, indicating that the two rankings are consistent. Note that a tied pair is not regarded as concordant or discordant. If there is a large number of ties, the total number of pairs (in the denominator of the expression of $\tau$) should be adjusted accordingly."
This is from http://www.statsdirect.com/help/nonparametric_methods/kend.htm and I've confirmed it to be correct. Understanding it simply requires understanding concordant pairs, for which there is already a rather good entry. —Preceding unsigned comment added by Squeakywaffle (talk • contribs) 23:32, 18 September 2008 (UTC)
Well if nobody has any objections I'm going to go ahead and make this change. I will have to remove the example, but this talk page is dominated by comments questioning the validity of the current formulation and explanation, and IMO having those correct is more important than having an example.
Error in equation?
According to http://www.rsscse.org.uk/TS/bts/noether/text.html and my own experiments, I think the equation should be 1 - 4P/(n(n-1)), NOT 4P/(n(n-1)) - 1 —Preceding unsigned comment added by 188.8.131.52 (talk) 23:35, 11 December 2007 (UTC)
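One possible reading, offered as a sketch rather than a settled answer: both forms are valid, and they differ only in what P counts. The derivation below assumes no ties and uses the symbols $C$ (concordant pairs) and $D$ (discordant pairs), which are introduced here for illustration. Whether the linked page actually defines P as the number of discordant pairs is an assumption and has not been checked against it.

```latex
% Assumption: no ties, so every pair is either concordant or discordant.
C + D = \frac{n(n-1)}{2}

% Definition in terms of both counts:
\tau = \frac{C - D}{\tfrac{1}{2} n(n-1)} = \frac{2(C - D)}{n(n-1)}

% Substituting D = n(n-1)/2 - C (P counts concordant pairs):
\tau = \frac{4C}{n(n-1)} - 1

% Substituting C = n(n-1)/2 - D (P counts discordant pairs):
\tau = 1 - \frac{4D}{n(n-1)}
```

Under these assumptions, 4P/(n(n-1)) - 1 is correct when P is the number of concordant pairs, and 1 - 4P/(n(n-1)) is correct when P is the number of discordant pairs.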
Error in explanation
The phrasing of the actual definition needs work:
"where n is the number of items, and P is the sum, over all the items, of the number of items ranked after the given item by both rankings." —Preceding unsigned comment added by 184.108.40.206 (talk) 01:39, 3 July 2008 (UTC)
The last paragraph of the definition has a problem: it says
- P can also be interpreted as the number of concordant pairs subtracted by the number of discordant pairs.
This can't be literally true: P (as defined above) is a non-negative number, while this difference can be negative.
Instead, while tau = 2P/N - 1 (where N = n(n-1)/2), if we write S = "the number of concordant pairs minus the number of discordant pairs", I think that tau = S/N, so that S = 2P - N.
Or am I missing something?
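The relation S = 2P - N can be checked on a small example. This is a minimal sketch, assuming rankings without ties and reading the quoted definition of P literally (for each item, count the items ranked after it by both rankings). The function names and the two rankings are hypothetical and do not come from the article's example.

```python
def p_statistic(rank_x, rank_y):
    """Sum, over all items, of the number of items ranked after it by both rankings."""
    n = len(rank_x)
    return sum(
        1
        for i in range(n)
        for j in range(n)
        if rank_x[j] > rank_x[i] and rank_y[j] > rank_y[i]
    )


def s_statistic(rank_x, rank_y):
    """Number of concordant pairs minus number of discordant pairs."""
    n = len(rank_x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (rank_x[i] - rank_x[j]) * (rank_y[i] - rank_y[j])
            s += (prod > 0) - (prod < 0)
    return s


# Hypothetical rankings of five items, with no ties.
rank_x = [1, 2, 3, 4, 5]
rank_y = [3, 1, 2, 5, 4]

n = len(rank_x)
N = n * (n - 1) // 2          # total number of pairs
P = p_statistic(rank_x, rank_y)
S = s_statistic(rank_x, rank_y)

print(P, S, 2 * P - N)        # S and 2P - N agree
print(2 * P / N - 1, S / N)   # both expressions give the same tau
```

With no ties, P is just the number of concordant pairs, since each concordant pair is counted once from the item that both rankings place first. That is why S = P - (N - P) = 2P - N.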
Shouldn't the example be P = 5 + 4 + 4 + 4 + 3 + 1 + 0 + 0 = 22 instead of P = 5 + 4 + 5 + 4 + 3 + 1 + 0 + 0 = 22?
Sboehringer 17:00, 5 March 2007 (UTC)
The definition section says: "They are said to be discordant, if xi > xj and yi < yj or if xi < xj and yi > yj."
Shouldn't this be stated more generally: "They are said to be discordant, if xi > xj and yi ≤ yj or if xi < xj and yi ≥ yj."?
Empirical estimation vs formal definition
What is given in the section of the definition appears to me more like the estimator of Kendall's tau for a given sample.
The probabilistic definition should probably look more like:
$$\tau = P\big[(X - X')(Y - Y') > 0\big] - P\big[(X - X')(Y - Y') < 0\big]$$
where $(X', Y')$ is an independent copy of $(X, Y)$.
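The distinction between the population quantity and its sample estimator can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy joint distribution `PMF`, the function names, the sample size and the seed. The population value is taken to be the probability of concordance minus the probability of discordance for two independent draws, and the estimator is (C - D) divided by the total number of pairs.

```python
import random
from itertools import combinations, product


def sign(v):
    return (v > 0) - (v < 0)


def population_tau(pmf):
    """P[(X-X')(Y-Y') > 0] - P[(X-X')(Y-Y') < 0] for two independent draws from pmf."""
    return sum(
        p * q * sign((x - x2) * (y - y2))
        for ((x, y), p), ((x2, y2), q) in product(pmf.items(), repeat=2)
    )


def sample_tau(pairs):
    """Sample estimator: (concordant - discordant) / (n choose 2)."""
    n = len(pairs)
    s = sum(
        sign((x1 - x2) * (y1 - y2))
        for (x1, y1), (x2, y2) in combinations(pairs, 2)
    )
    return s / (n * (n - 1) / 2)


# Hypothetical joint distribution of (X, Y) on {0, 1} x {0, 1}.
PMF = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = rng.choices(list(PMF), weights=list(PMF.values()), k=500)

print("population tau:", population_tau(PMF))
print("sample estimate:", sample_tau(sample))
```

For this toy distribution the population value works out to 2(0.4)(0.4) - 2(0.1)(0.1) = 0.30, and the sample estimate fluctuates around it. The formula in the article's definition section corresponds to `sample_tau`, not to `population_tau`.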