Talk:Kendall tau distance

From Wikipedia, the free encyclopedia
Jump to: navigation, search
WikiProject Statistics (Rated Start-class, Mid-importance)
WikiProject icon

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

Start-Class article Start  This article has been rated as Start-Class on the quality scale.
 Mid  This article has been rated as Mid-importance on the importance scale.
 
WikiProject Mathematics (Rated Start-class, Low-importance)
WikiProject Mathematics
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating:
Start Class
Low Importance
 Field: Discrete mathematics

Merge?[edit]

Can this be merged with Kendall tau rank correlation coefficient? This article seems to be just the unnormalised version. —3mta3 (talk) 01:49, 21 April 2010 (UTC)

Two different concepts, they should be treated in two different articles. Please note that Kendall tau rank correlation coefficient is a correlation in the \left[-1, 1\right] interval; Kendall tau distance is a metric, a distance, and is in the \left[0, 1\right] interval. It makes a huge difference, doesn't it? 150.203.177.97 (talk) 20:11, 1 May 2010 (UTC)
I don't see how. -- Walt Pohl (talk) 14:05, 10 May 2010 (UTC)
All that I can say is that I've never seen these two topics treated in the same context. 150.203.177.97 (talk) 03:25, 18 May 2010 (UTC)
There is no real indication of context/use for Kendall tau distance. The statement in the lead: "Kendall tau distance is also called bubble-sort distance since it is equivalent to the number of swaps that the bubble sort algorithm would make to place one list in the same order as the other list." must be wrong, as the value, as defined in this article, must be zero if two lists are in completely opposite orders whereas there must be a large number of swaps. So what use is this distance, as a distance? What is actually the right definition, or are there several possibilities, one of which might do for this "bubble sort" thing? Melcombe (talk) 12:29, 18 May 2010 (UTC)
Either the article was revised or the current text is extremely unclear, but several of the comments above are incorrect (compared to the current text).
[1] Kendall tau rank correlation coefficient is not (the only) normalization of Kendall tau distance. See [2], below.
[2] Kendall tau distance is not in the [0, 1] interval, but in the [0, n(n-1)/2] interval. Only by normalizing the value by dividing Kendall tau distance by n(n-1)/2 do we get [0, 1]. Clearly, this is a different way to normalize the distance than is suggested when we say that the Kendall tau rank correlation coefficient is a normalization of this statistic.
[3] While I cannot confirm, this suggests that the Kendall tau distance may still be the bubble-sort distance for two reasons: [3.A] it is not bound between 0 and 1 and [3.B] a normalized value of 1, i.e., a non-normalized value of n(n-1)/2 -- and not 0 -- indicates complete disagreement.
This is not intended as a critique of the earlier comments, but a clarification both to any future readers of the edit page and as a potential guideline for any merging / edit efforts with respect to what characteristics may need to be clarified on the main page. I don't actually understand this statistic well enough to comment on whether the merge should happen or not, but, regardless, I think a clearer linkage between the distance and the rank correlation coefficient is required. For example, can one be meaningfully stated in terms of the other? --128.237.247.68 (talk) 20:23, 27 April 2011 (UTC)--

To me, one is measuring correlation and the other (the degree of)consensus. While these may be similar mathematically, the applications are different. I would not merge the articles, but would continue to link them. —Preceding unsigned comment added by 152.20.223.195 (talk) 19:19, 17 June 2010 (UTC)

Yes, merge, they are essentially identical. An article can easily cover the variations in terminology and usage. Dicklyon (talk) 04:13, 29 July 2010 (UTC)

I would vote for merging them. As it is now, it is really confusing. --142.3.152.21 (talk) 23:03, 20 September 2010 (UTC)

Merge: the two measures are clearly connected. Where n=# of pairs, c=# of concordances, normalized tau distance = 2c/(n*(n-1)), while tau correlation coefficient is 2(2c-n)/(n*(n-1)). Just slightly different ways to express the exact same information. - Frankie1969 (talk) 14:00, 23 April 2011 (UTC)

Thanks for this summary. On rereading both articles, I think the solution is different than what you listed, and quite a bit simpler than that. Let p=# of pairs, c=# of concordances, and d=# of discordances (Note: p = c + d). Then, the following is true.
Kendall tau distance: K = d.
Normalized distance: norm(K) = d/p = K/p.
Correlation coefficient: \tau = (c-d)/p = (c - K)/p = (p - 2K)/p = 1 - 2K/p.
There we have the mathematical relation of the two terms: :\tau = 1 - 2K/p. Note that as K \rightarrow 0, \tau \rightarrow 1 and as K \rightarrow p, \tau \rightarrow -1. Since one is a simple, (inverse) linear transformation of the other, then I agree, these articles should be merged. --128.237.247.68 (talk) 20:51, 27 April 2011 (UTC)
They are related but not the same. These two articles should not be merged. For example, Manhattan Distance and Distance between two points are clearly related, however, these are two completely different ways of measuring distance and have a history behind their use. Kendall Tau is clearly related to the Kendall distance metric but they are two completely different things. The distance metric is used in the implementation of the Kendall Tau for speedup, but is used in other contexts not just lists. I vote no merge. — Preceding unsigned comment added by 99.37.231.254 (talk) 17:55, 19 December 2011 (UTC)

Missing information[edit]

I'm not a statistician, so I never heard of this notion; just came here because bubble sort distance redirects here. However, it strikes me that one point is important that is not discussed at all: are the lists supposed to contain the same set of elements but possibly in different order? It seems like that, since having distance 0 would not otherwise imply that the lists are equal. Also I note that having equal entries in the same list falsifies the count, as an extreme case a list with all terms equal would have distance 0 to any other list (and make the triangle inequality fail). It would be clearer to say that the notion applies to pairs of rankings, or permutations of n, if that is indeed intended.

Also if a list τ is explicitly (τ(1),τ(2),...,τ(n)) (which seems the most natural interpretation of the unexplained expression τ(i) that is used in the formulas), then the claim "it is equivalent to the number of swaps that the bubble sort algorithm would make to place one list in the same order as the other list" is not true. The sequences (2,4,1,3) and (3,4,1,2) have Kendall tau distance 1 (only the pair i=1, j=4 gives a different comparison), but it would take 5 adjacent transpositions to transform one sequence into the other (which sequence would moreover never by performed in bubble sort, since it only applies transpositions that put elements into the proper order). I can see that something can be corrected either by interpreting τ(i) differently, namely as the position of i in the list, assuming that it occurs exactly once, or by taking an unconventional view of what bubble sort does (interchanging values that have no values from the interval between them in the list, rather than neighbours), but wouldn't know which of the two to prefer. The first option would force the lists to have the same set of distinct elements in order for the formulas to make sense, thus "resolving" the difficulty I mentioned above.

Finally, if as suggested by the examples one is comparing two rankings of the same set that does not have any significant intrinsic ordering on them (such as a set of people), then any reference to bubble sort is really inappropriate, because sorting is about putting elements into the correct order for some intrinsically defined order, such as alphabetical order for strings. Please modify the text so that it makes a bit more sense. Marc van Leeuwen (talk) 13:42, 11 November 2011 (UTC)

I think our confusion is coming from the fact that in your example you are trying to bubble sort to an "unnatural" order. If you want to bubble sort (2,4,1,3) into (3,4,1,2), then you have to remember you are not sorting left to right, and so "adjacent neighbours" are not as the appear on screen. The second list defines your target order, and so defines the neighbours. To make things clearer, allow me to change your numbers to letters, so you want to sort (B,D,A,C) onto (C,D,A,B), then numbers refer to the positions in this array as displayed in this text (1,2,3,4). In which case, your target ordering defines the neighbours for comparison, so the "leftmost" datapoint is i=3, which is next to 4 which is next to 1 which is next to 2. So bubble sort first compares the objects as positions 3 and 4 and since A<C makes no swap. Next it checks position 4 against 1 (remembering that 4 is LEFT of 1) and since C>B it will switch. Next it compares position 1 against 2, and finds that C<D and so no switch. Since the list are in matching order now, it won't do any more switches. — Preceding unsigned comment added by 31.221.13.140 (talk) 10:05, 17 October 2012 (UTC)

Condition modification[edit]

I've slightly changed the definition of \tau_1 and \tau_2, to make it clearer. I've moved from:

  • \tau_1 and \tau_2 are the rankings of the elements in L1 and L2

to:

  • \tau_1(i) and \tau_2(i) are the rankings of the element i in L1 and L2

Can you guys confirm that my modification is correct? Thanks --Larry.europe (talk) 23:59, 5 February 2014 (UTC)