Talk:K-medians clustering

WikiProject Statistics (Rated Stub-class)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as Stub-Class on the quality scale.
This article has not yet received a rating on the importance scale.

I teach a university course on Machine Learning, and just came across this.

I don't have the cited articles handy for reference, but I'm not sure about the claim that using medians rather than means is what makes the algorithm minimize distances under the taxicab metric as opposed to the Euclidean one. Consider the 1-dimensional case, where the taxicab and Euclidean distances coincide (both are |x - y|). Now imagine a skewed distribution, like a Gaussian that has been stretched on the right-hand side of its center. This stretching doesn't move the median, but it does move the mean to the right to minimize the sum of distances. And since this is 1-dimensional, that should hold whether we're talking about taxicab, Euclidean, or Minkowski distances more generally.
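
For anyone who wants to check the 1-dimensional case numerically, here is a minimal sketch. The sample values and the helper names `sum_abs` and `sum_sq` are made up for illustration and are not taken from the article or its references. It compares the mean and the median of a right-skewed sample under two objectives: the sum of absolute distances and the sum of squared distances.

```python
from statistics import mean, median


def sum_abs(points, center):
    """Sum of absolute distances |x - center| (taxicab = Euclidean in 1-D)."""
    return sum(abs(x - center) for x in points)


def sum_sq(points, center):
    """Sum of squared distances (x - center)^2."""
    return sum((x - center) ** 2 for x in points)


# Hypothetical right-skewed sample: one large value stretches the right tail.
sample = [1.0, 2.0, 3.0, 4.0, 20.0]

m = mean(sample)      # pulled to the right by the outlier
med = median(sample)  # unaffected by how far the outlier sits

print(sum_abs(sample, m), sum_abs(sample, med))  # 28.0 21.0
print(sum_sq(sample, m), sum_sq(sample, med))    # 250.0 295.0
```

On this sample the median gives the smaller sum of absolute distances, while the mean gives the smaller sum of squared distances. So the comparison may hinge on whether the distances are squared in the objective rather than on which metric is used.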

In statistics, there are well-known reasons for preferring the median to the mean (basically, it comes down to the relative importance of how many instances have a given amount of error versus the magnitude of that error). So k-medians doesn't really need the taxicab/Euclidean argument to motivate its use theoretically. As long as we're not sure, I'd suggest being conservative and removing this claim. Alternatively, maybe someone can explain why the claim is actually true despite my argument above, tell me what I'm missing, or point to a more readily available reference.

RichardBergmair 10:22, 10 Mar 2014 (UTC)