Talk:Ward's method

WikiProject Statistics (Rated Start-class, Low-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.


Not minimum variance

Unfortunately, the label "minimum variance" for Ward's method is widespread but incorrect. The method actually minimizes the increase in the sum of squares (of errors) at each merge, which is not the same as minimizing variance. If it were minimum variance, the decision to link a cluster with a singleton point would always match the decision made by the centroid linkage method in that situation (because assigning a point to the closest centroid guarantees that the new cluster has minimal variance). That is not the case: the two methods, Ward and centroid, can make different decisions there. See Podani, J. New combinatorial clustering methods // Vegetatio, vol. 81, 1989, where on page 67 he objects strongly to the incorrect labels for Ward's method.

188.123.252.14 (talk) 13:48, 1 February 2014 (UTC) There does indeed exist a true minimum variance method (called MNVAR by Podani), which is not identical to Ward's method.
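
To make the point above concrete, here is a minimal sketch with made-up 1-D data (the clusters `big` and `small`, the point `p`, and all function names are assumptions chosen for illustration, not taken from Podani's paper). It compares three criteria for deciding which cluster a singleton point should join: Ward's increase in the sum of squares, the squared centroid distance, and the variance of the merged cluster. The last one is only one literal reading of "minimum variance" and is not claimed to be Podani's MNVAR.

```python
import numpy as np

def sse(points):
    """Within-cluster sum of squared deviations from the mean (1-D data)."""
    points = np.asarray(points, dtype=float)
    return float(((points - points.mean()) ** 2).sum())

def ward_increase(a, b):
    """Increase in SSE caused by merging clusters a and b: SS12 - (SS1 + SS2)."""
    merged = np.concatenate([a, b])
    return sse(merged) - sse(a) - sse(b)

def centroid_dist2(a, b):
    """Squared distance between the centroids of a and b (centroid linkage)."""
    return float((np.mean(a) - np.mean(b)) ** 2)

def merged_variance(a, b):
    """Variance (SSE / n) of the merged cluster; an assumed reading of 'minimum variance'."""
    merged = np.concatenate([a, b])
    return sse(merged) / len(merged)

# Hypothetical data: a singleton at 0, a large cluster centred at 2.0,
# and a small cluster centred at -2.2 (slightly farther away).
p = np.array([0.0])
big = np.linspace(1.5, 2.5, 10)
small = np.array([-2.3, -2.1])
clusters = {"big": big, "small": small}

ward_choice = min(clusters, key=lambda k: ward_increase(clusters[k], p))
centroid_choice = min(clusters, key=lambda k: centroid_dist2(clusters[k], p))
variance_choice = min(clusters, key=lambda k: merged_variance(clusters[k], p))

print("Ward joins p to:", ward_choice)
print("Centroid linkage joins p to:", centroid_choice)
print("Smallest merged variance:", variance_choice)
```

With this data, Ward's criterion picks the small cluster because the cluster sizes weight the centroid distance, while centroid linkage and the merged-variance criterion both pick the big one.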

The objective function of Ward's method is to minimize, at each step, 2*[SS12-(SS1+SS2)], where SS1 and SS2 are the within-cluster errors of clusters 1 and 2 and SS12 is the within-cluster error of their combined cluster.

Why do we multiply by 2?

Is the formula for Ward's method 2*[SS12-(SS1+SS2)]
or
SS12-(SS1+SS2)?
And our aim is to minimize this, right?
Does it matter whether we multiply by 2 or not? I am trying to formulate an algorithm based on SSE. BurstPower (talk) 16:38, 2 March 2016 (UTC)
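
On the factor of 2: a positive constant multiplier cannot change which pair of clusters gives the minimum, so both forms select the same merge at every step. The small sketch below checks this on made-up 2-D clusters (the data and the names `sse`, `ward_cost`, and `scale` are assumptions for illustration). It also checks two identities: SS12-(SS1+SS2) equals n1*n2/(n1+n2) times the squared distance between the centroids, and for two singletons the doubled cost equals their squared Euclidean distance, which may be why some formulations carry the 2.

```python
from itertools import combinations
import numpy as np

def sse(points):
    """Sum of squared Euclidean deviations of the rows from their centroid."""
    points = np.asarray(points, dtype=float)
    return float(((points - points.mean(axis=0)) ** 2).sum())

def ward_cost(a, b, scale=1.0):
    """scale * [SS12 - (SS1 + SS2)] for clusters a and b given as row arrays."""
    merged = np.vstack([a, b])
    return scale * (sse(merged) - sse(a) - sse(b))

# Hypothetical clusters (rows are points).
clusters = [
    np.array([[0.0, 0.0], [0.0, 1.0]]),
    np.array([[3.0, 0.0]]),
    np.array([[4.0, 4.0], [5.0, 4.0], [4.0, 5.0]]),
]

pairs = list(combinations(range(len(clusters)), 2))

# Best pair to merge without and with the factor of 2.
best_plain = min(pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
best_doubled = min(
    pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]], scale=2.0)
)
print("best pair, unscaled:", best_plain)
print("best pair, doubled: ", best_doubled)

# Closed form for the same quantity: n1*n2/(n1+n2) * ||c1 - c2||^2.
a, b = clusters[0], clusters[1]
n1, n2 = len(a), len(b)
closed_form = n1 * n2 / (n1 + n2) * float(((a.mean(axis=0) - b.mean(axis=0)) ** 2).sum())
print("direct:", ward_cost(a, b), "closed form:", closed_form)
```

So for an SSE-based algorithm the factor only rescales the merge heights (for example in a dendrogram); it does not change the order of merges.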