Talk:Ward's method

Not minimum variance[edit]

The incorrect label for Ward as "minimum variance" method is widely spread, unfortunately. But in fact this method is "minimum increase of sum-of-squares (of errors)", which is not minimum variance. If it were minimum variance, the decision to link a cluster with a singleton point would always be the same as the decision made by centroid linkage method in that case (because assigning a point to the closest centroid guarantees the new cluster with minimal variance), but that is not the case and the two methods - Ward and centroid - can make different decisions in the above situation. See Podany, J. New combinatorial clustering methods // Vegetatio, vol 81, 1989, where on page 67 he's up in arms against incorrect labels for Ward's method.

188.123.252.14 (talk) 13:48, 1 February 2014 (UTC) There indeed exists true minimum variance method (called MNVAR by Podani) which is not identical to Ward's method.[reply]

The objective function of Ward's method is to minimize, at each step, 2*[SS12-(SS1+SS2)], where SS1 and SS2 are the within-cluster errors of clusters 1 and 2 and SS12 is the within-cluster error of their combined cluster.

Why do we multiply by 2?[edit]

The formula of Wards Method is : 2*[SS12-(SS1+SS2)]
or ?
SS12-(SS1+SS2)
And our aim is minimizing this right?
Does it matter we multiply with 2 or not? I am trying to formulate algo based on SSE BurstPower (talk) 16:38, 2 March 2016 (UTC)[reply]

→ Reply to BurstPower

Lance-Williams formula for Ward's method, on a step where clusters 1 and 2 have merged into cluster t, and now the distances between t and every other cluster r are being updated, is:

Dtr = [(Nr+N1)*Dr1+(Nr+N2)*Dr2-Nr*D12] / (Nr+Nt),

where some "Dij" is the distance between clusters i and j, and Ni is the number of points in cluster i.

Now, if the input distances are squared euclidean ones then this formula corresponds to the objective function Dtr = 2*[SS12-(SS1+SS2)].

Clearly, the 2 multiplier can be avoided on the steps (in order to speed up the process). Just divide the input squared distances by 2 before starting the agglomeration. Then the above Lance-Williams formula will correspond to objective function in the form Dtr = SS12-(SS1+SS2). So, you can go either way (with the same result), but the second one is programmically better: no need to multiply by 2 on every step.188.123.231.80 (talk) 19:09, 15 March 2016 (UTC) [reply]