|WikiProject Statistics||(Rated Start-class, Low-importance)|
Not minimum variance
The incorrect label for Ward as "minimum variance" method is widely spread, unfortunately. But in fact this method is "minimum increase of sum-of-squares (of errors)", which is not minimum variance. If it were minimum variance, the decision to link a cluster with a singleton point would always be the same as the decision made by centroid linkage method in that case (because assigning a point to the closest centroid guarantees the new cluster with minimal variance), but that is not the case and the two methods - Ward and centroid - can make different decisions in the above situation. See Podany, J. New combinatorial clustering methods // Vegetatio, vol 81, 1989, where on page 67 he's up in arms against incorrect labels for Ward's method.
The objective function of Ward's method is to minimize, at each step, 2*[SS12-(SS1+SS2)], where SS1 and SS2 are the within-cluster errors of clusters 1 and 2 and SS12 is the within-cluster error of their combined cluster.
Why do we multiply by 2?
The formula of Wards Method is : 2*[SS12-(SS1+SS2)]
And our aim is minimizing this right?
Does it matter we multiply with 2 or not? I am trying to formulate algo based on SSE BurstPower (talk) 16:38, 2 March 2016 (UTC)
→ Reply to BurstPower
Lance-Williams formula for Ward's method, on a step where clusters 1 and 2 have merged into cluster t, and now the distances between t and every other cluster r are being updated, is:
Dtr = [(Nr+N1)*Dr1+(Nr+N2)*Dr2-Nr*D12] / (Nr+Nt),
where some "Dij" is the distance between clusters i and j, and Ni is the number of points in cluster i.
Now, if the input distances are squared euclidean ones then this formula corresponds to the objective function Dtr = 2*[SS12-(SS1+SS2)].
Clearly, the 2 multiplier can be avoided on the steps (in order to speed up the process). Just divide the input squared distances by 2 before starting the agglomeration. Then the above Lance-Williams formula will correspond to objective function in the form Dtr = SS12-(SS1+SS2). So, you can go either way (with the same result), but the second one is programmically better: no need to multiply by 2 on every step.184.108.40.206 (talk) 19:09, 15 March 2016 (UTC)