Spectral clustering

In multivariate statistics and the clustering of data, spectral clustering techniques make use of the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions. The similarity matrix is provided as an input and consists of a quantitative assessment of the relative similarity of each pair of points in the dataset.

Algorithms

Given a set of data points $A$, the similarity matrix may be defined as a matrix $S$, where $S_{ij}$ represents a measure of the similarity between points $i,j\in A$.
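As a minimal sketch of such a matrix, the following builds $S$ from pairwise Euclidean distances with a Gaussian kernel. The choice of kernel, the function name `gaussian_similarity`, and the bandwidth parameter `sigma` are illustrative assumptions; the text above does not prescribe a particular similarity measure.

```python
import numpy as np

def gaussian_similarity(points, sigma=1.0):
    """Return S with S[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).

    The Gaussian kernel and `sigma` are assumptions made for illustration.
    """
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]   # pairwise differences
    sq_dist = np.sum(diff ** 2, axis=-1)             # squared Euclidean distances
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

# Three points: two close together, one far away.
A = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
S = gaussian_similarity(A)
```

The resulting matrix is symmetric with ones on the diagonal, and nearby points receive a similarity close to one while distant points receive a similarity close to zero.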

One spectral clustering technique is the normalized cuts algorithm or Shi–Malik algorithm introduced by Jianbo Shi and Jitendra Malik,^[1] commonly used for image segmentation. It partitions points into two sets $(B_{1},B_{2})$ based on the eigenvector $v$ corresponding to the second-smallest eigenvalue of the symmetric normalized Laplacian matrix

L=I-D^{-1/2}SD^{-1/2}

of $S$, where $D$ is the diagonal matrix

D_{ii}=\sum _{j}S_{ij}.

This partitioning may be done in various ways, such as by taking the median $m$ of the components of $v$, and placing all points whose component in $v$ is greater than $m$ in $B_{1}$, and the rest in $B_{2}$. The algorithm can be used for hierarchical clustering by repeatedly partitioning the subsets in this fashion.
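The bipartition step described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' reference code: the function name `shi_malik_bipartition` is hypothetical, the example similarity matrix uses a Gaussian kernel chosen for illustration, and every point is assumed to have positive total similarity so that $D^{-1/2}$ exists.

```python
import numpy as np

def shi_malik_bipartition(S):
    """Split points into (B1, B2) using the median of the eigenvector for the
    second-smallest eigenvalue of L = I - D^{-1/2} S D^{-1/2}."""
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)                        # D_ii = sum_j S_ij
    d_inv_sqrt = 1.0 / np.sqrt(d)            # assumes every degree is positive
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    v = eigvecs[:, 1]                        # second-smallest eigenvalue
    m = np.median(v)
    in_b1 = v > m                            # components above the median go to B1
    return np.flatnonzero(in_b1), np.flatnonzero(~in_b1)

# Two loose groups of points on a line (illustrative data).
x = np.array([0.0, 1.0, 2.0, 6.0, 7.0, 8.0])
S = np.exp(-(x[:, None] - x[None, :]) ** 2 / 8.0)   # Gaussian similarity, sigma = 2
b1, b2 = shi_malik_bipartition(S)
```

Because the sign of an eigenvector is arbitrary, which of the two groups ends up as $B_{1}$ may differ between runs or platforms; only the partition itself is meaningful. Applying the function again to the submatrices of $S$ indexed by `b1` and `b2` gives the hierarchical variant.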

A related algorithm is the Meilă–Shi algorithm,^[2] which takes the eigenvectors corresponding to the $k$ largest eigenvalues of the matrix $P=D^{-1}S$ for some $k$, and then invokes another algorithm (e.g. k-means clustering) to cluster points by their respective $k$ components in these eigenvectors.
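A sketch of this procedure is given below, under several assumptions that are not part of the text above: the names `meila_shi` and `kmeans` are hypothetical, the k-means step is a plain Lloyd iteration with a deterministic farthest-first initialization (any k-means implementation could be substituted), and the eigenvectors of $P$ are obtained from the symmetric matrix $D^{-1/2}SD^{-1/2}$, which has the same eigenvalues as $P$.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Plain Lloyd iteration with farthest-first initialization (illustrative)."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])     # farthest point from chosen centers
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == r].mean(axis=0) if np.any(labels == r) else centers[r]
            for r in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def meila_shi(S, k):
    """Cluster points by the eigenvectors of P = D^{-1} S for its k largest eigenvalues."""
    S = np.asarray(S, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))   # assumes every degree is positive
    # If M u = lambda u for M = D^{-1/2} S D^{-1/2}, then P (D^{-1/2} u) = lambda (D^{-1/2} u).
    M = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    eigvals, U = np.linalg.eigh(M)               # eigenvalues in ascending order
    V = d_inv_sqrt[:, None] * U[:, -k:]          # eigenvectors of P, k largest eigenvalues
    return kmeans(V, k)                          # row i of V holds the k components of point i

# Three groups of points on a line (illustrative data).
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0])
S = np.exp(-(x[:, None] - x[None, :]) ** 2 / 8.0)
labels = meila_shi(S, 3)
```

Working with the symmetric matrix lets the sketch use `numpy.linalg.eigh`, which returns real eigenvalues in ascending order; the cluster label values themselves are arbitrary.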

One approach to improving efficiency is the spectral neighborhood (SPAN) algorithm,^[3] which performs spectral clustering without explicitly computing the similarity matrix and thereby improves scalability over the standard spectral clustering algorithm.

Relationship with k-means

The kernel k-means problem is an extension of the k-means problem where the input data points are mapped non-linearly into a higher-dimensional feature space via a kernel function $k(x_{i},x_{j})=\phi ^{T}(x_{i})\phi (x_{j})$. The weighted kernel k-means problem further extends this problem by defining a weight $w_{r}$ for each cluster as the reciprocal of the number of elements in the cluster,

\max _{\{C_{r}\}}\sum _{r=1}^{k}w_{r}\sum _{x_{i},x_{j}\in C_{r}}k(x_{i},x_{j}).

Suppose $F$ is a matrix of the normalizing coefficients, with $F_{ij}=w_{r}$ if $i,j\in C_{r}$ and zero otherwise, and suppose $K$ is the kernel matrix for all points. The weighted kernel k-means problem with $n$ points and $k$ clusters is then given as

\max _{F}\operatorname {trace} \left(KF\right)

such that

F=G_{n\times k}G_{n\times k}^{T}

G^{T}G=I

and ${\text{rank}}(G)=k$. In addition, there are identity constraints on $F$ given by

F\cdot \mathbb {I} =\mathbb {I}

F^{T}\mathbb {I} =\mathbb {I}

where $\mathbb {I}$ represents a vector of ones.

Substituting $F=GG^{T}$ and using the cyclic property of the trace, this problem can be recast as

\max _{G}\operatorname {trace} \left(G^{T}KG\right).

This problem is equivalent to the spectral clustering problem when the identity constraints on $F$ are relaxed. In particular, the weighted kernel k-means problem can be reformulated as a spectral clustering (graph partitioning) problem and vice versa. The output of the spectral algorithms is a set of eigenvectors, which do not satisfy the identity requirements for indicator variables defined by $F$. Hence, post-processing of the eigenvectors is required for the equivalence between the problems.^[4] Transforming the spectral clustering problem into a weighted kernel k-means problem allows it to be solved without computing eigenvectors, which can greatly reduce the computational burden.^[5]
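The relationship between the two formulations can be checked numerically on a small example. The sketch below is illustrative only: the data, the Gaussian kernel and the partition `labels` are arbitrary assumptions. It builds $G$ with entries $1/{\sqrt {|C_{r}|}}$ for points in cluster $C_{r}$, so that $F=GG^{T}$ has entries $w_{r}$, and compares the k-means objective with the value obtained when $G$ is only required to have orthonormal columns, which is the sum of the $k$ largest eigenvalues of $K$.

```python
import numpy as np

# Illustrative data, kernel and partition (all assumptions).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [9.0, 0.0]])
sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dist / 2.0)                    # Gaussian kernel matrix
labels = np.array([0, 0, 0, 1, 1, 2])         # cluster index of each point
n, k = len(labels), 3

# G[i, r] = 1 / sqrt(|C_r|) if point i belongs to cluster r, else 0.
sizes = np.bincount(labels, minlength=k)
G = np.zeros((n, k))
G[np.arange(n), labels] = 1.0 / np.sqrt(sizes[labels])
F = G @ G.T                                   # F[i, j] = w_r when i, j share cluster r

# Objective written directly as sum_r w_r * sum_{i, j in C_r} K_ij.
objective = sum(K[np.ix_(labels == r, labels == r)].sum() / sizes[r]
                for r in range(k))

ones = np.ones(n)
print(np.allclose(G.T @ G, np.eye(k)))        # orthonormality constraint on G
print(np.allclose(F @ ones, ones))            # identity constraint on F
print(np.isclose(np.trace(K @ F), np.trace(G.T @ K @ G)))

# Relaxation: dropping the identity constraints, the maximum of
# trace(G^T K G) over orthonormal G is the sum of the k largest eigenvalues.
relaxed_opt = np.sort(np.linalg.eigvalsh(K))[-k:].sum()
```

For any partition the value of `objective` cannot exceed `relaxed_opt`, which is why the eigenvectors returned by a spectral method generally need post-processing before they can be read as cluster indicators.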

References

[1] Jianbo Shi and Jitendra Malik, "Normalized Cuts and Image Segmentation", IEEE Transactions on PAMI, Vol. 22, No. 8, Aug 2000.
[2] Marina Meilă and Jianbo Shi, "Learning Segmentation by Random Walks", Neural Information Processing Systems 13 (NIPS 2000), 2001, pp. 873–879.
[3] Liangcai Shu, Aiyou Chen, Ming Xiong, and Weiyi Meng, "Efficient Spectral Neighborhood Blocking for Entity Resolution", IEEE International Conference on Data Engineering (ICDE), pp. 1067–1078, Hannover, Germany, April 2011.
[4] I. S. Dhillon, Y. Guan, and B. Kulis, "Kernel k-means: spectral clustering and normalized cuts", Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004, pp. 551–556.
[5] Inderjit Dhillon, "Weighted Graph Cuts without Eigenvectors: A Multilevel Approach", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, No. 11, Nov 2007, pp. 1–14.
