Chinese whispers (clustering method)
Chinese whispers is a clustering method used in network science named after the famous whispering game.[1] Clustering methods are basically used to identify communities of nodes or links in a given network. This algorithm was designed by Chris Biemann and Sven Teresniak in 2005.[1] The name comes from the fact that the process can be modeled as a separation of communities where the nodes send the same type of information to each other.[1]
Chinese whispers is a hard partitioning, randomized, flat clustering (no hierarchical relations between clusters) method.[1] The random property means that running the process on the same network several times can lead to different results, while because of hard partitioning one node can belong to only one cluster at a given moment. The original algorithm is applicable to undirected, weighted and unweighted graphs. Chinese whispers runs in linear time, which means that it is extremely fast even if there are very many nodes and links in the network.[1]
Algorithm
[edit]The algorithm works in the following way in an undirected unweighted graph:[1]
- All nodes are assigned to a distinct class, so that the number of initial classes equals the number of nodes.
- All of the network nodes are selected one by one in a random order. For each selected node, its cluster label is changed to the cluster with which it has the most connections. If there is a tie, one is picked randomly from the tied clusters.
- Step two repeats itself until a predetermined number of iterations or until the process converges. In the end the emerging classes represent the clusters of the network.
The predetermined threshold for the number of the iterations is needed because it is possible that process does not converge. On the other hand in a network with approximately 10000 nodes the clusters does not change significantly after 40-50 iterations even if there is no convergence.[1]
Strengths and weaknesses
[edit]The main strength of Chinese whispers lies in its time linear property. Because the processing time increases linearly with the number of nodes, the algorithm is capable of identifying communities in a network very fast. For this reason Chinese whispers is a good tool to analyze community structures in graph with a very high number of nodes. The effectiveness of the method increases further if the network has the small world property.[1]
On the other hand because the algorithm is not deterministic in the case of small node number the resulting clusters often significantly differ from each other. The reason for this is that in the case of a small network it matters more from which node the iteration process starts while in large networks the relevance of starting points disappears.[1] For this reason for small graphs other clustering methods are recommended.
Applications
[edit]Chinese whispers is used in many subfield of network science. Most frequently it is mentioned in the context of natural language processing problems.[2][3] On the other hand the algorithm is applicable to any kind of community identification problem which is related to a network framework. Chinese whispers is available for personal use as an extension package for Gephi[4] which is an open source program designed for network analysis.
References
[edit]- ^ a b c d e f g h i Chris Biemann,"Chinese Whispers- an Efficient Graph Clustering Algorithm and its Applications to Natural Language Processing Problems", 2006
- ^ Antonio Di Marco - Roberto Navigili,"Clustering and Diversifying Web Search Results with Graph Based Word Sense Induction", 2013
- ^ Ioannis Korkontzelos - Suresh Manandhar,"Detecting Compositionality in Multi-Word Expressions", 2009
- ^ ""Gephi Marketplace"". Archived from the original on 2015-09-23. Retrieved 2015-06-02.