Balanced clustering: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
ce
Add: s2cid, journal, year, title, issue, volume, isbn, doi, pages, authors 1-1. Upgrade ISBN10 to 13. | Use this tool. Report bugs. | #UCB_Gadget
Line 1: Line 1:
'''Balanced clustering''' is a special case of [[Cluster analysis|clustering]] where, in the strictest sense, cluster sizes are constrained to <math>\lfloor {n\over k}\rfloor</math> or <math>\lceil{n \over k}\rceil</math>, where <math>n</math> is the number of points and <math>k</math> is the number of clusters.<ref>{{Cite journal|last=M. I. Malinen and P. Fränti|date=August 2014|title=Balanced k-Means for Clustering|journal=Joint Int. Workshop on Structural, Syntactic, and Statistical Pattern Recognition (S+SSPR 2014) |series=Lecture Notes in Computer Science |volume=8621}}</ref> A typical algorithm is balanced [[K-means clustering|k-means]], which minimizes [[Mean squared error|mean square error (MSE)]]. Another type of balanced clustering called balance-driven clustering has a two-objective cost function that minimizes both the imbalance and the MSE. Typical cost functions are ratio cut<ref>{{Cite journal|last=L. Hagen and A. B. Kahng|date=1992|title=New spectral methods for ratio cut partitioning and clustering|journal=IEEE Transactions on Computer-Aided Design}}</ref> and Ncut.<ref>{{Cite journal|author=J. Shi and J. Malik|date=2000|title=Normalized cuts and image segmentation|url=https://repository.upenn.edu/cis_papers/107|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|volume=22|issue=8|pages=888–905|doi=10.1109/34.868688}}</ref> Balanced clustering can be used for example in scenarios where freight has to be delivered to <math>n</math> locations with <math>k</math> cars. It is then preferred that each car delivers to an equal number of locations.
'''Balanced clustering''' is a special case of [[Cluster analysis|clustering]] where, in the strictest sense, cluster sizes are constrained to <math>\lfloor {n\over k}\rfloor</math> or <math>\lceil{n \over k}\rceil</math>, where <math>n</math> is the number of points and <math>k</math> is the number of clusters.<ref>{{Cite journal|last=M. I. Malinen and P. Fränti|date=August 2014|title=Balanced k-Means for Clustering|journal=Joint Int. Workshop on Structural, Syntactic, and Statistical Pattern Recognition (S+SSPR 2014) |series=Lecture Notes in Computer Science |volume=8621|pages=32–41 |doi=10.1007/978-3-662-44415-3_4 |isbn=978-3-662-44414-6 }}</ref> A typical algorithm is balanced [[K-means clustering|k-means]], which minimizes [[Mean squared error|mean square error (MSE)]]. Another type of balanced clustering called balance-driven clustering has a two-objective cost function that minimizes both the imbalance and the MSE. Typical cost functions are ratio cut<ref>{{Cite journal|last=L. Hagen and A. B. Kahng|date=1992|title=New spectral methods for ratio cut partitioning and clustering|journal=IEEE Transactions on Computer-Aided Design|volume=11 |issue=9 |pages=1074–1085 |doi=10.1109/43.159993 }}</ref> and Ncut.<ref>{{Cite journal|author=J. Shi and J. Malik|date=2000|title=Normalized cuts and image segmentation|url=https://repository.upenn.edu/cis_papers/107|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|volume=22|issue=8|pages=888–905|doi=10.1109/34.868688}}</ref> Balanced clustering can be used for example in scenarios where freight has to be delivered to <math>n</math> locations with <math>k</math> cars. It is then preferred that each car delivers to an equal number of locations.


== Software ==
== Software ==
Line 7: Line 7:
== References ==
== References ==
<references />
<references />
M.S. Levin (2017). "On Balanced Clustering (Indices, Models, Examples)". J. of Communications Technology and Electronics, 62(12): 1506–1515. {{doi|10.1134/S1064226917120105}}
{{cite journal |doi=10.1134/S1064226917120105 |title=On Balanced Clustering (Indices, Models, Examples) |year=2017 |last1=Levin |first1=M. Sh. |journal=Journal of Communications Technology and Electronics |volume=62 |issue=12 |pages=1506–1515 |s2cid=255277095 }}
[[Category:Clustering criteria]]
[[Category:Clustering criteria]]

Revision as of 09:41, 14 July 2023

Balanced clustering is a special case of clustering where, in the strictest sense, cluster sizes are constrained to or , where is the number of points and is the number of clusters.[1] A typical algorithm is balanced k-means, which minimizes mean square error (MSE). Another type of balanced clustering called balance-driven clustering has a two-objective cost function that minimizes both the imbalance and the MSE. Typical cost functions are ratio cut[2] and Ncut.[3] Balanced clustering can be used for example in scenarios where freight has to be delivered to locations with cars. It is then preferred that each car delivers to an equal number of locations.

Software

There exists implementations for balanced k-means[4] and Ncut[5]

References

  1. ^ M. I. Malinen and P. Fränti (August 2014). "Balanced k-Means for Clustering". Joint Int. Workshop on Structural, Syntactic, and Statistical Pattern Recognition (S+SSPR 2014). Lecture Notes in Computer Science. 8621: 32–41. doi:10.1007/978-3-662-44415-3_4. ISBN 978-3-662-44414-6.
  2. ^ L. Hagen and A. B. Kahng (1992). "New spectral methods for ratio cut partitioning and clustering". IEEE Transactions on Computer-Aided Design. 11 (9): 1074–1085. doi:10.1109/43.159993.
  3. ^ J. Shi and J. Malik (2000). "Normalized cuts and image segmentation". IEEE Transactions on Pattern Analysis and Machine Intelligence. 22 (8): 888–905. doi:10.1109/34.868688.
  4. ^ M. I. Malinen and P. Fränti. "Balanced k-Means implementation". University of Eastern Finland.
  5. ^ T. Cour, S. Yu and J. Shi. "Ncut implementation". University of Pennsylvania.

Levin, M. Sh. (2017). "On Balanced Clustering (Indices, Models, Examples)". Journal of Communications Technology and Electronics. 62 (12): 1506–1515. doi:10.1134/S1064226917120105. S2CID 255277095.