Unsupervised learning

In machine learning, unsupervised learning is the problem of finding hidden structure in unlabeled data. Since the examples given to the learner are unlabeled, there is no error or reward signal with which to evaluate a potential solution. This distinguishes unsupervised learning from supervised learning and reinforcement learning.

Unsupervised learning is closely related to the problem of density estimation in statistics.^[1] However, unsupervised learning also encompasses many other techniques that seek to summarize and explain key features of the data. Many methods employed in unsupervised learning are based on data mining methods used to preprocess data.

Approaches to unsupervised learning include:

clustering (e.g., k-means, mixture models, hierarchical clustering),^[2]
hidden Markov models,
blind signal separation using feature extraction techniques for dimensionality reduction (e.g., principal component analysis, independent component analysis, non-negative matrix factorization, singular value decomposition).^[3]
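
As an illustration of the clustering approach listed above, the following is a minimal sketch of k-means (Lloyd's algorithm) using only the Python standard library. The function name `kmeans`, its parameters, and the toy data are assumptions made for this example; practical implementations typically add better initialisation and convergence checks.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Cluster `points` (a list of numeric tuples) into k groups.

    Returns (centroids, assignments), where assignments[i] is the index
    of the centroid closest to points[i]. No labels are used at any step.
    """
    rng = random.Random(seed)
    # Assumption: initialise centroids from k randomly chosen data points.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # assignments stopped changing, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments


# Toy data (hypothetical): two well-separated groups in the plane.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
```

The cluster indices themselves are arbitrary; only the grouping of points carries meaning.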

Among neural network models, the self-organizing map (SOM) and adaptive resonance theory (ART) are commonly used unsupervised learning algorithms. The SOM is a topographic organization in which nearby locations in the map represent inputs with similar properties. The ART model allows the number of clusters to vary with problem size and lets the user control the degree of similarity between members of the same clusters by means of a user-defined constant called the vigilance parameter. ART networks are also used for many pattern recognition tasks, such as automatic target recognition and seismic signal processing. The first version of ART was "ART1", developed by Carpenter and Grossberg (1988).^[4]
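
The effect of the vigilance parameter can be sketched with a simplified, fast-learning ART1-style procedure for binary inputs. This is an illustrative reduction, not the full network dynamics described by Carpenter and Grossberg; the function name `art1_cluster`, the choice parameter `beta`, and the toy patterns are assumptions made for this example.

```python
def art1_cluster(patterns, vigilance, beta=1.0):
    """Group binary patterns (lists of 0/1) with a simplified ART1-style rule.

    Returns (labels, prototypes). The number of categories is not fixed
    in advance: a new category is created whenever no existing prototype
    passes the vigilance test for the current input.
    """
    prototypes = []  # one binary prototype per category
    labels = []
    for pattern in patterns:
        norm = sum(pattern)

        def overlap(w):
            # Number of positions where both input and prototype are 1.
            return sum(i & j for i, j in zip(pattern, w))

        # Rank categories by the choice function |I AND w| / (beta + |w|).
        order = sorted(
            range(len(prototypes)),
            key=lambda c: overlap(prototypes[c]) / (beta + sum(prototypes[c])),
            reverse=True,
        )
        chosen = None
        for c in order:
            # Vigilance test: fraction of the input matched by the prototype.
            if norm > 0 and overlap(prototypes[c]) / norm >= vigilance:
                # Fast learning: the prototype shrinks to the shared features.
                prototypes[c] = [i & j for i, j in zip(pattern, prototypes[c])]
                chosen = c
                break
        if chosen is None:
            # No category is similar enough: commit a new one.
            prototypes.append(list(pattern))
            chosen = len(prototypes) - 1
        labels.append(chosen)
    return labels, prototypes


# Toy binary patterns (hypothetical).
patterns = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 0],
]
labels_low, protos_low = art1_cluster(patterns, vigilance=0.4)
labels_high, protos_high = art1_cluster(patterns, vigilance=0.9)
print(len(protos_low), len(protos_high))
```

With a higher vigilance value, members of a category must match its prototype more closely, so the same data are split into more categories.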

Applications in genomics

Unsupervised learning techniques are widely used to reduce the dimensionality of high-dimensional genomic data sets that may involve hundreds of thousands of variables. For example, weighted correlation network analysis is often used for identifying clusters (referred to as modules), modeling the relationship between clusters, calculating fuzzy measures of cluster (module) membership, identifying intramodular hubs, and studying cluster preservation in other data sets.
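
The core idea of a weighted correlation network can be sketched as follows, assuming the common unsigned soft-threshold form in which the connection strength between two variables is the absolute Pearson correlation raised to a power. The gene names, expression values, and the power of 6 are hypothetical choices for illustration. Module detection, module membership, and preservation statistics are omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_threshold_adjacency(profiles, power=6):
    """Weighted adjacency a_ij = |cor(x_i, x_j)| ** power for i != j.

    `profiles` maps a variable name to its values across samples.
    """
    names = list(profiles)
    return {
        a: {b: abs(pearson(profiles[a], profiles[b])) ** power
            for b in names if b != a}
        for a in names
    }


def connectivity(adjacency):
    """Sum of connection strengths per variable; hubs have high values."""
    return {name: sum(nbrs.values()) for name, nbrs in adjacency.items()}


# Toy expression profiles across five samples (hypothetical values).
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "g3": [1.1, 1.9, 3.2, 3.9, 5.1],
    "g4": [5.0, 1.0, 4.0, 2.0, 3.0],
}
adjacency = soft_threshold_adjacency(profiles)
conn = connectivity(adjacency)
print(sorted(conn, key=conn.get, reverse=True))
```

Raising the correlation to a power suppresses weak correlations while keeping strong ones close to 1, so variables that co-vary strongly with many others stand out as candidate hubs.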

Bibliography

[1] Jordan, Michael I.; Bishop, Christopher M. (2004). "Neural Networks". In Allen B. Tucker (ed.). Computer Science Handbook, Second Edition (Section VII: Intelligent Systems). Boca Raton, FL: Chapman & Hall/CRC Press LLC. ISBN 1-58488-360-X.
[2] Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer. pp. 485–586. ISBN 978-0-387-84857-0.
[3] Acharyya, Ranjan (2008). A New Approach for Blind Source Separation of Convolutive Sources. ISBN 978-3-639-07797-1. (This book focuses on unsupervised learning with blind source separation.)
[4] Carpenter, G. A.; Grossberg, S. (1988). "The ART of adaptive pattern recognition by a self-organizing neural network". Computer. 21: 77–88. doi:10.1109/2.33.

Hinton, Geoffrey; Sejnowski, Terrence J., eds. (1999). Unsupervised Learning: Foundations of Neural Computation. MIT Press. ISBN 0-262-58168-X. (This book focuses on unsupervised learning in neural networks.)
Duda, Richard O.; Hart, Peter E.; Stork, David G. (2001). "Unsupervised Learning and Clustering". Chapter 10 in Pattern Classification (2nd ed.), p. 571. New York: Wiley. ISBN 0-471-05669-3.
Ghahramani, Zoubin (September 16, 2004). "Unsupervised Learning".
