t-distributed stochastic neighbor embedding

From Wikipedia, the free encyclopedia
Jump to: navigation, search

t-distributed stochastic neighbor embedding (t-SNE) is a machine learning algorithm for dimensionality reduction developed by Laurens van der Maaten and Geoffrey Hinton.[1] It is a nonlinear dimensionality reduction technique that is particularly well suited for embedding high-dimensional data into a space of two or three dimensions, which can then be visualized in a scatter plot. Specifically, it models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points.

The t-SNE algorithms comprises two main stages. First, t-SNE constructs a probability distribution over pairs of high-dimensional objects in such a way that similar objects have a high probability of being picked, whilst dissimilar points have an infinitesimal probability of being picked. Second, t-SNE defines a similar probability distribution over the points in the low-dimensional map, and it minimizes the Kullback–Leibler divergence between the two distributions with respect to the locations of the points in the map.

t-SNE has been used in a wide range of applications, including computer security research,[2] music analysis,[3] cancer research,[4] and bio-informatics.[5]


Given a set of N high-dimensional objects \mathbf{x}_1, \dots, \mathbf{x}_N, t-SNE first computes probabilities p_{ij} that are proportional to the similarity of objects \mathbf{x}_i and \mathbf{x}_j, as follows:

p_{j|i} = \frac{\exp(-\lVert\mathbf{x}_i - \mathbf{x}_j\rVert^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\lVert\mathbf{x}_i - \mathbf{x}_k\rVert^2 / 2\sigma_i^2)},

p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}

The bandwidth of the Gaussian kernels \sigma_i, is set in such a way that the perplexity of the conditional distribution equals a predefined perplexity using a binary search. As a result, the bandwidth is adapted to the density of the data: smaller values of \sigma_i are used in denser parts of the data space.

t-SNE aims to learn a d-dimensional map \mathbf{y}_1, \dots, \mathbf{y}_N (with \mathbf{y}_i \in \mathbb{R}^d) that reflects the similarities p_{ij} as well as possible. To this end, it measures similarities q_{ij} between two points in the map \mathbf{y}_i and \mathbf{y}_j, using a very similar approach. Specifically, q_{ij} is defined as:

q_{ij} = \frac{(1 + \lVert \mathbf{y}_i - \mathbf{y}_j\rVert^2)^{-1}}{\sum_{k \neq l} (1 + \lVert \mathbf{y}_k - \mathbf{y}_l\rVert^2)^{-1}}

Herein a heavy-tailed Student-t distribution is used to measure similarities between low-dimensional points in order to allow dissimilar objects to be modeled far apart in the map CITATION.

The locations of the points \mathbf{y}_i in the map are determined by minimizing the (non-symmetric) Kullback–Leibler divergence of the distribution Q from the distribution P, that is:

KL(P||Q) = \sum_{i \neq j} p_{ij} \, \log \frac{p_{ij}}{q_{ij}}

The minimization of the Kullback–Leibler divergence with respect to the points \mathbf{y}_i is performed using gradient descent. The result of this optimization is a map that reflects the similarities between the high-dimensional inputs well.


  1. ^ van der Maaten, L.J.P.; Hinton, G.E. (Nov 2008). "Visualizing High-Dimensional Data Using t-SNE". Journal of Machine Learning Research 9: 2579–2605. 
  2. ^ Gashi, I.; Stankovic, V.; Leita, C.; Thonnard, O. (2009). "An Experimental Study of Diversity with Off-the-shelf AntiVirus Engines". Proceedings of the IEEE International Symposium on Network Computing and Applications: 4–11. 
  3. ^ Hamel, P.; Eck, D. (2010). "Learning Features from Music Audio with Deep Belief Networks". Proceedings of the International Society for Music Information Retrieval Conference: 339–344. 
  4. ^ Jamieson, A.R.; Giger, M.L.; Drukker, K.; Lui, H.; Yuan, Y.; Bhooshan, N. (2010). "Exploring Nonlinear Feature Space Dimension Reduction and Data Representation in Breast CADx with Laplacian Eigenmaps and t-SNE". Medical Physics 37(1) 37: 339–351. doi:10.1118/1.3267037. 
  5. ^ Wallach, I.; Liliean, R. (2009). "The Protein-Small-Molecule Database, A Non-Redundant Structural Resource for the Analysis of Protein-Ligand Binding". Bioinformatics 25(5) 25 (5): 615–620. doi:10.1093/bioinformatics/btp035.