# Stress majorization

Stress majorization is an optimization strategy used in multidimensional scaling (MDS) where, for a set of n m-dimensional data items, a configuration X of n points in r(<<m)-dimensional space is sought that minimizes the so-called stress function ${\displaystyle \sigma (X)}$. Usually r is 2 or 3, i.e. the (n x r) matrix X lists points in 2- or 3-dimensional Euclidean space so that the result may be visualised (i.e. an MDS plot). The function ${\displaystyle \sigma }$ is a cost or loss function that measures the squared differences between ideal (${\displaystyle m}$-dimensional) distances and actual distances in r-dimensional space. It is defined as:

${\displaystyle \sigma (X)=\sum _{i

where ${\displaystyle w_{ij}\geq 0}$ is a weight for the measurement between a pair of points ${\displaystyle (i,j)}$, ${\displaystyle d_{ij}(X)}$ is the euclidean distance between ${\displaystyle i}$ and ${\displaystyle j}$ and ${\displaystyle \delta _{ij}}$ is the ideal distance between the points (their separation) in the ${\displaystyle m}$-dimensional data space. Note that ${\displaystyle w_{ij}}$ can be used to specify a degree of confidence in the similarity between points (e.g. 0 can be specified if there is no information for a particular pair).

A configuration ${\displaystyle X}$ which minimizes ${\displaystyle \sigma (X)}$ gives a plot in which points that are close together correspond to points that are also close together in the original ${\displaystyle m}$-dimensional data space.

There are many ways that ${\displaystyle \sigma (X)}$ could be minimized. For example, Kruskal[1] recommended an iterative steepest descent approach. However, a significantly better (in terms of guarantees on, and rate of, convergence) method for minimizing stress was introduced by Jan de Leeuw.[2] De Leeuw's iterative majorization method at each step minimizes a simple convex function which both bounds ${\displaystyle \sigma }$ from above and touches the surface of ${\displaystyle \sigma }$ at a point ${\displaystyle Z}$, called the supporting point. In convex analysis such a function is called a majorizing function. This iterative majorization process is also referred to as the SMACOF algorithm ("Scaling by MAjorizing a COmplicated Function").

## The SMACOF algorithm

The stress function ${\displaystyle \sigma }$ can be expanded as follows:

${\displaystyle \sigma (X)=\sum _{i

Note that the first term is a constant ${\displaystyle C}$ and the second term is quadratic in X (i.e. for the Hessian matrix V the second term is equivalent to tr${\displaystyle X'VX}$) and therefore relatively easily solved. The third term is bounded by:

${\displaystyle \sum _{i

where ${\displaystyle B(Z)}$ has:

${\displaystyle b_{ij}=-{\frac {w_{ij}\delta _{ij}}{d_{ij}(Z)}}}$ for ${\displaystyle d_{ij}(Z)\neq 0,i\neq j}$

and ${\displaystyle b_{ij}=0}$ for ${\displaystyle d_{ij}(Z)=0,i\neq j}$

and ${\displaystyle b_{ii}=-\sum _{j=1,j\neq i}^{n}b_{ij}}$.

Proof of this inequality is by the Cauchy-Schwarz inequality, see Borg[3] (pp. 152–153).

Thus, we have a simple quadratic function ${\displaystyle \tau (X,Z)}$ that majorizes stress:

${\displaystyle \sigma (X)=C+\,\operatorname {tr} \,X'VX-2\,\operatorname {tr} \,X'B(X)X}$
${\displaystyle \leq C+\,\operatorname {tr} \,X'VX-2\,\operatorname {tr} \,X'B(Z)Z=\tau (X,Z)}$

The iterative minimization procedure is then:

• at the kth step we set ${\displaystyle Z\leftarrow X^{k-1}}$
• ${\displaystyle X^{k}\leftarrow \min _{X}\tau (X,Z)}$
• stop if ${\displaystyle \sigma (X^{k-1})-\sigma (X^{k})<\epsilon }$ otherwise repeat.

This algorithm has been shown to decrease stress monotonically (see de Leeuw[2]).

## Use in graph drawing

Stress majorization and algorithms similar to SMACOF also have application in the field of graph drawing.[4][5] That is, one can find a reasonably aesthetically appealing layout for a network or graph by minimizing a stress function over the positions of the nodes in the graph. In this case, the ${\displaystyle \delta _{ij}}$ are usually set to the graph-theoretic distances between nodes i and j and the weights ${\displaystyle w_{ij}}$ are taken to be ${\displaystyle \delta _{ij}^{-\alpha }}$. Here, ${\displaystyle \alpha }$ is chosen as a trade-off between preserving long- or short-range ideal distances. Good results have been shown for ${\displaystyle \alpha =2}$.[6]

## References

1. ^ Kruskal, J. B. (1964), "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis", Psychometrika, 29 (1): 1–27, doi:10.1007/BF02289565.
2. ^ a b de Leeuw, J. (1977), "Applications of convex analysis to multidimensional scaling", in Barra, J. R.; Brodeau, F.; Romie, G.; et al., Recent developments in statistics, pp. 133–145.
3. ^ Borg, I.; Groenen, P. (1997), Modern Multidimensional Scaling: theory and applications, New York: Springer-Verlag.
4. ^ Michailidis, G.; de Leeuw, J. (2001), "Data visualization through graph drawing", Computation Stat., 16 (3): 435–450, doi:10.1007/s001800100077.
5. ^ Gansner, E.; Koren, Y.; North, S. (2004), "Graph Drawing by Stress Majorization", Proceedings of 12th Int. Symp. Graph Drawing (GD'04), Lecture Notes in Computer Science, 3383, Springer-Verlag, pp. 239–250.
6. ^ Cohen, J. (1997), "Drawing graphs to convey proximity: an incremental arrangement method", ACM Transactions on Computer-Human Interaction, 4 (3): 197–229, doi:10.1145/264645.264657.