Distance matrix

From Wikipedia, the free encyclopedia
Jump to: navigation, search

In mathematics, computer science and graph theory, a distance matrix is a matrix (two-dimensional array) containing the distances, taken pairwise, of a set of points. This matrix will have a size of N×N where N is the number of points, nodes or vertices (often in a graph).

Comparison with related matrices[edit]

Comparison with Adjacency matrix[edit]

Distance matrices are related to adjacency matrices, with the differences that (a) the latter only provides the information which vertices are connected but does not tell about costs or distances between the vertices and (b) an entry of a distance matrix is smaller if two elements are closer, while "close" (connected) vertices yield larger entries in an adjacency matrix.

Comparison with Euclidean distance matrix[edit]

Unlike a Euclidean distance matrix, the matrix does not need to be symmetric—that is, the values xi,j do not necessarily equal xj,i. Similarly, the matrix values are not restricted to non-negative reals (as they would be in the Euclidean distance matrix) but rather can have negative values, zeros or imaginary numbers depending on the cost metric and specific use. Although it is often the case, distance matrices are not restricted to being hollow—that is, they can have non-zero entries on the main diagonal.

Examples and uses[edit]

For example, suppose these data are to be analyzed, where pixel euclidean distance is the distance metric.

Raw data

The distance matrix would be:

a b c d e f
a 0 184 222 177 216 231
b 184 0 45 123 128 200
c 222 45 0 129 121 203
d 177 123 129 0 46 83
e 216 128 121 46 0 83
f 231 200 203 83 83 0

These data can then be viewed in graphic form as a heat map. In this image, black denotes a distance of 0 and white is maximal distance.

Graphical View

In bioinformatics, distance matrices are used to represent protein structures in a coordinate-independent manner, as well as the pairwise distances between two sequences in sequence space. They are used in structural and sequential alignment, and for the determination of protein structures from NMR or X-ray crystallography.

Sometimes it is more convenient to express data as a similarity matrix.

See also[edit]