Index of dissimilarity

The index of dissimilarity is a demographic measure of the evenness with which two groups are distributed across component geographic areas that make up a larger area. The index score can also be interpreted as the percentage of one of the two groups included in the calculation that would have to move to different geographic areas in order to produce a distribution that matches that of the larger area. The index of dissimilarity can be used as a measure of segregation.

Basic formula

The basic formula for the index of dissimilarity is:

$D={\frac {1}{2}}\sum _{i=1}^{N}\left|{\frac {a_{i}}{A}}-{\frac {b_{i}}{B}}\right|$

where (comparing a black and white population, for example):

$a_{i}$ = the population of group A in the $i$-th area (e.g. a census tract)

$A$ = the total population of group A in the larger geographic entity for which the index of dissimilarity is being calculated

$b_{i}$ = the population of group B in the $i$-th area

$B$ = the total population of group B in the larger geographic entity for which the index of dissimilarity is being calculated

$N$ = the number of areas that make up the larger geographic entity

The index of dissimilarity is applicable to any categorical variable (whether demographic or not) and because of its simple properties is useful for input into multidimensional scaling and clustering programs. It has been used extensively in the study of social mobility to compare distributions of origin (or destination) occupational categories.
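As an illustration, the basic formula can be sketched in a few lines of Python. The function name `dissimilarity_index` and the tract counts are hypothetical and chosen only for this example. The input checks are an assumption about how one might want to handle empty groups, not part of the definition.

```python
def dissimilarity_index(a, b):
    """Return D = 1/2 * sum_i |a_i/A - b_i/B| for two lists of area counts."""
    if len(a) != len(b):
        raise ValueError("both groups must be counted over the same areas")
    total_a = sum(a)  # A: total population of group A
    total_b = sum(b)  # B: total population of group B
    if total_a == 0 or total_b == 0:
        raise ValueError("each group needs at least one member")
    return 0.5 * sum(abs(a_i / total_a - b_i / total_b) for a_i, b_i in zip(a, b))


# Hypothetical counts for three census tracts (made-up numbers).
group_a = [30, 10, 10]
group_b = [10, 10, 30]
print(dissimilarity_index(group_a, group_b))  # approximately 0.4
```

In this made-up case the shares are 0.6, 0.2, 0.2 for group A and 0.2, 0.2, 0.6 for group B, so the absolute differences sum to 0.8 and the index is about 0.4 (up to floating-point rounding).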

Linear algebra perspective

The formula for the index of dissimilarity can be written more compactly by considering it from the perspective of linear algebra. Suppose we are studying the distribution of rich and poor people in a city (e.g. London). Suppose our city contains $N$ blocks:

$\{{\text{block 1}},{\text{block 2}},\ldots ,{\text{block N}}\}$

Let's create a vector $\mathbf {r}$ which shows the number of rich people in each block of our city:

$\mathbf {r} =[r_{1},r_{2},\cdots ,r_{N}]$

Similarly, let's create a vector $\mathbf {p}$ which shows the number of poor people in each block of our city:

$\mathbf {p} =[p_{1},p_{2},\cdots ,p_{N}]$

Now, the $L^{1}$ -norm of a vector is simply the sum of (the magnitude of) each entry in that vector.^[1] That is, for a vector $\mathbf {v} =[v_{1},v_{2},\cdots ,v_{N}]$ , we have the $L^{1}$ -norm:

$|\mathbf {v} |_{1}=\sum _{i=1}^{N}|v_{i}|$

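If we denote $R$ as the total number of rich people in our city, then a compact way to calculate $R$ is to use the $L^{1}$ -norm (the entries of $\mathbf {r}$ are counts, so they are non-negative and the absolute values change nothing):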

$R=|\mathbf {r} |_{1}=\sum _{i=1}^{N}|r_{i}|$

Similarly, if we denote $P$ as the total number of poor people in our city, then:

$P=|\mathbf {p} |_{1}=\sum _{i=1}^{N}|p_{i}|$

When we divide a vector $\mathbf {v}$ by its norm, we get what is called the normalized vector or unit vector ${\hat {\mathbf {v} }}$ :

${\hat {\mathbf {v} }}={\frac {\mathbf {v} }{|\mathbf {v} |_{1}}}$

Let us normalize the rich vector $\mathbf {r}$ and the poor vector $\mathbf {p}$ :

${\hat {\mathbf {r} }}={\frac {\mathbf {r} }{|\mathbf {r} |_{1}}}={\frac {\mathbf {r} }{R}}$

${\hat {\mathbf {p} }}={\frac {\mathbf {p} }{|\mathbf {p} |_{1}}}={\frac {\mathbf {p} }{P}}$

We finally return to the formula for the Index of Dissimilarity ( $D$ ); it is simply equal to one-half the $L^{1}$ -norm of the difference between the vectors ${\hat {\mathbf {r} }}$ and ${\hat {\mathbf {p} }}$ :

Index of Dissimilarity
(in Linear Algebraic notation)

$D={\frac {1}{2}}|{\hat {\mathbf {r} }}-{\hat {\mathbf {p} }}|_{1}$
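The vector form translates almost line for line into code. The following is a minimal sketch using plain Python lists. The helper names `l1_norm`, `normalize` and `dissimilarity` are assumptions made for this illustration, and the two-block counts are made up.

```python
def l1_norm(v):
    """Sum of the absolute values of the entries of v."""
    return sum(abs(x) for x in v)


def normalize(v):
    """Divide v by its L1-norm; for non-negative counts the result sums to 1."""
    norm = l1_norm(v)
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def dissimilarity(r, p):
    """D = 1/2 * L1-norm of (r_hat - p_hat)."""
    if len(r) != len(p):
        raise ValueError("r and p must have one entry per block")
    r_hat = normalize(r)
    p_hat = normalize(p)
    difference = [x - y for x, y in zip(r_hat, p_hat)]
    return 0.5 * l1_norm(difference)


# Hypothetical two-block city: r_hat = [0.75, 0.25], p_hat = [0.5, 0.5].
print(dissimilarity([3, 1], [2, 2]))  # 0.25
```

Keeping the norm and the normalization as separate helpers mirrors the steps of the derivation above; a single expression would work equally well.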

Numerical example

Consider a city consisting of four blocks of 2 people each. One block consists of 2 rich people. One block consists of 2 poor people. Two blocks consist of 1 rich and 1 poor person. What is the index of dissimilarity for this city?

Firstly, let's find the rich vector $\mathbf {r}$ and poor vector $\mathbf {p}$ :

$\mathbf {r} =[2,0,1,1]$

$\mathbf {p} =[0,2,1,1]$

Next, let's calculate the total number of rich people and poor people in our city:

$R=2+0+1+1=4$

$P=0+2+1+1=4$

Next, let's normalize the rich and poor vectors:

${\hat {\mathbf {r} }}={\frac {\mathbf {r} }{R}}={\frac {1}{4}}[2,0,1,1]=[0.5,0,0.25,0.25]$

${\hat {\mathbf {p} }}={\frac {\mathbf {p} }{P}}={\frac {1}{4}}[0,2,1,1]=[0,0.5,0.25,0.25]$

We can now calculate the difference ${\hat {\mathbf {r} }}-{\hat {\mathbf {p} }}$ :

${\hat {\mathbf {r} }}-{\hat {\mathbf {p} }}=[0.5,0,0.25,0.25]-[0,0.5,0.25,0.25]=[0.5,-0.5,0,0]$

Finally, let's find the index of dissimilarity ( $D$ ):

$D={\frac {1}{2}}|{\hat {\mathbf {r} }}-{\hat {\mathbf {p} }}|_{1}={\frac {1}{2}}(|0.5|+|-0.5|)=0.5$
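The same calculation can be reproduced step by step in Python. This sketch uses `fractions.Fraction` from the standard library so that the intermediate values are exact. That choice is made only for this illustration and is not part of the definition.

```python
from fractions import Fraction

r = [2, 0, 1, 1]  # rich people per block
p = [0, 2, 1, 1]  # poor people per block

R = sum(r)  # total number of rich people: 4
P = sum(p)  # total number of poor people: 4

r_hat = [Fraction(x, R) for x in r]  # [1/2, 0, 1/4, 1/4]
p_hat = [Fraction(x, P) for x in p]  # [0, 1/2, 1/4, 1/4]

difference = [x - y for x, y in zip(r_hat, p_hat)]  # [1/2, -1/2, 0, 0]
D = Fraction(1, 2) * sum(abs(d) for d in difference)

print(D)  # 1/2
```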

Equivalence between formulae

We can prove that the Linear Algebraic formula for $D$ is identical to the basic formula for $D$ . Let's start with the Linear Algebraic formula:

$D={\frac {1}{2}}|{\hat {\mathbf {r} }}-{\hat {\mathbf {p} }}|_{1}$

Let's replace the normalized vectors ${\hat {\mathbf {r} }}$ and ${\hat {\mathbf {p} }}$ with their definitions ${\frac {\mathbf {r} }{R}}$ and ${\frac {\mathbf {p} }{P}}$ :

$D={\frac {1}{2}}\left|{\frac {\mathbf {r} }{R}}-{\frac {\mathbf {p} }{P}}\right|_{1}$

Finally, from the definition of the $L^{1}$ -norm, we know that we can replace it with the summation:

$D={\frac {1}{2}}\sum _{i=1}^{N}\left|{\frac {r_{i}}{R}}-{\frac {p_{i}}{P}}\right|$

Thus we prove that the linear algebra formula for the index of dissimilarity is equivalent to the basic formula for it:

$D={\frac {1}{2}}|{\hat {\mathbf {r} }}-{\hat {\mathbf {p} }}|_{1}={\frac {1}{2}}\sum _{i=1}^{N}\left|{\frac {r_{i}}{R}}-{\frac {p_{i}}{P}}\right|$
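A numerical sanity check (not a proof) of this equivalence is to evaluate both forms on randomly generated counts and confirm that they agree. The function names `d_basic` and `d_vector`, the seed and the value ranges below are arbitrary choices made for this sketch. Exact fractions are used so that the comparison is not affected by floating-point rounding.

```python
import random
from fractions import Fraction


def d_basic(r, p):
    """Summation form: 1/2 * sum_i |r_i/R - p_i/P|."""
    R, P = sum(r), sum(p)
    return Fraction(1, 2) * sum(
        abs(Fraction(r_i, R) - Fraction(p_i, P)) for r_i, p_i in zip(r, p)
    )


def d_vector(r, p):
    """Vector form: 1/2 * L1-norm of (r_hat - p_hat)."""
    R = sum(abs(x) for x in r)  # L1-norm of r
    P = sum(abs(x) for x in p)  # L1-norm of p
    r_hat = [Fraction(x, R) for x in r]
    p_hat = [Fraction(x, P) for x in p]
    return Fraction(1, 2) * sum(abs(x - y) for x, y in zip(r_hat, p_hat))


rng = random.Random(0)  # fixed seed so the run is repeatable
for _ in range(100):
    n = rng.randint(1, 10)
    r = [rng.randint(0, 50) for _ in range(n)]
    p = [rng.randint(0, 50) for _ in range(n)]
    if sum(r) == 0 or sum(p) == 0:
        continue  # D is undefined when a group is empty
    assert d_basic(r, p) == d_vector(r, p)

print("both forms agree on all sampled cities")
```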

Zero segregation

When the Index of Dissimilarity is zero, this means that the community we are studying has zero segregation. For example, if we are studying the segregation of rich and poor people in a city, then if $D=0$ , it means that:

There are no blocks in the city which are "rich blocks", and there are no blocks in the city which are "poor blocks"
There is a homogeneous distribution of rich and poor people throughout the city

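If we set $D=0$ in the linear algebraic formula, we get the necessary and sufficient condition for having zero segregation, because the $L^{1}$ -norm of a vector is zero only when every entry of that vector is zero: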

$\mathbf {\hat {r}} =\mathbf {\hat {p}}$
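To illustrate the condition, consider a hypothetical city (made-up numbers) in which every block has exactly twice as many poor people as rich people. The blocks differ in size, but the normalized vectors coincide, so the index is zero. The function name `dissimilarity` is an assumption for this sketch, and exact fractions are used so that the equality test is exact.

```python
from fractions import Fraction


def dissimilarity(r, p):
    """D = 1/2 * sum_i |r_i/R - p_i/P|, computed with exact fractions."""
    R, P = sum(r), sum(p)
    return Fraction(1, 2) * sum(
        abs(Fraction(r_i, R) - Fraction(p_i, P)) for r_i, p_i in zip(r, p)
    )


rich = [1, 2, 3]  # rich people per block
poor = [2, 4, 6]  # poor people per block: twice the rich count in every block

r_hat = [Fraction(x, sum(rich)) for x in rich]  # [1/6, 1/3, 1/2]
p_hat = [Fraction(x, sum(poor)) for x in poor]  # [1/6, 1/3, 1/2]

print(r_hat == p_hat)             # True
print(dissimilarity(rich, poor))  # 0
```

As the example suggests, zero segregation does not require equal numbers of rich and poor people in each block, only that each block holds the same share of the city's rich population as of its poor population.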