# Genetic distance

Genetic distance is a measure of the genetic divergence between species or between populations within a species. A smaller genetic distance indicates that the populations have more similar genes, which in turn indicates that they are closely related, i.e. that they share a recent common ancestor or that recent interbreeding has taken place. Genetic distance is useful in reconstructing the history of populations. For example, evidence from genetic distance suggests that humans arrived in America about 30,000 years ago.[1] Genetic distance is also used in conservation, where the genetic distance between breeds of domesticated animals is measured in order to determine which breeds must be protected to maintain biodiversity.[2]

## Biological foundation

Each gene in an organism's genome exists at a specific location, called the locus for that gene. Viable variations (called alleles) at these loci cause variety within species (e.g. hair colour, eye colour). Most alleles do not have an observable impact on the organism. Within a population, new alleles caused by mutation quickly die out or spread throughout the population. However, when a population is split into two smaller populations (by either geography or speciation), mutations occurring after the split will be present in only one of the two groups. Random fluctuations in the prevalence of other alleles (due to the essentially random process of reproduction) also produce differences between isolated populations. This process is known as genetic drift. By examining the differences in allele frequencies between the populations, genetic distance can be used to estimate how long ago the two populations were together.
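The way drift separates isolated populations can be illustrated with a small simulation. The following is only a sketch under simplifying assumptions that are not part of the text above: a diploid population of constant size, a single allele tracked at one locus, and no mutation or selection. The function name `simulate_drift` and all parameter values are illustrative.

```python
import random

def simulate_drift(freq, pop_size, generations, rng):
    """Follow one allele's frequency under pure drift (no mutation or selection).

    Assumes a diploid population of constant size, so each generation
    consists of 2 * pop_size gene copies drawn at random from the last one.
    """
    n_copies = 2 * pop_size
    history = [freq]
    for _generation in range(generations):
        # Each copy in the next generation carries the allele with
        # probability equal to its current frequency.
        carriers = sum(1 for _copy in range(n_copies) if rng.random() < freq)
        freq = carriers / n_copies
        history.append(freq)
    return history

rng = random.Random(1)  # fixed seed so the run is repeatable
# Two populations split from one ancestral population with frequency 0.5.
pop_a = simulate_drift(0.5, pop_size=50, generations=100, rng=rng)
pop_b = simulate_drift(0.5, pop_size=50, generations=100, rng=rng)
print(pop_a[-1], pop_b[-1])
```

Because the two runs draw independent random samples, their final frequencies generally differ even though both start from the same value, which is the kind of divergence that genetic distance measures try to quantify.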

## Measures of genetic distance

Genetic distance is computed from the frequency with which particular alleles are found in the populations or species. Two populations with the same frequencies for all alleles are considered genetically identical. There is less consensus on how to measure the difference between populations whose frequencies differ, and a large number of different distance metrics are used. The principal difficulty is how best to combine the information detected for large numbers of alleles.[3] As a result, there are several measures used to indicate genetic distance.[4] The most commonly used are Nei's genetic distance, the Cavalli-Sforza and Edwards measure, and Reynolds, Weir, and Cockerham's genetic distance,[5] listed below.

In all the formulae in this section, we suppose that $X$ and $Y$ are two populations for which $L$ loci have been sampled, and we let $X_{u}$ and $Y_{u}$ represent the frequency of the $u$th allele at the $l$th locus in populations $X$ and $Y$ respectively.

### Nei's standard genetic distance

In 1972, Masatoshi Nei published what came to be known as Nei's standard genetic distance. This distance has the useful property that, if the rate of genetic change does not vary between loci, then Nei's standard genetic distance is the number of changes per locus. This measure assumes that genetic differences arise due to mutation and genetic drift.[6]

\begin{align} D_{a}=-\ln\frac{\sum \limits_l \sum \limits_{u} X_{u} Y_{u}}{\sqrt{(\sum \limits_{l} \sum \limits_{u} X_{u}^2)(\sum \limits_{l} \sum \limits_{u} Y_{u}^2)}} \end{align}

This distance can also be expressed in terms of the arithmetic mean probabilities of identity. Let $j_X$ be the probability that two members of population $X$ have the same allele at a particular locus, let $j_Y$ be the corresponding probability for population $Y$, and let $j_{XY}$ be the probability that a member of $X$ and a member of $Y$ have the same allele. $J_X$, $J_Y$ and $J_{XY}$ are the arithmetic means of $j_X$, $j_Y$ and $j_{XY}$ over all loci. These can be written[7]

\begin{align} J_X=\sum \limits_{l} \sum \limits_{u} \frac{{X_u}^2}{L} \end{align}
\begin{align} J_Y=\sum \limits_{l} \sum \limits_{u} \frac{{Y_u}^2}{L} \end{align}
\begin{align} J_{XY}=\sum \limits_{l} \sum \limits_{u} \frac{X_uY_u}{L} \end{align}

Nei's standard distance can then be written

\begin{align} D_{a}=-\ln{\frac{J_{XY}}{\sqrt{J_XJ_Y}}} \end{align}[4]
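As an illustration, the identity-based form can be computed directly from tables of allele frequencies. This is a minimal sketch, not a reference implementation. It assumes each population is a list of loci, each locus is a list of allele frequencies, and both populations list the alleles in the same order. The function name and the example frequencies are made up.

```python
import math

def nei_standard_distance(x, y):
    """Nei's (1972) standard distance from per-locus allele frequencies.

    x and y are lists of loci; each locus is a list of allele frequencies
    given in the same allele order for both populations (assumed layout).
    """
    n_loci = len(x)
    # Mean probabilities of identity within X, within Y, and between them.
    j_x = sum(p * p for locus in x for p in locus) / n_loci
    j_y = sum(q * q for locus in y for q in locus) / n_loci
    j_xy = sum(p * q for lx, ly in zip(x, y) for p, q in zip(lx, ly)) / n_loci
    # math.log raises ValueError if the populations share no alleles (j_xy == 0).
    return -math.log(j_xy / math.sqrt(j_x * j_y))

# Hypothetical frequencies for two loci with two alleles each.
pop_x = [[0.7, 0.3], [0.2, 0.8]]
pop_y = [[0.6, 0.4], [0.5, 0.5]]
print(nei_standard_distance(pop_x, pop_y))
```

The division by the number of loci cancels in the ratio, so the same value results from the summed form of the formula.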

### Cavalli-Sforza chord measure

In 1967, Luigi Luca Cavalli-Sforza and A. W. F. Edwards published this measure. It assumes that genetic differences arise due to genetic drift only. One major advantage of the Cavalli-Sforza chord measure is that the populations are represented in a high-dimensional Euclidean space, the scale of which is one unit per gene substitution. This makes the distance a Euclidean distance and gives it an intuitive biological foundation.[8]

\begin{align} D_{CH} = \frac{2}{\pi} \sqrt{2\left(L-\sum \limits_{l}\sum \limits_u \sqrt{X_{u}Y_{u}}\right)} \end{align}

Some authors drop the factor of $\frac{2}{\pi}$. This simplifies the formula at the cost of losing the property that the scale is one unit per gene substitution.
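The chord measure, with and without the $\frac{2}{\pi}$ factor, can be sketched as follows. The data layout (a list of loci, each a list of allele frequencies in matching order) and the `scaled` flag are assumptions made for this illustration.

```python
import math

def chord_distance(x, y, scaled=True):
    """Cavalli-Sforza and Edwards chord measure, as given in the formula above.

    x and y are lists of loci; each locus is a list of allele frequencies
    in the same allele order for both populations (assumed layout).
    """
    n_loci = len(x)
    overlap = sum(math.sqrt(p * q) for lx, ly in zip(x, y) for p, q in zip(lx, ly))
    # max() guards against a tiny negative value caused by rounding.
    chord = math.sqrt(max(0.0, 2.0 * (n_loci - overlap)))
    # scaled=False drops the 2/pi factor, as some authors do.
    return (2.0 / math.pi) * chord if scaled else chord

# One locus at which the two populations are fixed for different alleles.
print(chord_distance([[1.0, 0.0]], [[0.0, 1.0]]))
print(chord_distance([[1.0, 0.0]], [[0.0, 1.0]], scaled=False))
```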

### Reynolds, Weir, and Cockerham's genetic distance

In 1983, this measure was published by John Reynolds, B.S. Weir and C. Clark Cockerham. It assumes that genetic differences arise due to genetic drift only. It estimates the coancestry coefficient $\Theta$, which provides a measure of the genetic distance:[9]

\begin{align} \Theta_w=\sqrt{\frac{\sum \limits_{l} \sum \limits_{u} (X_u-Y_u)^2}{2\sum \limits_{l} (1-\sum \limits_{u}X_uY_u)}} \end{align}
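A direct transcription of this formula might look like the sketch below. It uses an assumed data layout (a list of loci, each a list of allele frequencies in matching order) and made-up example frequencies.

```python
import math

def reynolds_distance(x, y):
    """Reynolds, Weir and Cockerham's distance, following the formula above.

    x and y are lists of loci; each locus is a list of allele frequencies
    in the same allele order for both populations (assumed layout).
    """
    numerator = sum((p - q) ** 2 for lx, ly in zip(x, y) for p, q in zip(lx, ly))
    denominator = 2.0 * sum(
        1.0 - sum(p * q for p, q in zip(lx, ly)) for lx, ly in zip(x, y)
    )
    # The denominator is zero if both populations are fixed for the same
    # allele at every locus; that case is not handled in this sketch.
    return math.sqrt(numerator / denominator)

# Hypothetical frequencies for two loci with two alleles each.
pop_x = [[0.7, 0.3], [0.2, 0.8]]
pop_y = [[0.6, 0.4], [0.5, 0.5]]
print(reynolds_distance(pop_x, pop_y))
```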

### Other measures of genetic distance

Many other measures of genetic distance have been proposed with varying success.

#### Nei's distance 1983

This distance assumes that genetic differences arise due to mutation and genetic drift. It is known to give more reliable population trees than other distances, particularly for microsatellite DNA data.[10]

\begin{align} D_{A}=1-\sum \limits_{u} \sqrt{X_uY_u} \end{align} [10]
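The formula as written covers the alleles of one locus. The sketch below computes that per-locus value. Averaging it over several loci, as done in `nei_da_mean`, is an assumption of this example rather than something stated above. Function names and data layout are illustrative.

```python
import math

def nei_da_locus(x_locus, y_locus):
    """Nei's 1983 distance for one locus, following the formula above."""
    return 1.0 - sum(math.sqrt(p * q) for p, q in zip(x_locus, y_locus))

def nei_da_mean(x, y):
    """Average of the per-locus values over all loci (an assumed extension)."""
    per_locus = [nei_da_locus(lx, ly) for lx, ly in zip(x, y)]
    return sum(per_locus) / len(per_locus)

# Identical frequencies give 0; no shared alleles gives 1.
print(nei_da_locus([0.5, 0.5], [0.5, 0.5]))
print(nei_da_locus([1.0, 0.0], [0.0, 1.0]))
```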

#### Euclidean distance

\begin{align} D_{EU}=\sqrt{\sum \limits_{u}(X_u-Y_u)^2} \end{align} [4]
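For a single locus this is the ordinary Euclidean distance between the two vectors of allele frequencies. A minimal sketch, assuming Python 3.8 or later for `math.dist` and matching allele order in both lists:

```python
import math

def euclidean_distance(x_locus, y_locus):
    """Euclidean distance between two allele-frequency vectors of one locus."""
    return math.dist(x_locus, y_locus)

# Two populations fixed for different alleles at this locus.
print(euclidean_distance([1.0, 0.0], [0.0, 1.0]))
```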

#### Goldstein distance 1995

This measure was specifically developed for microsatellite markers and is based on the stepwise-mutation model (SMM). $\mu_X$ and $\mu_Y$ are the mean allele sizes (repeat counts) in populations $X$ and $Y$.[11]

\begin{align} (\delta\mu)^2=(\mu_X-\mu_Y)^2 \end{align} [11]
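The sketch below computes this quantity for one microsatellite locus. It assumes, in line with the stepwise-mutation model, that each $\mu$ is the mean allele size in repeat units, obtained by weighting each allele size by its frequency. The function names and the example allele sizes are hypothetical.

```python
def mean_allele_size(sizes, freqs):
    """Frequency-weighted mean allele size (in repeat units) at one locus."""
    return sum(size * freq for size, freq in zip(sizes, freqs))

def goldstein_delta_mu_sq(sizes, x_freqs, y_freqs):
    """Squared difference of mean allele sizes between two populations."""
    return (mean_allele_size(sizes, x_freqs) - mean_allele_size(sizes, y_freqs)) ** 2

# Hypothetical locus with alleles of 10, 11 and 12 repeats.
sizes = [10, 11, 12]
x_freqs = [0.5, 0.5, 0.0]
y_freqs = [0.0, 0.5, 0.5]
print(goldstein_delta_mu_sq(sizes, x_freqs, y_freqs))
```

In this example the mean sizes are 10.5 and 11.5 repeats, so the populations differ by one repeat unit on average.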

#### Nei's minimum genetic distance 1973

This measure assumes that genetic differences arise due to mutation and genetic drift.

\begin{align} D_{m}=\frac{(J_X+J_Y)}{2}-J_{XY} \end{align} [12]
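Using the mean identities $J_X$, $J_Y$ and $J_{XY}$ defined for Nei's standard distance, the minimum distance can be sketched as below. The data layout (a list of loci, each a list of allele frequencies in matching order) and the helper name `mean_identity` are assumptions of this example.

```python
def mean_identity(a, b):
    """Mean over loci of the probability that a gene drawn from a and a gene
    drawn from b are the same allele (J_X when a is b, J_XY otherwise)."""
    return sum(p * q for la, lb in zip(a, b) for p, q in zip(la, lb)) / len(a)

def nei_minimum_distance(x, y):
    """Nei's minimum genetic distance, following the formula above."""
    return (mean_identity(x, x) + mean_identity(y, y)) / 2.0 - mean_identity(x, y)

# Hypothetical frequencies for two loci with two alleles each.
pop_x = [[0.7, 0.3], [0.2, 0.8]]
pop_y = [[0.6, 0.4], [0.5, 0.5]]
print(nei_minimum_distance(pop_x, pop_y))
```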

#### Rogers' distance 1972

\begin{align} D_{R}=\frac{1}{L}\sum \limits_{l}\sqrt{\frac{\sum \limits_{u} (X_u-Y_u)^2}{2}} \end{align} [13]
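A short sketch of this distance follows, assuming the per-locus term is averaged over all sampled loci. As in the other examples, each population is taken to be a list of loci, each a list of allele frequencies in matching order, and the example values are made up.

```python
import math

def rogers_distance(x, y):
    """Rogers' distance: per-locus terms averaged over loci (assumed form)."""
    per_locus = [
        math.sqrt(sum((p - q) ** 2 for p, q in zip(lx, ly)) / 2.0)
        for lx, ly in zip(x, y)
    ]
    return sum(per_locus) / len(per_locus)

# Hypothetical frequencies for two loci with two alleles each.
pop_x = [[0.7, 0.3], [0.2, 0.8]]
pop_y = [[0.6, 0.4], [0.5, 0.5]]
print(rogers_distance(pop_x, pop_y))
```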

#### Fixation index

A commonly used measure of genetic distance is the fixation index, which varies between 0 and 1. A value of 0 indicates that two populations are genetically identical, whereas a value of 1 indicates that the two populations share no genetic diversity, i.e. each is fixed for different alleles.
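The two extremes can be illustrated with one common heterozygosity-based definition, $F_{ST} = (H_T - H_S)/H_T$. This definition is not given in the text above and is only one of several estimators in use. The sketch assumes a single locus and two populations of equal weight. The function name is illustrative.

```python
def fst_single_locus(x_locus, y_locus):
    """F_ST = (H_T - H_S) / H_T for one locus and two equally weighted
    populations (an assumed, simplified definition)."""
    # Expected heterozygosity within each population, then their mean.
    h_x = 1.0 - sum(p * p for p in x_locus)
    h_y = 1.0 - sum(q * q for q in y_locus)
    h_s = (h_x + h_y) / 2.0
    # Expected heterozygosity of the pooled population.
    pooled = [(p + q) / 2.0 for p, q in zip(x_locus, y_locus)]
    h_t = 1.0 - sum(f * f for f in pooled)
    # h_t is zero when both populations are fixed for the same allele;
    # that case is not handled in this sketch.
    return (h_t - h_s) / h_t

print(fst_single_locus([0.5, 0.5], [0.5, 0.5]))  # identical populations
print(fst_single_locus([1.0, 0.0], [0.0, 1.0]))  # fixed for different alleles
```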

## Software

• PHYLIP uses GENDIST
  • Nei's standard genetic distance 1972
  • Cavalli-Sforza and Edwards 1967
  • Reynolds, Weir, and Cockerham's 1983
• TFPGA
  • Nei's standard genetic distance (original and unbiased)
  • Nei's minimum genetic distance (original and unbiased)
  • Wright's (1978) modification of Rogers' (1972) distance
  • Reynolds, Weir, and Cockerham's 1983
• POPGENE
  • Nei's standard genetic distance (original and unbiased) and identity measures
• DISPAN
  • Nei's standard genetic distance 1972
  • Nei's genetic distance between populations 1983

## References

1. Cavalli-Sforza, L. Luca; Menozzi, Paolo; Piazza, Alberto (1994). The history and geography of human genes (Abridged paperback ed.). Princeton, N.J.: Princeton University Press. p. 95. ISBN 0-691-08750-4.
2. Ruane, J. (1999). A critical review of the value of genetic distance studies in conservation of animal genetic resources. Journal of Animal Breeding and Genetics, 116(5), 317-323.
3. L.L. Cavalli-Sforza, W.F. Bodmer (1971). The Genetics of Human Populations. W.H. Freeman and Company. ISBN 0-7167-0681-4.
4. Population Genetics IV: Genetic distances -- biological vs. geometric approaches.
5. McEachern, MaryBrooke; Savage, W.; Hooper, S.; Kanthaswamy, S. "Measures of Genetic Distance". Retrieved 21 October 2013.
6. Nei, M. (1972) Genetic distance between populations. Am. Nat. 106:283-292.
7. Nei, Masatoshi. "Measures of Genetic Distance". Retrieved 22 October 2013.
8. L.L. Cavalli-Sforza, A.W.F. Edwards (1967). "Phylogenetic Analysis - Models and Estimation Procedures". The American Journal of Human Genetics 19 (No. 3 Part I (May)).
9. John Reynolds, B.S. Weir, C. Clark Cockerham (November 1983). "Estimation of the coancestry coefficient: Basis for a short-term genetic distance". Genetics 105: 767–779.
10. Takezaki, N. and Nei, M. (1996) Genetic distances and reconstruction of phylogenetic trees from microsatellite DNA. Genetics 144:389-399.
11. Gillian Cooper; William Amos; Richard Bellamy; Mahveen Ruby Siddiqui; Angela Frodsham; Adrian V. S. Hill; David C. Rubinsztein (1999). "An Empirical Exploration of the $(\delta\mu)^2$ Genetic Distance for 213 Human Microsatellite Markers". The American Journal of Human Genetics 65: 1125–1133.
12. Masatoshi Nei, A.K. Roychoudhury (February 1974). "Sampling variances of heterozygosity and genetic distance". Genetics 76: 379-390.
13. Naoko Takezaki; Masatoshi Nei (September 1996). "Genetic Distances and Reconstruction of Phylogenetic Trees From Microsatellite DNA". Genetics 144: 389-399.