Genetic distance

From Wikipedia, the free encyclopedia

Genetic distance is the genetic divergence between species or between populations within a species. Populations with more similar genes have smaller genetic distances, which indicates that they are closely related, i.e. that they share a recent common ancestor or have recently interbred.

Genetic distance is useful in reconstructing the history of populations. For example, evidence from genetic distance suggests that humans arrived in America about 30,000 years ago.[1] Genetic distance is also used in conservation, where the genetic distances between breeds of domesticated animals are measured to determine which breeds must be protected to maintain biodiversity.[2]

Biological foundation

Each gene in an organism's genome exists at a specific location, called the locus for that gene. Viable variations at these loci (called alleles) cause variety within species (e.g. hair colour, eye colour), although most alleles do not have an observable impact on the organism. Within a population, new alleles caused by mutation either die out or spread throughout the population. However, when a population is split into smaller isolated populations (by either geography or speciation), mutations that occur after the split are present only in the isolated population where they arose. Random fluctuations in the prevalence of other alleles (due to the essentially random process of reproduction) also produce differences between isolated populations. This process is known as genetic drift. By examining the differences in allele frequencies between two populations, genetic distance can be used to estimate how long ago the populations separated.
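The effect of genetic drift on isolated populations can be illustrated with a small simulation. The following is a minimal sketch, assuming a simple Wright-Fisher style model of a single locus with two alleles; the function name drift, the population size and the number of generations are illustrative choices, not values taken from the sources cited here.

```python
import random

def drift(freq, pop_size, generations, rng):
    """Follow the frequency of one allele under random drift alone.

    Each generation, 2 * pop_size gene copies are drawn at random
    according to the current allele frequency (no mutation, no selection).
    """
    n_copies = 2 * pop_size
    for _ in range(generations):
        copies = sum(rng.random() < freq for _ in range(n_copies))
        freq = copies / n_copies
    return freq

# Two populations split from a common ancestor with allele frequency 0.5
# and then drift independently of each other.
rng = random.Random(42)
freq_a = drift(0.5, pop_size=50, generations=100, rng=rng)
freq_b = drift(0.5, pop_size=50, generations=100, rng=rng)
print(freq_a, freq_b)
```

In this model the two frequencies generally move apart over time even though both populations started from the same value, which is the kind of difference that genetic distance measures quantify.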

Measures of genetic distance

Genetic distance examines the frequency with which particular alleles are found in the populations or species. Two populations with the same frequencies for all alleles are considered genetically identical. There is less consensus on how to measure the distance between populations that differ. The principal difficulty is how best to combine the information from large numbers of alleles,[3] and as a result many different genetic distance measures are in use.[4] The most commonly used are Nei's genetic distance, the Cavalli-Sforza and Edwards measure, and Reynolds, Weir and Cockerham's genetic distance,[5] described below.

In all the formulae in this section, X and Y are two populations for which L loci have been sampled, and X_{u} and Y_{u} denote the frequencies of the uth allele at the lth locus in X and Y respectively.

Nei's standard genetic distance

In 1972, Masatoshi Nei published what came to be known as Nei's standard genetic distance. This distance has the useful property that, if the rate of genetic change does not vary between loci, it measures the number of changes per locus. The measure assumes that genetic differences arise due to mutation and genetic drift.[6]

\begin{align}
D_{a}=-\ln\frac{\sum \limits_l \sum \limits_{u} X_{u} Y_{u}}{\sqrt{(\sum \limits_{l} \sum \limits_{u} X_{u}^2)(\sum \limits_{l} \sum \limits_{u} Y_{u}^2)}}
\end{align}

This distance can also be expressed in terms of the arithmetic mean probabilities of identity. Let j_X be the probability that two members of population X have the same allele at a particular locus, and j_{XY} the probability that a member of X and a member of Y have the same allele. J_X, J_Y and J_{XY} are the arithmetic means of j_X, j_Y and j_{XY} over all loci. These can be written[7]

\begin{align}
J_X=\sum \limits_{l} \sum \limits_{u} \frac{{X_u}^2}{L}
\end{align}
\begin{align}
J_Y=\sum \limits_{l} \sum \limits_{u} \frac{{Y_u}^2}{L}
\end{align}
\begin{align}
J_{XY}=\sum \limits_{l} \sum \limits_{u} \frac{X_uY_u}{L}
\end{align}

Nei's standard distance can then be written

\begin{align}
D_{a}=-\ln{\frac{J_{XY}}{\sqrt{J_XJ_Y}}}
\end{align}
[4]
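As an illustration, the formula in terms of J_X, J_Y and J_{XY} can be computed directly from allele frequencies. The following is a minimal sketch, assuming each population is given as a list of loci and each locus as a list of allele frequencies in the same allele order for both populations; the function name nei_standard_distance and the example frequencies are hypothetical.

```python
import math

def nei_standard_distance(x, y):
    """Nei's standard genetic distance between populations x and y.

    x and y are lists of loci; each locus is a list of allele
    frequencies, with alleles listed in the same order in x and y.
    """
    n_loci = len(x)
    # Mean probabilities of identity within and between populations.
    j_x = sum(p * p for locus in x for p in locus) / n_loci
    j_y = sum(q * q for locus in y for q in locus) / n_loci
    j_xy = sum(p * q for lx, ly in zip(x, y) for p, q in zip(lx, ly)) / n_loci
    if j_xy == 0:
        return math.inf  # the populations share no alleles at any locus
    return -math.log(j_xy / math.sqrt(j_x * j_y))

# Made-up example: identical frequencies at locus 1, divergent at locus 2.
pop_x = [[0.5, 0.5], [0.9, 0.1]]
pop_y = [[0.5, 0.5], [0.1, 0.9]]
print(nei_standard_distance(pop_x, pop_y))
```

Two populations with identical allele frequencies give a distance of zero (up to rounding), and the distance grows as the between-population identity J_{XY} falls relative to the within-population identities.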

Cavalli-Sforza chord measure

In 1967, Luigi Luca Cavalli-Sforza and A. W. F. Edwards published this measure. It assumes that genetic differences arise due to genetic drift only. One major advantage of this measure is that the populations are represented in a high-dimensional Euclidean space, the scale of which is one unit per gene substitution. This makes the distance a Euclidean distance and gives it an intuitive biological foundation.[8]

\begin{align}
D_{CH} = \frac{2}{\pi} \sqrt{2\left(L-\sum \limits_{l}\sum \limits_{u} \sqrt{X_{u}Y_{u}}\right)}
\end{align}

Some authors drop the factor of \frac{2}{\pi}. This simplifies the formula at the cost of losing the property that the scale is one unit per gene substitution.
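The chord measure, with and without the 2/\pi factor, can be sketched as follows. This is an illustrative implementation only, assuming populations are given as lists of loci with allele frequencies in matching order; the function name chord_distance and the scaled flag are hypothetical.

```python
import math

def chord_distance(x, y, scaled=True):
    """Cavalli-Sforza and Edwards chord distance as written above.

    x and y are lists of loci; each locus is a list of allele
    frequencies in the same allele order. With scaled=False the
    factor 2/pi is dropped, as some authors do.
    """
    n_loci = len(x)
    overlap = sum(math.sqrt(p * q)
                  for lx, ly in zip(x, y)
                  for p, q in zip(lx, ly))
    # max() guards against a tiny negative value caused by rounding.
    d = math.sqrt(max(0.0, 2.0 * (n_loci - overlap)))
    return (2.0 / math.pi) * d if scaled else d

# One locus at which the two populations share no alleles.
print(chord_distance([[1.0, 0.0]], [[0.0, 1.0]]))
print(chord_distance([[1.0, 0.0]], [[0.0, 1.0]], scaled=False))
```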

Reynolds, Weir, and Cockerham's genetic distance

In 1983, John Reynolds, B. S. Weir and C. Clark Cockerham published this measure. It assumes that genetic differences arise due to genetic drift only. It estimates the coancestry coefficient \Theta, which provides a measure of the genetic distance:[9]

\begin{align}
\Theta_w=\sqrt{\frac{\sum \limits_{l} \sum \limits_{u} (X_u-Y_u)^2}{2\sum \limits_{l} (1-\sum \limits_{u}X_uY_u)}}
\end{align}
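A direct transcription of this formula is sketched below, assuming the same list-of-loci input used for allele frequencies elsewhere; the function name reynolds_distance is hypothetical, and the error raised for a zero denominator is a design choice of this sketch rather than part of the published measure.

```python
import math

def reynolds_distance(x, y):
    """Reynolds, Weir and Cockerham's distance as written above.

    x and y are lists of loci; each locus is a list of allele
    frequencies in the same allele order.
    """
    numerator = sum((p - q) ** 2
                    for lx, ly in zip(x, y)
                    for p, q in zip(lx, ly))
    denominator = 2.0 * sum(1.0 - sum(p * q for p, q in zip(lx, ly))
                            for lx, ly in zip(x, y))
    if denominator == 0:
        # Happens when both populations are fixed for the same allele
        # at every locus; the ratio is then undefined.
        raise ValueError("distance undefined: no variation at any locus")
    return math.sqrt(numerator / denominator)

# One locus at which the two populations share no alleles.
print(reynolds_distance([[1.0, 0.0]], [[0.0, 1.0]]))
```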

Other measures of genetic distance

Many other measures of genetic distance have been proposed with varying success.

Nei's distance 1983

This distance assumes that genetic differences arise due to mutation and genetic drift. It is known to give more reliable population trees than other distances, particularly for microsatellite DNA data.[10]

\begin{align}
D_{A}=1-\sum \limits_{u} \sqrt{X_uY_u}
\end{align}
[10]
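The formula above is written for a single locus. The sketch below applies it per locus and, as an assumption of this example, averages the per-locus overlap when several loci are supplied; the function name nei_da_distance and the example frequencies are hypothetical.

```python
import math

def nei_da_distance(x, y):
    """Nei's 1983 distance D_A.

    x and y are lists of loci; each locus is a list of allele
    frequencies in the same allele order. The per-locus overlap is
    averaged over loci (an assumption of this sketch); with a single
    locus this reduces to the formula above.
    """
    overlaps = [sum(math.sqrt(p * q) for p, q in zip(lx, ly))
                for lx, ly in zip(x, y)]
    return 1.0 - sum(overlaps) / len(overlaps)

# Made-up example: identical frequencies at locus 1, divergent at locus 2.
pop_x = [[0.5, 0.5], [0.9, 0.1]]
pop_y = [[0.5, 0.5], [0.1, 0.9]]
print(nei_da_distance(pop_x, pop_y))
```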

Euclidean distance

\begin{align}
D_{EU}=\sqrt{\sum \limits_{u}(X_u-Y_u)^2}
\end{align}
[4]

Goldstein distance 1995

It was specifically developed for microsatellite markers and is based on the stepwise-mutation model (SMM). \mu_X and \mu_Y are the mean allele sizes (numbers of repeats) in populations X and Y.[11]

\begin{align}
(\delta\mu)^2=(\mu_X-\mu_Y)^2
\end{align}
[11]
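A small worked example may help here. The sketch below assumes that \mu denotes the mean allele size in repeat units, as is usual under the stepwise-mutation model, and that each locus is described by a mapping from allele size to frequency; averaging over loci when more than one is given is a further assumption of this example, and the function names are hypothetical.

```python
def mean_allele_size(locus):
    """Mean repeat count at one locus; locus maps allele size -> frequency."""
    return sum(size * freq for size, freq in locus.items())

def goldstein_distance(x, y):
    """(delta mu)^2 between populations x and y.

    x and y are lists of loci; each locus is a dict mapping allele size
    (number of repeats) to its frequency. The squared difference in mean
    allele size is averaged over loci (an assumption of this sketch).
    """
    squared = [(mean_allele_size(lx) - mean_allele_size(ly)) ** 2
               for lx, ly in zip(x, y)]
    return sum(squared) / len(squared)

# Made-up single-locus example: mean sizes are 11 and 13 repeats.
pop_x = [{10: 0.5, 12: 0.5}]
pop_y = [{12: 0.5, 14: 0.5}]
print(goldstein_distance(pop_x, pop_y))
```

Here the mean allele sizes differ by two repeats, so the distance is 4.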

Nei's minimum genetic distance 1973

This measure assumes that genetic differences arise due to mutation and genetic drift.

\begin{align}
D_{m}=\frac{(J_X+J_Y)}{2}-J_{XY}
\end{align}
[12]
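The minimum distance uses the same mean identities J_X, J_Y and J_{XY} defined for Nei's standard distance. The following is a minimal sketch, assuming populations are given as lists of loci with allele frequencies in matching order; the function name nei_minimum_distance and the example frequencies are hypothetical.

```python
def nei_minimum_distance(x, y):
    """Nei's minimum genetic distance D_m = (J_X + J_Y) / 2 - J_XY.

    x and y are lists of loci; each locus is a list of allele
    frequencies in the same allele order.
    """
    n_loci = len(x)
    j_x = sum(p * p for locus in x for p in locus) / n_loci
    j_y = sum(q * q for locus in y for q in locus) / n_loci
    j_xy = sum(p * q for lx, ly in zip(x, y) for p, q in zip(lx, ly)) / n_loci
    return (j_x + j_y) / 2.0 - j_xy

# Made-up example: identical frequencies at locus 1, divergent at locus 2.
pop_x = [[0.5, 0.5], [0.9, 0.1]]
pop_y = [[0.5, 0.5], [0.1, 0.9]]
print(nei_minimum_distance(pop_x, pop_y))
```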

Rogers' distance 1972

\begin{align}
D_{R}=\frac{1}{L}\sum \limits_{l}\sqrt{\frac{\sum \limits_{u} (X_u-Y_u)^2}{2}}
\end{align}
[13]
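A sketch of this distance is given below. It assumes that the leading factor is one over the number of loci sampled and that the square-root term is evaluated at each locus and then averaged over loci; the function name rogers_distance and the example frequencies are hypothetical.

```python
import math

def rogers_distance(x, y):
    """Rogers' distance, averaged over loci.

    x and y are lists of loci; each locus is a list of allele
    frequencies in the same allele order.
    """
    per_locus = [math.sqrt(sum((p - q) ** 2 for p, q in zip(lx, ly)) / 2.0)
                 for lx, ly in zip(x, y)]
    return sum(per_locus) / len(per_locus)

# Made-up example: identical frequencies at locus 1, divergent at locus 2.
pop_x = [[0.5, 0.5], [0.9, 0.1]]
pop_y = [[0.5, 0.5], [0.1, 0.9]]
print(rogers_distance(pop_x, pop_y))
```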

Fixation index

A commonly used measure of genetic distance is the fixation index, which varies between 0 and 1. A value of 0 indicates that two populations are genetically identical, whereas a value of 1 indicates that they are completely differentiated, i.e. that they share no alleles.

Software

  • PHYLIP uses GENDIST
    • Nei's standard genetic distance 1972
    • Cavalli-Sforza and Edwards 1967
    • Reynolds, Weir, and Cockerham's 1983
  • TFPGA
    • Nei's standard genetic distance (original and unbiased)
    • Nei's minimum genetic distance (original and unbiased)
    • Wright's (1978) modification of Rogers' (1972) distance
    • Reynolds, Weir, and Cockerham's 1983
  • GDA
  • POPGENE
    • Nei's standard genetic distance (original and unbiased) and identity measures
  • DISPAN
    • Nei's standard genetic distance 1972
    • Nei's genetic distance between populations 1983

References

  1. ^ L. Luca Cavalli-Sforza, Paolo Menozzi, Alberto Piazza (1994). The history and geography of human genes (Abridged paperback ed.). Princeton, N.J.: Princeton University Press. p. 95. ISBN 0-691-08750-4.
  2. ^ Ruane, J. (1999). A critical review of the value of genetic distance studies in conservation of animal genetic resources. Journal of Animal Breeding and Genetics, 116(5), 317-323.
  3. ^ L.L Cavalli-Sforza, W.F. Bodmer (1971). The Genetics of Human Populations. W.H. Freeman and Company. ISBN 0-7167-0681-4. 
  4. ^ a b c Population Genetics IV: Genetic distances -- biological vs. geometric approaches.
  5. ^ McEachern, MaryBrooke; Savage, W.; Hooper, S.; Kanthaswamy, S. "Measures of Genetic Distance". Retrieved 21 October 2013. 
  6. ^ Nei, M. (1972) Genetic distance between populations. Am. Nat. 106:283-292.
  7. ^ Nei, Masatoshi. "Measures of Genetic Distance". Retrieved 22 October 2013.
  8. ^ L.L. Cavalli-Sforza, A.W.F. Edwards (1967). "Phylogenetic Analysis -Models and Estimation Procedures". The American Journal of Human Genetics 19 (3 Part I (May)). 
  9. ^ John Reynolds, B.S. Weir, C. Clark Cockerham (November 1983). "Estimation of the coancestry coefficient: Basis for a short-term genetic distance". Genetics 105: 767–779. 
  10. ^ a b Takezaki, N. and Nei, M. (1996) Genetic distances and reconstruction of phylogenetic trees from microsatellite DNA. Genetics 144:389-399.
  11. ^ a b Gillian Cooper; William Amos; Richard Bellamy; Mahveen Ruby Siddiqui; Angela Frodsham; Adrian V. S. Hill; David C. Rubinsztein (1999). "An Empirical Exploration of the (\delta\mu)^2 Genetic Distance for 213 Human Microsatellite Markers". The American Journal of Human Genetics 65: 1125–1133. 
  12. ^ Masatoshi Nei, A.K. Roychoudhury (February 1974). "Sampling variances of heterozygosity and genetic distance". Genetics 76: 379–390.
  13. ^ Naoko Takezaki; Masatoshi Nei (September 1996). "Genetic Distances and Reconstruction of Phylogenetic Trees From Microsatellite DNA". Genetics 144: 389–399. 
