Genetic distance

Genetic distance is a measure of the genetic divergence between species or between populations within a species.[1] Populations with many similar genes have small genetic distances. This indicates that they are closely related and have a recent common ancestor.

Genetic distance is useful for reconstructing the history of populations. For example, evidence from genetic distance suggests that African and Eurasian people diverged about 100,000 years ago.[2] Genetic distance is also used for understanding the origin of biodiversity. For example, the genetic distances between different breeds of domesticated animals are often investigated in order to determine which breeds should be protected to maintain genetic diversity.[3]

Biological foundation

In the genome of an organism, each gene is located at a specific place called the locus for that gene. Allelic variations at these loci cause phenotypic variation within species (e.g. hair colour, eye colour). However, most alleles do not have an observable impact on the phenotype. Within a population, new alleles generated by mutation either die out or spread throughout the population. When a population is split into isolated populations (by either geographical or ecological factors), mutations that occur after the split will be present only in the isolated population in which they arose. Random fluctuation of allele frequencies, a process known as genetic drift, also produces genetic differentiation between populations. By examining the differences in allele frequencies between the populations and computing genetic distance, we can estimate how long ago the two populations were separated.[4]

Measures of genetic distance

Although genetic distance is simple to define as a measure of genetic divergence, several different statistical measures have been proposed, because different authors assumed different evolutionary models. The most commonly used are Nei's genetic distance,[4] the Cavalli-Sforza and Edwards measure,[5] and Reynolds, Weir and Cockerham's genetic distance,[6] listed below.

In all the formulae in this section, X and Y represent two different populations for which L loci have been studied. Let X_{u} and Y_{u} denote the frequencies of the uth allele at the lth locus in populations X and Y, respectively.

Nei's standard genetic distance

In 1972, Masatoshi Nei published what came to be known as Nei's standard genetic distance. This distance has the nice property that if the rate of genetic change (amino acid substitution) is constant per year or generation then Nei's standard genetic distance (D) increases in proportion to divergence time. This measure assumes that genetic differences are caused by mutation and genetic drift.[4]

\begin{align}
D=-\ln\frac{\sum \limits_l \sum \limits_{u} X_{u} Y_{u}}{\sqrt{(\sum \limits_{l} \sum \limits_{u} X_{u}^2)(\sum \limits_{l} \sum \limits_{u} Y_{u}^2)}}
\end{align}

This distance can also be expressed in terms of the arithmetic mean of gene identity. Let j_X be the probability that two randomly chosen members of population X have the same allele at a particular locus, and let j_Y be the corresponding probability in population Y. Also, let j_{XY} be the probability that a member of X and a member of Y have the same allele. Now let J_X, J_Y and J_{XY} represent the arithmetic means of j_X, j_Y and j_{XY} over all loci, respectively. In other words:

\begin{align}
J_X=\sum \limits_{l} \sum \limits_{u} \frac{{X_u}^2}{L}
\end{align}
\begin{align}
J_Y=\sum \limits_{l} \sum \limits_{u} \frac{{Y_u}^2}{L}
\end{align}
\begin{align}
J_{XY}=\sum \limits_{l} \sum \limits_{u} \frac{X_uY_u}{L}
\end{align}

where L is the total number of loci examined.[7]

Nei's standard distance can then be written as[4]

\begin{align}
D=-\ln{\frac{J_{XY}}{\sqrt{J_XJ_Y}}}
\end{align}
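
The gene-identity form above maps directly onto a few lines of code. The following is a minimal sketch, not a reference implementation: the function name, the list-of-lists data layout (one list of allele frequencies per locus, alleles in the same order in both populations) and the example frequencies are all assumptions made for illustration.

```python
import math

def nei_standard_distance(x, y):
    """Nei's standard distance D = -ln(J_XY / sqrt(J_X * J_Y)).

    x, y: one list per locus, each holding allele frequencies
    (hypothetical layout chosen for this sketch).
    """
    n_loci = len(x)
    # Mean gene identity within each population
    j_x = sum(p * p for locus in x for p in locus) / n_loci
    j_y = sum(q * q for locus in y for q in locus) / n_loci
    # Mean gene identity between the two populations
    j_xy = sum(p * q for lx, ly in zip(x, y) for p, q in zip(lx, ly)) / n_loci
    return -math.log(j_xy / math.sqrt(j_x * j_y))

# Made-up frequencies for two loci with two alleles each
pop_x = [[0.5, 0.5], [0.9, 0.1]]
pop_y = [[0.4, 0.6], [0.7, 0.3]]
print(nei_standard_distance(pop_x, pop_y))  # small positive value
print(nei_standard_distance(pop_x, pop_x))  # zero up to rounding
```

The factor 1/L cancels in the ratio, so the same value is obtained from the unnormalised double sums in the first formula.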

Cavalli-Sforza chord distance

In 1967 Luigi Luca Cavalli-Sforza and A. W. F. Edwards published this measure. It assumes that genetic differences arise due to genetic drift only. One major advantage of this measure is that the populations are represented in a hypersphere, the scale of which is one unit per gene substitution. The chord distance in the hyperdimensional sphere is given by[1][5]

\begin{align}
D_{CH} = \frac{2}{\pi} \sqrt{2\left(1-\sum \limits_{l}\sum \limits_{u} \sqrt{X_{u}Y_{u}}\right)}
\end{align}

Some authors drop the factor \frac{2}{\pi} to simplify the formula at the cost of losing the property that the scale is one unit per gene substitution.
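
As a rough illustration, the chord distance for a single locus can be computed as below. The function name, the `scaled` flag and the example frequencies are assumptions made for this sketch, and restricting it to one locus is a simplification; how several loci are combined is not addressed here.

```python
import math

def chord_distance(x, y, scaled=True):
    """Chord distance for one locus.

    x, y: allele frequencies at the locus, alleles in the same order.
    scaled: keep the 2/pi factor (True) or drop it (False).
    """
    cos_theta = sum(math.sqrt(p * q) for p, q in zip(x, y))
    # Clamp tiny rounding overshoots so the square root stays real
    chord = math.sqrt(2.0 * max(0.0, 1.0 - cos_theta))
    return (2.0 / math.pi) * chord if scaled else chord

print(chord_distance([0.5, 0.5], [0.4, 0.6]))                # similar populations
print(chord_distance([1.0, 0.0], [0.0, 1.0], scaled=False))  # no shared alleles: sqrt(2)
```

Two populations with no alleles in common sit at the largest possible chord length, sqrt(2) before scaling.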

Reynolds, Weir, and Cockerham's genetic distance

In 1983, this measure was published by John Reynolds, B.S. Weir and C. Clark Cockerham. This measure assumes that genetic differentiation occurs only by genetic drift without mutations. It estimates the coancestry coefficient \Theta which provides a measure of the genetic divergence by:[6]

\begin{align}
\Theta_w=\sqrt{\frac{\sum \limits_{l} \sum \limits_{u} (X_u-Y_u)^2}{2\sum \limits_{l} (1-\sum \limits_{u}X_uY_u)}}
\end{align}
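
A minimal sketch of the formula above follows. The function name, the per-locus list layout and the example frequencies are illustrative assumptions, not part of the published method.

```python
import math

def reynolds_distance(x, y):
    """Reynolds, Weir and Cockerham's distance as written above.

    x, y: one list of allele frequencies per locus (assumed layout).
    """
    pairs = list(zip(x, y))
    # Numerator: squared frequency differences over all loci and alleles
    num = sum((p - q) ** 2 for lx, ly in pairs for p, q in zip(lx, ly))
    # Denominator: twice the summed (1 - between-population identity) per locus
    den = 2.0 * sum(1.0 - sum(p * q for p, q in zip(lx, ly)) for lx, ly in pairs)
    # den is zero only if both populations are fixed for the same allele
    # at every locus; that degenerate case is not handled here
    return math.sqrt(num / den)

pop_x = [[0.5, 0.5], [0.9, 0.1]]
pop_y = [[0.4, 0.6], [0.7, 0.3]]
print(reynolds_distance(pop_x, pop_y))  # about 0.244
```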

Other measures of genetic distance

Many other measures of genetic distance have been proposed with varying success.

Nei's DA distance 1983

This distance assumes that genetic differences arise due to mutation and genetic drift. It is reported to give more reliable population trees than other distances, particularly for microsatellite DNA data.[8][9]

\begin{align}
D_{A}=1-\frac{1}{L}\sum \limits_{l} \sum \limits_{u} \sqrt{X_uY_u}
\end{align}
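
The D_A formula can be sketched as follows, reading the division by L as applying to the whole double sum. The function name, data layout and example frequencies are assumptions for illustration only.

```python
import math

def nei_da_distance(x, y):
    """Nei's D_A: one minus the per-locus average of sum_u sqrt(X_u * Y_u).

    x, y: one list of allele frequencies per locus (assumed layout).
    """
    n_loci = len(x)
    total = sum(math.sqrt(p * q) for lx, ly in zip(x, y) for p, q in zip(lx, ly))
    return 1.0 - total / n_loci

pop_x = [[0.5, 0.5], [0.9, 0.1]]
pop_y = [[0.4, 0.6], [0.7, 0.3]]
print(nei_da_distance(pop_x, pop_y))  # about 0.019
```

Because each per-locus term lies between 0 and 1, the result is also bounded between 0 (identical frequencies) and 1 (no shared alleles at any locus).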

Euclidean distance

Main article: Euclidean distance
\begin{align}
D_{EU}=\sqrt{\sum \limits_{u}(X_u-Y_u)^2}
\end{align}
[1]
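
For a single locus, this is the ordinary Euclidean distance between the two allele-frequency vectors, which Python's standard library exposes as `math.dist` (Python 3.8 and later). The wrapper name and the example frequencies below are assumptions made for this sketch.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between allele-frequency vectors at one locus.

    x, y: allele frequencies in the same allele order (assumed layout).
    """
    return math.dist(x, y)

print(euclidean_distance([0.5, 0.5], [0.4, 0.6]))  # about 0.141
```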

Goldstein distance 1995

This measure was developed specifically for microsatellite markers and is based on the stepwise-mutation model (SMM). \mu_X and \mu_Y are the mean allele sizes (numbers of repeat units) at a locus in populations X and Y.[10]

\begin{align}
(\delta\mu)^2=\frac{1}{L}\sum \limits_{l}(\mu_X-\mu_Y)^2
\end{align}
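
The sketch below computes this quantity under the assumption that \mu_X and \mu_Y are the mean allele sizes (repeat counts) at each locus, as is usual under the stepwise-mutation model. The function names, the dictionary layout (allele size mapped to frequency) and the example data are hypothetical.

```python
def mean_allele_size(locus):
    """Frequency-weighted mean allele size at one locus.

    locus: dict mapping allele size (repeat count) to its frequency.
    """
    return sum(size * freq for size, freq in locus.items())

def delta_mu_squared(x, y):
    """Average over loci of the squared difference in mean allele size."""
    n_loci = len(x)
    return sum(
        (mean_allele_size(lx) - mean_allele_size(ly)) ** 2
        for lx, ly in zip(x, y)
    ) / n_loci

# Made-up microsatellite data for two loci
pop_x = [{10: 0.5, 12: 0.5}, {20: 1.0}]
pop_y = [{10: 0.25, 12: 0.75}, {20: 0.5, 22: 0.5}]
print(delta_mu_squared(pop_x, pop_y))  # 0.625
```

Here the mean sizes are 11 and 11.5 at the first locus and 20 and 21 at the second, so the result is (0.25 + 1) / 2 = 0.625.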

Nei's minimum genetic distance 1973

This measure assumes that genetic differences arise due to mutation and genetic drift.[2]

\begin{align}
D_{m}=\frac{(J_X+J_Y)}{2}-J_{XY}
\end{align}
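
Expanding the squares shows that D_m equals \frac{1}{2L}\sum_l\sum_u (X_u-Y_u)^2, which gives a convenient cross-check. The sketch below computes both forms; the function name, data layout and example frequencies are assumptions for illustration.

```python
def nei_minimum_distance(x, y):
    """Nei's minimum distance D_m = (J_X + J_Y) / 2 - J_XY.

    x, y: one list of allele frequencies per locus (assumed layout).
    """
    n_loci = len(x)
    j_x = sum(p * p for locus in x for p in locus) / n_loci
    j_y = sum(q * q for locus in y for q in locus) / n_loci
    j_xy = sum(p * q for lx, ly in zip(x, y) for p, q in zip(lx, ly)) / n_loci
    return (j_x + j_y) / 2.0 - j_xy

pop_x = [[0.5, 0.5], [0.9, 0.1]]
pop_y = [[0.4, 0.6], [0.7, 0.3]]
d_m = nei_minimum_distance(pop_x, pop_y)

# Cross-check against half the mean summed squared frequency difference
check = sum(
    (p - q) ** 2 for lx, ly in zip(pop_x, pop_y) for p, q in zip(lx, ly)
) / (2 * len(pop_x))
print(d_m, check)  # both about 0.025
```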

Rogers' distance 1972

\begin{align}
D_{R}=\frac{1}{L}\sum \limits_{l}\sqrt{\frac{\sum \limits_{u} (X_u-Y_u)^2}{2}}
\end{align}
[11]
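
A minimal sketch of this distance is given below, assuming that the square-root term is evaluated separately for each locus and then averaged over the L loci. The function name, data layout and example frequencies are illustrative assumptions.

```python
import math

def rogers_distance(x, y):
    """Rogers' distance, averaged over loci (assumed reading of the formula).

    x, y: one list of allele frequencies per locus (assumed layout).
    """
    n_loci = len(x)
    per_locus = (
        math.sqrt(sum((p - q) ** 2 for p, q in zip(lx, ly)) / 2.0)
        for lx, ly in zip(x, y)
    )
    return sum(per_locus) / n_loci

pop_x = [[0.5, 0.5], [0.9, 0.1]]
pop_y = [[0.4, 0.6], [0.7, 0.3]]
print(rogers_distance(pop_x, pop_y))  # about 0.15
```

The division by 2 inside the square root keeps each per-locus term at or below 1, the value reached when the populations are fixed for different alleles.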

Fixation index

Main article: Fixation index

A commonly used measure of genetic distance is the fixation index (F_ST), which varies between 0 and 1. A value of 0 indicates that the two populations are genetically identical, whereas a value of 1 indicates that they share no genetic variation, each being fixed for different alleles. No mutation is assumed.

Software

  • PHYLIP uses GENDIST
    • Nei's standard genetic distance 1972
    • Cavalli-Sforza and Edwards 1967
    • Reynolds, Weir, and Cockerham's 1983
  • TFPGA
    • Nei's standard genetic distance (original and unbiased)
    • Nei's minimum genetic distance (original and unbiased)
    • Wright's (1978) modification of Rogers' (1972) distance
    • Reynolds, Weir, and Cockerham's 1983
  • GDA
  • POPGENE
  • POPTREE2 (Takezaki, Nei, and Tamura 2010, 2014)
    • Commonly used genetic distances and gene diversity analysis
  • DISPAN
    • Nei's standard genetic distance 1972
    • Nei's DA distance between populations 1983

References

  1. Nei, M. (1987). Molecular Evolutionary Genetics. (Chapter 9). New York: Columbia University Press.
  2. Nei, M. and A. K. Roychoudhury (1974). "Genic variation within and between the three major races of man, Caucasoids, Negroids, and Mongoloids". The American Journal of Human Genetics 26: 421–443.
  3. Ruane, J. (1999). "A critical review of the value of genetic distance studies in conservation of animal genetic resources". Journal of Animal Breeding and Genetics 116 (5): 317–323.
  4. Nei, M. (1972). "Genetic distance between populations". Am. Nat. 106: 283–292.
  5. L.L. Cavalli-Sforza, A.W.F. Edwards (1967). "Phylogenetic Analysis -Models and Estimation Procedures". The American Journal of Human Genetics 19 (3 Part I (May)).
  6. John Reynolds, B.S. Weir, C. Clark Cockerham (November 1983). "Estimation of the coancestry coefficient: Basis for a short-term genetic distance". Genetics 105: 767–779.
  7. Nei, M. (1987). "Genetic distance and molecular phylogeny". In: Population Genetics and Fishery Management (N. Ryman and F. Utter, eds.), University of Washington Press, Seattle, WA, pp. 193–223.
  8. Nei, M., F. Tajima, and Y. Tateno (1983). "Accuracy of estimated phylogenetic trees from molecular data. II. Gene frequency data". J. Mol. Evol. 19: 153–170.
  9. Takezaki, N. and Nei, M. (1996). "Genetic distances and reconstruction of phylogenetic trees from microsatellite DNA". Genetics 144: 389–399.
  10. Gillian Cooper; William Amos; Richard Bellamy; Mahveen Ruby Siddiqui; Angela Frodsham; Adrian V. S. Hill; David C. Rubinsztein (1999). "An Empirical Exploration of the (\delta\mu)^2 Genetic Distance for 213 Human Microsatellite Markers". The American Journal of Human Genetics 65: 1125–1133. doi:10.1086/302574.
  11. Rogers, J. S. (1972). "Measures of similarity and genetic distance". In Studies in Genetics VII. pp. 145–153. University of Texas Publication 7213. Austin, Texas.
