Genetic distance

Revision as of 03:39, 18 April 2024


Genetic distance is a measure of the genetic divergence between species or between populations within a species, whether the distance measures time from a common ancestor or degree of differentiation.^[2] Populations with many similar alleles have small genetic distances, which indicates that they are closely related and have a recent common ancestor.

Genetic distance is useful for reconstructing the history of populations, such as the multiple human expansions out of Africa.^[3] It is also used for understanding the origin of biodiversity. For example, the genetic distances between different breeds of domesticated animals are often investigated in order to determine which breeds should be protected to maintain genetic diversity.^[4]

Biological foundation

Life on Earth began with very simple unicellular organisms, which evolved into complex multicellular organisms over the course of more than three billion years.^[5] A comprehensive tree of life representing all the organisms that have ever lived on Earth is important for understanding how life has evolved in response to the challenges living organisms have faced, and how it may respond to similar challenges in the future. Evolutionary biologists have attempted to build evolutionary, or phylogenetic, trees encompassing as many organisms as the available resources allow. Fossil dating and molecular clocks are the two main means of reconstructing the evolutionary history of living organisms. The fossil record is random and incomplete, and it does not provide a continuous chain of events, just as a film with missing frames cannot tell the whole plot.^[6]^[7]

Molecular clocks, on the other hand, are specific sequences of DNA, RNA or proteins (amino acids) used to determine, at the molecular level, the similarities and differences among species, to estimate the timeline of divergence,^[8] and to trace back the common ancestor of species based on the mutation rate and the sequence changes accumulated in those sequences.^[9] Mutation, i.e. change in genes, is the primary driver of evolution, and accounting for the changes accumulated over time gives an approximate genetic distance between species. Molecular clock sequences are fairly conserved across a range of species, mutate at an approximately constant rate, like the ticking of a clock, and are calibrated against dated evolutionary events from the fossil record. For example, the gene for alpha-globin (a constituent of hemoglobin) accumulates about 0.56 changes per base pair per billion years.^[10] Molecular clocks can therefore fill the gaps left by missing fossil records.

In the genome of an organism, each gene is located at a specific place called the locus for that gene. Allelic variations at these loci cause phenotypic variation within species (e.g. hair colour, eye colour). However, most alleles do not have an observable impact on the phenotype. Within a population, new alleles generated by mutation either die out or spread throughout the population. When a population is split into isolated populations (by either geographical or ecological factors), mutations that occur after the split will be present only in the isolated population in which they arose. Random fluctuation of allele frequencies also produces genetic differentiation between populations; this process is known as genetic drift. By examining the differences in allele frequencies between populations and computing genetic distance, we can estimate how long ago the two populations were separated.^[11]
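The drift process described above can be illustrated with a small Wright-Fisher simulation, a standard idealized model that is not taken from the cited source. This sketch assumes a diploid population of constant size N with no mutation, selection or migration; the function name `wright_fisher` and all parameter values are hypothetical. Two populations start from the same allele frequency, and their frequencies wander apart purely by chance.

```python
import random


def wright_fisher(p0: float, pop_size: int, generations: int,
                  rng: random.Random) -> list[float]:
    """Track one allele's frequency under pure genetic drift.

    Assumes a diploid population of constant size `pop_size` (2N gene copies)
    with no mutation, selection or migration. Each generation, 2N copies are
    drawn with replacement from the previous generation's frequency.
    """
    n_copies = 2 * pop_size
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Count sampled gene copies that carry the allele (binomial sampling).
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        trajectory.append(p)
    return trajectory


rng = random.Random(1)  # fixed seed so runs are reproducible
# Two isolated populations that start from the same allele frequency.
pop_a = wright_fisher(0.5, pop_size=50, generations=100, rng=rng)
pop_b = wright_fisher(0.5, pop_size=50, generations=100, rng=rng)
for gen in (0, 25, 50, 100):
    print(f"generation {gen}: A={pop_a[gen]:.2f}  B={pop_b[gen]:.2f}")
```

Once a frequency reaches 0 or 1 the allele is lost or fixed and stays there. Smaller `pop_size` values make the two trajectories diverge faster, in line with drift being stronger in small populations.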

Suppose a hypothetical DNA sequence, or gene, has a mutation rate of one base per 10 million years. Using this sequence, the divergence of, or genetic distance between, two species can be determined by counting the number of base differences between them. For example, a difference of 4 bases in the hypothetical sequence corresponds to 40 million years of accumulated change in total. Because both lineages accumulate mutations independently after they split, each lineage contributes about 2 of the differences on average, so the two species diverged from their common ancestor about 20 million years ago. Based on the molecular clock, the equation below can be used to calculate the time since divergence.^[12]

Number of differences ÷ (2 × mutation rate per year) = time since divergence
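The divergence-time arithmetic can be written as a small function. This is a minimal sketch assuming a strict molecular clock, a known per-lineage substitution rate, and few enough differences that repeated substitutions at the same site can be ignored. Because both lineages accumulate changes after the split, it divides the observed differences by twice the per-lineage rate. The function name `divergence_time` is hypothetical.

```python
def divergence_time(differences: float, rate_per_lineage: float) -> float:
    """Years since two lineages split, under a strict molecular clock.

    `rate_per_lineage` is substitutions per year along ONE lineage. After the
    split both lineages accumulate changes, so the observed differences
    reflect 2 * rate * time of evolution in total.
    """
    if rate_per_lineage <= 0:
        raise ValueError("rate_per_lineage must be positive")
    return differences / (2 * rate_per_lineage)


# Hypothetical sequence from the text: one substitution per 10 million years.
rate = 1 / 10_000_000
years = divergence_time(4, rate)
print(f"{years / 1e6:.0f} million years")  # 20 million years
```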

Process of determining genetic distance

Recent advances in sequencing technology, together with comprehensive genomic databases and bioinformatics tools capable of storing and processing the large amounts of data these technologies generate, have greatly improved evolutionary studies and the understanding of evolutionary relationships among species.^[13]^[14]

Markers for genetic distance

Different biomolecular markers, such as DNA, RNA and amino acid (protein) sequences, can be used to determine genetic distance.^[15]^[16]

Selecting an appropriate marker for genetic distance involves three choices:^[17]

choice of variability
choice of a specific region of DNA or RNA
choice of technique

The choice of variability depends on the intended outcome. For example, a very high level of variability is recommended for fingerprinting and parentage analyses, medium to high variability for demographic studies and for comparing distinct populations, and moderate to very low variability for phylogenetic studies.^[17] The genomic localization and ploidy of the marker are also important factors. For example, markers with a lower gene copy number are less robust to genetic drift: the haploid mitochondrial genome is more prone to genetic drift than the diploid nuclear genome.

Choice and examples of molecular markers for evolutionary biology studies.^[17]

Biodiversity level	Biological issue	Level of variability	Nature of information required	Examples of most used markers
Intra-population	Fine population structure, reproduction system	Medium to high	(N) codominant loci = (multilocus) genotype	Microsatellites, allozymes
Intra-population	Fingerprinting, parentage analysis	Very high	Codominant loci or numerous dominant loci	Microsatellites (RAPD, AFLP)
Intra-population	Demography	Medium to high	Allele frequency in samples taken at different times	Allozymes, microsatellites
Intra-population	Demographic history	Medium to high	Allele frequency + evolutionary relationships	mtDNA sequences
Inter-population	Phylogeography, definition of evolutionarily significant units (population structure)	Medium to high	Allele frequency in each population	Allozymes, microsatellites (risk of size homoplasy)
Inter-population	Bio-conservation	Medium	Allele evolutionary relationships	mtDNA (if variable enough)
Inter-specific	Close species	ca. 1% per million years	No variability within species if possible	mtDNA sequences, ITS rDNA

Application of genetic distance

Phylogenetics: Exploring the genetic distance among species can help establish the evolutionary relationships among them and their time of divergence, and build a comprehensive phylogenetic tree that connects them to their common ancestors.
Accuracy of genomic prediction: The genetic distance between the individuals used to train a genomic prediction model and the individuals being predicted can be used to infer how accurately their unobserved phenotypes will be predicted, which has implications for medical diagnostics and for plant and animal breeding.^[18]
Population genetics: Genetic distance helps in studying population genetics and in understanding intra- and inter-population genetic diversity.
Taxonomy and species delimitation: Determining genetic distance through DNA barcoding is an effective tool for delimiting species, especially for identifying cryptic species.^[19] A percentage genetic-distance threshold optimized for the data and species being studied is recommended to improve the reliability and applicability of delimitation,^[20]^[21]^[22] so that species boundaries can be delineated and cryptic species, which look similar but are genetically distinct, can be identified.
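A simple starting point for barcode comparisons is the uncorrected proportion of differing sites (the p-distance) between aligned sequences, which can then be compared with a threshold. This sketch assumes pre-aligned sequences of equal length; the function names, the 3% threshold and the toy sequences are hypothetical placeholders, not the optimized, taxon-specific thresholds recommended above.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites that differ (uncorrected p-distance).

    Sites with a gap ('-') or an ambiguous base ('N') in either sequence
    are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def below_threshold(seq_a: str, seq_b: str, threshold: float) -> bool:
    """True if two barcodes are closer than `threshold` (hypothetical rule)."""
    return p_distance(seq_a, seq_b) < threshold


THRESHOLD = 0.03  # hypothetical; real studies calibrate this per taxon
ref = "ACGTTAGCCTAGGCTAACGT"
query = "ACGTTAGCCTAGGCTTACGA"
print(p_distance(ref, query), below_threshold(ref, query, THRESHOLD))
```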

Evolutionary forces affecting genetic distance

Evolutionary forces such as mutation, genetic drift, natural selection and gene flow drive the process of evolution and shape genetic diversity. All these forces play a significant role in determining genetic distance within and among species.^[23]

Measures

Various statistical measures aim to quantify genetic divergence between populations or species. Using assumptions about evolutionary forces gained from experimental analysis, a model that better suits a given experiment can be selected to study a genetic group. In addition, comparing how well different metrics model particular population features, such as isolation by distance, can identify metrics that are better suited to understanding newly studied groups.^[24] The most commonly used genetic distance metrics are Nei's genetic distance,^[11] the Cavalli-Sforza and Edwards measure,^[25] and Reynolds, Weir and Cockerham's genetic distance.^[26]

Jukes-Cantor Distance

One of the most basic and straightforward distance measures is the Jukes-Cantor distance. It assumes that no insertions or deletions have occurred, that all substitutions are independent, and that each nucleotide change is equally likely. Under these assumptions, the distance is given by:^[27]

d_{AB}=-{\frac {3}{4}}\ln(1-{\frac {4}{3}}f_{AB})

where $d_{AB}$ is the Jukes-Cantor distance between sequences A and B, and $f_{AB}$ is the proportion of sites at which the two sequences differ.
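A direct translation of the Jukes-Cantor formula follows, as a minimal sketch with a hypothetical function name. The formula is only defined while $1-\frac{4}{3}f_{AB} > 0$, i.e. for $f_{AB} < 0.75$, so the sketch rejects larger values instead of returning a misleading number.

```python
import math


def jukes_cantor(f_ab: float) -> float:
    """Jukes-Cantor distance from the proportion of differing sites f_ab."""
    if not 0.0 <= f_ab < 0.75:
        # At f_ab >= 0.75 the log argument is <= 0: the sequences look
        # saturated (no better than random) and the distance is undefined.
        raise ValueError("f_ab must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * f_ab)


# 10 differing sites out of 100 aligned sites (hypothetical data).
print(round(jukes_cantor(10 / 100), 4))  # 0.1073
```

Because the correction accounts for multiple substitutions at the same site, $d_{AB}$ is always at least as large as $f_{AB}$, and the gap grows as $f_{AB}$ approaches 0.75.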

Nei's standard genetic distance

In 1972, Masatoshi Nei published what came to be known as Nei's standard genetic distance. This distance has the nice property that if the rate of genetic change (amino acid substitution) is constant per year or generation then Nei's standard genetic distance (D) increases in proportion to divergence time. This measure assumes that genetic differences are caused by mutation and genetic drift.^[11]

D=-\ln {\frac {\sum \limits _{\ell }\sum \limits _{u}X_{u}Y_{u}}{\sqrt {\left(\sum \limits _{\ell }\sum \limits _{u}X_{u}^{2}\right)\left(\sum \limits _{\ell }\sum \limits _{u}Y_{u}^{2}\right)}}}

This distance can also be expressed in terms of the arithmetic mean of gene identity. Let $j_{X}$ be the probability that two genes chosen at random from population $X$ carry the same allele at a particular locus, and let $j_{Y}$ be the corresponding probability in population $Y$. Also, let $j_{XY}$ be the probability that a gene from $X$ and a gene from $Y$ carry the same allele. Now let $J_{X}$, $J_{Y}$ and $J_{XY}$ represent the arithmetic means of $j_{X}$, $j_{Y}$ and $j_{XY}$ over all loci, respectively. In other words,

J_{X}=\sum _{\ell }\sum _{u}{\frac {{X_{u}}^{2}}{L}}

J_{Y}=\sum _{\ell }\sum _{u}{\frac {{Y_{u}}^{2}}{L}}

J_{XY}=\sum _{\ell }\sum _{u}{\frac {X_{u}Y_{u}}{L}}

where $L$ is the total number of loci examined.^[28]

Nei's standard distance can then be written as^[11]

D=-\ln {\frac {J_{XY}}{\sqrt {J_{X}J_{Y}}}}
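Nei's standard distance can be computed from allele-frequency tables; what follows is a minimal sketch. It assumes both populations list the same loci and the same alleles in the same order, and it uses the sample frequencies directly, with no small-sample correction. The function name and the two-locus data are hypothetical.

```python
import math


def nei_standard_distance(x: list[list[float]], y: list[list[float]]) -> float:
    """Nei's (1972) standard genetic distance D = -ln(J_XY / sqrt(J_X * J_Y)).

    x[l][u] and y[l][u] are the frequencies of allele u at locus l in
    populations X and Y. J_X, J_Y and J_XY are averaged over the L loci.
    """
    if len(x) != len(y):
        raise ValueError("both populations must cover the same loci")
    n_loci = len(x)
    j_x = sum(p * p for locus in x for p in locus) / n_loci
    j_y = sum(q * q for locus in y for q in locus) / n_loci
    j_xy = sum(p * q for lx, ly in zip(x, y) for p, q in zip(lx, ly)) / n_loci
    # If the populations share no alleles at all, J_XY = 0 and D is infinite;
    # math.log(0) raises ValueError in that case.
    return -math.log(j_xy / math.sqrt(j_x * j_y))


# Two hypothetical loci: identical at the first, fixed for different alleles at the second.
X = [[0.5, 0.5], [1.0, 0.0]]
Y = [[0.5, 0.5], [0.0, 1.0]]
print(round(nei_standard_distance(X, Y), 4))  # 1.0986 (= ln 3)
```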

Cavalli-Sforza chord distance

In 1967 Luigi Luca Cavalli-Sforza and A. W. F. Edwards published this measure. It assumes that genetic differences arise due to genetic drift only. One major advantage of this measure is that the populations are represented in a hypersphere, the scale of which is one unit per gene substitution. The chord distance in the hyperdimensional sphere is given by^[2]^[25]

D_{\text{CH}}={\frac {2}{\pi }}{\sqrt {2\left(1-\sum _{\ell }\sum _{u}{\sqrt {X_{u}Y_{u}}}\right)}}

Some authors drop the factor ${\frac {2}{\pi }}$ to simplify the formula at the cost of losing the property that the scale is one unit per gene substitution.
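The chord distance at a single locus, where the sum over $\ell$ has just one term, can be sketched as follows. How per-locus values are combined over several loci differs between implementations, so that step is deliberately left out. The function name and the `scaled` flag, which toggles the ${\frac {2}{\pi }}$ factor, are hypothetical.

```python
import math


def chord_distance(x: list[float], y: list[float], scaled: bool = True) -> float:
    """Cavalli-Sforza and Edwards chord distance at one locus.

    x[u] and y[u] are the frequencies of allele u in populations X and Y.
    With scaled=True the 2/pi factor is applied; with scaled=False it is
    dropped, as some authors do.
    """
    if len(x) != len(y):
        raise ValueError("both populations must list the same alleles")
    # Cosine of the angle between the two points on the hypersphere.
    cos_theta = sum(math.sqrt(p * q) for p, q in zip(x, y))
    cos_theta = min(cos_theta, 1.0)  # guard against rounding just above 1
    chord = math.sqrt(2.0 * (1.0 - cos_theta))
    return (2.0 / math.pi) * chord if scaled else chord


print(round(chord_distance([1.0, 0.0], [0.0, 1.0]), 4))  # 0.9003
```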

Reynolds, Weir, and Cockerham's genetic distance

In 1983, this measure was published by John Reynolds, Bruce Weir and C. Clark Cockerham. This measure assumes that genetic differentiation occurs only by genetic drift without mutations. It estimates the coancestry coefficient $\Theta$ which provides a measure of the genetic divergence by:^[26]

\Theta _{w}={\sqrt {\frac {\sum \limits _{\ell }\sum \limits _{u}(X_{u}-Y_{u})^{2}}{2\sum \limits _{\ell }\left(1-\sum \limits _{u}X_{u}Y_{u}\right)}}}
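The expression for $\Theta _{w}$ can be transcribed directly; this is a minimal sketch. It uses the sample allele frequencies exactly as written and does not include any sample-size corrections that a full estimator may require. The function name and the data layout (`x[l][u]` is the frequency of allele u at locus l) are hypothetical.

```python
import math


def reynolds_theta(x: list[list[float]], y: list[list[float]]) -> float:
    """Coancestry-based distance Theta_w, transcribed from the formula."""
    if len(x) != len(y):
        raise ValueError("both populations must cover the same loci")
    numerator = sum((p - q) ** 2
                    for lx, ly in zip(x, y) for p, q in zip(lx, ly))
    denominator = 2.0 * sum(1.0 - sum(p * q for p, q in zip(lx, ly))
                            for lx, ly in zip(x, y))
    # The denominator is 0 only if both populations are fixed for the same
    # allele at every locus; there is then nothing to compare.
    if denominator == 0:
        raise ValueError("populations are monomorphic for the same alleles")
    return math.sqrt(numerator / denominator)


X = [[0.5, 0.5], [1.0, 0.0]]
Y = [[0.5, 0.5], [0.0, 1.0]]
print(round(reynolds_theta(X, Y), 4))  # 0.8165
```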

Other measures

Many other measures of genetic distance have been proposed with varying success.

Nei's D_A distance 1983

This distance assumes that genetic differences arise due to mutation and genetic drift, and it is known to give more reliable population trees than other distances, particularly for microsatellite DNA data. It is not ideal in cases where natural selection plays a significant role in a population's genetics.^[29]^[30]

D_{A}=1-{\frac {1}{L}}\sum _{\ell }\sum _{u}{\sqrt {X_{u}Y_{u}}}

$D_{A}$ : Nei's $D_{A}$ distance, the genetic distance between populations X and Y

$\ell$ : A locus or gene studied, with $\sum _{\ell }$ denoting the sum over loci or genes

$X_{u}$ and $Y_{u}$ : The frequencies of allele u in populations X and Y, respectively

L: The total number of loci examined
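Nei's $D_{A}$ distance needs only the allele frequencies at each locus. A minimal sketch follows, assuming both populations list the same loci and alleles in the same order; the function name and example data are hypothetical.

```python
import math


def nei_da_distance(x: list[list[float]], y: list[list[float]]) -> float:
    """Nei's D_A = 1 - (1/L) * sum over loci and alleles of sqrt(X_u * Y_u)."""
    if len(x) != len(y):
        raise ValueError("both populations must cover the same loci")
    n_loci = len(x)
    similarity = sum(math.sqrt(p * q)
                     for lx, ly in zip(x, y) for p, q in zip(lx, ly))
    return 1.0 - similarity / n_loci


X = [[0.5, 0.5], [1.0, 0.0]]
Y = [[0.5, 0.5], [0.0, 1.0]]
print(nei_da_distance(X, Y))  # 0.5
```

$D_{A}$ ranges from 0 for identical populations to 1 when the populations share no alleles at any locus.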

Euclidean distance

Euclidean distance, based on the geometry of Euclid's Elements, expresses the genetic dissimilarity between populations as simply as possible, with a larger distance indicating greater dissimilarity. The Cartesian coordinate system, introduced by René Descartes, can be used to visualize the results of Euclidean distance calculations.^[32]^[33]

D_{EU}={\sqrt {\sum _{u}(X_{u}-Y_{u})^{2}}}

^[2]

$D_{EU}$ : Euclidean genetic distance between populations X and Y

$X_{u}$ and $Y_{u}$ : Frequencies of allele u in populations X and Y, respectively
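For a single locus, the Euclidean distance is the straight-line distance between the two allele-frequency vectors. A minimal sketch with a hypothetical function name, using the standard library's `math.dist` (available since Python 3.8):

```python
import math


def euclidean_distance(x: list[float], y: list[float]) -> float:
    """Euclidean distance between allele-frequency vectors at one locus.

    x[u] and y[u] are the frequencies of allele u in populations X and Y;
    math.dist raises ValueError if the vectors differ in length.
    """
    return math.dist(x, y)


# Three alleles at one hypothetical locus.
print(round(euclidean_distance([0.6, 0.3, 0.1], [0.2, 0.5, 0.3]), 4))  # 0.4899
```

The largest possible value at one locus is $\sqrt{2}$, reached when the populations are fixed for different alleles.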

Goldstein distance 1995

It was developed specifically for microsatellite markers and is based on the stepwise mutation model (SMM). $\mu _{X}$ and $\mu _{Y}$ are the mean allele sizes in populations X and Y.^[34]

(\delta \mu )^{2}=\sum _{\ell }{\frac {(\mu _{X}-\mu _{Y})^{2}}{L}}

$(\delta \mu )^{2}$ : Goldstein genetic distance between populations X and Y

$\mu _{X}$ and $\mu _{Y}$ : Mean allele sizes at each locus in populations X and Y

L: Total number of microsatellite loci examined
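Because $(\delta \mu )^{2}$ depends only on the mean allele size at each locus, it is straightforward to compute from raw allele sizes. In this minimal sketch the allele sizes are assumed to be repeat counts, and the function name and example data are hypothetical.

```python
from statistics import mean


def goldstein_delta_mu_sq(sizes_x: list[list[int]],
                          sizes_y: list[list[int]]) -> float:
    """(delta mu)^2: squared difference in mean allele size, averaged over loci.

    sizes_x[l] holds the allele sizes (e.g. repeat counts) sampled at
    microsatellite locus l in population X; likewise sizes_y for Y.
    """
    if len(sizes_x) != len(sizes_y):
        raise ValueError("both populations must cover the same loci")
    squared = [(mean(lx) - mean(ly)) ** 2 for lx, ly in zip(sizes_x, sizes_y)]
    return sum(squared) / len(squared)


X = [[10, 12, 12, 14], [20, 21, 21, 22]]
Y = [[14, 14, 16, 16], [20, 21, 21, 22]]
print(goldstein_delta_mu_sq(X, Y))  # 4.5
```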

Nei's minimum genetic distance 1973

This measure assumes that genetic differences arise due to mutation and genetic drift.^[35]

D_{m}={\frac {J_{X}+J_{Y}}{2}}-J_{XY}
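Nei's minimum distance uses the same gene-identity means $J_{X}$, $J_{Y}$ and $J_{XY}$ as the standard distance, averaged over loci. A minimal sketch with a hypothetical function name; it also checks the algebraic identity $D_{m}=\frac{1}{2L}\sum_{\ell}\sum_{u}(X_{u}-Y_{u})^{2}$, which follows from expanding the squares.

```python
def nei_minimum_distance(x: list[list[float]], y: list[list[float]]) -> float:
    """Nei's minimum distance D_m = (J_X + J_Y) / 2 - J_XY.

    x[l][u] and y[l][u] are the frequencies of allele u at locus l; the J
    terms are gene identities averaged over the L loci.
    """
    if len(x) != len(y):
        raise ValueError("both populations must cover the same loci")
    n_loci = len(x)
    j_x = sum(p * p for locus in x for p in locus) / n_loci
    j_y = sum(q * q for locus in y for q in locus) / n_loci
    j_xy = sum(p * q for lx, ly in zip(x, y) for p, q in zip(lx, ly)) / n_loci
    return (j_x + j_y) / 2 - j_xy


X = [[0.5, 0.5], [1.0, 0.0]]
Y = [[0.5, 0.5], [0.0, 1.0]]
d_m = nei_minimum_distance(X, Y)
# Same value via the squared-difference form.
alt = sum((p - q) ** 2 for lx, ly in zip(X, Y) for p, q in zip(lx, ly)) / (2 * len(X))
print(d_m, alt)  # 0.5 0.5
```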

Rogers' distance 1972

D_{R}={\frac {1}{L}}\sum _{\ell }{\sqrt {\frac {\sum \limits _{u}(X_{u}-Y_{u})^{2}}{2}}}

^[36]
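Rogers' distance can be sketched by computing, at each locus, $\sqrt{\tfrac{1}{2}\sum_{u}(X_{u}-Y_{u})^{2}}$ and averaging over the L loci, which is how the $\frac{1}{L}$ factor is read here. The function name and data are hypothetical.

```python
import math


def rogers_distance(x: list[list[float]], y: list[list[float]]) -> float:
    """Rogers' (1972) distance, averaged over loci.

    At each locus: sqrt(sum_u (X_u - Y_u)^2 / 2), which lies between 0 and 1.
    """
    if len(x) != len(y):
        raise ValueError("both populations must cover the same loci")
    per_locus = [math.sqrt(sum((p - q) ** 2 for p, q in zip(lx, ly)) / 2)
                 for lx, ly in zip(x, y)]
    return sum(per_locus) / len(per_locus)


X = [[0.5, 0.5], [1.0, 0.0]]
Y = [[0.5, 0.5], [0.0, 1.0]]
print(rogers_distance(X, Y))  # 0.5
```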

Fixation index

A commonly used measure of genetic distance is the fixation index (F_ST), which varies between 0 and 1. A value of 0 indicates that two populations have identical allele frequencies (no genetic differentiation between them), whereas a value of 1 indicates that they are completely differentiated, i.e. fixed for different alleles (maximum genetic differentiation). No mutation is assumed. Large populations between which there is much migration, for example, tend to be little differentiated, whereas small populations between which there is little migration tend to be greatly differentiated. F_ST is a convenient measure of this differentiation, and as a result F_ST and related statistics are among the most widely used descriptive statistics in population and evolutionary genetics. But F_ST is more than a descriptive statistic and measure of genetic differentiation: it is directly related to the variance in allele frequency among populations and to the degree of resemblance among individuals within populations. If F_ST is small, allele frequencies are very similar across populations; if it is large, allele frequencies differ greatly among populations.
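The text does not fix a particular estimator, so the sketch below uses one simple, commonly taught formulation: Nei's $G_{ST} = (H_T - H_S)/H_T$ for a single biallelic locus, where $H_S$ is the mean expected heterozygosity within populations and $H_T$ is the expected heterozygosity of the pooled populations. It weights populations equally and applies no sample-size correction; the function name is hypothetical.

```python
def fst_biallelic(freqs: list[float]) -> float:
    """F_ST at one biallelic locus via Nei's G_ST = (H_T - H_S) / H_T.

    freqs[i] is the frequency of one allele in population i. Populations
    are weighted equally and no sample-size correction is applied.
    """
    if not freqs:
        raise ValueError("need at least one population")
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled populations
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population
    if h_t == 0:
        return 0.0  # every population fixed for the same allele: no differentiation
    return (h_t - h_s) / h_t


print(fst_biallelic([0.5, 0.5]))  # 0.0: identical allele frequencies
print(fst_biallelic([0.0, 1.0]))  # 1.0: fixed for different alleles
print(round(fst_biallelic([0.2, 0.8]), 2))  # 0.36
```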

Software

PHYLIP (GENDIST program)
- Nei's standard genetic distance 1972
- Cavalli-Sforza and Edwards 1967
- Reynolds, Weir, and Cockerham's 1983
TFPGA
- Nei's standard genetic distance (original and unbiased)
- Nei's minimum genetic distance (original and unbiased)
- Wright's (1978) modification of Rogers' (1972) distance
- Reynolds, Weir, and Cockerham's 1983
GDA
POPGENE
POPTREE2 Takezaki, Nei, and Tamura (2010, 2014)
- Commonly used genetic distances and gene diversity analysis
DISPAN
- Nei's standard genetic distance 1972
- Nei's D_A distance between populations 1983

References

^ Cavalli-Sforza, L.L., Menozzi, P. & Piazza, A. (1994). The History and Geography of Human Genes. New Jersey: Princeton University Press.
^ Nei, M. (1987). "Chapter 9". Molecular Evolutionary Genetics. New York: Columbia University Press.
^ Ramachandran S, Deshpande O, Roseman CC, Rosenberg NA, Feldman MW, Cavalli-Sforza LL (November 2005). "Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa". Proc Natl Acad Sci U S A. 102 (44): 15942–7. Bibcode:2005PNAS..10215942R. doi:10.1073/pnas.0507611102. PMC 1276087. PMID 16243969.
^ Ruane J (1999). "A critical review of the value of genetic distance studies in conservation of animal genetic resources". Journal of Animal Breeding and Genetics. 116 (5): 317–323. doi:10.1046/j.1439-0388.1999.00205.x.
^ "Timeline: The evolution of life". New Scientist. Retrieved 2024-04-17.
^ "Timeline: The evolution of life". New Scientist. Retrieved 2024-04-17.
^ "Timeline: The evolution of life". New Scientist. Retrieved 2024-04-17.
^ "Molecular clocks". evolution.berkeley.edu. Retrieved 2024-04-18.
^ "Molecular clocks". evolution.berkeley.edu. Retrieved 2024-04-18.
^ "Molecular clocks". evolution.berkeley.edu. Retrieved 2024-04-18.
^ Nei, M. (1972). "Genetic distance between populations". Am. Nat. 106 (949): 283–292. doi:10.1086/282771. S2CID 55212907.
^ Cheng, Eric C.K. (2024-01-18), "Crafting future pedagogies through Lesson Study", Implementing a 21st Century Competency-Based Curriculum Through Lesson Study, London: Routledge, pp. 3–18, ISBN 978-1-003-37410-7, retrieved 2024-04-18
^ Koboldt, Daniel C.; Steinberg, Karyn Meltz; Larson, David E.; Wilson, Richard K.; Mardis, Elaine R. (September 2013). "The Next-Generation Sequencing Revolution and Its Impact on Genomics". Cell. 155 (1): 27–38. doi:10.1016/j.cell.2013.09.006.
^ Hudson, Matthew E. (January 2008). "Sequencing breakthroughs for genomic ecology and evolutionary biology". Molecular Ecology Resources. 8 (1): 3–17. doi:10.1111/j.1471-8286.2007.02019.x. ISSN 1755-098X.
^ Kartavtsev, Yuri Phedorovich (2021-05-20). "Some Examples of the Use of Molecular Markers for Needs of Basic Biology and Modern Society". Animals. 11 (5): 1473. doi:10.3390/ani11051473. ISSN 2076-2615.
^ Bhandari, Vaibhav; Naushad, Hafiz S.; Gupta, Radhey S. (2012). "Protein based molecular markers provide reliable means to understand prokaryotic phylogeny and support Darwinian mode of evolution". Frontiers in Cellular and Infection Microbiology. 2. doi:10.3389/fcimb.2012.00098. ISSN 2235-2988.
^ Chenuil, Anne (May 2006). "Choosing the right molecular genetic markers for studying biodiversity: from molecular evolution to practical aspects". Genetica. 127 (1–3): 101–120. doi:10.1007/s10709-005-2485-1. ISSN 0016-6707.
^ Scutari, Marco; Mackay, Ian; Balding, David (2016-09-02). Hickey, John Micheal (ed.). "Using Genetic Distance to Infer the Accuracy of Genomic Prediction". PLOS Genetics. 12 (9): e1006288. doi:10.1371/journal.pgen.1006288. ISSN 1553-7404.
^ Shin, Caren P.; Allmon, Warren D. (September 2023). "How we study cryptic species and their biological implications: A case study from marine shelled gastropods". Ecology and Evolution. 13 (9). doi:10.1002/ece3.10360. ISSN 2045-7758.
^ Ma, Zhuo; Ren, Jinliang; Zhang, Runzhi (2022-03-05). "Identifying the Genetic Distance Threshold for Entiminae (Coleoptera: Curculionidae) Species Delimitation via COI Barcodes". Insects. 13 (3): 261. doi:10.3390/insects13030261. ISSN 2075-4450.
^ Meyer, Christopher P; Paulay, Gustav (2005-11-29). Godfray, Charles (ed.). "DNA Barcoding: Error Rates Based on Comprehensive Sampling". PLoS Biology. 3 (12): e422. doi:10.1371/journal.pbio.0030422. ISSN 1545-7885.
^ Bianchi, Filipe Michels; Gonçalves, Leonardo Tresoldi (2021-04-24). "Borrowing the Pentatomomorpha tome from the DNA barcode library: Scanning the overall performance of cox1 as a tool". Journal of Zoological Systematics and Evolutionary Research. 59 (5): 992–1012. doi:10.1111/jzs.12476. ISSN 0947-5745.
^ Saeb, Amr T. M.; Al-Naqeb, Dhekra (2016). "The Impact of Evolutionary Driving Forces on Human Complex Diseases: A Population Genetics Approach". Scientifica. 2016: 1–10. doi:10.1155/2016/2079704. ISSN 2090-908X.
^ Séré M, Thévenon S, Belem AMG, De Meeûs T (2017). "Comparison of different genetic distances to test isolation by distance between populations". Heredity (Edinb). 119 (2): 55–63. doi:10.1038/hdy.2017.26.
^ L.L. Cavalli-Sforza; A.W.F. Edwards (1967). "Phylogenetic Analysis – Models and Estimation Procedures". The American Journal of Human Genetics. 19 (3 Part I (May)): 233–257. PMC 1706274. PMID 6026583.
^ John Reynolds; B.S. Weir; C. Clark Cockerham (November 1983). "Estimation of the coancestry coefficient: Basis for a short-term genetic distance". Genetics. 105 (3): 767–779. doi:10.1093/genetics/105.3.767. PMC 1202185. PMID 17246175.
^ "TREECON for Windows user manual".
^ Nei, M. (1987) Genetic distance and molecular phylogeny. In: Population Genetics and Fishery Management (N. Ryman and F. Utter, eds.), University of Washington Press, Seattle, WA, pp. 193–223.
^ Nei M., Tajima F., Tateno Y. (1983). "Accuracy of estimated phylogenetic trees from molecular data. II. Gene frequency data". J. Mol. Evol. 19 (2): 153–170. doi:10.1007/bf02300753. PMID 6571220. S2CID 19567426.
^ Takezaki N. (1996). "Genetic distances and reconstruction of phylogenetic trees from microsatellite DNA". Genetics. 144 (1): 389–399. doi:10.1093/genetics/144.1.389. PMC 1207511. PMID 8878702.
^ Magalhães TR, Casey JP, Conroy J, Regan R, Fitzpatrick DJ, Shah N; et al. (2012). "HGDP and HapMap analysis by Ancestry Mapper reveals local and global population relationships". PLOS ONE. 7 (11): e49438. Bibcode:2012PLoSO...749438M. doi:10.1371/journal.pone.0049438. PMC 3506643. PMID 23189146.
^ Sarton, George (March 1928). "The Thirteen Books of Euclid's Elements. Thomas L. Heath, Heiberg". Isis. 10 (1): 60–62. doi:10.1086/346308. ISSN 0021-1753.
^ Descartes, René (1664). La géométrie (in French). Chez Charles Angot.
^ Gillian Cooper; William Amos; Richard Bellamy; Mahveen Ruby Siddiqui; Angela Frodsham; Adrian V. S. Hill; David C. Rubinsztein (1999). "An Empirical Exploration of the $(\delta \mu )^{2}$ Genetic Distance for 213 Human Microsatellite Markers". The American Journal of Human Genetics. 65 (4): 1125–1133. doi:10.1086/302574. PMC 1288246. PMID 10486332.
^ Nei M, Roychoudhury AK (February 1974). "Sampling variances of heterozygosity and genetic distance". Genetics. 76 (2): 379–90. doi:10.1093/genetics/76.2.379. PMC 1213072. PMID 4822472.
^ Rogers, J. S. (1972). Measures of similarity and genetic distance. In Studies in Genetics VII. pp. 145−153. University of Texas Publication 7213. Austin, Texas.

External links

[1] Cavalli-Sforza, L.L., Menozzi, P. & Piazza, A. (1994). The History and Geography of Human Genes. New Jersey: Princeton University Press.

[Nei_1987-2] Nei, M. (1987). "Chapter 9". Molecular Evolutionary Genetics. New York: Columbia University Press.

[3] Ramachandran S, Deshpande O, Roseman CC, Rosenberg NA, Feldman MW, Cavalli-Sforza LL (November 2005). "Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa". Proc Natl Acad Sci U S A. 102 (44): 15942–7. Bibcode:2005PNAS..10215942R. doi:10.1073/pnas.0507611102. PMC 1276087. PMID 16243969.

[4] Ruane J (1999). "A critical review of the value of genetic distance studies in conservation of animal genetic resources". Journal of Animal Breeding and Genetics. 116 (5): 317–323. doi:10.1046/j.1439-0388.1999.00205.x.

[5] #author.fullName}. "Timeline: The evolution of life". New Scientist. Retrieved 2024-04-17. {{cite web}}: |last= has generic name (help)

[6] #author.fullName}. "Timeline: The evolution of life". New Scientist. Retrieved 2024-04-17. {{cite web}}: |last= has generic name (help)

[7] #author.fullName}. "Timeline: The evolution of life". New Scientist. Retrieved 2024-04-17. {{cite web}}: |last= has generic name (help)

[8] "Molecular clocks". evolution.berkeley.edu. Retrieved 2024-04-18.

[9] "Molecular clocks". evolution.berkeley.edu. Retrieved 2024-04-18.

[10] "Molecular clocks". evolution.berkeley.edu. Retrieved 2024-04-18.

[Nei_1972-11] Nei, M. (1972). "Genetic distance between populations". Am. Nat. 106 (949): 283–292. doi:10.1086/282771. S2CID 55212907.

[12] Cheng, Eric C.K. (2024-01-18), "Crafting future pedagogies through Lesson Study", Implementing a 21st Century Competency-Based Curriculum Through Lesson Study, London: Routledge, pp. 3–18, ISBN 978-1-003-37410-7, retrieved 2024-04-18

[13] Koboldt, Daniel C.; Steinberg, Karyn Meltz; Larson, David E.; Wilson, Richard K.; Mardis, Elaine R. (2013-09). "The Next-Generation Sequencing Revolution and Its Impact on Genomics". Cell. 155 (1): 27–38. doi:10.1016/j.cell.2013.09.006. {{cite journal}}: Check date values in: |date= (help); no-break space character in |first2= at position 6 (help); no-break space character in |first3= at position 6 (help); no-break space character in |first4= at position 8 (help); no-break space character in |first= at position 7 (help)

[14] Hudson, Matthew E. (2008-01). "Sequencing breakthroughs for genomic ecology and evolutionary biology". Molecular Ecology Resources. 8 (1): 3–17. doi:10.1111/j.1471-8286.2007.02019.x. ISSN 1755-098X. {{cite journal}}: Check date values in: |date= (help)

[15] Kartavtsev, Yuri Phedorovich (2021-05-20). "Some Examples of the Use of Molecular Markers for Needs of Basic Biology and Modern Society". Animals. 11 (5): 1473. doi:10.3390/ani11051473. ISSN 2076-2615.{{cite journal}}: CS1 maint: unflagged free DOI (link)

[16] Bhandari, Vaibhav; Naushad, Hafiz S.; Gupta, Radhey S. (2012). "Protein based molecular markers provide reliable means to understand prokaryotic phylogeny and support Darwinian mode of evolution". Frontiers in Cellular and Infection Microbiology. 2. doi:10.3389/fcimb.2012.00098. ISSN 2235-2988.{{cite journal}}: CS1 maint: unflagged free DOI (link)

[:0-17] Anne, Chenuil (2006-05). "Choosing the right molecular genetic markers for studying biodiversity: from molecular evolution to practical aspects". Genetica. 127 (1–3): 101–120. doi:10.1007/s10709-005-2485-1. ISSN 0016-6707. {{cite journal}}: Check date values in: |date= (help)

[18] Scutari, Marco; Mackay, Ian; Balding, David (2016-09-02). Hickey, John Micheal (ed.). "Using Genetic Distance to Infer the Accuracy of Genomic Prediction". PLOS Genetics. 12 (9): e1006288. doi:10.1371/journal.pgen.1006288. ISSN 1553-7404.{{cite journal}}: CS1 maint: unflagged free DOI (link)

[19] Shin, Caren P.; Allmon, Warren D. (2023-09). "How we study cryptic species and their biological implications: A case study from marine shelled gastropods". Ecology and Evolution. 13 (9). doi:10.1002/ece3.10360. ISSN 2045-7758. {{cite journal}}: Check date values in: |date= (help)

[20] Ma, Zhuo; Ren, Jinliang; Zhang, Runzhi (2022-03-05). "Identifying the Genetic Distance Threshold for Entiminae (Coleoptera: Curculionidae) Species Delimitation via COI Barcodes". Insects. 13 (3): 261. doi:10.3390/insects13030261. ISSN 2075-4450.{{cite journal}}: CS1 maint: unflagged free DOI (link)

[21] Meyer, Christopher P; Paulay, Gustav (2005-11-29). Godfray, Charles (ed.). "DNA Barcoding: Error Rates Based on Comprehensive Sampling". PLoS Biology. 3 (12): e422. doi:10.1371/journal.pbio.0030422. ISSN 1545-7885.{{cite journal}}: CS1 maint: unflagged free DOI (link)

[22] Bianchi, Filipe Michels; Gonçalves, Leonardo Tresoldi (2021-04-24). "Borrowing the Pentatomomorpha tome from the DNA barcode library: Scanning the overall performance ofcox1as a tool". Journal of Zoological Systematics and Evolutionary Research. 59 (5): 992–1012. doi:10.1111/jzs.12476. ISSN 0947-5745.

[23] Saeb, Amr T. M.; Al-Naqeb, Dhekra (2016). "The Impact of Evolutionary Driving Forces on Human Complex Diseases: A Population Genetics Approach". Scientifica. 2016: 1–10. doi:10.1155/2016/2079704. ISSN 2090-908X.{{cite journal}}: CS1 maint: unflagged free DOI (link)

[Séré_2017-24] Séré M, Thévenon S, Belem AMG, De Meeûs T. (2017). "Comparison of different genetic distances to test isolation by distance between populations". Heredity (Edinb). 119(2): 55–63. doi:10.1038/hdy.2017.26.{{cite journal}}: CS1 maint: multiple names: authors list (link)

[Cavalli-Sforza-25] L.L. Cavalli-Sforza; A.W.F. Edwards (1967). "Phylogenetic Analysis – Models and Estimation Procedures". The American Journal of Human Genetics. 19 (3 Part I (May)): 233–257. PMC 1706274. PMID 6026583.

[Reynold,Weir,Cockerham-26] John Reynolds; B.S. Weir; C. Clark Cockerham (November 1983). "Estimation of the coancestry coefficient: Basis for a short-term genetic distance". Genetics. 105 (3): 767–779. doi:10.1093/genetics/105.3.767. PMC 1202185. PMID 17246175.

[Treecon-27] "TREECON for Windows user manual".

[28] Nei, M. (1987) Genetic distance and molecular phylogeny. In: Population Genetics and Fishery Management (N. Ryman and F. Utter, eds.), University of Washington Press, Seattle, WA, pp. 193–223.

[Nei,_Tajima,_&_Tateno_1983-29] Nei M., Tajima F., Tateno Y. (1983). "Accuracy of estimated phylogenetic trees from molecular data. II. Gene frequency data". J. Mol. Evol. 19 (2): 153–170. doi:10.1007/bf02300753. PMID 6571220. S2CID 19567426.{{cite journal}}: CS1 maint: multiple names: authors list (link)

[Genetic_distances_and_reconstruction_of_phylogenetic_trees_from_microsatellite_DNA-30] Takezaki N. (1996). "Genetic distances and reconstruction of phylogenetic trees from microsatellite DNA". Genetics. 144 (1): 389–399. doi:10.1093/genetics/144.1.389. PMC 1207511. PMID 8878702.

[pmid23189146-31] Magalhães TR, Casey JP, Conroy J, Regan R, Fitzpatrick DJ, Shah N; et al. (2012). "HGDP and HapMap analysis by Ancestry Mapper reveals local and global population relationships". PLOS ONE. 7 (11): e49438. Bibcode:2012PLoSO...749438M. doi:10.1371/journal.pone.0049438. PMC 3506643. PMID 23189146.{{cite journal}}: CS1 maint: multiple names: authors list (link)

[32] Sarton, George (March 1928). "The Thirteen Books of Euclid's Elements. Thomas L. Heath , Heiberg". Isis. 10 (1): 60–62. doi:10.1086/346308. ISSN 0021-1753.

[33] Descartes, René (1664). La géométrie (in French). Chez Charles Angot.

[An_Empirical_Exploration_of_the_dmu2_Genetic_Distance_for_213_Human_Microsatellite_Markers-34] Gillian Cooper; William Amos; Richard Bellamy; Mahveen Ruby Siddiqui; Angela Frodsham; Adrian V. S. Hill; David C. Rubinsztein (1999). "An Empirical Exploration of the $(\delta \mu )^{2}$ Genetic Distance for 213 Human Microsatellite Markers". The American Journal of Human Genetics. 65 (4): 1125–1133. doi:10.1086/302574. PMC 1288246. PMID 10486332.

[Nei_&_Roychoudhury_1974-35] Nei M, Roychoudhury AK (February 1974). "Sampling variances of heterozygosity and genetic distance". Genetics. 76 (2): 379–90. doi:10.1093/genetics/76.2.379. PMC 1213072. PMID 4822472.

[36] Rogers, J. S. (1972). Measures of similarity and genetic distance. In Studies in Genetics VII. pp. 145–153. University of Texas Publication 7213. Austin, Texas.

@@ Line 1: / Line 1: @@
-{{Short description|Measure of divergence between populations}}
+{{Short description|Measure of divergence between populations}}
 {{genetics sidebar}}
 [[File:The history and geography of human genes Luigi Luca Cavalli-Sforza map genetic.png|thumb|300px|Genetic distance map by Cavalli-Sforza et al. (1994) <ref>Cavalli-Sforza, L.L., Menozzi, P. & Piazza, A. (1994). The History and Geography of Human Genes. New Jersey: Princeton University Press.</ref>]]
@@ Line 7: / Line 7: @@
 ==Biological foundation==
+Life on Earth began with very simple [[Unicellular organism|unicellular]] organisms, which evolved into complex [[Multicellular organism|multicellular]] organisms over the course of more than three billion years.<ref name="NS-timeline">{{Cite web |title=Timeline: The evolution of life |url=https://www.newscientist.com/article/dn17453-timeline-the-evolution-of-life/ |access-date=2024-04-17 |website=New Scientist |language=en-US}}</ref> A comprehensive tree of life representing all the organisms that have ever lived on Earth is important for understanding how life has evolved, including how living organisms have met past challenges, which may help in dealing with similar challenges in the future. Evolutionary biologists have attempted to build evolutionary or [[Phylogenetics|phylogenetic]] trees encompassing as many organisms as the available data allow. [[Fossil]] [[Radiodating|dating]] and the [[molecular clock]] are the two main means of reconstructing the evolutionary history of living organisms. The fossil record is random and incomplete and does not provide a continuous chain of events; just as a film with missing frames cannot tell the whole plot, the fossil record alone cannot tell the whole history of life.<ref name="NS-timeline" />
+Molecular clocks, on the other hand, use specific sequences of [[DNA]], [[RNA]] or [[Protein|proteins]] (amino acids) to determine, at the molecular level, the similarities and differences among species, to estimate when they diverged,<ref name="Berkeley-clock">{{Cite web |title=Molecular clocks |url=https://evolution.berkeley.edu/molecular-clocks/ |access-date=2024-04-18 |website=evolution.berkeley.edu}}</ref> and to trace back their common ancestor from the [[Mutation rate|mutation rates]] and the sequence changes accumulated in those sequences.<ref name="Berkeley-clock" /> Mutation, that is, change in the genetic sequence, is the ultimate source of evolutionary change, and accounting for the changes accumulated over time gives an approximate genetic distance between species. The sequences used as molecular clocks are fairly [[Conserved sequence|conserved]] across a range of species, accumulate mutations at a roughly constant rate, like a clock, and are calibrated against evolutionary events dated from the fossil record. For example, the gene for alpha-globin (a constituent of hemoglobin) changes at a rate of 0.56 changes per base pair per billion years.<ref name="Berkeley-clock" /> The molecular clock can therefore fill the gaps left by missing fossil records.
 In the genome of an [[organism]], each [[gene]] is located at a specific place called the [[locus (genetics)|locus]] for that gene. Allelic variations at these loci cause phenotypic variation within species (e.g. hair colour, eye colour). However, most alleles do not have an observable impact on the phenotype. Within a population new alleles generated by mutation either die out or spread throughout the population. When a population is split into different isolated populations (by either geographical or ecological factors), mutations that occur after the split will be present only in the isolated population. Random fluctuation of allele frequencies also produces genetic differentiation between populations. This process is known as [[genetic drift]]. By examining the differences between [[allele frequencies]] between the populations and computing genetic distance, we can estimate how long ago the two populations were separated.<ref name="Nei 1972">{{cite journal|author=Nei, M.|title=Genetic distance between populations|journal=Am. Nat.|volume=106|pages=283–292|year=1972|issue=949|doi=10.1086/282771|s2cid=55212907}}</ref>
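As an illustration of computing a genetic distance from allele frequencies, the following Python sketch implements Nei's (1972) standard genetic distance, <math>D = -\ln \frac{J_{XY}}{\sqrt{J_X J_Y}}</math>, where <math>J_X</math>, <math>J_Y</math> and <math>J_{XY}</math> are the averages over loci of <math>\sum x_i^2</math>, <math>\sum y_i^2</math> and <math>\sum x_i y_i</math>. It is a minimal sketch that assumes allele frequencies have already been estimated, are listed in the same allele order in both populations, and carry no sampling error; the function name and data layout are illustrative, not part of any library. Under Nei's model assumptions (neutral mutations at a constant rate and mutation–drift equilibrium), ''D'' is expected to grow roughly in proportion to the time since the populations separated.

```python
import math


def nei_distance(pop_x, pop_y):
    """Nei's (1972) standard genetic distance D between two populations.

    pop_x, pop_y: one list of allele frequencies per locus, with alleles in
    the same order in both populations (frequencies at a locus sum to 1).
    """
    if len(pop_x) != len(pop_y) or not pop_x:
        raise ValueError("populations must be scored at the same, non-empty set of loci")
    for lx, ly in zip(pop_x, pop_y):
        if len(lx) != len(ly):
            raise ValueError("each locus needs the same alleles in both populations")
    n_loci = len(pop_x)
    # Means over loci of sum(x_i^2), sum(y_i^2) and sum(x_i * y_i).
    j_x = sum(sum(x * x for x in locus) for locus in pop_x) / n_loci
    j_y = sum(sum(y * y for y in locus) for locus in pop_y) / n_loci
    j_xy = sum(sum(x * y for x, y in zip(lx, ly)) for lx, ly in zip(pop_x, pop_y)) / n_loci
    if j_xy == 0:
        return math.inf  # no alleles shared: the distance is unbounded
    identity = j_xy / math.sqrt(j_x * j_y)  # Nei's normalized identity I
    return -math.log(identity)


# Hypothetical allele frequencies at two loci.
pop_a = [[0.8, 0.2], [0.6, 0.3, 0.1]]
pop_b = [[0.2, 0.8], [0.5, 0.3, 0.2]]
print(f"Nei's D = {nei_distance(pop_a, pop_b):.3f}")
```

Identical allele frequencies give ''D'' = 0, and populations that share no alleles give an infinite distance.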
+Suppose a hypothetical stretch of DNA, or a hypothetical gene, accumulates mutations at a rate of one [[Base pair|base]] per 10 million years along each lineage. The time since two species diverged, and hence their genetic distance, can then be estimated by counting the base differences between their copies of this sequence. Because mutations accumulate independently along both lineages after the split, a difference of 4 bases corresponds to about 2 mutations on each lineage, which suggests that the two species diverged from their common ancestor about 20 million years ago. Based on the molecular clock, the time since divergence can be calculated with the equation below.<ref>{{Citation |last=Cheng |first=Eric C.K. |title=Crafting future pedagogies through Lesson Study |date=2024-01-18 |work=Implementing a 21st Century Competency-Based Curriculum Through Lesson Study |pages=3–18 |url=http://dx.doi.org/10.4324/9781003374107-2 |access-date=2024-04-18 |place=London |publisher=Routledge |isbn=978-1-003-37410-7}}</ref>
+''Time since divergence = number of base differences ÷ (2 × mutation rate per lineage per year)''
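The divergence-time arithmetic can be sketched in Python. This is a minimal illustration that assumes a strict molecular clock with the same rate on both lineages; the function names are hypothetical. Because mutations accumulate along both branches after the split, the number of differences ''K'' is taken to be 2''rT'' for a per-lineage rate ''r'', so ''T'' = ''K'' / (2''r''). The second function shows how such a rate could be calibrated from a split dated independently, for example from fossils.

```python
def divergence_time(differences, rate_per_lineage):
    """Years since two lineages split, assuming a strict molecular clock.

    differences: number of base differences between the two sequences (K).
    rate_per_lineage: mutations the sequence accumulates per year along ONE
    lineage (r). Changes pile up on both branches, so K = 2 * r * T.
    """
    if rate_per_lineage <= 0:
        raise ValueError("rate must be positive")
    return differences / (2 * rate_per_lineage)


def calibrate_rate(differences, known_time):
    """Per-lineage rate from a split whose age is known independently."""
    if known_time <= 0:
        raise ValueError("calibration time must be positive")
    return differences / (2 * known_time)


# Hypothetical sequence: one base change per 10 million years per lineage.
rate = 1 / 10_000_000
print(f"{divergence_time(4, rate) / 1e6:.0f} million years")
```

With this hypothetical rate, 4 differences correspond to about 20 million years since divergence.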
+[[File:Divergence timeline between species.png|thumb|Divergence timeline between two hypothetical species.]]
+== Process of determining genetic distance ==
+Recent advances in [[DNA sequencing|sequencing technology]], together with comprehensive [[List of biological databases|genomic databases]] and [[Bioinformatics tool|bioinformatics tools]] capable of storing and processing the enormous amounts of data that this technology generates, have greatly improved [[Evolutionary biology|evolutionary studies]] and the understanding of evolutionary relationships among species.<ref>{{Cite journal |last=Koboldt |first=Daniel C. |last2=Steinberg |first2=Karyn Meltz |last3=Larson |first3=David E. |last4=Wilson |first4=Richard K. |last5=Mardis |first5=Elaine R. |date=2013-09 |title=The Next-Generation Sequencing Revolution and Its Impact on Genomics |url=https://linkinghub.elsevier.com/retrieve/pii/S0092867413011410 |journal=Cell |language=en |volume=155 |issue=1 |pages=27–38 |doi=10.1016/j.cell.2013.09.006}}</ref><ref>{{Cite journal |last=Hudson |first=Matthew E. |date=2008-01 |title=Sequencing breakthroughs for genomic ecology and evolutionary biology |url=https://onlinelibrary.wiley.com/doi/10.1111/j.1471-8286.2007.02019.x |journal=Molecular Ecology Resources |language=en |volume=8 |issue=1 |pages=3–17 |doi=10.1111/j.1471-8286.2007.02019.x |issn=1755-098X}}</ref>
+=== Markers for genetic distance ===
+Different [[Molecular marker|biomolecular markers]], such as DNA, RNA and [[amino acid]] (protein) sequences, can be used to determine genetic distance.<ref>{{Cite journal |last=Kartavtsev |first=Yuri Phedorovich |date=2021-05-20 |title=Some Examples of the Use of Molecular Markers for Needs of Basic Biology and Modern Society |url=https://www.mdpi.com/2076-2615/11/5/1473 |journal=Animals |language=en |volume=11 |issue=5 |pages=1473 |doi=10.3390/ani11051473 |issn=2076-2615}}</ref><ref>{{Cite journal |last=Bhandari |first=Vaibhav |last2=Naushad |first2=Hafiz S. |last3=Gupta |first3=Radhey S. |date=2012 |title=Protein based molecular markers provide reliable means to understand prokaryotic phylogeny and support Darwinian mode of evolution |url=http://journal.frontiersin.org/article/10.3389/fcimb.2012.00098/abstract |journal=Frontiers in Cellular and Infection Microbiology |volume=2 |doi=10.3389/fcimb.2012.00098 |issn=2235-2988}}</ref>
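For DNA sequences, the simplest distance is the proportion of aligned sites that differ (the ''p''-distance). Because a site can mutate more than once, ''p'' underestimates the number of substitutions; the Jukes–Cantor (1969) model, a standard correction not named in the text above, estimates ''d'' = −(3/4) ln(1 − 4''p''/3). The Python sketch below assumes the sequences are already aligned, skips gap positions, and applies only to nucleotides (protein sequences need a different model); the function names are illustrative.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two DNA sequences differ.

    Positions with a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p):
    """Jukes-Cantor estimate of substitutions per site from a p-distance."""
    if p >= 0.75:
        return math.inf  # saturated: too many differences to correct
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical aligned fragments differing at 2 of 10 sites.
p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
print(f"p = {p:.2f}, Jukes-Cantor d = {jukes_cantor(p):.3f}")
```

Here ''p'' = 0.20, and the corrected distance is slightly larger (about 0.233), reflecting substitutions hidden by multiple hits.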
+Selecting an appropriate marker for genetic distance involves three choices:<ref name=":0">{{Cite journal |last=Chenuil |first=Anne |date=2006-05 |title=Choosing the right molecular genetic markers for studying biodiversity: from molecular evolution to practical aspects |url=http://link.springer.com/10.1007/s10709-005-2485-1 |journal=Genetica |language=en |volume=127 |issue=1–3 |pages=101–120 |doi=10.1007/s10709-005-2485-1 |issn=0016-6707}}</ref>
+# the level of [[Genetic variability|variability]]
+# the specific region of DNA or RNA
+# the [[DNA Analysis|technique]] used
+The choice of variability depends on the intended outcome. For example, very high variability is recommended for [[Parentage testing|parentage analyses]], medium to high variability for [[Demography|demographic studies]] and for comparing distinct populations, and moderate to very low variability for phylogenetic studies.<ref name=":0" /> The genomic location and [[ploidy]] of the marker are also important factors. The lower the [[Copy number variation|gene copy number]] of a marker, the less robust it is to drift: for example, a [[haploid]] genome such as [[mitochondrial DNA]] is more prone to [[genetic drift]] than a diploid genome such as [[nuclear DNA]].
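The effect of gene copy number on drift can be illustrated with the textbook Wright–Fisher expectation ''H''<sub>''t''</sub> = ''H''<sub>0</sub>(1 − 1/''n'')<sup>''t''</sup>, where ''n'' is the number of gene copies in the population (2''N'' for a diploid nuclear locus, ''N'' for a haploid locus). This Python sketch assumes an idealized population of constant size ''N'' and, for simplicity, ignores that mitochondrial DNA is inherited only through the mother, which reduces its effective size further; the population size, starting heterozygosity and number of generations are hypothetical.

```python
def expected_heterozygosity(h0, gene_copies, generations):
    """Expected heterozygosity left after drift in an idealized Wright-Fisher population.

    gene_copies is the number of copies of the locus in the population
    (2N for a diploid nuclear locus, N for a haploid locus). Each generation
    a fraction 1/gene_copies of the remaining heterozygosity is lost.
    """
    if gene_copies < 1:
        raise ValueError("need at least one gene copy")
    return h0 * (1 - 1 / gene_copies) ** generations


n_individuals = 100  # hypothetical census size, the same for both markers
nuclear = expected_heterozygosity(0.5, 2 * n_individuals, generations=100)
haploid = expected_heterozygosity(0.5, n_individuals, generations=100)
print(f"diploid nuclear locus: {nuclear:.3f}, haploid locus: {haploid:.3f}")
```

With these hypothetical values the haploid locus keeps an expected heterozygosity of about 0.18, against about 0.30 for the nuclear locus, so it drifts faster.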
+Choice of molecular markers for evolutionary biology studies, with examples.<ref name=":0" />
+{| class="wikitable"
+|'''Biodiversity level'''
+|'''Biological issue'''
+|'''Level of variability'''
+|'''Nature of information required'''
+|'''Examples of most used markers'''
+|-
+| rowspan="4" |'''Intra-population'''
+|Fine population structure, reproduction system
+|Medium to high
+|Several (''N'') [[Dominance (genetics)|codominant]] loci, i.e. [[Multilocus genotype|multilocus]] [[genotype]]s
+|[[Microsatellite|Microsatellites]], [[Alloenzyme|allozymes]]
+|-
+|[[DNA profiling|Fingerprinting]], [[Parentage testing|parentage analysis]]
+|Very high
+|Codominant loci or numerous dominant loci
+|Microsatellites ([[Random amplification of polymorphic DNA|RAPD]], [[AFLP-PCR|AFLP]])
+|-
+|Demography
+|Medium to high
+|Allele frequency in samples taken at different times
+|Allozymes, Microsatellites
+|-
+|[[Demographic history]]
+|Medium to high
+|Allele frequency + evolutionary relationships
+|[[Mitochondrial DNA|Mt-DNA]] sequences
+|-
+| rowspan="2" |'''Inter-population'''
+|[[Phylogeography]], definition of evolutionarily significant units (population structure)
+|Medium to high
+|Allele frequency in each population
+|Allozymes, microsatellites (risk of size [[homoplasy]])
+|-
+|[[Conservation biology|Bio-conservation]]
+|Medium
+|Allele evolutionary relationships
+|Mt-DNA (if variable enough)
+|-
+|'''Inter-specific'''
+|Close species
+|[[Circa|ca.]] 1% per [[Million years ago|million years]]
+|No variability within species if possible
+|Sequences of Mt-DNA, [[Internal transcribed spacer|ITS rDNA]]
+|}
+=== Applications of genetic distance ===
+* '''Phylogenetics''': Genetic distances among species can help establish the evolutionary relationships among them and their times of divergence, and can be used to build a comprehensive phylogenetic tree that connects them to their common ancestors.
+* '''Accuracy of genomic prediction''': The genetic distance between the individuals used to train a genomic prediction model and those whose unobserved phenotypes are to be predicted can be used to infer the accuracy of the prediction, which has implications for [[Medical diagnosis|medical diagnostics]] and for plant and animal breeding.<ref>{{Cite journal |last=Scutari |first=Marco |last2=Mackay |first2=Ian |last3=Balding |first3=David |date=2016-09-02 |editor-last=Hickey |editor-first=John Micheal |title=Using Genetic Distance to Infer the Accuracy of Genomic Prediction |url=https://dx.plos.org/10.1371/journal.pgen.1006288 |journal=PLOS Genetics |language=en |volume=12 |issue=9 |pages=e1006288 |doi=10.1371/journal.pgen.1006288 |issn=1553-7404}}</ref>
+* '''[[Population genetics|Population Genetics]]''': Genetic distance helps in studying population genetics and in understanding intra- and inter-population genetic diversity.
+* '''[[Taxonomy (biology)|Taxonomy]] and [[Species delimitation|Species Delimitation]]''': Genetic distances estimated through [[DNA barcoding]] are an effective tool for delimiting species, especially for identifying [[cryptic species]].<ref>{{Cite journal |last=Shin |first=Caren P. |last2=Allmon |first2=Warren D. |date=2023-09 |title=How we study cryptic species and their biological implications: A case study from marine shelled gastropods |url=https://onlinelibrary.wiley.com/doi/10.1002/ece3.10360 |journal=Ecology and Evolution |language=en |volume=13 |issue=9 |doi=10.1002/ece3.10360 |issn=2045-7758}}</ref> A percentage genetic distance threshold optimized for the data and species under study is recommended, because it improves the reliability and applicability of delimitation: such a threshold can delineate species boundaries and identify cryptic species that look similar but are genetically distinct.<ref>{{Cite journal |last=Ma |first=Zhuo |last2=Ren |first2=Jinliang |last3=Zhang |first3=Runzhi |date=2022-03-05 |title=Identifying the Genetic Distance Threshold for Entiminae (Coleoptera: Curculionidae) Species Delimitation via COI Barcodes |url=https://www.mdpi.com/2075-4450/13/3/261 |journal=Insects |language=en |volume=13 |issue=3 |pages=261 |doi=10.3390/insects13030261 |issn=2075-4450}}</ref><ref>{{Cite journal |last=Meyer |first=Christopher P |last2=Paulay |first2=Gustav |date=2005-11-29 |editor-last=Godfray |editor-first=Charles |title=DNA Barcoding: Error Rates Based on Comprehensive Sampling |url=https://dx.plos.org/10.1371/journal.pbio.0030422 |journal=PLoS Biology |language=en |volume=3 |issue=12 |pages=e422 |doi=10.1371/journal.pbio.0030422 |issn=1545-7885}}</ref><ref>{{Cite journal |last=Bianchi |first=Filipe Michels |last2=Gonçalves |first2=Leonardo Tresoldi |date=2021-04-24 |title=Borrowing the Pentatomomorpha tome from the DNA barcode library: Scanning the overall performance of ''cox1'' as a tool |url=http://dx.doi.org/10.1111/jzs.12476 |journal=Journal of Zoological Systematics and Evolutionary Research |volume=59 |issue=5 |pages=992–1012 |doi=10.1111/jzs.12476 |issn=0947-5745}}</ref>
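Threshold-based delimitation can be sketched as grouping specimens whose pairwise genetic distance falls below a chosen cutoff. The Python example below uses single-linkage grouping (any chain of below-threshold distances joins specimens into one group), which is one simple way to apply a threshold and not necessarily the procedure used in the cited studies. The function name, specimen names, distances and the 2% threshold are all hypothetical; in practice the threshold would be optimized for the taxa and data at hand.

```python
def delimit_by_threshold(names, distance, threshold):
    """Group specimens linked by pairwise distances below threshold.

    distance is a symmetric matrix (list of lists) of pairwise genetic
    distances in the same order as names; only the upper triangle is read.
    Grouping is single-linkage, implemented with a union-find structure.
    """
    parent = list(range(len(names)))

    def find(i):
        # Follow parent links to the group representative, halving paths.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if distance[i][j] < threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical barcode distances (proportions) between four specimens.
names = ["A1", "A2", "B1", "B2"]
distance = [
    [0.0, 0.005, 0.08, 0.09],
    [0.005, 0.0, 0.085, 0.08],
    [0.08, 0.085, 0.0, 0.01],
    [0.09, 0.08, 0.01, 0.0],
]
print(delimit_by_threshold(names, distance, threshold=0.02))  # [['A1', 'A2'], ['B1', 'B2']]
```

Raising or lowering the threshold merges or splits groups, which is why an optimized, data-specific value matters.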
+=== Evolutionary forces affecting genetic distance ===
+[[Evolutionary biology|Evolutionary forces]] such as mutation, genetic drift, [[natural selection]] and [[gene flow]] drive evolution and shape genetic diversity. All of these forces play a significant role in determining genetic distance within and among species.<ref>{{Cite journal |last=Saeb |first=Amr T. M. |last2=Al-Naqeb |first2=Dhekra |date=2016 |title=The Impact of Evolutionary Driving Forces on Human Complex Diseases: A Population Genetics Approach |url=http://www.hindawi.com/journals/scientifica/2016/2079704/ |journal=Scientifica |language=en |volume=2016 |pages=1–10 |doi=10.1155/2016/2079704 |issn=2090-908X}}</ref>
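How gene flow counteracts the differentiation produced by drift can be illustrated with a standard population-genetics result, Wright's island model, in which equilibrium differentiation is approximately ''F''<sub>ST</sub> ≈ 1/(1 + 4''Nm''), where ''Nm'' is the number of migrants each subpopulation receives per generation. ''F''<sub>ST</sub> is a measure of differentiation closely related to genetic distance. This Python sketch assumes neutral loci, diploid subpopulations of equal size, a small migration rate and equilibrium; the function name and the values of ''Nm'' are hypothetical.

```python
def island_model_fst(migrants_per_generation):
    """Equilibrium F_ST in Wright's island model, F_ST ~ 1 / (1 + 4Nm).

    migrants_per_generation is N*m: the effective size of each subpopulation
    times the fraction of migrants it receives each generation.
    """
    if migrants_per_generation < 0:
        raise ValueError("N*m cannot be negative")
    return 1 / (1 + 4 * migrants_per_generation)


for nm in (0.1, 1, 10):
    print(f"Nm = {nm:>4}: F_ST = {island_model_fst(nm):.3f}")
```

Under these assumptions about one migrant per generation already holds ''F''<sub>ST</sub> near 0.2, while ten migrants reduce it to about 0.02.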
 ==Measures==