Incomplete lineage sorting

From Wikipedia, the free encyclopedia
Figure 1. The pre-transmission interval and incomplete lineage sorting in the phylogeny of a human-transmissible virus. The shaded tree represents a transmission chain in which each shaded region represents the pathogen population in one of three patients. The width of the shaded regions corresponds to the genetic diversity. In this scenario, A infects B with an imperfect transmission bottleneck, and then B infects C. The genealogy at the bottom is reconstructed from a sample of a single lineage from each patient at three distinct time points. When diversity exists in donor A, a pre-transmission interval will occur at each inferred transmission event (MRCA(A,B) precedes transmission from A to B), and the order of transmission events may become randomized in the virus genealogy. Note that the pre-transmission interval is also a random variable, determined by the donor's diversity at the time of each transmission. Terminal branch lengths are also elongated by these processes.

Incomplete lineage sorting[1][2][3] is a phenomenon in phylogenetic analysis in which the tree produced by a single gene differs from the population- or species-level tree, producing a discordant tree. As a result, an inferred species-level tree may depend on which genes are selected for the analysis.[4][5] In contrast, under complete lineage sorting the tree produced by the gene is the same as the population- or species-level tree. Both outcomes are common in phylogenetic analysis, and which one occurs depends on the gene, the organism, and the sampling technique.


The concept of incomplete lineage sorting has important implications for phylogenetic techniques. It relies on the persistence of polymorphisms across successive speciation events. Suppose two successive speciation events occur: the ancestral species first gives rise to species A and to a lineage that later splits into species B and C. A single gene can exist as multiple haplotypes (a polymorphism), and a haplotype can be lost or fixed in a species by genetic drift. If the ancestral species has two haplotypes, 1 and 2, species A initially contains both, and through genetic drift and divergence by further mutation it can fix haplotype 1a. The lineage leading to species B and C also still contains haplotypes 1 and 2 after the first split, so the sorting of gene lineages in it is incomplete. After the second split, haplotype 2 can become fixed in species B, whereas haplotype 1b can become fixed in species C. A phylogeny based on this gene would then group A with C, which does not represent the actual relationships between the species. In other words, the most closely related species do not necessarily inherit the most closely related haplotypes. This is a simplified example; in real research the situation is usually more complex, involving more genes, more species, or both.[6]
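The three-species example can be illustrated with a minimal simulation sketch. It assumes a standard neutral coalescent model, one sampled lineage per species, and a species tree written as (A,(B,C)). The function names and the `internal_branch` parameter (the length of the branch ancestral to B and C, in coalescent units) are hypothetical and are not taken from the cited sources.

```python
import random

SPECIES_TREE = "(A,(B,C))"  # assumed species tree for this sketch


def simulate_gene_tree(internal_branch, rng):
    """Return the rooted gene-tree topology for one locus sampled from A, B and C.

    internal_branch: length of the branch ancestral to B and C, in coalescent
    units (a hypothetical parameter chosen for illustration).
    """
    # Looking backwards in time, the waiting time until the B and C lineages
    # coalesce is exponential with rate 1 in coalescent units.
    waiting_time = rng.expovariate(1.0)
    if waiting_time < internal_branch:
        # Sorted before the older split: the gene tree matches the species tree.
        return SPECIES_TREE
    # Otherwise all three lineages reach the root population unsorted, and the
    # pair that coalesces first is chosen uniformly at random.
    return rng.choice(["(A,(B,C))", "(B,(A,C))", "(C,(A,B))"])


def discordant_fraction(internal_branch, n_loci, seed=0):
    """Fraction of simulated loci whose gene tree differs from the species tree."""
    rng = random.Random(seed)
    trees = [simulate_gene_tree(internal_branch, rng) for _ in range(n_loci)]
    return sum(tree != SPECIES_TREE for tree in trees) / n_loci


if __name__ == "__main__":
    for branch in (0.1, 0.5, 2.0):
        print(branch, discordant_fraction(branch, 20000, seed=1))
```

In this sketch the topology "(B,(A,C))" corresponds to the outcome described above, in which A and C carry the more closely related haplotypes (1a and 1b). Shorter internal branches leave less time for sorting and give a larger discordant fraction.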

Among primates, chimpanzees and bonobos are more closely related to each other than to any other taxon and are thus sister taxa. Still, for 1.6% of the bonobo genome, sequences are more closely related to their human homologues than to their chimpanzee homologues, which is probably a result of incomplete lineage sorting.[4]


Incomplete lineage sorting has important implications for phylogenetic research: a phylogenetic tree built from a given gene may not reflect the actual relationships between the taxa. However, gene flow between lineages through hybridization or horizontal gene transfer can produce the same kind of conflicting phylogenetic tree. Distinguishing these processes is difficult, but statistical approaches have been, and continue to be, developed to gain greater insight into these evolutionary dynamics.[7] One way to reduce the effect of incomplete lineage sorting is to use multiple genes when building species or population phylogenies. In general, the more genes are used, the more reliable the phylogeny becomes.[6]
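The idea of combining many genes can be sketched as a simple vote over per-locus topologies. This is only an illustration: it is a plain majority count, not one of the summary methods evaluated in the literature cited above, and the list of gene trees below is made-up example data. It also assumes that every topology is written as a string in one consistent format, so that identical trees compare as equal.

```python
from collections import Counter


def most_common_topology(gene_trees):
    """Return the most frequent topology and the share of loci supporting it."""
    counts = Counter(gene_trees)
    topology, count = counts.most_common(1)[0]
    return topology, count / len(gene_trees)


# Hypothetical per-locus gene trees for three species (illustrative data only).
loci = ["(A,(B,C))"] * 6 + ["(B,(A,C))"] * 2 + ["(C,(A,B))"] * 2

if __name__ == "__main__":
    print(most_common_topology(loci))
```

With a single locus, any of the three topologies could be observed. With more loci, the minority topologies caused by incomplete sorting are expected to be outvoted in a simple three-species case such as this one.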

In diploid organisms

Incomplete lineage sorting commonly occurs in sexually reproducing species because a species cannot be traced back to a single individual or breeding pair. When populations are large (i.e., thousands of individuals), each gene has some diversity, and the gene tree includes lineages that predate the species. The larger the population, the longer these ancestral lineages persist. When large ancestral populations are combined with closely timed speciation events, different pieces of DNA retain conflicting affiliations. This makes it hard to determine a common ancestor or the points of branching.[4]
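The dependence on population size and on the time between speciation events can be made concrete with a standard coalescent approximation for three species of diploid organisms: the probability that a gene tree disagrees with the species tree is (2/3)·exp(−t/(2N)), where t is the number of generations between the two speciation events and N is the effective population size of the ancestral lineage. The sketch below assumes that model (neutral evolution, constant N, no gene flow); the function and parameter names are hypothetical.

```python
import math


def discordance_probability(generations, effective_size):
    """Probability that a three-taxon gene tree differs from the species tree.

    Assumes a neutral coalescent in a diploid population of constant effective
    size; both parameters are illustrative and not taken from the source.
    """
    # Time between the two speciation events, expressed in coalescent units.
    branch_in_coalescent_units = generations / (2.0 * effective_size)
    return (2.0 / 3.0) * math.exp(-branch_in_coalescent_units)


if __name__ == "__main__":
    # Same time between speciation events, increasingly large populations.
    for size in (1_000, 10_000, 100_000):
        print(size, round(discordance_probability(20_000, size), 4))
```

Under these assumptions, the probability of discordance rises toward its maximum of 2/3 as the population grows or as the interval between speciation events shrinks.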

In viruses

Incomplete lineage sorting is a common feature in viral phylodynamics. The phylogeny that represents the transmission of a disease from one person to the next, which is to say the population-level tree, often does not correspond to the tree reconstructed from genetic analysis, because of the population bottlenecks that are an inherent feature of viral transmission. Figure 1 illustrates how this can occur. This is relevant to criminal transmission of HIV: in some criminal cases, a phylogenetic analysis of one or two genes from the strains of the accused and the victim has been used to infer transmission. However, because incomplete lineage sorting is so common, transmission cannot be inferred solely on the basis of such a basic analysis.[8]

References


  1. Simpson, Michael G (2010-07-19). Plant Systematics. ISBN 9780080922089.
  2. Kuritzin, A; Kischka, T; Schmitz, J; Churakov, G (2016). "Incomplete Lineage Sorting and Hybridization Statistics for Large-Scale Retroposon Insertion Data". PLOS Computational Biology. 12 (3): e1004812. Bibcode:2016PLSCB..12E4812K. doi:10.1371/journal.pcbi.1004812. PMC 4788455. PMID 26967525.
  3. Suh, A; Smeds, L; Ellegren, H (2015). "The Dynamics of Incomplete Lineage Sorting across the Ancient Adaptive Radiation of Neoavian Birds". PLOS Biology. 13 (8): e1002224. doi:10.1371/journal.pbio.1002224. PMC 4540587. PMID 26284513.
  4. Rogers, Jeffrey; Gibbs, Richard A. (2014-05-01). "Comparative primate genomics: emerging patterns of genome content and dynamics". Nature Reviews Genetics. 15 (5): 347–359. doi:10.1038/nrg3707. PMC 4113315. PMID 24709753.
  5. Shen, Xing-Xing; Hittinger, Chris Todd; Rokas, Antonis (2017). "Contentious relationships in phylogenomic studies can be driven by a handful of genes". Nature Ecology & Evolution. 1 (5): 126. doi:10.1038/s41559-017-0126. ISSN 2397-334X. PMC 5560076. PMID 28812701.
  6. Futuyma, Douglas J. (2013-07-15). Evolution (3rd ed.). Sunderland, Massachusetts U.S.A. ISBN 9781605351155. OCLC 824532153.
  7. Warnow, Tandy; Bayzid, Md Shamsuzzoha; Mirarab, Siavash (2016-05-01). "Evaluating Summary Methods for Multilocus Species Tree Estimation in the Presence of Incomplete Lineage Sorting". Systematic Biology. 65 (3): 366–380. doi:10.1093/sysbio/syu063. ISSN 1063-5157. PMID 25164915.
  8. Leitner, Thomas (May 2019). "Phylogenetics in HIV transmission: taking within-host diversity into account". Current Opinion in HIV and AIDS. 14 (3): 181. doi:10.1097/COH.0000000000000536. ISSN 1746-630X. PMID 30920395.

External links