Multispecies coalescent process: Difference between revisions

Revision as of 03:08, 5 October 2020

The multispecies coalescent process is a stochastic process model that describes the genealogical relationships for a sample of DNA sequences taken from several species.[1] It represents the application of coalescent theory to the case of multiple species. Under the multispecies coalescent, the relationships among species for an individual gene (the gene tree) can differ from the broader history of the species (the species tree). It has important implications for the theory and practice of phylogenetics[2][3] and for understanding genome evolution.

A gene tree is a binary graph that describes the evolutionary relationships between a sample of sequences for a non-recombining locus. A species tree describes the evolutionary relationships between a set of species, assuming tree-like evolution. However, several processes can lead to discordance between gene trees and species trees. The Multispecies Coalescent model provides a framework for inferring species phylogenies while accounting for ancestral polymorphism and gene tree-species tree conflict. The process is also called the Censored Coalescent.[4]

Discordance between gene trees and the species tree can cause characters to appear homoplastic (gained or lost independently in separate lineages) given the relationships among species when those characters actually have a single origin; this phenomenon is called hemiplasy.[5] Many studies of hemiplasy have focused on genomic characters like nucleotide or amino acid substitutions,[6] indels,[7][8] or karyotypic differences[9] (though the last of these is thought to be less subject to hemiplasy than many other characters). However, phenotypic characters also have the potential to exhibit hemiplasy.[10]

Gene tree-species tree congruence

Multispecies coalescent for rooted three-taxon tree
Illustration of the multispecies coalescent showing the relationship between the species tree (black outline) and gene trees (dashed red lines embedded in the species tree). The time between the two speciation events (T, measured in coalescent units) can be used to calculate the probability of the four possible gene trees (using the equations shown). Note that two of the gene trees are topologically identical but they differ in the times at which lineages coalesce.

If we consider a rooted three-taxon tree, the simplest non-trivial phylogenetic tree, there are three different tree topologies[11] but four possible gene trees.[12] The existence of four distinct gene trees despite the smaller number of topologies reflects the fact that there are topologically identical gene trees that differ in their coalescent times. In the type 1 tree the alleles in species A and B coalesce after the speciation event that separated the A-B lineage from the C lineage. In the type 2 tree the alleles in species A and B coalesce before the speciation event that separated the A-B lineage from the C lineage (in other words, the type 2 tree is a deep coalescence tree). The type 1 and type 2 gene trees are both congruent with the species tree. The other two gene trees differ from the species tree; the two discordant gene trees are also deep coalescence trees.

The distribution of times to coalescence is actually continuous for all of these trees. In other words, the exact coalescent time for any two loci with the same gene tree may differ. However, it is convenient to break up the trees based on whether the coalescence occurred before or after the earliest speciation event.

Given the internal branch length in coalescent units it is straightforward to calculate the probability of each gene tree.[13] For diploid organisms the branch length in coalescent units is the number of generations between the speciation events divided by twice the effective population size. A deep coalescence occurs when the A and B lineages fail to coalesce along the internal branch, which happens with probability $e^{-T}$. Since all three of the deep coalescence trees are equiprobable and two of those deep coalescence trees are discordant, it is easy to see that the probability that a rooted three-taxon gene tree will be congruent with the species tree is:

$$P(\text{congruence}) = 1 - \frac{2}{3}e^{-T} = 1 - \frac{2}{3}e^{-t/(2N_e)}$$

where the branch length in coalescent units ($T$) is also written in an alternative form: the number of generations ($t$) divided by twice the effective population size ($N_e$). Pamilo and Nei[13] also derived the probability of congruence for rooted trees of four and five taxa as well as a general upper bound on the probability of congruence for larger trees. Rosenberg[14] followed up with equations used for the complete set of topologies (although the large number of distinct phylogenetic trees that becomes possible as the number of taxa increases[11] makes these equations impractical unless the number of taxa is very limited).
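The three-taxon probabilities can be checked numerically. The following is a minimal Python sketch, assuming a species tree ((A,B),C) and the standard result that the A and B lineages fail to coalesce on the internal branch with probability exp(-T); the function names and dictionary keys are illustrative and do not come from the cited sources.

```python
import math


def coalescent_units(generations, effective_size):
    """Branch length T for diploids: generations divided by 2 * Ne."""
    return generations / (2.0 * effective_size)


def three_taxon_gene_tree_probs(T):
    """Probabilities of the four gene-tree classes for species tree ((A,B),C).

    T is the internal branch length in coalescent units.
    """
    if T < 0:
        raise ValueError("branch length must be non-negative")
    p_deep = math.exp(-T)  # A and B do not coalesce on the internal branch
    return {
        "type1_congruent": 1.0 - p_deep,       # coalescence on the internal branch
        "type2_congruent_deep": p_deep / 3.0,  # deep coalescence, matches species tree
        "discordant_AC": p_deep / 3.0,         # deep coalescence, A pairs with C
        "discordant_BC": p_deep / 3.0,         # deep coalescence, B pairs with C
    }


def prob_congruent(T):
    """Probability that the gene tree topology matches the species tree."""
    return 1.0 - (2.0 / 3.0) * math.exp(-T)


if __name__ == "__main__":
    T = coalescent_units(generations=20000, effective_size=10000)
    print(T, three_taxon_gene_tree_probs(T), prob_congruent(T))
```

At T = 0 the three topologies are equally likely, so the congruence probability falls to 1/3; as T grows it approaches 1.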

Examples of the differences between hemiplasy (which requires gene tree-species tree differences) and true homoplasy (which can occur on a gene tree that is congruent with the species tree or on a gene tree that is discordant with the species tree). This example shows the origins of some trait (blue). The presence (+) or absence (-) of the trait in each species is indicated at the top of the figure. Note that homoplasy can also reflect a single origin followed by a loss.

The phenomenon of hemiplasy is a natural extension of the basic idea underlying gene tree-species tree discordance. If we consider the distribution of some character that disagrees with the species tree it might reflect homoplasy (multiple independent origins of the character or a single origin followed by multiple losses) or it could reflect hemiplasy (a single origin of the trait that is associated with a gene tree that disagrees with the species tree).

The phenomenon called incomplete lineage sorting (often abbreviated ILS in the scientific literature[7]) is linked to hemiplasy. If we examine the illustration of hemiplasy using a rooted four-taxon tree (see image to the right), the lineage between the common ancestor of taxa A, B, and C and the common ancestor of taxa A and B must be polymorphic for the allele with the derived trait (e.g., a transposable element insertion[15]) and the allele with the ancestral trait. The concept of incomplete lineage sorting ultimately reflects the persistence of polymorphisms across one or more speciation events.

Mathematical description of the multispecies coalescent

The probability density of the gene trees under the multispecies coalescent model is discussed along with its use for parameter estimation using multi-locus sequence data.

Assumptions

The species phylogeny is assumed to be known. Complete isolation after species divergence, with no migration, hybridization, or introgression is also assumed. We assume no recombination so that all the sites within the locus share the same gene tree (topology and coalescent times).

Data and model parameters

The model and implementation of this method can be applied to any species tree. As an example, the species tree of the great apes: human (H), chimpanzee (C), gorilla (G) and orangutan (O) is considered. The topology of the species tree, (((HC)G)O), is assumed known and fixed in the analysis (Figure 1).[4] Let $X = \{X_i\}$ be the entire data set, where $X_i$ represents the sequence alignment at locus $i$, with $i = 1, 2, \ldots, L$ for a total of $L$ loci.

The population size of a current species is considered only if more than one individual is sampled from that species at some loci.

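The parameters in the model for the example of Figure 1 include the three divergence times $\tau_{HC}$, $\tau_{HCG}$ and $\tau_{HCGO}$ and the population size parameters $\theta_H$ for humans; $\theta_C$ for chimpanzees; and $\theta_{HC}$, $\theta_{HCG}$ and $\theta_{HCGO}$ for the three ancestral species.

The divergence times ($\tau$'s) are measured by the expected number of mutations per site from the ancestral node in the species tree to the present time (Figure 1 of Rannala and Yang, 2003).

Therefore, the parameters are $\Theta = \{\theta_H, \theta_C, \theta_{HC}, \theta_{HCG}, \theta_{HCGO}, \tau_{HC}, \tau_{HCG}, \tau_{HCGO}\}$.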

Likelihood-based inference

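The gene genealogy at each locus $i$ is represented by the tree topology $T_i$ and the coalescent times $\mathbf{t}_i$. Given the parameters $\Theta$, the probability distribution of $G_i = \{T_i, \mathbf{t}_i\}$ is specified by the coalescent process under the model and is written

$$f(G_i \mid \Theta).$$

The probability of the data $X_i$ given the gene tree and coalescent times (and thus branch lengths) at the locus, $f(X_i \mid G_i)$, is Felsenstein's phylogenetic likelihood.[16] Due to the assumption of independent evolution across the loci,

$$f(G \mid \Theta) = \prod_{i=1}^{L} f(G_i \mid \Theta), \qquad f(X \mid G) = \prod_{i=1}^{L} f(X_i \mid G_i),$$

where $G = \{G_1, \ldots, G_L\}$. Bayesian inference is based on the joint conditional distribution

$$f(\Theta, G \mid X) = \frac{1}{f(X)}\, f(\Theta)\, f(G \mid \Theta)\, f(X \mid G),$$

where $f(\Theta)$ is the prior on the parameters. Then, the posterior distribution of $\Theta$ is given by

$$f(\Theta \mid X) = \int f(\Theta, G \mid X)\, \mathrm{d}G,$$

where the integration represents summation over all possible gene tree topologies and integration over the coalescent times at each locus.[17]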

Distribution of gene genealogies

The joint distribution of $G_i = \{T_i, \mathbf{t}_i\}$ is derived directly in this section. Two sequences from different species can coalesce only in populations that are ancestral to both species. For example, sequences H and G can coalesce in populations HCG or HCGO, but not in populations H or HC. The coalescent processes in different populations are different.

For each population, the genealogy is traced backward in time, until the end of the population at time $\tau$, and the number of lineages entering the population ($m$) and the number of lineages leaving it ($n$) are recorded. The values of $m$ and $n$ for each population in the example, such as population H, are listed in Table 1 of Rannala and Yang.[4] This process is called a censored coalescent process because the coalescent process for one population may be terminated before all lineages that entered the population have coalesced. If $n > 1$, the genealogy within the population consists of $n$ disconnected subtrees or lineages.

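With one time unit defined as the time taken to accumulate one mutation per site, any two lineages coalesce at the rate $2/\theta$. The waiting time $t_j$ until the next coalescent event, which reduces the number of lineages from $j$ to $j-1$, has exponential density

$$f(t_j) = \frac{j(j-1)}{2}\,\frac{2}{\theta}\exp\left\{-\frac{j(j-1)}{2}\,\frac{2}{\theta}\,t_j\right\}, \qquad j = m, m-1, \ldots, n+1.$$

If $n > 1$, no coalescent event occurs between the last one and the end of the population at time $\tau$, i.e. during the time interval $\tau - (t_m + t_{m-1} + \cdots + t_{n+1})$. The probability of this is $\exp\left\{-\frac{n(n-1)}{\theta}\left[\tau - (t_m + t_{m-1} + \cdots + t_{n+1})\right]\right\}$, and it is 1 if $n = 1$.

(Note: the probability of no events over a time interval $t$ for a Poisson process with rate $\lambda$ is $e^{-\lambda t}$. Here the coalescent rate when there are $n$ lineages is $\frac{n(n-1)}{2}\cdot\frac{2}{\theta} = \frac{n(n-1)}{\theta}$.)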
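The censoring described above can be illustrated by simulation. The sketch below is one possible Python rendering, assuming that with j lineages the total coalescent rate is j(j-1)/θ (each of the j(j-1)/2 pairs coalescing at rate 2/θ) and that the population ends at time τ; the function name and arguments are hypothetical.

```python
import random


def simulate_censored_coalescent(m, theta, tau, rng=None):
    """Simulate coalescent waiting times within a single population.

    m     -- number of lineages entering the population
    theta -- population size parameter (each pair coalesces at rate 2/theta)
    tau   -- time at which the population ends (float('inf') for the root)

    Returns (n, waits): the number of lineages leaving the population and
    the waiting times [t_m, t_{m-1}, ..., t_{n+1}].
    """
    rng = rng or random.Random()
    j = m
    elapsed = 0.0
    waits = []
    while j > 1:
        rate = j * (j - 1) / theta  # total coalescent rate with j lineages
        t = rng.expovariate(rate)   # exponential waiting time
        if elapsed + t >= tau:
            break                   # censored: the population ends first
        waits.append(t)
        elapsed += t
        j -= 1
    return j, waits


if __name__ == "__main__":
    n, waits = simulate_censored_coalescent(m=5, theta=0.01, tau=0.002,
                                            rng=random.Random(42))
    print(n, waits)
```

The number of recorded waiting times always equals m - n, and their sum never exceeds τ, which is the sense in which the process is censored.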

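In addition, to derive the probability of a particular gene tree topology in the population, note that if a coalescent event occurs in a sample of $j$ lineages, the probability that a particular pair of lineages coalesces is $1/\binom{j}{2} = \frac{2}{j(j-1)}$.

Multiplying these probabilities together gives the joint probability distribution of the gene tree topology in the population and its coalescent times $t_m, t_{m-1}, \ldots, t_{n+1}$ as

$$\prod_{j=n+1}^{m}\left[\frac{2}{\theta}\exp\left\{-\frac{j(j-1)}{\theta}\,t_j\right\}\right]\times\exp\left\{-\frac{n(n-1)}{\theta}\left[\tau-(t_m+t_{m-1}+\cdots+t_{n+1})\right]\right\}.$$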

The probability of the gene tree and coalescent times for the locus is the product of such probabilities across all the populations. The explicit expression for the gene genealogy of Figure 1 is given in the cited sources.[4][18]
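The per-population density can be evaluated on the log scale, where the product across populations becomes a sum. The following Python sketch assumes the density takes the form described above: a factor (2/θ)·exp(-j(j-1)t_j/θ) for each coalescent event with j lineages, and a factor exp(-n(n-1)(τ - Σt_j)/θ) for the n lineages that leave without coalescing. The function names and the input layout are hypothetical.

```python
import math


def log_density_population(wait_times, n, theta, tau):
    """Log density of a gene tree topology and coalescent times in one population.

    wait_times -- [t_m, t_{m-1}, ..., t_{n+1}], one entry per coalescent event
    n          -- number of lineages leaving the population
    theta      -- population size parameter
    tau        -- time at which the population ends (float('inf') for the root)
    """
    m = n + len(wait_times)
    elapsed = sum(wait_times)
    if elapsed > tau or any(t < 0 for t in wait_times):
        return float("-inf")  # times not compatible with the population
    logp = 0.0
    # One term per coalescent event, with j = m, m-1, ..., n+1 lineages.
    for j, t in zip(range(m, n, -1), wait_times):
        logp += math.log(2.0 / theta) - j * (j - 1) * t / theta
    # No coalescence among the n remaining lineages until the population ends.
    if n > 1:
        logp -= n * (n - 1) * (tau - elapsed) / theta
    return logp


def log_density_locus(populations):
    """Sum the per-population log densities (a product on the natural scale)."""
    return sum(log_density_population(**pop) for pop in populations)


if __name__ == "__main__":
    pops = [
        {"wait_times": [0.001], "n": 1, "theta": 0.002, "tau": 0.005},
        {"wait_times": [], "n": 2, "theta": 0.002, "tau": 0.001},
    ]
    print(log_density_locus(pops))
```

Working with logarithms avoids numerical underflow when many populations and loci are multiplied together.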

Impact on phylogenetic estimation

The multispecies coalescent has profound implications for the theory and practice of molecular phylogenetics.[2][3] Since individual gene trees can differ from the species tree, one cannot estimate the tree for a single locus and assume that the gene tree corresponds to the species tree. In fact, one can be virtually certain that any individual gene tree will differ from the species tree for at least some relationships when any reasonable number of taxa are considered. However, gene tree-species tree discordance has an impact on the theory and practice of species tree estimation that goes beyond the simple observation that one cannot use a single gene tree to estimate the species tree, because there is a part of parameter space where the most frequent gene tree is incongruent with the species tree. This part of parameter space is called the anomaly zone,[19] and any discordant gene trees that are expected to arise more often than the gene tree that matches the species tree are called anomalous gene trees.

The existence of the anomaly zone implies that one cannot simply estimate a large number of gene trees and assume the gene tree recovered the largest number of times is the species tree. Of course, estimating the species tree by a "democratic vote" of gene trees would only work for a limited number of taxa outside of the anomaly zone given the extremely large number of phylogenetic trees that are possible.[11] However, the existence of the anomalous gene trees also means that simple methods for combining gene trees, like the majority rule extended ("greedy") consensus method or the matrix representation with parsimony (MRP) supertree[20][21] approach, will not be consistent estimators of the species tree[22][23] (i.e., they will be misleading). Simply generating the majority-rule consensus tree for the gene trees, where groups that are present in at least 50% of gene trees are retained, will not be misleading as long as a sufficient number of gene trees are used.[22] However, this ability of the majority-rule consensus tree for a set of gene trees to avoid incorrect clades comes at the cost of having unresolved groups.
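The difference between a "democratic vote" and a majority-rule consensus can be made concrete with a small Python sketch. It assumes each rooted gene tree is represented as a frozenset of its non-trivial clades (each clade a frozenset of taxon names); this representation and the function names are illustrative choices, not those of any particular package. The sketch uses the strict form of majority rule (clades in more than half of the gene trees), which guarantees that the retained clades are mutually compatible.

```python
from collections import Counter


def democratic_vote(gene_trees):
    """Return the most frequent gene tree topology and its count."""
    return Counter(gene_trees).most_common(1)[0]


def majority_rule_clades(gene_trees, threshold=0.5):
    """Return the clades found in more than `threshold` of the gene trees."""
    clade_counts = Counter(clade for tree in gene_trees for clade in tree)
    total = len(gene_trees)
    return {clade for clade, count in clade_counts.items()
            if count / total > threshold}


if __name__ == "__main__":
    # Three rooted topologies on taxa A, B, C, D, written as sets of clades.
    x = frozenset({frozenset("AB"), frozenset("ABC")})  # (((A,B),C),D)
    y = frozenset({frozenset("BC"), frozenset("ABC")})  # ((A,(B,C)),D)
    z = frozenset({frozenset("AB"), frozenset("CD")})   # ((A,B),(C,D))
    trees = [x, x, x, y, z, z]
    print(democratic_vote(trees))
    print(majority_rule_clades(trees))
```

The vote returns a single fully resolved topology, which may be an anomalous gene tree inside the anomaly zone, whereas the majority-rule set may leave some groups unresolved.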

Simulations have shown that there are parts of species tree parameter space where maximum likelihood estimates of phylogeny are incorrect trees with increasing probability as the amount of data analyzed increases.[24] This is important because the "concatenation approach," where multiple sequence alignments from different loci are concatenated to form a single large supermatrix alignment that is then used for maximum likelihood (or Bayesian MCMC) analysis, is both easy to implement and commonly used in empirical studies. This represents a case of model misspecification because the concatenation approach implicitly assumes that all gene trees have the same topology.[25] Indeed, it has now been proven that analyses of data generated under the multispecies coalescent using maximum likelihood analysis of concatenated data are not guaranteed to converge on the true species tree as the number of loci used for the analysis increases[26][27][28] (i.e., maximum likelihood concatenation is statistically inconsistent).
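For readers unfamiliar with concatenation, the construction of a supermatrix can be sketched in a few lines of Python. The sketch assumes each locus is a dictionary mapping taxon names to aligned sequences of equal length, and it pads taxa missing from a locus with gap characters; both the data layout and the padding rule are assumptions made for illustration.

```python
def concatenate_alignments(loci):
    """Join per-locus alignments into one supermatrix alignment.

    loci -- list of dicts mapping taxon name -> aligned sequence
    Taxa absent from a locus are filled with '-' for that locus.
    """
    taxa = sorted(set().union(*loci))
    parts = {taxon: [] for taxon in taxa}
    for alignment in loci:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a locus must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(alignment.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


if __name__ == "__main__":
    locus1 = {"A": "ACGT", "B": "ACGA"}
    locus2 = {"A": "TT", "C": "TA"}
    print(concatenate_alignments([locus1, locus2]))
```

The boundaries between loci are discarded by this step, so a single tree is fitted to all sites; that is the implicit assumption of a shared gene tree topology.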

Software for inference using the multispecies coalescent

There are three basic methods for phylogenetic estimation in the multispecies coalescent framework: 1) full likelihood methods, which integrate over the uncertainty in gene trees (the approach originally proposed by Maddison 1997[2]); 2) gene tree summary methods, which accept a set of gene trees as input and output an estimate of the species tree; and 3) site pattern methods, which produce an estimate of the species tree directly from aligned sites.

{| class="wikitable"
|+Software for phylogenetic estimation in the multispecies coalescent framework
!Program
!Description
!Method
!References
|-
|[https://github.com/smirarab/ASTRAL ASTRAL]
|ASTRAL (Accurate Species TRee ALgorithm) summarizes a set of gene trees using a quartet method to generate an estimate of the species tree with coalescent branch lengths and support values (local posterior probabilities[29])
|Summary
|Mirarab et al. (2014)[30]; Zhang et al. (2018)[31]
|-
|[http://abacus.gene.ucl.ac.uk/software.html BPP]
|Software package for inferring phylogeny and divergence times among populations under the multispecies coalescent process; also includes a method for species delimitation
|Full likelihood
|Yang (2015)[32]; Flouri et al. (2018)[33]
|-
|MP-EST
|Accepts a set of gene trees as input and generates the maximum pseudolikelihood estimate of the species tree
|Summary
|Liu et al. (2010)[34]
|-
|SVDquartets (in [https://paup.phylosolutions.com/ PAUP*])
|PAUP* is a general phylogenetic estimation package that implements many methods. SVDquartets is a method that has been shown to be statistically consistent for data generated under the multispecies coalescent
|Site-pattern method
|Chifman and Kubatko (2014)[35]
|}

References

  1. ^ Degnan JH, Rosenberg NA (June 2009). "Gene tree discordance, phylogenetic inference and the multispecies coalescent". Trends in Ecology & Evolution. 24 (6): 332–40. doi:10.1016/j.tree.2009.01.009. PMID 19307040.
  2. ^ a b c Maddison, Wayne P. (1997-09-01). "Gene Trees in Species Trees". Systematic Biology. 46 (3): 523–536. doi:10.1093/sysbio/46.3.523. ISSN 1063-5157.
  3. ^ a b Edwards SV (January 2009). "Is a new and general theory of molecular systematics emerging?". Evolution; International Journal of Organic Evolution. 63 (1): 1–19. doi:10.1111/j.1558-5646.2008.00549.x. PMID 19146594.
  4. ^ a b c d Rannala B, Yang Z (August 2003). "Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci". Genetics. 164 (4): 1645–56. PMC 1462670. PMID 12930768.
  5. ^ Avise JC, Robinson TJ (June 2008). Kubatko L (ed.). "Hemiplasy: a new term in the lexicon of phylogenetics". Systematic Biology. 57 (3): 503–7. doi:10.1080/10635150802164587. PMID 18570042.
  6. ^ Mendes FK, Hahn Y, Hahn MW (December 2016). "Gene Tree Discordance Can Generate Patterns of Diminishing Convergence over Time". Molecular Biology and Evolution. 33 (12): 3299–3307. doi:10.1093/molbev/msw197. PMID 27634870.
  7. ^ a b Jarvis ED, Mirarab S, Aberer AJ, Li B, Houde P, Li C, et al. (December 2014). "Whole-genome analyses resolve early branches in the tree of life of modern birds". Science. 346 (6215): 1320–31. doi:10.1126/science.1253451. PMC 4405904. PMID 25504713.
  8. ^ Houde P, Braun EL, Narula N, Minjares U, Mirarab S (2019-07-06). "Phylogenetic Signal of Indels and the Neoavian Radiation". Diversity. 11 (7): 108. doi:10.3390/d11070108.
  9. ^ Robinson TJ, Ruiz-Herrera A, Avise JC (September 2008). "Hemiplasy and homoplasy in the karyotypic phylogenies of mammals". Proceedings of the National Academy of Sciences of the United States of America. 105 (38): 14477–81. doi:10.1073/pnas.0807433105. PMC 2567171. PMID 18787123.
  10. ^ Guerrero RF, Hahn MW (December 2018). "Quantifying the risk of hemiplasy in phylogenetic inference". Proceedings of the National Academy of Sciences of the United States of America. 115 (50): 12787–12792. doi:10.1073/pnas.1811268115. PMC 6294915. PMID 30482861.
  11. ^ a b c Felsenstein, Joseph (March 1978). "The Number of Evolutionary Trees". Systematic Zoology. 27 (1): 27. doi:10.2307/2412810.
  12. ^ Hobolth A, Christensen OF, Mailund T, Schierup MH (February 2007). "Genomic relationships and speciation times of human, chimpanzee, and gorilla inferred from a coalescent hidden Markov model". PLoS Genetics. 3 (2): e7. doi:10.1371/journal.pgen.0030007. PMC 1802818. PMID 17319744.
  13. ^ a b Pamilo P, Nei M (September 1988). "Relationships between gene trees and species trees". Molecular Biology and Evolution. 5 (5): 568–83. doi:10.1093/oxfordjournals.molbev.a040517. PMID 3193878.
  14. ^ Rosenberg NA (March 2002). "The probability of topological concordance of gene trees and species trees". Theoretical Population Biology. 61 (2): 225–47. doi:10.1006/tpbi.2001.1568. PMID 11969392.
  15. ^ Suh A, Smeds L, Ellegren H (August 2015). Penny D (ed.). "The Dynamics of Incomplete Lineage Sorting across the Ancient Adaptive Radiation of Neoavian Birds". PLoS Biology. 13 (8): e1002224. doi:10.1371/journal.pbio.1002224. PMC 4540587. PMID 26284513.
  16. ^ Felsenstein J (1981). "Evolutionary trees from DNA sequences: a maximum likelihood approach". Journal of Molecular Evolution. 17 (6): 368–76. doi:10.1007/BF01734359. PMID 7288891.
  17. ^ Xu B, Yang Z (December 2016). "Challenges in Species Tree Estimation Under the Multispecies Coalescent Model". Genetics. 204 (4): 1353–1368. doi:10.1534/genetics.116.190173. PMC 5161269. PMID 27927902.
  18. ^ Yang, Ziheng (2014). Molecular evolution : a statistical approach (First ed.). Oxford: Oxford University Press. pp. Chapter 9. ISBN 9780199602605. OCLC 869346345.
  19. ^ Degnan JH, Rosenberg NA (May 2006). Wakeley J (ed.). "Discordance of species trees with their most likely gene trees". PLoS Genetics. 2 (5): e68. doi:10.1371/journal.pgen.0020068. PMC 1464820. PMID 16733550.
  20. ^ Baum, Bernard R. (February 1992). "Combining trees as a way of combining data sets for phylogenetic inference, and the desirability of combining gene trees". TAXON. 41 (1): 3–10. doi:10.2307/1222480. ISSN 0040-0262.
  21. ^ Ragan, Mark A. (March 1992). "Phylogenetic inference based on matrix representation of trees". Molecular Phylogenetics and Evolution. 1 (1): 53–58. doi:10.1016/1055-7903(92)90035-F.
  22. ^ a b Degnan JH, DeGiorgio M, Bryant D, Rosenberg NA (February 2009). "Properties of consensus methods for inferring species trees from gene trees". Systematic Biology. 58 (1): 35–54. doi:10.1093/sysbio/syp008. PMC 2909780. PMID 20525567.
  23. ^ Wang, Yuancheng; Degnan, James H. (2011-05-02). "Performance of Matrix Representation with Parsimony for Inferring Species from Gene Trees". Statistical Applications in Genetics and Molecular Biology. 10 (1). doi:10.2202/1544-6115.1611.
  24. ^ Kubatko LS, Degnan JH (February 2007). Collins T (ed.). "Inconsistency of phylogenetic estimates from concatenated data under coalescence". Systematic Biology. 56 (1): 17–24. doi:10.1080/10635150601146041. PMID 17366134.
  25. ^ Warnow T (May 2015). "Concatenation Analyses in the Presence of Incomplete Lineage Sorting". PLoS Currents. 7. doi:10.1371/currents.tol.8d41ac0f13d1abedf4c4a59f5d17b1f7. PMC 4450984. PMID 26064786.
  26. ^ Roch S, Steel M (March 2015). "Likelihood-based tree reconstruction on a concatenation of aligned sequence data sets can be statistically inconsistent". Theoretical Population Biology. 100C: 56–62. doi:10.1016/j.tpb.2014.12.005. PMID 25545843.
  27. ^ Mendes FK, Hahn MW (January 2018). "Why Concatenation Fails Near the Anomaly Zone". Systematic Biology. 67 (1): 158–169. doi:10.1093/sysbio/syx063. PMID 28973673.
  28. ^ Roch, Sebastien; Nute, Michael; Warnow, Tandy (2019-03-01). Kubatko, Laura (ed.). "Long-Branch Attraction in Species Tree Estimation: Inconsistency of Partitioned Likelihood and Topology-Based Summary Methods". Systematic Biology. 68 (2): 281–297. doi:10.1093/sysbio/syy061. ISSN 1063-5157.
  29. ^ Sayyari, Erfan; Mirarab, Siavash (July 2016). "Fast Coalescent-Based Computation of Local Branch Support from Quartet Frequencies". Molecular Biology and Evolution. 33 (7): 1654–1668. doi:10.1093/molbev/msw079. ISSN 0737-4038. PMC 4915361. PMID 27189547.
  30. ^ Mirarab, S.; Reaz, R.; Bayzid, Md. S.; Zimmermann, T.; Swenson, M. S.; Warnow, T. (2014-09-01). "ASTRAL: genome-scale coalescent-based species tree estimation". Bioinformatics. 30 (17): i541–i548. doi:10.1093/bioinformatics/btu462. ISSN 1460-2059. PMC 4147915. PMID 25161245.
  31. ^ Zhang, Chao; Rabiee, Maryam; Sayyari, Erfan; Mirarab, Siavash (May 2018). "ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees". BMC Bioinformatics. 19 (S6): 153. doi:10.1186/s12859-018-2129-y. ISSN 1471-2105. PMC 5998893. PMID 29745866.
  32. ^ Yang, Ziheng (2015-10-01). "The BPP program for species tree estimation and species delimitation". Current Zoology. 61 (5): 854–865. doi:10.1093/czoolo/61.5.854. ISSN 2396-9814.
  33. ^ Flouri, Tomáš; Jiao, Xiyun; Rannala, Bruce; Yang, Ziheng (2018-10-01). Yoder, Anne D (ed.). "Species Tree Inference with BPP Using Genomic Sequences and the Multispecies Coalescent". Molecular Biology and Evolution. 35 (10): 2585–2593. doi:10.1093/molbev/msy147. ISSN 0737-4038. PMC 6188564. PMID 30053098.
  34. ^ Liu, Liang; Yu, Lili; Edwards, Scott V (2010). "A maximum pseudo-likelihood approach for estimating species trees under the coalescent model". BMC Evolutionary Biology. 10 (1): 302. doi:10.1186/1471-2148-10-302. ISSN 1471-2148. PMC 2976751. PMID 20937096.
  35. ^ Chifman, Julia; Kubatko, Laura (2014-12-01). "Quartet Inference from SNP Data Under the Coalescent Model". Bioinformatics. 30 (23): 3317–3324. doi:10.1093/bioinformatics/btu530. ISSN 1460-2059. PMC 4296144. PMID 25104814.