Bayesian inference in phylogeny

From Wikipedia, the free encyclopedia

Bayesian inference of phylogeny uses a likelihood function, a model of evolution, and prior probabilities to create a quantity called the posterior probability of trees, from which the most probable phylogenetic tree for the given data is identified. The Bayesian approach has become popular due to advances in computing speed and the integration of Markov chain Monte Carlo (MCMC) algorithms. Bayesian inference has a number of applications in molecular phylogenetics, systematics and evolutionary biology.

Bayesian Inference of Phylogeny Background and Bases

Bayesian inference of phylogeny has increased in popularity among phylogeneticists since the early 1990s. Based on Bayes' theorem, it combines the prior probability of a tree, Pr(Tree), with the likelihood of the data, Pr(Data|Tree), to produce a posterior probability distribution on trees, Pr(Tree|Data). The posterior probability of a tree indicates the probability that the tree is correct, and the tree with the highest posterior probability is chosen as the best representation of the phylogeny. Advantages over traditional parsimony and maximum likelihood methods include the ability to account for phylogenetic uncertainty, to use prior information, and to incorporate complex models of evolution that are computationally prohibitive for traditional methods.

Even so, computing the posterior probability involves a summation over all trees and, for each tree, integration over all possible combinations of substitution-model parameter values and branch lengths. The development of numerical methods to overcome these obstacles revolutionized Bayesian inference; the most widely used is Markov chain Monte Carlo (MCMC).[1] One MCMC iteration can be described in three steps: first, a new state for the Markov chain is proposed using a stochastic mechanism. Second, the acceptance probability of this new state is calculated. Third, a uniform random variable on (0,1) is drawn; if it is less than the acceptance probability, the new state is accepted and the state of the chain is updated. This process is run for thousands or millions of iterations. The proportion of time a single tree is visited over the course of the chain is a valid approximation of its posterior probability.

Recall that for Bayesian inference:

 p(\theta \mid D) = \frac{p(D \mid \theta)\,p(\theta)}{p(D)}

The denominator p(D) is the marginal probability of the data, averaged over all possible parameter values weighted by their prior distribution. Formally,

p(D) = \int_\Theta p(D \mid \theta)\,p(\theta)\,d\theta

where \Theta is the parameter space for \theta.

In the original Metropolis algorithm,[2] given a current \theta-value x and a proposed \theta-value y, the new value is accepted with probability \min(1, h(y)/h(x)), where:

h(y)/h(x) = \frac{p(D \mid y)\,p(y)}{p(D \mid x)\,p(x)}
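Because the intractable marginal p(D) cancels out of this ratio, a Metropolis sampler needs only the unnormalized posterior h. A minimal sketch in Python (the function names are illustrative, not from any phylogenetics package):

```python
import math
import random

def metropolis_step(x, log_h, propose):
    """One Metropolis update. Only the unnormalized posterior h enters,
    via the ratio h(y)/h(x), so the marginal p(D) never needs computing."""
    y = propose(x)
    # accept with probability min(1, h(y)/h(x)), computed in log space
    if math.log(random.random()) < log_h(y) - log_h(x):
        return y   # accept the proposal
    return x       # reject: the chain stays at its current state
```

For example, with log_h(t) = -t for t > 0 (an exponential density up to a constant) and a Gaussian random-walk proposal, the long-run sample mean of the chain approaches the exponential mean of 1.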

Metropolis-Hastings algorithms

The LOCAL algorithm of Larget and Simon

The LOCAL algorithm begins by selecting an internal branch of the tree at random. The nodes at the ends of this branch are each connected to two other branches. One of each pair is chosen at random. Imagine taking these three selected edges and stringing them like a clothesline from left to right, where the direction (left/right) is also selected at random. The two endpoints of the first branch selected will have a sub-tree hanging like a piece of clothing strung to the line. The algorithm proceeds by multiplying the three selected branches by a common random amount, akin to stretching or shrinking the clothesline. Finally the leftmost of the two hanging sub-trees is disconnected and reattached to the clothesline at a location selected uniformly at random. This is the candidate tree.

Suppose we began by selecting the internal branch with length t_8 (in Figure (a) (to be added)) that separates taxa A and B from the rest. Suppose also that we have (randomly) selected branches with lengths t_1 and t_9 from each side, and that we oriented these branches as shown in Figure (b). Let m = t_1 + t_8 + t_9 be the current length of the clothesline. We select the new length to be m^{\star} = m\exp(\lambda(U_1 - 0.5)), where U_1 is a uniform random variable on (0,1). Then for the LOCAL algorithm, the acceptance probability can be computed to be:

\frac{h(y)}{h(x)} \times \frac{(m^{\star})^3}{m^3}
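The branch-length part of this move can be sketched as follows (Python; the topological reattachment bookkeeping is omitted, so this is an illustrative fragment rather than a full implementation):

```python
import math
import random

def local_branch_update(t1, t8, t9, lam=1.0):
    """Stretch or shrink the three selected branches (the 'clothesline')
    by a common random factor, as in the LOCAL move of Larget and Simon."""
    m = t1 + t8 + t9                              # current clothesline length
    u1 = random.random()                          # U_1 ~ Uniform(0, 1)
    m_star = m * math.exp(lam * (u1 - 0.5))       # new length m* = m exp(lam (U1 - 0.5))
    scale = m_star / m
    # point at which the detached subtree is reattached, uniform on (0, m*)
    reattach = random.uniform(0.0, m_star)
    # proposal factor (m*/m)^3 that multiplies h(y)/h(x) in the acceptance probability
    hastings = scale ** 3
    return (t1 * scale, t8 * scale, t9 * scale, reattach, hastings)
```

The three returned branch lengths sum to m*, and the (m*/m)^3 factor is the term appearing in the acceptance probability above.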

Assessing convergence

Suppose we want to estimate a branch length of a 2-taxon tree under the JC model, in which n_1 sites are unvaried and n_2 are variable. Assume an exponential prior distribution with rate \lambda; the density is p(t) = \lambda e^{-\lambda t}. The probabilities of the possible site patterns are:

\frac{1}{4}\left(\frac{1}{4} + \frac{3}{4}e^{-4t/3}\right)

for unvaried sites, and

\frac{1}{4}\left(\frac{1}{4} - \frac{1}{4}e^{-4t/3}\right)

for variable sites.
Thus the unnormalized posterior distribution is:

 h(t) = \left(\frac{1}{4}\right)^{n_1+n_2}\left(\frac{1}{4} + \frac{3}{4}e^{-4t/3}\right)^{n_1}\left(\frac{1}{4} - \frac{1}{4}e^{-4t/3}\right)^{n_2}\left(\lambda e^{-\lambda t}\right)

Update the branch length by choosing a new value uniformly at random from a window of half-width w centered at the current value:

 t^\star = |t+U|

where U is uniformly distributed between -w and w. Since reflection at zero keeps the proposal symmetric, the acceptance probability is:

 \min\left(1, \frac{h(t^\star)}{h(t)}\right)
Example: n_1 = 70, n_2 = 30. We will compare results for two values of w: w = 0.1 and w = 0.5. In each case, we begin with an initial length of 5 and update the length 2000 times. (See Figure 3.2 (to be added) for results.)
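This example is small enough to simulate directly. A sketch in Python under the assumptions above (the prior rate \lambda = 1 is our assumption; the text does not fix it):

```python
import math
import random

def log_h(t, n1=70, n2=30, lam=1.0):
    """Log of the unnormalized posterior h(t) for the JC branch length,
    with an exponential(lam) prior; lam = 1 is an illustrative choice."""
    e = math.exp(-4.0 * t / 3.0)
    return ((n1 + n2) * math.log(0.25)
            + n1 * math.log(0.25 + 0.75 * e)
            + n2 * math.log(0.25 - 0.25 * e)
            + math.log(lam) - lam * t)

def run_chain(w, n_iter=2000, t0=5.0, seed=1):
    """Sliding-window Metropolis sampler: t* = |t + U|, U ~ Uniform(-w, w)."""
    random.seed(seed)
    t, samples = t0, []
    for _ in range(n_iter):
        t_star = abs(t + random.uniform(-w, w))   # reflect at zero
        if math.log(random.random()) < log_h(t_star) - log_h(t):
            t = t_star
        samples.append(t)
    return samples
```

With w = 0.5 the chain descends from t = 5 to the high-posterior region quickly; with w = 0.1 it takes many more iterations, which is the point of the comparison in the text.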

Metropolis-coupled MCMC (Geyer)

If the target distribution has multiple peaks, separated by low valleys, the Markov chain may have difficulty moving from one peak to another. As a result, the chain may get stuck on one peak and the resulting samples will not approximate the posterior density correctly. This is a serious practical concern for phylogeny reconstruction, as multiple local peaks are known to exist in tree space during heuristic tree search under maximum parsimony (MP), maximum likelihood (ML), and minimum evolution (ME) criteria, and the same can be expected for stochastic tree search using MCMC. Many strategies have been proposed to improve the mixing of Markov chains in the presence of multiple local peaks in the posterior density. One of the most successful is Metropolis-coupled MCMC (or \mathrm{MC}^3).

In this algorithm, m chains are run in parallel, with different stationary distributions \pi_j(\cdot), j = 1, 2, \ldots, m, where the first, \pi_1 = \pi, is the target density, while \pi_j, j = 2, 3, \ldots, m, are chosen to improve mixing. For example, one can choose incremental heating of the form:

 \pi_j(\theta) = \pi(\theta)^{1/[1+\lambda(j-1)]}, \ \ \lambda > 0,

so that the first chain is the cold chain with the correct target density, while chains 2, 3, \ldots, m are heated chains. Note that raising the density \pi(\cdot) to the power 1/T with T > 1 has the effect of flattening out the distribution, similar to heating a metal. In such a distribution, it is easier to traverse between peaks (separated by valleys) than in the original distribution. After each iteration, a swap of states between two randomly chosen chains is proposed through a Metropolis-type step. Let \theta^{(j)} be the current state in chain j, j = 1, 2, \ldots, m. A swap between the states of chains i and j is accepted with probability:

 \alpha = \min\left(1, \frac{\pi_i(\theta^{(j)})\,\pi_j(\theta^{(i)})}{\pi_i(\theta^{(i)})\,\pi_j(\theta^{(j)})}\right)

At the end of the run, output from only the cold chain is used, while the samples from the hot chains are discarded. Heuristically, the hot chains will visit the local peaks rather easily, and swapping states between chains will let the cold chain occasionally jump valleys, leading to better mixing. However, if \pi_i(\theta)/\pi_j(\theta) is unstable, proposed swaps will seldom be accepted. This is the reason for using several chains which differ only incrementally. (See Figure 3.3 (to be added).)

An obvious disadvantage of the algorithm is that m chains are run and only one chain is used for inference. For this reason, \mathrm{MC}^3 is ideally suited for implementation on parallel machines, since each chain will in general require the same amount of computation per iteration.
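The mechanics can be illustrated on a deliberately bimodal one-dimensional target, a toy stand-in for a multi-peaked posterior over trees. The incremental heating follows the formula above; the target, proposal scale, chain count and heating rate are all illustrative assumptions:

```python
import math
import random

def log_target(x):
    """Toy bimodal target: equal mixture of normals centered at -4 and +4."""
    a = math.exp(-0.5 * (x + 4.0) ** 2)
    b = math.exp(-0.5 * (x - 4.0) ** 2)
    return math.log(0.5 * a + 0.5 * b)

def mc3(n_iter=20000, m=4, lam=0.5, seed=7):
    """Metropolis-coupled MCMC: chain j targets pi^beta_j with
    beta_j = 1 / [1 + lam (j - 1)], j = 1..m (0-based in code)."""
    random.seed(seed)
    beta = [1.0 / (1.0 + lam * j) for j in range(m)]
    x = [-4.0] * m                     # all chains start in the left mode
    cold = []
    for _ in range(n_iter):
        # ordinary Metropolis update within each (heated) chain
        for j in range(m):
            y = x[j] + random.gauss(0.0, 1.5)
            if math.log(random.random()) < beta[j] * (log_target(y) - log_target(x[j])):
                x[j] = y
        # propose a swap between a random pair of adjacent chains;
        # log alpha = (beta_i - beta_j)(log pi(x_j) - log pi(x_i))
        i = random.randrange(m - 1)
        log_alpha = (beta[i] - beta[i + 1]) * (log_target(x[i + 1]) - log_target(x[i]))
        if math.log(random.random()) < log_alpha:
            x[i], x[i + 1] = x[i + 1], x[i]
        cold.append(x[0])
    return cold
```

The heated chains cross the valley at 0 easily, and accepted swaps let the cold chain visit both modes, which a single unheated chain started at -4 would rarely manage.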

Brief comparison to Parsimony and Maximum Likelihood

Pitfalls and controversies

  • Bootstrap values vs. posterior probabilities. It has been observed that bootstrap support values, calculated under parsimony or maximum likelihood, tend to be lower than the posterior probabilities obtained by Bayesian inference.[3] This fact leads to a number of questions, such as: Do posterior probabilities lead to overconfidence in the results? Are bootstrap values more robust than posterior probabilities?
  • Controversy over the use of prior probabilities. Using prior probabilities in Bayesian analysis has been seen by many as an advantage, as it incorporates a more realistic view of the real world into a hypothesis. However, some biologists object to the subjectivity that these priors introduce into Bayesian posterior probabilities.
  • Model choice. The results of a Bayesian analysis of a phylogeny depend directly on the model of evolution chosen, so it is important to choose a model that fits the observed data; otherwise inferences about the phylogeny will be erroneous. Many scientists have raised questions about the interpretation of Bayesian inference when the model is unknown or incorrect. For example, an oversimplified model might give higher posterior probabilities,[4][5] or a simple evolutionary model may be associated with less uncertainty than suggested by bootstrap values.[6]

MrBayes software for Bayesian Inference of Phylogeny

MrBayes is free software that performs Bayesian inference of phylogeny. It was originally written by John P. Huelsenbeck and Fredrik Ronquist in 2001.[7] As Bayesian methods grew in popularity, MrBayes became one of the programs of choice for many molecular phylogeneticists. It is offered for Macintosh, Windows, and UNIX operating systems and has a command-line interface. The program uses the standard MCMC algorithm as well as the Metropolis-coupled MCMC variant. MrBayes reads aligned matrices of sequences (DNA or amino acids) in the standard NEXUS format.[8]

MrBayes uses MCMC to approximate the posterior probabilities of trees. The user can change the assumptions of the substitution model, the priors and the details of the (MC)3 analysis. It also allows the user to remove and add taxa and characters to the analysis. The program uses the most standard model of DNA substitution[9] and offers different methods for relaxing the assumption of equal rates across sites.[10] MrBayes is also able to infer ancestral states, accommodating uncertainty in the phylogenetic tree and model parameters.

MrBayes 3[11] was a completely reorganized and restructured version of the original MrBayes. The main novelty was the ability of the software to accommodate heterogeneous data sets. This new framework allows the user to mix models and take advantage of the efficiency of Bayesian MCMC analysis when dealing with composite datasets. It uses Metropolis-coupled MCMC by default.

MrBayes 3.2, a new version of MrBayes, was released in 2012.[12] The new version allows users to run multiple analyses in parallel. It also provides faster likelihood calculations and allows these calculations to be delegated to graphics processing units (GPUs). Version 3.2 provides wider output options compatible with FigTree and other tree viewers.

List of phylogenetics software for Bayesian Inference of Phylogeny

This list includes some of the most common phylogenetic software packages used for inferring phylogenies under a Bayesian framework. Some of them do not use exclusively Bayesian methods.

  • Armadillo Workflow Platform. Workflow platform dedicated to phylogenetic and general bioinformatic analysis. Method: inference of phylogenetic trees using distance, maximum likelihood, maximum parsimony, Bayesian methods and related workflows. Authors: E. Lord, M. Leclercq, A. Boc, A.B. Diallo and V. Makarenkov.
  • Bali-Phy. Simultaneous Bayesian inference of alignment and phylogeny. Method: Bayesian inference, alignment as well as tree search. Authors: M.A. Suchard, B.D. Redelings.
  • BATWING. Bayesian Analysis of Trees With Internal Node Generation. Method: Bayesian inference, demographic history, population splits. Authors: I.J. Wilson, Weale, D. Balding.
  • Bayes Phylogenies. Bayesian inference of trees using Markov chain Monte Carlo methods. Method: Bayesian inference, multiple models, mixture model (auto-partitioning). Authors: M. Pagel, A. Meade.
  • BEAST. Bayesian Evolutionary Analysis Sampling Trees. Method: Bayesian inference, relaxed molecular clock, demographic history. Authors: A.J. Drummond, A. Rambaut and M.A. Suchard.
  • BUCKy. Bayesian concordance of gene trees. Method: Bayesian concordance using modified greedy consensus of unrooted quartets. Authors: C. Ané, B. Larget, D.A. Baum, S.D. Smith, A. Rokas and B. Larget, S.K. Kotha, C.N. Dewey, C. Ané.
  • Geneious (MrBayes plugin). Geneious provides genome and proteome research tools. Method: neighbor-joining, UPGMA, MrBayes plugin, PHYML plugin, RAxML plugin, FastTree plugin, GARLi plugin, PAUP* plugin. Authors: A.J. Drummond, M. Suchard, V. Lefort et al.
  • PAML. Phylogenetic analysis by maximum likelihood. Method: maximum likelihood and Bayesian inference. Author: Z. Yang.
  • TOPALi. Phylogenetic inference. Method: phylogenetic model selection, Bayesian analysis and maximum likelihood phylogenetic tree estimation, detection of sites under positive selection, and recombination breakpoint location analysis. Authors: I. Milne, D. Lindner, et al.

Applications of Bayesian Inference of Phylogeny

Bayesian inference has been used extensively by molecular phylogeneticists for a wide range of applications. Some of these include:

  • Inference of phylogenies
  • Inference and evaluation of uncertainty of phylogenies
  • Inference of ancestral states and character evolution
  • Molecular dating analysis
  • Modeling the dynamics of species diversification and extinction

External links

References
  1. ^ Larget, B., and D. L. Simon. 1999. Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Mol. Biol. Evol. 16:750–759.
  2. ^ Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. 1953. Equation of state calculations by fast computing machines. J. Chem. Phys. 21:1087–1092.
  3. ^ Garcia-Sandoval, R. 2014. Why some clades have low bootstrap frequencies and high Bayesian posterior probabilities. Israel Journal of Ecology & Evolution 60 (1): 41-44.
  4. ^ Suzuki, Y. et al. 2002. Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics. Proc. Natl. Acad. Sci. U. S. A. 99, 16138–16143
  5. ^ Erixon, P. et al. 2003. Reliability of Bayesian posterior probabilities and bootstrap frequencies in phylogenetics. Syst. Biol. 52, 665–673
  6. ^ Nylander, J. A. A. 2004. MrModeltest 2.0. Program distributed by the author. Evolutionary Biology Centre, Uppsala University. Norbyvagen 18 D. SE-752 36, Uppsala, Sweden.
  7. ^ Huelsenbeck, J. P. and F. Ronquist. 2001. MrBayes: Bayesian inference of phylogeny. Bioinformatics 17:754–755.
  8. ^ Maddison, D.R., Swofford, D.L. and Maddison, W.P. 1997. NEXUS: an extensible file format for systematic information. Syst. Biol. 46: 590-621.
  9. ^ Yang, Z. 1994. Estimating the pattern of nucleotide substitution. J. Mol. Evol. 39: 105-111.
  10. ^ Yang, Z. 1993. Maximum likelihood estimation of phylogeny from DNA sequences when substitutions rates differ over sites. Mol. Biol. Evol. 10: 1396-1401.
  11. ^ Ronquist, F. and Huelsenbeck, J.P. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1
  12. ^ Ronquist, F., Teslenko, M., Van Der Mark, P., Ayres, D.L., Darling, A., Höhna, S., Larget, B., Liu, L., Suchard, M.A. and Huelsenbeck, J. 2012. MrBayes 3.2: Efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61:539–542.