BAli-Phy

Stable release: 2.3.4 (24 May 2014)
Written in: C++
Operating system: UNIX, Linux, Mac, MS-Windows
Type: Bioinformatics tool
Licence: GPLv2

BAli-Phy is a free software program for simultaneously estimating a multiple sequence alignment and its phylogenetic tree. Unlike most phylogeny inference software, it does not require the input sequences to be aligned. Traditional approaches first estimate the alignment without a high-quality tree estimate, and then estimate the tree given that alignment. BAli-Phy instead achieves high accuracy in alignment estimation by using information from the co-estimated phylogeny, and it takes alignment uncertainty into account when estimating the phylogeny by averaging over possible alignments.

BAli-Phy uses Markov chain Monte Carlo (MCMC) methods to estimate a Bayesian posterior distribution over both the alignment and the tree, so its output expresses uncertainty in both. A run can take several days.
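
In symbols, the quantity being estimated can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the BAli-Phy documentation: Y denotes the unaligned input sequences, A an alignment, \tau a tree, and \theta the remaining evolutionary parameters.

```latex
% Joint posterior over alignment, tree, and parameters (illustrative notation)
P(A, \tau, \theta \mid Y) \;\propto\; P(Y, A \mid \tau, \theta)\, P(\tau, \theta)

% Marginal posterior of the tree: alignments are summed out, parameters integrated out
P(\tau \mid Y) \;=\; \sum_{A} \int P(A, \tau, \theta \mid Y)\, d\theta
```

With MCMC, the second expression is approximated by the fraction of sampled states in which a given tree appears.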

Alignment Uncertainty

Alignment uncertainty stems from two main sources: near-optimal alignments and evolutionary parameter uncertainty. Evolutionary parameters include branch lengths, substitution rates, insertion/deletion rates, and the phylogeny itself. If the exact values of these parameters are unknown, and the alignment estimate is sensitive to them, then the alignment cannot be known with confidence.

Even when the evolutionary parameters are fully known, many different alignments may be optimal or nearly optimal. In this case, the researcher cannot have confidence in any single alignment and must instead average over the cloud of near-optimal alignments.

BAli-Phy can handle both near-optimal alignment uncertainty and evolutionary parameter uncertainty by integrating over possible alignments and parameter values.
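
One way to picture averaging over alignments is to ask how often two residues share a column across a set of sampled alignments. The sketch below is a minimal illustration of that idea, not BAli-Phy's own code: the function names `aligned_pairs` and `pair_posteriors`, the sequence names, and the three toy samples are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column in one alignment.

    `alignment` maps sequence name -> gapped string; all rows have equal length.
    A residue is identified as (name, index), where index counts residues in
    the ungapped sequence.
    """
    names = sorted(alignment)
    length = len(alignment[names[0]])
    next_index = {name: 0 for name in names}
    pairs = set()
    for col in range(length):
        present = []
        for name in names:
            if alignment[name][col] != "-":
                present.append((name, next_index[name]))
                next_index[name] += 1
        # Every two residues in the same column are treated as homologous.
        pairs.update(combinations(present, 2))
    return pairs


def pair_posteriors(samples):
    """Estimate each pair's probability as its frequency across the samples."""
    counts = Counter()
    for alignment in samples:
        counts.update(aligned_pairs(alignment))
    return {pair: n / len(samples) for pair, n in counts.items()}


# Hypothetical samples: the gap in seqA moves between two near-optimal positions.
samples = [
    {"seqA": "AC-GT", "seqB": "ACTGT"},
    {"seqA": "ACG-T", "seqB": "ACTGT"},
    {"seqA": "AC-GT", "seqB": "ACTGT"},
]
posteriors = pair_posteriors(samples)
```

Here the G of seqA pairs with the G of seqB in two of the three samples and with the T in the third, so neither placement can be reported with full confidence, while the flanking residues are paired identically in every sample.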

Input/Output

BAli-Phy accepts nucleotide, amino acid, and codon sequences in FASTA format. Input sequences need not be aligned. Ambiguous nucleotides such as R and Y are supported, as are the ambiguous amino acids B, Z, and J.
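
As an illustration of what unaligned FASTA input looks like, the following minimal sketch reads records whose sequences differ in length and contain no gap characters. The parser `parse_fasta`, the sequence names, and the sequences themselves are hypothetical and are not part of BAli-Phy.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of name -> sequence (illustrative only)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without a sequence name")
            # The name is the first whitespace-delimited word of the header.
            name = fields[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# Hypothetical unaligned nucleotide input; R is an ambiguity code (A or G).
example = """\
>human
ACGTRACGT
>mouse
ACGTACG
>chicken
ACGGTACGTTA
"""
sequences = parse_fasta(example)
lengths = {name: len(seq) for name, seq in sequences.items()}
```

The three sequences have different lengths, which is acceptable for unaligned input; an aligned FASTA file would instead pad every row to the same length with gap characters.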

Trees are output in Newick format. Alignments are output in FASTA format. Output alignments include homology information for sequences at internal nodes of the tree.
