Long branch attraction
In phylogenetics, long branch attraction (LBA) is a form of systematic error whereby distantly related lineages are incorrectly inferred to be closely related. LBA arises when the amount of molecular or morphological change accumulated within a lineage is sufficient to make that lineage appear similar (and thus closely related) to another long-branched lineage solely because both have undergone a large amount of change, rather than because they are related by descent. Such bias is more common when the overall divergence of some taxa results in long branches within a phylogeny. Long branches are often attracted to the base of a phylogenetic tree, because the lineage included to represent an outgroup is often also long-branched. The frequency of true LBA is unclear and often debated. Although often viewed as a failing of parsimony-based methodology, LBA can result from a variety of scenarios and can be inferred under multiple analysis paradigms.
LBA was first recognized as problematic when analyzing discrete morphological character sets under parsimony criteria; however, maximum likelihood analyses of DNA or protein sequences are also susceptible. A simple hypothetical example can be found in Felsenstein (1978), where it is demonstrated that for certain unknown "true" trees, some methods are biased toward grouping long branches, ultimately resulting in the inference of a false sister relationship. Often this is because convergent evolution of one or more characters included in the analysis has occurred in multiple taxa. Although these shared traits were derived independently, the analysis can misinterpret them as being shared due to common ancestry.
In phylogenetic and clustering analyses, LBA is a result of the way clustering algorithms work: terminals or taxa with many autapomorphies (character states unique to a single branch) may by chance exhibit the same states as those on another branch (homoplasy). A phylogenetic analysis will group these taxa together as a clade unless other synapomorphies outweigh the homoplastic features to group together true sister taxa.
These problems may be minimized by using methods that correct for multiple substitutions at the same site, by adding taxa related to those with the long branches (which contributes true synapomorphies to the data), or by using alternative, more slowly evolving traits (e.g. more conservative gene regions).
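As a rough illustration of what "correcting for multiple substitutions" means, the sketch below uses the Jukes-Cantor (JC69) model, chosen here only as the simplest such correction; the function names are illustrative and not taken from any particular package. It shows how the observed proportion of differing sites saturates toward 0.75 as the true amount of change grows, and how the correction recovers the underlying distance.

```python
import math

def expected_p(d):
    """Expected proportion of differing sites after d substitutions per site (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def jc69_distance(p):
    """JC69-corrected distance (substitutions per site) from the observed
    proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Observed differences lag further behind the true distance as d increases.
for d in (0.1, 0.5, 1.0, 2.0):
    p = expected_p(d)
    print(f"true d = {d:.1f}  observed p = {p:.3f}  corrected d = {jc69_distance(p):.3f}")
```

On long branches the uncorrected proportion badly underestimates the true distance, which is one reason uncorrected comparisons can make fast-evolving lineages look closer than they are.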
The result of LBA in evolutionary analyses is that rapidly evolving lineages may be inferred to be closely related, regardless of their true relationships. For example, in DNA sequence-based analyses, the problem arises when sequences from two (or more) lineages evolve rapidly. There are only four possible nucleotides and when DNA substitution rates are high, the probability that two lineages will evolve the same nucleotide at the same site increases. When this happens, parsimony may erroneously interpret this homoplasy as a synapomorphy (i.e., evolving once in the common ancestor of the two lineages).
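The effect can be reproduced with a small simulation, sketched here under simplifying assumptions: a four-taxon true tree ((A,B),(C,D)) in which A and C sit on long branches, and a Jukes-Cantor-like model in which a change is equally likely to produce any of the three other nucleotides. The names `evolve` and `simulate_counts` and the probabilities 0.6 and 0.05 are illustrative choices, not values from any cited study. For four taxa, the most parsimonious tree is the one supported by the largest number of informative sites, so counting site patterns is enough.

```python
import random

NUCS = "ACGT"

def evolve(state, p_change, rng):
    """State at the end of a branch: with probability p_change it differs from
    the start, each of the three other nucleotides being equally likely."""
    if rng.random() < p_change:
        return rng.choice([n for n in NUCS if n != state])
    return state

def simulate_counts(n_sites, p_long, p_short, seed=1):
    """Count parsimony-informative site patterns on the true tree ((A,B),(C,D)),
    where A and C have long branches and B, D and the internal branch are short."""
    rng = random.Random(seed)
    counts = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for _ in range(n_sites):
        x = rng.choice(NUCS)          # ancestor of A and B
        y = evolve(x, p_short, rng)   # ancestor of C and D (short internal branch)
        a = evolve(x, p_long, rng)
        b = evolve(x, p_short, rng)
        c = evolve(y, p_long, rng)
        d = evolve(y, p_short, rng)
        if a == b and c == d and a != c:
            counts["AB|CD"] += 1      # supports the true grouping
        elif a == c and b == d and a != b:
            counts["AC|BD"] += 1      # supports joining the two long branches
        elif a == d and b == c and a != b:
            counts["AD|BC"] += 1
    return counts

counts = simulate_counts(20000, p_long=0.6, p_short=0.05)
print(counts, "-> parsimony prefers", max(counts, key=counts.get))
```

With these settings, sites at which A and C independently arrive at the same nucleotide outnumber the sites that reflect the true grouping, so parsimony joins the two long branches. Setting `p_long` equal to `p_short` restores the true tree.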
The opposite effect may also be observed, in that if two (or more) branches exhibit particularly slow evolution among a wider, fast evolving group, those branches may be misinterpreted as closely related. As such, "long branch attraction" can in some ways be better expressed as "branch length attraction". However, it is typically long branches that exhibit attraction.
The recognition of long-branch attraction implies that there is some other evidence suggesting that the phylogeny is incorrect. For example, morphological data may suggest that taxa marked as closely related are not truly sister taxa. Hennig's Auxiliary Principle suggests that synapomorphies should be viewed as de facto evidence of grouping unless there is specific contrary evidence (Hennig, 1966; Schuh and Brower, 2009).
One example of this phenomenon is the relationship between four skippers (butterflies): Agathymus mariae, Ancyloxypha numitor, Thorybes pylades, and Pyrrhopyge zenodorus. To compare these species, several tree-building methods were applied to the same data and their results were compared.
The analysis began with a stretch of DNA sequence from each species. The sequences were compared side by side, the matching nucleotides were counted, and a phylogenetic tree was built from the similarity between the sequences. This resulted in a tree supporting a close relationship between A. mariae and P. zenodorus.
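The counting step can be sketched as follows. The sequences are made-up placeholders, not the skipper data, and `count_matches` is an illustrative helper name.

```python
from itertools import combinations

def count_matches(seq1, seq2):
    """Number of aligned positions at which two sequences carry the same nucleotide."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(seq1, seq2))

# Hypothetical aligned fragments; placeholders, not real skipper sequences.
alignment = {
    "taxon_1": "ACGTACGTAC",
    "taxon_2": "ACGTACGTTC",
    "taxon_3": "ACTTACGGAC",
    "taxon_4": "TCTTACGGAC",
}

for (name1, s1), (name2, s2) in combinations(alignment.items(), 2):
    print(f"{name1} vs {name2}: {count_matches(s1, s2)} of {len(s1)} positions match")
```

Raw match counts of this kind cannot tell whether a shared nucleotide was inherited or arose independently, which is why trees built from them are vulnerable to long branch attraction.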
The next step used another reconstruction method, a distance-based one. The expected number of changes between the DNA sequences was estimated, and the species separated by the fewest estimated changes were grouped together. This grouping received a bootstrap value of 80% and also supported the first tree, which links A. mariae with P. zenodorus.
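A bootstrap value of this kind is obtained by resampling alignment columns with replacement and recording how often a grouping is recovered. The sketch below is a minimal version for four taxa, assuming uncorrected distances and the four-point condition (pick the pairing with the smallest sum of within-pair distances) in place of a full distance method. The alignment is a made-up placeholder and the function names are illustrative.

```python
import random

def p_distance(s1, s2):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(s1, s2)) / len(s1)

def best_split(seqs):
    """Four-point condition for taxa A, B, C, D (in that order): choose the
    pairing with the smallest sum of within-pair distances."""
    a, b, c, d = seqs
    sums = {
        "AB|CD": p_distance(a, b) + p_distance(c, d),
        "AC|BD": p_distance(a, c) + p_distance(b, d),
        "AD|BC": p_distance(a, d) + p_distance(b, c),
    }
    return min(sums, key=sums.get)

def bootstrap_support(seqs, n_reps=1000, seed=0):
    """Fraction of column-resampled replicates that recover each pairing."""
    rng = random.Random(seed)
    n_sites = len(seqs[0])
    counts = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for _ in range(n_reps):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        resampled = ["".join(s[i] for i in cols) for s in seqs]
        counts[best_split(resampled)] += 1
    return {k: v / n_reps for k, v in counts.items()}

# Hypothetical alignment: three columns favour AB|CD, one favours AC|BD.
seqs = [
    "ACGTACGTACGTACGTACGT",  # A
    "ACGTACGTACGAACGTGCGT",  # B
    "ACTTACGGACGTACCTACGT",  # C
    "ACTTACGGACGTACCTGCGA",  # D
]
print(best_split(seqs), bootstrap_support(seqs))
```

High bootstrap support measures how consistently the data favour a grouping, not whether the grouping is correct: a systematic bias such as long branch attraction can be recovered in most replicates.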
The next method used was maximum likelihood. The first analysis relied on parsimony, which prefers the simplest tree, meaning the one requiring the fewest changes. Maximum likelihood instead takes into account which changes are most likely to occur and selects the tree under which the observed data are most probable; this is not necessarily the simplest tree. The maximum likelihood tree supports a tree linking A. numitor and A. mariae. This is the first method whose result conflicts with those of the previous methods.
Another method used for comparison is Bayesian inference. It is very similar to maximum likelihood in that it relies on the same kind of statistical model of sequence change. It differs in that it combines the likelihood with prior probabilities to estimate how probable each tree is given the data, rather than reporting only the single tree that maximizes the likelihood. In this data set, the Bayesian analysis resulted in a tree that also supports the close relationship of A. numitor and A. mariae.
When all of these results are gathered and compared, the second relationship, linking A. numitor and A. mariae, emerges as the most plausible. This example with skippers illustrates the importance of weighing all the data before concluding that a particular tree is the correct one. Morphological traits are important in constructing trees, but parsimony is not always correct. It is helpful to use several methods when determining an accurate tree (Grishin 2009).
- Bergsten, J. (2005). A review of long-branch attraction. Cladistics, 21(2), 163-193. Available from: http://onlinelibrary.wiley.com/doi/10.1111/j.1096-0031.2005.00059.x/pdf
- Anderson, F. E., & Swofford, D. L. (2004). Should we be worried about long-branch attraction in real data sets? Investigations using metazoan 18S rDNA. Molecular Phylogenetics and Evolution, 33(2), 440-451.
- Huelsenbeck, J. P. (1997). Is the Felsenstein zone a fly trap? Systematic Biology, 46(1), 69-74.
- Felsenstein, J. (1978). Cases in which parsimony or compatibility methods will be positively misleading. Systematic Biology, 27(4), 401-410.
- Felsenstein, J. (2004). Inferring Phylogenies. Sinauer Associates, Sunderland, MA.
- Hennig, W. (1966). Phylogenetic Systematics. University of Illinois Press, Urbana, IL.
- Schuh, R. T., & Brower, A. V. Z. (2009). Biological Systematics: Principles and Applications (2nd edn.). Cornell University Press, Ithaca, NY.
- Grishin, N. V. (2009). Long Branch Attraction. Butterflies of America, 17 Aug. 2009. Retrieved 15 Sept. 2014 from http://butterfliesofamerica.com/knowhow/LBA.htm