# Ancestral reconstruction

Ancestral reconstruction (also known as character mapping or character optimization) is a method for inferring the phenotypic and genetic states of organisms that lived millions of years ago.[1] It is useful because it can fill in gaps in phylogenetic trees, making it clearer which organisms are closely related and which lineages descend from which. The underlying idea is that modern sequences are variations of ancient ones. If an ancient sequence can be inferred, it becomes possible to ask which other variants could have arisen from it, and therefore which organisms may also descend from it.[2] On a smaller scale, the method can be used to trace the change of one character into another (for example, fins turning into legs). These inferences are made using statistical models. The idea was first proposed in 1963 by Zuckerkandl and Pauling. Overall, ancestral reconstruction makes it possible to use existing genetic information, obtained through methods such as phylogenetics, to determine the route that evolution has taken and when evolutionary events occurred.[3] One of the most prominent examples is tracing the evolution of humans from ape ancestors. The method has also been used to compare the catalytic properties of ancient and modern proteins.[4]

## Methods

### Overview

There are three main approaches to ancestral reconstruction: maximum parsimony, maximum likelihood, and Bayesian inference. All three involve complex statistical or mathematical calculations. Maximum parsimony was developed first. It treats all evolutionary events as equally likely and returns a point estimate, that is, a single answer taken to be correct. Maximum likelihood came next and builds on maximum parsimony by accounting for the fact that different events are not equally likely. Finally, Bayesian inference relates the conditional probability of an event to the likelihood of the tree and to the uncertainty associated with that tree. Because it accounts for this uncertainty, it yields a sample of plausible trees rather than a single one.

### Maximum parsimony

Maximum parsimony adopts the simplest available hypothesis.[5] In practice, this means accepting the phylogenetic tree in which the transition from one state to another requires the fewest changes. The drawbacks of this approach are apparent. Natural selection and evolution do not work towards a goal; variation arises, beneficial changes are retained, and detrimental ones are weeded out. Treating evolution as a process that reaches a given end result in as few steps as possible is therefore inaccurate.

The statistical method itself has also been criticized, mostly because using parsimony means accepting six general assumptions that do not always hold: that the phylogenetic tree being used is correct, that all of the relevant data are available, that no mistakes were made in coding the data, that all branches of the phylogenetic tree are equally likely to change, that the rate of evolution is slow, and that the chance of losing a characteristic is the same as the chance of gaining it.[1] Some branches of the tree may experience stronger selection and higher rates of change than others, for example because of changing ecological factors. Furthermore, some periods of time show more rapid evolution than others, and parsimony becomes inaccurate when this happens.[6] The Cambrian explosion, for example, was a period with a pronounced increase in the variation of organisms within phyla. A final problem is that, when a single character state is considered, the method assumes that two organisms sharing that character are more closely related than organisms that do not. The fact that dogs and apes both have fur, for example, does not mean that they are more closely related to each other than apes are to humans.
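
The idea of preferring the fewest changes can be made concrete with a small sketch. The article does not name a particular algorithm; the code below assumes Fitch's algorithm, a common way to count changes for a single unordered character on a rooted binary tree, and shows only its bottom-up pass. The tree shape, taxon names, and character states are hypothetical illustration data.

```python
def fitch(tree, tip_states):
    """Bottom-up pass of Fitch's parsimony algorithm (illustrative sketch).

    tree: nested 2-tuples whose leaves are tip names (str).
    tip_states: dict mapping each tip name to its observed character state.
    Returns (node_sets, changes): the candidate state set at every node and
    the minimum number of state changes the tree requires.
    """
    changes = 0
    node_sets = {}

    def visit(node):
        nonlocal changes
        if isinstance(node, str):           # a tip: its state is observed
            states = {tip_states[node]}
        else:                               # an internal node with two children
            left, right = visit(node[0]), visit(node[1])
            states = left & right           # states both children agree on
            if not states:                  # no agreement: one change is needed
                states = left | right
                changes += 1
        node_sets[node] = states
        return states

    visit(tree)
    return node_sets, changes


# Hypothetical example: body covering scored as a single character.
tree = ((("human", "chimp"), "dog"), "lizard")
tip_states = {"human": "fur", "chimp": "fur", "dog": "fur", "lizard": "scales"}
node_sets, changes = fitch(tree, tip_states)
```

Here the three mammals agree on "fur", so the only change is placed on one of the two branches leaving the root, and the root itself stays ambiguous between "fur" and "scales". A full implementation would add a top-down pass to resolve such ambiguous sets.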

### Maximum likelihood

The maximum likelihood method assumes that the observed phenotypes are the ones that were statistically most likely to develop.[7] The main difference from maximum parsimony is that maximum likelihood accounts for the fact that not all events are equally likely. For example, a transition, a point mutation from one purine to another or from one pyrimidine to another, is much more likely than a transversion, in which a purine is replaced by a pyrimidine or vice versa. Unlike maximum parsimony, maximum likelihood recognizes and accounts for such differences. However, the fact that some events are more likely than others does not mean that they always happen. Throughout evolutionary history there have been times when what actually occurred differed greatly from what was most likely. In such cases, maximum parsimony may actually be more accurate, because it is more willing than maximum likelihood to accept large, unlikely leaps. Maximum likelihood has been shown to be quite reliable in reconstructing character states, but it is less accurate at estimating the stability of proteins. It tends to overestimate protein stability, which is to be expected, since it assumes that the proteins that were made and used were the most stable and optimal ones.[5] The merits of the method have been much debated. Some authors have concluded that it offers a good compromise between accuracy and speed,[8] while others have objected that maximum likelihood takes too much time and computational power to be useful.[9]
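
The difference between transitions and transversions can be illustrated with a short sketch. It assumes Kimura's two-parameter substitution model and Felsenstein's pruning algorithm, neither of which is specified in the text above; the rate values, branch lengths, and taxon names are hypothetical, and root base frequencies are assumed to be equal.

```python
import math

BASES = "ACGT"
PURINES = {"A", "G"}


def k80_prob(x, y, t, alpha=2.0, beta=0.5):
    """P(base x becomes base y over branch length t), Kimura two-parameter model.

    alpha is the transition rate and beta the transversion rate
    (both values here are hypothetical).
    """
    e1 = math.exp(-4.0 * beta * t)
    e2 = math.exp(-2.0 * (alpha + beta) * t)
    if x == y:
        return 0.25 + 0.25 * e1 + 0.5 * e2
    if (x in PURINES) == (y in PURINES):    # purine<->purine or pyrimidine<->pyrimidine
        return 0.25 + 0.25 * e1 - 0.5 * e2  # transition
    return 0.25 - 0.25 * e1                 # transversion


def partial_likelihoods(node, tips):
    """Pruning step: P(tip data below node | node has base x), for each x.

    node is a tip name (str) or a tuple of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {x: 1.0 if x == tips[node] else 0.0 for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child, t in node:
        child_like = partial_likelihoods(child, tips)
        for x in BASES:
            result[x] *= sum(k80_prob(x, y, t) * child_like[y] for y in BASES)
    return result


def ml_root_state(tree, tips):
    """Return the most likely root base and the normalised likelihoods."""
    like = partial_likelihoods(tree, tips)
    total = sum(like.values())
    marginal = {x: like[x] / total for x in BASES}  # equal base frequencies assumed
    return max(marginal, key=marginal.get), marginal


# Hypothetical data: one site observed in three species.
tips = {"sp1": "A", "sp2": "A", "sp3": "G"}
tree = (("sp1", 0.1), ("sp2", 0.1), ("sp3", 0.1))
best, marginal = ml_root_state(tree, tips)
```

With these inputs the reconstructed root base is A. Because A to G is a transition, G also receives more support than C or T, which could only explain the data through transversions on every branch.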

### Bayesian inference

Bayesian inference is the method that many scholars argue is the most accurate. It is not exclusively an evolutionary tool; it is a general statistical approach based on Bayes' theorem, in which prior information is combined with newly observed data. In the evolutionary setting, it combines the likelihood of the observed data with the probability that the events happened in the order they did, while recognizing the potential for error and uncertainty. Overall, it does the best job of reconstructing sequences and of estimating protein stability, probably because it can take multiple points of reference and combine them logically.[10] Unlike the other methods, which return a single point estimate, Bayesian inference gives a distribution of possible trees. This is its advantage over both maximum likelihood and maximum parsimony: because it accounts for uncertainty, it can give a more accurate estimate of the variance in the possible outcomes. The results are also relatively simple to interpret.[11]
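
As a rough illustration of how a distribution differs from a single point estimate, the sketch below applies Bayes' theorem to two candidate ancestral states and then averages the result over a small sample of trees. All priors, likelihoods, and tree weights are made-up numbers, and the function names are hypothetical; real analyses typically draw thousands of trees by Markov chain Monte Carlo.

```python
def posterior(prior, likelihood):
    """Bayes' theorem over discrete hypotheses.

    P(h | data) is proportional to P(data | h) * P(h).
    """
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalised.values())
    return {h: value / evidence for h, value in unnormalised.items()}


def average_over_trees(posteriors, weights):
    """Mix per-tree posteriors, weighting each tree by its sampled frequency."""
    states = posteriors[0].keys()
    return {
        s: sum(w * p[s] for p, w in zip(posteriors, weights))
        for s in states
    }


# Hypothetical inputs: equal prior belief in either ancestral state.
prior = {"fins": 0.5, "legs": 0.5}

# Likelihood of the observed tip data under each state, on two sampled trees.
likelihood_tree1 = {"fins": 0.08, "legs": 0.02}
likelihood_tree2 = {"fins": 0.03, "legs": 0.03}

post1 = posterior(prior, likelihood_tree1)
post2 = posterior(prior, likelihood_tree2)

# Tree 1 appears in 60% of the sample and tree 2 in 40% (made-up weights).
combined = average_over_trees([post1, post2], [0.6, 0.4])
```

The result is a probability for every candidate state rather than one winner, and it reflects the fact that the second tree is uninformative about this character.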

## Trait reconstruction

Ancestral reconstruction is widely used to infer the ecological, phenotypic, or biogeographic traits associated with ancestral nodes in a phylogenetic tree. Methods for ancestral reconstruction include parsimony, maximum likelihood, and Bayesian inference, each with its own benefits and drawbacks. More recently, Bayesian inference has been preferred over the other two, as maximum likelihood and parsimony both usually overestimate protein stability.

## DNA and protein reconstruction

Originally proposed by Pauling and Zuckerkandl in 1963,[12] the reconstruction of ancient proteins and DNA sequences has only recently become a significant scientific endeavor. The development of extensive genomic sequence databases, in conjunction with advances in biotechnology and phylogenetic inference methods, has made ancestral reconstruction cheap, fast, and scientifically practical.

Ancestral protein and DNA reconstruction allows protein and DNA evolution to be recreated in the laboratory so that it can be studied directly.[13] With respect to proteins, this allows the evolution of present-day molecular structure and function to be investigated. Additionally, ancestral protein reconstruction can lead to the discovery of biochemical functions that have been lost in modern proteins.[14][15] It also gives insight into the biology and ecology of extinct organisms.[16] Although the majority of ancestral reconstructions have dealt with proteins, the approach has also been used to test evolutionary mechanisms at the level of bacterial genomes[17] and primate gene sequences.[18]

In summary, ancestral reconstruction allows for the study of evolutionary pathways, adaptive selection, and functional divergence in the evolutionary past. For a review of biological and computational techniques of ancestral reconstruction, see Chang et al.[13] For criticism of the computational methods used in ancestral reconstruction, see Williams et al.[19]

## Genome reconstruction

At the chromosomal level, ancestral reconstruction attempts to recover the genome rearrangements that occurred during evolution. This is sometimes called karyotype reconstruction. Chromosome painting is currently the main experimental technique; see Wienberg et al.[20] and Froenicke et al.[21]

Recently, researchers have developed computational methods that reconstruct the ancestral karyotype by taking advantage of comparative genomics; see Murphy et al.[22] and Ma et al.[23]

## References

1. Omland, K.E. (1999). "The Assumptions and Challenges of Ancestral State Reconstructions". Symposium: Reconstructing Ancestral Character States: 604.
2. Cai, W.; Pei, J.; Grishin, N. (2004). "Reconstruction of ancestral protein sequence and its applications". BMC Evolutionary Biology 4: 33. doi:10.1186/1471-2148-4-33.
3. Martins, P. (1996). Phylogenies and the comparative method in animal behavior. Oxford University Press.
4. Ronquist, F. (2004). "Bayesian inference of character evolution". Trends in Ecology & Evolution 19: 9.
5. Williams, P.D.; Pollock, D.D.; Blackburne, B.P.; Goldstein, R.A. (2006). "Assessing the Accuracy of Ancestral Protein Reconstruction Methods". PLoS Computational Biology 2 (6): e69. doi:10.1371/journal.pcbi.0020069. PMC 1480538. PMID 16789817.
6. Mooers, A.; Schluter, D. (1999). "Reconstructing Ancestor States with Maximum Likelihood: Support for One- and Two-Rate Models". Systematic Biology 48: 3.
7. Pagel, M. (1999). "The Maximum Likelihood Approach to Reconstructing Ancestral Character States of Discrete Characters on Phylogenies". Systematic Biology 48: 3.
8. Guindon, S.; Gascuel, O. (2003). "A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood". Systematic Biology 52: 5.
9. Doornik, J.; Ooms, M. (2003). "Computational aspects of maximum likelihood estimation of autoregressive fractionally integrated moving average models". Computational Statistics & Data Analysis 42 (3).
10. Cunningham, C.; Omland, K.; Oakley, T. (1998). "Reconstructing ancestral character states: a critical reappraisal". Trends in Ecology & Evolution 13: 9.
11. Huelsenbeck, J.; Ronquist, F. (2001). "Bayesian inference of phylogenetic trees". Bioinformatics 17: 8.
12. Pauling, L.; Zuckerkandl, E. (1963). "Chemical paleogenetics, molecular restoration studies of extinct forms of life". Acta Chemica Scandinavica 17 (89): 9–16. doi:10.3891/acta.chem.scand.17s-0009.
13. Chang, S.W.; Ugalde, J.A.; Matz, M.V. (2005). "Applications of Ancestral Protein Reconstruction in Understanding Protein Function: GFP-Like Proteins". Methods in Enzymology 395: 652–670. doi:10.1016/S0076-6879(05)95034-9. ISBN 978-0-12-182800-4. PMID 15865989.
14. Jermann, T.M. et al. (1995). "Reconstructing the evolutionary history of the artiodactyl ribonuclease superfamily". Nature 374 (6517): 57–59. doi:10.1038/374057a0. PMID 7532788.
15. Sadqi, Mourad; Eva de Alba; Raúl Pérez-Jiménez; Jose M. Sanchez-Ruiz; Victor Muñoz (January 2009). "A designed protein as experimental model of primordial folding". Proc Natl Acad Sci U S A 106 (11): 4127–4132. doi:10.1073/pnas.0812108106. PMC 2647338. PMID 19240216.
16. Chang, B.S. et al. (2002). "Recreating a functional ancestral archosaur visual pigment". Molecular Biology and Evolution 19 (9): 1483–1489. doi:10.1093/oxfordjournals.molbev.a004211. PMID 12200476.
17. Zhang, C. et al. (2003). "Genome Diversification in Phylogenetic Lineages I and II of Listeria monocytogenes: Identification of Segments Unique to Lineage II Populations". Journal of Bacteriology 185 (18): 5573–5584. doi:10.1128/JB.185.18.5573-5584.2003. PMC 193770. PMID 12949110.
18. Krishnan, N.M. et al. (2004). "Ancestral sequence reconstruction in primate mitochondrial DNA: compositional bias and effect on functional inference". Molecular Biology and Evolution 21 (10): 1871–1883. doi:10.1093/molbev/msh198. PMID 15229290.
19. Williams, P.D. et al. (2006). "Assessing the Accuracy of Ancestral Protein Reconstruction Methods". PLoS Computational Biology 2 (6): e69. doi:10.1371/journal.pcbi.0020069. PMC 1480538. PMID 16789817.
20. Wienberg, J. et al. (2004). "The evolution of eutherian chromosomes". Curr Opin Genet Dev 14 (6): 657–666. doi:10.1016/j.gde.2004.10.001. PMID 15531161.
21. Froenicke, L. et al. (2006). "Are molecular cytogenetics and bioinformatics suggesting diverging models of ancestral mammalian genomes?". Genome Res 16 (3): 306–310. doi:10.1101/gr.3955206. PMC 1415215. PMID 16510895.
22. Murphy, W.J. et al. (2005). "Dynamics of mammalian chromosome evolution inferred from multispecies comparative maps". Science 309 (5734): 613–617. doi:10.1126/science.1111387. PMID 16040707.
23. Ma, J. et al. (2006). "Reconstructing contiguous regions of an ancestral genome". Genome Res 16 (12): 1557–1565. doi:10.1101/gr.5383506. PMC 1665639. PMID 16983148.