Ancestral sequence reconstruction

From Wikipedia, the free encyclopedia

Ancestral sequence reconstruction (ASR), also known as ancestral gene/sequence reconstruction or resurrection, is a technique used in the study of molecular evolution. The method consists of the synthesis of an ancestral gene and expression of the corresponding ancestral protein.[1] The idea of protein 'resurrection' was suggested in 1963 by Pauling and Zuckerkandl.[2] Some early efforts were made in the 1980s and 1990s, led by the laboratory of Steven A. Benner, showing the potential of this technique, a potential that only started to be fulfilled in the post-genomic era.[3] Thanks to improved algorithms and better sequencing and synthesis techniques, the method was developed further in the early 2000s to allow the resurrection of a greater variety of much more ancient genes.[4] Over the last decade, ancestral protein resurrection has developed into a strategy for revealing the mechanisms and dynamics of protein evolution.[5]


An illustration of a phylogenetic tree and how it helps conceptualise the way ASR is conducted.

Unlike conventional evolutionary and biochemical approaches to studying proteins, namely the so-called horizontal comparison of related protein homologues from different branch ends of the tree of life, ASR probes the statistically inferred ancestral proteins at the nodes of the tree in a vertical manner (see diagram, right). This approach gives access to protein properties that may have arisen only transiently over evolutionary time, and it has recently been used to infer the selection pressures that may have produced present-day sequences. ASR has been used to identify the causative mutation behind a protein's neofunctionalization after gene duplication: functional assays first localized the mutation to the branch between ancestors '5' and '4' in the diagram (illustratively).[6] In the field of protein biophysics, ASR has also been combined with modern analytical techniques such as hydrogen exchange mass spectrometry (HX/MS) to study how a protein's thermodynamic and kinetic landscapes, as well as its folding pathways, developed over evolutionary time.[7] These sorts of insights are typically inferred from several ancestors reconstructed along a phylogeny; in terms of the previous analogy, by studying nodes higher and higher within the tree of life (further and further back in evolutionary time).[8]

Most ASR studies are conducted in vitro and have revealed ancestral protein properties that appear to be evolutionarily desirable traits, such as increased thermostability, catalytic activity and catalytic promiscuity. These findings have been attributed both to artifacts of the ASR algorithms and to genuine features of ancient Earth's environment, so ASR research must often be complemented with extensive controls (usually alternative ASR experiments) to mitigate algorithmic error. Not all proteins studied by ASR exhibit this so-called 'ancestral superiority'.[9] The nascent field of 'evolutionary biochemistry' has been bolstered by a recent increase in ASR studies that use ancestors to probe organismal fitness within particular cellular contexts, effectively testing ancestral proteins in vivo.[8] Very few such in vivo studies have been conducted because of inherent limitations: chiefly the lack of suitably ancient genomes to place the ancestors into, the small repertoire of well-characterised laboratory model systems, and the inability to mimic ancient cellular environments. Despite these obstacles, preliminary work published in 2015 found that the 'ancestral superiority' observed in vitro for a given protein was not recapitulated in vivo.[10] ASR is one of the few ways to study the biochemistry of Precambrian life (>541 Ma) and is hence often used in 'paleogenetics'; indeed, Zuckerkandl and Pauling originally intended ASR to be the starting point of a field they termed 'paleobiochemistry'.


Several related homologues of the protein of interest are selected and aligned in a multiple sequence alignment (MSA), and a phylogenetic tree is constructed with statistically inferred sequences at the nodes of its branches. These inferred sequences are the so-called 'ancestors'; the process of synthesising the corresponding DNA, transforming it into a cell and producing the protein is the so-called 'reconstruction'. Ancestral sequences are typically calculated by maximum likelihood, although Bayesian methods are also used. Because the ancestors are inferred from a phylogeny, the topology and composition of the phylogeny play a major role in the resulting ASR sequences. Given the ongoing debate over how to construct phylogenies (for example, whether thermophilic bacteria are basal or derived in bacterial evolution), many ASR papers construct several phylogenies with differing topologies, and hence differing ASR sequences. These sequences are then compared, and often several (~10) are expressed and studied per phylogenetic node.

ASR does not claim to recreate the actual sequence of the ancient protein or DNA, but rather a sequence that is likely to be similar to the one that was actually at the node. This is not considered a shortcoming of ASR, as it fits the 'neutral network' model of protein evolution, in which at evolutionary junctions (nodes) a population of genotypically different but phenotypically similar protein sequences existed in the organismal population of the time. It is therefore possible that ASR generates one of the sequences of a node's neutral network; while it may not represent the genotype of the last common ancestor of the modern sequences, it likely does represent the phenotype.[8] This is supported by the modern observation that many mutations outside a protein's catalytic or functional sites cause only minor changes in its biophysical properties. Hence, ASR allows one to probe the biophysical properties of past proteins and offers insight into ancient genetics.
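
To make the maximum likelihood step concrete, the following Python sketch computes marginal posterior probabilities for the state at the root of a fixed four-taxon tree using Felsenstein's pruning algorithm, then picks the most probable state at each alignment column. It is a minimal sketch under simplifying assumptions: nucleotide data, the Jukes–Cantor (JC69) substitution model with equal base frequencies, and a known tree with known branch lengths. The taxon names, branch lengths and sequences are hypothetical. Real protein ASR uses 20-state empirical models and dedicated software such as PAML or IQ-TREE.

```python
import math

STATES = "ACGT"  # nucleotide alphabet; a protein model would use 20 states


def jc69(i: str, j: str, t: float, k: int = len(STATES)) -> float:
    """Jukes-Cantor probability that state i becomes j along a branch of length t
    (t measured in expected substitutions per site)."""
    decay = math.exp(-k * t / (k - 1))
    if i == j:
        return 1.0 / k + (k - 1.0) / k * decay
    return 1.0 / k - decay / k


def partial_likelihoods(node, site):
    """Felsenstein pruning: for each state s, P(leaf data below node | node is in s).

    A node is either a leaf name (str) or a list of (child, branch_length) pairs;
    `site` maps leaf names to their observed state in one alignment column."""
    if isinstance(node, str):
        return [1.0 if s == site[node] else 0.0 for s in STATES]
    result = [1.0] * len(STATES)
    for child, t in node:
        child_lik = partial_likelihoods(child, site)
        for a, s in enumerate(STATES):
            result[a] *= sum(jc69(s, c, t) * child_lik[b] for b, c in enumerate(STATES))
    return result


def root_posterior(tree, site):
    """Marginal posterior of each root state under JC69's equal base frequencies."""
    lik = partial_likelihoods(tree, site)
    total = sum(lik)  # the uniform prior 1/k cancels in the normalisation
    return {s: value / total for s, value in zip(STATES, lik)}


def reconstruct_root(tree, alignment):
    """Choose the most probable root state independently at each column."""
    length = len(next(iter(alignment.values())))
    seq = []
    for col in range(length):
        post = root_posterior(tree, {name: s[col] for name, s in alignment.items()})
        seq.append(max(post, key=post.get))
    return "".join(seq)


# Hypothetical rooted tree ((A,B),(C,D)) with made-up branch lengths.
tree = [
    ([("A", 0.1), ("B", 0.1)], 0.05),
    ([("C", 0.1), ("D", 0.2)], 0.05),
]
alignment = {"A": "ACGT", "B": "ACGA", "C": "ACGT", "D": "GCTT"}
print(reconstruct_root(tree, alignment))  # ACGT
```

Choosing the single most probable state per site is what makes the output an ML sequence; the full posterior dictionary returned by `root_posterior` is what reveals ambiguous positions.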

Maximum likelihood (ML) methods generate a sequence in which the residue at each position is the one predicted to be most likely to occupy that position under the model of evolution used; for proteins, this is typically an empirical amino acid substitution matrix (related to the scoring matrices used in BLAST searches or MSAs) estimated from extant sequences. Alternative methods include maximum parsimony (MP), which reconstructs the sequence requiring the minimum number of sequence changes along the tree, on the reasoning that, by Occam's razor, the fewest changes represent the most likely evolutionary route. MP is often considered the least reliable method for reconstruction, as it arguably oversimplifies evolution to a degree that is not applicable on billion-year timescales. Bayesian methods, which explicitly take residue uncertainty into account, are sometimes used to complement ML methods but typically produce more ambiguous sequences. In ASR, 'ambiguity' refers to residue positions where no single substitution can be confidently predicted; in these cases, several ASR sequences encompassing most of the ambiguities are often produced and compared with one another.

ML ASR often needs complementary experiments to show that the derived sequences are more than just consensus sequences of the inputs. This is particularly necessary when 'ancestral superiority' is observed.[7] For the trend of increasing thermostability, one explanation is that ML ASR creates a consensus of several different, parallel mechanisms that evolved to confer minor increases in protein thermostability throughout the phylogeny, leading to an additive effect that results in 'superior' ancestral thermostability.[11] Expressing consensus sequences, and running parallel ASR with non-ML methods, is often required to rule out this explanation in a given experiment.
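
For comparison, the maximum parsimony criterion can be illustrated with Fitch's small-parsimony algorithm, which returns the set of root states requiring the fewest changes at one alignment column, together with that minimum number of changes. This is a minimal Python sketch assuming a fixed, rooted binary tree written as nested tuples; the leaf names and residues are hypothetical. Branch lengths play no role in the calculation, which is part of why MP is considered an oversimplification over long timescales.

```python
def fitch(node, site):
    """Fitch's algorithm for one alignment column.

    `node` is a leaf name (str) or a (left, right) tuple; `site` maps leaf names
    to observed residues. Returns (set of most-parsimonious states, minimum changes)."""
    if isinstance(node, str):
        return {site[node]}, 0
    left, right = node
    left_states, left_cost = fitch(left, site)
    right_states, right_cost = fitch(right, site)
    shared = left_states & right_states
    if shared:  # children agree on at least one state: no extra change needed
        return shared, left_cost + right_cost
    # children disagree: keep every candidate state and count one change
    return left_states | right_states, left_cost + right_cost + 1


tree = (("A", "B"), ("C", "D"))  # hypothetical rooted topology ((A,B),(C,D))

print(fitch(tree, {"A": "L", "B": "L", "C": "L", "D": "V"}))  # ({'L'}, 1)
# Several equally parsimonious root states: an 'ambiguous' position.
print(fitch(tree, {"A": "L", "B": "L", "C": "I", "D": "V"}))
```

In the second call, L, I and V each explain the data with two changes, so parsimony alone cannot choose between them.
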
Another concern raised by ML methods is that the substitution matrices are derived from modern sequences, and the amino acid frequencies seen today may not match those of Precambrian biology, which could skew sequence inference. Several studies have attempted to construct ancient substitution matrices by various methods and have compared the resulting sequences and the biophysical properties of their proteins. While these modified matrices yield somewhat different ASR sequences, the observed biophysical properties did not seem to vary beyond experimental error.[12] Because of the 'holistic' nature of ASR and the great complexity that arises when all possible sources of experimental error are considered, the experimental community regards the ultimate measure of ASR reliability to be the comparison of several alternative reconstructions of the same node and the identification of similar biophysical properties. Although this does not offer a robust statistical or mathematical measure of reliability, it builds on the fundamental idea underlying ASR that individual amino acid substitutions do not cause significant changes in a protein's biophysical properties, a tenet that must hold in order to overcome the effect of inference ambiguity.[13]
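
One common way to build such alternative reconstructions, sometimes called 'AltAll', starts from the per-site posterior probabilities produced by an ML or Bayesian reconstruction: the ML sequence takes the most probable residue at every site, while the alternative swaps in the second-most-probable residue wherever that residue is itself reasonably plausible. The Python sketch below assumes a cutoff of 0.2 for 'plausible' (a commonly used but arbitrary choice) and uses hypothetical posteriors; expressing proteins from both sequences and comparing their biophysical properties is the kind of test described above.

```python
def ml_and_altall(posteriors, threshold=0.2):
    """Return (ML sequence, AltAll-style alternative sequence, ambiguous site indices).

    `posteriors` holds one dict per site mapping residue -> posterior probability.
    `threshold` (assumed 0.2 here) decides when the runner-up residue is
    plausible enough to be used in the alternative sequence."""
    ml, alt, ambiguous = [], [], []
    for i, site in enumerate(posteriors):
        ranked = sorted(site.items(), key=lambda kv: kv[1], reverse=True)
        best = ranked[0][0]
        ml.append(best)
        if len(ranked) > 1 and ranked[1][1] >= threshold:
            alt.append(ranked[1][0])  # plausible runner-up: swap it in
            ambiguous.append(i)
        else:
            alt.append(best)  # confidently reconstructed site: keep the ML residue
    return "".join(ml), "".join(alt), ambiguous


# Hypothetical posteriors for a four-residue ancestral node.
posteriors = [
    {"M": 1.0},
    {"K": 0.60, "R": 0.35, "Q": 0.05},
    {"L": 0.90, "I": 0.10},
    {"E": 0.50, "D": 0.45, "N": 0.05},
]
print(ml_and_altall(posteriors))  # ('MKLE', 'MRLD', [1, 3])
```

Changing every ambiguous site at once gives a deliberately pessimistic alternative: if its protein behaves like the ML protein, the conclusion is unlikely to hinge on any single uncertain residue.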

Candidates used for ASR are often selected based on the particular property being studied, e.g. thermostability.[9] By selecting sequences from either end of a property's range (e.g., psychrophilic and thermophilic proteins) but within one protein family, ASR can be used to probe the specific sequence changes, such as stabilising interactions, that conferred the observed biophysical effect. For example, in the diagram, if sequence 'A' encoded a protein that functioned optimally at neutral pH and 'D' one that functioned optimally in acidic conditions, the sequence changes between ancestors '5' and '2' might reveal the precise biophysical explanation for this difference. Because ASR experiments can recover ancestors that are likely billions of years old, there are often tens if not hundreds of sequence changes between the ancestors themselves, and between ancestors and extant sequences; such sequence–function studies can therefore be laborious and require careful, hypothesis-driven choices about which changes to test.[1][6][14]
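
Enumerating the sequence changes between two reconstructed nodes is usually the first step in such sequence–function studies. The short Python sketch below lists the substitutions between two aligned sequences in the conventional 'old residue, position, new residue' notation. The sequences standing in for ancestors '5' and '2' are hypothetical, positions are alignment columns rather than residue numbers of any real protein, and gapped columns are simply skipped (an assumption; insertions and deletions would need separate treatment).

```python
def substitutions(ancestor: str, descendant: str, gap: str = "-") -> list[str]:
    """List residue substitutions between two aligned sequences, e.g. 'K2R'.

    Positions are 1-based alignment columns; columns with a gap in either
    sequence are skipped rather than reported as substitutions."""
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{a}{pos}{d}"
        for pos, (a, d) in enumerate(zip(ancestor, descendant), start=1)
        if a != d and gap not in (a, d)
    ]


anc5 = "MKVLAE-G"  # hypothetical aligned sequence for node '5'
anc2 = "MRVLSEDG"  # hypothetical aligned sequence for node '2'
print(substitutions(anc5, anc2))  # ['K2R', 'A5S']
```

Each listed substitution (or a small group of them) can then be introduced into one ancestor to test whether it reproduces the functional difference.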

Resurrected proteins

There are many examples of ancestral proteins that have been computationally reconstructed, expressed in living cell lines and, in many cases, purified and biochemically studied. The Thornton lab notably resurrected several ancestral hormone receptors (from about 500 Ma)[15][16][17] and collaborated with the Stevens lab to resurrect ancient V-ATPase subunits from yeast (800 Ma).[18] The Marqusee lab has recently published several studies on the evolutionary biophysical history of E. coli ribonuclease H1.[9][19] Other examples include ancestral visual pigments in vertebrates;[20] yeast enzymes that break down sugars (800 Ma);[21] bacterial enzymes that confer resistance to antibiotics (2–3 Ga);[22] the ribonucleases involved in ruminant digestion; and the alcohol dehydrogenases (Adhs) involved in yeast fermentation (~85 Ma).[13]

The 'age' of a reconstructed sequence is determined using a molecular clock model, and often several models are employed.[7][23] This dating technique is usually calibrated using geological time points (such as ancient ocean constituents or banded iron formations, BIFs), and while these clocks offer the only way of inferring the age of a very ancient protein, they have wide error margins and are difficult to defend against contrary data. For this reason, ASR 'age' should be used only as an indicative feature, and it is often replaced altogether by the number of substitutions between the ancestral and modern sequences (the quantity from which the clock is calculated).[9] That said, the use of a clock allows one to compare the observed biophysical data of an ASR protein with the geological or ecological environment of the time. For example, ASR studies on bacterial EF-Tu, a translation protein that is likely rarely subject to horizontal gene transfer and whose melting temperature (Tm) is typically ~2 °C above the environmental temperature, indicate a hotter Precambrian Earth, which fits very closely with geological estimates of ancient ocean temperatures based on oxygen-18 isotope levels.[12] ASR studies of yeast Adhs reveal that subfunctionalized Adhs for ethanol metabolism (not just waste excretion) arose around the time fleshy fruit appeared in the Cretaceous period, and that before this, Adh served to excrete ethanol as a byproduct of excess pyruvate.[13] The use of a clock also suggests that life may have originated earlier than the earliest molecular fossils indicate (>4.1 Ga), but given the debatable reliability of molecular clocks, such observations should be taken with caution.[23][24]
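
The arithmetic behind such an age estimate can be sketched as follows, assuming a strict molecular clock (a single constant rate, which real studies usually relax) and the simple Poisson correction for multiple substitutions at the same site, d = -ln(1 - p), where p is the observed proportion of differing sites. Because an ancestor and its modern descendant lie on a single lineage, the elapsed time is t = d / r for a substitution rate r per site per year. The rate and sequences in this Python example are hypothetical, chosen only to show the calculation, and do not represent a calibrated clock.

```python
import math


def p_distance(seq1: str, seq2: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gapped columns."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if gap not in (a, b)]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(p: float) -> float:
    """Substitutions per site corrected for multiple hits: d = -ln(1 - p)."""
    if p >= 1.0:
        raise ValueError("sequences too divergent for the Poisson correction")
    return -math.log(1.0 - p)


def strict_clock_age(d: float, rate_per_site_per_year: float) -> float:
    """Years along one lineage (ancestor -> descendant) under a constant rate."""
    return d / rate_per_site_per_year


ancestor = "MKVLAEGTRA"  # hypothetical 10-site ancestral sequence
modern = "MKVLAEGSKV"  # hypothetical modern descendant (3 differences)
p = p_distance(ancestor, modern)  # 0.3
d = poisson_distance(p)  # about 0.357 substitutions per site
age_ma = strict_clock_age(d, 1e-9) / 1e6  # assumed rate: 1e-9 per site per year
print(f"p = {p:.2f}, d = {d:.3f}, age = {age_ma:.0f} Ma")  # p = 0.30, d = 0.357, age = 357 Ma
```

Because the age scales directly with the assumed rate, an error in calibration propagates proportionally into the date, which is one reason the substitution count itself is often reported instead.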


These experiments address several important questions in evolutionary biology: does evolution proceed in small steps or in large leaps; is evolution reversible; how does complexity evolve? It has been shown that slight mutations in the amino acid sequence of hormone receptors cause important changes in their hormone preferences. These changes represent major steps in the evolution of the endocrine system, so very small changes at the molecular level may have enormous consequences. By studying the glucocorticoid receptor, the Thornton lab has also shown that evolution can be irreversible. Seven mutations changed the ancestral receptor into a cortisol receptor, but reversing these mutations did not give the original receptor back, because other, earlier neutral mutations acted as a ratchet and made the changes irreversible.[25] This indicates that epistasis plays a major role in protein evolution, an observation that, together with several examples of parallel evolution, supports the neutral network model mentioned above.[8] These experiments on receptors show that proteins become greatly differentiated during their evolution, which helps explain how complexity may evolve. A closer look at the ancestral hormone receptors and their hormones shows that the interactions between single amino acid residues and chemical groups of the hormones arise through very small but specific changes. Knowledge of these changes may, for example, lead to the synthesis of hormone analogues capable of mimicking or inhibiting the action of a hormone, which might open possibilities for new therapies.

Given that ASR has revealed a tendency towards ancient thermostability and enzymatic promiscuity, it is a valuable tool for protein engineers, who often desire these traits (sometimes producing effects greater than current rational design approaches).[11] ASR also promises to 'resurrect' phenotypically similar 'ancient organisms', which in turn would allow evolutionary biochemists to probe the story of life. Proponents of ASR such as Benner state that through these and other experiments, the end of the current century will see a level of understanding in biology analogous to the one that arose in classical chemistry in the last century.[13]

References


  1. ^ a b Thornton, J.W. (2004). "Resurrecting ancient genes: experimental analysis of extinct molecules". Nature Reviews Genetics. 5 (5): 366–375. doi:10.1038/nrg1324. PMID 15143319.
  2. ^ Pauling, L.; Zuckerkandl, E. (1963). "Chemical paleogenetics: molecular restoration studies of extinct forms of life". Acta Chemica Scandinavica. 17 (Suppl.): S9–S16.
  3. ^ Jermann, TM; Opitz, JG; Stackhouse, J; Benner, SA (Mar 1995). "Reconstructing the evolutionary history of the artiodactyl ribonuclease superfamily". Nature. 374 (6517): 57–9. doi:10.1038/374057a0. PMID 7532788.
  4. ^ Thornton, JW; Need, E; Crews, D (Sep 2003). "Resurrecting the ancestral steroid receptor: ancient origin of estrogen signaling". Science. 301 (5640): 1714–7. doi:10.1126/science.1086185. PMID 14500980.
  5. ^ Pearson, Helen (2012-03-21). "Prehistoric proteins: raising the dead". Nature.
  6. ^ a b Anderson, Douglas P.; Whitney, Dustin S.; Hanson-Smith, Victor; Woznica, Arielle; Campodonico-Burnett, William; Volkman, Brian F.; King, Nicole; Thornton, Joseph W.; Prehoda, Kenneth E. (2016-01-07). "Evolution of an ancient protein function involved in organized multicellularity in animals". eLife. 5: e10147. doi:10.7554/eLife.10147. ISSN 2050-084X. PMC 4718807. PMID 26740169.
  7. ^ a b c Wheeler, Lucas C.; Lim, Shion A.; Marqusee, Susan; Harms, Michael J. (2016-06-01). "The thermostability and specificity of ancient proteins". Current Opinion in Structural Biology. 38: 37–43. doi:10.1016/ ISSN 1879-033X. PMC 5010474. PMID 27288744.
  8. ^ a b c d Harms, Michael J.; Thornton, Joseph W. (2013-08-01). "Evolutionary biochemistry: revealing the historical and physical causes of protein properties". Nature Reviews Genetics. 14 (8): 559–571. doi:10.1038/nrg3540. ISSN 1471-0056. PMC 4418793. PMID 23864121.
  9. ^ a b c d Lim, Shion A.; Hart, Kathryn M.; Harms, Michael J.; Marqusee, Susan (2016-11-15). "Evolutionary trend toward kinetic stability in the folding trajectory of RNases H". Proceedings of the National Academy of Sciences of the United States of America. 113 (46): 13045–13050. doi:10.1073/pnas.1611781113. ISSN 1091-6490. PMC 5135364. PMID 27799545.
  10. ^ Hobbs, Joanne K.; Prentice, Erica J.; Groussin, Mathieu; Arcus, Vickery L. (2015-10-01). "Reconstructed Ancestral Enzymes Impose a Fitness Cost upon Modern Bacteria Despite Exhibiting Favourable Biochemical Properties". Journal of Molecular Evolution. 81 (3–4): 110–120. doi:10.1007/s00239-015-9697-5. hdl:1721.1/105120. ISSN 1432-1432. PMID 26349578.
  11. ^ a b Risso, Valeria A.; Gavira, Jose A.; Sanchez-Ruiz, Jose M. (2014-06-01). "Thermostable and promiscuous Precambrian proteins". Environmental Microbiology. 16 (6): 1485–1489. doi:10.1111/1462-2920.12319. ISSN 1462-2920.
  12. ^ a b Gaucher, Eric A.; Govindarajan, Sridhar; Ganesh, Omjoy K. (2008-02-07). "Palaeotemperature trend for Precambrian life inferred from resurrected proteins". Nature. 451 (7179): 704–707. doi:10.1038/nature06510. ISSN 0028-0836. PMID 18256669.
  13. ^ a b c d Ancestral Sequence Reconstruction. Oxford, New York: Oxford University Press. 2007-07-26. ISBN 9780199299188.
  14. ^ Figure 1 from reference Harms, Michael J.; Thornton, Joseph W. (2013-08-01). "Evolutionary biochemistry: revealing the historical and physical causes of protein properties". Nature Reviews Genetics. 14 (8): 559–571. doi:10.1038/nrg3540. PMC 4418793. PMID 23864121.
  15. ^ Thornton, JW; Need, E; Crews, D (2003). "Resurrecting the Ancestral Steroid Receptor: Ancient Origin of Estrogen Signaling". Science. 301 (5640): 1714–1717. doi:10.1126/science.1086185. PMID 14500980.
  16. ^ Eick, GN; Colucci, JK; Harms, MJ; Ortlund, EA; Thornton, JW (2012). "Evolution of minimal specificity and promiscuity in steroid hormone receptors". PLOS Genetics. 8 (11): e1003072. doi:10.1371/journal.pgen.1003072. PMC 3499368. PMID 23166518.
  17. ^ Harms, MJ; Eick, GN; Goswami, D; Colucci, JK; Griffin, PR; Ortlund, EA; Thornton, JW (2013). "Biophysical mechanisms for large-effect mutations in the evolution of steroid hormone receptors". Proceedings of the National Academy of Sciences USA. Published online June 24.
  18. ^ Finnigan, G; Hanson-Smith, V; Stevens, TH; Thornton, JW (2012). "Mechanisms for the evolution of increased complexity in a molecular machine". Nature. 481 (7381): 360–4. doi:10.1038/nature10724. PMC 3979732. PMID 22230956.
  19. ^ Hart, Kathryn M.; Harms, Michael J.; Schmidt, Bryan H.; Elya, Carolyn; Thornton, Joseph W.; Marqusee, Susan (2014-11-11). "Thermodynamic System Drift in Protein Evolution". PLOS Biology. 12 (11): e1001994. doi:10.1371/journal.pbio.1001994. ISSN 1545-7885. PMC 4227636. PMID 25386647.
  20. ^ Shi, Y.; Yokoyama, S. (2003). "Molecular analysis of the evolutionary significance of ultraviolet vision in vertebrates". Proc. Natl. Acad. Sci. USA. 100 (14): 8308–8313. doi:10.1073/pnas.1532535100. PMC 166225. PMID 12824471.
  21. ^ Voordeckers, K; Brown, CA; Vanneste, K; van der Zande, E; Voet, A; et al. (2012). "Reconstruction of Ancestral Metabolic Enzymes Reveals Molecular Mechanisms Underlying Evolutionary Innovation through Gene Duplication". PLoS Biol. 10 (12): e1001446. doi:10.1371/journal.pbio.1001446. PMC 3519909. PMID 23239941.
  22. ^ Risso, VA; Gavira, JA; Mejia-Carmona, DF; Gaucher, EA; Sanchez-Ruiz, JM (2013). "Hyperstability and Substrate Promiscuity in Laboratory Resurrections of Precambrian β-Lactamases". J. Am. Chem. Soc. 135 (8): 2899–2902. doi:10.1021/ja311630a. PMID 23394108.
  23. ^ a b Battistuzzi, Fabia U; Feijao, Andreia; Hedges, S Blair (2004-11-09). "A genomic timescale of prokaryote evolution: insights into the origin of methanogenesis, phototrophy, and the colonization of land". BMC Evolutionary Biology. 4: 44. doi:10.1186/1471-2148-4-44. ISSN 1471-2148. PMC 533871. PMID 15535883.
  24. ^ Bell, Elizabeth A.; Boehnke, Patrick; Harrison, T. Mark; Mao, Wendy L. (2015-11-24). "Potentially biogenic carbon preserved in a 4.1 billion-year-old zircon". Proceedings of the National Academy of Sciences. 112 (47): 14518–14521. doi:10.1073/pnas.1517557112. ISSN 0027-8424. PMC 4664351. PMID 26483481.
  25. ^ Bridgham, JT; Ortlund, EA; Thornton, JW (2009). "An epistatic ratchet constrains the direction of glucocorticoid receptor evolution". Nature. 461 (7263): 515–519. doi:10.1038/nature08249. PMC 6141187. PMID 19779450.