Expanded genetic code

For vocabulary, see Glossary of gene expression terms. For a non-technical introduction to the topic, see Introduction to genetics.
The synthetase and tRNA must not crosstalk with the existing tRNA/synthetase machinery, only with the ribosomes.

An expanded genetic code refers to an artificially modified genetic code in which one or more specific codons have been allocated to encode an amino acid that is not among the 20 "standard" amino acids.[1]

"Standard" or "natural" amino acids are the 20 proteinogenic alpha-amino acids that in nature are the building-blocks of all proteins within humans and other eukaryotes, and that are also directly encoded by the genetic code.[2][3][4] All others are known as "non-standard", "non-canonical", or "unnatural".

In May 2014, researchers announced that they had successfully introduced two new artificial nucleotides into bacterial DNA, and by including individual artificial nucleotides in the culture media, were able to passage the bacteria 24 times; they did not create mRNA or proteins able to use the artificial nucleotides.[5][6][7][8]


The translation of genetic information contained in messenger RNA (mRNA) into a protein is catalysed by ribosomes. Transfer RNAs (tRNA) are used as keys to decode the mRNA into its encoded polypeptide. The tRNA recognizes a specific three nucleotide codon in the mRNA with a complementary sequence called the anticodon on one of its loops. Each three nucleotide codon is translated into one of twenty naturally occurring amino acids.[9] There is at least one tRNA for any codon, and sometimes multiple codons code for the same amino acid. Many tRNAs are compatible with several codons. An enzyme called an aminoacyl tRNA synthetase covalently attaches the amino acid to the appropriate tRNA.[10] Most cells have a different synthetase for each amino acid (20 synthetases). On the other hand, some bacteria have fewer than 20 aminoacyl tRNA synthetases, and introduce the "missing" amino acid(s) by modification of a structurally related amino acid by an amidotransferase enzyme.[11] Attachment of an amino acid to tRNA uses energy from ATP.[10] The aminoacyl tRNA synthetase often does not recognize the anticodon, but another part of the tRNA, meaning that if the anticodon were to be mutated the encoding of that amino acid would change to a new codon.
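The last point, that mutating the anticodon moves an amino acid to a new codon, can be illustrated with a minimal Python sketch. It assumes strict Watson-Crick pairing and ignores wobble pairing and modified bases; the function name `codon_read_by` and the example anticodons are illustrative choices, not taken from the cited sources.

```python
# Watson-Crick pairing for RNA; wobble pairing is deliberately ignored here.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_read_by(anticodon: str) -> str:
    """Return the codon (5'->3') that pairs with an anticodon written 5'->3'.

    Codon and anticodon pair antiparallel, so the codon is the reverse
    complement of the anticodon.
    """
    return "".join(COMPLEMENT[base] for base in reversed(anticodon))


# Illustrative assumption: a tyrosine tRNA with anticodon GUA, and the same
# tRNA after its anticodon has been mutated to CUA.
print(codon_read_by("GUA"))  # codon read by the unmutated tRNA
print(codon_read_by("CUA"))  # codon read after the anticodon mutation
```

Because the synthetase in this simplified picture still charges the mutated tRNA with the same amino acid, that amino acid would now be inserted at the new codon.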

In the ribosome, the information in mRNA is translated into a specific amino acid when the mRNA codon matches with the complementary anticodon of a tRNA, and the attached amino acid is added onto a growing polypeptide chain. When it is released from the ribosome, the polypeptide chain folds into a functioning protein.[10]
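The decoding step described above can be sketched in Python as a table lookup. This is a simplified model, assuming the standard genetic code with one-letter amino acid symbols, a reading frame that starts at the first base, and no start-codon search; the names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

# Standard genetic code, listed in UCAG order; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA string codon by codon until a stop codon is reached."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":  # stop codon: release the chain
            break
        peptide.append(residue)
    return "".join(peptide)


print(translate("AUGGCUUACUAGGGA"))
```

In this model the amber codon UAG terminates the chain, which is the behaviour that an expanded genetic code overrides.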


There are a few restrictions on the tRNA, synthetase, codon, and unnatural amino acid (Uaa)[12] used to incorporate a novel amino acid into a protein. For successful translation, the codon to which the unnatural amino acid is assigned cannot already code for one of the 20 natural amino acids. Usually a nonsense codon (stop codon) or a four-base codon is used.[9] Together, the tRNA, aminoacyl tRNA synthetase, and codon are called an orthogonal set. The orthogonal set must not crosstalk with the endogenous tRNA and synthetase sets, while still being functionally compatible with the ribosome and other components of the translation apparatus. The active site of the synthetase is modified to accept only the non-natural amino acid. Most often, a library of mutant synthetases is screened for one which charges the tRNA with the desired unnatural amino acid. The synthetase is also modified to recognize only the orthogonal tRNA.[9] The tRNA synthetase pair is often engineered in other bacteria or eukaryotic cells.[13] The unnatural amino acid must be able to enter the cytoplasm when it is added to the growth medium of the cell.[9]
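The codon restriction can be sketched in Python as a reassignment of a stop codon in a lookup table. This is a toy model under stated assumptions: the symbol "X" stands for an arbitrary unnatural amino acid, suppression is treated as fully efficient, and the names `expanded_table` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code in UCAG order; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expanded_table(codon: str = "UAG", uaa_symbol: str = "X") -> dict:
    """Return a copy of the standard code with one stop codon reassigned.

    Sense codons are rejected, mirroring the rule that the chosen codon
    must not already encode one of the 20 natural amino acids.
    """
    if STANDARD[codon] != "*":
        raise ValueError(f"{codon} already encodes {STANDARD[codon]}")
    table = dict(STANDARD)
    table[codon] = uaa_symbol
    return table


def translate(mrna: str, table: dict) -> str:
    """Translate codon by codon until a stop codon is reached."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        residue = table[mrna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


message = "AUGUAGGCUUAA"
print(translate(message, STANDARD))          # amber codon read as stop
print(translate(message, expanded_table()))  # amber codon read as "X"
```

The opal (UGA) and ochre (UAA) codons still terminate translation in the expanded table, so the message above ends at its final UAA.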

A similar, earlier concept is that of alloproteins, which are made by incubating cells with an unnatural amino acid in the absence of a similar coded amino acid, so that the former is incorporated into protein in place of the latter; for example, L-2-aminohexanoic acid (Ahx) can replace methionine (Met).[14]

The possibility of reassigning codons was realized by Normanly et al. in 1990, when a viable mutant strain of E. coli read through the amber (stop) codon.[15] As a result, the amber codon became the codon of choice for assigning a novel amino acid. Later, in the Schultz lab, the tRNATyr/tyrosyl-tRNA synthetase (TyrRS) pair from Methanococcus jannaschii, an archaebacterium,[9] was used to introduce a tyrosine instead of STOP, the default value of the amber codon.[16] As mentioned, this was possible because of the differences between the endogenous bacterial synthetases and the orthologous archaeal synthetase, which do not recognize each other's tRNAs. Schultz subsequently expanded the genetic codes of various organisms, allowing the genetically encoded incorporation of more than 70 unnatural amino acids into proteins. Unnatural amino acids incorporated into proteins by Schultz (and his collaborators, competitors, and trainees) include heavy-atom-containing amino acids to facilitate X-ray crystallographic studies; amino acids with novel steric/packing and electronic properties; photocrosslinking amino acids which can be used to probe protein-protein interactions in vitro or in vivo; keto-, acetylene-, azide-, and boronate-containing amino acids which can be used to selectively introduce a large number of biophysical probes, tags, and novel chemical functional groups into proteins in vitro or in vivo; redox-active amino acids to probe and modulate electron transfer; photocaged and photoisomerizable amino acids to photoregulate biological processes; metal-binding amino acids for catalysis and metal ion sensing; amino acids that contain fluorescent or infrared-active side chains to probe protein structure and dynamics; α-hydroxy acids and D-amino acids as probes of backbone conformation and hydrogen bonding interactions; and sulfated amino acids and mimetics of phosphorylated amino acids as probes of posttranslational modifications.[17][18][19][20]

Directed evolution

The orthogonal set of synthetase and tRNA can then be mutated and screened through directed evolution to charge the tRNA with a different, even novel, amino acid. Mutations to the plasmid containing the pair can be introduced by error-prone PCR or through degenerate primers for the synthetase's active site. Selection involves multiple rounds of a two-step process. In the first step, the plasmid is transferred into cells expressing chloramphenicol acetyl transferase with a premature amber codon. In the presence of toxic chloramphenicol and the non-natural amino acid, the surviving cells will have overridden the amber codon using the orthogonal tRNA aminoacylated with either a standard amino acid or the non-natural one. To remove the former, the plasmid is inserted into cells carrying a (toxic) barnase gene with a premature amber codon and grown without the non-natural amino acid, which removes all the orthogonal synthetases that do not specifically recognize the non-natural amino acid.[9] In addition to recoding the tRNA to a different codon, the tRNA can be mutated to recognize a four-base codon, allowing additional free coding options.[21] The non-natural amino acid, as a result, introduces diverse physicochemical and biological properties, so that it can be used as a tool to explore protein structure and function or to create novel or enhanced proteins for practical purposes.
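The logic of the two-step selection can be sketched in Python as two filters over a library of synthetase variants. This is a toy model, assuming each variant is fully described by two yes/no properties; the class `Variant`, the function names, and the four library members are hypothetical and only illustrate which combinations survive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A synthetase variant, reduced to what it charges onto the orthogonal tRNA."""
    name: str
    charges_uaa: bool      # accepts the non-natural amino acid
    charges_natural: bool  # accepts one or more standard amino acids


def positive_selection(library):
    """Chloramphenicol plus non-natural amino acid.

    A cell survives if the amber codon in the resistance gene is suppressed,
    no matter which amino acid was loaded onto the orthogonal tRNA.
    """
    return [v for v in library if v.charges_uaa or v.charges_natural]


def negative_selection(library):
    """Barnase gene with an amber codon, no non-natural amino acid.

    A cell dies if the amber codon is suppressed with a standard amino acid,
    because the toxic barnase is then made in full length.
    """
    return [v for v in library if not v.charges_natural]


library = [
    Variant("inactive", charges_uaa=False, charges_natural=False),
    Variant("natural_only", charges_uaa=False, charges_natural=True),
    Variant("promiscuous", charges_uaa=True, charges_natural=True),
    Variant("specific", charges_uaa=True, charges_natural=False),
]

survivors = negative_selection(positive_selection(library))
print([v.name for v in survivors])
```

In this simplified picture only the variant that charges the non-natural amino acid and nothing else passes both steps; real selections are run over several rounds because survival is not all-or-nothing.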

Several methods for selecting the synthetase that accepts only the non-natural amino acid have been developed. One of them uses a combination of positive and negative selection.
Some amino acids that have been added in order to label proteins.


The orthogonal pairs of synthetase and tRNA that work for one organism may not work for another, as the synthetase may mis-aminoacylate endogenous tRNAs or the tRNA be mis-aminoacylated itself by an endogenous synthetase. As a result, the sets created to date differ between organisms.
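The two failure modes named above can be written as a small Python check. This is a sketch under stated assumptions: charging in a given host is represented as a set of (synthetase, tRNA) pairs that are observed to be aminoacylated, and all names in the example, including the hypothetical host in which crosstalk occurs, are made up for illustration.

```python
def is_orthogonal(charging, o_synthetase, o_trna):
    """Check a candidate pair against the charging events observed in one host.

    charging: set of (synthetase, tRNA) tuples that are aminoacylated.
    The pair is orthogonal if it works and neither partner crosstalks.
    """
    pair_works = (o_synthetase, o_trna) in charging
    # The introduced synthetase mis-aminoacylates an endogenous tRNA.
    synthetase_crosstalk = any(
        rs == o_synthetase and trna != o_trna for rs, trna in charging
    )
    # The introduced tRNA is mis-aminoacylated by an endogenous synthetase.
    trna_crosstalk = any(
        trna == o_trna and rs != o_synthetase for rs, trna in charging
    )
    return pair_works and not synthetase_crosstalk and not trna_crosstalk


# Hypothetical charging data for two different hosts.
host_a = {("HostTyrRS", "HostTyr"), ("ImportedRS", "ImportedTyr")}
host_b = host_a | {("HostTyrRS", "ImportedTyr")}

print(is_orthogonal(host_a, "ImportedRS", "ImportedTyr"))
print(is_orthogonal(host_b, "ImportedRS", "ImportedTyr"))
```

The same imported pair passes in the first host and fails in the second, which is why the sets listed for each organism differ.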

Orthogonal sets in E. coli

  • tRNATyr-TyrRS pair from the archaeon Methanococcus jannaschii
  • tRNALys–LysRS pair from the archaeon Pyrococcus horikoshii[22]
  • tRNAGlu–GluRS pair from Methanosarcina mazei[23]
  • leucyl-tRNA synthetase from Methanobacterium thermoautotrophicum and a mutant leucyl tRNA derived from Halobacterium sp[24]
  • tRNAAmber-PylRS pair from the archaeon Methanosarcina barkeri and Methanosarcina mazei

Orthogonal sets in yeast

  • tRNATyr-TyrRS pair from Escherichia coli[25]
  • tRNALeu–LeuRS pair from Escherichia coli[26]
  • tRNAiMet from human and GlnRS from Escherichia coli[27]
  • tRNAAmber-PylRS pair from the archaeon Methanosarcina barkeri and Methanosarcina mazei

Orthogonal sets in mammalian cells

  • tRNATyr-TyrRS pair from Bacillus stearothermophilus[28]
  • modified tRNATrp-TrpRS pair from Bacillus subtilis trp[29]
  • tRNALeu–LeuRS pair from Escherichia coli[30]
  • tRNAAmber-PylRS pair from the archaeon Methanosarcina barkeri and Methanosarcina mazei

Unnatural base pair (UBP)

Main article: Unnatural base pair

An unnatural base pair (UBP) is a designed subunit (or nucleobase) of DNA which is created in a laboratory and does not occur in nature. UBPs were demonstrated in vitro by Ichiro Hirao's group at the RIKEN institute in Japan. In 2002, they developed an unnatural base pair between 2-amino-8-(2-thienyl)purine (s) and pyridine-2-one (y) that functions in vitro in transcription and translation for the site-specific incorporation of non-standard amino acids into proteins.[31] In 2006, they created 7-(2-thienyl)imidazo[4,5-b]pyridine (Ds) and pyrrole-2-carbaldehyde (Pa) as a third base pair for replication and transcription.[32] Afterward, Ds and 4-[3-(6-aminohexanamido)-1-propynyl]-2-nitropyrrole (Px) were found to form a high-fidelity pair in PCR amplification.[33][34] In 2013, they applied the Ds-Px pair to DNA aptamer generation by in vitro selection (SELEX) and demonstrated that the genetic alphabet expansion significantly augments DNA aptamer affinities to target proteins.[35]

In 2012, a group of American scientists led by Floyd Romesberg, a chemical biologist at the Scripps Research Institute in San Diego, California, reported that they had designed an unnatural base pair (UBP).[36] The two new artificial nucleotides were named d5SICS and dNaM. More technically, these artificial nucleotides bear hydrophobic nucleobases that feature two fused aromatic rings and form a (d5SICS–dNaM) complex or base pair in DNA.[7][37] In 2014, the same team reported that they had synthesized a stretch of circular DNA, known as a plasmid, containing natural T-A and C-G base pairs along with the best-performing UBP Romesberg's laboratory had designed, and inserted it into cells of the common bacterium E. coli, which successfully replicated the unnatural base pairs through multiple generations.[38] This is the first known example of a living organism passing along an expanded genetic code to subsequent generations.[7][39] This was in part achieved by the addition of a supportive algal gene that expresses a nucleotide triphosphate transporter, which efficiently imports the triphosphates of both d5SICS and dNaM (d5SICSTP and dNaMTP) into E. coli bacteria.[7] The natural bacterial replication pathways then use them to accurately replicate the plasmid containing d5SICS–dNaM.

The successful incorporation of a third base pair into a living micro-organism is a significant breakthrough toward the goal of greatly expanding the number of amino acids which can be encoded by DNA, thereby expanding the potential for living organisms to produce novel proteins.[38] The artificial strings of DNA do not encode anything yet, but scientists speculate they could be designed to manufacture new proteins which could have industrial or pharmaceutical uses.[40]
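The size of the coding space that a third base pair would open up can be counted directly. The Python sketch below is purely combinatorial: the letters "X" and "Y" are placeholder symbols for the two unnatural bases, and it assumes three-letter codons that could all be read, which has not been demonstrated for the organisms described above.

```python
from itertools import product


def codons(alphabet: str, length: int = 3) -> list:
    """Enumerate every codon of the given length over an alphabet of bases."""
    return ["".join(c) for c in product(alphabet, repeat=length)]


natural = codons("ACGU")     # four-letter alphabet
expanded = codons("ACGUXY")  # six-letter alphabet with placeholder bases X and Y

# Codons that contain one or more unnatural bases are free for new assignments.
new_codons = [c for c in expanded if "X" in c or "Y" in c]

print(len(natural), len(expanded), len(new_codons))
```

A six-letter alphabet gives 6^3 = 216 triplet codons instead of 4^3 = 64, so 152 triplets would not collide with any existing assignment.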

Practical applications

With an expanded genetic code, the unnatural amino acid can be genetically directed to any chosen site in the protein of interest. The high efficiency and fidelity of this process allow better control over the placement of the modification than post-translational modification of the protein, which, in general, targets all amino acids of the same type, for example through the thiol group of cysteine or the amino group of lysine.[41] Also, an expanded genetic code allows modifications to be carried out in vivo. The ability to site-specifically direct lab-synthesized chemical moieties into proteins allows many types of studies that would otherwise be extremely difficult, such as:

  • Probing protein structure and function: By using amino acids with slightly different size such as o-Methyltyrosine or dansylalanine instead of tyrosine, and by inserting genetically coded reporter moieties (color-changing and/or spin-active) into selected protein sites, chemical information about the protein's structure and function can be measured.
  • Identifying and regulating protein activity: By using photocaged amino acids, protein function can be "switched" on or off by illuminating the organism.
  • Changing the mode of action of a protein: One can start with the gene for a protein that binds a certain sequence of DNA and, by inserting a chemically active amino acid into the binding site, convert it to a protein that cuts the DNA rather than binding it.
  • Improving immunogenicity and overcoming self-tolerance: By replacing strategically chosen tyrosines with p-nitrophenylalanine, a tolerated self-protein can be made immunogenic.[42]
  • Selective destruction of selected cellular components: using an expanded genetic code, unnatural, destructive chemical moieties (sometimes called "chemical warheads") can be incorporated into proteins that target specific cellular components.[43]
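Directing the unnatural amino acid to a chosen site amounts to placing the reassigned codon at that position in the gene. A minimal Python sketch, assuming the amber codon (TAG in DNA) is the reassigned codon, that the input is an in-frame coding sequence, and that residues are numbered from 1; the function name `place_amber` and the example sequence are hypothetical.

```python
def place_amber(coding_dna: str, residue_number: int) -> str:
    """Replace the codon of one residue (1-based) with the amber codon TAG."""
    if len(coding_dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    n_codons = len(coding_dna) // 3
    if not 1 <= residue_number <= n_codons:
        raise ValueError("residue number is outside the coding sequence")
    start = 3 * (residue_number - 1)
    return coding_dna[:start] + "TAG" + coding_dna[start + 3:]


# Made-up gene: Met-Ala-Tyr-Gly followed by an ochre (TAA) stop codon.
gene = "ATGGCTTACGGATAA"
print(place_amber(gene, 3))  # the tyrosine codon at residue 3 becomes TAG
```

In a host that carries a suitable orthogonal set, the edited gene would be expected to yield the protein with the unnatural amino acid at residue 3 only, in contrast to chemical modification of every residue of one type.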

New organisms with expanded genetic codes

In the Schultz lab, a bacterial organism has been generated which biosynthesizes a novel, previously unnatural amino acid (p-aminophenylalanine) from basic carbon sources and includes this amino acid in its genetic code.[20][44][45] This is the first example of the creation of an autonomous twenty-one-amino-acid organism.

References


  1. ^ Xie, J; Schultz, PG (2005). "Adding amino acids to the genetic repertoire". Current Opinion in Chemical Biology 9 (6): 548–54. doi:10.1016/j.cbpa.2005.10.011. PMID 16260173. 
  2. ^ Modeling Electrostatic Contributions to Protein Folding and Binding - Tjong, p.1 footnote
  3. ^ Frontiers in Drug Design and Discovery ed. Atta-Ur-Rahman & others, p.299
  4. ^ Elzanowski A, Ostell J (2008-04-07). "The Genetic Codes". National Center for Biotechnology Information (NCBI). Retrieved 2010-03-10. 
  5. ^ Pollack, Andrew (May 7, 2014). "Researchers Report Breakthrough in Creating Artificial Genetic Code". New York Times. Retrieved May 7, 2014. 
  6. ^ Callaway, Ewen (May 7, 2014). "First life with 'alien' DNA". Nature. doi:10.1038/nature.2014.15179. Retrieved May 7, 2014. 
  7. ^ a b c d Malyshev, Denis A.; Dhami, Kirandeep; Lavergne, Thomas; Chen, Tingjian; Dai, Nan; Foster, Jeremy M.; Corrêa, Ivan R.; Romesberg, Floyd E. (May 7, 2014). "A semi-synthetic organism with an expanded genetic alphabet". Nature. doi:10.1038/nature13314. Retrieved May 7, 2014. 
  8. ^ Amos, Jonathan (8 May 2014). "Semi-synthetic bug extends ‘life's alphabet’". BBC News. Retrieved 2014-05-09. 
  9. ^ a b c d e f Wang, L.; Brock, A.; Herberich, B.; Schultz, P. G. (April 2001). "Expanding the Genetic Code of Escherichia coli". Science 292 (5516): 498–500. doi:10.1126/science.1060077. PMID 11313494. 
  10. ^ a b c Alberts, Bruce; et al. (2008). Molecular Biology of the Cell (5th ed.). New York: Garland Science. ISBN 0815341059. 
  11. ^ Woese, Carl; et al. (2000). "Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process". Microbiol. Mol. Biol. Rev. 64: 202–236. 
  12. ^ Minnihan, Ellen C.; Yokoyama, Kenichi; Stubbe, JoAnne (Nov 2009). "Unnatural amino acids: better than the real things?". F1000 Biology Reports 1 (88). doi:10.3410/B1-88. 
  13. ^ Sakamoto, K. (2002). "Site-specific incorporation of an unnatural amino acid into proteins in mammalian cells". Nucleic Acids Research 30 (21): 4692–4699. doi:10.1093/nar/gkf589. PMC 135798. PMID 12409460. 
  14. ^ Koide, H.; Yokoyama, S.; Kawai, G.; Ha, J. M.; Oka, T.; Kawai, S.; Miyake, T.; Fuwa, T.; Miyazawa, T. (1988). "Biosynthesis of a protein containing a nonprotein amino acid by Escherichia coli: L-2-aminohexanoic acid at position 21 in human epidermal growth factor". Proceedings of the National Academy of Sciences of the United States of America 85 (17): 6237–6241. doi:10.1073/pnas.85.17.6237. PMC 281944. PMID 3045813. 
  15. ^ Normanly, J; Kleina, L.G.; Masson, J.M.; Abelson, J.; Miller, J.H. (1990). "Construction of Escherichia coli amber suppressor tRNA genes. III. Determination of tRNA specificity". J. Mol. Biol. 213 (4): 719–726. doi:10.1016/S0022-2836(05)80258-X. PMID 2141650. 
  16. ^ Wang, L.; Magliery, T.J.; Liu, D.R.; Schultz, P.G. (2000). "A new functional suppressor tRNA/aminoacyl-tRNA synthetase pair for the in vivo incorporation of unnatural amino acids into proteins". J. Am. Chem. Soc. 122 (20): 5010–5011. doi:10.1021/ja000595y. 
  17. ^ Wang, L; Xie, J; Schultz, P. G. (2006). "Expanding the genetic code". Annual Review of Biophysics and Biomolecular Structure 35: 225–49. doi:10.1146/annurev.biophys.35.101105.121507. PMID 16689635. 
  18. ^ Young, T. S.; Schultz, P. G. (2010). "Beyond the canonical 20 amino acids: Expanding the genetic lexicon". Journal of Biological Chemistry 285 (15): 11039–44. doi:10.1074/jbc.R109.091306. PMC 2856976. PMID 20147747. 
  19. ^ http://www.annualreviews.org/doi/abs/10.1146/annurev.biophys.35.101105.121507
  20. ^ a b http://schultz.scripps.edu/research.php
  21. ^ Watanabe, T; Muranaka, N; Hohsaka, T. (2008). "Four-base codon-mediated saturation mutagenesis in a cell-free translation system". J Biosci Bioeng 105 (3): 211–5. doi:10.1263/jbb.105.211. PMID 18397770. 
  22. ^ Anderson, J.C.; Wu, N.; Santoro, S.W.; Lakshman, V.; King, D.S.; Schultz, P.G. (2004). "An expanded genetic code with a functional quadruplet codon". Proc Natl Acad Sci USA 101 (20): 7566–7571. doi:10.1073/pnas.0401517101. PMC 419646. PMID 15138302. 
  23. ^ Santoro, S.W.; Anderson, J.C.; Lakshman, V.; Schultz, P.G. (2003). "An archaebacteria-derived glutamyl-tRNA synthetase and tRNA pair for unnatural amino acid mutagenesis of proteins in Escherichia coli". Nucleic Acids Res 31 (23): 6700–6709. doi:10.1093/nar/gkg903. PMC 290271. PMID 14627803. 
  24. ^ Anderson, J.C.; Schultz, P.G. (2003). "Adaptation of an orthogonal archaeal leucyl-tRNA and synthetase pair for four-base, amber, and opal suppression". Biochemistry 42 (32): 9598–9608. doi:10.1021/bi034550w. PMID 12911301. 
  25. ^ Chin, J.W.; Cropp, T.A.; Anderson, J.C.; Mukherji, M.; Zhang, Z.; Schultz, P.G. (2003). "An expanded eukaryotic genetic code". Science 301 (5635): 964–967. doi:10.1126/science.1084772. PMID 12920298. 
  26. ^ Wu, N.; Deiters, A.; Cropp, T.A.; King, D.; Schultz, P.G. (2004). "A genetically encoded photocaged amino Acid". J Am Chem Soc 126 (44): 14306–14307. doi:10.1021/ja040175z. PMID 15521721. 
  27. ^ Kowal, A.K.; Kohrer, C.; RajBhandary, U.L. (2001). "Twenty-first aminoacyl-tRNA synthetase–suppressor tRNA pairs for possible use in site-specific incorporation of amino acid analogues into proteins in eukaryotes and in eubacteria". Proc Natl Acad Sci USA 98 (5): 2268–2273. doi:10.1073/pnas.031488298. PMC 30127. PMID 11226228. 
  28. ^ Sakamoto, K.; Hayashi, A.; Sakamoto, A.; Kiga, D.; Nakayama, H.; Soma, A.; Kobayashi, T.; Kitabatake, M. et al. (2002). "Site-specific incorporation of an unnatural amino acid into proteins in mammalian cells". Nucleic Acids Res. 30 (21): 4692–4699. doi:10.1093/nar/gkf589. PMC 135798. PMID 12409460. 
  29. ^ Zhang, Z.; Alfonta, L.; Tian, F.; Bursulaya, B.; Uryu, S.; King, D.S.; Schultz, P.G. (2004). "Selective incorporation of 5-hydroxytryptophan into proteins in mammalian cells". Proc. Natl. Acad. Sci. USA 101 (24): 8882–8887. doi:10.1073/pnas.0307029101. PMC 428441. PMID 15187228. 
  30. ^ Wang, W.; Takimoto, J.; Louie, G.V.; Baiga, T.J.; Noel, J.P.; Lee, K.F.; Slesinger, P.A.; Wang, L. (2007). "Genetically encoding unnatural amino acids for cellular and neuronal studies". Nat. Neurosci 10 (8): 1063–1072. doi:10.1038/nn1932. PMC 2692200. PMID 17603477. 
  31. ^ Hirao, I. et al. (2002) An unnatural base pair for incorporating amino acid analogs into proteins. Nat. Biotechnol. 20, 177-182
  32. ^ Hirao, I. et al. (2006) An unnatural hydrophobic base pair system: site-specific incorporation of nucleotide analogs into DNA and RNA. Nat. Methods 3, 729-735
  33. ^ Kimoto, M. et al. (2009) An unnatural base pair system for efficient PCR amplification and functionalization of DNA molecules. Nucleic Acids Res. 37, e14
  34. ^ Yamashige, R. et al. (2012) Highly specific unnatural base pair systems as a third base pair for PCR amplification. Nucleic Acids Res. 40, 2793-2806
  35. ^ Kimoto, M. et al. (2013) Generation of high-affinity DNA aptamers using an expanded genetic alphabet. Nat. Biotechnol. 31, 453-457
  36. ^ Malyshev, Denis A.; Dhami, Kirandeep; Quach, Henry T.; Lavergne, Thomas; Ordoukhanian, Phillip (24 July 2012). "Efficient and sequence-independent replication of DNA containing a third base pair establishes a functional six-letter genetic alphabet". Proceedings of the National Academy of Sciences of the United States of America (PNAS) 109 (30): 12005–12010. doi:10.1073/pnas.1205176109. Retrieved 2014-05-11. 
  37. ^ Callaway, Ewen (May 7, 2014). "Scientists Create First Living Organism With 'Artificial' DNA". Nature News (Huffington Post). Retrieved 8 May 2014. 
  38. ^ a b Fikes, Bradley J. (May 8, 2014). "Life engineered with expanded genetic code". San Diego Union Tribune. Retrieved 8 May 2014. 
  39. ^ Sample, Ian (May 7, 2014). "First life forms to pass on artificial DNA engineered by US scientists". The Guardian. Retrieved 8 May 2014. 
  40. ^ Pollack, Andrew (May 7, 2014). "Scientists Add Letters to DNA’s Alphabet, Raising Hope and Fear". New York Times. Retrieved 8 May 2014. 
  41. ^ Wang, Q; Parrish, AR; Wang, L (2009). "Expanding the Genetic Code for Biological Studies". Chemistry & biology 16 (3): 323–36. doi:10.1016/j.chembiol.2009.03.001. PMC 2696486. PMID 19318213. 
  42. ^ Gauba, V; Grünewald, J; Gorney, V; Deaton, L. M.; Kang, M; Bursulaya, B; Ou, W; Lerner, R. A.; Schmedt, C; Geierstanger, B. H.; Schultz, P. G.; Ramirez-Montagut, T (2011). "Loss of CD4 T-cell-dependent tolerance to proteins with modified amino acids". Proceedings of the National Academy of Sciences 108 (31): 12821–6. doi:10.1073/pnas.1110042108. PMC 3150954. PMID 21768354. 
  43. ^ Liu, CC; Mack, AV; Brustad, EM; Mills, JH; Groff, D; Smider, VV; Schultz, PG. (2009). "The Evolution of Proteins with Genetically Encoded "Chemical Warheads"". J Am Chem Soc. 131 (28): 9616–7. doi:10.1021/ja902985e. PMC 2745334. PMID 19555063. 
  44. ^ Mehl, R.A.; Anderson, J.C.; Santoro, S.W.; Wang, L.; Martin, A.B.; King, D.S.; Horn, D.M.; Schultz, P.G. (2003). "Generation of a bacterium with a 21 amino acid genetic code". Journal of the American Chemical Society 125 (4): 935–9. 
  45. ^ http://straddle3.net/context/03/en/2003_01_30.html