Artificial gene synthesis
Artificial gene synthesis, sometimes known as DNA printing, is a method in synthetic biology that is used to create artificial genes in the laboratory. Currently based on solid-phase DNA synthesis, it differs from molecular cloning and polymerase chain reaction (PCR) in that the user does not have to begin with preexisting DNA sequences. Therefore, it is possible to make a completely synthetic double-stranded DNA molecule with no apparent limits on either nucleotide sequence or size. The method has been used to generate functional bacterial or yeast chromosomes containing approximately one million base pairs. Recent research also suggests the possibility of creating novel nucleobase pairs in addition to the two base pairs found in nature, which could greatly expand the genetic code.
Synthesis of the first complete gene, a yeast tRNA, was demonstrated by Har Gobind Khorana and coworkers in 1972. Synthesis of the first peptide- and protein-coding genes was performed in the laboratories of Herbert Boyer and Alexander Markham, respectively.
Commercial gene synthesis services are now available from numerous companies worldwide, some of which have built their business model around this task. Current gene synthesis approaches are most often based on a combination of organic chemistry and molecular biological techniques, and entire genes may be synthesized "de novo", without the need for precursor template DNA. Gene synthesis has become an important tool in many fields of recombinant DNA technology, including heterologous gene expression, vaccine development, gene therapy and molecular engineering. The synthesis of nucleic acid sequences is often more economical than classical cloning and mutagenesis procedures.
Gene optimization
While the ability to make increasingly long stretches of DNA efficiently and at lower prices is a technological driver of this field, attention is increasingly focused on improving the design of genes for specific purposes. Early in the genome sequencing era, gene synthesis was used as an (expensive) source of cDNAs that were predicted by genomic or partial cDNA information but were difficult to clone. As higher-quality sources of sequence-verified cloned cDNAs have become available, this practice has become less urgent.
Producing large amounts of protein from gene sequences found in nature (or at least from the protein-coding regions of genes, the open reading frames) can sometimes prove difficult, and the problem is of sufficient impact that scientific conferences have been devoted to it. Many of the most interesting proteins sought by molecular biologists are normally regulated to be expressed in very low amounts in wild-type cells. Redesigning these genes offers a means to improve gene expression in many cases. Rewriting the open reading frame is possible because of the degeneracy of the genetic code: most amino acids are encoded by more than one codon, so it is possible to change up to about a third of the nucleotides in an open reading frame and still produce the same protein. The number of alternative designs possible for a given protein is astronomical; for a typical protein sequence of 300 amino acids there are over 10^150 codon combinations that encode an identical protein. Optimization methods such as replacing rarely used codons with more common ones sometimes have dramatic effects. Further optimizations, such as removing RNA secondary structures, can also be included. At least in the case of E. coli, protein expression is maximized by predominantly using codons corresponding to tRNAs that retain amino acid charging during starvation. Computer programs that perform these and other optimizations simultaneously are used to handle the enormous complexity of the task. A well-optimized gene can improve protein expression 2- to 10-fold, and in some cases improvements of more than 100-fold have been reported. Because of the large number of nucleotide changes made to the original DNA sequence, the only practical way to create the newly designed genes is to use gene synthesis.
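The degeneracy argument can be made concrete with a short Python sketch. It builds the standard genetic code, counts how many codon sequences encode a given protein (the product of each residue's number of synonymous codons) and recodes an open reading frame using one preferred codon per amino acid. The `preferred` table is a hypothetical illustration, not a real host codon-usage table; practical optimization programs also weigh RNA secondary structure, GC content and other factors.

```python
from math import prod

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# first base varying slowest. "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

# Group synonymous codons by the amino acid (or stop) they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(orf: str) -> str:
    """Translate an ORF codon by codon (length must be a multiple of 3)."""
    if len(orf) % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


def count_encodings(protein: str) -> int:
    """Number of distinct codon sequences encoding `protein` (stop codon excluded)."""
    return prod(len(SYNONYMS[aa]) for aa in protein)


def recode(orf: str, preferred: dict) -> str:
    """Swap each codon for the preferred synonym of its amino acid, if one is given."""
    new_codons = []
    for i in range(0, len(orf), 3):
        codon = orf[i:i + 3]
        aa = CODON_TABLE[codon]
        choice = preferred.get(aa, codon)
        if CODON_TABLE[choice] != aa:
            raise ValueError(f"{choice} does not encode {aa}")
        new_codons.append(choice)
    return "".join(new_codons)


# Hypothetical preferences for leucine and arginine (illustration only).
preferred = {"L": "CTG", "R": "CGT"}
orf = "ATGCTACGAAGGTAA"  # Met-Leu-Arg-Arg-stop
optimized = recode(orf, preferred)
assert translate(optimized) == translate(orf)  # same protein, different DNA
print(optimized, count_encodings("MLRR"))  # ATGCTGCGTCGTTAA 216
```

Even this four-residue peptide has 1 x 6 x 6 x 6 = 216 possible encodings, which is why the design space for a full-length protein is astronomical and why software, rather than manual design, is used to search it.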
Standard methods
Oligonucleotides are chemically synthesized using building blocks called nucleoside phosphoramidites. These can be normal or modified nucleosides that carry protecting groups to prevent their amine, hydroxyl and phosphate groups from reacting incorrectly. Nucleotides are added one at a time: the 5' hydroxyl group of the growing chain is deprotected, the next phosphoramidite is coupled to it, and the cycle repeats. The chain therefore grows in the 3' to 5' direction, which is backwards relative to biosynthesis. At the end, all the protecting groups are removed. Because no chemical step is perfectly efficient, some incorrect reactions occur, leading to a fraction of defective products. The longer the oligonucleotide being synthesized, the more defects accumulate, so this process is only practical for producing short sequences of nucleotides. The current practical limit is about 200 nucleotides for an oligonucleotide of sufficient quality to be used directly in a biological application. HPLC can be used to isolate the full-length products. Alternatively, large numbers of oligonucleotides can be synthesized in parallel on gene chips, although for optimal performance in subsequent gene synthesis procedures they should be prepared individually and at larger scales.
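The length limit follows from compounding per-step losses. The Python sketch below assumes that every coupling succeeds independently with the same efficiency, and that an n-mer needs n - 1 couplings because the first (3'-most) nucleoside is already attached to the support; the full-length fraction is then the efficiency raised to the power n - 1. The 99% coupling efficiency used here is an illustrative value, not a measured figure; under that assumption only about 13.5% of 200-mers would be full length.

```python
def coupling_order(oligo_5_to_3: str) -> list:
    """Order in which bases are added.

    Chemical synthesis runs 3' to 5', so the 3'-most base (the one attached
    to the solid support) comes first.
    """
    return list(reversed(oligo_5_to_3))


def full_length_fraction(length_nt: int, coupling_efficiency: float) -> float:
    """Expected fraction of chains that reach full length.

    Simplifying assumption: each of the length_nt - 1 couplings succeeds
    independently with the same probability.
    """
    if length_nt < 1:
        raise ValueError("length must be at least 1 nt")
    if not 0.0 < coupling_efficiency <= 1.0:
        raise ValueError("coupling efficiency must be in (0, 1]")
    return coupling_efficiency ** (length_nt - 1)


print(coupling_order("ACGT"))  # ['T', 'G', 'C', 'A']
for n in (20, 60, 100, 200):
    print(f"{n:>3} nt: {full_length_fraction(n, 0.99):.3f} full length")
```

Because the loss is exponential in length, even a small improvement in per-step efficiency matters more for long oligonucleotides than for short ones.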
Annealing based connection of oligonucleotides
Usually, a set of individually designed oligonucleotides is made on automated solid-phase synthesizers, purified, and then connected by specific annealing and standard ligation or polymerase reactions. To improve the specificity of oligonucleotide annealing, the synthesis step relies on a set of thermostable DNA ligase and polymerase enzymes. To date, several methods for gene synthesis have been described, such as the ligation of phosphorylated overlapping oligonucleotides, the FokI method and a modified form of ligase chain reaction for gene synthesis. Additionally, several PCR assembly approaches have been described. They usually employ overlapping oligonucleotides 40-50 nucleotides long. These oligonucleotides are designed to cover most of the sequence of both strands, and the full-length molecule is generated progressively by overlap extension (OE) PCR, thermodynamically balanced inside-out (TBIO) PCR or combined approaches. The most commonly synthesized genes range in size from 600 to 1,200 bp, although much longer genes have been made by connecting previously assembled fragments of under 1,000 bp. In this size range, several candidate clones usually have to be tested, and the sequence of the cloned synthetic gene confirmed by automated sequencing.
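The oligonucleotide layout for PCR assembly can be pictured as a tiling problem. The Python sketch below uses a simplified scheme assumed for illustration (not the algorithm of any particular design program): it cuts a gene into 45-nt pieces that overlap their neighbours by 20 nt and takes alternate pieces from the bottom strand, so each oligonucleotide can anneal to the next through the shared region while the short gaps on each strand are left for the polymerase to fill. The `reassemble` helper only checks that the design is consistent with the input sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_assembly_oligos(gene: str, oligo_len: int = 45, overlap: int = 20):
    """Tile `gene` with oligos that overlap their neighbours and alternate strands.

    Returns (start, strand, sequence) tuples; `start` is the top-strand
    coordinate of the region covered. '+' oligos are copied from the top
    strand, '-' oligos are reverse complements (bottom strand).
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("need 0 < overlap < oligo_len")
    step = oligo_len - overlap
    starts = list(range(0, max(len(gene) - oligo_len, 0) + 1, step))
    if starts[-1] + oligo_len < len(gene):
        starts.append(len(gene) - oligo_len)  # last oligo ends flush with the gene
    oligos = []
    for i, start in enumerate(starts):
        piece = gene[start:start + oligo_len]
        if i % 2 == 0:
            oligos.append((start, "+", piece))
        else:
            oligos.append((start, "-", reverse_complement(piece)))
    return oligos


def reassemble(oligos) -> str:
    """Rebuild the top strand from the design, failing on any inconsistency."""
    top = {}
    for start, strand, seq in oligos:
        top_seq = seq if strand == "+" else reverse_complement(seq)
        for offset, base in enumerate(top_seq):
            if top.setdefault(start + offset, base) != base:
                raise ValueError(f"conflicting bases at position {start + offset}")
    return "".join(top[i] for i in range(len(top)))


# Hypothetical toy sequence, for illustration only.
toy_gene = ("ATGAGTAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTT"
            "AATGGGCACAAATTTTCTGTCAGTGGAGAGGGTGAAGGTGATGCAACATACGGAAAACTTACC")
design = design_assembly_oligos(toy_gene)
for start, strand, seq in design:
    print(start, strand, seq)
assert reassemble(design) == toy_gene
```

Alternating strands is what lets a single pool of oligonucleotides be extended into a double-stranded product: each '+' oligo primes on the neighbouring '-' oligos and vice versa.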
Moreover, because assembly of the full-length gene product relies on the efficient and specific alignment of long single-stranded oligonucleotides, critical parameters for synthesis success include extended sequence regions with secondary structures caused by inverted repeats, extraordinarily high or low GC content, or repetitive structures. Usually these segments of a gene can only be synthesized by splitting the procedure into several consecutive steps followed by a final assembly of shorter sub-sequences, which significantly increases the time and labor needed for production. The result of a gene synthesis experiment depends strongly on the quality of the oligonucleotides used: for these annealing-based protocols, the quality of the product depends directly and exponentially on the correctness of the employed oligonucleotides. Alternatively, if gene synthesis is performed with oligonucleotides of lower quality, more effort must go into downstream quality assurance during clone analysis, which is usually done by time-consuming standard cloning and sequencing procedures. Another problem associated with all current gene synthesis methods is the high frequency of sequence errors caused by the use of chemically synthesized oligonucleotides. The error frequency increases with oligonucleotide length, and as a consequence the percentage of correct product decreases dramatically as more oligonucleotides are used. The mutation problem could be reduced by assembling the gene from shorter oligonucleotides. However, all annealing-based assembly methods require the primers to be mixed together in one tube, and in that case shorter overlaps do not always allow precise and specific annealing of complementary primers, which inhibits formation of the full-length product. Manual design of oligonucleotides is a laborious procedure and does not guarantee successful synthesis of the desired gene.
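The effect of sequence errors on clone screening can be estimated with a simple model. The Python sketch below assumes that errors strike each base independently at a fixed rate (the 1-in-600 rate is a hypothetical value chosen for illustration). The fraction of error-free molecules then falls exponentially with length, and the number of clones that must be sequenced to find at least one correct clone with a chosen confidence follows from the geometric distribution.

```python
from math import ceil, log


def error_free_fraction(length_bp: int, error_rate_per_base: float) -> float:
    """Probability that a molecule of `length_bp` carries no error.

    Assumes independent errors at the same rate at every base.
    """
    return (1.0 - error_rate_per_base) ** length_bp


def clones_to_screen(length_bp: int, error_rate_per_base: float,
                     confidence: float = 0.95) -> int:
    """Smallest k with P(at least one error-free clone among k) >= confidence."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    p_ok = error_free_fraction(length_bp, error_rate_per_base)
    if p_ok >= 1.0:
        return 1
    if p_ok <= 0.0:
        raise ValueError("no error-free molecules expected")
    # Solve 1 - (1 - p_ok)**k >= confidence for the smallest integer k.
    return ceil(log(1.0 - confidence) / log(1.0 - p_ok))


rate = 1 / 600  # hypothetical per-base error rate, illustration only
for length in (500, 1000, 2000):
    print(length, round(error_free_fraction(length, rate), 3),
          clones_to_screen(length, rate))
```

Under this model, doubling the gene length roughly squares the error-free fraction, which is why longer genes are usually built from separately verified sub-fragments.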
For optimal performance of almost all annealing-based methods, the melting temperatures of the overlapping regions should be similar for all oligonucleotides. The necessary primer optimization should be performed using specialized oligonucleotide design programs, and several solutions for automated primer design for gene synthesis have been presented.
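Balancing overlap melting temperatures can be sketched by growing each overlap until its estimated Tm reaches a common target. For simplicity, the Python sketch below estimates Tm with the basic GC-content formula Tm = 64.9 + 41(nGC - 16.4)/N (in °C), which is only a rough approximation; dedicated design programs typically use nearest-neighbour thermodynamic models that also account for salt and oligonucleotide concentrations. The 55 °C target and the 15-30 nt length window are assumed values.

```python
def basic_tm(seq: str) -> float:
    """Rough Tm estimate in deg C: 64.9 + 41 * (nGC - 16.4) / N."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def balanced_overlap(gene: str, start: int, target_tm: float = 55.0,
                     min_len: int = 15, max_len: int = 30):
    """Shortest overlap at `start` whose estimated Tm reaches `target_tm`.

    Returns (length, tm), or None if no length in the window qualifies,
    e.g. in an AT-rich region that would need redesigning or longer overlaps.
    """
    for length in range(min_len, max_len + 1):
        if start + length > len(gene):
            break
        tm = basic_tm(gene[start:start + length])
        if tm >= target_tm:
            return length, tm
    return None


# Hypothetical toy sequence; overlaps placed every 25 nt for illustration.
toy_gene = ("ATGAGTAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTT"
            "AATGGGCACAAATTTTCTGTCAGTGGAGAGGGTGAAGGTGATGCAACATACGGAAAACTTACC")
for junction in range(25, len(toy_gene) - 30, 25):
    print(junction, balanced_overlap(toy_gene, junction))
```

Letting the overlap length vary, rather than fixing it, is what keeps the melting temperatures of GC-rich and AT-rich junctions close to each other.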
Error correction procedures
To overcome problems associated with oligonucleotide quality, several elaborate strategies have been developed, employing either separately prepared fishing oligonucleotides, mismatch-binding enzymes of the MutS family or specific endonucleases from bacteria or phages. Nevertheless, all these strategies increase the time and cost of gene synthesis based on the annealing of chemically synthesized oligonucleotides.
Massively parallel sequencing has also been used as a tool to screen complex oligonucleotide libraries and enable the retrieval of accurate molecules. In one approach, oligonucleotides are sequenced on the 454 pyrosequencing platform, and a robotic system images and picks individual beads corresponding to accurate sequences. In another approach, a complex oligonucleotide library is modified with unique flanking tags before massively parallel sequencing. Tag-directed primers then enable the retrieval of molecules with the desired sequences by dial-out PCR.
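The selection step of tag-directed retrieval can be sketched as a grouping problem. In the simplified Python model below (an assumption for illustration, not the published analysis pipeline), sequencing reads arrive as (tag, sequence) pairs; a tag is accepted only if it has enough reads, all of them agree, and their sequence matches a designed oligonucleotide exactly. A primer matching the accepted tag would then be used for dial-out PCR. Real analyses build consensus sequences so that occasional sequencing errors do not disqualify a correct molecule.

```python
from collections import defaultdict


def pick_verified_tags(reads, designs, min_reads=2):
    """Map each design name to one tag whose reads all match it exactly.

    reads: iterable of (tag, sequence) pairs from sequencing the tagged library.
    designs: dict mapping design name -> intended sequence.
    min_reads: hypothetical threshold for trusting a tag.
    """
    by_tag = defaultdict(list)
    for tag, seq in reads:
        by_tag[tag].append(seq)
    wanted = {seq: name for name, seq in designs.items()}
    chosen = {}
    for tag in sorted(by_tag):
        seqs = by_tag[tag]
        if len(seqs) < min_reads or len(set(seqs)) != 1:
            continue  # too few reads, or reads disagree (mixed or faulty molecules)
        name = wanted.get(seqs[0])
        if name is not None and name not in chosen:
            chosen[name] = tag  # this tag's primer would retrieve the correct molecule
    return chosen


designs = {"oligo1": "ACGTACGT", "oligo2": "TTGGCCAA"}  # hypothetical designs
reads = [("tagA", "ACGTACGT"), ("tagA", "ACGTACGT"),
         ("tagB", "ACGTTCGT"), ("tagB", "ACGTTCGT"),   # carries a synthesis error
         ("tagC", "TTGGCCAA"), ("tagC", "TTGGCCAA"), ("tagC", "TTGGCCAA"),
         ("tagD", "TTGGCCAA")]                         # too few reads to trust
print(pick_verified_tags(reads, designs))  # {'oligo1': 'tagA', 'oligo2': 'tagC'}
```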
Applications
Increasingly, genes are ordered in sets that include functionally related genes or multiple sequence variants of a single gene. Virtually all of the therapeutic proteins in development, such as monoclonal antibodies, are optimized by testing many gene variants for improved function or expression.
Major applications of synthetic genes include the synthesis of DNA sequences identified by high-throughput sequencing but never cloned into plasmids, and the ability to safely obtain genes for vaccine research without the need to grow the full pathogens. Digitally editing a genetic sequence before it is synthesized into DNA can be used to optimize protein expression in a particular host, or to remove non-functional segments in order to facilitate further replication of the DNA.
DNA synthesis also makes it possible to store digital data in DNA.
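As a minimal sketch of the idea, the Python code below maps each pair of bits to one of the four bases (00 to A, 01 to C, 10 to G, 11 to T), packing one byte into four nucleotides. This mapping is a hypothetical illustration: practical DNA storage schemes also avoid long homopolymer runs and extreme GC content, split data across addressed oligonucleotides and add error correction, none of which is done here.

```python
BITS_TO_BASE = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Naive 2-bits-per-base encoding, most significant bits first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases)


def decode(dna: str) -> bytes:
    """Inverse of encode(); raises KeyError on characters other than A/C/G/T."""
    if len(dna) % 4:
        raise ValueError("length must be a multiple of 4 bases")
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASE_TO_BITS[base]
        out.append(byte)
    return bytes(out)


dna = encode(b"Hi")
print(dna)          # CAGACGGC
print(decode(dna))  # b'Hi'
```

A zero byte encodes as "AAAA", a homopolymer run that is difficult to synthesize and sequence accurately, which illustrates why real encodings are more constrained than this one.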
DNA synthesis and synthetic biology
The significant drop in the cost of gene synthesis in recent years, due to increasing competition among companies providing this service, has made it possible to produce entire bacterial plasmids that have never existed in nature. The field of synthetic biology uses the technology to produce synthetic biological circuits: stretches of DNA designed to change gene expression within cells and cause the cell to produce a desired product.
Entire bacterial genomes
Synthia and Mycoplasma laboratorium
On June 28, 2007, a team at the J. Craig Venter Institute published an article in Science Express, saying that they had successfully transplanted the natural DNA from a Mycoplasma mycoides bacterium into a Mycoplasma capricolum cell, creating a bacterium that behaved like an M. mycoides.
On October 6, 2007, Craig Venter announced in an interview with the UK newspaper The Guardian that the same team had chemically synthesized a modified version of the single chromosome of Mycoplasma genitalium. The chromosome was modified to eliminate all genes that tests in live bacteria had shown to be unnecessary. The next planned step in this minimal genome project was to transplant the synthesized minimal genome into a bacterial cell with its old DNA removed; the resulting bacterium would be called Mycoplasma laboratorium. The next day the Canadian bioethics group ETC Group issued a statement through its representative, Pat Mooney, saying Venter's "creation" was "a chassis on which you could build almost anything". The synthesized genome had not yet been transplanted into a working cell.
On May 21, 2010, Science reported that the Venter group had successfully synthesized the genome of the bacterium Mycoplasma mycoides from a computer record and transplanted the synthesized genome into an existing cell of a Mycoplasma capricolum bacterium whose own DNA had been removed. The "synthetic" bacterium was viable, i.e. capable of replicating billions of times. The team had originally planned to use the M. genitalium bacterium they had previously been working with, but switched to M. mycoides because the latter grows much faster, which translated into quicker experiments. Venter described it as "the first species... to have its parents be a computer". The transformed bacterium was dubbed "Synthia" by ETC. A Venter spokesperson declined to confirm any breakthrough at the time.
Yeast chromosome
In March 2014, Jef Boeke of the Langone Medical Center at New York University reported that his team had synthesized chromosome III, one of the 16 chromosomes of the yeast S. cerevisiae, which he named synIII. The procedure involved replacing the genes in the original chromosome with synthetic versions, and the finished human-made chromosome was then integrated into a yeast cell. It required designing and creating 273,871 base pairs of DNA, fewer than the 316,667 pairs in the original chromosome.
Unnatural base pair (UBP)
DNA sequences have been described which use newly created nucleobases to form a third base pair, in addition to the two base pairs found in nature, A-T (adenine – thymine) and G-C (guanine – cytosine). Multiple research groups have been searching for a third base pair for DNA, including teams led by Steven A. Benner, Philippe Marliere, and Ichiro Hirao. Some new base pairs have been reported.
In 2012, a group of American scientists led by Floyd Romesberg, a chemical biologist at the Scripps Research Institute in San Diego, California, published that the team had designed an unnatural base pair (UBP). The two new artificial nucleotides that make up the UBP were named d5SICS and dNaM. More technically, these artificial nucleotides bear hydrophobic nucleobases featuring two fused aromatic rings, and they pair with each other in DNA to form the d5SICS–dNaM base pair. In 2014 the same team reported that they had synthesized a stretch of circular DNA known as a plasmid, containing natural T-A and C-G base pairs along with the best-performing UBP Romesberg's laboratory had designed, and inserted it into cells of the common bacterium E. coli, which successfully replicated the unnatural base pairs through multiple generations. This is the first known example of a living organism passing along an expanded genetic code to subsequent generations. It was achieved in part by adding a supporting algal gene that expresses a nucleotide triphosphate transporter, which efficiently imports both d5SICSTP and dNaMTP into E. coli. The natural bacterial replication pathways then use them to accurately replicate the plasmid containing d5SICS–dNaM.
The successful incorporation of a third base pair is a significant breakthrough toward the goal of greatly expanding the number of amino acids that can be encoded by DNA, from the existing 20 amino acids to a theoretically possible 172, thereby expanding the potential for living organisms to produce novel proteins. The artificial strings of DNA do not yet encode anything, but scientists speculate they could be designed to manufacture new proteins with industrial or pharmaceutical uses.
- DNA 'Printing' A Big Boon To Research, But Some Raise Concerns
- Kimoto, M.; et al. (2013). "Generation of high-affinity DNA aptamers using an expanded genetic alphabet". Nat. Biotechnol. 31: 453–457. doi:10.1038/nbt.2556.
- Malyshev, Denis A.; Dhami, Kirandeep; Quach, Henry T.; Lavergne, Thomas; Ordoukhanian, Phillip (24 July 2012). "Efficient and sequence-independent replication of DNA containing a third base pair establishes a functional six-letter genetic alphabet". Proceedings of the National Academy of Sciences of the United States of America. 109 (30): 12005–12010. Bibcode:2012PNAS..10912005M. doi:10.1073/pnas.1205176109. Retrieved 2014-05-11.
- Malyshev, Denis A.; Dhami, Kirandeep; Lavergne, Thomas; Chen, Tingjian; Dai, Nan; Foster, Jeremy M.; Corrêa, Ivan R.; Romesberg, Floyd E. (May 7, 2014). "A semi-synthetic organism with an expanded genetic alphabet". Nature. 509: 385–8. doi:10.1038/nature13314. PMID 24805238. Retrieved May 7, 2014.
- Sample, Ian (May 7, 2014). "First life forms to pass on artificial DNA engineered by US scientists". The Guardian. Retrieved 8 May 2014.
- Khorana HG, Agarwal KL, Büchi H, et al. (December 1972). "Studies on polynucleotides. 103. Total synthesis of the structural gene for an alanine transfer ribonucleic acid from yeast". J. Mol. Biol. 72 (2): 209–217. doi:10.1016/0022-2836(72)90146-5. PMID 4571075.
- Itakura K, Hirose T, Crea R, et al. (December 1977). "Expression in Escherichia coli of a chemically synthesized gene for the hormone somatostatin". Science. 198 (4321): 1056–1063. Bibcode:1977Sci...198.1056I. doi:10.1126/science.412251. PMID 412251.
- Edge MD, Green AR, Heathcliffe GR, et al. (August 1981). "Total synthesis of a human leukocyte interferon gene". Nature. 292 (5825): 756–62. Bibcode:1981Natur.292..756E. doi:10.1038/292756a0. PMID 6167861.
- For example, the company DNA 2.0 was established in 2003 in Menlo Park, CA, as a "synthetic genomics company".
- "Difficult to Express Proteins". Sixth Annual PEGS Summit. Cambridge Healthtech Institute. 2010. Archived from the original on 11 May 2010. Retrieved 11 May 2010.
- Liszewski, Kathy (1 May 2010). "New Tools Facilitate Protein Expression". Genetic Engineering & Biotechnology News. Bioprocessing. 30 (9). Mary Ann Liebert. pp. 1, 40–41. Archived from the original on 9 May 2010. Retrieved 11 May 2010
- Welch M, Govindarajan M, Ness JE, Villalobos A, Gurney A, Minshull J, Gustafsson C (2009). Kudla G, ed. "Design Parameters to Control Synthetic Gene Expression in Escherichia coli". PLoS ONE. 4 (9): e7002. Bibcode:2009PLoSO...4.7002W. doi:10.1371/journal.pone.0007002. PMID 19759823.
- "Protein Expression". DNA2.0. Retrieved 11 May 2010.
- Fuhrmann M, Oertel W, Hegemann P (August 1999). "A synthetic gene coding for the green fluorescent protein (GFP) is a versatile reporter in Chlamydomonas reinhardtii". Plant J. 19 (3): 353–61. doi:10.1046/j.1365-313X.1999.00526.x. PMID 10476082.
- Mandecki W, Bolling TJ (August 1988). "FokI method of gene synthesis". Gene. 68 (1): 101–7. doi:10.1016/0378-1119(88)90603-8. PMID 3265397.
- Stemmer WP, Crameri A, Ha KD, Brennan TM, Heyneker HL (October 1995). "Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides". Gene. 164 (1): 49–53. doi:10.1016/0378-1119(95)00511-4. PMID 7590320.
- Gao X, Yo P, Keith A, Ragan TJ, Harris TK (November 2003). "Thermodynamically balanced inside-out (TBIO) PCR-based gene synthesis: a novel method of primer design for high-fidelity assembly of longer gene sequences". Nucleic Acids Res. 31 (22): e143. doi:10.1093/nar/gng143. PMID 14602936.
- Young L, Dong Q (2004). "Two-step total gene synthesis method". Nucleic Acids Res. 32 (7): e59. doi:10.1093/nar/gnh058. PMID 15087491.
- Hillson NH, Rosengarten RD, Keasling JD (2012). "j5 DNA Assembly Design Automation Software". ACS Synthetic Biology. 1 (1): 14–21. doi:10.1021/sb2000116.
- Hoover DM, Lubkowski J (May 2002). "DNAWorks: an automated method for designing oligonucleotides for PCR-based gene synthesis". Nucleic Acids Res. 30 (10): e43. doi:10.1093/nar/30.10.e43. PMID 12000848.
- Villalobos A, Ness JE, Gustafsson C, Minshull J, Govindarajan S (2006). "Gene Designer: a synthetic biology tool for constructing artificial DNA segments". BMC Bioinformatics. 7: 285. doi:10.1186/1471-2105-7-285. PMID 16756672.
- Tian J, Gong H, Sheng N, et al. (December 2004). "Accurate multiplex gene synthesis from programmable DNA microchips". Nature. 432 (7020): 1050–4. Bibcode:2004Natur.432.1050T. doi:10.1038/nature03151. PMID 15616567.
- Matzas M; et al. (2010). "High-fidelity gene synthesis by retrieval of sequence-verified DNA identified using high-throughput pyrosequencing". Nature Biotechnology. 28 (12): 1291–1294. doi:10.1038/nbt.1710. PMID 21113166.
- Schwartz JJ, Lee C, Shendure J (2012). "Accurate gene synthesis with tag-directed retrieval of sequence-verified DNA molecules". Nature Methods. 9: 913–915. doi:10.1038/nmeth.2137. PMID 22886093.
- Cheng, Allen A.; Lu, Timothy K. (2012). "Synthetic Biology: An Emerging Engineering Discipline". Annual Review of Biomedical Engineering. 14 (1): 155–178. doi:10.1146/annurev-bioeng-071811-150118. PMID 22577777.
- "Genome Transplantation in Bacteria: Changing One Species to Another". Science. 2007-06-28. Archived from the original on 24 May 2010. Retrieved 2010-05-22.
- Pilkington, Ed (2007-10-06). "I am creating artificial life, declares US gene pioneer". London: The Guardian. Archived from the original on 28 May 2010. Retrieved 2010-05-22.
- "Synthetic Genome Brings New Life to Bacterium" (PDF). Science. 2010-05-21. Archived (PDF) from the original on 25 May 2010. Retrieved 2010-05-21.
- "How scientists made 'artificial life'". BBC News. 2010-05-20. Archived from the original on June 1, 2013. Retrieved 2010-05-21.
- Shukman, David (27 March 2014). "Scientists hail synthetic chromosome advance". BBC News. Retrieved 2014-03-28.
- Annaluru, Narayana; et al. (March 27, 2014). "Total Synthesis of a Functional Designer Eukaryotic Chromosome". Science. 344: 55–58. Bibcode:2014Sci...344...55A. doi:10.1126/science.1249252. Retrieved 2014-03-28.
- Fikes, Bradley J. (May 8, 2014). "Life engineered with expanded genetic code". San Diego Union Tribune. Retrieved 8 May 2014.
- Yang, Zunyi; et al. (August 15, 2011). "Amplification, Mutation, and Sequencing of a Six-Letter Synthetic Genetic System". J. Am. Chem. Soc. 133 (38): 15105–15112. doi:10.1021/ja204910n.
- Yamashige, Rie; et al. (March 2012). "Highly specific unnatural base pair systems as a third base pair for PCR amplification". Nucl Acids Res. 40 (6): 2793–2806. doi:10.1093/nar/gkr1068.
- Malyshev, D. A.; et al. (July 24, 2012). "Efficient and sequence-independent replication of DNA containing a third base pair establishes a functional six-letter genetic alphabet". Proc. Natl. Acad. Sci. USA. 109 (30): 12005–12010. Bibcode:2012PNAS..10912005M. doi:10.1073/pnas.1205176109.
- Callaway, Ewan (May 7, 2014). "Scientists Create First Living Organism With 'Artificial' DNA". Nature News. Huffington Post. Retrieved 8 May 2014.
- Pollack, Andrew (May 7, 2014). "Scientists Add Letters to DNA's Alphabet, Raising Hope and Fear". New York Times. Retrieved 8 May 2014.
- Carr PA, Park JS, Lee YJ, Yu T, Zhang S, Jacobson JM (2004). "Protein-mediated error correction for de novo DNA synthesis". Nucleic Acids Res. 32 (20): e162. doi:10.1093/nar/gnh160. PMID 15561997.
- Fuhrmann M, Oertel W, Berthold P, Hegemann P (2005). "Removal of mismatched bases from synthetic genes by enzymatic mismatch cleavage". Nucleic Acids Res. 33 (6): e58. doi:10.1093/nar/gni058. PMID 15800209.
- Tian J, Ma K, Saaem I (July 2009). "Advancing high-throughput gene synthesis technology". Mol Biosyst. 5 (7): 714–22. doi:10.1039/b822268c. PMID 19562110.
- Craig Venter: On the Verge of Creating Synthetic Life - TED (Technology Entertainment Design) conference (video)