DNA sequencing: Difference between revisions

From Wikipedia, the free encyclopedia
Frankatca (talk | contribs)
→‎Next generation technology: added ZS Genetics TEM method ~~~~
Revision as of 01:54, 29 March 2007

DNA sequencing is the process of determining the order of the nucleotide bases, adenine, guanine, cytosine, and thymine, in a DNA oligonucleotide. The sequence of DNA constitutes the heritable genetic information in nuclei, plasmids, mitochondria, and chloroplasts that forms the basis for the developmental programs of all living organisms. Determining the DNA sequence is therefore useful in basic research studying fundamental biological processes, as well as in applied fields such as diagnostic or forensic research. Because DNA is key to all living organisms, knowledge of the DNA sequence may be useful in almost any biological subject area. For example, in medicine it can be used to identify, diagnose and potentially develop treatments for genetic diseases. Similarly, genetic research into plant or animal pathogens may lead to treatments of various diseases caused by these pathogens.

For thirty years, a large proportion of DNA sequencing has been carried out with the chain termination method [1], developed by Frederick Sanger and coworkers in 1977. This technique uses sequence-specific termination of an in vitro DNA synthesis reaction, using modified nucleotides as substrates for DNA polymerase. The advent of DNA sequencing has significantly accelerated biological research and discovery. The speed attainable with modern DNA sequencing technology has been instrumental in the large-scale sequencing of the human genome in the Human Genome Project. Related projects, often involving scientific collaboration across continents, have generated the complete DNA sequences of many animal, plant, and microbial genomes.

Early methods

Prior to the development of rapid DNA sequencing methods in the mid-1970s by Sanger in England and by Gilbert et al. at Harvard [1], a number of laborious methods were used. For instance, in 1973 Gilbert and Maxam reported the sequence of 24 base pairs using a method known as wandering-spot analysis [2].

Maxam-Gilbert sequencing

In 1976-1977, Allan Maxam and Walter Gilbert developed a method of DNA sequencing based on chemical modification of DNA followed by cleavage at specific bases [2]. Although Maxam and Gilbert published their chemical sequencing method two years after the groundbreaking paper of Sanger and Coulson on plus-minus sequencing [3] [4], Maxam-Gilbert sequencing rapidly became more popular, since purified DNA could be used directly, while the initial Sanger method required that each read start be cloned for production of single-stranded DNA. However, as the chain termination method was developed and improved, Maxam-Gilbert sequencing fell out of favour due to its technical complexity, its use of hazardous chemicals, and difficulties with scale-up.

In brief, the method requires purifying a particular DNA fragment, radioactively labeled at one end. Chemical treatment generates breaks at a small proportion of one or two of the four nucleotide bases in each of four reactions (G, A+G, C, C+T). Thus a series of labelled fragments is generated, from the radiolabelled end to the first 'cut' site in each molecule. The fragments are then size-separated in a gel, with the four reactions arranged side by side. The gel can then be exposed to photographic film, yielding an image of a series of 'bands', from which the sequence may be inferred.
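The band-reading step described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the example band patterns are invented here, and it assumes an idealised gel in which every position gives a clean band pattern across the four reactions (G, A+G, C, C+T).

```python
# Illustrative only: lane names follow the four reactions named in the text.
LANES = ("G", "A+G", "C", "C+T")

def call_base(bands):
    """Infer one base from the set of lanes showing a band at one fragment length."""
    bands = frozenset(bands)
    if bands == {"G", "A+G"}:
        return "G"   # band in both purine reactions
    if bands == {"A+G"}:
        return "A"   # band in A+G only, so not G
    if bands == {"C", "C+T"}:
        return "C"   # band in both pyrimidine reactions
    if bands == {"C+T"}:
        return "T"   # band in C+T only, so not C
    return "N"       # ambiguous or missing band pattern

def read_gel(rows):
    """rows: band patterns ordered from the shortest fragment to the longest."""
    return "".join(call_base(r) for r in rows)

# Hypothetical band patterns for four consecutive fragment lengths.
example_rows = [{"A+G"}, {"C", "C+T"}, {"G", "A+G"}, {"C+T"}]
print(read_gel(example_rows))  # ACGT
```

Reading from the shortest fragment upward gives the bases in order of increasing distance from the radiolabelled end.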

Also sometimes known as 'chemical sequencing', this method originated in the study of DNA-protein interactions (footprinting), nucleic acid structure and epigenetic modifications to DNA, and within these it still has important applications.

Chain termination method

Part of a radioactively labelled sequencing gel

While the chemical sequencing method of Maxam and Gilbert, and the plus-minus method of Sanger and Coulson were orders of magnitude faster than previous methods, the chain-terminator method developed by Sanger was even more efficient, and rapidly became the method of choice. For instance, the Maxam-Gilbert technique requires the use of highly toxic chemicals, and the preparation of large amounts of radioactively labeled DNA; the chain-terminator method uses far fewer toxic chemicals and lower amounts of radioactivity.

The key invention was the use of dideoxynucleotide triphosphates (ddNTPs) as chain terminators for DNA polymerase; the later development by Leroy Hood and coworkers of fluorescently labeled ddNTPs and primers set the stage for automated, high-throughput DNA sequencing. Thus, the Sanger method can be seen as the beginning of the transformation of DNA sequencing from an art, requiring highly skilled workers, to a craft that requires only modest amounts of skill.
In chain-terminator sequencing (Sanger sequencing), which is possible because of the availability of clones or thermal-cycling DNA amplification, extension is initiated at a specific site on the template DNA, by using a short oligonucleotide 'primer' complementary to the template at that region. The oligonucleotide primer is extended using a DNA polymerase, an enzyme that replicates DNA. Included with the primer and DNA polymerase are the four deoxynucleotide bases (DNA building blocks), along with a low concentration of a chain-terminating nucleotide (most commonly a di-deoxynucleotide). Limited incorporation of the chain-terminating nucleotide by the DNA polymerase results in a series of DNA fragments of varying length that have been terminated at those positions where the chain-terminating nucleotide had been incorporated by the polymerase. The DNA fragments are then size-separated by electrophoresis in a slab polyacrylamide gel or, more commonly now, in a narrow glass tube (capillary) filled with a viscous polymer.

The classical chain-termination or Sanger method first involves preparing the DNA to be sequenced as a single strand. (The single-strand preparation results in one band per nucleotide, whereas a double-strand preparation would give two bands, making sequence prediction impossible.) The DNA sample is divided into four separate samples. Each of the four samples has a primer, the four standard deoxynucleotides (dATP, dGTP, dCTP and dTTP), DNA polymerase, and only one of the four dideoxynucleotides (ddATP, ddGTP, ddCTP or ddTTP) added to it. The dideoxynucleotides are added in limited quantities. Either the primer or the dideoxynucleotides are radiolabeled or carry a fluorescent tag.

As the DNA strand is elongated, the DNA polymerase catalyses the joining of deoxynucleotides complementary to the template bases. The nucleotides available to the polymerase are a mixture of normal and tagged/terminating nucleotides, so a dideoxynucleotide is occasionally incorporated into the elongating strand in place of the corresponding deoxynucleotide. The terminating nucleotide prevents further elongation because a dideoxynucleotide lacks the 3'-OH group needed to attach the next nucleotide. Each reaction therefore produces a series of DNA fragments of varying length, all ending in the same base; because the tag itself is not base-specific, four separate reactions are needed. Unfortunately, only short stretches of DNA can be sequenced in each reaction. The polymerase chain reaction (PCR) technique is limited to 10,000 base pairs, and the maximum length of extension is dictated by the concentration of tagged/terminating nucleotides.

The DNA is then denatured and the resulting fragments are separated (with a resolution of just one nucleotide) by gel electrophoresis, from longest to shortest. Each of the four DNA samples is run in one of four individual lanes (lanes A, T, G, C), depending on which dideoxynucleotide was added. Depending on whether the primers or dideoxynucleotides were radiolabeled or fluorescently labeled, the DNA bands can be detected by exposure to X-ray film or under UV light, and the DNA sequence can be read directly off the gel. In the image on the right, X-ray film was exposed to the dried gel, and the dark bands indicate the positions of the DNA molecules of different lengths. A dark band in a lane indicates a chain termination for that particular DNA subunit, and the DNA sequence can be read off as indicated.
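As a rough illustration of how a four-lane gel encodes the sequence, the following Python sketch simulates an idealised set of four reactions and then reads the result back. The function names are hypothetical, fragment lengths are counted from the end of the primer, and the model assumes that every possible termination position is represented by a band.

```python
def sanger_lanes(new_strand):
    """Return, per ddNTP lane, the set of fragment lengths ending in that base.

    Idealised model: every possible termination position produces a band.
    """
    lanes = {base: set() for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].add(position)  # a ddNTP of this base can stop synthesis here
    return lanes

def read_gel(lanes):
    """Read the sequence from the shortest fragment to the longest across the lanes."""
    length_to_base = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(length_to_base[n] for n in sorted(length_to_base))

lanes = sanger_lanes("GATTACA")  # sequence of the newly synthesised strand
print(sorted(lanes["A"]))  # [2, 5, 7]
print(read_gel(lanes))     # GATTACA
```

Each fragment length appears in exactly one lane, which is why the order of bands across the four lanes spells out the synthesised strand.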

There can be various problems with sequencing by the Sanger method. The primer can anneal to a second site, which causes two sequences to be read at the same time. This can be solved by using higher annealing temperatures and a higher G and C content in the primer. Another problem occurs when RNA contaminates the reaction: it can act as a primer and lead to bands in all lanes at all positions due to nonspecific priming. Other problems arise from contaminating plasmids, inhibitors of DNA polymerase, and low concentrations in general. Secondary structure in the DNA being read by the DNA polymerase can lead to reading problems, visible on the readout as bands in all lanes at only a few positions. In short, the problems of this method are the standard problems one would encounter in PCR.

There are two sub-types of chain-termination sequencing. In the original method, the nucleotide order of a particular DNA template can be inferred by performing four parallel extension reactions using one of the four chain-terminating bases in each reaction. The DNA fragments are detected by labelling the primer with a base-nonspecific label, radioactive phosphorus for example, prior to performing the sequencing reaction. The four reactions would then be run out in four adjacent lanes on a slab polyacrylamide gel.

The Sanger method can be done using primers that add a non-specific label on the 5' end of the PCR product. Instead of the label being included in the terminating nucleotide, the label is in the primer. The difference between this and the radioactive Sanger method is that the label is at the 5' end instead of the 3' end. Four separate reactions are still required, but the dye labels can be read using an optical system instead of film or phosphor storage screens, so it is faster, cheaper, and easier to automate. This approach is known as 'dye-primer sequencing'.

Dye terminator sequencing

View of the start of an example dye-terminator read

An alternative to labelling the primer is to label the terminators instead, commonly called 'dye terminator sequencing'. The major advantage of this approach is that the complete sequencing set can be performed in a single reaction, rather than the four needed with the labeled-primer approach. This is accomplished by labelling each of the four dideoxynucleotide chain-terminators with a separate fluorescent dye, each of which fluoresces at a different wavelength. This method is easier and quicker than the dye primer approach, but may produce more uneven data peaks (different heights), due to a template-dependent difference in the incorporation of the large dye chain-terminators. This problem has been significantly reduced with the introduction of new enzymes and dyes that minimize incorporation variability.

This method is now used for the vast majority of sequencing reactions as it is both simpler and cheaper. The major reason for this is that the primers do not have to be separately labelled (which can be a significant expense for a single-use custom primer), although this is less of a concern with frequently used 'universal' primers.
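Because each terminator carries its own dye, a read can in principle be decoded by asking which dye channel is brightest at each peak. The Python sketch below shows only that idea, using made-up intensity values and a hypothetical function name; real base-calling software also handles peak detection, spacing, dye mobility shifts, and quality estimation, none of which is modelled here.

```python
def call_bases(peaks):
    """peaks: list of dicts mapping base -> fluorescence intensity at one peak."""
    calls = []
    for intensities in peaks:
        base = max(intensities, key=intensities.get)  # brightest dye channel wins
        calls.append(base)
    return "".join(calls)

# Made-up intensities; the uneven peak heights mimic variable terminator incorporation.
example_peaks = [
    {"A": 900, "C": 40, "G": 35, "T": 20},
    {"A": 30, "C": 25, "G": 310, "T": 45},
    {"A": 15, "C": 700, "G": 60, "T": 50},
    {"A": 25, "C": 30, "G": 20, "T": 480},
]
print(call_bases(example_peaks))  # AGCT
```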

Automation and sample preparation

Modern automated DNA sequencing instruments (called DNA sequencers) are able to sequence as many as 384 fluorescently labelled samples in a batch (run) and perform as many as 24 runs a day. These instruments perform only the size separation and peak reading; the actual sequencing reaction(s), cleanup and resuspension in a suitable buffer solution must be performed separately.

The magnitude of the fluorescent signal is related to the number of strands of DNA that are in the reaction. If the initial amount of DNA is small, the signals will be weak. However, the properties of PCR allow one to increase the signal by increasing the number of cycles in the PCR programme.

Large-scale sequencing strategies

Current methods can directly sequence only short lengths of DNA at a time. For example, modern sequencing machines using the Sanger method can achieve a maximum of around 1000 base pairs [3]. This limitation is due to the geometrically decreasing probability of chain termination at increasing lengths, as well as physical limitations on gel size and resolution.
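The geometric fall-off mentioned above can be illustrated with a simple model. The sketch below assumes, as a simplification, that every extension step has the same termination probability p; the value of p is chosen arbitrarily for illustration and is not taken from any protocol.

```python
def fraction_terminating_at(n, p):
    """Fraction of chains that end exactly at base n (geometric distribution)."""
    return p * (1 - p) ** (n - 1)

def fraction_reaching(n, p):
    """Fraction of chains still being extended after n bases."""
    return (1 - p) ** n

p = 0.005  # assumed per-base termination probability, for illustration only
for n in (100, 500, 1000):
    print(n, fraction_terminating_at(n, p), fraction_reaching(n, p))
```

Under this model the amount of product at each length shrinks by a constant factor (1 - p) per base, so bands from long fragments become progressively fainter.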

It is often necessary to obtain the sequence of much larger regions. For example, even simple bacterial genomes contain millions of base pairs, and the human chromosome 1 has about 246 million. Several strategies have been devised for larger-scale DNA sequencing. Primer walking, often with cloning and sub-cloning steps (dependent on the size of the region to be sequenced), used to be the standard method. With the increase in computing power, shotgun sequencing is now common, or used as part of a hybrid method. These strategies all involve taking many small reads of the DNA by one of the above methods and subsequently assembling them into a contiguous sequence. The different strategies have different tradeoffs in speed and accuracy; the shotgun method is the most practical for sequencing large genomes, but its assembly process is complex and potentially error-prone - particularly in the presence of repeating sequences.
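The assembly step shared by these strategies can be sketched as a greedy overlap merge. The Python code below is a toy illustration with hypothetical function names and reads; it ignores sequencing errors, reverse-complement reads, and repeats, which are the cases that make real assembly complex and error-prone.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave the reads as separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads

reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]  # made-up overlapping reads
print(greedy_assemble(reads))  # ['ATGGCGTACCTGA']
```

A greedy merge like this can join the wrong reads when the same subsequence occurs at more than one place in the genome, which is one way repeating sequences cause assembly errors.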

It is only possible to obtain high quality sequence data when the desired segment of DNA is relatively pure, i.e. free from other contaminants, including other DNA. This can be achieved through PCR for shorter regions (several kilobases), if a very short sequence at both ends is known. Alternatively, the sample can be cloned using a "vector", essentially using bacteria to "grow" copies of the desired DNA (less than a thousand to tens of thousands of base-pairs per clone). The vector DNA can then be easily purified away from the cell and the rest of the cell's DNA. Most large-scale sequencing efforts involve the preparation of a large library of such clones. A disadvantage of sequencing clones is that some areas may be un-clonable due to deleterious effect of the cloned sequence on the host bacterium. On the other hand, PCR can also generate artifacts, including the creation of non-specific PCR products.

Next generation technology

Given the intense interest from the medical and pharmaceutical industries, and the large amount of money available from government and private sources, many companies are trying to develop next generation DNA sequencing technologies to replace the current paradigm of dye terminator ladders separated by capillary electrophoresis.

On February 2, 2006, William R. Glover III filed multiple patent applications, assigned to ZS Genetics, Inc., for a rapid sequencing method that uses transmission electron microscopy (TEM), DNA labeled with TEM-visible marker atoms such as bromine and iodine, and machine vision systems. The method is said to be capable of reading more than 10^7 bp/hour and DNA strands of 20,000 bp or more, which would dramatically reduce the computational requirement. At an open seminar at MIT on March 26, 2007, Glover stated that ZS Genetics expects to demonstrate this method of sequencing in 2008.

As of February 2007, some of the players include Perlegen, an Affymetrix subsidiary, which uses sequencing by hybridization, originally proposed by Drmanac et al. [5]; Solexa, now owned by Illumina, which uses a bridge amplification technology [6] originally developed by Adams and Kron; Helicos, a private company in Cambridge, MA, which emphasizes single molecule sequencing; 454 Life Sciences, a subsidiary of Curagen with funding from Roche, which uses the pyrosequencing method developed by Ronaghi et al. [7]; and a polony-based technique (from Mitra and Church at Harvard) developed by Agencourt and licensed to ABI.
Other proposals include labeling the DNA polymerase [8], reading the sequence as a strand transits through a nanopore [9], and imaging-based techniques capable of single molecule resolution, such as AFM or electron microscopy, and doubtless many others.

Major landmarks in DNA sequencing

  • 1953 Discovery of the structure of the DNA helix.
“With my fingers too cold to write legibly I huddled next to the fireplace, daydreaming about how several DNA chains could fold together in a pretty and hopefully scientific way. Soon, however, I abandoned thinking at the molecular level and turned to the much easier job of reading biochemical papers on the interrelations of DNA, RNA and protein synthesis.” from Chapter 21 of The Double Helix by James Dewey Watson.
  • 1972 Development of recombinant DNA technology, which permits isolation of defined fragments of DNA; prior to this, the only accessible samples for sequencing were from bacteriophage or virus DNA.
  • 1977 Allan Maxam and Walter Gilbert publish DNA Sequencing by chemical degradation [4]. Fred Sanger, independently, publishes DNA sequencing by enzymatic synthesis.
  • 1980 Fred Sanger and Wally Gilbert receive the Nobel Prize in Chemistry.
  • 1981 GenBank started as a public repository of DNA sequences.
    • In May, Andre Marion and Sam Eletr from Hewlett Packard start Applied Biosystems, which comes to dominate automated sequencing.
  • 1982 Akiyoshi Wada proposes automated sequencing and gets support to build robots with help from Hitachi.
  • 1984 MRC scientists decipher the complete DNA sequence of the Epstein-Barr virus, 170 kb.
  • 1985 Mullis and colleagues develop PCR, a technique to replicate small amounts of DNA
  • 1986 Leroy E. Hood's laboratory at the California Institute of Technology and Lloyd Smith announce the first semi-automated DNA sequencing machine.
  • 1987 Applied Biosystems markets first automated sequencing machine, the Prism 373.
    • Walter Gilbert leaves the U.S. National Research Council genome panel to start Genome Corp., with the goal of sequencing and commercializing the data.
  • 1990 NIH begins large-scale sequencing trials on Mycoplasma capricolum, Escherichia coli, Caenorhabditis elegans, and Saccharomyces cerevisiae (at 75 cents per base).
    • Lipman, Myers publish the BLAST algorithm for aligning sequences.
    • Barry Karger (Analytical Chemistry, January), Lloyd Smith (Nucleic Acids Research, August), and Norman Dovichi (Journal of Chromatography, September) publish on capillary electrophoresis.
  • 1991 Craig Venter develops strategy to find expressed genes with ESTs (Expressed Sequence Tags).
    • Uberbacher develops GRAIL, a gene-finding program.
  • 1992 Craig Venter leaves NIH to set up The Institute for Genomic Research (TIGR). William Haseltine heads Human Genome Sciences, to commercialize TIGR products.
    • Wellcome Trust begins participation in the Human Genome Project.
    • Simon et al. develop BACs (Bacterial Artificial Chromosomes) for cloning.
    • First chromosome physical maps published: Page et al., Y chromosome; Cohen et al., chromosome 21.
    • Lander: complete mouse genetic map.
    • Weissenbach: complete human genetic map.
  • 1993 Wellcome Trust and MRC open Sanger Centre, near Cambridge, UK.
    • The GenBank database migrates from Los Alamos (DOE) to NCBI (NIH).
  • 1995 Venter, Fraser and Smith publish first sequence of free-living organism, Haemophilus influenzae (1.8 Mb).
    • Richard Mathies et al. publish on sequencing dyes (PNAS, May).
    • Michael Reeve and Carl Fuller, thermostable polymerase for sequencing (Nature, August).
  • 1996 International HGP partners agree to release sequence data into public databases within 24 hours.
    • International consortium releases genome sequence of yeast S. cerevisiae (12.1 Mb).
    • Yoshihide Hayashizaki's group at RIKEN completes the first set of full-length mouse cDNAs.
    • ABI introduces a capillary electrophoresis system, the 310.
  • 1997 Blattner, Plunkett et al. publish the sequence of E. coli (5 Mb)
  • 1998 Phil Green and Brent Ewing of Washington University publish "phred" for interpreting sequencer data (in use since 1995).
    • Venter starts a new company, Celera, which "will sequence HG in 3 yrs for $300m."
    • Applied Biosystems introduces the 3700 capillary sequencing machine.
    • Wellcome Trust doubles support for the HGP to $330 million for 1/3 of the sequencing.
    • NIH & DOE goal: "working draft" of the human genome by 2001.
    • Sulston, Waterston et al. finish the sequence of C. elegans (97 Mb).
  • 1999 NIH moves up completion date for rough draft, to spring 2000.
    • NIH launches the mouse genome sequencing project.
    • First sequence of human chromosome 22 published.
  • 2000 Celera and collaborators sequence the fruit fly Drosophila melanogaster (180 Mb), a validation of Venter's shotgun method. HGP and Celera debate issues related to data release.
    • HGP consortium publishes sequence of chromosome 21.
    • HGP & Celera jointly announce working drafts of HG sequence, promise joint publication.
    • Human gene estimates range from 35,000 to 120,000.
    • International consortium completes the first plant sequence, Arabidopsis thaliana (125 Mb).
  • 2001 HGP consortium publishes Human Genome Sequence draft in Nature (15 Feb).
    • Celera publishes the Human Genome sequence in Science (16 Feb).
  • 2003 Cold Spring Harbor sponsors GeneSweep, a sweepstakes on the number of human genes.
  • 2005 420,000 VariantSEQr human resequencing primer sequences published on new NCBI Probe database.
  • 2007 Solexa and Applied Biosystems release next-generation sequencing technology with the potential to produce 10^6 times more sequencing data than capillary electrophoresis systems. Illumina acquires Solexa.

See also

  • Genome project - how entire genomes are assembled from these short sequences.
  • Applied Biosystems - provided most of the chemistry and equipment for the genome projects. Next-generation technology for very high data generation rates.
  • 454 Life Sciences - company specializing in high-throughput DNA sequencing using a sequencing-by-synthesis approach.
  • Illumina (company) - Advancing genetic analysis one billion bases at a time; whole genome sequencing.
  • Joint Genome Institute - sequencing center from the US Department of Energy whose mission is to provide integrated high-throughput sequencing and computational analysis to enable genomic-scale/systems-based scientific approaches to DOE-relevant challenges in energy and the environment.

Citations

  1. ^ http://nobelprize.org/nobel_prizes/chemistry/laureates/1980/gilbert-lecture.pdf
  2. ^ Proc Natl Acad Sci U S A. 1973 December; 70(12 Pt 1-2): 3581–3584. The Nucleotide Sequence of the lac Operator, Walter Gilbert and Allan Maxam
  3. ^ Sanger, F. & Coulson, A. R. (1975) J. Mol. Biol. 94, 441-448
  4. ^ http://nobelprize.org/nobel_prizes/chemistry/laureates/1980/sanger-lecture.pdf
  5. ^ Genomics. 1989, pages 114-28. Sequencing of megabase plus DNA by hybridization: theory of the method. Genetic Engineering Center, Belgrade, Yugoslavia
  6. ^ US Patent 5,641,658
  7. ^ Nucleic Acids Research. 2004, page e166
  8. ^ http://visigenbio.com/technology_overview.html
  9. ^ http://mcb.harvard.edu/branton/index.htm