Sanger sequencing

Sanger sequencing is a method of DNA sequencing first commercialized by Applied Biosystems, based on the selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication.^[1]^[2] Developed by Frederick Sanger and colleagues in 1977, it was the most widely used sequencing method for approximately 40 years. More recently, high-volume Sanger sequencing has been supplanted by "Next-Gen" sequencing methods, especially for large-scale, automated genome analyses. However, the Sanger method remains in wide use for smaller-scale projects, for validation of Next-Gen results, and for obtaining especially long contiguous DNA sequence reads (> 500 nucleotides).

Method

The classical chain-termination method requires a single-stranded DNA template, a DNA primer, a DNA polymerase, normal deoxynucleoside triphosphates (dNTPs), and modified dideoxynucleotide triphosphates (ddNTPs), the latter of which terminate DNA strand elongation. These chain-terminating nucleotides lack the 3'-OH group required for the formation of a phosphodiester bond between two nucleotides, causing DNA polymerase to cease extension of DNA when a ddNTP is incorporated. The ddNTPs may be radioactively or fluorescently labeled for detection in automated sequencing machines.

The DNA sample is divided into four separate sequencing reactions, each containing all four of the standard deoxynucleotides (dATP, dGTP, dCTP and dTTP) and the DNA polymerase. To each reaction only one of the four dideoxynucleotides (ddATP, ddGTP, ddCTP, or ddTTP) is added, while the other added nucleotides are ordinary ones. The dideoxynucleotide is added in approximately 100-fold excess of the corresponding deoxynucleotide (e.g. 0.005 mM dTTP : 0.5 mM ddTTP), allowing enough fragments to be produced while still allowing synthesis to extend across the complete sequence.^[2] Four separate reactions are therefore needed to test all four ddNTPs. Following rounds of template DNA extension from the bound primer, the resulting DNA fragments are heat denatured and separated by size using gel electrophoresis. This is frequently performed using a denaturing polyacrylamide-urea gel, with each of the four reactions run in one of four individual lanes (lanes A, T, G, C). The DNA bands may then be visualized by autoradiography or UV light, and the DNA sequence can be read directly off the X-ray film or gel image. In the original 1977 publication,^[2] the formation of base-paired loops of ssDNA caused serious difficulty in resolving bands at some locations.

Part of a radioactively labelled sequencing gel

In the gel image above, X-ray film was exposed to the gel, and the dark bands correspond to DNA fragments of different lengths. A dark band in a lane indicates a DNA fragment that is the result of chain termination after incorporation of a dideoxynucleotide (ddATP, ddGTP, ddCTP, or ddTTP). The relative positions of the different bands among the four lanes, from bottom to top, are then used to read the DNA sequence.
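The reading rule described above (one lane per ddNTP, bands read from bottom to top) can be sketched as a toy model. This assumes every possible termination position is represented by a band, ignores the primer length, and treats the input as the newly synthesized strand, which is complementary to the template. The function names are illustrative only.

```python
def termination_fragments(synthesized: str) -> dict[str, list[int]]:
    """For each base, list the lengths of fragments ending in that base.

    Assumption: every termination position yields a band, so each lane
    holds one band per occurrence of its base in the synthesized strand.
    """
    lanes: dict[str, list[int]] = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(synthesized, start=1):
        # A ddNTP incorporated here stops the chain at this length.
        lanes[base].append(position)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read bands from bottom (shortest) to top (longest) across all lanes."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = termination_fragments("GATTACA")
print(lanes["A"])       # → [2, 5, 7]
print(read_gel(lanes))  # → GATTACA
```

Sorting all bands by fragment length mirrors the way shorter fragments migrate further down the gel.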

Technical variations of chain-termination sequencing include tagging with nucleotides containing radioactive phosphorus for radiolabelling, or using a primer labeled at the 5' end with a fluorescent dye. Dye-primer sequencing facilitates reading in an optical system for faster and more economical analysis and automation. The later development by Leroy Hood and coworkers^[3]^[4] of fluorescently labeled ddNTPs and primers set the stage for automated, high-throughput DNA sequencing.

Chain-termination methods have greatly simplified DNA sequencing. For example, chain-termination-based kits are commercially available that contain the reagents needed for sequencing, pre-aliquoted and ready to use. Limitations include non-specific binding of the primer to the DNA, affecting accurate read-out of the DNA sequence, and DNA secondary structures affecting the fidelity of the sequence.

Dye-terminator sequencing

Dye-terminator sequencing utilizes labelling of the chain-terminator ddNTPs, which permits sequencing in a single reaction, rather than four reactions as in the labelled-primer method. In dye-terminator sequencing, each of the four dideoxynucleotide chain terminators is labelled with a different fluorescent dye, each of which emits light at a different wavelength.
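Because each terminator carries its own colour, a single lane is enough: every peak is assigned to the dye channel in which it is brightest. The sketch below assumes peaks are already ordered by fragment length and that the channel order is A, C, G, T; both the channel order and the intensity values are hypothetical, and real base callers do considerably more (spectral separation, mobility correction, peak spacing).

```python
BASES = ("A", "C", "G", "T")  # hypothetical channel order; instruments differ


def call_bases(peaks: list[tuple[float, float, float, float]]) -> str:
    """Assign each peak (ordered by fragment length) to its brightest channel."""
    called = []
    for intensities in peaks:
        brightest = max(range(len(BASES)), key=lambda i: intensities[i])
        called.append(BASES[brightest])
    return "".join(called)


# Hypothetical four-channel intensities for three consecutive peaks.
example_peaks = [(0.1, 0.0, 0.9, 0.1), (0.8, 0.1, 0.0, 0.2), (0.1, 0.1, 0.2, 0.7)]
print(call_bases(example_peaks))  # → GAT
```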

Owing to its greater expediency and speed, dye-terminator sequencing is now the mainstay in automated sequencing. Its limitations include dye effects due to differences in the incorporation of the dye-labelled chain terminators into the DNA fragment, resulting in unequal peak heights and shapes in the electronic DNA sequence trace chromatogram after capillary electrophoresis (see figure to the left).

This problem has been addressed with the use of modified DNA polymerase enzyme systems and dyes that minimize incorporation variability, as well as methods for eliminating "dye blobs". The dye-terminator sequencing method, along with automated high-throughput DNA sequence analyzers, is now being used for the vast majority of sequencing projects.

Automation and sample preparation

View of the start of an example dye-terminator read

Automated DNA-sequencing instruments (DNA sequencers) can sequence up to 384 DNA samples in a single batch, and batch runs may occur up to 24 times a day. DNA sequencers separate strands by size (or length) using capillary electrophoresis, detect and record dye fluorescence, and output data as fluorescent peak trace chromatograms. Sequencing reactions (thermocycling and labelling), cleanup and re-suspension of samples in a buffer solution are performed separately, before loading samples onto the sequencer. A number of commercial and non-commercial software packages can trim low-quality DNA traces automatically. These programs score the quality of each peak and remove low-quality base peaks, which are generally located at the ends of the sequence. The accuracy of such algorithms is inferior to visual examination by a human operator, but is adequate for automated processing of large sequence data sets.
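One widely used heuristic for trimming low-quality ends is a modified Mott algorithm: each base scores `cutoff - P(error)`, and the read is cut down to the highest-scoring contiguous segment. This is a minimal sketch, not the method of any particular package; the 0.05 cutoff is an assumed default, and per-base qualities are assumed to be Phred scores (error probability 10^(-Q/10)).

```python
def mott_trim(qualities: list[int], cutoff: float = 0.05) -> tuple[int, int]:
    """Return (start, end) of the best segment to keep; end is exclusive.

    Scores each base as cutoff - error probability and finds the maximum
    scoring run (Kadane-style); returns (0, 0) if no base beats the cutoff.
    """
    best_start = best_end = 0
    best_score = 0.0
    running = 0.0
    start = 0
    for i, q in enumerate(qualities):
        error_prob = 10 ** (-q / 10)  # Phred quality -> error probability
        running += cutoff - error_prob
        if running <= 0:
            # The segment so far does more harm than good: restart after it.
            running = 0.0
            start = i + 1
        elif running > best_score:
            best_score = running
            best_start, best_end = start, i + 1
    return best_start, best_end


quals = [5, 8, 30, 35, 40, 38, 30, 10, 6]  # hypothetical trace qualities
start, end = mott_trim(quals)
print(start, end)  # → 2 7
```

Raising `cutoff` keeps more of the read ends; lowering it trims more aggressively.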

Challenges

Common challenges of DNA sequencing with the Sanger method include poor quality in the first 15–40 bases of the sequence due to primer binding^[citation needed] and deteriorating quality of sequencing traces after 700–900 bases. Base calling software such as Phred typically provides an estimate of quality to aid in trimming of low-quality regions of sequences.^[5]^[6]
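Phred-style quality values are logarithmic: a quality Q corresponds to an estimated error probability P = 10^(-Q/10), so Q20 means a 1-in-100 chance that the base call is wrong. The helpers below are a small sketch of that relationship; `expected_errors` (the sum of per-base error probabilities) is an illustrative summary, not a Phred output.

```python
import math


def phred_to_error(q: float) -> float:
    """Phred quality Q -> estimated error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Error probability P -> Phred quality Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def expected_errors(qualities: list[int]) -> float:
    """Expected number of miscalled bases in a read."""
    return sum(phred_to_error(q) for q in qualities)


print(f"{expected_errors([10, 20, 30]):.3f}")  # → 0.111
```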

In cases where DNA fragments are cloned before sequencing, the resulting sequence may contain parts of the cloning vector. In contrast, PCR-based cloning and next-generation sequencing technologies based on pyrosequencing often avoid using cloning vectors. Recently, one-step Sanger sequencing (combined amplification and sequencing) methods such as Ampliseq and SeqSharp have been developed that allow rapid sequencing of target genes without cloning or prior amplification.^[7]^[8]
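When a fragment was cloned before sequencing, the start of the read may be vector sequence rather than insert. A minimal sketch of removing it, assuming the stretch of vector sequence immediately upstream of the insert is known and appears in the read without errors, is shown below. The flank `GGATCC` is a hypothetical example, and practical vector-screening tools also handle mismatches and partial matches.

```python
def strip_vector(read: str, vector_flank: str) -> str:
    """Drop everything up to and including the first exact match of vector_flank.

    vector_flank is a hypothetical, known vector sequence adjacent to the
    insert. Only exact matches are handled in this sketch.
    """
    index = read.find(vector_flank)
    if index == -1:
        return read  # no vector detected; keep the read unchanged
    return read[index + len(vector_flank):]


print(strip_vector("ACGTGGATCCTTAGGCA", "GGATCC"))  # → TTAGGCA
```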

Current methods can directly sequence only relatively short (300-1000 nucleotides long) DNA fragments in a single reaction. The main obstacle to sequencing DNA fragments above this size limit is insufficient power of separation for resolving large DNA fragments that differ in length by only one nucleotide.
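The size limit follows from how small the relative difference between neighbouring fragments becomes: a fragment of length n and one of length n + 1 differ by only 1/n of their length. The snippet below simply computes that ratio; it is not a model of electrophoretic mobility, which does not scale linearly with length.

```python
def relative_size_difference(length: int) -> float:
    """Fractional length difference between fragments of length n and n + 1."""
    return 1 / length


for n in (100, 500, 1000):
    print(n, f"{relative_size_difference(n):.2%}")
# → 100 1.00%
# → 500 0.20%
# → 1000 0.10%
```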

Microfluidic Sanger sequencing

Microfluidic Sanger sequencing is a lab-on-a-chip application for DNA sequencing, in which the Sanger sequencing steps (thermal cycling, sample purification, and capillary electrophoresis) are integrated on a wafer-scale chip using nanoliter-scale sample volumes. This technology generates long and accurate sequence reads, while obviating many of the significant shortcomings of the conventional Sanger method (e.g. high consumption of expensive reagents, reliance on expensive equipment, personnel-intensive manipulations, etc.) by integrating and automating the Sanger sequencing steps.

In its modern form, high-throughput genome sequencing involves fragmenting the genome into small single-stranded pieces, followed by amplification of the fragments by Polymerase Chain Reaction (PCR). In the Sanger method, each DNA fragment is irreversibly terminated by the incorporation of a fluorescently labeled dideoxy chain-terminating nucleotide, producing a DNA “ladder” of fragments that each differ in length by one base and bear a base-specific fluorescent label at the terminal base. Amplified base ladders are then separated by Capillary Array Electrophoresis (CAE) with automated, in situ “finish-line” detection of the fluorescently labeled ssDNA fragments, which provides an ordered sequence of the fragments. These sequence reads are then assembled by computer into overlapping or contiguous sequences (termed "contigs"), which resemble the full genomic sequence once fully assembled.^[9]
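The contig-building step can be illustrated with a greedy overlap merge: repeatedly join the two reads whose suffix/prefix overlap is longest until no overlaps remain. This is a deliberately simplified sketch (exact matches only, no reverse complements, no handling of reads contained in other reads, no quality weighting); production assemblers use more robust strategies. The `min_len` threshold of 3 and the reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the pair with the largest overlap until none remain; return contigs."""
    contigs = list(reads)
    while True:
        best = (0, -1, -1)  # (overlap length, left index, right index)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs  # no remaining overlaps: these are the contigs
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # → ['ATTAGACCTGCCGGAATAC']
```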

Sanger methods achieve read lengths of approximately 800 bp (typically 500–600 bp with non-enriched DNA). These longer read lengths give Sanger methods significant advantages over other sequencing methods, especially for sequencing repetitive regions of the genome. Short-read sequence data are particularly problematic when sequencing new genomes (de novo) and highly rearranged genome segments, such as those typical of cancer genomes or of chromosome regions that exhibit structural variation.^[10]
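Why read length matters in repeats can be shown with a toy genome: a read shorter than a repeated region matches several positions and cannot be placed unambiguously, whereas a read long enough to span the repeat and its unique flanks matches once. The genome and reads below are hypothetical.

```python
# Hypothetical toy genome: unique flanks around four tandem copies of "CAG".
GENOME = "ACGT" + "CAG" * 4 + "TGCA"


def placements(genome: str, read: str) -> list[int]:
    """All start positions where read matches genome exactly (overlaps allowed)."""
    return [i for i in range(len(genome) - len(read) + 1) if genome[i:i + len(read)] == read]


print(placements(GENOME, "CAGCAG"))            # → [4, 7, 10]  (ambiguous)
print(placements(GENOME, "GTCAGCAGCAGCAGTG"))  # → [2]  (spans the repeat)
```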

Applications of microfluidic sequencing technologies

Other useful applications of DNA sequencing include single nucleotide polymorphism (SNP) detection, single-strand conformation polymorphism (SSCP) heteroduplex analysis, and short tandem repeat (STR) analysis. Resolving DNA fragments according to differences in size and/or conformation is the most critical step in studying these features of the genome.^[9]

Device design

The sequencing chip has a four-layer construction, consisting of three 100-mm-diameter glass wafers (on which device elements are microfabricated) and a polydimethylsiloxane (PDMS) membrane. Reaction chambers and capillary electrophoresis channels are etched between the top two glass wafers, which are thermally bonded. Three-dimensional channel interconnections and microvalves are formed by the PDMS and bottom manifold glass wafer.

The device consists of three functional units, each corresponding to one of the Sanger sequencing steps. The Thermal Cycling (TC) unit is a 250-nanoliter reaction chamber with an integrated resistive temperature detector, microvalves, and a surface heater. Movement of reagent between the top all-glass layer and the lower glass-PDMS layer occurs through 500-μm-diameter via-holes. After thermal cycling, the reaction mixture undergoes purification in the capture/purification chamber and is then injected into the capillary electrophoresis (CE) chamber. The CE unit consists of a 30-cm capillary folded into a compact switchback pattern via 65-μm-wide turns.

Sequencing chemistry

Thermal cycling

Dye-terminator sequencing reagent, template DNA, and primers are loaded into the TC reaction chamber and thermal-cycled for 35 cycles (95 °C for 12 seconds, then 60 °C for 55 seconds).
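As a quick check on the protocol's duration, the time spent at the two hold temperatures is simply cycles × (denaturation + extension). This sketch counts hold times only; heating and cooling ramp times on the chip are not given in the text and are ignored.

```python
CYCLES = 35
DENATURE_S = 12  # hold at 95 °C
EXTEND_S = 55    # hold at 60 °C


def hold_time_seconds(cycles: int, *step_durations_s: float) -> float:
    """Total time at the hold temperatures, excluding temperature ramps."""
    return cycles * sum(step_durations_s)


total = hold_time_seconds(CYCLES, DENATURE_S, EXTEND_S)
print(f"{total:.0f} s = {total / 60:.1f} min")  # → 2345 s = 39.1 min
```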

Purification

The charged reaction mixture (containing extension fragments, template DNA, and excess sequencing reagent) is driven through a capture/purification chamber at 30 °C by a 33 V/cm electric field applied between the capture outlet and inlet ports. The capture gel through which the sample is driven consists of 40 μM of oligonucleotide (complementary to the primers) covalently bound to a polyacrylamide matrix. Extension fragments are immobilized by the gel matrix, while excess primer, template, free nucleotides, and salts are eluted through the capture waste port. The capture gel is then heated to 67–75 °C to release the extension fragments.

Capillary electrophoresis

Extension fragments are injected into the CE chamber, where they are electrophoresed through a 125–167 V/cm field.
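To relate the field strength to the voltage the chip must supply, V = E × L. The sketch assumes the field is uniform along the entire 30-cm folded capillary described for the CE unit; the actual electrode placement and injection geometry are not specified here.

```python
CAPILLARY_LENGTH_CM = 30.0  # CE channel length of the device


def applied_voltage(field_v_per_cm: float, length_cm: float = CAPILLARY_LENGTH_CM) -> float:
    """Voltage for a uniform field along the channel: V = E * L (in volts)."""
    return field_v_per_cm * length_cm


for field in (125, 167):
    print(f"{field} V/cm -> {applied_voltage(field) / 1000:.2f} kV")
# → 125 V/cm -> 3.75 kV
# → 167 V/cm -> 5.01 kV
```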

Platforms

The Apollo 100 platform (Microchip Biotechnologies Inc., Dublin, CA)^[11] integrates the first two Sanger sequencing steps (thermal cycling and purification) in a fully automated system. The manufacturer claims that samples are ready for capillary electrophoresis within three hours of the sample and reagents being loaded into the system. The Apollo 100 platform requires sub-microliter volumes of reagents.

Comparisons to other sequencing techniques

Performance values for genome sequencing technologies including Sanger methods and next-generation methods^[10]^[12]
Technology	Number of lanes	Injection volume (nL)	Analysis time	Average read length	Throughput (including analysis; Mb/h)	Gel pouring	Lane tracking
Slab gel	96	500–1000	6–8 hours	700 bp	0.0672	Yes	Yes
Capillary array electrophoresis	96	1–5	1–3 hours	700 bp	0.166	No	No
Microchip	96	0.1–0.5	6–30 minutes	430 bp	0.660	No	No
454/Roche FLX		< 0.001	4 hours	200–300 bp	20–30	No
Illumina/Solexa			2–3 days	30–100 bp	20	No
ABI/SOLiD			8 days	35 bp	5–15	No

The ultimate goal of high-throughput sequencing is to develop systems that are low-cost and extremely efficient at obtaining extended (longer) read lengths. Longer read lengths per electrophoretic separation substantially reduce the cost associated with de novo DNA sequencing and the number of templates needed to sequence DNA contigs at a given redundancy. Microfluidics may allow for faster, cheaper and easier sequence assembly.^[9]
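The effect of read length on the number of templates follows from simple coverage arithmetic: to sequence a region of length G at redundancy c with reads of length L, roughly c × G / L reads are needed. The sketch assumes reads are spread evenly; the 1 Mb target and 8× redundancy are hypothetical, while 700 bp and 430 bp are the average read lengths listed in the table for capillary array and microchip electrophoresis.

```python
import math


def reads_needed(genome_length_bp: int, coverage: float, read_length_bp: int) -> int:
    """Reads required so that total sequenced bases = coverage * genome length."""
    return math.ceil(coverage * genome_length_bp / read_length_bp)


for read_length in (700, 430):
    print(read_length, reads_needed(1_000_000, 8, read_length))
# → 700 11429
# → 430 18605
```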

References

^ Sanger F; Coulson AR (May 1975). "A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase". J. Mol. Biol. 94 (3): 441–8. doi:10.1016/0022-2836(75)90213-2. PMID 1100841.
^ Sanger F; Nicklen S; Coulson AR (December 1977). "DNA sequencing with chain-terminating inhibitors". Proc. Natl. Acad. Sci. U.S.A. 74 (12): 5463–7. Bibcode:1977PNAS...74.5463S. doi:10.1073/pnas.74.12.5463. PMC 431765. PMID 271968.
^ Smith LM; Sanders JZ; Kaiser RJ; et al. (1986). "Fluorescence detection in automated DNA sequence analysis". Nature. 321 (6071): 674–9. Bibcode:1986Natur.321..674S. doi:10.1038/321674a0. PMID 3713851. We have developed a method for the partial automation of DNA sequence analysis. Fluorescence detection of the DNA fragments is accomplished by means of a fluorophore covalently attached to the oligonucleotide primer used in enzymatic DNA sequence analysis. A different coloured fluorophore is used for each of the reactions specific for the bases A, C, G and T. The reaction mixtures are combined and co-electrophoresed down a single polyacrylamide gel tube, the separated fluorescent bands of DNA are detected near the bottom of the tube, and the sequence information is acquired directly by computer.
^ Smith LM; Fung S; Hunkapiller MW; Hunkapiller TJ; Hood LE (April 1985). "The synthesis of oligonucleotides containing an aliphatic amino group at the 5' terminus: synthesis of fluorescent DNA primers for use in DNA sequence analysis". Nucleic Acids Res. 13 (7): 2399–412. doi:10.1093/nar/13.7.2399. PMC 341163. PMID 4000959.
^ "Phred - Quality Base Calling". Retrieved 2011-02-24.
^ Ledergerber, C; Dessimoz, C (2011). "Base-calling for next-generation sequencing platforms". Briefings in Bioinformatics. 12 (5): 489–97. doi:10.1093/bib/bbq077. PMC 3178052. PMID 21245079.
^ Murphy, K.; Berg, K.; Eshleman, J. (2005). "Sequencing of genomic DNA by combined amplification and cycle sequencing reaction". Clinical Chemistry. 51 (1): 35–39. doi:10.1373/clinchem.2004.039164. PMID 15514094.
^ Sengupta, D.; Cookson, B. (2010). "SeqSharp: A general approach for improving cycle-sequencing that facilitates a robust one-step combined amplification and sequencing method". The Journal of Molecular Diagnostics. 12 (3): 272–277. doi:10.2353/jmoldx.2010.090134. PMC 2860461. PMID 20203000.
^ Kan, Cheuk-Wai; Fredlake, Christopher P.; Doherty, Erin A. S.; Barron, Annelise E. (1 November 2004). "DNA sequencing and genotyping in miniaturized electrophoresis systems". Electrophoresis. 25 (21–22): 3564–3588. doi:10.1002/elps.200406161. PMID 15565709.
^ Morozova, Olena; Marra, Marco A (2008). "Applications of next-generation sequencing technologies in functional genomics". Genomics. 92 (5): 255. doi:10.1016/j.ygeno.2008.07.001. PMID 18703132.
^ Microchip Biotechnologies Inc. Apollo 100
^ Sinville, Rondedrick; Soper, Steven A (2007). "High resolution DNA separations using microchip electrophoresis". Journal of Separation Science. 30 (11): 1714–28. doi:10.1002/jssc.200700150. PMID 17623451.

External links

MBI Says New Tool That Automates Sanger Sample Prep Cuts Reagent and Labor Costs