DNA sequencer

From Wikipedia, the free encyclopedia
Manufacturers: Roche, Illumina, Life Technologies, Beckman Coulter, Pacific Biosciences

A DNA sequencer is a scientific instrument used to automate the DNA sequencing process. Given a sample of DNA, a DNA sequencer determines the order of the four bases: adenine, guanine, cytosine, and thymine. The order of the bases is reported as a text string, called a read. Some DNA sequencers can also be considered optical instruments, as they analyze light signals originating from fluorochromes attached to nucleotides.

The first automated DNA sequencer was introduced by Applied Biosystems in 1987. It used the Sanger sequencing method, a technology which formed the basis of the “first generation” of DNA sequencers[1][2] and enabled the Human Genome Project, which published a draft sequence in 2001 and was declared complete in 2003.[3]

The Human Genome Project catalysed the development of cheaper, higher-throughput and more accurate platforms known as Next Generation Sequencers (NGS). These include the 454, SOLiD and Illumina DNA sequencing platforms. Next-generation sequencing machines have increased the rate of DNA sequencing substantially compared with previous Sanger methods. DNA samples can be prepared automatically in as little as 90 minutes,[4] while a human genome can be sequenced at 15-fold coverage in a matter of days.[5]
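The coverage figure quoted above can be related to the number of reads with the usual approximation coverage = (number of reads × read length) / genome size. The sketch below is illustrative only: the 3-gigabase genome size and the 100 bp read length are assumed round numbers rather than values taken from the cited sources, and the function names are hypothetical.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is covered: C = N * L / G."""
    return num_reads * read_length / genome_size


def reads_needed(coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(coverage * genome_size / read_length)


if __name__ == "__main__":
    GENOME_SIZE = 3_000_000_000  # assumed: roughly the size of a human genome
    READ_LENGTH = 100            # assumed: an illustrative short-read length
    n = reads_needed(15, READ_LENGTH, GENOME_SIZE)
    print(n, mean_coverage(n, READ_LENGTH, GENOME_SIZE))
```

This treats coverage as a simple average; it ignores uneven coverage, unmappable regions, and reads discarded by quality filtering.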

More recent, third-generation DNA sequencers such as SMRT and Oxford Nanopore measure the addition of nucleotides to a single DNA molecule in real time.

Because of limitations in DNA sequencer technology, reads are short compared to the length of a genome, so they must be assembled into longer contigs.[6] The data may also contain errors, caused by limitations in the DNA sequencing technique or by errors during PCR amplification. DNA sequencer manufacturers use a number of different methods to detect which DNA bases are present. The specific protocols applied in different sequencing platforms affect the final data that is generated, so comparing data quality and cost across technologies can be a daunting task. Each manufacturer provides its own way of reporting sequencing errors and quality scores, and these cannot always be compared directly between platforms. Since these systems rely on different DNA sequencing approaches, choosing the best DNA sequencer and method will typically depend on the objectives of the experiment and the available budget.[1]
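The idea of joining short reads into a longer contig can be sketched as follows. This is a deliberately minimal toy, assuming error-free reads and exact suffix-to-prefix overlaps; real assemblers tolerate sequencing errors and use far more elaborate graph-based methods. The function names and the minimum overlap of 3 bases are hypothetical choices made for illustration.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two reads into one contig, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep `left` whole and append only the part of `right` beyond the overlap.
    return left + right[size:]


if __name__ == "__main__":
    print(merge_reads("ACGTAC", "TACGGA"))
    print(merge_reads("AAAA", "CCCC"))
```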

History

The first DNA sequencing methods were developed by Gilbert (1973)[7] and Sanger (1975).[8] Gilbert introduced a sequencing method based on chemical modification of DNA followed by cleavage at specific bases, whereas Sanger’s technique is based on dideoxynucleotide chain termination. The Sanger method became popular due to its increased efficiency and low radioactivity. The first automated DNA sequencer was the AB370, introduced in 1987 by Applied Biosystems.[9] The AB370 was able to sequence 96 bases at once and 500 kilobases per day, and to reach read lengths of up to 600 bases. This was the beginning of the “first generation” of DNA sequencers,[1][2] which implemented Sanger sequencing and capillary electrophoresis. These techniques formed the basis for the Human Genome Project, which published a draft sequence in 2001 and was declared complete in 2003.[3]

The Human Genome Project catalysed the development of cheaper, higher-throughput and more accurate platforms known as Next Generation Sequencers (NGS). In 2005, 454 Life Sciences released the 454 sequencer, followed in 2006 by the Solexa Genome Analyzer and by SOLiD (Supported Oligo Ligation Detection) from Agencourt. Applied Biosystems acquired Agencourt in 2006, and in 2007 Roche bought 454 Life Sciences, while Illumina purchased Solexa. These are still the most common NGS systems due to their competitive cost, accuracy, and performance.

More recently, a third generation of DNA sequencers was introduced. The sequencing methods applied by these sequencers do not require DNA amplification by polymerase chain reaction (PCR), which speeds up sample preparation before sequencing and reduces errors. In addition, sequencing data is collected in real time from the reactions caused by the addition of nucleotides to the complementary strand. Two companies introduced different approaches in their third-generation sequencers. Pacific Biosciences sequencers use a method called single-molecule real-time (SMRT) sequencing, in which sequencing data is produced from light (captured by a camera) emitted when a fluorescently labelled nucleotide is added to the complementary strand by a DNA polymerase. Oxford Nanopore Technologies is developing third-generation sequencers that use electronic systems based on nanopore sensing technologies.

Manufacturers of DNA sequencers

DNA sequencers have been developed, manufactured, and sold by the following companies, among others.

Roche

The 454 DNA sequencer was the first next-generation sequencer to become commercially successful.[9] It was developed by 454 Life Sciences, which Roche purchased in 2007. The 454 system detects the pyrophosphate released by the DNA polymerase reaction when a nucleotide is added to the template strand.

Roche currently manufactures two systems based on its pyrosequencing technology: the GS FLX+ and the GS Junior System.[10] The GS FLX+ System promises read lengths of approximately 1,000 base pairs, while the GS Junior System promises 400 base pair reads.[11][12] A predecessor to the GS FLX+, the 454 GS FLX Titanium system, was released in 2008, achieving an output of 0.7 Gb of data per run, with 99.9% accuracy after quality filtering and a read length of up to 700 bp. In 2009, Roche launched the GS Junior, a benchtop version of the 454 sequencer with a read length of up to 400 bp and simplified library preparation and data processing.

One of the advantages of 454 systems is their running speed. Manpower can be reduced through automation of library preparation and semi-automation of emulsion PCR. A disadvantage of the 454 system is that it is prone to errors when estimating the number of bases in a long string of identical nucleotides. This is referred to as a homopolymer error and occurs when there are 6 or more identical bases in a row.[13] Another disadvantage is that the reagents are relatively expensive compared with those of other next-generation sequencers.
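Regions at risk of homopolymer errors can be located by scanning a sequence for runs of identical bases. The sketch below flags runs of at least 6 bases, following the threshold given above; the function name and the example sequence are hypothetical, and this is not part of any Roche software.

```python
import re


def homopolymer_runs(sequence: str, min_length: int = 6):
    """Return (start, base, length) for each run of >= min_length identical bases."""
    # One base followed by at least (min_length - 1) repeats of that same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [
        (match.start(), match.group(1), match.end() - match.start())
        for match in pattern.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    # Contains a run of seven T's (flagged) and a run of five A's (not flagged).
    print(homopolymer_runs("ACGTTTTTTTACGAAAAAC"))
```

Start positions are 0-based offsets into the sequence.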

In 2013, Roche announced that it would shut down development of 454 technology and phase out 454 machines completely in 2016.[14][15]

Roche produces a number of software tools optimised for the analysis of 454 sequencing data.[16] GS Run Processor[17] converts the raw images generated by a sequencing run into intensity values in two main steps: image processing and signal processing. It also performs normalization, signal correction, and base-calling, and assigns quality scores to individual reads. The output is written as Standard Flowgram Format (SFF) files for use in the data analysis applications (GS De Novo Assembler, GS Reference Mapper, and GS Amplicon Variant Analyzer).

GS De Novo Assembler performs de novo assembly of whole genomes up to 3 Gb in size from shotgun reads alone or combined with paired-end data generated by 454 sequencers. It also supports de novo assembly and analysis of transcripts, as well as isoform variant detection.[16] GS Reference Mapper maps short reads to a reference genome and generates a consensus sequence. It can produce output files for assessment that indicate insertions, deletions, and SNPs, and it can handle large and complex genomes of any size.[16] Finally, GS Amplicon Variant Analyzer aligns reads from amplicon samples against a reference, identifying variants (linked or not) and their frequencies. It can also be used to detect unknown and low-frequency variants, and it includes graphical tools for the analysis of alignments.[16]

Illumina

Figure: Illumina Genome Analyzer II sequencing machine.

Illumina produces a number of next-generation sequencing machines using technology acquired from Manteia Predictive Medicine and developed by Solexa.[18] These include the HiSeq, the Genome Analyzer IIx, the MiSeq, and the HiScanSQ, which can also process microarrays.[19]

The technology leading to these DNA sequencers was first released by Solexa in 2006 as the Genome Analyzer.[9] Illumina purchased Solexa in 2007. The Genome Analyzer uses a sequencing-by-synthesis method. The first model produced 1 Gb per run. During 2009 the output was increased from 20 Gb per run in August to 50 Gb per run in December. In 2010 Illumina released the HiSeq 2000, with an output of 200 Gb per run, later raised to 600 Gb per run, with a run taking 8 days. At its release the HiSeq 2000 was one of the cheapest sequencing platforms, at $0.02 per million bases as costed by the Beijing Genomics Institute.

In 2011 Illumina released a benchtop sequencer called the MiSeq. At its release the MiSeq could generate 1.5 Gb per run with paired-end 150 bp reads. A sequencing run can be performed in 10 hours when using automated DNA sample preparation.[9]

The Illumina HiSeq uses two software tools, the HiSeq control system and the real-time analyzer, to calculate the number and position of DNA clusters and so assess sequencing quality. These tools help to determine whether nearby clusters are interfering with each other.[9]

Life Technologies

Life Technologies produces DNA sequencers under the Applied Biosystems and Ion Torrent brands. Applied Biosystems makes the SOLiD next-generation sequencing platform[20] and Sanger-based DNA sequencers such as the 3500 Genetic Analyzer.[21] Under the Ion Torrent brand, Life Technologies produces two next-generation sequencers: the Ion PGM System and the Ion Proton System.[22]

The SOLiD platform was acquired by Applied Biosystems in 2006. SOLiD applies sequencing by ligation and dual base encoding. The first SOLiD system was launched in 2007, generating read lengths of 35 bp and 3 Gb of data per run. After five upgrades, the 5500xl sequencing system was released in 2010, considerably increasing read length to 85 bp, improving accuracy to up to 99.99%, and producing 30 Gb per 7-day run.[9]

The limited read length of the SOLiD has remained a significant shortcoming[23] and has to some extent limited its use to experiments where read length is less vital, such as resequencing and transcriptome analysis and, more recently, ChIP-Seq and methylation experiments.[9] DNA sample preparation for SOLiD systems has become much quicker with the automation of sequencing library preparation, for example with the Tecan system.[9]

The colour space data produced by the SOLiD platform can be decoded into DNA bases for further analysis; however, software that considers the original colour space information can give more accurate results. Life Technologies has released BioScope,[24] a data analysis package for resequencing, ChIP-Seq and transcriptome analysis. It uses the MaxMapper algorithm to map the colour space reads.
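The two-base (colour space) encoding can be sketched in a few lines. In this scheme each colour, written as a digit from 0 to 3, describes a pair of adjacent bases, so a read is given as a known leading base followed by a string of colours. The sketch below assumes the standard SOLiD colour table, which happens to equal the XOR of the two bases when they are numbered A=0, C=1, G=2, T=3; the function names are hypothetical and this is not the BioScope or MaxMapper implementation.

```python
_BASES = "ACGT"  # index in this string is the 2-bit code used for the XOR


def encode_colour_space(sequence: str) -> str:
    """Return the leading base followed by one colour digit per adjacent base pair."""
    codes = [_BASES.index(base) for base in sequence]
    colours = [str(a ^ b) for a, b in zip(codes, codes[1:])]
    return sequence[0] + "".join(colours)


def decode_colour_space(read: str) -> str:
    """Convert a leading base plus colour digits back into a base sequence."""
    current = _BASES.index(read[0])
    decoded = [read[0]]
    for colour in read[1:]:
        # Each colour tells us how to step from the previous base to the next.
        current ^= int(colour)
        decoded.append(_BASES[current])
    return "".join(decoded)


if __name__ == "__main__":
    print(encode_colour_space("ATGGC"))
    print(decode_colour_space("A3103"))
    print(decode_colour_space("A3203"))  # same read with one colour miscalled
```

Because every base is decoded from the one before it, a single miscalled colour changes all downstream bases after naive decoding (the third call above yields ATCCG rather than ATGGC). This is one reason why working in colour space directly, rather than on decoded bases, can give more accurate results.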

Beckman Coulter

Beckman Coulter (now part of Danaher) previously manufactured chain termination and capillary electrophoresis-based DNA sequencers under the model name CEQ, including the CEQ 8000. The company now produces the GeXP Genetic Analysis System, which uses dye terminator cycle sequencing. This method uses a thermocycler in much the same way as PCR to denature, anneal, and extend DNA fragments, amplifying the sequenced fragments.[25][26]

Pacific Biosciences

Pacific Biosciences produces a sequencing system named the PacBio RS, which uses a single-molecule real-time (SMRT) sequencing method.[27] This system can produce read lengths of multiple thousands of base pairs, though with a high error rate. These errors may be alleviated by the use of optimized assembly strategies.[28] Scientists have reported 99.999% accuracy with these strategies.[29]
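The intuition behind recovering high accuracy from error-prone reads is that independent reads of the same region rarely make the same mistake at the same position, so a consensus can outvote individual errors. The sketch below is a simplified illustration of that idea only: it assumes the reads are already aligned, of equal length, and contain substitution errors but no insertions or deletions. It is not the algorithm used by Pacific Biosciences or by the cited assembly strategies, and the function name is hypothetical.

```python
from collections import Counter
from typing import Sequence


def majority_consensus(aligned_reads: Sequence[str]) -> str:
    """Column-wise majority vote over equal-length, pre-aligned reads."""
    if not aligned_reads:
        raise ValueError("at least one read is required")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all reads must have the same length")
    # zip(*reads) yields one tuple of bases per alignment column.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


if __name__ == "__main__":
    reads = ["ACGTAC", "ACCTAC", "ACGTAC", "ACGTTC", "AGGTAC"]
    print(majority_consensus(reads))
```

In the example, three of the five reads each carry one substitution at a different position, yet the consensus is error-free.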

Comparison

| Sequencer | Ion Torrent PGM [4][30][31] | 454 GS FLX [9] | HiSeq 2000 [4][9] | SOLiDv4 [9] | PacBio [4][32] | Sanger 3730xl [9] |
| --- | --- | --- | --- | --- | --- | --- |
| Manufacturer | Ion Torrent (Life Technologies) | 454 Life Sciences (Roche) | Illumina | Applied Biosystems (Life Technologies) | Pacific Biosciences | Applied Biosystems (Life Technologies) |
| Sequencing chemistry | Ion semiconductor sequencing | Pyrosequencing | Polymerase-based sequence-by-synthesis | Ligation-based sequencing | Phospholinked fluorescent nucleotides | Dideoxy chain termination |
| Amplification approach | Emulsion PCR | Emulsion PCR | Bridge amplification | Emulsion PCR | Single-molecule; no amplification | PCR |
| Data output per run | 100–200 Mb | 0.7 Gb | 600 Gb | 120 Gb | 100–700 Mb | 1.9–84 Kb |
| Accuracy | 99% | 99.9% | 99.9% | 99.94% | 88.0% (>99.9% CCS)[33] | 99.999% |
| Time per run | 2 hours | 24 hours | 3–10 days | 7–14 days | 2–3 hours | 20 minutes – 3 hours |
| Read length | 200–400 bp | 700 bp | 100x100 bp paired end | 50x50 bp paired end | 5,500–10,000 bp (N50) | 400–900 bp |
| Cost per run | $350 USD | $7,000 USD | $6,000 USD (30x human genome) | $4,000 USD | $125–300 USD | $4 USD (single read/reaction) |
| Cost per Mb | $1.00 USD | $10 USD | $0.07 USD | $0.13 USD | $0.20–3.00 USD | $2400 USD |
| Cost per instrument | $80,000 USD | $500,000 USD | $690,000 USD | $495,000 USD | $695,000 USD | $95,000 USD |

Table 1. Comparing metrics and performance of next-generation DNA sequencers.[34]
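The "Cost per Mb" row of Table 1 can be turned into a rough project estimate by multiplying it by the genome size and the desired coverage. The sketch below is a back-of-the-envelope illustration under stated assumptions: the 3,000 Mb genome size is an assumed round figure for a human genome, the function names are hypothetical, PacBio is omitted because the table gives its cost only as a range, and instrument, labour, and library preparation costs are ignored.

```python
# Cost per megabase in USD, copied from the "Cost per Mb" row of Table 1.
COST_PER_MB_USD = {
    "Ion Torrent PGM": 1.00,
    "454 GS FLX": 10.00,
    "HiSeq 2000": 0.07,
    "SOLiDv4": 0.13,
    "Sanger 3730xl": 2400.00,
}


def estimated_cost(platform: str, genome_size_mb: float, coverage: float) -> float:
    """Rough sequencing cost in USD: cost per Mb x genome size x coverage."""
    return round(COST_PER_MB_USD[platform] * genome_size_mb * coverage, 2)


def rank_by_cost(genome_size_mb: float, coverage: float) -> list:
    """Platform names ordered from cheapest to most expensive for this project."""
    return sorted(
        COST_PER_MB_USD,
        key=lambda name: estimated_cost(name, genome_size_mb, coverage),
    )


if __name__ == "__main__":
    HUMAN_GENOME_MB = 3000  # assumed round figure
    print(estimated_cost("HiSeq 2000", HUMAN_GENOME_MB, 30))
    print(rank_by_cost(HUMAN_GENOME_MB, 30))
```

For the HiSeq 2000 at 30-fold coverage this gives roughly $6,300, which is in line with the $6,000 "30x human genome" figure quoted in the "Cost per run" row.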

References

  1. Metzker, M. L. (2005). "Emerging technologies in DNA sequencing". Genome Res. 15 (12): 1767–1776. doi:10.1101/gr.3770505. PMID 16339375.
  2. Hutchison, C. A. III. (2007). "DNA sequencing: bench to bedside and beyond". Nucleic Acids Res. 35 (18): 6227–6237. doi:10.1093/nar/gkm688. PMID 17855400.
  3. F. S. Collins, M. Morgan, and A. Patrinos (2003). "The Human Genome Project: lessons from large-scale biology". Science 300 (5617): 286–290. doi:10.1126/science.1084564. PMID 12690187.
  4. Michael A Quail, Miriam Smith, Paul Coupland, Thomas D Otto, Simon R Harris, Thomas R Connor, Anna Bertoni, Harold P Swerdlow and Yong Gu (2012). "A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers". BMC Genomics.
  5. Michael A. Quail, Iwanka Kozarewa, Frances Smith, Aylwyn Scally, Philip J. Stephens, Richard Durbin, Harold Swerdlow, and Daniel J. Turner (2008). "A large genome centre’s improvements to the Illumina sequencing system". Nat Methods 5 (12): 1005–1010. doi:10.1038/nmeth.1270. PMC 2610436. PMID 19034268.
  6. Heng Li, Jue Ruan, and Richard Durbin (2008). "Mapping short DNA sequencing reads and calling variants using mapping quality scores". Genome Research. http://genome.cshlp.org/content/18/11/1851
  7. Gilbert W, Maxam A. (1973). "The Nucleotide Sequence of the lac Operator". Proc Natl Acad Sci U S A 70 (12): 3581–3584. PMC 427284. PMID 4587255.
  8. Sanger F, Coulson AR (May 1975). "A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase". J. Mol. Biol. 94 (3): 441–8. doi:10.1016/0022-2836(75)90213-2. PMID 1100841.
  9. Lin Liu, Yinhu Li, Siliang Li, Ni Hu, Yimin He, Ray Pong, Danni Lin, Lihua Lu, and Maggie Law (2012). "Comparison of Next-Generation Sequencing Systems". Journal of Biomedicine and Biotechnology.
  10. Products : 454 Life Sciences, a Roche Company
  11. Products - GS FLX+ System : 454 Life Sciences, a Roche Company
  12. Products - GS Junior System : 454 Life Sciences, a Roche Company
  13. Mardis, Elaine R. (1 September 2008). "Next-Generation DNA Sequencing Methods". Annual Review of Genomics and Human Genetics 9 (1): 387–402. doi:10.1146/annurev.genom.9.081307.164359. PMID 18576944.
  14. "Roche Shutting Down 454 Sequencing Business". http://www.genomeweb.com/sequencing/roche-shutting-down-454-sequencing-business
  15. "Roche Shuts Down Third-Generation NGS Research Programs". http://www.bio-itworld.com/2013/4/23/roche-shuts-down-third-generation-ngs-research-programs.html
  16. Products - Analysis Software : 454 Life Science, a Roche Company
  17. Genome Sequencer FLX System Software Manual, version 2.3
  18. Solexa's Progress Is In The Genes - Businessweek
  19. Illumina Systems
  20. SOLiD
  21. Sanger Sequencing | Life Technologies
  22. Ion Torrent. http://www.lifetechnologies.com/au/en/home/brands/ion-torrent.html
  23. SOLiD system accuracy
  24. SOLiD BioScope Software | Life Technologies
  25. Rai, Alex J.; Kamath, Rashmi M.; Gerald, William; Fleisher, Martin (29 October 2008). "Analytical validation of the GeXP analyzer and design of a workflow for cancer-biomarker discovery using multiplexed gene-expression profiling". Analytical and Bioanalytical Chemistry 393 (5): 1505–1511. doi:10.1007/s00216-008-2436-7.
  26. Beckman Coulter, Inc - GenomeLab GeXP Genetic Analysis System
  27. Pacific Biosciences: Overview
  28. Koren, S; Schatz, MC; Walenz, BP; Martin, J; Howard, JT; Ganapathy, G; Wang, Z; Rasko, DA; McCombie, WR; Jarvis, ED; Phillippy, AM (Jul 1, 2012). "Hybrid error correction and de novo assembly of single-molecule sequencing reads". Nature Biotechnology 30 (7): 693–700. doi:10.1038/nbt.2280. PMID 22750884.
  29. http://www.nature.com/nmeth/journal/v10/n6/full/nmeth.2474.html
  30. Karow, J. (2010) Ion Torrent Systems Presents $50,000 Electronic Sequencer at AGBT. In Sequence.
  31. Ion PGM - Ion Torrent
  32. Pacific Biosciences
  33. Xiaoli Jiao, Xin Zheng, [...], and Da Wei Huang (July 2013). "A Benchmark Study on Error Assessment and Quality Control of CCS Reads Derived from the PacBio RS". Journal of data mining in genomics & proteomics 4 (3). doi:10.4172/2153-0602.1000136. PMC 3811116. PMID 24179701.
  34. Shendure, J.; Ji, H. (2008). "Next-generation DNA sequencing". Nat Biotech 26 (10): 1135–1145. doi:10.1038/nbt1486. PMID 18846087.