Single-molecule real-time sequencing

From Wikipedia, the free encyclopedia

Single-molecule real-time (SMRT) sequencing is a parallelized single-molecule DNA sequencing method. It uses a zero-mode waveguide (ZMW).[1] A single DNA polymerase enzyme is affixed at the bottom of a ZMW with a single molecule of DNA as a template. The ZMW is a structure that creates an illuminated observation volume small enough to observe a single nucleotide being incorporated by the DNA polymerase. Each of the four DNA bases is attached to one of four different fluorescent dyes. When a nucleotide is incorporated by the DNA polymerase, the fluorescent tag is cleaved off and diffuses out of the observation area of the ZMW, where its fluorescence is no longer observable. A detector records the fluorescent signal of each nucleotide incorporation, and the base call is made according to the corresponding fluorescence of the dye.[2]

Technology

Sequencing is performed on a chip that contains many ZMWs. Inside each ZMW, a single active DNA polymerase bound to a single molecule of single-stranded DNA template is immobilized at the bottom. Light penetrates through the bottom and creates a visualization chamber in which the activity of the DNA polymerase can be monitored at the single-molecule level. The signal from each phospholinked nucleotide incorporated by the polymerase is detected as DNA synthesis proceeds, so the sequence is read in real time.

Phospholinked nucleotide

For each of the nucleotide bases, there is a corresponding fluorescent dye molecule that enables the detector to identify the base being incorporated by the DNA polymerase as it performs the DNA synthesis. The fluorescent dye molecule is attached to the phosphate chain of the nucleotide. When the nucleotide is incorporated by the DNA polymerase, the fluorescent dye is cleaved off with the phosphate chain as a part of a natural DNA synthesis process during which a phosphodiester bond is created to elongate the DNA chain. The cleaved fluorescent dye molecule then diffuses out of the detection volume so that the fluorescent signal is no longer detected.
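The mapping from dye signal to base call described above can be illustrated with a minimal sketch. The channel-to-base assignment, the `Pulse` record, and the `min_duration_s` cutoff are hypothetical choices made for illustration; the article does not specify which dye labels which base, and a real base caller is considerably more involved.

```python
from dataclasses import dataclass

# Hypothetical dye-channel assignment (not given in the article).
CHANNEL_TO_BASE = {0: "A", 1: "C", 2: "G", 3: "T"}


@dataclass
class Pulse:
    channel: int       # which of the four dye colours was detected
    start_s: float     # time the pulse began, in seconds
    duration_s: float  # how long the dye stayed in the observation volume


def call_bases(pulses, min_duration_s=0.01):
    """Return the base sequence implied by a list of fluorescence pulses.

    Pulses shorter than min_duration_s are skipped, on the assumption that
    they come from nucleotides passing through the observation volume
    without being incorporated.
    """
    ordered = sorted(pulses, key=lambda p: p.start_s)
    return "".join(
        CHANNEL_TO_BASE[p.channel]
        for p in ordered
        if p.duration_s >= min_duration_s
    )


pulses = [
    Pulse(channel=2, start_s=0.40, duration_s=0.08),
    Pulse(channel=0, start_s=0.10, duration_s=0.05),
    Pulse(channel=3, start_s=0.25, duration_s=0.002),  # too brief: skipped
    Pulse(channel=1, start_s=0.60, duration_s=0.12),
]
print(call_bases(pulses))  # AGC
```

Ordering pulses by start time reflects the fact that the sequence is read in the order of incorporation; the signal ends for each base once the cleaved dye diffuses away.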

Zero-mode waveguide

The zero-mode waveguide (ZMW) is a nanophotonic confinement structure that consists of a circular hole in an aluminum cladding film deposited on a clear silica substrate.[3]

The ZMW holes are ~70 nm in diameter and ~100 nm in depth. Due to the behavior of light when it travels through a small aperture, the optical field decays exponentially inside the chamber.[4]

The observation volume within an illuminated ZMW is ~20 zeptoliters (20 × 10⁻²¹ liters). Within this volume, the activity of DNA polymerase incorporating a single nucleotide can be readily detected.
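As a rough check on these figures, the physical volume of a ~70 nm by ~100 nm hole can be compared with the ~20 zeptoliter observation volume. The sketch below treats the hole as an ideal cylinder, which is a simplifying assumption, and the function name is illustrative.

```python
import math


def cylinder_volume_zeptoliters(diameter_nm, depth_nm):
    """Volume of an ideal cylindrical hole, in zeptoliters."""
    radius_nm = diameter_nm / 2
    volume_nm3 = math.pi * radius_nm ** 2 * depth_nm
    # 1 nm^3 = 1e-27 m^3 = 1e-24 L = 1e-3 zL
    return volume_nm3 * 1e-3


physical_zl = cylinder_volume_zeptoliters(diameter_nm=70, depth_nm=100)
observed_zl = 20  # observation volume quoted in the text

print(round(physical_zl))                   # 385
print(round(observed_zl / physical_zl, 3))  # 0.052
```

Under this idealization the observation volume is only about 5% of the hole, which is consistent with an optical field that decays quickly away from the illuminated bottom.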

Sequencing performance

Sequencing performance for the technology can be measured in read length and total throughput per experiment.

On 19 September 2018, Pacific Biosciences (PacBio) released the Sequel 6.0 chemistry, synchronizing the chemistry version with the software version. Performance differs between large-insert libraries made from high-molecular-weight DNA and shorter-insert libraries below ~15,000 bases in length. For larger templates, average read lengths are up to 30,000 bases. For shorter-insert libraries, average read lengths are up to 100,000 bases, because the polymerase reads the same circular molecule repeatedly. Shorter-insert libraries yield up to 50 billion bases from a single SMRT Cell.[5]
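Reading the same molecule in a circle means that a read much longer than the insert covers the insert several times. A back-of-the-envelope estimate of the number of passes, assuming adapter sequence is negligible (an assumption, not a figure from the cited source), is simply the read length divided by the insert length:

```python
def approximate_passes(read_length, insert_length):
    """Rough number of times a circular insert is traversed by one read.

    Ignores adapter sequence and partial first/last passes.
    """
    if insert_length <= 0:
        raise ValueError("insert_length must be positive")
    return read_length / insert_length


# A 100,000-base read on an insert at the ~15,000-base upper bound
print(round(approximate_passes(100_000, 15_000), 1))  # 6.7
# The same read length on a hypothetical 2,000-base amplicon
print(approximate_passes(100_000, 2_000))             # 50.0
```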

History

Pacific Biosciences (PacBio) commercialized SMRT sequencing in 2011,[6] after releasing a beta version of its RS instrument in late 2010.[7]

RS and RS II

Figure: SMRT Cell for an RS or RS II sequencer

At commercialization, read length had a normal distribution with a mean of about 1,100 bases. A new chemistry kit released in early 2012 increased the sequencer's read length; an early customer of the chemistry cited mean read lengths of 2,500 to 2,900 bases.[8]

The XL chemistry kit, released in late 2012, increased average read length to more than 4,300 bases.[9][10]

On August 21, 2013, PacBio released a new DNA/Polymerase Binding Kit, P4. The P4 enzyme has average read lengths of more than 4,300 bases when paired with the C2 sequencing chemistry and more than 5,000 bases when paired with the XL chemistry.[11] The enzyme's accuracy is similar to C2, reaching QV50 between 30X and 40X coverage. The P4 enzyme provided higher-quality assemblies using fewer SMRT Cells, with improved variant calling.[12] When coupled with input DNA size selection (using an electrophoresis instrument such as BluePippin), it yields average read lengths over 7 kilobases.[13]
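Assuming QV here denotes the usual Phred-scaled quality value, QV = −10 · log10(error probability), QV50 corresponds to roughly one error per 100,000 bases. The helper names below are illustrative.

```python
import math


def phred_to_error(qv):
    """Error probability implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def error_to_phred(error_probability):
    """Phred-scaled quality value for a given error probability."""
    return -10 * math.log10(error_probability)


print(f"{phred_to_error(50):.0e}")       # 1e-05
print(f"{1 - phred_to_error(50):.5%}")   # 99.99900%
print(round(error_to_phred(0.001)))      # 30
```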

On October 3, 2013, PacBio released a new reagent combination for the PacBio RS II, the P5 DNA polymerase with C3 chemistry (P5-C3). Together, they extend sequencing read lengths to an average of approximately 8,500 bases, with the longest reads exceeding 30,000 bases.[14] Throughput per SMRT Cell is around 500 million bases, as demonstrated by sequencing results from the CHM1 cell line.[15]

On October 15, 2014, PacBio announced the release of the new P6-C4 chemistry for the RS II system, representing the company's sixth generation of polymerase and fourth generation of chemistry. It further extended the average read length to 10,000 - 15,000 bases, with the longest reads exceeding 40,000 bases. Throughput with the new chemistry was expected to be between 500 million and 1 billion bases per SMRT Cell, depending on the sample being sequenced.[16][17] This was the final version of chemistry released for the RS instrument.

Throughput per experiment is influenced both by the read length of the DNA molecules sequenced and by the total number of ZMWs on a SMRT Cell. The prototype SMRT Cell contained about 3,000 ZMW holes that allowed parallelized DNA sequencing. At commercialization, the SMRT Cells were each patterned with 150,000 ZMW holes that were read in two sets of 75,000.[18] In April 2013, the company released a new version of the sequencer called the "PacBio RS II" that uses all 150,000 ZMW holes concurrently, doubling the throughput per experiment.[19][20] The highest-throughput mode in November 2013 used P5 binding, C3 chemistry, BluePippin size selection, and a PacBio RS II. It officially yielded 350 million bases per SMRT Cell, though a human de novo data set released with the chemistry averaged 500 million bases per SMRT Cell. Throughput varies based on the type of sample being sequenced.[21] With the introduction of P6-C4 chemistry, typical throughput per SMRT Cell increased to between 500 million and 1 billion bases.
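Because throughput depends on both read length and the number of ZMWs, the figures quoted for P6-C4 can be used to back out what fraction of ZMWs would need to produce a read. This is an illustrative calculation, not a figure from the cited sources; it assumes yield is approximately the number of productive ZMWs multiplied by the mean read length, and the function name is hypothetical.

```python
def implied_productive_fraction(yield_bases, zmw_count, mean_read_length):
    """Fraction of ZMWs that must yield a read, assuming
    yield = productive ZMWs x mean read length."""
    productive_zmws = yield_bases / mean_read_length
    return productive_zmws / zmw_count


RS_II_ZMWS = 150_000

# Lower and upper ends of the P6-C4 ranges quoted in the text
low = implied_productive_fraction(500e6, RS_II_ZMWS, 10_000)
high = implied_productive_fraction(1e9, RS_II_ZMWS, 15_000)
print(round(low, 2))   # 0.33
print(round(high, 2))  # 0.44
```

On these assumptions, roughly a third to just under half of the ZMWs on an RS II SMRT Cell would be producing reads.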

RS performance by chemistry

Metric | C1 | C2 | P4-XL | P5-C3 | P6-C4
Average read length (bases) | 1100 | 2500 - 2900 | 4300 - 5000 | 8500 | 10,000 - 15,000
Throughput per SMRT Cell (bases) | 30M - 40M | 60M - 100M | 250M - 300M | 350M - 500M | 500M - 1B

Sequel

Figure: SMRT Cell for a Sequel sequencer

In September 2015, the company announced the launch of a new sequencing instrument, the Sequel System, that increased capacity to 1 million ZMW holes.[22][23]

With the Sequel instrument, initial read lengths were comparable to those of the RS; later chemistry releases increased read length.

On January 23, 2017, the V2 chemistry was released, increasing average read lengths to between 10,000 and 18,000 bases.[24]

On March 8, 2018, PacBio released the Sequel 2.1 chemistry, reporting average read lengths of up to 20,000 bases, with half of all reads above 30,000 bases in length. Yield per SMRT Cell increased to 10 billion bases for large-insert libraries and 20 billion bases for shorter-insert (e.g. amplicon) libraries.[25]

In September 2018, the company announced the Sequel 6.0 chemistry, with average read lengths increased to 100,000 bases for shorter-insert libraries and 30,000 bases for longer-insert libraries. Yield per SMRT Cell increased to 50 billion bases for shorter-insert libraries.[5]

Sequel performance by chemistry

Metric | V2 | 2.1 | 6.0
Average read length (bases) | 10,000 - 18,000 | 20,000 - 30,000 | 30,000 - 100,000
Throughput per SMRT Cell (bases) | 5B - 8B | 10B - 20B | 20B - 50B

8M Chip

The company announced plans to release, in early 2019, a new SMRT Cell with eight million ZMWs, increasing the expected throughput per SMRT Cell by a factor of eight.[26]

Application

Single-molecule real-time sequencing may be applicable to a broad range of genomics research.

For de novo genome sequencing, read lengths from single-molecule real-time sequencing are comparable to or greater than those from the Sanger sequencing method based on dideoxynucleotide chain termination. The longer read length allows de novo genome sequencing and easier genome assemblies.[27][28][29] Scientists are also using single-molecule real-time sequencing in hybrid assemblies for de novo genomes, combining short-read sequence data with long-read sequence data.[30][31] In 2012, several peer-reviewed publications demonstrated the automated finishing of bacterial genomes,[32][33] including one paper that updated the Celera Assembler with a pipeline for genome finishing using long SMRT sequencing reads.[34] In 2013, scientists estimated that long-read sequencing could be used to fully assemble and finish the majority of bacterial and archaeal genomes.[35]

The same DNA molecule can be resequenced independently by creating a circular DNA template and using a strand-displacing enzyme that separates the newly synthesized DNA strand from the template.[36] In August 2012, scientists from the Broad Institute published an evaluation of SMRT sequencing for SNP calling.[37]
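Repeated passes over the same molecule can be combined into a consensus in which random errors in individual passes are outvoted. The sketch below uses a simple per-position majority vote and assumes the passes are already aligned and of equal length, with substitutions only; this is a deliberate simplification and not the vendor's consensus algorithm.

```python
from collections import Counter


def majority_consensus(passes):
    """Per-position majority vote over equal-length, pre-aligned passes."""
    if not passes:
        raise ValueError("at least one pass is required")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("all passes must have the same length")
    consensus = []
    for column in zip(*passes):
        # most_common(1) returns the most frequent base in this column
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


passes = [
    "ACGTTAGC",
    "ACGTTAGC",
    "ACCTTAGC",  # substitution at position 2
    "ACGTTGGC",  # substitution at position 5
    "ACGTAAGC",  # substitution at position 4
]
print(majority_consensus(passes))  # ACGTTAGC
```

Each pass in the example carries at most one error, yet the consensus is error-free, which illustrates why accuracy improves with the number of passes.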

The dynamics of polymerase can indicate whether a base is methylated.[38] Scientists demonstrated the use of single-molecule real-time sequencing for detecting methylation and other base modifications.[39][40][41] In 2012 a team of scientists used SMRT sequencing to generate the full methylomes of six bacteria.[42] In November 2012, scientists published a report on genome-wide methylation of an outbreak strain of E. coli.[43]
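One commonly described kinetic measure is the interpulse duration (IPD), the time between successive incorporation pulses; a modified base can slow the polymerase and lengthen the IPD at or near that position. The sketch below compares mean IPDs from a native sample against an unmodified control. The data, the function names, and the ratio threshold of 2.0 are illustrative assumptions, not values from the cited studies.

```python
from statistics import mean


def ipd_ratios(native_ipds, control_ipds):
    """Per-position ratio of mean interpulse duration, native / control."""
    return [mean(n) / mean(c) for n, c in zip(native_ipds, control_ipds)]


def flag_candidates(ratios, threshold=2.0):
    """Indices whose IPD ratio meets or exceeds the (assumed) threshold."""
    return [i for i, r in enumerate(ratios) if r >= threshold]


# Hypothetical IPD observations (seconds) at three template positions
native = [[0.10, 0.12], [0.50, 0.54], [0.09, 0.11]]
control = [[0.10, 0.10], [0.12, 0.14], [0.10, 0.10]]

ratios = ipd_ratios(native, control)
print(flag_candidates(ratios))  # [1]
```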

Long reads make it possible to sequence full gene isoforms, including the 5' and 3' ends. This type of sequencing is useful to capture isoforms and splice variants.[44][45]

References

  1. ^ Levene, M. J. (2003). "Zero-Mode Waveguides for Single-Molecule Analysis at High Concentrations". Science. 299 (5607): 682–686. doi:10.1126/science.1079700. ISSN 0036-8075. PMID 12560545.
  2. ^ Eid, J.; Fehr, A.; Gray, J.; Luong, K.; Lyle, J.; Otto, G.; Peluso, P.; Rank, D.; Baybayan, P.; Bettman, B.; Bibillo, A.; Bjornson, K.; Chaudhuri, B.; Christians, F.; Cicero, R.; Clark, S.; Dalal, R.; deWinter, A.; Dixon, J.; Foquet, M.; Gaertner, A.; Hardenbol, P.; Heiner, C.; Hester, K.; Holden, D.; Kearns, G.; Kong, X.; Kuse, R.; Lacroix, Y.; Lin, S.; Lundquist, P.; Ma, C.; Marks, P.; Maxham, M.; Murphy, D.; Park, I.; Pham, T.; Phillips, M.; Roy, J.; Sebra, R.; Shen, G.; Sorenson, J.; Tomaney, A.; Travers, K.; Trulson, M.; Vieceli, J.; Wegener, J.; Wu, D.; Yang, A.; Zaccarin, D.; Zhao, P.; Zhong, F.; Korlach, J.; Turner, S. (2009). "Real-Time DNA Sequencing from Single Polymerase Molecules". Science. 323 (5910): 133–138. doi:10.1126/science.1162986. ISSN 0036-8075. PMID 19023044.
  3. ^ Korlach, J.; Marks, P. J.; Cicero, R. L.; Gray, J. J.; Murphy, D. L.; Roitman, D. B.; Pham, T. T.; Otto, G. A.; Foquet, M.; Turner, S. W. (2008). "Selective aluminum passivation for targeted immobilization of single DNA polymerase molecules in zero-mode waveguide nanostructures". Proceedings of the National Academy of Sciences. 105 (4): 1176–1181. doi:10.1073/pnas.0710982105. ISSN 0027-8424. PMC 2234111. PMID 18216253.
  4. ^ Foquet, Mathieu; Samiee, Kevan T.; Kong, Xiangxu; Chauduri, Bidhan P.; Lundquist, Paul M.; Turner, Stephen W.; Freudenthal, Jake; Roitman, Daniel B. (2008). "Improved fabrication of zero-mode waveguides for single-molecule detection". Journal of Applied Physics. 103 (3): 034301. doi:10.1063/1.2831366. ISSN 0021-8979.
  5. ^ a b "PacBio on Twitter". 2018-09-19.
  6. ^ "PacBio Ships First Two Commercial Systems; Order Backlog Grows to 44". GenomeWeb.
  7. ^ "PacBio Reveals Beta System Specs for RS; Says Commercial Release is on Track for First Half of 2011". GenomeWeb.
  8. ^ "After a Year of Testing, Two Early PacBio Customers Expect More Routine Use of RS Sequencer in 2012". GenomeWeb.
  9. ^ "PacBio's XL Chemistry Increases Read Lengths and Throughput; CSHL Tests the Tech on Rice Genome". GenomeWeb.
  10. ^ "PacBio Users Report Progress in Long Reads for Plant Genome Assembly, Tricky Regions of Human Genome". GenomeWeb.
  11. ^ "PacBio Blog". pacificbiosciences.com.
  12. ^ http://blog.pacificbiosciences.com/2013/08/new-dna-polymerase-p4-delivers-higher.html
  13. ^ "Longing for the longest reads: PacBio and BluePippin". In between lines of code. 2013-06-19.
  14. ^ http://blog.pacificbiosciences.com/2013/10/new-chemistry-for-pacbio-rs-ii-provides.html
  15. ^ Chaisson, Mark J. P.; Huddleston, John; Dennis, Megan Y.; Sudmant, Peter H.; Malig, Maika; Hormozdiari, Fereydoun; Antonacci, Francesca; Surti, Urvashi; Sandstrom, Richard; Boitano, Matthew; Landolin, Jane M.; Stamatoyannopoulos, John A.; Hunkapiller, Michael W.; Korlach, Jonas; Eichler, Evan E. (10 November 2014). "Resolving the complexity of the human genome using single-molecule sequencing". Nature. 517 (7536): 608–611. doi:10.1038/nature13907. PMC 4317254. PMID 25383537.
  16. ^ http://investor.pacificbiosciences.com/releasedetail.cfm?ReleaseID=876252
  17. ^ "New Chemistry Boosts Average Read Length to 10 kb – 15 kb for PacBio RS II - PacBio". 15 October 2014.
  18. ^ "Pacific Biosciences". pacificbiosciences.com.
  19. ^ "PacBio Launches PacBio RS II Sequencer". Next Gen Seek. 2013-04-11.
  20. ^ "New Products: PacBio's RS II; Cufflinks". GenomeWeb.
  21. ^ "Duke Sequencing on Twitter". Twitter. 2013-08-30.
  22. ^ "Bio-IT World".
  23. ^ "PacBio Launches Higher-Throughput, Lower-Cost Single-Molecule Sequencing System".
  24. ^ "New Chemistry and Software for Sequel System Improve Read Length, Lower Project Costs - PacBio". 9 January 2017.
  25. ^ "New Software, Polymerase for Sequel System Boost Throughput and Affordability - PacBio". 7 March 2018.
  26. ^ http://investor.pacificbiosciences.com/static-files/e53d5ef9-02cd-42ab-9d86-3037ad9deaec
  27. ^ Eid, J.; et al. (2009). "Real-Time DNA Sequencing from Single Polymerase Molecules". Science. 323 (5910): 133–138. doi:10.1126/science.1162986. PMID 19023044.
  28. ^ Rasko, DA; Webster, DR; Sahl, JW; Bashir, A; Boisen, N; Scheutz, F; Paxinos, EE; Sebra, R; Chin, CS; Iliopoulos, D; Klammer, A; Peluso, P; Lee, L; Kislyuk, AO; Bullard, J; Kasarskis, A; Wang, S; Eid, J; Rank, D; Redman, JC; Steyert, SR; Frimodt-Møller, J; Struve, C; Petersen, AM; Krogfelt, KA; Nataro, JP; Schadt, EE; Waldor, MK (Aug 2011). "Origins of the E. coli Strain Causing an Outbreak of Hemolytic–Uremic Syndrome in Germany". New England Journal of Medicine. 365 (8): 709–717. doi:10.1056/NEJMoa1106920. PMC 3168948. PMID 21793740.
  29. ^ Chin, CS; Sorenson, J; Harris, JB; Robins, WP; Charles, RC; Jean-Charles, RR; Bullard, J; Webster, DR; Kasarskis, A; Peluso, P; Paxinos, EE; Yamaichi, Y; Calderwood, SB; Mekalanos, JJ; Schadt, EE; Waldor, MK (Jan 2011). "The Origin of the Haitian Cholera Outbreak Strain". New England Journal of Medicine. 364 (1): 33–42. doi:10.1056/NEJMoa1012928. PMC 3030187. PMID 21142692.
  30. ^ "Tech Tips". GEN. 2012-04-16.
  31. ^ http://schatzlab.cshl.edu/presentations/2011-09-07.PacBio%20Users%20Meeting.pdf
  32. ^ http://genome.cshlp.org/content/early/2012/07/24/gr.141515.112.full.pdf
  33. ^ Schadt, Eric E.; Waldor, Matthew K.; Mekalanos, John J.; Kasarskis, Andrew; Davis, Brigid M.; Turnsek, Maryann; Tarr, Cheryl L.; Frace, Michael; Rowe, Lori; Joshi, Amruta; Lamay, Brianna; Lin, Steven; Luong, Khai; Mollova, Emilia; Valdovino, Marie; Yen, Jackie; Bullard, James; Sorenson, Jon; Sebra, Robert; Peluso, Paul; Wang, Susana; Ashby, Meredith; Hsu, David; Paxinos, Ellen; Webster, Dale; Chin, Chen-Shan; Robins, William P.; Klammer, Aaron A.; Bashir, Ali (July 2012). "A hybrid approach for the automated finishing of bacterial genomes". Nature Biotechnology. 30 (7): 701–707. doi:10.1038/nbt.2288. PMC 3731737. PMID 22750883.
  34. ^ Koren, S; Schatz, MC; Walenz, BP; et al. (July 2012). "Hybrid error correction and de novo assembly of single-molecule sequencing reads". Nature Biotechnology. 30 (7): 693–700. doi:10.1038/nbt.2280. PMC 3707490. PMID 22750884.
  35. ^ Koren, S; Harhay, GP; Smith, TP; Bono, JL; Harhay, DM; Mcvey, SD; Radune, D; Bergman, NH; Phillippy, AM (2013-09-13). "Reducing assembly complexity of microbial genomes with single-molecule sequencing". Genome Biology. 14 (9): R101. doi:10.1186/gb-2013-14-9-r101. PMC 4053942. PMID 24034426.
  36. ^ Shah, Neil P.; Kuriyan, John; Kasarskis, Andrew; Schadt, Eric E.; Zarrinkar, Patrick P.; Hunt, Jeremy P.; Wang, Susana; Travers, Kevin J.; Perl, Alexander E.; Levis, Mark J.; Damon, Lauren E.; Salerno, Sara; Chin, Chen-Shan; Wang, Qi; Smith, Catherine C. (May 2012). "Validation of ITD mutations in FLT3 as a therapeutic target in human acute myeloid leukaemia". Nature. 485 (7397): 260–263. doi:10.1038/nature11016. PMC 3390926. PMID 22504184.
  37. ^ Carneiro, MO; Russ, C; Ross, MG; Gabriel, SB; Nusbaum, C; DePristo, MA (2012-08-05). "Pacific biosciences sequencing technology for genotyping and variation discovery in human data". BMC Genomics. 13 (1): 375. doi:10.1186/1471-2164-13-375. PMC 3443046. PMID 22863213.
  38. ^ Turner, Stephen W.; Korlach, Jonas; Clark, Tyson A.; Olivares, Eric C.; Travers, Kevin J.; Lee, Jessica H.; Webster, Dale R.; Flusberg, Benjamin A. (June 2010). "Direct detection of DNA methylation during single-molecule, real-time sequencing". Nature Methods. 7 (6): 461–465. doi:10.1038/nmeth.1459. PMC 2879396. PMID 20453866.
  39. ^ Clark, Tyson A.; et al. (2012). "Characterization of DNA methyltransferase specificities using single-molecule, real-time DNA sequencing". Nucleic Acids Research. 40 (4): e29. doi:10.1093/nar/gkr1146. PMC 3287169. PMID 22156058.
  40. ^ Song, CX; Clark, TA; Lu, XY; Kislyuk, A; Dai, Q; Turner, SW; He, C; Korlach, J (2011). "Sensitive and specific single-molecule sequencing of 5-hydroxymethylcytosine". Nature Methods. 9 (1): 75–77. doi:10.1038/nmeth.1779. PMC 3646335. PMID 22101853.
  41. ^ Clark, Tyson A.; Spittle, Kristi E.; Turner, Stephen W.; Korlach, Jonas (2011). "Direct Detection and Sequencing of Damaged DNA Bases". Genome Integrity. 2: 10. doi:10.1186/2041-9414-2-10. PMC 3264494. PMID 22185597.
  42. ^ Murray, Iain A.; et al. (2012). "The methylomes of six bacteria". Nucleic Acids Research. 40 (22): 11450–11462. doi:10.1093/nar/gks891. PMC 3526280. PMID 23034806.
  43. ^ Fang, G; Munera, D; Friedman, DI; Mandlik, A; Chao, MC; Banerjee, O; Feng, Z; Losic, B; Mahajan, MC; Jabado, OJ; Deikus, G; Clark, TA; Luong, K; Murray, IA; Davis, BM; Keren-Paz, A; Chess, A; Roberts, RJ; Korlach, J; Turner, SW; Kumar, V; Waldor, MK; Schadt, EE (December 2012). "Genome-wide mapping of methylated adenine residues in pathogenic Escherichia coli using single-molecule real-time sequencing". Nature Biotechnology. 30 (12): 1232–1239. doi:10.1038/nbt.2432. PMC 3879109. PMID 23138224.
  44. ^ Snyder, Michael; Grubert, Fabian; Tilgner, Hagen; Sharon, Donald (November 2013). "A single-molecule long-read survey of the human transcriptome". Nature Biotechnology. 31 (11): 1009–1014. doi:10.1038/nbt.2705. PMC 4075632. PMID 24108091.
  45. ^ http://www.pnas.org/content/early/2013/11/25/1320101110.full.pdf (PDF)
