Read (biology)

From Wikipedia, the free encyclopedia

In DNA sequencing, a read is an inferred sequence of base pairs (or base pair probabilities) corresponding to all or part of a single DNA fragment. A typical sequencing experiment involves fragmentation of the genome into millions of molecules, which are size-selected and ligated to adapters. The set of fragments is referred to as a sequencing library, which is sequenced to produce a set of reads.[1]
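As an illustration of a read carrying both bases and per-base probabilities, the sketch below parses one record in the FASTQ text format. The article does not name a file format, so the use of FASTQ and of Phred+33 quality encoding is an assumption here, and `parse_fastq_record` is a hypothetical helper rather than part of any library.

```python
from typing import List, Tuple


def parse_fastq_record(lines: List[str]) -> Tuple[str, str, List[float]]:
    """Parse one four-line FASTQ record into (read_id, bases, error_probs).

    Assumes Phred+33 encoding: each quality character encodes
    Q = ord(char) - 33, and the error probability is 10 ** (-Q / 10).
    """
    header, bases, separator, quality = (line.strip() for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(bases) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    # One error probability per called base.
    error_probs = [10 ** (-(ord(ch) - 33) / 10) for ch in quality]
    return header[1:], bases, error_probs


# A made-up record for illustration only (not real sequencing data).
record = ["@read_1", "GATTACA", "+", "IIII5+!"]
read_id, bases, error_probs = parse_fastq_record(record)
for base, prob in zip(bases, error_probs):
    print(f"{base}\t{prob:.4f}")
```

Each base call is paired with an estimated probability that the call is wrong, which is one common way the "base pair probabilities" of a read are represented in practice.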

Read length

Sequencing technologies vary in the length of the reads they produce. Reads of 20–40 base pairs (bp) are referred to as ultra-short.[2] Typical sequencers produce reads in the range of 100–500 bp.[3] Pacific Biosciences platforms, however, produce reads of approximately 1,500 bp.[4] Read length can affect the results of biological studies.[5] For example, longer reads improve the resolution of de novo genome assembly and the detection of structural variants. It is estimated that read lengths greater than 100 kilobases (kb) will be required for routine de novo human genome assembly.[6] Bioinformatic pipelines for analyzing sequencing data usually take read length into account.[7]
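A minimal sketch of how a pipeline might inspect read lengths before choosing downstream settings. The 20–40 bp and 100–500 bp bins come from the figures quoted above; the "other" label, the function names, and the example reads are assumptions made for illustration only.

```python
from collections import Counter
from statistics import mean
from typing import Dict, Iterable


def length_class(length: int) -> str:
    """Bin a read length (in bp) using the ranges quoted in the text."""
    if 20 <= length <= 40:
        return "ultra-short"
    if 100 <= length <= 500:
        return "typical"
    # Hypothetical catch-all for lengths outside the quoted ranges.
    return "other"


def summarize_read_lengths(reads: Iterable[str]) -> Dict[str, object]:
    """Return count, min, max, mean length, and per-class tallies."""
    lengths = [len(read) for read in reads]
    if not lengths:
        raise ValueError("no reads supplied")
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
        "classes": dict(Counter(length_class(n) for n in lengths)),
    }


# Synthetic reads of 32 bp, 200 bp, and 1500 bp (not real data).
example_reads = ["ACGT" * 8, "ACGT" * 50, "ACGT" * 375]
summary = summarize_read_lengths(example_reads)
print(summary)
```

A summary like this could be used, for instance, to decide which aligner or assembler parameters suit a given library, although the appropriate choices depend on the tools involved.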


  1. ^ "Sequencing library: what is it?". Breda Genetics. Retrieved 23 July 2017.
  2. ^ Chaisson, Mark J. (2009). "De novo fragment assembly with short mate-paired reads: Does the read length matter?". Genome Research. 19 (2): 336–346. doi:10.1101/gr.079053.108. Retrieved 23 July 2017.
  3. ^ Junemann, Sebastian (2013). "Updating benchtop sequencing performance comparison". Nature Biotechnology. 31 (4): 294–296. doi:10.1038/nbt.2522. Retrieved 23 July 2017.
  4. ^ Quail, Michael A. (2012). "A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers". BMC Genomics. 13 (1): 341. doi:10.1186/1471-2164-13-341. Retrieved 23 July 2017.
  5. ^ Chhangawala, Sagar; Rudy, Gabe; Mason, Christopher E.; Rosenfeld, Jeffrey A. (23 June 2015). "The impact of read length on quantification of differentially expressed genes and splice junction detection". Genome Biology. 16 (1). doi:10.1186/s13059-015-0697-y.
  6. ^ Chaisson, Mark J.P. (2015). "Genetic variation and the de novo assembly of human genomes". Nature Reviews Genetics. 16 (11): 627. doi:10.1038/nrg3933. PMC 4745987. Retrieved 23 July 2017.
  7. ^ Conesa, Ana; Madrigal, Pedro; Tarazona, Sonia; Gomez-Cabrero, David; Cervera, Alejandra; McPherson, Andrew; Szcześniak, Michał Wojciech; Gaffney, Daniel J.; Elo, Laura L.; Zhang, Xuegong; Mortazavi, Ali (26 January 2016). "A survey of best practices for RNA-seq data analysis". Genome Biology. 17 (1). doi:10.1186/s13059-016-0881-8.