From Wikipedia, the free encyclopedia

RNA-Seq (RNA sequencing), also called whole transcriptome shotgun sequencing[1] (WTSS), uses next-generation sequencing (NGS) to reveal the presence and quantity of RNA in a biological sample at a given moment in time.[2][3]

RNA-Seq is used to analyze the continually changing cellular transcriptome. Specifically, RNA-Seq makes it possible to examine alternatively spliced transcripts, post-transcriptional modifications, gene fusions, mutations/SNPs and changes in gene expression.[4] In addition to mRNA transcripts, RNA-Seq can examine other populations of RNA, including total RNA and small RNA such as miRNA and tRNA, and can be applied to ribosome profiling.[5] RNA-Seq can also be used to determine exon/intron boundaries and verify or amend previously annotated 5' and 3' gene boundaries.

Prior to RNA-Seq, gene expression studies were done with hybridization-based microarrays. Issues with microarrays include cross-hybridization artifacts, poor quantification of lowly and highly expressed genes, and needing to know the sequence a priori.[6] Because of these technical issues, transcriptomics transitioned to sequencing-based methods. These progressed from Sanger sequencing of Expressed Sequence Tag libraries, to chemical tag-based methods (e.g., serial analysis of gene expression), and finally to the current technology, NGS of cDNA (notably RNA-Seq).



RNA 'Poly(A)' library

See also: Polyadenylation

Library preparation in high-throughput sequencing varies from platform to platform,[7] and each platform has several kits designed to build different types of libraries and to adapt the resulting sequences to the specific requirements of its instruments. However, due to the nature of the template being analyzed, there are commonalities within each technology. Frequently, in mRNA analysis the 3' polyadenylated (poly(A)) tail is targeted in order to ensure that coding RNA is separated from noncoding RNA. This can be accomplished simply with poly(T) oligos covalently attached to a given substrate. Presently many studies utilize magnetic beads for this step.[1][8][9]

Studies including portions of the transcriptome outside poly(A) RNAs have shown that when using poly(T) magnetic beads, the flow-through RNA (non-poly(A) RNA) can yield important noncoding RNA gene discovery which would have otherwise gone unnoticed.[1] Also, since ribosomal RNA represents over 90% of the RNA within a given cell, studies have shown that its removal via probe hybridization increases the capacity to retrieve data from the remaining portion of the transcriptome.

The next step is reverse transcription. Because randomly primed reverse transcription has a 5' bias and secondary structures influence primer binding sites,[8] hydrolysis of RNA into fragments of 200-300 nucleotides prior to reverse transcription reduces both problems simultaneously. However, this method has a trade-off: although the body of each transcript is efficiently converted to DNA, the 5' and 3' ends are less so. Depending on the aim of the study, researchers may choose to apply or ignore this step.

Once the cDNA is synthesized it can be further fragmented to reach the desired fragment length of the sequencing system.

Small RNA/non-coding RNA sequencing

When sequencing RNA other than mRNA, the library preparation is modified. The cellular RNA is selected based on the desired size range. For small RNA targets, such as miRNA, the RNA is isolated through size selection. This can be performed with a size exclusion gel, through size selection magnetic beads, or with a commercially developed kit. Once the RNA is isolated, linkers are added to the 3' and 5' ends and the product is purified. The final step is cDNA generation through reverse transcription.

Direct RNA sequencing

Because converting RNA into cDNA using reverse transcriptase has been shown to introduce biases and artifacts that may interfere with both the proper characterization and quantification of transcripts,[10] single-molecule Direct RNA Sequencing (DRS) technology was under development by Helicos (now bankrupt). DRS sequences RNA molecules directly in a massively parallel manner without RNA conversion to cDNA or other biasing sample manipulations such as ligation and amplification.

Experimental considerations

A variety of parameters are considered when designing and conducting RNA-Seq experiments:

  • Tissue specificity: Gene expression varies within and between tissues, and RNA-Seq measures this mix of cell types. This may make it difficult to isolate the biological mechanism of interest. Single cell sequencing can be used to study each cell individually, mitigating this issue.
  • Time dependence: Gene expression changes over time, and RNA-Seq only takes a snapshot. Time course experiments can be performed to observe changes in the transcriptome.
  • Coverage (also known as depth): RNA harbors the same mutations observed in DNA, and detection requires deeper coverage. With high enough coverage, RNA-Seq can be used to estimate the expression of each allele. This may provide insight into phenomena such as imprinting or cis-regulatory effects. The depth of sequencing required for specific applications can be extrapolated from a pilot experiment.[11]
  • Data generation artifacts (also known as technical variance): The reagents (e.g., library preparation kit), personnel involved, and type of sequencer (e.g., Illumina, Pacific Biosciences) can result in technical artifacts that might be misinterpreted as meaningful results. As with any scientific experiment, it is prudent to conduct RNA-Seq in a well-controlled setting. If this is not possible or the study is a meta-analysis, another solution is to detect technical artifacts by inferring latent variables (typically with principal component analysis or factor analysis) and subsequently correcting for these variables.[12]
  • Data management: A single RNA-Seq experiment in humans is usually on the order of 1 Gb.[13] This large volume of data can pose storage issues. One solution is compressing the data using general-purpose compression schemes (e.g., gzip) or genomics-specific schemes. The latter can be based on reference sequences or de novo. Another solution is to perform microarray experiments, which may be sufficient for hypothesis-driven work or replication studies (as opposed to exploratory research).


Diagram outlining the RNASeq analyses described in this section

Transcriptome assembly

Two methods are used to assign raw sequence reads to genomic features (i.e., assemble the transcriptome):

  • De novo: This approach does not require a reference genome to reconstruct the transcriptome, and is typically used if the genome is unknown, incomplete, or substantially altered compared to the reference.[14] Challenges when using short reads for de novo assembly include 1) determining which reads should be joined together into contiguous sequences (contigs), 2) robustness to sequencing errors and other artifacts, and 3) computational efficiency. The primary algorithm used for de novo assembly transitioned from overlap graphs, which identify all pair-wise overlaps between reads, to de Bruijn graphs, which break reads into sequences of length k and collapse all k-mers into a hash table.[15] Overlap graphs were used with Sanger sequencing, but do not scale well to the millions of reads generated with RNA-Seq. Examples of assemblers that use de Bruijn graphs are Velvet,[16] Trinity,[17] Oases,[18] and Bridger.[19] Paired end and long read sequencing of the same sample can mitigate the deficits in short read sequencing by serving as a template or skeleton. Metrics to assess the quality of a de novo assembly include median contig length, number of contigs and N50.[20]
RNA-seq mapping of short reads in exon-exon junctions. The final mRNA is sequenced, which is missing the intronic sections of the pre-mRNA.
  • Genome guided: This approach relies on the same methods used for DNA alignment, with the additional complexity of aligning reads that cover non-continuous portions of the reference genome.[21] These non-continuous reads are the result of sequencing spliced transcripts (see figure). Typically, alignment algorithms have two steps: 1) align short portions of the read (i.e., seed the genome), and 2) use dynamic programming methods to find an optimal alignment, sometimes in combination with known annotations. Software tools that use genome-guided alignment include Bowtie,[22] TopHat (which builds on Bowtie results to align splice junctions),[23][24] Subread,[25] STAR,[21] Sailfish,[26] and Kallisto.[27] The quality of a genome-guided assembly can be measured with both de novo assembly metrics (e.g., N50) and comparisons to known transcript, splice junction, genome, and protein sequences using precision, recall, or their combination (e.g., F1 score).[28]
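
The de Bruijn graph construction mentioned for de novo assembly can be illustrated with a minimal sketch. It breaks reads into k-mers, links each (k-1)-mer prefix to its (k-1)-mer suffix, and extends a contig while the path is unambiguous. The function names and the toy reads are hypothetical, and real assemblers such as Velvet or Trinity add error correction, coverage handling and branch resolution that are omitted here.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles instead of looping forever
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Three overlapping toy reads (hypothetical data).
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous(graph, "ATG")
print(contig)
```

Because identical k-mers from different reads collapse onto the same edge, the graph size depends on the number of distinct k-mers rather than on the number of reads, which is why this representation scales better than pairwise overlap graphs.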

A note on assembly quality: The current consensus is that 1) assembly quality can vary depending on which metric is used, 2) assemblies that score well in one species do not necessarily perform well in other species, and 3) combining different approaches might be the most reliable.[29][30]
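
Two of the quality metrics named in this section, N50 and the F1 score, can be computed as in the sketch below. This is an illustration under simple assumptions: contigs are represented only by their lengths, and predicted and reference features (for example splice junctions) are compared as exact matches. The function names are hypothetical.

```python
def n50(contig_lengths):
    """Largest length L such that contigs of length >= L cover half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def f1_score(predicted, reference):
    """Harmonic mean of precision and recall for exact-match features."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


print(n50([2, 3, 4, 5, 6]))
print(f1_score({"j1", "j2", "j3"}, {"j2", "j3", "j4", "j5"}))
```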

Gene expression

The characterization of gene expression in cells via measurement of mRNA levels has long been of interest to researchers, both in terms of which genes are expressed in what tissues, and at what levels. Although post-transcriptional gene regulation events (such as RNA interference) mean that the abundance of an mRNA does not always correlate strongly with that of the related protein,[31] measuring mRNA concentration levels is still a useful tool in determining how the transcriptional machinery of the cell is affected in the presence of external signals (e.g., drug treatment), or how cells differ between a healthy state and a diseased state.

Expression levels can be inferred from RNA-seq according to how often a sequence is retrieved. Transcriptome studies in yeast[32] show that in this experimental setting, fourfold coverage is required for amplicons to be classified and characterized as an expressed gene. When the transcriptome is fragmented prior to cDNA synthesis, the number of reads corresponding to a particular exon, normalized by its length in vivo, yields gene expression levels which correlate with those obtained through qPCR.[11] This is frequently further normalized by the total number of mapped reads so that expression levels are expressed as Fragments Per Kilobase of transcript per Million mapped reads (FPKM).[33]
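
The two normalizations described above (by transcript length and by the number of mapped reads) combine into the FPKM value. The sketch below shows the arithmetic; the function name and the example numbers are hypothetical.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    kilobases = transcript_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions_mapped)


# 500 fragments on a 2,000 bp transcript in a library of 10 million mapped fragments.
value = fpkm(500, 2_000, 10_000_000)
print(value)
```

With these numbers, 500 fragments over 2 kb in a library of 10 million mapped fragments gives 500 / (2 × 10) = 25 FPKM.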

The only way to be absolutely sure of the individual's mutations is to compare the transcriptome sequences to the germline DNA sequence. This enables the distinction of homozygous genes versus skewed expression of one of the alleles and it can also provide information about genes that were not expressed in the transcriptomic experiment. An R-based statistical package known as CummeRbund[34] can be used to generate expression comparison charts for visual analysis.

Differential expression and absolute quantification of transcripts

RNA-Seq is generally used to compare gene expression between conditions, such as drug-treated vs. untreated samples, and to find out which genes are up- or downregulated in each condition. In principle, RNA-Seq makes it possible to account for all the transcripts in the cell for each condition. Differentially expressed genes can be identified by using tools that count the sequencing reads per gene and compare them between samples. The most commonly used tools for this type of analysis are DESeq[35] and edgeR,[36] packages from Bioconductor.[37][38] Both of these tools use a model based on the negative binomial distribution.[35][36]
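
The count-and-compare idea can be sketched as follows. This is a deliberately simplified illustration, not the negative binomial model used by DESeq or edgeR: it scales counts to counts per million and reports a log2 fold change with a pseudocount, without replicates or any statistical test. The gene names, counts and function names are hypothetical.

```python
import math


def counts_per_million(counts):
    """Scale raw per-gene read counts by library size."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_changes(treated, control, pseudocount=1.0):
    """log2(treated / control) on library-size-normalized counts."""
    cpm_t = counts_per_million(treated)
    cpm_c = counts_per_million(control)
    return {
        gene: math.log2((cpm_t[gene] + pseudocount) / (cpm_c[gene] + pseudocount))
        for gene in control
    }


control = {"geneA": 100, "geneB": 300, "geneC": 600}
treated = {"geneA": 400, "geneB": 300, "geneC": 1300}
lfc = log2_fold_changes(treated, control)
for gene, value in sorted(lfc.items()):
    print(gene, round(value, 2))
```

In this toy data geneB has the same raw count in both samples, yet its normalized share halves because the treated library is twice as large, which shows that such comparisons are relative to the total pool of sequenced transcripts.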

It is not possible to do absolute quantification using the common RNA-Seq pipeline, because it only provides RNA levels relative to all transcripts. If the total amount of RNA in the cell changes between conditions, relative normalization will misrepresent the changes for individual transcripts. Absolute quantification of mRNAs is possible by performing RNA-Seq with added spike-ins, samples of RNA at known concentrations. After sequencing, the read counts of the spike-in sequences are used to determine the direct correspondence between read count and biological fragments.[39][40] In developmental studies, this technique has been used in Xenopus tropicalis embryos at a high temporal resolution, to determine transcription kinetics.[41]
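
A minimal sketch of spike-in calibration follows, assuming that read count is directly proportional to the number of input molecules. It fits a single reads-per-molecule slope through the origin by least squares and inverts it for an endogenous transcript. The numbers and function names are hypothetical, and published protocols may use more elaborate models.

```python
def fit_reads_per_molecule(spike_molecules, spike_reads):
    """Least-squares slope through the origin: reads = slope * molecules."""
    numerator = sum(m * r for m, r in zip(spike_molecules, spike_reads))
    denominator = sum(m * m for m in spike_molecules)
    return numerator / denominator


def estimate_molecules(read_count, slope):
    """Convert a transcript's read count into an absolute molecule estimate."""
    return read_count / slope


# Known spike-in amounts and the reads they produced (hypothetical values).
spike_molecules = [1_000, 5_000, 20_000]
spike_reads = [20, 100, 400]

slope = fit_reads_per_molecule(spike_molecules, spike_reads)
estimated = estimate_molecules(250, slope)
print(slope, estimated)
```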

Single nucleotide variation discovery

Transcriptome single nucleotide variation has been analyzed in maize on the Roche 454 sequencing platform.[42] Directly from the transcriptome analysis, around 7000 single nucleotide polymorphisms (SNPs) were recognized. Following Sanger sequence validation, the researchers were able to conservatively obtain almost 5000 valid SNPs covering more than 2400 maize genes. RNA-seq is limited to transcribed regions, however, since it will only discover sequence variations in exon regions. This misses many subtle but important intron alleles that affect disease, such as transcription regulators, restricting analysis to large effectors. While some correlation exists between exon and intron variation, only whole genome sequencing would be able to capture the source of all relevant SNPs.[43]

Post-transcriptional SNVs

Having the matching genomic and transcriptomic sequences of an individual can also help in detecting post-transcriptional edits:[7] if the individual is homozygous for a gene but the gene's transcript carries a different allele, a post-transcriptional modification event can be inferred.
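
The comparison described above can be sketched as a simple filter over matched genotype and transcript calls. The data layout (position keys, genotype tuples, sets of alleles observed in RNA) and the function name are assumptions made for illustration, and a real analysis would also need to account for sequencing and alignment errors.

```python
def candidate_rna_edits(genomic_genotypes, rna_alleles):
    """Report positions where the DNA is homozygous but RNA shows another allele."""
    candidates = {}
    for position, genotype in genomic_genotypes.items():
        if len(set(genotype)) != 1:
            continue  # heterozygous in DNA: a second allele in RNA is expected
        dna_allele = genotype[0]
        unexpected = set(rna_alleles.get(position, ())) - {dna_allele}
        if unexpected:
            candidates[position] = (dna_allele, sorted(unexpected))
    return candidates


# Hypothetical matched calls for three positions.
genomic = {"chr1:100": ("A", "A"), "chr1:200": ("C", "T"), "chr1:300": ("G", "G")}
rna = {"chr1:100": {"A", "G"}, "chr1:200": {"C", "T"}, "chr1:300": {"G"}}

edits = candidate_rna_edits(genomic, rna)
print(edits)
```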

mRNA-centric single nucleotide variants (SNVs) are generally not considered a representative source of functional variation in cells, mainly because these mutations disappear with the mRNA molecule. However, the fact that efficient DNA correction mechanisms do not apply to RNA molecules can cause them to appear more often. This has been proposed as the source of certain prion diseases,[44] also known as transmissible spongiform encephalopathies (TSE).

RNA-seq mapping of short reads over exon-exon junctions. Depending on where each end maps, a read can be classified as a trans or a cis event.

Fusion gene detection

See also: Fusion gene

Caused by different structural modifications in the genome, fusion genes have gained attention because of their relationship with cancer.[45] The ability of RNA-seq to analyze a sample's whole transcriptome in an unbiased fashion makes it an attractive tool to find these kinds of common events in cancer.[46]

The idea follows from the process of aligning the short transcriptomic reads to a reference genome. Most of the short reads will fall within one complete exon, and a smaller but still large set would be expected to map to known exon-exon junctions. The remaining unmapped short reads would then be further analyzed to determine whether they match an exon-exon junction where the exons come from different genes. This would be evidence of a possible fusion event; however, because of the short length of the reads, this could prove to be very noisy. An alternative approach is to use paired-end reads, where a potentially large number of paired reads would map each end to a different exon, giving better coverage of these events (see figure). Nonetheless, the end result consists of multiple and potentially novel combinations of genes, providing an ideal starting point for further validation.
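
The paired-end strategy can be sketched as counting read pairs whose two mates map to different genes. The input format (one tuple of gene names per read pair, with None for an unmapped mate), the support threshold and the gene names are hypothetical, and dedicated fusion callers apply many additional filters.

```python
from collections import Counter


def fusion_candidates(read_pairs, min_support=2):
    """Count read pairs whose mates map to two different genes."""
    support = Counter()
    for gene1, gene2 in read_pairs:
        if gene1 is None or gene2 is None or gene1 == gene2:
            continue  # unmapped mate, or both mates in the same gene
        support[tuple(sorted((gene1, gene2)))] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}


pairs = [
    ("GENE_A", "GENE_B"),
    ("GENE_B", "GENE_A"),
    ("GENE_A", "GENE_A"),
    ("GENE_C", "GENE_D"),
    ("GENE_B", None),
]
candidates = fusion_candidates(pairs)
print(candidates)
```

Requiring more than one supporting pair is one simple way to reduce the noise mentioned above, at the cost of missing weakly expressed fusions.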

Coexpression networks

Coexpression networks are data-derived representations of genes behaving in a similar way across tissues and experimental conditions.[47] Their main purpose lies in hypothesis generation and guilt-by-association approaches for inferring functions of previously unknown genes.[47] RNA-Seq data have recently been used to infer genes involved in specific pathways based on Pearson correlation, both in plants[48] and mammals.[49] The main advantage of RNA-Seq data over microarray platforms in this kind of analysis is the capability to cover the entire transcriptome, allowing more complete representations of gene regulatory networks to be unraveled. Differential regulation of the splice isoforms of the same gene can be detected and used to predict their biological functions.[50][51] Weighted gene co-expression network analysis has been successfully used to identify co-expression modules and intramodular hub genes based on RNA-Seq data. Co-expression modules may correspond to cell types or pathways. Highly connected intramodular hubs can be interpreted as representatives of their respective module. Variance-stabilizing transformation approaches for estimating correlation coefficients based on RNA-Seq data have been proposed.[48]
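
A Pearson-correlation coexpression network can be sketched as follows. Genes become nodes, and an edge is drawn when the absolute correlation of two expression profiles exceeds a threshold. The hard threshold of 0.9, the toy expression values and the function names are assumptions for illustration. This is not the weighted approach mentioned above, which uses soft thresholding instead.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_edges(expression, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Expression of three genes across four samples (hypothetical values).
expression = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [5, 1, 4, 2],
}
edges = coexpression_edges(expression)
print(edges)
```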

Application to genomic medicine


The past five years have seen a flourishing of NGS-based methods for genome analysis, leading to the discovery of a number of new mutations and fusion transcripts in cancer. RNA-Seq data could help researchers interpret the "personalized transcriptome", clarifying the transcriptomic changes taking place and, ideally, identifying gene drivers for a disease. The feasibility of this approach is, however, dictated by its costs in terms of money and time.

A basic search on PubMed for the term RNA-Seq, queried as "RNA Seq" OR "RNA-Seq" OR "RNA sequencing" OR "RNASeq" in order to capture the most common ways of phrasing it, gives 5,425 hits, indicating how widely this technology is used. The examples below illustrate that clinical applications of RNA-Seq have the potential to significantly affect patients' lives but, on the other hand, require a team of specialists (bioinformaticians, physicians/clinicians, basic researchers, technicians) to fully interpret the huge amount of data generated by this analysis.

As an example of clinical applications, researchers at the Mayo Clinic used an RNA-Seq approach to identify differentially expressed transcripts between oral cancer and normal tissue samples. They also accurately evaluated the allelic imbalance (AI), the ratio of the transcripts produced by the individual alleles, within a subgroup of genes involved in cell differentiation, adhesion, cell motility and muscle contraction,[52] identifying a unique transcriptomic and genomic signature in oral cancer patients. Novel insights into skin cancer (melanoma) have also come from RNA-Seq of melanoma patients. This approach led to the identification of eleven novel gene fusion transcripts originating from previously unknown chromosomal rearrangements. Twelve novel chimeric transcripts were also reported, including seven that confirmed previously identified data in multiple melanoma samples.[53] Furthermore, this approach is not limited to cancer patients. RNA-Seq has been used to study other important chronic diseases such as Alzheimer's disease (AD) and diabetes. In the former case, Twine and colleagues compared the transcriptome of different brain lobes of deceased AD patients with the brains of healthy individuals, identifying a lower number of splice variants in AD patients and differential promoter usage of the APOE-001 and -002 isoforms in AD brains.[54] In the latter case, different groups showed the uniqueness of the beta-cell transcriptome in diabetic patients in terms of transcript accumulation and differential promoter usage[55] and long non-coding RNA (lncRNA) signature.[56]

Compared with microarrays, NGS technology has identified novel and low-frequency RNAs associated with disease processes. This advantage aids in the diagnosis and possible future treatment of diseases, including cancer. For example, NGS technology identified several previously undocumented differentially expressed transcripts in rats treated with AFB1, a potent hepatocarcinogen. Nearly 50 new differentially expressed transcripts were identified between the controls and AFB1-treated rats. Additionally, potential new exons were identified, including some that are responsive to AFB1. The next-generation sequencing pipeline identified more differentially expressed genes than microarrays, particularly when DESeq software was utilized. Cufflinks identified two novel transcripts that were not previously annotated in the Ensembl database; these transcripts were confirmed using cloning PCR.[57] Numerous other studies have demonstrated NGS's ability to detect aberrant mRNA and small non-coding RNA expression in disease processes beyond that provided by microarrays. The lower cost and higher throughput offered by NGS confer another advantage to researchers.

The role of small non-coding RNAs in disease processes has also been explored in recent years. For example, Han et al. (2011) examined microRNA expression differences in bladder cancer patients in order to understand how changes and dysregulation in microRNA can influence mRNA expression and function. Several microRNAs were differentially expressed in the bladder cancer patients. Among the aberrant microRNAs, upregulation was more common than downregulation in the cancer patients. One of the upregulated microRNAs, hsa-miR-96, has been associated with carcinogenesis, and several of the overexpressed microRNAs have also been observed in other cancers, including ovarian and cervical. Some of the downregulated microRNAs in cancer samples were hypothesized to have inhibitory roles.[58]

ENCODE and TCGA

A lot of emphasis has been given to RNA-Seq data after the Encyclopedia of DNA Elements (ENCODE) and The Cancer Genome Atlas (TCGA) projects used this approach to characterize dozens of cell lines[59] and thousands of primary tumor samples,[60] respectively. ENCODE aimed to identify genome-wide regulatory regions in different cohorts of cell lines, and transcriptomic data are paramount in order to understand the downstream effect of those epigenetic and genetic regulatory layers. TCGA, instead, aimed to collect and analyze thousands of patient samples from 30 different tumor types in order to understand the underlying mechanisms of malignant transformation and progression. In this context, RNA-Seq data provide a unique snapshot of the transcriptomic status of the disease and look at an unbiased population of transcripts, allowing the identification of novel transcripts, fusion transcripts and non-coding RNAs that could go undetected with different technologies.


  1. ^ a b c Ryan D. Morin; Matthew Bainbridge; Anthony Fejes; Martin Hirst; Martin Krzywinski; Trevor J. Pugh; Helen McDonald; Richard Varhol; Steven J.M. Jones & Marco A. Marra. (2008). "Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing". BioTechniques. 45 (1): 81–94. doi:10.2144/000112900. PMID 18611170. 
  2. ^ Chu Y, Corey DR (August 2012). "RNA sequencing: platform selection, experimental design, and data interpretation". Nucleic Acid Ther. 22 (4): 271–4. doi:10.1089/nat.2012.0367. PMC 3426205. PMID 22830413.
  3. ^ Wang, Zhong; Gerstein, Mark; Snyder, Michael. "RNA-Seq: a revolutionary tool for transcriptomics". Nature Reviews Genetics. 10 (1): 57–63. doi:10.1038/nrg2484. PMC 2949280. PMID 19015660.
  4. ^ Maher CA, Kumar-Sinha C, Cao X, et al. (March 2009). "Transcriptome sequencing to detect gene fusions in cancer". Nature. 458 (7234): 97–101. doi:10.1038/nature07638. PMC 2725402. PMID 19136943.
  5. ^ Ingolia NT, Brar GA, Rouskin S, McGeachy AM, Weissman JS (August 2012). "The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments". Nat Protoc. 7 (8): 1534–50. doi:10.1038/nprot.2012.086. PMC 3535016. PMID 22836135.
  6. ^ Kukurba, Kimberly R.; Montgomery, Stephen B. (2015-11-01). "RNA Sequencing and Analysis". Cold Spring Harbor Protocols. 2015 (11): 951–969. doi:10.1101/pdb.top084970. ISSN 1559-6095. PMC 4863231. PMID 25870306.
  7. ^ a b Wang Z, Gerstein M, Snyder M (January 2009). "RNA-Seq: a revolutionary tool for transcriptomics". Nature Reviews Genetics. 10 (1): 57–63. doi:10.1038/nrg2484. PMC 2949280. PMID 19015660.
  8. ^ a b Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008). "Mapping and quantifying mammalian transcriptomes by RNA-seq". Nature Methods. 5 (7): 621–628. doi:10.1038/nmeth.1226. PMID 18516045. 
  9. ^ The Protocol Online website http://www.protocol-online.org/prot/Molecular_Biology/RNA/RNA_Extraction/mRNA_Isolation/index.html provides a list of several protocols relating to mRNA isolation
  10. ^ Liu D, Graber JH (2006). "Quantitative comparison of EST libraries requires compensation for systematic biases in cDNA generation". BMC Bioinformatics. 7: 77. doi:10.1186/1471-2105-7-77. PMC 1431573. PMID 16503995.
  11. ^ a b Li H, Lovci MT, Kwon YS, Rosenfeld MG, Fu XD, Yeo GW (2008). "Determination of tag density required for digital transcriptome analysis: Application to an androgen-sensitive prostate cancer model". Proc Natl Acad Sci USA. 105 (51): 20179–84. doi:10.1073/pnas.0807121105. PMC 2603435. PMID 19088194.
  12. ^ Stegle, Oliver; Parts, Leopold; Piipari, Matias; Winn, John; Durbin, Richard (16 February 2012). "Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses". Nature Protocols. 7 (3): 500–507. doi:10.1038/nprot.2011.457. 
  13. ^ Kingsford, Carl; Patro, Rob (15 June 2015). "Reference-based compression of short-read sequences using path encoding". Bioinformatics. 31 (12): 1920–1928. doi:10.1093/bioinformatics/btv071. 
  14. ^ Grabherr, Manfred G; Haas, Brian J; Yassour, Moran; Levin, Joshua Z; Thompson, Dawn A; Amit, Ido; Adiconis, Xian; Fan, Lin; Raychowdhury, Raktima; Zeng, Qiandong; Chen, Zehua; Mauceli, Evan; Hacohen, Nir; Gnirke, Andreas; Rhind, Nicholas; di Palma, Federica; Birren, Bruce W; Nusbaum, Chad; Lindblad-Toh, Kerstin; Friedman, Nir; Regev, Aviv (15 May 2011). "Full-length transcriptome assembly from RNA-Seq data without a reference genome". Nature Biotechnology. 29 (7): 644–652. doi:10.1038/nbt.1883. 
  15. ^ "De Novo Assembly Using Illumina Reads" (PDF). Retrieved 22 October 2016. 
  16. ^ Zerbino, D. R.; Birney, E. (21 February 2008). "Velvet: Algorithms for de novo short read assembly using de Bruijn graphs". Genome Research. 18 (5): 821–829. doi:10.1101/gr.074492.107. 
  17. ^ Grabherr, Manfred G; Haas, Brian J; Yassour, Moran; Levin, Joshua Z; Thompson, Dawn A; Amit, Ido; Adiconis, Xian; Fan, Lin; Raychowdhury, Raktima; Zeng, Qiandong; Chen, Zehua; Mauceli, Evan; Hacohen, Nir; Gnirke, Andreas; Rhind, Nicholas; di Palma, Federica; Birren, Bruce W; Nusbaum, Chad; Lindblad-Toh, Kerstin; Friedman, Nir; Regev, Aviv (15 May 2011). "Full-length transcriptome assembly from RNA-Seq data without a reference genome". Nature Biotechnology. 29 (7): 644–652. doi:10.1038/nbt.1883. 
  18. ^ Oases: a transcriptome assembler for very short reads
  19. ^ Chang, Zheng; Li, Guojun; Liu, Juntao; Zhang, Yu; Ashby, Cody; Liu, Deli; Cramer, Carole L; Huang, Xiuzhen (2015). "Bridger: a new framework for de novo transcriptome assembly using RNA-seq data". Genome Biology. 16 (1): 30. doi:10.1186/s13059-015-0596-2. 
  20. ^ Li, Bo; Fillmore, Nathanael; Bai, Yongsheng; Collins, Mike; Thomson, James A; Stewart, Ron; Dewey, Colin N (21 December 2014). "Evaluation of de novo transcriptome assemblies from RNA-Seq data". Genome Biology. 15 (12). doi:10.1186/s13059-014-0553-5. 
  21. ^ a b Dobin, A.; Davis, C. A.; Schlesinger, F.; Drenkow, J.; Zaleski, C.; Jha, S.; Batut, P.; Chaisson, M.; Gingeras, T. R. (25 October 2012). "STAR: ultrafast universal RNA-seq aligner". Bioinformatics. 29 (1): 15–21. doi:10.1093/bioinformatics/bts635. 
  22. ^ Langmead, Ben; Trapnell, Cole; Pop, Mihai; Salzberg, Steven L (2009). "Ultrafast and memory-efficient alignment of short DNA sequences to the human genome". Genome Biology. 10 (3): R25. doi:10.1186/gb-2009-10-3-r25. 
  23. ^ Trapnell, C.; Pachter, L.; Salzberg, S. L. (16 March 2009). "TopHat: discovering splice junctions with RNA-Seq". Bioinformatics. 25 (9): 1105–1111. doi:10.1093/bioinformatics/btp120. 
  24. ^ Trapnell, Cole; Roberts, Adam; Goff, Loyal; Pertea, Geo; Kim, Daehwan; Kelley, David R; Pimentel, Harold; Salzberg, Steven L; Rinn, John L; Pachter, Lior (1 March 2012). "Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks". Nature Protocols. 7 (3): 562–578. doi:10.1038/nprot.2012.016. 
  25. ^ Liao, Y.; Smyth, G. K.; Shi, W. (4 April 2013). "The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote". Nucleic Acids Research. 41 (10): e108–e108. doi:10.1093/nar/gkt214. 
  26. ^ Patro, Rob; Mount, Stephen M; Kingsford, Carl (20 April 2014). "Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms". Nature Biotechnology. 32 (5): 462–464. doi:10.1038/nbt.2862. 
  27. ^ Bray, Nicolas L; Pimentel, Harold; Melsted, Páll; Pachter, Lior (4 April 2016). "Near-optimal probabilistic RNA-seq quantification". Nature Biotechnology. 34 (5): 525–527. doi:10.1038/nbt.3519. 
  28. ^ Li, Bo; Fillmore, Nathanael; Bai, Yongsheng; Collins, Mike; Thomson, James A; Stewart, Ron; Dewey, Colin N (21 December 2014). "Evaluation of de novo transcriptome assemblies from RNA-Seq data". Genome Biology. 15 (12). doi:10.1186/s13059-014-0553-5. 
  29. ^ Lu B, Zeng Z, Shi T (February 2013). "Comparative study of de novo assembly and genome-guided assembly strategies for transcriptome reconstruction based on RNA-Seq". Science China Life Sciences. 56 (2): 143–55. doi:10.1007/s11427-013-4442-z. PMID 23393030. 
  30. ^ Bradnam KR, Fass JN, Alexandrov A, et al. (July 2013). "Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species". Gigascience. 2 (1): 10. doi:10.1186/2047-217X-2-10. PMID 23870653. 
  31. ^ Greenbaum D, Colangelo C, Williams K, Gerstein M (2003). "Comparing protein abundance and mRNA expression levels on a genomic scale". Genome Biology. 4 (9): 117. doi:10.1186/gb-2003-4-9-117. PMC 193646. PMID 12952525.
  32. ^ Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M (2008). "The Transcriptional Landscape of the Yeast Genome Defined by RNA Sequencing". Science. 320 (5881): 1344–1349. doi:10.1126/science.1158441. PMC 2951732. PMID 18451266.
  33. ^ Trapnell, Cole; Williams, Brian A; Pertea, Geo; Mortazavi, Ali; Kwan, Gordon; van Baren, Marijke J; Salzberg, Steven L; Wold, Barbara J; Pachter, Lior (May 2010). "Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation". Nat Biotechnol. 28 (5): 511–515. doi:10.1038/nbt.1621. PMC 3146043. PMID 20436464.
  34. ^ "CummeRbund - An R package for persistent storage, analysis, and visualization of RNA-Seq from cufflinks output". Retrieved 2013-07-28. 
  35. ^ a b Anders, Simon; Huber, Wolfgang (2010-01-01). "Differential expression analysis for sequence count data". Genome Biology. 11 (10): R106. doi:10.1186/gb-2010-11-10-r106. ISSN 1474-760X. PMC 3218662. PMID 20979621.
  36. ^ a b Robinson, Mark D.; McCarthy, Davis J.; Smyth, Gordon K. (2010-01-01). "edgeR: a Bioconductor package for differential expression analysis of digital gene expression data". Bioinformatics (Oxford, England). 26 (1): 139–140. doi:10.1093/bioinformatics/btp616. ISSN 1367-4811. PMC 2796818. PMID 19910308.
  37. ^ "Bioconductor - Open source software for bioinformatics". 
  38. ^ Huber, Wolfgang; Carey, Vincent J; Gentleman, Robert; Anders, Simon; Carlson, Marc; Carvalho, Benilton S; Bravo, Hector Corrada; Davis, Sean; Gatto, Laurent. "Orchestrating high-throughput genomic analysis with Bioconductor". Nature Methods. 12 (2): 115–121. doi:10.1038/nmeth.3252. PMC 4509590. PMID 25633503.
  39. ^ Mortazavi, Ali; Williams, Brian A.; McCue, Kenneth; Schaeffer, Lorian; Wold, Barbara (2008-07-01). "Mapping and quantifying mammalian transcriptomes by RNA-Seq". Nature Methods. 5 (7): 621–628. doi:10.1038/nmeth.1226. ISSN 1548-7105. PMID 18516045. 
  40. ^ Marguerat, Samuel; Schmidt, Alexander; Codlin, Sandra; Chen, Wei; Aebersold, Ruedi; Bähler, Jürg (2012-10-26). "Quantitative analysis of fission yeast transcriptomes and proteomes in proliferating and quiescent cells". Cell. 151 (3): 671–683. doi:10.1016/j.cell.2012.09.019. ISSN 1097-4172. PMC 3482660. PMID 23101633. 
  41. ^ Owens, Nick D. L.; Blitz, Ira L.; Lane, Maura A.; Patrushev, Ilya; Overton, John D.; Gilchrist, Michael J.; Cho, Ken W. Y.; Khokha, Mustafa K. (2016-01-26). "Measuring Absolute RNA Copy Numbers at High Temporal Resolution Reveals Transcriptome Kinetics in Development". Cell Reports. 14 (3): 632–647. doi:10.1016/j.celrep.2015.12.050. ISSN 2211-1247. PMC 4731879. PMID 26774488. 
  42. ^ Barbazuk WB, Emrich SJ, Chen HD, Li L, Schnable PS (2007). "SNP discovery via 454 transcriptome sequencing". The Plant Journal. 51 (5): 910–918. doi:10.1111/j.1365-313X.2007.03193.x. PMC 2169515. PMID 17662031. 
  43. ^ Lalonde E, Ha KC, Wang Z, et al. (April 2011). "RNA sequencing reveals the role of splicing polymorphisms in regulating human gene expression". Genome Res. 21 (4): 545–54. doi:10.1101/gr.111211.110. PMC 3065702. PMID 21173033. 
  44. ^ Garcion E, Wallace B, Pelletier L, Wion D (2004). "RNA mutagenesis and sporadic prion diseases". Journal of Theoretical Biology. 230 (2): 271–274. doi:10.1016/j.jtbi.2004.05.014. PMID 15302558. 
  45. ^ Teixeira MR (2006). "Recurrent fusion oncogenes in carcinomas". Critical Reviews in Oncogenesis. 12 (3–4): 257–271. doi:10.1615/critrevoncog.v12.i3-4.40. PMID 17425505. 
  46. ^ Maher CA, Kumar-Sinha C, Cao X, Kalyana-Sundaram S, Han B, Jing X, Sam L, Barrette T, Palanisamy N, Chinnaiyan AM (January 2009). "Transcriptome Sequencing to Detect Gene Fusions in Cancer". Nature. 458 (7234): 97–101. doi:10.1038/nature07638. PMC 2725402. PMID 19136943. 
  47. ^ a b Marcotte, EM.; Pellegrini, M.; Thompson, MJ.; Yeates, TO.; Eisenberg, D. (Nov 1999). "A combined algorithm for genome-wide prediction of protein function". Nature. 402 (6757): 83–6. doi:10.1038/47048. PMID 10573421. 
  48. ^ a b Giorgi Federico Manuel (2013). "Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana". Bioinformatics. 29 (6): 717–724. doi:10.1093/bioinformatics/btt053. PMID 23376351. 
  49. ^ Iancu Ovidiu D (2012). "Utilizing RNA-Seq data for de novo coexpression network inference". Bioinformatics. 28 (12): 1592–1597. doi:10.1093/bioinformatics/bts245. PMID 22556371. 
  50. ^ Eksi, R; Li, HD; Menon, R; Wen, Y; Omenn, GS; Kretzler, M; Guan, Y (Nov 2013). "Systematically differentiating functions for alternatively spliced isoforms through integrating RNA-seq data". PLOS Computational Biology. 9 (11): e1003314. doi:10.1371/journal.pcbi.1003314. PMC 3820534. PMID 24244129. 
  51. ^ Li, HD; Menon, R; Omenn, GS; Guan, Y (Jun 17, 2014). "The emerging era of genomic data integration for analyzing splice isoform function". Trends in Genetics. 30 (8): 340–347. doi:10.1016/j.tig.2014.05.005. PMID 24951248. 
  52. ^ Tuch BB, Laborde RR, Xu X, et al. (2010). "Tumor transcriptome sequencing reveals allelic expression imbalances associated with copy number alterations". PLoS ONE. 5 (2): e9317. doi:10.1371/journal.pone.0009317. PMC 2824832. PMID 20174472. 
  53. ^ Berger MF, Levin JZ, Vijayendran K, et al. (April 2010). "Integrative analysis of the melanoma transcriptome". Genome Res. 20 (4): 413–27. doi:10.1101/gr.103697.109. PMC 2847744. PMID 20179022. 
  54. ^ Twine NA, Janitz K, Wilkins MR, Janitz M (2011). "Whole transcriptome sequencing reveals gene expression and splicing differences in brain regions affected by Alzheimer's disease". PLoS ONE. 6 (1): e16266. doi:10.1371/journal.pone.0016266. PMC 3025006. PMID 21283692. 
  55. ^ Ku GM, Kim H, Vaughn IW, et al. (October 2012). "Research resource: RNA-Seq reveals unique features of the pancreatic β-cell transcriptome". Mol. Endocrinol. 26 (10): 1783–92. doi:10.1210/me.2012-1176. PMID 22915829. 
  56. ^ Morán I, Akerman I, van de Bunt M, et al. (October 2012). "Human β cell transcriptome analysis uncovers lncRNAs that are tissue-specific, dynamically regulated, and abnormally expressed in type 2 diabetes". Cell Metab. 16 (4): 435–48. doi:10.1016/j.cmet.2012.08.010. PMID 23040067. 
  57. ^ Merrick B. A.; Phadke D. P.; Auerbach S. S.; Mav D.; Stiegelmeyer S. M.; Shah R. R.; Tice R. R. (2013). "RNA-seq reveals novel hepatic gene expression pattern in Aflatoxin B1 treated rats". PLoS ONE. 8: e61768. doi:10.1371/journal.pone.0061768. 
  58. ^ Han Y.; Chen J.; Zhao X.; Liang C.; Wang Y.; Sun L.; Jiang Z.; Zhang Z.; Yang R.; Chen J.; Li Z.; Tang A.; Li X.; Ye J.; Guan Z.; Gui Y.; Cai Z. (2011). "MicroRNA expression signatures of bladder cancer revealed by deep sequencing". PLOS ONE. 6: e18286. doi:10.1371/journal.pone.0018286. 
  59. ^ "ENCODE Data Matrix". Retrieved 2013-07-28. 
  60. ^ "The Cancer Genome Atlas - Data Portal". Retrieved 2013-07-28. 

External links