User:Dcwalt/sandbox


RNA-seq (RNA Sequencing), also called "Whole Transcriptome Shotgun Sequencing" ^[1] ("WTSS"), is a technology that utilizes the capabilities of next-generation sequencing to reveal a snapshot of RNA presence and quantity from a genome at a given moment in time.^[2]

Introduction

The transcriptome of a cell is dynamic; it continually changes, as opposed to the static genome. The recent development of next-generation sequencing (NGS) allows for increased base coverage of a DNA sequence, as well as higher sample throughput. This facilitates sequencing of the RNA transcripts in a cell, providing the ability to look at alternatively spliced gene transcripts, post-transcriptional changes, gene fusions, mutations/SNPs and changes in gene expression.^[3] In addition to mRNA transcripts, RNA-Seq can look at different populations of RNA, including total RNA and small RNA such as miRNA and tRNA, and can be used for ribosome profiling.^[4] RNA-Seq can also be used to determine exon/intron boundaries and to verify or amend previously annotated 5’ and 3’ gene boundaries. Ongoing RNA-Seq research includes observing cellular pathway alterations during infection^[5] and gene expression level changes in cancer studies.^[6] Prior to NGS, transcriptomics and gene expression studies were done with expression microarrays, which contain thousands of DNA sequences that probe for a match in the target sequence, making available a profile of all transcripts being expressed. This was later done with Serial Analysis of Gene Expression (SAGE).

One deficiency of microarrays that makes RNA-Seq more attractive is limited coverage: such arrays target the identification of known common alleles, which represent approximately 500,000 to 2,000,000 SNPs of the more than 10,000,000 in the genome.^[7] As such, libraries are not usually available to detect and evaluate rare allele variant transcripts,^[8] and the arrays are only as good as the SNP databases they are designed from, so they have limited application for research purposes.^[9] Many cancers, for example, are caused by rare mutations (frequency below 1%) that would go undetected. However, arrays still have a place for targeted identification of already known common allele variants, making them well suited to regulatory-body approved diagnostics for conditions such as cystic fibrosis.

Methods

RNA 'Poly(A)' Library

Creation of a sequence library can vary from platform to platform in high-throughput sequencing,^[10] as each platform has several kits designed to build different types of libraries and to adapt the resulting sequences to the specific requirements of its instruments. However, due to the nature of the template being analyzed, there are commonalities across technologies. Frequently, in mRNA analysis the 3' polyadenylated (poly(A)) tail is targeted in order to ensure that coding RNA is separated from noncoding RNA. This can be accomplished simply with poly(T) oligos covalently attached to a given substrate. Presently many studies utilize magnetic beads for this step.^[1]^[11] The Protocol Online website^[12] provides a list of several protocols relating to mRNA isolation.

Studies including portions of the transcriptome outside poly(A) RNAs have shown that when using poly(T) magnetic beads, the flow-through RNA (non-poly(A) RNA) can yield important noncoding RNA gene discovery which would have otherwise gone unnoticed.^[1] Also, since ribosomal RNA represents over 90% of the RNA within a given cell, studies have shown that its removal via probe hybridization increases the capacity to retrieve data from the remaining portion of the transcriptome.

The next step is reverse transcription. Because randomly primed reverse transcription has a 5' bias and secondary structures influence primer binding sites,^[11] hydrolysis of the RNA into fragments of 200-300 nucleotides prior to reverse transcription reduces both problems simultaneously. However, this method has a trade-off: although the body of each transcript is efficiently converted to DNA, the 5' and 3' ends are less so. Depending on the aim of the study, researchers may choose to apply or skip this step.

Once the cDNA is synthesized it can be further fragmented to reach the desired fragment length of the sequencing system.

Small RNA/Non-coding RNA sequencing

When sequencing RNA other than mRNA, the library preparation is modified. The cellular RNA is selected based on the desired size range. For small RNA targets, such as miRNA, the RNA is isolated through size selection. This can be performed with a size exclusion gel, with size selection magnetic beads, or with a commercially developed kit. Once isolated, linkers are added to the 3’ and 5’ ends and the product is purified. The final step is cDNA generation through reverse transcription.

RNA-seq mapping of short reads in exon-exon junctions.

Direct RNA Sequencing

As converting RNA into cDNA using reverse transcriptase has been shown to introduce biases and artifacts that may interfere with both the proper characterization and quantification of transcripts,^[13] single-molecule Direct RNA Sequencing (DRS™) technology is currently under development by Helicos. DRS™ sequences RNA molecules directly in a massively parallel manner without RNA conversion to cDNA or other biasing sample manipulations such as ligation and amplification.

Transcriptome Assembly

Two different assembly methods are used for producing a transcriptome from raw sequence reads: de novo and genome-guided.

The first approach does not rely on the presence of a reference genome in order to reconstruct the nucleotide sequence. Because the reads are short, there cannot be large overlaps between reads, which makes de novo assembly difficult, although software for it does exist (Velvet, Oases, and Trinity^[14], to mention a few). The deep coverage also makes the computing power needed to track all the possible alignments prohibitive.^[15] This deficit can be reduced by using longer sequences obtained from the same sample with other techniques such as Sanger sequencing, and using these longer reads as a "skeleton" or "template" to help assemble reads in difficult regions (e.g. regions with repetitive sequences).
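The role of read overlaps in de novo assembly can be illustrated with a toy de Bruijn graph, the structure named in the title of the Velvet reference.^[15] The following is a minimal sketch under stated assumptions, not the implementation used by Velvet, Oases or Trinity: the function names, the value of k and the example reads are all hypothetical, and real assemblers additionally handle sequencing errors, repeats, reverse complements and coverage.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def walk_unambiguous(graph, start):
    """Extend a contig from `start` while exactly one unvisited successor exists."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop instead of looping around a cycle (repeat)
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig

# Three hypothetical overlapping reads from the sequence ATGGCGTGCAAT.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous(graph, "ATG")
print(contig)  # ATGGCGTGCAAT
```

The walk stops as soon as a node has zero or several successors, which is where repetitive regions make the reconstruction ambiguous and where longer reads can help choose a path.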

An “easier” and computationally cheaper approach is to align the millions of reads to a "reference genome". Many tools are available for aligning genomic reads to a reference genome (sequence alignment tools); however, special attention is needed when aligning a transcriptome to a genome, mainly when dealing with genes that have intronic regions. Several software packages exist for short read alignment, and specialized algorithms for transcriptome alignment have recently been developed, e.g. Bowtie for RNA-seq short read alignment,^[16] TopHat for aligning reads to a reference genome to discover splice sites,^[17] Cufflinks to assemble the transcripts and compare/merge them with others,^[18] or FANSe.^[19] These tools can also be combined to form a comprehensive system.^[20]

Although numerous solutions to the assembly problem have been proposed, there is still considerable room for improvement given the variability of the results across approaches. A group from the Center for Computational Biology at East China Normal University in Shanghai compared different de novo and genome-guided approaches for RNA-Seq assembly. They noted that, although most of the problems can be solved using graph theory approaches, there is still a consistent level of variability in all of them. Some algorithms outperformed the common standards for some species while still struggling for others. The authors suggest that the “most reliable” assembly could then be obtained by combining different approaches.^[21] These results are consistent with NGS genome data obtained in a recent contest called Assemblathon, where 21 contestants analyzed sequencing data from three different vertebrates (fish, snake and bird) and handed in a total of 43 assemblies. Using a metric made of 100 different measures for each assembly, the reviewers concluded that 1) assembly quality can vary considerably depending on which metric is used and 2) assemblies that scored well in one species did not necessarily perform well in the other species.^[22]

As discussed above, sequence libraries are created by extracting mRNA using its poly(A) tail, which is added to the mRNA molecule post-transcriptionally, so the selected molecules have already been spliced. Therefore, the created library and the short reads obtained do not contain intronic sequences, and library reads spanning the junction of two or more exons will not align to the genome.

A possible method to work around this is to try to align the unaligned short reads using a proxy genome generated with known exonic sequences. This need not cover whole exons, only enough so that the short reads can match on both sides of the exon-exon junction with minimum overlap. Some experimental protocols allow the production of strand specific reads.^[11]
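The proxy genome described above can be sketched as a small library of junction sequences. This is an illustrative sketch only: the function name, the input layout (exons as name and sequence pairs in transcript order) and the example values are assumptions, and only consecutive exons are joined, so exon-skipping isoforms would need additional pairs.

```python
def junction_sequences(exons, read_length, min_overlap):
    """Build one proxy sequence per pair of consecutive exons.

    exons: list of (name, sequence) tuples in transcript order (hypothetical input).
    Each proxy keeps at most read_length - min_overlap bases on either side of
    the junction, so a read of read_length bases aligned within a proxy must
    cover at least min_overlap bases of both exons.
    """
    if not 1 <= min_overlap < read_length:
        raise ValueError("min_overlap must be at least 1 and below read_length")
    flank = read_length - min_overlap
    library = {}
    for (name_a, seq_a), (name_b, seq_b) in zip(exons, exons[1:]):
        # End of the upstream exon joined to the start of the downstream exon.
        library[f"{name_a}|{name_b}"] = seq_a[-flank:] + seq_b[:flank]
    return library

exons = [("e1", "AAAACCCC"), ("e2", "GGGGTTTT"), ("e3", "ACGTACGT")]
library = junction_sequences(exons, read_length=6, min_overlap=2)
print(library)  # {'e1|e2': 'CCCCGGGG', 'e2|e3': 'TTTTACGT'}
```

Limiting each flank to the read length minus the minimum overlap is what guarantees that a match to a proxy sequence is evidence for the junction itself rather than for either exon alone.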

Experimental Considerations

The information gathered when sequencing a sample's transcriptome in this way has many of the same limitations and advantages as other RNA expression analysis pipelines. The main pros and cons of this approach can be summarized as:

a) Tissue specificity: Gene expression is not uniform throughout an organism's cells; it is strongly dependent on the tissue type being measured. RNA-Seq, like any other sequencing technology that analyzes homogeneous samples, can provide a complete snapshot of all the transcripts available at that precise moment in the cell. This approach is less likely to be biased than an oligonucleotide microarray approach, which instead analyzes a selected number of previously defined transcripts.

b) Time dependent: During a cell's lifetime and context, its gene expression levels change. As previously mentioned, any single sequencing experiment will offer information regarding one point in time. Time course experiments are so far the only solution that would allow a complete overview of the circadian transcriptome, so that researchers could obtain a precise description of the physiological changes happening over time. However, this approach is unfeasible for patient samples, since it is quite improbable that biopsies will be collected serially at short time intervals. A possible work-around could be the use of urine, blood or saliva samples, which will not require any invasive procedure.

c) Coverage: Coverage/depth can affect the mutations seen. Given that everything is expression-centric, an allele might not be detected either because it is not in the genome or because it is not being expressed. At the same time, RNA-seq can yield more information than just the existence of a heterozygous gene, as it can also help in estimating the expression of each allele. In association studies, genotypes are associated with disease, and expression levels can also be associated with disease. Using RNA-seq, we can measure the relationship between these two associated variables, that is, in what proportion each of the alleles is being expressed.

The depth of sequencing required for specific applications can be extrapolated from a pilot experiment.^[23]

d) Subjectivity of the analysis: As described above, numerous attempts have been made to analyze the data uniformly. However, the results can vary due to the multitude of algorithms and pipelines available. Most of the approaches are correct, but have to be tailored to the needs of the investigators in order to better capture the desired effect. This variability in methods, although on a smaller scale, is still present in other RNA profiling approaches, where reagents, personnel and techniques can lead to similar, although statistically different, results. Because of this, care must be taken when drawing conclusions from the sequencing experiment, as some information gathered might not be representative of the individual.

e) Data management: The main issue with NGS data is the volume of data produced. Microarray data occupy up to one thousand times less disk space than NGS data and therefore require smaller storage units. The high-capacity storage required by RNA-Seq data is, however, directly proportional to the volume of information that comes with it. The payoff of “more complete” large-scale datasets has to be evaluated prior to starting the experiment.

f) Downstream interpretation of the data: Different layers of interpretation have to be considered when analyzing RNA-Seq data. The biological, clinical and regulatory implications of the results are what allow clinicians and investigators to draw meaningful conclusions (e.g. the sequence of an RNA molecule present in the sample, although identified at different read depths, might not perfectly mirror the initial DNA sequence). An example of this arises during SNV discovery, as the mutations discovered are more precisely the mutations being expressed. Observing a position that is homozygous for a non-reference allele in an organism does not necessarily mean that this is the individual's genotype; it could just mean that the gene copy with the reference allele is not being expressed in that tissue and/or at the time the sample was acquired.

Analysis

Gene expression

The characterization of gene expression in cells via measurement of mRNA levels has long been of interest to researchers, both in terms of which genes are expressed in what tissues, and at what levels. Even though it has been shown that due to other post transcriptional gene regulation events (such as RNA interference) there is not necessarily always a strong correlation between the abundance of mRNA and the related proteins,^[24] measuring mRNA concentration levels is still a useful tool in determining how the transcriptional machinery of the cell is affected in the presence of external signals (e.g. drug treatment), or how cells differ between a healthy state and a diseased state.

Expression can be deduced via RNA-seq from the extent to which a sequence is retrieved. Transcriptome studies in yeast^[25] show that in this experimental setting, a fourfold coverage is required for amplicons to be classified and characterized as an expressed gene. When the transcriptome is fragmented prior to cDNA synthesis, the number of reads corresponding to a particular exon, normalized by its length in vivo, yields gene expression levels which correlate with those obtained through qPCR.^[23]
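The length normalization described above can be sketched as follows. One widely used form is reads per kilobase per million mapped reads (RPKM), introduced in the Mortazavi et al. paper cited earlier;^[11] the additional scaling by library size is an assumption beyond what the paragraph states, and the function name and the example counts are hypothetical.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)

# Two hypothetical exons with the same raw read count but different lengths,
# in a library of ten million mapped reads.
library_size = 10_000_000
print(rpkm(500, 1000, library_size))  # 50.0
print(rpkm(500, 2000, library_size))  # 25.0
```

The same raw count gives half the expression estimate for the exon that is twice as long, which is the effect of dividing by length.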

The only way to be absolutely sure of the individual's mutations is to compare the transcriptome sequences to the germline DNA sequence. This enables the distinction of homozygous genes versus skewed expression of one of the alleles and it can also provide information about genes that were not expressed in the transcriptomic experiment. An R-based statistical package known as CummeRbund^[26] can be used to generate expression comparison charts for visual analysis.
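For a site known from germline DNA to be heterozygous, skewed expression of one allele can be assessed by testing the RNA-seq read counts against an equal 50:50 ratio. The sketch below uses an exact two-sided binomial test; the function name and the counts are hypothetical, and the test ignores mapping bias and overdispersion, which dedicated tools model explicitly.

```python
from math import comb

def allelic_imbalance(ref_reads, alt_reads):
    """Return (alt allele fraction, two-sided exact binomial p-value vs. 50:50).

    Intended for modest read depths; very deep sites would need a
    numerically safer implementation.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this site")
    # Probability of each possible alt count if both alleles are equally expressed.
    probs = [comb(n, k) * 0.5 ** n for k in range(n + 1)]
    observed = probs[alt_reads]
    # Sum over all outcomes that are at least as unlikely as the observed one.
    p_value = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return alt_reads / n, min(p_value, 1.0)

print(allelic_imbalance(5, 5))   # (0.5, 1.0)
print(allelic_imbalance(0, 10))  # (1.0, 0.001953125)
```

A site with only alternative-allele reads would look homozygous from the transcriptome alone; the germline genotype is what identifies it as skewed expression instead.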

Single nucleotide variation discovery

Transcriptome single nucleotide variation has been analyzed in maize on the Roche 454 sequencing platform.^[27] Directly from the transcriptome analysis, around 7000 single nucleotide polymorphisms (SNPs) were recognized. Following Sanger sequence validation, the researchers were able to conservatively obtain almost 5000 valid SNPs covering more than 2400 maize genes. RNA-seq is, however, limited to transcribed regions, since it will only discover sequence variations in exon regions. This misses many subtle but important intronic alleles that affect disease, such as those in transcription regulators, leaving the analysis to large effectors only. While some correlation exists between exon and intron variation, only whole genome sequencing would be able to capture the source of all relevant SNPs.^[28]

Post-transcriptional SNVs

Having the matching genomic and transcriptomic sequences of an individual can also help in detecting post-transcriptional edits:^[10] if the individual is homozygous for a gene but the gene's transcript carries a different allele, a post-transcriptional modification event is inferred.
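The comparison described above can be sketched as a filter over sites with a known germline genotype. The data layout, function name and thresholds below are hypothetical choices for illustration; a real analysis would also need to exclude sequencing errors, misalignments and paralogous regions before calling an edit.

```python
def editing_candidates(genotypes, rna_counts, min_reads=10, min_fraction=0.1):
    """List sites that are homozygous in DNA but show another base in RNA.

    genotypes:  {site: (allele1, allele2)} from germline DNA (hypothetical layout).
    rna_counts: {site: {base: read count}} from RNA-seq (hypothetical layout).
    Returns sorted tuples (site, dna_base, rna_base, fraction of RNA reads).
    """
    candidates = []
    for site, (allele1, allele2) in genotypes.items():
        if allele1 != allele2:
            continue  # heterozygous: a second allele in RNA is expected
        counts = rna_counts.get(site, {})
        depth = sum(counts.values())
        if depth < min_reads:
            continue  # too few reads to judge
        for base, n in counts.items():
            if base != allele1 and n / depth >= min_fraction:
                candidates.append((site, allele1, base, n / depth))
    return sorted(candidates)

genotypes = {"chr1:100": ("A", "A"), "chr1:200": ("C", "T"), "chr1:300": ("G", "G")}
rna_counts = {
    "chr1:100": {"A": 30, "G": 20},  # homozygous A in DNA, 40% G in RNA
    "chr1:200": {"C": 10, "T": 40},  # heterozygous in DNA, skipped
    "chr1:300": {"G": 50, "T": 1},   # a single T read is below the threshold
}
print(editing_candidates(genotypes, rna_counts))
# [('chr1:100', 'A', 'G', 0.4)]
```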

mRNA-centric single nucleotide variants (SNVs) are generally not considered a representative source of functional variation in cells, mainly because these mutations disappear with the mRNA molecule. However, because efficient DNA correction mechanisms do not apply to RNA molecules, such variants may appear more often. This has been proposed as the source of certain prion diseases,^[29] also known as transmissible spongiform encephalopathies (TSE).

RNA-seq mapping of short reads over exon-exon junctions. Depending on where each end maps, the event can be defined as a *Trans* or a *Cis* event.

Fusion gene detection

Caused by different structural modifications in the genome, fusion genes have gained attention because of their relationship with cancer.^[30] The ability of RNA-seq to analyze a sample's whole transcriptome in an unbiased fashion makes it an attractive tool to find these kinds of common events in cancer.^[31]

The idea follows from the process of aligning the short transcriptomic reads to a reference genome. Most of the short reads will fall within one complete exon, and a smaller but still large set would be expected to map to known exon-exon junctions. The remaining unmapped short reads would then be further analyzed to determine whether they match an exon-exon junction where the exons come from different genes. This would be evidence of a possible fusion event; however, because of the short length of the reads, this could prove to be very noisy. An alternative approach is to use paired-end reads, where a potentially large number of paired reads would map each end to a different exon, giving better coverage of these events (see figure). Nonetheless, the end result consists of multiple and potentially novel combinations of genes, providing an ideal starting point for further validation.
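The paired-end idea can be sketched by counting read pairs whose two ends map to different genes. This is a simplified illustration and not the method of any particular fusion caller: the input layout, the gene labels and the support threshold are hypothetical, and real pipelines add filters for read-through transcripts, homologous genes and alignment artifacts.

```python
from collections import Counter

def fusion_candidates(read_pairs, min_pairs=2):
    """Count read pairs whose two ends map to different genes.

    read_pairs: iterable of (gene_of_end1, gene_of_end2); None marks an
    unmapped end (hypothetical layout).
    Returns {(gene_a, gene_b): supporting pairs} with gene names sorted, so
    that both orientations of the same gene pair are counted together.
    """
    support = Counter()
    for gene1, gene2 in read_pairs:
        if gene1 is None or gene2 is None or gene1 == gene2:
            continue  # unmapped end, or both ends in the same gene
        support[tuple(sorted((gene1, gene2)))] += 1
    return {pair: n for pair, n in support.items() if n >= min_pairs}

pairs = [
    ("GENE_A", "GENE_B"),
    ("GENE_B", "GENE_A"),
    ("GENE_A", "GENE_A"),
    ("GENE_C", None),
    ("GENE_C", "GENE_D"),
]
print(fusion_candidates(pairs))  # {('GENE_A', 'GENE_B'): 2}
```

Requiring more than one supporting pair is a simple way to reduce the noise mentioned above; the pair seen only once (GENE_C with GENE_D) is dropped.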

Application to Genomic Medicine

History

The past five years have seen a flourishing of NGS-based methods for genome analysis, leading to the discovery of a number of new mutations and fusion transcripts in cancer. RNA-Seq data could help researchers interpret the “personalized transcriptome”, aiding the understanding of transcriptomic changes and, ideally, the identification of gene drivers for a disease. The feasibility of this approach is, however, dictated by the costs in terms of money and time.

A basic search on PubMed for the term RNA Seq, queried as “rna Seq OR RNA-Seq OR rna sequencing OR RNASeq” in order to capture the most common ways of phrasing it, gives 147,525 hits, reflecting the rapidly increasing use of this technology. A few examples are considered here to show that RNA-Seq applications in the clinic have the potential to significantly affect patients' lives and, on the other hand, require a team of specialists (bioinformaticians, physicians/clinicians, basic researchers, technicians) to fully interpret the huge amount of data generated by this analysis.

As an example of clinical applications, researchers at the Mayo Clinic used an RNA-Seq approach to identify differentially expressed transcripts between oral cancer and normal tissue samples. They also evaluated the allelic imbalance (AI), the ratio of the transcripts produced by the single alleles, within a subgroup of genes involved in cell differentiation, adhesion, cell motility and muscle contraction,^[32] identifying a unique transcriptomic and genomic signature in oral cancer patients. Novel insight into skin cancer (melanoma) has also come from RNA-Seq of melanoma patients. This approach led to the identification of eleven novel gene fusion transcripts originating from previously unknown chromosomal rearrangements. Twelve novel chimeric transcripts were also reported, including seven that confirmed previously identified data in multiple melanoma samples.^[33] Furthermore, this approach is not limited to cancer patients. RNA-Seq has been used to study other important chronic diseases such as Alzheimer's disease (AD) and diabetes. In the former case, Twine and colleagues compared the transcriptome of different lobes of the brains of deceased AD patients with the brains of healthy individuals, identifying a lower number of splice variants in AD patients and differential promoter usage of the APOE-001 and -002 isoforms in AD brains.^[34] In the latter case, different groups showed the uniqueness of the beta-cell transcriptome in diabetic patients in terms of transcript accumulation and differential promoter usage^[35] and long non-coding RNA (lncRNA) signature.^[36]

ENCODE and TCGA

Considerable emphasis has been given to RNA-Seq data since the Encyclopedia of DNA Elements (ENCODE) and The Cancer Genome Atlas (TCGA) projects used this approach to characterize dozens of cell lines^[37] and thousands of primary tumor samples,^[38] respectively. The former aimed to identify genome-wide regulatory regions in different cohorts of cell lines, and transcriptomic data are paramount for understanding the downstream effect of those epigenetic and genetic regulatory layers. The latter project, instead, aimed to collect and analyze thousands of patient samples from 30 different tumor types in order to understand the underlying mechanisms of malignant transformation and progression. In this context RNA-Seq data provide a unique snapshot of the transcriptomic status of the disease and cover an unbiased population of transcripts, which allows the identification of novel transcripts, fusion transcripts and non-coding RNAs that could go undetected with other technologies.

External links

RNA-Seq for Everyone: a high-level guide to designing and implementing an RNA-Seq experiment.
ChIPBase database: provides expression profiles of protein-coding genes and lncRNAs (lincRNAs) from RNA-Seq data across 22 tissues.
Martin A. Perdacher (September 2011) Next-Generation Sequencing and its Applications in RNA-Seq. Theory part of the Bachelorthesis, Hagenberg.
The RNA-Seq Blog

References

^ Ryan D. Morin, Matthew Bainbridge, Anthony Fejes, Martin Hirst, Martin Krzywinski, Trevor J. Pugh, Helen McDonald, Richard Varhol, Steven J.M. Jones, and Marco A. Marra. (2008). "Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing". BioTechniques. 45 (1): 81–94. doi:10.2144/000112900. PMID 18611170.
^ Chu Y, Corey DR (August 2012). "RNA sequencing: platform selection, experimental design, and data interpretation". Nucleic Acid Ther. 22 (4): 271–4. doi:10.1089/nat.2012.0367. PMC 3426205. PMID 22830413.
^ Maher CA, Kumar-Sinha C, Cao X; et al. (March 2009). "Transcriptome sequencing to detect gene fusions in cancer". Nature. 458 (7234): 97–101. doi:10.1038/nature07638. PMC 2725402. PMID 19136943.
^ Ingolia NT, Brar GA, Rouskin S, McGeachy AM, Weissman JS (August 2012). "The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments". Nat Protoc. 7 (8): 1534–50. doi:10.1038/nprot.2012.086. PMC 3535016. PMID 22836135.
^ Qian F, Chung L, Zheng W; et al. (2013). "Identification of Genes Critical for Resistance to Infection by West Nile Virus Using RNA-Seq Analysis". Viruses. 5 (7): 1664–81. doi:10.3390/v5071664. PMC 3738954. PMID 23881275.
^ Beane J, Vick J, Schembri F; et al. (June 2011). "Characterizing the impact of smoking and lung cancer on the airway transcriptome using RNA-Seq". Cancer Prev Res (Phila). 4 (6): 803–17. doi:10.1158/1940-6207.CAPR-11-0212. PMC 3694393. PMID 21636547.
^ "HapMap: About the Project". Retrieved 2013-07-28.
^ Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y (September 2008). "RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays". Genome Res. 18 (9): 1509–17. doi:10.1101/gr.079558.108. PMC 2527709. PMID 18550803.
^ Siu H, Zhu Y, Jin L, Xiong M (2011). "Implication of next-generation sequencing on association studies". BMC Genomics. 12: 322. doi:10.1186/1471-2164-12-322. PMC 3148210. PMID 21682891.
^ Wang Z, Gerstein M, Snyder M. (January 2009). "RNA-Seq: a revolutionary tool for transcriptomics". Nature Reviews Genetics. 10 (1): 57–63. doi:10.1038/nrg2484. PMC 2949280. PMID 19015660.
^ Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. (2008). "Mapping and quantifying mammalian transcriptomes by RNA-seq". Nature Methods. 5 (7): 621–628. doi:10.1038/nmeth.1226. PMID 18516045.
^ http://www.protocol-online.org/prot/Molecular_Biology/RNA/RNA_Extraction/mRNA_Isolation/index.html
^ Liu D, Graber JH (2006). "Quantitative comparison of EST libraries requires compensation for systematic biases in cDNA generation". BMC Bioinformatics. 7: 77. doi:10.1186/1471-2105-7-77. PMC 1431573. PMID 16503995.
^ Grabherr MG, Haas BJ, Yassour M; et al. (July 2011). "Full-length transcriptome assembly from RNA-Seq data without a reference genome". Nat. Biotechnol. 29 (7): 644–52. doi:10.1038/nbt.1883. PMC 3571712. PMID 21572440.
^ Zerbino DR, Birney E (2008). "Velvet: Algorithms for de novo short read assembly using de Bruijn graphs". Genome Research. 18 (5): 821–829. doi:10.1101/gr.074492.107. PMC 2336801. PMID 18349386.
^ Langmead B, Trapnell C, Pop M, Salzberg SL (2009). "Ultrafast and memory-efficient alignment of short DNA sequences to the human genome". Genome Biol. 10 (3): R25. doi:10.1186/gb-2009-10-3-r25. PMC 2690996. PMID 19261174.
^ Cole Trapnell, Lior Pachter and Steven Salzberg (2009). "TopHat: discovering splice junctions with RNA-Seq". Bioinformatics. 25 (9): 1105–1111. doi:10.1093/bioinformatics/btp120. PMC 2672628. PMID 19289445.
^ Cole Trapnell, Brian A Williams, Geo Pertea, Ali Mortazavi, Gordon Kwan, Marijke J van Baren, Steven L Salzberg, Barbara J Wold and Lior Pachter (2010). "Transcript assembly and abundance estimation from RNA-Seq reveals thousands of new transcripts and switching among isoforms". Nature Biotechnology. 28 (5): 511–515. doi:10.1038/nbt.1621. PMC 3146043. PMID 20436464.
^ "FANSe: introduction". Retrieved 2013-07-28.
^ Trapnell C, Roberts A, Goff L; et al. (March 2012). "Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks". Nat Protoc. 7 (3): 562–78. doi:10.1038/nprot.2012.016. PMC 3334321. PMID 22383036.
^ Lu B, Zeng Z, Shi T (February 2013). "Comparative study of de novo assembly and genome-guided assembly strategies for transcriptome reconstruction based on RNA-Seq". Sci China Life Sci. 56 (2): 143–55. doi:10.1007/s11427-013-4442-z. PMID 23393030.
^ Bradnam KR, Fass JN, Alexandrov A; et al. (July 2013). "Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species". GigaScience. 2 (1): 10. doi:10.1186/2047-217X-2-10. PMC 3844414. PMID 23870653.
^ Li H, Lovci MT, Kwon YS, Rosenfeld MG, Fu XD, Yeo GW (2008). "Determination of tag density required for digital transcriptome analysis: Application to an androgen-sensitive prostate cancer model". Proc Natl Acad Sci USA. 105 (51): 20179–84. doi:10.1073/pnas.0807121105. PMC 2603435. PMID 19088194.
^ Greenbaum D, Colangelo C, Williams K, Gerstein M. (2003). "Comparing protein abundance and mRNA expression levels on a genomic scale". Genome Biology. 4 (9): 117. doi:10.1186/gb-2003-4-9-117. PMC 193646. PMID 12952525.
^ Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M (2008). "The Transcriptional Landscape of the Yeast Genome Defined by RNA Sequencing". Science. 320 (5881): 1344–1349. doi:10.1126/science.1158441. PMC 2951732. PMID 18451266.
^ "CummeRbund - An R package for persistent storage, analysis, and visualization of RNA-Seq from cufflinks output". Retrieved 2013-07-28.
^ Barbazuk WB, Emrich SJ, Chen HD, Li L, Schnable PS (2007). "SNP discovery via 454 transcriptome sequencing". The Plant Journal. 51 (5): 910–918. doi:10.1111/j.1365-313X.2007.03193.x. PMC 2169515. PMID 17662031.
^ Lalonde E, Ha KC, Wang Z; et al. (April 2011). "RNA sequencing reveals the role of splicing polymorphisms in regulating human gene expression". Genome Res. 21 (4): 545–54. doi:10.1101/gr.111211.110. PMC 3065702. PMID 21173033.
^ Garcion E, Wallace B, Pelletier L, Wion D. (2004). "RNA mutagenesis and sporadic prion diseases". Journal of Theoretical Biology. 230 (2): 271–274. doi:10.1016/j.jtbi.2004.05.014. PMID 15302558.
^ Teixeira MR (2006). "Recurrent fusion oncogenes in carcinomas". Critical Reviews in Oncogenesis. 12 (3–4): 257–271. doi:10.1615/critrevoncog.v12.i3-4.40. PMID 17425505.
^ Maher CA, Kumar-Sinha C, Cao X, Kalyana-Sundaram S, Han B, Jing X, Sam L, Barrette T, Palanisamy N, Chinnaiyan AM (January 2009). "Transcriptome Sequencing to Detect Gene Fusions in Cancer". Nature. 458 (7234): 97–101. doi:10.1038/nature07638. PMC 2725402. PMID 19136943.
^ Tuch BB, Laborde RR, Xu X; et al. (2010). "Tumor transcriptome sequencing reveals allelic expression imbalances associated with copy number alterations". PLOS ONE. 5 (2): e9317. doi:10.1371/journal.pone.0009317. PMC 2824832. PMID 20174472.
^ Berger MF, Levin JZ, Vijayendran K; et al. (April 2010). "Integrative analysis of the melanoma transcriptome". Genome Res. 20 (4): 413–27. doi:10.1101/gr.103697.109. PMC 2847744. PMID 20179022.
^ Twine NA, Janitz K, Wilkins MR, Janitz M (2011). "Whole transcriptome sequencing reveals gene expression and splicing differences in brain regions affected by Alzheimer's disease". PLOS ONE. 6 (1): e16266. doi:10.1371/journal.pone.0016266. PMC 3025006. PMID 21283692.
^ Ku GM, Kim H, Vaughn IW; et al. (October 2012). "Research resource: RNA-Seq reveals unique features of the pancreatic β-cell transcriptome". Mol. Endocrinol. 26 (10): 1783–92. doi:10.1210/me.2012-1176. PMC 3458219. PMID 22915829.
^ Morán I, Akerman I, van de Bunt M; et al. (October 2012). "Human β cell transcriptome analysis uncovers lncRNAs that are tissue-specific, dynamically regulated, and abnormally expressed in type 2 diabetes". Cell Metab. 16 (4): 435–48. doi:10.1016/j.cmet.2012.08.010. PMC 3475176. PMID 23040067.
^ "ENCODE Data Matrix". Retrieved 2013-07-28.
^ "The Cancer Genome Atlas - Data Portal". Retrieved 2013-07-28.

Category:Molecular biology Category:RNA Category:Gene expression

[1] Morin RD, Bainbridge M, Fejes A, Hirst M, Krzywinski M, Pugh TJ, McDonald H, Varhol R, Jones SJ, Marra MA (2008). "Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing". BioTechniques. 45 (1): 81–94. doi:10.2144/000112900. PMID 18611170.

[2] Chu Y, Corey DR (August 2012). "RNA sequencing: platform selection, experimental design, and data interpretation". Nucleic Acid Ther. 22 (4): 271–4. doi:10.1089/nat.2012.0367. PMC 3426205. PMID 22830413.

[3] Maher CA, Kumar-Sinha C, Cao X, et al. (March 2009). "Transcriptome sequencing to detect gene fusions in cancer". Nature. 458 (7234): 97–101. doi:10.1038/nature07638. PMC 2725402. PMID 19136943.

[4] Ingolia NT, Brar GA, Rouskin S, McGeachy AM, Weissman JS (August 2012). "The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments". Nat Protoc. 7 (8): 1534–50. doi:10.1038/nprot.2012.086. PMC 3535016. PMID 22836135.

[5] Qian F, Chung L, Zheng W, et al. (2013). "Identification of Genes Critical for Resistance to Infection by West Nile Virus Using RNA-Seq Analysis". Viruses. 5 (7): 1664–81. doi:10.3390/v5071664. PMC 3738954. PMID 23881275.

[6] Beane J, Vick J, Schembri F, et al. (June 2011). "Characterizing the impact of smoking and lung cancer on the airway transcriptome using RNA-Seq". Cancer Prev Res (Phila). 4 (6): 803–17. doi:10.1158/1940-6207.CAPR-11-0212. PMC 3694393. PMID 21636547.

[7] "HapMap: About the Project". Retrieved 2013-07-28.

[8] Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y (September 2008). "RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays". Genome Res. 18 (9): 1509–17. doi:10.1101/gr.079558.108. PMC 2527709. PMID 18550803.

[9] Siu H, Zhu Y, Jin L, Xiong M (2011). "Implication of next-generation sequencing on association studies". BMC Genomics. 12: 322. doi:10.1186/1471-2164-12-322. PMC 3148210. PMID 21682891.

[10] Wang Z, Gerstein M, Snyder M (January 2009). "RNA-Seq: a revolutionary tool for transcriptomics". Nature Reviews Genetics. 10 (1): 57–63. doi:10.1038/nrg2484. PMC 2949280. PMID 19015660.

[11] Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008). "Mapping and quantifying mammalian transcriptomes by RNA-seq". Nature Methods. 5 (7): 621–628. doi:10.1038/nmeth.1226. PMID 18516045.

[12] http://www.protocol-online.org/prot/Molecular_Biology/RNA/RNA_Extraction/mRNA_Isolation/index.html

[13] Liu D, Graber JH (2006). "Quantitative comparison of EST libraries requires compensation for systematic biases in cDNA generation". BMC Bioinformatics. 7: 77. doi:10.1186/1471-2105-7-77. PMC 1431573. PMID 16503995.

[14] Grabherr MG, Haas BJ, Yassour M, et al. (July 2011). "Full-length transcriptome assembly from RNA-Seq data without a reference genome". Nat. Biotechnol. 29 (7): 644–52. doi:10.1038/nbt.1883. PMC 3571712. PMID 21572440.

[15] Zerbino DR, Birney E (2008). "Velvet: Algorithms for de novo short read assembly using de Bruijn graphs". Genome Research. 18 (5): 821–829. doi:10.1101/gr.074492.107. PMC 2336801. PMID 18349386.

[16] Langmead B, Trapnell C, Pop M, Salzberg SL (2009). "Ultrafast and memory-efficient alignment of short DNA sequences to the human genome". Genome Biol. 10 (3): R25. doi:10.1186/gb-2009-10-3-r25. PMC 2690996. PMID 19261174.

[17] Trapnell C, Pachter L, Salzberg SL (2009). "TopHat: discovering splice junctions with RNA-Seq". Bioinformatics. 25 (9): 1105–1111. doi:10.1093/bioinformatics/btp120. PMC 2672628. PMID 19289445.

[18] Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L (2010). "Transcript assembly and abundance estimation from RNA-Seq reveals thousands of new transcripts and switching among isoforms". Nature Biotechnology. 28 (5): 511–515. doi:10.1038/nbt.1621. PMC 3146043. PMID 20436464.

[19] "FANSe: introduction". Retrieved 2013-07-28.

[20] Trapnell C, Roberts A, Goff L, et al. (March 2012). "Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks". Nat Protoc. 7 (3): 562–78. doi:10.1038/nprot.2012.016. PMC 3334321. PMID 22383036.

[21] Lu B, Zeng Z, Shi T (February 2013). "Comparative study of de novo assembly and genome-guided assembly strategies for transcriptome reconstruction based on RNA-Seq". Sci China Life Sci. 56 (2): 143–55. doi:10.1007/s11427-013-4442-z. PMID 23393030.

[22] Bradnam KR, Fass JN, Alexandrov A, et al. (July 2013). "Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species". GigaScience. 2 (1): 10. doi:10.1186/2047-217X-2-10. PMC 3844414. PMID 23870653.

[23] Li H, Lovci MT, Kwon YS, Rosenfeld MG, Fu XD, Yeo GW (2008). "Determination of tag density required for digital transcriptome analysis: Application to an androgen-sensitive prostate cancer model". Proc Natl Acad Sci USA. 105 (51): 20179–84. doi:10.1073/pnas.0807121105. PMC 2603435. PMID 19088194.

[24] Greenbaum D, Colangelo C, Williams K, Gerstein M (2003). "Comparing protein abundance and mRNA expression levels on a genomic scale". Genome Biology. 4 (9): 117. doi:10.1186/gb-2003-4-9-117. PMC 193646. PMID 12952525.

[25] Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M (2008). "The Transcriptional Landscape of the Yeast Genome Defined by RNA Sequencing". Science. 320 (5881): 1344–1349. doi:10.1126/science.1158441. PMC 2951732. PMID 18451266.

[26] "CummeRbund - An R package for persistent storage, analysis, and visualization of RNA-Seq from cufflinks output". Retrieved 2013-07-28.

[27] Barbazuk WB, Emrich SJ, Chen HD, Li L, Schnable PS (2007). "SNP discovery via 454 transcriptome sequencing". The Plant Journal. 51 (5): 910–918. doi:10.1111/j.1365-313X.2007.03193.x. PMC 2169515. PMID 17662031.

[28] Lalonde E, Ha KC, Wang Z, et al. (April 2011). "RNA sequencing reveals the role of splicing polymorphisms in regulating human gene expression". Genome Res. 21 (4): 545–54. doi:10.1101/gr.111211.110. PMC 3065702. PMID 21173033.

[29] Garcion E, Wallace B, Pelletier L, Wion D (2004). "RNA mutagenesis and sporadic prion diseases". Journal of Theoretical Biology. 230 (2): 271–274. doi:10.1016/j.jtbi.2004.05.014. PMID 15302558.

[30] Teixeira MR (2006). "Recurrent fusion oncogenes in carcinomas". Critical Reviews in Oncogenesis. 12 (3–4): 257–271. doi:10.1615/critrevoncog.v12.i3-4.40. PMID 17425505.

[31] Maher CA, Kumar-Sinha C, Cao X, Kalyana-Sundaram S, Han B, Jing X, Sam L, Barrette T, Palanisamy N, Chinnaiyan AM (January 2009). "Transcriptome Sequencing to Detect Gene Fusions in Cancer". Nature. 458 (7234): 97–101. doi:10.1038/nature07638. PMC 2725402. PMID 19136943.

[32] Tuch BB, Laborde RR, Xu X, et al. (2010). "Tumor transcriptome sequencing reveals allelic expression imbalances associated with copy number alterations". PLOS ONE. 5 (2): e9317. doi:10.1371/journal.pone.0009317. PMC 2824832. PMID 20174472.

[33] Berger MF, Levin JZ, Vijayendran K, et al. (April 2010). "Integrative analysis of the melanoma transcriptome". Genome Res. 20 (4): 413–27. doi:10.1101/gr.103697.109. PMC 2847744. PMID 20179022.

[34] Twine NA, Janitz K, Wilkins MR, Janitz M (2011). "Whole transcriptome sequencing reveals gene expression and splicing differences in brain regions affected by Alzheimer's disease". PLOS ONE. 6 (1): e16266. doi:10.1371/journal.pone.0016266. PMC 3025006. PMID 21283692.

[35] Ku GM, Kim H, Vaughn IW, et al. (October 2012). "Research resource: RNA-Seq reveals unique features of the pancreatic β-cell transcriptome". Mol. Endocrinol. 26 (10): 1783–92. doi:10.1210/me.2012-1176. PMC 3458219. PMID 22915829.

[36] Morán I, Akerman I, van de Bunt M, et al. (October 2012). "Human β cell transcriptome analysis uncovers lncRNAs that are tissue-specific, dynamically regulated, and abnormally expressed in type 2 diabetes". Cell Metab. 16 (4): 435–48. doi:10.1016/j.cmet.2012.08.010. PMC 3475176. PMID 23040067.

[37] "ENCODE Data Matrix". Retrieved 2013-07-28.

[38] "The Cancer Genome Atlas - Data Portal". Retrieved 2013-07-28.
