RNA-Seq

From Wikipedia, the free encyclopedia
Summary of RNA-Seq. Within the organism, genes are transcribed and (in a eukaryotic organism) spliced to produce mature mRNA transcripts (red). The mRNA is extracted from the organism, fragmented and copied into stable ds-cDNA (blue). The ds-cDNA is sequenced using high-throughput, short-read sequencing methods. These sequences can then be aligned to a reference genome sequence to reconstruct which genome regions were being transcribed. This data can be used to annotate where expressed genes are, their relative expression levels, and any alternative splice variants.[1]

RNA-Seq (RNA sequencing), also called whole transcriptome shotgun sequencing[2] (WTSS), uses next-generation sequencing (NGS) to reveal the presence and quantity of RNA in a biological sample at a given moment.[3][4]

RNA-Seq is used to analyze the continuously changing cellular transcriptome. Specifically, RNA-Seq makes it possible to examine alternatively spliced transcripts, post-transcriptional modifications, gene fusions, mutations/SNPs and changes in gene expression over time, or differences in gene expression between groups or treatments.[5] In addition to mRNA transcripts, RNA-Seq can examine other RNA populations, including total RNA and small RNAs such as miRNA and tRNA, and it can be adapted for ribosome profiling.[6] RNA-Seq can also be used to determine exon/intron boundaries and to verify or amend previously annotated 5' and 3' gene boundaries. Recent advances in RNA-Seq include single-cell sequencing and in situ sequencing of fixed tissue.[7]

Prior to RNA-Seq, gene expression studies were done with hybridization-based microarrays. Issues with microarrays include cross-hybridization artifacts, poor quantification of lowly and highly expressed genes, and the need to know the sequence a priori.[8] Because of these technical issues, transcriptomics transitioned to sequencing-based methods. These progressed from Sanger sequencing of expressed sequence tag (EST) libraries, to chemical tag-based methods (e.g., serial analysis of gene expression), and finally to the current technology, next-generation sequencing of cDNA (notably RNA-Seq).

Methods

Library preparation

Overview of RNA-Seq.

The general steps to prepare a complementary DNA (cDNA) library for sequencing are described below, but they often vary between platforms.[9][4][10]

  1. RNA isolation: RNA is isolated from tissue and mixed with deoxyribonuclease (DNase), which reduces the amount of genomic DNA. The extent of RNA degradation is checked with gel and capillary electrophoresis and used to assign an RNA integrity number (RIN) to the sample. This RNA quality and the total amount of starting RNA are taken into consideration during the subsequent library preparation, sequencing, and analysis steps.
  2. RNA selection/depletion: To analyze signals of interest, the isolated RNA can be kept as is, filtered for RNA with 3' polyadenylated (poly(A)) tails to include only mRNA, depleted of ribosomal RNA (rRNA), and/or filtered for RNA that binds specific sequences (see the RNA selection and depletion methods table below). RNAs with 3' poly(A) tails are mostly mature, processed, coding transcripts. Poly(A) selection is performed by mixing RNA with poly(dT) oligomers covalently attached to a substrate, typically magnetic beads.[2][11] Poly(A) selection misses non-polyadenylated RNA, including many noncoding RNAs, and introduces 3' bias,[12] both of which are avoided with the ribosomal depletion strategy. rRNA is removed because it represents over 90% of the RNA in a cell and, if kept, would drown out other data in the transcriptome.
  3. cDNA synthesis: RNA is reverse transcribed to cDNA because DNA is more stable, can be amplified with DNA polymerases, and can be read with more mature DNA sequencing technology. Amplification after reverse transcription results in loss of strandedness, which can be avoided with chemical labeling or single-molecule sequencing. Fragmentation and size selection are performed to purify sequences of the appropriate length for the sequencing machine. The RNA, cDNA, or both are fragmented with enzymes, sonication, or nebulizers. Fragmenting the RNA reduces the 5' bias of randomly primed reverse transcription and the influence of primer binding sites,[11] with the downside that the 5' and 3' ends are converted to DNA less efficiently. Fragmentation is followed by size selection, in which either small sequences are removed or a tight range of sequence lengths is selected. Because small RNAs such as miRNAs are lost at this step, they are analyzed separately. The cDNA for each experiment can be indexed with a hexamer or octamer barcode so that several experiments can be pooled into a single lane for multiplexed sequencing.
RNA selection and depletion methods:[9]

Strategy | Type of RNA | Ribosomal RNA content | Unprocessed RNA content | Genomic DNA content | Isolation method
Total RNA | All | High | High | High | None
PolyA selection | Coding | Low | Low | Low | Hybridization with poly(dT) oligomers
rRNA depletion | Coding, noncoding | Low | High | High | Removal of oligomers complementary to rRNA
RNA capture | Targeted | Low | Moderate | Low | Hybridization with probes complementary to desired transcripts
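Pooling barcoded libraries means the sequencer output must later be split back into per-sample sets of reads. The sketch below illustrates that demultiplexing step under simplifying assumptions: each read is a hypothetical `(index_sequence, read_sequence)` tuple, the sample barcodes are octamers chosen purely for illustration, and one mismatch is tolerated to absorb sequencing errors in the index. Reads that match no barcode, or more than one, are set aside as undetermined.

```python
from collections import defaultdict

def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def demultiplex(reads, sample_barcodes, max_mismatches=1):
    """Assign each read to the sample whose barcode matches its index.

    reads: iterable of (index_sequence, read_sequence) tuples (hypothetical format).
    sample_barcodes: dict mapping sample name -> barcode sequence.
    Reads matching no barcode, or matching more than one barcode within the
    mismatch limit, go to the "undetermined" bin instead of being guessed.
    """
    bins = defaultdict(list)
    for index, seq in reads:
        hits = [name for name, bc in sample_barcodes.items()
                if len(bc) == len(index) and hamming(bc, index) <= max_mismatches]
        bins[hits[0] if len(hits) == 1 else "undetermined"].append(seq)
    return dict(bins)

# Hypothetical octamer barcodes and reads, for illustration only.
barcodes = {"control": "ACGTACGT", "treated": "TGCATGCA"}
reads = [("ACGTACGT", "GATTACA"), ("ACGTACGA", "CCGGTTA"),  # exact, 1 mismatch
         ("TGCATGCA", "TTAGGCA"), ("NNNNNNNN", "AAAAAAA")]  # exact, unreadable
print(demultiplex(reads, barcodes))
```

Tolerating a mismatch is only safe when the barcodes in a pool differ from one another at several positions; otherwise a single sequencing error could move a read into the wrong sample, which is why ambiguous matches are rejected here.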

Small RNA/non-coding RNA sequencing

When sequencing RNA other than mRNA, the library preparation is modified. The cellular RNA is selected based on the desired size range. For small RNA targets, such as miRNA, the RNA is isolated through size selection, which can be performed with a size exclusion gel, size selection magnetic beads, or a commercially developed kit. Once the RNA is isolated, linkers are added to the 3' and 5' ends and the product is purified. The final step is cDNA generation through reverse transcription.

Direct RNA sequencing


Because converting RNA into cDNA, ligation, amplification, and other sample manipulations have been shown to introduce biases and artifacts that may interfere with both the proper characterization and quantification of transcripts,[13] single-molecule direct RNA sequencing has been explored by companies including Helicos (now bankrupt), Oxford Nanopore Technologies,[14] and others. This technology sequences RNA molecules directly in a massively parallel manner.

Single-cell RNA sequencing (scRNA-seq)

Standard methods such as microarrays and bulk RNA-Seq measure the expression of RNAs from large populations of cells. In mixed cell populations, these measurements may obscure critical differences between individual cells within these populations.[15][16]

Single-cell RNA sequencing (scRNA-seq) provides the expression profiles of individual cells. Although it is not possible to obtain complete information on every RNA expressed by each cell, due to the small amount of material available, patterns of gene expression can be identified through gene clustering analyses. This can uncover rare cell types within a cell population that may never have been observed before. For example, rare specialized cells in the lung called pulmonary ionocytes, which express the cystic fibrosis transmembrane conductance regulator, were identified in 2018 by two groups performing scRNA-seq on lung airway epithelia.[17][18]

Experimental procedures

Single-cell RNA sequencing workflow

Current scRNA-seq protocols involve the following steps: isolation of single cells and their RNA, reverse transcription (RT), amplification, library generation, and sequencing. Early methods separated individual cells into separate wells; more recent methods encapsulate individual cells in droplets in a microfluidic device, where the reverse transcription reaction converts RNAs to cDNAs. Each droplet carries a DNA "barcode" that uniquely labels the cDNAs derived from a single cell. Once reverse transcription is complete, the cDNAs from many cells can be mixed together for sequencing; transcripts from a particular cell are identified by the unique barcode.[19][20]

Challenges for scRNA-seq include preserving the initial relative abundance of mRNA in a cell and identifying rare transcripts.[21] The reverse transcription step is critical, as the efficiency of the RT reaction determines how much of the cell's RNA population will eventually be analyzed by the sequencer. The processivity of reverse transcriptases and the priming strategies used may affect full-length cDNA production and lead to libraries biased toward the 3' or 5' end of genes.

In the amplification step, either PCR or in vitro transcription (IVT) is currently used to amplify cDNA. One advantage of PCR-based methods is the ability to generate full-length cDNA. However, differences in PCR efficiency between sequences (for instance, due to GC content or snapback structure) are also amplified exponentially, producing libraries with uneven coverage. On the other hand, while libraries generated by IVT avoid PCR-induced sequence bias, specific sequences may be transcribed inefficiently, causing sequence drop-out or generating incomplete sequences.[22][15] Several scRNA-seq protocols have been published: Tang et al.,[23] STRT,[24] SMART-seq,[25] CEL-seq,[26] RAGE-seq,[27] Quartz-seq[28] and C1-CAGE.[29] These protocols differ in their strategies for reverse transcription, cDNA synthesis and amplification, and in whether they can accommodate sequence-specific barcodes (i.e., UMIs) or process pooled samples.[30]
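Because cDNA is amplified after barcoding, one captured mRNA molecule can give rise to many reads. When a protocol attaches unique molecular identifiers (UMIs), these duplicates can be collapsed so that each molecule is counted once. The sketch below assumes each read has already been tagged with a cell barcode, a UMI and the gene it maps to, in a hypothetical simplified tuple format, and counts distinct UMIs per cell and gene. Production pipelines additionally correct sequencing errors in barcodes and UMIs, which this ignores.

```python
from collections import defaultdict

def umi_count_matrix(records):
    """Build a cell x gene count matrix by counting distinct UMIs.

    records: iterable of (cell_barcode, umi, gene) tuples, one per aligned read.
    Reads sharing the same cell barcode, UMI and gene are treated as PCR
    duplicates of a single original molecule and counted once.
    """
    molecules = defaultdict(set)  # (cell, gene) -> set of UMIs observed
    for cell, umi, gene in records:
        molecules[(cell, gene)].add(umi)
    counts = defaultdict(dict)
    for (cell, gene), umis in molecules.items():
        counts[cell][gene] = len(umis)
    return dict(counts)

# Hypothetical reads; barcodes, UMIs and gene names are illustrative only.
reads = [
    ("CELL1", "AAGT", "GENE_A"), ("CELL1", "AAGT", "GENE_A"),  # PCR duplicates
    ("CELL1", "CCTA", "GENE_A"), ("CELL1", "GGAT", "GENE_B"),
    ("CELL2", "AAGT", "GENE_A"),
]
print(umi_count_matrix(reads))
```

The two reads sharing `CELL1`, `AAGT` and `GENE_A` collapse into one molecule, so `GENE_A` is counted twice rather than three times in `CELL1`, while the same UMI in `CELL2` still counts as a separate molecule.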

In 2017, two approaches, REAP-seq[31] and CITE-seq,[32] were introduced to simultaneously measure single-cell mRNA and protein expression using oligonucleotide-labeled antibodies.

Applications

scRNA-seq is becoming widely used across biological disciplines, including development, neurology,[33] oncology,[34][35][36] autoimmune disease,[37] and infectious disease.[38]

scRNA-seq has provided considerable insight into the development of embryos and organisms, including the worm Caenorhabditis elegans[39] and the regenerative planarian Schmidtea mediterranea.[40][41] The first vertebrates to be mapped in this way were the zebrafish[42][43] and Xenopus laevis.[44] In each case multiple stages of the embryo were studied, allowing the entire process of development to be mapped on a cell-by-cell basis.[9] Science recognized these advances as the 2018 Breakthrough of the Year.[45]

Experimental considerations

A variety of parameters are considered when designing and conducting RNA-Seq experiments:

  • Tissue specificity: Gene expression varies within and between tissues, and RNA-Seq measures this mix of cell types. This may make it difficult to isolate the biological mechanism of interest. Single cell sequencing can be used to study each cell individually, mitigating this issue.
  • Time dependence: Gene expression changes over time, and RNA-Seq only takes a snapshot. Time course experiments can be performed to observe changes in the transcriptome.
  • Coverage (also known as depth): Mutations present in DNA are also carried by the transcribed RNA, but detecting them from RNA requires deeper coverage. With high enough coverage, RNA-Seq can be used to estimate the expression of each allele, which may provide insight into phenomena such as imprinting or cis-regulatory effects. The depth of sequencing required for specific applications can be extrapolated from a pilot experiment.[46]
  • Data generation artifacts (also known as technical variance): The reagents (e.g., library preparation kit), personnel involved, and type of sequencer (e.g., Illumina, Pacific Biosciences) can result in technical artifacts that might be misinterpreted as meaningful results. As with any scientific experiment, it is prudent to conduct RNA-Seq in a well-controlled setting. If this is not possible, or the study is a meta-analysis, another solution is to detect technical artifacts by inferring latent variables (typically with principal component analysis or factor analysis) and subsequently correcting for these variables.[47]
  • Data management: A single RNA-Seq experiment in humans is usually on the order of 1 Gb.[48] This large volume of data can pose storage issues. One solution is to compress the data using general-purpose compression schemes (e.g., gzip) or genomics-specific schemes, the latter of which can be reference-based or de novo. Another solution is to perform microarray experiments, which may be sufficient for hypothesis-driven work or replication studies (as opposed to exploratory research).
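To illustrate how coverage enables allele-level estimates: at a heterozygous site, the fraction of reads carrying each allele can be compared against the 50:50 ratio expected when both alleles are expressed equally. The following sketch applies an exact two-sided binomial test using only the standard library. The read counts are hypothetical, and the test ignores reference-mapping bias and overdispersion, which real allele-specific expression analyses have to model.

```python
from math import comb

def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = probs[k]
    # A small relative tolerance keeps exact ties from being dropped by rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))

def allelic_imbalance(ref_reads: int, alt_reads: int):
    """Return the alternate-allele fraction and a binomial test against 0.5."""
    n = ref_reads + alt_reads
    return alt_reads / n, binomial_two_sided_p(alt_reads, n)

# Hypothetical read counts at two heterozygous sites.
frac, p = allelic_imbalance(ref_reads=48, alt_reads=52)     # roughly balanced
frac2, p2 = allelic_imbalance(ref_reads=85, alt_reads=15)   # strongly skewed
print(f"balanced: {frac:.2f} (p={p:.3f}); skewed: {frac2:.2f} (p={p2:.1e})")
```

With only a handful of reads the same 85:15 skew would not reach significance, which is the practical reason allele-level questions call for deeper sequencing.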

Analysis

Diagram outlining the RNASeq analyses described in this section

Transcriptome assembly

Two methods are used to assign raw sequence reads to genomic features (i.e., assemble the transcriptome):

  • De novo: This approach does not require a reference genome to reconstruct the transcriptome, and is typically used if the genome is unknown, incomplete, or substantially altered compared to the reference.[49] Challenges when using short reads for de novo assembly include 1) determining which reads should be joined together into contiguous sequences (contigs), 2) robustness to sequencing errors and other artifacts, and 3) computational efficiency. The primary algorithm used for de novo assembly transitioned from overlap graphs, which identify all pairwise overlaps between reads, to de Bruijn graphs, which break reads into sequences of length k (k-mers) and collapse all k-mers into a hash table.[50] Overlap graphs were used with Sanger sequencing, but do not scale well to the millions of reads generated with RNA-Seq. Examples of assemblers that use de Bruijn graphs are Velvet,[51] Trinity,[49] Oases,[52] and Bridger.[53] Paired-end and long-read sequencing of the same sample can mitigate the deficits of short-read sequencing by serving as a template or skeleton. Metrics to assess the quality of a de novo assembly include median contig length, number of contigs, and N50.[54]
RNA-Seq mapping of short reads over exon-exon junctions. The sequenced mature mRNA lacks the intronic sections of the pre-mRNA.
  • Genome guided: This approach relies on the same methods used for DNA alignment, with the additional complexity of aligning reads that cover non-contiguous portions of the reference genome.[55] These non-contiguous reads result from sequencing spliced transcripts (see figure). Typically, alignment algorithms have two steps: 1) align short portions of the read (i.e., seed the genome), and 2) use dynamic programming to find an optimal alignment, sometimes in combination with known annotations. Software tools that use genome-guided alignment include Bowtie,[56] TopHat (which builds on Bowtie results to align splice junctions),[57][58] Subread,[59] STAR,[55] HISAT2,[60] and GMAP.[63] Sailfish[61] and Kallisto[62] are related tools that instead quantify reads against a reference transcriptome without performing full alignment. The quality of a genome-guided assembly can be measured with both 1) de novo assembly metrics (e.g., N50) and 2) comparisons to known transcript, splice junction, genome, and protein sequences using precision, recall, or their combination (e.g., F1 score).[54] In addition, in silico assessment can be performed using simulated reads.[64][65]
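The de Bruijn idea behind de novo assemblers can be illustrated in a few lines. The toy sketch below breaks reads into k-mers, stores each distinct k-mer once (a Python set standing in for the hash table), links each k-mer's (k-1)-base prefix to its suffix as a graph edge, and walks unambiguous, non-branching paths to spell out contigs. It deliberately ignores sequencing errors, reverse complements, coverage and isolated cycles; assemblers such as Velvet or Trinity add substantial error handling and graph simplification on top of this basic construction.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])  # repeated k-mers collapse, as in a hash table
    graph = defaultdict(set)
    indegree = defaultdict(int)
    for kmer in kmers:
        graph[kmer[:-1]].add(kmer[1:])
        indegree[kmer[1:]] += 1
    return graph, indegree

def unitigs(graph, indegree):
    """Spell out maximal non-branching paths as contigs."""
    contigs = []
    nodes = sorted(set(graph) | set(indegree))
    for start in nodes:
        # Interior nodes of a simple chain (one edge in, one edge out) never start a contig.
        if indegree[start] == 1 and len(graph[start]) == 1:
            continue
        for nxt in sorted(graph[start]):
            contig = start + nxt[-1]
            # Extend while the path stays unambiguous.
            while indegree[nxt] == 1 and len(graph[nxt]) == 1:
                nxt = next(iter(graph[nxt]))
                contig += nxt[-1]
            contigs.append(contig)
    return contigs

# Three overlapping hypothetical reads from one transcript, k = 4.
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
print(unitigs(*build_de_bruijn(reads, k=4)))
```

Here the overlapping reads are merged into the single contig `ATGGCGTGCAA` without ever computing pairwise read overlaps, which is the scaling advantage over overlap graphs described above.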

A note on assembly quality: The current consensus is that 1) assembly quality can vary depending on which metric is used, 2) assemblies that score well in one species do not necessarily perform well in other species, and 3) combining different approaches might be the most reliable.[66][67]
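Two of the assembly metrics mentioned above are straightforward to compute. The sketch below calculates N50 (the contig length L such that contigs of length L or longer together contain at least half of the assembled bases) and the precision, recall and F1 score of predicted features, such as splice junctions, against a set of known ones. Junctions are represented as hypothetical `(chromosome, start, end)` tuples and all values are illustrative.

```python
def n50(contig_lengths):
    """Contig length at which the cumulative length, summed from the longest
    contig downward, first reaches half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly

def precision_recall_f1(predicted, reference):
    """Compare predicted features (e.g. splice junctions) with known ones."""
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

print(n50([100, 200, 300, 400, 500]))

# Hypothetical known and predicted junctions.
known = {("chr1", 1200, 1850), ("chr1", 2100, 2600), ("chr2", 500, 900)}
found = {("chr1", 1200, 1850), ("chr2", 500, 900), ("chr2", 950, 1300)}
print(precision_recall_f1(found, known))
```

N50 rewards long contigs but says nothing about whether they are correct, whereas precision and recall need a trusted reference; this is one concrete reason the consensus above recommends looking at more than one metric.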

Gene expression

Expression is quantified to study cellular changes in response to external stimuli, differences between healthy and diseased states, and other research questions. Gene expression is often used as a proxy for protein abundance, but the two are often not equivalent because of post-transcriptional events such as RNA interference and nonsense-mediated decay.[68]

Expression is quantified by counting the number of reads that mapped to each locus in the transcriptome assembly step. Expression can be quantified for exons or genes using contigs or reference transcript annotations.[9] These observed RNA-Seq read counts have been robustly validated against older technologies, including expression microarrays and qPCR.[46][69] Tools that quantify counts include HTSeq,[70] featureCounts,[71] Rcount,[72] maxcounts,[73] FIXSEQ,[74] and Cuffquant. The read counts are then converted into appropriate metrics for hypothesis testing, regressions, and other analyses. Parameters for this conversion are:

  • Sequencing depth/coverage: Although depth is pre-specified when conducting multiple RNA-Seq experiments, it still varies widely between experiments.[75] Therefore, counts are typically normalized by the total number of reads generated in each experiment, converting them to fragments, reads, or counts per million mapped reads (FPM, RPM, or CPM). Sequencing depth is sometimes referred to as library size, the number of intermediary cDNA molecules in the experiment.
  • Gene length: Longer genes will have more fragments/reads/counts than shorter genes if transcript expression is the same. This is adjusted for by dividing the FPM by the length of the gene in kilobases, giving fragments per kilobase of transcript per million mapped reads (FPKM).[76] When comparing groups of genes across samples, FPKM is converted to transcripts per million (TPM) by dividing each FPKM by the sum of FPKMs within the sample and multiplying by one million.[77][78][79]
  • Total sample RNA output: Because the same amount of RNA is extracted from each sample, samples with more total RNA will have less RNA per gene. These genes appear to have decreased expression, resulting in false positives in downstream analyses.[75]
  • Variance for each gene's expression: Variance is modeled to account for sampling error (important for genes with low read counts), increase power, and decrease false positives. It can be modeled with a normal, Poisson, or negative binomial distribution.[80][81][82]
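The depth and length adjustments described above can be written out explicitly. The sketch below computes CPM, FPKM and TPM for a single sample from made-up counts and gene lengths (in base pairs). In this example geneA and geneB receive very different raw counts but end up with identical FPKM and TPM once their lengths are taken into account.

```python
def cpm(counts):
    """Counts per million mapped reads: corrects for sequencing depth only."""
    total = sum(counts.values())
    return {g: c / total * 1e6 for g, c in counts.items()}

def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    per_million = cpm(counts)
    return {g: per_million[g] / (lengths_bp[g] / 1e3) for g in counts}

def tpm(counts, lengths_bp):
    """Transcripts per million: each FPKM divided by the sample's FPKM sum, times 1e6."""
    f = fpkm(counts, lengths_bp)
    total = sum(f.values())
    return {g: v / total * 1e6 for g, v in f.items()}

# Hypothetical counts and gene lengths for one sample.
counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 4000}
print(cpm(counts))
print(fpkm(counts, lengths))
print(tpm(counts, lengths))
```

Because TPM values always sum to one million within a sample, they describe the relative composition of each sample on a common scale, whereas FPKM sums can differ from sample to sample.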

Differential expression and absolute quantification of transcripts

RNA-Seq is generally used to compare gene expression between conditions, such as drug-treated vs. untreated samples, and to find out which genes are up- or down-regulated in each condition. In principle, RNA-Seq makes it possible to account for all the transcripts in the cell for each condition. Differentially expressed genes can be identified using tools that count the sequencing reads per gene and compare them between samples. Many packages are available for this type of analysis;[83] some of the most commonly used are DESeq[84] and edgeR,[82] both Bioconductor packages.[85][86] Both tools use a model based on the negative binomial distribution.[84][82]

Notably, in a comparison of 11 RNA-Seq differential expression packages, TMM normalization and the normalization provided by the DESeq package were the only two count normalization methods that showed satisfactory results with respect to all metrics used in the evaluation. DESeq can therefore be a useful package for normalizing counts.[87][88]
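The normalization used by DESeq, often called the median-of-ratios method, estimates one size factor per sample. A minimal sketch, assuming a small in-memory count table with one list of raw counts per gene (one entry per sample): compute each gene's geometric mean across samples, then take, for each sample, the median of its counts divided by those geometric means (computed here in log space). Genes with a zero count in any sample are skipped because their geometric mean is zero. This illustrates the idea only and is not a substitute for the Bioconductor implementation.

```python
from math import exp, log
from statistics import median

def size_factors(count_table):
    """Median-of-ratios size factors.

    count_table: dict gene -> list of raw counts, one per sample (same order).
    Dividing a sample's counts by its factor puts all samples on a common scale.
    """
    n_samples = len(next(iter(count_table.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for counts in count_table.values():
        if min(counts) == 0:
            continue  # geometric mean would be zero
        log_geo_mean = sum(log(c) for c in counts) / n_samples
        for j, c in enumerate(counts):
            log_ratios[j].append(log(c) - log_geo_mean)
    return [exp(median(r)) for r in log_ratios]

# Hypothetical counts: sample 2 and 3 were sequenced 2x and 3x deeper than sample 1.
table = {
    "gene1": [10, 20, 30],
    "gene2": [100, 200, 300],
    "gene3": [5, 10, 15],
    "gene4": [0, 7, 3],  # excluded: contains a zero
}
factors = size_factors(table)
print([round(f, 3) for f in factors])
```

Using the median rather than the total count makes the factors robust to a few very highly expressed or strongly differentially expressed genes, which would otherwise dominate a simple library-size correction.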

It is not possible to do absolute quantification using the common RNA-Seq pipeline, because it only provides RNA levels relative to all transcripts. If the total amount of RNA in the cell changes between conditions, relative normalization will misrepresent the changes for individual transcripts. Absolute quantification of mRNAs is possible by performing RNA-Seq with added spike-ins, RNA samples of known concentration. After sequencing, the read counts of the spike-in sequences are used to determine the direct correspondence between read count and biological fragments.[11][89] In developmental studies, this technique has been used in Xenopus tropicalis embryos at high temporal resolution to determine transcription kinetics.[90]
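The spike-in calibration step can be sketched as a simple regression. Assuming, for illustration, that read counts scale as a power law of input molecules (a straight line on a log-log scale), a line is fitted to the spike-ins' log read counts against their known log amounts, and the fitted curve is inverted to estimate molecule numbers for an endogenous transcript. The spike-in amounts and counts below are hypothetical, and real analyses must also handle transcript length, detection limits and noise at low counts, none of which is modeled here.

```python
from math import log10

def fit_loglog(known_amounts, observed_counts):
    """Least-squares fit of log10(count) = slope * log10(amount) + intercept."""
    xs = [log10(a) for a in known_amounts]
    ys = [log10(c) for c in observed_counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def estimate_amount(count, slope, intercept):
    """Invert the calibration curve to estimate input molecules from reads."""
    return 10 ** ((log10(count) - intercept) / slope)

# Hypothetical spike-ins: molecules added per sample and the reads they produced.
spike_molecules = [1e2, 1e3, 1e4, 1e5]
spike_reads = [20, 200, 2000, 20000]
slope, intercept = fit_loglog(spike_molecules, spike_reads)
print(round(estimate_amount(500, slope, intercept)))
```

Because the spike-ins are added in fixed amounts independently of the sample's own RNA content, the calibration still holds when total cellular RNA changes between conditions, which is exactly the case in which relative normalization fails.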

Coexpression networks

Coexpression networks are data-derived representations of genes behaving in a similar way across tissues and experimental conditions.[91] Their main purpose lies in hypothesis generation and guilt-by-association approaches for inferring the functions of previously unknown genes.[91] RNA-Seq data has been used to infer genes involved in specific pathways based on Pearson correlation, both in plants[92] and mammals.[93] The main advantage of RNA-Seq data over microarray platforms in this kind of analysis is its ability to cover the entire transcriptome, which allows more complete representations of gene regulatory networks to be uncovered. Differential regulation of the splice isoforms of the same gene can be detected and used to predict their biological functions.[94][95] Weighted gene co-expression network analysis has been successfully used to identify co-expression modules and intramodular hub genes based on RNA-Seq data. Co-expression modules may correspond to cell types or pathways. Highly connected intramodular hubs can be interpreted as representatives of their respective module. An eigengene is a weighted sum of the expression of all genes in a module. Eigengenes are useful biomarkers (features) for diagnosis and prognosis.[96] Variance-stabilizing transformation approaches for estimating correlation coefficients based on RNA-Seq data have been proposed.[92]
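A correlation-based coexpression network of the kind described above can be sketched as follows: compute the Pearson correlation between every pair of genes across samples, connect pairs whose absolute correlation meets a threshold, and count each gene's connections to find hub candidates. The expression values and the 0.9 cutoff are arbitrary assumptions for illustration. Weighted approaches such as WGCNA instead raise correlations to a power (soft thresholding) rather than applying a hard cutoff.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_edges(expression, threshold=0.9):
    """Return gene pairs whose absolute correlation meets the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges

def connectivity(edges):
    """Number of neighbours per gene; highly connected genes are hub candidates."""
    degree = {}
    for g1, g2, _ in edges:
        degree[g1] = degree.get(g1, 0) + 1
        degree[g2] = degree.get(g2, 0) + 1
    return degree

# Hypothetical normalized expression of four genes across five samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],
    "geneC": [5.0, 4.1, 2.9, 2.0, 1.1],
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],
}
edges = coexpression_edges(expr)
print(edges)
print(connectivity(edges))
```

In this toy data geneA, geneB and geneC form one tightly coexpressed module (geneC is anti-correlated with the other two), while geneD stays unconnected; a guilt-by-association analysis would then ask whether an uncharacterized member of such a module shares the known function of its neighbours.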

Single nucleotide variation discovery

Transcriptome single nucleotide variation has been analyzed in maize on the Roche 454 sequencing platform.[97] Directly from the transcriptome analysis, around 7,000 single nucleotide polymorphisms (SNPs) were recognized. Following Sanger sequencing validation, the researchers were able to conservatively obtain almost 5,000 valid SNPs covering more than 2,400 maize genes. RNA-Seq is limited to transcribed regions, however, since it will only discover sequence variations in exon regions. This misses many subtle but important intronic alleles that affect disease, for example through transcriptional regulation, leaving analysis to only large effectors. While some correlation exists between exon and intron variation, only whole genome sequencing would be able to capture the source of all relevant SNPs.[98]

The only way to be absolutely sure of an individual's mutations is to compare the transcriptome sequences with the germline DNA sequence. This makes it possible to distinguish homozygous genes from skewed expression of one of the alleles, and it can also provide information about genes that were not expressed in the transcriptomic experiment. An R-based statistical package known as CummeRbund[99] can be used to generate expression comparison charts for visual analysis.

RNA editing (post-transcriptional alterations)

Having the matching genomic and transcriptomic sequences of an individual can also help in detecting post-transcriptional edits:[4] if the individual is homozygous for a gene but the gene's transcript carries a different allele, a post-transcriptional modification event is inferred.
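The comparison described above can be expressed as a simple rule applied at each position where both DNA and RNA data are available. The sketch below assumes, hypothetically, that germline alleles and per-base RNA read counts have already been called for each site. It flags sites where the DNA is homozygous but a sufficient fraction of RNA reads show a different base. The depth and fraction thresholds are arbitrary illustrative choices, and real pipelines must also filter sequencing errors, mapping artifacts and low-quality bases.

```python
def classify_site(dna_alleles, rna_base_counts, min_fraction=0.1, min_depth=10):
    """Classify one genomic position using matched DNA and RNA data.

    dna_alleles: set of bases called from germline DNA, e.g. {"A"} or {"A", "G"}.
    rna_base_counts: dict base -> number of RNA-Seq reads showing that base.
    min_fraction, min_depth: illustrative thresholds, not established defaults.
    """
    depth = sum(rna_base_counts.values())
    if depth < min_depth:
        return "insufficient coverage"
    if len(dna_alleles) > 1:
        return "heterozygous in DNA (not informative for editing)"
    # Bases seen in RNA that are absent from the homozygous DNA genotype.
    novel = {b: n for b, n in rna_base_counts.items() if b not in dna_alleles}
    if any(n / depth >= min_fraction for n in novel.values()):
        return "candidate post-transcriptional edit"
    return "RNA matches DNA"

# Hypothetical sites.
print(classify_site({"A"}, {"A": 30, "G": 12}))
print(classify_site({"A"}, {"A": 40, "G": 1}))
print(classify_site({"A", "G"}, {"A": 20, "G": 20}))
print(classify_site({"C"}, {"C": 3}))
```

The heterozygous case is excluded because a second base in the RNA is then expected from the DNA itself; only a homozygous genotype makes a mismatching transcript evidence of post-transcriptional change.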

mRNA-centric single nucleotide variants (SNVs) are generally not considered a representative source of functional variation in cells, mainly because these mutations disappear with the mRNA molecule; however, because efficient DNA correction mechanisms do not apply to RNA molecules, such variants can appear more often. This has been proposed as the source of certain prion diseases,[100] also known as transmissible spongiform encephalopathies (TSEs).

RNA-Seq mapping of short reads over exon-exon junctions. Depending on where each end maps, the event can be classified as trans or cis.

Fusion gene detection

Caused by different structural modifications in the genome, fusion genes have gained attention because of their relationship with cancer.[101] The ability of RNA-seq to analyze a sample's whole transcriptome in an unbiased fashion makes it an attractive tool to find these kinds of common events in cancer.[5]

The idea follows from the process of aligning the short transcriptomic reads to a reference genome. Most of the short reads will fall within one complete exon, and a smaller but still large set would be expected to map to known exon-exon junctions. The remaining unmapped short reads would then be further analyzed to determine whether they match an exon-exon junction where the exons come from different genes. This would be evidence of a possible fusion event; however, because of the short read length, this approach can be very noisy. An alternative is to use paired-end reads, in which a potentially large number of read pairs would have each end mapped to a different exon, giving better coverage of these events (see figure). Nonetheless, the end result consists of multiple and potentially novel combinations of genes, providing an ideal starting point for further validation.
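The paired-end strategy can be sketched as counting discordant read pairs, in which the two mates map to different genes, and reporting gene pairs with enough supporting pairs. The input format below (one tuple of gene names per aligned pair), the gene names and the support threshold of three pairs are hypothetical simplifications; dedicated fusion callers also use reads that span the junction itself, mapping quality and the orientation of the partner genes.

```python
from collections import Counter

def candidate_fusions(mate_gene_pairs, min_support=3):
    """Count read pairs whose two mates map to different genes.

    mate_gene_pairs: iterable of (gene_of_mate1, gene_of_mate2) tuples.
    Gene pairs are made order-independent, so X-Y and Y-X support the same
    candidate. Returns [(gene_pair, supporting_read_pairs)], most supported first.
    """
    support = Counter(
        tuple(sorted((g1, g2))) for g1, g2 in mate_gene_pairs if g1 != g2
    )
    return [(pair, n) for pair, n in support.most_common() if n >= min_support]

# Hypothetical aligned pairs: most are concordant, a few are discordant.
pairs = ([("GENE_X", "GENE_Y")] * 4 + [("GENE_Y", "GENE_X")] * 2 +
         [("GENE_H", "GENE_H")] * 50 + [("GENE_Z", "GENE_W")])
print(candidate_fusions(pairs))
```

The single `GENE_Z`/`GENE_W` pair is filtered out as likely noise, while six pairs support the `GENE_X`/`GENE_Y` candidate, which would then be a target for the further validation described above.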

Application to genomic medicine

History

Recent years have seen a flourishing of NGS-based methods for genome analysis, leading to the discovery of a number of new mutations and fusion transcripts in cancer. RNA-Seq data could help researchers interpret the "personalized transcriptome" and understand the transcriptomic changes taking place, ideally identifying gene drivers for a disease. The feasibility of this approach is, however, dictated by its costs in money and time.

A basic PubMed search for the term RNA-Seq, queried as "RNA Seq" OR "RNA-Seq" OR "RNA sequencing" OR "RNASeq" to capture the most common ways of phrasing it, gives 5,425 hits, illustrating how widely this technology is used. A few examples are considered below to show that clinical applications of RNA-Seq have the potential to significantly affect patients' lives and, on the other hand, require a team of specialists (bioinformaticians, physicians/clinicians, basic researchers, technicians) to fully interpret the huge amount of data generated by this analysis.

As an example of clinical applications, researchers at the Mayo Clinic used an RNA-Seq approach to identify differentially expressed transcripts between oral cancer and normal tissue samples. They also accurately evaluated the allelic imbalance (AI), the ratio of the transcripts produced by the single alleles, within a subgroup of genes involved in cell differentiation, adhesion, cell motility and muscle contraction,[102] identifying a unique transcriptomic and genomic signature in oral cancer patients. Novel insight into skin cancer (melanoma) also comes from RNA-Seq of melanoma patients. This approach led to the identification of eleven novel gene fusion transcripts originating from previously unknown chromosomal rearrangements. Twelve novel chimeric transcripts were also reported, including seven that confirmed previously identified data in multiple melanoma samples.[103] Furthermore, this approach is not limited to cancer patients. RNA-Seq has been used to study other important chronic diseases such as Alzheimer's disease (AD) and diabetes. In the former case, Twine and colleagues compared the transcriptomes of different lobes of the brains of deceased AD patients with those of healthy individuals, identifying a lower number of splice variants in AD patients and differential promoter usage of the APOE-001 and -002 isoforms in AD brains.[104] In the latter case, different groups showed the uniqueness of the beta-cell transcriptome in diabetic patients in terms of transcript accumulation and differential promoter usage[105] and a long non-coding RNA (lncRNA) signature.[106]

Compared with microarrays, NGS technology has identified novel and low-frequency RNAs associated with disease processes. This advantage aids in the diagnosis and possible future treatment of diseases, including cancer. For example, NGS technology identified several previously undocumented differentially expressed transcripts in rats treated with AFB1, a potent hepatocarcinogen. Nearly 50 new differentially expressed transcripts were identified between the controls and AFB1-treated rats. Additionally, potential new exons were identified, including some that are responsive to AFB1. The next-generation sequencing pipeline identified more differentially expressed genes than microarrays, particularly when DESeq software was used. Cufflinks identified two novel transcripts that were not previously annotated in the Ensembl database; these transcripts were confirmed using PCR cloning.[107] A follow-up study identified twenty-five unannotated AFB1 transcripts from RNA-Seq as long noncoding RNAs.[108] Numerous other studies have demonstrated the ability of NGS to detect aberrant mRNA and small non-coding RNA expression in disease processes beyond that provided by microarrays. The lower cost and higher throughput offered by NGS confer another advantage to researchers.

The role of small non-coding RNAs in disease processes has also been explored in recent years. For example, Han et al. (2011) examined microRNA expression differences in bladder cancer patients in order to understand how changes and dysregulation in microRNA can influence mRNA expression and function. Several microRNAs were differentially expressed in the bladder cancer patients. Upregulation of the aberrant microRNAs was more common than downregulation in the cancer patients. One of the upregulated microRNAs, hsa-miR-96, has been associated with carcinogenesis, and several of the overexpressed microRNAs have also been observed in other cancers, including ovarian and cervical cancer. Some of the downregulated microRNAs in cancer samples were hypothesized to have inhibitory roles.[109]

ENCODE and TCGA

Considerable emphasis has been placed on RNA-Seq data since the Encyclopedia of DNA Elements (ENCODE) and The Cancer Genome Atlas (TCGA) projects used this approach to characterize dozens of cell lines[110] and thousands of primary tumor samples,[111] respectively. ENCODE aimed to identify genome-wide regulatory regions in different cohorts of cell lines, and transcriptomic data are paramount for understanding the downstream effects of those epigenetic and genetic regulatory layers. TCGA, by contrast, aimed to collect and analyze thousands of patient samples from 30 different tumor types in order to understand the underlying mechanisms of malignant transformation and progression. In this context, RNA-Seq data provide a unique snapshot of the transcriptomic status of the disease and cover an unbiased population of transcripts, allowing the identification of novel transcripts, fusion transcripts and non-coding RNAs that could go undetected with other technologies.

See also

References

  1. ^ Shafee T, Lowe R (2017). "Eukaryotic and prokaryotic gene structure". WikiJournal of Medicine. 4 (1). doi:10.15347/wjm/2017.002.
  2. ^ a b Morin R, Bainbridge M, Fejes A, Hirst M, Krzywinski M, Pugh T, et al. (July 2008). "Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing". BioTechniques. 45 (1): 81–94. doi:10.2144/000112900. PMID 18611170.
  3. ^ Chu Y, Corey DR (August 2012). "RNA sequencing: platform selection, experimental design, and data interpretation". Nucleic Acid Therapeutics. 22 (4): 271–4. doi:10.1089/nat.2012.0367. PMC 3426205. PMID 22830413.
  4. ^ a b c Wang Z, Gerstein M, Snyder M (January 2009). "RNA-Seq: a revolutionary tool for transcriptomics". Nature Reviews Genetics. 10 (1): 57–63. doi:10.1038/nrg2484. PMC 2949280. PMID 19015660.
  5. ^ a b Maher CA, Kumar-Sinha C, Cao X, Kalyana-Sundaram S, Han B, Jing X, Sam L, Barrette T, Palanisamy N, Chinnaiyan AM (March 2009). "Transcriptome sequencing to detect gene fusions in cancer". Nature. 458 (7234): 97–101. Bibcode:2009Natur.458...97M. doi:10.1038/nature07638. PMC 2725402. PMID 19136943.
  6. ^ Ingolia NT, Brar GA, Rouskin S, McGeachy AM, Weissman JS (July 2012). "The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments". Nature Protocols. 7 (8): 1534–50. doi:10.1038/nprot.2012.086. PMC 3535016. PMID 22836135.
  7. ^ Lee JH, Daugharthy ER, Scheiman J, Kalhor R, Yang JL, Ferrante TC, et al. (March 2014). "Highly multiplexed subcellular RNA sequencing in situ". Science. 343 (6177): 1360–3. Bibcode:2014Sci...343.1360L. doi:10.1126/science.1250212. PMC 4140943. PMID 24578530.
  8. ^ Kukurba KR, Montgomery SB (April 2015). "RNA Sequencing and Analysis". Cold Spring Harbor Protocols. 2015 (11): 951–69. doi:10.1101/pdb.top084970. PMC 4863231. PMID 25870306.
  9. ^ a b c d Griffith M, Walker JR, Spies NC, Ainscough BJ, Griffith OL (August 2015). "Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud". PLoS Computational Biology. 11 (8): e1004393. Bibcode:2015PLSCB..11E4393G. doi:10.1371/journal.pcbi.1004393. PMC 4527835. PMID 26248053.
  10. ^ "RNA-seqlopedia". rnaseq.uoregon.edu. Retrieved 2017-02-08.
  11. ^ a b c Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (July 2008). "Mapping and quantifying mammalian transcriptomes by RNA-Seq". Nature Methods. 5 (7): 621–8. doi:10.1038/nmeth.1226. PMID 18516045.
  12. ^ Chen EA, Souaiaia T, Herstein JS, Evgrafov OV, Spitsyna VN, Rebolini DF, Knowles JA (October 2014). "Effect of RNA integrity on uniquely mapped reads in RNA-Seq". BMC Research Notes. 7 (1): 753. doi:10.1186/1756-0500-7-753. PMC 4213542. PMID 25339126.
  13. ^ Liu D, Graber JH (February 2006). "Quantitative comparison of EST libraries requires compensation for systematic biases in cDNA generation". BMC Bioinformatics. 7: 77. doi:10.1186/1471-2105-7-77. PMC 1431573. PMID 16503995.
  14. ^ Garalde DR, Snell EA, Jachimowicz D, Sipos B, Lloyd JH, Bruce M, et al. (January 2018). "Highly parallel direct RNA sequencing on an array of nanopores". Nature Methods. 15 (3): 201–206. doi:10.1038/nmeth.4577.
  15. ^ a b Shapiro E, Biezuner T, Linnarsson S (September 2013). "Single-cell sequencing-based technologies will revolutionize whole-organism science". Nature Reviews. Genetics. 14 (9): 618–30. doi:10.1038/nrg3542. PMID 23897237.
  16. ^ Kolodziejczyk AA, Kim JK, Svensson V, Marioni JC, Teichmann SA (May 2015). "The technology and biology of single-cell RNA sequencing". Molecular Cell. 58 (4): 610–20. doi:10.1016/j.molcel.2015.04.005. PMID 26000846.
  17. ^ Montoro DT, Haber AL, Biton M, Vinarsky V, Lin B, Birket SE, Yuan F, Chen S, Leung HM, Villoria J, Rogel N, Burgin G, Tsankov AM, Waghray A, Slyper M, Waldman J, Nguyen L, Dionne D, Rozenblatt-Rosen O, Tata PR, Mou H, Shivaraju M, Bihler H, Mense M, Tearney GJ, Rowe SM, Engelhardt JF, Regev A, Rajagopal J (August 2018). "A revised airway epithelial hierarchy includes CFTR-expressing ionocytes". Nature. 560 (7718): 319–324. doi:10.1038/s41586-018-0393-7. PMC 6295155. PMID 30069044.
  18. ^ Plasschaert LW, Žilionis R, Choo-Wing R, Savova V, Knehr J, Roma G, Klein AM, Jaffe AB (August 2018). "A single-cell atlas of the airway epithelium reveals the CFTR-rich pulmonary ionocyte". Nature. 560 (7718): 377–381. doi:10.1038/s41586-018-0394-6. PMC 6108322. PMID 30069046.
  19. ^ Klein AM, Mazutis L, Akartuna I, Tallapragada N, Veres A, Li V, Peshkin L, Weitz DA, Kirschner MW (May 2015). "Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells". Cell. 161 (5): 1187–1201. doi:10.1016/j.cell.2015.04.044. PMC 4441768. PMID 26000487.
  20. ^ Macosko EZ, Basu A, Satija R, Nemesh J, Shekhar K, Goldman M, Tirosh I, Bialas AR, Kamitaki N, Martersteck EM, Trombetta JJ, Weitz DA, Sanes JR, Shalek AK, Regev A, McCarroll SA (May 2015). "Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets". Cell. 161 (5): 1202–1214. doi:10.1016/j.cell.2015.05.002. PMC 4481139. PMID 26000488.
  21. ^ Hebenstreit D (November 2012). "Methods, Challenges and Potentials of Single Cell RNA-seq". Biology. 1 (3): 658–67. doi:10.3390/biology1030658. PMC 4009822. PMID 24832513.
  22. ^ Eberwine J, Sul JY, Bartfai T, Kim J (January 2014). "The promise of single-cell sequencing". Nature Methods. 11 (1): 25–7. doi:10.1038/nmeth.2769. PMID 24524134.
  23. ^ Tang F, Barbacioru C, Wang Y, Nordman E, Lee C, Xu N, Wang X, Bodeau J, Tuch BB, Siddiqui A, Lao K, Surani MA (May 2009). "mRNA-Seq whole-transcriptome analysis of a single cell". Nature Methods. 6 (5): 377–82. doi:10.1038/NMETH.1315. PMID 19349980.
  24. ^ Islam S, Kjällquist U, Moliner A, Zajac P, Fan JB, Lönnerberg P, Linnarsson S (July 2011). "Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq". Genome Research. 21 (7): 1160–7. doi:10.1101/gr.110882.110. PMC 3129258. PMID 21543516.
  25. ^ Ramsköld D, Luo S, Wang YC, Li R, Deng Q, Faridani OR, Daniels GA, Khrebtukova I, Loring JF, Laurent LC, Schroth GP, Sandberg R (August 2012). "Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells". Nature Biotechnology. 30 (8): 777–82. doi:10.1038/nbt.2282. PMC 3467340. PMID 22820318.
  26. ^ Hashimshony T, Wagner F, Sher N, Yanai I (September 2012). "CEL-Seq: single-cell RNA-Seq by multiplexed linear amplification". Cell Reports. 2 (3): 666–73. doi:10.1016/j.celrep.2012.08.003. PMID 22939981.
  27. ^ Singh M, Al-Eryani G, Carswell S, Ferguson JM, Blackburn J, Barton K, Roden D, Luciani F, Phan T, Junankar S, Jackson K, Goodnow CC, Smith MA, Swarbrick A (2018). "High-throughput targeted long-read single cell sequencing reveals the clonal and transcriptional landscape of lymphocytes". bioRxiv. doi:10.1101/424945.
  28. ^ Sasagawa Y, Nikaido I, Hayashi T, Danno H, Uno KD, Imai T, Ueda HR (April 2013). "Quartz-Seq: a highly reproducible and sensitive single-cell RNA sequencing method, reveals non-genetic gene-expression heterogeneity". Genome Biology. 14 (4): R31. doi:10.1186/gb-2013-14-4-r31. PMC 4054835. PMID 23594475.
  29. ^ Shin JW, Plessy C, Carninci P, Arner E, Hon CC, Lassmann T, Kasukawa T, Suzuki H, West J (January 2019). "C1 CAGE detects transcription start sites and enhancer activity at single-cell resolution". Nature Communications. 10 (1): 360. doi:10.1038/s41467-018-08126-5. ISSN 2041-1723. PMC 6341120. PMID 30664627.
  30. ^ Dal Molin A, Di Camillo B (January 2018). "How to design a single-cell RNA-sequencing experiment: pitfalls, challenges and perspectives". Briefings in Bioinformatics: bby007. doi:10.1093/bib/bby007. PMID 29394315.
  31. ^ Klappenbach JA, Sadekova S, McClanahan TK, Moore R, Wilson DC, Li L, Wong J, Kumar N, Zhang KX (October 2017). "Multiplexed quantification of proteins and transcripts in single cells". Nature Biotechnology. 35 (10): 936–939. doi:10.1038/nbt.3973. ISSN 1546-1696. PMID 28854175.
  32. ^ Smibert P, Satija R, Swerdlow H, Chattopadhyay PK, Houck-Loomis B, Stephenson W, Hafemeister C, Stoeckius M (September 2017). "Simultaneous epitope and transcriptome measurement in single cells". Nature Methods. 14 (9): 865–868. doi:10.1038/nmeth.4380. ISSN 1548-7105. PMC 5669064. PMID 28759029.
  33. ^ Raj B, Wagner DE, McKenna A, Pandey S, Klein AM, Shendure J, Gagnon JA, Schier AF (June 2018). "Simultaneous single-cell profiling of lineages and cell types in the vertebrate brain". Nature Biotechnology. 36 (5): 442–450. doi:10.1038/nbt.4103. PMC 5938111. PMID 29608178.
  34. ^ Olmos D, Arkenau HT, Ang JE, Ledaki I, Attard G, Carden CP, Reid AH, A'Hern R, Fong PC, Oomen NB, Molife R, Dearnaley D, Parker C, Terstappen LW, de Bono JS (January 2009). "Circulating tumour cell (CTC) counts as intermediate end points in castration-resistant prostate cancer (CRPC): a single-centre experience". Annals of Oncology. 20 (1): 27–33. doi:10.1093/annonc/mdn544. PMID 18695026.
  35. ^ Sims PA, Yuan J, Levitin HM (April 2018). "Single-Cell Transcriptomic Analysis of Tumor Heterogeneity". Trends in Cancer. 4 (4): 264–268. doi:10.1016/j.trecan.2018.02.003. ISSN 2405-8033. PMC 5993208. PMID 29606308.
  36. ^ Regev A, Izar B, Yoon CH, Garraway LA, Rozenblatt-Rosen O, Rotem A, Johnson BE, Schadendorf D, Van Allen EM (November 2018). "A Cancer Cell Program Promotes T Cell Exclusion and Resistance to Checkpoint Blockade". Cell. 175 (4): 984–997.e24. doi:10.1016/j.cell.2018.09.006. ISSN 0092-8674. PMID 30388455.
  37. ^ Satija R, Swerdlow HP, Darnell RB, Orange DE, Bykerk VP, Ivashkiv LB, Goodman SM, Rashidfarrokhi A, Bracken B (February 2018). "Single-cell RNA-seq of rheumatoid arthritis synovial tissue using low-cost microfluidic instrumentation". Nature Communications. 9 (1): 791. doi:10.1038/s41467-017-02659-x. ISSN 2041-1723. PMC 5824814. PMID 29476078.
  38. ^ Avraham R, Haseley N, Brown D, Penaranda C, Jijon HB, Trombetta JJ, Satija R, Shalek AK, Xavier RJ, Regev A, Hung DT (September 2015). "Pathogen Cell-to-Cell Variability Drives Heterogeneity in Host Immune Responses". Cell. 162 (6): 1309–21. doi:10.1016/j.cell.2015.08.027. PMC 4578813. PMID 26343579.
  39. ^ Cao J, Packer JS, Ramani V, Cusanovich DA, Huynh C, Daza R, Qiu X, Lee C, Furlan SN, Steemers FJ, Adey A, Waterston RH, Trapnell C, Shendure J (August 2017). "Comprehensive single-cell transcriptional profiling of a multicellular organism". Science. 357 (6352): 661–667. doi:10.1126/science.aam8940. PMC 5894354. PMID 28818938.
  40. ^ Plass M, Solana J, Wolf FA, Ayoub S, Misios A, Glažar P, Obermayer B, Theis FJ, Kocks C, Rajewsky N (May 2018). "Cell type atlas and lineage tree of a whole complex animal by single-cell transcriptomics". Science. 360 (6391): eaaq1723. doi:10.1126/science.aaq1723. PMID 29674432.
  41. ^ Fincher CT, Wurtzel O, de Hoog T, Kravarik KM, Reddien PW (May 2018). "Cell type transcriptome atlas for the planarian Schmidtea mediterranea". Science. 360 (6391): eaaq1736. doi:10.1126/science.aaq1736. PMID 29674431.
  42. ^ Wagner DE, Weinreb C, Collins ZM, Briggs JA, Megason SG, Klein AM (June 2018). "Single-cell mapping of gene expression landscapes and lineage in the zebrafish embryo". Science. 360 (6392): 981–987. doi:10.1126/science.aar4362. PMC 6083445. PMID 29700229.
  43. ^ Farrell JA, Wang Y, Riesenfeld SJ, Shekhar K, Regev A, Schier AF (June 2018). "Single-cell reconstruction of developmental trajectories during zebrafish embryogenesis". Science. 360 (6392): eaar3131. doi:10.1126/science.aar3131. PMC 6247916. PMID 29700225.
  44. ^ Briggs JA, Weinreb C, Wagner DE, Megason S, Peshkin L, Kirschner MW, Klein AM (June 2018). "The dynamics of gene expression in vertebrate embryogenesis at single-cell resolution". Science. 360 (6392): eaar5780. doi:10.1126/science.aar5780. PMC 6038144. PMID 29700227.
  45. ^ You J. "Science's 2018 Breakthrough of the Year: tracking development cell by cell". Science Magazine. American Association for the Advancement of Science.
  46. ^ a b Li H, Lovci MT, Kwon YS, Rosenfeld MG, Fu XD, Yeo GW (December 2008). "Determination of tag density required for digital transcriptome analysis: application to an androgen-sensitive prostate cancer model". Proceedings of the National Academy of Sciences of the United States of America. 105 (51): 20179–84. Bibcode:2008PNAS..10520179L. doi:10.1073/pnas.0807121105. PMC 2603435. PMID 19088194.
  47. ^ Stegle O, Parts L, Piipari M, Winn J, Durbin R (February 2012). "Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses". Nature Protocols. 7 (3): 500–7. doi:10.1038/nprot.2011.457. PMC 3398141. PMID 22343431.
  48. ^ Kingsford C, Patro R (June 2015). "Reference-based compression of short-read sequences using path encoding". Bioinformatics. 31 (12): 1920–8. doi:10.1093/bioinformatics/btv071. PMC 4481695. PMID 25649622.
  49. ^ a b Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, et al. (May 2011). "Full-length transcriptome assembly from RNA-Seq data without a reference genome". Nature Biotechnology. 29 (7): 644–52. doi:10.1038/nbt.1883. PMC 3571712. PMID 21572440.
  50. ^ "De Novo Assembly Using Illumina Reads" (PDF). Retrieved 22 October 2016.
  51. ^ Zerbino DR, Birney E (May 2008). "Velvet: algorithms for de novo short read assembly using de Bruijn graphs". Genome Research. 18 (5): 821–9. doi:10.1101/gr.074492.107. PMC 2336801. PMID 18349386.
  52. ^ Oases: a transcriptome assembler for very short reads
  53. ^ Chang Z, Li G, Liu J, Zhang Y, Ashby C, Liu D, Cramer CL, Huang X (February 2015). "Bridger: a new framework for de novo transcriptome assembly using RNA-seq data". Genome Biology. 16 (1): 30. doi:10.1186/s13059-015-0596-2. PMC 4342890. PMID 25723335.
  54. ^ a b Li B, Fillmore N, Bai Y, Collins M, Thomson JA, Stewart R, Dewey CN (December 2014). "Evaluation of de novo transcriptome assemblies from RNA-Seq data". Genome Biology. 15 (12): 553. doi:10.1186/s13059-014-0553-5. PMC 4298084. PMID 25608678.
  55. ^ a b Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR (January 2013). "STAR: ultrafast universal RNA-seq aligner". Bioinformatics. 29 (1): 15–21. doi:10.1093/bioinformatics/bts635. PMC 3530905. PMID 23104886.
  56. ^ Langmead B, Trapnell C, Pop M, Salzberg SL (2009). "Ultrafast and memory-efficient alignment of short DNA sequences to the human genome". Genome Biology. 10 (3): R25. doi:10.1186/gb-2009-10-3-r25. PMC 2690996. PMID 19261174.
  57. ^ Trapnell C, Pachter L, Salzberg SL (May 2009). "TopHat: discovering splice junctions with RNA-Seq". Bioinformatics. 25 (9): 1105–11. doi:10.1093/bioinformatics/btp120. PMC 2672628. PMID 19289445.
  58. ^ Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L (March 2012). "Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks". Nature Protocols. 7 (3): 562–78. doi:10.1038/nprot.2012.016. PMC 3334321. PMID 22383036.
  59. ^ Liao Y, Smyth GK, Shi W (May 2013). "The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote". Nucleic Acids Research. 41 (10): e108. doi:10.1093/nar/gkt214. PMC 3664803. PMID 23558742.
  60. ^ Kim D, Langmead B, Salzberg SL (April 2015). "HISAT: a fast spliced aligner with low memory requirements". Nature Methods. 12 (4): 357–60. doi:10.1038/nmeth.3317. PMC 4655817. PMID 25751142.
  61. ^ Patro R, Mount SM, Kingsford C (May 2014). "Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms". Nature Biotechnology. 32 (5): 462–4. arXiv:1308.3700. doi:10.1038/nbt.2862. PMC 4077321. PMID 24752080.
  62. ^ Bray NL, Pimentel H, Melsted P, Pachter L (May 2016). "Near-optimal probabilistic RNA-seq quantification". Nature Biotechnology. 34 (5): 525–7. doi:10.1038/nbt.3519. PMID 27043002.
  63. ^ Wu TD, Watanabe CK (May 2005). "GMAP: a genomic mapping and alignment program for mRNA and EST sequences". Bioinformatics. 21 (9): 1859–75. doi:10.1093/bioinformatics/bti310. PMID 15728110.
  64. ^ Baruzzo G, Hayer KE, Kim EJ, Di Camillo B, FitzGerald GA, Grant GR (February 2017). "Simulation-based comprehensive benchmarking of RNA-seq aligners". Nature Methods. 14 (2): 135–139. doi:10.1038/nmeth.4106. PMC 5792058. PMID 27941783.
  65. ^ Engström PG, Steijger T, Sipos B, Grant GR, Kahles A, Rätsch G, et al. (December 2013). "Systematic evaluation of spliced alignment programs for RNA-seq data". Nature Methods. 10 (12): 1185–91. doi:10.1038/nmeth.2722. PMC 4018468. PMID 24185836.
  66. ^ Lu B, Zeng Z, Shi T (February 2013). "Comparative study of de novo assembly and genome-guided assembly strategies for transcriptome reconstruction based on RNA-Seq". Science China Life Sciences. 56 (2): 143–55. doi:10.1007/s11427-013-4442-z. PMID 23393030.
  67. ^ Bradnam KR, Fass JN, Alexandrov A, Baranay P, Bechner M, Birol I, et al. (July 2013). "Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species". GigaScience. 2 (1): 10. doi:10.1186/2047-217X-2-10. PMC 3844414. PMID 23870653.
  68. ^ Greenbaum D, Colangelo C, Williams K, Gerstein M (2003). "Comparing protein abundance and mRNA expression levels on a genomic scale". Genome Biology. 4 (9): 117. doi:10.1186/gb-2003-4-9-117. PMC 193646. PMID 12952525.
  69. ^ Zhang ZH, Jhaveri DJ, Marshall VM, Bauer DC, Edson J, Narayanan RK, et al. (August 2014). "A comparative study of techniques for differential expression analysis on RNA-Seq data". PLOS One. 9 (8): e103207. Bibcode:2014PLoSO...9j3207Z. doi:10.1371/journal.pone.0103207. PMC 4132098. PMID 25119138.
  70. ^ Anders S, Pyl PT, Huber W (January 2015). "HTSeq--a Python framework to work with high-throughput sequencing data". Bioinformatics. 31 (2): 166–9. doi:10.1093/bioinformatics/btu638. PMC 4287950. PMID 25260700.
  71. ^ Liao Y, Smyth GK, Shi W (April 2014). "featureCounts: an efficient general purpose program for assigning sequence reads to genomic features". Bioinformatics. 30 (7): 923–30. arXiv:1305.3347. doi:10.1093/bioinformatics/btt656. PMID 24227677.
  72. ^ Schmid MW, Grossniklaus U (February 2015). "Rcount: simple and flexible RNA-Seq read counting". Bioinformatics. 31 (3): 436–7. doi:10.1093/bioinformatics/btu680. PMID 25322836.
  73. ^ Finotello F, Lavezzo E, Bianco L, Barzon L, Mazzon P, Fontana P, Toppo S, Di Camillo B (2014). "Reducing bias in RNA sequencing data: a novel approach to compute counts". BMC Bioinformatics. 15 (Suppl 1): S7. doi:10.1186/1471-2105-15-s1-s7. PMC 4016203. PMID 24564404.
  74. ^ Hashimoto TB, Edwards MD, Gifford DK (March 2014). "Universal count correction for high-throughput sequencing". PLoS Computational Biology. 10 (3): e1003494. Bibcode:2014PLSCB..10E3494H. doi:10.1371/journal.pcbi.1003494. PMC 3945112. PMID 24603409.
  75. ^ a b Robinson MD, Oshlack A (2010). "A scaling normalization method for differential expression analysis of RNA-seq data". Genome Biology. 11 (3): R25. doi:10.1186/gb-2010-11-3-r25. PMC 2864565. PMID 20196867.
  76. ^ Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L (May 2010). "Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation". Nature Biotechnology. 28 (5): 511–5. doi:10.1038/nbt.1621. PMC 3146043. PMID 20436464.
  77. ^ Pachter L (19 April 2011). "Models for transcript quantification from RNA-Seq". arXiv:1104.3889 [q-bio.GN].
  78. ^ "What the FPKM? A review of RNA-Seq expression units". The farrago. 8 May 2014. Retrieved 28 March 2018.
  79. ^ Wagner GP, Kin K, Lynch VJ (December 2012). "Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples". Theory in Biosciences = Theorie in den Biowissenschaften. 131 (4): 281–5. doi:10.1007/s12064-012-0162-3. PMID 22872506.
  80. ^ Law CW, Chen Y, Shi W, Smyth GK (February 2014). "voom: Precision weights unlock linear model analysis tools for RNA-seq read counts". Genome Biology. 15 (2): R29. doi:10.1186/gb-2014-15-2-r29. PMC 4053721. PMID 24485249.
  81. ^ Anders S, Huber W (2010). "Differential expression analysis for sequence count data". Genome Biology. 11 (10): R106. doi:10.1186/gb-2010-11-10-r106. PMC 3218662. PMID 20979621.
  82. ^ a b c Robinson MD, McCarthy DJ, Smyth GK (January 2010). "edgeR: a Bioconductor package for differential expression analysis of digital gene expression data". Bioinformatics. 26 (1): 139–40. doi:10.1093/bioinformatics/btp616. PMC 2796818. PMID 19910308.
  83. ^ Soneson C, Delorenzi M (March 2013). "A comparison of methods for differential expression analysis of RNA-seq data". BMC Bioinformatics. 14: 91. doi:10.1186/1471-2105-14-91. PMC 3608160. PMID 23497356.
  84. ^ a b Anders S, Huber W (2010). "Differential expression analysis for sequence count data". Genome Biology. 11 (10): R106. doi:10.1186/gb-2010-11-10-r106. PMC 3218662. PMID 20979621.
  85. ^ "Bioconductor - Open source software for bioinformatics".
  86. ^ Huber W, Carey VJ, Gentleman R, Anders S, Carlson M, Carvalho BS, et al. (February 2015). "Orchestrating high-throughput genomic analysis with Bioconductor". Nature Methods. 12 (2): 115–21. doi:10.1038/nmeth.3252. PMC 4509590. PMID 25633503.
  87. ^ Soneson C, Delorenzi M (March 2013). "A comparison of methods for differential expression analysis of RNA-seq data". BMC Bioinformatics. 14: 91. doi:10.1186/1471-2105-14-91. PMC 3608160. PMID 23497356.
  88. ^ Soneson C, Delorenzi M (March 2013). "A comparison of methods for differential expression analysis of RNA-seq data". BMC Bioinformatics. 14: 91. doi:10.1186/1471-2105-14-91. PMC 3608160. PMID 23497356.
  89. ^ Marguerat S, Schmidt A, Codlin S, Chen W, Aebersold R, Bähler J (October 2012). "Quantitative analysis of fission yeast transcriptomes and proteomes in proliferating and quiescent cells". Cell. 151 (3): 671–83. doi:10.1016/j.cell.2012.09.019. PMC 3482660. PMID 23101633.
  90. ^ Owens ND, Blitz IL, Lane MA, Patrushev I, Overton JD, Gilchrist MJ, Cho KW, Khokha MK (January 2016). "Measuring Absolute RNA Copy Numbers at High Temporal Resolution Reveals Transcriptome Kinetics in Development". Cell Reports. 14 (3): 632–647. doi:10.1016/j.celrep.2015.12.050. PMC 4731879. PMID 26774488.
  91. ^ a b Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D (November 1999). "A combined algorithm for genome-wide prediction of protein function". Nature. 402 (6757): 83–6. Bibcode:1999Natur.402...83M. doi:10.1038/47048. PMID 10573421.
  92. ^ a b Giorgi FM, Del Fabbro C, Licausi F (March 2013). "Comparative study of RNA-seq- and microarray-derived coexpression networks in Arabidopsis thaliana". Bioinformatics. 29 (6): 717–24. doi:10.1093/bioinformatics/btt053. PMID 23376351.
  93. ^ Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (June 2012). "Utilizing RNA-Seq data for de novo coexpression network inference". Bioinformatics. 28 (12): 1592–7. doi:10.1093/bioinformatics/bts245. PMC 3493127. PMID 22556371.
  94. ^ Eksi R, Li HD, Menon R, Wen Y, Omenn GS, Kretzler M, Guan Y (November 2013). "Systematically differentiating functions for alternatively spliced isoforms through integrating RNA-seq data". PLoS Computational Biology. 9 (11): e1003314. Bibcode:2013PLSCB...9E3314E. doi:10.1371/journal.pcbi.1003314. PMC 3820534. PMID 24244129.
  95. ^ Li HD, Menon R, Omenn GS, Guan Y (August 2014). "The emerging era of genomic data integration for analyzing splice isoform function". Trends in Genetics. 30 (8): 340–7. doi:10.1016/j.tig.2014.05.005. PMC 4112133. PMID 24951248.
  96. ^ Foroushani A, Agrahari R, Docking R, Chang L, Duns G, Hudoba M, Karsan A, Zare H (March 2017). "Large-scale gene network analysis reveals the significance of extracellular matrix pathway and homeobox genes in acute myeloid leukemia: an introduction to the Pigengene package and its applications". BMC Medical Genomics. 10 (1): 16. doi:10.1186/s12920-017-0253-6. PMC 5353782. PMID 28298217.
  97. ^ Barbazuk WB, Emrich SJ, Chen HD, Li L, Schnable PS (September 2007). "SNP discovery via 454 transcriptome sequencing". The Plant Journal. 51 (5): 910–8. doi:10.1111/j.1365-313X.2007.03193.x. PMC 2169515. PMID 17662031.
  98. ^ Lalonde E, Ha KC, Wang Z, Bemmo A, Kleinman CL, Kwan T, Pastinen T, Majewski J (April 2011). "RNA sequencing reveals the role of splicing polymorphisms in regulating human gene expression". Genome Research. 21 (4): 545–54. doi:10.1101/gr.111211.110. PMC 3065702. PMID 21173033.
  99. ^ "CummeRbund - An R package for persistent storage, analysis, and visualization of RNA-Seq from cufflinks output". Retrieved 2013-07-28.
  100. ^ Garcion E, Wallace B, Pelletier L, Wion D (September 2004). "RNA mutagenesis and sporadic prion diseases". Journal of Theoretical Biology. 230 (2): 271–4. doi:10.1016/j.jtbi.2004.05.014. PMID 15302558.
  101. ^ Teixeira MR (December 2006). "Recurrent fusion oncogenes in carcinomas". Critical Reviews in Oncogenesis. 12 (3–4): 257–71. doi:10.1615/critrevoncog.v12.i3-4.40. PMID 17425505.
  102. ^ Tuch BB, Laborde RR, Xu X, Gu J, Chung CB, Monighetti CK, et al. (February 2010). "Tumor transcriptome sequencing reveals allelic expression imbalances associated with copy number alterations". PLOS One. 5 (2): e9317. Bibcode:2010PLoSO...5.9317T. doi:10.1371/journal.pone.0009317. PMC 2824832. PMID 20174472.
  103. ^ Berger MF, Levin JZ, Vijayendran K, Sivachenko A, Adiconis X, Maguire J, et al. (April 2010). "Integrative analysis of the melanoma transcriptome". Genome Research. 20 (4): 413–27. doi:10.1101/gr.103697.109. PMC 2847744. PMID 20179022.
  104. ^ Twine NA, Janitz K, Wilkins MR, Janitz M (January 2011). "Whole transcriptome sequencing reveals gene expression and splicing differences in brain regions affected by Alzheimer's disease". PLOS One. 6 (1): e16266. Bibcode:2011PLoSO...616266T. doi:10.1371/journal.pone.0016266. PMC 3025006. PMID 21283692.
  105. ^ Ku GM, Kim H, Vaughn IW, Hangauer MJ, Myung Oh C, German MS, McManus MT (October 2012). "Research resource: RNA-Seq reveals unique features of the pancreatic β-cell transcriptome". Molecular Endocrinology. 26 (10): 1783–92. doi:10.1210/me.2012-1176. PMC 3458219. PMID 22915829.
  106. ^ Morán I, Akerman I, van de Bunt M, Xie R, Benazra M, Nammo T, et al. (October 2012). "Human β cell transcriptome analysis uncovers lncRNAs that are tissue-specific, dynamically regulated, and abnormally expressed in type 2 diabetes". Cell Metabolism. 16 (4): 435–48. doi:10.1016/j.cmet.2012.08.010. PMC 3475176. PMID 23040067.
  107. ^ Merrick BA, Phadke DP, Auerbach SS, Mav D, Stiegelmeyer SM, Shah RR, Tice RR (2013). "RNA-Seq profiling reveals novel hepatic gene expression pattern in aflatoxin B1 treated rats". PLOS One. 8 (4): e61768. Bibcode:2013PLoSO...861768M. doi:10.1371/journal.pone.0061768. PMC 3632591. PMID 23630614.
  108. ^ Merrick BA, Chang JS, Phadke DP, Bostrom MA, Shah RR, Wang X, Gordon O, Wright GM (2018). "HAfTs are novel lncRNA transcripts from aflatoxin exposure". PLOS One. 13 (1): e0190992. Bibcode:2018PLoSO..1390992M. doi:10.1371/journal.pone.0190992. PMC 5774710. PMID 29351317.
  109. ^ Han Y, Chen J, Zhao X, Liang C, Wang Y, Sun L, et al. (March 2011). "MicroRNA expression signatures of bladder cancer revealed by deep sequencing". PLOS One. 6 (3): e18286. Bibcode:2011PLoSO...618286H. doi:10.1371/journal.pone.0018286. PMC 3065473. PMID 21464941.
  110. ^ "ENCODE Data Matrix". Retrieved 2013-07-28.
  111. ^ "The Cancer Genome Atlas - Data Portal". Retrieved 2013-07-28.

External links