Long non-coding RNA

From Wikipedia, the free encyclopedia

Long non-coding RNAs (long ncRNAs, lncRNAs) are defined as non-protein-coding transcripts longer than 200 nucleotides.[1] This somewhat arbitrary limit distinguishes long ncRNAs from small regulatory RNAs such as microRNAs (miRNAs), short interfering RNAs (siRNAs), Piwi-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs), and other short RNAs.[2] However, recent research has shown that some transcripts classified as lncRNAs do in fact encode proteins.


A 2007 study found that only one-fifth of transcription across the human genome is associated with protein-coding genes,[3] implying at least four times more long non-coding than coding RNA sequence. Large-scale complementary DNA (cDNA) sequencing projects such as FANTOM (Functional Annotation of Mammalian cDNA) have revealed the complexity of this transcription.[4] The FANTOM3 project identified ~35,000 non-coding transcripts from ~10,000 distinct loci that bear many signatures of mRNAs, including 5’ capping, splicing, and polyadenylation, but have little or no open reading frame (ORF).[4] Although the abundance of long ncRNAs was unanticipated, this number nevertheless represents a conservative lower estimate, since it omitted many singleton transcripts and non-polyadenylated transcripts (tiling array data show that more than 40% of transcripts are non-polyadenylated).[5] Unambiguously identifying ncRNAs within these cDNA libraries is nonetheless challenging, because protein-coding transcripts can be difficult to distinguish from non-coding ones. Multiple studies suggest that testis[6] and neural tissues express the most long non-coding RNAs of any tissue type.[7] Using FANTOM5, 27,919 long ncRNAs have been identified in various human sources.[8]
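The FANTOM description above (transcripts with mRNA-like features but little or no ORF) can be illustrated with a simple filter. The Python sketch below is a minimal illustration, not the FANTOM pipeline: it combines the 200-nucleotide length limit from the definition of lncRNAs with a scan for the longest ATG-initiated ORF in the three forward reading frames. The 100-codon ORF cutoff and the function names are hypothetical choices made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Length in codons (stop included) of the longest ATG-initiated ORF
    in the three forward reading frames. ORFs that run off the end of the
    sequence without a stop codon are ignored."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    best = 0
    for frame in range(3):
        start = None  # position of the first in-frame ATG not yet closed by a stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i + 3 - start) // 3)
                start = None
    return best


def is_candidate_lncrna(seq: str, min_length: int = 200,
                        max_orf_codons: int = 100) -> bool:
    """Hypothetical filter: longer than 200 nt and no long ORF.
    The 100-codon cutoff is an illustrative assumption."""
    return len(seq) > min_length and longest_orf_codons(seq) < max_orf_codons


# Synthetic sequences, not real transcripts.
noncoding_like = "A" * 300
coding_like = "ATG" + "GCT" * 150 + "TAA"
print(is_candidate_lncrna(noncoding_like))  # True
print(is_candidate_lncrna(coding_like))     # False
```

ORF length alone is a crude criterion: random sequence can contain short ORFs by chance, and some transcripts classified as lncRNAs have been found to encode small peptides, which is part of why the distinction described above is difficult.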

Quantitatively, lncRNAs are ~10-fold less abundant than mRNAs in a population of cells,[9][10] a difference attributed to higher cell-to-cell variation in the expression levels of lncRNA genes compared with protein-coding genes.[11] In general, the majority (~78%) of lncRNAs are tissue-specific, compared with only ~19% of mRNAs.[9] Beyond this higher tissue specificity, lncRNAs also show greater developmental-stage specificity[12] and cell-subtype specificity in heterogeneous tissues such as the human neocortex.[13]

Considerable effort has gone into investigating lncRNAs in plants, which remain far less studied than those of mammals. An extensive study of 37 higher plant species and six algae, published at the end of 2015, identified ~200,000 non-coding transcripts using an in-silico approach.[14] This study also produced the Green Non-Coding Database (GreeNC), a repository of plant lncRNAs.

Genomic organization

The current landscape of the mammalian genome is described as numerous 'foci' of transcription separated by long stretches of intergenic space.[4] Although some long ncRNAs are located and transcribed within the intergenic stretches, the majority are transcribed as complex, interlaced networks of overlapping sense and antisense transcripts that often include protein-coding genes.[15] Genomic sequences within these transcriptional foci are often shared by a number of different coding and non-coding transcripts in the sense and antisense directions,[16] giving rise to a complex hierarchy of overlapping isoforms. For example, 3012 of 8961 cDNAs previously annotated as truncated coding sequences within FANTOM2 were later designated as genuine ncRNA variants of protein-coding cDNAs.[4] While the abundance and conservation of these interleaved arrangements suggest that they have biological relevance, the complexity of these foci frustrates easy evaluation.
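The overlapping sense and antisense arrangement described above can be made concrete with genomic intervals. The following Python sketch assumes that each transcript is summarized by a chromosome, a half-open coordinate range and a strand; it lists every overlapping pair of transcripts in a locus and labels each pair as sense (same strand) or antisense (opposite strands). The transcript names and coordinates are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def overlaps(a: Transcript, b: Transcript) -> bool:
    """True if two half-open intervals on the same chromosome share any base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def overlapping_pairs(transcripts):
    """Return (name_a, name_b, relation) for every overlapping pair,
    where relation is 'sense' (same strand) or 'antisense'."""
    result = []
    for a, b in combinations(transcripts, 2):
        if overlaps(a, b):
            relation = "sense" if a.strand == b.strand else "antisense"
            result.append((a.name, b.name, relation))
    return result


# A hypothetical transcriptional "focus": one coding gene and two ncRNAs.
locus = [
    Transcript("codingA", "chr1", 100, 500, "+"),
    Transcript("ncA_antisense", "chr1", 400, 900, "-"),
    Transcript("ncB_sense", "chr1", 450, 480, "+"),
    Transcript("distant", "chr1", 2000, 3000, "+"),
]
for pair in overlapping_pairs(locus):
    print(pair)
```

The pairwise comparison is quadratic in the number of transcripts, which is adequate for a single locus; a genome-wide scan would more sensibly sort the intervals or use an interval tree.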

The GENCODE consortium has collated and analysed a comprehensive set of human lncRNA annotations and their genomic organisation, modifications, cellular locations and tissue expression profiles.[7] Their analysis indicates human lncRNAs show a bias toward two-exon transcripts.[7]


Many small RNAs, such as microRNAs or snoRNAs, exhibit strong conservation across diverse species.[17] In contrast, long ncRNAs such as Air and Xist lack strong conservation,[18] which has been taken to suggest either non-functionality[19][20] or the action of different selection pressures.[21] Unlike mRNAs, which must preserve codon usage and avoid frameshift mutations across a single long ORF, long ncRNAs may be subject to selection only over short regions constrained by structure or by sequence-specific interactions. Thus, despite the generally low conservation of long ncRNAs, many still contain strongly conserved elements. For example, 19% of highly conserved phastCons elements occur in known introns, and another 32% in unannotated regions.[22] Furthermore, a representative set of human long ncRNAs exhibits small, yet significant, reductions in substitution and insertion/deletion rates, indicative of purifying selection that conserves the integrity of the transcript at the levels of sequence, promoter and splicing.[23]

On the other hand, the low conservation of some ncRNAs may actually result from recent and rapid adaptive selection. For instance, some ncRNAs may be more pliant to evolutionary pressures than protein-coding genes, as evidenced by the existence of many lineage-specific ncRNAs, such as the aforementioned Xist and Air.[21] Indeed, the conserved regions of the human genome that have undergone recent evolutionary change relative to the chimpanzee genome occur mainly in non-coding regions, many of which are transcribed.[24][25] These include the ncRNA HAR1F, which has undergone rapid evolutionary change in humans and is specifically expressed in Cajal-Retzius cells of the human neocortex.[25] The observation that many functionally validated RNAs are evolving quickly[21][26] may reflect looser structure-function constraints on these sequences, allowing greater evolutionary innovation. This is supported by the existence of thousands of sequences in the mammalian genome that show poor conservation at the primary sequence level but have evidence of conserved RNA secondary structures.[27]


Large-scale sequencing of cDNA libraries and, more recently, transcriptome sequencing by next-generation sequencing indicate that long noncoding RNAs number on the order of tens of thousands in mammals. However, despite accumulating evidence suggesting that the majority of these are likely to be functional,[28][29] only a relatively small proportion have been demonstrated to be biologically relevant. As of January 2016, 294 lncRNAs had been functionally annotated in lncRNAdb (a database of lncRNAs described in the literature),[30][31] the majority of these (183) in humans. A further large-scale sequencing study provides evidence that many transcripts thought to be lncRNAs may, in fact, be translated into proteins.[32]

In the regulation of gene transcription

In gene-specific transcription

In eukaryotes, RNA transcription is a tightly regulated process. NcRNAs can target different aspects of this process, including transcriptional activators or repressors, components of the transcription reaction such as RNA polymerase (RNAP) II, and even the DNA duplex, to regulate gene transcription and expression (Goodrich 2006). Together with transcription factors, these ncRNAs may form a regulatory network that finely controls gene expression in complex eukaryotes.

NcRNAs modulate the function of transcription factors by several different mechanisms, including functioning themselves as co-regulators, modifying transcription factor activity, or regulating the association and activity of co-regulators. For example, the ncRNA Evf-2 functions as a co-activator for the homeobox transcription factor Dlx2, which plays important roles in forebrain development and neurogenesis (Feng 2006; Panganiban 2002). Sonic hedgehog induces transcription of Evf-2 from an ultra-conserved element located between the Dlx5 and Dlx6 genes during forebrain development (Feng 2006). Evf-2 then recruits the Dlx2 transcription factor to the same ultra-conserved element, where Dlx2 subsequently induces expression of Dlx5. The existence of other similar ultra- or highly conserved elements within the mammalian genome that are both transcribed and fulfil enhancer functions suggests that Evf-2 may illustrate a generalised mechanism that tightly regulates important developmental genes with complex expression patterns during vertebrate growth (Pennacchio 2006; Visel 2008). Indeed, the transcription and expression of similar non-coding ultraconserved elements was recently shown to be abnormal in human leukaemia and to influence apoptosis in colon cancer cells, suggesting their involvement in tumorigenesis (Calin 2007).

Local ncRNAs can also recruit transcriptional programmes to regulate the expression of adjacent protein-coding genes. For example, divergent lncRNAs, which are transcribed in the opposite direction to nearby protein-coding genes and make up a significant proportion (~20%) of all lncRNAs in mammalian genomes, may regulate the transcription of adjacent essential developmental regulatory genes in pluripotent cells.[33]
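Divergent lncRNAs, as described above, sit head-to-head with a neighbouring protein-coding gene on the opposite strand. A minimal Python sketch of how such pairs might be flagged from annotation coordinates follows. The 1 kb maximum distance between transcription start sites (TSSs) is a hypothetical parameter, not a value from the cited study, and the gene and transcript names are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def tss(t: Transcript) -> int:
    """Transcription start site: leftmost base on '+', rightmost base on '-'."""
    return t.start if t.strand == "+" else t.end - 1


def is_divergent(lnc: Transcript, gene: Transcript, max_distance: int = 1000) -> bool:
    """True if lnc and gene lie on opposite strands and are transcribed away
    from each other (head-to-head), with TSSs at most max_distance apart.
    max_distance is an illustrative assumption."""
    if lnc.chrom != gene.chrom or lnc.strand == gene.strand:
        return False
    plus, minus = (lnc, gene) if lnc.strand == "+" else (gene, lnc)
    # Head-to-head: the minus-strand TSS lies upstream (to the left) of the
    # plus-strand TSS, so the two transcripts extend in opposite directions.
    gap = tss(plus) - tss(minus)
    return 0 <= gap <= max_distance


# Hypothetical developmental gene on the plus strand.
gene = Transcript("DevGene1", "chr2", 10_000, 20_000, "+")
print(is_divergent(Transcript("lncDiv", "chr2", 8_000, 9_500, "-"), gene))    # True
print(is_divergent(Transcript("lncConv", "chr2", 20_500, 22_000, "-"), gene)) # False
```

The second call returns False because that transcript lies downstream of the gene and is transcribed towards it (a convergent, not divergent, arrangement).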

The RNA-binding protein TLS binds and inhibits the histone acetyltransferase activities of CREB binding protein and p300 on a repressed gene target, cyclin D1. The recruitment of TLS to the cyclin D1 promoter is directed by long ncRNAs that are expressed at low levels and tethered to 5’ regulatory regions in response to DNA damage signals (Wang 2008). Moreover, these local ncRNAs act cooperatively as ligands to modulate the activities of TLS. More broadly, this mechanism allows the cell to harness RNA-binding proteins, which make up one of the largest classes within the mammalian proteome, and integrate their function into transcriptional programs. Nascent long ncRNAs have been shown to increase the activity of CREB binding protein, which in turn increases the transcription of that ncRNA.[34] A recent study found that a lncRNA transcribed antisense to the Apolipoprotein A1 (APOA1) gene regulates APOA1 transcription through epigenetic modifications.[35]

Recent evidence has raised the possibility that transcription of genes that escape from X-inactivation might be mediated by expression of long non-coding RNA within the escaping chromosomal domains (Reinius 2010).

Regulating basal transcription machinery

NcRNAs also target general transcription factors required for RNAP II transcription of all genes (Goodrich 2006). These general factors include components of the initiation complex that assembles on promoters, as well as factors involved in transcription elongation. A ncRNA transcribed from an upstream minor promoter of the dihydrofolate reductase (DHFR) gene forms a stable RNA-DNA triplex within the major promoter of DHFR, preventing binding of the general transcription factor TFIIB (Martianov 2007). This novel mechanism of regulating gene expression may represent a widespread method of controlling promoter usage, given that thousands of such triplexes exist in eukaryotic chromosomes (Lee 1987). The U1 ncRNA can induce transcription initiation by specifically binding to and stimulating TFIIH to phosphorylate the C-terminal domain of RNAP II (Kwek 2002). In contrast, the ncRNA 7SK represses transcription elongation: in combination with HEXIM1/2, it forms an inactive complex that prevents the general transcription factor P-TEFb from phosphorylating the C-terminal domain of RNAP II (Kwek 2002; Yang 2001; Yik 2003), thereby repressing global elongation under stressful conditions. These examples, which bypass specific modes of regulation at individual promoters to act directly on the initiation and elongation machinery, provide a means of quickly effecting global changes in gene expression.

The ability to quickly mediate global changes is also apparent in the rapid expression of non-coding repetitive sequences. Short interspersed nuclear elements (SINEs), namely Alu elements in humans and the analogous B1 and B2 elements in mice, have succeeded in becoming the most abundant mobile elements within their genomes, comprising ~10% of the human and ~6% of the mouse genome, respectively (Lander 2001; Waterston 2002). These elements are transcribed as ncRNAs by RNAP III in response to environmental stresses such as heat shock (Liu 1995), whereupon they bind to RNAP II with high affinity and prevent the formation of active pre-initiation complexes (Allen 2004; Espinoza 2004; Espinoza 2007; Mariner & Walters 2008). This allows broad and rapid repression of gene expression in response to stress (Allen 2004; Mariner & Walters 2008).

A dissection of the functional sequences within Alu RNA transcripts has revealed a modular structure analogous to the organization of domains in protein transcription factors (Shamovsky 2008). The Alu RNA contains two ‘arms’, each of which may bind one RNAP II molecule, as well as two regulatory domains that are responsible for repressing RNAP II transcription in vitro (Mariner 2008). These two loosely structured domains may even be concatenated to other ncRNAs, such as B1 elements, to impart their repressive role (Mariner & Walters 2008). The abundance and distribution of Alu elements and similar repetitive elements throughout the mammalian genome may be partly due to these functional domains being co-opted into other long ncRNAs during evolution; functional repeat sequence domains are a common characteristic of several known long ncRNAs, including Kcnq1ot1, Xlsirt and Xist (Mattick 2003; Mohammad 2008; Wutz 2002; Zearfoss 2003).

In addition to heat shock, the expression of SINE elements (including Alu, B1 and B2 RNAs) increases during cellular stresses such as viral infection (Singh 1985) and in some cancer cells (Tang 2005), where they may similarly regulate global changes in gene expression. The ability of Alu and B2 RNAs to bind directly to RNAP II provides a broad mechanism to repress transcription (Espinoza 2004; Mariner & Walters 2008). Nevertheless, there are specific exceptions to this global response: Alu or B2 RNAs are not found at the activated promoters of genes undergoing induction, such as the heat shock genes (Mariner & Walters 2008). This additional level of regulation, which exempts individual genes from the generalised repression, also involves a long ncRNA, heat shock RNA-1 (HSR-1). It has been argued that HSR-1 is present in mammalian cells in an inactive state but is activated upon stress to induce the expression of heat shock genes (Shamovsky 2006). The authors found that this activation involves a conformational change in HSR-1 in response to rising temperature, permitting its interaction with the transcriptional activator HSF-1, which subsequently trimerises and induces the expression of heat shock genes (Shamovsky 2006). More broadly, these examples illustrate a regulatory circuit nested within ncRNAs, whereby Alu or B2 RNAs repress general gene expression while other ncRNAs activate the expression of specific genes.

Transcribed by RNA polymerase III

Many of the ncRNAs that interact with general transcription factors or with RNAP II itself (including 7SK, Alu, B1 and B2 RNAs) are transcribed by RNAP III,[36] thereby uncoupling the expression of these ncRNAs from the RNAP II transcriptional reaction they regulate. RNAP III also transcribes a number of additional novel ncRNAs, such as BC2, BC200 and some microRNAs and snoRNAs, in addition to highly expressed infrastructural ‘housekeeping’ ncRNA genes such as tRNAs, 5S rRNAs and snRNAs.[37] The existence of an RNAP III-dependent ncRNA transcriptome that regulates its RNAP II-dependent counterpart was supported by a recent study describing a novel set of RNAP III-transcribed ncRNAs with sequence homology to protein-coding genes. This prompted the authors to posit a ‘cogene/gene’ functional regulatory network;[38] they showed that one of these ncRNAs, 21A, regulates the expression of its antisense partner gene, CENP-F, in trans.

In post-transcriptional regulation

In addition to regulating transcription, ncRNAs also control various aspects of post-transcriptional mRNA processing. Similar to small regulatory RNAs such as microRNAs and snoRNAs, these functions often involve complementary base pairing with the target mRNA. The formation of RNA duplexes between complementary ncRNA and mRNA may mask key elements within the mRNA required to bind trans-acting factors, potentially affecting any step in post-transcriptional gene expression including pre-mRNA processing and splicing, transport, translation, and degradation.

In splicing

The splicing of mRNA can induce its translation and functionally diversify the repertoire of proteins it encodes. The Zeb2 mRNA, which has a particularly long 5’UTR, requires the retention of a 5’UTR intron that contains an internal ribosome entry site for efficient translation.[39] However, retention of the intron is dependent on the expression of an antisense transcript that complements the intronic 5’ splice site.[39] Therefore, the ectopic expression of the antisense transcript represses splicing and induces translation of the Zeb2 mRNA during mesenchymal development. Likewise, the expression of an overlapping antisense Rev-ErbAα2 transcript controls the alternative splicing of the thyroid hormone receptor ErbAα2 mRNA to form two antagonistic isoforms.[40]
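The Zeb2 mechanism above depends on base complementarity between an antisense transcript and an intronic 5′ splice site. The Python sketch below shows the underlying check in its simplest form: pairing is antiparallel, so an antisense RNA can pair with a target if it contains the target's reverse complement. It assumes perfect Watson-Crick pairing only (no G-U wobble pairs or mismatches), and the splice-site sequence is a hypothetical example, not the actual Zeb2 sequence.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def find_complementary_site(antisense: str, target: str) -> int:
    """Index in `antisense` (5'->3') of a segment that can pair perfectly,
    in antiparallel orientation, with `target` (5'->3'); -1 if none.
    Simplification: exact Watson-Crick matches only."""
    return antisense.upper().find(reverse_complement(target))


# Hypothetical 9-nt region spanning an exon-intron 5' splice site
# (exon ...CAG | GUAAGU... intron); not the real Zeb2 sequence.
splice_site = "CAGGUAAGU"
antisense = "GGG" + reverse_complement(splice_site) + "AAA"
print(reverse_complement(splice_site))                 # ACUUACCUG
print(find_complementary_site(antisense, splice_site)) # 3
```

Real RNA-RNA duplexes tolerate G-U pairs and some mismatches, so a practical search would score partial complementarity rather than require an exact match.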

In translation

NcRNAs may also exert regulatory pressure during translation, a property particularly exploited in neurons, where dendritic or axonal translation of mRNA in response to synaptic activity contributes to changes in synaptic plasticity and the remodelling of neuronal networks. The RNAP III-transcribed BC1 and BC200 ncRNAs, which are derived from a tRNA and an Alu element respectively, are expressed in the mouse and human central nervous system, respectively.[41][42] BC1 expression is induced in response to synaptic activity and synaptogenesis, and BC1 is specifically targeted to dendrites in neurons.[43] Sequence complementarity between BC1 and regions of various neuron-specific mRNAs also suggests a role for BC1 in targeted translational repression.[44] Indeed, BC1 was recently shown to be associated with translational repression in dendrites that controls the efficiency of dopamine D2 receptor-mediated transmission in the striatum,[45] and BC1 RNA-deleted mice exhibit behavioural changes, with reduced exploration and increased anxiety.[46]

In siRNA-directed gene regulation

In addition to masking key elements within single-stranded RNA, the formation of double-stranded RNA duplexes can provide a substrate for the generation of endogenous siRNAs (endo-siRNAs) in Drosophila and mouse oocytes.[47] The annealing of complementary sequences, such as antisense or repetitive regions between transcripts, forms an RNA duplex that may be processed by Dicer-2 into endo-siRNAs. Long ncRNAs that form extended intramolecular hairpins may also be processed into siRNAs, as illustrated by the esi-1 and esi-2 transcripts.[48] Endo-siRNAs generated from these transcripts appear particularly important in suppressing the spread of mobile transposon elements within the germline genome. However, the generation of endo-siRNAs from antisense transcripts or pseudogenes may also silence the expression of their functional counterparts via RISC effector complexes, acting as an important node that integrates various modes of long and short RNA regulation, as exemplified by Xist and Tsix (see below).[49]
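An intramolecular hairpin of the kind described above forms where one stretch of a transcript is the reverse complement of a stretch further downstream. The Python sketch below scans a single transcript for such inverted repeats as candidate stem arms. It is a minimal illustration under stated assumptions: the stem length `k` and minimum loop size `min_loop` are hypothetical parameters, only perfect Watson-Crick matches are considered, and it does not perform the thermodynamic folding that real hairpin prediction relies on.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def hairpin_stems(rna: str, k: int = 8, min_loop: int = 3):
    """Return (i, j) start positions where the k-mer at j is the reverse
    complement of the k-mer at i and lies downstream with at least
    `min_loop` unpaired bases between the two arms, i.e. candidate arms of
    an intramolecular stem-loop. k and min_loop are illustrative choices."""
    rna = rna.upper()
    positions = {}  # k-mer -> list of start positions
    for j in range(len(rna) - k + 1):
        positions.setdefault(rna[j:j + k], []).append(j)
    stems = []
    for i in range(len(rna) - k + 1):
        for j in positions.get(reverse_complement(rna[i:i + k]), []):
            if j >= i + k + min_loop:
                stems.append((i, j))
    return stems


# Synthetic transcript: stem arm, 4-nt loop, complementary stem arm.
arm = "GAUCCGAA"
transcript = arm + "CCCC" + reverse_complement(arm)
print(hairpin_stems(transcript))  # [(0, 12)]
```

Applying the same reverse-complement lookup across two different transcripts, rather than within one, would flag regions able to form the intermolecular duplexes mentioned for antisense pairs.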

In epigenetic regulation

Epigenetic modifications, including histone and DNA methylation, histone acetylation and sumoylation, affect many aspects of chromosomal biology, chiefly the regulation of large numbers of genes through the remodeling of broad chromatin domains (Kiefer 2007; Mikkelsen 2007). Although RNA has long been known to be an integral component of chromatin (Nickerson 1989; Rodriguez-Campos 2007), only recently have researchers begun to appreciate how RNA participates in pathways of chromatin modification (Chen 2008; Rinn 2007; Sanchez-Elsner 2006).

In Drosophila, long ncRNAs induce the expression of the homeotic gene Ubx by recruiting and directing the chromatin-modifying functions of the trithorax protein Ash1 to Hox regulatory elements (Sanchez-Elsner 2006). Similar models have been proposed in mammals, where strong epigenetic mechanisms are thought to underlie the embryonic expression profiles of the Hox genes that persist throughout human development (Mazo 2007; Rinn 2007). Indeed, the human Hox genes are associated with hundreds of ncRNAs that are sequentially expressed along both the spatial and temporal axes of human development and define chromatin domains of differential histone methylation and RNA polymerase accessibility (Rinn 2007). One ncRNA, HOTAIR, which originates from the HOXC locus, represses transcription across 40 kb of the HOXD locus by altering its chromatin trimethylation state. HOTAIR is thought to achieve this by directing the action of Polycomb chromatin remodeling complexes in trans to govern the cell's epigenetic state and subsequent gene expression. Components of the Polycomb complex, including Suz12, EZH2 and EED, contain RNA-binding domains that may bind HOTAIR and probably other similar ncRNAs (Denisenko 1998; Katayama 2005). This example illustrates a broader theme whereby ncRNAs recruit a generic suite of chromatin-modifying proteins to specific genomic loci, underscoring the complexity of recently published genomic maps (Mikkelsen 2007). Indeed, the prevalence of long ncRNAs associated with protein-coding genes may contribute to localised patterns of chromatin modification that regulate gene expression during development. For example, the majority of protein-coding genes have antisense partners, including many tumour suppressor genes that are frequently silenced by epigenetic mechanisms in cancer (Yu 2008). A recent study observed an inverse expression profile of the p15 gene and an antisense ncRNA in leukaemia (Yu 2008). A detailed analysis showed that the p15 antisense ncRNA (CDKN2BAS) was able to induce changes to the heterochromatin and DNA methylation status of p15 by an unknown mechanism, thereby regulating p15 expression (Yu 2008). Therefore, misexpression of associated antisense ncRNAs may silence tumour suppressor genes, contributing towards oncogenesis.


Many emergent themes of ncRNA-directed chromatin modification first became apparent in the phenomenon of imprinting, whereby only one allele of a gene is expressed, from either the maternal or the paternal chromosome. In general, imprinted genes are clustered together on chromosomes, suggesting that the imprinting mechanism acts upon local chromosome domains rather than individual genes. These clusters are also often associated with long ncRNAs whose expression is correlated with the repression of the linked protein-coding gene on the same allele (Pauler 2007). Indeed, detailed analysis has revealed a crucial role for the ncRNAs Kcnq1ot1 and Igf2r/Air in directing imprinting (Braidotti 2004).

Almost all the genes at the Kcnq1 locus are maternally expressed, except the paternally expressed antisense ncRNA Kcnq1ot1 (Mitsuya 1999). Transgenic mice with truncated Kcnq1ot1 fail to silence the adjacent genes, suggesting that Kcnq1ot1 is crucial to the imprinting of genes on the paternal chromosome (Mancini-Dinardo 2006). Kcnq1ot1 appears able to direct the trimethylation of histone H3 at lysine 9 (H3K9me3) and lysine 27 (H3K27me3) to an imprinting centre that overlaps the Kcnq1ot1 promoter and actually resides within the Kcnq1 gene (Umlauf 2004). Similar to HOTAIR (see above), Eed-Ezh2 Polycomb complexes are recruited to the paternal chromosome at the Kcnq1 locus, possibly by Kcnq1ot1, where they may mediate gene silencing through repressive histone methylation (Umlauf 2004). A differentially methylated imprinting centre also overlaps the promoter of a long antisense ncRNA, Air, which is responsible for silencing neighbouring genes at the Igf2r locus on the paternal chromosome (Sleutels 2002; Zwart 2001). The presence of allele-specific histone methylation at the Igf2r locus suggests that Air also mediates silencing via chromatin modification (Fournier 2002).

Xist and X-chromosome inactivation

The inactivation of one X chromosome in female placental mammals is directed by one of the earliest and best characterized long ncRNAs, Xist (Wutz 2007). The expression of Xist from the future inactive X chromosome, and its subsequent coating of that chromosome, occurs during early embryonic stem cell differentiation. Xist expression is followed by irreversible layers of chromatin modifications. These include the loss of histone H3K9 acetylation and H3K4 methylation, which are associated with active chromatin, and the induction of repressive chromatin modifications including H4 hypoacetylation, H3K27 trimethylation (Wutz 2007), H3K9 hypermethylation, H4K20 monomethylation and H2AK119 monoubiquitylation. These modifications coincide with the transcriptional silencing of the X-linked genes (Morey 2004). Xist RNA also localises the histone variant macroH2A to the inactive X chromosome (Costanzi 1998). Additional ncRNAs are present at the Xist locus, including an antisense transcript, Tsix, which is expressed from the future active chromosome and is able to repress Xist expression by generating endogenous siRNAs (Ogawa 2008). Together these ncRNAs ensure that only one X chromosome is active in female mammals.

Telomeric non-coding RNAs

Telomeres form the terminal regions of mammalian chromosomes, are essential for chromosome stability, and play central roles in aging and in diseases such as cancer.[50] Telomeres were long considered transcriptionally inert DNA-protein complexes until it was recently shown that telomeric repeats may be transcribed as telomeric RNAs (TelRNAs)[51] or telomeric repeat-containing RNAs.[52] These ncRNAs are heterogeneous in length, are transcribed from several subtelomeric loci, and physically localise to telomeres. Their association with chromatin, which suggests an involvement in regulating telomere-specific heterochromatin modifications, is repressed by SMG proteins that protect chromosome ends from telomere loss.[52] In addition, TelRNAs block telomerase activity in vitro and may therefore regulate telomerase activity.[51] Although preliminary, these studies suggest that telomeric ncRNAs are involved in various aspects of telomere biology.


Evidence without proof of biological significance

In mouse embryonic stem cells, the majority of lincRNAs (a subset of lncRNAs) have been found to be translated, including about half of the lincRNAs required to maintain pluripotency.[53] However, it was not determined whether the protein products were required for pluripotency.

Peptides encoded by human lncRNAs have been found in cells[54] and adult tissues.[55]

Functional protein product

The protein SPAR has been found to be encoded by a lncRNA in mice and humans, and it has a biologically significant function in vivo in muscle regeneration in mice.[56]

In aging and disease

Recent recognition that long ncRNAs function in various aspects of cell biology has focused increasing attention on their potential to contribute to disease etiology. A handful of studies have implicated long ncRNAs in a variety of disease states, supporting a role for them in neurological disease and oncogenesis.

The first published report of altered lncRNA abundance in aging and human neurological disease came from Lukiw et al.,[57] in a study using short post-mortem interval tissues from Alzheimer's disease and non-Alzheimer's dementia (NAD) cases. This early work built on Watson and Sutcliffe's 1987 identification of BC200 (brain, cytoplasmic, 200 nucleotide), a primate brain-specific cytoplasmic transcript of the Alu repeat family.[58]

While many association studies have identified long ncRNAs that are aberrantly expressed in disease states, we have little understanding of their contribution to disease etiology. Expression analyses comparing tumor cells with normal cells have revealed changes in ncRNA expression in several forms of cancer. For example, in prostate tumours, PCGEM1, one of two overexpressed ncRNAs, is correlated with increased proliferation and colony formation, suggesting an involvement in regulating cell growth.[59] MALAT1 (also known as NEAT2) was originally identified as an abundantly expressed ncRNA that is upregulated during metastasis of early-stage non-small cell lung cancer, and its overexpression is an early prognostic marker of poor patient survival.[59] More recently, the highly conserved mouse homologue of MALAT1 was found to be highly expressed in hepatocellular carcinoma.[60] Intronic antisense ncRNAs whose expression correlates with the degree of tumor differentiation in prostate cancer samples have also been reported.[61] Despite a number of long ncRNAs having aberrant expression in cancer, their function and potential role in tumorigenesis remain relatively unknown. For example, the ncRNAs HIS-1 and BIC have been implicated in oncogenesis and growth control, but their function in normal cells is unknown.[62][63] In addition to cancer, ncRNAs also exhibit aberrant expression in other disease states. Overexpression of PRINS is associated with psoriasis susceptibility, with PRINS expression being elevated in the uninvolved epidermis of psoriatic patients compared with both psoriatic lesions and healthy epidermis.[64]

Genome-wide profiling revealed that many transcribed non-coding ultraconserved regions exhibit distinct profiles in various human cancer states.[65] An analysis of chronic lymphocytic leukaemia, colorectal carcinoma and hepatocellular carcinoma found that all three cancers exhibited aberrant expression profiles for ultraconserved ncRNAs relative to normal cells. Further analysis of one ultraconserved ncRNA suggested it behaved like an oncogene by mitigating apoptosis and subsequently expanding the number of malignant cells in colorectal cancers.[65] Many of these transcribed ultraconserved sites that exhibit distinct signatures in cancer are found at fragile sites and genomic regions associated with cancer. It seems likely that the aberrant expression of these ultraconserved ncRNAs within malignant processes results from important functions they fulfil in normal human development.

Recently, a number of single nucleotide polymorphisms (SNPs) identified in disease association studies have been mapped to long ncRNAs. For example, SNPs that identified a susceptibility locus for myocardial infarction mapped to a long ncRNA, MIAT (myocardial infarction associated transcript).[66] Likewise, genome-wide association studies identified a region associated with coronary artery disease[67] that encompasses a long ncRNA, ANRIL.[68] ANRIL is expressed in tissues and cell types affected by atherosclerosis,[69][70] and its altered expression is associated with a high-risk haplotype for coronary artery disease.[70][71]

The complexity of the transcriptome, and our evolving understanding of its structure, may inform a reinterpretation of the functional basis of many natural polymorphisms associated with disease states. Many SNPs associated with certain disease conditions lie within non-coding regions, and the complex networks of non-coding transcription within these regions make it particularly difficult to elucidate the functional effects of the polymorphisms. For example, a SNP located both within the truncated form of ZFAT and in the promoter of an antisense transcript increases the expression of ZFAT not by increasing mRNA stability but by repressing expression of the antisense transcript.[72]

The ability of long ncRNAs to regulate associated protein-coding genes may contribute to disease if misexpression of a long ncRNA deregulates a clinically significant protein-coding gene. In one such case, an antisense long ncRNA that regulates the expression of the sense BACE1 gene, which encodes an enzyme crucial to Alzheimer's disease aetiology, shows elevated expression in several brain regions of individuals with Alzheimer's disease.[73] Altered ncRNA expression may also act at the epigenetic level to change gene expression and contribute to disease aetiology. For example, the induction of an antisense transcript by a genetic mutation led to DNA methylation and silencing of sense genes, causing β-thalassemia in a patient.[74]

Long intergenic non-coding RNAs (lincRNA)

Long intergenic non-coding RNAs are transcribed from non-coding DNA sequences between protein-coding genes;[75][76] these intergenic sequences are frequently enriched for various classes of transposable elements. A 2013 study identified tens of thousands of human lincRNAs.[77]

Some lincRNAs bind to messenger RNA and block its translation into protein. At least 26 different lincRNAs are required to keep embryonic stem cells from differentiating. It has also been proposed that intergenic RNA domains at least 50 kb in length be classified as "very long intergenic non-coding" RNA (vlincRNA) regions.[78]

Families of lincRNAs derived from transposable elements have been implicated in the regulation of pluripotency. The human pluripotency-associated transcripts HPAT2, HPAT3 and HPAT5 function in preimplantation embryo development, where they modulate the acquisition of pluripotency and the formation of the inner cell mass. The lincRNA HPAT5 was identified as a key component of the pluripotency network; it interacts with the let-7 microRNA family.[79]

See also


  1. ^ Perkel 2013
  2. ^ Ma 2013
  3. ^ Kapranov 2007
  4. ^ a b c d Carninci 2005
  5. ^ Cheng 2005
  6. ^ Necsulea, Anamaria; Soumillon, Magali; Warnefors, Maria; Liechti, Angélica; Daish, Tasman; Zeller, Ulrich; Baker, Julie C.; Grützner, Frank; Kaessmann, Henrik. "The evolution of lncRNA repertoires and expression patterns in tetrapods". Nature. 505 (7485): 635–640. doi:10.1038/nature12943. PMID 24463510. 
  7. ^ a b c Derrien 2012
  8. ^ Hon CC, Ramilowski JA, Harshbarger J, Bertin N, Rackham OJ, Gough J, Denisenko E, Schmeier S, Poulsen TM, Severin J, Lizio M, Kawaji H, Kasukawa T, Itoh M, Burroughs AM, Noma S, Djebali S, Alam T, Medvedeva YA, Testa AC, Lipovich L, Yip CW, Abugessaisa I, Mendez M, Hasegawa A, Tang D, Lassmann T, Heutink P, Babina M, Wells CA, Kojima S, Nakamura Y, Suzuki H, Daub CO, de Hoon MJ, Arner E, Hayashizaki Y, Carninci P, Forrest AR (2017). "An atlas of human long non-coding RNAs with accurate 5' ends". Nature. 543: 199–204. doi:10.1038/nature21374. PMID 28241135.
  9. ^ a b Cabili, Moran N.; Trapnell, Cole; Goff, Loyal; Koziol, Magdalena; Tazon-Vega, Barbara; Regev, Aviv; Rinn, John L. (2011-09-15). "Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses". Genes & Development. 25 (18): 1915–1927. doi:10.1101/gad.17446611. ISSN 0890-9369. PMC 3185964. PMID 21890647.
  10. ^ Ravasi, Timothy; Suzuki, Harukazu; Pang, Ken C.; Katayama, Shintaro; Furuno, Masaaki; Okunishi, Rie; Fukuda, Shiro; Ru, Kelin; Frith, Martin C. (2006-01-01). "Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome". Genome Research. 16 (1): 11–19. doi:10.1101/gr.4200206. ISSN 1088-9051. PMC 1356124. PMID 16344565.
  11. ^ Yunusov, Dinar; Anderson, Leticia; DaSilva, Lucas Ferreira; Wysocka, Joanna; Ezashi, Toshihiko; Roberts, R. Michael; Verjovski-Almeida, Sergio (2016-09-08). "HIPSTR and thousands of lncRNAs are heterogeneously expressed in human embryos, primordial germ cells and stable cell lines". Scientific Reports. 6: 32753. doi:10.1038/srep32753. ISSN 2045-2322. PMC 5015059. PMID 27605307.
  12. ^ Yan, Liying; Yang, Mingyu; Guo, Hongshan; Yang, Lu; Wu, Jun; Li, Rong; Liu, Ping; Lian, Ying; Zheng, Xiaoying. "Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells". Nature Structural & Molecular Biology. 20 (9): 1131–1139. doi:10.1038/nsmb.2660.
  13. ^ Liu, Siyuan John; Nowakowski, Tomasz J.; Pollen, Alex A.; Lui, Jan H.; Horlbeck, Max A.; Attenello, Frank J.; He, Daniel; Weissman, Jonathan S.; Kriegstein, Arnold R.; Diaz, Aaron; Lim, Daniel A. (2016-01-01). "Single-cell analysis of long non-coding RNAs in the developing human neocortex". Genome Biology. 17: 67. doi:10.1186/s13059-016-0932-1. ISSN 1474-760X. PMC 4831157. PMID 27081004.
  14. ^ Paytuví Gallart, Andreu; Hermoso Pulido, Antonio; Anzar Martínez de Lagrán, Irantzu; Sanseverino, Walter; Aiese Cigliano, Riccardo (2016-01-04). "GREENC: a Wiki-based database of plant lncRNAs". Nucleic Acids Research. 44 (D1): D1161–D1166. doi:10.1093/nar/gkv1215. ISSN 0305-1048. PMC 4702861. PMID 26578586.
  15. ^ Kapranov 2007
  16. ^ Birney 2007
  17. ^ Bentwich 2005
  18. ^ Nesterova 2001
  19. ^ Brosius 2005
  20. ^ Struhl 2007
  21. ^ a b c Pang 2006
  22. ^ Siepel 2005
  23. ^ Ponjavic 2007
  24. ^ Pollard 2006
  25. ^ a b Pollard 2006
  26. ^ Smith 2004
  27. ^ Torarinsson 2006
  28. ^ Mercer, T. R.; Dinger, M. E.; Mattick, J. S. (2009). "Long non-coding RNAs: Insights into functions". Nature Reviews Genetics. 10 (3): 155–159. doi:10.1038/nrg2521. PMID 19188922. 
  29. ^ Dinger, M. E.; Amaral, P. P.; Mercer, T. R.; Mattick, J. S. (2009). "Pervasive transcription of the eukaryotic genome: Functional indices and conceptual implications". Briefings in Functional Genomics and Proteomics. 8 (6): 407–423. doi:10.1093/bfgp/elp038. PMID 19770204. 
  30. ^ Amaral, P. P.; Clark, M. B.; Gascoigne, D. K.; Dinger, M. E.; Mattick, J. S. (2010). "LncRNAdb: A reference database for long noncoding RNAs". Nucleic Acids Research. 39 (Database issue): D146–D151. doi:10.1093/nar/gkq1138. PMC 3013714. PMID 21112873.
  31. ^ Quek, Xiu Cheng; Thomson, Daniel W.; Maag, Jesper L. V.; Bartonicek, Nenad; Signal, Bethany; Clark, Michael B.; Gloss, Brian S.; Dinger, Marcel E. (2015-01-01). "lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs". Nucleic Acids Research. 43 (Database issue): D168–173. doi:10.1093/nar/gku988. ISSN 1362-4962. PMC 4384040. PMID 25332394.
  32. ^ Smith, JE; Alvarez-Dominguez, JR; Kline, N; Huynh, NJ; Geisler, S; Hu, W; Coller, J; Baker, KE (Jun 26, 2014). "Translation of Small Open Reading Frames within Unannotated RNA Transcripts in Saccharomyces cerevisiae". Cell Reports. 7 (6): 1858–66. doi:10.1016/j.celrep.2014.05.023. PMID 24931603.
  33. ^ Luo S, Lu JY, Liu L, et al. "Divergent lncRNAs Regulate Gene Expression and Lineage Differentiation in Pluripotent Cells". Cell Stem Cell. 18 (5): 637–652. doi:10.1016/j.stem.2016.01.024. PMID 26996597. 
  34. ^ Adelman, K.; Egan, E. (2017). "Non-coding RNA: More uses for genomic junk". Nature. 543: 183–185. doi:10.1038/543183a.
  35. ^ Halley, Paul; Kadakkuzha, Beena (2014). "Regulation of the apolipoprotein gene cluster by a long noncoding RNA". Cell Reports. 6 (1): 222–30. doi:10.1016/j.celrep.2013.12.015. PMC 3924898. PMID 24388749.
  36. ^ Dieci 2007
  37. ^ Dieci 2007
  38. ^ Pagano 2007
  39. ^ a b Beltran 2008
  40. ^ Munroe 1991
  41. ^ Tiedge 1993
  42. ^ Tiedge 1991
  43. ^ Muslimov 1998
  44. ^ Wang 2005
  45. ^ Centonze 2007
  46. ^ Lewejohann 2004
  47. ^ Golden 2008
  48. ^ Czech 2008
  49. ^ Ogawa 2008
  50. ^ Blasco 2007
  51. ^ a b Schoeftner 2008
  52. ^ a b Azzalin 2007
  53. ^ Ingolia, NT; et al. (2011). "Ribosome Profiling of Mouse Embryonic Stem Cells Reveals the Complexity and Dynamics of Mammalian Proteomes". Cell. 147: 789–802. doi:10.1016/j.cell.2011.10.002. PMC 3225288. PMID 22056041.
  54. ^ Slavoff, SA; et al. (2013). "Peptidomic discovery of short open reading frame-encoded peptides in human cells". Nature Chemical Biology. 9: 59–64. doi:10.1038/nchembio.1120. PMC 3625679. PMID 23160002.
  55. ^ Kim, MS; et al. (2014). "A draft map of the human proteome". Nature. 509: 575–581. doi:10.1038/nature13302. PMC 4403737. PMID 24870542.
  56. ^ Matsumoto, A; et al. (2017). "mTORC1 and muscle regeneration are regulated by the LINC00961-encoded SPAR polypeptide". Nature. 541: 228–232. doi:10.1038/nature21034. 
  57. ^ Lukiw WJ, Handley P, Wong L, Crapper McLachlan DR (Jun 1992). "BC200 RNA in normal human neocortex, non-Alzheimer dementia (NAD), and senile dementia of the Alzheimer type (AD)". Neurochem Res. 17 (6): 591–7. doi:10.1007/bf00968788. PMID 1603265. 
  58. ^ Watson JB, Sutcliffe JG (Sep 1987). "Primate brain-specific cytoplasmic transcript of the Alu repeat family". Mol Cell Biol. 7 (9): 3324–7. PMC 367971. PMID 2444875.
  59. ^ a b Fu 2006
  60. ^ Lin 2007
  61. ^ Reis 2004
  62. ^ Eis 2005
  63. ^ Li 1997
  64. ^ Sonkoly 2005
  65. ^ a b Calin 2007
  66. ^ Ishii 2006
  67. ^ McPherson 2007
  68. ^ Pasmant 2007
  69. ^ Broadbent 2008
  70. ^ a b Jarinova 2009
  71. ^ Liu 2009
  72. ^ Shirasawa 2004
  73. ^ Faghihi 2008
  74. ^ Tufarelli 2003
  75. ^ Hesman Saey 2011
  76. ^ Rinn Lab lincRNA homepage
  77. ^ Hangauer, Matthew J.; Vaughn, Ian W.; McManus, Michael T.; Rinn, John L. (20 June 2013). "Pervasive Transcription of the Human Genome Produces Thousands of Previously Unidentified Long Intergenic Noncoding RNAs". PLoS Genetics. 9 (6): e1003569. doi:10.1371/journal.pgen.1003569. PMC 3688513. PMID 23818866.
  78. ^ Laurent 2010
  79. ^ Durruthy-Durruthy, J.; Sebastiano, V.; Wossidlo, M.; Cepeda, D.; Cui, J.; Grow, E. J.; ...; Au, K. F. (2016). "The primate-specific noncoding RNA HPAT5 regulates pluripotency during human preimplantation development and nuclear reprogramming". Nature Genetics. 48 (1): 44–52. doi:10.1038/ng.3449.


External links