Conserved non-coding sequence
CNSs in plants and animals are highly associated with transcription factor binding sites and other cis-acting regulatory elements. Conserved non-coding sequences can be important sites of evolutionary divergence as mutations in these regions may alter the regulation of conserved genes, producing species-specific patterns of gene expression. These features have made them an invaluable resource in comparative genomics.
Sources of CNSs
All CNSs likely perform some function, since function is what constrains their evolution, but they can be distinguished by where in the genome they are found and how they got there.
Introns are stretches of sequence found mostly in eukaryotic organisms which interrupt the coding regions of genes, with basepair lengths varying across three orders of magnitude. Intron sequences may be conserved, often because they contain expression regulating elements that put functional constraints on their evolution. Patterns of conserved introns between species of different kingdoms have been used to make inferences about intron density at different points in evolutionary history. This makes them an important resource for understanding the dynamics of intron gain and loss in eukaryotes (1,28).
Some of the most highly conserved noncoding regions are found in the untranslated regions (UTRs) at the 3’ end of mature RNA transcripts, rather than in the introns. This suggests an important function operating at the post-transcriptional level. If these regions perform an important regulatory function, the increase in 3’-UTR length over evolutionary time suggests that conserved UTRs contribute to organism complexity. Regulatory motifs in UTRs, which are often conserved among genes belonging to the same metabolic family, could potentially be used to develop highly specific medicines that target RNA transcripts.
Repetitive elements can accumulate in an organism’s genome as the result of a few different transposition processes. The extent to which this has taken place during the evolution of eukaryotes varies greatly: repetitive DNA accounts for just 3% of the fly genome, but accounts for 50% of the human genome.
There are different theories explaining the conservation of transposable elements. One holds that, like pseudogenes, they provide a source of new genetic material, allowing for faster adaptation to changes in the environment. A simpler alternative is that, because eukaryotic genomes may have no means to prevent the proliferation of transposable elements, they are free to accumulate as long as they are not inserted into or near a gene in such a way that they would disrupt essential functions. A recent study showed that transposons contribute at least 16% of the eutherian-specific CNSs, marking them as a “major creative force” in the evolution of gene regulation in mammals. There are three major classes of transposable elements, distinguished by the mechanisms by which they proliferate.
Classes of Transposable Elements
DNA transposons consist of a gene encoding a transposase protein, flanked by inverted repeat sequences. The transposase excises the element and reintegrates it elsewhere in the genome. If an element excises immediately after its own site has been replicated and inserts into a target site that has not yet been replicated, the number of transposons in the genome increases.
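The inverted repeats that flank a DNA transposon can be recognized computationally: the 5' end of the element reads the same as the reverse complement of its 3' end. The following is a minimal sketch of that check, assuming perfect (mismatch-free) repeats and an element whose boundaries are already known. The function names and the toy sequence are hypothetical and chosen only for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def terminal_inverted_repeat_length(element: str) -> int:
    """Length of the longest perfect terminal inverted repeat of `element`.

    Counts how many bases at the 5' end match the reverse complement
    of the 3' end, stopping at the first mismatch.
    """
    element = element.upper()
    rc = reverse_complement(element)
    length = 0
    # Each repeat arm can be at most half the element long.
    for left, right in zip(element[: len(element) // 2], rc):
        if left != right:
            break
        length += 1
    return length


# Toy element (hypothetical): a 6 bp arm, an arbitrary internal
# sequence, and the reverse complement of the arm at the other end.
toy = "CAGTGC" + "ATATTTGGA" + reverse_complement("CAGTGC")
print(terminal_inverted_repeat_length(toy))  # prints 6
```

Real terminal inverted repeats are often imperfect, so a practical tool would tolerate mismatches and would also have to locate the element boundaries, which this sketch takes as given.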
Retrotransposons use reverse transcriptase to generate a cDNA from the TE transcript. These are further divided into long terminal repeat (LTR) retrotransposons, long interspersed nuclear elements (LINEs), and short interspersed nuclear elements (SINEs). In LTR retrotransposons, after the RNA template is degraded, a DNA strand complementary to the reverse-transcribed cDNA returns the element to a double-stranded state. Integrase, an enzyme encoded by the LTR retrotransposon, then reincorporates the element at a new target site. These elements are flanked by long terminal repeats (300–500 bp) which mediate the transposition process.
LINEs use a simpler method in which the cDNA is synthesized at the target site following cleavage by a LINE-encoded endonuclease. LINE-encoded reverse transcriptase is not highly sequence-specific. The incorporation by LINE machinery of unrelated RNA transcripts gives rise to non-functional processed pseudogenes. If a small gene’s promoter is included in the transcribed portion of the gene, the stable transcript can be duplicated and reinserted into the genome multiple times. The elements produced by this process are called SINEs.
Conserved Regulatory TEs
When these elements are active in a genome, they can introduce new promoter regions, disrupt existing regulatory sites, or, if inserted into transcribed regions, alter splicing patterns. A particular transposed element will be positively selected for if the altered expression it produces confers an adaptive advantage. This has resulted in some of the conserved regions found in humans. Nearly 25% of characterized promoters in humans contain transposed elements. This is of particular interest in light of the fact that most transposable elements in humans are no longer active.
Pseudogenes are vestiges of once-functional genes disabled by sequence deletions, insertions, or mutations. The primary evidence for this process is the presence of fully functioning orthologues to these inactivated sequences in lower-vertebrate genomes. Pseudogenes commonly emerge following a gene duplication or polyploidization event. With two functional copies of a gene, there is no selective pressure to maintain expressibility of both, leaving one free to accumulate mutations as a nonfunctioning pseudogene. This is the typical case, in which the absence of purifying selection allows pseudogenes to accumulate mutations and serve as “reservoirs” of new genetic material, with the potential to be reincorporated into the genome. However, some pseudogenes have been found to be conserved in mammals. The simplest explanation is that these noncoding regions serve some biological function, and this has been found to be the case for several conserved pseudogenes. Makorin1 mRNA, for example, was found to be stabilized by its paralogous pseudogene, Makorin1-p1, which is conserved in several mouse species. Other pseudogenes have also been found to be conserved between humans and mice and between humans and chimpanzees, originating from duplication events prior to the divergence of the species. Evidence that these pseudogenes are transcribed also supports the hypothesis that they have a biological function. Findings of potentially functional pseudogenes create difficulty in defining them, since the term was originally meant for degenerate sequences with no biological function.
An example of a pseudogene is the gene for L-gulonolactone oxidase, a liver enzyme necessary for biosynthesis of L-ascorbic acid (vitamin C) in most birds and mammals. The gene is mutated in the Haplorhini suborder of primates, including humans, who must therefore obtain ascorbic acid or ascorbate from food. The remains of this non-functional gene, carrying many mutations, are still present in the genomes of guinea pigs and humans.
Ultraconserved regions (UCRs) are regions over 200 bp in length with 100% identity across species. These unique sequences are mostly found in noncoding regions. It is still not fully understood why the negative selective pressure on these regions is so much stronger than the selection in protein-coding regions. Though these regions can be seen as unique, the distinction between regions with a high degree of sequence conservation and those with perfect sequence conservation is not necessarily one of biological significance. One study in Science found that all extremely conserved noncoding sequences have important regulatory functions regardless of whether the conservation is perfect, making the distinction of ultraconservation appear somewhat arbitrary.
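The definition above translates directly into a scan over a multiple sequence alignment: look for runs of consecutive columns in which every species has the same, ungapped base. The sketch below assumes the sequences are already aligned to equal length with `-` for gaps. The function name, the `min_len` parameter, and the toy sequences are hypothetical, and the default threshold of 200 columns is taken from the definition in the text (adjust it if a strictly longer run is required).

```python
from typing import List, Tuple


def ultraconserved_runs(aligned: List[str], min_len: int = 200) -> List[Tuple[int, int]]:
    """Find runs of alignment columns with 100% identity in all sequences.

    Returns half-open (start, end) column ranges, in alignment
    coordinates, that are at least `min_len` columns long.
    """
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    runs: List[Tuple[int, int]] = []
    run_start = None
    # Iterate one column past the end so an open run is always closed.
    for col in range(width + 1):
        if col < width:
            column = {seq[col].upper() for seq in aligned}
            identical = len(column) == 1 and "-" not in column
        else:
            identical = False
        if identical and run_start is None:
            run_start = col
        elif not identical and run_start is not None:
            if col - run_start >= min_len:
                runs.append((run_start, col))
            run_start = None
    return runs


# Toy three-species alignment (hypothetical sequences), scanned with a
# small threshold so that the short example produces a hit.
toy_alignment = [
    "ACGTACGTTTGACA",
    "ACGTACGTATGACA",
    "ACGTACGTTTG-CA",
]
print(ultraconserved_runs(toy_alignment, min_len=5))  # prints [(0, 8)]
```

Lowering `min_len` or replacing the perfect-identity test with a percent-identity threshold turns the same scan into a search for highly, rather than perfectly, conserved regions, which is the distinction the paragraph above describes as somewhat arbitrary.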
CNSs in Comparative Genomics: Evolutionary Insights
The conservation of both functional and nonfunctional noncoding regions provides an important tool for comparative genomics, though conservation of cis-regulatory elements has proven particularly useful. The presence of CNSs could be due in some cases to a lack of divergence time, though the more common thinking is that they perform functions which place varying degrees of constraint on their evolution. Consistent with this theory, cis-regulatory elements are commonly found in conserved noncoding regions. Thus, sequence similarity is often used as a parameter to limit the search space when trying to identify regulatory elements conserved across species, though this is most useful in analyzing distantly related organisms, since closer relatives have sequence conservation among nonfunctional elements as well.
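Using sequence similarity to limit the search space can be sketched as a sliding-window identity scan over a pairwise alignment: only windows above an identity threshold are kept as candidate regions for further analysis. The example below is illustrative rather than a description of any particular tool. The function name, the window size, and the identity threshold are assumptions, and the input is assumed to be two already-aligned sequences of equal length with `-` for gaps.

```python
from typing import List, Tuple


def conserved_windows(
    seq_a: str,
    seq_b: str,
    window: int = 100,
    min_identity: float = 0.7,
) -> List[Tuple[int, float]]:
    """Return (start, identity) for each window meeting the threshold.

    Identity is the fraction of columns in the window where both
    sequences carry the same base; gap columns count as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    if window <= 0 or window > len(seq_a):
        return []

    # 1 where the column is an ungapped match, 0 otherwise.
    matches = [
        int(a == b and a != "-")
        for a, b in zip(seq_a.upper(), seq_b.upper())
    ]

    hits: List[Tuple[int, float]] = []
    current = sum(matches[:window])
    for start in range(len(matches) - window + 1):
        if start > 0:
            # Slide by one column: add the entering one, drop the leaving one.
            current += matches[start + window - 1] - matches[start - 1]
        identity = current / window
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


# Toy alignment (hypothetical): the first eight columns match.
hits = conserved_windows("ACGTACGTAA", "ACGTACGTCC", window=4, min_identity=1.0)
print([start for start, _ in hits])  # prints [0, 1, 2, 3, 4]
```

For closely related species most windows would pass such a filter, which is why the threshold-based approach is more informative for distantly related organisms. A real pipeline would also exclude annotated coding regions before calling the remaining windows noncoding.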
Conservation of noncoding sequence is important for the analysis of paralogs within a single species as well. CNSs shared by paralogous clusters of Hox genes are candidates for expression regulating regions, possibly coordinating the similar expression patterns of these genes.
Comparative genomic studies of the promoter regions of orthologous genes can also detect differences in the presence and relative positioning of transcription factor binding sites in promoter regions. Orthologues with high sequence similarity may not share the same regulatory elements. These differences may account for different expression patterns across species.
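A comparison of binding-site presence and positioning can be illustrated by scanning each orthologous promoter for a set of motifs and reporting where each one occurs. The sketch below makes several simplifying assumptions: motifs are matched as exact strings (practical tools typically use position weight matrices), only the given strand is scanned, and each promoter sequence is assumed to end at the transcription start site so that positions can be reported as negative offsets upstream of it. The function names, species labels, and sequences are hypothetical.

```python
import re
from typing import Dict, List


def motif_positions(promoter: str, motif: str) -> List[int]:
    """Offsets of exact motif matches relative to the promoter's 3' end.

    The 3' end is assumed to be the transcription start site, so an
    offset of -17 means the match starts 17 bp upstream of it.
    """
    promoter = promoter.upper()
    # A lookahead pattern also reports overlapping occurrences.
    pattern = re.compile(f"(?={re.escape(motif.upper())})")
    return [m.start() - len(promoter) for m in pattern.finditer(promoter)]


def compare_promoters(
    promoters: Dict[str, str], motifs: List[str]
) -> Dict[str, Dict[str, List[int]]]:
    """Map each motif to its match offsets in every species' promoter."""
    return {
        motif: {
            species: motif_positions(seq, motif)
            for species, seq in promoters.items()
        }
        for motif in motifs
    }


# Hypothetical orthologous promoters carrying the same two motifs
# in a different order and at different distances from the 3' end.
promoters = {
    "species_a": "GGCTATAAAGGCGCCAATCT",
    "species_b": "CCAATGGGGCTATAAAGGCG",
}
report = compare_promoters(promoters, ["TATAAA", "CCAAT"])
print(report["TATAAA"])  # prints {'species_a': [-17], 'species_b': [-10]}
print(report["CCAAT"])   # prints {'species_a': [-7], 'species_b': [-20]}
```

In this toy case both promoters contain both motifs, yet their relative positioning differs, which is the kind of difference that overall sequence similarity alone would not reveal.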
The regulatory functions commonly associated with conserved non-coding regions are thought to play a role in the evolution of eukaryotic complexity. On average, plants contain fewer CNSs per gene than mammals. This is thought to be related to their having undergone more polyploidization (genome duplication) events. During the subfunctionalization that follows gene duplication, there is potential for a greater rate of CNS loss per gene. Thus, genome duplication events may account for the fact that plants have more genes, each with fewer CNSs. Assuming the number of CNSs to be a proxy for regulatory complexity, this may account for the disparity in complexity between plants and mammals.
Because changes in gene regulation are thought to account for most of the differences between humans and chimpanzees, researchers have looked to CNSs for evidence of this. A portion of the CNSs shared between humans and other primates is enriched in human-specific single-nucleotide polymorphisms (SNPs), suggesting positive selection for these SNPs and accelerated evolution of those CNSs. Many of these SNPs are also associated with changes in gene expression, suggesting that these CNSs played an important role in human evolution.
Online Bioinformatic Software for Analyzing CNSs
References
- Hardison, RC. (Sep 2000). "Conserved noncoding sequences are reliable guides to regulatory elements.". Trends Genet 16 (9): 369–72. PMID 10973062.
- Freeling, M; Subramaniam, S (Apr 2009). "Conserved noncoding sequences (CNSs) in higher plants.". Curr Opin Plant Biol 12 (2): 126–32. doi:10.1016/j.pbi.2009.01.005. PMID 19249238.
- Prabhakar, S.; Noonan, JP.; Pääbo, S.; Rubin, EM. (Nov 2006). "Accelerated evolution of conserved noncoding sequences in humans.". Science 314 (5800): 786. doi:10.1126/science.1130738. PMID 17082449.
- Jegga, AG.; Aronow, BJ. (Apr 2008). "Evolutionarily Conserved Noncoding DNA.". eLS. doi:10.1002/9780470015902.a0006126.pub2.
- Rogozin, IB.; Wolf, YI.; Sorokin, AV.; Mirkin, BG.; Koonin, EV. (Sept 2003). "Remarkable Interkingdom Conservation of Intron Positions and Massive, Lineage-Specific Intron Loss and Gain in Eukaryotic Evolution.". Current Bio 13 (17): 1512–1517. doi:10.1016/S0960-9822(03)00558-X. PMID 12956953.
- Eickbush, TH.; Eickbush, DJ. (July 2006). "Transposable Elements: Evolution.". eLS.
- Mikkelsen, T.S., et al. (2007). "Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences.". Nature 447 (7141): 167–177. doi:10.1038/nature05805. PMID 17495919.
- Feschotte, Cédric. (May 2008). "Transposable Elements and the Evolution of Regulatory Networks.". Nature Reviews Genetics.
- Cooper, DN. Human Gene Evolution. Oxford: BIOS Scientific Publishers, Sept, 1988, p.265-292
- Svensson, O.; Arvestad, L.; Lagergren, J. (May 2005). "Genome-wide survey for biologically functional pseudogenes.". PLoS Comput Biol. 2 (5): 46. doi:10.1371/journal.pcbi.0020046. PMC 1456316. PMID 16680195.
- Podlaha, Ondrej.; Zhang, Jianzhi. (Nov 2010). "Pseudogenes and Their Evolution.". eLS.
- Nishikimi M, Kawai T, Yagi K (October 1992). "Guinea pigs possess a highly mutated gene for L-gulono-gamma-lactone oxidase, the key enzyme for L-ascorbic acid biosynthesis missing in this species". J. Biol. Chem. 267 (30): 21967–72. PMID 1400507.
- Bejerano, G.; Pheasant, M.; Makunin, I.; Stephen, S.; Kent, WJ.; Mattick, JS.; Haussler, David. (May 2004). "Ultraconserved Elements in the Human Genome.". Science 304 (5675): 1321–1325. doi:10.1126/science.1098119. PMID 15131266.
- Katzman, Sol.; Kern, AD.; Bejerano, G.; Fewell, G.; Fulton, L.; Wilson, RK.; Salama, SR.; Haussler, David. (Aug 2007). "Human Genome Ultraconserved Elements are Ultraselected.". Science 317 (5840): 915. doi:10.1126/science.1142430. PMID 17702936.
- Dubchak, I.; Brudno, M.; Loots, GG.; Pachter, L.; Mayor, C.; Rubin, EM.; Frazer, KA. (2000). "Active Conservation of Noncoding Sequences Revealed by Three-Way Species Comparisons.". Genome Res. 10 (9): 1304–1306. doi:10.1101/gr.142200. PMC 310906. PMID 10984448.
- Matsunami, M.; Sumiyama, K.; Saitou, N. (Sept 2010). "Evolution of Conserved Non-Coding Sequences Within the Vertebrate Hox Clusters Through the Two-Round Whole Genome Duplications Revealed by Phylogenetic Footprinting Analysis.". Journal of Mol. Evol. 71 (5-6): 427–463. doi:10.1007/s00239-010-9396-1. PMID 20981416.
- Santini, S.; Boore, JL.; Meyer, A. (2003). "Evolutionary Conservation of Regulatory Elements in Vertebrate Hox Gene Clusters.". Genome Res. 13 (6A): 1111–1122. doi:10.1101/gr.700503. PMC 403639. PMID 12799348.
- Greaves, D.R., et al. (1998). "Functional Comparison of the Murine Macrosialin and Human CD68 Promoters in Macrophage and Nonmacrophage Cell Lines.". Genomics 54 (1): 165–168. doi:10.1006/geno.1998.5546. PMID 9806844.
- Marchese, A., et al. (1994). "Mapping Studies of Two G Protein-Coupled Receptor Genes: An Amino Acid Difference May Confer a Functional Variation Between a Human and Rodent Receptor.". Biochem. and Biophys. Res. Comm. 205 (3): 1952–1958. doi:10.1006/bbrc.1994.2899. PMID 7811287.
- Margarit, Ester, et al. (1998). "Identification of Conserved Potentially Regulatory Sequences of the SRY Gene from 10 Different Species of Mammals.". Biochem. and Biophys. Res. Comm. 245 (2): 370–377. doi:10.1006/bbrc.1998.8441. PMID 9571157.
- Lockton, Steven.; Gaut, BS. (Jan 2005). "Plant conserved non-coding sequences and paralogue evolution.". Trends in Genetics 21 (1): 60–65. doi:10.1016/j.tig.2004.11.013. PMID 15680516.
- Bird, Christine P., et al. (2007). "Fast-evolving noncoding sequences in the human genome.". Genome Biology 8 (6): R118. doi:10.1186/gb-2007-8-6-r118. PMC 2394770. PMID 17578567.