Conserved sequence

From Wikipedia, the free encyclopedia
A sequence alignment, produced by ClustalO, of mammalian histone proteins.
Sequences are the amino acids for residues 120-180 of the proteins. Residues that are conserved across all sequences are highlighted in grey. Below the protein sequences is a key denoting conserved sequence (*), conservative mutations (:), semi-conservative mutations (.), and non-conservative mutations ( ).[1]

In biology, conserved sequences are similar or identical sequences that occur within nucleic acid sequences (such as RNA and DNA sequences), protein sequences, or polymeric carbohydrates across species (orthologous sequences) or within different molecules produced by the same organism (paralogous sequences). In the case of cross-species conservation, this indicates that a particular sequence may have been maintained by evolution despite speciation. The further back in the phylogenetic tree a conserved sequence can be traced, the more highly conserved it is said to be. Since sequence information is normally transmitted from parents to progeny by genes, a conserved sequence implies that there is a conserved gene.

It is widely believed that a mutation in a "highly conserved" region leads to a non-viable life form, or to a form that is eliminated through natural selection. The environment also plays a crucial role. For example, if a microorganism carrying antibiotic resistance genes is exposed to the antibiotic, those genes will be highly conserved. In the absence of the antibiotic, the genes are no longer under that selection and may become non-conserved.

Nucleic acid and protein sequences

Highly conserved DNA sequences are thought to have functional value, although the role of many highly conserved non-coding DNA sequences is not understood. Ultra-conserved elements or regions (UCEs or UCRs) that share 100% identity among human, mouse and rat were first described by Bejerano and colleagues in 2004.[2] A 2007 study that deleted four highly conserved non-coding DNA sequences in mice yielded viable mice with no significant phenotypic differences; the authors described their findings as "unexpected".[3] Many regions of the DNA, including highly conserved DNA sequences, consist of repeated sequence elements. One possible explanation of this null result is that removing only one copy, or a subset, of a repeated sequence could preserve the phenotype if a single copy is sufficient and the repetitions are superfluous to essential life processes; the paper did not specify whether the deleted sequences were repeated. Although the biological function of most conserved sequences is still unknown, transcripts derived from a few conserved sequences have been shown to be deregulated in human cancer tissues.[4]


The Clustal alignment programs use a common notation for the level of sequence conservation. Below a set of aligned sequences, each residue column is marked as fully conserved (*), containing only conservative mutations (:), containing only semi-conservative mutations (.), or containing non-conservative mutations ( ).[5]
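The column notation can be illustrated with a minimal Python sketch that derives a conservation line from a set of aligned protein sequences. The function names are hypothetical. The "strong" and "weak" residue groups are the ones commonly listed for Clustal; they are an assumption here and should be checked against the Clustal FAQ before being relied on. Marking any column that contains a gap as non-conserved is a further simplifying assumption.

```python
# Residue groups commonly listed for Clustal (assumption: verify against the Clustal FAQ).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = {c.upper() for c in column}
    if "-" in residues:
        return " "  # assumption: a gapped column is treated as non-conserved
    if len(residues) == 1:
        return "*"  # every sequence has the same residue
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall within one strongly similar group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall within one weakly similar group
    return " "


def conservation_line(aligned):
    """Build the symbol line for a list of equal-length aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return "".join(column_symbol(col) for col in zip(*aligned))


if __name__ == "__main__":
    toy_alignment = ["MKTAYD", "MRTSFW", "MKTGWG"]  # made-up sequences
    for seq in toy_alignment:
        print(seq)
    print(conservation_line(toy_alignment))
```

The toy sequences are invented for illustration only; a real alignment would come from an alignment program such as ClustalO.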

CG islands

Germ-line methylation can shut down gene expression. Cytosine-guanine (CG) sequences in a gene are potential methylation sites and, when methylated, disrupt regular expression of that portion of the gene. When a methylated cytosine (5-methyl-cytosine) deaminates, it turns into thymine, which is then incorrectly paired with guanine. The guanine may in turn be replaced with adenine, locking in an altered gene sequence. Because 5-methyl-cytosines are likely to deaminate over time, the frequency of CG sequences falls in the methylated regions of the gene. Some regions, however, retain a high frequency of CG sequences because they are not methylated, and this absence of methylation allows regular expression of those areas of the gene. These regions, commonly referred to as CG islands, are said to be highly conserved sequences. They are considered highly conserved because any alteration to the sequence, such as methylation, is detrimental to the organism; CG islands are therefore said to be under selective pressure. Similar CG islands can be found in the genomes of various species, denoting conservation of these sequences over a long period of time.
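The depletion of CG sequences described above can be quantified. The sketch below, in Python, computes the GC content of a sequence and the ratio of observed to expected CG dinucleotides, a widely used measure that is not itself described in the text above. The function name `cpg_stats` is hypothetical, and the expected count assumes that C and G occur independently along the sequence.

```python
def cpg_stats(seq):
    """Return (gc_content, observed/expected CG ratio) for a DNA string.

    Expected CG count is (number of C * number of G) / length, which
    assumes C and G are placed independently of each other.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    cg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c_count + g_count) / n
    if c_count == 0 or g_count == 0:
        return gc_content, 0.0
    obs_exp = cg_count * n / (c_count * g_count)
    return gc_content, obs_exp


if __name__ == "__main__":
    # Made-up sequences: one rich in CG, one with the same bases but no CG pairs.
    for name, seq in [("CG-rich", "CGCGCGCG"), ("CG-depleted", "CATGCATG")]:
        gc, ratio = cpg_stats(seq)
        print(f"{name}: GC content {gc:.2f}, observed/expected CG {ratio:.2f}")
```

A ratio well below 1 indicates that CG sequences are rarer than base composition alone would predict, as expected in methylated regions; a ratio near or above 1 is what an unmethylated CG island would show. In practice the calculation would be applied in windows along a genome rather than to a whole sequence at once.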

GERP scores

A GERP (Genomic Evolutionary Rate Profiling) score measures evolutionary conservation of genetic sequences across species.[6] There is a relationship between a sequence's GERP score and the proportion of variant alleles within that sequence: as the GERP score increases, variation within the sequence becomes rarer. A higher GERP score signifies a highly conserved sequence in which alteration is harmful, so adverse variants reduce the fitness of the organism and are selected against.
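The relationship between GERP score and variation could be checked by grouping sites into score bins and computing the fraction of sites in each bin that carry a variant allele. The Python sketch below shows one way to do this. The function name, the bin edges and the input data are all hypothetical; the toy sites are invented to show the calculation and are not real GERP scores or results.

```python
from bisect import bisect_right


def variant_fraction_by_bin(sites, edges):
    """Fraction of variant sites in each score bin.

    sites: iterable of (gerp_score, is_variant) pairs.
    edges: ascending bin edges; bin 0 is score < edges[0],
           the last bin is score >= edges[-1].
    Returns one fraction per bin, or None for an empty bin.
    """
    totals = [0] * (len(edges) + 1)
    variants = [0] * (len(edges) + 1)
    for score, is_variant in sites:
        b = bisect_right(edges, score)  # index of the bin containing this score
        totals[b] += 1
        variants[b] += int(is_variant)
    return [v / t if t else None for v, t in zip(variants, totals)]


if __name__ == "__main__":
    # Invented example sites: (score, carries a variant allele?)
    toy_sites = [
        (-1.0, True), (-0.5, True),
        (1.0, True), (1.5, False),
        (3.0, False), (3.5, True), (3.9, False), (3.2, False),
        (5.0, False), (5.5, False),
    ]
    print(variant_fraction_by_bin(toy_sites, edges=[0, 2, 4]))
```

With real data, a declining fraction from the lowest to the highest bin would be consistent with the pattern described above.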

Biological role

Highly conserved sequences are often required for basic cellular function, stability or reproduction. Sequence similarity is used as evidence of structural and functional conservation, and evolutionary relationships between sequences. Consequently, functional elements are frequently identified by searching for conserved sequence in a genome.
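As a rough illustration of searching for conserved sequence, the Python sketch below scans a multiple alignment with a sliding window and reports regions whose mean column identity meets a threshold. The function names, the window size and the threshold are assumptions chosen for the example; established tools use more elaborate, phylogeny-aware scoring.

```python
from collections import Counter


def column_identity(column):
    """Fraction of sequences sharing the most common residue (gaps never match)."""
    counts = Counter(c for c in column if c != "-")
    if not counts:
        return 0.0
    return counts.most_common(1)[0][1] / len(column)


def conserved_regions(aligned, window=4, threshold=1.0):
    """Return merged (start, end) half-open column ranges that pass the threshold."""
    scores = [column_identity(col) for col in zip(*aligned)]
    regions = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean < threshold:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            # Overlapping or adjacent to the previous hit: extend it.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


if __name__ == "__main__":
    toy_alignment = [  # made-up aligned DNA sequences
        "ACGTACGTAA",
        "ACGTACCTTA",
        "ACGTAGGTCA",
    ]
    print(conserved_regions(toy_alignment, window=4, threshold=1.0))
```

Lowering the threshold or widening the window trades specificity for sensitivity; the values used here are arbitrary.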

Conservation of protein-coding sequences leads to the presence of identical amino acid residues at analogous regions of the protein structure and hence to similar function. Conservative mutations replace amino acids with chemically similar residues and so may not affect the protein's function. Among the most highly conserved sequences are the active sites of enzymes and the binding sites of protein receptors.

Conserved non-coding sequences do not encode protein, but often harbour cis-regulatory elements. Some deletions of highly conserved sequences in humans (hCONDELs) and other organisms have been suggested to be a potential cause of the anatomical and behavioural differences between humans and other mammals.[7][8] The TATA promoter sequence is an example of a highly conserved DNA sequence found in most eukaryotes.[9]

Polymeric carbohydrate sequences

The monosaccharide sequence of the glycosaminoglycan heparin is conserved across a wide range of species.


Research on conserved genetic sequences has practical value. Detecting similar sequences across the genomes of diverse species provides information about the evolutionary history of those species. The examination of conserved sequences can also aid medical research: rare alleles identified within conserved sequences can be compiled and used to assess the risk of disease in humans. Genome-wide association studies (GWAS) compare alleles across the human genome and their association with the risk of particular diseases or ailments.

References


  1. "Clustal FAQ #Symbols". Clustal. Retrieved 8 December 2014.
  2. Bejerano, G; Pheasant, M; Makunin, I; Stephen, S; Kent, WJ; Mattick, JS; Haussler, D (2004-05-28). "Ultraconserved elements in the human genome". Science. 304 (5675): 1321–5. doi:10.1126/science.1098119. PMID 15131266.
  3. Ahituv N, Zhu Y, Visel A, et al. (2007). "Deletion of ultraconserved elements yields viable mice". PLoS Biol. 5 (9): e234. doi:10.1371/journal.pbio.0050234. PMC 1964772. PMID 17803355.
  4. Calin, GA; Liu, CG; Ferracin, M; Hyslop, T; Spizzo, R; Sevignani, C; Fabbri, M; Cimmino, A; Lee, EJ; Wojcik, SE; Shimizu, M; Tili, E; Rossi, S; Taccioli, C; Pichiorri, F; Liu, X; Zupo, S; Herlea, V; Gramantieri, L; Lanza, G; Alder, H; Rassenti, L; Volinia, S; Schmittgen, TD; Kipps, TJ; Negrini, M; Croce, CM (September 2007). "Ultraconserved regions encoding ncRNAs are altered in human leukemias and carcinomas". Cancer Cell. 12 (3): 215–29. doi:10.1016/j.ccr.2007.07.027. PMID 17785203.
  5. "Clustal FAQ #Symbols". Clustal. Retrieved 8 December 2014.
  6. Genomic Evolutionary Rate Profiling at Sidow Lab.
  7. McLean, Cory Y.; et al. (10 March 2011). "Human-specific loss of regulatory DNA and the evolution of human-specific traits". Nature. 471 (7337): 216–219. doi:10.1038/nature09774. PMC 3071156. PMID 21390129.
  8. Gross, Liza (September 2007). "Are "Ultraconserved" Genetic Elements Really Indispensable?". PLOS Biology. 5 (9): e253. doi:10.1371/journal.pbio.0050253. PMC 1964769. PMID 20076686.
  9. Patikoglou, G. A.; Kim, J. L.; Sun, L.; Yang, S.-H.; Kodadek, T.; Burley, S. K. (15 December 1999). "TATA element recognition by the TATA box-binding protein has been conserved throughout evolution". Genes & Development. 13 (24): 3217–3230. doi:10.1101/gad.13.24.3217. PMID 10617571.