Pseudogene

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Boghog (talk | contribs) at 15:01, 25 July 2009 (→‎Properties of pseudogenes: deleted unsupported statement (see talk)). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Pseudogenes are defunct relatives of known genes that have lost their protein-coding ability or are otherwise no longer expressed in the cell.[1] Some lack introns and promoters; these processed pseudogenes are copied from mRNA and incorporated into the chromosome.[2] Most pseudogenes nevertheless retain some gene-like features (such as promoters, CpG islands, and splice sites). They are considered nonfunctional either because they cannot encode protein, owing to various genetic disablements (stop codons, frameshifts, or a lack of transcription), or because they cannot encode RNA (as with rRNA pseudogenes). The term, coined in 1977 by Jacq et al.,[3] combines the prefix pseudo, meaning false, with the root gene, the central unit of molecular genetics.

Because pseudogenes are generally thought of as the last stop for genomic material that is to be removed from the genome,[4] they are often labeled as junk DNA. Nonetheless, pseudogenes contain fascinating biological and evolutionary histories within their sequences. This is due to a pseudogene's shared ancestry with a functional gene: in the same way that Darwin thought of two species as possibly having a shared common ancestry followed by millions of years of evolutionary divergence (see speciation), a pseudogene and its associated functional gene also share a common ancestor and have diverged as separate genetic entities over millions of years.

Properties of pseudogenes

Pseudogenes are characterized by a combination of homology to a known gene and nonfunctionality. That is, although every pseudogene has a DNA sequence similar to that of some functional gene, it is nonetheless unable to produce a functional final product.[5] Pseudogenes are difficult to identify and characterize in genomes, because both homology and nonfunctionality are usually inferred from sequence calculations and alignments rather than demonstrated biologically.

  1. Homology is implied by sequence identity between the DNA sequences of the pseudogene and parent gene. After aligning the two sequences, the percentage of identical base pairs is computed. A high sequence identity (usually between 40% and close to 100%) means that it is highly likely that these two sequences diverged from a common ancestral sequence (are homologous), and highly unlikely that these two sequences were independently created (see typewriting monkeys).
  2. Nonfunctionality can manifest itself in many ways. Normally, a gene must go through several steps to get from a DNA sequence to a fully functional protein: transcription, pre-mRNA processing, translation, and protein folding are all required parts of this process. If any of these steps fails, the sequence may be considered nonfunctional. In high-throughput pseudogene identification, the most commonly identified disablements are stop codons and frameshifts, which almost universally prevent the translation of a functional protein product.
  3. Pseudogenes for RNA genes are often easier to discover. Many RNA genes occur as multiple copy genes, and pseudogenes are identified through sequence identity and location within the region.
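
The sequence-based criteria above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned by some external tool, with `-` marking gaps, and it counts identity only over columns where neither sequence has a gap; other conventions for handling gaps exist. The function names are hypothetical and chosen for this illustration only, and the stop codons are those of the standard genetic code.

```python
import re

# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def percent_identity(aligned_a, aligned_b):
    """Percentage of gap-free alignment columns in which both bases are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def first_premature_stop(coding_seq):
    """Return the 0-based codon index of the first in-frame stop codon that
    occurs before the final codon, or None if there is none."""
    seq = coding_seq.upper()
    n_codons = len(seq) // 3
    for i in range(n_codons - 1):  # the final codon is allowed to be a stop
        if seq[3 * i:3 * i + 3] in STOP_CODONS:
            return i
    return None


def has_frameshift_gap(aligned_parent, aligned_candidate):
    """True if either aligned sequence contains a gap run whose length is not
    a multiple of three, which would shift the reading frame."""
    for seq in (aligned_parent, aligned_candidate):
        for run in re.finditer(r"-+", seq):
            if len(run.group()) % 3 != 0:
                return True
    return False
```

A candidate with high identity to a parent gene and a premature stop or frameshifting gap would be flagged as a possible pseudogene under these assumptions; as noted above, such calls are inferences from sequence and are not biological proof of nonfunctionality.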

Types and origin of pseudogenes

There are three main types of pseudogenes, all with distinct mechanisms of origin and characteristic features. The classifications of pseudogenes are as follows:

  1. Processed (or retrotransposed) pseudogenes. In higher eukaryotes, particularly mammals, retrotransposition is a fairly common event that has had a huge impact on the composition of the genome. For example, between 30% and 44% of the human genome consists of repetitive elements such as SINEs and LINEs (see retrotransposons).[6][7] In the process of retrotransposition, a portion of the mRNA transcript of a gene is spontaneously reverse transcribed back into DNA and inserted into chromosomal DNA. Although retrotransposons usually create copies of themselves, it has been shown in an in vitro system that they can create retrotransposed copies of random genes, too.[8] Once inserted back into the genome, these pseudogenes usually contain a poly-A tail and usually have had their introns spliced out; both are hallmark features of cDNAs. However, because they are derived from a mature mRNA product, processed pseudogenes also lack the upstream promoters of normal genes; thus, they are considered "dead on arrival", becoming non-functional pseudogenes immediately upon the retrotransposition event.[9] A further characteristic of processed pseudogenes is common truncation of the 5' end relative to the parent sequence, a result of the relatively non-processive retrotransposition mechanism that creates them.[10]
  2. Non-processed (or duplicated) pseudogenes. Gene duplication is another common and important process in the evolution of genomes. A copy of a functional gene may arise as a result of a gene duplication event and subsequently acquire mutations that cause it to become nonfunctional. Duplicated pseudogenes usually have all the same characteristics of genes, including an intact exon-intron structure and promoter sequences. The loss of a duplicated gene's functionality usually has little effect on an organism's fitness, since an intact functional copy still exists. According to some evolutionary models, shared duplicated pseudogenes indicate the evolutionary relatedness of humans and the other primates.[11]
  3. Disabled genes, or unitary pseudogenes. Various mutations can stop a gene from being successfully transcribed or translated, and a gene may become nonfunctional or deactivated if such a mutation becomes fixed in the population. This is the same mechanism by which non-processed pseudogenes become deactivated; the difference is that here the gene was not duplicated before becoming disabled. Normally, such a gene deactivation would be unlikely to become fixed in a population, but various population effects, such as genetic drift, a population bottleneck, or in some cases natural selection, can lead to fixation. The classic example of a unitary pseudogene is the gene that presumably coded for the enzyme L-gulono-γ-lactone oxidase (GLO) in primates. In all mammals studied other than primates and guinea pigs, GLO aids in the biosynthesis of ascorbic acid (vitamin C), but it exists as a disabled gene in humans and other primates.[12][13] A more recent example of a disabled gene is caspase-12, whose deactivation (through a nonsense mutation) has been linked to positive selection in humans.[14]
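
The three categories above can be summarized as a small decision rule. The sketch below is a deliberately simplified illustration and not an established classification method: the `Candidate` fields and the `classify` function are hypothetical names, the input is assumed to be a sequence already judged nonfunctional, and either hallmark of retrotransposition (missing introns or a poly-A tail) is treated as sufficient evidence for a processed origin.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Features of a sequence already judged to be a nonfunctional gene relative.

    All fields are assumptions of this sketch, not a standard annotation format.
    """
    retains_parent_introns: bool          # exon-intron structure of the parent is intact
    has_poly_a_tail: bool                 # poly-A tract found at the 3' end
    functional_copy_in_same_genome: bool  # an intact functional copy still exists


def classify(candidate):
    """Assign one of the three pseudogene types described in the text."""
    # Hallmarks of a cDNA-like copy: introns spliced out and/or a poly-A tail.
    if not candidate.retains_parent_introns or candidate.has_poly_a_tail:
        return "processed"
    # Intact gene structure with a functional copy elsewhere: a disabled duplicate.
    if candidate.functional_copy_in_same_genome:
        return "duplicated"
    # No functional copy remains in the genome: the original gene itself was disabled.
    return "unitary"
```

Under these assumptions, a GLO-like case (intact gene structure, no functional copy left in the genome) would be labeled unitary. Real annotation pipelines weigh more evidence than this, for example parent genes that have no introns to begin with.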

Pseudogenes can complicate molecular genetic studies. For example, a researcher who wants to amplify a gene by PCR may simultaneously amplify a pseudogene that shares similar sequences. This is known as PCR bias or amplification bias. Similarly, pseudogenes are sometimes annotated as genes in genome sequences.
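
The co-amplification problem can be sketched in Python as follows. This is a simplified model that assumes exact primer matches on a single strand; real primers tolerate mismatches, and dedicated primer-specificity tools account for that. The function names and the example sequences are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon(template, forward_primer, reverse_primer):
    """Return the region a primer pair would amplify from the template,
    or None if either primer site is absent.

    Assumes exact matching: the forward primer must occur in the template,
    and the reverse complement of the reverse primer must occur downstream.
    """
    t = template.upper()
    fwd = forward_primer.upper()
    start = t.find(fwd)
    if start == -1:
        return None
    rev_site = reverse_complement(reverse_primer)
    end = t.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return t[start:end + len(rev_site)]


# Hypothetical gene and pseudogene that share both primer sites but differ
# by one base inside the amplified region.
gene = "GGACCGTAAAGGGTGCAACC"
pseudogene = "TTACCGTAAATGGTGCAAAG"

for name, template in (("gene", gene), ("pseudogene", pseudogene)):
    print(name, amplicon(template, "ACCGT", "TTGCA"))
```

Both templates yield a product of the same length here, which is why a pseudogene product can be hard to distinguish from the intended one on size alone.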

Processed pseudogenes pose a problem for gene prediction programs, which often misidentify them as real genes or exons. It has been proposed that identifying processed pseudogenes can help improve the accuracy of gene prediction methods.[15]

It has also been shown that the parent sequences that give rise to processed pseudogenes lose their coding potential faster than those giving rise to non-processed pseudogenes.[4]

Functional pseudogenes?

By definition, pseudogenes lack a function. However, the classification of pseudogenes generally relies on computational analysis of genomic sequences using complex algorithms.[16] This has led to sequences being incorrectly identified as pseudogenes. For example, the functional chimeric gene jingwei in Drosophila was once thought to be a processed pseudogene.[17]

It has been established that quite a few pseudogenes can go through the process of transcription, either if their own promoter is still intact or in some cases using the promoter of a nearby gene; this expression of pseudogenes also appears to be tissue-specific.[4] In 2003, Hirotsune et al. identified a retrotransposed pseudogene whose transcript purportedly plays a trans-regulatory role in the expression of its homologous gene, Makorin1, and suggested this as a general model under which pseudogenes may play an important biological role.[18] Other researchers have since hypothesized similar roles for other pseudogenes.[19] Hirotsune's report prompted two molecular biologists to carefully review scientific literature on the subject of pseudogenes. To the surprise of many, they found a number of examples in which pseudogenes play a role in gene regulation and expression,[20] forcing Hirotsune's group to rescind their claim that they were the first to identify pseudogene function.[21] Furthermore, the original findings of Hirotsune et al. concerning Makorin1 have recently been strongly contested;[22] thus, the possibility that some pseudogenes could have important biological functions was disputed. Additionally, University of Chicago and University of Cincinnati scientists reported in 2002 that a processed pseudogene called phosphoglycerate mutase 3 (PGAM3P) actually produces a functional protein.[23]

A 2008 publication in Nature reported that some endogenous siRNAs are derived from pseudogenes, suggesting that some pseudogenes play a role in regulating protein-coding transcripts.[24]

References

  1. ^ Vanin EF (1985). "Processed pseudogenes: characteristics and evolution". Annu. Rev. Genet. 19: 253–72. doi:10.1146/annurev.ge.19.120185.001345. PMID 3909943.
  2. ^ Evolutionary Analysis Fourth Edition, Freeman Scott, Herron John C.
  3. ^ Jacq C, Miller JR, Brownlee GG (1977). "A pseudogene structure in 5S DNA of Xenopus laevis". Cell. 12 (1): 109–20. doi:10.1016/0092-8674(77)90189-1. PMID 561661.
  4. ^ a b c Zheng D, Frankish A, Baertsch R, Kapranov P, Reymond A, Choo SW, Lu Y, Denoeud F, Antonarakis SE, Snyder M, Ruan Y, Wei CL, Gingeras TR, Guigó R, Harrow J, Gerstein MB (2007). "Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution". Genome Res. 17 (6): 839–51. doi:10.1101/gr.5586307. PMC 1891343. PMID 17568002.
  5. ^ Mighell AJ, Smith NR, Robinson PA, Markham AF (2000). "Vertebrate pseudogenes". FEBS Lett. 468 (2–3): 109–14. doi:10.1016/S0014-5793(00)01199-6. PMID 10692568.
  6. ^ Jurka J (2004). "Evolutionary impact of human Alu repetitive elements". Curr. Opin. Genet. Dev. 14 (6): 603–8. doi:10.1016/j.gde.2004.08.008. PMID 15531153.
  7. ^ Dewannieux M, Heidmann T (2005). "LINEs, SINEs and processed pseudogenes: parasitic strategies for genome modeling". Cytogenet. Genome Res. 110 (1–4): 35–48. doi:10.1159/000084936. PMID 16093656.
  8. ^ Dewannieux M, Esnault C, Heidmann T (2003). "LINE-mediated retrotransposition of marked Alu sequences". Nat. Genet. 35 (1): 41–8. doi:10.1038/ng1223. PMID 12897783.
  9. ^ Graur D, Shuali Y, Li WH (1989). "Deletions in processed pseudogenes accumulate faster in rodents than in humans". J. Mol. Evol. 28 (4): 279–85. doi:10.1007/BF02103423. PMID 2499684.
  10. ^ Pavlícek A, Paces J, Zíka R, Hejnar J (2002). "Length distribution of long interspersed nucleotide elements (LINEs) and processed pseudogenes of human endogenous retroviruses: implications for retrotransposition and pseudogene detection". Gene. 300 (1–2): 189–94. doi:10.1016/S0378-1119(02)01047-8. PMID 12468100.
  11. ^ Max EE (2003-05-05). "Plagiarized Errors and Molecular Genetics". TalkOrigins Archive. Retrieved 2008-07-22.
  12. ^ Nishikimi M, Kawai T, Yagi K (1992). "Guinea pigs possess a highly mutated gene for L-gulono-gamma-lactone oxidase, the key enzyme for L-ascorbic acid biosynthesis missing in this species". J. Biol. Chem. 267 (30): 21967–72. PMID 1400507.
  13. ^ Nishikimi M, Fukuyama R, Minoshima S, Shimizu N, Yagi K (1994). "Cloning and chromosomal mapping of the human nonfunctional gene for L-gulono-gamma-lactone oxidase, the enzyme for L-ascorbic acid biosynthesis missing in man". J. Biol. Chem. 269 (18): 13685–8. PMID 8175804.
  14. ^ Xue Y, Daly A, Yngvadottir B, Liu M, Coop G, Kim Y, Sabeti P, Chen Y, Stalker J, Huckle E, Burton J, Leonard S, Rogers J, Tyler-Smith C (2006). "Spread of an inactive form of caspase-12 in humans is due to recent positive selection". Am. J. Hum. Genet. 78 (4): 659–70. doi:10.1086/503116. PMC 1424700. PMID 16532395.
  15. ^ van Baren MJ, Brent MR (2006). "Iterative gene prediction and pseudogene removal improves genome annotation". Genome Res. 16 (5): 678–85. doi:10.1101/gr.4766206. PMC 1457044. PMID 16651666.
  16. ^ Harrison PM, Milburn D, Zhang Z, Bertone P, Gerstein M (2003). "Identification of pseudogenes in the Drosophila melanogaster genome". Nucleic Acids Res. 31 (3): 1033–7. doi:10.1093/nar/gkg169. PMC 149191. PMID 12560500.
  17. ^ Long M, Langley CH (1993). "Natural selection and the origin of jingwei, a chimeric processed functional gene in Drosophila". Science. 260 (5104): 91–5. doi:10.1126/science.7682012. PMID 7682012.
  18. ^ Hirotsune S, Yoshida N, Chen A, Garrett L, Sugiyama F, Takahashi S, Yagami K, Wynshaw-Boris A, Yoshiki A (2003). "An expressed pseudogene regulates the messenger-RNA stability of its homologous coding gene". Nature. 423 (6935): 91–6. doi:10.1038/nature01535. PMID 12721631.
  19. ^ Svensson O, Arvestad L, Lagergren J (2006). "Genome-wide survey for biologically functional pseudogenes". PLoS Comput. Biol. 2 (5): e46. doi:10.1371/journal.pcbi.0020046. PMC 1456316. PMID 16680195.
  20. ^ Balakirev ES, Ayala FJ (2003). "Pseudogenes: are they "junk" or functional DNA?". Annu. Rev. Genet. 37: 123–51. doi:10.1146/annurev.genet.37.040103.103949. PMID 14616058.
  21. ^ Hirotsune S, Yoshida N, Chen A, Garrett L, Sugiyama F, Takahashi S, Yagami K, Wynshaw-Boris A, Yoshiki A (2003). "Addendum: An Expressed Pseudogene Regulates the messenger-RNA Stability of Its Homologous Coding Gene". Nature. 426 (100): 100. doi:10.1038/nature02094. PMID 12721631.
  22. ^ Gray TA, Wilson A, Fortin PJ, Nicholls RD (2006). "The putatively functional Mkrn1-p1 pseudogene is neither expressed nor imprinted, nor does it regulate its source gene in trans". Proc. Natl. Acad. Sci. U.S.A. 103 (32): 12039–44. doi:10.1073/pnas.0602216103. PMC 1567693. PMID 16882727.
  23. ^ Betrán E, Wang W, Jin L, Long M (2002). "Evolution of the phosphoglycerate mutase processed gene in human and chimpanzee revealing the origin of a new primate gene". Mol. Biol. Evol. 19 (5): 654–63. PMID 11961099.
  24. ^ Watanabe T, Totoki Y, Toyoda A, Kaneda M, Kuramochi-Miyagawa S, Obata Y, Chiba H, Kohara Y, Kono T, Nakano T, Surani MA, Sakaki Y, Sasaki H (2008). "Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes". Nature. 453 (7194): 539–43. doi:10.1038/nature06908. PMID 18404146.

See also