Retrotransposons (also called transposons via RNA intermediates) are genetic elements that can amplify themselves in a genome and are ubiquitous components of the DNA of many eukaryotic organisms. They are one of the two classes of transposon; the other, the DNA transposons, do not involve an RNA intermediate. Retrotransposons are particularly abundant in plants, where they are often a principal component of nuclear DNA. In maize, 49–78% of the genome is made up of retrotransposons. In wheat, about 90% of the genome consists of repeated sequences and 68% of transposable elements. In mammals, almost half the genome (45% to 48%) consists of transposons or remnants of transposons. Around 42% of the human genome is made up of retrotransposons, while DNA transposons account for about 2–3%.
The retrotransposons' replicative mode of transposition by means of an RNA intermediate rapidly increases the copy numbers of elements and thereby can increase genome size. Like DNA transposable elements (class II transposons), retrotransposons can induce mutations by inserting near or within genes. Furthermore, retrotransposon-induced mutations are relatively stable, because the sequence at the insertion site is retained as they transpose via the replication mechanism.
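The effect of replicative transposition on copy number can be illustrated with a toy model. The sketch below is only an assumption-laden illustration: the function name, the fixed `active_fraction`, and the rule that each active element adds exactly one copy per round are hypothetical simplifications, not measured biology.

```python
def simulate_copy_numbers(initial_copies, rounds, active_fraction):
    """Toy comparison of copy-and-paste versus cut-and-paste transposition.

    Class I (replicative): each active element adds one new copy per round.
    Class II (cut-and-paste): elements move, so the copy number is unchanged.
    Returns two lists holding the copy number after each round.
    """
    class_i = [initial_copies]
    class_ii = [initial_copies]
    for _ in range(rounds):
        # Only a fraction of elements is assumed to be active (rounded down).
        active = int(class_i[-1] * active_fraction)
        class_i.append(class_i[-1] + active)
        class_ii.append(class_ii[-1])
    return class_i, class_ii


class_i, class_ii = simulate_copy_numbers(initial_copies=10, rounds=5, active_fraction=0.5)
print(class_i)   # copy number grows every round
print(class_ii)  # copy number stays constant
```

Under these assumptions the class I count grows roughly geometrically, while the class II count stays flat, which is the qualitative point made above about genome size.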
Retrotransposons copy themselves to RNA and then back to DNA that may integrate back into the genome. The second step, forming DNA, may be carried out by a reverse transcriptase that the retrotransposon itself encodes. Transposition and survival of retrotransposons within the host genome are possibly regulated by both retrotransposon- and host-encoded factors, which limit deleterious effects on both host and retrotransposon, in a relationship that has existed for many millions of years. The understanding of how retrotransposons and their hosts' genomes have co-evolved mechanisms to regulate transposition, insertion specificities, and mutational outcomes in order to optimize each other's survival is still in its infancy.
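The copy-to-RNA-and-back cycle can be sketched at the sequence level. This is a deliberately simplified illustration: all function names are hypothetical, the element is treated as a plain string on the sense strand, and details such as priming, target-site duplication, and integration enzymes are ignored.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def transcribe(dna):
    """Sense-strand DNA to RNA: same sequence with U in place of T."""
    return dna.replace("T", "U")


def reverse_transcribe(rna):
    """RNA to double-stranded DNA, returned as its sense strand.

    The first cDNA strand is complementary to the RNA; synthesising the
    second strand restores the original sense sequence.
    """
    first_strand = reverse_complement(rna.replace("U", "T"))
    return reverse_complement(first_strand)


def retrotranspose(genome, start, end, target):
    """Insert a copy of genome[start:end] at `target`; the donor copy stays."""
    rna = transcribe(genome[start:end])
    new_copy = reverse_transcribe(rna)
    return genome[:target] + new_copy + genome[target:]


genome = "GGGG" + "ATGCAT" + "CCCC"
new_genome = retrotranspose(genome, start=4, end=10, target=12)
print(new_genome)
```

After one event the toy genome is six bases longer and contains two copies of the element, because the donor copy is never excised.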
Because of accumulated mutations, most retrotransposons are no longer able to retrotranspose.
Types of retrotransposons
Retrotransposons, also known as class I transposable elements, consist of two subclasses, the long terminal repeat (LTR) and the non-LTR retrotransposons. Classification into these subclasses is based on the phylogeny of the reverse transcriptase, which is consistent with structural differences such as the presence or absence of long terminal repeats, the number and types of open reading frames, the encoded domains, and target site duplication lengths.
LTR retrotransposons have direct LTRs that range from ~100 bp to over 5 kb in size. LTR retrotransposons are further sub-classified into the Ty1-copia-like (Pseudoviridae), Ty3-gypsy-like (Metaviridae), and BEL-Pao-like groups based on both their degree of sequence similarity and the order of encoded gene products. The Ty1-copia and Ty3-gypsy groups of retrotransposons are commonly found in high copy number (up to a few million copies per haploid nucleus) in animal, fungal, protist, and plant genomes. BEL-Pao-like elements have so far been found only in animals. Although retroviruses are often classified separately, they share many features with LTR retrotransposons. A major difference from Ty1-copia and Ty3-gypsy retrotransposons is that retroviruses have an envelope protein (ENV). A retrovirus can be transformed into an LTR retrotransposon through inactivation or deletion of the domains that enable extracellular mobility. If such a retrovirus infects germ line cells and inserts itself into their genome, it may be transmitted vertically and become an endogenous retrovirus (ERV). Endogenous retroviruses make up about 8% of the human genome and approximately 10% of the mouse genome.
In plant genomes, LTR retrotransposons are the major class of repetitive sequence; in maize, for example, they can constitute more than 75% of the genome.
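The defining structural feature, a direct repeat at both ends of the element, can be checked with a few lines of code. The following is a minimal sketch that assumes the two LTRs are still identical and that the element boundaries are already known; real annotation tools must tolerate diverged LTRs, so this function (a hypothetical name) is illustrative only.

```python
def terminal_direct_repeat(element, min_len=4):
    """Return the longest non-overlapping sequence found at both ends.

    An empty string is returned when no direct terminal repeat of at
    least `min_len` bases exists.
    """
    # Try the longest possible repeat first, then shrink.
    for size in range(len(element) // 2, min_len - 1, -1):
        if element[:size] == element[-size:]:
            return element[:size]
    return ""


ltr = "TGTTGGAAT"            # toy LTR, far shorter than the ~100 bp minimum
internal = "ACGTACGGCTA"     # toy internal (coding) region
element = ltr + internal + ltr
print(terminal_direct_repeat(element))
```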
Ty1-copia retrotransposons are abundant in species ranging from single-cell algae to bryophytes, gymnosperms, and angiosperms. They encode four protein domains in the following order: protease, integrase, reverse transcriptase, and ribonuclease H.
At least two classification systems exist for the subdivision of Ty1-copia retrotransposons into five lineages: Sireviruses/Maximus, Oryco/Ivana, Retrofit/Ale, TORK (subdivided into Angela/Sto, TAR/Fourf, and GMR/Tork), and Bianca.
Sireviruses/Maximus retrotransposons contain an additional putative envelope gene. This lineage is named for the founder element SIRE1 in the Glycine max genome, and was later described in many species such as Zea mays, Arabidopsis thaliana, Beta vulgaris, and Pinus pinaster. Plant Sireviruses of many sequenced plant genomes are summarized at the MASIVEdb Sirevirus database.
Ty3-gypsy retrotransposons (Metaviridae) are widely distributed in the plant kingdom, including both gymnosperms and angiosperms. They encode at least four protein domains in the order: protease, reverse transcriptase, ribonuclease H, and integrase. Based on structure, presence/absence of specific protein domains, and conserved protein sequence motifs, they can be subdivided into several lineages:
Errantiviruses contain an additional defective envelope ORF with similarities to the retroviral envelope gene. First described as Athila elements in Arabidopsis thaliana, they were later identified in many species, such as Glycine max and Beta vulgaris.
Chromoviruses contain an additional chromodomain (chromatin organization modifier domain) at the C-terminus of their integrase protein. They are widespread in plants and fungi and have probably retained this domain throughout the evolution of these two kingdoms. It is thought that the chromodomain directs retrotransposon integration to specific target sites. According to the sequence and structure of the chromodomain, chromoviruses are subdivided into four clades: CRM, Tekay, Reina and Galadriel. Chromoviruses from each clade show distinctive integration patterns, e.g. into centromeres or into the rRNA genes.
Metaviruses are conventional Ty3-gypsy retrotransposons that do not contain additional domains or ORFs.
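The domain orders and lineage features described above lend themselves to a simple rule-based classifier. The sketch below is a hypothetical illustration: the domain abbreviations (PR, INT, RT, RH), the function name, and the boolean flags are assumptions, and real classification relies on reverse transcriptase phylogeny rather than on domain order alone.

```python
COPIA_ORDER = ("PR", "INT", "RT", "RH")   # protease, integrase, RT, RNase H
GYPSY_ORDER = ("PR", "RT", "RH", "INT")   # protease, RT, RNase H, integrase
CORE_DOMAINS = set(COPIA_ORDER)


def classify_ltr_element(domains, has_env_like_orf=False, has_chromodomain=False):
    """Return (group, lineage) for an annotated LTR retrotransposon.

    `domains` lists the annotated domains in 5'-to-3' order; anything
    outside the four core domains is ignored when comparing the order.
    """
    core = tuple(d for d in domains if d in CORE_DOMAINS)
    if core == COPIA_ORDER:
        # Among Ty1-copia lineages, only Sireviruses are distinguishable here.
        lineage = "Sirevirus/Maximus" if has_env_like_orf else "unassigned"
        return "Ty1-copia", lineage
    if core == GYPSY_ORDER:
        if has_env_like_orf:
            return "Ty3-gypsy", "Errantivirus"
        if has_chromodomain:
            return "Ty3-gypsy", "Chromovirus"
        return "Ty3-gypsy", "Metavirus"
    return "unclassified", "unassigned"


print(classify_ltr_element(["PR", "RT", "RH", "INT"], has_chromodomain=True))
print(classify_ltr_element(["PR", "INT", "RT", "RH"], has_env_like_orf=True))
```

Checking the envelope-like ORF before the chromodomain is an arbitrary choice made for this sketch; the text does not say how an element carrying both features would be treated.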
Endogenous retroviruses (ERV)
Endogenous retroviruses are the most important LTR retrotransposons in mammals, including humans, in whom human ERVs make up 8% of the genome.
Non-LTR retrotransposons consist of two sub-types, long interspersed elements (LINEs) and short interspersed elements (SINEs). They are widespread in eukaryotic genomes and can also be found in high copy numbers, as has been shown in plant species. LINEs possess two ORFs, which encode all the functions needed for retrotransposition. These functions include reverse transcriptase and endonuclease activities, in addition to a nucleic acid-binding property needed to form a ribonucleoprotein particle. SINEs, on the other hand, co-opt the LINE machinery and function as nonautonomous retroelements.
Long interspersed nuclear elements (LINEs) are a group of genetic elements that are found in large numbers in eukaryotic genomes; they comprise 17% of the human genome (99.9% of which is no longer capable of mobilization). LINEs fall into several subgroups, such as L1, L2 and L3. Human coding L1 elements begin with an untranslated region (UTR) that includes an RNA polymerase II promoter, continue with two non-overlapping open reading frames (ORF1 and ORF2), and end with another UTR. A further open reading frame has since been identified on the reverse strand at the 5' end of LINE elements. It has been shown to be transcribed, and endogenous proteins have been observed. It was named ORF0 because of its position relative to ORF1 and ORF2. ORF1 encodes an RNA-binding protein, and ORF2 encodes a protein with an endonuclease (e.g. RNase H) as well as a reverse transcriptase. The reverse transcriptase has a higher specificity for LINE RNA than for other RNA, and makes a DNA copy of the RNA that can be integrated into the genome at a new site. The endonuclease encoded by non-LTR retroposons may be of the AP (apurinic/apyrimidinic) type or the REL (restriction endonuclease-like) type. Elements in the R2 group have a REL-type endonuclease, which shows site specificity in insertion.
The 5' UTR contains the promoter sequence, while the 3' UTR contains a polyadenylation signal (AATAAA) and a poly-A tail. Because LINEs (and other class I transposons, e.g. LTR retrotransposons and SINEs) move by copying themselves rather than by a cut-and-paste mechanism, as class II transposons do, they enlarge the genome. The human genome, for example, contains about 500,000 LINEs, which together make up roughly 17% of the genome. Of these, approximately 7,000 are full-length, and only a small subset of those are capable of retrotransposition.
Specific LINE-1 retroposons in the human genome have been found to be actively transcribed; the associated LINE-1 RNAs are tightly bound to nucleosomes and are essential in establishing the local chromatin environment.
Short interspersed nuclear elements (SINEs) are short DNA sequences (<500 bases) that represent reverse-transcribed RNA molecules originally transcribed by RNA polymerase III into tRNA, 5S ribosomal RNA, and other small nuclear RNAs. The mechanism of retrotransposition of these elements is more complicated than that of LINEs, and depends less on the functions that the elements themselves encode. SINEs do not encode a functional reverse transcriptase protein and rely on other mobile elements for transposition. In some cases they may have their own endonuclease that allows them to cleave their way into the genome, but the majority of SINEs integrate at chromosomal breaks, using random DNA breaks to prime reverse transcriptase.
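The 3' end features mentioned above, the AATAAA signal and the poly-A tail, are easy to locate in a sequence string. The sketch below uses hypothetical function names and a made-up toy 3' UTR; it only scans the given strand and makes no claim about how LINE annotation tools actually work.

```python
import re

POLYA_SIGNAL = "AATAAA"  # polyadenylation signal named in the text


def polya_signal_positions(seq):
    """Return 0-based start positions of every AATAAA, overlaps included."""
    # A lookahead matches without consuming, so overlapping hits are found.
    return [m.start() for m in re.finditer("(?=" + POLYA_SIGNAL + ")", seq.upper())]


def polya_tail_length(seq):
    """Return the number of consecutive A residues at the 3' end."""
    seq = seq.upper()
    return len(seq) - len(seq.rstrip("A"))


toy_utr = "GCTTAATAAAGTCTGC" + "A" * 10  # invented example sequence
print(polya_signal_positions(toy_utr))
print(polya_tail_length(toy_utr))
```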
The most common SINEs in primates are called Alu sequences. Alu elements are approximately 350 base pairs long, do not contain any coding sequences, and can be recognized by the restriction enzyme AluI (hence the name). With about 1,500,000 copies, SINEs make up about 11% of the human genome. While historically viewed as "junk DNA", recent research suggests that, in some rare cases, both LINEs and SINEs were incorporated into novel genes so as to evolve new functionality. The distribution of these elements has been implicated in some genetic diseases and cancers. Although sequence analysis of human Alu subfamilies shows the existence of mosaic (recombinant) elements, experimental evidence is lacking. In the primitive eukaryote Entamoeba histolytica, the frequent exchange of sequence during retrotransposition has been reported; this results in a mosaic pattern in its SINE sequences.
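Recognition by AluI can be mimicked with an in-silico digest. The text does not give the enzyme's specificity; the sketch below assumes the commonly documented AluI site, AGCT, cut between G and C to leave blunt ends. The function name and the toy sequence are hypothetical.

```python
ALUI_SITE = "AGCT"  # assumed recognition sequence (not stated in the text)
CUT_OFFSET = 2      # assumed blunt cut between G and C: AG^CT


def alui_digest(seq):
    """Return the fragments produced by cutting `seq` at every AluI site."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(ALUI_SITE)
    while pos != -1:
        cuts.append(pos + CUT_OFFSET)
        pos = seq.find(ALUI_SITE, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    # Slice between consecutive cut positions.
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


fragments = alui_digest("GGAGCTTTAGCTCC")
print(fragments)
```

A sequence without any AGCT site is returned as a single uncut fragment.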
See also
- Endogenous retrovirus
- Insertion sequences
- Copy-number variation
- Genomic organization
- Interspersed repeat
- Retrotransposon markers, a powerful method of reconstructing phylogenies.