Overlapping gene

From Wikipedia, the free encyclopedia

An overlapping gene is a gene whose expressible nucleotide sequence partially overlaps with the expressible nucleotide sequence of another gene.[1] In this way, a nucleotide sequence may contribute to the function of one or more gene products. Overprinting refers to a type of overlap in which all or part of the sequence of one gene is read in an alternate reading frame from another gene at the same locus. Overprinting has been hypothesized as a mechanism for de novo emergence of new genes from existing sequences, either older genes or previously non-coding regions of the genome.[2] Overprinted genes are particularly common features of the genomic organization of viruses, likely because they greatly increase the number of potential expressible genes that can be encoded by a small set of viral genetic information.
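
As a minimal illustration of overprinting, the following Python sketch translates one short sequence from two different start positions. The sequence is a toy example constructed for illustration, not taken from any real genome, and the function name `translate` is hypothetical. The sketch assumes the standard genetic code; many mitochondria and some other genomes use variant codes.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G ordering.
# Assumption: the standard code applies; mitochondrial codes differ.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(seq, start):
    """Translate seq from index start until the first in-frame stop codon."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon ends the reading frame
            break
        protein.append(aa)
    return "".join(protein)


# Toy sequence (hypothetical): one ORF starts at index 0, a second at index 4.
toy_seq = "ATGCATGGCTAACTGA"
product_a = translate(toy_seq, 0)  # reads ATG CAT GGC TAA
product_b = translate(toy_seq, 4)  # reads ATG GCT AAC TGA
print(product_a, product_b)
```

Nucleotides 4 through 11 of the toy sequence are read by both frames, yet the two products share no residues beyond the initial methionine, which is the defining feature of an overprinted region.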


Tandem out-of-phase overlap of the human mitochondrial genes ATP8 (+1 frame, in red) and ATP6 (+3 frame, in blue)[3]

Genes may overlap in a variety of ways and can be classified by their positions relative to each other.[1][4][5][6][7]

  • Unidirectional or tandem overlap: the 3' end of one gene overlaps with the 5' end of another gene on the same strand. This arrangement can be symbolized with the notation → → where arrows indicate the reading frame from start to end.
  • Convergent or end-on overlap: the 3' ends of the two genes overlap on opposite strands. This can be written as → ←.
  • Divergent or tail-on overlap: the 5' ends of the two genes overlap on opposite strands. This can be written as ← →.
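
The three orientation classes above can be sketched in Python as follows. This is an illustrative sketch only: the `Gene` record, its 0-based half-open coordinates on the forward strand, and the function name `classify_overlap` are assumptions made for the example. The labels "none" and "nested" are additions for cases that fall outside the three categories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 0-based, inclusive, in forward-strand coordinates
    end: int     # exclusive
    strand: str  # "+" or "-"


def classify_overlap(a, b):
    """Classify a pair of genes by relative position and strand."""
    left, right = sorted((a, b), key=lambda g: (g.start, g.end))
    if right.start >= left.end:
        return "none"
    # One gene lies wholly inside the other: outside the three classes.
    if right.end <= left.end or right.start == left.start:
        return "nested"
    if left.strand == right.strand:
        return "unidirectional"  # tandem, same strand
    # Opposite strands: forward gene on the left means the 3' ends meet.
    return "convergent" if left.strand == "+" else "divergent"


# Hypothetical coordinates, for illustration only.
print(classify_overlap(Gene("a", 0, 100, "+"), Gene("b", 90, 200, "+")))
print(classify_overlap(Gene("a", 0, 100, "+"), Gene("b", 90, 200, "-")))
print(classify_overlap(Gene("a", 0, 100, "-"), Gene("b", 90, 200, "+")))
```

For opposite-strand pairs, the sketch relies on the fact that a forward-strand gene has its 3' end at the higher coordinate while a reverse-strand gene has its 3' end at the lower coordinate.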

Overlapping genes can also be classified by phases, which describe their relative reading frames:[1][4][5][6][7]

  • In-phase overlap occurs when the shared sequences use the same reading frame. This is also known as "phase 0". Unidirectional genes with phase 0 overlap are not considered distinct genes, but rather as alternative start sites of the same gene.
  • Out-of-phase overlap occurs when the shared sequences use different reading frames. This can occur in "phase 1" or "phase 2", depending on whether the reading frames are offset by 1 or 2 nucleotides. Because a codon is three nucleotides long, an offset of three nucleotides is an in-phase, phase 0 frame.
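
For two genes on the same strand, the phase can be computed as the offset between their start positions modulo 3. The sketch below assumes that both positions refer to the first nucleotide of a codon and that the offset is measured from the upstream gene to the downstream gene; published conventions for which offset is called phase 1 and which phase 2 may differ, and opposite-strand pairs are not handled. The function name `overlap_phase` is hypothetical.

```python
def overlap_phase(upstream_start, downstream_start):
    """Return the reading-frame offset (0, 1 or 2) of two same-strand genes.

    Both arguments are the index of the first nucleotide of a codon.
    """
    return (downstream_start - upstream_start) % 3


# Hypothetical start coordinates, for illustration only.
for a, b in [(0, 3), (0, 4), (0, 5), (10, 22)]:
    print(a, b, "phase", overlap_phase(a, b))
```

An offset of 3, or of any other multiple of 3 such as the 12 nucleotides in the last pair, gives phase 0, matching the observation above that a three-nucleotide offset is in phase.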


Overlapping genes are particularly common in rapidly evolving genomes, such as those of viruses, bacteria, and mitochondria. They may originate in three ways:[8]

  1. By extension of an existing open reading frame (ORF) downstream into a contiguous gene due to the loss of a stop codon;
  2. By extension of an existing ORF upstream into a contiguous gene due to loss of an initiation codon;
  3. By generation of a novel ORF within an existing one due to a point mutation.
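
The first of these routes, extension of an ORF after loss of its stop codon, can be sketched in Python with a toy sequence. The sequence, the mutated position and the helper names `orf_end` and `overlap_length` are hypothetical and chosen only for illustration; the stop codons are those of the standard genetic code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def orf_end(seq, start):
    """Return the index just past the first in-frame stop codon, or None."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 3
    return None


def overlap_length(a, b):
    """Number of positions shared by two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


# Toy sequence: upstream ORF at index 0, downstream ORF at index 10.
seq = "ATGAAATAACATGGCTGAATAA"
upstream = (0, orf_end(seq, 0))
downstream = (10, orf_end(seq, 10))
print(overlap_length(upstream, downstream))  # no shared positions

# A point mutation T -> C at index 6 turns the stop codon TAA into CAA.
mutant = seq[:6] + "C" + seq[7:]
extended = (0, orf_end(mutant, 0))
print(extended, overlap_length(extended, downstream))
```

In this toy case the upstream ORF reads through its former stop codon until the next in-frame stop, which lies inside the downstream gene, so the two ORFs come to share eight nucleotides in different reading frames.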

The use of the same nucleotide sequence to encode multiple genes may provide evolutionary advantage due to reduction in genome size and due to the opportunity for transcriptional and translational co-regulation of the overlapping genes.[5][9][10][11] Gene overlaps introduce novel evolutionary constraints on the sequences of the overlap regions.[7][12]

Origins of new genes

A cladogram indicating the likely evolutionary trajectory of the gene-dense pX region in human T-lymphotropic virus 1 (HTLV1), a deltaretrovirus associated with blood cancers. This region contains numerous overlapping genes, several of which likely originated de novo through overprinting.[13]

In 1977, Pierre-Paul Grassé proposed that one gene of an overlapping pair could have originated de novo through mutations that introduce a novel ORF in an alternate reading frame; he described the mechanism as overprinting.[14]:231 The hypothesis was later substantiated by Susumu Ohno, who identified a candidate gene that may have arisen by this mechanism.[15] Some de novo genes originating in this way may not remain overlapping, but subfunctionalize following gene duplication,[2] contributing to the prevalence of orphan genes. The younger member of an overlapping gene pair can be identified bioinformatically either by its more restricted phylogenetic distribution or by its less optimized codon usage.[13][16][17] Younger members of the pair tend to have higher intrinsic structural disorder than older members, but the older members are also more disordered than other proteins, presumably as a way of alleviating the increased evolutionary constraints posed by overlap.[16] Overlaps are more likely to originate in proteins that already have high disorder.[16]

Taxonomic distribution

Overlapping genes in the bacteriophage ΦX174 genome. There are 11 genes in this genome (A, A*, B-H, J, K). Genes B, K, E overlap with genes A, C, D.[18]

Overlapping genes occur in all domains of life, though with varying frequencies. They are especially common in viral genomes.


The RNA silencing suppressor p19 from tomato bushy stunt virus, a protein encoded by an overprinted gene. The protein specifically binds siRNAs produced as part of the plant's RNA silencing defense against viruses.[19]

Overlapping genes were first identified in viruses; the first DNA genome ever sequenced, that of the bacteriophage ΦX174, contained several examples.[18] Overlapping genes are particularly common in viral genomes.[13] Some studies attribute this observation to selective pressure toward small genome sizes mediated by the physical constraints of packaging the genome in a viral capsid, particularly one of icosahedral geometry.[20] However, other studies dispute this conclusion and argue that the distribution of overlaps in viral genomes is more likely to reflect overprinting as the evolutionary origin of overlapping viral genes.[21] Overprinting is a common source of de novo genes in viruses.[17]

Studies of overprinted viral genes suggest that their protein products tend to be accessory proteins which are not essential to viral proliferation, but contribute to pathogenicity. Overprinted proteins often have unusual amino acid distributions and high levels of intrinsic disorder.[22] In some cases overprinted proteins do have well-defined, but novel, three-dimensional structures;[23] one example is the RNA silencing suppressor p19 found in Tombusviruses, which has both a novel protein fold and a novel binding mode in recognizing siRNAs.[17][19][24]


Estimates of gene overlap in bacterial genomes typically find that around one third of bacterial genes are overlapped, though usually only by a few base pairs.[5][25][26] Most studies of overlap in bacterial genomes find evidence that overlap serves a function in gene regulation, permitting the overlapped genes to be transcriptionally and translationally co-regulated.[5][11] In prokaryotic genomes, unidirectional overlaps are most common, possibly due to the tendency of adjacent prokaryotic genes to share orientation.[5][7][4] Among unidirectional overlaps, long overlaps are more commonly read with a one-nucleotide offset in reading frame (i.e., phase 1) and short overlaps are more commonly read in phase 2.[26][27] Long overlaps of greater than 60 base pairs are more common for convergent genes; however, putative long overlaps have very high rates of misannotation.[28] Robustly validated examples of long overlaps in bacterial genomes are rare; in the well-studied model organism Escherichia coli, only four gene pairs are well validated as having long, overprinted overlaps.[29]


Compared to prokaryotic genomes, eukaryotic genomes are often poorly annotated and thus identifying genuine overlaps is relatively challenging.[17] However, examples of validated gene overlaps have been documented in a variety of eukaryotic organisms, including mammals such as mice and humans.[30][31][32][33] Eukaryotes differ from prokaryotes in distribution of overlap types: while unidirectional (i.e., same-strand) overlaps are most common in prokaryotes, opposite or antiparallel-strand overlaps are more common in eukaryotes. Among the opposite-strand overlaps, convergent orientation is most common.[31] Most studies of eukaryotic gene overlap have found that overlapping genes are extensively subject to genomic reorganization even in closely related species, and thus the presence of an overlap is not always well-conserved.[32][34] Overlap with older or less taxonomically restricted genes is also a common feature of genes likely to have originated de novo in a given eukaryotic lineage.[32][35][36]


  1. ^ a b c Fukuda, Y.; Tomita, M.; Washio, T. (1999). "Comparative study of overlapping genes in the genomes of Mycoplasma genitalium and Mycoplasma pneumoniae". Nucleic Acids Res. 27 (8): 1847–1853. doi:10.1093/nar/27.8.1847. PMC 148392. PMID 10101192.
  2. ^ a b Keese, PK; Gibbs, A (15 October 1992). "Origins of genes: "big bang" or continuous creation?". Proceedings of the National Academy of Sciences of the United States of America. 89 (20): 9489–93. Bibcode:1992PNAS...89.9489K. doi:10.1073/pnas.89.20.9489. PMC 50157. PMID 1329098.
  3. ^ Anderson S, Bankier AT, Barrell BG, de Bruijn MH, Coulson AR, Drouin J, Eperon IC, Nierlich DP, Roe BA, Sanger F, Schreier PH, Smith AJ, Staden R, Young IG (April 1981). "Sequence and organization of the human mitochondrial genome". Nature. 290 (5806): 457–465. Bibcode:1981Natur.290..457A. doi:10.1038/290457a0. PMID 7219534. S2CID 4355527.
  4. ^ a b c Fukuda, Yoko; Nakayama, Yoichi; Tomita, Masaru (December 2003). "On dynamics of overlapping genes in bacterial genomes". Gene. 323: 181–187. doi:10.1016/j.gene.2003.09.021. PMID 14659892.
  5. ^ a b c d e f Johnson Z, Chisholm S (2004). "Properties of overlapping genes are conserved across microbial genomes". Genome Res. 14 (11): 2268–72. doi:10.1101/gr.2433104. PMC 525685. PMID 15520290.
  6. ^ a b Normark S.; Bergstrom S.; Edlund T.; Grundstrom T.; Jaurin B.; Lindberg F.P.; Olsson O. (1983). "Overlapping genes". Annual Review of Genetics. 17: 499–525. doi:10.1146/annurev.ge.17.120183.002435. PMID 6198955.
  7. ^ a b c d Rogozin, Igor B.; Spiridonov, Alexey N.; Sorokin, Alexander V.; Wolf, Yuri I.; Jordan, I.King; Tatusov, Roman L.; Koonin, Eugene V. (May 2002). "Purifying and directional selection in overlapping prokaryotic genes". Trends in Genetics. 18 (5): 228–232. doi:10.1016/S0168-9525(02)02649-5. PMID 12047938.
  8. ^ Krakauer, David C. (June 2000). "Stability and Evolution of Overlapping Genes". Evolution. 54 (3): 731–739. doi:10.1111/j.0014-3820.2000.tb00075.x. PMID 10937248.
  9. ^ Delaye, Luis; DeLuna, Alexander; Lazcano, Antonio; Becerra, Arturo (2008). "The origin of a novel gene through overprinting in Escherichia coli". BMC Evolutionary Biology. 8 (1): 31. doi:10.1186/1471-2148-8-31. PMC 2268670. PMID 18226237.
  10. ^ Saha, Deeya; Podder, Soumita; Panda, Arup; Ghosh, Tapash Chandra (May 2016). "Overlapping genes: A significant genomic correlate of prokaryotic growth rates". Gene. 582 (2): 143–147. doi:10.1016/j.gene.2016.02.002. PMID 26853049.
  11. ^ a b Luo, Yingqin; Battistuzzi, Fabia; Lin, Kui; Gibas, Cynthia (29 November 2013). "Evolutionary Dynamics of Overlapped Genes in Salmonella". PLOS ONE. 8 (11): e81016. doi:10.1371/journal.pone.0081016. PMC 3843671. PMID 24312259.
  12. ^ Wei, X.; Zhang, J. (31 December 2014). "A Simple Method for Estimating the Strength of Natural Selection on Overlapping Genes". Genome Biology and Evolution. 7 (1): 381–390. doi:10.1093/gbe/evu294. PMC 4316641. PMID 25552532.
  13. ^ a b c Pavesi, Angelo; Magiorkinis, Gkikas; Karlin, David G.; Wilke, Claus O. (15 August 2013). "Viral Proteins Originated De Novo by Overprinting Can Be Identified by Codon Usage: Application to the "Gene Nursery" of Deltaretroviruses". PLOS Computational Biology. 9 (8): e1003162. doi:10.1371/journal.pcbi.1003162. PMC 3744397. PMID 23966842.
  14. ^ Grassé, Pierre-Paul (1977). Evolution of Living Organisms: Evidence for a New Theory of Transformation. Academic Press. ISBN 9781483274096.
  15. ^ Ohno, S (April 1984). "Birth of a unique enzyme from an alternative reading frame of the preexisted, internally repetitious coding sequence". Proceedings of the National Academy of Sciences of the United States of America. 81 (8): 2421–5. Bibcode:1984PNAS...81.2421O. doi:10.1073/pnas.81.8.2421. PMC 345072. PMID 6585807.
  16. ^ a b c Willis, Sara; Masel, Joanna (19 July 2018). "Gene Birth Contributes to Structural Disorder Encoded by Overlapping Genes". Genetics. 210 (1): 303–313. doi:10.1534/genetics.118.301249. PMC 6116962. PMID 30026186.
  17. ^ a b c d Sabath, N.; Wagner, A.; Karlin, D. (19 July 2012). "Evolution of Viral Proteins Originated De Novo by Overprinting". Molecular Biology and Evolution. 29 (12): 3767–3780. doi:10.1093/molbev/mss179. PMC 3494269. PMID 22821011.
  18. ^ a b Sanger, F.; Air, G. M.; Barrell, B. G.; Brown, N. L.; Coulson, A. R.; Fiddes, J. C.; Hutchison, C. A.; Slocombe, P. M.; Smith, M. (1977). "Nucleotide sequence of bacteriophage ΦX174 DNA". Nature. 265 (5596): 687–95. Bibcode:1977Natur.265..687S. doi:10.1038/265687a0. PMID 870828. S2CID 4206886.
  19. ^ a b Ye, Keqiong; Malinina, Lucy; Patel, Dinshaw J. (3 December 2003). "Recognition of small interfering RNA by a viral suppressor of RNA silencing". Nature. 426 (6968): 874–878. Bibcode:2003Natur.426..874Y. doi:10.1038/nature02213. PMC 4694583. PMID 14661029.
  20. ^ Chirico, N.; Vianelli, A.; Belshaw, R. (7 July 2010). "Why genes overlap in viruses". Proceedings of the Royal Society B: Biological Sciences. 277 (1701): 3809–3817. doi:10.1098/rspb.2010.1052. PMC 2992710. PMID 20610432.
  21. ^ Brandes, Nadav; Linial, Michal (21 May 2016). "Gene overlapping and size constraints in the viral world". Biology Direct. 11 (1): 26. doi:10.1186/s13062-016-0128-3. PMC 4875738. PMID 27209091.
  22. ^ Rancurel, C.; Khosravi, M.; Dunker, A. K.; Romero, P. R.; Karlin, D. (29 July 2009). "Overlapping Genes Produce Proteins with Unusual Sequence Properties and Offer Insight into De Novo Protein Creation". Journal of Virology. 83 (20): 10719–10736. doi:10.1128/JVI.00595-09. PMC 2753099. PMID 19640978.
  23. ^ Abroi, Aare (1 December 2015). "A protein domain-based view of the virosphere–host relationship". Biochimie. 119: 231–243. doi:10.1016/j.biochi.2015.08.008. PMID 26296474.
  24. ^ Vargason, Jeffrey M; Szittya, György; Burgyán, József; Hall, Traci M.Tanaka (December 2003). "Size Selective Recognition of siRNA by an RNA Silencing Suppressor". Cell. 115 (7): 799–811. doi:10.1016/S0092-8674(03)00984-X. PMID 14697199. S2CID 12993441.
  25. ^ Huvet, Maxime; Stumpf, Michael PH (1 January 2014). "Overlapping genes: a window on gene evolvability". BMC Genomics. 15 (1): 721. doi:10.1186/1471-2164-15-721. ISSN 1471-2164. PMC 4161906. PMID 25159814.
  26. ^ a b Cock, Peter J. A.; Whitworth, David E. (19 March 2007). "Evolution of Gene Overlaps: Relative Reading Frame Bias in Prokaryotic Two-Component System Genes". Journal of Molecular Evolution. 64 (4): 457–462. doi:10.1007/s00239-006-0180-1. PMID 17479344. S2CID 21612308.
  27. ^ Fonseca, M. M.; Harris, D. J.; Posada, D. (5 November 2013). "Origin and Length Distribution of Unidirectional Prokaryotic Overlapping Genes". G3: Genes, Genomes, Genetics. 4 (1): 19–27. doi:10.1534/g3.113.005652. PMC 3887535. PMID 24192837.
  28. ^ Pallejà, Albert; Harrington, Eoghan D; Bork, Peer (2008). "Large gene overlaps in prokaryotic genomes: result of functional constraints or mispredictions?". BMC Genomics. 9 (1): 335. doi:10.1186/1471-2164-9-335. PMC 2478687. PMID 18627618.
  29. ^ Fellner, Lea; Simon, Svenja; Scherling, Christian; Witting, Michael; Schober, Steffen; Polte, Christine; Schmitt-Kopplin, Philippe; Keim, Daniel A.; Scherer, Siegfried; Neuhaus, Klaus (18 December 2015). "Evidence for the recent origin of a bacterial protein-coding, overlapping orphan gene by evolutionary overprinting". BMC Evolutionary Biology. 15 (1): 283. doi:10.1186/s12862-015-0558-z. PMC 4683798. PMID 26677845.
  30. ^ McLysaght, Aoife; Guerzoni, Daniele (31 August 2015). "New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation". Philosophical Transactions of the Royal Society B: Biological Sciences. 370 (1678): 20140332. doi:10.1098/rstb.2014.0332. PMC 4571571. PMID 26323763.
  31. ^ a b Sanna, C.; Li, W.; Zhang, L. (2008). "Overlapping genes in the human and mouse genomes". BMC Genomics. 9 (169): 169. doi:10.1186/1471-2164-9-169. PMC 2335118. PMID 18410680.
  32. ^ a b c Makałowska, Izabela; Lin, Chiao-Feng; Hernandez, Krisitina (2007). "Birth and death of gene overlaps in vertebrates". BMC Evolutionary Biology. 7 (1): 193. doi:10.1186/1471-2148-7-193. PMC 2151771. PMID 17939861.
  33. ^ Veeramachaneni, V. (1 February 2004). "Mammalian Overlapping Genes: The Comparative Perspective". Genome Research. 14 (2): 280–286. doi:10.1101/gr.1590904. PMC 327103. PMID 14762064.
  34. ^ Behura, Susanta K; Severson, David W (2013). "Overlapping genes of Aedes aegypti: evolutionary implications from comparison with orthologs of Anopheles gambiae and other insects". BMC Evolutionary Biology. 13 (1): 124. doi:10.1186/1471-2148-13-124. PMC 3689595. PMID 23777277.
  35. ^ Murphy, Daniel N.; McLysaght, Aoife; Carmel, Liran (21 November 2012). "De Novo Origin of Protein-Coding Genes in Murine Rodents". PLOS ONE. 7 (11): e48650. Bibcode:2012PLoSO...748650M. doi:10.1371/journal.pone.0048650. PMC 3504067. PMID 23185269.
  36. ^ Knowles, D. G.; McLysaght, A. (2 September 2009). "Recent de novo origin of human protein-coding genes". Genome Research. 19 (10): 1752–1759. doi:10.1101/gr.095026.109. PMC 2765279. PMID 19726446.