Orphan genes (also called ORFans, especially in microbial literature) are genes without detectable homologues in other lineages. Orphans are a subset of taxonomically-restricted genes (TRGs), which are unique to a specific taxonomic level (e.g. plant-specific). In contrast to non-orphan TRGs, orphans are usually considered unique to a very narrow taxon, generally a species.
The classic model of evolution is based on duplication, rearrangement, and mutation of genes with the idea of common descent. Orphan genes differ in that they are lineage-specific with no known history of shared duplication and rearrangement outside of their specific species or clade. Orphan genes may arise through a variety of mechanisms, such as horizontal gene transfer, duplication and rapid divergence, and de novo emergence from non-coding sequence. These processes may act at different rates in insects, primates, and plants. Despite their relatively recent origin, orphan genes may encode functionally important proteins.
History of orphan genes
Orphan genes were first discovered when the yeast genome-sequencing project began in 1996. Orphan genes accounted for an estimated 26% of the yeast genome, but it was believed that these genes could be classified with homologues when more genomes were sequenced. At the time, gene duplication was considered the only serious model of gene evolution and there were few sequenced genomes for comparison, so a lack of detectable homologues was thought to be most likely due to a lack of sequencing data and not due to a true lack of homology. However, orphan genes persisted as the number of sequenced genomes grew, eventually leading to the conclusion that orphan genes are ubiquitous across genomes. Estimates of the percentage of genes that are orphans vary enormously between species and between studies; 10-30% is a commonly cited figure.
The study of orphan genes emerged largely after the turn of the century. In 2003, a study of Caenorhabditis briggsae and related species compared over 2000 genes. The authors proposed that the orphan genes they identified must be evolving too quickly for their homologues to be detected, and are consequently sites of very rapid evolution. In 2005, Wilson and colleagues examined 122 bacterial species to test whether the large number of orphan genes reported in many species was genuine. The study concluded that these orphans were genuine and played a role in bacterial adaptation. The definition of taxonomically-restricted genes was introduced into the literature to make orphan genes seem less "mysterious."
In 2008, a yeast protein of established functionality, BSC4, was found to have evolved de novo from non-coding sequences whose homology was still detectable in sister species.
In 2009, an orphan gene was discovered to regulate an internal biological network: the orphan gene QQS from Arabidopsis thaliana modifies plant composition. The QQS orphan protein interacts with a conserved transcription factor; these data explain the compositional changes (increased protein) that are induced when QQS is engineered into diverse species. In 2011, a comprehensive genome-wide study of the extent and evolutionary origins of orphan genes in plants was conducted in the model plant Arabidopsis thaliana.
How to identify orphan genes
Genes can be tentatively classified as orphans if no orthologous proteins can be found in nearby species.
One method used to estimate nucleotide or protein sequence similarity indicative of homology (i.e. similarity due to common origin) is the Basic Local Alignment Search Tool (BLAST). BLAST allows query sequences to be rapidly searched against large sequence databases. Simulations suggest that under certain conditions BLAST is suitable for detecting distant relatives of a gene. However, genes that are short and evolve rapidly can easily be missed by BLAST.
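The tentative classification described above can be sketched in a few lines of Python. This is a minimal illustration, not a published pipeline: it assumes the focal species' proteins were searched against a database containing only other species, that the results were saved in BLAST's 12-column tabular format (query ID in the first column, E-value in the eleventh), and that an E-value cutoff of 1e-3 is acceptable. The function name, gene IDs, and hit lines are hypothetical.

```python
def candidate_orphans(gene_ids, blast_lines, max_evalue=1e-3):
    """Return the genes that have no hit at or below max_evalue.

    gene_ids    -- all genes of the focal species (hypothetical IDs)
    blast_lines -- lines of BLAST tabular output against other species
    max_evalue  -- assumed significance cutoff; studies use different values
    """
    genes_with_hits = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id = fields[0]
        evalue = float(fields[10])  # 11th column of the tabular format
        if evalue <= max_evalue:
            genes_with_hits.add(query_id)
    # Genes without any significant hit are only *candidate* orphans:
    # short, fast-evolving genes can be missed by the search itself.
    return [g for g in gene_ids if g not in genes_with_hits]


# Hypothetical example data: geneA has a strong hit, geneB only a weak one,
# geneC has no hit at all.
example_hits = [
    "geneA\tsp2_prot17\t78.5\t200\t43\t0\t1\t200\t5\t204\t1e-90\t310",
    "geneB\tsp3_prot02\t31.0\t40\t27\t1\t3\t42\t10\t49\t4.2\t22.1",
]
print(candidate_orphans(["geneA", "geneB", "geneC"], example_hits))
```

Because the result depends on the cutoff and on which species are in the database, a list produced this way is a starting point for further checks rather than a final classification.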
The systematic detection of homology to annotate orphan genes is called phylostratigraphy. Phylostratigraphy generates a phylogenetic tree in which the homology is calculated between all genes of a focal species and the genes of other species. The earliest common ancestor for a gene determines the age, or phylostratum, of the gene. The term "orphan" is sometimes used only for the youngest phylostratum containing only a single species, but when interpreted broadly as a taxonomically-restricted gene, it can refer to all but the oldest phylostratum, with the gene orphaned within a larger clade.
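The phylostratum assignment can be illustrated with a short Python sketch. It assumes the homology search has already been done, so that for each gene the set of species with a detectable homologue is known, and it represents the focal species' lineage as a list of nested clades ordered from oldest to youngest. The function name, the four-species lineage, and the hit sets are hypothetical choices made for illustration only.

```python
def assign_phylostratum(lineage, hit_species):
    """Return the oldest clade in which a homologue is detected.

    lineage     -- list of (clade_name, set_of_species), ordered from the
                   oldest (broadest) clade to the focal species alone
    hit_species -- species in which a homologue of the gene was detected
    """
    hits = set(hit_species)
    for i, (clade, members) in enumerate(lineage):
        # Species that belong to this clade but not to the next, narrower one
        narrower = lineage[i + 1][1] if i + 1 < len(lineage) else set()
        if hits & (members - narrower):
            return clade
    # No hit anywhere: treat the gene as belonging to the youngest stratum
    return lineage[-1][0]


# Hypothetical lineage for Arabidopsis thaliana with three comparison species
lineage = [
    ("Eukaryota", {"A. thaliana", "A. lyrata", "O. sativa", "S. cerevisiae"}),
    ("Viridiplantae", {"A. thaliana", "A. lyrata", "O. sativa"}),
    ("Brassicaceae", {"A. thaliana", "A. lyrata"}),
    ("Arabidopsis thaliana", {"A. thaliana"}),
]

print(assign_phylostratum(lineage, {"A. thaliana", "O. sativa"}))
print(assign_phylostratum(lineage, {"A. thaliana"}))
```

In this sketch a gene found only in the focal species falls into the youngest stratum (an orphan in the narrow sense), while a gene whose most distant hit is in A. lyrata would be a Brassicaceae-restricted gene.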
Where do orphan genes come from?
Orphan genes arise from multiple sources, predominantly through de novo origination, duplication and rapid divergence, and horizontal gene transfer.
De Novo Origination
Novel orphan genes continually arise de novo from non-coding sequences. These novel genes may be sufficiently beneficial to be swept to fixation by selection or, more likely, they will fade back into the non-genic background. The latter outcome is supported by research in Drosophila showing that young genes are more likely to go extinct.
De novo genes were once thought to be a near impossibility because of the complex and potentially fragile intricacies of creating and maintaining functional polypeptides, but research since the mid-2000s has found multiple examples of de novo genes, some of which are associated with important biological processes, particularly testis function in animals. De novo genes have also been found in fungi and plants.
For young orphan genes, it is sometimes possible to find homologous non-coding DNA sequences in sister taxa, which is generally accepted as strong evidence of de novo origin. However, the contribution of de novo origination to taxonomically-restricted genes of older origin, particularly in relation to the traditional gene duplication theory of gene evolution, remains contested.
Duplication and Divergence
The duplication and divergence model for orphan genes involves a new gene being created by a duplication event and then undergoing a period of rapid evolution in which all detectable similarity to the originally duplicated gene is lost. While this explanation is consistent with current understanding of duplication mechanisms, the number of mutations needed to lose detectable similarity is large enough that such divergence should be rare, and the evolutionary mechanism by which a gene duplicate could be sequestered and diverge so rapidly remains unclear.
Horizontal Gene Transfer
Another explanation for how orphan genes arise is horizontal gene transfer, in which the gene is acquired from a separate, unknown lineage rather than inherited from an ancestor. This explanation for the origin of orphan genes is especially relevant in bacteria and archaea, where horizontal gene transfer is common.
Orphan genes tend to be very short (~6 times shorter than mature genes), and some are weakly expressed, tissue-specific, and simpler in codon usage and amino acid composition. Orphan genes tend to encode more intrinsically disordered proteins, although some structure has been found in the product of one of the best-characterized orphan genes. Of the tens of thousands of enzymes of primary or specialized metabolism that have been characterized to date, none are orphans, or even of restricted lineage; apparently, catalysis requires hundreds of millions of years of evolution.
While the prevalence of orphan genes has been established, the evolutionary role of orphans, and its resulting importance, is still being debated. One theory is that many orphans have no evolutionary role; genomes contain non-functional open reading frames (ORFs) that create spurious polypeptide products not maintained by selection, meaning that they are unlikely to be conserved between species and would likely be detected as orphan genes. However, a variety of other studies have shown that at least some orphans are functionally important and may help explain the emergence of novel phenotypes.
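How easily open reading frames turn up in arbitrary sequence can be seen with a simple scanner. The following Python sketch is an illustration only: it assumes the standard stop codons, searches the forward strand in its three reading frames, and uses a hypothetical minimum length; real gene-finding tools also consider the reverse strand, splicing, and expression evidence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for each ATG-to-stop ORF on the forward strand.

    start is the index of the A in ATG, end is one past the stop codon,
    and min_codons is the minimum number of codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # close the ORF and look for the next ATG
    return orfs


# A made-up sequence containing one short ORF: ATG AAA TTT GGG TAA
print(find_orfs("CCATGAAATTTGGGTAACC"))
```

An ORF found this way is only a stretch of sequence that could be translated; whether it is transcribed, translated, or maintained by selection has to be established separately.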
- Fischer, D.; Eisenberg, D. (1 September 1999). "Finding families for genomic ORFans". Bioinformatics. 15 (9): 759–762. doi:10.1093/bioinformatics/15.9.759. PMID 10498776.
- Tautz, D.; Domazet-Lošo, T. (2011). "The evolutionary origin of orphan genes". Nature Reviews Genetics. 12 (10): 692–702. doi:10.1038/nrg3053. PMID 21878963.
- Khalturin, K; Hemmrich, G; Fraune, S; Augustin, R; Bosch, TC (2009). "More than just orphans: are taxonomically-restricted genes important in evolution?". Trends in Genetics. 25 (9): 404–413. doi:10.1016/j.tig.2009.07.006. PMID 19716618.
- Ohno, Susumu (11 December 2013). Evolution by Gene Duplication. Springer Science & Business Media. ISBN 978-3-642-86659-3.
- Zhou, Qi; Zhang, Guojie; Zhang, Yue; Xu, Shiyu; Zhao, Ruoping; Zhan, Zubing; Li, Xin; Ding, Yun; Yang, Shuang (1 September 2008). "On the origin of new genes in Drosophila". Genome Research. 18 (9): 1446–1455. doi:10.1101/gr.076588.108. PMC 2527705. PMID 18550802.
- Toll-Riera, M.; Bosch, N.; Bellora, N.; Castelo, R.; Armengol, L.; Estivill, X.; Alba, M. M. (2009). "Origin of primate orphan genes: a comparative genomics approach". Molecular Biology and Evolution. 26 (3): 603–612. doi:10.1093/molbev/msn281. PMID 19064677.
- Wissler, L.; Gadau, J.; Simola, D. F.; Helmkampf, M.; Bornberg-Bauer, E. (2013). "Mechanisms and Dynamics of Orphan Gene Emergence in Insect Genomes". Genome Biology and Evolution. 5 (2): 439–455. doi:10.1093/gbe/evt009. PMC 3590893. PMID 23348040.
- Reinhardt, Josephine A.; Wanjiru, Betty M.; Brant, Alicia T.; Saelao, Perot; Begun, David J.; Jones, Corbin D. (17 October 2013). "De Novo ORFs in Drosophila Are Important to Organismal Fitness and Evolved Rapidly from Previously Non-coding Sequences". PLoS Genet. 9 (10): e1003860. doi:10.1371/journal.pgen.1003860. PMC 3798262. PMID 24146629.
- Suenaga, Yusuke; Islam, S. M. Rafiqul; Alagu, Jennifer; Kaneko, Yoshiki; Kato, Mamoru; Tanaka, Yukichi; Kawana, Hidetada; Hossain, Shamim; Matsumoto, Daisuke (2 January 2014). "NCYM, a Cis-Antisense Gene of MYCN, Encodes a De Novo Evolved Protein That Inhibits GSK3β Resulting in the Stabilization of MYCN in Human Neuroblastomas". PLoS Genet. 10 (1): e1003996. doi:10.1371/journal.pgen.1003996. PMC 3879166. PMID 24391509.
- Jacob, F. (10 June 1977). "Evolution and tinkering". Science. 196 (4295): 1161–1166. Bibcode:1977Sci...196.1161J. doi:10.1126/science.860134. PMID 860134.
- Wilson, G. A.; Bertrand, N.; Patel, Y.; Hughes, J. B.; Feil, E. J.; Field, D. (2005). "Orphans as taxonomically restricted and ecologically important genes". Microbiology. 151 (8): 2499–2501. doi:10.1099/mic.0.28146-0. PMID 16079329.
- Cai, Jing; Zhao, Ruoping; Jiang, Huifeng; Wang, Wen (1 May 2008). "De Novo Origination of a New Protein-Coding Gene in Saccharomyces cerevisiae". Genetics. 179 (1): 487–496. doi:10.1534/genetics.107.084491. PMC 2390625. PMID 18493065.
- Li, L.; Foster, C. M.; Gan, Q.; Nettleton, D.; James, M. G.; Myers, A. M.; Wurtele, E. S. (2009). "Identification of the novel protein QQS as a component of the starch metabolic network in Arabidopsis leaves". The Plant Journal. 58 (3): 485–498. doi:10.1111/j.1365-313X.2009.03793.x. PMID 19154206.
- Li, L; Zheng, W; Zhu, Y; Ye, H; Tang, B; Arendsee, Z; Jones, D; Li, R; Ortiz, D; Zhao, X; Du, C; Nettleton, D; Scott, P; Salas-Fernandez, M; Yin, Y; Wurtele, ES (2015). "The QQS orphan gene regulates carbon and nitrogen partitioning across species via NF-YC interactions". Proc. Natl. Acad. Sci. 112 (47): 14734–14739. Bibcode:2015PNAS..11214734L. doi:10.1073/pnas.1514670112. PMC 4664325. PMID 26554020.
- Donoghue, M.T.A; Keshavaiah, C.; Swamidatta, S.H.; Spillane, C. (2011). "Evolutionary origins of Brassicaceae specific genes in Arabidopsis thaliana". BMC Evolutionary Biology. 11 (1): 47. doi:10.1186/1471-2148-11-47. PMC 3049755. PMID 21332978.
- Altschul, S. (1 September 1997). "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs". Nucleic Acids Research. 25 (17): 3389–3402. doi:10.1093/nar/25.17.3389. PMC 146917. PMID 9254694.
- "NCBI BLAST homepage".
- Alba, M; Castresana, J (2007). "On homology searches by protein BLAST and the characterization of the age of genes". BMC Evol. Biol. 7: 53. doi:10.1186/1471-2148-7-53. PMC 1855329. PMID 17408474.
- Moyers, B. A.; Zhang, J. (13 October 2014). "Phylostratigraphic Bias Creates Spurious Patterns of Genome Evolution". Molecular Biology and Evolution. 32 (1): 258–267. doi:10.1093/molbev/msu286. PMC 4271527. PMID 25312911.
- Domazet-Lošo, Tomislav; Brajković, Josip; Tautz, Diethard (11 January 2007). "A phylostratigraphy approach to uncover the genomic history of major adaptations in metazoan lineages". Trends in Genetics. 23 (11): 533–539. doi:10.1016/j.tig.2007.08.014. PMID 18029048.
- McLysaght, Aoife; Guerzoni, Daniele (31 August 2015). "New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation". Philosophical Transactions of the Royal Society B: Biological Sciences. 370 (1678): 20140332. doi:10.1098/rstb.2014.0332. PMC 4571571. PMID 26323763.
- Palmieri, Nicola; Kosiol, Carolin; Schlötterer, Christian (19 February 2014). "The life cycle of orphan genes". eLife. 3: e01311. doi:10.7554/eLife.01311. PMC 3927632. PMID 24554240.
- Zhao, Li; Saelao, Perot; Jones, Corbin D.; Begun, David J. (14 February 2014). "Origin and Spread of de Novo Genes in Drosophila melanogaster Populations". Science. 343 (6172): 769–772. Bibcode:2014Sci...343..769Z. doi:10.1126/science.1248286. PMC 4391638. PMID 24457212.
- Levine, Mia T.; Jones, Corbin D.; Kern, Andrew D.; Lindfors, Heather A.; Begun, David J. (27 June 2006). "Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression". Proceedings of the National Academy of Sciences. 103 (26): 9935–9939. Bibcode:2006PNAS..103.9935L. doi:10.1073/pnas.0509809103. PMC 1502557. PMID 16777968.
- Heinen, Tobias J. A. J.; Staubach, Fabian; Häming, Daniela; Tautz, Diethard (29 September 2009). "Emergence of a New Gene from an Intergenic Region". Current Biology. 19 (18): 1527–1531. doi:10.1016/j.cub.2009.07.049. PMID 19733073.
- Chen, Sidi; Zhang, Yong E.; Long, Manyuan (17 December 2010). "New Genes in Drosophila Quickly Become Essential". Science. 330 (6011): 1682–1685. Bibcode:2010Sci...330.1682C. doi:10.1126/science.1196380. PMC 7211344. PMID 21164016.
- Silveira AB, Trontin C, Cortijo S, Barau J, Del-Bem LE, Loudet O, Colot V, Vincentz M (2013). "Extensive Natural Epigenetic Variation at a De Novo Originated Gene". PLoS Genetics. 9 (4): e1003437. doi:10.1371/journal.pgen.1003437. PMC 3623765. PMID 23593031.
- Neme, Rafik; Tautz, Diethard (17 March 2014). "Evolution: Dynamics of De Novo Gene Emergence". Current Biology. 24 (6): R238–R240. doi:10.1016/j.cub.2014.02.016. PMID 24650912.
- Moyers, Bryan A.; Zhang, Jianzhi (11 January 2016). "Evaluating phylostratigraphic evidence for widespread de novo gene birth in genome evolution". Molecular Biology and Evolution. 33 (5): 1245–56. doi:10.1093/molbev/msw008. PMC 5010002. PMID 26758516.
- Lynch, Michael; Katju, Vaishali (1 November 2004). "The altered evolutionary trajectories of gene duplicates". Trends in Genetics. 20 (11): 544–549. CiteSeerX 10.1.1.335.7718. doi:10.1016/j.tig.2004.09.001. PMID 15475113.
- Arendsee, Zebulun W.; Li, Ling; Wurtele, Eve Syrkin (November 2014). "Coming of age: orphan genes in plants". Trends in Plant Science. 19 (11): 698–708. doi:10.1016/j.tplants.2014.07.003. PMID 25151064.
- Mukherjee, S.; Panda, A.; Ghosh, T.C. (June 2015). "Elucidating evolutionary features and functional implications of orphan genes in Leishmania major". Infection, Genetics and Evolution. 32: 330–337. doi:10.1016/j.meegid.2015.03.031. PMID 25843649.
- Wilson, Benjamin A.; Foy, Scott G.; Neme, Rafik; Masel, Joanna (24 April 2017). "Young genes are highly disordered as predicted by the preadaptation hypothesis of de novo gene birth". Nature Ecology & Evolution. 1 (6): 0146. doi:10.1038/s41559-017-0146. PMC 5476217. PMID 28642936.
- Willis, Sara; Masel, Joanna (19 July 2018). "Gene Birth Contributes to Structural Disorder Encoded by Overlapping Genes". Genetics. 210 (1): 303–313. doi:10.1534/genetics.118.301249. PMC 6116962. PMID 30026186.
- Bungard, Dixie; Copple, Jacob S.; Yan, Jing; Chhun, Jimmy J.; Kumirov, Vlad K.; Foy, Scott G.; Masel, Joanna; Wysocki, Vicki H.; Cordes, Matthew H.J. (November 2017). "Foldability of a Natural De Novo Evolved Protein". Structure. 25 (11): 1687–1696.e4. doi:10.1016/j.str.2017.09.006. PMC 5677532. PMID 29033289.