Gene Ontology Term Enrichment

Gene Ontology (GO) term enrichment is a technique for interpreting sets of genes using the Gene Ontology classification system, in which genes are assigned to a set of predefined bins depending on their functional characteristics. For example, the gene FasR is categorized as being a receptor, involved in apoptosis, and located on the plasma membrane.

Researchers performing high-throughput experiments that yield sets of genes (for example, genes that are differentially expressed under different conditions) often want to retrieve a functional profile of that gene set, in order to better understand the underlying biological processes. This can be done by comparing the input gene set with each of the bins (terms) in the GO – a statistical test can be performed for each bin to see if it is enriched for the input genes.

The output of the analysis is typically a ranked list of GO terms, each associated with a p-value.^[1]

Background

The Gene Ontology

The Gene Ontology (GO) provides a system for hierarchically classifying genes or gene products into terms organized in a graph structure (an ontology). The terms are grouped into three categories: molecular function (describing the molecular activity of a gene), biological process (describing the larger cellular or physiological role carried out by the gene, coordinated with other genes), and cellular component (describing the location in the cell where the gene product executes its function). Each gene can be described (annotated) with multiple terms. The GO is actively used to classify genes from humans, model organisms and a variety of other species.

Using the GO, it is possible to retrieve the set of terms used to describe any gene, or conversely, given a term, return the set of genes annotated to that term. For the latter query, the hierarchical system of the GO is employed to give complete results. For example, a query for the GO term for nucleus should return genes annotated to the term "nuclear membrane".
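The two queries described above can be sketched in a few lines of Python. This is a minimal illustration, not how any GO database implements them: the ontology fragment, the gene names (geneA, geneB, geneC) and the function names are all hypothetical, and the real GO has many more terms and several relationship types.

```python
# Hypothetical toy ontology: each term maps to its direct parent terms.
# (A simplification; the real GO graph is far larger.)
PARENTS = {
    "nuclear membrane": ["nucleus"],
    "nucleus": ["cellular component"],
    "plasma membrane": ["cellular component"],
    "cellular component": [],
}

# Hypothetical direct annotations: gene -> terms it is annotated to.
ANNOTATIONS = {
    "geneA": ["nuclear membrane"],
    "geneB": ["nucleus"],
    "geneC": ["plasma membrane"],
}


def ancestors(term, parents=PARENTS):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def terms_for_gene(gene, annotations=ANNOTATIONS):
    """Terms used to describe a gene (direct annotations only)."""
    return set(annotations.get(gene, []))


def genes_for_term(query, annotations=ANNOTATIONS):
    """Genes annotated to the query term or to any of its descendants."""
    return {
        gene
        for gene, terms in annotations.items()
        if any(query in ancestors(term) for term in terms)
    }


print(sorted(genes_for_term("nucleus")))
```

In this toy data, the query for "nucleus" returns geneA as well as geneB, because geneA's annotation to "nuclear membrane" is propagated up the hierarchy.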

Interpreting high-throughput data

Certain types of high-throughput experiments (e.g., RNA-seq) return sets of genes that are over- or under-expressed. GO can be used to functionally profile such a gene set and to determine which GO terms appear among the annotations of the input genes more frequently than would be expected by chance. For example, an experiment may compare gene expression in healthy cells versus cancerous cells. Functional profiling can be used to elucidate the underlying cellular mechanisms associated with the cancerous condition. This is also called term enrichment or term overrepresentation, as it tests whether a GO term is statistically enriched for the given set of genes.

Methods

There are a variety of methods for performing term enrichment using GO. Methods may vary according to the type of statistical test applied, the most common being Fisher's exact test / the hypergeometric test. Some methods make use of Bayesian statistics.^[2] There is also variability in the type of correction applied for multiple comparisons, the most common being the Bonferroni correction.
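As a rough sketch of the most common combination named above, the following Python code computes a one-sided hypergeometric p-value for each term and applies a Bonferroni correction. It uses only the standard library. The function names, the background set and the term-to-gene mapping are illustrative assumptions, not the interface of any particular tool.

```python
from math import comb


def hypergeom_pvalue(N, K, n, k):
    """P(X >= k): chance of seeing at least k annotated genes when n genes
    are drawn without replacement from N, of which K carry the term."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return {term: min(1.0, p * m) for term, p in pvalues.items()}


def enrich(study, background, term_to_genes):
    """Return (term, raw p, adjusted p) tuples sorted by raw p-value."""
    background = set(background)
    study = set(study) & background
    N, n = len(background), len(study)
    raw = {}
    for term, genes in term_to_genes.items():
        genes = set(genes) & background
        raw[term] = hypergeom_pvalue(N, len(genes), n, len(genes & study))
    adjusted = bonferroni(raw)
    return sorted(((t, raw[t], adjusted[t]) for t in raw), key=lambda r: r[1])


# Hypothetical data: 20 background genes, 5 of them in the study set.
background = [f"g{i}" for i in range(20)]
study = background[:5]
term_to_genes = {
    "term A": background[:5],     # fully overlaps the study set
    "term B": background[10:15],  # no overlap with the study set
}
for term, p, p_adj in enrich(study, background, term_to_genes):
    print(term, p, p_adj)
```

The one-sided tail tests for overrepresentation only. The output is a ranked list of terms with p-values, as described in the introduction. In practice the terms passed to such a function would already include annotations propagated through the hierarchy.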

Methods also vary in their input – some take unranked gene sets, others take ranked gene sets, with more sophisticated methods allowing each gene to be associated with a magnitude (e.g., expression level), avoiding arbitrary cutoffs.
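One simple way to score a term against a ranked gene list without choosing a cutoff is a running-sum statistic, sketched below in Python. This is an illustrative, unweighted variant written for this example, not the algorithm of any specific tool. The function name and the gene identifiers are hypothetical, and a real method would also need a significance estimate (for example by permutation), which is omitted here.

```python
def running_sum_score(ranked_genes, term_genes):
    """Largest deviation from zero of a running sum over the ranked list.

    The sum steps up at each gene annotated to the term and down at each
    other gene, with step sizes chosen so that it ends at zero. A large
    positive score means the term's genes cluster near the top of the
    ranking; a large negative score means they cluster near the bottom.
    """
    term_genes = set(term_genes)
    hits = sum(1 for gene in ranked_genes if gene in term_genes)
    misses = len(ranked_genes) - hits
    if hits == 0 or misses == 0:
        return 0.0  # the term cannot discriminate within this list
    up, down = 1.0 / hits, 1.0 / misses
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += up if gene in term_genes else -down
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking, most strongly up-regulated gene first.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(running_sum_score(ranked, {"g1", "g2"}))  # term genes at the top
print(running_sum_score(ranked, {"g5", "g6"}))  # term genes at the bottom
```

Because the score depends on where the term's genes fall in the ranking rather than on membership in a thresholded set, no arbitrary cutoff is needed.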

Tools

MOET: a web-based gene set enrichment tool at the Rat Genome Database for multiontology and multispecies analyses^[3]
PlantRegMap: GO annotation for 165 species and GO term enrichment analysis
PLAZA Workbench: GO, InterPro and MapMan enrichment analysis for different plant species.
The Gene Ontology Consortium (GOC) provides a Term Enrichment tool.^[4]
FunRich^[5] is a Windows-based, free, standalone functional enrichment analysis tool.
Blast2GO^[6] is a platform-independent desktop application for functional enrichment analysis as well as functional annotation of novel sequence data.

References

[1] Rhee, S. Y.; Wood, V; Dolinski, K; Draghici, S (2008). "Use and misuse of the gene ontology annotations". Nature Reviews Genetics. 9 (7): 509–15. doi:10.1038/nrg2363. PMID 18475267. S2CID 15599098.
[2] Bauer, S; Gagneur, J; Robinson, P. N. (2010). "GOing Bayesian: Model-based gene set analysis of genome-scale data". Nucleic Acids Research. 38 (11): 3523–32. doi:10.1093/nar/gkq045. PMC 2887944. PMID 20172960.
[3] Vedi M, Nalabolu HS, Lin CW, Hoffman MJ, Smith JR, Brodie K, De Pons JL, Demos WM, Gibson AC, Hayman GT, Hill ML, Kaldunski ML, Lamers L, Laulederkind SJ, Thorat K, Thota J, Tutaj M, Tutaj MA, Wang SJ, Zacher S, Dwinell MR, Kwitek AE (April 2022). "MOET: a web-based gene set enrichment tool at the Rat Genome Database for multiontology and multispecies analyses". Genetics. 220 (4). doi:10.1093/genetics/iyac005. PMC 8982048. PMID 35380657.
[4] Carbon S, Ireland A, Mungall CJ, Shu S, Marshall B, Lewis S (2008). "AmiGO: Online access to ontology and annotation data". Bioinformatics. 25 (2). AmiGO Hub; Web Presence Working Group: 288–9. doi:10.1093/bioinformatics/btn615. PMC 2639003. PMID 19033274.
[5] Pathan, M; Keerthikumar, S; Ang, C. S.; Gangoda, L; Quek, C. Y.; Williamson, N. A.; Mouradov, D; Sieber, O. M.; Simpson, R. J.; Salim, A; Bacic, A; Hill, A; Stroud, D. A.; Ryan, M. T.; Agbinya, J. I.; Mariadasson, J. M.; Burgess, A. W.; Mathivanan, S (2015). "Technical brief funrich: An open access standalone functional enrichment and interaction network analysis tool". Proteomics. 15 (15): 2597–601. doi:10.1002/pmic.201400515. PMID 25921073. S2CID 28583044.
[6] Götz, S; García-Gómez, JM; Terol, J; Williams, TD; Nagaraj, SH; Nueda, MJ; Robles, M; Talón, M; Dopazo, J; Conesa, A (June 2008). "High-throughput functional annotation and data mining with the Blast2GO suite". Nucleic Acids Research. 36 (10): 3420–35. doi:10.1093/nar/gkn176. PMC 2425479. PMID 18445632.
