Pathway analysis

From Wikipedia, the free encyclopedia

Revision as of 07:33, 17 October 2020

In bioinformatics, pathway analysis software is used to identify related proteins within a pathway or to build a pathway de novo from the proteins of interest. This is helpful when studying differential expression of a gene in a disease or analyzing any omics dataset with a large number of proteins. By examining the changes in gene expression in a pathway, the biological causes of those changes can be explored. A pathway is a term from molecular biology for an artificial, simplified model of a process within a cell or tissue. A typical pathway model starts with an extracellular signaling molecule that activates a specific receptor, thus triggering a chain of protein-protein or protein-small molecule interactions.[1] Pathway analysis helps to understand or interpret omics data from the point of view of canonical prior knowledge structured in the form of pathway diagrams. It allows finding distinct cellular processes, diseases or signaling pathways that are statistically associated with a selection of genes differentially expressed between two samples.[2] Pathway analysis is often, but erroneously, used as a synonym for network analysis (functional enrichment analysis and gene set analysis).[3]

Uses

The data for pathway analysis come from high-throughput biology. This includes high-throughput sequencing data and microarray data. Before pathway analysis can be done, the omics data should be normalized, and genes should be ranked by differential expression, usually with the help of Student's t-test, ANOVA or other statistics. In general, any statistically ranked list of genes can be analyzed by pathway analysis. For example, the functional activity of proteins can often be inferred using network enrichment analysis of genes differentially expressed in the experiment. Such functional activity scores can then be used for pathway analysis to find pathways responsible for the observed differential expression. When a ranking is not available, a plain list of all genes can be analyzed instead. It is also possible to integrate multiple microarray data sets from different research groups by meta-analysis and cross-platform normalization.[4] By using pathway analysis software, researchers can determine which gene groups, such as pathways, cell processes or diseases, are enriched with genes that are over- or under-expressed in the experimental data. They can also infer associated upstream and downstream regulators, proteins, small molecules, drugs, etc.[5] For example, pathway analysis of several independent microarray experiments (meta-analysis) helped to discover potential biomarkers in a single pathway important for the fast-to-slow fiber type transition in Duchenne muscular dystrophy.[6] In another study, meta-analysis identified two biomarkers in the blood of patients with Parkinson's disease, which can be useful for monitoring the disease.[7]
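
The ranking step mentioned above can be illustrated with a minimal sketch. It assumes a classic two-sample Student's t-statistic with pooled variance, already normalized expression values, and at least two replicates per group with non-zero variance. The function names `t_statistic` and `rank_genes` and the input layout are hypothetical, chosen only for this example.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(group_a, group_b):
    """Two-sample Student's t-statistic with pooled variance.

    Assumes each group has at least two values and that the pooled
    variance is non-zero (otherwise a division by zero occurs).
    """
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / na + 1 / nb))


def rank_genes(expression):
    """Rank genes from most up-regulated to most down-regulated.

    `expression` maps a gene name to a pair (condition values, control values);
    this layout is an assumption made for illustration.
    """
    scores = {gene: t_statistic(a, b) for gene, (a, b) in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

The resulting ordered list is the kind of input that rank-based methods expect; keeping the sign of the statistic preserves the direction of change.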

Pathways databases

Pathway analysis needs a knowledge base with a pathway collection and interaction networks. The content, structure and functionality of pathway collections usually vary between sources. Examples of pathway collections are KEGG,[8] WikiPathways, and Reactome.[9] There are also commercial pathway collections such as Pathway Studio pathways[10] and IPA pathways.[11]

Methods and software

Pathway analysis software can be found in the form of desktop programs, web-based applications, or packages coded in languages such as R and Python and shared openly through the Bioconductor[12] and GitHub[13] projects. The methodology of pathway analysis evolves fast and its classification is still debated,[14][15] with the following main categories of pathway enrichment analysis applicable to high-throughput data:[16]

Over-representation analysis (ORA)

This method measures the overlap between two gene sets. The first is a set of genes (or proteins) in a pathway or another functionally characterised group (gene ontology (GO) groups, protein families, pathways), generally called a Functional Gene Set (FGS). The second is a set of genes altered in an experimental (or pathological) condition, generally called an Altered Gene Set (AGS). A typical example of an AGS is a list of the top N differentially expressed genes from an RNA-Seq assay. The basic assumption behind ORA is that a biologically relevant pathway can be identified by an excess of AGS genes in it compared to the number expected by chance. The aim of ORA is to identify such enriched pathways, judging by the significance of the overlap between FGS and AGS as determined either by an appropriate statistic, such as the Jaccard index, or by a statistical test producing p-values (Fisher's exact test or a test based on the hypergeometric distribution).
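
As an illustration of the two kinds of ORA scores named above, the following minimal sketch computes a one-sided hypergeometric p-value (equivalent to a one-sided Fisher's exact test) and the Jaccard index. The function names and the choice of the gene "universe" size are assumptions of this example; in practice the universe is usually the set of all genes measured in the experiment.

```python
from math import comb


def ora_p_value(universe_size, fgs_size, ags_size, overlap):
    """Probability of seeing at least `overlap` shared genes by chance.

    Upper tail of the hypergeometric distribution: an AGS of `ags_size`
    genes is drawn without replacement from `universe_size` genes, of
    which `fgs_size` belong to the FGS.
    """
    max_overlap = min(fgs_size, ags_size)
    total = comb(universe_size, ags_size)
    tail = sum(
        comb(fgs_size, k) * comb(universe_size - fgs_size, ags_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def jaccard_index(ags, fgs):
    """Size of the intersection divided by the size of the union."""
    ags, fgs = set(ags), set(fgs)
    union = ags | fgs
    return len(ags & fgs) / len(union) if union else 0.0
```

When many FGS are tested against the same AGS, the resulting p-values would normally be corrected for multiple testing, which this sketch leaves out.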

Functional class scoring (FCS)

This method identifies enriched FGS by considering the relative positions of their genes in the full list of genes studied in the experiment. The full list should therefore be ranked in advance by a statistic (such as mRNA expression fold-change or Student's t-test) or by a p-value, taking the direction of fold change into account, since p-values are non-directional. Thus FCS takes into account every FGS gene regardless of its statistical significance and does not require a pre-compiled AGS. One of the first and most popular methods deploying the FCS approach was Gene Set Enrichment Analysis (GSEA).[17]
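
The idea of scoring an FGS by the positions of its genes in a ranked list can be sketched with an unweighted running-sum statistic of the Kolmogorov-Smirnov type. This is a simplified illustration, not the full GSEA procedure: it omits the weighting by the ranking statistic and the permutation-based significance estimate, and it assumes that `ranked_genes` contains each gene once. The function name `enrichment_score` is hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    Walks down the ranked list, stepping up at every gene in the set and
    down at every other gene, and returns the largest deviation from zero
    (positive: set concentrated at the top, negative: at the bottom).
    """
    gene_set = set(gene_set) & set(ranked_genes)
    n_hits = len(gene_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0
    hit_step = 1.0 / n_hits
    miss_step = 1.0 / n_miss
    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        running += hit_step if gene in gene_set else -miss_step
        if abs(running) > abs(best):
            best = running
    return best
```

Because every gene in the ranked list contributes a step, no significance cut-off is needed to define an AGS beforehand.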

Pathway topology analysis (PTA)

Similarly to FCS, PTA accounts for high-throughput data for every FGS gene.[18] In addition, it uses specific topological information about the role, position, and interaction directions of the pathway genes. This requires additional input data from a pathway database in a pre-specified format, such as KEGG Markup Language (KGML). Using this information, PTA estimates the significance of a pathway by considering how much each individual gene alteration might have affected the whole pathway. Multiple alteration types (copy-number variations, somatic mutations, etc.) can be used in parallel when available.[19] The set of PTA methods includes Signaling Pathway Impact Analysis (SPIA),[20][21] EnrichNet,[22] GGEA,[23] and TopoGSA.[24]

Network enrichment analysis (NEA)

Network enrichment analysis (NEA) is an extension of gene-set enrichment analysis to the domain of global gene networks.[25][26][27][28] The major principle of NEA can be understood in comparison with ORA, where enrichment of an FGS in genes of the AGS is determined by how many genes are directly shared by AGS and FGS. In NEA, by contrast, the global network is searched for network edges that connect any genes of the AGS with any genes of the FGS. Since enrichment significance is influenced by the highly variable node degrees of individual AGS and FGS genes, it should be determined by a dedicated statistical test, which compares the observed number of network edges to the number expected by chance in the same network context. Some valuable properties of NEA are that:

  1. it is more robust to biological and technical variability between sample replicates;[29]
  2. AGS genes may not necessarily be annotated as pathway members;[30]
  3. FGS members do not have to be altered themselves, but still are accounted for due to possessing network links to AGS genes.[31]
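
The comparison of observed and expected edge counts described above can be sketched as follows. The expected count used here, the product of the summed node degrees of the two gene sets divided by twice the number of network edges, is a degree-based approximation assumed for this illustration; it is not necessarily the null model of any of the cited tools, and the significance test itself is left out. The function name `nea_edge_counts` and the edge-list input format are hypothetical.

```python
def nea_edge_counts(edges, ags, fgs):
    """Observed and expected numbers of network edges linking AGS to FGS.

    `edges` is an iterable of (gene, gene) pairs of an undirected network.
    Self-loops and duplicate edges are ignored.
    """
    ags, fgs = set(ags), set(fgs)
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = {}
    observed = 0
    for edge in unique_edges:
        a, b = tuple(edge)
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
        # Count the edge if one end lies in the AGS and the other in the FGS.
        if (a in ags and b in fgs) or (b in ags and a in fgs):
            observed += 1
    total_edges = len(unique_edges)
    ags_degree = sum(degree.get(gene, 0) for gene in ags)
    fgs_degree = sum(degree.get(gene, 0) for gene in fgs)
    # Assumed degree-based approximation of the chance expectation.
    expected = ags_degree * fgs_degree / (2 * total_edges) if total_edges else 0.0
    return observed, expected
```

Note that the AGS and FGS in this sketch need not share any gene: enrichment arises purely from the links between them, which is the difference from ORA.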

Commercial solutions

Beyond open-source tools, such as STRING or Cytoscape, a number of companies sell licensed software products to analyse gene sets. While most of the publicly available solutions use online and public pathway collections, the commercial products mostly promote their own proprietary pathways and networks. The choice of such products might be driven by customers' skills, financial and time resources, and needs.[32] Ingenuity, for example, maintains a knowledge base for comparative analysis of gene expression data.[33] Pathway Studio[34] is commercial software for searching for biologically relevant facts, analyzing experiments, and creating pathways. Pathway Studio Viewer[35] is a free resource from the same company for presenting the Pathway Studio interactive pathway collection and database. Two commercial solutions offer PTA: PathwayGuide from Advaita Corporation and MetaCore from Thomson Reuters.[36] Advaita uses the peer-reviewed Signaling Pathway Impact Analysis (SPIA) method,[37][38] while the MetaCore method is unpublished.[39]

Limitations

Lack of annotations

Application of pathway analysis methods depends on annotations found in existing databases, such as gene set membership in pathways, pathway topology, and the presence of genes in the global network. These annotations, however, are far from complete and have highly variable degrees of confidence. In addition, such information is usually general, i.e. it lacks context such as cell type, compartment, or developmental stage. Therefore, interpretation of pathway analysis results for omics datasets should be done with caution.[40] The problem can be partially addressed by analysing larger gene sets in a more global context, such as big pathway collections or global interaction networks.

References

  1. ^ Berg J. M., Tymoczko J. L., Stryer L. Biochemistry, 5th edition, New York: W. H. Freeman; 2002
  2. ^ García-Campos, Miguel Angel; Espinal-Enríquez, Jesús; Hernández-Lemus, Enrique (2015). "Pathway analysis: State of the art". Frontiers in Physiology. 6: 383. doi:10.3389/fphys.2015.00383. PMC 4681784. PMID 26733877.
  3. ^ GSEA
  4. ^ Walsh, Christopher; Hu, Pingzhao; Batt, Jane; Santos, Claudia (2015). "Microarray Meta-Analysis and Cross-Platform Normalization: Integrative Genomics for Robust Biomarker Discovery". Microarrays. 4 (3): 389–406. doi:10.3390/microarrays4030389. PMC 4996376. PMID 27600230.
  5. ^ Subramanian, Aravind; Tamayo, Pablo; Mootha, Vamsi K.; Mukherjee, Sayan; Ebert, Benjamin L.; Gillette, Michael A.; Paulovich, Amanda; et al. (2005). "Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles". Proceedings of the National Academy of Sciences of the United States of America. 102 (43): 15545–50. Bibcode:2005PNAS..10215545S. doi:10.1073/pnas.0506580102. PMC 1239896. PMID 16199517.
  6. ^ Kotelnikova, Ekaterina; Shkrob, Maria A.; Pyatnitskiy, Mikhail A.; Ferlini, Alessandra; Daraselia, Nikolai (2012). "Novel Approach to Meta-Analysis of Microarray Datasets Reveals Muscle Remodeling-Related Drug Targets and Biomarkers in Duchenne Muscular Dystrophy". PLoS Computational Biology. 8 (2): e1002365. Bibcode:2012PLSCB...8E2365K. doi:10.1371/journal.pcbi.1002365. PMC 3271016. PMID 22319435.
  7. ^ Santiago, Jose A.; Potashkin, Judith A. (2015). "Network-Based Metaanalysis Identifies HNF4A and PTBP1 as Longitudinally Dynamic Biomarkers for Parkinson's Disease". Proceedings of the National Academy of Sciences of the United States of America. 112 (7): 2257–62. Bibcode:2015PNAS..112.2257S. doi:10.1073/pnas.1423573112. PMC 4343174. PMID 25646437.
  8. ^ Ogata, H.; Goto, S.; Sato, K.; Fujibuchi, W.; Bono, H.; Kanehisa, M. (1999). "KEGG: Kyoto Encyclopedia of Genes and Genomes". Nucleic Acids Research. 27 (1): 29–34. doi:10.1093/nar/27.1.29. PMC 148090. PMID 9847135.
  9. ^ Vastrik, Imre; D'Eustachio, Peter; Schmidt, Esther; Joshi-Tope, Geeta; Gopinath, Gopal; Croft, David; de Bono, Bernard; et al. (2007). "Reactome: A Knowledge Base of Biologic Pathways and Processes". Genome Biology. 8 (3): R39. doi:10.1186/gb-2007-8-3-r39. PMC 1868929. PMID 17367534.
  10. ^ Pathway Studio Pathways
  11. ^ Pathway Central
  12. ^ Gentleman, R. C.; Carey, V. J.; Bates, D. M.; Bolstad, B.; Dettling, M.; Dudoit, S.; et al. (2004). "Bioconductor: open software development for computational biology and bioinformatics". Genome Biol. 5 (10): R80. doi:10.1186/gb-2004-5-10-r80. PMC 545600. PMID 15461798.
  13. ^ Dabbish, L., Stuart, C., Tsay, J., and Herbsleb, J. (2012). "Social coding in github: transparency and collaboration in an open software repository," in Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work (New York, NY: ACM), 1277–1286
  14. ^ Khatri P., Sirota M., Butte A. J. Ten years of pathway analysis: current approaches and outstanding challenges. Plos Comput Biol. 2012;8(2)
  15. ^ Henderson-Maclennan NK, Papp JC, Talbot CC, McCabe ERB, Presson AP. Pathway analysis software: annotation errors and solutions. Mol Genet Metab. 2010 Nov;101(2–3):134–40
  16. ^ Khatri P., Sirota M., Butte A. J. Ten years of pathway analysis: current approaches and outstanding challenges. Plos Comput Biol. 2012;8(2)
  17. ^ Subramanian, Aravind; Tamayo, Pablo; Mootha, Vamsi K.; Mukherjee, Sayan; Ebert, Benjamin L.; Gillette, Michael A.; Paulovich, Amanda; et al. (2005). "Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles". Proceedings of the National Academy of Sciences of the United States of America. 102 (43): 15545–50. Bibcode:2005PNAS..10215545S. doi:10.1073/pnas.0506580102. PMC 1239896. PMID 16199517.
  18. ^ Emmert-Streib, F.; Dehmer, M. (2011). "Networks for systems biology: conceptual connection of data and function". IET Systems Biology. 5 (3): 185–207. doi:10.1049/iet-syb.2010.0025. PMID 21639592.
  19. ^ Khatri, Purvesh; Sirota, Marina; Butte, Atul J.; Ouzounis, Christos A. (23 February 2012). "Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges". PLoS Computational Biology. 8 (2): e1002375. Bibcode:2012PLSCB...8E2375K. doi:10.1371/journal.pcbi.1002375. PMC 3285573. PMID 22383865.
  20. ^ Draghici, S.; Khatri, P.; Tarca, A. L.; Amin, K.; Done, A.; Voichita, C.; Georgescu, C.; Romero, R. (4 September 2007). "A systems biology approach for pathway level analysis". Genome Research. 17 (10): 1537–1545. doi:10.1101/gr.6202607. PMC 1987343. PMID 17785539.
  21. ^ Tarca, A. L.; Draghici, S.; Khatri, P.; Hassan, S. S.; Mittal, P.; Kim, J.-s.; Kim, C. J.; Kusanovic, J. P.; Romero, R. (5 November 2008). "A novel signaling pathway impact analysis". Bioinformatics. 25 (1): 75–82. doi:10.1093/bioinformatics/btn577. PMC 2732297. PMID 18990722.
  22. ^ Glaab, E.; Baudot, A.; Krasnogor, N.; Schneider, R. S.; Valencia, A. (15 September 2012). "EnrichNet: Network-based gene set enrichment analysis". Bioinformatics. 28 (18): i451–i457. doi:10.1093/bioinformatics/bts389. PMC 3436816. PMID 22962466.
  23. ^ Geistlinger, L.; Csaba, G.; Küffner, R.; Mulder, N.; Zimmer, R. (2011). "From sets to graphs: Towards a realistic enrichment analysis of transcriptomic systems". Bioinformatics. 27 (13): i366–i373. doi:10.1093/bioinformatics/btr228. PMC 3117393. PMID 21685094.
  24. ^ Glaab, E.; Baudot, A.; Krasnogor, N.; Valencia, A. (2012). "TopoGSA: Network topological gene set analysis". Bioinformatics. 26 (18): 1271–1272. doi:10.1093/bioinformatics/btq131. PMC 2859135. PMID 20335277.
  25. ^ Shojaie, Ali; Michailidis, George (22 May 2010). "Network Enrichment Analysis in Complex Experiments". Statistical Applications in Genetics and Molecular Biology. 9 (1). doi:10.2202/1544-6115.1483. ISSN 1544-6115.
  26. ^ Huttenhower, Curtis; Haley, Erin M.; Hibbs, Matthew A.; Dumeaux, Vanessa; Barrett, Daniel R.; Coller, Hilary A.; Troyanskaya, Olga G. (26 February 2009). "Exploring the human genome with functional maps". Genome Research. doi:10.1101/gr.082214.108. ISSN 1088-9051.
  27. ^ Alexeyenko, A.; Lee, W.; Pernemalm, M. (2012). "Network enrichment analysis: extension of gene-set enrichment analysis to gene networks". BMC Bioinformatics. 13. doi:10.1186/1471-2105-13-226.
  28. ^ Signorelli, Mirko; Vinciotti, Veronica; Wit, Ernst C. (5 September 2016). "NEAT: an efficient network enrichment analysis test". BMC Bioinformatics. 17 (1): 352. doi:10.1186/s12859-016-1203-6. ISSN 1471-2105.
  29. ^ Jeggari, A.; Alexeyenko, A (2017). "NEArender: an R package for functional interpretation of 'omics' data via network enrichment analysis". BMC Bioinformatics. 18. doi:10.1186/s12859-017-1534-y.
  30. ^ Hong, M.; Alexeyenko, A.; Lambert, J. (2010). "Genome-wide pathway analysis implicates intracellular transmembrane protein transport in Alzheimer disease". Journal of Human Genetics. 55: 707–709. doi:10.1038/jhg.2010.92.
  31. ^ Jeggari, Ashwini; Alekseenko, Zhanna; Petrov, Iurii; Dias, José M; Ericson, Johan; Alexeyenko, Andrey (2 July 2018). "EviNet: a web platform for network enrichment analysis with flexible definition of gene sets". Nucleic Acids Research. 46 (W1): W163–W170. doi:10.1093/nar/gky485.
  32. ^ García-Campos, Miguel Angel; Espinal-Enríquez, Jesús; Hernández-Lemus, Enrique (2015). "Pathway analysis: State of the art". Frontiers in Physiology. 6: 383. doi:10.3389/fphys.2015.00383. PMC 4681784. PMID 26733877.
  33. ^ "Ingenuity IPA - Integrate and Understand Complex 'omics Data." Ingenuity. Web. 8 Apr. 2015. <http://www.ingenuity.com/products/ipa#/?tab=features>.
  34. ^ Pathway Studio
  35. ^ Pathway Studio Viewer
  36. ^ Mitrea, Cristina; Taghavi, Zeinab; Bokanizad, Behzad; Hanoudi, Samer; Tagett, Rebecca; Donato, Michele; Voichiţa, Călin; Drăghici, Sorin (2013). "Methods and approaches in the topology-based analysis of biological pathways". Frontiers in Physiology. 4: 278. doi:10.3389/fphys.2013.00278. PMC 3794382. PMID 24133454.
  37. ^ Draghici, S.; Khatri, P.; Tarca, A. L.; Amin, K.; Done, A.; Voichita, C.; Georgescu, C.; Romero, R. (4 September 2007). "A systems biology approach for pathway level analysis". Genome Research. 17 (10): 1537–1545. doi:10.1101/gr.6202607. PMC 1987343. PMID 17785539.
  38. ^ Tarca, A. L.; Draghici, S.; Khatri, P.; Hassan, S. S.; Mittal, P.; Kim, J.-s.; Kim, C. J.; Kusanovic, J. P.; Romero, R. (5 November 2008). "A novel signaling pathway impact analysis". Bioinformatics. 25 (1): 75–82. doi:10.1093/bioinformatics/btn577. PMC 2732297. PMID 18990722.
  39. ^ Mitrea, Cristina; Taghavi, Zeinab; Bokanizad, Behzad; Hanoudi, Samer; Tagett, Rebecca; Donato, Michele; Voichiţa, Călin; Drăghici, Sorin (2013). "Methods and approaches in the topology-based analysis of biological pathways". Frontiers in Physiology. 4: 278. doi:10.3389/fphys.2013.00278. PMC 3794382. PMID 24133454.
  40. ^ Henderson-Maclennan, Nicole K., Jeanette C. Papp, C. Conover Talbot, Edward R. B. McCabe, and Angela P. Presson. "Pathway Analysis Software: Annotation Errors and Solutions." Molecular Genetics and Metabolism (2010): 134–40. PMC. Web. 8 April 2015.