Pathway analysis

From Wikipedia, the free encyclopedia

In bioinformatics research, pathway analysis software is used to identify related proteins within a pathway or to build a pathway de novo from the proteins of interest. This is helpful when studying the differential expression of a gene in a disease or when analyzing any omics dataset with a large number of proteins. By examining the changes in gene expression in a pathway, the biological causes of those changes can be explored. "Pathway" is a term from molecular biology that denotes an artificial, simplified model of a process within a cell or tissue. A typical pathway model starts with an extracellular signaling molecule that activates a specific receptor, thus triggering a chain of protein-protein or protein-small molecule interactions.[1] Pathway analysis helps to interpret omics data from the point of view of canonical prior knowledge structured in the form of pathway diagrams. It allows finding distinct cellular processes, diseases or signaling pathways that are statistically associated with a selection of genes differentially expressed between two samples.[2] Pathway analysis is often, but erroneously, used as a synonym for network analysis (functional enrichment analysis and gene set analysis).[3]

Uses

The data for pathway analysis come from high-throughput biology, including high-throughput sequencing data and microarray data. Before pathway analysis can be done, the omics data should be normalized, and genes should be ranked by differential expression, usually with the help of Student's t-test, ANOVA or other statistics. In general, any statistically ranked list of genes can be analyzed by pathway analysis. For example, the functional activity of proteins can often be inferred using network enrichment analysis of the genes differentially expressed in the experiment. Such functional activity scores can then be used for pathway analysis to find the pathways responsible for the observed differential expression. When no ranking is available, a plain list of all genes can be analyzed instead. It is also possible to integrate multiple microarray data sets from different research groups by meta-analysis and cross-platform normalization.[4] Using pathway analysis software, researchers can determine which gene groups, such as pathways, cell processes or diseases, are enriched with genes that are over- or under-expressed in the experimental data. They can also infer associated upstream and downstream regulators, proteins, small molecules, drugs, etc.[5] For example, pathway analysis of several independent microarray experiments (meta-analysis) helped to discover potential biomarkers in a single pathway important for the fast-to-slow fiber type transition in Duchenne muscular dystrophy.[6] In another study, meta-analysis identified two biomarkers in the blood of patients with Parkinson's disease, which can be useful for monitoring the disease.[7]
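
The ranking step mentioned above can be illustrated with a minimal sketch that orders genes by the absolute value of a two-sample Student's t statistic. The function names, gene names and expression values are hypothetical, and the sketch assumes already normalized data with at least two replicates per group and non-zero variance; real analyses would normally rely on a dedicated differential expression package.

```python
from math import sqrt
from statistics import mean, variance


def student_t(case, control):
    """Two-sample Student's t statistic with pooled variance."""
    n1, n2 = len(case), len(control)
    pooled = ((n1 - 1) * variance(case) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(case) - mean(control)) / sqrt(pooled * (1 / n1 + 1 / n2))


def rank_genes(case_expr, control_expr):
    """Return (gene, t) pairs sorted by decreasing absolute t statistic."""
    stats = {g: student_t(case_expr[g], control_expr[g]) for g in case_expr}
    return sorted(stats.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical normalized expression values, three replicates per group.
case_expr = {"GENE_A": [5.0, 6.0, 7.0], "GENE_B": [1.0, 2.0, 3.0]}
control_expr = {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [1.0, 2.0, 4.0]}

ranked = rank_genes(case_expr, control_expr)
for gene, t in ranked:
    print(f"{gene}\tt = {t:.3f}")
```

The resulting ranked list is the kind of input that the methods described in the following sections take.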

Pathway databases

Pathway analysis requires a knowledge base containing a pathway collection and interaction networks. The content, structure and functionality of pathway collections usually vary between sources. Examples of pathway collections are KEGG,[8] WikiPathways, and Reactome.[9] There are also commercial pathway collections, such as Pathway Studio pathways[10] and IPA pathways.[11]

Methods and software

Pathway analysis software can generally be divided into web-based applications, desktop programs and programming packages. Programming packages are mostly written in the R and Python languages and are shared openly through the Bioconductor[12] and GitHub[13] projects. Methods of pathway analysis evolve quickly, so their classification is still open to debate.[14][15] One classification[16] distinguishes three main groups of methods: ORA, FCS and PT.

Over-Representation Analysis or Enrichment Analysis (ORA)

This method measures the percentage of genes in a pathway or any other gene group (gene ontology (GO) groups, protein families, pathways) that are differentially expressed. The aim of ORA is to obtain a list of the most relevant pathways, ordered by p-value. The basic hypothesis of ORA is that relevant pathways can be identified by the number of differentially expressed genes they contain. The statistical significance of the overlap between the genes in a pathway and the list of differentially expressed genes is assessed with measures such as Fisher's exact test, the hypergeometric test or the Jaccard index.
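
As a rough illustration of the overlap test described above, the following sketch computes the upper-tail hypergeometric probability of seeing at least the observed number of differentially expressed genes in a pathway, which corresponds to a one-sided Fisher's exact test. The function names, the toy gene universe and the two pathways are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def ora_p_value(universe_size, pathway_size, de_size, overlap):
    """P(X >= overlap) when de_size genes are drawn without replacement
    from a universe that contains pathway_size pathway genes."""
    total = comb(universe_size, de_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, de_size - k)
        for k in range(overlap, min(pathway_size, de_size) + 1)
    )
    return tail / total


def rank_pathways(pathways, de_genes, universe):
    """Return (pathway, overlap, p-value) tuples ordered by p-value."""
    de = set(de_genes) & universe
    results = []
    for name, genes in pathways.items():
        members = set(genes) & universe
        overlap = len(members & de)
        p = ora_p_value(len(universe), len(members), len(de), overlap)
        results.append((name, overlap, p))
    return sorted(results, key=lambda row: row[2])


# Hypothetical toy data: 20 measured genes, two pathways of five genes each.
universe = {f"g{i}" for i in range(20)}
pathways = {
    "A": {"g0", "g1", "g2", "g3", "g4"},
    "B": {"g10", "g11", "g12", "g13", "g14"},
}
de_genes = {"g0", "g1", "g2", "g3", "g10"}

ora_results = rank_pathways(pathways, de_genes, universe)
for name, overlap, p in ora_results:
    print(f"{name}\toverlap = {overlap}\tp = {p:.4g}")
```

In this toy case pathway A contains four of the five differentially expressed genes and therefore ranks first. The choice of the gene universe strongly affects the p-values, so it should match the set of genes actually measured.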

Functional Class Scoring (FCS)

This method considers the expression changes of all genes in the experiment rather than only those that pass a significance cut-off, and thereby removes the threshold limitation of ORA. The aim of FCS is to compute enrichment scores for differentially expressed genes (see gene set enrichment), using pathways as the gene sets. One of the first and most popular methods that uses the FCS approach is Gene Set Enrichment Analysis (GSEA).[17]
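
The idea of scoring a gene set against a full ranked list can be sketched with an unweighted running-sum statistic of the Kolmogorov-Smirnov type. This is a simplified illustration and not the published GSEA procedure, which additionally weights hits by their gene-level statistic and estimates significance by permutation; the function name and the toy gene list are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Signed maximum deviation of a running sum that steps up at genes in
    the set and down at genes outside it, walking down the ranked list."""
    members = set(gene_set) & set(ranked_genes)
    n_hits = len(members)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        running += 1.0 / n_hits if gene in members else -1.0 / n_misses
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical list, ordered from most up-regulated to most down-regulated.
ranked_genes = ["a", "b", "c", "d", "e", "f"]

es_top = enrichment_score(ranked_genes, {"a", "b"})     # set at the top
es_bottom = enrichment_score(ranked_genes, {"e", "f"})  # set at the bottom
print(es_top, es_bottom)
```

A positive score indicates that the set is concentrated near the top of the ranking and a negative score that it is concentrated near the bottom. No significance cut-off on individual genes is needed.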

Pathway Topology (PT)

Pathway topology analysis is essentially the same as FCS, except that PT computes gene-level statistics by integrating information from different databases.[18] The critical difference, however, is that by using information about the role, position, and direction of interactions from the pathway database, PT can re-score the significance of a pathway as the linkages change, whereas FCS will always give the same score.[19] Examples of PT approaches include Signaling Pathway Impact Analysis (SPIA),[20][21] EnrichNet,[22] GGEA,[23] and TopoGSA.[24]
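
The dependence of a topology-based score on the linkages can be shown with a small sketch loosely modelled on the perturbation-factor idea used in SPIA: each gene's own expression change is added to the signed perturbation it receives from its upstream genes, divided by the number of downstream targets of each upstream gene. This is a simplified illustration under the assumption of an acyclic pathway, with hypothetical function names, gene names and values, and without the significance testing that a real method performs.

```python
def perturbation_factors(edges, delta):
    """edges: (upstream, downstream, sign) triples, sign +1 for activation
    and -1 for inhibition. delta: gene -> measured expression change.
    Assumes the pathway graph has no cycles."""
    genes = set(delta)
    for upstream, downstream, _ in edges:
        genes.update((upstream, downstream))
    incoming = {g: [] for g in genes}
    out_degree = {g: 0 for g in genes}
    for upstream, downstream, sign in edges:
        incoming[downstream].append((upstream, sign))
        out_degree[upstream] += 1

    cache = {}

    def pf(gene):
        # Own change plus the share of each upstream gene's perturbation.
        if gene not in cache:
            cache[gene] = delta.get(gene, 0.0) + sum(
                sign * pf(up) / out_degree[up] for up, sign in incoming[gene]
            )
        return cache[gene]

    return {g: pf(g) for g in genes}


def total_accumulation(edges, delta):
    """Sum over genes of the perturbation received from upstream genes."""
    factors = perturbation_factors(edges, delta)
    return sum(factors[g] - delta.get(g, 0.0) for g in factors)


# Same genes and same expression changes, two hypothetical wirings.
delta = {"R": 2.0, "K": 0.0, "T": 0.0}
chain = [("R", "K", 1), ("K", "T", 1)]  # R activates K, K activates T
fork = [("R", "K", 1), ("R", "T", 1)]   # R activates both K and T

chain_score = total_accumulation(chain, delta)
fork_score = total_accumulation(fork, delta)
print(chain_score, fork_score)
```

Both wirings contain the same genes with the same expression changes, so a set-based score would be identical for them, while the topology-aware score differs.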

Notable companies

Several companies offer licensed software that applies a number of analytic methods to gene sets. Most free software solutions provide only links to online pathway collections, whereas commercial ones have their own collections. The choice of the best software depends on the user's skills, the cost, and the time that can be spent on pathway analysis.[25] Ingenuity, for example, charges a fee for the use of its software, but it maintains a knowledge base against which gene expression data can be compared.[26] Some software, such as STRING or Cytoscape, is open-source. Pathway Studio[27] is commercial software that allows users to search for biologically relevant facts, analyze experiments and create pathways. Pathway Studio Viewer[28] is a free resource from the same company for exploring the interactive Pathway Studio pathway collection and database. Only two commercial applications are known to offer pathway topology (PT) based analyses: PathwayGuide from Advaita Corporation and MetaCore from Thomson Reuters.[29] Advaita uses the peer-reviewed Signaling Pathway Impact Analysis (SPIA) method,[30][31] while the MetaCore method is unpublished.[32]

Limits

Missing annotations on cell types and conditions

Many current methods for pathway analysis depend on existing databases. The data used, however, are not always completely annotated. Many gene interactions in databases are relatively speculative: although they are based on scientific findings, they are drawn from a specific cell type or disease. Also, most canonical pathways are built using knowledge obtained from a limited number of experiments with narrow cell models. Therefore, the results of pathway analysis of omics data obtained from different tissues should be interpreted with caution.[33]

References

  1. ^ Berg J. M., Tymoczko J. L., Stryer L. Biochemistry, 5th edition, New York: W. H. Freeman; 2002
  2. ^ García-Campos, Miguel Angel; Espinal-Enríquez, Jesús; Hernández-Lemus, Enrique (2015). "Pathway analysis: State of the art". Frontiers in Physiology. 6: 383. doi:10.3389/fphys.2015.00383. PMC 4681784. PMID 26733877.
  3. ^ GSEA
  4. ^ Walsh, Christopher; Hu, Pingzhao; Batt, Jane; Santos, Claudia (2015). "Microarray Meta-Analysis and Cross-Platform Normalization: Integrative Genomics for Robust Biomarker Discovery". Microarrays. 4 (3): 389–406. doi:10.3390/microarrays4030389. PMC 4996376. PMID 27600230.
  5. ^ Subramanian, Aravind; Tamayo, Pablo; Mootha, Vamsi K.; Mukherjee, Sayan; Ebert, Benjamin L.; Gillette, Michael A.; Paulovich, Amanda; et al. (2005). "Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles". Proceedings of the National Academy of Sciences of the United States of America. 102 (43): 15545–50. doi:10.1073/pnas.0506580102. PMC 1239896. PMID 16199517.
  6. ^ Kotelnikova, Ekaterina; Shkrob, Maria A.; Pyatnitskiy, Mikhail A.; Ferlini, Alessandra; Daraselia, Nikolai (2012). "Novel Approach to Meta-Analysis of Microarray Datasets Reveals Muscle Remodeling-Related Drug Targets and Biomarkers in Duchenne Muscular Dystrophy". PLoS Computational Biology. 8 (2): e1002365. doi:10.1371/journal.pcbi.1002365. PMC 3271016. PMID 22319435.
  7. ^ Santiago, Jose A.; Potashkin, Judith A. (2015). "Network-Based Metaanalysis Identifies HNF4A and PTBP1 as Longitudinally Dynamic Biomarkers for Parkinson's Disease". Proceedings of the National Academy of Sciences of the United States of America. 112 (7): 2257–62. doi:10.1073/pnas.1423573112. PMC 4343174. PMID 25646437.
  8. ^ Ogata, H.; Goto, S.; Sato, K.; Fujibuchi, W.; Bono, H.; Kanehisa, M. (1999). "KEGG: Kyoto Encyclopedia of Genes and Genomes". Nucleic Acids Research. 27 (1): 29–34. doi:10.1093/nar/27.1.29. PMC 148090. PMID 9847135.
  9. ^ Vastrik, Imre; D'Eustachio, Peter; Schmidt, Esther; Joshi-Tope, Geeta; Gopinath, Gopal; Croft, David; de Bono, Bernard; et al. (2007). "Reactome: A Knowledge Base of Biologic Pathways and Processes". Genome Biology. 8 (3): R39. doi:10.1186/gb-2007-8-3-r39. PMC 1868929. PMID 17367534.
  10. ^ Pathway Studio Pathways
  11. ^ Pathway Central
  12. ^ Gentleman, R. C.; Carey, V. J.; Bates, D. M.; Bolstad, B.; Dettling, M.; Dudoit, S.; et al. (2004). "Bioconductor: open software development for computational biology and bioinformatics". Genome Biol. 5 (10): R80. doi:10.1186/gb-2004-5-10-r80. PMC 545600. PMID 15461798.
  13. ^ Dabbish, L., Stuart, C., Tsay, J., and Herbsleb, J. (2012). "Social coding in github: transparency and collaboration in an open software repository," in Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work (New York, NY: ACM), 1277–1286
  14. ^ Khatri P., Sirota M., Butte A. J. Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol. 2012;8(2)
  15. ^ Henderson-Maclennan NK, Papp JC, Talbot CC, McCabe ERB, Presson AP. Pathway analysis software: annotation errors and solutions. Mol Genet Metab. 2010 Nov;101(2–3):134–40
  16. ^ Khatri P., Sirota M., Butte A. J. Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol. 2012;8(2)
  17. ^ Subramanian, Aravind; Tamayo, Pablo; Mootha, Vamsi K.; Mukherjee, Sayan; Ebert, Benjamin L.; Gillette, Michael A.; Paulovich, Amanda; et al. (2005). "Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles". Proceedings of the National Academy of Sciences of the United States of America. 102 (43): 15545–50. doi:10.1073/pnas.0506580102. PMC 1239896. PMID 16199517.
  18. ^ Emmert-Streib, F.; Dehmer, M. (2011). "Networks for systems biology: conceptual connection of data and function". Syst. Biol. IET. 5 (3): 185–207. doi:10.1049/iet-syb.2010.0025. PMID 21639592.
  19. ^ Khatri, Purvesh; Sirota, Marina; Butte, Atul J.; Ouzounis, Christos A. (23 February 2012). "Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges". PLoS Computational Biology. 8 (2): e1002375. doi:10.1371/journal.pcbi.1002375. PMC 3285573. PMID 22383865.
  20. ^ Draghici, S.; Khatri, P.; Tarca, A. L.; Amin, K.; Done, A.; Voichita, C.; Georgescu, C.; Romero, R. (4 September 2007). "A systems biology approach for pathway level analysis". Genome Research. 17 (10): 1537–1545. doi:10.1101/gr.6202607. PMC 1987343. PMID 17785539.
  21. ^ Tarca, A. L.; Draghici, S.; Khatri, P.; Hassan, S. S.; Mittal, P.; Kim, J.-s.; Kim, C. J.; Kusanovic, J. P.; Romero, R. (5 November 2008). "A novel signaling pathway impact analysis". Bioinformatics. 25 (1): 75–82. doi:10.1093/bioinformatics/btn577. PMC 2732297. PMID 18990722.
  22. ^ Glaab, E.; Baudot, A.; Krasnogor, N.; Schneider, R. S.; Valencia, A. (15 September 2012). "EnrichNet: Network-based gene set enrichment analysis". Bioinformatics. 28 (18): i451–i457. doi:10.1093/bioinformatics/bts389. PMC 3436816. PMID 22962466.
  23. ^ Geistlinger, L.; Csaba, G.; Küffner, R.; Mulder, N.; Zimmer, R. (2011). "From sets to graphs: Towards a realistic enrichment analysis of transcriptomic systems". Bioinformatics. 27 (13): i366–i373. doi:10.1093/bioinformatics/btr228. PMC 3117393. PMID 21685094.
  24. ^ Glaab, E.; Baudot, A.; Krasnogor, N.; Valencia, A. (2012). "TopoGSA: Network topological gene set analysis". Bioinformatics. 26 (18): 1271–1272. doi:10.1093/bioinformatics/btq131. PMC 2859135. PMID 20335277.
  25. ^ García-Campos, Miguel Angel; Espinal-Enríquez, Jesús; Hernández-Lemus, Enrique (2015). "Pathway analysis: State of the art". Frontiers in Physiology. 6: 383. doi:10.3389/fphys.2015.00383. PMC 4681784. PMID 26733877.
  26. ^ "Ingenuity IPA - Integrate and Understand Complex 'omics Data." Ingenuity. Web. 8 Apr. 2015. <http://www.ingenuity.com/products/ipa#/?tab=features>.
  27. ^ Pathway Studio
  28. ^ Pathway Studio Viewer
  29. ^ Mitrea, Cristina; Taghavi, Zeinab; Bokanizad, Behzad; Hanoudi, Samer; Tagett, Rebecca; Donato, Michele; Voichiţa, Călin; Drăghici, Sorin (2013). "Methods and approaches in the topology-based analysis of biological pathways". Frontiers in Physiology. 4: 278. doi:10.3389/fphys.2013.00278. PMC 3794382. PMID 24133454.
  30. ^ Draghici, S.; Khatri, P.; Tarca, A. L.; Amin, K.; Done, A.; Voichita, C.; Georgescu, C.; Romero, R. (4 September 2007). "A systems biology approach for pathway level analysis". Genome Research. 17 (10): 1537–1545. doi:10.1101/gr.6202607. PMC 1987343. PMID 17785539.
  31. ^ Tarca, A. L.; Draghici, S.; Khatri, P.; Hassan, S. S.; Mittal, P.; Kim, J.-s.; Kim, C. J.; Kusanovic, J. P.; Romero, R. (5 November 2008). "A novel signaling pathway impact analysis". Bioinformatics. 25 (1): 75–82. doi:10.1093/bioinformatics/btn577. PMC 2732297. PMID 18990722.
  32. ^ Mitrea, Cristina; Taghavi, Zeinab; Bokanizad, Behzad; Hanoudi, Samer; Tagett, Rebecca; Donato, Michele; Voichiţa, Călin; Drăghici, Sorin (2013). "Methods and approaches in the topology-based analysis of biological pathways". Frontiers in Physiology. 4: 278. doi:10.3389/fphys.2013.00278. PMC 3794382. PMID 24133454.
  33. ^ Henderson-Maclennan, Nicole K., Jeanette C. Papp, C. Conover Talbot, Edward R. B. McCabe, and Angela P. Presson. "Pathway Analysis Software: Annotation Errors and Solutions." Molecular Genetics and Metabolism (2010): 134–40. PMC. Web. 8 April 2015.