Biological network inference: Difference between revisions

From Wikipedia, the free encyclopedia
Monkbot (talk | contribs)
m Task 18 (cosmetic): eval 12 templates: del empty params (1×);
Added three different sections; network analysis tools, network analysis methods, network attributes and expanded the section detailing types of networks.
'''Biological network inference''' is the process of making [[inference]]s and predictions about [[biological network]]s.<ref name="MercatelliScalambra2020">{{cite journal|last1=Mercatelli|first1=Daniele|last2=Scalambra|first2=Laura|last3=Triboli|first3=Luca|last4=Ray|first4=Forest|last5=Giorgi|first5=Federico M.|title=Gene regulatory network inference resources: A practical overview|journal=Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms|volume=1863|issue=6|year=2020|pages=194430|issn=18749399|doi=10.1016/j.bbagrm.2019.194430|pmid=31678629}}</ref> By using networks to analyze patterns in biological systems, such as food-webs, we can visualize the nature and strength of interactions between species, DNA, proteins, and more.


The analysis of biological networks with respect to diseases has led to the development of the field of [[network medicine]].<ref>{{Cite journal |last=Barabási |first=Albert-László |last2=Gulbahce |first2=Natali |last3=Loscalzo |first3=Joseph |date=2011-01 |title=Network medicine: a network-based approach to human disease |url=https://www.nature.com/articles/nrg2918 |journal=Nature Reviews Genetics |language=en |volume=12 |issue=1 |pages=56–68 |doi=10.1038/nrg2918 |issn=1471-0064}}</ref> Recent applications of network theory in biology include understanding the [[cell cycle]]<ref>{{Cite journal |last=Jailkhani |first=Noor |last2=Ravichandran |first2=Srikanth |last3=Hegde |first3=Shubhada R. |last4=Siddiqui |first4=Zaved |last5=Mande |first5=Shekhar C. |last6=Rao |first6=Kanury V. S. |date=2011-12-01 |title=Delineation of key regulatory elements identifies points of vulnerability in the mitogen-activated signaling network |url=https://genome.cshlp.org/content/21/12/2067 |journal=Genome Research |language=en |volume=21 |issue=12 |pages=2067–2081 |doi=10.1101/gr.116145.110 |issn=1088-9051 |pmid=21865350}}</ref> and providing a quantitative framework for developmental processes.<ref>{{Cite journal |last=Jackson |first=Matthew D. B. |last2=Duran-Nebreda |first2=Salva |last3=Bassel |first3=George W. |date=2017-10-31 |title=Network-based approaches to quantify multicellular development |url=https://royalsocietypublishing.org/doi/10.1098/rsif.2017.0484 |journal=Journal of The Royal Society Interface |volume=14 |issue=135 |pages=20170484 |doi=10.1098/rsif.2017.0484 |pmc=PMC5665831 |pmid=29021161}}</ref> Good network inference requires proper planning and execution of an experiment to ensure quality data acquisition. Optimal experimental design, in principle, refers to the use of statistical and/or mathematical concepts to plan for data acquisition. This must be done in such a way that the information content of the data is enriched and a sufficient amount of data is collected, with enough technical and biological replicates where necessary.<ref name=":0">{{Cite journal |last=Omony |first=Jimmy |date=2014-01-10 |title=Biological Network Inference: A Review of Methods and Assessment of Tools and Techniques |url=https://www.journalarrb.com/index.php/ARRB/article/view/25052 |journal=Annual Research & Review in Biology |volume=4 |issue=4 |pages=577–601 |doi=10.9734/ARRB/2014/5718}}</ref>

The general cycle for modeling biological networks is as follows:<ref name=":0" />

# Prior knowledge
#* Involves a thorough literature and database search or seeking an expert’s opinion.
# Model selection
#* Choose a formalism to model the system, usually an [[ordinary differential equation]], a [[boolean network]], a [[Linear regression]] model (e.g. [[Least-angle regression]]), a [[Bayesian network]], or an approach based on [[Information theory]].<ref>{{cite journal |vauthors=van Someren EP, Wessels LF, Backer E, Reinders MJ |date=July 2002 |title=Genetic network modeling |journal=Pharmacogenomics |volume=3 |issue=4 |pages=507–25 |doi=10.1517/14622416.3.4.507 |pmid=12164774}}</ref><ref>{{Cite journal |last1=Banf |first1=Michael |last2=Rhee |first2=Seung Y. |date=January 2017 |title=Computational inference of gene regulatory networks: Approaches, limitations and opportunities |journal=Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms |volume=1860 |issue=1 |pages=41–52 |doi=10.1016/j.bbagrm.2016.09.003 |issn=1874-9399 |pmid=27641093 |doi-access=free}}</ref> Modelling can also be done by applying a correlation-based inference algorithm, as discussed below, an approach that is having increased success as the size of the available microarray sets keeps increasing.<ref name="Marbach2012" /><ref>{{cite journal |vauthors=Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS |date=January 2007 |title=Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles |journal=PLOS Biology |volume=5 |issue=1 |pages=e8 |doi=10.1371/journal.pbio.0050008 |pmc=1764438 |pmid=17214507}}</ref>
# Hypothesis/assumptions
# Experimental design
# Data acquisition
#* Ensure that high-quality data is collected, with all the required variables being measured.
# Network inference
#* This process is mathematically rigorous and computationally costly.
# Model refinement
#* Cross-check how well the results meet the expectations. The process is terminated upon obtaining a good model fit to the data; otherwise, the model needs to be re-adjusted.

==Biological networks==
A network is a set of nodes and a set of directed or undirected edges between the nodes. Many types of biological networks exist, including transcriptional, signalling and metabolic. Few such networks are known in anything approaching their complete structure, even in the simplest [[bacteria]]. Still less is known about the parameters governing the behavior of such networks over time, how the networks at different levels in a cell interact, and how to predict the complete state description of a [[eukaryote|eukaryotic]] cell or bacterial organism at a given point in the future. [[Systems biology]], in this sense, is still in its infancy.


There is great interest in [[network medicine]] for [[modelling biological systems]]. This article focuses on a necessary prerequisite to dynamic modeling of a network: inference of the [[topology]], that is, prediction of the "wiring diagram" of the network. More specifically, we focus here on inference of biological network structure using the growing sets of high-throughput expression data for [[gene]]s, [[protein]]s, and [[metabolism|metabolites]].<ref>{{cite journal | vauthors = Tieri P, Farina L, Petti M, Astolfi L, Paci P, Castiglione F | title = Network Inference and Reconstruction in Bioinformatics | journal = Encyclopedia of Bioinformatics and Computational Biology | volume = 2 | pages = 805–813 | date = 2018 | doi = 10.1016/B978-0-12-809633-8.20290-2| isbn = 9780128114322 }}</ref> Briefly, methods using high-throughput data for inference of regulatory networks rely on searching for patterns of partial correlation or conditional probabilities that indicate causal influence.<ref name="Marbach2012">{{cite journal | vauthors = Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, Allison KR, Kellis M, Collins JJ, Stolovitzky G | title = Wisdom of crowds for robust gene network inference | journal = Nature Methods | volume = 9 | issue = 8 | pages = 796–804 | date = August 2012 | pmid = 22796662 | pmc = 3512113 | doi = 10.1038/nmeth.2016 }}</ref><ref>{{cite book | vauthors = Spirtes P, Glymour C, Scheines R | year = 2000 | title = Causation, Prediction, and Search: Adaptive Computation and Machine Learning | edition = 2nd | publisher = [[MIT Press]] }}</ref> Such patterns of partial correlations found in the high-throughput data, possibly combined with other supplemental data on the genes or proteins in the proposed networks, or combined with other information on the organism, form the basis upon which such [[algorithm]]s work. Such algorithms can be of use in inferring the topology of any network where the change in state of one [[Node (networking)|node]] can affect the state of other nodes.

=== Transcriptional regulatory networks ===
Genes are the nodes and the edges are directed. A gene serves as the source of a direct regulatory edge to a target gene by producing an [[RNA]] or protein molecule that functions as a transcriptional activator or inhibitor of the target gene. If the gene is an activator, then it is the source of a positive regulatory connection; if an inhibitor, then it is the source of a negative regulatory connection. Computational algorithms take as primary input data measurements of [[mRNA]] expression levels of the genes under consideration for inclusion in the network, returning an estimate of the network [[topology]]. Such algorithms are typically based on linearity, independence or normality assumptions, which must be verified on a case-by-case basis.<ref>{{cite journal | vauthors = Oates CJ, Mukherjee S | title = Network Inference and Biological Dynamics | journal = The Annals of Applied Statistics | volume = 6 | issue = 3 | pages = 1209–1235 | date = September 2012 | pmid = 23284600 | doi = 10.1214/11-AOAS532 | arxiv = 1112.1047 | pmc=3533376}}</ref> Clustering or some form of statistical classification is typically employed to perform an initial organization of the high-throughput mRNA expression values derived from microarray experiments, in particular to select sets of genes as candidates for network nodes.<ref>{{cite journal | vauthors = Guthke R, Möller U, Hoffmann M, Thies F, Töpfer S | title = Dynamic network reconstruction from gene expression data applied to immune response during bacterial infection | journal = Bioinformatics | volume = 21 | issue = 8 | pages = 1626–34 | date = April 2005 | pmid = 15613398 | doi = 10.1093/bioinformatics/bti226 | doi-access = free }}</ref> The question then arises: how can the clustering or classification results be connected to the underlying biology? Such results can be useful for pattern classification – for example, to classify subtypes of [[cancer]], or to predict differential responses to a [[drug]] (pharmacogenomics). 
But to understand the relationships between the genes, that is, to more precisely define the influence of each gene on the others, the scientist typically attempts to reconstruct the transcriptional regulatory network.

=== Gene Co-Expression Networks ===
Main Article: [[Gene co-expression network|Gene Co-Expression Network]]

A gene co-expression network (GCN) is an [[Graph (discrete mathematics)#Undirected graph|undirected graph]], where each node corresponds to a [[gene]], and a pair of nodes is connected with an edge if there is a significant [[Gene expression|co-expression]] relationship between them.
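
As a rough illustration of this definition, the sketch below builds an undirected edge set by thresholding the absolute Pearson correlation between expression profiles. The gene names, expression values, and the 0.9 cut-off are all hypothetical, and a fixed threshold stands in for a proper significance test, which a real analysis would need.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """Return undirected edges between genes whose |r| meets the threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(expression), 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


# Hypothetical expression values for three genes across five samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = coexpression_edges(expression)
```

Here only geneA and geneB rise and fall together, so they are the only pair joined by an edge. Because the edges are undirected, each pair is stored once, in sorted order.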

=== Signal transduction ===
Main Article: [[Signal transduction]]

Signal transduction networks use proteins for the nodes and directed edges to represent interactions in which the biochemical conformation of the child is modified by the action of the parent (e.g. mediated by [[phosphorylation]], ubiquitylation, methylation, etc.). Such networks are very important in the biology of cancer. The primary input into the inference algorithm would be data from a set of experiments measuring protein activation / inactivation (e.g., phosphorylation / dephosphorylation) across a set of proteins. Inference for such signalling networks is complicated by the fact that total concentrations of signalling proteins will fluctuate over time due to transcriptional and translational regulation. Such variation can lead to statistical [[confounding]]. Accordingly, more sophisticated statistical techniques must be applied to analyse such datasets.<ref>{{cite journal | vauthors = Oates CJ, Mukherjee S | year = 2012 | title = Structural inference using nonlinear dynamics | journal = CRiSM Working Paper | volume = 12 | issue = 7 }}</ref>

=== Metabolic Network ===
Main Article: [[Metabolic network|Metabolic Network]]

[[Metabolite]] networks use nodes to represent chemical reactions and directed edges for the [[Metabolic pathway|metabolic pathways]] and regulatory interactions that guide these reactions. Primary input into an algorithm would be data from a set of experiments measuring metabolite levels.

=== Protein-Protein Interaction Networks ===
Main Article: [[Interactome]]

''Protein-protein interaction networks'' (PINs), among the most intensely studied networks in biology, represent the physical relationships between proteins inside a cell. In a PIN, proteins are the nodes and their interactions are the undirected edges. PINs can be discovered with a variety of methods, including [[Two-hybrid screening|two-hybrid screening]] and ''in vitro'' methods such as [[co-immunoprecipitation]],<ref>{{Citation |last=Isono |first=Erika |title=Co-immunoprecipitation and Protein Blots |date=2010 |url=https://doi.org/10.1007/978-1-60761-765-5_25 |work=Plant Developmental Biology: Methods and Protocols |pages=377–387 |editor-last=Hennig |editor-first=Lars |series=Methods in Molecular Biology |place=Totowa, NJ |publisher=Humana Press |language=en |doi=10.1007/978-1-60761-765-5_25 |isbn=978-1-60761-765-5 |access-date=2022-05-03 |last2=Schwechheimer |first2=Claus |editor2-last=Köhler |editor2-first=Claudia}}</ref> blue native gel electrophoresis,<ref>{{Cite journal |last=Wittig |first=Ilka |last2=Braun |first2=Hans-Peter |last3=Schägger |first3=Hermann |date=2006-06 |title=Blue native PAGE |url=https://www.nature.com/articles/nprot.2006.62 |journal=Nature Protocols |language=en |volume=1 |issue=1 |pages=418–428 |doi=10.1038/nprot.2006.62 |issn=1750-2799}}</ref> and more.<ref>{{Cite journal |last=Miernyk |first=Jan A. |last2=Thelen |first2=Jay J. |date=2008-02-04 |title=Biochemical approaches for discovering protein-protein interactions: Biochemical approaches for discovering protein-protein interactions |url=https://onlinelibrary.wiley.com/doi/10.1111/j.1365-313X.2007.03316.x |journal=The Plant Journal |language=en |volume=53 |issue=4 |pages=597–609 |doi=10.1111/j.1365-313X.2007.03316.x}}</ref>

=== Neuronal Network ===
Main Article: [[Neural network|Neural Network]]

In a neuronal network, each node represents a neuron and each edge represents a synapse; the edges are typically weighted and directed. The weights of the edges are usually adjusted by the activation of the connected nodes. The network is usually organized into input layers, hidden layers, and output layers.

=== Food Webs ===
Main Article: [[Food web|Food Web]]

A food web is an interconnected directed graph of what eats what in an ecosystem. The members of the ecosystem are the nodes, and if one member eats another, there is a directed edge between those two nodes.

=== Within Species and Between Species Interaction Networks ===
These networks are defined by a set of pairwise interactions between and within a species that is used to understand the structure and function of larger [[Ecological network|ecological networks]].<ref>{{Cite journal |last=Bascompte |first=Jordi |date=2009-07-24 |title=Disentangling the Web of Life |url=https://www.sciencemag.org/lookup/doi/10.1126/science.1170749 |journal=Science |language=en |volume=325 |issue=5939 |pages=416–419 |doi=10.1126/science.1170749 |issn=0036-8075}}</ref> By using [[Social network analysis|network analysis]] we can discover and understand how these interactions link together within the system's network. It also allows us to quantify associations between individuals, which makes it possible to infer details about the network as a whole at the species and/or population level.<ref>{{Cite journal |last=Croft |first=Darren P. |last2=Krause |first2=Jens |last3=James |first3=Richard |date=2004-12-07 |title=Social networks in the guppy (Poecilia reticulata) |url=https://royalsocietypublishing.org/doi/10.1098/rsbl.2004.0206 |journal=Proceedings of the Royal Society of London. Series B: Biological Sciences |volume=271 |issue=suppl_6 |pages=S516–S519 |doi=10.1098/rsbl.2004.0206 |pmc=PMC1810091 |pmid=15801620}}</ref>

=== DNA-DNA Chromatin Networks ===
Main Article: [[Chromatin]]

DNA-DNA chromatin networks are used to clarify the activation or suppression of genes via the relative location of strands of [[chromatin]]. These interactions can be understood by analyzing commonalities amongst different [[Locus (genetics)|loci]], the fixed positions on a [[chromosome]] where particular genes or [[genetic marker]]s are located. Network analysis can provide vital support in understanding relationships among different areas of the genome.

=== Gene Regulatory Networks ===
Main Article: [[Gene regulatory network]]

A gene regulatory network (GRN) is a set of molecular regulators that interact with each other and with other substances in the cell. The regulators can be [[DNA]], [[RNA]], [[protein]]s, or complexes of these. GRNs can be modeled in numerous ways, including coupled ordinary differential equations, Boolean networks, continuous networks, and stochastic gene networks.
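
To make the Boolean-network option concrete, here is a minimal sketch of a synchronous Boolean model. The three-gene circuit and its update rules are invented for illustration and do not describe any real regulatory system.

```python
def step(state, rules):
    """Synchronously update every gene from the current state."""
    return {gene: rule(state) for gene, rule in rules.items()}


def trajectory(state, rules, n_steps):
    """Return the list of states visited, starting with the initial one."""
    states = [state]
    for _ in range(n_steps):
        state = step(state, rules)
        states.append(state)
    return states


# Hypothetical circuit: A activates B, B activates C, C represses A.
rules = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"],
}

start = {"A": True, "B": False, "C": False}
path = trajectory(start, rules, 6)
```

With these rules the negative feedback loop produces an oscillation: the system returns to its starting state after six updates. Asynchronous or probabilistic update schemes are also used in practice and can give different dynamics.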

== Network Attributes ==

=== Data Sources ===
The initial data used to make the inference can have a huge impact on the accuracy of the final inference. Network data is inherently noisy and incomplete, sometimes because evidence from multiple sources does not overlap or is contradictory. Data can be sourced in multiple ways, including manual curation of scientific literature into databases, high-throughput datasets, computational predictions, and text mining of older scholarly articles from before the digital era.

=== Network Diameter ===
A network's diameter is the maximum number of steps separating any two nodes, that is, the length of the longest shortest path between a pair of nodes. It can be used to determine how connected a graph is, and it is used in topology analysis and clustering analysis.
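
A minimal sketch of computing the diameter, assuming a connected, unweighted, undirected graph stored as a dictionary of neighbour sets (the node names are hypothetical). It runs a breadth-first search from every node and keeps the largest distance found.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Number of steps from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def diameter(adjacency):
    """Longest shortest path; assumes the graph is connected."""
    return max(
        max(bfs_distances(adjacency, node).values()) for node in adjacency
    )


# Hypothetical network: chain a - b - c - d with an extra branch c - e.
adjacency = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d", "e"},
    "d": {"c"},
    "e": {"c"},
}
network_diameter = diameter(adjacency)
```

In this example the farthest pairs are a and d, and a and e, each three steps apart. For a disconnected graph the function would silently report the diameter of the largest reachable set per source, so connectivity should be checked first.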

=== Transitivity ===
The transitivity or [[clustering coefficient]] of a network is a measure of the tendency of the nodes to cluster together. High transitivity means that the network contains communities or groups of nodes that are densely connected internally. In biological networks, finding these communities is very important, because they can reflect functional modules and protein complexes.<ref>{{Cite journal |last=Hsia |first=Ching-Wu |last2=Ho |first2=Ming-Yi |last3=Shui |first3=Hao-Ai |last4=Tsai |first4=Chong-Bin |last5=Tseng |first5=Min-Jen |date=2015-02-01 |title=Analysis of dermal papilla cell interactome using STRING database to profile the ex vivo hair growth inhibition effect of a vinca alkaloid drug, colchicine |url=https://europepmc.org/articles/PMC4346914 |journal=International journal of molecular sciences |volume=16 |issue=2 |pages=3579–3598 |doi=10.3390/ijms16023579 |issn=1422-0067 |pmc=4346914 |pmid=25664862}}</ref>
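
One common way to quantify this is the global transitivity: the fraction of connected triples (paths of two edges) that are closed into triangles. The sketch below computes it for a small hypothetical undirected graph; other definitions, such as the average of per-node clustering coefficients, give different values.

```python
from itertools import combinations


def transitivity(adjacency):
    """Closed connected triples divided by all connected triples."""
    closed = 0
    triples = 0
    for node, neighbours in adjacency.items():
        for u, v in combinations(sorted(neighbours), 2):
            triples += 1            # u - node - v is a triple centred on node
            if v in adjacency[u]:
                closed += 1         # u and v are also linked: a triangle
    return closed / triples if triples else 0.0


# Hypothetical graph: triangle a-b-c with a pendant node d attached to c.
adjacency = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
t = transitivity(adjacency)
```

The triangle contributes three closed triples, and node c is also the centre of two open triples involving d, giving 3/5 = 0.6.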

=== Network Confidence ===
Network confidence is a measure of how sure one can be that the network represents a real biological interaction. It can be assessed by using contextual biological information, by counting the number of times an interaction is reported in the literature, or by grouping different strategies into a single score. The [https://www.ebi.ac.uk/training/online/courses/network-analysis-of-protein-interaction-data-an-introduction/building-and-analysing-ppins/assessing-reliability-and-measuring-confidence/ MIscore] method for assessing the reliability of protein-protein interaction data is based on the use of standards.<ref>{{Cite journal |last=Villaveces |first=J M |last2=Jiménez |first2=R C |last3=Porras |first3=P |last4=Del-Toro |first4=N |last5=Duesbury |first5=M |last6=Dumousseau |first6=M |last7=Orchard |first7=S |last8=Choi |first8=H |last9=Ping |first9=P |last10=Zong |first10=N C |last11=Askenazi |first11=M |date=2015-01-01 |title=Merging and scoring molecular interactions utilising existing community standards: tools, use-cases and a case study |url=https://europepmc.org/articles/PMC4316181 |journal=Database |volume=2015 |doi=10.1093/database/bau131 |issn=1758-0463 |pmc=4316181 |pmid=25652942}}</ref> MIscore gives an estimation of confidence weighting on all available evidence for an interacting pair of proteins. The method allows weighting of evidence provided by different sources, provided the data is represented following the standards created by the IMEx consortium. The weights are the number of publications, the detection method, and the interaction evidence type.

=== Closeness ===
Main Article: [[Closeness centrality|Closeness]]

Closeness, a.k.a. closeness centrality, is a measure of centrality in a network and is calculated as the reciprocal of the sum of the lengths of the shortest paths between the node and all other nodes in the graph. This measure can be used to make inferences in all graph types and analysis methods.
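
The definition translates directly into code. The sketch below assumes a connected, unweighted, undirected graph (so path length is the number of edges) and uses hypothetical node names; normalised variants that multiply by the number of other nodes are also common.

```python
from collections import deque


def closeness(adjacency, node):
    """Reciprocal of the summed shortest-path lengths from node."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    total = sum(dist.values())
    return 1.0 / total if total else 0.0


# Hypothetical chain a - b - c.
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
scores = {node: closeness(adjacency, node) for node in adjacency}
```

The middle node b is one step from both others, so it scores 1/2, while each end node scores 1/3.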

=== Betweenness ===
Main Article: [[Betweenness centrality|Betweenness]]

Betweenness, a.k.a. betweenness centrality, is a measure of centrality in a graph based on shortest paths. The betweenness of a node is the number of shortest paths between pairs of other nodes that pass through that node.
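
The sketch below computes betweenness for an unweighted, undirected graph using the accumulation scheme of Brandes' algorithm, which is a choice made here for illustration. When several shortest paths connect a pair of nodes, this version gives each intermediate node the fraction of those paths it lies on, which reduces to a plain count when shortest paths are unique. The example graph is hypothetical.

```python
from collections import deque


def betweenness(adjacency):
    """Betweenness of every node in an unweighted, undirected graph."""
    scores = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search: count shortest paths and record predecessors.
        sigma = {node: 0 for node in adjacency}
        dist = {node: -1 for node in adjacency}
        preds = {node: [] for node in adjacency}
        sigma[source] = 1
        dist[source] = 0
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = {node: 0.0 for node in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Every unordered pair was visited from both of its endpoints.
    return {node: score / 2 for node, score in scores.items()}


# Hypothetical chain a - b - c - d.
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
result = betweenness(adjacency)
```

In the chain, b lies on the shortest paths a to c and a to d, and c lies on a to d and b to d, so both score 2 while the end nodes score 0.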

== Network Analysis Methods ==
Main Article: [[Network theory|Network Theory]]

For our purposes, network analysis is closely related to [[graph theory]]. By measuring the attributes in the previous section we can utilize many different techniques to create accurate inferences based on biological data.

=== Topology Analysis ===
Topology analysis examines the topology of a network to identify relevant participants and substructures that may be of biological significance. The term encompasses an entire class of techniques such as [[network motif]] search, centrality analysis, topological clustering, and shortest paths. These are but a few examples; each of these techniques uses the general idea of focusing on the topology of a network to make inferences.

==== Network Motif Search ====
A motif is defined as a frequent and unique sub-graph. By counting all the possible instances, listing all patterns, and testing isomorphisms, we can derive crucial information about a network. Motifs are suggested to be the basic building blocks of complex biological networks. Computational research has focused on improving existing motif detection tools to assist biological investigations and allow larger networks to be analyzed. Several different algorithms have been provided so far, which are elaborated in the next section.
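
As a small illustration of the counting step only, the sketch below enumerates one well-known three-node pattern, the feed-forward loop (A regulates B, B regulates C, and A also regulates C), in a hypothetical directed network. Deciding whether a pattern is actually a motif would additionally require comparing its count against randomized networks, which is not shown here.

```python
from itertools import permutations


def count_feed_forward_loops(edges):
    """Count ordered node triples (a, b, c) with a->b, b->c and a->c."""
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    count = 0
    for a, b, c in permutations(sorted(nodes), 3):
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set:
            count += 1
    return count


# Hypothetical regulatory edges (source, target).
edges = [("x", "y"), ("y", "z"), ("x", "z"), ("z", "w")]
ffl_count = count_feed_forward_loops(edges)
```

This brute-force enumeration checks every ordered triple of nodes, so it is only practical for small networks; dedicated motif detection tools use far more efficient sampling or enumeration strategies.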

==== Centrality Analysis ====
Centrality gives an estimation of how important a node or edge is for the connectivity or the information flow of the network. It is a useful parameter in signalling networks and is often used when trying to find drug targets.<ref>{{Cite web |last=EMBL-EBI |title=Centrality analysis {{!}} Network analysis of protein interaction data |url=https://www.ebi.ac.uk/training/online/courses/network-analysis-of-protein-interaction-data-an-introduction/building-and-analysing-ppins/topological-ppin-analysis/centrality-analysis/ |access-date=2022-05-05 |language=en}}</ref> It is most commonly used in PINs to determine important proteins and their functions. Centrality can be measured in different ways depending on the graph and the question that needs answering. Measures include the degree of a node (the number of edges connected to it), global centrality measures, and random-walk measures such as the one used by the [[PageRank|Google PageRank]] algorithm to assign a weight to each webpage.<ref>{{Cite journal |last=Brin |first=Sergey |last2=Page |first2=Lawrence |date=1998-04-01 |title=The anatomy of a large-scale hypertextual Web search engine |url=https://www.sciencedirect.com/science/article/pii/S016975529800110X |journal=Computer Networks and ISDN Systems |series=Proceedings of the Seventh International World Wide Web Conference |language=en |volume=30 |issue=1 |pages=107–117 |doi=10.1016/S0169-7552(98)00110-X |issn=0169-7552}}</ref>
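
A minimal sketch of a random-walk centrality in the style of PageRank, computed by power iteration. The damping factor of 0.85, the fixed iteration count, and the three-node graph are assumptions chosen for illustration; every target node is assumed to appear as a key in the input dictionary.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Approximate PageRank scores by repeated redistribution of rank."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives a baseline share from random jumps.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical directed graph: a and b both point to c, and c points to a.
out_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
rank = pagerank(out_links)
top_node = max(rank, key=rank.get)
```

Node c receives links from both other nodes and ends up with the highest score, while b, which nothing links to, keeps only the baseline share. The scores sum to one, so they can be read as the long-run visiting probabilities of a random walker.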

==== Topological Clustering ====
Topological clustering or [[Topological data analysis|Topological Data Analysis]] (TDA) provides a general framework to analyze high-dimensional, incomplete, and noisy data in a way that reduces dimensionality and provides robustness to noise. The idea is that the shape of data sets contains relevant information. When this information is a [[Homology (mathematics)|homology]] group there is a mathematical interpretation that assumes that features that persist for a wide range of parameters are "true" features and features persisting for only a narrow range of parameters are noise, although the theoretical justification for this is unclear.<ref>{{Cite journal |last=Carlsson |first=Gunnar |date=2009 |title=Topology and data |url=https://www.ams.org/bull/2009-46-02/S0273-0979-09-01249-X/ |journal=Bulletin of the American Mathematical Society |language=en |volume=46 |issue=2 |pages=255–308 |doi=10.1090/S0273-0979-09-01249-X |issn=0273-0979}}</ref> This technique has been used for progression analysis of disease<ref>{{Citation |last=Schmidt |first=Stephan |title=Disease Progression Analysis: Towards Mechanism-Based Models |date=2011 |url=https://doi.org/10.1007/978-1-4419-7415-0_19 |work=Clinical Trial Simulations: Applications and Trends |pages=433–455 |editor-last=Kimko |editor-first=Holly H. C. |place=New York, NY |publisher=Springer |language=en |doi=10.1007/978-1-4419-7415-0_19 |isbn=978-1-4419-7415-0 |access-date=2022-05-05 |last2=Post |first2=Teun M. |last3=Boroujerdi |first3=Massoud A. |last4=van Kesteren |first4=Charlotte |last5=Ploeger |first5=Bart A. |last6=Pasqua |first6=Oscar E. Della |last7=Danhof |first7=Meindert |editor2-last=Peck |editor2-first=Carl C.}}</ref><ref>{{Cite journal |last=Nicolau |first=Monica |last2=Levine |first2=Arnold J.
|last3=Carlsson |first3=Gunnar |date=2011-04-26 |title=Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival |url=https://pnas.org/doi/full/10.1073/pnas.1102826108 |journal=Proceedings of the National Academy of Sciences |language=en |volume=108 |issue=17 |pages=7265–7270 |doi=10.1073/pnas.1102826108 |issn=0027-8424 |pmc=PMC3084136 |pmid=21482760}}</ref>, viral evolution<ref>{{Cite journal |last=Chan |first=Joseph Minhow |last2=Carlsson |first2=Gunnar |last3=Rabadan |first3=Raul |date=2013-11-12 |title=Topology of viral evolution |url=https://pnas.org/doi/full/10.1073/pnas.1313480110 |journal=Proceedings of the National Academy of Sciences |language=en |volume=110 |issue=46 |pages=18566–18571 |doi=10.1073/pnas.1313480110 |issn=0027-8424 |pmc=PMC3831954 |pmid=24170857}}</ref>, propagation of contagions on networks<ref>{{Cite journal |last=Taylor |first=Dane |last2=Klimm |first2=Florian |last3=Harrington |first3=Heather A. |last4=Kramár |first4=Miroslav |last5=Mischaikow |first5=Konstantin |last6=Porter |first6=Mason A. |last7=Mucha |first7=Peter J. |date=2015-07-21 |title=Topological data analysis of contagion maps for examining spreading processes on networks |url=https://www.nature.com/articles/ncomms8723 |journal=Nature Communications |language=en |volume=6 |issue=1 |pages=7723 |doi=10.1038/ncomms8723 |issn=2041-1723}}</ref>, bacteria classification using molecular spectroscopy<ref>{{Cite journal |last=Offroy |first=Marc |last2=Duponchel |first2=Ludovic |date=2016-03-03 |title=Topological data analysis: A promising big data exploration tool in biology, analytical chemistry and physical chemistry |url=https://www.sciencedirect.com/science/article/pii/S0003267016300137 |journal=Analytica Chimica Acta |language=en |volume=910 |pages=1–11 |doi=10.1016/j.aca.2015.12.037 |issn=0003-2670}}</ref>, and much more in and outside of biology.

==== Shortest paths ====
The [[shortest path problem]] is a common problem in graph theory that tries to find the [[Path (graph theory)|path]] between two [[Vertex (graph theory)|vertices]] (or nodes) in a graph such that the sum of the [[Glossary of graph theory terms#weighted graph|weights]] of its constituent edges is minimized. This method can be used to determine the network diameter or redundancy in a network. There are many algorithms for this, including [[Dijkstra's algorithm]], the [[Bellman–Ford algorithm]], and the [[Floyd–Warshall algorithm]], to name a few.
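
As an example of one of these, the sketch below implements Dijkstra's algorithm with a binary heap for a directed graph with non-negative edge weights. The graph and its weights are hypothetical.

```python
import heapq


def dijkstra(graph, source):
    """Smallest total edge weight from source to each reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter route was already found
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


# Hypothetical weighted, directed graph: node -> {neighbour: edge weight}.
graph = {
    "a": {"b": 1, "c": 4},
    "b": {"c": 2, "d": 5},
    "c": {"d": 1},
    "d": {},
}
distances = dijkstra(graph, "a")
```

The direct edge from a to c costs 4, but the route through b costs 3, so the algorithm prefers the detour; the cheapest route to d then costs 4. Dijkstra's algorithm requires non-negative weights, whereas the Bellman–Ford algorithm also handles negative ones.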


=== Clustering Analysis ===
[[Cluster analysis]] groups objects (nodes) such that objects in the same cluster are more similar to each other than to those in other clusters. This can be used to perform [[pattern recognition]], [[image analysis]], [[information retrieval]], [[Statistics|statistical]] [[data analysis]], and more. It has applications in [[plant]] and [[animal]] [[ecology]], sequence analysis, antimicrobial activity analysis, and many other fields. [[:Category:Cluster analysis algorithms|Cluster analysis algorithms]] also come in many forms, such as [[hierarchical clustering]], [[k-means clustering]], distribution-based clustering, density-based clustering, and grid-based clustering.

===Transcriptional regulatory networks===
Genes are the nodes and the edges are directed. A gene serves as the source of a direct regulatory edge to a target gene by producing an [[RNA]] or protein molecule that functions as a transcriptional activator or inhibitor of the target gene. If the gene is an activator, then it is the source of a positive regulatory connection; if an inhibitor, then it is the source of a negative regulatory connection. Computational algorithms take as primary input data measurements of [[mRNA]] expression levels of the genes under consideration for inclusion in the network, returning an estimate of the network [[topology]]. Such algorithms are typically based on linearity, independence or normality assumptions, which must be verified on a case-by-case basis.<ref>{{cite journal | vauthors = Oates CJ, Mukherjee S | title = Network Inference and Biological Dynamics | journal = The Annals of Applied Statistics | volume = 6 | issue = 3 | pages = 1209–1235 | date = September 2012 | pmid = 23284600 | doi = 10.1214/11-AOAS532 | arxiv = 1112.1047 | pmc=3533376}}</ref> Clustering or some form of statistical classification is typically employed to perform an initial organization of the high-throughput mRNA expression values derived from microarray experiments, in particular to select sets of genes as candidates for network nodes.<ref>{{cite journal | vauthors = Guthke R, Möller U, Hoffmann M, Thies F, Töpfer S | title = Dynamic network reconstruction from gene expression data applied to immune response during bacterial infection | journal = Bioinformatics | volume = 21 | issue = 8 | pages = 1626–34 | date = April 2005 | pmid = 15613398 | doi = 10.1093/bioinformatics/bti226 | doi-access = free }}</ref> The question then arises: how can the clustering or classification results be connected to the underlying biology? Such results can be useful for pattern classification – for example, to classify subtypes of [[cancer]], or to predict differential responses to a [[drug]] (pharmacogenomics). 
But to understand the relationships between the genes, that is, to more precisely define the influence of each gene on the others, the scientist typically attempts to reconstruct the transcriptional regulatory network. This can be done by data integration in dynamic models supported by background literature, or information in public [[database]]s, combined with the clustering results.<ref>{{cite journal | vauthors = Hecker M, Lambeck S, Toepfer S, van Someren E, Guthke R | title = Gene regulatory network inference: data integration in dynamic models-a review | journal = Bio Systems | volume = 96 | issue = 1 | pages = 86–103 | date = April 2009 | pmid = 19150482 | doi = 10.1016/j.biosystems.2008.12.004 }}</ref> The modelling can be done by a [[Boolean network]], by [[Ordinary differential equation]]s or [[Linear regression]] models, e.g. [[Least-angle regression]], by [[Bayesian network]] or based on [[Information theory]] approaches.<ref>{{cite journal | vauthors = van Someren EP, Wessels LF, Backer E, Reinders MJ | title = Genetic network modeling | journal = Pharmacogenomics | volume = 3 | issue = 4 | pages = 507–25 | date = July 2002 | pmid = 12164774 | doi = 10.1517/14622416.3.4.507 }}</ref><ref>{{Cite journal|last1=Banf|first1=Michael|last2=Rhee|first2=Seung Y.|date=January 2017|title=Computational inference of gene regulatory networks: Approaches, limitations and opportunities|journal=Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms|volume=1860|issue=1|pages=41–52|doi=10.1016/j.bbagrm.2016.09.003|pmid=27641093|issn=1874-9399|doi-access=free}}</ref> For instance it can be done by the application of a correlation-based inference algorithm, as will be discussed below, an approach which is having increased success as the size of the available microarray sets keeps increasing <ref name="Marbach2012" /><ref>{{cite journal | vauthors = Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS | title = 
Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles | journal = PLOS Biology | volume = 5 | issue = 1 | pages = e8 | date = January 2007 | pmid = 17214507 | pmc = 1764438 | doi = 10.1371/journal.pbio.0050008 }}</ref><ref>{{cite journal | vauthors = Hayete B, Gardner TS, Collins JJ | title = Size matters: network inference tackles the genome scale | journal = Molecular Systems Biology | volume = 3 | issue = 1 | pages = 77 | year = 2007 | pmid = 17299414 | pmc = 1828748 | doi = 10.1038/msb4100118 }}</ref>


=== Annotation enrichment analysis ===
===Signal transduction===
Gene annotation databases are commonly used to evaluate the functional properties of experimentally derived gene sets. Annotation Enrichment Analysis (AEA) is used to overcome biases from overlap statistical methods used to assess these associations<ref>{{Cite journal |last=Glass |first=Kimberly |last2=Girvan |first2=Michelle |date=2014-02-26 |title=Annotation Enrichment Analysis: An Alternative Method for Evaluating the Functional Properties of Gene Sets |url=https://www.nature.com/articles/srep04191 |journal=Scientific Reports |language=en |volume=4 |issue=1 |pages=4191 |doi=10.1038/srep04191 |issn=2045-2322}}</ref>. It does this by using gene/protein annotations to infer which annotations are over-represented in a list of genes/proteins taken from a network.
[[Signal transduction]] networks (very important in the biology of cancer). Proteins are the nodes and directed edges represent interaction in which the biochemical conformation of the child is modified by the action of the parent (e.g. mediated by [[phosphorylation]], ubiquitylation, methylation, etc.). Primary input into the inference algorithm would be data from a set of experiments measuring protein activation / inactivation (e.g., phosphorylation / dephosphorylation) across a set of proteins. Inference for such signalling networks is complicated by the fact that total concentrations of signalling proteins will fluctuate over time due to transcriptional and translational regulation. Such variation can lead to statistical [[confounding]]. Accordingly, more sophisticated statistical techniques must be applied to analyse such datasets.<ref>
{{cite journal | vauthors = Oates CJ, Mukherjee S | year = 2012 | title = Structural inference using nonlinear dynamics | journal = CRiSM Working Paper | volume = 12 | issue = 7 }}</ref>


== Network Analysis Tools ==
===Metabolic===
{| class="wikitable"
[[Metabolite]] networks. Metabolites are the nodes and the edges are directed. Primary input into an algorithm would be data from a set of experiments measuring metabolite levels.
|+Network Analysis Tools
!Network
!Analysis Tools
|-
|Transcriptional regulatory networks
|FANMOD<ref>{{Cite web |last=Wernicke |first=Sebastian |date=1 May 2006 |title=FANMOD: a tool for fast network motif detection |url=https://academic.oup.com/crawlprevention/governor?content=%2fbioinformatics%2farticle%2f22%2f9%2f1152%2f199945 |url-status=live |access-date=2022-05-05 |website=academic.oup.com}}</ref>, ChIP-on-chip<ref name=":1">{{Cite journal |last=Blais |first=Alexandre |last2=Dynlacht |first2=Brian David |date=2005-07-01 |title=Constructing transcriptional regulatory networks |url=http://genesdev.cshlp.org/content/19/13/1499 |journal=Genes & Development |language=en |volume=19 |issue=13 |pages=1499–1511 |doi=10.1101/gad.1325605 |issn=0890-9369 |pmid=15998805}}</ref>, position–weight
matrices<ref name=":1" />, AlignACE<ref name=":1" />, MDScan<ref name=":1" />,


MEME<ref name=":1" />, REDUCE<ref name=":1" />
===Protein-protein interaction===
|-
Protein-protein interaction networks are also under very active study. However, reconstruction of these networks does not use correlation-based inference in the sense discussed for the networks already described (interaction does not necessarily imply a change in protein state), and a description of such interaction network reconstruction is left to other articles.
|Gene Co-Expression Networks
|FANMOD, Paired Design<ref>{{Cite journal |last=Li |first=Jianqiang |last2=Zhou |first2=Doudou |last3=Qiu |first3=Weiliang |last4=Shi |first4=Yuliang |last5=Yang |first5=Ji-Jiang |last6=Chen |first6=Shi |last7=Wang |first7=Qing |last8=Pan |first8=Hui |date=2018-01-12 |title=Application of Weighted Gene Co-expression Network Analysis for Data from Paired Design |url=https://www.nature.com/articles/s41598-017-18705-z |journal=Scientific Reports |language=en |volume=8 |issue=1 |pages=622 |doi=10.1038/s41598-017-18705-z |issn=2045-2322}}</ref>, WGCNA<ref>{{Cite journal |last=Liu |first=Wei |last2=Li |first2=Li |last3=Ye |first3=Hua |last4=Tu |first4=Wei |date=2017-11-01 |title=[Weighted gene co-expression network analysis in biomedicine research] |url=https://doi.org/10.13345/j.cjb.170006 |journal=Sheng wu gong cheng xue bao = Chinese journal of biotechnology |volume=33 |issue=11 |pages=1791–1801 |doi=10.13345/j.cjb.170006 |issn=1872-2075 |pmid=29202516}}</ref>
|-
|Signal transduction
|FANMOD, PathLinker<ref>{{Cite journal |last=Ritz |first=Anna |last2=Poirel |first2=Christopher L. |last3=Tegge |first3=Allison N. |last4=Sharp |first4=Nicholas |last5=Simmons |first5=Kelsey |last6=Powell |first6=Allison |last7=Kale |first7=Shiv D. |last8=Murali |first8=T. M. |date=2016-03-03 |title=Pathways on demand: automated reconstruction of human signaling networks |url=https://www.nature.com/articles/npjsba20162 |journal=npj Systems Biology and Applications |language=en |volume=2 |issue=1 |pages=1–9 |doi=10.1038/npjsba.2016.2 |issn=2056-7189}}</ref>
|-
|Metabolic Network
|FANMOD, [http://bioinformatics.ai.sri.com/ptools/ Pathway Tools], [https://ergo.integratedgenomics.com/login.ergo?m=expired&from=https%3A%2F%2Fergo%2Eintegratedgenomics%2Ecom%2F Ergo], [http://www.cogsys.cs.uni-tuebingen.de/software/KEGGtranslator/ KEGGtranslator], [https://modelseed.org/ ModelSEED]
|-
|Protein-Protein Interaction Networks
|FANMOD, NETBOX<ref>{{Cite web |title={{ngMeta['og:title']}} |url=https://bio.tools/netbox |access-date=2022-05-05 |website=bio.tools |language=en}}</ref>, Text Mining<ref>{{Cite journal |last=Hoffmann |first=Robert |last2=Krallinger |first2=Martin |last3=Andres |first3=Eduardo |last4=Tamames |first4=Javier |last5=Blaschke |first5=Christian |last6=Valencia |first6=Alfonso |date=2005-05-10 |title=Text Mining for Metabolic Pathways, Signaling Cascades, and Protein Networks |url=https://www.science.org/doi/10.1126/stke.2832005pe21 |journal=Science's STKE |language=en |volume=2005 |issue=283 |doi=10.1126/stke.2832005pe21 |issn=1525-8882}}</ref>, [https://string-db.org/ STRING]
|-
|Neuronal Network
|FANMOD, [https://www.predictiveanalyticstoday.com/neural-designer-data-mining-using-neural-networks/ Neural Designer], [https://www.predictiveanalyticstoday.com/neuroph/ Neuroph], [https://pjreddie.com/darknet/ Darknet]
|-
|Food Webs
|FANMOD, [https://github.com/opetchey/dumping_ground/tree/master/random_cascade_niche RCN], [[R (programming language)|R]]
|-
|Within Species and Between Species Interaction Networks
|FANMOD, NETBOX
|-
|DNA-DNA Chromatin Networks
|FANMOD
|-
|Gene Regulatory Networks
|FANMOD
|}


== See also ==

Revision as of 19:41, 5 May 2022

Biological network inference is the process of making inferences and predictions about biological networks.[1] By using networks to analyze patterns in biological systems, such as food-webs, we can visualize the nature and strength of interactions between species, DNA, proteins, and more.

The analysis of biological networks with respect to diseases has led to the development of the field of network medicine.[2] Recent applications of network theory in biology include understanding the cell cycle[3] and a quantitative framework for developmental processes.[4] Good network inference requires proper planning and execution of an experiment, thereby ensuring quality data acquisition. Optimal experimental design in principle refers to the use of statistical and/or mathematical concepts to plan for data acquisition. This must be done in such a way that the data information content is enriched, and a sufficient amount of data is collected with enough technical and biological replicates where necessary.[5]

The general cycle for modeling biological networks is as follows:[5]

  1. Prior knowledge
    • Involves a thorough literature and database search or seeking an expert’s opinion.
  2. Model selection
  3. Hypothesis/assumptions
  4. Experimental design
  5. Data acquisition
    • Ensure that high-quality data is collected, with all the required variables being measured.
  6. Network inference
    • This process is mathematically rigorous and computationally costly.
  7. Model refinement
    • Cross-check how well the results meet the expectations. The process is terminated upon obtaining a good model fit to the data; otherwise, the model needs to be re-adjusted.

Biological networks

A network is a set of nodes and a set of directed or undirected edges between the nodes. Many types of biological networks exist, including transcriptional, signalling and metabolic. Few such networks are known in anything approaching their complete structure, even in the simplest bacteria. Still less is known about the parameters governing the behavior of such networks over time, how the networks at different levels in a cell interact, and how to predict the complete state description of a eukaryotic cell or bacterial organism at a given point in the future. Systems biology, in this sense, is still in its infancy.

There is great interest in network medicine for modelling biological systems. This article focuses on inference of biological network structure using the growing sets of high-throughput expression data for genes, proteins, and metabolites.[10] Briefly, methods using high-throughput data for inference of regulatory networks rely on searching for patterns of partial correlation or conditional probabilities that indicate causal influence.[8][11] Such patterns of partial correlations found in the high-throughput data, possibly combined with other supplemental data on the genes or proteins in the proposed networks, or combined with other information on the organism, form the basis upon which such algorithms work. Such algorithms can be of use in inferring the topology of any network where the change in state of one node can affect the state of other nodes.

Transcriptional regulatory networks

Genes are the nodes and the edges are directed. A gene serves as the source of a direct regulatory edge to a target gene by producing an RNA or protein molecule that functions as a transcriptional activator or inhibitor of the target gene. If the gene is an activator, then it is the source of a positive regulatory connection; if an inhibitor, then it is the source of a negative regulatory connection. Computational algorithms take as primary input data measurements of mRNA expression levels of the genes under consideration for inclusion in the network, returning an estimate of the network topology. Such algorithms are typically based on linearity, independence or normality assumptions, which must be verified on a case-by-case basis.[12] Clustering or some form of statistical classification is typically employed to perform an initial organization of the high-throughput mRNA expression values derived from microarray experiments, in particular to select sets of genes as candidates for network nodes.[13] The question then arises: how can the clustering or classification results be connected to the underlying biology? Such results can be useful for pattern classification – for example, to classify subtypes of cancer, or to predict differential responses to a drug (pharmacogenomics). But to understand the relationships between the genes, that is, to more precisely define the influence of each gene on the others, the scientist typically attempts to reconstruct the transcriptional regulatory network.

Gene Co-Expression Networks

Main Article: Gene Co-Expression Network

A gene co-expression network (GCN) is an undirected graph, where each node corresponds to a gene, and a pair of nodes is connected with an edge if there is a significant co-expression relationship between them.
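
As a minimal sketch of this definition, the Python example below connects two genes when the absolute Pearson correlation of their expression profiles reaches a cutoff. The gene names, the expression values, and the 0.9 threshold are hypothetical, and a fixed cutoff is only a simple stand-in for a proper test of significance.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """Return undirected edges between genes whose |r| meets the threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(expression), 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


# Hypothetical expression values for three genes across five samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = coexpression_edges(expression)
```

Only geneA and geneB rise and fall together in this toy data, so they are the only pair joined by an edge.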

Signal transduction

Main Article: Signal transduction

Signal transduction networks, which are very important in the biology of cancer, use proteins for the nodes and directed edges to represent interactions in which the biochemical conformation of the child is modified by the action of the parent (e.g. mediated by phosphorylation, ubiquitylation, methylation, etc.). Primary input into the inference algorithm would be data from a set of experiments measuring protein activation / inactivation (e.g., phosphorylation / dephosphorylation) across a set of proteins. Inference for such signalling networks is complicated by the fact that total concentrations of signalling proteins will fluctuate over time due to transcriptional and translational regulation. Such variation can lead to statistical confounding. Accordingly, more sophisticated statistical techniques must be applied to analyse such datasets.[14]

Metabolic Network

Main Article: Metabolic Network

Metabolite networks use nodes to represent chemical reactions and directed edges for the metabolic pathways and regulatory interactions that guide these reactions. Primary input into an algorithm would be data from a set of experiments measuring metabolite levels.

Protein-Protein Interaction Networks

Main Article: Interactome

Protein-protein interaction networks (PINs), among the most intensely studied networks in biology, represent the physical relationships between proteins inside a cell. In a PIN, proteins are the nodes and their interactions are the undirected edges. PINs can be discovered with a variety of methods, including two-hybrid screening and in vitro techniques such as co-immunoprecipitation[15] and blue native gel electrophoresis[16], among others[17].

Neuronal Network

Main Article: Neural Network

In a neuronal network, each node represents a neuron and each edge represents a synapse; edges are typically weighted and directed. The weights of edges are usually adjusted by the activation of connected nodes. The network is usually organized into input layers, hidden layers, and output layers.

Food Webs

Main Article: Food Web

A food web is an interconnected directed graph of what eats what in an ecosystem. The members of the ecosystem are the nodes, and if one member eats another, there is a directed edge between those two nodes.

Within Species and Between Species Interaction Networks

These networks are defined by a set of pairwise interactions between and within a species that is used to understand the structure and function of larger ecological networks.[18] By using network analysis we can discover and understand how these interactions link together within the system's network. It also allows us to quantify associations between individuals, which makes it possible to infer details about the network as a whole at the species and/or population level.[19]

DNA-DNA Chromatin Networks

Main Article: Chromatin

DNA-DNA chromatin networks are used to clarify the activation or suppression of genes via the relative location of strands of chromatin. These interactions can be understood by analyzing commonalities amongst different loci; a locus is a fixed position on a chromosome where a particular gene or genetic marker is located. Network analysis can provide vital support in understanding relationships among different areas of the genome.

Gene Regulatory Networks

Main Article: Gene regulatory network

A gene regulatory network (GRN) is a set of molecular regulators that interact with each other and with other substances in the cell. A regulator can be DNA, RNA, protein, or a complex of these. GRNs can be modeled in numerous ways, including coupled ordinary differential equations, Boolean networks, continuous networks, and stochastic gene networks.
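
To illustrate one of these modelling styles, the Python sketch below simulates a small Boolean network with synchronous updates. The three-gene circuit (A activates B, B activates C, C represses A) is a hypothetical example and is not taken from any dataset discussed here.

```python
def step(state, rules):
    """Synchronously update every gene using its Boolean rule."""
    return {gene: rule(state) for gene, rule in rules.items()}


def trajectory(state, rules, n_steps):
    """Return the list of states visited, starting from the initial state."""
    states = [state]
    for _ in range(n_steps):
        state = step(state, rules)
        states.append(state)
    return states


# Hypothetical circuit: A activates B, B activates C, C represses A.
rules = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"],
}

start = {"A": True, "B": False, "C": False}
states = trajectory(start, rules, 6)
```

With this negative feedback loop the system does not settle into a fixed state; it returns to its starting state after six updates.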

Network Attributes

Data Sources

The initial data used to make the inference can have a huge impact on the accuracy of the final inference. Network data is inherently noisy and incomplete, sometimes because evidence from multiple sources does not overlap or is contradictory. Data can be sourced in multiple ways, including manual curation of scientific literature into databases, high-throughput datasets, computational predictions, and text mining of older scholarly articles from before the digital era.

Network Diameter

A network's diameter is the maximum number of steps separating any two nodes, that is, the length of the longest shortest path between a pair of nodes. It can be used to determine how connected a graph is, and it is used in topology analysis and clustering analysis.
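
A minimal Python sketch of this computation for an unweighted, connected graph is shown below. It runs a breadth-first search from every node and keeps the largest distance found; the five-node graph is a hypothetical example.

```python
from collections import deque


def bfs_distances(graph, source):
    """Number of steps from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def diameter(graph):
    """Longest shortest-path length over all node pairs (connected graph)."""
    return max(max(bfs_distances(graph, node).values()) for node in graph)


# Hypothetical undirected graph: a path a-b-c-d with an extra node e on b.
graph = {
    "a": ["b"],
    "b": ["a", "c", "e"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": ["b"],
}
```

Here the most distant pairs, such as a and d, are three steps apart, so the diameter is 3.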

Transitivity

The transitivity or clustering coefficient of a network is a measure of the tendency of the nodes to cluster together. High transitivity means that the network contains communities or groups of nodes that are densely connected internally. In biological networks, finding these communities is very important, because they can reflect functional modules and protein complexes.[20]
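
One common way to quantify this, sketched below in Python, is the global transitivity: the fraction of connected node triples whose two end points are also linked. The choice of this particular definition and the four-node graph are assumptions made for illustration.

```python
from itertools import combinations


def transitivity(graph):
    """Fraction of connected triples that are closed into triangles."""
    triples = 0  # pairs of neighbours sharing a centre node
    closed = 0   # those pairs that are themselves connected
    for node, neighbours in graph.items():
        for u, v in combinations(neighbours, 2):
            triples += 1
            if v in graph[u]:
                closed += 1
    return closed / triples if triples else 0.0


# Hypothetical undirected graph: a triangle a-b-c with d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
```

In this graph three of the five connected triples are closed, giving a transitivity of 0.6.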

Network Confidence

Network confidence is a way to measure how sure one can be that the network represents a real biological interaction. This can be done using contextual biological information, by counting the number of times an interaction is reported in the literature, or by grouping different strategies into a single score. The MIscore method for assessing the reliability of protein-protein interaction data is based on the use of standards.[21] MIscore gives an estimation of confidence weighting on all available evidence for an interacting pair of proteins. The method allows weighting of evidence provided by different sources, provided the data is represented following the standards created by the IMEx consortium. The weights are the number of publications, the detection method, and the interaction evidence type.

Closeness

Main Article: Closeness

Closeness, a.k.a. closeness centrality, is a measure of centrality in a network and is calculated as the reciprocal of the sum of the length of the shortest paths between the node and all other nodes in the graph. This measure can be used to make inferences in all graph types and analysis methods.
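
The definition above can be sketched in Python as follows for an unweighted, connected graph. This is the unnormalized form (no scaling by the number of nodes), and the three-node path is a hypothetical example.

```python
from collections import deque


def closeness(graph, node):
    """Reciprocal of the summed shortest-path lengths from node to all others."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    total = sum(dist.values())
    return 1.0 / total if total else 0.0


# Hypothetical undirected path a-b-c.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
```

The middle node b is one step from both other nodes, so it has a higher closeness (1/2) than the end node a (1/3).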

Betweenness

Main Article: Betweenness

Betweenness, a.k.a. betweenness centrality, is a measure of centrality in a graph based on shortest paths. The betweenness of a node is the number of shortest paths between pairs of other nodes that pass through that node.
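
A sketch of this measure for an unweighted, undirected graph is given below, using Brandes' algorithm, a widely used method that is swapped in here for illustration. When several shortest paths join the same pair of nodes, each path contributes a fraction rather than a full count. The four-node path graph is hypothetical.

```python
from collections import deque


def betweenness(graph):
    """Unnormalized betweenness centrality for an undirected, unweighted graph."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack = []                       # nodes in order of non-decreasing distance
        preds = {v: [] for v in graph}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in graph}    # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each end point.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical undirected path a-b-c-d.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = betweenness(graph)
```

The two inner nodes each lie on two shortest paths between other nodes, while the end nodes lie on none.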

Network Analysis Methods

Main Article: Network Theory

For our purposes, network analysis is closely related to graph theory. By measuring the attributes in the previous section we can utilize many different techniques to create accurate inferences based on biological data.

Topology Analysis

Topology analysis examines the topology of a network to identify relevant participants and substructures that may be of biological significance. The term encompasses an entire class of techniques such as network motif search, centrality analysis, topological clustering, and shortest paths. These are but a few examples; each of these techniques uses the general idea of focusing on the topology of a network to make inferences.

Network Motif Search

A motif is defined as a frequent and unique sub-graph. By counting all the possible instances, listing all patterns, and testing isomorphisms, we can derive crucial information about a network. Motifs are suggested to be the basic building blocks of complex biological networks. Computational research has focused on improving existing motif detection tools to assist biological investigations and allow larger networks to be analyzed. Several different algorithms have been provided so far, which are elaborated in the next section.
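
As a small illustration of the counting step only, the Python sketch below counts feed-forward loops (x regulates y, y regulates z, and x also regulates z) in a directed network by brute force. The edge list is hypothetical. A full motif search would also compare such counts against randomized networks, which is omitted here.

```python
from itertools import permutations


def count_feed_forward_loops(edges):
    """Count ordered node triples (x, y, z) with edges x->y, y->z and x->z."""
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    return sum(
        1
        for x, y, z in permutations(nodes, 3)
        if (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set
    )


# Hypothetical directed regulatory edges.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("A", "D")]
n_loops = count_feed_forward_loops(edges)
```

This network contains two feed-forward loops: A, B, C and A, C, D. Brute-force enumeration scales cubically with the number of nodes, which is one reason dedicated motif detection tools exist.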

Centrality Analysis

Centrality gives an estimation of how important a node or edge is for the connectivity or the information flow of the network. It is a useful parameter in signalling networks and is often used when trying to find drug targets.[22] It is most commonly used in PINs to determine important proteins and their functions. Centrality can be measured in different ways depending on the graph and the question that needs answering. Measures include the degree of a node (the number of edges connected to it), global centrality measures, and random-walk measures such as the one used by the Google PageRank algorithm to assign a weight to each webpage.[23]
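
The random-walk idea can be sketched with a basic power-iteration version of PageRank, shown below in Python. The damping factor of 0.85, the fixed iteration count, and the three-node directed graph are assumptions made for illustration, not parameters taken from the cited sources.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict mapping node -> list of targets."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in graph}
        for v, targets in graph.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for w in targets:
                    new_rank[w] += share
            else:
                # A node with no outgoing edges spreads its rank evenly.
                for w in graph:
                    new_rank[w] += damping * rank[v] / n
        rank = new_rank
    return rank


# Hypothetical directed graph: a and b both point to c, and c points back to a.
graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

Node c receives links from both other nodes and ends up with the highest score, while b, which nothing points to, receives the lowest.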

Topological Clustering

Topological Clustering or Topological Data Analysis (TDA) provides a general framework to analyze high-dimensional, incomplete, and noisy data in a way that reduces dimensionality and provides robustness to noise. The idea is that the shape of data sets contains relevant information. When this information is a homology group, there is a mathematical interpretation that assumes that features that persist for a wide range of parameters are "true" features and features persisting for only a narrow range of parameters are noise, although the theoretical justification for this is unclear.[24] This technique has been used for progression analysis of disease[25][26], viral evolution[27], propagation of contagions on networks[28], bacteria classification using molecular spectroscopy[29], and much more in and outside of biology.

Shortest paths

The shortest path problem is a common problem in graph theory that tries to find the path between two vertices (or nodes) in a graph such that the sum of the weights of its constituent edges is minimized. This method can be used to determine the network diameter or redundancy in a network. Many algorithms exist for this problem, including Dijkstra's algorithm, the Bellman–Ford algorithm, and the Floyd–Warshall algorithm.
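
A compact Python sketch of Dijkstra's algorithm, one of the algorithms named above, is given below. It assumes non-negative edge weights; the small weighted directed graph is a hypothetical example.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbour, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


# Hypothetical weighted directed graph: node -> {neighbour: edge weight}.
graph = {
    "a": {"b": 1, "c": 4},
    "b": {"c": 2, "d": 5},
    "c": {"d": 1},
    "d": {},
}
distances = dijkstra(graph, "a")
```

The cheapest route from a to d runs through b and c with total weight 4, rather than along the two-edge route a, b, d with weight 6.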

Clustering Analysis

Cluster analysis groups objects (nodes) such that objects in the same cluster are more similar to each other than to those in other clusters. This can be used to perform pattern recognition, image analysis, information retrieval, and statistical data analysis. It has applications in plant and animal ecology, sequence analysis, antimicrobial activity analysis, and many other fields. Cluster analysis algorithms come in many forms, such as hierarchical clustering, k-means clustering, distribution-based clustering, density-based clustering, and grid-based clustering.
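
As one concrete instance, the Python sketch below implements a bare-bones k-means clustering. The starting centroids are passed in explicitly so the result is reproducible; the points, the starting centroids, and the fixed number of iterations are all hypothetical choices.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if empty).
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical two-dimensional measurements forming two loose groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
```

The three points near the origin end up in one cluster and the three points near (8, 8) in the other. In practice the outcome of k-means depends on the starting centroids and on the chosen number of clusters.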

Annotation enrichment analysis

Gene annotation databases are commonly used to evaluate the functional properties of experimentally derived gene sets. Annotation Enrichment Analysis (AEA) is used to overcome biases from overlap statistical methods used to assess these associations[30]. It does this by using gene/protein annotations to infer which annotations are over-represented in a list of genes/proteins taken from a network.
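
For context, the Python sketch below shows the conventional overlap statistic that AEA is described as an alternative to: a hypergeometric test of whether an annotation is over-represented in a gene list. It is not an implementation of AEA itself, and the counts used are hypothetical.

```python
from math import comb


def overrepresentation_pvalue(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` genes from `population` genes,
    of which `annotated` carry the annotation of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / total


# Hypothetical counts: 20 genes overall, 5 annotated, a list of 5 genes
# of which 3 carry the annotation.
p_value = overrepresentation_pvalue(population=20, annotated=5, sample=5, hits=3)
```

A small p-value suggests that the annotation appears in the list more often than expected by chance under this simple model.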

Network Analysis Tools

Network Analysis Tools
Network | Analysis Tools
Transcriptional regulatory networks | FANMOD[31], ChIP-on-chip[32], position–weight matrices[32], AlignACE[32], MDScan[32], MEME[32], REDUCE[32]
Gene Co-Expression Networks | FANMOD, Paired Design[33], WGCNA[34]
Signal transduction | FANMOD, PathLinker[35]
Metabolic Network | FANMOD, Pathway Tools, Ergo, KEGGtranslator, ModelSEED
Protein-Protein Interaction Networks | FANMOD, NETBOX[36], Text Mining[37], STRING
Neuronal Network | FANMOD, Neural Designer, Neuroph, Darknet
Food Webs | FANMOD, RCN, R
Within Species and Between Species Interaction Networks | FANMOD, NETBOX
DNA-DNA Chromatin Networks | FANMOD
Gene Regulatory Networks | FANMOD

See also

References

  1. ^ Mercatelli, Daniele; Scalambra, Laura; Triboli, Luca; Ray, Forest; Giorgi, Federico M. (2020). "Gene regulatory network inference resources: A practical overview". Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms. 1863 (6): 194430. doi:10.1016/j.bbagrm.2019.194430. ISSN 1874-9399. PMID 31678629.
  2. ^ Barabási, Albert-László; Gulbahce, Natali; Loscalzo, Joseph (January 2011). "Network medicine: a network-based approach to human disease". Nature Reviews Genetics. 12 (1): 56–68. doi:10.1038/nrg2918. ISSN 1471-0064.
  3. ^ Jailkhani, Noor; Ravichandran, Srikanth; Hegde, Shubhada R.; Siddiqui, Zaved; Mande, Shekhar C.; Rao, Kanury V. S. (2011-12-01). "Delineation of key regulatory elements identifies points of vulnerability in the mitogen-activated signaling network". Genome Research. 21 (12): 2067–2081. doi:10.1101/gr.116145.110. ISSN 1088-9051. PMID 21865350.
  4. ^ Jackson, Matthew D. B.; Duran-Nebreda, Salva; Bassel, George W. (2017-10-31). "Network-based approaches to quantify multicellular development". Journal of The Royal Society Interface. 14 (135): 20170484. doi:10.1098/rsif.2017.0484. PMC 5665831. PMID 29021161.
  5. ^ a b Omony, Jimmy (2014-01-10). "Biological Network Inference: A Review of Methods and Assessment of Tools and Techniques". Annual Research & Review in Biology. 4 (4): 577–601. doi:10.9734/ARRB/2014/5718.
  6. ^ van Someren EP, Wessels LF, Backer E, Reinders MJ (July 2002). "Genetic network modeling". Pharmacogenomics. 3 (4): 507–25. doi:10.1517/14622416.3.4.507. PMID 12164774.
  7. ^ Banf, Michael; Rhee, Seung Y. (January 2017). "Computational inference of gene regulatory networks: Approaches, limitations and opportunities". Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms. 1860 (1): 41–52. doi:10.1016/j.bbagrm.2016.09.003. ISSN 1874-9399. PMID 27641093.
  8. ^ a b Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, Allison KR, Kellis M, Collins JJ, Stolovitzky G (August 2012). "Wisdom of crowds for robust gene network inference". Nature Methods. 9 (8): 796–804. doi:10.1038/nmeth.2016. PMC 3512113. PMID 22796662.
  9. ^ Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (January 2007). "Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles". PLOS Biology. 5 (1): e8. doi:10.1371/journal.pbio.0050008. PMC 1764438. PMID 17214507.
  10. ^ Tieri P, Farina L, Petti M, Astolfi L, Paci P, Castiglione F (2018). "Network Inference and Reconstruction in Bioinformatics". Encyclopedia of Bioinformatics and Computational Biology. 2: 805–813. doi:10.1016/B978-0-12-809633-8.20290-2. ISBN 9780128114322.
  11. ^ Spirtes P, Glymour C, Scheines R (2000). Causation, Prediction, and Search: Adaptive Computation and Machine Learning (2nd ed.). MIT Press.
  12. ^ Oates CJ, Mukherjee S (September 2012). "Network Inference and Biological Dynamics". The Annals of Applied Statistics. 6 (3): 1209–1235. arXiv:1112.1047. doi:10.1214/11-AOAS532. PMC 3533376. PMID 23284600.
  13. ^ Guthke R, Möller U, Hoffmann M, Thies F, Töpfer S (April 2005). "Dynamic network reconstruction from gene expression data applied to immune response during bacterial infection". Bioinformatics. 21 (8): 1626–34. doi:10.1093/bioinformatics/bti226. PMID 15613398.
  14. ^ Oates CJ, Mukherjee S (2012). "Structural inference using nonlinear dynamics". CRiSM Working Paper. 12 (7).
  15. ^ Isono, Erika; Schwechheimer, Claus (2010), Hennig, Lars; Köhler, Claudia (eds.), "Co-immunoprecipitation and Protein Blots", Plant Developmental Biology: Methods and Protocols, Methods in Molecular Biology, Totowa, NJ: Humana Press, pp. 377–387, doi:10.1007/978-1-60761-765-5_25, ISBN 978-1-60761-765-5, retrieved 2022-05-03
  16. ^ Wittig, Ilka; Braun, Hans-Peter; Schägger, Hermann (June 2006). "Blue native PAGE". Nature Protocols. 1 (1): 418–428. doi:10.1038/nprot.2006.62. ISSN 1750-2799.
  17. ^ Miernyk, Jan A.; Thelen, Jay J. (2008-02-04). "Biochemical approaches for discovering protein-protein interactions". The Plant Journal. 53 (4): 597–609. doi:10.1111/j.1365-313X.2007.03316.x.
  18. ^ Bascompte, Jordi (2009-07-24). "Disentangling the Web of Life". Science. 325 (5939): 416–419. doi:10.1126/science.1170749. ISSN 0036-8075.
  19. ^ Croft, Darren P.; Krause, Jens; James, Richard (2004-12-07). "Social networks in the guppy (Poecilia reticulata)". Proceedings of the Royal Society of London. Series B: Biological Sciences. 271 (suppl_6): S516–S519. doi:10.1098/rsbl.2004.0206. PMC 1810091. PMID 15801620.
  20. ^ Hsia, Ching-Wu; Ho, Ming-Yi; Shui, Hao-Ai; Tsai, Chong-Bin; Tseng, Min-Jen (2015-02-01). "Analysis of dermal papilla cell interactome using STRING database to profile the ex vivo hair growth inhibition effect of a vinca alkaloid drug, colchicine". International Journal of Molecular Sciences. 16 (2): 3579–3598. doi:10.3390/ijms16023579. ISSN 1422-0067. PMC 4346914. PMID 25664862.
  21. ^ Villaveces, J M; Jiménez, R C; Porras, P; Del-Toro, N; Duesbury, M; Dumousseau, M; Orchard, S; Choi, H; Ping, P; Zong, N C; Askenazi, M (2015-01-01). "Merging and scoring molecular interactions utilising existing community standards: tools, use-cases and a case study". Database. 2015. doi:10.1093/database/bau131. ISSN 1758-0463. PMC 4316181. PMID 25652942.
  22. ^ EMBL-EBI. "Centrality analysis | Network analysis of protein interaction data". Retrieved 2022-05-05.
  23. ^ Brin, Sergey; Page, Lawrence (1998-04-01). "The anatomy of a large-scale hypertextual Web search engine". Computer Networks and ISDN Systems. Proceedings of the Seventh International World Wide Web Conference. 30 (1): 107–117. doi:10.1016/S0169-7552(98)00110-X. ISSN 0169-7552.
  24. ^ Carlsson, Gunnar (2009). "Topology and data". Bulletin of the American Mathematical Society. 46 (2): 255–308. doi:10.1090/S0273-0979-09-01249-X. ISSN 0273-0979.
  25. ^ Schmidt, Stephan; Post, Teun M.; Boroujerdi, Massoud A.; van Kesteren, Charlotte; Ploeger, Bart A.; Pasqua, Oscar E. Della; Danhof, Meindert (2011), Kimko, Holly H. C.; Peck, Carl C. (eds.), "Disease Progression Analysis: Towards Mechanism-Based Models", Clinical Trial Simulations: Applications and Trends, New York, NY: Springer, pp. 433–455, doi:10.1007/978-1-4419-7415-0_19, ISBN 978-1-4419-7415-0, retrieved 2022-05-05
  26. ^ Nicolau, Monica; Levine, Arnold J.; Carlsson, Gunnar (2011-04-26). "Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival". Proceedings of the National Academy of Sciences. 108 (17): 7265–7270. doi:10.1073/pnas.1102826108. ISSN 0027-8424. PMC 3084136. PMID 21482760.
  27. ^ Chan, Joseph Minhow; Carlsson, Gunnar; Rabadan, Raul (2013-11-12). "Topology of viral evolution". Proceedings of the National Academy of Sciences. 110 (46): 18566–18571. doi:10.1073/pnas.1313480110. ISSN 0027-8424. PMC 3831954. PMID 24170857.
  28. ^ Taylor, Dane; Klimm, Florian; Harrington, Heather A.; Kramár, Miroslav; Mischaikow, Konstantin; Porter, Mason A.; Mucha, Peter J. (2015-07-21). "Topological data analysis of contagion maps for examining spreading processes on networks". Nature Communications. 6 (1): 7723. doi:10.1038/ncomms8723. ISSN 2041-1723.
  29. ^ Offroy, Marc; Duponchel, Ludovic (2016-03-03). "Topological data analysis: A promising big data exploration tool in biology, analytical chemistry and physical chemistry". Analytica Chimica Acta. 910: 1–11. doi:10.1016/j.aca.2015.12.037. ISSN 0003-2670.
  30. ^ Glass, Kimberly; Girvan, Michelle (2014-02-26). "Annotation Enrichment Analysis: An Alternative Method for Evaluating the Functional Properties of Gene Sets". Scientific Reports. 4 (1): 4191. doi:10.1038/srep04191. ISSN 2045-2322.
  31. ^ Wernicke, Sebastian (1 May 2006). "FANMOD: a tool for fast network motif detection". academic.oup.com. Retrieved 2022-05-05.
  32. ^ a b c d e f Blais, Alexandre; Dynlacht, Brian David (2005-07-01). "Constructing transcriptional regulatory networks". Genes & Development. 19 (13): 1499–1511. doi:10.1101/gad.1325605. ISSN 0890-9369. PMID 15998805.
  33. ^ Li, Jianqiang; Zhou, Doudou; Qiu, Weiliang; Shi, Yuliang; Yang, Ji-Jiang; Chen, Shi; Wang, Qing; Pan, Hui (2018-01-12). "Application of Weighted Gene Co-expression Network Analysis for Data from Paired Design". Scientific Reports. 8 (1): 622. doi:10.1038/s41598-017-18705-z. ISSN 2045-2322.
  34. ^ Liu, Wei; Li, Li; Ye, Hua; Tu, Wei (2017-11-01). "[Weighted gene co-expression network analysis in biomedicine research]". Sheng wu gong cheng xue bao = Chinese journal of biotechnology. 33 (11): 1791–1801. doi:10.13345/j.cjb.170006. ISSN 1872-2075. PMID 29202516.
  35. ^ Ritz, Anna; Poirel, Christopher L.; Tegge, Allison N.; Sharp, Nicholas; Simmons, Kelsey; Powell, Allison; Kale, Shiv D.; Murali, T. M. (2016-03-03). "Pathways on demand: automated reconstruction of human signaling networks". npj Systems Biology and Applications. 2 (1): 1–9. doi:10.1038/npjsba.2016.2. ISSN 2056-7189.
  36. ^ bio.tools. Retrieved 2022-05-05.
  37. ^ Hoffmann, Robert; Krallinger, Martin; Andres, Eduardo; Tamames, Javier; Blaschke, Christian; Valencia, Alfonso (2005-05-10). "Text Mining for Metabolic Pathways, Signaling Cascades, and Protein Networks". Science's STKE. 2005 (283). doi:10.1126/stke.2832005pe21. ISSN 1525-8882.