Centrality

From Wikipedia, the free encyclopedia

In graph theory and network analysis, centrality of a vertex measures its relative importance within a graph. Applications include how influential a person is within a social network, how important a room is within a building (space syntax), and how well-used a road is within an urban network. There are four main measures of centrality: degree, betweenness, closeness, and eigenvector. Centrality concepts were first developed in social network analysis, and many of the terms used to measure centrality reflect their sociological origin.[1]

Definition and characterization of centrality indices

In addition to the classic centrality indices described below, there are dozens of more specialized ones. Although the notion of centrality is intuitive, there is not yet a definition or characterization that captures all of these indices.[2] A very loose definition of a centrality index is the following:

A centrality index is a real-valued function on the nodes of a graph. It is a structural index, i.e., if G and H are two isomorphic graphs and \Phi is an isomorphism mapping the vertex set V(G) of G to V(H), then the centrality of a vertex v of G must be the same as the centrality of \Phi(v) in H. Conventionally, the higher the centrality index of a node, the higher its perceived centrality in the graph.[3] This definition covers all classic centrality measures, but not every measure that fulfills it would be accepted as a centrality index.

Borgatti and Everett summarize that centrality indices measure the position of a node along a predefined set of walks. They characterize centrality indices along four dimensions: the set of walks, whether the length or the number of these walks is considered, the position of the node on the walks (at the start=radial; in the middle=medial), and how the numbers assigned to the paths are summarized in the measure (average, median, weighted sum, ...).[2] This leads to a characterization by the way a centrality index is calculated. In a different characterization, Borgatti differentiates the centrality indices by what type of paths they consider and which type of network flow they imply.[4] The latter characterizes the centrality indices by the quality with which they predict which node is most central for a given network flow process. This characterization thus provides guidance on when to use which centrality index.

Categorization

Centrality indices can be grouped into four categories: reachability, amount of flow, vitality, and feedback.[5]

Degree centrality

Historically first and conceptually simplest is degree centrality, which is defined as the number of links incident upon a node (i.e., the number of ties that a node has). The degree can be interpreted in terms of the immediate risk of a node for catching whatever is flowing through the network (such as a virus, or some information). In the case of a directed network (where ties have direction), we usually define two separate measures of degree centrality, namely indegree and outdegree. Accordingly, indegree is a count of the number of ties directed to the node and outdegree is the number of ties that the node directs to others. When ties are associated with some positive aspect such as friendship or collaboration, indegree is often interpreted as a form of popularity, and outdegree as gregariousness.

The degree centrality of a vertex v, for a given graph G:=(V,E) with |V| vertices and |E| edges, is defined as

C_D(v)= \text{deg}(v)

Calculating degree centrality for all the nodes in a graph takes \Theta(|V|^2) time with a dense adjacency matrix representation of the graph, and \Theta(|E|) time with a sparse representation such as an edge list.
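As an illustration of the sparse case, the following minimal Python sketch computes degree, indegree and outdegree in a single pass over an edge list. The function name `degree_centrality` and the edge-list input format are assumptions made for this example, not part of the original text.

```python
from collections import defaultdict


def degree_centrality(edges, directed=False):
    """Count ties per node in one pass over an edge list (Theta(|E|)).

    For a directed graph, returns the pair (indegree, outdegree).
    For an undirected graph, returns a single degree dictionary.
    """
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
        # Touch the other counter so every endpoint appears in both dicts.
        indeg[u] += 0
        outdeg[v] += 0
    if directed:
        return dict(indeg), dict(outdeg)
    # Undirected: an edge adds one to the degree of each endpoint.
    return {node: indeg[node] + outdeg[node] for node in indeg}


# A star with centre "a" and three leaves.
star_edges = [("a", "b"), ("a", "c"), ("a", "d")]
print(degree_centrality(star_edges))
print(degree_centrality(star_edges, directed=True))
```

Nodes without any incident edge do not appear in an edge list, so in this sketch they would have to be added separately with degree 0.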

The definition of centrality on the node level can be extended to the whole graph, which is useful when the interest is in how centralized a graph is as a whole. Let v^* be the node with highest degree centrality in G. Let X:=(Y,Z) be the connected graph on |Y| nodes that maximizes the following quantity (with y^* being the node with highest degree centrality in X):

H= \sum_{j=1}^{|Y|} [C_D(y^*)-C_D(y_j)]

Correspondingly, the degree centrality of the graph G is as follows:

C_D(G)= \frac{\sum_{i=1}^{|V|} [C_D(v^*)-C_D(v_i)]}{H}

The value of H is maximized when the graph X contains one central node to which all other nodes are connected (a star graph), and in this case H=(n-1)(n-2), where n = |Y| = |V| is the number of nodes.
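The graph-level quantity can be checked numerically. The sketch below is a minimal illustration, assuming an undirected simple graph stored as a dictionary that maps each node to a list of its neighbours; the function name `degree_centralization` is hypothetical.

```python
def degree_centralization(adj):
    """Degree centralization C_D(G) of an undirected simple graph.

    adj maps each node to a list of its neighbours. The denominator
    H = (n-1)(n-2) is the value attained by a star graph on n nodes.
    """
    n = len(adj)
    if n < 3:
        raise ValueError("H = (n-1)(n-2) is zero for fewer than 3 nodes")
    degree = {v: len(neighbours) for v, neighbours in adj.items()}
    d_max = max(degree.values())
    numerator = sum(d_max - d for d in degree.values())
    return numerator / ((n - 1) * (n - 2))


star = {"c": ["1", "2", "3", "4"], "1": ["c"], "2": ["c"], "3": ["c"], "4": ["c"]}
ring = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(degree_centralization(star))  # 1.0: the star is maximally centralized
print(degree_centralization(ring))  # 0.0: all nodes have the same degree
```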

Closeness centrality

In connected graphs there is a natural distance metric between all pairs of nodes, defined by the length of their shortest paths. The farness of a node s is defined as the sum of its distances to all other nodes, and its closeness is defined as the inverse of the farness.[6][7] Thus, the more central a node is the lower its total distance to all other nodes. Closeness can be regarded as a measure of how long it will take to spread information from s to all other nodes sequentially.[8]
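For an unweighted graph, farness and closeness can be obtained from a single breadth-first search. The following is a minimal sketch under that assumption; the names `bfs_distances` and `closeness`, and the adjacency-dictionary input, are illustrative choices rather than anything prescribed by the sources cited here.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj, s):
    """Reciprocal of the farness of s; 0.0 if some node is unreachable."""
    dist = bfs_distances(adj, s)
    if len(dist) < len(adj):
        return 0.0  # an unreachable node makes the farness infinite
    farness = sum(dist.values())
    return 1.0 / farness if farness > 0 else 0.0


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(closeness(path, "b"))  # 0.5, since the farness of b is 1 + 1
```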

In the classic definition of the closeness centrality, the spread of information is modeled by the use of shortest paths. This model might not be the most realistic for all types of communication scenarios. Thus, related definitions have been discussed to measure closeness, like the random walk closeness centrality introduced by Noh and Rieger (2004). It measures the speed with which randomly walking messages reach a vertex from elsewhere in the network—a sort of random-walk version of closeness centrality.[9]

The information centrality of Stephenson and Zelen (1989) is another closeness measure, which bears some similarity to that of Noh and Rieger. In essence it measures the harmonic mean of the resistance distances towards a vertex i, which is smaller if i has many paths of small resistance connecting it to other vertices.[10]

Because the distance between nodes in different components is infinite, the classic closeness centrality of every node in a disconnected graph would be 0. In a work by Dangalchev (2006) relating to network vulnerability, the definition of closeness is modified so that it can be calculated more easily and can also be applied to graphs that lack connectivity:[11]

C_C(v)=\sum_{t \in V\setminus \{v\}}2^{-d_G(v,t)}.

Another extension to networks with disconnected components has been proposed by Opsahl (2010),[12] and later studied by Boldi and Vigna (2013) [13] in general directed graphs:

C_H(x)= \sum_{y \neq x}\frac{1}{d(y,x)}

The formula above, with the convention 1/\infty=0, defines harmonic centrality. It is a natural modification of Bavelas's definition of closeness following the general principle proposed by Marchiori and Latora (2000) [14] that in networks with infinite distances the harmonic mean behaves better than the arithmetic mean. Indeed, Bavelas's closeness can be described as the denormalized reciprocal of the arithmetic mean of distances, whereas harmonic centrality is the denormalized reciprocal of the harmonic mean of distances.
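Both variants for disconnected graphs only need the finite distances, because an unreachable node contributes 2^{-\infty}=0 or 1/\infty=0. The sketch below assumes an unweighted, undirected graph (so that d(y,x)=d(x,y)) given as an adjacency dictionary; the function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def dangalchev_closeness(adj, v):
    """Sum of 2^(-d(v,t)) over all reachable t other than v."""
    dist = bfs_distances(adj, v)
    return sum(2.0 ** -d for t, d in dist.items() if t != v)


def harmonic_centrality(adj, x):
    """Sum of 1/d(y,x) over all reachable y other than x."""
    dist = bfs_distances(adj, x)
    return sum(1.0 / d for y, d in dist.items() if y != x)


# Two components: the path a-b-c and the isolated node z.
g = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "z": []}
print(dangalchev_closeness(g, "a"), harmonic_centrality(g, "a"))  # 0.75 1.5
print(dangalchev_closeness(g, "z"), harmonic_centrality(g, "z"))  # 0 0
```

For a directed graph, the harmonic formula uses distances towards x, so the search would have to follow incoming rather than outgoing links.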

Betweenness centrality

[Figure: a graph in which hue (from red = 0 to blue = max) shows the node betweenness.]

Betweenness is a centrality measure of a vertex within a graph (there is also edge betweenness, which is not discussed here). Betweenness centrality quantifies the number of times a node acts as a bridge along the shortest path between two other nodes. It was introduced as a measure for quantifying the control of a human on the communication between other humans in a social network by Linton Freeman.[15] In his conception, vertices that have a high probability to occur on a randomly chosen shortest path between two randomly chosen vertices have a high betweenness.

The betweenness of a vertex v in a graph G:=(V,E) with |V| vertices is computed as follows:

  1. For each pair of vertices (s,t), compute the shortest paths between them.
  2. For each pair of vertices (s,t), determine the fraction of shortest paths that pass through the vertex in question (here, vertex v).
  3. Sum this fraction over all pairs of vertices (s,t).

More compactly the betweenness can be represented as:[16]

C_B(v)= \sum_{s \neq v \neq t \in V}\frac{\sigma_{st}(v)}{\sigma_{st}}

where \sigma_{st} is the total number of shortest paths from node s to node t and \sigma_{st}(v) is the number of those paths that pass through v. The betweenness may be normalised by dividing by the number of pairs of vertices not including v, which for directed graphs is (n-1)(n-2) and for undirected graphs is (n-1)(n-2)/2. For example, in an undirected star graph, the center vertex (which lies on every shortest path between two leaves) would have a betweenness of (n-1)(n-2)/2 (1, if normalised) while the leaves (which are interior to no shortest path) would have a betweenness of 0.
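The three steps and the compact formula can be followed literally for small unweighted graphs. The sketch below is a deliberately naive illustration, not an efficient algorithm: it counts shortest paths with one breadth-first search per node and uses the fact that \sigma_{st}(v)=\sigma_{sv}\sigma_{vt} whenever d(s,v)+d(v,t)=d(s,t). The function names and the adjacency-dictionary input are assumptions of this example.

```python
from collections import deque
from itertools import permutations


def count_shortest_paths(adj, s):
    """Return (dist, sigma): distances from s and numbers of shortest paths."""
    dist = {s: 0}
    sigma = {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:  # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(adj, v, directed=False):
    """Sum over pairs (s, t) of the fraction of shortest paths through v."""
    info = {s: count_shortest_paths(adj, s) for s in adj}
    dist_v, sigma_v = info[v]
    total = 0.0
    for s, t in permutations(adj, 2):  # ordered pairs with s != t
        if v in (s, t):
            continue
        dist_s, sigma_s = info[s]
        if t not in dist_s or v not in dist_s or t not in dist_v:
            continue  # no path from s to t, or none through v
        if dist_s[v] + dist_v[t] == dist_s[t]:
            total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    # In an undirected graph each unordered pair was visited twice.
    return total if directed else total / 2.0


star = {"c": ["1", "2", "3", "4"], "1": ["c"], "2": ["c"], "3": ["c"], "4": ["c"]}
n = len(star)
print(betweenness(star, "c"))                            # 6.0 = (n-1)(n-2)/2
print(betweenness(star, "c") / ((n - 1) * (n - 2) / 2))  # 1.0 when normalised
print(betweenness(star, "1"))                            # 0.0 for a leaf
```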

Computing the betweenness and closeness centralities of all vertices in a graph involves calculating the shortest paths between all pairs of vertices, which requires \Theta(|V|^3) time with the Floyd–Warshall algorithm. However, on sparse graphs, Johnson's algorithm may be more efficient, taking O(|V|^2 \log |V| + |V| |E|) time. In the case of unweighted graphs the calculations can be done with Brandes' algorithm[16] which takes O(|V| |E|) time. Normally, these algorithms assume that graphs are undirected and connected with the allowance of loops and multiple edges. When specifically dealing with network graphs, often graphs are without loops or multiple edges to maintain simple relationships (where edges represent connections between two people or vertices). For such undirected graphs, the final centrality scores produced by Brandes' algorithm are divided by 2, because each shortest path is counted twice (once from each endpoint).[16]
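A compact Python sketch of the unweighted variant of Brandes' algorithm is given below for illustration. It follows the usual two-phase structure (a breadth-first search that counts shortest paths, then a back-propagation of dependencies in order of decreasing distance); the function name, the adjacency-dictionary input and the `directed` flag are assumptions of this example and not taken from the cited paper.

```python
from collections import deque


def brandes_betweenness(adj, directed=False):
    """Betweenness of every node of an unweighted graph in O(|V||E|) time."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, recording path counts and predecessors.
        order = []                      # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    if not directed:
        # Each shortest path was found once from each of its endpoints.
        for v in cb:
            cb[v] /= 2.0
    return cb


star = {"c": ["1", "2", "3", "4"], "1": ["c"], "2": ["c"], "3": ["c"], "4": ["c"]}
print(brandes_betweenness(star))
```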

Eigenvector centrality

Eigenvector centrality is a measure of the influence of a node in a network. It assigns relative scores to all nodes in the network based on the concept that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes. Google's PageRank is a variant of the eigenvector centrality measure.[17] Another closely related centrality measure is Katz centrality.

Using the adjacency matrix to find eigenvector centrality

For a given graph G:=(V,E) with |V| vertices, let A = (a_{v,t}) be the adjacency matrix, i.e. a_{v,t} = 1 if vertex v is linked to vertex t, and a_{v,t} = 0 otherwise. The centrality score of vertex v can be defined as:

x_v = \frac{1}{\lambda} \sum_{t \in M(v)}x_t = \frac{1}{\lambda} \sum_{t \in G} a_{v,t}x_t

where M(v) is a set of the neighbors of v and \lambda is a constant. With a small rearrangement this can be rewritten in vector notation as the eigenvector equation

\mathbf{Ax} = {\lambda}\mathbf{x}

In general, there will be many different eigenvalues \lambda for which an eigenvector solution exists. However, the additional requirement that all the entries in the eigenvector be positive implies (by the Perron–Frobenius theorem) that only the greatest eigenvalue results in the desired centrality measure.[18] The v^{th} component of the related eigenvector then gives the centrality score of the vertex v in the network. Power iteration is one of many eigenvalue algorithms that may be used to find this dominant eigenvector.[17] Furthermore, this can be generalized so that the entries in A can be real numbers representing connection strengths, as in a stochastic matrix.
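Power iteration can be sketched in a few lines of pure Python. The version below assumes a connected, undirected graph given as an adjacency dictionary. As a labelled deviation from plain power iteration, it multiplies by A + I instead of A: the shift leaves the eigenvectors unchanged but avoids the oscillation that plain iteration shows on bipartite graphs. The function name and the default tolerances are assumptions of this example.

```python
import math


def eigenvector_centrality(adj, iterations=1000, tol=1e-10):
    """Approximate the dominant eigenvector of A by shifted power iteration."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # One multiplication by (A + I): own score plus the neighbours' scores.
        new = {v: x[v] + sum(x[t] for t in adj[v]) for v in adj}
        norm = math.sqrt(sum(value * value for value in new.values()))
        new = {v: value / norm for v, value in new.items()}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


star = {"c": ["1", "2", "3"], "1": ["c"], "2": ["c"], "3": ["c"]}
scores = eigenvector_centrality(star)
# For this star the centre-to-leaf score ratio approaches sqrt(3),
# the largest eigenvalue of its adjacency matrix.
print(scores["c"] / scores["1"])
```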

Katz centrality and PageRank

Katz centrality[19] is a generalization of degree centrality. Degree centrality measures the number of direct neighbors, and Katz centrality measures the number of all nodes that can be connected through a path, while the contributions of distant nodes are penalized. Mathematically, it is defined as x_i = \sum_{k=1}^{\infty}\sum_{j=1}^N \alpha^k (A^k)_{ji} where \alpha is an attenuation factor in (0,1), chosen smaller than the reciprocal of the largest eigenvalue of A so that the series converges.

Katz centrality can be viewed as a variant of eigenvector centrality. Another form of Katz centrality is x_i = \alpha \sum_{j =1}^N a_{ij}(x_j+1). Compared to the expression of eigenvector centrality, x_j is replaced by x_j+1.

It has been shown[20] that the principal eigenvector (associated with the largest eigenvalue \lambda of A, the adjacency matrix) is the limit of Katz centrality as \alpha approaches 1/\lambda from below.
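The recursive form x_i = \alpha \sum_j a_{ij}(x_j+1) suggests a simple fixed-point iteration, sketched below for an undirected graph stored as an adjacency dictionary. The function name, the starting vector of zeros and the stopping rule are assumptions of this example; the iteration only converges when \alpha is below 1/\lambda.

```python
def katz_centrality(adj, alpha, iterations=1000, tol=1e-12):
    """Iterate x_i = alpha * sum_j a_ij * (x_j + 1) until it stabilises."""
    x = {v: 0.0 for v in adj}
    for _ in range(iterations):
        new = {i: alpha * sum(x[j] + 1.0 for j in adj[i]) for i in adj}
        if max(abs(new[i] - x[i]) for i in adj) < tol:
            return new
        x = new
    raise ValueError("no convergence; alpha may not be below 1/lambda")


# Triangle: the largest eigenvalue is 2, so alpha must stay below 0.5.
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
print(katz_centrality(triangle, alpha=0.25))
```

In the triangle every node satisfies x = 2\alpha(x+1), so with \alpha = 0.25 each score tends to 2\alpha/(1-2\alpha) = 1.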

PageRank satisfies the following equation x_i = \alpha \sum_{j } a_{ji}\frac{x_j}{L(j)} + \frac{1-\alpha}{N}, where L(j) = \sum_{i} a_{ji} is the number of neighbors of node j (or number of outbound links in a directed graph). Compared to eigenvector centrality and Katz centrality, one major difference is the scaling factor L(j). Another difference between PageRank and eigenvector centrality is that the PageRank vector is a left hand eigenvector (note the factor a_{ji} has indices reversed).[21]
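The PageRank equation can likewise be iterated directly. The sketch below assumes a directed graph given as a dictionary from each node to the list of nodes it links to, so that L(j) is the length of that list. The equation leaves the case L(j)=0 undefined; spreading the score of such a node evenly over all nodes is a common convention adopted here as an assumption, as are the function name and the default \alpha = 0.85.

```python
def pagerank(out_links, alpha=0.85, iterations=500, tol=1e-12):
    """Iterate x_i = alpha * sum_j a_ji * x_j / L(j) + (1 - alpha) / N."""
    nodes = list(out_links)
    n = len(nodes)
    x = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - alpha) / n for v in nodes}
        for j in nodes:
            targets = out_links[j]
            if targets:
                share = alpha * x[j] / len(targets)  # x_j / L(j), damped
                for i in targets:
                    new[i] += share
            else:
                # Assumed convention: a node without outbound links
                # distributes its score uniformly over all nodes.
                share = alpha * x[j] / n
                for i in nodes:
                    new[i] += share
        if max(abs(new[v] - x[v]) for v in nodes) < tol:
            return new
        x = new
    return x


web = {"a": ["b"], "b": ["a"], "c": ["a"]}
print(pagerank(web))
```

In this small example node c receives no links, so its score settles at the baseline (1-\alpha)/N = 0.05.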

Percolation Centrality

A slew of centrality measures exist to determine the ‘importance’ of a single node in a complex network. However, these measures quantify the importance of a node in purely topological terms, and the value of the node does not depend on the ‘state’ of the node in any way. It remains constant regardless of network dynamics. This is true even for the weighted betweenness measures. However, a node may very well be centrally located in terms of betweenness centrality or another centrality measure, but may not be ‘centrally’ located in the context of a network in which there is percolation.

Percolation of a ‘contagion’ occurs in complex networks in a number of scenarios. For example, viral or bacterial infection can spread over social networks of people, known as contact networks. The spread of disease can also be considered at a higher level of abstraction, by contemplating a network of towns or population centres, connected by road, rail or air links. Computer viruses can spread over computer networks. Rumours or news about business offers and deals can also spread via social networks of people. In all of these scenarios, a ‘contagion’ spreads over the links of a complex network, altering the ‘states’ of the nodes as it spreads, either recoverably or otherwise.

For example, in an epidemiological scenario, individuals go from ‘susceptible’ to ‘infected’ state as the infection spreads. The states the individual nodes can take in the above examples could be binary (such as received/not received a piece of news), discrete (susceptible/infected/recovered), or even continuous (such as the proportion of infected people in a town), as the contagion spreads. The common feature in all these scenarios is that the spread of contagion results in the change of node states in networks. Percolation centrality (PC), proposed by Piraveenan et al.,[22] was designed with this in mind: it specifically measures the importance of nodes in terms of aiding the percolation through the network.

The Percolation Centrality is defined for a given node, at a given time, as the proportion of ‘percolated paths’ that go through that node. A ‘percolated path’ is a shortest path between a pair of nodes, where the source node is percolated (e.g., infected). The target node can be percolated or non-percolated, or in a partially percolated state.

PC^t(v)= \frac{1}{N-2}\sum_{s \neq v \neq r}\frac{\sigma_{sr}(v)}{\sigma_{sr}}\frac{x^t_s}{\sum_{i} x^t_i - x^t_v}

where \sigma_{sr} is the total number of shortest paths from node s to node r and \sigma_{sr}(v) is the number of those paths that pass through v. The percolation state of node i at time t is denoted by x^t_i. Two special cases are x^t_i=0, which indicates a non-percolated state at time t, and x^t_i=1, which indicates a fully percolated state at time t. The values in between indicate partially percolated states (e.g., in a network of townships, this would be the percentage of people infected in that town).

The weights attached to the percolation paths depend on the percolation levels assigned to the source nodes, based on the premise that the higher the percolation level of a source node is, the more important are the paths that originate from that node. Nodes which lie on shortest paths originating from highly-percolated nodes are therefore potentially more important to the percolation. The definition of PC may also be extended to include target node weights as well. For a network with N nodes and M links, percolation centrality calculations run in O(NM) time with an efficient implementation adapted from Brandes' fast algorithm, and if the calculation needs to consider target node weights, the worst case time is O(N^3).
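The definition with source weights only can be evaluated directly on small unweighted graphs. The sketch below is a naive illustration rather than the O(NM) implementation mentioned above. It assumes that the sum runs over ordered pairs (s, r), takes an adjacency dictionary and a dictionary of percolation states at one fixed time, and returns 0 when the denominator vanishes; these choices and the function names are assumptions of this example.

```python
from collections import deque
from itertools import permutations


def count_shortest_paths(adj, s):
    """Return (dist, sigma): distances from s and numbers of shortest paths."""
    dist = {s: 0}
    sigma = {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def percolation_centrality(adj, states, v):
    """PC of node v for the percolation states x_i given in `states`."""
    n = len(adj)
    denominator = sum(states.values()) - states[v]
    if n < 3 or denominator <= 0:
        return 0.0  # assumed convention when the formula is undefined
    info = {u: count_shortest_paths(adj, u) for u in adj}
    dist_v, sigma_v = info[v]
    total = 0.0
    for s, r in permutations(adj, 2):  # ordered (source, target) pairs
        if v in (s, r):
            continue
        dist_s, sigma_s = info[s]
        if r not in dist_s or v not in dist_s or r not in dist_v:
            continue
        if dist_s[v] + dist_v[r] == dist_s[r]:
            fraction = sigma_s[v] * sigma_v[r] / sigma_s[r]
            total += fraction * states[s] / denominator
    return total / (n - 2)


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
infected = {"a": 1.0, "b": 0.0, "c": 0.0}
print(percolation_centrality(path, infected, "b"))  # 1.0
print(percolation_centrality(path, infected, "c"))  # 0.0
```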

Cross-Clique Centrality

The cross-clique centrality of a single node in a complex graph measures the connectivity of the node to different cliques. A node with high cross-clique connectivity facilitates the propagation of information or disease in a graph. Cliques are subgraphs in which every node is connected to every other node in the clique. The cross-clique connectivity of a node v for a given graph G:=(V,E) with |V| vertices and |E| edges is defined as X(v), the number of cliques to which vertex v belongs. This measure was proposed in Faghani (2013).[23]
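The text does not say whether all cliques or only maximal cliques are counted. The sketch below assumes maximal cliques and enumerates them with the basic Bron–Kerbosch recursion (without pivoting), which is adequate for small graphs only. The graph is assumed to be undirected and stored as a dictionary mapping each node to a set of neighbours; the function names are hypothetical.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already handled
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def cross_clique_centrality(adj):
    """X(v): number of maximal cliques containing v (assumed reading)."""
    counts = {v: 0 for v in adj}
    for clique in maximal_cliques(adj):
        for v in clique:
            counts[v] += 1
    return counts


# Two triangles, {a, b, c} and {c, d, e}, sharing the node c.
bowtie = {
    "a": {"b", "c"}, "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"c", "e"}, "e": {"c", "d"},
}
print(cross_clique_centrality(bowtie))  # c belongs to 2 cliques, the rest to 1
```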

Centralization

The centralization of any network is a measure of how central its most central node is in relation to how central all the other nodes are.[24] The general definition of centralization for non-weighted networks was proposed by Linton Freeman (1979). Centralization measures (a) calculate the sum of differences in centrality between the most central node in a network and all other nodes; and (b) divide this quantity by the theoretically largest such sum of differences in any network of the same size.[24] Thus, every centrality measure can have its own centralization measure. Defined formally, if C_x(p_i) is any centrality measure of point i, if C_x(p_*) is the largest such measure in the network, and if \max \sum_{i=1}^{N} [C_x(p_*)-C_x(p_i)] is the largest sum of differences in point centrality C_x for any graph with the same number of nodes, then the centralization of the network is:[24]

C_x=\frac{\sum_{i=1}^{N} [C_x(p_*)-C_x(p_i)]}{\max \sum_{i=1}^{N} [C_x(p_*)-C_x(p_i)]}

Extensions

Empirical and theoretical research have extended the concept of centrality in the context of static networks to dynamic centrality[25] in the context of time-dependent and temporal networks.[26][27][28]

For generalizations to weighted networks, see Opsahl et al. (2010).[29]

The concept of centrality was extended to a group level as well. For example, Group Betweenness centrality shows the proportion of geodesics connecting pairs of non-group members that pass through the group.[30][31]

Notes and references

  1. ^ Newman, M.E.J. 2010. Networks: An Introduction. Oxford, UK: Oxford University Press.
  2. ^ a b Borgatti, Stephen P.; Everett, Martin G. (2005). "A Graph-Theoretic Perspective on Centrality". Social Networks (Elsevier) 28: 466–484. doi:10.1016/j.socnet.2005.11.005. 
  3. ^ Koschützki, Dirk; Katharina A. Lehmann; Leon Peeters; Stefan Richter; Dagmar Tenfelde-Podehl; Oliver Zlotowski (2005). "Centrality Indices". In Ulrik Brandes, Thomas Erlebach. Network Analysis – Methodological Foundations. LNCS 3418. Springer Verlag, Heidelberg, Germany. pp. 16–60. ISBN 978-3-540-24979-5. 
  4. ^ Stephen P. Borgatti (2005). "Centrality and Network Flow". Social Networks (Elsevier) 27: 55–71. 
  5. ^ Ulrik Brandes, Thomas Erlebach. Network Analysis – Methodological Foundations. LNCS 3418. Springer Verlag, Heidelberg, Germany. pp. 16–60. ISBN 978-3-540-24979-5.
  6. ^ Alex Bavelas. Communication patterns in task-oriented groups. J. Acoust. Soc. Am, 22(6):725–730, 1950.
  7. ^ Sabidussi, G. (1966) The centrality index of a graph. Psychometrika 31, 581–603.
  8. ^ M.E.J. Newman (2005), "A measure of betweenness centrality based on random walks", Social Networks 27: 39–54, arXiv:cond-mat/0309045, doi:10.1016/j.socnet.2004.11.009 . Papercore summary http://papercore.org/Newman2005.
  9. ^ J. D. Noh and H. Rieger, Phys. Rev. Lett. 92, 118701 (2004).
  10. ^ Stephenson, K. A. and Zelen, M., 1989. Rethinking centrality: Methods and examples. Social Networks 11, 1–37.
  11. ^ Dangalchev Ch., Residual Closeness in Networks, Physica A 365, 556 (2006).
  12. ^ Tore Opsahl. Closeness centrality in networks with disconnected components. 
  13. ^ Boldi, Paolo; Vigna, Sebastiano (2014), "Axioms for Centrality", Internet Mathematics 
  14. ^ Massimo Marchiori and Vito Latora. Harmony in the small-world. Physica A: Statistical Mechanics and its Applications, 285(3-4):539 – 546, 2000
  15. ^ Freeman, Linton (1977). "A set of measures of centrality based upon betweenness". Sociometry 40: 35–41. 
  16. ^ a b c Brandes, Ulrik (2001). "A faster algorithm for betweenness centrality" (PDF). Journal of Mathematical Sociology 25: 163–177. Retrieved 10.11.2011. Papercore summary http://papercore.org/Brandes2001
  17. ^ a b http://www.ams.org/samplings/feature-column/fcarc-pagerank
  18. ^ M. E. J. Newman. The mathematics of networks (PDF). Retrieved 2006-11-09. 
  19. ^ Katz, L. 1953. A New Status Index Derived from Sociometric Analysis. Psychometrika, 39–43.
  20. ^ Bonacich, P., 1991. Simultaneous group and individual centralities. Social Networks 13, 155–168.
  21. ^ How does Google rank webpages? 20Q: About Networked Life
  22. ^ Piraveenan, Mahendra (2013). "Percolation Centrality: Quantifying Graph-Theoretic Impact of Nodes during Percolation in Networks". PLoS ONE. 
  23. ^ Faghani, Mohammad Reza (2013). "A Study of XSS Worm Propagation and Detection Mechanisms in Online Social Networks". IEEE Trans. Inf. Forensics and Security. 
  24. ^ a b c Freeman, L. C. (1979). Centrality in social networks: Conceptual clarification. Social Networks, 1(3), 215–239.
  25. ^ Braha, D. and Bar-Yam, Y. 2006. "From Centrality to Temporary Fame: Dynamic Centrality in Complex Networks." Complexity 12: 59-63.
  26. ^ Hill, S.A. and Braha, D. 2010. "Dynamic Model of Time-Dependent Complex Networks." Physical Review E 82, 046105.
  27. ^ Gross, T. and Sayama, H. (Eds.). 2009. Adaptive Networks: Theory, Models and Applications. Springer.
  28. ^ Holme, P. and Saramäki, J. 2013. Temporal Networks. Springer.
  29. ^ Opsahl, Tore; Agneessens, Filip; Skvoretz, John (2010). "Node centrality in weighted networks: Generalizing degree and shortest paths". Social Networks 32 (3): 245. doi:10.1016/j.socnet.2010.03.006. 
  30. ^ Everett, M. G. and Borgatti, S. P. (2005). Extending centrality. In P. J. Carrington, J. Scott and S. Wasserman (Eds.), Models and methods in social network analysis (pp. 57-76). New York: Cambridge University Press.
  31. ^ Puzis, R., Yagil, D., Elovici, Y., Braha, D. (2009). Collaborative attack on Internet users’ anonymity, Internet Research 19(1)

Further reading

  • Freeman, L. C. (1979). Centrality in social networks: Conceptual clarification. Social Networks, 1(3), 215–239.
  • Sabidussi, G. (1966). The centrality index of a graph. Psychometrika, 31 (4), 581–603.
  • Freeman, L. C. (1977). A set of measures of centrality based on betweenness. Sociometry 40, 35–41.
  • Koschützki, D.; Lehmann, K. A.; Peeters, L.; Richter, S.; Tenfelde-Podehl, D. and Zlotowski, O. (2005) Centrality Indices. In Brandes, U. and Erlebach, T. (Eds.) Network Analysis: Methodological Foundations, pp. 16–61, LNCS 3418, Springer-Verlag.
  • Bonacich, P. (1987). Power and Centrality: A Family of Measures, The American Journal of Sociology, 92 (5), pp 1170–1182.
