Centrality

For the statistical concept, see Central tendency.

In graph theory and network analysis, indicators of centrality identify the most important vertices within a graph. Applications include identifying the most influential person(s) in a social network, key infrastructure nodes in the Internet or urban networks, and super spreaders of disease. Centrality concepts were first developed in social network analysis, and many of the terms used to measure centrality reflect their sociological origin.[1]

Definition and characterization of centrality indices

Centrality indices are answers to the question "What characterizes an important vertex?" The answer is given in terms of a real-valued function on the vertices of a graph, where the values produced are expected to provide a ranking which identifies the most important nodes.[2][3]

The word "importance" has a wide range of meanings, leading to many different definitions of centrality. Two categorization schemes have been proposed. "Importance" can be conceived in relation to a type of flow or transfer across the network. This allows centralities to be classified by the type of flow they consider important.[3] "Importance" can alternately be conceived as involvement in the cohesiveness of the network. This allows centralities to be classified based on how they measure cohesiveness.[4] Both of these approaches divide centralities into distinct categories. A further conclusion is that a centrality which is appropriate for one category will often "get it wrong" when applied to a different category.[3]

When centralities are categorized by their approach to cohesiveness, it becomes apparent that the majority of centralities inhabit one category. Measures in this category count the walks starting from a given vertex and differ only in how walks are defined and counted. Restricting consideration to this group allows for a soft characterization which places centralities on a spectrum from walks of length one (degree centrality) to infinite walks (eigenvalue centrality).[2][5] The observation that many centralities share this familial relationship perhaps explains the high rank correlations between these indices.

Characterization by network flows

A network can be considered a description of the paths along which something flows. This allows a characterization based on the type of flow and the type of path encoded by the centrality. A flow can be based on transfers, where each indivisible item goes from one node to another, like a package delivery which goes from the delivery site to the client's house. A second case is serial duplication, in which the item is replicated as it passes to the next node, so both the source and the target have it. An example is the propagation of information through gossip, with the information being propagated in a private way and with both the source and the target nodes being informed at the end of the process. The last case is parallel duplication, with the item being duplicated to several links at the same time, like a radio broadcast which provides the same information to many listeners at once.[3]

Likewise, the type of path can be constrained to: Geodesics (shortest paths), paths (no vertex is visited more than once), trails (vertices can be visited multiple times, no edge is traversed more than once), or walks (vertices and edges can be visited/traversed multiple times).[3]

Characterization by walk structure

An alternate classification can be derived from how the centrality is constructed. This again splits into two classes. Centralities are either radial or medial. Radial centralities count walks which start or end at the given vertex. The degree and eigenvalue centralities are examples of radial centralities, counting the number of walks of length one or length infinity. Medial centralities count walks which pass through the given vertex. The canonical example is Freeman's betweenness centrality, the number of shortest paths which pass through the given vertex.[4]

Likewise, the counting can capture either the volume or the length of walks. Volume is the total number of walks of the given type. The three examples from the previous paragraph fall into this category. Length captures the distance from the given vertex to the remaining vertices in the graph. Freeman's closeness centrality, the total geodesic distance from a given vertex to all other vertices, is the best known example.[4] Note that this classification is independent of the type of walk counted (i.e. walk, trail, path, geodesic).

Borgatti and Everett propose that this typology provides insight into how best to compare centrality measures. Centralities placed in the same box in this 2×2 classification are similar enough to make plausible alternatives; one can reasonably compare which is better for a given application. Measures from different boxes, however, are categorically distinct. Any evaluation of relative fitness can only occur within the context of predetermining which category is more applicable, rendering the comparison moot.[4]

Radial-volume centralities exist on a spectrum

The characterization by walk structure shows that almost all centralities in wide use are radial-volume measures. These encode the belief that a vertex's centrality is a function of the centrality of the vertices it is associated with. Centralities distinguish themselves on how association is defined.

Bonacich showed that if association is defined in terms of walks, then a family of centralities can be defined based on the length of walk considered.[2] The degree counts walks of length one, the eigenvalue centrality counts walks of length infinity. Alternate definitions of association are also reasonable. The alpha centrality allows vertices to have an external source of influence. Estrada's subgraph centrality proposes only counting closed paths (triangles, squares, ...).

The heart of such measures is the observation that powers of the graph's adjacency matrix give the number of walks of length given by that power. Similarly, the matrix exponential is also closely related to the number of walks of a given length. An initial transformation of the adjacency matrix allows differing definitions of the type of walk counted. Under either approach, the centrality of a vertex can be expressed as an infinite sum, either

$\sum_{k=0}^\infty \beta^k A_{R}^{k}$

for matrix powers or

$\sum_{k=0}^\infty \frac{(\beta A_R)^k}{k!}$

for matrix exponentials, where

• $k$ is walk length,
• $A_R$ is the transformed adjacency matrix, and
• $\beta$ is a discount parameter which ensures convergence of the sum.

Bonacich's family of measures does not transform the adjacency matrix. The alpha centrality replaces the adjacency matrix with its resolvent. The subgraph centrality replaces the adjacency matrix with its trace. A startling conclusion is that regardless of the initial transformation of the adjacency matrix, all such approaches have common limiting behavior. As $\beta$ approaches zero, the indices converge to the degree centrality. As $\beta$ approaches its maximal value, the indices converge to the eigenvalue centrality.[5]
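As a rough illustration of the walk-counting idea, the following Python sketch multiplies a small adjacency matrix by itself and accumulates a truncated version of the power-series sum. The example graph, the function names, the truncation length, and the use of row sums to turn the matrix into one score per vertex are all assumptions made for illustration; the sum here starts at $k=1$, which only drops a constant from every score.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_scores(adj, beta, max_len):
    """Row sums of sum_{k=1..max_len} beta^k A^k (a truncated series)."""
    n = len(adj)
    power = [row[:] for row in adj]   # holds A^k, starting with k = 1
    weight = beta                     # holds beta^k
    scores = [0.0] * n
    for _ in range(max_len):
        for i in range(n):
            # sum(power[i]) is the number of walks of length k leaving i
            scores[i] += weight * sum(power[i])
        power = mat_mul(power, adj)
        weight *= beta
    return scores


# Hypothetical example: path 0 - 1 - 2 - 3 with an extra edge 1 - 4.
A = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
]

A2 = mat_mul(A, A)
walks_0_to_2 = A2[0][2]        # walks of length two from 0 to 2
closed_walks_at_1 = A2[1][1]   # walks of length two from 1 back to 1

degrees = [sum(row) for row in A]
small_beta = walk_scores(A, beta=0.01, max_len=10)
```

With a small discount such as `beta=0.01`, the scores divided by `beta` stay close to the degrees, which is consistent with the limiting behavior described above.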

Important limitations

Centrality indices have two important limitations, one obvious and the other subtle. The obvious limitation is that a centrality which is optimal for one application is often sub-optimal for a different application. Indeed, if this were not so, we would not need so many different centralities.

The more subtle limitation is the commonly held fallacy that vertex centrality indicates the relative importance of vertices. Centrality indices are explicitly designed to produce a ranking which allows indication of the most important vertices.[2][3] This they do well, under the limitation just noted. The error is two-fold. Firstly, a ranking only orders vertices by importance; it does not quantify the difference in importance between different levels of the ranking. Secondly, and more importantly, the features which (correctly) identify the most important vertices in a given network/application do not generalize to the remaining vertices. The rankings are meaningless for the vast majority of network nodes. This explains why, for example, only the first few results of a Google image search appear in a reasonable order.

While the failure of centrality indices to generalize to the rest of the network may at first seem counter-intuitive, it follows directly from the above definitions. Complex networks have heterogeneous topology. To the extent that the optimal measure depends on the network structure of the most important vertices, a measure which is optimal for such vertices is sub-optimal for the remainder of the network. [6]

Degree centrality

Main article: Degree (graph theory)

Historically first and conceptually simplest is degree centrality, which is defined as the number of links incident upon a node (i.e., the number of ties that a node has). The degree can be interpreted in terms of the immediate risk of a node for catching whatever is flowing through the network (such as a virus, or some information). In the case of a directed network (where ties have direction), we usually define two separate measures of degree centrality, namely indegree and outdegree. Accordingly, indegree is a count of the number of ties directed to the node and outdegree is the number of ties that the node directs to others. When ties are associated with some positive aspect such as friendship or collaboration, indegree is often interpreted as a form of popularity, and outdegree as gregariousness.

The degree centrality of a vertex $v$, for a given graph $G:=(V,E)$ with $|V|$ vertices and $|E|$ edges, is defined as

$C_D(v)= \deg(v)$

Calculating degree centrality for all the nodes in a graph takes $\Theta(V^2)$ time in a dense adjacency matrix representation of the graph, and $\Theta(E)$ time in a sparse matrix representation.
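A minimal Python sketch of these counts, assuming the graph is supplied as a list of edges (the function names and the example data are hypothetical):

```python
from collections import Counter


def degree_centrality(edges):
    """Undirected degree: number of edges incident on each node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def in_out_degree(arcs):
    """Directed case: (indegree, outdegree) for arcs given as (tail, head)."""
    indeg, outdeg = Counter(), Counter()
    for tail, head in arcs:
        outdeg[tail] += 1
        indeg[head] += 1
    return dict(indeg), dict(outdeg)


# Hypothetical example; nodes without any edge would not appear in the output.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
deg = degree_centrality(edges)
indeg, outdeg = in_out_degree(edges)   # same pairs read as directed arcs
```

Each edge is visited once, so the work is proportional to the number of edges, in line with the sparse-representation cost mentioned above.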

The definition of centrality on the node level can be extended to the whole graph, in which case we are speaking of graph centralization.[7] Let $v^*$ be the node with highest degree centrality in $G$. Let $X:=(Y,Z)$ be the $|Y|$-node connected graph that maximizes the following quantity (with $y^*$ being the node with highest degree centrality in $X$):

$H= \sum^{|Y|}_{j=1} [C_D(y^*)-C_D(y_j)]$

Correspondingly, the degree centralization of the graph $G$ is as follows:

$C_D(G)= \frac{\displaystyle{\sum^{|V|}_{i=1}{[C_D(v^*)-C_D(v_i)]}}}{H}$

The value of $H$ is maximized when the graph $X$ contains one central node to which all other nodes are connected (a star graph), and in this case $H=(n-1)(n-2)$, where $n=|Y|$ is the number of nodes.
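The degree centralization can be sketched in a few lines of Python, assuming the degree sequence is already known and using the star-graph value of $H$ stated above as the denominator (the function name and the example degree sequences are illustrative):

```python
def degree_centralization(degrees):
    """Sum of (max degree - degree), divided by the star-graph maximum H."""
    n = len(degrees)
    if n < 3:
        raise ValueError("H = (n-1)(n-2) is zero for fewer than three nodes")
    top = max(degrees)
    numerator = sum(top - d for d in degrees)
    h = (n - 1) * (n - 2)
    return numerator / h


star = [4, 1, 1, 1, 1]    # one hub joined to four leaves
cycle = [2, 2, 2, 2, 2]   # five nodes in a ring
path = [1, 2, 2, 2, 1]    # five nodes in a line

star_score = degree_centralization(star)
cycle_score = degree_centralization(cycle)
path_score = degree_centralization(path)
```

The star reaches the maximum value of 1, the ring has no degree differences and scores 0, and the path falls in between.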

Closeness centrality

In connected graphs there is a natural distance metric between all pairs of nodes, defined by the length of their shortest paths. The farness of a node x is defined as the sum of its distances from all other nodes, and its closeness was defined by Bavelas as the reciprocal of the farness,[8][9] that is:

$C(x)= \frac{1}{\sum_y d(y,x)}.$

Thus, the more central a node is the lower its total distance from all other nodes. Note that taking distances from or to all other nodes is irrelevant in undirected graphs, whereas in directed graphs distances to a node are considered a more meaningful measure of centrality, as in general (e.g., on the web) a node has little control over its incoming links.

When a graph is not strongly connected, a widespread idea is that of using the sum of the reciprocals of distances, instead of the reciprocal of the sum of distances, with the convention $1/\infty=0$:

$H(x)= \sum_{y \neq x}\frac{1}{d(y,x)}.$

This idea was explicitly stated for undirected graphs under the name harmonic centrality by Rochat (2009)[10] and proposed once again later by Opsahl (2010).[11] It was later studied in full generality for directed networks by Boldi and Vigna (2014).[12]

Note that harmonic centrality is a most natural modification of Bavelas's definition of closeness following the general principle proposed by Marchiori and Latora (2000)[13] that in graphs with infinite distances the harmonic mean behaves better than the arithmetic mean. Indeed, Bavelas's closeness can be described as the denormalized reciprocal of the arithmetic mean of distances, whereas harmonic centrality is the denormalized reciprocal of the harmonic mean of distances.
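The two definitions can be compared with a short Python sketch. It assumes an undirected, unweighted graph stored as a dictionary of neighbor lists, so that distances to and from a node coincide; the function names and the example graphs are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj, x):
    """Bavelas closeness: reciprocal of the farness of x."""
    dist = bfs_distances(adj, x)
    if len(dist) < len(adj):
        raise ValueError("some node is unreachable, so the farness is infinite")
    farness = sum(dist.values())
    return 1 / farness if farness else 0.0


def harmonic(adj, x):
    """Sum of reciprocal distances; unreachable nodes contribute zero."""
    dist = bfs_distances(adj, x)
    return sum(1 / d for y, d in dist.items() if y != x)


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}      # connected path 0-1-2-3
split = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}  # same path plus an isolated node

c1 = closeness(path, 1)
h1 = harmonic(path, 1)
h1_split = harmonic(split, 1)
h4_split = harmonic(split, 4)
```

On the disconnected graph `split`, `closeness` raises an error because the farness is infinite, while `harmonic` still returns finite values.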

Dangalchev (2006),[14] in a work on network vulnerability, starts from Marchiori and Latora's work but proposes for undirected graphs a different definition:

$D(x)=\sum_{y\neq x}\frac{1}{2^{d(y,x)}}.$

Note that the original definition[14] uses $d(x,y)$.

The information centrality of Stephenson and Zelen (1989) is another closeness measure, which computes the harmonic mean of the resistance distances towards a vertex x, which is smaller if x has many paths of small resistance connecting it to other vertices.[15]

In the classic definition of closeness centrality, the spread of information is modeled by the use of shortest paths. This model might not be the most realistic for all types of communication scenarios. Thus, related definitions have been discussed to measure closeness, like the random walk closeness centrality introduced by Noh and Rieger (2004). It measures the speed with which randomly walking messages reach a vertex from elsewhere in the graph, a sort of random-walk version of closeness centrality.[16] The hierarchical closeness of Tran and Kwon (2014)[17] is an extended closeness centrality that addresses, in yet another way, the limitation of closeness in graphs that are not strongly connected. It explicitly includes information about the range of other nodes that can be affected by the given node.

Betweenness centrality

[Figure: hue (from red = 0 to blue = max) shows the node betweenness.]

Betweenness is a centrality measure of a vertex within a graph (there is also edge betweenness, which is not discussed here). Betweenness centrality quantifies the number of times a node acts as a bridge along the shortest path between two other nodes. It was introduced by Linton Freeman as a measure for quantifying the control of a human on the communication between other humans in a social network.[18] In his conception, vertices that have a high probability to occur on a randomly chosen shortest path between two randomly chosen vertices have a high betweenness.

The betweenness of a vertex $v$ in a graph $G:=(V,E)$ with $|V|$ vertices is computed as follows:

1. For each pair of vertices (s,t), compute the shortest paths between them.
2. For each pair of vertices (s,t), determine the fraction of shortest paths that pass through the vertex in question (here, vertex v).
3. Sum this fraction over all pairs of vertices (s,t).

More compactly the betweenness can be represented as:[19]

$C_B(v)= \sum_{s \neq v \neq t \in V}\frac{\sigma_{st}(v)}{\sigma_{st}}$

where $\sigma_{st}$ is the total number of shortest paths from node $s$ to node $t$ and $\sigma_{st}(v)$ is the number of those paths that pass through $v$. The betweenness may be normalised by dividing by the number of pairs of vertices not including $v$, which for a graph with $n$ vertices is $(n-1)(n-2)$ for directed graphs and $(n-1)(n-2)/2$ for undirected graphs. For example, in an undirected star graph, the center vertex (which is contained in every possible shortest path) would have a betweenness of $(n-1)(n-2)/2$ (1, if normalised) while the leaves (which are contained in no shortest paths) would have a betweenness of 0.

Computationally, both betweenness and closeness centralities of all vertices in a graph involve calculating the shortest paths between all pairs of vertices, which requires $\Theta(V^3)$ time with the Floyd–Warshall algorithm. However, on sparse graphs, Johnson's algorithm may be more efficient, taking $O(V^2 \log V + V E)$ time. In the case of unweighted graphs the calculations can be done with Brandes' algorithm[19] which takes $O(V E)$ time. Normally, these algorithms assume that graphs are undirected and connected with the allowance of loops and multiple edges. Network graphs, however, are often modeled without loops or multiple edges to maintain simple relationships (where edges represent connections between two people or vertices). In this case, Brandes' algorithm divides the final centrality scores by 2 to account for each shortest path being counted twice.[19]
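The following Python sketch follows the general shape of Brandes' approach for unweighted graphs: one breadth-first search per source to count shortest paths, then a backward pass that accumulates each vertex's share. It assumes a simple graph stored as a dictionary of neighbor lists; the function name, the `directed` flag, and the example graphs are illustrative choices rather than part of the cited algorithm's description.

```python
from collections import deque


def brandes_betweenness(adj, directed=False):
    """Unnormalised betweenness for an unweighted graph {node: [neighbors]}."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        # Forward phase: BFS from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:                # w seen for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:     # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate dependencies, farthest vertices first.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    if not directed:
        # Each undirected pair was counted once from either endpoint.
        for v in cb:
            cb[v] /= 2
    return cb


star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

star_bc = brandes_betweenness(star)
ring_bc = brandes_betweenness(ring)
```

For the five-node star, the center receives $(n-1)(n-2)/2 = 6$ and the leaves receive 0, matching the star-graph example given earlier.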

Eigenvector centrality

Eigenvector centrality is a measure of the influence of a node in a network. It assigns relative scores to all nodes in the network based on the concept that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes. Google's PageRank is a variant of the eigenvector centrality measure.[20] Another closely related centrality measure is Katz centrality.

Using the adjacency matrix to find eigenvector centrality

For a given graph $G:=(V,E)$ with $|V|$ number of vertices let $A = (a_{v,t})$ be the adjacency matrix, i.e. $a_{v,t} = 1$ if vertex $v$ is linked to vertex $t$, and $a_{v,t} = 0$ otherwise. The centrality score of vertex $v$ can be defined as:

$x_v = \frac{1}{\lambda} \sum_{t \in M(v)}x_t = \frac{1}{\lambda} \sum_{t \in G} a_{v,t}x_t$

where $M(v)$ is the set of neighbors of $v$ and $\lambda$ is a constant. With a small rearrangement this can be rewritten in vector notation as the eigenvector equation

$\mathbf{Ax} = {\lambda}\mathbf{x}$

In general, there will be many different eigenvalues $\lambda$ for which an eigenvector solution exists. However, the additional requirement that all the entries in the eigenvector be positive implies (by the Perron–Frobenius theorem) that only the greatest eigenvalue results in the desired centrality measure.[21] The $v^{th}$ component of the related eigenvector then gives the centrality score of the vertex $v$ in the network. Power iteration is one of many eigenvalue algorithms that may be used to find this dominant eigenvector.[20] Furthermore, this can be generalized so that the entries in A can be real numbers representing connection strengths, as in a stochastic matrix.
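A minimal power-iteration sketch in Python, assuming a connected undirected graph given as a 0/1 adjacency matrix. It iterates with $A + I$ rather than $A$: this shift leaves the eigenvectors unchanged but avoids oscillation on bipartite graphs, and it is a choice made for this sketch rather than something prescribed above. The function name, tolerance, and example graph are likewise assumptions.

```python
def eigenvector_centrality(adj, iterations=1000, tol=1e-10):
    """Approximate the dominant eigenvector of adj by power iteration."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iterations):
        # One step of multiplication by (A + I).
        new = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(value * value for value in new) ** 0.5
        new = [value / norm for value in new]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x


# Star graph: vertex 0 is joined to vertices 1, 2 and 3.
A = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]

x = eigenvector_centrality(A)
# Estimate lambda from the first component of A x = lambda x.
lam = sum(A[0][j] * x[j] for j in range(4)) / x[0]
```

For this star the eigenvector equation can be solved by hand: the hub's score is $\sqrt{3}$ times a leaf's score and $\lambda = \sqrt{3}$, which gives a simple check on the iteration.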

Katz centrality and PageRank

Main article: Katz centrality

Katz centrality[22] is a generalization of degree centrality. Degree centrality measures the number of direct neighbors, and Katz centrality measures the number of all nodes that can be connected through a path, while the contributions of distant nodes are penalized. Mathematically, it is defined as $x_i = \sum_{k=1}^{\infty}\sum_{j=1}^N \alpha^k (A^k)_{ji}$ where $\alpha$ is an attenuation factor in $(0,1)$.

Katz centrality can be viewed as a variant of eigenvector centrality. Another form of Katz centrality is $x_i = \alpha \sum_{j =1}^N a_{ij}(x_j+1).$ Compared to the expression of eigenvector centrality, $x_j$ is replaced by $x_j+1$.

It has been shown[23] that the principal eigenvector (associated with the largest eigenvalue $\lambda$ of the adjacency matrix $A$) is the limit of Katz centrality as $\alpha$ approaches $1/\lambda$ from below.
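The recursive form $x_i = \alpha \sum_j a_{ij}(x_j+1)$ suggests a simple fixed-point iteration, sketched below in Python for an undirected graph (so that $a_{ij} = a_{ji}$). The function name, the starting vector, and the stopping rule are assumptions; the iteration is only expected to converge when $\alpha$ is below $1/\lambda$.

```python
def katz_centrality(adj, alpha, iterations=1000, tol=1e-12):
    """Iterate x_i <- alpha * sum_j a_ij * (x_j + 1) until it stabilises."""
    n = len(adj)
    x = [0.0] * n
    for _ in range(iterations):
        new = [alpha * sum(adj[i][j] * (x[j] + 1.0) for j in range(n))
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    raise RuntimeError("no convergence; alpha may not be below 1/lambda")


# Path graph 0 - 1 - 2; its largest eigenvalue is sqrt(2), so alpha = 0.1 is safe.
A = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

katz = katz_centrality(A, alpha=0.1)
```

Solving the three equations by hand for this path gives $x_1 = 0.22/0.98$ for the middle vertex and $x_0 = x_2 = 0.1\,(x_1 + 1)$ for the ends, which the iteration should reproduce.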

PageRank satisfies the following equation $x_i = \alpha \sum_{j } a_{ji}\frac{x_j}{L(j)} + \frac{1-\alpha}{N},$ where $L(j) = \sum_{i} a_{ji}$ is the number of neighbors of node $j$ (or number of outbound links in a directed graph). Compared to eigenvector centrality and Katz centrality, one major difference is the scaling factor $L(j)$. Another difference between PageRank and eigenvector centrality is that the PageRank vector is a left hand eigenvector (note the factor $a_{ji}$ has indices reversed).[24]
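The PageRank equation can be iterated directly, as in the Python sketch below. It assumes a directed graph stored as a dictionary of outbound links in which every node has at least one outbound link; how to treat nodes without outbound links is not covered by the equation above, so the sketch simply rejects them. The function name, the value $\alpha = 0.85$, and the example graph are illustrative.

```python
def pagerank(out_links, alpha=0.85, iterations=1000, tol=1e-12):
    """Iterate x_i = alpha * sum_j a_ji * x_j / L(j) + (1 - alpha) / N."""
    nodes = list(out_links)
    n = len(nodes)
    for j in nodes:
        if not out_links[j]:
            raise ValueError(f"node {j!r} has no outbound links")
    x = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - alpha) / n for v in nodes}
        for j in nodes:
            # Node j splits its current score evenly over its L(j) links.
            share = alpha * x[j] / len(out_links[j])
            for i in out_links[j]:
                new[i] += share
        if max(abs(new[v] - x[v]) for v in nodes) < tol:
            return new
        x = new
    return x


# Hypothetical three-page web: a links to b and c, b links to c, c links to a.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
```

In this example page `c` collects links from both other pages and ends up with the highest score, while the scores sum to one.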

Percolation centrality

The Percolation Centrality is defined for a given node, at a given time, as the proportion of ‘percolated paths’ that go through that node. A ‘percolated path’ is a shortest path between a pair of nodes, where the source node is percolated (e.g., infected). The target node can be percolated or non-percolated, or in a partially percolated state.

$PC^t(v)= \frac{1}{N-2}\sum_{s \neq v \neq r}\frac{\sigma_{sr}(v)}{\sigma_{sr}}\frac{{x^t}_s}{\sum_i {x^t}_i - {x^t}_v}$

where $\sigma_{sr}$ is the total number of shortest paths from node $s$ to node $r$ and $\sigma_{sr}(v)$ is the number of those paths that pass through $v$. The percolation state of the node $i$ at time $t$ is denoted by ${x^t}_i$. Two special cases are ${x^t}_i=0$, which indicates a non-percolated state at time $t$, and ${x^t}_i=1$, which indicates a fully percolated state at time $t$. The values in between indicate partially percolated states (e.g., in a network of townships, this would be the percentage of people infected in that town).

The weights attached to the percolation paths depend on the percolation levels assigned to the source nodes, based on the premise that the higher the percolation level of a source node is, the more important are the paths that originate from that node. Nodes which lie on shortest paths originating from highly percolated nodes are therefore potentially more important to the percolation. The definition of PC may also be extended to include target node weights as well. Percolation centrality calculations run in $O(NM)$ time, for $N$ nodes and $M$ edges, with an efficient implementation adapted from Brandes' fast algorithm. If the calculation needs to consider target node weights, the worst case time is $O(N^3)$.
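The definition can be evaluated directly on small graphs, as in the brute-force Python sketch below. This is not the Brandes-style implementation mentioned above: it runs a breadth-first search from every node and uses the fact that a shortest path from $s$ to $r$ passes through $v$ exactly when $d(s,v)+d(v,r)=d(s,r)$. The function names, the graph representation, and the example percolation states are assumptions.

```python
from collections import deque


def bfs_counts(adj, s):
    """Distances from s and the number of shortest paths to each node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def percolation_centrality(adj, state, v):
    """PC of node v given percolation states {node: value in [0, 1]}."""
    n = len(adj)
    total = sum(state.values()) - state[v]
    if n < 3 or total == 0:
        return 0.0
    info = {s: bfs_counts(adj, s) for s in adj}
    dist_v, sigma_v = info[v]
    score = 0.0
    for s in adj:
        if s == v or state[s] == 0:
            continue                     # non-percolated sources add nothing
        dist_s, sigma_s = info[s]
        if v not in dist_s:
            continue                     # v is unreachable from s
        for r in adj:
            if r in (s, v) or r not in dist_s or r not in dist_v:
                continue
            if dist_s[v] + dist_v[r] == dist_s[r]:
                through = sigma_s[v] * sigma_v[r]   # shortest s-r paths via v
                score += through / sigma_s[r] * state[s] / total
    return score / (n - 2)


# Path a - b - c - d in which only node a is percolated.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
state = {"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0}

pc = {v: percolation_centrality(chain, state, v) for v in chain}
```

In this chain every path leaving the percolated node `a` crosses `b`, and only the path to `d` crosses `c`, so `b` scores higher than `c`.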

Cross-clique centrality

The cross-clique centrality of a single node in a complex graph determines the connectivity of the node to different cliques. A node with high cross-clique connectivity facilitates the propagation of information or disease in a graph. Cliques are subgraphs in which every node is connected to every other node in the clique. The cross-clique connectivity of a node $v$ for a given graph $G:=(V,E)$ with $|V|$ vertices and $|E|$ edges is defined as $X(v)$, where $X(v)$ is the number of cliques to which vertex $v$ belongs. This measure was used in [26] but was first proposed by Everett and Borgatti in 1998, who called it clique-overlap centrality.

Centralization

The centralization of any network is a measure of how central its most central node is in relation to how central all the other nodes are.[27] Centralization measures then (a) calculate the sum of differences in centrality between the most central node in a network and all other nodes; and (b) divide this quantity by the theoretically largest such sum of differences in any network of the same size.[27] Thus, every centrality measure can have its own centralization measure. Defined formally, if $C_x(p_i)$ is any centrality measure of point $i$, if $C_x(p_*)$ is the largest such measure in the network, and if $\max \sum_{i=1}^{N} [C_x(p_*)-C_x(p_i)]$ is the largest sum of differences in point centrality $C_x$ for any graph with the same number of nodes, then the centralization of the network is:[27] $C_x=\frac{\sum_{i=1}^{N} [C_x(p_*)-C_x(p_i)]}{\max \sum_{i=1}^{N} [C_x(p_*)-C_x(p_i)]}$

Extensions

Empirical and theoretical research have extended the concept of centrality in the context of static networks to dynamic centrality[28] in the context of time-dependent and temporal networks.[29][30][31]

For generalizations to weighted networks, see Opsahl et al. (2010).[32]

The concept of centrality was extended to a group level as well. For example, Group Betweenness centrality shows the proportion of geodesics connecting pairs of non-group members that pass through the group.[33][34]

Notes and references

1. ^ Newman, M.E.J. 2010. Networks: An Introduction. Oxford, UK: Oxford University Press.
2. ^ a b c d Bonacich, Phillip (1987). "Power and Centrality: A Family of Measures". American Journal of Sociology (University of Chicago Press) 92: 1170–1182. doi:10.1086/228631.
3. Borgatti, Stephen P. (2005). "Centrality and Network Flow". Social Networks (Elsevier) 27: 55–71. doi:10.1016/j.socnet.2004.11.008.
4. ^ a b c d Borgatti, Stephen P.; Everett, Martin G. (2006). "A Graph-Theoretic Perspective on Centrality". Social Networks (Elsevier) 28: 466–484. doi:10.1016/j.socnet.2005.11.005.
5. ^ a b Benzi, Michele; Klymko, Christine (2013). "A matrix analysis of different centrality measures". arXiv. Retrieved July 11, 2014.
6. ^ Lawyer, Glenn (2015). "Understanding the spreading power of all nodes in a network: a continuous-time perspective". Sci Rep 5: 8665. doi:10.1038/srep08665.
7. ^ Freeman, Linton C. "Centrality in social networks conceptual clarification." Social networks 1.3 (1979): 215–239.
8. ^ Alex Bavelas. Communication patterns in task-oriented groups. J. Acoust. Soc. Am, 22(6):725–730, 1950.
9. ^ Sabidussi, G. (1966) The centrality index of a graph. Psychometrika 31, 581–603.
10. ^ Yannick Rochat. Closeness centrality extended to unconnected graphs: The harmonic centrality index (PDF). Applications of Social Network Analysis, ASNA 2009.
11. ^
12. ^ Boldi, Paolo; Vigna, Sebastiano (2014), "Axioms for Centrality", Internet Mathematics 10
13. ^ Marchiori, Massimo; Latora, Vito (2000), "Harmony in the small-world" (PDF), Physica A: Statistical Mechanics and its Applications 285 (3-4): 539–546
14. ^ a b Dangalchev Ch., Residual Closeness in Networks, Physica A 365, 556 (2006).
15. ^ Stephenson, K. A. and Zelen, M., 1989. Rethinking centrality: Methods and examples. Social Networks 11, 1–37.
16. ^ J. D. Noh and H. Rieger, Phys. Rev. Lett. 92, 118701 (2004).
17. ^ Tran, T.-D. and Kwon, Y.-K. Hierarchical closeness efficiently predicts disease genes in a directed signaling network, Computational biology and chemistry.
18. ^ Freeman, Linton (1977). "A set of measures of centrality based upon betweenness". Sociometry 40: 35–41. doi:10.2307/3033543.
19. ^ a b c Brandes, Ulrik (2001). "A faster algorithm for betweenness centrality" (PDF). Journal of Mathematical Sociology 25: 163–177. doi:10.1080/0022250x.2001.9990249. Retrieved October 11, 2011.
20. ^ a b http://www.ams.org/samplings/feature-column/fcarc-pagerank
21. ^ M. E. J. Newman. "The mathematics of networks" (PDF). Retrieved 2006-11-09.
22. ^ Katz, L. 1953. A New Status Index Derived from Sociometric Index. Psychometrika, 39–43.
23. ^ Bonacich, P., 1991. Simultaneous group and individual centralities. Social Networks 13, 155–168.
24. ^ How does Google rank webpages? 20Q: About Networked Life
25. ^ Piraveenan, Mahendra (2013). "Percolation Centrality: Quantifying Graph-Theoretic Impact of Nodes during Percolation in Networks". PLoSone 8 (1). doi:10.1371/journal.pone.0053095.
26. ^ Faghani, Mohammad Reza (2013). "A Study of XSS Worm Propagation and Detection Mechanisms in Online Social Networks". IEEE Trans. Inf. Forensics and Security.
27. ^ a b c Freeman, Linton C. (1979), "centrality in social networks: Conceptual clarification" (PDF), Social Networks 1 (3): 215–239
28. ^ Braha, D. and Bar-Yam, Y. 2006. "From Centrality to Temporary Fame: Dynamic Centrality in Complex Networks." Complexity 12: 59–63.
29. ^ Hill,S.A. and Braha, D. 2010. "Dynamic Model of Time-Dependent Complex Networks." Physical Review E 82, 046105.
30. ^ Gross, T. and Sayama, H. (Eds.). 2009. Adaptive Networks: Theory, Models and Applications. Springer.
31. ^ Holme, P. and Saramäki, J. 2013. Temporal Networks. Springer.
32. ^ Opsahl, Tore; Agneessens, Filip; Skvoretz, John (2010). "Node centrality in weighted networks: Generalizing degree and shortest paths". Social Networks 32 (3): 245. doi:10.1016/j.socnet.2010.03.006.
33. ^ Everett, M. G. and Borgatti, S. P. (2005). Extending centrality. In P. J. Carrington, J. Scott and S. Wasserman (Eds.), Models and methods in social network analysis (pp. 57–76). New York: Cambridge University Press.
34. ^ Puzis, R., Yagil, D., Elovici, Y., Braha, D. (2009). Collaborative attack on Internet users' anonymity, Internet Research 19(1)