Graphon

In graph theory and statistics, a graphon (also known as a graph limit) is a symmetric measurable function $W:[0,1]^{2}\to [0,1]$ , that is important in the study of dense graphs. Graphons arise both as a natural notion for the limit of a sequence of dense graphs, and as the fundamental defining objects of exchangeable random graph models. Graphons are tied to dense graphs by the following pair of observations: the random graph models defined by graphons give rise to dense graphs almost surely, and, by the regularity lemma, graphons capture the structure of arbitrary large dense graphs.

Statistical formulation

A graphon is a symmetric measurable function $W:[0,1]^{2}\to [0,1]$ . Usually a graphon is understood as defining an exchangeable random graph model according to the following scheme:

Each vertex $j$ of the graph is assigned an independent random value $u_{j}\sim U[0,1]$
Edge $(i,j)$ is independently included in the graph with probability $W(u_{i},u_{j})$ .

A random graph model is an exchangeable random graph model if and only if it can be defined in terms of a (possibly random) graphon in this way. The model based on a fixed graphon $W$ is sometimes denoted $\mathbb {G} (n,W)$ , by analogy with the Erdős–Rényi model of random graphs. A graph generated from a graphon $W$ in this way is called a $W$ -random graph.

It follows from this definition and the law of large numbers that, if $W\neq 0$ , exchangeable random graph models are dense almost surely.^[1]

Examples

The simplest example of a graphon is $W(x,y)\equiv p$ for some constant $p\in [0,1]$ . In this case the associated exchangeable random graph model is the Erdős–Rényi model $G(n,p)$ that includes each edge independently with probability $p$ .

If we instead start with a graphon that is piecewise constant by:

dividing the unit square into $k\times k$ blocks, and
setting $W$ equal $p_{lm}$ on the $(l,m)^{\text{th}}$ block,

the resulting exchangeable random graph model is the $k$ community stochastic block model, a generalization of the Erdős–Rényi model. We can interpret this as a random graph model consisting of $k$ distinct Erdős–Rényi graphs with parameters $p_{ll}$ respectively, with bigraphs between them where each possible edge between blocks $(l,l)$ and $(m,m)$ is included independently with probability $p_{lm}$ .

Many other popular random graph models can be understood as exchangeable random graph models defined by some graphon, a detailed survey is included in Orbanz and Roy.^[1]

Jointly exchangeable adjacency matrices

A random graph of size $n$ can be represented as a random $n\times n$ adjacency matrix. In order to impose consistency (in the sense of projectivity) between random graphs of different sizes it is natural to study the sequence of adjacency matrices arising as the upper-left $n\times n$ sub-matrices of some infinite array of random variables; this allows us to generate $G_{n}$ by adding a node to $G_{n-1}$ and sampling the edges $(j,n)$ for $j<n$ . With this perspective, random graphs are defined as random infinite symmetric arrays $(X_{ij})$ .

Following the fundamental importance of exchangeable sequences in classical probability, it is natural to look for an analogous notion in the random graph setting. One such notion is given by jointly exchangeable matrices; i.e. random matrices satisfying

(X_{ij})\ {\overset {d}{=}}\,(X_{\sigma (i)\sigma (j)})

for all permutations $\sigma$ of the natural numbers, where ${\overset {d}{=}}$ means equal in distribution. Intuitively, this condition means that the distribution of the random graph is unchanged by a relabeling of its vertices: that is, the labels of the vertices carry no information.

There is a representation theorem for jointly exchangeable random adjacency matrices, analogous to de Finetti’s representation theorem for exchangeable sequences. This is a special case of the Aldous–Hoover theorem for jointly exchangeable arrays and, in this setting, asserts that the random matrix $(X_{ij})$ is generated by:

Sample $u_{j}\sim U[0,1]$ independently
$X_{ij}=X_{ji}=1$ independently at random with probability $W(u_{i},u_{j}),$

where $W:[0,1]^{2}\to [0,1]$ is a (possibly random) graphon. That is, a random graph model has a jointly exchangeable adjacency matrix if and only if it is a jointly exchangeable random graph model defined in terms of some graphon.

Graphon estimation

Due to identifiability issues, it is impossible to estimate either the graphon function $W$ or the node latent positions $u_{i},$ and there are two main directions of graphon estimation. One direction aims at estimating $W$ up to an equivalence class,^[2]^[3] or estimate the probability matrix induced by $W$ .^[4]^[5]

Analytic formulation

Any graph on $n$ vertices $\{1,2,\dots ,n\}$ can be identified with its adjacency matrix $A_{G}$ . This matrix corresponds to a stepfunction $W_{G}:[0,1]^{2}\to [0,1]$ , defined by partitioning $[0,1]$ into intervals $I_{1},I_{2},\dots ,I_{n}$ such that $I_{j}$ has interior $\left({\frac {j-1}{n}},{\frac {j}{n}}\right)$ and for each $(x,y)\in I_{i}\times I_{j}$ , setting $W_{G}(x,y)$ equal to the $(i,j)^{\text{th}}$ entry of $A_{G}$ . This function $W_{G}$ is the associated graphon of the graph $G$ .

In general, if we have a sequence of graphs $(G_{n})$ where the number of vertices of $G_{n}$ goes to infinity, we can analyze the limiting behavior of the sequence by considering the limiting behavior of the functions $(W_{G_{n}})$ . If these graphs converge (according to some suitable definition of convergence), then we expect the limit of these graphs to correspond to the limit of these associated functions.

This motivates the definition of a graphon (short for "graph function") as a symmetric measurable function $W:[0,1]^{2}\to [0,1]$ which captures the notion of a limit of a sequence of graphs. It turns out that for sequences of dense graphs, several apparently distinct notions of convergence are equivalent and under all of them the natural limit object is a graphon.^[6]

Examples

Example 1: Take a random sequence $(G_{n})$ Erdős–Rényi graphs $G_{n}=G(n,p)$ with some fixed parameter $p$ . Intuitively, as $n$ tends to infinity, the limit of this sequence of graphs is determined solely by edge density of these graphs.

In the space of graphons, it turns out that such a sequence converges almost surely to the constant $W(x,y)\equiv p$ , which captures the above intuition.

Example 2: Take the sequence $(H_{n})$ of half-graphs, defined by taking $H_{n}$ to be the bipartite graph on $2n$ vertices $u_{1},u_{2},\dots ,u_{n}$ and $v_{1},v_{2},\dots ,v_{n}$ such that $u_{i}$ is adjacent to $v_{j}$ precisely when $i\leq j$ . If the vertices are listed in the presented order, then the adjacency matrix $A_{H_{n}}$ has two corners of "half square" block matrices filled with ones, with the rest of the entries equal to zero. For example, the adjacency matrix of $H_{3}$ is given by

${\begin{bmatrix}0&0&0&1&1&1\\0&0&0&0&1&1\\0&0&0&0&0&1\\1&0&0&0&0&0\\1&1&0&0&0&0\\1&1&1&0&0&0\end{bmatrix}}.$

As $n$ gets large, these corners of ones "smooth" out. Matching this intuition, the sequence $(H_{n})$ converges to the half-graphon $W$ defined by $W(x,y)=1$ when $|x-y|\geq 1/2$ and $W(x,y)=0$ otherwise.

Example 3: Take the sequence $(K_{n,n})$ of complete bipartite graphs with equal sized parts. If we order the vertices by placing all vertices in one part at the beginning and placing the vertices of the other part at the end, the adjacency matrix of $(K_{n,n})$ looks like a block off-diagonal matrix, with two blocks of ones and two blocks of zeros. For example, the adjacency matrix of $K_{2,2}$ is given by

${\begin{bmatrix}0&0&1&1\\0&0&1&1\\1&1&0&0\\1&1&0&0\end{bmatrix}}.$

As $n$ gets larger, this block structure of the adjacency matrix remains constant, so that this sequence of graphs converges to a "complete bipartite" graphon $W$ defined by $W(x,y)=1$ whenever $\min(x,y)\leq 1/2$ and $\max(x,y)>1/2$ , and setting $W(x,y)=0$ otherwise.

Example 4: Consider the sequence $(K_{n,n})$ from the previous example again. If we instead order vertices by alternating between parts, the adjacency matrix has a chessboard structure of zeros and ones. For example, under this ordering, the adjacency matrix of $K_{2,2}$ is given by

${\begin{bmatrix}0&1&0&1\\1&0&1&0\\0&1&0&1\\1&0&1&0\end{bmatrix}}.$

As $n$ gets larger, the adjacency matrices become a finer and finer chessboard. Despite this behavior, we still want the limit of $(K_{n,n})$ to be unique and result in the graphon from example 4. This means when we formally define convergence for a sequence of graphs, the definition of a limit should be agnostic to relabelings of the vertices.

Example 5: Take a random sequence $(G_{n})$ of $W$ -random graphs by drawing $G_{n}\sim \mathbb {G} (n,W)$ for some fixed graphon $W$ . Then just like in the first example from this section, it turns out that $(G_{n})$ converges to $W$ almost surely.

Example 6: Given graph $G$ with associated graphon $W=W_{G}$ , we can recover graph theoretic properties and parameters of $G$ by integrating transformations of $W$ .

For example, the edge density (i.e. average degree divided by number of vertices) of $G$ is given by the integral $\int _{0}^{1}\int _{0}^{1}W(x,y)\;\mathrm {d} x\,\mathrm {d} y.$ This is because $W$ is $\{0,1\}$ -valued, and each edge $(i,j)$ in $G$ corresponds to a region $I_{i}\times I_{j}$ of area $1/n^{2}$ where $W$ equals $1$ .

Similar reasoning shows that the number of triangles in $G$ is equal to ${\frac {1}{6}}\int _{0}^{1}\int _{0}^{1}\int _{0}^{1}W(x,y)W(y,z)W(z,x)\;\mathrm {d} x\,\mathrm {d} y\,\mathrm {d} z.$

Notions of convergence

There are many different ways to measure the distance between two graphs. If we are interested in metrics that "preserve" extremal properties of graphs, then we should restrict our attention to metrics that identify random graphs as similar. For example, if we randomly draw two graphs independently from an Erdős–Rényi model $G(n,p)$ for some fixed $p$ , the distance between these two graphs under a "reasonable" metric should be close to zero with high probability for large $n$ .

There are two natural metrics that behave well on dense random graphs, in this sense. The first is a sampling metric, which says two graphs are close if their distributions of subgraphs are close. The second is an edge discrepancy metric, which says two graphs are close when their edge densities are close on all their corresponding subsets of vertices.

Miraculously, a sequence of graphs converges when with respect to one distance precisely when it converges with respect to the other. Moreover, the limit objects under both distances turn out to graphons. The equivalence of these two notions of convergence mirrors how various notions of quasirandom graphs are equivalent.^[7]

Subgraph counts

One way to measure the distance between two graphs $G$ and $H$ is to compare their relative subgraph counts. That is, for each graph $F$ we can compare the number of copies of $F$ in $G$ and $F$ in $H$ . If these numbers are close for every graph $F$ , then intuitively $G$ and $H$ are similar looking graphs.

Homomorphism densities

Rather than dealing directly with subgraphs, it turns out to be easier to work with graph homomorphisms. This is fine when dealing with large, dense graphs, since in this scenario the number of subgraphs and the number of graph homomorphisms from a fixed graph are asymptotically equal.

Given two graphs $F$ and $G$ , the homomorphism density $t(F,G)$ of $F$ in $G$ is defined to be the number of graph homomorphisms from $F$ to $G$ . In other words, $t(F,G)$ is the probability a randomly chosen map from the vertices of $F$ to the vertices of $G$ sends adjacent vertices in $F$ to adjacent vertices in $G$ .

Graphons offer a simple way to compute homomorphism densities. Indeed, given a graph $G$ with associated graphon $W_{G}$ and another $F$ , we have

$t(F,G)=\int \prod _{(i,j)\in E(F)}W_{G}(x_{i},x_{j})\;\left\{\mathrm {d} x_{i}\right\}_{i\in V(F)}$

where the integral is multidimensional, taken over the unit hypercube $[0,1]^{V(F)}$ . This follows from the definition of an associated graphon, by considering when the above integrand is equal to $1$ . We can then extend the definition of homomorphism density to arbitrary graphons $W$ , by using the same integral and defining

$t(F,W)=\int \prod _{(i,j)\in E(F)}W(x_{i},x_{j})\;\left\{\mathrm {d} x_{i}\right\}_{i\in V(F)}$

for any graph $F$ .

Limits in terms of homomorphism density

Given this setup, we say a sequence of graphs $(G_{n})$ converges if for every fixed graph $F$ , the sequence of homomorphism densities $\left(t(F,G_{n})\right)$ converges. Although not evident from the definition alone, if $(G_{n})$ converges in this sense, then there always exists a graphon $W$ such that for every graph $F$ , we have $\lim _{n\to \infty }t(F,G_{n})=t(F,W)$ simultaneously.

Cut distance

Take two graphs $G$ and $H$ on the same vertex set. Because these graphs share the same vertices, one way to measure their distance is to restrict to subsets $X,Y$ of the vertex set, and for each such pair subsets compare the number of edges $e_{G}(X,Y)$ from $X$ to $Y$ in $G$ to the number of edges $e_{H}(X,Y)$ between $X$ and $Y$ in $H$ . If these numbers are similar for every pair of subsets (relative to the total number of vertices), then that suggests $G$ and $H$ are similar graphs.

To formalize this, for any symmetric, measurable function $f:[0,1]^{2}\to \mathbb {R}$ , define the cut norm of $f$ to be the quantity

$\lVert f\rVert _{\square }=\sup _{S,T\subseteq [0,1]}\left|\int _{S}\int _{T}f(x,y)\;\mathrm {d} x\,\mathrm {d} y\right|$

taken over all measurable subsets $S,T$ of the unit interval. ^[6] This norm captures our earlier notion of distance, because for two graphs $G$ and $H$ with same vertex set $V$ of size $|V|=n$ , the cut norm with the associated graphons

$\lVert W_{G}-W_{H}\rVert _{\square }={\frac {1}{n^{2}}}\max _{X,Y\subseteq V}\left|e_{G}(X,Y)-e_{H}(X,Y)\right|$

lets us compute the maximum discrepancy of the edge densities between $G$ and $H$ . Note that this definition can still be used even when the graphs being compared do not share a vertex set.

This distance measure still has one major limitation: it can assign nonzero distance to two isomorphic graphs. To make sure isomorphic graphs have distance zero, we should compute the minimum cut norm over all possible "relabellings" of the vertices. This motivates the definition of the cut distance between two graphons $W$ and $U$ to be

$\delta _{\square }(U,W)=\inf _{\varphi }\lVert U-W^{\varphi }\rVert _{\square }$

where $W^{\varphi }(x,y)=W(\varphi (x),\varphi (y))$ is the composition of $W$ with the map $\varphi$ , and the infimum is taken over all measure-preserving bijections from the unit interval to itself.^[8] The cut distance between two graphs is defined to be the cut distance between their associated graphons.

As a metric space

To make the cut-distance into a metric, we take the set of all graphons and identify two graphons $U\sim W$ whenever $\delta _{\square }(U,W)=0$ . The resulting space of graphons is denoted ${\tilde {\mathcal {W}}}_{0}$ , and together with $\delta _{\square }$ forms a metric space.

This space turns out to be compact. Moreover, it contains the set of all finite graphs as a dense subset. Here, graphs are being identified as $\{0,1\}$ -valued step functions on the unit square. These observations show that the space of graphons is a completion of the space of graphs with respect to the cut distance.

Applications

Regularity lemma

Using compactness of the space of graphons $({\tilde {\mathcal {W}}}_{0},\delta _{\square })$ , one can prove stronger versions of Szemerédi's regularity lemma.

Sidorenko's conjecture

The analytic nature of graphons allows greater flexibility in attacking inequalities related to homomorphisms.

For example, Sidorenko's conjecture is a major open problem in extremal graph theory, which asserts that for any graph $G$ on $n$ vertices with average degree $pn$ (for some $p\in [0,1]$ ) and bipartite graph $H$ on $v$ vertices and $e$ edges, the number of homomorphisms from $H$ to $G$ is at least $p^{e}n^{v}$ .^[9] Since this is quantity is the expected number of labeled subgraphs of $H$ in a random graph $G(n,p)$ , the conjecture can be interpreted as the claim that for any bipartite graph $H$ , the random graph achieves (in expectation) the minimum number of copies of $H$ over all graphs with some fixed edge density.

Many approaches to Sidorenko's conjecture formulate the problem as an integral inequality on graphons, which then allows the problem to be attacked using other analytical approaches. ^[10]

Generalizations

Graphons are naturally associated with dense simple graphs. There are extensions of this model to dense directed weighted graphs, often referred to as decorated graphons.^[11] There are also recent extensions to the sparse graph regime, from both the perspective of random graph models ^[12] and graph limit theory.^[13]^[14]

References

^ ^a ^b Orbanz, P.; Roy, D.M. (2015). "Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures". IEEE Transactions on Pattern Analysis and Machine Intelligence. 37 (2): 437–461. arXiv:1312.7857. doi:10.1109/tpami.2014.2334607. PMID 26353253.
^ Wolfe, Patrick J.; Olhede, Sofia C. (2013-09-23). "Nonparametric graphon estimation". arXiv:1309.5936 [math.ST].
^ Choi, David; Wolfe, Patrick J. (February 2014). "Co-clustering separately exchangeable network data". The Annals of Statistics. 42 (1): 29–63. arXiv:1212.4093. doi:10.1214/13-AOS1173. ISSN 0090-5364.
^ Gao, Chao; Lu, Yu; Zhou, Harrison H. (December 2015). "Rate-optimal graphon estimation". The Annals of Statistics. 43 (6): 2624–2652. arXiv:1410.5837. doi:10.1214/15-AOS1354. ISSN 0090-5364.
^ Yuan, Zhang; Elizaveta, Levina; Ji, Zhu (2017). "Estimating network edge probabilities by neighbourhood smoothing". Biometrika. 104 (4): 771–783. doi:10.1093/biomet/asx042. ISSN 0006-3444.
^ ^a ^b Lovász, L. Large Networks and Graph Limits. American Mathematical Society.
^ Chung, Fan R. K.; Graham, Ronald L.; Wilson, R. M. (1989). "Quasi-random graphs". Combinatorica. 9 (4): 345–362. doi:10.1007/BF02125347. {{cite journal}}: Invalid |ref=harv (help)
^ Glasscock, D. (2015). "What is a graphon". Notices of the American Mathematical Society. 62 (1): 46–48. arXiv:1611.00718. {{cite journal}}: Invalid |ref=harv (help)
^ Sidorenko, A. (1993). "A correlation inequality for bipartite graphs". Graphs and Combinatorics. 9 (2–4): 201–204. doi:10.1007/BF02988307. {{cite journal}}: Invalid |ref=harv (help)
^ Hatami, H. (2010). "Graph norms and Sidorenko's conjecture". Israel Journal of Combinatorics. 175 (1): 125–150. arXiv:0806.0047. doi:10.1007/s11856-010-0005-1. {{cite journal}}: Invalid |ref=harv (help)
^ Vaishampayan, V. A. (2019). "Classification in a Large Network". arXiv:1902.05531 [cs.IT]. {{cite arXiv}}: Invalid |ref=harv (help)
^ Veitch, V.; Roy, D. M. (2015). "The Class of Random Graphs Arising from Exchangeable Random Measures". arXiv:1512.03099 [math.ST].
^ Borgs, C.; Chayes, J. T.; Cohn, H.; Zhao, Y. (2019). "An L^p theory of sparse graph convergence I: limits, sparse random graph models, and power law distributions". Transactions of the American Mathematical Society. 372 (5): 3019–3062. arXiv:1401.2906. doi:10.1090/tran/7543.
^ Borgs, C.; Chayes, J. T.; Cohn, H.; Zhao, Y. (2018). "An L^p theory of sparse graph convergence II: LD convergence, quotients, and right convergence". The Annals of Probability. 46 (2018): 337–396. arXiv:1408.0744. doi:10.1214/17-AOP1187.

[Orbanz:Roy:2015-1] Orbanz, P.; Roy, D.M. (2015). "Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures". IEEE Transactions on Pattern Analysis and Machine Intelligence. 37 (2): 437–461. arXiv:1312.7857. doi:10.1109/tpami.2014.2334607. PMID 26353253.

[2] Wolfe, Patrick J.; Olhede, Sofia C. (2013-09-23). "Nonparametric graphon estimation". arXiv:1309.5936 [math.ST].

[3] Choi, David; Wolfe, Patrick J. (February 2014). "Co-clustering separately exchangeable network data". The Annals of Statistics. 42 (1): 29–63. arXiv:1212.4093. doi:10.1214/13-AOS1173. ISSN 0090-5364.

[4] Gao, Chao; Lu, Yu; Zhou, Harrison H. (December 2015). "Rate-optimal graphon estimation". The Annals of Statistics. 43 (6): 2624–2652. arXiv:1410.5837. doi:10.1214/15-AOS1354. ISSN 0090-5364.

[5] Yuan, Zhang; Elizaveta, Levina; Ji, Zhu (2017). "Estimating network edge probabilities by neighbourhood smoothing". Biometrika. 104 (4): 771–783. doi:10.1093/biomet/asx042. ISSN 0006-3444.

[Lovasz:2013-6] Lovász, L. Large Networks and Graph Limits. American Mathematical Society.

[qrandom-7] Chung, Fan R. K.; Graham, Ronald L.; Wilson, R. M. (1989). "Quasi-random graphs". Combinatorica. 9 (4): 345–362. doi:10.1007/BF02125347. {{cite journal}}: Invalid |ref=harv (help)

[whatis-8] Glasscock, D. (2015). "What is a graphon". Notices of the American Mathematical Society. 62 (1): 46–48. arXiv:1611.00718. {{cite journal}}: Invalid |ref=harv (help)

[sidorenko-9] Sidorenko, A. (1993). "A correlation inequality for bipartite graphs". Graphs and Combinatorics. 9 (2–4): 201–204. doi:10.1007/BF02988307. {{cite journal}}: Invalid |ref=harv (help)

[norms-10] Hatami, H. (2010). "Graph norms and Sidorenko's conjecture". Israel Journal of Combinatorics. 175 (1): 125–150. arXiv:0806.0047. doi:10.1007/s11856-010-0005-1. {{cite journal}}: Invalid |ref=harv (help)

[decorate-11] Vaishampayan, V. A. (2019). "Classification in a Large Network". arXiv:1902.05531 [cs.IT]. {{cite arXiv}}: Invalid |ref=harv (help)

[Veitch:Roy:2015-12] Veitch, V.; Roy, D. M. (2015). "The Class of Random Graphs Arising from Exchangeable Random Measures". arXiv:1512.03099 [math.ST].

[Borgs:Chayes:Cohn:Zhao:2014:sgc1-13] Borgs, C.; Chayes, J. T.; Cohn, H.; Zhao, Y. (2019). "An L^p theory of sparse graph convergence I: limits, sparse random graph models, and power law distributions". Transactions of the American Mathematical Society. 372 (5): 3019–3062. arXiv:1401.2906. doi:10.1090/tran/7543.

[Borgs:Chayes:Cohn:Zhao:2014:sgc2-14] Borgs, C.; Chayes, J. T.; Cohn, H.; Zhao, Y. (2018). "An L^p theory of sparse graph convergence II: LD convergence, quotients, and right convergence". The Annals of Probability. 46 (2018): 337–396. arXiv:1408.0744. doi:10.1214/17-AOP1187.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]