Component (graph theory)

A graph with three connected components.

In graph theory, a connected component (or just component) of an undirected graph is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices in the supergraph. For example, the graph shown in the illustration has three connected components. A vertex with no incident edges is itself a connected component. A graph that is itself connected has exactly one connected component, consisting of the whole graph.

An equivalence relation

An alternative way to define connected components involves the equivalence classes of an equivalence relation that is defined on the vertices of the graph. In an undirected graph, a vertex v is reachable from a vertex u if there is a path from u to v. In this definition, a single vertex is counted as a path of length zero, and the same vertex may occur more than once within a path. Reachability is an equivalence relation, since:

It is reflexive: There is a trivial path of length zero from any vertex to itself.
It is symmetric: If there is a path from u to v, the same edges form a path from v to u.
It is transitive: If there is a path from u to v and a path from v to w, the two paths may be concatenated together to form a path from u to w.

The connected components are then the induced subgraphs formed by the equivalence classes of this relation.
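The three properties above can be checked directly on a small example. The following is a minimal sketch, not taken from the article. It assumes vertices are numbered 0 to n−1 and edges are given as pairs, and the helper names `reachability_matrix` and `equivalence_classes` are hypothetical. It builds the reachability relation explicitly as its reflexive, symmetric, transitive closure (Warshall's algorithm, O(n³)) and then reads off the equivalence classes. This is meant for illustration only; the linear-time methods under Algorithms are far more efficient.

```python
def reachability_matrix(n, edges):
    """Reachability relation of an undirected graph as a boolean matrix."""
    # Reflexive: every vertex reaches itself by a path of length zero.
    reach = [[i == j for j in range(n)] for i in range(n)]
    # Symmetric: an undirected edge can be traversed in either direction.
    for u, v in edges:
        reach[u][v] = reach[v][u] = True
    # Transitive: Warshall's algorithm concatenates paths through each k in turn.
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def equivalence_classes(n, edges):
    """Vertex sets of the connected components, i.e. the classes of reachability."""
    reach = reachability_matrix(n, edges)
    seen = set()
    classes = []
    for v in range(n):
        if v not in seen:
            cls = {j for j in range(n) if reach[v][j]}  # everything reachable from v
            seen |= cls
            classes.append(sorted(cls))
    return classes
```

For a 7-vertex graph with edges (0, 1), (1, 2) and (3, 4), `equivalence_classes` returns `[[0, 1, 2], [3, 4], [5], [6]]`: the two isolated vertices each form their own class.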

The number of connected components

The number of connected components is an important topological invariant of a graph. In topological graph theory it can be interpreted as the zeroth Betti number of the graph. In algebraic graph theory it equals the multiplicity of 0 as an eigenvalue of the Laplacian matrix of the graph. It is also the index of the first nonzero coefficient of the chromatic polynomial of the graph, counting from the constant term; that is, the lowest power of the variable that appears with a nonzero coefficient. The number of connected components plays a key role in the Tutte theorem characterizing graphs that have perfect matchings, and in the definition of graph toughness.
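The Laplacian characterization can be checked numerically. Because the Laplacian L = D − A is real and symmetric, the multiplicity of the eigenvalue 0 equals the dimension of its null space, n − rank(L). The sketch below computes that rank exactly with rational arithmetic. It assumes a simple undirected graph on vertices 0 to n−1, and the function names are hypothetical.

```python
from fractions import Fraction


def laplacian(n, edges):
    """Laplacian L = D - A of a simple undirected graph, with exact entries."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        L[u][v] -= 1
        L[v][u] -= 1
        L[u][u] += 1
        L[v][v] += 1
    return L


def rank(matrix):
    """Rank by Gauss-Jordan elimination over the rationals (no rounding error)."""
    m = [row[:] for row in matrix]
    rows = len(m)
    cols = len(m[0]) if rows else 0
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(rows):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == rows:
            break
    return r


def zero_eigenvalue_multiplicity(n, edges):
    # L is symmetric, so the multiplicity of eigenvalue 0 is its nullity.
    return n - rank(laplacian(n, edges))
```

For the 7-vertex graph with edges (0, 1), (1, 2) and (3, 4), which has four components, `zero_eigenvalue_multiplicity` returns 4.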

Algorithms

It is straightforward to compute the connected components of a graph in linear time, O(|V| + |E|) in the numbers of vertices and edges, using either breadth-first search or depth-first search. In either case, a search that begins at some particular vertex v will find the entire connected component containing v (and no more) before returning. To find all the connected components of a graph, loop through its vertices, starting a new breadth-first or depth-first search whenever the loop reaches a vertex that has not already been included in a previously found connected component. Hopcroft & Tarjan (1973) describe essentially this algorithm, and state that at that time it was "well known".
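The loop described above can be sketched with breadth-first search as follows. This is a minimal illustration, not Hopcroft and Tarjan's code. It assumes vertices 0 to n−1 with an edge list, and the function name `connected_components` is hypothetical. Each vertex is enqueued once and each edge is examined twice, giving O(|V| + |E|) time.

```python
from collections import deque


def connected_components(n, edges):
    """Return (number of components, component label of each vertex)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    component = [-1] * n  # -1 marks a vertex not yet assigned to a component
    count = 0
    for start in range(n):
        if component[start] != -1:
            continue  # already found by an earlier search
        # A new breadth-first search labels exactly the component of `start`.
        component[start] = count
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if component[w] == -1:
                    component[w] = count
                    queue.append(w)
        count += 1
    return count, component
```

For example, `connected_components(7, [(0, 1), (1, 2), (3, 4)])` returns `(4, [0, 0, 0, 1, 1, 2, 3])`. Replacing the queue with a stack (or recursion) gives the depth-first variant with the same result and running time.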

There are also efficient algorithms to dynamically track the connected components of a graph as vertices and edges are added, as a straightforward application of disjoint-set data structures. These algorithms require amortized O(α(n)) time per operation, where adding a vertex, adding an edge, and determining the connected component containing a vertex each count as one operation, and α(n) is the extremely slowly growing inverse of the rapidly growing Ackermann function. A related problem is tracking connected components as all edges are deleted from a graph, one by one. An algorithm exists that answers each connectivity query in constant time while spending O(|V||E|) total time maintaining its data structure, an amortized cost of O(|V|) per edge deletion. For forests, the total cost can be reduced to O(q + |V| log |V|), where q is the number of queries, or O(log |V|) amortized cost per edge deletion (Shiloach & Even 1981).
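The incremental (insertion-only) case can be sketched with a disjoint-set forest using union by rank and path halving, the combination that yields the O(α(n)) amortized bound. This is a minimal illustration with a hypothetical `DisjointSet` class. It does not implement the Shiloach–Even edge-deletion structure, and it does not support deletions at all.

```python
class DisjointSet:
    """Tracks connected components as vertices and edges are added."""

    def __init__(self):
        self.parent = {}
        self.rank = {}
        self.components = 0

    def add_vertex(self, v):
        if v not in self.parent:
            self.parent[v] = v  # a new vertex is its own component
            self.rank[v] = 0
            self.components += 1

    def find(self, v):
        # Path halving: point each visited node at its grandparent.
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def add_edge(self, u, v):
        self.add_vertex(u)
        self.add_vertex(v)
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return  # edge lies inside an existing component
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[ru] < self.rank[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        if self.rank[ru] == self.rank[rv]:
            self.rank[ru] += 1
        self.components -= 1  # two components merged into one

    def same_component(self, u, v):
        return self.find(u) == self.find(v)
```

Starting from five isolated vertices, adding edges (0, 1) and (3, 4) leaves three components; adding (1, 3) then merges two of them, so `same_component(0, 4)` becomes true.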

Researchers have also studied algorithms for finding connected components in more limited models of computation, such as programs whose working memory is limited to a logarithmic number of bits (the complexity class L). Lewis & Papadimitriou (1982) asked whether it is possible to test in logspace whether two vertices belong to the same connected component of an undirected graph, and defined a complexity class SL of problems logspace-equivalent to connectivity. Reingold (2008) finally succeeded in finding an algorithm that solves this connectivity problem in logarithmic space, showing that L = SL.

Connected component in random graphs

In random graphs the sizes of connected components are given by a random variable, which, in turn, depends on the specific model.

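The Erdős–Rényi $G(n,p)$ model, in which each possible edge among $n$ vertices is present independently with probability $p$, has three regimes with distinctly different behavior: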

Subcritical $np<1$ : All components are simple and very small, the largest component has size $|C_{1}|=O(\log n)$ ;

Critical $np=1$ : $|C_{1}|=O(n^{\frac {2}{3}})$ ;

Supercritical $np>1$ : $|C_{1}|\approx yn$ where $y=y(np)$ is the positive solution to the equation $e^{-pny}=1-y$ .

Here $C_{1}$ and $C_{2}$ denote the largest and the second-largest components, respectively. In the supercritical regime, all components other than $C_{1}$, including $C_{2}$, have size of order $O(\log n)$.
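The supercritical equation $e^{-pny}=1-y$ has no closed-form solution, but writing $c = np$ it can be solved numerically. For $c > 1$, the map $y \mapsto 1 - e^{-cy}$ is increasing and concave with slope $c > 1$ at 0, so it has exactly one fixed point in $(0,1)$. Iterating from $y = 1$ decreases monotonically to that fixed point. The sketch below is a hypothetical helper, not from the source. It uses plain fixed-point iteration and returns 0 for $c \le 1$, where no giant component exists. Convergence becomes slow as $c$ approaches 1 from above.

```python
import math


def giant_component_fraction(c, tol=1e-12, max_iter=100_000):
    """Positive root y of exp(-c*y) = 1 - y for c = n*p; 0.0 if c <= 1."""
    if c <= 1:
        return 0.0  # subcritical or critical: no linear-size component
    y = 1.0  # start above the root; the iteration decreases monotonically
    for _ in range(max_iter):
        y_next = 1.0 - math.exp(-c * y)
        if abs(y_next - y) < tol:
            return y_next
        y = y_next
    return y
```

For example, with $c = 2$ the sketch returns $y \approx 0.797$, so a $G(n, 2/n)$ graph on a million vertices would be expected to have a largest component of roughly 797,000 vertices.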

References

Hopcroft, J.; Tarjan, R. (1973), "Algorithm 447: efficient algorithms for graph manipulation", Communications of the ACM, 16 (6): 372–378, doi:10.1145/362248.362272
Lewis, Harry R.; Papadimitriou, Christos H. (1982), "Symmetric space-bounded computation", Theoretical Computer Science, 19 (2): 161–187, doi:10.1016/0304-3975(82)90058-5
Reingold, Omer (2008), "Undirected connectivity in log-space", Journal of the ACM, 55 (4): Article 17, 24 pages, doi:10.1145/1391289.1391291
Shiloach, Yossi; Even, Shimon (1981), "An on-line edge-deletion problem", Journal of the ACM, 28 (1): 1–4, doi:10.1145/322234.322235

External links

MATLAB code to find connected components in undirected graphs, MATLAB File Exchange.
Connected components, Steven Skiena, The Stony Brook Algorithm Repository