# Tarjan's strongly connected components algorithm

Data structure Tarjan's Algorithm Animation Tarjan's Algorithm. Graph $O(|V|+|E|)$

Tarjan's Algorithm (named for its discoverer, Robert Tarjan[1]) is a graph theory algorithm for finding the strongly connected components of a graph. Although it precedes it chronologically, it can be seen as an improved version of Kosaraju's algorithm, and is comparable in efficiency to the path-based strong component algorithm.

## Overview

The algorithm takes a directed graph as input, and produces a partition of the graph's vertices into the graph's strongly connected components. Every vertex of the graph appears in a single strongly connected component, even if it means a vertex appears in a strongly connected component by itself (as is the case with tree-like parts of the graph, as well as any vertex with no successor or no predecessor).

The basic idea of the algorithm is this: a depth-first search begins from an arbitrary start node (and subsequent depth-first searches are conducted on any nodes that have not yet been found). The search does not explore any node that has already been explored. The strongly connected components form the subtrees of the search tree, the roots of which are the roots of the strongly connected components.

The nodes are placed on a stack in the order in which they are visited. When the search returns from a subtree, the nodes are taken from the stack and it is determined whether each node is the root of a strongly connected component. If a node is the root of a strongly connected component, then it and all of the nodes taken off before it form that strongly connected component.

### The root property

The crux of the algorithm comes in determining whether a node is the root of a strongly connected component. The concept of the "root" applies only to this algorithm (outside of the algorithm, a strongly connected component has no single "root" node). The root node is simply the first node of the strongly connected component which is encountered during the depth-first traversal. When a node is identified as the root node, once recursion on its successors has finished, all nodes on the stack from the root upwards form a complete strongly connected component.

To find the root, each node is given a depth search index v.index, which numbers the nodes consecutively in the order in which they are discovered. In addition, each node is assigned a value v.lowlink that is equal to the smallest index of some node reachable from v, and is always less than or equal to v.index if no other node is reachable from v. Therefore v is the root of a strongly connected component if and only if v.lowlink == v.index. The value v.lowlink is computed during the depth first search such that it is always known when needed.

## The algorithm in pseudocode

```algorithm tarjan is
input: graph G = (V, E)
output: set of strongly connected components (sets of vertices)

index := 0
S := empty
for each v in V do
if (v.index is undefined) then
strongconnect(v)
end if
repeat

function strongconnect(v)
// Set the depth index for v to the smallest unused index
v.index := index
index := index + 1
S.push(v)

// Consider successors of v
for each (v, w) in E do
if (w.index is undefined) then
// Successor w has not yet been visited; recurse on it
strongconnect(w)
else if (w is in S) then
// Successor w is in stack S and hence in the current SCC
end if
repeat

// If v is a root node, pop the stack and generate an SCC
start a new strongly connected component
repeat
w := S.pop()
add w to current strongly connected component
until (w = v)
output the current strongly connected component
end if
end function
```

The index variable is the depth-first search node number counter. S is the node stack, which starts out empty and stores the history of nodes explored but not yet committed to a strongly connected component. Note that this is not the normal depth-first search stack, as nodes are not popped as the search returns up the tree; they are only popped when an entire strongly connected component has been found.

The outermost loop searches each node that has not yet been visited, ensuring that nodes which are not reachable from the first node are still eventually traversed. The function strongconnect performs a single depth-first search of the graph, finding all successors from the node v, and reporting all strongly connected components of that subgraph.

When each node finishes recursing, if its lowlink is still set to its index, then it is the root node of a strongly connected component, formed by all of the nodes above it on the stack. The algorithm pops the stack up to and including the current node, and presents all of these nodes as a strongly connected component.

## Remarks

1. Complexity: The Tarjan procedure is called once for each node; the forall statement considers each edge at most twice. The algorithm's running time is therefore linear in the number of edges in G, i.e. $O(|V|+|E|)$.
2. The test for whether w is on the stack should be done in constant time, for example, by testing a flag stored on each node that indicates whether it is on the stack.
3. While there is nothing special about the order of the nodes within each strongly connected component, one useful property of the algorithm is that no strongly connected component will be identified before any of its successors. Therefore, the order in which the strongly connected components are identified constitutes a reverse topological sort of the DAG formed by the strongly connected components.[2]

## References

1. ^ Tarjan, R. E. (1972), "Depth-first search and linear graph algorithms", SIAM Journal on Computing 1 (2): 146–160, doi:10.1137/0201010
2. ^ Harrison, Paul. "Robust topological sorting and Tarjan's algorithm in Python". Retrieved 9 February 2011.