Proof of O(log*n) time complexity of union–find

In computer science, Union Find is an algorithm for doing certain operations on sets. This page is about proof of O(log*n) amortized time [1] of Union Find[2][3][4]

Statement: If m operations, either Union or Find, are applied to n elements, the total run time is O(m log*n), where log* is the iterated logarithm.

Proof

Lemma 1: As the find function follows the path along to the root, the rank of node it encounters is increasing.

Proof: claim that as Find and Union operations are applied to the data set, this fact remains true over time. Initially when each node is the root of its own tree, it's trivially true. The only case when the rank of a node might be changed is when the Union by Rank operation is applied. In this case, a tree with smaller rank will be attached to a tree with greater rank, rather than vice versa. And during the find operation, all nodes visited along the path will be attached to the root, which has larger rank than its children, so this operation won't change this fact either.

Lemma 2: A node u which is root of a subtree with rank r has at least 2r nodes.

Proof: Initially when each node is the root of its own tree, it's trivially true. Assume that a node u with rank r has at least 2r nodes. Then when two tree with rank r Unions by Rank and form a tree with rank r + 1, the new node has at least 2r + 2r = 2r + 1 nodes.

Lemma 3: The maximum number of nodes of rank r is at most n/2r.

Proof: From lemma 2, we know that a node u which is root of a subtree with rank r has at least 2r nodes. We will get the maximum number of nodes of rank r when each node with rank r is the root of a tree that has exactly 2r nodes. In this case, the number of nodes of rank r is n/2r

For convenience, we define "bucket" here: a bucket is a set that contains vertices with particular ranks.

We create some buckets and put vertices into the buckets according to their ranks inductively. That is, vertices with rank 0 go into the zeroth bucket, vertices with rank 1 go into the first bucket, vertices with ranks 2 and 3 go into the second bucket. If the Bth bucket contains vertices with ranks from interval [r, 2r − 1] = [r, R - 1] then the (B+1)st bucket will contain vertices with ranks from interval [R, 2R − 1].

We can make two observations about the buckets.

1. The total number of buckets is at most log*n
Proof: When we go from one bucket to the next, we add one more two to the power, that is, the next bucket to [B, 2B − 1] will be [2B, 22B − 1]
2. The maximum number of elements in bucket [B, 2B – 1] is at most 2n/2B
Proof: The maximum number of elements in bucket [B, 2B – 1] is at most n/2B + n/2B+1 + n/2B+2 + … + n/22^B – 1 ≤ 2n/2B

Let F represent the list of "find" operations performed, and let

$T_1 = \sum_F\text{(link to the root)}$

$T_2 = \sum_F\text{(number of links traversed where the buckets are different)}$

$T_3 = \sum_F\text{(number of links traversed where the buckets are the same).}$

Then the total cost of m finds is T = T1 + T2 + T3

Since each find operation makes exactly one traversal that leads to a root, we have T1 = O(m).

Also, from the bound above on the number of buckets, we have T2 = O(mlog*n).

For T3, suppose we are traversing from u to v, where u and v have rank in the bucket [B, 2B − 1]. From lemma 1, we know that the number of times we traversed a link (u,v) where u and v were in the same bucket is at most 2B − 1 − B, which is at most 2B.

Therefore, $T_3=\sum_{[B, 2^B - 1]} \sum_{u} 2^B$

From Observations 1 and 2, we can conclude that $T_3\le\sum_{B} 2^B\frac{2n}{2^B}\le 2n\log^*n.$

Therefore, T = T1 + T2 + T3 = O(m log*n).

References

1. ^ Raimund Seidel, Micha Sharir. "Top-down analysis of path compression", SIAM J. Computing 34(3):515–525, 2005
2. ^ Tarjan, Robert Endre (1975). "Efficiency of a Good But Not Linear Set Union Algorithm". Journal of the ACM 22 (2): 215–225. doi:10.1145/321879.321884.
3. ^ Hopcroft, J. E.; Ullman, J. D. (1973). "Set Merging Algorithms". SIAM Journal on Computing 2 (4): 294–303. doi:10.1137/0202024.
4. ^ Robert E. Tarjan and Jan van Leeuwen. Worst-case analysis of set union algorithms. Journal of the ACM, 31(2):245–281, 1984.