Iacono's working set structure

Iacono's working set data structure
Invented 2001
Invented by John Iacono
Asymptotic complexity
in big O notation
Space O(n)
Search O(log w(x))
Insert O(log n)
Delete O(log n)

In computer science, Iacono's working set structure[1] is a comparison based dictionary. It supports insertion, deletion and access operation to maintain a dynamic set of $n$ elements. The working set of an item $x$ is the set of elements that have been accessed in the structure since the last time that $x$ was accessed (or inserted if it was never accessed). Inserting and deleting in the working set structure takes $O(\log n)$ time while accessing an element $x$ takes $O(\log w(x))$. Here, $w(x)$ represents the size of the working set of $x$.

Structure

An example of a search for $x$ in the working set structure. After finding $x$, it is removed from $T_4$ and inserted into $T_1$. Finally, a shift from 1 to 4 is performed in which an element is removed from $T_i$ and inserted into $T_{i+1}$ for $1\leq i <4$.

To store a dynamic set of $n$ elements, this structure consists of a series of Red–black trees, or other Self-balancing binary search trees $T_1, T_2, \ldots, T_k$, and a series of deques (Double-ended queues) $Q_1, Q_2, \ldots Q_k$, where $k = \lceil \log\log n\rceil$. For every $1\leq i\leq k$, tree $T_i$ and deque $Q_i$ share the same contents and pointers are maintained between their corresponding elements. For every $i < k$, the size of $T_i$ and $Q_i$ is $2^{2^i}$. Tree $T_k$ and deque $Q_k$ consist of the remaining elements, i.e., their size is $n - \sum_{i=1}^{k-1} 2^{2^i}$. Therefore, the number of items in all trees and the number of elements in all deques both add up to $n$. Every element that has been inserted in the data structure is stored in exactly one of the trees and its corresponding deque.

Working set Invariant

In the deques of this structure, elements are kept in sorted order according to their working set size. Formally, element $x$ lies after $y$ in deque $Q_i$ if and only if $w(x)< w(y)$. Moreover, for every $1\leq i < k$, the elements in deque $Q_i$ have a smaller working sets than the elements in deque $Q_{i+1}$. This property is referred to as the Working set invariant. Every operation in the data structure maintains the Working set invariant.

Operations

The basic operation in this structure is called shift from $h$ to $j$, where $h$ and $j$ are indices of some trees in the structure. Two cases are considered in a shift from $h$ to $j$: If $h< j$, then for every $h\leq i < j$, taken in increasing order, an item is dequeued from $Q_i$ and enqueued into $Q_{i+1}$. The corresponding item is deleted from $T_i$ and inserted into $T_{i+1}$. The running time of this operation is $O(\sum_{i=h}^{j} \log |T_i|) = O(\sum_{i=h}^{j} \log 2^{2^i}) = O(2^ j)$. Analogously, if $j< h$, then for every $j < i \leq h$, taken in decreasing order, an item is dequeued from $Q_i$ and enqueued into $Q_{i-1}$. The corresponding item is deleted from $T_i$ and inserted into $T_{i-1}$. The running time of this operation is $O(\sum_{i=j}^{h} \log |T_i|) = O(\sum_{i=j}^{h} \log 2^{2^i}) = O(2^ h)$. Regardless of the case, after a shift operation, the size of $T_h$ decreases by one whereas the size of $T_j$ increases by one. Since that elements in the deques are sorted with respect to their working sets sizes, a shift operation maintains the Working set invariant.

Search

To search for an element $x$, search for $x$ in $T_1, T_2, \ldots T_k$, in increasing order, until finding a tree $T_j$ containing $x$. If no tree is found, the search is unsuccessful. If $x$ is found, it is deleted from $T_j$ and then inserted into $T_1$, i.e., it is moved to the front of the structure. The search finishes by performing a shift from $1$ to $j$ which restores the size of every tree and every deque to their size prior to the search operation. The running time of this search is $O(\sum_{i=1}^{j} \log 2^{2^i}) = O(2^ j)$ if the search was successful, or $O(\sum_{i=j}^{k} \log 2^{2^i}) = O(2^k) = O(\log n)$ otherwise. By the Working set property, every element in trees $T_1, T_2, \ldots, T_{j-1}$ belongs to the working set of $x$. In particular, every element in $T_{j-1}$ belongs to the working set of $x$ and hence, $w(x) > |T_{j-1}| = 2^{2^{j-1}}$. Thus, the running time of a successful search is $O(2^j) = O(\log 2^{2^{j-1}}) = O(\log w(x))$.

Insert

Inserting an element $x$ into the structure is performed by inserting $x$ into $T_1$ and enqueuing it into $Q_1$. Insertion is completed by performing a shift from $1$ to $k$. To avoid overflow, if $|T_k| = 2^{2^k}$ before the shift, i.e., if the last tree is full, then $k$ is incremented and a new empty $T_k$ and $Q_k$ is created. The running time of this operation is dominated by the shift from $1$ to $k$ whose running time is $O(2^k) = O(2^{\log\log n}) = O(\log n)$. Since element $x$, whose working set is the smallest, is enqueued in $Q_1$, the Working set invariant is preserved after the shift.

Delete

Deleting an element $x$ is done by searching for $x$ on each tree in the structure, in increasing order, until finding a tree $T_j$ that contains it (if non is found the deletion is unsuccessful). Item $x$ is deleted from $T_j$ and $Q_j$. Finally, a shift from $k$ to $j$ maintains the size of $T_j$ equal to $2^{2^j}$. The running time of this operation is $O(2^k) = O(\log n)$. The working set invariant is preserved as deleting an element does not change the order of the working set of the elements.

Discussion

Splay trees are self adjusting search trees introduced by Sleator and Tarjan[2] in 1985. Using restructuring heuristic, splay trees are able to achieve insert and delete operations in $O(\log n)$ amortized time, without storing any balance information at the nodes. Moreover, the Working Set Theorem for splay trees states that the cost to access an element in a splay tree is $O(\log w(x))$ amortized. Iacono's workings set structure obtains the same running time for search, insert and delete in the worst-case. Therefore, offering an alternative to splay trees.

References

1. ^ Iacono, John (2001). "Alternatives to splay trees with O(log n) worst-case access times". Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms: 516–522.
2. ^ Sleator, Daniel D.; Tarjan, Robert E. (1985), "Self-Adjusting Binary Search Trees", Journal of the ACM (Association for Computing Machinery) 32 (3): 652–686, doi:10.1145/3828.3835