# Wavelet Tree

A wavelet tree on the string "abracadabra". At each node the symbols of the string are projected onto two partitions of the alphabet, and a bitvector denotes to which partition each symbol belongs. Note that only the bitvectors are stored; the strings in the nodes are only for illustratory purposes.

The Wavelet Tree is a succinct data structure to store strings in compressed space. It generalizes the $\mathbf{rank}_q$ and $\mathbf{select}_q$ operations defined on bitvectors to arbitrary alphabets.

Originally introduced to represent compressed suffix arrays,[1] it has found application in several contexts.[2][3] The tree is defined by recursively partitioning the alphabet into pairs of subsets; the leaves correspond to individual symbols of the alphabet, and at each node a bitvector stores whether a symbol of the string belongs to one subset or the other.

The name derives from an analogy with the wavelet transform for signals, which recursively decomposes a signal into low-frequency and high-frequency components.

## Properties

Let $\Sigma$ be a finite alphabet with $\sigma=|\Sigma|$. By using succinct dictionaries in the nodes, a string $s \in \Sigma^*$ can be stored in $nH_0(s) + o(|s|\log \sigma)$, where $H_0(s)$ is the order-0 empirical entropy of $s$.

If the tree is balanced, the operations $\mathbf{access}$, $\mathbf{rank}_q$, and $\mathbf{select}_q$ can be supported in $O(\log \sigma)$ time.

## Extensions

Several extensions to the basic structure have been presented in the literature. To reduce the height of the tree, multiary nodes can be used instead of binary.[2] The data structure can be made dynamic, supporting insertions and deletions at arbitrary points of the string; this feature enables the implementation of dynamic FM-indexes.[4] This can be further generalized, allowing the update operations to change the underlying alphabet: the Wavelet Trie[5] exploits the trie structure on an alphabet of strings to enable dynamic tree modifications.