Wavelet Tree

From Wikipedia, the free encyclopedia
Jump to: navigation, search
A wavelet tree on the string "abracadabra". At each node the symbols of the string are projected onto two partitions of the alphabet, and a bitvector denotes to which partition each symbol belongs. Note that only the bitvectors are stored; the strings in the nodes are only for illustratory purposes.

The Wavelet Tree is a succinct data structure to store strings in compressed space. It generalizes the \mathbf{rank}_q and \mathbf{select}_q operations defined on bitvectors to arbitrary alphabets.

Originally introduced to represent compressed suffix arrays,[1] it has found application in several contexts.[2][3] The tree is defined by recursively partitioning the alphabet into pairs of subsets; the leaves correspond to individual symbols of the alphabet, and at each node a bitvector stores whether a symbol of the string belongs to one subset or the other.

The name derives from an analogy with the wavelet transform for signals, which recursively decomposes a signal into low-frequency and high-frequency components.

Properties[edit]

Let \Sigma be a finite alphabet with \sigma=|\Sigma|. By using succinct dictionaries in the nodes, a string s \in \Sigma^* can be stored in nH_0(s) + o(|s|\log \sigma), where H_0(s) is the order-0 empirical entropy of s.

If the tree is balanced, the operations \mathbf{access}, \mathbf{rank}_q, and \mathbf{select}_q can be supported in O(\log \sigma) time.

Extensions[edit]

Several extensions to the basic structure have been presented in the literature. To reduce the height of the tree, multiary nodes can be used instead of binary.[2] The data structure can be made dynamic, supporting insertions and deletions at arbitrary points of the string; this feature enables the implementation of dynamic FM-indexes.[4] This can be further generalized, allowing the update operations to change the underlying alphabet: the Wavelet Trie[5] exploits the trie structure on an alphabet of strings to enable dynamic tree modifications.

Further reading[edit]

  • Wavelet Trees. A blog post describing the construction of a wavelet tree, with examples.

References[edit]

  1. ^ R. Grossi, A. Gupta, and J. S. Vitter, High-order entropy-compressed text indexes, Proceedings of the 14th Annual SIAM/ACM Symposium on Discrete Algorithms (SODA), January 2003, 841-850.
  2. ^ a b P. Ferragina, R. Giancarlo, G. Manzini, The myriad virtues of Wavelet Trees, Information and Computation, Volume 207, Issue 8, August 2009, Pages 849-866
  3. ^ G. Navarro, Wavelet Trees for All, Proceedings of 23rd Annual Symposium on Combinatorial Pattern Matching (CPM), 2012
  4. ^ H.-L. Chan, W.-K. Hon, T.-W. Lam, and K. Sadakane, Compressed Indexes for dynamic text collections, ACM Transactions on Algorithms, 3(2), 2007
  5. ^ R. Grossi and G. Ottaviano, The Wavelet Trie: maintaining an indexed sequence of strings in compressed space, In Proceedings of the 31st Symposium on the Principles of Database Systems (PODS), 2012