Hash array mapped trie

From Wikipedia, the free encyclopedia
Jump to: navigation, search

A hash array mapped trie[1] (HAMT) is an implementation of an associative array that combines the characteristics of a hash table and an array mapped trie.[1] It is a refined version of the more general notion of a hash tree.

Operation[edit]

A HAMT is an array mapped trie where the keys are first hashed in order to ensure an even distribution of keys and to ensure a constant key length.

In a typical implementation of an array mapped trie, each node may branch to up to 32 other nodes. However, as allocating space for 32 pointers for each node would be expensive, each node instead contains a bitmap which is 32 bits long where each bit indicates the presence of a path. This is followed by an array of pointers equal in length to the number of ones (or Hamming weight of the bitmap).

Advantages of HAMTs[edit]

The hash array mapped trie achieves almost hash table-like speed, despite being a functional, persistent data structure.

Problems with HAMTs[edit]

Implementation of a HAMT involves the use of the population count function, which counts the number of ones in the binary representation of a number. This operation is available in many instruction set architectures, but it is available in only some high-level languages. Although population count can be implemented in software in O(1) time using a series of shift and add instructions, doing so may perform the operation an order of magnitude slower.

Implementations[edit]

The programming languages Clojure[2] and Scala use a persistent variant of hash array mapped tries for their native hash map type. The Haskell library unordered-containers uses the same to implement persistent map and set data types.[3] The Rubinius[4] implementation of Ruby includes a HAMT, mostly written in Ruby but with 3[5] primitives. There is a C++ HAMT[6] implementation that optionally uses POPCNT CPU instruction.

The concurrent lock-free version[7] of the hash trie called Ctrie is a mutable thread-safe implementation which ensures progress. The data-structure has been proven to be correct[8] - Ctrie operations have been shown to have the atomicity, linearizability and lock-freedom properties.

References[edit]