Purely functional data structure
In computer science, purely functional data structures are data structure which can be implemented in purely functional language. The main difference between arbitrary data structure and purely functional one is that the latter are (strongly) immutable. This restriction ensures that the data structures admits all advantage of immutable objects and of purely functional language, such as (full) persistency, quick copy of objects, thread safety. Efficient purely functional data structures may require the use of lazy evaluation and memoization.
Purely functional data structures are often represented in a different way than their imperative counterparts. For example, array with constant-time access and update is a basic component of most imperative languages and many imperative data-structure, such as hash table and binary heap, are based on arrays. Arrays can be replaced by map or random access list, which admits purely functional implementation, but the access and update time is logarithmic. Therefore, purely functional data structures can be used in languages which are non-functional, but they may not be the most efficient tool available, especially if persistency is not required.
Ensuring that a data structure is purely functional
A data structure is not inherently purely functional. For examples, stacks can be implemented as singly linked list. This implementation is purely functional as long as the only operations on the stacks are adding an element in front of the stack, accessing an element or removing the first element of the stack. But, if the language is not purely functional, some code could either acces to an element in the middle of the list, and update its associated value or the pointer to its following element.
In order to ensure that the data-structure is indeed used in a purely functional way in a non purely functional language, modules or classes can be used to ensures that the user only uses the authorized functions.
Example of purely functional data structure
Here is a list of abstract data structure with a purely functional implementation.
- Stack (First-in, last out) implemented as Singly linked list,
- Queue, implemented as Real-time queue,
- Double-ended queue, implemented as Real-time double-ended queue,
- (multi)set of ordered elements and map indexed by ordered keys, implemented as Red–black tree, or more generally by search tree,
- Priority queue, implemented as Brodal queue
- Random access list, implemented as Skew binary random access list
Methods used by purely functional data structures
In his book, Chris Okasaki listed some technics used to design and create purely functional data structures. Those technics are listed below.
Laziness and memoization
Lazy evaluation is particularly interesting in purely functional language, since the order of the evaluation never change the result of a function. Therefore, lazy evaluation naturally become an important part of the construction of purely functional data structures. It allows computation to be done only when its result is actually required. Therefore, the code of a purely functional data structure can, without loss of efficiency, consider similarly data which will effectively be used and data which will be ignored, since only the computation required for the first kind of data will actually be performed.
One of the key tool in building efficient purely functional data structures is memoization. When a computation is done, it is saved, therefore it does not have to be performed a second time. This is particularly important with laziness, since different evaluation may require the same result, while it is impossible to know which evaluation will requires it first.
Amortized analysis and scheduling
Some data structures, even non-purely-functional ones such as dynamic array, admits operation which are efficient (constant time for dynamic arrays) most of the time, and inefficient (linear time for dynamic arrays) rarely. Amortization can then be used to prove that the average running-time of the operations are efficient. That is, the few inefficient operations are rare enough, and does not change the asymptotical evolution of the time complexity when a sequence of operations is considered.
In general, having inefficient operations is not acceptable for persistent data structures, because this inefficient operation can be called many times. It is not acceptable either for real-time or imperative systems, where the user may requires the time taken by the operation to be predictable. Furthermore, this unpredictability complexifies the uses of parallelism.
In order to avoid those problem, some data structures allows for the inefficient operation to be postponed, this is called scheduling. The only requirement is that the computation of the inefficient operation ends before the result of the operation is actually needed. A constant part of the inefficient operation is performed simultaneously with following call to an efficient operation, so that, the inefficient operation is already totally done when it is needed, and each individual operations remains efficient.
For example, amortized queues are composed of two singly linked lists, the front, and the reversed rear. Elements are added to the rear list, and are removed from the front list. Furthermore, each time that the front queue is empty, the rear queue is reversed and became the front queue, while the rear queue becomes empty. The amortized time complexity of each operation is constant, since each cell of the lists is added, reversed and removed at most once. In order to avoid the inefficient operation where the rear list is reversed, real-time queues, adds the restriction that the rear list is at most as long as the rear list. To ensure that, when the rear list become longer than the front list, the front list is appended to the reversed rear list. Since this operation is inefficient, it is not entirely performed immediately, instead, for each following operations, two cells of the new front lists are actually computed. Therefore, each cell is computed before it is needed, and the new front list is totally computed before the moment where a new inefficient operation needs to be called.
- Purely Functional Data Structures thesis by Chris Okasaki (PDF format)
- Making Data-Structures Persistent by James R. Driscoll, Neil Sarnak, Daniel D. Sleator, Robert E. Tarjan (PDF)
- Fully Persistent Lists with Catenation by James R. Driscoll, Daniel D. Sleator, Robert E. Tarjan (PDF)
- Persistent Data Structures from MIT open course Advanced Algorithms
- What's new in purely functional data structures since Okasaki? on Theoretical Computer Science StackExchange