Zipper (data structure)

From Wikipedia, the free encyclopedia
Jump to: navigation, search

A zipper is a technique of representing an aggregate data structure so that it is convenient for writing programs that traverse the structure arbitrarily and update its contents, especially in purely functional programming languages. The zipper was described by Gérard Huet in 1997.[1] It includes and generalizes the gap buffer technique sometimes used with arrays.

The zipper technique is general in the sense that it can be adapted to lists, trees, and other inductively-defined data structures. Such modified data structures are usually referred to as "a tree with zipper" or "a list with zipper" to emphasize that the structure is conceptually a tree or list, while the zipper is a detail of the implementation.

A laymans explanation for a tree with zipper would be an ordinary computer filesystem with operations to go to parent (often ..), and the possibility to go downwards (cd subdirectory). The zipper is the pointer to the current path. Behind the scenes the zipper are efficient when making (functional) changes to a data structure, where a new, slightly changed, data structure is returned from an edit operation (instead of making a change in the current data structure).

Contents

[edit] Example: Bidirectional list traversal

Many common data structures in computer science can be expressed as the structure generated by a few primitive constructor operations or observer operations. These include the structure of finite lists, which can be generated by two operations:

  • Empty: Constructs an empty list
  • Insert(x, L): Constructs a list by inserting value x in front of list L

The list [1, 2, 3] is then constructed as Insert(1, Insert(2, Insert(3, Empty))). It is possible to describe the location of a value in a list as the number of steps from the front of the list to that value. More formally, a location is the number of additional Insert operations used to construct the whole list, after a particular value was inserted.

A context for a location in the list is constructed by recording not just the number of Insert operations, but all of the other information about them—namely, the values that were inserted. These are represented in a separate list that is reversed from the order of the original data structure. Specifically, the context of "3" in the list [1, 2, 3] is represented as [2, 1]. A list with a zipper represents the entire structure, and a location within the structure. This is a pair consisting of the location's context, and the part of the structure that begins at the location. The list [1, 2, 3, 4] with a reference to the "3" is represented as ([2, 1], [3, 4]).

With the list represented this way, it is easy to define efficient operations that move the location forward or backward and manipulate the list at that location, for example by inserting or removing items. Similarly, applying the zipper transformation to a tree makes it easy to insert or remove values at a particular location in the tree.

[edit] Uses

The zipper is often used where there is some concept of 'focus' or of moving around in some set of data, since its semantics reflect that of moving around but in a functional non-destructive manner.

The zipper has been used in

  • Xmonad, to manage focus and placement of windows
  • Huet's papers cover a structural editor[2] based on zippers and a theorem prover
  • A filesystem (ZipperFS) written in Haskell offering "...transactional semantics; undo of any file and directory operation; snapshots; statically guaranteed the strongest, repeatable read, isolation mode for clients; pervasive copy-on-write for files and directories; built-in traversal facility; and just the right behavior for cyclic directory references."[3]
  • Clojure has extensive support for zippers. [4]

[edit] Zipper contexts and differentiation

It has been shown that the type of the items in the context list produced by the zipper transformation is the "derivative" of the original type in a sense that is loosely related to differentiation in calculus. Intuitively, each item represents the "difference" between an internal substructure and its next containing structure. When the types are described in a particular way, the representation of the original type looks like a polynomial, and the representation of the type of context items looks like the derivative of that polynomial.[5]

For instance, a binary tree with nodes of type A can be represented by the inductive definition: 1+A×T2 → T, that is: a tree is either empty, or a tuple consisting of a value of type A and two subtrees. The derivative of 1+A×T2 is 2×A×T, that is: a tuple consisting of a binary flag, a value of type A and a sibling tree. This is precisely the information needed, given a particular subtree, to construct its parent tree. The boolean flag is an indicator of whether the subtree was on the left or right, the value of type A is the value at the parent, and the value of type T is the sibling subtree. Then, one can reconstruct a tree given one of its subtrees and a list of such derivatives, called a path. The subtree has been effectively separated out (or pointed out) and each item in the list contains the information to reconstruct a node along the path from subtree to the top of the tree.

[edit] Alternatives and extensions

[edit] Direct modification

In a non-purely-functional programming language, it may be more convenient to simply traverse the original data structure and modify it directly (perhaps after deep cloning it, to avoid affecting other code that might hold a reference to it).

[edit] Generic zipper

The Generic Zipper[6][7][8] is a technique to achieve the same goal as the conventional zipper without creating any type-specific data structure. It uses continuations to allow revisiting earlier nodes. It also keeps track of which nodes contain changes, to enable sharing. (The Haskell code given in the reference uses generic programming to generate a traversal function for any data structure, but this is optional - any suitable traversal function can be used.)

However, the Generic Zipper involves inversion of control, so some uses of it require a state machine (or equivalent) to keep track of what to do next.

[edit] References

[edit] Further reading

[edit] External links

Personal tools
Namespaces
Variants
Actions
Navigation
Interaction
Toolbox
Print/export
Languages