Algebraic data type
From Wikipedia, the free encyclopedia
| This article may be confusing or unclear to readers. Please help clarify the article; suggestions may be found on the talk page. (February 2009) |
In computer programming, an algebraic data type (sometimes also called a variant type[1]) is a datatype each of whose values is data from other datatypes wrapped in one of the constructors of the datatype. Any wrapped datum is an argument to the constructor. In contrast to other datatypes, the constructor is not executed and the only way to operate on the data is to unwrap the constructor using pattern matching.
The most common algebraic data type is a list with two constructors: Nil or [] for an empty list, and Cons (an abbreviation of constructor), ::, or : for the combination of a new element with a shorter list (for example (Cons 1 '(2 3 4)) or 1:[2,3,4]).
Special cases of algebraic types are product types i.e. records (only one constructor), sum types or tagged unions (many constructors with a single argument) and enumerated types (many constructors with no arguments). Algebraic types are one kind of composite type (i.e. a type formed by combining other types).
An algebraic data type may also be an abstract data type (ADT) if it is exported from a module without its constructors. Values of such a type can only be manipulated using functions defined in the same module as the type itself.
In set theory the equivalent of an algebraic data type is a discriminated union – a set whose elements consist of a tag (equivalent to a constructor) and an object of a type corresponding to the tag (equivalent to the constructor arguments).
Contents |
[edit] An example
For example, in Haskell we can define a new algebraic data type, Tree:
data Tree = Empty | Leaf Int | Node Tree Tree
or in OCaml syntax:
type tree = Empty | Leaf of int | Node of tree * tree
Here, Empty, Leaf and Node are the constructors. Somewhat similar to a function, a constructor is applied to arguments of an appropriate type, then yielding an instance of the data type to which the constructor belongs. For instance, Leaf has the something like a “functional type” Int -> Tree meaning that giving an integer as an argument to Leaf produces a value of the type Tree. As Node takes two arguments of the type Tree itself, the datatype is recursive.
Operations on algebraic data types can be defined by using pattern matching to retrieve the arguments. For example, consider a function to find the depth of a Tree, given here in Haskell:
depth :: Tree -> Int depth Empty = 0 depth (Leaf n) = 1 depth (Node l r) = 1 + max (depth l) (depth r)
Thus, a Tree given to depth can be constructed using any of Empty, Leaf or Node and we must match for any of them respectively to deal with all cases. In case of Node, the pattern extracts the subtrees l and r for further processing.
[edit] Explanation
What is happening is that we have a datatype, which can be “one of several types of things.” Each “type of thing” is associated with an identifier called a constructor, which can be thought of as a kind of tag for that kind of data. Each constructor can carry with it a different type of data. A constructor could carry no data at all (e.g. "Empty" in the example above), carry one piece of data (e.g. “Leaf” has one Int value), or multiple pieces of data (e.g. “Node” has two Tree values).
When we want to do something with a value of this Tree algebraic data type, we deconstruct it using a process known as pattern matching. It involves matching the data with a series of patterns. The example function "depth" above pattern-matches its argument with three patterns. When the function is called, it finds the first pattern that matches its argument, performs any variable bindings that are found in the pattern, and evaluates the expression corresponding to the pattern.
Each pattern has a form that resembles the structure of some possible value of this datatype. The first pattern above simply matches values of the constructor Empty. The second pattern above matches values of the constructor Leaf. Patterns are recursive, so then the data that is associated with that constructor is matched with the pattern "n". In this case, a lowercase identifier represents a pattern that matches any value, which then is bound to a variable of that name — in this case, a variable “n” is bound to the integer value stored in the data type — to be used in the expression to be evaluated.
The recursion in patterns in this example are trivial, but a possible more complex recursive pattern would be something like Node (Node (Leaf 4) x) (Node y (Node Empty z)). Recursive patterns several layers deep are used for example in balancing red-black trees, which involve cases that require looking at colors several layers deep.
The example above is operationally equivalent to the following pseudocode:
if data.constructor == Empty: return 0 else if data.constructor == Leaf: let n = data.field1 return 1 else if data.constructor == Node: let l = data.field1 let r = data.field2 return 1 + max (depth l) (depth r)
The comparison of this with pattern matching will point out some of the advantages of algebraic data types and pattern matching. First is type safety. The pseudocode above relies on the diligence of the programmer to not access field2 when the constructor is a Leaf, for example. Also, the type of field1 is different for Leaf and Node (for Leaf it is Int; for Node it is Tree), so the type system would have difficulties assigning a static type to it in a safe way in a traditional record data structure. However, in pattern matching, the type of each extracted value is checked based on the types declared by the relevant constructor, and how many values you can extract is known based on the constructor, so it does not face these problems.
Second, in pattern matching, the compiler statically checks that all cases are handled. If one of the cases of the “depth” function above were missing, the compiler would issue a warning, indicating that a case is not handled. This task may seem easy for the simple patterns above, but when you have many complicated recursive patterns, the task becomes difficult for the average human (or compiler, if it has to check arbitrary nested if-else constructs) to handle. Similarly, there may be patterns which never match (i.e. it is already covered by previous patterns), and the compiler can also check and issue warnings for these, as they may indicate an error in reasoning.
Do not confuse these patterns with regular expression patterns used in string pattern matching. The purpose is similar — to check whether a piece of data matches certain constraints, and if so, extract relevant parts of it for processing — but the mechanism is very different. This kind of pattern matching on algebraic data types matches on the structural properties of an object rather than on the character sequence of strings.
[edit] Theory
A general algebraic data type is a possibly recursive sum type of product types. Each constructor tags a product type to separate it from others, or if there is only one constructor, the data type is a product type. Further, the parameter types of a constructor are the factors of the product type. A parameterless constructor corresponds to the empty product. If a datatype is recursive, the entire sum of products is wrapped in a recursive type, and each constructor also rolls the datatype into the recursive type.
For example, the Haskell datatype:
data List a = Nil | Cons a (List a)
is represented in type theory as
with constructors
and
.
The Haskell List datatype can also be represented in type theory in a slightly different form, as follows:
. (Note how the μ and λ constructs are reversed relative to the original.) The original formation specified a type function whose body was a recursive type; the revised version specifies a recursive function on types. (We use the type variable φ to suggest a function rather than a "base type" like β, since φ is like a Greek "f".) Note that we must also now apply the function φ to its argument type α in the body of the type.
For the purposes of the List example, these two formulations are not significantly different; but the second form allows one to express so-called nested data types, i.e., those where the recursive type differs parametrically from the original. (For more information on nested data types, see the works of Richard Bird, Lambert Meertens and Ross Paterson.)
[edit] Programming languages with algebraic data types
The following programming languages have algebraic data types as a first class notion:
[edit] See also
[edit] References
- Algebraic data type in The Free On-line Dictionary of Computing, Editor Denis Howe.
- ^ Records and variants - OCaml manual section 1.4
This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.
|
|||||||||||||||||||||||

