Jump to content

Abstract data type: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
m Spelling/grammar/punctuation/typographical correction Fixing style/layout errors
rewrite, move stuff around and copy in boom hierarchy etc. from Data type#Abstract data type, fold stub sections into bigger ones, centralize definition parts
Line 6: Line 6:
}}
}}


In [[computer science]], an '''abstract data type''' ('''ADT''') is a [[mathematical model]] for [[data type]]s, defined by its behavior ([[Semantics (computer science)|semantics]]) from the point of view of a ''[[User (computing)|user]]'' of the data, specifically in terms of possible values, possible operations on data of this type, and the behavior of these operations. This mathematical model contrasts with ''[[data structure]]s'', which are concrete representations of data, and are the point of view of an implementer, not a user.
In [[computer science]], an '''abstract data type''' ('''ADT''') is a [[mathematical model]] for [[data type]]s, defined by its behavior ([[Semantics (computer science)|semantics]]) from the point of view of a ''[[User (computing)|user]]'' of the data, specifically in terms of possible values, possible operations on data of this type, and the behavior of these operations. This mathematical model contrasts with ''[[data structure]]s'', which are concrete representations of data, and are the point of view of an implementer, not a user. For example, a [[Stack (abstract data type)|stack]] has push/pop operations that follow a Last-In-First-Out rule, and can be concretely implemented using either a list or an array. Another example is a [[set (abstract data type)|set]] which stores values, without any particular [[sequence|order]], and no repeated values. Values themselves are not retrieved from sets, rather one tests a value for membership to obtain a Boolean "in" or "not in".


ADTs are a theoretical concept, used in formal [[Semantics (computer science)|semantics]] and program [[Formal verification|verification]] and, less strictly, in the design and analysis of [[algorithm]]s, [[Data structure|data structures]], and [[Software system|software systems]]. Most mainstream computer languages do not directly support formally specifying ADTs. However, various language features correspond to certain aspects of implementing ADTs, and are easily confused with ADTs proper; these include [[abstract type]]s, [[opaque data type]]s, [[Protocol (object-oriented programming)|protocols]], and [[design by contract]]. For example, in [[modular programming]], the module declares procedures that correspond to the ADT operations, often with [[comment (computer programming)|comments]] that describe the constraints. This [[information hiding]] strategy allows the implementation of the module to be changed without disturbing the [[client (computing)|client]] programs, but the module only informally defines an ADT. The notion of abstract data types is related to the concept of [[data abstraction]], important in [[object-oriented programming language|object-oriented programming]] and design by contract methodologies for [[software engineering]].{{cn|date=September 2022}}
Formally, an ADT may be defined as a "class of objects whose logical behavior is defined by a set of values and a set of operations";{{sfn|Dale|Walker|1996|p=3}} this is analogous to an [[algebraic structure]] in mathematics. What is meant by "behaviour" varies by author, with the two main types of formal specifications for behavior being ''axiomatic (algebraic) specification'' and an ''abstract model;''{{sfn|Dale|Walker|1996|p=4}} these correspond to [[axiomatic semantics]] and [[operational semantics]] of an [[abstract machine]], respectively. Some authors also include the [[Computational complexity theory|computational complexity]] ("cost"), both in terms of time (for computing operations) and space (for representing values). In practice, many common data types are not ADTs, as the abstraction is not perfect, and users must be aware of issues like [[arithmetic overflow]] that are due to the representation. For example, [[Integer|integers]] are often stored as fixed-width values ([[32-bit computing|32-bit]] or [[64-bit computing|64-bit]] [[Binary number|binary numbers]]), and thus experience integer overflow if the maximum value is exceeded.


== History ==
ADTs are a theoretical concept, in computer science, used in the design and analysis of [[algorithm]]s, [[Data structure|data structures]], and [[Software system|software systems]], and do not correspond to specific features of [[computer language]]s—mainstream computer languages do not directly support formally specified ADTs. However, various language features correspond to certain aspects of ADTs, and are easily confused with ADTs proper; these include [[abstract type]]s, [[opaque data type]]s, [[Protocol (object-oriented programming)|protocols]], and [[design by contract]]. ADTs were first proposed by [[Barbara Liskov]] and Stephen N. Zilles in 1974, as part of the development of the [[CLU (programming language)|CLU]] language.{{sfn|Liskov|Zilles|1974}}


ADTs were first proposed by [[Barbara Liskov]] and Stephen N. Zilles in 1974, as part of the development of the [[CLU (programming language)|CLU]] language.{{sfn|Liskov|Zilles|1974}} [[Algebraic specification]] was an important subject of research in CS around 1980 and almost a synonym for abstract data types at that time.<ref>{{cite book|title=Fundamentals of Algebraic Specification 1 - Equations and Initial Semantics|first=H.|last= Ehrig|publisher=Springer-Verlag|year=1985|isbn=0-387-13718-1}}</ref> It has a mathematical foundation in [[universal algebra]].<ref>{{cite book|title=Universal Algebra for Computer Scientists|first=Wolfgang|last= Wechler|publisher=Springer-Verlag|year=1992|isbn=0-387-54280-9}}</ref>
==Discussion==
For example, [[integer]]s are an ADT, defined as the values ..., −2, −1, 0, 1, 2, ..., and by the operations of addition, subtraction, multiplication, and division, together with greater than, less than, etc., which behave according to familiar mathematics (with care for [[integer division]]), independently of how the integers are represented by the computer. {{efn|Compare to the characterization of integers in abstract algebra.}} Explicitly, "behavior" includes obeying various axioms (associativity and commutativity of addition, etc.), and preconditions on operations (cannot divide by zero). Typically integers are represented in a data structure as [[binary number]]s, most often as [[two's complement]], but might be [[binary-coded decimal]] or in [[ones' complement]], but for most purposes, the user can work with the abstraction rather than the concrete choice of representation, and can simply use the data as if the type were truly abstract.


== Definition ==
An ADT consists not only of operations but also of a domain of values and of constraints on the defined operations. An "interface" typically refers only to the operations, and perhaps some of the constraints on the operations, such as pre-conditions and post-conditions; but not to other constraints, such as relations between the operations.


Formally, an ADT is analogous to an [[algebraic structure]] in mathematics,<ref>{{cite book | author=Rudolf Lidl | title=Abstract Algebra| publisher=Springer | year=2004 | isbn=978-81-8128-149-4}}, Chapter 7, section 40.</ref> consisting of a domain, a collection of operations, and a set of constraints the operations must satisfy.{{sfn|Dale|Walker|1996|p=3}} The domain is often defined implicitly, for example the [[free object]] over the set of ADT operations. The [[interface (computer science)|interface]] of the ADT typically refers only to the domain and operations, and perhaps some of the constraints on the operations, such as pre-conditions and post-conditions; but not to other constraints, such as relations between the operations, which are considered behavior. There are two main styles of formal specifications for behavior, [[axiomatic semantics]] and [[operational semantics]].{{sfn|Dale|Walker|1996|p=4}}
For example, an abstract [[Stack (abstract data type)|stack]], which is a last-in-first-out structure, could be defined by three operations: <kbd>push</kbd>, that inserts a data item onto the stack; <kbd>pop</kbd>, that removes a data item from it; and <kbd>peek</kbd> or <kbd>top</kbd>, that accesses a data item on top of the stack without removal. An abstract [[Queue (abstract data type)|queue]], which is a first-in-first-out structure, would also have three operations: <kbd>enqueue</kbd>, that inserts a data item into the queue; <kbd>dequeue</kbd>, that removes the first data item from it; and <kbd>front</kbd>, that accesses and serves the first data item in the queue. If these were the entire definitions, there would be no way of differentiating these two data types and their very different expected ordering behavior. Thus, a constraint is introduced that for a stack specifies that each pop always returns (and removes) the most recently pushed item that has not been popped yet, and for a queue (in contrast) specifies that pop operates on the least recently pushed item.


Despite not being part of the interface, the constraints are still important to the definition of the ADT; for example a [[Stack (abstract data type)|stack]] and a [[Queue (abstract data type)|queue]] have similar add element/remove element interfaces, but it is the constraints that distinguish last-in-first-out from first-in-first-out behavior. The constraints do not consist only of equations such as {{code|1=fetch(store(S,v))=v}} but also [[logical formula]]s.
When [[analysis of algorithms|analyzing the efficiency]] of algorithms, one may also specify that all operations take the same time no matter how many data items have been pushed into the stack and that the stack uses a constant amount of storage for each element. However, time bounds are not always considered part of the definition of an ADT.


===Axiomatic semantics===
Abstract data types are purely theoretical entities, used (among other things) to simplify the description of abstract algorithms, to classify and evaluate data structures, and to formally describe the [[type system]]s of programming languages. However, an ADT may be [[implementation|implemented]] by specific [[data type]]s or [[data structure]]s, in many ways and in many programming languages; or described in a [[formal specification language]]. ADTs are often implemented as [[modular programming|modules]]: the module's [[interface (computer science)|interface]] declares procedures that correspond to the ADT operations, sometimes with [[comment (computer programming)|comments]] that describe the constraints. This [[information hiding]] strategy allows the implementation of the module to be changed without disturbing the [[client (computing)|client]] programs.
In the spirit of [[functional programming]], each state of an abstract data structure is a separate entity or value. In this view, each operation is modelled as a [[function (mathematics)|mathematical function]] with no [[side effect (computer science)|side effect]]s. Operations that modify the ADT are modeled as functions that take the old state as an argument and returns the new state as part of the result. The order in which operations are evaluated is immaterial, and the same operation applied to the same arguments (including the same input states) will always return the same results (and output states). The constraints are specified as axioms or algebraic laws that the operations must satisfy.


===Operational semantics===
The term abstract data type can also be regarded as a generalized approach of a number of [[Algebraic structure|algebraic structures]], such as lattices, groups, and rings.<ref>{{cite book | author=Rudolf Lidl | title=Abstract Algebra| publisher=Springer | year=2004 | isbn=978-81-8128-149-4}}, Chapter 7, section 40.</ref> The notion of abstract data types is related to the concept of [[data abstraction]], important in [[object-oriented programming language|object-oriented programming]] and [[design by contract]] methodologies for [[software engineering|software development]].{{cn|date=September 2022}}
In the spirit of [[imperative programming]], an abstract data structure is conceived as an entity that is ''mutable''&mdash;meaning that there is a notion of time and the ADT may be in different states at different times. Operations then change the state of the ADT over time; therefore, the order in which operations are evaluated is important, and the same operation on the same entities may have different effects if executed at different times. This is analogous to the instructions of a computer or the commands and procedures of an imperative language. To underscore this view, it is customary to say that the operations are ''executed'' or ''applied'', rather than ''evaluated'', similar to the imperative style often used when describing abstract algorithms. The constraints are typically specified in prose.


===Auxiliary operations===
==Definition==
An abstract data type is defined as a mathematical model of the data objects that make up a data type as well as the functions that operate on these objects. There are no standard conventions for defining them. A broad division may be drawn between "imperative" (or "operational") and "functional" (or "axiomatic") definition styles.


Presentations of ADTs are often limited in scope to only key operations. More thorough presentations often specify auxiliary operations on ADTs, such as:
===Imperative-style definition===
* {{code|create}}(), that yields a new instance of the ADT;
In the theory of [[imperative programming]] languages, an abstract data structure is conceived as an entity that is ''mutable''&mdash;meaning that it may be in different ''states'' at different times. Some operations may change the state of the ADT; therefore, the order in which operations are evaluated is important, and the same operation on the same entities may have different effects if executed at different times. This is analogous to the instructions of a computer or the commands and procedures of an imperative language. To underscore this view, it is customary to say that the operations are ''executed'' or ''applied'', rather than ''evaluated'', similar to the imperative style often used when describing abstract algorithms. (See [[The Art of Computer Programming]] by [[Donald Knuth]] for more details).
* {{code|compare}}(''s'', ''t''), that tests whether two instances' states are equivalent in some sense;
* {{code|hash}}(''s''), that computes some standard [[hash function]] from the instance's state;
* {{code|print}}(''s'') or {{code|show}}(''s''), that produces a human-readable representation of the instance's state.
These names are illustrative and may vary between authors. In imperative-style ADT definitions, one often finds also:
* {{code|initialize}}(''s''), that prepares a newly created instance ''s'' for further operations, or resets it to some "initial state";
* {{code|copy}}(''s'', ''t''), that puts instance ''s'' in a state equivalent to that of ''t'';
* {{code|clone}}(''t''), that performs ''s'' ← {{code|create}}(), {{code|copy}}(''s'', ''t''), and returns ''s'';
* {{code|free}}(''s'') or {{code|destroy}}(''s''), that reclaims the memory and other resources used by ''s''.


The {{code|free}} operation is not normally relevant or meaningful, since ADTs are theoretical entities that do not "use memory". However, it may be necessary when one needs to analyze the storage used by an algorithm that uses the ADT. In that case, one needs additional axioms that specify how much memory each ADT instance uses, as a function of its state, and how much of it is returned to the pool by {{code|free}}.
====Abstract variable====
Imperative-style definitions of ADT often depend on the concept of an ''abstract variable'', which may be regarded as the simplest non-trivial ADT. An abstract variable ''V'' is a mutable entity that admits two operations:
* <kbd>store</kbd>(''V'', ''x'') where ''x'' is a ''value'' of unspecified nature;
* <kbd>fetch</kbd>(''V''), that yields a value,
with the constraint that:
* <kbd>fetch</kbd>(''V'') always returns the value ''x'' used in the most recent <kbd>store</kbd>(''V'', ''x'') operation on the same variable ''V''.


===Restricted types ===
Fetching before storing can be disallowed, defined to have a certain result, or (less desirably), leave the behavior unspecified.


The definition of an ADT often restricts the stored value(s) for its instances, to members of a specific set ''X'' called the ''range'' of those variables. For example, an abstract variable may be constrained to only store integers. As in programming languages, such restrictions may simplify the description and [[analysis of algorithms]], and improve its readability.
As in many programming languages, the operation <kbd>store</kbd>(''V'', ''x'') is often written ''V'' ← ''x'' (or some similar notation), and <kbd>fetch</kbd>(''V'') is implied whenever a variable ''V'' is used in a context where a value is required. Thus, for example, ''V'' ← ''V'' + 1 is commonly understood to be a shorthand for <kbd>store</kbd>(''V'',<kbd>fetch</kbd>(''V'') + 1).


===Aliasing===
In this definition, it is implicitly assumed that names are always distinct: storing a value into a variable ''U'' has no effect on the state of a distinct variable ''V''. To make this assumption explicit, one could add the constraint that:
* if ''U'' and ''V'' are distinct variables, the sequence { <kbd>store</kbd>(''U'', ''x''); <kbd>store</kbd>(''V'', ''y'') } is equivalent to { <kbd>store</kbd>(''V'', ''y''); <kbd>store</kbd>(''U'', ''x'') }.


In the operational style, it is often unclear how multiple instances are handled and if modifying one instance may affect others. A common style of defining ADTs writes the operations as if only one instance exists during the execution of the algorithm, and all operations are applied to that instance. For example, a stack may have operations {{code|push}}(''x'') and {{code|pop}}(), that operate on ''the'' only existing stack. ADT definitions in this style can be easily rewritten to admit multiple coexisting instances of the ADT, by adding an explicit instance parameter (like ''S'' in the stack example below) to every operation that uses or modifies the implicit instance. Some ADTs cannot be meaningfully defined without allowing multiple instances, for example when a single operation takes two distinct instances of the ADT as parameters, such as a {{code|union}} operation on sets or a {{code|compare}} operation on lists.
More generally, ADT definitions often assume that any operation that changes the state of one ADT instance has no effect on the state of any other instance of the same ADT, unless the ADT axioms define certain instances as connected (see [[Aliasing (computing)|aliased]]) in a specific way. The most common such connections include:


The multiple instance style is sometimes combined with an [[Aliasing (computing)|aliasing]] axiom, namely that the result of {{code|create}}() is distinct from any instance already in use by the algorithm. Implementations of ADTs may still reuse memory and allow implementations of {{code|create}}() to yield a previously created instance; however, defining that such an instance even is "reused" is difficult in the ADT formalism.
* Aliasing, in which two or more names refer to the exact same data object.
* Composition, in which an ADT is defined to contain instances of (the same or other) ADTs.
* Reference, in which an ADT is defined to refer to instance of (the same or other) ADTs.


For example, when extending the definition of an abstract variable to include abstract [[record (computer science)|records]], operations upon a field ''F'' of a record variable ''R'', clearly involve ''F'', which is distinct from, but also a part of, ''R''.
More generally, this axiom may be strengthened to exclude also partial aliasing with other instances, so that composite ADTs (such as trees or records) and reference-style ADTs (such as pointers) may be assumed to be completely disjoint. For example, when extending the definition of an abstract variable to include abstract [[record (computer science)|records]], operations upon a field ''F'' of a record variable ''R'', clearly involve ''F'', which is distinct from, but also a part of, ''R''. A partial aliasing axiom would state that changing a field of one record variable does not affect any other records.


=== Complexity analysis ===
The definition of an ADT may restrict the stored value(s) for its instances, to members of a specific set ''X'' called the ''range'' of those variables. For example, an ADT for an aggregate such as a Stack or Queue, may constrain all items in the queue to be integers, or at least to all be of a single type (see [[Homogeneity_and_heterogeneity_(statistics)|homogeneity]]). As in programming languages, such restrictions may simplify the description and [[analysis of algorithms]], and improve its readability.
Some authors also include the [[Computational complexity theory|computational complexity]] ("cost") of each operation, both in terms of time (for computing operations) and space (for representing values), to aid in [[analysis of algorithms]]. For example, one may specify that each operation takes the same time and each value takes the same space regardless of the state of the ADT, or that there is a "size" of the ADT and the operations are linear, quadratic, etc. in the size of the ADT. [[Alexander Stepanov]], designer of the C++ [[Standard Template Library]], included complexity guarantees in the STL specification, arguing:


{{Blockquote|The reason for introducing the notion of abstract data types was to allow interchangeable software modules. You cannot have interchangeable modules unless these modules share similar complexity behavior. If I replace one module with another module with the same functional behavior but with different complexity tradeoffs, the user of this code will be unpleasantly surprised. I could tell him anything I like about data abstraction, and he still would not want to use the code. Complexity assertions have to be part of the interface.|Alexander Stepanov<ref>{{Cite journal |first=Al |last=Stevens |title=Al Stevens Interviews Alex Stepanov |date=March 1995 |journal=[[Dr. Dobb's Journal]] |url=http://www.sgi.com/tech/stl/drdobbs-interview.html |access-date=31 January 2015}}</ref>}}
Note that this definition does not imply anything about the result of evaluating <kbd>fetch</kbd>(''V'') when ''V'' is ''un-initialized'', that is, before performing any <kbd>store</kbd> operation on ''V''. An algorithm that does so may be considered invalid, either (a) because the ADT prohibits such an operation, or (b) simply because its effect is not defined by the ADT. However, there are some important algorithms whose efficiency strongly depends on the assumption that such a <kbd>fetch</kbd> is legal, and returns some arbitrary value in the variable's range.{{Citation needed|date=May 2009}})


Other authors disagree, arguing that a stack ADT is the same whether it is implemented with a linked list or an array, despite the difference in operation costs, and that an ADT specification should be independent of implementation.
====Instance creation====
Some algorithms need to create new instances of some ADT (such as new variables, or new stacks). To describe such algorithms, one usually includes in the ADT definition a <kbd>create</kbd>() operation that yields an instance of the ADT, usually with axioms equivalent to:
* the result of <kbd>create</kbd>() is distinct from any instance already in use by the algorithm.
This axiom may be strengthened to exclude also partial aliasing with other instances{{clarification needed|date=January 2022}}. For practical use, such as axiom may still allow implementations of <kbd>create</kbd>() to yield a previously created instance that has become inaccessible to the program; however, defining that such an instance even is "the same" is difficult, especially in the abstract (though even a re-used block of memory is only "the same object" in certain senses).


==Examples==
====Example: abstract stack (imperative)====
As another example, an imperative-style definition of an abstract stack could specify that the state of a stack ''S'' can be modified only by the operations:
* <kbd>push</kbd>(''S'', ''x''), where ''x'' is some ''value'' of unspecified nature;
* <kbd>pop</kbd>(''S''), that yields a value as a result,
with the constraint that:
* For any value ''x'' and any abstract variable ''V'', the sequence of operations { <kbd>push</kbd>(''S'', ''x''); ''V'' ← <kbd>pop</kbd>(''S'') } is equivalent to ''V'' ← ''x''.


===Abstract variable===
Since the assignment ''V'' ← ''x'', by definition, cannot change the state of ''S'', this condition implies that ''V'' ← <kbd>pop</kbd>(''S'') restores ''S'' to the state it had before the <kbd>push</kbd>(''S'', ''x''). From this condition and from the properties of abstract variables, it follows, for example, that the sequence:
An abstract variable may be regarded as the simplest non-trivial ADT, with the semantics of an imperative variable. It admits two operations, {{code|fetch}} and {{code|store}}. Operational definitions are often written in terms of abstract variables. In the axiomatic semantics, letting <math>V</math> be the type of the abstract variable and <math>X</math> be the type of its contents, {{code|fetch}} is a function <math>V \to X</math> and {{code|store}} is a function of type <math>V \to X \to V</math>. The main constraint is that {{code|fetch}} always returns the value ''x'' used in the most recent {{code|store}} operation on the same variable ''V'', i.e. {{code|1=fetch(store(V,x)) = x}}. We may also require that {{code|store}} overwrites the value fully, {{code|1=store(store(V,x1),x2) = store(V,x2)}}.
: { <kbd>push</kbd>(''S'', ''x''); <kbd>push</kbd>(''S'', ''y''); ''U'' ← <kbd>pop</kbd>(''S''); <kbd>push</kbd>(''S'', ''z''); ''V'' ← <kbd>pop</kbd>(''S''); ''W'' ← <kbd>pop</kbd>(''S'') }
where ''x'', ''y'', and ''z'' are any values, and ''U'', ''V'', ''W'' are pairwise distinct variables, is equivalent to:
: { ''U'' ← ''y''; ''V'' ← ''z''; ''W'' ← ''x'' }


In the operational semantics, {{code|fetch}}(''V'') is a procedure that returns the current value in the location ''V'', and {{code|store}}(''V'', ''x'') is a procedure with {{code|void}} return type that stores the value ''x'' in the location ''V''. The constraints are described informally as that reads are consistent with writes. As in many programming languages, the operation {{code|store}}(''V'', ''x'') is often written ''V'' ← ''x'' (or some similar notation), and {{code|fetch}}(''V'') is implied whenever a variable ''V'' is used in a context where a value is required. Thus, for example, ''V'' ← ''V'' + 1 is commonly understood to be a shorthand for {{code|store}}(''V'',{{code|fetch}}(''V'') + 1).
Here it is implicitly assumed that operations on a stack instance do not modify the state of any other ADT instance, including other stacks; that is:
* For any values ''x'', ''y'', and any distinct stacks ''S'' and ''T'', the sequence { <kbd>push</kbd>(''S'', ''x''); <kbd>push</kbd>(''T'', ''y'') } is equivalent to { <kbd>push</kbd>(''T'', ''y''); <kbd>push</kbd>(''S'', ''x'') }.


In this definition, it is implicitly assumed that names are always distinct: storing a value into a variable ''U'' has no effect on the state of a distinct variable ''V''. To make this assumption explicit, one could add the constraint that:
An abstract stack definition usually includes also a [[Boolean value|Boolean]]-valued function <kbd>empty</kbd>(''S'') and a <kbd>create</kbd>() operation that returns a stack instance, with axioms equivalent to:
* if ''U'' and ''V'' are distinct variables, the sequence { {{code|store}}(''U'', ''x''); {{code|store}}(''V'', ''y'') } is equivalent to { {{code|store}}(''V'', ''y''); {{code|store}}(''U'', ''x'') }.
* <kbd>create</kbd>() ≠ ''S'' for any prior stack ''S'' (a newly created stack is distinct from all previous stacks);
* <kbd>empty</kbd>(<kbd>create</kbd>()) (a newly created stack is empty);
* <kbd>not</kbd> <kbd>empty</kbd>(<kbd>push</kbd>(''S'', ''x'')) (pushing something into a stack makes it non-empty).


This definition does not say anything about the result of evaluating {{code|fetch}}(''V'') when ''V'' is ''un-initialized'', that is, before performing any {{code|store}} operation on ''V''. Fetching before storing can be disallowed, defined to have a certain result, or left unspecified. There are some algorithms whose efficiency depends on the assumption that such a {{code|fetch}} is legal, and returns some arbitrary value in the variable's range.
====Single-instance style====
Sometimes an ADT is defined as if only one instance of it existed during the execution of the algorithm, and all operations were applied to that instance, which is not explicitly notated. For example, the abstract stack above could have been defined with operations <kbd>push</kbd>(''x'') and <kbd>pop</kbd>(), that operate on ''the'' only existing stack. ADT definitions in this style can be easily rewritten to admit multiple coexisting instances of the ADT, by adding an explicit instance parameter (like ''S'' in the previous example) to every operation that uses or modifies the implicit instance.


===Abstract stack===
On the other hand, some ADTs cannot be meaningfully defined without assuming multiple instances. This is the case when a single operation takes two distinct instances of the ADT as parameters. For an example, consider augmenting the definition of the abstract stack with an operation <kbd>compare</kbd>(''S'', ''T'') that checks whether the stacks ''S'' and ''T'' contain the same items in the same order.
An abstract [[Stack (abstract data type)|stack]] is a last-in-first-out structure, It is generally defined by three key operations: {{code|push}}, that inserts a data item onto the stack; {{code|pop}}, that removes a data item from it; and {{code|peek}} or {{code|top}}, that accesses a data item on top of the stack without removal. A complete abstract stack definition includes also a [[Boolean value|Boolean]]-valued function {{code|empty}}(''S'') and a {{code|create}}() operation that returns an initial stack instance.


In the axiomatic semantics, letting <math>S</math> be the type of state states and <math>X</math> be the type of values contained in the stack, these could have the types <math>push : S \to X \to S</math>, <math>pop : S \to (S,X)</math>, <math>top : S \to X</math>, <math>create : S</math>, and <math>empty : S \to \mathbb{B}</math>. In the axiomatic semantics, creating the initial stack is a "trivial" operation, and always returns the same distinguished state. Therefore, it is often designated by a special symbol like [[Lambda|Λ]] or "()". The {{code|empty}} operation predicate can then be written simply as <math>s = \Lambda</math> or <math>s \neq \Lambda</math>.
===Functional-style definition===
Another way to define an ADT, closer to the spirit of [[functional programming]], is to consider each state of the structure as a separate entity. In this view, any operation that modifies the ADT is modelled as a [[function (mathematics)|mathematical function]] that takes the old state as an argument and returns the new state as part of the result. Unlike the imperative operations, these functions have no [[side effect (computer science)|side effect]]s. Therefore, the order in which they are evaluated is immaterial, and the same operation applied to the same arguments (including the same input states) will always return the same results (and output states).


The constraints are then {{code|1=pop(push(S,v))=(S,v)}}, {{code|1=top(push(S,v))=v}},<ref>{{cite web |last1=Black |first1=Paul E. |title=axiomatic semantics |url=https://xlinux.nist.gov/dads/HTML/axiomaticSemantics.html |website=Dictionary of Algorithms and Data Structures |access-date=25 November 2023 |date=24 August 2005}}</ref>, {{code|empty}}({{code|create}} = T) (a newly created stack is empty), {{code|empty}}({{code|push}}(''S'', ''x'')) = F (pushing something into a stack makes it non-empty). These axioms do not define the effect of {{code|top}}(''s'') or {{code|pop}}(''s''), unless ''s'' is a stack state returned by a {{code|push}}. Since {{code|push}} leaves the stack non-empty, those two operations can be defined to be invalid when ''s'' = Λ. From these axioms (and the lack of side effects), it can be deduced that {{code|push}}(Λ, ''x'') ≠ Λ. Also, {{code|push}}(''s'', ''x'') = {{code|push}}(''t'', ''y'') [[if and only if]] ''x'' = ''y'' and ''s'' = ''t''.
In the functional view, in particular, there is no way (or need) to define an "abstract variable" with the semantics of imperative variables (namely, with <kbd>fetch</kbd> and <kbd>store</kbd> operations). Instead of storing values into variables, one passes them as arguments to functions.


As in some other branches of mathematics, it is customary to assume also that the stack states are only those whose existence can be proved from the axioms in a finite number of steps. In this case, it means that every stack is a ''finite'' sequence of values, that becomes the empty stack (Λ) after a finite number of {{code|pop}}s. By themselves, the axioms above do not exclude the existence of infinite stacks (that can be {{code|pop}}ped forever, each time yielding a different state) or circular stacks (that return to the same state after a finite number of {{code|pop}}s). In particular, they do not exclude states ''s'' such that {{code|pop}}(''s'') = ''s'' or {{code|push}}(''s'', ''x'') = ''s'' for some ''x''. However, since one cannot obtain such stack states from the initial stack state with the given operations, they are assumed "not to exist".
====Example: abstract stack (functional)====
For example, a complete functional-style definition of an abstract stack could use the three operations:
* <kbd>push</kbd>: takes a stack state and an arbitrary value, returns a stack state;
* <kbd>top</kbd>: takes a stack state, returns a value;
* <kbd>pop</kbd>: takes a stack state, returns a stack state.


In the operational definition of an abstract stack, {{code|push}}(''S'', ''x'') returns nothing and {{code|pop}}(''S'') yields the value as the result but not the new state of the stack. There is then the constraint that, for any value ''x'' and any abstract variable ''V'', the sequence of operations { {{code|push}}(''S'', ''x''); ''V'' ← {{code|pop}}(''S'') } is equivalent to ''V'' ← ''x''. Since the assignment ''V'' ← ''x'', by definition, cannot change the state of ''S'', this condition implies that ''V'' ← {{code|pop}}(''S'') restores ''S'' to the state it had before the {{code|push}}(''S'', ''x''). From this condition and from the properties of abstract variables, it follows, for example, that the sequence:
In a functional-style definition there is no need for a <kbd>create</kbd> operation. Indeed, there is no notion of "stack instance". The stack states can be thought of as being potential states of a single stack structure, and two-stack states that contain the same values in the same order are considered to be identical states. This view actually mirrors the behavior of some concrete implementations, such as [[linked list]]s with [[hash cons]].
: { {{code|push}}(''S'', ''x''); {{code|push}}(''S'', ''y''); ''U'' ← {{code|pop}}(''S''); {{code|push}}(''S'', ''z''); ''V'' ← {{code|pop}}(''S''); ''W'' ← {{code|pop}}(''S'') }
where ''x'', ''y'', and ''z'' are any values, and ''U'', ''V'', ''W'' are pairwise distinct variables, is equivalent to:
: { ''U'' ← ''y''; ''V'' ← ''z''; ''W'' ← ''x'' }


Unlike the axiomatic semantics, the operational semantics can suffer from aliasing. Here it is implicitly assumed that operations on a stack instance do not modify the state of any other ADT instance, including other stacks; that is:
Instead of <kbd>create</kbd>(), a functional-style definition of an abstract stack may assume the existence of a special stack state, the ''empty stack'', designated by a special symbol like Λ or "()"; or define a <kbd>bottom</kbd>() operation that takes no arguments and returns this special stack state. Note that the axioms imply that:
* For any values ''x'', ''y'', and any distinct stacks ''S'' and ''T'', the sequence { {{code|push}}(''S'', ''x''); {{code|push}}(''T'', ''y'') } is equivalent to { {{code|push}}(''T'', ''y''); {{code|push}}(''S'', ''x'') }.
* <kbd>push</kbd>(Λ, ''x'') ≠ Λ.
In a functional-style definition of a stack one does not need an <kbd>empty</kbd> predicate: instead, one can test whether a stack is empty by testing whether it is equal to Λ.


=== Boom hierarchy ===
Note that these axioms do not define the effect of <kbd>top</kbd>(''s'') or <kbd>pop</kbd>(''s''), unless ''s'' is a stack state returned by a <kbd>push</kbd>. Since <kbd>push</kbd> leaves the stack non-empty, those two operations are undefined (hence invalid) when ''s'' = Λ. On the other hand, the axioms (and the lack of side effects) imply that <kbd>push</kbd>(''s'', ''x'') = <kbd>push</kbd>(''t'', ''y'') [[if and only if]] ''x'' = ''y'' and ''s'' = ''t''.


A more involved example is the Boom hierarchy of the [[binary tree]], [[List (abstract data type)|list]], [[Set (abstract data type)#Multiset|bag]] and [[Set (computer science)|set]] abstract data types.<ref>{{cite journal |last1=Bunkenburg |first1=Alexander |title=The Boom Hierarchy |journal=Functional Programming, Glasgow 1993 |series=Workshops in Computing |date=1994 |pages=1–8 |doi=10.1007/978-1-4471-3236-3_1|isbn=978-3-540-19879-6 |citeseerx=10.1.1.49.3252}}</ref> All these data types can be declared by three operations: ''null'', which constructs the empty container, ''single'', which constructs a container from a single element and ''append'', which combines two containers of the same type. The complete specification for the four data types can then be given by successively adding the following rules over these operations:
As in some other branches of mathematics, it is customary to assume also that the stack states are only those whose existence can be proved from the axioms in a finite number of steps. In the abstract stack example above, this rule means that every stack is a ''finite'' sequence of values, that becomes the empty stack (Λ) after a finite number of <kbd>pop</kbd>s. By themselves, the axioms above do not exclude the existence of infinite stacks (that can be <kbd>pop</kbd>ped forever, each time yielding a different state) or circular stacks (that return to the same state after a finite number of <kbd>pop</kbd>s). In particular, they do not exclude states ''s'' such that <kbd>pop</kbd>(''s'') = ''s'' or <kbd>push</kbd>(''s'', ''x'') = ''s'' for some ''x''. However, since one cannot obtain such stack states with the given operations, they are assumed "not to exist".
{|
|-
| - null is the left and right neutral for a tree:|| append(null,A) = A, append(A,null) = A.
|-
| - lists add that append is associative: || append(append(A,B),C) = append(A,append(B,C)).
|-
| - bags add commutativity: || append(B,A) = append(A,B).
|-
| - finally, sets are also idempotent: || append(A,A) = A.
|}


Access to the data can be specified by pattern-matching over the three operations, e.g. a ''member'' function for these containers by:
===Whether to include complexity===
{|
Aside from the behavior in terms of axioms, it is also possible to include, in the definition of an ADT operation, their [[Analysis of algorithms|algorithmic complexity]]. [[Alexander Stepanov]], designer of the C++ [[Standard Template Library]], included complexity guarantees in the STL specification, arguing:
|-
| - member(X,single(Y)) = eq(X,Y)
|-
| - member(X,null) = false
|-
| - member(X,append(A,B)) = or(member(X,A), member(X,B))
|}
Care must be taken to ensure that the function is invariant under the relevant rules for the data type. Within each of the equivalence classes implied by the chosen subset of equations, it has to yield the same result for all of its members.


=== Common ADTs ===
{{Blockquote|The reason for introducing the notion of abstract data types was to allow interchangeable software modules. You cannot have interchangeable modules unless these modules share similar complexity behavior. If I replace one module with another module with the same functional behavior but with different complexity tradeoffs, the user of this code will be unpleasantly surprised. I could tell him anything I like about data abstraction, and he still would not want to use the code. Complexity assertions have to be part of the interface.|Alexander Stepanov<ref>{{Cite journal |first=Al |last=Stevens |title=Al Stevens Interviews Alex Stepanov |date=March 1995 |journal=[[Dr. Dobb's Journal]] |url=http://www.sgi.com/tech/stl/drdobbs-interview.html |access-date=31 January 2015}}</ref>}}

==Advantages==
{{More citations needed section|date=May 2011}}

===Encapsulation===
[[Abstraction (computer science)|Abstraction]] provides a promise that any implementation of the ADT has certain properties and abilities; knowing these is all that is required to make use of an ADT object.

===Localization of change===
Code that uses an ADT object will not need to be edited if the implementation of the ADT is changed. Since any changes to the implementation must still comply with the interface, and since code using an ADT object may only refer to properties and abilities specified in the interface, changes may be made to the implementation without requiring any changes in code where the ADT is used.

===Flexibility===
Different implementations of the ADT, having all the same properties and abilities, are equivalent and may be used somewhat interchangeably in code that uses the ADT. This gives a great deal of flexibility when using ADT objects in different situations. For example, different implementations of the ADT may be more efficient in different situations; it is possible to use each in the situation where they are preferable, thus increasing overall efficiency.

==Typical operations==
Some operations that are often specified for ADTs (possibly under other names) are:
* <kbd>compare</kbd>(''s'', ''t''), that tests whether two instances' states are equivalent in some sense;
* <kbd>hash</kbd>(''s''), that computes some standard [[hash function]] from the instance's state;
* <kbd>print</kbd>(''s'') or <kbd>show</kbd>(''s''), that produces a human-readable representation of the instance's state.

In imperative-style ADT definitions, one often finds also:
* <kbd>create</kbd>(), that yields a new instance of the ADT;
* <kbd>initialize</kbd>(''s''), that prepares a newly created instance ''s'' for further operations, or resets it to some "initial state";
* <kbd>copy</kbd>(''s'', ''t''), that puts instance ''s'' in a state equivalent to that of ''t'';
* <kbd>clone</kbd>(''t''), that performs ''s'' ← <kbd>create</kbd>(), <kbd>copy</kbd>(''s'', ''t''), and returns ''s'';
* <kbd>free</kbd>(''s'') or <kbd>destroy</kbd>(''s''), that reclaims the memory and other resources used by ''s''.

The <kbd>free</kbd> operation is not normally relevant or meaningful, since ADTs are theoretical entities that do not "use memory". However, it may be necessary when one needs to analyze the storage used by an algorithm that uses the ADT. In that case, one needs additional axioms that specify how much memory each ADT instance uses, as a function of its state, and how much of it is returned to the pool by <kbd>free</kbd>.

==Examples==
Some common ADTs, which have proved useful in a great variety of applications, are
Some common ADTs, which have proved useful in a great variety of applications, are
{{div col|colwidth=20em}}
{{div col|colwidth=20em}}
Line 160: Line 134:
{{div col end}}
{{div col end}}


Each of these ADTs may be defined in many ways and variants, not necessarily equivalent. For example, an abstract stack may or may not have a <kbd>count</kbd> operation that tells how many items have been pushed and not yet popped. This choice makes a difference not only for its clients but also for the implementation.
Each of these ADTs may be defined in many ways and variants, not necessarily equivalent. For example, an abstract stack may or may not have a {{code|count}} operation that tells how many items have been pushed and not yet popped. This choice makes a difference not only for its clients but also for the implementation.


; Abstract graphical data type
; Abstract graphical data type
Line 166: Line 140:


==Implementation==
==Implementation==
Abstract data types are theoretical entities, used (among other things) to simplify the description of abstract algorithms, to classify and evaluate data structures, and to formally describe the [[type system]]s of programming languages. However, an ADT may be [[implementation|implemented]]. This means each ADT instance or state is represented by some concrete [[data type]] or [[data structure]], and for each abstract operation there is a corresponding [[subroutine|procedure or function]], and these implemented procudures satisfy the ADT's specifications and axioms up to some standard. In practice, the implementation is not perfect, and users must be aware of issues due to limitations of the representation and implemented procedures.
{{further|Opaque data type}}
Implementing an ADT means providing one [[subroutine|procedure or function]] for each abstract operation. The ADT instances are represented by some concrete [[data structure]] that is manipulated by those procedures, according to the ADT's specifications.


For example, [[integer]]s may be specified as an ADT, defined by the distinguished values 0 and 1, the operations of addition, subtraction, multiplication, division (with care for division by zero), comparison, etc., behaving according to the familiar mathematical axioms in abstract algebra such as associativity, commutativity, and so on. However, in a computer, integers are most commonly represented as fixed-width [[32-bit computing|32-bit]] or [[64-bit computing|64-bit]] [[Binary number|binary numbers]]. Users must be aware of issues with this representation, such as [[arithmetic overflow]], where the ADT specifies a valid result but the representation is unable to accomodate this value. Nonetheless, for many purposes, the user can ignore these infidelities and simply use the implementation as if it were the abstract data type.
Usually, there are many ways to implement the same ADT, using several different concrete data structures. Thus, for example, an abstract stack can be implemented by a [[linked list]] or by an [[Array data structure|array]].


Usually, there are many ways to implement the same ADT, using several different concrete data structures. Thus, for example, an abstract stack can be implemented by a [[linked list]] or by an [[Array data structure|array]]. Different implementations of the ADT, having all the same properties and abilities, can be considered semantically equivalent and may be used somewhat interchangeably in code that uses the ADT. This provides a form of [[Abstraction (computer science)|abstraction]] or encapsulation, and gives a great deal of flexibility when using ADT objects in different situations. For example, different implementations of the ADT may be more efficient in different situations; it is possible to use each in the situation where they are preferable, thus increasing overall efficiency. Code that uses an ADT implementation according to its interface will continute working even if the implementation of the ADT is changed.
In order to prevent clients from depending on the implementation, an ADT is often packaged as an ''[[opaque data type]]'' in one or more [[module (programming)|modules]], whose interface contains only the signature (number and types of the parameters and results) of the operations. The implementation of the module—namely, the bodies of the procedures and the concrete data structure used—can then be hidden from most clients of the module. This makes it possible to change the implementation without affecting the clients. If the implementation is exposed, it is known instead as a ''transparent data type.''


When implementing an ADT, each instance (in imperative-style definitions) or each state (in functional-style definitions) is usually represented by a [[handle (computing)|handle]] of some sort.<ref>{{cite book | author=Robert Sedgewick | title=Algorithms in C | publisher=Addison/Wesley | year=1998 | isbn=978-0-201-31452-6 | url-access=registration | url=https://archive.org/details/algorithmsinc00sedg }}, definition 4.4.</ref>
In order to prevent clients from depending on the implementation, an ADT is often packaged as an [[opaque data type]] or [[handle (computing)|handle]] of some sort,<ref>{{cite book | author=Robert Sedgewick | title=Algorithms in C | publisher=Addison/Wesley | year=1998 | isbn=978-0-201-31452-6 | url-access=registration | url=https://archive.org/details/algorithmsinc00sedg }}, definition 4.4.</ref> in one or more [[module (programming)|modules]], whose interface contains only the signature (number and types of the parameters and results) of the operations. The implementation of the module—namely, the bodies of the procedures and the concrete data structure used—can then be hidden from most clients of the module. This makes it possible to change the implementation without affecting the clients. If the implementation is exposed, it is known instead as a ''transparent data type.''


Modern object-oriented languages, such as [[C++]] and [[Java programming language|Java]], support a form of abstract data types. When a class is used as a type, it is an abstract type that refers to a hidden representation. In this model, an ADT is typically implemented as a [[class (computer science)|class]], and each instance of the ADT is usually an [[object (computer science)|object]] of that class. The module's interface typically declares the constructors as ordinary procedures, and most of the other ADT operations as [[method (computer programming)|methods]] of that class. However, such an approach does not easily encapsulate multiple representational variants found in an ADT. It also can undermine the extensibility of object-oriented programs. In a pure object-oriented program that uses interfaces as types, types refer to behaviours, not representations.
Modern object-oriented languages, such as [[C++]] and [[Java programming language|Java]], support a form of abstract data types. When a class is used as a type, it is an abstract type that refers to a hidden representation. In this model, an ADT is typically implemented as a [[class (computer science)|class]], and each instance of the ADT is usually an [[object (computer science)|object]] of that class. The module's interface typically declares the constructors as ordinary procedures, and most of the other ADT operations as [[method (computer programming)|methods]] of that class. Many modern programming languages, such as C++ and Java, come with standard libraries that implement numerous ADTs in this style. However, such an approach does not easily encapsulate multiple representational variants found in an ADT. It also can undermine the extensibility of object-oriented programs. In a pure object-oriented program that uses interfaces as types, types refer to behaviours, not representations.

The specification of some programming languages is intentionally vague about the representation of certain built-in data types, defining only the operations that can be done on them. Therefore, those types can be viewed as "built-in ADTs". Examples are the arrays in many scripting languages, such as [[Awk]], [[Lua (programming language)|Lua]], and [[Perl]], which can be regarded as an implementation of the abstract list.

In a [[formal specification language]], ADTs may be defined axiomatically, and the language then allows manipulating values of these ADTs, thus providing a straightforward and immediate implementation. The [[OBJ (programming language)|OBJ]] family of programming languages for instance allows defining [[equation]]s for specification and [[rewriting]] to run them. Such automatic implementations are usually not as efficient as dedicated implementations, however.


===Example: implementation of the abstract stack===
===Example: implementation of the abstract stack===
Line 204: Line 181:
</syntaxhighlight>
</syntaxhighlight>


This interface can be implemented in many ways. The implementation may be arbitrarily inefficient, since the formal definition of the ADT, above, does not specify how much space the stack may use, nor how long each operation should take. It also does not specify whether the stack state ''s'' continues to exist after a call ''x'' ← <kbd>pop</kbd>(''s'').
This interface can be implemented in many ways. The implementation may be arbitrarily inefficient, since the formal definition of the ADT, above, does not specify how much space the stack may use, nor how long each operation should take. It also does not specify whether the stack state ''s'' continues to exist after a call ''x'' ← {{code|pop}}(''s'').


In practice the formal definition should specify that the space is proportional to the number of items pushed and not yet popped; and that every one of the operations above must finish in a constant amount of time, independently of that number. To comply with these additional specifications, the implementation could use a linked list, or an array (with dynamic resizing) together with two integers (an item count and the array size).
In practice the formal definition should specify that the space is proportional to the number of items pushed and not yet popped; and that every one of the operations above must finish in a constant amount of time, independently of that number. To comply with these additional specifications, the implementation could use a linked list, or an array (with dynamic resizing) together with two integers (an item count and the array size).
Line 220: Line 197:
stack_Item stack_top(stack_T s); // returns the top item of the stack state
stack_Item stack_top(stack_T s); // returns the top item of the stack state
</syntaxhighlight>
</syntaxhighlight>

===ADT libraries===
Many modern programming languages, such as C++ and Java, come with standard libraries that implement several common ADTs, such as those listed above.

===Built-in abstract data types===
The specification of some programming languages is intentionally vague about the representation of certain built-in data types, defining only the operations that can be done on them. Therefore, those types can be viewed as "built-in ADTs". Examples are the arrays in many scripting languages, such as [[Awk]], [[Lua (programming language)|Lua]], and [[Perl]], which can be regarded as an implementation of the abstract list.


==See also==
==See also==

Revision as of 19:51, 25 November 2023

In computer science, an abstract data type (ADT) is a mathematical model for data types, defined by its behavior (semantics) from the point of view of a user of the data, specifically in terms of possible values, possible operations on data of this type, and the behavior of these operations. This mathematical model contrasts with data structures, which are concrete representations of data, and are the point of view of an implementer, not a user. For example, a stack has push/pop operations that follow a Last-In-First-Out rule, and can be concretely implemented using either a list or an array. Another example is a set which stores values, without any particular order, and no repeated values. Values themselves are not retrieved from sets, rather one tests a value for membership to obtain a Boolean "in" or "not in".

ADTs are a theoretical concept, used in formal semantics and program verification and, less strictly, in the design and analysis of algorithms, data structures, and software systems. Most mainstream computer languages do not directly support formally specifying ADTs. However, various language features correspond to certain aspects of implementing ADTs, and are easily confused with ADTs proper; these include abstract types, opaque data types, protocols, and design by contract. For example, in modular programming, the module declares procedures that correspond to the ADT operations, often with comments that describe the constraints. This information hiding strategy allows the implementation of the module to be changed without disturbing the client programs, but the module only informally defines an ADT. The notion of abstract data types is related to the concept of data abstraction, important in object-oriented programming and design by contract methodologies for software engineering.[citation needed]

History

ADTs were first proposed by Barbara Liskov and Stephen N. Zilles in 1974, as part of the development of the CLU language.[1] Algebraic specification was an important subject of research in CS around 1980 and almost a synonym for abstract data types at that time.[2] It has a mathematical foundation in universal algebra.[3]

Definition

Formally, an ADT is analogous to an algebraic structure in mathematics,[4] consisting of a domain, a collection of operations, and a set of constraints the operations must satisfy.[5] The domain is often defined implicitly, for example the free object over the set of ADT operations. The interface of the ADT typically refers only to the domain and operations, and perhaps some of the constraints on the operations, such as pre-conditions and post-conditions; but not to other constraints, such as relations between the operations, which are considered behavior. There are two main styles of formal specifications for behavior, axiomatic semantics and operational semantics.[6]

Despite not being part of the interface, the constraints are still important to the definition of the ADT; for example a stack and a queue have similar add element/remove element interfaces, but it is the constraints that distinguish last-in-first-out from first-in-first-out behavior. The constraints do not consist only of equations such as fetch(store(S,v))=v but also logical formulas.

Axiomatic semantics

In the spirit of functional programming, each state of an abstract data structure is a separate entity or value. In this view, each operation is modelled as a mathematical function with no side effects. Operations that modify the ADT are modeled as functions that take the old state as an argument and returns the new state as part of the result. The order in which operations are evaluated is immaterial, and the same operation applied to the same arguments (including the same input states) will always return the same results (and output states). The constraints are specified as axioms or algebraic laws that the operations must satisfy.

Operational semantics

In the spirit of imperative programming, an abstract data structure is conceived as an entity that is mutable—meaning that there is a notion of time and the ADT may be in different states at different times. Operations then change the state of the ADT over time; therefore, the order in which operations are evaluated is important, and the same operation on the same entities may have different effects if executed at different times. This is analogous to the instructions of a computer or the commands and procedures of an imperative language. To underscore this view, it is customary to say that the operations are executed or applied, rather than evaluated, similar to the imperative style often used when describing abstract algorithms. The constraints are typically specified in prose.

Auxiliary operations

Presentations of ADTs are often limited in scope to only key operations. More thorough presentations often specify auxiliary operations on ADTs, such as:

  • create(), that yields a new instance of the ADT;
  • compare(s, t), that tests whether two instances' states are equivalent in some sense;
  • hash(s), that computes some standard hash function from the instance's state;
  • print(s) or show(s), that produces a human-readable representation of the instance's state.

These names are illustrative and may vary between authors. In imperative-style ADT definitions, one often finds also:

  • initialize(s), that prepares a newly created instance s for further operations, or resets it to some "initial state";
  • copy(s, t), that puts instance s in a state equivalent to that of t;
  • clone(t), that performs screate(), copy(s, t), and returns s;
  • free(s) or destroy(s), that reclaims the memory and other resources used by s.

The free operation is not normally relevant or meaningful, since ADTs are theoretical entities that do not "use memory". However, it may be necessary when one needs to analyze the storage used by an algorithm that uses the ADT. In that case, one needs additional axioms that specify how much memory each ADT instance uses, as a function of its state, and how much of it is returned to the pool by free.

Restricted types

The definition of an ADT often restricts the stored value(s) for its instances, to members of a specific set X called the range of those variables. For example, an abstract variable may be constrained to only store integers. As in programming languages, such restrictions may simplify the description and analysis of algorithms, and improve its readability.

Aliasing

In the operational style, it is often unclear how multiple instances are handled and if modifying one instance may affect others. A common style of defining ADTs writes the operations as if only one instance exists during the execution of the algorithm, and all operations are applied to that instance. For example, a stack may have operations push(x) and pop(), that operate on the only existing stack. ADT definitions in this style can be easily rewritten to admit multiple coexisting instances of the ADT, by adding an explicit instance parameter (like S in the stack example below) to every operation that uses or modifies the implicit instance. Some ADTs cannot be meaningfully defined without allowing multiple instances, for example when a single operation takes two distinct instances of the ADT as parameters, such as a union operation on sets or a compare operation on lists.

The multiple instance style is sometimes combined with an aliasing axiom, namely that the result of create() is distinct from any instance already in use by the algorithm. Implementations of ADTs may still reuse memory and allow implementations of create() to yield a previously created instance; however, defining that such an instance even is "reused" is difficult in the ADT formalism.

More generally, this axiom may be strengthened to exclude also partial aliasing with other instances, so that composite ADTs (such as trees or records) and reference-style ADTs (such as pointers) may be assumed to be completely disjoint. For example, when extending the definition of an abstract variable to include abstract records, operations upon a field F of a record variable R, clearly involve F, which is distinct from, but also a part of, R. A partial aliasing axiom would state that changing a field of one record variable does not affect any other records.

Complexity analysis

Some authors also include the computational complexity ("cost") of each operation, both in terms of time (for computing operations) and space (for representing values), to aid in analysis of algorithms. For example, one may specify that each operation takes the same time and each value takes the same space regardless of the state of the ADT, or that there is a "size" of the ADT and the operations are linear, quadratic, etc. in the size of the ADT. Alexander Stepanov, designer of the C++ Standard Template Library, included complexity guarantees in the STL specification, arguing:

The reason for introducing the notion of abstract data types was to allow interchangeable software modules. You cannot have interchangeable modules unless these modules share similar complexity behavior. If I replace one module with another module with the same functional behavior but with different complexity tradeoffs, the user of this code will be unpleasantly surprised. I could tell him anything I like about data abstraction, and he still would not want to use the code. Complexity assertions have to be part of the interface.

— Alexander Stepanov[7]

Other authors disagree, arguing that a stack ADT is the same whether it is implemented with a linked list or an array, despite the difference in operation costs, and that an ADT specification should be independent of implementation.

Examples

Abstract variable

An abstract variable may be regarded as the simplest non-trivial ADT, with the semantics of an imperative variable. It admits two operations, fetch and store. Operational definitions are often written in terms of abstract variables. In the axiomatic semantics, letting be the type of the abstract variable and be the type of its contents, fetch is a function and store is a function of type . The main constraint is that fetch always returns the value x used in the most recent store operation on the same variable V, i.e. fetch(store(V,x)) = x. We may also require that store overwrites the value fully, store(store(V,x1),x2) = store(V,x2).

In the operational semantics, fetch(V) is a procedure that returns the current value in the location V, and store(V, x) is a procedure with void return type that stores the value x in the location V. The constraints are described informally as that reads are consistent with writes. As in many programming languages, the operation store(V, x) is often written Vx (or some similar notation), and fetch(V) is implied whenever a variable V is used in a context where a value is required. Thus, for example, VV + 1 is commonly understood to be a shorthand for store(V,fetch(V) + 1).

In this definition, it is implicitly assumed that names are always distinct: storing a value into a variable U has no effect on the state of a distinct variable V. To make this assumption explicit, one could add the constraint that:

  • if U and V are distinct variables, the sequence { store(U, x); store(V, y) } is equivalent to { store(V, y); store(U, x) }.

This definition does not say anything about the result of evaluating fetch(V) when V is un-initialized, that is, before performing any store operation on V. Fetching before storing can be disallowed, defined to have a certain result, or left unspecified. There are some algorithms whose efficiency depends on the assumption that such a fetch is legal, and returns some arbitrary value in the variable's range.

Abstract stack

An abstract stack is a last-in-first-out structure, It is generally defined by three key operations: push, that inserts a data item onto the stack; pop, that removes a data item from it; and peek or top, that accesses a data item on top of the stack without removal. A complete abstract stack definition includes also a Boolean-valued function empty(S) and a create() operation that returns an initial stack instance.

In the axiomatic semantics, letting be the type of state states and be the type of values contained in the stack, these could have the types , , , , and . In the axiomatic semantics, creating the initial stack is a "trivial" operation, and always returns the same distinguished state. Therefore, it is often designated by a special symbol like Λ or "()". The empty operation predicate can then be written simply as or .

The constraints are then pop(push(S,v))=(S,v), top(push(S,v))=v,[8], empty(create = T) (a newly created stack is empty), empty(push(S, x)) = F (pushing something into a stack makes it non-empty). These axioms do not define the effect of top(s) or pop(s), unless s is a stack state returned by a push. Since push leaves the stack non-empty, those two operations can be defined to be invalid when s = Λ. From these axioms (and the lack of side effects), it can be deduced that push(Λ, x) ≠ Λ. Also, push(s, x) = push(t, y) if and only if x = y and s = t.

As in some other branches of mathematics, it is customary to assume also that the stack states are only those whose existence can be proved from the axioms in a finite number of steps. In this case, it means that every stack is a finite sequence of values, that becomes the empty stack (Λ) after a finite number of pops. By themselves, the axioms above do not exclude the existence of infinite stacks (that can be popped forever, each time yielding a different state) or circular stacks (that return to the same state after a finite number of pops). In particular, they do not exclude states s such that pop(s) = s or push(s, x) = s for some x. However, since one cannot obtain such stack states from the initial stack state with the given operations, they are assumed "not to exist".

In the operational definition of an abstract stack, push(S, x) returns nothing and pop(S) yields the value as the result but not the new state of the stack. There is then the constraint that, for any value x and any abstract variable V, the sequence of operations { push(S, x); Vpop(S) } is equivalent to Vx. Since the assignment Vx, by definition, cannot change the state of S, this condition implies that Vpop(S) restores S to the state it had before the push(S, x). From this condition and from the properties of abstract variables, it follows, for example, that the sequence:

{ push(S, x); push(S, y); Upop(S); push(S, z); Vpop(S); Wpop(S) }

where x, y, and z are any values, and U, V, W are pairwise distinct variables, is equivalent to:

{ Uy; Vz; Wx }

Unlike the axiomatic semantics, the operational semantics can suffer from aliasing. Here it is implicitly assumed that operations on a stack instance do not modify the state of any other ADT instance, including other stacks; that is:

  • For any values x, y, and any distinct stacks S and T, the sequence { push(S, x); push(T, y) } is equivalent to { push(T, y); push(S, x) }.

Boom hierarchy

A more involved example is the Boom hierarchy of the binary tree, list, bag and set abstract data types.[9] All these data types can be declared by three operations: null, which constructs the empty container, single, which constructs a container from a single element and append, which combines two containers of the same type. The complete specification for the four data types can then be given by successively adding the following rules over these operations:

- null is the left and right neutral for a tree: append(null,A) = A, append(A,null) = A.
- lists add that append is associative: append(append(A,B),C) = append(A,append(B,C)).
- bags add commutativity: append(B,A) = append(A,B).
- finally, sets are also idempotent: append(A,A) = A.

Access to the data can be specified by pattern-matching over the three operations, e.g. a member function for these containers by:

- member(X,single(Y)) = eq(X,Y)
- member(X,null) = false
- member(X,append(A,B)) = or(member(X,A), member(X,B))

Care must be taken to ensure that the function is invariant under the relevant rules for the data type. Within each of the equivalence classes implied by the chosen subset of equations, it has to yield the same result for all of its members.

Common ADTs

Some common ADTs, which have proved useful in a great variety of applications, are

Each of these ADTs may be defined in many ways and variants, not necessarily equivalent. For example, an abstract stack may or may not have a count operation that tells how many items have been pushed and not yet popped. This choice makes a difference not only for its clients but also for the implementation.

Abstract graphical data type

An extension of ADT for computer graphics was proposed in 1979:[10] an abstract graphical data type (AGDT). It was introduced by Nadia Magnenat Thalmann, and Daniel Thalmann. AGDTs provide the advantages of ADTs with facilities to build graphical objects in a structured way.

Implementation

Abstract data types are theoretical entities, used (among other things) to simplify the description of abstract algorithms, to classify and evaluate data structures, and to formally describe the type systems of programming languages. However, an ADT may be implemented. This means each ADT instance or state is represented by some concrete data type or data structure, and for each abstract operation there is a corresponding procedure or function, and these implemented procudures satisfy the ADT's specifications and axioms up to some standard. In practice, the implementation is not perfect, and users must be aware of issues due to limitations of the representation and implemented procedures.

For example, integers may be specified as an ADT, defined by the distinguished values 0 and 1, the operations of addition, subtraction, multiplication, division (with care for division by zero), comparison, etc., behaving according to the familiar mathematical axioms in abstract algebra such as associativity, commutativity, and so on. However, in a computer, integers are most commonly represented as fixed-width 32-bit or 64-bit binary numbers. Users must be aware of issues with this representation, such as arithmetic overflow, where the ADT specifies a valid result but the representation is unable to accomodate this value. Nonetheless, for many purposes, the user can ignore these infidelities and simply use the implementation as if it were the abstract data type.

Usually, there are many ways to implement the same ADT, using several different concrete data structures. Thus, for example, an abstract stack can be implemented by a linked list or by an array. Different implementations of the ADT, having all the same properties and abilities, can be considered semantically equivalent and may be used somewhat interchangeably in code that uses the ADT. This provides a form of abstraction or encapsulation, and gives a great deal of flexibility when using ADT objects in different situations. For example, different implementations of the ADT may be more efficient in different situations; it is possible to use each in the situation where they are preferable, thus increasing overall efficiency. Code that uses an ADT implementation according to its interface will continute working even if the implementation of the ADT is changed.

In order to prevent clients from depending on the implementation, an ADT is often packaged as an opaque data type or handle of some sort,[11] in one or more modules, whose interface contains only the signature (number and types of the parameters and results) of the operations. The implementation of the module—namely, the bodies of the procedures and the concrete data structure used—can then be hidden from most clients of the module. This makes it possible to change the implementation without affecting the clients. If the implementation is exposed, it is known instead as a transparent data type.

Modern object-oriented languages, such as C++ and Java, support a form of abstract data types. When a class is used as a type, it is an abstract type that refers to a hidden representation. In this model, an ADT is typically implemented as a class, and each instance of the ADT is usually an object of that class. The module's interface typically declares the constructors as ordinary procedures, and most of the other ADT operations as methods of that class. Many modern programming languages, such as C++ and Java, come with standard libraries that implement numerous ADTs in this style. However, such an approach does not easily encapsulate multiple representational variants found in an ADT. It also can undermine the extensibility of object-oriented programs. In a pure object-oriented program that uses interfaces as types, types refer to behaviours, not representations.

The specification of some programming languages is intentionally vague about the representation of certain built-in data types, defining only the operations that can be done on them. Therefore, those types can be viewed as "built-in ADTs". Examples are the arrays in many scripting languages, such as Awk, Lua, and Perl, which can be regarded as an implementation of the abstract list.

In a formal specification language, ADTs may be defined axiomatically, and the language then allows manipulating values of these ADTs, thus providing a straightforward and immediate implementation. The OBJ family of programming languages for instance allows defining equations for specification and rewriting to run them. Such automatic implementations are usually not as efficient as dedicated implementations, however.

Example: implementation of the abstract stack

As an example, here is an implementation of the abstract stack above in the C programming language.

Imperative-style interface

An imperative-style interface might be:

typedef struct stack_Rep stack_Rep;       // type: stack instance representation (opaque record)
typedef stack_Rep* stack_T;               // type: handle to a stack instance (opaque pointer)
typedef void* stack_Item;                 // type: value stored in stack instance (arbitrary address)

stack_T stack_create(void);               // creates a new empty stack instance
void stack_push(stack_T s, stack_Item x); // adds an item at the top of the stack
stack_Item stack_pop(stack_T s);          // removes the top item from the stack and returns it
bool stack_empty(stack_T s);              // checks whether stack is empty

This interface could be used in the following manner:

#include <stack.h>          // includes the stack interface

stack_T s = stack_create(); // creates a new empty stack instance
int x = 17;
stack_push(s, &x);          // adds the address of x at the top of the stack
void* y = stack_pop(s);     // removes the address of x from the stack and returns it
if (stack_empty(s)) { }     // does something if stack is empty

This interface can be implemented in many ways. The implementation may be arbitrarily inefficient, since the formal definition of the ADT, above, does not specify how much space the stack may use, nor how long each operation should take. It also does not specify whether the stack state s continues to exist after a call xpop(s).

In practice the formal definition should specify that the space is proportional to the number of items pushed and not yet popped; and that every one of the operations above must finish in a constant amount of time, independently of that number. To comply with these additional specifications, the implementation could use a linked list, or an array (with dynamic resizing) together with two integers (an item count and the array size).

Functional-style interface

Functional-style ADT definitions are more appropriate for functional programming languages, and vice versa. However, one can provide a functional-style interface even in an imperative language like C. For example:

typedef struct stack_Rep stack_Rep;          // type: stack state representation (opaque record)
typedef stack_Rep* stack_T;                  // type: handle to a stack state (opaque pointer)
typedef void* stack_Item;                    // type: value of a stack state (arbitrary address)

stack_T stack_empty(void);                   // returns the empty stack state
stack_T stack_push(stack_T s, stack_Item x); // adds an item at the top of the stack state and returns the resulting stack state
stack_T stack_pop(stack_T s);                // removes the top item from the stack state and returns the resulting stack state
stack_Item stack_top(stack_T s);             // returns the top item of the stack state

See also

Notes

Citations

  1. ^ Liskov & Zilles 1974.
  2. ^ Ehrig, H. (1985). Fundamentals of Algebraic Specification 1 - Equations and Initial Semantics. Springer-Verlag. ISBN 0-387-13718-1.
  3. ^ Wechler, Wolfgang (1992). Universal Algebra for Computer Scientists. Springer-Verlag. ISBN 0-387-54280-9.
  4. ^ Rudolf Lidl (2004). Abstract Algebra. Springer. ISBN 978-81-8128-149-4., Chapter 7, section 40.
  5. ^ Dale & Walker 1996, p. 3.
  6. ^ Dale & Walker 1996, p. 4.
  7. ^ Stevens, Al (March 1995). "Al Stevens Interviews Alex Stepanov". Dr. Dobb's Journal. Retrieved 31 January 2015.
  8. ^ Black, Paul E. (24 August 2005). "axiomatic semantics". Dictionary of Algorithms and Data Structures. Retrieved 25 November 2023.
  9. ^ Bunkenburg, Alexander (1994). "The Boom Hierarchy". Functional Programming, Glasgow 1993. Workshops in Computing: 1–8. CiteSeerX 10.1.1.49.3252. doi:10.1007/978-1-4471-3236-3_1. ISBN 978-3-540-19879-6.
  10. ^ D. Thalmann, N. Magnenat Thalmann (1979). Design and Implementation of Abstract Graphical Data Types. IEEE. doi:10.1109/CMPSAC.1979.762551., Proc. 3rd International Computer Software and Applications Conference (COMPSAC'79), IEEE, Chicago, USA, pp.519-524
  11. ^ Robert Sedgewick (1998). Algorithms in C. Addison/Wesley. ISBN 978-0-201-31452-6., definition 4.4.

References

Further reading

External links