Dynamic array: Difference between revisions
→Performance: Rearrange
MegaHasher (talk | contribs): restored overview
Revision as of 06:43, 28 July 2007
A dynamic array, growable array, resizable array, dynamic table, or array list is a data structure, an array which is automatically expanded to accommodate new objects if filled beyond its current size. It may also automatically deallocate some of its unused space to save memory. It has become a popular data structure in modern mainstream programming languages, supplied with most standard libraries.
Note that in this article, a dynamic array is not the same thing as a dynamically-allocated array, which is just an ordinary fixed-size array whose size is selected at runtime.
Overview
One of the main disadvantages of a simple array is that it has a single fixed size. A dynamic array automatically resizes the array when there is an attempt to add an element to the end of the array, and there is no more space. However, if just one element is added to the array each time it runs out of space, the cost of the resizing operations rapidly becomes prohibitive.
To deal with this, we instead resize the array by a large amount, such as doubling its size. Then, the next time we need to enlarge the array, we just expand it into some of this reserved space. The amount of space we have allocated for the array is called its capacity, and it may be larger than its current logical size. Here's how the operation adding an element to the end might work in pseudocode:
 function insertEnd(dynarray a, element e)
     if (a.size = a.capacity)
         // resize a to twice its current capacity:
         a.capacity = a.capacity * 2
         // (copy the contents to the new memory location here)
     a[a.size] = e
     a.size = a.size + 1
Using amortized analysis, it can be shown that as long as we expand the array by some fixed percentage each time, the cost for inserting n elements will be O(n); we say the insertions have an amortized cost of O(1) each (see big-O notation for an explanation of this notation).
Many dynamic arrays also deallocate some of the underlying storage if the size drops below a certain threshold, such as 30% of the capacity.
In computer science education, dynamic arrays often serve as an elementary example of amortized analysis because of their simplicity and familiarity.
Performance
In most ways the dynamic array has performance similar to an array, with the addition of new operations to add and remove elements from the end:
- Getting or setting the value at a particular index (constant time)
- Iterating over the elements in order (linear time, good cache performance)
- Adding an element to the end (constant amortized time)
- Removing an element from the end (constant amortized time, or just constant time if the buffer is not shrunk)
Dynamic arrays benefit from many of the advantages of arrays, including good locality of reference and data cache utilization, compactness (low memory use), and random access. They usually have only a small fixed additional overhead for storing information about the size and capacity. This makes dynamic arrays an attractive tool for building cache-friendly data structures.
The main disadvantage of dynamic arrays compared to normal arrays is that when they grow they reserve a great deal of space (linear space in fact) that may never be used. This becomes especially problematic when there are a large number of such arrays. There are variants, however (see Variants), which waste only O(√n) space at any time.

On the other hand, compared to other list data structures, dynamic arrays are not as flexible. In particular, in order to add or remove an element at an arbitrary position in a dynamic array, we must shift up or down all following elements, requiring Ω(n) or linear time, whereas a linked list can do this in constant time. For this reason other list data structures may still be preferred in applications where data is frequently inserted or removed from the middle of the list.
Analysis
In choosing the percentage by which to enlarge the table, there is a time-space tradeoff: the average time per insertion operation is about a/(a−1), where a is the multiplier factor, such as 2, by which the table size increases each time. On the other hand, the allocated-but-unused space (capacity minus size) is bounded above by (a−1)n. The choice a = 2 is commonly used, but depending on the application many values between about a = 1.2 and a = 4 may be suitable.
Variants
Dynamic arrays supporting additional operations efficiently can be created. For example, by using cells from the middle of the buffer instead of the beginning of the buffer, one can allow amortized constant time insertion and deletion at both ends, at the expense of keeping track of the starting index of the data (this variant may also require copying during buffer growth more often, since functions like realloc() typically grow the buffer in only one direction). See the array-based implementation of deques for more information.
Optionally, this double-ended variant can wrap elements around from the beginning to the end or vice versa when it runs out of room, creating a growable circular buffer which only needs to be enlarged when all cells are full. This is more space-efficient but also destroys the useful property that the array's cells occupy contiguous words of memory, allowing normal indexing.
Another variation, notably used by the library sort algorithm, leaves carefully placed gaps between elements to facilitate rapid insertion in the middle of the list.
One of the problems with the simple dynamic array is that it often has a constant fraction of cells not in use, which wastes space. In a 1999 paper[1], Brodnik et al. describe a dynamic array data structure which wastes only O(√n) space for n elements at any point in time, and they prove a lower bound showing that any dynamic array must waste this much space if the operations are to remain amortized constant time. Additionally, they present a variant in which growing and shrinking the buffer takes not only amortized but worst-case constant time. However, like the circular buffer variant, these structures have the problem that indexing into the array is somewhat slower in practice.
Language support
The C language supports dynamic arrays through the use of the realloc() library call. C++'s std::vector is an implementation of dynamic arrays, as are the ArrayList[2] classes supplied with the Java API and the .NET Framework. The generic List<> class supplied with version 2.0 of the .NET Framework is also implemented with dynamic arrays. Delphi and D implement dynamic arrays at the language's core. Many scripting languages such as Perl offer dynamic arrays as a built-in primitive data type.
See also
- Gap buffer, a data structure that improves the performance of dynamic arrays for consecutive insertions and deletions at the same point.
- Linked list, an alternative to dynamic arrays, offering faster inserts and deletes at the cost of less efficient seeking.
References
- ^ Brodnik, Andrej; Carlsson, Svante; Sedgewick, Robert; Munro, JI; Demaine, ED (1999). Resizable Arrays in Optimal Time and Space (Technical Report CS-99-09). Department of Computer Science, University of Waterloo.
- ^ Javadoc on ArrayList
- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Section 17.4: Dynamic tables, pp. 416–425.