Jump to content

Heapsort

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by ZeroOne (talk | contribs) at 19:33, 4 September 2004 (why was there a __NOTOC__? removed it.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Heapsort is one of the best general-purpose sort algorithms, and is part of the selection sort family. It has very good running time performance on randomly ordered arrays, well behaved memory use, and its worst-case performance is effectively the same as its average performance.

Heapsort in action

Consider placing a number of objects into a heap data structure. The heap places an ordering upon elements added to it, so the largest elements will be placed at the top of the heap (we term this heap a max-heap), or the smallest elements will be placed at the top of the heap (we term this heap a min-heap). The heap data structure also allows us to perform the operation of extracting the minimum or maximum element in the heap, reordering what elements are left in the heap.

We can exploit this property to obtain an efficient sort algorithm. If we have an unsorted collection of elements we wish to sort, we add them into a min-heap, and then continue extracting the minimum element of the heap, placing this element first in order, then second and so on will give the elements in sorted order. Likewise, if we have a max-heap, we extract continue extracting the maximum element of the heap, placing the elements however backwards, giving the maximum element last in order, moving to the minimum element first in order.

In doing so, we require only as much extra space to store the heap (assuming the original array of unsorted elements as given).

Why it works

The heap data structure is designed to act as a priority queue, placing the largest elements at the front of the queue and the smallest elements at the back, or vice versa. When adding elements to (or deleting them from) a heap, one must preserve this so-called heap property. When using a binary tree to implement the heap, the relevant operation is often known as up-heap, bubble-up, or sift-up. Since trees can be represented by means of an array, arrays can be used to implement heaps also.

Although not widely known, it is possible to define a ternary Heapsort. Instead of each element having two children, each element has three. Ternary Heapsort is somewhat more complicated to program, but it is potentially faster. Each step in a ternary heap requires three comparisons and one swap vs. two comparisons and one swap in a binary heap. The ternary heap can do two steps in less time than the binary heap requires for three steps. But two steps of a ternary tree multiply the index by a factor of 9, which is more than the factor 8 of three binary steps. Ternary Heapsort is about 12% faster than binary Heapsort.

Analysis of heapsort

Heapsort works thus in-place and the worst-case running time to sort n elements is O(n log n). For reasonably large values of n, the log n term is almost constant, so that the sorting time is close to linear in the number of items to sort.

Because Heapsort requires only a fixed amount of extra storage space, it is a true in-place algorithm. The up-heap operation may use tail recursion which requires stack space in naive interpreters and compilers, but a good language implementation or a competent programmer could straightforwardly convert the algorithm to an iterative form.

Heapsort is not a stable sort.

Comparison with other sorts

Heapsort primarily competes with Quicksort, another very efficient general purpose nearly-in-place comparison-based sort algorithm.

Quicksort is typically somewhat faster, but the worst-case running time for Quicksort is O(n2) which is unacceptable for large data sets and can be deliberately triggered given enough knowledge of the implementation, creating a security risk. See Quicksort for a detailed discussion of this problem, and possible solutions.

The Quicksort algorithm also requires Ω(log n) extra storage space on average, making it not a strictly in-place algorithm. This typically does not pose a problem except on the smallest embedded systems, or on systems where memory allocation is highly restricted. Constant space (in-place) variants of Quicksort are possible to construct, but are rarely used in practice due to their extra complexity.

Thus, because of the O(n log n) upper bound on Heapsort's running time and constant upper bound on its auxiliary storage, embedded systems with real-time constraints or systems concerned with security often use Heapsort.

Heapsort also competes with Mergesort, which has the same time bounds, but requires Ω(n) auxilary space, whereas Heapsort requires only a constant amount. Heapsort also typically runs more quickly in practice. However, Mergesort is generally seen as simpler to understand than heapsort, is a stable sort, parallelizes better, and can be easily adapted to operate on linked lists and very large lists stored on slow-to-access media such as disk storage or network-attached storage. Heapsort shares none of these benefits.