Flashsort

From Wikipedia, the free encyclopedia

Flashsort is a distribution sorting algorithm showing linear computational complexity O(n) for uniformly distributed data sets and relatively little additional memory requirement. The original work was published in 1998 by Karl-Dietrich Neubert.[1]

Concept

Flashsort is an efficient in-place implementation of bucket sort. It assigns each of the n input elements to one of m buckets, efficiently rearranges the input to place the buckets in the correct order, then sorts each bucket. The original algorithm sorts an input array A as follows:

  1. Using a first pass over the input or a priori knowledge, find the minimum and maximum sort keys.
  2. Linearly divide the range [Amin, Amax] into m buckets.
  3. Make one pass over the input, counting the number of elements Ai which fall into each bucket. (Neubert calls the buckets "classes" and the assignment of elements to their buckets "classification".)
  4. Convert the counts of elements in each bucket to a prefix sum, where Lb is the number of elements Ai in bucket b or less. (L0 = 0 and Lm = n.)
  5. Rearrange the input so that all elements of each bucket b are stored in positions Ai where Lb−1 < i ≤ Lb.
  6. Sort each bucket using insertion sort.
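The six steps can be sketched in Python. This is an illustrative sketch rather than Neubert's published code: the function name is invented, m ≈ 0.43n is a commonly used rule of thumb rather than part of the algorithm, and a single insertion-sort pass over the whole nearly-sorted array stands in for separate per-bucket sorts.

```python
def flashsort(A, m=None):
    """Sort A in place using Neubert-style flashsort (illustrative sketch)."""
    n = len(A)
    if n < 2:
        return A
    m = m or max(2, int(0.43 * n))          # assumed rule of thumb for bucket count
    lo, hi = min(A), max(A)                 # step 1: find min and max keys
    if lo == hi:
        return A                            # all keys equal: nothing to do
    def bucket(x):                          # step 2: linear division into m buckets
        return int((m - 1) * (x - lo) / (hi - lo))
    L = [0] * m
    for x in A:                             # step 3: count elements per bucket
        L[bucket(x)] += 1
    for b in range(1, m):                   # step 4: convert counts to prefix sums
        L[b] += L[b - 1]
    i = 0
    while i < n:                            # step 5: in-place cycle-leader permutation
        b = bucket(A[i])
        if i < L[b]:                        # A[i] is unclassified: start a cycle
            t = A[i]
            while True:
                j = L[b] - 1                # slot where t belongs
                t, A[j] = A[j], t           # place t, pick up the displaced element
                L[b] -= 1
                if j == i:
                    break
                b = bucket(t)
        i += 1
    for k in range(1, n):                   # step 6: insertion sort; the array is
        x, j = A[k], k - 1                  # nearly sorted, so this pass is cheap
        while j >= 0 and A[j] > x:
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = x
    return A
```

The final pass exploits the fact that after step 5 every element is already in its correct bucket, so insertion sort never moves an element across a bucket boundary.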

Steps 1–3 and 6 are common to any bucket sort, and can be improved using techniques generic to bucket sorts. In particular, the goal is for the buckets to be of approximately equal size (n/m elements each),[1] with the ideal being division into m quantiles. If the input distribution is known to be non-uniform, a non-linear division will more closely approximate this ideal. Likewise, the final sort can use any of a number of techniques, including a recursive flash sort.

The significant contribution of flash sort is step 5: an efficient O(n) in-place algorithm for collecting the elements of each bucket together in the correct relative order using only m words of additional memory.

Memory-efficient implementation

Flashsort is easy to understand using two (word-sized) variables per bucket. The clever part is the elimination of one of those variables, allowing twice as many buckets to be used and therefore half as much time spent on the final O(n²) sorting.

To understand it with two variables per bucket, assume there are two arrays of m additional words: Kb is the (fixed) upper limit of bucket b, and Lb is a (movable) index into bucket b, so Kb−1 ≤ Lb ≤ Kb.

We maintain the loop invariant that each bucket is divided by Lb into an unclassified prefix (Ai for Kb−1 < i ≤ Lb have yet to be moved to their target buckets) and a classified suffix (Ai for Lb < i ≤ Kb are all in the correct bucket and will not be moved again). Initially Lb = Kb and all elements are unclassified. As sorting proceeds, the Lb are decremented until Lb = Kb−1 for all b and all elements are classified into the correct bucket.

Each round begins by finding the first unclassified element Ai, i.e. i = Kc−1 + 1 for the first incompletely classified bucket c which has Kc−1 < Lc. (Neubert calls this the "cycle leader".) Copy Ai to a temporary variable t and repeat:

  • Compute the bucket b to which t belongs.
  • Let j = Lb be the location where t will be stored.
  • Exchange t with Aj, i.e. store t in Aj while fetching the previous value Aj thereby displaced.
  • Decrement Lb to reflect the fact that Aj is now correctly classified.
  • If ji, restart this loop with the new t.
  • If j = i, this round is over and find a new first unclassified element Ai.
  • When there are no more unclassified elements, the distribution into buckets is complete.
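The round structure above can be sketched in Python. This is an illustrative sketch, not Neubert's code: the function and variable names are invented, the arrays are zero-indexed (so K[b] is the exclusive upper bound of bucket b), and K is built from a counting pass rather than known a priori.

```python
def distribute_two_arrays(A, m, bucket):
    """Collect each bucket's elements together in place, using two m-word
    arrays: K (fixed upper limits) and L (movable pointers)."""
    K = [0] * m
    for x in A:                      # count elements per bucket
        K[bucket(x)] += 1
    for b in range(1, m):            # prefix sums: K[b] = exclusive end of bucket b
        K[b] += K[b - 1]
    L = K[:]                         # movable pointers; L[b] counts down to b's start
    for c in range(m):               # visit buckets in order, looking for leaders
        start = K[c - 1] if c > 0 else 0
        while start < L[c]:          # bucket c still has unclassified elements
            i = start                # cycle leader: first unclassified slot of c
            t = A[i]
            while True:
                b = bucket(t)        # compute the bucket to which t belongs
                j = L[b] - 1         # the location where t will be stored
                t, A[j] = A[j], t    # store t, fetching the displaced value
                L[b] -= 1            # A[j] is now correctly classified
                if j == i:           # cycle closed: this round is over
                    break
    return K                         # bucket b occupies A[K[b-1]:K[b]]
```

As the text notes, the choice of i as the first unclassified slot is arbitrary; any unclassified element would serve as a cycle leader.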

When implemented with two variables per bucket in this way, the choice of each round's starting point i is in fact arbitrary; any unclassified element may be used as a cycle leader. The only requirement is that the cycle leaders can be found efficiently.

Although the preceding description uses K to find the cycle leaders, it is in fact possible to do without it, allowing the entire m-word array to be eliminated. (After the distribution is complete, the bucket boundaries can be found in L.)

Suppose that we have classified all elements up to i−1, and are considering Ai as a potential new cycle leader. It is easy to compute its target bucket b. By the loop invariant, it is classified if Lb < i ≤ Kb, and unclassified if i is outside that range. The first inequality is easy to test, but the second appears to require the value Kb.

It turns out that the induction hypothesis that all elements up to i−1 are classified implies that i ≤ Kb, so it is not necessary to test the second inequality.

Consider the bucket c which position i falls into. That is, Kc−1 < i ≤ Kc. By the induction hypothesis, all elements below i, which includes all buckets up to Kc−1 < i, are completely classified. I.e. no elements which belong in those buckets remain in the rest of the array. Therefore, it is not possible that b < c.

The only remaining case is b ≥ c, which implies Kb ≥ Kc ≥ i, Q.E.D.

Incorporating this, the flashsort distribution algorithm begins with L as described above and i = 1. Then proceed:[1][2]

  • If i > n, the distribution is complete.
  • Given Ai, compute the bucket b to which it belongs.
  • If i ≤ Lb, then Ai is unclassified. Copy it to a temporary variable t and:
    • Let j = Lb be the location where t will be stored.
    • Exchange t with Aj, i.e. store t in Aj while fetching the previous value Aj thereby displaced.
    • Decrement Lb to reflect the fact that Aj is now correctly classified.
    • If ji, compute the bucket b to which t belongs and restart this (inner) loop with the new t.
  • Ai is now correctly classified. Increment i and restart the (outer) loop.
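This single-array distribution can be sketched in Python as follows (an illustrative sketch with invented names; zero-based indexing, so the movable pointer L[b] counts down to the start of bucket b rather than to Kb−1):

```python
def distribute(A, m, bucket):
    """Flashsort distribution using only the movable m-word array L."""
    n = len(A)
    L = [0] * m
    for x in A:                     # count elements per bucket
        L[bucket(x)] += 1
    for b in range(1, m):           # prefix sums: L[b] = elements in buckets 0..b
        L[b] += L[b - 1]
    i = 0
    while i < n:                    # outer loop: distribution ends when i reaches n
        b = bucket(A[i])            # compute the bucket to which A[i] belongs
        if i < L[b]:                # A[i] is unclassified (the only test needed)
            t = A[i]
            while True:             # inner loop: one displacement cycle
                j = L[b] - 1        # the location where t will be stored
                t, A[j] = A[j], t   # store t, fetching the displaced value
                L[b] -= 1           # A[j] is now correctly classified
                if j == i:
                    break
                b = bucket(t)       # classify the new t and continue the cycle
        i += 1                      # A[i] is now correctly classified
    return L                        # L[b] is now the start index of bucket b
```

After the loop, the bucket boundaries survive in L, as the text observes: each L[b] has been decremented down to the start of bucket b.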

While saving memory, flashsort has the disadvantage that it recomputes the bucket for many already-classified elements. This can be expensive if buckets are assigned using a more complex distribution than simple linear interpolation. A variant reduces the number of recomputations from almost n to at most m:

  • Maintain a variable c identifying the first incompletely-classified bucket. Let c = 1 to begin with, and when c > m, the distribution is complete.
  • Let i = Lc.
  • Compute the bucket b to which Ai belongs.
  • If b < c, then Lc = Kc−1 and we are done with bucket c. Increment c and restart this loop.
  • If b = c, the classification is trivial. Decrement Lc and restart this loop.
  • If b > c, then Ai is unclassified. Perform the same classification loop as the previous case, then restart this loop.
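A sketch of this variant in Python (illustrative names; the extra guard for L[c] = 0, needed when every bucket up to c is empty, is an implementation detail not spelled out above):

```python
def distribute_lazy(A, m, bucket):
    """Flashsort distribution that avoids recomputing buckets for
    already-classified elements, tracking the first unfinished bucket c."""
    L = [0] * m
    for x in A:                     # count elements per bucket
        L[bucket(x)] += 1
    for b in range(1, m):           # prefix sums, as before
        L[b] += L[b - 1]
    c = 0                           # first incompletely-classified bucket
    while c < m:                    # when c reaches m, distribution is complete
        if L[c] == 0:               # guard: no elements at or below bucket c
            c += 1
            continue
        i = L[c] - 1                # last unclassified slot of bucket c, if any
        b = bucket(A[i])
        if b < c:                   # slot i lies in an earlier, finished bucket:
            c += 1                  # bucket c is complete
        elif b == c:                # trivial classification: already in place
            L[c] -= 1
        else:                       # b > c: A[i] is unclassified, run a cycle
            t = A[i]
            while True:
                j = L[b] - 1
                t, A[j] = A[j], t
                L[b] -= 1
                if j == i:
                    break
                b = bucket(t)
    return L                        # L[b] is now the start index of bucket b
```

Only elements moved during a cycle have their bucket computed; elements found already in place cost one computation each, giving at most m extra recomputations at the b < c tests.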

Performance

The only extra memory requirements are the auxiliary vector for storing bucket bounds and the constant number of other variables used.

In the ideal case of a balanced data set, each bucket will be approximately the same size. If the number m of buckets is linear in the input size n, each bucket has a constant size, so sorting a single bucket with an O(n²) algorithm like insertion sort has complexity O(1²) = O(1). The running time of the final insertion sorts is therefore m · O(1) = O(m) = O(n). In worst-case scenarios where almost all the elements are in a few buckets, the complexity of the algorithm is limited by the performance of the final-step sorting method, and degrades to O(n²). Variations of the algorithm improve worst-case performance by using better-performing sorts such as quicksort or recursive flashsort on buckets which exceed a certain size limit.[2][3]

Choosing a value for m, the number of buckets, trades off time spent classifying elements (high m) and time spent in the final insertion sort step (low m).

Memory-wise, flashsort avoids the overhead needed to store buckets in the standard bucket sort. For m = 0.1n with uniformly distributed random data, flashsort is faster than heapsort for all n and faster than quicksort for n > 80. It becomes about twice as fast as quicksort at n = 10000.[1]

Uniformly-distributed input is, however, a best case. Flashsort performance degrades on non-uniform inputs.

Due to the in situ permutation that flashsort performs in its classification process, flashsort is not stable. If stability is required, it is possible to use a second array so elements can be classified sequentially. However, in this case, the algorithm will require O(n) additional memory.

References

  1. ^ a b c d Neubert, Karl-Dietrich (February 1998). "The Flashsort1 Algorithm". Dr. Dobb's Journal. 23 (2): 123–125, 131. Retrieved 2007-11-06.
  2. ^ a b Neubert, Karl-Dietrich (1998). "The FlashSort Algorithm". Retrieved 2007-11-06.
  3. ^ Xiao, Li; Zhang, Xiaodong; Kubricht, Stefan A. (2000). "Improving Memory Performance of Sorting Algorithms: Cache-Effective Quicksort". ACM Journal of Experimental Algorithmics. 5. CiteSeerX 10.1.1.43.736. doi:10.1145/351827.384245. Archived from the original on 2007-11-02. Retrieved 2007-11-06.