From Wikipedia, the free encyclopedia
Jump to: navigation, search

Samplesort is a sorting algorithm that is a divide and conquer algorithm often used in parallel processing systems.[1] Conventional divide and conquer sorting algorithms partitions the array into sub-intervals or buckets. The buckets are then sorted individually and then concatenated together. However, if the array is non-uniformly distributed, the performance of these sorting algorithms can be significantly throttled. Samplesort addresses this issue by selecting a sample of size s from the n-element sequence, and determining the range of the buckets by sorting the sample and choosing m -1 elements from the result. These elements (called splitters) then divide the sample into m equal-sized buckets.[2] Samplesort is described in the 1970 paper, "Samplesort: A Sampling Approach to Minimal Storage Tree Sorting", by W. D. Frazer and A. C. McKellar.


Samplesort can be thought of as a refined quicksort. Where quicksort partitions its input into two parts at each step, based on a single value called the pivot, samplesort instead takes a larger sample from its input and divides its data into buckets accordingly. Like quicksort, it then recursively sorts the buckets.

To devise a samplesort implementation, one needs to decide on the number of buckets p. When this is done, the actual algorithm operates in three phases:[3]

  1. Sample p−1 elements from the input (the splitters). Sort these; each pair of adjacent splitters then defines a bucket.
  2. Loop over the data, placing each element in the appropriate bucket. (This may mean: send it to a processor, in a multiprocessor system.)
  3. Sort each of the buckets.

The full sorted output is the concatenation of the buckets.

A common strategy is to set p equal to the number of processors available. The data is then distributed among the processors, which perform the sorting of buckets using some other, sequential, sorting algorithm.


The complexity, given in Big O notation:

Find the splitters.

Send to buckets.

for reading all nodes
for broadcasting
for binary search for all keys
to send keys to bucket

Sort buckets.

where is the complexity of the underlying sequential sorting method[1]

Sampling the data[edit]

The data may be sampled through different methods. Some methods include:

  1. Pick evenly spaced samples.
  2. Pick randomly selected samples.


The oversampling ratio determines how many data elements to pull as samples. The goal is to get a good representation of the distribution of the data. If the data values are widely distributed, in that there are not many duplicate values, then a small sampling ratio is needed. In other cases where there are many duplicates in the distribution, a larger oversampling ratio will be necessary.

Selecting the Splitters[edit]

The ideal is to pick splitters that separate the data into j buckets of size n/j, where n is the number of elements to be sorted. This is to achieve an even distribution among the buckets, this way no one bucket takes longer than others to be sorted. This can be accomplished by selecting splitters in the sample by stepping through the sorted sample using a/j. Where sample size is a, and bucket size is j such that the values are a/j, 2a/j, ... (j - 1)a/j.

Uses in parallel systems[edit]

Samplesort is often used in parallel systems, including distributed systems such as bulk synchronous parallel machines.[4][3][5] This is done by splitting the sorting for each processor or node, where the number buckets is equal to the number of processors. Sample sort is efficient in parallel systems because each processor receives approximately the same bucket size. Since the buckets are sorted concurrently, the processors will complete the sorting at approximately the same time, thus not having a processor wait for others.

Experiments performed in the early 1990s on Connection Machine supercomputers showed samplesort to be particularly good at sorting large datasets on these machines, because its incurs little interprocessor communication overhead.[6] On latter-day GPUs, the algorithm may be less effective than its alternatives.[7]

See also[edit]


  1. ^ a b "Samplesort using the Standard Template Adaptive Parallel Library" (PDF). Texas A&M University. 
  2. ^ "Bucket and Samplesort". 
  3. ^ a b Hill, Jonathan M. D.; McColl, Bill; Stefanescu, Dan C.; Goudreau, Mark W.; Lang, Kevin; Rao, Satish B.; Suel, Torsten; Tsantilas, Thanasis; Bisseling, Rob H. (1998). "BSPlib: The BSP Programming Library". Parallel Computing. 24 (14): 1947–1980. CiteSeerX accessible. 
  4. ^ Gerbessiotis, Alexandros V.; Valiant, Leslie G. (1992). "Direct Bulk-Synchronous Parallel Algorithms". J. Parallel and Distributed Computing. CiteSeerX accessible. 
  5. ^ Hightower, William L.; Prins, Jan F.; Reif, John H. (1992). Implementations of randomized sorting on large parallel machines (PDF). ACM Symp. on Parallel Algorithms and Architectures. 
  6. ^ Blelloch, Guy E.; Leiserson, Charles E.; Maggs, Bruce M.; Plaxton, C. Gregory; Smith, Stephen J.; Zagha, Marco (1991). A Comparison of Sorting Algorithms for the Connection Machine CM-2. ACM Symp. on Parallel Algorithms and Architectures. CiteSeerX accessible. 
  7. ^ Satish, Nadathur; Harris, Mark; Garland, Michael. Designing Efficient Sorting Algorithms for Manycore GPUs. Proc. IEEE Int'l Parallel and Distributed Processing Symp. CiteSeerX accessible. 

External links[edit]

Frazer and McKellar's samplesort and derivatives:

Adapted for use on parallel computers: