# Samplesort

Samplesort is a divide-and-conquer sorting algorithm that is often used in parallel processing systems.[1] Conventional divide-and-conquer sorting algorithms partition the array into sub-intervals or buckets, sort the buckets individually, and then concatenate them. However, if the array is non-uniformly distributed, the performance of these sorting algorithms can degrade significantly. Samplesort addresses this issue by selecting a sample of size s from the n-element sequence and determining the range of the buckets by sorting the sample and choosing m − 1 elements from the result. These elements (called splitters) then divide the data into m approximately equal-sized buckets.[2] Samplesort was described in the 1970 paper "Samplesort: A Sampling Approach to Minimal Storage Tree Sorting" by W. D. Frazer and A. C. McKellar.

## Algorithm

Samplesort can be thought of as a refined quicksort. Where quicksort partitions its input into two parts at each step, based on a single value called the pivot, samplesort instead takes a larger sample from its input and divides its data into buckets accordingly. Like quicksort, it then recursively sorts the buckets.

To devise a samplesort implementation, one needs to decide on the number of buckets p. When this is done, the actual algorithm operates in three phases:[3]

1. Sample p−1 elements from the input (the splitters). Sort these; each pair of adjacent splitters then defines a bucket.
2. Loop over the data, placing each element in the appropriate bucket. (This may mean: send it to a processor, in a multiprocessor system.)
3. Sort each of the buckets.

The full sorted output is the concatenation of the buckets.
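The three phases above can be sketched as a short sequential implementation. This is a minimal illustration, not the authors' original implementation; the bucket count `p` and the base-case cutoff `threshold` are illustrative parameters chosen here, not values prescribed by the algorithm.

```python
import bisect
import random

def samplesort(data, p=4, threshold=32):
    """Minimal sequential samplesort sketch with p buckets.

    p (bucket count) and threshold (base-case cutoff) are
    illustrative parameters, not prescribed by the algorithm.
    """
    if len(data) <= threshold or min(data) == max(data):
        return sorted(data)  # base case: fall back to a standard sort

    # Phase 1: sample p - 1 splitters from the input and sort them;
    # each pair of adjacent splitters then defines a bucket.
    splitters = sorted(random.sample(data, p - 1))

    # Phase 2: place each element in its bucket, found by binary search
    # over the sorted splitters.
    buckets = [[] for _ in range(p)]
    for x in data:
        buckets[bisect.bisect_right(splitters, x)].append(x)

    # Phase 3: sort each bucket (here recursively) and concatenate.
    out = []
    for b in buckets:
        out.extend(samplesort(b, p, threshold))
    return out
```

In a parallel setting, phase 2 would instead send each element to the processor owning its bucket, and phase 3 would run on all processors concurrently.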

A common strategy is to set p equal to the number of processors available. The data is then distributed among the processors, which perform the sorting of buckets using some other, sequential, sorting algorithm.

### Complexity

The complexity, given in big O notation:

Find the splitters:

${\displaystyle O\left({\frac {n}{p}}+\log(p)\right)}$

Send to buckets:

- ${\displaystyle O(p)}$ for reading all nodes
- ${\displaystyle O(\log(p))}$ for broadcasting
- ${\displaystyle O\left({\frac {n}{p}}\log(p)\right)}$ for binary search for all keys
- ${\displaystyle O\left({\frac {n}{p}}\right)}$ to send keys to their buckets

Sort the buckets:

${\displaystyle O(c/p)}$ where ${\displaystyle c}$ is the complexity of the underlying sequential sorting method[1]

## Sampling the data

The data may be sampled through different methods. Some methods include:

1. Pick evenly spaced samples.
2. Pick randomly selected samples.
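Both sampling methods can be sketched in a few lines; the function names here are illustrative, not standard terminology.

```python
import random

def evenly_spaced_sample(data, s):
    """Method 1: pick s evenly spaced elements from the input."""
    step = len(data) // s
    return [data[i * step] for i in range(s)]

def random_sample(data, s):
    """Method 2: pick s elements uniformly at random, without replacement."""
    return random.sample(data, s)
```

Evenly spaced sampling is cheap and deterministic but can be fooled by patterned input; random sampling gives probabilistic guarantees on bucket balance regardless of input order.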

### Oversampling

The oversampling ratio determines how many elements to draw from the input as samples. The goal is to obtain a good representation of the distribution of the data. If the data values are widely distributed, with few duplicate values, then a small oversampling ratio suffices. Where the distribution contains many duplicates, a larger oversampling ratio is necessary.

## Selecting the splitters

The ideal is to pick splitters that separate the data into j buckets of size n/j, where n is the number of elements to be sorted. An even distribution among the buckets ensures that no single bucket takes longer than the others to sort. This can be accomplished by stepping through the sorted sample with a stride of a/j, where a is the sample size and j is the number of buckets, so that the splitters are the sample elements at positions a/j, 2a/j, ..., (j − 1)a/j.
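This selection step can be written directly from the description above; `select_splitters` is an illustrative name, not from the original paper.

```python
def select_splitters(sample, j):
    """Pick j - 1 splitters by stepping through the sorted sample of
    size a at positions a/j, 2a/j, ..., (j - 1)a/j."""
    a = len(sample)
    ordered = sorted(sample)
    return [ordered[i * a // j] for i in range(1, j)]
```

For example, a sorted 12-element sample with j = 4 yields the elements at positions 3, 6, and 9 as splitters.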

## Uses in parallel systems

Samplesort is often used in parallel systems, including distributed systems such as bulk synchronous parallel machines.[4][3][5] This is done by splitting the sorting work across processors or nodes, with the number of buckets equal to the number of processors. Samplesort is efficient in parallel systems because each processor receives approximately the same bucket size. Since the buckets are sorted concurrently, the processors finish at approximately the same time, so no processor is left waiting for others.

Experiments performed in the early 1990s on Connection Machine supercomputers showed samplesort to be particularly good at sorting large datasets on these machines, because it incurs little interprocessor communication overhead.[6] On more recent GPUs, the algorithm may be less effective than its alternatives.[7]