Jump to content

User:Earlster/Sandbox: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
Earlster (talk | contribs)
No edit summary
Earlster (talk | contribs)
No edit summary
Line 11: Line 11:
This is a WIP for Proxmap using the [[Bucket sort]] page as a template
This is a WIP for Proxmap using the [[Bucket sort]] page as a template
[[Image:Bucket sort 1.png|right|frame|Elements are distributed among bins]]
[[Image:Bucket sort 1.png|right|frame|Elements are distributed among bins]]
[[Image:Bucket sort 2.png|right|frame|Then, elements are sorted within each bin]]
[[Image:Bucket sort 2.png|right|frame|Unlike bucket sorting, the elements are sorted as they are input]]
'''Proxmap Sort''', or '''Proxmap''', is a [[sorting algorithm]] that works by partitioning an [[Array data structure|array]] into a number of [[bucket (computing)|bucket]]s. Several "tracking" arrays are used to reach what is known as the "proxmap," giving the sorting algorithm its title. A key is used to place each of the data into buckets using this proxmap. Data is dropped into each bucket and sorted individually by [[insertion sort]]. Since bucket sort is not a [[comparison sort]], the Ω(''n'' log ''n'') lower bound is inapplicable. The [[computational complexity]] estimates involve the number of buckets and the key used.
'''Proxmap Sort''', or '''Proxmap''', is a [[sorting algorithm]] that works by partitioning an [[Array data structure|array]] into a number of [[bucket (computing)|bucket]]s. Several "tracking" arrays are used to reach what is known as the "proxmap," giving the sorting algorithm its title. A key is used to place each of the data into buckets using this proxmap. Data is dropped into each bucket and sorted individually by [[insertion sort]]. Since bucket sort is not a [[comparison sort]], the Ω(''n'' log ''n'') lower bound is inapplicable. The [[computational complexity]] estimates involve the number of buckets and the key used.


Line 36: Line 36:
Many optimizations include using the same array to store the sorted data, reusing the hitCount, proxMap, and location arrays used to sort the data. To be expanded on later...
Many optimizations include using the same array to store the sorted data, reusing the hitCount, proxMap, and location arrays used to sort the data. To be expanded on later...


===Generic bucket sort===
===Generic bucket sort related to Proxmap===
The most common variant of bucket sort operates on a list of ''n'' numeric inputs between zero and some maximum value ''M'' and divides the value range into ''n'' buckets each of size ''M''/''n''. If each bucket is sorted using [[insertion sort]], the sort can be shown to run in expected linear time (where the average is taken over all possible inputs).<ref>[[Thomas H. Cormen]], [[Charles E. Leiserson]], [[Ronald L. Rivest]], and [[Clifford Stein]]. ''[[Introduction to Algorithms]]'', Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Section 8.4: Bucket sort, pp.174&ndash;177.</ref> However, the performance of this sort degrades with clustering; if many values occur close together, they will all fall into a single bucket and be sorted slowly.
The most common variant of bucket sort operates on a list of ''n'' numeric inputs between zero and some maximum value ''M'' and divides the value range into ''n'' buckets each of size ''M''/''n''. If each bucket is sorted using [[insertion sort]], the sort can be shown to run in expected linear time (where the average is taken over all possible inputs).<ref>[[Thomas H. Cormen]], [[Charles E. Leiserson]], [[Ronald L. Rivest]], and [[Clifford Stein]]. ''[[Introduction to Algorithms]]'', Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Section 8.4: Bucket sort, pp.174&ndash;177.</ref> However, the performance of this sort degrades with clustering; if many values occur close together, they will all fall into a single bucket and be sorted slowly. This holds a similar story with Proxmap, if the buckets are too large or too small then the performance of the function degrades severely.


==Comparison with other sorting algorithms==
==Comparison with other sorting algorithms==

Revision as of 15:41, 20 October 2010

Earlster/Sandbox
Example of insertion sort sorting a list of random numbers.
Example of insertion sort sorting a list of random numbers.
Example of insertion sort sorting a list of random numbers.
ClassSorting algorithm
Data structureArray
Worst-case performance
Best-case performance
Average performance
Worst-case space complexity total,

This is a WIP for Proxmap using the Bucket sort page as a template

Elements are distributed among bins
Unlike bucket sorting, the elements are sorted as they are input

Proxmap Sort, or Proxmap, is a sorting algorithm that works by partitioning an array into a number of buckets. Several "tracking" arrays are used to reach what is known as the "proxmap," giving the sorting algorithm its title. A key is used to place each of the data into buckets using this proxmap. Data is dropped into each bucket and sorted individually by insertion sort. Since bucket sort is not a comparison sort, the Ω(n log n) lower bound is inapplicable. The computational complexity estimates involve the number of buckets and the key used.

ProxmapSort works as follows:

  1. Set up an array of initially empty "buckets."
  2. Assign an appropriate sorting key to match the data.
  3. Scatter: Go over the original array, dropping each object into its bucket, insertion sorting as needed.
  4. Gather: Visit the buckets in order and put all elements back into the original array.

Pseudocode

  • Not yet implemented fully*
function proxmap-sort(array, n) is
  buckets ← new array of n empty lists
  for i = 0 to (length(array)-1) do
    insert array[i] into buckets[msbits(array[i], k)]
  for i = 0 to n - 1 do
    next-sort(buckets[i])
  return the concatenation of buckets[0], ..., buckets[n-1]

Here array is the array to be sorted and n is the number of buckets to use. The key function such as msbits(x,k) returns the k most significant bits of x (floor(x/2^(size(x)-k))); different functions can be used to translate the range of elements in array to n buckets, such as translating the letters A–Z to 0–25 or returning the first character (0–255) for sorting strings. Another method could simply take the floor of the incoming data divided by 10, or floor(k/10) Buckets get sorted as the data comes in, not all at the end.

Optimizations

Many optimizations include using the same array to store the sorted data, reusing the hitCount, proxMap, and location arrays used to sort the data. To be expanded on later...

The most common variant of bucket sort operates on a list of n numeric inputs between zero and some maximum value M and divides the value range into n buckets each of size M/n. If each bucket is sorted using insertion sort, the sort can be shown to run in expected linear time (where the average is taken over all possible inputs).[1] However, the performance of this sort degrades with clustering; if many values occur close together, they will all fall into a single bucket and be sorted slowly. This holds a similar story with Proxmap, if the buckets are too large or too small then the performance of the function degrades severely.

Comparison with other sorting algorithms

Generally, it works relatively fast when compared with other sorting algorithms as its not comparison-based. The function is nearly constant access time which makes it very appealing for large databases. Though it might take a few seconds to build, it saves thousands down the line. To be expanded on later...

References

  1. ^ Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Section 8.4: Bucket sort, pp.174–177.