X + Y sorting

From Wikipedia, the free encyclopedia

In computer science, X + Y sorting is the problem of sorting pairs of numbers by their sums. It can be solved using a number of comparisons that is quadratic in the input length, fewer than would be needed to sort an unstructured list of equally many items. However, it is not known whether it is possible for an algorithm that solves the problem to have total running time faster by more than a constant factor than comparison sorting.

Problem statement and history

Unsolved problem in computer science:

Is there an X + Y sorting algorithm faster than O(n² log n)?

The input to the X + Y sorting problem is two finite collections of numbers X and Y, both of the same length. The problem's output is the collection of all pairs x + y with x in X and y in Y, arranged into sorted order by the value of x + y for each pair. One way to solve the problem would be to construct the Cartesian product of the two given sets (the collection of pairs to be sorted) and use this collection of pairs as the input to a standard comparison sorting algorithm such as merge sort or heapsort. When the input collections each consist of n numbers, their Cartesian product consists of n² pairs, and the time for one of these comparison sorting algorithms to sort a collection of this many pairs is O(n² log n). No asymptotically faster algorithm for X + Y sorting is known. Whether it can be done in a faster time bound is an open problem,[1][2] posed by Elwyn Berlekamp prior to 1975.[3][2]
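The brute-force method just described can be sketched as follows (a minimal illustration; the function name is ours, not from the literature):

```python
def xy_sort_naive(X, Y):
    """Sort all n^2 pairwise sums by building the Cartesian product
    and handing it to a standard comparison sort: O(n^2 log n)."""
    pairs = [(x + y, x, y) for x in X for y in Y]
    pairs.sort()  # comparison sort of n^2 items
    return pairs
```

For example, `xy_sort_naive([1, 4], [0, 3])` yields the sums 1, 4, 4, 7 in order.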

Number of orderings

Together, the two input collections for the X + Y sorting problem comprise 2n numbers, which can alternatively be interpreted as the Cartesian coordinates of a point in the 2n-dimensional space ℝ^(2n). If one partitions this space into cells defined by the property that the collection of pairs to be sorted has a fixed ordering within each cell, then each boundary between two cells lies within a hyperplane defined by an equality of pairs, x_i + y_j = x_k + y_l, where (x_i, y_j) and (x_k, y_l) are two pairs whose ordering changes from one adjacent cell to the other. These hyperplanes are either generated by two disjoint pairs, or they have the simplified forms x_i = x_k or y_j = y_l, so the number of distinct hyperplanes that can be determined in this way is

    2·C(n, 2)² + 2·C(n, 2) = O(n⁴),

where C(n, 2) = n(n − 1)/2 is a binomial coefficient.

The number of cells that this number of hyperplanes can divide a space of dimension 2n into is O(N^(2n)) for N hyperplanes, which with N = O(n⁴) gives O(n^(8n)). Therefore, the set X + Y has O(n^(8n)) different possible sorted orderings.[3][4][5]

Harper et al. (1975) suggest separately sorting X and Y, and then constructing a two-dimensional matrix of the values x_i + y_j that is sorted both by rows and by columns before using this partially-sorted data to complete the sort of X + Y. This idea can reduce the number of comparisons needed by a constant factor, compared to naive comparison sorting. However, they show that the number of possible orderings of a sorted matrix is large enough that any comparison sorting algorithm that can work for arbitrary matrices that are sorted by rows and columns still requires Ω(n² log n) comparisons. Therefore, additional information about the set X + Y beyond this matrix ordering would be needed for any faster sorting algorithm.[3]
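The row-and-column-sorted matrix of Harper et al. is easy to construct once X and Y are individually sorted; a minimal sketch (function name ours):

```python
def sorted_sum_matrix(X, Y):
    """Build the n-by-n matrix M[i][j] = X[i] + Y[j] after sorting
    X and Y; every row and every column of M is then nondecreasing."""
    X, Y = sorted(X), sorted(Y)
    return [[x + y for y in Y] for x in X]
```

For instance, `sorted_sum_matrix([4, 1], [3, 0])` produces `[[1, 4], [4, 7]]`, whose rows and columns are each in sorted order, though the full sorted order of all n² entries is still undetermined.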

Quadratic comparisons

The number of comparisons required to sort X + Y is certainly lower than for ordinary comparison sorting: Michael Fredman showed in 1976 that X + Y sorting can be done using only O(n²) comparisons. More generally, he shows that any set of N elements, whose sorted ordering has already been restricted to a family Γ of possible orderings, can be sorted using log₂|Γ| + O(N) comparisons, by a form of binary insertion sort. For the X + Y sorting problem, N = n², and |Γ| = O(n^(8n)), so log₂|Γ| = O(n log n) and Fredman's bound implies that only O(n²) comparisons are needed. However, the time needed to decide which comparisons to perform may be significantly higher than the bound on the number of comparisons.[4] If only comparisons between elements of X + Y are allowed, then there is also a matching lower bound of Ω(n²) on the number of comparisons needed.[4][6]
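Spelling out the arithmetic in Fredman's bound for this problem, using the ordering count from the previous section:

```latex
\log_2|\Gamma| + O(N)
  = \log_2 O\!\left(n^{8n}\right) + O\!\left(n^2\right)
  = O(n\log n) + O\!\left(n^2\right)
  = O\!\left(n^2\right).
```

The O(N) = O(n²) term dominates, which is why the quadratic comparison count follows even though the number of candidate orderings is enormous.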

The first explicit algorithm that achieves both O(n²) comparisons and O(n² log n) total complexity was published sixteen years later by Lambert (1992). The algorithm performs the following steps:

  1. Recursively sort the two sets X + X and Y + Y.
  2. Use the equivalence x_i − x_j ≤ x_k − x_l if and only if x_i + x_l ≤ x_k + x_j to infer the sorted orderings of X − X and Y − Y without additional comparisons.
  3. Merge the two sets X − X and Y − Y into a single sorted order, using a number of comparisons linear in their total size.
  4. Use the merged order and the equivalence x_i + y_j ≤ x_k + y_l if and only if x_i − x_k ≤ y_l − y_j to infer the sorted order of X + Y without additional comparisons.
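The key to step 4 is that comparing two pairwise sums reduces to comparing one element of X − X against one element of Y − Y, both of whose relative positions are already known from the merged order. A small numerical check of this equivalence (our own illustration, not Lambert's full bookkeeping):

```python
from itertools import product

def compare_sums_via_differences(X, Y, i, j, k, l):
    """Decide whether X[i] + Y[j] <= X[k] + Y[l] using only the
    equivalent comparison X[i] - X[k] <= Y[l] - Y[j], i.e. a single
    lookup into the merged order of X - X and Y - Y."""
    return (X[i] - X[k]) <= (Y[l] - Y[j])

# Sanity check: the equivalence agrees with direct comparison of sums.
X, Y = [5, 1, 8], [2, 7, 3]
for i, j, k, l in product(range(3), repeat=4):
    assert compare_sums_via_differences(X, Y, i, j, k, l) == \
        (X[i] + Y[j] <= X[k] + Y[l])
```

Because every such comparison is answered by the precomputed merged order, step 4 performs no new comparisons between input numbers.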

The part of the algorithm that recursively sorts X + X (or, equivalently, Y + Y) does so by the following steps:

  1. Split X into two equal sublists A and B.
  2. Recursively sort A + A and B + B.
  3. Infer the ordering on A + B using only the comparisons from a single merge step as above.
  4. Merge the sorted results A + A, B + B, and A + B together.

The number of comparisons C(n) needed to perform this recursive algorithm on an input of n items can be analyzed using the recurrence relation

    C(n) ≤ 2C(n/2) + O(n²),

where the 2C(n/2) term of the recurrence counts the number of comparisons in the recursive calls to the algorithm to sort A + A and B + B, and the O(n²) term counts the number of comparisons used to merge the results. The master theorem for recurrence relations of this form shows that C(n) = O(n²). The total time complexity is slower, O(n² log n), because of the steps of the algorithm that use already-made comparisons to infer orderings of other sets. These steps can be performed in time O(n² log n) by using a standard comparison-sorting algorithm with its comparison steps replaced by the stated inferences.[7]
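The master-theorem step can be checked directly: the recurrence has a = 2 subproblems of size n/b = n/2 and combine cost f(n) = Θ(n²), so

```latex
n^{\log_b a} = n^{\log_2 2} = n,
\qquad
f(n) = \Theta\!\left(n^2\right) = \Omega\!\left(n^{1+\varepsilon}\right)
\quad\text{for } \varepsilon = 1,
```

placing the recurrence in the root-dominated case of the master theorem and giving C(n) = Θ(n²).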

Non-comparison-based algorithms

Just as integer sorting can be faster than comparison sorting for small-enough integers, the same is true for X + Y sorting. In particular, with integer inputs in the range from 0 to some upper limit M, the problem can be solved in O(n + M log M) operations by means of the fast Fourier transform.[3][2]
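The idea is that the multiset of sums is the convolution of the multiplicity vectors of X and Y. A sketch of this reduction (function name ours; for clarity the convolution below is the naive O(M²) one, whereas an FFT-based polynomial product is what yields the O(n + M log M) bound cited above):

```python
def xy_sort_by_convolution(X, Y, M):
    """Sort X + Y for integer inputs in [0, M] by convolving the
    multiplicity vectors of X and Y.  counts[s] ends up holding the
    number of pairs (x, y) with x + y = s."""
    cx = [0] * (M + 1)
    cy = [0] * (M + 1)
    for x in X:
        cx[x] += 1
    for y in Y:
        cy[y] += 1
    counts = [0] * (2 * M + 1)  # possible sums range over [0, 2M]
    for a, ca in enumerate(cx):
        if ca:
            for b, cb in enumerate(cy):
                counts[a + b] += ca * cb
    # Read the sorted multiset of sums straight out of the counts.
    return [s for s, c in enumerate(counts) for _ in range(c)]
```

Note that no comparisons between sums are performed at all; the sorted order falls out of the indexing, which is why the comparison lower bounds above do not apply in this setting.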

Applications

Steven Skiena recounts a practical application in airline fare minimization, an instance of the shortest path problem: find the cheapest two-hop airplane ticket between two given cities, from an input that describes the cost of each hop from the starting city or to the destination city, and that describes which pairs of hops are allowed to be combined into a single ticket. Skiena's solution consists of sorting the pairs of hops by their total cost, as an instance of the X + Y sorting problem, and then testing the resulting pairs in this sorted order until finding one that is allowed. To generate the pairs in this sorted order, Skiena uses a priority queue of pairs, initially containing only a single pair, the one consisting of the two cheapest hops. Then, when a pair (x, y) is removed from the queue and found to be disallowed, two more pairs are added, with one of these two pairs combining x with the next hop after y in a sorted list of the hops to the destination, and the other pair combining y with the next hop after x in a sorted list of hops from the start. In this way, each successive pair can be found in logarithmic time, and only the pairs up to the first allowable one need to be sorted.[1]
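A lazy enumeration in the same spirit can be sketched with Python's heap module (names are ours; unlike Skiena's exact successor rule, this standard variant keeps a `seen` set to avoid pushing the same index pair twice):

```python
import heapq

def pairs_in_sum_order(from_costs, to_costs):
    """Lazily yield (total, i, j) in nondecreasing order of
    from_costs[i] + to_costs[j], using a priority queue seeded
    with the single cheapest pair."""
    a = sorted(from_costs)  # hops from the starting city
    b = sorted(to_costs)    # hops to the destination city
    heap = [(a[0] + b[0], 0, 0)]
    seen = {(0, 0)}
    while heap:
        s, i, j = heapq.heappop(heap)
        yield s, i, j
        # Successors: advance one coordinate of the removed pair.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (a[ni] + b[nj], ni, nj))
```

The caller stops consuming the generator at the first allowed pair, so only a prefix of the n² combinations is ever materialized, each in logarithmic time per step.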

Related problems

Several other problems in computational geometry have complexity equivalent to or harder than that of X + Y sorting, including constructing Minkowski sums of staircase polygons, finding the crossing points of an arrangement of lines in sorted order by their x-coordinates, listing pairs of points in sorted order by their distances, and testing whether one rectilinear polygon can be translated to fit within another.[8]

The problem of testing whether two of the pairs in the X + Y sorting problem have equal sums can be solved by sorting the pairs and then testing consecutive pairs for equality. In turn, it could be used to solve the 3SUM problem, implying that it is unlikely to have a strongly subquadratic algorithm.[2]
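The reduction in the first sentence is direct: after sorting, any two equal sums must be adjacent. A minimal sketch (function name ours):

```python
def has_equal_sums(X, Y):
    """Test whether two distinct pairs have equal sums by sorting
    all pairwise sums and checking adjacent entries for equality."""
    sums = sorted(x + y for x in X for y in Y)
    return any(s == t for s, t in zip(sums, sums[1:]))
```

For example, `has_equal_sums([1, 4], [0, 3])` is true because 1 + 3 = 4 + 0; this is exactly the kind of coincidence whose detection 3SUM-hardness makes unlikely to admit a strongly subquadratic algorithm.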

References

  1. ^ a b Skiena, Steven (2008). "4.4 War Story: Give me a Ticket on an Airplane". The Algorithm Design Manual (2nd ed.). Springer. pp. 118–120. doi:10.1007/978-1-84800-070-4_4.
  2. ^ a b c d Demaine, Erik; Erickson, Jeff; O'Rourke, Joseph (20 August 2006). "Problem 41: Sorting X + Y (Pairwise Sums)". The Open Problems Project. Retrieved 23 September 2014.
  3. ^ a b c d Harper, L. H.; Payne, T. H.; Savage, J. E.; Straus, E. (1975). "Sorting X + Y". Communications of the ACM. 18 (6): 347–349. doi:10.1145/360825.360869. MR 0378473.
  4. ^ a b c Fredman, Michael L. (1976). "How good is the information theory bound in sorting?". Theoretical Computer Science. 1 (4): 355–361. doi:10.1016/0304-3975(76)90078-5. MR 0416100.
  5. ^ Sloane, N. J. A. (ed.). "Sequence A343245". The On-Line Encyclopedia of Integer Sequences. OEIS Foundation.
  6. ^ Dietzfelbinger, Martin (1989). "Lower bounds for sorting of sums". Theoretical Computer Science. 66 (2): 137–155. doi:10.1016/0304-3975(89)90132-1. MR 1019082.
  7. ^ Lambert, Jean-Luc (1992). "Sorting the sums (x_i + y_j) in O(n²) comparisons". Theoretical Computer Science. 103 (1): 137–141. doi:10.1016/0304-3975(92)90089-X. MR 1181041.
  8. ^ Hernández Barrera, Antonio (1996). "Finding an o(n² log n) algorithm is sometimes hard" (PDF). Proceedings of the 8th Canadian Conference on Computational Geometry (CCCG'96). pp. 289–294.