# X + Y sorting

In computer science, ${\boldsymbol {X}}+{\boldsymbol {Y}}$ sorting is the problem of sorting pairs of numbers by their sum. It can be solved using a number of comparisons that is quadratic in the input length, fewer than would be needed to sort an unstructured list of equally many items. However, it is not known whether any algorithm that solves the problem can have a total running time that is faster than comparison sorting by more than a constant factor.

## Problem statement and history

Unsolved problem in computer science:

Is there an $X+Y$ sorting algorithm faster than $O(n^{2}\log n)$ ?

The input to the $X+Y$ sorting problem is two finite collections of numbers $X$ and $Y$ , both of the same length. The problem's output is the collection of all pairs $(x_{i},y_{j})$ with $x_{i}$ in $X$ and with $y_{j}$ in $Y$ , arranged into sorted order by the value of $x_{i}+y_{j}$ for each pair. One way to solve the problem would be to construct the Cartesian product of the two given sets (the collection of pairs to be sorted) and use this collection of pairs as the input to a standard comparison sorting algorithm such as merge sort or heapsort. When the input collections each consist of $n$ numbers, their Cartesian product consists of $n^{2}$ pairs, and the time for one of these comparison sorting algorithms to sort a collection of this many pairs is $O(n^{2}\log n)$ . No asymptotically faster algorithm for $X+Y$ sorting is known. Whether it can be done in a faster time bound is an open problem, posed by Elwyn Berlekamp prior to 1975.
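
The baseline approach described above can be sketched in a few lines of Python. The function name `naive_xy_sort` is hypothetical, and Python's built-in `list.sort` (Timsort) stands in here for merge sort or heapsort; it is likewise a comparison sort with an $O(N\log N)$ bound on $N$ items.

```python
from itertools import product


def naive_xy_sort(xs, ys):
    """Sort all pairs (x, y) by x + y with a general-purpose comparison sort."""
    pairs = list(product(xs, ys))          # Cartesian product: n^2 pairs
    pairs.sort(key=lambda p: p[0] + p[1])  # O(n^2 log n) time overall
    return pairs
```

For example, `naive_xy_sort([1, 4], [0, 2])` orders the four pairs by the sums 1, 3, 4, and 6.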

## Number of orderings

Together, the two input collections for the $X+Y$ sorting problem comprise $2n$ numbers, which can alternatively be interpreted as the Cartesian coordinates of a point in the $2n$ -dimensional space $\mathbb {R} ^{2n}$ . If one partitions this space $\mathbb {R} ^{2n}$ into cells defined by the property that the collection of pairs to be sorted has a fixed ordering within each cell, then each boundary between two cells lies within a hyperplane defined by an equality of pairs $x_{i}+y_{j}=x_{k}+y_{\ell }$ , where $(x_{i},y_{j})$ and $(x_{k},y_{\ell })$ are two pairs whose ordering changes from one adjacent cell to the other. These hyperplanes are either generated by two disjoint pairs, or they have the simplified forms $x_{i}=x_{k}$ or $y_{j}=y_{\ell }$ , so the number of distinct hyperplanes that can be determined in this way is

$k=2{\binom {n}{2}}^{2}+2{\binom {n}{2}}.$

The number of cells that this number of hyperplanes can divide a space of dimension $2n$ into is

${\binom {k}{2n}}+{\binom {k}{2n-1}}+\cdots +{\binom {k}{0}}=O(n^{8n}).$

Therefore, the set $X+Y$ has $O(n^{8n})$ different possible sorted orderings.
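
To get a feel for these quantities, the following sketch evaluates the hyperplane count $k$ and the sum of binomial coefficients exactly for small $n$. The function names are hypothetical, and the code only evaluates the two formulas above; it does not enumerate actual orderings.

```python
from math import comb


def hyperplane_count(n):
    """k = 2 * C(n, 2)^2 + 2 * C(n, 2), as in the formula above."""
    b = comb(n, 2)
    return 2 * b * b + 2 * b


def cell_bound(n):
    """Upper bound on the number of cells: C(k, 0) + C(k, 1) + ... + C(k, 2n)."""
    k = hyperplane_count(n)
    return sum(comb(k, i) for i in range(2 * n + 1))
```

For $n=3$ this gives $k=24$ and a bound of 190051 cells, which is already below the $9!=362880$ orderings of nine unrelated items.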

Harper et al. (1975) suggest separately sorting $X$ and $Y$ , and then constructing a two-dimensional matrix of the values of $X+Y$ that is sorted both by rows and by columns before using this partially-sorted data to complete the sort of $X+Y$ . This idea can reduce the number of comparisons needed by a constant factor, compared to naive comparison sorting. However, they show that the number of possible orderings of a sorted matrix is large enough that any comparison sorting algorithm that can work for arbitrary $n\times n$ matrices that are sorted by rows and columns still requires $\Omega (n^{2}\log n)$ comparisons. Therefore, additional information about the set $X+Y$ beyond this matrix ordering would be needed for any faster sorting algorithm.
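
A minimal sketch of this matrix idea follows. Building the row- and column-sorted matrix matches the description above, but finishing the sort with an $n$-way heap merge of the rows is an assumption of this sketch, not necessarily the procedure of Harper et al.; the function names are hypothetical.

```python
import heapq


def sorted_sum_matrix(xs, ys):
    """Matrix of sums whose rows and columns are both nondecreasing."""
    xs, ys = sorted(xs), sorted(ys)
    return [[x + y for y in ys] for x in xs]


def sort_via_matrix(xs, ys):
    """Finish the sort by merging the already-sorted rows."""
    matrix = sorted_sum_matrix(xs, ys)
    # heapq.merge performs an n-way merge of sorted iterables.
    return list(heapq.merge(*matrix))
```

The merge still takes $O(n^{2}\log n)$ time, consistent with the lower bound for arbitrary sorted matrices.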

## Quadratic comparisons

The number of comparisons required to sort $X+Y$ is certainly lower than for ordinary comparison sorting: Michael Fredman showed in 1976 that $X+Y$ sorting can be done using only $O(n^{2})$ comparisons. More generally, he shows that any set of $N$ elements, whose sorted ordering has already been restricted to a family $\Gamma$ of orderings, can be sorted using $\log _{2}|\Gamma |+O(N)$ comparisons, by a form of binary insertion sort. For the $X+Y$ sorting problem, $N=n^{2}$ , and $|\Gamma |=O(n^{8n})$ , so $\log _{2}|\Gamma |=O(n\log n)$ and Fredman's bound implies that only $O(n^{2})$ comparisons are needed. However, the time needed to decide which comparisons to perform may be significantly higher than the bound on the number of comparisons. If only comparisons between elements of $X+Y$ are allowed, then there is also a matching lower bound of $\Omega (n^{2})$ on the number of comparisons needed.

The first explicit algorithm that achieves both $O(n^{2})$ comparisons and $O(n^{2}\log n)$ total complexity was published sixteen years later by Lambert (1992). The algorithm performs the following steps:

1. Recursively sort the two sets $X+X$ and $Y+Y$ .
2. Use the equivalence $x_{i}-x_{j}\leq x_{k}-x_{\ell }\Leftrightarrow x_{i}+x_{\ell }\leq x_{j}+x_{k}$ to infer the sorted orderings of $X-X$ and $Y-Y$ without additional comparisons.
3. Merge the two sets $X-X$ and $Y-Y$ into a single sorted order, using a number of comparisons linear in their total size.
4. Use the merged order and the equivalence $x_{i}+y_{j}\leq x_{k}+y_{\ell }\Leftrightarrow x_{i}-x_{k}\leq y_{\ell }-y_{j}$ to infer the sorted order of $X+Y$ without additional comparisons.
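
Steps 3 and 4 can be illustrated with the sketch below. It takes a shortcut that is an assumption of this example only: the orderings of $X-X$ and $Y-Y$ are obtained by sorting the differences directly rather than by the recursive steps 1 and 2, so it does not achieve the comparison bound. What it does show is that, once the merged order of differences is known, every comparison between sums is answered by a rank lookup. The function name is hypothetical, and integer inputs are assumed so that the differences are exact.

```python
import heapq
from functools import cmp_to_key


def xy_sort_via_differences(xs, ys):
    n, m = len(xs), len(ys)
    # Shortcut (assumption of this sketch): sort the differences directly.
    dx = sorted((xs[i] - xs[k], 'x', i, k) for i in range(n) for k in range(n))
    dy = sorted((ys[l] - ys[j], 'y', l, j) for l in range(m) for j in range(m))
    # Step 3: merge the two sorted lists of differences in linear time.
    merged = heapq.merge(dx, dy)
    # Assign dense ranks so that equal differences share the same rank.
    rank, r, prev = {}, -1, None
    for value, tag, a, b in merged:
        if value != prev:
            r, prev = r + 1, value
        rank[(tag, a, b)] = r

    # Step 4: x_i + y_j <= x_k + y_l  <=>  x_i - x_k <= y_l - y_j.
    def compare(p, q):
        (i, j), (k, l) = p, q
        left, right = rank[('x', i, k)], rank[('y', l, j)]
        return (left > right) - (left < right)

    index_pairs = [(i, j) for i in range(n) for j in range(m)]
    index_pairs.sort(key=cmp_to_key(compare))  # no arithmetic on the inputs here
    return [(xs[i], ys[j]) for i, j in index_pairs]
```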

The part of the algorithm that recursively sorts $X+X$ (or equivalently $Y+Y$ ) does so by the following steps:

1. Split $X$ into two equal sublists $A$ and $B$ .
2. Recursively sort $A+A$ and $B+B$.
3. Infer the ordering on $A+B$ using only the comparisons from a single merge step as above.
4. Merge the sorted results $A+A$ , $B+B$ , and $A+B$ together.

The number of comparisons $C(n)$ needed to perform this recursive algorithm on an input of $n$ items can be analyzed using the recurrence relation

$C(n)\leq 2C(n/2)+O(n^{2}),$

where the $2C(n/2)$ term of the recurrence counts the number of comparisons in the recursive calls to the algorithm to sort $A+A$ and $B+B$, and the $O(n^{2})$ term counts the number of comparisons used to merge the results. The master theorem for recurrence relations of this form shows that $C(n)=O(n^{2})$. The total time complexity is slower, $O(n^{2}\log n)$, because of the steps of the algorithm that use already-made comparisons to infer orderings of other sets. These steps can be performed in time $O(n^{2}\log n)$ by using a standard comparison-sorting algorithm with its comparison steps replaced by the stated inferences.
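
The $O(n^{2})$ solution of the recurrence can also be checked by unrolling it directly. As an illustrative assumption, let $c$ denote the constant hidden in the $O(n^{2})$ term and let $n$ be a power of two. Level $i$ of the recursion has $2^{i}$ subproblems of size $n/2^{i}$, contributing $2^{i}\cdot c(n/2^{i})^{2}=cn^{2}/2^{i}$ comparisons, so

$C(n)\leq cn^{2}+2c\left({\tfrac {n}{2}}\right)^{2}+4c\left({\tfrac {n}{4}}\right)^{2}+\cdots =cn^{2}\left(1+{\tfrac {1}{2}}+{\tfrac {1}{4}}+\cdots \right)\leq 2cn^{2}.$

The constant-size base cases add only $O(n)$ comparisons in total, which does not change the bound.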

## Non-comparison-based algorithms

Just as integer sorting can be faster than comparison sorting for small-enough integers, the same is true for $X+Y$ sorting. In particular, with integer inputs in the range from $0$ to some upper limit $M$ , the problem can be solved in $O(n+M\log M)$ operations by means of the fast Fourier transform.
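
One way to read the fast Fourier transform approach is as a convolution of two count vectors: entry $s$ of the convolution is the number of pairs with sum $s$. The sketch below follows that reading under stated assumptions: inputs are integers in $[0,M]$, the output is the sorted list of sums with their multiplicities rather than the pairs themselves, and a small recursive radix-2 transform is written out to keep the example self-contained. The function names are hypothetical, and floating-point rounding limits this sketch to modest input sizes.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 transform; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = -1 if invert else 1
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def sorted_sums(xs, ys, M):
    """Return (sum, multiplicity) pairs in increasing order of sum."""
    size = 1
    while size < 2 * M + 1:  # sums range over 0..2M; avoid wraparound
        size *= 2
    cx, cy = [0] * size, [0] * size
    for x in xs:
        cx[x] += 1
    for y in ys:
        cy[y] += 1
    product = [a * b for a, b in zip(fft(cx), fft(cy))]
    conv = fft(product, invert=True)
    # The inverse transform is unnormalized, so divide by size here.
    counts = [round(v.real / size) for v in conv]
    return [(s, c) for s, c in enumerate(counts) if c > 0]
```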

## Applications

Steven Skiena recounts a practical application in transit fare minimisation, an instance of the shortest path problem: find the cheapest two-hop airplane ticket between two given cities, from an input that describes the cost of each hop from the starting city or to the destination city, and describing which pairs of hops are allowed to be combined into a single ticket. Skiena's solution consists of sorting pairs of hops by their total cost as an instance of the $X+Y$ sorting problem, and then testing the resulting pairs in this sorted order until finding one that is allowed. To generate the sorted pairs in this order, Skiena uses a priority queue of pairs, initially containing only a single pair, the one consisting of the two cheapest hops. Then, when a pair $(x,y)$ is removed from the queue and found to be disallowed, two more pairs are added, with one of these two pairs combining $x$ with the next hop after $y$ in a sorted list of the hops to the destination, and the other pair combining $y$ with the next hop after $x$ in a sorted list of hops from the start. In this way, each successive pair can be found in logarithmic time, and only the pairs up to the first allowable one need to be sorted.
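
The priority-queue procedure can be sketched as follows. The data layout is an assumption of this example: each hop is a `(cost, label)` tuple and `allowed` is a caller-supplied predicate on two labels. The `seen` set is also an addition of this sketch; it prevents a pair that is reachable from two different predecessors from being queued twice.

```python
import heapq


def cheapest_allowed_pair(first_hops, second_hops, allowed):
    """Return (total, first_label, second_label) for the cheapest allowed pair."""
    xs, ys = sorted(first_hops), sorted(second_hops)
    if not xs or not ys:
        return None
    heap = [(xs[0][0] + ys[0][0], 0, 0)]  # start with the two cheapest hops
    seen = {(0, 0)}
    while heap:
        total, i, j = heapq.heappop(heap)
        if allowed(xs[i][1], ys[j][1]):
            return total, xs[i][1], ys[j][1]
        # Disallowed: queue the two neighbouring pairs in the sorted lists.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(xs) and nj < len(ys) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (xs[ni][0] + ys[nj][0], ni, nj))
    return None
```

With first hops `[(100, 'A'), (150, 'B')]`, second hops `[(50, 'P'), (80, 'Q')]`, and only the combinations `('A', 'Q')` and `('B', 'P')` allowed, the cheapest pair `('A', 'P')` is rejected and the function returns `(180, 'A', 'Q')`.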

## Related problems

Several other problems in computational geometry are equivalent in complexity to $X+Y$ sorting or harder than it, including constructing Minkowski sums of staircase polygons, finding the crossing points of an arrangement of lines in sorted order by their $x$-coordinates, listing pairs of points in sorted order by their distances, and testing whether one rectilinear polygon can be translated to fit within another.

The problem of testing whether two of the pairs in the $X+Y$ sorting problem have equal sums can be solved by sorting the pairs and then testing consecutive pairs for equality. In turn, an algorithm for this testing problem could be used to solve the 3SUM problem, implying that the testing problem is unlikely to have a strongly subquadratic algorithm.
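
The sort-then-scan test for equal sums can be sketched as below. The function name is hypothetical, and the sorting step simply uses Python's built-in comparison sort on the Cartesian product, so the sketch runs in $O(n^{2}\log n)$ time.

```python
from itertools import product


def has_equal_sums(xs, ys):
    """Check whether two distinct pairs (x, y) have the same sum x + y."""
    sums = sorted(x + y for x, y in product(xs, ys))
    # After sorting, any equal sums must be adjacent.
    return any(a == b for a, b in zip(sums, sums[1:]))
```

For example, `has_equal_sums([1, 2], [1, 2])` is true because $1+2=2+1$, while `has_equal_sums([0, 1], [0, 2])` is false because the sums 0, 1, 2, and 3 are all distinct.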