Element distinctness problem

In computational complexity theory, the element distinctness problem or element uniqueness problem is the problem of determining whether all the elements of a list are distinct.

It is a well studied problem in many different models of computation. The problem may be solved by sorting the list and then checking if there are any consecutive equal elements; it may also be solved in linear expected time by a randomized algorithm that inserts each item into a hash table and compares only those elements that are placed in the same hash table cell.^[1]

Several lower bounds in computational complexity are proved by reducing the element distinctness problem to the problem in question, i.e., by demonstrating that the solution of the element uniqueness problem may be quickly found after solving the problem in question.

Decision tree complexity

The number of comparisons needed to solve the problem of size $n$ , in a comparison-based model of computation such as a decision tree or algebraic decision tree, is $\Theta (n\log n)$ . Here, $\Theta$ invokes big theta notation, meaning that the problem can be solved in a number of comparisons proportional to $n\log n$ (a linearithmic function) and that all solutions require this many comparisons.^[2] In these models of computation, the input numbers may not be used to index the computer's memory (as in the hash table solution) but may only be accessed by computing and comparing simple algebraic functions of their values. For these models, an algorithm based on comparison sort solves the problem within a constant factor of the best possible number of comparisons. The same lower bound applies as well to the expected number of comparisons in the randomized algebraic decision tree model.^[3]^[4]

Real RAM Complexity

If the elements in the problem are real numbers, the decision-tree lower bound extends to the real random-access machine model with an instruction set that includes addition, subtraction and multiplication of real numbers, as well as comparison and either division or remaindering ("floor").^[5] It follows that the problem's complexity in this model is also $\Theta (n\log n)$ . This RAM model covers more algorithms than the algebraic decision-tree model, as it encompasses algorithms that use indexing into tables. However, in this model all program steps are counted, not just decisions.

Turing Machine complexity

A single-tape deterministic Turing machine can solve the problem, for n elements of $m \geq log n$ bits each, in time $O (n 2 m (m +2-log n))$ , while on a nondeterministic machine the time complexity is $O (nm (n + log m))$ .^[6]

Quantum complexity

Quantum algorithms can solve this problem faster, in ${\textstyle \Theta (n^{2/3})}$ queries. The optimal algorithm is by Andris Ambainis.^[7] Yaoyun Shi first proved a tight lower bound when the size of the range is sufficiently large.^[8] Ambainis^[9] and Kutin^[10] independently (and via different proofs) extended his work to obtain the lower bound for all functions.

Generalization: Finding repeated elements

Elements that occur more than $n/k$ times in a multiset of size $n$ may be found by a comparison-based algorithm, the Misra–Gries heavy hitters algorithm, in time $O(n\log k)$ . The element distinctness problem is a special case of this problem where $k=n$ . This time is optimal under the decision tree model of computation.^[11]

References

^ Gil, J.; Meyer auf der Heide, F.; Wigderson, A. (1990), "Not all keys can be hashed in constant time", Proc. 22nd ACM Symposium on Theory of Computing, pp. 244–253, doi:10.1145/100216.100247, S2CID 11943779.
^ Ben-Or, Michael (1983), "Lower bounds for algebraic computation trees", Proc. 15th ACM Symposium on Theory of Computing, pp. 80–86, doi:10.1145/800061.808735.
^ Grigoriev, Dima; Karpinski, Marek; Heide, Friedhelm Meyer; Smolensky, Roman (1996), "A lower bound for randomized algebraic decision trees", Computational Complexity, 6 (4): 357, doi:10.1007/BF01270387, S2CID 1462184.
^ Grigoriev, Dima (1999), "Complexity lower bounds for randomized computation trees over zero characteristic fields", Computational Complexity, 8 (4): 316–329, doi:10.1007/s000370050002, S2CID 10641238.
^ Ben-Amram, Amir M.; Galil, Zvi (2001), "Topological Lower Bounds on Algebraic Random Access Machines", SIAM Journal on Computing, 31 (3): 722–761, doi:10.1137/S0097539797329397.
^ Ben-Amram, Amir M.; Berkman, Omer; Petersen, Holger (2003), "Element distinctness on one-tape Turing machines: a complete solution.", Acta Informatica, 40 (2): 81–94, doi:10.1007/s00236-003-0125-8, S2CID 24821585
^ Ambainis, Andris (2007), "Quantum walk algorithm for element distinctness", SIAM Journal on Computing, 37 (1): 210–239, arXiv:quant-ph/0311001, doi:10.1137/S0097539705447311
^ Shi, Y. (2002). Quantum lower bounds for the collision and the element distinctness problems. Proceedings of the 43rd Symposium on Foundations of Computer Science. pp. 513–519. arXiv:quant-ph/0112086. doi:10.1109/SFCS.2002.1181975.
^ Ambainis, A. (2005). "Polynomial Degree and Lower Bounds in Quantum Complexity: Collision and Element Distinctness with Small Range". Theory of Computing. 1 (1): 37–46. doi:10.4086/toc.2005.v001a003.
^ Kutin, S. (2005). "Quantum Lower Bound for the Collision Problem with Small Range". Theory of Computing. 1 (1): 29–36. doi:10.4086/toc.2005.v001a002.
^ Misra, J.; Gries, D. (1982), "Finding repeated elements", Science of Computer Programming, 2 (2): 143–152, doi:10.1016/0167-6423(82)90012-0, hdl:1813/6345.

[1] Gil, J.; Meyer auf der Heide, F.; Wigderson, A. (1990), "Not all keys can be hashed in constant time", Proc. 22nd ACM Symposium on Theory of Computing, pp. 244–253, doi:10.1145/100216.100247, S2CID 11943779.

[2] Ben-Or, Michael (1983), "Lower bounds for algebraic computation trees", Proc. 15th ACM Symposium on Theory of Computing, pp. 80–86, doi:10.1145/800061.808735.

[3] Grigoriev, Dima; Karpinski, Marek; Heide, Friedhelm Meyer; Smolensky, Roman (1996), "A lower bound for randomized algebraic decision trees", Computational Complexity, 6 (4): 357, doi:10.1007/BF01270387, S2CID 1462184.

[4] Grigoriev, Dima (1999), "Complexity lower bounds for randomized computation trees over zero characteristic fields", Computational Complexity, 8 (4): 316–329, doi:10.1007/s000370050002, S2CID 10641238.

[5] Ben-Amram, Amir M.; Galil, Zvi (2001), "Topological Lower Bounds on Algebraic Random Access Machines", SIAM Journal on Computing, 31 (3): 722–761, doi:10.1137/S0097539797329397.

[6] Ben-Amram, Amir M.; Berkman, Omer; Petersen, Holger (2003), "Element distinctness on one-tape Turing machines: a complete solution.", Acta Informatica, 40 (2): 81–94, doi:10.1007/s00236-003-0125-8, S2CID 24821585

[7] Ambainis, Andris (2007), "Quantum walk algorithm for element distinctness", SIAM Journal on Computing, 37 (1): 210–239, arXiv:quant-ph/0311001, doi:10.1137/S0097539705447311

[8] Shi, Y. (2002). Quantum lower bounds for the collision and the element distinctness problems. Proceedings of the 43rd Symposium on Foundations of Computer Science. pp. 513–519. arXiv:quant-ph/0112086. doi:10.1109/SFCS.2002.1181975.

[9] Ambainis, A. (2005). "Polynomial Degree and Lower Bounds in Quantum Complexity: Collision and Element Distinctness with Small Range". Theory of Computing. 1 (1): 37–46. doi:10.4086/toc.2005.v001a003.

[10] Kutin, S. (2005). "Quantum Lower Bound for the Collision Problem with Small Range". Theory of Computing. 1 (1): 29–36. doi:10.4086/toc.2005.v001a002.

[11] Misra, J.; Gries, D. (1982), "Finding repeated elements", Science of Computer Programming, 2 (2): 143–152, doi:10.1016/0167-6423(82)90012-0, hdl:1813/6345.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]