Collision (computer science)
This article has multiple issues. Please help improve it or discuss these issues on the talk page. (Learn how and when to remove these template messages)(Learn how and when to remove this template message)
Collision is used in two slightly different senses in theoretical computer science and telecommunications. In computer science it refers to a situation where a function maps two distinct inputs to the same output. In telecommunications, a collision occurs when two nodes of a network attempt to transmit at the same time.
Collisions in theoretical computer science
A collision or clash occurs when two different inputs to a function, typically one used to compress large data items into a smaller or fixed size, produce the same output, called (depending on the application) a hash value, checksum, fingerprint, or digest.
Collisions are unavoidable whenever members of a very large set (such as all possible person names, or all possible computer files) are mapped to a relatively short bit string. This is merely an instance of the pigeonhole principle.
The impact of collisions depends on the application. When the goal is to identify similar data, such as homologous DNA sequences, or similar audio files, or digital watermarking of images or banknotes, the compressive functions are designed to maximize the probability of collision between distinct but similar data. Checksums and ordinary hash functions on the other hand, are designed to smooth out the distribution of outputs, so as to minimize the probability of accidental collisions, while cryptographic hash functions are designed to make it infeasibly hard for a malicious adversary to find a collision (even though many exist), thereby defeating the purpose of an authentication or digital signature scheme.
Collisions in Telecommunications
Half-duplex (HDX) data communications networks have the possibility that more than one node will transmit at the same or overlapping time. This event is referred to as a collision. The messages sent by the transmitting nodes are corrupted. The receiving nodes will receive (in most but not all cases) random data. In half-duplex networks, the transmitting node must ensure that the network is quiet prior to transmitting, and in addition there is usually some mechanism for transmitting nodes to detect overlapping transmissions. See also Collision domain.
- Birthday attack
- Collision attack (cryptographic hash functions)
- Collision resolution (hash tables)
- Perfect hash function, a hash function that is free of collisions by design
- Name collision
- Jered Floyd (2008-07-18). "What do Hash Collisions Really Mean?". http://permabit.wordpress.com/: Permabits and Petabytes. Retrieved 2011-03-24.
For the long explanation on cryptographic hashes and hash collisions, I wrote a column a bit back for SNW Online, “What you need to know about cryptographic hashes and enterprise storage”. The short version is that deduplicating systems that use cryptographic hashes use those hashes to generate shorter “fingerprints” to uniquely identify each piece of data, and determine if that data already exists in the system. The trouble is, by a mathematical rule called the “pigeonhole principle”, you can’t uniquely map any possible files or file chunk to a shorter fingerprint. Statistically, there are multiple possible files that have the same hash.