Talk:Hash function

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia

MD5 and SHA-1 weaknesses[edit]

This article should document the issues, raised last summer and sharpened last month, with MD5 and some other widely used hash functions. I'm putting a note that the algorithm has issues on this page and the MD5 page. If anyone wants to document this properly, look at:

(I won't write this up properly: not my field and I have very limited interest in the topic) ---- Charles Stewart 15:20, 14 Mar 2005 (UTC)

I confused this page with Cryptographic hash function: sorry, no issue ---- Charles Stewart 15:26, 14 Mar 2005 (UTC)

Missing content[edit]

The following have been mentioned as being missing from the article. Moving them to talk instead of keeping in a huge cleanup banner:

  • performance versus data length & distributions, chip architectures, etc
  • analysis: worst case, average case, etc
  • collision resolution, since all practical hash functions yield collisions
  • Basic info missing: is the output longer or shorter than the input? Is the output length fixed as input varies? Examples please. — Preceding unsigned comment added by Ian R Bryce (talkcontribs) 07:05, 17 June 2021 (UTC)[reply]

Thjarkur (talk) 00:19, 21 February 2020 (UTC)[reply]

@Þjarkur and Ian R Bryce: This article is also missing information about locality-sensitive hash functions. Jarble (talk) 22:06, 12 April 2022 (UTC)[reply]

Folding hash codes[edit]

The paragraph on folding hash codes is a mess.

First:

A folding hash code is produced by dividing the input into n sections of m bits, where 2m is the table size…

and:

…For example, for a table size of 15 bits…

If 2m = table size = 15, then m ≅ 3.91 bits per section.

…and key value of 0x0123456789ABCDEF…

The binary representation of 0x0123456789ABCDEF,

100100011010001010110011110001001101010111100110111101111,

has 57 bits. 57 bits ÷ 3.91 bits per section = n ≅ 14.59 sections. However:

…there are five sections consisting of 0x4DEF, 0x1357, 0x159E, 0x091A and 0x8…

These sections appear to consist of 15 bits, sort of:

Section 1

0x4DEF = 100110111101111 — 15 bits — matches key bits 0 through 14.

100100011010001010110011110001001101010111100110111101111
                                          100110111101111
Section 2

0x1357 = 1001101010111 — 13 bits — matches key bits 15 through 27.

100100011010001010110011110001001101010111100110111101111
                             1001101010111

There are 2 key bits skipped between sections 2 and 3. Prepending those two bits (00) to section 2 would make it 15 bits long.

Section 3

0x159E = 1010110011110 — 13 bits — matches key bits 30 through 42.

100100011010001010110011110001001101010111100110111101111
              1010110011110

There are 2 key bits skipped between sections 3 and 4. Prepending those two bits (00) to section 3 would make it 15 bits long.

Section 4

0x091A = 100100011010 — 12 bits — matches key bits 45 through 56.

100100011010001010110011110001001101010111100110111101111
100100011010
Section 5

0x8 = 1000 — 4 bits — matches 3 separate 4-bit segments of the key:

100100011010001010110011110001001101010111100110111101111
   1000   1000           1000

Problematically, all of these segments overlap other sections, and the first 4 sections account for every bit in the key value.

Next:

“...and using a parity-preserving bitwise operation such as ADD or XOR to combine the sections…”

While addition and exclusive OR are analogous, “ADD” is not a valid token in the lexicons of computer science or Boolean algebra. Further, addition is not parity-preserving in binary arithmetic. E.g., 1 + 1 (even parity) = 10 (odd parity). XOR is the bitwise parity function. The only one.

Finally:

“...Adding, we obtain 0x7AA4, a 15-bit value.”

The binary representation of 0x7AA4 is 111101010100100. It is indeed a 15-bit value, but neither adding nor XOR-ing all 5 sections (or just the first 4 sections) together produces this number. 216.30.159.11 (talk) 19:15, 5 January 2024 (UTC)[reply]