|This article is of interest to the following WikiProjects:|
|To-do list for SHA-1:|
Archive 1: April 2010
SHA and MD5
Is it possible to find two messages that have the same SHA(1 or 2) and MD5 hashes? What is the probability of coming across such a file by chance (compared to that for SHA and MD5 individually)? 126.96.36.199 (talk) 13:49, 27 July 2010 (UTC)
- My answer probably isn't as clear as you would want it to be, but Wikipedia is not a Q&A site (you might want to try for instance Stack Overflow). But you can have a look at birthday paradox, birthday attack, collision attack. -- intgr [talk] 16:06, 27 July 2010 (UTC)
The best public cryptanalysys is wrong
The box resuming sha-1 algorithm says that Best public cryptanalysis is 2^59. IT's WRONG. You can see it here http://ehash.iaik.tugraz.at/wiki/SHA-1
and the newest version of Stéphane Manuel paper can be found here: http://www-rocq.inria.fr/secret/Stephane.Manuel/pdfs/WCC09_paper.pdf
On it she never touch on the 2^51 algorithm and says that the best one is the on other article, that article refeerences the Wang 2^69 article.
So the correct colision algoritm would be Wang´s 2^69
- I agree, the Paper from Stéphane Manuel states no practical attack, but only a theoretical complexity of 2^51.
- To quote its complexity evaluation: "We are not claiming that this is the best way ever to evaluate the complexity of a collision attack against SHA-1."
- 188.8.131.52 (talk) 02:09, 13 March 2011 (UTC)
- Stéphane is a he. Unfortunately this WCC09_paper.pdf document is no longer accessible, but the abstract of the published version of the document ( http://www.springerlink.com/content/u320751102001580/ ) states that the most efficient vector is the Jutla & Patthak Codeword2 which complexity he evaluated to be 2^60. — Preceding unsigned comment added by Jmdwp (talk • contribs) 11:21, 4 October 2012 (UTC)
due to avalanche effect ?
I am not a native english speaker, but this does not seem right. Avalanche effect is not *cause*, its name for earlier part of sentence. "Even a small change in the message will, with overwhelming probability, result in a completely different hash due to the avalanche effect." I think this should be rewritten as "Even a small change in the message will, with overwhelming probability, result in a completely different hash (a.k.a. avalanche effect)." firstname.lastname@example.org 184.108.40.206 (talk) 10:54, 20 September 2011 (UTC)
- "due to avalanche effect" can be read either as "caused by avalanche effect" or as "resulting from avalanche effect". — Preceding unsigned comment added by 220.127.116.11 (talk) 13:15, 21 February 2012 (UTC)
Example Hashes wrong?
For sha1("The quick brown fox jumps over the lazy dog") I get be417768b5c3c5c1d9bcb2e7c119196dd76b5570 which differs from the article. However, for sha1("") I do get the published value of da39a3ee5e6b4b0d3255bfef95601890afd80709. Could someone verify the first two example hashes? 18.104.22.168 (talk) 22:05, 30 April 2012 (UTC)
For this section
These are examples of SHA-1 digests. ASCII encoding is used for all messages.
SHA1("The quick brown fox jumps over the lazy dog") = 2fd4e1c6 7a2d28fc ed849ee1 bb76e739 1b93eb12 Even a small change in the message will, with overwhelming probability, result in a completely different hash due to the avalanche effect. For example, changing dog to cog produces a hash with different values for 81 of the 160 bits:
SHA1("The quick brown fox jumps over the lazy cog") = de9f2c7f d25e1b3a fad3e85a 0bd17d9b 100db4b3
The hash of the zero-length string is:
SHA1("") = da39a3ee 5e6b4b0d 3255bfef 95601890 afd80709
Those hashes each have 40 characters. Which is 320 bits. If SHA1 has an 160 (edited to correct wrong citation from table) bit output, as claimed in the table at the top of the article, then the examples are wrong. Or the table is wrong. — Preceding unsigned comment added by 22.214.171.124 (talk) 14:47, 6 September 2012 (UTC)
- The hashes are expressed in hexadecimal which uses two characters per eight bits of hash data. This is much more convenient than writing the hash is "�6�W0ᒦ��>���2"���[E�" (for example). 40 characters is needed to represent 160 bits: 160 bits divided by 8 bits per byte is 20 bytes. 20 bytes is represented by 40 hexadecimal characters. —EncMstr (talk) 19:27, 6 September 2012 (UTC)
I think I can clear this up. The OP accidentially added a newline character to the hash input. SHA1("The quick brown fox jumps over the lazy dog\n") = be417768b5c3c5c1d9bcb2e7c119196dd76b5570, but SHA1("The quick brown fox jumps over the lazy dog") = 2fd4e1c67a2d28fced849ee1bb76e7391b93eb12. O.mangold (talk) 14:46, 26 August 2016 (UTC)
While there have been several published theoretical attacks on SHA-1, the last I heard no one has produced a collision, so putting it in the Category Broken Cryptographic Hashes may be premature. At the least, a notable citation saying it is considered broken is needed.--agr (talk) 01:02, 11 June 2012 (UTC)
SHA-1 Pseudocode Typo
I think there's a typo in the pseudo code. On page 9 of the FIPS 180-2 document, it says:
- ft (x,y,z)=Ch(x,y,z=(x^y) xor (~x^z) 0 ≤ t ≤ 19
(Other values for the function continue for t > 19.)
And the pseudocode shows:
- f = (b and c) or ((not b) and d)
Shouldn't that or, be an xor?
- f = (b and c) xor ((not b) and d)
- It does not matter. They are both the same function. 126.96.36.199 (talk) 18:02, 26 October 2012 (UTC)
The original FIPS PUB 180-1 specified it as :f = (b and c) or ((not b) and d), while newer versions of the publication specify it as :f = (b and c) xor ((not b) and d). Since FIPS PUB 180-1 is no longer available from NIST I would suggest referring to the newer document instead and adapting the pseudo-code and alternative forms accordingly.
General comment in the article about hash algorithms versus key derivation algorithms
"The use of plain or salted hashes for password storage is a bad idea, they are not designed for this, and key-derivation algorithms like PBKDF2 or Bcrypt or similar should be used instead." - Is this true? There's no citation for such a qualitative statement, And even if so, does it belong in an article about a specific hash function? I'm no expert so I don't feel qualified to edit the article. Dominic Sayers (talk) 10:06, 4 December 2012 (UTC)
relevance of MS-office section
I just had a read of this article and happened to note the section on MS-Office screaming out at me as being irrelevant - many if not most applications can use SHA1 so I don't think the article should contain a section on MS Office's use of SHA1. Alex Harvey (talk) 05:23, 7 January 2013 (UTC)
"Although no successful attacks have yet been reported on SHA-2, they are algorithmically similar to SHA-1."
The sentence says that no successful hacks have been reported; what is the "they" referred to here? Unsuccessful hacks? (Edit: er, I mean attacks, not hacks.)
Apparent Inconsistency - Relationship of SHA-1, SHA-2 Algorithms
The first paragraph contains the statement
SHA-2 on the other hand significantly differs from the SHA-1 hash function.
The third paragraph states
Although no successful attacks have yet been reported on SHA-2, they are algorithmically similar to SHA-1.
How to reconcile the statements ??
- they significantly differ
- they are algorithmically similar — Preceding unsigned comment added by 188.8.131.52 (talk) 13:13, 8 May 2013 (UTC)
Hi, Stépane Manuel, author of the 2^51 attack, removed this claim from the final paper (see here). The best attack is today the one from Stevens (here). Do you have any other information that I may have missed? --Dduuhh (talk) 15:52, 22 May 2013 (UTC)
What is rot?
In the section Comparison of SHA functions, the table lists bitwise functions that include "rot". The bitwise function article has no mention of "rot". Now I assume that rot means some form of rotation (covered in the bitwise function article) but which one is it? Or is it more than one form of rotation? A simple footnote clarifying the "rot" term would suffice. 184.108.40.206 (talk) 20:46, 30 April 2014 (UTC)
- It refers to Rotate no carry (Circular shift). I don't know if a footnote is necessary since this is really the standard set of bitwise operations, and I'm not sure what else "rot" could be reasonably confused with. But I have no real objection to someone adding one. Mlkj (talk) 09:04, 29 May 2016 (UTC)
- Rot is not quite as common as many other bitwise operations. It exists in some computer architectures, where it allows one to use it to build up other operations. Rotate through carry allows for a multiple register shift, for example. Architectures that include a multiple register shift, such as IBM S/360 and successors, don't have it. (It might have been added more recently, I can't keep track of all the new instructions.) I suspect that it could use a little documentation. Gah4 (talk) 20:01, 7 January 2018 (UTC)
The description of the algorithm uses big endianness as a notation for multibyte integer units (words). Of course, how constants are represented in a pseudocode it doesn't matter, but it seems the algorithm itself makes the big endian view the part of its requirements. It treats a chunk of the input 512 bits as an array of 32 unsigned integers on which operations such as addition modulo 2^32 are applied. And it does matter if the algorithm dictates to treat every 32 bits of the input bit string as a word with big endian byte order on it as is in the pseudocode. A BE computer just will load the next 4 bytes from the message buffer into its variables (registers) and do its math, but a LE computer will need shuffle those 4 bytes all around before to continue on the steps required by the standard. Does really the algorithm require that, and for what a reason in this little endian dominated world this requirement has been done by designers? Every computer which almost certainly is little endian now needs to perform such an overhead by the will of the designers just like in the case of stupid choice for the network order which at least was motivated by dominating of BE systems at the times networking was born. But at the times sha appeared, little endian systems was dominating, so why why this choice? This is totally needless overhead where performance requirements are so desirable. I'm note sure if LE systems have some tweaking features to overcome this stupidity. Eg reading (at once) 4 bytes from the buffer back to forth (from the addr a to a-3 incl.). The case is in the same principle - standard should be designed for the real system in minds, not for sphaerical horse in vacuum. — Preceding unsigned comment added by 220.127.116.11 (talk) 18:08, 6 September 2014 (UTC)
- Wikipedia talk pages are for discussing changes to the article. Sorry, this is not the place to argue about the design decisions of SHA-1. -- intgr [talk] 18:18, 6 September 2014 (UTC)
Performance column in table
The performance column only refers to a CPU used, and smacks of Original Research. Although some might be pedantic, I don't mind. But these numbers differ from what I see using OpenSSL - I find that SHA-1 is almost as fast as MD5 (within ten or twenty percent, perhaps), but that SHA-256 is substantially (four times?) slower. Anybody know what algorithms were benchmarked here, or ideally, external references to benchmarks?
- Oh, I see there was an external link, but from 2009. Time for an update, perhaps? Ketil (talk) 06:44, 14 April 2015 (UTC)
Document SHAppening/Fappening/Snappening name relationship?
I added a footnote, for the benefit of readers years later when the terms are no longer current, explaining that "The SHAppening" is a reference to The Fappening (and, I should have added, Snapchat#The Snappening). User:Intgr reverted it on the grounds that it lacked a WP:reliable source. I hadn't included one because, per WP:V, I hadn't expected the statement to be challenged.
The problem is, due (I believe) to a combination of not wanting to be condescending to readers and the unsavoury nature of the nude photo leaks, I haven't managed to track down a source explicitly documenting that fact.
So per the WP:BOLD, revert, discuss cycle, I'd just like to canvas other people's opinions. Do y'all feel:
- It's not true that it's a deliberate reference to those terms, or at least there's a reasonable chance it's not. (E.g. it might be a reference to The Happening (2008 film) instead.)
- It is a deliberate reference, but that fact isn't WP:notable enough to be worth mentioning at all.
- It would be worth mentioning if we could find a WP:RS, but if we can't, it should be omitted per policy.
- It is worth mentioning. (FWIW, User:Intgr has expressed a willingness to be persuaded by consensus here to drop the challenge.)
Personally, I think it's very marginally notable (which is why I put it in a footnote and not the main text), but in a spirit of inclusionism, might be informative to a non-native English speaker or future reader when the terms have faded from people's awareness. 18.104.22.168 (talk) 09:31, 27 October 2015 (UTC)
- The discussion between OP and me is at User talk:22.214.171.124#Your edits to SHA-1. -- intgr [talk] 10:09, 27 October 2015 (UTC)
Collisions not yet found (Dec 2015)
The section "Comparison of SHA functions" is indicating that collisions for SHA-1 have been found. As far as I understand the quoted source this is only true for the compression function within SHA-1. No two inputs for SHA-1 have been found (yet) that produce the same hash value. --KommX (talk) 09:06, 15 December 2015 (UTC)
- @KommX: Thanks! You are correct, as per the source: "freestart collisions do not directly lead to actual collisions for SHA-1". The table is now hidden away in a separate template page, that's why editors watching the article probably didn't notice this IP address edit. -- intgr [talk] 09:23, 15 December 2015 (UTC)
- I agree no collisions have been found for SHA1, AIUI the attach has been demonstrated by attacking the compression function only which is not the same as finding a collision. The reference seems clear about this. I edited the template to no collisions found and included the SHAppening cost estimate to give indication of near term risks of finding collision with explanation that less than 80 is not a good representation of none. My template edit has been reverted and I don't see any good reason for this. I think it should be reverted back to my version or something better unless someone has good reason to support "<80" crandles (talk) 10:12, 16 December 2015 (UTC)
Revision control systems such as Git and Mercurial use SHA-1 not for security but for ensuring that the data has not changed due to accidental corruption.
Am I the only one who thinks this is nonsense? When you sign a git commit, what really is signed is the SHA-1 id of the commit, thus the security clearly relies on SHA-1 not being broken.
However Git does not require the second preimage resistance of SHA-1 as a security feature, since it will always prefer to keep the earliest version of an object in case of collision, preventing an attacker from surreptitiously overwriting files.
This is only true as long as you pulled the commit in question first from a trusted source. But this is not always the case, for example when using Gittorrent. Even worse seems to be the situation when a malicious actor is able to get commits merged upstream. Then the security not only depends on second-preimage resistance of SHA-1 but also on collision resistance.
"Announced an attack" vs "performed an attack"
Re : To me, "announced an attack" could refer to mathematically proving that a collision is feasible, which had already been done several years ago, without actually committing the processing resources to do it. Furthermore, the article collision attack clearly shows that the phrase can be used to refer to just finding a collision, whether or not the collision was used for an "advantage". Smyth (talk) 14:29, 5 March 2017 (UTC)
- See if anyone else wants to say, one way or the other. I was thinking in terms of a physical attack. (Country X) attacked (country Y) usually doesn't mean that they just collected enough guns and ammo, and even went to live rounds practicing, but that they actually used them in the country. But maybe the analogy isn't right. It was really close when I decided to do the rv, and maybe others will add to the discussion. Thanks. Gah4 (talk) 17:11, 5 March 2017 (UTC)
- OK, for now I reverted the change. Unlike countries, there can be an attack against SHA-1 vs. an attack using SHA-1. This distinction isn't so easy to make in words. As in the paper, one could make two copies of a document, for example with different monetary values. Verifying using SHA-1 would allow one to cheat using the second document. As far as I know, that hasn't happened. The form of the collision documents makes detection fairly easy, though, if one decides to check. Gah4 (talk) 20:00, 5 March 2017 (UTC)
Not sure if it's quite notable enough to mention it, but there is a "hardened" SHA1 hash function that produces identical SHA1 hashes in almost all cases (2^-90 chance of different hash), except when it detects collision attempts occurring. https://github.com/cr-marcstevens/sha1collisiondetection
Apparently this is now on the mainline git (from v2.13) so if you were to put the SHAttered pdf files into git, it would not cause any issues. If this sees wider adoption it would certainly be worth mentioning in the article. --Nanite (talk) 16:11, 5 August 2018 (UTC)
Request for Review in Leurent-Peyrin Attack Section
A few days ago I added coverage of the Leurent-Peyrin attack based on skimming the paper in question and referenced papers. One particular sentence I wrote is "This method is also capable of finding chosen-prefix collisions in the MD5 function, but at a complexity of 246.3 does not surpass the prior best available method at a theoretical level (239), though potentially at a practical level (≤249)." This was based on the belief that references to MD5 attacks with a specific block count referred to basing the search for solutions on a given number of blocks there is information for, making smaller numbers of blocks more difficult to solve theoretically than larger numbers (as reflected in the complexity for 3-block solutions and general solutions). But now I'm doubting this assumption, thinking they were referring to hashing messages of that number of blocks length, where shorter messages are much easier to analyze to find collisions for than longer messages (though that doesn't really explain why the attack against a 3-block message would take longer than the theoretical attack in general); if true, the claim that the Leurent-Peyrin attack is more practical than the prior best attack is completely wrong. Justin.olbrantz (talk) 10:11, 18 May 2019 (UTC)
I have just published a (the?) second public pair of colliding files. https://privacylog.blogspot.com/2019/12/the-second-sha-collision.html Thought you might be interested Full Decent (talk) 06:33, 31 December 2019 (UTC)
- Right in time for it to still be in 2019, I see. "Processing power used: none. No brute force processing was necessary to generate these two files based on the other files above." Which other files were these based on? BernardoSulzbach (talk) 16:15, 31 December 2019 (UTC)
- This is not new -- the same 320 byte vectors were used to win the Bitcoin SHA1 collision challenge, back in 2017. (See 'sigscript' here https://www.blockchain.com/btc/tx/8d31992805518fd62daa3bdd2a5c4fd2cd3054c9b3dca1d78055e9528cff6adc ). I'd link to the bitcointalk thread but that site is blacklisted on wikipedia. --Nanite (talk) 23:44, 1 January 2020 (UTC)
- All you did was to take the first 320 bytes of the two files of Stevens et al! As is clear from their paper, you can put whatever you want after those 320 bytes, so long as you put the same thing in both files. You just left that part out! Big deal. Eric Kvaalen (talk) 08:36, 23 February 2020 (UTC)
Is the collision really useful?
Stevens et al] found a way to create two different files having the same SHA-1 digest. In fact, what they did was to create two beginnings of a PDF file, and then one can put whatever one wants after that so long as it's the same in the two files. They say "We exploit these limited differences to craft two colliding PDF documents containing arbitrary distinct images." But that doesn't seem to be true! Since the content after the first 320 bytes has to be the same, the images will be practically the same. They give an example at https://shattered.io. The two PDFs are the same exxcept for the colors. I suppose the colors are determined by the two blocks that differ in the first 320 bytes.
So in that case, how is this useful? At https://shattered.io they say "For example, by crafting the two colliding PDF files as two rental agreements with different rent, it is possible to trick someone to create a valid signature for a high-rent contract by having him or her sign a low-rent contract." But not with the collision they found! All it would do is give two rental agreements with different colors! No? Please Ping me if you answer. Eric Kvaalen (talk) 08:36, 23 February 2020 (UTC)