Talk:Confusion and diffusion

From Wikipedia, the free encyclopedia
Jump to: navigation, search
WikiProject Cryptography / Computer science  (Rated Start-class, Top-importance)
WikiProject icon This article is within the scope of WikiProject Cryptography, a collaborative effort to improve the coverage of Cryptography on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Start-Class article Start  This article has been rated as Start-Class on the quality scale.
 Top  This article has been rated as Top-importance on the importance scale.
Taskforce icon
This article is supported by WikiProject Computer science (marked as Top-importance).

I think that both of you, User:OT and User:Matt Crypto, are right below in some sense, even though you are disagreeing with each other. The relationship between the key and the ciphertext is complex exactly because the cryptosytem has the property that changing one bit of the key changes the ciphertext completely, and hence in order to change only one bit of the ciphertext you certainly would have to change the key completely. There are basically equivalent properties. I will try to clarify these things here now. --GaborPete (talk) 04:35, 5 April 2009 (UTC)

I'm not really satisfied with my own work, but presently that's what I can do. --GaborPete (talk) 10:31, 5 April 2009 (UTC)

And I agree with User:Dvunkannon (if it was him at the end): claiming that S-boxes are for confusion and P-boxes are for diffusion is complete nonsense. In the substitution-permutation network article it had been stated the opposite way, which is also complete nonsense, I have just changed that there. Both diffusion and confusion are the results of using S-boxes and P-boxes together in the right way. If you introduce the key with XORs, as usual, then it's impossible to achieve only confusion or only diffusion, whatever your S-boxes and P-boxes are. (This is related to my previous comment, too.) --GaborPete (talk) 04:35, 5 April 2009 (UTC)

I have now deleted the following paragraph: --GaborPete (talk) 10:31, 5 April 2009 (UTC)
Substitution (a plaintext symbol is replaced by another) has been identified as a mechanism for primarily confusion (see S-box); conversely transposition (rearranging the order of symbols, see P-box) is a technique for diffusion, although other mechanisms are also used in modern practice, such as linear transformations (e.g. in Rijndael). Product ciphers use alternating substitution and transposition phases to achieve both confusion and diffusion respectively.

There are two more slightly confusing things in the present text, I will change them, too:

  • I think it is dangerous to connect the word "substitution" in a substitution-permutation network with a substitution cipher. For example, in Rijndael, a main example of an SP network, an S-box substitutes 8-bit blocks by 8-bit blocks. On the other hand, when hearing "substitution cipher", one usually thinks of changing one character to another character, maybe pairs to pairs, or triplets to triplets, at most, but not octets to octets. True, an ASCII character is usually 8 bits, or something like that, but then Rijndael mixes these bits among the 8-bit blocks, not at all respecting ASCII characters or anything like that.
  • The transposition (mathematics) link is wrong. That means something else. --GaborPete (talk) 04:35, 5 April 2009 (UTC)
OK, I didn't really change them, rather deleted. But I have put a sentence containing substitution cipher and transposition cipher into the substitution-permutation network article. --GaborPete (talk) 10:31, 5 April 2009 (UTC)


this is one of my first visits on Wikipedia, and if my editing experience is very scarce yet, I duly apologize.

OK, now to the discussion: I have recently been editing the article "confusion and diffusion" because I became aware of a statement in the article which simply is not true in my opinion.

Unfortunately my changes have been un-done so I wanted to take a chance to argue in favour of what I think is correct.

The point is that confusion and diffusion are -- at least in some lecture notes and books -- explained just in a self-respective way: confusing.

But if you happen to have read Shannon's original paper, which should be taken as the origin of any citations, the terms are made rather clear:

There are two "inputs" which lead to a ciphertext. One is the plaintext, the other one is the key. Diffusion leads to an elimination of a statistical relation between changes in single plaintext bits and changes in single ciphertext bits. If you change a plaintext bits, a cipher with good diffusion should lead to a 50 pc chance for _any_ ciphertext bit to change.

Confusion, on the other hand, does the same for the key<->ciphertext relation.

As I stated, I did the changes the day before yesterday, but they have been un-done.

Please read the original paper to get the terms correct, and then please re-do my changes. The current explanation is simply incorrect.

A link:

Thanks for your contributions, and Wikipedia:Welcome to Wikipedia! Your above definition isn't how I understand confusion and diffusion, but it's been a little while since I've read Shannon's paper, so it might well be different. I'll have a leaf through it now, though it would be most helpful if you could give some relevant quotes from Shannon's paper. — Matt 07:37, 11 Jun 2004 (UTC)
To be more specific, the thing that I disagree with about your above definition is the assertion that confusion is the just the same thing as diffusion, but between key and ciphertext, not plaintext and ciphertext. I think confusion is about complexity of the relationship between the inputs and outputs, and diffusion is about the extent to which a small change in an input propagates to the output. — Matt 08:43, 11 Jun 2004 (UTC)

Shannon's paper[edit]

Now Shannon's paper is very lenghty indeed. If you are just interested in the issues of statistical methods reading especially Section 23 ("statistical methods") is sufficient.




Shannon writes:

Two methods (other than recourse to ideal systems) suggest themselves for frustrating a statistical analysis. These we may call the methods of diffusion and confusion. In the method of diffusion the statistical structure of M which leads to its redundancy is "dissipated" into long range statistics—i.e., into statistical structure involving long combinations of letters in the cryptogram...
The method of confusion is to make the relation between the simple statistics of E and the simple description of K a very complex and involved one.

Schneier writes in Applied Cryptography:

Confusion serves to hide any relationship between the plaintext, the ciphertext and the key. Remember how linear and differential cryptanalysis can exploit even a slight relationship between these three things? Good confusion makes the relationship statistics so complicated that even these powerful cryptanalytic tools won't work.

Diffusion spreads the influence of individual plaintext or key bits over as much of the ciphertext as possible. This also hides statistical relationships and makes cryptanalysis more difficult.

— Matt 08:04, 11 Jun 2004 (UTC)

...and Menezes, van Oorschot, Vanstone write (Handbook of Applied Cryptography):

p.20 1.36: Confusion is intended to make the relationship between the key and ciphertext as complex as possible. Diffusion refers to rearranging or spreading out the bits in the message so that any redundancy in the plaintext is spread out over the ciphertext.

— OT Fri Jun 11 10:29:59 CEST 2004

William Stallings, Cryptography and Network Security (3rd ed. p66-67):

In diffusion, the statistical structure of the plaintext is dissipated into long range statistics of the ciphertext. this is achieved by having each plaintext digit affect the value of many ciphertext digits, which is equivalent to saying that each ciphertext digit is affected by many plaintext digits...
The mechanism of diffusion seeks to make the statistical relationship between the plaintext and ciphertext as complex as possible to thwart attempts to deduce the key. On the other hand, confusion seeks to make the relationship between the statistics of the ciphertext and the value of the encryption key as complex as possible, again to thwart attempts to recover the key. Thus, even if the attacker can get some handle on the statistics of the ciphertext, the way in which the key was used to produce that ciphertext is so complex as to make it difficult to deduce the key.

— Matt 09:04, 11 Jun 2004 (UTC)

Clarification: First try[edit]

OK, I try to offer an explanation for both terms in my own words, and try to bit more precise than in my first uttering:

"Diffusion" is the spreading the redundancy of a plaintext over the ciphertext. What does that mean? In the set of all clear text messages, "sensible" clear text messages have a certain correlation between two-, three- and more-letter combinations (e.g. a "q" is always followd by a "u"). A cipher with good diffusion eliminates this correlation. Changes in one bit or letter affect every bit in the ciphertext. No correlation then means: the chances for a change in any ciphertext bit is 50 per cent.

I think we both agree with this definition.

Yes, I think I broadly agree, though I might have a minor quibble: I think the property you mention is also termed the Strict Avalanche Criterion (SAC). I think diffusion is a more general concept than SAC that covers properties such as completeness, avalanche, higher-order SAC, that kind of thing; general dependancy of output digits on input digits, which can be measured using a variety of criteria. — Matt 10:48, 11 Jun 2004 (UTC)
Right, then let us put it this way: if diffusion (i.e. the spreading of correlation) is taken to the one extreme so that a change in a single bit of plaintext leads to a 50 pc chance for any single bit in the ciphertext to change, then the cipher meets the SAC. Diffusion alone also includes stages in-between where the SAC is not met.
Diffusion also (IMO) includes stronger criteria, such as SAC of order n... etc (fixing any n or fewer bits to a fixed constant, the resulting function is still SAC). — Matt 11:04, 11 Jun 2004 (UTC)

Now to "confusion": Shannon's view is to have the influence of the key bits on the ciperhtext such that a statistical analysis of the ciphertext does not help to cut out a simply-structured subset which the correct key most probably is situated in. Tying down the key closer and closer shall be practically impossible because the subset is too folded (in a geometric view) inside the set of all keys.

It should therefore be as difficult as possible to infer the key by a known-plaintext attack, for example.

It is generally stated that a complex polyalphabetic substitution rule does just that.

Now to the often-stated relation between diffusion/confusion and transposition/substitution: clearly, transposition adds to the diffusion of the cipher, but only if additional steps are taken. A simple transposition does not spread the influence of a bit change. It simply transposes it.

Similarly, a polyalphabetic cipher alone does not lead to confusion, because a simple key change only leads to a simple ciphertext change (Take Vigenere: if you just change the first key letter of a say ten-letter Vigenere key, then cipher text letters 1,11,21,31 etc are changed, none else.)

So even though transposition is the ingredient for diffusion, and substitution is the ingredient for confusion, it is an iterated combination of both, which really provides both confusion and diffusion.

I'd like to point out that transposition by itself dissipates the higher order n-gram statistics, but not the unigram statistics. User:Dvunkannon —Preceding comment was added at 20:35, 21 November 2007 (UTC)