# Talk:Entropy (information theory)/Archive3

## entropy explained

this statement is not followed up with something that uses the premise it states: "Since the probability of each event is 1 / n"

watson (talk) 04:08, 5 March 2009 (UTC)

## log probability

The article Perplexity says that information entropy is "also called" log probability. Is it true that they're the same thing? If so, a mention or brief discussion of this in the article might be appropriate. dbtfztalk 01:34, 20 April 2006 (UTC)

Entropy is *expected* log probability Full Decent (talk) 01:21, 3 December 2009 (UTC)
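
To make the "expected log probability" reading concrete, here is a minimal sketch (the function names `entropy` and `perplexity` are only illustrative, not taken from either article). It treats entropy as the expectation of minus the log probability, so the sign is explicit, and assumes the usual relation that perplexity is the base raised to the entropy.

```python
import math

def entropy(probs, base=2.0):
    """Expected value of -log(p), taken under the distribution itself."""
    return sum(p * -math.log(p, base) for p in probs if p > 0)

def perplexity(probs, base=2.0):
    """Base raised to the entropy (assumed relation between the two quantities)."""
    return base ** entropy(probs, base)

fair_die = [1 / 6] * 6
print(entropy(fair_die))     # about log2(6) bits
print(perplexity(fair_die))  # about 6, the number of equally likely faces
```

So the two quantities carry the same information, but they are not literally the same number: one is a log, the other its exponential.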

## Interesting properties

Hello, I have posted some (relatively?) interesting properties of the entropy function on my blog. http://fulldecent.blogspot.com/2009/12/interesting-properties-of-entropy.html Normally information from a blog is not authoritative and I wouldn't use this way to post primary information on Wikipedia; but math is math and stands on its own. Full Decent (talk) 01:27, 3 December 2009 (UTC)

I have made a few edits today and merged in some of that blog to http://en.wikipedia.org/wiki/Perplexity. I think this article requires some review and attention to mathematical pedantry to maintain B-class level. Full Decent (talk) 16:00, 11 December 2009 (UTC)

## Actual information entropy

I was surprised to see Shannon entropy here and not the explicit collapse of information and was even more surprised to see the latter not even linked! I'd like to see this in the otheruses template at the top but am not sure how to phrase it succinctly. Can someone give it a shot? It's a hairy issue since the articles are so tightly related. .froth. (talk) 03:06, 26 March 2009 (UTC)

## About the relation between entropy and information

Hi. I've drawn this graphic. I'd like to know your comments about the idea it describes, thank you very much.

--Faustnh (talk) 00:07, 29 March 2009 (UTC)

Also posted at Fluctuation theorem talk page.

It's the same amount of information, but the information in the first one can be described more succinctly. See Kolmogorov complexity .froth. (talk) 01:30, 4 April 2009 (UTC)

I think it's not the same amount of information.

It is true that there is something that remains constant in both graphs, but it is not information:

The bigger quantity of information in the wave case, gets compensated by, or gets correlated to, the smaller quantity of entropy in that wave's case.

So, certainly, there is something that remains constant. But it is not information.

Here : aaa , there is less information than here : abc . But here : aaa , entropy is bigger than here : abc .

Another example:

This universe : abcd - abcd - abcd - abcd , is maximum entropy and minimum information.

This other universe : aaaa - bbbb - cccc - dddd , is minimum entropy and maximum information.

But something remains constant in both universes, because the second universe is a big replica of each of the small particles or sub-universes of the first universe.

--Faustnh (talk) 11:03, 4 April 2009 (UTC)

More information = more entropy. Also, I don't think this is very relevant to the information theory definition of entropy. Full Decent (talk) 15:58, 11 December 2009 (UTC)

Actually "abcd - abcd - abcd - abcd" is highly ordered. If we read these as sets of four hexadecimal digits, "aaaa - bbbb - cccc - dddd" is four different 16-bit characters, while "abcd - abcd - abcd - abcd" is four copies of the same 16-bit character, and therefore more ordered. Any pattern is order.
I agree the poster of this image meant well, and it would be a great analogy if it were right. Unfortunately it's wrong, for reasons I get into below.  Randall Bart   Talk  21:45, 2 December 2010 (UTC)
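
One way to check the examples in this thread numerically is the sketch below. It assumes the simplest possible estimator, the entropy of the symbol frequencies alone, which ignores ordering entirely; the helper name `empirical_entropy` is made up for illustration.

```python
from collections import Counter
import math

def empirical_entropy(symbols):
    """Entropy in bits of the symbol frequencies (order of symbols is ignored)."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

interleaved = "abcd" * 4        # abcd - abcd - abcd - abcd
grouped = "aaaabbbbccccdddd"    # aaaa - bbbb - cccc - dddd

print(empirical_entropy(interleaved), empirical_entropy(grouped))
print(empirical_entropy("aaa"), empirical_entropy("abc"))
```

Under this estimator the two "universes" come out identical, and "aaa" has lower entropy than "abc", not higher. Any difference between the two orderings would have to come from a model that looks at context, which is where the Kolmogorov complexity remark above comes in.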

It doesn't belong, read WP:NAMB. I'll remove it again unless some valid reason is given to keep it. Mintrick (talk) 21:56, 27 May 2009 (UTC)

• If you bother yourself to go to Entropy (disambiguation), you will find a whole section on different measures and generalisations of entropy which are used in information theory -- most notably Renyi entropy, also the entropies listed under the "Mathematics" section.
The article, as the hatnote says, is specifically about Shannon entropy.
This is in conformity with WP:NAMB:

However, a hatnote may still be appropriate when even a more specific name is still ambiguous. For example, Matt Smith (comics) might still be confused for the comics illustrator Matt Smith (illustrator).

Jheald (talk) 23:19, 27 May 2009 (UTC)
If there are other entropies in information theory, then the title of this article isn't fully disambiguated. Shannon entropy would be fully disambiguated however. --Cybercobra (talk) 01:04, 10 January 2010 (UTC)

Changing the base of a logarithm is tantamount to applying a scaling factor: $\scriptstyle{ \log_b x = \frac {\log_r x} {\log_r b} }$.

The same holds for entropy: $H_b(X) = \frac 1 {\log_r b} H_r(X)\!$ for any alternative base $r\!$.

In other words, changing the base is nothing more than changing the unit of measurement. All reasoning and comparisons between entropies are independent of the base.

The question then arises of which reference base to choose. The maximal entropy being

$\scriptstyle{ \max H_b(X) = - \sum_{i=1}^{N} p_i \log_b p_i = -N \frac 1 N \log_b {\frac 1 N} = \log_b N }$,

a natural choice would then be to choose $\scriptstyle{ \log_b N = 1}$, that is $\scriptstyle{ b = N }$.

In that case, $0 \le H_b(X) \le 1$, with $H_b(X) = 0 \!$ for a certain (deterministic) distribution and $H_b(X) = 1 \!$ for the uniform distribution.

This justifies the use of $b=2$ when analysing binary data. —Preceding unsigned comment added by 62.65.141.230 (talk) 11:27, 29 January 2010 (UTC)
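
A quick numerical check of the scaling argument, as a sketch only: the distribution `probs` is an arbitrary example chosen here, and `entropy` is an illustrative helper rather than anything from the article.

```python
import math

def entropy(probs, base):
    """Shannon entropy of a discrete distribution in the given log base."""
    return sum(-p * math.log(p, base) for p in probs if p > 0)

probs = [0.5, 0.25, 0.125, 0.125]   # arbitrary example distribution
n = len(probs)

h_bits = entropy(probs, 2)        # unit: bits
h_nats = entropy(probs, math.e)   # unit: nats
h_norm = entropy(probs, n)        # base N, so the value lies in [0, 1]

print(h_bits, h_nats, h_norm)
print(h_nats / h_bits)            # constant factor ln 2, whatever the distribution
```

Choosing base N is what is sometimes called normalised entropy; it equals 1 only for the uniform distribution over the N outcomes.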

## Layman's Terms

I've put in a short section titled "Layman's terms" to make it more user-friendly to the curious layman who is not familiar with hard sums or long winded technical definitions. It is my belief that every scientific and technical article should have one of these to encourage public interest in science. Hope my idea meets with general approval :-) --82.45.15.186 (talk) 19:39, 31 January 2010 (UTC)

Maybe we could use the example of drawing a number between 2 and 12 out of an equiprobable hat, versus rolling dice. I think the focus would be on the probability of the number 7. Bridgetttttttebabblepoop 13:35, 5 October 2010 (UTC)
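
The hat-versus-dice comparison can be worked through with a short sketch. It assumes the hat holds the eleven numbers 2 to 12 with equal probability and that the dice are two fair six-sided dice whose faces are summed; the names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return sum(-float(p) * math.log2(float(p)) for p in probs if p > 0)

# Hat: the numbers 2..12, each equally likely.
hat = [Fraction(1, 11)] * 11

# Dice: sum of two fair six-sided dice.
sums = Counter(a + b for a in range(1, 7) for b in range(1, 7))
dice = [Fraction(count, 36) for count in sums.values()]

print("P(7) from hat :", Fraction(1, 11))
print("P(7) from dice:", Fraction(sums[7], 36))
print("entropy hat   :", entropy_bits(hat))
print("entropy dice  :", entropy_bits(dice))
```

The dice are more predictable (7 is the single most likely total), so their entropy comes out lower than that of the hat, even though both produce a number between 2 and 12.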

## Use of Shannon Information Content with DNA

I wanted to relate Shannon to DNA and cell biology by searching for answers to the following four questions:

→ Is inanimate matter and energy both the input and output of a living cell?

→ Is the Shannon information content of DNA sufficient to animate matter and energy into life?

→ Was the Shannon information content required to bootstrap life into existence lost after life began?

→ Hypothetically, was that (now lost) bootstrap information derived from NP-hard processes?

I searched the web, PubMed, xxx.lanl.gov ... and found no references. Anyone know a reference? Bridgetttttttebabblepoop 10:31, 5 October 2010 (UTC)

## Requested move

The following is a closed discussion of the proposal. Please do not modify it. Subsequent comments should be made in a new section on the talk page. No further edits should be made to this section.

The result of the proposal was Not done. No consensus for proposal. No prejudice regarding other editorial proposals and potential renames. DMacks (talk) 18:48, 18 December 2010 (UTC)

Entropy (information theory) → Shannon entropy. Relisted. Vegaswikian (talk) 02:52, 14 November 2010 (UTC) Per Entropy_(disambiguation)#Information_theory_and_mathematics, there are multiple notions of entropy in information theory, which makes the current title not unambiguous. Cybercobra (talk) 07:45, 7 November 2010 (UTC)

I think this article is about both the general concept of Entropy in information theory, and Shannon's entropy (which is by far the most common example, and hence the primary topic for "Entropy (information theory)"). Most of the other definitions seem to be generalisations of this concept to very specialised mathematical contexts. Would it be better to keep this page where it is, but to make the links to the other mathematical entropies more explicit (e.g. to have a section about mathematical generalisations of the idea)? Djr32 (talk) 11:29, 7 November 2010 (UTC)
Parenthesized names are artificial and don't have primary topics. "Entropy" can have a primary topic. "Entropy (information theory)" is a name purely created for disambiguation and therefore primary topics aren't applicable. Splitting the article into 2 separate ones about the general concept and Shannon entropy specifically is another option. --Cybercobra (talk) 04:16, 14 November 2010 (UTC)
My instinct is to leave put. In information theory, in a book like say Cover & Thomas, I think this is now more commonly just called "Entropy" rather than "Shannon Entropy"; and in many ways it is actually the more fundamental concept than Entropy in thermodynamics, which (at least in the view of some) can be best understood as a particular concrete application of the more general idea of entropy that arises in information theory. So I don't see any great value in a move; but I do agree with Djr32 that a section towards the end introducing mathematical generalisations of the idea could be a useful addition. Jheald (talk) 18:27, 17 November 2010 (UTC)
The above discussion is preserved as an archive of the proposal. Please do not modify it. Subsequent comments should be made in a new section on this talk page. No further edits should be made to this section.

## "Compression" needs to be defined

The article introduces the concept of "compression" without explaining what it is. Explanation needed. 74.96.8.53 (talk) 15:35, 27 November 2010 (UTC)

## Reverse meaning of "entropy"

I read this LiveScience article and thought the word "entropy" was used backwards. So I came to this WP article to read about Shannon entropy. I quickly realized one of the following had to be true:

1. I was misreading the article (and the LiveScience writer made the same mistake)
3. Claude Shannon used the word "entropy" backwards

Now that I have read this discussion page, it is clear to me that it is #3. The section Layman's terms begins "Entropy is a measure of disorder". This sentence is leading me down the primrose path into utter befuddlement. In thermodynamics, entropy is the tendency toward disorder, thus the words "measure of disorder" imply you are measuring the same thing meant by thermodynamic entropy. The ultimate thermodynamic entropy is the heat death of the universe. In such a state nothing differs from anything else, so there is no information. Yet Shannon calls information disorder, and therefore entropy is information. According to Shannon, the heat death of the universe is maximum information, which is a distinctly odd way of viewing it.

The article should be changed to acknowledge that this is the reverse of thermodynamic entropy.  Randall Bart   Talk  20:20, 2 December 2010 (UTC)

No. You misunderstand the notion of "heat death of the universe".
You say that "In such a state nothing differs from anything else, so there is no information." But this is only true at the macroscopic level. At the microscopic level things are different. There are still an enormous number of different possible microscopic states that all the electrons, all the atoms, all the photons, all the parts of the universe together could be in. So if you wanted a total description of the state of the universe, down at the most microscopic level, there would be an enormous amount of information to find. In fact, heat death is the macroscopic state that maximises the number of microscopic states compatible with that macroscopic state -- i.e. the state that requires the most further information to fully specify the microscopic state given the macroscopic state. That is what makes heat death the state of maximum entropy -- both maximum thermodynamic entropy, and maximum Shannon entropy. Jheald (talk) 23:56, 2 December 2010 (UTC)
Just adding that, if it's any consolation, you're certainly not the first person that's been tripped up by the use of the word "disorder" to describe entropy. For further discussion of what the term "disorder" means in thermodynamical discussions of entropy, see the article Entropy (order and disorder). For a discussion of difficulties that the word "disorder" can lead to, see eg material near the start of Entropy (energy dispersal), and references and links from that page.
(Wikipedia isn't perfect. Ideally, the Entropy (order and disorder) page would include some of the discussion as to why the "disorder" can lead to confusion; and the Entropy (energy dispersal) should discuss some of the problems with seeing entropy only (or even primarily/preferentially) as something related to energy. But between the two articles, I hope you may find something of use.) Jheald (talk) 10:00, 3 December 2010 (UTC)
Regarding the last point, I've added some more text in a specific new section (here) to the Entropy (order and disorder) page to at least start to bring up this issue; and added a tag to Entropy (energy dispersal) with an explanation on the talk page, to at least visibly flag up some of its issues. Jheald (talk) 21:35, 3 December 2010 (UTC)

## Sculpture

What does this sculpture have to do with entropy? While nice, I don't really see the connection, nor the relevance of this picture. --InverseHypercube (talk) 19:44, 14 February 2011 (UTC)

It appears to be a chaotic Jenga stack so I am guessing the connection is chaos. However, I agree that it is more suited to an "Entropy in popular culture" section or article. Problem is, we currently don't have a better image for the lede to replace it with - do you have a suggestion? SpinningSpark 22:51, 15 February 2011 (UTC)
I don't think it needs an image; not all articles do, and the image shouldn't be kept simply because there is no alternative. Anyone else think it should be removed? --InverseHypercube (talk) 03:52, 16 February 2011 (UTC)
I also think it should be removed from this article. It's also on the 'Entropy' article, where it may have at least a little relevance, but it's not appropriate in the information theory context. Qwfp (talk) 13:03, 16 February 2011 (UTC)
Done.--InverseHypercube (talk) 18:36, 16 February 2011 (UTC)

## Definition of uncertainty

Is the uncertainty really defined as $\displaystyle u = \log_b (n)$?

When describing the case of a set of $n$ possible outcomes (events) $\left\{ x_i : i = 1 , \ldots , n \right\}$, the article says that the probability mass function is given by $p(x_i) = 1 / n$ and then states that the uncertainty for such a set of $n$ outcomes is defined by $\displaystyle u = \log_b (n)$.

I believe that the uncertainty is not defined this way but is really defined in relation to the probability mass function where uncertainty is the integral of the probability mass function. While this was probably quite obvious to the writer, I'm not sure it would be to all readers. The way it's worded almost makes it sound like the log relationship is something that came out of thin air by some definition when it's really the result of the previous equation. I know math students probably should be able to figure this out by themselves, but I'm wondering if pointing this out would be better policy. At the very least, avoiding the misnomer of a "definition" would avoid some confusion. Dugthemathguy (talk) 03:28, 2 March 2011 (UTC)
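
For reference, the step in question can be written out explicitly. This assumes the article's general definition $H_b(X) = -\sum_{i=1}^{n} p(x_i) \log_b p(x_i)$ is taken as the starting point and that $p(x_i) = 1/n$ for every outcome:

$\displaystyle H_b(X) = -\sum_{i=1}^{n} \frac{1}{n} \log_b \frac{1}{n} = -\log_b \frac{1}{n} = \log_b (n)$

Read this way, $u = \log_b (n)$ is a consequence of the uniform probability mass function rather than a separate definition. Whether the article intends the general formula or the uniform case as the primary definition is a presentation choice this sketch does not settle.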

## Information Theory and Thermodynamics Entropy

Can these two be linked? According to Computer Scientist Rolf Landauer, no. "...there is no unavoidable minimal energy requirement per transmitted bit."

Reference: Rolf Landauer, "Minimal Energy Requirements in Communication" p 1914-1918 v 272 Science, 28 June 1996.

210.17.201.123 (talk) 06:39, 10 April 2011 (UTC)

## Untitled

In the introduction, the article states that 'th' is the most common character sequence in the English language. A quick test seems to contradict this:

desktops:root:ga(3)> grep -i th /usr/share/dict/words | wc -l
21205
desktops:root:ga(3)> grep -i no /usr/share/dict/words | wc -l
22801
desktops:root:ga(3)> grep -i na /usr/share/dict/words | wc -l
22103

Where /usr/share/dict/words is a mostly complete list of English words, these lines count the occurrences of 'th', 'no' and 'na' in that file. I'm sure that there are others that are more frequent still. —Preceding unsigned comment added by 72.165.89.132 (talk) 18:42, 11 April 2011 (UTC)

It's the most common character sequence in a corpus made of English sentences, not a list of unique words. Consider some of the most common words: the, then, this, that, with, etc. NeoAdamite (talk) 06:12, 24 October 2011 (UTC)
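
The difference between the two ways of counting can be shown with a small sketch. The sentence in `text` is a made-up toy example, not a real corpus, and `bigram_counts` is an illustrative helper. Note also that `grep ... | wc -l` counts matching lines (here, words containing the sequence), whereas this sketch counts every occurrence.

```python
from collections import Counter

def bigram_counts(words):
    """Count every two-letter sequence inside each word (case-insensitive)."""
    counts = Counter()
    for word in words:
        w = word.lower()
        counts.update(w[i:i + 2] for i in range(len(w) - 1))
    return counts

# Hypothetical toy sentence standing in for running English text.
text = "the cat sat with the dog and then the other cat thought that this was not the end"
tokens = text.split()

running = bigram_counts(tokens)       # every token counts, repeats included
unique = bigram_counts(set(tokens))   # dictionary-style: each word counted once

print("th in running text:", running["th"])
print("th in unique words:", unique["th"])
print("most common in running text:", running.most_common(3))
```

In running text, frequent function words such as "the" are counted each time they appear, which is what pushes 'th' ahead; a word list gives "the" the same weight as any rare word.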

## Shannon entropy and continuous random variables

The article states that "The Shannon entropy is restricted to random variables taking discrete values". Is this technically true? My understanding is that the Shannon entropy of a continuous random variable is defined, but infinite. The infinite Shannon entropy of a continuous r.v. is an important result, for example, combined with the source coding theorem, it predicts that an information channel must have infinite capacity to perfectly transmit continuously distributed random variables (which is also true, and also an important result in the field). --YearOfGlad (talk) 20:58, 21 January 2012 (UTC)
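
A sketch of why the discrete entropy diverges: assume a variable uniform on [0, 1) and quantise it into equal-width bins, then compute the Shannon entropy of the bin index. The function name and the choice of a uniform variable are assumptions made for illustration.

```python
import math

def quantized_uniform_entropy_bits(num_bins):
    """Entropy in bits of a uniform [0, 1) variable quantised into equal bins."""
    p = 1.0 / num_bins
    return -sum(p * math.log2(p) for _ in range(num_bins))

# Each halving of the bin width adds one bit; there is no finite limit.
for k in (1, 4, 8, 12):
    print(k, quantized_uniform_entropy_bits(2 ** k))
```

The entropy grows like the log of the number of bins, so it is unbounded as the resolution increases. The finite quantity usually quoted for continuous variables is the differential entropy, which differs from the quantised entropy by that diverging log term.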

## Citation Needed for Entropy of English

The paragraph that discusses the entropy of the English language, stating "English text has fairly low entropy. In other words, it is fairly predictable.", is a bold claim and appears to be original research. I would like to see citations to back up this discussion. hovden (talk) 26 May 2011 —Preceding undated comment added 19:26, 26 May 2011 (UTC).

Two cites for numerical estimates for the entropy of English are given in the lead. But they could happily be repeated lower down.
As for whether the entropy is "fairly low", surely what the article is doing is comparing the entropy of English with that of a random sequence of letters. English clearly does have systematic regularities compared to such a stream. But the article could make more explicit that this is the comparison it has in mind. Jheald (talk) 10:33, 28 May 2011 (UTC)

## Sentence confuses me

"A single toss of a fair coin has an entropy of one bit, but a particular result (e.g. "heads") has zero entropy, since it is entirely 'predictable'."

It calls the result of a random coin toss result entirely predictable. That doesn't make any sense. Can someone please clarify? — Preceding unsigned comment added by 67.1.51.94 (talk) 08:27, 28 May 2011 (UTC)

In this case the term 'predictable' is in reference to knowing what can happen as a result of the coin toss. It will either be heads or tails with no other options. People tend to get confused between the probability of one particular outcome vs. the predictability of it either being heads or tails. For it to be unpredictable you would have to have other unknown options. § Music Sorter § (talk) 23:08, 4 July 2011 (UTC)

Perhaps the confusion arises from a misunderstanding of the term "two-headed coin" in the sentence "A series of tosses of a two-headed coin will have zero entropy." This is distinct from the notion of a normal two-sided coin. A two-headed coin has two sides, both of which are heads and indistinguishable from one another. Therefore the entropy of a series of tosses will be 0, since the result of each and every toss will be indistinguishably heads. If, however, a normal, fair, two-sided (1 heads, 1 tails) coin is tossed multiple times, the entropy will be 1 bit per toss, since the results will be entirely unpredictable. 76.65.229.24 (talk) 17:08, 22 September 2011 (UTC) Joey Morin
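
The numbers in this thread can be reproduced with the binary entropy function. This is a minimal sketch; `binary_entropy_bits` is an illustrative name and the 0.9 bias is an arbitrary extra example.

```python
import math

def binary_entropy_bits(p_heads):
    """Entropy in bits of a single toss with the given probability of heads."""
    if p_heads in (0.0, 1.0):
        return 0.0  # outcome is certain, so nothing is learned from the toss
    q = 1.0 - p_heads
    return -(p_heads * math.log2(p_heads) + q * math.log2(q))

print(binary_entropy_bits(0.5))  # fair coin
print(binary_entropy_bits(1.0))  # two-headed coin
print(binary_entropy_bits(0.9))  # biased coin, somewhere in between
```

Entropy here is a property of the distribution over outcomes, not of any one outcome. One reading of the quoted sentence is that once a particular result such as "heads" is known, the distribution conditioned on it is the degenerate two-headed case, hence zero entropy.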

## Appropriateness of one link to basic entropy in article

The question is, is it appropriate to have one link to entropy in the article, or would the need to use two links to get to the basic article using the disambiguation link at the top of the page be irritating to some readers. Does anyone think that one, but only one, direct link would contribute to the article?

1. (cur | prev) 09:31, 2 October 2011 67.206.184.19 (talk) (42,078 bytes) (The word being defined in the introduction is entropy. The word being linked to is entropy. One more revert and I will attempt to transfer these headers to the discussion page.) (undo)
2. (cur | prev) 09:26, 2 October 2011 SudoGhost (talk | contribs) (42,074 bytes) (Undid revision by 67.206.184.19 (talk) The wikilink doesn't belong there. The word being defined is NOT the word you're linking to. Period. That's what the disambiguation is for) (undo)
3. (cur | prev) 09:24, 2 October 2011 67.206.184.19 (talk) (42,078 bytes) (Your view is interesting. The disambiguation page takes two links to get to the article. This could be irritating to many readers who would want to get to the basic term.) (undo)
4. (cur | prev) 09:21, 2 October 2011 SudoGhost (talk | contribs) (42,074 bytes) (Undid revision by 67.206.184.19 (talk) The use of the word defining the information theory is not an appropriate place to link Entropy. There is a disambiguation for it above) (undo)
5. (cur | prev) 09:18, 2 October 2011 67.206.184.19 (talk) (42,078 bytes) (Could not find a single link to basic entropy in article. A vast number is bad but one seems reasonable.

67.206.184.19 (talk) 09:41, 2 October 2011 (UTC)

Saying "Entropy is a measure of disorder, or more precisely unpredictability" is completely untrue. Entropy is a thermodynamic property that can be used to determine the energy available for useful work in a thermodynamic process. Placing the link there is confusing at best, because the article being linked to has nothing to do with the information being discussed. A wikilink is placed within an article to help give the reader a better understanding of the content of an article, and placing that wikilink there not only does not accomplish that task, it does the opposite. There's no reason to place that wikilink there, but plenty of reason not to. - SudoGhost 09:46, 2 October 2011 (UTC)
You placed two links to the article into the discussion. I think the article speaks for itself. Beyond that I didn't say anything. The basic question is, would it be helpful for readers to be able to easily link to the basic term somewhere in the article? This question is not asked to SudoGhost. It is asked to other viewers of the discussion page. 67.206.184.19 (talk) 09:52, 2 October 2011 (UTC)
The article speaks for itself? That requires clarification, because your meaning is unclear. What you linked is Entropy. Entropy is not a measure of disorder. Entropy (information theory) is. They are not the same. It's misleading and inaccurate. You cannot place that wikilink there for the same reason you cannot say that "Android is an operating system for mobile devices such as smartphones and tablet computers." Android and Android are not the same, and you can't place a link to Android (drug) in the Android (operating system) article at a random spot where the word "Android" is used just because it might save someone a single click. That's what disambiguations are for. The entropy article you linked has nothing to do with the entropy definition in the article, and as such it doesn't belong there. - SudoGhost 10:07, 2 October 2011 (UTC)
I would agree with that statement. Would you agree that the link would be appropriate for the end of the phrase 'The inspiration for adopting the word entropy' in the section 'Aspects - Relationship to thermodynamic entropy'? If you would think that it was a different entropy being referred to there, what entropy would it be referring to? 67.206.184.19 (talk) 10:18, 2 October 2011 (UTC)
Yes, that would be a completely appropriate place for it. I've edited the article so that "thermodynamic entropy" (which sent the reader to the disambiguation page) became "thermodynamic entropy" (sending the reader directly to the entropy article). - SudoGhost 10:23, 2 October 2011 (UTC)
I agree with this last modification. There is *information entropy* and *thermodynamic entropy*. The concept of information entropy can be applied to the special case of a statistical ensemble of physical particles. The thermodynamic entropy is then equal to the Boltzmann constant times this information entropy. Best of all worlds would be two articles: Thermodynamic entropy and Information entropy. PAR (talk) 19:03, 2 October 2011 (UTC)
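
Written out, the relation described above is the Gibbs form of the entropy. Here $p_i$ is assumed to be the probability of microstate $i$ in the ensemble, $k_B$ the Boltzmann constant, and $H_e$, $H_2$ the information entropy measured in nats and bits respectively:

$\displaystyle S = -k_B \sum_i p_i \ln p_i = k_B \, H_e = (k_B \ln 2) \, H_2$

The factor $\ln 2$ appears only because of the change of logarithm base from 2 to $e$.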

## Chain Rule?

I was thinking this article (or a subarticle) should discuss the entropy chain rule as discussed in Cover & Thomas (1991, pg 21) (see http://www.cse.msu.edu/~cse842/Papers/CoverThomas-Ch2.pdf).

$H(X_1, X_2, \dots X_n) = \sum_{i=1}^{n} H(X_i|X_{i-1}, X_{i-2}, \dots X_1)$ — Preceding unsigned comment added by 150.135.222.186 (talkcontribs) 21:39, 13 October 2011
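
For the two-variable case the chain rule reads $H(X, Y) = H(X) + H(Y|X)$, and it can be checked numerically. The joint distribution below is an arbitrary made-up example, and the helper `H` is illustrative only.

```python
import math

def H(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution p(x, y) over x in {a, b} and y in {0, 1}.
joint = {("a", 0): 0.5, ("a", 1): 0.25, ("b", 0): 0.125, ("b", 1): 0.125}

# Marginal p(x), obtained by summing the joint over y.
px = {}
for (x, _), p in joint.items():
    px[x] = px.get(x, 0.0) + p

h_joint = H(joint.values())
h_x = H(px.values())
# H(Y|X) = sum over x of p(x) * H(Y | X = x)
h_y_given_x = sum(
    px[x] * H([p / px[x] for (xx, _), p in joint.items() if xx == x])
    for x in px
)

print(h_joint, h_x + h_y_given_x)  # the two values agree up to rounding
```

The general n-variable formula follows by applying the same two-variable step repeatedly.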