Entropy in thermodynamics and information theory
There are close parallels between the mathematical expressions for the thermodynamic entropy, usually denoted by S, of a physical system in the statistical thermodynamics established by Ludwig Boltzmann and J. Willard Gibbs in the 1870s, and the information-theoretic entropy, usually expressed as H, developed by Claude Shannon and Ralph Hartley in the 1940s. Shannon, although not initially aware of this similarity, commented on it upon publicizing information theory in A Mathematical Theory of Communication.
This article explores what links there are between the two concepts, and how far they can be regarded as connected.
Equivalence of form of the defining expressions 
Discrete case 
The defining expression for entropy in the statistical thermodynamics of Boltzmann and Gibbs is

S = −k Σᵢ pᵢ ln pᵢ,

where pᵢ is the probability of the microstate i taken from an equilibrium ensemble.
The defining expression for entropy in Shannon's theory of information is

H = −Σᵢ pᵢ log₂ pᵢ,

where pᵢ is the probability of the message mᵢ taken from the message space M.
Mathematically, H may also be seen as an average information, taken over the message space: when a message occurs with probability pᵢ, the information −log(pᵢ) is obtained.
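For illustration, here is a minimal Python sketch of this average (the four-message distribution is invented); equiprobable messages recover the Hartley form discussed below:

```python
import math

def shannon_entropy(probs, base=2.0):
    """Average information -sum(p * log(p)), skipping zero-probability messages."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A hypothetical four-message source: the first message is far more likely.
p = [0.7, 0.1, 0.1, 0.1]
print(shannon_entropy(p))               # ~1.357 bits
print(shannon_entropy(p, base=math.e))  # same quantity in nats (~0.940)

# Equiprobable messages recover the Hartley entropy log2 |M|.
print(shannon_entropy([0.25] * 4))      # exactly 2.0 bits
```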
If all the microstates are equiprobable (a microcanonical ensemble), the statistical thermodynamic entropy reduces to the form on Boltzmann's tombstone,

S = k ln W,

where W is the number of microstates.
If all the messages are equiprobable, the information entropy reduces to the Hartley entropy

H = log₂|M|,

where |M| is the cardinality of the message space M.
The logarithm in the thermodynamic definition is the natural logarithm. It can be shown that the Gibbs entropy formula, with the natural logarithm, reproduces all of the properties of the macroscopic classical thermodynamics of Clausius. (See article: Entropy (statistical views)).
The logarithm can also be taken to the natural base in the case of information entropy. This is equivalent to choosing to measure information in nats instead of the usual bits. In practice, information entropy is almost always calculated using base 2 logarithms, but this distinction amounts to nothing other than a change in units. One nat is about 1.44 bits.
The presence of Boltzmann's constant k in the thermodynamic definitions is a historical accident, reflecting the conventional units of temperature. It is there to make sure that the statistical definition of thermodynamic entropy matches the classical entropy of Clausius, thermodynamically conjugate to temperature. For a simple compressible system that can only perform volume work, the first law of thermodynamics becomes

dU = T dS − p dV.
But one can equally well write this equation in terms of what physicists and chemists sometimes call the 'reduced' or dimensionless entropy, σ = S/k, so that

dU = kT dσ − p dV.
Just as S is conjugate to T, so σ is conjugate to kT (the energy that is characteristic of T on a molecular scale).
Continuous case 
The most obvious extension of the Shannon entropy is the differential entropy,

H[f] = −∫ f(x) log f(x) dx.
As long as f(x) is a probability density function (pdf), H represents the entropy (average information, disorder, diversity, and so on) of f(x). For any uniform pdf f(x), the exponential of H is the volume covered by f(x) (in analogy to the cardinality in the discrete case). The volume covered by an n-dimensional multivariate Gaussian distribution with moment matrix M is proportional to the volume of the ellipsoid of concentration and is equal to √((2πe)ⁿ det M). The volume is always positive.
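This closed form can be checked numerically. In the following sketch (the 2×2 moment matrix is arbitrary), a Monte Carlo estimate of H agrees with ½ ln((2πe)ⁿ det M), whose exponential is the volume quoted above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D moment (covariance) matrix M.
M = np.array([[2.0, 0.5],
              [0.5, 1.0]])
n = M.shape[0]

# Closed-form differential entropy of an n-D Gaussian, in nats.
H = 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(M))

# Monte Carlo estimate: H = -E[log f(X)] for X ~ N(0, M).
x = rng.multivariate_normal(np.zeros(n), M, size=200_000)
Minv = np.linalg.inv(M)
log_f = (-0.5 * np.einsum('ij,jk,ik->i', x, Minv, x)
         - 0.5 * np.log((2 * np.pi) ** n * np.linalg.det(M)))

print(H, -log_f.mean())   # the two estimates agree closely
print(np.exp(H))          # the "volume" sqrt((2*pi*e)^n det M)
```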
Entropy may be maximized using Gaussian adaptation, one of the evolutionary algorithms, keeping the mean fitness (i.e. the probability of becoming a parent to new individuals in the population) constant, and without the need for any knowledge about entropy as a criterion function. This is illustrated by the figure below, which shows Gaussian adaptation climbing a mountain crest in a phenotypic landscape. The lines in the figure are part of a contour line enclosing a region of acceptability in the landscape. At the start, the cluster of red points represents a very homogeneous population with small variances in the phenotypes. Evidently, even small environmental changes in the landscape may cause the process to become extinct.
After a sufficiently large number of generations, the increase in entropy may result in the green cluster. The mean fitness is actually the same for the red and green clusters (about 65%). The effect of this adaptation is not very salient in a 2-dimensional case, but in a high-dimensional case the efficiency of the search process may be increased by many orders of magnitude.
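The mechanism can be caricatured in a few lines of Python (a deliberately simplified sketch, not the full Gaussian adaptation algorithm; the band-shaped region of acceptability and the update rates are invented for illustration). Offspring are sampled from a Gaussian, only those inside the region are kept, and the mean and covariance are re-estimated from the survivors; the covariance, and hence the entropy, grows as large as the region allows while the acceptance rate (mean fitness) stabilises:

```python
import numpy as np

rng = np.random.default_rng(1)

def acceptable(x):
    # Hypothetical region of acceptability: a diagonal band (a "mountain crest").
    return np.abs(x[:, 0] - x[:, 1]) < 1.0

mean = np.zeros(2)
cov = 0.01 * np.eye(2)          # start: a very homogeneous population

for gen in range(200):
    pop = rng.multivariate_normal(mean, cov, size=500)
    ok = pop[acceptable(pop)]
    if len(ok) < 2:
        continue
    # Re-centre on the accepted individuals and slightly inflate the spread,
    # so the Gaussian expands until the acceptance rate pushes back.
    mean = ok.mean(axis=0)
    cov = 1.05 * np.cov(ok.T)

# Entropy of a 2-D Gaussian grows with log det(cov).
print(np.log(np.linalg.det(cov)), len(ok) / 500)  # spread up, hit rate ~constant
```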
Moreover, a Gaussian distribution has the highest entropy of all distributions having the same second-order moment matrix (Middleton 1960).
It turns out, however, that the differential entropy is not in general a good measure of uncertainty or information. For example, it can be negative, and it is not invariant under continuous coordinate transformations. Jaynes showed, in fact, that the expression above is not the correct limit of the expression for a finite set of probabilities.
The correct expression, appropriate for the continuous case, is the relative entropy of a distribution, defined as the Kullback-Leibler divergence from the distribution to a reference measure m(x),

D_KL(f ‖ m) = ∫ f(x) log [f(x)/m(x)] dx

(or sometimes the negative of this).
The relative entropy carries over directly from discrete to continuous distributions and is invariant under coordinate reparametrisations. Moreover, it is always non-negative, equaling zero exactly when f(x) = m(x).
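The invariance can be seen numerically. In this sketch (a truncated exponential density f and a uniform reference m on [0, 10], both invented for illustration), a simple rescaling y = 2x shifts the differential entropy by ln 2 but leaves the relative entropy unchanged:

```python
import numpy as np
from scipy.integrate import quad

Z = 1 - np.exp(-10.0)              # truncation normaliser
f = lambda x: np.exp(-x) / Z       # exponential density, truncated to [0, 10]
m = lambda x: 0.1                  # uniform reference density on [0, 10]

def diff_entropy(p, a, b):
    return quad(lambda x: -p(x) * np.log(p(x)), a, b)[0]

def rel_entropy(p, q, a, b):
    # Kullback-Leibler divergence D(p || q) on [a, b].
    return quad(lambda x: p(x) * np.log(p(x) / q(x)), a, b)[0]

# Reparametrise with y = 2x: both densities pick up a Jacobian factor 1/2.
f2 = lambda y: f(y / 2) / 2
m2 = lambda y: m(y / 2) / 2

print(diff_entropy(f, 0, 10), diff_entropy(f2, 0, 20))       # differ by ln 2
print(rel_entropy(f, m, 0, 10), rel_entropy(f2, m2, 0, 20))  # identical
```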
Theoretical relationship 
Despite all that, there is an important difference between the two quantities. The information entropy H can be calculated for any probability distribution (if the "message" is taken to be that the event i, which had probability pᵢ, occurred out of the space of possible events), while the thermodynamic entropy S refers specifically to thermodynamic probabilities pᵢ.
Furthermore, the thermodynamic entropy S is dominated by the different arrangements of the system, and in particular of its energy, that are possible on a molecular scale. In comparison, the information entropy of any macroscopic event is so small as to be completely irrelevant.
However, a connection can be made between the two. If the probabilities in question are the thermodynamic probabilities pᵢ, the (reduced) Gibbs entropy σ can then be seen as simply the amount of Shannon information needed to define the detailed microscopic state of the system, given its macroscopic description. Or, in the words of G. N. Lewis writing about chemical entropy in 1930, "Gain in entropy always means loss of information, and nothing more". To be more concrete, in the discrete case using base-two logarithms, the reduced Gibbs entropy is equal to the minimum number of yes/no questions that need to be answered in order to fully specify the microstate, given that we know the macrostate.
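As a concrete illustration of the yes/no-question reading (a toy example, with invented microstate probabilities chosen dyadic so that the match is exact), the base-2 entropy coincides with the average number of questions an optimal questioning strategy needs:

```python
import math

# Hypothetical microstate probabilities for a four-state system.
p = [0.5, 0.25, 0.125, 0.125]

H_bits = -sum(q * math.log2(q) for q in p)

# Optimal questioning: "is it state 0?", "is it state 1?", ...
# State i is pinned down after min(i + 1, len(p) - 1) questions.
avg_questions = sum(q * min(i + 1, len(p) - 1) for i, q in enumerate(p))

print(H_bits, avg_questions)   # both 1.75: entropy = mean yes/no questions
```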
Furthermore, the prescription to find the equilibrium distributions of statistical mechanics, such as the Boltzmann distribution, by maximising the Gibbs entropy subject to appropriate constraints (the Gibbs algorithm) can be seen as something not unique to thermodynamics, but as a principle of general relevance in all sorts of statistical inference, whenever a maximally uninformative probability distribution is sought, subject to certain constraints on the behaviour of its averages. (These perspectives are explored further in the article Maximum entropy thermodynamics.)
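The Gibbs algorithm can be sketched numerically. In this illustration (the three energy levels and the target mean energy are invented), the entropy-maximising distribution subject to a fixed mean energy takes the Boltzmann form pᵢ ∝ exp(−βEᵢ), with the Lagrange multiplier β tuned to satisfy the constraint:

```python
import numpy as np
from scipy.optimize import brentq

E = np.array([0.0, 1.0, 2.0])   # hypothetical energy levels
E_mean = 0.6                    # constraint on the average energy

def boltzmann(beta):
    w = np.exp(-beta * E)
    return w / w.sum()

# Choose beta so the Boltzmann distribution matches the mean-energy constraint.
beta = brentq(lambda b: boltzmann(b) @ E - E_mean, -50, 50)
p = boltzmann(beta)

print(beta, p, p @ E)   # p_i ~ exp(-beta*E_i), with <E> = 0.6 as required
```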
Information is physical 
Szilard's engine 
A physical thought experiment demonstrating how just the possession of information might in principle have thermodynamic consequences was established in 1929 by Leó Szilárd, in a refinement of the famous Maxwell's demon scenario.
Consider Maxwell's set-up, but with only a single gas particle in a box. If the supernatural demon knows which half of the box the particle is in (equivalent to a single bit of information), it can close a shutter between the two halves of the box, close a piston unopposed into the empty half of the box, and then extract kT ln 2 joules of useful work if the shutter is opened again. The particle can then be left to isothermally expand back to its original equilibrium occupied volume. In just the right circumstances, therefore, the possession of a single bit of Shannon information (a single bit of negentropy in Brillouin's term) really does correspond to a reduction in the entropy of the physical system. The global entropy is not decreased, but information-to-energy conversion is possible.
The principle has actually been demonstrated, using a phase-contrast microscope equipped with a high-speed camera connected to a computer acting as the demon. In this experiment, information-to-energy conversion is performed on a Brownian particle by means of feedback control; that is, by synchronizing the work given to the particle with the information obtained on its position. Computing energy balances for different feedback protocols has confirmed that the Jarzynski equality requires a generalization that accounts for the amount of information involved in the feedback.
Landauer's principle 
In fact one can generalise: any information that has a physical representation must somehow be embedded in the statistical mechanical degrees of freedom of a physical system.
Thus, Rolf Landauer argued in 1961, if one were to imagine starting with those degrees of freedom in a thermalised state, there would be a real reduction in thermodynamic entropy if they were then re-set to a known state. This can only be achieved under information-preserving, microscopically deterministic dynamics if the uncertainty is somehow dumped somewhere else, i.e. if the entropy of the environment (or the non-information-bearing degrees of freedom) is increased by at least an equivalent amount, as required by the Second Law, by gaining an appropriate quantity of heat: specifically kT ln 2 of heat for every bit of randomness erased.
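For scale, here is a minimal calculation of the Landauer limit kT ln 2, assuming room temperature (T = 300 K, chosen for illustration):

```python
import math

k_B = 1.380649e-23        # Boltzmann constant, J/K (exact, 2019 SI)
T = 300.0                 # assumed room temperature, K

landauer_limit = k_B * T * math.log(2)   # minimum heat per erased bit
print(landauer_limit)                    # ~2.87e-21 J per bit

# Erasing one gigabyte (8e9 bits) therefore costs at least:
print(landauer_limit * 8e9)              # ~2.3e-11 J, far below any real chip
```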
On the other hand, Landauer argued, there is no thermodynamic objection to a logically reversible operation potentially being achieved in a physically reversible way in the system. It is only logically irreversible operations, for example the erasing of a bit to a known state or the merging of two computation paths, which must be accompanied by a corresponding entropy increase. Because information is physical, all processing of its representations, i.e. generation, encoding, transmission, decoding and interpretation, consists of natural processes in which entropy increases through consumption of free energy.
Applied to the Maxwell's demon/Szilard engine scenario, this suggests that it might be possible to "read" the state of the particle into a computing apparatus with no entropy cost; but only if the apparatus has already been SET into a known state, rather than being in a thermalised state of uncertainty. To SET (or RESET) the apparatus into this state will cost all the entropy that can be saved by knowing the state of Szilard's particle.
Shannon entropy has been related by the physicist Léon Brillouin to a concept sometimes called negentropy. In his 1962 book Science and Information Theory, Brillouin described the Negentropy Principle of Information, or NPI, the gist of which is that acquiring information about a system's microstates is associated with a decrease in entropy (work is needed to extract information, and erasure leads to an increase in thermodynamic entropy). There is no violation of the second law of thermodynamics, according to Brillouin, since a reduction in any local system's thermodynamic entropy results in an increase in thermodynamic entropy elsewhere. Negentropy has been considered controversial because earlier formulations of it can yield a Carnot efficiency greater than one.
In 2009, Mahulikar & Herwig redefined thermodynamic negentropy as the specific entropy deficit of a dynamically ordered sub-system relative to its surroundings. This definition enabled the formulation of a negentropy principle, which is mathematically shown to follow from the second law of thermodynamics while order exists.
Black holes 
Stephen Hawking often speaks of the thermodynamic entropy of black holes in terms of their information content. Do black holes destroy information? It appears that there are deep relations between the entropy of a black hole and information loss. See Black hole thermodynamics and Black hole information paradox.
Quantum theory 
Hirschman showed (cf. Hirschman uncertainty) that Heisenberg's uncertainty principle can be expressed as a particular lower bound on the sum of the classical entropies of the probability distributions of a quantum mechanical state in coordinate space and in momentum space (the squared modulus of the wave-function in each representation), when expressed in Planck units. The resulting inequalities provide a tighter bound on the uncertainty relations of Heisenberg.
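The bound can be checked in the sharpened form later proved by Białynicki-Birula and Mycielski, Hₓ + Hₚ ≥ ln(eπħ) in nats. A Gaussian wave packet, whose position and momentum densities are Gaussians with σₓσₚ = ħ/2, saturates it; a minimal numerical check (the packet width is arbitrary, and units with ħ = 1 are assumed):

```python
import math

hbar = 1.0                      # work in units where hbar = 1

def gaussian_entropy(sigma):
    # Differential entropy of a 1-D Gaussian density, in nats.
    return 0.5 * math.log(2 * math.pi * math.e * sigma**2)

sigma_x = 0.7                         # any width; chosen arbitrarily
sigma_p = hbar / (2 * sigma_x)        # minimum-uncertainty (Gaussian) packet

lhs = gaussian_entropy(sigma_x) + gaussian_entropy(sigma_p)
rhs = math.log(math.e * math.pi * hbar)
print(lhs, rhs)   # equal (~2.1447): the Gaussian saturates the entropic bound
```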
One could speak of the "joint entropy" of the position and momentum distributions in this quantity by considering them independent, but since they are not jointly observable, they cannot be treated as a joint distribution. Note that this entropy is not the accepted entropy of a quantum system, the Von Neumann entropy, −Tr(ρ ln ρ) = −⟨ln ρ⟩. In phase space, the Von Neumann entropy can nevertheless be represented equivalently to its Hilbert-space form, even though positions and momenta are quantum conjugate variables; it thus leads to a properly bounded entropy distinctly different from (more detailed than) Hirschman's, one which accounts for the full information content of a mixture of quantum states.
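For comparison, the von Neumann entropy depends only on the eigenvalues of the density matrix. A minimal sketch for a single qubit (the partially mixed example is arbitrary); a pure state gives zero, a maximal mixture gives ln 2:

```python
import numpy as np

def von_neumann_entropy(rho):
    """-Tr(rho ln rho), computed from the eigenvalues of rho (in nats)."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # 0 * log 0 -> 0 by convention
    return float(-np.sum(evals * np.log(evals)))

pure = np.array([[1.0, 0.0], [0.0, 0.0]])            # |0><0|
mixed = np.array([[0.5, 0.0], [0.0, 0.5]])           # maximally mixed qubit
partial = np.array([[0.75, 0.25], [0.25, 0.25]])     # hypothetical mixture

print(von_neumann_entropy(pure))     # 0.0
print(von_neumann_entropy(mixed))    # ln 2 ~ 0.693
print(von_neumann_entropy(partial))  # ~0.416, in between
```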
(Dissatisfaction with the Von Neumann entropy from a quantum information point of view has been expressed by Stotland, Pomeransky, Bachmat and Cohen, who have introduced a yet different definition of entropy that reflects the inherent uncertainty of quantum mechanical states. This definition allows a distinction between the minimum uncertainty entropy of pure states and the excess statistical entropy of mixtures.)
The fluctuation theorem 
The fluctuation theorem provides a mathematical justification of the second law of thermodynamics under these principles, and precisely defines the limitations of the applicability of that law to the microscopic realm of individual particle movements.
Topics of recent research 
Is information quantized? 
In 1995, Tim Palmer pointed out two unwritten assumptions behind Shannon's definition of information that may make it inapplicable as such to quantum mechanics:
- The supposition that there is such a thing as an observable state (for instance the upper face of a die or a coin) before the observation begins
- The assumption that knowing this state does not depend on the order in which observations are made (commutativity)
The article Conceptual inadequacy of the Shannon information in quantum measurement, published in 2001 by Anton Zeilinger and Časlav Brukner, synthesized and developed these remarks. The so-called Zeilinger's principle suggests that the quantization observed in QM could be bound to information quantization: one cannot observe less than one bit, and what is not observed is by definition "random". Nevertheless, these claims remain quite controversial.
See also 
- Thermodynamic entropy
- Information entropy
- Statistical mechanics
- Information theory
- Physical information
- Quantum entanglement
- Quantum decoherence
- Fluctuation theorem
- Black hole entropy
- Black hole information paradox
- Entropy (information theory)
- Entropy (statistical thermodynamics)
- Entropy (order and disorder)
- Orders of magnitude (entropy)
References 
- Jaynes, E.T. (1963). "Information Theory and Statistical Mechanics" (PDF). Brandeis University Summer Institute Lectures in Theoretical Physics 3 (sect. 4b): 181–218.
- For an application of relative entropy in a quantum information theory setting, see e.g. Hong Qian (2001). "Relative Entropy: Free Energy Associated with Equilibrium Fluctuations and Nonequilibrium Deviations". Physical Review E 63 (4). arXiv:math-ph/0007010. Bibcode:2001PhRvE..63d2103Q. doi:10.1103/PhysRevE.63.042103.
- Shoichi Toyabe; Takahiro Sagawa; Masahito Ueda; Eiro Muneyuki; Masaki Sano (2010-09-29). "Information heat engine: converting information to energy by feedback control". Nature Physics 6 (12): 988–992. arXiv:1009.5287. Bibcode:2011NatPh...6..988T. doi:10.1038/nphys1821. "We demonstrated that free energy is obtained by a feedback control using the information about the system; information is converted to free energy, as the first realization of Szilard-type Maxwell’s demon."
- Weiss, V.; Weiss, H. (November 2003). "The golden mean as clock cycle of brain waves". Chaos, Solitons and Fractals 18 (4): 643–652. Bibcode:2003CSF....18..643W. doi:10.1016/S0960-0779(03)00026-2.
- Karnani, M.; Pääkkönen, K.; Annila, A. (2009). "The physical character of information". Proc. R. Soc. A 465 (2107): 2155–75. Bibcode:2009RSPSA.465.2155K. doi:10.1098/rspa.2009.0063.
- Classical Information Theory (Shannon) – Talk Origins Archive
- Mahulikar, S.P.; Herwig, H. (August 2009). "Exact thermodynamic principles for dynamic order existence and evolution in chaos". Chaos, Solitons & Fractals 41 (4): 1939–48. Bibcode:2009CSF....41.1939M. doi:10.1016/j.chaos.2008.07.051.
- Schiffer, M.; Bekenstein, J.D. (February 1989). "Proof of the quantum bound on specific entropy for free fields". Physical Review D 39 (4): 1109–15. Bibcode:1989PhRvD..39.1109S. doi:10.1103/PhysRevD.39.1109. PMID 9959747.
- Bekenstein, J.D. (1973). "Black Holes and Entropy". Physical Review D 7 (8): 2333. Bibcode:1973PhRvD...7.2333B. doi:10.1103/PhysRevD.7.2333.
- Ellis, George Francis Rayner; Hawking, S. W. (1973). The large scale structure of space-time. Cambridge, Eng: University Press. ISBN 0-521-09906-4.
- von Baeyer, Hans Christian (2003). Information — the New Language of Science. Harvard University Press. ISBN 0-674-01387-5.
- Callaway, D.J.E. (April 1996). "Surface tension, hydrophobicity, and black holes: The entropic connection". Physical Review E 53 (4): 3738–3744. arXiv:cond-mat/9601111. Bibcode:1996PhRvE..53.3738C. doi:10.1103/PhysRevE.53.3738. PMID 9964684.
- Srednicki, M. (August 1993). "Entropy and area". Physical Review Letters 71 (5): 666–669. arXiv:hep-th/9303048. Bibcode:1993PhRvL..71..666S. doi:10.1103/PhysRevLett.71.666. PMID 10055336.
- Hirschman, Jr., I.I. (January 1957). "A note on entropy". American Journal of Mathematics 79 (1): 152–6. JSTOR 2372390.
- Zachos, C. K. (2007). "A classical bound on quantum entropy". Journal of Physics A: Mathematical and Theoretical 40 (21): F407. arXiv:hep-th/0609148. Bibcode:2007JPhA...40..407Z. doi:10.1088/1751-8113/40/21/F02.
- Alexander Stotland; Pomeransky; Eitan Bachmat; Doron Cohen (2004). "The information entropy of quantum mechanical states". Europhysics Letters 67 (5): 700–6. arXiv:quant-ph/0401021. Bibcode:2004EL.....67..700S. doi:10.1209/epl/i2004-10110-1.
- Brukner, Č.; Zeilinger, A. (2001). "Conceptual inadequacy of the Shannon information in quantum measurements". Physical Review A 63 (2). doi:10.1103/PhysRevA.63.022113.
- For a detailed discussion of the applicability of the Shannon information in quantum mechanics and an argument that Zeilinger's principle cannot explain quantization, see Timpson, 2003 and Hall, 2000, Mana, 2004, who show that Brukner and Zeilinger change, in the middle of the calculation in their article, the numerical values of the probabilities needed to compute the Shannon entropy, so that the calculation makes little sense.
Additional references 
- Bennett, C.H. (1973). "Logical reversibility of computation". IBM J. Res. Develop. 17 (6): 525–532. doi:10.1147/rd.176.0525.
- Brillouin, Léon ([1956, 1962] 2004). Science And Information Theory. Dover. ISBN 978-0-486-43918-1.
- Frank, Michael P. (May/June 2002). "Physical Limits of Computing". Computing in Science and Engineering 4 (3): 16–25.
- Greven, Andreas; Keller, Gerhard; Warnecke, Gerald, eds. (2003). Entropy. Princeton University Press. ISBN 978-0-691-11338-8. (A highly technical collection of writings giving an overview of the concept of entropy as it appears in various disciplines.)
- Landauer, R. (1993). "Information is Physical". Proc. Workshop on Physics and Computation PhysComp'92. Los Alamitos: IEEE Comp. Sci.Press. pp. 1–4. doi:10.1109/PHYCMP.1992.615478.
- Landauer, R. (1961). "Irreversibility and Heat Generation in the Computing Process". IBM J. Res. Develop. 5 (3): 183–191. doi:10.1147/rd.53.0183.
- Leff, H.S.; Rex, A.F., ed. (1990). Maxwell's Demon: Entropy, Information, Computing. Princeton NJ: Princeton University Press. ISBN 0-691-08727-X.
- Middleton, D. (1960). An Introduction to Statistical Communication Theory. McGraw-Hill.
- Shannon, Claude E. (July/October 1948). "A Mathematical Theory of Communication". Bell System Technical Journal 27 (3): 379–423. (as PDF)
External links 
- Entropy is Simple...If You Avoid the Briar Patches. Dismissive of a direct link between information-theoretic and thermodynamic entropy.
- Information Processing and Thermodynamic Entropy. Stanford Encyclopedia of Philosophy.
- An Intuitive Guide to the Concept of Entropy Arising in Various Sectors of Science. A wikibook on the interpretation of the concept of entropy.