# Binary entropy function

In information theory, the binary entropy function, denoted ${\displaystyle \operatorname {H} (p)}$ or ${\displaystyle \operatorname {H} _{\text{b}}(p)}$, is defined as the entropy of a Bernoulli process with probability ${\displaystyle p}$ of one of two values. It is a special case of ${\displaystyle \mathrm {H} (X)}$, the entropy function. Mathematically, the Bernoulli trial is modelled as a random variable ${\displaystyle X}$ that can take on only two values: 0 and 1, which are mutually exclusive and exhaustive.

If ${\displaystyle \operatorname {Pr} (X=1)=p}$, then ${\displaystyle \operatorname {Pr} (X=0)=1-p}$ and the entropy of ${\displaystyle X}$ (in shannons) is given by

${\displaystyle \operatorname {H} (X)=\operatorname {H} _{\text{b}}(p)=-p\log _{2}p-(1-p)\log _{2}(1-p)}$,

where ${\displaystyle 0\log _{2}0}$ is taken to be 0. The logarithms in this formula are usually taken to base 2; see binary logarithm.
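The definition above translates directly into code. The following is a minimal Python sketch (the function name `binary_entropy` is ours, not standard notation), including the convention that ${\displaystyle 0\log _{2}0=0}$:

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy H_b(p) in shannons (bits), with 0*log2(0) taken as 0."""
    if p == 0 or p == 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))   # 1.0 -- a fair coin carries one full bit
print(binary_entropy(0.25))  # less than 1 bit: the outcome is partly predictable
```

The explicit check for `p == 0` and `p == 1` avoids evaluating `log2(0)`, which would raise a domain error rather than yield the intended limit of 0.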

When ${\displaystyle p={\tfrac {1}{2}}}$, the binary entropy function attains its maximum value, 1 shannon (1 bit). This is the case of an unbiased coin flip.

${\displaystyle \operatorname {H} (p)}$ is distinguished from the entropy function ${\displaystyle \mathrm {H} (X)}$ in that the former takes a single real number as a parameter whereas the latter takes a distribution or random variable as a parameter. Sometimes the binary entropy function is also written as ${\displaystyle \operatorname {H} _{2}(p)}$. However, it is different from and should not be confused with the Rényi entropy, which is denoted as ${\displaystyle \mathrm {H} _{2}(X)}$.

## Explanation

In terms of information theory, entropy is considered to be a measure of the uncertainty in a message. To put it intuitively, suppose ${\displaystyle p=0}$. At this probability, the event is certain never to occur, so there is no uncertainty at all, leading to an entropy of 0. If ${\displaystyle p=1}$, the result is again certain, so the entropy is 0 here as well. When ${\displaystyle p=1/2}$, the uncertainty is at a maximum; if one were to place a fair bet on the outcome in this case, there is no advantage to be gained from prior knowledge of the probabilities. In this case, the entropy attains its maximum value of 1 bit. Intermediate values fall between these cases; for instance, if ${\displaystyle p=1/4}$, there is still a measure of uncertainty about the outcome, but one can still predict the outcome correctly more often than not, so the uncertainty measure, or entropy, is less than 1 full bit.

## Derivative

The derivative of the binary entropy function may be expressed as the negative of the logit function:

${\displaystyle {d \over dp}\operatorname {H} _{\text{b}}(p)=-\operatorname {logit} _{2}(p)=-\log _{2}\left({\frac {p}{1-p}}\right)}$.
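This identity can be checked numerically by comparing a central finite difference of ${\displaystyle \operatorname {H} _{\text{b}}}$ with the closed form. A small Python sketch (helper names are ours):

```python
import math

def binary_entropy(p):
    """Binary entropy H_b(p) in bits, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def neg_logit2(p):
    """-logit_2(p) = -log2(p / (1 - p)), the claimed derivative of H_b."""
    return -math.log2(p / (1 - p))

# Central finite difference vs. the closed form at a few interior points
h = 1e-6
for p in (0.2, 0.5, 0.7):
    numeric = (binary_entropy(p + h) - binary_entropy(p - h)) / (2 * h)
    print(f"p={p}: numeric={numeric:.6f}, -logit2(p)={neg_logit2(p):.6f}")
```

Note that the derivative vanishes at ${\displaystyle p=1/2}$ (where the logit is 0), consistent with the maximum of the entropy there, and diverges to ${\displaystyle \pm \infty }$ as ${\displaystyle p}$ approaches 1 or 0.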

## Taylor series

The Taylor series of the binary entropy function in a neighborhood of 1/2 is

${\displaystyle \operatorname {H} _{\text{b}}(p)=1-{\frac {1}{2\ln 2}}\sum _{n=1}^{\infty }{\frac {(1-2p)^{2n}}{n(2n-1)}}}$

for ${\displaystyle 0\leq p\leq 1}$.
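Because ${\displaystyle |1-2p|\leq 1}$ on this interval, the partial sums converge quickly except near the endpoints. A short Python sketch comparing a truncated series against the exact value (function names are ours):

```python
import math

def binary_entropy(p):
    """Exact binary entropy H_b(p) in bits, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def taylor_entropy(p, terms=20):
    """Partial sum of the Taylor series of H_b about p = 1/2."""
    s = sum((1 - 2 * p) ** (2 * n) / (n * (2 * n - 1))
            for n in range(1, terms + 1))
    return 1 - s / (2 * math.log(2))

print(taylor_entropy(0.3), binary_entropy(0.3))  # agree to many decimal places
```

At ${\displaystyle p=1/2}$ every term vanishes and the partial sum is exactly 1, matching the maximum of the entropy.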

## Bounds

The following bounds hold for ${\displaystyle 0<p<1}$:[1]

${\displaystyle \ln(2)\cdot \log _{2}(p)\cdot \log _{2}(1-p)\leq H_{\text{b}}(p)\leq \log _{2}(p)\cdot \log _{2}(1-p)}$

and

${\displaystyle 4p(1-p)\leq H_{\text{b}}(p)\leq (4p(1-p))^{(1/\ln 4)}}$

where ${\displaystyle \ln }$ denotes the natural logarithm.
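Both pairs of bounds can be spot-checked numerically on the open interval. A Python sketch (the sample points are arbitrary interior values):

```python
import math

def binary_entropy(p):
    """Binary entropy H_b(p) in bits, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

ln2 = math.log(2)
for p in (0.01, 0.1, 0.25, 0.5, 0.9):
    h = binary_entropy(p)
    # First pair: ln(2) * log2(p) * log2(1-p) <= H_b(p) <= log2(p) * log2(1-p)
    prod = math.log2(p) * math.log2(1 - p)
    assert ln2 * prod <= h <= prod
    # Second pair: 4p(1-p) <= H_b(p) <= (4p(1-p))^(1/ln 4)
    quad = 4 * p * (1 - p)
    assert quad <= h <= quad ** (1 / math.log(4))
print("all bounds hold at the sampled points")
```

All four bounds are tight at ${\displaystyle p=1/2}$, where ${\displaystyle 4p(1-p)}$, ${\displaystyle \log _{2}(p)\cdot \log _{2}(1-p)}$, and ${\displaystyle H_{\text{b}}(p)}$ all equal 1.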