# Conditional entropy

*Figure: individual entropies $H(X)$ and $H(Y)$, joint entropy $H(X,Y)$, and conditional entropies for a pair of correlated subsystems $X,Y$ with mutual information $I(X;Y)$.*

In information theory, the conditional entropy (or equivocation) quantifies the amount of information needed to describe the outcome of a random variable $Y$ given that the value of another random variable $X$ is known. Here, information is measured in bits, nats, or bans. The entropy of $Y$ conditioned on $X$ is written as $H(Y|X)$.

## Definition

If $H(Y|X=x)$ is the entropy of the variable $Y$ conditioned on the variable $X$ taking a certain value $x$, then $H(Y|X)$ is the result of averaging $H(Y|X=x)$ over all possible values $x$ that $X$ may take.

Given discrete random variables $X$ with support $\mathcal X$ and $Y$ with support $\mathcal Y$, the conditional entropy of $Y$ given $X$ is defined as:[1]

\begin{align}
H(Y|X) &\equiv \sum_{x\in\mathcal X} p(x)\, H(Y|X=x)\\
&= \sum_{x\in\mathcal X} p(x) \sum_{y\in\mathcal Y} p(y|x)\,\log \frac{1}{p(y|x)}\\
&= -\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y} p(x,y)\,\log p(y|x)\\
&= -\sum_{x\in\mathcal X,\, y\in\mathcal Y} p(x,y)\,\log p(y|x)\\
&= \sum_{x\in\mathcal X,\, y\in\mathcal Y} p(x,y)\,\log \frac{p(x)}{p(x,y)}.
\end{align}

Note: The supports of $X$ and $Y$ can be replaced by their domains if it is understood that $0 \log 0$ is treated as zero.

$H(Y|X)=0$ if and only if the value of $Y$ is completely determined by the value of $X$. At the other extreme, $H(Y|X) = H(Y)$ if and only if $Y$ and $X$ are independent random variables.
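As a concrete illustration, $H(Y|X)$ can be computed directly from a joint probability table. The following is a minimal Python sketch (function and variable names are illustrative, not from any particular library):

```python
import math

def conditional_entropy(joint):
    """H(Y|X) in bits, from a joint pmf given as {(x, y): p(x, y)}."""
    # Marginal p(x), obtained by summing the joint pmf over y.
    px = {}
    for (x, _), p in joint.items():
        px[x] = px.get(x, 0.0) + p
    # H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x), with 0 log 0 treated as 0.
    h = 0.0
    for (x, _), p in joint.items():
        if p > 0:
            h -= p * math.log2(p / px[x])
    return h

# Example: Y is a noisy copy of a fair bit X, flipped with probability 0.1.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
print(conditional_entropy(joint))  # ~0.469 bits (the binary entropy of 0.1)
```

If $Y$ were an exact copy of $X$, the result would be $0$; if $Y$ were independent of $X$, it would equal $H(Y)$, matching the two extremes noted above.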

## Chain rule

Assume that the combined system determined by two random variables $X$ and $Y$ has entropy $H(X,Y)$, that is, we need $H(X,Y)$ bits of information to describe its exact state. Now if we first learn the value of $X$, we have gained $H(X)$ bits of information. Once $X$ is known, we only need $H(X,Y)-H(X)$ bits to describe the state of the whole system. This quantity is exactly $H(Y|X)$, which gives the chain rule of conditional entropy:

$H(Y|X)\,=\,H(X,Y)-H(X) \, .$

Formally, the chain rule indeed follows from the above definition of conditional entropy:

\begin{align}
H(Y|X) &= \sum_{x\in\mathcal X,\, y\in\mathcal Y} p(x,y)\,\log \frac{p(x)}{p(x,y)}\\
&= -\sum_{x\in\mathcal X,\, y\in\mathcal Y} p(x,y)\,\log p(x,y) + \sum_{x\in\mathcal X,\, y\in\mathcal Y} p(x,y)\,\log p(x)\\
&= H(X,Y) + \sum_{x \in \mathcal X} p(x)\,\log p(x)\\
&= H(X,Y) - H(X).
\end{align}
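The chain rule is easy to check numerically. A short sketch, reusing the illustrative joint table from the definition section:

```python
import math

def H(probs):
    """Shannon entropy in bits of a pmf, with 0 log 0 treated as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

# Marginal p(x), obtained by summing the joint pmf over y.
px = {}
for (x, _), p in joint.items():
    px[x] = px.get(x, 0.0) + p

# Chain rule: H(Y|X) = H(X,Y) - H(X).
print(H(joint.values()) - H(px.values()))  # ~0.469, matching the direct computation
```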

## Generalization to quantum theory

In quantum information theory, the conditional entropy is generalized to the conditional quantum entropy. The latter can take negative values, unlike its classical counterpart.

## Other properties

For any $X$ and $Y$:

$H(X|Y) \le H(X) \,$

$H(X,Y) = H(X|Y) + H(Y|X) + I(X;Y) \,$

$I(X;Y) \le H(X), \,$

where $I(X;Y)$ is the mutual information between $X$ and $Y$.

For independent $X$ and $Y$:

$H(Y|X) = H(Y)\text{ and }H(X|Y) = H(X) \,$
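These properties can all be verified numerically for a concrete joint distribution; a brief sketch (helper names are illustrative):

```python
import math

def H(probs):
    """Shannon entropy in bits, with 0 log 0 treated as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginal(joint, axis):
    """Marginal pmf over one coordinate of a {(x, y): p} joint table."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
hx = H(marginal(joint, 0).values())
hy = H(marginal(joint, 1).values())
hxy = H(joint.values())
hx_given_y, hy_given_x = hxy - hy, hxy - hx  # via the chain rule
i_xy = hx + hy - hxy                         # mutual information

assert hx_given_y <= hx + 1e-12              # conditioning reduces entropy
assert abs(hxy - (hx_given_y + hy_given_x + i_xy)) < 1e-12
assert i_xy <= hx + 1e-12
```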

Although the specific conditional entropy, $H(X|Y=y)$, can be either less than or greater than $H(X|Y)$, $H(X|Y=y)$ can never exceed $H(X)$ when $X$ is uniformly distributed.
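A small worked example (with illustrative probabilities) makes this concrete: when $X$ is biased rather than uniform, a single outcome $Y=y$ can raise the specific conditional entropy above $H(X)$, even though the average $H(X|Y)$ cannot exceed it:

```python
import math

def H(probs):
    """Shannon entropy in bits, with 0 log 0 treated as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative joint pmf with a biased X: p(X=0) = 0.9.
joint = {(0, 0): 0.1, (1, 0): 0.1, (0, 1): 0.8, (1, 1): 0.0}
px = {0: 0.9, 1: 0.1}   # marginal of X
py = {0: 0.2, 1: 0.8}   # marginal of Y

# Specific conditional entropies H(X|Y=y) from p(x|y) = p(x,y) / p(y).
h_given = {y: H(joint[(x, y)] / py[y] for x in (0, 1)) for y in (0, 1)}

print(H(px.values()))  # H(X)      ~ 0.469 bits
print(h_given[0])      # H(X|Y=0)  = 1.0 bit, exceeding H(X)
print(h_given[1])      # H(X|Y=1)  = 0.0 bits
print(sum(py[y] * h_given[y] for y in (0, 1)))  # H(X|Y) = 0.2 <= H(X)
```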

## References

1. ^ Cover, Thomas M.; Thomas, Joy A. (1991). *Elements of Information Theory*. New York: Wiley. ISBN 0-471-06259-6.