Data processing inequality

The data processing inequality is an information-theoretic concept stating that the information content of a signal cannot be increased by a local physical operation. This can be expressed concisely as 'post-processing cannot increase information'.[1]

Definition

Let three random variables form the Markov chain ${\displaystyle X\rightarrow Y\rightarrow Z}$, meaning that the conditional distribution of ${\displaystyle Z}$ depends only on ${\displaystyle Y}$, so that ${\displaystyle Z}$ is conditionally independent of ${\displaystyle X}$ given ${\displaystyle Y}$. Specifically, we have such a Markov chain if the joint probability mass function can be written as

${\displaystyle p(x,y,z)=p(x)p(y|x)p(z|y)}$

In this setting, no processing of ${\displaystyle Y}$, deterministic or random, can increase the information that ${\displaystyle Y}$ contains about ${\displaystyle X}$. In terms of mutual information, this can be written as:

${\displaystyle I(X;Y)\geqslant I(X;Z)}$

Equality ${\displaystyle I(X;Y)=I(X;Z)}$ holds if and only if ${\displaystyle I(X;Y\mid Z)=0}$, i.e. ${\displaystyle Z}$ and ${\displaystyle Y}$ contain the same information about ${\displaystyle X}$, in which case ${\displaystyle X\rightarrow Z\rightarrow Y}$ also forms a Markov chain.[2]
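
The inequality can be checked numerically on a small example. The sketch below is illustrative only: it assumes a fair bit ${\displaystyle X}$ passed through two binary symmetric channels with flip probabilities 0.1 and 0.2 (values chosen arbitrarily), and the helper names `joint_from_chain`, `marginal`, `mutual_information` and `bsc` are hypothetical, not taken from any library. It builds the joint distribution from the factorization ${\displaystyle p(x)p(y|x)p(z|y)}$ and compares the two mutual informations.

```python
from math import log2


def joint_from_chain(p_x, p_y_given_x, p_z_given_y):
    """Build p(x, y, z) = p(x) p(y|x) p(z|y) as a dict keyed by (x, y, z)."""
    joint = {}
    for x, px in p_x.items():
        for y, pyx in p_y_given_x[x].items():
            for z, pzy in p_z_given_y[y].items():
                joint[(x, y, z)] = px * pyx * pzy
    return joint


def marginal(joint, axes):
    """Sum the joint distribution down to the coordinates listed in axes."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in axes)
        out[key] = out.get(key, 0.0) + p
    return out


def mutual_information(joint, a, b):
    """I(A;B) in bits, where a and b are coordinate indices into the joint."""
    p_ab = marginal(joint, (a, b))
    p_a = marginal(joint, (a,))
    p_b = marginal(joint, (b,))
    return sum(
        p * log2(p / (p_a[(u,)] * p_b[(v,)]))
        for (u, v), p in p_ab.items()
        if p > 0  # zero-probability outcomes contribute nothing
    )


def bsc(flip):
    """Binary symmetric channel as a table of conditional distributions."""
    return {0: {0: 1 - flip, 1: flip}, 1: {0: flip, 1: 1 - flip}}


# Assumed example values: fair input bit, flip probabilities 0.1 then 0.2.
p_x = {0: 0.5, 1: 0.5}
joint = joint_from_chain(p_x, bsc(0.1), bsc(0.2))

i_xy = mutual_information(joint, 0, 1)  # I(X;Y)
i_xz = mutual_information(joint, 0, 2)  # I(X;Z)
print(f"I(X;Y) = {i_xy:.4f} bits, I(X;Z) = {i_xz:.4f} bits")
```

Under these assumptions the second channel only adds noise, so the printed value of ${\displaystyle I(X;Z)}$ should not exceed that of ${\displaystyle I(X;Y)}$. Setting the second flip probability to 0 makes ${\displaystyle Z}$ a copy of ${\displaystyle Y}$, which gives the equality case.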

Proof

One can apply the chain rule for mutual information to obtain two different decompositions of ${\displaystyle I(X;Y,Z)}$:

${\displaystyle I(X;Z)+I(X;Y\mid Z)=I(X;Y,Z)=I(X;Y)+I(X;Z\mid Y)}$

Because ${\displaystyle X\rightarrow Y\rightarrow Z}$ is a Markov chain, ${\displaystyle X}$ and ${\displaystyle Z}$ are conditionally independent given ${\displaystyle Y}$, so the conditional mutual information vanishes: ${\displaystyle I(X;Z\mid Y)=0}$. The data processing inequality then follows from the non-negativity of conditional mutual information, ${\displaystyle I(X;Y\mid Z)\geq 0}$.
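
Written out as a single line, substituting ${\displaystyle I(X;Z\mid Y)=0}$ into the two decompositions of ${\displaystyle I(X;Y,Z)}$ gives

${\displaystyle I(X;Y)=I(X;Z)+I(X;Y\mid Z)\geq I(X;Z)}$

The gap between the two sides is exactly ${\displaystyle I(X;Y\mid Z)}$, which is why equality holds precisely when this term is zero.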