# Fano's inequality

In information theory, Fano's inequality (also known as the Fano converse and the Fano lemma) relates the average information lost in a noisy channel to the probability of the categorization error. It was derived by Robert Fano in the early 1950s while teaching a Ph.D. seminar in information theory at MIT, and later recorded in his 1961 textbook.

It is used to find a lower bound on the error probability of any decoder as well as the lower bounds for minimax risks in density estimation.

Let the random variables X and Y represent input and output messages with a joint probability ${\displaystyle P(x,y)}$. Let e represent an occurrence of error; i.e., that ${\displaystyle X\neq {\tilde {X}}}$, with ${\displaystyle {\tilde {X}}=f(Y)}$ being an approximate version of ${\displaystyle X}$. Fano's inequality is

${\displaystyle H(X|Y)\leq H(e)+P(e)\log(|{\mathcal {X}}|-1),}$

where ${\displaystyle {\mathcal {X}}}$ denotes the support of X,

${\displaystyle H\left(X|Y\right)=-\sum _{i,j}P(x_{i},y_{j})\log P\left(x_{i}|y_{j}\right)}$

is the conditional entropy,

${\displaystyle P(e)=P(X\neq {\tilde {X}})}$

is the probability of the communication error, and

${\displaystyle H(e)=-P(e)\log P(e)-(1-P(e))\log(1-P(e))}$

is the corresponding binary entropy.

## Alternative formulation

Let X be a random variable with density equal to one of ${\displaystyle r+1}$ possible densities ${\displaystyle f_{1},\ldots ,f_{r+1}}$. Furthermore, the Kullback–Leibler divergence between any pair of densities cannot be too large,

${\displaystyle D_{KL}(f_{i}\|f_{j})\leq \beta }$ for all ${\displaystyle i\not =j.}$

Let ${\displaystyle \psi (X)\in \{1,\ldots ,r+1\}}$ be an estimate of the index. Then

${\displaystyle \sup _{i}P_{i}(\psi (X)\not =i)\geq 1-{\frac {\beta +\log 2}{\log r}}}$

where ${\displaystyle P_{i}}$ is the probability induced by ${\displaystyle f_{i}}$

## Generalization

The following generalization is due to Ibragimov and Khasminskii (1979), Assouad and Birge (1983).

Let F be a class of densities with a subclass of r + 1 densities ƒθ such that for any θ ≠ θ

${\displaystyle \|f_{\theta }-f_{\theta '}\|_{L_{1}}\geq \alpha ,}$
${\displaystyle D_{KL}(f_{\theta }\|f_{\theta '})\leq \beta .}$

Then in the worst case the expected value of error of estimation is bound from below,

${\displaystyle \sup _{f\in \mathbf {F} }E\|f_{n}-f\|_{L_{1}}\geq {\frac {\alpha }{2}}\left(1-{\frac {n\beta +\log 2}{\log r}}\right)}$

where ƒn is any density estimator based on a sample of size n.