# Fano's inequality


In information theory, Fano's inequality (also known as the Fano converse and the Fano lemma) relates the average information lost in a noisy channel to the probability of a categorization error. It was derived by Robert Fano in the early 1950s while teaching a Ph.D. seminar in information theory at MIT, and later recorded in his 1961 textbook.

It is used to find a lower bound on the error probability of any decoder, as well as lower bounds on minimax risks in density estimation.

Let the random variables $X$ and $Y$ represent input and output messages with joint probability $P(x,y)$. Let $e$ represent an occurrence of error; i.e., the event that $X\neq {\tilde {X}}$, with ${\tilde {X}}=f(Y)$ being an approximate (noisy) version of $X$. Fano's inequality is

$H(X|Y)\leq H(e)+P(e)\log(|{\mathcal {X}}|-1),$ where ${\mathcal {X}}$ denotes the support of $X$,

$H\left(X|Y\right)=-\sum _{i,j}P(x_{i},y_{j})\log P\left(x_{i}|y_{j}\right)$ is the conditional entropy,

$P(e)=P(X\neq {\tilde {X}})$ is the probability of the communication error, and

$H(e)=-P(e)\log P(e)-(1-P(e))\log(1-P(e))$ is the corresponding binary entropy.
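As a numerical sketch (not part of the original statement; the channel, its parameters, and all variable names below are illustrative assumptions), the inequality can be checked for a small symmetric channel on a three-letter alphabet, where it in fact holds with equality:

```python
import math

def H(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical example: symmetric channel on the alphabet {0, 1, 2}.
# Input X is uniform; output Y equals X with probability 1 - p, and
# each of the two wrong symbols with probability p/2.
p = 0.2
alphabet = 3
P_xy = {(x, y): (1 / alphabet) * ((1 - p) if x == y else p / 2)
        for x in range(alphabet) for y in range(alphabet)}

# Decoder: x_tilde = f(y) = y (maximum likelihood for this channel).
P_e = sum(q for (x, y), q in P_xy.items() if x != y)

# Marginal P(y) and conditional entropy H(X|Y) = -sum P(x,y) log P(x|y).
P_y = {y: sum(q for (x2, y2), q in P_xy.items() if y2 == y)
       for y in range(alphabet)}
H_XgY = -sum(q * math.log(q / P_y[y]) for (x, y), q in P_xy.items() if q > 0)

# Right-hand side of Fano's inequality: H(e) + P(e) log(|X| - 1).
H_e = H([P_e, 1 - P_e])
rhs = H_e + P_e * math.log(alphabet - 1)
print(H_XgY, rhs, H_XgY <= rhs + 1e-12)
```

For this fully symmetric channel both sides evaluate to the same number, illustrating that Fano's inequality is tight when, given an error, the true input is uniform over the remaining symbols.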

## Alternative formulation

Let $X$ be a random variable with density equal to one of $r+1$ possible densities $f_{1},\ldots ,f_{r+1}$. Suppose furthermore that the Kullback–Leibler divergence between any pair of these densities is bounded,

$D_{KL}(f_{i}\|f_{j})\leq \beta$ for all $i\neq j.$

Let $\psi (X)\in \{1,\ldots ,r+1\}$ be an estimate of the index. Then

$\sup _{i}P_{i}(\psi (X)\neq i)\geq 1-{\frac {\beta +\log 2}{\log r}},$ where $P_{i}$ is the probability measure induced by $f_{i}$.

## Generalization

The following generalization is due to Ibragimov and Khasminskii (1979) and to Assouad and Birge (1983).

Let $\mathbf {F}$ be a class of densities containing a subclass of $r+1$ densities $f_{\theta }$ such that for any $\theta \neq \theta '$,

$\|f_{\theta }-f_{\theta '}\|_{L_{1}}\geq \alpha ,$

$D_{KL}(f_{\theta }\|f_{\theta '})\leq \beta .$

Then in the worst case the expected error of estimation is bounded from below,

$\sup _{f\in \mathbf {F} }E\|f_{n}-f\|_{L_{1}}\geq {\frac {\alpha }{2}}\left(1-{\frac {n\beta +\log 2}{\log r}}\right),$ where $f_{n}$ is any density estimator based on a sample of size $n$.
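As an illustrative sketch (the values of $\alpha$, $\beta$, and $r$ below are hypothetical, chosen only to show how the bound behaves, not taken from the original text), the right-hand side of the generalized bound can be evaluated for increasing sample sizes:

```python
import math

# Hypothetical numbers: a subclass of r + 1 densities that are
# alpha-separated in L1 distance and beta-close in KL divergence.
alpha, beta, r = 0.5, 0.01, 16

def minimax_lower_bound(n):
    """Evaluate (alpha/2) * (1 - (n*beta + log 2) / log r),
    the lower bound on sup_f E ||f_n - f||_L1 for samples of size n."""
    return (alpha / 2) * (1 - (n * beta + math.log(2)) / math.log(r))

for n in (10, 50, 100, 200):
    print(n, minimax_lower_bound(n))
```

Note that for fixed $\alpha$, $\beta$, $r$ the bound decreases in $n$ and eventually becomes negative, hence vacuous; in applications the subclass (and thus $r$ and $\beta$) is chosen as a function of $n$ to yield a nontrivial minimax rate.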