# Leave-one-out error

• Leave-one-out cross-validation (CVloo) Stability An algorithm f has CVloo stability β with respect to the loss function V if the following holds:

${\displaystyle \forall i\in \{1,...,m\},\mathbb {P} _{S}\{\sup _{z\in Z}|V(f_{S},z_{i})-V(f_{S^{|i}},z_{i})|\leq \beta _{CV}\}\geq 1-\delta _{CV}}$

• Expected-to-leave-one-out error (${\displaystyle Eloo_{err}}$) Stability An algorithm f has ${\displaystyle Eloo_{err}}$ stability if for each n there exists a${\displaystyle \beta _{EL}^{m}}$ and a ${\displaystyle \delta _{EL}^{m}}$ such that:

${\displaystyle \forall i\in \{1,...,m\},\mathbb {P} _{S}\{|I[f_{S}]-{\frac {1}{m}}\sum _{i=1}^{m}V(f_{S^{|i}},z_{i})|\leq \beta _{EL}^{m}\}\geq 1-\delta _{EL}^{m}}$, with ${\displaystyle \beta _{EL}^{m}}$and ${\displaystyle \delta _{EL}^{m}}$ going to zero for ${\displaystyle n\rightarrow \inf }$

## Preliminary notations

X and Y ⊂ R being respectively an input and an output space, we consider a training set

${\displaystyle S=\{z_{1}=(x_{1},\ y_{1})\ ,..,\ z_{m}=(x_{m},\ y_{m})\}}$ of size m in ${\displaystyle Z=X\times Y}$ drawn i.i.d. from an unknown distribution D. A learning algorithm is a function ${\displaystyle f}$ from ${\displaystyle Z_{m}}$ into ${\displaystyle F\subset YX}$which maps a learning set S onto a function ${\displaystyle f_{S}}$ from X to Y. To avoid complex notation, we consider only deterministic algorithms. It is also assumed that the algorithm ${\displaystyle f}$ is symmetric with respect to S, i.e. it does not depend on the order of the elements in the training set. Furthermore, we assume that all functions are measurable and all sets are countable which does not limit the interest of the results presented here.

The loss of an hypothesis f with respect to an example ${\displaystyle z=(x,y)}$ is then defined as ${\displaystyle V(f,z)=V(f(x),y)}$. The empirical error of f is ${\displaystyle I_{S}[f]={\frac {1}{n}}\sum V(f,z_{i})}$.

The true error of f is ${\displaystyle I[f]=\mathbb {E} _{z}V(f,z)}$

Given a training set S of size m, we will build, for all i = 1....,m, modified training sets as follows:

• By removing the i-th element

${\displaystyle S^{|i}=\{z_{1},...,\ z_{i-1},\ z_{i+1},...,\ z_{m}\}}$

• By replacing the i-th element

${\displaystyle S^{i}=\{z_{1},...,\ z_{i-1},\ z_{i}',\ z_{i+1},...,\ z_{m}\}}$

## References

• S. Mukherjee, P. Niyogi, T. Poggio, and R. M. Rifkin. Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Adv. Comput. Math., 25(1-3):161–193, 2006