# Reproducing kernel Hilbert space

*Figure: related but varying approaches to viewing an RKHS.*

In functional analysis (a branch of mathematics), a reproducing kernel Hilbert space (RKHS) is a Hilbert space of functions in which pointwise evaluation is a continuous linear functional. Equivalently, they are spaces that can be defined by reproducing kernels. The subject was originally and simultaneously developed by Nachman Aronszajn (1907–1980) and Stefan Bergman (1895–1977) in 1950.

In this article we assume that Hilbert spaces are complex. The main reason is that many examples of reproducing kernel Hilbert spaces are spaces of analytic functions, although some real Hilbert spaces also have reproducing kernels. A key motivation for reproducing kernel Hilbert spaces in machine learning is the representer theorem, which says that any function in an RKHS that classifies a set of sample points can be written as a linear combination of the canonical feature maps of those points.

An important subset of the reproducing kernel Hilbert spaces are the reproducing kernel Hilbert spaces associated to a continuous kernel. These spaces have wide applications, including complex analysis, harmonic analysis, quantum mechanics, statistics and machine learning.

## Definition

Let X be an arbitrary set and H a Hilbert space of complex-valued functions on X. We say that H is a reproducing kernel Hilbert space if the linear map

$L_{x} : f \mapsto f(x)$

from H to the complex numbers is continuous for any x in X; equivalently, if $L_x$ is a bounded operator for any x in X. By the Riesz representation theorem, this implies that for every x in X there exists a unique element Kx of H with the property that:

$f(x) = L_{x}(f) = \langle f,\ K_x \rangle \quad \forall f \in H \quad (*).$

The function Kx is called the point-evaluation function at the point x.

Since H is a space of functions, the element Kx is itself a function with domain X, and may be written as Kx(y). The family of functions Kx(y) may be encoded into a single function K : X × X → C by defining

$K(x,y) \ \stackrel{\mathrm{def}}{=}\ \overline{K_x(y)}.$

This function is called the reproducing kernel for the Hilbert space H and it is determined entirely by H because the Riesz representation theorem guarantees, for every x in X, that the element Kx satisfying (*) is unique.
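The reproducing property (*) can be checked concretely in finite dimensions. Below is a minimal sketch in Python, assuming a Gaussian kernel and a four-point set X (both chosen purely for illustration): for f in the span of the Kx, evaluation at a sample point agrees with the inner product against Kx.

```python
import numpy as np

# Finite sanity check of the reproducing property (*).
# Gaussian kernel and sample points are illustrative assumptions.
X = np.array([0.0, 0.5, 1.3, 2.0])
K = lambda s, t: np.exp(-(s - t) ** 2 / 2.0)

G = K(X[:, None], X[None, :])          # Gram matrix G[i, j] = K(x_i, x_j)

# f = sum_i a_i K_{x_i} has coefficient vector a; on coefficient vectors the
# induced (real) inner product is <a, b>_H = a^T G b.
a = np.array([1.0, -2.0, 0.5, 3.0])

# Direct evaluation: f(x_k) = sum_i a_i K(x_i, x_k) = (G a)_k
f_vals = G @ a

for k in range(len(X)):
    e_k = np.eye(len(X))[k]            # coefficients of K_{x_k} in the span
    assert np.isclose(a @ G @ e_k, f_vals[k])   # <f, K_{x_k}>_H = f(x_k)
print("reproducing property verified on the sample points")
```

The same identity holds symbolically: the Gram matrix plays the role of the inner product on the span of the point-evaluation functions.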

## Moore-Aronszajn Theorem

We have seen how a reproducing kernel Hilbert space defines a reproducing kernel function. It follows from the definition of an inner product that the kernel defined is (conjugate) symmetric and positive definite. The Moore-Aronszajn theorem goes in the other direction; it says that every symmetric, positive definite kernel defines a unique reproducing kernel Hilbert space. The theorem first appeared in Aronszajn's Theory of Reproducing Kernels, although he attributes it to E. H. Moore.

Theorem. Suppose K is a symmetric, positive definite kernel on a set E. Then there is a unique Hilbert space of functions on E for which K is a reproducing kernel.

Proof. For all x in E, define Kx = K(x, ⋅ ). Let H0 be the linear span of {Kx : x ∈ E}. Define an inner product on H0 by

$\left\langle \sum_{j=1}^n b_j K_{y_j}, \sum_{i=1}^m a_i K_{x_i} \right \rangle = \sum_{i=1}^m \sum_{j=1}^n \overline{a_i} b_j K(y_j, x_i).$

The symmetry of this inner product follows from the symmetry of K and the non-degeneracy follows from the fact that K is positive definite.

Let H be the completion of H0 with respect to this inner product. Then H consists of functions of the form

$f(x) = \sum_{i=1}^\infty a_i K_{x_i} (x)$

where $\sum_{i=1}^\infty |a_i|^2 K (x_i, x_i) < \infty$. The fact that the above sum converges for every x follows from the Cauchy-Schwarz inequality.
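The role of the Cauchy-Schwarz inequality here is that pointwise evaluation is controlled by the RKHS norm: $|f(x)| = |\langle f, K_x\rangle| \leq \|f\|_H \sqrt{K(x,x)}$. A numerical sketch of this bound, assuming a Gaussian kernel and randomly chosen centers and coefficients (all illustrative):

```python
import numpy as np

# Illustrate |f(x)| <= ||f||_H * sqrt(K(x, x)) for f in the span H0 of the K_{x_i}.
rng = np.random.default_rng(0)
pts = rng.uniform(-2, 2, size=8)               # centers x_i (arbitrary)
K = lambda s, t: np.exp(-(s - t) ** 2)         # Gaussian kernel (assumption)

G = K(pts[:, None], pts[None, :])              # Gram matrix of the centers
a = rng.normal(size=8)                         # coefficients of f
norm_f = np.sqrt(a @ G @ a)                    # ||f||_H^2 = a^T G a

for x in np.linspace(-3, 3, 25):               # evaluation points
    f_x = a @ K(pts, x)                        # f(x) = sum_i a_i K(x_i, x)
    assert abs(f_x) <= norm_f * np.sqrt(K(x, x)) + 1e-12
print("pointwise evaluation bounded by the RKHS norm")
```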

Now we can check the RKHS property, (*):

$\langle f, K_x \rangle = \left \langle \sum_{i=1}^\infty a_i K_{x_i}, K_x \right \rangle= \sum_{i=1}^\infty a_i K_{x_i} (x) = f(x).$

To prove uniqueness, let G be another Hilbert space of functions for which K is a reproducing kernel. For any x and y in E, (*) implies that

$\langle K_x, K_y \rangle_H = K(x, y) = \langle K_x, K_y \rangle_G. \,$

By linearity, $\langle \cdot, \cdot \rangle_H = \langle \cdot, \cdot \rangle_G$ on the span of {Kx : x ∈ E}. Then G = H by the uniqueness of the completion.

## Integral Operators and Mercer's Theorem

We may characterize a symmetric positive semi-definite kernel K via its associated integral operator, using Mercer's theorem.

Suppose K is a continuous positive semi-definite kernel on a compact set $X\subset\mathbb{R}^n$ and $\mu$ a Borel measure on $X$, and define the integral operator $T_K: L_2(X) \rightarrow L_2(X)$ by $[T_K f](\cdot) =\int_X K(\cdot,t) f(t)\, d\mu(t)$; then $T_K$ is positive semi-definite.

By Mercer's theorem, there is an orthonormal basis $\{\phi_i\}_i$ of $L_2(X)$ consisting of eigenfunctions of $T_K$ such that the corresponding sequence of eigenvalues $\{\sigma_i\}_i$ is nonnegative. The eigenfunctions corresponding to non-zero eigenvalues are continuous on X and K has the representation

$K(s,t) = \sum_{j=1}^\infty \sigma_j \, \phi_j(s) \, \phi_j(t)$

where the convergence is absolute and uniform; that is, $\lim_{n \to \infty}\sup_{u,v}\left|K(u,v) - \sum_{j=1}^n \sigma_j \, \phi_j(u) \, \phi_j(v)\right| = 0.$

If X is a compact set, then $T_K$ is a compact, positive, self-adjoint operator, which yields a decreasing sequence $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$ with $\lim_{i \to \infty}\sigma_i = 0$ and $T_K\phi_i = \sigma_i\phi_i$, where the $\{\phi_i\}$ form an orthonormal basis of $L_2(X)$.

This means that every $f \in L_2(X)$ expands as $f = \sum_{i=1}^\infty \left\langle f,\phi_i \right\rangle \phi_i$, so that the action of $T_K$ can be written as $T_K f = \sum_{i=1}^\infty \sigma_i \left\langle f,\phi_i\right\rangle \phi_i$.

Furthermore, a Reproducing Kernel Hilbert Space $H$ can now be constructed, with $K$ as the reproducing kernel.

It can be shown to be $H = \left\{f {\in}L_2(X)\mathrel{\Bigg|} \sum_{i=1}^\infty\frac{\left|\left\langle f,\phi_i\right\rangle\right|^2}{\sigma_i} < \infty\right\}$,

with the inner product of $H$ given by $\left\langle f,g \right\rangle_H = \sum_{i=1}^\infty \frac{\left\langle f,\phi_i \right\rangle_{L_2}\left\langle g,\phi_i \right\rangle_{L_2}}{\sigma_i}$
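The spectral picture above can be sketched numerically by discretizing the integral operator. Assuming a Gaussian kernel on [0, 1] with uniform quadrature weights (all choices illustrative), the eigendecomposition of the scaled Gram matrix approximates the Mercer expansion, and truncating the series gives a uniformly small error:

```python
import numpy as np

# Discretize T_K on X = [0, 1] with uniform weights 1/n; the integral operator
# becomes (1/n) * Gram matrix, whose eigenpairs approximate (sigma_i, phi_i).
n = 200
t = np.linspace(0, 1, n)
G = np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * 0.3 ** 2))  # Gaussian kernel

T = G / n                                     # quadrature approximation of T_K
sigma, U = np.linalg.eigh(T)                  # symmetric eigendecomposition
sigma, U = sigma[::-1], U[:, ::-1]            # decreasing eigenvalue order

# Mercer series: K(s, t) ~ sum_j sigma_j phi_j(s) phi_j(t); with this discrete
# normalization phi_j ~ sqrt(n) * U[:, j], so K ~ U diag(n * sigma) U^T.
for m in (5, 20):
    K_m = (U[:, :m] * (n * sigma[:m])) @ U[:, :m].T
    print(m, np.abs(G - K_m).max())           # truncation error shrinks with m
```

The fast decay of the eigenvalues of a smooth kernel is what makes low-rank truncations of the Mercer series accurate.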

## Feature Map

Main article: Kernel trick

A feature map is a map $\varphi: X \rightarrow F$, where $F$ is a Hilbert space called the Feature Space.

Every feature map defines a kernel via $K(x,s) = \langle \varphi(x), \varphi(s)\rangle_F$.

For example, for the reproducing kernel in the previous section, we can define $\varphi(x) = \left(\sqrt{\sigma_i}\,\phi_i(x)\right)_i$. We can always associate a feature map with every kernel; in fact, there exist infinitely many feature maps associated with $K$.
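As a finite-dimensional illustration of a feature map (using the degree-2 polynomial kernel with α = 1 as an example, rather than the Mercer map), the identity $K(x,s) = \langle \varphi(x), \varphi(s)\rangle_F$ can be checked directly against an explicit six-dimensional map:

```python
import numpy as np

# Explicit feature map for the degree-2 polynomial kernel on R^2 with alpha = 1:
# K(x, y) = (<x, y> + 1)^2 = <phi(x), phi(y)> for the 6-dimensional map below.
def phi(x):
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     1.0])

def K(x, y):
    return (np.dot(x, y) + 1) ** 2

x, y = np.array([1.0, 2.0]), np.array([-0.5, 3.0])
assert np.isclose(K(x, y), phi(x) @ phi(y))
print(K(x, y))   # -> 42.25, the same number either way
```

The expansion of $(\langle x,y\rangle + 1)^2$ into monomials is exactly what the coordinates of $\varphi$ encode, with the $\sqrt{2}$ factors absorbing the cross terms.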

With feature maps, we can now view functions as hyperplanes in the Feature Space: each $w {\in} F$ induces the function $f(x) = \langle w, \varphi(x)\rangle_F$ on $X$, so the graph of $f$ corresponds to a hyperplane in $F$.

In machine learning, kernel-based procedures allow us to map the data from the input space X into the Feature Space where linear methods may then be used.

## Examples

• Linear Kernel:
$K(x,y) = \langle x,y\rangle$
• Polynomial Kernel:
$K(x,y) = (\alpha\langle x,y \rangle + 1)^d, \alpha {\in} \mathbb{R}, d {\in} \mathbb{N}$

The next two are radial basis function kernels, which satisfy $K(x,y) = K(\|x - y\|)$, i.e. they depend only on the distance between their arguments:

• Gaussian Kernel:
Sometimes referred to as the Radial basis function kernel, or squared exponential kernel
$K(x,y) = e^{-\frac{\|x - y\|^2}{2\sigma^2}}, \sigma > 0$
• Laplacian Kernel:
$K(x,y) = e^{-\frac{\|x - y\|}{\sigma}}, \sigma > 0$
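The kernels listed above can be implemented directly. A quick numerical sanity check, with illustrative parameter choices, that each yields a symmetric positive semi-definite Gram matrix on random sample points:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))                          # 10 sample points in R^3
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances

kernels = {
    "linear":    X @ X.T,
    "poly_d3":   (0.5 * (X @ X.T) + 1) ** 3,     # alpha = 0.5, d = 3 (illustrative)
    "gaussian":  np.exp(-sq / (2 * 1.0 ** 2)),   # sigma = 1
    "laplacian": np.exp(-np.sqrt(sq) / 1.0),     # sigma = 1
}

for name, G in kernels.items():
    assert np.allclose(G, G.T)                   # symmetry
    eigmin = np.linalg.eigvalsh(G).min()
    assert eigmin > -1e-9                        # positive semi-definite (up to rounding)
    print(name, "min eigenvalue:", eigmin)
```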

In another example, when X is finite and H consists of all complex-valued functions on X, an element of H can be represented as an array of complex numbers. If the usual inner product is used, then Kx is the function whose value is 1 at x and 0 everywhere else, and K(x,y) can be thought of as an identity matrix, since K(x,y) = 1 when x = y and K(x,y) = 0 otherwise. In this case, H is isomorphic to $\mathbb{C}^n$, where n is the number of points in X.
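This finite case is easy to verify directly; a small sketch with n = 5 points (an arbitrary choice):

```python
import numpy as np

# X finite with |X| = n; functions on X are vectors in C^n with the usual
# inner product <f, g> = sum_y f(y) * conj(g(y)).  Then K_x = e_x, the
# standard basis vector, and <f, K_x> recovers f(x).
n = 5
f = np.array([1 + 2j, -1j, 0.5, 3, 2 - 1j])   # an arbitrary function on X
for x in range(n):
    K_x = np.eye(n)[x]                         # value 1 at x, 0 elsewhere
    assert np.vdot(K_x, f) == f[x]             # vdot conjugates K_x: <f, K_x> = f(x)
print("identity kernel reproduces every function on a finite set")
```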

The case of X = D, the unit disc in the complex plane, is more sophisticated; here the Bergman space H2(D) is the space of square-integrable holomorphic functions on D. It can be shown that the reproducing kernel for H2(D) is

$K(x,y)=\frac{1}{\pi}\frac{1}{(1-x\overline{y})^2}.$

This kernel is an example of a Bergman kernel, named for Stefan Bergman.
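The formula for this kernel can be recovered from an orthonormal basis of monomials; a sketch of the computation:

```latex
% The monomials satisfy \int_D |z|^{2n}\,dA = \pi/(n+1), so the functions
% e_n(z) = \sqrt{(n+1)/\pi}\, z^n form an orthonormal basis of H^2(D).
\begin{align*}
K(x,y) &= \sum_{n=0}^\infty e_n(x)\,\overline{e_n(y)}
        = \frac{1}{\pi}\sum_{n=0}^\infty (n+1)\,(x\overline{y})^n \\
       &= \frac{1}{\pi}\,\frac{1}{(1-x\overline{y})^2},
\qquad\text{using } \sum_{n\ge 0}(n+1)w^n = \frac{1}{(1-w)^2} \text{ for } |w| < 1.
\end{align*}
```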