= Mercer's theorem =

In mathematics, specifically functional analysis, Mercer's theorem is a representation of a symmetric positive-definite function on a square as a sum of a convergent sequence of product functions. This theorem, presented in , is one of the most notable results of the work of James Mercer (1883–1932). It is an important theoretical tool in the theory of integral equations; it is used in the Hilbert space theory of stochastic processes, for example the Karhunen–Loève theorem; and it is also used in the reproducing kernel Hilbert space theory where it characterizes a symmetric positive-definite kernel as a reproducing kernel.

== Introduction ==
To explain Mercer's theorem, we first consider an important special case; see below for a more general formulation.
A kernel, in this context, is a symmetric continuous function

$K: [a,b] \times [a,b] \rightarrow \mathbb{R}$

where $K(x,y) = K(y,x)$ for all $x,y \in [a,b]$.

K is said to be a positive-definite kernel if and only if

$\sum_{i=1}^n\sum_{j=1}^n K(x_i, x_j) c_i c_j \geq 0$

for all finite sequences of points x_{1}, ..., x_{n} in [a, b] and all choices of real numbers c_{1}, ..., c_{n}. Note that the term "positive-definite" is well established in the literature despite the weak inequality in the definition.
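The definition can be probed numerically. The following is a minimal sketch, assuming the illustrative kernel K(x, y) = min(x, y) on [0, 1] (the covariance of Brownian motion, which is not mentioned in the text above); the helper names are hypothetical. A finite random search of this kind can only fail to find a counterexample; it does not prove positive-definiteness.

```python
import random

def kernel_min(x, y):
    """Illustrative kernel K(x, y) = min(x, y) on [0, 1] (an assumption)."""
    return min(x, y)

def quadratic_form(kernel, points, coeffs):
    """Return sum_i sum_j K(x_i, x_j) c_i c_j."""
    return sum(
        kernel(xi, xj) * ci * cj
        for xi, ci in zip(points, coeffs)
        for xj, cj in zip(points, coeffs)
    )

rng = random.Random(0)  # fixed seed so the run is reproducible
smallest = float("inf")
for _ in range(200):
    n = rng.randint(1, 8)
    points = [rng.random() for _ in range(n)]
    coeffs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    smallest = min(smallest, quadratic_form(kernel_min, points, coeffs))

# Up to rounding error, none of the sampled quadratic forms is negative.
print(smallest >= -1e-12)
```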

The fundamental characterization of stationary positive-definite kernels (where $K(x,y) = K(x-y)$) is given by Bochner's theorem. It states that a continuous function $K(x-y)$ is positive-definite if and only if it can be expressed as the Fourier transform of a finite non-negative measure $\mu$:

$K(x-y) = \int_{-\infty}^{\infty} e^{i(x-y)\omega} \, d\mu(\omega)$

This spectral representation connects positive definiteness with harmonic analysis. It applies when the kernel is stationary, that is, when it can be expressed as a one-variable function of the difference x − y rather than as a two-variable function of the positions of pairs of points. In that case it gives a more direct characterization of positive definiteness than the abstract definition in terms of inequalities.
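One direction of Bochner's theorem is easy to see for a discrete measure. The sketch below assumes a hypothetical measure that places mass w_k/2 at each of the frequencies ±ω_k, so that its Fourier transform is K(d) = Σ_k w_k cos(ω_k d). For real coefficients the quadratic form then equals Σ_k w_k |Σ_j c_j e^{iω_k x_j}|², which is visibly non-negative. The frequencies, weights and function names are illustrative choices only.

```python
import cmath
import math
import random

# Hypothetical symmetric discrete measure: mass w_k / 2 at +omega_k and -omega_k.
freqs = [0.5, 1.0, 2.5]
weights = [0.2, 0.5, 0.3]  # non-negative masses

def stationary_kernel(d):
    """Fourier transform of the measure: K(d) = sum_k w_k cos(omega_k d)."""
    return sum(w * math.cos(om * d) for w, om in zip(weights, freqs))

def quadratic_form(points, coeffs):
    """Direct evaluation of sum_i sum_j c_i c_j K(x_i - x_j)."""
    return sum(
        ci * cj * stationary_kernel(xi - xj)
        for xi, ci in zip(points, coeffs)
        for xj, cj in zip(points, coeffs)
    )

def spectral_form(points, coeffs):
    """Same quantity written as sum_k w_k |sum_j c_j exp(i omega_k x_j)|^2."""
    total = 0.0
    for w, om in zip(weights, freqs):
        amplitude = sum(c * cmath.exp(1j * om * x) for x, c in zip(points, coeffs))
        total += w * abs(amplitude) ** 2
    return total

rng = random.Random(1)  # fixed seed for reproducibility
points = [rng.uniform(-3.0, 3.0) for _ in range(6)]
coeffs = [rng.uniform(-1.0, 1.0) for _ in range(6)]
direct = quadratic_form(points, coeffs)
spectral = spectral_form(points, coeffs)
print(abs(direct - spectral) < 1e-9, spectral >= 0.0)
```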

Associated to K is a linear operator (more specifically a Hilbert–Schmidt integral operator when the interval is compact) on functions defined by the integral

$[T_K \varphi](x) =\int_a^b K(x,s) \varphi(s)\, ds.$

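We assume $\varphi$ ranges over the space L^{2}[a, b] of real-valued square-integrable functions. The associated reproducing kernel Hilbert space is in general a different space: for a continuous kernel on [a, b] it consists of continuous functions and is a proper subspace of L^{2}[a, b]. Linearity alone does not guarantee eigenvalues; their existence follows because T_{K} is a compact symmetric operator on L^{2}[a, b], as explained in the Details section.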
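To make the operator T_{K} concrete, the sketch below approximates the defining integral with a midpoint rule. It assumes the illustrative kernel K(x, s) = min(x, s) on [0, 1], for which √2 sin(πs/2) is an eigenfunction with eigenvalue 4/π² (this can be checked by integrating by parts); the function names and grid size are hypothetical choices.

```python
import math

def apply_operator(kernel, phi, a, b, x, n=2000):
    """Midpoint-rule approximation of [T_K phi](x) = int_a^b K(x, s) phi(s) ds."""
    h = (b - a) / n
    nodes = (a + (i + 0.5) * h for i in range(n))
    return h * sum(kernel(x, s) * phi(s) for s in nodes)

def kernel_min(x, s):
    """Illustrative kernel (an assumption, not taken from the text)."""
    return min(x, s)

def e1(s):
    """Eigenfunction of the min kernel on [0, 1] for the largest eigenvalue."""
    return math.sqrt(2.0) * math.sin(0.5 * math.pi * s)

lambda1 = 1.0 / (0.5 * math.pi) ** 2  # equals 4 / pi^2

# Largest deviation between T_K e1 and lambda1 * e1 at a few sample points.
sample_points = [0.1, 0.3, 0.5, 0.7, 0.9, 1.0]
residual = max(
    abs(apply_operator(kernel_min, e1, 0.0, 1.0, x) - lambda1 * e1(x))
    for x in sample_points
)
print(residual < 1e-5)
```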

Theorem. Suppose K is a continuous symmetric positive-definite kernel. Then there is an orthonormal basis
{e_{i}}_{i} of L^{2}[a, b] consisting of eigenfunctions of T_{K} such that the corresponding
sequence of eigenvalues {λ_{i}}_{i} is nonnegative. The eigenfunctions corresponding to non-zero eigenvalues are continuous on [a, b] and K has the representation

$K(s,t) = \sum_{j=1}^\infty \lambda_j \, e_j(s) \, e_j(t)$

where the convergence is absolute and uniform.
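The expansion can be watched converging in a case where the eigenpairs are known in closed form. The sketch below assumes the illustrative kernel K(s, t) = min(s, t) on [0, 1], whose eigenvalues are λ_j = 1/((j − 1/2)² π²) with eigenfunctions e_j(t) = √2 sin((j − 1/2) π t). It measures the largest error of the partial sums over a grid, as a rough stand-in for the uniform norm.

```python
import math

def mercer_partial_sum(s, t, terms):
    """Partial sum of sum_j lambda_j e_j(s) e_j(t) for the min kernel on [0, 1]."""
    total = 0.0
    for j in range(1, terms + 1):
        omega = (j - 0.5) * math.pi
        lam = 1.0 / omega ** 2
        total += lam * 2.0 * math.sin(omega * s) * math.sin(omega * t)
    return total

grid = [i / 20 for i in range(21)]  # 21 equally spaced points in [0, 1]

def max_error(terms):
    """Largest grid error between the partial sum and K(s, t) = min(s, t)."""
    return max(
        abs(mercer_partial_sum(s, t, terms) - min(s, t))
        for s in grid
        for t in grid
    )

errors = [max_error(n) for n in (5, 50, 500)]
print(errors)  # the grid error shrinks as more terms are included
```

For this kernel the error decays only like 1/N in the number of terms, so uniform convergence, although guaranteed, can be slow.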

== Details ==
We now explain in greater detail the structure of the proof of
Mercer's theorem, particularly how it relates to spectral theory of compact operators.

- The map K ↦ T_{K} is injective.
- T_{K} is a non-negative symmetric compact operator on L^{2}[a,b]; moreover K(x, x) ≥ 0.

To show compactness, show that the image of the unit ball of L^{2}[a,b] under T_{K} is uniformly bounded and equicontinuous, and apply Ascoli's theorem to conclude that this image is relatively compact in C([a,b]) with the uniform norm and, a fortiori, in L^{2}[a,b].

Now apply the spectral theorem for compact operators on Hilbert spaces to T_{K} to show the existence of an orthonormal basis {e_{i}}_{i} of L^{2}[a,b] consisting of eigenfunctions:

$\lambda_i e_i(t)= [T_K e_i](t) = \int_a^b K(t,s) e_i(s)\, ds.$

If λ_{i} ≠ 0, the eigenvector (eigenfunction) e_{i} is seen to be continuous on [a,b]. Now

$\sum_{i=1}^\infty \lambda_i |e_i(t) e_i(s)| \leq \sup_{x \in [a,b]} |K(x,x)|,$

which shows that the series

$\sum_{i=1}^\infty \lambda_i e_i(t) e_i(s)$

converges absolutely and uniformly to a kernel K_{0} which is easily seen to define the same operator as the kernel K. Hence K=K_{0} from which Mercer's theorem follows.
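The bound on $\sum_i \lambda_i |e_i(t) e_i(s)|$ used above is stated without derivation. One standard way to obtain it, sketched here under the assumptions of the theorem, is to apply the fact that a continuous kernel of a non-negative operator is non-negative on the diagonal to the truncated kernel $K_n$ (a symbol introduced only for this sketch), and then use the Cauchy–Schwarz inequality:

```latex
% Truncated kernel: continuous, and T_{K_n} = \sum_{i>n} \lambda_i \langle \cdot, e_i \rangle e_i \ge 0.
K_n(s,t) = K(s,t) - \sum_{i=1}^{n} \lambda_i \, e_i(s) \, e_i(t)

% Non-negativity on the diagonal, K_n(t,t) \ge 0, gives for every n:
\sum_{i=1}^{n} \lambda_i \, e_i(t)^2 \le K(t,t) \le \sup_{x \in [a,b]} |K(x,x)|

% Cauchy--Schwarz applied to \sqrt{\lambda_i}\, e_i(t) and \sqrt{\lambda_i}\, e_i(s):
\sum_{i=1}^{n} \lambda_i \, |e_i(t) \, e_i(s)|
  \le \Big( \sum_{i=1}^{n} \lambda_i \, e_i(t)^2 \Big)^{1/2}
      \Big( \sum_{i=1}^{n} \lambda_i \, e_i(s)^2 \Big)^{1/2}
  \le \sup_{x \in [a,b]} |K(x,x)|
```

Letting n tend to infinity yields the stated inequality.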

Finally, to show non-negativity of the eigenvalues, let f be an eigenfunction with eigenvalue λ and write $\lambda \langle f,f \rangle= \langle f, T_{K}f \rangle$. The right-hand side is a double integral that is well approximated by its Riemann sums, which are non-negative by positive-definiteness of K. Hence $\lambda \langle f,f \rangle \geq 0$, and therefore $\lambda \geq 0$.

== Trace ==
The following is immediate:

Theorem. Suppose K is a continuous symmetric positive-definite kernel; T_{K} has a sequence of nonnegative
eigenvalues {λ_{i}}_{i}. Then

$\int_a^b K(t,t)\, dt = \sum_i \lambda_i.$

This shows that the operator T_{K} is a trace class operator and

$\operatorname{trace}(T_K) = \int_a^b K(t,t)\, dt.$
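The trace identity can be checked in a case with known eigenvalues. The sketch below assumes the illustrative kernel K(s, t) = min(s, t) on [0, 1], for which K(t, t) = t and λ_j = 1/((j − 1/2)² π²); both sides should then be close to 1/2. The truncation level and function names are hypothetical choices.

```python
import math

def eigenvalue(j):
    """j-th eigenvalue of the min kernel on [0, 1] (illustrative assumption)."""
    return 1.0 / ((j - 0.5) * math.pi) ** 2

def trace_integral(n=1000):
    """Midpoint rule for int_0^1 K(t, t) dt with K(t, t) = min(t, t) = t."""
    h = 1.0 / n
    return h * sum((i + 0.5) * h for i in range(n))

# Truncated eigenvalue sum; the neglected tail is of order 1 / (pi^2 * 100000).
eigenvalue_sum = sum(eigenvalue(j) for j in range(1, 100001))
integral = trace_integral()
print(abs(eigenvalue_sum - integral) < 1e-5)
```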

== Generalizations ==
Mercer's theorem itself is a generalization of the result that any symmetric positive-semidefinite matrix is the Gramian matrix of a set of vectors.
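The finite-dimensional statement can be made explicit with a Cholesky factorization: if A = L Lᵀ, the rows of L are vectors whose Gramian matrix is A. The sketch below assumes a strictly positive-definite example matrix chosen for illustration; a merely positive-semidefinite matrix may need a different factorization (for example, one based on an eigendecomposition), which is not shown here.

```python
import math

def cholesky(matrix):
    """Return lower-triangular L with L L^T = matrix.

    Assumes the input is symmetric and strictly positive-definite.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower

def gram(vectors):
    """Gramian matrix of inner products <v_i, v_j>."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]

# Illustrative symmetric positive-definite matrix (leading minors 4, 16, 60).
A = [[4.0, 2.0, 0.0],
     [2.0, 5.0, 3.0],
     [0.0, 3.0, 6.0]]

vectors = cholesky(A)          # row i of L plays the role of the vector v_i
reconstructed = gram(vectors)  # should reproduce A up to rounding
max_diff = max(
    abs(reconstructed[i][j] - A[i][j]) for i in range(3) for j in range(3)
)
print(max_diff < 1e-12)
```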

The first generalization replaces the interval [a, b] with any compact Hausdorff space X, and Lebesgue measure on [a, b] with a finite countably additive measure μ on the Borel algebra of X whose support is X. This means that μ(U) > 0 for any nonempty open subset U of X.

A recent generalization replaces these conditions by the following: the set X is a first-countable topological space endowed with a Borel (complete) measure μ. X is the support of μ and, for all x in X, there is an open set U containing x and having finite measure. Then essentially the same result holds:

Theorem. Suppose K is a continuous symmetric positive-definite kernel on X. If the function κ belongs to L^{1}_{μ}(X), where κ(x) := K(x,x) for all x in X, then there is an orthonormal set
{e_{i}}_{i} of L^{2}_{μ}(X) consisting of eigenfunctions of T_{K} such that the corresponding
sequence of eigenvalues {λ_{i}}_{i} is nonnegative. The eigenfunctions corresponding to non-zero eigenvalues are continuous on X and K has the representation

$K(s,t) = \sum_{j=1}^\infty \lambda_j \, e_j(s) \, e_j(t)$

where the convergence is absolute and uniform on compact subsets of X.

The next generalization deals with representations of measurable kernels.

Let (X, M, μ) be a σ-finite measure space. An L^{2} (or square-integrable) kernel on X is a function

$K \in L^2_{\mu \otimes \mu}(X \times X).$

L^{2} kernels define a bounded operator T_{K} by the formula

$\langle T_K \varphi, \psi \rangle = \int_{X \times X} K(y,x) \varphi(y) \psi(x) \,d[\mu \otimes \mu](y,x).$

T_{K} is a compact operator (actually it is even a Hilbert–Schmidt operator). If the kernel K is symmetric, by the spectral theorem, T_{K} has an orthonormal basis of eigenvectors. Those eigenvectors that correspond to non-zero eigenvalues can be arranged in a sequence {e_{i}}_{i} (regardless of separability).

Theorem. If K is a symmetric positive-definite kernel on (X, M, μ), then

$K(y,x) = \sum_{i \in \mathbb{N}} \lambda_i e_i(y) e_i(x)$

where the convergence is in the L^{2} norm. Note that when continuity of the kernel is not assumed, the expansion no longer converges uniformly.

==Mercer's condition==
A real-valued function K(x,y) is said to fulfill Mercer's condition if for all square-integrable functions g(x) one has

$\iint g(x)K(x,y)g(y)\,dx\,dy \geq 0.$

===Discrete analog===
This is analogous to the definition of a positive-semidefinite matrix, that is, a matrix $K$ of dimension $N$ which satisfies, for all vectors $g$, the property
$(g,Kg)=g^{T}{\cdot}Kg=\sum_{i=1}^N\sum_{j=1}^N\,g_i\,K_{ij}\,g_j\geq0$.

===Examples===
A positive constant function
$K(x, y)=c\,$
satisfies Mercer's condition, as then the integral becomes by Fubini's theorem
$\iint g(x)\,c\,g(y)\,dx \, dy = c\int\! g(x) \,dx \int\! g(y) \,dy = c\left(\int\! g(x) \,dx\right)^2$
which is indeed non-negative.

== See also ==
- Kernel trick
- Representer theorem
- Reproducing kernel Hilbert space
- Spectral theory
