Talk:Reproducing kernel Hilbert space

From Wikipedia, the free encyclopedia
Jump to: navigation, search

What is this sesquilinearity convention of which you speak? Do you mean linear in the second variable and conjucate-linear in the first? If so, it seems a strange convention to me. Lupin 08:39, 6 Aug 2004 (UTC)

Norm-continuous[edit]

The article states that H is a reproducing kernel Hilbert space iff the linear map f→f(x) is norm-continuous for any element x of X. -- what does norm-continuous mean in this context? linas 15:06, 25 Jun 2005 (UTC)

H is a Hilbert space, thus carries a norm. This ff(x) has a weell-defined topology on both source and target.--CSTAR 15:38, 25 Jun 2005 (UTC)

Examples, too[edit]

Some examples would be handy too. Reading this article naively, one has the Dirac delta function playing the role of K, so that f(x)=\int \delta(x-y) f(y) dy and δ is usually naively understood to be the "identity operator" on the Hilbert space in question (that is, its the sum over all states, e.g. the sum over all orthogonal polynomials on an interval). Is this naive reading correct? Can I correctly say that an example is

K(x,y)=\sum_{n=0}^\infty T_n(x) T_n(y)

where T_n ar the Chebyshev polynomials (and the norm/volume elt/weight is taken appropriately)? It seems to fulfill the requirements described in this article, I think ... linas 15:21, 25 Jun 2005 (UTC)

That's just it, the delta function is not in the space.--CSTAR 15:38, 25 Jun 2005 (UTC)
Would you object if I added the above as an example to the article? linas 18:31, 25 Jun 2005 (UTC)

Counter-examples, too[edit]

Assuming the example given above is correct, then, naively, every hilbert space encountered by a naive physicist is a reproducing kernel hilbert space. So it now becomes hard to imagine a Hilbert space that doesn't have such a beast associated with it. (Since K seems to be defined as "the identity operator", and these are always easily constructed as a sum over states). linas 15:27, 25 Jun 2005 (UTC)

What's relevant here is the realization of the Hilbert space as a space of functions on a set X.--CSTAR 15:38, 25 Jun 2005 (UTC)
Well, then, for the record, are there any hilbert spaces that are not realized as a space of functions on a set? Alternately, are there any hilbert spaces that are realized as a space of functions on a set, but do not have the norm-continuous property? I'm one of these people who don't feel comfortable with a topic until I have understood both an example and a counter-example. I'll try to think of something, but if you have something up your sleeve, that's certainly better than whatever I might dream up. linas 18:31, 25 Jun 2005 (UTC)
Yes: any L2μ for a non-atomic measure μ. These are realized as equivalence classes of functions.--CSTAR 18:41, 25 Jun 2005 (UTC)
late late comment: i think linas's original post is right. every Hilbert space H is isomorphic to a reproducing kernel Hilbert space, meaning there exist a set X, and a positive kernel K: X × XC such that the resulting Hilbert space is isomorphic H, and, yes, K can be taken as, essentially, the identity operator.Mct mht 19:38, 30 March 2007 (UTC)
Late late late comment. Whilst it is true that every Hilbert space is isomorphic to a RKHS, that's not the same as saying it is a RKHS. For example, L2[0,1] is isomorphic to l^2 (\mathbb{Z}), which is a RKHS. However, the reproducting kernel here is K(x,y)=\delta_{xy} for x,y\in \mathbb{Z}. The key thing is, that this does not lead to a reproducting kernel for L2[0,1], as it takes x and y in \mathbb{Z}, not [0,1]. In fact, L2[0,1] is not a RKHS, as point evaluation is not a continuous function. The message here is that being a RKHS is not preserved by isomorphism. Also, I should point out that the delta function is not a function in the conventional sense. It's a common shorthand to regard it as one, but it isn't a function in the conventional sense, and it certainly isn't in L2[0,1]. James pic 13:37, 9 October 2007 (UTC)
P.s, the delta function I use in the example above as \delta_{xy} is the Kronecker delta, not the Dirac delta. The Kronecker delta is in l^2 (\mathbb{Z}), whereas the Dirac delta is not in L2[0,1].
Incidentally, in response to the request for examples, L2(X) is not a RKHS for X any open subset of \mathbb{C}^n. On the other hand, Hardy spaces usually are RKHSs, using various variants of the Szego kernel.
For example L^2(\mathbb{T}), the Hilbert space of L2 functions on the circle is not a RKHS. On the other hand, H^2(\mathbb{T}), the space of L2 functions on the circle which are radial limits of holomorphic functions on the disc, is. James pic 14:18, 9 October 2007 (UTC)

Edit[edit]

User:Oleg Alexandrov states that the sentence I edited was wrong, e.g [1]. Hoever, several sentences were modified. Which sentence are you referring to here?--CSTAR 17:34, 25 Jun 2005 (UTC)

Well, I modified there only one sentence, everything else is just link insertion. I was referring to the sentence:
H is a reproducing kernel Hilbert space iff the linear map
 f \mapsto f(x)
is norm-continuous for any element x of X.

I thought that the part "for any element x of X" is wrong, but then I realized I was wrong, so I put it back in the next edit. Does that make more sence now? Oleg Alexandrov 17:43, 25 Jun 2005 (UTC)

Seems to. I'll give this more thought later tonight. linas 18:35, 25 Jun 2005 (UTC)
So, basically x is fixed, and then you get a linear map with input f and output f(x). I understand that this is weird, because usually the function f is fixed and input is x while output is f(x). Oleg Alexandrov 19:29, 25 Jun 2005 (UTC)
An important class of examples are Hilbert spaces of analytic functions.--CSTAR 05:59, 26 Jun 2005 (UTC)

article is not consistent[edit]

what is meant by "reproducing kernel Hilbert space" here? sections of the article disagree.

  1. one can say that a Hilbert space of functions on X is a RKHS if it admits a positive definite kernel K on X × X. this is the approach in the first section. with this definition, as CSTAR said above, any L2μ for a non-atomic measure μ would not be a RKHS. (BTW, the positivity of the kernel is really the central theme whether one adopts this formulation or the one below, but that's not pointed out at all in the intro.)
  2. one can start with a positive kernel K on X × X. then there exists a Hilbert space H for which K is a reproducing kernel. this is alluded to in the Moore-Aronszajn theorem section. but then "is a reproducing kernel for" now means something slightly different. it is no longer insisted that H consists of functions on X. after quotiening out the degenrate subspace and taking the completion, we'd end up with equivalence classes in general. in this formulation, any Hilbert space is a RKHS, because the existence of orthonormal basis.

this should be resolved. if it's ok, i am going to attach clean-up tag to the article. Mct mht 01:40, 31 March 2007 (UTC)

I don't think your second point is correct. The phrase "is a reproducing kernel" implies that H consists of functions. This is because a reproducing kernel implies a continuous point-evaluation functional. Unless you are dealing with functions, point-evaluation makes no sense. Also, not every Hilbert space is an RKHS (which has been discussed above) although it may be isomorphic to one.150.203.33.77 02:27, 13 August 2007 (UTC)
i did not make a claim regarding what "is a reproducing kernel" means. i just pointed out that article seems to imply two inconsistent meanings. merely choosing one and label the other one incorrect, as you did, doesn't address the issue. in the intro, article states, as you say, "is a reproducing kernel" means H consists of functions and pointwise evaluations are bounded functionals. on the other hand, in the Moore-Aronszajn theorem section, it seems pretty clear to me that, if one carries out the construction as intended, the resulting Hilbert space would not in general be a spaces of functions, unless the ≥ is changed to >. Mct mht 03:04, 13 August 2007 (UTC)
Ah, I see. You are talking about the construction that you removed, right? I'll add in a correct construction.203.173.5.212 10:38, 13 August 2007 (UTC)
you've changed the section completely and i doubt it's right. do you have a reference? i think the point is that one considers here spaces of functions, not equivalence classes of functions. by starting with L2, this is already violated (contradicting what you said yourself above). furthermore, your inner product does not converge in general. TK is compact, so its eigenvalues λi goes to zero. if they do so, say, exponentially, then your inner product is not well-defined. assuming the inner product is valid, even then you may not have a Hilbert space. i don't think it is true in general that the range of TK is closed.
For a reference, see (for example) "On the Mathematical Foundations of Learning" by Cucker&Smale (bulletin of the AMS). They define an RKHS on page 11. Also, "On the Performance of Kernel classes" by Mendelson (Journal of Machine Learning Research) has a slightly different way of defining it. I'll add these to the references list.
The fact that I started with L2 is irrelevant. Two functions that belong to the same equivalence class will be mapped by the kernel operator to the same function so my RKHS is indeed a space of functions. I'll make a point of this.
The inner product I defined may not (in fact, will not) converge in L2. Note, however, that I only defined it in the image of LK. Then L_K^{-1}g is in L2 and so the inner product is defined. The fact that the inner product doesn't converge in L2 in important; it means that the RKHS is a proper subset of L2. For a start, functions in the RKHS are continuous and have the same number of derivatives as the kernel does.
Are there any more issues with that section, or can I reinstate it? (I'm the same person that made the changes before -- I finally got around to making an account) Jneem 00:18, 19 August 2007 (UTC)
Sorry, I forgot to address the point of the space being complete. Let E be the closed (in L2) span of the eigenfunctions of LK. This, with the standard L2 inner product, is a Hilbert space. Also, LK1/2 is an isomorphism between E (with the L2 inner product) and the RKHS (with the RKHS inner product). Therefore the RKHS is also complete. Jneem 00:24, 19 August 2007 (UTC)
i think the assumption of the previous version was probably correct and it just needed some elaboration. Mct mht 06:38, 17 August 2007 (UTC)
The previous version may have been correct, but it doesn't directly follow from any construction I am aware of. In addition, the construction I give says explicitly what the inner product is. It doesn't seem to be mentioned elsewhere on the page that the inner product on a RKHS will, except in finite-dimensional situations, be completely different from the normal L2 inner product. Reading the talk page, there seems to be some confusion on this point (with people thinking of the point-evaluation functional as being a dirac function) so I think it's worth having.Jneem 00:18, 19 August 2007 (UTC)
hi, Jneem, good to have you contributing. is there a pdf file for the Cucker & Smale paper? some questions/replies:
  1. Re "Two functions that belong to the same equivalence class will be mapped by the kernel operator to the same function": ok, that is true.
  2. Re "The inner product I defined may not (in fact, will not) converge in L2. Note, however, that I only defined it in the image of LK. Then L_K^{-1}g is in L2 and so the inner product is defined.": ok, that is also true.
  3. Re "Let E be the closed (in L2) span of the eigenfunctions of LK. This, with the standard L2 inner product, is a Hilbert space.": by adjoining elements of L2 to get completeness, are you not ending up with a space of equivalence classes? in the line where you verified that this is a RKHS: \langle K_x, f \rangle_H = \langle K_x, T_K^{-1} f \rangle_2 = (T_K (T_K^{-1} f)) (x) = f(x) , you only did it for f in the image of TK. what if f is not in the image of TK but in its completion?
  4. Re the original (in the article right now) version: it is more general and fundamental than the construction you gave, which seems to me constructs a Hilbert space (of either functions or equivalence classes of them) using a positive, in your case unbounded, operator in a pretty standard way. it's clear that a positive definite kernel on a set X (with no extra structure, such as a measure) gives an inner product on functions from X to C with finite support. the issue is completion. if we take the usual Cauchy completion, then it will not be a space of functions. but i think it can be bypassed. Mct mht 01:28, 19 August 2007 (UTC)
OK, you make a good point about completion. What if I define the RKHS by the set of functions \{f(x) = \sum_{i=1}^\infty a_i \sqrt{\lambda_i} \phi_i (x):\ \sum a_i^2 < \infty\}. Then f is defined pointwise because, by Mercer's theorem, \infty > K(x, x) = \sum \lambda_i \phi_i (x)^2 and so f(x) < ∞ by Cauchy-Shwartz. Then this is obviously a family of functions and the fact that it is complete under the RKHS inner product follows from a diagonalisation argument. Basically, I'm forgetting about the kernel operator and following Mendelson's second definition more closely. By the way, there is a pdf of Cucker&Smale at the same address as the ps -- just change the ".ps" to ".pdf". Jneem 03:36, 19 August 2007 (UTC)
besides some details, i think that does it. that the positive operator has purely discrete spectrum seems to me essential here. this is really a kernel K on the natural numbers N given by K(i,j) = λ i δi j in disguise (one can substitute the natural basis {ei(j) = δi j} with any set of orthonormal functions and you did it here with the eigenvectors of TK). i for one am not going to object if you wanna add it back. in any case let's keep the Moore-Aronszajn theorem section, as that is the most general formulation. Mct mht 17:56, 19 August 2007 (UTC)
Sorry, I'm still missing something. The Moore-Aroszajn theorem says that the RKHS is unique. Therefore, any construction we give must result in the same RKHS. So why is one thing more general than another? Jneem 22:57, 19 August 2007 (UTC)
well, the assumptions of the Moore-Aroszajn theorem are different than those for your construction so i don't think you can apply their uniqueness statement. if i understand correctly, the Moore-Aroszajn theorem says that for any set X and any positive definite kernel K: X × X → C, there is a unique RKHS consisting of functions on X. so no structure such as a measure is assumed on X and no passing through an integral operator. Mct mht 01:50, 20 August 2007 (UTC)
you're right, so I've done a proof of Moore-Aroszajn in the general case. I'll add the stuff about Mercer kernels when I have time.Jneem 08:17, 20 August 2007 (UTC)

also, the the Moore-Aronszajn theorem section contains some unusual notation and language ("first class citizens"?), and the very brief outline of the construction doesn't look quite right. Mct mht 02:04, 31 March 2007 (UTC)

I agree; the previous version by the anon contributor made a mishmash of Carleman operators, Hilbert Schmidt operators and Kernels. --CSTAR 15:21, 17 August 2007 (UTC)

It is a Hilbert space[edit]

The most recent edit claims that the inner product given in the proof of Moore-Aronszajn may be degenerate but doesn't say why. I think that it cannot be degenerate: suppose it is. Then there exists a point x such that \langle K_x, K_x \rangle = 0 and Kx is not the zero function. Then we can find y such that Kx(y) is not zero. Then the kernel matrix determined by x and y is


\begin{bmatrix}
0 & a \\
\overline a & b \\
\end{bmatrix}

for some non-zero a and some real number b. One can check that the eigenvalues are


\frac{b \pm \sqrt {b^2 + 4|a|^2}}{2}

and so one eigenvalue is negative, contradicting the fact that the kernel was positive definite. Unless someone can poke a hole in my argument, I'll reinstate the claim that the construction produces a genuine Hilbert space. Jneem 12:14, 27 September 2007 (UTC)

I see no flaw in your argument. I'll undo my edit. Memming 13:08, 27 September 2007 (UTC)
You were right to require symmetry of the kernel, though, so I'll put that back in. Jneem 00:17, 28 September 2007 (UTC)

i don't know if one is interested in real RKHS's but in the complex case, positivity implies Hermiticity, i.e. K(x,y) and K(y,x,) are mutual conjugates. Mct mht 05:06, 9 October 2007 (UTC)

informal description[edit]

an anon contributor recently added the following passage:

"Informally, a reproducing kernel Hilbert space is a function space on which the concept of a Dirac delta function makes sense, despite possibly being technically absent from the Hilbert space."

i think that can be misleading, unless one introduces some concrete examples of RKHS (necessarily one that is not a space of analytic functions). does the anon have some such e.g.'s at hand?

even then, one might still argue that the statement is misleading. talking about δ-functions implicitly assumes that some kinda analytic structure is present, while the generic RKHS does not. take for instance the trivial kernel on R, K(x,y) = δxy. apply the Moore-Aronszajn construction given in the article. the resulting space of functions on R has no obvious analytic description. it would be a big stretch for fit δ-function into the picture, if that is possible. neither does it make sense to speak of δ-functions in the context of spaces of analytic functions. something that is closer to what anon describes is perhaps a rigged Hilbert space. although in that case, the δ-functions do reside in the space IIRC. Mct mht 05:50, 30 September 2007 (UTC)

Hi, I'm anon. My original statement was wrong. I changed it to

"Informally, a reproducing kernel Hilbert space is a function space which contains a Dirac delta-like function at every point of the underlying set."

Anyways, let me explain the rationale. There are only two important things about a dirac delta function: it has a mass (integral) of one concentrated entirely on a single point of the underlying set. The concept of analyticity and taking limits, while important in its construction and interpretation on R^n, is irrelevant to its purpose. It's more of a measure theoretic object (dirac measure) than an analytic object. Perhaps I should have said dirac measure instead of dirac delta function, but since the article was talking about function spaces I thought I'd use the latter to avoid confusion.

Since the rkhs abstracts those two essential properties from the Dirac delta into the definition of the reproducing kernel, I thought it appropriate to make that informal connection. In the article as it stood, it was not clear why anyone would care about rkhs. I couldn't figure out why until I read it carefully and made the connection above, so I thought it was worth sharing.

Your concerns are certainly legitimate (the dirac delta is technically an object with analytic structure, although I'm downplaying that to make the connection) and if you can think of a way to make it more accurate, please do. But let's try to make the article more readable, it's too formalistic right now.--141.211.61.254 18:50, 7 October 2007 (UTC)

I agree with Mct mht and I think the reference to a dirac function should be removed. It might make sense to think of a dirac "function" if the inner product on an RKHS were the usual L2 inner product -- but it isn't and that is a very important point (I already removed a sentence that made the same mistake, so this probably needs to be emphasised somewhere). The inner product on an RKHS is given in the proof of Moore-Aronszajn and it is definitely not the usual one. In fact, as Mct mht pointed out on this talk page, an RKHS does not need a measure.
I think the term "point-evaluation functional" is less misleading than the term "Dirac delta function." I do agree, however that the article needs some motivation. However, I think that the motivation must not be misleading.Jneem 02:07, 9 October 2007 (UTC)

the connection between pointwise evaluations and delta-functions is rather forced in this setting. given a positive operator A on a Hilbert space H, it induces an inner product on its domain: <h, Ah>. in particular, a positive semifinite n × n matrix allows one to renorm Cn. a RKHS is really a very natural generalization of this. if X is the integers, then one can think of positive kernel as an infinite positive matrix. the definition of positivity copies verbatim from the matrix case: we require that every principal minor (of finite size) is positive. so naturally we have an inner product on the space of functions f: X → C with finite support. to get a Hilbert space, two possible obstacles: degeneracy and completeness. if a positive semifinite n × n matrix A is not strictly positive, the induced inner product would be degenerate on Cn. but it one restricts to the range of A, spanned by its columns, degeneracy is removed. this is essentially what is done also for the kernel case. for completeness, we include all functions of finite norm (the abstract Cauchy completion would not do here). Mct mht 05:02, 9 October 2007 (UTC)

Sesquilinearity[edit]

Should we be following the convention for inner product spaces of \langle \cdot , \ \cdot \rangle being linear in the first variable, or the convention for more general sesquilinear forms of being linear in the second variable? My feeling is that since we're dealing solely with Hilbert spaces here, so we're only dealing with inner products, we should use the inner product space convention, which also seems to be more common in the literature. What say you? James pic 11:35, 18 October 2007 (UTC)

matter of consensus. i personally don't have a preference. article just needs to be consistent. Mct mht 19:19, 18 October 2007 (UTC)
I don't mind terribly -- I used linearity in the second variable for consistency with the page on sesquilinear forms (the page on inner product spaces is inconsistent: first, it says that an inner product is a sesquilinear form. Then it says that an inner product is linear in the first variable. Then it gives an example of an inner product (the third example) that is linear in the second variable). Anyway, if you want to make it linear in the first variable, you should also change the proof of the Moore-Aronszajn theorem. Jneem 01:47, 20 October 2007 (UTC)
Like I said, my preference is to use linearity in the first variable, as this seems to be most consistent with the literature, at least for Hilbert spaces. However, even this could be contentious, as physics literature generally uses the opposite convention. As you said, it should be a matter of consensus, so I thought I'd call for a vote.

First Variable: I've seen it most often in the literature. James pic 16:20, 22 October 2007 (UTC)

looks like there won't be any objection if you wanna switch conventions. Mct mht 20:38, 24 October 2007 (UTC)

It all depends on whether you prefer Forward Polish Notation, fx to denote the application of f to x, or Reverse Polish Notation, xf. The usual convention is that the vector space V consists of column vectors v, which transform covariantly as Mv when multiplied on the left by a matrix M representing a linear transformation from V to some vector space. Its dual V* consists of row vectors, which transform contravariantly when multiplied on the right by the adjoint (as distinct from the adjugate) of M. Dot product has the row vector on the left representing a functional from V to the one-dimensional space. So if dot product is your basic example of an inner product then the second variable should be the linear one. Physicists got that one right, see e.g. Blank et al, Hilbert Space Operators in Quantum Physics where the bra is applied to the ket. The other way round corresponds to Reverse Polish Notation as popularized by HP calculators, no inconsistency there as long as you stick to RPN throughout. Switching everything in the article to RPN looks like more work than RPN warrants. --Vaughan Pratt (talk) 09:16, 1 February 2013 (UTC)

Is the proof correct?[edit]

I do not think that your proof on the fact that given a kernel K, there exists a unique RKHS associared with it is correct. Or, at leats, it is incomplete. After introducing G as another Hilbert space of functions for which K is a reproducing kernel, to prove uniqueness it is assumed that G is the completion of the span of \{K_x:\ x \in E \}. But where does this fact come from? This has to be proved. Notice that in Cucker and Smale's paper "On the mathematical foundations of learning" this is a condition which is listed in the statement of theorem 2 on pag. 35. —Preceding unsigned comment added by Spline (talkcontribs) 17:34, 10 December 2008 (UTC)

I understand that H (or G as well) is a space of functions on E: if an element f ∈ H is orthogonal to all the \{K_x:\ x \in E \}, then f is the 0 function on E, thus f = 0_H, hence \{K_x:\ x \in E \} is total. Does it make sense? Bdmy (talk) 11:38, 11 December 2008 (UTC)

Yes, it makes sense. span{K_x} must contain an orthonormal basis of H, hence must be dense in G, if G has to satisfy the reproducing property. I think that this point should be mentioned in the proof, because it would make it more clear. —Preceding unsigned comment added by 87.4.244.249 (talk) 21:38, 11 December 2008 (UTC)

Yes, it makes sense. span{K_x} must contain an orthonormal basis of G, i.e. must be dense in G if G has to satisfy the reproducing property. I think that this point should be mentioned in the proof, it would make it more complete. The conclusion without mentioning this point is not so clear. —Preceding unsigned comment added by 87.4.244.249 (talk) 21:41, 11 December 2008 (UTC)

Symmetric or "Hermitian symmetric"[edit]

I find the wording "the kernel K is symmetric" ambiguous or misleading. Clearly what is meant but never said is the "Hermitian symmetry"

K(y, x) = \overline {K(x, y)}

like the one for the Szegö kernel. I am right? Bdmy (talk) 11:43, 11 December 2008 (UTC)

Kernel as dot-product after non-linear map[edit]

I don't know a lot about this topic, but I ended up here by studying support vector machines. It seems the SVM world ended up using RKHS's by first mapping statistical data from a set X into a Hilbert Space H using a non-linear map f: X -> H, defining the kernel k: X x X -> IR: (x, y) -> <f(x), f(y)>. I can see to some extent how these definitions might be equivalent or at least related, but I think it might be enlightening for some people to mention this approach to RKHS's in the article as well.

Apologies for the clumsy notation, I couldn't find any way to get a tex-template here.

94.224.49.243 (talk) 15:04, 14 October 2012 (UTC)

Take a look at Sławomir Biały's edit here and see if you agree. He's also the main editor for Bergman kernel (Bergman 1922) which he wrote starting on the same day. Note that kernel trick (what you're talking about in connection with SVM's) has been in the See Also section of this article ever since this 2006 edit.
Incidentally you can tex-ify your formulas as k: X\times X\to IR: (x,y)\to\langle f(x),f(y)\rangle etc. --Vaughan Pratt (talk) 08:05, 1 February 2013 (UTC)

Contrast with L^2-spaces[edit]

Subsuming large parts of the discussion above, it seems that the article should be more precise and clearer right at the beginning about the fact that usual L^2-spaces on manifolds are not RKHS's. These aren't because they are not defined as Hilbert spaces of functions, but only of equivalence classes of functions; leading to evaluation at a point of the manifold being undefined. This point is just too likely to be confusing for readers unfamiliar with the subject but used to the usual L^2-spaces. Wurzel33 (talk) 19:36, 1 December 2013 (UTC)