WikiProject Robotics (Rated C-class, Mid-importance)
WikiProject Statistics (Rated C-class, High-importance)
Article talks separately about kernel functions and covariance functions -- it would be good to explain their relationship
Merge with Kriging article?
Gaussian Process vs. integral of Gaussian Process
Is the integral of a Gaussian process somehow also a Gaussian process? Or is this just a common abuse of terminology? I think it's the latter, and made some changes to reflect that... — Preceding unsigned comment added by 184.108.40.206 (talk) 21:33, 16 September 2014 (UTC)
Added cleanup tag: this article does not give someone in the field an adequate overview of what a Gaussian process is, and goes off on a tangent involving undefined math. —Preceding unsigned comment added by Ninjagecko (talk • contribs)
- Perhaps it could be made accessible to a somewhat broader audience, but where does it go off on a tangent or get into "undefined math"? It gives the definition and a simple characterization, and then it lists examples, with links. Michael Hardy 21:45, 4 December 2006 (UTC)
- Of course, any article can be improved in many ways, and surely this one can also. However, I have no idea what you mean by undefined math. In addition, I would have thought that for someone in the field, this article is rather banal and uninteresting, since surely its contents would be already familiar to such an individual. Do you mean someone not in the field? --CSTAR 03:54, 5 December 2006 (UTC)
- Somehow my reply never went through. Michael -- Yes, you're right. Technically the indices were previously defined way at the top, thus I removed the cleanup tag. Nevertheless it wasn't very clear, I thought, so I improved the article a lot by categorizing all the glomped-up text and making the definition a bit clearer. CSTAR -- No, I meant what I said: "someone in the field". Even as a reference, it was hard to follow. I've already fixed it though. —The preceding unsigned comment was added by Ninjagecko (talk • contribs) 09:21, 6 December 2006 (UTC).
- Also CSTAR, I personally find it rather haughty, to imagine the only people who have any business reading this entry are people who've been working with this material for 4+ years. The point of a reference is to be a reference for someone who wants to learn or brush up on the material. No offense. Ninjagecko 09:24, 6 December 2006 (UTC)
- I don't think your statement (the only people who have any business reading this entry are people who've been working with this material for 4+ years) paraphrases in any way what I said. In any case, what I had intended to say was that the article was technically correct. --CSTAR 13:36, 6 December 2006 (UTC)
suggestions for clarification
I'm not in the field, and I have found some things I wish this article would clarify. Please feel free to say there is some other, introductory article to the topic that I should have read which would have explained the answers to my questions.
- 1. What is an easy, mathematical example of a Gaussian process?
- 2. Does the definition imply that a Gaussian process is normally distributed? (I think the answer is obviously yes, but I have no experience to justify changing this article.)
- 3. How does the definition imply the parenthetical remark "any linear functional applied to the sample function Xt will give a normally distributed result"? An example? So integrating Xt yields a Gaussian process?
- 4. What is a sample function? pdf? cdf? Other types?
- Gaussian processes are distributions over infinite-dimensional objects (i.e. functions), whereas multivariate normal distributions are defined over finite-dimensional objects or variables. In other words, GPs can be thought of as an extension of multivariate normal distributions to infinite dimensionality. appoose (talk)
- I do not know the proof, but for 3, integration of a GP results in a GP, as does any other linear operation (summing, differentiation, etc.). Aghez (talk) 20:52, 11 March 2012 (UTC)
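As an editorial aside, the linearity point is easy to check numerically in a finite-dimensional discretisation. The sketch below is only illustrative (the squared-exponential kernel, grid, and length-scale are all assumptions, not from the article): if the process values on a grid satisfy f ~ N(0, K), then for any matrix A we have A f ~ N(0, A K Aᵀ), and a cumulative-sum matrix A plays the role of integration.

```python
import numpy as np

# Finite-dimensional sketch of "a linear operation on a GP gives a GP":
# discretise the process on a grid, so f ~ N(0, K); then for any matrix A,
# A f ~ N(0, A K A^T). Here A is a running sum approximating integration.
# Kernel, grid, and length-scale are illustrative choices.
rng = np.random.default_rng(0)
n = 50
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]

# Squared-exponential covariance with length-scale 0.2.
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.2 ** 2)

# A approximates f -> integral_0^t f(s) ds by a cumulative sum.
A = np.tril(np.ones((n, n))) * dx
K_int = A @ K @ A.T  # exact covariance of the integrated process

# Empirical check: the sample covariance of integrated draws matches K_int.
L = np.linalg.cholesky(K + 1e-8 * np.eye(n))
samples = (L @ rng.standard_normal((n, 20000))).T  # 20000 prior draws
emp_cov = np.cov((samples @ A.T).T)
```

The entrywise gap between `emp_cov` and `K_int` shrinks as the number of draws grows, which is the finite-dimensional shadow of the claim that integration preserves Gaussianity.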
Link to the Gaussian Processes Research Group at the Australian Centre for Field Robotics
I have renamed the link to www.gaussianprocesses.com, to "The Gaussian Processes Research Group at the Australian Centre for Field Robotics". The web site has a very general sounding name, but the home page is currently recruiting students to a lab, rather than explaining the theory of Gaussian processes, as the link description previously claimed to do. I hope this avoids confusion. Mebden (talk) 08:26, 5 March 2009 (UTC)
Alternative definition
Is the i that appears in the second display formula of the section the imaginary unit? If it is an index, it is not bound to any summation sign. Maybe a real-valued variable? I do not have a reference for the formula with me so I cannot fix it, but I guess that something is missing. I would be grateful if someone could fix it. Junkie.dolphin (talk) 15:49, 3 July 2012 (UTC)
- The fact that it is the imaginary unit is confirmed/implied by the equation being part of a sentence starting "Using characteristic functions ....". Melcombe (talk) 16:55, 3 July 2012 (UTC)
"Process" is a "distribution"?
The current article says: "A Gaussian process is a statistical distribution Xt, t ∈ T, for which any finite linear combination of samples has a joint Gaussian distribution." I think a "process" is an indexed collection of random variables, while a "distribution" is a function associated with a single random variable. The notation apparently intends to convey the idea of "an indexed collection of distributions", so it would be better to use those words than the singular "a statistical distribution".
- Yes, this must be wrong and it's confusing. It means you have to look somewhere else for the actual definition (outside of Wikipedia). 220.127.116.11 (talk) 03:11, 15 December 2015 (UTC)
- Hello. It is a distribution, but over an infinite-dimensional space, which makes it rather different from more common distributions, such as the Gaussian distribution. I think the term "distribution" is more misleading than helpful here, so I have replaced it with plain "statistical model", since the text does then go on to define a GP. I hope that helps. — Preceding unsigned comment added by Winterstein (talk • contribs) 09:09, 11 June 2016 (UTC)
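For readers trying to picture "a distribution over functions", a small numerical sketch may help (the kernel and grid below are illustrative assumptions): evaluating a zero-mean GP prior at finitely many inputs gives an ordinary multivariate Gaussian, and each draw from that Gaussian is one sampled "function" on the grid.

```python
import numpy as np

# Each draw below is one "function" sampled from a zero-mean GP prior,
# evaluated on a finite grid; the kernel and grid are illustrative choices.
rng = np.random.default_rng(1)
x = np.linspace(-3.0, 3.0, 100)

# Squared-exponential kernel with unit length-scale.
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)

# Diagonal jitter keeps the Cholesky factorisation numerically stable.
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(x)))

# Three prior sample paths, one per row.
paths = (L @ rng.standard_normal((len(x), 3))).T
```

Refining the grid gives draws at more and more inputs, which is the practical sense in which the underlying object is infinite-dimensional.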
Lazy learning and Optimization
Winterstein, I noticed the addition to the page relating GPs to lazy learning and noting that they are usually fitted with optimization software. While I appreciate that your experience may have given you this practical insight, I am not sure that this is beneficial to someone trying to understand what a GP is.
Regarding lazy learning, I am not familiar enough with the concept to be able to tell if it applies here, but from the short Wikipedia article and your blog I can see how it would apply to a GP used for kriging.
Regarding optimization software, what is really necessary is some matrix algebra, which includes a matrix inversion, to get the posterior mean (if you want a single value estimate) and some more to get the posterior variance if you want that too. While in certain cases (large matrices, etc.) optimization software may be used to find these, it is not something fundamental to the process that one reading this article would need to know about.
Finally, it can only be viewed as a machine learning algorithm when used for prediction (kriging), as you mention, so overall I think your comments would be more at home in the Applications section. It might also be more appropriate to give actual sources than a blog entry, despite how impressive your background is. Thank you. Webdrone (talk) 17:38, 7 June 2016 (UTC)
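The matrix algebra referred to above can be sketched in a few lines (the noise-free toy data and squared-exponential kernel are assumptions for illustration): the posterior mean and variance come from linear solves, with no optimisation involved.

```python
import numpy as np

def rbf(a, b, ell=1.0):
    """Squared-exponential kernel matrix between input vectors a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

# Toy noise-free observations (illustrative assumption).
x_train = np.array([-2.0, 0.0, 1.5])
y_train = np.sin(x_train)
x_test = np.linspace(-3.0, 3.0, 61)

K = rbf(x_train, x_train) + 1e-9 * np.eye(len(x_train))  # jitter for stability
K_s = rbf(x_test, x_train)

# Posterior mean and covariance via linear solves -- matrix algebra only.
alpha = np.linalg.solve(K, y_train)
post_mean = K_s @ alpha
post_cov = rbf(x_test, x_test) - K_s @ np.linalg.solve(K, K_s.T)
post_var = np.diag(post_cov)
```

With zero observation noise the posterior mean interpolates the data and the posterior variance collapses at the training inputs, matching the kriging behaviour discussed above.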
- Actually it would be a great help if you could help fix the very first sentence which reads "[...] a Gaussian process is a statistical distribution, [...]". Webdrone (talk) 17:42, 7 June 2016 (UTC)
- Hello Webdrone. Thank you for your thoughtful comments.
I think it is appropriate that the overview section should include notes on the uses of a technique as well as the technical definition -- otherwise it isn't an overview. Also, we'd like the overview to be readable by a range of people. As it was, the overview was not accessible to anyone other than probability theorists. Making it a little more accessible to the machine learning community is a good thing. I think there is more work to be done making this article accessible, both within these communities and to more communities, but I do believe my addition helps.
I also think that the infinite-dimensional distribution-based phrasing is a challenging way to introduce new people to this model (especially for the majority of those who use statistical methods but have not studied e.g. Hilbert spaces). Giving people a couple of ways to get their head around these ideas can only help.
Regarding the mention of "using optimisation software" -- thank you for the observation about matrix algebra being enough. Optimisation software is needed if you use a parameterised kernel (which opens up a wider range of applications beyond "traditional" kriging). I will amend the text now to give both.
Regarding sources for a paragraph that is an aid towards understanding -- academic papers go straight to the technical definitions by their very nature, and I don't know of a GP textbook yet which has an introduction for non-probability-theorists. Blog posts are the "natural" source for this kind of material. If you know of a better source, please do put one in. I don't think it would be appropriate to fully expand this paragraph within this article, as the explanation-for-machine-learning-people would then somewhat swamp the important technical matter.
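To make the optimisation point concrete, here is a hedged sketch (toy data, a fixed noise level, and a crude grid search standing in for real optimisation software are all illustrative assumptions): with a parameterised kernel, the length-scale is chosen by minimising the negative log marginal likelihood of the data.

```python
import numpy as np

# Toy data and noise level are illustrative assumptions.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 5.0, 30)
y = np.sin(x) + 0.1 * rng.standard_normal(30)

def neg_log_marginal_likelihood(log_ell):
    """NLML of the data under an RBF kernel with length-scale exp(log_ell)."""
    ell = np.exp(log_ell)
    K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ell ** 2)
    K += 0.1 ** 2 * np.eye(len(x))  # known observation-noise variance
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (0.5 * y @ alpha + np.sum(np.log(np.diag(L)))
            + 0.5 * len(x) * np.log(2.0 * np.pi))

# Crude grid search over log length-scales; real code would hand this
# objective to a gradient-based optimiser instead.
log_ells = np.linspace(-2.0, 2.0, 81)
nlml = [neg_log_marginal_likelihood(v) for v in log_ells]
ell_hat = float(np.exp(log_ells[int(np.argmin(nlml))]))
```

The fitted length-scale lands at an interior optimum: too-small values over-fit the noise and too-large values cannot track the signal, and the marginal likelihood penalises both.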
- Hello WebDrone. Re. the first sentence -- I agree it could use work, but I can't think of a good re-phrasing. I've replaced the "stats distribution" phrase -- which other people have also complained about (see above) with the less confusing (if also less meaningful) phrase "stats model". — Preceding unsigned comment added by Winterstein (talk • contribs) 09:05, 11 June 2016 (UTC)
- Winterstein, I guess you are right, including your comments might make it more accessible to people from different backgrounds. I hope we're improving the article -- it annoys me that it's not well-written, but I'm not sure how to improve it.
- As for the infinite-dimensionality explanation, I feel like alternative explanations are always missing something. I come from physics where Hilbert spaces are often used so maybe that's why. Do you think an explanation along the following lines might help a reader visualise the infinite dimensionality setting?
- "The function (f(x)=y) can be thought to exist as a single point in a (infinite-dimensional) space where each point x in the function's domain is a separate dimension in this new space. Values of y associated with each x point are coordinates of the function in that x dimension; think of f(x)=y as a very long vector, with an element for each possible x value -- since x is continuous it has infinite possible values and so the vector is infinitely long. We define a covariance kernel which relates an x dimension to another, and use it along with a mean function (m(x) which is usually taken to be 0) to set a multi-variate Gaussian prior over the infinite-dimensional space. We can then consider a set of observations (x, y) to be jointly Gaussian with non-observed points (x*, y*) with mean and covariance given by our prior. Conditioning on the observations, we can create a posterior Gaussian for y*|y, with a new mean and covariance which takes into account given points. Sampling points from this multi-variate Gaussian posterior gives possible functions which satisfy our conditions. Alternatively, just the posterior mean can be used as the MAP estimate of the function, with the new covariance used to find the uncertainty for each dimension (x value). In case of zero noise assumed for observed values (y), the new mean will go through the y values with 0 posterior variance (uncertainty), for the associated x dimensions."
- Webdrone (talk) 19:30, 18 June 2016 (UTC)
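Webdrone's construction above can be illustrated numerically end to end (the grid, kernel, and data below are illustrative assumptions): treat observed and unobserved grid points as jointly Gaussian under the prior, condition on the observations, and sample posterior functions.

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf(a, b):
    """Squared-exponential kernel with unit length-scale (illustrative)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

# Toy noise-free observations on a few x dimensions.
x_obs = np.array([-1.0, 0.5, 2.0])
y_obs = np.array([0.2, -0.3, 0.8])
x_star = np.linspace(-3.0, 3.0, 121)  # the non-observed grid points

K = rbf(x_obs, x_obs) + 1e-10 * np.eye(len(x_obs))  # jitter
K_s = rbf(x_star, x_obs)

# Condition the joint Gaussian on the observations.
mu = K_s @ np.linalg.solve(K, y_obs)
cov = rbf(x_star, x_star) - K_s @ np.linalg.solve(K, K_s.T)

# Five posterior sample paths; with zero assumed noise each passes
# through the observations (up to the numerical jitter).
L = np.linalg.cholesky(cov + 1e-6 * np.eye(len(x_star)))
samples = mu[None, :] + (L @ rng.standard_normal((len(x_star), 5))).T
```

Each row of `samples` is one function consistent with the data, and `mu` alone gives the single-value (MAP) estimate described in the proposed text.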