# Talk:Function approximation

WikiProject Mathematics (Rated Start-class, Mid-priority)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating:
 Start Class
 Mid Priority
Field: Applied mathematics
WikiProject Statistics (Rated Stub-class, Mid-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as Stub-Class on the quality scale.
This article has been rated as Mid-importance on the importance scale.
WikiProject Robotics (Rated Start-class, Mid-importance)
Function approximation is within the scope of WikiProject Robotics, which aims to build a comprehensive and detailed guide to Robotics on Wikipedia. If you would like to participate, you can choose to edit this article, or visit the project page (Talk), where you can join the project and see a list of open tasks.
This article has been rated as Start-Class on the project's quality scale.
This article has been rated as Mid-importance on the project's importance scale.

## Untitled

Function approximation is a general class of problem solving where one tries to approximate an unknown function from a labeled data set (X, y).
....
Mathematically the problem can be posed as:
$\min_{w} \|Xw - y\|^2.$

I do not understand the above. What does "labeled" mean? Is the "data set" simply a finite collection of ordered pairs of numbers? The part following the words "the problem can be posed as" makes no sense at all. If the author of these words or anyone else has any idea what was meant, could they make some attempt to explain it in the article? Michael Hardy 04:22, 27 Nov 2004 (UTC)
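One common reading of $\min_{w} \|Xw - y\|^2$, offered here as an assumption about what the quoted text intended, is ordinary linear least squares. Each row of $X$ holds one input example, $y$ holds the corresponding observed values (the "labels"), and $w$ is the weight vector being fitted, so $\|Xw - y\|^2$ is the sum of squared residuals. The minimal pure-Python sketch below solves the normal equations $X^\top X w = X^\top y$; the function names are hypothetical, and it assumes $X^\top X$ is invertible.

```python
# Sketch (assumption): min_w ||Xw - y||^2 read as linear least squares,
# solved through the normal equations X^T X w = X^T y.
# X is a list of rows, one row per labeled example; y is the list of labels.

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, b):
    """Solve the square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w

def least_squares(X, y):
    """Return the w minimizing the sum of squared residuals ||Xw - y||^2."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)

def sum_squared_residuals(X, y, w):
    return sum((sum(xj * wj for xj, wj in zip(row, w)) - yi) ** 2 for row, yi in zip(X, y))
```

For example, fitting a line $y \approx w_0 + w_1 t$ at $t = 0, 1, 2, 3$ uses `X = [[1, 0], [1, 1], [1, 2], [1, 3]]` (the constant first column supplies the intercept); with `y = [1, 3, 5, 7]`, `least_squares` returns weights close to `[1, 2]`.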

Function approximation is used in the field of supervised learning and statistical curve fitting. The goal of this problem is to find a function $f^{*}$ within the family of functions:
$\{f: \Bbb{R}^{n} \to \Bbb{R} \}$
that minimizes the expected error defined as:
$\sum^{T}_{i=0} P(x_{i})L(x_{i}, f)$
Where $P(x_{i})$ is the probability that the example $x_{i}$ will be sampled and $L(x_{i}, f)$ is the loss function which describes how accurate the function f is at predicting the correct value for a given $x_{i}.\,$
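To make the quoted definition concrete, the sketch below assumes a finite input set $x_0, \dots, x_T$ with sampling probabilities $P(x_i)$ and a squared loss $L(x_i, f) = (f(x_i) - t(x_i))^2$, where $t$ is a hypothetical target function supplying the "correct value". For contrast it also computes the empirical risk, the average squared loss over an observed labeled sample; ordinary least squares minimizes the sum of squared residuals, which is the sample size times this quantity. All names are illustrative assumptions.

```python
# Sketch (assumptions): finite input set with sampling probabilities, squared loss,
# and a hypothetical target function giving the correct value for each input.

def squared_loss(prediction, correct):
    """L for one input: squared difference between prediction and correct value."""
    return (prediction - correct) ** 2

def expected_error(xs, probs, f, target):
    """sum_i P(x_i) * L(x_i, f) over the finite input set xs."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * squared_loss(f(x), target(x)) for x, p in zip(xs, probs))

def empirical_risk(pairs, f):
    """Average squared loss over observed (input, label) pairs."""
    return sum(squared_loss(f(x), y) for x, y in pairs) / len(pairs)
```

With inputs `[0, 1, 2]`, probabilities `[0.5, 0.25, 0.25]`, target $t(x) = 2x$ and candidate $f(x) = x$, the expected error is $0.25 \cdot 1 + 0.25 \cdot 4 = 1.25$. The empirical risk on the sample `[(0, 0), (0, 0), (1, 2), (2, 4)]` is also 1.25, but only because that sample's frequencies happen to match $P$; a different sample gives a different value.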

The reference to the probability that xi will be sampled means that the randomness resides in the independent variable and not in its image under f. That is strange when talking about curve-fitting. It also implies that the probability distribution is discrete. Also very strange. In the expression L(xi, f) it is not clear whether the f is supposed to be the unobservable true f or some fitted estimate of it given the data. "... how accurate the function f is at predicting the correct value" implies f is an estimate, but speaking of L as a loss function would normally presuppose that f is the true value. The article then goes on to say that ordinary least squares estimation is an example. Ordinary least squares minimizes the sum of squares of residuals; it does not minimize an expected loss.

Yes, that definition is not that great; I'll try to replace it with a better one when I have time. Also, I think that the loss function in this sense is defined differently. Nvrmnd 03:07, 29 Nov 2004 (UTC)

A simple example of linear function approximation posed as an optimization problem is:
$\min_{w} \|Xw - y\|^2.$

I still find the above incomprehensible as written. I've put a guess as to what it means on the discussion page of the person who wrote the above. But one shouldn't have to guess.

If this is supposed to be about curve-fitting as usually practiced, it fails to explain it. Michael Hardy 02:28, 29 Nov 2004 (UTC)

I rewrote the introduction leading up to what I think the main point (of a recent version) of the article was. However, I'm not convinced that the title "function approximation" is appropriate anymore (or that it ever was appropriate, for that matter). Currently, it is essentially about empirical risk minimization, a concept which doesn't have its own article yet. I don't think we need another article about regression analysis, curve fitting, or approximation theory, so maybe simply moving the present article to empirical risk minimization or merging it with one of the statistical learning theory articles could be a satisfactory solution. --MarkSweep 16:08, 1 Dec 2004 (UTC)

How about merging into supervised learning? -- hike395 18:17, 31 Jan 2005 (UTC)
Possibly, except that supervised learning is a different concept, and in some sense more general. Supervised learning is not necessarily about global function approximations in the sense defined here. The case/instance/memory-based learning scenario is supervised, but there one doesn't attempt to infer global generalizations (one could argue that it is in fact the same, but the term "function approximation" is not widely used to describe memory-based learning). It would also be good to have a brief article called "function approximation" that clarifies that the term may mean different things to people working in approximation theory vs. machine learning. --MarkSweep 21:19, 31 Jan 2005 (UTC)
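As an illustration of the memory-based scenario described above, here is a minimal 1-nearest-neighbour sketch. The choice of 1-nearest-neighbour and Euclidean distance is an assumption; the comment does not name a specific method. No global function or weight vector is fitted: the stored training set is the model, and each prediction is the label of the closest stored input.

```python
import math

# Sketch (assumption): 1-nearest-neighbour as an example of memory-based learning.
# train is a list of (input_vector, label) pairs; inputs are equal-length tuples of floats.

def nearest_neighbor_predict(train, x):
    """Return the label of the stored input closest to x (Euclidean distance)."""
    if not train:
        raise ValueError("training set is empty")
    # min() keeps the first pair on ties, so earlier stored examples win.
    _, label = min(train, key=lambda pair: math.dist(pair[0], x))
    return label
```

Unlike the least-squares fit, where prediction needs only the learned weights, every prediction here consults the stored examples, which is the sense in which no global generalization is inferred. `math.dist` requires Python 3.8 or later.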
What you're proposing sounds very clear and good to me. As I understand your proposal, we should...
1. Leave a short article here that describes the difference between approximation theory (creating convergent series of functions, e.g.) and machine learning (creating functions from samples). The machine learning part should point to supervised learning.
2. Make sure that supervised learning article starts out very generally, to include methods such as nearest-neighbor and local model fitting. We can distinguish between global model fitting and local model fitting over there.
3. Move this empirical risk minimization stuff over to supervised learning, in its own section. It may be a good idea to put the new section after the list of supervised learning methods, although that list is so long that perhaps no one would read the new section.
Did I grasp your idea correctly, or would you like to do something else? -- hike395 04:41, 1 Feb 2005 (UTC)
Yes, exactly regarding the first and second point. The first point in particular has caused some confusion, both here and elsewhere, so a short summary should be kept here. As for empirical risk minimization, we don't seem to have a separate article for that yet. Either start a new article, or integrate it with something like statistical learning theory a.k.a. computational learning theory. --MarkSweep 06:07, 1 Feb 2005 (UTC)
Another thought: I think that the example section of this article should be sent over to regression analysis: it's such a specific example that it won't illustrate supervised learning well. Sending the example over to regression will make the new material fit better at supervised learning, I think. -- hike395 05:32, 1 Feb 2005 (UTC)
Right, the example section is a remnant from an older version of this article. I didn't feel like deleting it altogether, but it is a bit out of place now. --MarkSweep 06:07, 1 Feb 2005 (UTC)
Done: sent material to regression analysis and supervised learning, kept intro material here, redir from empirical risk minimization back to supervised learning. Feel free to edit. hike395 06:56, 1 Feb 2005 (UTC)