Talk:Non-linear least squares
|WikiProject Statistics||(Rated Start-class, High-importance)|
Least squares: implementation of proposal
- Least squares has been revised and expanded. That article should serve as an introduction and overview to the two subsidiary articles, which contain more technical details, but it has sufficient detail to stand on its own.
In addition, Gauss-Newton algorithm has been revised. The earlier article contained a serious error regarding the validity of setting second derivatives to zero. Points to note include:
- Adoption of a standard notation in all four articles mentioned above. This makes for easy cross-referencing. The notation also agrees with many of the articles on regression.
- New navigation template
- Weighted least squares should be deleted. The first section is adequately covered in Linear least squares and Non-linear least squares. The second section (Linear Algebraic Derivation) is rubbish.
This completes the first phase of restructuring of the topic of least squares analysis. From now on I envisage only minor revision of related articles. May I suggest that comments relating to more than one article be posted on talk: least squares and that comments relating to a specific article be posted on the talk page of that article. This note is being posted on all four talk pages and Wikipedia talk:WikiProject Mathematics.
The article suggests that Cholesky Decomposition is described (in context, in a usable way) in the Linear Least Squares article. But it is not, as that is just a links page, nor does any of the sub-topics there describe Cholesky Decomposition in the context of its use in Least Squares. The direct article on Cholesky Decomposition does not deal (as I read it) with the radically non-square matrix issues that the Least Squares method presents. (Of course they could be treated as 0 elements to extend to square, but this article states that its use is the same as in the other article---it is not.) (18.104.22.168 (talk) 05:19, 5 January 2011 (UTC))
- I have fixed the (one) link to Linear Least Squares. But what "radically non-square matrix issues" are there here? The matrix to be inverted here is square and symmetric: JᵀJ. Melcombe (talk) 11:22, 5 January 2011 (UTC)
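To make Melcombe's point concrete: even when the Jacobian J is tall (m observations × n parameters, m > n), the Gauss-Newton normal equations involve only the n×n symmetric positive-definite matrix JᵀJ, which is exactly what Cholesky factorization handles. The sketch below is an illustration of that idea, not code from either article; the function names are my own.

```python
import math

def cholesky(A):
    """Cholesky factor L of a symmetric positive-definite matrix A (A = L·Lᵀ)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_normal_equations(J, r):
    """Solve (JᵀJ)x = Jᵀr. J may be tall (m×n, m > n); JᵀJ is always n×n."""
    m, n = len(J), len(J[0])
    JtJ = [[sum(J[k][i] * J[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    Jtr = [sum(J[k][i] * r[k] for k in range(m)) for i in range(n)]
    L = cholesky(JtJ)
    # forward substitution: L y = Jᵀr
    y = [0.0] * n
    for i in range(n):
        y[i] = (Jtr[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    # back substitution: Lᵀ x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x
```

With the 4×2 design matrix of a straight-line fit to the points (1,6), (2,5), (3,7), (4,10), this returns (3.5, 1.4), the familiar linear-least-squares answer, even though J itself is radically non-square.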
In the section "Multiple minima", the passage
- "For example, the model
- has a local minimum at one value of β and a global minimum at β = −3."
appears dubious. We can't find the minima for beta unless we know some data; different data will give different locations of the minima. Maybe the source gave some data for which this result obtains. But unless that data set is very small, it would be pointless to put the data into this passage for the sake of keeping the example. Therefore I would recommend simply deleting the example. The assertion that it attempts to exemplify, that squares of nonlinear functions can have multiple extrema, is so obvious to anyone who has read this far in the article that no illustrative example is necessary. Duoduoduo (talk) 21:56, 6 February 2011 (UTC)
Seeing no objection, I'm removing the example. Duoduoduo 16:56, 8 February 2011 (UTC)
An explicit example in the article's introductory section
Coming from "polynomial regression" I'm a bit confused about the difference between polynomial (= multiple linear) and nonlinear regression. As I understand it, in polynomial regression y is likewise a function of x and a set of parameters. So what is the difference between the function formally referenced here, y = f(x, beta), and that of polynomial regression, which would be of the same form y = f(x, b)? One simple example where f(x, beta) is written out explicitly would be great. (Further down in the article there is something with the exp function, but I'm unsure how I would insert that here in the introductory part before the heavy-weight formulae that follow it directly, or whether I have the correct reading at all.) Perhaps just the "most simple nonlinear function" as an example, given in the same explicit form as in "polynomial regression" (y=a+bx+???), would be good...
Oops, I didn't sign my comment... --Gotti 19:57, 7 August 2011 (UTC)
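The distinction Gotti asks about is whether the model is linear in the *parameters*, not in x. A polynomial model y = a + bx² is nonlinear in x yet linear in (a, b), so one linear solve suffices; y = β₁·exp(β₂x) is nonlinear in β₂ and needs iteration. The sketch below illustrates this with my own minimal helpers (the function names and the Gauss-Newton loop are illustrative, not from the article):

```python
import math

def solve2(A, b):
    # 2x2 linear solve by Cramer's rule
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

def fit_linear_in_params(basis, xs, ys):
    """One-shot least squares for any model LINEAR in its two parameters:
    y = b1*basis[0](x) + b2*basis[1](x). Polynomial regression is this case,
    however nonlinear the basis functions are in x."""
    m = len(xs)
    cols = [[f(x) for x in xs] for f in basis]
    A = [[sum(cols[i][k] * cols[j][k] for k in range(m)) for j in range(2)] for i in range(2)]
    b = [sum(cols[i][k] * ys[k] for k in range(m)) for i in range(2)]
    return solve2(A, b)

def fit_exponential(xs, ys, b1=1.5, b2=0.4, iters=50):
    """Gauss-Newton for y = b1*exp(b2*x): b2 enters nonlinearly, so the normal
    equations are transcendental and must be solved iteratively, starting from
    a reasonable initial guess (b1, b2)."""
    for _ in range(iters):
        r = [y - b1 * math.exp(b2 * x) for x, y in zip(xs, ys)]
        J = [[math.exp(b2 * x), b1 * x * math.exp(b2 * x)] for x in xs]
        m = len(xs)
        A = [[sum(J[k][i] * J[k][j] for k in range(m)) for j in range(2)] for i in range(2)]
        g = [sum(J[k][i] * r[k] for k in range(m)) for i in range(2)]
        d = solve2(A, g)
        b1, b2 = b1 + d[0], b2 + d[1]
    return b1, b2
```

Fitting y = a + bx² takes one call to `fit_linear_in_params` with basis functions 1 and x², while the exponential fit loops; that one-solve-versus-iterate difference is exactly the boundary between polynomial regression and nonlinear least squares.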
Symbol for fraction parameter for shift-cutting
Using f for the fraction parameter for the shift-cutting is a bad choice IMHO, as f is already used for the function whose parameters are to be determined/fitted. I think a better symbol here would be 'alpha', which isn't already used in this article, and which is used for exactly the same purpose in the Gauss-Newton article (and in my experience also in quite a lot of the optimisation/root-finding literature). (ezander) 22.214.171.124 (talk) 10:14, 24 October 2011 (UTC)
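For context, shift-cutting with the proposed α looks like the following sketch: take the computed shift Δβ, and if the full step does not reduce the sum of squares S, retry with the fraction α halved. This is my own illustration of the procedure the article describes, not code from it.

```python
def shift_cut_step(S, beta, delta, alpha0=1.0, max_halvings=10):
    """Shift-cutting: try beta + alpha*delta, halving the fraction alpha
    until the sum of squares S decreases (alpha is the symbol the comment
    proposes in place of f)."""
    S0 = S(beta)
    alpha = alpha0
    for _ in range(max_halvings):
        trial = [b + alpha * d for b, d in zip(beta, delta)]
        if S(trial) < S0:
            return trial, alpha
        alpha /= 2.0
    return beta, 0.0  # no improving fraction found; keep the old point
```

For example, with S(β) = β₁² + β₂², starting point (2, 2), and an overshooting shift (−5, −5), the full step increases S, so the routine accepts the half step at α = 0.5.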
Excellent article -- except one section
Whoever on wikipedia has contributed to this article, congrats, it's great.
But... it'd be excellent if there were more information on "Parameter errors, confidence limits, residuals etc." rather than just referring back to linear least squares. At least from what I read, there are subtle differences in the assumptions of NLLS versus OLS about local linearity, local minima, etc. that need to be considered and would be appropriate to include here. Not an expert, so I can't do it myself, sorry.
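The usual practice the comment alludes to is to carry the linear theory over by approximation: cov(β) ≈ s²(JᵀJ)⁻¹ with the Jacobian J evaluated at the converged parameters, which is only as trustworthy as the local-linearity assumption behind it. A minimal sketch for the two-parameter case (my own helper, purely illustrative):

```python
def approx_std_errors(J, r):
    """Linear-approximation standard errors for a 2-parameter NLLS fit:
    cov(beta) ~ s^2 * (J^T J)^-1 with J evaluated at the converged parameters.
    This inherits the local-linearity caveat discussed above: far from
    linearity, these error bars can be badly misleading."""
    m, n = len(J), 2
    s2 = sum(ri * ri for ri in r) / (m - n)  # residual variance estimate
    A = [[sum(J[k][i] * J[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    inv_diag = [A[1][1] / det, A[0][0] / det]  # diagonal of (J^T J)^-1
    return [(s2 * v) ** 0.5 for v in inv_diag]
```

The returned values are the standard errors of the two parameters; confidence limits then follow from the t-distribution with m − n degrees of freedom, again only to linear approximation.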
Conjugate gradient method
"Conjugate gradient search. This is an improved steepest descent based method with good theoretical convergence properties, although it can fail on finite-precision digital computers even when used on quadratic problems. M.J.D. Powell, Computer Journal, (1964), 7, 155."
This method is robust on finite-precision digital computers. See the next reference, p. 32 (Convergence Analysis of Conjugate Gradients): "Because of this loss of conjugacy, the mathematical community discarded CG during the 1960s, and interest only resurged when evidence for its effectiveness as an iterative procedure was published in the seventies."
An Introduction to the Conjugate Gradient Method Without the Agonizing Pain. Jonathan Richard Shewchuk http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf — Preceding unsigned comment added by 126.96.36.199 (talk) 22:10, 19 December 2011 (UTC)
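For reference in this dispute, the method under discussion is short enough to state in full. Below is a minimal conjugate gradient solver for Ax = b with A symmetric positive definite, i.e. the quadratic problems the disputed sentence refers to; this is a textbook sketch following the structure in Shewchuk's notes, not code from either source.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Minimal conjugate gradient for A x = b, A symmetric positive definite.
    In exact arithmetic it terminates in at most n steps; in floating point
    conjugacy degrades, but used as an iterative method (the point of the
    Shewchuk quote) it still converges in practice."""
    n = len(b)
    x = [0.0] * n
    r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    d = r[:]
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs < tol * tol:
            break
        Ad = [sum(A[i][j] * d[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(d[i] * Ad[i] for i in range(n))
        x = [x[i] + alpha * d[i] for i in range(n)]
        r = [r[i] - alpha * Ad[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        d = [r[i] + (rs_new / rs) * d[i] for i in range(n)]
        rs = rs_new
    return x
```

On the 2×2 system A = [[4, 1], [1, 3]], b = [1, 2] (the worked example in Shewchuk's notes), this converges to (1/11, 7/11) in two steps.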
minimizing error using all known variables
I have posted a comment on the Pearson's chi square page, somewhat related to the least squares method, but my idea is to start with an equation that fits all known values (the error/sum of squares is zero, as the equation f(x,y)=0 passes through every known point) and then solve for minimal error on the unknown y for a given x. I wonder if there is any work in this direction? http://en.wikipedia.org/wiki/Talk:Least_squares -Alok 23:09, 26 January 2013 (UTC)
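If I read the comment right, the zero-residual construction it describes is exact interpolation rather than least squares: through n points there is a unique polynomial of degree ≤ n−1 with sum of squares identically zero, which can then be evaluated at a new x. A hedged sketch of that construction via the Lagrange form (my own illustration of what the comment seems to propose; note that with noisy data this overfits, which is precisely why least squares does not insist on zero residuals):

```python
def lagrange_predict(xs, ys, x):
    """Evaluate at x the unique polynomial of degree <= n-1 passing exactly
    through every known point (so the sum of squared residuals on the known
    data is identically zero) -- the construction the comment asks about."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total
```

For instance, the points (0,1), (1,3), (2,7) lie on y = x² + x + 1, and the interpolant reproduces that polynomial, predicting 13 at x = 3.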
Difference from Linear Least Squares
Consider the example from the LLS page:
There are four data points: (1, 6), (2, 5), (3, 7), and (4, 10). In the LLS example, the model was a line, y = β₁ + β₂x. However, if we take a model that is nonlinear in the parameter, e.g. , the procedure still seems to work (without any iterative method):
Then we can form the sum of squares of the residuals, compute its partial derivative with respect to the parameter, and set it to zero,
and solve to obtain the parameter value that minimizes the residual. What is wrong with this? That is, why can you not do this/why are you required to use a NLLS method for a model like this? daviddoria (talk) 15:12, 14 February 2013 (UTC)
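The particular model in the question has been lost from this copy, but the distinction it probes can be shown with the same four data points. The two models below are stand-ins of my own choosing: y = βx² is nonlinear in x yet linear in β, so ∂S/∂β = 0 is a linear equation with a closed form; y = exp(βx) is nonlinear in β, so ∂S/∂β = 0 is transcendental and must be solved numerically (here by bisection).

```python
import math

xs, ys = [1, 2, 3, 4], [6, 5, 7, 10]  # the four data points from the LLS example

# Model y = beta*x^2: nonlinear in x but LINEAR in beta, so dS/dbeta = 0 is a
# linear equation with the closed form beta = sum(x^2 y) / sum(x^4).
beta_closed = sum(x * x * y for x, y in zip(xs, ys)) / sum(x ** 4 for x in xs)

# Model y = exp(beta*x): nonlinear in beta, so dS/dbeta = 0, i.e.
# sum((y - e^{beta x}) * x * e^{beta x}) = 0, is transcendental.
def dS(b):
    return sum((y - math.exp(b * x)) * x * math.exp(b * x) for x, y in zip(xs, ys))

# dS changes sign on [0, 1] for this data, so bisect for the root.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if dS(mid) > 0:
        lo = mid
    else:
        hi = mid
beta_iter = (lo + hi) / 2
```

So a closed-form solve is possible precisely when the derivative equations come out linear in the parameters; if the questioner's model had that property, it was a linear least squares problem in disguise, however nonlinear it looked in x. When a parameter appears inside a nonlinearity, as in the exponential model, some iterative (NLLS) method is unavoidable.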
The question you have asked is very similar to the one I posted right above. I can get a perfect fit for the known data set and then minimize for the unknown point. I too think the nonlinear case needs some clarification. -Alok 17:18, 17 February 2013 (UTC) — Preceding unsigned comment added by Alokdube (talk • contribs)