# Talk:Non-linear least squares

WikiProject Statistics (Rated Start-class, High-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as Start-Class on the quality scale.
This article has been rated as High-importance on the importance scale.

## Maybe merge it with Gauss–Newton method?

I think that this is a great article, but after reading some related entries I have the feeling that some of the information is redundant and sometimes not organized in an accessible way. For example, this article covers weighted nonlinear least squares, but the Gauss–Newton article says nothing about it. The same problem affects the article on the Levenberg–Marquardt algorithm. It might be very useful to merge at least some of this information into a single large article rather than have redundant entries in a bunch of articles (which makes navigation much harder) :-) — Preceding unsigned comment added by 129.132.224.85 (talk) 12:05, 10 November 2014 (UTC)

## Least squares: implementation of proposal

which contain more technical details, but it has sufficient detail to stand on its own.

In addition, Gauss-Newton algorithm has been revised. The earlier article contained a serious error regarding the validity of setting second derivatives to zero. Points to notice include:

• Adoption of a standard notation in all four articles mentioned above. This makes for easy cross-referencing. The notation also agrees with many of the articles on regression.
• Weighted least squares should be deleted. The first section is adequately covered in Linear least squares and Non-linear least squares. The second section (Linear Algebraic Derivation) is rubbish.

This completes the first phase of restructuring of the topic of least squares analysis. From now on I envisage only minor revision of related articles. May I suggest that comments relating to more than one article be posted on talk: least squares and that comments relating to a specific article be posted on the talk page of that article. This note is being posted on all four talk pages and Wikipedia talk:WikiProject Mathematics.

Petergans (talk) 09:36, 8 February 2008 (UTC)

## Cholesky Decomposition not in linked Linear Least Squares article

The article suggests that Cholesky Decomposition is described, in a usable way for this context, in the Linear Least Squares article. But it is not: that is just a links page, and none of the sub-subjects there describe Cholesky Decomposition as used in Least Squares. The direct article on Cholesky Decomposition does not deal (as I read it) with the radically non-square matrix issues that Least Squares methods present. (Of course the matrix could be extended to square with 0 elements, but this article says that its use is the same as in the other article, and it is not.) (74.222.193.102 (talk) 05:19, 5 January 2011 (UTC))

I have fixed the (one) link to Linear Least Squares. But what "radically non-square matrix issues" are there here? The matrix to be inverted here is square and symmetric: ${\displaystyle \left(J^{T}WJ\right)}$. Melcombe (talk) 11:22, 5 January 2011 (UTC)
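As a minimal sketch of the point above (not taken from the article): the Jacobian J may have many more rows than columns, but the matrix handed to the Cholesky decomposition is the small square symmetric product J^T W J. The function names and the 4×2 example data are assumptions chosen for illustration, using a straight-line model so the answer can be checked by hand.

```python
import math


def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_normal_equations(J, w, r):
    """Solve (J^T W J) d = J^T W r with W = diag(w), via Cholesky."""
    m, n = len(J), len(J[0])
    # n x n normal matrix: square and symmetric even though J is m x n.
    A = [[sum(J[i][a] * w[i] * J[i][b] for i in range(m)) for b in range(n)]
         for a in range(n)]
    g = [sum(J[i][a] * w[i] * r[i] for i in range(m)) for a in range(n)]
    L = cholesky(A)
    # Forward substitution: L z = g
    z = [0.0] * n
    for i in range(n):
        z[i] = (g[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T d = z
    d = [0.0] * n
    for i in reversed(range(n)):
        d[i] = (z[i] - sum(L[k][i] * d[k] for k in range(i + 1, n))) / L[i][i]
    return d


# Hypothetical example: model y = b1 + b2*x at x = 1..4, so J is 4 x 2.
J = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
w = [1.0, 1.0, 1.0, 1.0]          # unit weights (assumption)
r = [6.0, 5.0, 7.0, 10.0]         # residuals at the starting guess b = (0, 0)
d = solve_normal_equations(J, w, r)
```

Here the decomposition only ever sees a 2×2 matrix. In a non-linear problem the same solve would be repeated at each iteration with J and r re-evaluated at the current parameters.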

## Dubious example

In the section "Multiple minima", the passage

"For example, the model
${\displaystyle f(x_{i},\beta )=\left(1-3\beta +\beta ^{3}\right)x_{i}}$
has a local minimum at ${\displaystyle \beta \,=1}$ and a global minimum at ${\displaystyle {\hat {\beta }}\,}$ = −3.[6]"

appears dubious. We can't find the minima for beta unless we know some data; different data will give different locations of the minima. Maybe the source [6] gave some data for which this result obtains. But unless that data set is very small, it would be pointless to put the data into this passage for the sake of keeping the example. Therefore I would recommend simply deleting the example. The assertion that it attempts to exemplify, that squares of nonlinear functions can have multiple extrema, is so obvious to anyone who has read this far in the article that no illustrative example is necessary. Duoduoduo (talk) 21:56, 6 February 2011 (UTC)

Seeing no objection, I'm removing the example. Duoduoduo 16:56, 8 February 2011 (UTC)
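For the record, a rough sketch of why the data matter here. The two data sets below are made up for illustration and are not from source [6]; the helper names are likewise assumptions. The script scans a grid of β values and reports where the sum of squares has local minima for each data set.

```python
def g(beta):
    """Coefficient of x in the quoted model f(x, beta) = (1 - 3*beta + beta**3) * x."""
    return 1.0 - 3.0 * beta + beta ** 3


def S(beta, xs, ys):
    """Sum of squared residuals for the quoted model."""
    return sum((y - g(beta) * x) ** 2 for x, y in zip(xs, ys))


def local_minima(xs, ys, lo=-4.0, hi=4.0, n=8001):
    """Grid points where S is strictly lower than at both neighbours."""
    step = (hi - lo) / (n - 1)
    betas = [lo + k * step for k in range(n)]
    vals = [S(b, xs, ys) for b in betas]
    return [round(betas[k], 3) for k in range(1, n - 1)
            if vals[k] < vals[k - 1] and vals[k] < vals[k + 1]]


xs = [1.0, 2.0, 3.0, 4.0]
ys_a = [-17.0 * x for x in xs]   # hypothetical data lying on y = -17x
ys_b = [2.0 * x for x in xs]     # hypothetical data lying on y = 2x

minima_a = local_minima(xs, ys_a)
minima_b = local_minima(xs, ys_b)
```

With the first data set the scan finds minima near β = −3 and β = 1, matching the quoted passage; with the second, β = 1 is no longer a minimum at all. So the quoted locations hold only for particular data.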

## An explicit example in the article's introductory section

Coming from "polynomial regression", I'm a bit confused about the difference between polynomial (= multiple linear) and nonlinear regression. As I understand it, in polynomial regression y is also a vector function of x and a set of parameters. So what is the difference between the function formally referenced here, y = f(x,beta), and that of polynomial regression, which would be of the same form y = f(x,b)? One simple example where f(x,beta) is written out explicitly would be great. (Further down in the article there is something with the exp function, but I'm unsure how I would insert that here in the introductory part, before the heavyweight formulae that follow it directly, or whether I have the correct translation at all.) Perhaps just the "most simple nonlinear function", written in the same explicit form as in "polynomial regression" y=a+bx+???, would be good...

Oops, I didn't sign my comment... --Gotti 19:57, 7 August 2011 (UTC)

## Symbol for fraction parameter for shift-cutting

Using f for the fraction parameter for the shift-cutting is a bad choice IMHO, as f is already used for the function whose parameters are to be determined/fitted. I think a better symbol here would be 'alpha', which isn't already used in this article, and which is used for exactly the same purpose in the Gauss-Newton article (and in my experience also in quite a lot of the optimisation/root finding literature). (ezander) 134.169.77.151 (talk) 10:14, 24 October 2011 (UTC)

## Excellent article -- except one section

But... it'd be excellent if there were more information on "Parameter errors, confidence limits, residuals etc" rather than referring back to linear least squares. At least in what I have read, there are subtle differences and assumptions in NL-LS and OLS about local linearity, local minima, etc. that need to be considered and would be appropriate to include here. Not an expert so I can't do it myself, sorry.

"Conjugate gradient search. This is an improved steepest descent based method with good theoretical convergence properties, although it can fail on finite-precision digital computers even when used on quadratic problems.[7] M.J.D. Powell, Computer Journal, (1964), 7, 155."

This method is robust to finite-precision digital computers. See next reference, p. 32 (Convergence Analysis of Conjugate Gradients) "Because of this loss of conjugacy, the mathematical community discarded CG during the 1960s, and interest only resurged when evidence for its effectiveness as an iterative procedure was published in the seventies."

An Introduction to the Conjugate Gradient Method Without the Agonizing Pain. Jonathan Richard Shewchuk http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf — Preceding unsigned comment added by 198.102.62.250 (talk) 22:10, 19 December 2011 (UTC)

## minimizing error using all known variables

I have posted a comment on the Pearson's chi-squared page, somewhat related to the least squares method, but my assumption is to start with an equation which fits all known values (the error/sum of squares is zero, as the equation f(x,y)=0 passes through every known point) and then solve for minimal error on the unknown y for a given x. I wonder if there is any work in this direction? http://en.wikipedia.org/wiki/Talk:Least_squares -Alok 23:09, 26 January 2013 (UTC)

## Difference from Linear Least Squares

Consider the example from the LLS page:

There are four ${\displaystyle (x,y)}$ data points: ${\displaystyle (1,6),}$ ${\displaystyle (2,5),}$ ${\displaystyle (3,7),}$ and ${\displaystyle (4,10)}$ . In the LLS example, the model was a line ${\displaystyle y=\beta _{1}+\beta _{2}x}$. However, if we take a model that is nonlinear in the parameter, e.g. ${\displaystyle y=e^{\beta _{1}x}}$, the procedure still seems to work (without any iterative method):

${\displaystyle {\begin{alignedat}{2}6&&\;=e^{\beta _{1}}\\5&&\;=e^{2\beta _{1}}\\7&&\;=e^{3\beta _{1}}\\10&&\;=e^{4\beta _{1}}\\\end{alignedat}}}$

Then we can form the sum of squares of the residuals as ${\displaystyle S(\beta _{1})=(6-e^{\beta _{1}})^{2}+(5-e^{2\beta _{1}})^{2}+(7-e^{3\beta _{1}})^{2}+(10-e^{4\beta _{1}})^{2}}$, compute its derivative with respect to ${\displaystyle \beta _{1}}$, and set it to zero: ${\displaystyle {\frac {\partial S}{\partial \beta _{1}}}=0=-12e^{\beta _{1}}-18e^{2\beta _{1}}-42e^{3\beta _{1}}-76e^{4\beta _{1}}+6e^{6\beta _{1}}+8e^{8\beta _{1}}}$

and solve to get ${\displaystyle \beta _{1}=0.59}$ resulting in ${\displaystyle y=e^{.59x}}$ as the function that minimizes the residual. What is wrong with this? That is, why can you not do this/why are you required to use a NLLS method for a model like this? daviddoria (talk) 15:12, 14 February 2013 (UTC)

The question you have asked is very similar to the one I posted right above. I can get a perfect fit for the known data set and then minimize for the unknown point. I too think the non linear case needs some clarifications. -Alok 17:18, 17 February 2013 (UTC) — Preceding unsigned comment added by Alokdube (talkcontribs)

In your example, the iteration is required in order to solve the last non-linear equation (${\displaystyle \textstyle {\frac {\partial S}{\partial \beta _{1}}}=0=-12e^{\beta _{1}}+\cdots }$). If you had started with the linear model ${\displaystyle \textstyle y=\beta _{1}x}$, you would have gotten the linear equation ${\displaystyle \textstyle {\frac {\partial S}{\partial \beta _{1}}}=0=-154+60\beta _{1},}$ which is readily solved. This is pretty well covered in the theory section. In both linear and non-linear least squares, one solves

${\displaystyle {\frac {\partial S}{\partial \beta _{j}}}=2\sum _{i}r_{i}{\frac {\partial r_{i}}{\partial \beta _{j}}}=0\quad (j=1,\ldots ,n).}$

With a linear model this is a simultaneous set of linear equations, but with a non-linear model it is a simultaneous set of non-linear equations. In your example, you are stopping at the point where the non-linear and linear problems require different techniques. Cfn137 (talk) 19:06, 3 February 2016 (UTC)
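To make the contrast concrete, here is a small sketch using the four data points from this thread. The function names and the starting guess of 0.5 are assumptions for illustration. The linear model has a closed-form solution; the exponential model is solved here with a one-parameter Gauss–Newton iteration, which is one iterative option among several.

```python
import math

# The four (x, y) data points quoted in this thread.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [6.0, 5.0, 7.0, 10.0]


def linear_fit():
    """Model y = beta*x: dS/dbeta = -154 + 60*beta = 0 has a closed-form root."""
    sxy = sum(x * y for x, y in zip(xs, ys))   # 77
    sxx = sum(x * x for x in xs)               # 30
    return sxy / sxx


def dS(beta):
    """dS/dbeta for the model y = exp(beta*x); non-linear in beta."""
    return sum(-2.0 * (y - math.exp(beta * x)) * x * math.exp(beta * x)
               for x, y in zip(xs, ys))


def gauss_newton(beta, steps=50, tol=1e-12):
    """One-parameter Gauss-Newton iteration for y = exp(beta*x)."""
    for _ in range(steps):
        r = [y - math.exp(beta * x) for x, y in zip(xs, ys)]   # residuals
        J = [x * math.exp(beta * x) for x in xs]               # df/dbeta
        delta = sum(j * ri for j, ri in zip(J, r)) / sum(j * j for j in J)
        beta += delta
        if abs(delta) < tol:
            break
    return beta


beta_linear = linear_fit()
beta_exp = gauss_newton(0.5)   # starting guess is an assumption
```

The linear case needs no starting value and no loop. The exponential case needs both, and for other models or starting points the iteration is not guaranteed to converge, which is where shift-cutting and the other refinements discussed in the article come in.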

## field surveying may have good examples

An early/historical use of least squares was by field surveyors measuring bearings and distances and leveling heights, to compute survey monument coordinates. They were early adopters and would invert normal equations on paper by hand, over the winter after a summer of field measurements, so folklore has it. Some of the observation equations are non-linear, and with 'triangulation networks' and traverse closure there was usually redundancy in measurements. Statistical calibration of measuring devices would give an input variance to observations, and the final solved linear normal equation could be used to compute a covariance of the computed unknowns, i.e. the survey monument positions. Surveying textbooks might be a good place to look for some organization of concepts and/or easy-to-understand geometry examples. — Preceding unsigned comment added by 75.159.19.229 (talk) 15:31, 12 June 2015 (UTC)

## Open source solver

Ceres Solver is an open source implementation of a solver for non-linear least squares problems with bounds constraints. Olivier Mengué |  13:23, 6 October 2016 (UTC)