# Talk:Total least squares

WikiProject Statistics (Rated B-class, Mid-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as B-Class on the quality scale.
This article has been rated as Mid-importance on the importance scale.

Can you clarify: are the noises in A i.i.d. and the noises in b i.i.d. (but possibly with a different common distribution from the noises in A), or do the noises in A have the same distribution as the noises in b?

To the best of my knowledge, the connection between the residuals and the real noise distribution is still not clear in the literature. The only thing we can say is that the residuals are equally likely to be positive or negative. Based on that, the noises in A and b should have even (symmetric) distribution functions. If the real noise has a non-symmetric pdf, such as a chi distribution, then TLS may not work well, because TLS allows negative corrections to the elements of A and b. Rather than speaking of distributions, we can say that TLS minimizes the estimated noise power equally in A and b. I am sorry, but I can describe only a very rough picture. S. Jo 01:40, 18 April 2006 (UTC)

This page is wrong. Errors-in-variables refers to a model where the independent variables are measured with error. There are lots of possible solutions, of which total least squares is only one (and one that's far from universally accepted; economists would never use it, for example). -- Walt Pohl 02:44, 27 November 2006 (UTC)

I agree (that it's incomplete). In the EIV literature, method-of-moments estimators (plural!) are more commonly used. E.g., for the example on this page, see Amemiya et al. 1987. There are also several other estimators: SIMEX, estimating equations with adjusted score functions, ... more? Libraries have been written on this, and not all on "total least squares"!
Carroll et al. heavily criticise orthogonal regression. In this specific example their criticism does not apply, because the "$\approx$" sign, I presume, means that the relationship is nearly exact. As soon as that is not the case, however, orthogonal regression goes wrong. Apparently this problem is not encountered much in computer science (??), but in other disciplines (econometrics, epidemiology, psychology, medicine, agricultural sciences, etc.) the relationships to be estimated are nearly always imperfect.
Besides being difficult to understand, this explanation seems very tailored to the treatment of this problem within one specific field.
87.219.191.214 11:53, 5 April 2007 (UTC)

## Total Least Squares, Orthogonal, Errors-in-variables

I think we need some clarification of the terminology used throughout the literature. I am not a real statistician, but to me orthogonal regression as explained [here](http://www.nlreg.com/orthogonal.htm) is the same as TLS; at least it minimizes the same distances. Could anyone elaborate? I think it would really add to the usability of the method. Jeroenemans 16:53, 5 December 2006 (UTC)
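A minimal sketch of the point raised above, assuming NumPy and made-up data: fitting a straight line by minimizing squared perpendicular (orthogonal) distances is the same computation as TLS with one predictor and one response. The helper names `tls_line` and `sum_sq_perp` are illustrative only.

```python
import numpy as np


def tls_line(x, y):
    """Fit y ~ a + b*x by minimizing the sum of squared perpendicular distances.

    The data are centred, and the right singular vector with the smallest
    singular value is taken as the unit normal (nx, ny) of the best-fit line.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    M = np.column_stack([x - xm, y - ym])
    _, _, vt = np.linalg.svd(M)
    nx, ny = vt[-1]            # normal vector of the fitted line
    b = -nx / ny               # slope (assumes the fitted line is not vertical)
    a = ym - b * xm            # the line passes through the centroid
    return a, b


def sum_sq_perp(x, y, a, b):
    """Sum of squared perpendicular distances from the points to y = a + b*x."""
    r = np.asarray(y) - a - b * np.asarray(x)
    return float(np.sum(r**2) / (1 + b**2))


x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 2.2, 2.8, 4.1])
a_tls, b_tls = tls_line(x, y)
b_ols, a_ols = np.polyfit(x, y, 1)   # ordinary least squares (vertical distances)

# The TLS line can only do as well or better on the perpendicular criterion.
print(sum_sq_perp(x, y, a_tls, b_tls) <= sum_sq_perp(x, y, a_ols, b_ols))  # True
```

The smallest right singular vector minimizes the sum of squared projections of the centred points onto a unit normal, which is exactly the sum of squared perpendicular distances.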

In my view, "orthogonal" and TLS are similar, but "errors-in-variables" still leaves freedom in which direction to measure the error. It all depends on how much you would like the dependent variable to contribute to the error. So, in the basic 2-dimensional model you are referring to, one should not necessarily assume that orthogonal distance measurement gives the optimal solution. OK? Witger 18:03, 5 December 2006 (UTC)
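One way to make the "freedom in which direction to measure error" concrete, offered as a sketch under assumptions: if the ratio delta = var(error in y) / var(error in x) is known, rescaling y by 1/sqrt(delta) equalizes the error variances, an orthogonal fit can be done in the rescaled coordinates, and the slope can then be mapped back. This is the idea behind Deming regression; delta = 1 gives ordinary orthogonal regression, and as delta grows the slope approaches the ordinary least squares slope of y on x. The function name `ratio_weighted_line` and the example data are hypothetical.

```python
import numpy as np


def ratio_weighted_line(x, y, delta):
    """Fit y ~ a + b*x when var(error in y) / var(error in x) = delta (assumed known).

    y is rescaled so both error variances are equal, the line is fitted by
    orthogonal (TLS) regression, and the scaling is then undone.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    s = np.sqrt(delta)
    ys = y / s                                   # errors in ys now have var(error in x)
    xm, ym = x.mean(), ys.mean()
    _, _, vt = np.linalg.svd(np.column_stack([x - xm, ys - ym]))
    nx, ny = vt[-1]                              # normal of the line in (x, ys) coordinates
    b = (-nx / ny) * s                           # slope mapped back to (x, y) coordinates
    a = y.mean() - b * x.mean()                  # line still passes through the centroid
    return a, b


x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 2.2, 2.8, 4.1])
for delta in (0.25, 1.0, 4.0):
    print(delta, ratio_weighted_line(x, y, delta))
```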

## Page revised

This page has been revised to bring it in line with other articles on least squares. As it stands, it is mostly relevant to the physical sciences, where there is experimental error on all measured quantities. It is obvious from the discussion above that in other fields other methods are more likely to be used, but I have no experience of them. The earlier version mentioned Data Least Squares and Structured Total Least Squares but gave no details, so they have been omitted for now.

Regarding the comment "this page is wrong", I would say rather that it's the title that is wrong. It should always have been Total least squares as that is what it was about. Maybe Errors-in-variables is another topic altogether? Petergans (talk) 15:34, 16 February 2008 (UTC)

I think that's right. Total least squares is a technique for errors-in-variables that makes sense in the physical sciences (where you can plausibly know that two different variables have the same size of measurement error), but it makes less sense in the social sciences, where instrumental variables would be preferred. -- Walt Pohl (talk) 20:21, 16 February 2008 (UTC)
Error size has no meaning in physical measurements, unless it's of the same thing and in the same units. Only relative sizes matter. Is this what you mean? Pgpotvin (talk) 05:34, 22 February 2008 (UTC)

## Should we move this page to Total Least Squares?

I was thinking (inspired by Peter Gans) that we should just move this page to total least squares and start a new page on errors-in-variables. Does anyone have any objections? -- Walt Pohl (talk) 20:34, 16 February 2008 (UTC)

I would support that suggestion. I've just finished revising regression analysis where there is a link in Underlying assumptions to this article. That link would be better directed to the new page. Petergans (talk) 14:04, 20 February 2008 (UTC)
Since no one else seems to have an opinion, I went ahead and did it. -- Walt Pohl (talk) 07:20, 21 February 2008 (UTC)

Peter, please define your symbols. 'delta y', 'delta beta', 'K' and 'F', for instance. And it would be nice to see how the 'F' equations are condition statements (constraints). And what is meant by condition? Simultaneity? I find this article difficult to read and I can't imagine what a novice could make of it. (P.S. I agree with Walt Pohl on the absence of consensus on TLS. I am personally suspicious of it.) Pgpotvin (talk) 05:30, 22 February 2008 (UTC)

This is inherently a very technical subject, not suitable for novices. "Some algebraic manipulation" is long and complicated, so I opted for a brief summary. As to whether TLS is innocent or not, that's for the jury to decide. Petergans (talk) 08:02, 22 February 2008 (UTC)

## GNU Octave code

Some comments for each line would be useful for those with no exposure to GNU Octave. 217.169.50.138 (talk) 08:35, 27 June 2008 (UTC)

I'll add comments about what each line means, but I would love an explanation of why this works. I think I would understand SVD much better, for one thing. —Ben FrantzDale (talk) 02:57, 23 May 2009 (UTC)
I looked at it more and expanded it. It looks like the V matrix—the analyzing matrix—basically captures the covariance structure of A and B in that it transforms vectors right multiplied on $[A|B]$ into the directions corresponding to the singular values of $[A|B]$. I'm still confused why the singular values get ignored and where the negative sign comes from, though... —Ben FrantzDale (talk) 14:18, 26 May 2009 (UTC)
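For readers without Octave, here is a rough NumPy translation of the usual SVD recipe, written against the $[A|B]$ notation used in this thread. The article's own Octave listing is not reproduced here, so the function name `tls` and its layout are assumptions. The comments try to answer both questions. The singular values are only used to decide which columns of V belong to the smallest singular values (NumPy, like Octave, returns them sorted, largest first). The minus sign comes from the fact that $(A+\Delta A)X = B+\Delta B$ is equivalent to $[A+\Delta A \,|\, B+\Delta B]\,[X; -I] = 0$, so $[X; -I]$ must lie in the span of those trailing columns of V, which span the null space of the nearest lower-rank approximation.

```python
import numpy as np


def tls(A, B):
    """Total least squares solution X of A @ X ~ B (A is m-by-n, B is m-by-k).

    Assumes the generic case in which the trailing k-by-k block of V is invertible.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if A.ndim == 1:
        A = A[:, None]
    if B.ndim == 1:
        B = B[:, None]
    n = A.shape[1]
    Z = np.hstack([A, B])            # augmented data matrix [A | B]
    _, _, vt = np.linalg.svd(Z)      # singular values sorted, largest first
    V = vt.T
    V_AB = V[:n, n:]                 # top-right n-by-k block
    V_BB = V[n:, n:]                 # bottom-right k-by-k block
    # The last k columns of V belong to the k smallest singular values.
    # Require [X; -I] = [V_AB; V_BB] @ C: the bottom rows give C = -inv(V_BB),
    # so the top rows give X = -V_AB @ inv(V_BB), computed here via a solve.
    return -np.linalg.solve(V_BB.T, V_AB.T).T


rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3))
X_true = rng.normal(size=(3, 2))
B = A @ X_true                        # noise-free, so TLS should recover X_true
print(np.allclose(tls(A, B), X_true))   # True
```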

## Question about the total least squares

I would like to know the differences between the total least squares method for linear regression and the first eigenvector in principal component analysis (or EOF). Can anybody explain it? —Preceding unsigned comment added by Tribute0708 (talkcontribs) 07:01, 14 May 2009 (UTC)

I'm not quite sure either; I'd love a good explanation. I think it has something to do with letting A and B be different shapes... —Ben FrantzDale (talk) 13:58, 26 May 2009 (UTC)
It may also be that SVD scales better to large numbers of degrees of freedom, where constructing the covariance matrix would be impractical... —Ben FrantzDale (talk) 14:18, 26 May 2009 (UTC)
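A sketch of the relationship, assuming NumPy and simulated data: for a straight line through a 2-D point cloud, the TLS normal vector is the right singular vector of the centred data with the smallest singular value, which is (up to sign) the covariance eigenvector with the smallest eigenvalue. In two dimensions that makes the TLS line coincide with the first principal axis. With more variables the two differ: a TLS fit of one response on several predictors is the hyperplane orthogonal to the *last* principal component, not the line along the first. The SVD route works on the centred data directly, without forming the covariance matrix. The helper names are illustrative.

```python
import numpy as np


def tls_normal_svd(P):
    """Unit normal of the TLS hyperplane through the centroid of the rows of P."""
    Pc = P - P.mean(axis=0)
    _, _, vt = np.linalg.svd(Pc, full_matrices=False)
    return vt[-1]                         # right singular vector, smallest singular value


def principal_axes(P):
    """Eigenvectors of the sample covariance matrix, largest eigenvalue first."""
    _, v = np.linalg.eigh(np.cov(P, rowvar=False))
    return v[:, ::-1]                     # eigh sorts ascending; reverse the columns


rng = np.random.default_rng(1)
x = rng.normal(size=50)
P = np.column_stack([x, 2.0 * x + 0.3 * rng.normal(size=50)])

n_tls = tls_normal_svd(P)
axes = principal_axes(P)
# In 2-D the TLS normal is (up to sign) the second principal axis,
# so the TLS line runs along the first principal axis.
print(abs(n_tls @ axes[:, 1]))   # close to 1.0
print(abs(n_tls @ axes[:, 0]))   # close to 0.0
```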

## Possible clarifications

I'm very happy this page exists, and I deeply appreciate the efforts of the various contributors, but I have to say that I find several aspects of this page confusing and/or misleading.

In particular, the first line states that "Total least squares [...] is a least squares data modeling technique in which observational errors on both dependent and independent variables are taken into account." It seems that there are several different techniques that might be used in settings where errors in both the independent and dependent variables are taken into account, and total least squares is only one of them. (The geometric mean functional relationship would be another, for instance.) Wouldn't it be more accurate to say that "Total least squares [...] is a least squares data modeling technique in which the sum of the squared distances, as measured orthogonally from the fit line to the data points, is minimized. Total least squares is often used in settings where there are observational errors in both the independent and dependent variables." This would emphasize that total least squares is one particular way, not the only way, of dealing with errors in both the independent and dependent variables, and would state what is distinctive about total least squares.
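For a straight line $y = a + bx$ with one predictor and one response, the proposed wording can be made concrete. The perpendicular distance from the point $(x_i, y_i)$ to the line is $|y_i - a - b x_i| / \sqrt{1 + b^2}$, so total least squares minimizes

$$S(a, b) = \sum_{i=1}^{N} \frac{(y_i - a - b x_i)^2}{1 + b^2},$$

whereas ordinary least squares minimizes only the numerator, i.e. the squared vertical distances. Setting $\partial S / \partial a = 0$ gives $a = \bar{y} - b\bar{x}$, so the fitted line passes through the centroid of the data.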

In the section "Geometrical interpretation", I think the first paragraph is excellent, but the second states that "A serious difficulty arises if the variables are not measured in the same units." and then goes on to describe the nature of this difficulty. It ends with "To avoid this problem of incommensurability it is sometimes suggested that we convert to dimensionless variables - this may be called normalization or standardization. However there are various ways of doing this, and these lead to fitted models which are not equivalent to each other." This is stated as if it's a fatal flaw (at least that's how I read it), but it's really not. In particular, if one knows a priori that the errors in dependent and independent variables have equal variance and are all independent, then Draper and Smith show that total least squares gives the maximum likelihood solution for the regression coefficients.

Then in the section "Scale invariant methods", it goes on to say that "In short, total least squares does not have the property of units-invariance (it is not scale invariant). For a meaningful model we require this property to hold." This last statement is just not true. In some settings scale invariance may be important, but it is not important in all settings. For instance, if the regressor variables can only take on integer values, then the choice of scale is not arbitrary, and so one might not care whether one used a scale-invariant method or not. Also, if one knows a priori that the errors in the dependent and independent variables have equal variance, then TLS gives a maximum likelihood solution. Rescaling the dependent and independent variables by different factors will make the error variances unequal, so it is not surprising that TLS no longer gives the maximum likelihood solution, and this doesn't imply that TLS is 'bad' or 'wrong'. It's just an answer to a particular question (or questions), which may or may not be the question you want answered.
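A small numerical check of the scale-invariance point, assuming NumPy and made-up data: multiplying $x$ by a constant $c$ (a change of units) divides the ordinary least squares slope by exactly $c$, but the TLS slope does not transform that way, because rescaling one variable changes which distances count as perpendicular. The value of `c` and the helper names are hypothetical.

```python
import numpy as np


def tls_slope(x, y):
    """Orthogonal (TLS) slope of y on x, from the SVD of the centred data."""
    M = np.column_stack([x - np.mean(x), y - np.mean(y)])
    _, _, vt = np.linalg.svd(M)
    nx, ny = vt[-1]                       # normal vector of the fitted line
    return -nx / ny


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    return np.polyfit(x, y, 1)[0]


x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.3, 0.8, 2.4, 2.7, 4.2])
c = 100.0   # e.g. the same x recorded in different units (hypothetical)

# OLS: the slope in the new units is exactly the old slope divided by c.
print(np.isclose(ols_slope(c * x, y), ols_slope(x, y) / c))   # True
# TLS: the same change of units does NOT simply divide the slope by c.
print(np.isclose(tls_slope(c * x, y), tls_slope(x, y) / c))   # False
```

Whether this matters depends, as argued above, on whether the error variances are known to be equal in the units actually used.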

Does anyone object to me editing the page to resolve these issues?

Adam Lyle Taylor (talk) 17:50, 13 June 2009 (UTC)

## Derivation

It would be nice if someone expanded the derivation so one could follow it more easily. 203.167.251.186 (talk) 19:47, 9 February 2010 (UTC)
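As an illustration of the kind of expansion requested, here is the derivation written out for the simplest case only: a straight line $y = a + bx$ fitted by minimizing squared perpendicular distances, with equal error variances in $x$ and $y$ assumed. It is not the article's general derivation. Minimizing over $a$ gives $a = \bar{y} - b\bar{x}$; with the centred sums $S_{xx} = \sum_i (x_i - \bar{x})^2$, $S_{yy} = \sum_i (y_i - \bar{y})^2$ and $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$, the remaining objective and its stationary point are

$$
\begin{aligned}
S(b) &= \frac{S_{yy} - 2b\,S_{xy} + b^2 S_{xx}}{1 + b^2}, \\
\frac{dS}{db} = 0 \;&\Longleftrightarrow\; (2b\,S_{xx} - 2S_{xy})(1 + b^2) - 2b\,(S_{yy} - 2b\,S_{xy} + b^2 S_{xx}) = 0 \\
&\Longleftrightarrow\; S_{xy}\, b^2 + (S_{xx} - S_{yy})\, b - S_{xy} = 0, \\
b &= \frac{S_{yy} - S_{xx} + \sqrt{(S_{yy} - S_{xx})^2 + 4 S_{xy}^2}}{2 S_{xy}} \qquad (S_{xy} \neq 0).
\end{aligned}
$$

The two roots of the quadratic have product $-1$, so they are the slopes of two perpendicular lines. The root taken above has the same sign as $S_{xy}$ and gives the minimum; the other gives the maximum.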

## TLS with an eigenvector approach?

I'm not an expert in this area, but I have a strong background in eigenvectors. I think it would be helpful, if possible, to rephrase the problem as an eigenvalue problem. These lecture notes (http://www.cs.princeton.edu/courses/archive/fall11/cos323/notes/cos323_f11_lecture09_svd.pdf) make it seem as if TLS simply augments the parameter space with one additional parameter attached to the output, y: minimize[(a1*x1 + a2*x2 + ... an*xn - b*y)^2]. Naturally this has infinitely many solutions, including setting all parameters to zero, which is why the notes treat it as a problem of selecting the unit eigenvector with the smallest eigenvalue of the Gram matrix A'*A.

I would greatly appreciate the article being explained this simply, if this description actually works. — Preceding unsigned comment added by 150.135.222.130 (talk) 22:31, 12 February 2013 (UTC)

Edit: I must be blind, it is actually mentioned at the top of the article "The total least squares approximation of the data is generically equivalent to the best, in the Frobenius norm, low-rank approximation of the data matrix." However, I would appreciate this being elaborated as I presented above. — Preceding unsigned comment added by 150.135.222.130 (talk) 22:34, 12 February 2013 (UTC)
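A minimal sketch of the eigenvector formulation described above, assuming NumPy, no intercept term (centre the data first if one is needed) and the generic case where the last component of the eigenvector is non-zero. Fixing the norm of the augmented coefficient vector to 1 rules out the all-zero solution; dividing by the coefficient on $y$ then recovers the regression coefficients. The eigenvalues of the Gram matrix are the squared singular values of $[A\,|\,y]$, so this gives the same answer as the SVD route, although forming the Gram matrix squares the condition number and the SVD is usually preferred numerically. The function name is hypothetical.

```python
import numpy as np


def tls_via_gram_eigenvector(A, y):
    """TLS solution x of A @ x ~ y from the smallest eigenvector of [A|y]^T [A|y].

    Assumes no intercept and a non-zero last eigenvector component (generic case).
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    Z = np.hstack([A, y])                 # augmented matrix [A | y]
    _, v = np.linalg.eigh(Z.T @ Z)        # Gram matrix; eigenvalues in ascending order
    u = v[:, 0]                           # unit eigenvector with the smallest eigenvalue
    # u minimizes ||Z u|| over unit vectors, i.e. sum_i (a . x_i - b*y_i)^2 with the
    # trivial all-zero solution excluded by ||u|| = 1. Rescale so y's coefficient is -1.
    return -u[:-1] / u[-1]


rng = np.random.default_rng(0)
A = rng.normal(size=(200, 2))
x_true = np.array([1.5, -0.5])
y = A @ x_true + 0.01 * rng.normal(size=200)
print(tls_via_gram_eigenvector(A, y))   # close to [1.5, -0.5]
```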

## Algebraic approach (notation)

I've changed the variable letters used in the algebraic approach section so the equation solved is now

$X B \approx Y$

rather than

$A X \approx B$

to more closely reflect the lettering used in the first part of the article.

(In particular, when reading the article as it previously was, it took me a while to realise that X was being used for the design matrix in the first part, but then the matrix of unknown regression coefficients in the second. I hope with the change of lettering the transition is now easier.) Jheald (talk) 16:12, 13 June 2013 (UTC)
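A compact statement of the problem in the new lettering may help readers make the transition. This is a sketch assuming $X$ is $m \times n$, $Y$ is $m \times k$, and the generic case in which the block $V_{YY}$ below is invertible. Total least squares seeks the smallest perturbations $E$ and $F$ of the data that make the system consistent:

$$\min_{E,\,F} \bigl\|\, [\,E \;\; F\,] \,\bigr\|_F \quad \text{subject to} \quad (X + E)\,B = Y + F.$$

Writing the singular value decomposition $[\,X \;\; Y\,] = U \Sigma V^{\mathsf T}$ and partitioning

$$V = \begin{bmatrix} V_{XX} & V_{XY} \\ V_{YX} & V_{YY} \end{bmatrix}, \qquad V_{XX} \in \mathbb{R}^{n \times n},\; V_{YY} \in \mathbb{R}^{k \times k},$$

the solution is $B = -V_{XY} V_{YY}^{-1}$.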