Talk:Ordinary least squares

WikiProject Statistics (Rated B-class, Top-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as B-Class on the quality scale.
This article has been rated as Top-importance on the importance scale.
WikiProject Mathematics (Rated B-class, Low-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating:
B Class
Low Importance
 Field: Probability and statistics
One of the 500 most frequently viewed mathematics articles.

Regression articles discussion July 2009

A discussion of content overlap of some regression-related articles has been started at Talk:Linear least squares#Merger proposal but it isn't really just a question of merging and no actual merge proposal has been made. Melcombe (talk) 11:37, 14 July 2009 (UTC)


Don't merge them; this article is about OLS. To put it simply, this article is only for OLS, so don't write much about GLS etc. in this article. But we can go deeper into OLS here. I want to delete much of the material that is not about OLS; that material can be put in the linear regression article. For example, section 1 should be simplified. Jackzhp (talk) 16:25, 25 March 2010 (UTC)

The variance of the estimator

We know var(\hat{\beta}), but what is var(s^2)? I don't know how to find it even for the simple case when \Omega=\sigma^2I_n, but I know that the FDCR lower bound for it in this case is \frac{2\sigma^{4}}{n}. Jackzhp (talk) 16:53, 25 March 2010 (UTC)

According to my calculations, the variance is equal to

     \operatorname{Var}[s^2|X] = \frac{2\sigma^4}{n-p} + \gamma_2\cdot\sigma^4 \frac{\sum_{i=1}^n m_{ii}^2}{(n-p)^2},
where m_{ii} is the i-th diagonal element of the annihilator matrix M, and \gamma_2 is the excess kurtosis of the distribution of the error terms. When the kurtosis is positive we can obtain an upper bound from the fact that m_{ii}\geq m_{ii}^2 (M is a projection matrix, so 0\leq m_{ii}\leq 1), and the sum of the m_{ii}'s is the trace of M, which is n − p:

      \frac{2\sigma^4}{n-p} \leq \operatorname{Var}[s^2|X] \leq \frac{(2+\gamma_2)\sigma^4}{n-p}
 // stpasha »  00:36, 25 April 2010 (UTC)
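The formula and bounds above can be checked numerically. The following is only a sketch under stated assumptions: i.i.d. errors, a made-up design matrix with an intercept and one regressor, and an illustrative excess kurtosis of 3 (the value for Laplace-distributed errors). The function name `var_s2` and all data are hypothetical, not taken from the article.

```python
import numpy as np

def var_s2(X, sigma2, excess_kurtosis):
    """Var[s^2 | X] from the formula quoted above (i.i.d. errors assumed)."""
    n, p = X.shape
    # Annihilator matrix M = I - X (X'X)^{-1} X'
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    m_ii = np.diag(M)
    normal_part = 2.0 * sigma2**2 / (n - p)
    kurtosis_part = excess_kurtosis * sigma2**2 * np.sum(m_ii**2) / (n - p) ** 2
    return normal_part + kurtosis_part

# Hypothetical design: intercept plus one regressor, n = 8, p = 2
x = np.arange(8.0)
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape
sigma2 = 1.5    # assumed error variance
gamma2 = 3.0    # assumed excess kurtosis (Laplace errors)

v = var_s2(X, sigma2, gamma2)
lower = 2.0 * sigma2**2 / (n - p)
upper = (2.0 + gamma2) * sigma2**2 / (n - p)
print(lower, v, upper)
```

With zero excess kurtosis (normal errors) the second term vanishes and the value coincides with the lower bound.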

Comment from main article on assumptions

I'm moving this from the main article; it was included as a comment in the section on assumptions:

Can we just replace this section with the following one line?
Please discuss this with me on the discussion page. Let's keep only material related to OLS; anything else should be deleted. If you want, please move it to the linear regression article.
Correlation between data points can be discussed, but not in this section. Given E\left(\varepsilon\varepsilon^{'}|X\right)=\sigma^{2}I_{n}, we can clearly see this.
Identifiability can be discussed, but it should be in the estimation section.

I disagree with this proposal. Some written explanation is much more useful than a single mathematical expression. The current text is not unduly digressive. Skbkekas (talk) 22:29, 25 March 2010 (UTC)

Why does "linear least-squares" redirect here?

This page does not make any sense to someone who is just interested in the general problem \min_x \|Ax-b\|^2, since this page seems extremely application specific. Did some error occur when this strange redirection happened? — Preceding unsigned comment added by Short rai (talk · contribs) 09:54, 23 May 2010 (UTC)

See Ordinary least squares#Geometric approach.  // stpasha »  19:07, 23 May 2010 (UTC)
But this is a major subject in virtually every subfield of applied mathematics, and you refer everyone who is not in statistics to one tiny paragraph called "geometric approach"? I think there recently was a redirect and/or merge process going on with other articles about the topic, and there seems to have been a page called "linear least squares" earlier, but that page is impossible to find now. Short rai (talk) 23:14, 23 May 2010 (UTC)
Ya, there used to be a backlink in the see also section, I have restored it now. And by the way, the OLS regression and the problem of minimization of the norm of Ax−b are exactly the same problems, only written in different notations. The difference is that statisticians use X and y for known quantities, and β for unknown. Mathematicians use those symbols in the opposite way.  // stpasha »  03:53, 24 May 2010 (UTC)
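The equivalence of the two notations can be illustrated with a short sketch. The data here are random and purely hypothetical; the point is only that solving \min_x \|Ax-b\|^2 and computing the OLS estimate from the normal equations give the same vector when the matrix has full column rank.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))   # known matrix  (statistics notation: X)
b = rng.normal(size=20)        # known vector  (statistics notation: y)

# Numerical-analysis view: minimise ||A x - b||^2 over x
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# Statistics view: beta_hat = (X'X)^{-1} X'y
X, y = A, b
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(x_lstsq)
print(beta_hat)
```

In both views the residual b − Ax is orthogonal to the columns of A, which is the content of the geometric approach mentioned above.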

Hypothesis Testing

Why is this section empty? Stuffisthings (talk) 13:32, 15 July 2010 (UTC)

Linear Least Squares

FYI, there is a discussion on the usage of Linear least squares which currently redirects here, but was previously another topic, at Talk:Numerical methods for linear least squares. (talk) 03:56, 21 October 2010 (UTC)

Vertical or Euclidean Distances?

The article states that OLS minimizes the sum of squared distances from the points to the estimated regression line. But we are taught in standard (Euclidean) geometry that the distance between a point and a line is defined as the length of the perpendicular line segment connecting the two. This is not what OLS minimizes. Rather, it minimizes the sum of squared vertical distances between the points and the line. Shouldn't the article say as much? — Preceding unsigned comment added by (talk) 17:00, 2 November 2010 (UTC)

I've changed that sentence, so that it says "vertical distances" now.  // stpasha »  20:53, 2 November 2010 (UTC)
I have a problem where I do want to minimize the sum of squared Euclidean distances from a point to a set of given straight lines in the plane. Can anyone give a reference or some keywords? (The problem occurs in surveying, when many observers at known locations can see the same point at an unknown location. Each observer can measure its bearing to the target point. This gives a set of lines that ideally should intersect at the target point. But measurement errors give an overdetermined problem if there are more than two observers.) Mikrit (talk) 15:41, 22 November 2010 (UTC)
You want to look at Deming regression or Total least squares.  // stpasha »  16:47, 22 November 2010 (UTC)
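One possible formulation of the surveying problem described above (not necessarily the one the reply had in mind) treats the observer positions and bearings as exact and minimizes the sum of squared perpendicular distances from the unknown point to each sight line. With unit normals n_i this is linear in the unknown point. The sketch below assumes bearings are given as mathematical angles measured counter-clockwise from the x-axis, which differs from the usual surveying convention; the function name and data are hypothetical.

```python
import numpy as np

def nearest_point_to_lines(points, bearings):
    """Point minimising the sum of squared perpendicular distances to lines.

    Line i passes through points[i] with direction angle bearings[i]
    (radians, counter-clockwise from the x-axis; an assumed convention).
    """
    points = np.asarray(points, dtype=float)
    bearings = np.asarray(bearings, dtype=float)
    # Unit normal of each line: direction rotated by 90 degrees
    normals = np.column_stack([-np.sin(bearings), np.cos(bearings)])
    # Perpendicular distance of x from line i is n_i . (x - p_i),
    # so we solve the linear least squares problem normals @ x ~ n_i . p_i
    rhs = np.sum(normals * points, axis=1)
    x, *_ = np.linalg.lstsq(normals, rhs, rcond=None)
    return x

# Hypothetical noise-free check: three observers sighting the point (2, 3)
target = np.array([2.0, 3.0])
observers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 6.0]])
bearings = np.arctan2(target[1] - observers[:, 1], target[0] - observers[:, 0])
estimate = nearest_point_to_lines(observers, bearings)
print(estimate)
```

With noisy bearings the lines no longer meet in one point and the same call returns the compromise point. Errors in the observer positions themselves would call for an errors-in-variables method such as those named in the reply.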

Example with real data

Are the calculations of the Akaike criterion and Schwarz criterion correct here? I know that there are many "different" forms of the AIC and SIC, but I just can't figure out how these were calculated. Certainly they seem inconsistent with the forms that are linked to in the description that follows the calculated values. — Preceding unsigned comment added by (talk) 05:32, 30 November 2010 (UTC)

I'll try again

Way back in 2008 I came across this example calculation, looked at the plot of (x,y) and noticed an odd cadence in the positioning. Some simple inspection soon showed that the original height data were in inches, and whoever converted them to metric bungled the job. The conversion factor is 2.54 cm to an inch, and rounding to the nearest centimetre is not a good idea. This makes a visible difference and considerably changes the results of an attempt at a quadratic fit. It doesn't matter what statistical packages might be used, to whatever height of sophistication, if the input data are wrongly prepared. The saving grace is the presence of the plot of the data, but that plot has to be looked at, not just gazed at with vague approbation. In the various reorganisations, my report of this problem has been lost, and the editors ignored its content. The error remains, so I'll try again.

Height^2      Height        Const.      Conversion used
61.96033      -143.162      128.8128    Improper rounding of inches to whole cm.
58.5046       -131.5076     119.0205    Proper conversion, no rounding.

The original incorrectly-converted plot can be reproduced, but here is a plot of the correctly-converted heights, with a quadratic fit. Notice the now-regular spacing of the x-values, without cadence.

Correctly converted heights, quadratic fit.
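The cadence introduced by rounding can be reproduced with a few lines. This is a sketch only: it assumes the original heights were whole inches from 58 to 72, which is a guess for illustration and is not stated in the comment above.

```python
# Assumed original data: whole inches from 58 to 72 (hypothetical range)
inches = list(range(58, 73))

exact_cm = [h * 2.54 for h in inches]       # proper conversion, no rounding
rounded_cm = [round(c) for c in exact_cm]   # rounding to whole centimetres

# Spacing between consecutive heights under each treatment
exact_steps = [round(b - a, 2) for a, b in zip(exact_cm, exact_cm[1:])]
rounded_steps = [b - a for a, b in zip(rounded_cm, rounded_cm[1:])]

print(exact_steps)
print(rounded_steps)
```

The exact conversion keeps a constant step of 2.54 cm, while the rounded values alternate irregularly between steps of 2 cm and 3 cm, which is the uneven spacing visible in the plot of the wrongly converted data.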

For the incorrectly-converted heights, the residuals are

Wrongly converted heights, Residuals to a quadratic fit.

Whereas for the correctly-converted height data, the residuals are much smaller. (Note the vertical scale)

Correctly converted heights, Residuals to a quadratic fit.

And indeed, this pattern of residuals rather suggests a higher-order fit attempt, such as with a cubic.

Correctly converted heights, Residuals to a cubic fit.

But assessing the merit of this escalation would be helped by the use of some more sophisticated analysis, such as might be offered by the various fancy packages, if fed correct data.

Later, it occurred to me to consider whether the weights might have been given in pounds. The results were odd in another way. Using the conversion 1 kg = 2.20462234 lb used in the USA, the weights are

115.1033 117.1095 120.1078 123.1061 126.1044 129.1247 132.123 135.1213 139.1337 142.132 146.1224 150.1348 154.1472 159.1517 164.1562
114.862  116.864  119.856  122.848  125.84   128.854  131.846 134.838  138.842  141.834 145.816  149.82   153.824  158.818  163.812

The second row is for the approximate conversion 1 kg = 2.2 lb. I am puzzled by the fractional parts. NickyMcLean (talk) 22:11, 5 September 2011 (UTC)

Possible error in formula for standard error for coefficients

It seems to me that the 1/n should not be included in the formula for the standard errors for each coefficient. With 1/n, the values calculated in this example are not produced. Removing it generates the values given in the example. Would someone more knowledgeable in this subject examine this and correct the formula, if necessary? — Preceding unsigned comment added by (talk) 17:37, 22 May 2012 (UTC)

Yes, this looks wrong to me as well. — Preceding unsigned comment added by 2620:0:1009:1:BAAC:6FFF:FE7D:1EE9 (talk) 17:24, 30 November 2012 (UTC)

I arrived at the same conclusion independently before checking this talk page, so I removed the 1/n. (talk) 12:06, 1 January 2015 (UTC)

I found this error as well and, despite the previous comment, the 1/n was still there. I have removed it now. — Preceding unsigned comment added by (talk) 10:50, 20 February 2015 (UTC)
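The point raised in this thread can be checked numerically. The sketch below assumes the standard error is written in terms of (X'X)^{-1}, in which case no extra 1/n factor belongs in it (a 1/n would be appropriate only if the formula used the inverse of X'X/n instead). The data and the function name `ols_standard_errors` are made up for illustration and are not the example from the article.

```python
import numpy as np

def ols_standard_errors(X, y):
    """OLS coefficients and standard errors sqrt(s^2 * [(X'X)^{-1}]_jj)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    residuals = y - X @ beta_hat
    s2 = residuals @ residuals / (n - p)   # unbiased estimate of sigma^2
    se = np.sqrt(s2 * np.diag(XtX_inv))    # note: no extra 1/n factor
    return beta_hat, se

# Hypothetical data for a simple regression with an intercept
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 6.2])
X = np.column_stack([np.ones_like(x), x])
beta_hat, se = ols_standard_errors(X, y)

# Cross-check the slope against the textbook simple-regression formula
s2 = np.sum((y - X @ beta_hat) ** 2) / (len(x) - 2)
se_slope_textbook = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))
print(se[1], se_slope_textbook)
```

Dividing by an additional n inside the square root would make the matrix result disagree with the textbook simple-regression value.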

On multicollinearity

Multicollinearity means a high level of correlation between variables. OLS can handle this fine; it just needs more data to do it. However, in the article, "multicollinearity" is being used to mean perfect collinearity, i.e. the data matrix does not have full column rank. This is confusing. I propose we stop using multicollinearity to mean lack of full rank, and just say "not full rank". — Preceding unsigned comment added by (talk) 04:21, 10 March 2011 (UTC)

Done. Let me know if I missed any instances of it. Duoduoduo (talk) 15:11, 10 March 2011 (UTC)


In the section "Simple regression model", it is not true that \hat\beta = \mathrm{Cov}(x,y)/\mathrm{Var}(x); this only holds for the true parameters and not for the estimator. Rather, \hat\beta is equal to the sample covariance over the sample variance. Maybe use a hat over Cov and Var to signify this if you want the relation to be stressed.

Also, in my opinion a lot of time is being devoted to the use of the annihilator matrix. It doesn't seem necessary to introduce the extra notation unless one wants to go into the Frisch-Waugh-Lovell theorem, and this has its own separate page. — Preceding unsigned comment added by Superpronker (talk · contribs) 06:46, 1 June 2011 (UTC)
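The remark about sample moments in the simple regression model can be illustrated as follows. This is a minimal sketch with made-up data: it compares the ratio of sample covariance to sample variance with the slope returned by a generic least squares line fit.

```python
import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 4.0, 5.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 5.2, 5.8, 8.1, 9.3])

# Sample covariance over sample variance (both normalised by n - 1)
sample_cov = np.cov(x, y)[0, 1]
sample_var = np.var(x, ddof=1)
slope_from_moments = sample_cov / sample_var

# Slope of the least squares line, for comparison
slope_ols, intercept_ols = np.polyfit(x, y, 1)
print(slope_from_moments, slope_ols)
```

The normalisation (n or n − 1) cancels in the ratio as long as the same one is used in the numerator and the denominator.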

I believe that a clarification on notation would help immensely. The regressor values for one observation are referred to as the *column* vector x_i. However, in the design matrix of regressor values, the values for an observation occupy a *row*. Hence, it is easy to fall into the trap of thinking of x_i as a row vector, leading to confusion. Craniator (talk) 05:34, 3 May 2015 (UTC)

Alternative Derivations: Geometric Approach

In the illustration of orthogonal projection, it would be helpful to clarify that X_i refers to a column in the data matrix, thus clearly distinguishing it from x_i for the set of regressor values from one observation. Craniator (talk) 06:01, 3 May 2015 (UTC)
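The distinction raised in the last two comments can be made concrete with array shapes. This is a sketch with a made-up 4 × 3 design matrix and coefficient vector; the variable names x_i and X_j follow the notation discussed above.

```python
import numpy as np

# Hypothetical design matrix: 4 observations (rows), 3 regressors (columns)
X = np.array([[1.0, 2.0, 5.0],
              [1.0, 3.0, 7.0],
              [1.0, 4.0, 6.0],
              [1.0, 6.0, 9.0]])
beta = np.array([0.5, -1.0, 2.0])   # hypothetical coefficient vector

# x_i: regressor values of observation i, stored as row i of X but
# treated as a p x 1 column vector in the formulas
i = 1
x_i = X[i, :].reshape(-1, 1)
fitted_i = (x_i.T @ beta).item()    # x_i' beta, a scalar

# X_j: column j of X, i.e. one regressor across all n observations
j = 2
X_j = X[:, j]

print(x_i.shape, X_j.shape, fitted_i)
```

Here x_i has p entries (one per regressor) while X_j has n entries (one per observation), so the two cannot be interchanged even though both are written with a subscripted x.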

Too technical? Should it be rewritten "one level down"?

I realize this article has been rated B-class, but I wonder if it is too technical for someone who does not already understand OLS. OLS is often studied in undergrad stats classes. Therefore, in the spirit of writing "one level down" (see: WP:UPFRONT), this article should ideally include an extended intro that is much more comprehensible to someone with some relatively advanced high school math training (say, through a year of high school calculus, but without linear algebra).

As currently written, almost all of the article is incomprehensible to someone who doesn't understand linear algebra. There are many ways of introducing OLS without linear algebra, so could such non-technical, intuitive approaches be put at the top of this article, leaving the more technical, formal math stuff for the bottom? I hesitate to be so bold in editing, since this is a very important article, but I think it is currently way too technical. Aroundthewayboy (talk) 04:57, 26 July 2015 (UTC)