Talk:Ordinary least squares


Regression articles discussion July 2009

A discussion of content overlap of some regression-related articles has been started at Talk:Linear least squares#Merger proposal but it isn't really just a question of merging and no actual merge proposal has been made. Melcombe (talk) 11:37, 14 July 2009 (UTC)[reply]


OLS

Don't merge them; this article is about OLS. To put it simply, this article is only for OLS, so don't write much about GLS etc. in this article. But we can go deeper into OLS here. I want to delete much of the material that is not OLS; that material can be put in the linear regression article. For example, section 1 should be simplified. Jackzhp (talk) 16:25, 25 March 2010 (UTC)[reply]


The variance of the estimator

We know the variance of the coefficient estimator β̂, but what is the variance of the estimator s² of σ²? I don't know how to find it even for the simple case when the errors are normally distributed, but I know that the FDCR lower bound for it in that case is 2σ⁴/n. Jackzhp (talk) 16:53, 25 March 2010 (UTC)[reply]

According to my calculations, the variance is equal to
Var[s² | X] = σ⁴ ( 2/(n − p) + γ₂ Σᵢ mᵢᵢ² / (n − p)² ),
where mᵢᵢ is the i-th diagonal element of the annihilator matrix M, and γ₂ is the excess kurtosis of the distribution of the error terms. When the kurtosis is positive we can obtain an upper bound from the fact that mᵢᵢ² ≤ mᵢᵢ (M is idempotent, so each mᵢᵢ lies between 0 and 1), and the sum of the mᵢᵢ is the trace of M, which is n − p:
Var[s² | X] ≤ σ⁴ (2 + γ₂) / (n − p).
 // stpasha »  00:36, 25 April 2010 (UTC)[reply]
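
For anyone who wants to sanity-check an expression of this shape, here is a minimal simulation sketch. The design, error distribution, sample size and variable names are arbitrary illustrative choices (not from the article); γ₂ is taken to be the excess kurtosis.

import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)      # annihilator matrix
m_ii = np.diag(M)

sigma2 = 1.0
gamma2 = 6.0                                           # excess kurtosis of Exp(1) - 1
reps = 100_000
eps = rng.exponential(1.0, size=(reps, n)) - 1.0       # mean 0, variance 1, excess kurtosis 6
s2 = ((eps @ M) * eps).sum(axis=1) / (n - p)           # s^2 in each replication

empirical = s2.var()
theoretical = sigma2**2 * (2.0 / (n - p) + gamma2 * (m_ii**2).sum() / (n - p)**2)
upper_bound = sigma2**2 * (2.0 + gamma2) / (n - p)
print(empirical, theoretical, upper_bound)             # empirical ≈ theoretical ≤ bound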

Comment from main article on assumptions

I'm moving this from the main article; it was included as a comment in the section on assumptions:

Can we just replace this section with the following one line?
Please discuss with me on the discussion page. Let's keep only material related to OLS; anything else should be deleted. If you want, please move it to the linear regression article.
Correlation between data points can be discussed, but not in this section. Given the expression above, we can clearly see this.
Identifiability can be discussed, but should be in the estimation section.

I disagree with this proposal. Some written explanation is much more useful than a single mathematical expression. The current text is not unduly digressive. Skbkekas (talk) 22:29, 25 March 2010 (UTC)[reply]

Why does "linear least-squares" redirect here?

This page does not make any sense to someone who is just interested in the general problem of minimizing ‖Ax − b‖, since this page seems extremely application-specific. Did some error occur when this strange redirection happened? —Preceding unsigned comment added by Short rai (talkcontribs) 09:54, 23 May 2010 (UTC)[reply]

See Ordinary least squares#Geometric approach.  // stpasha »  19:07, 23 May 2010 (UTC)[reply]
But this is a major subject in virtually every subfield of applied mathematics, and you refer everyone who is not in statistics to one tiny paragraph called "geometric approach"? I think there recently was a redirect and/or merge process going on with other articles about the topic, and there seems to have been a page called "linear least squares" earlier, but that page is impossible to find now. Short rai (talk) 23:14, 23 May 2010 (UTC)[reply]
Ya, there used to be a backlink in the see also section, I have restored it now. And by the way, the OLS regression and the problem of minimization of the norm of Ax−b are exactly the same problems, only written in different notations. The difference is that statisticians use X and y for known quantities, and β for unknown. Mathematicians use those symbols in the opposite way.  // stpasha »  03:53, 24 May 2010 (UTC)[reply]
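
The equivalence is easy to check numerically. A small sketch with made-up data (names are illustrative only):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))        # the mathematicians' "A"
y = rng.normal(size=20)             # the mathematicians' "b"

beta_normal_eq = np.linalg.solve(X.T @ X, X.T @ y)    # (X'X)^(-1) X'y
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)    # argmin ||X beta - y||

print(np.allclose(beta_normal_eq, beta_lstsq))        # True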

Hypothesis Testing

Why is this section empty? Stuffisthings (talk) 13:32, 15 July 2010 (UTC)[reply]

Linear Least Squares

FYI, there is a discussion at Talk:Numerical methods for linear least squares on the usage of Linear least squares, which currently redirects here but was previously a separate topic.

76.66.198.128 (talk) 03:56, 21 October 2010 (UTC)[reply]

Vertical or Euclidean Distances?

The article states that OLS minimizes the sum of squared distances from the points to the estimated regression line. But we are taught in standard (Euclidean) geometry that the distance between a point and a line is defined as the length of the perpendicular line segment connecting the two. This is not what OLS minimizes. Rather, it minimizes the vertical distances between the points and the line. Shouldn't the article say as much? —Preceding unsigned comment added by 76.76.220.34 (talk) 17:00, 2 November 2010 (UTC)[reply]

I've changed that sentence, so that it says "vertical distances" now.  // stpasha »  20:53, 2 November 2010 (UTC)[reply]
I have a problem where I do want to minimize the sum of squared Euclidean distances from a point to a set of given straight lines in the plane. Can anyone give a reference or some keywords? (The problem occurs in surveying, when many observers at known locations can see the same point at an unknown location. Each observer can measure its bearing to the target point. This gives a set of lines that ideally should intersect at the target point. But measurement errors give an overdetermined problem if there are more than two observers.) Mikrit (talk) 15:41, 22 November 2010 (UTC)[reply]
You want to look at Deming regression or Total least squares.  // stpasha »  16:47, 22 November 2010 (UTC)[reply]
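
For the bearings-intersection problem as described, a direct least-squares solve (not Deming regression) also works: the point p minimizing the sum of squared perpendicular distances to lines through points a_i with unit directions d_i satisfies (Σ(I − d_i d_iᵀ)) p = Σ(I − d_i d_iᵀ) a_i. A sketch with made-up observer positions and bearings:

import numpy as np

observers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])   # known positions a_i
bearings_deg = np.array([45.0, 315.0, 135.0])                  # bearings, clockwise from north
d = np.column_stack([np.sin(np.radians(bearings_deg)),         # unit direction (east, north)
                     np.cos(np.radians(bearings_deg))])        # of each sight line

A = np.zeros((2, 2))
b = np.zeros(2)
for a_i, d_i in zip(observers, d):
    P = np.eye(2) - np.outer(d_i, d_i)   # projector onto the normal of line i
    A += P
    b += P @ a_i

p_hat = np.linalg.solve(A, b)            # least-squares estimate of the target point
print(p_hat)                             # here exactly [5, 5]; with noisy bearings, the best compromise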

Example with real data

Are the calculations of the Akaike criterion and Schwarz criterion correct here? I know that there are many "different" forms of the AIC and SIC, but I just can't figure out how these were calculated. Certainly they seem inconsistent with the forms that are linked to in the description that follows the calculated values. —Preceding unsigned comment added by 71.224.184.11 (talk) 05:32, 30 November 2010 (UTC)[reply]
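
For reference, one common convention (there are several, differing by additive constants and by whether the error variance counts as a parameter) computes both criteria from the Gaussian log-likelihood at the ML variance estimate; whether the article used this form is exactly the question raised above. A sketch:

import numpy as np

def aic_bic(resid, n_params):
    # One common convention: Gaussian log-likelihood evaluated at the ML variance estimate.
    n = resid.size
    sigma2_ml = (resid ** 2).sum() / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2_ml) + 1)
    return 2 * n_params - 2 * loglik, n_params * np.log(n) - 2 * loglik

# usage: aic, bic = aic_bic(y - X @ beta, p + 1)   # +1 if the error variance counts as a parameter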

I'll try again

Way back in 2008 I came across this example calculation, looked at the plot of (x, y), and noticed an odd cadence in the positioning. Some simple inspection soon showed that the original height data were in terms of inches, and whoever converted them to metric bungled the job. The conversion factor is 2.54 cm to an inch, and rounding to the nearest centimetre is not a good idea. This makes a visible difference and considerably changes the results of an attempt at a quadratic fit. It doesn't matter what statistical packages might be used, to whatever height of sophistication, if the input data are wrongly prepared. The saving grace is the presence of the plot of the data, but that plot has to be looked at, not just gazed at with vague approbation. In the various reorganisations, my report of this problem has been lost, and the editors ignored its content. The error remains, so I'll try again.

Height^2       Height         Const.
61.96033      -143.162      128.8128  Improper rounding of inches to whole cm.
58.5046       -131.5076     119.0205  Proper conversion, no rounding.
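
The conversion issue itself is easy to reproduce. A minimal sketch, assuming the underlying heights were the whole-inch values 58 to 72 (my reading of the data, not something stated in the article):

import numpy as np

inches = np.arange(58, 73)
exact = inches * 2.54 / 100.0              # proper conversion to metres
rounded = np.round(inches * 2.54) / 100.0  # rounded to the nearest centimetre first

print(np.diff(exact))     # constant spacing of 0.0254 m
print(np.diff(rounded))   # irregular 0.02 / 0.03 m steps -- the "cadence" in the plot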

The original incorrectly-converted plot can be reproduced, but here is a plot of the correctly-converted heights, with a quadratic fit. Notice the now-regular spacing of the x-values, without cadence.

Correctly converted heights, quadratic fit.

For the incorrectly-converted heights, the residuals are

Wrongly converted heights, Residuals to a quadratic fit.

Whereas for the correctly-converted height data, the residuals are much smaller. (Note the vertical scale)

Correctly converted heights, Residuals to a quadratic fit.

And indeed, this pattern of residuals rather suggests a higher-order fit attempt, such as with a cubic.

Correctly converted heights, Residuals to a cubic fit.

But assessing the merit of this escalation would be helped by the use of some more sophisticated analysis, such as might be offered by the various fancy packages, if fed correct data.

Later, it occurred to me to consider whether the weights might have been given in pounds. The results were odd in another way. Using the conversion 1 kg = 2.20462234 lb used in the USA, the weights are

115.1033 117.1095 120.1078 123.1061 126.1044 129.1247 132.123 135.1213 139.1337 142.132 146.1224 150.1348 154.1472 159.1517 164.1562
114.862  116.864  119.856  122.848  125.84   128.854  131.846 134.838  138.842  141.834 145.816  149.82   153.824  158.818  163.812

The second row is for the approximate conversion 1 kg = 2.2 lb. I am puzzled by the fractional parts. NickyMcLean (talk) 22:11, 5 September 2011 (UTC)[reply]

Possible error in formula for standard error for coefficients

It seems to me that the 1/n should not be included in the formula for the standard errors for each coefficient. With 1/n, the values calculated in this example are not produced. Removing it generates the values given in the example. Would someone more knowledgeable in this subject examine this and correct the formula, if necessary? — Preceding unsigned comment added by 173.178.40.20 (talk) 17:37, 22 May 2012 (UTC)[reply]

Yes, this looks wrong to me as well. — Preceding unsigned comment added by 2620:0:1009:1:BAAC:6FFF:FE7D:1EE9 (talk) 17:24, 30 November 2012 (UTC)[reply]

I arrived at the same conclusion independently before checking this talk page, thus I removed the 1/n. 91.157.6.139 (talk) 12:06, 1 January 2015 (UTC)[reply]

I found this error as well, and, despite the previous comment, the 1/n was still there. I have now removed it. — Preceding unsigned comment added by 212.213.198.88 (talk) 10:50, 20 February 2015 (UTC)[reply]
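
For what it is worth, the usual formula is se(β̂_j) = sqrt(s² [(X'X)⁻¹]_jj) with s² = SSR/(n − p) and no extra 1/n; a quick cross-check against scipy's simple-regression output (made-up data) agrees with that form:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=30)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=30)

X = np.column_stack([np.ones_like(x), x])
n, p = X.shape
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
s2 = resid @ resid / (n - p)
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

print(se[1], stats.linregress(x, y).stderr)   # identical; an extra 1/n would not match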

On multicollinearity

Multicollinearity means a high level of correlation between the explanatory variables. OLS can handle this fine; it just needs more data to do it. However, in the article, "multicollinearity" is being used to mean perfect collinearity, i.e. the data matrix does not have full column rank. This is confusing. I propose we stop using multicollinearity to mean lack of full rank, and just say "not full rank". —Preceding unsigned comment added by 24.30.13.209 (talk) 04:21, 10 March 2011 (UTC)[reply]

Done. Let me know if I missed any instances of it. Duoduoduo (talk) 15:11, 10 March 2011 (UTC)[reply]
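
The distinction is easy to see numerically: an exactly duplicated column makes the design matrix lose full column rank (X'X singular, β not identified), while a merely highly correlated column does not. Sketch with made-up data:

import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2_corr = x1 + 0.01 * rng.normal(size=100)   # highly correlated, but not identical
x2_dup = x1.copy()                           # perfectly collinear

X_corr = np.column_stack([np.ones(100), x1, x2_corr])
X_dup = np.column_stack([np.ones(100), x1, x2_dup])

print(np.linalg.matrix_rank(X_corr))   # 3: full column rank, OLS still identified
print(np.linalg.matrix_rank(X_dup))    # 2: rank deficient, (X'X) is singular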

Estimation

In the section "Simple regression model": it is not true that β̂ = Cov[x, y] / Var[x]; this only holds for the true parameters and not the estimator. Rather, β̂ is equal to the sample covariance over the sample variance. Maybe use a hat over Cov and Var to signify this if you want the relation to be stressed.

Also, in my opinion a lot of time is being devoted to the use of the annihilator matrix. It doesn't seem necessary to introduce the extra notation unless one wants to go into the Frisch–Waugh–Lovell theorem, and this has its own separate page. — Preceding unsigned comment added by Superpronker (talkcontribs) 06:46, 1 June 2011 (UTC)[reply]
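
The point about the estimator is easy to verify numerically: the OLS slope equals the sample covariance divided by the sample variance (the ddof convention cancels in the ratio). Sketch with made-up data:

import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=25)
y = 0.5 + 1.5 * x + rng.normal(size=25)

slope = np.polyfit(x, y, 1)[0]                            # OLS slope
ratio = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)    # sample Cov / sample Var

print(np.allclose(slope, ratio))                          # True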

I believe that a clarification on notation would help immensely. The regressor values for one observation are referred to as the *column* vector x_i. However, in the design matrix of regressor values, the values for an observation occupy a *row*. Hence, it is easy to fall into the trap of thinking of x_i as a row vector, leading to confusion. Craniator (talk) 05:34, 3 May 2015 (UTC)[reply]
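
A one-line illustration of the notation point (made-up numbers): x_i is a column vector, but it appears in the design matrix as row i.

import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [1.0, 5.0, 8.0]])              # two observations (rows), three regressors
x_1 = X[0, :].reshape(-1, 1)                 # x_1 written as a column vector, shape (3, 1)
print(np.array_equal(X[0, :], x_1.ravel()))  # True: row i of X is the transpose of x_i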