# Talk:Mean squared error

WikiProject Statistics (Rated C-class, High-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.


## INCONSISTENT ARTICLE TITLE

There is another article published on Wikipedia and titled "Root mean square error". Notice the use of the adjective "square", rather than "squared".

For the sake of consistency, I suggest using "square" everywhere, including in the title of this article, and indicating in the text that "squared" can also be used:

| Symbol | Preferred name | Other name |
|--------|----------------|------------|
| SS | Sum of squares | |
| MS | Mean square | ("Mean squared" is not correct) |
| RMS | Root mean square | ("Root mean squared" is not correct) |
| MSE | Mean square error | Mean squared error |
| RMSE | Root mean square error | Root mean squared error |
| MSD | Mean square deviation | Mean squared deviation |
| RMSD | Root mean square deviation | Root mean squared deviation |

A disambiguation is also necessary.

It is also true that a Google search yields about twice as many hits for "mean square error" as for "mean squared error". Cazort (talk) 17:22, 23 December 2007 (UTC)

Even though nobody's responded in three years, I'm going to add my support that something should be changed. The title says "mean squared error", but the first line says "mean square error". Maybe there should be consistency within the same article, at least? 67.174.115.100 (talk) 10:02, 15 November 2010 (UTC)
I agree, so I'll move it. Qwfp (talk) 09:24, 16 November 2010 (UTC)

## MSE examples wrong?

I suspect that the MSEs presented for ${\displaystyle S_{n-1}^{2}}$ and ${\displaystyle S_{n}^{2}}$ might be wrong. At least the formulas presented are completely different from those shown in Mood, Graybill and Boes (1974) Introduction to the Theory of Statistics (see pages 229 and 294). The formulas presented in MGB, which is a classic, are more complex and include terms in ${\displaystyle \mu _{4}}$. The derivation of the MSE for ${\displaystyle S_{n-1}^{2}}$, which is the same as the variance in this case, seems a rather tedious task. I think someone with some expertise in this area should have a look at this issue. --Bluemaster (talk) 21:06, 22 June 2008 (UTC)

• μ4 implies they're looking at a multivariate estimator, most likely the James-Stein estimator, which is a different animal than the univariate ones. -- DanielPenfield (talk) 22:41, 22 June 2008 (UTC)
• No, it is indeed the univariate estimator in this case: ${\displaystyle \mu _{4}}$ in this context is just ${\displaystyle E[(X-\mu _{x})^{4}]}$, the fourth central moment. The MSE expression in Mood, Graybill and Boes for ${\displaystyle S_{n-1}^{2}}$ is ${\displaystyle {\frac {1}{n}}[\mu _{4}-{\frac {n-3}{n-1}}\sigma ^{4}]}$. A possible explanation is that this expression simplifies to the one presented in the article, but to be on the safe side, an expert review would be advisable.--Bluemaster (talk) 01:48, 23 June 2008 (UTC)
• The formula presented as MSE for ${\displaystyle S_{n}^{2}}$ is clearly wrong if the MSE for ${\displaystyle S_{n-1}^{2}}$ is correct (not sure). As ${\displaystyle S_{n}^{2}={\frac {n-1}{n}}S_{n-1}^{2}}$, it is easy to show that ${\displaystyle \operatorname {MSE} (S_{n}^{2})={\frac {2n-1}{n^{2}}}\sigma ^{4}}$, accepting the result on ${\displaystyle \operatorname {MSE} (S_{n-1}^{2})}$ as correct. --Bluemaster (talk) 12:45, 24 June 2008 (UTC)

• After some research I believe I've found out what is going on: the result ${\displaystyle \operatorname {MSE} (S_{n-1}^{2})={\frac {2}{n-1}}\sigma ^{4}}$, presented in the article, is not general but is correct under the assumption that ${\displaystyle (n-1)S_{n-1}^{2}/\sigma ^{2}}$ has ${\displaystyle \chi ^{2}}$ distribution with ${\displaystyle n-1}$ degrees of freedom, which has variance ${\displaystyle 2(n-1)}$. This result follows, for instance, from the assumption of normality on the ${\displaystyle X_{i}}$s used in the computation of ${\displaystyle S_{n-1}^{2}}$. The general result, without distributional assumptions, is, I believe, the one presented in Mood, Graybill and Boes (1974, p. 229 and 294) that is ${\displaystyle \operatorname {MSE} (S_{n-1}^{2})={\frac {1}{n}}[\mu _{4}-{\frac {n-3}{n-1}}\sigma ^{4}]}$, with ${\displaystyle \mu _{4}=\operatorname {E} [(X-\mu )^{4}]}$. The result ${\displaystyle \operatorname {MSE} (S_{n}^{2})={\frac {2n+1}{n^{2}}}\sigma ^{4}}$ is clearly wrong and will be corrected. The correct result, derived from ${\displaystyle \operatorname {MSE} (S_{n-1}^{2})={\frac {2}{n-1}}\sigma ^{4}}$, is ${\displaystyle \operatorname {MSE} (S_{n}^{2})={\frac {2n-1}{n^{2}}}\sigma ^{4}}$. The conclusion that ${\displaystyle \operatorname {MSE} (S_{n}^{2})<\operatorname {MSE} (S_{n-1}^{2})}$ is indeed correct for the Gaussian distributions, despite these problems. I will wait for comments during the next days and will proceed to make some changes in the article to reflect these findings if there is no disagreement or more insights. --Bluemaster (talk) 13:15, 25 June 2008 (UTC)
• Corrections made. Bluemaster (talk) 03:03, 10 July 2008 (UTC)
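For what it's worth, the algebra in the thread above can be checked mechanically with exact rational arithmetic. This is just a sketch: the helper names `mgb_mse` and `mse_biased` are mine, and it assumes normal data (for which the fourth central moment is ${\displaystyle \mu _{4}=3\sigma ^{4}}$).

```python
from fractions import Fraction

def mgb_mse(n, sigma4, mu4):
    # General MSE of the unbiased sample variance S_{n-1}^2, per
    # Mood, Graybill & Boes (1974): (1/n)[mu_4 - ((n-3)/(n-1)) sigma^4]
    return Fraction(1, n) * (mu4 - Fraction(n - 3, n - 1) * sigma4)

def mse_biased(n, sigma4):
    # MSE of S_n^2 under normality, via MSE = Var + bias^2:
    # S_n^2 = ((n-1)/n) S_{n-1}^2, so Var = ((n-1)/n)^2 * 2 sigma^4/(n-1)
    # and bias = -sigma^2/n.
    c = Fraction(n - 1, n)
    var = c**2 * Fraction(2, n - 1) * sigma4
    bias_sq = Fraction(1, n**2) * sigma4   # (sigma^2/n)^2
    return var + bias_sq

sigma4 = Fraction(1)       # take sigma^4 = 1 without loss of generality
mu4_normal = 3 * sigma4    # fourth central moment of a normal distribution

for n in range(2, 50):
    # Under normality the general formula collapses to 2 sigma^4/(n-1) ...
    assert mgb_mse(n, sigma4, mu4_normal) == Fraction(2, n - 1)
    # ... the corrected result for the biased estimator is (2n-1) sigma^4/n^2 ...
    assert mse_biased(n, sigma4) == Fraction(2 * n - 1, n**2)
    # ... and the latter is indeed strictly smaller, as the thread concludes.
    assert mse_biased(n, sigma4) < Fraction(2, n - 1)
```

Running it raises no assertion for any n from 2 to 49, which supports Bluemaster's corrections.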

### Also contradictory: In the section "regression"

In regression analysis, the term mean squared error is sometimes used to refer to the unbiased estimate of error variance: the residual sum of squares divided by the number of degrees of freedom... Note that, although the MSE is not an unbiased estimator of the error variance, it is consistent, given the consistency of the predictor.

We have to choose; it cannot be both biased and unbiased.

--Livingthingdan (talk) 14:33, 25 January 2014 (UTC)

I've clarified it in the text. Loraof (talk) 18:21, 19 August 2016 (UTC)

## MSE

The article does not give explicit formulae of the MSE for the estimators in the example. Could someone fill this in?

Someone has suggested that the page for Root mean square deviation (RMSD) be merged with mean squared error. I do not think that it makes sense to do this, for several reasons:

1. MSE is a measure of error, whereas RMSD is a method for comparing two biological structures.
2. RMSD is used almost exclusively in the context of protein folding, whereas MSE is used throughout statistics.
3. Merging the articles would result in losing the meaning of the RMSD article.

Note that root mean squared deviation is different from root mean squared error.

1. RMSE = estimator of average error, RMSD = estimator of average distance. They're measuring the same thing: differences or variation.
2. RMSD is used in disciplines other than bioinformatics/biostatistics—try googling RMSD and "electrical engineering", for example.
3. Merging the articles should preserve the RMSD tie-in.
• My opinion: -- PdL -- January 11 2007 (UTC)
1. D in RMSD typically stands for "deviation", not for "distance". The distance and the difference between two scalar values are not exactly the same thing: the distance is the absolute value of the difference.
2. Deviation is the difference between the real value of a variable and its estimated or expected or predicted or "desired" value (for instance, the mean).
3. On the other hand, "error" is the difference between an estimated value of a variable and its real value. There are errors "of estimate" as well as errors "of measurement", and they are all with respect to the (often unknown) real value of the variable.
4. I conclude that a deviation is the additive opposite of an error. I agree that both words indicate differences, but they do not have exactly the same meaning, and it is inappropriate to use them as synonyms.

## Content

Specific suggestion: If someone agrees with me on the following statement, then it would be helpful if added into the article--

"MSE is also sometimes called the variance; RMSE is also sometimes called the standard deviation."

I'm pretty sure that's correct, but I won't add it without confirmation.

• Agreed that the article could be made more friendly to those of us who haven't studied statistical theory. BTW, MSE and RMSE estimate the variance and standard deviation. To equate them would be inaccurate. --DanielPenfield 17:40, 1 November 2006 (UTC)

Some indication of how MSE differs from the variance would be useful.

MSE has a lot in common with variance but they are not the same! As an example, suppose you are trying to estimate the mean of a random variable that has a normal distribution with mean m and variance 1. The mean, m, is a fixed number, but it is unknown. Now, suppose you take a sample from this random variable. If you try to estimate m, your estimator is taking the sample and using it to guess m. A simple case would be to take one sample and have your guess for m be whatever value is picked. But you could also pick many samples and use, say, the median of your sample as the estimator. Or you could just guess the value 0 no matter what (this would be a bad estimator but it would still be an estimator!). The variance of the estimator is going to be the amount by which the estimator varies about ITS mean, not the true mean. The MSE is the amount by which the estimator varies about the TRUE mean, which in this example is the number m. For an unbiased estimator, the MSE and the variance are the same. But often, it is not possible to find an unbiased estimator, or in some cases a biased estimator might be preferred. I hope this answers the questions given here. Cazort (talk) 17:18, 23 December 2007 (UTC)
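The distinction above can be made concrete by exhaustive enumeration. A minimal sketch, with made-up numbers (X uniform on {1, 2, 3}, true mean m = 2) and a helper `moments` of my own naming, showing that MSE = variance + bias² and that the two quantities coincide exactly when the bias is zero:

```python
from fractions import Fraction

# Toy setup: X is uniform on {1, 2, 3}, so the true mean is m = 2.
support = [Fraction(v) for v in (1, 2, 3)]
m = sum(support) / len(support)          # true mean, here 2

def moments(estimates):
    """Given the estimator's value for each equally likely sample,
    return (variance about its own mean, squared bias, MSE about m)."""
    p = Fraction(1, len(estimates))
    own_mean = sum(estimates) * p
    var = sum((e - own_mean) ** 2 for e in estimates) * p
    bias_sq = (own_mean - m) ** 2
    mse = sum((e - m) ** 2 for e in estimates) * p
    return var, bias_sq, mse

# Estimator 1: report the single observed sample (unbiased).
var1, b1, mse1 = moments(support)
assert b1 == 0 and mse1 == var1          # unbiased: MSE equals variance

# Estimator 2: always guess 0, ignoring the data (badly biased).
var2, b2, mse2 = moments([Fraction(0)] * 3)
assert var2 == 0 and mse2 == b2 == 4     # zero variance, but MSE = bias^2

# In both cases the decomposition MSE = Var + bias^2 holds exactly.
assert mse1 == var1 + b1 and mse2 == var2 + b2
```

The "always guess 0" estimator has the smallest possible variance yet a large MSE, which is exactly why the two notions must not be conflated.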

"MSE is also sometimes called the variance; RMSE is also sometimes called the standard deviation." Well, the MSE is a random variable itself that needs to be estimated. It's not just a number. If it has been estimated, it gives a measure of the variation of an estimator with respect to a known parameter. But it is not the variance, as it also accounts for the bias of the estimator. Squim 10:59, 24 December 2006 (UTC)

In Examples, is it really true that ${\displaystyle S^{2}={\frac {1}{n}}\sum _{i=1}^{n}\left(X_{i}-{\overline {X}}\,\right)^{2}}$ has a lower MSE than the unbiased estimator ${\displaystyle S^{2}={\frac {1}{n-1}}\sum _{i=1}^{n}\left(X_{i}-{\overline {X}}\,\right)^{2}}$? I agree that it has a lower variance, but this is offset when calculating the MSE by the bias term.
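A quick Monte Carlo experiment suggests the answer is yes, at least for normal data: the divisor-n estimator's extra bias is more than offset by its smaller variance. This is only a simulation sketch (my own variable names; normal data with σ² = 1, n = 5, seeded for reproducibility), not a proof:

```python
import random

random.seed(0)  # fixed seed so the experiment is reproducible

def sample_variances(xs):
    """Return (divisor n-1 estimate, divisor n estimate) for one sample."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    return ss / (n - 1), ss / n

n, sigma2, trials = 5, 1.0, 200_000
err_unbiased = err_biased = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    s2_u, s2_b = sample_variances(xs)
    err_unbiased += (s2_u - sigma2) ** 2
    err_biased += (s2_b - sigma2) ** 2

mse_u = err_unbiased / trials   # theory under normality: 2*sigma^4/(n-1) = 0.5
mse_b = err_biased / trials     # theory under normality: (2n-1)*sigma^4/n^2 = 0.36
assert mse_b < mse_u            # the biased divisor-n estimator has lower MSE
```

The bias term does contribute to the MSE, as the comment says, but it does not fully offset the variance reduction here.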

Providing a practical example with real numbers would be desirable. —Preceding unsigned comment added by 131.203.101.15 (talk) 22:14, 16 September 2007 (UTC)

I agree wholeheartedly with this sweeping critique. Like many mathematics articles on Wikipedia, it's written by experts for experts instead of by experts for laymen, but since laymen don't really understand where to start asking questions, the problem is never fixed. I speak English and if you tell me something in English, I can understand it. But if you write something in a mathematical equation using symbols that are by convention known only to those who have studied mathematics formally, I will not understand you. Hence, if you tell me "the mean squared error equals the average (mean) of the squares of the variance" I know what you mean. I don't know what

${\displaystyle \operatorname {MSE} ({\hat {\theta }})=\operatorname {E} {\big [}({\hat {\theta }}-\theta )^{2}{\big ]}.}$

means, except that I'm able to guess from the context. Please state all these equations in English. ${\displaystyle \operatorname {E} }$ is just a capital "E" where I'm from. 134.114.203.37 (talk) 18:40, 20 January 2011 (UTC)

It would have been nice if it was mentioned that ${\displaystyle \operatorname {E} }$ is the symbol for the Expected value. MahdiEynian (talk) 08:39, 4 September 2012 (UTC)

I agree with the previous comment that this article is pretty useless to anyone but a math major. In my opinion, most people look up MSE to get a general idea what it is and how to calculate it - and not for the ultra-precise mathematical definition. If I don't know what the MSE is, I am very likely also not to know what all those other Greek symbols on the page mean... (and worse - I can't even google them). What about a simple paragraph for the layman first, along the lines of MSE = average((y-x)^2) / average((y^2 - x^2))... — Preceding unsigned comment added by 146.203.126.246 (talk) 21:58, 31 January 2012 (UTC)
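In plain code, the layman's calculation amounts to averaging the squared differences between observed and predicted values. A minimal illustration with made-up numbers (the lists below are purely hypothetical):

```python
# Hypothetical observed values and predictions, just to illustrate the formula.
observed  = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

n = len(observed)
mse = sum((y - x) ** 2 for y, x in zip(observed, predicted)) / n
rmse = mse ** 0.5   # root mean squared error is just the square root of MSE

# Squared differences: 0.25, 0.0, 2.25, 1.0 -> their average is 0.875
assert mse == 0.875
```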

## Squared error loss

Squared error loss redirects to this page, and appropriately so. However, because of this, and because that term is fairly commonly used, and also because this is a question of naming and definitions, I think that the remark about the term squared error loss should remain at the very top of the page, with the definition. Cazort (talk) 23:08, 26 December 2007 (UTC)

I first want to say that I am fully committed to making this page accessible, but there are some disagreements about how to do this. In particular, I have been editing the page in order to make it more concise, and removing the explanations/expositions of topics that are duplicated elsewhere. The way I look at things is this:

• Using technical terms does not necessarily make the article less accessible, nor does replacing them with expanded explanations necessarily make it more so.
• Defining technical terms within the article is not appropriate when it leads to duplication elsewhere on Wikipedia. Rather, these terms should be referenced on their own pages. This is the whole point of Wikipedia! Wikipedia is based around the idea of a web of knowledge, not a more-or-less linear exposition of knowledge such as is found in most textbooks.

Cazort (talk) 00:15, 28 December 2007 (UTC)

Defining terms, no. But is defining the variables used in the examples section possible? I would do it myself if only I had the knowledge. Inclusion of the variable names would, in one fell swoop, change this article's value to me from nearly useless to something frequently referenced. It is very difficult to apply currently. Cranhandler (talk) 22:31, 24 October 2009 (UTC)

## Normative statements

I think that a statement "The error is phrased as a mean of squares ... because ..." is problematic because it does not specify what is meant by "because". I think that there are different things going on here, which is that MSE is used in some circumstances solely out of convenience and because the exact choice of loss function has little bearing on the result. However, in other situations, it is used because it approximates some loss function arising in utility theory. In other situations, it might be inappropriate. Still more, there are circumstances where there are compelling theoretical reasons to use it--such as its direct relationship to the expected value, whereas mean absolute error is a more natural way to measure the error of an estimate of a median. On these grounds, I would like to say that I don't think we should avoid normative statements about the loss functions, rather, I think we really ought to include them, and to discuss in more detail exactly why MSE is used in different situations. Cazort (talk) 00:28, 28 December 2007 (UTC)

## multiple usages.

This article confuses two distinct usages of MSE. I've clarified the distinction in the first section, but other sections still have the same confusion.

Also, with regard to square vs squared: It's squared error, of which we then take either the expectation or an average (depending on usage). Therefore Mean Squared Error is correct, Mean Square Error is incorrect. (Similarly, chi-square distribution is incorrect. Chi-squared is correct).

--Zaqrfv (talk) 22:24, 13 August 2008 (UTC)

The second usage would be more correctly placed at residual sum of squares. See Errors and residuals in statistics. 3mta3 (talk) 18:48, 26 October 2009 (UTC)