# Talk:Algebraic formula for the variance

WikiProject Statistics (Rated Start-class, Mid-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as Start-Class on the quality scale.
This article has been rated as Mid-importance on the importance scale.
WikiProject Mathematics (Rated Start-class, Mid-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating: Start-Class, Mid-importance. Field: Probability and statistics

## old merge

It apparently has been suggested that the article should be merged into variance. The latter is a very long piece, while a reader interested specifically in the computational formula may find it difficult to find it there. The term "computational formula for the variance" is fairly standard in probability textbooks. It makes sense to have a separate article on it. Katzmik (talk) 13:14, 9 December 2007 (UTC)

Yes? —Preceding unsigned comment added by 65.183.135.231 (talk) 04:10, 4 May 2008 (UTC)

I copied the proof over from the variance page. Katzmik (talk) 14:57, 7 May 2008 (UTC)

## Example

It might be good to have a simple example of this method, even if it does sound simple. 64.81.60.187 (talk) 03:29, 21 April 2009 (UTC)

Is this article supposed to be about computing $s^2$ or Var(X)? Is there a general reference for using the title "Computational formula for the variance" for this (whatever the answer is)? 018 (talk) 21:31, 20 September 2010 (UTC)

This is a useful short article about an identity that is very commonly used when calculating variances by hand. Of course it's not common to calculate the sample variance by hand, but this identity is useful when working with small toy examples to demonstrate some basic point or provide a counterexample (and for all the thousands of students who use this identity to solve homework exercises each year). The identity is also useful in calculating population variances for many continuous distributions, when it is easier to calculate uncentered moments than centered moments (which is nearly always the case; the exponential distribution comes to mind as an example).
I am not aware of a reference for the title, and am not defending the choice of title (I don't see that anyone has proposed an alternative title). Ross (First Course in Probability) calls it the "alternative formula for the variance" and puts a box around it. Not every mathematical identity needs its own article, but this one seems prominent enough to merit an article.
This is of course not a good approach for computing variances on a finite precision device like a computer, due to roundoff error. I may add a mention of that somewhere in the article.
As to whether it is about $s^2$ or Var(X), I fully appreciate the difference between these two things, and that they are often confused. In the context of this specific calculation, there are parallel identities for the two, and it makes perfect sense to present them together. Skbkekas (talk) 04:04, 25 September 2010 (UTC)
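For the simple example requested at the top of this section, a minimal Python sketch may help. It compares the definition with the identity on toy data using exact rational arithmetic, then repeats the comparison in floating point on data shifted by $10^9$ to illustrate the roundoff problem mentioned above. The function names and both data sets are illustrative assumptions, not taken from the article.

```python
from fractions import Fraction


def variance_centered(xs):
    """Population variance from the definition: mean squared deviation."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def variance_algebraic(xs):
    """Population variance via the identity Var(X) = E[X^2] - (E[X])^2."""
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    return mean_sq - mean ** 2


# Toy data (assumed for illustration): mean 5, mean of squares 29, variance 4.
toy = [Fraction(x) for x in (2, 4, 4, 4, 5, 5, 7, 9)]
print(variance_centered(toy), variance_algebraic(toy))  # both are exactly 4

# The same kind of data shifted by 1e9 and stored as floats.
# The true population variance is 22.5; the centered form recovers it,
# while the algebraic form subtracts two numbers near 1e18 and loses it.
shifted = [1e9 + x for x in (4.0, 7.0, 13.0, 16.0)]
print(variance_centered(shifted))
print(variance_algebraic(shifted))
```

With exact arithmetic the two forms agree, which is the identity itself; the disagreement in the second case comes only from cancellation in double precision.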

## MLE versus the biased unbiased estimator

I changed this to represent the variance estimate as a modification of the MLE. The bias correction factor to the MLE is historically where the typical inverse N-1 comes from. However, nobody actually cares about variances; they only use them as an intermediate step to getting a standard deviation, and unbiasedness is not preserved under transformation, so the "unbiased" variance estimate leads to a biased standard deviation estimate. Use of inverse N-1 is a relic that few statisticians would argue for or care about (N is about N-1 anyway). You can see my version at [1].

Also, what does this even mean?

These results are often used in practice to calculate the variance when it is inconvenient to center a random variable by subtracting its expected value or to center a set of data by subtracting the sample mean.

Who centers? How hard can it be to recenter? Why would you recenter anything? Is this endorsed anywhere? There is also an implication that there is this way better way of doing this out there, but it is not stated. How much harder are these calculations than the second moment? 018 (talk) 02:05, 24 September 2010 (UTC)

I agree with you that too much often gets made of the N-1 terms in various statistical expressions (as opposed to just using N). But for better or worse, most statistical textbooks, and many statisticians, use the unbiased variance estimate. Incidentally, using N-1 in the sample variance doesn't give an unbiased estimate of the SD, but the SD estimate computed with N-1 in the denominator is generally less biased and more accurate than the SD estimate computed with N in the denominator (by just a small amount, usually).
As for the statement about centering, it's intended to give some motivation for how the identity is used. If you compute EX first, then go to $\int (x-EX)^2p(x)dx$ you generally would proceed by expanding $(x-EX)^2$ and computing uncentered moments, which effectively is rederiving the identity, so why not just use it? Yes, I know that sometimes you can use a change of variables instead of expanding the square. Skbkekas (talk) 04:43, 25 September 2010 (UTC)
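As a sketch of the expansion described above, writing $\mu = EX$ as shorthand (the symbols $\mu$ and $\lambda$ are notation assumed here, not taken from the article):

$$\int (x-\mu)^2 p(x)\,dx = \int x^2 p(x)\,dx - 2\mu \int x\,p(x)\,dx + \mu^2 \int p(x)\,dx = E[X^2] - 2\mu^2 + \mu^2 = E[X^2] - (EX)^2$$

For the exponential distribution with rate $\lambda$, the uncentered moments are $EX = 1/\lambda$ and $E[X^2] = 2/\lambda^2$, so the identity gives the variance directly:

$$\operatorname{Var}(X) = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$$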
I guess, I was saying, why not present it as an update to the MLE instead of some dew drop from nowhere. I also think it is worth noting that the "unbiased estimator" is biased when applied. 018 (talk) 19:06, 27 September 2010 (UTC)
It's only the MLE in the Gaussian case, so I don't see that as a natural way to introduce the sample variance. The method of moments perspective is a lot more general. In any case, this article is about a mathematical identity, not the statistical properties or interpretation of the quantities involved in the identity. The identity is pure algebra and does not rely on any statistics or probability. These issues would fit better in the variance and standard deviation articles. Skbkekas (talk) 13:57, 28 September 2010 (UTC)
So, we should take out the N-1? The motivation for it is entirely the unbiased estimator for the variance of the normal, right? 018 (talk) 19:55, 28 September 2010 (UTC)
Using N-1 in the denominator gives an unbiased estimate of the variance for any distribution that has a variance, not just in the normal case. Skbkekas (talk) 20:07, 28 September 2010 (UTC)
Thanks for that info. I am relieved to learn that it is always the right answer to the wrong question. 018 (talk) 22:47, 28 September 2010 (UTC)
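The two claims in this thread (N-1 gives an unbiased variance estimate for non-normal data, yet a biased SD estimate) can be checked exactly on a small case. The sketch below enumerates every sample of size 3 from a hypothetical three-point distribution; the values, probabilities, sample size, and names are all assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product
from math import sqrt


def sample_variance(xs, ddof):
    """Sum of squared deviations from the sample mean, divided by n - ddof."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)


# Hypothetical skewed, clearly non-normal population.
values = (0, 1, 5)
probs = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))

mu = sum(p * v for v, p in zip(values, probs))
sigma2 = sum(p * v * v for v, p in zip(values, probs)) - mu ** 2  # E[X^2] - (EX)^2

n = 3
exp_var_nm1 = Fraction(0)  # exact expectation of the N-1 variance estimate
exp_var_n = Fraction(0)    # exact expectation of the N variance estimate
exp_sd_nm1 = 0.0           # expectation of the SD estimates (floating point)
exp_sd_n = 0.0

# Enumerate all i.i.d. samples of size n, weighted by their probability.
for idx in product(range(len(values)), repeat=n):
    weight = Fraction(1)
    for i in idx:
        weight *= probs[i]
    xs = [values[i] for i in idx]
    s2_nm1 = sample_variance(xs, ddof=1)
    s2_n = sample_variance(xs, ddof=0)
    exp_var_nm1 += weight * s2_nm1
    exp_var_n += weight * s2_n
    exp_sd_nm1 += float(weight) * sqrt(float(s2_nm1))
    exp_sd_n += float(weight) * sqrt(float(s2_n))

print("true variance:", sigma2)
print("E[s^2] with N-1:", exp_var_nm1, " with N:", exp_var_n)
print("true SD:", sqrt(float(sigma2)))
print("E[s] with N-1:", exp_sd_nm1, " with N:", exp_sd_n)
```

In this one example the N-1 variance estimate matches the true variance exactly, while both SD estimates fall below the true SD, the N-1 version by less. A single small case illustrates the claims but does not prove them in general.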

On the topic of the paragraph

These results are often used in practice to calculate the variance when it is inconvenient to center a random variable by subtracting its expected value or to center a set of data by subtracting the sample mean.

I still don't understand this at all. "these" means which results? Do you agree that it might be convenient to center and that you would then just calculate the second central moment? 018 (talk) 22:47, 28 September 2010 (UTC)

## Origins / History

When I learned this at university, it was introduced as "Satz von Steiner" or "Verschiebungssatz von Steiner". The first name is also mentioned in the German Wikipedia version of this article (de:Verschiebungssatz (Statistik)), so I assume this is rather common. It would translate to something like "Steiner translation theorem"; not to be confused with the parallel axis theorem, which is also known as "de:Satz von Steiner". But talking about confusion: these two might be closely related, and indeed I've seen this formula attributed to Jakob Steiner as well. But it is also referenced as fr:Théorème de König-Huyghens (= "König-Huyghens Theorem"). Therefore, it would be sensible to: a) dig up the original publications to properly attribute the development of this formula as historical information on this theorem; b) list all the names it is known by; c) if it is closely related to the parallel axis theorem (which in the English Wikipedia is also listed as "Huygens-Steiner theorem"), have the two reference each other. --Chire (talk) 16:59, 12 July 2011 (UTC)

## Suggested move

The following discussion is an archived discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a move review. No further edits should be made to this section.

The result of the move request was: Moved to Algebraic formula for the variance. Mike Cline (talk) 22:30, 25 February 2013 (UTC)

Computational formula for the variance → Algebraic formula for the variance – To avoid confusion with the existing article Algorithms for calculating variance, as this one is not about the use of computer algorithms. 81.98.35.149 (talk) 20:20, 6 February 2013 (UTC)

### Survey

Feel free to state your position on the renaming proposal by beginning a new line in this section with *'''Support''' or *'''Oppose''', then sign your comment with ~~~~. Since polling is not a substitute for discussion, please explain your reasons, taking into account Wikipedia's policy on article titles.