Squared deviations
In probability theory and statistics, the definition of variance is either the expected value (when considering a theoretical distribution), or average value (for actual experimental data), of squared deviations from the mean. Computations for analysis of variance involve the partitioning of a sum of squared deviations. An understanding of the complex computations involved is greatly enhanced by a detailed study of the statistical value:
It is well known that for a random variable
with mean
and variance
:
Therefore
From the above, the following are easily derived:
If
is a vector of n predictions, and
is the vector of the true values, then the SSE of the predictor is: 
Contents |
Sample variance [edit]
The sum of squared deviations needed to calculate variance (before deciding whether to divide by n or n − 1) is most easily calculated as
From the two derived expectations above the expected value of this sum is
which implies
This effectively proves the use of the divisor n − 1 in the calculation of an unbiased sample estimate of σ2.
Partition — analysis of variance [edit]
In the situation where data is available for k different treatment groups having size ni where i varies from 1 to k, then it is assumed that the expected mean of each group is
and the variance of each treatment group is unchanged from the population variance
.
Under the Null Hypothesis that the treatments have no effect, then each of the
will be zero.
It is now possible to calculate three sums of squares:
- Individual
- Treatments
Under the null hypothesis that the treatments cause no differences and all the
are zero, the expectation simplifies to
- Combination
Sums of squared deviations [edit]
Under the null hypothesis, the difference of any pair of I, T, and C does not contain any dependency on
, only
.
total squared deviations aka total sum of squares
treatment squared deviations aka explained sum of squares
residual squared deviations aka residual sum of squares
The constants (n − 1), (k − 1), and (n − k) are normally referred to as the number of degrees of freedom.
Example [edit]
In a very simple example, 5 observations arise from two treatments. The first treatment gives three values 1, 2, and 3, and the second treatment gives two values 4, and 6.
Giving
- Total squared deviations = 66 − 51.2 = 14.8 with 4 degrees of freedom.
- Treatment squared deviations = 62 − 51.2 = 10.8 with 1 degree of freedom.
- Residual squared deviations = 66 − 62 = 4 with 3 degrees of freedom.
Two-way analysis of variance [edit]
The following hypothetical example gives the yields of 15 plants subject to two different environmental variations, and three different fertilisers.
| Extra CO2 | Extra humidity | |
|---|---|---|
| No fertiliser | 7, 2, 1 | 7, 6 |
| Nitrate | 11, 6 | 10, 7, 3 |
| Phosphate | 5, 3, 4 | 11, 4 |
Five sums of squares are calculated:
| Factor | Calculation | Sum | ![]() |
|---|---|---|---|
| Individual | ![]() |
641 | 15 |
| Fertiliser × Environment | ![]() |
556.1667 | 6 |
| Fertiliser | ![]() |
525.4 | 3 |
| Environment | ![]() |
519.2679 | 2 |
| Composite | ![]() |
504.6 | 1 |
Finally, the sums of squared deviations required for the analysis of variance can be calculated.
| Factor | Sum | ![]() |
Total | Environment | Fertiliser | Fertiliser × Environment | Residual |
|---|---|---|---|---|---|---|---|
| Individual | 641 | 15 | 1 | 1 | |||
| Fertiliser × Environment | 556.1667 | 6 | 1 | −1 | |||
| Fertiliser | 525.4 | 3 | 1 | −1 | |||
| Environment | 519.2679 | 2 | 1 | −1 | |||
| Composite | 504.6 | 1 | −1 | −1 | −1 | 1 | |
| Squared deviations | 136.4 | 14.668 | 20.8 | 16.099 | 84.833 | ||
| Degrees of freedom | 14 | 1 | 2 | 2 | 9 |
See also [edit]
References [edit]
- ^ Mood & Graybill: An introduction to the Theory of Statistics (McGraw Hill)

















total squared deviations aka
treatment squared deviations aka
residual squared deviations aka 






