Squared deviations from the mean

Squared deviations from the mean (SDM) are involved in various calculations. In probability theory and statistics, the definition of variance is either the expected value of the SDM (when considering a theoretical distribution) or its average value (for actual experimental data). Computations for analysis of variance involve the partitioning of a sum of SDM.
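
For a concrete reading of the two parts of this definition, the following sketch (a minimal illustration, not part of the article; the data are arbitrary) computes a variance as the average of the squared deviations from the mean.

    xs = [2, 4, 4, 4, 5, 5, 7, 9]          # arbitrary sample
    mean = sum(xs) / len(xs)               # 5.0
    sdm = [(x - mean) ** 2 for x in xs]    # squared deviations from the mean
    print(sum(sdm) / len(xs))              # average SDM (population variance): 4.0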

Introduction

An understanding of the computations involved is greatly enhanced by a study of the statistical value

    \operatorname{E}(X^2),

where E is the expected value operator.

It is well known that for a random variable X with mean μ and variance σ²:

    \sigma^2 = \operatorname{E}(X^2) - \mu^2 [1]

Therefore

    \operatorname{E}(X^2) = \sigma^2 + \mu^2.

From the above, the following are easily derived:

    \operatorname{E}\left(\sum X^2\right) = n\sigma^2 + n\mu^2

    \operatorname{E}\left(\left(\sum X\right)^2\right) = n\sigma^2 + n^2\mu^2

If ŷ is a vector of n predictions and y is the vector of the true values, then the SSE of the predictor is

    \mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.
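
As a quick numerical check of these identities, the sketch below (illustrative only; the distribution and its parameters are arbitrary choices, not from the article) estimates E(X²) by simulation and compares it with σ² + μ², then computes an SSE for a toy prediction vector.

    import random

    random.seed(42)

    # Arbitrary illustrative parameters: X ~ Normal(mu, sigma).
    mu, sigma, n = 3.0, 2.0, 100_000
    xs = [random.gauss(mu, sigma) for _ in range(n)]

    # Monte Carlo estimate of E(X^2) versus sigma^2 + mu^2.
    mean_x2 = sum(x * x for x in xs) / n
    print(mean_x2)             # close to 13.0
    print(sigma**2 + mu**2)    # exactly 13.0

    # SSE of a predictor: squared differences between true values y
    # and predictions y_hat (hypothetical toy numbers).
    y = [1.0, 2.0, 3.0]
    y_hat = [1.1, 1.9, 3.2]
    print(sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat)))   # ≈ 0.06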

Sample variance

Main article: Sample variance

The sum of squared deviations needed to calculate sample variance (before deciding whether to divide by n or n − 1) is most easily calculated as

    S = \sum x^2 - \frac{\left(\sum x\right)^2}{n}

From the two derived expectations above, the expected value of this sum is

    \operatorname{E}(S) = n\sigma^2 + n\mu^2 - \frac{n\sigma^2 + n^2\mu^2}{n}

which implies

    \operatorname{E}(S) = (n - 1)\sigma^2

This effectively proves the use of the divisor n − 1 in the calculation of an unbiased sample estimate of σ2.
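
The role of the divisor n − 1 can also be seen empirically. The sketch below (an illustration under arbitrary, assumed parameters) draws many small samples, computes S for each, and shows that S/(n − 1) averages close to σ² while S/n systematically underestimates it.

    import random

    random.seed(0)

    mu, sigma, n, trials = 5.0, 3.0, 10, 50_000
    avg_unbiased = 0.0   # running sum of S / (n - 1)
    avg_biased = 0.0     # running sum of S / n

    for _ in range(trials):
        xs = [random.gauss(mu, sigma) for _ in range(n)]
        total = sum(xs)
        total_sq = sum(x * x for x in xs)
        s = total_sq - total * total / n   # S = sum x^2 - (sum x)^2 / n
        avg_unbiased += s / (n - 1)
        avg_biased += s / n

    print(avg_unbiased / trials)   # close to sigma^2 = 9
    print(avg_biased / trials)     # close to (n - 1)/n * sigma^2 = 8.1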

Partition — analysis of variance

In the situation where data is available for k different treatment groups having size n_i, where i varies from 1 to k, it is assumed that the expected mean of each group is

    \operatorname{E}(\mu_i) = \mu + T_i

and the variance of each treatment group is unchanged from the population variance σ².

Under the null hypothesis that the treatments have no effect, each of the T_i will be zero.

It is now possible to calculate three sums of squares:

Individual:

    I = \sum x^2

    \operatorname{E}(I) = n\sigma^2 + n\mu^2

Treatments:

    T = \sum_{i=1}^{k} \frac{\left(\sum_{j=1}^{n_i} x_{ij}\right)^2}{n_i}

    \operatorname{E}(T) = k\sigma^2 + \sum_{i=1}^{k} n_i (\mu + T_i)^2

    \operatorname{E}(T) = k\sigma^2 + n\mu^2 + 2\mu \sum_{i=1}^{k} n_i T_i + \sum_{i=1}^{k} n_i T_i^2

Under the null hypothesis that the treatments cause no differences and all the T_i are zero, the expectation simplifies to

    \operatorname{E}(T) = k\sigma^2 + n\mu^2

Combination:

    C = \frac{\left(\sum x\right)^2}{n}

    \operatorname{E}(C) = \sigma^2 + n\mu^2
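
In code, the three sums fall out of a few lines. The helper below is a hypothetical sketch (the function name and the groups-as-lists layout are my own, not the article's); it takes the k treatment groups and returns I, T, and C.

    def sums_of_squares(groups):
        """Return (I, T, C) for a list of treatment groups.

        I: every observation squared, then summed.
        T: sum over groups of (group total)^2 / group size.
        C: (grand total)^2 / total number of observations.
        """
        all_obs = [x for group in groups for x in group]
        n = len(all_obs)
        i_sum = sum(x * x for x in all_obs)
        t_sum = sum(sum(group) ** 2 / len(group) for group in groups)
        c_sum = sum(all_obs) ** 2 / n
        return i_sum, t_sum, c_sum

For instance, sums_of_squares([[1, 2, 3], [4, 6]]) returns (66, 62.0, 51.2), the values used in the example below.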

Sums of squared deviations

Under the null hypothesis, the difference of any pair of I, T, and C does not contain any dependency on μ, only σ².

    \operatorname{E}(I - C) = (n - 1)\sigma^2    total squared deviations aka total sum of squares

    \operatorname{E}(T - C) = (k - 1)\sigma^2    treatment squared deviations aka explained sum of squares

    \operatorname{E}(I - T) = (n - k)\sigma^2    residual squared deviations aka residual sum of squares

The constants (n − 1), (k − 1), and (n − k) are normally referred to as the number of degrees of freedom.

Example

In a very simple example, 5 observations arise from two treatments. The first treatment gives three values 1, 2, and 3, and the second treatment gives two values, 4 and 6.

Computing the three sums of squares (here n = 5 and k = 2):

    I = 1^2 + 2^2 + 3^2 + 4^2 + 6^2 = 66

    T = \frac{(1 + 2 + 3)^2}{3} + \frac{(4 + 6)^2}{2} = 12 + 50 = 62

    C = \frac{(1 + 2 + 3 + 4 + 6)^2}{5} = \frac{256}{5} = 51.2

Giving

Total squared deviations = 66 − 51.2 = 14.8 with 4 degrees of freedom.
Treatment squared deviations = 62 − 51.2 = 10.8 with 1 degree of freedom.
Residual squared deviations = 66 − 62 = 4 with 3 degrees of freedom.
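
The same arithmetic, written out directly in a short sketch (a restatement of the example in code, with hypothetical variable names):

    groups = [[1, 2, 3], [4, 6]]       # the two treatments
    all_obs = [x for g in groups for x in g]
    n, k = len(all_obs), len(groups)   # n = 5, k = 2

    i_sum = sum(x * x for x in all_obs)                 # I = 66
    t_sum = sum(sum(g) ** 2 / len(g) for g in groups)   # T = 62.0
    c_sum = sum(all_obs) ** 2 / n                       # C = 51.2

    print(i_sum - c_sum, n - 1)   # total:     ≈ 14.8 with 4 d.o.f.
    print(t_sum - c_sum, k - 1)   # treatment: ≈ 10.8 with 1 d.o.f.
    print(i_sum - t_sum, n - k)   # residual:  4.0 with 3 d.o.f.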

Two-way analysis of variance

The following hypothetical example gives the yields of 15 plants subject to two different environmental variations and three different fertilisers.

                 Extra CO2   Extra humidity
No fertiliser    7, 2, 1     7, 6
Nitrate          11, 6       10, 7, 3
Phosphate        5, 3, 4     11, 4

Five sums of squares are calculated:

Factor                     Calculation                                      Sum        σ²
Individual                 7² + 2² + 1² + … + 11² + 4²                      641        15
Fertiliser × Environment   10²/3 + 13²/2 + 17²/2 + 20²/3 + 12²/3 + 15²/2    556.1667   6
Fertiliser                 23²/5 + 37²/5 + 27²/5                            525.4      3
Environment                39²/8 + 48²/7                                    519.2679   2
Composite                  87²/15                                           504.6      1

The σ² column gives the coefficient of σ² in the expected value of each sum (the number of squared totals in the calculation).

Finally, the sums of squared deviations required for the analysis of variance can be calculated.

Factor                     Sum        σ²   Total   Environment   Fertiliser   Fertiliser × Environment   Residual
Individual                 641        15   1                                                             1
Fertiliser × Environment   556.1667   6                                       1                          −1
Fertiliser                 525.4      3                          1            −1
Environment                519.2679   2            1                          −1
Composite                  504.6      1    −1      −1            −1           1
Squared deviations                         136.4   14.668        20.8         16.099                     84.833
Degrees of freedom                         14      1             2            2                          9

Each ±1 column shows which of the five sums are added or subtracted to obtain the corresponding sum of squared deviations; for example, the residual squared deviations are 641 − 556.1667 = 84.833.
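
All of the entries above can be recomputed mechanically. The sketch below is an illustration (the data structure and variable names are my own; the yields are taken from the table of plants): it rebuilds the five sums and the resulting squared deviations.

    # Yields keyed by (fertiliser, environment), from the table above.
    data = {
        ("none", "co2"): [7, 2, 1],
        ("none", "humidity"): [7, 6],
        ("nitrate", "co2"): [11, 6],
        ("nitrate", "humidity"): [10, 7, 3],
        ("phosphate", "co2"): [5, 3, 4],
        ("phosphate", "humidity"): [11, 4],
    }

    def sq_over_size(groups):
        # Sum over groups of (group total)^2 / group size.
        return sum(sum(g) ** 2 / len(g) for g in groups)

    all_obs = [x for cell in data.values() for x in cell]
    n = len(all_obs)

    # Collapse the six cells into one-way groupings for each factor.
    fert_groups, env_groups = {}, {}
    for (f, e), cell in data.items():
        fert_groups.setdefault(f, []).extend(cell)
        env_groups.setdefault(e, []).extend(cell)

    individual = sum(x * x for x in all_obs)          # 641
    interaction = sq_over_size(data.values())         # 556.1667 (Fertiliser × Environment)
    fertiliser = sq_over_size(fert_groups.values())   # 525.4
    environment = sq_over_size(env_groups.values())   # 519.2679
    composite = sum(all_obs) ** 2 / n                 # 504.6

    print(individual - composite)                              # total       ≈ 136.4
    print(environment - composite)                             # environment ≈ 14.668
    print(fertiliser - composite)                              # fertiliser  ≈ 20.8
    print(interaction - fertiliser - environment + composite)  # interaction ≈ 16.099
    print(individual - interaction)                            # residual    ≈ 84.833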

References

  1. ^ Mood & Graybill: Introduction to the Theory of Statistics (McGraw-Hill)