An intuitive understanding of the need for Bessel's correction when estimating the population variance can be found by considering the case when the sample consists of just two observations, $x_1$ and $x_2$. In the diagram, the horizontal and vertical axes are the values of these variables, and the two observations are represented by point A at $(x_1, x_2)$. The probability density function for the population is shown in grey, centered at the point M at $(\mu, \mu)$ on the line of equality $x_1 = x_2$, which is shown in blue. If we were to take many such double observations and plot them, they would tend to cluster about the point M with a variance equivalent to that of the probability density. The sample mean is represented by point B at position $(\bar{x}, \bar{x})$, also on the line of equality, where $\bar{x} = (x_1 + x_2)/2$. It can be shown that the line AB is perpendicular to the line of equality, whose direction is $(1, 1)$:
$$\overrightarrow{BA} \cdot (1, 1) = (x_1 - \bar{x}) + (x_2 - \bar{x}) = 0.$$
The three points form a right triangle, with the right angle at B, and sides of length $|AB|$, $|BM|$, and $|AM|$. It is reasonable to expect that a good measure of the population variance derived from the single pair of observations will be based on the square of the distance from point A to point M, $|AM|^2 = (x_1 - \mu)^2 + (x_2 - \mu)^2$, whose expected value is $2\sigma^2$. It can also be seen that the distance from point A to point B will always underestimate that distance, unless the population mean and the sample mean happen to coincide. The squared distance from point A to point B is:
$$|AB|^2 = (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2,$$
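which is just twice the biased estimator of the population variance, $\tfrac{1}{2}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2\right]$. An unbiased estimator must also account for the error $|BM|^2 = 2(\bar{x} - \mu)^2$, which cannot be computed from the sample because $\mu$ is unknown. It is reasonable that this error is roughly given by $\sigma^2$, and this is in fact exactly true in expectation, although not graphically obvious: for independent observations, $E\left[|BM|^2\right] = 2\operatorname{Var}(\bar{x}) = \sigma^2$. Since $|AM|^2 = |AB|^2 + |BM|^2$ and $E\left[|AM|^2\right] = 2\sigma^2$, it follows that $E\left[|AB|^2\right] = \sigma^2$, and the unbiased estimate of the population variance is then
$$s^2 = |AB|^2 = (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \quad (n = 2),$$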
which is just what is expected based on the above definitions of the biased and unbiased variance.
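The two-observation argument can be checked numerically. The following is a minimal sketch, assuming a normally distributed population with illustrative values for the mean and standard deviation; the function names `triangle_sides_squared` and `simulate` are hypothetical helpers introduced here, not part of the original discussion. It computes the squared side lengths of the triangle with vertices A, B and M and averages them over many simulated pairs.

```python
import random


def triangle_sides_squared(x1, x2, mu):
    """Squared side lengths of the triangle A=(x1, x2), B=(xbar, xbar), M=(mu, mu)."""
    xbar = (x1 + x2) / 2
    ab2 = (x1 - xbar) ** 2 + (x2 - xbar) ** 2  # squared distance A to B
    bm2 = 2 * (xbar - mu) ** 2                 # squared distance B to M (the error)
    am2 = (x1 - mu) ** 2 + (x2 - mu) ** 2      # squared distance A to M
    return ab2, bm2, am2


def simulate(mu=5.0, sigma=2.0, trials=100_000, seed=0):
    """Average the three squared side lengths over many simulated pairs.

    mu, sigma, trials and seed are illustrative assumptions only.
    """
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(trials):
        x1 = rng.gauss(mu, sigma)
        x2 = rng.gauss(mu, sigma)
        for k, value in enumerate(triangle_sides_squared(x1, x2, mu)):
            totals[k] += value
    return tuple(total / trials for total in totals)


if __name__ == "__main__":
    mean_ab2, mean_bm2, mean_am2 = simulate()
    print("mean |AB|^2:", mean_ab2)  # should be close to sigma^2
    print("mean |BM|^2:", mean_bm2)  # should be close to sigma^2
    print("mean |AM|^2:", mean_am2)  # should be close to 2 * sigma^2
```

With the assumed value of 2 for the standard deviation, the first two averages should both come out near 4 and the third near 8, consistent with the right-triangle relation between the three squared distances.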
If we were to take samples containing three observations, we could make a similar diagram in a 3-dimensional space. The squared distance to the line of equality would now involve the squares of two independent components, since the perpendicular from the observed point lies in the plane orthogonal to that line, while the error would still be measured along the line of equality.
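The same check extends to three observations. The sketch below again assumes a normal population with illustrative parameters, and the helper names are hypothetical. It averages the squared distance from the observed point to the line of equality and divides it by $n$ and by $n - 1$ for comparison.

```python
import random


def squared_distance_to_equality_line(xs):
    """Squared distance from the point xs to the line x_1 = x_2 = ... = x_n."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs)


def mean_estimates(n=3, mu=0.0, sigma=1.5, trials=100_000, seed=1):
    """Return the average biased (divide by n) and corrected (divide by n - 1) estimates.

    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        total += squared_distance_to_equality_line(xs)
    mean_sq_dist = total / trials
    return mean_sq_dist / n, mean_sq_dist / (n - 1)


if __name__ == "__main__":
    biased, corrected = mean_estimates()
    print("divide by n:    ", biased)     # tends to fall below sigma^2
    print("divide by n - 1:", corrected)  # should be close to sigma^2
```

With the assumed standard deviation of 1.5, the population variance is 2.25. Dividing by $n - 1 = 2$ should give an average near that value, while dividing by $n = 3$ should give an average near 1.5.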