This article needs attention from an expert in statistics. Please add a reason or a talk parameter to this template to explain the issue with the article. WikiProject Statistics (or its Portal) may be able to help recruit an expert.(September 2009)
In statistics, the explained sum of squares (ESS), alternatively known as the model sum of squares or sum of squares due to regression ("SSR" – not to be confused with the residual sum of squaresRSS), is a quantity used in describing how well a model, often a regression model, represents the data being modelled. In particular, the explained sum of squares measures how much variation there is in the modelled values and this is compared to the total sum of squares, which measures how much variation there is in the observed data, and to the residual sum of squares, which measures the variation in the modelling errors.
The explained sum of squares (ESS) is the sum of the squares of the deviations of the predicted values from the mean value of a response variable, in a standard regression model — for example, yi = a + b1x1i + b2x2i + ... + εi, where yi is the ith observation of the response variable, xji is the ith observation of the jthexplanatory variable, a and bi are coefficients, i indexes the observations from 1 to n, and εi is the ith value of the error term. In general, the greater the ESS, the better the estimated model performs.
Partitioning in the general ordinary least squares model
The general regression model with n observations and k explanators, the first of which is a constant unit vector whose coefficient is the regression intercept, is
where y is an n × 1 vector of dependent variable observations, each column of the n × k matrix X is a vector of observations on one of the k explanators, is a k × 1 vector of true coefficients, and e is an n × 1 vector of the true underlying errors. The ordinary least squares estimator for is
The residual vector is , so the residual sum of squares is, after simplification,
Denote as the constant vector all of whose elements are the sample mean of the dependent variable values in the vector y. Then the total sum of squares is
The explained sum of squares, defined as the sum of squared deviations of the predicted values from the observed mean of y, is
Using in this, and simplifying to obtain , gives the result that TSS = ESS + RSS if and only if . The left side of this is times the sum of the elements of y, and the right side is times the sum of the elements of , so the condition is that the sum of the elements of y equals the sum of the elements of , or equivalently that the sum of the prediction errors (residuals) is zero. This can be seen to be true by noting the well-known OLS property that the k × 1 vector : since the first column of X is a vector of ones, the first element of this vector is the sum of the residuals and is equal to zero. This proves that the condition holds for the result that TSS = ESS + RSS.
In linear algebra terms, we have , , . The proof can be simplified by noting that . The proof is as follows:
which again gives the result that TSS = ESS + RSS if and only if .