Mean and predicted response

In linear regression, mean response and predicted response are values of the dependent variable calculated from the regression parameters and a given value of the independent variable. The two responses have the same value, but their calculated variances are different.

Straight line regression

In straight line fitting, the model is

y_i=\alpha+\beta x_i +\epsilon_i\,

where y_i is the response variable, x_i is the explanatory variable, \epsilon_i is the random error, and \alpha and \beta are parameters. The predicted response value for a given explanatory value, x_d, is given by

\hat{y}_d=\hat\alpha+\hat\beta x_d ,

while the actual response would be

y_d=\alpha+\beta x_d +\epsilon_d  \,

Expressions for the values and variances of \hat\alpha and \hat\beta are given in linear regression.
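As a concrete illustration, the following is a minimal sketch of the usual ordinary least-squares estimates for the straight-line model, followed by the predicted response at a new point. The function names `fit_line` and `predict` and the sample data are hypothetical and chosen only for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates (alpha_hat, beta_hat) for y = alpha + beta * x."""
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


def predict(alpha_hat, beta_hat, x_d):
    """Predicted (and mean) response value at the explanatory value x_d."""
    return alpha_hat + beta_hat * x_d


# Illustrative data only (not taken from any source).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
alpha_hat, beta_hat = fit_line(xs, ys)
y_hat_d = predict(alpha_hat, beta_hat, 6.0)
```

For this data the sketch gives an intercept of about 2.2, a slope of about 0.6, and a predicted response of about 5.8 at x_d = 6.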

The mean response is an estimate of the mean of the y population associated with x_d; that is, \hat{y}_d is an estimate of E(y | x_d). The variance of the mean response is given by

\text{Var}\left(\hat{\alpha} + \hat{\beta}x_d\right) = \text{Var}\left(\hat{\alpha}\right) + \text{Var}\left(\hat{\beta}\right)x_d^2 + 2 x_d\text{Cov}\left(\hat{\alpha},\hat{\beta}\right) .

This expression can be simplified to

\text{Var}\left(\hat{\alpha} + \hat{\beta}x_d\right) =\sigma^2\left(\frac{1}{m} + \frac{\left(x_d - \bar{x}\right)^2}{\sum (x_i - \bar{x})^2}\right).

To demonstrate this simplification, one can make use of the identity

\sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{1}{m}\left(\sum x_i\right)^2 .
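The simplification can be sketched as follows. The shorthand S_{xx} = \sum (x_i - \bar{x})^2 is introduced here only for brevity, and the three starting expressions are the standard least-squares results for uncorrelated errors with common variance \sigma^2, assumed rather than derived in this article.

```latex
\begin{aligned}
\text{Var}(\hat\alpha) &= \frac{\sigma^2 \sum x_i^2}{m\,S_{xx}}, \qquad
\text{Var}(\hat\beta) = \frac{\sigma^2}{S_{xx}}, \qquad
\text{Cov}(\hat\alpha,\hat\beta) = -\frac{\sigma^2 \bar{x}}{S_{xx}}, \\[4pt]
\text{Var}\left(\hat\alpha + \hat\beta x_d\right)
&= \frac{\sigma^2}{S_{xx}}\left(\frac{\sum x_i^2}{m} + x_d^2 - 2 x_d \bar{x}\right) \\
&= \frac{\sigma^2}{S_{xx}}\left(\frac{S_{xx}}{m} + \bar{x}^2 + x_d^2 - 2 x_d \bar{x}\right)
\qquad \text{since } \sum x_i^2 = S_{xx} + m\bar{x}^2 \\
&= \sigma^2\left(\frac{1}{m} + \frac{(x_d - \bar{x})^2}{S_{xx}}\right).
\end{aligned}
```

The substitution in the third line is the identity above, rewritten using \sum x_i = m\bar{x}.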

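The predicted response distribution is the predicted distribution of the residuals at the given point x_d. Assuming the new error \epsilon_d is independent of the data used to estimate \hat\alpha and \hat\beta, the covariance between y_d and \hat{\alpha} + \hat{\beta}x_d vanishes, so the variance is given by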

\text{Var}\left(y_d - \left[\hat{\alpha} + \hat{\beta}x_d\right]\right) = \text{Var}\left(y_d\right) + \text{Var}\left(\hat{\alpha} + \hat{\beta}x_d\right) .

The second part of this expression was already calculated for the mean response. Since \text{Var}\left(y_d\right)=\sigma^2 (a fixed but unknown parameter that can be estimated), the variance of the predicted response is given by

\text{Var}\left(y_d - \left[\hat{\alpha} + \hat{\beta}x_d\right]\right) = \sigma^2 + \sigma^2\left(\frac{1}{m} + \frac{\left(x_d - \bar{x}\right)^2}{\sum (x_i - \bar{x})^2}\right) = \sigma^2\left(1+\frac{1}{m} + \frac{\left(x_d - \bar{x}\right)^2}{\sum (x_i - \bar{x})^2}\right) .
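The two variance formulas can be compared numerically with a short sketch. The function name `response_variances` is hypothetical, and \sigma^2 is passed in as a known number purely for illustration; in practice it would have to be estimated from the data.

```python
def response_variances(xs, x_d, sigma2):
    """Return (variance of mean response, variance of predicted response) at x_d.

    xs     : observed explanatory values (m points)
    x_d    : explanatory value at which the response is evaluated
    sigma2 : error variance, assumed known here for illustration
    """
    m = len(xs)
    x_bar = sum(xs) / m
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    # Common factor 1/m + (x_d - x_bar)^2 / sum (x_i - x_bar)^2.
    mean_factor = 1.0 / m + (x_d - x_bar) ** 2 / s_xx
    var_mean = sigma2 * mean_factor
    var_pred = sigma2 * (1.0 + mean_factor)
    return var_mean, var_pred


# Illustrative values only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
var_mean_far, var_pred_far = response_variances(xs, 6.0, 0.8)
var_mean_center, var_pred_center = response_variances(xs, 3.0, 0.8)
```

Both variances are smallest at x_d = \bar{x} and grow as x_d moves away from it, and the predicted-response variance always exceeds the mean-response variance by exactly \sigma^2.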

Confidence intervals

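The 100(1-\alpha)% confidence intervals are computed as \hat{y}_d \pm t_{\frac{\alpha}{2},m - n - 1} \sqrt{\text{Var}}, where \text{Var} is the variance of the response in question with \sigma^2 replaced by its estimate, m is the number of data points, and n is the number of explanatory variables (n = 1 for a straight line, giving m - 2 degrees of freedom). Here \alpha denotes the significance level, not the intercept. Because the variance of the predicted response exceeds that of the mean response by \sigma^2, the confidence interval for predicted response is wider than the interval for mean response. This is expected intuitively: the variance of the population of y values does not shrink when one samples from it, because the variance of the random error \epsilon_i does not decrease, but the variance of the mean of y does shrink with increased sampling, because the variances of \hat\alpha and \hat\beta decrease, so the mean response (predicted response value) becomes closer to \alpha + \beta x_d.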

This is analogous to the difference between the variance of a population and the variance of the sample mean of a population: the variance of a population is a parameter and does not change, but the variance of the sample mean decreases as the sample size increases.
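The difference in interval width can be illustrated with a minimal sketch for the straight-line case. The function name `intervals` is hypothetical, \sigma^2 is estimated from the residual sum of squares divided by m - 2 (an assumption of this sketch), and the critical t value is supplied by the caller from a table rather than computed.

```python
import math


def intervals(xs, ys, x_d, t_crit):
    """Return (y_hat_d, half-width for mean response, half-width for predicted response)."""
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    alpha_hat = y_bar - beta_hat * x_bar

    # Estimate sigma^2 from the residual sum of squares with m - 2 degrees of freedom.
    rss = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(xs, ys))
    sigma2_hat = rss / (m - 2)

    mean_factor = 1.0 / m + (x_d - x_bar) ** 2 / s_xx
    y_hat_d = alpha_hat + beta_hat * x_d
    half_mean = t_crit * math.sqrt(sigma2_hat * mean_factor)
    half_pred = t_crit * math.sqrt(sigma2_hat * (1.0 + mean_factor))
    return y_hat_d, half_mean, half_pred


# Illustrative data only; 5 points give m - 2 = 3 degrees of freedom.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
T_CRIT = 3.182  # tabulated two-sided 95% t value for 3 degrees of freedom
y_hat_d, half_mean, half_pred = intervals(xs, ys, 6.0, T_CRIT)
```

Both intervals are centred on the same value \hat{y}_d; only the half-widths differ, and the one for the predicted response is always the larger.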

General linear regression

The general linear model can be written as

y_i=\sum_{j=1}^{j=n}X_{ij}\beta_j + \epsilon_i\,

Since the predicted response is \hat{y}_d=\sum_{j=1}^{j=n} X_{dj}\hat\beta_j, the general expression for the variance of the mean response is

\text{Var}\left(\sum_{j=1}^{j=n} X_{dj}\hat\beta_j\right)= \sum_{i=1}^{i=n}\sum_{j=1}^{j=n}X_{di}M_{ij}X_{dj},

where M is the covariance matrix of the parameters, given by

\mathbf{M}=\sigma^2\left(\mathbf{X}^{\mathrm{T}}\mathbf{X}\right)^{-1}.
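The double sum is the quadratic form of the row of explanatory values at the point of interest with M. A dependency-free sketch follows; the helper names `solve` and `mean_response_variance` are hypothetical, \sigma^2 is passed in as a known number for illustration, and a linear-algebra library would normally replace the hand-written elimination.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left unchanged.
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def mean_response_variance(X, x_d, sigma2):
    """Variance of the mean response: sigma2 * x_d^T (X^T X)^{-1} x_d."""
    n = len(x_d)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    z = solve(xtx, x_d)  # z = (X^T X)^{-1} x_d, without forming the inverse
    return sigma2 * sum(xi * zi for xi, zi in zip(x_d, z))


# Straight-line case written as a general model: columns are (1, x).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
X = [[1.0, x] for x in xs]
var_mean = mean_response_variance(X, [1.0, 6.0], 0.8)
```

With a column of ones and a column of x values, this reproduces the straight-line result \sigma^2(1/m + (x_d - \bar{x})^2 / \sum (x_i - \bar{x})^2), which serves as a consistency check on the general expression.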

References

  • Draper, N.R.; Smith, H. (1998). Applied Regression Analysis (3rd ed.). John Wiley. ISBN 0-471-17082-8.