# Mean and predicted response


In linear regression, the mean response and the predicted response are values of the dependent variable calculated from the regression parameters and a given value of the independent variable. The two responses have the same value, but their variances differ.

## Background

Further information: Straight line fitting

In straight line fitting, the model is

${\displaystyle y_{i}=\alpha +\beta x_{i}+\epsilon _{i}\,}$

where ${\displaystyle y_{i}}$ is the response variable, ${\displaystyle x_{i}}$ is the explanatory variable, ${\displaystyle \epsilon _{i}}$ is the random error, and ${\displaystyle \alpha }$ and ${\displaystyle \beta }$ are parameters. The predicted response value for a given explanatory value, ${\displaystyle x_{d}}$, is given by

${\displaystyle {\hat {y}}_{d}={\hat {\alpha }}+{\hat {\beta }}x_{d},}$

while the actual response would be

${\displaystyle y_{d}=\alpha +\beta x_{d}+\epsilon _{d}\,}$

Expressions for the values and variances of ${\displaystyle {\hat {\alpha }}}$ and ${\displaystyle {\hat {\beta }}}$ are given in linear regression.
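The straight-line fit and the predicted response at a point can be sketched numerically. The data below are hypothetical, and the closed-form least-squares estimates used are the standard ones for straight-line fitting:

```python
import numpy as np

# Hypothetical toy data: fit y = alpha + beta*x by ordinary least squares.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

m = len(x)
x_bar, y_bar = x.mean(), y.mean()

# Closed-form OLS estimates for the straight-line model.
beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
alpha_hat = y_bar - beta_hat * x_bar

# Predicted (= mean) response at a new explanatory value x_d.
x_d = 3.5
y_hat_d = alpha_hat + beta_hat * x_d
```

For these data the fit is ${\displaystyle {\hat {\alpha }}=0.05}$, ${\displaystyle {\hat {\beta }}=1.99}$, so ${\displaystyle {\hat {y}}_{d}=7.015}$ at ${\displaystyle x_{d}=3.5}$.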

## Mean response

Mean response is an estimate of the mean of the ${\displaystyle y}$ population associated with ${\displaystyle x_{d}}$, that is ${\displaystyle E(y\mid x_{d})}$; its estimate is ${\displaystyle {\hat {y}}_{d}}$. The variance of the mean response is given by

${\displaystyle {\text{Var}}\left({\hat {\alpha }}+{\hat {\beta }}x_{d}\right)={\text{Var}}\left({\hat {\alpha }}\right)+\left({\text{Var}}{\hat {\beta }}\right)x_{d}^{2}+2x_{d}{\text{Cov}}\left({\hat {\alpha }},{\hat {\beta }}\right).}$

This expression can be simplified to

${\displaystyle {\text{Var}}\left({\hat {\alpha }}+{\hat {\beta }}x_{d}\right)=\sigma ^{2}\left({\frac {1}{m}}+{\frac {\left(x_{d}-{\bar {x}}\right)^{2}}{\sum (x_{i}-{\bar {x}})^{2}}}\right).}$

To demonstrate this simplification, one can make use of the identity

${\displaystyle \sum (x_{i}-{\bar {x}})^{2}=\sum x_{i}^{2}-{\frac {1}{m}}\left(\sum x_{i}\right)^{2}.}$
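The simplification can be checked numerically using the standard straight-line results ${\displaystyle {\text{Var}}({\hat {\beta }})=\sigma ^{2}/S_{xx}}$, ${\displaystyle {\text{Var}}({\hat {\alpha }})=\sigma ^{2}(1/m+{\bar {x}}^{2}/S_{xx})}$, and ${\displaystyle {\text{Cov}}({\hat {\alpha }},{\hat {\beta }})=-\sigma ^{2}{\bar {x}}/S_{xx}}$, where ${\displaystyle S_{xx}=\sum (x_{i}-{\bar {x}})^{2}}$. The data and the value of ${\displaystyle \sigma ^{2}}$ below are assumptions for illustration:

```python
import numpy as np

# Verify that the covariance expansion equals the simplified variance formula.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
m, x_bar = len(x), x.mean()
Sxx = np.sum((x - x_bar) ** 2)
s2 = 0.25          # assumed error variance sigma^2 (hypothetical)
x_d = 3.5

# Standard straight-line variance/covariance expressions.
var_beta = s2 / Sxx
var_alpha = s2 * (1.0 / m + x_bar**2 / Sxx)
cov_ab = -s2 * x_bar / Sxx

# Left-hand side: expand Var(alpha_hat + beta_hat*x_d) term by term.
expanded = var_alpha + var_beta * x_d**2 + 2 * x_d * cov_ab
# Right-hand side: the simplified closed form.
simplified = s2 * (1.0 / m + (x_d - x_bar) ** 2 / Sxx)
```

Both expressions evaluate to the same number (here 0.05625), confirming the identity for this data set.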

## Predicted response

The predicted response distribution is the predicted distribution of the residuals at the given point ${\displaystyle x_{d}}$. Because the new observation ${\displaystyle y_{d}}$ is independent of the data used to estimate ${\displaystyle {\hat {\alpha }}}$ and ${\displaystyle {\hat {\beta }}}$, the variance of the prediction error is the sum of the two variances:

${\displaystyle {\text{Var}}\left(y_{d}-\left[{\hat {\alpha }}+{\hat {\beta }}x_{d}\right]\right)={\text{Var}}\left(y_{d}\right)+{\text{Var}}\left({\hat {\alpha }}+{\hat {\beta }}x_{d}\right).}$

The second part of this expression was already calculated for the mean response. Since ${\displaystyle {\text{Var}}\left(y_{d}\right)=\sigma ^{2}}$ (a fixed but unknown parameter that can be estimated), the variance of the predicted response is given by

${\displaystyle {\text{Var}}\left(y_{d}-\left[{\hat {\alpha }}+{\hat {\beta }}x_{d}\right]\right)=\sigma ^{2}+\sigma ^{2}\left({\frac {1}{m}}+{\frac {\left(x_{d}-{\bar {x}}\right)^{2}}{\sum (x_{i}-{\bar {x}})^{2}}}\right)=\sigma ^{2}\left(1+{\frac {1}{m}}+{\frac {\left(x_{d}-{\bar {x}}\right)^{2}}{\sum (x_{i}-{\bar {x}})^{2}}}\right).}$
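Continuing the numerical sketch above (same hypothetical data and assumed ${\displaystyle \sigma ^{2}}$), the predicted-response variance is simply the mean-response variance plus one extra ${\displaystyle \sigma ^{2}}$ for the new observation's own error:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
m, x_bar = len(x), x.mean()
Sxx = np.sum((x - x_bar) ** 2)
s2, x_d = 0.25, 3.5   # assumed sigma^2 and query point (hypothetical)

# Variance of the mean response at x_d.
var_mean = s2 * (1.0 / m + (x_d - x_bar) ** 2 / Sxx)
# Variance of the predicted response: extra sigma^2 from the new error term.
var_pred = s2 + var_mean
```

Here `var_mean` is 0.05625 while `var_pred` is 0.30625; the predicted response is always the more variable of the two.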

## Confidence intervals

Main article: Confidence interval
Further information: Prediction interval

The ${\displaystyle 100(1-\alpha )\%}$ confidence intervals are computed as ${\displaystyle {\hat {y}}_{d}\pm t_{{\frac {\alpha }{2}},m-n-1}{\sqrt {\text{Var}}}}$, where ${\displaystyle {\text{Var}}}$ is the appropriate variance from above. Thus, the confidence interval for the predicted response is wider than the interval for the mean response. This is expected intuitively: the variance of the population of ${\displaystyle y}$ values does not shrink when one samples from it, because the random error ${\displaystyle \epsilon _{i}}$ does not decrease, but the variance of the mean of the ${\displaystyle y}$ values does shrink with increased sampling, because the variances of ${\displaystyle {\hat {\alpha }}}$ and ${\displaystyle {\hat {\beta }}}$ decrease. So the mean response (predicted response value) becomes closer to ${\displaystyle \alpha +\beta x_{d}}$.

This is analogous to the difference between the variance of a population and the variance of the sample mean of a population: the variance of a population is a parameter and does not change, but the variance of the sample mean decreases with increased samples.
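A minimal sketch of the two interval half-widths, using the same hypothetical data, ${\displaystyle \sigma ^{2}}$ estimated from the residuals with ${\displaystyle m-2}$ degrees of freedom, and the tabulated two-sided 95% t critical value for 3 degrees of freedom:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
m, x_bar = len(x), x.mean()
Sxx = np.sum((x - x_bar) ** 2)

# Fit the line and estimate sigma^2 from the residual sum of squares.
beta_hat = np.sum((x - x_bar) * (y - y.mean())) / Sxx
alpha_hat = y.mean() - beta_hat * x_bar
resid = y - (alpha_hat + beta_hat * x)
s2 = np.sum(resid**2) / (m - 2)   # unbiased estimate of sigma^2

x_d = 3.5
t_crit = 3.182                    # t_{0.025, 3}: two-sided 95% value, m - 2 = 3 df

# Half-widths of the mean-response and predicted-response intervals.
half_mean = t_crit * np.sqrt(s2 * (1/m + (x_d - x_bar)**2 / Sxx))
half_pred = t_crit * np.sqrt(s2 * (1 + 1/m + (x_d - x_bar)**2 / Sxx))
```

The prediction interval (`half_pred`) is always wider than the mean-response interval (`half_mean`), matching the intuition above.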

## General linear regression

The general linear model can be written as

${\displaystyle y_{i}=\sum _{j=1}^{n}X_{ij}\beta _{j}+\epsilon _{i}\,}$

Therefore, since ${\displaystyle {\hat {y}}_{d}=\sum _{j=1}^{n}X_{dj}{\hat {\beta }}_{j},}$ the general expression for the variance of the mean response is

${\displaystyle \operatorname {Var} \left(\sum _{j=1}^{n}X_{dj}{\hat {\beta }}_{j}\right)=\sum _{i=1}^{n}\sum _{j=1}^{n}X_{di}S_{ij}X_{dj},}$

where S is the covariance matrix of the parameters, given by

${\displaystyle \mathbf {S} =\sigma ^{2}\left(\mathbf {X^{\mathsf {T}}X} \right)^{-1}}$.
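For a design matrix with an intercept column, the matrix expression ${\displaystyle \mathbf {X} _{d}\mathbf {S} \mathbf {X} _{d}^{\mathsf {T}}}$ reduces to the straight-line formula derived earlier, which can be checked numerically (same hypothetical data and assumed ${\displaystyle \sigma ^{2}}$):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])   # design matrix: intercept + slope
s2 = 0.25                                   # assumed sigma^2 (hypothetical)

# Covariance matrix of the parameters: S = sigma^2 * (X^T X)^{-1}.
S = s2 * np.linalg.inv(X.T @ X)

x_d = 3.5
X_d = np.array([1.0, x_d])                  # row of the design matrix at x_d
var_general = X_d @ S @ X_d                 # quadratic form X_d S X_d^T

# The straight-line closed form should give the same number.
m, x_bar = len(x), x.mean()
Sxx = np.sum((x - x_bar) ** 2)
var_simple = s2 * (1/m + (x_d - x_bar)**2 / Sxx)
```

Both `var_general` and `var_simple` evaluate to 0.05625 here, illustrating that the general quadratic form contains the simple case.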
