High-leverage points are observations, if any, made at extreme or outlying values of the independent variables, such that the lack of neighboring observations means that the fitted regression model will pass close to that particular observation.
Modern computer packages for statistical analysis include, as part of their facilities for regression analysis, various quantitative measures for identifying influential observations: among these measures is partial leverage, a measure of how a variable contributes to the leverage of a datum.
Linear regression model
In the linear regression model, the leverage score for the $i$-th data unit is defined as

$$h_{ii} = [\mathbf{H}]_{ii},$$

the $i$-th diagonal element of the projection matrix $\mathbf{H} = \mathbf{X}(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}$, where $\mathbf{X}$ is the design matrix. Equivalently, the leverage score can be written as

$$h_{ii} = \frac{\partial \hat{y}_i}{\partial y_i},$$

where $\hat{y}_i$ and $y_i$ are the fitted and measured observation, respectively.
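The definition above can be sketched numerically. The following is a minimal NumPy illustration (the design matrix values are made up for the example): the leverages are the diagonal entries of the hat matrix, and the row with an outlying value of the independent variable receives the largest leverage.

```python
import numpy as np

# Illustrative design matrix: intercept column plus one regressor.
# The last row has an outlying x value (x = 10).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 10.0]])

# Hat matrix H = X (X^T X)^{-1} X^T; its diagonal holds the leverages.
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverages = np.diag(H)
print(leverages)  # the outlying point has the largest leverage
```

The leverages sum to the number of regression parameters (here 2), since that sum is the trace of the projection matrix.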
Bounds on leverage
First, note that $\mathbf{H}$ is an idempotent matrix: $\mathbf{H}^2 = \mathbf{X}(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\,\mathbf{X}(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}} = \mathbf{H}$. Also, observe that $\mathbf{H}$ is symmetric, i.e. $h_{ij} = h_{ji}$. So equating the $ii$ element of $\mathbf{H}$ to that of $\mathbf{H}^2$, we have

$$h_{ii} = h_{ii}^2 + \sum_{j \neq i} h_{ij}^2 \geq 0,$$

and

$$h_{ii} \geq h_{ii}^2 \implies 0 \leq h_{ii} \leq 1.$$
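The two facts used in this argument, idempotence and the resulting bound $0 \leq h_{ii} \leq 1$, can be checked numerically. A small sketch on a random design matrix (the sizes are arbitrary):

```python
import numpy as np

# Random 50 x 3 design matrix; not part of the proof, just a check.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
H = X @ np.linalg.inv(X.T @ X) @ X.T

h = np.diag(H)
# Idempotence plus symmetry give h_ii = sum_j h_ij^2 ...
assert np.allclose(h, (H ** 2).sum(axis=1))
# ... hence h_ii >= h_ii^2, so every leverage lies in [0, 1].
assert np.all((h >= 0) & (h <= 1))
print(h.min(), h.max())
```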
Effect on residual variance
If the regression errors are homoscedastic with variance $\sigma^2$, then

$$\operatorname{Var}(e_i) = (1 - h_{ii})\,\sigma^2,$$

where $e_i = y_i - \hat{y}_i$ (the $i$-th regression residual).
In other words, if the model errors are homoscedastic, an observation's leverage score determines the degree of noise in the model's misprediction of that observation.
First, note that $\mathbf{I} - \mathbf{H}$ is idempotent and symmetric. This gives

$$\operatorname{Var}(\mathbf{e}) = \operatorname{Var}\big((\mathbf{I}-\mathbf{H})\mathbf{y}\big) = (\mathbf{I}-\mathbf{H})\operatorname{Var}(\mathbf{y})(\mathbf{I}-\mathbf{H})^{\mathsf{T}} = \sigma^2(\mathbf{I}-\mathbf{H})^2 = \sigma^2(\mathbf{I}-\mathbf{H}),$$

so that $\operatorname{Var}(e_i) = (1 - h_{ii})\,\sigma^2$.
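The relation $\operatorname{Var}(e_i) = (1 - h_{ii})\,\sigma^2$ can be verified by simulation. A Monte Carlo sketch (the design matrix, coefficients, and $\sigma$ are arbitrary illustrative choices): repeatedly draw homoscedastic errors, form the residuals $\mathbf{e} = (\mathbf{I}-\mathbf{H})\mathbf{y}$, and compare the empirical residual variances with the theoretical values.

```python
import numpy as np

# Illustrative setup: straight-line model, 8 observations, sigma = 2.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(8), np.arange(8.0)])
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
sigma, beta = 2.0, np.array([1.0, 0.5])

# Simulate many datasets with homoscedastic errors and collect residuals.
resids = np.empty((20000, 8))
for k in range(20000):
    y = X @ beta + rng.normal(scale=sigma, size=8)
    resids[k] = y - H @ y          # e = (I - H) y
print(resids.var(axis=0) / ((1 - h) * sigma ** 2))  # ratios near 1
```

High-leverage observations (large $h_{ii}$) show visibly smaller residual variance, which is exactly the point of the derivation above.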
The corresponding studentized residual, the residual adjusted for its observation-specific residual variance, is then

$$t_i = \frac{e_i}{\hat{\sigma}\sqrt{1 - h_{ii}}},$$

where $\hat{\sigma}$ is an appropriate estimate of $\sigma$.
See also
- Projection matrix – whose main diagonal entries are the leverages of the observations
- Mahalanobis distance – a measure of leverage of a datum
- Cook's distance – a measure of changes in regression coefficients when an observation is deleted
- Outliers – observations with extreme Y values