# Ridge regression

Ridge regression is a method of estimating the coefficients of multiple-regression models in scenarios where the independent variables are highly correlated.[1] It has been used in many fields including econometrics, chemistry, and engineering.[2]

The theory was first introduced by Hoerl and Kennard in 1970 in their Technometrics papers "Ridge Regression: Biased Estimation for Nonorthogonal Problems" and "Ridge Regression: Applications to Nonorthogonal Problems".[3][4][1] This was the result of ten years of research into the field of ridge analysis.[5]

Ridge regression was developed as a possible solution to the imprecision of least squares estimators when linear regression models have some multicollinear (highly correlated) independent variables. The ridge regression estimator (RR) provides a more precise estimate of the regression coefficients, as its variance and mean squared error are often smaller than those of the least squares estimators previously derived.[6][2]

## Mathematical details

In standard linear regression, an ${\textstyle n\times 1}$ column vector ${\textstyle y}$ is to be projected onto the column space of the ${\textstyle n\times p}$ design matrix ${\textstyle X}$ (typically ${\textstyle p\ll n}$) whose columns are highly correlated. The ordinary least squares estimator of the coefficients ${\textstyle \beta \in \mathbb {R} ^{p\times 1}}$ by which the columns are multiplied to get the orthogonal projection ${\textstyle X\beta }$ is

${\displaystyle {\widehat {\beta }}=(X^{T}X)^{-1}X^{T}y}$

(where ${\textstyle X^{T}}$ is the transpose of ${\textstyle X}$).
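As a concrete illustration, the closed-form OLS estimator above can be computed directly with NumPy (a minimal sketch on synthetic data; the dimensions and coefficient values are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3                         # n observations, p predictors
X = rng.normal(size=(n, p))          # design matrix with well-conditioned columns
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# OLS estimator: beta_hat = (X^T X)^{-1} X^T y,
# computed by solving the normal equations rather than inverting X^T X
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

With uncorrelated columns and low noise, `beta_hat` recovers the true coefficients closely; the next section shows how this breaks down when the columns are highly correlated.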

By contrast, the ridge regression estimator is

${\displaystyle {\widehat {\beta }}_{\text{ridge}}=(X^{T}X+kI_{p})^{-1}X^{T}y}$

where ${\textstyle I_{p}}$ is the ${\textstyle p\times p}$ identity matrix and ${\textstyle k>0}$ is small. The name 'ridge' refers to the ridge that the added term ${\textstyle kI_{p}}$ forms along the diagonal of ${\textstyle X^{T}X}$.
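The effect of the ridge term is easiest to see on a nearly collinear design, where OLS becomes unstable but the regularized solve stays well behaved. A minimal NumPy sketch (the ridge parameter `k = 0.1` is an arbitrary illustrative choice, not a tuned value):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2
# Two highly correlated columns: x2 is x1 plus tiny noise
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=1e-3, size=n)
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + rng.normal(scale=0.1, size=n)

k = 0.1  # small positive ridge parameter (hypothetical choice)

# OLS: (X^T X)^{-1} X^T y -- ill-conditioned here
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
# Ridge: (X^T X + k I_p)^{-1} X^T y
beta_ridge = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
```

Because adding ${\textstyle kI_{p}}$ shrinks each coefficient along the eigendirections of ${\textstyle X^{T}X}$ by a factor ${\textstyle d_{i}/(d_{i}+k)}$, the ridge estimate always has a smaller norm than the OLS estimate, and here it stays near the true coefficients while OLS can swing wildly between the two correlated columns.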

## References

1. ^ a b Hilt, Donald E.; Seegrist, Donald W. (1977). Ridge, a computer program for calculating ridge regression estimates. doi:10.5962/bhl.title.68934.[page needed]
2. ^ a b Gruber, Marvin (1998). Improving Efficiency by Shrinkage: The James--Stein and Ridge Regression Estimators. CRC Press. p. 2. ISBN 978-0-8247-0156-7.
3. ^ Hoerl, Arthur E.; Kennard, Robert W. (1970). "Ridge Regression: Biased Estimation for Nonorthogonal Problems". Technometrics. 12 (1): 55–67. doi:10.2307/1267351. JSTOR 1267351.
4. ^ Hoerl, Arthur E.; Kennard, Robert W. (1970). "Ridge Regression: Applications to Nonorthogonal Problems". Technometrics. 12 (1): 69–82. doi:10.2307/1267352. JSTOR 1267352.
5. ^ Beck, James Vere; Arnold, Kenneth J. (1977). Parameter Estimation in Engineering and Science. James Beck. p. 287. ISBN 978-0-471-06118-2.
6. ^ Jolliffe, I. T. (2006). Principal Component Analysis. Springer Science & Business Media. p. 178. ISBN 978-0-387-22440-4.