# Ridge regression

Ridge regression is a method of estimating the coefficients of multiple-regression models in scenarios where the independent variables are highly correlated. It has been used in many fields including econometrics, chemistry, and engineering.

The theory was first introduced by Hoerl and Kennard in 1970 in their Technometrics papers “RIDGE regressions: biased estimation of nonorthogonal problems” and “RIDGE regressions: applications in nonorthogonal problems”. This was the result of ten years of research into the field of ridge analysis.

Ridge regression was developed as a possible solution to the imprecision of least squares estimators when linear regression models have some multicollinear (highly correlated) independent variables, by creating a ridge regression estimator (RR). This provides a more precise estimate of the ridge parameters, as its variance and mean squared error are often smaller than those of the least squares estimators previously derived.

## Mathematical details

In standard linear regression, an $n \times 1$ column vector $y$ is to be projected onto the column space of the $n \times p$ design matrix $X$ (typically $p \ll n$) whose columns are highly correlated. The ordinary least squares estimator of the coefficients $\beta \in \mathbb{R}^{p \times 1}$ by which the columns are multiplied to get the orthogonal projection $X\beta$ is

$\widehat{\beta} = (X^{T}X)^{-1}X^{T}y$, where $X^{T}$ is the transpose of $X$.
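As a concrete sketch, the OLS formula above can be computed directly with NumPy. The data here are synthetic illustration values, not from the text, and `np.linalg.solve` is used in place of an explicit matrix inverse for numerical stability:

```python
import numpy as np

# Synthetic illustration data (assumed, not from the text)
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.01 * rng.normal(size=n)

# beta_hat = (X^T X)^{-1} X^T y, solved as a linear system
# rather than forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```

With well-conditioned columns and little noise, `beta_hat` recovers `beta_true` closely; the instability ridge regression addresses only appears once the columns of $X$ become highly correlated.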

By contrast, the ridge regression estimator is

$\widehat{\beta}_{\text{ridge}} = (X^{T}X + kI_{p})^{-1}X^{T}y$, where $I_{p}$ is the $p \times p$ identity matrix and $k > 0$ is small. The name 'ridge' refers to the ridge of constant values that the term $kI_{p}$ adds along the diagonal of $X^{T}X$.
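A minimal sketch of the ridge estimator in the multicollinear setting it was designed for, using synthetic data (the columns, noise level, and choice of $k$ below are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 2
# Two nearly identical columns mimic the highly correlated case
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.001 * rng.normal(size=n)])
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=n)

k = 0.1  # small positive ridge parameter (assumed value)

# Ridge: (X^T X + k I_p)^{-1} X^T y
beta_ridge = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
# OLS for comparison: near-singular X^T X makes this estimate unstable
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_ols, beta_ridge)
```

Here the OLS coefficients can swing to large opposite-signed values because the difference between the two columns is almost pure noise, while the ridge term stabilizes the system and splits the effect roughly evenly between the correlated columns.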