Tikhonov regularization: Difference between revisions

Latest revision as of 12:08, 17 October 2022

Redirect to:

Ridge regression#Tikhonov regularization

From a merge: This is a redirect from a page that was merged into another page. This redirect was kept in order to preserve the edit history of this page after its content was merged into the content of the target page. Please do not remove the tag that generates this text (unless the need to recreate content on this page has been demonstrated) or delete this page.
- For redirects with substantive page histories that did not result from page merges use {{R with history}} instead.

To a section: This is a redirect from a topic that does not have its own page to a section of a page on the subject. For redirects to embedded anchors on a page, use {{R to anchor}} instead.

@@ Line 1: / Line 1: @@
+#REDIRECT [[Ridge regression#Tikhonov regularization]]
-{{Short description|Regularization technique for ill-posed problems}}
-{{Merge from|Ridge regression|discuss=Talk:Tikhonov regularization#Proposed merge of Ridge regression into Tikhonov regularization|date=March 2021}}
-{{Regression bar}}
-'''Tikhonov regularization''', named for [[Andrey Nikolayevich Tikhonov|Andrey Tikhonov]], is a method of [[regularization (mathematics)|regularization]] of [[ill-posed problem]]s. Also known as '''ridge regression''',{{efn|In [[statistics]], the method is known as '''ridge regression''', in [[machine learning]] it and its modifications are known as '''weight decay''', and with multiple independent discoveries, it is also variously known as the '''Tikhonov–Miller method''', the '''Phillips–Twomey method''', the '''constrained linear inversion''' method, '''{{math|''L''<sub>2</sub>}} regularization''', and the method of '''linear regularization'''. It is related to the [[Levenberg–Marquardt algorithm]] for [[non-linear least squares|non-linear least-squares]] problems.}} it is particularly useful to mitigate the problem of [[multicollinearity]] in [[linear regression]], which commonly occurs in models with large numbers of parameters.<ref>{{cite book |first=Peter |last=Kennedy |author-link=Peter Kennedy (economist) |title=A Guide to Econometrics |location=Cambridge |publisher=The MIT Press |edition=Fifth |year=2003 |isbn=0-262-61183-X |pages=205–206 |url=https://books.google.com/books?id=B8I5SP69e4kC&pg=PA205 }}</ref> In general, the method provides improved [[Efficient estimator|efficiency]] in parameter estimation problems in exchange for a tolerable amount of [[Bias of an estimator|bias]] (see [[bias–variance tradeoff]]).<ref>{{cite book |first=Marvin |last=Gruber |title=Improving Efficiency by Shrinkage: The James–Stein and Ridge Regression Estimators |location=Boca Raton |publisher=CRC Press |year=1998 |pages=7–15 |isbn=0-8247-0156-9 |url=https://books.google.com/books?id=wmA_R3ZFrXYC&pg=PA7 }}</ref>
+{{R from merge}}
-In the simplest case, the problem of a [[Singular matrices|near-singular]] [[moment matrix]] <math>(\mathbf{X}^\mathsf{T}\mathbf{X})</math> is alleviated by adding positive elements to the [[Main diagonal|diagonals]], thereby decreasing its [[condition number]]. Analogous to the [[ordinary least squares]] estimator, the simple ridge estimator is then given by
+{{R to section}}
-:<math>\hat{\beta}_{R} = (\mathbf{X}^{\mathsf{T}} \mathbf{X} + \lambda \mathbf{I})^{-1} \mathbf{X}^{\mathsf{T}} \mathbf{y}</math>
-where <math>\mathbf{y}</math> is the [[regressand]], <math>\mathbf{X}</math> is the [[design matrix]], <math>\mathbf{I}</math> is the [[identity matrix]], and the ridge parameter <math>\lambda \geq 0</math> serves as the constant shifting the diagonals of the moment matrix.<ref>For the choice of <math>\lambda</math> in practice, see {{cite journal |first1=Ghadban |last1=Khalaf |first2=Ghazi |last2=Shukur |title=Choosing Ridge Parameter for Regression Problems |journal=[[Communications in Statistics – Theory and Methods]] |volume=34 |year=2005 |issue=5 |pages=1177–1182 |doi=10.1081/STA-200056836 |s2cid=122983724 }}</ref> It can be shown that this estimator is the solution to the [[least squares]] problem subject to the [[Constraint (mathematics)|constraint]] <math>\beta^\mathsf{T}\beta = c</math>, which can be expressed as a Lagrangian:
-:<math>\min_{\beta} \, (\mathbf{y} - \mathbf{X} \beta)^\mathsf{T}(\mathbf{y} - \mathbf{X} \beta) + \lambda (\beta^\mathsf{T}\beta - c)</math>
-which shows that <math>\lambda</math> is nothing but the [[Lagrange multiplier]] of the constraint. Typically, <math>\lambda</math> is chosen according to a heuristic criterion, so that the constraint will not be satisfied exactly. Specifically in the case of <math>\lambda = 0</math>, in which the [[Non-binding constraint|constraint is non-binding]], the ridge estimator reduces to [[ordinary least squares]]. A more general approach to Tikhonov regularization is discussed below.
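The closed-form ridge estimator above can be sketched numerically. The following is a minimal illustration, assuming NumPy; the helper name `ridge_estimator` and the synthetic, nearly collinear data are hypothetical and not part of the article.

```python
import numpy as np

def ridge_estimator(X, y, lam):
    """Return (X^T X + lam * I)^{-1} X^T y using a linear solve."""
    n_features = X.shape[1]
    moment = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(moment, X.T @ y)

# Hypothetical data: two nearly collinear regressors (illustration only).
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 1e-3 * rng.normal(size=50)      # almost a copy of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=50)

beta_ols = ridge_estimator(X, y, 0.0)     # lam = 0 reduces to ordinary least squares
beta_ridge = ridge_estimator(X, y, 1.0)   # lam > 0 shrinks the coefficients

# Shifting the diagonal of the moment matrix lowers its condition number.
cond_ols = np.linalg.cond(X.T @ X)
cond_ridge = np.linalg.cond(X.T @ X + 1.0 * np.eye(2))
```

Comparing `cond_ols` with `cond_ridge`, and the norms of `beta_ols` and `beta_ridge`, shows the conditioning improvement and the shrinkage for this particular data set.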
-==History==
-Tikhonov regularization was invented independently in many different contexts.
-It became widely known from its application to integral equations from the work of
-[[Andrey Nikolayevich Tikhonov|Andrey Tikhonov]]<ref>{{Cite journal| last=Tikhonov | first=Andrey Nikolayevich | author-link=Andrey Nikolayevich Tikhonov | year=1943 | title=Об устойчивости обратных задач |trans-title=On the stability of inverse problems | journal=[[Doklady Akademii Nauk SSSR]] | volume=39 | issue=5 | pages=195–198|url=http://a-server.math.nsc.ru/IPP/BASE_WORK/tihon_en.html| archive-url=https://web.archive.org/web/20050227163812/http://a-server.math.nsc.ru/IPP/BASE_WORK/tihon_en.html | archive-date=2005-02-27 }}</ref><ref>{{Cite journal| last=Tikhonov | first=A. N. | year=1963 | title=О решении некорректно поставленных задач и методе регуляризации | journal=Doklady Akademii Nauk SSSR | volume=151 | pages=501–504}}. Translated in {{Cite journal| journal=Soviet Mathematics | volume=4 | pages=1035–1038 | title=Solution of incorrectly formulated problems and the regularization method }}</ref><ref>{{Cite book| last=Tikhonov | first=A. N. |author2=V. Y. Arsenin  | year=1977 | title=Solution of Ill-posed Problems | publisher=Winston & Sons | location=Washington | isbn=0-470-99124-0}}</ref><ref>{{cite book |last1=Tikhonov |first1=Andrey Nikolayevich |last2=Goncharsky |first2=A. |last3=Stepanov |first3=V. V. |last4=Yagola |first4=Anatolij Grigorevic |title=Numerical Methods for the Solution of Ill-Posed Problems |date=30 June 1995 |publisher=Springer Netherlands |location=Netherlands |isbn=079233583X |url=https://www.springer.com/us/book/9780792335832 |access-date=9 August 2018 |ref=TikhonovSpringer1995Numerical}}</ref><ref>{{cite book |last1=Tikhonov |first1=Andrey Nikolaevich |last2=Leonov |first2=Aleksandr S. |last3=Yagola |first3=Anatolij Grigorevic |title=Nonlinear ill-posed problems |date=1998 |publisher=Chapman & Hall |location=London |isbn=0412786605 |url=https://www.springer.com/us/book/9789401751698 |access-date=9 August 2018 |ref=TikhonovChapmanHall1998Nonlinear}}</ref> and David L. Phillips.<ref>{{Cite journal | last1 = Phillips | first1 = D. L. | doi = 10.1145/321105.321114 | title = A Technique for the Numerical Solution of Certain Integral Equations of the First Kind | journal = Journal of the ACM | volume = 9 | pages = 84–97 | year = 1962 | s2cid = 35368397 }}</ref> Some authors use the term '''Tikhonov–Phillips regularization'''.
-The finite-dimensional case was expounded by [[Arthur E. Hoerl]], who took a statistical approach,<ref>{{cite journal |last1=Hoerl |first1=Arthur E. |title=Application of Ridge Analysis to Regression Problems |journal=Chemical Engineering Progress |date=1962 |volume=58 |issue=3 |pages=54–59 |ref=AEHoerl1962V58I3}}</ref> and by Manus Foster, who interpreted this method as a [[Kriging|Wiener–Kolmogorov (Kriging)]] filter.<ref>{{Cite journal | last1 = Foster | first1 = M. | title = An Application of the Wiener-Kolmogorov Smoothing Theory to Matrix Inversion | doi = 10.1137/0109031 | journal = Journal of the Society for Industrial and Applied Mathematics | volume = 9 | issue = 3 | pages = 387–392 | year = 1961 }}</ref> Following Hoerl, it is known in the statistical literature as ridge regression,<ref>{{cite journal | last = Hoerl | first = A. E. |author2=R. W. Kennard | year = 1970 | title=Ridge regression: Biased estimation for nonorthogonal problems | journal=Technometrics | volume=12 | issue=1 | pages = 55–67 | doi=10.1080/00401706.1970.10488634}}</ref> named after the shape along the diagonal of the identity matrix.
-== Tikhonov regularization ==
-Suppose that for a known matrix <math>A</math> and vector <math>\mathbf{b}</math>, we wish to find a vector <math>\mathbf{x}</math> such that{{Clarify|reason=what are the relative dimensions of A, b and x/ is A a square or non-square matrix?; are x and y of the same dimension|date=May 2020}}
-: <math>A\mathbf{x} = \mathbf{b}.</math>
-The standard approach is [[ordinary least squares]] linear regression.{{Clarify|reason=does this represent a system of linear equations (i.e. are x and b both of the same dimension as one side of the - supposedly square - matrix? then, as far as I know, the standard approach for solving it is any of a wide range of solvers ''not'' including linear regression|date=May 2020}} However, if no <math>\mathbf{x}</math> satisfies the equation or more than one <math>\mathbf{x}</math> does—that is, the solution is not unique—the problem is said to be [[Well-posed problem|ill posed]]. In such cases, ordinary least squares estimation leads to an [[Overdetermined system|overdetermined]], or more often an [[Underdetermined system|underdetermined]] system of equations.  Most real-world phenomena have the effect of [[low-pass filters]] in the forward direction where <math>A</math> maps <math>\mathbf{x}</math> to <math>\mathbf{b}</math>.  Therefore, in solving the inverse-problem, the inverse mapping operates as a [[high-pass filter]] that has the undesirable tendency of amplifying noise ([[eigenvalues]] / singular values are largest in the reverse mapping where they were smallest in the forward mapping).  In addition, ordinary least squares implicitly nullifies every element of the reconstructed version of <math>\mathbf{x}</math> that is in the null-space of <math>A</math>, rather than allowing for a model to be used as a prior for <math>\mathbf{x}</math>.
-Ordinary least squares seeks to minimize the sum of squared [[Residual (numerical analysis)|residuals]], which can be compactly written as
-: <math>\|A\mathbf{x} - \mathbf{b}\|_2^2,</math>
-where <math>\|\cdot\|_2</math> is the [[Norm (mathematics)#Euclidean norm|Euclidean norm]].
-In order to give preference to a particular solution with desirable properties, a regularization term can be included in this minimization:
-: <math>\|A\mathbf{x} - \mathbf{b}\|_2^2 + \|\Gamma \mathbf{x}\|_2^2</math>
-for some suitably chosen '''Tikhonov matrix''' <math>\Gamma </math>. In many cases, this matrix is chosen as a scalar multiple of the [[identity matrix]] (<math>\Gamma = \alpha I</math>), giving preference to solutions with smaller [[Norm (mathematics)|norms]]; this is known as '''{{math|''L''<sub>2</sub>}} regularization'''.<ref>{{cite conference |first=Andrew Y. |last=Ng |author-link=Andrew Ng |year=2004 |title=Feature selection, L1 vs. L2 regularization, and rotational invariance |conference=Proc. [[International Conference on Machine Learning|ICML]] |url=https://icml.cc/Conferences/2004/proceedings/papers/354.pdf}}</ref> In other cases, high-pass operators (e.g., a [[difference operator]] or a weighted [[discrete fourier transform|Fourier operator]]) may be used to enforce smoothness if the underlying vector is believed to be mostly continuous.
-This regularization improves the conditioning of the problem, thus enabling a direct numerical solution. An explicit solution, denoted by <math>\hat{x}</math>, is given by
-: <math>\hat{x} = (A^\top A + \Gamma^\top \Gamma)^{-1} A^\top \mathbf{b}.</math>
-The effect of regularization may be varied by the scale of matrix <math>\Gamma</math>. For <math>\Gamma = 0</math> this reduces to the unregularized least-squares solution, provided that (A<sup>T</sup>A)<sup>−1</sup> exists.
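One common way to compute the regularized solution without forming the normal equations is to stack the Tikhonov matrix under the system matrix and solve an ordinary least-squares problem. The sketch below assumes NumPy; the function name `tikhonov_solve`, the random test data, and the choice of a first-difference operator are illustrative assumptions.

```python
import numpy as np

def tikhonov_solve(A, b, Gamma):
    """Minimize ||A x - b||^2 + ||Gamma x||^2 via the stacked system [A; Gamma] x = [b; 0]."""
    A_aug = np.vstack([A, Gamma])
    b_aug = np.concatenate([b, np.zeros(Gamma.shape[0])])
    x_hat, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x_hat

# Hypothetical test problem (illustration only).
rng = np.random.default_rng(1)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)

alpha = 0.5
Gamma_identity = alpha * np.eye(5)                       # L2 regularization
# First-difference operator (4 x 5): penalizes jumps between neighbouring entries.
Gamma_diff = alpha * (np.eye(5, k=1) - np.eye(5))[:-1]

x_l2 = tikhonov_solve(A, b, Gamma_identity)
x_smooth = tikhonov_solve(A, b, Gamma_diff)

# Explicit formula (A^T A + Gamma^T Gamma)^{-1} A^T b for comparison.
x_closed = np.linalg.solve(A.T @ A + Gamma_identity.T @ Gamma_identity, A.T @ b)
```

The stacked form gives the same minimizer as the explicit formula while avoiding the squared condition number that comes with forming `A.T @ A` directly.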
-{{math|''L''<sub>2</sub>}} regularization is used in many contexts aside from linear regression, such as [[Statistical classification|classification]] with [[logistic regression]] or [[support vector machine]]s,<ref>{{cite journal |author1=R.-E. Fan |author2=K.-W. Chang |author3=C.-J. Hsieh |author4=X.-R. Wang |author5=C.-J. Lin |title=LIBLINEAR: A library for large linear classification |journal=[[Journal of Machine Learning Research]] |volume=9 |pages=1871–1874 |year=2008}}</ref> and matrix factorization.<ref>{{cite journal |last1=Guan |first1=Naiyang |first2=Dacheng |last2=Tao |first3=Zhigang |last3=Luo |first4=Bo |last4=Yuan |title=Online nonnegative matrix factorization with robust stochastic approximation |journal=IEEE Transactions on Neural Networks and Learning Systems |volume=23 |issue=7 |year=2012 |pages=1087–1099|doi=10.1109/TNNLS.2012.2197827 |pmid=24807135 |s2cid=8755408 }}</ref>
-===Generalized Tikhonov regularization===
-For general multivariate normal distributions for <math>x</math> and the data error, one can apply a transformation of the variables to reduce to the case above. Equivalently, one can seek an <math>x</math> to minimize
-: <math>\|Ax - b\|_P^2 + \|x - x_0\|_Q^2,</math>
-where we have used <math>\|x\|_Q^2</math> to stand for the weighted norm squared <math>x^\top Q x</math> (compare with the [[Mahalanobis distance]]). In the Bayesian interpretation <math>P</math> is the inverse [[covariance matrix]] of <math>b</math>, <math>x_0</math> is the [[expected value]] of <math>x</math>, and <math>Q</math> is the inverse covariance matrix of <math>x</math>. The Tikhonov matrix is then given as a factorization of the matrix <math>Q = \Gamma^\top \Gamma</math> (e.g. the [[Cholesky factorization]]) and is considered a [[Whitening transformation|whitening filter]].
-This generalized problem has an optimal solution <math>x^*</math> which can be written explicitly using the formula
-: <math>x^* = (A^\top PA + Q)^{-1} (A^\top Pb + Qx_0),</math>
-or equivalently
-: <math>x^* = x_0 + (A^\top PA + Q)^{-1} (A^\top P(b - Ax_0)).</math>
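The equivalence of the two expressions for the generalized solution can be checked numerically. This is a minimal sketch assuming NumPy, with hypothetical diagonal weight matrices `P` and `Q` and random data chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 8, 3
A = rng.normal(size=(m, n))
b = rng.normal(size=m)
x0 = rng.normal(size=n)

# Hypothetical inverse covariance matrices: diagonal and positive definite.
P = np.diag(rng.uniform(0.5, 2.0, size=m))
Q = np.diag(rng.uniform(0.5, 2.0, size=n))

H = A.T @ P @ A + Q

# First form: (A^T P A + Q)^{-1} (A^T P b + Q x0).
x_star_1 = np.linalg.solve(H, A.T @ P @ b + Q @ x0)

# Second form: x0 + (A^T P A + Q)^{-1} A^T P (b - A x0).
x_star_2 = x0 + np.linalg.solve(H, A.T @ P @ (b - A @ x0))

def objective(x):
    """Weighted objective ||A x - b||_P^2 + ||x - x0||_Q^2."""
    r = A @ x - b
    d = x - x0
    return r @ P @ r + d @ Q @ d
```

Both forms agree to rounding error; the second is often convenient because it expresses the solution as a correction to the prior mean `x0`.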
-==Lavrentyev regularization==
-In some situations, one can avoid using the transpose <math>A^\top</math>, as proposed by [[Mikhail Lavrentyev]].<ref>{{cite book |first=M. M. |last=Lavrentiev |title=Some Improperly Posed Problems of Mathematical Physics |publisher=Springer |location=New York |year=1967 }}</ref> For example, if <math>A</math> is symmetric positive definite, i.e. <math>A = A^\top > 0</math>, so is its inverse <math>A^{-1}</math>, which can thus be used to set up the weighted norm squared <math>\|x\|_P^2  = x^\top A^{-1} x</math> in the generalized Tikhonov regularization, leading to minimizing
-: <math>\|Ax - b\|_{A^{-1}}^2 + \|x - x_0\|_Q^2</math>
-or, equivalently up to a constant term,
-: <math>x^\top (A+Q)x - 2 x^\top (b + Qx_0)</math>.
-This minimization problem has an optimal solution <math>x^*</math> which can be written explicitly using the formula
-: <math>x^* = (A + Q)^{-1} (b + Qx_0)</math>,
-which is nothing but the solution of the generalized Tikhonov problem where <math>A = A^\top =P^{-1}.</math>
-The Lavrentyev regularization, if applicable, is advantageous over the original Tikhonov regularization, since the Lavrentyev matrix <math>A + Q</math> can be better conditioned, i.e., have a smaller [[condition number]], than the matrix <math>A^\top A + \Gamma^\top \Gamma</math> of the Tikhonov normal equations.
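A small numerical sketch of the Lavrentyev solution, assuming NumPy. The symmetric positive definite matrix is built from hypothetical eigenvalues so that the conditioning comparison is easy to follow; the inequality shown holds for this example and is not claimed in general.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

# Hypothetical symmetric positive definite A with prescribed eigenvalues.
eigenvalues = np.array([1e-3, 0.1, 1.0, 10.0])
U, _ = np.linalg.qr(rng.normal(size=(n, n)))   # random orthogonal basis
A = U @ np.diag(eigenvalues) @ U.T

b = rng.normal(size=n)
x0 = rng.normal(size=n)
alpha = 0.1
Q = alpha**2 * np.eye(n)                       # Q = Gamma^T Gamma with Gamma = alpha * I

# Lavrentyev solution: (A + Q)^{-1} (b + Q x0), no transpose of A needed.
x_lavrentyev = np.linalg.solve(A + Q, b + Q @ x0)

# Condition numbers of the two system matrices for this example.
cond_lavrentyev = np.linalg.cond(A + Q)
cond_tikhonov = np.linalg.cond(A.T @ A + Q)
```

For these eigenvalues the Lavrentyev matrix has a condition number of roughly (10 + 0.01) / (0.001 + 0.01), while the Tikhonov normal-equations matrix is closer to (100 + 0.01) / (0.000001 + 0.01), about an order of magnitude larger.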
-==Regularization in Hilbert space==
-Typically discrete linear ill-conditioned problems result from discretization of [[integral equation]]s, and one can formulate a Tikhonov regularization in the original infinite-dimensional context. In the above we can interpret <math>A</math> as a [[compact operator]] on [[Hilbert space]]s, and <math>x</math> and <math>b</math> as elements in the domain and range of <math>A</math>. The operator <math>A^* A + \Gamma^\top \Gamma </math> is then a [[Hermitian adjoint|self-adjoint]] bounded invertible operator.
-==Relation to singular-value decomposition and Wiener filter==
-With <math>\Gamma = \alpha I</math>, this least-squares solution can be analyzed in a special way using the [[singular-value decomposition]]. Given the singular value decomposition
-:<math>A = U \Sigma V^\top</math>
-with singular values <math>\sigma _i</math>, the Tikhonov regularized solution can be expressed as
-:<math>\hat{x} = V D U^\top b,</math>
-where <math>D</math> has diagonal values
-:<math>D_{ii} = \frac{\sigma_i}{\sigma_i^2 + \alpha^2}</math>
-and is zero elsewhere. This demonstrates the effect of the Tikhonov parameter on the [[condition number]] of the regularized problem. For the generalized case, a similar representation can be derived using a [[generalized singular-value decomposition]].<ref name="Hansen_SIAM_1998">{{cite book |last1=Hansen |first1=Per Christian |title=Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion |date=Jan 1, 1998 |publisher=SIAM |location=Philadelphia, USA |isbn=9780898714036 |edition=1st }}</ref>
-Finally, it is related to the [[Wiener filter]]:
-:<math>\hat{x} = \sum _{i=1}^q f_i \frac{u_i^\top b}{\sigma_i} v_i,</math>
-where the Wiener weights are <math>f_i = \frac{\sigma _i^2}{\sigma_i^2 + \alpha^2}</math> and <math>q</math> is the [[Rank (linear algebra)|rank]] of <math>A</math>.
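The filtered expansion can be checked against the normal-equations solution. The sketch below assumes NumPy and random test data; the variable names are illustrative. Each term applies the Wiener weight f_i to the unregularized coefficient u_i^T b / sigma_i, which is the same as multiplying u_i^T b by f_i / sigma_i.

```python
import numpy as np

# Hypothetical test problem (illustration only).
rng = np.random.default_rng(4)
A = rng.normal(size=(10, 4))
b = rng.normal(size=10)
alpha = 0.3

# Thin SVD: A = U diag(sigma) V^T.
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Wiener weights f_i = sigma_i^2 / (sigma_i^2 + alpha^2).
f = sigma**2 / (sigma**2 + alpha**2)

# Sum over i of f_i * (u_i^T b / sigma_i) * v_i, written in matrix form.
x_wiener = Vt.T @ (f * (U.T @ b) / sigma)

# Same solution with the combined diagonal factor f_i / sigma_i.
filter_diag = f / sigma
x_svd = Vt.T @ (filter_diag * (U.T @ b))

# Reference: solve (A^T A + alpha^2 I) x = A^T b directly.
x_normal = np.linalg.solve(A.T @ A + alpha**2 * np.eye(4), A.T @ b)
```

Small singular values receive weights close to zero, so the noisy components that an unregularized inverse would amplify are damped instead.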
-==Determination of the Tikhonov factor==
-The optimal regularization parameter <math>\alpha</math> is usually unknown and often in practical problems is determined by an ''ad hoc'' method. A possible approach relies on the Bayesian interpretation described below. Other approaches include the [[discrepancy principle]], [[cross-validation (statistics)|cross-validation]], [[L-curve method]],<ref>P. C. Hansen, "The L-curve and its use in the
-numerical treatment of inverse problems", [https://www.sintef.no/globalassets/project/evitameeting/2005/lcurve.pdf]</ref> [[restricted maximum likelihood]] and [[unbiased predictive risk estimator]]. [[Grace Wahba]] proved that the optimal parameter, in the sense of [[cross-validation (statistics)#Leave-one-out cross-validation|leave-one-out cross-validation]] minimizes<ref>{{cite journal |last=Wahba |first=G. |year=1990 |title=Spline Models for Observational Data |journal=CBMS-NSF Regional Conference Series in Applied Mathematics |publisher=Society for Industrial and Applied Mathematics |bibcode=1990smod.conf.....W }}</ref><ref>{{cite journal |last3=Wahba |first3=G. |first1=G. |last1=Golub |first2=M. |last2=Heath |year=1979 |title=Generalized cross-validation as a method for choosing a good ridge parameter |journal=Technometrics |volume=21 |issue=2 |pages=215–223 |url=http://www.stat.wisc.edu/~wahba/ftp1/oldie/golub.heath.wahba.pdf |doi=10.1080/00401706.1979.10489751}}</ref>
-:<math>G = \frac{\operatorname{RSS}}{\tau^2} = \frac{\|X \hat{\beta} - y\|^2}{[\operatorname{Tr}(I - X(X^T X + \alpha^2 I)^{-1} X^T)]^2},</math>
-where <math>\operatorname{RSS}</math> is the [[residual sum of squares]], and <math>\tau</math> is the [[effective number of degrees of freedom]].
-Using the previous singular-value decomposition, we can simplify the above expression:
-:<math>\operatorname{RSS} = \left\| y - \sum_{i=1}^q (u_i^\top y) u_i \right\|^2 + \left\| \sum _{i=1}^q \frac{\alpha^2}{\sigma_i^2 + \alpha^2} (u_i^\top y) u_i \right\|^2,</math>
-:<math>\operatorname{RSS} = \operatorname{RSS}_0 + \left\| \sum_{i=1}^q \frac{\alpha^2}{\sigma_i^2 + \alpha^2} (u_i^\top y) u_i \right\|^2,</math>
-and
-:<math>\tau = m - \sum_{i=1}^q \frac{\sigma_i^2}{\sigma_i^2 + \alpha^2}
-= m - q + \sum_{i=1}^q \frac{\alpha^2}{\sigma _i^2 + \alpha^2}.</math>
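The generalized cross-validation score can be evaluated cheaply from a single SVD and then minimized over a grid of candidate parameters. This is a minimal sketch assuming NumPy, with the data vector written as y; the function name `gcv_score`, the synthetic data, and the logarithmic grid are hypothetical choices.

```python
import numpy as np

def gcv_score(X, y, alpha):
    """G = RSS / tau^2 for penalty alpha^2, computed from the thin SVD of X."""
    m = X.shape[0]
    U, sigma, _ = np.linalg.svd(X, full_matrices=False)
    f = sigma**2 / (sigma**2 + alpha**2)   # filter factors
    fitted = U @ (f * (U.T @ y))           # X @ beta_hat
    rss = np.sum((y - fitted) ** 2)        # residual sum of squares
    tau = m - np.sum(f)                    # trace of (I - hat matrix)
    return rss / tau**2

# Hypothetical regression data (illustration only).
rng = np.random.default_rng(5)
X = rng.normal(size=(30, 5))
true_beta = np.array([1.0, 0.0, -2.0, 0.5, 0.0])
y = X @ true_beta + 0.5 * rng.normal(size=30)

# Evaluate G on a logarithmic grid and keep the minimizer.
alphas = np.logspace(-3, 2, 50)
scores = np.array([gcv_score(X, y, a) for a in alphas])
best_alpha = alphas[np.argmin(scores)]
```

A grid search is only one option; since G is a smooth scalar function of alpha, a one-dimensional optimizer could be used instead.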
-==Relation to probabilistic formulation==
-The probabilistic formulation of an [[inverse problem]] introduces (when all uncertainties are Gaussian) a covariance matrix <math> C_M</math> representing the ''a priori'' uncertainties on the model parameters, and a covariance matrix <math> C_D</math> representing the uncertainties on the observed parameters.<ref>{{cite book |last1=Tarantola |first1=Albert |title=Inverse Problem Theory and Methods for Model Parameter Estimation |date=2005 |publisher=Society for Industrial and Applied Mathematics (SIAM) |location=Philadelphia |isbn=0898717922 |edition=1st |url=http://www.ipgp.jussieu.fr/~tarantola/Files/Professional/SIAM/index.html |access-date=9 August 2018 |ref=ATarantolaSIAM2004}}</ref> In the special case when these two matrices are diagonal and isotropic, <math> C_M = \sigma_M^2 I </math> and <math> C_D = \sigma_D^2 I </math>, and, in this case, the equations of inverse theory reduce to the equations above, with <math> \alpha = {\sigma_D}/{\sigma_M} </math>.
-==Bayesian interpretation==
-{{main|Bayesian interpretation of regularization}}
-{{Further|Minimum mean square error#Linear MMSE estimator for linear observation process}}
-Although at first the choice of the solution to this regularized problem may look artificial, and indeed the matrix <math>\Gamma</math> seems rather arbitrary, the process can be justified from a [[Bayesian probability|Bayesian point of view]]. Note that for an ill-posed problem one must necessarily introduce some additional assumptions in order to get a unique solution. Statistically, the [[prior probability]] distribution of <math>x</math> is sometimes taken to be a [[multivariate normal distribution]]. For simplicity here, the following assumptions are made: the means are zero; their components are independent; the components have the same [[standard deviation]] <math>\sigma _x</math>. The data are also subject to errors, and the errors in <math>b</math> are also assumed to be [[statistical independence|independent]] with zero mean and standard deviation <math>\sigma _b</math>. Under these assumptions the Tikhonov-regularized solution is the [[maximum a posteriori|most probable]] solution given the data and the ''a priori'' distribution of <math>x</math>, according to [[Bayes' theorem]].<ref>{{cite book |author=Vogel, Curtis R. |title=Computational methods for inverse problems |publisher=Society for Industrial and Applied Mathematics |location=Philadelphia |year=2002 |isbn=0-89871-550-4 }}</ref>
-If the assumption of [[normal distribution|normality]] is replaced by assumptions of [[homoscedasticity]] and uncorrelatedness of [[errors and residuals in statistics|errors]], and if one still assumes zero mean, then the [[Gauss–Markov theorem]] entails that the solution is the minimal [[Bias of an estimator|unbiased linear estimator]].<ref>{{cite book |last=Amemiya |first=Takeshi |author-link=Takeshi Amemiya |year=1985 |title=Advanced Econometrics |publisher=Harvard University Press |pages=[https://archive.org/details/advancedeconomet00amem/page/60 60–61] |isbn=0-674-00560-0 |url-access=registration |url=https://archive.org/details/advancedeconomet00amem/page/60 }}</ref>
-==See also==
-* [[Lasso (statistics)|LASSO estimator]] is another regularization method in statistics.
-* [[Elastic net regularization]]
-* [[Matrix regularization]]
-==Notes==
-{{notelist}}
-==References==
-{{Reflist}}
-==Further reading==
-*{{cite book |first=Marvin |last=Gruber |title=Improving Efficiency by Shrinkage: The James–Stein and Ridge Regression Estimators |location=Boca Raton |publisher=CRC Press |year=1998 |isbn=0-8247-0156-9 |url=https://books.google.com/books?id=wmA_R3ZFrXYC }}
-* {{cite book |last=Kress |first=Rainer |title=Numerical Analysis |location=New York |publisher=Springer |year=1998 |isbn=0-387-98408-9 |pages=86–90 |chapter=Tikhonov Regularization |chapter-url=https://books.google.com/books?id=Jv_ZBwAAQBAJ&pg=PA86 }}
-* {{Cite book | last1=Press | first1=W. H. | last2=Teukolsky | first2=S. A. | last3=Vetterling | first3=W. T. | last4=Flannery | first4=B. P. | year=2007 | title=Numerical Recipes: The Art of Scientific Computing | edition=3rd | publisher=Cambridge University Press |  location=New York | isbn=978-0-521-88068-8 | chapter=Section 19.5. Linear Regularization Methods | chapter-url=http://apps.nrbook.com/empanel/index.html#pg=1006}}
-* {{cite book |first1=A. K. Md. Ehsanes |last1=Saleh |first2=Mohammad |last2=Arashi |first3=B. M. Golam |last3=Kibria |title=Theory of Ridge Regression Estimation with Applications |location=New York |publisher=John Wiley & Sons |year=2019 |isbn=978-1-118-64461-4 |url=https://books.google.com/books?id=v0KCDwAAQBAJ }}
-* {{cite book |first=Matt |last=Taddy |title=Business Data Science: Combining Machine Learning and Economics to Optimize, Automate, and Accelerate Business Decisions |chapter=Regularization |pages=69–104 |location=New York |publisher=McGraw-Hill |year=2019 |isbn=978-1-260-45277-8 |chapter-url=https://books.google.com/books?id=yPOUDwAAQBAJ&pg=PA69 }}
-{{Least_squares_and_regression_analysis}}
-{{Authority control}}
 [[Category:Linear algebra]]