Talk:Generalized minimal residual method

WikiProject Mathematics (Rated C-class, Mid-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating:
 C Class
 Mid Importance
Field:  Algebra

Why Krylov subspace?

It isn't clear to me why the Krylov subspace makes a good basis for the solution. —Ben FrantzDale 20:43, 19 January 2007 (UTC)

Are you asking why the Krylov subspace is a good subspace, or why a particular basis for the Krylov subspace is good? -- Jitse Niesen (talk) 06:30, 22 January 2007 (UTC)
I'm asking why the Krylov subspace is good. When I first heard about GMRES, I immediately thought of the power iteration algorithm for finding eigenvalues. That makes me think the Krylov subspaces will contain the directions that “matter most” for approximating the range of the operator, and since it will approximate eigenvectors it would also contain directions that “matter most” for the domain of the operator. Is that intuition reasonable? On the other hand, the fact that power iterations converge on the principle eigenvecctor makes it seem like the Krylov subspaces will basically igore the low-eigenvalue directions. (Perhaps that's tha good thing.) —Ben FrantzDale 13:17, 22 January 2007 (UTC)
Actually, I don't know that much about iterative methods, but I'll give your interesting question a try. Firstly, I think the Krylov subspace is a fairly natural space to consider; it's easy to construct and uses the problem data. The second argument is emperical: the method works. I think this is not a satisfactory answer, so if you ever find something better please let me know.
You're right that there is a connection with power iteration. Krylov methods indeed concentrate on the extremal eigenvalues (not just the largest eigenvalue, as power iteration, but also the small eigenvalues). Compare with the formula for the speed of convergence when the matrix is positive definite: it depends on the smallest and largest eigenvalues. -- Jitse Niesen (talk) 01:41, 23 January 2007 (UTC)
Thanks for the comment. I think I found an answer: If I wanted to construct an approximate solution, x, to ${\displaystyle Ax=b}$, I would want to do it with the eigenvectors of A, starting with the eigenvector with the largest eigenvalue, and going on in descending order. It appears that GMRES approximates this in that the Krylov subspace will quickly head off towards those principle eigenvectors and will eventually span the entire space. According to Arnoldi iteration:
The resulting vectors are a basis of the Krylov subspace, ${\displaystyle {\mathcal {K}}_{n}}$. We may expect the vectors of this basis to give good approximations of the eigenvectors corresponding to the ${\displaystyle n}$ largest eigenvalues, for the same reason that ${\displaystyle A^{n-1}b}$ approximates the dominant eigenvector.
So that basically answers the question.
Two things stood out for me as points of confusion. First, the idea of constructing an input to A out of the output of A seemed unnatural. (I'm thinking in terms of mechanics, so the output might be force when the input is position, making it extra counterintuitive.) Second, it seems like the solution will depend on the initial guess. If it is orthogonal to some of the eigenvectors, won't those eigenvectors never be seen in the result? (That is highly unlikely, I guess, but it could happen in theory, right? I suppose it's more likely when the matrix is ill-conditioned.) —Ben FrantzDale 03:45, 23 January 2007 (UTC)
Suppose A is non-singular. Take the characteristic polynomial c(x). By Caylay-Hamilton, c(A)=0, and c(0)=+-det(A). So it is
${\displaystyle A(c_{1}+c_{2}A+\ldots +A^{n-1})=-c_{0}=(-1)^{n-1}\det(A)}$.
From this a formula for the inverse matrix and thus a formula for the solution results
${\displaystyle A^{-1}={\frac {(-1)^{n-1}}{\det(A)}}(c_{1}+c_{2}A+\ldots +A^{n-1})}$.
That is, if there is a solution x to Ax=b, then it is a linear combination of b,Ab,AAb,... This proves the theoretical correctness of the algorithm and also shows why this particular Krylow space is used.
For the input-output-question one has to assume that some rescaling by an invertible matrix took place so that all entries are of the same unit or, after cancelling it out, are unitless.--LutzL 12:49, 21 September 2007 (UTC)
The main reason for using the Krylov subspace is that you have the residuals for all the Krylov subspace vectors already available from the previous iterations. So forming the minimum residual linear combination is very cheap (especially if you're using a limited history). Basically, it's just re-using the information already calculated previously to a maximum degree, and by this increasing the efficiency of the preconditioner (you can easily see where it went wrong and actually increased the residual norm!). That's all.141.70.179.56 (talk) —Preceding undated comment added 21:25, 28 March 2011 (UTC).

Pseudocode

I'm confused by the pseudocode. First, c and s are never initialized. Second, there seems to be no convergence criteria. So, is this to find the exact x after all n steps? If so, what good is it? JefeDeJefes 20:49, 17 January 2007 (UTC)

I don't understand your first point. All entries in c and s are initialized before they are used (in the first iteration of the j-loop, the i-loop is skipped because the sequence 1, … j-1 is empty). Your second point is spot on: the algorithm finds the exact x after n steps, and no, this is not very useful. The algorithm needs to be amended and cleaned up. -- Jitse Niesen (talk) 14:50, 30 March 2007 (UTC)

Pseudocode bug ?

in the implementation description, the Givens rotation contains the submatrix ((c,s),(-s,c)) in the pseudocode, the Givens rotation contains the submatrix ((c,s)(s,-c)) which version is correct ? 157.161.41.179 21:36, 23 June 2007 (UTC)

((c,s)(s,-c)) is not a rotation, so the version in the pseudocode is probably wrong. Together with the point raised in the previous section, this is enough for me to have my doubts about it (I copied it from the German article at de:GMRES-Verfahren without really checking it). So I removed it from the article and copied it below, until somebody can have a good look at it. -- Jitse Niesen (talk) 10:11, 24 June 2007 (UTC)

Moved from the article:

=== Pseudocode ===

Inputs: A, b, x0.

${\displaystyle r_{0}\leftarrow b-Ax_{0}\,}$
if ${\displaystyle r_{0}=0}$ then

return ${\displaystyle x_{0}}$

end if
${\displaystyle v_{1}={\frac {r_{0}}{|r_{0}|_{2}}}\,}$
${\displaystyle \gamma _{1}=|r_{0}|_{2}}$
for j ← 1, …, n do

for i ← 1, …, j do
${\displaystyle h_{ij}\leftarrow v_{i}^{\top }Av_{j}\,}$
end for
${\displaystyle w_{j}\leftarrow Av_{j}-\sum _{i=1}^{j}h_{ij}v_{i}\,}$
${\displaystyle h_{j+1,j}\leftarrow |w_{j}|_{2}\,}$
for i ← 1, …, j − 1 do
${\displaystyle {\begin{pmatrix}h_{ij}\\h_{i+1,j}\end{pmatrix}}\leftarrow {\begin{pmatrix}c_{i+1}&s_{i+1}\\s_{i+1}&-c_{i+1}\end{pmatrix}}{\begin{pmatrix}h_{ij}\\h_{i+1,j}\end{pmatrix}}}$
end for
${\displaystyle \beta \leftarrow {\sqrt {h_{jj}^{2}+h_{j+1,j}^{2}}}\,}$
${\displaystyle s_{j+1}\leftarrow {\frac {h_{j+1,j}}{\beta }}\,}$
${\displaystyle c_{j+1}\leftarrow {\frac {h_{jj}}{\beta }}\,}$
${\displaystyle h_{jj}\leftarrow \beta \,}$
${\displaystyle \gamma _{j+1}\leftarrow s_{j+1}\gamma _{j}\,}$
${\displaystyle \gamma _{j}\leftarrow c_{j+1}\gamma _{j}\,}$.
if ${\displaystyle \gamma _{j+1}\neq 0}$ then
${\displaystyle v_{j+1}\leftarrow {\frac {w_{j}}{h_{j+1,j}}}\,}$
else
for ij, …, 1 do
${\displaystyle \alpha _{i}\leftarrow {\frac {1}{h_{ii}}}\left(\gamma _{i}-\sum _{k=i+1}^{j}h_{ik}\alpha _{k}\right)\,}$
end for
${\displaystyle x\leftarrow x_{0}+\sum _{i=1}^{j}\alpha _{i}v_{i}\,}$
return x
end if

end for

Wrong definition of Krylov subspaces?

The text states that the n-th Krylov subspace is defined as

${\displaystyle K_{n}=\operatorname {span} \,\{b,Ab,A^{2}b,\ldots ,A^{n-1}b\}\,}$

where ${\displaystyle b}$ is the right-hand side of the system we want to solve. I think this should be substituted by

${\displaystyle K_{n}=\operatorname {span} \,\{r,Ar,A^{2}r,\ldots ,A^{n-1}r\}\,}$

where ${\displaystyle r}$ is the initial residual. Otherwise you could choose ${\displaystyle b}$ as one eigenvector and the initial iterate ${\displaystyle x_{0}}$ as a different one and your Krylov subspace certainly never contains the solution vector.

Or did I misunderstand something? —Preceding unsigned comment added by 195.176.179.209 (talk) 07:41, 23 December 2010 (UTC)

Response: I think that your objection is correct, so I changed to definition of the Krylov space accordingly. Note that the new definition reduces to the old one when the initial iterate x_0 is taken to be zero. Source: Iterative Methods for Nonlinear Equations, by C.T. Kelley. 64.134.243.230 (talk) 23:28, 30 July 2011 (UTC)particle25

intentional obfuscation -- somebody trying to show off with his knowledge of obscure math terms?!

I absolutely agree. I'm just trying to get a little background info on this method so I understand how it's used to fit actual data, but I couldn't decipher all of the jargon in this page. Wikipedia is supposed to be for general public use, not for measuring manhoods and competing to see who can write the page so that the fewest possible people can understand it. --Zoharalson (talk) 20:39, 27 June 2013 (UTC)
Krylov subspace is the correct technical term for a subspace generated by the repeated application of a matrix to a vector. It is quite usual to express the analysis of gradient and quasi-Newton methods in terms of Krylov subspaces. You will find this term quite often if you are going to explore the inner workings of those methods.--LutzL (talk) 18:04, 28 June 2013 (UTC)

Error in convergence?

The english version states for symmetric positive matrices:

${\displaystyle \|r_{n}\|\leq \left({\frac {2\kappa _{2}(A)-1}{2\kappa _{2}(A)}}\right)^{n/2}\|r_{0}\|}$

whereas boh the german and japanese versions state

${\displaystyle \|r_{m}\|_{2}\leq \left({\frac {\kappa _{2}^{2}(A)-1}{\kappa _{2}^{2}(A)}}\right)^{m/2}\|r_{0}\|_{2}.}$

Which one is correct? — Preceding unsigned comment added by 193.54.112.22 (talk) 15:28, 7 June 2013 (UTC)

The German / Japanese versions are correct. To obtain this, simply start from the preceding bound and note that for the symmetric matrix ${\displaystyle M}$, we have ${\displaystyle \lambda _{\min }(M+M^{T})/2=\lambda _{\min }(M)=\sigma _{\min }(M)}$ and ${\displaystyle \|M\|=\lambda _{\max }(M)=\sigma _{\max }(M)}$. Finally, substituting ${\displaystyle \kappa _{2}(M)=\sigma _{\max }(M)/\sigma _{\min }(M)}$ yields the bound in the German / Japanese version. I've made the edit. 18.138.1.194 (talk) 19:41, 3 October 2015 (UTC)

History

I am familiar with the history of GMRES, but I have never heard DIIS mentioned. Is there a published literature review that mentions this relation? A quick reading of Pulay's 1980 paper shows he misses the Arnodli process and the underlying theory for the GMRES polynomial (although he gets tantalizing close). As it stands, calling GMRES a special case of DIIS is like calling the Wright Flyer a special case of Langley's Aerodrome. Jeffreyhokanson (talk) 16:13, 27 November 2013 (UTC)

iterative process -- is n incremented somewhere?

192.73.228.23 (talk) 00:09, 6 February 2014 (UTC)

Why is this a Krylov subspace method as it is not using a polynomial of A?

192.73.228.23 (talk) 00:15, 6 February 2014 (UTC)

assuming that n indicates the nth step, is the dimension of the sub problem increasing by one at every step?

192.73.228.23 (talk) 00:25, 6 February 2014 (UTC)

Hn is a time machine...

A*Qn = Qn+1*Hn

but Qn+1 depends on xn+1 which we are trying to calculate right now... — Preceding unsigned comment added by 192.73.228.23 (talk) 01:05, 6 February 2014 (UTC)