Talk:Conjugate gradient method

This is the talk page for discussing improvements to the Conjugate gradient method article.
This is not a forum for general discussion of the article's subject.

Put new text under old text. Click here to start a new topic.
New to Wikipedia? Welcome! Learn to edit; get help.

Article policies

Find sources: Google (books · news · scholar · free images · WP refs) · FENS · JSTOR · TWL

Archives: 1: 2 months

Archives

Archive 1

This page has archives. Sections older than 60 days may be automatically archived by .

Discussion of correct of the algorithm

The explanation skips a large important part of the correctness proof, which shows that the mentioned "Gram-Schmidt orthonormalization" only needs to consider the latest direction p_k and need not be performed relative to all previous directions p_j where j < k. This can be deduced from the fact that the r_k and p_k span the same Krylov subspace, but it should be highlighted how this implies that p_j r_k = 0 for j <= k and p_j A r_k = 0 for j < k. — Preceding unsigned comment added by 46.128.186.9 (talk) 16:04, 13 December 2023 (UTC)[reply]

Comment

In your algorithm, the formula to calculate pk differs from Jonathan Richard Shewchuk paper. The index of r should be k instead of k-1. Mmmh, sorry, it seams to be correct! ;-) —The preceding unsigned comment was added by 171.66.40.105 (talk • contribs) 01:45, 21 February 2007 (UTC)

additional comment:

basis vector set $P = \{ p_0, ... p_{n-1} \}$.

Conflicting information under "The resulting algorithm" subsection

In the subsection, I can see that r_i^T r_j = 0. The following is quoted from the same. However, in the algorithm, in the calculation for the value of {\beta}, the r_k^T r_k comes up in the denominator, which does not look right. In my openion, the denominator should be r_k^T r_k+1. However, I could be incorrect, and hence, I would like to bring this to the notice of the experts. — Preceding unsigned comment added by Zenineasa (talk • contribs) 13:25, 26 November 2021 (UTC)[reply]

Missing steps thinking about it as a linear solver

I can't wrap my head around the fact that it's optimizing

f(\mathbf {x} )={\tfrac {1}{2}}\mathbf {x} ^{\mathsf {T}}\mathbf {A} \mathbf {x} -\mathbf {x} ^{\mathsf {T}}\mathbf {b} ,\qquad \mathbf {x} \in \mathbf {R} ^{n}\,

.

Abstractly, I think (?) the idea is that we choose $f(x)$ such that

\nabla f(\mathbf {x} )=\mathbf {A} \mathbf {x} -\mathbf {b} \,

and so the residual vector is the gradient of the surface. But coming at it from the perspective of "let's minimize the residual", my line of thinking is let

g(\mathbf {x} ):=\|\mathbf {A} \mathbf {x} -\mathbf {b} \|_{2}^{2}=(\mathbf {Ax-b} )^{\top }(\mathbf {Ax-b} )=\mathbf {x^{\top }A^{\top }Ax} -2\mathbf {b^{\top }Ax+b^{\top }b}

which we could drop the constant $\mathbf {b^{\top }b}$ from. That's similar but decidedly different from $f(\mathbf {x} )$ . What gives? Is $f(\mathbf {x} )$ minimizing the residual? Can the norm of the residual be massaged to look like $f(\mathbf {x} )$ ? —Ben FrantzDale (talk) 21:30, 18 January 2023 (UTC)[reply]

Pathological example not PSD

The method is supposed to be used for PSD matrices, but the example given in "A pathological example" is not PSD. Thomasda (talk) 17:47, 23 January 2024 (UTC)[reply]

A method is not supposed to be used only for specific entries. It is supposed to be used on any input for which it converges to a solution. In § Conjugate gradient on the normal equations, it is shown that it may be applied to any linear system. When used for unconstrained optimization, it can be used for finding a local minimum of a non-convex function, in which case, if it converges, one cannot predict which local minimum is obtained. Personally, around 1970, I used this method for finding a solution of an overconstrained system of 81 quadratic equations in 54 unknowns, by minimizing the sum of the squares of the equations (see System of polynomial equations § General solving algorithms). More than 50 years later, I do not know any other method that works on this problem. D.Lazard (talk) 20:49, 23 January 2024 (UTC)[reply]

That's fair, but in the beginning of the article it is assumed A is PSD: "is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is positive-definite." and "the known

n\times n

matrix

\mathbf {A}

is symmetric (i.e., A^T = A), positive-definite (i.e. x^TAx > 0 for all non-zero vectors

\mathbf {x}

in Rⁿ), and real".

So either it should be more clarified that the method is " supposed to be used on any input" (including in the analysis section, which also assumes PSD, as far as I can tell), or the Pathological example should note that it is not PSD, and hence doesn't conflict with the rest of the analysis.Thomasda (talk) 01:52, 24 January 2024 (UTC)[reply]