Wikipedia:Reference desk/Archives/Mathematics/2020 May 24



May 24

Optimization: Necessary and Sufficient Conditions for Superlinear Convergence of Quasi-Newton Method

Hello,

I am reading *Numerical Optimization* by Nocedal & Wright, and I am having trouble understanding some aspects of the proof of Theorem 3.7. I have written the theorem, its proof, and my questions in LaTeX; I'm sorry I couldn't figure out how to make it render nicely on Wikipedia.

I have also asked this question on Math Stackexchange, if you would prefer to see it there: https://math.stackexchange.com/questions/3686947/proof-of-superlinear-convergence-of-quasi-newton-methods-in-nocedal-wright

I have been stuck on this theorem for many hours, so any help is greatly appreciated.

There are two things I don't understand:

1) The theorem is an iff statement. The author proves one direction, but I don't see how to prove the reverse. (My tentative attempt is below, after the proof.)

2) The author seems to use the assumption that the Hessian is Lipschitz continuous, but this is not an explicit assumption of the theorem. Is this a mistake by the author? (I checked the errata, and it isn't listed there.)

The following is the earlier inequality the author references in the proof; the theorem and its proof follow below.

$$\|x_k + p_k^N - x^*\| = O(\|x_k - x^*\|^2), \tag{3.33}$$

where $p_k^N = -\nabla^2 f(x_k)^{-1} \nabla f(x_k)$ denotes the Newton step.

[The above is where my point #2 comes from. This inequality was derived in the proof of an earlier theorem (the one about quadratic convergence of Newton's method), and that theorem had the hypothesis that the Hessian is Lipschitz continuous, which was used to prove the inequality.]
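
For reference, this is how I believe (3.33) was obtained in that earlier proof (my own sketch, not a quotation from the book; $L$ denotes the Lipschitz constant of $\nabla^2 f$):

$$x_k + p_k^N - x^* = \nabla^2 f_k^{-1} \left[ \nabla^2 f_k (x_k - x^*) - \big( \nabla f_k - \nabla f(x^*) \big) \right],$$

and since $\nabla f_k - \nabla f(x^*) = \int_0^1 \nabla^2 f\big(x^* + t(x_k - x^*)\big)(x_k - x^*)\,dt$,

$$\left\| \nabla^2 f_k (x_k - x^*) - \big( \nabla f_k - \nabla f(x^*) \big) \right\| \le \int_0^1 \left\| \nabla^2 f_k - \nabla^2 f\big(x^* + t(x_k - x^*)\big) \right\| \|x_k - x^*\| \, dt \le \frac{L}{2} \|x_k - x^*\|^2.$$

With mere continuity of $\nabla^2 f$ (no Lipschitz condition), the integrand only tends to $0$, which yields $o(\|x_k - x^*\|)$ rather than $O(\|x_k - x^*\|^2)$; that is why I think the Lipschitz hypothesis is genuinely needed here.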

Theorem 3.7. Suppose that $f\colon \mathbb{R}^n \to \mathbb{R}$ is twice continuously differentiable. Consider the iteration $x_{k+1} = x_k + p_k$ (that is, the step length $\alpha_k$ is uniformly $1$), where $p_k$ is given by $p_k = -B_k^{-1} \nabla f_k$. Let us assume also that $\{x_k\}$ converges to a point $x^*$ such that $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is positive definite. Then $\{x_k\}$ converges superlinearly if and only if

$$\lim_{k \to \infty} \frac{\|(B_k - \nabla^2 f(x_k))\,p_k\|}{\|p_k\|} = 0 \tag{3.36}$$

holds.

Proof. We first show that (3.36) is equivalent to

$$p_k - p_k^N = o(\|p_k\|), \tag{3.37}$$

where $p_k^N = -\nabla^2 f_k^{-1} \nabla f_k$ is the Newton step. Assuming that (3.36) holds, we have that

$$p_k - p_k^N = \nabla^2 f_k^{-1} \left( \nabla^2 f_k\, p_k + \nabla f_k \right) = \nabla^2 f_k^{-1} \left( \nabla^2 f_k - B_k \right) p_k = O\!\left( \|(\nabla^2 f_k - B_k)\,p_k\| \right) = o(\|p_k\|),$$

where we have used the fact that $\|\nabla^2 f_k^{-1}\|$ is bounded above for $x_k$ sufficiently close to $x^*$, since the limiting Hessian $\nabla^2 f(x^*)$ is positive definite. The converse follows readily if we multiply both sides of (3.37) by $\nabla^2 f_k$ and recall that $\nabla f_k = -B_k p_k$.

By combining (3.33) and (3.37), we obtain that

$$\|x_{k+1} - x^*\| \le \|x_k + p_k^N - x^*\| + \|p_k - p_k^N\| = O(\|x_k - x^*\|^2) + o(\|p_k\|).$$

A simple manipulation of this inequality reveals that $\|p_k\| = O(\|x_k - x^*\|)$, so we obtain

$$\|x_{k+1} - x^*\| = o(\|x_k - x^*\|),$$

giving the superlinear convergence result.
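
For clarity, two additions of my own (neither is from the book, so please check them). First, here is how I read the "simple manipulation" step: writing the $o(\|p_k\|)$ term as $\varepsilon_k \|p_k\|$ with $\varepsilon_k \to 0$, and using $\|p_k\| = \|x_{k+1} - x_k\| \le \|x_{k+1} - x^*\| + \|x_k - x^*\|$, the last display becomes

$$(1 - \varepsilon_k)\,\|x_{k+1} - x^*\| \le O(\|x_k - x^*\|^2) + \varepsilon_k\,\|x_k - x^*\|,$$

which gives $\|x_{k+1} - x^*\| = o(\|x_k - x^*\|)$, and then also $\|p_k\| \le \|x_{k+1} - x^*\| + \|x_k - x^*\| = O(\|x_k - x^*\|)$.

Second, here is my tentative attempt at the reverse direction of the iff: assuming superlinear convergence, $\|x_{k+1} - x^*\| = o(\|x_k - x^*\|)$, the triangle inequality $\|x_k - x^*\| \le \|x_{k+1} - x^*\| + \|p_k\|$ gives $\|x_k - x^*\| = O(\|p_k\|)$, and then, by (3.33),

$$\|p_k - p_k^N\| \le \|x_{k+1} - x^*\| + \|x_k + p_k^N - x^*\| = o(\|x_k - x^*\|) + O(\|x_k - x^*\|^2) = o(\|p_k\|),$$

which is (3.37), and hence (3.36) by the equivalence established at the start of the proof. Note that this again relies on (3.33), i.e. on the Lipschitz hypothesis, which is part of what worries me in my point #2.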

In an earlier edition of the book (ISBN 978-0-387-22742-9), the statement of Theorem 3.7 starts with: "Suppose that $f$ is twice differentiable and that the Hessian is _Lipschitz continuous_ ..." [my emphasis by underscoring — L.] The statement of Theorem 3.7 in a later edition (ISBN 978-0-387-40065-5) is as above, but the form of (3.36) is subtly different:

$$\lim_{k \to \infty} \frac{\|(B_k - \nabla^2 f(x^*))\,p_k\|}{\|p_k\|} = 0.$$
So what edition is the above from? The presentation is not entirely self-contained; I assume that $\nabla f_k$ is shorthand notation for $\nabla f(x_k)$.  --Lambiam 10:29, 24 May 2020 (UTC)

Thanks for the response. I'm using the second edition; it's good to hear that the Lipschitz hypothesis appears in the first edition. Its omission from the second edition must have been a mistake on the author's part, but unfortunately it isn't listed in the errata.

As for the differences in (3.36): the author does use $\nabla^2 f(x^*)$, while I use $\nabla^2 f(x_k)$. But from going through the proof of Theorem 3.7, I am quite confident that (3.36) as printed is a little wrong; instead of $\nabla^2 f(x^*)$, it should have $\nabla^2 f(x_k)$.
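
To spell out why I think the proof directly uses the $\nabla^2 f(x_k)$ form, and how the $\nabla^2 f(x^*)$ form relates to it (my own estimate, not the book's): by the triangle inequality,

$$\frac{\|(B_k - \nabla^2 f(x^*))\,p_k\|}{\|p_k\|} \le \frac{\|(B_k - \nabla^2 f(x_k))\,p_k\|}{\|p_k\|} + \|\nabla^2 f(x_k) - \nabla^2 f(x^*)\|,$$

and the last term tends to $0$ by continuity of $\nabla^2 f$ together with $x_k \to x^*$; the same bound with the two Hessians swapped gives the reverse implication. So the two forms are at least equivalent under the theorem's assumptions, even though only the $\nabla^2 f(x_k)$ form appears directly in the proof's manipulations.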

Yes indeed, $\nabla f_k$ stands for $\nabla f(x_k)$. I have tried to make the proof as self-contained as possible; did I miss something, or are you referring to the "A simple manipulation of this inequality..." step?  --BlueDream30 15:01, 24 May 2020 (UTC)