Talk:Limited-memory BFGS

This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

This article has not yet received a rating on the project's importance scale.

It would be nice if the LBFGS article and the BFGS article used the same symbol to represent the approximation to the inverse of the Hessian. LBFGS uses $H_{k}$ and BFGS uses $B_{k}^{-1}$. -- Preceding unsigned comment added by 76.102.204.220 (talk) 03:52, 26 March 2014 (UTC)

As it stands, this article is basically a set of links to, and information about, library packages. It should have a more detailed description of the algorithm itself and fewer links to external content. Does someone with a more thorough background in optimization than I have want to take this on? --Soultaco 15:55, 15 October 2007 (UTC)

software information back

The intention when creating this page was to give software information. Our hope is that users can contribute links to versions of L-BFGS written in different languages. There are various explanations of the L-BFGS algorithm on the web and there is no need for another one here. Therefore I will try to restore the earlier version. Nocedal 06:53, 2 December 2007 (UTC) Jorge Nocedal

Thanks for contributing, but Wikipedia is not a collection of links. It is an online encyclopedia, and while links to software packages for L-BFGS are certainly relevant and worth including, this entry is not here simply to advertise software packages; the Wikipedia entry on L-BFGS should first and foremost be a discussion of the algorithm. If no further objection/elaboration is raised, I'm going to re-add the algorithm information and reformat the article to conform with Wikipedia standards. --Soultaco (talk) 04:58, 6 December 2007 (UTC)

I have rewritten the page so that it conforms to Wikipedia standards and informs the reader about the L-BFGS codes. Please feel free to write a separate page about limited memory algorithms, but I propose that we do not try to do both on the same page. Nocedal February, 2008

I completely agree with Soultaco; a page entitled L-BFGS should provide a concise description of the algorithm. Readers are more interested in how the method works than the author's software package that implements it. Henkelman (talk) 04:34, 20 June 2008 (UTC)

I recently had to implement this, and was very annoyed that there wasn't a helpful Wikipedia page and that none of my standard sources (e.g. Numerical Recipes) had enough information. Having finished my implementation, and having found the key to doing so buried at the back of the "Representations" article by Byrd, Nocedal and Schnabel, which I've cited, I've enriched the Wikipedia article to the best of my ability. We really need someone who can explain the relevant proofs, e.g. why we're able to use a limited history for the BFGS update without causing the approximate Hessian to stop being symmetric and positive definite. These proofs are given in the "Representations" paper, but I don't understand them well enough to reproduce the argument without lifting the proofs directly. I imagine they're also in the 1980 paper, but I can't find that anywhere. The 1989 paper is totally useless from an implementation perspective: it outlines the QN procedure, and skips over how to handle the Hessian without representing the whole thing. Abeppu (talk) 05:00, 19 February 2009 (UTC)
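
On the question of why a limited history keeps the approximation symmetric and positive definite, here is a sketch of the standard argument. It is not taken from the "Representations" paper, and $\rho _{i}$, $v$ and $w$ are notation introduced only for this note. Each stored pair contributes one inverse BFGS update:

 $H_{i+1}=(I-\rho _{i}s_{i}y_{i}^{\rm {T}})H_{i}(I-\rho _{i}y_{i}s_{i}^{\rm {T}})+\rho _{i}s_{i}s_{i}^{\rm {T}},\qquad \rho _{i}=1/(y_{i}^{\rm {T}}s_{i})$

The right side is symmetric whenever $H_{i}$ is. For any $v\neq 0$, writing $w=(I-\rho _{i}y_{i}s_{i}^{\rm {T}})v$:

 $v^{\rm {T}}H_{i+1}v=w^{\rm {T}}H_{i}w+\rho _{i}(s_{i}^{\rm {T}}v)^{2}$

If $H_{i}$ is positive definite and $\rho _{i}>0$, both terms are nonnegative, and they cannot both vanish: $s_{i}^{\rm {T}}v=0$ gives $w=v\neq 0$, which makes the first term strictly positive. L-BFGS applies this same update for only the last $m$ pairs, starting from a positive definite $H_{k}^{0}$ (for instance $\gamma _{k}I$ with $\gamma _{k}>0$) instead of from the full previous matrix, so the result should stay symmetric positive definite as long as every stored pair satisfies $y_{i}^{\rm {T}}s_{i}>0$, which a line search enforcing the Wolfe conditions guarantees.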

Thanks to the previous contributors (Abeppu?) for adding in the L-BFGS algorithm, which is indeed very useful and nicely formatted. There seem to have been some minor errors about the loop index i, and while referring to some published descriptions of L-BFGS I fixed these. I also changed the definition of s and y to be consistent with i going to the current iteration minus one, which seems to have been the intention of the code that was there (there are two alternative formulations in the literature which differ by an index offset of one, and I've gone with the one you can see there, where i goes up to the current iteration minus one). I also changed scalars to Greek letters, because when we're not using bold for vector quantities, it is otherwise quite hard to distinguish scalar and vector quantities. Danpovey (talk) 00:37, 6 April 2011 (UTC)
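
For readers comparing against the literature, the convention described above is presumably the common textbook one, written out here as an assumption: $x_{i}$ is the iterate, $g_{i}$ the gradient at $x_{i}$, and $m$ the history length, none of which are fixed by the comment itself.

 $s_{i}=x_{i+1}-x_{i},\qquad y_{i}=g_{i+1}-g_{i},\qquad i=k-m,\ldots ,k-1$

Under this reading the newest pair available at iteration $k$ is $(s_{k-1},y_{k-1})$. The alternative formulation would define $s_{i}=x_{i}-x_{i-1}$ and $y_{i}=g_{i}-g_{i-1}$, so that $i$ runs up to $k$ instead.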

Citation needed?

In the sentence "An early, open source implementation of L-BFGS in Fortran exists" I would expect a citation and/or link to this reference, which seems important. No? -- Preceding unsigned comment added by Orzelf (talk • contribs) 22:38, 20 February 2010 (UTC)

Yes - done. --Dikay0 (talk) 18:07, 31 October 2010 (UTC)

Bug in algorithm?

The sign of initial z seems to be wrong here:

 $H_{k}^{0}=\gamma _{k}I$ 
 $z=-H_{k}^{0}q$

Nocedal's Numerical Optimization has a positive sign (Alg. 7.4, page 178). I tried implementing the version listed here and it does not converge on Rosenbrock. -- Preceding unsigned comment added by 136.152.250.167 (talk) 00:16, 23 January 2019 (UTC)

The assignment $H_{k}^{0}=y_{k-1}^{\rm {T}}s_{k-1}/y_{k-1}^{\rm {T}}y_{k-1}$ can't be right: The left side is a matrix, the right side a scalar.

How to fix this? The paper cited below gives at the top of page 9 (in the paragraph just before equation 3.1) the formula $H_{k}^{0}=\gamma _{k}I$ and, on the bottom of page 11, equation 3.12 says $\gamma _{k}=y_{k-1}^{\rm {T}}s_{k-1}/y_{k-1}^{\rm {T}}y_{k-1}$.

So it seems a multiplication with $I$ should be added on the right side of the assignment.

If someone could verify this and then change the page accordingly that would be great.

Also, note that this calculation of $H_{k}^{0}$ always makes the matrix diagonal (in fact a scalar multiple of the identity), whereas the following text says it is only “commonly” diagonal. There should, IMO, be a short explanation of this apparent contradiction. I feel I don't know enough about this, so I won't provide one.

The paper is: Richard H. Byrd, Jorge Nocedal and Robert B. Schnabel: “Representations of Quasi-Newton Matrices and Their Use in Limited Memory Methods,” Technical Report, CU-CS-612-92, University of Colorado at Boulder, 1992. This is probably the same as the one from citation 5, only two years earlier and fetched from another source.

84.143.150.249 (talk) 21:07, 17 November 2015 (UTC)

I was wondering about the same thing. But following your interpretation makes $H_{k}^{0}$ a **scalar** matrix, so by abuse of notation one might say that the pseudo code is in fact correct? But then surely there should be a better approximation of $H_{k}^{0}$ that uses a diagonal non-scalar matrix, right? bungalo (talk) 11:00, 15 April 2017 (UTC)

Hi, I know Wikipedia is not intended to be a place of original research, but the sign in front of $H_{k}^{0}$ is definitely wrong. I coded up limited-memory BFGS myself to check. Using $z=-H_{k}^{0}q$, the line search failed to find a point satisfying the Wolfe conditions. (I used Algorithm 3.5 & 3.6 of Nocedal mentioned above.) Flipping the sign to $z=H_{k}^{0}q$ gave convergence to a local minimum in 14 steps. I used a strictly positive bivariate quartic polynomial for my objective, similar to the Rosenbrock function. Even though my code works for a benchmark optimization problem, it would be good if someone could find a source or dig through the original papers on limited-memory BFGS. Akrodger (talk) 07:02, 22 March 2019 (UTC)
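
To make the sign question easier to check independently, here is a minimal pure-Python sketch of the two-loop recursion with the positive sign, in the spirit of the Alg. 7.4 reference given above. The names `dot` and `two_loop` and the choice to store pairs oldest first are assumptions made for illustration only; this is not the article's pseudocode. The function returns $z\approx H_{k}g$, and the caller negates it once to get the search direction.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def two_loop(grad, s_list, y_list):
    """Approximate H_k * grad from stored (s_i, y_i) pairs, oldest first.

    Hypothetical helper for illustration; assumes y_i . s_i > 0 for every pair.
    """
    q = list(grad)
    rhos = [1.0 / dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = [0.0] * len(s_list)

    # First loop: newest pair to oldest pair.
    for i in reversed(range(len(s_list))):
        alphas[i] = rhos[i] * dot(s_list[i], q)
        q = [qj - alphas[i] * yj for qj, yj in zip(q, y_list[i])]

    # Initial scaling H_k^0 = gamma_k * I, built from the newest pair.
    if s_list:
        gamma = dot(s_list[-1], y_list[-1]) / dot(y_list[-1], y_list[-1])
    else:
        gamma = 1.0  # no history yet: fall back to the identity
    z = [gamma * qj for qj in q]  # positive sign here

    # Second loop: oldest pair to newest pair.
    for i in range(len(s_list)):
        beta = rhos[i] * dot(y_list[i], z)
        z = [zj + (alphas[i] - beta) * sj for zj, sj in zip(z, s_list[i])]
    return z


# Toy check on f(x) = 0.5 * (x1^2 + 10 * x2^2), whose Hessian is diag(1, 10).
g = [3.0, 20.0]
s_pairs = [[1.0, 0.0], [0.0, 1.0]]
y_pairs = [[1.0, 0.0], [0.0, 10.0]]  # y_i = Hessian * s_i
z = two_loop(g, s_pairs, y_pairs)
direction = [-zj for zj in z]  # the single negation happens here
print(z, dot(g, direction))
```

On this toy quadratic, $z$ works out to (3, 2), which is the exact inverse Hessian applied to $g$, and $g^{\rm {T}}(-z)$ is negative, so $-z$ is a descent direction. The general point is that exactly one negation is needed between the gradient and the search direction: either $z=H_{k}^{0}q$ followed by stepping along $-z$, or a negated $q$ at the start. Negating in both places, or in neither, would produce an ascent direction, which would be consistent with the line-search failures reported above.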

External links modified

Hello fellow Wikipedians,

I have just modified one external link on Limited-memory BFGS. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

Added archive https://web.archive.org/web/20131101205929/http://acl.ldc.upenn.edu/W/W02/W02-2018.pdf to http://acl.ldc.upenn.edu/W/W02/W02-2018.pdf

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers. --InternetArchiveBot (Report bug) 10:37, 23 December 2017 (UTC)

Algorithm description

I don't understand the point of introducing the matrix $H_{k}^{0}$ in the algorithm. It is just a scaled identity matrix, so why not simply multiply $z$ by $\gamma _{k}$? -- Andrew Myers (talk) 22:37, 25 August 2019 (UTC)
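
For what it's worth, the two forms do give the same vector when $H_{k}^{0}=\gamma _{k}I$, as the small sketch below illustrates. The function names are hypothetical and the dense matrix is built only for comparison; an actual implementation would presumably never form it.

```python
def scaled_identity_times(gamma, q):
    """Form H0 = gamma * I explicitly and return the product H0 @ q."""
    n = len(q)
    h0 = [[gamma if i == j else 0.0 for j in range(n)] for i in range(n)]
    return [sum(h0[i][j] * q[j] for j in range(n)) for i in range(n)]


def scalar_times(gamma, q):
    """Return gamma * q without forming any matrix: O(n) time and memory."""
    return [gamma * qj for qj in q]


q = [1.0, -2.0, 0.5]
gamma = 0.25
print(scaled_identity_times(gamma, q) == scalar_times(gamma, q))
```

A plausible reason for keeping the matrix notation is that the recursion works for any symmetric positive definite initial matrix, such as the non-scalar diagonal one suggested in the earlier thread, and the scaled identity is just the usual choice.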