No free lunch in search and optimization

In computing, there are circumstances in which the outputs of all procedures solving a particular type of problem are statistically identical. A way of describing such a circumstance, introduced by David H. Wolpert and William G. Macready in connection with the problems of search^[1] and optimization,^[2] is to say that there is no free lunch. Cullen Schaffer had previously established that there is no free lunch in machine learning problems of a particular sort, but used different terminology.^[3]

In formal terms, there is no free lunch when the probability distribution on problem instances is such that all problem solvers have identically distributed results. In the case of search, a problem instance is an objective function, and a result is a sequence of values obtained in evaluation of candidate solutions in the domain of the function. For typical interpretations of results, search is an optimization process. There is no free lunch in search if and only if the distribution on objective functions is invariant under permutation of the space of candidate solutions.^[4]^[5] This condition does not hold precisely in practice,^[4] but an "(almost) no free lunch" theorem suggests that it holds approximately.^[6]

Overview

Some computational problems are solved by searching for good solutions in a space of candidate solutions. A description of how to repeatedly select candidate solutions for evaluation is called a search algorithm. On a particular problem, different search algorithms may obtain different results, but over all problems, they are indistinguishable. It follows that if an algorithm achieves superior results on some problems, it must pay with inferiority on other problems. In this sense there is no free lunch in search.^[1] Alternatively, following Schaffer,^[3] search performance is conserved. Usually search is interpreted as optimization, and this leads to the observation that there is no free lunch in optimization.^[2]

"The 'no free lunch' theorem of Wolpert and Macready," as stated in plain language by Wolpert and Macready themselves, is that "any two algorithms are equivalent when their performance is averaged across all possible problems."^[7] The "no free lunch" results indicate that matching algorithms to problems gives higher average performance than does applying a fixed algorithm to all. Igel and Toussaint^[4] and English^[5] have established a general condition under which there is no free lunch. While it is physically possible, it does not hold precisely.^[4] Droste, Jansen, and Wegener have proved a theorem they interpret as indicating that there is "(almost) no free lunch" in practice.^[6]

To make matters more concrete, consider an optimization practitioner confronted with a problem. Given some knowledge of how the problem arose, the practitioner may be able to exploit the knowledge in selection of an algorithm that will perform well in solving the problem. If the practitioner does not understand how to exploit the knowledge, or simply has no knowledge, then he or she faces the question of whether some algorithm generally outperforms others on real-world problems. The authors of the "(almost) no free lunch" theorem say that the answer is essentially no, but admit some reservations as to whether the theorem addresses practice.^[6]

No free lunch (NFL)

A "problem" is, more formally, an objective function that associates candidate solutions with goodness values. A search algorithm takes an objective function as input and evaluates candidate solutions one-by-one. The output of the algorithm is the sequence of observed goodness values.^[8]^[9]

Wolpert and Macready stipulate that an algorithm never reevaluates a candidate solution, and that algorithm performance is measured on outputs.^[2] For simplicity, we disallow randomness in algorithms. Under these conditions, when a search algorithm is run on every possible input, it generates each possible output exactly once.^[9] Because performance is measured on the outputs, the algorithms are indistinguishable in how often they achieve particular levels of performance.

Some measures of performance indicate how well search algorithms do at optimization of the objective function. A common performance measure is the least index of the least value in the output sequence. This is the number of evaluations required to minimize the objective function. For some algorithms, the time required to find the minimum is proportional to the number of evaluations.^[5]

The original no free lunch (NFL) theorems assume that all objective functions are equally likely to be input to search algorithms.^[2] It has since been established that there is NFL if and only if, loosely speaking, "shuffling" objective functions has no impact on their probabilities.^[5]^[4] This condition for NFL does not hold precisely.^[4] But NFL is a matter of degree, not an all-or-nothing proposition. If the condition for NFL holds approximately, then all algorithms yield approximately the same results over all objective functions.^[5] It has been argued that there is "(almost) no free lunch" in practice.^[6]

Formal synopsis of NFL

The set of all objective functions is $Y^{X}$ , where $X$ is a finite solution space and $Y$ is a finite poset. The set of all permutations of X is J. Random variable F is distributed on $Y^{X}$ . For all j in J, F o j is a random variable distributed on $Y^{X}$ , with Pr{F o j = f} = Pr{F = f o j^–1} for all f in $Y^{X}$ . Let a(f) denote the output of search algorithm a on input f. If a(F) and b(F) are identically distributed for all search algorithms a and b, then F has an NFL distribution. This condition holds if and only if F and F o j are identically distributed for all j in J.^[5]^[4] In other words, there is no free lunch for search algorithms if and only if the distribution of objective functions is invariant under permutation of the solution space.

Original NFL theorems

Wolpert and Macready give two principal NFL theorems, the first regarding objective functions that do not change while search is in progress, and the second regarding objective functions that may change.^[2]

Theorem 1: For any pair of algorithms a₁ and a₂

\sum _{f}P(h_{m}^{y}|f,m,a_{1})=\sum _{f}P(h_{m}^{y}|f,m,a_{2}).

Theorem 2 establishes a "more subtle" NFL result for time-varying objective functions.

Interpretation of NFL results

The NFL results have been interpreted as saying that "a general-purpose universal optimization strategy is theoretically impossible, and the only way one strategy can outperform another is if it is specialized to the specific problem under consideration"^[10]. However, an algorithm may outperform another on a problem when neither is specialized to the problem. It may be that both algorithms are among the worst for the problem. Wolpert and Macready have developed a measure of "alignment" of an algorithm and a problem.^[2] To say that one algorithm matches a problem better than another is not to say that either is specialized to the problem.

Coevolutionary free lunches

Wolpert and Macready have proved that there are free lunches in coevolutionary optimization.^[7] Their analysis "covers 'self-play' problems. In these problems, the set of players work together to produce a champion, who then engages one or more antagonists in a subsequent multiplayer game."^[7] That is, the objective is to obtain a good player, but without an objective function. The goodness of each player (candidate solution) is assessed by observing how well it plays against others. An algorithm attempts to use players and their quality of play to obtain better players. The player deemed best of all by the algorithm is the champion. Wolpert and Macready have demonstrated that some coevolutionary algorithms are generally superior to other algorithms in quality of champions obtained. Generating a champion through self-play is of interest in evolutionary computation and game theory. The results are inapplicable to coevolution of biological species, which does not yield champions.^[7]

Intelligent design and NFL

The no free lunch theorem is often invoked by intelligent design proponents Robert J. Marks II and William Dembski as supporting intelligent design and Dembski's concept of specified complexity which he alleges is evidence of design.^[11] The scientific community has rejected both the notions of specified complexity and that the no free lunch theorem supports intelligent design.^[12]

References and notes

^ ^a ^b Wolpert, D.H., Macready, W.G. (1995), No Free Lunch Theorems for Search, Technical Report SFI-TR-95-02-010 (Santa Fe Institute).
^ ^a ^b ^c ^d ^e ^f Wolpert, D.H., Macready, W.G. (1997), "No Free Lunch Theorems for Optimization," IEEE Transactions on Evolutionary Computation 1, 67. http://ic.arc.nasa.gov/people/dhw/papers/78.pdf
^ ^a ^b Schaffer, Cullen (1994), "A conservation law for generalization performance," International Conference on Machine Learning, H. Willian and W. Cohen, Editors. San Francisco: Morgan Kaufmann, pp.295-265.
^ ^a ^b ^c ^d ^e ^f ^g Igel, C., and Toussaint, M. (2004) "A No-Free-Lunch Theorem for Non-Uniform Distributions of Target Functions," Journal of Mathematical Modelling and Algorithms 3, pp. 313–322.
^ ^a ^b ^c ^d ^e ^f English, T. (2004) No More Lunch: Analysis of Sequential Search, Proceedings of the 2004 IEEE Congress on Evolutionary Computation, pp. 227-234. http://BoundedTheoretics.com/CEC04.pdf
^ ^a ^b ^c ^d S. Droste, T. Jansen, and I. Wegener. 2002. "Optimization with randomized search heuristics: the (A)NFL theorem, realistic scenarios, and difficult functions," Theoretical Computer Science, vol. 287, no. 1, pp. 131-144.
^ ^a ^b ^c ^d Wolpert, D.H., and Macready, W.G. (2005) "Coevolutionary free lunches," IEEE Transactions on Evolutionary Computation, 9(6): 721-735
^ A search algorithm also outputs the sequence of candidate solutions evaluated, but that output is unused in this article.
^ ^a ^b English, T. M. 2000. "Optimization Is Easy and Learning Is Hard in the Typical Function," Proceedings of the 2000 Congress on Evolutionary Computation: CEC00, pp. 924-931. http://www.BoundedTheoretics.com/CEC2000.pdf
^ Ho, Y.C., Pepyne, D.L. (2002), "Simple Explanation of the No-Free-Lunch Theorem and Its Implications," Journal of Optimization Theory and Applications 115, 549.
^ Dembski, W. A. (2002) No Free Lunch, Rowman & Littlefield
^ Wolpert, D. (2003) "William Dembski's treatment of the No Free Lunch theorems is written in jello," Mathematical Reviews 12, review 2003b:00012. http://www.talkreason.org/articles/jello.cfm.

External links

[WM95-1] Wolpert, D.H., Macready, W.G. (1995), No Free Lunch Theorems for Search, Technical Report SFI-TR-95-02-010 (Santa Fe Institute).

[WM97-2] ^ ^a ^b ^c ^d ^e ^f Wolpert, D.H., Macready, W.G. (1997), "No Free Lunch Theorems for Optimization," IEEE Transactions on Evolutionary Computation 1, 67. http://ic.arc.nasa.gov/people/dhw/papers/78.pdf

[Schaffer94-3] Schaffer, Cullen (1994), "A conservation law for generalization performance," International Conference on Machine Learning, H. Willian and W. Cohen, Editors. San Francisco: Morgan Kaufmann, pp.295-265.

[Igel-4] ^ ^a ^b ^c ^d ^e ^f ^g Igel, C., and Toussaint, M. (2004) "A No-Free-Lunch Theorem for Non-Uniform Distributions of Target Functions," Journal of Mathematical Modelling and Algorithms 3, pp. 313–322.

[English2004-5] ^ ^a ^b ^c ^d ^e ^f English, T. (2004) No More Lunch: Analysis of Sequential Search, Proceedings of the 2004 IEEE Congress on Evolutionary Computation, pp. 227-234. http://BoundedTheoretics.com/CEC04.pdf

[ANFL-6] S. Droste, T. Jansen, and I. Wegener. 2002. "Optimization with randomized search heuristics: the (A)NFL theorem, realistic scenarios, and difficult functions," Theoretical Computer Science, vol. 287, no. 1, pp. 131-144.

[WM-coev-7] Wolpert, D.H., and Macready, W.G. (2005) "Coevolutionary free lunches," IEEE Transactions on Evolutionary Computation, 9(6): 721-735

[8] A search algorithm also outputs the sequence of candidate solutions evaluated, but that output is unused in this article.

[English2000-9] English, T. M. 2000. "Optimization Is Easy and Learning Is Hard in the Typical Function," Proceedings of the 2000 Congress on Evolutionary Computation: CEC00, pp. 924-931. http://www.BoundedTheoretics.com/CEC2000.pdf

[10] Ho, Y.C., Pepyne, D.L. (2002), "Simple Explanation of the No-Free-Lunch Theorem and Its Implications," Journal of Optimization Theory and Applications 115, 549.

[DembskiNFL-11] Dembski, W. A. (2002) No Free Lunch, Rowman & Littlefield

[jello-12] Wolpert, D. (2003) "William Dembski's treatment of the No Free Lunch theorems is written in jello," Mathematical Reviews 12, review 2003b:00012. http://www.talkreason.org/articles/jello.cfm.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]