Boosting

Boosting is a machine learning meta-algorithm for supervised learning. It grew out of a question posed by Kearns[1]: can a set of weak learners be combined into a single strong learner? A weak learner is a classifier that is only slightly correlated with the true classification: it labels examples better than random guessing, but possibly only marginally so. A strong learner, in contrast, is a classifier that is arbitrarily well correlated with the true classification.
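
One common way to make "slightly better than random" precise for binary classification is sketched below. The symbols h (a hypothesis), D (a distribution over labeled examples), gamma (the edge over chance) and epsilon (the target error) are notation introduced here for illustration, not taken from the cited papers.

```latex
% Weak learner: for every distribution D, it outputs a hypothesis h whose
% error beats random guessing by some fixed margin gamma > 0.
\Pr_{(x,y)\sim D}\bigl[h(x) \neq y\bigr] \;\le\; \tfrac{1}{2} - \gamma

% Strong learner: for every distribution D and every target error
% epsilon > 0, it outputs a hypothesis h with
\Pr_{(x,y)\sim D}\bigl[h(x) \neq y\bigr] \;\le\; \epsilon
```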

Schapire's affirmative answer[2] to Kearns' question has had significant ramifications in machine learning and statistics, most notably leading to the development of boosting.

Boosting algorithms

While boosting is not algorithmically constrained, most boosting algorithms iteratively learn weak classifiers with respect to a distribution over the training data and add them to a final strong classifier. When a weak classifier is added, it is typically given a weight related to its accuracy. After it is added, the data is reweighted: examples that are misclassified gain weight and examples that are classified correctly lose weight (some boosting algorithms actually decrease the weight of repeatedly misclassified examples, e.g., boost by majority and BrownBoost). Thus, future weak learners focus more on the examples that previous weak learners misclassified.
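
The loop described above can be sketched in a few lines of Python. This is a minimal illustration in the style of AdaBoost, not a reference implementation: the weak learner (a one-dimensional threshold "stump"), the function names, and the toy data are all assumptions made for the example.

```python
import math

def stump_predict(stump, x):
    """A stump predicts `polarity` when x >= threshold, otherwise -polarity."""
    threshold, polarity = stump
    return polarity if x >= threshold else -polarity

def train_stump(xs, ys, weights):
    """Return (stump, weighted_error) for the stump with the lowest weighted error."""
    best_stump, best_err = None, None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            stump = (threshold, polarity)
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(stump, x) != y)
            if best_err is None or err < best_err:
                best_stump, best_err = stump, err
    return best_stump, best_err

def adaboost(xs, ys, rounds):
    """Labels must be +1 or -1. Returns a list of (alpha, stump) pairs."""
    n = len(xs)
    weights = [1.0 / n] * n              # start from the uniform distribution
    ensemble = []
    for _ in range(rounds):
        stump, err = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)     # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)   # more accurate => larger vote
        ensemble.append((alpha, stump))
        # Reweight: misclassified examples (y * h(x) = -1) gain weight,
        # correctly classified examples (y * h(x) = +1) lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]    # renormalise to a distribution
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all weak classifiers."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): no single stump separates these labels.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(xs, ys, rounds=3)
print(sum(predict(model, x) != y for x, y in zip(xs, ys)), "training errors")
```

On this toy data every individual stump makes at least three mistakes, so any improvement has to come from combining several of them with different weights.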

There are many boosting algorithms. The original ones, proposed by Robert Schapire (a recursive majority gate formulation[2]) and Yoav Freund (boost by majority[3]), were not adaptive and could not take full advantage of the weak learners.

Only algorithms that are provably boosting algorithms in the probably approximately correct (PAC) learning formulation are called boosting algorithms. Other algorithms that are similar in spirit are sometimes called "leveraging algorithms", although they are also sometimes incorrectly called boosting algorithms.[4]

Examples of boosting algorithms

The main variation between many boosting algorithms is their method of weighting training data points and hypotheses. AdaBoost is very popular and perhaps the most significant historically, as it was the first algorithm that could adapt to the weak learners. However, there are many more recent algorithms such as LPBoost, TotalBoost, BrownBoost, MadaBoost, LogitBoost, and others. Many boosting algorithms fit into the AnyBoost framework,[5] which shows that boosting performs gradient descent in function space using a convex cost function. In 2008 Philip Long (at Google) and Rocco A. Servedio (Columbia University) published a paper at the 25th International Conference on Machine Learning suggesting that these algorithms are provably flawed in that "convex potential boosters cannot withstand random classification noise," which calls into question their applicability to real-world, noisy data sets.[6]
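
The "gradient descent in function space" view can be illustrated with a short Python sketch. It is written in the spirit of the AnyBoost framework but is not taken from the cited paper: the logistic cost, the fixed step size, the threshold-stump weak learners, and all function names are assumptions chosen for the example. Each round evaluates the negative gradient of the cost at the training points, picks the weak hypothesis most aligned with it, and takes a step in that direction.

```python
import math

def stump_predict(stump, x):
    """A stump predicts `polarity` when x >= threshold, otherwise -polarity."""
    threshold, polarity = stump
    return polarity if x >= threshold else -polarity

def logistic_cost(margins):
    """Convex cost: average of log(1 + exp(-margin)) over the training set."""
    return sum(math.log1p(math.exp(-m)) for m in margins) / len(margins)

def negative_slope(margin):
    """-d/dm log(1 + exp(-m)) = 1 / (1 + exp(m)); large for badly fit points."""
    return 1.0 / (1.0 + math.exp(margin))

def functional_gradient_descent(xs, ys, rounds, step=0.5):
    """Labels must be +1 or -1. Returns (ensemble, cost_history)."""
    stumps = [(t, p) for t in sorted(set(xs)) for p in (1, -1)]
    scores = [0.0] * len(xs)             # current F(x_i), starting from F = 0
    ensemble = []
    history = [logistic_cost([y * f for y, f in zip(ys, scores)])]
    for _ in range(rounds):
        # Negative gradient of the cost with respect to each F(x_i),
        # up to the constant factor 1/n.
        direction = [y * negative_slope(y * f) for y, f in zip(ys, scores)]
        # Weak hypothesis with the largest inner product with that direction.
        best = max(stumps, key=lambda s: sum(
            d * stump_predict(s, x) for d, x in zip(direction, xs)))
        ensemble.append((step, best))
        scores = [f + step * stump_predict(best, x)
                  for f, x in zip(scores, xs)]
        history.append(logistic_cost([y * f for y, f in zip(ys, scores)]))
    return ensemble, history

# Toy data (hypothetical), labels in {+1, -1}.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble, history = functional_gradient_descent(xs, ys, rounds=5)
print([round(c, 4) for c in history])
```

The per-example gradient magnitudes play the role of the example weights in the reweighting description: points with a small or negative margin contribute most to the choice of the next weak hypothesis. A fixed step is used here only for brevity; a line search over the step size is the more usual choice.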

Implementations

  • Orange, a free data mining software suite, module orngEnsemble
  • Weka, a set of machine learning tools that offers various implementations of boosting algorithms such as AdaBoost and LogitBoost
  • R package GBM (Generalized Boosted Regression Models) implements extensions to Freund and Schapire's AdaBoost algorithm and Friedman's gradient boosting machine.

References

Footnotes

  1. Michael Kearns. Thoughts on hypothesis boosting. Unpublished manuscript, 1988.
  2. Robert E. Schapire. The strength of weak learnability. Machine Learning 5(2), pages 197-227, 1990.
  3. Yoav Freund. Boosting a weak learning algorithm by majority. In Proceedings of the Third Annual Workshop on Computational Learning Theory, 1990.
  4. Nir Krause and Yoram Singer. Leveraging the margin more carefully. In Proceedings of the International Conference on Machine Learning (ICML), 2004.
  5. Llew Mason, Jonathan Baxter, Peter Bartlett, and Marcus Frean. Boosting algorithms as gradient descent. In S.A. Solla, T.K. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems 12, pages 512-518. MIT Press, 2000.
  6. Long version published as P.M. Long and R.A. Servedio. Random classification noise defeats all convex potential boosters. Machine Learning 78(3), pages 287-304, 2010.
