Talk:Bootstrap aggregating

From Wikipedia, the free encyclopedia
WikiProject Statistics (Rated Start-class, Mid-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as Start-Class on the quality scale and as Mid-importance on the importance scale.
WikiProject Robotics (Rated Start-class, Mid-importance)
Bootstrap aggregating is within the scope of WikiProject Robotics, which aims to build a comprehensive and detailed guide to Robotics on Wikipedia. If you would like to participate, you can choose to edit this article, or visit the project page (Talk), where you can join the project and see a list of open tasks.
This article has been rated as Start-Class on the project's quality scale and as Mid-importance on the project's importance scale.


A couple of things look wrong with the figure (though it is still good enough to give the general idea of bagging):

1) Far fewer than 100 smoothers are plotted.

2) The red curve does not appear to be an average of the grey curves.

(2) could be a consequence of (1).


Should this page and Bootstrapping_(machine_learning) possibly be combined? They seem to cover very similar content, though the other article is fairly short. Xekno (talk) 04:52, 20 May 2011 (UTC)

Agree, it describes the same technique. Let's replace that article with a redirect; I don't see anything there that needs merging. The only new fact mentioned there is "The error is then estimated by err = 0.632×err_test + 0.368×err_training." It's not clear where that formula comes from. The references don't look useful to me: 1) Efron's paper is about bootstrapping in statistics, not bagging, 2) there is nothing about bagging in the Viola and Jones paper, 3) the external link is dead. -- X7q (talk) 11:02, 20 May 2011 (UTC)
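For context, the quoted formula looks like Efron's ".632 bootstrap" error estimate, which combines the (optimistic) training error with the (pessimistic) out-of-bag bootstrap error. A rough sketch of how such an estimate could be computed, assuming a `model` object with `fit`/`predict` methods (the interface is illustrative, not from any particular library):

```python
import numpy as np

def bootstrap_632_error(model, X, y, n_boot=100, rng=None):
    """Sketch of the .632 bootstrap estimate:
    err = 0.632*err_test + 0.368*err_training."""
    rng = np.random.default_rng(rng)
    n = len(X)
    # training error: fit on all the data, evaluate on the same data
    model.fit(X, y)
    err_training = np.mean(model.predict(X) != y)
    # "test" error: average out-of-bag misclassification over bootstrap resamples
    oob_errs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # sample n cases with replacement
        oob = np.setdiff1d(np.arange(n), idx)   # ~36.8% of cases are left out
        if len(oob) == 0:
            continue
        model.fit(X[idx], y[idx])
        oob_errs.append(np.mean(model.predict(X[oob]) != y[oob]))
    err_test = np.mean(oob_errs)
    return 0.632 * err_test + 0.368 * err_training
```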

Linear models

This page claims that "One particular interesting point about bagging is that, since the method averages several predictors, it is not useful to improve linear models." Is there a reference for this? 12:10, 7 September 2007 (UTC)

I'm pretty sure that if you average a bunch of bootstrapped (linear?) models you will just end up with the same thing as fitting one model to the whole dataset. —Preceding unsigned comment added by (talk) 22:18, 7 November 2007 (UTC)
I am not at all sure, but I would like to see the curve fitted on the full data in the figure! Maybe a simulated dataset would also give a better view of the advantages of bagging? Jeroenemans (talk) 15:11, 6 March 2008 (UTC)
What is the context of the statement? Bagging is not used with traditional linear models but with machine learning approaches (random forests and the like). So what is the point?
I am not sure if this is what the anon poster meant, but my question is this. If I am fitting a GLM with many free parameters (say I'm interested in fitting the first 10 terms in a Taylor expansion of some complicated function), would I benefit from bagging my models? --IlyaV (talk) 05:19, 1 April 2009 (UTC)
Heh, it seems I answered my own question. I ran some simulations, and bagging a GLM gave a bigger error relative to the noiseless data than the original GLM. My examples were generated by adding Gaussian noise to a nonlinear model. I tried very high-order GLMs (10+ coefficients), and bagging never improved the prediction (as measured by MSE relative to the noiseless original distribution). It also seems that as n->Inf the bagged prediction approaches the original GLM. --IlyaV (talk) 18:57, 2 April 2009 (UTC)
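A simulation of this kind is easy to sketch. The toy setup below (sin plus Gaussian noise, degree-5 polynomial least squares standing in for the "GLM") is an arbitrary illustrative choice, not the setup IlyaV used:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: y = sin(x) + Gaussian noise, fit with a degree-5 polynomial
n = 200
x = np.linspace(0, 3, n)
y_true = np.sin(x)
y = y_true + rng.normal(scale=0.3, size=n)
X = np.vander(x, 6)  # polynomial design matrix, columns x^5 .. x^0

def ols_predict(X_train, y_train, X_eval):
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return X_eval @ beta

# single fit on the full dataset
single = ols_predict(X, y, X)

# bagged fit: average 100 bootstrapped OLS fits
preds = []
for _ in range(100):
    idx = rng.integers(0, n, size=n)  # bootstrap resample
    preds.append(ols_predict(X[idx], y[idx], X))
bagged = np.mean(preds, axis=0)

# compare both fits against the noiseless truth
mse_single = np.mean((single - y_true) ** 2)
mse_bagged = np.mean((bagged - y_true) ** 2)
```

In runs of this sketch the bagged fit's MSE is typically close to, and not better than, the single fit's, consistent with the observation above.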


Please explain this: "If n'=n, then for large n the set Di expected to have 63.2% of the unique examples of D, the rest being duplicates." How is the mysterious 63.2% derived? If n' (the size of each Di) equals n (the size of the original training dataset), then Di contains 100% of the training data set! Also, instead of the word "examples" could we use some standard language such as cases or observations? —Preceding unsigned comment added by (talk) 06:13, 20 March 2009 (UTC)

No: Di is drawn from D with replacement, so each draw can repeat an earlier one, and Di need not contain all of D. For any particular example in D, the probability that it is missed in all n' = n draws is (1 - 1/n)^n, which tends to e^-1 ≈ 0.368 for large n; so the expected fraction of unique examples in Di is 1 - e^-1 ≈ 63.2%. If n' >> n then Di will almost surely contain every element of D, but for n' <= n that is very unlikely. Also, "examples" IS standard language when you talk about learning; bagging is, if anything, a learning algorithm. --IlyaV (talk) 16:43, 2 April 2009 (UTC)
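The 63.2% figure can be checked numerically in a few lines:

```python
import math
import numpy as np

# P(a given example is never drawn in n draws with replacement) = (1 - 1/n)^n,
# which tends to e^-1 ≈ 0.368 as n grows, so the expected fraction of unique
# examples in each bootstrap sample Di tends to 1 - e^-1 ≈ 0.632.
n = 10_000
analytic = 1 - (1 - 1 / n) ** n
print(round(analytic, 4))  # ≈ 0.6321

# empirical check with one simulated bootstrap sample
rng = np.random.default_rng(0)
sample = rng.integers(0, n, size=n)
frac_unique = len(np.unique(sample)) / n
print(round(frac_unique, 4))  # typically close to 0.632
```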


Is it worth including a list of implementations? AndrewHZ (talk) 18:44, 4 December 2012 (UTC)

Here is a start of a list:

  • Neural networks with dropout ("Dropout can be seen as an extreme form of bagging in which each model is trained on a single case and each parameter of the model is very strongly regularized by sharing it with the corresponding parameter in all the other models." From Improving neural networks by preventing co-adaptation of feature detectors.)
  • bagEarth in the R package caret
  • R package ipred (Improved Predictors)
  • R package adabag (applies multiclass AdaBoost.M1, AdaBoost-SAMME and bagging)
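To make concrete what packages like these implement, here is a minimal from-scratch sketch of bagging for binary classification; the 1-nearest-neighbour base learner and the helper names are illustrative choices, not taken from any of the listed packages:

```python
import numpy as np

def predict_1nn(X_train, y_train, X_test):
    """Toy base learner: 1-nearest-neighbour classification (L1 distance)."""
    d = np.abs(X_test[:, None, :] - X_train[None, :, :]).sum(axis=-1)
    return y_train[d.argmin(axis=1)]

def bagged_predict(X_train, y_train, X_test, n_models=25, rng=None):
    """Bagging for classification: train the base learner on bootstrap
    resamples and combine the predictions by majority vote."""
    rng = np.random.default_rng(rng)
    n = len(X_train)
    votes = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # bootstrap resample of the training set
        votes.append(predict_1nn(X_train[idx], y_train[idx], X_test))
    votes = np.stack(votes)
    # majority vote per test point (labels assumed to be in {0, 1})
    return (votes.mean(axis=0) > 0.5).astype(int)
```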