# Bootstrap aggregating

Bootstrap aggregating, also called bagging, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special case of the model averaging approach.

## Description of the technique

Given a standard training set $D$ of size $n$, bagging generates $m$ new training sets $D_{i}$, each of size $n'$, by sampling from $D$ uniformly and with replacement. By sampling with replacement, some observations may be repeated in each $D_{i}$. If $n' = n$, then for large $n$ the set $D_{i}$ is expected to contain the fraction $(1 - 1/e)$ (≈63.2%) of the unique examples of $D$, the rest being duplicates. This kind of sample is known as a bootstrap sample. Sampling with replacement makes each bootstrap sample independent of the others, since no draw depends on any previous draw. The $m$ models are then fitted using the $m$ bootstrap samples and combined by averaging the outputs (for regression) or voting (for classification).
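The ≈63.2% figure can be checked empirically. The sketch below (plain Python; the helper name `bootstrap_sample` is illustrative, not from the source) draws one bootstrap sample with $n' = n$ and measures the fraction of unique examples that survive:

```python
import random

def bootstrap_sample(data, size=None):
    """Draw a bootstrap sample: `size` items chosen uniformly with replacement."""
    size = len(data) if size is None else size
    return [random.choice(data) for _ in range(size)]

random.seed(0)
D = list(range(10_000))       # original training set, n = 10,000
sample = bootstrap_sample(D)  # n' = n
unique_fraction = len(set(sample)) / len(D)
print(round(unique_fraction, 3))  # close to 1 - 1/e ≈ 0.632
```

For large $n$ the printed fraction clusters tightly around $1 - 1/e$, matching the expectation above.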

Bagging leads to "improvements for unstable procedures", which include, for example, artificial neural networks, classification and regression trees, and subset selection in linear regression. Bagging was shown to improve preimage learning. On the other hand, it can mildly degrade the performance of stable methods such as K-nearest neighbors.

## Process of the Algorithm

### Original dataset

The original dataset contains several entries of samples, s1 through s5. Each sample has 5 features (Gene 1 to Gene 5). All samples are labeled as Yes or No for a classification problem.

### Creation of Bootstrapped datasets

Given the table above, to classify a new sample a bootstrapped dataset must first be created using the data from the original dataset. This bootstrapped dataset is typically the size of the original dataset, or smaller.

In this example, the size is 5 (s1 through s5). The bootstrapped dataset is created by randomly selecting samples from the original dataset; repeat selections are allowed. Any samples that are not chosen for the bootstrapped dataset are placed in a separate dataset called the out-of-bag dataset.

See an example bootstrapped dataset below. It has 5 entries (the same size as the original dataset). There are duplicated entries, such as two copies of s3, since the entries are selected randomly with replacement.

This step is repeated to generate $m$ bootstrapped datasets.
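The bootstrapping step above can be sketched in a few lines of Python. This is a minimal illustration: the s1–s5 names mirror the running example, and the helper name is hypothetical:

```python
import random

def bootstrap_with_oob(samples):
    """Return one bootstrapped dataset and its out-of-bag complement."""
    boot = [random.choice(samples) for _ in samples]  # with replacement
    oob = [s for s in samples if s not in boot]       # never selected
    return boot, oob

random.seed(1)
original = ["s1", "s2", "s3", "s4", "s5"]
m = 3  # number of bootstrapped datasets to generate
for _ in range(m):
    boot, oob = bootstrap_with_oob(original)
    print("bootstrapped:", boot, "| out-of-bag:", oob)
```

Each printed line shows one bootstrapped dataset of the same size as the original (duplicates allowed) alongside the samples that fell out-of-bag for that draw.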

### Creation of Decision Trees

A Decision tree is created for each Bootstrapped dataset using randomly selected column values to split the nodes.

### Predicting using Multiple Decision Trees

When a new sample is added to the table, the bootstrapped datasets are used to determine the new entry's classifier value.

The new sample is tested in the random forest created from the bootstrapped datasets, and each tree produces a classifier value for the new sample. For classification, a process called voting determines the final result: the value produced most frequently by the random forest is the result assigned to the sample. For regression, the sample is assigned the average of the classifier values produced by the trees.

After the sample is tested in the random forest, a classifier value is assigned to the sample and it is added to the table.
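The two aggregation rules just described can be written compactly. A minimal sketch (function names are illustrative, not from the source):

```python
from collections import Counter

def aggregate_classification(votes):
    """Voting: the label predicted most often by the trees wins."""
    return Counter(votes).most_common(1)[0][0]

def aggregate_regression(predictions):
    """Averaging: the sample is assigned the mean of the trees' outputs."""
    return sum(predictions) / len(predictions)

print(aggregate_classification(["Yes", "No", "Yes", "Yes"]))  # Yes
print(aggregate_regression([2.0, 3.0, 4.0]))                  # 3.0
```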

## Algorithm (Classification)

For classification, use a training set $D$, an inducer $I$ and the number of bootstrap samples $m$ as input. Generate a classifier $C^{*}$ as output.

1. Create $m$ new training sets $D_{i}$ from $D$ by sampling with replacement
2. Build a classifier $C_{i}$ from each set $D_{i}$ using the inducer $I$
3. Finally, generate the classifier $C^{*}$ from the previously created set of classifiers $C_{i}$ applied to the original data set $D$: the classification predicted most often by the sub-classifiers $C_{i}$ is the final classification
```
for i = 1 to m {
    D' = bootstrap sample from D    (sample with replacement)
    Ci = I(D')
}
C*(x) = argmax_{y ∈ Y} Σ_{i : Ci(x) = y} 1    (most often predicted label y)
```
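The pseudocode above translates directly into runnable Python. The stump inducer below is a hypothetical toy stand-in for $I$ (a one-split decision stump on a single numeric feature); any learner with the same interface would do:

```python
import random
from collections import Counter

def bagging_fit(D, inducer, m):
    """Fit m classifiers, each on a bootstrap sample D' of D."""
    classifiers = []
    for _ in range(m):
        D_prime = [random.choice(D) for _ in D]  # sample with replacement
        classifiers.append(inducer(D_prime))
    return classifiers

def bagging_predict(classifiers, x):
    """C*(x): the label predicted most often by the sub-classifiers Ci."""
    votes = [C(x) for C in classifiers]
    return Counter(votes).most_common(1)[0][0]

def stump_inducer(data):
    """Hypothetical toy inducer I: split (feature, label) pairs at the mean."""
    thr = sum(x for x, _ in data) / len(data)
    overall = Counter(y for _, y in data).most_common(1)[0][0]
    def majority(side):
        c = Counter(y for x, y in data if side(x))
        return c.most_common(1)[0][0] if c else overall  # fall back if empty
    left = majority(lambda x: x <= thr)
    right = majority(lambda x: x > thr)
    return lambda x: left if x <= thr else right

random.seed(42)
D = [(x, "No") for x in range(10)] + [(x, "Yes") for x in range(10, 20)]
ensemble = bagging_fit(D, stump_inducer, m=25)
print(bagging_predict(ensemble, 15.0), bagging_predict(ensemble, 3.0))
```

Each stump individually is a weak, unstable learner; the majority vote over 25 bootstrap-trained stumps recovers the Yes/No boundary reliably.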

## Example: Ozone data

To illustrate the basic principles of bagging, below is an analysis on the relationship between ozone and temperature (data from Rousseeuw and Leroy (1986), analysis done in R).

The relationship between temperature and ozone appears to be nonlinear in this data set, based on the scatter plot. To mathematically describe this relationship, LOESS smoothers (with bandwidth 0.5) are used. Rather than building a single smoother for the complete data set, 100 bootstrap samples were drawn. Each sample was drawn with replacement from the original data and maintains a semblance of the master set's distribution and variability. For each bootstrap sample, a LOESS smoother was fit, and predictions from these 100 smoothers were then made across the range of the data. The black lines represent these initial predictions. The lines lack agreement in their predictions and tend to overfit their data points, as is evident from their wobbly flow.

By taking the average of 100 smoothers, each corresponding to a subset of the original data set, we arrive at one bagged predictor (red line). The red line's flow is stable and does not overly conform to any data point(s).
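The same bag-of-smoothers procedure can be sketched in Python. This is an illustration only: the data below are synthetic (not the Rousseeuw and Leroy ozone data), and an ordinary polynomial fit stands in for R's LOESS smoother:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the ozone-vs-temperature data (assumed shape).
temp = np.linspace(60, 100, 80)
ozone = 0.05 * (temp - 60) ** 2 + rng.normal(0, 10, temp.size)

grid = np.linspace(60, 100, 200)  # where the smoothers are evaluated
preds = []
for _ in range(100):  # 100 bootstrap samples, one smoother each
    idx = rng.integers(0, temp.size, temp.size)  # sample with replacement
    coeffs = np.polyfit(temp[idx], ozone[idx], deg=3)
    preds.append(np.polyval(coeffs, grid))       # one "black line"

bagged = np.mean(preds, axis=0)  # the "red line": average of all smoothers
```

The individual curves in `preds` wobble with their bootstrap samples; their pointwise mean `bagged` is the stable bagged predictor.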

• Many weak learners aggregated together typically outperform a single learner over the entire set, and overfit less
• Reduces variance in high-variance, low-bias data sets
• Can be performed in parallel, as each separate bootstrap sample can be processed on its own before combination