*3.4. Bagging and Boosting*

Bagging, or bootstrap aggregation, proposed by [38], is a technique used with classification and regression methods to decrease variance and improve prediction accuracy. It is a simple technique in which several bootstrap samples are drawn from the input data and the same prediction method is applied to each sample. The results are merged by averaging (regression) or simple voting (classification), yielding a prediction comparable to applying the same method to the original data directly, but with reduced variance. Every bootstrap sample has the same size as the original data. The sampling is done with replacement, so some instances are repeated and some are omitted. The stability of the base classifier trained on each bootstrap sample essentially determines the performance of bagging. Since all samples contribute equally to the aggregate, bagging resists overfitting and works well with noisy data, because the focus on any specific sample of the training data is removed.
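
To make the procedure concrete, the sketch below implements bagging by majority vote; it is an illustration only, not the specific method of [38]. The choice of scikit-learn decision trees as the base classifier, the number of bootstrap samples, and integer-encoded class labels in NumPy arrays are all assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X_train, y_train, X_test, n_estimators=25, seed=0):
    """Bagging by simple voting. Assumes NumPy arrays and integer class labels."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = []
    for _ in range(n_estimators):
        # Bootstrap sample: same size as the original data, drawn with
        # replacement, so some instances repeat and others are omitted.
        idx = rng.integers(0, n, size=n)
        tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
        votes.append(tree.predict(X_test))
    votes = np.stack(votes)  # shape: (n_estimators, n_test)
    # Majority vote across the ensemble; for regression, average instead.
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```

For a regression task, the final line would be replaced by `votes.mean(axis=0)`, mirroring the averaging described above.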

Boosting, like bagging, is a sample-based approach for improving the accuracy of classification and regression models; however, unlike bagging, which directly averages the individual sample results, boosting uses a weighted average to reduce the overall prediction variance. All samples are initialized with equal weights, and the weights are updated after each boosting round: the weights of samples that are harder to classify are increased, while the weights of correctly classified samples are decreased. This forces the boosting algorithm to focus on the harder samples as the iterations progress. The base classifiers from all boosting rounds are aggregated to obtain the final ensemble classification. The fundamentals of bagging and boosting can be found in [39].
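
The sketch below illustrates this weight-update loop with discrete AdaBoost, one common boosting scheme; since the text describes boosting generally, the specific update rule, the decision-stump base learner, and the ±1 label encoding are assumptions chosen for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """Discrete AdaBoost sketch. Assumes labels encoded as -1/+1."""
    n = len(X)
    w = np.full(n, 1.0 / n)  # all samples start with equal weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this base classifier
        # Increase weights of misclassified samples and decrease the rest,
        # so later rounds focus on the samples that are harder to classify.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # Final ensemble: weighted vote of all base classifiers.
    return np.sign(sum(a * s.predict(X) for a, s in zip(alphas, stumps)))
```

The exponential reweighting is what distinguishes this from bagging: each round's training distribution depends on the previous round's mistakes, and the final prediction weights each base classifier by its accuracy rather than counting all votes equally.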
