*4.4. ID3 with Bagging and Adaptive Boosting*

Whereas the CART algorithm uses Gini impurity to form splits in the data set, the ID3 decision tree uses information gain, computed from entropy. Implementing the ID3 algorithm on our data set also yields a tree with four levels (see Figure 9).
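As a reminder of the split criterion, the sketch below computes entropy and the information gain of a binary threshold split in Python. It is a minimal illustration of the ID3 criterion only; the function names, the guard for degenerate splits, and the example threshold on a precipitation-like feature are our own assumptions, not taken from the study's implementation.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H(S) = -sum(p_i * log2(p_i)) over class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, feature_values, threshold):
    """Entropy reduction from splitting samples on feature_values <= threshold."""
    left = labels[feature_values <= threshold]
    right = labels[feature_values > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0  # degenerate split: no information gained
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

# Illustrative use: ID3 picks the feature/threshold with the highest gain,
# e.g. scanning candidate thresholds on a precipitation column "p".
# best = max(candidate_thresholds, key=lambda t: information_gain(y, p, t))
```

ID3 evaluates this gain for every candidate split and greedily selects the one that most reduces the remaining entropy, which is why the most informative variable ends up at the root.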

**Figure 9.** The Decision Tree developed using the ID3 algorithm.

Again, since "p" is the most significant variable, the root node is split on the "Precipitation" input variable. The decision nodes at levels 3 and 4 are split so as to maximize information gain, i.e., to minimize the remaining entropy (uncertainty). At level 4, the ID3 decision tree has classified the dataset into the four pre-defined classes. As with the CART results, the class with the highest number of samples is "Fishing," followed by "BCR," "Dangerous," and "Domestic." The accuracies obtained for the training data set and the overall dataset are 61.78% and 61.77%, respectively, slightly lower than those obtained by the CART algorithm.

From Table 2, we can see that ID3 with adaptive boosting yields results comparable to those of CART with adaptive boosting. AdaBoost is an iterative procedure that, unlike bagging, reweights the training samples rather than resampling them with replacement. It builds a strong ensemble classifier by assigning higher weights to misclassified samples and lower weights to correctly classified ones at each iteration, reducing bias and variance in the model. For this reason it is often called the "best out-of-the-box classifier." Among all the decision tree models we attempted, this method combined with ID3 obtains the second-best testing accuracy.

Table 2 also shows the results for ID3 with bagging, which creates many independent bootstrap samples of the training data, fits a weak learner to each, and aggregates them into an averaged ensemble prediction with lower variance. This method likewise yields the second-best testing accuracy among the decision tree models. Overall, both the bagging and boosting ensemble methods improve the testing accuracy compared to simple ID3.
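To make the ensemble comparison concrete, the following sketch fits both AdaBoost and bagging ensembles around an entropy-criterion tree using scikit-learn. Note that scikit-learn does not ship a literal ID3 implementation; a `DecisionTreeClassifier` with `criterion="entropy"` is the usual stand-in. The synthetic four-class data, tree depth, and estimator counts are illustrative assumptions, not the paper's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the water-quality data: 4 classes, a few features.
X, y = make_classification(n_samples=2000, n_features=6, n_informative=4,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Entropy-criterion tree as an ID3-style weak learner (depth is an assumption).
base = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)

# AdaBoost: sequentially reweights misclassified samples (no resampling).
boosted = AdaBoostClassifier(estimator=base, n_estimators=50, random_state=0)

# Bagging: independent bootstrap samples (drawn with replacement), aggregated.
bagged = BaggingClassifier(estimator=base, n_estimators=50, random_state=0)

for name, model in [("AdaBoost", boosted), ("Bagging", bagged)]:
    model.fit(X_train, y_train)
    print(name, "test accuracy:", model.score(X_test, y_test))
```

The design difference mirrors the discussion above: bagging trains its weak learners independently on bootstrap replicates and averages them to cut variance, whereas AdaBoost trains them sequentially on reweighted versions of the same data, so each new tree concentrates on the samples its predecessors misclassified.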
