**5. Conclusions**

This study used Decision Tree (DT) algorithms such as CART and ID3, Random Forest (RF), and ensemble methods such as bagging and boosting to classify and predict Fecal Coliform (FC) into four classes. At the root node, the ID3 model selects the variable with the maximum information gain or gain ratio, and the CART model selects the variable giving the largest reduction in Gini impurity; the same criteria are applied further down the tree until the leaf nodes are reached, with the aim of maximizing classification accuracy. The algorithms perform comparably well, with Random Forest being the most consistent overall in classifying Fecal Coliform for the Upper Green River watershed. It performs better than CART and ID3 in all phases, i.e., training, testing, and overall. Gradient Boosting and Extremely Randomized Trees are the other DT algorithms that show accuracies comparable to those of Random Forest in the training and testing phases. The CART decision tree with Adaptive Boosting yielded the best testing accuracy, while CART with Bagging, ID3 with Bagging, and ID3 with Adaptive Boosting yielded comparable second-best testing accuracies among all the decision tree modeling attempts. There is no guarantee that the different DT algorithms will choose exactly the same feature or attribute at each node of the resulting tree. There is also no guarantee that the proposed classifier will achieve higher classification accuracy elsewhere; it still needs to be tested on a variety of water quality parameters from different watersheds under changing climate conditions. Also, being greedy at each step/node may not ensure overall minimization of entropy or global optimization of the classification process. In the present work, the authors have focused only on the classification capabilities of the Decision Trees for this particular watershed/dataset, and only in the training and testing phases.
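The split criteria referred to above can be illustrated with a minimal sketch. It is not the authors' implementation: the two categorical features (`season`, `flow`), the four class labels, and all function names are hypothetical, chosen only to show how a greedy algorithm picks the root variable by the largest reduction in entropy (information gain, as in ID3) or in Gini impurity (as in CART).

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(rows, labels, feature):
    """Group the labels by the value each row takes for the given feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    return list(groups.values())


def impurity_reduction(rows, labels, feature, impurity):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(labels)
    children = partition(rows, labels, feature)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(labels) - weighted


def best_feature(rows, labels, impurity):
    """Greedy choice: index of the feature with the largest impurity reduction."""
    return max(
        range(len(rows[0])),
        key=lambda f: impurity_reduction(rows, labels, f, impurity),
    )


# Hypothetical toy data: (season, flow) -> FC class "A".."D" (assumed, not from the study).
rows = [
    ("wet", "high"), ("wet", "low"), ("wet", "high"), ("wet", "low"),
    ("dry", "high"), ("dry", "low"), ("dry", "high"), ("dry", "low"),
]
labels = ["D", "D", "C", "C", "B", "B", "A", "A"]

for name, impurity in (("entropy", entropy), ("gini", gini)):
    root = best_feature(rows, labels, impurity)
    gain = impurity_reduction(rows, labels, root, impurity)
    print(f"{name}: root feature index = {root}, reduction = {gain:.3f}")
```

In this toy example both criteria select `season` (index 0), because `flow` leaves the class mix in each child unchanged. The choice is made one node at a time, which is why such a greedy procedure does not guarantee a globally optimal tree.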
The limited size of the dataset was one constraint, because of which the authors could not perform cross-validation. However, the depth of the successful trees is essentially governed by maximizing the information gain or minimizing the entropy, i.e., randomness, at every level. Shorter trees are generally found to classify better than deeper trees [52] (Mitchell, 1997). This is because shorter trees are less prone to overtraining, which leads to better generalization and more successful predictions. From the above discussion of results, the following salient conclusions can be drawn:


**Author Contributions:** A.H.: methodology, software, validation, writing (original draft preparation), visualization; J.A.: conceptualization, methodology, software, validation, formal analysis, investigation, writing (review and editing), visualization, supervision, project administration, funding acquisition. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Council of Scientific and Industrial Research (CSIR), India, grant No. 24 (0356)/19/EMR-II, under the project titled "Experimental and Computational studies of Surface Water Quality parameters from Morphometry and Spectral Characteristics".

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data is owned by Upper Green Biological Preserve, Department of Biology, Western Kentucky University, Bowling Green, Kentucky, USA.

**Acknowledgments:** The corresponding author gratefully acknowledges the Council of Scientific and Industrial Research (CSIR), India, for grant No. 24 (0356)/19/EMR-II under the project titled "Experimental and Computational studies of Surface Water Quality parameters from Morphometry and Spectral Characteristics." The authors would like to thank Ouida Meier, Albert Meier, Stuart Foster, Tim Rink, and Jenna Harbaugh for providing the required data. The authors would like to thank Vamsi Krishna Sridharan, Institute of Marine Sciences, University of California, Santa Cruz, California, USA, for a detailed review of and comments on the manuscript. The authors would also like to thank M.M. Prakash Mohan and N. Satish, Research Scholars, Department of Civil Engineering, Birla Institute of Technology and Science, Pilani, Hyderabad Campus, for their help with a couple of figures.

**Conflicts of Interest:** The authors declare no conflict of interest.
