*3.4. Performance Evaluation of ML Models*

In the current study, two model efficiency statistics, namely, the root mean square error (RMSE) and the coefficient of determination (R<sup>2</sup>), were utilized to evaluate the goodness of fit between the predictions and observations. RMSE measures the deviation between the observed and predicted values, and R<sup>2</sup> measures the degree of correlation between the observed and predicted data [30].

$$\text{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \left(O_{i} - P_{i}\right)^{2}}{n}} \tag{3}$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{n} \left(O_{i} - P_{i}\right)^{2}}{\sum_{i=1}^{n} \left(O_{i} - \overline{O}\right)^{2}} \tag{4}$$

where n is the total number of predicted values, O<sub>i</sub> is the observed value, $\overline{O}$ is the mean of the observed values, and P<sub>i</sub> is the predicted value.
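The two efficiency statistics in Eqs. (3) and (4) can be sketched directly in code. The following is a minimal stdlib-only implementation (the function names `rmse` and `r_squared` are illustrative, not from the original study):

```python
import math

def rmse(obs, pred):
    """Root mean square error between observed and predicted values, Eq. (3)."""
    n = len(obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / n)

def r_squared(obs, pred):
    """Coefficient of determination, Eq. (4)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))   # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)          # total sum of squares
    return 1.0 - ss_res / ss_tot
```

A perfect prediction yields RMSE = 0 and R<sup>2</sup> = 1, while predicting the observed mean for every sample yields R<sup>2</sup> = 0, which is why values close to 1 in Table 4 indicate a good fit.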

#### **4. Results and Discussion**

#### *4.1. Performance Evaluation of Boosting-Based Models*

Table 4 exhibits the model performance of the boosting-based algorithms during the testing process. The results showed that, under the S1–S10 scenarios, AdaBoost-S2 (R<sup>2</sup> = 0.973 and RMSE = 0.175) had the highest performance in predicting WQI among the AdaBoost models, GBM-S7 (R<sup>2</sup> = 0.989 and RMSE = 0.108) among the GBM models, HGBM-S2 (R<sup>2</sup> = 0.967 and RMSE = 0.183) among the HGBM models, LightGBM-S6 (R<sup>2</sup> = 0.986 and RMSE = 0.119) among the LightGBM models, and XGBoost-S9 (R<sup>2</sup> = 0.989 and RMSE = 0.107) among the XGBoost models. Additionally, comparison plots of the measured WQI values with those predicted by AdaBoost-S2, GBM-S7, HGBM-S2, LightGBM-S6, and XGBoost-S9 in the testing period are shown in Figure 2. Generally, these models replicated the measured WQI very well during the testing period. However, there are small discrepancies between the measured and predicted high and low WQI values (especially those of AdaBoost-S2 and HGBM-S2). On the whole, the comparison between the boosting-based models under the S1–S10 scenarios demonstrates that XGBoost-S9 is the best-performing model.
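The per-model scenario selection described above amounts to picking, for each algorithm, the scenario with the lowest testing RMSE. A minimal sketch of that selection step, using the AdaBoost RMSE row from Table 4 as input (the function name `best_scenario` is illustrative):

```python
# AdaBoost testing RMSE under scenarios S1-S10, taken from Table 4.
adaboost_rmse = {
    "S1": 0.550, "S2": 0.175, "S3": 0.211, "S4": 0.205, "S5": 0.205,
    "S6": 0.207, "S7": 0.212, "S8": 0.212, "S9": 0.221, "S10": 0.219,
}

def best_scenario(rmse_by_scenario):
    """Return the scenario label with the lowest testing RMSE."""
    return min(rmse_by_scenario, key=rmse_by_scenario.get)

print(best_scenario(adaboost_rmse))  # prints "S2", matching AdaBoost-S2 above
```

The same selection, run per algorithm on the full Table 4, recovers the best-scenario variants (AdaBoost-S2, GBM-S7, HGBM-S2, LightGBM-S6, XGBoost-S9) compared in Figure 2.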


*Water* **2022**, *14*, x FOR PEER REVIEW 7 of 12

**Table 4.** Efficiency statistics of the 12 ML models under the 10 scenarios of input variable combinations during the testing process.

| Models | Statistic | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AdaBoost | RMSE | 0.550 | 0.175 | 0.211 | 0.205 | 0.205 | 0.207 | 0.212 | 0.212 | 0.221 | 0.219 |
| | R<sup>2</sup> | 0.690 | 0.973 | 0.959 | 0.960 | 0.962 | 0.964 | 0.960 | 0.961 | 0.955 | 0.958 |


**Figure 2.** Temporal variation in the observed and predicted WQI values for the best performance models using boosting-based algorithms during the testing period. (**a**) AdaBoost-S2. (**b**) GBM-S7. (**c**) HGBM-S2. (**d**) LightGBM-S6. (**e**) XGBoost-S9.

#### *4.2. Performance Evaluation of Decision Tree-Based Models*

Table 4 also presents the model performance of the decision tree-based algorithms during the testing process. The results indicated that DT-S5 (R<sup>2</sup> = 0.979 and RMSE = 0.147), ExT-S5 (R<sup>2</sup> = 0.985 and RMSE = 0.126), and RF-S5 (R<sup>2</sup> = 0.986 and RMSE = 0.121) had the highest performance in predicting WQI among the DT, ExT, and RF models under the S1–S10 scenarios, respectively. Figure 3 displays the comparisons between the predicted and measured WQI for the DT-S5, ExT-S5, and RF-S5 models during the testing period. In general, all three models reproduced the measured WQI well, with only small differences between the measured and predicted high and low WQI values. Among the decision tree-based models, RF-S5 produced the most accurate predictions.

**Figure 3.** Temporal variation in the observed and predicted WQI values for the best performance models using decision tree-based algorithms during the testing period. (**a**) DT-S5. (**b**) ExT-S5. (**c**) RF-S5.


#### *4.3. Performance Evaluation of ANN-Based Models*


According to the model performance of the ANN-based algorithms during the testing period (Table 4), MLP-S4 (R<sup>2</sup> = 0.984 and RMSE = 0.132), RBF-S2 (R<sup>2</sup> = 0.887 and RMSE = 0.360), DFNN-S2 (R<sup>2</sup> = 0.973 and RMSE = 0.162), and CNN-S7 (R<sup>2</sup> = 0.982 and RMSE = 0.139) are the best models for predicting WQI among the MLP, RBF, DFNN, and CNN models under the S1–S10 scenarios, respectively. Figure 4 illustrates the comparisons between the predicted and measured WQI for the MLP-S4, RBF-S2, DFNN-S2, and CNN-S7 models during the testing period. Generally, these four models reproduced the measured WQI well. Moreover, only small differences between the measured and predicted high and low WQI values can be observed for most models, except for RBF-S2, which shows a considerable discrepancy. Among the ANN-based models, MLP-S4 produced the most accurate predictions (R<sup>2</sup> = 0.984 and RMSE = 0.132).

#### *4.4. Discussion*

A comparison of twelve ML models, including five boosting-based algorithms (AdaBoost, GBM, HGBM, LightGBM, and XGBoost), three decision tree-based algorithms (DT, ExT, and RF), and four ANN-based algorithms (MLP, RBF, DFNN, and CNN), was conducted to evaluate their performance in predicting the WQI based on the model efficiency statistics. Our findings indicate that all twelve ML models could predict the WQI well for this study area, but the best scenario of input variables differs between models. This can be explained by the fact that each ML algorithm responds differently to different input variables and data patterns [31]. As reported by Morton and Henderson [32] and Yang and Moyer [33], water quality data are characterized by a nonlinear distribution. In general, AdaBoost, HGBM, RBF, and DFNN achieved good results under the S2 scenario of the input variables; DT, ExT, and RF achieved good results under the S5 scenario; and GBM and CNN achieved good results under the S7 scenario. In addition, MLP, LightGBM, and XGBoost performed well in Scenarios S4, S6, and S9, respectively. These findings indicate that the most accurate prediction depends on the ML model parameters for a given scenario of input variables, which is consistent with the results of Hussain and Khan [31].

A comparison of all twelve ML models indicated that the XGBoost model outperforms the other ML models in the study area. In comparison with other studies, DFNN performed better than XGBoost, MLP, and RF in the Mahanadi River Basin in India [5]. Asadollah et al. [4] indicated that ExT is superior to DT and support vector regression (SVR) in the Lam Tsuen River in Hong Kong. Moreover, DT performed better than the MLP model in the Rawal Dam lake in Pakistan [14]. In general, different ML algorithms yield different performance when applied to different regions. Therefore, exploring and developing a generalized ML model for water quality assessment remains an ongoing challenge.

As stated in previous studies, an important gap is the lack of consideration of cross-influences between the explanatory variables, namely, the cross-correlation between land-use classes and between climate conditions in influencing river water quality [34–36]. Land-use change and climate change affect hydrological components, and consequently river discharge and pollutant transport [21]. Therefore, it is essential to take land-use and climate changes into account, which may improve the accuracy of the ML models.
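One simple way to screen for such cross-influences before model training is to compute pairwise Pearson correlations between candidate explanatory variables. The following is a minimal stdlib-only sketch; the land-use series shown are hypothetical illustration data, not values from this study:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical example: urban share grows exactly as forest share shrinks,
# so the two land-use classes are perfectly anti-correlated.
urban = [0.10, 0.15, 0.20, 0.25]
forest = [0.60, 0.55, 0.50, 0.45]
print(pearson(urban, forest))  # ≈ -1.0
```

Strongly correlated pairs flagged this way could then be handled jointly (or one member dropped) when constructing the input scenarios, rather than treating each explanatory variable as independent.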

### **5. Conclusions**

This research work was conducted to investigate the capability of twelve ML models, namely, five boosting-based algorithms (AdaBoost, GBM, HGBM, LightGBM, and XGBoost), three decision tree-based algorithms (DT, ExT, and RF), and four ANN-based algorithms (MLP, RBF, DFNN, and CNN), in predicting the WQI. The four WQ monitoring stations along the La Buong River were considered as a case study. Two model efficiency statistics (i.e., R<sup>2</sup> and RMSE) were chosen for performance comparison of the different ML models. XGBoost achieved an R<sup>2</sup> of 0.989 and RMSE of 0.107 in the testing process, making it the most appropriate ML algorithm for the study area. It was followed by GBM, LightGBM, RF, ExT, MLP, CNN, DT, DFNN, AdaBoost, HGBM, and RBF. Generally, our findings strengthen the argument that ML models, particularly XGBoost, can be utilized for predicting the WQI with a high degree of accuracy, which will further improve water quality management.

**Author Contributions:** Conceptualization, D.N.K.; methodology, D.N.K. and N.T.Q.; software, N.T.Q. and N.T.D.T.; validation, N.T.Q. and N.T.D.T.; formal analysis, N.T.Q. and N.T.D.T.; data curation, D.Q.L. and P.T.T.N.; writing—original draft preparation, D.N.K., N.T.Q., P.T.T.N., D.Q.L. and N.T.D.T.; writing—review and editing, D.N.K., N.T.Q., P.T.T.N., D.Q.L. and N.T.D.T.; visualization, P.T.T.N. and D.Q.L.; supervision, D.N.K.; project administration, D.N.K.; funding acquisition, D.N.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** The research was supported by the Department of Science and Technology of Ho Chi Minh City, managed by Institute for Computational Science and Technology under the contract number 11/2020/HÐ-QPTKHCN.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** We would like to thank the Institute for Computational Science and Technology for supporting us to complete this research.

**Conflicts of Interest:** The authors declare no conflict of interest.
