*Article* **Using Machine Learning Models for Predicting the Water Quality Index in the La Buong River, Vietnam**

**Dao Nguyen Khoi <sup>1,2,</sup>\*, Nguyen Trong Quan <sup>3</sup>, Do Quang Linh <sup>4</sup>, Pham Thi Thao Nhi <sup>3</sup> and Nguyen Thi Diem Thuy <sup>1,2</sup>**


**Abstract:** Effective management of water quantity and quality requires estimating the pollution level of existing surface water. This case study evaluates the performance of twelve machine learning (ML) models in estimating the surface water quality of the La Buong River in Vietnam: five boosting-based algorithms (adaptive boosting, gradient boosting, histogram-based gradient boosting, light gradient boosting, and extreme gradient boosting), three decision tree-based algorithms (decision tree, extra trees, and random forest), and four artificial neural network (ANN)-based algorithms (multilayer perceptron, radial basis function, deep feed-forward neural network, and convolutional neural network). Water quality data from four monitoring stations along the La Buong River for the period 2010–2017 were used to calculate the water quality index (WQI). The prediction performance of the ML models was evaluated using two efficiency statistics (i.e., R<sup>2</sup> and RMSE). The results indicated that all twelve ML models performed well in predicting the WQI, with extreme gradient boosting (XGBoost) achieving the highest accuracy (R<sup>2</sup> = 0.989 and RMSE = 0.107). The findings support the argument that ML models, especially XGBoost, can be employed for WQI prediction with a high level of accuracy, which can in turn help improve water quality management.

**Keywords:** La Buong River; machine learning algorithms; surface water quality; water quality index (WQI)
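
The two efficiency statistics named in the abstract can be illustrated with a minimal sketch. It assumes that R<sup>2</sup> denotes the coefficient of determination (1 minus the ratio of residual to total sum of squares) and that RMSE is the root mean square error; the abstract does not state which R<sup>2</sup> definition was used. The function names and the sample WQI values below are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    squared_errors = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_errors / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical WQI values for illustration only (not data from the study).
observed_wqi = [80.0, 75.0, 90.0, 60.0]
predicted_wqi = [78.0, 77.0, 88.0, 62.0]

print("RMSE:", rmse(observed_wqi, predicted_wqi))
print("R^2:", r_squared(observed_wqi, predicted_wqi))
```

A lower RMSE and an R<sup>2</sup> closer to 1 both indicate closer agreement between predicted and observed WQI, which is the sense in which the abstract ranks the twelve models.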
