*Article* **Water Quality Evaluation and Pollution Source Apportionment of Surface Water in a Major City in Southeast China Using Multi-Statistical Analyses and Machine Learning Models**

**Yu Zhou <sup>1</sup>, Xinmin Wang <sup>1</sup>, Weiying Li <sup>2,3,</sup>\*, Shuyun Zhou <sup>4</sup> and Laizhu Jiang <sup>5</sup>**


**Abstract:** The comprehensive evaluation of water quality and the identification of potential pollution sources have become active research topics. In this study, 14 water quality parameters were measured monthly for 10 years (2011–2020) at four monitoring stations on the M River of a city in southeast China. Multiple statistical methods, the water quality index (WQI) model, machine learning (ML), and positive matrix factorisation (PMF) models were used to assess the overall condition of the river, select crucial water quality parameters, and identify potential pollution sources. The average WQI values of the four sites ranged from 68.31 to 77.16, with a clear trend of deterioration from upstream to downstream. A random forest-based WQI model (WQIRF model) was developed, which identified Mn, Fe, faecal coliform, dissolved oxygen, and total nitrogen as the five most important water quality parameters. Based on the results of the WQIRF and PMF models, the contributions of potential pollution sources to the variation in the WQI values were quantitatively assessed and ranked. These findings demonstrate the effectiveness of ML in evaluating water quality and improve our understanding of surface water quality, thus providing support for the formulation of water quality management strategies.

**Keywords:** water quality index (WQI); machine learning; parameter selection; positive matrix factorisation (PMF); source apportionment
