Using Real-Time Data and Unsupervised Machine Learning Techniques to Study Large-Scale Spatio–Temporal Characteristics of Wastewater Discharges and their Influence on Surface Water Quality in the Yangtze River Basin
Abstract
1. Introduction
2. Material and Methods
2.1. Study Area, Monitoring Surface Water Sites, and Monitored Sewage Outlets
2.2. Wastewater-Generating Economic Activities in the YRB in 2016 and 2017
2.3. Monitoring Methods and Data Sources
2.4. Models and Algorithms
2.4.1. Clustering Algorithms
2.4.2. Significance Tests with Confidence Intervals
2.4.3. Correlation Analyses
2.4.4. Software Application
3. Results and Discussion
3.1. Spatial Zoning of the Wastewater-Generating Factories and the Surface Water Sites in the YRB Using the PAM Clustering
3.2. Identification of Heavily Polluted and Unpolluted Wastewater and Surface Water in the YRB Using EM Clustering
3.2.1. Identification of Heavily Polluted and Unpolluted Surface Water Sections in the YRB Using EM Clustering and Weekly Water Quality Data
3.2.2. Identification of Heavily Polluted and Unpolluted Wastewater Discharges in the YRB Using EM Clustering
3.2.3. Analyses of Heavily Polluted and Unpolluted Economic Activities in the YRB
3.3. Differences in the Pollutants in Wastewater and Economic Activities between the Heavily Polluted and Unpolluted Zones in the YRB
3.3.1. Differences in the Pollutant Concentrations in the Heavily Polluted Wastewater Discharges between the Heavily Polluted and Unpolluted Zones and Analyses of the Economic Activities in the YRB
3.3.2. Geographical, Administrative, and Economic Distributions of Heavily Polluted Wastewater Discharges in the Heavily Polluted and Unpolluted Zones in the YRB
3.4. Temporal Variations in the Pollution Characteristics and Relationships between Heavily Polluted Wastewater Discharges and Surface Water in the Heavily Polluted YRB Zone
3.4.1. Identification of Heavily Polluted Periods Using EM Clustering Based on Weekly Data for Heavily Polluted Wastewater Discharges and Heavily Polluted Surface Water in the Heavily Polluted YRB Zone
3.4.2. Temporal Correlations between Heavily Polluted Surface Water and Heavily Polluted Wastewater Discharges in the Heavily Polluted YRB Zone
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Abbreviations
Year | Pollutant | EM_PAM Cluster | Sample Number | Yearly Mean (mg L−1) | Yearly Median (mg L−1) | Welch t-Test P | Wilcoxon Test P | Welch t-Test T | Wilcoxon Test T | Weekly Means MAX (mg L−1) | Weekly Means MIN (mg L−1) | Weekly Means SD (mg L−1) | Weekly Means CV
---|---|---|---|---|---|---|---|---|---|---|---|---|---
2016 | COD | HPW_PAM2 | 1072 | 120.7 | 99.3 | (−14.8, 8.4) / | (−5.5, 11.5) / | (−6.2, 6.5) | (−9.4, 1.5) | 489.6 | 0.0 | 82.6 | 0.68
2016 | COD | HPW_PAM3 | 347 | 123.9 | 84.6 | (−14.8, 8.4) / | (−5.5, 11.5) / | (−26.5, −2.9) | (−36.5, −18.5) | 490.2 | 4.9 | 99.4 | 0.80
2016 | COD | UPW_PAM2 | 4021 | 19.4 | 16.8 | (−2.2, 0.8) / | (−3.4, −2.2) * | (2.9, 4.0) | (2.0, 2.9) | 324.5 | 0.0 | 13.4 | 0.69
2016 | COD | UPW_PAM3 | 1235 | 20.9 | 20.4 | (−2.2, 0.8) / | (−3.4, −2.2) * | (2.3, 3.6) | (2.3, 3.5) | 61.5 | 0.0 | 10.0 | 0.48
2016 | NH3-N | HPW_PAM2 | 1590 | 6.9 | 5.9 | (−0.3, 0.6) / | (−0.3, 0.4) / | (−2.0, −0.8) | (−1.4, −0.5) | 30.0 | 0.0 | 5.9 | 0.86
2016 | NH3-N | HPW_PAM3 | 888 | 6.8 | 5.1 | (−0.3, 0.6) / | (−0.3, 0.4) / | (−1.4, −0.1) | (−1.4, −0.1) | 28.6 | 0.1 | 5.8 | 0.85
2016 | NH3-N | UPW_PAM2 | 4040 | 0.5 | 0.3 | (−0.1, −0.0) * | (−0.1, −0.0) * | (−0.0, 0.0) | (0.0, 0.0) | 9.2 | 0.0 | 0.5 | 1.13
2016 | NH3-N | UPW_PAM3 | 1189 | 0.5 | 0.4 | (−0.1, −0.0) * | (−0.1, −0.0) * | (−0.0, 0.0) | (−0.0, 0.0) | 5.7 | 0.0 | 0.5 | 1.02
2017 | COD | HPW_PAM2 | 1242 | 120.6 | 99.8 | (−24.7, −11.4) * | (−24.9, −13.1) * | / | / | 451.6 | 1.4 | 71.9 | 0.60
2017 | COD | HPW_PAM3 | 765 | 138.6 | 124.2 | (−24.7, −11.4) * | (−24.9, −13.1) * | * | * | 486.5 | 0.0 | 75.2 | 0.54
2017 | COD | UPW_PAM2 | 3426 | 16.0 | 14.8 | (−2.4, −1.5) * | (−3.0, −2.2) * | * | * | 153.6 | 0.0 | 9.5 | 0.60
2017 | COD | UPW_PAM3 | 2324 | 17.9 | 17.6 | (−2.4, −1.5) * | (−3.0, −2.2) * | * | * | 73.3 | 0.0 | 7.9 | 0.44
2017 | NH3-N | HPW_PAM2 | 676 | 8.3 | 7.1 | (0.1, 1.5) * | (−0.2, 1.0) / | * | * | 29.9 | 0.0 | 6.8 | 0.82
2017 | NH3-N | HPW_PAM3 | 434 | 7.5 | 6.8 | (0.1, 1.5) * | (−0.2, 1.0) / | * | * | 29.2 | 0.0 | 5.6 | 0.75
2017 | NH3-N | UPW_PAM2 | 3410 | 0.5 | 0.3 | (−0.1, −0.0) * | (−0.1, −0.0) * | / | * | 6.8 | 0.0 | 0.5 | 1.14
2017 | NH3-N | UPW_PAM3 | 2273 | 0.5 | 0.4 | (−0.1, −0.0) * | (−0.1, −0.0) * | / | / | 6.4 | 0.0 | 0.5 | 0.96
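
The Welch-type intervals reported in the table above compare mean concentrations between two groups without assuming equal variances. As a rough illustration only, the following Python sketch computes such an interval for the difference in means. It is not the authors' implementation: the function name `welch_interval`, the synthetic samples, and the 95% level are assumptions made for the example, and the normal quantile is used in place of the Student t quantile, which is a reasonable approximation only for large samples such as those in the table.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_interval(a, b, level=0.95):
    """Approximate CI for mean(a) - mean(b) with the Welch (unequal-variance) standard error.

    Returns (low, high, df). The normal quantile stands in for the t quantile,
    so this is a large-sample approximation (an assumption of this sketch).
    """
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)      # sample variances (n - 1 denominator)
    se = math.sqrt(va / na + vb / nb)      # Welch standard error of the difference
    # Welch-Satterthwaite degrees of freedom, returned for reference only
    df = (va / na + vb / nb) ** 2 / (
        (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
    )
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = mean(a) - mean(b)
    return diff - z * se, diff + z * se, df


if __name__ == "__main__":
    random.seed(0)
    # Synthetic concentrations (mg/L), purely hypothetical, not data from the study
    zone_a = [random.gauss(20.0, 10.0) for _ in range(1000)]
    zone_b = [random.gauss(22.0, 8.0) for _ in range(1000)]
    low, high, df = welch_interval(zone_a, zone_b)
    excludes_zero = low > 0 or high < 0
    print(f"CI: ({low:.2f}, {high:.2f}), df ~ {df:.0f}, excludes zero: {excludes_zero}")
```

An interval that excludes zero is conventionally read as a significant difference in means at the chosen level. For small samples, the normal quantile should be replaced by the t quantile with the Welch-Satterthwaite degrees of freedom.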
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Di, Z.; Chang, M.; Guo, P.; Li, Y.; Chang, Y. Using Real-Time Data and Unsupervised Machine Learning Techniques to Study Large-Scale Spatio–Temporal Characteristics of Wastewater Discharges and their Influence on Surface Water Quality in the Yangtze River Basin. Water 2019, 11, 1268. https://doi.org/10.3390/w11061268