Groundwater Management Based on Time Series and Ensembles of Machine Learning
Abstract
1. Introduction
- We propose a model that combines a time-series algorithm with four ensemble techniques to classify groundwater as excellent drinking water, good drinking water, poor irrigation water, or very poor irrigation water.
- We used the GEOTHERM dataset and pre-processed it by replacing missing and null values, resolving the sparsity problem with the recommender system we proposed in [9], and applying the SMOTE oversampling technique.
- We reduced the dataset's dimensionality with the PCC feature selection method.
- The RF model separated the four classes with average precision, recall, and DSC of approximately 98%, 89.25%, and 93%, respectively, and an accuracy of approximately 95%.
2. Literature Review
3. Background
3.1. Ensemble
3.1.1. Bagging
3.1.2. Boosting
3.1.3. Stacking
3.1.4. Time Series
4. Materials and Methods
4.1. Dataset Description
4.2. Model Architecture and Training
4.2.1. Data Pre-Processing
Missing and Null Values Handling
Sparsity Problem
Sampling
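The contributions list names SMOTE as the oversampling technique. As a minimal sketch of the core idea only (not the authors' implementation), each synthetic point is placed at a random position on the segment between a minority-class sample and one of its k nearest minority-class neighbours. The function name `smote`, its parameters, and the toy points are illustrative assumptions.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples from a list of minority-class points.

    minority: list of equal-length numeric lists (at least two points).
    k: number of nearest minority neighbours to choose from (assumed value).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (Euclidean distance),
        # excluding the base point itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority_points, n_new=10, k=2)
```

Because every synthetic point is a convex combination of two existing minority points, the new samples stay inside the region already occupied by that class.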
4.2.2. Feature Selection
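The contributions list states that dimensionality was reduced with the PCC feature selection method. Assuming PCC denotes the Pearson correlation coefficient, one common filter-style variant scores each feature by its absolute correlation with the numerically encoded class label and keeps those above a cutoff. The sketch below follows that assumption; the threshold, the feature names (`tds`, `noise`, `const`), and the toy values are hypothetical and not taken from the GEOTHERM dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant column: correlation undefined, treat as uninformative
    return cov / (sd_x * sd_y)


def select_features(columns, target, threshold=0.5):
    """Keep features whose |PCC| with the target reaches the (assumed) threshold.

    columns: dict mapping feature name -> list of values.
    Returns (selected feature names, dict of PCC scores).
    """
    scores = {name: pearson(values, target) for name, values in columns.items()}
    selected = [name for name, r in scores.items() if abs(r) >= threshold]
    return selected, scores


# Hypothetical toy data: class labels encoded as 1..4.
labels = [1, 2, 3, 4]
features = {
    "tds": [10, 20, 30, 40],   # perfectly correlated with the label
    "noise": [1, -1, -1, 1],   # uncorrelated with the label
    "const": [7, 7, 7, 7],     # constant column
}
selected, scores = select_features(features, labels)
```

With these toy values only `tds` passes the cutoff; in practice the threshold would need to be tuned to the dataset.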
5. Model Implementation and Evaluation
5.1. Model Evaluation Metrics
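The results report precision, recall, DSC, and accuracy. For reference, the conventional per-class definitions are given below, assuming the paper uses these standard forms, with DSC read as the Dice similarity coefficient (equivalent to the F1 score). The symbols are introduced here for illustration: for class c, TP_c, FP_c, and FN_c are the true-positive, false-positive, and false-negative counts, and N is the total number of samples.

```latex
\begin{align}
\text{Precision}_c &= \frac{TP_c}{TP_c + FP_c}, &
\text{Recall}_c &= \frac{TP_c}{TP_c + FN_c}, \\
\text{DSC}_c &= \frac{2\,TP_c}{2\,TP_c + FP_c + FN_c}, &
\text{Accuracy} &= \frac{\sum_{c} TP_c}{N}.
\end{align}
```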
5.2. Model Implementation
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Siebert, S.; Burke, J.; Faures, J.M.; Frenken, K.; Hoogeveen, J.; Döll, P.; Portmann, F.T. Groundwater use for Irrigation: A Global Inventory. Hydrol. Earth Syst. Sci. 2010, 14, 1863–1880.
- Menon, S. Ground Water Management: Need for Sustainable Approach; Personal RePEc Archive: Munich, Germany, 2007.
- Zektser, I.S.; Everett, L.G. Groundwater Resources of the World and Their Use; UNESCO Digital Library: Fontenoy, Paris, 2004.
- Helena, B.; Pardo, R.; Vega, M.; Barrado, E.; Fernandez, J.M.; Fernandez, L. Temporal Evolution of Ground Water Composition in an Alluvial Aquifer (Pisuerga River, Spain) by Principal Component Analysis. Water Resour. 2000, 34, 807–816.
- Mohamad, S.; Arzaneh, F.; Mohamad, J.P. Quality of Groundwater in an Area with Intensive Agricultural Activity. Expo. Health 2016, 8, 93–105.
- Huq, M.E.; Su, C.; Li, J.; Sarven, M.S. Arsenic Enrichment and Mobilization in the Holocene Alluvial Aquifers of Prayagpur of Southwestern Bangladesh. Int. Biodeterior. Biodegrad. 2018, 128, 186–194.
- Huq, M.E.; Su, C.; Fahad, S.; Li, J.; Sarven, M.S.; Liu, R. Distribution and Hydrogeochemical Behavior of Arsenic Enriched Groundwater in the Sedimentary Aquifer Comparison between Datong Basin (China) and Kushtia District (Bangladesh). Environ. Sci. Pollut. Res. 2018, 25, 15830–15843.
- Zaidi, F.K.; Nazzal, Y.; Ahmed, I.; Naeem, M.; Jafri, M.K. Identification of Potential Artificial Groundwater Recharge Zones in North Western Saudi Arabia Using GIS and Boolean Logic. J. Afr. Earth Sci. 2015, 111, 156–169.
- Abd El-Aziz, A.A.; Alsalem, K.O.; Mahmood, M.A. An Intelligent Groundwater Management Recommender System. Indian J. Sci. Technol. 2021, 14, 2871–2879.
- Hou, D.; Song, X.; Zhang, G.; Zhang, H.; Loaiciga, H. An Early Warning and Control System for Urban, Drinking Water Quality Protection: China's Experience. Environ. Sci. Pollut. Res. 2013, 20, 4496–4508.
- Bassiliades, N.; Antoniades, I.; Hatzikos, E.; Vlahavas, I.; Koutitas, G. An Intelligent System for Monitoring and Predicting Water Quality. In Proceedings of the European Conference towards eENVIRONMENT, Prague, Czech Republic, 25 March 2009; pp. 534–542.
- Gino Sophia, S.G.; Sharmila, V.C.; Suchitra, S.; Muthu, T.S.; Pavithra, B. Water Management using Genetic Algorithm-based Machine Learning. Soft Comput. 2020, 24, 17153–17165.
- Alahmadi, F.S. Groundwater Quality Categorization by Unsupervised Machine Learning in Madinah. In Proceedings of the International Geoinformatics Conference (IGC2019), Riyadh, Saudi Arabia, February 2019.
- Inoue, J.; Yamagata, Y.; Chen, Y.; Poskitt, C.M.; Sun, J. Anomaly Detection for a Water Treatment System Using Unsupervised Machine Learning. In Proceedings of the 2017 IEEE International Conference on Data Mining Workshops (ICDMW), New Orleans, LA, USA, 18–21 November 2017.
- Yuvaraj, N.; Anusha, K.; MeagaVarsha, R. Healthcare Recommendation System for Water Affected Habitations using Machine Learning Algorithms. Int. J. Pure Appl. Math. 2018, 118, 3797–3809.
- Adnan, S.; Iqbal, J.; Maltamo, M.; Suleman, M.B.; Shahab, A.; Valbuena, R. A Simple Approach of Groundwater Quality Analysis, Classification, and Mapping in Peshawar, Pakistan. Environments 2019, 6, 123.
- Salman, F.K.Z.A.S.; Hussein, M.T. Evaluation of Groundwater Quality in Northern Saudi Arabia using Multivariate Analysis and Stochastic Statistics. Environ. Earth Sci. 2015, 74, 7769–7782.
- Kamakshaiah, K.; Seshadri, R. Ground Water Quality Assessment using Data Mining Techniques. Int. J. Comput. Appl. 2013, 76, 39–45.
- Al-Omran, A.; Al-Barakah, F.; Altuquq, A.; Aly, A.; Nadeem, M. Drinking Water Quality Assessment and Water Quality Index of Riyadh, Saudi Arabia. Water Qual. Res. J. 2015, 50, 287–296.
- Asma, A.K.; Al-Jaloud, A.; El-Taher, A. Quality Level of Bottled Drinking Water Consumed in Saudi Arabia. J. Environ. Sci. Technol. 2014, 7, 90–106.
- Opitz, D.; Maclin, R. Popular Ensemble Methods: An Empirical Study. J. Artif. Intell. Res. 1999, 11, 169–198.
- Polikar, R. Ensemble Based Systems in Decision Making. IEEE Circuits Syst. Mag. 2006, 6, 21–45.
- Rokach, L. Ensemble-Based Classifiers. Artif. Intell. Rev. 2010, 33, 1–39.
- Mohammed, A.; Kora, R. A Comprehensive Review on Ensemble Deep Learning: Opportunities and Challenges. J. King Saud Univ. Comput. Inf. Sci. 2023.
- Analytics Vidhya. Available online: https://www.analyticsvidhya.com (accessed on 1 January 2022).
- Freund, Y.; Iyer, R.; Schapire, R.E.; Singer, Y. An efficient boosting algorithm for combining preferences. J. Mach. Learn. Res. 2003, 4, 933–969.
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
- Friedman, J.; Hastie, T.; Tibshirani, R. Additive logistic regression: A statistical view of boosting (with discussion and a rejoinder by the authors). Ann. Stat. 2000, 28, 337–407.
- Ma, Z.; Wang, P.; Gao, Z.; Wang, R.; Khalighi, K. Ensemble of machine learning algorithms using the stacked generalization approach to estimate the warfarin dose. PLoS ONE 2018, 13, e0205872.
- Dinger, T.; Chang, Y.C.; Pavuluri, R.; Subramanian, D. Time series representation learning with contrastive triplet selection. In Proceedings of the 5th Joint International Conference on Data Science & Management of Data, 9th ACM IKDD CODS and 27th COMAD, Bangalore, India, 7–10 January 2022.
- Goff, F.; Bergfeld, D.; Janik, C.J.; Counce, D.; Murrell, M. Geochemical Data on Waters, Gases, Scales, and Rocks. Available online: https://help.waterdata.usgs.gov/faq/additional-background (accessed on 9 November 2011).
- James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer: Berlin/Heidelberg, Germany, 2013.
- Luukka, P. Feature Selection using Fuzzy Entropy Measures with Similarity Classifier. Expert Syst. Appl. 2011, 38, 4600–4607.
Type | Class | No. of Rows |
---|---|---|
Drinking | Excellent | 34,516 |
Drinking | Good | 9915 |
Irrigation | Poor | 1892 |
Irrigation | Very Poor | 77 |
Total | | 46,400 |
Type | Class | No. of Rows |
---|---|---|
Drinking | Excellent | 34,683 |
Drinking | Good | 10,311 |
Irrigation | Poor | 937 |
Irrigation | Very Poor | 469 |
Total | | 46,400 |
Model | Class | Precision (%) | Recall (%) | DSC (%) |
---|---|---|---|---|
RF | Excellent | 95 | 100 | 97 |
RF | Good | 97 | 82 | 89 |
RF | Poor | 100 | 75 | 86 |
RF | V. poor | 100 | 100 | 100 |
RF | Average | 98 | 89.25 | 93 |
RF | Accuracy | 95 | | |
Gradient Boosting | Excellent | 91 | 100 | 95 |
Gradient Boosting | Good | 100 | 66 | 79 |
Gradient Boosting | Poor | 100 | 100 | 100 |
Gradient Boosting | V. poor | 100 | 100 | 100 |
Gradient Boosting | Average | 97.7 | 91.5 | 93.5 |
Gradient Boosting | Accuracy | 92 | | |
Bagging | Excellent | 91 | 92 | 91 |
Bagging | Good | 71 | 77 | 74 |
Bagging | Poor | 0 | 0 | 0 |
Bagging | V. poor | 0 | 0 | 0 |
Bagging | Average | 40.5 | 42.25 | 41.25 |
Bagging | Accuracy | 86 | | |
AdaBoost | Excellent | 88 | 83 | 86 |
AdaBoost | Good | 53 | 64 | 58 |
AdaBoost | Poor | 100 | 100 | 100 |
AdaBoost | V. poor | 100 | 100 | 100 |
AdaBoost | Average | 85.25 | 86.75 | 86 |
AdaBoost | Accuracy | 79 | | |
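
The Average rows in the table above are consistent with unweighted (macro) means of the four per-class values. A minimal sketch checking this for the RF rows follows; the helper name `macro_average` is illustrative and not taken from the paper.

```python
def macro_average(per_class_values):
    """Unweighted mean over classes: every class counts equally, regardless of size."""
    return sum(per_class_values) / len(per_class_values)


# Per-class RF results (%) in the order Excellent, Good, Poor, V. poor.
rf_per_class = {
    "precision": [95, 97, 100, 100],
    "recall": [100, 82, 75, 100],
    "dsc": [97, 89, 86, 100],
}

rf_average = {metric: macro_average(values) for metric, values in rf_per_class.items()}
# rf_average == {"precision": 98.0, "recall": 89.25, "dsc": 93.0}
```

A macro average gives the small irrigation classes the same weight as the much larger drinking-water classes, which is why it can differ noticeably from the overall accuracy.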
Actual (row) / Predicted (column) | 1 | 2 | 3 | 4 |
---|---|---|---|---|
1 | 34,683 | 0 | 0 | 0 |
2 | 1875 | 8436 | 0 | 0 |
3 | 0 | 468 | 469 | 0 |
4 | 0 | 0 | 0 | 469 |
Actual (row) / Predicted (column) | 1 | 2 | 3 | 4 |
---|---|---|---|---|
1 | 34,683 | 0 | 0 | 0 |
2 | 3515 | 6796 | 0 | 0 |
3 | 0 | 0 | 937 | 0 |
4 | 0 | 0 | 0 | 469 |
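
Per-class precision, recall, and DSC, together with overall accuracy, can be derived directly from a confusion matrix. The sketch below does so for the matrix immediately above, assuming rows are actual classes and columns are predicted classes (the row sums match the class counts listed earlier). Under that assumption the rounded results reproduce the Gradient Boosting rows of the results table. The function name `per_class_metrics` is illustrative.

```python
def per_class_metrics(cm):
    """Return ([(precision, recall, dsc) per class], accuracy) for a square matrix.

    cm[i][j] = number of samples of actual class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    results = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                         # rest of row i
        fp = sum(cm[r][i] for r in range(n)) - tp    # rest of column i
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = 2 * tp + fp + fn
        dsc = 2 * tp / denom if denom else 0.0
        results.append((precision, recall, dsc))
    return results, correct / total


# Second confusion matrix from the text (classes 1..4).
confusion = [
    [34683, 0, 0, 0],
    [3515, 6796, 0, 0],
    [0, 0, 937, 0],
    [0, 0, 0, 469],
]
metrics, accuracy = per_class_metrics(confusion)
as_percent = [tuple(round(100 * v) for v in row) for row in metrics]
# as_percent == [(91, 100, 95), (100, 66, 79), (100, 100, 100), (100, 100, 100)]
# round(100 * accuracy) == 92
```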
Actual (row) / Predicted (column) | 1 | 2 | 3 | 4 |
---|---|---|---|---|
1 | 31,168 | 3515 | 0 | 0 |
2 | 3046 | 7265 | 0 | 0 |
3 | 0 | 468 | 469 | 0 |
4 | 0 | 0 | 469 | 0 |
Actual (row) / Predicted (column) | 1 | 2 | 3 | 4 |
---|---|---|---|---|
1 | 28,824 | 5859 | 0 | 0 |
2 | 3749 | 6562 | 0 | 0 |
3 | 0 | 0 | 937 | 0 |
4 | 0 | 0 | 0 | 469 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alsalem, K.O.; Mahmood, M.A.; A. Azim, N.; Abd El-Aziz, A.A. Groundwater Management Based on Time Series and Ensembles of Machine Learning. Processes 2023, 11, 761. https://doi.org/10.3390/pr11030761