Assessing the Suitability of Boosting Machine-Learning Algorithms for Classifying Arsenic-Contaminated Waters: A Novel Model-Explainable Approach Using SHapley Additive exPlanations
Abstract
1. Introduction
2. Study Area
3. Materials and Methods
3.1. Data Description
3.2. Model Development
3.2.1. Categorical Boosting
3.2.2. Natural Gradient Boosting
3.2.3. Adaptive Boosting
3.2.4. Light Gradient Boosting
3.2.5. Extreme Gradient Boosting
3.2.6. Gradient Boosting Machine
3.2.7. SHapley Additive exPlanation
3.3. Statistical Evaluation of Model Performance
4. Results and Discussion
4.1. Hydrogeochemistry of Input Parameters and Arsenic Pollution
4.2. Overall Model Performance
4.3. Single-Class Model Performance
4.4. Relative Importance of Predictor Variables
4.5. SHAP Global Interpretation
4.6. SHAP Local Interpretation
4.7. Contribution and Limitations
5. Conclusions
- In terms of overall assessment metrics (Acc, MCC, Kappa, and AUC), all six boosting models developed (XGB, NGB, LGB, ADAB, CATB, and GBM) proved efficient in the arsenic classification task, with minimum AUC, MCC, Kappa, and Acc scores of 0.83, 0.58, 0.58, and 0.76, respectively.
- The single-class assessment metrics (precision, sensitivity, and F1 score) indicate that the boosting models are more efficient at recognising high- and low-arsenic waters than medium-arsenic waters.
- Overall, the XGB algorithm outperformed the remaining models on both overall and single-class assessment metrics, whereas ADAB showed the lowest performance.
- High-pH water was found to be strongly associated with high-arsenic water, and vice versa. High pH, Cond, and TDS increase the likelihood of encountering a high-arsenic water source, whereas low pH, Cond, and TDS levels are all indicators of low-arsenic water. Medium-arsenic waters are mostly associated with low Cond and low TDS.
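As context for the SHAP interpretations summarised above: the attributions SHAP produces are Shapley values from cooperative game theory, where each feature's contribution is its average marginal effect over all feature orderings. A minimal sketch of an exact Shapley-value computation, using a hypothetical two-feature value function (not the paper's fitted models):

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values. `value` maps a frozenset of 'present' features
    to the model's expected output given only those features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                S = frozenset(subset)
                # Shapley weight for a coalition of size k
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(S | {f}) - value(S))
        phi[f] = total
    return phi

# Hypothetical value function: baseline 0.2; pH alone adds 0.3, TDS alone
# adds 0.1, and together they add 0.6 (i.e. they interact).
v = {frozenset(): 0.2, frozenset({"pH"}): 0.5,
     frozenset({"TDS"}): 0.3, frozenset({"pH", "TDS"}): 0.8}
phi = shapley_values(["pH", "TDS"], lambda S: v[frozenset(S)])
# Efficiency property: the attributions sum to v(full) - v(empty) = 0.6
```

In practice the paper's SHAP values are computed efficiently for tree ensembles by TreeSHAP (Lundberg et al.), but they satisfy the same efficiency property illustrated here.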
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Cho, K.H.; Sthiannopkao, S.; Pachepsky, Y.A.; Kim, K.-W.; Kim, J.H. Prediction of Contamination Potential of Groundwater Arsenic in Cambodia, Laos, and Thailand Using Artificial Neural Network. Water Res. 2011, 45, 5535–5544.
- Naujokas, M.F.; Anderson, B.; Ahsan, H.; Aposhian, H.V.; Graziano, J.H.; Thompson, C.; Suk, W.A. The Broad Scope of Health Effects from Chronic Arsenic Exposure: Update on a Worldwide Public Health Problem. Environ. Health Perspect. 2013, 121, 295–302.
- World Health Organization. Guidelines for Drinking-Water Quality; World Health Organization: Geneva, Switzerland, 2017.
- Smith, A.H.; Lingas, E.O.; Rahman, M. Contamination of Drinking-Water by Arsenic in Bangladesh: A Public Health Emergency. Bull. World Health Organ. 2000, 78, 1093–1103.
- Tan, Z.; Yang, Q.; Zheng, Y. Machine Learning Models of Groundwater Arsenic Spatial Distribution in Bangladesh: Influence of Holocene Sediment Depositional History. Environ. Sci. Technol. 2020, 54, 9454–9463.
- Chakraborty, M.; Sarkar, S.; Mukherjee, A.; Shamsudduha, M.; Ahmed, K.M.; Bhattacharya, A.; Mitra, A. Modeling Regional-Scale Groundwater Arsenic Hazard in the Transboundary Ganges River Delta, India and Bangladesh: Infusing Physically-Based Model with Machine Learning. Sci. Total Environ. 2020, 748, 141107.
- Erickson, M.L.; Elliott, S.M.; Brown, C.J.; Stackelberg, P.E.; Ransom, K.M.; Reddy, J.E.; Cravotta III, C.A. Machine-Learning Predictions of High Arsenic and High Manganese at Drinking Water Depths of the Glacial Aquifer System, Northern Continental United States. Environ. Sci. Technol. 2021, 55, 5791–5805.
- Lombard, M.A.; Bryan, M.S.; Jones, D.K.; Bulka, C.; Bradley, P.M.; Backer, L.C.; Focazio, M.J.; Silverman, D.T.; Toccalino, P.; Argos, M.; et al. Machine Learning Models of Arsenic in Private Wells Throughout the Conterminous United States As a Tool for Exposure Assessment in Human Health Studies. Environ. Sci. Technol. 2021, 55, 5012–5023.
- Ibrahim, B.; Ewusi, A.; Ahenkorah, I.; Ziggah, Y.Y. Modelling of Arsenic Concentration in Multiple Water Sources: A Comparison of Different Machine Learning Methods. Groundw. Sustain. Dev. 2022, 17, 100745.
- Taieb, S.B.; Hyndman, R.J. A Gradient Boosting Approach to the Kaggle Load Forecasting Competition. Int. J. Forecast. 2014, 30, 382–394.
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
- Ferreira, A.J.; Figueiredo, M.A. Boosting Algorithms: A Review of Methods, Theory, and Applications. Ensemble Mach. Learn. 2012, 35–85.
- Ayotte, J.D.; Nolan, B.T.; Gronberg, J.A. Predicting Arsenic in Drinking Water Wells of the Central Valley, California. Environ. Sci. Technol. 2016, 50, 7555–7563.
- Wu, T.; Zhang, W.; Jiao, X.; Guo, W.; Hamoud, Y.A. Comparison of Five Boosting-Based Models for Estimating Daily Reference Evapotranspiration with Limited Meteorological Variables. PLoS ONE 2020, 15, e0235324.
- Fan, J.; Ma, X.; Wu, L.; Zhang, F.; Yu, X.; Zeng, W. Light Gradient Boosting Machine: An Efficient Soft Computing Model for Estimating Daily Reference Evapotranspiration with Local and External Meteorological Data. Agric. Water Manag. 2019, 225, 105758.
- Shen, K.; Qin, H.; Zhou, J.; Liu, G. Runoff Probability Prediction Model Based on Natural Gradient Boosting with Tree-Structured Parzen Estimator Optimization. Water 2022, 14, 545.
- Dong, L.; Zeng, W.; Wu, L.; Lei, G.; Chen, H.; Srivastava, A.K.; Gaiser, T. Estimating the Pan Evaporation in Northwest China by Coupling CatBoost with Bat Algorithm. Water 2021, 13, 256.
- Wolpert, D.H.; Macready, W.G. No Free Lunch Theorems for Optimization. IEEE Trans. Evol. Computat. 1997, 1, 67–82.
- Escalante, H.J.; Escalera, S.; Guyon, I.; Baró, X.; Güçlütürk, Y.; Güçlü, U.; van Gerven, M.; van Lier, R. Explainable and Interpretable Models in Computer Vision and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2018.
- Masís, S. Interpretable Machine Learning with Python: Learn to Build Interpretable High-Performance Models with Hands-on Real-World Examples; Packt Publishing Ltd.: Birmingham, UK, 2021.
- Štrumbelj, E.; Kononenko, I. Explaining Prediction Models and Individual Predictions with Feature Contributions. Knowl. Inf. Syst. 2014, 41, 647–665.
- Lama, L.; Wilhelmsson, O.; Norlander, E.; Gustafsson, L.; Lager, A.; Tynelius, P.; Wärvik, L.; Östenson, C.-G. Machine Learning for Prediction of Diabetes Risk in Middle-Aged Swedish People. Heliyon 2021, 7, e07419.
- Mangalathu, S.; Hwang, S.-H.; Jeon, J.-S. Failure Mode and Effects Analysis of RC Members Based on Machine-Learning-Based SHapley Additive exPlanations (SHAP) Approach. Eng. Struct. 2020, 219, 110927.
- Ibrahim, B.; Ahenkorah, I.; Ewusi, A. Explainable Risk Assessment of Rockbolts’ Failure in Underground Coal Mines Based on Categorical Gradient Boosting and SHapley Additive exPlanations (SHAP). Sustainability 2022, 14, 11843.
- Wen, X.; Xie, Y.; Wu, L.; Jiang, L. Quantifying and Comparing the Effects of Key Risk Factors on Various Types of Roadway Segment Crashes with LightGBM and SHAP. Accid. Anal. Prev. 2021, 159, 106261.
- Wang, R.; Kim, J.-H.; Li, M.-H. Predicting Stream Water Quality under Different Urban Development Pattern Scenarios with an Interpretable Machine Learning Approach. Sci. Total Environ. 2021, 761, 144057.
- Wang, S.; Peng, H.; Hu, Q.; Jiang, M. Analysis of Runoff Generation Driving Factors Based on Hydrological Model and Interpretable Machine Learning Method. J. Hydrol. Reg. Stud. 2022, 42, 101139.
- Podgorski, J.; Berg, M. Global Threat of Arsenic in Groundwater. Science 2020, 368, 845–850.
- Podgorski, J.; Wu, R.; Chakravorty, B.; Polya, D.A. Groundwater Arsenic Distribution in India by Machine Learning Geospatial Modeling. Int. J. Environ. Res. Public Health 2020, 17, 7119.
- Amponsah, N.; Bakobie, N.; Cobbina, S.; Duwiejuah, A. Assessment of Rainwater Quality in Ayanfuri, Ghana. Am. Chem. Sci. J. 2015, 6, 172–182.
- Agbenyezi, T.K.; Foli, G.; Gawu, S.K. Geochemical Characteristics of Gold-Bearing Granitoids at Ayanfuri in the Kumasi Basin, Southwestern Ghana: Implications for the Orogenic Related Gold Systems. Earth Sci. Malays. (ESMY) 2020, 4, 127–134.
- Majeed, F.; Ziggah, Y.Y.; Kusi-Manu, C.; Ibrahim, B.; Ahenkorah, I. A Novel Artificial Intelligence Approach for Regolith Geochemical Grade Prediction Using Multivariate Adaptive Regression Splines. Geosyst. Geoenviron. 2022, 1, 100038.
- Ghana Statistical Service. 2010 Population and Housing Census: District Analytical Report, Tarkwa Nsuaem Municipal. Available online: https://www.statsghana.gov.gh/ (accessed on 25 October 2014).
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
- Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased Boosting with Categorical Features. In Proceedings of the Advances in Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; Curran Associates, Inc.: Red Hook, NY, USA, 2018; Volume 31.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Duan, T.; Anand, A.; Ding, D.Y.; Thai, K.K.; Basu, S.; Ng, A.; Schuler, A. NGBoost: Natural Gradient Boosting for Probabilistic Prediction. In Proceedings of the International Conference on Machine Learning, PMLR; pp. 2690–2700. Available online: http://proceedings.mlr.press/v119/duan20a.html?ref=https://githubhelp.com (accessed on 20 October 2022).
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
- Peters, J.; Baets, B.D.; Verhoest, N.E.C.; Samson, R.; Degroeve, S.; Becker, P.D.; Huybrechts, W. Random Forests as a Tool for Ecohydrological Distribution Modelling. Ecol. Model. 2007, 207, 304–318.
- Ibrahim, B.; Majeed, F.; Ewusi, A.; Ahenkorah, I. Residual Geochemical Gold Grade Prediction Using Extreme Gradient Boosting. Environ. Chall. 2022, 6, 100421.
- Kadiyala, A.; Kumar, A. Applications of Python to Evaluate the Performance of Decision Tree-Based Boosting Algorithms. Environ. Prog. Sustain. Energy 2018, 37, 618–623.
- Dorogush, A.V.; Ershov, V.; Gulin, A. CatBoost: Gradient Boosting with Categorical Features Support. arXiv 2018, arXiv:1810.11363.
- Peng, T.; Zhi, X.; Ji, Y.; Ji, L.; Tian, Y. Prediction Skill of Extended Range 2-m Maximum Air Temperature Probabilistic Forecasts Using Machine Learning Post-Processing Methods. Atmosphere 2020, 11, 823.
- Ferov, M.; Modrỳ, M. Enhancing LambdaMART Using Oblivious Trees. arXiv 2016, arXiv:1609.05610.
- Freund, Y.; Schapire, R.E. A Decision-Theoretic Generalization of on-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
- Margineantu, D.D.; Dietterich, T.G. Pruning Adaptive Boosting. ICML 1997, 97, 211–218.
- Alsabti, K.; Ranka, S.; Singh, V. CLOUDS: A Decision Tree Classifier for Large Datasets. In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 27–31 August 1998; Volume 2, No. 8.
- Shi, H. Best-First Decision Tree Learning. Available online: https://researchcommons.waikato.ac.nz/handle/10289/2317 (accessed on 19 October 2022).
- Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K. Xgboost: Extreme Gradient Boosting. R Package, Version 0.4-2 2015, 1, 1–4.
- Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
- Friedman, J.; Hastie, T.; Tibshirani, R. Additive Logistic Regression: A Statistical View of Boosting (with Discussion and a Rejoinder by the Authors). Ann. Stat. 2000, 28, 337–407.
- Dev, V.A.; Eden, M.R. Formation Lithology Classification Using Scalable Gradient Boosted Decision Trees. Comput. Chem. Eng. 2019, 128, 392–404.
- Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4768–4777.
- Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Available online: https://christophm.github.io/interpretable-ml-book/ (accessed on 29 September 2022).
- Lundberg, S.M.; Erion, G.G.; Lee, S.-I. Consistent Individualized Feature Attribution for Tree Ensembles. arXiv 2018, arXiv:1802.03888.
- Hossin, M.; Sulaiman, M.N. A Review on Evaluation Metrics for Data Classification Evaluations. Int. J. Data Min. Knowl. Manag. Process 2015, 5, 1.
- Tanha, J.; Abdi, Y.; Samadi, N.; Razzaghi, N.; Asadpour, M. Boosting Methods for Multi-Class Imbalanced Data Classification: An Experimental Review. J. Big Data 2020, 7, 70.
- Cohen, J. A Coefficient of Agreement for Nominal Scales. Educ. Psychol. Meas. 1960, 20, 37–46.
- Chicco, D.; Jurman, G. The Advantages of the Matthews Correlation Coefficient (MCC) over F1 Score and Accuracy in Binary Classification Evaluation. BMC Genom. 2020, 21, 6.
- Grandini, M.; Bagli, E.; Visani, G. Metrics for Multi-Class Classification: An Overview. arXiv 2020, arXiv:2008.05756.
- Ewusi, A.; Ahenkorah, I.; Kuma, J.S.Y. Groundwater Vulnerability Assessment of the Tarkwa Mining Area Using SINTACS Approach and GIS. Ghana Min. J. 2017, 17, 18–30.
- Ewusi, A.; Apeani, B.Y.; Ahenkorah, I.; Nartey, R.S. Mining and Metal Pollution: Assessment of Water Quality in the Tarkwa Mining Area. Ghana Min. J. 2017, 17, 17–31.
- Kusimi, J.M.; Kusimi, B.A. The Hydrochemistry of Water Resources in Selected Mining Communities in Tarkwa. J. Geochem. Explor. 2012, 112, 252–261.
- Asante, K.A.; Agusa, T.; Kubota, R.; Subramanian, A.; Ansa-Asare, O.D.; Biney, C.A.; Tanabe, S. Evaluation of Urinary Arsenic as an Indicator of Exposure to Residents of Tarkwa, Ghana. West Afr. J. Appl. Ecol. 2008, 12, 45751.
- Landis, J.R.; Koch, G.G. An Application of Hierarchical Kappa-Type Statistics in the Assessment of Majority Agreement among Multiple Observers. Biometrics 1977, 33, 363–374.
- Welch, A.H.; Stollenwerk, K.G. Arsenic in Ground Water: Geochemistry and Occurrence; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2003; ISBN 978-1-4020-7317-5.
- Asante, K.A.; Agusa, T.; Subramanian, A.; Ansa-Asare, O.D.; Biney, C.A.; Tanabe, S. Contamination Status of Arsenic and Other Trace Elements in Drinking Water and Residents from Tarkwa, a Historic Mining Township in Ghana. Chemosphere 2007, 66, 1513–1522.
- Smedley, P.L. Arsenic in Rural Groundwater in Ghana: Part Special Issue: Hydrogeochemical Studies in Sub-Saharan Africa. J. Afr. Earth Sci. 1996, 22, 459–470.
- Bortey-Sam, N.; Nakayama, S.M.; Ikenaka, Y.; Akoto, O.; Baidoo, E.; Mizukawa, H.; Ishizuka, M. Health Risk Assessment of Heavy Metals and Metalloid in Drinking Water from Communities near Gold Mines in Tarkwa, Ghana. Environ. Monit. Assess. 2015, 187, 397.
Model | Optimal Hyperparameters | Library |
---|---|---|
LGB | n_estimators = 150, max_depth = 3 | lightgbm [34] |
XGB | n_estimators = 200, max_depth = 3 | xgboost [11] |
CATB | n_estimators = 140, max_depth = 3 | catboost [35] |
ADAB | n_estimators = 200, learning_rate = 0.1 | scikit-learn [36] |
GBM | n_estimators = 100, max_depth = 5, k-neighbors = 5 | scikit-learn [36] |
NGB | n_estimators = 80, learning_rate = 0.01 | ngboost [37] |
RF | n_estimators = 170, max_depth = 5 | scikit-learn [36] |
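A minimal sketch of instantiating the scikit-learn-based models from the table with the reported optimal hyperparameters. Only the scikit-learn estimators are shown; the XGB, LGB, CATB, and NGB models follow the same fit/predict pattern through their own libraries (`XGBClassifier`, `LGBMClassifier`, `CatBoostClassifier`, `NGBClassifier`). The synthetic data below merely stands in for the study's four-feature, three-class measurements, and the GBM row's k-neighbors setting (likely tied to a resampling step) has no `GradientBoostingClassifier` counterpart and is omitted:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 4 features (cf. pH, Cond, TDS, Turbidity), 3 classes
# (low / medium / high arsenic).
X, y = make_classification(n_samples=600, n_features=4, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Hyperparameters as reported in the table above.
models = {
    "ADAB": AdaBoostClassifier(n_estimators=200, learning_rate=0.1,
                               random_state=42),
    "GBM": GradientBoostingClassifier(n_estimators=100, max_depth=5,
                                      random_state=42),
    "RF": RandomForestClassifier(n_estimators=170, max_depth=5,
                                 random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = accuracy_score(y_te, model.predict(X_te))
```

The other boosting libraries expose the same `fit`/`predict` interface, so the evaluation loop is unchanged when they are added to the `models` dictionary.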
Model | Advantages | Limitations |
---|---|---|
LGB | (i) Fast learning [41] (ii) Lower memory consumption [41] | (i) Can lose predictive performance due to gradient-based one-side sampling (GOSS) approximations [20] (ii) Sensitive to noisy data [34] |
XGB | (i) Higher execution speed [41] (ii) Less prone to overfitting (iii) Supports parallelisation (iv) Scalable | (i) Performs sub-optimally on sparse and unstructured data |
CATB | (i) Less prone to overfitting [17,42] | (i) Prediction results can vary with the choice of random seed [34] |
ADAB | (i) Easier implementation [41] (ii) Less prone to overfitting [34] (iii) Simpler feature selection [41] | (i) Sensitive to outliers and noisy data [41] |
GBM | (i) Insensitive to missing data [14] (ii) Reduced bias [14] (iii) Reduced overfitting [14] | (i) Computationally expensive |
NGB | (i) Flexible and scalable [37] (ii) Performs probabilistic prediction [37] (iii) Efficient for joint prediction [37] (iv) Modular with respect to base learners | (i) Limited performance for some skewed probability distributions [43] |
Metric | Formula | Equation | Description |
---|---|---|---|
Acc | $\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$ | (2) | Measures the ratio of correct predictions over the total number of instances evaluated. |
Kappa | $\kappa = \frac{p_o - p_e}{1 - p_e}$ | (3) | The kappa coefficient quantifies the agreement between the observed and predicted classes beyond that expected by chance [58]. |
AUC | $\mathrm{AUC} = \int_0^1 TPR \, d(FPR)$ | (4) | Indicates how well the predicted probabilities of the positive class are separated from those of the negative class. |
MCC | $\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$ | (5) | A balanced metric for evaluating classification performance on data with varying class sizes [59]; a good indicator for unbalanced prediction problems [60]. |
Sensitivity | $\mathrm{Sensitivity} = \frac{TP}{TP + FN}$ | (6) | The proportion of actual positive instances that were correctly identified. |
Precision | $\mathrm{Precision} = \frac{TP}{TP + FP}$ | (7) | The proportion of predicted positive cases that are truly positive [56]. |
F1 | $F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}}$ | (8) | The harmonic mean of precision and sensitivity [56]. |
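All of these metrics are available in scikit-learn. A minimal sketch computing them on a small hypothetical set of true and predicted class labels (0 = low, 1 = medium, 2 = high arsenic), not the paper's actual test data:

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             matthews_corrcoef,
                             precision_recall_fscore_support)

# Hypothetical labels for illustration only.
y_true = [0, 0, 1, 1, 2, 2, 0, 2, 1, 0]
y_pred = [0, 0, 1, 2, 2, 2, 0, 2, 0, 0]

# Overall metrics (Acc, Kappa, MCC).
acc = accuracy_score(y_true, y_pred)          # 8 of 10 correct -> 0.8
kappa = cohen_kappa_score(y_true, y_pred)     # chance-corrected agreement
mcc = matthews_corrcoef(y_true, y_pred)       # balanced multi-class metric

# Single-class metrics: one precision / sensitivity (recall) / F1 per class.
prec, sens, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1, 2], zero_division=0)
```

AUC for the multi-class case would additionally need predicted probabilities (e.g. `roc_auc_score(y_true, proba, multi_class="ovr")`), so it is not shown here.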
Parameter | Unit | Limit | Surface Water Mean | Surface Water Std. Dev. | Surface Water Min | Surface Water Max | Groundwater Mean | Groundwater Std. Dev. | Groundwater Min | Groundwater Max |
---|---|---|---|---|---|---|---|---|---|---|
pH | - | 6.5–8.5 | 6.35 | 0.77 | 3.90 | 8.51 | 5.73 | 0.58 | 4.23 | 7.30 |
Cond | μS/cm | 2500 | 183.50 | 206.67 | 6.80 | 2040.00 | 245.91 | 140.84 | 83.30 | 1070.00 |
TDS | μg/L | 1,000,000 | 104,169 | 129,714 | 8440 | 2,390,000 | 150,003 | 100,199 | 48,300 | 934,000 |
Turbidity | NTU | 5 | 1312.72 | 11,465.35 | 0.60 | 292,600.00 | 18.17 | 27.36 | 0.20 | 142.00 |
As | μg/L | 10 | 28.51 | 65.88 | 2.00 | 620.00 | 4.23 | 7.59 | 2.00 | 88.29 |
Model | AUC | MCC | Kappa | Acc |
---|---|---|---|---|
LGB | 0.93 | 0.72 | 0.71 | 0.83 |
XGB | 0.93 | 0.75 | 0.75 | 0.86 |
CATB | 0.93 | 0.68 | 0.67 | 0.81 |
ADAB | 0.83 | 0.58 | 0.58 | 0.76 |
GBM | 0.93 | 0.69 | 0.69 | 0.82 |
NGB | 0.92 | 0.69 | 0.68 | 0.82 |
RF | 0.93 | 0.68 | 0.68 | 0.81 |
Class thresholds: Low ≤5 μg/L; Medium >5 to ≤10 μg/L; High >10 μg/L.

Model | Precision (Low) | Sensitivity (Low) | F1 (Low) | Precision (Medium) | Sensitivity (Medium) | F1 (Medium) | Precision (High) | Sensitivity (High) | F1 (High) |
---|---|---|---|---|---|---|---|---|---|
XGB | 0.87 | 0.93 | 0.90 | 0.85 | 0.57 | 0.68 | 0.84 | 0.88 | 0.86 |
LGB | 0.84 | 0.89 | 0.87 | 0.66 | 0.63 | 0.64 | 0.92 | 0.85 | 0.88 |
ADAB | 0.78 | 0.87 | 0.82 | 0.52 | 0.53 | 0.52 | 0.88 | 0.67 | 0.76 |
CATB | 0.84 | 0.89 | 0.86 | 0.62 | 0.53 | 0.57 | 0.86 | 0.83 | 0.84 |
GBM | 0.86 | 0.86 | 0.86 | 0.63 | 0.63 | 0.63 | 0.85 | 0.85 | 0.85 |
NGB | 0.80 | 0.91 | 0.85 | 0.75 | 0.70 | 0.72 | 0.90 | 0.71 | 0.80 |
RF | 0.84 | 0.88 | 0.86 | 0.62 | 0.53 | 0.57 | 0.85 | 0.85 | 0.85 |
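The three arsenic classes used in the single-class evaluation follow directly from the measured As concentration (in μg/L) and the thresholds given in the table. A one-function sketch of that labelling rule:

```python
def arsenic_class(as_ugL: float) -> str:
    """Map an arsenic concentration (μg/L) to its class label.
    Thresholds: low <= 5, medium > 5 to <= 10, high > 10 (10 μg/L
    being the WHO drinking-water guideline value)."""
    if as_ugL <= 5:
        return "low"
    elif as_ugL <= 10:
        return "medium"
    return "high"
```

For example, the maximum groundwater concentration reported above (88.29 μg/L) falls in the high class, while the minimum (2.00 μg/L) falls in the low class.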
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ibrahim, B.; Ewusi, A.; Ahenkorah, I. Assessing the Suitability of Boosting Machine-Learning Algorithms for Classifying Arsenic-Contaminated Waters: A Novel Model-Explainable Approach Using SHapley Additive exPlanations. Water 2022, 14, 3509. https://doi.org/10.3390/w14213509