Kernel-Based Versus Tree-Based Data-Driven Models: On Applying Suspended Sediment Load Estimation
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Area and Data Used
2.2. Used Models
2.2.1. Gaussian Process Regression
2.2.2. Support Vector Regression
2.2.3. M5 Tree
2.2.4. Random Forest
2.2.5. Multilayer Perceptron
2.2.6. Extra Trees
2.2.7. Multi-Search
2.2.8. Reduced Error Pruning Tree (REPTree)
2.2.9. Random Tree
2.3. Evaluation Metrics
3. Results and Discussion
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Nomenclature
Abbreviation | Definition | Abbreviation | Definition |
---|---|---|---|
AI | artificial intelligence | r | correlation coefficient |
ANFIS | adaptive neuro-fuzzy inference system | REPT | reduced error pruning tree |
ANN | artificial neural network | REPTree | reduced error pruning tree |
DT | decision tree | RF | random forest |
GA | genetic algorithm | RMSE | root mean square error |
GP | Gaussian process | RT | random tree |
GPR | Gaussian process regression | SS | suspended sediment |
MAE | mean absolute error | SSL | suspended sediment load |
ML | machine learning | SVM | support vector machine |
MLP | multilayer perceptron | SVR | support vector regression |
MLR | multiple linear regression | WI | Willmott index (index of agreement) |
Scenario Number | Input Parameters |
---|---|
1 | Discharge |
2 | Discharge and Rain1 |
3 | Discharge, Rain1, and Rain2 |
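The three scenarios are nested input sets: each one adds a rainfall variable to the previous one. The Python sketch below shows one way to assemble the corresponding input matrices. It assumes each record is a dict keyed by the column names used in the table; the target key `SSL` and the function name `build_inputs` are hypothetical. Rain1 and Rain2 are not defined further here (for example, their time lag or gauge), so the sketch simply treats them as ready-made columns.

```python
from typing import Dict, List, Sequence, Tuple

# Input columns per scenario, copied from the scenario table above.
SCENARIOS: Dict[int, Tuple[str, ...]] = {
    1: ("Discharge",),
    2: ("Discharge", "Rain1"),
    3: ("Discharge", "Rain1", "Rain2"),
}


def build_inputs(
    records: Sequence[Dict[str, float]], scenario: int, target: str = "SSL"
) -> Tuple[List[List[float]], List[float]]:
    """Return (X, y) for one scenario.

    The default target name "SSL" is a hypothetical column key used for
    illustration only.
    """
    columns = SCENARIOS[scenario]
    X = [[row[c] for c in columns] for row in records]  # one input row per record
    y = [row[target] for row in records]                # matching target values
    return X, y


# Toy record (made-up values) to show the shape of the output.
records = [{"Discharge": 12.0, "Rain1": 3.5, "Rain2": 0.0, "SSL": 140.0}]
print(build_inputs(records, 2))  # → ([[12.0, 3.5]], [140.0])
```

Keeping the scenario definitions in one mapping means every model compared under a given scenario receives exactly the same input columns.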
Method | Key Features |
---|---|
Support vector regression (SVR) | |
Gaussian process regression (GPR) | |
M5 model tree | |
Random forest | |
Reduced error pruning tree (REPTree) | |
Random tree | |
Multi-search | |
Extra trees | |
Multilayer perceptron (MLP) | |
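To illustrate the kernel-based versus tree-based distinction in the title, here is a minimal Python sketch, not the authors' implementation. It pairs a Gaussian process regression predictor (the posterior mean of a zero-mean GP with an assumed squared-exponential kernel, a fixed length scale and a small noise term, solved by Gaussian elimination) with a single regression-tree split chosen by minimizing the sum of squared errors. All function names and hyperparameter values are illustrative assumptions, and no hyperparameter fitting is performed.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def rbf(a: Vector, b: Vector, length_scale: float = 1.0) -> float:
    """Squared-exponential (RBF) kernel; the length scale is an assumed value."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * length_scale ** 2))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gpr_mean(X: Sequence[Vector], y: Sequence[float], x_new: Vector,
             length_scale: float = 1.0, noise: float = 1e-6) -> float:
    """Kernel-based: posterior mean k*^T (K + noise*I)^-1 y of a zero-mean GP."""
    n = len(X)
    K = [[rbf(X[i], X[j], length_scale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, list(y))
    return sum(a * rbf(xi, x_new, length_scale) for a, xi in zip(alpha, X))


def fit_stump(X: Sequence[Vector], y: Sequence[float],
              feature: int = 0) -> Callable[[Vector], float]:
    """Tree-based: one binary split on `feature` minimizing the sum of squared errors."""
    pairs = sorted(zip((row[feature] for row in X), y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, ml, mr)
    if best is None:  # all feature values equal: fall back to the global mean
        mean = sum(y) / len(y)
        return lambda row: mean
    _, threshold, ml, mr = best
    return lambda row: ml if row[feature] <= threshold else mr


# Toy step function (made-up data).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 0.0, 10.0, 10.0]
stump = fit_stump(X, y)
print(stump([0.5]), stump([2.5]))  # → 0.0 10.0
print(gpr_mean(X, y, [1.5]))       # kernel-weighted estimate, not a step
```

On this toy step, the stump reproduces the jump exactly with two leaf means, while the GP estimate varies smoothly with input distance through the kernel. In broad terms, M5, REPTree and random tree grow deeper trees from splits of this kind (M5 additionally fits linear models in its leaves), and random forest and extra trees average many randomized trees.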
Scenario | Model | r | RMSE | MAE | WI |
---|---|---|---|---|---|
1 | Extra tree | 0.498 | 0.052 | 0.013 | 0.590 |
1 | GPR | 0.804 | 0.035 | 0.031 | 0.739 |
1 | M5P | 0.867 | 0.029 | 0.012 | 0.872 |
1 | Multilayer perceptron CS | 0.918 | 0.027 | 0.008 | 0.897 |
1 | Multi-search | 0.827 | 0.029 | 0.014 | 0.857 |
1 | REPT | 0.744 | 0.030 | 0.009 | 0.827 |
1 | RF | 0.770 | 0.057 | 0.015 | 0.705 |
1 | RT | 0.749 | 0.063 | 0.015 | 0.675 |
1 | SVR | 0.948 | 0.011 | 0.003 | 0.965 |
2 | Extra tree | 0.946 | 0.046 | 0.011 | 0.815 |
2 | GPR | 0.802 | 0.034 | 0.030 | 0.745 |
2 | M5P | 0.863 | 0.043 | 0.012 | 0.801 |
2 | Multilayer perceptron CS | 0.861 | 0.044 | 0.011 | 0.797 |
2 | Multi-search | 0.789 | 0.034 | 0.024 | 0.828 |
2 | REPT | 0.742 | 0.068 | 0.019 | 0.646 |
2 | RF | 0.778 | 0.063 | 0.018 | 0.682 |
2 | RT | 0.646 | 0.090 | 0.022 | 0.525 |
2 | SVR | 0.824 | 0.029 | 0.006 | 0.372 |
3 | Extra tree | 0.890 | 0.055 | 0.015 | 0.757 |
3 | GPR | 0.789 | 0.034 | 0.030 | 0.737 |
3 | M5P | 0.879 | 0.059 | 0.016 | 0.734 |
3 | Multilayer perceptron CS | 0.193 | 0.060 | 0.015 | 0.293 |
3 | Multi-search | 0.823 | 0.031 | 0.017 | 0.840 |
3 | REPT | 0.713 | 0.036 | 0.012 | 0.781 |
3 | RF | 0.866 | 0.073 | 0.020 | 0.676 |
3 | RT | 0.921 | 0.104 | 0.023 | 0.596 |
3 | SVR | 0.823 | 0.029 | 0.006 | 0.390 |
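The four columns of this table follow standard definitions: Pearson's correlation coefficient r, the root mean square error (RMSE), the mean absolute error (MAE), and Willmott's index of agreement, WI = 1 − Σ(Pᵢ − Oᵢ)² / Σ(|Pᵢ − Ō| + |Oᵢ − Ō|)², where Oᵢ are the observed values, Pᵢ the estimates and Ō the observed mean. The dependency-free Python sketch below implements these formulas; it is an illustration rather than the authors' evaluation code, and the function names are hypothetical. The RMSE and MAE magnitudes in the table appear to refer to rescaled data, which the sketch does not attempt to reproduce.

```python
import math
from typing import Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def pearson_r(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between observed and estimated values."""
    mo, mp = _mean(obs), _mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    if so == 0 or sp == 0:
        raise ValueError("r is undefined when either series is constant")
    return cov / (so * sp)


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def willmott_index(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Willmott (1981) index of agreement; 1 means perfect agreement."""
    mo = _mean(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den


# Toy series (made-up values).
obs = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 2.0, 3.0, 4.0]
print(round(pearson_r(obs, pred), 3))       # → 0.944
print(rmse(obs, pred), mae(obs, pred))      # → 0.5 0.25
print(round(willmott_index(obs, pred), 3))  # → 0.933
```

Unlike r, WI also reflects systematic bias: adding a constant offset to every estimate leaves r at 1 but pulls WI below 1.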
Statistic | Observed | GPR (Scenario 2) | SVR (Scenario 1) | M5P (Scenario 1) | RF (Scenario 3) | MLP (Scenario 1) | Extra Trees (Scenario 2) | Multi-Search (Scenario 1) | Random Tree (Scenario 3) | REPTree (Scenario 1) |
---|---|---|---|---|---|---|---|---|---|---|
Maximum | 126,173 | 109,572 | 123,802 | 198,747 | 357,175 | 189,734 | 268,000 | 164,595 | 474,336 | 106,726 |
Minimum | 0 | 7115 | 0 | 0 | 475 | 0 | 0 | 0 | 0 | 949 |
Mean | 3453 | 16,647 | 2785 | 8193 | 11,733 | 5518 | 7305 | 9699 | 12,655 | 6535 |
Diff. * | - | 13,194 | −667 | 4739 | 8280 | 2064 | 3851 | 6246 | 9202 | 3081 |
Standard deviation (SD) | 15,517 | 13,324 | 13,034 | 23,958 | 46,136 | 25,436 | 35,529 | 21,277 | 62,406 | 20,753 |
Coefficient of variation (CV) | 4.49 | 0.80 | 4.68 | 2.92 | 3.93 | 4.61 | 4.86 | 2.19 | 4.93 | 3.17 |
* Diff.: mean of the estimates minus the observed mean.
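The summary rows of this table are linked by simple arithmetic: CV is the standard deviation divided by the mean, and Diff. is the mean of the estimates minus the observed mean. A small sketch using Python's standard-library `statistics` module can compute these rows from raw series and cross-check the reported values. It assumes the sample standard deviation (n − 1 denominator), which the table does not state, and the function names are illustrative.

```python
import statistics
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Max, min, mean, SD and CV of one series (observed or estimated)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (assumption); statistics.pstdev gives population SD
    return {
        "max": max(values),
        "min": min(values),
        "mean": mean,
        "sd": sd,
        "cv": sd / mean,  # coefficient of variation, dimensionless
    }


def mean_difference(observed: Sequence[float], estimated: Sequence[float]) -> float:
    """Diff. row: mean of the estimates minus the observed mean."""
    return statistics.mean(estimated) - statistics.mean(observed)


# Cross-check against reported values (Observed and GPR columns).
print(round(15517 / 3453, 2))   # CV of observed → 4.49
print(round(13324 / 16647, 2))  # CV of GPR → 0.8
print(16647 - 3453)             # Diff. for GPR → 13194
```

Recomputing Diff. from the rounded means can differ from the reported value by one unit (for SVR, 2785 − 3453 = −668 against the reported −667), presumably because the differences were taken before the means were rounded.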
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Sattari, M.T.; Apaydin, H.; Milweski, A. Kernel-Based Versus Tree-Based Data-Driven Models: On Applying Suspended Sediment Load Estimation. Water 2024, 16, 2973. https://doi.org/10.3390/w16202973