Quantifying Predictive Uncertainty and Feature Selection in River Bed Load Estimation: A Multi-Model Machine Learning Approach with Particle Swarm Optimization

Le, Xuan-Hien; Huynh, Trung Tin; Song, Mingeun; Lee, Giha

doi:10.3390/w16141945

Open AccessEditor’s ChoiceArticle

Quantifying Predictive Uncertainty and Feature Selection in River Bed Load Estimation: A Multi-Model Machine Learning Approach with Particle Swarm Optimization

by

Xuan-Hien Le

^1,2,*

,

Trung Tin Huynh

³

,

Mingeun Song

¹ and

Giha Lee

^1,*

¹

Department of Advanced Science and Technology Convergence, Kyungpook National University, 2559 Gyeongsang, Sangju 37224, Republic of Korea

²

Faculty of Water Resources Engineering, Thuyloi University, 175 Tay Son, Dong Da, Hanoi 116705, Vietnam

³

Bach Khoa Ho Chi Minh City Technology Joint Stock Company, Ho Chi Minh City University of Technology, VNU-HCMC, Ho Chi Minh 70000, Vietnam

^*

Authors to whom correspondence should be addressed.

Water 2024, 16(14), 1945; https://doi.org/10.3390/w16141945

Submission received: 13 June 2024 / Revised: 25 June 2024 / Accepted: 1 July 2024 / Published: 10 July 2024

(This article belongs to the Special Issue Application of Machine Learning in Hydrologic Sciences)

Download

Browse Figures

Versions Notes

Abstract

This study presents a comprehensive multi-model machine learning (ML) approach to predict river bed load, addressing the challenge of quantifying predictive uncertainty in fluvial geomorphology. Six ML models—random forest (RF), categorical boosting (CAT), extra tree regression (ETR), gradient boosting machine (GBM), Bayesian regression model (BRM), and K-nearest neighbors (KNNs)—were thoroughly evaluated across several performance metrics like root mean square error (RMSE), and correlation coefficient (R). To enhance model training and optimize performance, particle swarm optimization (PSO) was employed for hyperparameter tuning across all the models, leveraging its capability to efficiently explore complex hyperparameter spaces. Our findings indicated that RF, GBM, CAT, and ETR demonstrate superior predictive performance (R score > 0.936), benefiting significantly from PSO. In contrast, BRM displayed lower performance (0.838), indicating challenges with Bayesian approaches. The feature importance analysis, including permutation feature and SHAP values, highlighted the non-linear interdependencies between the variables, with river discharge (Q), bed slope (S), and flow width (W) being the most influential. This study also examined the specific impact of individual variables on model performance by adding and excluding individual variables, which is particularly meaningful when choosing input variables for the model, especially in limited data conditions. Uncertainty quantification through Monte Carlo simulations highlighted the enhanced predictability and reliability of models with larger datasets. The correlation between increased training data and improved model precision was evident in the consistent rise in mean R scores and reduction in standard deviations as the sample size increased. This research underscored the potential of advanced ensemble methods and PSO to mitigate the limitations of single-predictor models and exploit collective model strengths, thereby improving the reliability of predictions in river bed load estimation. The insights from this study provide a valuable framework for future research directions focused on optimizing ensemble configurations for hydro-dynamic modeling.

Keywords:

ensemble techniques; feature sensitivity analysis; Monte Carlo simulation; particle swarm optimization (PSO); predictive uncertainty; river bed load estimation

1. Introduction

Predicting river bed load is a pivotal challenge in fluvial geomorphology, crucial for understanding river dynamics, sediment transport, and their implications across ecological and engineering applications. The task is inherently complex, driven by the stochastic and non-linear nature of sediment transport within diverse riverine systems. This complexity is further enhanced by factors such as variability in river discharge, sediment size distribution, channel morphology, and hydro-dynamic forces, which together complicate the prediction of reliable bed load measurements [1]. Additionally, the direct measurement of bed load during significant flooding events or over extended periods poses substantial difficulties, further exacerbating the challenges in making reliable predictions [2].

Traditional methodologies in sediment transport, including semi-empirical and empirical models [3,4], often necessitate extensive site-specific calibration and face challenges in transferability across diverse river environments, underscoring the limitations inherent in conventional approaches [5]. Numerical models in bed load transport prediction often rely on assumptions like grain diameter and density, which may not accurately represent the mechanics of sediment moving by saltation [6,7]. This has led to a growing recognition that a singular approach may not suffice to encompass the complex interplay of factors influencing bed load transport. This limitation has been highlighted in multi-model evaluations, which reveal that few approaches perform consistently well across broad, multi-region samples, underscoring the challenge of resolving site-specific variation and averaging over temporal and spatial scales [8,9]. Efforts to predict bed load transport are further challenged by the inherent variability in this process, which includes turbulent stress spatial variability, grain protrusion, bed heterogeneity, structural arrangement [10,11,12], and additional complicating factors such as vegetation-induced resistance [4], variability in alluvial cover thickness [13], and the processes of the aggradation and degradation of river beds [14]. These elements provide substantial uncertainty to predictions, often requiring models to be adaptable and robust across a wide range of conditions, especially during significant flood events or over prolonged periods [15].

In this context, advanced machine learning (ML) algorithms emerge as a promising alternative [16,17]. Unlike traditional models, ML approaches offer the flexibility to learn directly from data, capturing complex, non-linear relationships without the reliance on predefined equations [18,19]. Notably, the application of ML methods in environmental monitoring, including the estimation of scour depth [20], precipitation bias correction [21], and rainfall-runoff modeling [22,23], underscores their adaptability to varied contexts. This adaptability is pivotal for river bed load prediction, where the interaction of numerous factors—ranging from hydrological to geomorphological—determines sediment transport rates [24]. For example, Bhattacharya et al. [25] developed two ML models, model trees and artificial neural network (ANN), to estimate total load transport and bed load, utilizing observed samples provided by Gomez and Church [26]. Their analysis reaffirmed the advantages of ML models, particularly in their ability to process complex input parameters. It derived accurate predictions with significantly lower root mean squared errors (RMSEs) compared to traditional models, such as the models of Bagnold [27] and Einstein [28]. Azamathulla et al. [29] compared the adaptive neuro-fuzzy inference system (ANFIS) with four common bed load equations using a dataset based on observed data gathered from four watersheds in Malaysia. This study demonstrated that the ANFIS model can estimate the bed load data more efficiently than a regression-based equation.

In addition, Kitsikoudis et al. [30] and Kitsikoudis et al. [31] made a noteworthy advancement in applying ML to sediment transport by employing genetic algorithm (GA)-based symbolic regression (SR), ANN, and ANFIS models to analyze sediment concentration datasets from field and flume investigations. Their findings illuminated the superior performance of models trained on field data over those trained on flume data. The author concluded that ANNs provided the best outcomes, while the equations derived from SR are simpler and more clear. Roushangar and Koosheh [32] developed a bed load transport rate prediction model for three gravel-bed rivers utilizing support vector regression (SVR). The results showed that SVR models offered better performance than conventional equations. By contrast, the study of Roushangar and Shahnazi [33] pointed out that wavelet kernel extreme learning machine has a higher predictive potential than SVM in estimating bed load for 19 gravel-bed rivers in the USA. In 2020, Asheghi and Hosseini [34] expanded the scope of ML applications in bed load transport by developing three ANN-based prediction models (multilayer perceptions, generalized feed-forward neural network, and radial-based function) using a dataset in Idaho, USA. Despite the limited geographic scope, their study demonstrated the potential of ANNs to outperform existing empirical equations, the coefficient of determination range [0.93–0.98] for ANNs and [0.10–0.21] for empirical equations. Moreover, a block-combined network with a GA to improve ANN performance was proposed by Hosseini et al. [35]. In comparison to previous ANNs and empirical models, this model displayed improved performance accuracy (89.77%) when tested on 879 bed load datasets in Idaho, USA.

Despite the above advances, the application of ML in predicting river bed load across a wide range of fluvial settings remains underexplored. The studies mentioned above, while pioneering, were constrained by the limited size and scope of their datasets. This limitation underscores the need for research employing extensive, globally collected datasets to train and validate a diverse array of ML models. Hosseiny et al. [36] addressed this gap by developing an ANN model, utilizing measurements from 134 rivers to forecast bed load sediment transport rates. This comprehensive dataset enabled the development of a sediment transport model that outperforms current models and eliminates the necessity for site-specific calibration. Compared to other traditional models like Einstein [28], Wilcock and Crowe [37], and Recking [38], the ANN model demonstrated superior accuracy and reliability in reflecting the distribution of the observed bed loads. However, it also highlights a significant gap in the literature—the application of a comprehensive suite of ML models to river bed load prediction. While ANNs have been the focal point of much research, the potential of other ML algorithms, such as random forest (RF), categorical boosting (CAT), extra tree regression (ETR), or K-nearest neighbors (KNNs), remains underexplored in this specific context.

The present study extends previous work by employing a multi-model ML approach, integrating both individual algorithms and state-of-the-art ensemble techniques to predict river bed load, utilizing a comprehensive dataset collected from research by Recking [39]. This approach leverages the collective strengths of several well-established ML models, including RF, CAT, ETR, KNN, gradient boosting machine (GBM), and Bayesian regression model (BRM). Each model has been selected for its demonstrated efficacy in handling complex, non-linear data structures and its potential to contribute unique insights into the predictive challenges associated with river bed load estimation. Furthermore, the research enhances traditional ML approaches by introducing ensemble models that combine predictions from multiple models, aiming to reduce predictive uncertainty and improve accuracy. This innovative aspect examines the performance of ensemble models—such as the weighted averaging ensemble (WAE), stacking ensemble (SE), and voting ensemble (VE)—in achieving more precise bed load estimations than individual models.

The principal purpose of this study is to develop, analyze, and compare several advanced ML algorithms for the prediction of river bed loads. The study also performs a comparative analysis to discern which predictors excel in diverse scenarios and to understand the reasons behind their performance. The primary contributions of this research are as follows:

It presents one of the first comprehensive comparisons of multiple advanced ML models alongside novel ensemble techniques in the context of river bed load prediction, contributing to the methodological advancements in fluvial geomorphology.
The study conducts a detailed analysis of model performance using a range of metrics, offering insights into the strengths and limitations of each ML approach and ensemble approach.
By utilizing feature importance methods and Shapley additive explanation (SHAP) values, the research deepens the understanding of the critical drivers in bed load dynamics.
The uncertainty quantification through Monte Carlo simulations under various data scenarios offers valuable insights into the reliability and robustness of ML predictions and explores the effect of training data size on the accuracy of river bed load predictions.
The study’s findings have practical implications, aiding in more accurate and reliable predictions of river bed loads, which are crucial for effective river management and policy-making.

Following this introduction, Section 2 outlines the methodology, detailing data collection and pre-processing, and provides an overview of the ML models utilized. This section also discusses model training and uncertainty analysis. Section 3, Results and Discussion, presents a comparative performance analysis, interprets feature importance and SHAP value insights, assesses the predictive uncertainty under various training data sizes, and examines the performance of ensemble models. Section 4 summarizes the paper.

2. Materials and Methods

2.1. Data Collection and Pre-Processing

The bed load transport dataset for this study was sourced from BedloadWeb http://en.bedloadweb.com (accessed on 15 February 2024), an online platform accessible to the public that stores statistics on bed loads collected in laboratories and in the field, as well as databases and official reports. According to Recking [39], this database stands out for its global compilation of bed load transport measurements, encompassing over 120 sites worldwide and offering an expansive dataset of approximately 11,000 data points. Each data point in this dataset corresponds to a specific measurement of bed load transport rates.

This dataset integrates a wide variety of physical and hydrological variables, including river discharge (Q), flow width (W), bed slope (S), section averaged velocity (U), mean flow depth (H), hydraulic radius (R_h), and grain size distributions denoted by the 16th, 50th, 84th, and 90th percentiles (D16, D50, D84, and D90). These variables were selected to comprehensively represent the key factors influencing bed load transport, thus providing a robust foundation for developing ML models with widespread applicability across different riverine and geomorphological contexts.

A significant challenge in pre-processing the dataset was the uneven availability of variables across the different river survey locations. Some measurements, such as U, H, R_h, and specific grain size percentiles (D16 and D90), were inconsistently reported. This inconsistency poses a risk of introducing bias or over-reliance on incomplete datasets in ML models, potentially compromising the model’s predictive performance and generalizability. To address this challenge, a strategic decision was made to exclude the variables with high rates of missing data, specifically U, H, and R_h, from the analysis. This strategic decision was supported theoretically by the application of Manning’s equation in river hydraulics, where U, H, and R_h can be expressed as functions of the consistently reported variables Q, S, and W, along with the Manning roughness coefficient (n), generally correlated with grain size. Therefore, the exclusion minimizes potential bias and ensures that the ML models developed could rely on a consistently available set of variables. Here, the relationship between U, H, and R_h with channel geometry and roughness is implicitly maintained through the other included variables.

Furthermore, the survey locations lacking complete data for essential grain size distributions (D16 or D90) were omitted from the dataset. Consequently, the primary variables of interest for this study were refined to seven, including Q, W, S, and the grain size distributions (D16, D50, D84, and D90). In addition, the data underwent a screening process to identify and remove outliers, excluding extreme values that could potentially skew the model training. Specifically, transport data related to the discharge values outside the 95th percentile, extreme bed load flux values above the 95th percentile, and values below the 10th percentile were removed. After applying these steps, the dataset was reduced to 5347 samples, each encompassing a comprehensive and consistent set of variables critical for analyzing bed load transport dynamics. This refined dataset ensures a balanced representation of the physical characteristics relevant to bed load transport, improving the reliability and applicability of the resulting ML algorithms.

Subsequently, the data were log-transformed (base 10) to normalize the distribution of each variable. This transformation helps stabilize variance and make the dataset more model-friendly, particularly for algorithms sensitive to the scale of the input features. Following log transformation, the dataset underwent feature scaling to ensure uniformity in the range of data, and no single variable would dominate the model due to its scale. This scaling is crucial for the models sensitive to the magnitude of input, such as KNN. The scaling adjusted the values to a standard scale of 0 to 1, defined by the following:

X ’ = \log_{10} (X)

(1)

X_{s c a l e d} = \frac{X ’ - \min (X ’)}{\max (X ’) - \min (X ’)}

(2)

where X and X’ denote the original and log-transformed values in the dataset, and X_scaled represents the scaled value.

2.2. Overview of ML Algorithms

2.2.1. Random Forest (RF)

The RF algorithm is a powerful ensemble learning technique utilized for both regression and classification tasks. It works by constructing a large number of decision trees during training time and then outputs the mode of the mean prediction (regression) or classes (classification) of the individual trees. RF introduces randomness into the model in two ways: sampling the data points (bootstrap sampling) and picking feature subsets on every split in the decision trees, thereby increasing the diversity of the models in the ensemble [40]. RF is particularly suited for river bed load estimation due to its ability to handle high-dimensional data and its robustness to overfitting. By aggregating the predictions of numerous trees, RF can identify complex, non-linear correlations between the input factors and the target q_s.

Given a dataset D = ((x₁, y₁), (x₂, y₂), …, (x_n, y_n)) with n samples and each sample x_i having m features, the RF algorithm constructs B decision trees. A random sample of the data D_b is selected for each tree T_b. At each tree node, a subset of m’ < m features is chosen randomly, and the best split on these features is used to partition the data. The procedure is repeated until a stopping requirement is satisfied, for example, a minimum number of samples per leaf. The prediction for a new sample x’ is obtained by averaging the predictions from all the individual trees (T_b):

\overset{⌢}{y} = \frac{1}{B} \sum_{b = 1}^{B} T_{b} (x ’)

(3)

2.2.2. Categorical Boosting (CAT)

CAT is an algorithm that builds upon the principles of gradient boosting and is particularly designed to function effectively with categorical variables alongside continuous ones [41]. It introduces innovative techniques such as ordered boosting and combining categorical features to enhance model performance and reduce overfitting.

Given a dataset D, CAT iteratively constructs an ensemble of decision trees. The key distinction lies in its treatment of categorical features, which are transformed using a statistics-based approach considering the target variable. The prediction model at iteration k is denoted as F_k(x) and is updated by fitting a tree to the pseudo-residuals calculated from the previous model F_k₋₁(x).

F_{k} (x) = F_{k - 1} (x) + α h_{k} (x)

(4)

where h_k(x) represents the decision tree trained on the pseudo-residuals at iteration k, and α is the learning rate.

2.2.3. Gradient Boosting Machine (GBM)

GBM is an ensemble approach that develops models consecutively, with each new model correcting errors from the previous ones. GBM combines numerous weak prediction models, commonly decision trees, to form a robust model stage-wise. The fundamental principle of GBM is to fit a new model to the residuals of the previous models and add it to the ensemble to minimize the overall prediction error [42]. GBM has notable efficacy in predictive tasks such as river bed load estimation, whereby the correlation between the predictors and the response predictor is complicated and non-linear. Its ability to sequentially focus on difficult-to-predict instances makes it a powerful tool for improving prediction accuracy.

Let F_k(x) be the predictive model at iteration k. The GBM algorithm seeks to minimize a loss function L(y, F(x)) over all the samples in the dataset, where y is the observed value and F(x) is the predictive value. The algorithm starts with an initial model F₀(x), often chosen as the mean of the target variable. At each subsequent iteration k, the predictions of the current model F_k₋₁(x) are utilized to fit a decision tree h_k(x) to the loss function’s negative gradient. The model is then updated as follows:

F_{0} (x) = \arg \min_{γ} \sum_{i = 1}^{n} L (y_{i}, γ)

(5)

h_{k} (x) = \arg \min_{h} \sum_{i = 1}^{n} [{- \frac{\partial L (y_{i}, F (x_{i}))}{\partial F (x_{i})}|}_{F (x) = F_{k - 1} (x)}] h (x_{i})

(6)

F_{k} (x) = F_{k - 1} (x) + α h_{k} (x)

(7)

where γ represents the initial model parameters; α is the learning rate, which determines how much each tree contributes to the final model.

2.2.4. Extra Tree Regression (ETR)

ETR is an ensemble method that builds upon the principles of RFs by constructing many unpruned decision trees. The key differences are in how the splits are chosen and how the data are sampled. ETR selects splits completely at random and builds the trees using the whole dataset rather than a bootstrap sample [43]. ETR’s approach of using the entire dataset and selecting splits randomly increases the diversity among the trees in the ensemble, often leading to improved model robustness and generalization capability compared to standard RFs.

Given a dataset D, ETR constructs B decision trees. For each split in each tree, rather than searching for the optimal split point, ETR randomly selects split points and chooses the best one among them:

T_{b} (x) = a g g r e g a t e (\{t (x; θ_{b, i}) |i = 1, \dots, n\})

(8)

where t(x; θ_b_,i) represents the i^th tree in the b^th ensemble, parameterized by θ; and the aggregation is typically the average for regression task.

2.2.5. K-Nearest Neighbors (KNNs)

KNN is a lazy learning method for classification and regression that is non-parametric. It operates on the simple principle of determining the k-nearest training instances in the feature space and producing estimates based on the majority vote (for classification) or average (for regression) of these neighbors [44]. KNN’s simplicity and effectiveness make it particularly useful for river bed load estimation when the connection between the input variables and the objective may not be linear, but a local similarity in the feature space can be exploited.

For a new sample x’, the KNN algorithm uses a distance metric, such as Euclidean, to calculate the distance between x’ and each point x_i in the training set. The algorithm then selects the k nearest points N_k(x’) based on the smallest distances and predicts the output variable by averaging the target values of these nearest neighbors for regression tasks:

d (x ’, x_{i}) = \sqrt{\sum_{j = 1}^{m} {(x ’_{j} - x_{i j})}^{2}}

(9)

\overset{⌢}{y} (x ’) = \frac{1}{k} \sum_{x_{i} \in N_{k} (x ’)} y_{i}

(10)

where m is the number of features.

2.2.6. Bayesian Regression Model (BRM)

BRM represents a probabilistic approach to regression analysis, which contrasts with traditional regression methods by incorporating prior knowledge about the model parameters through probability distributions [45]. BRM leverages computational techniques such as Markov Chain Monte Carlo to determine the posterior distribution of the coefficients from which predictions can be made. These predictions are inherently probabilistic, providing point estimates and confidence intervals that reflect the uncertainty in the model parameters. Incorporating prior information and explicitly modeling parameter uncertainty make BRM especially valuable in contexts where prior knowledge exists or where it is crucial to quantify the uncertainty in predictions.

In contrast to linear regression, which uses a linear function to represent the relationship between a dependent variable (y) and a set of independent variables (X), Bayesian regression models the regression coefficient (β) as random variables with specified prior distributions. The noise term (ε) is typically considered to be normally distributed (mean zero and variance σ²). The core equation in BRM is the posterior distribution of the regression coefficient given the data, expressed as follows:

y = X β + ε

(11)

p (β |y, X) = \frac{p (y |β, X) \cdot p (β)}{p (y |X)}

(12)

where

p (β |y, X)

is the posterior probability of the coefficients given the data, representing our updated belief about the coefficients after observing the data;

p (y |β, X)

represents the likelihood of observing the data given the coefficients, essentially the probability of the data under the model specified by β;

p (β)

represents the prior probability of the coefficients, encapsulating our beliefs about the values of β before observing the data; and

p (y |X)

represents the marginal likelihood of the data, serving as a normalizing constant to ensure the posterior distribution sums to one.

2.2.7. Ensemble Techniques

Ensemble techniques have received substantial interest in ML due to their robustness and superior predictive performance over single-model approaches. They operate on the premise that combining the predictions of multiple models can compensate for individual weaknesses and leverage their diverse strengths, thereby enhancing prediction accuracy and reducing model variance. This subsection delineates three sophisticated ensemble techniques employed in river bed load estimation: weighted averaging ensemble (WAE), stacking ensemble (SE), and voting ensemble (VE).

WAE approach synthesizes predictions from various models by averaging them together and assigning different weights to each model’s output. This method emphasizes the contribution of the more accurate models and downplays that of the less accurate ones. In this study, the weights (w_i) are inversely proportional to the model’s RMSE (RMSE_i), enhancing the ensemble’s predictive performance. Formally, the combined estimation

\hat{y}

for a given input vector X is expressed as follows:

R M S E_{i} = \sqrt{\frac{1}{m} \sum_{j = 1}^{m} {(y_{j} - f_{i} (X_{j}))}^{2}}

(13)

where i represents the i^th model in the ensemble; j represents the j^th observations in the dataset; m denotes the number of observations.

w_{i} = \frac{1}{R M S E_{i}} / \sum_{k = 1}^{n} \frac{1}{R M S E_{k}}

(14)

\hat{y} = \sum_{i = 1}^{n} w_{i} f_{i} (X)

(15)

where i denotes the i^th model for which the weight is being calculated; k represents the k^th model in the total ensemble of n models; f_i(X) denotes the prediction from the i^th model.

SE, also known as stacked generalization, involves training a secondary model, known as a meta-learner, to combine the predictions from multiple base models optimally. The based-level model f_i is trained on the complete training dataset, and their predictions form a new dataset (D_meta), which is used to train the meta-learner (F). The final prediction is obtained by inputting the base model’s predictions into the meta-learner. In the context of this study, linear regression serves as the meta-learner, chosen for its interpretability and efficiency. This technique can be mathematically formalized as follows:

f_{i} = t r a i n (X, y)

(16)

D_{m e t a} = \{(f_{1} (X), f_{2} (X), \dots, f_{n} (X)), y\}

(17)

F = t r a i n (D_{m e t a})

(18)

\hat{y} = F (f_{1} (X), f_{2} (X), \dots, f_{n} (X))

(19)

VE is another ensemble method that combines predictions from multiple models, but unlike WAE, it does not use weighted averages. Instead, VE applies simple voting mechanisms. This method benefits from the diverse perspectives of the constituent models, as it does not assume any prior information regarding the model’s performance, thereby providing a model-agnostic approach to ensemble learning. For regression tasks, the most common form of voting is averaging, where each model in the ensemble votes for a particular value, and the final prediction is the arithmetic mean of these votes:

\hat{y} = \frac{1}{n} \sum_{i = 1}^{n} f_{i} (X)

(20)

2.3. Model Training and Uncertainty Analysis

2.3.1. Hyperparameter Optimization

The model training and cross-validation processes are critical in developing ML models for river bed load prediction. The dataset was subjected to a random shuffling algorithm a thousand times after pre-processing. This rigorous shuffling ensures a homogeneous distribution of data points across the dataset, thereby mitigating any potential biases that could arise from the original ordering of the data. Such a strategy is instrumental in preserving the integrity of the model training and validation processes. Upon data shuffling, the dataset was partitioned into training and testing subsets, with 80% of the data (approximately 4277 samples) allocated for training purposes and the remaining 20% (about 1070 samples) reserved for testing. This division is designed to thoroughly evaluate the model’s predictive abilities and capacity to generalize across novel datasets.

Hyperparameter optimization is a critical step in configuring ML models to achieve optimal performance. For this study, particle swarm optimization (PSO) was employed, diverging from the conventional grid search methodology. PSO, inspired by the social behavior patterns of birds and fish, excels in navigating large and complex search spaces efficiently. The algorithm simulates a swarm of particles moving through the search space, where each particle represents a potential solution, in this case, a set of hyperparameters.

The PSO was initiated by randomly positioning particles within the predefined bounds of possible hyperparameter values. The position of each particle was updated iteratively based on two key aspects: the personal best position of the particle and the global best position found by any particle in the swarm. This dual influence allows the particles to explore the search space thoroughly while converging toward the most promising regions.

The specific objective function for the PSO aimed to minimize the negative mean squared error, assessed through a rigorous 5-fold cross-validation. In each iteration of the cross-validation, four segments of the training dataset were used to train the model, while the fifth segment served as the validation set to evaluate performance. This procedure was repeated five times, rotating the validation segment to ensure each part of the dataset was used exactly once for validation. Such a methodology provided a robust estimation of the model’s predictive accuracy across varied data subsets. The updated equations governing the movement of each particle are expressed as follows:

v_{i}^{t + 1} = w \cdot v_{i}^{t} + c_{1} \cdot r_{1} \cdot (p_{i} - x_{i}^{t}) + c_{2} \cdot r_{2} \cdot (g - x_{i}^{t})

(21)

x_{i}^{t + 1} = x_{i}^{t} + v_{i}^{t + 1}

(22)

where

v_{i}^{t}

is the velocity of particle i at iteration t;

x_{i}^{t}

is the current position (hyperparameters); p_i is the personal best position of the particle; and g is the best position found by any particle in the swarm. The constants w, c₁, and c₂ represent the inertia weight and cognitive and social coefficients, respectively, while r₁ and r₂ are random numbers between 0 and 1. These equations ensure that each particle learns from its own experience and from the successes of neighbors, driving towards convergence on optimal hyperparameters. The optimal hyperparameters identified through PSO for each model are summarized in Table 1.

2.3.2. Feature Importance and SHAP Value Analysis

The permutation feature importance method was employed to quantify the contribution of each input variable in the predictive models. This technique involves randomly shuffling each predictor variable in the dataset and measuring the change in the model’s accuracy [46]. The more significant the drop in accuracy, the more influential the variable is for the model’s prediction capability. This approach is particularly insightful for understanding the relative significance of different physical factors, such as grain size and river discharge distributions, in influencing bed load transport rates.

Additionally, the SHAP values were calculated for each model to provide a deeper understanding of the feature contributions. SHAP values, at their core, serve to decompose the prediction of an ML algorithm into contributions from each predictor, providing a comprehensive perspective through which to interpret the model’s behavior [47]. This method extends the interpretability of models by applying a model-agnostic approach, which means it can be used across different types of ML models, from linear models to complex ensemble trees and deep neural networks.

One remarkably versatile implementation of SHAP is kernel SHAP, which employs a weighted linear regression model to approximate SHAP values for any ML model. This approach is grounded in the idea that the explanation model (a simple linear model of feature contributions) should approximate the output of the complex model as closely as possible for any given input. The kernel SHAP method calculates the SHAP value for a feature by evaluating the change in the prediction that occurs when the feature is “included” in a subset versus when it is “excluded”. The weights for the regression are determined by the SHAP kernel, which measures the distance between the coalition of features included and the complete set of features. Kernel SHAP effectively enables the estimation of SHAP values and emphasizes the importance of understanding model predictions regarding feature contributions.

2.3.3. Uncertainty Quantification

In the context of river bed load dynamics, which are characterized by their inherent complexity and variability, quantifying uncertainty in model predictions is essential. The Monte Carlo method stands out as a fundamental approach for uncertainty assessment, taking advantage of stochastic simulations to explore a variety of outcomes under various scenarios. This technique is instrumental in evaluating the behavior of ML models under different data conditions, shedding light on their robustness and reliability.

The framework for uncertainty quantification was structured around several data scenarios, essentially varying the sample rate of the training data. By adjusting the sample rates (SR) of the original training data (X_train) from 0.3 to 1.0, the analysis captures the model’s performance across a spectrum of data completeness. This approach is designed to gauge how well each model performs under the varying degrees of data completeness, reflecting realistic situations where data limitations are a common challenge. By systematically exploring the impact of training data variability on model performance, this methodology provides a robust framework for assessing the reliability of ML predictions.

For each sample rate SR_i, where i ranges from 0.3 to 1.0 in predefined increments, the study executes 1000 Monte Carlo simulations. These simulations generate a multitude of training subsets (

X_{t r a i n, i}^{j}

), with j denoting the simulation iteration. This process ensures the robustness of the findings, mitigating the influence of specific data samplings. A detailed landscape of each model’s adaptability to quantity and quality is constructed by evaluating model performance across diverse training data scenarios. Following the simulations, the mean (µ) and standard deviation (σ) of the performance metrics (here is R) for each model and sample rate are computed to quantify the model’s predictive performance and its variability.

X_{t r a i n, i}^{j} = S a m p l e (X_{t r a i n}, S R_{i})

(23)

μ_{S R_{i}} = \frac{1}{1000} \sum_{j = 1}^{1000} R (X_{t r a i n, i}^{j})

(24)

σ_{S R_{i}} = \frac{1}{999} \sum_{j = 1}^{1000} (R (X_{t r a i n, i}^{j}) - μ_{S R_{i}})

(25)

2.4. Performance Metrics

In this study, three critical performance metrics were employed to evaluate the effectiveness of ML models in predicting river bed load: RMSE, mean absolute error (MAE), correlation coefficient (R), and Nash–Sutcliffe efficiency (NSE). These metrics were chosen for their widespread acceptance in hydro-dynamic modeling and their ability to provide comprehensive insights into model accuracy, predictive power, and overall performance. In particular, RMSE measures the model’s ability to estimate the quantity of interest. It quantifies the square root of the average squared differences between the observed and estimated values. Lower RMSE (or MAE) values indicate better model performance. NSE is used to assess the predictive skill of a model compared to the mean of the observed data, while R measures the strength and direction of the linear relationship between the observed and predicted values. The following table (Table 2) summarizes these metrics, providing their formulas, range values, and optimal values for reference.

In this table, y_i and

{\hat{y}}_{i}

represent the observed and predicted values;

\bar{y}

is the mean of observed values; and n denotes the number of samples.

3. Results and Discussion

3.1. Model Performance Analysis

Table 3 presents the performance statistics of six ML algorithms across four key metrics: RMSE, MAE, NSE, and R. Figure 1 complements this analysis by providing the comparative scatter plots of model predictions versus the observed data.

Table 3 provides a concise overview of the performance statistics for the six algorithms. An immediate observation is a superior performance among RF, GBM, CAT, and ETR models, which exhibit the lowest MAE and RMSE scores (0.009 and 0.022, respectively), indicating a closer fit to the observed data points. These models also demonstrate high NSE values (above 0.932), signifying their efficiency in capturing the variance of the observed bed load transport rates. Conversely, the BRM exhibits notably higher MAE and RMSE values (0.014 and 0.035, respectively) alongside a considerably lower NSE (0.838). Although BRM can capture the bed load transport process trend, its predictions are less precise and consistent compared to the tree-based and KNN models. This shows the difficulty of using Bayesian methods in this problem domain. The KNN model, with an NSE of 0.925 and RMSE of 0.024, offers a balance between the high accuracy of tree-based models and the lower performance of BRM. These differences underscore the variability in model performance, highlighting the strengths and limitations of each approach in capturing complex fluvial processes.

Figure 1 visually reinforces these quantitative findings by illustrating the degree of alignment between the model predictions and observed data. The scatter plots, particularly for the tree-based models, show a tighter clustering of points along the 1:1 line, reflecting their higher accuracy and efficiency in bed load prediction with R values exceeding 0.936. The plots for KNN, which depict a slightly more dispersed but still closely aligned set of points, further confirm their effectiveness despite a slight decrease in accuracy as indicated by their R score (about 0.9326). In contrast, the BRM, with its unique approach to handling uncertainty and incorporating prior knowledge, tends to exhibit variability in performance depending on the specificity and quality of the input data. This variability is evident in the broader spread of data points in the scatter plot for BRM, resulting in a notably lower R score (0.8384) and visually depicting its lower predictive accuracy.

3.2. Feature Importance and SHAP Value Insights

This section will delve into the critical evaluation of feature importance across the different ML models employed in the study. This analysis is pivotal in understanding how each input variable influences the model’s predictive capabilities. Permutation feature importance scores are initially calculated for each variable across all the models. To enable a comparative analysis across the diverse models, these scores are normalized by dividing each variable’s raw importance score by the sum of all the scores for that model, ensuring that the total normalized importance for each model equals one. This normalization facilitates a direct comparison of the influence of each variable within and across different modeling approaches. Figure 2 graphically represents permutation feature importance scores across the models, and Table 4 lists these scores, providing a basis for the normalization process that results in the values depicted in Figure 2. Subsequent discussions will pivot to SHAP values, as illustrated in Figure 3.

Figure 2 graphically represents these normalized permutation feature importance scores across the models. River discharge (Q) demonstrates high normalized importance across most models, particularly in BRM and GBM, both registering a score of 0.692. This dominance emphasizes the critical role of discharge in influencing bed load transport rates, a finding that aligns with hydrological principles. In addition, this score signifies the substantial influence of Q in these models compared to CAT and KNN, where Q’s normalized importances are markedly lower at 0.469 and 0.440, respectively.

Among the variables, flow width (W) showed notably higher importance in the BRM (0.179) than others, suggesting that BRM attributed a significant portion of the prediction variance to this variable. On the other hand, the ETR assigns the lowest importance to W (0.044). The difference in the scores of W highlights the different weighting of physical properties by the different algorithms. The bed slope (S) emerged as a consistently significant variable in all the models, with KNN having the highest importance (0.161). This finding demonstrates the widespread recognition of the influence of slope on bed load transport regardless of the modeling method. Interestingly, the detailed variables (D16, D50, D84, and D90) show varying importance across models, with D16 and D84 receiving modest importance scores in some instances. The relatively low scores for grain sizes, especially in the BRM, indicate the limited sensitivity of these samples to changes in larger grain sizes. The analysis emphasizes the heterogeneous nature of model responses to different input variables, reflecting the complex interactions between physical factors in river bed load dynamics.

SHAP value distributions, as presented in Figure 3, provide a deeper understanding of variable importance than permutation feature importance scores alone, as they take into account interaction effects between features and provide insight directly into the marginal contribution of each variable to the estimation. For instance, the Q, which emerged as a key variable in the permutation feature importance analysis, consistently shows the highest impact across all the models in the SHAP value analysis as well. This is evidenced by the dense clustering of points with larger SHAP values, signifying that the role of Q in predicting bed load is both significant and stable. Notably, the SHAP values for Q in the BRM display a wider spread (from −0.6 to +0.6), suggesting that the BRM’s predictions are more sensitive to variations in Q than those of the other models (from −0.4 to +0.2). When observing the directionality of the SHAP values for the models, higher values of river discharge lead to an increase in predicted bed load, which aligns with hydrological expectations that higher flows generally carry more sediment. This trend is visually represented by a dense cluster of high-value points extending to the right of the zero line, indicating a positive correlation with the model’s output. Moreover, the distinct spread of SHAP values for Q in the BRM model, as depicted in Figure 3, underscores the model’s unique handling of predictive uncertainty. Unlike more deterministic models, BRM’s broader SHAP value distribution reflects its probabilistic framework, which integrates a range of possible outcomes to accommodate the inherent uncertainties in the input data. This characteristic is particularly evident when compared to the tighter clusters observed in deterministic models like RF and CAT, which are less equipped to directly represent uncertainty in their predictions.

The SHAP analysis also highlights the substantial impact of W and S (but inconsistently for models), corroborating the findings from the importance of the permutation feature. The SHAP values for W in the BRM suggest that higher values of W tend to decrease the predicted bed load (collection of points skewed to the left of 0 for higher feature values), a less clear relationship in the other models. This inverse relationship may reflect the unique way in which BRM integrates this feature, perhaps influenced by its underlying Bayesian framework, which is designed to handle uncertainty differently than other methods. The impact of S is significant in models such as KNN, RF, and ETR, where a higher slope is associated with an increase in the predicted bed load. The SHAP values cluster positively for S, which is consistent with the physical understanding that steeper slopes can often result in more vigorous sediment transport due to increased gravitational forces acting on the particles.

The grain size variables D16, D50, D84, and D90, showed different levels of impact between the models. The CAT and ETR models display more pronounced sensitivity to changes in these features than KNN and BRM. For instance, the CAT model displays a sensitive response to changes in D50, which is not as pronounced in the KNN and BRM. For the RF and GBM, the strong association of larger grain sizes with increased bed load is clearly reflected in the SHAP values, with a trend that emphasizes the correlation between higher D90 values and transport rates.

3.3. Impact of Individual Variables on Model Performance

In this section, we continue to unravel the complexity of predictive modeling for river bed load estimation by examining the impact of sequential feature inclusion and exclusion on model performance. The results are systematically presented in Table 5 and visually interpreted through Figure 4 and Figure 5.

Table 5 summarizes the cumulative performance of the five ML models when additional variables are incrementally introduced into the training process. The models exhibit a consistent trend: starting with using only the Q, the R scores range from 0.508 to 0.554, underscoring the discharge’s dominant role as previously established by permutation importance and SHAP analyses. Including S leads to a notable improvement across all the models, with increased R values indicating a significant slope contribution to model accuracy. Adding W to the Q and S variables further enhances model performance, with R values rising above 0.890 in all the models. This increase highlights the relevance of W despite its lower permutation significance, suggesting its synergistic effect with other variables, especially Q and S.

The trend continues by including mean grain size (D50), where the models reach an R-value range between 0.918 and 0.927. The cumulative impact of sequential feature inclusion reached a plateau with the addition of D90, D84, and D16, where the R values showed marginal improvements or, in some cases, slight declines. This pattern indicates the diminishing returns of adding less influential variables and possibly the onset of model complexity that does not correspondingly enhance performance. Figure 4 visually demonstrates these findings through a stacked bar representation, elucidating the incremental benefits of each added variable. The visualization emphasizes the importance of Q, S, and W as fundamental contributors to the model’s predictive ability.

The line graph in Figure 5 illustrates a marked decline in the R score when Q is removed from the models, underscoring its importance. The RF model, which initially had a robust R score of 0.926 with all the variables included, witnessed a significant decrease to an R score of 0.882 without Q. This pattern is reflected across the GBM, CAT, ETR, and KNN models, although the extent of impact varies, with KNN showing the most significant reduction. Similarly, the exclusion of bed slope S and W yields a clear reduction in model performance, though to a lesser extent compared to Q. The relatively smaller decline in R scores when excluding these variables, such as the decrease from 0.926 to 0.919 for RF when omitting S, indicates that although they are significant, the models may partially compensate for their absence by leveraging information from other correlated variables.

The grain size distribution variables, D16, D50, D84, and D90, present an interesting pattern. When excluded individually, the slight variation in R scores, especially for D84 and D90, implies a lower reliance of the models on these variables to predict bed loads. For instance, the RF model exhibits a negligible change in the R score when D84 is removed, confirming the marginal influence of the variable, which is also suggested by the importance of the permutation feature and SHAP value analysis. In general, these observations reveal a hierarchy of variable importance with Q at the top, followed by S and W, and finally, the grain size distribution variables. The effectiveness of models such as RF and ETR in eliminating individual variables highlights their potential utility in situations where data on specific predictors may be incomplete or challenging to collect.

3.4. Uncertainty Assessment in Predictions

This subsection explores uncertainty in model predictions under varying data availability. As mentioned in Section 3.4, the study performed 1000 Monte Carlo simulations for each sample rate SRi. Table 6 and Figure 6 summarize the results of the Monte Carlo simulation-based sampling processes that illustrate the influence of the training sample size on the R scores of four primary ML models: RF, GBM, CAT, and ETR.

The numbers in Table 6 and Figure 6 showed a consistency in the performance of the models in that as the number of samples from the training dataset increases (ratio from 0.3 to 1.0), the models show a dual trend of increasing average R scores and narrowing standard deviations. This highlights the direct correlation between the amount of training data and the model’s prediction accuracy. The RF model, for instance, displays a commendable growth in the mean R score, starting from 0.9156 at a sample rate of 0.3 and reaching a peak of 0.9366 with the entire dataset. This upward trend in the mean R score is paralleled by a decrease in standard deviation from 0.00535 to 0.0002, illustrating the robustness of the model’s predictive accuracy.

Similarly, the CAT and ETR models exhibit comparable trends with slightly different magnitudes of variation in the mean R score and standard deviation. In particular, the mean R scores gradually increase (from 0.922 to about 0.940). At the same time, their standard deviation gradually decreases (from 0.0024 to just 0.0002), a clear indicator of enhanced performance with more data. The GBM model, though starting with a slightly higher standard deviation at a sample rate of 0.3 (0.0034, exhibits the most pronounced reduction in predictive variability, reaching 0.00003 at a sample rate of 1.0. This dramatic decrease in standard deviation, combined with a rise in the mean R score, emphasizes the enhanced robustness of the model when trained on a more extensive dataset.

Previous subsections discussed individual model performances and the importance of features. Therefore, the joint consideration of mean and variability adds another layer of understanding to model behavior in response to varying training data sizes. It provides a metric of model reliability—models with smaller standard deviations at equivalent mean R scores are considered more reliable, as their predictions are consistently closer to the observed values. In addition, ML models not only become more accurate but also demonstrate reduced variability in their estimations with adequate training data.

3.5. Comparative Analysis of Ensemble Techniques

This subsection provides an examination of the performance of ensemble methods in comparison to individual ML models. The ensemble techniques, WAE, SE, and VE, synthesize the strengths of the RF, GBM, CAT, and ETR models to enhance predictive accuracy and robustness. Table 7 and Figure 7 summarize the main results and illustrate these comparisons.

Table 7 reveals that all three ensemble models achieve remarkably similar performance metrics, with RMSE and MAE values slightly better than those of the strongest individual models. For instance, while RF and GBM individually report an RMSE of 0.0453 (see Table 3), WAE and VE show a slightly improved RMSE of 0.0439. The ensemble models also yield an NSE metric comparable to the individual models, with SE slightly ahead with an NSE of 0.9249. Notably, the R scores for the ensemble models reached 0.9406, slightly higher than the R-value of the individual models. Figure 7 provides a visual corroboration of these findings. The scatter plots for WAE, SE, and VE all show a tight clustering of predictions around the observed data points, where R scores across the ensembles are uniformly high (0.940).

The analysis in this subsection emphasizes the importance of ensemble techniques as an advanced methodological approach to refine prediction accuracy. By integrating predictions from four selected models which have proven effective in their capacities, the ensemble methods minimize individual model biases and leverage the collective predictive strength, thereby improving model reliability.

4. Conclusions

The study systematically analyzed the predictive capabilities of six distinct ML algorithms, delved into the intricate dynamics of feature importance, evaluated the impact of individual variables on model performance, assessed prediction uncertainty, and scrutinized the efficacy of advanced ensemble techniques. This investigation provides a comprehensive understanding of the capabilities of the models and the robust enhancements achieved through ensemble approaches.

The individual models, with RF, CAT, and ETR, were tops in performance and established a high benchmark in terms of RMSE, MAE, NSE, and R metrics. The feature importance analysis, enriched by permutation importance scores and SHAP values, revealed a significant impact of river discharge (Q) on predictions, a finding that corroborated across all the models and resonated with fundamental hydrological principles. Furthermore, bed slope (S) and flow width (W) have been consistently recognized, although with varying degrees of influence.

Sequential feature inclusion and exclusion analyses demonstrated the specific influence of each single variable on model performance. This analysis provided valuable guidance on feature prioritization in situations constrained by data scarcity. Moreover, uncertainty assessment through Monte Carlo simulation has emphasized the important role of data volume in model training and confidence. This directly correlates with increased training data and improved predictive accuracy and reliability.

The ensemble techniques—WAE, SE, and VE—demonstrated their power, not just matching but slightly exceeding the performance of the most accurate individual models. This small improvement in RMSE and R scores is especially notable given the already high performance of individual models, suggesting that the ability of multiple models to integrate multiple predictions can provide an advantage.

This study has comprehensively examined the capabilities and limitations of individual and ensemble models in river bed load estimation. The ensemble methods, in particular, have shown that they can offer a valuable pathway to improve predictive accuracy and reduce uncertainty. The research has contributed to advancing ML applications in hydrology and highlighted the importance of strategic data collection and model selection.

Author Contributions

Formal analysis, X.-H.L. and M.S.; writing—original draft, X.-H.L.; writing—review and editing, T.T.H.; supervision, G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Korea Environmental Industry&Technology Institute (KEITI) through R&D Program for Research and Development on the Technology for Securing the Water Resouces Stability in Response to Future Change, funded by Korea Ministry of Environment (MOE) (RS-2024-00332494),

Data Availability Statement

Data will be made available on request

Conflicts of Interest

Author Trung Tin Huynh was employed by the company Bach Khoa Ho Chi Minh City Technology Joint Stock Company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

ANFIS	Adaptive neuro-fuzzy inference system
ANN	Artificial neural network
BRM	Bayesian regression model
CAT	Categorical boosting
R	Correlation coefficient
(ETR)	Extra tree regression
GA	Genetic algorithm
GBM	Gradient boosting machine
(KNN)	K-nearest neighbors
ML	Machine learning
MAE	Mean absolute error
NSE	Nash–Sutcliffe efficiency
PSO	Particle swarm optimization
RF	Random forest
RMSE	Root mean squared error
SR	Sample rates
SHAP	Shapley additive explanations
SE	Stacking ensemble
SVR	Support vector regression
WAE	Weighted averaging ensemble
VE	Voting ensemble

References

Wang, X.; Gualtieri, C.; Huai, W. Grain shear stress and bed-load transport in open channel flow with emergent vegetation. J. Hydrol. 2023, 618, 129204. [Google Scholar] [CrossRef]
Liu, S.; Wang, D.; Miao, W.; Wang, Z.; Zhang, P.; Li, D. Characteristics of runoff and sediment load during flood events in the Upper Yangtze River, China. J. Hydrol. 2023, 620, 129433. [Google Scholar] [CrossRef]
Moradi, S.; Esmaili, K.; Khodashenas, S.R. Experimental study on velocity distribution and bed load transport in compound channels: Effect of the floodplain’s wall parallel with the meandering main channel. J. Hydrol. 2023, 621, 129571. [Google Scholar] [CrossRef]
Duan, J.G.; Al-Asadi, K. On Bed Form Resistance and Bed Load Transport in Vegetated Channels. Water 2022, 14, 3794. [Google Scholar] [CrossRef]
Cohen, S.; Syvitski, J.; Ashley, T.; Lammers, R.; Fekete, B.; Li, H.-Y. Spatial Trends and Drivers of Bedload and Suspended Sediment Fluxes in Global Rivers. Water Resour. Res. 2022, 58, e2021WR031583. [Google Scholar] [CrossRef]
Wiberg, P.L.; Smith, J.D. Model for Calculating Bed Load Transport of Sediment. J. Hydraul. Eng. 1989, 115, 101–123. [Google Scholar] [CrossRef]
Martínez-Aranda, S.; Fernández-Pato, J.; García-Navarro, P. Non-Equilibrium Bedload Transport Model Applied to Erosive Overtopping Dambreach. Water 2023, 15, 3094. [Google Scholar] [CrossRef]
Recking, A. A comparison between flume and field bed load transport data and consequences for surface-based bed load transport prediction. Water Resour. Res. 2010, 46, W03518. [Google Scholar] [CrossRef]
Recking, A. An analysis of nonlinearity effects on bed load transport prediction. J. Geophys. Res. Earth Surf. 2013, 118, 1264–1281. [Google Scholar] [CrossRef]
Monsalve, A.; Yager, E.M.; Turowski, J.M.; Rickenmann, D. A probabilistic formulation of bed load transport to include spatial variability of flow and surface grain size distributions. Water Resour. Res. 2016, 52, 3579–3598. [Google Scholar] [CrossRef]
Matoušek, V. Modelling Intense Combined Load Transport in Open Channel. Water 2022, 14, 572. [Google Scholar] [CrossRef]
Choi, J.-H.; Jun, K.-W.; Jang, C.-D. Bed-Load Collision Sound Filtering through Separation of Pipe Hydrophone Frequency Bands. Water 2020, 12, 1875. [Google Scholar] [CrossRef]
Mishra, J.; Inoue, T. Alluvial cover on bedrock channels: Applicability of existing models. Earth Surf. Dynam. 2020, 8, 695–716. [Google Scholar] [CrossRef]
Johnson, J.P.L. Gravel threshold of motion: A state function of sediment transport disequilibrium? Earth Surf. Dynam. 2016, 4, 685–703. [Google Scholar] [CrossRef]
Deal, E.; Venditti, J.G.; Benavides, S.J.; Bradley, R.; Zhang, Q.; Kamrin, K.; Perron, J.T. Grain shape effects in bed load sediment transport. Nature 2023, 613, 298–302. [Google Scholar] [CrossRef] [PubMed]
Khosravi, K.; Cooper, J.R.; Daggupati, P.; Thai Pham, B.; Tien Bui, D. Bedload transport rate prediction: Application of novel hybrid data mining techniques. J. Hydrol. 2020, 585, 124774. [Google Scholar] [CrossRef]
Le, X.H.; Eu, S.; Choi, C.; Nguyen, D.H.; Yeon, M.; Lee, G. Machine learning for high-resolution landslide susceptibility mapping: Case study in Inje County, South Korea. Front. Earth Sci. 2023, 11, 1268501. [Google Scholar] [CrossRef]
Le, X.-H.; Le, T.T.H. Predicting maximum scour depth at sluice outlet: A comparative study of machine learning models and empirical equations. Environ. Res. Commun. 2024, 6, 015010. [Google Scholar] [CrossRef]
Meshram, S.G.; Singh, V.P.; Kisi, O.; Karimi, V.; Meshram, C. Application of Artificial Neural Networks, Support Vector Machine and Multiple Model-ANN to Sediment Yield Prediction. Water Resour. Manag. 2020, 34, 4561–4575. [Google Scholar] [CrossRef]
Hien, L.X.; Hien, L.T.T.; Ho, H.V.; Lee, G. Benchmarking the performance and uncertainty of machine learning models in estimating scour depth at sluice outlets. J. Hydroinform. 2024, jh2024297. [Google Scholar] [CrossRef]
Le, X.H.; Nguyen, D.H.; Lee, G. Performance Comparison of Bias-Corrected Satellite Precipitation Products by Various Deep Learning Schemes. IEEE Trans Geosci Remote Sens 2023, 61, 4704012. [Google Scholar] [CrossRef]
Le, X.-H.; Kim, Y.; Van Binh, D.; Jung, S.; Hai Nguyen, D.; Lee, G. Improving rainfall-runoff modeling in the Mekong river basin using bias-corrected satellite precipitation products by convolutional neural networks. J. Hydrol. 2024, 630, 130762. [Google Scholar] [CrossRef]
Ho, H.V.; Nguyen, D.H.; Le, X.-H.; Lee, G. Multi-step-ahead water level forecasting for operating sluice gates in Hai Duong, Vietnam. Environ. Monit. Assess. 2022, 194, 442. [Google Scholar] [CrossRef] [PubMed]
Goldstein, E.B.; Coco, G.; Plant, N.G. A review of machine learning applications to coastal sediment transport and morphodynamics. Earth Sci. Rev. 2019, 194, 97–108. [Google Scholar] [CrossRef]
Bhattacharya, B.; Price, R.K.; Solomatine, D.P. Machine Learning Approach to Modeling Sediment Transport. J. Hydraul. Eng. 2007, 133, 440–450. [Google Scholar] [CrossRef]
Gomez, B.; Church, M. An assessment of bed load sediment transport formulae for gravel bed rivers. Water Resour. Res. 1989, 25, 1161–1186. [Google Scholar] [CrossRef]
Bagnold, R.A. An empirical correlation of bedload transport rates in flumes and natural rivers. Proc. R. Soc. Lond. A Math. Phys. Sci. 1980, 372, 453–473. [Google Scholar] [CrossRef]
Einstein, H.A. The Bed-Load Function for Sediment Transportation in Open Channel Flows; United States Department of Agriculture, U.S. Government Printing Office: Washington, DC, USA, 1950; p. 71.
Azamathulla, H.M.; Chang, C.K.; Ghani, A.A.; Ariffin, J.; Zakaria, N.A.; Abu Hasan, Z. An ANFIS-based approach for predicting the bed load for moderately sized rivers. J. Hydro-Environ. Res. 2009, 3, 35–44. [Google Scholar] [CrossRef]
Kitsikoudis, V.; Sidiropoulos, E.; Hrissanthou, V. Machine Learning Utilization for Bed Load Transport in Gravel-Bed Rivers. Water Resour. Manag. 2014, 28, 3727–3743. [Google Scholar] [CrossRef]
Kitsikoudis, V.; Sidiropoulos, E.; Hrissanthou, V. Assessment of sediment transport approaches for sand-bed rivers by means of machine learning. Hydrol. Sci. J. 2015, 60, 1566–1586. [Google Scholar] [CrossRef]
Roushangar, K.; Koosheh, A. Evaluation of GA-SVR method for modeling bed load transport in gravel-bed rivers. J. Hydrol. 2015, 527, 1142–1152. [Google Scholar] [CrossRef]
Roushangar, K.; Shahnazi, S. Bed load prediction in gravel-bed rivers using wavelet kernel extreme learning machine and meta-heuristic methods. Int. J. Environ. Sci. Technol. 2019, 16, 8197–8208. [Google Scholar] [CrossRef]
Asheghi, R.; Hosseini, S.A. Prediction of bed load sediments using different artificial neural network models. Front. Struct. Civ. Eng. 2020, 14, 374–386. [Google Scholar] [CrossRef]
Hosseini, S.A.; Abbaszadeh Shahri, A.; Asheghi, R. Prediction of bedload transport rate using a block combined network structure. Hydrol. Sci. J. 2022, 67, 117–128. [Google Scholar] [CrossRef]
Hosseiny, H.; Masteller, C.C.; Dale, J.E.; Phillips, C.B. Development of a machine learning model for river bed load. Earth Surf. Dynam. 2023, 11, 681–693. [Google Scholar] [CrossRef]
Wilcock, P.R.; Crowe, J.C. Surface-based Transport Model for Mixed-Size Sediment. J. Hydraul. Eng. 2003, 129, 120–128. [Google Scholar] [CrossRef]
Recking, A. Simple Method for Calculating Reach-Averaged Bed-Load Transport. J. Hydraul. Eng. 2013, 139, 70–75. [Google Scholar] [CrossRef]
Recking, A. BedloadWeb. Available online: https://en.bedloadweb.com/ (accessed on 15 February 2024).
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. arXiv 2018, arXiv:1706.09516. [Google Scholar] [CrossRef]
Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42. [Google Scholar] [CrossRef]
Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27. [Google Scholar] [CrossRef]
Gelman, A.; Hill, J. Data Analysis Using Regression and Multilevel/Hierarchical Models; Cambridge University Press: Cambridge, UK, 2006. [Google Scholar] [CrossRef]
Altmann, A.; Toloşi, L.; Sander, O.; Lengauer, T. Permutation importance: A corrected feature importance measure. Bioinformatics 2010, 26, 1340–1347. [Google Scholar] [CrossRef] [PubMed]
Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 4768–4777. [Google Scholar]

Figure 1. Comparative scatter plots of the ML model predictions vs. observed data. Each figure features two distinct distribution lines along the margins. The distribution of the observed values is green, while the distribution for the predicted values is blue.

Figure 2. Permutation feature importance scores across different ML models.

Figure 3. SHAP value distributions for feature impact analysis in the six models. Each dot represents the impact of a feature on the model’s output, with color indicating the feature value (high or low). The horizontal placement illustrates whether the impact of that value is connected with a higher or lower estimation, allowing for a quick comparison of feature importance across the models.

Figure 4. The stacked bar for accumulated performance with sequential feature inclusion.

Figure 5. Line plot for the R scores of models with the individual exclusion.

Figure 6. Impact of training sample size on R score for ML models.

Figure 7. Comparative scatter plots of the ensemble model predictions vs. observed data. Each figure features two distinct distribution lines along the margins. The distribution of observed values is green, while the distribution for the predicted values is blue.

Table 1. Optimal hyperparameters for 6 ML models using PSO.

Model	Hyperparameter	Ranges	Optimal Value
RF	n_estimators	[50, 500]	472
	max_depth	[1, 50]	29
	min_sample_leaf	[1, 10]	4
CAT	learning_rate	[0.001, 0.5]	0.045
	iterations	[50, 500]	461
	depth	[1, 16]	15
GBM	learning_rate	[0.001, 0.5]	0.033
	n_estimators	[50, 500]	406
	max_depth	[1, 50]	6
	min_sample_leaf	[1, 10]	6
ETR	n_estimators	[50, 500]	183
	max_depth	[1, 50]	17
	min_sample_leaf	[1, 10]	3
KNN	n_neighbors	[5, 200]	6
BRM	alpha_1	[1 × 10⁻⁶, 1 × 10⁻³]	4.82 × 10⁻⁴
BRM	lambda_1	[1 × 10⁻⁶, 1 × 10⁻³]	1 × 10⁻³

Table 2. Performance metrics for evaluating models in river bed load prediction.

Criteria	Description	Range Values
RMSE	$\sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}$	RMSE ≥ 0
MAE	$\frac{1}{n} \sum_{i = 1}^{n} \|y_{i} - {\hat{y}}_{i}\|$	MAE ≥ 0
NSE	$1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - {\bar{y}}_{i})}^{2}}$	1 ≥ NSE ≥ −∞
R	$\frac{\sum_{i = 1}^{n} (y_{i} - {\bar{y}}_{i}) ({\hat{y}}_{i} - {\bar{\hat{y}}}_{i})}{\sqrt{\sum_{i = 1}^{n} {(y_{i} - {\bar{y}}_{i})}^{2} \sum_{i = 1}^{n} {({\hat{y}}_{i} - {\bar{\hat{y}}}_{i})}^{2}}}$	1 ≥ R ≥ −1

Table 3. Performance statistics of ML algorithms.

Criteria	RF	GBM	CAT	ETR	KNN	BRM
RMSE	0.0224	0.0224	0.0225	0.0221	0.0236	0.0347
MAE	0.0087	0.0087	0.0086	0.0088	0.0093	0.0140
NSE	0.9323	0.9322	0.9318	0.9338	0.9246	0.8379
R	0.9360	0.9357	0.9403	0.9391	0.9326	0.8384

Table 4. Normalized permutation feature importance scores of ML algorithms.

Variable	RF	GBM	CAT	ETR	KNN	BRM
W	0.050	0.059	0.102	0.044	0.081	0.179
S	0.139	0.121	0.107	0.087	0.161	0.095
Q	0.624	0.692	0.469	0.654	0.440	0.692
D16	0.013	0.007	0.065	0.041	0.107	0.004
D50	0.038	0.031	0.093	0.080	0.109	0.015
D84	0.026	0.019	0.067	0.054	0.063	0.002
D90	0.111	0.069	0.097	0.039	0.040	0.013

Table 5. Accumulated model performance with sequential feature inclusion.

Model	Q	Q + S	Q + S+W	Q + S + W + D50	Q + S + W + D50 + D90	Q + S + W + D50 + D90 + D84	Q + S + W + D50 + D90 + D84 + D16
RF	0.563	0.891	0.911	0.936	0.936	0.936	0.936
GBM	0.581	0.889	0.917	0.938	0.936	0.937	0.936
CAT	0.584	0.886	0.924	0.940	0.941	0.941	0.940
ETR	0.577	0.884	0.919	0.939	0.939	0.939	0.939
KNN	0.533	0.863	0.888	0.933	0.934	0.933	0.933

Table 6. Comparative R scores for ML models across various training sample sizes.

Metric	Model	Sample Rate from Training Dataset
Metric	Model	0.3	0.4	0.5	0.6	0.7	0.8	0.9	1
Mean	RF	0.9156	0.9215	0.9256	0.9287	0.9312	0.9333	0.9350	0.9366
	GBM	0.9185	0.9238	0.9274	0.9300	0.9320	0.9337	0.9351	0.9358
	CAT	0.9239	0.9289	0.9323	0.9348	0.9366	0.9381	0.9392	0.9405
	ETR	0.9216	0.9268	0.9303	0.9328	0.9348	0.9364	0.9378	0.9391
STD	RF	0.0027	0.0023	0.0019	0.0018	0.0015	0.0012	0.0009	0.0002
	GBM	0.0034	0.0028	0.0023	0.0021	0.0018	0.0015	0.0011	3.1 × 10⁻⁵
	CAT	0.0033	0.0027	0.0023	0.0020	0.0016	0.0013	0.0009	0.0003
	ETR	0.0024	0.0020	0.0017	0.0015	0.0013	0.0010	0.0007	0.0002

Note(s): Here STD means standard deviation.

Table 7. Performance statistics of ensemble models.

Metrics	Weighted Average Ensemble (WAE)	Stacked Ensemble (SE)	Voting Ensemble (VE)
RMSE	0.0439	0.0437	0.0439
MAE	0.0338	0.0336	0.0338
NSE	0.9242	0.9249	0.9241
R	0.9400	0.9406	0.9400

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Le, X.-H.; Huynh, T.T.; Song, M.; Lee, G. Quantifying Predictive Uncertainty and Feature Selection in River Bed Load Estimation: A Multi-Model Machine Learning Approach with Particle Swarm Optimization. Water 2024, 16, 1945. https://doi.org/10.3390/w16141945

AMA Style

Le X-H, Huynh TT, Song M, Lee G. Quantifying Predictive Uncertainty and Feature Selection in River Bed Load Estimation: A Multi-Model Machine Learning Approach with Particle Swarm Optimization. Water. 2024; 16(14):1945. https://doi.org/10.3390/w16141945

Chicago/Turabian Style

Le, Xuan-Hien, Trung Tin Huynh, Mingeun Song, and Giha Lee. 2024. "Quantifying Predictive Uncertainty and Feature Selection in River Bed Load Estimation: A Multi-Model Machine Learning Approach with Particle Swarm Optimization" Water 16, no. 14: 1945. https://doi.org/10.3390/w16141945

APA Style

Le, X.-H., Huynh, T. T., Song, M., & Lee, G. (2024). Quantifying Predictive Uncertainty and Feature Selection in River Bed Load Estimation: A Multi-Model Machine Learning Approach with Particle Swarm Optimization. Water, 16(14), 1945. https://doi.org/10.3390/w16141945

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Quantifying Predictive Uncertainty and Feature Selection in River Bed Load Estimation: A Multi-Model Machine Learning Approach with Particle Swarm Optimization

Abstract

1. Introduction

2. Materials and Methods

2.1. Data Collection and Pre-Processing

2.2. Overview of ML Algorithms

2.2.1. Random Forest (RF)

2.2.2. Categorical Boosting (CAT)

2.2.3. Gradient Boosting Machine (GBM)

2.2.4. Extra Tree Regression (ETR)

2.2.5. K-Nearest Neighbors (KNNs)

2.2.6. Bayesian Regression Model (BRM)

2.2.7. Ensemble Techniques

2.3. Model Training and Uncertainty Analysis

2.3.1. Hyperparameter Optimization

2.3.2. Feature Importance and SHAP Value Analysis

2.3.3. Uncertainty Quantification

2.4. Performance Metrics

3. Results and Discussion

3.1. Model Performance Analysis

3.2. Feature Importance and SHAP Value Insights

3.3. Impact of Individual Variables on Model Performance

3.4. Uncertainty Assessment in Predictions

3.5. Comparative Analysis of Ensemble Techniques

4. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI