1. Introduction
Scour around bridge abutments represents a critical structural safety concern in hydraulic engineering. The removal of bed material around the abutments can compromise the integrity of the bridge foundation, potentially leading to structural instability or collapse over time [1]. Reaching the equilibrium scour depth around bridge abutments takes a very long time, often many days or even months. Since floods typically last much less than the time required for scour holes to reach equilibrium, studying the time variation in scour depth is crucial. In this study, the time development of scour holes around abutments was investigated by machine learning methods: rather than only the equilibrium scour depth, the scour depth at each given time was estimated by machine learning techniques. The estimated values were compared with experimental data gathered from the literature.
Traditionally, the prediction of scour depth has relied on empirical formulas and physical modeling techniques. Foundational research by Melville and Coleman introduced key empirical correlations based on hydraulic parameters for estimating scour depth [2]. In addition, guidelines such as the HEC-18 manual issued by the Federal Highway Administration have been widely adopted in engineering practice. However, these conventional methods often fall short in adapting to diverse site conditions and may lack precision in their predictive capabilities [3].
With the increasing use of artificial intelligence (AI) and machine learning (ML), more advanced approaches have come into consideration for scour depth estimation. The studies performed by Lee et al. and Azamathulla et al. have demonstrated that models such as Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs) are effective in capturing the complex and nonlinear relationships between hydraulic factors and scour depth [4,5]. Moreover, ensemble learning methods like Random Forest and XGBoost have demonstrated enhanced performance in terms of both accuracy and generalization [6].
Hamidifar et al. demonstrated that hybrid models, which integrate empirical equations with machine learning algorithms, yield more reliable scour depth predictions [7]. Similarly, Khan et al. found that AI-based methods outperformed traditional approaches in terms of prediction accuracy. Collectively, these findings underscore the increasing relevance and effectiveness of AI-powered models in modern hydraulic engineering practices [8].
Mohammadpour et al. utilized both Adaptive Neuro-Fuzzy Inference System (ANFIS) and Artificial Neural Network (ANN) techniques to estimate the temporal development of scour depth around structural foundations [9]. Similarly, Choi et al. recommended the use of the ANFIS approach for predicting scour depth around bridge piers [10]. In another study, Hoang et al. developed a machine learning model based on Support Vector Regression (SVR) to forecast erosion on complex scaffolds subjected to steady clear-water conditions [11]. Abdollahpour et al. applied Gene Expression Programming (GEP) and Support Vector Machine (SVM) algorithms to investigate the geometric characteristics of erosion downstream of W-shaped dams in meandering channels [12]. Furthermore, Sharafati et al. introduced a hybrid modeling framework that combines Invasive Weed Optimization (IWO), Cultural Algorithm, Biogeography-Based Optimization (BBO), and Teaching-Learning-Based Optimization (TLBO) with ANFIS to estimate the maximum scour depth [13]. Ali and Günal implemented ANN-based prediction models on various datasets to improve erosion forecasting performance [14]. Lastly, Rathod and Manekar applied a Gene Expression Programming (GEP) technique rooted in artificial intelligence to design a comprehensive and adaptable erosion model suitable for both laboratory experiments and real-world field conditions [15].
Accurate prediction of scour depth (Ds) is essential to ensure the structural safety and longevity of bridges under clear-water flow conditions. Although various approaches, such as empirical formulas and physical modeling techniques, have been developed for estimating scour depth in such environments, these methods often suffer from limited generalizability. Their applicability is typically constrained by site-specific conditions and the relatively small datasets on which they are based.
In recent years, significant progress in artificial intelligence (AI) and machine learning has enabled the accurate modeling of complex hydraulic phenomena. These data-driven approaches are particularly effective at identifying nonlinear relationships among variables by leveraging large datasets, thereby offering more reliable and robust predictions. Traditionally, scour depth estimation has been achieved using empirical formulas or hydraulic laboratory experiments. However, these methods often fail to fully capture complex hydrodynamic processes and are difficult to model scour dynamics that vary over different timescales. In recent years, machine learning techniques have been increasingly used in hydraulic engineering problems due to their ability to provide highly accurate predictions.
The main purpose of this study is to determine the most suitable machine learning model for predicting the time-dependent scour depth (Ds) around bridge abutments. Unlike previous studies, a comprehensive comparison of seven machine learning algorithms, namely Linear Regression (LR), Random Forest Regressor (RFR), Support Vector Regression (SVR), Gradient Boosting Regression (GBR), XGBoost, LightGBM (LGBM), and K-Nearest Neighbors (KNN), was conducted using a dataset compiled from the literature. Key hydraulic parameters, including flow depth (Y), abutment length (L), channel width (B), flow velocity (V), time (t), and median grain size (d50), were used as inputs to predict Ds. The models were evaluated using standard metrics such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the Coefficient of Determination (R2), and their predictions were compared against experimental data.
The study’s findings will provide data-driven guidance to decision-makers in bridge design and risk management and establish a new reference for future research.
3. Results
In this study, a range of machine learning algorithms, including Linear Regression (LR), Support Vector Regression (SVR), Random Forest Regressor (RFR), XGBoost, Gradient Boosting (GB), LightGBM (LGBM), and K-Nearest Neighbors (KNN), was implemented to estimate the time variation in scour depth (Ds) around bridge abutments, using laboratory measurements reported in the literature.
Before model implementation, the dataset was subjected to comprehensive preprocessing procedures to improve data quality and modeling efficiency:
A descriptive statistical analysis was carried out to identify any anomalies or potential data entry errors. Entries with less than 5% missing values were imputed using the mean of the respective features, whereas rows with substantial missing data were omitted from further analysis.
Min–Max normalization was applied to the datasets used for SVR and KNN, as these models are sensitive to feature scaling. For tree-based models such as RFR, XGBoost, GB, and LGBM, normalization is not strictly required, since these models are inherently scale-invariant (the effect of normalization on the tree-based models is examined separately in Table 3).
Outlier detection was performed using the Interquartile Range (IQR) method, and extreme values lying outside the acceptable range were removed to ensure data consistency.
The dataset was divided into training (80%) and testing (20%) subsets to evaluate model performance. Additionally, a 10-fold cross-validation strategy was adopted during training to minimize the risk of overfitting and to ensure model robustness.
Correlation analysis and SHAP (SHapley Additive Explanations) values were computed to identify the impact of each feature on the target variable. Features with a correlation coefficient below 0.1 were excluded from the training phase. Furthermore, Recursive Feature Elimination (RFE) was employed to select the most influential variables, enhancing model interpretability and performance.
These preprocessing steps ensured that the dataset was cleaned, normalized where necessary, and optimized for each regression technique. As a result, the reliability, accuracy, and generalization capabilities of the developed models were significantly improved.
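A minimal, self-contained sketch of two of the steps described above, Min–Max normalization and IQR-based outlier removal, is given below. This is a plain-Python illustration, not the exact pipeline used in the study, and the sample velocity values are hypothetical:

```python
# Sketch of two preprocessing steps: Min–Max normalization and
# IQR-based outlier removal (pure Python, hypothetical sample data).

def min_max_scale(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant feature: map to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    s = sorted(values)

    def quantile(q):                   # linear-interpolation quantile
        pos = q * (len(s) - 1)
        lo_i, frac = int(pos), pos - int(pos)
        hi_i = min(lo_i + 1, len(s) - 1)
        return s[lo_i] + frac * (s[hi_i] - s[lo_i])

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]

velocities = [0.41, 0.52, 0.64, 0.58, 0.47, 3.90]  # 3.90 is an outlier
cleaned = iqr_filter(velocities)                   # outlier removed
scaled = min_max_scale(cleaned)                    # mapped to [0, 1]
```

In practice these operations would be applied per feature column; libraries such as scikit-learn provide equivalent, vectorized implementations.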
The dataset used for developing the machine learning models was divided into two groups, with 80% for training and 20% for testing. Each model was trained on the training portion and evaluated using the test set. The performance of the models was assessed using several key metrics: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and the Coefficient of Determination (R2).
MSE measures the average of the squared differences between predicted and experimental values, serving as an indicator of the model’s overall predictive accuracy. RMSE, calculated as the square root of the MSE, is expressed in the same units as the target variable; because the errors are squared before averaging, it weights large deviations more heavily than small ones, making it particularly sensitive to significant prediction errors.
R2 evaluates the proportion of variance in the dependent variable that is explained by the model’s predictions. It is computed from the ratio of the residual sum of squares to the total sum of squares of the actual values. R2 values typically range from 0 to 1, where a value closer to 1 indicates a better model fit and stronger predictive performance.
MAE provides the absolute differences between actual and predicted values, reflecting the average magnitude of errors. MAPE, on the other hand, expresses these errors as a percentage, offering a relative measure of accuracy that is particularly useful when comparing across datasets with different scales.
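The five metrics can be written directly from their definitions. The sketch below is a plain-Python illustration; the measured and predicted values are hypothetical:

```python
# The evaluation metrics implemented directly from their definitions.
import math

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    # Relative error in percent; assumes no zero targets.
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

measured  = [2.0, 4.0, 6.0, 8.0]   # hypothetical scour depths (cm)
predicted = [2.1, 3.9, 6.2, 7.8]
```

Note that RMSE is always at least as large as MAE for the same predictions, which is why it is the stricter of the two absolute-error measures.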
The predicted time variation in scour depth and the experimentally measured values were compared, and the similarity between the measured and estimated data for each model is summarized in Table 2.
An analysis of the metrics presented in Table 2 shows the similarity levels achieved by the machine learning models: LR (57.39%), SVR (79.78%), RFR (99.73%), XGBoost (98.21%), GBR (99.18%), LGBM (97.24%), and KNN (99.68%). The results demonstrate the effectiveness of machine learning approaches in predicting scour depth (Ds) around bridge abutments. Among the models used, RFR showed the closest agreement between predictions and experimental data, with 99.73% similarity, followed by KNN and GBR with 99.68% and 99.18%, respectively. These very high scores highlight the strong capability of such models to capture complex, nonlinear interactions between hydraulic variables. They benefit from advanced learning strategies, such as gradient boosting and bagging, which enhance generalization performance and mitigate overfitting, even with limited data.
While SVR and LGBM models also showed reasonable performance, with accuracy rates of 79.78% and 97.24%, respectively, their performance was slightly lower compared to the ensemble models. Notably, LR showed the lowest accuracy of 57.39%, confirming that traditional linear models may be insufficient for modeling the nonlinear dynamics inherent in scour processes.
To ensure a fair and consistent comparison across all machine learning algorithms, a uniform preprocessing pipeline was applied to the entire dataset. This included Min–Max normalization, which scales all features to a common range (0, 1). Although tree-based models (RFR, XGBoost, GBR, LGBM) are theoretically invariant to feature scaling, normalization is essential for distance-based algorithms like KNN and SVR to perform optimally. To empirically confirm that normalization does not detrimentally affect the tree-based models on this specific dataset, a comparative analysis was conducted. The performance of the tree-based models was evaluated on both the raw (non-normalized) and normalized datasets. The results, presented in Table 3, demonstrate that the difference in performance is negligible. For instance, the R2 for the RFR model changed from 0.9955 on raw data to 0.9956 on normalized data, a difference of merely 0.0001. Similar minuscule variations were observed for all other tree-based models and for the RMSE metric. This confirms that normalization has no statistically significant or practical impact on the predictive performance of tree-based algorithms for this problem. Therefore, applying a uniform preprocessing step to all data ensures methodological consistency and fair comparability, without compromising the performance of any model family.
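The scale invariance of tree-based models can be illustrated with a single threshold split, the building block of every decision tree: the split that minimizes squared error induces the same partition of the data before and after Min–Max scaling, so the predictions are identical. The toy data below are hypothetical:

```python
# Why monotonic scaling leaves tree-based models unchanged: the best
# threshold split induces the same data partition on raw and scaled
# inputs, so the leaf means (and hence predictions) are identical.

def best_stump(x, y):
    """Return (threshold, left_mean, right_mean) minimizing SSE."""
    best = None
    xs = sorted(set(x))
    for a, b in zip(xs, xs[1:]):
        thr = (a + b) / 2.0
        left  = [yi for xi, yi in zip(x, y) if xi <= thr]
        right = [yi for xi, yi in zip(x, y) if xi > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((yi - lm) ** 2 for yi in left)
               + sum((yi - rm) ** 2 for yi in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    return best[1:]

def predict(stump, xi):
    thr, lm, rm = stump
    return lm if xi <= thr else rm

velocity = [0.3, 0.4, 0.5, 0.6, 0.7]           # raw feature (m/s)
scour    = [1.0, 1.2, 3.8, 4.1, 4.3]           # hypothetical target (cm)
scaled   = [(v - 0.3) / 0.4 for v in velocity]  # Min–Max to [0, 1]

raw_pred    = [predict(best_stump(velocity, scour), v) for v in velocity]
scaled_pred = [predict(best_stump(scaled, scour), v) for v in scaled]
```

Because Min–Max scaling is strictly monotonic, the ordering of the feature values is preserved, and the two stumps split the samples into exactly the same two groups.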
A 10-fold cross-validation was applied to evaluate the generalization performance of the models. The mean and standard deviation of the resulting metrics are presented in Table 4. The low standard deviations of the RFR and KNN models (±0.0012 and ±0.0016 for R2, respectively) indicated that their high performance was stable and consistently maintained across different subsets of the dataset. This finding is a strong indication that there was no significant overfitting in these models.
The choice of a 10-fold cross-validation strategy was based on established statistical principles rather than mere convention. As extensively discussed in the foundational machine learning literature, k-fold CV provides a robust estimate of model generalization error, and k = 10 has been shown to offer a favorable bias–variance trade-off for datasets of moderate size, such as ours. A lower k (e.g., 5) leads to a higher bias in the performance estimate, as each training subset represents a smaller portion of the data. Conversely, a very high k (e.g., leave-one-out cross-validation) reduces bias but increases the variance of the estimate and the computational cost, making the estimate less reliable [35,36,37].
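The k-fold partitioning itself can be sketched in a few lines (a stdlib illustration, not the exact implementation used in the study):

```python
# Generating k-fold cross-validation index partitions (stdlib sketch).
import random

def kfold_indices(n_samples, k=10, seed=42):
    """Shuffle sample indices and split them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = kfold_indices(100, k=10)
# Each fold serves once as the validation set while the remaining
# nine form the training set, so every sample is validated exactly once.
```

Libraries such as scikit-learn provide the same functionality (e.g., a `KFold` splitter), but the essential idea is just this disjoint partition of shuffled indices.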
To empirically validate that the results are not sensitive to the specific choice of k, a sensitivity analysis was conducted by performing CV with k = 5, k = 10, and k = 15 on the best-performing model (RFR). The results, summarized in Table 5, demonstrate that the estimated performance is remarkably stable across different fold numbers. The mean R2 values were 0.9952 (±0.0018), 0.9956 (±0.0012), and 0.9954 (±0.0015) for k = 5, k = 10, and k = 15, respectively. The corresponding RMSE values were 0.88 cm (±0.06), 0.86 cm (±0.04), and 0.87 cm (±0.05). The minimal fluctuations observed in these metrics (ΔR2 < 0.0004) are negligible and well within the expected statistical variation. This empirical evidence confirms that the generalization error estimate provided by the 10-fold CV is robust and reliable for our dataset, and the conclusions drawn from it do not depend on this specific hyperparameter of the validation methodology.
The overall ranking of the models, considering all performance metrics (R2, RMSE, MAE, MAPE, accuracy, and cross-validation consistency), is shown in Table 6.
To quantitatively address the robustness of the model performance against the choice of train–test split ratio, a comprehensive sensitivity analysis was conducted. The two top-performing models, RFR and KNN, were evaluated under different data partitioning scenarios: 70/30, 80/20, and 90/10. For each scenario, the models were run 10 times with different random seeds to account for variability, and the average performance metrics along with their standard deviations were recorded. The results, summarized in Table 7, demonstrate that the predictive accuracy of both models remains exceptionally stable and high across all split ratios.
For the RFR model, the average R2 values were 0.9952 (±0.0015), 0.9956 (±0.0012), and 0.9954 (±0.0014) for the 70/30, 80/20, and 90/10 splits, respectively. Similarly, the RMSE values remained consistently low at 0.87 (±0.05), 0.86 (±0.04), and 0.88 (±0.05) for the same splits.
The KNN model showed analogous robustness, with R2 values of 0.9935 (±0.0018), 0.9940 (±0.0016), and 0.9937 (±0.0017), and RMSE values of 2.55 (±0.08), 2.52 (±0.07), and 2.57 (±0.09) for the 70/30, 80/20, and 90/10 splits, respectively.
The minimal fluctuations observed (e.g., ΔR2 < 0.0004 for RFR across splits) are negligible and well within the margin of statistical uncertainty introduced by random sampling. This analysis demonstrates that the reported high performance of the models is not an artifact of a specific data partition but a robust property of the trained models themselves. Therefore, the use of the commonly adopted 80/20 split is justified for this study.
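The seed-repetition procedure behind this sensitivity analysis can be sketched as follows. The metric used here is a trivial placeholder (the fraction of data assigned to training); in the actual analysis, the model would be trained on each split and its test-set R2 returned:

```python
# Sketch of the split-ratio sensitivity check: repeat a random
# train/test split with several seeds and report mean ± std of a metric.
import random
import statistics

def train_test_split_indices(n, test_frac, seed):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_frac))
    return idx[n_test:], idx[:n_test]      # (train, test)

def sensitivity(n, test_frac, metric, seeds=range(10)):
    scores = [metric(*train_test_split_indices(n, test_frac, s))
              for s in seeds]
    return statistics.mean(scores), statistics.stdev(scores)

# Placeholder metric: training fraction. A real run would fit the
# model on the train indices and evaluate R2 on the test indices.
mean_score, std_score = sensitivity(
    3275, 0.20, lambda tr, te: len(tr) / (len(tr) + len(te)))
```

The dataset size of 3275 records matches the study; the placeholder metric is deterministic, so its standard deviation across seeds is zero, whereas a real model metric would show the small seed-to-seed spread reported in Table 7.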
The results of the learning curve analysis for the RFR model are summarized in Table 8. As the training set size increases, both the training and cross-validation (CV) scores converge to approximately 0.995. The gap between these scores decreases monotonically from 0.013 (at 20% of the data) to a negligible 0.0004 (at 100% of the data). This convergence at a high performance level without a significant gap is a strong statistical indicator that the model generalizes well and does not overfit the training data.
Furthermore, model stability was tested by introducing random noise (±5%) to the most important features identified by SHAP analysis (flow velocity and abutment length). As shown in Table 8, perturbing these key features resulted in only a marginal decrease in model performance (ΔR2 ≈ −0.004). Even when all features were perturbed simultaneously, the model maintained a high R2 value of 0.985, demonstrating its robustness and reliance on physically meaningful relationships rather than noise in the data.
The combination of these analyses showing convergence in learning curves and stability under feature perturbation provides compelling evidence that the high predictive accuracy is genuine and not a result of overfitting.
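The ±5% perturbation test can be sketched as a small utility that multiplies selected feature columns by random factors drawn from [0.95, 1.05]. The feature rows below are hypothetical:

```python
# Robustness-check sketch: apply uniform ±5% multiplicative noise to
# selected feature columns, leaving the others untouched.
import random

def perturb(rows, columns, rel_noise=0.05, seed=0):
    """Return a copy of rows with ±rel_noise applied to given columns."""
    rng = random.Random(seed)
    noisy = []
    for row in rows:
        row = dict(row)                     # copy; original left intact
        for col in columns:
            row[col] *= 1.0 + rng.uniform(-rel_noise, rel_noise)
        noisy.append(row)
    return noisy

data = [{"V": 0.64, "L": 10.0, "t": 450.0},
        {"V": 0.47, "L": 15.0, "t": 900.0}]
noisy = perturb(data, columns=["V", "L"])   # perturb key features only
```

The perturbed copy would then be fed to the trained model and the resulting metrics compared against the unperturbed baseline.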
These findings support the conclusion that modern machine learning algorithms, especially ensemble techniques, offer a robust and scalable solution for accurate scour depth prediction. Their superior predictive performance makes them valuable tools in hydraulic engineering applications, where reliable forecasting can significantly contribute to infrastructure safety and maintenance planning.
In this study, Python 3.12.7 was used to implement and run the models. Python is a mature general-purpose programming language widely used in web applications, software development, data science, and machine learning; it is free, cross-platform, efficient to develop in, and integrates well with other systems.
For the testing process, the values 9.48, 10.0, 0.6, 0.50, 0.64, and 450 were entered for Y, L, B, d50, V, and t, respectively. The scour depth (Ds) around the bridge abutments was then estimated using the machine learning models, and several of them achieved high accuracy. The estimated values for all models used in this study are shown in Table 9.
In this study, the scour depth (Ds) around bridge abutments was estimated using various machine learning algorithms by inputting the values Y = 9.48 cm, L = 10 cm, B = 0.6 m, d50 = 0.50 mm, V = 0.64 m/s, and t = 450 s. The experimentally measured reference value of Ds was 8.20 cm, and the model predictions were evaluated based on their proximity to this benchmark.
As shown in Table 9, the Random Forest Regressor (RFR) predicted the Ds as 8.1890 cm, while K-Nearest Neighbors (KNN) produced a value of 8.1600 cm. These estimations are nearly identical to the experimental result, highlighting the high precision and reliability of both models in capturing the complex dynamics of scour formation under the given hydraulic conditions.
Other models, such as Gradient Boosting Regressor (GBR) (8.4704 cm) and XGBoost (7.7566 cm), also yielded relatively accurate predictions, albeit with slightly higher deviation from the true value. These tree-based ensemble methods demonstrated competent generalization, though not as close to the experimental outcome as RFR and KNN.
By contrast, Linear Regression (LR) significantly overestimated the scour depth, predicting a value of 11.7131 cm, which diverged markedly from the observed value. This result suggests that linear models may not adequately reflect the nonlinear behavior of sediment transport and scour processes.
Support Vector Regression (SVR) and LightGBM (LGBM) estimated Ds as 9.3751 cm and 9.8343 cm, respectively. While these predictions were above the reference value, they remained within a plausible range and reflect the potential of these models when optimized further.
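Ranking the point predictions above by their absolute error against the measured 8.20 cm makes the comparison explicit:

```python
# Absolute errors of the reported point predictions against the
# experimentally measured scour depth of 8.20 cm.
measured = 8.20  # cm
predictions = {
    "LR": 11.7131, "RFR": 8.1890, "SVR": 9.3751, "GBR": 8.4704,
    "XGBoost": 7.7566, "LGBM": 9.8343, "KNN": 8.1600,
}
errors = {m: abs(p - measured) for m, p in predictions.items()}
ranking = sorted(errors, key=errors.get)   # smallest error first
```

This reproduces the ordering discussed in the text: RFR and KNN closest to the benchmark, LR farthest from it.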
Overall, the findings in Table 9 indicate that RFR and KNN achieved the best performance in terms of predictive accuracy for the time variation in scour depth.
Figure 4 compares the actual and predicted values of all experimental scour depth (Ds) measurements across seven machine learning models, developed within a time-based predictive framework for bridge abutments. These models are as follows: Linear Regression (LR), Random Forest Regressor (RFR), Support Vector Regression (SVR), Gradient Boosting Regression (GBR), XGBoost, LightGBM (LGBM), and K-Nearest Neighbors (KNN). A dashed 45-degree line denotes the ideal 1:1 reference, indicating perfect agreement between the actual and predicted values.
The Linear Regression (LR) model displays a relatively high level of dispersion, particularly at higher Ds values. This pattern indicates its limited ability to capture the nonlinear relationships in the data, leading to systematic underestimations or overestimations. In contrast, tree-based models such as Random Forest (RFR) and Gradient Boosting (GBR) perform better in the mid-range of Ds values, though they still exhibit deviations from the ideal line, especially in boundary regions. This suggests their capacity to model complex patterns more effectively than linear models, but with limitations at data extremes.
Among all models, XGBoost and LightGBM (LGBM) show the closest alignment with the 1:1 reference line across a broad range of Ds values. The point clusters for these models demonstrate high density near the diagonal, indicating strong predictive accuracy and robustness. Their gradient-boosted architecture allows them to effectively handle nonlinearities and variable interactions, making them the most reliable models in this comparative analysis.
The Support Vector Regression (SVR) model exhibits visible deviations, particularly at higher Ds values, suggesting its sensitivity to hyperparameter tuning and kernel selection. Similarly, the KNN model tends to form scattered clusters and shows inconsistent predictions across the Ds range. This behavior may result from its local approximation nature, which struggles in regions with sparse data distribution, leading to poor generalization.
Overall, proximity to the diagonal line is a visual indicator of prediction quality. When coupled with numerical performance metrics, the figure supports the conclusion that XGBoost, LightGBM, and Gradient Boosting Regression deliver superior performance for time-dependent Ds prediction tasks. These findings highlight the suitability of ensemble tree-based methods for modeling complex hydrodynamic phenomena such as equilibrium scour depth in bridge abutments.
To identify the model providing the highest similarity, the predictions of all models were compared in Figure 5 with the experimental data for flow depth (Y) = 9.48 cm, abutment length (L) = 10 cm, channel width (B) = 0.60 m, median grain size (d50) = 0.50 mm, flow velocity (V) = 0.64 m/s, and time (t) ranging from 0 to 973,800 s. The predictions were generated using seven distinct machine learning models: Linear Regression (LR), Random Forest (RFR), Support Vector Regression (SVR), Gradient Boosting (GBR), XGBoost, LightGBM (LGBM), and K-Nearest Neighbors (KNN).
At the initial time interval (t ≈ 0 to 100,000 s), all models exhibit a steep increase in Ds values, reflecting the rapid scouring process that typically occurs at the onset of flow conditions. Most models, particularly XGBoost, Gradient Boosting, Random Forest (RFR), and KNN, demonstrate a quick convergence toward their respective asymptotic Ds values within the first 0.1 million seconds. This stabilization suggests their effective learning of the temporal saturation behavior of scour depth.
The Linear Regression model (LR) shows a significantly different trend, exhibiting a steady and unrealistic linear increase in Ds over time. Unlike ensemble and kernel-based models, Linear Regression fails to capture the nonlinear saturation curve commonly observed in time-dependent scour processes. This underlines its structural limitation in modeling complex temporal behavior in hydraulic systems.
The Support Vector Regression model (SVR) displays a noticeably delayed increase in Ds, followed by an overshooting trend that peaks near 0.6 million seconds before declining. This nonphysical fluctuation suggests potential overfitting or sensitivity to parameter scaling and kernel configuration. Despite its theoretical capacity for nonlinear modeling, SVR appears less stable than ensemble methods in this context.
Both XGBoost and LightGBM (LGBM) show strong and consistent performance across the full time range. Their predictions plateau at reasonable Ds levels and exhibit minimal oscillation, reflecting high generalization capability and resistance to noise. Gradient Boosting (GBR) follows a similar but slightly more rigid progression, stabilizing earlier and maintaining a nearly constant Ds afterward.
The KNN model closely mimics ensemble models during early time steps but shows minor fluctuations in the later time stages. As a non-parametric model, its behavior is heavily influenced by local sample distributions, which may explain these minor inconsistencies under sparse or shifting temporal conditions.
Overall, models such as XGBoost, LightGBM, and Gradient Boosting provide the most physically realistic and temporally stable predictions of Ds under fixed hydraulic input conditions. Their performance over time aligns well with empirical observations in sediment transport literature, suggesting their high suitability for dynamic scour depth prediction tasks. In contrast, Linear Regression and SVR display inadequate performance due to oversimplification and instability, respectively.
Figure 6 presents a comparative analysis of the prediction accuracy of seven machine learning models—Linear Regression (LR), Random Forest Regressor (RFR), Support Vector Regression (SVR), Gradient Boosting Regression (GBR), XGBoost, LightGBM (LGBM), and K-Nearest Neighbors (KNN)—used for the estimation of the time variation in scour depth (Ds). Two key error metrics are utilized: Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE).
In terms of RMSE (left side of Figure 6), the Linear Regression model shows the highest error, indicating a poor ability to capture the nonlinear relationships inherent in the scour process. On the other hand, Random Forest yields the lowest RMSE, followed closely by XGBoost, highlighting their superior performance in minimizing absolute prediction errors. Models such as SVR, GBR, and LightGBM exhibit moderate RMSE values, while KNN performs comparably well but with slightly higher error than the top-performing ensemble models.
The MAPE metric (right side of Figure 6), which accounts for relative prediction errors, reveals a similar trend. Random Forest again achieves the best performance, with an MAPE below 10%, followed by XGBoost and KNN. In contrast, Linear Regression exhibits the poorest result, with an MAPE exceeding 80%, underscoring its limitations in accurately estimating Ds values across a wide range. Interestingly, LightGBM, while effective in terms of RMSE, shows a relatively higher MAPE, suggesting that it may be more sensitive to errors in lower-value predictions.
Overall, as illustrated in Figure 6, ensemble-based methods, particularly Random Forest and XGBoost, consistently outperform traditional and kernel-based models in both absolute and relative error metrics. These results support their suitability for robust, data-driven modeling of time-dependent scour processes in hydraulic engineering applications.
4. Discussion
This study demonstrates the significant potential of machine learning (ML) models in accurately predicting the time variation in scour depth (Ds) around bridge abutments using experimental data. Among the evaluated algorithms, the ensemble-based Random Forest Regressor (RFR) and the instance-based K-Nearest Neighbors (KNN) achieved the highest prediction accuracy, with estimations nearly identical to the experimental values.
The performance of the most successful models, RFR and KNN, is supported by SHAP and feature importance analyses. Both models identified the parameters with the strongest known physical impact on scour, namely flow velocity (V) and abutment length (L), as the most important variables. RFR’s high performance stems from its ability to capture these complex, nonlinear physical relationships and generalize them by averaging across multiple decision trees. KNN’s success demonstrates the validity of the ‘local similarity’ assumption for this dataset, establishing that similar hydraulic conditions produce similar scour results, and shows that the model can effectively detect these similarities.
Consistency across the dataset compiled from diverse experimental sources was ensured using dimensionless parameters and a rigorous outlier removal process. This methodology eliminated scale effects and allowed the model to learn fundamental hydraulic relationships rather than absolute values. Therefore, the high accuracy achieved provides confidence that the models are learning general physical principles, not artifacts of a particular experimental setup.
Consistent with these results, the highest similarities between experimental data and model predictions were obtained with the RFR model, with some predictions coming very close to 100% similarity. In Figure 7, the longest experimental time series from the authors whose data were used in this study are compared with the corresponding RFR predictions. As seen in Figure 7, the experimental data and the RFR predictions overlap closely for all authors. The figure also clearly shows that the more data available, the better the estimation.
Using the reference value of 8.20 cm, as shown in Table 9, RFR predicted a Ds of 8.1890 cm, while KNN yielded 8.1600 cm, confirming their robustness in modeling nonlinear and time-dependent scour processes.
The superior performance of RFR can be attributed to its ensemble architecture, which aggregates multiple decision trees trained on bootstrapped samples, thereby enhancing generalization and minimizing overfitting. Similarly, KNN’s instance-based learning approach offers flexibility in capturing local variations in the dataset, making it particularly effective in scenarios with dense, high-quality measurements. However, its sensitivity to data sparsity in certain regions suggests that careful preprocessing and distance metric selection are essential for optimal performance.
In contrast, traditional Linear Regression (LR) produced the poorest results, significantly overestimating the scour depth (11.7131 cm) and exhibiting high error rates in both absolute and relative metrics (as visualized in Figure 4, Figure 5 and Figure 6). This underperformance is consistent with its linear structure, which fails to accommodate the inherent nonlinearities of sediment transport and flow-induced scour dynamics.
Models such as XGBoost, Gradient Boosting Regressor (GBR), and LightGBM (LGBM) also demonstrated strong performance, though slightly less precise than RFR and KNN. Their gradient-based learning frameworks allow for efficient handling of complex interactions between input variables, as evidenced by their close alignment with the experimental trend in both static and time-evolving scour predictions (see Figure 6). These models provide an advantageous balance between accuracy, training speed, and interpretability, which are key considerations for real-time engineering applications.
Interestingly, while Support Vector Regression (SVR) theoretically supports nonlinear modeling through kernel transformations, its performance lagged behind that of the ensemble methods. Its overshooting behavior and instability in long-term predictions suggest potential sensitivity to hyperparameter tuning and feature scaling, which must be addressed for improved applicability.
Collectively, the visual and numerical evidence highlights that tree-based ensemble models offer superior generalization, scalability, and robustness for scour prediction tasks. Moreover, the time-dependent evaluations presented in Figure 5 reveal that models such as XGBoost and LGBM successfully capture the initial rapid scour phase and the eventual asymptotic stabilization, mimicking realistic scour behavior observed in field and laboratory studies.
These results not only validate the effectiveness of AI-driven approaches in hydraulic modeling but also emphasize their potential to replace or augment conventional empirical methods. The ability of ML models to learn from diverse datasets and capture complex nonlinear interactions offers a significant advantage for engineering design, risk mitigation, and infrastructure maintenance planning. Future research could focus on integrating hybrid architectures, such as combining physical equations with deep learning, and exploring real-time data assimilation for adaptive scour management.
This study builds upon and significantly expands the authors’ previous work on equilibrium scour depth estimation [17]. The main innovation relative to that work is the estimation of scour depth over time, which is critical information for structural safety, particularly during dynamic events such as floods, where equilibrium conditions cannot be reached. Furthermore, the dataset was expanded from 150 to 3275 records, a time (t) parameter was added, and the analysis scope was broadened by incorporating new algorithms, such as LightGBM (LGBM) and K-Nearest Neighbors (KNN), into the model comparison.
The findings of this study have practical implications for real-time assessment of bridge scour risk. Ensemble models such as RFR and XGBoost stand out as robust predictors that can form the basis of early warning systems due to their high accuracy and stability. For engineers, the dominant influence of flow velocity and abutment size on scour depth reaffirms the importance of prioritizing these parameters in the design phase and monitoring of existing structures.