Article

Dam Deformation Prediction Considering the Seasonal Fluctuations Using Ensemble Learning Algorithm

1 The National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing 210024, China
2 College of Water Conservancy and Hydropower Engineering, Hohai University, Nanjing 210024, China
3 Powerchina Kunming Engineering Co., Ltd., Kunming 650051, China
4 Yunnan Key Laboratory of Water Conservancy and Hydropower Engineering Safety, Kunming 650051, China
5 Cooperative Innovation Center for Water Safety and Hydro Science, Hohai University, Nanjing 210024, China
* Author to whom correspondence should be addressed.
Buildings 2024, 14(7), 2163; https://doi.org/10.3390/buildings14072163
Submission received: 7 June 2024 / Revised: 23 June 2024 / Accepted: 12 July 2024 / Published: 14 July 2024

Abstract:
Dam deformation is the most visual and relevant monitoring quantity reflecting the operational condition of a concrete dam. Seasonal variations in the external environment can induce seasonal fluctuations in the deformation of concrete dams; hence, preprocessing the deformation monitoring series to identify these seasonal fluctuations can effectively enhance the accuracy of a predictive model. Firstly, the dam deformation time series is decomposed into seasonal and non-seasonal components based on a seasonal decomposition technique. An advanced ensemble learning algorithm (the Extreme Gradient Boosting model) is then used to forecast the seasonal and non-seasonal components independently, with the Tree-structured Parzen Estimator (TPE) optimization algorithm employed to tune the model parameters and ensure optimal predictive performance. The results of the case study indicate that the predictive performance of the proposed model is superior to that of the benchmark models, as demonstrated by higher fitting accuracy and smaller prediction residuals. In a comparison of the objective evaluation metrics RMSE, MAE, and R2, the proposed model outperforms the benchmark models. Additionally, feature importance measures reveal that, compared with the prediction of the non-seasonal component, the importance of the temperature component increases and that of the water pressure component decreases when predicting the seasonal component. With its elevated predictive accuracy and interpretability, the proposed model is highly practical and offers an effective approach for predicting concrete dam deformation.

1. Introduction

With more than 45,000 rivers crisscrossing its 9.6 million square kilometers of land, China is rich in hydraulic resources, with the highest theoretical reserves and exploitable capacity globally [1]. To make full use of its water and hydropower resources, China has built more than 98,000 dams, which have played a massive role in water supply, power generation, and flood mitigation [2]. Among these dams, the majority of those exceeding 100 m in height are constructed of concrete. Seasonal variations in the external environment can induce seasonal fluctuations in the deformation of concrete dams. Meanwhile, climate variations, such as temperature changes [3] and flood-induced foundation scouring, can exacerbate the damage to civil engineering structures such as dams and bridges [4]; in severe cases, a dam break may occur. The failure of a concrete dam may give rise to catastrophic disasters and substantial economic ramifications for society. A concrete dam responds to changes in load and environmental quantities through deformation, seepage, internal stress–strain, external cracks, and other structural responses [5,6,7,8]. Deformation aptly mirrors the operational condition of a concrete dam and is comparatively more straightforward to measure than other structural responses; it is therefore considered a key indicator in the safety monitoring of concrete dams [9,10,11]. For the prevention of disasters, timely and accurate health monitoring [12] and damage identification [13] of engineering structures are crucial. Constructing a deformation prediction model and executing remedial and preventive measures when notable disparities arise between predicted and measured values are effective measures to ensure the secure operation of concrete dams.
Over the preceding decades, researchers have proposed numerous models for deformation prediction, including statistical models such as stepwise regression [14], multiple regression [15], and partial least squares regression [16]. These models excel at handling linear relationships. However, the mapping between dam deformation and its influencing factors is complex, nonlinear, and time-varying, which limits the predictive accuracy of statistical models.
In recent years, scholars have dedicated their efforts to researching how to enhance the performance of monitoring models. Machine learning (ML), with its excellent ability to handle nonlinear and complex interactions between variables, has gradually replaced traditional mathematical statistical models as a new approach for developing safety monitoring models for dams. Artificial Neural Networks (ANNs) [17], Random Forests (RFs) [18], Extreme Learning Machines (ELMs) [19], and Support Vector Machines (SVMs) [20] have been adeptly employed in forecasting concrete dam deformation, yielding promising results.
However, certain ML algorithms also exhibit inherent limitations, including poor robustness, overfitting, and black-box behavior [21,22,23]. The lack of interpretability (the capability to comprehend the decision-making process or prediction mechanism) caused by the black-box nature of ML models can hinder their practical engineering application [24]. Therefore, interpretable machine learning (IML) is attracting increasing attention from researchers [25,26]. Extreme Gradient Boosting (XGBoost) is an advanced ensemble model based on the gradient boosting decision tree (GBDT) algorithm, which can effectively prevent overfitting and has a higher tolerance for noise [27]. The interpretability of the XGBoost algorithm is achieved through feature importance measures, enabling it to explain the correlation between input features and output results. The XGBoost algorithm has achieved outstanding results in Kaggle machine learning competitions and has been successfully applied in many studies, for example, failure risk analysis [28], flash flood risk assessment [29], predicting the compressive strength of recycled aggregate concrete [30], forecasting gold prices [31], predicting stock price direction [32], and predicting train arrival delays [33]. For these reasons, the XGBoost model is adopted here for forecasting the deformation of concrete dams.
Machine learning models often suffer from parameter optimization problems [34,35]. The XGBoost model has numerous parameters, and its performance is inevitably closely related to the value of the hyperparameters taken [36]. Traditional manual reference and grid searches are often time-consuming and computationally expensive [37]. Multiple studies have demonstrated the successful integration of the XGBoost model with the Tree-structured Parzen Estimator (TPE) optimization algorithm to address the intricacies of parameter tuning [38,39]. Henceforth, in this study, TPE is employed to tune the parameters of the XGBoost model.
Some studies have already demonstrated that decomposing the deformation time series with suitable techniques and modeling the decomposed subsequences separately can effectively enhance prediction accuracy. The main decomposition techniques commonly used include the Wavelet Transform (WT) [40], Empirical Mode Decomposition (EMD) [41], Variational Mode Decomposition (VMD) [42], and Ensemble EMD (EEMD) [43]. However, these methods ignore seasonal factors, which are important characteristics of concrete dam deformation, and models that do not account for seasonality often struggle to achieve satisfactory prediction performance [44]. To address this issue, the seasonal-trend decomposition procedure based on loess (STL) is employed here to extract seasonal fluctuations from the concrete dam deformation series.
In this study, a prediction model considering seasonal components is proposed for concrete dam deformation. Firstly, seasonal fluctuations within the deformation sequence are identified, and the sequence is decomposed into seasonal and non-seasonal components. Secondly, XGBoost models are used to forecast the seasonal and non-seasonal components, with the TPE algorithm adjusting the model parameters to attain optimal predictive performance. Then, the forecasts of the seasonal and non-seasonal components are consolidated into the definitive prediction for the concrete dam deformation series. Lastly, the proposed model is applied to measured deformation data from a concrete arch dam to verify its feasibility.
Based on the above description, the contributions of this study are as follows:
  • The STL algorithm is used to decompose the deformation series of the concrete dam, thereby identifying its seasonal fluctuations. The decomposition results show that this algorithm can effectively separate the seasonal and non-seasonal components in the deformation series.
  • The TPE algorithm is used to optimize the parameters of the prediction model. The transparency of the optimization process is enhanced through visualizations of the parameter optimization history, parameter optimization relationships, and parameter importance.
  • The XGBoost model is applied to the prediction of both the seasonal and non-seasonal components. Other popular machine learning models are introduced as benchmark models to validate the performance of the proposed model. The comparison results indicate that the predictive accuracy of the proposed model surpasses that of the benchmark models.
  • Using feature importance measures, the contributions of the water pressure component, temperature component, and aging component in predicting the seasonal and non-seasonal fluctuations of concrete dam deformations are analyzed.
The rest of this paper is organized as follows. Section 2 introduces the relevant methods and theory. The main steps of the proposed model are explained in detail in Section 3.1, and the benchmark models and evaluation indicators are introduced in Section 3.2. Section 4 demonstrates the applicability of the proposed model through a case study and benchmark model comparisons. Finally, Section 5 presents the conclusions and future research directions.

2. Methodology

2.1. Seasonal Decomposition Based on Loess

The series of concrete dam deformations is a prototypical long time series characterized by an annual cycle and frequently accompanied by stochastic fluctuations. The STL algorithm [45] is a technique that decomposes a time series into seasonal and non-seasonal (combined trend and remainder) components. Compared with traditional decomposition methods, it can handle long time series efficiently and is robust to a small number of outlier observations in the data, thus improving the predictive accuracy of subsequent modeling. It is therefore appropriate to decompose the dam deformation monitoring sequence with STL.
Let $Y_t$ be the measured series of dam deformation, and let $T_t$, $S_t$, and $R_t$ be the trend, seasonal, and residual terms of $Y_t$, respectively. After the STL decomposition, the original sequence $Y_t$ can be represented as follows:

$$Y_t = T_t + S_t + R_t \quad (1)$$

where $t = 1, 2, 3, \dots, n$ indicates the time node of the deformation measurement sequence.
STL consists of two main recursive processes, an inner loop and an outer loop, and relies on iteration for decomposition. Each pass of the outer loop runs the inner loop, and each inner-loop iteration updates the seasonal and trend terms once. Let $S_t^{(k)}$ and $T_t^{(k)}$ denote the seasonal and trend terms obtained at the end of the $k$th inner-loop iteration; the steps of the $(k+1)$th inner-loop iteration are as follows:
(1) Detrending. Define $T_t^{(0)} = 0$. Subtract $T_t^{(k)}$ at the $(k+1)$th iteration, i.e., $Y_t^{\mathrm{detrend}} = Y_t - T_t^{(k)}$.
(2) Cycle-subseries smoothing. The sequence $C_t^{(k+1)}$ is obtained by smoothing $Y_t^{\mathrm{detrend}}$ with the loess model.
(3) Low-pass filtering of the smoothed cycle-subseries. The sequence $C_t^{(k+1)}$ is first low-pass filtered and then loess-smoothed to obtain the sequence $L_t^{(k+1)}$.
(4) Detrending of the smoothed cycle-subseries. The seasonal term for the $(k+1)$th iteration is calculated as $S_t^{(k+1)} = C_t^{(k+1)} - L_t^{(k+1)}$.
(5) Deseasonalizing. Subtract $S_t^{(k+1)}$ at the $(k+1)$th iteration, i.e., $Y_t^{\mathrm{deseason}} = Y_t - S_t^{(k+1)}$.
(6) Trend smoothing. Loess smoothing is performed on $Y_t^{\mathrm{deseason}}$ to obtain the trend term $T_t^{(k+1)}$ for the $(k+1)$th iteration.
At the end of each inner loop, the corresponding remainder term $R_t^{(k+1)}$ is calculated according to Equation (2):

$$R_t^{(k+1)} = Y_t - S_t^{(k+1)} - T_t^{(k+1)} \quad (2)$$

Large values in the remainder component $R_t$ are treated as outliers and assigned smaller weights, which are used in subsequent inner loops to reduce the effect of the outliers on the seasonal and trend terms. Repeating the above operations decomposes $Y_t$ into three robust component sub-series: $T_t$, $S_t$, and $R_t$.

2.2. XGBoost Algorithm

2.2.1. The Basic Principles of XGBoost

Previous research has demonstrated that machine learning tools (specifically the GBDT algorithm) exhibit superior predictive accuracy compared with traditional hydrostatic–seasonal–time (HST) models for dam deformation prediction [46]. The XGBoost algorithm is one of the most recent GBDT variants [27]. It improves the loss function, regularization, and parallelization of the traditional GBDT algorithm, significantly improving efficiency and accuracy. The base learners of the XGBoost model are decision trees. Figure 1 illustrates the framework of a regression decision tree model (T represents the environmental temperature, and H represents the upstream water level).
For a given dataset $T = \{(x_i, y_i)\}$, the algorithm's predicted value is $\hat{y}_i = \sum_{k=1}^{m} f_k(x_i)$, the loss function is $l(y_i, \hat{y}_i)$, and the regularization term is $\Omega(f_k)$. The objective function is:

$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{m} \Omega(f_k) \quad (3)$$

where $n$ is the number of samples, $m$ is the number of trees, and $\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert \omega \rVert^2$.
Equation (3) first undergoes a second-order Taylor expansion:

$$Obj^{(t)} \simeq \sum_{i=1}^{n} \left[ l\!\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) \quad (4)$$

where $t$ denotes the $t$th tree generated, $g_i$ is the first-order gradient statistic, and $h_i$ is the second-order gradient statistic.
After removing the constant term, Equation (4) is transformed into the following:

$$Obj^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) \quad (5)$$
Define the set of samples assigned to the $j$th leaf node as:

$$I_j = \{\, i \mid q(x_i) = j \,\} \quad (6)$$

where $q(x_i)$ denotes the leaf to which the tree structure maps sample $x_i$.
Rewriting Equation (5) over the leaves and combining the first-order and second-order coefficients gives:

$$Obj^{(t)} = \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) \omega_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) \omega_j^2 \right] + \gamma T \quad (7)$$
Equation (7) is a quadratic function of the leaf-node weight vector $\omega_j$. Setting its derivative with respect to $\omega_j$ to zero yields the optimal objective value $Obj^*$ and the corresponding optimal leaf weights $\omega_j^*$:

$$Obj^* = -\frac{1}{2} \sum_{j=1}^{T} \frac{\left( \sum_{i \in I_j} g_i \right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T \quad (8)$$

$$\omega_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda} \quad (9)$$
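Equations (8) and (9) can be checked numerically for a single leaf under squared-error loss, where $g_i = \hat{y}_i - y_i$ and $h_i = 1$; the observation and prediction values below are illustrative only.

```python
# Worked example of Equations (8)-(9) for one leaf under squared-error loss,
# where g_i = yhat_i - y_i and h_i = 1. All numbers are illustrative.
lam = 1.0                  # L2 penalty lambda
y    = [3.0, 3.5, 2.8]     # observed values falling in leaf j
yhat = [3.2, 3.4, 3.1]     # predictions from the previous iteration

g = [p - o for p, o in zip(yhat, y)]   # first-order gradients
h = [1.0] * len(y)                     # second-order gradients (constant)

w_star = -sum(g) / (sum(h) + lam)               # optimal leaf weight, Eq. (9)
obj_star = -0.5 * sum(g) ** 2 / (sum(h) + lam)  # leaf term of Eq. (8), gamma*T omitted
print(w_star, obj_star)    # -0.1 and -0.02
```

The negative objective contribution is what the split-finding routine compares when deciding whether a split reduces the loss by more than the penalty $\gamma$.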

2.2.2. Key Parameters of XGBoost

XGBoost is an updated version of GBDT with over a dozen hyperparameters, divided into three main categories: general, booster, and learning task parameters. The general parameters use default values and do not need to be modified. The booster parameters define the details of the booster and have the most significant impact on XGBoost performance; because linear boosters generally do not perform as well as tree boosters, the booster parameters in this paper are all those of the tree booster. The learning task parameters establish the objective function and the output metric at each step. The efficacy of the XGBoost model is contingent upon the configuration of pivotal parameters [47,48], which are presented in Table 1.

2.3. TPE Optimization Algorithm

The TPE optimization algorithm is a probabilistic approach that selects optimal hyperparameters by means of a probabilistic model [49]. Given the observation history $H$, it seeks the candidate that maximizes the Expected Improvement ($EI$) criterion; the best parameter in the observation history $H$ is $x^*$. $EI$ is expressed as follows:

$$EI_{y^*}(x) = \int_{-\infty}^{y^*} (y^* - y)\, \frac{p(x \mid y)\, p(y)}{p(x)}\, dy \quad (10)$$

where $y$ is a value in the observation history, and $y^*$ is the threshold, chosen as the $\gamma$ quantile of the observation history $H$, so that $\gamma = p(y < y^*)$.
The TPE optimization algorithm focuses on modeling $p(x \mid y)$ and $p(y)$ in the above equation, where $p(x \mid y)$ is expressed with two densities:

$$p(x \mid y) = \begin{cases} l(x), & y < y^* \\ g(x), & y \ge y^* \end{cases} \quad (11)$$

where $l(x)$ is the density formed by the observations $x_i$ whose loss $f(x_i)$ is smaller than $y^*$, and $g(x)$ is the density of the remaining observations.
Substituting $\gamma = p(y < y^*)$ and $p(x) = \gamma\, l(x) + (1 - \gamma)\, g(x)$ into Equation (10) gives:

$$EI_{y^*}(x) = \frac{\gamma y^* l(x) - l(x) \int_{-\infty}^{y^*} p(y)\, dy}{\gamma\, l(x) + (1 - \gamma)\, g(x)} \propto \left( \gamma + \frac{g(x)}{l(x)} (1 - \gamma) \right)^{-1} \quad (12)$$

From the above equation, maximizing $EI$ amounts to preferring points with high probability under $l(x)$ and low probability under $g(x)$, and the tree structure of $l$ and $g$ makes it easy to evaluate the ratio $g(x)/l(x)$ for many candidate parameters. The optimal parameter $x^*$ is calculated as:

$$x^* = \underset{x}{\arg\min}\, \frac{g(x)}{l(x)} \quad (13)$$

3. The Modeling Framework and Performance Analysis

3.1. The Primary Steps of the Proposed Model

The conceptual foundation of the proposed model is the principle of “decomposition and summation”. The original concrete dam deformation series is decomposed into seasonal and non-seasonal components; identifying the seasonal features of the sequence mitigates the difficulty of prediction and enhances the model's predictive performance. Additionally, the relationship between the input factors and the seasonal and non-seasonal components of the deformation sequence is explored based on the model's interpretability. The research framework is shown in Figure 2, with the primary steps presented as follows:
(1) The measured concrete dam deformation series $Y_t$ is decomposed with the STL method into three components: trend, seasonal, and residual. The trend and residual components are combined as the non-seasonal component ($NS_t$).
(2) The dataset of each component is partitioned into training and validation sets at a ratio of 4:1.
(3) The traditional HST model categorizes the main factors influencing deformation into three categories: water pressure, temperature, and aging [50,51]. These three categories are used as input factors to the XGBoost model.
(4) For the seasonal component ($S_t$) and the non-seasonal component ($NS_t$), the XGBoost model is used for prediction modeling. To enhance the generalization capability of the model and reduce overfitting, the TPE optimization algorithm is introduced to obtain the most reliable parameters [52]. Multiple benchmark models are utilized for the comparative assessment of the predictive efficacy of the proposed model.
(5) The XGBoost predictions for the seasonal ($S_t$) and non-seasonal ($NS_t$) components are amalgamated through summation, yielding the ultimate predicted deformation of the concrete dam.

3.2. Analysis of the Predictive Performance

For a quantitative and objective assessment of the performance of the proposed model, the root mean square error (RMSE), the mean absolute error (MAE) and the Coefficient of Determination (R2) are employed.
(1) RMSE:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \quad (14)$$

(2) MAE:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \quad (15)$$

(3) R2:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \quad (16)$$

where $n$ is the number of samples in the test set, $y_i$ are the observed values, $\hat{y}_i$ are the corresponding predicted values, and $\bar{y}$ is the average of $y_i$.
Smaller RMSE and MAE values signify higher prediction accuracy, while an R2 closer to 1 indicates superior predictive precision.
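The three metrics can be written out directly and cross-checked against scikit-learn's implementations on toy values:

```python
# The three evaluation metrics written out directly, checked against
# scikit-learn's implementations on illustrative toy values.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_obs = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])

rmse = np.sqrt(np.mean((y_obs - y_hat) ** 2))
mae = np.mean(np.abs(y_obs - y_hat))
r2 = 1 - np.sum((y_obs - y_hat) ** 2) / np.sum((y_obs - np.mean(y_obs)) ** 2)

assert np.isclose(rmse, np.sqrt(mean_squared_error(y_obs, y_hat)))
assert np.isclose(mae, mean_absolute_error(y_obs, y_hat))
assert np.isclose(r2, r2_score(y_obs, y_hat))
print(rmse, mae, r2)
```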

4. Empirical Analysis

4.1. Dataset Information

The project employed in this paper is situated in the upper reaches of the Yellow River in Qinghai Province, China (Figure 3). It constitutes a concrete arch dam classified as a large (1) type first-class project. The primary purpose of the project is electricity generation, with due consideration given to irrigation and other comprehensive benefits. The elevation of the wave wall, dam crest, and dam foundation are 2186.2 m, 2185 m, and 2030 m, respectively. For the monitoring of dynamic alterations and performance throughout the operational phase of the dam, monitoring instruments are installed in the dam body. A total of 22 plumb lines (PL) and 9 inverted plumb lines (IP) are deployed for monitoring the horizontal displacements (Figure 4).
The measuring point PL3-3 is selected as the prime research measuring point. The monitoring series of PL3-3 from 2015 to 2019 is defined as an analysis dataset. The analysis dataset is partitioned into a training set encompassing the period from January 2015 to December 2018, and a test set spanning from January 2019 to December 2019, with a 4:1 ratio. The analysis dataset of deformation and the corresponding environmental variable series are shown in Figure 5.
The 10 input factors for the XGBoost model are $H - H_0$, $H^2 - H_0^2$, $H^3 - H_0^3$, $H^4 - H_0^4$, $\sin\frac{2\pi t}{365} - \sin\frac{2\pi t_0}{365}$, $\cos\frac{2\pi t}{365} - \cos\frac{2\pi t_0}{365}$, $\sin\frac{4\pi t}{365} - \sin\frac{4\pi t_0}{365}$, $\cos\frac{4\pi t}{365} - \cos\frac{4\pi t_0}{365}$, $\theta - \theta_0$, and $\ln\theta - \ln\theta_0$. $H$ is the reservoir water depth on the monitoring date, and $H_0$ is the reservoir water depth on the monitoring starting date; the first four factors belong to the water pressure category. $t$ represents the cumulative count of days from the initiation of the model to the monitoring date, and $t_0$ denotes the cumulative count of days from the initiation of the model to the monitoring commencement date; the middle four factors belong to the temperature category. $\theta$ and $\theta_0$ are $t/100$ and $t_0/100$, respectively; the last two factors belong to the aging category. The output is the predicted dam deformation sequence. The entire calculation process is performed in Python 3.8.5.
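A hedged sketch of assembling the ten input factors for one monitoring date follows; the function name `hst_factors` and the numeric values are illustrative, and the water pressure terms are written as $H^i - H_0^i$ following the factor list above.

```python
# Sketch of constructing the ten HST input factors for a single monitoring
# date. H, H0, t, t0 follow the definitions in the text; values are placeholders.
import numpy as np

def hst_factors(H, H0, t, t0):
    """Ten input factors: 4 water pressure, 4 temperature, 2 aging terms."""
    theta, theta0 = t / 100.0, t0 / 100.0
    water = [H - H0, H**2 - H0**2, H**3 - H0**3, H**4 - H0**4]
    temp = [
        np.sin(2 * np.pi * t / 365) - np.sin(2 * np.pi * t0 / 365),
        np.cos(2 * np.pi * t / 365) - np.cos(2 * np.pi * t0 / 365),
        np.sin(4 * np.pi * t / 365) - np.sin(4 * np.pi * t0 / 365),
        np.cos(4 * np.pi * t / 365) - np.cos(4 * np.pi * t0 / 365),
    ]
    aging = [theta - theta0, np.log(theta) - np.log(theta0)]
    return water + temp + aging

factors = hst_factors(H=120.0, H0=100.0, t=400, t0=1)
print(len(factors))   # 10 factors
```

Stacking these vectors over all monitoring dates produces the feature matrix fed to the XGBoost model.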

4.2. STL Time Series Decomposition

The STL method is used to identify the seasonal fluctuations within the deformation sequence and subsequently decompose it into three components (trend, seasonal, and residual). The cycle frequency for the STL decomposition is set to 365 because the deformation series is a typical time series with an annual cycle. Figure 6 illustrates the outcomes of the STL decomposition of the measured deformation series at PL3-3. When the residual component is removed, the sum of the seasonal and trend components differs little from the original deformation sequence, showing that the STL decomposition can effectively identify the trends and periodic patterns in the measured deformation series (Figure 7).
The non-seasonal component is a combination of the trend and residual components (Figure 8). The seasonal component of the measured deformation series primarily captures the regular cyclic patterns present in the series. The non-seasonal component primarily describes the directional changes and random effects of other uncertainties on the deformation series. It can be observed visually that the seasonal component exhibits strong regularity, while the non-seasonal component demonstrates notable nonlinearity. Therefore, the non-seasonal component, which presents a challenging dataset due to its nonlinear nature, is chosen as an example for subsequent fitting and validation.

4.3. TPE Parameter Optimization

To enhance the performance of the proposed model, the TPE algorithm is utilized to explore the optimal parameter combination, and five-fold cross-validation is conducted to assess model performance. The training dataset is randomly divided into five subsets (called folds); the model is then trained and evaluated five times, each time holding out one fold for evaluation and training on the remaining four. The result is a set of five evaluation scores, which form the basis for further optimization of the model. R2 is defined as the performance evaluation metric for the searched parameter combinations to ensure the generalization ability of the model. One hundred parameter searches are conducted, and the parameter combination with the highest R2 is identified as the optimal parameters for the model. Table 2 shows the search space and the corresponding optimal values.
The historical results of the TPE optimization (Figure 9) show that the performance of the model is relatively unstable over the first 40 searches, with the evaluation index R2 exhibiting significant fluctuations. As the number of searches increases, the performance corresponding to the derived parameters stabilizes at around 0.96, with a best R2 of 0.966.
Most machine learning algorithms are the equivalent of black boxes, making it difficult to understand the correlations among parameters and between parameters and results. Interpretability is attracting more attention as an emerging area of machine learning. Therefore, we visualize the parameter relationships for each TPE optimization run in this paper, which helps us understand how each parameter relates to the value of the objective function during each parameter search. Color bars indicate the quality of each parameter combination, presenting the process of each search and revealing the complex relationships learned by the model (Figure 10).
Moreover, the importance scores of model parameters are output to reflect their contributions to the model’s performance and their impact on the optimization time (Figure 11 and Figure 12). The importance scores of these model parameters can help us optimize the model’s performance with limited computational resources by adjusting the parameters that need optimization, thereby achieving higher performance with minimal resources.

4.4. Proposed Model Performance Evaluation

Established methods, namely Random Forest (RF), Support Vector Regression (SVR), an Artificial Neural Network (ANN), and Multiple Linear Regression (MLR), are introduced for comparison and validation in fitting and predicting the non-seasonal component. A comparison of the model parameters and their optimal values is given in Table 3.
Line graphs depicting the predicted and actual values for each model are plotted, accompanied by linear regression analysis (Figure 13). The predictions from the XGBoost model closely align with the observed values, and the majority of the predicted values fall within the 95% prediction band, indicating the high fitting accuracy of the proposed model. Conversely, the comparative models do not fit as precisely, with some predicted values lying outside the 95% prediction band.
Boxplots are also created to analyze the prediction residuals of each model. The XGBoost model’s prediction residuals predominantly fall within the range of −0.1 to 0.1, with only a few outliers (Figure 14). Conversely, other models display a wider spectrum of prediction residuals and a greater number of outliers in comparison to the proposed model.
The same computational procedure is employed for the prediction of the seasonal component. Similarly, the prediction residuals of each model for the seasonal component are analyzed using boxplots. From Figure 15, it can be seen that, compared to other benchmark models, the XGBoost model also achieves the best performance in predicting the seasonal component.
The prediction results of the seasonal and non-seasonal components are combined to form the final prediction of the model. Evaluation metrics such as RMSE, R 2 , and MAE are then used to analyze and compare the prediction results of each model. Table 4 and Figure 16 present the evaluation indicators for each model. The results demonstrate that the proposed model surpasses the comparative models across all evaluation indicators.

4.5. Prediction Results Analysis

The interpretability of the proposed model is utilized to generate feature importance measures. This allows us to comprehend the crucial features contributing to the output and their relative importance in comparison to other individual features.
From Figure 17, for the prediction of the non-seasonal component, the water pressure, temperature, and aging factors account for 0.486, 0.359, and 0.155 of the feature importance, respectively. The water pressure factor holds the highest importance, primarily because changes in the upstream water level significantly alter the forces exerted on a concrete dam and consequently the magnitude of deformation. The temperature factor holds intermediate importance, and the various temperature period functions do not differ significantly in their degree of impact. The aging component holds minimal importance, considering that the concrete dam has been in service for a considerable number of years.
Similarly, in the prediction of the seasonal component, the water pressure, temperature, and aging factors account for 0.328, 0.436, and 0.236 of the feature importance, respectively (Figure 18). In contrast to the feature importance for the non-seasonal component, the importance of the water pressure factor decreases while that of the temperature factor increases for the seasonal component. This phenomenon aligns with the principles underlying the construction of the HST model: the temperature factor, consisting of trigonometric functions with an annual cycle, primarily captures the cyclic variations in temperature and their impact on deformation.

4.6. Generalization Performance of the Proposed Model

To further investigate the generalizability of the model, we have selected measuring points PL3-4 and PL3-5, situated within the same dam section as PL3-3, as instances for further analysis. From Figure 19, it can be seen that STL also accurately identifies the trend and periodic patterns in the measured series at measuring points PL3-4 and PL3-5.
The same training process used for measuring point PL3-3 is applied to measuring points PL3-4 and PL3-5, and the predictive performance is validated on the test set. Figure 20 depicts the prediction outcomes of the proposed model at PL3-4 and PL3-5, both of which demonstrate excellent predictive performance. These results indicate that the proposed model’s prediction performance is consistent across different datasets.

5. Conclusions and Future Work

Seasonal fluctuations constitute a crucial data feature of the deformation sequence of concrete dams. A novel prediction model for concrete dam deformation that accounts for the seasonal component is proposed in this study. Simulation and comparative experiments are conducted on a concrete arch dam case study to validate the effectiveness of the model. The principal conclusions drawn from this study are as follows:
  • The STL method can effectively identify the seasonal fluctuations in a concrete dam deformation series and decompose them into seasonal and non-seasonal components. The seasonal component exhibits clear periodic features, while the non-seasonal component shows strong nonlinear features, validating the effectiveness of the STL method.
  • Four well-established methods commonly used for forecasting concrete dam deformation series are employed as benchmark models for comparison. In the qualitative analysis (linear regression analysis and boxplots), the proposed model demonstrates better fitting accuracy and smaller prediction residuals. In the quantitative assessment (evaluation indicators), it achieves the best RMSE, MAE, and R2 values of 0.081, 0.062, and 0.998, respectively.
  • Utilizing feature importance measures, the study delved into the relationship between input factors and the seasonal and non-seasonal components of the concrete dam deformation sequence. For the non-seasonal component, the contributions of water pressure, temperature, and aging components are 0.486, 0.359, and 0.155, respectively. For the seasonal component, the contributions of water pressure, temperature, and aging components are 0.328, 0.436, and 0.236, respectively.
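The three evaluation indicators cited in the conclusions follow their standard definitions, which can be sketched as below; the `y_true`/`y_pred` values are illustrative only, not the study's predictions.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Illustrative values only
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.9]
```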
The monitoring system of the case study project underwent renovation and upgrades in 2003 and 2011. The deformation series in this paper are therefore relatively stable and not highly susceptible to noise interference. However, many concrete dams are equipped with outdated monitoring systems, so the corresponding monitoring series often exhibit random fluctuations. Consequently, comprehensive studies on prediction models that account for the impact of noise are necessary.

Author Contributions

M.L.: Conceptualization, Investigation, Methodology, Software, Writing—original draft, Writing—Review and Editing. Y.F.: Project administration, Visualization. S.Y.: Project administration, Visualization. H.S.: Resources, Supervision, Project administration, Funding acquisition, Writing—Review and Editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been partially supported by the National Natural Science Foundation of China (SN: 52239009, 51979093), the National Key Research and Development Program of China (SN: 2019YFC1510801), the Open Foundation of The National Key Laboratory of Water Disaster Prevention (SN: 523024852), the Fundamental Research Funds for the Central Universities (SN: 2019B69814), and the Open Foundation of Yunnan Key Laboratory of Water Conservancy and Hydropower Engineering Safety (SN: 202302AN360003).

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figure 1. The framework of regression decision tree model.
Figure 2. The main process of the proposed model.
Figure 3. Comprehensive overview of the project.
Figure 4. The spatial arrangement of the plumb line monitoring system.
Figure 5. Water level, air temperature, and measured deformation at PL3-3.
Figure 6. The results of the STL decomposition of PL3-3 measured deformation time series.
Figure 7. The original series and the series obtained by summing the seasonal and trend components for PL3-3.
Figure 8. The seasonal series and non-seasonal series for PL3-3.
Figure 9. Historical results from TPE optimization.
Figure 10. The parameter relationships for each TPE optimization model.
Figure 11. Parameter importance for objective value.
Figure 12. Parameter importance for optimizing duration.
Figure 13. The linear regression analysis of the proposed model and comparison models for non-seasonal component.
Figure 14. Boxplots of the prediction residuals of each model for non-seasonal component.
Figure 15. Boxplots of the prediction residuals of each model for seasonal component.
Figure 16. Comparison of evaluation indicators for each model.
Figure 17. The feature importance for prediction results of non-seasonal component.
Figure 18. The feature importance for prediction results of seasonal component.
Figure 19. The original series and the series obtained by summing the seasonal and trend components for PL3-4 and PL3-5.
Figure 20. The folding line chart of proposed model at PL3-4 and PL3-5.
Table 1. The name and explanation of pivotal parameters.

Parameter | Explanation
Gamma | Minimum loss reduction required to make a further partition on a leaf node of the tree
Alpha | L1 regularization term on weights
Eta | Step size shrinkage used in update to prevent overfitting
Max_depth | Maximum depth of tree
Min_child_weight | Minimum sum of instance weight needed in a child
Lambda | L2 regularization term on weights
Colsample_bytree | Subsample ratio of columns when constructing each tree
Subsample | Subsample ratio of the training instances
Table 2. The search space and optimal value.

Booster Parameter | Search Space | Optimal Value
Gamma | (0.01, 10) | 0.012
Eta | (0.005, 0.5) | 0.015
Alpha | (0.01, 10) | 0.031
Subsample | (0.1, 0.9) | 0.459
Lambda | (0.01, 10) | 0.018
Max_depth | (4, 20) | 10
Min_child_weight | (0, 10) | 4
Colsample_bytree | (0.1, 0.9) | 0.128
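The Table 2 search space can be encoded programmatically for a hyperparameter tuner. The study uses the TPE sampler (via Optuna); in this self-contained sketch a plain random search stands in for TPE, and `toy_objective` is a hypothetical placeholder for training XGBoost and returning the validation RMSE.

```python
import random

# Search space from Table 2: (lower, upper) bounds per booster parameter.
SEARCH_SPACE = {
    "gamma": (0.01, 10), "eta": (0.005, 0.5), "alpha": (0.01, 10),
    "subsample": (0.1, 0.9), "lambda": (0.01, 10),
    "max_depth": (4, 20), "min_child_weight": (0, 10),
    "colsample_bytree": (0.1, 0.9),
}
INT_PARAMS = {"max_depth", "min_child_weight"}  # sampled as integers

def sample_params(rng):
    """Draw one candidate configuration from the search space."""
    return {name: (rng.randint(lo, hi) if name in INT_PARAMS else rng.uniform(lo, hi))
            for name, (lo, hi) in SEARCH_SPACE.items()}

def toy_objective(params):
    # Placeholder for fitting XGBoost and returning validation RMSE.
    return abs(params["eta"] - 0.015) + abs(params["gamma"] - 0.012)

rng = random.Random(0)
best = min((sample_params(rng) for _ in range(200)), key=toy_objective)
```

With Optuna, `sample_params` would be replaced by `trial.suggest_float` / `trial.suggest_int` calls inside the objective, and TPE would guide the sampling instead of drawing uniformly.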
Table 3. Comparison of model parameters and optimal value.

Comparison Model | Parameter | Optimal Value
SVR | C | 0.1
SVR | Gamma | 1
RF | N_estimators | 100
RF | Max_depth | 14
RF | Min_samples_split | 20
RF | Min_samples_leaf | 10
ANN | Hidden_layer_sizes | (1000, 500, 200, 100, 50)
Table 4. Analysis of the predictive performance for each model.

Model | RMSE | MAE | R2
XGBoost | 0.081 | 0.062 | 0.998
RF | 0.198 | 0.133 | 0.989
SVR | 0.228 | 0.181 | 0.985
ANN | 0.253 | 0.197 | 0.983
MLR | 0.448 | 0.298 | 0.946

Liu, M.; Feng, Y.; Yang, S.; Su, H. Dam Deformation Prediction Considering the Seasonal Fluctuations Using Ensemble Learning Algorithm. Buildings 2024, 14, 2163. https://doi.org/10.3390/buildings14072163