1. Introduction
To date, there are approximately 98,000 dams in China, which play a significant role in flood control and power generation. A dam failure causes irreversible damage to the environment, society, and the economy [1,2]. Dam collapse is generally caused by long-term deformation; therefore, accurately predicting dam deformation is a primary task at present, and improving prediction accuracy is an important issue in the field of dam safety monitoring [3]. In the current literature, the common dam deformation analysis and prediction models fall into three categories: statistical models [4,5,6], deterministic models [7,8], and hybrid models [9,10,11].
Based on regression theory, the statistical model is usually applied to express a linear relationship between the influencing factors and the target variable. Wu Zhongru et al. [12,13] have conducted comprehensive research in this area, and their findings have been widely adopted. However, dam deformation is usually affected by factors such as temperature and water level. These influencing factors exhibit strongly nonlinear behavior, which can be quantified by long-term persistence (LTP); a global-scale analysis reveals long-term persistence in all of the above factors [14]. The existence of these LTP factors makes dam deformation highly volatile, which increases the difficulty of accurate prediction. Although a regression model can establish a nonlinear relationship between variables, its generalization ability is weak. Based on mechanical assumptions, the deterministic model simulates the working process of the dam through the finite element method and calculates its displacement and stress [15]. However, this method requires a heavy workload, and the strongly nonlinear relationship between dam deformation and its influencing factors means it can only support qualitative analysis. In recent years, with the development of artificial intelligence, machine learning has made breakthroughs in data mining, especially models such as BP neural networks, ARIMA, SVM, and ELM. Many scholars applying neural networks to dam deformation prediction have achieved good results [16,17,18], further improving the handling of nonlinear problems in comparison with regression models. The ARIMA model can capture processes with strong volatility, but it requires a large number of parameters when the process exhibits long-range dependence [19], which makes the model complicated and difficult to handle. Moreover, multivariate forecasting models usually consider only environmental factors and ignore the time dependence of the deformation sequence itself. When the deformation sequence is highly volatile, the above methods have poor robustness and generalization ability.
In order to address the shortcomings of the above models, many scholars have proposed adopting the LSTM model [20] for dam deformation prediction. Owing to its long-term memory function, LSTM possesses a unique advantage in time series prediction. Mei-Li Shen et al. [21] employed LSTM to predict imports and exports and found that LSTM can effectively simulate uncertain trends in the data. Zhang Tao et al. [22] applied LSTM to predict ship motion and showed that this approach performs better. Yan S. et al. [23] combined LSTM with the attention mechanism, so that the model could attach greater importance to the more influential factors in the time dimension. Analysis of their results shows that although such models greatly improve overall prediction accuracy, extreme-value prediction remains insufficiently accurate when the data are highly volatile.
To address this, this study proposes a multi-scale dam deformation prediction model based on signal decomposition. The model decomposes the original deformation sequence into a finite number of sub-sequences at different frequencies, which greatly reduces the nonlinearity of the problem and has the potential to improve the model's prediction accuracy and robustness.
Traditional signal decomposition techniques mainly include the Fourier transform, Wavelet Decomposition (WD) and Empirical Mode Decomposition (EMD), which are frequently applied to decompose dam deformation sequences [24]. The Fourier transform converts between the time domain and the frequency domain, but its conditions are relatively harsh, and many useful signals have no Fourier transform. WD has certain advantages, but choosing the basis function and decomposition scale is difficult, and its practical application is complicated [25]. EMD does not need a preset basis function and can adaptively decompose a signal into a finite number of modal functions; however, it is prone to mode mixing and lacks a rigorous mathematical foundation [26,27]. To address these problems, this paper introduces a newer adaptive signal decomposition method, Variational Mode Decomposition (VMD) [28]. In contrast to the above methods, VMD rests on a more rigorous mathematical model: it decomposes the original data into a set of intrinsic mode functions (IMFs) fluctuating around center frequencies [29,30,31], with a better decomposition effect and higher robustness [9,32]. However, the decomposition effect of VMD is affected by the number of decomposition modes, and too many or too few sub-sequences will degrade the decomposition [33,34]. To solve this problem, this paper proposes the instantaneous frequency mean method to determine the number of modes K, so as to avoid the mode mixing caused by an excessively large K or the under-decomposition caused by a K that is too small.
In summary, this paper proposes a multi-scale dam deformation prediction model based on VMD-LSTM-RF, which greatly reduces the complexity of the original sequence. In this model, the original sequence is first decomposed into sub-sequences at various frequencies plus a residual. LSTM is suitable for complex nonlinear problems and has good long-term memory; its results are reliable when predicting weakly volatile series, the model is stable, it is not prone to overfitting, and its performance is robust. Therefore, LSTM is applied to predict the high-frequency components and the residual, while the low-frequency components are input to RF for prediction. Finally, the prediction results of the two parts are superimposed and reconstructed to obtain the final prediction, which not only avoids the inaccurate predictions LSTM produces on highly volatile data, but also greatly reduces the workload and effectively improves the accuracy of dam deformation prediction. To verify the superiority of the proposed model, the most commonly applied methods are selected as benchmarks, and three quantitative evaluation indicators are utilized to assess prediction performance. The main contributions of this paper are as follows:
- (1)
The VMD decomposition technique is introduced, and the instantaneous frequency mean method is adopted to determine the number of modes K, decomposing the highly volatile dam displacement data into periodic and stable sub-sequences.
- (2)
On the basis of LSTM and RF, the high- and low-frequency sub-sequences are modeled and predicted, respectively. The analysis indicates that this approach can accurately capture the nonlinear characteristics of dam deformation.
- (3)
This paper verifies the feasibility of signal decomposition technology combined with machine learning for dam deformation prediction. Predicting the high-frequency and low-frequency components separately reduces the complexity of the data, greatly improves the accuracy of dam deformation prediction, and has obvious significance for practical projects.
The remainder of this paper is organized as follows: the second part briefly introduces VMD, LSTM and RF-related theories, as well as the method of determining the K value of VMD; the third part introduces the research design of the two cases, the evaluation indicators, the model realization, and the determination of the relevant model parameters; the fourth part compares and analyzes the prediction performance of the proposed model and other models; the fifth part draws conclusions for the whole paper.
2. Materials and Methods
This section describes the proposed model. It briefly introduces the basic principles of VMD, LSTM, and RF, then establishes the VMD-LSTM-RF model and gives the modeling process and the specific steps of the model in detail.
2.1. VMD-Based Decomposition Technology
VMD is a new type of adaptive signal decomposition technique with a preset number of modes. It can decompose a real-valued signal into K modal components concentrated around their center frequencies. The signal decomposition process amounts to solving the constrained variational problem shown in Formula (1):

$$\min_{\{u_k\},\{\omega_k\}}\left\{\sum_{k=1}^{K}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)*u_k(t)\right]e^{-j\omega_k t}\right\|_2^2\right\}\quad \text{s.t.}\quad \sum_{k=1}^{K}u_k(t)=f(t) \tag{1}$$

where $\{u_k\}$ are the K modal components, $\{\omega_k\}$ are the center frequencies of the modal components, $f(t)$ is the original signal, and $\delta(t)$ is the pulse function. In order to obtain the optimal solution of the constrained variational problem, the Lagrange multiplier $\lambda(t)$ and the quadratic penalty factor $\alpha$ are introduced to transform the constrained variational problem into an unconstrained one. The extended Lagrangian function is expressed as:

$$L(\{u_k\},\{\omega_k\},\lambda)=\alpha\sum_{k=1}^{K}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)*u_k(t)\right]e^{-j\omega_k t}\right\|_2^2+\left\|f(t)-\sum_{k=1}^{K}u_k(t)\right\|_2^2+\left\langle\lambda(t),\,f(t)-\sum_{k=1}^{K}u_k(t)\right\rangle \tag{2}$$
The Alternating Direction Method of Multipliers (ADMM) is used to find the saddle point of the above Lagrangian function, which yields the optimal solution. The specific steps are as follows:
- (1) Initialize $\{\hat{u}_k^1\}$, $\{\omega_k^1\}$, $\hat{\lambda}^1$ and $n=0$, and convert each parameter from the time domain to the frequency domain.
- (2) Set $n=n+1$. On the non-negative frequency interval $\omega\ge 0$, for $k=1,\dots,K$, update $\hat{u}_k$ and $\omega_k$:

$$\hat{u}_k^{n+1}(\omega)=\frac{\hat{f}(\omega)-\sum_{i\ne k}\hat{u}_i(\omega)+\hat{\lambda}^n(\omega)/2}{1+2\alpha\left(\omega-\omega_k^n\right)^2} \tag{3}$$

$$\omega_k^{n+1}=\frac{\int_0^{\infty}\omega\,\left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega}{\int_0^{\infty}\left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega} \tag{4}$$

- (3) For all $\omega\ge 0$, update $\hat{\lambda}$:

$$\hat{\lambda}^{n+1}(\omega)=\hat{\lambda}^n(\omega)+\tau\left(\hat{f}(\omega)-\sum_{k=1}^{K}\hat{u}_k^{n+1}(\omega)\right) \tag{5}$$

In the above formulas, $\hat{u}_k(\omega)$, $\hat{f}(\omega)$ and $\hat{\lambda}(\omega)$ are the Fourier transforms of $u_k(t)$, $f(t)$ and $\lambda(t)$, respectively.
- (4) Judge whether the convergence condition (judgment accuracy $\varepsilon>0$) is satisfied; if Formula (6) holds, stop the iteration, otherwise return to step (2). When the iteration ends, the result is output, and K intrinsic mode function (IMF) components are obtained.

$$\sum_{k=1}^{K}\frac{\left\|\hat{u}_k^{n+1}-\hat{u}_k^n\right\|_2^2}{\left\|\hat{u}_k^n\right\|_2^2}<\varepsilon \tag{6}$$
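The frequency-domain update loop in steps (1)–(4) can be sketched in NumPy. This is a simplified illustration only: the function name and parameters are ours, the filter is made symmetric in |ω| so that real signals stay real, and the signal mirroring used by reference VMD implementations is omitted.

```python
import numpy as np

def vmd_sketch(f, K=3, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD: ADMM updates of mode spectra and center frequencies."""
    T = len(f)
    freqs = np.fft.fftfreq(T)                      # normalized frequencies
    f_hat = np.fft.fft(f)
    u_hat = np.zeros((K, T), dtype=complex)        # mode spectra
    omega = np.linspace(0.0, 0.5, K, endpoint=False) + 0.25 / K
    lam_hat = np.zeros(T, dtype=complex)           # Lagrange multiplier
    pos = freqs >= 0
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Wiener-filter-style update of mode k against the residual
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k]
            u_hat[k] = (residual + lam_hat / 2) / (
                1 + 2 * alpha * (np.abs(freqs) - omega[k]) ** 2)
            # power-weighted mean frequency over the non-negative axis
            power = np.abs(u_hat[k, pos]) ** 2
            omega[k] = np.sum(freqs[pos] * power) / (power.sum() + 1e-12)
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))
        diff = (np.abs(u_hat - u_prev) ** 2).sum() / (
            (np.abs(u_prev) ** 2).sum() + 1e-12)
        if diff < tol:
            break
    modes = np.real(np.fft.ifft(u_hat, axis=1))
    return modes, np.sort(omega)
```

For a two-tone test signal, the recovered center frequencies should approach the true normalized frequencies of the tones.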
2.2. Method of Determining K Value Based on the Mean Value of Instantaneous Frequency
When using VMD to decompose the original signal, the value of the number of modes K greatly affects the decomposition. If K is too small, the decomposed sub-sequences will lose information or suffer mode mixing; conversely, if K is too large, over-decomposition occurs, which increases the computational load of the subsequent neural network and degrades the prediction [35]. Therefore, it is very important to select an appropriate K value that fully extracts the characteristics of the original data. This article introduces the instantaneous frequency mean method of the components to select K. The main idea is that the IMFs obtained by VMD decomposition have clearly different frequencies. The original sequence is decomposed with different numbers of modes, and the mean instantaneous frequency of each component is calculated for each K. If the change in these means between two adjacent K values drops sharply, over-decomposition is considered to have caused mode mixing at the larger K, and the preceding K value is taken as the optimal number of modes. In order to verify the effectiveness of this method, an analog signal is established, which is shown in Formula (7).
After the signal is decomposed by VMD, three IMFs should be obtained, as shown in Figure 1. Using the instantaneous frequency mean method, the signal is decomposed with different numbers of modes, and the mean instantaneous frequencies of the components under each K value are calculated; the results are shown in Figure 2. It can be seen that when K ≥ 4, the components exhibit a certain degree of mode mixing. Therefore, for the analog signal, K = 3 is the best number of modes, which is consistent with the preset K value. This proves that the method is effective in selecting the number of VMD modes.
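The mean instantaneous frequency of a component can be obtained from the phase of its analytic signal (Hilbert transform). A minimal NumPy sketch follows; the function names are ours, and `decompose` stands in for any VMD routine returning K components.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the frequency-domain Hilbert transform."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0   # double positive frequencies
    if n % 2 == 0:
        h[n // 2] = 1.0       # Nyquist bin for even n
    return np.fft.ifft(X * h)

def mean_instantaneous_frequency(x, fs):
    """Mean instantaneous frequency (Hz) of one component."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)
    return inst_freq.mean()

def k_selection_table(signal, fs, decompose, k_range):
    """Mean instantaneous frequency of each IMF for every candidate K."""
    return {K: [mean_instantaneous_frequency(imf, fs)
                for imf in decompose(signal, K)] for K in k_range}
```

Plotting the table over the candidate K values reproduces the line chart used to detect the sharp drop that signals over-decomposition.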
2.3. Long Short-Term Memory Network (LSTM)
LSTM is derived from the Recurrent Neural Network (RNN); it solves the vanishing and exploding gradient problems that arise from RNN's long-term dependence on the time series, and can effectively deal with time series prediction [18]. The essence of RNN is to add a feedback mechanism to a fully connected neural network, so that the output of the network is related not only to the current input but also to the previous output, while LSTM adds a memory unit on the basis of RNN. Compared with RNN, LSTM has a longer memory, but its control process is similar: both process the data flowing through the cell during forward propagation. The difference is that the structure and operation of the cells in LSTM have changed.
The structure of LSTM is shown in Figure 3. It is mainly composed of a memory unit that stores the information state and three gating units that regulate the flow of information into and out of the memory unit. The memory unit retains hidden time-series information, so that longer-range information can be used. The three gating units use activation functions to change the information state in the memory unit: the forget gate decides which information to discard from the cell state, the input gate updates and stores new information in the cell state, and the output gate selects the useful part of the final memory unit for prediction. The cell state of LSTM is updated as follows.
Forget gate: it outputs a vector of values between 0 and 1 by checking $h_{t-1}$ and $x_t$ to determine how much of the previous cell state needs to be forgotten:

$$f_t=\sigma\left(W_f\cdot[h_{t-1},x_t]+b_f\right) \tag{8}$$

Input gate: using $h_{t-1}$ and $x_t$ to decide what new information to add to the cell state, the update formulas are:

$$i_t=\sigma\left(W_i\cdot[h_{t-1},x_t]+b_i\right) \tag{9}$$

$$\tilde{C}_t=\tanh\left(W_C\cdot[h_{t-1},x_t]+b_C\right) \tag{10}$$

Cell state update: using the output $f_t$ of the forget gate and the $i_t$ and $\tilde{C}_t$ obtained from the input gate to update the cell state so that the next state can be used:

$$C_t=f_t\odot C_{t-1}+i_t\odot\tilde{C}_t \tag{11}$$

Output gate: after updating the cell state, it is necessary to determine which state features to output based on $h_{t-1}$ and $x_t$:

$$o_t=\sigma\left(W_o\cdot[h_{t-1},x_t]+b_o\right) \tag{12}$$

$$h_t=o_t\odot\tanh(C_t) \tag{13}$$

In Formulas (8)–(13), $W_f$, $W_i$, $W_o$ and $b_f$, $b_i$, $b_o$ are the weight matrices and biases of the forget gate, input gate, and output gate, respectively, and $\sigma$ is the activation function.
As shown in Figure 3, the output of one cell state on the hidden layer is used as the input of the next state. This transfer mechanism gives LSTM better learning and memory capabilities, and allows the model to obtain the optimal parameters through iteration; in essence, iteration is an optimization process whose objective is the loss function. In this article, the Adam optimizer is used for stochastic gradient descent. Compared with other optimizers, Adam is more efficient and requires fewer parameters.
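Formulas (8)–(13) describe one forward step of an LSTM cell. The step can be sketched in NumPy as follows; the function name and the dict-based weight layout are our own illustrative choices, not a framework API.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One LSTM forward step following Formulas (8)-(13).
    W and b hold one weight matrix / bias vector per gate, each acting
    on the concatenated vector [h_prev, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W["f"] @ z + b["f"])        # forget gate, Formula (8)
    i = sigmoid(W["i"] @ z + b["i"])        # input gate, Formula (9)
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate state, Formula (10)
    c = f * c_prev + i * c_tilde            # cell state update, Formula (11)
    o = sigmoid(W["o"] @ z + b["o"])        # output gate, Formula (12)
    h = o * np.tanh(c)                      # hidden state, Formula (13)
    return h, c
```

With all weights zero, every gate outputs 0.5, so the cell state simply halves at each step, which is a convenient sanity check of the wiring.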
2.4. Random Forest
RF is a special bagging method, an organic combination of the bagging algorithm and decision trees. In the random forest algorithm, the decision trees are independent of each other. After the model is established, whenever a new deformation input sample appears, each decision tree in the model makes its own judgment, and the aggregated result of the trees is the final output of the data analysis [36]. The RF algorithm flow is as follows.
- (1)
Among N samples, T training sets are formed by Bootstrap sampling.
- (2)
Each training set is used to grow a corresponding decision tree.
- (3)
At each node of every decision tree, a subset of attributes is randomly selected from all attributes as the candidate set for the current split, and the best feature is selected for splitting based on the principle of minimum mean absolute error or mean squared error.
- (4)
After the decision tree is constructed, there is no need to perform pruning processing.
- (5)
For a test sample X, each decision tree produces a predicted value.
- (6)
The average value of all decision trees is used as the final predicted value.
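The flow above, applied to a low-frequency component framed as a supervised learning problem with lagged inputs, can be sketched with scikit-learn. The hyperparameter values and the toy series here are illustrative, not those used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy low-frequency component: a smooth periodic trend.
t = np.linspace(0, 4 * np.pi, 300)
series = np.sin(t)

# Frame the series as supervised learning with a sliding window of lags.
lags = 5
X = np.array([series[i:i + lags] for i in range(len(series) - lags)])
y = series[lags:]

# Bootstrap sampling plus per-node random feature subsets, with the
# final prediction averaged over all trees (steps (1)-(6) above).
rf = RandomForestRegressor(n_estimators=100, max_depth=10,
                           bootstrap=True, random_state=0)
rf.fit(X, y)
pred = rf.predict(X)
```

Because the component is smooth, even in-sample the ensemble average tracks it closely, which is exactly the regime the paper routes to RF.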
2.5. Proposed Model (VMD-LSTM-RF)
Combining the above-mentioned theories, this section proposes a hybrid model based on multi-scale prediction; its process is shown in Figure 4. The implementation of the hybrid model is mainly divided into the following steps:
Step 1: Decomposition. The original displacement sequence of the dam is decomposed by the VMD method into several components: low-frequency, high-frequency, and residual.
Step 2: Component prediction. LSTM is used to predict the high-frequency components and the residual (ER), while the low-frequency components are predicted by RF. The optimal parameters of each model are determined by grid search and five-fold cross-validation.
Step 3: Reconstruction. The components obtained by the above prediction models are added together to reconstruct the final dam deformation prediction value.
Step 4: Evaluation. To verify the accuracy of the model predictions, the mean absolute error (MAE), root mean square error (RMSE) and coefficient of determination (R2) are computed and compared with the baseline models.
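The four steps can be outlined as a structural sketch. The predictors and the routing function here are placeholders (our own names); in the actual model they are the trained LSTM and RF, with the component assignment described in this paper.

```python
import numpy as np

def multiscale_forecast(series, decompose, route, predictors):
    """Step 1: decompose the series; Step 2: predict each component with
    the model chosen by `route`; Step 3: superimpose the predictions.
    `decompose(series)` -> list of components (IMFs plus residual);
    `route(index, component)` -> key into the `predictors` dict."""
    components = decompose(series)                 # Step 1
    preds = [predictors[route(i, c)](c)            # Step 2
             for i, c in enumerate(components)]
    return np.sum(preds, axis=0)                   # Step 3
    # Step 4 (evaluation) compares this reconstruction with held-out data.
```

With identity predictors, the reconstruction must return the sum of the components, which checks that the superposition step is wired correctly.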
2.6. Comparison Model Extreme Learning Machine (ELM)
ELM is a single-hidden-layer feedforward neural network proposed by Huang et al. in 2006, which consists of an input layer, a hidden layer and an output layer. In this article, the neurons in the input layer correspond to the n input variables of the model, and the neurons in the output layer correspond to the horizontal displacement of the concrete dam crest. The specific principles of ELM are as follows:
Given Q training samples $(x_j, t_j)$, where $x_j=[x_{j1},x_{j2},\dots,x_{jn}]^T\in\mathbb{R}^n$ and $t_j=[t_{j1},t_{j2},\dots,t_{jm}]^T\in\mathbb{R}^m$, and assuming that the hidden layer activation function is $g(\cdot)$, the output vector is:

$$o_j=\sum_{i=1}^{K}\beta_i\,g\left(w_i\cdot x_j+b_i\right),\quad j=1,2,\dots,Q \tag{15}$$

Here $b_i$ represents the threshold of the i-th hidden node, $w_i$ is the weight vector from the input layer to the i-th hidden node, and $\beta_i$ represents the weight vector from the i-th hidden node to the output layer. Formula (15) can be expressed in matrix form as $H\beta=T$. According to the ELM theorem, when the number of training samples is large, the number of hidden neurons K is usually less than Q; at this time, the training error of ELM can be made smaller than any real number $\varepsilon$ greater than 0. The connection weight $\beta$ between the hidden layer and the output layer is obtained by finding the least squares solution of $H\beta=T$, namely $\hat{\beta}=H^{+}T$, where $H^{+}$ is the Moore-Penrose generalized inverse of the hidden layer output matrix $H$.
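Under this formulation, ELM training reduces to one random projection followed by one pseudoinverse solve. A minimal NumPy sketch (function names and the tanh activation are our own illustrative choices):

```python
import numpy as np

def elm_fit(X, T, K, rng):
    """Train an ELM: random input weights and thresholds, then solve the
    output weights in closed form with the Moore-Penrose pseudoinverse."""
    n = X.shape[1]
    W = rng.standard_normal((n, K))   # input-to-hidden weights w_i
    b = rng.standard_normal(K)        # hidden thresholds b_i
    H = np.tanh(X @ W + b)            # hidden layer output matrix H
    beta = np.linalg.pinv(H) @ T      # least squares solution of H beta = T
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

Because only beta is learned, training is a single linear solve, which is why ELM is fast compared with iteratively trained networks.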
2.7. Research and Design
2.7.1. Project Overview
The study takes as an example a roller compacted concrete gravity dam located in Fujian Province, China; the design flood level is 633.0 m, the elevation of the foundation surface is 562.0 m, and the maximum dam height is 73.4 m. The horizontal displacement of the concrete dam crest is monitored by the tension line method, with a total of 11 measuring points: 9 working measuring points located on the top of each dam section, and 2 check base points located at the left and right ends of the tension line. The deformation monitoring system of the dam covers horizontal and vertical displacement; this article models and analyzes the horizontal displacement of the dam crest. Taking into account the unevenness and local differences of the dam deformation, the monitoring data of the middle measuring point EX5 of the fourth dam section and the left-bank measuring point EX2 are analyzed, as shown in Figure 5. The deformation data obtained after outlier processing of these two series are shown in Figure 6. It can be seen that the data are highly volatile; EX5 shows overall periodicity, while EX2 appears irregular, with no obvious pattern. The numbers of deformation monitoring records at the two measuring points are 739 and 417, respectively. The division into training, validation and prediction sets is shown in Table 1.
2.7.2. Determination of K Value Based on VMD
Since the choice of the number of modes K greatly affects the decomposition effect of VMD, K must be determined before decomposing the data. In this section, EX5 is taken as an example: the data are decomposed nine times with the VMD model, and the line chart of the mean instantaneous frequency of the components under different K values is calculated, as shown in Figure 7. It can be seen from the figure that when K = 6, the change in the mean instantaneous frequency of two of the components is significantly reduced, and when K > 6 this trend becomes more obvious, indicating that mode mixing is present in the decomposition results, which prevents the decomposed sub-sequences from expressing the dam deformation characteristics well. Therefore, K = 5 is selected as the best number of modes for EX5, and K = 6 is selected for measuring point EX2 according to the same method; the specific decomposition results of EX5 and EX2 are shown in Figure 8.
It can be seen from the decomposition results that the frequency characteristics of all components are distinct, with no undesirable phenomena such as mode mixing. In the decomposition of the EX5 deformation data, the frequencies of IMF1–IMF3 are lower: IMF1 is basically consistent with the dam deformation trend, the IMF2–IMF3 process lines are relatively smooth, IMF4–IMF5 show obvious periodicity, and the residual ER is highly volatile. Therefore, IMF1–IMF3 are used as the input of RF, while IMF4–IMF5 and ER are the input of LSTM. Similarly, in the decomposition of the EX2 deformation data, IMF1–IMF4 are used as the input of RF, and IMF5–IMF6 and ER are used as the input of LSTM.
2.7.3. Evaluation Index
In order to evaluate the proposed model, six evaluation indicators are introduced for comparison with other models in this study. MAE and RMSE indicate the degree of deviation between the predicted value and the true value, R2 expresses the correlation between the true value and the predicted value, and the remaining indicators measure the degree of improvement of the proposed model over other models. Their definitions and formulas are shown in Table 2, in which $y_i$ is the observed value and $\hat{y}_i$ is the predicted value.
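Assuming the standard definitions of MAE, RMSE and R2, and writing the improvement indicator as the relative error reduction against a baseline (an assumption about the form used in Table 2), the metrics can be sketched as:

```python
import numpy as np

def mae(y, p):
    """Mean absolute error."""
    return np.mean(np.abs(y - p))

def rmse(y, p):
    """Root mean square error."""
    return np.sqrt(np.mean((y - p) ** 2))

def r2(y, p):
    """Coefficient of determination."""
    ss_res = np.sum((y - p) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def improvement(err_baseline, err_proposed):
    """Relative reduction of an error metric versus a baseline model."""
    return (err_baseline - err_proposed) / err_baseline
```

For example, reducing MAE from 0.402 (single LSTM) to 0.174 gives an improvement of about 56.7%, matching the figures reported later.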
2.7.4. Model Realization
After decomposition, the high frequency components and ER will be predicted by LSTM. The realization of the LSTM model needs to consider the hidden layer, the number of neurons in each layer, the size of the training batch, the activation function and the optimizer. Too many hidden layers and neurons will increase the computer load, increase the workload of the computer, and reduce work efficiency. On the contrary, too little can easily lead to underfitting of training data, which in turn affects the prediction accuracy. In order to prevent overfitting, a dropout layer is added for each hidden layer. In addition, during the training process, the Adam optimizer is currently considered to be the best performing optimizer. See
Table 3 for detailed parameter information.
The decomposed low-frequency components are used as the input of the RF. Here, bootstrap, max_depth and n_estimators are the main parameters considered. Generally speaking, if n_estimators is too small, the model easily underfits; if it is too large, training is time-consuming, so a moderate value must be chosen during tuning. Meanwhile, due to the large amount of data, max_depth also needs to be set. These parameters are determined by grid search and 5-fold cross-validation, with the traversal ranges determined from historical experience and manual experiments. The relevant parameters are shown in Table 4.
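The grid search with 5-fold cross-validation over these RF parameters can be sketched with scikit-learn. The parameter grid and the synthetic data below are illustrative only, not the traversal ranges of Table 4.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for a low-frequency component with lagged features.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X[:, 0] + 0.1 * rng.standard_normal(100)

param_grid = {"n_estimators": [50, 100],   # illustrative ranges
              "max_depth": [5, 10],
              "bootstrap": [True]}
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)
best = search.best_params_
```

The best parameter combination found by the 5-fold cross-validation is then used to retrain the final RF on the full training set.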
3. Results and Discussion
In this section, the sub-sequences of different frequencies obtained from the decomposition at the two measuring points are input into LSTM and RF, respectively, to obtain the corresponding predicted values. To evaluate the applicability of the model, the prediction performance of LSTM and RF on high- and low-frequency data is discussed for the two monitoring points EX5 and EX2 and analyzed in terms of the correlation coefficient. Finally, the proposed model is compared with traditional prediction models, and it is found to be highly practical for dam deformation prediction.
As shown in Figure 9a and Figure 10a, LSTM has great advantages in capturing the inflection points of high-frequency data, although occasionally a point is missed; our analysis suggests this happens when the data's volatility is too strong. Figure 9b and Figure 10b show that the correlation between the predicted and true values at the two measuring points reaches 95.3% and 95.0%, respectively, and most of the prediction points lie within the 95% confidence interval, which shows that LSTM has a strong ability to mine high-frequency data. Figure 9c and Figure 10c indicate that RF also predicts the stable data well, both in the trend of the data and in the time lag. As can be seen from Figure 9d and Figure 10d, although the residual sequence is an uncertain component, the predicted correlation coefficients at the two measurement points still reach 99.4% and 98.1%, respectively, which proves that inputting the low-frequency components into RF is effective. Therefore, RF is considered to have positive significance for the prediction of the low-frequency components.
Then, the high-frequency, low-frequency and residual prediction results are reconstructed to obtain the final predicted value of the dam deformation. This section selects a single LSTM, RF and Extreme Learning Machine (ELM) as comparison models. Compared with a traditional single-hidden-layer neural network, ELM has the advantages of fewer parameters, fast training speed and strong generalization ability [16]. The comparison results are shown in Figure 11. From Figure 11a, it can be seen that the predicted trends of all models roughly match the measured trend, but the predictions of the proposed model are closer to the real results and better capture the mutation points of the dam deformation. From Figure 11b, it can be found that the absolute error of the proposed model is the smallest, which illustrates the accuracy of its predictions and its reliability for concrete dam deformation prediction. In order to detect outliers between the observed and predicted values, this study further uses box plots, which do not rely on any distributional assumptions and are very robust to outliers. Figure 11c shows the box plot results of the proposed model and the comparison models at measuring point EX5. It can be found that the residuals of the proposed model are distributed within 1.5 interquartile ranges (IQR), with almost no extreme outliers and only a few mild ones. In order to illustrate the prediction performance of each model more intuitively, the six evaluation indicators are computed; the corresponding values are shown in Table 5, in which the proposed model serves as the reference for the improvement indicators.
Table 5 shows that the model proposed in this study achieves the lowest values among all compared models for both MAE and RMSE, at 0.174 and 0.214, respectively. Its correlation coefficient between the real and predicted values is the closest to 1; owing to the denseness of the data, the correlation coefficient improves by 2.2% at minimum and 9.6% at maximum. It can also be seen from the table that a single LSTM model or RF model predicts the overall data less well than the proposed model: the MAE and RMSE of a single LSTM are 0.402 and 0.512, and those of a single RF are 0.355 and 0.441. Relative to LSTM, the proposed model reduces MAE and RMSE by 56.7% and 58.2%, respectively, and relative to RF by 51.0% and 51.5%. Compared with ELM, the improvement is even more significant, reaching 72.9% and 72.0%. In order to further verify the validity of the model, it is then applied to the deformation data of the EX2 measuring point and again compared with the other three models. The prediction results are shown in Figure 12 and Table 6.
From the prediction results of each model at the EX2 measurement point, it can be seen that the proposed model is still the closest to the true value, as shown in Figure 12a. Figure 12b shows that its residual is the smallest and most stable, and Figure 12c exhibits almost no extreme outliers between the observed and predicted values, which shows that the proposed model improves on a single LSTM and RF to a certain extent. According to Table 6, the degree of improvement in every index exceeds 70%, with the correlation coefficient improving the most. This is because the EX2 measurement point is located on the left bank of the dam, where the data are discrete and volatile; after decomposition the data become orderly, making them easy for the proposed model to predict, whereas the undecomposed data are difficult for the other models to mine and their pattern hard to capture, resulting in a small correlation coefficient. In summary, this further demonstrates the accuracy and practicability of the proposed model for dam deformation prediction.
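The 1.5 IQR box-plot rule used in the residual analyses above follows the usual convention: mild outliers lie beyond 1.5 IQR from the quartiles, extreme outliers beyond 3 IQR. A small NumPy sketch (function name ours):

```python
import numpy as np

def classify_outliers(residuals):
    """Box-plot rule: flag mild outliers (beyond 1.5*IQR from the
    quartiles) and extreme outliers (beyond 3*IQR)."""
    q1, q3 = np.percentile(residuals, [25, 75])
    iqr = q3 - q1
    mild = (residuals < q1 - 1.5 * iqr) | (residuals > q3 + 1.5 * iqr)
    extreme = (residuals < q1 - 3.0 * iqr) | (residuals > q3 + 3.0 * iqr)
    return mild & ~extreme, extreme
```

Applied to prediction residuals, a model whose residuals trigger neither flag behaves like the proposed model in Figure 11c and Figure 12c.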
The prediction results of each measurement point show that the model proposed in this paper solves the problem of inaccurate prediction caused by the strong volatility of dam deformation data to a certain extent. At the same time, the proposed model is significantly better than other advanced models. The proposal of this model provides theoretical knowledge for the analysis of the deformation prediction of important dam types such as RCC dams, and also provides prior knowledge for the construction of the dam safety monitoring system.