1. Introduction
Ensuring dam safety is a fundamental and critical aspect of engineering operation and management. Dam deformation is a key indicator of the overall safety status of a dam and is used as the primary monitoring parameter both in China and internationally [1]. Dam deformation is influenced by external factors such as water level, sedimentation, and environmental temperature, as well as internal factors such as material properties, and the resulting relationships are complex and often nonlinear. Traditional statistical models fit factors such as water level, temperature, and creep and are widely used in engineering analysis. However, they are limited in handling multidimensional inputs, adaptive learning, and fitting complex nonlinear relationships. In addition, they introduce considerable selection bias because the distribution of the data must be assumed in advance.
To overcome these problems, a large number of models based on mathematical methods such as neural networks have been proposed in recent years. Multi-layer neural networks have excellent nonlinear fitting capabilities, but multi-layer perceptrons and traditional recurrent neural networks cannot learn and remember long-term dependency information due to the vanishing gradient problem. Dam deformation measurements exhibit clear regularity, and the long-term dependencies between data points in the time dimension must be considered. The Long Short-Term Memory (LSTM) network [2] proposed by Hochreiter and Schmidhuber is a variant of recurrent neural networks and has been widely used in time series analysis and natural language processing [3,4,5] owing to its ability to extract and remember long-term dependency information; it therefore has good application prospects in dam deformation analysis. Ou Bin et al. applied the LSTM neural network to the prediction of concrete dam deformation, effectively mining and learning the complex nonlinear relationships between dam deformation and various environmental factors. Comparisons with traditional methods such as stepwise regression and multivariate regression show that the LSTM-based deformation prediction model has superior performance [5,6,7,8].
Some parameters of neural networks must be set manually before training and cannot be learned from the data. The reasonable determination of these hyperparameters is the primary problem in optimizing neural networks, and automating the hyperparameter optimization process is key to deploying time series models in production environments. Current hyperparameter optimization algorithms mainly fall into two categories: Bayesian optimization and evolutionary algorithms [9]. Evolutionary algorithms, such as the Sparrow Search Algorithm (SSA) [10], Particle Swarm Optimization (PSO) [11], and the Artificial Bee Colony algorithm [12], have been widely applied to optimize the hyperparameters of LSTM neural networks, achieving good results. These algorithms offer broad applicability and higher robustness than traditional grid-search and random-search algorithms. The core of Bayesian optimization is constructing a prior probability model from known sample points to estimate the posterior distribution; this makes full use of existing information to determine the step size and direction of the next search [9]. Bayesian optimization has shown great potential for parameter estimation in various fields, such as combinatorial optimization, neural architecture search, automated machine learning, safety monitoring models, and environmental monitoring [13,14,15,16].
Time series decomposition is an efficient technique widely used in time series prediction [17,18,19]. By decomposing measured data with temporal features and then predicting the residual, periodic, and trend components separately, prediction accuracy can be further improved [20,21]. Li Bin et al. combined the Seasonal Autoregressive Integrated Moving Average (SARIMA) model and the Multivariate Linear Regression (MLR) model in their research on earth-rock dam displacement prediction, predicting the periodic and trend components obtained after Hodrick–Prescott (HP) filtering [22]. Dong Yong et al. first used Empirical Mode Decomposition (EMD) to decompose the original deformation monitoring data of a roller-compacted concrete dam and then further decomposed the high-frequency components using Ensemble Empirical Mode Decomposition (EEMD) to extract effective deformation information [23]; by modeling these components separately with LSTM, they effectively improved on the prediction accuracy of a single model. Lin Chuan et al. used Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) to remove redundant data from a concrete dam deformation sequence, reducing screening time and improving prediction accuracy [24]. This body of work shows that decomposing the original time series with statistical methods reduces the impact of random redundant data on a single model and, at the same time, allows different components to be modeled with models suited to their characteristics [25], thereby further improving prediction accuracy.
In summary, neural networks have certain advantages over traditional models in adaptive learning and complex nonlinear relationship fitting, but the choice of hyperparameters significantly affects prediction accuracy. In this study, a forward analysis model based on prototype observation data is constructed. The LSTM neural network is used to learn the potential rules and patterns in the actual operation process of the dam, and periodic analysis, STL decomposition, and TPE hyperparameter optimization are introduced into the model to achieve automated modeling of numerous measurement points. The effectiveness of the model is verified through its practical application in the Wanjiazhai Dam project, and the prediction accuracy is improved compared to the single LSTM model and traditional Support Vector Machine (SVM) and Autoregressive Integrated Moving Average (ARIMA) models.
2. Principle of the TPE-STL-LSTM Model
2.1. LSTM
Recurrent Neural Networks (RNNs) address the lack of sequential memory in Multi-Layer Perceptrons (MLPs) by sharing model parameters through a recurrent structure. However, as the sequence length increases, the gradients of the loss function can become very small when propagated back through many time steps (or layers), leading to the vanishing gradient problem [26,27].
As shown in Figure 1, Long Short-Term Memory (LSTM) [2] neural networks maintain and transmit a cell state across layers, which permeates the entire recurrent architecture. Specifically designed “gate” structures determine how the cell state is updated and modified through elementary linear operations, which improves gradient propagation during backpropagation (BP). The gates selectively filter the parameters returned by the error function during gradient descent, preserving long-term dependencies and discarding irrelevant information, thereby mitigating, to a certain extent, the problems of exploding and vanishing gradients. For each input time sequence, each layer of the LSTM neural network computes the following functions [28]; the symbols in Figure 1 are the same as in the formulas.
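A standard reconstruction of these LSTM layer functions is given below (the weight and bias notation, e.g., $W$ and $b$, is assumed here rather than taken from the original figure):

$$
\begin{aligned}
i_t &= \sigma\left(W_{ii}x_t + W_{hi}h_{t-1} + b_i\right)\\
f_t &= \sigma\left(W_{if}x_t + W_{hf}h_{t-1} + b_f\right)\\
g_t &= \tanh\left(W_{ig}x_t + W_{hg}h_{t-1} + b_g\right)\\
o_t &= \sigma\left(W_{io}x_t + W_{ho}h_{t-1} + b_o\right)\\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t\\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
$$

where $\sigma$ is the sigmoid function and $\odot$ denotes element-wise multiplication.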
In this formulation, $h_t$ denotes the hidden state at time $t$, $c_t$ signifies the cell state at time $t$, $x_t$ represents the input time series data at time $t$, and $h_{t-1}$ corresponds to the hidden state at time $t-1$ (or the initial state). The input gate, forget gate, cell gate, and output gate are represented by $i_t$, $f_t$, $g_t$, and $o_t$, respectively. The input gate updates the cell state, the forget gate determines whether information in the cell state is discarded or retained, and the output gate controls the input received by subsequent neurons. This gate structure preserves the latent information and patterns in long measurement sequences.
2.2. TPE Hyperparameter Optimization
In Bayesian optimization, Bayes’ theorem is used to update the prior probability distribution of unknown parameters based on observed data. This method is particularly useful for optimizing objective functions that are computationally expensive or time-consuming to evaluate [29]. The method consists of two key components: a probabilistic surrogate model and an acquisition function. The surrogate model approximates the black-box objective function by tracking past evaluation results, and the acquisition function predicts the most promising locations of optimal points given the currently known data.
The Tree-structured Parzen Estimator (TPE) [30] is a hyperparameter optimization algorithm proposed by Bergstra based on Bayes’ rule; it employs kernel (Parzen) density estimators, equivalent to mixtures of Gaussians, to construct the probability surrogate model. The surrogate model comprises two parts.
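A standard form of this two-part density model (a reconstruction of the formula; notation assumed) is:

$$
p(x \mid y)=
\begin{cases}
\ell(x), & y < y^{*}\\
g(x), & y \ge y^{*}
\end{cases}
$$

Maximizing the expected improvement is then equivalent to selecting candidate points with a large ratio $\ell(x)/g(x)$.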
In this expression, $y^{*}$ denotes the best observed objective function value to date; $\ell(x)$ and $g(x)$ are the probability density functions maintained for observations below and above this value, respectively. The next sampling point is chosen by maximizing the expected improvement (EI), with the aim of obtaining a parameter combination superior to the existing observations.
In this study, the Optuna [31] framework proposed by Akiba, Sano, et al. is employed for hyperparameter optimization. This framework has several characteristics that make it advantageous in practical applications:
Lightweight: Optuna’s design is concise and easy to install and use, allowing for effortless deployment and application in various computing environments.
Modular design: Optuna adopts a modular approach, enabling users to freely combine different optimization algorithms, search space definitions, and evaluation methods, making it adaptable to a wide range of scenarios.
High extensibility: Optuna’s design permits the addition of custom optimization algorithms and evaluation methods to meet the requirements of specific problems, ensuring continuity and customization for subsequent research.
Database-based parallelism: Capitalizing on TPE’s characteristics, Optuna implements a parallel mode that maintains surrogate models through a central database. In multi-node computing environments, this accelerates the hyperparameter search process, enabling the discovery of better parameter combinations within a limited timeframe.
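A minimal sketch of how such a database-backed TPE search might be set up is shown below (the storage URL, study name, and toy objective are illustrative assumptions, not the configuration used in this study):

```python
import optuna
from optuna.samplers import TPESampler


def objective(trial: optuna.Trial) -> float:
    # Toy stand-in for a model-training objective: each trial suggests
    # hyperparameters and returns a validation loss to be minimized.
    learning_rate = trial.suggest_float("learning_rate", 1e-4, 1e-1, log=True)
    hidden_size = trial.suggest_int("hidden_size", 16, 256)
    return (learning_rate - 0.01) ** 2 + 1.0 / hidden_size


# Several worker processes can run this same script: they share one study
# through the relational-database storage, so the TPE surrogate model is
# updated by every completed trial regardless of which worker ran it.
study = optuna.create_study(
    study_name="lstm-hpo-demo",          # hypothetical study name
    storage="sqlite:///optuna_demo.db",  # any RDB URL supported by Optuna
    load_if_exists=True,
    direction="minimize",
    sampler=TPESampler(),
)
study.optimize(objective, n_trials=50)
print(study.best_params)
```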
2.3. STL Decomposition
Seasonal and Trend decomposition using Loess (STL) [32] is a robust and versatile time series decomposition method based on locally weighted regression, designed primarily for non-stationary time series. Its implementation consists of an inner loop and an outer loop. The inner loop extracts the trend and seasonal components of the time series, while the outer loop adjusts the weights of the robust locally weighted regression used in the inner loop. Through this double recursion, a non-stationary time series is ultimately decomposed into three additive components: trend, seasonal, and residual.
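A minimal sketch of such a decomposition with the statsmodels STL implementation (the synthetic daily series and the 365-day period are assumed for illustration only):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

# Synthetic daily displacement-like series: trend + annual cycle + noise
rng = pd.date_range("2018-01-01", periods=1462, freq="D")
t = np.arange(len(rng))
y = 0.001 * t + 2.0 * np.sin(2 * np.pi * t / 365.0) + np.random.normal(0, 0.2, len(rng))
series = pd.Series(y, index=rng)

# period: the periodicity parameter discussed in this section;
# robust=True enables the outer loop that down-weights outliers.
result = STL(series, period=365, robust=True).fit()
trend, seasonal, residual = result.trend, result.seasonal, result.resid
```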
In STL decomposition applications, a periodic parameter must be provided. To enhance decomposition accuracy and automate the modeling process, the autocorrelation function and Fourier transform are employed to determine the periodic parameter of the original monitoring data.
The Autocorrelation Function (ACF) characterizes the similarity of a time series at different time lags, with values ranging from −1 to 1; a value closer to 1 indicates a stronger positive correlation between the series and its lagged copy. The autocorrelation reaches local maxima at integer multiples of the actual period of the series, making it a common tool for identifying repetitive patterns in time-domain signals. The autocorrelation function is

$$
r_k = \frac{\sum_{t=1}^{n-k}\left(y_t - \bar{y}\right)\left(y_{t+k} - \bar{y}\right)}{\sum_{t=1}^{n}\left(y_t - \bar{y}\right)^{2}}
$$

where $k$ is the periodicity (lag) parameter, $n$ is the total length of the sequence, $t$ is the summation index, $y_t$ is the $t$-th data point, and $\bar{y}$ is the mean of all the data.
In this study, the Fourier transform is employed to convert time series data signals from the time domain to the frequency domain, with the maximum amplitude sine wave selected as the optimal period (T) for the time series. The autocorrelation coefficients of several periods with relatively high weights are calculated to ensure the reliability of the periodic parameter selection.
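A minimal sketch of this frequency-domain period search and its ACF check (the synthetic series and function names are illustrative assumptions):

```python
import numpy as np

# Synthetic daily series with a dominant 365-day cycle
t = np.arange(1462)
y = 2.0 * np.sin(2 * np.pi * t / 365.0) + np.random.normal(0, 0.2, t.size)

# Discrete Fourier transform; drop the zero-frequency (mean) term
spectrum = np.abs(np.fft.rfft(y - y.mean()))
freqs = np.fft.rfftfreq(y.size, d=1.0)           # cycles per day
periods = 1.0 / freqs[1:]                         # days per cycle
best_period = periods[np.argmax(spectrum[1:])]    # period with the largest amplitude


def acf(x: np.ndarray, lag: int) -> float:
    """Autocorrelation coefficient r_k as defined above."""
    x = x - x.mean()
    return float(np.sum(x[:-lag] * x[lag:]) / np.sum(x * x))


print(best_period, acf(y, int(round(best_period))))
```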
2.4. Modeling Process
Figure 2 illustrates the model construction process coupling TPE, STL, and LSTM, as well as their execution logic within the automated monitoring system. The red-framed section in the figure represents the modeling service workflow, while the blue section below illustrates the dam displacement prediction service workflow in a production environment. If the prediction accuracy does not meet the specified requirements, the model will be rebuilt and stored in the model database.
As shown in Figure 2, the modeling process comprises four core components: preprocess, filter data, period analysis, and TPE-HPO (hyperparameter optimization).
Preprocess: Preprocess the raw monitoring data obtained from the monitoring database, including white noise detection, data resampling, and outlier removal.
Filter data: Select input data for the model through correlation analysis, partitioning them into training and validation sets. The training set is used for model training, while the validation set supports TPE algorithm parameter optimization.
Period analysis: Using the discrete Fourier transform, convert the input data to the frequency domain and select the period of the sine wave with the highest amplitude as the periodic parameter for the time series. Then decompose the data using STL, yielding the trend, seasonal, and residual components.
TPE-HPO: Establish a corresponding LSTM neural network model for each of the trend, seasonal, and residual components, and use the TPE algorithm to optimize the hyperparameter values of each model, selecting the best configuration. The hyperparameter optimization procedure is shown in the flowchart in Figure 3, where the train loop represents each epoch and the HPO loop denotes each trial. The train loop contains pruning operations that compare model fitting accuracy within the same training iteration; if the accuracy is insufficient, training terminates early, discarding unpromising trials and improving HPO efficiency (see the sketch after this list). TPE optimization then determines the hyperparameter values for the next train loop.
Service: Obtain the predicted displacement value by summing the predicted results of the trend, seasonal, and residual components. Once the model optimized by the TPE algorithm has been validated against the latest monitoring data, it is stored in the model database.
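The pruning logic inside the train loop might look like the following sketch (the toy data, model dimensions, and pruner choice are illustrative assumptions; only the report/prune pattern reflects the workflow described above):

```python
import optuna
import torch
import torch.nn as nn

# Illustrative data: 200 windows of a 10-dimensional input sequence (length 30)
X = torch.randn(200, 30, 10)
y = torch.randn(200, 1)


def objective(trial: optuna.Trial) -> float:
    hidden_size = trial.suggest_int("hidden_size", 16, 128)
    num_layers = trial.suggest_int("num_layers", 1, 3)
    lr = trial.suggest_float("learning_rate", 1e-4, 1e-1, log=True)
    num_epochs = trial.suggest_int("num_epochs", 10, 50)

    lstm = nn.LSTM(input_size=10, hidden_size=hidden_size,
                   num_layers=num_layers, batch_first=True)
    head = nn.Linear(hidden_size, 1)
    optimizer = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()), lr=lr)
    loss_fn = nn.MSELoss()

    for epoch in range(num_epochs):          # train loop
        optimizer.zero_grad()
        out, _ = lstm(X)
        loss = loss_fn(head(out[:, -1, :]), y)
        loss.backward()
        optimizer.step()

        # Report intermediate accuracy; the pruner may stop an unpromising trial early.
        trial.report(loss.item(), epoch)
        if trial.should_prune():
            raise optuna.TrialPruned()

    return loss.item()


study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler(),
                            pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=20)        # HPO loop
```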
3. Engineering Case Application
The Wanjiazhai project is one of the key initiatives within the pilot implementation plan for digital-twin river basins constructed by China’s Ministry of Water Resources, with dam deformation prediction and analysis being a crucial aspect. In this study, the Wanjiazhai concrete gravity dam serves as the case project, and an automated TPE-STL-LSTM dam deformation prediction model is established, predicting dam deformation using data from 14 dam sections.
3.1. Project Overview
This Grade I large-scale project primarily comprises a river-blocking dam, a powerhouse behind the dam, and other structures. The river-blocking dam is a semi-integral concrete gravity dam with a crest length of 443 m and a maximum dam height of 105 m, and it is divided into 22 sections, as shown in Figure 4. The dam crest displacement is jointly observed by sightline and vacuum laser alignment systems, which are mutually verified, while dam body deflections are monitored by four sets of plumb and inverted plumb lines distributed across the bank-slope and riverbed sections. Horizontal and vertical displacement control networks are established near the dam site to monitor rock mass displacements in the vicinity and to verify displacement changes at the dam’s horizontal and vertical working points.
Section 14 of the dam serves as the powerhouse section and is a typical observation section. Various monitoring instruments are comprehensively and centrally arranged on the observation cross-section of this section. This study is based on the actual observed measurements of section 14 for modeling, prediction, and analysis. Displacement data are obtained from the LBD-14 laser alignment system measuring point at the top of the dam. A series of daily measured data from 1 January 2018 to 31 December 2021, spanning four years, are used as the original modeling samples. After outlier filtering and data resampling, 1462 sets of measured data are obtained, and they are divided into training and validation sets at a ratio of 4:1. The training set is used to determine the weights between neurons in the neural network, while the validation set is used to test the model’s prediction performance and generalization capabilities.
As seen in the measured data curves in Figure 5 and Figure 6, the water level during the dam’s operation phase is significantly affected by the dispatching scheme, and the along-river crest displacement and dam foundation pressure are strongly correlated with the reservoir water level. Owing to sediment discharge scheduling, the reservoir water level is drawn down to its lowest point of the year in September, at which time the crest deformation towards upstream reaches its annual maximum. Figure 6a,b show the strain values monitored on the surface and internally, respectively; both exhibit a high correlation with temperature. The measured data conform to the general deformation pattern of concrete gravity dams and can be used to test model performance.
3.2. Correlation Analysis
Dam deformation is influenced by various environmental factors and exhibits complex nonlinear relationships with them. The abundance of measured data from automated monitoring allows models to fully learn the various features related to dam deformation, but it also introduces a large volume of irrelevant data. Using all measured data for model training would result in excessive computation and prolonged training times. Correlation analysis eliminates some of this irrelevant information and verifies the validity of the monitoring data. Employing measured data with a high correlation to dam displacement as the original training sample avoids ineffective inputs, extracts the main features, and keeps the computational cost manageable.
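A minimal sketch of this kind of screening with pandas (the column names and the 0.5 threshold are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Illustrative monitoring table: target displacement plus candidate inputs
rng = np.random.default_rng(0)
n = 1462
df = pd.DataFrame({
    "crest_displacement_14X": rng.normal(size=n).cumsum(),
    "reservoir_level": rng.normal(size=n).cumsum(),
    "air_temperature": rng.normal(size=n),
    "strain_14_S1": rng.normal(size=n),
})

# Pearson correlation of every candidate series with the target displacement
corr = df.corr()["crest_displacement_14X"].drop("crest_displacement_14X")

# Keep only series whose absolute correlation exceeds a chosen threshold
selected = corr[corr.abs() > 0.5].index.tolist()
print(corr, selected, sep="\n")
```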
In the correlation heatmap, prefixes indicate the dam section where the instrument is located, X represents the along-stream (horizontal) direction, and Y denotes the vertical direction. As shown in Figure 7, reservoir water levels exhibit a distinct correlation with the along-stream displacement of the dam crest in section 14. Some internal monitoring instruments are significantly affected by temperature, displaying a strong positive correlation. This corroborates the qualitative analysis of the deformation and water level curves in the previous section and demonstrates the effectiveness of the correlation analysis. In addition to environmental factors such as water level and temperature, this study also utilizes highly correlated strains from the same dam section as original data for establishing a multi-dimensional LSTM model.
3.3. Periodic Analysis and STL Decomposition
In STL decomposition, an error in the cycle parameter causes the decomposition result to deviate abnormally: the weight of the seasonal component decreases, and the trend or residual components retain cyclical factors, rendering the decomposition meaningless and reducing prediction accuracy. Using the discrete Fourier transform, the along-river displacement time series is converted to the frequency domain, resulting in the spectrum shown in Figure 8, where the horizontal axis represents the cycle and the vertical axis represents the amplitude of the corresponding cycle. A larger amplitude indicates that the cycle accounts for a larger proportion of the original measured data, and the cycle with the largest amplitude is selected as the optimal cycle.
According to the spectral analysis, the autocorrelation coefficient for each optimal cycle is calculated using the autocorrelation function. Some calculation results for the measured sequences are shown in Table 1, where vertical displacement is taken from the 14-Y absolute displacement and horizontal displacement from the 14-X absolute displacement. The optimal cycles are concentrated around 365 and 182 days. The ACF value of temperature with a 182-day cycle is −0.7763, while the ACF value with a 365-day cycle is 0.6813. This result deviates from the optimal cycle obtained by the Fourier decomposition, but based on practical experience, the decomposition cycle is still set to 365 days. The other data exhibit high consistency between the cycles identified by the discrete Fourier decomposition and by the autocorrelation function, indicating that the discrete Fourier method is reliable for cycle analysis.
Based on the cycle parameters obtained from the cycle analysis, the measured values are decomposed using the STL method. Taking the horizontal displacement data as an example, cycle parameters of 91, 182, and 365 days are chosen, and the decomposition results are shown in Figure 9. The cycle parameter significantly affects the STL decomposition results: as it approaches the optimal cycle obtained from the cycle analysis, the periodic fluctuations in the trend component are gradually attributed to the seasonal component, and the regularity of the residual component decreases accordingly. This demonstrates that the STL algorithm performs better at the optimal cycle than at other parameter values. The decomposition results show that the horizontal displacement of the concrete dam is mainly influenced by periodic environmental factors such as water level and temperature. The trend component of the displacement, which reflects time-dependent effects and concrete creep, has stabilized, consistent with the dam’s stable operation for more than twenty years.
In summary, the periodic analysis is performed using the Fourier method, with the period of maximum amplitude selected as the optimal period; this choice is validated using the Autocorrelation Function (ACF) coefficients and by comparing the decomposition results obtained with different periodic parameters. The optimal period is then used as the parameter for the STL decomposition, yielding a trend component with the least periodic behavior, which is consistent with the modeling approach and requirements.
3.4. Hyperparameter Optimization
Following the data preprocessing, periodicity analysis, and time series decomposition steps, a set of series comprising trend, periodic, and residual components is obtained. To eliminate the impact of differing data scales on training performance and to speed up model convergence, these data are standardized. The multi-dimensional LSTM neural network is defined and implemented using PyTorch, with an input layer composed of a multi-dimensional tensor. The input data have ten dimensions, containing information on environmental factors, water head, and strain within the same dam section. The batch size is set to 30, the loss function is the mean squared error, and the widely used Adaptive Moment Estimation (Adam) method is employed for optimization.
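A minimal sketch of such a multi-dimensional LSTM regressor in PyTorch (the class name, the ten-feature input, and the standardization step are illustrative; the hyperparameter values are placeholders later set by the TPE search):

```python
import torch
import torch.nn as nn


class DisplacementLSTM(nn.Module):
    """Hypothetical multi-dimensional LSTM: 10 input features -> 1 displacement value."""

    def __init__(self, input_size=10, hidden_size=64, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):                 # x: (batch, seq_len, input_size)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])     # predict from the last time step


# Standardize each feature, then train with MSE loss and the Adam optimizer
x = torch.randn(30, 20, 10)               # batch of 30 input windows
x = (x - x.mean(dim=(0, 1))) / x.std(dim=(0, 1))
y = torch.randn(30, 1)

model = DisplacementLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
optimizer.zero_grad()
loss = nn.MSELoss()(model(x), y)
loss.backward()
optimizer.step()
```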
To compare and validate the advantages of the decomposition–prediction model over a single model, the hyperparameter optimization is divided into four parts: one multi-dimensional LSTM model directly predicting the original sample data, and three multi-dimensional LSTM models predicting, respectively, the trend, periodic, and residual components obtained from the STL decomposition. All four parts use the TPE algorithm for automated hyperparameter optimization. The hyperparameters to be optimized include the training epochs, the number of LSTM layers, the hidden layer size, and the learning rate.
Table 2 presents the optimal parameter values obtained from the optimization of the four parts, where Hidden_Size represents the feature scale of the LSTM hidden layer, Learning_Rate represents the learning rate, Num_Epochs represents the training epochs, and Num_Layers represents the stacked layers of LSTM. Hidden_Size determines the number or scale of neurons in the LSTM hidden layer. Through the hidden layer, LSTM can remember and utilize previous information to influence current outputs. Learning_Rate controls the step size or rate at which the model updates weights and biases during each iteration. Num_Epochs represents the number of times the model traverses the entire training dataset during the training process.
Hyperparameter importance analysis is an essential task that helps with understanding the stability and accuracy of the model, and it can guide similar time series modeling tasks to better optimize the hyperparameter search space and improve search efficiency. In this study, we explore the impact of common hyperparameters such as learning rate, network depth, hidden layer size, and training epochs on model establishment by tracking training history.
Optuna offers optional fANOVA and Mean Decrease Impurity (MDI) methods for assessing hyperparameter importance [31]. We choose the latter, which fits a random forest regression model to predict the objective values of completed trials from their parameter configurations and computes feature importance using MDI. This approach is robust for high-dimensional, nonlinear datasets.
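In Optuna, this evaluation might be invoked as in the following sketch (assuming a completed study object such as the one created earlier; the toy objective is illustrative):

```python
import optuna
from optuna.importance import MeanDecreaseImpurityImportanceEvaluator


def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("learning_rate", 1e-4, 1e-1, log=True)
    layers = trial.suggest_int("num_layers", 1, 3)
    return (lr - 0.01) ** 2 + 0.1 * layers


study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=30)

# Random-forest / MDI based importance score for each hyperparameter
importances = optuna.importance.get_param_importances(
    study, evaluator=MeanDecreaseImpurityImportanceEvaluator())
print(importances)
```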
The relationships among these hyperparameters are complex, and Optuna’s hyperparameter optimization framework can conveniently implement database-based parallelism based on the TPE algorithm, greatly enhancing the efficiency of exploring the hyperparameter space. As a result, we can investigate the patterns among hyperparameters in many experimental samples.
Figure 10 displays the importance of each hyperparameter for the residual, periodic, and trend components obtained from the STL decomposition, as well as for the undecomposed original data modeled directly.
Both original data and residual components exhibit high sensitivity to num_layers, which may be related to the high randomness and complex nonlinear relationships in the data. The expressive capability of neural networks is closely associated with their width and depth. Trend components capture the long-term tendencies of time series, and these tendencies typically require more iterations during training to be adequately learned and fitted. For periodic components, because the periodic data themselves possess apparent periodicity and regularity, a smaller learning rate may cause slower model convergence and inferior fitting accuracy with the same training epochs. In summary, the understanding of hyperparameter importance is closely related to the inherent characteristics of time series. By adjusting the step size and exploring the parameter search space, we can incorporate prior knowledge into the hyperparameter optimization process, thereby improving its efficiency and accuracy.
4. Discussion
The hyperparameters obtained from the optimization are used to train the model and predict the horizontal displacement of the dam crest at section 14. To demonstrate the superiority and applicability of this model for dam deformation prediction more intuitively, the prediction results are compared with those of SVM and ARIMA models. The SVM model exhibits excellent generalization and robustness for small-sample, high-dimensional, and nonlinear problems [33], making it widely applied in dam deformation prediction. The ARIMA model decomposes time series data into autoregressive, differencing, and moving-average parts, thereby describing the autocorrelation and non-stationarity of the data; it can effectively handle both seasonal and non-seasonal trends in time series data. In this study, the SVM model is provided by the well-known Scikit-learn [34] open-source project, while ARIMA is provided by the open-source pmdarima project, which is similar to the well-known auto.arima [35] in R.
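A minimal sketch of how such baselines might be fitted (the toy data, kernel choice, and ARIMA settings are illustrative assumptions, not the configurations used in this study):

```python
import numpy as np
import pmdarima as pm
from sklearn.svm import SVR

# Toy data: displacement target with two explanatory features
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(0, 0.1, 300)

# SVM regression baseline (RBF kernel)
svm = SVR(kernel="rbf", C=1.0, epsilon=0.01).fit(X[:240], y[:240])
svm_pred = svm.predict(X[240:])

# Automatic ARIMA baseline on the univariate displacement series
arima = pm.auto_arima(y[:240], seasonal=False, suppress_warnings=True)
arima_pred = arima.predict(n_periods=60)
```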
Table 3 presents the monthly average values of the prediction results, actual measurements, and residuals of the four models.
Figure 11 shows the residual area plots of each model. The residuals of all four models fluctuate between positive and negative values, indicating that the predictions oscillate above and below the actual measurements. However, the residual distribution of the TPE-STL-LSTM model is more even than those of the other three models, and as time goes on, the positive and negative fluctuations of the SVM and ARIMA residuals grow larger. This verifies that the LSTM model can mine and learn the relationships within long sequences, thereby improving prediction accuracy.
To further quantify prediction accuracy, the evaluation indicators of the TPE-LSTM, TPE-STL-LSTM, SVM, and ARIMA models are calculated, as shown in Table 4. The TPE-STL-LSTM model performs better on the validation set than the TPE-LSTM model, with a 45.5% reduction in MAPE, a 38.1% reduction in MAE, a 64.3% reduction in MSE, and a 40.2% reduction in RMSE. This indicates that using STL decomposition to separate the different features of the time series before the neural network learns them allows the model to effectively mine the periodic and trend changes in the data and improve prediction accuracy. Compared with the traditional SVM and ARIMA models, the TPE-STL-LSTM model reduces MAPE by 67% and 64%, MAE by 42% and 17%, and MSE by 73% and 66%, respectively. As shown in Figure 12, all four models capture the data trends well, and the TPE-STL-LSTM model still demonstrates good predictive performance for measurements with larger or smaller absolute values.
As a classical statistical analysis method, multiple regression models have the advantages of concise mathematical expressions, good interpretability, and low data requirements, making them widely used in hydrological engineering data analysis. However, multiple regression models have clear limitations in handling nonlinear relationships and high-dimensional data. To ensure the completeness of the analysis, this study applies four multiple regression models, namely Linear Regression, second-order Polynomial Regression, Lasso Regression, and Ridge Regression, provided by the open-source scikit-learn project [34], to make predictions on the same dataset.
These models all establish relationships between multiple independent variables and a dependent variable. As shown in Figure 13, the predicted mean squared errors (MSE) of the four models are around 2.0. It should be noted, however, that multiple regression models may fail to capture the complex relationships involving historical dependencies and high-dimensional features; this limitation restricts their predictive ability and may result in significant prediction errors.
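A minimal sketch of these four regression baselines with scikit-learn (the toy data and regularization strengths are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = X @ np.array([1.0, -0.5, 0.3, 0.0]) + rng.normal(0, 0.2, 300)
X_train, X_test, y_train, y_test = X[:240], X[240:], y[:240], y[240:]

models = {
    "Linear": LinearRegression(),
    "Poly-2": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "Lasso": Lasso(alpha=0.1),
    "Ridge": Ridge(alpha=1.0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: MSE = {mse:.3f}")
```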
Neural network models, unlike traditional statistical models, are data-driven black-box models. However, owing to their powerful predictive capabilities and automatic feature learning (which eliminates the need for manual feature extraction), they have increasingly been used in recent years for time series prediction and safety status assessment in dam engineering. In this context, the autoregressive model based on historical time series data shows good performance, and the hyperparameter optimization procedures described above make it possible to embed the model into a production system. However, its limited interpretability as a black-box model still restricts its adoption in safety monitoring. Nevertheless, this opacity can be reduced through methods such as pre-training, data decomposition, and rational network structure design.