3.1. Displacement Decomposition
Understanding how influencing factors affect the development and prediction of landslides is crucial in this analysis [11]. Based on previous research, we selected influencing factors related to rainfall and reservoir water level.
In this study, the original landslide displacement was decomposed into two components: a trend term and a periodic term. The trend term is influenced by factors such as geological tectonism, weathering, and the evolutionary stage of the deformation.
The periodic term of landslides corresponds to the short-term displacement observed in wading landslides within the Three Gorges Reservoir Area [25,26]. This displacement is predominantly influenced by two factors: rainfall changes and reservoir water level changes. It is important to note that this study did not consider displacement caused by random factors, as it is challenging to monitor and generally of relatively small magnitude [27].
To analyze the accumulated displacement time series, it can be decomposed as follows:

$$S(t_m) = T(t_m) + P(t_m)$$

where $m$ indicates the time step, the samples being equally spaced by an interval $\Delta t$, so that the index $m$ corresponds to the time instant $t_m = m \cdot \Delta t$; $S(t_m)$ is the accumulated displacement, $T(t_m)$ is the trend term, and $P(t_m)$ is the periodic term.
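As an illustration of this additive decomposition, the following minimal sketch extracts a trend term and obtains the periodic term as the residual. The use of a centered moving average and the window length of 12 are assumptions made for illustration only; the actual decomposition method is described elsewhere in the paper.

```python
import numpy as np

def decompose_displacement(s, window=12):
    """Additive decomposition S(t_m) = T(t_m) + P(t_m).

    s      : 1-D array of accumulated displacement sampled at a fixed interval.
    window : assumed moving-average length (e.g., 12 monthly samples).
    """
    s = np.asarray(s, dtype=float)
    # Trend term: centered moving average, with edges padded by boundary values.
    padded = np.pad(s, (window // 2, window - 1 - window // 2), mode="edge")
    trend = np.convolve(padded, np.ones(window) / window, mode="valid")
    # Periodic term: residual after removing the trend.
    periodic = s - trend
    return trend, periodic
```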
3.3. Recurrent Neural Networks and Long Short-Term Memory Neural Networks
The standard recurrent neural network (RNN) module contains a single layer (Figure 4a) [29]. RNNs have input and output units that hold the corresponding data sets: we denote the input data set as $\{x_t\}$ and the output data set as $\{y_t\}$. RNNs also contain hidden units, whose output set is denoted $\{h_t\}$. The hidden state is computed through a non-linear activation function as

$$h_t = f\left(W_{xh} x_t + W_{hh} h_{t-1} + b_h\right)$$

where $h_t$ is the state of the hidden layer at time step $t$, calculated from the output of the current input layer and the state of the previous hidden layer, $W_{xh}$ and $W_{hh}$ are the input-to-hidden and hidden-to-hidden weight matrices, $b_h$ is a bias vector, and $f$ is a non-linear activation function.
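A minimal sketch of this recurrence is given below, using $\tanh$ as the activation function and randomly initialized weights; the dimensions and parameter names are illustrative assumptions, not the configuration used in this study.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One RNN time step: h_t = f(W_xh @ x_t + W_hh @ h_prev + b_h), with f = tanh."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Example: unrolling the recurrence over a short input sequence.
rng = np.random.default_rng(0)
input_dim, hidden_dim, T = 3, 5, 4
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

h_t = np.zeros(hidden_dim)                    # initial hidden state h_0
for x_t in rng.normal(size=(T, input_dim)):   # input sequence x_1, ..., x_T
    h_t = rnn_step(x_t, h_t, W_xh, W_hh, b_h)
```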
Traditional RNNs can suffer from vanishing or exploding gradient problems, making them less effective at handling long-term dependencies. LSTM (long short-term memory) neural networks (Figure 4b), a specific type of RNN, are designed to overcome these limitations [30].
In an LSTM network, each RNN unit is replaced with a memory block containing three gate functions: the input gate, the output gate, and the forget gate. These gate functions control the flow of information within the memory block.
By incorporating these gate functions, LSTM networks can effectively capture and learn long-term dependencies in a data sequence. This makes them particularly well suited for handling and modeling complex temporal relationships over extended periods.
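The sketch below shows one step of a memory block in the standard LSTM formulation, with the input, forget, and output gates controlling how the cell state is updated and exposed. The parameter layout and dimensions are illustrative assumptions; in practice these weights are learned during training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM memory-block step; p is a dict of gate parameters."""
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])   # input gate
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])   # forget gate
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])   # output gate
    g_t = np.tanh(p["W_g"] @ x_t + p["U_g"] @ h_prev + p["b_g"])   # candidate cell state
    c_t = f_t * c_prev + i_t * g_t      # forget gate discards old memory, input gate admits new
    h_t = o_t * np.tanh(c_t)            # output gate exposes the cell state
    return h_t, c_t

def init_params(n_in, n_h, rng):
    """Randomly initialized (illustrative) parameters for the four gate blocks."""
    p = {}
    for g in ("i", "f", "o", "g"):
        p["W_" + g] = rng.normal(scale=0.1, size=(n_h, n_in))
        p["U_" + g] = rng.normal(scale=0.1, size=(n_h, n_h))
        p["b_" + g] = np.zeros(n_h)
    return p

# Example: running the memory block over a short sequence.
rng = np.random.default_rng(0)
p = init_params(n_in=3, n_h=4, rng=rng)
h, c = np.zeros(4), np.zeros(4)
for x_t in rng.normal(size=(6, 3)):
    h, c = lstm_step(x_t, h, c, p)
```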
3.5. Stacking and Its Optimization
When dealing with complex data, achieving an accurate fit with a single model can be difficult. Moreover, single models often exhibit limited robustness to disturbances. To address these challenges in displacement prediction, using an ensemble model, consisting of multiple strategically combined models, can be beneficial. This approach leverages individual models’ diverse strengths and weaknesses to enhance the ensemble model’s overall generalization ability [35].
In this study, the stacking approach combined Boosting and Bagging, which are two commonly used ensemble learning techniques.
Stacking, also known as stacked generalization, involves modeling the stacked predictions generated by multiple base learners fitted to the original data [36]. In this process, the base learners are first trained on the original data to produce individual predictions. These predictions are then stacked horizontally, resulting in a two-dimensional array in which the rows represent the samples and the columns represent the base learners. This newly created data set is then used as input for a higher-level model to further improve prediction performance. Figure 6 depicts the schematic representation of a traditional stacking model.
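A minimal sketch of this traditional scheme is given below. The out-of-fold predictions of each base learner are stacked column-wise and used to train a higher-level meta-model; the base learners and the linear meta-model in this example are illustrative choices, not the learners used in this paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

def fit_traditional_stacking(X, y, base_learners, meta_learner=None):
    """Traditional stacking: k-fold out-of-fold predictions of each base learner
    are stacked column-wise (rows = samples, columns = base learners) and used
    to train a higher-level meta-model."""
    meta_learner = meta_learner or LinearRegression()
    # Out-of-fold predictions keep each sample's prediction untainted by itself.
    Z = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in base_learners])
    for m in base_learners:               # refit the base learners on all fitting data
        m.fit(X, y)
    meta_learner.fit(Z, y)
    return base_learners, meta_learner

def predict_stacking(X_new, base_learners, meta_learner):
    Z_new = np.column_stack([m.predict(X_new) for m in base_learners])
    return meta_learner.predict(Z_new)

# Usage with synthetic data and illustrative base learners.
learners, meta = fit_traditional_stacking(
    X=np.random.default_rng(0).normal(size=(60, 3)),
    y=np.random.default_rng(1).normal(size=60),
    base_learners=[KNeighborsRegressor(), DecisionTreeRegressor(random_state=0)],
)
```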
The sliding window method is a commonly used data processing technique for extracting several subsequences from a long data sequence and then performing calculations and analyses on them. The basic idea is to divide the input sequence into several fixed-length subsequences (also called windows) and to process the data within each window interval to obtain a result sequence. For example, in time-series analysis the original sequence can be divided into several fixed-length windows, and statistics such as the mean, variance, maximum, and minimum can then be computed from the data within each window to serve as that window’s feature values. By extracting features from the data within each window, we can effectively reduce the data dimensionality and enhance the efficiency of data representation and utilization.
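The following sketch illustrates this idea: a 1-D series is cut into fixed-length windows and summarized by the statistics mentioned above. The window length and step size are illustrative defaults, not values prescribed by this study.

```python
import numpy as np

def sliding_window_features(series, window=12, step=1):
    """Split a 1-D series into fixed-length windows and compute summary
    statistics (mean, variance, max, min) for each window."""
    series = np.asarray(series, dtype=float)
    feats = []
    for start in range(0, len(series) - window + 1, step):
        w = series[start:start + window]
        feats.append([w.mean(), w.var(), w.max(), w.min()])
    return np.array(feats)   # shape: (number of windows, 4 features)
```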
This paper introduces a model based on stacked ensemble learning. In the first layer, four distinct deep learning algorithms are employed as base learners, each used to construct its own base model with a single deep learning algorithm. Once these base models are trained, their performance is evaluated on the validation set. Notably, the stacking ensemble technique itself does not intervene in the base models; predictive data are only produced once the base models have been trained. In the second layer, a simple regression algorithm is trained on the data output by the first layer; it requires no hyperparameter optimization and, once the training and prediction sets are divided, directly produces the prediction results.
The conventional stacking model utilizes k-fold cross-validation to handle the dataset, which may have limitations when applied to time series problems. For example, because each training dataset contains information from other samples, data leakage may occur, resulting in overly optimistic evaluation results. In this study, using future data to predict past data would violate the temporal order of the series. To overcome this challenge, this study introduced an optimized stacking model incorporating the sliding window method to process the raw data set, thus preserving the inherent temporal sequence. The flowchart of this method is shown in Figure 7.
Like traditional stacking frameworks, the improved stacking framework utilizing the sliding window method can be divided into two primary components: first-layer algorithms and second-layer algorithms.
The first-layer algorithms partitioned the dataset into a fitting set and a test set in a 6:1 ratio. The fitting set included training and validation sets with varying sample sizes; notably, not all of the fitting set data was used for training in each base model. The sliding window method was employed to avoid validating past landslide data with future landslide data. Based on the previously mentioned dataset partitioning period of 12, the validation datasets consist of the sample data corresponding to the last twelve steps of each training set. Consequently, five different datasets were established with training-to-validation ratios of 1:1, 2:1, 3:1, 4:1, and 5:1, respectively, creating five base models. The GS optimization algorithm was then used to obtain the best hyperparameters for these models, and the retained prediction set was used to evaluate them.
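One plausible reading of this partitioning, sketched below under explicit assumptions, uses expanding training windows of 12, 24, ..., 60 samples, each validated on the 12 samples that immediately follow it so that future data never validates past data. The example sizes (84 samples, a 72:12 fitting/test split) are chosen only to match the 6:1 ratio and the five ratios listed above.

```python
import numpy as np

def first_layer_splits(fitting_set, val_len=12, n_models=5):
    """Assumed reading of the first-layer splits: expanding training windows of
    12, 24, ..., 60 samples, each paired with the next 12 samples as validation,
    giving training-to-validation ratios of 1:1 through 5:1."""
    fitting_set = np.asarray(fitting_set)
    splits = []
    for k in range(1, n_models + 1):
        train = fitting_set[: k * val_len]                     # ratios 1:1 ... 5:1
        val = fitting_set[k * val_len : (k + 1) * val_len]     # following 12 steps
        splits.append((train, val))
    return splits

# Example: 84 samples split 6:1 into a fitting set (72) and a test set (12).
data = np.arange(84.0)
fitting_set, test_set = data[:72], data[72:]
splits = first_layer_splits(fitting_set)
```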
In the second-level algorithm, the input dataset was constructed from the outputs of the first-level algorithm. Specifically, for each base learner, the datasets with five distinct sample sizes each produced a set of validation data, VR n, obtained by validating the model trained on the training set against its validation set. These five sets of validation data were then stacked to form the input data for the second-level algorithm’s training samples, with the output value being the actual landslide displacement. Similarly, the different base models within each base learner were tested on a reserved test set, and the processed prediction results were used as the input data for the second-level algorithm’s prediction samples, with the output value representing the unknown displacement.
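The sketch below assembles the meta-model training data from these first-layer outputs, under the assumption that rows correspond to validation samples and columns to base learners; the variable names (VR sets, targets) are illustrative.

```python
import numpy as np

def second_level_training_data(val_preds_per_learner, val_targets):
    """Build the meta-model training set from first-layer validation outputs.

    val_preds_per_learner : for each base learner, a list of validation
                            prediction arrays VR_1 ... VR_5 (one per base model).
    val_targets           : measured displacements for the same validation steps.
    Rows are validation samples; columns are base learners.
    """
    X_meta = np.column_stack(
        [np.concatenate(vr_sets) for vr_sets in val_preds_per_learner]
    )
    y_meta = np.concatenate(val_targets)   # actual landslide displacement
    return X_meta, y_meta
```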
The traditional stacking model often employs k-fold cross-validation, and the prediction set is typically handled by computing an average. In the sliding window approach, the number of samples per base model varies, and these differences influence the diversity of information that each base model provides to the meta-model through its training data and algorithm. Consequently, determining the weighting of each base model’s predictions on the test set is paramount. In this study, the weighting of base model outputs for the meta-model’s prediction set was determined from the mean absolute error (MAE) calculated on the test set. The MAE, the average of the absolute errors, provides an accurate reflection of the actual prediction error; a smaller MAE signifies better model performance, so a greater weight is assigned. After weighting the outputs of each base learner, the resulting data serve as the prediction set for the second-level algorithm.
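A minimal sketch of this weighting step is shown below. Normalized inverse-MAE weights are used as one concrete form of the rule "smaller MAE, greater weight"; the exact weighting formula is an assumption, since the text does not specify it.

```python
import numpy as np

def mae_weighted_test_prediction(test_preds, y_test):
    """Weight each base model's test-set predictions by its inverse MAE
    (smaller MAE -> larger weight) and combine them into the meta-model's
    prediction-set input. The inverse-MAE normalization is an assumed
    concrete form of the weighting rule described in the text."""
    test_preds = [np.asarray(p, dtype=float) for p in test_preds]
    y_test = np.asarray(y_test, dtype=float)
    maes = np.array([np.mean(np.abs(p - y_test)) for p in test_preds])
    weights = (1.0 / maes) / np.sum(1.0 / maes)    # normalized inverse-MAE weights
    return np.sum([w * p for w, p in zip(weights, test_preds)], axis=0)
```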