1. Introduction
As one of the most widely distributed and frequently occurring geological disasters, landslides pose a great threat to the environment, natural resources, hydraulic engineering, etc. [1]. A landslide can be regarded as a nonlinear dynamic system affected by external factors such as rainfall, reservoir water level, and groundwater [2]. Many old landslides have reportedly been reactivated by periodic reservoir level fluctuation and rainfall since the first impoundment of the Three Gorges Reservoir (TGR) in 2003 [3,4]. Surface displacement is one of the most important pieces of information generated during landslide deformation, and analyzing it is of great significance for predicting the evolution law and development trend of landslides [5,6,7]. Accurately predicting the surface displacement is therefore necessary for identifying the evolution stage of a landslide and issuing accurate early warnings [8,9].
The support vector machine (SVM) proposed by Vapnik [10], a popular machine learning method that offers solutions for both classification and regression problems, has been widely used in snow avalanche hazard prediction [11], earth fissure hazard prediction [12], landslide displacement prediction [13,14], etc. Support vector regression (SVR) is the regression form of SVM and, combined with time-series theory, has many applications in time series prediction [15]. The landslide displacement time series is generally regarded as the superposition of a trend component and a periodic component, which are predicted separately by the least squares method and the SVR model, respectively. However, the dominant limitation of the SVR model is that the penalty parameter C and the kernel function parameter g are difficult to determine, although the model’s predictive ability depends directly on how accurately they are chosen [16,17,18,19,20]. Thus, the choice of an optimization algorithm for the SVR model plays a very important role [21]. Miao et al. [22] compared the prediction results of genetic algorithm (GA)-SVR, grid search (GS)-SVR, and particle swarm optimization (PSO)-SVR and found that GA-SVR gave the best results. Zhou et al. [23] developed displacement prediction models based on time series analysis and a PSO-SVR model and obtained accurate results. Cai et al. [24] and Wen et al. [25] proposed a least-squares support vector machine (LSSVM) approach with multiple factors and a genetic algorithm (GA). Bui et al. [26] established a hybrid intelligent method for the spatial prediction of rainfall-induced landslides by combining LSSVM with artificial bee colony (ABC) optimization.
However, these optimization algorithms tend to fall into a local optimum in high-dimensional space and converge slowly during iteration. Ant colony optimization (ACO), a relatively new bio-inspired method with the advantages of parallel computing, positive feedback search, and good adaptability, can be used to avoid these issues [27,28]. The ACO algorithm is also easy to implement. Meng et al. [29] proposed a pricing model with ACO-based SVR, using ACO to improve the generalization performance. Hong et al. [30] and Zhou et al. [31] used the ACO-SVR model for forecasting and found that ACO can automatically determine the optimal parameters of SVR while achieving both higher predictive accuracy and better generalization ability. Overall, existing research has shown that ACO can facilitate the selection of optimal parameters for SVR and thus has great potential in landslide displacement prediction [32].
The choice of input variables for the SVR model is vital [33]. Even though inducing factors such as rainfall show a significant seasonal characteristic, most existing research has ignored their frequency components and how these affect landslide deformation. Zhang et al. [34] pointed out that the intrinsic mode functions (IMFs) of empirical mode decomposition (EMD) are extracted in order from high to low frequency. Deng et al. [35] used ensemble empirical mode decomposition (EEMD) and t-test methods to extract the frequency components of the inducing factors, on the basis of which a good prediction effect was obtained. Thus, as an improved form of EMD, complete ensemble empirical mode decomposition (CEEMD) combined with a t-test can highlight the fluctuations in a time series, such as the high frequency and low frequency components of rainfall and reservoir water levels. High frequency components usually reflect influencing factors of high intensity and frequency, such as multiple heavy rainfalls in a short time, which lead to landslide deformation. The low frequency component reflects the factors related to continuous creep movement under gravitational loads. Additionally, since the periodic component is a complex curve that is affected mostly by external inducing factors such as rainfall or reservoir level, it is critical to select appropriate inducing factors as the explanatory variables for predicting the periodic component according to its generation mechanism [36,37]. At present, the selection methods for input variables mainly include gray relational analysis (GRA) [38], mean influence value (MIV) [39], maximal information coefficient (MIC) [7], etc. Dynamic time warping (DTW), a common time series similarity measure, has been widely used for matching time-varying data sequences since it was proposed by Berndt, owing to its simple concept and robustness [40]. It can be utilized to select the dominant inducing factors for landslide prediction.
By considering the frequency components of inducing factors, an improved variable selection method, and the ACO optimization algorithm, this study presents a hybrid prediction method consisting of CEEMD, DTW, and the ACO-SVR model to improve the accuracy of SVR-based prediction. The monitored landslide displacement data were first decomposed into trend and periodic components by the CEEMD method. The trend component was fitted and predicted by the least squares method. Then, the time series of inducing factors such as rainfall and reservoir level were each reconstructed into high frequency components, low frequency components, and others with CEEMD and a t-test. The dominant factors were selected from the inducing factor group by DTW. Finally, the ACO algorithm was used to determine the optimal C and g for the SVR model, which was then trained to predict the periodic component of the landslide displacement. The Baishuihe landslide and the Shuping landslide in the Three Gorges Reservoir area (TGRA) were studied to verify the effectiveness of the proposed hybrid method. The application shows that the method performs well and can be applied to the prediction of other creep landslides affected by seasonal rainfall and reservoir water level.
2. Proposed Methodology
2.1. Time Series Theory of Displacement Data
The deformation of a landslide is mainly affected by internal and external inducing factors. Internal inducing factors such as geological structure, topography, and landform control the displacement as an approximately monotonically increasing function of time, reflecting the general trend of the cumulative displacement of the landslide. External inducing factors such as rainfall and reservoir level changes control the displacement as an approximately periodic function of time. In addition, there are random displacements caused by random factors such as wind load and traffic load [22]. Therefore, according to the additive time series model, the landslide displacement curve can be decomposed as follows:
$$X(t) = T(t) + P(t) + R(t) \qquad (1)$$
where $X(t)$ is the observed displacement at time $t$, $T(t)$ is the trend component at time $t$, $P(t)$ is the periodic component at time $t$, and $R(t)$ is the random component at time $t$.
Due to the limitations of current monitoring methods, the inducing factors of the random component cannot yet be measured reliably, so the random component is ignored in this article. Formula (1) can then be simplified to:
$$X(t) = T(t) + P(t) \qquad (2)$$
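The additive model can be illustrated with a minimal sketch. It assumes a straight-line trend fitted by ordinary least squares and treats whatever remains as the periodic component; the helper names and the synthetic series are hypothetical, and the study itself extracts the trend with CEEMD before fitting it.

```python
import math


def fit_linear_trend(times, values):
    """Least-squares fit of values ~ intercept + slope * t."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return intercept, slope


def split_trend_periodic(times, values):
    """Return (trend, periodic) such that trend + periodic equals values."""
    intercept, slope = fit_linear_trend(times, values)
    trend = [intercept + slope * t for t in times]
    periodic = [v - tr for v, tr in zip(values, trend)]
    return trend, periodic


# Synthetic monthly cumulative displacement (illustrative values, not monitoring data).
months = list(range(24))
displacement = [5.0 * t + 10.0 * math.sin(2 * math.pi * t / 12) for t in months]
trend, periodic = split_trend_periodic(months, displacement)
```

Because the random component is dropped, the two returned series add back to the input exactly, which is the property the simplified model relies on.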
2.2. Complete Ensemble Empirical Mode Decomposition
Complete ensemble empirical mode decomposition (CEEMD), as a modified algorithm of EMD and EEMD, is widely used for time-frequency analysis adapted to non-stationary signals due to its high computing efficiency [41]. The realization of CEEMD is as follows:
(1) Add $n$ sets of white noise sequences to the original signal in positive and negative pairs, so that $2n$ signals are generated:
$$\begin{bmatrix} M_1 \\ M_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} S \\ N \end{bmatrix}$$
where $M_1$ and $M_2$ are the time series after adding positive and negative white noise, respectively; $N$ is the added white noise; and $S$ is the original time series.
(2) Decompose each signal with the EMD algorithm, so that each signal yields a set of intrinsic mode function (IMF) components and a residue.
(3) Average the corresponding IMF components and residues to obtain the final decomposition result:
$$IMF_j = \frac{1}{2n} \sum_{i=1}^{2n} IMF_{ij}, \qquad r = \frac{1}{2n} \sum_{i=1}^{2n} r_i$$
where $IMF_j$ is the $j$th IMF component after decomposition; $r$ is the residue; $IMF_{ij}$ and $r_i$ are the $j$th IMF component and the residue of the $i$th signal; and $n$ denotes the number of white noise sets, $i = 1, 2, \ldots, 2n$.
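The ensemble part of the three steps above can be sketched as follows. This is only an illustration of the paired-noise construction and the final averaging: a moving-average splitter (`stand_in_decompose`, a hypothetical placeholder) is used in place of a real EMD routine, and the noise level, number of pairs, and test signal are assumptions. A real EMD may also return a different number of IMFs for each noisy signal, which this sketch does not handle.

```python
import math
import random


def moving_average(signal, window=5):
    """Centered moving average with truncated edges (a stand-in smoother, NOT EMD)."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def stand_in_decompose(signal):
    """Hypothetical placeholder for EMD: returns ([detail component], residue)."""
    residue = moving_average(signal)
    detail = [s - r for s, r in zip(signal, residue)]
    return [detail], residue


def complementary_ensemble(signal, n_pairs=10, noise_std=0.2, seed=0):
    """Add +noise/-noise pairs, decompose each signal, and average the results."""
    rng = random.Random(seed)
    n_signals = 2 * n_pairs
    comp_sum = None
    res_sum = [0.0] * len(signal)
    for _ in range(n_pairs):
        noise = [rng.gauss(0.0, noise_std) for _ in signal]
        for sign in (1.0, -1.0):  # positive and negative copy of the same noise
            noisy = [s + sign * e for s, e in zip(signal, noise)]
            comps, res = stand_in_decompose(noisy)
            if comp_sum is None:
                comp_sum = [[0.0] * len(signal) for _ in comps]
            for j, comp in enumerate(comps):
                for k, value in enumerate(comp):
                    comp_sum[j][k] += value
            for k, value in enumerate(res):
                res_sum[k] += value
    imfs = [[value / n_signals for value in comp] for comp in comp_sum]
    residue = [value / n_signals for value in res_sum]
    return imfs, residue


signal = [math.sin(0.3 * k) + 0.05 * k for k in range(60)]
imfs, residue = complementary_ensemble(signal, n_pairs=10)
reconstructed = [sum(c[k] for c in imfs) + residue[k] for k in range(len(signal))]
```

Since each noise sequence is added once with each sign, the noise cancels in the average and the averaged components sum back to the original signal.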
2.3. Dynamic Time Warping
Dynamic time warping (DTW) is an algorithm originally used in the field of speech recognition. It is a non-linear warping technique that combines time warping with distance measurement and has been widely used for calculating the similarity between time series [42,43]. The dynamic time warping distance (DTWD) can be calculated as follows:
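(1) Let the time series $X = \{x_1, x_2, \ldots, x_m\}$ be the test sample sequence and the time series $Y = \{y_1, y_2, \ldots, y_n\}$ be the reference sample sequence. Let $D$ be the $m \times n$ distance matrix of $X$ and $Y$; then,
$$D = \begin{bmatrix} d(x_1, y_1) & \cdots & d(x_1, y_n) \\ \vdots & \ddots & \vdots \\ d(x_m, y_1) & \cdots & d(x_m, y_n) \end{bmatrix}$$
where $d(x_i, y_j)$ is the distance between $x_i$ and $y_j$.
(2) In the distance matrix, let $W = \{w_1, w_2, \ldots, w_K\}$ be a dynamic time warping path (DTWP) between the test sample sequence and the reference sample sequence, where $w_k = (i_k, j_k)$ is the $k$th element of the DTWP. The path $W$ needs to meet the following conditions:
$$w_1 = (1, 1), \quad w_K = (m, n), \quad 0 \le i_{k+1} - i_k \le 1, \quad 0 \le j_{k+1} - j_k \le 1, \quad w_{k+1} \ne w_k$$
(3) The DTWP is not unique, so the DTWP with the minimum value of $\sum_{k=1}^{K} d(x_{i_k}, y_{j_k})$ is the best, and the corresponding distance is the dynamic time warping distance. Let $DTW(X, Y)$ be the dynamic time warping distance between the test sample sequence $X$ and the reference sample sequence $Y$; then:
$$DTW(X, Y) = \min_{W} \sum_{k=1}^{K} d(x_{i_k}, y_{j_k})$$
Let the accumulated distance of the test sample sequence $X$ and the reference sample sequence $Y$ be $\gamma(i, j)$, so
$$\gamma(i, j) = d(x_i, y_j) + \min\{\gamma(i-1, j-1),\ \gamma(i-1, j),\ \gamma(i, j-1)\}$$
where $i = 1, \ldots, m$ and $j = 1, \ldots, n$, with $\gamma(0, 0) = 0$ and $\gamma(i, 0) = \gamma(0, j) = \infty$, so $DTW(X, Y) = \gamma(m, n)$.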
The DTWD between each inducing factor and the periodic component is then calculated; the smaller the DTWD, the more closely the inducing factor follows the periodic component. Consequently, the inducing factor with the smallest DTWD is selected as the input variable for the SVR model.
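A minimal sketch of the DTWD calculation and the resulting factor selection is given here. It assumes the absolute difference as the point-wise distance, and the series and factor names are hypothetical toy values rather than monitoring data.

```python
def dtw_distance(x, y):
    """Dynamic time warping distance via the accumulated-distance recurrence."""
    m, n = len(x), len(y)
    inf = float("inf")
    # acc[i][j] is the accumulated distance of x[:i] and y[:j]; row and column 0 are padding.
    acc = [[inf] * (n + 1) for _ in range(m + 1)]
    acc[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(x[i - 1] - y[j - 1])  # assumed point-wise distance
            acc[i][j] = cost + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    return acc[m][n]


# Hypothetical periodic displacement and two candidate inducing factors.
periodic = [0, 1, 2, 1, 0, 1, 2, 1, 0]
candidates = {
    "factor_a": [0, 0, 1, 2, 1, 0, 1, 2, 1],  # same shape, delayed by one step
    "factor_b": [2, 2, 2, 2, 2, 2, 2, 2, 2],  # flat series
}
scores = {name: dtw_distance(series, periodic) for name, series in candidates.items()}
selected = min(scores, key=scores.get)  # factor with the smallest DTWD
```

The delayed copy is selected even though it is shifted in time, because the warping path absorbs the lag; a plain point-by-point comparison would penalize it at every step.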
2.4. Support Vector Regression Model
The support vector regression (SVR) algorithm is widely used for training on small samples and has the advantages of a simple procedure, accurate prediction results, and strong robustness [22,44]. It divides the sample data into a training sample and a test sample, takes the training sample as the input vector, maps it to a higher dimensional feature space, and trains on it. Next, the optimal decision function giving the best fit is obtained in that space, and the test sample is used to validate the model results [15]. However, because no theoretical method is available for determining the penalty factor and the kernel function parameter (C, g), the approach for selecting C and g must be studied further [7,14]. In this paper, ACO was adopted to obtain the optimal parameters for the SVR model, and its performance was compared with that of the original SVR model and the GA-SVR model.
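To make the roles of the two parameters concrete, the sketch below assumes a radial basis function (RBF) kernel, in which g sets the kernel width, and the standard epsilon-insensitive SVR objective, in which C weights the data-fitting term against model flatness. The function names and numbers are illustrative assumptions and not the implementation used in this study.

```python
import math


def rbf_kernel(u, v, g):
    """RBF kernel exp(-g * ||u - v||^2); a larger g gives a narrower kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-g * sq_dist)


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors inside the epsilon tube cost nothing; larger errors cost linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_objective(weight_norm_sq, targets, predictions, C, epsilon):
    """0.5 * ||w||^2 + C * (sum of epsilon-insensitive losses)."""
    data_term = sum(
        epsilon_insensitive_loss(t, p, epsilon) for t, p in zip(targets, predictions)
    )
    return 0.5 * weight_norm_sq + C * data_term


# Effect of g: the same two points look dissimilar under a narrow kernel.
u, v = [0.0, 0.0], [1.0, 1.0]
narrow = rbf_kernel(u, v, g=10.0)
wide = rbf_kernel(u, v, g=0.1)

# Effect of C: the same residuals are penalized more heavily when C is large.
targets, predictions = [1.0, 2.0, 3.0], [1.05, 2.5, 2.0]
low_C = svr_objective(1.0, targets, predictions, C=1.0, epsilon=0.1)
high_C = svr_objective(1.0, targets, predictions, C=100.0, epsilon=0.1)
```

A small C or a small g tends toward a smoother, possibly underfitted model, while large values tend toward overfitting, which is why the pair has to be tuned jointly.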
2.5. Ant Colony Optimization
As a general-purpose stochastic optimization algorithm, ACO mimics the behavior of ants finding the shortest path from the colony to food. In this process, ants leave pheromones on the paths they travel, and the ants that follow choose paths based on the pheromone intensity. On reaching an intersection that has not been visited before, ants select a path at random and release pheromones, the amount of which is inversely proportional to the length of the path. Over time, the pheromone on the shorter paths continues to increase, while the pheromone on the longer paths gradually decreases or disappears, and eventually the ant colony finds an optimal path. To imitate this, the artificial ants in ACO perform a mobile search guided by the positive feedback of pheromone accumulation and evaporation to select the optimal path, which helps to avoid falling into a local optimum. Therefore, the key to ACO lies in the movement rules and the pheromone update [45,46,47,48]. The detailed steps of ACO-SVR are as follows:
Step 1: Initialize the parameters and variables of the proposed algorithm by randomly assigning a set of parameters (C, g) to each artificial ant. The corresponding error model can then be obtained by training the SVR on the training data:
where
are the results of the SVR model,
is the value of the training data. Then, the pheromone of ant in
position can be obtained according to the error value predicted by the above error model:
where
is set to 3, so the smaller the error value, the larger the pheromone.
Step 2: Define the artificial ant’s transfer probability. The transfer probability of each artificial ant can be obtained based on the value of the pheromone:
where
is the artificial ant with the highest value of the pheromone.
Step 3: Define the evaporation coefficient
. To avoid local optimum, this study made the evaporation coefficient relatively small at the beginning of the local search and gradually increased with times of iterations. The evaporation coefficient
is defined as:
where
,
is the times of iterations and
is the max times of iterations.
Step 4: In each iteration, a dynamic global transfer factor is established based on the value of the pheromone. After setting the number of artificial ants as m, calculate the value of , then sort the results in ascending order to get a new sequence . When , , ; otherwise, , .
Use the as a criterion, perform the local search when transfer probability , otherwise, perform a global search. At the beginning of the search, most of the ants will perform the local search for a better solution. After that, most of the ants perform the global search to avoid falling into a local optimum and obtain a globally optimal solution.
Step 5: Update the pheromone based on the C and g obtained from ACO. The update rules are as follows:
Step 6: Save the optimal solution in each iteration. Keep the ant with the largest pheromone in each iteration and calculate its error value according to the error model, then return to step 1 and repeat the iteration cycle.
Step 7: Obtain the global optimal solution. When the iterations are complete, the best ant, with the best combination of C and g, is obtained based on the error value.
Step 8: Calculate the corresponding SVR results using the best combination of C and g.
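The overall flow of these steps can be sketched with a simplified ACO-style search. Several parts are assumptions made for illustration rather than the rules of this study: the error function is a hypothetical toy surface standing in for the SVR training error, the pheromone is taken as exp(-error), the transfer probability is measured relative to the best ant, the evaporation coefficient follows a linear ramp, and a fixed threshold `p0` replaces the dynamic global transfer factor.

```python
import math
import random


def toy_error(C, g):
    """Hypothetical stand-in for the SVR training error (minimum at C=10, g=0.1)."""
    return (math.log10(C) - 1.0) ** 2 + (math.log10(g) + 1.0) ** 2


def aco_search(error_fn, log_bounds, n_ants=20, n_iter=50, p0=0.2,
               rho_min=0.1, rho_max=0.5, seed=0):
    """Simplified ACO-style search over (log10 C, log10 g)."""
    rng = random.Random(seed)
    (c_lo, c_hi), (g_lo, g_hi) = log_bounds

    def evaluate(pos):
        return error_fn(10.0 ** pos[0], 10.0 ** pos[1])

    def random_position():
        return (rng.uniform(c_lo, c_hi), rng.uniform(g_lo, g_hi))

    def clip(value, lo, hi):
        return min(max(value, lo), hi)

    # Step 1: random (C, g) for each ant; a smaller error gives a larger pheromone.
    ants = [random_position() for _ in range(n_ants)]
    errors = [evaluate(pos) for pos in ants]
    pheromone = [math.exp(-e) for e in errors]
    history = [min(errors)]

    for it in range(1, n_iter + 1):
        # Step 3: evaporation coefficient grows with the iterations (assumed linear ramp).
        rho = rho_min + (rho_max - rho_min) * it / n_iter
        step = 1.0 / it  # local search radius shrinks over time
        best_pheromone = max(pheromone)
        for i, (a, b) in enumerate(ants):
            # Step 2: transfer probability relative to the best ant.
            p = (best_pheromone - pheromone[i]) / best_pheromone
            if p < p0:
                # Step 4 (simplified): local search around the current position.
                candidate = (clip(a + rng.uniform(-1.0, 1.0) * step, c_lo, c_hi),
                             clip(b + rng.uniform(-1.0, 1.0) * step, g_lo, g_hi))
            else:
                # Step 4 (simplified): global search anywhere in the bounds.
                candidate = random_position()
            candidate_error = evaluate(candidate)
            if candidate_error < errors[i]:  # keep a move only if it improves
                ants[i], errors[i] = candidate, candidate_error
            # Step 5: evaporate the old pheromone, then deposit a new amount.
            pheromone[i] = (1.0 - rho) * pheromone[i] + math.exp(-errors[i])
        history.append(min(errors))  # Step 6: record the best solution so far

    best = min(range(n_ants), key=lambda i: errors[i])  # Step 7
    return 10.0 ** ants[best][0], 10.0 ** ants[best][1], errors[best], history


best_C, best_g, best_error, history = aco_search(toy_error, ((-2.0, 3.0), (-3.0, 2.0)))
```

Searching in log10 space is a design choice of the sketch, since useful values of C and g typically span several orders of magnitude. In a real application `toy_error` would be replaced by the training or cross-validation error of an SVR fitted with the candidate (C, g).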
2.6. Procedure of the Proposed Hybrid Algorithm
The schematic diagram of the proposed hybrid model is presented in Figure 1. First, CEEMD was used to decompose the cumulative displacement time series into trend and periodic components based on time series theory, and the trend component was then fitted and predicted by a polynomial function. CEEMD and t-test methods were used to reconstruct the time series data of rainfall and reservoir level into high- and low-frequency components. The two dominant influential factors with the smallest DTWD were selected through DTW for the prediction of the periodic component. Finally, the ACO algorithm was used to determine the optimal C and g for the SVR model, which was then trained to predict the periodic component of the landslide displacement. The predicted cumulative displacement was the sum of the predicted trend component and the predicted periodic component.
5. Discussion
Landslide displacement prediction is an active research topic, owing to the long-term risks and challenges that landslides pose in many parts of the world. It is a key step in predicting and mitigating future landslides. However, due to the nonlinear characteristics of landslide displacement datasets, accurate prediction of landslide occurrence requires substantial resources and is difficult to implement. Although various methods have been proposed to predict landslide displacement, the prediction accuracy of these methods is still controversial and uncertain [9]. In fact, the high degree of uncertainty in landslide displacement prediction makes it difficult for any single model to be considered the most suitable for all scenarios.
Accurate prediction of landslide displacement requires an understanding of the factors that control landslide occurrence [49]. Traditionally, the control factors include those that make the ground surface vulnerable to damage, such as the internal characteristics of underground conditions, and those that trigger landslide occurrence, such as external events inducing instability. In particular, many landslides in the TGRA, including the cases discussed in this study, show a typical “step-like” deformation behavior. This deformation characteristic is controlled by both internal and external inducing factors. The internal inducing factors significantly affect the general trend of deformation and development. The external inducing factors, such as seasonal rainfall and changes in the reservoir level, directly accelerate the deformation and damage of the landslide, which is the main reason for its “step-like” development and evolution [22]. A wide range of topographic, lithologic, seismic, hydrological, and meteorological factors should be used to build a catalog of control factors for predicting landslide displacement. In practice, the availability of data is often the key factor that determines which control factors are selected. Concretely, for the two Three Gorges landslides analyzed, rainfall and changes in the reservoir water level are the two main inducing factors and play a vital role in the deformation analysis. Clearly, these two factors alone are not enough to predict landslide occurrence. Although inducing factors such as rainfall show a significant periodic characteristic, most existing research has ignored their frequency components and how these affect landslide deformation. Most of the classical optimized SVR methods, such as PSO-SVR and GA-SVR, predict landslide displacement without taking the frequency components of the inducing factors into consideration. Predicting landslide deformation without considering the frequency components of the external inducing factors is therefore inaccurate [11]. On the whole, there is no unified guideline or global standard for selecting the control factors or the number of factors that should be input into the prediction model. No single model is best suited to all landslide displacement prediction scenarios, because the choice of prediction model depends on the potential goals, scientific objectives, model limitations, and warnings. Therefore, given the nonlinear mechanism of landslide deformation, some prediction models are more attractive than others.
This study is committed to further improving the accuracy of the prediction model and has achieved improvements in two respects. Firstly, CEEMD and DTW are used to extract frequency components, and the mapping relationship between the independent and dependent variables is established on the basis of these frequency components, which is not available in the aforementioned standard SVR model. Compared with other time-series decomposition methods, the CEEMD method has the advantage that the function type of the trend component need not be estimated in advance. Through CEEMD and a t-test, the external inducing factors can be reconstructed into high frequency, low frequency, and residue components. The high frequency components reflect local information in the inducing factor time series and show a strong correspondence with the periodic displacement velocity. The low frequency components, along with the residue components, are relatively smooth and very different from the periodic displacement velocity curve, and can therefore be ignored. Secondly, the ACO algorithm is used to further optimize the SVR model, and a hybrid prediction method built on the ACO-SVR model is formed to achieve a better selection of input variables and optimization of the SVR model. Compared with the GS, GA, and PSO algorithms, the ACO method can avoid falling into a local optimum in high-dimensional space and avoids the low convergence rate of these algorithms in the iterative process. The ACO-based SVR model has better generalization performance and can increase the predictive accuracy of landslide displacement by determining the optimal parameters of SVR automatically.
The deformation characteristic of a reservoir landslide is controlled by both internal and external inducing factors. The external inducing factors, such as seasonal rainfall and changes in the reservoir level, directly accelerate the deformation and damage of the landslide, which is the main reason for its “step-like” deformation and evolution [50]. These external factors show a significant seasonal characteristic made up of frequency components. The proposed method extracts these frequency components, which better reveal the correspondence between the external inducing factors and landslide deformation. Using them as candidate input variables helps to improve the prediction. Therefore, the proposed method has wide applicability in the prediction of reservoir landslides and is especially suitable for landslide deformation caused by inducing factors such as rainfall and reservoir water level fluctuation.
In this study, only a few monitoring points were modeled, and the one hundred sets of data used form a very small sample for the proposed method. The time series in these datasets are also short. When larger datasets become available for the computational intelligence algorithms, the influence of controlling factors on landslide displacement prediction can be reflected more comprehensively.
The proposed method is a data-driven model, also known as a black-box model, whose drawback is that it provides only the prediction error and no information about the associated predictive uncertainty. The output of most existing data-driven models is a single estimate for each prediction range, and these deterministic single estimates are called point predictions [9]. The uncertainties, which consist mainly of parameter uncertainty, model uncertainty, and input uncertainty, can be substantial and can affect the accuracy of point estimates. Therefore, to improve the reliability and credibility of the proposed hybrid model, it is necessary to incorporate prediction uncertainty into the point prediction and quantify it [51].
Although the proposed method has a small absolute error and gives good prediction results during the initial sample training stage and where the displacement grows steadily, the error of the predicted displacement will inevitably increase where the observed displacement changes sharply. Therefore, to establish a more accurate causal relationship, the training data should be gradually updated with the latest monitoring data, and earlier information should be deleted [52].
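One simple way to realize such an update is a fixed-length sliding window over the monitoring record, sketched below. The window length and the helper name are hypothetical; the study does not prescribe a particular scheme.

```python
from collections import deque


def rolling_training_windows(observations, window_size):
    """Yield successive training sets that keep only the latest window_size records."""
    window = deque(maxlen=window_size)  # the oldest record is dropped automatically
    for obs in observations:
        window.append(obs)
        if len(window) == window_size:
            yield list(window)


# Illustrative record identifiers standing in for monthly monitoring data.
windows = list(rolling_training_windows([1, 2, 3, 4, 5], window_size=3))
```

Each yielded list would be used to retrain the model before predicting the next step, so that the fitted relationship tracks the most recent deformation behavior.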
6. Conclusions
The main conclusions of this study are summarized below:
(1) The results of the proposed hybrid model show that, after considering the frequency components of the landslide inducing factors, the displacement prediction model based on ACO-SVR was more accurate than the models based on SVR and GA-SVR.
(2) CEEMD, with its high decomposition accuracy and high operating efficiency, better highlighted the local fluctuation characteristics of the inducing factor time series. It is also suitable for extracting the trend component of landslide displacement. After CEEMD decomposition, the landslide inducing factors could be divided into three parts: the high frequency components, the low frequency components, and a residue. The residue reflected the long-term trend of the inducing factor, and the high frequency component strongly affected the periodic displacement of the landslide.
(3) Overall, the comparative analysis and prediction applied to the Baishuihe landslide and the Shuping landslide showed that the proposed hybrid displacement prediction method based on CEEMD reconstruction and the DTW-ACO-SVR model was effective. This method has the potential for broad application in predicting landslides affected by seasonal reservoir water levels and rainfall.