*Article* **Multi-Scale Response Analysis and Displacement Prediction of Landslides Using Deep Learning with JTFA: A Case Study in the Three Gorges Reservoir, China**

**Yanan Jiang 1,2, Lu Liao 3,4,\*, Huiyuan Luo 2, Xing Zhu <sup>2</sup> and Zhong Lu <sup>5</sup>**


**Abstract:** Reservoir water and rainfall, leading to fluctuations groundwater levels, are the main triggering factors that induce landslides in the Three Gorges Reservoir area. This study investigates the response mechanism of landslide deformation under reservoir water and rainfall variations through long-time on-site observations. To address the non-stationary characteristics of the timeseries records, joint time-frequency analysis (JTFA) is first introduced into our landslide prediction model. This model employs optimal variational mode decomposition (VMD) to obtain specific signal components with clear physical meaning, such as trend component and periodic components. Then, multi-scale response analysis between the displacement and external factors three wavelet methods was conducted. The analysis results show a 1 year primary cycle of the time series associated with the landslide evolution. The reservoir water level and rainfall show anti-phase fluctuations. The periodic displacement correlates significantly with rainfall, lagging by about two months. The reservoir water is anti-phase with the landslide displacement, preceding it by approximately three months (−51 ± 8◦ phase difference). For landslide displacement prediction, the gated recurrent units (GRU) neural network model is integrated into the deep learning forecasting architecture. The model takes into account the correlation and hysteresis effect of input variables. Through six experiments, we investigate the effect of data volume on model predictions to determine the optimal model. The results demonstrate that our proposed model ensures high performance in landslide prediction. Moreover, a comparison with six other intelligent algorithms shows the advantages of our model in terms of time-effectiveness and long-sequence forecasting.

**Keywords:** joint time-frequency analysis (JTFA); multi-scale response analysis; hysteresis effect; deep learning forecasting model

#### **1. Introduction**

Since the Three Gorges Reservoir (TGR) was put into operation in June 2003, the stability of slopes along the TGR bank has changed dramatically in response to the periodic fluctuation of the rainfall intensity and the reservoir storage [1–3], resulting in thousands of reactivated landslides. The geomechanical and hydrologic characteristics of these landslides are continually affected by the consistent water levels and rainfall changes, leading to catastrophic events [4]. For example, the July 2003 Qianjiangping landslide developed

**Citation:** Jiang, Y.; Liao, L.; Luo, H.; Zhu, X.; Lu, Z. Multi-Scale Response Analysis and Displacement Prediction of Landslides Using Deep Learning with JTFA: A Case Study in the Three Gorges Reservoir, China. *Remote Sens.* **2023**, *15*, 3995. https://doi.org/10.3390/rs15163995

Academic Editors: Qiusheng Wu, Xinyi Shen, Jun Li and Chengye Zhang

Received: 30 June 2023 Revised: 30 July 2023 Accepted: 8 August 2023 Published: 11 August 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

during the initial reservoir impoundment, causing 24 fatalities and significant economic losses [5]. Since then, research on long-term monitoring and early warning of these landslides has steadily expanded, leading to a deeper understanding of the inherent patterns and the response mechanism of reservoir landslides.

Studies have revealed that the evolution of the reservoir landslide involves a complex, non-stationary and nonlinear dynamic process [6,7]. Consequently, the long-term observations collected on the landslide's behavior exhibit non-stationary time-varying characteristics with specific trends and/or seasonality. Although the deformation evolution and response analysis of reservoir landslides in TGR have been extensively studied and discussed in the time domain [8–10], their characteristics in the frequency domain are relatively unexplored, concealing valuable spectral information. Furthermore, non-stationary time series show time-frequency characteristics [6,11,12], making time-frequency representation of the signal essential for identifying the dominant fluctuation modes/spectra of different signals and understanding how those modes vary over time.

The empirical mode decomposition (EMD), developed by Huang et al. [13], is wellknown for its time-frequency analysis, which is suitable for processing non-stationary time series. This method decomposes the data into different band-limited intrinsic mode functions (IMFs), but is sensitive to noise and sampling [14]. The variational modal decomposition (VMD), a more recent signal decomposition technique, enables fully intrinsic and adaptive decomposition [14]. In addition, Isham et al. also pointed out the importance of determining the number of VMD modes to obtain specific signal components with explicitly physical meaning [15]. To achieve efficient decomposition, the Shannon entropy, reflecting the uniformity of probability distribution, can be used as a fitness function to derive optimal parameters through an optimization method, such as the harmony search (HS) algorithm [16] and the wolf optimization algorithm (GWO) [17] employed in this study. GWO is easily implemented and exhibits advantages in strong convergence with few parameters.

Moreover, it is crucial to explore the response mechanisms of landslide deformation [18] to unveil the changing patterns of specific signal components at different scales. In 1998, Torrence et al. proposed wavelet analysis (WA) to study non-stationary time series in signal processing [19]. Presently, wavelet decomposition and denoising are the most popular WA methods in time series analysis widely used in Geophysics [19], Geography [20] and Meteorology [21]. However, WA methods, such as continuous wavelet transform (CWT), cross-wavelet transform (XWT) and wavelet coherency (WTC), are more suitable for analyzing non-stationary time series at different scales [7,22–24]. Unfortunately, they are rarely used to analyze long-term non-stationary observations associated with landslide behavior. As a result, this study will extend WA methods and optimized VMD into a deep learning framework to conduct a joint time-frequency analysis (JTFA) before establishing a landslide forecasting model.

So far, various approaches have been developed for landslide hazards research, ranging from physics-based to statistics-based models [9,25,26]. Over the past decades, the integration of machine learning (ML) methods into landslide research has led to significant advancements [2,8,9]. In recent years, deep-learning approaches, like recurrent neural networks (RNN) and their variants, designed to address the gradient explosion problem, have shown impressive success in predicting landslide displacements based on GNSS time series data [4,10,27,28]. Among these variants, the gated recurrent unit (GRU) proposed by Chung et al. [29] stands out, surpassing other RNN variants, such as the LSTM, in terms of training time, parameter update and generalization ability. Its outstanding performance extends to diverse fields, including economics [30], flood control [31] and disaster reduction [10], among others.

This paper presents a deep learning neural network model for predicting landslide displacement, incorporating a joint time-frequency analysis (JTFA) module. The JTFA employs optimal variational mode decomposition (VMD) to obtain specific signal components with clear physical meaning (e.g., trend component and periodic component). Subsequently, multi-scale response analysis between the displacement and external factors is carried out by wavelet transform (CWT), cross-wavelet transform (XWT) and wavelet coherency (WTC). The enhanced GRU model based on deep learning architecture is employed to forecast landslide displacement (Figure 1e–g). Additionally, the model considers the correlation and hysteresis effect of input variables, such as impact factors, during the modeling process.

**Figure 1.** Architecture of the forecasting model.

#### **2. Methodology**

As can be seen in Figure 1, the forecasting model consists of seven parts: Figure 1 (a) signal decomposition of landslide displacement, Figure 1 (b) signal decomposition of the impacting factors, Figure 1 (c) multi-scale response analysis of landslide deformation by JTFA, Figure 1 (d) data packet preparation for the deep-learning neural network model, Figure 1 (e) trend displacement prediction by DES, Figure 1 (f) periodic and random displacement prediction by GRU, and Figure 1 (g) cumulative displacements prediction by the optimal model.

First, the on-site measured landslide displacements and the pre-selected impact factor sequences are decomposed into specific IMF components by an optimal VMD (Figure 1a,b). The number and parameters of the VMD modes with explicit physical meanings are determined by employing the Shannon entropy as a fitness function and the GWO optimization method. Then, a multi-scale response analysis between the displacement and external factors is carried out by three wavelet analysis methods to investigate landslide deformation response mechanisms and to reveal the changing patterns of specific signal components at different scales (Figure 1c). Finally, the enhanced GRU and DES model based on deep learning architecture is used to predict the landslide displacement (Figure 1e–g).

#### *2.1. Joint Time-Frequency Analysis (JTFA)*

The collected long-term observations associated with the landslide's behavior are nonstationary time series with specific trends or/and seasonality. The landslide displacement is a time-series function controlled by geological conditions, reflecting long-term deformation trends. While subject to external influences, the landslide displacement behaves as a periodic variation with near-white noise associated with random factors. Therefore, the landslide displacement time series can be expressed as follows:

$$Y\_t = T\_t + P\_t + R\_t \tag{1}$$

where *Yt* is the landslide displacement at time *t*; while *Tt*, *Pt* and *Rt* are the trend, periodic and random terms, respectively.

While studies on the deformation characteristics of landslides have been extensively explored in the time domain, the significance of time-frequency analysis of the signal should not be overlooked. Time-frequency analysis reveals the time-frequency features, and it provides access to the implicit spectral information of displacement *Yt* with different spectra, enabling a time-frequency perspective of information mining. Having a time-frequency representation is indispensable to investigate the dominant fluctuation spectra of deformation signals and understand how those spectra vary over time. This involves one-dimensional time series converting into a diffuse two-dimensional time-frequency image, allowing for a comprehensive examination of the signal's temporal and frequency properties.

#### 2.1.1. Grey Wolf Optimized Variational Mode Decomposition (GWO-VMD)

The VMD enables fully intrinsic and adaptive signal decomposition [14], by which the input displacement signal could be divided into several intrinsic mode functions (IMFs) with specific characteristics determined in advance. To ensure that (1) the central frequency bandwidth of each IMF is limited, (2) the sum of the estimated bandwidths is minimized, and (3) the sum of each IMF equals the original signal *f* so that the constrained variational equation can be expressed as follows:

$$\min\_{\{\{u\_k\}, \{w\_k\}}} \left\{ \sum\_k \left\| \partial\_t [(\delta(t) + j/\tau t) \* u\_k(t)]e^{-j\omega\_k t} \right\|\_2^2 \right\} \tag{2}$$
  $s.t. \ \sum\_k^K u\_k = f$ 

where *uk* is the *k*th decomposed IMF with a center frequency *wk*; and *δ*(*t*) and \* indicate the Dirac function and the convolution operator, respectively.

Then, the Lagrange multipliers *λ* and decomposition parameter *ε* are introduced to transform the constrained problem into an unconstrained variational one, yielding the augmented Lagrange expression:

$$\begin{aligned} L(\{\boldsymbol{u}\_k\}, \{\boldsymbol{\omega}\_k\}, \lambda) &= \\ \varepsilon \sum\_k \left\| \partial\_t [ (\delta(t) + j/\tau t) \* \boldsymbol{u}\_k(t) ] e^{-j\omega\_k t} \right\|\_2^2 + \\ \left\| f(t) - \sum\_k \boldsymbol{u}\_k(t) \right\|\_2^2 + \left< \lambda(t), f(t) - \sum\_k \boldsymbol{u}\_k(t) \right> \end{aligned} \tag{3}$$

Here, the alternating direction multiplier method (ADMM) is used to search the saddle points of the unconstrained model to obtain the optimal IMFs and the center frequency of the constrained model. This process is defined as a function *fVMD* in the Algorithm 1.

The VMD is used to decompose complex digital signals, and the empirical decomposition parameter *ε* is usually *K* > 3. This value will not be physically meaningful when applied to the landslide displacement decomposition. Thus, obtaining suitable VMD modes is crucial to getting specific signal components with explicitly physical meaning. At this point, the Shannon entropy (Equation (4)) is introduced to obtain the optimal IMFs with sparse features: the larger the value is, the better the uniformity of the probability distribution is [32].

$$H(\boldsymbol{\mu}\_k) = -\sum\_{i=1}^n \boldsymbol{\mu}\_k^i \log \boldsymbol{\mu}\_k^i \tag{4}$$

As for the landslide's periodic displacement, the probability distribution is relatively uniform and sparse. Thus, the value of *ε* is optimized using the grey wolf optimizer (GWO) with *H*(*uk*) as the fitness function [26] to obtain the periodic displacement with a uniform probability distribution and strong sparsity along with the trend and random terms.

The landslide displacement *Yt* is used as input. The population size *P* of the grey wolf, the upper bound *Xup* and lower bound *Xlow* of the individual grey wolves are initialized. Then, the algorithm iterates and updates *N* times to obtain the optimal value of ε (see the Algorithm 1). In the Algorithm 1, both *A* and *C* are coefficient vectors with *A* = 2*a* · *r*<sup>1</sup> − *a*

and *C* = 2 · *r*2. *r*<sup>1</sup> and *r*<sup>2</sup> are random values between 0 and 1. The convergence factor, *a* = 2 − 2n/*N*, decreases linearly from 2 to 0 during the iteration.


#### 2.1.2. Wavelet Analysis (WA)

Torrence and Compo [22] proposed WA to study non-stationary time series in signal processing. The most popular WA methods in time series analysis include wavelet decomposition and denoising. Additionally, there are other WA methods, such as the continuous wavelet transform (CWT), the cross-wavelet transform (XWT) and the wavelet coherency (WTC), which excel at non-stationary time series analysis at various scales. However, despite their effectiveness, these WA methods are still rarely used in non-stationary time series analysis associated with the behavior of landslides.

For a discrete time-series *xn* (*n* = 1, ..., *N*) with equal time spacing *δt*, the CWT is used for mapping the changing properties of non-stationary signals. CWT is defined as the convolution of *xn* with a scaled and translated version of the wavelet basis function:

$$\mathcal{W}\_{n}(s) = \sum\_{n'=1}^{N-1} \mathbb{x}\_{n'} \psi \* \left[ \frac{(n'-n)\delta t}{s} \right] \tag{12}$$

where \* indicates the complex conjugate, *s* represents the wavelet scale, and *n* is the localized time index. The non-orthogonal Morlet wavelets are chosen as the basis function: *<sup>ψ</sup>*0(*η*) = *<sup>π</sup>*−1/4*eiw*0*ηe*−*<sup>η</sup>*2/2, where *<sup>η</sup>* and *<sup>w</sup>*<sup>0</sup> are non-dimensional parameters of time and frequency, respectively. When *w*<sup>0</sup> = 6, the wavelet scale *s* equals the Fourier period. The wavelet basis functions *ψ*<sup>0</sup> are then normalized to have unit energy, and the CWT is conducted in the Fourier space to improve efficiency.

In Equation (12), *Wn*(*s*) is a complex value, and thus, |*Wn*(*s*)|2 denotes the wavelet power spectrum that clearly discriminates the periodic fluctuation and intensity in the time series. As the input data is assumed to be cyclic, edge effects appear in the power spectrum when dealing with finite-length time series. Hence, the cone of influence (COI) is defined, representing the corresponding edge effects within the region of the power spectrum. In addition, a Fourier power spectrum is constructed by Monte Carlo to represent the red noise background spectrum and later performs a χ<sup>2</sup> test with a specific confidence level, e.g., 95%, to calculate each scale and construct the confidence contour with a 95% confidence level.

The XWT of two time-series *xn* and *yn* is defined as *WXY* = *WXWY*\*, which measures the similarity between two waveforms. The arg(*WXY*) is the relative local phase between *xn* and *yn* in the time-frequency space. The mean and confidence interval must be estimated to analyze the phase difference. The phase relation is quantified using the periodic mean of phases over a region with a significance level of 5% outside the COI. The periodic mean of a set of phase angles is defined as follows:

$$a\_m = \arg(X, Y)\text{ with }X = \sum\_{i=1}^{n} \cos(\alpha\_i)\text{ and }Y = \sum\_{i=1}^{n} \sin(\alpha\_i)\tag{13}$$

Because the phase angles are not independent, it is difficult to calculate the confidence interval of the mean angle. Thus, the solution is obtained by calculating the scatter around (

the mean using the circular standard deviation *er* = <sup>−</sup>2 ln<sup>√</sup>*X*<sup>2</sup> <sup>+</sup> *<sup>Y</sup>*2/*<sup>n</sup>* .

The power spectrum of XWT only shows the high common power part, while the WTC measures the coherence of the XWT in the time-frequency space. The WTC of two-time series is defined as follows:

$$R\_n^2(s) = \frac{\left| S\left(s^{-1} \mathcal{W}\_n^{XY}(s)\right) \right|^2}{S\left(s^{-1} \left| \mathcal{W}\_n^X(s) \right|^2\right) \cdot S\left(s^{-1} \left| \mathcal{W}\_n^Y(s) \right|^2\right)}\tag{14}$$

where *S* is the smooth operator, and *S*(*W*) = *Sscale*(*Stime*(*Wn*(*S*))). *Sscale* and *Stime* indicate smoothing along the wavelet scale axis and over time, separately.

#### *2.2. Deep Learning Forecasting Model*

#### 2.2.1. Gated Recurrent Unit (GRU)

The gated recurrent unit (GRU) is a variant of recurrent neural networks (RNN), which outperforms other RNN variants, e.g., the LSTM [29], in terms of training time, and has fewer parameters. The GRU enables each recurrent unit to capture dependencies adaptively at different time scales. As a refinement of the general RNN structure, GRU is more straightforward, owing to the absence of a separate storage unit (see Figure 2) regulating the internal information flow via a gated unit. Moreover, it is more efficient in training time, parameter update and generalization ability than other RNNs [29].

**Figure 2.** The network structure of GRU.

The update gate *zt* is responsible for determining how much of the past information is to be neglected, while the reset gate *rt* is responsible for deciding how much past knowledge needs to be passed along into the future. Assume that the current input at *t* is *xt*, then one forward calculation of GRU is as follows:

$$z\_t = \sigma(\mathbb{W}\_z \cdot [h\_{t-1}, \mathbf{x}\_t] + b\_z) \tag{15}$$

$$r\_t = \sigma(\mathcal{W}\_r \cdot [h\_{t-1}, \mathfrak{x}\_t] + b\_r) \tag{16}$$

$$\tilde{h}\_t = \tanh\left(\mathcal{W}\_{\overline{h}} \cdot \left[r\_t \* h\_{t-1}, \mathbf{x}\_t\right] + b\_{\overline{h}}\right) \tag{17}$$

where *σ* indicates the sigmoid function, and *W* and *b* are the weight and bias parameters, respectively.

The hidden state at the present moment *ht* ht of the network is determined by the update gate, the candidate state of the previous time step *ht*−<sup>1</sup> and the current moment )*ht*. The output of the loop *y*ˆ*<sup>t</sup>* can be calculated by

$$h\_t = (1 - z\_t) \* h\_{t-1} + z\_t \* \dot{h}\_t \tag{18}$$

$$
\hat{y}\_t = \sigma(\mathsf{W}\_\mathsf{o} \cdot \mathsf{h}\_t) + \mathsf{b}\_\mathsf{o} \tag{19}
$$

During the training process, the mean-square error (MSE) is used to measure the errors, and the loss function is defined as *loss* <sup>=</sup> *<sup>T</sup>* ∑ *t*=1 (*y*ˆ*<sup>t</sup>* − *xt*) 2 /*T*, which is optimized by the adaptive moment estimation (Adam) [33] to obtain the minimization loss, and update *W* and *b* iteratively.

#### 2.2.2. Double Exponential Smoothing (DES)

Double exponential smoothing (DES) is a weighted moving average method suitable for predicting time series with certain trends. This method operates by adopting a weighted combination of the past observations and recent observations, with relatively higher weights assigned to the more recent data points. The slope component is updated through exponential smoothing [34], making it well-suited for time-sensitive landslide displacement prediction. Thus, DES is employed to predict the trend component of the landslide displacement.

Given a displacement time series with a specific trend *xt*, *t* = 1, ..., *N*. *st* represents the exponential smoothing value, *bt* represents the optimal trend estimate. The model's output *y*ˆ*t*+*<sup>m</sup>* is an estimate of *xt+m* at *t* + *m* (where *m* ≥ 1) and can be written as:

$$s\_t = \mathbb{Q}\mathbf{x}\_t + (1 - \mathbb{Q})(s\_{t-1} + b\_{t-1}) \tag{20}$$

$$b\_t = \xi(s\_t - s\_{t-1}) + (1 - \xi)b\_{t-1} \tag{21}$$

$$
\mathfrak{g}\_{t+m} = \mathfrak{s}\_t + mb\_t \tag{22}
$$

where *s*<sup>1</sup> = *x*<sup>1</sup> and *b*<sup>1</sup> = *x*<sup>1</sup> − *x*0. The *ζ* and *ξ* are smoothing factors for the data and the trend, respectively, that range from 0 to 1 and are usually set as *ζ* = 0.98 and *ξ* = 0.99.

#### 2.2.3. Evaluation Indicators

Two evaluation indicators, namely the coefficient of determination (R2) and the root mean square error (RMSE), are introduced to evaluate the performance of the deep learning architecture forecasting mode. These indicators are widely used in deep learning regression tasks and are defined as follows:

$$R^2 = 1 - \frac{\sum\_{t=1}^{N} (y\_t - \hat{y}\_t)^2}{\sum\_{t=1}^{N} (y\_t - \overline{y})^2} \tag{23}$$

$$RMSE = \sqrt{\frac{1}{N} \sum\_{t=1}^{N} \left( y\_t - \hat{y}\_t \right)^2} \tag{24}$$

where *yt* and *y*ˆ*<sup>t</sup>* are the observations and the predicted value at time *t*, respectively, and *y* represents the mean of the *N* observed measures.

#### **3. Multi-Scale Response Analysis with JTFA**

#### *3.1. Pre-Processing of the Collected Data Sequence*

Baishuihe landslide is a typical loose mound landslide in the TGR, about 56 km west of the Three Gorges Dam in Zigui County, Hubei Province (see Figure 3a). The landslide slope is about 30◦ with an average thickness of about 30 m, and the volume is about <sup>1260</sup> × 104 m3. The sliding surface is a contact zone between residual slope deposit and bedrock that is about 0.9–3.1 m thick. Its bedrock lithology is a medium-thick sandstone layer with a thin mudstone layer, yielding 15◦ ∠ 36◦, with joints and fissures developed in the rock layer. The slide material consists mainly of residual Quaternary slope deposits of gravelly soil with a gravel content of 20% to 40% [35].

**Figure 3.** (**a**) Location of the landslide. (**b**) monitoring layout diagram. (**c**) geological profile III' through point ZG93.

As shown in Figure 3b, 11 GNSS monitoring stations (with a plane accuracy of 5 ± 1 ppm) were deployed on the Baishuihe landslide; six stations, namely the ZG93, ZG118, XD-01, XD-02, XD-03 and XD-04, are located in the warning zone. The observed displacements and the corresponding rainfall and reservoir levels are illustrated in Figure 4. The TGR is in the wet season from May to September when the reservoir begins to play a regulatory role of releasing flood water in moderation per annum. At this time of year, a clear pattern of a step increase in the cumulative displacement appears, indicating a combined effect of the changing reservoir water levels and rainfall on the landslide evolution. However, monitoring stations in the non-warning area give a relatively small displacement variation with an average annual amount of around 35 mm. In the warning zone, the ZG93 station is located on profile III. As shown in Figure 3c, the mid-slip zone runs the entire slope length; the leading edge of the landslide has been submerged in water for years, making the stability of the slope within this area vulnerable to the reservoir water level.

**Figure 4.** Monitoring data in time series of the Baishuihe landslide.

In the subsequent analysis, the site monitoring data collected by ZG93 is selected as the data source of displacement. Other synchronous on-site data include the monthly reservoir water level data and the local rainfall data in the TGR between July 2003 and December 2012. As shown in Figure 4, the time-series displacement of the landslide shows specific trends, seasonality and noise-induced fluctuations, indicating the need to obtain IFMs based on VMD. Setting *K* = 3 in VMD allow us to acquire three IMFs in an increasing order of frequency. The influence factors, e.g., the reservoir level and rainfall, mainly contribute to the periodic terms, showing cyclical fluctuations. At this point, the Shannon entropy is introduced as the fitness function to obtain the optimal IMFs by searching for the optimum *ε* using the GWO. The population size of the GWO is set to 20, the number of iterations to 100, and the upper-lower bounds to [0.01,100]. The optimal *ε* of ZG93 was determined to be 0.55; it is then utilized in the VMD to obtain the trend, the periodic and the random components of the displacement (Figure 5).

**Figure 5.** Decomposition result of displacement at ZG93.

#### *3.2. Pre-Selection of the Impact Factors*

Surface water infiltration is an essential factor affecting slope stability, especially concerning the matrix suction of the unsaturated zone. When the slope experiences rainfall infiltration, the bulk density increases, causing the matrix suction to decrease, which in turn raises the sliding forces and reduces the shear strength [36]. Moreover, rainfall infiltration into fractures enhances the split effect and raises the groundwater table, resulting in an increase in porewater pressure [37], thus affecting the stability of the landslide. As shown in Figure 6, the effect of rainfall on landslide displacement has an evident hysteresis. For example, the wet season in the TGR area typically begins in May, but the

response of landslide deformation does not manifest until July or August. This delayed response highlights the complex interactions between rainfall, infiltration and subsequent landslide movements.

**Figure 6.** Rainfall effects on landslide displacement.

Studies have shown that the periodic deformation highly correlates with the cumulative rainfall of the current month and the preceding two months [2]. In this study, grey correlation analysis gives a correlation degree of 0.79 and 0.78, further confirming their close relationship with the displacement time series. Thus, the monthly rainfall *V*<sup>2</sup> and the two-month cumulative rainfall *V*<sup>3</sup> are selected to characterize the rainfall effects on landslide displacement, thereby avoiding transitional redundancy.

The periodical variation of reservoir level alters the distribution of the seepage field and the stress state of the rock mass, directly influencing the stability of the landslide. The rapid decline in reservoir water results in a higher hydraulic gradient inside and outside the slope [38]. The seepage force along the slope greatly affects the landslide's stability, especially during the late reservoir water decline when rainfall concentration occurs. The landslide deformation and failure process are further accelerated [39]. As shown in Figure 7, the changes in the periodical deformation are consistent with the reservoir level fluctuation.

**Figure 7.** Relations between the periodical deformation and the reservoir level fluctuation.

To characterize the reservoir level effects on landslide displacement, the monthly mean water level *V*1, the monthly variation *V*4, the monthly change rate *V*<sup>5</sup> and the bimonthly variation *V*<sup>6</sup> of the reservoir water level are selected. These variables exhibit a grey correlation degree of 0.78, 0.81, 0.8 and 0.76, respectively, signifying their significant relationship with the displacement of the landslide.

During the decomposition of the influence factors, we set *K* = 2 because they mainly show cyclical fluctuations. The other VMD settings are the same as those used for the landslide displacement. The Shannon entropy of the periodic term of the factors serves as the fitness function to obtain the optimal ε. The six obtained values of ε are 0.23, 0.99, 0.13, 0.98, 0.11 and 0.95, respectively. The results are given in Figure 8, where subscript *p* indicates the periodic term and subscript *s* indicates the random term.

**Figure 8.** Decomposition results of the six influence factors (*V*1–*V*6), with the subscript *p* indicates the periodic term and subscript *s* indicates the random term.

#### *3.3. Multi-Scale Response Analysis*

Investigating the deformation characteristics and the response analysis at different scales is crucial to revealing the landslide mechanism. To achieve this, the JTFA is introduced to conduct a multi-scale response analysis between the displacement and external factors by three WA methods to investigate the deformation response mechanisms and reveal the changing patterns of specific signal components at different scales. The WA methods employed in this study include the continuous wavelet transform (CWT), the cross-wavelet transform (XWT) and the wavelet coherency (WTC).

#### 3.3.1. Analysis of CWT

The CWT extends time series analysis into a time-frequency domain to intuitively map the changing properties of non-stationary signals. Through CWT, we have observed an apparent yearly periodicity (1 year cycle) in the time series data associated with the Baishuihe landslide. This periodicity is evident in the landslide displacement, rainfall intensity and reservoir levels fluctuation, all coinciding with the primary cycle.

As critical external factors influencing landslides displacement, the variation of the rainfall and reservoir water level (R.w.l) also exhibits specific patterns. With a subtropical monsoon climate, the precipitation in the TGR shows seasonal features characterized by a total rainfall exceeding 1000 mm per year. The rainfall is concentrated mainly in summer and winter [40], which aligns with the 1 year cycle (12 months) of monsoon-related patterns obtained by the CWT analysis (see Figure 9a).

**Figure 9.** The CWT power spectrum of the rainfall (**a**), the reservoir water level (R.w.l) (**b**) and the displacement at ZG93 (**c**). Regions enclosed by the heavy solid black line show a greater than 95% confidence level of the red noise standard spectrum. The cross-hatched regions on either end indicate the "cone of influence", where edge effects become important. Color changes mean different magnitudes of power.

The storage of the TGR can be categorized into three stages: Stage I (June 2003–August 2006) with an average storage level of 135 m; Stage II (August 2006–August 2008) with the highest storage level raised to 156 m; Stage III starting from November 2008 with the highest storage level regulated to 175 m. Since then, the reservoir water level has fluctuated between 145 m and 175 m, exhibiting a contrary periodic fluctuation to the precipitation intensity due to artificial flood control. As shown in Figure 9b, a clear 1 year cycle (12 months) of R.w.l has been observed since 2006.

Generally, the displacement dominated by geological conditions is approximately monotonic over time, indicating the long-term trend. The displacement affected by external triggering factors (e.g., rainfall and R.w.l variation) can be expressed as a proximate periodic function, leading to different deformation features. Thus, the periodical displacement is the optimal option to illustrate the multi-scale response relationship with the impact factors.

Figure 9c gives the CWT power spectrum of the displacement at ZG93, showing a 1 year cyclic period (12 months), indicating a combined effect of changing reservoir water levels and rainfall on the landslide displacement. In addition, a 2 years cyclic period (24 months) is observed during extreme changes in the landslide displacement. This phenomenon seems related to the deformation responding to the overall rise of the R.w.l, but further research is still needed to confirm the underlying causes. Thus, cross-analysis between the displacement and the influence factors are performed via XWT and WTC later to further study the effect of rainfall and R.w.l variation on the kinematics of the Baishuihe landslide.

#### 3.3.2. Analysis of XWT and WTC

XWT is a measure of similarity between two waveforms, showing the presence of high common power part. On the other hand, the WTC, which combines wavelet transform and coherence analysis, measures the coherence of the XWT in the time-frequency space. It reproduces the consistency and correlation of the sequence in different periods and scales through time-frequency analysis of the time series data [41]. Thus, employing XWT and WTC with dual time series will contribute to further elucidating the characteristics of landslide displacements and impact factors in terms of local coherence and detailed variation.

To explore the underlying correlations and causes in multiple time periods and scales (e.g., monthly or yearly time scales), we construct, the XWT power and the WTC coherence spectrum of the periodic displacement and the six pre-selected impact factors. The red noise standard spectrum is used to verify the reliability of the results, with a 95% confidence level obtained through the standard red noise test enclosed by the heavy solid black line. The thin solid black line enclosed area represents the cone of influence (COI) of the wavelet analysis, where edge effects are significant; thus, the response relationships will not be analyzed outside the COI.

As can be seen from Figure 10, there is a high common power area shared by the R.w.l and the rainfall time series on a time scale of 1 year (12 months) throughout the study period (2003–2012). The two time series are anti-phase relationship within the whole high common power area, with a mean phase difference of about −178 ± 3◦, confirming a fluctuation pattern of the R.w.l opposite to the precipitation conditions and thereby guaranteeing the safe operation of the TGR and preventing flooding disasters.

**Figure 10.** The XWT power and WTC coherence spectrum of R.w.l and rainfall. The arrow direction reflects the phase correlation between the two time series. The arrow points from left to right indicate the in-phase relation, and the opposite suggests the anti-phase. In XWT, the color change indicates the magnitude of power. In WTC, the color differences suggest different magnitudes of coherence.

As shown in Figure 11, the periodic displacement and the associated impact factors have a significant power area throughout the study period. Rainfall shows a natural cyclic fluctuation pattern. Figure 11a,b shows that the periodic displacement shares a continuous significant power sector with the rainfall at a time scale of 1 year (12 months) and is highly coherent. The two time series are in-phase throughout the study period, with a phase difference of about 34 ± 12◦, indicating about a two-month lag behind the displacement than the rainfall. In addition, a 2 years cyclic period (24 months) coincides with the extreme change in the landslide displacement between 2006 and 2009, showing an anti-phase with the R.w.l. This phenomenon seems related to the deformation response to the rapid rise of the R.w.l; since it is the first time the water level R.w.l has reached its maximum, further research is still needed to confirm the underlying causes of this behavior.

**Figure 11.** The XWT power and WTC coherence spectrum of the periodic term and six impact factors. (**a**,**b**) indicate rainfall-type factors *V*<sup>2</sup> and *V*3. The remainder indicates reservoir-type factors, where (**c**–**f**) indicate monthly mean level *V*1, monthly variation *V*4, single-month rate of change *V*<sup>5</sup> and bi-monthly variation *V*6.

As previously stated, the fluctuation of the R.w.l shows a cyclical pattern opposite to the precipitation conditions. As shown in Figure 11c–f, the periodic displacement and the associated four R.w.l factors also have a significant power sector with a 1 year (12 months) cyclic period. However, there is a notable difference with the monthly mean water level *V*<sup>1</sup> during the rapid impounding of the TGR (2006–2009), as it shows an inverse phase with the displacement. These four factors all contain a significant power sector with a 2 years (24 months) cyclic period, showing anti-phase relations, which may share a similar cause as that seen in the power spectrum of displacement and rainfall. This finding suggests that the rise of the reservoir level is closely related to the extraordinary change in landslide displacement, providing valuable insights into the influence of the reservoir water level on landslide behavior during specific periods.

As shown in Figure 11c, there is a phase difference of −51 ± 8◦ between the landslide displacement and the monthly mean level of the reservoir water, indicating the displacement change occurs ahead of the change in reservoir water level by about three months. This phenomenon is likely due to the fact that rising reservoir levels contribute to landslide stability, and the displacement increase often occurs before the reservoir level rises or after it falls. In-phase relations are observed between the displacement and the monthly variation V4, the monthly change rate V5 and the bi-monthly variation V6 of the R.w.l, with a mean phase difference of −13 ± 9◦, indicating a consistent variation pattern. All pre-selected impact factors show strong coherence over a 1 year cyclic period (12 months), except during the period from 2006 to 2009.

The analysis above confirms the preselected six impact factors have a strong correlation with the variation trends of landslide displacements. The periodic variation of the rainfall intensity and the reservoir storage leads to regularly varying underground water levels and the seepage field distribution, which adversely affects the Baishuihe landslide's stability. These regular variations alter the equilibrium of the slope and cause periodic changes in landslide displacements time series. The JFTA helps illustrate the periodically changing patterns and the response between the displacement and external factors at multi-scales to reveal the landslide mechanism behind the landslide behavior. By employing JTFA, we gain valuable insights into how the landslide responds to the dynamic interplay of various factors, providing a deeper understanding of the landslide's complex behavior and contributing to the advancement of landslide research and prediction.

#### **4. Model Forecast and Discussion**

#### *4.1. Training Dataset and Parameter Setting*

In the experiments, monthly GNSS-measured displacements at ZG93 were acquired from July 2003 to March 2013. Additionally, the corresponding monthly reservoir water level and precipitation data are used in the subsequent analysis and modeling. For the training process, datasets collected from July 2003 to June 2009, comprising 72 sets of data, are utilized as model inputs, while 25 sets of data collected from July 2009 to July 2010 are used as forecast datasets, and the reserved 17 sets of data from August 2011 to December 2012 are used for model validation.

Six training sets with different data volumes are obtained by dividing the dataset based on natural years (see Table 1). Using these datasets, six training models are constructed, and the optimal data volume for the prediction model is determined by comparing the model performance. Throughout this process, a sequential prediction strategy is implemented, considering the timeliness of monitoring GNSS displacement.

**Table 1.** The designed training dataset.


In the experiment, the deep learning architecture of the forecasting model in this paper is built using the Deep Learning Toolbox of Matlab2020. We empirically set the learning rate to 0.01 and the training number to 250. However, the number of hidden neurons is a crucial factor that can affect the prediction precision, and therefore, it should be determined through carefully designed comparison experiments. The results of the seven comparison experiments are shown in Table 2. According to Table 2, the model performance is proportional to its number of neurons. As the number of neurons increases, the RMSE of the model first decreases and then increases, with the optimal RMSE achieved when the number of neurons is 100. However, the computation time keeps growing as the number of neurons increases. Consequently, the number of hidden neurons is set to 100 in the following experiments.


**Table 2.** The performance of the GRU using different numbers of neurons.

#### *4.2. Prediction Results and Analyses*

#### 4.2.1. Displacement Components Prediction

The DES is employed to predict the trend component of the landslide displacement by using a weighted combination of past observations with recent observations given relatively higher weights than the older ones. Thus, despite the data volume difference between the designed six training datasets, the predicted trend component displacement is almost identical. Here, the prediction of Model 6 is taken as the final trend result. The DES results and the absolute error are shown in Figure 12a, with *R*<sup>2</sup> = 0.998 and *RMSE* = 2.741 mm.

**Figure 12.** Predictions of each displacement component.

The GRU could adaptively capture dependencies at different time scales, especially suitable for handling non-linear problems, and is thus used to forecast landslide displacements for periodic and random terms. The performance of the GRU in predicting the periodic component of the landslide displacement is shown in Table 3. According to Table 3, the GRU performance increases as the training data volume increases; thus, the optimal result is achieved with Model 6. However, the model performance of the random component displacement first increases and then decreases; the optimal results come from Model 3.


**Table 3.** The performance of the GRU in predicting periodic and random displacement.

The above results indicate that better periodic component prediction needs more trained data; of the six designed models, the larger the amount of the training data, the better. However, it does not apply to random components; for random component forecasts, an appropriate amount of datasets works best.

#### 4.2.2. Cumulative Displacement Prediction

The predicted cumulative displacements can be derived by combining the trend, the periodic and the random predicted components. The optimal prediction of each component is illustrated in Figure 12. According to Figure 12, the absolute forecast error of the trend component remains small (<20 mm) and does not vary much, despite the deformation being around 2 m. In contrast, the absolute forecast error of the periodic component follows a certain regularity and shows a tendency to decrease, and that of the random component shows randomness.

The evaluation indicators, as listed in Table 4, assess the performance of the six models. According to Table 4, both indicators experience an increase and then a decrease as the volume of the training dataset increases. A maximum *RMSE* of 26.043 mm and a minimum *R*<sup>2</sup> of 0.904 comes from Model 2 with a data volume of two natural years. The turning point found at Model 4 with four years of data volume shows the best results among the six models, with *R*<sup>2</sup> of 0.952 and *RMSE* of 18.582 mm.


**Table 4.** Predictions of the cumulative landslide displacement.

According to Section 4.2.1, Model 6 is the best model for predicting the periodic component, while Model 3 gives the best performance for predicting the random part. Consequently, based on this, an optimal combination prediction model for landslide displacement is constructed. The evaluation indicators of the optimal model, shown in Table 4, outperform the other six models in terms of both evaluation indicators, with an *RMSE* of 12.301 mm and *R*<sup>2</sup> of 0.979. Compared with Model 4, the best forecast model among the six, the RMSE decreases by 6.281 mm and *R*<sup>2</sup> increases by 0.027 for the optimal model.

#### 4.2.3. Comparative Experiments and Analyses

In Section 3.2, we conducted a pre-selection of the impact factors, and later, through the multi-scale response analysis in Section 3.3, we gained valuable insights into how the landslide responds to the dynamic interplay of the selected factors. The analysis revealed an apparent 1 year primary cycle of the time series associated with the landslide evolution. Of particular interest were the delayed response results, which highlight the complex interactions between the reservoir water level (R.w.l) and rainfall with the subsequent landslide movements. This provides a deeper understanding of the landslide's complex behavior and guides the selection of the landslide prediction model.

GRU and other recurrent neural networks (RNNs) excel at handling the non-linear dynamic characteristics present in complex time series. The GRU model, in particular, has the ability to capture dependencies at different time scales, making it well-suited for time series landslide displacement prediction. Considering the outstanding performance of GRU in time-series forecasting, we conducted comparative experiments to analyze the performance differences of the optimal model screened in Section 4.2.2, both when considering impact factors and when not considering them.

Figure 13 displays the predicted cumulative displacements and absolute errors. Notably, when the impact factors are ignored, the model exhibits more significant deviations, indicating a decline in prediction performance. The calculated *RMSE* and *R*<sup>2</sup> are 20.218 mm and 0.942, respectively. However, when considering the impact factors, the model produces an improved performance, yielding an *RMSE* of 12.301 mm and *R2* of 0.979. This results in a reduction in the RMSE by 7.917 mm and an increase in R2 by 0.037, respectively.

**Figure 13.** Predictions of the cumulative landslide displacement. The red marker indicates the absolute error without considering influencing factors, while the blue circle indicates the absolute error of our proposed method.

As a result, the impact factors selected in Section 3.2 have been proven to be indispensable for multi-scale response analysis and the prediction model training process. Despite the exceptional performance of GRU in time-series forecasting, it is evident that the influence factors of landslides displacement have a significant impact on the predictions. Hence, the analysis and selection of the impact factors during the modeling process are crucial, underscoring the comprehensiveness of the proposed model.

At this stage, the 17 reserved sets of data from August 2011 to December 2012 are utilized for model validation to further verify the model's performance. Figure 14 displays the cumulative landslide displacement prediction (17 sets) and the absolute errors of the proposed model, resulting in a calculated *RMSE* of 9.715 mm and *R*<sup>2</sup> of 0.967. The results demonstrate that the optimal combination prediction model exhibits a reliable capacity for predicting landslide displacement.

**Figure 14.** Predictions of the cumulative landslide displacement. The blue circles stands for the absolute errors of the proposed model.

To further illustrate the superiority of the proposed method compared to existing state-of-the-art methods, such as the GWO-MIC-SVR [42], the V/S-LSTM [43], the Chaotic DWT-ELM [44] and the Multi-Chaotic ELM [45], we conduct a comparative analysis of the forecasts generated by these models. The performance of the forecast models is shown in Table 5.



According to Table 5, the proposed model achieves a competitive high-accuracy result in terms of RMSE, ranking in the top two, with an *RMSE* of 9.715 mm, closely following the V/S-LSTM with an *RMSE* of 8.950 mm. However, what makes the proposed model stands out is its ability to makes 17 sets of consecutive forecasts, while V/S-LSTM only forecasts 8 sets.

Considering that forecast errors can accumulate over time, the proposed model's RMSE of 0.765 mm for 17 sets of consecutive forecasts demonstrates its capability for precise and reliable long-term predictions. Thus, from a comprehensive perspective, the proposed model exhibits both time-effectiveness and long-sequence forecasting advantages over the other six intelligent algorithms.

#### **5. Conclusions**

This paper presents a novel deep learning architecture specifically designed for predicting reservoir landslide displacement. The evolution of reservoir landslides involves highly complicated and nonlinear dynamics, characterized by specific time-frequency features. To address the complexities, the joint time-frequency analysis (JTFA) module is designed. This module integrates the GWO-optimized VMD and WA methods, facilitating the extraction of essential signal components with clear physical implications. Additionally, the module conducts multi-scale response analysis, thereby revealing deformation variation patterns in the underlying mechanisms governing the landslide's response behavior. For the actual displacement prediction, The GRU is integrated, which possesses the capability of adaptively

capturing dependencies at multiple time scales. Moreover, during the modeling process, the correlation and hysteresis effect of the impact factors are also considered, further enhancing the accuracy and reliability of the predictions. The model experiments show that as the amount of training data increases, the periodic component prediction improves significantly. For the random component forecast, an appropriate amount of datasets yields the best results, while the trend component remains almost identical regardless of the data size. This insight led to the construction of an optimal combination prediction model, surpassing the performance of the other six designed models in cumulative landslide displacement predictions. This model achieved impressive results, with a minimum RMSE of 12.301 mm and a maximum R2 of 0.979. Moreover, the proposed architecture's superiority in timeeffectiveness and its ability to accurately predict long-sequence landslide displacement have been firmly established through comparative experiments and analyses, in which we evaluated its performance against six other state-of-the-art intelligent methods. The favorable outcomes and impressive forecasting capabilities of our proposed architecture solidify its position as an efficient and accurate tool for landslide displacement prediction.

**Author Contributions:** Conceptualization, L.L. and Y.J.; methodology and validation, Y.J. and H.L.; formal analysis and investigation, X.Z; resources, Y.J.; writing—original draft preparation, Y.J.; writing—review and editing, Y.J. and Z.L.; visualization, H.L.; funding acquisition X.Z. and Y.J. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research is financially supported by the National Natural Science Foundation of China (Grant No. 41977255) and the Key Research and Development Program of Sichuan Province (2023YFS0439).

**Data Availability Statement:** The raw data used in this study can be downloaded for scientific research after approval at http://www.crensed.ac.cn/, accessed on 14 February 2022.

**Acknowledgments:** The authors would like to thank the National Service Center's Specialty Environmental Observation Stations for providing the in situ monitoring dataset.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
