**1. Introduction**

Landslides are one of the crucial topics in geological hazard research [1]. Landslide disasters seriously threaten the sustainable development of economies and society [2]. Reliable early warning systems are an effective approach to landslide risk reduction [3,4], and the analysis of landslide mechanisms and the prediction of landslide movements are key components of such early warning [5–7]. Therefore, it is worthwhile to carry out landslide displacement prediction.

Landslide displacement prediction models can be divided into two categories: physical models and numerical models [8,9]. Traditional physical models provide a physical explanation for the prediction task according to geological theory [10]. Saito established a three-stage theory of landslide creep failure in 1968 [11,12], and Hoek proposed the extension line method to predict the time-displacement curve of Chilean landslides in 1977 [13]. However, physical models struggle to meet the demands of dynamic prediction for large landslides [14–16]. With the rapid development of mathematical statistical theory and intelligent algorithms, numerical models have become more popular [5]. Numerical models fully consider the complexity and nonlinearity of the landslide evolution process and generally achieve higher prediction accuracy [5,17].

Advances in machine learning provide powerful tools for numerical landslide model research. Zhou et al. [17] used a kernel extreme learning machine for landslide displacement prediction, and Zhu et al. [18] proposed a least squares support vector model and applied it to the prediction of the Shuping landslide. Among neural network approaches, Recurrent Neural Networks (RNNs) have particular advantages in dealing with sequential data [19,20]. Different from other neural networks, RNNs are deep in the temporal dimension [21] and can effectively process higher-dimensional sequential information [22]. As a variant of RNNs, Long Short-Term Memory (LSTM) networks perform better than RNNs at storing and transferring historical information [23–26].


The utility of the LSTM in landslide research has been confirmed by many scholars [27–30]. Thus, we choose an LSTM network for landslide displacement prediction in this paper.

The Attention Mechanism (AM) has become a powerful tool in deep learning [31]. The AM is analogous to the visual attention mechanism in human observation, in that it extracts the key information from the input [32]. The AM has been successfully applied in several tasks, such as natural language processing [31], machine translation [33], and image recognition [34]. Li et al. [35] added an Attention Mechanism to an LSTM model and successfully realized the prediction of personal mobility, and Ding et al. [36] proposed a spatio-temporal attention LSTM model for flood forecasting. Thus, we incorporate an Attention Mechanism into an LSTM neural network to capture significant variations and improve the model's performance.

Therefore, a novel model based on time series analysis and an Attention Mechanism with Long Short-Term Memory (AMLSTM) is proposed to predict landslide displacement. The Baishuihe landslide in Hubei Province, China, is used as the experimental area. First, the Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) algorithm is used to divide the total displacement into a trend term, a periodic term, and a residual term. By analyzing the relationship between displacement and external factors, a multi-factor AMLSTM model is applied to predict the displacement, and it is compared with four other machine learning models. A series of contrastive analyses is conducted to evaluate the performance of all of the models. The results indicate that the proposed CEEMDAN-AMLSTM model performs best in the experiment.

#### **2. GNSS Time Series Analysis**

#### *2.1. Landslide Evolution Analysis*

The evolution of landslides is the result of the interaction of geological conditions and external factors [37]. The non-linear and non-stationary landslide displacement series are particularly complex and changeable. Therefore, it is necessary to decompose the landslide time series and forecast each component separately. The corresponding time series of the landslide displacement can be expressed by the additive model:

$$y_i = T_i + S_i + R_i \tag{1}$$

where *yi* is the cumulative displacement, *Ti* is the trend term, *Si* is the periodic term, and *Ri* is the residual term.
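As a minimal illustration of Equation (1), the following Python sketch composes a synthetic cumulative displacement series from hypothetical trend, periodic, and residual components; the numbers are illustrative only and do not correspond to any real landslide data.

```python
import numpy as np

t = np.arange(365)                                  # one year of daily epochs
trend = 0.15 * t                                    # T_i: slowly increasing trend term (mm)
periodic = 8.0 * np.sin(2 * np.pi * t / 365.0)      # S_i: annual periodic term (mm)
residual = np.random.normal(0.0, 0.5, t.size)       # R_i: random residual term (mm)

y = trend + periodic + residual                     # Eq. (1): cumulative displacement y_i
```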

#### *2.2. Decomposition of Displacement Time Series*

Many approaches have been recognized as powerful tools for decomposing landslide displacement time series, including the moving average [38], wavelet analysis [39], Variational Mode Decomposition (VMD) [40], and Empirical Mode Decomposition (EMD) [41]. The EMD method is an adaptive method for analyzing non-linear signals [42]. However, the mode mixing problem constitutes an obstacle when using EMD. To address this problem, the Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) method has been proposed in recent years [43]. Compared to the more commonly used EMD method, it achieves better mode separation and leaves almost no residual noise in the reconstruction. It has many applications in the fields of biological signal processing [44] and engineering [45], but its application in the geological field still needs to be explored.

The CEEMDAN decomposes the complex signal into a finite number of Intrinsic Mode Functions (IMFs). The basic process of the CEEMDAN is as follows [46]:

1. As in EEMD, white Gaussian noise is added to the signal over an ensemble of realizations. The first IMF can be expressed as:

$$IMF_1 = \sum_{i=1}^{n} \frac{E_1(x + \varepsilon w_i)}{n} \tag{2}$$

where *n* is the number of noise realizations (the ensemble size), x is the original signal, *ε* is a fixed noise amplitude coefficient, *wi* is the *i*th white noise, and *Ek*(·) is the operator that extracts the *k*th IMF obtained by EMD.

2. The first residual, *r1*, is calculated:

$$r_1 = x - IMF_1 \tag{3}$$

3. For *k* = 2, 3, …, *K*, the *IMFk* and the *k*th residual can be calculated by:

$$IMF_k = \sum_{i=1}^{n} \frac{E_1(r_{k-1} + \varepsilon E_{k-1}(w_i))}{n} \tag{4}$$

$$r_k = r_{k-1} - IMF_k \tag{5}$$

4. The process is repeated until the final residual, R, has no more than two extrema; the original signal can then be expressed as:

$$x = \sum_{k=1}^{K} IMF_k + R \tag{6}$$
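For readers who wish to reproduce this decomposition, a minimal sketch is given below. It assumes the open-source PyEMD package (installed as `EMD-signal`), whose `CEEMDAN` class implements the procedure above; the ensemble size, noise amplitude, and the synthetic input series are illustrative assumptions, and since the returned array may or may not already include the final residue depending on the package version, the residue is recovered here by explicit subtraction.

```python
import numpy as np
from PyEMD import CEEMDAN   # assumes the open-source EMD-signal (PyEMD) package

# Synthetic stand-in for a daily GNSS-derived cumulative displacement series (mm)
t = np.arange(365)
y = 0.15 * t + 8.0 * np.sin(2 * np.pi * t / 365.0) + np.random.normal(0.0, 0.5, t.size)

ceemdan = CEEMDAN(trials=100, epsilon=0.005)   # ensemble size n and noise amplitude ε (illustrative)
imfs = ceemdan(y)                              # IMF_1 ... IMF_K, ordered from high to low frequency
residue = y - imfs.sum(axis=0)                 # whatever remains corresponds to R in Eq. (6)

print(imfs.shape)                              # (K, len(y)); K is determined adaptively
```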

#### **3. Attention Mechanism—LSTM Forecasting Framework**

*3.1. LSTM*

Long Short-Term Memory (LSTM) was proposed by Hochreiter and Schmidhuber in 1997 [23]. An LSTM can learn information through well-designed structures called "gates". The gates store and control the flow of information so that the state of the previous time step can be transferred to the next time step. The LSTM algorithm has three gates (an update gate, a forget gate, and an output gate) to protect and control the cell state during training [25]. The internal structure of the memory unit is shown in Figure 1.

**Figure 1.** The internal structure of the Long Short-Term Memory (LSTM) memory unit.

The ⊗ symbol represents the element-wise product and ⊕ represents the element-wise sum. The forget gate determines how much of the previous unit state, ct−1, is retained at the current moment, ct. The input gate determines how much of the current input, xt, is saved in the unit state, ct. The output gate controls how much of the unit state, ct, is transferred to the output value, at, of the LSTM.

Equations (7)–(12) show the calculation process of LSTM:

$$f_t = \sigma(W_f * [a_{t-1}, x_t] + b_f) \tag{7}$$

$$u_t = \sigma(W_u * [a_{t-1}, x_t] + b_u) \tag{8}$$

$$\tilde{c}_t = \tanh(W_c * [a_{t-1}, x_t] + b_c) \tag{9}$$

$$c_t = f_t * c_{t-1} + u_t * \tilde{c}_t \tag{10}$$

$$o_t = \sigma(W_o * [a_{t-1}, x_t] + b_o) \tag{11}$$

$$a_t = o_t * \tanh(c_t) \tag{12}$$

where *ft*, *ut*, and *ot* are gating vectors that respectively store the forgotten, updated, and output information of the memory unit; *c̃t* is the candidate cell state; *ct* is the cell state vector; *at* is the hidden state vector; *σ* is the sigmoid function; and *xt* is the input vector. *Wf*, *Wu*, *Wc*, and *Wo* are linear transformation matrices whose parameters need to be learned, and *bf*, *bu*, *bc*, and *bo* are the corresponding bias vectors.
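To make Equations (7)–(12) concrete, the following NumPy sketch performs a single forward step of one LSTM memory unit; the layer sizes and random parameter values are purely illustrative, and the helper function `lstm_step` is our own naming rather than part of any library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, a_prev, c_prev, W, b):
    """One LSTM time step following Eqs. (7)-(12); W and b hold the four gate parameters."""
    za = np.concatenate([a_prev, x_t])            # [a_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ za + b["f"])           # forget gate, Eq. (7)
    u_t = sigmoid(W["u"] @ za + b["u"])           # update (input) gate, Eq. (8)
    c_hat = np.tanh(W["c"] @ za + b["c"])         # candidate cell state, Eq. (9)
    c_t = f_t * c_prev + u_t * c_hat              # new cell state, Eq. (10)
    o_t = sigmoid(W["o"] @ za + b["o"])           # output gate, Eq. (11)
    a_t = o_t * np.tanh(c_t)                      # new hidden state, Eq. (12)
    return a_t, c_t

# Illustrative sizes: 5 input features, 16 hidden units
n_x, n_a = 5, 16
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((n_a, n_a + n_x)) * 0.1 for k in "fuco"}
b = {k: np.zeros(n_a) for k in "fuco"}
a_t, c_t = lstm_step(rng.standard_normal(n_x), np.zeros(n_a), np.zeros(n_a), W, b)
```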

Through the connection of several unit memories, the information flow can be transferred as shown in Figure 2.

**Figure 2.** The workflow of LSTM.

#### *3.2. Attention Mechanism*

The Attention Mechanism is inspired by the visual attention mechanism of human observation [32]: it enables the model to focus on the most salient parts of the input. The schematic of the Attention Mechanism layer is illustrated in Figure 3. Raffel et al. [47] proposed a simplified feed-forward attention model, which is calculated as follows:

$$score_t = v(a_t) \tag{13}$$

$$w_t = \frac{\exp(score_t)}{\sum_{k=1}^{T} \exp(score_k)} \tag{14}$$

$$s = \sum_{t=1}^{T} w_t * a_t \tag{15}$$

where *scoret* is the attention score of time step *t*, *at* is the hidden state vector, *v* is a learnable scoring function, *wt* is the attention weight, *T* is the number of time steps, and *s* is the context vector.
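The following NumPy sketch implements Equations (13)–(15) directly; here the learnable function *v* is assumed to be a simple dot product with a weight vector, which is one common choice rather than necessarily the formulation used in this study.

```python
import numpy as np

def feed_forward_attention(a, v):
    """a: (T, hidden) sequence of hidden states; v: (hidden,) scoring vector."""
    scores = a @ v                                   # Eq. (13): one score per time step
    weights = np.exp(scores) / np.exp(scores).sum()  # Eq. (14): softmax over time steps
    s = (weights[:, None] * a).sum(axis=0)           # Eq. (15): weighted context vector
    return s, weights

# Illustrative example: 30 time steps, 16 hidden units
rng = np.random.default_rng(1)
a = rng.standard_normal((30, 16))
s, w = feed_forward_attention(a, rng.standard_normal(16))
print(s.shape, w.sum())    # (16,) 1.0
```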

**Figure 3.** Schematic of the Attention Mechanism.

#### *3.3. Attention Mechanism—LSTM Model*

Based on the preceding discussion, this paper applies the Attention Mechanism with LSTM (AMLSTM) model for landslide displacement prediction. The AMLSTM model consists of an input vector, LSTM hidden layers, an attention layer, a fully connected layer, and an output layer of predicted values. The architecture of the AMLSTM model is shown in Figure 4.

**Figure 4.** Architecture of the Attention Mechanism with LSTM Neural Network (AMLSTM NN).
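A minimal Keras sketch of such an architecture is shown below; the window length of 30 epochs, the five input factors, and the layer sizes are hypothetical choices rather than the configuration used in this study, and the attention layer follows the feed-forward formulation of Equations (13)–(15).

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

class FeedForwardAttention(layers.Layer):
    """Score each time step, softmax over time, return the weighted context vector."""
    def build(self, input_shape):
        # learnable scoring vector v with shape (hidden_dim, 1)
        self.v = self.add_weight(name="v", shape=(int(input_shape[-1]), 1),
                                 initializer="glorot_uniform", trainable=True)
        super().build(input_shape)

    def call(self, a):                               # a: (batch, T, hidden)
        scores = tf.tensordot(a, self.v, axes=1)     # Eq. (13)
        weights = tf.nn.softmax(scores, axis=1)      # Eq. (14)
        return tf.reduce_sum(weights * a, axis=1)    # Eq. (15): context vector s

inputs = layers.Input(shape=(30, 5))                 # 30 past epochs, 5 influencing factors
h = layers.LSTM(64, return_sequences=True)(inputs)   # LSTM hidden layer
s = FeedForwardAttention()(h)                        # attention layer
d = layers.Dense(32, activation="relu")(s)           # fully connected layer
outputs = layers.Dense(1)(d)                         # predicted displacement
model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
model.summary()
```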
