Integrating Feature Selection with Machine Learning for Accurate Reservoir Landslide Displacement Prediction

Ge, Qi; Wang, Jingyong; Liu, Cheng; Wang, Xiaohong; Deng, Yiyan; Li, Jin

doi:10.3390/w16152152


Qi Ge ¹, Jingyong Wang ², Cheng Liu ¹, Xiaohong Wang ², Yiyan Deng ³ and Jin Li ³,*

¹ College of Civil Engineering, Nanjing Forestry University, Nanjing 210037, China
² Powerchina Huadong Engineering Corporation, Hangzhou 310014, China
³ Institute of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing 210044, China
* Author to whom correspondence should be addressed.

Water 2024, 16(15), 2152; https://doi.org/10.3390/w16152152

Submission received: 15 July 2024 / Revised: 27 July 2024 / Accepted: 28 July 2024 / Published: 30 July 2024

(This article belongs to the Special Issue Rainfall-Induced Landslides and Natural Geohazards)


Abstract

Accurate prediction of reservoir landslide displacements is crucial for early warning and hazard prevention. Current machine learning (ML) paradigms for predicting landslide displacement demonstrate strong performance but often rely on feature engineering techniques, such as decomposition into different temporal lags and feature selection. This study investigates the impact of various feature selection techniques on the performance of ML algorithms for landslide displacement prediction. The Shuping and Baishuihe landslides in China’s Three Gorges Reservoir Area are used to comprehensively benchmark four prevalent ML algorithms, comprising static models, namely the backpropagation neural network (BPNN) and support vector machine (SVM), and dynamic models, namely long short-term memory (LSTM) and the gated recurrent unit (GRU). Each ML model is evaluated under three feature engineering techniques: raw multivariate time series, and feature selection using either the maximal information coefficient-partial autocorrelation function (MIC-PACF) or grey relational analysis-PACF (GRA-PACF). The results demonstrate that appropriate feature selection methods can significantly improve the performance of static ML models. In contrast, dynamic models effectively leverage their inherent capability to capture temporal dynamics within raw multivariate time series and gain only marginally from extensive feature engineering compared with no feature selection. The optimal feature selection approach varies with the ML model and the specific landslide, highlighting the importance of case-specific assessments. The findings of this study offer guidance on integrating feature selection techniques with different ML models to maximize the robustness and generalizability of data-driven landslide displacement prediction frameworks.

Keywords:

landslide displacement prediction; feature selection; machine learning; Three Gorges Reservoir Area

1. Introduction

Landslides are a major geological hazard in China’s Three Gorges Reservoir Area (TGRA), threatening infrastructure and human lives [1,2,3,4,5]. Owing to seasonal rainfall and periodic reservoir level fluctuations, many landslides in the TGRA exhibit step-wise deformation [6,7,8,9]. Because GPS monitoring records the displacement of numerous landslides in the TGRA [10], displacement prediction is an essential and cost-effective tool for early landslide warning and prevention in this area [11,12,13,14].

Different methods for predicting landslide displacement offer distinct advantages. Physical models [15,16,17,18] help clarify the relationship between deformation and failure mechanisms [19,20,21,22,23]. However, the displacement process of landslides is intricate, which makes it challenging to construct sufficiently effective physical models. Phenomenological models, although in some cases less effective at revealing the evolution mechanism of landslide deformation, are increasingly favored for displacement prediction because they provide accurate predictions [24]. In recent years, machine learning (ML) methods have become the most widely studied phenomenological approach for predicting landslide displacement in the TGRA. Numerous ML algorithms have been applied to reservoir landslide displacement prediction with strong results. Existing methods can be broadly categorized as static or dynamic. Static algorithms, such as the backpropagation neural network (BPNN) [25,26,27,28] and support vector machine (SVM) [29,30,31,32], are lightweight, efficient, and easy to fine-tune. They are less computationally demanding and easier to implement, making them suitable when computational resources are limited. However, they may struggle to capture the temporal dynamics of landslide displacement under changing environmental factors. Dynamic algorithms, such as long short-term memory (LSTM) [33,34,35,36,37,38] and the gated recurrent unit (GRU) [39,40,41], excel at leveraging historical data to model temporal dependencies and evolving patterns. These models are particularly effective for dynamic systems in which external factors such as rainfall and reservoir water level play a significant role.
Notably, because the deformation of reservoir landslides is often influenced by external environmental factors such as rainfall and reservoir water level, as well as their temporal lags, feature engineering and data preprocessing are widely applied before an ML model is constructed. Feature selection methods such as the maximal information coefficient (MIC) [42,43], partial autocorrelation function (PACF) [44,45], and grey relational analysis (GRA) [46,47] are often used to remove redundant features that correlate weakly with landslide deformation and thereby mitigate the negative impact of irrelevant inputs on model performance.

Despite extensive studies, the optimal integration of ML algorithms and feature selection methods for maximizing reservoir landslide displacement prediction accuracy remains unclear. As data-driven approaches, ML models have performance that is inherently tied to the inputs provided. Therefore, the feature engineering process, which selects or transforms model inputs, has important implications for downstream landslide displacement prediction. This raises key questions about best practices, given the diversity of available ML architectures and feature selection algorithms, particularly in two respects. First, because static and dynamic ML models are both prevalent and have different characteristics, it is crucial to examine whether these fundamentally different model types are affected differently by the feature engineering techniques that shape their input data. Second, feature selection methods such as MIC and GRA involve trade-offs, so it remains uncertain whether any single approach can deliver consistently accurate and robust performance across diverse case studies, which would justify its use as a default choice in future research. Answering these open questions about integrating feature selection with different ML algorithms can guide the development of accurate and generalizable modeling frameworks for landslide displacement prediction.

To address these open questions, this study presents a comprehensive comparison across four popular machine learning approaches, including both static models (BPNN, SVM) and dynamic models (LSTM, GRU). Each ML method is evaluated under three different feature engineering techniques, including feature selection using MIC-PACF or GRA-PACF, and no selection using raw multivariate time series inputs. Rigorous testing is carried out across multiple landslide cases and repeated runs to enable proper assessment of prediction accuracy, stability and robustness. The performance of all ML-feature selection combinations is benchmarked using two landslide case studies from the TGRA in China. By thoroughly examining integration strategies between prevalent ML algorithms and feature selection techniques, this research aims to guide optimization of the feature engineering process in future landslide displacement prediction modeling workflows.

2. Case Studies and Materials

2.1. Three Gorges Reservoir Area

In the TGRA, the movement of active landslides has intensified due to heavy rainfall and reservoir water level changes. Figure 1 depicts the Yangtze River and its tributaries, the location of the Three Gorges Dam, and the sites of the two landslides studied in this research. Many reservoir landslides in this area exhibit annual step-like displacement patterns, owing to quasi-regular seasonal variations in precipitation and reservoir water level. For this study, the step-wise Shuping and Baishuihe landslides (see Figure 1) are used to compare the predictive performance of different combinations of feature selection techniques and machine learning algorithms.

2.2. Shuping Landslide

The Shuping landslide, characterized as a colluvial type, is located in Shazhenxi town on the southern bank of the Yangtze River, about 47 km west of the Three Gorges Dam. It has an estimated volume of 27.5 million cubic meters, with thickness ranging from 30 to 70 m. The upper part of the landslide consists mainly of soil containing boulders and gravel, while the lower part is predominantly composed of clay and silty clay. The landslide is underlain by magenta siltstone interbedded with mudstone and marlite from the Badong Formation bedrock. During reservoir lowering, the Shuping landslide accelerates its movement, while during reservoir filling, its movement decelerates, likely due to differential permeability within its layers [48]. Prolonged regional rainfall has also been identified as a potential trigger for movement.

Monitoring station ZG88 is selected for displacement modeling because of its substantial deformation and the availability of comprehensive data. Figure 2 illustrates the measured rainfall, reservoir water level, and cumulative displacement. The landslide undergoes short-term rapid deformation from May to July each year, coinciding with low reservoir water levels and heavy rainfall, while its displacement remains nearly unchanged at other times. The data from January 2011 to December 2012 (i.e., the last two years) are used as the test set, while the remaining data from January 2006 to December 2010 are used for training.

2.3. Baishuihe Landslide

The Baishuihe landslide is located on the southern bank of the Yangtze River within the TGRA, roughly 56 km west of the Three Gorges Dam. It is a retrogressive, fan-shaped landslide covering 0.42 km² with a maximum length of 780 m and width of 430 m. The landslide has an average thickness of 30 m and an estimated volume of 1.26 × 10⁷ m³. The sliding body consists mainly of cataclastic rock, silty mudstone, and gravelly soil, underlain by silty mudstone, Jurassic siltstone, and quartz sandstone bedrock. The Baishuihe landslide has experienced frequent reactivation, leading to severe deformation and extensive damage over time, including the destruction of 21 residential houses in 2004, the development of transverse cracks on the landslide surface, and significant accumulation of debris on roads [49].

Given the significant displacements and associated risks, eleven GPS stations were installed on the landslide beginning in 2003. GPS station ZG118 exhibits distinctive step-like deformation, a long displacement time series, and relatively large displacement, making it an ideal example for displacement analysis. Figure 3 displays the recorded rainfall, reservoir water level, and displacement at ZG118 of the Baishuihe landslide. The data from January 2012 to December 2013 (i.e., the last two years) are used as the test set, while the remaining data from August 2003 to December 2011 are used for training in subsequent modeling.

3. Methodology

3.1. Displacement Decomposition and Trend Term Prediction Methods

In ML models for predicting reservoir landslide displacement, the cumulative displacement $D$ is divided into a trend component $\alpha$ and a periodic component $\beta$. The trend displacement reflects the landslide’s evolutionary trend, while the periodic term represents displacement changes caused by reservoir level and rainfall. Each component is predicted individually, and the predictions are combined to obtain the cumulative displacement prediction (i.e., $D = \alpha + \beta$) [50,51]. This work employs the widely used Hodrick–Prescott (H–P) filter [52] to decompose the cumulative displacement of the reservoir landslides in the TGRA. The reference smoothing parameter for time series data with annual cycle characteristics is 100.
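The decomposition step can be sketched in NumPy under the standard penalized least-squares form of the H–P filter: the trend $\tau$ minimizes $\sum_t (D_t - \tau_t)^2 + \lambda \sum_t (\Delta^2 \tau_t)^2$, so it solves $(I + \lambda K^\top K)\tau = D$, where $K$ is the second-difference matrix. The function name `hp_decompose` and the dense solver are illustrative assumptions, not the authors' code; $\lambda = 100$ follows the text.

```python
import numpy as np


def hp_decompose(displacement, lam=100.0):
    """Split a cumulative displacement series into trend and periodic parts.

    Solves (I + lam * K^T K) trend = D, where K is the (n-2) x n
    second-difference matrix, and returns periodic = D - trend.
    """
    d = np.asarray(displacement, dtype=float)
    n = d.size
    if n < 3:
        raise ValueError("need at least 3 observations")
    # Each row of K applies the second difference [1, -2, 1].
    k = np.zeros((n - 2, n))
    for i in range(n - 2):
        k[i, i:i + 3] = (1.0, -2.0, 1.0)
    trend = np.linalg.solve(np.eye(n) + lam * (k.T @ k), d)
    periodic = d - trend
    return trend, periodic


# Synthetic monthly series (hypothetical): linear creep plus an annual cycle.
t = np.arange(120)
series = 2.0 * t + 15.0 * np.sin(2 * np.pi * t / 12)
trend, periodic = hp_decompose(series)
```

Because straight lines have zero second difference, a purely linear series passes through the filter unchanged, and the periodic part always sums to zero; both properties are handy sanity checks. A dense solve is adequate for a few hundred monthly observations.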

Double exponential smoothing (DES) is used to predict trend displacement. DES takes a weighted mean of historical observations for future predictions. With simple operation, broad adaptability, and good performance, DES is widely applied in landslide time series analysis, especially in landslide trend displacement prediction. For more details about DES, please refer to [53].
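The text does not spell out the DES recursion, so the sketch below assumes Holt's linear (two-parameter) formulation, which is consistent with the two smoothing parameters α and β reported in Section 4.1. A level and a slope are updated at each step, and h-step forecasts extrapolate the last level along the last slope. Initializing from the first two observations is an assumption; the name `des_forecast` is hypothetical.

```python
import numpy as np


def des_forecast(series, alpha, beta, horizon):
    """Holt's double exponential smoothing (assumed formulation).

    level_t = alpha * y_t + (1 - alpha) * (level_{t-1} + slope_{t-1})
    slope_t = beta * (level_t - level_{t-1}) + (1 - beta) * slope_{t-1}
    forecast_{t+h} = level_t + h * slope_t
    """
    y = np.asarray(series, dtype=float)
    if y.size < 2:
        raise ValueError("need at least 2 observations")
    level, slope = y[0], y[1] - y[0]  # simple initialization (assumption)
    for value in y[1:]:
        prev_level = level
        level = alpha * value + (1 - alpha) * (level + slope)
        slope = beta * (level - prev_level) + (1 - beta) * slope
    steps = np.arange(1, horizon + 1)
    return level + steps * slope
```

For a 24-month test period, a call such as `des_forecast(trend_train, alpha=0.99, beta=0.98, horizon=24)` would extrapolate the training trend, where `trend_train` is a hypothetical array holding the H–P trend of the training data. With α and β close to 1, the level and slope follow the most recent observations almost exactly.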

3.2. Periodic Displacement Prediction Methods

3.2.1. Overview of Basic Theory and Methods

Periodic displacement prediction, a key challenge in reservoir landslide displacement prediction, involves selecting input factors through feature selection methods and building an ML model for prediction. Based on previous TGRA research [54,55,56,57,58,59], this study identifies 12 candidate factors, comprising environmental and state factors (Table 1). Three feature selection methods and four ML algorithms are then used to develop and compare periodic displacement prediction models (Table 2).

3.2.2. Feature Selection Methods

(1): MIC

The maximal information coefficient (MIC) method quantifies the strength of association between two variables by dividing their scatterplot into grid sections. It is a versatile metric for evaluating linear or complex statistical relationships in datasets, with values ranging from 0 to 1. Higher scores indicate stronger variable correlations. For the MIC method, a threshold value of 0.3 is used for selecting input variables. This threshold is supported by previous studies, including [45,60], which demonstrated that an MIC value of 0.3 effectively captures significant correlations relevant to landslide displacement prediction. This value helps balance the inclusion of pertinent features while excluding less relevant ones.
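The exact MIC statistic optimizes grid boundaries with a dynamic-programming search that is too long to reproduce here. The sketch below is therefore a simplified approximation and should be read as such: it only tries equal-frequency grids with at most $B(n) = n^{0.6}$ cells and reports the largest mutual information normalized by $\log \min(n_x, n_y)$. It preserves the 0-to-1 scale and the 0.3 threshold logic, but its scores will generally differ from a reference MIC implementation. All function names are hypothetical.

```python
import numpy as np


def _equal_freq_bins(values, k):
    """Assign each sample to one of k equal-frequency bins by rank."""
    ranks = np.argsort(np.argsort(values))
    return (ranks * k) // len(values)


def _mutual_information(bx, by, kx, ky):
    """Mutual information (nats) of two discretized variables."""
    joint = np.zeros((kx, ky))
    np.add.at(joint, (bx, by), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))


def approx_mic(x, y, exponent=0.6):
    """Simplified MIC: best normalized MI over equal-frequency grids only."""
    n = len(x)
    budget = max(4, int(n ** exponent))  # maximum number of grid cells
    best = 0.0
    for kx in range(2, budget // 2 + 1):
        for ky in range(2, budget // kx + 1):
            mi = _mutual_information(_equal_freq_bins(x, kx), _equal_freq_bins(y, ky), kx, ky)
            best = max(best, mi / np.log(min(kx, ky)))
    return best


def select_by_mic(features, target, threshold=0.3):
    """Keep candidate factors whose score reaches the threshold."""
    kept = {}
    for name, column in features.items():
        score = approx_mic(column, target)
        if score >= threshold:
            kept[name] = score
    return kept
```

Because binning is rank-based, any strictly monotonic relationship scores 1.0, which mirrors MIC's sensitivity to nonlinear but functional associations.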

(2): GRA

Grey relational analysis (GRA) quantitatively evaluates relationships between time series using grey systems theory. It measures the grey relational grade (GRG) to describe the relative variations between one major factor and all other factors in a system. A high GRG indicates consistent relative variation between factors over time. In our study, a GRG threshold of 0.6 is employed to identify trigger factors with strong correlations to periodic displacement. This threshold is based on the findings of [61,62], which established that a GRG value exceeding 0.6 effectively identifies factors with significant and consistent relationships, making it suitable for feature selection in landslide displacement analysis.
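A minimal sketch of Deng's grey relational grade follows, under choices the text does not fix and that are therefore assumptions: min-max normalization of every series (chosen because periodic displacement has a near-zero mean, which rules out mean normalization), a distinguishing coefficient ρ = 0.5 (a common default), and Δmin and Δmax taken over all candidates and time steps. Candidates whose grade reaches 0.6 would be retained.

```python
import numpy as np


def _minmax(x):
    """Rescale each series (last axis) to [0, 1]; constant series map to zeros."""
    x = np.asarray(x, dtype=float)
    lo = x.min(axis=-1, keepdims=True)
    span = x.max(axis=-1, keepdims=True) - lo
    span = np.where(span == 0, 1.0, span)
    return (x - lo) / span


def grey_relational_grades(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate row with respect to the reference."""
    x0 = _minmax(reference)
    xs = _minmax(np.atleast_2d(candidates))
    delta = np.abs(xs - x0)  # absolute difference sequences
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0:  # every candidate matches the reference exactly
        return np.ones(xs.shape[0])
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)  # relational coefficients
    return coeff.mean(axis=1)  # grade = mean coefficient over time
```

After normalization, a rescaled and shifted copy of the reference is indistinguishable from it and receives a grade of exactly 1, whereas an inverted or flat series scores lower.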

(3): PACF

The partial autocorrelation function (PACF) is extensively employed in TGRA research to determine the optimal lag order of periodic displacement. It measures the correlation between a variable and its lagged value after removing the effects of intermediate lags, revealing the specific contribution of each lag and thus the direct influence of past observations on the current value. Lags with $|\mathrm{PACF}| \geq 1.96/\sqrt{n}$ are selected, where $n$ is the sample size [53].
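The lag-selection rule can be sketched as follows, assuming the PACF is computed from the biased sample autocorrelation with the Durbin–Levinson recursion (one standard estimator; others, such as regression-based estimates, give slightly different values). The function names are hypothetical.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Biased sample autocorrelation r_0..r_max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: x.size - k], x[k:]) / denom for k in range(max_lag + 1)])


def pacf(x, max_lag):
    """PACF via the Durbin-Levinson recursion on the sample ACF."""
    r = sample_acf(x, max_lag)
    out = np.zeros(max_lag + 1)
    out[0] = 1.0
    phi = np.zeros(max_lag + 1)  # phi[j] holds phi_{k,j} for the current order k
    for k in range(1, max_lag + 1):
        num = r[k] - np.dot(phi[1:k], r[k - 1:0:-1])
        den = 1.0 - np.dot(phi[1:k], r[1:k])
        phi_kk = num / den
        new_phi = phi.copy()
        new_phi[k] = phi_kk
        new_phi[1:k] = phi[1:k] - phi_kk * phi[k - 1:0:-1]
        phi = new_phi
        out[k] = phi_kk
    return out


def significant_lags(x, max_lag):
    """Lags whose |PACF| reaches the 1.96 / sqrt(n) bound."""
    bound = 1.96 / np.sqrt(len(x))
    values = pacf(x, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(values[k]) >= bound]
```

Applied to a periodic displacement series, `significant_lags` returns the lag orders of prior displacement to use as inputs. For a first-order autoregressive process, the lag-1 value recovers the autoregressive coefficient, while higher lags fall near zero.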

3.2.3. Machine Learning Algorithms

(1): BPNN

The backpropagation neural network (BPNN) is a feedforward network trained with the error backpropagation algorithm and is one of the most widely used ANN architectures for time series. It consists of input, hidden, and output layers: the input layer receives the data to be processed, one or more hidden layers perform intermediate computations, and the output layer sends the processed results out of the network. The common three-layer configuration uses a single hidden layer. With its flexible structure and strong predictive capabilities, the BPNN has become a predominant choice for modeling time series data.
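A minimal NumPy sketch of a three-layer BPNN (one tanh hidden layer, linear output) trained by full-batch gradient descent on the mean squared error follows. The layer size, learning rate, epoch count, and initialization are illustrative assumptions, not the settings used in this study.

```python
import numpy as np


def train_bpnn(x, y, n_hidden=10, lr=0.05, epochs=4000, seed=0):
    """Train a one-hidden-layer network (tanh hidden, linear output) by backpropagation."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w1 = rng.normal(0.0, 1.0, (d, n_hidden))
    b1 = rng.normal(0.0, 1.0, n_hidden)
    w2 = rng.normal(0.0, 0.1, (n_hidden, 1))
    b2 = np.zeros(1)
    target = y.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        # Forward pass: input -> hidden -> output.
        hidden = np.tanh(x @ w1 + b1)
        out = hidden @ w2 + b2
        err = out - target
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: propagate the MSE gradient layer by layer.
        g_out = 2.0 * err / n
        g_w2 = hidden.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_hidden = (g_out @ w2.T) * (1.0 - hidden ** 2)  # tanh derivative
        g_w1 = x.T @ g_hidden
        g_b1 = g_hidden.sum(axis=0)
        # Plain gradient-descent update.
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return (w1, b1, w2, b2), losses


def predict_bpnn(params, x):
    """Forward pass with trained parameters; returns a 1-D prediction array."""
    w1, b1, w2, b2 = params
    return (np.tanh(x @ w1 + b1) @ w2 + b2).ravel()
```

In a periodic-displacement model, each row of `x` would hold the selected lagged factors for one month (e.g., from MIC-PACF or GRA-PACF) and `y` the periodic displacement for that month.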

(2): SVM

The support vector machine (SVM) maps input data into a high-dimensional feature space to find an optimal hyperplane; in regression, this is a function that fits the data within an ε-insensitive tube while remaining as flat as possible. It shows robust generalization with few tuning parameters and offers various kernel options, such as sigmoid, polynomial, and radial basis function (RBF) kernels. The RBF-kernel SVM is commonly used for landslide displacement prediction, but its performance depends on suitable values of the penalty factor $C$, the parameter $\epsilon$ of the ε-insensitive loss function, and the RBF kernel parameter $\gamma$. To address this limitation, particle swarm optimization (PSO) is used to search for the optimal SVM parameters.
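The parameter search can be sketched with a generic global-best PSO. The text does not give PSO settings, so the inertia weight w = 0.7, acceleration coefficients c1 = c2 = 1.5, swarm size, and iteration count are common textbook choices assumed here. The objective is left generic; for SVM tuning it would return a validation error for one candidate vector of (C, ε, γ), typically searched on a log scale.

```python
import numpy as np


def pso_minimize(objective, lower, upper, n_particles=30, n_iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization within box bounds."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                  # personal best positions
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()          # swarm-wide best position
    gbest_val = float(pbest_val.min())
    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity blends inertia, attraction to personal best, and to global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        if pbest_val.min() < gbest_val:
            gbest_val = float(pbest_val.min())
            gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, gbest_val
```

In the SVM setting, `objective` would be a hypothetical wrapper that trains an RBF-kernel regressor with the candidate parameters and returns its validation RMSE. Each objective call is then a full model fit, so swarm size and iteration count dominate the tuning cost.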

(3): LSTM

The long short-term memory (LSTM) neural network excels in modeling temporal sequences and time dependencies. Unlike traditional RNNs, which struggle with long sequences, the LSTM network uses specialized memory blocks to effectively capture and retain information over longer periods. It comprises an input layer, one or more hidden layers, and an output layer, with the fundamental unit being a memory cell containing input, forget, and output gates to regulate information flow. This enables LSTM to capture the dynamic characteristics of landslide displacement by connecting observations from one timestamp to the next.
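To make the gating concrete, one forward step of a standard LSTM cell is sketched in NumPy below. The stacked gate layout [input, forget, candidate, output], the weight names `w`, `u`, `b`, and the helper `lstm_encode` are illustrative assumptions. Training (backpropagation through time) is omitted; in practice a deep learning framework implementation would be used.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, w, u, b):
    """One LSTM time step; gate blocks are stacked as [input, forget, candidate, output]."""
    n = h_prev.size
    z = w @ x + u @ h_prev + b
    i = sigmoid(z[:n])             # input gate: how much new content to write
    f = sigmoid(z[n:2 * n])        # forget gate: how much old memory to keep
    g = np.tanh(z[2 * n:3 * n])    # candidate cell content
    o = sigmoid(z[3 * n:])         # output gate: how much memory to expose
    c = f * c_prev + i * g         # updated memory cell
    h = o * np.tanh(c)             # hidden state passed to the next step
    return h, c


def lstm_encode(sequence, w, u, b, hidden_size):
    """Run the cell over a (time, features) window and return the last hidden state."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, w, u, b)
    return h
```

For displacement prediction, each window row would hold the raw multivariate inputs for one month, and a dense output layer (not shown) would map the final hidden state to the periodic displacement.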

(4): GRU

The gated recurrent unit (GRU) neural network has demonstrated outstanding performance in predicting landslide displacement in the TGRA by capturing the temporal dynamics within time series data. Compared with LSTM, the GRU has a simplified gating structure with only an update gate and a reset gate and no separate memory cell. The reset gate controls how much of the previous hidden state is used when computing the new candidate state, while the update gate determines how much of the previous hidden state is carried forward versus replaced by the candidate, thereby passing relevant information from previous steps and the current input to the output.
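A single GRU step can be sketched in the same style. The layout [update, reset, candidate] and the names are assumptions. The sketch uses the convention $h_t = (1 - z) \odot h_{t-1} + z \odot \tilde{h}_t$; some libraries swap the roles of $z$ and $1 - z$, which changes the parameterization but not the model family.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gru_step(x, h_prev, w, u, b):
    """One GRU step; gate blocks are stacked as [update, reset, candidate]."""
    n = h_prev.size
    wz, wr, wh = w[:n], w[n:2 * n], w[2 * n:]
    uz, ur, uh = u[:n], u[n:2 * n], u[2 * n:]
    bz, br, bh = b[:n], b[n:2 * n], b[2 * n:]
    z = sigmoid(wz @ x + uz @ h_prev + bz)               # update gate
    r = sigmoid(wr @ x + ur @ h_prev + br)               # reset gate
    h_tilde = np.tanh(wh @ x + uh @ (r * h_prev) + bh)   # candidate state
    # Blend previous state and candidate; z near 0 keeps the past unchanged.
    return (1.0 - z) * h_prev + z * h_tilde
```

Driving the update gate toward zero (for example, with a large negative bias) makes the cell copy its previous state almost exactly. This illustrates how a GRU can carry information across many months without a separate memory cell.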

3.3. Evaluation Metrics

We used RMSE, MAE, and R² as evaluation metrics to assess prediction accuracy. RMSE and MAE measure the difference between observed and predicted values, with lower scores indicating better agreement. R², the coefficient of determination, measures the proportion of variance in the observations explained by the predictions, with values closer to 1 indicating superior predictive performance. The metrics are defined as follows:

$$
\begin{aligned}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \\
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \\
R^2 &= 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
\end{aligned}
\tag{1}
$$

In these equations, $n$ is the number of samples, $y_i$ is the monitored value at time $i$, $\hat{y}_i$ is the predicted value at time $i$, and $\bar{y}$ is the mean of the monitored values.

To account for the randomness of ML model training in predicting periodic displacement, we conducted 100 independent runs for each algorithm. The mean value of each metric assesses prediction accuracy, while the coefficient of variation (CoV, i.e., the ratio of the standard deviation to the mean) evaluates model stability.
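Equation (1) and the CoV summary translate directly into NumPy. The text does not say whether the CoV uses the population or the sample standard deviation; the sketch assumes the population form (NumPy's default `ddof=0`). The function names are hypothetical.

```python
import numpy as np


def rmse(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def mae(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))


def r2(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
    return float(1.0 - ss_res / ss_tot)


def coef_of_variation(values):
    """Std / mean of a metric across repeated runs (population std assumed)."""
    values = np.asarray(values, dtype=float)
    return float(values.std() / values.mean())
```

To summarize 100 runs, one would compute a list such as `[rmse(y_test, p) for p in run_predictions]` and report its mean together with `coef_of_variation` of the same list. Here `y_test` and `run_predictions` are hypothetical names for the observations and the per-run forecasts.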

3.4. Procedures for Implementation

As illustrated in Figure 4, the implementation procedures can be summarized into five steps as follows:

Decompose the cumulative displacement into trend and periodic components using the H–P filter;
Predict the trend displacement using the DES method;
Select input factors for periodic displacement prediction via different methods;
Build different ML models and predict periodic displacement;
Obtain the cumulative displacement predictions, evaluate, and compare.

Figure 4. Flowchart of reservoir landslide displacement prediction models.

4. Results

4.1. Trend Term Predictions

Monitoring data from stations ZG88 (Shuping) and ZG118 (Baishuihe) are used to develop and validate the displacement prediction models. The DES technique is adopted to model the trend component, with optimized smoothing parameters $\alpha = 0.99$ and $\beta = 0.98$ (distinct from the displacement components in Section 3.1) chosen to obtain smooth predicted trends [63]. As shown in Figure 5 and Table 3, the predicted trend terms closely match the measurements. For the Baishuihe test set, the DES model achieved a root mean squared error (RMSE) of 0.7835 mm and a mean absolute error (MAE) of 0.6858 mm. For the Shuping landslide, the prediction errors were slightly higher, with an RMSE of 3.9573 mm and an MAE of 3.3963 mm. Overall, the DES method accurately models the deformation trends of both landslides.

4.2. Correlation Analysis

In this study, 12 candidate predictors capturing different time lags of reservoir level, rainfall, and prior landslide displacement are considered as input features (see Table 1 and Section 3.2.1). MIC and GRA are used to quantify the relative significance of the candidates derived from reservoir level and rainfall, while PACF analysis quantifies the relative significance of prior displacements. The final feature selection results for each landslide are listed in Table 4, and the detailed results of the MIC, GRA, and PACF analyses are shown in Figure 6, Figure 7, and Figure 8, respectively.

As shown in Figure 6, for the Shuping landslide all candidate variables from a1 to a9 pass the MIC threshold of 0.3. According to MIC, the Shuping landslide deformation is most sensitive to “average reservoir water level over two previous months” (a3) and least sensitive to “change in reservoir water level over previous month” (a5). For GRA, in contrast, candidate features a1 to a3 are excluded by the 0.6 threshold, and “change in reservoir water level during current month” (a4), “rainfall over previous month” (a7), and “rainfall over previous two months” (a8) are identified as the most influential factors for the Shuping landslide. Additionally, the PACF analysis shows strong correlations for prior displacements at lags of 1 and 2 months. For the Baishuihe landslide, “average reservoir water level during current month” (a1) and “rainfall over current month” (a6) are removed by the MIC threshold of 0.3. The most influential factors for the Baishuihe landslide are “change in reservoir water level during current month” (a4) and “rainfall over previous two months” (a8), which also have the highest grey relational grades. As for the Shuping landslide, candidate features a1 to a3 are excluded by the GRA criterion of 0.6. Moreover, the PACF analysis reveals significant correlations for previous displacements at lags of 1 and 2 months.

Overall, the results demonstrate that different feature selection methods and threshold values can produce different outcomes for the same landslide. Meanwhile, the same feature selection method may show similar patterns on different landslides, as evidenced by the results for the two TGRA landslides. However, even when comparable patterns are observed between the Shuping and Baishuihe cases using MIC (e.g., a1 and a5 consistently rank among the three least important features in both cases), the strict 0.3 threshold leads to different final selections (a1 and a5 are included for Shuping but excluded for Baishuihe). In summary, the specific feature selection method and cutoff criterion can substantially affect which features are retained during preprocessing for a given landslide.

4.3. Periodic Displacement Prediction

To address inherent randomness, each ML model is run 100 times independently under each preprocessing method. The periodic displacement predictions are visualized in Figure 9 and Figure 10 for the Shuping and Baishuihe landslides, respectively. The light-colored lines show the individual runs and together form a prediction band, the solid black line depicts the monitored displacement, and the dark-colored line represents the average prediction over the 100 runs, with the same color denoting the same feature selection approach.

From the perspective of ML algorithms, Figure 9 and Figure 10 show that the SVM model has the best stability (the narrowest band formed by the 100 prediction curves). Moreover, the dynamic LSTM and GRU models outperform the static models during the mutational displacement stage (the peaks of the curves). In contrast, the static BPNN model performs worse during the mutational period and exhibits greater variability between runs, as reflected by its wider prediction band. From the perspective of feature selection, compared with MIC-PACF or GRA-PACF, not performing feature selection (i.e., none) appears to substantially degrade the performance of both static models but has no evident impact on the dynamic models. These qualitative observations are quantified in the next section.

4.4. Cumulative Displacement Prediction

The cumulative displacement prediction results for the Shuping landslide and Baishuihe landslide are showcased in Table 5 and Table 6, respectively. The performance is assessed using the RMSE, MAE, and R² metrics. The mean values are utilized to represent the average performance for each experimental setting, while the CoV values are employed to evaluate performance stability.

For the Shuping landslide, whether an effective feature selection method is adopted significantly affects the performance of the static models. The BPNN model achieves its best accuracy and stability with GRA-PACF feature selection, whereas the SVM model attains its highest prediction accuracy with MIC-PACF. Notably, not performing feature selection yields the worst performance for both the BPNN and SVM models. For example, in terms of RMSE, the gap between no feature selection and the best-performing method reached 9.88% for BPNN (none: mean RMSE 50.3653; GRA-PACF: mean RMSE 46.0818) and 13.99% for SVM (none: mean RMSE 60.0487; MIC-PACF: mean RMSE 52.6767). This demonstrates the importance of appropriate feature selection for improving the performance of static ML approaches in landslide displacement prediction.

For the dynamic LSTM and GRU models, the conclusions differ from those for the static models. The best performance is achieved without feature selection, i.e., by directly using the multivariate time series as model inputs, and the best feature selection methods perform on par with or slightly worse than no feature selection. For example, in terms of MAE, the LSTM model with no feature selection yielded 28.8818, whereas the best-performing feature selection method, MIC-PACF, produced 30.9762, which is slightly worse but very close. Overall, the Shuping results reveal fundamental differences in how feature selection affects static versus dynamic ML models for landslide displacement prediction.

For the Baishuihe landslide, similar findings are observed regarding the differences between feature selection methods. As seen in Table 6, appropriate feature selection preprocessing significantly benefits the static models: the MIC-PACF method produces the lowest prediction error for the BPNN model, and SVM attains its best performance with GRA-PACF. For the LSTM model, the best performance is achieved with MIC-PACF, yielding the lowest RMSE of 12.4199, while no feature selection produces a slightly worse but comparable RMSE of 12.9922. This indicates that the LSTM model handles sequential data effectively even without feature selection. The GRU model achieves its best accuracy without any feature selection, obtaining the lowest RMSE of 10.7799 when using the raw input time series directly; with GRA-PACF feature selection, its RMSE rises slightly to 11.3621. The consistent conclusions from the two case studies show that feature engineering and feature selection can improve the performance of static ML models for landslide displacement prediction, while providing no evident benefit for recurrent neural networks, which handle temporal dynamics inherently.

To present these results more intuitively, Table 7 compares the best output obtained with feature selection (i.e., MIC-PACF or GRA-PACF) against the output obtained by directly using the multivariate time series (i.e., none). The table shows that suitable feature selection significantly enhances the prediction accuracy of both static models, with the maximum improvement exceeding 45% and the minimum approaching 10%, which is still substantial. In contrast, feature selection does not effectively enhance the prediction accuracy of the two dynamic models and in certain cases may even reduce it by approximately 5%.

Although these results suggest that appropriate feature selection methods can enhance the prediction accuracy of static models, the optimal feature selection method varies with the prediction algorithm and the specific landslide. For the same landslide, different prediction models favor different feature selection techniques. For example, in the Shuping case, the BPNN model obtained the highest accuracy with GRA-PACF, while SVM performed best with MIC-PACF. Conversely, the same static model may benefit from different feature selection methods on different landslides. For instance, the BPNN model achieved its highest prediction accuracy with GRA-PACF on the Shuping landslide but favored MIC-PACF on the Baishuihe landslide.

5. Discussion

The results underscore the importance of appropriate feature selection for enhancing the performance of static ML approaches to landslide displacement prediction. Meanwhile, the dynamic GRU and LSTM models achieved high accuracy without feature selection, leveraging their inherent strength in analyzing temporal dynamics within multivariate data. Feature engineering derives different time lags from the original multivariate time series, such as rainfall over previous months in addition to rainfall over the current month, and feature selection then identifies the lags most relevant to landslide displacement. Static models such as SVM and BPNN depend on this feature engineering to obtain time-lagged inputs because they are not inherently designed to handle raw time series, as shown in Figure 11. Identifying the most relevant lags with an appropriate feature selection method then mitigates the impact of redundant inputs on predictions. In contrast, the strong performance of recurrent networks without feature engineering stems from their ability to capture temporal dependencies and patterns in sequential data through an observation window, as illustrated in Figure 11. The recurrent nature of LSTM and GRU enables them to retain and use information from previous time steps, making them well suited to time series data and reducing the need for extensive feature engineering or manual selection. Therefore, an additional feature selection step may yield only marginal improvements for dynamic models, as evidenced by the results of both the Shuping and Baishuihe case studies in this work.

The findings show that while feature selection techniques can boost the accuracy of static models, the optimal feature selection approach depends on both the machine learning model and the landslide dataset. This divergence reinforces the view that a one-size-fits-all feature engineering solution is unlikely to maximize performance across different datasets and model architectures. On the one hand, machine learning models handle data through fundamentally different mechanisms, with architectures ranging from SVM to deep neural networks. On the other hand, landslide deformation is governed by both internal and external conditions, and individual landslides differ in geological structure, hydrogeological conditions, deformation mechanisms, and other aspects. Therefore, a feature set that yields accurate displacement predictions with one model may fail to expose the relevant patterns to another. Tailoring feature selection to each landslide on a case-by-case basis, by considering both its physical mechanisms and the strengths of the prediction model, may prove most effective for landslide displacement prediction.

Feature selection methods such as MIC and grey relational analysis, as employed in the current literature, are model-agnostic: they filter features without accounting for the biases or strengths of the downstream machine learning algorithm. First, optimizing preprocessing for a specific model architecture, i.e., favoring model-specific over model-agnostic techniques, could further improve accuracy. For instance, tailoring variable selection and transformations so that static models receive the most informative time lags of displacement history may improve their responsiveness during the mutational stage. Second, integrated pipelines that couple feature weighting with model building could be beneficial, for example recursive feature elimination, a backward procedure that removes the least informative variables, retrains and tests the model, and repeats until a preferred subset is found. More generally, embedding feature selection in adaptive frameworks and automated pipelines driven by model performance could customize preprocessing efficiently. Finally, more interpretable machine learning models could provide insight into the relationships and patterns driving predictions. Opening the black box of machine learning methods would guide more informed and trustworthy feature engineering for landslide displacement prediction.
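Recursive feature elimination can be sketched in a few lines. This minimal version swaps in an ordinary least-squares model so that feature strength can be read directly from standardised coefficients; with BPNN, a non-linear-kernel SVM, LSTM or GRU, a model-specific importance measure (for example permutation importance) would be needed instead. scikit-learn also ships ready-made `RFE` and `RFECV` classes for estimators that expose `coef_` or `feature_importances_`. The data, feature labels and function names below are synthetic or hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def recursive_feature_elimination(X_tr, y_tr, X_va, y_va, names):
    """Fit, score on a validation set, drop the weakest feature, repeat.

    Feature strength is |coefficient| on standardised inputs, as in classic RFE
    with a linear estimator. Returns [(remaining feature names, validation RMSE)].
    """
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
    sd[sd == 0] = 1.0                          # guard against constant columns
    Z_tr, Z_va = (X_tr - mu) / sd, (X_va - mu) / sd
    active = list(range(X_tr.shape[1]))
    history = []
    while active:
        b0, coef = fit_linear(Z_tr[:, active], y_tr)
        history.append(([names[j] for j in active],
                        rmse(y_va, b0 + Z_va[:, active] @ coef)))
        active.pop(int(np.argmin(np.abs(coef))))   # eliminate the weakest feature
    return history

# Synthetic example: only the 1st and 4th of five inputs drive the target.
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=120)
names = ["a1", "a4", "a6", "a7", "a10"]        # hypothetical labels for illustration
history = recursive_feature_elimination(X[:80], y[:80], X[80:], y[80:], names)
for kept, score in history:
    print(len(kept), kept, round(score, 3))
best_kept, best_score = min(history, key=lambda h: h[1])
```

Scoring every intermediate subset on held-out data, rather than fixing the number of features in advance, is what ties the selection to model performance, which is the property the model-agnostic MIC and GRA filters lack.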

6. Conclusions

This study presented a comprehensive comparison between different feature selection techniques (MIC-PACF, GRA-PACF, none) and machine learning algorithms (BPNN, SVM, LSTM, GRU) for landslide displacement prediction using monitoring data from the Baishuihe and Shuping landslides. The comparative assessment across two landslide cases and the ensemble of experimental runs rigorously evaluated model robustness, accuracy, and stability.

The key findings on two TGRA landslides demonstrate that while feature selection consistently improves static model performance, dynamic networks exhibit inherent capabilities in exploiting temporal patterns within raw multivariate time series data that diminish the necessity for extensive feature engineering. This finding indicates that tailoring preprocessing pipelines based on a model’s strengths and weaknesses for the landslide time series data may further optimize accuracy.

Additionally, the optimal feature selection approach varies based on the architecture of ML models and the characteristics of different landslide cases. This highlights the need for case-by-case assessments to determine the best-performing input features and model architectures for different landslides. Overall, the findings guide the development of accurate and reliable landslide displacement prediction systems based on integrating appropriate feature selection techniques with machine learning algorithms.

Further research could focus on model-specific feature selection methods, integrated feature engineering pipelines, the optimal division of steps for conveying the methodology effectively, and interpretable machine learning to customize informative variables for accurate landslide displacement prediction tasks.

Author Contributions

Conceptualization, Q.G. and J.L.; Methodology, Q.G., X.W. and Y.D.; Software, C.L., X.W. and Y.D.; Validation, J.W., C.L., X.W. and Y.D.; Formal analysis, Q.G.; Investigation, J.W. and X.W.; Resources, J.W. and X.W.; Data curation, J.W. and Y.D.; Writing—original draft, Q.G.; Writing—review & editing, C.L. and J.L.; Visualization, C.L. and Y.D.; Supervision, J.L.; Project administration, J.L.; Funding acquisition, Q.G. and J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Natural Science Foundation of Jiangsu Province (Grant No. BK20220421), and the National Natural Science Foundation of China (Grant No. 82302352).

Data Availability Statement

The dataset can be obtained from the website of the National Cryosphere Desert Data Center by request (http://www.ncdc.ac.cn, 14 July 2024).

Acknowledgments

We thank the National Field Observation and Research Station of Landslides in the Three Gorges Reservoir Area of the Yangtze River for collecting and providing the data. We thank the anonymous reviewers who helped to improve the paper.

Conflicts of Interest

Authors Jingyong Wang and Xiaohong Wang were employed by the company Powerchina Huadong Engineering Corporation. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Tang, H.; Wasowski, J.; Juang, C.H. Geohazards in the Three Gorges Reservoir Area, China–Lessons learned from decades of research. Eng. Geol. 2019, 261, 105267.
Criss, R.E.; Yao, W.; Li, C.; Tang, H. A predictive, two-parameter model for the movement of reservoir landslides. J. Earth Sci. 2020, 31, 1051–1057.
Juang, C.H. BFTS-Engineering geologists’ field station to study reservoir landslides. Eng. Geol. 2021, 284, 106038.
Li, C.; Criss, R.E.; Fu, Z.; Long, J.; Tan, Q. Evolution characteristics and displacement forecasting model of landslides with stair-step sliding surface along the Xiangxi River, Three Gorges Reservoir region, China. Eng. Geol. 2021, 283, 105961.
Zou, Z.; Luo, T.; Tan, Q.; Yan, J.; Luo, Y.; Hu, X. Dynamic determination of landslide stability and thrust force considering slip zone evolution. Nat. Hazards 2023, 118, 31–53.
Long, J.; Li, C.; Liu, Y.; Feng, P.; Zuo, Q. A multi-feature fusion transfer learning method for displacement prediction of rainfall reservoir-induced landslide with step-like deformation characteristics. Eng. Geol. 2022, 297, 106494.
Wen, H.; Xiao, J.; Xiang, X.; Wang, X.; Zhang, W. Singular spectrum analysis-based hybrid PSO-GSA-SVR model for predicting displacement of step-like landslides: A case of Jiuxianping landslide. Acta Geotech. 2023, 19, 1835–1852.
Miao, F.; Xie, X.; Wu, Y.; Zhao, F. Data Mining and deep learning for predicting the displacement of “Step-like” landslides. Sensors 2022, 22, 481.
Liao, K.; Wu, Y.; Miao, F. System reliability analysis of landslides subjected to fluctuation of reservoir water level: A case study in the Three Gorges Reservoir area, China. Bull. Eng. Geol. Environ. 2022, 81, 225.
Zhu, H.; Wu, B.; Cao, D.; Zhang, C.; Shi, B. Monitoring of soil moisture and temperature distributions in seasonally frozen ground with fiber optic sensors. IOP Conf. Ser. Earth Environ. Sci. 2021, 861, 042042.
Gong, W.; Tian, S.; Wang, L.; Li, Z.; Tang, H.; Li, T.; Zhang, L. Interval prediction of landslide displacement with dual-output least squares support vector machine and particle swarm optimization algorithms. Acta Geotech. 2022, 17, 4013–4031.
Bovenga, F.; Argentiero, I.; Refice, A.; Nutricato, R.; Nitti, D.O.; Pasquariello, G.; Spilotro, G. Assessing the Potential of Long, Multi-Temporal SAR Interferometry Time Series for Slope Instability Monitoring: Two Case Studies in Southern Italy. Remote Sens. 2022, 14, 1677.
Casagli, N.; Intrieri, E.; Tofani, V.; Gigli, G.; Raspini, F. Landslide detection, monitoring and prediction with remote-sensing techniques. Nat. Rev. Earth Environ. 2023, 4, 51–64.
Kyriou, A.; Nikolakopoulos, K.G.; Koukouvelas, I.K. Timely and low-cost remote sensing practices for the assessment of landslide activity in the service of hazard management. Remote Sens. 2022, 14, 4745.
Kennedy, R.; Take, W.A.; Siemens, G. Geotechnical centrifuge modelling of retrogressive sensitive clay landslides. Can. Geotech. J. 2021, 58, 1452–1465.
Gupta, K.; Satyam, N.; Gupta, V. Probabilistic physical modelling and prediction of regional seismic landslide hazard in Uttarakhand state (India). Landslides 2023, 20, 901–912.
Du, J.; Shi, X.; Chai, B.; Glade, T.; Luo, Z.; Zheng, L.; Liu, B. Force and energy equilibrium-based analytical method for progressive failure analysis of translational rockslides: Formulation and comparative study. Landslides 2023, 20, 475–488.
Paswan, A.P.; Shrivastava, A.K. Evaluation of a tilt-based monitoring system for rainfall-induced landslides: Development and physical modelling. Water 2023, 15, 1862.
Sun, H.Y.; Ge, Q.; Yu, Y.; Shuai, F.X.; Lü, C.C. A new self-starting drainage method for slope stabilization and its application. Bull. Eng. Geol. Environ. 2021, 80, 251–265.
Phoon, K.K.; Cao, Z.J.; Ji, J.; Leung, Y.F.; Najjar, S.; Shuku, T.; Tang, C.; Yin, Z.Y.; Ikumasa, Y.; Ching, J. Geotechnical uncertainty, modeling, and decision making. Soils Found. 2022, 62, 101189.
Jiang, S.; Zhu, G.; Wang, Z.Z.; Huang, Z.; Huang, J. Data augmentation for CNN-based probabilistic slope stability analysis in spatially variable soils. Comput. Geotech. 2023, 160, 105501.
Zhang, J.; Yao, H.Z.; Wang, Z.P.; Xue, Y.D.; Zhang, L.L. On prediction of slope failure time with the inverse velocity method. Georisk Assess. Manag. Risk Eng. Syst. Geohazards 2023, 17, 114–126.
Cao, Z.J.; Peng, X.; Li, D.Q.; Tang, X.S. Full probabilistic geotechnical design under various design scenarios using direct Monte Carlo simulation and sample reweighting. Eng. Geol. 2019, 248, 207–219.
Cascini, L.; Scoppettuolo, M.R.; Babilio, E. Forecasting the landslide evolution: From theory to practice. Landslides 2022, 19, 2839–2851.
Ge, Q.; Sun, H.; Liu, Z.; Yang, B.; Lacasse, S.; Nadim, F. A novel approach for displacement interval forecasting of landslides with step-like displacement pattern. Georisk Assess. Manag. Risk Eng. Syst. Geohazards 2022, 16, 489–503.
Tehrani, F.S.; Calvello, M.; Liu, Z.; Zhang, L.; Lacasse, S. Machine learning and landslide studies: Recent advances and applications. Nat. Hazards 2022, 114, 1197–1245.
Nava, L.; Carraro, E.; Reyes-Carmona, C.; Puliero, S.; Bhuyan, K.; Rosi, A.; Monserrat, O.; Floris, M.; Meena, S.R.; Galve, J.P.; et al. Landslide displacement forecasting using deep learning and monitoring data across selected sites. Landslides 2023, 20, 2111–2129.
Zeng, T.; Glade, T.; Xie, Y.; Yin, K.; Peduto, D. Deep learning powered long-term warning systems for reservoir landslides. Int. J. Disaster Risk Reduct. 2023, 94, 103820.
Zhang, L.; Shi, B.; Zhu, H.; Yu, X.B.; Han, H.; Fan, X. PSO-SVM-based deep displacement prediction of Majiagou landslide considering the deformation hysteresis effect. Landslides 2021, 18, 179–193.
Ma, J.; Xia, D.; Guo, H.; Wang, Y.; Niu, X.; Liu, Z.; Jiang, S. Metaheuristic-based support vector regression for landslide displacement prediction: A comparative study. Landslides 2022, 19, 2489–2511.
Li, L.; Wu, Y.; Huang, Y.; Li, B.; Miao, F.; Ziqiang, D. Adaptive hybrid machine learning model for forecasting the step-like displacement of reservoir colluvial landslides: A case study in the Three Gorges Reservoir area, China. Stoch. Environ. Res. Risk Assess. 2023, 37, 903–923.
Jia, W.; Wen, T.; Li, D.; Guo, W.; Quan, Z.; Wang, Y.; Huang, D.; Hu, M. Landslide displacement prediction of Shuping landslide combining PSO and LSSVM model. Water 2023, 15, 612.
Yang, B.; Yin, K.; Lacasse, S.; Liu, Z. Time series analysis and long short-term memory neural network to predict landslide displacement. Landslides 2019, 16, 677–694.
Wang, L.; Xiao, T.; Liu, S.; Zhang, W.; Yang, B.; Chen, L. Quantification of model uncertainty and variability for landslide displacement prediction based on Monte Carlo simulation. Gondwana Res. 2023, 123, 27–40.
Xing, Y.; Yue, J.; Chen, C.; Qin, Y.; Hu, J. A hybrid prediction model of landslide displacement with risk-averse adaptation. Comput. Geosci. 2020, 141, 104527.
Huang, F.; Xiong, H.; Chen, S.; Lv, Z.; Huang, J.; Chang, Z.; Catani, F. Slope stability prediction based on a long short-term memory neural network: Comparisons with convolutional neural networks, support vector machines and random forest models. Int. J. Coal Sci. Technol. 2023, 10, 18.
Lin, Q.; Yang, Z.; Huang, J.; Deng, J.; Chen, L.; Zhang, Y. A Landslide Displacement Prediction Model Based on the ICEEMDAN Method and the TCN–BiLSTM Combined Neural Network. Water 2023, 15, 4247.
Zeng, T.; Wu, L.; Hayakawa, Y.S.; Yin, K.; Gui, L.; Jin, B.; Guo, Z.; Peduto, D. Advanced integration of ensemble learning and MT-InSAR for enhanced slow-moving landslide susceptibility zoning. Eng. Geol. 2024, 331, 107436.
Jiang, Y.; Luo, H.; Xu, Q.; Lu, Z.; Liao, L.; Li, H.; Hao, L. A graph convolutional incorporating GRU network for landslide displacement forecasting based on spatiotemporal analysis of GNSS observations. Remote Sens. 2022, 14, 1016.
Ge, Q.; Sun, H.; Liu, Z.; Wang, X. A data-driven intelligent model for landslide displacement prediction. Geol. J. 2023, 58, 2211–2230.
Zhang, W.; Li, H.; Tang, L.; Gu, X.; Wang, L.; Wang, L. Displacement prediction of Jiuxianping landslide using gated recurrent unit (GRU) networks. Acta Geotech. 2022, 17, 1367–1382.
Wang, H.; Long, G.; Shao, P.; Lv, Y.; Gan, F.; Liao, J. A DES-BDNN based probabilistic forecasting approach for step-like landslide displacement. J. Clean. Prod. 2023, 394, 136281.
Pei, H.; Meng, F.; Zhu, H. Landslide displacement prediction based on a novel hybrid model and convolutional neural network considering time-varying factors. Bull. Eng. Geol. Environ. 2021, 80, 7403–7422.
Shihabudheen, K.; Pillai, G.N.; Peethambaran, B. Prediction of landslide displacement with controlling factors using extreme learning adaptive neuro-fuzzy inference system (ELANFIS). Appl. Soft Comput. 2017, 61, 892–904.
Wang, Y.; Tang, H.; Wen, T.; Ma, J. A hybrid intelligent approach for constructing landslide displacement prediction intervals. Appl. Soft Comput. 2019, 81, 105506.
Guo, Z.; Chen, L.; Gui, L.; Du, J.; Yin, K.; Do, H.M. Landslide displacement prediction based on variational mode decomposition and WA-GWO-BP model. Landslides 2020, 17, 567–583.
Li, D.Y.; Sun, Y.Q.; Yin, K.L.; Miao, F.S.; Glade, T.; Leo, C. Displacement characteristics and prediction of Baishuihe landslide in the Three Gorges Reservoir. J. Mt. Sci. 2019, 16, 2203–2214.
Seguí, C.; Rattez, H.; Veveakis, M. On the stability of deep-seated landslides. The cases of Vaiont (Italy) and Shuping (Three Gorges Dam, China). J. Geophys. Res. Earth Surf. 2020, 125, e2019JF005203.
Miao, F.; Wu, Y.; Li, L.; Liao, K.; Xue, Y. Triggering factors and threshold analysis of Baishuihe landslide based on the data mining methods. Nat. Hazards 2021, 105, 2677–2696.
Du, J.; Yin, K.; Lacasse, S. Displacement prediction in colluvial landslides, Three Gorges Reservoir, China. Landslides 2013, 10, 203–218.
Cao, Y.; Yin, K.; Alexander, D.E.; Zhou, C. Using an extreme learning machine to predict the displacement of step-like landslides in relation to controlling factors. Landslides 2016, 13, 725–736.
Zhu, X.; Xu, Q.; Tang, M.; Nie, W.; Ma, S.; Xu, Z. Comparison of two optimized machine learning models for predicting displacement of rainfall-induced landslide: A case study in Sichuan Province, China. Eng. Geol. 2017, 218, 213–222.
Wang, Y.; Tang, H.; Huang, J.; Wen, T.; Ma, J.; Zhang, J. A comparative study of different machine learning methods for reservoir landslide displacement prediction. Eng. Geol. 2022, 298, 106544.
Liu, Z.; Guo, D.; Lacasse, S.; Li, J.H.; Yang, B.; Choi, J.C. Algorithms for intelligent prediction of landslide displacements. J. Zhejiang Univ. Sci. A 2020, 21, 412–429.
Wen, T.; Tang, H.; Wang, Y.; Lin, C.; Xiong, C. Landslide displacement prediction using the GA-LSSVM model and time series analysis: A case study of Three Gorges Reservoir, China. Nat. Hazards Earth Syst. Sci. 2017, 17, 2181–2198.
Zhou, C.; Yin, K.; Cao, Y.; Ahmed, B.; Fu, X. A novel method for landslide displacement prediction by integrating advanced computational intelligence algorithms. Sci. Rep. 2018, 8, 7287.
Yao, W.; Li, C.; Guo, Y.; Criss, R.E.; Zuo, Q.; Zhan, H. Short-term deformation characteristics, displacement prediction, and kinematic mechanism of Baijiabao landslide based on updated monitoring data. Bull. Eng. Geol. Environ. 2022, 81, 393.
Zeng, T.; Jin, B.; Glade, T.; Xie, Y.; Li, Y.; Zhu, Y.; Yin, K. Assessing the imperative of conditioning factor grading in machine learning-based landslide susceptibility modeling: A critical inquiry. Catena 2024, 236, 107732.
Zeng, T.; Gong, Q.; Wu, L.; Zhu, Y.; Yin, K.; Peduto, D. Double-index rainfall warning and probabilistic physically based model for fast-moving landslide hazard analysis in subtropical-typhoon area. Landslides 2024, 21, 753–773.
Huang, D.; He, J.; Song, Y.; Guo, Z.; Huang, X.; Guo, Y. Displacement Prediction of the Muyubao Landslide Based on a GPS Time-Series Analysis and Temporal Convolutional Network Model. Remote Sens. 2022, 14, 2656.
Li, L.M.; Wang, C.Y.; Wen, Z.Z.; Gao, J.; Xia, M.F. Landslide displacement prediction based on the ICEEMDAN, ApEn and the CNN-LSTM models. J. Mt. Sci. 2023, 20, 1220–1231.
Zhang, M.; Han, Y.; Yang, P.; Wang, C. Landslide displacement prediction based on optimized empirical mode decomposition and deep bidirectional long short-term memory network. J. Mt. Sci. 2023, 20, 637–656.
Jiang, Y.; Xu, Q.; Lu, Z.; Luo, H.; Liao, L.; Dong, X. Modelling and predicting landslide displacements and uncertainties by multiple machine-learning algorithms: Application to Baishuihe landslide in Three Gorges Reservoir, China. Geomat. Nat. Hazards Risk 2021, 12, 741–762.

Figure 1. The Yangtze River, the TGRA, and two landslides studied.

Figure 2. The recorded rainfall, reservoir water level, and displacement at ZG88 station for the Shuping landslide.

Figure 3. The recorded rainfall, reservoir water level, and displacement at ZG118 station for the Baishuihe landslide.

Figure 5. Trend displacement prediction results. (a) Baishuihe landslide; (b) Shuping landslide.

Figure 6. Feature selection results for features a1 to a9 based on the MIC method. (a) Shuping landslide; (b) Baishuihe landslide.

Figure 7. Feature selection results for features a1 to a9 based on the GRA method. (a) Shuping landslide; (b) Baishuihe landslide.

Figure 8. Feature selection results for features a10 to a12 based on the PACF method. (a) Shuping landslide; (b) Baishuihe landslide.

Figure 9. Predicted periodic displacement of the Shuping landslide.

Figure 10. Predicted periodic displacement of the Baishuihe landslide.

Figure 11. Comparison between static and dynamic ML models for landslide displacement prediction.

Table 1. Candidate factors for predicting landslide periodical displacement in the TGRA.

Type	Factor	Description
Reservoir water level	a1	Average reservoir water level during the current month.
	a2	Average reservoir water level over the previous month.
	a3	Average reservoir water level over two previous months.
	a4	Change in reservoir water level during the current month.
	a5	Change in reservoir water level over the previous month.
Rainfall	a6	Rainfall over the current month.
	a7	Rainfall over the previous month.
	a8	Rainfall over the previous two months.
	a9	Cumulative rainfall for the current month and the previous month.
Evolution state	a10	Periodic displacement over the previous month.
	a11	Periodic displacement over two previous months.
	a12	Periodic displacement over three previous months.
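The definitions in Table 1 translate into simple index arithmetic on monthly series. The sketch below is one possible reading, not the authors' code: it treats the "two/three previous months" factors (a3, a11, a12) as single-month lags of two or three months and a8 as the sum of rainfall over months t−1 and t−2; the function name and input arrays are hypothetical toy data.

```python
import numpy as np

def candidate_factors(level_mean, level_change, rain, periodic):
    """Candidate factors a1..a12 of Table 1 for months t = 3 .. T-1.

    All inputs are equal-length monthly series (index 0 = first month).
    Assumptions: a3, a11, a12 are read as single-month lags of 2 or 3 months,
    and a8 as the rainfall sum over months t-1 and t-2.
    """
    h, dh, r, p = (np.asarray(v, dtype=float)
                   for v in (level_mean, level_change, rain, periodic))
    t = np.arange(3, len(h))   # first month with three months of displacement history
    return {
        # Reservoir water level
        "a1": h[t], "a2": h[t - 1], "a3": h[t - 2],
        "a4": dh[t], "a5": dh[t - 1],
        # Rainfall
        "a6": r[t], "a7": r[t - 1], "a8": r[t - 1] + r[t - 2], "a9": r[t] + r[t - 1],
        # Evolution state (lagged periodic displacement)
        "a10": p[t - 1], "a11": p[t - 2], "a12": p[t - 3],
    }

months = 6
f = candidate_factors(
    level_mean=np.linspace(145.0, 175.0, months),   # toy reservoir levels
    level_change=np.zeros(months),                  # toy monthly level changes
    rain=np.arange(months) * 10.0,                  # toy monthly rainfall
    periodic=np.arange(months, dtype=float),        # toy periodic displacement
)
print(f["a8"])  # [30. 50. 70.]
```

Because every factor is aligned to the same month index t, the resulting columns can be screened by MIC, GRA or PACF and then stacked into the input matrix for a static model.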

Table 2. Feature selection methods and ML algorithms compared in the study.

a Feature selection methods
	Description
MIC-PACF	MIC for selecting trigger factors (a1–a9) and PACF for selecting state factors (a10–a12).
GRA-PACF	GRA for selecting trigger factors (a1–a9) and PACF for selecting state factors (a10–a12).
None	Raw time series data are used directly in their original form for machine learning models, without any feature engineering or selection techniques.
b Machine learning algorithms
	Description
BPNN	A static algorithm known for its efficiency in handling non-linear relationships.
SVM	A static algorithm effective for classification and regression tasks, utilizing kernel functions to handle nonlinearity.
LSTM	A dynamic algorithm designed to capture long-term dependencies and temporal patterns in sequential data.
GRU	A dynamic algorithm that, like LSTM, captures temporal dependencies but with a simplified architecture for faster training.
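As an illustration of the GRA step in Table 2, the sketch below computes the commonly used grey relational grade of each candidate trigger factor with respect to a reference series such as periodic displacement. The min-max normalisation, the distinguishing coefficient ρ = 0.5 and the function names are assumptions; the authors' exact settings are those given in their methodology section. Note that GRA scores how closely normalised curves track each other, so an inverse relationship receives a low grade.

```python
import numpy as np

def _minmax(x):
    """Scale each series (last axis) to [0, 1]; constant series map to 0."""
    x = np.asarray(x, dtype=float)
    lo = x.min(axis=-1, keepdims=True)
    span = x.max(axis=-1, keepdims=True) - lo
    span[span == 0] = 1.0
    return (x - lo) / span

def grey_relational_grades(reference, factors, rho=0.5):
    """Grey relational grade of each candidate factor w.r.t. a reference series.

    reference: shape (n,), e.g. the periodic displacement series.
    factors:   shape (m, n), one candidate trigger factor (a1..a9) per row.
    rho:       distinguishing coefficient in (0, 1]; 0.5 is the usual choice.
    """
    x0 = _minmax(reference)                    # (n,)
    xi = _minmax(np.atleast_2d(factors))       # (m, n)
    delta = np.abs(xi - x0)                    # absolute difference sequences
    d_min, d_max = delta.min(), delta.max()    # global minimum and maximum difference
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)  # relational coefficients
    return coeff.mean(axis=1)                  # grade = mean coefficient over time

t = np.linspace(0.0, 6.0, 50)
displacement = np.sin(t)                       # toy reference series
candidates = np.vstack([2.0 * np.sin(t) + 1.0,  # same shape, different scale
                        -np.sin(t)])            # mirror image
grades = grey_relational_grades(displacement, candidates)
print(grades)  # first grade is about 1, second is clearly lower
```

Factors would then be ranked by grade and those above a chosen threshold retained; the threshold itself is a modelling decision not fixed by this sketch.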

Table 3. Evaluation of trend displacement prediction results.

	RMSE	MAE	R²
Shuping	3.9573	3.3963	0.9999
Baishuihe	0.7835	0.6858	0.9999
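The three metrics used in Tables 3, 5 and 6 can be computed as below, assuming their standard definitions (the paper's Section 3.3 gives its own formulas). With a smooth trend term the total variance is large relative to the residuals, which is consistent with R² reaching 0.9999 even though the trend RMSE is not negligible (3.9573 for Shuping). The function name and sample values are hypothetical.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """RMSE, MAE and coefficient of determination R^2 for one prediction series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)                      # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return {
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),
        "MAE": float(np.mean(np.abs(resid))),
        "R2": float(1.0 - ss_res / ss_tot),
    }

m = evaluate([10, 20, 30, 40], [12, 18, 33, 39])
print(m)  # RMSE ≈ 2.121, MAE = 2.0, R2 ≈ 0.964
```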

Table 4. Input feature selection results for periodic displacement prediction.

	MIC-PACF	GRA-PACF	None
Shuping	a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11	a4, a5, a6, a7, a8, a9, a10, a11	a1, a6, a10
Baishuihe	a2, a3, a4, a5, a7, a8, a9, a10, a11	a4, a5, a6, a7, a8, a9, a10, a11	a1, a6, a10
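For the state factors, the PACF screen keeps the displacement lags whose partial autocorrelation is significant; in Table 4 this retained a10 and a11 (lags 1 and 2) for both landslides. The minimal sketch below assumes the usual approximate 95% band ±1.96/√n as the significance threshold and computes the PACF with the Durbin–Levinson recursion; statsmodels' `pacf` function would be an off-the-shelf alternative. The function names and the simulated series are hypothetical.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_0 .. rho_max_lag (rho_0 = 1)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])

def pacf_from_acf(rho):
    """Partial autocorrelations phi_kk, k = 1 .. len(rho)-1, via Durbin-Levinson."""
    phi_prev = np.zeros(0)                 # AR coefficients of order k-1
    pacf = []
    for k in range(1, len(rho)):
        num = rho[k] - np.dot(phi_prev, rho[k - 1 : 0 : -1])
        den = 1.0 - np.dot(phi_prev, rho[1:k])
        phi_kk = num / den
        # Update the order-k coefficients: phi_kj = phi_(k-1)j - phi_kk * phi_(k-1)(k-j).
        phi_prev = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf.append(phi_kk)
    return np.array(pacf)

def select_lags(x, max_lag=3, z=1.96):
    """Return (lags with |PACF| above z / sqrt(n), PACF values for lags 1..max_lag)."""
    pacf = pacf_from_acf(sample_acf(x, max_lag))
    bound = z / np.sqrt(len(x))
    return [k for k, value in enumerate(pacf, start=1) if abs(value) > bound], pacf

# Exact autocorrelations of an AR(1) process with coefficient 0.8:
# only the lag-1 partial autocorrelation is non-zero.
print(pacf_from_acf(0.8 ** np.arange(4)))

# Simulated AR(1) series standing in for a periodic-displacement record.
rng = np.random.default_rng(1)
x = np.zeros(500)
for t in range(1, len(x)):
    x[t] = 0.8 * x[t - 1] + rng.normal()
lags, pacf = select_lags(x, max_lag=3)
print(lags, np.round(pacf, 3))  # lags 1, 2, 3 would map to a10, a11, a12
```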

Table 5. Evaluation of cumulative displacement prediction results for Shuping landslide.

		RMSE		MAE		R²
		Mean	CoV	Mean	CoV	Mean	CoV
BPNN	MIC-PACF	49.9049	0.1047	36.0493	0.1199	0.9878	0.0027
	GRA-PACF	46.0818	0.0838	32.5091	0.0845	0.9896	0.0018
	None	50.6353	0.0892	35.8194	0.1027	0.9874	0.0023
SVM	MIC-PACF	52.6767	0.0195	34.6874	0.0146	0.9865	0.0005
	GRA-PACF	52.8809	0.0162	35.1440	0.0061	0.9864	0.0005
	None	60.0487	0.0249	35.6171	0.0222	0.9824	0.0009
LSTM	MIC-PACF	40.3836	0.1536	30.9762	0.1411	0.9919	0.0027
	GRA-PACF	51.3091	0.1214	39.5537	0.1201	0.9870	0.0032
	None	38.5538	0.0965	28.8818	0.0956	0.9927	0.0015
GRU	MIC-PACF	43.3963	0.0980	32.3404	0.1091	0.9908	0.0019
	GRA-PACF	55.5425	0.0842	43.1360	0.0928	0.9849	0.0026
	None	41.0639	0.0972	29.8963	0.0938	0.9917	0.0018

Table 6. Evaluation of cumulative displacement prediction results for Baishuihe landslide.

		RMSE		MAE		R²
		Mean	CoV	Mean	CoV	Mean	CoV
BPNN	MIC-PACF	13.3155	0.2743	10.9680	0.3020	0.9767	0.0147
	GRA-PACF	14.1094	0.2413	11.5834	0.2984	0.9743	0.0139
	None	15.8745	0.2747	12.7543	0.2903	0.9669	0.0209
SVM	MIC-PACF	14.4454	0.0441	11.5660	0.0342	0.9744	0.0022
	GRA-PACF	13.5008	0.0465	11.0873	0.0377	0.9777	0.0021
	None	19.6921	0.0129	14.1678	0.0268	0.9526	0.0013
LSTM	MIC-PACF	12.4199	0.1806	10.0549	0.1920	0.9805	0.0074
	GRA-PACF	13.9912	0.2188	11.2274	0.2473	0.9749	0.0142
	None	12.9922	0.1735	10.4681	0.1895	0.9787	0.0079
GRU	MIC-PACF	15.2416	0.1634	11.9183	0.1768	0.9708	0.0104
	GRA-PACF	13.9847	0.1729	11.3621	0.1839	0.9754	0.0094
	None	13.4234	0.1972	10.7799	0.2172	0.9771	0.0104

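Table 7. Relative improvement in RMSE with feature selection (the better-performing of MIC-PACF and GRA-PACF) vs. without feature selection (raw monitoring time series), computed as (RMSE_None − RMSE_FS)/RMSE_FS from the mean RMSE values in Tables 5 and 6; negative values indicate that feature selection increased the error.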

	Shuping	Baishuihe
BPNN	9.88%	19.22%
SVM	13.99%	45.86%
LSTM	−4.53%	4.61%
GRU	−5.37%	−4.01%
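As a consistency check, the Table 7 percentages can be recomputed from the mean RMSE columns of Tables 5 and 6. The sketch below is not part of the original study; it assumes the improvement is normalised by the feature-selected RMSE, (RMSE_None − RMSE_FS)/RMSE_FS, because that convention reproduces every tabulated magnitude (normalising by RMSE_None would give, for example, 8.99% rather than 9.88% for BPNN on Shuping). It also reports which of MIC-PACF and GRA-PACF gives the lower RMSE for each model and landslide.

```python
# Mean RMSE over repeated runs, copied from Table 5 (Shuping) and Table 6 (Baishuihe).
MEAN_RMSE = {
    "Shuping": {
        "BPNN": {"MIC-PACF": 49.9049, "GRA-PACF": 46.0818, "None": 50.6353},
        "SVM": {"MIC-PACF": 52.6767, "GRA-PACF": 52.8809, "None": 60.0487},
        "LSTM": {"MIC-PACF": 40.3836, "GRA-PACF": 51.3091, "None": 38.5538},
        "GRU": {"MIC-PACF": 43.3963, "GRA-PACF": 55.5425, "None": 41.0639},
    },
    "Baishuihe": {
        "BPNN": {"MIC-PACF": 13.3155, "GRA-PACF": 14.1094, "None": 15.8745},
        "SVM": {"MIC-PACF": 14.4454, "GRA-PACF": 13.5008, "None": 19.6921},
        "LSTM": {"MIC-PACF": 12.4199, "GRA-PACF": 13.9912, "None": 12.9922},
        "GRU": {"MIC-PACF": 15.2416, "GRA-PACF": 13.9847, "None": 13.4234},
    },
}
FS_METHODS = ("MIC-PACF", "GRA-PACF")

def best_fs_method(rmse):
    """The feature-selection method (MIC-PACF or GRA-PACF) with the lower mean RMSE."""
    return min(FS_METHODS, key=lambda method: rmse[method])

def relative_improvement(rmse):
    """Percent RMSE improvement of the better feature-selection method over 'None'.

    Assumption: normalised by the feature-selected RMSE, (None - FS) / FS,
    which reproduces the magnitudes listed in Table 7.
    """
    fs = rmse[best_fs_method(rmse)]
    return 100.0 * (rmse["None"] - fs) / fs

best = {site: {model: best_fs_method(r) for model, r in models.items()}
        for site, models in MEAN_RMSE.items()}
improvement = {site: {model: relative_improvement(r) for model, r in models.items()}
               for site, models in MEAN_RMSE.items()}

for site, models in MEAN_RMSE.items():
    for model in models:
        print(f"{site:<9} {model:<4} best={best[site][model]:<8} "
              f"improvement={improvement[site][model]:+.2f}%")
```

Under this convention the LSTM entry for Shuping evaluates to −4.53%, because the raw series (mean RMSE 38.5538) outperformed MIC-PACF (40.3836); the static models, by contrast, gain in every case.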

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
