Comparative Study of Artificial Neural Network and Random Forest Model for Susceptibility Assessment of Landslides Induced by Earthquake in the Western Sichuan Plateau, China

Kamal, Mustafa; Zhang, Baolei; Cao, Jianfei; Zhang, Xin; Chang, Jun

doi:10.3390/su142113739

by Mustafa Kamal, Baolei Zhang *, Jianfei Cao, Xin Zhang and Jun Chang *

College of Geography and Environment, Shandong Normal University, Jinan 250014, China

* Authors to whom correspondence should be addressed.

Sustainability 2022, 14(21), 13739; https://doi.org/10.3390/su142113739

Submission received: 13 September 2022 / Revised: 6 October 2022 / Accepted: 9 October 2022 / Published: 24 October 2022

(This article belongs to the Section Hazards and Sustainability)

Abstract

Earthquake-induced landslides are among the most dangerous secondary disasters in mountainous areas throughout the world. The nowcasting of coseismic landslides is crucial for planning land management, development, and urbanization in mountainous areas. Taking Wenchuan County in the Western Sichuan Plateau (WSP) as the study area, a landslide inventory was built using historical records. Eight causative factors were selected for a library of factors, and a landslide susceptibility assessment (LSA) was then performed using two machine learning techniques, the Random Forest (RF) and Artificial Neural Network (ANN) models. The prediction abilities of the two landslide susceptibility mapping (LSM) models were assessed using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, precision, recall, accuracy, and specificity. Both machine learning techniques performed excellently, but RF (AUC = 0.966) outperformed ANN (AUC = 0.914). The RF model demonstrated a higher degree of correlation between the areas classified as very low and high susceptibility than the ANN model. The results provide a theoretical framework for applying machine learning (e.g., RF and ANN) as a reliable and low-cost tool to assess landslide susceptibility. This comparative study provides a useful description of earthquake-induced landslides in the study area, which can be used to anticipate the features of future landslides and can play a very important role in guiding anthropogenic activities, resource management, and infrastructural development in mountainous areas.

Keywords: susceptibility assessment; earthquake-induced landslides; artificial neural network; random forest; Western Sichuan Plateau

1. Introduction

Landslides are a major geohazard worldwide. Every year, they cause significant damage to public infrastructure, loss of human life, and economic losses [1]. The main triggering factors of landslides are earthquakes and rainfall, which cause slope failures [2,3]. Earthquake-triggered, or coseismic, landslides are among the most dangerous secondary disasters. Research on earthquake-triggered landslides covers a wide range of topics, from understanding the erosional processes of orogenic belts to landslide hazard prevention [4]. Rapid detection of landslides is important both for assessing earthquake impact and for hazard prevention immediately following an earthquake [5]. Spatial geohazard modeling can be made more manageable using landslide susceptibility assessments (LSAs) (Li et al.) [6]. Over the last few years, LSAs have provided researchers with extensive data on landslide occurrence and on the categorization of landslide-prone areas [7,8,9]. LSA began in the 1970s as a qualitative or quantitative assessment of the geographic distribution, velocity, or strength of landslides [10]. Several physical and statistical techniques have been proposed so far, and LSA models are becoming increasingly popular [11,12]. Physical models use accurate geological and geotechnical data to generate LSAs [13], but they have a number of limitations: they are costly, need a large number of datasets, and lack broad applicability [14,15]. Therefore, linear regression models and the analytical hierarchy method have been used alongside machine learning methods to develop LSA maps [16,17].

The machine learning methods used in previous studies include the Convolutional Neural Network (CNN) [18,19], Artificial Neural Network (ANN) [20,21], Support Vector Machine (SVM) [22,23], Recurrent Neural Network (RNN) [24,25], Random Forest (RF) [26], Deep Belief Network (DBN), and Logistic Regression (LR) models. All of these models have shown good accuracy in practical LSA applications [27,28], though the RF and ANN models have been the most widely used because of their stability and high precision in forecasting and mapping landslide susceptibility. However, it remains unknown how well models developed for one region will perform in other regions. In the landslide susceptibility mapping (LSM) literature, comparative studies are a common format for differentiating and comparing models to obtain reliable results [29,30]. To our knowledge, the present study provides the first comparative analysis of RF and ANN for LSM in the study area, which is important for researchers studying the vulnerability assessment of natural disasters.

To investigate this point, Wenchuan County in the Western Sichuan Plateau (WSP) was selected. Based on multi-source datasets, we used the RF and ANN models to map earthquake-triggered landslide susceptibility and compared the prediction abilities of the two LSM models using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, precision, recall, accuracy, and specificity. The main objectives of this study were: (1) to construct ANN and RF models of earthquake-induced landslides in the study area, (2) to compare the ANN and RF models for the susceptibility assessment of earthquake-induced landslides, and (3) to determine the main influencing factors for landslide susceptibility assessment using the RF and ANN models.

2. Materials and Methods

2.1. Study Area

Wenchuan County (102°51′–103°44′E, 30°45′–31°43′N) belongs to the northwestern Sichuan Plain, covers an area of 4084 km², and lies at the junction of the eastern Tibetan Plateau and the western boundary of the Sichuan Basin, in the Longmenshan tectonic zone, one of the world’s most seismically active regions (Figure 1). Topographically, the area is mostly mountainous, and the terrain of Wenchuan County tilts from northwest to southeast. High mountains with altitudes of more than 3000 m are distributed in the west, whereas the elevation at the outlet of the Minjiang River in the southeast is only 780 m. The physiography of the region is dominated by the approximately southwest–northeast-trending Beichuan–Yingxiu fault, and its tributaries dissect mountain ranges with the same trend. The main fault zone in this area is the Longmenshan fault. The three main parts of the Longmenshan fault are (i) the Beichuan–Yingxiu fault (main shock), which spreads along the Longmenshan fault, (ii) the Wenchuan–Maoxian fault, and (iii) the Jiangyou–Guanxian fault. Wenchuan County has a temperate monsoon climate; the terrain rises from southeast to northwest and shows a relatively complete vertical climate zonation. In 2021, the total population was 91,682, and the gross domestic product (GDP) was RMB 74.99 million. The earthquake with a surface wave magnitude (Ms) of 7.0 or greater (Ms ≥ 7.0) that occurred in this region was the 2008 Wenchuan earthquake (31.26°N, 103.45°E; Ms 8.0).

2.2. Data Sources

2.2.1. Historical Landslide Inventory

A region’s landslide database provides significant information on the spatial pattern of incidents in the vulnerable zone. It also aids in understanding landslide behavior and in assessing the relationship between causative factors and landslide occurrences. Creating a landslide inventory map is therefore a key step in any landslide susceptibility analysis, because landslide susceptibility can be analyzed from landslide inventories [31]. Historical landslides in Wenchuan and Lushan, obtained from the USGS, were used in this study [32]. The study analyzed 1362 landslides (Figure 2), each with an area larger than 4 DEM grid cells, that occurred as a result of the Wenchuan earthquake.

2.2.2. Landslide Influencing Factors

The establishment of an acceptable evaluation system and the selection of landslide influencing factors are the foundation of landslide susceptibility assessment. The main influencing factors chosen for assessment were the Digital Elevation Model (DEM), slope aspect, distance from a river, lithology of the study area, soil, Normalized Difference Vegetation Index (NDVI), curvature, rainfall, land use, distance from a road, and the interaction between human activities and landslides (Table 1 and Figure 2). The sources, types, and resolutions of the influencing factor data are shown in Table 1. In light of the literature and current research outcomes [33,34], the influencing factors were divided into five to eight primary causative factors for developing the basic landslide susceptibility evaluation system [35], as described in Table 2 and Figure 3. All the data used in the current study were in digital format (30 m × 30 m grid) with a unified projection (UTM Zone 48, WGS84 datum).

2.3. Methods

To determine the influencing factors and construct the landslide inventory, we conducted a four-stage analysis (Figure 4): (1) Using the buffer tool of ESRI ArcGIS 10.2 (LA, USA), one non-landslide point was selected for every landslide point, at a distance of about 1 km from it [36]. For model training and testing, we used 70% and 30% of the landslides, respectively. (2) Susceptibility maps of the study area were generated using the ANN and RF models. (3) The landslide susceptibility maps were compared, and the prediction ability of the two LSM models was assessed using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, precision, recall, accuracy, and specificity. (4) The better model for the study area was chosen based on its performance in landslide prediction and validation.
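Stage (1) can be illustrated with a minimal Python sketch. It is not the ArcGIS workflow used in the study: it assumes that "about 1 km" means each non-landslide point lies at least 1 km from every landslide point, that coordinates are projected in metres, and that the split is a simple random shuffle. The function names and the coordinates are hypothetical.

```python
import math
import random

def sample_non_landslide_points(landslides, bounds, min_dist_m=1000.0, seed=0):
    """Draw one random non-landslide point per landslide point.

    A candidate is accepted only if it lies at least `min_dist_m` from every
    landslide point (assumption: projected coordinates in metres).
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bounds
    negatives = []
    while len(negatives) < len(landslides):
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        if all(math.hypot(x - lx, y - ly) >= min_dist_m for lx, ly in landslides):
            negatives.append((x, y))
    return negatives

def split_train_test(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples and split them into training and testing subsets."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical landslide coordinates (metres) inside a 20 km x 20 km window.
landslides = [(2000.0, 3000.0), (8000.0, 12000.0), (15000.0, 5000.0), (11000.0, 17000.0)]
non_landslides = sample_non_landslide_points(landslides, (0.0, 0.0, 20000.0, 20000.0))

# Label landslide points 1 and non-landslide points 0, then split 70/30.
labelled = [(p, 1) for p in landslides] + [(p, 0) for p in non_landslides]
train, test = split_train_test(labelled)
print(len(train), len(test))
```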

2.3.1. Random Forest

Random Forest is an ensemble machine learning technique [37,38] in which each decision tree is built from a bootstrap sample of the training data. The output of the ensemble is given by:

W(A) = \arg\max_{B} \sum_{i=1}^{K} I\left(w_{i}(A) = B\right)

(1)

where W(A) denotes the combined (ensemble) model, w_i denotes a single decision tree, K is the number of trees, B denotes the output variable, and I(·) denotes the indicator function.
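Read as a majority vote, Equation (1) can be sketched in a few lines of Python. The three "trees" below are hypothetical threshold rules invented for illustration, not trees fitted in the study.

```python
from collections import Counter

def forest_predict(trees, a):
    """Return the class B that receives the most votes from the individual trees."""
    votes = Counter(tree(a) for tree in trees)
    return votes.most_common(1)[0][0]

# Three hypothetical "trees", each a simple threshold rule on one factor.
trees = [
    lambda a: 1 if a["slope"] > 30 else 0,
    lambda a: 1 if a["pga"] > 0.4 else 0,
    lambda a: 1 if a["ndvi"] < 0.3 else 0,
]

cell = {"slope": 35, "pga": 0.5, "ndvi": 0.6}
print(forest_predict(trees, cell))  # two of the three trees vote for class 1
```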

mg(A, B) = \mathrm{av}_{k}\, I\left(w_{k}(A) = B\right) - \max_{j \neq B} \mathrm{av}_{k}\, I\left(w_{k}(A) = j\right)

(2)

Equation (2) is the margin function, which measures the model’s reliability; av_k denotes the average over the trees. The higher the function value, the more reliable the model’s classification. The generalization error of the classifier is expressed as:

PQ^{*} = P_{A,B}\left(mg(A, B) < 0\right)

(3)

where the subscript (A, B) indicates that the probability is taken over the (A, B) space. As the number of decision trees increases, PQ* converges to:

P_{A,B}\left(P_{Θ}\left(w(A, Θ) = B\right) - \max_{j \neq B} P_{Θ}\left(w(A, Θ) = j\right) < 0\right)

(4)

This equation shows that the generalization error converges to a limiting value as the number of trees increases. Adding trees therefore does not lead to overfitting, which makes RF stable and well performing.
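The margin and the generalization error can be made concrete with a small Python sketch. It estimates the error as the fraction of labelled samples with a negative margin, which is a sample analogue of Equation (3) rather than the probability itself. The threshold "trees" and the samples are hypothetical.

```python
def margin(trees, a, b, classes=(0, 1)):
    """Average vote for the true class b minus the largest average vote
    for any other class (the margin function of Equation (2))."""
    k = len(trees)
    share = {c: sum(tree(a) == c for tree in trees) / k for c in classes}
    return share[b] - max(share[c] for c in classes if c != b)

def empirical_error(trees, samples, classes=(0, 1)):
    """Fraction of labelled samples (a, b) whose margin is negative."""
    return sum(margin(trees, a, b, classes) < 0 for a, b in samples) / len(samples)

# Hypothetical trees: each predicts class 1 when the input exceeds its threshold.
trees = [lambda a, t=t: int(a > t) for t in (0.2, 0.4, 0.6)]

# Hypothetical labelled samples: (input value, true class).
samples = [(0.1, 0), (0.5, 1), (0.3, 1), (0.9, 1)]

print(margin(trees, 0.5, 1))            # positive: most trees vote correctly
print(empirical_error(trees, samples))  # share of samples the forest gets wrong
```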

2.3.2. Artificial Neural Network

The Artificial Neural Network is a valuable tool for analyzing the likelihood of landslides and may be used to predict future landslides based on the distribution of previous landslides [39]. Landslide susceptibility evaluation can be treated as a binary classification problem (landslide or non-landslide), in which the aim is to identify which places are prone to landslides and which are not [40]. Most ANNs have three layers: input, hidden, and output. At each computing node, every incoming value is multiplied by its connection weight. The weighted values are then summed together with a neuron-specific term known as the bias, which shifts the sum into an acceptable range. Finally, the node passes this sum through an activation function, which yields the node output. Weights and biases are determined by an optimization process that non-linearly minimizes a learning function measuring how close the ANN outputs are to the observations.

υ = f [\sum_{i = 1}^{n} ω_{j i} U_{i} + β]

(5)

The above equation shows that the input neurons U_{i} and the output neuron υ are connected by the weights ω_{ji}, whereas β represents the bias and f is the activation function.
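A minimal Python sketch of Equation (5) follows. The sigmoid activation, the layer sizes, and all weights are assumptions chosen for illustration; the source does not state which activation function or architecture was used.

```python
import math

def sigmoid(z):
    """Logistic activation function (an assumption; the paper does not name f)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus the bias, passed through the activation f."""
    z = sum(w * u for w, u in zip(weights, inputs)) + bias
    return activation(z)

def forward(inputs, hidden_layer, output_neuron):
    """Forward pass through one hidden layer and a single output neuron."""
    hidden = [neuron_output(inputs, w, b) for w, b in hidden_layer]
    w_out, b_out = output_neuron
    return neuron_output(hidden, w_out, b_out)

# Hypothetical network: 3 inputs, 2 hidden neurons, 1 output neuron.
hidden_layer = [([0.8, -0.5, 0.3], 0.1), ([-0.2, 0.9, 0.4], -0.3)]
output_neuron = ([1.2, -0.7], 0.05)

factors = [0.6, 0.2, 0.9]  # hypothetical normalized influencing factors
print(forward(factors, hidden_layer, output_neuron))  # value between 0 and 1
```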

2.3.3. ROC Curve and AUC Metric

ROC curves are frequently used to check the accuracy of LSMs, and the AUC summarizes the quality of the forecast results in a single value [41,42]. A classification with a higher AUC is more accurate. Samples are counted according to their actual and predicted classes. True positive (TP) samples are landslide samples correctly predicted as landslides, whereas false negative (FN) samples are landslide samples incorrectly predicted as non-landslides. Furthermore, false positive (FP) samples are non-landslide samples incorrectly predicted as landslides, and true negative (TN) samples are non-landslide samples correctly predicted as non-landslides. To create ROC curves, two variables must be calculated, sensitivity and 1 − specificity, which are derived from the equations below, together with precision and accuracy.

SE = \mathrm{Recall} = \frac{TP}{TP + FN}

(6)

\mathrm{Specificity} = \frac{TN}{TN + FP}

(7)

\mathrm{Precision} = \frac{TP}{TP + FP}

(8)

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

(9)
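The four metrics and the AUC can be computed as in the following Python sketch. It uses the standard definition of precision, TP / (TP + FP), and estimates the AUC as the probability that a randomly chosen landslide sample receives a higher score than a randomly chosen non-landslide sample. The labels, scores, and 0.5 threshold are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = landslide, 0 = non-landslide)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Recall, specificity, precision, and accuracy from the confusion counts."""
    return {
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

def auc(y_true, scores):
    """AUC as the share of (landslide, non-landslide) pairs ranked correctly;
    tied scores count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test labels and predicted susceptibility scores.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [int(s >= 0.5) for s in scores]  # 0.5 threshold is an assumption

print(metrics(*confusion_counts(y_true, y_pred)))
print(auc(y_true, scores))
```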

3. Results

3.1. Analyzing the RF Model

In this study, we selected a sample from the data to determine the presence or absence of landslides and built a classification tree using the statistical program R version 4.1.1. Each node in the tree was created by randomly selecting a subset of causal factors. The most effective split was then chosen according to the gain factor, to enhance the purity of the resulting groups. Nodes were added to the tree one by one until there was just one component per leaf. This process was repeated until the desired number of trees had been built (400 in this analysis). RF provides two variable importance measures: the mean loss of accuracy (mean decrease in accuracy) and the mean loss of node impurity (mean decrease in Gini) (Table 3 and Figure 5). Variables can be ranked and selected using these measures [43,44].
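The node-splitting step can be sketched in Python as follows. The sketch assumes that the "gain factor" is the reduction in Gini impurity, which the source does not state explicitly, and it is not the R implementation used in the study. The rows, factor names, and values are hypothetical.

```python
import random

def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def gini_gain(values, labels, threshold):
    """Impurity reduction obtained by splitting at `values <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

def best_split(rows, labels, n_candidates, rng):
    """Pick a random subset of factors, then return the (gain, factor, threshold)
    with the highest Gini gain among them."""
    factors = rng.sample(sorted(rows[0]), n_candidates)
    best = None
    for factor in factors:
        values = [row[factor] for row in rows]
        for threshold in sorted(set(values))[:-1]:
            gain = gini_gain(values, labels, threshold)
            if best is None or gain > best[0]:
                best = (gain, factor, threshold)
    return best

# Hypothetical samples: two non-landslide cells followed by two landslide cells.
rows = [
    {"slope": 10, "ndvi": 0.8, "pga": 0.2},
    {"slope": 15, "ndvi": 0.7, "pga": 0.3},
    {"slope": 35, "ndvi": 0.3, "pga": 0.6},
    {"slope": 40, "ndvi": 0.2, "pga": 0.7},
]
labels = [0, 0, 1, 1]

print(best_split(rows, labels, n_candidates=2, rng=random.Random(0)))
```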

As the number of trees in the RF increases, the out-of-bag (OOB) error falls; once the number of trees reaches 400, the error is at its lowest and remains stable (Figure 6). These OOB results are superior to those obtained in earlier research [45]. Furthermore, the influencing factors were ranked using two different metrics: the mean decrease in accuracy and the mean decrease in the Gini coefficient. DEM is the main variable in the study area. Apart from DEM, the remaining influencing factors appear in a different order when the mean decrease in accuracy criterion is used. The most important variable, according to both measures, is PGA, followed by NDVI, distance to the river, and then DEM.
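The out-of-bag error can be illustrated with a short Python sketch. Each tree is trained on a bootstrap sample, and every sample is scored only by the trees that did not see it during training. The base learner here is a deliberately simple threshold stump on one hypothetical factor, not a full decision tree, and the data are invented.

```python
import random

def train_stump(xs, ys):
    """Toy base learner: threshold halfway between the two class means.
    Assumes landslide samples (class 1) tend to have larger values."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    if not pos or not neg:  # bootstrap sample contains a single class
        majority = 1 if pos else 0
        return lambda x: majority
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: int(x > threshold)

def oob_error(xs, ys, n_trees, seed=0):
    """Out-of-bag error: each sample is voted on only by trees whose
    bootstrap sample did not contain it."""
    rng = random.Random(seed)
    n = len(xs)
    votes = [[0, 0] for _ in range(n)]  # OOB votes per sample for class 0 / class 1
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: draw with replacement
        tree = train_stump([xs[i] for i in idx], [ys[i] for i in idx])
        for i in set(range(n)) - set(idx):  # samples left out of this bootstrap
            votes[i][tree(xs[i])] += 1
    scored = [(v, y) for v, y in zip(votes, ys) if sum(v) > 0]
    wrong = sum((v[1] > v[0]) != (y == 1) for v, y in scored)
    return wrong / len(scored)

# Hypothetical one-factor dataset.
xs = [0.1, 0.2, 0.3, 0.35, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

for n_trees in (5, 50, 400):
    print(n_trees, oob_error(xs, ys, n_trees))
```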

3.2. Analyzing the ANN Model

We used a function fitting neural network (FIT), which maps a set of inputs to a set of outputs. For training and testing, the data were divided into three distinct datasets: a training set (70%), a validation set (15%), and a test set (15%). The model was trained with 1800 training samples, 401 validation samples, and 401 test samples. Training ran for 60 epochs (60 iterations in total). In each epoch, the network was trained on every item, and the best validation performance (0.43042) was reached at epoch 54 (Figure 7).

The training window was updated regularly during training. The most important quantities to monitor were the performance, the magnitude of the performance gradient, and the number of validation checks. Training was terminated based on the magnitude of the gradient and the number of validation checks. As training approached a performance minimum, the gradient became very small, and training was terminated if the gradient magnitude fell below 1×10^-5. The number of validation checks indicated the number of successive iterations in which the validation performance had not improved; training ended when this number reached 6 (the default).
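The two stopping rules described above can be sketched in Python. The `step` callback, which returns a validation loss and a gradient magnitude for each epoch, is a hypothetical stand-in for one pass of network training, and the toy loss curve is invented for illustration.

```python
def train_with_early_stopping(step, max_epochs=60, min_grad=1e-5, max_fail=6):
    """Run training until the gradient is tiny, validation stops improving,
    or the epoch budget is exhausted.

    `step(epoch)` is a hypothetical callback returning (val_loss, grad_norm).
    Returns (best_epoch, best_val_loss, stop_reason).
    """
    best_loss, best_epoch, fails = float("inf"), 0, 0
    for epoch in range(1, max_epochs + 1):
        val_loss, grad_norm = step(epoch)
        if val_loss < best_loss:
            best_loss, best_epoch, fails = val_loss, epoch, 0
        else:
            fails += 1  # one more validation check without improvement
        if grad_norm < min_grad:
            return best_epoch, best_loss, "gradient"
        if fails >= max_fail:
            return best_epoch, best_loss, "validation"
    return best_epoch, best_loss, "max_epochs"

def toy_step(epoch):
    """Invented curve: validation loss is lowest at epoch 10, then rises."""
    val_loss = (epoch - 10) ** 2 * 0.01 + 0.4
    grad_norm = 1.0 / epoch
    return val_loss, grad_norm

print(train_with_early_stopping(toy_step))
```

In this toy run, training halts six epochs after the best validation loss, and the weights from the best epoch would be the ones kept.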

3.3. Model Comparison and Construction of LSMs

The trained RF and ANN models were used to construct the landslide susceptibility maps, which predicted the risk of each grid cell in the research area experiencing landslides. In addition, using ESRI ArcGIS 10.2 (LA, USA), we categorized landslide susceptibility into four categories: Low, Moderate, High, and Very High (Figure 8). The two methods have different grading quantity segments. We compared both maps of landslide susceptibility in the study area. According to the ROC curve (Figure 9), the accuracies of the RF and ANN models were 91% and 83%, respectively. Both the RF and ANN techniques therefore performed well in prediction, but the RF technique had the stronger predictive ability in the research area. For the RF model, 86% and 92% of the total number of landslides occurred in the areas of high and very high susceptibility, respectively, whereas the ANN model predicted 84% and 82% of the landslides (Table 4).
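The share of landslides falling in each susceptibility class can be computed as in the Python sketch below. The equal-interval class breaks (0.25, 0.5, 0.75) are an assumption made purely for illustration; the source does not report the breaks used in ArcGIS, and the probabilities are hypothetical.

```python
from bisect import bisect_right

CLASSES = ("Low", "Moderate", "High", "Very High")

def classify(prob, breaks=(0.25, 0.5, 0.75)):
    """Map a susceptibility probability to one of the four classes.
    The equal-interval breaks are an assumption, not the paper's values."""
    return CLASSES[bisect_right(breaks, prob)]

def landslide_share_by_class(landslide_probs, breaks=(0.25, 0.5, 0.75)):
    """Percentage of known landslides that fall in each susceptibility class."""
    counts = {c: 0 for c in CLASSES}
    for p in landslide_probs:
        counts[classify(p, breaks)] += 1
    n = len(landslide_probs)
    return {c: 100.0 * k / n for c, k in counts.items()}

# Hypothetical predicted probabilities at ten known landslide locations.
probs = [0.95, 0.8, 0.6, 0.9, 0.3, 0.1, 0.85, 0.7, 0.55, 0.99]
print(landslide_share_by_class(probs))
```

A model that places most known landslides in the High and Very High classes is behaving as expected.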

4. Discussion

4.1. Comparison of Models (RF and ANN)

To compare RF and ANN, training and performance evaluations were carried out using identical training and test data. Figure 8 depicts the prediction results of ANN and RF as precision-recall (PR) curves. At a precision of around 0.6, the recall values for RF and ANN were nearly the same, but overall, RF outperformed ANN. When precision was greater than 0.6, there was a significant difference in recall. By the definitions of precision and recall, higher precision means that a larger share of positive forecasts are correct, and higher recall means that fewer actual landslides are missed. Even if a higher threshold was set to boost the certainty of the positive predictions, Random Forest, with its steeper PR curve, could still identify more susceptible cells. In conclusion, the RF model performed better in terms of prediction.
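The points of a PR curve are obtained by sweeping the decision threshold, as in the following Python sketch. The labels, scores, and thresholds are hypothetical and are not the study's data.

```python
def precision_recall_points(y_true, scores, thresholds):
    """Return (threshold, precision, recall) for each decision threshold."""
    points = []
    for t in thresholds:
        tp = sum(y == 1 and s >= t for y, s in zip(y_true, scores))
        fp = sum(y == 0 and s >= t for y, s in zip(y_true, scores))
        fn = sum(y == 1 and s < t for y, s in zip(y_true, scores))
        # By convention, precision is set to 1.0 when nothing is predicted positive.
        precision = tp / (tp + fp) if (tp + fp) else 1.0
        recall = tp / (tp + fn)
        points.append((t, precision, recall))
    return points

# Hypothetical test labels and predicted susceptibility scores.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

for t, precision, recall in precision_recall_points(y_true, scores, [0.35, 0.5, 0.65]):
    print(t, precision, recall)
```

Raising the threshold in this example increases precision while recall falls or stays level, which is the trade-off a PR curve displays.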

4.2. Statistical Metrics

The specificity (SP), sensitivity (SE), precision (PR), and accuracy (ACC) of predictions were calculated based on four possible prediction outcomes: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). TP and TN denote the numbers of landslide cells correctly classified as landslides and of non-landslide cells correctly classified as non-landslides, respectively. FP and FN denote the numbers of non-landslide cells mistakenly classified as landslides and of landslide cells mistakenly classified as non-landslides, respectively. SE is the ratio of correctly classified landslide cells to all actual landslide cells. Accuracy is the number of correctly classified cells (landslide and non-landslide) divided by the total number of cells [46,47,48,49,50].

4.3. Importance of Causative Factors

In many LSMs, the most widely used conditioning factors were topography, geology, hydrology, and land use type [51]. The two most important factors were NDVI and Peak Ground Acceleration (PGA). The most significant contribution of distance to road could be that road development had altered the slope, created an escarpment, and caused slope instability. The lithology component could affect slope instability through its mechanical properties, although it has been shown that hard and dense rock masses have little slope instability [52]. Ground collapse density and profile curvature were also important conditioning factors. The curvature factors represented the unevenness of the ground surface. The steeper the slope, the more uneven the force, and the more likely geological hazards were to arise [51]. The density of ground collapse played an obvious role in the occurrence of geological hazards. The DEM was the main variable in the study area. If the mean decrease in accuracy criterion was used, the factors would appear in a different order than before. The most important variable, according to both measures, was PGA, followed by NDVI, the distance to the river, and then DEM.

Landslide susceptibility predictions were significantly influenced by effective and contributory factors [53]. Consequently, the relative importance of the causes of landslides can be ranked with the RF model by calculating the mean decrease in accuracy [54]. The various evaluation criteria affected landslide susceptibility differently, so determining the importance and impact of the factors could assist in planning and disaster prevention. We used R Studio software (http://rstudio.com/) to determine the mean decrease in prediction accuracy for each factor after disordering it (rearranging its values in the model) and to examine the RF model’s accuracy. A higher value indicated that the factor was more significant. Noisy and correlated variables affected the ranking results, and the results of a single importance rating were frequently erroneous [55]. The final ranking results in this research were therefore based on the averages of ten runs (Figure 10). On the other hand, we retained the model while removing each of the ten components individually.
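The idea of disordering one factor and averaging the resulting accuracy drop over ten repetitions can be sketched in Python. This is a simplified illustration on a fixed dataset and does not reproduce the out-of-bag procedure of the R implementation. The toy model, factor names, and values are hypothetical.

```python
import random

def accuracy(model, rows, labels):
    """Share of rows for which the model predicts the true label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)

def permutation_importance(model, rows, labels, factor, n_repeats=10, seed=0):
    """Mean decrease in accuracy after shuffling one factor, averaged over
    `n_repeats` independent shuffles."""
    rng = random.Random(seed)
    base = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[factor] for r in rows]
        rng.shuffle(shuffled)
        # Replace only the chosen factor; all other factors stay in place.
        permuted = [dict(r, **{factor: v}) for r, v in zip(rows, shuffled)]
        drops.append(base - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats

def toy_model(row):
    """Hypothetical classifier that looks only at PGA."""
    return int(row["pga"] > 0.4)

rows = [
    {"pga": 0.1, "ndvi": 0.9}, {"pga": 0.2, "ndvi": 0.4}, {"pga": 0.3, "ndvi": 0.7},
    {"pga": 0.5, "ndvi": 0.8}, {"pga": 0.6, "ndvi": 0.2}, {"pga": 0.7, "ndvi": 0.5},
]
labels = [0, 0, 0, 1, 1, 1]

for factor in ("pga", "ndvi"):
    print(factor, permutation_importance(toy_model, rows, labels, factor))
```

Because the toy model ignores NDVI, shuffling NDVI leaves accuracy unchanged, whereas shuffling PGA can only lower it.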

4.4. Application and Limitations of the Study

This research constructed ANN and RF models for the susceptibility assessment of earthquake-induced landslides in the Western Sichuan Plateau, China, and identified the main influencing factors during the simulation process using both LSM models. The results provide a theoretical framework for applying machine learning (e.g., RF and ANN) as a reliable and low-cost tool to assess landslide susceptibility. However, this study has some limitations. For example, small- and medium-scale maps were used for the majority of the data layers. Medium-scale maps were used to collect information on several soil qualities, such as depth of soil, texture of soil, and ability of soil; large-scale maps of these data would have been more suitable. Additionally, detailed seismic data for the research area were difficult to obtain. Despite these limitations, the models produced accurate results, and the study still has considerable potential for identifying landslide risks and defining the zones that will remain stable for future development and planning within the study area. Although two representative statistical models (RF and ANN) were compared in terms of their spatial generalization ability and prediction accuracy, neither method accounts for all aspects of the landslide mechanism. The landslide mechanism should therefore be considered in future studies.

5. Conclusions

Landslides are common in the hilly areas of the WSP, with their immature and fragile rock formations, and have become a major problem for residents. To assess landslide susceptibility, two machine learning techniques, the RF and ANN models, were applied and their outcomes compared to determine which technique is best for identifying landslide-prone regions. Both models proved very useful for landslide susceptibility mapping, as shown by the susceptibility maps and AUC values. At a precision of around 0.6, the recall values for RF and ANN were nearly the same, but overall, RF outperformed ANN. RF (AUC = 0.966) outperformed ANN (AUC = 0.914), and the RF model demonstrated a higher degree of correlation between the areas classified as very low and high susceptibility than the ANN model. Landslide susceptibility predictions were significantly influenced by effective and contributory factors, and the most important variable was PGA, followed by NDVI, the distance to the river, and then DEM. This comparative study provides a useful description of earthquake-induced landslides in the study area that can be used to anticipate the features of landslides triggered by future earthquakes. It will also play a very important role in guiding anthropogenic activities, resource management, and infrastructural development in the area.

Author Contributions

Conceptualization, M.K.; data curation, M.K.; methodology and analysis, J.C. (Jianfei Cao) and X.Z.; writing (original draft preparation), M.K.; writing (review and editing), M.K., J.C. (Jun Chang) and B.Z.; visualization, X.Z.; supervision, B.Z.; funding acquisition, J.C. (Jun Chang). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Social Science Foundation of China [18BJY086] and the Natural Science Foundation of Shandong Province, China [ZR2021QD127, ZR2021ME203].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare that they have no competing interests.

References

Aditian, A.; Kubota, T.; Shinohara, Y. Comparison of GIS-based landslide susceptibility models using frequency ratio, logistic regression, and artificial neural network in a tertiary region of Ambon, Indonesia. Geomorphology 2018, 318, 101–111.
Alsabhan, A.H.; Singh, K.; Sharma, A.; Alam, S.; Pandey, D.D.; Rahman, S.A.S.; Khursheed, A.; Munshi, F.M. Landslide susceptibility assessment in the Himalayan range based along Kasauli–Parwanoo road corridor using weight of evidence, information value, and frequency ratio. J. King Saud Univ.-Sci. 2022, 34, 101759.
Basu, T.; Pal, S. RS-GIS based morphometrical and geological multi-criteria approach to the landslide susceptibility mapping in Gish River Basin, West Bengal, India. Adv. Space Res. 2019, 63, 1253–1269.
Bragagnolo, L.; Silva, R.V.; Grzybowsk, J.M.V. Landslide susceptibility mapping with r.landslide: A free open-source GIS-integrated tool based on Artificial Neural Networks. Environ. Model. Softw. 2020, 123, 104565.
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
Bui, D.T.; Tsangaratos, P.; Nguyen, V.-T.; Liem, N.V.; Trinh, P.T. Comparing the prediction performance of a Deep Learning Neural Network model with conventional machine learning models in landslide susceptibility assessment. Catena 2020, 188, 104426.
Cutler, A.; Breiman, L. Random forests. In Ensemble Machine Learning; Springer: Berlin/Heidelberg, Germany, 2012; pp. 157–175.
Dao, D.; Ly, H.-B.; Trinh, S.; Le, T.-T.; Pham, B. Artificial intelligence approaches for prediction of compressive strength of geopolymer concrete. Materials 2019, 12, 983.
Dao, D.; Trinh, S.; Ly, H.-B.; Pham, B. Prediction of compressive strength of geopolymer concrete using entirely steel slag aggregates: Novel hybrid artificial intelligence approaches. Appl. Sci. 2019, 9, 1113.
Du, J.T.; Woldai, T.; Chai, B.; Zeng, B. Landslide susceptibility assessment based on an incomplete landslide inventory in the Jilong Valley, Tibet, Chinese Himalayas. Eng. Geol. 2020, 270, 105572.
Fan, X.; Scaringi, G.; Korup, O.; West, A.J.; van Westen, C.J.; Tanyas, H.; Hovius, N.; Hales, T.C.; Jibson, R.W.; Allstadt, K.E. Earthquake-induced chains of geologic hazards: Patterns, mechanisms, and impacts. Rev. Geophys. 2019, 57, 421–503.
Fang, Z.; Wang, Y.; Peng, L.; Hong, H. Integration of convolutional neural network and conventional machine learning classifiers for landslide susceptibility mapping. Comput. Geosci. 2020, 139, 104470.
Guo, W.; Xu, X.; Wang, W.; Liu, Y.; Guo, M.; Cui, Z. Rainfall-triggered mass movements on steep loess slopes and their entrainment and distribution. Catena 2019, 183, 104238.
Hodasová, K.; Bednarik, M. Effect of using various weighting methods in a process of landslide susceptibility assessment. Nat. Hazards 2021, 105, 481–499.
Hong, H.; Tsangaratos, P.; Ilia, I.; Loupasakis, C.; Wang, Y. Introducing a novel multi-layer perceptron network based on stochastic gradient descent optimized by a meta-heuristic algorithm for landslide susceptibility mapping. Sci. Total Environ. 2020, 742, 140549.
Hong, H.Y.; Pradhan, B.; Xu, C.; Bui, D.T. Spatial prediction of landslide hazard at the Yihuang area (China) using two-class kernel logistic regression, alternating decision tree and support vector machines. Catena 2015, 133, 266–281.
Hu, Q.; Zhou, Y.; Wang, S.; Wang, F. Machine learning and fractal theory models for landslide susceptibility mapping: Case study from the Jinsha River Basin. Geomorphology 2020, 351, 106975.
Hua, Y.; Wang, X.; Li, Y.; Xu, P.; Xia, W. Dynamic development of landslide susceptibility based on slope unit and deep neural networks. Landslides 2021, 18, 281–302.
Huang, F.; Ye, Z.; Jiang, S.-H.; Huang, J.; Chang, Z.; Chen, J. Uncertainty study of landslide susceptibility prediction considering the different attribute interval numbers of environmental factors and different data-based models. Catena 2021, 202, 105250.
Huang, F.; Zhang, J.; Zhou, C.; Wang, Y.; Huang, J.; Zhu, L. A deep learning algorithm using a fully connected sparse autoencoder neural network for landslide susceptibility prediction. Landslides 2020, 17, 217–229.
Kirschbaum, D.; Stanley, T.; Yatheendradas, S. Modeling landslide susceptibility over large regions with fuzzy overlay. Landslides 2016, 13, 485–496.
Kutlug Sahin, E.; Colkesen, I.; Kavzoglu, T. A comparative assessment of canonical correlation forest, random forest, rotation forest and logistic regression methods for landslide susceptibility mapping. Geocarto Int. 2020, 35, 341–363.
Li, G.; West, A.J.; Densmore, A.L.; Jin, Z.; Parker, R.N.; Hilton, R.G. Seismic mountain building: Landslides associated with the 2008 Wenchuan earthquake in the context of a generalized model for earthquake volume balance. Geochem. Geophys. Geosystems 2014, 15, 833–844.
Li, J.Y.; Wang, W.D.; Han, Z. A variable weight combination model for prediction on landslide displacement using AR model, LSTM model, and SVM model: A case study of the Xinming landslide in China. Environ. Earth Sci. 2021, 80, 1–14.
Li, J.; Wang, W.; Han, Z.; Li, Y.; Chen, G. Exploring the impact of multitemporal DEM data on the susceptibility mapping of landslides. Appl. Sci. 2020, 10, 2518.
Li, L.; Liu, R.; Pirasteh, S.; Chen, X.; He, L.; Li, J. A novel genetic algorithm for optimization of conditioning factors in shallow translational landslides and susceptibility mapping. Arab. J. Geosci. 2017, 10, 1–12.
Liu, J.; Li, S.L.; Cheng, T. Landslide susceptibility assessment based on optimized random forest model. Geomat. Inf. Sci. Wuhan Univ. 2018, 43, 1085–1091.
Liu, R.; Li, L.; Pirasteh, S.; Lai, Z.; Yang, X.; Shahabi, H. The performance quality of LR, SVM, and RF for earthquake-induced landslides susceptibility mapping incorporating remote sensing imagery. Arab. J. Geosci. 2021, 14, 1–15.
Ly, H.-B.; Monteiro, E.; Le, T.-T.; Le, V.M.; Dal, M.; Regnier, G.; Pham, B.T. Prediction and sensitivity analysis of bubble dissolution time in 3D selective laser sintering using ensemble decision trees. Materials 2019, 12, 1544.
Nepal, N.; Chen, J.; Chen, H.; Wang, X.; Sharma, T.P.P. Assessment of landslide susceptibility along the araniko highway in poiqu/bhote koshi/sun koshi watershed, Nepal himalaya. Prog. Disaster Sci. 2019, 3, 100037.
Ngo, P.T.T.; Panahi, M.; Khosravi, K.; Ghorbanzadeh, O.; Kariminejad, N.; Cerda, A.; Lee, S. Evaluation of deep learning algorithms for national scale landslide susceptibility mapping of Iran. Geosci. Front. 2021, 12, 505–519.
Nhu, V.-H.; Shirzadi, A.; Shahabi, H.; Chen, W.; Clague, J.J.; Geertsema, M.; Jaafari, A.; Avand, M.; Miraki, S.; Talebpour Asl, D. Shallow landslide susceptibility mapping by random forest base classifier and its ensembles in a semi-arid region of Iran. Forests 2020, 11, 421.
Pham, B.T. A novel classifier based on composite hyper-cubes on iterated random projections for assessment of landslide susceptibility. J. Geol. Soc. India 2018, 91, 355–362.
Pham, B.T.; Nguyen, M.D.; Bui, K.-T.T.; Prakash, I.; Chapi, K.; Bui, D.T. A novel artificial intelligence approach based on Multi-layer Perceptron Neural Network and Biogeography-based Optimization for predicting coefficient of consolidation of soil. Catena 2019, 173, 302–311.
Pourghasemi, H.R.; Kerle, N. Random forests and evidential belief function-based landslide susceptibility assessment in Western Mazandaran Province, Iran. Environ. Earth Sci. 2016, 75, 1–17.
Pourghasemi, H.; Rahmati, A. Rapid GIS-based spatial and regional modelling of landslide susceptibility using machine learning techniques in the R open source software. Catena 2018, 162, 177–192.
Robinson, T.; Davies, T. Review Article: Potential geomorphic consequences of a future great (Mw Combining double low line 8.0+) Alpine Fault earthquake, South Island, New Zealand. Nat. Hazards Earth Syst. Sci. 2013, 13, 2279–2299. [Google Scholar] [CrossRef]
Saha, A.; Saha, S. Comparing the efficiency of weight of evidence, support vector machine and their ensemble approaches in landslide susceptibility modelling: A study on Kurseong region of Darjeeling Himalaya, India. Remote Sens. Appl. Soc. Environ. 2020, 19, 100323. [Google Scholar] [CrossRef]
Schlögel, R.; Marchesini, I.; Alvioli, M.; Reichenbach, P.; Rossi, M.; Malet, J.-P. Optimizing landslide susceptibility zonation: Effects of DEM spatial resolution and slope unit delineation on logistic regression models. Geomorphology 2018, 301, 10–20. [Google Scholar] [CrossRef]
Sevgen, E.; Kocaman, S.; Nefeslioglu, H.A.; Gokceoglu, C. A novel performance assessment approach using photogrammetric techniques for landslide susceptibility mapping with logistic regression, ANN and random forest. Sensors 2019, 19, 3940. [Google Scholar] [CrossRef]
Stanley, T.; Kirschbaum, D.B. A heuristic approach to global landslide susceptibility mapping. Nat. Hazards 2017, 87, 145–164. [Google Scholar] [CrossRef]
Su, Q.; Zhang, J.; Zhao, S.; Wang, L.; Liu, J.; Guo, J. Comparative assessment of three nonlinear approaches for landslide susceptibility mapping in a coal mine area. ISPRS Int. J. Geo-Inf. 2017, 6, 228. [Google Scholar] [CrossRef]
Trigila, A.; Iadanza, C.; Esposito, C.; Scarascia-Mugnozza, G. Comparison of Logistic Regression and Random Forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily, Italy). Geomorphology 2015, 249, 119–136. [Google Scholar] [CrossRef]
Tsangaratos, P.; Ilia, I.; Hong, H.; Chen, W.; Xu, C. Applying Information Theory and GIS-based quantitative methods to produce landslide susceptibility maps in Nancheng County, China. Landslides 2017, 14, 1091–1111. [Google Scholar] [CrossRef]
Wang, Y.; Sun, D.L.; Wen, H.J.; Zhang, H.; Zhang, F.T. Comparison of random forest model and frequency ratio model for landslide susceptibility mapping (LSM) in Yunyang County (Chongqing, China). Int. J. Environ. Res. Public Health 2020, 17, 4206. [Google Scholar] [CrossRef] [PubMed]
Wang, L.J.; Guo, M.; Sawada, K.; Lin, J.; Zhang, J. Landslide susceptibility mapping in Mizunami City, Japan: A comparison between logistic regression, bivariate statistical analysis and multivariate adaptive regression spline models. Catena 2015, 135, 271–282. [Google Scholar] [CrossRef]
Wang, W.-D.; Li, J.; Han, Z. Comprehensive assessment of geological hazard safety along railway engineering using a novel method: A case study of the Sichuan-Tibet railway, China. Geomat. Nat. Hazards Risk 2019, 11, 1–21. [Google Scholar] [CrossRef]
Wang, W.D.; He, Z.L.; Han, Z.; Li, Y.G.; Dou, J.; Huang J., L. Mapping the susceptibility to landslides based on the deep belief network: A case study in Sichuan Province, China. Nat. Hazards 2020, 103, 3239–3261. [Google Scholar] [CrossRef]
Wang, Y.; Fang, Z.; Hong, H. Comparison of convolutional neural networks for landslide susceptibility mapping in Yanshan County, China. Sci. Total Environ. 2019, 666, 975–993. [Google Scholar] [CrossRef]
Wang, Y.; Fang, Z.; Wang, M.; Peng, L.; Hong, H. Comparative study of landslide susceptibility mapping with different recurrent neural networks. Comput. Geosci. 2020, 138, 104445. [Google Scholar] [CrossRef]
Xie, P.; Hai-Jia, W.; Dong-Ping, H.U. Research on susceptibility mapping of earthquake-induced landslides along highway in mountainous region. China J. Highw. Transp. 2018, 31, 106. [Google Scholar]
Yanar, T.; Kocaman, S.; Gokceoglu, C. Use of Mamdani fuzzy algorithm for multi-hazard susceptibility assessment in a developing urban settlement (Mamak, Ankara, Turkey). ISPRS Int. J. Geo-Inf. 2020, 9, 114. [Google Scholar] [CrossRef]
Yilmaz, I. Landslide susceptibility mapping using frequency ratio, logistic regression, artificial neural networks and their comparison: A case study from Kat landslides (Tokat-Turkey). Comput. Geosci. 2009, 35, 1125–1138. [Google Scholar] [CrossRef]
Youssef, A.M.; Pourghasemi, H.R.; Pourtaghi, Z.S.; Al-Katheeri, M.M. Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia. Landslides 2015, 13, 839–856. [Google Scholar] [CrossRef]
Yu, K.Y.; Yao, X.; Qiu, Q.R.; Liu, J. Landslide spatial prediction based on random forest model. Trans. CSAM 2016, 47, 338–345. [Google Scholar]

Figure 1. Study area map.

Figure 2. Map of the landslide inventory.

Figure 3. The causative factors of landslides used in the study area.

Figure 4. Flowchart of the methodology used in the study.

Figure 5. Mean decrease accuracy and mean decrease Gini of the attributes as assigned by Random Forest (RF), sorted in decreasing order from top to bottom.

Figure 6. Out-of-bag (OOB) error rate of the overall Random Forest (RF) model (left) and receiver operating characteristic (ROC) curve of the RF model (right).

Figure 7. Best validation performance of the Artificial Neural Network (ANN) (left) and final receiver operating characteristic (ROC) curve with its area under the curve (AUC) value (right).

Figure 8. Landslide susceptibility maps produced using Random Forest (RF) (left) and Artificial Neural Network (ANN) (right).

Figure 9. Comparison of the receiver operating characteristic (ROC) curves and area under the curve (AUC) values of Random Forest (RF) (left) and Artificial Neural Network (ANN) (right).

Figure 10. Mean Decrease Accuracy (a) and Mean Decrease Gini (b) of the attributes, in descending order, as assigned by Random Forest (RF).

Table 1. The data sources of influencing factors.

Name	Source	Type	Scale/Resolution
Historical landslide inventory	United States Geological Survey (USGS)	Vector/point	1:10,000
Digital Elevation Model	http://www.gscloud.cn/home data (accessed on 8 October 2019)	Raster	90 m × 90 m
Geological data	http://gsd.cgs.cn/download.asp (accessed on 29 August 2008)	Vector/polygon	1:50,000
LULC	https://www.resdc.cn/Default.aspx (accessed on 31 December 2020)	Raster	30 m × 30 m
Soil	https://www.resdc.cn/Default.aspx (accessed on 31 December 2009)	Vector	1:1,000,000
Administrative division	https://www.resdc.cn/Default.aspx (accessed on 31 December 2020)	Vector	1:100,000
River network	https://www.webmap.cn/mapDataAction.do (accessed on 31 December 2020)	Vector	1:10,000
Landsat-8	Geospatial Data Cloud platform (accessed on 31 August 2021)	Raster	30 m × 30 m
Roads	https://www.openhistoricalmap.org (accessed on 31 December 2020)	Vector	1:10,000
Curvature	Extracted from DEM	Raster	90 m × 90 m
NDVI	Extracted from Landsat-8	Raster	30 m × 30 m
Slope Aspect	Extracted from DEM	Raster	90 m × 90 m
Slope Angle	Extracted from DEM	Raster	90 m × 90 m

Table 2. Classification of causative factors.

Factor	Type	Classification
Slope Aspect	Continuous	(a) North–East (–1–45°); (b) East–North (45–90°); (c) East–South (90–135°); (d) South–East (135–180°); (e) South–West (180–225°); (f) West–South (225–270°); (g) West–North (270–315°); (h) North–West (315–360°).
NDVI	Continuous	(a) –0.12 to –0.033; (b) –0.033 to –0.021; (c) –0.021 to –0.0051; (d) –0.0051 to 0.012; (e) 0.012 to 0.032; (f) 0.032 to 0.145.
Distance to River/km	Continuous	(a) 1 km; (b) 1–2 km; (c) 2–3 km; (d) 3–5 km; (e) 5–7 km; (f) 7–9 km.
Lithology	Categorical	(a) Fine Clastic Rock; (b) Phyllite; (c) Granite; (d) Diorite; (e) Syenite; and (f) Carbonate Rock.
Land Use	Categorical	(a) Dry Land; (b) Wood Land; (c) Grass Land; (d) Beach; (e) Rocks.
Soil	Categorical	(a) Dark Brown Soil; (b) Cinnamon Soil; (c) Coarse Bonny Soil; (d) Yellow Soil; (e) Lakes.
PGA	Continuous	(a) 0.1–0.3 g; (b) 0.3–0.5 g; (c) 0.5–0.8 g; (d) 0.8–1.1 g; (e) 1.1–1.4 g.
Slope Angle	Continuous	(a) 0–13°; (b) 13–21°; (c) 21–28°; (d) 28–34°; (e) 34–39°; (f) 39–45°; (g) 45–52°; (h) 52–62°; and (i) 62–88°.

Table 3. RF model accuracy under ten-fold cross-validation.

Subset	Training accuracy	Testing accuracy	Subset	Training accuracy	Testing accuracy
1	1	0.988	6	1	0.891
2	1	0.981	7	1	0.916
3	1	0.972	8	1	0.875
4	1	0.912	9	1	0.972
5	1	0.956	10	1	0.926
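
The ten testing accuracies in Table 3 can be summarized by their mean and spread. The following is a minimal sketch, not part of the original study: the values are copied from the table, and the variable names are illustrative assumptions.

```python
from statistics import mean, stdev

# Testing accuracies for subsets 1-10, copied from Table 3.
testing_accuracy = [0.988, 0.981, 0.972, 0.912, 0.956,
                    0.891, 0.916, 0.875, 0.972, 0.926]

fold_mean = mean(testing_accuracy)   # average over the ten folds
fold_sd = stdev(testing_accuracy)    # sample standard deviation

print(f"mean testing accuracy = {fold_mean:.3f}")
print(f"standard deviation    = {fold_sd:.3f}")
print(f"range                 = {min(testing_accuracy)} to {max(testing_accuracy)}")
```

Reporting the spread alongside the mean shows how much the testing accuracy varies from fold to fold, which a single averaged figure would hide.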

Table 4. Performance results of the two machine learning models (SE: sensitivity; SP: specificity; PRE: precision; ACC: accuracy).

Model	AUC	TN	FP	FN	TP	SE	SP	PRE	ACC
ANN	0.914	1085	225	216	1145	0.84	0.82	0.79	0.83
RF	0.966	368	28	29	192	0.86	0.92	0.87	0.91
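
The rate columns in Table 4 are derived from the confusion-matrix counts (TN, FP, FN, TP). The sketch below recomputes them using the standard textbook definitions, which are assumed here rather than taken from the paper; the function name `confusion_metrics` is hypothetical. Values recomputed this way need not coincide exactly with the published columns, since the authors' rounding and evaluation details are not stated in the table.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Return standard binary-classification metrics from confusion-matrix counts."""
    return {
        "SE": tp / (tp + fn),                     # sensitivity (recall)
        "SP": tn / (tn + fp),                     # specificity
        "PRE": tp / (tp + fp),                    # precision
        "ACC": (tp + tn) / (tp + tn + fp + fn),   # accuracy
    }

# Counts copied from Table 4.
counts = {
    "ANN": dict(tn=1085, fp=225, fn=216, tp=1145),
    "RF": dict(tn=368, fp=28, fn=29, tp=192),
}

results = {model: confusion_metrics(**c) for model, c in counts.items()}
for model, metrics in results.items():
    print(model, {name: round(value, 3) for name, value in metrics.items()})
```

Recomputing the rates from the raw counts is a simple consistency check on a reported performance table before the models are compared.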

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Kamal, M.; Zhang, B.; Cao, J.; Zhang, X.; Chang, J. Comparative Study of Artificial Neural Network and Random Forest Model for Susceptibility Assessment of Landslides Induced by Earthquake in the Western Sichuan Plateau, China. Sustainability 2022, 14, 13739. https://doi.org/10.3390/su142113739
