Analysis of Geological Hazard Susceptibility of Landslides in Muli County Based on Random Forest Algorithm

Wu, Xiaoyi; Song, Yuanbao; Chen, Wei; Kang, Guichuan; Qu, Rui; Wang, Zhifei; Wang, Jiaxian; Lv, Pengyi; Chen, Han

doi:10.3390/su15054328

by Xiaoyi Wu ¹,², Yuanbao Song ¹, Wei Chen ³, Guichuan Kang ⁴,*, Rui Qu ⁴, Zhifei Wang ², Jiaxian Wang ⁵, Pengyi Lv ⁵ and Han Chen ⁶,⁷

¹ Evaluation and Utilization of Strategic Rare Metals and Rare Earth Resource Key Laboratory of Sichuan Province & Sichuan Geological Survey, Chengdu 610081, China

² College of Tourism and Urban-Rural Planning, Chengdu University of Technology, Chengdu 610059, China

³ Liangshan Prefecture Urban and Rural Land Comprehensive Consolidation and Reserve Center, Liangshan 615050, China

⁴ College of Earth Sciences, Chengdu University of Technology, Chengdu 610059, China

⁵ Research Institute of Exploration and Development, PetroChina Southwest Oil & Gas Field Company, Chengdu 610051, China

⁶ Sichuan Earthquake Agency, Chengdu 610041, China

⁷ Chengdu Institute of Tibetan Plateau Earthquake Research, China Earthquake Administration, Chengdu 610041, China

* Author to whom correspondence should be addressed.

Sustainability 2023, 15(5), 4328; https://doi.org/10.3390/su15054328

Submission received: 29 December 2022 / Revised: 14 February 2023 / Accepted: 23 February 2023 / Published: 28 February 2023

(This article belongs to the Section Hazards and Sustainability)

Abstract:

Landslides seriously threaten human life and property, and the rapid and accurate prediction of landslide susceptibility is key to disaster prevention and mitigation. Traditional landslide susceptibility evaluation methods rely on subjective factor classification and weight determination. To address this, this paper uses a random forest model built in Python to predict the landslide susceptibility of Muli County in western Sichuan and outputs the factor weights and model accuracy. The results show that (1) the three most important factors are elevation, distance from the road, and average annual rainfall, and the sum of their weights is 67.54%; (2) the model performs well, with ACC = 99.43%, precision = 99.3%, recall = 99.48%, and F1 = 99.39%; (3) the factors governing landslide development and those governing the susceptibility zoning are basically the same. Therefore, this model can effectively and accurately evaluate regional landslide susceptibility. However, there are some limitations: (1) the landslide information statistical table is incomplete; (2) the training set places demanding requirements on the definition of the landslide and non-landslide point sets, and the landslide range should be accurately delineated according to field surveys.

Keywords:

random forest; landslide; susceptibility analysis; Muli County

1. Introduction

Landslides are among the most frequent geological hazards in southwestern Sichuan. They are highly hazardous, highly destructive, and widely distributed, and thus seriously threaten human life and property [1]. Muli County, Sichuan, has mountainous terrain, rich stratigraphy, complex structure, and a low overall degree of exploitation. However, the low-elevation land along both sides of the rivers is relatively flat, which encourages road construction and riverfront settlement. Under natural and external forces, the riverbanks easily form unstable slopes, which provide conditions for landslides to develop. By the end of 2021, 313 landslides had been recorded in Muli County: 19 large, 134 medium, and 160 small. These landslides are unpredictable in their occurrence, highly destructive, and threaten a large number of people and properties. Therefore, it is of great significance to explore their spatial distribution characteristics, disaster mechanism, and vulnerability index to assist regional disaster early warning work.

Research on landslide susceptibility can be divided into three types: qualitative analysis, semi-quantitative analysis, and quantitative analysis [2]. Qualitative analysis is highly subjective and strongly influenced by expert opinion [3]. In recent years, studies have shown that quantitative and semi-quantitative methods are the most commonly used in landslide susceptibility research [4,5,6,7]. Semi-quantitative methods include the Analytic Hierarchy Process (AHP), the Analytic Network Process (ANP), and the Fuzzy Analytic Hierarchy Process (FAHP), which are mainly applied in the fields of landslide susceptibility and risk [8,9,10]. The purpose of quantitative analysis is to determine the quantitative relationship between landslides and their causative factors; therefore, mathematical statistics and mathematical models are being applied more and more extensively in this field. Furthermore, with the development of computer technology, machine learning is being adopted by more and more researchers [11,12,13,14], and its algorithms have been widely used in many fields [15]. Machine learning (1) can handle abundant data and very high-dimensional spatial data sets, and (2) can achieve accurate classification and prediction [16]. Machine learning models include Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), Artificial Neural Network (ANN), and Convolutional Neural Network (CNN); these have been applied to many types of landslide evaluation (susceptibility, hazard, risk, sensitivity, etc.) [17]. Some studies also compare the performance of several algorithms [18,19,20,21]. Previous studies indicate that RF, a typical bagging method, has the advantages of high stability, ease of use, low time cost, and high accuracy [22,23]. Moreover, because the random forest algorithm builds its model from randomly selected samples and features, it has low sensitivity to noise and outliers and does not easily overfit [24] (Table 1).

Before using the model, it is necessary to identify the disaster-causing factors of landslides in the study area. Most studies show that geological environment factors are the most important factors in landslide susceptibility evaluation, while rainfall and human engineering activities are external conditions that promote the development of geological disasters. Traditional landslide susceptibility studies are mainly performed via the weighted superposition of raster files. Preprocessing these raster files involves the subjective determination of factor classification and weight [25,26], which may affect the objectivity of the model’s results. In this study, the raster files are instead converted into point data sets, and the landslide and non-landslide point sets are delineated through field investigation. The landslide susceptibility prediction model is obtained by training and validating on the selected point sets; the model is then used to evaluate the susceptibility of the whole study area and to output the model accuracy and factor weights. In this way, the disadvantages of traditional landslide susceptibility evaluation can be addressed.

Based on the landslide and non-landslide point sets, this paper uses Python language to build a random forest model, establish a landslide susceptibility prediction model, evaluate the landslide susceptibility of Muli County, output the importance ranking of factors, and compile a landslide susceptibility classification chart. The results can provide some basis for the disaster prevention and mitigation work of the local government.

2. Data and Methods

2.1. Overview of the Study Area

Muli County is located in the southwest of Sichuan Province and the northwest of Liangshan Yi Autonomous Prefecture, between 100°03′20″ and 101°40′00″ E and between 27°40′30″ and 29°10′20″ N. The county mainly comprises three kinds of landforms: mountain plain landforms in the northwest, deep-cut mountain landforms in the southeast, and high mountains with deep-cut landforms in the southwest. The three major rivers in the county (the Yalong, Litang, and Shuiluo Rivers) belong to the Jinsha River system. Tributaries on both sides of these rivers are dense and dendritic, and the valleys are deep and narrow with a “V” shape.

Muli County is located at the junction of the Yangtze Quasi-Terrace and the Songpan Ganzi Fold System; it is at the eastern edge of the Ganzi-Muli Earth Backslope and is bordered on the southeast by the Kang Dian ancient land and Yan Yuan, Lijiang Terrace at the edge of the depression area, and on the northwest by Songpan, the Ganzi Trough Fold Zone and the Yidun, Dacheng Fold Zone. Most of the county is in the western Sichuan Trough, i.e., the eastern edge of the Ganzi Trough, and faces the eastern Sichuan Trough (i.e., Yangtze Quasi-Terrane) to the southeast across the Jinping Mountain Fault and the Xiaojin River Fault. Since the Tertiary period, the Himalayan movement has caused a strong uplift of the crust, making most of Muli County a part of the Tibetan Plateau. The strong folding action of the Indosinian movement in the Triassic period is the basis of the complex present-day tectonics. As a result of a series of north–south folds and large, deep fractures, the recent uplift of the complex backslope in Muli has been much greater than that of the Kang-Dian ancient land to the southeast, thus determining the contemporary topography, i.e., high in the northwest and low in the southeast, with terrain spreading north–south (Figure 1).

2.2. Data

2.2.1. Data Selection Basis

In the evaluation of landslide susceptibility, the correct identification of landslide disaster-causing factors is the first step. According to the results of field surveys of landslide geological disasters in Muli County, combined with previous research results, we selected 10 factors: elevation, slope, aspect, stratigraphic lithology, distance from the road, distance from the river, distance from the fault, land use cover type, normalized difference vegetation index (NDVI), and average annual precipitation [27,28,29].

(1): Elevation

Elevation is a basic index reflecting topography [30] (Figure 2a), and it plays an indirect role in the formation and development of geological hazards [31]. The first consideration is groundwater action. Many studies have pointed out a correlation between groundwater and geological hazards [32]. Muli County is mountainous, and with increasing elevation, groundwater has less impact on the likelihood of geological disasters. The second consideration is the scope and intensity of human activities. Generally, human activities mostly occur in areas of lower altitude and flatter terrain, so the higher the altitude, the smaller the scope of human activities and the lower their intensity.

(2): Slope

Slope is an important factor controlling the occurrence of geological hazards [33,34] (Figure 2b). Slope is not only related to human activities, loose material accumulation, and slope stress, but it also closely correlates with hydrological and lithological factors [35]. The slope provides a critical surface for the body of the disaster, controls the friction impacting the landslide’s surface, and affects the probability of slope displacement [36].

(3): Aspect

The duration of sunshine and the intensity of solar radiation vary with slope aspect (Figure 2c). Generally speaking, sunny slopes receive longer sunshine and stronger solar radiation, which leads to better vegetation development, faster weathering of rocks, and looser geotechnical structures. These processes affect slope surface runoff, the physical properties of the slope material, and the infiltration capacity of rainfall, thus controlling the stability of the slope.

(4): Stratigraphic lithology

There is a strong relationship between the occurrence of geological hazards and the lithology of the strata [37,38,39,40,41] (Figure 2d). On the one hand, lithology affects the development of joint fissures. On the other hand, contrasts between the overlying and underlying rocks give rise to nonconformable surfaces, which are conducive to geological hazards. Moreover, different lithologies show different degrees of resistance to deformation, and it is generally believed that the stronger the resistance to deformation, the lower the probability of geological hazards.

(5): Distance from the road

Field investigations found that the construction of roads and residential land alters the stability of a slope, and roads have become a key factor affecting the development of landslide geological hazards [42]. The closer an area is to a road, the greater the probability of a landslide geological hazard (Figure 2e).

(6): Distance from rivers

Not only do rivers have a shaping effect on riverbanks, but the distribution and density of rivers also determine the infiltration capacity of regional soils and their water contents (Figure 2f), which thus affect the probability of landslide occurrence. Studies have shown that soil moisture content is closely related to the development of geological hazards [43]. Therefore, distance from rivers becomes a key factor in analyzing vulnerability, danger, and the risk of landslides [44].

(7): Distance from fault

There are two reasons for identifying distance from a fault as a key factor in analyzing landslide susceptibility [28]. First, earthquakes usually occur along faults, and when an earthquake occurs, its force makes slopes less stable [45]; it is generally believed that the closer a slope is to a fault, the greater the probability of landslides. Second, faults generally develop within fracture zones of a certain width, where the nearby joints and fissures are more developed, which increases the probability of landslide geological hazards [39] (Figure 2g).

(8): Land use cover type

Land use cover types refer to the different forms of human modification of geotechnical bodies; these modifications destroy the natural environment’s ability to self-regulate to a certain extent [27]. For example, the construction of winding mountain roads reduces the infiltration of rainwater into the soil, which leads to the gathering of rainwater and the scouring of slopes; this causes soil erosion and can even trigger geological disasters. Therefore, it is generally believed that the more fragile the land type, the greater the probability of landslides (Figure 2h).

(9): Normalized Difference Vegetation Index (NDVI)

The NDVI reflects the development status of surface vegetation. The higher the NDVI, the better the development of the surface vegetation, the greater the biomass, and the stronger the ability of the area to prevent wind damage, fix the soil, and resist the scouring of the surface by rainwater [37]. It is generally believed that the higher the NDVI, the lower the probability of geological disasters [41] (Figure 2i).

(10): Average annual precipitation

Rainfall is one of the factors predisposing an area to geological hazards (Figure 2j) [46]. A large number of studies have shown that geological hazards mostly occur in the high-rainfall season [47]. Muli County is situated in southwest Sichuan, where rainfall is concentrated from May to October, and especially from June to August.

2.2.2. Data Sources and Processing

First, the data collected on the 10 factors above (see Table 2 for data sources) were projected, calibrated, and resampled. They were then unified to a raster resolution of 50 m × 50 m and converted into point files. Second, the 313 landslides identified in the field survey results of the 2021 Muli County Geological Hazard Risk Assessment Project were mapped (Figure 3), and 352 non-landslide areas were delineated uniformly and randomly across the study area. Finally, the values of the 10 factors at the points within the landslide and non-landslide areas were extracted and randomly divided into a training set and a validation set at a ratio of 7:3.
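The random 7:3 division described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `split_points`, the seed, and the toy label list are assumptions made for the example (the script in Appendix A uses `train_test_split` from scikit-learn for the same purpose, and the real point sets contain many points per delineated area).

```python
import random

def split_points(samples, test_ratio=0.3, seed=1):
    """Shuffle the samples and split them into training and validation lists."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(samples)       # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

# Toy stand-in: one label per delineated area (1 = landslide, 0 = non-landslide),
# using the counts of areas reported in the text.
labels = [1] * 313 + [0] * 352
train, valid = split_points(labels)
print(len(train), len(valid))  # prints: 466 199
```

Fixing the seed keeps the split reproducible between runs, which matters when the reported accuracy is compared across experiments.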

2.3. Random Forest Model Establishment

2.3.1. Random Forest Modeling Principles

The random forest model is one of the most commonly used ensemble algorithms. It draws multiple samples from the original data via bootstrap sampling (repeated sampling with replacement) and constructs a decision tree for each sample. It then combines these decision trees, with each tree casting one vote, to achieve classification and prediction. The random forest algorithm has many advantages, the most important of which is that, with large amounts of sample data and many features, it generates fewer errors and is less prone to overfitting. Additionally, RF is widely used and effective for ranking the importance of variables. The flow of the random forest model is shown in Figure 4.
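The two ingredients named above, sampling with replacement and voting, can be illustrated with a short sketch. It is not the scikit-learn implementation used in this study; the helper names `bootstrap_sample` and `majority_vote`, the ten sample indices, and the five votes are all hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap sample for one tree)."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(votes):
    """Return the class predicted by the largest number of trees."""
    return Counter(votes).most_common(1)[0][0]

rng = random.Random(0)
rows = list(range(10))                        # hypothetical sample indices
sample = bootstrap_sample(rows, rng)          # same size as rows, duplicates allowed
out_of_bag = sorted(set(rows) - set(sample))  # rows this tree never saw

# Hypothetical votes from five trees for a single grid point (1 = landslide)
print(majority_vote([1, 0, 1, 1, 0]))  # prints: 1
```

The rows left out of a tree's bootstrap sample form its out-of-bag (OOB) set, which is what the factor importance measure in Section 2.3.4 relies on.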

2.3.2. Model Establishment

Model construction is an important step in landslide susceptibility assessment, and the construction process used here is shown in Figure 5. Following the field survey, two tasks were completed. First, we determined the main factors of landslide development: 10 factors, such as elevation and slope, were selected as the main ones affecting landslide development in the study area, and their attributes were extracted to establish the point data sets. Second, the landslide and non-landslide areas were delineated, and all points within the delineated ranges were extracted. Next, the landslide and non-landslide point sets were randomly divided into a training set and a test set at a ratio of 7:3. The 70% training samples were used to build the random forest model with the Python sklearn library, and the remaining 30% test samples were used for model testing. Finally, all the data from the study area were input into the model to yield prediction results, factor importance, and model accuracy. The code is provided in Appendix A.

2.3.3. Precision Analysis of Random Forest Model

After the model’s training was completed, the test set was used as the input to yield prediction results, and the model was evaluated by comparing these with the real results. In this paper, four indexes (ACC, Precision, Recall, and F1) are used to evaluate the performance of the model. ACC is the proportion of all points that are classified correctly; Precision is the proportion of points predicted as landslide that are truly landslide points; Recall is the proportion of true landslide points that the model identifies; and F1 is the harmonic mean of Precision and Recall. Generally, higher values of all four indexes indicate better performance.
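For reference, the four indexes can be computed from the counts of a binary confusion matrix as in the sketch below. The function name `binary_metrics` and the counts are hypothetical and are not results from this study. Note also that Appendix A calls the scikit-learn metric functions with `average = 'macro'`, which averages the per-class values over both classes, whereas this sketch shows the definitions for the landslide class only.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted landslides that are real
    recall = tp / (tp + fn)      # share of real landslides that are found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts, not taken from the paper
acc, prec, rec, f1 = binary_metrics(tp=90, fp=10, fn=5, tn=95)
print(f"ACC={acc:.4f} Precision={prec:.4f} Recall={rec:.4f} F1={f1:.4f}")
```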

2.3.4. Factor Importance Measure

This study used the out-of-bag (OOB) ranking function of the random forest algorithm to assess the importance of each factor in a landslide event. First, for each decision tree, the corresponding OOB was selected to calculate the error, denoted as errOOB₁; second, noise interference was randomly added to feature X of all samples of the OOB, and the OOB error was calculated again, denoted as errOOB₂. Assuming N trees in the forest, the importance of feature X is given by:

\mathrm{importance}(X) = \frac{1}{N} \sum_{i=1}^{N} \left| \mathrm{errOOB}_{2,i} - \mathrm{errOOB}_{1,i} \right|

This value is used for feature importance because, if the OOB accuracy decreases sharply after random noise is added to a feature, that feature has a large impact on the prediction of the samples and is therefore of high importance.
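The measure can be illustrated for a single tree with the sketch below, in which the "noise" is a random shuffle of one feature column. Everything here is an assumption for illustration: the stand-in `predict` function replaces a trained tree, the 200 random rows replace its OOB samples, and the helper names are hypothetical. For a full forest, the per-tree values would be averaged over all N trees as in the formula. Note that the script in Appendix A reads `feature_importances_`, which in scikit-learn is the impurity-based importance, so this sketch illustrates the formula rather than the appendix code.

```python
import random

def error_rate(predict, rows, labels):
    """Fraction of rows that the model misclassifies."""
    wrong = sum(predict(r) != y for r, y in zip(rows, labels))
    return wrong / len(rows)

def permutation_importance(predict, rows, labels, feature, rng):
    """|errOOB2 - errOOB1| after shuffling one feature column."""
    err1 = error_rate(predict, rows, labels)
    column = [r[feature] for r in rows]
    rng.shuffle(column)          # the shuffle plays the role of the added noise
    noisy = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
    err2 = error_rate(predict, noisy, labels)
    return abs(err2 - err1)

# Hypothetical stand-in for one trained tree: it looks only at feature 0.
def predict(row):
    return int(row[0] > 0.5)

rng = random.Random(1)
rows = [[rng.random(), rng.random()] for _ in range(200)]  # stand-in OOB rows
labels = [predict(r) for r in rows]

imp_used = permutation_importance(predict, rows, labels, 0, rng)
imp_unused = permutation_importance(predict, rows, labels, 1, rng)
print(imp_used, imp_unused)
```

Shuffling the feature that the stand-in model relies on raises its error, while shuffling the ignored feature leaves the error unchanged, which is the behavior the importance formula is designed to capture.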

3. Results

3.1. Prediction Results of Landslide Susceptibility

The output landslide probabilities all lie between 0 and 1. The prediction results were classified into five levels according to the natural breaks (Jenks) method [2]: very low susceptibility (0–0.1), low susceptibility (0.1–0.3), medium susceptibility (0.3–0.55), high susceptibility (0.55–0.8), and very high susceptibility (0.8–1). The predicted landslide susceptibility of Muli County is mapped in Figure 6, and the data statistics are shown in Table 3.
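Assigning a predicted probability to one of these five levels can be sketched as follows. The break values are those quoted above; the names `BREAKS`, `LEVELS`, and `susceptibility_level` are hypothetical, and the treatment of a probability that falls exactly on a break (assigned here to the higher level) is an assumption, since the text does not state it.

```python
from bisect import bisect_right

BREAKS = [0.1, 0.3, 0.55, 0.8]   # class boundaries quoted in the text
LEVELS = ["very low", "low", "medium", "high", "very high"]

def susceptibility_level(probability):
    """Map a predicted landslide probability in [0, 1] to one of five levels."""
    # Assumption: a value exactly on a break goes to the higher level.
    return LEVELS[bisect_right(BREAKS, probability)]

for p in (0.05, 0.3, 0.62, 0.95):
    print(p, susceptibility_level(p))
```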

From Figure 6 and Table 3, it can be seen that the very high susceptibility area is mainly distributed along both sides of the three major rivers in the county. It covers about 1078.98 km², or 8.14% of the total area of the county, and contains 93.61% of all landslides. The high and medium susceptibility areas are mainly distributed around the very high susceptibility area, also along the rivers; they cover about 1906.7 km², or 14.38% of the total area of the county, and contain 6.39% of all landslides. The low and very low susceptibility areas are mainly distributed in mountainous regions with high elevation, and no landslides have developed there.

3.2. Factor Contribution Rate Output Results

The factor weights output by the established random forest model are shown in Figure 7. Of the 10 factors that contribute to landslide development, elevation, distance from the road, and average annual rainfall are the most important, with a combined importance of 67.54%. Distance from the fault, distance from the river, and aspect are the next most important factors, while slope, land use type, vegetation index, and stratigraphic lithology contribute the least to the occurrence of landslides.

3.3. Model Accuracy Evaluation

The test set was used as the model input, and the predictions were compared with the real results to compute the accuracy evaluation indexes. The accuracy of the model is 99.43%, its Precision is 99.3%, its Recall is 99.48%, and its F1 is 99.39%. All four indexes are high, indicating the overall accuracy and reliability of the model.

4. Discussion

4.1. Limitations of Landslide Data

This study used a statistical table of landslide information for 2021. The table only records landslide points that threatened people and properties in 2021; landslide points that do not threaten people, or that have been treated by engineering measures, are not included. New landslides may also have occurred in the year since data collection that are not included in the table. Therefore, a statistical table of landslide information for a single year cannot be considered comprehensive. However, collecting new landslide data has remained a bottleneck over the years, and obtaining more comprehensive landslide data is an important challenge in the compilation of landslide susceptibility, hazard, and risk assessment maps [49].

4.2. Differences in the Importance of Disaster-Causing Factors

It is normal for the importance ranking of the disaster-causing factors to vary between studies, because geological and environmental conditions, the intensity of human engineering activities, and rainfall intensity differ between study areas. For example, Mingyong Liao et al. (2022) found that rainfall, elevation, and lithology have significant effects on landslide occurrence when studying the influence of raster resolution on landslide susceptibility assessments [37]; Mohammed Amin Benbouras (2022) used a neural network algorithm for landslide susceptibility assessment and found that distance from a river is the most important factor affecting landslide susceptibility, while lithology and distance from the road also have some influence [50]. However, these studies are consistent in their general conclusions: they find that topography, rainfall, distance from the road, and slope are the main factors that induce landslide geological hazards [51], which is basically consistent with the results of this study.

Remote sensing image interpretation and field investigations of landslides in Muli County showed that the county’s topography is complex and mountainous and that its overall degree of possible exploitation is low. Settlements and road construction sites are primarily located along the rivers, where elevation is relatively low, the terrain is relatively flat, and exploitability is relatively high. These areas easily form unstable slopes and are susceptible to landslides. In addition, when the rainfall intensity increases, the saturation of the soil causes instability in the slip zone.

In conclusion, landforms, human activities, and rainfall are the main factors affecting landslides in this area.

4.3. Wide Applicability and Limitations of Landslide Susceptibility Model

The landslide susceptibility evaluation method proposed in this study uses a point data set from areas surrounding landslide and non-landslide events for training and, in this way, estimates the landslide susceptibility probability in the study area. This approach not only avoids the two steps of factor grading and subjective weight determination that are necessary for traditional evaluation, but it also reduces the calculation requirement. Moreover, the model produces its results without manual intervention, which improves their objectivity. Because it is based on a random forest model, the method has the advantages of high stability, ease of use, low time cost, and high precision [22,23], and its use can be extended to other research areas or other types of geological disaster assessment [24].

At the same time, this model has some limitations. First, it is affected by the limitations of the landslide data. Second, delineating the scope of each landslide requires field investigations to ensure the accuracy of the delineated range and, in turn, of the model’s training. However, in study areas where many landslides have occurred, field investigations require much manpower and financial resources, which makes it difficult to apply this method broadly.

5. Conclusions

In this study, 10 factors covering geological environmental conditions, human engineering activities, and rainfall conditions were integrated, and an efficient, rapid, and accurate method for landslide susceptibility analysis based on machine learning was established. This method can provide technical support for geological hazard assessment, and its results can also guide the local government’s disaster prevention and reduction work.

The statistical results regarding the importance of factors show that the influence of road construction on landslide occurrence is greater than that of rainfall. Poorly planned road construction weakens the infiltration of rainwater into the soil, which makes runoff more likely; the runoff scours the soil and can eventually destabilize the slope and trigger a landslide. It is suggested that, in highway construction and urban planning, damage to slopes be remedied after development so as to reduce the occurrence of landslides as much as possible.

Author Contributions

Conceptualization, X.W. and G.K.; methodology, W.C.; software, X.W. and G.K.; validation, W.C., Y.S. and R.Q.; formal analysis, J.W.; investigation, Z.W. and G.K.; resources, X.W.; data curation, X.W.; writing—original draft preparation, X.W. and W.C.; writing—review and editing, G.K. and P.L.; visualization, H.C.; supervision, G.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The images and geohazard data used in this paper are labeled in Table 2 and all other data used in this article came from published sources listed in the references.

Acknowledgments

We thank Sichuan Geological Survey Institute for its support. This study is part of the research activities conducted by the first author during the execution of the project at Sichuan Geological Survey Institute. The authors express sincere thanks to three anonymous reviewers for critical evaluation and constructive suggestions to improve the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

from sklearn.ensemble import RandomForestClassifier

from sklearn.model_selection import train_test_split

from sklearn.model_selection import cross_val_score

import numpy as np

from sklearn.metrics import f1_score, precision_score, recall_score

from sklearn.metrics import roc_curve, auc

def draw_auc(y,predicted_probs):

predicted_probs_0 = []

predicted_probs_1 = []

for i in range(len(predicted_probs)):

predicted_probs_0.append(predicted_probs[i][0])

predicted_probs_1.append(predicted_probs[i][1])

fpr, tpr, thresholds = roc_curve(y, predicted_probs_1, pos_label = 1)

print(fpr, tpr)

auc_num = auc(fpr, tpr)

plt.xlim(-1, 2)

plt.ylim(-1, 2)

plt.plot(fpr, tpr, label = ‘LightGBM (AUC = {:.3f})’.format(auc_num))

plt.plot([0, 1], [0, 1], ‘k--’)

plt.xlabel(‘False positive rate’)

plt.ylabel(‘True positive rate’)

plt.title(‘ROC curve ‘)

plt.legend(loc = ‘best’)

plt.show()

plt.close()

data = pd.read_csv(‘TRAIN_DATA.csv’)

data_dropna = data.dropna(how = ‘any’)

train_parm_list = [‘NDVI’,’Slope’,’Elevation’,’Rainfall’,’Rock’,’Landcover’,’Aspect’,’Fault’,’River’,’Road’]

TrainData_x = data_dropna.loc[:,train_parm_list]

TrainData_y = data_dropna.loc[:,[‘result’]].values.ravel()

TrainData_x_train,TrainData_x_train_test,TrainData_y_train,TrainData_y_test = train_test_split(TrainData_x,TrainData_y,test_size = 0.3,random_state = 1)

forest = RandomForestClassifier(random_state = 1,n_jobs = -1)

forest.fit(TrainData_x_train, TrainData_y_train)

quanzhong = forest.feature_importances_

print(‘Total weights: %.1f ‘%sum(quanzhong))

for i in range(len(quanzhong)):

add_str = ‘ ‘

print(‘Element:’,train_parm_list[i],add_str [0:10-len(train_parm_list[i])],’ Weight:%.4f ‘%quanzhong[i])

plt.figure()

plt.rcParams[‘font.sans-serif’] = [‘SimHei’]

plt.rcParams[‘axes.unicode_minus’] = False

plt.bar(train_parm_list,quanzhong)

plt.xlabel(‘Impact Factor’)

plt.ylabel(‘ Weight ‘)

plt.title(‘ Impact factor weighting chart ‘)

plt.show()

plt.close()

TrainData_x_train_predict = forest.predict(TrainData_x_train)

predicted_probs = forest.predict_proba(TrainData_x_train)

a = forest.score(TrainData_x_train,TrainData_y_train)

f1 = f1_score(TrainData_y_train,TrainData_x_train_predict, average = ‘macro’)

p = precision_score(TrainData_y_train,TrainData_x_train_predict, average = ‘macro’)

r = recall_score(TrainData_y_train,TrainData_x_train_predict, average = ‘macro’)

print(‘\nTraining Set—Accuracy:%.4f ‘%a,’ Accuracy rate:%.4f’%p,’ Recall Rate:%.4f ‘%r,’F1:%.4f ‘%f1)

draw_auc(TrainData_y_train, predicted_probs)

# Evaluate on the held-out test set
TrainData_x_train_test_predict = forest.predict(TrainData_x_train_test)
predicted_probs = forest.predict_proba(TrainData_x_train_test)
a = forest.score(TrainData_x_train_test, TrainData_y_test)  # accuracy
f1 = f1_score(TrainData_y_test, TrainData_x_train_test_predict, average='macro')
p = precision_score(TrainData_y_test, TrainData_x_train_test_predict, average='macro')
r = recall_score(TrainData_y_test, TrainData_x_train_test_predict, average='macro')
print('Test Set: Accuracy:%.4f' % a, ' Precision:%.4f' % p, ' Recall:%.4f' % r, ' F1:%.4f' % f1, '\n')
draw_auc(TrainData_y_test, predicted_probs)
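
The `average = 'macro'` setting used in the listing above computes precision, recall, and F1 separately for each class and then takes their unweighted mean. The following is a minimal pure-Python sketch of that calculation on made-up labels; the function name `macro_scores` and the toy data are illustrative assumptions and are not part of the authors' code.

```python
def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical labels: 1 = landslide sample, 0 = non-landslide sample
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

acc, prec, rec, f1 = macro_scores(y_true, y_pred)
print('Accuracy:%.4f Precision:%.4f Recall:%.4f F1:%.4f' % (acc, prec, rec, f1))
```

Because each class contributes equally to a macro average, a model that performs poorly on the smaller class is penalized even when overall accuracy is high.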

References

Bai, Z.; Liu, Q.; Liu, Y. Landslide Susceptibility Evaluation Based on Coupling of Entropy Index and Random Forest. Yangtze River 2022, 53, 95–102.
Panchal, S.; Shrivastava, A.K. Landslide hazard assessment using analytic hierarchy process (AHP): A case study of National Highway 5 in India. Ain Shams Eng. J. 2022, 13, 101626.
Mandal, B.; Mandal, S. Analytical hierarchy process (AHP) based landslide susceptibility mapping of Lish river basin of eastern Darjeeling Himalaya, India. Adv. Space Res. 2018, 62, 3114–3132.
Ali, S.A.; Farhana, P.; Jana, V.; Romulus, C.; Nguyen, T.; Quoc, B.; Matej, V.; Ljubomir, G.; Ateeque, A.; Mohammad, A. GIS-based landslide susceptibility modeling: A comparison between fuzzy multi-criteria and machine learning algorithms. Geosci. Front. 2021, 12, 857–876.
Neaupane, K.M.; Piantanakulchai, M. Analytic network process model for landslide hazard zonation. Eng. Geol. 2006, 85, 281–294.
Phuong, T.; Mahdi, P.; Khabat, K.; Omid, G.; Narges, K.; Artemi, C.; Saro, L. Evaluation of deep learning algorithms for national scale landslide susceptibility mapping of Iran. Geosci. Front. 2021, 12, 505–519.
Sun, D.; Xu, J.; Wen, H.; Wang, D. Assessment of landslide susceptibility mapping based on Bayesian hyperparameter optimization: A comparison between logistic regression and random forest. Eng. Geol. 2021, 281, 105972.
Kayastha, P.; Dhital, M.R.; de Smedt, F. Application of the analytical hierarchy process (AHP) for landslide susceptibility mapping: A case study from the Tinau watershed, west Nepal. Comput. Geosci. 2013, 52, 398–408.
Irjesh, S.; Jayant, N.; Anil, K. Landslide susceptibility zonation using geospatial technique and analytical hierarchy process in Sikkim Himalaya. Quat. Sci. Adv. 2021, 4, 100039.
Hassan, A.; Bakhtiar, F. An integrated approach of analytical network process and fuzzy based spatial decision making systems applied to landslide risk mapping. J. Afr. Earth Sci. 2017, 133, 15–24.
Fahri, A.; Şerif, B. Data poisoning attacks against machine learning algorithms. Expert Syst. Appl. 2022, 208, 118101.
Anne, E.; Geert, A.; Anne, E.; Abigail, C.; Joost, W.; Charles, M.; Job, N.; Andrew, D.; Carel, J.C.G.; Alasdair, G.; et al. A Machine Learning Algorithm to Estimate the Probability of a True Scaphoid Fracture After Wrist Trauma. J. Hand Surg. 2022, 47, 709–718.
Zhu, M.; Wang, J.; Yang, X.; Zhang, Y.; Zhang, L.; Ren, H.; Wu, B.; Ye, L. A review of the application of machine learning in water quality evaluation. Eco-Environ. Health 2022, 1, 107–116.
Ali, S.; Hoseyn, S.; Cristina, C.; Mohammad, H.; Marco, P.; David, M.; Nader, K.; Mohammad, H.; Larry, K.B. Using machine learning in photovoltaics to create smarter and cleaner energy generation systems: A comprehensive review. J. Clean. Prod. 2022, 364, 132701.
Xu, H.; Sun, W.; Du, Y.; Wang, A. Survey on the Classic Machine Learning Algorithms and Their Applications. Comput. Knowl. Technol. 2020, 16, 17–19.
Wu, W.; Claudio, Z.; Fadi, K.; Liu, G. Enhancing the performance of regional land cover mapping. Int. J. Appl. Earth Obs. Geoinf. 2016, 52, 422–432.
Azarafza, M.; Azarafza, M.; Akgün, H.; Atkinson, P.M.; Derakhshani, R. Deep learning-based landslide susceptibility mapping. Sci. Rep. 2021, 11, 24112.
Chen, W.; Sun, Z.; Han, J. Landslide Susceptibility Modeling Using Integrated Ensemble Weights of Evidence with Logistic Regression and Random Forest Models. Appl. Sci. 2019, 9, 171.
Hong, H.; Pourghasemi, H.R.; Pourtaghi, Z.S. Landslide susceptibility assessment in Lianhua County (China): A comparison between a random forest data mining technique and bivariate and multivariate statistical models. Geomorphology 2016, 259, 105–118.
Trigila, A.; Iadanza, C.; Esposito, C.; Scarascia-Mugnozza, G. Comparison of Logistic Regression and Random Forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily, Italy). Geomorphology 2015, 249, 119–136.
Zhao, L.; Wu, X.; Niu, R.; Wang, Y.; Zhang, K. Using the rotation and random forest models of ensemble learning to predict landslide susceptibility. Geomat. Nat. Hazards Risk 2020, 11, 1542–1564.
Lin, G.W.; Hung, C.; Chang Chien, Y.F.; Chu, C.R.; Liu, C.H.; Chang, C.H.; Chen, H. Towards Automatic Landslide-Quake Identification Using a Random Forest Classifier. Appl. Sci. 2020, 10, 3670.
He, Q.; Wang, M.; Liu, K. Rapidly assessing earthquake-induced landslide susceptibility on a global scale using random forest. Geomorphology 2021, 391, 107889.
Duan, Y.; Tang, J.; Liu, Y.; Gao, X.; Duan, Y. Spatial sensitivity evaluation of loess landslide in Liulin County, Shanxi based on random forest. Sci. Geogr. Sin. 2022, 42, 343–351.
Wang, Y.; Sun, D.; Wen, H.; Zhang, H.; Zhang, F. Comparison of Random Forest Model and Frequency Ratio Model for Landslide Susceptibility Mapping (LSM) in Yunyang County (Chongqing, China). Int. J. Environ. Res. Public Health 2020, 17, 4206.
Ou, P.; Wu, W.; Qin, Y.; Zhou, X.; Huangfu, W.; Zhang, Y.; Xie, L.; Huang, X.; Fu, X.; Li, J.; et al. Assessment of Landslide Hazard in Jiangxi Using Geo-information. Front. Earth Sci. 2021, 9, 648342.
Abdullah, H.; Kanwarpreet, S.; Abhishek, S.; Shamshad, A.; Desh, D.; Shamshad, A.; Anwar, K.; Faris, M. Landslide susceptibility assessment in the Himalayan range based along Kasauli–Parwanoo road corridor using weight of evidence, information value, and frequency ratio. J. King Saud Univ. Sci. 2022, 34, 101759.
Cao, C.; Zhu, K.; Xu, P.; Shan, B.; Yang, G.; Song, S. Refined landslide susceptibility analysis based on InSAR technology and UAV multi-source data. J. Clean. Prod. 2022, 368, 133146.
Efiong, J.; Eni, D.I.; Obiefuna, J.N.; Etu, S.J. Geospatial modelling of landslide susceptibility in Cross River State of Nigeria. Sci. Afr. 2021, 14, e01032.
Liao, M.; Wen, H.; Yang, L. Identifying the essential conditioning factors of landslide susceptibility models under different grid resolutions using hybrid machine learning: A case of Wushan and Wuxi counties, China. CATENA 2022, 217, 106428.
Meng, T.; Xu, X.; Liu, H. Landslide risk assessment in high altitude areas based on slope unit optimization: Taking the Baige landslide in Jinsha River as an example. J. Henan Polytech. Univ. 2021, 40, 65–73.
Ke, X. Analysis of the effects of groundwater-induced geohazards. Ind. Technol. Forum 2021, 20, 41–42.
Sarma, C.P.; Dey, A.; Krishna, A.M. Influence of digital elevation models on the simulation of rainfall-induced landslides in the hillslopes of Guwahati, India. Eng. Geol. 2020, 268, 105523.
Zou, Y.; Qi, S.; Guo, G.; Zheng, B.; Zhan, Z.; He, N.; Huang, X.; Hou, X.; Liu, H. Factors controlling the spatial distribution of coseismic landslides triggered by the Mw 6.1 Ludian earthquake in China. Eng. Geol. 2022, 296, 106477.
Lu, Z.; Leng, Y.; Zhao, G. Analytical Hierarchy Process in Susceptibility Assessment of Landslide Hazards: A Case Study in the Pu’an County, Guizhou Province. Guizhou Geol. 2022, 39, 287–293.
Zou, Q.; Jiang, H.; Cui, P.; Zhou, B.; Jiang, Y.; Qin, M.; Liu, Y.; Li, C. A new approach to assess landslide susceptibility based on slope failure mechanisms. CATENA 2021, 204, 105388.
Chen, M.; Tang, C.; Li, M.; Xiong, J.; Luo, Y.; Shi, Q.; Zhang, X.; Tie, Y.; Feng, Q. Changes of surface recovery at coseismic landslides and their driving factors in the Wenchuan earthquake-affected area. CATENA 2022, 210, 105871.
Jiao, Q.; Jiang, W.; Qian, H.; Li, Q. Research on characteristics and failure mechanism of Guizhou Shuicheng landslide based on InSAR and UAV data. Nat. Hazards Res. 2022, 2, 17–24.
Yan, Y.; Yang, Z.; Zhang, X.; Meng, S.; Guo, C.; Wu, R.; Zhang, Y. Landslide Susceptibility Assessment Based on Weight-of-Evidence Modeling of the Batang Fault Zone, Eastern Tibetan Plateau. Geoscience 2021, 35, 26–37.
Shao, X.; Xu, C. Earthquake-induced landslides susceptibility assessment: A review of the state-of-the-art. Nat. Hazards Res. 2022, 2, 172–182.
Yao, W.; Li, C.; Zuo, Q.; Zhan, H.; Criss, R.E. Spatiotemporal deformation characteristics and triggering factors of Baijiabao landslide in Three Gorges Reservoir region, China. Geomorphology 2019, 343, 34–47.
Akinci, H. Assessment of rainfall-induced landslide susceptibility in Artvin, Turkey using machine learning techniques. J. Afr. Earth Sci. 2022, 191, 104535.
Xue, Q.; Zhang, M.; Gao, B. Hazard assessment of loess landslide based on soil moisture content and supported by slope unit in Qingjian City, Shaanxi Province. Geol. China 2020, 47, 1904–1914.
Salem, M.; Mohamed, E.L.; Mossad, M.; Mahanna, H. Random Forest modelling and evaluation of the performance of a full-scale subsurface constructed wetland plant in Egypt. Ain Shams Eng. J. 2022, 13, 101778.
Parzinger, M.; Hanfstaengl, L.; Sigg, F.; Spindler, U.; Wellisch, U.; Wirnsberger, M. Comparison of different training data sets from simulation and experimental measurement with artificial users for occupancy detection using machine learning methods Random Forest and LASSO. Build. Environ. 2022, 223, 109313.
Martins, T.F.; Seoane, J.C.S.; Tavares, F.M. Cu–Au exploration target generation in the eastern Carajás Mineral Province using random forest and multi-class index overlay mapping. J. South Am. Earth Sci. 2022, 116, 103790.
Li, H.; Lin, J.; Lei, X.; Wei, T. Compressive strength prediction of basalt fiber reinforced concrete via random forest algorithm. Mater. Today Commun. 2022, 30, 103117.
Yang, S.; Li, D.; Yan, L.; Huang, Y.; Wang, M. Landslide Susceptibility Assessment in High and Steep Bank Slopes along Wujiang River Based on Random Forest Model. Saf. Environ. Eng. 2021, 28, 131–138.
Nikoobakht, S.; Azarafza, M.; Akgün, H.; Derakhshani, R. Landslide susceptibility assessment by using convolutional neural network. Appl. Sci. 2022, 12, 5992.
Benbouras, M.A. Hybrid meta-heuristic machine learning methods applied to landslide susceptibility mapping in the Sahel-Algiers. Int. J. Sediment Res. 2022, 37, 601–618.
Zhang, Y.; Wu, W.; Qin, Y.; Lin, Z.; Zhang, G.; Chen, R.; Song, Y.; Lang, T.; Zhou, X.; Huangfu, W.; et al. Mapping Landslide Hazard Risk Using Random Forest Algorithm in Guixi, Jiangxi, China. ISPRS Int. J. Geo-Inf. 2020, 9, 695.

Figure 1. Study area location.

Figure 2. Factors causing landslides used in the study: (a) elevation, (b) slope, (c) aspect, (d) rock group, (e) road, (f) river, (g) fault, (h) land cover, (i) NDVI, (j) average annual rainfall.

Figure 3. Demonstration of landslide delineation work. (a–c) are typical landslides in the study area; (a1,b1,c1) are remote sensing images and (a2,b2,c2) are the corresponding UAV aerial images.

Figure 4. Random Forest Flow Chart [48].

Figure 5. Technology roadmap.

Figure 6. Risk prediction chart of landslide geological hazard.

Figure 7. Pie chart of factor contribution rate allocation.

Table 1. Advantages/disadvantages of landslide susceptibility assessment methods.

Evaluation Methodology	Method	Advantages/Disadvantages
Qualitative evaluation	Expert Scoring Method	Factor weights depend on the experience of the scoring experts and are highly subjective. When the amount of data is large, the weights are difficult to determine, the assessment is largely qualitative, and the results are unreliable.
Semi-quantitative evaluation	AHP	Handles problems that combine qualitative and quantitative components, but still requires expert input, and the matrix calculation is subject to error.
Semi-quantitative evaluation	ANP	Similar to AHP, but the matrix is more computationally intensive and has stronger internal dependencies.
Semi-quantitative evaluation	FAHP	Similar to AHP, but with higher decision reliability.
Quantitative evaluation based on machine learning	RF	Good performance; fast training; balances errors on imbalanced datasets; good resistance to overfitting. However, the inner workings of the model cannot be controlled.
Quantitative evaluation based on machine learning	SVM	Simplifies common problems such as classification and regression, but is difficult to apply to large training samples and to multi-class problems.
Quantitative evaluation based on machine learning	LR	Suitable for scenarios that need classification probabilities; computationally cheap and easy to implement; robust to small noise in the data. However, it underfits easily, its classification accuracy is not high, and it performs poorly when feature data are missing.
Quantitative evaluation based on machine learning	ANN	Good performance and high model accuracy. However, it is a "black box"; it is time-consuming and labor-intensive and requires a large amount of data.
Quantitative evaluation based on machine learning	CNN	Handles high-dimensional data; automatic feature extraction. However, when the network is too deep, the parameters update slowly, and the pooling layer loses a lot of valuable information.

Table 2. Data types and sources.

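No.	Data	Type	Source
1	Elevation	Raster data (50 m)	GDEMV3 30M (http://www.gscloud.cn/, accessed on 15 September 2022)
2	Slope	Raster data (50 m)	GDEMV3 30M (http://www.gscloud.cn/, accessed on 15 September 2022)
3	Aspect	Raster data (50 m)	GDEMV3 30M (http://www.gscloud.cn/, accessed on 15 September 2022)
4	Rock group	Shapefile	1:50,000 map of hazard-forming geological conditions (Sichuan Geological Survey)
5	Distance to road	Shapefile	1:50,000 map of hazard-forming geological conditions (Sichuan Geological Survey)
6	Distance to river	Shapefile	1:50,000 map of hazard-forming geological conditions (Sichuan Geological Survey)
7	Distance to faults	Shapefile	1:50,000 map of hazard-forming geological conditions (Sichuan Geological Survey)
8	Land use cover type	Shapefile	Muli County Third National Land Survey Results Data (Sichuan Geological Survey)
9	NDVI	Raster data (50 m)	Landsat 8 OLI_TIRS satellite data (http://www.gscloud.cn/, accessed on 15 September 2022)
10	Average annual precipitation	Raster data (50 m)	Interpolated from national average annual rainfall data (https://www.resdc.cn/, accessed on 15 September 2022)
11	Landslide table	Table	Sichuan Geological Survey
12	GF-1 images	Raster data (2 m)	Sichuan Geological Survey
13	UAV photos	JPEG	Sichuan Geological Survey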

Table 3. Area and number of landslides in each susceptibility zone.

Susceptibility Level	Area (km²)	Proportion of Area (%)	Number of Landslides	Proportion of Landslides (%)
Very low	8349.95	63.01	0	0
Low	1916.98	14.46	0	0
Moderate	1005.03	7.58	5	1.60
High	901.67	6.80	15	4.79
Very high	1078.98	8.14	295	93.61
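
As a rough cross-check of Table 3, the shares and the landslide density of each zone can be recomputed from the areas and landslide counts. The sketch below is illustrative only: the variable names and the density measure (landslides per 100 km²) are assumptions introduced here, not quantities reported by the authors, and recomputed shares can differ from the published percentages in the last digits.

```python
# (susceptibility level, area in km^2, number of landslides), copied from Table 3
zones = [
    ("Very low", 8349.95, 0),
    ("Low", 1916.98, 0),
    ("Moderate", 1005.03, 5),
    ("High", 901.67, 15),
    ("Very high", 1078.98, 295),
]

total_area = sum(area for _, area, _ in zones)
total_slides = sum(count for _, _, count in zones)

rows = []
for name, area, count in zones:
    area_share = 100 * area / total_area        # share of the county area, %
    slide_share = 100 * count / total_slides    # share of mapped landslides, %
    density = 100 * count / area                # landslides per 100 km^2 (illustrative measure)
    rows.append((name, area_share, slide_share, density))

print("%-10s %10s %12s %12s" % ("Level", "Area (%)", "Slides (%)", "Per 100 km2"))
for name, area_share, slide_share, density in rows:
    print("%-10s %10.2f %12.2f %12.2f" % (name, area_share, slide_share, density))
```

A density that rises steadily from the very low to the very high zone is what a well-behaved susceptibility zoning should show.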


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
