Article

Landslide Susceptibility Evaluation of Machine Learning Based on Information Volume and Frequency Ratio: A Case Study of Weixin County, China

1
Faculty of Land Resources Engineering, Kunming University of Science and Technology, Kunming 650093, China
2
Key Laboratory of Geospatial Information Integration Innovation for Smart Mines, Kunming 650093, China
3
Spatial Information Integration Technology of Natural Resources in Universities of Yunnan Province, Kunming 650211, China
*
Author to whom correspondence should be addressed.
Sensors 2023, 23(5), 2549; https://doi.org/10.3390/s23052549
Submission received: 5 December 2022 / Revised: 17 February 2023 / Accepted: 22 February 2023 / Published: 24 February 2023
(This article belongs to the Section Environmental Sensing)

Abstract

Landslides are among the most destructive natural disasters in the world, and the accurate modeling and prediction of landslide hazards are vital tools for landslide disaster prevention and control. The purpose of this study was to explore the application of coupling models in landslide susceptibility evaluation, using Weixin County as the research object. First, a landslide catalog database containing 345 landslides in the study area was constructed. Twelve environmental factors were selected, including terrain (elevation, slope, slope direction, plane curvature, and profile curvature), geological structure (stratigraphic lithology and distance from fault zone), meteorological hydrology (average annual rainfall and distance to rivers), and land cover (NDVI, land use, and distance to roads). Then, single models (logistic regression, support vector machine, and random forest) and coupled models (IV–LR, IV–SVM, IV–RF, FR–LR, FR–SVM, and FR–RF) based on information volume and frequency ratio were constructed, and the accuracy and reliability of the models were compared and analyzed. Finally, the influence of environmental factors on landslide susceptibility under the optimal model was discussed. The results showed that the prediction accuracy of the nine models ranged from 75.2% (LR model) to 94.9% (FR–RF model), and the accuracy of the coupled models was generally higher than that of the single models. Therefore, coupling could improve prediction accuracy to a certain extent. The FR–RF coupling model had the highest accuracy. Under the optimal model FR–RF, distance from the road, NDVI, and land use were the three most important environmental factors, accounting for 20.15%, 13.37%, and 9.69%, respectively. Therefore, Weixin County should strengthen the monitoring of mountains near roads and areas with sparse vegetation to prevent landslides caused by human activities and rainfall.

1. Introduction

Landslides refer to the movement of considerable rock, soil, or rock debris material along a slope, and they have been confirmed as one of the most devastating natural disasters worldwide [1,2]. In 2020, there were 4810 landslide disasters, 1797 rockslide disasters, 899 mudslide disasters, and 183 ground collapse disasters in China, thus resulting in 197 casualties. To be specific, landslide hazards account for over 60% of all geological hazards. The landslides and the secondary disasters have caused huge property damage and numerous casualties. This is especially true in mountainous areas that are characterized by complex geological environments, extreme weather, and the effect of human activities [3]. Many factors have been confirmed to contribute to the occurrence of landslides. Thus, analyzing the environmental factors for landslide occurrence, building a regional landslide susceptibility evaluation model, and evaluating the level of landslide susceptibility for landslide disaster prediction, prevention, and control, as well as land planning, are of great significance [4].
Landslide susceptibility evaluation methods have been primarily based on known landslide hazard data and GIS technology to construct landslide spatial prediction models for the quantitative analysis of potential landslides [5]. Scholars have evaluated landslide susceptibility in a variety of ways, and there are two main types of evaluation method. The first type is based on knowledge-driven qualitative analysis [6]. Qualitative analysis follows an in-depth analysis of regional landslide causation mechanisms, using expert theory, knowledge, and experience to select landslide factors in the region and determine their weights [7,8]. The results of the evaluation are therefore closely tied to the evaluator's experiential knowledge. This type primarily comprises the analytic hierarchy process [9], the expert scoring method [10], and the fuzzy comprehensive evaluation method [11,12,13]. The second type is based on data-driven quantitative analysis and comprises mathematical statistical models and machine learning models [14]. Mathematical statistical models are probability statistics or regression models built on the known landslide points; the entire study area is then analyzed in accordance with the relative weights of the factors [15,16,17]. This group mainly includes the information value model (IV) [18], the weight of evidence model (WOE) [19,20], the index of entropy (IOE) [21], the certainty factor (CF) [22,23,24], and the frequency ratio model (FR) [25]. With the increase in the volume of data, however, the complexity of topographic, geological, and hydrological elements cannot be fully resolved using simple mathematical and statistical methods of simulation and analysis.
A machine learning model builds a classifier from the properties of the existing training data and then uses that classifier to predict all remaining data. In general, this group comprises logistic regression models (LR) [26], back propagation neural network models (BPNN) [27], support vector machine models (SVM) [28], random forests (RF) [29], Bayesian network models (BN) [30], and decision tree (DT) models [31]. Although some of the models have been employed for landslide susceptibility mapping in specific areas, no model has been proposed that can be applied to all landslide conditions. Over the past few years, scholars have begun to study the application of hybrid models in landslide susceptibility evaluation to increase the accuracy of landslide prediction [22,32,33], and the evaluation method has shifted from single models to hybrid models. Hybrid models combine two or more models to integrate landslide sample selection, feature selection, and information extraction into landslide hazard prediction [18,34,35,36]. To be specific, the merits of the different models can complement each other to optimize the evaluation results and increase the prediction accuracy, and the resulting models can be applied to different geological conditions.
In the northeastern region of Yunnan Province, geological disasters (e.g., landslides and debris flows) occur frequently. Accordingly, landslide susceptibility evaluation should be conducted in this region to support landslide hazard warning, prevention, and mitigation. Most of the existing research on the evaluation of landslide susceptibility in northeastern Yunnan has used a single evaluation model [37,38], few studies have coupled statistical models with machine learning methods, and recommendations regarding landslide disaster prevention and control in the region remain scarce. In this study, Weixin County in northeastern Yunnan Province is taken as an example, and 12 environmental factors covering topography, geological structure, hydrological environment, and land cover are selected based on historical landslide hazards in the county. Two statistical models (an information volume model (IV) and a frequency ratio model (FR)) are coupled with three machine learning models (logistic regression (LR), support vector machine (SVM), and random forest (RF)). The effectiveness and applicability of the coupling models in the region are studied, and the prediction accuracy of the coupled models is compared with that of the single models. A susceptibility zoning map with high accuracy is generated, thus providing a reference for disaster management and planning in northeastern Yunnan Province.

2. Materials and Methods

2.1. Study Area

Weixin County lies in northeastern Yunnan Province and is part of Zhaotong City. It is located at the junction of Yunnan, Guizhou, and Sichuan provinces, between 104°41′15″ E and 105°18′45″ E and between 27°42′30″ N and 28°07′30″ N, and covers a land area of 1400 km². As depicted in Figure 1, the topographic elevation is low in the north and high in the south, with an elevation range of 480–1905 m, and the mountainous area accounts for 60% of the total area of the county. The county has a dense network of large and small rivers, and the Nanguang River, Baishui River, and Chishui River are the main water systems. The study area has a typical subtropical monsoon climate, with an average annual precipitation of 1076–1102 mm and an average annual temperature of 13.3 °C. The rainfall in Weixin County is mostly concentrated from May to October, accounting for 60–80% of the total annual rainfall. Geological and tectonic movements are strong, mainly in the Zangjiang Great Rift Zone, the Bijie Great Rift Zone, and the Chaomatian Rift Zone. The exposed strata in the study area primarily comprise the Middle–Upper Cambrian, Lower Middle Jurassic, Middle–Upper Permian, Terrace-phase Lower Middle Triassic, and Ordovician–Silurian strata; their lithology largely includes tuffs, mudstones, and sandstones with mud and gravel. The county's land cover is mainly forest land and arable land, accounting for 64% and 29.5% of the county's total area, respectively. Landslide disasters occur frequently in the county owing to its special topography, complex and variable climatic conditions, and abundant local rainfall.
Weixin County is one of the geological disaster-prone areas in Yunnan Province and one of the key areas for geological disaster prevention and control. In 2009, a major landslide occurred in Weixin County, resulting in 14 deaths and 12 missing persons and causing great damage to the lives and properties of residents. Landslide remote sensing images and field survey photos are shown in Figure 2.

2.2. Data Sources

2.2.1. Landslide Cataloging Data

This study used GIS for data collection and processing. Landslide inventories are an important prerequisite for landslide susceptibility analysis because it is assumed that past events have a strong influence on the future [39]. Thus, landslide inventory maps can provide useful information about the locations of previous landslides and may also identify areas where future landslides are likely to occur. In this study, a total of 345 landslide location points were collected in Weixin County from historical data, remote sensing and Google image interpretation, and field surveys to construct a landslide inventory database. The landslide data were then divided into a training dataset and a validation dataset for model building and validation, respectively. According to existing research [28,33,40,41], the dataset is usually divided into 70% for training and 30% for validation [42,43]. The study area is dominated by earthen landslides; most of the landslides are traction landslides, a few are pile-fall landslides, and most of them are tongue-shaped or semicircular. The thickness of the landslides ranges from 1 m to 10 m, with small and medium-sized landslides predominating.

2.2.2. Data Description of Environmental Factors

A landslide is one of the highly destructive natural hazards on the Earth's surface [44]. The selection of the causal factors for landslides is a vital task for landslide susceptibility modeling and mapping. The environmental factors that induce landslides mainly comprise topography, geology, hydrology, and human engineering activities. The effective selection of environmental factors lays the basis for landslide susceptibility modeling and significantly affects the reliability and accuracy of evaluation results. According to the historical landslide disaster occurrence in Weixin County and field survey data, environmental factors were selected from four groups: topography, geological structure, hydrology, and land cover. The selected environmental factors and their data sources are listed in Table 1. All factor layers were converted to raster cell data with a resolution of 30 m × 30 m using ArcGIS 10.2 software, and the environmental factors are presented in Figure 3.

3. Research Methodology

3.1. Research Technology Routes

The idea of this study was to couple two statistical methods (IV and FR) with three machine learning methods (LR, SVM, and RF) to build nine landslide susceptibility evaluation models, to conduct a comparative analysis of the performance of single and coupled models, and finally to analyze the contribution of environmental factors to landslide development under the optimal model. The procedures are presented in a flowchart as shown in Figure 4 and included the following steps:
Step 1 was to collect landslide hazard-related data in the study area, mainly including historical landslide hazard site data and environmental factors data.
Step 2 was to perform independence tests for environmental factors, mainly Pearson correlation coefficients and multicollinearity diagnostics.
Step 3 was to obtain the frequency ratio and information value of each environmental factor by the frequency ratio and information value methods, respectively, and then obtain the influence of each environmental factor on landslide development within each of its attribute intervals.
Step 4 was to divide the historical landslide hazard points, randomly select landslide and non-landslide points according to the ratio of 7:3, obtain training and validation sets, build nine landslide susceptibility prediction models (LR, SVM, RF, IV–LR, IV–SVM, IV–RF, FR–LR, FR–SVM, and FR–RF), and build landslide susceptibility maps based on GIS.
Step 5 was to analyze and compare the performance of the models based on the confusion matrix, ROC curves, and AUC values of the validation dataset to find the optimal model.
Step 6 was to discuss the significance of each environmental factor based on the optimal model and rank the contribution of each factor to obtain the important trigger factors for landslide susceptibility in the study area.
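
The 7:3 split in Step 4 can be illustrated with a minimal sketch. It assumes that each class (landslide and non-landslide) is shuffled and split separately and that the non-landslide sample has the same size as the landslide sample; neither detail is stated in the text, and the function name `split_samples` and the point IDs are hypothetical.

```python
import random


def split_samples(landslide_pts, non_landslide_pts, train_frac=0.7, seed=42):
    """Split each class separately so both sets keep the same class balance."""
    rng = random.Random(seed)
    train, valid = [], []
    for label, pts in ((1, landslide_pts), (0, non_landslide_pts)):
        shuffled = list(pts)
        rng.shuffle(shuffled)
        cut = int(train_frac * len(shuffled))  # size of the training share
        train += [(p, label) for p in shuffled[:cut]]
        valid += [(p, label) for p in shuffled[cut:]]
    return train, valid


# Hypothetical point IDs: 345 landslide points (as in the inventory) and an
# equal number of non-landslide points (the 1:1 ratio is an assumption).
train, valid = split_samples(range(345), range(1000, 1345))
print(len(train), len(valid))
```

Fixing the seed keeps the split reproducible, which matters when nine models are compared on the same validation set.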

3.2. Screening of Environmental Factors

Landslides are affected by a variety of factors (e.g., intrinsic and extrinsic factors), and there is a certain correlation between the factors. The extremely high correlation between factors will lead to problems (e.g., complexity of model operation and model overfitting). Accordingly, the correlation analysis between the evaluation factors should be conducted before establishing the evaluation model to eliminate the factors with high correlation to ensure the efficiency of the model operation and the rationality of the evaluation results [32,45]. Thus, the Pearson correlation coefficient (PCC), variance inflation factor (VIF), and tolerance level (TOL) were employed for independence tests.

3.2.1. Correlation Analysis of Factors

The Pearson correlation coefficient (PCC) is a statistical measure of the linear correlation between two variables X and Y, with a value between −1 and 1. In general, the correlation coefficient is expressed as r. When r > 0, the two variables are positively correlated; when r < 0, the two variables are negatively correlated; and when r = 0, the two variables are not linearly correlated. In general, only the absolute value of r is used to indicate the correlation level between two variables: |r| between 0 and 0.3 indicates no correlation, |r| between 0.3 and 0.6 indicates low correlation, |r| between 0.6 and 0.8 indicates moderate correlation, and |r| between 0.8 and 1.0 indicates high correlation [46]. The calculation equation is as follows:
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \tag{1}$$
where $r$ denotes the correlation coefficient, $X_i$ and $Y_i$ are the observation values of point $i$ corresponding to variables $X$ and $Y$, and $\bar{X}$ and $\bar{Y}$ denote the sample means of $X$ and $Y$, respectively.
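
As a minimal sketch of the Pearson calculation, the function below evaluates the coefficient for two factor columns sampled at the same points. The slope and elevation values are hypothetical and serve only to show the call.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: square root of the product of the squared deviations
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical factor values at five sample points
slope = [5.0, 12.0, 18.0, 25.0, 33.0]
elevation = [520.0, 640.0, 810.0, 905.0, 1100.0]
r = pearson_r(slope, elevation)
print(round(r, 3))
```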

3.2.2. Multicollinearity Tests

Multicollinearity refers to the existence of a linear relationship between explanatory variables in a multiple regression model. The variance inflation factor (VIF) and tolerance level (TOL) are commonly used to test for multicollinearity. VIF takes a value of at least 1, and TOL takes a value between 0 and 1. When VIF is higher than 2 or TOL is less than 0.5, there is strong multicollinearity between the factors; when VIF is between 1 and 2 or TOL is higher than 0.5, the multicollinearity between the factors is light [47]. The calculation is as follows:
$$\mathrm{VIF} = \frac{1}{1 - R_i^2} = \frac{1}{\mathrm{TOL}} \quad (i = 1, 2, 3, \ldots, k) \tag{2}$$
where $R_i^2$ denotes the coefficient of determination obtained when the independent variable $X_i$ is regressed on the remaining independent variables.
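
The VIF and TOL of every factor can be obtained in one pass. This sketch relies on the standard identity that the VIF of factor i equals the i-th diagonal element of the inverse of the factor correlation matrix, which is equivalent to regressing each factor on the others. It is an illustration rather than the SPSS procedure used in the study, and the factor values are hypothetical.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]  # zero here would mean perfectly collinear factors
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_and_tol(columns):
    """Return (VIF list, TOL list) for a list of factor columns."""
    k = len(columns)
    corr = [[pearson_r(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inv = invert(corr)
    vif = [inv[i][i] for i in range(k)]
    tol = [1.0 / v for v in vif]
    return vif, tol


# Hypothetical values of three factors at five sample points
factors = [[1, 2, 3, 4, 5], [2, 1, 4, 3, 5], [5, 3, 4, 1, 2]]
vif, tol = vif_and_tol(factors)
```

A factor would be flagged under the thresholds above when its VIF exceeds 2 (equivalently, when its TOL falls below 0.5).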

3.3. Processing of Factors

3.3.1. Frequency Ratio (FR)

The frequency ratio (FR) method is a bivariate statistical method adopted to explain the probabilistic relationship between the dependent and independent variables [8,16,48]. The frequency ratio expresses the correlation between the probability of landslide occurrence and the evaluation factors, and the effect of different levels of each factor on landslide occurrence (i.e., the contribution rate) is analyzed in accordance with the value of the frequency ratio [16,20]. In this study, the frequency ratio method was employed to quantitatively analyze the correlation between landslide distribution and the evaluation factors. The FR value reveals the contribution of each attribute interval of an environmental factor to landslide generation [20]: the larger the FR value, the more that interval contributes to landslide occurrence, and the smaller the FR value, the less likely landslides are to occur in the interval. The frequency ratio equation is as follows:
$$\mathrm{FR} = \frac{N_i / N}{S_i / S} \tag{3}$$
where $N_i$ denotes the number of landslide raster cells in interval $i$ of the evaluation factor, $N$ represents the number of landslide raster cells in the whole area, $S_i$ expresses the number of raster cells in interval $i$ of the evaluation factor, and $S$ denotes the number of raster cells in the whole area.
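
A minimal sketch of the frequency ratio calculation for one factor follows. The slope classes and cell counts are hypothetical and are not taken from Table 4.

```python
def frequency_ratio(landslide_cells, class_cells):
    """Return FR per class: (N_i / N) / (S_i / S)."""
    n_total = sum(landslide_cells.values())  # N: all landslide cells
    s_total = sum(class_cells.values())      # S: all cells in the study area
    return {
        cls: (landslide_cells.get(cls, 0) / n_total) / (class_cells[cls] / s_total)
        for cls in class_cells
    }


# Hypothetical counts for four slope classes
slope_landslides = {"0-10": 10, "10-20": 30, "20-30": 40, "30+": 20}
slope_cells = {"0-10": 4000, "10-20": 3000, "20-30": 2000, "30+": 1000}
fr = frequency_ratio(slope_landslides, slope_cells)
for cls, value in fr.items():
    print(cls, round(value, 2))
```

Classes with FR above 1 hold a larger share of landslide cells than of the total area. In a coupled model, the FR value of a cell's class replaces the raw factor value as the model input.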

3.3.2. Information Volume (IV)

The information volume model (IV) is a common statistical method originally used in the field of mineral resource census exploration [14,17], where the magnitude of information is adopted to quantitatively describe the contribution of each factor to regional mineralization. Subsequently, some scholars applied the information volume model to geological hazards [21]. In the evaluation of landslide susceptibility, the information value characterizing the occurrence of landslides within each attribute interval of an influence factor is obtained from the density of landslide occurrence based on historical landslide data [25]. The contribution of each factor attribute interval to landslide occurrence is judged by its information value, which is calculated as follows:
$$\mathrm{IV}(x_i, H) = \ln \frac{A_i / A}{B_i / B} \tag{4}$$
where $\mathrm{IV}(x_i, H)$ denotes the information value that interval $x_i$ of an indicator factor provides for the occurrence of landslide hazard $H$, $A_i$ expresses the number of landslide raster cells within interval $x_i$, $A$ is the total number of landslide raster cells in the study area, $B_i$ is the number of raster cells within interval $x_i$, and $B$ is the total number of raster cells in the study area. IV < 0 suggests that interval $x_i$ of the evaluation factor is unfavorable to landslide occurrence, whereas IV > 0 reveals that the interval is favorable to landslide occurrence.
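
Because the counts in the information value have the same meaning as those in the frequency ratio of Section 3.3.1, the information value of an interval is the natural logarithm of its frequency ratio. The short worked example below uses hypothetical shares of landslide cells and area to show how the sign of IV follows from FR being above or below 1.

```latex
% Identity, assuming A_i = N_i, A = N, B_i = S_i, B = S
\mathrm{IV}(x_i, H) = \ln\frac{A_i/A}{B_i/B} = \ln \mathrm{FR}_i

% Hypothetical interval holding 20% of the landslide cells on 10% of the area
\mathrm{FR}_i = \frac{0.20}{0.10} = 2, \qquad \mathrm{IV} = \ln 2 \approx 0.693 > 0

% Hypothetical interval holding 5% of the landslide cells on 20% of the area
\mathrm{FR}_i = \frac{0.05}{0.20} = 0.25, \qquad \mathrm{IV} = \ln 0.25 \approx -1.386 < 0
```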

3.4. Machine Learning Models

3.4.1. Logistic Regression

Logistic regression (LR) is a regression analysis method that is usually adopted to explain the correlation between a dichotomous dependent variable and a set of predictor variables [3]. Compared with the general linear regression model, logistic regression allows the predictor variables to be either continuous or discrete [10]. LR thus treats landslide prediction as a dichotomous problem ("0" and "1") by predicting the probability of the event occurring. The LR is expressed as follows:
$$P = \frac{e^{Y}}{1 + e^{Y}} \tag{5}$$
$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n \tag{6}$$
where $Y$ is the linear combination of the influence factors describing landslide event occurrence, $\alpha$ is a constant, $\beta_i$ are the regression coefficients, $P$ expresses the probability of a landslide, and $X_i$ represents the influence factors for the occurrence of landslides.
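
The following minimal sketch evaluates the logistic model for one raster cell. The coefficients, intercept, and factor values are hypothetical placeholders rather than the fitted values of Table 5.

```python
import math


def landslide_probability(factors, coefficients, intercept):
    """P = e^Y / (1 + e^Y) with Y = alpha + sum(beta_i * X_i)."""
    y = intercept + sum(b * x for b, x in zip(coefficients, factors))
    # 1 / (1 + e^-Y) is algebraically identical to e^Y / (1 + e^Y)
    return 1.0 / (1.0 + math.exp(-y))


# Hypothetical regression coefficients and factor values (e.g. FR values)
beta = [0.8, 0.5, 1.2]
alpha = -2.0
cell_factors = [2.0, 1.0, 0.25]
p = landslide_probability(cell_factors, beta, alpha)
print(round(p, 3))
```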

3.4.2. Support Vector Machines

Support vector machines (SVM) are a supervised learning method developed in accordance with statistical learning theory and the principle of structural risk minimization [22,49]; they have been extensively applied to classification and regression. The support vector machine is a binary classification model based on the principle of finding the separating hyperplane that maximizes the classification margin in a high-dimensional space [31,34,35]. The input data are $x_i$ $(i = 1, 2, 3, \ldots, n)$, and the output of the corresponding binary classification problem is $y = \pm 1$. The calculation is written as Equations (7)–(9):
$$L = \frac{\|\omega\|^2}{2} - \sum_{i=1}^{N} \lambda_i \left( y_i \left( (\omega \cdot x_i) + b \right) - 1 \right) \tag{7}$$
$$y_i \left( (\omega \cdot x_i) + b \right) \geq 1 \tag{8}$$
$$y_i \left( (\omega \cdot x_i) + b \right) \geq 1 - \zeta_i \tag{9}$$
where $\|\omega\|$ denotes the norm of the normal vector of the hyperplane, $b$ is a constant, $L$ expresses the Lagrangian (loss) function, $\lambda_i$ is the Lagrange multiplier, and $\zeta_i$ is the relaxation (slack) factor. In Equation (10), $v \in (0, 1]$ controls the tolerated misclassification, and the radial basis function is selected as the kernel function of the SVM in this study. The formula is as follows:
$$L = \frac{1}{2} \|\omega\|^2 - \frac{1}{vn} \sum_{i=1}^{n} \zeta_i \tag{10}$$
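
The text does not give the kernel formula itself. As a minimal sketch, the function below uses the common definition of the radial basis function kernel, K(x, z) = exp(−γ‖x − z‖²), which is the form scikit-learn documents for `kernel="rbf"`; treating it as the form used in the study is an assumption. The input vectors are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Hypothetical, already scaled factor vectors of two raster cells
cell_a = [0.2, 0.7, 0.1]
cell_b = [0.3, 0.5, 0.4]
similarity = rbf_kernel(cell_a, cell_b, gamma=0.114)
print(round(similarity, 4))
```

A larger gamma makes the kernel value decay faster with distance, so each training point influences a smaller neighborhood.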

3.4.3. Random Forests

Random forest (RF) is a machine learning algorithm proposed by Leo Breiman that belongs to the family of ensemble bagging algorithms [29]. Many decision trees are randomly generated, and their outputs are combined by voting or averaging to select the final classification result, which gives the model high accuracy and good generalization [50]. The principle of random forest is to draw n samples at random from the original training dataset N with the bootstrap resampling technique, grow a decision tree on these samples, and repeat these steps to generate m decision trees that together form a random forest [8,28]. The classification of new data is then determined by the number of votes cast by the classification trees. Random forest is thus a modification of the decision tree algorithm that combines multiple decision trees, with the creation of each tree depending on independently drawn samples [30,41]. The classification power of a single tree may be small, but once a large number of decision trees have been generated randomly, the most likely class of a test sample can be selected statistically from the results of all trees. Random forest exhibits strong noise immunity and stable performance, can handle high-dimensional data, and does not require prior feature selection.
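
The bootstrap-and-vote principle can be sketched in a few lines. To stay short, each "tree" here is a one-feature decision stump, which is a deliberate simplification: a real random forest grows full trees and also samples a random subset of features at each split. The slope values and labels are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(samples, rng):
    """Draw len(samples) items with replacement."""
    return [rng.choice(samples) for _ in samples]


def fit_stump(samples):
    """Pick the threshold on the single feature with the fewest errors."""
    best_threshold, best_errors = None, None
    for threshold, _ in samples:
        errors = sum((x >= threshold) != bool(y) for x, y in samples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def majority_vote(predictions):
    return Counter(predictions).most_common(1)[0][0]


def bagged_predict(samples, x_new, n_trees=25, seed=0):
    """Train n_trees stumps on bootstrap samples and let them vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_trees):
        threshold = fit_stump(bootstrap_sample(samples, rng))
        votes.append(int(x_new >= threshold))
    return majority_vote(votes)


# Hypothetical (slope in degrees, landslide label) training pairs
training = [(5, 0), (8, 0), (12, 0), (22, 1), (27, 1), (35, 1)]
print(bagged_predict(training, 30), bagged_predict(training, 6))
```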

3.5. Performance Validation of the Model

3.5.1. Confusion Matrix

In this study, the performance of the landslide susceptibility models was evaluated using the confusion matrix, which is often used in binary classification for the evaluation of model performance. The confusion matrix includes four common parameters, as listed in Table 2. Four statistical indicators (precision, recall, accuracy, and F1 score) were derived from it to evaluate the performance of each model. The indicators are expressed as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{11}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{12}$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{13}$$
$$F_1\text{-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{14}$$
where TP denotes the number of landslide raster cells predicted correctly, TN denotes the number of non-landslide raster cells predicted correctly, FP denotes the number of non-landslide raster cells incorrectly predicted as landslides, and FN denotes the number of landslide raster cells incorrectly predicted as non-landslides.
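
The four indicators can be computed directly from the confusion matrix counts, as in this minimal sketch. The counts are hypothetical and do not correspond to any model in this study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy, and F1 score from confusion matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Hypothetical validation counts
metrics = classification_metrics(tp=90, fp=10, fn=14, tn=94)
for name, value in metrics.items():
    print(name, round(value, 4))
```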

3.5.2. ROC Curves and AUC Values

Receiver operating characteristic (ROC) curves are often used in landslide susceptibility evaluation as a measure of model validity [51]. The area enclosed by the curve and the axes is called the area under the curve (AUC). The AUC values range from 0.5 (worst model performance) to 1 (optimal model performance) [14,52]. When the AUC value is higher than 0.7, the closer it is to 1, the more accurate the model's prediction is. The value of AUC can be calculated by the integral trapezoidal rule. The equation is written as follows:
$$\mathrm{AUC} = \frac{TP + TN}{P + N} \tag{15}$$
where TP (true positive) and TN (true negative) denote the correctly classified raster cells, P expresses the total number of landslide raster cells, and N represents the total number of non-landslide raster cells.
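
The text notes that the AUC can be obtained with the trapezoidal rule. The sketch below shows that calculation: the threshold is swept through the predicted scores, the true positive rate and false positive rate are traced, and the trapezoids under the resulting curve are summed. The labels and scores are hypothetical, and the function is an illustration rather than the exact routine used in the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the trapezoidal rule."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Highest scores first: lowering the threshold adds one score level at a time
    ranked = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    prev_fpr = prev_tpr = 0.0
    area = 0.0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Cells with tied scores cross the threshold together
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr, fpr = tp / pos, fp / neg
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2  # one trapezoid
        prev_fpr, prev_tpr = fpr, tpr
    return area


# Hypothetical validation labels (1 = landslide) and predicted scores
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```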

4. Results

4.1. Independence Test of the Factors

4.1.1. Pearson’s Correlation Tests

Existing research has suggested that the environmental factors selected for model construction should remain relatively independent of each other to ensure the accuracy of model evaluation [32]. The correlation test of the factors was performed using the Band Collection Statistics tool in ArcGIS, and the resulting correlation coefficient matrix of 14 environmental factors was visualized using Origin software. The results are illustrated in Figure 5. When the absolute value of the correlation coefficient between two factors satisfies |r| > 0.6, the factors are considered to be strongly correlated with each other. As depicted in Figure 5, the absolute values of the correlation coefficients were all less than 0.6, so the correlations were weak. Accordingly, the factors had a small degree of interaction.

4.1.2. Multicollinearity Tests

In addition, multicollinearity analysis was conducted on the environmental factors using SPSS 22 software to obtain the TOL and VIF values of the environmental factors. The results are listed in Table 3; all environmental factors achieved VIF values less than 2 and, correspondingly, TOL values between 0.5 and 1. The above result indicated that there was no serious multicollinearity among the selected environmental factors, thus verifying the rationality of the factor selection.

4.2. Classification of the Respective Attribute Interval of Environmental Factors and Calculation of Frequency Ratio and Information Value

In this study, the ratio of the total number of landslide rasters to the total number of rasters was used for calculation.
(1) Topographic and geomorphological factors: The distribution of landslides is closely correlated with elevation. Vegetation cover and soil moisture differ between elevation zones, which leads to different surface water collection capacities, and the intensity of human activities also varies with elevation [53]. As depicted in Table 4, the information values for the elevation intervals 478–800 m, 1000–1100 m, and 1200–1300 m were higher than 0, and the frequency ratios were higher than 1, indicating that landslides were densely distributed in these intervals. Slope affects the development and gestation of landslides through its influence on the stress distribution, surface runoff, and groundwater recharge and discharge of the slope body [54]. The slope intervals 20°–25°, 25°–30°, and 30°–40° had information values higher than 0 and frequency ratios higher than 1 and accounted for more than 70% of the landslides. Slope direction controls the intensity of solar radiation received, leading to differences in evaporation, vegetation, and human activities [55]. Landslides were prone to occur on east-, south-, west-, and southwest-facing slopes, all of which had frequency ratios higher than 1 and positive information values. The degree of twisting and deformation of slope surfaces affects the stress distribution within the slope and thus influences the development of landslides to different degrees [56]. In this study, plane curvature and profile curvature were chosen. As depicted in Table 5, for plane curvature between 0.35 and 4.92 and for profile curvature between −12.33 and −0.15 and between 2.47 and 15.42, the frequency ratios were all higher than 1 and the information values were all higher than 0, indicating that these intervals were favorable for landslides to occur.
(2) Hydrological factors: Rainfall is one of the main triggering factors for landslide development because it changes the stress distribution and the stability of the slope body [57]. Where the rainfall was between 1084 and 1095 mm, the information value was higher than 0 and the frequency ratio was higher than 1, so landslides occurred more frequently in this zone. Within a certain distance from the water system, the scouring and soaking of rivers could result in the loss of soil from the slope, thus destabilizing it. As depicted in Table 4, the FR value was higher than 1 and the IV value was higher than 0 within 800 m of the water system, indicating that landslides were likely to occur in this area.
(3) Geological factors: Within 1200 m of the fault zone, the FR value was higher than 1 and the information value was higher than 0, indicating that proximity to the fault zone affected the occurrence of landslides. Stratigraphic lithology is an important internal factor for landslide development and stability [58]. The physical and mechanical properties of the rock mass and the interstratigraphic mechanism determine the geotechnical stress distribution, which in turn affects the stability of the slope. In this study area, landslides occurred mainly in the Middle and Upper Permian sandstones and carbonaceous rocks, among others; the lithological units in which landslides occurred easily are shown in Table 4.
(4) Land cover factors: Vegetation cover affects the development and distribution of landslides, mainly because plant roots help hold the slope surface in place and vegetation slows the flow and infiltration of water on the slope surface. Landslides occurred easily in areas with low vegetation cover, where the FR values were higher than 1 and the IV values were higher than 0. Irrational human exploitation of land is also one of the causes of landslides, and different land cover types have different effects on the stability of slopes [59]. As depicted in Table 5, landslides occurred more often on construction land, water areas, and grassland, whose FR values reached 4.65, 2.62, and 1.68, respectively, with positive information values. The distance from the road was also a vital factor for the development and distribution of landslides. Roadbed widening, blasting works, and artificial slope cutting in road projects could change the original geomorphology and geotechnical structure, thus reducing slope stability. As depicted in Table 4, for the two distance intervals within 400 m of the road, the FR values were 3.01 and 1.01, respectively, and the information values were higher than 0, revealing that the closer a slope is to the road, the greater the possibility of landslide occurrence.

4.3. Results of the Model

4.3.1. LR Regression and Coupling Model

In this study, the original values, frequency ratios, and information values of each environmental factor in the training sample were input into SPSS 22 for binary logistic regression, and the regression coefficient of each environmental factor and the constant term were obtained. The magnitude of a regression coefficient indicates the degree to which the corresponding evaluation factor contributes to landslide occurrence; the regression coefficients under the different models are listed in Table 5. Next, the regression coefficients were substituted into Equations (5) and (6), and the ArcGIS 10.2 raster calculator was used to compute the landslide susceptibility index of each raster cell in the study area.
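The substitution step can be sketched as follows. Equations (5) and (6) are not reproduced in this section, so the sketch assumes the standard logistic form, z = b0 + Σ b_i x_i and P = 1 / (1 + exp(−z)). The coefficients, constant, and factor values are hypothetical and are not the values in Table 5.

```python
import numpy as np

def logistic_susceptibility(factor_values, coefficients, intercept):
    """Return P = 1 / (1 + exp(-z)) with z = intercept + sum(b_i * x_i) for each cell."""
    z = intercept + factor_values @ coefficients
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical FR-coded values of three factors (columns) for four raster cells (rows)
x = np.array([
    [3.01, 1.68, 1.20],
    [1.01, 0.80, 0.95],
    [0.60, 0.40, 0.70],
    [0.30, 0.20, 0.50],
])
b = np.array([0.9, 0.6, 0.4])  # hypothetical regression coefficients
b0 = -2.5                      # hypothetical constant term

p = logistic_susceptibility(x, b, b0)
print(np.round(p, 3))
```

In a raster workflow the same expression would be evaluated cell by cell, with each factor layer taking the place of one column of `x`.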

4.3.2. Support Vector Machine (SVM) and Coupling Model

In this study, the training and test sets were loaded in Python, and the models were built with the svm module of the Scikit-learn framework. The regularization (penalty) parameter ϑ and the gamma parameter of the RBF kernel function of the support vector machine were selected on the training dataset using tenfold cross-validation and grid search. For the single SVM model, the penalty parameter was 0.4 and the gamma parameter was 0.114; for the IV–SVM and FR–SVM models coupled with the information value (IV) and frequency ratio (FR), the penalty and gamma parameters were 0.75 and 0.24, and 0.83 and 0.12, respectively. Subsequently, the trained models were used to predict landslide susceptibility for all 1,548,205 point objects in the study area. The susceptibility values of all points were then imported into ArcGIS and converted into 30 m × 30 m raster cells.
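A minimal sketch of this tuning procedure with Scikit-learn is given below. It is an illustration under stated assumptions rather than the script used in the study: the data are synthetic, the candidate grid is hypothetical (the searched ranges are not reported), and the default accuracy score is used for model selection. In Scikit-learn the penalty parameter is named `C`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for landslide / non-landslide samples described by 12 factors
X, y = make_classification(n_samples=300, n_features=12, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Hypothetical candidate values for the penalty parameter C and the RBF gamma
param_grid = {"C": [0.1, 0.4, 0.75, 1.0], "gamma": [0.05, 0.114, 0.24]}

search = GridSearchCV(
    SVC(kernel="rbf", probability=True, random_state=0),
    param_grid,
    cv=10,  # tenfold cross-validation on the training set
)
search.fit(X_train, y_train)

# Probability of the landslide class for unseen points, from the refitted best model
susceptibility = search.predict_proba(X_test)[:, 1]
print(search.best_params_, round(search.best_score_, 3))
```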

4.3.3. Random Forest and Coupling Models

In this study, the training and test sets were loaded in Python, and the models were built with the RandomForestClassifier class of the Scikit-learn framework. n_estimators and max_depth are important random forest parameters that significantly affect the accuracy of the model. n_estimators represents the number of decision trees. The prediction performance of the RF model improved as n_estimators increased, but the computational effort and the modeling time increased as well. The parameter value of the single RF model was 150, and the parameter values of the coupled models IV–RF and FR–RF based on the information value (IV) and frequency ratio (FR) were 100 and 104, respectively. The trained model was then used to predict landslide susceptibility for the 1,548,205 point objects in the whole study area, and the susceptibility values of all points were imported into ArcGIS 10.2 and converted into 30 m × 30 m raster cells.
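The role of n_estimators can be explored with a short sketch such as the one below. The data are synthetic, and the max_depth value and the list of tree counts are hypothetical choices made only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the training and test samples with 12 factors
X, y = make_classification(n_samples=400, n_features=12, n_informative=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

scores = {}
for n_trees in (10, 50, 100, 150):
    # max_depth=8 is a hypothetical setting, not a value reported in the study
    model = RandomForestClassifier(n_estimators=n_trees, max_depth=8, random_state=1)
    model.fit(X_train, y_train)
    scores[n_trees] = model.score(X_test, y_test)  # test-set accuracy
    print(n_trees, round(scores[n_trees], 3))

# Landslide-class probability from the last fitted model (150 trees)
susceptibility = model.predict_proba(X_test)[:, 1]
```

Training time grows roughly in proportion to the number of trees, which is the trade-off described above.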

4.4. Landslide Susceptibility Mapping

In accordance with Section 4.3, the landslide susceptibility maps of Weixin County were generated (Figure 6) using the logistic regression (LR), support vector machine (SVM), and random forest (RF) models, both alone and coupled with the frequency ratio (FR) and information value (IV). The landslide susceptibility zoning was based on the probability of landslide occurrence in the study area obtained from the models. The susceptibility of the study area was divided into five classes: very low, low, medium, high, and very high susceptibility zones. The susceptibility maps obtained from the nine models were relatively similar and were generally consistent with the results of the field survey.
(1) Very high and high susceptibility areas often experienced a large number of landslides owing to vegetation destruction, unreasonable land use, road construction, housing development, and other causes. Accordingly, the very high susceptibility areas in Weixin County were mainly distributed linearly along rivers and road extension areas and in areas with more frequent human activities. As shown in Figure 4, the very high and high susceptibility areas in Weixin County were mainly located in the western, southeastern, central, and northwestern parts of the study area. The low and very low susceptibility areas were mainly located in the northern part of the study area, where forest cover was high and human activity was limited, which was consistent with the results of the field survey.
(2) In the western, central, and southeastern parts of the study area, the very high susceptibility zones were mainly underlain by Ordovician, Middle Cambrian, and Cenozoic Quaternary strata, with the main lithologies being dolomite, shale, mudstone, and fine sandstone. Under the influence of weathering, the rock bodies in the weathered crust layers were relatively fragmented, especially at the lithological boundaries of the strata. The loose structure of the soil layers in these strata provided abundant material for landslides. The unstable stratigraphic structure in the lithological boundary areas was an important factor influencing the occurrence of landslides.
(3) In granitic rock areas, the very high susceptibility areas were distributed mainly along roads. Environmental factors such as humidity, topographic relief, and geological tectonic activity accelerated weathering, altering the inherent nature of the material and reducing the strength of the surface rocks.
(4) The historical landslide data were overlaid on the landslide susceptibility zoning results, and the area share, landslide share, and frequency ratio of each zone were calculated. The statistical results are listed in Table 6. As the landslide susceptibility level increased, the landslide percentage and the frequency ratio increased, and the landslides in the very high and high susceptibility areas accounted for over 70% of the total number of historical landslides. The areas of the very high susceptibility zone based on the single models (LR, SVM, and RF) were 134.17 km², 124.98 km², and 113 km², respectively; their proportions of the study area were 9.63%, 8.97%, and 8.12%, respectively; and the frequency ratios were 4.01, 6.31, and 7.05, respectively. Based on the models coupled with IV and FR (IV–LR, IV–SVM, IV–RF, FR–LR, FR–SVM, and FR–RF), the areas of the very high susceptibility zone were 120.45 km², 121.71 km², 115.58 km², 120.42 km², 122.60 km², and 114.17 km², respectively. Their proportions of the study area were 8.64%, 8.74%, 8.29%, 8.64%, 8.80%, and 8.19%, and the frequency ratios were 4.73, 6.60, 6.92, 4.65, 6.49, and 7.21, respectively. In brief, the area ratios of the very high, high, medium, low, and very low susceptibility zones in the study area were consistent with the distribution pattern of landslide hazards.
The landslide susceptibility maps obtained from the three single machine learning models and from the models coupled with FR and IV showed similar spatial trends. Figure 6 shows that the very high and high susceptibility zones were primarily distributed along rivers, road extension areas, and areas with more frequent human activities, where vegetation cover was low, the water system was well developed, and soft rock lithologies were distributed along the fracture zone. In the southeastern region, the very high susceptibility zone obtained by the support vector machine and its coupled models was larger in extent. As revealed by the survey data and images, the soil in this area was loose and the ecological environment was fragile, so geological disasters such as landslides were highly likely.
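The zoning and overlay statistics described in this section can be sketched as follows. The class boundaries are hypothetical, because the break values used for the five susceptibility classes are not stated here, and the susceptibility values and landslide cells are synthetic.

```python
import numpy as np

rng = np.random.default_rng(42)
n_cells = 100_000

# Synthetic susceptibility values and landslide cells (illustrative only);
# landslides are made more frequent where susceptibility is high
susceptibility = rng.random(n_cells)
is_landslide = rng.random(n_cells) < 0.02 * susceptibility

labels = ["very low", "low", "medium", "high", "very high"]
breaks = [0.2, 0.4, 0.6, 0.8]               # hypothetical class boundaries
zone = np.digitize(susceptibility, breaks)  # integer class 0..4 for each cell

stats = []
total_landslides = is_landslide.sum()
for k, label in enumerate(labels):
    in_zone = zone == k
    area_share = in_zone.mean()                                       # zone area / total area
    landslide_share = is_landslide[in_zone].sum() / total_landslides  # zone landslides / all landslides
    stats.append((label, area_share, landslide_share, landslide_share / area_share))
    print(f"{label:>9}: area {area_share:6.2%}, landslides {landslide_share:6.2%}, "
          f"FR {stats[-1][3]:.2f}")
```

A zoning is usually regarded as reasonable when the frequency ratio rises from the very low to the very high class, which is the pattern reported in Table 6.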

4.5. Accuracy Evaluation of the Model

The frequency ratio of each model was calculated by counting the distribution of historical landslide raster cells in each susceptibility class. The results are listed in Table 7. In the FR–RF coupled model, 83% of the landslide raster cells fell into the very high and high susceptibility zones, and in the remaining coupled models more than 80% did. Among the single models, the percentage of landslide raster cells falling into the very high and high susceptibility zones was lowest for the LR model, intermediate for the SVM model, and highest for the RF model. This analysis shows that the prediction accuracy of the coupled models was generally higher than that of the single models.

4.5.1. Evaluation of Precision Parameters

The results of the confusion matrix and statistical indicators are listed in Table 7, and each index is shown in Figure 7. In terms of precision, FR–RF > IV–RF > RF > SVM > IV–SVM > FR–SVM > IV–LR > FR–LR > LR; the FR–RF model had the highest precision, indicating that it was the least prone to labeling negative (non-landslide) samples as landslides. In terms of recall, FR–RF > IV–RF > RF > FR–SVM > IV–SVM > SVM > FR–LR > IV–LR > LR; the FR–RF model had the highest recall, indicating that it had the strongest ability to identify positive samples. The F1 scores of all models were higher than 0.7 and ranked as FR–RF > IV–RF > RF > FR–SVM > SVM > IV–SVM > FR–LR > IV–LR > LR, which indicated that all models could reflect the landslide susceptibility of the study area and that the FR–RF model performed best. In terms of accuracy, RF > SVM > LR among the single models, indicating that the RF model could predict the occurrence of landslide hazards better than the SVM and LR models. Among all nine models, the FR–RF model had the highest accuracy, indicating that coupling could improve the prediction accuracy of the model.
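For reference, the four indicators compared above follow directly from the confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and are not those of Table 7.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, F1 score, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)   # predicted landslides that are real landslides
    recall = tp / (tp + fn)      # real landslides that are detected
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Hypothetical counts for a balanced test set of 105 landslide and 95 non-landslide samples
m = classification_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```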

4.5.2. Comparison of ROC Curve and AUC Values

Figure 8 presents the receiver operating characteristic (ROC) curves of the nine models (LR, SVM, RF, FR–SVM, FR–RF, FR–LR, IV–SVM, IV–RF, and IV–LR). The LR, SVM, and RF models achieved AUC values of 0.761, 0.855, and 0.936, respectively; the IV–LR, IV–SVM, and IV–RF models achieved 0.791, 0.867, and 0.927, respectively; and the FR–LR, FR–SVM, and FR–RF models achieved 0.785, 0.885, and 0.949, respectively. The AUC values of all models were above 0.75. Specifically, the RF model exhibited the highest accuracy among the single models and the LR model the lowest, while the FR–RF model had the highest accuracy among the coupled models, followed by the IV–RF model. Furthermore, even the lowest AUC value among the coupled models, that of FR–LR (0.785), was higher than that of the corresponding single LR model (0.761), and every coupled model except IV–RF (0.927 versus 0.936 for RF) achieved a higher AUC than its single counterpart, revealing that coupling was generally beneficial to the prediction of landslide hazard.
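An ROC curve and its AUC value can be computed from predicted susceptibility values as in the minimal sketch below. The labels and scores are synthetic and serve only to show the calculation with Scikit-learn.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

rng = np.random.default_rng(0)

# Synthetic test set: 200 non-landslide (0) and 200 landslide (1) samples
y_true = np.repeat([0, 1], 200)
# Synthetic susceptibility scores, shifted upward for the landslide samples
scores = np.concatenate([
    rng.normal(0.35, 0.15, 200),
    rng.normal(0.65, 0.15, 200),
])

fpr, tpr, thresholds = roc_curve(y_true, scores)  # points of the ROC curve
auc_value = roc_auc_score(y_true, scores)         # area under that curve
auc_from_curve = auc(fpr, tpr)                    # same area by trapezoidal integration
print(round(auc_value, 3))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of landslide and non-landslide samples.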

4.6. Case Study

The prediction performance of the model was verified using 15 landslide data points obtained from interpretation and field investigation in 2021–2022. The FR–RF model, which had the best prediction performance, was selected, and 10 of these landslides were found in its very high and high susceptibility zones. From this point of view, the coupled model had good predictive performance. The Longdong Rock landslide in Guihua Village, Zhaxi Town, was selected as an example, and the results showed that the landslide fell within the very high susceptibility zone of the FR–RF model, as shown in Figure 9. The field investigation found that the landslide developed in the weathered muddy dolomite rock mass of the Paleozoic Ordovician (O1–S1). Owing to the combined action of lithology, the stress change in the slope after artificial excavation of the slope foot, and groundwater, weathering and argillization of the rock mass in this layer were pronounced, and the physical and mechanical properties of the soil mass were poor. Eventually, a slip zone developed and caused the slope to slide.

5. Discussions

The landslide susceptibility of Weixin County was evaluated using remote sensing, GIS tools, and machine learning algorithms. Three single models (LR, SVM, and RF) and six hybrid models based on the information value (IV) and frequency ratio (FR) (IV–LR, IV–SVM, IV–RF, FR–LR, FR–SVM, and FR–RF) were built to draw nine landslide susceptibility maps of Weixin County, and the prediction accuracy of the nine models was compared. The comparison indicated that the prediction accuracy of the IV- and FR-based coupled models was generally higher than that of the single models, and the hybrid model coupling the frequency ratio with random forest achieved the highest prediction accuracy among all models. These results are significant for predicting possible future landslides and lay a basis for decision-making in the early warning and prevention of landslides in Weixin County.

5.1. Landslide Susceptibility Map Rationality

Through superposition analysis, the rationality of the landslide susceptibility mapping in Weixin County was evaluated. Table 7 presents detailed information on the susceptibility zones of each level under the different models. The landslide susceptibility maps based on the single models and on the models coupled with the information value and frequency ratio showed the same overall trend. In the LR, IV–LR, and FR–LR models, the proportions of very high susceptibility areas were 9.63%, 8.64%, and 8.64%, and the proportions of landslides were 38.5%, 40.91%, and 40.14%, respectively. The frequency ratios were 4.01, 4.73, and 4.65. In the SVM, IV–SVM, and FR–SVM models, the proportions of very high susceptibility areas were 8.97%, 8.74%, and 8.8%, and the proportions of landslides were 56.62%, 57.62%, and 57.11%, respectively. The frequency ratios were 6.31, 6.6, and 6.49. In the RF, IV–RF, and FR–RF models, the proportions of very high susceptibility areas were 8.12%, 8.29%, and 8.19%; the proportions of landslides were 57.11%, 57.66%, and 59.11%; and the frequency ratios were 7.05, 6.92, and 7.21. The above analysis showed that the landslide susceptibility maps of Weixin County obtained by most models were reasonable, and the proportion of landslide occurrence gradually increased from the very low to the very high susceptibility areas. In general, compared with the other models, the landslide susceptibility map obtained by the FR–RF model was the most reasonable. The models achieved good results in the landslide susceptibility assessment mapping of Weixin County. In this study, only statistical methods and a typical case were used to analyze and verify the results. In future studies, the susceptibility mapping results will be analyzed from a geological perspective, so as to make the landslide susceptibility mapping results more accurate and reliable.

5.2. Evaluation Units

The accuracy of landslide susceptibility evaluation is closely related to the selection of the evaluation unit. The commonly used evaluation units include the grid unit, slope unit, topographic and geomorphic unit, and administrative unit. Once an appropriate evaluation unit is selected, each unit is assigned a value for each environmental factor. The grid unit divides the study area into regular cells for storage and calculation. This method is widely used in landslide susceptibility assessment mapping, but grid units cannot fully reflect the topographic relief and the geological and hydrological elements of the study area. A grid cell size of 30 m × 30 m was used in this study; in future studies, slope units can be considered for landslide susceptibility analysis, and the similarities and differences between slope units and grid units can be compared.
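How point predictions map onto 30 m × 30 m grid units can be sketched as below. The coordinates, raster origin, and susceptibility values are hypothetical, and the area estimate assumes that each of the 1,548,205 prediction points corresponds to exactly one grid cell.

```python
import numpy as np

CELL_SIZE = 30.0  # grid cell size in metres, as used in this study

def to_grid_index(x, y, x_min, y_max, cell_size=CELL_SIZE):
    """Map projected coordinates to (row, col) indices of a north-up raster."""
    col = np.floor((x - x_min) / cell_size).astype(int)
    row = np.floor((y_max - y) / cell_size).astype(int)
    return row, col

# Hypothetical projected coordinates (metres) of three prediction points
x = np.array([500_010.0, 500_045.0, 500_089.9])
y = np.array([3_000_085.0, 3_000_050.0, 3_000_001.0])
row, col = to_grid_index(x, y, x_min=500_000.0, y_max=3_000_090.0)

# Write hypothetical susceptibility values into a 3 x 3 raster
grid = np.full((3, 3), np.nan)
grid[row, col] = [0.91, 0.42, 0.07]

# Area covered by the prediction points if each one represents one 30 m cell
area_km2 = 1_548_205 * CELL_SIZE**2 / 1e6
print(row, col, round(area_km2, 1))
```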

5.3. Significance of Environmental Factors

The selection of suitable environmental factors is of great importance to landslide susceptibility evaluation. However, the selected evaluation factors are not all strong predictors; some factors generate noise and reduce prediction accuracy. Existing research typically conducts only a preliminary analysis of the correlation between each environmental factor and landslides and identifies the intervals of each factor in which landslides occur, whereas the contribution of each environmental factor to landslide susceptibility is not revealed [14,60]. As shown in Section 4.5, FR–RF had the highest accuracy, so it was selected for the factor significance analysis. In this study, the significance of an environmental factor was measured as its mean decrease in the Gini index expressed as a percentage of the summed mean decrease in the Gini index over all environmental factors. The 12 environmental factors were analyzed in Python and then visualized with Origin 2021 to generate the significance ranking graph, and the results are illustrated in Figure 10. As depicted in Figure 10, the 12 environmental factors were ranked in order of significance as follows: distance to roads > NDVI > land use > stratigraphic lithology > elevation > rainfall > slope direction > distance to rivers > distance from fracture zone > slope > plane curvature > profile curvature. Distance to roads, NDVI, and land use were the three most important environmental factors, with significance percentages of 20.15%, 13.37%, and 9.69%, respectively. This indicated that these three factors contributed the most to the model and were important triggers for landslide generation in the study area.
Plane curvature and profile curvature had the lowest significance percentages, 2.80% and 1.79%, respectively, indicating that these two environmental factors had a weak influence on the evaluation of landslide susceptibility in the study area.
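A ranking of this kind can be obtained from a fitted random forest as sketched below. The sketch relies on Scikit-learn's `feature_importances_` attribute, which is the impurity-based (Gini) importance normalized to sum to 1. The data are synthetic, so the factor names are attached for illustration only and the printed ranking has no relation to Figure 10; the number of trees is likewise a hypothetical choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

factor_names = [
    "elevation", "slope", "slope direction", "plane curvature", "profile curvature",
    "stratigraphic lithology", "distance from fracture zone", "rainfall",
    "distance to rivers", "NDVI", "land use", "distance to roads",
]

# Synthetic data: the columns are NOT real environmental factors
X, y = make_classification(
    n_samples=500, n_features=12, n_informative=5, n_redundant=0, random_state=7
)
model = RandomForestClassifier(n_estimators=100, random_state=7).fit(X, y)

# Mean decrease in Gini impurity per factor, expressed as a percentage of the total
importance_pct = 100 * model.feature_importances_
ranking = sorted(zip(factor_names, importance_pct), key=lambda item: item[1], reverse=True)
for name, pct in ranking:
    print(f"{name}: {pct:.2f}%")
```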

5.4. Uncertainty of the Coupling Models

Hybrid models that combine statistical methods with machine learning methods have been increasingly applied to landslide susceptibility evaluation and can significantly increase the prediction accuracy of the models. Statistical methods link landslide susceptibility indices to environmental factors, and the quality of this link is critical to the prediction accuracy of machine learning models. The commonly used statistical methods are the certainty factor, weight of evidence, information value, index of entropy, and frequency ratio. Current research has not established which statistical methods best improve the prediction accuracy of machine learning models, so the choice of statistical method brings great uncertainty when it is combined with machine learning methods for landslide susceptibility prediction. In this study, only information values and frequency ratios were coupled with three machine learning methods (logistic regression, support vector machine, and random forest). In future research, more statistical methods and machine learning methods will be employed to analyze the uncertainty patterns in landslide susceptibility prediction.

6. Conclusions

Landslide susceptibility mapping is a key link in landslide hazard control. This study took Weixin County, China, as the research area and selected appropriate environmental factors, using historical landslide disaster points as the basic data. Three single models (LR, SVM, and RF) and six coupled models (IV–LR, IV–SVM, IV–RF, FR–LR, FR–SVM, and FR–RF) based on the information value (IV) and frequency ratio (FR) were constructed to evaluate landslide susceptibility in Weixin County and generate landslide susceptibility maps. The accuracy of the models was evaluated using statistical indexes and the ROC curve. In summary, the main conclusions were as follows:
(1) The landslide susceptibility maps obtained by the single models (LR, SVM, and RF) and the coupled models based on IV and FR (IV–LR, IV–SVM, IV–RF, FR–LR, FR–SVM, and FR–RF) were reasonable. The very high and high susceptibility areas in Weixin County were mainly distributed in the west, southeast, central, and northwest regions, extending along roads and rivers. The very low and low susceptibility areas were mainly distributed in the northern mountainous areas with fewer human activities and the southern areas with higher forest coverage. The accuracy and reliability of the models were verified by statistical index parameters, and the accuracy of the coupled models was higher than that of the single models on the whole.
(2) The ROC curve, AUC value, and statistical indexes were used to evaluate and compare model performance. The overall accuracy of the single models was lower than that of the coupled models. The FR–RF coupled model had the highest accuracy, with an AUC value of 0.949.
(3) Under the optimal FR–RF model, the importance analysis of environmental factors indicated the need to strengthen the monitoring of mountains near roads and areas with sparse vegetation to prevent landslides caused by human activities and natural rainfall.
This study described in detail the construction of single machine learning models and coupled models based on the information value (IV) and frequency ratio (FR) and compared the performance of the models. The accuracy of the coupled models was further verified. In addition, this study can improve the efficiency of government decision-making on landslide prevention and control and is conducive to rapid response in landslide early warning. Integrated risk assessors and land use planners could also benefit from our findings.

Author Contributions

W.H. and G.C. conceptualized and drafted the manuscript and were responsible for the research design, experiment, and analysis. J.Z., Y.L., B.Q., W.Y. and Q.C. reviewed and edited the manuscript. Y.L. supported the data preparation and the interpretation of the results. All of the authors contributed to editing and reviewing the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 41761081.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available on request from the authors.

Acknowledgments

We are grateful to the editors and anonymous reviewers of Sensors. Thanks to their comments and suggestions, the quality of our paper has improved significantly. We would like to thank the Weixin County Bureau of Natural Resources for providing important information and the Yunnan Provincial Geological Survey Institute for providing valuable landslide and geological data.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cruden, D.M. A simple definition of a landslide. Bull. Int. Assoc. Eng. Geol. 1991, 43, 27–29.
  2. Ge, D.; Dai, K.; Guo, Z.; Li, Z. Early identification of serious geological hazards with integrated remote sensing technologies: Thoughts and recommendations. Geom. Inf. Sci. Wuhan Univ. 2019, 44, 949–956.
  3. Zhao, Z.; Zhang, F.; Zheng, J. Evaluation of landslide susceptibility by multiple adaptive regression spline method. Geom. Inf. Sci. Wuhan Univ. 2021, 46, 442–450.
  4. Wang, Y.; Fang, Z.; Niu, R.; Peng, L. Landslide susceptibility analysis based on deep learning. J. Geo-Inf. Sci. 2021, 23, 2244–2260.
  5. Guo, Z.Z.; Yin, K.L.; Fu, S.; Huang, F.; Gui, L.; Xia, H. Evaluation of landslide susceptibility based on GIS and WOE-BP model. Earth Sci. 2019, 44, 4299–4312.
  6. Sezer, E.A.; Pradhan, B.; Gokceoglu, C. Manifestation of an adaptive neuro-fuzzy model on landslide susceptibility mapping: Klang valley, Malaysia. Expert Syst. Appl. 2011, 38, 8208–8219.
  7. Ulrich, K.; Benjamin, J.G.; Ghazanfar, A.K.; Lewis, A.O. GIS-based landslide susceptibility mapping for the 2005 Kashmir earthquake region. Geomorphology 2008, 101, 631–642.
  8. Huang, F.; Cao, Z.; Guo, J.; Jiang, S.; Li, S.; Guo, Z. Comparisons of heuristic, general statistical and machine learning models for landslide susceptibility prediction and mapping. Catena 2020, 191, 104580.
  9. Panchal, S.; Shrivastava, A.K. Application of analytic hierarchy process in landslide susceptibility mapping at regional scale in GIS environment. J. Stat. Manag. Syst. 2020, 23, 199–206.
  10. Zhu, A.; Miao, Y.; Wang, R.; Zhu, T.; Deng, Y.; Liu, J.; Yang, L.; Qin, C.; Hong, H. A comparative study of an expert knowledge-based model and two data-driven models for landslide susceptibility mapping. Catena 2018, 166, 317–327.
  11. Pourghasemi, H.R.; Pradhan, B.; Gokceoglu, C. Application of fuzzy logic and analytical hierarchy process (AHP) to landslide susceptibility mapping at Haraz Watershed, Iran. Nat. Hazards 2012, 63, 965–996.
  12. Jaafari, A.; Zenner, E.K.; Panahi, M.; Shahabi, H. Hybrid artificial intelligence models based on a neuro-fuzzy system and metaheuristic optimization algorithms for spatial prediction of wildfire probability. Agric. For. Meteorol. 2019, 266–267, 198–207.
  13. Gholami, M.; Ghachkanlu, E.N.; Khosravi, K.; Pirasteh, S. Landslide prediction capability by comparison of frequency ratio, fuzzy gamma and landslide index method. J. Earth Syst. Sci. 2019, 128, 42.
  14. Costache, R.; Tien Bui, D. Spatial prediction of flood potential using new ensembles of bivariate statistics and artificial intelligence: A case study at the Putna river catchment of Romania. Sci. Total Environ. 2019, 691, 1098–1118.
  15. Lima, P.; Steger, S.; Glade, T.; Murillo-García, F.G. Literature review and bibliometric analysis on data-driven assessment of landslide susceptibility. J. Mt. Sci. 2022, 19, 1670–1698.
  16. Chanu, M.L.; Bakimchandra, O. A Comparative Study on Landslide Susceptibility Mapping Using AHP and Frequency Ratio Approach; Springer: Singapore, 2021; pp. 267–281.
  17. Wu, H.; Song, T. An evaluation of landslide susceptibility using probability statistic modeling and GIS’s spatial clustering analysis. Hum. Ecol. Risk Assess. Int. J. 2018, 24, 1952–1968.
  18. Costache, R. Flash-Flood Potential assessment in the upper and middle sector of Prahova river catchment (Romania). A comparative approach between four hybrid models. Sci. Total Environ. 2019, 659, 1115–1134.
  19. Li, A.A.; Garman, R.H.; Kaufmann, W.; Auer, R.N.; Bolon, B. Weight of evidence (WOE) and benchmark dose (BMD) analysis: Brain morphometry and startle behavior as examples. Neurotoxicol. Teratol. 2015, 100, 113.
  20. Sifa, S.F.; Mahmud, T.; Tarin, M.A.; Haque, D.M.E. Event-based landslide susceptibility mapping using weights of evidence (WoE) and modified frequency ratio (MFR) model: A case study of Rangamati district in Bangladesh. Geol. Ecol. Landsc. 2019, 4, 222–235.
  21. Wang, Q.; Li, W.; Chen, W.; Bai, H. GIS-based assessment of landslide susceptibility using certainty factor and index of entropy models for the Qianyang County of Baoji city, China. J. Earth Syst. Sci. 2015, 124, 1399–1415.
  22. Zhao, Z.; Liu, Z.Y.; Xu, C. Slope Unit-Based Landslide Susceptibility Mapping Using Certainty Factor, Support Vector Machine, Random Forest, CF-SVM and CF-RF Models. Front. Earth Sci. 2021, 9, 589630.
  23. Mandal, S.; Mondal, S. Weighted Overlay Analysis (WOA) Model, Certainty Factor (CF) Model and Analytical Hierarchy Process (AHP) Model in Landslide Susceptibility Studies; Springer International Publishing: Cham, Switzerland, 2018; pp. 135–162.
  24. Ozdemir, A. A Comparative Study of the Frequency Ratio, Analytical Hierarchy Process, Artificial Neural Networks and Fuzzy Logic Methods for Landslide Susceptibility Mapping: Taşkent (Konya), Turkey. Geotech. Geol. Eng. 2020, 38, 4129–4157.
  25. Yilmaz, I. Landslide susceptibility mapping using frequency ratio, logistic regression, artificial neural networks and their comparison: A case study from Kat landslides (Tokat—Turkey). Comput. Geosci. 2009, 35, 1125–1138.
  26. Xu, K.; Guo, Q.; Li, Z.W.; Xiao, J.; Qin, Y.S.; Chen, D.; Kong, C.F. Landslide susceptibility evaluation based on BPNN and GIS: A case of Guojiaba in the Three Gorges Reservoir Area. Int. J. Geogr. Inf. Sci. 2015, 29, 1111–1124.
  27. Ali, S.; Parvin, F.; Pham, Q.B.; Khedher, K.M.; Dehbozorgi, M.; Rabby, Y.W.; Anh, D.T.; Nguyen, D.H. An ensemble random forest tree with SVM, ANN, NBT, and LMT for landslide susceptibility mapping in the Rangit River watershed, India. Nat. Hazards 2022, 113, 1601–1633.
  28. Zhou, X.; Wen, H.; Zhang, Y.; Xu, J.; Zhang, W. Landslide susceptibility mapping using hybrid random forest with GeoDetector and RFE for factor optimization. Geosci. Front. 2021, 12, 101211.
  29. Lee, S.; Lee, M.J.; Jung, H.S.; Lee, S. Landslide susceptibility mapping using Naïve Bayes and Bayesian network models in Umyeonsan, Korea. Geocarto Int. 2019, 35, 1665–1679.
  30. Guo, Z.; Shi, Y.; Huang, F.; Fan, X.; Huang, J. Landslide susceptibility zonation method based on C5.0 decision tree and K-means cluster algorithms to improve the efficiency of risk management. Geosci. Front. 2021, 12, 249–267.
  31. Yuan, X.; Liu, C.; Nie, R.; Yang, Z.; Li, W.; Dai, X.; Cheng, J.; Zhang, J.; Ma, L.; Fu, X.; et al. A Comparative Analysis of Certainty Factor-Based Machine Learning Methods for Collapse and Landslide Susceptibility Mapping in Wenchuan County, China. Remote Sens. 2022, 14, 3259.
  32. Di Napoli, M.; Carotenuto, F.; Cevasco, A.; Confuorto, P.; Di Martire, D.; Firpo, M.; Pepe, G.; Raso, E.; Calcaterra, D. Machine learning ensemble modelling as a tool to improve landslide susceptibility mapping reliability. Landslides 2020, 17, 1897–1914.
  33. Pradhan, B.; Seeni, M.I.; Kalantar, B. Performance Evaluation and Sensitivity Analysis of Expert-Based, Statistical, Machine Learning, and Hybrid Models for Producing Landslide Susceptibility Maps; Springer International Publishing: Cham, Switzerland, 2017; pp. 193–232.
  34. Nguyen, V.T.; Tran, T.H.; Ha, N.A.; Ngo, V.L.; Nadhir, A.A.; Tran, V.P.; Duy Nguyen, H.; Ma, M.; Amini, A.; Prakash, I.; et al. GIS Based Novel Hybrid Computational Intelligence Models for Mapping Landslide Susceptibility: A Case Study at Da Lat City, Vietnam. Sustainability 2019, 11, 7118.
  35. Yang, L.; Wei, C. Landslide Susceptibility Evaluation Using Hybrid Integration of Evidential Belief Function and Machine Learning Techniques. Water 2019, 12, 113.
  36. Gu, T.; Li, J.; Wang, M.; Duan, P. Landslide susceptibility assessment in Zhenxiong County of China based on geographically weighted logistic regression model. Geocarto Int. 2021, 37, 4952–4973.
  37. Xiao, B.; Zhao, J.; Li, D.; Zhao, Z.; Zhou, D.; Xi, W.; Li, Y. Combined SBAS-InSAR and PSO-RF Algorithm for Evaluating the Susceptibility Prediction of Landslide in Complex Mountainous Area: A Case Study of Ludian County, China. Sensors 2022, 22, 8041.
  38. Mehdi, S.; Baharak, M.; Hasan, A.; Abolfazl, M. Assessing landslide susceptibility using machine learning models: A comparison between ANN, ANFIS, and ANFIS-ICA. Environ. Earth Sci. 2020, 79, 536.
  39. Zhou, C.; Yin, K.; Cao, Y.; Ahmed, B.; Li, Y.; Catani, F.; Pourghasemi, H.R. Landslide susceptibility modeling applying machine learning methods: A case study from Longju in the Three Gorges Reservoir area, China. Comput. Geosci. 2018, 112, 23–37.
  40. Pham, B.T.; Pradhan, B.; Tien Bui, D.; Prakash, I.; Dholakia, M.B. A comparative study of different machine learning methods for landslide susceptibility assessment: A case study of Uttarakhand area (India). Environ. Model. Softw. 2016, 84, 240–250.
  41. Xiong, K.; Adhikari, B.R.; Stamatopoulos, C.A.; Zhan, Y.; Wu, S.; Dong, Z.; Di, B. Comparison of Different Machine Learning Methods for Debris Flow Susceptibility Mapping: A Case Study in the Sichuan Province, China. Remote Sens. 2020, 12, 295.
  42. Pham, B.T.; Shirzadi, A.; Shahabi, H.; Omidvar, E.; Singh, S.K.; Sahana, M.; Asl, D.T.; Ahmad, B.B.; Quoc, N.K.; Lee, S. Landslide Susceptibility Assessment by Novel Hybrid Machine Learning Algorithms. Sustainability 2019, 11, 4386.
  43. Zhao, C.; Lu, Z. Remote Sensing of Landslides—A Review. Remote Sens. 2018, 10, 279.
  44. Zhu, Z.; Gan, S.; Yuan, X.; Zhang, J. Landslide Susceptibility Mapping with Integrated SBAS-InSAR Technique: A Case Study of Dongchuan District, Yunnan (China). Sensors 2022, 22, 5587.
  45. Cheng, J.Y.; Dai, X.A.; Wang, Z.K.; Li, J.Z.; Qu, G.; Li, W.L.; She, J.X.; Wang, Y.L. Landslide Susceptibility Assessment Model Construction Using Typical Machine Learning for the Three Gorges Reservoir Area in China. Remote Sens. 2022, 14, 2257.
  46. Shahzad, N.; Ding, X.L.; Abbas, S. A Comparative Assessment of Machine Learning Models for Landslide Susceptibility Mapping in the Rugged Terrain of Northern Pakistan. Appl. Sci. 2022, 12, 2280. [Google Scholar] [CrossRef]
  47. Angillieri, M.Y.E. Debris flow susceptibility mapping using frequency ratio and seed cells, in a portion of a mountain international route, Dry Central Andes of Argentina. Catena 2020, 189, 104504. [Google Scholar] [CrossRef]
  48. Marjanović, M.; Kovačević, M.; Bajat, B.; Voženílek, V. Landslide susceptibility assessment using SVM machine learning algorithm. Eng. Geol. 2011, 123, 225–234. [Google Scholar] [CrossRef]
  49. Stumpf, A.; Kerle, N. Object-oriented mapping of landslides using Random Forests. Remote Sens. Environ. 2011, 115, 2564–2577. [Google Scholar] [CrossRef]
  50. Polat, A. An innovative, fast method for landslide susceptibility mapping using GIS-based LSAT toolbox. Environ. Earth Sci. 2021, 80, 217. [Google Scholar] [CrossRef]
  51. Youssef, A.M.; Pourghasemi, H.R. Landslide susceptibility mapping using machine learning algorithms and comparison of their performance at Abha Basin, Asir Region, Saudi Arabia. Geosci. Front. 2021, 12, 639–655. [Google Scholar] [CrossRef]
  52. Wu, R.Z.; Hu, X.D.; Mei, H.B.; He, J.Y.; Yang, J.Y. Spatial susceptibility assessment of landslides based on random forest: A case study from Hubei section in the three gorges reservoir area. Earth Sci. 2021, 46, 321–330. [Google Scholar]
  53. Zhang, J.; Yin, K.; Wang, J.; Liu, L.; Huang, F. Evaluation of landslide susceptibility for Wanzhou district of Three Gorges Reservoir. Chin. J. Rock Mech. Eng. 2016, 35, 284–296. [Google Scholar]
  54. Langping, L.I.; Hengxing, L.; Changbao, G.; Yongshuang, Z. Geohazard Susceptibility Assessment along the Sichuan Tibet Railway and Its Adjacent Area Using an Improved Frequency Ratio Method. Geoscience 2017, 31, 911. [Google Scholar]
  55. Zevenbergen, L.W.; Thorne, C.R. Quantitative analysis of land surface topography. Earth Surf. Process. Landf. 1987, 12, 47–56. [Google Scholar] [CrossRef]
  56. Camera, C.A.S.; Bajni, G.; Corno, I.; Raffa, M.; Stevenazzi, S.; Apuani, T. Introducing intense rainfall and snowmelt variables to implement a process-related non-stationary shallow landslide susceptibility analysis. Sci. Total Environ. 2021, 786, 147360. [Google Scholar] [CrossRef] [PubMed]
  57. Xi, W.; Li, G.; Moayedi, H.; Nguyen, H. A particle-based optimization of artificial neural network for earthquake-induced landslide assessment in Ludian county, China. Geomat. Nat. Hazards Risk 2019, 10, 1750–1771. [Google Scholar] [CrossRef] [Green Version]
  58. Hadmoko, D.S.; Lavigne, F.; Sartohadi, J.; Hadi, P. Landslide hazard and risk assessment and their application in risk management and landuse planning in eastern flank of Menoreh Mountains, Yogyakarta Province, Indonesia. Nat. Hazards 2010, 54, 623–642. [Google Scholar] [CrossRef]
  59. Deng, X.; Sun, G.; He, N.; Yu, Y. Landslide susceptibility mapping with the integration of information theory, fractal theory, and statistical analyses at a regional scale: A case study of Altay Prefecture, China. Environ. Earth Sci. 2022, 81, 346. [Google Scholar] [CrossRef]
  60. Faming, H.; Zhou, Y.E.; Chi, Y. Uncertainties of landslide susceptibility prediction: Different attribute interval divisions of environmental factors and different data-based models. Earth Sci. 2020, 45, 4535–4549. [Google Scholar]
Figure 1. Location of the study area and landslide inventory. (a) Regions of China; the small boxes indicate local areas. (b) Yunnan Province, with Weixin County shown in red. (c) Locations of landslides in Weixin County.
Figure 2. (a,c) Examples of landslides from Google Earth and their locations marked in Figure 1. (b,d) Examples of landslides from field investigation and their locations marked in Figure 1.
Figure 3. The environmental factors: (a) elevation, (b) slope, (c) aspect, (d) distance to roads, (e) distance to rivers, (f) distance to faults, (g) rainfall, (h) land use, (i) lithology, (j) NDVI, (k) plan curvature, and (l) profile curvature.
Figure 4. Methodological flowchart.
Figure 5. Pearson correlation values between factors.
Figure 6. Landslide susceptibility mapping of different models: (a) LR, (b) SVM, (c) RF, (d) IV–LR, (e) IV–SVM, (f) IV–RF, (g) FR–LR, (h) FR–SVM, and (i) FR–RF.
Figure 7. Precision comparison of the models.
Figure 8. ROC curves and associated AUC values for the validation set: (a) LR, IV–LR, and FR–LR; (b) SVM, IV–SVM, and FR–SVM; (c) RF, IV–RF, and FR–RF; and (d) LR, SVM, and RF.
Figure 9. Landslide susceptibility prediction and case validation and analysis: (a) Simple labeling based on the FR–RF model. (b) General picture of Longdongyan landslide. (c) Details of landslides.
Figure 10. Importance ranking of environmental factors of the FR–RF model.
Table 1. Landslide genesis factors and their sources.

| Factors | Clusters | Sources |
|---|---|---|
| Elevation, Slope, Aspect, Plan curvature, Profile curvature | Topographic | ASTER GDEM (spatial resolution of 30 m × 30 m) (http://www.gscloud.cn/, accessed on 13 May 2021) |
| Distance to faults, Lithology | Geological | Geological map of China (Scale of 1:20,000) |
| Rainfall | Hydrological | Data Center of the Chinese Academy of Sciences (Spatial resolution 1 km × 1 km) (http://www.resdc.cn, accessed on 15 July 2021) |
| Distance to rivers | Hydrological | The thematic map of the river system in China from 91 satellite map assistant software (Scale of 1:50,000) |
| NDVI | Land cover | The geospatial data cloud network (The Landsat 8 OLI image on http://www.gscloud.cn/ (accessed on 2 August 2021)) |
| Land use | Land cover | The land use and land cover change database in China (http://www.resdc.cn, accessed on 10 September 2021) |
| Distance to roads | Land cover | Open street map data (https://www.openstreetmap.org, accessed on 3 August 2021) |
Table 2. Confusion matrix.

| Prediction Situation | Actual Situation: Positive Sample | Actual Situation: Negative Sample |
|---|---|---|
| Landslide | True positive (TP) | False positive (FP) |
| Negative sample | False negative (FN) | True negative (TN) |
Table 3. Collinearity diagnostic results of influence factors.

| Factors | TOL | VIF |
|---|---|---|
| Elevation | 0.513 | 1.951 |
| Slope | 0.844 | 1.185 |
| Aspect | 0.996 | 1.004 |
| Profile curvature | 0.713 | 1.403 |
| Plan curvature | 0.718 | 1.392 |
| Distance to rivers | 0.838 | 1.193 |
| Distance to roads | 0.853 | 1.172 |
| Distance to faults | 0.800 | 1.25 |
| Rainfall | 0.702 | 1.424 |
| NDVI | 0.603 | 1.659 |
| Lithology | 0.779 | 1.283 |
| Land use | 0.721 | 1.388 |
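The two columns of Table 3 are reciprocals (VIF = 1/TOL; for elevation, 1/0.513 ≈ 1.95), and the tolerance of a factor is conventionally 1 − R² from regressing that factor on all the others. The following is a minimal sketch of that conventional calculation, assuming NumPy and a hypothetical sample matrix `X` with one column per factor; it is not the authors' code, and the function name `tolerance_and_vif` is introduced here for illustration only.

```python
import numpy as np

def tolerance_and_vif(X):
    """Return (TOL, VIF) arrays, one entry per column of X (n_samples, n_factors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    tol = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Ordinary least squares of factor j on an intercept plus the other factors.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        tol[j] = ss_res / ss_tot  # equals 1 - R^2
    return tol, 1.0 / tol
```

Under the commonly used rule of thumb (an assumption here, not a statement from Table 3) that VIF above 10 or TOL below 0.1 signals problematic collinearity, every factor in Table 3 sits well inside the acceptable range.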
Table 4. Classification of attribute intervals of environmental factors with information values and frequency ratios.

| Factors | Classes | FR | IV |
|---|---|---|---|
| Elevation/m | 478–800 | 1.66 | 0.51 |
| | 801–900 | 0.68 | −0.38 |
| | 901–1000 | 0.97 | −0.03 |
| | 1001–1100 | 1.07 | 0.06 |
| | 1101–1200 | 0.97 | −0.03 |
| | 1201–1300 | 1.53 | 0.43 |
| | 1300–1500 | 0.9 | −0.1 |
| | >1500 | 0.33 | −1.12 |
| Aspect | Flat (−1) | 0 | −9.94 |
| | North (0–22.5, 337.5–360) | 0.88 | −0.25 |
| | Northeast (22.5–67.5) | 0.93 | −0.08 |
| | East (67.5–112.5) | 1.05 | 0.05 |
| | Southeast (112.5–157.5) | 0.98 | −0.02 |
| | South (157.5–202.5) | 1.25 | 0.23 |
| | Southwest (202.5–247.5) | 1.2 | 0.18 |
| | West (247.5–292.5) | 1.05 | 0.05 |
| | Northwest (292.5–337.5) | 0.83 | −0.19 |
| NDVI | −0.71 | 15.03 | 2.71 |
| | 0.16–0.34 | 4.59 | 1.52 |
| | 0.34–0.47 | 1.63 | 0.49 |
| | 0.47–0.77 | 1.36 | 0.31 |
| | 0.77–0.92 | 0.5 | −0.69 |
| Distance to faults/m | 0–200 | 1.69 | 0.52 |
| | 200–400 | 1.97 | 0.68 |
| | 400–600 | 1.69 | 0.53 |
| | 600–800 | 1.16 | 0.15 |
| | 800–1000 | 0.94 | −0.06 |
| | 1000–1200 | 1.22 | 0.2 |
| | >1200 | 0.87 | −0.14 |
| Profile curvature | −17.95–−8.8 | 0 | −9.73 |
| | −8.8–0.35 | 0.88 | −0.13 |
| | 0.35–4.92 | 1.24 | 0.21 |
| | 4.92–9.50 | 1.55 | 0.44 |
| | 9.50–23.23 | 0 | −10.79 |
| Plan curvature | −12.33–−0.15 | 1.08 | 0.08 |
| | −0.15–0.29 | 0.94 | −0.07 |
| | 0.29–1.49 | 0.96 | −0.05 |
| | 1.49–2.47 | 0.94 | −0.06 |
| | 2.47–15.42 | 1.06 | 0.06 |
| Slope | 0–10 | 0.91 | −0.1 |
| | 10–20 | 0.97 | −0.03 |
| | 20–25 | 1.02 | 0.02 |
| | 25–30 | 1.07 | 0.07 |
| | 30–40 | 1.09 | 0.08 |
| | 40–45 | 0.91 | −0.09 |
| | 45–76 | 0.8 | −0.23 |
| Rainfall/mm | 1076.24–1080.39 | 0.86 | −0.16 |
| | 1080.39–1082.46 | 0.42 | −0.91 |
| | 1082.46–1084.53 | 0.2 | −1.64 |
| | 1084.53–1088.97 | 1.43 | 0.42 |
| | 1088.97–1095.68 | 1.15 | 0.2 |
| | 1095.68–1099.62 | 1 | −0.01 |
| | 1099.62–1101.40 | 0.88 | −0.15 |
| Distance to roads/m | 0–200 | 3.02 | 1.1 |
| | 200–400 | 1 | 0 |
| | 400–600 | 0.73 | −0.32 |
| | 600–800 | 0.42 | −0.86 |
| | 800–1000 | 0.56 | −0.57 |
| | >1000 | 0.35 | −1.05 |
| Distance to rivers/m | 0–200 | 1.44 | 0.36 |
| | 200–400 | 1.01 | 0.01 |
| | 400–600 | 0.71 | −0.35 |
| | 600–800 | 1.12 | 0.11 |
| | 800–1000 | 0.56 | −0.58 |
| | >1000 | 0.91 | −0.09 |
| Land use | Forestland | 0.78 | −0.25 |
| | Farmland | 0.83 | −0.19 |
| | Residential areas | 4.65 | 1.54 |
| | Grassland | 1.68 | 0.52 |
| | Water | 2.62 | 0.96 |
| | Bareland | 9.73 | 2.28 |
| | Gardenland | 1.3 | 0.26 |
| Lithology | Dolomite | 1.74 | 0.56 |
| | Mudstone and limestone | 0.44 | −0.82 |
| | Shales | 1.4 | 0.34 |
| | Magmatic veins | 1.25 | 0.22 |
| | Metamorphic rock | 2.45 | 0.9 |
| | Granitic rocks | 0.74 | −0.3 |
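Most rows of Table 4 appear consistent with IV = ln(FR) to two decimals (for example, ln 1.66 ≈ 0.51 and ln 15.03 ≈ 2.71), with FR taken as the share of landslide pixels in a class divided by the share of area pixels in that class. The sketch below illustrates both quantities under that assumption. The function names and the floor `eps` for classes with FR = 0 are hypothetical; Table 4 reports large negative IV values for such classes, but how they were obtained is not stated here. The worked example reuses the LR rows of Table 6, where the same ratio appears as the last column.

```python
import math

def frequency_ratio(landslides_in_class, landslides_total, area_in_class, area_total):
    """Share of landslide pixels in a class divided by the share of area pixels."""
    return (landslides_in_class / landslides_total) / (area_in_class / area_total)

def information_value(fr, eps=1e-5):
    """Natural log of FR; eps is a hypothetical floor for classes with FR = 0."""
    return math.log(max(fr, eps))

# LR rows of Table 6: (area pixels, landslide pixels) per susceptibility level.
lr_rows = {
    "Very low": (274_712, 290),
    "Low": (426_520, 578),
    "Moderate": (416_408, 1424),
    "High": (281_493, 2203),
    "Very high": (149_072, 2824),
}
area_total = sum(area for area, _ in lr_rows.values())
slide_total = sum(slides for _, slides in lr_rows.values())

fr_by_level = {
    level: round(frequency_ratio(slides, slide_total, area, area_total), 2)
    for level, (area, slides) in lr_rows.items()
}
print(fr_by_level)
```

Values of FR above 1 (IV above 0) mark classes where landslides are over-represented relative to their area, which is why the coupled models feed these per-class values to the classifiers in place of the raw factor values.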
Table 5. Coefficients and constant terms for LR, IV–LR, and FR–LR.

| Factors | LR | IV–LR | FR–LR |
|---|---|---|---|
| Elevation | 0 | 0.28 | 0.36 |
| Slope | 0.02 | 2.21 | 2.1 |
| Aspect | 0 | 1.12 | 1.08 |
| Distance to rivers | 0 | −0.374 | −0.421 |
| Distance to roads | −0.001 | 0.761 | 0.598 |
| Distance to faults | 0 | 0.298 | 0.288 |
| Profile curvature | 1.072 | 1.802 | 1.182 |
| Plan curvature | 1.108 | 0.575 | 0.555 |
| Rainfall | −0.013 | 0.642 | 0.791 |
| NDVI | −3.38 | 0.667 | 0.237 |
| Lithology | 0.158 | 0.631 | 0.656 |
| Land use | 0.065 | 0.343 | 0.182 |
| Constant | 15.1 | −0.1 | −6.5 |
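As an illustration of how the Table 5 coefficients could be applied, the sketch below evaluates the standard logistic form P = 1/(1 + e^(−z)), with z equal to the constant plus the weighted sum of factor inputs, using the IV–LR column. It assumes that IV–LR takes the per-class IV values of Table 4 as inputs; the grid cell and the function name are hypothetical, and this is not the authors' implementation.

```python
import math

# IV–LR coefficients and constant copied from Table 5.
IV_LR_COEF = {
    "Elevation": 0.28, "Slope": 2.21, "Aspect": 1.12,
    "Distance to rivers": -0.374, "Distance to roads": 0.761,
    "Distance to faults": 0.298, "Profile curvature": 1.802,
    "Plan curvature": 0.575, "Rainfall": 0.642, "NDVI": 0.667,
    "Lithology": 0.631, "Land use": 0.343,
}
IV_LR_CONST = -0.1

def landslide_probability(iv_by_factor, coef=IV_LR_COEF, const=IV_LR_CONST):
    """Standard logistic function of the linear predictor z."""
    z = const + sum(coef[name] * iv_by_factor[name] for name in coef)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical grid cell: one Table 4 class per factor, represented by its IV.
cell = {
    "Elevation": 0.51,           # 478–800 m
    "Slope": 0.07,               # 25–30
    "Aspect": 0.23,              # South
    "Distance to rivers": 0.36,  # 0–200 m
    "Distance to roads": 1.1,    # 0–200 m
    "Distance to faults": 0.52,  # 0–200 m
    "Profile curvature": 0.21,   # 0.35–4.92
    "Plan curvature": 0.08,      # −12.33–−0.15
    "Rainfall": 0.42,            # 1084.53–1088.97 mm
    "NDVI": 0.49,                # 0.34–0.47
    "Lithology": 0.56,           # Dolomite
    "Land use": 0.52,            # Grassland
}
print(round(landslide_probability(cell), 3))
```

A cell whose classes all have IV = 0 (landslide share equal to area share) reduces to the constant term alone, so its probability under this sketch is just below 0.5.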
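Table 6. Distribution of landslides at all susceptibility levels with different models.

| Model | Geohazard Level | Number of Area Pixels | Percentage of Area Pixels (%) | Number of Landslide Pixels | Ratio of Landslides (%) | Frequency Ratio (FR) |
|---|---|---|---|---|---|---|
| LR | Very low | 274,712 | 17.74 | 290 | 3.96 | 0.22 |
| | Low | 426,520 | 27.55 | 578 | 7.9 | 0.29 |
| | Moderate | 416,408 | 26.9 | 1424 | 19.46 | 0.72 |
| | High | 281,493 | 18.18 | 2203 | 30.1 | 1.66 |
| | Very high | 149,072 | 9.63 | 2824 | 38.58 | 4.01 |
| IV–LR | Very low | 393,109 | 25.39 | 278 | 3.8 | 0.15 |
| | Low | 479,922 | 31 | 550 | 7.51 | 0.24 |
| | Moderate | 339,148 | 21.91 | 1262 | 17.24 | 0.79 |
| | High | 212,188 | 13.71 | 2235 | 30.54 | 2.23 |
| | Very high | 133,835 | 8.64 | 2994 | 40.91 | 4.73 |
| FR–LR | Very low | 397,363 | 25.67 | 280 | 3.83 | 0.15 |
| | Low | 479,757 | 30.99 | 560 | 7.65 | 0.25 |
| | Moderate | 344,913 | 22.28 | 1266 | 17.3 | 0.78 |
| | High | 192,377 | 12.43 | 2275 | 31.08 | 2.5 |
| | Very high | 133,794 | 8.64 | 2938 | 40.14 | 4.65 |
| SVM | Very low | 572,224 | 36.96 | 247 | 3.37 | 0.09 |
| | Low | 373,936 | 24.15 | 702 | 9.59 | 0.4 |
| | Moderate | 218,640 | 14.12 | 908 | 12.41 | 0.88 |
| | High | 244,536 | 15.79 | 1318 | 18.01 | 1.14 |
| | Very high | 138,868 | 8.97 | 4144 | 56.62 | 6.31 |
| IV–SVM | Very low | 677,668 | 43.77 | 263 | 8.84 | 0.2 |
| | Low | 382,465 | 24.7 | 824 | 9.89 | 0.4 |
| | Moderate | 172,218 | 11.12 | 684 | 9.35 | 0.84 |
| | High | 180,610 | 11.67 | 1372 | 14.31 | 1.23 |
| | Very high | 135,244 | 8.74 | 4176 | 57.62 | 6.6 |
| FR–SVM | Very low | 652,793 | 42.16 | 210 | 2.87 | 0.07 |
| | Low | 374,546 | 24.19 | 820 | 11.2 | 0.46 |
| | Moderate | 173,883 | 11.23 | 686 | 9.37 | 0.83 |
| | High | 180,751 | 11.67 | 1423 | 19.44 | 1.67 |
| | Very high | 136,231 | 8.8 | 4180 | 57.11 | 6.49 |
| RF | Very low | 272,850 | 17.62 | 48 | 0.66 | 0.01 |
| | Low | 429,932 | 27.77 | 346 | 4.73 | 0.07 |
| | Moderate | 413,439 | 26.7 | 978 | 13.36 | 0.4 |
| | High | 306,267 | 19.78 | 1727 | 23.6 | 1.51 |
| | Very high | 125,717 | 8.12 | 4220 | 57.66 | 7.05 |
| IV–RF | Very low | 368,515 | 23.8 | 37 | 0.51 | 0.02 |
| | Low | 470,434 | 30.39 | 372 | 5.08 | 0.17 |
| | Moderate | 342,154 | 22.1 | 924 | 12.62 | 0.57 |
| | High | 238,676 | 15.42 | 1782 | 24.35 | 1.58 |
| | Very high | 128,419 | 8.29 | 4202 | 57.41 | 6.92 |
| FR–RF | Very low | 385,216 | 24.88 | 36 | 0.49 | 0.02 |
| | Low | 471,985 | 30.49 | 325 | 4.44 | 0.15 |
| | Moderate | 333,961 | 21.57 | 874 | 11.94 | 0.55 |
| | High | 230,193 | 14.87 | 1758 | 24.02 | 1.62 |
| | Very high | 126,850 | 8.19 | 4326 | 59.11 | 7.21 |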
Table 7. Analysis of prediction ability of different models by validation samples.

| | LR | SVM | RF | IV–LR | IV–SVM | IV–RF | FR–LR | FR–SVM | FR–RF |
|---|---|---|---|---|---|---|---|---|---|
| TP | 1611 | 1753 | 1874 | 1685 | 1768 | 1968 | 1752 | 1809 | 1979 |
| TN | 1438 | 1658 | 1779 | 1485 | 1621 | 1831 | 1424 | 1596 | 1868 |
| FP | 725 | 505 | 384 | 678 | 542 | 332 | 739 | 567 | 295 |
| FN | 618 | 476 | 355 | 544 | 461 | 261 | 477 | 420 | 250 |
| Precision (%) | 68.96 | 77.64 | 82.99 | 71.31 | 76.54 | 85.57 | 70.33 | 76.14 | 87.03 |
| Recall (%) | 72.27 | 78.65 | 84.07 | 75.59 | 79.32 | 88.29 | 78.60 | 81.16 | 88.78 |
| Accuracy (%) | 69.42 | 77.66 | 83.17 | 72.18 | 77.16 | 86.50 | 72.31 | 77.53 | 87.59 |
| F1 score (%) | 70.58 | 78.14 | 83.53 | 73.39 | 77.90 | 86.91 | 74.24 | 78.57 | 87.90 |
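The four percentage rows of Table 7 follow from the TP, TN, FP, and FN counts through the usual confusion-matrix definitions (Table 2). The short sketch below recomputes them for two columns as a check; the function name `classification_metrics` is introduced here for illustration and is not taken from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision, recall, accuracy, and F1 score in percent, rounded to two decimals."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    values = {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}
    return {name: round(100 * value, 2) for name, value in values.items()}

# Counts copied from the LR and FR–RF columns of Table 7.
lr = classification_metrics(tp=1611, tn=1438, fp=725, fn=618)
fr_rf = classification_metrics(tp=1979, tn=1868, fp=295, fn=250)
print(lr)
print(fr_rf)
```

Both columns reproduce the tabulated values (for example, 68.96% precision for LR and 87.59% accuracy for FR–RF), which also confirms how the run-together digits of the original table split into nine columns.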
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

He, W.; Chen, G.; Zhao, J.; Lin, Y.; Qin, B.; Yao, W.; Cao, Q. Landslide Susceptibility Evaluation of Machine Learning Based on Information Volume and Frequency Ratio: A Case Study of Weixin County, China. Sensors 2023, 23, 2549. https://doi.org/10.3390/s23052549
