Article

Geological Disaster Susceptibility Evaluation of a Random-Forest-Weighted Deterministic Coefficient Model

1
Institute of International Rivers and Eco-Security, Yunnan University, Kunming 650500, China
2
Yunnan International Joint Laboratory of Critical Mineral Resource, Kunming 650500, China
3
School of Earth Science, Yunnan University, Kunming 650500, China
4
Faculty of Architecture and City Planning, Kunming University of Science and Technology, Kunming 650500, China
5
Yunnan Architectural Engineering Design Company Limited, Kunming 650501, China
*
Author to whom correspondence should be addressed.
Sustainability 2023, 15(17), 12691; https://doi.org/10.3390/su151712691
Submission received: 10 June 2023 / Revised: 18 August 2023 / Accepted: 21 August 2023 / Published: 22 August 2023

Abstract

An assessment of regional vulnerability to geological disasters can directly indicate the extent and intensity of risks within the study area, thus providing precise guidance for disaster management efforts. However, when geological disaster susceptibility is evaluated using a single deterministic coefficient model, the deterministic coefficient values of the evaluation factors are superimposed directly, without considering their objective weights, which can reduce the accuracy of the susceptibility zoning. To address this limitation, this research proposes a novel approach: geological disaster susceptibility evaluation using a random-forest-weighted deterministic coefficient model. In this method, the objective weight of each evaluation factor is calculated based on a deterministic coefficient model and a parameter-optimized random forest model. By weighting and superimposing the deterministic coefficient values of each evaluation factor, a comprehensive deterministic coefficient map is generated. This map is then classified using the natural breaks method to obtain a geological disaster susceptibility zoning map. To validate the accuracy of the evaluation results, partition statistics and the ROC (Receiver Operating Characteristic) curve of the test sample points are utilized. The findings demonstrate that the model performs well in evaluating geological disaster susceptibility in Huize County. The evaluation results are considered reliable and accurate, highlighting the effectiveness of the proposed approach for assessing and zoning geological disaster susceptibility in the region.

1. Introduction

Geological disasters are natural disasters in which changes in the geological environment or the impact of human activities cause the destruction, displacement, or instability of geological bodies within the crust or at the surface, which in turn damages human life, production, and the environment. There are various types of geological disasters, including earthquakes, landslides, debris flows, and collapses. These disasters may lead to casualties, property losses, and even long-term damage to the environment. Therefore, it is of great practical significance to predict and prevent geological disasters. Yunnan Province, situated on the southwest border of China, is a representative mountainous plateau region. Its distinct geographical and geological environment and its topography are the primary causes behind the frequent occurrence of geological disasters in the region. Areas with mountainous plateau terrain are prone to geological hazards such as collapses, landslides, and debris flows, which pose significant threats to national economic development and the safety of lives and property, and cause substantial damage to infrastructure and the ecological environment within the affected areas [1]. Among geological disasters in China, slope-related incidents rank as the second most prevalent after earthquakes [2]. Within geological disaster management, susceptibility zoning is an essential component of prevention and mitigation efforts. A scientifically sound and reliable geological disaster susceptibility zoning map plays a strategic guiding role in various work processes by effectively predicting disaster-prone areas and reducing the losses caused by such events. Furthermore, the assessment of geological disaster susceptibility can offer valuable guidance for environmental management and ecological protection. 
It helps prevent long-term and irreparable environmental damage caused by disasters, promoting environmental preservation and ecological equilibrium. This assessment provides a robust foundation for the rational development and utilization of resources, mitigating environmental harm stemming from overexploitation and deforestation, and ensuring sustainable resource utilization. By evaluating the susceptibility of geological disasters, appropriate planning and construction standards can be formulated. This process reinforces the prevention and preparedness measures for urban geological disasters, fostering the sustainable development of cities. Therefore, the study of geological disaster susceptibility assessment is indispensable for regional sustainable development.
To date, both domestic and international scholars have made significant progress in evaluating geological disaster susceptibility. The primary evaluation models can be categorized into three types: qualitative models, mathematical statistical models, and machine learning models [3]. Qualitative models primarily include the Analytic Hierarchy Process [4] and the fuzzy comprehensive evaluation method [5]. These models determine the weight of evaluation factors based on experts’ understanding of the disaster mechanisms and their accumulated experience. However, the evaluation results from such models may contain subjective elements. Geological disasters often result from a combination of internal and inducing factors and their spatial and temporal characteristics are not influenced by human subjective consciousness. Mathematical statistical models mainly encompass the information quantity method [6] and the deterministic coefficient method [7]. These models objectively reflect the contribution of each evaluation factor to the occurrence of geological disasters based on the information value carried by each factor. However, they require extensive engineering geological data, which may not be suitable for large research areas. Furthermore, these models fail to fully consider the correlation between various evaluation factors in weight determination and merely overlap information values with deterministic coefficient values. As a result, the accuracy of the evaluation results can be affected to some extent. Machine learning models mainly include artificial neural networks [8] and support vector machines [9]. While these models can better adapt to the complex nonlinear characteristics of slope-related geological disasters, they often suffer from issues related to weak interpretation or overfitting of prediction results [10]. 
Additionally, these methods do not directly reflect the relative importance of each evaluation index, failing to effectively guide the focus of disaster prevention and control efforts.
To enhance the precision of model evaluation results, mitigate overfitting, and capture the relative importance of each evaluation factor, the random forest algorithm, a prominent ensemble learning technique, has emerged as a promising solution to the aforementioned limitations. Merghadi et al. [11] conducted research in the Mira Basin in North Africa, comparing the prediction accuracy of five machine learning models, including random forest, gradient boosting machines, and neural networks, for assessing landslide susceptibility. The results demonstrated that the random forest model exhibited superior prediction accuracy. Likewise, He et al. [12] utilized the random forest model to investigate the rapid evaluation of earthquake-induced landslide susceptibility. Through verification, the study found that the random forest model displayed strong predictive capabilities, allowing for the analysis of three influential factors contributing to the occurrence of disasters. Furthermore, Goetz et al. [13] compared the predictive capabilities of mathematical statistical models and machine learning models in landslide susceptibility assessments across three distinct regions in Austria. The findings indicated that the random forest model was better suited to landslide susceptibility modeling. In recent years, coupled models that integrate individual models with the random forest technique have attracted considerable attention among scholars in the field of geological disasters, as they improve prediction and generalization capabilities. Liu et al. [14] combined the information method with the random forest model to investigate geological disaster susceptibility in Gongbujiangda County. The random forest model was employed to obtain the weights of the evaluation factors, and a weighted linear combination was used to generate the susceptibility zoning map. Similarly, Zheng et al. 
[15] examined the application of the deterministic coefficient method and the random forest model in assessing landslide susceptibility in Mangshi, Yunnan. The study revealed that the CF-RF (Certainty Factor-Random Forest) model achieved higher accuracy than the standalone RF model. The evaluation results obtained can serve as a scientific reference.
The current research aims to identify and assess the likelihood and potential risks of geological disasters in specific areas. Its primary objective is to provide decision-makers with scientific insights for preventing and mitigating disaster risks. By avoiding the construction of densely populated areas and critical infrastructure in high-risk zones, potential disaster risks can be reduced, safeguarding both lives and property and promoting sustainable development. To that end, and based on the literature cited above, this study explores the integration of machine learning and quantitative models to minimize the influence of subjective factors on the accuracy of geological disaster susceptibility assessments. Eight factors associated with geological disasters, namely, elevation, slope, topographic relief, stratigraphic lithology, normalized difference vegetation index, distance from a fault, distance from a road, and distance from a river, have been selected as evaluation indices. The research area is Huize County in Yunnan Province, China. A CF-RF model is constructed to assess the susceptibility of geological disasters in this region, and the accuracy of the results is thoroughly analyzed. Furthermore, the significance of each evaluation factor is examined through the parameter-optimized random forest model. The anticipated research outcomes will serve as a valuable reference for relevant departments engaged in geological disaster management and land-use spatial planning.

2. Overview of the Study Area

Huize County, under the administration of Qujing City, is situated in the northeastern part of Yunnan Province. It is positioned on the eastern bank of the Jinsha River, at the summit of the main peak of the Wumeng Mountains, and serves as the juncture between the eastern Yunnan Plateau and the western Guizhou Plateau. The county’s boundaries span between 103° 03′~103° 55′ E and 25° 48′~27° 04′ N. The terrain is characterized by steep slopes, towering heights, and numerous deep ravines, presenting an undulating landscape. Generally, the topography features a high-northwest-to-low-southeast gradient, with a relative elevation difference of 3322 m from west to east. The highest peak in the region is Dahailiangzi Guniuzhai, which stands at 4017 m and is the highest point in Qujing City. Conversely, the lowest point lies at the junction of the Xiaojiang River and Jinsha River, with an altitude of 695 m, marking the lowest elevation in Qujing City. The geomorphological attributes of the area classify it into three categories: mountain landforms, basin landforms, and localized glacial landforms. Huize experiences a typical temperate plateau monsoon climate, characterized by cool summers and cold winters, with indistinct seasonal boundaries. The annual maximum rainfall reaches 1500 mm in the higher-altitude regions, while the minimum annual rainfall in lower-altitude areas reaches 500 mm. On average, the annual rainfall measures approximately 817.1 mm. The region boasts a well-developed surface water system, with the Xiaojiang River, Yili River, and Niulan River as the primary watercourses. Due to river channel incision, slopes in the area are susceptible to destabilization under external forces, leading to disasters such as landslides and debris flows. The exposed geological strata in Huize span from the Proterozoic to the Quaternary with the exception of the Cretaceous period. Areas characterized by hard and soft/hard rock compositions are prone to geological disasters. 
The region experiences pronounced tectonic activity, primarily in the form of synclines. Notably, the Xiaojiang Fault exhibits significant recent tectonic movement and is prone to frequent earthquakes. Existing disasters primarily occur in proximity to fault zones. The geographic location and distribution of disaster points in the study area are depicted in Figure 1. This study selected 497 existing geological disaster points in Huize County, including 373 landslides, 46 unstable slopes, 61 debris flows, and 17 collapses. Landslides are therefore the main type of disaster in the region, accounting for 75% of the total number of disasters (373 of 497), and they seriously threaten the safety of people’s lives and property. Landslide disasters in Huize County mainly occur from June to August, which coincides with the flood season in this area. Heavy rainfall is the primary triggering factor of geological disasters in Huize County.
In Huize County, geological disasters are increasingly influenced by single-point heavy rainfall and continuous rainfall, with slope-related geological disasters being more likely during the flood season. These disasters pose a significant threat to the lives of approximately 921 households and 1545 individuals, as well as properties valued at around 36.13 million yuan. Over the years, the region has experienced five geological disaster risks resulting in two actual disasters, leading to property losses amounting to approximately 5.05 million yuan. For instance, the landslide in Malu town in 2019 serves as a prime example of the impact of heavy rains. During a single night, multiple landslides occurred, causing the collapse of highways, destruction of numerous houses, washing away of vehicles, breaking of trees, and inundation of villagers’ homes, resulting in significant loss of life and property. However, at the same time, Huize County also boasts abundant biological and mineral resources, a rich national cultural heritage, an extensive road network, and numerous scenic spots. Given this, conducting an evaluation of geological hazard susceptibility in Huize County becomes a pressing concern. Understanding susceptibility to geological disasters can enable the identification and assessment of potential risks, facilitating early warning and prediction of disaster events. This, in turn, empowers people to take necessary pre-emptive measures to minimize losses and casualties caused by such disasters. Furthermore, such evaluations provide a scientific foundation and decision-making support, thereby reducing the risk of geological disasters, safeguarding the lives and property of the populace, fostering sustainable socio-economic development, and promoting environmental preservation and ecological balance.

3. Introduction to the Model Methods

3.1. Certainty Factor Model

The deterministic coefficient model, also known as the certainty factor (CF) model, was first introduced by Shortliffe et al. [16] and later refined by Heckerman [17]. The model is a probability function that draws on both existing data and theoretical knowledge to formulate a mathematical representation of the relationships between variables, with the primary objective of predicting future outcomes. Central to the model is the principle of causality: it posits a defined relationship between variables, such that a change in one variable leads to a consequent change in another. In the context of evaluating geological disaster susceptibility, the model operates on the assumption that conditions leading to future geological disasters mirror those of past events. The deterministic coefficient model facilitates the analysis of relationships between the likelihood of geological disasters and their causative factors, such as geological conditions, topography, and hydrological states. With such a model, it becomes possible to forecast potential geological disasters, so that appropriate preventive strategies can be designed to mitigate potential damage. The computational formula for this model is:
$$
CF = \begin{cases}
\dfrac{PP_a - PP_s}{PP_a \, (1 - PP_s)}, & PP_a \ge PP_s \\[2ex]
\dfrac{PP_a - PP_s}{PP_s \, (1 - PP_a)}, & PP_a < PP_s
\end{cases}
\tag{1}
$$
where $PP_a$ represents the conditional probability of geological disasters in classification level a of an evaluation factor. In geological disaster susceptibility assessment, it is typically expressed as the ratio of the number of geological disaster points in level a to the area of that level. $PP_s$ is the ratio of the number of geological disaster points in the whole study area to the area of the whole study area. From Equation (1), it is evident that the CF value lies in the interval [−1, 1]. A positive value signifies a higher susceptibility to geological disasters, with values closer to 1 indicating a greater contribution of the corresponding factor level. Conversely, a negative value suggests a lower likelihood of geological disasters occurring, with values closer to −1 indicating a diminished probability under the influence of that factor level. When the calculated result approaches 0, it becomes difficult to determine the impact of the factor on the susceptibility to geological disasters.
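As a minimal sketch of Equation (1), the Python function below evaluates CF for one classification level. The function name and the example densities are hypothetical illustrations, not values from Table 1, and both densities are assumed to be scaled so that they lie between 0 and 1.

```python
def certainty_factor(pp_a: float, pp_s: float) -> float:
    """Equation (1): CF of one factor level.

    pp_a: disaster-point density in classification level a (assumed in [0, 1]).
    pp_s: disaster-point density in the whole study area (assumed in (0, 1)).
    """
    if not (0.0 <= pp_a <= 1.0 and 0.0 < pp_s < 1.0):
        raise ValueError("densities must satisfy 0 <= pp_a <= 1 and 0 < pp_s < 1")
    if pp_a >= pp_s:
        # Level is at least as disaster-prone as the study area: CF in [0, 1].
        return (pp_a - pp_s) / (pp_a * (1.0 - pp_s))
    # Level is less disaster-prone than the study area: CF in [-1, 0).
    return (pp_a - pp_s) / (pp_s * (1.0 - pp_a))


# Hypothetical densities, for illustration only.
cf_high = certainty_factor(0.2, 0.1)    # about 0.556
cf_low = certainty_factor(0.05, 0.1)    # about -0.526
cf_empty = certainty_factor(0.0, 0.1)   # -1.0: no disaster points in the level
```

A level with no disaster points always yields −1, and a level whose density equals the study-area density yields 0, which matches the interpretation of the CF range given above.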

3.2. Random Forest Model

The random forest model [18] is an ensemble learning method that integrates multiple decision trees through random sampling. It is widely regarded as one of the more accurate and reliable models. Its application to geological disaster susceptibility assessment can be summarized as follows. First, the model uses the Bootstrap aggregating (bagging) method to sample randomly, with replacement, from the sample set to create N training sets, each of which is used to build a decision tree. During the growth of each decision tree, internal nodes split and expand without pruning. At each splitting point, the node randomly samples from the feature set and selects the optimal split among the extracted features. Finally, each decision tree votes independently, and the category receiving the most votes across all trees is taken as the final result. The random forest model thus involves two types of random sampling: one randomly samples the original sample set to obtain different training sets, while the other randomly samples the features at each internal node to determine the split feature set for that node. By employing these two random-sampling processes, the model reduces sensitivity to data noise and outliers during classification, thereby improving prediction accuracy and effectively mitigating overfitting. The key steps of the random forest algorithm are depicted in Figure 2 and can be summarized as follows:
(1)
Randomly sample (with replacement) N times from the original sample data to create N training sets. Because the sampling is done with replacement, approximately 36% of the data are not included in a given training set. These unselected data are referred to as Out-of-Bag (OOB) data and are often used to evaluate model performance and select optimal parameters.
(2)
Each training set generates a decision tree. Assuming the original sample has M features, K features (where K ≤ M) are randomly selected as the split feature set at each internal node of all decision trees. The optimal value of K should be determined based on the OOB error.
(3)
The decision trees are generated using the classification and regression tree (CART) algorithm, with each tree growing freely without pruning.
(4)
Repeat the above steps j times to create j training sets and j decision trees. The unselected data corresponding to each decision tree form the j out-of-bag data.
(5)
Combine all the decision trees to form a random forest. The output of the random forest is determined by aggregating the votes from each decision tree. The results from each tree are summarized, and the category with the highest number of votes is considered the final result. For regression tasks, the final result is obtained by calculating the mean of the results from each tree.
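The out-of-bag share mentioned in step (1) can be checked with a short simulation. The sketch below is illustrative only: it uses Python's standard library, and the function name and the seed are arbitrary choices rather than anything taken from the study.

```python
import random


def oob_fraction(n_samples: int, seed: int = 0) -> float:
    """Draw one bootstrap sample of size n_samples (with replacement)
    and return the fraction of original samples that were never drawn."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1.0 - len(drawn) / n_samples


n = 994  # total number of sample points used later in the study
simulated = oob_fraction(n)
# Probability that a given sample is missed by all n draws: (1 - 1/n)^n,
# which tends to 1/e (roughly 0.37) as n grows.
theoretical = (1.0 - 1.0 / n) ** n
```

The simulated fraction varies slightly with the seed but stays close to the theoretical value, which is the source of the "approximately 36%" figure.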
The random forest algorithm uses impurity-based splitting, specifically the Gini index, to calculate the relative weight of each evaluation factor in the assessment of geological disaster susceptibility. The Gini reduction value $R_{ixy}$ of evaluation factor i is calculated whenever a node is split. After $R_{ixy}$ is summed over all tree nodes, the values for all evaluation factors are averaged over the trees in the forest; the normalized result indicates the importance of evaluation factor i:
$$
\Delta_i = \frac{\sum_{x=1}^{m} \sum_{y=1}^{c} R_{ixy}}{\sum_{i=1}^{n} \sum_{x=1}^{m} \sum_{y=1}^{c} R_{ixy}}
\tag{2}
$$
where m denotes the number of parent nodes; c denotes the number of classification child nodes; n is the total number of evaluation factors; $R_{ixy}$ represents the Gini reduction value of the i-th evaluation factor at the y-th child node under parent node x; and $\Delta_i$ represents the relative importance of the i-th evaluation factor.
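To make the normalization in Equation (2) concrete, the following sketch sums hypothetical Gini reductions per factor and divides by the grand total. The factor names are taken from the study, but the numbers are invented for illustration and the function name is an assumption.

```python
def relative_importance(gini_reductions: dict) -> dict:
    """Equation (2): share of the total Gini reduction attributable to each factor.

    gini_reductions maps a factor name to the list of Gini reductions
    recorded at every node where that factor was used for a split.
    """
    totals = {name: sum(values) for name, values in gini_reductions.items()}
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total Gini reduction must be positive")
    return {name: total / grand_total for name, total in totals.items()}


# Hypothetical node-level Gini reductions (not results from the paper).
example = {
    "NDVI": [0.10, 0.05],
    "slope": [0.03],
    "elevation": [0.02],
}
weights = relative_importance(example)
# NDVI accounts for 0.15 of the total 0.20, i.e. a weight of 0.75.
```

Because the weights are shares of a common total, they always sum to 1, which is what allows them to be used directly as factor weights in a weighted combination.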

3.3. Weighted Linear Combination

Weighted linear combination exhibits a simple and comprehensible calculation principle, making it a popular approach in the assessment of geological disaster susceptibility. This method utilizes a linear model for a comprehensive evaluation and can be effectively integrated with GIS technology [19]. The weighted linear combination model employs a variety of prediction models to enhance the precision and robustness of its forecasts. This approach linearly amalgamates the outputs from several foundational models, applying weights to each model’s results. Each foundational model could be a distinct classifier, regressor, or another type of prediction model. Within the realm of geological disaster susceptibility assessment, a myriad of prediction models can be utilized to dissect the relationship between the likelihood of geological disasters and diverse contributory factors. Subsequently, the weighted linear combination model is harnessed to integrate the outputs of these individual prediction models, yielding more reliable and consistent estimates of geological disaster risks. By deploying the weighted linear combination model, the strengths of multiple prediction models can be synergistically leveraged, compensating for the individual limitations of each, thereby enhancing the accuracy and trustworthiness of geological disaster risk predictions. Leveraging these aforementioned benefits, this study integrates the relative importance of each evaluation factor, determined by the random forest model, with the deterministic coefficient. This integration establishes a geological disaster susceptibility evaluation model. The formula is:
$$
y = \sum_{i=1}^{n} \Delta_i x_i
\tag{3}
$$
where y is the comprehensive CF value; n is the number of evaluation factors; $x_i$ is the CF value of evaluation factor i; and $\Delta_i$ is the importance of evaluation factor i, calculated using Equation (2).
In this study, building upon the foundational deterministic coefficient model, the deterministic coefficient values for each evaluation factor are determined using Equation (1). Using the Extract Multi Values to Points tool in ArcGIS, the deterministic coefficient values associated with each evaluation factor are attached to each sample point. Subsequently, the sample datasets with CF values are fed into the random forest model as training and test sets. The importance of each evaluation index is calculated using Equation (2) and serves as the objective weight of the factor. Using the Raster Calculator tool in ArcGIS, the CF values of the evaluation factors are then weighted by the random forest weights and summed according to Equation (3). The result is taken as the comprehensive CF value for this geological disaster susceptibility assessment.
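The weighted overlay in Equation (3) can be sketched in plain Python as follows. This is only an illustration of the arithmetic, not the ArcGIS workflow used in the study; the function names, the three tiny rasters, and the weights are all hypothetical.

```python
def composite_cf(cf_values, weights):
    """Equation (3): weighted sum of per-factor CF values for one raster cell."""
    if len(cf_values) != len(weights):
        raise ValueError("need exactly one weight per evaluation factor")
    return sum(w * x for w, x in zip(weights, cf_values))


def composite_raster(cf_rasters, weights):
    """Apply Equation (3) cell by cell.

    cf_rasters is a list of equally sized 2-D lists, one per evaluation factor.
    """
    rows, cols = len(cf_rasters[0]), len(cf_rasters[0][0])
    return [
        [composite_cf([raster[i][j] for raster in cf_rasters], weights)
         for j in range(cols)]
        for i in range(rows)
    ]


# Three hypothetical factors on a 1 x 2 grid, with weights that sum to 1.
rasters = [[[0.5, -1.0]], [[-0.2, 0.0]], [[0.1, 1.0]]]
factor_weights = [0.5, 0.3, 0.2]
result = composite_raster(rasters, factor_weights)
# First cell: 0.5*0.5 + 0.3*(-0.2) + 0.2*0.1 = 0.21
```

Since each CF value lies in [−1, 1] and the weights sum to 1, the comprehensive value stays in the same range and can be classified into susceptibility zones.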

4. Evaluation of Geological Disaster Susceptibility

4.1. Selection and Classification of Evaluation Factors

The evaluation of geological disaster susceptibility relies on two critical components: evaluation factors and evaluation models. To ensure reliable susceptibility zoning results, it is essential to utilize objective and authentic data, as well as employ scientifically and logically sound classification methods. The selection of an appropriate evaluation model should align with the specific disaster conditions present in the study area. A well-chosen evaluation model can address the limitations of individual evaluation methods, tackle the challenges associated with correlation and weighting between evaluation factors, and ultimately enhance the accuracy of the evaluation results. Considering the survey data of geological disasters within the study area, a detailed investigation of typical slope geological disaster points has been conducted. Through a comprehensive analysis of the strong linkage between the factors causing disasters and the mechanisms behind their occurrence, it is evident that topography, geological structure, stratigraphic lithology, water systems, and human engineering activities exert significant control over geological disasters in Huize County. The distribution of previous geological disasters predominantly occurs in regions characterized by pronounced topographic relief, steep terrain, active tectonic movement, intense human engineering activities, low vegetation coverage, and alternating soft and hard strata, as well as rock mass joints and fissures. In this study, eight disaster-causing factors, namely, elevation, gradient, terrain relief, stratum lithology, normalized difference vegetation index, distance from a fault, distance from a road, and distance from a river, have been selected as evaluation indices (rainfall, earthquakes, and other inducing factors are typically not considered in the evaluation of geological disaster susceptibility). These evaluation factors are classified according to the patterns of disaster formation. 
In this paper, the elevation is divided into five categories: <1500 m, 1500–2100 m, 2100–2700 m, 2700–3500 m, and >3500 m. The slope is divided into five levels: <15°, 15°–25°, 25°–35°, 35°–50°, and >50°. Topographic relief is divided into five categories: <10 m, 10–50 m, 50–200 m, 200–500 m, and >500 m. According to the mechanical properties of rock mass in the study area, the formation lithology is divided into four categories: soft rock, soft and hard rock, loose soil, and hard rock. The normalized difference vegetation index is divided into five categories: <0, 0–0.2, 0.2–0.4, 0.4–0.6, and >0.6. The buffer zone is established with 500 m as the interval range, and both the distance from a fault and the distance from a road are divided into four categories: <500 m, 500–1000 m, 1000–1500 m, and >1500 m. Finally, the distance from a water system is divided into five categories: <500 m, 500–1000 m, 1000–1500 m, 1500–2000 m, and >2000 m. The classification results are presented in Figure 3.
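The reclassification of continuous factors into the grades listed above can be sketched with the standard-library `bisect` module. The break values come from the text; the treatment of values that fall exactly on a break (assigned to the higher class here) is an assumption, since the text does not specify it, and the names are illustrative.

```python
from bisect import bisect_right

# Class breaks taken from the text; class 0 is the lowest grade.
ELEVATION_BREAKS_M = [1500, 2100, 2700, 3500]
SLOPE_BREAKS_DEG = [15, 25, 35, 50]
DISTANCE_TO_ROAD_BREAKS_M = [500, 1000, 1500]


def classify(value: float, breaks: list) -> int:
    """Return the zero-based class index of value for ascending class breaks.

    Assumption: a value equal to a break is placed in the higher class.
    """
    return bisect_right(breaks, value)


low_elevation = classify(1200, ELEVATION_BREAKS_M)        # class 0: <1500 m
high_elevation = classify(3600, ELEVATION_BREAKS_M)       # class 4: >3500 m
moderate_slope = classify(30, SLOPE_BREAKS_DEG)           # class 2: 25-35 degrees
near_road = classify(320, DISTANCE_TO_ROAD_BREAKS_M)      # class 0: <500 m
```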

4.2. Calculation of CF Values for Each Grade of the Evaluation Factors

The collected data for each evaluation factor are imported into the ArcGIS platform, where they are reclassified and subsequently rasterized. All vector data are converted to raster format and combined with the disaster point data. Using the ArcGIS Extract Multi Values to Points tool, the number of disaster points corresponding to each classification level of each evaluation factor can be determined. After the classification and counting process, the CF values for each evaluation factor are calculated using Equation (1), and the results are presented in Table 1. Table 1 reveals that elevation, topographic relief, and NDVI (Normalized Difference Vegetation Index) exert significant control over the initial conditions of geological disasters in Huize County. Areas with altitudes below 1500 m, topographic relief below 10 m, and NDVI below 0 are characterized by relatively strong human engineering activities, disruption of the natural balance of rocks and soil, limited vegetation, and severe soil erosion. These factors contribute to the occurrence of geological disasters. Conversely, at altitudes exceeding 3500 m, the influence of heavy rainfall, low-temperature weather, and physical weathering of rock masses further amplifies the risk of disaster occurrence. The CF values for the road, fault, and water system indicators decrease with increasing distance, suggesting that a greater distance corresponds to a lower probability of disasters.

4.3. Multicollinearity Diagnosis of Evaluation Factors

When there is a high degree of correlation among evaluation factors in a susceptibility evaluation model, this often leads to lower-than-expected accuracy, thereby undermining the effectiveness of susceptibility zoning results in providing guidance. Prior to applying the model, it is crucial to assess the independence of each factor through a check for multicollinearity [20]. In this study, the CF values of both disaster point and non-disaster point data were calculated, normalized, and imported into SPSS 25.0 software. Subsequently, a multicollinearity diagnosis was conducted using SPSS 25.0 software to assess the eight evaluation factors (Table 2). The results were evaluated based on tolerance (T) and variance inflation factor (VIF). VIF measures the ratio of the variance in the presence of multicollinearity between explanatory variables to the variance in the absence of multicollinearity, providing insights into the degree of variance inflation caused by multicollinearity. T and VIF are reciprocals of each other. A value of T < 0.1 or VIF > 10 suggests a high degree of collinearity [21]. Analysis of Table 2 indicates that all evaluation indicators have T values exceeding 0.1 or, equivalently, VIF values below 10. This suggests that collinearity between the evaluation factors is not substantial and that independence is well-maintained, allowing for the application of these factors in the model.
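The tolerance and VIF diagnostics can be reproduced outside SPSS. The sketch below is one possible way to do so with NumPy, regressing each factor on the others; it is not the authors' procedure, the function name is an assumption, and the data are random stand-ins rather than the normalized CF values of the study.

```python
import numpy as np


def tolerance_and_vif(X):
    """Return a list of (tolerance, VIF) pairs, one per column of X.

    For column j, R^2 comes from an ordinary least-squares regression of
    that column on all other columns plus an intercept; tolerance is
    1 - R^2 and VIF is its reciprocal.
    """
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    results = []
    for j in range(n_cols):
        target = X[:, j]
        design = np.column_stack([np.ones(n_rows), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        ss_res = float(residual @ residual)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        tol = ss_res / ss_tot  # equals 1 - R^2
        vif = float("inf") if tol <= 0 else 1.0 / tol
        results.append((tol, vif))
    return results


# Random, independent stand-in data: three hypothetical factors, 200 samples.
rng = np.random.default_rng(0)
diagnostics = tolerance_and_vif(rng.normal(size=(200, 3)))
acceptable = all(tol > 0.1 and vif < 10 for tol, vif in diagnostics)
```

With independent columns, every VIF is close to 1, so all factors pass the T > 0.1 and VIF < 10 screening described above.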

4.4. Parameter Optimization of Random Forest Model

(1)
Feature number selection
The accuracy and performance of the random forest model are closely tied to the selection of model parameters, as only appropriate parameters can fully exploit the model’s potential. In the random forest algorithm, each decision tree chooses its splits from a random subset of features drawn at each internal node. The size of this subset is controlled by setting the maximum number of features. Therefore, determining the optimal feature number is a critical consideration in establishing the model. In this study, the out-of-bag (OOB) error is used to explore the optimal feature number. The OOB error arises from Bootstrap sampling, in which approximately 36% of the sample data are not drawn for a given tree. These undrawn samples are then used to evaluate the model’s classification performance and calculate the error rate. Through iterative analysis in Python, the OOB error is computed for different feature numbers (Figure 4). A smaller OOB error indicates higher prediction accuracy for the corresponding model. Generally, it is recommended to set the maximum number of features to a value greater than 1. This enables each decision tree to consider multiple features when splitting nodes, which better captures feature interactions and information. Conversely, if the maximum number of features is set to 1, each decision tree considers only one feature when splitting a node, potentially limiting the performance of the random forest. From Figure 4, it can be observed that the OOB error is minimized when the number of features is 2 and increases when the number of features exceeds 2. Based on this analysis, the feature number for this model is set to 2.
(2)
Selection of the Number of Decision Trees
Similarly, a Python loop is used to obtain the out-of-bag error of the model under different numbers of decision trees (Figure 5), and the model accuracy is further analyzed with the confusion matrix (Table 3). In Figure 5, the blue line traces the model's OOB error as the number of decision trees increases. The curve fluctuates at first and then stabilizes, varying only slightly within a narrow range once enough trees have been added; the number of decision trees is chosen from this stable region. Table 3 reports the classification performance on the test dataset, which comprises 299 sample points, of which the model correctly classifies 293. Hence, the classification and prediction accuracy of the RF model is 0.980 (the ratio of correctly classified sample points to the total number of test samples), indicating a high level of accuracy. In short, Figure 5 shows an OOB error of about 13%, and Table 3 gives a test accuracy of 98%; the two values are computed on different samples and therefore need not sum to 100%. In summary, the number of decision trees in this model is set to 750.
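The two parameter sweeps described in this subsection can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' script: the synthetic dataset stands in for the 994 labelled samples, and the helper name `oob_error`, the tree grid, and the random seeds are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in (assumption) for 994 labelled samples with eight factors.
X, y = make_classification(n_samples=994, n_features=8, n_informative=5,
                           n_redundant=0, random_state=0)

def oob_error(n_trees, max_features):
    """Fit a forest and return its out-of-bag error (1 - OOB accuracy)."""
    rf = RandomForestClassifier(n_estimators=n_trees,
                                max_features=max_features,
                                bootstrap=True, oob_score=True,
                                random_state=0)
    rf.fit(X, y)
    return 1.0 - rf.oob_score_

# Sweep 1: number of features considered at each split (1 to 8).
errors_by_features = {k: oob_error(100, k) for k in range(1, 9)}
best_k = min(errors_by_features, key=errors_by_features.get)

# Sweep 2: number of trees, holding the chosen feature count fixed.
tree_grid = [50, 100, 200, 400]  # hypothetical grid
errors_by_trees = {n: oob_error(n, best_k) for n in tree_grid}

print("OOB error by max_features:", errors_by_features)
print("selected max_features:", best_k)
print("OOB error by n_estimators:", errors_by_trees)
```

On real data the selected values depend on the samples; the study reports 2 features and 750 trees for its own dataset, and nothing in this sketch reproduces those numbers.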

4.5. Random Forest Weight

The 497 geological hazard points within the study area were labeled as the first category, denoted as 1. Similarly, the 497 non-geological disaster points selected based on the CF prior model were labeled as the second category, denoted as 0. This resulted in a total of 994 labeled sample points. The CF values at each sample point, obtained through the ArcGIS multi-value extraction to point function, were used as the training and testing data for the random forest model, so each sample point carried the CF value of every evaluation factor together with a category label. The dataset, comprising both disaster and non-disaster points, was divided into 70% training data and 30% test data and input into the random forest model. In Python (run in PyCharm), the random forest model calculated the reduction in the Gini coefficient (Gini impurity) contributed by each evaluation factor during node splitting, and the importance of each evaluation index, computed using Equation (2), was taken as the factor weight. The resulting weight diagram of the evaluation factors (Figure 6) highlights NDVI, distance from a road, and elevation as the factors with the greatest influence on geological disasters. In the context of geological disaster prevention and control, efforts should focus on addressing and mitigating the effects of these three factors.
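The weighting step can be sketched in Python as follows. Equation (2) is not reproduced in this section, so scikit-learn's `feature_importances_` (mean decrease in Gini impurity, normalized to sum to 1) is used here as a stand-in; the synthetic data, the stratified split, and the short factor names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Short names (assumed) for the eight evaluation factors mentioned in the text.
factors = ["elevation", "slope", "relief", "lithology",
           "dist_fault", "dist_road", "dist_river", "ndvi"]

# Synthetic stand-in for the 994 samples (CF values per factor, label 1 or 0).
X, y = make_classification(n_samples=994, n_features=len(factors),
                           n_informative=5, n_redundant=0, random_state=0)

# 70% training / 30% test; stratification is an assumption, used so that
# disaster and non-disaster points appear in both parts in equal proportion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Parameters reported in the text: 750 trees, 2 features per split.
rf = RandomForestClassifier(n_estimators=750, max_features=2, random_state=0)
rf.fit(X_train, y_train)

# Importance of each factor, used as its weight.
weights = dict(zip(factors, rf.feature_importances_))
for name, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{name:>10s}: {w:.3f}")
print("test accuracy:", rf.score(X_test, y_test))
```

With 994 samples, a 30% test share yields 299 test points, which matches the test-set size reported in Section 4.4.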

4.6. CF Value Weighting

The random forest model, with optimized parameters, is used to calculate the weight of each evaluation index. The weight of each index is then linearly combined with the corresponding CF value layer using Equation (3). This process results in the creation of a weighted superposition layer of CF values.
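The linear combination can be illustrated with a short NumPy sketch. Equation (3) is not shown in this section, so the form assumed here is the plain weighted sum of CF layers, with each layer multiplied by its factor weight; the 2 × 2 rasters, the weights, and the function name `weighted_cf_overlay` are hypothetical.

```python
import numpy as np

def weighted_cf_overlay(cf_layers, weights):
    """Return the cell-wise sum of w_i * CF_i over same-shaped CF rasters."""
    names = list(cf_layers)
    stack = np.stack([cf_layers[n] for n in names])        # (n_factors, rows, cols)
    w = np.array([weights[n] for n in names], dtype=float)  # (n_factors,)
    return np.tensordot(w, stack, axes=1)                   # (rows, cols)

# Hypothetical CF rasters for two factors and hypothetical weights.
cf_layers = {
    "elevation": np.array([[0.5, -0.2], [0.3, -0.5]]),
    "ndvi":      np.array([[0.1,  0.4], [-0.3, 0.2]]),
}
weights = {"elevation": 0.6, "ndvi": 0.4}

overlay = weighted_cf_overlay(cf_layers, weights)
print(overlay)
```

In practice the layers would be the eight CF rasters exported from ArcGIS, and the weights would be the random forest importances.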

4.7. Geological Disaster Prone Zoning

The weighted superposition grid map of CF values in the study area is divided into four levels, namely, low susceptibility (Figure 7a), medium susceptibility (Figure 7b), high susceptibility (Figure 7c), and extremely high susceptibility (Figure 7d), using the natural breakpoint method provided by the ArcGIS reclassification function. The resulting zoning map of geological disaster susceptibility in Huize County, based on the deterministic coefficient model and random forest model, is presented in Figure 7. To validate the susceptibility zoning map, it is compared with the locations of previously recorded disaster points within the study area. The susceptibility zoning results obtained in this study align closely with the distribution of disaster points, thereby offering a scientific reference for relevant departments in their work.
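The idea behind the natural breakpoint method can be illustrated with a small dynamic-programming sketch of Fisher-Jenks natural breaks, which chooses class boundaries that minimize the within-class sum of squared deviations. This is not the ArcGIS implementation; the function names `natural_breaks` and `classify` and the sample values are hypothetical, and the quadratic running time makes the sketch suitable only for small samples.

```python
from bisect import bisect_left

def natural_breaks(values, n_classes):
    """Return the upper bound of each class that minimizes within-class SSD."""
    data = sorted(values)
    n = len(data)
    # Prefix sums give the sum of squared deviations of any slice in O(1).
    ps, ps2 = [0.0], [0.0]
    for v in data:
        ps.append(ps[-1] + v)
        ps2.append(ps2[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        s = ps[j] - ps[i]
        return (ps2[j] - ps2[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[c][j]: best total SSD when data[:j] is split into c classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for c in range(1, n_classes + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i

    # Walk back through the split table to recover the class upper bounds.
    bounds, j = [], n
    for c in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[c][j]
    return bounds[::-1]

def classify(value, upper_bounds):
    """Return the class index (0 = lowest) of a value given class upper bounds."""
    return min(bisect_left(upper_bounds, value), len(upper_bounds) - 1)

# Hypothetical weighted CF values, split into four susceptibility levels.
cf_values = [-0.31, -0.30, -0.29, -0.05, -0.04, 0.10, 0.11, 0.12, 0.40, 0.41]
upper = natural_breaks(cf_values, 4)
print(upper)
print([classify(v, upper) for v in cf_values])
```

Class indices 0 to 3 would correspond to the low, medium, high, and extremely high susceptibility levels.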

5. Evaluation Results and Accuracy Verification

5.1. Analysis of Prone Partition Results

This study applies the deterministic coefficient model to calculate the CF value of each impact factor. The parameters of the random forest model are then optimized, and the optimized model is used to determine the objective weight of each evaluation factor. The CF value layers of each evaluation factor are then linearly weighted using these objective weights. This approach aims to enhance the accuracy and reliability of geological disaster susceptibility evaluation results.
(1)
The statistical analysis reveals that the study area can be categorized into extremely highly prone areas, highly prone areas, medium-susceptibility areas, and low-susceptibility areas, comprising 4.21%, 47.90%, 29.50%, and 18.39% of the total area, respectively. Figure 7 indicates that the northern, western, eastern, north-central, and southeastern parts of Huize County are highly susceptible to geological disasters, particularly Zhichang Township, Malu Township, Yiche Town, Nagu Town, Dahai Township, Luna Township, Yulu Township, Dajing Town, Huohong Township, Leye Town, Shangcun Township, and Jiache Township. These areas are characterized by steep terrain, active neotectonic movement, weak formation lithology, developed rock mass structure, dense road networks, strong human engineering activities, well-developed water systems, and evident river erosion. In light of these key areas, a relocation and avoidance plan is proposed to relocate vulnerable populations to lower-lying areas. Afforestation efforts should be undertaken, and existing vegetation should be effectively protected. The strengthening of monitoring and early warning systems is crucial, along with enhancing the disaster prevention capabilities of local communities. In areas highly susceptible to geological disasters, the geological conditions are extremely unstable, resulting in a significant incidence of such disasters. This situation severely hampers local resource development and production activities, exerting a substantial negative impact on local sustainable development. The frequent occurrence of geological disasters adversely affects soil quality and pollutes water resources, thereby inflicting severe damage on the local ecosystem and natural environment. Moreover, the emergence of these disaster-prone areas also curtails local economic development and social progress, leading to regional poverty and backwardness. 
To ensure the responsible development and utilization of these highly prone areas, it is imperative to emphasize scientific planning and effective management. Strengthening monitoring and early warning systems for geological disasters is crucial in order to minimize their impact on sustainability, environmental integrity, and human society.
(2)
The medium-susceptibility areas are primarily distributed along the fault zone, characterized by relatively gentle slopes compared to the highly prone areas. Human engineering activities are generally prevalent in these areas. Prevention measures should include restrictions on human engineering activities and the implementation of scientifically and contextually appropriate engineering measures to control small-slope geological disasters. The occurrence of disaster-prone areas in geological settings will exert a significant impact on sustainability, environmental integrity, and human society. These areas exhibit relatively unstable geological conditions, resulting in a high incidence of geological disasters that impede local resource development and production activities. Moreover, these disasters will cause environmental pollution and inflict damage on the ecosystem, thereby hindering local sustainable development. Simultaneously, the occurrence of geological disasters will pose a grave threat to the safety of life and property, as well as the quality of life for residents.
(3)
In the low-susceptibility areas, favorable geographical and geological conditions indicate a lower likelihood of future geological disasters. It is important to strictly prohibit all unreasonable human engineering activities in these areas and prioritize the protection of the existing geological and ecological environment. The emergence of low-susceptibility areas to geological disasters can significantly contribute to local sustainable development. These areas typically boast a relatively stable geological environment, and thus, serve as crucial zones for industrial and urban development. Due to their low probability of geological disasters, these areas experience minimal environmental damage. As a result, the ecosystem and natural environment remain relatively stable, creating favorable conditions for local ecological protection and environmental management. Concurrently, these regions offer opportunities for the development of local tourism and cultural industries, thereby fostering local economic prosperity and social advancement. Nevertheless, it is important to emphasize that a low susceptibility to geological disasters does not imply a complete absence of risk. Even in such areas, a comprehensive risk assessment and monitoring of geological disasters are still necessary to promptly address potential disaster risks.

5.2. Validation of Prone Partition Results

The accuracy of the evaluation results can be objectively verified by examining how the test sample points (the 30% of disaster and non-disaster points held out from training) are distributed across the prone areas, providing a more robust verification of the susceptibility zoning results. The ROC curve is widely used to assess the accuracy of the evaluation model in geological disaster susceptibility studies. It plots the proportion of disasters that are correctly predicted (the true positive rate) against the proportion of non-disaster samples that are incorrectly predicted as disasters (the false positive rate) [22]. The accuracy of the model is represented by the area under the curve (AUC); a value of 0.5 corresponds to random guessing and a value of 1 to perfect discrimination. A higher AUC value indicates greater accuracy of the evaluation model.
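As a small illustration of what the AUC measures, the following Python sketch computes it as the probability that a randomly chosen disaster point receives a higher score than a randomly chosen non-disaster point (ties counted as one half). The function name `roc_auc` and the four sample scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive (1) and negative (0) samples."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test points: label 1 = disaster, 0 = non-disaster.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]   # e.g. weighted CF values at the test points
auc = roc_auc(labels, scores)
print(auc)
```

Here three of the four disaster/non-disaster pairs are ordered correctly, so the AUC is 0.75.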
In this study, the accuracy of the evaluation results is verified through the susceptibility evaluation results test table (Table 4) and the ROC curve (Figure 8). Table 4 shows that the low-susceptibility areas cover 18.39% of the total study area, and that 28 test points fall in the extremely highly prone areas, accounting for 9.4% of the total test points. The S value of each prone area is the ratio S = D/M, where D is the percentage of all test points that fall into that area and M is the percentage of the total study area that it occupies. The S values are 1.42, 0.96, 0.76, and 2.23 for the low-susceptibility, medium-susceptibility, highly prone, and extremely highly prone areas, respectively. As shown in Figure 8, the AUC value of the evaluation model in this study is 0.989, which is well above 0.5 and close to 1. This indicates that the model's accuracy meets the requirements and provides an objective and accurate reflection of the geological disaster-prone situation in the study area.
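The S = D/M ratio can be checked with a few lines of Python. The sketch uses only figures quoted in the text for the extremely highly prone areas (28 of 299 test points, 4.21% of the study area); the function name `s_value` is hypothetical.

```python
def s_value(points_in_zone, total_points, zone_area_pct):
    """S = D / M: share of test points in a zone over the zone's share of area."""
    d = 100.0 * points_in_zone / total_points   # D, in percent
    return d / zone_area_pct                    # M is already in percent

# Extremely highly prone areas: 28 of 299 test points on 4.21% of the area.
s_extreme = s_value(28, 299, 4.21)
print(round(s_extreme, 2))
```

The unrounded calculation gives about 2.22; the reported 2.23 follows when D is first rounded to 9.4%.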

6. Discussion

In previous research, assessments of susceptibility to regional geological disasters largely employed qualitative methods, quantitative methods, or a combination of both [23,24,25,26,27,28,29,30,31,32]. Such evaluation approaches can carry significant subjective biases, potentially compromising the accuracy of the results. Furthermore, earlier evaluation models did not articulate the significance of each assessment index, and the selection of non-disaster points occasionally led to potential disaster points being misclassified as non-disaster points. In this research, we integrate the deterministic coefficient model with the random forest model. This combination aims to diminish the influence of subjective biases, relying more on objective data for predicting areas prone to geological hazards. Consequently, this enhances the reliability and credibility of the evaluation. At the same time, the random forest model quantifies the significance of each assessment criterion. By employing the deterministic coefficient prior model for the selection of non-disaster points, we can mitigate the aforementioned misclassifications, thereby improving the predictive accuracy of the model.
By employing the random forest algorithm, the Gini coefficient of each evaluation factor can be swiftly determined, enabling the calculation of their relative importance [33,34,35,36,37,38,39,40]. This approach significantly reduces the impact of subjective factors on the weighting of evaluation factors and enhances the objectivity and scientific validity of the evaluation results. Compared to the direct random selection of non-disaster point samples, the use of a CF prior model for selecting non-disaster point samples substantially reduces the error of misclassifying potential disaster points as non-disaster points. In this study, the deterministic coefficient model is utilized to establish an initial geological disaster susceptibility zoning map, aiming to minimize the influence of subjective factors on the evaluation results through the selection of non-disaster samples and the calculation of random forest model sample data. Through iterative calculation, the optimal parameters of the model are determined to ensure the best match between the evaluation model and the study area, thereby enhancing the reliability of the evaluation results. Nonetheless, several limitations exist in this study: (1) the exclusion of earthquakes as an evaluation factor may impact the accuracy of the evaluation results; (2) the steep terrain of Huize County poses challenges in obtaining geological data in certain areas; and (3) the subjective influence on the evaluation results cannot be entirely eliminated due to the artificial selection and classification of evaluation factors in the CF prior model.

7. Conclusions

Geological disasters are inherent given the Earth’s dynamic nature and will persist in human society. As researchers, our quest to understand geological disasters and mitigate their impact on human society is an ongoing endeavor. Moving forward, it is essential to explore new approaches that leverage advanced technologies such as meteorological satellites, high-resolution remote sensing, artificial intelligence, and spatiotemporal big data. These technologies can enable the rapid acquisition and classification of evaluation indicators for a given study area. By objectively weighting and superimposing these indicators, we can generate geological disaster susceptibility zoning maps. This will facilitate the swift assessment of geological disaster susceptibility and provide valuable insights for various sectors of human society, ultimately promoting high-quality development and resilience.
This study assesses geological disaster susceptibility in Huize County, Yunnan Province, China, using the deterministic coefficient model combined with the random forest model. The reliability of the evaluation findings is substantiated through the statistical scrutiny of the test sample point distribution and the ROC curve analysis. The outcomes indicate that the eight evaluation indices chosen in this study serve as the initial conditions triggering geological disasters in Huize County. Notably, factors such as elevation, vegetation coverage, human engineering endeavors, and stratigraphic lithology play a pivotal role in augmenting the incidence of geological disasters within the county. In essence, the study findings align with the preliminary hypothesis, thereby validating it. Based on the results, we can arrive at the following conclusions:
(1)
Medium-susceptibility and highly and extremely highly prone areas of geological disasters are widely distributed throughout Huize County, encompassing 81.61% of the study area. These areas are prone to geological disasters and require significant attention from relevant departments. During the flood season, the likelihood of geological disasters increases substantially, posing a serious threat to the safety of residents’ lives and property. Through susceptibility mapping of geological disasters, one can assess and forecast the likelihood of such events, enhancing early warning and prediction capabilities. Based on the outcomes of this zoning, tailored prevention and mitigation strategies can be devised. Implementing region-specific strategies ensures that these preventive measures are both precise and impactful, thereby minimizing disaster-induced damage and safeguarding lives and property. Moreover, susceptibility mapping for geological disasters furnishes a rigorous foundation for local planning, construction, and development, further advancing local sustainable growth.
(2)
The random forest algorithm provides the relative importance of each evaluation factor, enabling disaster management departments to implement targeted prevention and control measures for key disaster-causing factors. The random forest model adeptly handles missing data and outliers within a dataset and is less prone to overfitting. It can autonomously choose the best splitting variables and points while seamlessly addressing feature selection and integration challenges. The efficiency of the random forest model is further enhanced by its capability for parallel processing during both training and prediction phases. Given its capacity to process high-dimensional data and mitigate data noise, this model boasts a commendable prediction accuracy.
(3)
The optimal number of features denotes the ideal count of features chosen from the entire set for each decision tree during the construction of a random forest model. If the optimal number of features is excessively small, the model may encounter underfitting issues, whereas if it is overly large, the model may suffer from overfitting. The accuracy of the random forest model rises with an increase in the number of decision trees. However, after reaching a certain threshold, adding more decision trees no longer enhances model accuracy and may even lead to a decrease. Through Python language loop iteration, the optimal number of features and decision trees are determined by calculating the out-of-bag error under different feature numbers and decision trees. Through the optimization of model parameters, the random forest model’s prediction accuracy and stability can be enhanced. This optimization process aims to reduce the risk of overfitting, improve the model’s generalization ability, and enhance its efficiency, thereby making it more suitable for practical applications. The evaluation of selected parameters is performed using a confusion matrix. This matrix offers an intuitive way to assess the model’s classification results, allowing for a clear understanding of its prediction accuracy. It enables the assessment of error types and helps researchers to comprehend the model’s performance under different scenarios. By analyzing the confusion matrix, the factors contributing to model prediction errors can be identified, leading to model refinement and improved prediction accuracy. Ultimately, the optimal parameter random forest model is attained.
(4)
Utilizing a CF prior model for selecting non-disaster point samples can effectively mitigate the misclassification of potential disaster points as non-disaster points. This approach also minimizes the influence of subjective factors on evaluation results, lending greater credibility to the findings. Introducing the deterministic coefficient prior model enables the incorporation of a prior distribution, thereby reducing the risk of model overfitting and enhancing its generalization capability. By incorporating prior knowledge and distributions, the model’s structure is optimized and refined, resulting in improved reliability and stability of the model.
(5)
The evaluation model for susceptibility to geological disasters proposed in this paper can be adapted for use in other regions susceptible to similar geological threats. When implementing this model, it is imperative to gather pertinent data such as geological structure, topography, lithology, climate, and records of past geological disasters in the target region. Ensuring the precision and comprehensiveness of these data is paramount. Depending on the specifics of the target area, there might be a need to modify the parameters, adjust weights, or incorporate new assessment criteria into the model to make it consistent with the area's characteristics. This calibration should be undertaken in conjunction with field observations and geological studies within the area of interest. A comparison between the historical geological disaster incidents and the model's predictive outcomes will shed light on the model's accuracy and dependability. If the model's predictions do not align with real events, further refinement and adjustments of the model are warranted. Given the unique geological environments and disaster traits of each region, applying the model in a different setting is likely to require specific adjustments and amendments. Moreover, in regions where data are scarce or incomplete, the model's applicability might be constrained.

Author Contributions

All authors contributed to the study’s conception and design. Material preparation, data collection, and analysis were performed by S.T., S.Z., J.Z., D.D., J.L. and Y.S. The first draft of the manuscript was written by S.Z. and all authors commented on previous versions of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Science and Technology Innovation Team Program of Yunnan Province Education Department (Grant NO. CY22624109); the Graduate Tutor Team Program of Yunnan Province Education Department (Grant NO. CY22622205); the Yunnan Fundamental Research Projects (Grant NO. 202301BF070001–020); the Yunnan Key research and development plan program (Grant NO. 202303AP140020); and the Xing Dian Talent Teacher’s Program of Yunnan Province (Grant NO. XDTT202206).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The DEM and NDVI data are openly available in the geospatial data cloud platform at [https://www.gscloud.cn/] (accessed on 3 May 2023). Geological hazard point data and 1:200,000 geological maps are openly available from the Geographic Remote Sensing Ecology Network platform at [http://www.gisrs.cn/] (accessed on 4 May 2023). Water system and road data can be obtained from the National Catalogue Service for Geographic Information at [https://www.webmap.cn/] (accessed on 4 May 2023).

Acknowledgments

Thank you to laboratory colleagues for their hard work and my mentor for their careful guidance, and thank you to funders for sponsoring the work.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhang, S.F.; Wang, Y.F.; Jia, B.; Zhao, S.M. Spatial-temporal changes and influencing factors of geologic disasters from 2005 to 2016 in China. J. Geo-Inform. Sci. 2017, 19, 1567–1574.
  2. Wang, N.Q.; Wang, Y.F.; Luo, D.H.; Yao, Y. Review of landslide prediction and forecast research in China. Geol. Rev. 2008, 44, 355–361.
  3. Li, Y.Y.; Mei, H.B.; Ren, X.J.; Hu, X.D.; Li, M.D. Geological disaster susceptibility evaluation based on certainty factor and support vector machine. J. Geo-Inform. Sci. 2018, 20, 1699–1709.
  4. Laura, Z.F.; Moussa, N.N.; Christian, B.A.M.; Monespérance, M.G.M.; Landry, W.D.P.; Rodrigue, T.K.; Armand, K.D.; Sébastien, O. Landslide susceptibility zonation using the analytical hierarchy process (AHP) in the Bafoussam-Dschang region (West Cameroon). Adv. Space Res. 2023, 71, 5282–5301.
  5. Vakhshoori, V.; Zare, M. Landslide susceptibility mapping by comparing weight of evidence, fuzzy logic, and frequency ratio methods. Geomat. Nat. Hazards Risk 2016, 7, 1731–1752.
  6. Wang, Q.Q.; Guo, Y.H.; Li, W.P.; He, J.H.; Wu, Z.Y. Predictive modeling of landslide hazards in Wen County, northwestern China based on information value, weights-of-evidence, and certainty factor. Geomat. Nat. Hazards Risk 2019, 10, 820–835.
  7. Juliev, M.; Mergili, M.; Mondal, I.; Nurtaev, B.; Pulatov, A.; Hübl, J. Comparative analysis of statistical methods for landslide susceptibility mapping in the Bostanlik District, Uzbekistan. Sci. Total Environ. 2019, 653, 801–814.
  8. Guo, Z.Z.; Yin, K.L.; Fu, S.; Huang, F.M.; Gui, L.; Xia, H. Evaluation of landslide susceptibility based on GIS and WOE-BP model. Earth Sci. 2019, 44, 4299–4312.
  9. Huang, F.M.; Yao, C.; Liu, W.P.; Li, Y.J.; Liu, X.W. Landslide susceptibility assessment in the Nantian area of China: A comparison of frequency ratio model and support vector machine. Geomat. Nat. Hazards Risk 2018, 9, 919–938.
  10. Yang, S.; Li, D.Y.; Yan, L.X.; Huang, Y.; Wang, M.Z. Landslide susceptibility assessment in high and steep bank slopes along Wujiang river based on random forest model. Saf. Environ. Eng. 2021, 28, 132–133.
  11. Merghadi, A.; Abderrahmane, B.; Tien Bui, D. Landslide susceptibility assessment at Mila Basin (Algeria): A comparative assessment of prediction capability of advanced machine learning methods. ISPRS Int. J. Geo-Inform. 2018, 7, 268.
  12. He, Q.; Wang, M.; Liu, K. Rapidly assessing earthquake-induced landslide susceptibility on a global scale using random forest. Geomorphology 2021, 391, 107889.
  13. Goetz, J.N.; Brenning, A.; Petschko, H.; Leopold, P. Evaluating machine learning and statistical prediction techniques for landslide susceptibility modeling. Comput. Geosci. 2015, 81, 1–11.
  14. Liu, F.Z.; Dai, T.Y.; Wang, J.C. Geological hazard susceptibility evaluation by coupled random forest and information model: A case study of Gongbujiangda county, Tibet autonomous region. J. Saf. Environ. 2022, 23, 2428–2438.
  15. Zheng, Y.; Chen, J.; Wang, C.; Chen, T.W. Application of certainty factor and random forests model in landslide susceptibility evaluation in Mangshi City, Yunnan Province. Bull. Geol. Sci. Technol. 2020, 39, 131–144.
  16. Shortliffe, E.H.; Davis, R.; Axline, S.G.; Buchanan, B.G.; Green, C.C.; Cohen, S.N. Computer-based consultations in clinical therapeutics: Explanation and rule acquisition capabilities of the MYCIN system. Comput. Biomed. Res. 1975, 8, 303–320.
  17. Heckerman, D. Probabilistic interpretations for MYCIN’s certainty factors. In Readings in Uncertain Reasoning; Shafer, G., Pearl, J., Eds.; Morgan Kaufmann Publishers Inc.: San Mateo, CA, USA, 1990; pp. 298–312.
  18. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  19. Avtar, R.; Singh, C.K.; Singh, G.; Verma, R.L.; Mukherjee, S.; Sawada, H. Landslide susceptibility zonation study using remote sensing and GIS technology in the Ken-Betwa River Link area, India. Bull. Eng. Geol. Environ. 2011, 70, 595–606.
  20. Hong, H.Y.; Pradhan, B.; Sameen, M.I.; Kalantar, B.; Zhu, A.; Chen, W. Improving the accuracy of landslide susceptibility model using a novel region-partitioning approach. Landslides 2018, 15, 753–772.
  21. Gao, H.X. Some method on treating the collinearity of independent variables in multiple linear regression. Appl. Stat. Manag. 2000, 20, 49–55.
  22. Xu, C.; Xu, X.W. Logistic regression model and its validation for hazard mapping of landslides triggered by Yushu earthquake. J. Eng. Geol. 2012, 20, 326–333.
  23. Zheng, Q.; Lyu, H.M.; Zhou, A.N.; Shen, S.L. Risk assessment of geohazards along Cheng-Kun railway using fuzzy AHP incorporated into GIS. Geomat. Nat. Hazards Risk 2021, 1, 1508–1531.
  24. Qi, Y.N.; Wang, L. Application of AHP-entropy weight method in hazards susceptibility assessment in mountain town. Bull. Surv. Mapp. 2021, 6, 112–116.
  25. Zhou, J.X.; Tan, S.C.; Li, J.; Xu, J.; Wang, C.; Ye, H. Landslide Susceptibility Assessment Using the Analytic Hierarchy Process (AHP): A Case Study of a Construction Site for Photovoltaic Power Generation in Yunxian County, Southwest China. Sustainability 2023, 15, 5281.
  26. Shen, S.K.; Zhang, H.L.; Yan, J.Q. Evaluation on the susceptibility of geological disasters in Chicheng County, Hebei Province. Geol. Rev. 2023, 69, 487–488.
  27. Li, L.P.; Lan, H.X.; Guo, C.B.; Zhang, Y.S.; Li, Q.W.; Wu, Y.M. Geohazard Susceptibility Assessment along the Sichuan-Tibet Railway and Its Adjacent Area Using an Improved Frequency Ratio Method. Geoscience 2017, 5, 912–929.
  28. França Pereira, F.; Sussel Gonçalves Mendes, T.; Jorge Coelho Simões, S.; de Andrade, M.R.M.; Reiss, M.L.L.; Renk, J.F.C.; da Silva Santos, T.C. Comparison of LiDAR- and UAV-derived data for landslide susceptibility mapping using Random Forest algorithm. Landslides 2023, 20, 579–600.
  29. Wang, H.; Xu, J.; Tan, S.; Zhou, J. Landslide Susceptibility Evaluation Based on a Coupled Informative–Logistic Regression Model—Shuangbai County as an Example. Sustainability 2023, 15, 12449.
  30. Tian, Q.; Zhang, B.; Guo, J.F.; Liu, H.Z.; Chang, Z.L.; Li, Y.J.; Huang, F.M. Landslide susceptibility evaluation based on coupled model of information quantity and logistic regression. Sci. Technol. Eng. 2020, 20, 8460–8468.
  31. Li, Y.; Deng, X.; Ji, P.; Yang, Y.; Jiang, W.; Zhao, Z. Evaluation of Landslide Susceptibility Based on CF-SVM in Nujiang Prefecture. Int. J. Environ. Res. Public Health 2022, 19, 14248.
  32. Ma, C.; Yan, Z.; Huang, P.; Gao, L. Evaluation of landslide susceptibility based on the occurrence mechanism of landslide: A case study in Yuan’an county, China. Environ. Earth Sci. 2021, 20, 579–600.
  33. Zhao, L.R.; Wu, X.L.; Niu, R.Q.; Wang, Y.; Zhang, K.X. Using the rotation and random forest models of ensemble learning to predict landslide susceptibility. Geomat. Nat. Hazards Risk 2020, 1, 1542–1564.
  34. Zhang, K.X.; Wu, X.L.; Niu, R.Q.; Yang, K.; Zhao, L.R. The assessment of landslide susceptibility mapping using random forest and decision tree methods in the Three Gorges Reservoir area, China. Environ. Earth Sci. 2017, 76, 405.
  35. Wu, X.; Song, Y.; Chen, W.; Kang, G.; Qu, R.; Wang, Z.; Wang, J.; Lv, P.; Chen, H. Analysis of Geological Hazard Susceptibility of Landslides in Muli County Based on Random Forest Algorithm. Sustainability 2023, 15, 4328.
  36. Alessandro, T.; Carla, L.; Carlo, E.; Gabriele, S.-M. Comparison of Logistic Regression and Random Forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily, Italy). Geomorphology 2015, 249, 119–136.
  37. Youssef, A.M.; Pourghasemi, H.R.; Pourtaghi, Z.S.; Mohamed, M.A. Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia. Landslides 2016, 13, 839–856.
  38. Natan, M.; Loris, F.; Sylvain, R.; Michael, L.; Andrea, P.; Michel, J.; Mikhail, K. Machine Learning Feature Selection Methods for Landslide Susceptibility Mapping. Math. Geosci. 2014, 46, 33–57.
  39. Huang, P.; Peng, L.; Pan, H.Y. Linking the Random Forests Model and GIS to Assess Geo-Hazards Risk: A Case Study in Shifang County, China. IEEE Access 2020, 8, 28033–28042.
  40. Ageenko, A.; Hansen, L.C.; Lyng, K.L.; Bodum, L.; Arsanjani, J.J. Landslide Susceptibility Mapping Using Machine Learning: A Danish Case Study. ISPRS Int. J. Geo-Inf. 2022, 11, 324.
Figure 1. Overview of the study area.
Figure 2. Random forest flow chart.
Figure 3. Evaluation factors grading chart.
Figure 4. Out-of-bag (OOB) error distribution for different numbers of features.
Figure 5. OOB error iteration.
Figure 6. Evaluation factors weight diagram.
Figure 7. Geological disaster-prone zoning.
Figure 8. ROC Curve.
Table 1. CF values calculated for each classification level of the evaluation factors.
| Evaluation Factor | Level | CF |
|---|---|---|
| Elevation | <1500 m | 0.556265 |
| | 1500–2100 m | 0.344333 |
| | 2100–2700 m | −0.263893 |
| | 2700–3500 m | −0.529991 |
| | >3500 m | 0.598133 |
| Slope | <15° | 0.074902 |
| | 15°–25° | 0.101083 |
| | 25°–35° | −0.292549 |
| | 35°–50° | −0.494485 |
| | >50° | 0.153006 |
| Terrain relief | <10 m | 0.539171 |
| | 10–50 m | −0.548058 |
| | 50–200 m | 0.030540 |
| | 200–500 m | 0.031302 |
| | >500 m | 0.642141 |
| NDVI | <0 | 0.712982 |
| | 0–0.2 | 0.242135 |
| | 0.2–0.4 | 0.277534 |
| | 0.4–0.6 | 0.162455 |
| | >0.6 | −0.695400 |
| Lithology | Soft rock | 0.419044 |
| | Soft and hard rock | −0.195131 |
| | Loose soil | −0.620498 |
| | Hard rock | −0.385349 |
| Distance from fault | <500 m | −0.098462 |
| | 500–1000 m | 0.328815 |
| | 1000–1500 m | −0.186762 |
| | >1500 m | 0.012061 |
| Distance from road | <500 m | 0.294626 |
| | 500–1000 m | −0.076598 |
| | 1000–1500 m | −0.250823 |
| | >1500 m | −0.362680 |
| Distance from river | <500 m | 0.264579 |
| | 500–1000 m | 0.111300 |
| | 1000–1500 m | −0.154898 |
| | 1500–2000 m | −0.268385 |
| | >2000 m | −0.035985 |
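The CF values in Table 1 compare how densely disasters occur inside one factor class with how densely they occur over the whole study area. The sketch below assumes the piecewise certainty-factor form commonly used in susceptibility studies; the function name `certainty_factor` and the example probabilities are hypothetical and are not taken from this study.

```python
def certainty_factor(pp_a: float, pp_s: float) -> float:
    """Certainty factor (CF) of one factor class, in the range [-1, 1].

    pp_a: conditional probability of a disaster inside the class
          (e.g. disaster points in the class / area of the class).
    pp_s: prior probability over the whole study area
          (e.g. all disaster points / total area).
    Assumed piecewise form; not confirmed as the authors' exact formula.
    """
    if not (0.0 <= pp_a <= 1.0 and 0.0 < pp_s < 1.0):
        raise ValueError("expected 0 <= pp_a <= 1 and 0 < pp_s < 1")
    if pp_a >= pp_s:
        # Class is at least as disaster-prone as the area average: CF >= 0.
        return (pp_a - pp_s) / (pp_a * (1.0 - pp_s))
    # Class is less disaster-prone than the area average: CF < 0.
    return (pp_a - pp_s) / (pp_s * (1.0 - pp_a))


# Hypothetical probabilities, for illustration only.
print(certainty_factor(0.02, 0.01))   # class twice as dense as the average
print(certainty_factor(0.005, 0.01))  # class half as dense as the average
```

Under this form, a positive CF marks a class that favours disaster occurrence, a negative CF marks one that does not, and a value near zero carries little information either way.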
Table 2. Collinearity diagnosis of the evaluation factors.
| Factor | NDVI | Distance from Road | Distance from Fault | Elevation | Gradient | Terrain Relief | Distance from River | Stratum Lithology |
|---|---|---|---|---|---|---|---|---|
| Tolerance (T) | 0.915 | 0.892 | 0.991 | 0.854 | 0.974 | 0.940 | 0.914 | 0.924 |
| Variance Inflation Factor (VIF) | 1.093 | 1.121 | 1.009 | 1.171 | 1.027 | 1.064 | 1.094 | 1.082 |
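The two rows of Table 2 are linked, because the variance inflation factor is the reciprocal of the tolerance. A minimal sketch of that check follows, using the tolerance values listed in the table; the thresholds in the final line (T > 0.1, VIF < 10) are a common rule of thumb and are assumed here, not quoted from this study.

```python
# Tolerance values as listed in Table 2.
tolerance = {
    "NDVI": 0.915,
    "Distance from Road": 0.892,
    "Distance from Fault": 0.991,
    "Elevation": 0.854,
    "Gradient": 0.974,
    "Terrain Relief": 0.940,
    "Distance from River": 0.914,
    "Stratum Lithology": 0.924,
}

# VIF is defined as 1 / tolerance.
vif = {name: 1.0 / t for name, t in tolerance.items()}

for name, value in vif.items():
    print(f"{name:22s} T={tolerance[name]:.3f}  VIF={value:.3f}")

# Rule-of-thumb screen (assumed thresholds): no serious multicollinearity
# if every tolerance exceeds 0.1 and every VIF is below 10.
no_serious_collinearity = all(t > 0.1 for t in tolerance.values()) and all(
    v < 10.0 for v in vif.values()
)
print("No serious collinearity:", no_serious_collinearity)
```

Rounded to three decimals, the recomputed VIF values agree with the VIF row of the table.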
Table 3. Confusion matrix.
| Confusion Matrix | True value: Disaster Points (pcs) | True value: Non-Disaster Points (pcs) |
|---|---|---|
| Predicted value: Disaster points | 143 | 2 |
| Predicted value: Non-disaster points | 4 | 150 |
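Standard classification metrics follow directly from the confusion matrix. The sketch below assumes the four cells of Table 3 read 143, 2, 4 and 150, with rows as predictions and columns as true classes; the variable names are illustrative.

```python
# Cells of Table 3 (assumed reading: rows = predicted, columns = true).
tp = 143  # predicted disaster, truly a disaster point
fp = 2    # predicted disaster, truly a non-disaster point
fn = 4    # predicted non-disaster, truly a disaster point
tn = 150  # predicted non-disaster, truly a non-disaster point

total = tp + fp + fn + tn
accuracy = (tp + tn) / total       # share of all points classified correctly
precision = tp / (tp + fp)         # share of predicted disasters that are real
recall = tp / (tp + fn)            # share of real disasters that were found
specificity = tn / (tn + fp)       # share of non-disaster points kept negative
f1 = 2 * precision * recall / (precision + recall)

print(f"n={total}  accuracy={accuracy:.3f}  precision={precision:.3f}")
print(f"recall={recall:.3f}  specificity={specificity:.3f}  F1={f1:.3f}")
```

Under this reading of the table, accuracy is about 0.980, precision about 0.986 and recall about 0.973.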
Table 4. Verification results for the susceptibility zoning.
| Susceptibility Division | Area/km² | Area Proportion/% (M) | Test Points/pcs | Proportion of Disaster Points/% (D) | Ratio (S = D/M) |
|---|---|---|---|---|---|
| Extremely high-susceptibility area | 247.52 | 4.21 | 28 | 9.40 | 2.23 |
| High-susceptibility area | 2818.30 | 47.90 | 108 | 36.24 | 0.76 |
| Medium-susceptibility area | 1735.92 | 29.58 | 84 | 28.19 | 0.96 |
| Low-susceptibility area | 1082.26 | 18.39 | 78 | 26.17 | 1.42 |
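The ratio column of Table 4 divides each zone's share of test disaster points by its share of the total area. A minimal sketch of that arithmetic follows, assuming the area share is recomputed from the listed areas rather than read from the rounded proportion column; the zone labels and variable names are illustrative.

```python
# (zone, area in km^2, share of test disaster points D in %), from Table 4.
zones = [
    ("Extremely high", 247.52, 9.40),
    ("High", 2818.30, 36.24),
    ("Medium", 1735.92, 28.19),
    ("Low", 1082.26, 26.17),
]

total_area = sum(area for _, area, _ in zones)

ratios = {}
for name, area, d_share in zones:
    area_share = 100.0 * area / total_area  # area proportion in %
    ratios[name] = d_share / area_share     # ratio S for this zone
    print(f"{name:15s} area share={area_share:6.2f}%  S={ratios[name]:.2f}")
```

Recomputed this way, the ratios round to 2.23, 0.76, 0.96 and 1.42, matching the last column of the table.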
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Zhang, S.; Tan, S.; Zhou, J.; Sun, Y.; Ding, D.; Li, J. Geological Disaster Susceptibility Evaluation of a Random-Forest-Weighted Deterministic Coefficient Model. Sustainability 2023, 15, 12691. https://doi.org/10.3390/su151712691
