1. Introduction
Landslides are gravity-driven slope processes that reshape landforms by mobilizing blocks of slope material [1]. Their causes fall into three groups: ① geological causes, such as material properties, structural characteristics, and permeability differences; ② morphological causes, such as topographic changes and natural processes; and ③ man-made causes, such as deforestation, artificial vibration, and soil erosion [2]. Landslides can cause huge losses of life and property [3,4,5]. For example, the 2017 landslide in Xinmo Village, Diexi Town, Mao County, Sichuan Province, China, left 83 people dead or missing and caused direct economic losses of hundreds of millions of dollars [6]. Accurate and effective landslide recognition has always been a challenging problem. Many scholars have studied landslide identification over the years, and we review this work in chronological order.
In the early stage, landslide identification relied mainly on field investigation and the experience and judgment of geological experts [7]. Chigira et al. [8] determined the location, size and distribution of earthquake-induced landslides in central Niigata Prefecture, Japan, through field investigation and satellite or aerial imagery. However, field surveys are difficult in hard-to-reach or dangerous areas, which may lead to incomplete data. Keefer et al. [9] used statistical analysis to evaluate the distribution of earthquake-induced landslides, but the study relied on ground measurements, which are costly and have limited coverage. Xing et al. [10] conducted a field investigation of a large catastrophic landslide in Guanling County, Guizhou Province, which required considerable manpower, and the DAN3D model they used deviated somewhat in its prediction of the maximum depth. Galli et al. [11] systematically compared landslide inventory maps produced through remote sensing image analysis and field investigation; the skill and experience of the researchers and the complexity of the study area affected the analysis. As the acquisition methods and data sources for landslide data have changed greatly, these early methods are limited when dealing with large-scale, multi-dimensional datasets and can no longer meet recognition requirements.
With the development of computer technology, machine learning has been widely introduced into the field of landslide detection. Machine learning methods are automated modeling techniques that analyze data to learn the underlying relationships and hidden patterns, thereby building analytical models. These methods are used to produce accurate and reproducible results for landslide susceptibility analysis through an iterative learning process. The commonly used machine learning methods mainly include logistic regression [12], decision trees [13], artificial neural networks [14], support vector machines [15], random forests [16], and so on.
Table 1 lists the differences between these machine learning methods. Bai S B et al. [17], based on the GIS data of Zhongxian County in the Three Gorges Reservoir area, used the logistic regression method to draw a detailed landslide susceptibility map and verified it. Logistic regression often has limited classification accuracy; the decision tree algorithm usually performs better than logistic regression, but it is prone to overfitting and generalizes poorly [18,19]. Lian et al. [20] proposed a new method for establishing a landslide displacement prediction model based on a random hidden weight neural network. However, the learning process of an artificial neural network is a black box, which makes the results difficult to interpret. The SVM algorithm addresses the interpretability problem of artificial neural networks. Kuntan [21] proposed using CUDA and OpenMP to accelerate SVM classification and reduce the amount of computation. M.N. Jebur et al. [22] integrated the condition factors into an SVM to evaluate the correlation between landslide occurrence and each condition factor.
In recent years, deep learning has achieved good performance in the field of image processing [23,24,25,26]. In image recognition, deep learning models do not require a manually designed feature extraction process and can automatically learn complex feature representations from data. Specifically, traditional methods rely on manually designed feature extractors, which usually require domain expertise and are difficult to adapt to a changing data distribution. In contrast, deep learning automatically learns multi-level features through multi-layer neural networks, which greatly improves the generalization ability of the model and its ability to capture complex patterns [27,28]. The application of deep learning to landslide recognition has become a research focus for many scholars. Wu Y [29] proposed a landslide warning algorithm, the K-meansResNet model, based on the ResNet network. Shi [30] proposed an integrated method for landslide recognition in remote sensing images by combining a deep CNN and change detection. Zhang et al. [31] classified landslides by cause into natural and engineering landslides, established an improved seven-layer CNN, and accurately predicted landslide susceptibility, that is, the probability of future landslides in a particular area, quantified as a value between zero and one, where higher values indicate a higher landslide risk. The DeepLab model has been widely used in various image analysis tasks, including landslide recognition, due to its excellent semantic segmentation ability. Hu et al. [32] proposed a semantic segmentation method for remote sensing images based on the DeepLabv3 network, which improves the segmentation accuracy by adding a dilation factor in the parallel atrous spatial pyramid pooling (ASPP) structure. Mao et al. [33] proposed a CNN-based landslide segmentation method that identifies landslides using the DeepLabv3+ model. Wang Wei et al. [34] proposed a landslide image semantic segmentation method (CLBP-DeepLabv3+) combining DeepLabv3+ and the complete local binary pattern (CLBP).
However, there are still some challenges in terms of model performance when the DeepLabv3+ framework is applied to the field of landslide recognition. Firstly, the encoder part of DeepLabv3+ usually uses a pre-trained deep convolutional neural network (e.g., MobileNet) as the backbone network. Its feature extraction capability may not be fully adapted to the specific task of landslide recognition, resulting in insufficient landslide feature learning. Secondly, the pooling layer and multiple down-sampling operations in the decoder of the DeepLabv3+ framework will inevitably lose image details, especially fine structures such as landslide boundaries. In addition, a single model has the problems of low computational efficiency, high error rate and weak generalization ability. Therefore, it is very important to explore how to use the DeepLabv3+ deep learning model to better extract landslide edge information and improve the recognition accuracy in high-resolution remote sensing. In order to solve the above problems, this paper improves the DeepLabv3+ framework and proposes the DeepLabv3+-ResNet101-ECA model.
The fusion of multiple networks is a research trend in deep learning. The proposed model integrates the advantages of the DeepLabv3+, ResNet101, and ECA modules so that they complement each other. The encoder uses ResNet101 as the backbone network to address insufficient feature extraction ability, and the decoder integrates the ECA attention mechanism to reduce the loss of key edge information. As a combined model, it also alleviates the low accuracy and poor generalization of a single model.
Our contributions are as follows. We use satellite images combined with DEM data from landslide datasets and use the ground-truth mask images as labels (https://dx.doi.org/10.21227/ep6n-fm58). Then, based on the DeepLabv3+ framework, a combined network model named DeepLabv3+-ResNet101-ECA is designed. The model replaces the backbone network with ResNet101 to enhance feature extraction for small landslides, whose contours would otherwise not be captured. Incorporating a lightweight ECA attention mechanism that focuses on the most important regions of the landslide image improves the retention of boundary information and enhances landslide detection. Finally, the generalization ability of the model is verified by transfer learning.
The DeepLabv3+-ResNet101-ECA model proposed in this paper uses deep learning to automatically extract complex terrain features and combines the attention mechanism to strengthen the learning of key features, thereby improving the accuracy and robustness of landslide recognition. Due to the computational complexity of deep learning and its high dependence on data, DeepLabv3+-ResNet101-ECA has potential limitations. Training and inference require substantial computational resources, and optimization techniques such as pruning and quantization are needed to reduce the computational cost. The model also relies heavily on high-quality, large-scale labeled datasets, which may limit its performance in regions where data are scarce or difficult to obtain. To reduce the need for large-scale labeled data and improve the generality of the model, transfer learning or data augmentation techniques are needed.
In addition, the proposed model has application feasibility in other regions. However, the geological conditions of different regions are different, which imposes certain restrictions on its application. Cross-regional cooperation and data sharing are helpful to break through this constraint, and they also help to understand the common laws and unique characteristics of landslide occurrence under different geological backgrounds and further improve the wide application of the model.
4. Experiments and Results
4.1. Experimental Setup
The experiments were conducted on a computer workstation equipped with a 13th Gen Intel® Core™ i5-13600KF processor running at 3.50 GHz and an Nvidia GeForce RTX 3080 Ti with 8 GB of GPU memory. The development environment was built using PyTorch 2.4.0, CUDA 11.6, and cuDNN 8.3.2.
4.2. Backbone Network Selection
The segmentation performance of the semantic segmentation model is closely related to the feature extraction ability of the backbone network. With the rapid development of deep learning, more and more networks have been designed and applied to different tasks. In this experiment, six networks are selected for analysis and comparison, namely MobileNet, Vgg16, Xception, ResNet18, ResNet50, and ResNet101. They are used as the backbone network in turn to train the Bijie data images, and the model is evaluated on the validation set.
Figure 7 shows the accuracy on the validation set using different backbone network models. Except for the DeepLabv3+-MobileNet network model, the accuracy of all network models on the validation set is above 96%. DeepLabv3+-ResNet18, DeepLabv3+-ResNet50 and DeepLabv3+-ResNet101 have high accuracy, and DeepLabv3+-ResNet101 performs best, with higher accuracy than the other network models.
Figure 8 shows the prediction results of the different backbone networks for landslide images. The figure contains five landslide image samples from different locations, labeled landslide I, II, III, IV and V. Landslides I, III, IV and V have obvious landslide characteristics and forms, while those of landslide II are less obvious. The label of each landslide is its mask image, in which the white area marks where the landslide occurred; it is compared with the predicted mask produced by each backbone network. For landslides I, III, IV and V, whose morphology and characteristics are obvious, all six backbone networks achieve a certain recognition effect. For landslide II, only DeepLabv3+-ResNet101 performs well. This is because the characteristics of landslide II are not obvious and the image contains interfering information such as roads. The other models tend to misjudge this non-landslide information as landslide, so their predictions differ considerably from the true landslide mask. Through its deep architecture and residual connections, ResNet101 enables the model to learn complex patterns and capture multi-scale information, so that landslide II is identified better. The above analysis verifies that the DeepLabv3+-ResNet101 model has the best performance and can handle the landslide detection task more accurately. The subsequent experiments are based on this model.
4.2.1. Analysis of Computational Cost and Accuracy
When describing deep learning models, the number of floating-point operations (FLOPs) and the number of parameters are often used, in addition to the accuracy, to characterize the computational cost of a model. The number of FLOPs of a convolutional layer is calculated as follows:

$$\mathrm{FLOPs} = 2HW\left(C_{in}k^{2} + 1\right)C_{out}$$

where $H$, $W$ and $C_{in}$ are the height, width and number of channels of the input feature map, $k$ is the kernel width, and $C_{out}$ is the number of output channels. We discuss the computational cost of the experiments through six metrics, as shown in Table 2, and record the evaluation metrics for each model, as shown in Table 3.
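As a rough illustration of how such a count scales with layer size, the sketch below evaluates the commonly used per-layer estimate FLOPs = 2HW(C_in k² + 1)C_out for a single convolution. The function name `conv_flops` and the layer dimensions are hypothetical and chosen only for illustration; they are not taken from the models in Table 2, and the exact counting convention used for that table may differ.

```python
def conv_flops(h, w, c_in, k, c_out):
    """Estimate the FLOPs of one k x k convolution layer.

    Assumes the common count 2 * H * W * (C_in * k^2 + 1) * C_out, where the
    "+ 1" accounts for the bias term and the factor 2 counts each
    multiply-add as two operations.
    """
    return 2 * h * w * (c_in * k * k + 1) * c_out


# Hypothetical layer: 64 x 64 feature map, 256 -> 256 channels, 3 x 3 kernel.
flops = conv_flops(h=64, w=64, c_in=256, k=3, c_out=256)
print(f"{flops / 1e9:.3f} GFLOPs")
```

Summing this estimate over all convolutional layers gives an approximate whole-model figure; the cost grows linearly with the feature-map area and with each channel count, and quadratically with the kernel width.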
From the data in Table 2 and Table 3, we can see that DeepLabv3+-ResNet101 shows significant advantages in the landslide detection task when both the computational cost and the model accuracy are considered. Although its training time, GPU memory consumption and parameter count are higher than those of some lightweight models, it performs well on the evaluation metrics: the MIoU reaches 0.752, the F1 score is 0.9220, the recall is 0.9160, and the precision is 0.9283. In contrast, although MobileNet and ResNet18 are more computationally efficient, their MIoU is lower by 18.9% and 2.8%, respectively, which significantly affects the localization accuracy of the landslide boundaries.
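To make these four metrics concrete, the following minimal sketch computes precision, recall, F1 score and MIoU from a predicted and a ground-truth binary mask. It assumes the usual confusion-matrix definitions, with MIoU taken as the mean of the landslide and background IoU; the function name `segmentation_metrics` and the toy masks are hypothetical and are not data from the Bijie dataset.

```python
def segmentation_metrics(pred, truth):
    """Compute precision, recall, F1 and MIoU for binary masks.

    pred and truth are equally sized 2D lists of 0 (background) or
    1 (landslide).
    """
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
            else:
                tn += 1

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # IoU per class, then averaged over the two classes.
    iou_landslide = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    iou_background = tn / (tn + fp + fn) if tn + fp + fn else 0.0
    miou = (iou_landslide + iou_background) / 2

    return {"precision": precision, "recall": recall, "f1": f1, "miou": miou}


# Toy 3 x 4 masks (hypothetical values).
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 1, 1, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
metrics = segmentation_metrics(pred, truth)
print(metrics)
```

Because MIoU penalizes both false positives and false negatives at the pixel level, it is more sensitive to boundary errors than precision or recall alone.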
ResNet101 has the highest FLOPs, 16.008 G, but its inference time of 0.0190 s still provides millisecond-level response, which can meet real-time requirements in actual deployment. For high-precision tasks such as geological disaster detection, where false or missed detections may have serious consequences, trading some training resources for an improvement of more than 10% in key indicators has significant engineering value. Selecting ResNet101 as the backbone network therefore balances performance and efficiency more reliably.
4.2.2. Statistical Significance Analysis
The p-value is commonly used in statistical hypothesis testing to assess the compatibility of data with a hypothesis, and in deep learning it can be used to assess whether the differences between models are statistically significant. We ran 10 statistical validation experiments, and the average accuracy of each run was retained to four decimal places.
Table 4 records the average accuracy over 10 experiments.
According to the data in Table 4, we performed a t-test between each model and DeepLabv3+-ResNet101 to obtain the t-values and p-values, as shown in Table 5 below. All the calculated p-values are much smaller than the commonly used significance level of 0.05, which indicates that the accuracy differences are statistically significant and supports the robustness of the above results.
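A comparison of this kind can be sketched as follows. The text does not state which t-test variant was used, so this example assumes Welch's unequal-variance two-sample test and obtains the two-sided p-value by numerically integrating the Student t density with the standard library only. The accuracy lists are hypothetical placeholders, not the values in Table 4.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t_value, df, steps=20000):
    """Two-sided p-value via Simpson integration of the density on [0, |t|]."""
    upper = abs(t_value)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # P(-|t| < T < |t|)
    # Very small p-values are limited by floating-point cancellation here.
    return max(0.0, 1.0 - central_mass)


def welch_t_test(a, b):
    """Welch's t-test; returns (t value, degrees of freedom, p-value)."""
    se_a = statistics.variance(a) / len(a)
    se_b = statistics.variance(b) / len(b)
    t_value = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1))
    return t_value, df, two_sided_p(t_value, df)


# Hypothetical accuracies over 10 runs for two backbones.
acc_resnet101 = [0.9712, 0.9708, 0.9715, 0.9710, 0.9709,
                 0.9713, 0.9711, 0.9707, 0.9714, 0.9710]
acc_resnet50 = [0.9685, 0.9690, 0.9682, 0.9688, 0.9686,
                0.9684, 0.9689, 0.9683, 0.9687, 0.9685]
t_value, df, p_value = welch_t_test(acc_resnet101, acc_resnet50)
print(f"t = {t_value:.2f}, df = {df:.1f}, significant = {p_value < 0.05}")
```

If the 10 runs of the two models share the same data splits or random seeds, a paired t-test on the per-run differences would be the more appropriate choice.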
4.3. Different Attention Mechanisms
Four attention mechanisms, SE, BAM, CBAM and ECA, were each added after the decoder-fused features. The validation-set results of the models with different attention mechanisms are shown in Figure 9 and Figure 10.
It can be seen from the figures that DeepLabv3+-ResNet101-ECA has lower loss and higher accuracy on the validation set. The evaluation index values of these models are shown in Table 6. The results in the table show that the DeepLabv3+-ResNet101-ECA model has the highest precision, recall, F1 score and MIoU, which are 0.9351, 0.9259, 0.9240 and 0.7603, respectively. Compared with the lowest values in the table, these are increases of 1.09%, 1.64%, 0.68% and 1.51%, respectively.
In the experiment, six landslide samples in the test set were selected and labeled landslide I, II, III, IV, V and VI, as shown in Figure 11. For landslide I, the masks predicted by DeepLabv3+-ResNet101-BAM and DeepLabv3+-ResNet101-SE have blurred edges and differ considerably from the true mask, and the mask predicted by DeepLabv3+-ResNet101-CBAM is smaller than the true mask. For landslides II, III and VI, all four models show a certain recognition performance. For landslides IV and V, the DeepLabv3+-ResNet101-ECA model is more consistent with the true mask than the other models. Overall, the DeepLabv3+-ResNet101-ECA model performs best, and its predicted masks agree most closely with the true masks.
As shown in Figure 11, the area marked by the blue box is the false positive area, that is, a non-landslide area incorrectly identified as a landslide. The DeepLabv3+-ResNet101-ECA model significantly reduces this type of error, which is very important for optimizing resource utilization and improving the credibility of the warning system. The area marked by the green box is where details are captured more clearly, reflecting the improvement of the proposed model in capturing the subtle features of landslides; this improves the overall recognition accuracy and helps to understand the landslide event comprehensively. The area marked by the red box is the boundary-preserving region. The DeepLabv3+-ResNet101-ECA model delineates the landslide boundary more accurately in this area, which indicates that the new model has higher resolution and accuracy when dealing with complex terrain features.
4.4. The Location of the ECA
We explored the placement of the ECA attention mechanism in our experiments by adding it after the residual block, after the ASPP module, and after the decoder feature fusion, respectively. The experimental results are shown in Table 7 below.
The feature fusion step of the decoder usually involves integrating information from different levels of the encoder. Applying the ECA attention mechanism at this stage can adaptively emphasize the channel information that is critical to the task, thereby enhancing the effectiveness and robustness of the final feature representation and ensuring that the model can make full use of the multi-level information. According to the experimental results in Table 7 above, when the ECA module is added after the decoder feature fusion, the model performs better and mitigates the loss of image detail caused by multiple down-sampling operations.
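The channel re-weighting performed by ECA can be sketched in a few lines. The example below is a plain-Python illustration, not the implementation used in the experiments: it applies global average pooling, a 1D convolution across neighbouring channels, and a sigmoid gate to a small C x H x W nested list. The adaptive kernel-size rule with gamma = 2 and b = 1 follows the ECA-Net paper; the convolution weights are fixed hypothetical values, whereas in a trained network they are learned.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Odd 1D kernel size derived from (log2(C) + b) / gamma."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(feature_map, weights):
    """Apply efficient channel attention to a C x H x W nested list.

    weights is the 1D convolution kernel of odd length k shared by all
    channels (learned in practice, fixed here for illustration).
    """
    pad = len(weights) // 2
    # 1) Global average pooling: one descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in feature_map]
    # 2) 1D convolution across neighbouring channels, with zero padding.
    padded = [0.0] * pad + pooled + [0.0] * pad
    conv = [sum(w * padded[c + j] for j, w in enumerate(weights))
            for c in range(len(pooled))]
    # 3) Sigmoid gate, then rescale each channel by its attention weight.
    gates = [1.0 / (1.0 + math.exp(-v)) for v in conv]
    scaled = [[[g * x for x in row] for row in ch]
              for g, ch in zip(gates, feature_map)]
    return scaled, gates


# Hypothetical 4-channel, 2 x 2 fused feature map: channel c holds c + 1.
feature_map = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(4)]
scaled, gates = eca(feature_map, weights=[0.2, 0.6, 0.2])
print(eca_kernel_size(256), [round(g, 3) for g in gates])
```

Because the gate for each channel depends only on a few neighbouring channel descriptors, the module adds very few parameters, which is why it is described as lightweight.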
4.5. Transfer Learning Experiment
Bijie and Luding are located in different geographical areas, and their terrain features are quite different. The Bijie area is mostly karst landform, with complex and diverse terrain; Luding is located on the edge of the Qinghai–Tibet Plateau, with higher and mostly mountainous terrain. This difference in topography results in landslides with different mechanisms, scales, and forms. When terrain features differ greatly, directly applying the pre-trained model to the new region is likely to leave it overfitted to the data features of the source domain. To address this, we normalize the data from the Bijie and Luding regions to reduce the inconsistency of numerical ranges caused by terrain differences, and we use data augmentation techniques (such as rotation, flipping, and cropping) to enrich the target dataset and enhance the terrain features. A dropout layer and regularization are also introduced in the neural network model to prevent overfitting.
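The preprocessing described above can be illustrated with a minimal sketch. It assumes min-max scaling as the normalization (the text does not specify which scheme was used) and shows flipping and a 90-degree rotation applied identically to an image band and its mask, so that labels stay aligned with pixels. All function names and the toy elevation values are hypothetical.

```python
def min_max_normalize(band):
    """Scale a 2D list of values to [0, 1]; a constant band maps to zeros."""
    flat = [v for row in band for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    return [[(v - lo) / span if span else 0.0 for v in row] for row in band]


def flip_horizontal(grid):
    """Mirror a 2D list left to right."""
    return [row[::-1] for row in grid]


def rotate_90(grid):
    """Rotate a 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def augment(image, mask):
    """Yield (image, mask) pairs with the same transform applied to both."""
    yield image, mask
    yield flip_horizontal(image), flip_horizontal(mask)
    yield rotate_90(image), rotate_90(mask)


# Hypothetical 2 x 2 elevation tile (metres) and its landslide mask.
dem = [[1200.0, 1500.0],
       [1800.0, 2100.0]]
mask = [[0, 1],
        [0, 0]]
samples = list(augment(min_max_normalize(dem), mask))
for image, label in samples:
    print(image, label)
```

Normalizing each region to the same range keeps the higher absolute elevations in Luding from dominating the input scale, while the geometric transforms enlarge the smaller target dataset without requiring new labels.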
In this paper, a parameter-based transfer learning method is used to train the landslide detection model. Firstly, the model is pre-trained on the Bijie dataset to obtain the pre-trained weights, and then the pre-trained weights are used for secondary training on the Luding landslide dataset. Although the objects in the two datasets are different, they share some underlying features, such as the edges and textures of the objects. This shared knowledge can be used by the model, so as to avoid the network learning from scratch, which can significantly improve the training speed and effect of the model.
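The idea of parameter-based transfer can be sketched as copying every pre-trained parameter whose name and shape match the target model, and then continuing training on the target data. The example below is a framework-free illustration using nested lists; the parameter names and values are hypothetical, and in a deep learning framework the same step is normally done by loading a saved set of weights into the model before fine-tuning.

```python
import copy


def shape_of(tensor):
    """Return the shape of a rectangular nested list as a tuple."""
    shape = []
    while isinstance(tensor, list):
        shape.append(len(tensor))
        if not tensor:
            break
        tensor = tensor[0]
    return tuple(shape)


def transfer_weights(pretrained, target):
    """Initialise target from pretrained where name and shape agree.

    Returns the merged parameter dict and the names that were transferred;
    parameters without a match keep their original (e.g. random) values.
    """
    merged, transferred = {}, []
    for name, value in target.items():
        source = pretrained.get(name)
        if source is not None and shape_of(source) == shape_of(value):
            merged[name] = copy.deepcopy(source)
            transferred.append(name)
        else:
            merged[name] = copy.deepcopy(value)
    return merged, transferred


# Hypothetical weights learned on the source (Bijie) data.
bijie_weights = {
    "backbone.conv1": [[0.1, 0.2], [0.3, 0.4]],
    "decoder.head": [[0.5, 0.6]],
}
# Freshly initialised target model with a differently shaped head.
luding_init = {
    "backbone.conv1": [[0.0, 0.0], [0.0, 0.0]],
    "decoder.head": [[0.0, 0.0], [0.0, 0.0]],
}
luding_weights, transferred = transfer_weights(bijie_weights, luding_init)
print(transferred)
```

When the source and target models share the same architecture and classes, as with two landslide segmentation datasets, every parameter matches and the whole network starts from the pre-trained state.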
Table 8 records the model evaluation metrics before and after using transfer learning. The first model in the table is trained and evaluated only on the Luding dataset, while the second model is pre-trained on the Bijie dataset and then trained and evaluated on the Luding data. The table shows that, after transfer learning, the four evaluation indicators improve by 1.98%, 1.73%, 1.85% and 4.95%, respectively.
Figure 12 shows the predicted mask images of DeepLabv3+-ResNet101-ECA and DeepLabv3+-ResNet101-ECA-Trans-Learning. For landslide samples I, II, III, and IV, the edges predicted by DeepLabv3+-ResNet101-ECA without pre-training differ to some degree from the true mask edges, whereas the images predicted by DeepLabv3+-ResNet101-ECA-Trans-Learning agree closely with the true mask. For landslide sample V, the predictions of both models agree well with the true mask. In summary, transfer learning brings a clear improvement in landslide recognition, and the model shows good generalization ability.
4.6. Discussion
When applying the DeepLabv3+-ResNet101-ECA model to other regions, it is necessary to be aware of the following limitations. Since this model is a deep network model, the training and inference process requires high computational resources. Optimization techniques such as pruning and quantization are required to reduce the computational cost when necessary. Moreover, this model requires high-quality, large-scale landslide datasets, which can be alleviated by transfer learning or data augmentation techniques if applied to areas where data are scarce or difficult to obtain.
Due to the complexity of DeepLabv3+-ResNet101-ECA, the risk of overfitting is also high, especially when the amount of data is limited. To mitigate this risk, methods such as cross-validation should be used to evaluate model performance, and the model should be validated on a localized test set before actual deployment to ensure its accuracy and reliability.
Although the DeepLabv3+-ResNet101-ECA model performs well in specific environments, its performance may suffer in areas with limited computing resources or widely varying environmental conditions, where optimization techniques such as pruning or quantization may be needed to reduce the computational cost.
In this paper, the research on landslide recognition based on the DeepLabv3+-ResNet101-ECA model is not limited to compiling a landslide inventory but covers the whole monitoring process, from data collection and processing to information transmission, effectively supporting the user’s decision-making process. How to embed the model into an early warning system that provides users with visual event information remains to be studied further.
5. Conclusions
In order to improve the accuracy and generalization of landslide recognition, this paper proposes a DeepLabv3+-ResNet101-ECA model, which uses ResNet101 as the backbone network and integrates the ECA attention mechanism into the model to improve the performance of landslide recognition.
In this paper, the DEM is treated as an additional image feature and fused with the original RGB image as input to the neural network, so that landslide features can be extracted from multiple perspectives. The experimental results are analyzed by comparing the output masks with the ground-truth masks used as labels.
Compared with the DeepLabv3+ model, the proposed model increases the precision, recall, F1 score and MIoU by 1.17%, 2%, 0.96% and 2.36%, respectively. Higher precision means that fewer non-landslide areas are misclassified as landslides. Higher recall shows that the model is more effective at identifying all the real landslide events. The higher F1 score reflects a better balance between precision and recall, and the higher MIoU indicates that the model identifies the boundary of the landslide area more accurately. These improvements not only raise the theoretical performance of the model but also give disaster management departments more accurate delineations of landslide areas, which helps to deploy rescue forces and resources quickly and reduce disaster losses.
In addition, transfer learning is used to apply the pre-trained model to the Luding region, which verifies the generalization ability of DeepLabv3+-ResNet101-ECA. The results show that our designed model improves the accuracy of landslide detection to a certain extent and provides timely and accurate technical support for landslide identification evaluation and disaster prevention and mitigation decision-making.
The model proposed in this study has the ability to be integrated into the automated disaster management system. By integrating it into the existing disaster warning platform, the model can provide a continuous and automatic landslide warning service, which can monitor and assess the potential landslide risk in real time, thus greatly improving the disaster response efficiency and reducing casualties and property losses. This is not only crucial for the rapid response of landslide hazards but also provides a reference for other types of geological hazard monitoring, which has a wide range of application prospects and promotion value.
Despite the progress made in landslide detection accuracy, there is still room for improvement in this research. More accurate classification and segmentation are needed, because landslides whose features are inconspicuous in shape are still prone to omission. In future work, we will fuse multi-source remote sensing data to obtain richer surface information, combine relevant terrain information, and fuse additional landslide texture features for landslide recognition.