Landslide Recognition Based on DeepLabv3+ Framework Fusing ResNet101 and ECA Attention Mechanism

Chen, Xinfang; Wang, Shiwei; Dinavahi, Venkata; Yang, Lijia; Wu, Dibai; Shen, Meiyi

doi:10.3390/app15052613

Open Access Article


by Xinfang Chen ¹,², Shiwei Wang ¹,*, Venkata Dinavahi ³, Lijia Yang ¹, Dibai Wu ¹ and Meiyi Shen ¹

¹ College of Information Engineering, Institute of Disaster Prevention, Langfang 065201, China

² Hebei Province University Smart Emergency Application Technology Research and Development Center, Langfang 065201, China

³ Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(5), 2613; https://doi.org/10.3390/app15052613

Submission received: 19 January 2025 / Revised: 17 February 2025 / Accepted: 25 February 2025 / Published: 28 February 2025


Abstract

Landslides are among the most common geological disasters and are highly destructive. In recent years, semantic segmentation models have been applied to landslide recognition with some success. However, current methods still have shortcomings: they overlook small targets such as fine cracks, mis-segment boundaries, and struggle to differentiate spectral signatures, such as those of different rock types, in landslide-prone areas. In this paper, a landslide detection model based on the DeepLabv3+ framework, DeepLabv3+-ResNet101-ECA, is proposed. The backbone feature extraction network of DeepLabv3+ is replaced with ResNet101 to enhance the model's ability to extract features of small objects, and the ECA attention mechanism is integrated into the model to improve segmentation and detection accuracy. Taking landslides in Bijie City, Guizhou Province, as the research object, DeepLabv3+-ResNet101-ECA improves on the original DeepLabv3+ model by 1.17% in precision, 2% in recall, 0.96% in F1 score, and 2.36% in MIoU. Finally, transfer learning is used to verify the generalization ability of the model. The results show that the improved model detects landslides more effectively.

Keywords:

artificial intelligence; DeepLabv3+; deep learning; ECA; landslide recognition; machine learning; ResNet

1. Introduction

Landslides are slope gravity processes that reshape geological landforms by mobilizing landslide blocks [1]. Their formation has three main causes: ① geological causes, such as material properties, structural characteristics, and permeability differences; ② morphological causes, such as topographic changes and natural processes; and ③ man-made causes, such as deforestation, artificial vibration, and soil erosion [2]. Landslides can cause heavy losses of life and property [3,4,5]. For example, the 2017 landslide in Xinmo Village, Diexi Town, Mao County, Sichuan Province, China, left 83 people dead or missing and caused direct economic losses of hundreds of millions of dollars [6]. Accurate and effective landslide recognition has always been a challenging problem. Many scholars have studied landslide identification over the years, and we review this work in chronological order.

In the early stage, landslide identification relied mainly on field investigation and on the experience and judgment of geological experts [7]. Chigira et al. [8] determined the location, size, and distribution of earthquake-induced landslides in central Niigata Prefecture, Japan, through field investigation and satellite or aerial imagery. However, field surveys are difficult in hard-to-reach or dangerous areas, which may lead to incomplete data. Keefer et al. [9] used statistical analysis to evaluate the distribution of earthquake-induced landslides, but their research relied on ground measurements with high cost and limited coverage. Xing et al. [10] conducted a field investigation of a large catastrophic landslide in Guanling County, Guizhou Province, which required substantial manpower, and the DAN3D model they used showed some deviation in predicting the maximum depth. Galli et al. [11] systematically compared landslide inventory maps produced by remote sensing image analysis and by field investigation; the skill and experience of the researchers and the complexity of the study area affected the analysis. As the acquisition methods and sources of landslide data have changed greatly, early recognition methods are limited when handling large-scale, multi-dimensional datasets and can no longer meet recognition requirements.

With the development of computer technology, machine learning has been widely introduced into landslide detection. Machine learning methods are automated modeling techniques that analyze data to learn underlying relationships and hidden patterns and thereby build analytical models. Through an iterative learning process, these methods produce accurate and reproducible results for landslide susceptibility analysis. Commonly used machine learning methods include logistic regression [12], decision trees [13], artificial neural networks [14], support vector machines (SVMs) [15], and random forests [16]. Table 1 lists the differences between these methods. Based on GIS data of Zhongxian County in the Three Gorges Reservoir area, Bai S B et al. [17] used logistic regression to draw and verify a detailed landslide susceptibility map. Logistic regression usually lacks classification accuracy; decision tree algorithms usually perform better than logistic regression but are prone to overfitting and generalize poorly [18,19]. Lian et al. [20] proposed a new method for building a landslide displacement prediction model based on a neural network with random hidden weights. However, the learning process of artificial neural networks is a black box, which makes the results difficult to interpret. The SVM algorithm addresses this interpretability problem of artificial neural networks. Kuntan [21] proposed using CUDA and OpenMP to accelerate SVM classification and reduce computation time. M.N. Jebur et al. [22] integrated conditioning factors into an SVM to evaluate the correlation between landslide occurrence and each factor.

In recent years, deep learning has performed well in image processing [23,24,25,26]. In image recognition, deep learning models do not require a manually designed feature extraction process; they automatically learn complex feature representations from data. Specifically, traditional methods rely on manually designed feature extractors, which usually require domain expertise and adapt poorly to changing data distributions. In contrast, deep learning automatically learns multi-level features through multi-layer neural networks, which greatly improves the model's generalization ability and its ability to capture complex patterns [27,28]. The application of deep learning to landslide recognition has become a research focus for many scholars. Wu Y [29] proposed a landslide warning algorithm based on a K-meansResNet model built on the ResNet network. Shi [30] proposed an integrated method for landslide recognition in remote sensing images that combines a deep CNN with change detection. Zhang et al. [31] classified the causes of natural and engineering landslides and built an improved seven-layer CNN that accurately predicts landslide susceptibility, that is, the probability of future landslides in a particular area, quantified as a value between zero and one, where higher values indicate a higher landslide risk. Owing to its excellent semantic segmentation ability, the DeepLab model has been widely used in various image analysis tasks, including landslide recognition. Hu et al. [32] proposed a semantic segmentation method for remote sensing images based on the DeepLabv3 network, which improves segmentation accuracy by adding dilation factors to the parallel structure of atrous spatial pyramid pooling (ASPP). Mao et al. [33] proposed a CNN-based landslide segmentation method that identifies landslides using the DeepLabv3+ model. Wang Wei et al. [34] proposed a landslide image semantic segmentation method (CLBP-DeepLabv3+) that combines DeepLabv3+ with the completed local binary pattern (CLBP).

However, applying the DeepLabv3+ framework to landslide recognition still poses several challenges to model performance. First, the encoder of DeepLabv3+ usually uses a pre-trained deep convolutional neural network (e.g., MobileNet) as the backbone network, whose feature extraction capability may not be fully adapted to the specific task of landslide recognition, resulting in insufficient learning of landslide features. Second, the pooling layers and repeated down-sampling operations in the encoder of the DeepLabv3+ framework inevitably lose image details, especially fine structures such as landslide boundaries. In addition, a single model suffers from low computational efficiency, a high error rate, and weak generalization ability. Therefore, it is important to explore how the DeepLabv3+ deep learning model can better extract landslide edge information and improve recognition accuracy in high-resolution remote sensing images. To solve the above problems, this paper improves the DeepLabv3+ framework and proposes the DeepLabv3+-ResNet101-ECA model.

Fusing multiple networks is a current research trend in deep learning. The proposed model integrates the advantages of the DeepLabv3+, ResNet101, and ECA modules so that they complement each other. The encoder uses ResNet101 as the backbone network to address insufficient feature extraction ability, and the decoder integrates the ECA attention mechanism to avoid losing key edge information. As a combined model, it also alleviates the low accuracy and poor generalization of a single model.

Our contributions are as follows. We use satellite images combined with DEM data from a landslide dataset, with the ground-truth mask images as labels (https://dx.doi.org/10.21227/ep6n-fm58). Based on the DeepLabv3+ framework, we design a combined network model named DeepLabv3+-ResNet101-ECA. The model replaces the backbone network with ResNet101 to enhance its feature extraction ability for small landslides, whose contours otherwise cannot be captured. A lightweight ECA attention mechanism is incorporated to focus on the most important regions of the landslide image, which improves the retention of boundary information and enhances landslide detection. Finally, the generalization ability of the model is verified by transfer learning.

The DeepLabv3+-ResNet101-ECA model proposed in this paper uses deep learning to automatically extract complex terrain features and combines it with an attention mechanism to enhance the learning of key features, thereby improving the accuracy and robustness of landslide recognition. Because deep learning is computationally complex and highly dependent on data, DeepLabv3+-ResNet101-ECA has potential limitations. Training and inference require substantial computational resources, and optimization techniques such as pruning and quantization are needed to reduce the computational cost. The model relies heavily on high-quality, large-scale labeled datasets, which may limit its performance in regions where data are scarce or difficult to obtain. Transfer learning or data augmentation techniques are needed to reduce the need for large-scale labeled data and improve the generality of the model.

In addition, the proposed model can feasibly be applied to other regions. However, geological conditions differ between regions, which imposes certain restrictions on its application. Cross-regional cooperation and data sharing can help overcome this constraint; they also help reveal the common laws and unique characteristics of landslides under different geological backgrounds and further broaden the applicability of the model.

2. Study Area and Dataset

In this paper, two typical landslide-prone areas, Bijie District, Guizhou Province, China, and Luding District, Sichuan Province, China, are selected as the study areas, as shown in Figure 1. Bijie and Luding are both located in plateau and mountainous areas, and their landslides have similar characteristics: they are mainly triggered by rainfall in densely vegetated areas, weak geological structures, and earthquakes. In addition, the landslide images from the two areas have similar data channels, namely RGB bands and DEM data.

2.1. The Bijie Dataset

Bijie, located in the northwest of China’s Guizhou Province, is one of the areas with the worst landslide disasters in China. The dataset comes from [36], where researchers initially located landslides in satellite images based on historical inventory data of landslides collected from field surveys. Secondly, geologists marked each landslide boundary, corrected the local bias of a few landslides in the inventory data, deleted some landslides in the inventory data that could not be recognized from the image, and marked some areas with obvious geological landslide morphology characteristics as new potential landslides. Finally, the potential landslide samples and the samples with large location deviations in the inventory data were further confirmed by careful field investigation.

The images were acquired by the TripleSat satellite from May to August 2018 and have a resolution of 0.8 m. The dataset contains 770 landslide instances, including rockfalls, rockslides, and some debris flows, as well as 2003 negative (non-landslide) samples. It consists of remote sensing landslide images, landslide boundary contour files (i.e., masks), and digital elevation models (DEMs). The red dots in Figure 1 indicate the actual landslide sites.

Geo-tagging was based on field investigations: the actual location of each landslide event was first recorded accurately, and related geological and geomorphic information was then collected. These data were used to create landslide boundary files, which were registered with the satellite images. Strict quality control measures were applied throughout the labeling process, and all labeled data were reviewed and corrected several times to eliminate possible human errors or inconsistencies.

Image sizes range from 128 × 128 to 1024 × 1024 pixels. Each landslide image contains four channels: the red, green, and blue spectral bands plus the digital elevation model (DEM), whose spatial resolution is 0.8 m. To reduce the effect of the imbalance between the numbers of landslide and non-landslide images on model performance, data augmentation techniques such as cropping, rotation, and flipping are used. Figure 2 shows sample images. In this experiment, the training, validation, and test sets were split in the ratio 8:1:1.
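The 8:1:1 split can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the shuffling, the fixed `seed`, and the function name `split_dataset` are assumptions, and the text does not state whether the split is made before or after augmentation.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(samples: Sequence[T], ratios=(8, 1, 1), seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle samples and split them into train/validation/test lists by integer ratios."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed so the split is reproducible
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # any rounding remainder goes to the test set
    return train, val, test


# Hypothetical usage with 770 sample IDs (the number of Bijie landslide instances).
train, val, test = split_dataset(range(770))
print(len(train), len(val), len(test))  # 616 77 77
```

Splitting before augmentation is a common precaution: it keeps rotated or flipped copies of the same scene from appearing in both the training and test sets.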

In our study, we used the 2018 image data (http://study.rsgis.whu.edu.cn/pages/download/ accessed on 26 August 2024), which have high resolution and wide coverage and whose four channels (RGB and DEM) meet the input requirements of the model. This dataset underwent rigorous quality control and preprocessing, making it suitable for training deep learning models. Different data sources may differ in characteristics such as resolution, spectral range, and acquisition time, which may introduce some bias into the model output; for data with similar feature channels, this bias is generally small.

2.2. The Luding Dataset

To study the versatility and portability of the designed model, the feature knowledge learned on the Bijie benchmark dataset was transferred through transfer learning and then tested in the specific environment of the Luding district. The remote sensing images of the Luding study area consist of multiple images covering a large area, including elements such as landslides, buildings, roads, fields, rivers, vegetation, and forests. The terrain data are mainly derived from airborne light detection and ranging data (https://dx.doi.org/10.21227/ep6n-fm58), so we refer to these data as LIDAR. The Luding County landslide dataset is based on Google Earth imagery from 2020 to 2022, with a ground resolution of 0.4 to 0.6 m. The Luding County study area, shown in Figure 1, contains 230 landslide samples. These images have been registered and geometrically, atmospherically, and radiometrically corrected, and data augmentation was used to increase the number of landslide samples.

3. Method

3.1. Data Preprocessing

First, the high-resolution landslide images and DEM data of Bijie City were collected. To make their data dimensions consistent, the DEM was concatenated to the RGB image as an additional channel, forming a four-channel input that fuses the two data sources. We then applied data augmentation operations such as rotation, flipping, and cropping, with rotation angles of 90°, 180°, and 270°. Finally, all images were linearly normalized and cropped to 224 × 224.

For high-resolution images, reducing them to 224 × 224 significantly reduces the computation and memory required by each convolution operation and improves computational efficiency. A 224 × 224 input retains important texture and shape information about the landslide area while being small enough to avoid interference from unnecessary background information. This size is widely used in deep learning models such as AlexNet [37], VGG [38], and ResNet, and has proven sufficient for feature extraction in complex scenes.
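The preprocessing steps above can be sketched with NumPy as follows. The sketch rests on several assumptions that the text does not specify: linear normalization is taken to be per-channel min-max scaling, the crop is a center crop, flipping is a single horizontal flip, and inputs smaller than 224 × 224 (some Bijie images are 128 × 128) are rejected rather than padded or resized. All function names are hypothetical.

```python
import numpy as np


def fuse_rgb_dem(rgb: np.ndarray, dem: np.ndarray) -> np.ndarray:
    """Concatenate an (H, W, 3) RGB image and an (H, W) DEM into an (H, W, 4) array."""
    return np.concatenate([rgb, dem[..., None]], axis=-1).astype(np.float32)


def minmax_normalize(img: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Linearly rescale each channel to [0, 1] (per-channel min-max is an assumption)."""
    lo = img.min(axis=(0, 1), keepdims=True)
    hi = img.max(axis=(0, 1), keepdims=True)
    return (img - lo) / (hi - lo + eps)


def augment(img: np.ndarray, mask: np.ndarray):
    """Yield the original, its 90/180/270-degree rotations, and a horizontal flip."""
    for k in range(4):  # k = 0 is the unrotated image
        yield np.rot90(img, k, axes=(0, 1)), np.rot90(mask, k)
    yield img[:, ::-1], mask[:, ::-1]


def center_crop(arr: np.ndarray, size: int = 224) -> np.ndarray:
    """Crop the central size x size window; smaller inputs would need padding or resizing."""
    h, w = arr.shape[:2]
    if h < size or w < size:
        raise ValueError(f"input {h}x{w} is smaller than the {size}x{size} crop")
    top, left = (h - size) // 2, (w - size) // 2
    return arr[top:top + size, left:left + size]


def preprocess(rgb, dem, mask, size=224):
    """Fuse, normalize, augment, and crop one image/mask pair."""
    fused = minmax_normalize(fuse_rgb_dem(rgb, dem))
    return [(center_crop(x, size), center_crop(m, size)) for x, m in augment(fused, mask)]


# Hypothetical usage with random data standing in for a 300 x 300 tile.
rng = np.random.default_rng(0)
pairs = preprocess(rng.random((300, 300, 3)) * 255, rng.random((300, 300)) * 1000,
                   np.zeros((300, 300), dtype=np.uint8))
print(len(pairs), pairs[0][0].shape)  # 5 (224, 224, 4)
```

Normalizing each channel separately keeps the DEM, whose values span hundreds of meters, from dominating the 0 to 255 RGB values.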

3.2. DeepLabv3+ Model

DeepLabv3+, developed by a team at Google [39], is a further improvement of the well-performing semantic segmentation model DeepLabv3. It employs an encoder–decoder architecture in which the encoder uses ASPP for feature extraction. In the decoder, low-level features are fused with the deep features produced by ASPP to sharpen object boundaries and thereby improve segmentation accuracy.

The structure of DeepLabv3+ is shown in Figure 3. The encoder applies convolution kernels of different sizes (e.g., 1 × 1 and 3 × 3), dilated convolutions with different dilation rates (e.g., 6, 12, and 18), and pooling to the image features extracted by the DCNN. The resulting feature maps are passed to the decoder and up-sampled, and a 1 × 1 convolution adjusts the number of channels of the low-level features. The encoder features are then concatenated with the low-level features in the decoder, followed by a 3 × 3 convolution and up-sampling to obtain the final prediction map.
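Why different dilation rates capture different scales can be made concrete with a small calculation. A k × k kernel with dilation rate r spans k + (k − 1)(r − 1) positions along each axis of the feature map while still using only k × k weights. The sketch below applies this standard formula to the ASPP branches named above; the function name is illustrative.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span of a dilated kernel along one axis: k + (k - 1) * (r - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# ASPP convolution branches as described in the text: one 1x1 conv and three dilated 3x3 convs.
aspp_branches = {"1x1": (1, 1), "3x3 rate 6": (3, 6), "3x3 rate 12": (3, 12), "3x3 rate 18": (3, 18)}
for name, (k, r) in aspp_branches.items():
    span = effective_kernel_size(k, r)
    print(f"{name}: spans {span} x {span} feature-map positions")
# 1x1: 1, rate 6: 13, rate 12: 25, rate 18: 37
```

The spans are measured on the encoder feature map, so each position there already summarizes a larger region of the input image.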

3.3. ResNet

Residual networks (ResNets) were proposed by He Kaiming et al. [40] in 2015 to address vanishing and exploding gradients in deep neural network training. The key idea behind ResNet is the “residual block”, which contains skip connections that let information and gradients bypass one or more layers directly, facilitating their flow throughout the network. The structure of the residual module is illustrated in Figure 4.

In a residual block, the relationship between the input x and the output H(x) can be expressed as follows:

H(x) = F(x) + x,

(1)

where F(x) denotes the residual mapping and H(x) is the output of the whole residual block. The block therefore learns the change F(x) = H(x) − x from the input x to the desired output H(x). If F(x) is close to the zero vector, then H(x) ≈ x, meaning that the block's output is close to its input, so good performance is maintained even as the network is deepened. Because the input x is passed directly to later layers via skip connections, the problem of vanishing or exploding gradients is alleviated.
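The effect of Equation (1) can be illustrated with a toy NumPy experiment. As simplifying assumptions, F is a two-layer fully connected mapping rather than the convolution and batch-normalization layers of a real ResNet, and the ReLU that ResNet applies after the addition is omitted so that the code matches Equation (1) exactly. The sketch stacks 50 blocks with small random weights, with and without the skip connection.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """H(x) = F(x) + x, where F is a toy two-layer mapping w2 @ relu(w1 @ x)."""
    return w2 @ relu(w1 @ x) + x  # the skip connection adds the input back


def plain_block(x, w1, w2):
    """The same two layers without the skip connection."""
    return w2 @ relu(w1 @ x)


rng = np.random.default_rng(0)
d = 16
x = rng.standard_normal(d)
layers = [(0.01 * rng.standard_normal((d, d)), 0.01 * rng.standard_normal((d, d))) for _ in range(50)]

h_res, h_plain = x, x
for w1, w2 in layers:
    h_res = residual_block(h_res, w1, w2)
    h_plain = plain_block(h_plain, w1, w2)

print(np.linalg.norm(h_res - x) / np.linalg.norm(x))  # small: the output stays close to x
print(np.linalg.norm(h_plain))                        # near zero: the signal has vanished
```

With near-zero residual weights, each block stays close to the identity (H(x) ≈ x), so the signal survives 50 layers, whereas the plain stack shrinks it by roughly three orders of magnitude per block.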

3.4. ECA

This paper employs the ECA attention module [41], a channel attention mechanism often used in vision models. Its structure is illustrated in Figure 5. Given a feature map X \in \mathbb{R}^{C \times H \times W}, where C is the number of channels and H and W are the height and width of the feature map, respectively, the ECA module performs global average pooling over the spatial dimensions to obtain a one-dimensional vector Z \in \mathbb{R}^{C}:

Z_{c} = \frac{1}{H \cdot W} \sum_{i = 1}^{H} \sum_{j = 1}^{W} X_{c i j},

(2)

The vector Z is then convolved with a one-dimensional kernel of size k, which is typically an adjustable hyperparameter:

f(Z) = \mathrm{Conv}_{k}(Z),

(3)

Applying an activation function, such as the sigmoid function σ, after the convolution ensures that the generated channel weights lie within the range [0, 1]:

\sigma\left(f(Z)\right) = \sigma\left(\mathrm{Conv}_{k}(Z)\right),

(4)

The generated weight vector is multiplied channel-wise with the original feature map for feature selection and enhancement, where \hat{X}_{c} denotes channel c of the output feature map:

\hat{X}_{c} = \sigma\left(\mathrm{Conv}_{k}(Z)\right)_{c} \cdot X_{c}.

(5)
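Equations (2) to (5) can be traced step by step in the NumPy sketch below. Two points are assumptions rather than details from this paper. First, the kernel weights are random here, whereas in a trained network they are learned. Second, the helper `eca_kernel_size` implements the adaptive rule from the original ECA-Net paper (|log2(C)/γ + b/γ| rounded to an odd integer, with γ = 2 and b = 1); this paper treats k only as a hyperparameter. The 1-D convolution uses zero padding so that it returns C weights.

```python
import math

import numpy as np


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive kernel size from the ECA-Net paper: |log2(C)/gamma + b/gamma|, forced to be odd."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Apply ECA to a (C, H, W) feature map with a 1-D kernel of odd length k."""
    c, k = x.shape[0], len(weights)
    z = x.mean(axis=(1, 2))                                      # Eq. (2): global average pooling, shape (C,)
    z_pad = np.pad(z, k // 2)                                    # zero-pad so the output keeps C entries
    f = np.array([z_pad[i:i + k] @ weights for i in range(c)])   # Eq. (3): 1-D conv across channels
    s = 1.0 / (1.0 + np.exp(-f))                                 # Eq. (4): sigmoid channel weights
    return s[:, None, None] * x                                  # Eq. (5): rescale each channel X_c


rng = np.random.default_rng(0)
x = rng.random((64, 8, 8))
k = eca_kernel_size(64)       # 3 for 64 channels
w = rng.standard_normal(k)    # stand-in for learned weights
y = eca(x, w)
print(k, y.shape)             # 3 (64, 8, 8)
```

Because each channel weight depends only on its k neighboring channels, ECA models local cross-channel interaction with just k parameters, which is why it is described as lightweight.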

3.5. DeepLabv3+-ResNet101-ECA

The improved DeepLabv3+ network structure is illustrated in Figure 6. It is divided into two main parts: the encoder and the decoder. In the encoder, the preprocessed landslide image is input into the backbone feature network, ResNet101. The feature map obtained after multi-level feature extraction by the backbone is further processed by the ASPP module, which extracts and fuses information at five different scales. In the decoder, the low-level feature map from the backbone network is fused with the feature map output by the ASPP module, and the ECA module emphasizes the important features in the fused feature map. Finally, a predicted segmentation mask of the same size as the original image is generated through convolution and up-sampling. In the experiments, the number of training epochs is set to 100, cross-entropy loss is used as the loss function, and the Adam optimizer is used with a learning rate of 0.0001.
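The pixel-wise cross-entropy loss used for training can be written out in NumPy as below. This is an illustrative sketch, assuming two classes (0 = background, 1 = landslide) and a mean over all pixels. In PyTorch, `torch.nn.CrossEntropyLoss` computes the same mean for (N, C, H, W) logits and (N, H, W) integer targets, and the stated optimizer setting corresponds to `torch.optim.Adam(model.parameters(), lr=1e-4)`.

```python
import numpy as np


def pixel_cross_entropy(logits: np.ndarray, target: np.ndarray) -> float:
    """Mean per-pixel cross-entropy.

    logits: (num_classes, H, W) raw scores; target: (H, W) integer labels.
    """
    shifted = logits - logits.max(axis=0, keepdims=True)                      # subtract max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))  # log-softmax over classes
    rows, cols = np.indices(target.shape)
    return float(-log_probs[target, rows, cols].mean())                       # true-class log-probability per pixel


target = np.array([[0, 1], [1, 0]])
uniform = np.zeros((2, 2, 2))                                     # equal scores for both classes
print(pixel_cross_entropy(uniform, target))                       # log(2) ≈ 0.6931
confident = np.stack([1 - target, target]).astype(float) * 10.0   # large score on the true class
print(pixel_cross_entropy(confident, target))                     # close to 0
```

A model that assigns equal scores to both classes pays log 2 per pixel, and the loss approaches zero only as the score for the true class dominates.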

3.6. Model Evaluation Methods

In deep learning and machine learning, commonly used evaluation metrics include Accuracy (Acc), Precision (P), Recall (R), F1, and MIoU [42]. High precision ensures that an alert issued by the system can be trusted with a high degree of confidence, reducing false positives. High recall ensures that as many potential landslide events as possible are identified, avoiding missed detections with catastrophic consequences. The F1 score balances precision and recall, which helps to optimize both the reliability and the response efficiency of early warning systems. As an important indicator for image segmentation tasks, the MIoU effectively evaluates how accurately the model identifies landslide boundaries, which is crucial for making evacuation plans. All of these metrics are calculated from four fundamental values: TP, TN, FP, and FN.

A true positive (TP) is an actual positive instance correctly predicted as positive; a false positive (FP) is an actual negative instance incorrectly predicted as positive; a false negative (FN) is an actual positive instance incorrectly predicted as negative; and a true negative (TN) is an actual negative instance correctly predicted as negative. All of these metrics are calculated at the pixel level.
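Counting these four values at the pixel level amounts to comparing two binary masks element by element. The NumPy sketch below assumes 1 marks landslide pixels and 0 marks background, matching the white landslide areas of the mask images; the function name is illustrative.

```python
import numpy as np


def confusion_counts(pred, truth):
    """Pixel-level (TP, FP, FN, TN) for binary masks (1 = landslide, 0 = background)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))    # landslide predicted as landslide
    fp = int(np.sum(pred & ~truth))   # background predicted as landslide
    fn = int(np.sum(~pred & truth))   # landslide predicted as background
    tn = int(np.sum(~pred & ~truth))  # background predicted as background
    return tp, fp, fn, tn


truth = np.array([[1, 1, 0], [0, 0, 0]])
pred = np.array([[1, 0, 1], [0, 0, 0]])
print(confusion_counts(pred, truth))  # (1, 1, 1, 3)
```

For a whole test set, the counts from each image are usually summed before the metrics are computed.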

Accuracy represents the proportion of correctly predicted samples, that is, the ratio of TP + TN to the total number of samples, as shown in Equation (6):

Accuracy = \frac{TP + TN}{TP + FP + FN + TN},

(6)

Precision represents the ratio of TP to the total number of predicted positive samples (TP + FP). This is shown in Equation (7):

Precision = \frac{TP}{TP + FP},

(7)

Recall, also known as the true positive rate, represents the proportion of actual positive samples that are correctly identified as positive. It is the ratio of TP to the total number of actual positive samples (TP + FN). This is shown in Equation (8):

Recall = \frac{TP}{TP + FN},

(8)

The F1 score, also known as the F1 value, is the harmonic mean of the precision and recall. It provides a balanced measure of both metrics. This is shown in Equation (9):

F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} = \frac{2 TP}{2 TP + FP + FN},

(9)

The MIoU is the mean, over the k + 1 classes in the dataset, of the ratio of the intersection to the union for each class. Here, p_{ij} denotes the number of pixels of class i predicted as class j (for j ≠ i, these are the FN of class i), p_{ii} denotes the number of pixels of class i correctly predicted as class i (TP), and p_{ji} denotes the number of pixels of class j incorrectly predicted as class i (for j ≠ i, these are the FP of class i). This is shown in Equation (10):

MIoU = \frac{1}{k + 1} \sum_{i = 0}^{k} \frac{p_{ii}}{\sum_{j = 0}^{k} p_{ij} + \sum_{j = 0}^{k} p_{ji} - p_{ii}} = \frac{1}{k + 1} \sum_{i = 0}^{k} \frac{TP_{i}}{FN_{i} + FP_{i} + TP_{i}} .

(10)
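Equations (6) to (10) can be computed directly from the four pixel counts. The sketch assumes a binary task, so Equation (10) averages over k + 1 = 2 classes (landslide and background); the text does not state this explicitly. For the background class the roles swap: its TP is the landslide TN, its FP is the landslide FN, and its FN is the landslide FP. The helper names are illustrative, and undefined ratios (zero denominators) are reported as 0.

```python
def _safe_div(num: float, den: float) -> float:
    return num / den if den else 0.0


def segmentation_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Pixel-level metrics from Eqs. (6)-(10) for a binary landslide/background task."""
    iou_landslide = _safe_div(tp, tp + fp + fn)
    iou_background = _safe_div(tn, tn + fn + fp)  # background: TP=tn, FP=fn, FN=fp
    return {
        "accuracy": _safe_div(tp + tn, tp + fp + fn + tn),  # Eq. (6)
        "precision": _safe_div(tp, tp + fp),                # Eq. (7)
        "recall": _safe_div(tp, tp + fn),                   # Eq. (8)
        "f1": _safe_div(2 * tp, 2 * tp + fp + fn),          # Eq. (9)
        "miou": (iou_landslide + iou_background) / 2,       # Eq. (10) with k + 1 = 2 classes
    }


m = segmentation_metrics(tp=1, fp=1, fn=1, tn=3)
print({name: round(value, 4) for name, value in m.items()})
# {'accuracy': 0.6667, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5, 'miou': 0.4667}
```

Because landslide pixels are usually a minority, accuracy and background IoU can look high even when the landslide IoU is poor, which is why precision, recall, and F1 are reported alongside them.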

4. Experiments and Results

4.1. Experimental Setup

The experiments were conducted on a computer workstation equipped with a 13th Gen Intel® Core™ i5-13600KF processor running at 3.50 GHz and an Nvidia GeForce RTX 3080 Ti with 8 GB of GPU memory. The development environment was built using PyTorch 2.4.0, CUDA 11.6, and cuDNN 8.3.2.

4.2. Backbone Network Selection

The segmentation performance of a semantic segmentation model is closely related to the feature extraction ability of its backbone network. With the rapid development of deep learning, more and more networks have been designed and applied to different tasks. In this experiment, six networks, namely MobileNet, VGG16, Xception, ResNet18, ResNet50, and ResNet101, are selected for analysis and comparison. Each is used in turn as the backbone network, trained on the Bijie images, and evaluated on the validation set. Figure 7 shows the validation accuracy of the models with different backbone networks.

As shown in Figure 7, every model except DeepLabv3+-MobileNet achieves a validation accuracy above 96%. DeepLabv3+-ResNet18, DeepLabv3+-ResNet50, and DeepLabv3+-ResNet101 achieve high accuracy, and DeepLabv3+-ResNet101 performs best, with the highest accuracy of all the models.

Figure 8 shows the predictions of the different backbone networks for landslide images. The figure contains five landslide image samples from different locations, labeled landslides I, II, III, IV, and V. Landslides I, III, IV, and V have obvious landslide characteristics and forms, whereas the characteristic form of landslide II is less obvious. The landslide label in the experiment is the mask image, which is compared with the predicted map; white areas mark where the landslide occurred. For landslides I, III, IV, and V, whose morphology and characteristics are obvious, all six backbone networks achieve some degree of recognition. On landslide II, however, only DeepLabv3+-ResNet101 performs well. This is because the characteristics of landslide II are not obvious and the image contains interfering information such as roads; the other models tend to misclassify such non-landslide features as landslides, producing predictions that differ greatly from the true landslide mask. With its deep architecture and residual connections, ResNet101 enables the model to learn complex patterns and capture multi-scale information, so landslide II is identified more accurately. These experimental results indicate that the DeepLabv3+-ResNet101 model performs best and handles the landslide detection task more accurately, so the subsequent experiments build on this model.

4.2.1. Analysis of Computational Cost and Accuracy

When describing deep learning models, the number of floating-point operations (FLOPs) and the number of parameters are often reported alongside accuracy to characterize the computational cost of the model. For a convolutional layer, the FLOPs are calculated as follows:

\mathrm{FLOPs} = 2HW\left(C_{in}K^{2} + 1\right)C_{out},

(11)

where H, W, and C_{in} are the height, width, and number of channels of the input feature map, K is the kernel width, and C_{out} is the number of output channels. We discuss the computational cost of the experiments through six metrics, as shown in Table 2, and record the evaluation metrics for each model in Table 3.
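Equation (11) can be evaluated for a single layer as in the sketch below. Under the usual reading of this formula, each multiply-add counts as two operations and the "+1" accounts for the bias term; H and W are the spatial size at which the kernel is applied, which equals the input size for a stride-1 convolution with same padding. The layer dimensions are hypothetical. A network total is the sum over all layers; tools that count multiply-accumulate operations (MACs) instead report roughly half of this figure.

```python
def conv_flops(h: int, w: int, c_in: int, k: int, c_out: int) -> int:
    """Eq. (11): FLOPs = 2 * H * W * (C_in * K^2 + 1) * C_out."""
    return 2 * h * w * (c_in * k * k + 1) * c_out


def conv_params(c_in: int, k: int, c_out: int, bias: bool = True) -> int:
    """Weights (C_in * K^2 per output channel) plus one optional bias per output channel."""
    return (c_in * k * k + (1 if bias else 0)) * c_out


# Hypothetical layer: 3x3 convolution, 64 -> 64 channels, on a 56 x 56 feature map.
flops = conv_flops(56, 56, 64, 3, 64)
print(f"{flops:,} FLOPs = {flops / 1e9:.3f} GFLOPs")  # 231,612,416 FLOPs = 0.232 GFLOPs
print(conv_params(64, 3, 64))                          # 36928
```

Because FLOPs scale with H × W while the parameter count does not, deeper backbones that keep larger feature maps, such as ResNet101 inside DeepLabv3+, can be expensive to run even when their parameter count looks moderate.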

The data in Tables 2 and 3 show that DeepLabv3+-ResNet101 has significant advantages in the landslide detection task when both computational cost and accuracy are considered. Although its training time, GPU memory consumption, and number of parameters are higher than those of some lightweight models, it performs well on the evaluation metrics: the MIoU reaches 0.752, the F1 score is 0.9220, the recall is 0.9160, and the precision is 0.9283. In contrast, although MobileNet and ResNet18 are more computationally efficient, their MIoU is lower by 18.9% and 2.8%, respectively, which would significantly reduce the localization accuracy of landslide boundaries.

Although ResNet101 has the highest FLOPs (16.008 G), its inference time of 0.0190 s (about 19 ms) still meets the real-time requirements of practical deployment. For high-precision tasks such as geological disaster detection, where false or missed detections may have serious consequences, trading additional training resources for an improvement of more than 10% in key indicators has significant engineering value. Selecting ResNet101 as the backbone network therefore provides a more reliable balance between performance and efficiency.

4.2.2. Statistical Significance Analysis

The p-value is commonly used in statistical hypothesis testing to assess how compatible the data are with a hypothesis; in deep learning, it can be used to assess the statistical significance of differences between models. We ran 10 repeated experiments and recorded the average accuracy of each run to four decimal places. Table 4 records the average accuracy over the 10 experiments.

Based on the data in Table 4, we performed a t-test between each model and DeepLabv3+-ResNet101 to obtain the t-values and p-values shown in Table 5. All the p-values are far below the commonly used significance threshold of 0.05, indicating that the differences between DeepLabv3+-ResNet101 and the other models are statistically significant.
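The t-value for one pair of models can be sketched as follows. The text does not say which t-test was used; this sketch assumes an independent two-sample Student's t-test with pooled variance, and the accuracy lists are made-up values for illustration, not the values in Table 4. `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` gives the same statistic, also returns the two-sided p-value.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Independent two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom).
    """
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)  # sample variances
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Made-up accuracies from five hypothetical runs; NOT the values in Table 4.
model_a = [0.9612, 0.9608, 0.9615, 0.9610, 0.9611]
model_b = [0.9655, 0.9660, 0.9652, 0.9658, 0.9657]
t, df = student_t(model_a, model_b)
print(f"t = {t:.2f}, df = {df}")
```

A large |t| relative to the run-to-run spread is what drives the p-value below 0.05; with very small variances across runs, even modest accuracy gaps can be significant.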

4.3. Different Attention Mechanisms

Four attention mechanisms, SE, BAM, CBAM and ECA, were each added after the fused features of the decoder. The validation-set results of the models with the different attention mechanisms are shown in Figure 9 and Figure 10.

As the figures show, DeepLabv3+-ResNet101-ECA has lower loss and higher accuracy on the validation set; the evaluation metrics of these models are listed in Table 6. DeepLabv3+-ResNet101-ECA achieves the highest precision, recall, F1 score and MIoU, namely 0.9351, 0.9259, 0.9240 and 0.7603, which are 1.09, 1.64, 0.68 and 1.51 percentage points higher, respectively, than the lowest value of each metric in the table.
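The gains quoted above are absolute differences (percentage points) between DeepLabv3+-ResNet101-ECA and the lowest value of each metric in Table 6, not relative changes. A small Python sketch of that arithmetic follows, with the Table 6 values transcribed by hand; the helper name is hypothetical.

```python
# Table 6 rows: (precision, recall, F1 score, MIoU) for each attention variant.
TABLE6 = {
    "BAM": (0.9326, 0.9123, 0.9188, 0.7511),
    "CBAM": (0.9242, 0.9095, 0.9215, 0.7522),
    "SE": (0.9242, 0.9105, 0.9172, 0.7452),
    "ECA": (0.9351, 0.9259, 0.9240, 0.7603),
}
METRICS = ("precision", "recall", "f1", "miou")


def gains_over_lowest(table, best="ECA"):
    """Percentage-point gap between `best` and the lowest value of each metric."""
    gains = {}
    for i, metric in enumerate(METRICS):
        lowest = min(row[i] for row in table.values())
        gains[metric] = round(100.0 * (table[best][i] - lowest), 2)
    return gains


print(gains_over_lowest(TABLE6))
```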

Six landslide samples from the test set, labeled landslides I, II, III, IV, V and VI, were selected for visual comparison, as shown in Figure 11. For landslide I, the masks predicted by DeepLabv3+-ResNet101-BAM and DeepLabv3+-ResNet101-SE have blurred edges that differ considerably from the true mask, and the mask predicted by DeepLabv3+-ResNet101-CBAM is smaller than the true mask. For landslides II, III and VI, all four models recognize the landslide reasonably well. For landslides IV and V, the prediction of DeepLabv3+-ResNet101-ECA is more consistent with the true mask than those of the other models. Overall, DeepLabv3+-ResNet101-ECA performs best: its predicted masks agree most closely with the true masks.

In Figure 11, the blue boxes mark false-positive regions, where non-landslide areas are incorrectly identified as landslides. DeepLabv3+-ResNet101-ECA markedly reduces this type of error, which matters for optimizing resource allocation and for the credibility of a warning system. The green boxes mark regions where details are captured more clearly, reflecting the proposed model's improved ability to capture subtle landslide features; this raises the overall recognition accuracy and supports a more complete understanding of the landslide event. The red boxes mark boundary-preserving regions, where DeepLabv3+-ResNet101-ECA delineates the landslide boundary more accurately, indicating that the new model captures finer detail and is more accurate on complex terrain.

4.4. The Location of the ECA

We explored where to place the ECA attention mechanism by adding it after the residual blocks, after the ASPP module, and after the decoder feature fusion, respectively. The experimental results are shown in Table 7.

The feature fusion step of the decoder integrates information from different levels of the encoder. Applying the ECA attention mechanism at this stage adaptively emphasizes the channels that are critical to the task, which strengthens the effectiveness and robustness of the final feature representation and helps the model make full use of multi-level information. According to the results in Table 7, the model performs best on all four metrics when the ECA module is added after the decoder feature fusion, which helps mitigate the loss of image detail caused by repeated down-sampling in the encoder.
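To make the channel-reweighting step concrete, the NumPy sketch below implements the ECA forward pass shown in Figure 5, following the ECA-Net formulation of Wang et al.: global average pooling, a 1D convolution across neighbouring channels, a sigmoid, and channel-wise rescaling. It is a minimal sketch under stated assumptions: the adaptive kernel-size rule uses the ECA-Net defaults γ = 2 and b = 1 (the paper does not state the kernel size it used), the convolution weights are random stand-ins for learned ones, and the function names are hypothetical.

```python
import math

import numpy as np


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive kernel size k = |(log2(C) + b) / gamma|, rounded up to an odd integer."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca_forward(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Apply efficient channel attention to a feature map x of shape (C, H, W).

    `kernel` is the 1D convolution weight of odd length k (learned in a real model).
    """
    channels, k = x.shape[0], kernel.shape[0]
    # 1) Global average pooling (GAP): one descriptor per channel, shape (C,).
    y = x.mean(axis=(1, 2))
    # 2) 1D convolution across neighbouring channels; zero padding keeps length C,
    #    so each channel interacts only with its k nearest channels (no dimensionality reduction).
    y_pad = np.pad(y, k // 2)
    z = np.array([np.dot(kernel, y_pad[i:i + k]) for i in range(channels)])
    # 3) Sigmoid turns the responses into per-channel weights in (0, 1).
    w = 1.0 / (1.0 + np.exp(-z))
    # 4) Rescale each input channel by its weight.
    return x * w[:, None, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((256, 16, 16))  # hypothetical decoder-fused features
    k = eca_kernel_size(feats.shape[0])  # 5 for 256 channels
    out = eca_forward(feats, rng.standard_normal(k) * 0.1)
    print(k, out.shape)
```

Because the attention uses only k weights per layer, adding it after the decoder fusion costs very few extra parameters compared with SE-style blocks that use fully connected layers.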

4.5. Transfer Learning Experiment

Bijie and Luding lie in different geographical areas with quite different terrain. The Bijie area is dominated by karst landforms, with complex and diverse terrain, whereas Luding lies on the edge of the Qinghai–Tibet Plateau, with higher and mostly mountainous terrain. These topographic differences produce landslides with different mechanisms, scales and forms. When terrain features differ this much, a model trained only on the source region is likely to overfit the data features of the source domain and transfer poorly to the new region. To address this, we normalize the data from the Bijie and Luding regions to reduce the inconsistency in numerical ranges caused by terrain differences, and we use data augmentation (such as rotation, flipping and cropping) to enrich the target dataset and diversify its terrain features. Dropout layers and regularization are also introduced into the network to prevent overfitting.
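The preprocessing described above can be sketched in NumPy. The sketch assumes that each sample is an RGB tile stacked with its DEM as a fourth channel, that normalization is a per-channel z-score (the paper does not state the exact scheme; dataset-wide statistics per region would be a natural alternative), and that rotation, flipping and cropping are applied identically to the image and its label mask. The function names are hypothetical.

```python
import numpy as np


def normalize_per_channel(img: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Z-score each channel of an (H, W, C) array so RGB and DEM values share a comparable range."""
    mean = img.mean(axis=(0, 1), keepdims=True)
    std = img.std(axis=(0, 1), keepdims=True)
    return (img - mean) / (std + eps)


def augment(img, mask, rng, crop=None):
    """Apply the same random rotation, flip and crop to an (H, W, C) image and its (H, W) mask."""
    k = int(rng.integers(4))  # rotate by k * 90 degrees in the H-W plane
    img, mask = np.rot90(img, k, axes=(0, 1)), np.rot90(mask, k)
    if rng.random() < 0.5:  # horizontal flip
        img, mask = img[:, ::-1], mask[:, ::-1]
    if crop is not None:  # random square crop of side `crop`
        h, w = mask.shape
        top = int(rng.integers(h - crop + 1))
        left = int(rng.integers(w - crop + 1))
        img = img[top:top + crop, left:left + crop]
        mask = mask[top:top + crop, left:left + crop]
    return img, mask


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    rgb = rng.random((224, 224, 3))  # hypothetical RGB tile in [0, 1)
    dem = rng.random((224, 224, 1)) * 2000.0  # hypothetical elevations
    x = normalize_per_channel(np.concatenate([rgb, dem], axis=-1))  # shape (224, 224, 4)
    mask = (rng.random((224, 224)) > 0.5).astype(np.uint8)
    x_aug, mask_aug = augment(x, mask, rng, crop=192)
    print(x_aug.shape, mask_aug.shape)
```

Sharing one random draw between image and mask is the essential design point: augmenting them independently would misalign the labels.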

In this paper, a parameter-based transfer learning method is used to train the landslide detection model. The model is first pre-trained on the Bijie dataset, and the resulting weights are then used to initialize a second round of training on the Luding landslide dataset. Although the landslides in the two datasets differ, they share low-level features such as object edges and textures. Reusing this shared knowledge spares the network from learning from scratch, which can substantially speed up training and improve the resulting model.
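Parameter-based transfer amounts to initializing the target-domain model with the source-domain weights before fine-tuning. The framework-agnostic Python sketch below shows that initialization step on plain dictionaries of NumPy arrays. The function name and the parameter names in the example are hypothetical, and the shape check is an added assumption that covers layers whose size might differ between datasets (in this paper both tasks are binary, so all weights would normally match).

```python
import numpy as np


def transfer_matching_weights(source: dict, target: dict) -> tuple:
    """Initialize `target` from `source` wherever a parameter name and shape match.

    Parameters missing from the source, or whose shapes differ, keep their fresh
    initialization; their names are returned so they can be reported.
    """
    merged, skipped = {}, []
    for name, fresh in target.items():
        pre = source.get(name)
        if pre is not None and pre.shape == fresh.shape:
            merged[name] = pre.copy()  # reuse source-domain (Bijie) weights
        else:
            merged[name] = fresh  # learned from scratch on the target domain
            skipped.append(name)
    return merged, skipped


if __name__ == "__main__":
    src = {"backbone.conv1.weight": np.ones((64, 4, 7, 7))}  # hypothetical pre-trained weights
    tgt = {"backbone.conv1.weight": np.zeros((64, 4, 7, 7)), "head.weight": np.zeros((2, 256))}
    merged, skipped = transfer_matching_weights(src, tgt)
    print(skipped)
```

In PyTorch, a dictionary filtered this way would typically be passed to `Module.load_state_dict` before fine-tuning on the Luding data.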

Table 8 reports the evaluation metrics before and after transfer learning. The first model in the table is trained and evaluated only on the Luding dataset, while the second is pre-trained on the Bijie dataset and then trained and evaluated on the Luding data. After transfer learning, precision, recall, F1 score and MIoU improve by 1.98, 1.73, 1.85 and 4.95 percentage points, respectively, so the transferred model performs better on all four metrics.

Figure 12 shows the masks predicted by DeepLabv3+-ResNet101-ECA and DeepLabv3+-ResNet101-ECA-Trans-Learning. For landslide samples I, II, III and IV, the edges predicted by DeepLabv3+-ResNet101-ECA without pre-training differ to some degree from the true mask edges, whereas the predictions of DeepLabv3+-ResNet101-ECA-Trans-Learning agree closely with the true masks. For landslide sample V, the outputs of both models agree well with the ground truth. In summary, these results indicate that transfer learning clearly improves landslide recognition and that the model generalizes well.

4.6. Discussion

When applying the DeepLabv3+-ResNet101-ECA model to other regions, the following limitations should be kept in mind. Because the model is a deep network, training and inference require substantial computational resources; when necessary, optimization techniques such as pruning and quantization can reduce the computational cost. Moreover, the model requires high-quality, large-scale landslide datasets; in areas where data are scarce or difficult to obtain, this requirement can be alleviated by transfer learning or data augmentation.
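The paper does not specify how pruning or quantization would be applied. As a generic illustration only, the NumPy sketch below shows unstructured magnitude pruning and symmetric per-tensor int8 quantization of a single weight tensor. The function names and the 50% sparsity level are hypothetical, and a production deployment would normally rely on the tooling of the deep learning framework in use.

```python
import numpy as np


def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    # k-th smallest absolute value; everything at or below it is set to zero.
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) > threshold, w, 0.0)


def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q, q in [-127, 127]."""
    scale = float(np.abs(w).max()) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale works
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 codes back to floating-point weights."""
    return q.astype(np.float64) * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((64, 64))  # hypothetical weight tensor
    pruned = magnitude_prune(w, 0.5)
    q, s = quantize_int8(pruned)
    print((pruned == 0).mean(), np.abs(dequantize(q, s) - pruned).max())
```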

Given the complexity of DeepLabv3+-ResNet101-ECA, the risk of overfitting is also high, especially when the amount of data is limited. To mitigate this risk, methods such as cross-validation should be used to evaluate model performance, and the model should be validated on a locally collected test set to confirm its accuracy and reliability before actual deployment.

Although DeepLabv3+-ResNet101-ECA performs well in the environments studied here, its performance may degrade in areas with limited computing resources or widely varying environmental conditions; in such settings, users may need to apply the pruning or quantization techniques mentioned above to reduce computational cost.

In this paper, landslide recognition with the DeepLabv3+-ResNet101-ECA model is not limited to cataloguing landslide phenomena; it is involved in the whole monitoring process, covering everything from data collection and processing to information transmission and effectively supporting users' decision-making. How to embed the model into an early warning system that presents event information visually to users remains to be studied.

5. Conclusions

To improve the accuracy and generalization of landslide recognition, this paper proposes the DeepLabv3+-ResNet101-ECA model, which uses ResNet101 as the backbone network and integrates the ECA attention mechanism.

In this paper, DEM data are used as an additional image feature and fused with the original RGB image as input to the neural network, so that landslide features can be extracted from multiple perspectives. The predicted output masks are compared with the ground-truth label masks for evaluation.

Compared with the DeepLabv3+ model, the proposed model improves precision, recall, F1 score and MIoU by 1.17, 2.00, 0.96 and 2.36 percentage points, respectively. Higher precision means that fewer non-landslide areas are misclassified as landslides; higher recall shows that the model identifies a larger share of the real landslide areas. The higher F1 score reflects a better balance between precision and recall, and thus better overall performance, and the higher MIoU indicates that the model delineates landslide boundaries more accurately. Beyond the metrics themselves, these gains give disaster management departments more accurate maps of landslide-affected areas, which helps them deploy rescue forces and resources quickly and reduce disaster losses.
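For reference, the four metrics discussed above can be computed from a predicted and a ground-truth binary mask as in the NumPy sketch below. It assumes pixel-level counts over a single mask pair and defines MIoU as the mean IoU of the landslide and background classes; the exact definitions and averaging used in the paper are given in Section 3.6 and may differ (for example, averaging per image). It also assumes both masks contain landslide and background pixels, so no denominator is zero. The function name is hypothetical.

```python
import numpy as np


def segmentation_metrics(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Precision, recall, F1 and two-class MIoU for binary masks (1 = landslide)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)  # landslide predicted as landslide
    fp = np.sum(pred & ~truth)  # background predicted as landslide
    fn = np.sum(~pred & truth)  # landslide missed
    tn = np.sum(~pred & ~truth)  # background predicted as background
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou_landslide = tp / (tp + fp + fn)
    iou_background = tn / (tn + fp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "miou": (iou_landslide + iou_background) / 2,
    }


if __name__ == "__main__":
    truth = np.array([[1, 1, 0, 0], [1, 1, 0, 0]])
    pred = np.array([[1, 0, 0, 0], [1, 1, 1, 0]])
    print(segmentation_metrics(pred, truth))
```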

In addition, transfer learning was used to apply the pre-trained model to the Luding region, verifying the generalization ability of DeepLabv3+-ResNet101-ECA. The results show that the proposed model improves the accuracy of landslide detection to a certain extent and can provide timely, accurate technical support for landslide identification and evaluation and for disaster prevention and mitigation decision-making.

The proposed model could be integrated into automated disaster management systems. Integrated into an existing disaster warning platform, it could provide a continuous, automatic landslide warning service that monitors and assesses potential landslide risk in real time, improving disaster response efficiency and reducing casualties and property losses. This matters not only for rapid response to landslide hazards but also as a reference for monitoring other types of geological hazards, which gives the approach broad application prospects.

Despite the gains in landslide detection accuracy, there is still room for improvement. Because some landslides have inconspicuous shapes and are prone to being missed, more accurate classification and segmentation are needed. In future work, we will fuse multi-source remote sensing data to obtain richer surface information, combine it with relevant terrain information, and incorporate additional landslide texture features for landslide recognition.

Author Contributions

Conceptualization, X.C. and S.W.; methodology, X.C. and S.W.; writing—original draft preparation, S.W., M.S. and D.W.; writing—review and editing, S.W., V.D. and L.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Science Research Project of Hebei Education Department under Grant ZC2023108, in part by the Langfang Science Technology Research Self-Financing Project under Grant 2024011018, and in part by the China Scholarship Council (CSC).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors would like to thank the editor and the reviewers for their helpful comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Xiong, Z.; Feng, G.; He, L.; Wang, H.; Wei, J. Reactivation/Acceleration of Landslides Caused by the 2018 Baige Landslide-Dammed Lake and its Breach Floods Revealed by InSAR and Optical Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 1656–1672.
Shi, B.; Zeng, T.; Tang, C.; Zhang, L.; Xie, Z.; Lv, G.; Wu, Q. Landslide Risk Assessment Using Granular Fuzzy Rule-Based Modeling: A Case Study on Earthquake-Triggered Landslides. IEEE Access 2021, 9, 135790–135802.
Ma, Z.; Mei, G.; Piccialli, F. Machine learning for landslides prevention: A survey. Neural Comput. Appl. 2021, 33, 10881–10907.
Mei, G.; Xu, N.; Qin, J.; Wang, B.; Qi, P. A Survey of Internet of Things (IoT) for Geohazard Prevention: Applications, Technologies, and Challenges. IEEE Internet Things J. 2020, 7, 4371–4386.
Fan, X.; Scaringi, G.; Korup, O.; West, A.J.; van Westen, C.J.; Tanyas, H.; Hovius, N.; Hales, T.C.; Jibson, R.W.; Allstadt, K.E.; et al. Earthquake-Induced Chains of Geologic Hazards: Patterns, Mechanisms, and Impacts. Rev. Geophys. 2019, 57, 421–503.
Brunsden, D. Landslide types, mechanisms, recognition, identification. In Proceedings of the Landslides in the South Wales Coalfield, Pontypridd, UK, 1–3 April 1985; Morgan, C.S., Ed.
Su, L.-J.; Hu, K.-H.; Zhang, W.-F.; Wang, J.; Lei, Y.; Zhang, C.-L.; Cui, P.; Pasuto, A.; Zheng, Q.-H. Erratum to: Characteristics and triggering mechanism of Xinmo landslide on 24 June 2017 in Sichuan, China. J. Mt. Sci. 2017, 14, 2134–2135.
Chigira, M.; Yagi, H. Geological and geomorphological characteristics of landslides triggered by the 2004 Mid Niigta prefecture earthquake in Japan. Eng. Geol. 2006, 82, 202–221.
Keefer, D.K. The importance of earthquake-induced landslides to long-term slope erosion and slope-failure hazards in seismically active regions. In Geomorphology and Natural Hazards; Elsevier: Amsterdam, The Netherlands, 1994; pp. 265–284.
Xing, A.; Yin, Y.; Jiang, Y.; Wang, G.; Yang, S.; Dai, D.; Zhu, Y.; Dai, J. Dynamic analysis and field investigation of a fluidized landslide in Guanling, Guizhou, China. Eng. Geol. 2014, 181, 1–14.
Galli, M.; Ardizzone, F.; Cardinali, M.; Guzzetti, F.; Reichenbach, P. Comparing landslide inventory maps. Geomorphology 2008, 94, 268–289.
Gutierrez, P.A.; Hervas-Martinez, C.; Martinez-Estudillo, F.J. Logistic Regression by Means of Evolutionary Radial Basis Function Neural Networks. IEEE Trans. Neural Netw. 2011, 22, 246–263.
Song, J.; Xiao, L.; Molaei, M.; Lian, Z. Sparse Coding Driven Deep Decision Tree Ensembles for Nucleus Segmentation in Digital Pathology Images. IEEE Trans. Image Process. 2021, 30, 8088–8101.
Sufi, F.K.; Alsulami, M. Knowledge Discovery of Global Landslides Using Automated Machine Learning Algorithms. IEEE Access 2021, 9, 131400–131419.
Ertekin, Ş.; Bottou, L.; Giles, C.L. Nonconvex Online Support Vector Machines. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 368–381.
Dong, L.; Du, H.; Mao, F.; Han, N.; Li, X.; Zhou, G.; Zhu, D.; Zheng, J.; Zhang, M.; Xing, L.; et al. Very High Resolution Remote Sensing Imagery Classification Using a Fusion of Random Forest and Deep Learning Technique—Subtropical Area for Example. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 113–128.
Bai, S.-B.; Wang, J.; Lü, G.-N.; Zhou, P.-G.; Hou, S.-S.; Xu, S.-N. GIS-based logistic regression for landslide susceptibility mapping of the Zhongxian segment in the Three Gorges area, China. Geomorphology 2010, 115, 23–31.
Nirbhav; Malik, A.; Maheshwar; Jan, T.; Prasad, M. Landslide susceptibility prediction based on decision tree and feature selection methods. J. Indian Soc. Remote Sens. 2023, 51, 771–786.
Sun, D.; Yu, Y.; Goldberg, M.D. Deriving water fraction and flood maps from MODIS images using a decision tree approach. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2011, 4, 814–825.
Lian, C.; Zeng, Z.; Yao, W.; Tang, H.; Chen, C.L.P. Landslide Displacement Prediction With Uncertainty Based on Neural Networks With Random Hidden Weights. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 2683–2695.
Tan, K.; Zhang, J.; Du, Q.; Wang, X. GPU parallel implementation of support vector machines for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 4647–4656.
Jebur, M.N.; Pradhan, B.; Tehrany, M.S. Manifestation of LiDAR-Derived Parameters in the Spatial Prediction of Landslides Using Novel Ensemble Evidential Belief Functions and Support Vector Machine Models in GIS. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 674–690.
Nguyen, B.-Q.; Kim, Y.-T. Landslide spatial probability prediction: A comparative assessment of naïve Bayes, ensemble learning, and deep learning approaches. Bull. Eng. Geol. Environ. 2021, 80, 4291–4321.
Ghorbanzadeh, O.; Blaschke, T.; Gholamnia, K.; Meena, S.R.; Tiede, D.; Aryal, J. Evaluation of Different Machine Learning Methods and Deep-Learning Convolutional Neural Networks for Landslide Detection. Remote Sens. 2019, 11, 196.
Ghorbanzadeh, O.; Meena, S.R.; Blaschke, T.; Aryal, J. UAV-Based Slope Failure Detection Using Deep-Learning Convolutional Neural Networks. Remote Sens. 2019, 11, 2046.
Sameen, M.I.; Pradhan, B. Landslide Detection Using Residual Networks and the Fusion of Spectral and Topographic Information. IEEE Access 2019, 7, 114363–114373.
Chadaram, S.; Yadav, S.K. Identification of Cracks Length by XFEM and Machine Learning Algorithm. In Lecture Notes in Intelligent Transportation and Infrastructure; Springer: Singapore, 2020; pp. 265–272.
Zhang, F.; Hu, Z.; Fu, Y.; Yang, K.; Wu, Q.; Feng, Z. A New Identification Method for Surface Cracks from UAV Images Based on Machine Learning in Coal Mining Areas. Remote Sens. 2020, 12, 1571.
Wu, Y.; Lu, G.; Zhu, Z.; Bai, D.; Zhu, X.; Tao, C.; Li, Y. A Landslide Warning Method Based on K-Means-ResNet Fast Classification Model. Appl. Sci. 2022, 13, 459.
Shi, W.; Zhang, M.; Ke, H.; Fang, X.; Zhan, Z.; Chen, S. Landslide Recognition by Deep Convolutional Neural Network and Change Detection. IEEE Trans. Geosci. Remote Sens. 2021, 59, 4654–4672.
Zhang, H.; Yin, C.; Wang, S.; Guo, B. Landslide susceptibility mapping based on landslide classification and improved convolutional neural networks. Nat. Hazards 2022, 116, 1931–1971.
Hu, H.; Cai, S.; Wang, W.; Zhang, P.; Li, Z. A semantic segmentation approach based on DeepLab network in high-resolution remote sensing images. In Lecture Notes in Computer Science, Image and Graphics; Springer: Berlin/Heidelberg, Germany, 2019; pp. 292–304.
Mao, J.Q.; He, J.; Liu, G.; Fu, R. Landslide recognition based on improved DeepLabv3+ algorithm. J. Nat. Disaster 2023, 32, 227–234.
Wang, W.; Zhang, Z.; Zhu, X.; Yang, S. Semantic segmentation of landslide image using DeepLabv3+ and completed local binary pattern. J. Appl. Remote Sens. 2025, 19, 014502.
Yang, Y.; Miao, Z.; Zhang, H.; Wang, B.; Wu, L. Lightweight attention-guided YOLO with level set layer for landslide detection from optical satellite images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 3543–3559.
Ji, S.; Yu, D.; Shen, C.; Li, W.; Xu, Q. Landslide detection from an open satellite imagery and digital elevation model dataset using attention boosted convolutional neural networks. Landslides 2020, 17, 1337–1352.
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
Simonyan, K. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020.
Hou, H.; Chen, M.; Tie, Y.; Li, W. A Universal Landslide Detection Method in Optical Remote Sensing Images Based on Improved YOLOX. Remote Sens. 2022, 14, 4939.

Figure 1. Image data of the study areas. The red points are the Bijie landslide locations, and the yellow points are the Luding landslide locations [35].

Figure 2. Landslide and non-landslide characteristic features. Each image is an RGB image of size 224 × 224. In the DEM data, brightness represents terrain relief: brighter areas usually represent steeper slopes, and darker areas represent gentler slopes.

Figure 3. DeepLabv3+ architecture. DCNN stands for deep convolutional neural network and Atrous Conv stands for atrous convolution. Blue, brown, pink, and green indicate the extracted feature layers.

Figure 4. Residual modules’ structures.

Figure 5. Structure diagram of the ECA module (X represents the input feature map; C, H and W represent the number of channels, height and width of the feature map; GAP represents global average pooling; K represents the size of the convolution kernel; and σ represents the sigmoid activation function).

Figure 6. DeepLabv3+-ResNet101-ECA network architecture. Inside the blue box is the encoder and inside the red box is the decoder.

Figure 7. Accuracy of different backbones on the validation set.

Figure 8. Predicted mask images based on different backbone networks of DeepLabv3+.

Figure 9. Impact of using different attention mechanisms on the validation set.

Figure 10. Accuracy of using different attention mechanisms on the validation set.

Figure 11. Predicted mask images from the different attention mechanisms. The region selected by the blue box is the false positive region, the region selected by the green box is the region with clearer detail capture, and the region selected by the red box is the region with preserved boundary.

Figure 12. Comparison of the prediction masks of DeepLabv3+-ResNet101-ECA and DeepLabv3+-ResNet101-ECA-Trans-Learning for the Luding landslide data. The parts marked in red are important feature areas of the landslide images, which the model needs to predict correctly. The model extracts and learns features from the input RGB and DEM data so as to identify landslides efficiently and accurately.

Table 1. Principles, advantages, and disadvantages of different machine learning methods.

Methods	Main Principles	Advantages	Disadvantages
Logistic Regression	Establishes a regression formula from existing data to determine the classification boundary	Simple principle; low computational cost	Prone to underfitting; classification accuracy may be limited
Decision Tree	Builds a decision model with a tree-like structure based on data attributes	Fairly robust; can handle irrelevant features	Prone to overfitting
Artificial Neural Network	A computational model that simulates the structure and function of biological neural networks	High classification accuracy; strong learning ability	Requires many parameters; learning process cannot be observed; results are difficult to explain; long training time
Support Vector Machine	Finds the optimal hyperplane that separates data points of different classes with the maximum margin	Low generalization error; modest computational cost; results are easy to explain	Sensitive to parameter tuning and kernel selection; without modification, the basic classifier handles only binary classification problems
Random Forest	Generates multiple decision trees using random sampling and trains them on the sample data to make classification predictions	Makes full use of limited samples; benefits from diversity and accuracy	May overfit on some noisy problems

Table 2. Computational cost when training different backbones.

Backbone	Average Training Time (s)	Max GPU Memory Usage (MB)	Average Inference Time per Image (s)	FLOPs (G)	Total Number of Parameters
DeepLabv3+-MobileNet	5.79	599.52	0.0127	2.76	10,507,730
DeepLabv3+-VGG16	38.52	2979.57	0.0040	5.52	19,156,706
DeepLabv3+-Xception	10.2	1620.31	0.0117	8.28	35,864,490
DeepLabv3+-ResNet18	5.46	641.30	0.0073	3.5	16,606,434
DeepLabv3+-ResNet50	23.59	2426.75	0.0140	11.04	42,002,327
DeepLabv3+-ResNet101	33.91	3894.33	0.0190	16.008	60,994,455

Table 3. Evaluation metrics for different backbone networks.

Backbone	Precision	Recall	F1 Score	MIoU
DeepLabv3+	0.9234	0.9059	0.9144	0.7367
DeepLabv3+-MobileNet	0.8804	0.8757	0.8780	0.6357
DeepLabv3+-VGG16	0.9201	0.8906	0.9046	0.7004
DeepLabv3+-Xception	0.9063	0.8991	0.9026	0.7029
DeepLabv3+-ResNet18	0.9293	0.8990	0.9134	0.7347
DeepLabv3+-ResNet50	0.9241	0.9060	0.9147	0.7338
DeepLabv3+-ResNet101	0.9283	0.9160	0.9220	0.7552

Table 4. Average accuracy per experiment for different backbone network models.

Model	Run 1	Run 2	Run 3	Run 4	Run 5	Run 6	Run 7	Run 8	Run 9	Run 10
DeepLabv3+-MobileNet	0.9488	0.9495	0.9501	0.9510	0.9499	0.9486	0.9480	0.9492	0.9510	0.9520
DeepLabv3+-VGG16	0.9568	0.9551	0.9559	0.9601	0.9576	0.9542	0.9521	0.9501	0.9586	0.9600
DeepLabv3+-Xception	0.9591	0.9495	0.9523	0.9554	0.9491	0.9523	0.9554	0.9489	0.9553	0.9542
DeepLabv3+-ResNet18	0.9710	0.9682	0.9618	0.9701	0.9723	0.9695	0.9645	0.9712	0.9723	0.9729
DeepLabv3+-ResNet50	0.9553	0.9621	0.9634	0.9601	0.9495	0.9483	0.9595	0.9601	0.9521	0.9588
DeepLabv3+-ResNet101	0.9796	0.9785	0.9806	0.9813	0.9799	0.9780	0.9832	0.9811	0.9782	0.9830

Table 5. Results of the t-test between the average accuracy of different models and the average accuracy of DeepLabv3+-ResNet101.

Model	DeepLabv3+-MobileNet	DeepLabv3+-VGG16	DeepLabv3+-Xception	DeepLabv3+-ResNet18	DeepLabv3+-ResNet50
t-statistic	46.3659	20.0576	24.3033	7.9224	15.8881
p-value	5.0524 × 10⁻⁸	8.8519 × 10⁻⁸	1.6180 × 10⁻⁸	2.3924 × 10⁻⁸	6.8385 × 10⁻⁸

Table 6. Evaluation metrics after adding different attention mechanisms.

Model	Precision	Recall	F1 Score	MIoU
DeepLabv3+-ResNet101-BAM	0.9326	0.9123	0.9188	0.7511
DeepLabv3+-ResNet101-CBAM	0.9242	0.9095	0.9215	0.7522
DeepLabv3+-ResNet101-SE	0.9242	0.9105	0.9172	0.7452
DeepLabv3+-ResNet101-ECA	0.9351	0.9259	0.9240	0.7603

Table 7. Model evaluation metrics of the ECA modules when added at different locations.

Location	Precision	Recall	F1 Score	MIoU
ASPP-ECA	0.9251	0.9166	0.9208	0.7534
Residual-ECA	0.9115	0.9222	0.9167	0.7433
Decoder-ECA	0.9328	0.9260	0.9241	0.7591

Table 8. Evaluation metrics for DeepLabv3+-ResNet101-ECA and DeepLabv3+-ResNet101-ECA-Trans-Learning. DeepLabv3+-ResNet101-ECA is trained and evaluated on the Luding dataset only; DeepLabv3+-ResNet101-ECA-Trans-Learning is pre-trained on the Bijie data and then trained and evaluated on the Luding data.

Model	Precision	Recall	F1 Score	MIoU
DeepLabv3+-ResNet101-ECA	0.9286	0.9254	0.9270	0.7743
DeepLabv3+-ResNet101-ECA-Trans-Learning	0.9484	0.9427	0.9455	0.8238

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Chen, X.; Wang, S.; Dinavahi, V.; Yang, L.; Wu, D.; Shen, M. Landslide Recognition Based on DeepLabv3+ Framework Fusing ResNet101 and ECA Attention Mechanism. Appl. Sci. 2025, 15, 2613. https://doi.org/10.3390/app15052613

AMA Style

Chen X, Wang S, Dinavahi V, Yang L, Wu D, Shen M. Landslide Recognition Based on DeepLabv3+ Framework Fusing ResNet101 and ECA Attention Mechanism. Applied Sciences. 2025; 15(5):2613. https://doi.org/10.3390/app15052613

Chicago/Turabian Style

Chen, Xinfang, Shiwei Wang, Venkata Dinavahi, Lijia Yang, Dibai Wu, and Meiyi Shen. 2025. "Landslide Recognition Based on DeepLabv3+ Framework Fusing ResNet101 and ECA Attention Mechanism" Applied Sciences 15, no. 5: 2613. https://doi.org/10.3390/app15052613

APA Style

Chen, X., Wang, S., Dinavahi, V., Yang, L., Wu, D., & Shen, M. (2025). Landslide Recognition Based on DeepLabv3+ Framework Fusing ResNet101 and ECA Attention Mechanism. Applied Sciences, 15(5), 2613. https://doi.org/10.3390/app15052613

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
