4.1. Data Preprocessing
Data preprocessing comprises relabeling the samples and applying data augmentation strategies to them. Because our target task differs, we selected only 770 landslide samples from the Bijie landslide dataset as the original data. To obtain reliable data, we relabeled the samples based on the original labels provided in the Bijie dataset. We then processed the data according to our research goals and finally obtained 510 landslide samples. Of these, 90% were used for training and validating the model, and 10% were used for testing it.
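The 90/10 split described above can be sketched as follows. This is a minimal illustration assuming a simple random hold-out; the paper's actual selection procedure may differ, and the function name and seed are hypothetical.

```python
import random

# Hypothetical sketch of the 90/10 hold-out split described above; a simple
# random shuffle is assumed, which may differ from the paper's procedure.
def split_samples(sample_ids, test_ratio=0.1, seed=42):
    """Shuffle sample IDs and hold out `test_ratio` of them for testing."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_ratio)
    return ids[n_test:], ids[:n_test]  # (train+validation, test)

train_val, test = split_samples(range(510))  # 459 train/val, 51 test
```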
During training, a deep learning model learns image features from the provided samples. Therefore, the more samples are provided, the more features the model can learn and the better its predictions. If an image is rotated or cropped before being passed to the model, the model treats it as a new image, so the sample set can be expanded through such transformations. We implemented a data augmentation strategy for the existing training set according to the characteristics of the landslide samples, which vary in orientation, structure, and boundary shape; by enhancing the dataset through augmentation, we improved the model's training effect. However, excessive rotation and flipping cause the model to overfit, reducing its generalization so that it achieves high accuracy on the training set but not on the test set. Considering these problems, we rotated, scaled, and flipped the sample set with probabilities chosen based on experience. All of the expanded data were used for model training; after expansion, the number of training samples grew from the original 510 to 2500. For more details on data preprocessing, see
Table 2.
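The probabilistic expansion described above can be sketched as follows, with the flip/rotate transforms applied jointly to an image and its label mask so that they stay aligned. The probabilities and the choice of transforms are illustrative assumptions, not the exact values used in the paper.

```python
import random
import numpy as np

# Illustrative flip/rotate augmentation applied jointly to an image and its
# label mask so they stay aligned; the probabilities and transform set are
# assumptions, not the exact strategy used in the paper.
def augment(image, mask, rng=random):
    if rng.random() < 0.5:                     # horizontal flip
        image, mask = np.fliplr(image), np.fliplr(mask)
    if rng.random() < 0.5:                     # vertical flip
        image, mask = np.flipud(image), np.flipud(mask)
    k = rng.choice([0, 1, 2, 3])               # rotation by k * 90 degrees
    return np.rot90(image, k), np.rot90(mask, k)
```

Applying identical geometric transforms to the image and its mask is essential for segmentation; augmenting only the image would corrupt the pixel-level labels.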
4.2. Training
The training effect of a deep learning model is closely related to the accuracy of the dataset, suitable parameters, and the training method. Therefore, in this study, we selected three models, U-Net, DeepLab v3+, and PSPNet, and used two different classification networks as the backbone network of each model; the backbone network selection for each model is shown in
Table 3. We denote these models as U-Net (VGG), U-Net (ResNet50), DeepLab v3+ (MobileNet), DeepLab v3+ (Xception), PSPNet (MobileNet), and PSPNet (ResNet50).
In the experiment, we set and adjusted the training parameters uniformly for all the models, as shown in
Table 4. The input image size is fixed at 473 × 473, and each pixel is classified as either landslide or background. Training adopts a freeze-training scheme, which divides training into a freezing stage and an unfreezing stage. In the freezing stage, the backbone of the model is frozen so that the feature extraction network does not change, and the network is fine-tuned for 50 epochs; because this stage occupies little video memory, the batch_size and learning rate are set larger. In the unfreezing stage, the backbone is unfrozen so that the feature extraction network can change, and the network is trained for 100 epochs; because this stage occupies much more video memory, the batch_size is set to 0.5 times that of the frozen stage and the learning rate is reduced.
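The two-stage freeze-training schedule described above can be made explicit as follows. In a real framework one would toggle the trainability of the backbone parameters (e.g., `requires_grad` in PyTorch); here a tiny stand-in just records the schedule. The epoch counts and batch-size halving follow the text, while the concrete batch size and learning rates are illustrative assumptions.

```python
from dataclasses import dataclass

# Explicit form of the two-stage freeze-training schedule described above.
# A real implementation would toggle the backbone parameters' trainability
# (e.g., `requires_grad` in PyTorch); the batch size and learning rates are
# illustrative assumptions, while the epoch counts and the batch-size
# halving follow the text.
@dataclass
class Stage:
    backbone_frozen: bool
    epochs: int
    batch_size: int
    lr: float

def freeze_training_schedule(frozen_batch_size=16, frozen_lr=1e-4):
    """50 frozen fine-tuning epochs, then 100 unfrozen epochs with half
    the batch size and a reduced learning rate."""
    return [
        Stage(backbone_frozen=True, epochs=50,
              batch_size=frozen_batch_size, lr=frozen_lr),
        Stage(backbone_frozen=False, epochs=100,
              batch_size=frozen_batch_size // 2, lr=frozen_lr / 10),
    ]
```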
In addition, during model training we use pre-trained weights obtained on the VOC dataset as the initial parameters of the model and enable dice_loss to balance the training categories. We enable focal_loss to balance positive and negative samples and use a NumPy array to assign different loss weights to the background and landslide classes so that the model focuses on landslide pixels. To improve the recognition effect, we set downsample_factor to 8, although this occupies much more memory; therefore, to reduce video memory usage, we do not use aux_branch by default, do not use multi-threaded data loading, and apply an early stopping strategy to save computing resources.
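As a rough illustration of the two loss terms mentioned above, binary dice and focal losses can be written as follows for flat lists of predicted foreground probabilities and 0/1 labels. The hyperparameters (alpha, gamma) and the paper's exact implementation and per-class loss weights may differ from this sketch.

```python
import math

# Schematic binary dice and focal losses over flat lists of predicted
# foreground probabilities and 0/1 labels; alpha and gamma are common
# defaults, not necessarily the values used in the paper.
def dice_loss(probs, labels, eps=1e-6):
    inter = sum(p * y for p, y in zip(probs, labels))
    union = sum(probs) + sum(labels)
    return 1.0 - (2.0 * inter + eps) / (union + eps)

def focal_loss(probs, labels, alpha=0.25, gamma=2.0, eps=1e-6):
    total = 0.0
    for p, y in zip(probs, labels):
        pt = p if y == 1 else 1.0 - p          # probability of the true class
        a = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
        total += -a * (1.0 - pt) ** gamma * math.log(pt + eps)
    return total / len(probs)
```

The focal term down-weights easy, confidently classified pixels, while the dice term directly optimizes overlap, which is why the two are often combined for imbalanced segmentation tasks.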
The hardware environment of this experiment was GPU: 4× NVIDIA Tesla K80; CPU: 32× Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10 GHz; OS: CentOS 8.3. The software environment was CUDA 11.2, Python 3.6, PyTorch 1.10.1, and TensorFlow 2.2.0.
4.3. Experimental Results and Analysis
The models trained in the experiment were used to predict the test set. The prediction results show that these models can effectively identify landslides. According to the experimental results, the PSPNet model with ResNet50 as the backbone network achieves the best recognition effect. We discuss the results first from the perspective of image recognition and then from the perspective of index evaluation.
A total of 51 remote sensing images of landslides were included in the test set; all of the images were from Bijie City, Guizhou Province, China, on the eastern margin of the Qinghai–Tibet Plateau. As shown in
Figure 4, we show some landslide images from the test set; none of these images participated in training, which allowed us to evaluate the performance of each model. We selected some of these samples for analysis, as shown in
Figure 5. The figure includes four samples of landslide images from different places, marked as Landslide I, Landslide II, Landslide III, and Landslide IV. Among them, I, II, and III are new landslides with varied characteristics and shapes, while IV is an old landslide whose landslide characteristics are inconspicuous. The figure also includes the label file of each landslide, which was used for comparison with the predicted map. We used the trained models to obtain the predicted maps; below, we analyze the predicted maps of the four landslide samples separately.
Landslide I: U-Net (VGG) has a high error rate in identifying landslide images; it identifies background areas with colors similar to the landslide as landslide, although most of these areas are bare land and cultivated land. The same situation occurs in 15 other prediction maps. DeepLab v3+ (Xception) behaves similarly to U-Net (VGG). DeepLab v3+ (MobileNet), U-Net (ResNet50), PSPNet (MobileNet), and PSPNet (ResNet50) identify landslides more accurately; among them, PSPNet (ResNet50) performs best.
Landslide II: U-Net (VGG) and DeepLab v3+ (MobileNet) cannot fully identify the landslide, so their results are poor. DeepLab v3+ (Xception) recognizes roads with similar colors as landslides. U-Net (ResNet50) recognizes other objects with shapes similar to landslides as landslides and cannot distinguish the landslide from other content in the background. PSPNet (MobileNet) and PSPNet (ResNet50) perform better.
Landslide III: U-Net (VGG) leaves a gap in the identified landslide; the gap corresponds to a darker region whose color is closer to the background because new vegetation is growing on the landslide, which the model cannot recognize and therefore misidentifies. DeepLab v3+ (Xception) shows the opposite behavior: although it can distinguish the landslide from vegetation, it cannot distinguish the landslide from roads. DeepLab v3+ (MobileNet), U-Net (ResNet50), PSPNet (MobileNet), and PSPNet (ResNet50) identify the landslide more accurately; however, their sensitivities to the landslide boundary vary.
Landslide IV: Landslide IV formed long ago; it is covered with vegetation, and the entire landslide is green. Because its landslide characteristics are not prominent, it is not easy to separate from the background, and this type of landslide is difficult for the models to recognize. Nevertheless, judging from the recognition results on Landslide IV, the models can distinguish the landslide from the surrounding vegetation, although the segmentation of its boundary is not accurate enough.
We evaluated the model using the metrics in
Section 3.5, and the evaluation results are shown in
Table 5. Observing the recall and precision values, we found that the recall of all models is higher than their precision; this means that although these models can identify real landslide pixels, they also misidentify many non-landslide pixels as landslide pixels. It can also be seen from the analysis of
Figure 6 that when identifying Landslide I and Landslide II, U-Net (VGG), DeepLab v3+ (Xception), and U-Net (ResNet50) easily confuse other objects with colors and shapes similar to landslides.
In
Table 5, mIoU values are used to evaluate model performance comprehensively. Among these trained models, PSPNet (ResNet50) produced the best landslide recognition effect, with an mIoU of 91.18%, and obtained the highest precision (93.76%); this means that the model recognizes landslide pixels well. It is followed by PSPNet (MobileNet) and U-Net (ResNet50), with mIoU values of 89.11% and 88.75%, respectively; PSPNet (MobileNet) obtained the highest recall (97.39%), which means that it can identify most of the landslide pixels. U-Net (VGG) has the worst landslide recognition effect, with an mIoU of 81.64% and a recall and precision of 89.34% and 89.22%, respectively.
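The indices reported above can be related to pixel-level confusion counts as follows. This is a schematic binary (landslide vs. background) formulation, assuming mIoU is the mean of the two per-class IoU values; the exact metric definitions are given in Section 3.5.

```python
# Schematic relation between pixel-level confusion counts and the indices
# reported for the binary landslide/background case; mIoU is assumed to be
# the mean of the two per-class IoU values.
def pixel_metrics(tp, fp, fn, tn):
    """tp/fp/fn/tn are pixel counts for the landslide (foreground) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    iou_landslide = tp / (tp + fp + fn)
    iou_background = tn / (tn + fp + fn)
    miou = (iou_landslide + iou_background) / 2
    return precision, recall, iou_landslide, miou
```

Recall exceeding precision, as observed for all models, corresponds here to fp being larger than fn: more background pixels are labeled landslide than landslide pixels are missed.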
Below, we use the chart to discuss the recognition effect of PSPNet on landslides when MobileNet and ResNet50 are used as the backbone network, respectively. As shown in
Table 6, we used precision (P), recall (R), and intersection over union (IoU) to evaluate the model's ability to recognize landslide and background pixels under the two different backbone networks. In identifying background pixels, when ResNet50 is used as the backbone network, the IoU is 97.76%, which is 14.79% higher than when MobileNet is used; this can be seen intuitively from the landslide prediction maps. In
Figure 6, the blue boxes mark the regions where PSPNet (ResNet50) and PSPNet (MobileNet) misidentify background pixels as landslide pixels. Compared with PSPNet (MobileNet), PSPNet (ResNet50) is less likely to mistake background content (such as roads, green vegetation, and bare land) for landslides, and its precision and recall for background pixels are both higher. In identifying landslide pixels, when ResNet50 is used as the backbone network, the IoU is 84.6%, which is 3.45% higher than when MobileNet is used; its precision is 5.35% higher than that of PSPNet (MobileNet), while its recall is 1.9% lower. Overall, its landslide recognition effect is better.
4.4. Discussion
In previous research on landslide recognition based on deep learning, Ghorbanzadeh used a CNN to train and test on landslides in the Himalayas, obtaining an F1 value of 87.8% and an mIoU value of 78.26% [
21]. Liu (2020) proposed to use ResU-Net to identify earthquake landslides in Jiuzhaigou, Sichuan Province, China, and obtained an F1 value of 93.3% and an mIoU value of 87.5% [
25]. Ullo (2021) applied Mask R-CNN to landslide recognition in digital images of target hilly areas acquired by drones; when ResNet101 was used as the backbone network, the obtained F1 value was 97% [
28]. Liu (2021) proposed to use an improved Mask R-CNN to identify earthquake landslides in the Jiuzhaigou area of Sichuan Province, China, and obtained an F1 value of 94.5% and an mIoU value of 89.6% [
29]. Ghorbanzadeh (2022) obtained an F1 of 84.03% and an mIoU value of 72.49% when using ResU-Net and OBIA for landslide detection in multitemporal Sentinel-2 images [
21]. Our proposed PSPNet, using the classification network ResNet50 as the backbone network, achieves an mIoU of 91.18% on the Bijie landslide dataset. Due to differences in datasets and evaluation metrics, we cannot directly compare it with the other models; nevertheless, according to the current experimental results, the method proposed in this paper is effective for landslide recognition.
Although PSPNet (ResNet50) achieves good results in landslide identification, it still has some shortcomings; for example, its segmentation of landslide boundaries needs further improvement. Remote sensing images differ from traditional images: in addition to the information in the image itself, they contain rich geological information. For example, a digital elevation model (DEM) can reflect local terrain features at a specific resolution. Therefore, we should further combine deep learning with remote sensing to make the most of remote sensing data. If DEM data and remote sensing images are fused, the local terrain information of the landslide obtained from the DEM will help the model improve the segmentation accuracy of the landslide boundary.
At present, the automatic identification of landslides based on deep learning still presents research challenges and open problems. For example, the scarcity of open-source code for landslide identification research and the lack of high-resolution public landslide remote sensing datasets and validation areas make such research difficult. At the same time, further improvement in landslide identification accuracy remains to be explored. Given these problems, we need to continue research on automatic landslide identification. In terms of datasets, we will try to integrate DEM data into remote sensing images so that the datasets contain more information about landslides, thereby improving identification accuracy. In terms of models, we will further investigate the influence of model structure on landslide identification and then improve the model accordingly.