1. Introduction
Landslides are destructive phenomena resulting from the combined action of multiple factors [1], and they cause enormous casualties and economic losses. Therefore, the rapid, accurate, and automatic identification and mapping of landslides are of significant value for disaster prevention and mitigation [2]. Traditionally, the identification of historical landslides has relied mainly on field investigation or visual interpretation of remote sensing (RS) images. While these approaches yield accurate results, they are costly, labor-intensive, time-consuming, and subject to human subjectivity [3].
Computer-assisted methods currently employed for historical landslide identification using RS images can be categorized into three main approaches: feature thresholding segmentation [4,5], machine learning [6,7], and deep learning (DL). The first two methods still necessitate the manual determination of thresholds and feature screening, which restricts their efficiency and generalizability. In contrast, DL methods have made significant strides in the intelligent interpretation of RS images [8,9], owing to their ability to automatically extract and classify features.
However, DL-based landslide identification relies heavily on a large amount of labeled data, which increases the cost of model training. Furthermore, supervised learning (SL) (Figure 1a) with insufficient labeled data usually leads to model overfitting, which prevents the model from learning the proper feature distributions embedded in the dataset. To address the reliance on expert-labeled data and mitigate these limitations, semi-supervised learning (SSL) (Figure 1b) has been proposed to train DL models using a small amount of labeled data and a large amount of unlabeled data.
In general, SSL approaches fall into three main categories: consistency regularization (CR), pseudo-labeling self-training (PS), and generative adversarial networks (GANs). CR is based on the principle that a robust model should produce consistent results for the same input under additional perturbations. PS assigns pseudo-labels to unlabeled data and merges these predictions with the original labeled data to form an enlarged labeled set, with the goal of minimizing prediction entropy; however, this process may introduce confirmation bias into the model, ultimately impairing the accuracy of landslide identification [10]. GAN-based methods synthesize landslide images with a generator and train a discriminator to distinguish real from synthetic images, enabling the model to learn a richer feature space and improve landslide identification [11], although the instability of GANs makes GAN-based SSL methods challenging to apply [12].
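To make the first two principles concrete, the sketch below shows minimal PyTorch-style loss terms for CR and PS, assuming a segmentation model that outputs per-pixel class logits of shape (B, C, H, W); the function names, the MSE consistency penalty, and the 0.95 confidence threshold are illustrative assumptions rather than the formulation of any cited method.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x_unlabeled, perturb):
    """CR: predictions for the same input should agree under a perturbation."""
    with torch.no_grad():
        p_clean = torch.softmax(model(x_unlabeled), dim=1)        # reference view
    p_perturbed = torch.softmax(model(perturb(x_unlabeled)), dim=1)
    return F.mse_loss(p_perturbed, p_clean)                       # penalize disagreement

def pseudo_label_loss(model, x_unlabeled, threshold=0.95):
    """PS: treat confident predictions as labels, driving entropy minimization."""
    with torch.no_grad():
        probs = torch.softmax(model(x_unlabeled), dim=1)
        conf, pseudo = probs.max(dim=1)                            # (B, H, W) pseudo-labels
    logits = model(x_unlabeled)
    loss = F.cross_entropy(logits, pseudo, reduction="none")
    mask = (conf >= threshold).float()                             # keep confident pixels only
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)
```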
Currently, research on SSL in RS largely focuses on full-scene segmentation [13,14,15] and classification [16,17,18], with limited exploration of the specific task of historical landslide identification. Zhang et al. [11] applied WGAN-GP to extract discriminative deep features from unlabeled landslide images through unsupervised adversarial training, effectively learning pixel-level and object-level deep representations of landslides. He et al. [19] used a generator to synthesize landslide images and trained a discriminator to learn multidimensional low- and high-level semantic multiscale features, thereby reducing dependence on labeled data. Zhou et al. [20] proposed a method combining class activation maps (CAMs) with a cycle generative adversarial network (CycleGAN) for RS landslide semantic segmentation, which reduces the workload of landslide annotation. In addition, U2PL [21], a typical PS-based SSL approach originally proposed for natural image segmentation, achieved a significant improvement in accuracy when applied to Luding landslide identification [22]. These studies demonstrate that SSL has substantial potential for historical landslide identification. However, significant differences between RS images and natural images present various challenges for its application to landslide recognition.
Only limited information can be extracted from the embedded data because unlabeled RS images of landslides are scarce. As illustrated in Figure 2, RS images of landslides exhibit distinct characteristics compared to natural images, such as larger receptive fields, a greater number of objects, richer embedded information, and more complex color and texture. Furthermore, the scale of available landslide datasets is much smaller than that of popular natural image datasets. For instance, the Cityscapes dataset [23] comprises 3475 images and the MS COCO dataset [24] contains 123,000 images, whereas the Bijie landslide dataset [25] consists of a mere 770 images, the Nepal landslide dataset [26] contains 275 images, and the Sichuan dataset [27] includes only 107 images. Consequently, the key to improving the accuracy of landslide identification lies in the effective exploration and utilization of information from these limited yet information-rich RS images. However, existing CR approaches are restricted to single-level [15,28,29] or dual-level [30] perturbations and lack perturbation rules tailored to landslide identification. This constraint prevents the model from exploring a wider perturbation space and gaining valuable knowledge from unlabeled landslide images. By integrating input, feature, and model perturbations, the perturbation space is expanded, enabling the model to learn from a broader range of perspectives. This, in turn, enhances the utilization of the limited yet information-rich RS images of landslides.
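As a rough illustration of what the three perturbation levels can look like, the sketch below uses Gaussian input noise, channel dropout on intermediate features, and stochastic dropout inside the network; these specific choices are assumptions made for exposition, and the perturbation rules actually used by HPM-Match are defined later in the paper.

```python
import torch
import torch.nn.functional as F

def input_perturbation(x):
    """Input level: perturb the image itself, e.g., with small Gaussian noise."""
    return x + 0.05 * torch.randn_like(x)

def feature_perturbation(feats, p=0.5):
    """Feature level: perturb intermediate representations, e.g., channel dropout."""
    return F.dropout2d(feats, p=p, training=True)

def model_perturbation(model, x):
    """Model level: perturb the network, e.g., keep dropout stochastic so that
    repeated forward passes over the same input yield different predictions."""
    model.train()
    return model(x)
```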
Low-quality pseudo-labels may be generated under weak augmentation. There is currently considerable interest in SSL approaches for natural images that rely on weak-to-strong consistency learning (WSCL), in which the strongly augmented view is guided by instant pseudo-labels from the weakly augmented view during each training iteration [15]. The effectiveness of these approaches stems from the ability of weak perturbations to generate accurate predictions, while strong perturbations introduce additional information and mitigate confirmation bias. Although various data augmentation strategies, including RandAugment [31], CTAugment [32], and AugSeg [33], have been proposed for SSL in classification and segmentation tasks on natural images, applying them to landslide identification may produce augmentations that do not reflect real-world landslide scenarios, which negatively impacts identification accuracy. Additionally, RS offers a diverse range of information sources; for example, the digital elevation model (DEM) and its derived topographic data are frequently utilized as auxiliary inputs for landslide identification, with positive results [34]. Unfortunately, existing data augmentation strategies are prone to corrupting this valuable information and impairing model performance, as shown in Figure 3. In this work, we propose a data augmentation strategy specifically designed for the landslide identification task. Through a well-balanced combination of weak and strong data augmentations, the strategy simulates landslides under different viewing angles and lighting conditions, mitigates the influence of color variations on landslide identification, and enhances the delineation of landslide boundaries.
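One plausible realization of such a split, sketched below under the assumption that the DEM is stacked as an extra channel alongside optical bands normalized to [0, 1], applies geometric transforms to all channels while restricting photometric jitter to the optical bands so that the elevation data is never corrupted; the exact augmentations adopted in this work are specified in the Methods.

```python
import random
import torch

def weak_augment(image, dem, mask):
    """Geometric only: simulates different viewing angles and keeps the DEM valid."""
    if random.random() < 0.5:                                 # horizontal flip
        image, dem, mask = image.flip(-1), dem.flip(-1), mask.flip(-1)
    k = random.randint(0, 3)                                  # random 90-degree rotation
    image = torch.rot90(image, k, dims=(-2, -1))
    dem = torch.rot90(dem, k, dims=(-2, -1))
    mask = torch.rot90(mask, k, dims=(-2, -1))
    return image, dem, mask

def strong_augment(image, dem):
    """Photometric only, on optical bands: varies lighting/color; DEM untouched."""
    image = image * random.uniform(0.7, 1.3)                  # brightness jitter
    mean = image.mean(dim=(-2, -1), keepdim=True)
    image = (image - mean) * random.uniform(0.7, 1.3) + mean  # contrast jitter
    return image.clamp(0.0, 1.0), dem
```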
To tackle the aforementioned challenges, we propose a comprehensive, generic DL framework named hybrid perturbation mean match (HPM-Match), specifically designed for the identification of historical landslides. HPM-Match addresses the limitations of existing approaches by integrating input-level, feature-level, and model-level perturbations in an end-to-end manner and updating model parameters through an exponential moving average (EMA). It introduces an independent triple-stream perturbation (ITP) structure that relies on multidimensional CR constraints and defines implementation strategies for the three perturbation types, thereby expanding the range of available perturbations and enabling the extraction of valuable knowledge from unlabeled landslide images from multiple perspectives. Moreover, HPM-Match aims to fully exploit the manually pre-defined perturbation spaces while minimizing the introduction of erroneous information during the WSCL process. To enrich the representation of the input without impairing the integrity of the original data, we propose a novel dual-branch input perturbation (DIP) generation approach: by applying two parallel WSCLs to the input data, DIP creates a more comprehensive and informative representation, thereby facilitating more accurate landslide identification. We conducted experiments on three landslide identification datasets, each with distinct characteristics. In comparison with both state-of-the-art (SOTA) SSL approaches and SL methods, our framework demonstrates superior performance while employing the same model.
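For orientation, the sketch below outlines the generic WSCL pattern with an EMA-updated teacher on which HPM-Match builds; it is a simplified, assumed implementation (the confidence threshold and decay rate are placeholders) and does not reproduce the ITP or DIP components described in the Methods.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, decay=0.99):
    """Teacher weights track an exponential moving average of the student."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.data.mul_(decay).add_(s_p.data, alpha=1.0 - decay)

def unsupervised_step(student, teacher, x_weak, x_strong, threshold=0.95):
    """The weak view (teacher) guides the strong view (student) each iteration."""
    with torch.no_grad():
        probs = torch.softmax(teacher(x_weak), dim=1)
        conf, pseudo = probs.max(dim=1)                       # instant pseudo-labels
    logits = student(x_strong)
    loss = F.cross_entropy(logits, pseudo, reduction="none")
    mask = (conf >= threshold).float()                        # drop low-confidence pixels
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)
```

In a training loop of this kind, ema_update(teacher, student) would be called after each optimizer step so that the teacher remains a temporally averaged, more stable copy of the student.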
7. Limitations and Future Work
Although HPM-Match demonstrates excellent performance in the experiments, it still faces the risk of failure in complex scenarios and with limited landslide imagery. The feature extraction capability of the U-Net used in the model is limited, which may cause it to struggle under extreme weather conditions (e.g., cloud cover) or in highly complex terrain. However, models with stronger feature extraction capabilities generally have more complex structures, which increases computational cost and training time and thereby reduces the practical applicability of HPM-Match. In regions with scarce imagery, HPM-Match may fail to adequately learn unknown landslide characteristics, reducing both the accuracy and completeness of landslide identification. Additionally, the experiments use relatively small datasets and do not thoroughly explore the generalization ability of HPM-Match on larger, more diverse datasets.
HPM-Match improves the accuracy of historical landslide identification under limited-label conditions but also increases the computational cost of training. We evaluated the training time of HPM-Match, SL, and other SOTA SSL methods on the Bijie dataset, as shown in Figure 15. The results indicate that the training time of the SSL methods is significantly higher than that of SL across all three label ratios, because SL only uses the small amount of labeled data for training and does not learn from the unlabeled data. Among the four SSL methods, Mean Teacher has the shortest training time but typically achieves the lowest F1-Score. While HPM-Match achieves the best performance, its longer training time poses a challenge for practical applications.
In future work, we plan to integrate HPM-Match with lightweight models specialized for landslide identification and to incorporate multimodal or dynamic data to further enhance the accuracy of historical landslide identification. We will also use K-fold cross-validation to rigorously evaluate the accuracy and reliability of the framework. Additionally, the impact of the DEM source on the final results and the ability of HPM-Match to generalize to large datasets need to be examined. Last but not least, we will explore the application of HPM-Match in real-world landslide monitoring and its potential to address other geodynamic processes.