Attention-Enhanced Region Proposal Networks for Multi-Scale Landslide and Mudslide Detection from Optical Remote Sensing Images

Niu, Chong; Ma, Kebo; Shen, Xiaoyong; Wang, Xiaoming; Xie, Xiao; Tan, Lin; Xue, Yong

doi:10.3390/land12020313


¹ School of Environment Science and Spatial Informatics, China University of Mining and Technology, Xuzhou 221116, China
² Shandong GEO-Surveying & Mapping Institute, Jinan 250002, China
³ Rizhao Marine and Fishery Research Institute, Rizhao 276800, China
⁴ Shandong Province Institute of Land Surveying and Mapping, Jinan 250013, China
⁵ Institute of Applied Ecology, Chinese Academy of Sciences, Shenyang 110016, China
* Author to whom correspondence should be addressed.

Land 2023, 12(2), 313; https://doi.org/10.3390/land12020313

Submission received: 30 November 2022 / Revised: 26 December 2022 / Accepted: 17 January 2023 / Published: 22 January 2023


Abstract

Detecting areas where a landslide or a mudslide might occur is critical for emergency response, disaster recovery, and disaster cost estimation. Previous works have reported that a variety of convolutional neural networks (CNNs) significantly outperform traditional approaches for landslide/mudslide detection. These approaches typically consider features from the local window and neighborhood information. The CNNs mainly focus on the features derived at a local scale, which might be inefficient for recognizing complex landslide and mudslide scenes. To effectively identify landslide and mudslide risks at a local and global scale, this paper integrates attentions into the architecture of state-of-the-art CNNs (including Faster R-CNN) to develop an attention-enhanced region proposal network for multi-scale landslide/mudslide detection. In detail, we employed the attentions to process the region proposals generated by a region proposal network and then combined the results obtained from the attentions and region proposal network to identify whether the object included in a region proposal was a landslide/mudslide. Based on our developed dataset and the Bijie dataset, the experimental results show that: (1) although the state-of-the-art CNNs for object detection can precisely detect landslides and mudslides, they are inadequate in dealing with similarity to non-landslide/non-mudslide regions; and (2) the proposed method, which integrates global features from attention layers into local features derived from CNNs, outperforms the unmodified CNNs in detecting non-landslides and non-mudslides. Our findings suggest that the representations at the local and global scale might be significant for precise landslide and mudslide detection.

Keywords: landslide detection; mudslide detection; attention; convolutional neural networks; remote sensing; multi-scale detection

1. Introduction

Landslides and mudslides are significant natural disasters occurring in a wide range of areas with special topography and landforms [1,2,3,4,5]. Due to their speedy flow rate, massive flow volume, and ability to carry solid debris, landslides and mudslides have a large capacity for severe destruction. According to a report published by the Second Global Forum on Landslides, landslides have led to costs of over EUR 6 billion per year in damage in industrialized countries (https://www.isprambiente.gov.it/contentfiles/00010100/10185-second-world-landslide-forum-press-release.pdf, accessed on 16 November 2022). Recognizing and localizing a landslide or mudslide is critical for emergency response, disaster recovery, and disaster cost estimation. However, landslides and mudslides can look significantly different in various areas. Considering these characteristics, remote sensing has potential as a technique for mudslide and landslide detection, supporting this task in terms of accelerated and automatic identification for a large-scale area [6,7,8,9,10].

Although topographical properties and spectral information are useful for determining landslide/mudslide risks, the implementation of digital elevation models (DEMs) and multispectral remote sensing images might be restricted due to their limited spatial resolution. Economically assessed DEMs and multispectral remote sensing images generally have a spatial resolution ranging from 15 m to 100 m, allowing small-scale landslides/mudslides to be visually recognized. Currently, high-resolution optical remote sensing images provide significant data for detecting and monitoring landslides and mudslides. Previous works have reported a variety of approaches enhanced by visual features and semantics, such as image morphology [11], object-based image analysis [12], bag of visual words [13], and machine learning techniques [14]. To detect landslides/mudslides at different scales, these approaches require users to select a parameter or a scale so that successive detection and recognition can be achieved. This indicates these approaches might be challenging to use to automatically detect multi-scale landslides/mudslides in a large region.

Recently, the efforts toward employing a series of convolutional neural networks (CNNs) have attracted considerable attention from the research community [15,16]. Among these CNNs, Faster R-CNN is considered a state-of-the-art approach for imagery-based object detection and recognition [17,18]. Based on the backbone of a CNN that can obtain high-level representations from local neighbors, Faster R-CNN innovatively incorporates a region proposal network (RPN) into the architecture of a CNN to generate a number of region candidates more rapidly than other conventional ways. The backbone of an RPN has been successfully implemented into numerous networks in studies on landslide/mudslide detection and recognition [19].

However, Faster R-CNN mainly focuses on the features derived from a local window. Its detection of landslides and mudslides relies on discovering their features and their background information. In other words, any possible solution for precise landslide and mudslide detection should take advantage of the representations of landslides and mudslides at the local and global scale. This makes it difficult for CNNs such as Faster R-CNN to obtain features and information from the whole image. Moreover, Raghu, Unterthiner, and Kornblith [20] reported that the attentions in ViT [21] could derive similarities from multiple layers to achieve global representations of data, supporting the obtainment of more spatial details. For high-resolution remote sensing images, the self-attention within multiple heads, which is developed based on the structure of a transformer, aims to process the representations derived from various patches, or different parts of a remote sensing image [22].

Considering the difference between attention and CNNs in feature learning and extraction, researchers have explored how to effectively deal with the features from every patch and local window [22]. The currently proposed solution tackles this through combining Faster R-CNN and attentions [23]. The features derived from RPNs and attentions need to be addressed in multilayer feature processing. Thus, this paper integrates the critical feature layers of attentions into the architecture of Faster R-CNN to develop an attention-enhanced region proposal network for multi-scale landslide and mudslide detection. This paper is organized as follows, following this introduction. The second section compares Faster RCNN and attentions with respect to feature extraction from optical remote sensing images. The third section presents the details of our proposed attention-enhanced landslide/mudslide detection model. The fourth section reports and discusses the experimental results. Lastly, the fifth section summarizes the highlights and findings of our work.

2. RPNs and Attentions

2.1. RPNs for Landslide/Mudslide Feature Learning and Detection

RPNs are used in structures to generate a series of proposals in one layer; essentially, they comprise the last convolutional layer of the backbone as the input, and the specific location of the proposal acts as the output of the neural network. An RPN mainly carries out the following steps [17]: anchor box generation, foreground or background determination of the anchor box, and regression for object localization. Anchor box generation focuses on image regions that possibly contain a target object. Foreground or background determination of an anchor box identifies whether an anchor box contains foreground or background information. When foreground or background information is available, regression for object localization generates the position difference between the anchor box and the true object in order to obtain the precise location of the object.
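The anchor box generation step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the stride of 16 pixels, the three scales, and the three aspect ratios are the defaults from the original Faster R-CNN paper and are assumptions here, and the function name `generate_anchors` is hypothetical.

```python
from itertools import product


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (cx, cy, w, h), one set per feature-map cell.

    stride, scales and ratios are assumed defaults, not values from this article.
    """
    anchors = []
    for row, col in product(range(feat_h), range(feat_w)):
        # Center of the feature-map cell, expressed in image pixels.
        cx = (col + 0.5) * stride
        cy = (row + 0.5) * stride
        for scale, ratio in product(scales, ratios):
            # ratio = h / w, with the anchor area kept at scale ** 2.
            w = scale / ratio ** 0.5
            h = scale * ratio ** 0.5
            anchors.append((cx, cy, w, h))
    return anchors


anchors = generate_anchors(2, 3)
print(len(anchors))  # 2 * 3 cells, 9 anchors per cell
```

Each anchor would then be scored as foreground or background and refined by regression, which are the second and third steps listed above.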

The general workflow of a CNN for landslide/mudslide detection and recognition includes: (1) generating a series of candidate regions (or proposals); (2) determining whether a candidate region contains the target object; and (3) if a candidate region includes the target object, regressing to modify the geometry of this candidate region to generate a bounding box for the target object. The strength of RPNs in remote-sensing-based landslide/mudslide detection has attracted significant attention. Since the original version [24], a variety of improved RPNs have been proposed, including those combining a scale pyramid with an RPN [25], employing various backbone nets [26], integrating decision trees [27], etc. Although successive landslide detection has been attempted, these improved RPNs mainly focus on local features, which might be inefficient to represent landslides/mudslides at different scales.

2.2. Attentions for Landslide/Mudslide Feature Learning and Detection

A number of researchers have reported that the series of attentions developed based on the backbone of a transformer outperform the CNN models in a variety of tasks, such as scene classification, object detection, and segmentation [15,16]. The backbone of attention, or multi-head attention, was initially proposed by Vaswani et al. [18] for extracting and learning sequential features in natural language processing. Furthermore, Dosovitskiy et al. [19] integrated visual features into the architecture of a transformer to develop the attentions, including a patching and position embedding part, a transformer encoder, and multi-head classification. The patching and position embedding part focuses on dividing the entire image into various patches, and building a linear projection to address the features of these patches. The transformer encoder develops a multi-head attention layer for feature extraction and learning.

In comparison to CNNs, attention-enhanced neural networks (NNs) have the following advantages:

Scalability. The feature learning of attention-enhanced NNs is completed by an attention mechanism, which is a fully connected graph structure that helps discover the features and their relationships through nodes and edges [10]. Moreover, since the graph structure allows computing for heterogeneous nodes, an attention-enhanced NN can map the heterogeneous data into a similar feature space to identify the corresponding inherited features. The landscapes of landslides and mudslides vary significantly, and thus their shapes and textures appear considerably different in various remote sensing images [20]. The scalability of attention-enhanced NNs might be useful for addressing the heterogeneities in visual features from optical remote sensing images;
Global-view-based feature extraction. Graph-structure-based feature learning in attention-enhanced NNs focuses on extracting the features from the entire image [21]. In contrast, convolution extracts the features from a local image region, including a pixel and its neighborhoods [22];
Adaptive feature learning. Features derived from the convolutional layers in a CNN are generally computed with the same filter at every location. In contrast, the attention mechanism in attention-enhanced NNs can vary the filter for feature learning according to the pixels at a global scale; and
Semantic-driven feature understanding. The origin of the transformer makes attention-enhanced NNs competitive in dealing with the semantics inherited in an image. Moreover, previous works have proved that sequential information can be useful.

Specifically, previous works have reported that a spatio-contextual approach outperforms others in landform element extraction [24,25]. This indicates that the information derived from neighborhood pixels is critical for detecting landform objects such as landslides and mudslides. Currently, many works are focused on segmentation, rather than identifying and localizing landslide and mudslide regions. Few references have reported employing ViT and the improved ViT called SWIN to conduct landslide detection [28,29].

3. The Proposed Approach

3.1. Architecture

Figure 1 illustrates the architecture of the proposed approach, which includes three main layers: a feature map generation layer, an attention-enhanced region proposal extraction layer, and an ROI classification layer. The feature map generation layer, which was developed based on the backbone of VGGNet [25], was derived from Faster R-CNN. This layer focuses on extracting potential features from the input remote sensing image. The attention-enhanced region proposal extraction layer conducts foreground determination to identify whether a region contains a target object, and region proposal regression to determine the anchor box that best covers the identified object. In foreground determination, we combine the RPN and multi-head attentions to generate region proposals based on global and local features, specifically focusing on extracting the features, the region proposal network layer, the feature classification layer, and the region proposal regression layer.

3.2. Feature Map Generation Layer

The architecture of the generation layer focuses on generating the feature maps using the backbone of VGGNet. More details can be found in Reference [17]. The architecture of VGGNet includes 13 convolution operations, 13 ReLU activations, and 4 pooling operations. Pooling operations are max pooling, and the structure of convolution includes Conv1-Conv2-Conv3-Conv4, the structure of which is shown as follows:

Conv1: {1 × 1, 64}, {3 × 3, 64}, and {1 × 1, 256};
Conv2: {1 × 1, 128}, {3 × 3, 128}, and {1 × 1, 512};
Conv3: {1 × 1, 256}, {3 × 3, 256}, and {1 × 1, 1024};
Conv4: {1 × 1, 512}, {3 × 3, 512}, and {1 × 1, 2048}.

The feature map, which is generated by Conv4, holds a dimension of 2048. This feature map is integrated into the results given by the attention-enhanced region proposal extraction layer.

3.3. Attention-Enhanced Region Proposal Extraction Layer

The architecture of the attention-enhanced region proposal extraction layer is shown in Figure 1. Based on the backbone of an RPN, we designed the attention-enhanced region proposal extraction layer as a workflow including two independent processing stages: foreground determination for determining whether the object included in a region proposal is true, and region proposal regression for determining the bounding box that perfectly contains the object.

3.3.1. Foreground Determination

Foreground determination adds multi-head attention to the foreground/background decision made by Faster R-CNN, as shown in Figure 2.

(1): Two-dimensional positional embedding

We assume a remote sensing image as $[x, y, c]$, where $x$ and $y$ refer to the horizontal and vertical dimensions of this image, respectively, and $c$ refers to the number of channels ($c = 3$ here). Specifically, we define $x = y$, so the embedding can equally divide the whole image into various tokens. We set the dimension of a token as $[16, 16, 3]$. Then, we have the number of tokens as $(x / 16)^{2}$ and the dimension of each token as $16 \times 16 \times 3 = 768$.
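As a quick check of the token arithmetic above, the following sketch splits a square image into 16 × 16 patches and flattens each one. It uses nested Python lists to stay dependency-free; the helper name `patchify` and the 64 × 64 toy image are illustrative assumptions.

```python
def patchify(image, patch=16):
    """Split an H x W x C nested-list image into flattened patch tokens."""
    height, width = len(image), len(image[0])
    if height != width or height % patch != 0:
        raise ValueError("expected a square image whose side is a multiple of the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one patch (patch x patch x C) into a single vector.
            token = [
                value
                for row in image[top:top + patch]
                for pixel in row[left:left + patch]
                for value in pixel
            ]
            tokens.append(token)
    return tokens


# Toy 64 x 64 RGB image filled with zeros.
image = [[[0, 0, 0] for _ in range(64)] for _ in range(64)]
tokens = patchify(image)
print(len(tokens), len(tokens[0]))  # 16 768
```

For a 64 × 64 input this gives $(64 / 16)^{2} = 16$ tokens of dimension 768, matching the expressions above.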

(2): Encoding

Encoding is completed by a variety of encoding blocks, including layer normalization ($LN$), a multi-head attention ($MHA$), a droppath ($DP$), and a multi-layer perceptron ($MLP$). The structure of an encoding block is expressed by Equation (1):

$$LN \to MHA \to DP \oplus LN \to MLP \to DP \tag{1}$$

Layer normalization computes the mean and variance across the features of one sample, and then normalizes the features with these statistics. The droppath randomly deactivates layers in a neural network, preventing the co-adaptation of parallel paths in the neural network. The multi-layer perceptron applies a non-linear transformation to predict the classification result, with a structure involving a fully connected layer, a GeLU activation function, and DropOut. The mechanism of the multi-head attention, which is presented in the following, is a complicated procedure that is critical for encoding.
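The ordering in Equation (1) can be sketched as a small function. This is an illustrative reading rather than the authors' code: it assumes that $\oplus$ denotes the residual addition used in the standard ViT encoder, and the `attention` and `mlp` arguments are placeholder callables supplied by the caller.

```python
import math
import random


def layer_norm(vec, eps=1e-6):
    """Normalize one token to zero mean and unit variance across its features."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def drop_path(branch, drop_prob, training, rng=random):
    """Stochastic depth: zero the whole branch with probability drop_prob."""
    if not training or drop_prob == 0.0:
        return branch
    if rng.random() < drop_prob:
        return [[0.0] * len(tok) for tok in branch]
    keep = 1.0 - drop_prob
    # Rescale the surviving branch so its expected value is unchanged.
    return [[v / keep for v in tok] for tok in branch]


def add(tokens, branch):
    """Element-wise residual addition of two token lists."""
    return [[t + b for t, b in zip(tok, br)] for tok, br in zip(tokens, branch)]


def encoder_block(tokens, attention, mlp, drop_prob=0.0, training=False):
    # LN -> MHA -> DP, added back to the input (assumed residual).
    branch = attention([layer_norm(tok) for tok in tokens])
    tokens = add(tokens, drop_path(branch, drop_prob, training))
    # LN -> MLP -> DP, added back again.
    branch = [mlp(layer_norm(tok)) for tok in tokens]
    return add(tokens, drop_path(branch, drop_prob, training))


# Toy call with placeholder sub-layers: identity attention and a zero MLP.
out = encoder_block([[1.0, 3.0]],
                    attention=lambda toks: toks,
                    mlp=lambda tok: [0.0] * len(tok))
print(out)  # approximately [[0.0, 4.0]]
```

With the placeholder sub-layers, the output is simply the input plus its normalized copy, which makes the two residual paths easy to inspect.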

The structure of a multi-head attention includes the query ($q$), key ($k$), and value ($v$): the query is used to match against other tokens, the key is what is matched against, and the value carries the information to be extracted. These three parameters can be obtained using the following equation:

$$\begin{cases} q_{i,h} = w_{q_{i},h} \times I_{i} \\ k_{i,h} = w_{k_{i},h} \times I_{i} \\ v_{i,h} = w_{v_{i},h} \times I_{i} \end{cases} \tag{2}$$

where $I$ refers to the input vector, $i$ refers to the index of the input vector, $h$ refers to the dimension of the head in the attentions, and $w$ refers to the weight.

Then, we can obtain the attention ($a_{i}$) with the query ($q$) and key ($k$) using the following equation:

$$a_{i,h} = \frac{q_{i,h} \times k_{i,h}}{\sqrt{dim_{q_{i,h} \times k_{i,h}}}} \tag{3}$$

where $i$ and $j$ refer to the index of the query ($q$) and the key ($k$), and $dim_{q_{i,h} \times k_{i,h}}$ refers to the dimension of the query ($q$) and key ($k$). Generally speaking, $q_{i,h} \times k_{i,h}$ might generate a vector with a high dimension. Thus, the square root ($\sqrt{dim_{q_{i,h} \times k_{i,h}}}$) can effectively decrease the dimensional number. Normalization can also be used to replace the square root.

Then, we use the Softmax function to process the attentions to enhance their scalability in linear and non-linear spaces. The Softmax function is expressed in Equation (4):

$$as_{i} = \frac{\exp(a_{i,h})}{\sum_{h} \exp(a_{i,h})} \tag{4}$$

Based on this equation, we can obtain the output result ($O_{i}$), as shown in Figure 2, using Equation (5):

$$O_{i} = \sum_{i} \sum_{h} as_{i} \times v_{i,h} \tag{5}$$

$O_{i}$ refers to the final features (or the tokens), which are given to the ROI classification layer to identify whether an object in a region proposal is the target one.
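The chain from Equation (2) to Equation (5) can be illustrated with a single attention head. The sketch below follows the standard scaled dot-product formulation of Vaswani et al., in which the Softmax is taken over the keys; that indexing is an assumption about what the equations intend, and the helper names and toy weights are hypothetical.

```python
import math


def matvec(weight, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def softmax(scores):
    """Numerically stable Softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def single_head_attention(inputs, w_q, w_k, w_v):
    # Equation (2): project every input vector to a query, key and value.
    queries = [matvec(w_q, x) for x in inputs]
    keys = [matvec(w_k, x) for x in inputs]
    values = [matvec(w_v, x) for x in inputs]
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Equation (3): dot products scaled by the square root of the dimension.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        # Equation (4): Softmax turns the scores into weights that sum to one.
        weights = softmax(scores)
        # Equation (5): weighted sum of the values.
        outputs.append([
            sum(wt * v[d] for wt, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


identity = [[1.0, 0.0], [0.0, 1.0]]
result = single_head_attention([[1.0, 0.0], [0.0, 1.0]], identity, identity, identity)
print(result)
```

With identity weights, each token attends most strongly to itself, so the first output leans toward the first input and the second toward the second.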

For landslide and mudslide detection, the position of an object is also a critical parameter. Thus, we use the cosine function to create a positional vector ($P_{i}$) that has the same dimensions as $I_{i}$, whose positional encoding is expressed by Equation (6):

$$\begin{cases} q_{i,h} = w_{q_{i},h} \times I_{i} + w_{p_{i},h} \times P_{i} \\ k_{i,h} = w_{k_{i},h} \times I_{i} + w_{p_{i},h} \times P_{i} \\ v_{i,h} = w_{v_{i},h} \times I_{i} + w_{p_{i},h} \times P_{i} \end{cases} \tag{6}$$

From Equation (6), we obtain the $q_{i,h}$, $k_{i,h}$, and $v_{i,h}$ within the position and features. Then, we put these three parameters into Equations (1)–(5) to obtain the final features (or the tokens) within the position and features. Positional tokens undergo region proposal regression to generate the precise anchor box (or region proposal) for a landslide or mudslide.
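One common way to build a positional vector $P_{i}$ with the same dimension as $I_{i}$ is the sinusoidal encoding of Vaswani et al., sketched below. The text only states that a cosine function is used, so the alternating sine/cosine form and the base of 10000 are assumptions borrowed from the transformer literature, not details confirmed by this article.

```python
import math


def positional_vector(position, dim):
    """Sinusoidal positional vector (assumed form): sine on even, cosine on odd indices."""
    vec = []
    for d in range(dim):
        # Paired indices (2k, 2k + 1) share the same frequency.
        angle = position / (10000 ** ((d - d % 2) / dim))
        vec.append(math.sin(angle) if d % 2 == 0 else math.cos(angle))
    return vec


print(positional_vector(0, 4))  # [0.0, 1.0, 0.0, 1.0]
print(len(positional_vector(5, 768)))  # 768, the token dimension used above
```

The resulting vector would then be multiplied by the positional weight and added to the query, key, and value projections, as in Equation (6).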

3.3.2. Region Proposal Regression

Region proposal regression aims to transform a predicted region proposal into a new region proposal similar to the true one. This operation focuses on modifying the position of the anchor box by moving and scaling. We assume the region proposal as $[x, y, w, h]$, where $x$ and $y$ refer to the coordinates of the center pixel of this region proposal, and $w$ and $h$ refer to its width and height, respectively. Then, we assume the true region proposal as $[x_{t}, y_{t}, w_{t}, h_{t}]$, and the predicted region proposal as $[x_{p}, y_{p}, w_{p}, h_{p}]$.

The regression includes translation and scaling, which are shown in Equation (7):

$$\begin{cases} \Delta x = (x_{t} - x_{p}) / w_{p} \\ \Delta y = (y_{t} - y_{p}) / h_{p} \\ \Delta w = \log (w_{t} / w_{p}) \\ \Delta h = \log (h_{t} / h_{p}) \end{cases} \tag{7}$$

where $[\Delta x, \Delta y, \Delta w, \Delta h]$ refers to the offset between the true and the predicted region proposal. Then, the loss function is:

$$L = \sum_{\Delta, p} SL_{1} (\Delta - p) \tag{8}$$

where $\Delta \in \{\Delta x, \Delta y, \Delta w, \Delta h\}$, $p \in \{x_{p}, y_{p}, w_{p}, h_{p}\}$, and $SL_{1}$ refers to the smooth L1 loss function.
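Equations (7) and (8) translate directly into a few lines of code. In the sketch below, boxes are given as center coordinates plus width and height; the `beta` threshold of 1.0 in the smooth L1 loss is the conventional default and an assumption here, as are the function names and the example boxes.

```python
import math


def regression_targets(true_box, pred_box):
    """Equation (7): offsets between a true and a predicted (x, y, w, h) box."""
    xt, yt, wt, ht = true_box
    xp, yp, wp, hp = pred_box
    return (
        (xt - xp) / wp,       # horizontal shift, relative to the predicted width
        (yt - yp) / hp,       # vertical shift, relative to the predicted height
        math.log(wt / wp),    # log-scale change in width
        math.log(ht / hp),    # log-scale change in height
    )


def smooth_l1(diff, beta=1.0):
    """Smooth L1: quadratic near zero, linear for large differences (beta assumed)."""
    diff = abs(diff)
    return 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta


def regression_loss(targets, predicted_offsets):
    """Equation (8): sum of smooth L1 terms over the four offsets."""
    return sum(smooth_l1(t - p) for t, p in zip(targets, predicted_offsets))


targets = regression_targets((12.0, 14.0, 40.0, 10.0), (10.0, 10.0, 20.0, 20.0))
print(targets)
print(regression_loss(targets, (0.0, 0.0, 0.0, 0.0)))
```

Here `predicted_offsets` stands in for the offsets a regression head would output; a perfect prediction gives a loss of zero.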

The above calculation is carried out by object detection with CNNs. In the proposed method, since we have the positional tokens generated by multi-head attentions, we design an additional method to integrate the moving and scale results into the result of positional tokens.

Based on Equation (6), we can obtain the query ($qP$), key ($kP$), and value ($vP$) of the position, rather than of the input features:

$$\begin{cases} qP_{i,h} = w_{p_{i},h} \times P_{i} \\ kP_{i,h} = w_{p_{i},h} \times P_{i} \\ vP_{i,h} = w_{p_{i},h} \times P_{i} \end{cases} \tag{9}$$

Then, we pass $qP$, $kP$, and $vP$ through Equations (2) to (5) to obtain the positional tokens $\hat{O}_{dim}$, where $dim$ refers to its dimension. In addition, we define the true and predicted positional tokens as $\hat{Ot}_{dim}$ and $\hat{Op}_{dim}$, respectively. Using the smooth L1 loss function, we can obtain the difference between the true and predicted positional tokens using Equation (10):

$$L = \sum_{dim} SL_{1} (\hat{Ot}_{dim} - \hat{Op}_{dim}) \tag{10}$$

Next, we combine the results of Equation (8) with the results of Equation (10), and then integrate the combined results into the ROI pooling and identification layer to regress the position of possible landslide and mudslide objects again.

3.4. ROI Pooling and Identification

The region proposals generated by the attention-enhanced region proposal extraction layer might have different dimensions, which a conventional classifier with a fixed input size cannot handle directly. ROI pooling converts these proposals into proposal feature maps of a common size by rescaling the original dimension of a region proposal. Then, in the classification stage, the proposal feature maps are passed through fully connected layers and Softmax to determine the category of the proposal. In addition, bounding box regression is applied again to obtain the final position of the region proposal.
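The rescaling performed by ROI pooling can be sketched as max pooling over a fixed grid of bins. This is a simplified, single-channel illustration on nested lists; the function name, the 2 × 2 output size, and the toy feature map are assumptions made for the example.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def roi_max_pool(feature, roi, out_size=2):
    """Max-pool a region of a 2-D feature map into an out_size x out_size grid.

    roi is (top, left, bottom, right) in feature-map cells, with exclusive ends.
    """
    top, left, bottom, right = roi
    height, width = bottom - top, right - left
    pooled = []
    for i in range(out_size):
        # Row range of bin i; bins may overlap when the size does not divide evenly.
        r0 = top + (i * height) // out_size
        r1 = top + ceil_div((i + 1) * height, out_size)
        row = []
        for j in range(out_size):
            c0 = left + (j * width) // out_size
            c1 = left + ceil_div((j + 1) * width, out_size)
            row.append(max(feature[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


feature = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
print(roi_max_pool(feature, (0, 0, 4, 4)))  # [[5, 7], [13, 15]]
print(roi_max_pool(feature, (0, 0, 3, 3)))  # [[5, 6], [9, 10]]
```

Both the 4 × 4 and the 3 × 3 regions end up as 2 × 2 outputs, which is what allows proposals of different sizes to share the same classification head.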

4. Experiment

4.1. Benchmark Dataset

To test the proposed approach for landslide and mudslide detection, we integrated our developed dataset into a state-of-the-art benchmark dataset called the Bijie Landslide Dataset [26]. Figure 3 illustrates the selected samples in this dataset. The dataset includes landslide and mudslide samples, and false samples that contain a scene similar to landslide and mudslide scenes (we call these non-landslide and non-mudslide samples), which total 770 and 2300, respectively. We selected the non-landslide and non-mudslide samples by (1) increasing the diversity among different landslides and mudslides, and (2) providing different scales of samples for the same landslide or mudslide object.

To further expand the dataset, we designed a variety of data augmentation techniques for each sample. The details of the data augmentation process are given in Table 1.

4.2. Experimental Analysis

The region proposal network implementation was obtained from https://pytorch.org/vision/main/models/faster_rcnn.html (accessed on 16 November 2022). To test the scalability of the proposed method, we created ten experimental groups in total. Figure 3 illustrates the selected samples from the integrated benchmark dataset.

For each experimental group, we randomly selected 300 samples of landslides and mudslides and non-landslides and non-mudslides from the integrated benchmark dataset to pretrain the proposed model, and used the remaining data to test the detection results. Table 2 lists the detection results for landslides and mudslides.
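The sampling protocol above can be sketched as repeated random splits. The text does not say whether the 300 training samples are drawn across both classes together or per class, nor whether a fixed seed was used; the sketch assumes 300 samples in total per group and a per-group seed, and `make_groups` is a hypothetical helper.

```python
import random


def make_groups(sample_ids, n_train=300, n_groups=10, seed=0):
    """Build independent random train/test splits, one per experimental group."""
    groups = []
    for g in range(n_groups):
        rng = random.Random(seed + g)  # assumed: one reproducible seed per group
        train = set(rng.sample(sample_ids, n_train))
        test = [s for s in sample_ids if s not in train]
        groups.append((sorted(train), test))
    return groups


# 770 positive and 2300 negative samples, as reported for the integrated dataset.
sample_ids = list(range(770 + 2300))
groups = make_groups(sample_ids)
print(len(groups), len(groups[0][0]), len(groups[0][1]))  # 10 300 2770
```

Under these assumptions each group trains on 300 samples and tests on the remaining 2770.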

4.3. Discussion

From the results, several conclusions could be reached.

Both Faster RCNN and the proposed method perform well in detecting landslides and mudslides. This proves that the state-of-the-art CNN techniques for object recognition have great potential for visually detecting landslides and mudslides from remote sensing images. Although the land surface surrounding a landslide or mudslide can vary, CNN methods can effectively discover and learn the high-level representations of landslides or mudslides.
However, the detection recall of the CNN-based method (Faster R-CNN) is poor. In the benchmark dataset, we selected the non-landslide and non-mudslide samples based on expert knowledge, ensuring these images were similar to true samples. In other words, these samples included a potential landslide or a mudslide region, although the landslide or mudslide was not visually recognizable in these images. This issue generates considerable interference in state-of-the-art CNN techniques for object recognition. The selected incorrect results for non-landslide and non-mudslide detection are shown in Figure 4.
The recall achieved by the proposed method exceeded that of Faster R-CNN. In comparison to the architecture of Faster R-CNN, the architecture of the proposed method added attentions for classifying the features derived from the global image. Thus, we assume that features extracted from every patch of an image can avoid interference by non-landslide and non-mudslide regions. The features at a global scale might be sufficient to enhance the identification of true landslide and mudslide features from optical remote sensing images.
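For reference, the precision and recall figures discussed above follow the usual definitions, sketched below. Which class counts as positive (landslide/mudslide versus the look-alike negatives) is a choice made by the evaluator; the labels in the example are invented for illustration and are not results from this study.

```python
def precision_recall(true_labels, predicted_labels, positive=1):
    """Compute precision and recall for one class treated as positive."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy labels: 1 = landslide/mudslide, 0 = look-alike negative.
print(precision_recall([1, 1, 0, 0], [1, 0, 1, 0]))  # (0.5, 0.5)
```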

5. Conclusions

Landslide and mudslide detection from optical remote sensing images is a main task of remote sensing in landform and geological disaster monitoring. Since landslides and mudslides are distributed across a wide range of areas, and developed in multiple scales, labor-based monitoring might not precisely and completely detect all landslides and mudslides. Remote sensing provides a comprehensive dataset allowing landslide and mudslide detection to be accomplished with multi-scale monitoring and no physical contact. High-resolution images can provide spatial details of a landslide and a mudslide, requiring an approach to effectively address the complicated background and neighborhood of landslides and mudslides [30,31].

State-of-the-art CNNs for object recognition (Faster R-CNN) can extract and discover the representative features of landslides and mudslides. Thus, a number of previous works have employed a variety of CNNs to identify landslide and mudslide regions from optical remote sensing images. However, these models mainly focus on learning the representations at local scales, which might lead to the exclusion of surrounding information. As reported in the experimental results, Faster R-CNN could not efficiently identify irrelevant regions. The attentions extract and discover the representative features from every part of an image. These representative features at a global scale might be useful for distinguishing between a true landslide and mudslide and a false one. The proposed method incorporates the global features derived from attentions into the architecture of Faster R-CNN, allowing the features to be used at global and local scales. The experimental results show that the proposed method can outperform Faster R-CNN in filtering out irrelevant regions while retaining high precision in recognizing landslides and mudslides. Several aspects are worthy of attention in future works. First, landslides and mudslides are represented at different scales in optical remote sensing images. Adding scale-independent parameters could significantly improve attention-enhanced feature learning. Moreover, the integration of attentions into a YOLO (You Only Look Once) model, another state-of-the-art CNN for object detection, represents a potential technical solution.

Author Contributions

Conceptualization, C.N. and X.X.; methodology, C.N. and X.X.; validation, K.M., X.S. and X.W.; investigation, X.X.; resources, C.N. and Y.X.; data curation, C.N., X.S., X.W. and L.T.; writing—original draft preparation, X.X.; writing—review and editing, C.N. and X.X.; project administration, X.X. and Y.X.; funding acquisition, X.X. and Y.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Fundamental Applied Research Foundation of Liaoning Province, grant number 2022JH2/101300257; Key Technology Research and Development Program of Shan Dong Provincial Bureau of Geology & Mineral Resources (SDGM), grant number KY202224; Outstanding Young Scholars of SDGM, grant number KY202001; Shenyang Young and Middle-aged Scientific and Technological Talents Program, grant number RC210502; and Weifang Science and Technology Project, grant number 2021ZJ1134.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Scaioni, M.; Longoni, L.; Melillo, V.; Papini, M. Remote sensing for landslide investigations: An overview of recent achievements and perspectives. Remote Sens. 2014, 6, 9600–9652.
2. Guzzetti, F.; Cardinali, M.; Reichenbach, P. The AVI project: A bibliographical and archive inventory of landslides and floods in Italy. Environ. Manag. 1994, 18, 623–633.
3. Salvati, P.; Bianchi, C.; Fiorucci, F.; Giostrella, P.; Marchesini, I.; Guzzetti, F. Perception of flood and landslide risk in Italy: A preliminary analysis. Nat. Hazards Earth Syst. Sci. 2014, 14, 2589–2603.
4. Liu, M.; Chen, X.; Yang, S. Collapse landslide and mudslides hazard zonation. In Landslide Science for a Safer Geoenvironment; Springer: Berlin/Heidelberg, Germany, 2014; pp. 457–462.
5. Biibosunov, B.; Beksulanov, J. Information technologies for landslides and mudflows research. In Proceedings of the E3S Web of Conferences; E3S Web of Conferences: Zhuhai, China, 2020; p. 6005.
6. Zhao, C.; Lu, Z. Remote sensing of landslides—A review. Remote Sens. 2018, 10, 279.
7. Zhong, C.; Liu, Y.; Gao, P.; Chen, W.; Li, H.; Hou, Y.; Nuremanguli, T.; Ma, H. Landslide mapping with remote sensing: Challenges and opportunities. Int. J. Remote Sens. 2020, 41, 1555–1581.
8. Kontoes, C.; Loupasakis, C.; Papoutsis, I.; Alatza, S.; Poyiadji, E.; Ganas, A.; Psychogyiou, C.; Kaskara, M.; Antoniadi, S.; Spanou, N. Landslide Susceptibility Mapping of Central and Western Greece, Combining NGI and WoE Methods, with Remote Sensing and Ground Truth Data. Land 2021, 10, 402.
9. Sinčić, M.; Bernat Gazibara, S.; Krkač, M.; Lukačić, H.; Mihalić Arbanas, S. The Use of High-Resolution Remote Sensing Data in Preparation of Input Data for Large-Scale Landslide Hazard Assessments. Land 2022, 11, 1360.
10. Ullah, I.; Aslam, B.; Shah, S.H.I.A.; Tariq, A.; Qin, S.; Majeed, M.; Havenith, H.-B. An integrated approach of machine learning, remote sensing, and GIS data for the landslide susceptibility mapping. Land 2022, 11, 1265.
11. Yu, B.; Chen, F. A new technique for landslide mapping from a large-scale remote sensed image: A case study of Central Nepal. Comput. Geosci. 2017, 100, 115–124.
12. Lahousse, T.; Chang, K.; Lin, Y. Landslide mapping with multi-scale object-based image analysis–a case study in the Baichi watershed, Taiwan. Nat. Hazards Earth Syst. Sci. 2011, 11, 2715–2726.
13. Cheng, G.; Guo, L.; Zhao, T.; Han, J.; Li, H.; Fang, J. Automatic landslide detection from remote-sensing imagery using a scene classification method based on BoVW and pLSA. Int. J. Remote Sens. 2013, 34, 45–59.
14. Mohan, A.; Singh, A.K.; Kumar, B.; Dwivedi, R. Review on remote sensing methods for landslide detection using machine and deep learning. Trans. Emerg. Telecommun. Technol. 2021, 32, e3998.
15. Hacıefendioğlu, K.; Demir, G.; Başağa, H.B. Landslide detection using visualization techniques for deep convolutional neural network models. Nat. Hazards 2021, 109, 329–350.
16. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 1137–1149.
17. Ghorbanzadeh, O.; Blaschke, T.; Gholamnia, K.; Meena, S.R.; Tiede, D.; Aryal, J. Evaluation of different machine learning methods and deep-learning convolutional neural networks for landslide detection. Remote Sens. 2019, 11, 196.
18. Han, G.; Huang, S.; Ma, J.; He, Y.; Chang, S.-F. Meta faster r-cnn: Towards accurate few-shot object detection with attentive feature alignment. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 22 February–1 March 2022; pp. 780–789.
19. Wang, Q.; Zhang, X.; Chen, G.; Dai, F.; Gong, Y.; Zhu, K. Change detection based on Faster R-CNN for high-resolution remote sensing images. Remote Sens. Lett. 2018, 9, 923–932.
20. Raghu, M.; Unterthiner, T.; Kornblith, S.; Zhang, C.; Dosovitskiy, A. Do vision transformers see like convolutional neural networks? Adv. Neural Inf. Process. Syst. 2021, 34, 12116–12128.
21. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010.
Jaderberg, M.; Simonyan, K.; Zisserman, A. Spatial transformer networks. Adv. Neural Inf. Process. Syst. 2015, 28, 2017–2025. [Google Scholar]
Dong, R.; Jiao, L.; Zhang, Y.; Zhao, J.; Shen, W. A multi-scale spatial attention region proposal network for high-resolution optical remote sensing imagery. Remote Sens. 2021, 13, 3362. [Google Scholar] [CrossRef]
Wu, L.; Liu, R.; Li, G.; Gou, J.; Lei, Y. Landslide Detection Methods Based on Deep Learning in Remote Sensing Images. In Proceedings of the 2022 29th International Conference on Geoinformatics, Beijing, China, 15–18 August 2022; pp. 1–4. [Google Scholar]
Yang, D.; Mao, Y. Remote sensing landslide target detection method based on improved Faster R-CNN. J. Appl. Remote Sens. 2022, 16, 044521. [Google Scholar]
Zhang, D.; Zhang, S.; Wang, H.; Ai, X.; Yi, N. Research on Landslide Detection in Remote Sensing Image Based on Improved Faster-RCNN. In Proceedings of the 2022 2nd International Conference on Computer, Control and Robotics (ICCCR), Shanghai, China, 18–20 March 2022; pp. 263–267. [Google Scholar]
Tanatipuknon, A.; Aimmanee, P.; Watanabe, Y.; Murata, K.T.; Wakai, A.; Sato, G.; Hung, H.V.; Tungpimolrut, K.; Keerativittayanun, S.; Karnjana, J. Study on Combining Two Faster R-CNN Models for Landslide Detection with a Classification Decision Tree to Improve the Detection Performance. J. Disaster Res. 2021, 16, 588–595. [Google Scholar] [CrossRef]
Tang, X.; Tu, Z.; Wang, Y.; Liu, M.; Li, D.; Fan, X. Automatic detection of coseismic landslides using a new transformer method. Remote Sens. 2022, 14, 2884. [Google Scholar] [CrossRef]
Zhao, D.; Zang, Q.; Wang, Z.; Quan, D.; Wang, S. SwinLS: Adapting Swin Transformer to Landslide Detection; CEUR Workshop Proceedings: Aachen, Germany, 2022. [Google Scholar]
Zhou, X.; Li, W.; Arundel, S.T. A spatio-contextual probabilistic model for extracting linear features in hilly terrains from high-resolution DEM data. Int. J. Geogr. Inf. Sci. 2019, 33, 666–686. [Google Scholar] [CrossRef]
Zhou, X.; Xue, B.; Xue, Y.; Xie, X.; Yang, J.; Qin, K. An Exploratory Evaluation of Multiscale Data Analysis for Landform Element Detection on High-Resolution DEM. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5. [Google Scholar] [CrossRef]

Figure 1. Architecture of the proposed approach.

Figure 2. Architecture of the self-attention-enhanced region proposal extraction layer.

Figure 3. Selected samples of the benchmark dataset.

Figure 4. Illustration of incorrect detection results.

Table 1. Data augmentation.

Data Augmentation	Method
Rotation	Generating 7 new images by rotating the original image in 45-degree increments.
Flip	Generating 2 new images by flipping the original image along the horizontal and vertical dimensions.
Scale	Generating 4 new images by scaling the original image by 1:4, 1:2, 2:1, and 4:1.
Contrast	Generating 4 new images by adjusting the original image with AGCWD * weighting parameters of 0.2, 0.4, 0.6, and 0.8.
Brightness	Generating 4 new images by randomly modifying the brightness of the original image.
Fog noise	Generating 4 new images by randomly adding different cloud noise to the original image.

* AGCWD: adaptive gamma correction with weighting distribution, from "Efficient Contrast Enhancement Using Adaptive Gamma Correction with Weighting Distribution".
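
The contrast row of Table 1 can be illustrated with a small sketch of AGCWD. This is a plain-Python reading of the commonly published formulation (weighted histogram, cumulative distribution, per-level gamma), not the authors' code. The function name `agcwd`, the flat list input, and the 8-bit range are assumptions made for illustration; a real pipeline would more likely apply the mapping to the brightness channel of an image array.

```python
def agcwd(pixels, alpha, l_max=255):
    """Apply AGCWD to a flat list of integer intensities in [0, l_max].

    alpha is the weighting parameter (Table 1 lists 0.2, 0.4, 0.6, 0.8).
    Hypothetical helper for illustration only.
    """
    n = len(pixels)
    # Intensity histogram and its probability density.
    hist = [0] * (l_max + 1)
    for p in pixels:
        hist[p] += 1
    pdf = [h / n for h in hist]
    pdf_min, pdf_max = min(pdf), max(pdf)
    if pdf_max == pdf_min:
        return list(pixels)  # perfectly flat histogram: nothing to reweight

    # Weighting distribution: compress the gap between frequent and rare levels.
    pdf_w = [pdf_max * ((p - pdf_min) / (pdf_max - pdf_min)) ** alpha for p in pdf]
    total_w = sum(pdf_w)

    # Per-level gamma = 1 - weighted CDF, then the gamma-corrected lookup table.
    lut, running = [], 0.0
    for level in range(l_max + 1):
        running += pdf_w[level]
        gamma = 1.0 - running / total_w
        lut.append(round(l_max * (level / l_max) ** gamma))
    return [lut[p] for p in pixels]


# Four contrast variants of one (toy) dark image, one per weighting parameter.
toy_image = [10, 10, 20, 30, 40, 50, 200]
variants = [agcwd(toy_image, a) for a in (0.2, 0.4, 0.6, 0.8)]
```

Because each per-level gamma lies between 0 and 1, the mapping never darkens a pixel. Dim images are brightened the most, which is one plausible reason to use it for augmenting low-contrast scenes.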

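Table 2. Detection results for landslides/mudslides and non-landslides/non-mudslides. Each cell gives the number of correctly identified samples out of the total, followed by the corresponding rate.

Group	Faster RCNN: Landslides and Mudslides	Faster RCNN: Non-Landslides and Non-Mudslides	Proposed Method: Landslides and Mudslides	Proposed Method: Non-Landslides and Non-Mudslides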
Group 1	384/400 0.9600	1161/2000 0.5805	385/400 0.9625	1321/2000 0.6605
Group 2	379/400 0.9475	1140/2000 0.5700	380/400 0.9550	1346/2000 0.6730
Group 3	390/400 0.9750	1125/2000 0.5625	390/400 0.9750	1318/2000 0.6590
Group 4	385/400 0.9625	1098/2000 0.5490	388/400 0.9700	1240/2000 0.6200
Group 5	388/400 0.9700	1075/2000 0.5375	394/400 0.9850	1199/2000 0.5995
Group 6	385/400 0.9625	1197/2000 0.5985	387/400 0.9675	1372/2000 0.6860
Group 7	377/400 0.9425	1158/2000 0.5790	377/400 0.9425	1333/2000 0.6665
Group 8	382/400 0.9550	1146/2000 0.5730	383/400 0.9575	1409/2000 0.7045
Group 9	387/400 0.9675	1137/2000 0.5685	390/400 0.9750	1350/2000 0.6750
Group 10	389/400 0.9725	1141/2000 0.5705	390/400 0.9750	1367/2000 0.6835
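
To summarize Table 2 in a single figure per column, the per-group counts can be pooled across the ten groups. The sketch below is not part of the paper. The list and function names are illustrative, and the counts are transcribed from the table as printed. The rate printed for Group 2 under the proposed method (0.9550) does not equal 380/400 = 0.9500, so the sketch relies on the counts rather than the printed rates.

```python
POS_TOTAL = 400    # landslide/mudslide samples per group
NEG_TOTAL = 2000   # non-landslide/non-mudslide samples per group

# Correct detections per group (Groups 1-10), transcribed from Table 2.
faster_rcnn_pos = [384, 379, 390, 385, 388, 385, 377, 382, 387, 389]
faster_rcnn_neg = [1161, 1140, 1125, 1098, 1075, 1197, 1158, 1146, 1137, 1141]
proposed_pos = [385, 380, 390, 388, 394, 387, 377, 383, 390, 390]
proposed_neg = [1321, 1346, 1318, 1240, 1199, 1372, 1333, 1409, 1350, 1367]


def per_group_rates(correct_counts, per_group_total):
    """Detection rate of each group, recomputed from the counts."""
    return [c / per_group_total for c in correct_counts]


def pooled_rate(correct_counts, per_group_total):
    """Overall detection rate with all groups pooled together."""
    return sum(correct_counts) / (per_group_total * len(correct_counts))


summary = {
    "Faster RCNN, landslides/mudslides": pooled_rate(faster_rcnn_pos, POS_TOTAL),
    "Faster RCNN, non-landslides/non-mudslides": pooled_rate(faster_rcnn_neg, NEG_TOTAL),
    "Proposed, landslides/mudslides": pooled_rate(proposed_pos, POS_TOTAL),
    "Proposed, non-landslides/non-mudslides": pooled_rate(proposed_neg, NEG_TOTAL),
}
for name, rate in summary.items():
    print(f"{name}: {rate:.4f}")
```

Pooled this way, the landslide/mudslide rate moves from 0.9615 to 0.9660, while the non-landslide/non-mudslide rate moves from 0.5689 to 0.66275. This agrees with the abstract's statement that the gain is concentrated in the non-landslide/non-mudslide class.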

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).