Article

A Super-Resolution Network for High-Resolution Reconstruction of Landslide Main Bodies in Remote Sensing Imagery Using Coordinated Attention Mechanisms and Deep Residual Blocks

Huajun Zhang, Chengming Ye, Yuzhan Zhou, Rong Tang and Ruilong Wei
1 Key Laboratory of Earth Exploration and Information Technology of Ministry of Education, Chengdu University of Technology, Chengdu 610059, China
2 College of Geophysics, Chengdu University of Technology, Chengdu 610059, China
* Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(18), 4498; https://doi.org/10.3390/rs15184498
Submission received: 7 July 2023 / Revised: 7 September 2023 / Accepted: 11 September 2023 / Published: 13 September 2023
(This article belongs to the Special Issue Machine Learning and Remote Sensing for Geohazards)

Abstract:
The lack of high-resolution training sets for intelligent landslide recognition using high-resolution remote sensing images is a major challenge. To address this issue, this paper proposes a method for reconstructing low-resolution landslide remote sensing images based on a Super-Resolution Generative Adversarial Network (SRGAN), so that low-resolution images can be fully utilized when constructing high-resolution landslide training sets. First, this paper introduces a novel enhanced deep residual block, EDCA, which delivers stable performance compared with other models while only slightly increasing the number of model parameters. Second, it incorporates coordinate attention and redesigns the feature extraction module of the network, thus boosting the learning of image features and the expression of high-frequency information. Finally, a residual-stacking-based reconstruction strategy for landslide remote sensing images is proposed using EDCA residual blocks. This strategy employs residual learning to enhance the reconstruction performance of landslide images and introduces LPIPS for evaluating the test images. The experiment was conducted using landslide data collected by drones in the field. The results show that, compared with traditional interpolation algorithms and classic deep learning reconstruction algorithms, this approach performs better in terms of SSIM, PSNR, and LPIPS. Moreover, the network can effectively handle complex features in landslide scenes, which benefits subsequent target recognition and disaster monitoring.

1. Introduction

Landslides are a type of natural disaster in which materials move down slopes due to gravity [1,2,3]. This disaster can cause significant damage to the environment and property and threaten human safety globally [4,5,6]. A variety of factors, including topography, lithology, tectonics, vegetation, and human activities, can cause landslides [7]. The combination of these factors with the force of gravity can produce landslides, often triggered by events such as rainfall and earthquakes [8,9]. As a result, it is crucial to promptly identify landslides for emergency response and reconstruction to create a safe and resilient society [10]. This has led to significant research in recent years aimed at identifying landslides across different scenes using high-resolution remote sensing technology [11,12]. However, developing an accurate, fast, and comprehensive cross-scene landslide identification method remains challenging, primarily due to the difficulty of obtaining high-quality remote sensing data and the differences between landslide scenes [13].
Satellite images constitute the main data source for detecting landslides and updating inventory maps, as stated by Ghorbanzadeh et al. [14]. Although the level of automation in landslide recognition technology is gradually improving, acquiring high-resolution remote sensing images remains a challenge. Computer-led intelligent landslide recognition methods are gradually replacing the traditional approach based on expert knowledge, as noted by Wang et al. [15]. Presently, methods for identifying landslides based on deep learning are widely employed [16]. Nonetheless, these techniques require a vast amount of landslide datasets. Generally, three strategies are employed to obtain landslide recognition sample sets: (1) collecting samples around the targeted detection locale captured using different sensors; (2) utilizing publicly available sample sets with landslides exhibiting similar characteristics; and (3) collecting images in the same location as the target detection area or its adjacent areas. Among these methods, the sample set extracted from the same scene image closely resembles the target area in terms of characteristics [17,18,19]. Nonetheless, this approach fails to tackle the restricted dataset issue. In the case of the other two strategies, the images obtained using different sensors vary in their resolutions. The sample sets with a higher resolution than the targeted recognition area possess more image features, whereas those with a lower resolution may not provide enough features for learning. Thus, providing more detailed information for low-resolution sample sets can enable more effective utilization of the image data [20].
Super-resolution reconstruction is a critical topic of interest and is widely utilized in various domains, including military, remote sensing, medicine, and video surveillance, as noted by Chen et al. [21], Liu et al. [22], and Shi et al. [23]. This technique holds particular significance in the field of remote sensing, where precision is critical. In the frequency domain, remote sensing images are usually decomposed through discrete wavelet transforms; the wavelet coefficient images are then interpolated using nearest-neighbor, bilinear, or bicubic [24] methods, and inverse discrete wavelet transforms are applied to reconstruct low-resolution images [25]. SRCNN [26] adapted the classical CNN model to super-resolution image reconstruction. The smaller receptive field in the lower convolution layers allows the network to focus on local details, whereas the larger receptive field in the higher layers enables the network to attend to the overall information of a large area. These methods have managed to increase the peak signal-to-noise ratio (PSNR) to some extent in large-scale marine remote sensing data. Lei et al. [27] proposed a novel local–global combined network for remote sensing image super-resolution (SR), consisting of a well-designed network structure with the capacity to learn multi-scale information from remote sensing data; the approach focuses on the residuals between SR and HR images to produce robust outcomes. More recently, the utilization of GANs [28] has significantly boosted SR methods for images. Ma et al. [29] developed a GAN-based super-resolution reconstruction approach for remote sensing images that improved on the previous super-resolution GAN technique by introducing global and local recursive blocks and residual learning to facilitate the training of deep networks. Liu et al. [30] utilized a GAN for multi-frame image fusion: they employed a two-branch fusion to generate high-resolution multi-spectral images and used a fully convolutional network as the discriminator, successfully obtaining high-resolution multi-spectral images.
As previously mentioned, super-resolution reconstruction methods based on adversarial networks have been shown to be successful in reconstructing high-resolution images of objects with prominent and steady edges, such as urban buildings, roads, and boats. Nonetheless, due to the complicated internal texture and edge vagueness of landslide images, the effectiveness of this approach on landslide image sets has yet to be demonstrated. In this study, we constructed a super-resolution reconstruction network that emphasizes the location information of landslides to improve the reconstruction of landslide images. We addressed the two major problems of artifacts and insufficient reconstruction and conducted targeted model training on the landslide target. The aim was to effectively broaden the path to acquiring high-resolution landslide datasets and to support subsequent landslide identification.
In comparison to the works of other researchers [31,32,33], this method presents several distinctive features. First, we designed an innovative residual block (EDCA) that incorporates Coordinate Attention while using depthwise separable convolutions instead of traditional convolutions. This not only enhances the feature extraction ability but also maintains the same network depth while regulating the parameter count. Second, our experimental results show that the SRGAN super-resolution artifacts are significantly reduced, resulting in greater model stability. Moreover, the accuracy evaluation indices and the realism of the visual effect exhibit considerable enhancement.

2. Materials

For data acquisition, this study employed a DJI Phantom 4 multi-rotor UAV equipped with a CMOS image sensor to acquire remote sensing imagery of the research area during field investigations. The collected image data are in RGB format. The dataset was constructed from UAV images of the Guili landslide located in the upper reaches of the Jinsha River. The landslide in this area exhibits significant deformation covering most of the area, with cracks and collapses of various sizes. It encompasses diverse textures and types and has vegetation cover along the boundary, which makes it an ideal training set. The landslides in the area include both landslide groups and a few single landslides. Therefore, this study used a few single landslides and a large number of boundary areas of landslide groups to build the dataset. Additionally, a few shaded images (marked by the red box in Figure 1) were introduced to reduce hue consistency in the dataset.
The source images were cropped to a size of 512 × 512 pixels. To improve the network’s training effectiveness, the limited dataset was expanded by rotating the cropped images to 0°, 90°, 180°, and 270° and mirroring each rotation, generating eight times the original amount of data. This yielded over 1000 high-resolution reference images, each 512 × 512 pixels in size, as shown in Figure 1. The reference images were further downsampled by a factor of four using the bicubic interpolation algorithm to generate 128 × 128-pixel low-resolution images, referred to as LR. Images taken from areas with complex landslide boundaries and a variety of natural elements such as mountains were selected as validation images. Image blocks featuring rich texture details were then downsampled by a factor of four to create a test dataset.
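As a concrete illustration of this preparation pipeline, the following minimal sketch applies the four rotations, mirrors each variant, and performs the 4× bicubic downsampling. The directory names and PNG file format are assumptions for illustration, not specifications from this study.

```python
from pathlib import Path

from PIL import Image

SRC = Path("hr_tiles")        # hypothetical folder of cropped 512 x 512 tiles
HR_DIR = Path("train/HR")     # augmented high-resolution references
LR_DIR = Path("train/LR")     # 4x-downsampled low-resolution counterparts
HR_DIR.mkdir(parents=True, exist_ok=True)
LR_DIR.mkdir(parents=True, exist_ok=True)

for path in SRC.glob("*.png"):
    tile = Image.open(path).convert("RGB")
    variants = []
    for angle in (0, 90, 180, 270):    # four rotations ...
        rotated = tile.rotate(angle)
        variants.append(rotated)
        # ... each mirrored, giving the eight-fold expansion
        variants.append(rotated.transpose(Image.FLIP_LEFT_RIGHT))
    for i, hr in enumerate(variants):
        hr.save(HR_DIR / f"{path.stem}_{i}.png")
        # bicubic downsampling by a factor of four gives the 128 x 128 LR image
        lr = hr.resize((hr.width // 4, hr.height // 4), Image.BICUBIC)
        lr.save(LR_DIR / f"{path.stem}_{i}.png")
```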

3. Methods

In this study, we devised an enhanced deep residual block with coordinate attention (EDCA). We eliminated the batch normalization (BN) layer and replaced the conventional convolution with depthwise separable convolution. To enhance the feature extraction capability of the model and prioritize target location information, we integrated Coordinate Attention into the residual block. We increased the network depth by combining several residual blocks; by stacking multiple residual blocks, we focused more attention on the landslide–vegetation boundary. To evaluate the algorithm, we used peak signal-to-noise ratio (PSNR) and Structural Similarity (SSIM) as the evaluation indices. We used a weighted combination of content loss, adversarial loss, and perceptual loss as the overall loss to improve the model’s fitting during training.

3.1. Network Architecture

The generative adversarial network model consists of a generator and a discriminator. Figure 2 illustrates the process of training EDCA-SRGAN, in which the generator takes a low-resolution image (LR) and upscales it into a pseudo image (SR). The discriminator network then evaluates the authenticity of the SR image and provides feedback in the form of a loss. The VGG network is used to extract features from the ground-truth and SR images, and a global loss measures the difference between these features. By continuously learning from this feedback, the generator improves its output, producing images that progressively approach the original image. The generator’s training endpoint is reached when the discriminator can no longer distinguish between the SR image and the original image. While GAN-based approaches offer an optimal solution for super-resolution reconstruction tasks, relying solely on GAN methods can introduce noise and artifacts, thereby compromising image quality.
This paper proposes an EDCA-SRGAN network to address the lack of edge detail information and artifacts in the reconstruction of landslide images. The specific network framework is depicted in Figure 3 and Figure 4.
The fundamental purpose of the generator in a generative adversarial network for super-resolution reconstruction is to upscale the low-resolution input image using a neural network and output a high-resolution image. The EDCA-SRGAN generator comprises three principal components: feature extraction module A, feature enhancement module B, and upsampling reconstruction module C. Figure 3 illustrates these modules:
(1) The first module in the EDCA-SRGAN generator is the feature extraction module A, comprising the initial convolution layer of the network architecture. Small convolution kernels excel at extracting high-frequency edge details, while larger kernels perform better at capturing coarser structural content. Therefore, to retain the edge detail information found in landslide images and minimize computational complexity, we use a small 3 × 3 convolution kernel. The generator receives low-resolution landslide images (LR) as input;
(2) Comprising 10 EDCA (enhanced deep residual block with coordinate attention) residual blocks and a convolution layer with a 3 × 3 kernel, module B enhances the features extracted by the previous module. The stacking of residual blocks facilitates deep feature extraction by allowing the inclusion of additional layers and connections, leading to better network performance. Notably, residual connections are crucial in mitigating the problem of “gradient disappearance,” which arises in networks with numerous layers;
(3) Module C, responsible for upsampling reconstruction, comprises an upsampling layer and a convolution layer. Instead of the common general pooling technique, this paper utilizes adaptive pooling (Adaptive-Pooling) to upscale images. The benefits of Adaptive-Pooling are numerous: for instance, the function automatically determines the convolution kernel and step size, removing the need to set them manually. In addition, the kernel is variable and the step size is dynamic, providing overlap between adjacent pooling windows. A structural sketch of these three modules is given below.
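The following minimal sketch makes the three-module layout concrete. It is written under assumed settings (64 feature channels, 4× scale); PlainResBlock is a placeholder standing in for the EDCA block of Section 3.2, and PixelShuffle stands in for the adaptive-pooling upscaling described in module C.

```python
import torch.nn as nn

class PlainResBlock(nn.Module):
    """Placeholder residual block; Section 3.2 sketches the EDCA variant."""
    def __init__(self, c):
        super().__init__()
        self.conv1 = nn.Conv2d(c, c, 3, padding=1)
        self.conv2 = nn.Conv2d(c, c, 3, padding=1)
        self.act = nn.PReLU()

    def forward(self, x):
        return x + self.conv2(self.act(self.conv1(x)))

class Generator(nn.Module):
    def __init__(self, n_blocks=10, c=64, scale=4):
        super().__init__()
        # module A: a single 3 x 3 conv preserves edge detail at low cost
        self.head = nn.Conv2d(3, c, 3, padding=1)
        # module B: stacked residual blocks plus a 3 x 3 conv
        self.body = nn.Sequential(
            *[PlainResBlock(c) for _ in range(n_blocks)],
            nn.Conv2d(c, c, 3, padding=1),
        )
        # module C: upsampling reconstruction back to 3 RGB channels
        self.tail = nn.Sequential(
            nn.Conv2d(c, c * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(c, 3, 3, padding=1),
        )

    def forward(self, lr):
        feat = self.head(lr)
        return self.tail(feat + self.body(feat))  # long skip connection
```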
The discriminator network structure of the EDCA-SRGAN model is illustrated in Figure 4. It comprises a single convolutional layer with 64 channels and a stride of 1, along with multiple convolutional structures with a stride of 2. The input image is reduced to 1/16th of its original size in both width and height, and the final feature map has 512 channels. The resulting features pass through adaptive average pooling and an LReLU activation function, and a one-dimensional convolutional layer produces the final output.
To simplify the computational process, LReLU is used as the activation function, and adaptive average pooling is employed. As network depth increases, the feature extraction capability improves and the number of extracted image features grows, while the feature map size of each successive convolutional layer decreases. The LReLU activation function is chosen over ReLU because it handles negative values better, thereby avoiding sparse gradients. Moreover, adaptive average pooling extracts features more effectively, resulting in more precise image processing and reduced computing time.

3.2. EDCA Structure

The primary goal of this module is to lower the consumption of computing resources by eliminating the batch normalization (BN) layer. While the BN layer is crucial for high-level computer vision tasks such as classification, it is unsuitable for low-level tasks such as super-resolution (SR) reconstruction. The normalization of network features in the BN layer can restrict the range flexibility of residual modules and increase the GPU memory load, which impedes the SR reconstruction task. As a result, the improved generator architecture removes the BN layer and integrates the Coordinate Attention (CA) module to enhance the feature extraction capacity using location-based information. To regulate the number of parameters, depthwise separable convolution replaces the traditional convolution. The resulting residual block structure is displayed in Figure 5. These changes aim to improve both the accuracy and the efficiency of the SR reconstruction task.
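A hedged sketch of such an EDCA residual block, following the three rules above (no BN layer, depthwise separable convolutions, Coordinate Attention on the residual branch), is given below; the exact layer ordering inside the block is an assumption.

```python
import torch.nn as nn

def ds_conv(c, k=3):
    """Depthwise separable convolution: per-channel depthwise + 1 x 1 pointwise."""
    return nn.Sequential(
        nn.Conv2d(c, c, k, padding=k // 2, groups=c),  # depthwise
        nn.Conv2d(c, c, 1),                            # pointwise
    )

class EDCABlock(nn.Module):
    def __init__(self, channels=64, attention=None):
        super().__init__()
        self.conv1 = ds_conv(channels)
        self.conv2 = ds_conv(channels)
        self.act = nn.PReLU()
        # Coordinate Attention (sketched in Section 3.3) plugs in here
        self.ca = attention if attention is not None else nn.Identity()

    def forward(self, x):
        out = self.conv2(self.act(self.conv1(x)))
        return x + self.ca(out)  # identity skip; no BN anywhere in the block
```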

3.3. Coordinate Attention

Landslide image enhancement necessitates efficient feature extraction to depict detailed information while recovering blurred imagery without compromising the quality of areas with significant image information loss.
To overcome this challenge, we introduced Coordinate Attention into SRGAN, as depicted in Figure 6. Coordinate Attention blocks encode the channels along the horizontal and vertical directions separately, producing a pair of direction-aware feature maps. This operation captures long-range correlations along one spatial dimension while retaining precise location information along the other, enabling the network to pinpoint objects of interest more accurately. Furthermore, the model takes both the channel dimension and the spatial dimension into account, allowing it to give more attention to useful channel information by learning adaptive channel weights. Typically, channel information records global image characteristics such as hue, brightness, and shading. In landslide image edge enhancement, we utilized Coordinate Attention to capitalize on effective information, compensate for the local limitations of the convolution operation, and take advantage of global feature extraction.
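The sketch below follows the published Coordinate Attention design (Hou et al., 2021) described above: pooling along height and width separately, joint encoding, then splitting into two direction-aware attention maps. The reduction ratio and activation function are assumptions.

```python
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # (B, C, H, 1): pool over width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # (B, C, 1, W): pool over height
        self.conv1 = nn.Conv2d(channels, mid, 1)       # joint encoding of both directions
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        xh = self.pool_h(x)                            # (B, C, H, 1)
        xw = self.pool_w(x).permute(0, 1, 3, 2)        # (B, C, W, 1)
        y = self.act(self.conv1(torch.cat([xh, xw], dim=2)))
        yh, yw = torch.split(y, [h, w], dim=2)         # split back into H and W parts
        a_h = torch.sigmoid(self.conv_h(yh))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * a_h * a_w                           # direction-aware reweighting
```

Passing CoordAttention(64) as the attention argument of the EDCABlock sketch in Section 3.2 composes the two pieces.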

3.4. Loss Function

During the super-resolution reconstruction process, there is inherent uncertainty when recovering high-frequency details from low-resolution images; the same scene can produce a variety of high-resolution reconstruction results. In the generative adversarial network, the generator’s loss function describes the gap between the pseudo image and the real image and is an important indicator of the visual quality of the high-resolution image. Therefore, the composition of the loss function plays a crucial role in obtaining high-quality images. In this paper, we use the weighted sum of the content loss, adversarial loss, and perceptual loss as the total loss. The overall formula for the generator’s loss is as follows:
$$L_G = \alpha L_{MSE} + \beta L_G^{Ra} + \varphi L_{VGG/i,j}^{percep}$$
where $L_G$, $L_{MSE}$, $L_G^{Ra}$, and $L_{VGG/i,j}^{percep}$ correspond to the total loss, content loss, adversarial loss, and perceptual loss of the generator, respectively; $\alpha$, $\beta$, and $\varphi$ are coefficients that balance the different loss terms.
The content loss $L_{MSE}$ usually refers to pixel loss, i.e., the loss between the pixels of the pseudo image and the real image. Specifically, $L_{MSE}$ measures the pixel-wise distance between the pseudo image $G(x_i)$ and the real image $y$, where $x_i$ is the low-resolution input image and $G$ is the generator. The content loss $L_{MSE}$ can be expressed as follows:
$$L_{MSE} = \mathbb{E}_{x_i}\left\| G(x_i) - y \right\|_{MSE}$$
In a generative adversarial network, the loss can be separated into generator loss and discriminator loss. Including the adversarial generator loss in the total loss enhances the optimization of the generator. The generator and discriminator losses are symmetrical, with the following formulas:
$$L_D^{Ra} = -\mathbb{E}_{x_r}\left[\log\left(D_{Ra}(x_r, x_f)\right)\right] - \mathbb{E}_{x_f}\left[\log\left(1 - D_{Ra}(x_f, x_r)\right)\right]$$
$$L_G^{Ra} = -\mathbb{E}_{x_r}\left[\log\left(1 - D_{Ra}(x_r, x_f)\right)\right] - \mathbb{E}_{x_f}\left[\log\left(D_{Ra}(x_f, x_r)\right)\right]$$
where $x_r$ and $x_f$ denote the real and pseudo images, respectively, and $D_{Ra}$ is the relativistic discriminator output.
Perceptual loss refers to the difference between the feature information of the pseudo image and that of the real image, as extracted by a feature extraction network. In SRGAN, the VGG network is employed as the feature extraction network to enhance the realism of the generated image and improve the visual effect of the enhanced image [34]. However, this approach has drawbacks: feature information beyond the activation layer of the VGG network becomes inactive, resulting in sparse overall feature information. To prevent the loss of integrity in the feature information of the landslides and surrounding areas, the feature information before the activation layer is used in the perceptual loss, which is expressed as follows:
$$L_{VGG/i,j}^{percep} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}(x_r)_{x,y} - \phi_{i,j}(x_f)_{x,y} \right)^2$$
where $\phi_{i,j}$ denotes the feature map of the $j$-th convolution before the $i$-th max-pooling layer in the VGG network, and $W_{i,j}$ and $H_{i,j}$ denote the dimensions of that feature map.
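A hedged sketch of how these three terms combine in training is shown below. The relativistic adversarial term follows the $L_G^{Ra}$ formula above; the weight values and the pre-extracted VGG features (feat_sr, feat_hr) are placeholders, since the paper does not restate them here.

```python
import torch
import torch.nn.functional as F

def generator_loss(sr, hr, d_real, d_fake, feat_sr, feat_hr,
                   alpha=1.0, beta=5e-3, phi=1.0):
    """sr/hr: pseudo and real images; d_real/d_fake: discriminator logits;
    feat_sr/feat_hr: pre-activation VGG features of the two images."""
    # content loss: pixel-wise distance between pseudo and real image
    l_content = F.mse_loss(sr, hr)
    # relativistic adversarial loss L_G^{Ra}: compare each logit with the
    # mean logit of the opposite class, as in the formula above
    d_rf = d_real - d_fake.mean()
    d_fr = d_fake - d_real.mean()
    l_adv = (F.binary_cross_entropy_with_logits(d_rf, torch.zeros_like(d_rf)) +
             F.binary_cross_entropy_with_logits(d_fr, torch.ones_like(d_fr)))
    # perceptual loss on features taken before the VGG activation layer
    l_percep = F.mse_loss(feat_sr, feat_hr)
    return alpha * l_content + beta * l_adv + phi * l_percep
```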

3.5. Image Quality Evaluation Index

While visual quality has the final say, we still need a robust and reliable image quality assessment metric to measure the subtle variations among SRR methods. Some previous work adopted the peak signal-to-noise ratio (PSNR) as the metric. After normalizing the image to the range 0 to 1, the PSNR (dB) is expressed as follows:
$$\mathrm{PSNR} = 10 \times \log_{10}\left(\frac{1}{\mathrm{MSE}}\right)$$
where MSE refers to the mean square error between the pseudo image and the real image, and PSNR is expressed in dB. The larger the PSNR, the smaller the MSE and hence the smaller the distortion, i.e., the closer the reconstruction is to the reference.
The structural similarity index (SSIM) is a measure of the similarity between two images. The SSIM formula is based on three comparison measures between samples x and y: luminance (L), contrast (C), and structure (S), expressed as follows:
$$L(x,y) = \frac{2\mu_x \mu_y + c_1}{\mu_x^2 + \mu_y^2 + c_1}, \quad C(x,y) = \frac{2\sigma_x \sigma_y + c_2}{\sigma_x^2 + \sigma_y^2 + c_2}, \quad S(x,y) = \frac{\sigma_{xy} + c_3}{\sigma_x \sigma_y + c_3}$$
$$\mathrm{SSIM}(x,y) = L(x,y)^{\alpha} \cdot C(x,y)^{\beta} \cdot S(x,y)^{\gamma}$$
where $\mu_x$ is the mean of x, $\mu_y$ is the mean of y, $\sigma_x^2$ is the variance of x, $\sigma_y^2$ is the variance of y, $\sigma_{xy}$ is the covariance of x and y, and $c_1$, $c_2$, and $c_3$ are small constants that avoid division by zero.
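As a minimal sketch, both metrics can be computed as follows for images normalized to [0, 1]; the SSIM call uses scikit-image’s standard implementation (the channel_axis argument assumes scikit-image >= 0.19).

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(ref, test):
    """PSNR in dB for float images normalized to [0, 1] (MAX = 1)."""
    mse = np.mean((ref - test) ** 2)
    return 10.0 * np.log10(1.0 / mse)

def ssim(ref, test):
    """SSIM for RGB images with channels last."""
    return structural_similarity(ref, test, channel_axis=-1, data_range=1.0)
```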
LPIPS (Learned Perceptual Image Patch Similarity) is a quality assessment metric for images based on deep learning [35]. It measures distances between two images in a feature space, thus providing a quantitative score for the quality of images [36] expressed as follows:
$$d(x, x_0) = \sum_{l} \frac{1}{H_l W_l} \sum_{h,w} \left\| w_l \odot \left( \hat{y}_{hw}^{l} - \hat{y}_{0hw}^{l} \right) \right\|_2^2$$
where d is the distance between x and $x_0$. Features are extracted from layer l and unit-normalized along the channel dimension, the channels are scaled by the learned weights $w_l$, and the squared L2 distance is computed. This metric incorporates perceptual characteristics such as color, texture, and edges, which are essential in image recognition, and it closely emulates human visual perception. In LPIPS, a smaller score indicates a smaller visual difference between the two images and therefore higher quality; a larger score indicates a greater visual difference and lower quality.
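For reference, the sketch below scores an image pair with the published lpips Python package; inputs are RGB tensors scaled to [-1, 1], and the random tensors are stand-ins for an actual SR/HR pair.

```python
import lpips
import torch

loss_fn = lpips.LPIPS(net="alex")        # AlexNet backbone, the package default
sr = torch.rand(1, 3, 512, 512) * 2 - 1  # stand-in SR image in [-1, 1]
hr = torch.rand(1, 3, 512, 512) * 2 - 1  # stand-in HR reference in [-1, 1]
score = loss_fn(sr, hr).item()           # smaller = perceptually closer
print(f"LPIPS: {score:.4f}")
```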

4. Experimental Results

This experiment uses objective evaluation results to demonstrate the super-resolution performance of the model. In addition to an internal comparison over network depth and parameter count, the model is compared horizontally against traditional methods and representative super-resolution models.

4.1. Implement Details

The dataset was selected from UAV images cropped to 512 × 512 high-resolution images; the selection criteria focused on landslide objects with rich texture and detail. To enlarge the dataset, the cropped images were rotated to four angles (0°, 90°, 180°, and 270°) and mirrored, resulting in an eight-fold increase in samples. This process yielded a total of 1271 high-resolution reference images (IHR), each with a resolution of 512 × 512 pixels. Of these, 1000 were allocated for training and 271 for testing. These images were downsampled by a factor of four to 128 × 128 pixels, forming the corresponding low-resolution set (ILR). During training, LR images were fed into the proposed EDCA-SRGAN architecture, as well as into bicubic upsampling, the original SRGAN [37], and SRGAN with the BAM [38] attention mechanism for comparison. The parameters of the VGG-19 model were pre-imported, and a learning rate of 2 × 10−4 was used; the learning rate followed a cosine decay schedule throughout the 200 epochs of uniform training. In the testing phase, the network directly processed the 128 × 128 images to obtain the super-resolution results.
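A hedged sketch of this training configuration follows. The Adam optimizer is an assumption (the optimizer is not named in the text), and generator, train_loader, and generator_loss are hypothetical stand-ins for the components sketched in Section 3.

```python
import torch

# assumed components: generator (Section 3.1), train_loader yielding
# 128 x 128 LR / 512 x 512 HR pairs, generator_loss (Section 3.4)
optimizer = torch.optim.Adam(generator.parameters(), lr=2e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

for epoch in range(200):                       # 200 epochs of uniform training
    for lr_img, hr_img in train_loader:
        sr_img = generator(lr_img)
        loss = generator_loss(sr_img, hr_img)  # discriminator step omitted
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                           # cosine decay, one step per epoch
```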
The experimental hardware setup consisted of an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory and 64 GB of RAM.

4.2. Evaluation Using Different Number of EDCA Blocks

The most crucial aspect of the EDCA-SRGAN method in this paper is the generator for producing high-resolution images. The EDCA block is the primary component of the generator and has a significant impact on the image reconstruction quality. In the experiments, we conducted comparative studies by setting the number of EDCA modules to 5, 10, and 15; the resulting quality, measured with PSNR and SSIM, is shown in Figure 7.
As evident from Figure 7, the performance of EDCA-SRGAN remains stable regardless of the number of EDCA blocks used. This is mainly due to the CA attention mechanism enhancing the feature extraction ability and the residual blocks being stacked without causing gradient vanishing. To strike a balance between SR quality and computational load, we selected 10 EDCA modules for all subsequent experiments. The model exhibits robustness and consistently delivers stable, high-quality performance.

4.3. Evaluation Result

Figure 8 reveals that the utilization of EDCA residual blocks leads to a significant boost in PSNR and SSIM compared with SRGAN and BAM-SRGAN. While the SRGAN model featuring the BAM attention mechanism initially improves the indices, its scores decrease in subsequent stages and fluctuate more strongly. In contrast, the PSNR delivered by EDCA-SRGAN rises by approximately three percentage points, bringing it closer to the original image. Its index curve also fluctuates more slowly and is more stable, thus accelerating network convergence. After super-resolution processing, the LR (low-resolution) image becomes more like the original image, mitigating the effects of random generation during the SR process and better preserving the characteristics of landslide images before and after processing.
Table 1 underscores that our proposed EDCA-SRGAN has delivered superior results in both PSNR and SSIM evaluation metrics. Compared to the original SRGAN model without network structure enhancements, PSNR and SSIM scores have improved by 0.8 dB and 0.01, respectively. Compared to the SRGAN model featuring the BAM attention mechanism, EDCA-SRGAN has improved PSNR and SSIM by 1.1 dB and 0.01, respectively. Despite utilizing the EDCA residual block proposed in this study, there is little disparity in parameters relative to the other two methods. EDCA-SRGAN not only efficiently controls the number of parameters but also enhances the model’s performance accuracy.
The qualitative results further illustrate our method’s advantages; the SR comparison results are presented in Figure 9. Bicubic interpolation produces blurring and is not suitable for handling the complex and irregular image information of landslide areas. In the first group, the SRGAN model featuring the BAM attention mechanism surpasses the original model in feature extraction and detail recovery, exhibiting some advancement in artifact suppression. However, across all three groups, its overall performance is unstable, and some of its SR images are not better than those of the original model. Conversely, in the first comparison group, EDCA-SRGAN restores vegetation significantly better than the other methods, facilitating the proper identification and distinction of boundaries when interpreting landslides, unaffected by the later growth of vegetation. Additionally, considering all three SR images together, our proposed method demonstrates exceptional consistency in suppressing SR image artifacts. In our results, objects generally maintain their original shapes and hues, making the artifact suppression more stable and indicating remarkable improvements.
In global image comparisons, the SRGAN model’s use of batch normalization layers enhances the global realism of the image compared with traditional bicubic interpolation. However, many details such as gravel and vegetation appear as ripples in images reconstructed by the SRGAN model, producing artifacts that significantly impair detail perception, distort edge information, and seriously degrade the reconstruction.

4.4. Ablation Experiment

We present the outcomes of an ablation study to showcase the effectiveness of our modifications. Given that minor modifications may not have a significant visual impact, PSNR and SSIM metric curves are used to represent the results, as depicted in Figure 10. In line with the previous section, we used the same network parameters and training strategy. Specifically, in the control experiment, we eliminated Coordinate Attention while retaining the same number of residual blocks. Figure 10 demonstrates that removing Coordinate Attention leads to a significant drop in PSNR as well as a decline in the model’s robustness. The SSIM curve demonstrates the negative effect on robustness even more explicitly: the metric score starts high but quickly drops to a lower level as training continues. This analysis highlights that the contribution of Coordinate Attention to the model’s performance and robustness cannot be ignored.
To further investigate the advantages and internal effects of the designed model, this paper conducts a visual analysis of the feature mapping within the network. Figures 11 and 12 use blue, green, yellow, and red to indicate the relative activity levels of pixels: blue represents low response values, green medium response values, yellow high response values, and red very high response values.
Typically, brighter colors indicate higher activity in a pixel area, while darker colors indicate lower activity. A high response value usually suggests that the convolution kernel at this position is relatively sensitive to some feature in the input image, producing a higher activation response where this feature exists. The response at the corresponding landslide position in Figure 11a is notably higher than that in Figure 11b. Figures 11 and 12 demonstrate that the response of the training set is significantly higher than that of the test set, and that the response of the feature extraction module before and after improvement varies for landslide images under different levels of vegetation coverage, depending on the degree to which the landslide is concealed. For boundary extraction of landslides, the improved model performs significantly better than the previous model; some vegetated landslides exhibit insufficient response with the unimproved model, resulting in inadequate target feature extraction.

5. Discussion

The objective of the model is to improve the resolution of low-resolution landslide images and enhance the definition of their boundary, as illustrated in Figure 13a, where the arrow denotes the boundary information of the landslide. In Figure 13b, the blue line represents the low-resolution version of the landslide boundary while the red line shows the actual boundary of the landslide. The low-resolution boundary of the landslide becomes blurred, making it challenging to interpret visual clues. Vegetation cover located at the edge of the landslide also creates significant difficulty in distinguishing the actual scope of the landslide. The process of constructing the landslide dataset has a considerable effect on the intelligent identification outcome. However, the process is subjective, and the recognition of empirical boundaries is inherently susceptible to error. Providing more details for low-resolution samples can help enhance data resources and increase their utilization.
Despite our method’s achievements, a significant gap remains between our results and reality. Careful observation reveals that object boundary details become distorted, and in complex landslide scenes the network fails to locate the outlines of each object amid the chaotic background. Therefore, even with an ultra-high-resolution UAV dataset that fully covers landslide samples and the surrounding environment, complex vegetation and mountain details constrain the SRR model’s performance in the absence of external prior knowledge. This challenge will remain a critical area for further research.

6. Conclusions

The experiments presented in this article demonstrate that shallow network structures are inadequate for predicting complex details in complex scenes. Additionally, the GAN-based method, which consists of two networks, can lead to unstable model convergence and severe artifacts.
To address the SRR task for landslide remote sensing images, this article proposes the EDCA-SRGAN method. By incorporating Coordinate Attention blocks, the network can capture the global information of landslide images and fully utilize feature maps containing effective information. This approach improves the robustness of the model while extracting deeper features, which validates the strategy of strengthening the feature extraction module and ensuring a stable convergence process. However, despite these improvements, a significant gap remains between the reconstructed images and the actual situation: the complexity of landslide scenes still leaves the reconstructed landslide boundaries insufficiently clear. Therefore, image reconstruction in landslide scenes remains a highly challenging issue that requires further work.

Author Contributions

Conceptualization, C.Y.; Methodology, C.Y., H.Z. and Y.Z.; Formal Analysis, R.T., R.W. and Y.Z.; Writing—Original Draft Preparation, H.Z., Y.Z. and R.T.; Writing—Review & Editing, C.Y., H.Z., Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (42071411), the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant XDA23090203, the Second Tibetan Plateau Scientific Expedition and Research Program (2019QZKK0902), and the Key Research and Development Program of Sichuan Province (2022YFG0200).

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

References

  1. Hungr, O.; Leroueil, S.; Picarelli, L. The Varnes Classification of Landslide Types, an Update. Landslides 2014, 11, 167–194. [Google Scholar] [CrossRef]
  2. Leynaud, D.; Mulder, T.; Hanquiez, V.; Gonthier, E.; Regert, A. Sediment Failure Types, Preconditions and Triggering Factors in the Gulf of Cadiz. Landslides 2017, 14, 233–248. [Google Scholar] [CrossRef]
  3. Wang, L.; Qiu, H.; Zhou, W.; Zhu, Y.; Liu, Z.; Ma, S.; Yang, D.; Tang, B. The Post-Failure Spatiotemporal Deformation of Certain Translational Landslides May Follow the Pre-Failure Pattern. Remote Sens. 2022, 14, 2333. [Google Scholar] [CrossRef]
  4. Ji, S.; Yu, D.; Shen, C.; Li, W.; Xu, Q. Landslide Detection from an Open Satellite Imagery and Digital Elevation Model Dataset Using Attention Boosted Convolutional Neural Networks. Landslides 2020, 17, 1337–1352. [Google Scholar] [CrossRef]
  5. Li, D.; Huang, F.; Yan, L.; Cao, Z.; Chen, J.; Ye, Z. Landslide Susceptibility Prediction Using Particle-Swarm-Optimized Multilayer Perceptron: Comparisons with Multilayer-Perceptron-Only, BP Neural Network, and Information Value Models. Appl. Sci. 2019, 9, 3664. [Google Scholar] [CrossRef]
  6. Varnes, D.J.; Bufe, C.G. The Cyclic and Fractal Seismic Series Preceding an m(b) 4.8 Earthquake on 1980 February 14 near the Virgin Islands. Geophys. J. Int. 1996, 124, 149–158. [Google Scholar] [CrossRef]
  7. Dao, D.V.; Jaafari, A.; Bayat, M.; Mafi-Gholami, D.; Qi, C.; Moayedi, H.; Phong, T.V.; Ly, H.-B.; Le, T.-T.; Trinh, P.T.; et al. A Spatially Explicit Deep Learning Neural Network Model for the Prediction of Landslide Susceptibility. Catena 2020, 188, 104451. [Google Scholar] [CrossRef]
  8. Wang, H.; Cui, P.; Liu, D.; Liu, W.; Bazai, N.A.; Wang, J.; Zhang, G.; Lei, Y. Evolution of a Landslide-Dammed Lake on the Southeastern Tibetan Plateau and Its Influence on River Longitudinal Profiles. Geomorphology 2019, 343, 15–32. [Google Scholar] [CrossRef]
  9. Pei, Y.; Qiu, H.; Yang, D.; Liu, Z.; Ma, S.; Li, J.; Cao, M.; Wufuer, W. Increasing Landslide Activity in the Taxkorgan River Basin (Eastern Pamirs Plateau, China) Driven by Climate Change. Catena 2023, 223, 106911. [Google Scholar] [CrossRef]
  10. Wei, K.; Ouyang, C.; Duan, H.; Li, Y.; Chen, M.; Ma, J.; An, H.; Zhou, S. Reflections on the Catastrophic 2020 Yangtze River Basin Flooding in Southern China. Innovation 2020, 1, 100038. [Google Scholar] [CrossRef]
  11. Xu, Y.; Liu, X.; Cao, X.; Huang, C.; Liu, E.; Qian, S.; Liu, X.; Wu, Y.; Dong, F.; Qiu, C.-W.; et al. Artificial Intelligence: A Powerful Paradigm for Scientific Research. Innovation 2021, 2, 100179. [Google Scholar] [CrossRef] [PubMed]
  12. Cui, P.; Peng, J.; Shi, P.; Tang, H.; Ouyang, C.; Zou, Q.; Liu, L.; Li, C.; Lei, Y. Scientific Challenges of Research on Natural Hazards and Disaster Risk. Geogr. Sustain. 2021, 2, 216–223. [Google Scholar] [CrossRef]
  13. Xu, Q.; Ouyang, C.; Jiang, T.; Yuan, X.; Fan, X.; Cheng, D. MFFENet and ADANet: A Robust Deep Transfer Learning Method and Its Application in High Precision and Fast Cross-Scene Recognition of Earthquake-Induced Landslides. Landslides 2022, 19, 1617–1647. [Google Scholar] [CrossRef]
  14. Ghorbanzadeh, O.; Shahabi, H.; Crivellari, A.; Homayouni, S.; Blaschke, T.; Ghamisi, P. Landslide Detection Using Deep Learning and Object-Based Image Analysis. Landslides 2022, 19, 929–939. [Google Scholar] [CrossRef]
  15. Wang, H.; Zhang, L.; Yin, K.; Luo, H.; Li, J. Landslide Identification Using Machine Learning. Geosci. Front. 2021, 12, 351–364. [Google Scholar] [CrossRef]
  16. Wang, H.; Zhang, L.; Luo, H.; He, J.; Cheung, R.W.M. AI-Powered Landslide Susceptibility Assessment in Hong Kong. Eng. Geol. 2021, 288, 106103. [Google Scholar] [CrossRef]
  17. Jia, S.; Wang, Z.; Li, Q.; Jia, X.; Xu, M. Multiattention Generative Adversarial Network for Remote Sensing Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15. [Google Scholar] [CrossRef]
  18. Zhang, H.; Wang, P.; Jiang, Z. Nonpairwise-Trained Cycle Convolutional Neural Network for Single Remote Sensing Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2021, 59, 4250–4261. [Google Scholar] [CrossRef]
  19. Dong, R.; Zhang, L.; Fu, H. RRSGAN: Reference-Based Super-Resolution for Remote Sensing Image. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5601117. [Google Scholar] [CrossRef]
  20. Qiu, H.; Zhu, Y.; Zhou, W.; Sun, H.; He, J.; Liu, Z. Influence of DEM Resolution on Landslide Simulation Performance Based on the Scoops 3D Model. Geomat. Nat. Hazards Risk 2022, 13, 1663–1681. [Google Scholar] [CrossRef]
  21. Chen, Y.; Zhang, H.; Liu, L.; Tao, J.; Zhang, Q.; Yang, K.; Xia, R.; Xie, J. Research on Image Inpainting Algorithm of Improved Total Variation Minimization Method. J. Ambient Intell. Humaniz. Comput. 2023, 14, 5555–5564. [Google Scholar] [CrossRef]
  22. Liu, K.; Yu, H.; Zhang, M.; Zhao, L.; Wang, X.; Liu, S.; Li, H.; Yang, K. A Lightweight Low-Dose PET Image Super-Resolution Reconstruction Method Based on Convolutional Neural Network. Curr. Med. Imaging 2023, 19, 1427–1435. [Google Scholar] [CrossRef] [PubMed]
  23. Shi, J.; Ye, Y.; Liu, H.; Zhu, D.; Su, L.; Chen, Y.; Huang, Y.; Huang, J. Super-Resolution Reconstruction of Pneumocystis Carinii Pneumonia Images Based on Generative Confrontation Network. Comput. Methods Programs Biomed. 2022, 215, 106578. [Google Scholar] [CrossRef] [PubMed]
  24. Keys, R. Cubic Convolution Interpolation for Digital Image Processing. IEEE Trans. Acoust. Speech Signal Process. 1981, 29, 1153–1160. [Google Scholar] [CrossRef]
  25. Zhang, L.; Chen, D. A Novel Saliency-Oriented Superresolution Method for Optical Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1922–1926. [Google Scholar] [CrossRef]
  26. Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a Deep Convolutional Network for Image Super-Resolution. In Computer Vision—ECCV 2014; Lecture Notes in Computer Science; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer International Publishing: Cham, Switzerland, 2014; Volume 8692, pp. 184–199. ISBN 978-3-319-10592-5. [Google Scholar]
  27. Lei, S.; Shi, Z.; Zou, Z. Super-Resolution for Remote Sensing Images via Local-Global Combined Network. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1243–1247. [Google Scholar] [CrossRef]
  28. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. Adv. Neural Inf. Process. Syst. 2014, 27. [Google Scholar] [CrossRef]
  29. Ma, W.; Pan, Z.; Guo, J.; Lei, B. Achieving Super-Resolution Remote Sensing Images via the Wavelet Transform Combined With the Recursive Res-Net. IEEE Trans. Geosci. Remote Sens. 2019, 57, 3512–3527. [Google Scholar] [CrossRef]
  30. Liu, Q.; Zhou, H.; Xu, Q.; Liu, X.; Wang, Y. PSGAN: A Generative Adversarial Network for Remote Sensing Image Pan-Sharpening. IEEE Trans. Geosci. Remote Sens. 2021, 59, 10227–10242. [Google Scholar] [CrossRef]
  31. Liu, B.; Zhao, L.; Li, J.; Zhao, H.; Liu, W.; Li, Y.; Wang, Y.; Chen, H.; Cao, W. Saliency-Guided Remote Sensing Image Super-Resolution. Remote Sens. 2021, 13, 5144. [Google Scholar] [CrossRef]
  32. Ma, J.; Yu, J.; Liu, S.; Chen, L.; Li, X.; Feng, J.; Chen, Z.; Zeng, S.; Liu, X.; Cheng, S. PathSRGAN: Multi-Supervised Super-Resolution for Cytopathological Images Using Generative Adversarial Network. IEEE Trans. Med. Imaging 2020, 39, 2920–2930. [Google Scholar] [CrossRef] [PubMed]
  33. Lei, J.; Xue, H.; Yang, S.; Shi, W.; Zhang, S.; Wu, Y. HFF-SRGAN: Super-Resolution Generative Adversarial Network Based on High-Frequency Feature Fusion. J. Electron. Imaging 2022, 31, 033011. [Google Scholar] [CrossRef]
  34. Yan, Y.; Liu, C.; Chen, C.; Sun, X.; Jin, L.; Peng, X.; Zhou, X. Fine-Grained Attention and Feature-Sharing Generative Adversarial Networks for Single Image Super-Resolution. IEEE Trans. Multimed. 2022, 24, 1473–1487. [Google Scholar] [CrossRef]
  35. Altini, N.; Marvulli, T.M.; Zito, F.A.; Caputo, M.; Tommasi, S.; Azzariti, A.; Brunetti, A.; Prencipe, B.; Mattioli, E.; De Summa, S.; et al. The Role of Unpaired Image-to-Image Translation for Stain Color Normalization in Colorectal Cancer Histology Classification. Comput. Meth. Programs Biomed. 2023, 234, 107511. [Google Scholar] [CrossRef]
  36. Zhang, Z.; Lu, W.; Chen, S.; Yang, F.; Jingchang, P. Boundary Equilibrium SR: Effective Loss Functions for Single Image Super-Resolution. Appl. Intell. 2023, 53, 17128–17138. [Google Scholar] [CrossRef]
  37. Ledig, C.; Theis, L.; Huszar, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  38. Park, J.; Woo, S.; Lee, J.-Y.; Kweon, I.S. BAM: Bottleneck Attention Module. arXiv 2018, arXiv:1807.06514. [Google Scholar]
Figure 1. Example of the landslide dataset. Each sample has a size of 512 × 512 pixels; samples with richer backgrounds were preferentially selected.
Figure 2. Model training process.
Figure 3. Generator network structure. The generator network employs residual learning to extract feature maps and generate high-resolution (HR) images. The EDCA corresponds to a residual block.
Figure 4. Discriminator network structure. The discriminator network utilizes these feature maps to assess the quality of the generated HR images. The term “Conv” stands for convolutional layers of different kernel sizes.
Figure 5. Residual block. (a) Original. (b) SRResNet. (c) EDCA.
Figure 6. Coordinate Attention block.
Figure 7. Results of EDCA-SRGAN with different numbers of EDCA blocks. (a) PSNR. (b) SSIM.
Figure 8. Comparison of reconstruction indicators. (a) PSNR. (b) SSIM.
Figure 9. Comparison of reconstruction results. We conduct experiments on varied methods. Our proposed EDCA-SRGAN outperforms the others. Parentheses indicate the LPIPS indicator. The smaller the value, the smaller the gap between the restored image and the original image, indicating better restoration quality.
Figure 10. Results of Coordinate Attention ablation study. (a) PSNR. (b) SSIM.
Figure 11. The generator visualization results of the test set before and after improvement. (a) The EDCA-SRGAN generator outputs the results. (b) The generator of the SRGAN outputs the result. Blue is used to indicate low response values, green is used to indicate medium response values, and yellow is used to indicate higher response values. In addition, the use of red in visualization shows very high response values.
Figure 12. The generator visualization results of the training set before and after improvement. (a) The EDCA-SRGAN generator outputs the results. (b) The generator of the SRGAN outputs the result. Blue is used to indicate low response values, green is used to indicate medium response values, and yellow is used to indicate higher response values. In addition, the use of red in visualization shows very high response values.
Figure 13. Perimeter delineation and boundary comparison of landslide. (a) Original. (b) LR. (c) SR.
Table 1. Comparison of evaluation indexes of different methods. For PSNR and SSIM, a higher score means better.

Method       | PSNR (dB) | SSIM    | Block Number | Params Number
Ground truth | –         | 1.00000 | 0            | 0
SRGAN        | 25.46945  | 0.67910 | 10           | 1,103,377
BAM-SRGAN    | 25.11715  | 0.67298 | 10           | 1,118,177
EDCA-SRGAN   | 26.25111  | 0.68343 | 10           | 1,156,497