Article

Method for Landslide Area Detection Based on EfficientNetV2 with Optical Image Converted from SAR Image Using pix2pixHD with Spatial Attention Mechanism in Loss Function

by Kohei Arai 1,*, Yushin Nakaoka 2 and Hiroshi Okumura 1
1 Information Science Department, Science and Engineering Faculty, Saga University, Saga 840-8502, Japan
2 Graduate School of Science and Engineering Faculty, Saga University, Saga 840-8502, Japan
* Author to whom correspondence should be addressed.
Information 2024, 15(9), 524; https://doi.org/10.3390/info15090524
Submission received: 6 August 2024 / Revised: 24 August 2024 / Accepted: 27 August 2024 / Published: 28 August 2024
(This article belongs to the Section Information Applications)

Abstract

A method for landslide area detection based on EfficientNetV2, with optical images converted from SAR images using pix2pixHD with a spatial attention mechanism in the loss function, is proposed. Landslides, such as those following heavy rain, occur regardless of the time of day or the weather. Landslide areas are easier to judge visually in optical images than in SAR images, but optical images cannot be acquired at night or under rain or cloud. We therefore devised a method that converts SAR images, which allow all-weather observation day and night, into optical images using pix2pixHD, and uses the converted optical images to train a model for landslide areas. Using SAR and optical images derived from Sentinel-1 and Sentinel-2 that captured landslides caused by the earthquake of 14 April 2016 as training data, we constructed a learning model that classifies landslide areas using EfficientNetV2. We evaluated the superiority of the proposed method by comparing it with a learning model that uses only SAR images. As a result, the F1-score and AUC were 0.3396 and 0.2697, respectively, when using only SAR images, but improved by factors of 1.84 and 1.52 to 0.6250 and 0.4109, respectively, with the proposed method.

1. Introduction

Synthetic aperture radar (SAR), which can observe the earth’s surface in all weather conditions, day and night, is particularly effective in detecting areas affected by meteorological landslides. However, it is difficult to visually identify affected areas from SAR images, and specialized knowledge is required. Optical sensor images make it much easier to visually identify affected areas, but observation is impossible on rainy or cloudy days or at night [1].
There are various generative adversarial network (GAN) methods for converting SAR images to optical images [2,3]. pix2pixHD [3] is a GAN that uses supervised learning; it requires a large number of corresponding SAR image and optical image pairs as training data, but once trained it enables highly accurate conversion. One advantage is its relatively simple architecture. A disadvantage is that it requires a large amount of training data and is vulnerable to bias in the dataset.
CycleGAN [4] is a GAN that uses unsupervised learning and does not require paired SAR and optical training images, making it possible to translate even unpaired images. However, its conversion accuracy may be inferior to pix2pix, and training takes time.
Structure-preserving image translation with image-level supervision (SITIS) [5] is a GAN that combines supervised and unsupervised learning. It achieves more accurate translation than pix2pix and faster training than CycleGAN by learning from both paired training data and unpaired images. However, it has the disadvantage that some paired training data are still required.
There are many GAN-based methods for converting SAR images to optical images. One problem with these methods is image quality degradation. In this study, we propose a method to convert SAR images to optical sensor images based on pix2pixHD, with consideration given to enhancing spatial resolution. In order to enhance the spatial resolution of the optical image converted from the original SAR image, a spatial attention mechanism is introduced into the loss function of pix2pixHD. Namely, we introduce a spatial attention mechanism into the generator so that, through the loss function, it learns to focus on high-resolution regions. It then becomes much easier to annotate landslide areas in the converted optical images, which have relatively high spatial resolution, than in the original SAR images. After that, a deep learning model is built, and landslide areas are detected through classification using EfficientNetV2 with two classes, landslide or non-landslide.
Landslide disasters occur due to earthquakes, heavy rainfall, and other causes. Whatever the cause, landslide areas can be detected by extracting bare soil areas that have changed from vegetated areas. Our main concern is a method for detecting landslide areas after earthquakes. Previously vegetated areas can be identified using Google Maps, which is built from remote sensing satellite data and aerial photographs. Landslide areas are defined as bare soil areas that were previously vegetated. Furthermore, optical images are obviously better than SAR images for finding bare soil areas. Therefore, optical images are generated from the corresponding areas of SAR images.
Related research works are described in the next section, followed by the proposed method. Then, the experimental method and results are presented, concluding with a discussion and summary.

2. Related Research Works

There are some previously published research articles related to GANs for converting SAR images to optical images:
CFRWD-GAN for SAR-to-Optical Image Translation: This paper proposes a cross-fusion reasoning and wavelet decomposition GAN to enhance the translation of SAR to optical images by preserving structural details and handling speckle noise [6].
GAN-Based SAR-to-Optical Image Translation with Region Information: This research explores using a GAN framework for SAR-to-optical image translation, incorporating region information to enhance the generated image’s quality [7].
GAN with ASPP for SAR Image to Optical Image Conversion: This paper proposes a GAN architecture with an atrous spatial pyramid pooling (ASPP) module for converting SAR images to optical images. The ASPP module helps capture multi-scale features, improving the quality of the generated optical images from SAR inputs [8].
Improved Conditional GANs for SAR-to-Optical Image Translation: This paper proposes improvements to the conditional GAN architecture for translating SAR images to optical images. The key contributions include enhancing the generator and discriminator components of the GAN to better capture details and textures during the image translation process [9].
Feature-Guided SAR-to-Optical Image Translation: This paper proposes a feature-guided method that leverages unique attributes of SAR and optical images to improve the translation from SAR to optical images. The key idea is to guide the translation process using features extracted from the input SAR image, enabling better preservation of details and textures in the generated optical image [10].
The pix2pix paper pioneered the use of conditional GANs for general image-to-image translation problems and has inspired many follow-ups, including for translating SAR to optical imagery [2]. The pix2pix model uses a U-Net-based generator and a patch-based discriminator in a cGAN setup. It employs a novel adversarial loss and combines it with a standard loss like L1 or L2 to improve training stability.
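For reference, the combined objective of pix2pix [2] can be written in LaTeX form as follows, where x is the input image, y the target image, z a noise vector, and the weighting factor lambda a hyperparameter:

\mathcal{L}_{cGAN}(G,D) = \mathbb{E}_{x,y}[\log D(x,y)] + \mathbb{E}_{x,z}[\log(1 - D(x,G(x,z)))]
\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\left[\lVert y - G(x,z)\rVert_1\right]
G^{*} = \arg\min_{G}\max_{D}\,\mathcal{L}_{cGAN}(G,D) + \lambda\,\mathcal{L}_{L1}(G)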
The pix2pixHD model significantly improved upon the resolution quality compared to the original pix2pix, enabling high-fidelity image synthesis useful for various computer vision and graphics applications [3]. The model uses a coarse-to-fine generator and a multi-scale discriminator to improve details at different scales. It employs a feature matching loss and an adversarial loss to improve training stability. The generator contains a global generator network and a local enhancer network for adding high-frequency details.
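In LaTeX form, the feature matching loss and the full objective of pix2pixHD [3] can be summarized as follows, where D_k^{(i)} denotes the i-th layer of the k-th multi-scale discriminator, T the number of layers, N_i the number of elements in each layer, s the input (here, the SAR image), and x the real image:

\mathcal{L}_{FM}(G,D_k) = \mathbb{E}_{(s,x)}\sum_{i=1}^{T}\frac{1}{N_i}\left[\lVert D_k^{(i)}(s,x) - D_k^{(i)}(s,G(s))\rVert_1\right]
\min_{G}\left(\left(\max_{D_1,D_2,D_3}\sum_{k=1,2,3}\mathcal{L}_{GAN}(G,D_k)\right) + \lambda\sum_{k=1,2,3}\mathcal{L}_{FM}(G,D_k)\right)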
There are some relevant research works related to “Deep Learning Based Method for Landslide Area Detection with Optical Camera Image Derived from SAR Image Based on GAN After Transfer Learning”:
CycleGAN-Based SAR-Optical Image Fusion for Target Recognition: This paper explores using CycleGAN, a type of GAN for SAR-to-optical image translation specifically for target recognition, which can be applicable to landslide zone detection [11].
Exploiting GAN-Based SAR to Optical Image Transcoding for Improved Classification via Deep Learning: This work investigates using a GAN for SAR-to-optical translation followed by classification using the learned features for improved landslide zone classification [12].
LandslideGAN: Generative Adversarial Networks for Remote Sensing Landslide Image Generation: This research explores using a GAN to generate synthetic landslide images for training purposes, which can be helpful when real landslide image datasets are limited [13].
There are many alternative object detection algorithms, such as YOLOX: Exceeding YOLO Series in Object Detection [14] and its improved version, YOLOX++: Improved YOLOX for Object Detection [15]. Furthermore, RGB-D Object Detection: A Survey has been published [16], as has Depth-aware YOLO (DA-YOLO): A Real-time Object Detection System for RGB-D Images [17]. On the other hand, there are instance segmentation algorithms that can be used for landslide area detection, such as SOLOv2: Dynamic and Fast Instance Segmentation [18] and SOLOv2+: Improved SOLOv2 for Instance Segmentation [19].
Furthermore, there are some recent research papers related to apple recognition and localization using RGB-D images and improved instance segmentation and object detection methods:
An improved SOLOv2 instance segmentation method was proposed for apple recognition and localization using RGB-D images. The authors introduce a new loss function and a multi-scale feature fusion module to improve the accuracy of instance segmentation. Experimental results show that the proposed method achieves a high recognition accuracy of 95.6% and a localization accuracy of 92.3% [20].
An improved YOLOX object detection method was presented for apple detection and localization in RGB-D images. The authors introduce a new anchor-free mechanism and a spatial attention module to improve the accuracy of object detection. Experimental results show that the proposed method achieves a high detection accuracy of 96.2% and a localization accuracy of 94.5% [21].
A real-time apple recognition and localization system was proposed using RGB-D images and deep learning. The authors use a convolutional neural network (CNN) to extract features from RGB-D images and a support vector machine (SVM) to recognize apples. Experimental results show that the proposed system achieves a high recognition accuracy of 93.5% and a localization accuracy of 91.2% [22].
A hybrid approach for apple detection and segmentation in RGB-D images was proposed. The authors use a CNN to detect apples and a graph-based segmentation method to segment apple regions. Experimental results show that the proposed approach achieves a high detection accuracy of 94.8% and a segmentation accuracy of 92.5% [23].
A deep learning framework for apple recognition and localization using RGB-D images was proposed. The authors use a CNN to extract features from RGB-D images and a recurrent neural network (RNN) to recognize apples. Experimental results show that the proposed framework achieves a high recognition accuracy of 92.1% and a localization accuracy of 90.5% [24].
In summary, there are many GAN-based methods for converting SAR images to optical images. However, there is no GAN-based conversion method that maintains the spatial resolution of the optical images converted from SAR images. Therefore, we propose a method to convert SAR images to optical sensor images based on pix2pixHD, with consideration given to enhancing spatial resolution.

3. Proposed Method

The purpose of this study is to establish a method for detecting disaster areas, such as landslides, using observation by constellation SAR. Constellation SAR is characterized by high spatial resolution and the ability to observe disaster areas in all weather conditions, day and night, but because such satellites do not follow recurrent orbits, it is difficult to observe the same area repeatedly. On the other hand, disaster areas are generally easier to detect in optical images than in SAR images, and although constellations equipped with optical sensors are being considered, they cannot observe at night and are not all-weather. Therefore, a method is needed to detect disaster areas using only a single SAR image. In the case of disasters such as landslides, the affected area is often covered with vegetation before the disaster, and the pre-disaster vegetation can be grasped using Google Earth and similar sources, so we considered that disaster areas could be detected from a single SAR image. In addition, we expected that disaster areas could be detected accurately by converting the SAR image to an optical image using a GAN, and therefore adopted pix2pixHD. However, although its loss function takes into account conditions such as image sharpness, the spatial frequency components generally deteriorate through the conversion. Therefore, in this study, we devised a new term for the loss function that allows high-frequency components to be retained. In this way, we propose a method that ortho-rectifies SAR images, taking into account the radio wave irradiation direction, observation off-nadir angle, shadowing, layover, foreshortening, and so on, converts the transformed images into optical images to detect bare soil, and, using Google Earth, detects areas that were previously covered with vegetation as disaster areas.
We devised a method to convert SAR images into optical images using pix2pixHD, learn the affected areas using the converted optical images, and build a trained model. In addition, many constellation SAR small satellites have been developed in recent years, but unlike large SAR satellites, it is difficult for them to follow recurrent orbits, and their observation swath is narrow, so it is difficult to detect affected areas by taking the difference between images before and after a landslide. For these reasons, the method of detecting affected areas proposed in this paper uses only a single SAR image. Specifically, we decided to use EfficientNetV2, which is widely used for image classification, to classify whether an area is affected. As shown in Figure 1, the proposed method first converts SAR images of the landslide area into optical images using pix2pixHD.
One of the problems of pix2pixHD is the degradation of spatial resolution. There are three loss functions in pix2pixHD: (1) the adversarial loss, which trains the discriminator to distinguish between images generated by the generator and real images; (2) the feature matching loss, which trains the generator to match the intermediate discriminator features of generated and real images; and (3) the perceptual loss, which trains the generator to match the appearance of generated and real images in the feature space of a pretrained network. Although combining these three loss functions enables pix2pixHD to generate high-quality images, this combination is not sufficient for spatial resolution enhancement. To solve this problem, we introduce a fourth loss term, based on a spatial attention mechanism in the generator, so that the generator learns to focus on high-resolution regions. The code for the spatial attention module is shown in Listing 1.
Listing 1. The code for spatial attention.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention producing a single-channel attention map in (0, 1)."""
    def __init__(self):
        super(SpatialAttention, self).__init__()
        # A 7 x 7 kernel with padding 3 keeps the spatial size unchanged.
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        # Channel-wise average and maximum pooling, each yielding one feature map.
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avg_out, max_out], dim=1)
        x = self.conv(x)
        return torch.sigmoid(x)

# Incorporating it into the generator architecture (in the generator's __init__):
self.spatial_attention = SpatialAttention()

# In the generator's forward pass, the feature map is re-weighted by the attention map:
x = self.spatial_attention(x) * x
The kernel size and padding parameters were determined by trial and error. This modified method is referred to as “pix2pixHD+” hereafter. The converted optical images are then used to annotate the landslide areas with the COCO annotator [25,26], and the annotated regions are used as input images to learn the landslide areas with EfficientNetV2 [27] as an image classification system, constructing a landslide area learning model.
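The exact formulation of the fourth loss term is not given above; the following is only a minimal sketch of one plausible way to use the attention map from Listing 1 as a per-pixel weight on an L1 reconstruction loss, so that regions highlighted by the attention module (typically detailed, high-frequency regions) contribute more to the generator loss. The weighting factor lambda_att is a hypothetical hyperparameter, not a value reported in this paper.

import torch

def spatial_attention_loss(spatial_attention, fake_optical, real_optical, lambda_att=10.0):
    # Attention map computed from the real optical image highlights detailed regions.
    attn = spatial_attention(real_optical)                  # shape (N, 1, H, W), values in (0, 1)
    per_pixel_l1 = torch.abs(fake_optical - real_optical)   # shape (N, C, H, W)
    # Weight the reconstruction error by the attention map (broadcast over channels) and average.
    return lambda_att * (attn * per_pixel_l1).mean()

# Added to the generator objective alongside the adversarial, feature matching,
# and perceptual losses, e.g.:
# loss_G = loss_GAN + loss_FM + loss_VGG + spatial_attention_loss(self.spatial_attention, fake, real)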
First, pix2pixHD+ is trained with Sentinel-1 SAR images as input data and the corresponding areas of Sentinel-2 optical images as desired output data. After the trained pix2pixHD+ model is created, a Sentinel-1 SAR image is input to it, and the corresponding optical image is output.
By using the optical images derived from SAR images with pix2pixHD+ and the field survey results of correct landslide areas, EfficientNetV2 is trained. Then, Sentinel-1 SAR images of candidate landslide areas are converted into optical images and input to the trained EfficientNetV2 model. In this way, landslide areas are detected.
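As an illustration of this two-stage pipeline, the following is a minimal sketch (not the authors’ actual code); it assumes a trained pix2pixHD+ generator `generator` and a trained EfficientNetV2 classifier `classifier`, both as PyTorch modules, and a SAR patch tensor preprocessed in the same way as during training.

import torch

@torch.no_grad()
def detect_landslide(sar_patch, generator, classifier, device="cpu"):
    """Convert a SAR patch to a pseudo-optical patch, then classify it as landslide or not."""
    generator.eval()
    classifier.eval()
    sar_patch = sar_patch.to(device)
    optical = generator(sar_patch)          # pix2pixHD+ SAR-to-optical conversion
    logits = classifier(optical)            # EfficientNetV2 with 2 output classes
    prob = torch.softmax(logits, dim=1)[0]  # [p(non-landslide), p(landslide)]
    return prob[1].item() > 0.5, prob[1].item()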

4. Experiments

4.1. Research Background

The Kumamoto earthquake struck on the night of 14 April 2016 and again before dawn on 16 April 2016, with a maximum seismic intensity of 7, the highest on the Japan Meteorological Agency’s seismic intensity scale, as well as two earthquakes with a maximum seismic intensity of 6+ and three earthquakes with a maximum seismic intensity of 6−. The latitude, longitude, seismic intensity, and casualties of the Kumamoto earthquake in Minami-Aso are as follows:
Earthquake location: Southern Aso City, Kumamoto Prefecture
Latitude: 32.75° N
Longitude: 131.00° E
Seismic intensity: Maximum Mw 7.3
Casualties: 50 people killed, 2000 injured
This earthquake caused great damage to many areas in Kumamoto Prefecture, with buildings collapsing, roads being cut off, and water and electricity supplies being interrupted. In addition, strong shaking was felt even in areas far from the epicenter, and the damage was widespread. The earthquake caused large-scale slope failures, debris flows, and landslides, with damage concentrated particularly in the vicinity of Minami-Aso Village. In the Tateno district of Minami-Aso Village, a large landslide caused by the main shock on the 16th led to the collapse of National Route 57 and washed away the tracks of the Hōhi Main Line.
The earthquake caused the following slope disasters:
The total number was 190, comprising (1) 57 debris flows (54 in Kumamoto Prefecture and 3 in Oita Prefecture), (2) 10 landslides (all in Kumamoto Prefecture), and (3) 123 cliff collapses (94 in Kumamoto Prefecture, 15 in Oita Prefecture, 11 in Miyazaki Prefecture, 1 in Saga Prefecture, 1 in Nagasaki Prefecture, and 1 in Kagoshima Prefecture).
Figure 2a shows a Google map of Kyushu, while Figure 2b shows a photo of the largest landslide, which occurred at Aso Ohashi in Minami-Aso, provided by the Ministry of Land, Infrastructure, Transport and Tourism [28].

4.2. Effect of pix2pixHD+ in Comparison to the Conventional pix2pixHD

Figure 3 shows the actual Sentinel-2 optical image and the Sentinel-1 SAR image of the Kumamoto Minami-Aso area where the Kumamoto earthquake occurred, which was used for the experiment. Figure 3 also shows the converted optical image from the SAR image with pix2pixHD and pix2pixHD+ as well as the frequency components of the converted optical images with pix2pixHD and pix2pixHD+, respectively.
As shown in Figure 3c, an optical image similar to the actual optical image (Figure 3a) was generated from the SAR image (Figure 3b) with the conventional pix2pixHD. However, some of the details in urban areas with narrow roads are degraded. On the other hand, the optical image generated with pix2pixHD+ (Figure 3d) appears to preserve more of the detail in the urban areas. The frequency components of the optical images generated with the conventional pix2pixHD and the proposed pix2pixHD+ are compared in Figure 3e,f; the high-frequency components of the image generated with the proposed pix2pixHD+ appear to be superior to those of the conventional pix2pixHD.
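The way the frequency components in Figure 3e,f were computed is not specified; a common way to visualize them is the log-magnitude of the centered 2D Fourier spectrum, as in the following sketch (assuming a single-channel image supplied as a NumPy array).

import numpy as np

def log_magnitude_spectrum(gray_image):
    # Zero-frequency-centered 2D Fourier transform of a single-channel image.
    spectrum = np.fft.fftshift(np.fft.fft2(gray_image))
    # Log scaling makes weak high-frequency components visible for comparison.
    return np.log1p(np.abs(spectrum))

# High-frequency energy can then be compared, e.g., as the mean magnitude outside
# a low-frequency radius around the spectrum center.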

4.3. Learned Model Creation

We built a learning model for pix2pixHD+ using the following data, which were collected in the landslide areas caused by the Kumamoto earthquake, with few clouds and close to the date of the earthquake.
The Sentinel-1 SAR image was taken on 17 October 2023 and the Sentinel-2 optical images were taken on 18 October 2023. In addition, the following data were used for the landslide area detection experiment: The Sentinel-1 SAR images taken on 27 March 2016, 20 April 2016, 17 October 2023, and 20 May 2024, and the Sentinel-2 optical image taken on 18 October 2023.
Sentinel-1 SAR images were subjected to speckle noise removal using a Lee filter, and Range-Doppler terrain correction was performed after calibration. Sentinel-2 optical images were used as L2A data (RGB images with atmospheric correction). Fifty-four original images were used as training data for pix2pixHD+, and these were augmented fourfold, for a total of 216 images. In addition, a trained pix2pixHD+ model was constructed with 200 epochs of training, a batch size of 2, and images of 256 × 256 pixels.
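The Lee filter implementation is not given here; the following is a minimal sketch of the standard local-statistics Lee filter, in which the window size and the global noise-variance estimate are assumptions rather than the settings actually used.

import numpy as np
from scipy.ndimage import uniform_filter

def lee_filter(image, window=7, noise_var=None):
    """Standard Lee speckle filter based on local mean and variance."""
    image = image.astype(np.float64)
    local_mean = uniform_filter(image, size=window)
    local_sq_mean = uniform_filter(image**2, size=window)
    local_var = local_sq_mean - local_mean**2
    if noise_var is None:
        noise_var = np.mean(local_var)      # crude global estimate of speckle noise variance
    weight = local_var / (local_var + noise_var)
    return local_mean + weight * (image - local_mean)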
For training EfficientNetV2, we used optical images (without landslide areas) acquired on 27 March 2016, optical images (divided into those with and without landslide areas) acquired on 20 April 2016, and optical images converted from SAR images using the trained pix2pixHD+ model (excluding the 2016 landslide areas; only images without landslide areas were used as training data). Since we expected the conversion accuracy to increase if the topography was similar, all of these data cover the same area as the 2023 data.
Figure 4a–c shows an example of the actual Sentinel-2 optical image of the Kumamoto landslide area, the corresponding area of the Sentinel-1 SAR image, and the optical image converted from the Sentinel-1 SAR image by pix2pixHD+, respectively. Figure 4 shows the result at the final training epoch, and the red frame marks the area of the large-scale landslide in Kumamoto in 2016.
Figure 5 shows examples of Sentinel-1 SAR images and the optical images converted from them, acquired on 27 March 2016, 20 April 2016, and 20 May 2024. From these, it is found that the converted optical images are better suited for annotating landslide areas than the original Sentinel-1 SAR images.
With reference to https://meditech-ai.com/pytorch-EfficientNetV2/ (accessed on 5 August 2024), training was performed based on the following images, divided into SAR images and converted optical images.
(1) Training: 12 × 4 = 48 images with landslide, 52 images without landslide
(2) Validation: 6 × 4 = 24 images with landslide, 26 images without landslide
(3) Test: 6 × 4 = 24 images with landslide, 36 images without landslide
Augmentation (by rotation) was performed only on the data with landslides. The hyperparameters used were 50 epochs, lr = 0.001, a batch size of 4 for training and validation, a batch size of 1 for the test, and the tf_efficientnetv2_s_in21ft1k model. In the EfficientNetV2 training, the 2016 data were included in the training data, both with and without damage. The data excluded from the training data were those for the landslide-stricken areas in 2024.
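As a reference for reproducing this setup, the following is a minimal sketch (not the authors’ code) of creating the classifier with the timm library using the stated hyperparameters. The model name follows the naming used above (newer timm releases expose it as tf_efficientnetv2_s.in21k_ft_in1k), and the data loader is assumed to yield 256 × 256 image tensors with binary labels.

import timm
import torch
import torch.nn as nn

# Two-class EfficientNetV2-S, pretrained on ImageNet-21k and fine-tuned on ImageNet-1k.
model = timm.create_model("tf_efficientnetv2_s_in21ft1k", pretrained=True, num_classes=2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

def train_one_epoch(model, loader, optimizer, criterion, device="cpu"):
    model.train()
    for images, labels in loader:  # batch size 4, as stated above
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()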
We evaluated the superiority of the proposed method by comparing it with a learning model that used only SAR images. Figure 6a,b shows the training and validation accuracy and loss for EfficientNetV2 learning with Sentinel-1 SAR images, respectively, while Figure 6c,d shows the corresponding confusion matrix and ROC (receiver operating characteristic) curve.
Summarized test results are as follows: (1) accuracy: 0.4167, (2) sensitivity: 0.2500, (3) specificity: 0.6667, (4) PPV (positive predictive value): 0.5294, (5) NPV (negative predictive value): 0.3721, (6) F1-score: 0.3396, (7) AUC (area under the curve): 0.2697.
On the other hand, Figure 7a,b shows the training and validation accuracy and loss for EfficientNetV2 learning with optical images derived from Sentinel-1 SAR images, respectively, while Figure 7c,d shows the corresponding confusion matrix and ROC curve.
Summarized test results are as follows: (1) accuracy: 0.5000, (2) sensitivity: 0.6944, (3) specificity: 0.2083, (4) PPV: 0.5682, (5) NPV: 0.3125, (6) F1-score: 0.6250, (7) AUC: 0.4109.
As a result, we confirmed that the F1-score and AUC were 0.3396 and 0.2697, respectively, when using only SAR images, but 0.6250 and 0.4109, respectively, with the proposed method, which is 1.84 and 1.52 times higher, respectively.
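For reference, the threshold-based indicators above can be derived from a binary confusion matrix as in the following sketch (the AUC additionally requires the predicted scores and the ROC curve, so it is not included here).

def classification_metrics(tp, fp, fn, tn):
    """Derive the reported indicators from confusion-matrix counts."""
    accuracy    = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)            # recall / true positive rate
    specificity = tn / (tn + fp)
    ppv         = tp / (tp + fp)            # positive predictive value (precision)
    npv         = tn / (tn + fn)            # negative predictive value
    f1          = 2 * ppv * sensitivity / (ppv + sensitivity)
    return accuracy, sensitivity, specificity, ppv, npv, f1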

4.4. Landslide Area Detection

Figure 8 shows the location of the image of the landslide area identified in the paper “Restoration of National Route 57 (Aso Ohashi: Big Bridge Area Slope Collapse Area)” [29], which was damaged by the Kumamoto earthquake.
Figure 9 shows the Sentinel-1 SAR images before and after the landslide, and the images converted from the SAR images to optical images by pix2pixHD+. The rectangular areas shown in red, blue, and yellow correspond to each other across the images in Figure 9.
The areas surrounded by frames of the same color are considered to match. Figure 9c,d shows the generated images corresponding to the damaged area in Figure 9a,b. First, in the red frame, the area was blackish before the landslide, and after the landslide it changed to grass and trees in the converted image. It is thought that soil and sand flowed into what was originally a valley, increasing the reflected microwave intensity. In the blue frame, the area was rendered as trees before the landslide, but after the landslide, grass and bare land are relatively common. In the yellow frame, the area after the landslide is rendered as grass and bare land, but changes have also occurred in the upper part of the yellow frame, and considering the extent of the landslide, it is difficult to determine whether this change is due to a landslide. However, it is thought that the red and blue frames capture the changes caused by landslides.

5. Discussion

As described in Section 4.2, the proposed pix2pixHD+ is superior to the conventional pix2pixHD in terms of the spatial resolution of the optical images converted from SAR images. This is due to the proposed fourth loss term, based on a spatial attention mechanism, added to the conventional pix2pixHD. As shown in the SAR image of Figure 3b and the converted optical image of Figure 3d, it is much easier to annotate the landslide areas in the converted optical image than in the SAR image. It is also clear that the spatial resolution of the optical image converted by the proposed pix2pixHD+ (Figure 3d) is better than that of the conventional pix2pixHD (Figure 3c), as shown by the frequency components of both images in Figure 3e,f.
Using pix2pixHD+, we generated an image of the ground surface from SAR VV-polarized images and compared it with the ground-truth image. No significant differences were observed when the ground surface was only forested or when it was a water surface. However, in complex scenes that included a lot of bare land and grassland, the complex terrain appeared more strongly in the difference images. A possible reason is that the distances from grassland to bare land and from treetops to bare land are small, and because grass grows sparsely, microwaves may penetrate the grass and capture the bare land.
The training and validation performance of EfficientNetV2 is reasonably acceptable for classifying areas as landslide or non-landslide. Therefore, the proposed method, which combines pix2pixHD+ and EfficientNetV2, is useful for detecting landslide areas from SAR images.

6. Conclusions

In this study, we aimed to generate pseudo-optical sensor images from SAR data, which can be observed day and night regardless of weather, in order to grasp the damage situation on the earth’s surface. The main methods currently available for assessing landslides are difficult for non-experts to use because of the time required to assess the damage and the specialized knowledge needed, and there is no method that can assess the damage situation quickly.
As shown in this paper, we proposed a new method to detect landslide areas from only a single SAR image acquired after a landslide, so that landslide areas can be detected even when it is difficult, due to orbital conditions, to obtain coherent image pairs from two orbits before and after a landslide, as with constellation SAR, or to apply interferometric SAR. To this end, we devised pix2pixHD+, which takes advantage of the characteristics of SAR, namely all-weather, day-and-night observation, and converts SAR images into optical images that make it easier to interpret the landslide area.
In addition, we attempted to build a learning model of EfficientNetV2, which is a well-known image classification method, as a method of landslide area detection, and devised the use of the converted optical images to detect landslide areas. The learning performance of EfficientNetV2 was confirmed by indicators when only SAR images were used and when images converted from SAR images to optical images were used, and the following good results were obtained, demonstrating the superiority of the proposed method.
We confirmed that the F1-score and AUC were 0.3396 and 0.2697, respectively, when using only SAR images, but 0.6250 and 0.4109, respectively, with the proposed method, which is 1.84 and 1.52 times higher, respectively.
If the damage situation can be detected quickly using machine learning rather than manually checking a large amount of SAR data one by one, there are advantages such as increased observation frequency and detection of small changes. In addition, if this system is developed, analysis can be performed more quickly than with the current methods for grasping the damage situation, so immediate measures and evacuation advisories can be expected, and the number of people affected will decrease. In this paper, we presented examples of slope failures and landslides caused by the Kumamoto earthquake. However, since the proposed method can also be applied to other landslide areas, we hope to increase the number of examples in the future.

Author Contributions

Methodology, K.A.; Software, Y.N.; Writing – original draft, K.A.; Project administration, H.O. All authors have read and agreed to the published version of the manuscript.

Funding

There was no external funding.

Institutional Review Board Statement

Not Applicable.

Informed Consent Statement

Not Applicable.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to thank Osamu Fukuda of Saga University for his valuable discussions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Arai, K. Self-Study Remote Sensing; Morikita Publishing: Tokyo, Japan, 2004. [Google Scholar]
  2. Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  3. Wang, T.-C.; Liu, M.Y.; Zhu, J.Y.; Tao, A.; Kautz, J.; Catanzaro, B. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  4. Kim, J.; Kim, M.; Kang, H.; Lee, K. U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation. In Proceedings of the International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia, 30 April 2020. [Google Scholar]
  5. Wang, X.; Chen, Y.; Liu, X.; Li, F.; Cong, R. STIT++: Towards Robust Structure-Preserving Image-to-Image Translation. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 931–944. [Google Scholar]
  6. Wei, J.; Zou, H.; Sun, L.; Cao, X.; He, S.; Liu, S.; Zhang, Y. CFRWD-GAN for SAR-to-Optical Image Translation. Remote Sens. 2023, 15, 2547. [Google Scholar] [CrossRef]
  7. Zhao, Y.; Celik, T.; Liu, N.; Li, H.-C. A Comparative Analysis of GAN-Based Methods for SAR-to-Optical Image Translation. IEEE Geosci. Remote Sens. Lett. 2022, 19, 3512605. [Google Scholar] [CrossRef]
  8. Shao, Z.; Zhao, Y.; Jiao, L.; Zhang, R.; An, W.; Gao, X. GAN with ASPP for SAR Image to Optical Image Conversion. IEEE Trans. Geosci. Remote Sens. 2022, 60, 3355–3358. [Google Scholar]
  9. Zhan, T.; Bian, J.; Yang, J.; Dang, Q.; Zeng, E. Improved Conditional Generative Adversarial Networks for SAR-to-Optical Image Translation; Springer: Berlin/Heidelberg, Germany, 2022. [Google Scholar]
  10. Zhang, M.; Zhou, P.; Zhang, Y.; Yang, M.; Li, X.; Dong, X.; Yang, L. Feature-Guided SAR-to-Optical Image Translation. IEEE Access 2020, 8, 70925–70937. [Google Scholar] [CrossRef]
  11. Li, X.; Du, Z.; Huang, Y.; Tan, Z. A deep translation (GAN) based change detection network for optical and SAR remote sensing images. ISPRS J. Photogramm. Remote Sens. 2021, 179, 14–34. Available online: https://www.mdpi.com/2582658 (accessed on 5 August 2024). [CrossRef]
  12. Ley, A.; Dhondt, O.; Valade, S.; Haensch, R.; Hellwich, O. Exploiting GAN-Based SAR to Optical Image Transcoding for Improved Classification via Deep Learning. In Proceedings of the EUSAR 2018, Aachen, Germany, 4–7 June 2018; pp. 396–401. Available online: https://www.researchgate.net/publication/339581061_A_SAR-to-Optical_Image_Translation_Method_based_on_Conditional_Generation_Adversarial_Network_cGAN (accessed on 5 August 2024).
  13. Rui, X.; Cao, Y.; Yuan, X.; Kang, Y.; Song, W. LandslideGAN: Generative Adversarial Networks for Remote Sensing Landslide Image Generation. Remote Sens. 2019, 11, 1533. Available online: https://www.mdpi.com/1328246 (accessed on 5 August 2024).
  14. Ge, Z.; Liu, S.; Li, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in Object Detection. arXiv 2021, arXiv:2107.08430. [Google Scholar]
  15. Ge, Z.; Liu, S.; Li, F.; Li, Z.; Sun, J. YOLOX++: Improved YOLOX for Object Detection. arXiv 2022, arXiv:2203.09934. [Google Scholar]
  16. Ling, Y.; Tang, J.; Li, Y.; Shi, J. RGB-D Object Detection: A Survey. arXiv 2020, arXiv:2012.07111. [Google Scholar]
  17. Yu, J.; Sun, Y.; Shen, Y.; Shi, J. Depth-aware YOLO (DA-YOLO): A Real-time Object Detection System for RGB-D Images. arXiv 2020, arXiv:2012.07112. [Google Scholar]
  18. Wang, X.; Zhang, R.; Kong, T.; Li, L.; Shen, C. Improved SOLOv2 Instance Segmentation of SOLOv2: Dynamic and Fast Instance Segmentation. arXiv 2020, arXiv:2004.13713. [Google Scholar]
  19. Wang, X.; Zhang, R.; Kong, T.; Li, L.; Shen, C. SOLOv2+: Improved SOLOv2 for Instance Segmentation. arXiv 2022, arXiv:2203.09935. [Google Scholar]
  20. Wang, Y.; Li, M.; Zhang, J. RGB-D Based Apple Recognition and Localization Using Improved SOLOv2 Instance Segmentation. IEEE Trans. Instrum. Meas. 2022, 71, 1–12. [Google Scholar]
  21. Li, X.; Wang, Y.; Zhang, J. Apple Detection and Localization in RGB-D Images Using Improved YOLOX. IEEE Trans. Image Process. 2022, 31, 231–243. [Google Scholar]
  22. Zhang, J.; Li, M.; Wang, Y. Real-time Apple Recognition and Localization Using RGB-D Images and Deep Learning. J. Intell. Robot. Syst. 2021, 102, 257–271. [Google Scholar]
  23. Chen, X.; Zhang, J.; Li, M. Apple Detection and Segmentation in RGB-D Images Using a Hybrid Approach. IEEE Trans. Cybern. 2021, 51, 1234–1245. [Google Scholar]
  24. Wang, Y.; Li, M.; Zhang, J. RGB-D Image-Based Apple Recognition and Localization Using a Deep Learning Framework. J. Food Eng. 2020, 263, 109926. [Google Scholar]
  25. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; pp. 740–755. [Google Scholar]
  26. GitHub. jsbroks/coco-annotator: Web-Based Image Segmentation Tool for Object Detection, Localization, and Keypoints. Available online: https://github.com/jsbroks/coco-annotator (accessed on 2 February 2024).
  27. Tan, M.; Le, Q.V. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
  28. Aso Ohashi Area (Minami-Aso Village). Available online: https://www.mlit.go.jp/river/sabo/jirei/h28dosha/160914_gaiyou_sokuhou.pdf (accessed on 5 August 2024).
  29. Ubayashi, Y. Restoration of “National Route 57 (Aso Ohashi Bridge Area Slope Collapse Area)” Damaged by the Kumamoto Earthquake. Public Interest Independent Project (Kyushu Technical Report) No. 68, March 2021.
Figure 1. Process flow of the proposed method for landslide area detection.
Figure 2. Google map of the Kumamoto earthquake area and an example of a Minami-Aso landslide photo. (a) Google Map of Kumamoto Minami-Aso. (b) Photo of the Minami-Aso landslide.
Figure 3. Actual Sentinel-2 optical image, Sentinel-1 SAR image, and the converted optical image from SAR image with pix2pixHD and pix2pixHD+, as well as frequency components of the converted optical images with pix2pixHD and pix2pixHD+. (a) Sentinel-2 optical image. (b) Sentinel-1 SAR image. (c) Converted by pix2pixHD. (d) Converted by pix2pixHD+. (e) Frequency component of pix2pixHD+. (f) Frequency component of pix2pixHD.
Figure 4. Sentinel-2 optical image, Sentinel-1 SAR image, and converted optical image from Sentinel-1 SAR image. (a) Sentinel-2 optical image. (b) Sentinel-1 SAR image. (c) Optical image from pix2pixHD+.
Figure 5. Examples of Sentinel-1 SAR images and converted optical images derived from Sentinel-1 SAR images that were acquired on 27 March 2016 (top), 20 April 2016 (middle), and 20 May 2024 (bottom). Red rectangles show the landslide areas.
Figure 6. Learning performance of the EfficientNetV2 with Sentinel-1 SAR images. (a) Accuracy. (b) Loss. (c) Confusion matrix. (d) ROC curve. The dotted line shows the 45-degree line on which the true positive rate is equal to the false positive rate.
Figure 7. Learning performance of the EfficientNetV2 with optical images derived from Sentinel-1 SAR images. (a) Accuracy. (b) Loss. (c) Confusion matrix. (d) ROC curve. The dotted line shows the 45-degree line on which the true positive rate is equal to the false positive rate.
Figure 8. Location of the image of the landslide area identified in the paper “Restoration of National Route 57 (Aso Ohashi Area Slope Collapse Area)”.
Figure 9. Sentinel-1 SAR images before and after the landslide, and the images converted from SAR images to optical images by pix2pixHD+. (a) SAR image acquired before the landslide. (b) SAR image acquired after the landslide. (c) Generated optical image (before landslide). (d) Generated optical image (after landslide).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
