1. Introduction
Remote sensing (RS) image change detection refers to using multi-temporal RS images, together with other auxiliary data covering the same surface area, to determine and analyze surface changes. Using high-resolution RS images for regional change detection makes it possible to obtain relevant change information intuitively, quickly, and accurately, and to grasp its development trend. The technology is widely applied in fields such as dynamic monitoring of forest and vegetation [1,2], land use and cover change analysis [3,4], and natural disaster assessment [5,6].
According to the basic processing unit, traditional RS change detection algorithms can be divided into two categories: pixel-based and object-based methods [7]. Pixel-based methods first generate a difference image by image differencing [8], principal component analysis (PCA) [9], change vector analysis (CVA) [10], or other techniques, and then apply threshold-based or cluster-based methods [9,11,12] to produce the change detection results. Object-based methods first extract features from the bi-temporal images, partition them into semantic objects, and then analyze the differences between the bi-temporal images [13,14,15,16,17]. However, with the development of Earth observation technology, more advanced sensors offer higher spatial resolutions and larger detection ranges, and traditional algorithms struggle to mine the latent relationships in such huge volumes of high-resolution imagery. In addition, RS change detection must contend not only with genuinely changed objects but also with pseudo-changes caused by interfering factors such as illumination and seasons [18], which poses great challenges. These factors make traditional methods unable to meet application needs. Many traditional change detection methods [9,10,11,12,13,14,15,16,17,19,20,21,22] rely on handcrafted features to extract change maps; the manual feature extraction process is cumbersome, and these methods show poor robustness in complex scenarios.
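To make the pixel-based pipeline described above concrete, the following is a minimal numpy sketch of CVA-style differencing followed by a simple global threshold. The toy images, the `k = 1.0` threshold rule, and all function names are illustrative assumptions, not the specific algorithms of the cited works, which use more sophisticated thresholding and clustering.

```python
import numpy as np

def cva_magnitude(img_t1: np.ndarray, img_t2: np.ndarray) -> np.ndarray:
    """Change vector analysis: per-pixel Euclidean norm of the spectral
    difference vector between two co-registered (H, W, bands) images."""
    diff = img_t2.astype(np.float64) - img_t1.astype(np.float64)
    return np.sqrt((diff ** 2).sum(axis=-1))

def threshold_change(magnitude: np.ndarray, k: float = 1.0) -> np.ndarray:
    """Toy global threshold (mean + k * std) on the difference magnitude;
    pixels above the threshold are labeled as changed."""
    t = magnitude.mean() + k * magnitude.std()
    return magnitude > t

# Toy bi-temporal "images": a 3-band patch where one corner changes.
t1 = np.zeros((8, 8, 3))
t2 = t1.copy()
t2[:2, :2, :] = 10.0  # simulated change in the top-left corner
mask = threshold_change(cva_magnitude(t1, t2))
print(mask[:2, :2].all(), mask[4:, 4:].any())  # changed corner vs. stable area
```

The sketch also illustrates why such methods are fragile: the global threshold responds to any spectral difference, so illumination or seasonal pseudo-changes would be flagged just like real changes.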
In recent years, deep learning has achieved admirable results in computer vision tasks such as object detection [23,24,25], semantic segmentation [26,27,28], and image classification [29,30,31]. Many researchers have adapted deep semantic segmentation models to pixel-level RS change detection and obtained remarkable performance; this strategy has gradually become the mainstream approach. Deep-learning-based pixel-level RS change detection models mainly follow two structures [32]. The first is the single-branch structure, which concatenates the bi-temporal images directly as the input of the model. Based on U-Net [33], Daudt et al. [34] proposed the Fully Convolutional Early Fusion (FC-EF) network, which first concatenates the bi-temporal images as the input, then uses skip connections to prevent the loss of detail caused by the convolution and pooling layers of the deep network, and finally obtains the change map through a softmax layer. Unet++_MSOF [35], improved from Unet++ [36], likewise concatenates the bi-temporal images as the input and combines a deep supervision strategy. The second is the double-branch structure with shared weights, also known as the Siamese structure [37]. The Siamese structure uses the same feature extractor to generate bi-temporal feature maps, computes the distance between corresponding features to measure their similarity, and thereby obtains a change intensity map from which the change detection results are derived. Chen et al. [38] used VGG [29] and ResNet [30] as feature extractors and introduced a dual attention mechanism (DAM) [39] to address the pseudo-change problem. Fang et al. [18] extended Unet++ into a Siamese structure and introduced an Ensemble Channel Attention Module (ECAM) to aggregate and refine deeply supervised features across multiple semantic levels, achieving excellent results.
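The Siamese principle of shared weights plus a feature distance can be sketched in a few lines of numpy. Here a single random linear projection stands in for the shared CNN encoder; the shapes, seed, and function names are illustrative assumptions and do not correspond to any specific model cited above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Shared-weight "encoder": one projection matrix applied to both temporal
# inputs, standing in for a CNN feature extractor with tied weights.
W = rng.standard_normal((3, 16))

def encode(img: np.ndarray) -> np.ndarray:
    """Project each pixel's 3 bands into a 16-d feature space (H, W, 16)."""
    return img @ W

def change_intensity(f1: np.ndarray, f2: np.ndarray) -> np.ndarray:
    """Per-pixel Euclidean distance between corresponding features,
    i.e., the change intensity map of a Siamese pipeline."""
    return np.linalg.norm(f1 - f2, axis=-1)

t1 = rng.standard_normal((8, 8, 3))
t2 = t1.copy()
t2[0, 0] += 5.0  # inject a change at a single pixel
intensity = change_intensity(encode(t1), encode(t2))
print(intensity.argmax())  # flattened index of the most-changed pixel
```

Because the two branches share weights, identical inputs map to identical features and yield zero distance, so only genuinely differing pixels light up in the intensity map.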
The above deep-learning-based methods have achieved good results, but some common problems remain. In the single-branch structure, the spatial features of the bi-temporal images become entangled and misaligned [18], which degrades the model during feature fusion. In contrast, the Siamese structure extracts features from the bi-temporal images separately before fusing them to generate change detection results; we therefore adopt the Siamese structure to avoid this shortcoming of the single-branch structure. Because RS images contain complex scenes and abundant objects, bi-temporal images are usually affected by factors such as positional deviation, lighting conditions, and seasonal changes [32]. These complex factors make changed areas with weak intensity difficult to detect. However, most Siamese models pay little attention to this vulnerable change information, and successive down-sampling and convolution operations erode these changed areas, often leading to missing small objects, incomplete objects, and rough object edges in the results. In addition, existing RS change detection models do not fully exploit feature differences.
To deal with the above problems, we propose a novel approach called the Siamese Multi-Scale Difference-Enhancement Network (SMD-Net), which uses multi-scale feature difference maps to enhance the information of the changed areas. Previous studies [26,33,36] have shown that the low-level features of a neural network (NN) contain fine-grained positioning information, such as the location and texture of RS objects, while high-level features contain coarse-grained semantic information, such as the extent of RS objects, land properties, and semantics. Based on this view, we propose a Siamese residual multi-kernel pooling module (SRMP) for high-level features, which provides high-level change information to strengthen the semantic discrimination of changed areas; SRMP helps the model improve object completeness and the detection of small objects. For the low-level features carried by the multiple skip connections, we propose a feature difference module (FDM) that provides multi-scale detailed change information to enhance the local details of the changed areas and improves performance at object edges. During feature fusion in the decoder, the model combines the feature change maps of different scales and depths provided by SRMP and FDM to enhance the change information step by step. Together, SRMP and FDM increase the model's discriminative ability in the changed areas, leading to better performance there.
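The multi-scale difference idea behind FDM can be illustrated with a small numpy sketch: compute absolute feature differences at several resolutions, so both fine edge detail and coarser context are represented. This is illustrative only, assuming simple average pooling; the actual FDM operates on learned convolutional features.

```python
import numpy as np

def downsample(x: np.ndarray) -> np.ndarray:
    """2x average pooling over an (H, W, C) feature map."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def multiscale_differences(f1: np.ndarray, f2: np.ndarray, levels: int = 3):
    """Absolute feature differences at several scales, in the spirit of a
    feature difference module: each successive level halves the resolution."""
    diffs = []
    for _ in range(levels):
        diffs.append(np.abs(f1 - f2))
        f1, f2 = downsample(f1), downsample(f2)
    return diffs

f1 = np.zeros((16, 16, 4))
f2 = f1.copy()
f2[:4, :4] = 1.0  # a changed region in the toy feature maps
diffs = multiscale_differences(f1, f2)
print([d.shape for d in diffs])  # [(16, 16, 4), (8, 8, 4), (4, 4, 4)]
```

The finest level preserves edge-accurate differences, while the coarser levels summarize the changed region, which is the division of labor the decoder exploits.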
In the next two subsections, we discuss related work on remote sensing change detection, our motivation, and the main contributions of this paper.
1.1. Related Work and Motivation
In this subsection, we briefly review existing RS change detection methods.
Owing to the excellent ability of U-shaped networks to fuse low-level and high-level features, as well as their lightweight architecture, they have been adapted for RS change detection tasks. Papadomanolaki et al. [40] proposed a novel method for urban change detection that combines a U-shaped network for feature representation with powerful recurrent networks for temporal modeling. To deal with the limitation that supervised change detection requires large labeled datasets, Peng et al. [41] introduced a semi-supervised network (SemiCDNet) built on a generative adversarial network (GAN).
However, when a single-branch structure extracts feature maps in RS change detection tasks, the bi-temporal features interfere with each other, which degrades feature fusion. The Siamese network, which distinguishes between inputs by learning a similarity measure from the data, avoids this problem well. A Siamese network consists of two identical NNs with shared weights, each generating a feature vector used to calculate the similarity between the two inputs. Because of this excellent ability to measure similarity, Siamese networks have been widely used in RS change detection. Dual attentive fully convolutional Siamese networks (DASNet) [38] use a dual attention mechanism (DAM) to improve feature discrimination, combined with a weighted double-margin contrastive (WDMC) loss to address sample imbalance. Chen et al. [42] combined CNNs and RNNs in SiamCRNN, stacking long short-term memory (LSTM) units to fully excavate the change information; it can detect changes in both homogeneous and heterogeneous images. Yang et al. [43] proposed Asymmetric Siamese Networks (ASN), which locate and identify semantic changes through feature pairs obtained from modules of widely different structures; to alleviate the influence of label imbalance in model training and evaluation, they also proposed an adaptive threshold learning (ATL) module and a separated kappa (SeK) coefficient.
Given the outstanding performance of the Siamese structure and the U-shaped network, the two have been combined for RS change detection. By introducing nested connections, Fang et al. [18] proposed a Siamese Network for Change Detection (SNUNet-CD) that alleviates the loss of localization information in the deep layers of the NN through compact information transmission. Zhang et al. [44] proposed an End-to-end Superpixel-enhanced Change Detection Network (ESCNet) that incorporates differentiable superpixel segmentation to precisely localize the changed areas.
In the Siamese U-shaped network, the difference map cannot directly represent change because of spectral and positional errors, yet it remains the most intuitive means of revealing the changes between bi-temporal images [32]. Many scholars therefore use the difference map to excavate the change information it contains. Zhang et al. [45] proposed a deeply supervised image fusion network (IFN) that fuses image difference features with multi-level deep features of the original images for change map reconstruction, improving the boundary completeness and internal compactness of objects in the output change maps. Zhang et al. [46] used multi-scale and multi-depth feature difference maps, together with RS images acquired by various sensors at different spatial resolutions, to address the multi-scale problem in RS change detection. Peng et al. [32] proposed a difference-enhancement dense-attention convolutional NN (DDCNN) that improves the accuracy of change feature extraction by introducing a difference-enhancement (DE) unit and combines an up-sampling attention (UA) unit to capture the change characteristics of ground features together with spatial context information. Wang et al. [47] proposed an attention mechanism-based deep supervision network (ADS-Net) for bi-temporal RS change detection, with an adaptive attention mechanism combining spatial and channel features to capture relationships across different scales of change and achieve more accurate results. In addition, the Siamese U-shaped network also plays an important role in multi-source RS change detection. A deep-translation-based change detection network (DTCDN) [48] was proposed for optical and Synthetic Aperture Radar (SAR) images acquired at different times: the deep translation first maps images from one domain (e.g., optical) to another (e.g., SAR) through a cyclic structure into the same feature space, and the translation results are then fed to a supervised CD network that exploits deep context features to detect changes across sensors. Ebel et al. [49] proposed a dual-stream network architecture based on U-Net for the fusion of bi-temporal SAR and bi-temporal optical data, processing SAR and optical data separately and combining the extracted features at a later decision stage. Hafner et al. [50] proposed a novel four-branch Siamese architecture for fusing SAR and optical observations in multi-modal change detection: one pair of weight-sharing branches extracts features from the bi-temporal optical images, another pair extracts features from the bi-temporal SAR images, and a decoder fuses the four feature sets to generate the multi-source change detection results.
The above work shows that, in RS change detection tasks, the difference map helps the model improve both the effectiveness of the extracted change features and the boundary completeness of the changed areas. However, previous work either made insufficient use of the difference map or used only the difference map without retaining the original feature maps, leaving the change information incompletely excavated and leading to missed small objects and incomplete change detection results.
In our proposed method, we not only retain the original feature maps but also adopt the feature difference maps to model multi-scale and multi-depth change information, which enhances the change intensity. In addition, our research found that an encoder similar to those of Fully Convolutional Siamese-Concatenation (FC-Siam-Conc) and Fully Convolutional Siamese-Difference (FC-Siam-Diff) [34] not only keeps the number of parameters small but also readily achieves good results on the Change Detection Dataset (CDD) [51]; however, its performance on other datasets is often much weaker (see Section 4.1 for details). Because ResNet performs more stably in all respects, and weights pre-trained on ImageNet can be loaded to accelerate convergence, we employ ResNet as our encoder. In the decoder, we concatenate the bi-temporal feature maps to preserve the original features and reduce the loss of information.
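The strategy of retaining the original features alongside an explicit difference cue can be sketched as a simple channel-wise fusion. This is a hedged illustration of the general idea, assuming plain concatenation; it is not the exact fusion operator of our decoder.

```python
import numpy as np

def fuse(f1: np.ndarray, f2: np.ndarray) -> np.ndarray:
    """Concatenate both original (H, W, C) feature maps and their absolute
    difference along the channel axis, so a decoder sees the raw bi-temporal
    evidence together with an explicit change cue."""
    return np.concatenate([f1, f2, np.abs(f1 - f2)], axis=-1)

f1 = np.ones((8, 8, 32))
f2 = 2 * np.ones((8, 8, 32))
fused = fuse(f1, f2)
print(fused.shape)  # (8, 8, 96): 32 + 32 + 32 channels
```

Keeping the original maps means information lost by the subtraction (e.g., which epoch an object appeared in) remains available downstream, at the cost of a wider decoder input.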
1.2. Contribution
The main contributions of this paper are as follows:
- 1.
We propose an SMD-Net for RS change detection, which combines multi-scale and multi-depth difference feature maps to enhance the change information and achieve more robust performance;
- 2.
The SRMP is proposed to provide high-level change information that enhances the changed areas as a whole, helping to solve the problems of object incompleteness and missing small objects;
- 3.
The FDM is proposed to fully excavate the change information through the feature difference, which can enhance the details of changed areas to improve the edge accuracy.
The paper is organized as follows. Section 2 describes the proposed method. Section 3 presents the experiments and results. Section 4 discusses the effect of dataset overlap and label errors on change detection models. Finally, Section 5 concludes the paper.
5. Conclusions
In this paper, we have proposed SMD-Net for high-resolution RS image change detection. Adopting the Siamese structure, we introduce the SRMP and FDM blocks to provide the decoder with multi-scale and multi-depth change information, enhancing the change intensity while retaining the original features. Through this strategy, our network performs excellently in object completeness, small object detection, and object edge detection. Compared with other methods, the proposed network performs well on three public datasets and offers a good trade-off between accuracy and computational cost. In addition, our model generalizes well and can detect more label errors. However, our method still has room for improvement: its practicality in real-world scenarios needs to be strengthened, and its inference time needs to be shortened. Our experiments also have some limitations. For example, in Section 4.2, since some objects in the BCDD dataset are mislabeled, we could only evaluate each model indirectly by means of pseudo-labels, and our experiments do not fully determine how many mislabeled objects each model can realistically detect.
In the future, we will conduct further research on few-shot learning, inaccurate supervision, and multiple types of changed areas to improve the performance of change detection and the model's applicability in real-world scenarios.