Article

CGMNet: Semantic Change Detection via a Change-Aware Guided Multi-Task Network

1 College of Geophysics, Chengdu University of Technology, Chengdu 610059, China
2 State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(13), 2436; https://doi.org/10.3390/rs16132436
Submission received: 21 April 2024 / Revised: 27 June 2024 / Accepted: 1 July 2024 / Published: 2 July 2024

Abstract

Change detection (CD) is a fundamental task in the remote sensing field, and binary change detection (BCD), which focuses only on the region of change, cannot meet current application needs. Semantic change detection (SCD) is pivotal for identifying regions of change in sequential remote sensing imagery, focusing on discerning “from-to” transitions in land cover. The emphasis on features within these regions of change is critical for SCD efficacy, yet traditional methodologies often overlook this aspect. To address this gap, we introduce a change-aware guided multi-task network (CGMNet). This network integrates a change-aware mask branch, leveraging prior knowledge of regions of change to enhance land cover classification in dual temporal remote sensing images and allowing more accurate identification of altered regions. Furthermore, to navigate the complexities of remote sensing environments, we develop a global and local attention mechanism (GLAM) that adeptly captures both overarching and fine-grained spatial details, facilitating more nuanced analysis. Rigorous testing against state-of-the-art methods on two public datasets yielded strong results: CGMNet achieved Overall Score metrics of 58.77% on the Landsat-SCD dataset and 37.06% on the SECOND dataset. These outcomes demonstrate both the performance of the method and its superiority over the comparative algorithms.

1. Introduction

Binary change detection (BCD), which focuses on identifying the differences in two temporal images covering the same geographic area [1], has significant applications in urban planning, environmental monitoring, and disaster management [2,3,4]. However, only obtaining the changed region does not meet application requirements in some situations, and we need to know the categories of a feature before and after the change [5]. Semantic change detection (SCD), a more advanced form of BCD, goes beyond simply identifying changed and unchanged areas (Figure 1). It provides detailed “from-to” information about changes in land cover, thus offering a broader range of applications [6,7,8].
SCD methods largely fall into two distinct categories [9]. The first category, known as the direct classification method [10], treats each type of change as a separate class. While straightforward, this method often leads to a proliferation of output classes, which can adversely affect accuracy. The alternative approach conceptualizes SCD as a multi-task challenge, commonly referred to as the post-classification method. This strategy involves both the detection of regions of change and subsequent land cover classification. It leverages the identified changed areas to effectively mask the land cover classification map, a technique that has demonstrated enhanced performance and growing acceptance in the field. Traditional methods use handcrafted features to compare the two temporal images and obtain the SCD result. Wu et al. proposed an SCD method via kernel slow feature analysis (KSFA) and post-classification fusion [11]. Tu et al. detected damaged buildings using the bag-of-words method (BoW) and SVM [12]. However, handcrafted features are sensitive to sensor and weather conditions. In recent years, deep learning has developed rapidly and can extract the deep features of images through convolution operations [13,14,15,16]; as a result, deep-learning-based methods have become the predominant approach in SCD [17]. A notable contribution in this area is the work of Daudt et al., who explored four different structural designs for SCD, encompassing both aforementioned methods; their empirical findings indicated that the post-classification method generally yields more favorable outcomes [18]. Building on this foundation, Ding et al. developed SSCDl and BiSRNet, both falling under the umbrella of the post-classification method, to further advance the field of SCD [19]. Similarly, Xia et al. introduced the PCFN model, which is also based on the principles of the post-classification approach [9]. Niu et al. introduced a transformer into SMNet for information extraction in SCD [20].
In SCD, the changed regions within the dual temporal remote sensing images, where the “from-to” information appears, deserve the most attention [21]. It is important that the network focuses on the region of change and obtains precise land cover classification there. Existing methods often neglect this pivotal aspect, leading to less precise classification results. To overcome this limitation, we incorporate prior knowledge of the changed regions into the land cover classification process through an innovative change-aware mask branch, which refines the classification results. Properly fusing the features of the two temporal remote sensing images after the feature extraction stage is also pivotal for detecting regions of change. Furthermore, because remote sensing images are characterized by a wider imaging range and a more complex set of features [22], we developed a global and local attention mechanism that captures both the overarching global features and the more nuanced, detailed information through two branches. This ensures a more comprehensive and precise analysis of the remote sensing images, which is vital for the effective application of SCD.
In this paper, the main contributions of our work are as follows:
  • We have developed a change-aware guided multi-task network (CGMNet) specifically tailored to SCD. CGMNet effectively harnesses change-aware information, enabling heightened focus on altered regions and facilitating more accurate land cover classification.
  • In order to address the challenges posed by remote sensing images with extensive imaging ranges and intricate details, we designed a global and local attention mechanism (GLAM). This mechanism is adept at capturing both the overarching global features and the fine-grained details inherent in remote sensing images.
  • We executed a series of comparative and ablation studies using two publicly accessible datasets. The results from these experiments clearly demonstrate the superior performance and effectiveness of our proposed network and its components.
We organize the rest of this article as follows. We describe the related work involving binary change detection and semantic change detection in Section 2. The structure and components of CGMNet are presented in Section 3. Then, we provide the experiments and results analysis in Section 4. Finally, the conclusions are presented in Section 5.

2. Related Work

2.1. Binary Change Detection

For the two temporal remote sensing images $I(t_1)$ and $I(t_2)$, the BCD task can be expressed as follows:

$d = f_{dist\text{-}BCD}(I(t_1), I(t_2))$ (1)

where $d$ denotes the difference between the two temporal remote sensing images, i.e., the changed region, and $f_{dist\text{-}BCD}$ represents the function for BCD [23]. Such functions can be split into traditional methods and deep-learning-based methods.
Traditional BCD methods usually incorporate clustering or threshold segmentation over handcrafted features to detect the region of change between the two temporal remote sensing images [24]. In [25], principal component analysis (PCA) and K-means clustering were utilized to detect the region of change between multi-temporal remote sensing images without supervision. Tang et al. [26] and Huang et al. [27] designed change detection frameworks based on the morphological building index (MBI). Ye et al. developed “three-layer SVDD fusion” (TLSF) based on support vector domain description (SVDD) for targeted change detection [28]. Change vector analysis (CVA) and spectral angle mapper (SAM) were combined to obtain a change map in [29]. However, traditional handcrafted features are often designed by experience and vary with weather conditions and sensors, displaying low robustness [30]. Recently, deep-learning-based methods have achieved impressive results in many remote sensing tasks, including change detection [31], land cover classification [32], and object detection [33]. Deep-learning-based approaches extract the deep features of dual temporal remote sensing images through convolution operations, and the resulting feature maps are compared to obtain the region of change. Caye Daudt et al. designed three fully convolutional neural network architectures (FC-EF, FC-Siam-conc, and FC-Siam-diff) that perform change detection on a pair of co-registered images [34]. Zheng et al. designed cross-layer blocks in CLNet to incorporate multi-scale features and multi-level context information for BCD [35]. Lei et al. proposed a network based on difference enhancement and spatial–spectral nonlocal (DESSN) processing for CD in very-high-resolution images [36]. The attention mechanism allows a network to focus on important regions, and some researchers have introduced it into BCD networks to improve performance; Zhang et al. designed an attention-guided edge refinement network (AERNet) for building change detection [37]. The transformer, originally developed for natural language processing (NLP) [38], has been introduced into various vision tasks [39,40], and some studies utilize it for BCD. ChangeFormer presents a transformer-based Siamese network architecture for change detection [41]. Chen et al. proposed a bitemporal image transformer (BIT) along with a semantic tokenizer to model contexts within the spatial–temporal domain [42].

2.2. Semantic Change Detection

SCD can be seen as an extension of BCD in which we also need to obtain the change classes of the features within the region of change. The SCD task can be expressed as

$d,\ Info_{from\text{-}to} = f_{dist\text{-}SCD}(I(t_1), I(t_2))$ (2)

where $Info_{from\text{-}to}$ designates the information about a changed feature, recording its class in the two phases within the region of change, and $f_{dist\text{-}SCD}$ represents the function for semantic change detection.
As a multi-task process, SCD can be divided into land cover classification and region-of-change detection. Four types of network structure are commonly used for SCD, as shown in Figure 2 [5]. The first type is post-classification (Figure 2a), where the two temporal images are independently segmented and the final SCD result is acquired by comparing the two temporal land cover classification results; this ignores the relationship between the two temporal images [34]. The second strategy treats each type of “from-to” transition as one class and inputs the two temporal images to obtain the changed categories directly (Figure 2b); in this case, there are too many classes for segmentation [9]. These first two strategies treat SCD as a single task and have largely been abandoned. The third strategy splits SCD into two subtasks (Figure 2c): the two temporal images are concatenated to obtain the region of change, while they are also segmented independently, and the final result is obtained by masking the land cover classification with the region of change. However, this strategy carries out the two subtasks separately, which lacks communication between the deep features of the two temporal images [9]. The fourth strategy (Figure 2d) treats SCD as two integrated subtasks, utilizing the deep features of the two temporal images for region-of-change detection; most current binary change detection networks adopt it [43,44]. Most existing network structures for SCD are improvements on Strategy 4, where weight-sharing semantic segmentation branches perform land cover classification on the two temporal images, and the deep features obtained after feature extraction are concatenated and sent to a separate decoder for region-of-change detection. Yang et al. located semantic changes through the feature pairs obtained from the feature extraction structures [45]. Zheng et al. utilized the temporal symmetry of the two temporal images to fuse their deep features for SCD in ChangeMask [46]. Zhou et al. used a graph convolutional network to process the extracted features and obtain the SCD result [47]. Ding et al. proposed two structures in [19] that additionally exploit the relationship between the two temporal images for land cover classification. Moreover, the transformer has been introduced into SCD; Yuan et al. designed a transformer-based encoder to extract the features of the two temporal images [48]. For SCD, we want to obtain the precise region of change and the land cover classification within it, which makes it vital that the network pays more attention to the changed region; however, existing methods ignore this. In our method, we not only enable the network to make full use of both global and detailed information while fusing features but also direct it to focus on the region of change by adding a change-aware mask branch.

3. Methodology

3.1. The Overall Architecture of CGMNet

Multi-task learning involves learning multiple related subtasks in parallel and sharing knowledge during the training process [5]. Compared to single-task learning, the relationship between subtasks in multi-task learning can enhance a network’s performance [49]. For SCD, we need to obtain both the region of change and the change categories of the features, so we can treat SCD as a combination of binary change detection and semantic segmentation. As shown in Figure 3, the proposed CGMNet receives two temporal remote sensing images and outputs the BCD result together with the land cover classification of the dual time-phase images. The final SCD result is then acquired by masking the land cover classification with the binary change detection map in the post-processing stage.
The CGMNet structure is depicted in Figure 3; it is a multi-task network designed for semantic change detection. The feature maps of the two temporal remote sensing images are obtained through feature extraction and passed through the feature classifier to obtain the land cover classification results. The two temporal feature maps are fused, and further feature extraction yields the region of change through the change detection classifier. To utilize the prior knowledge of the changed region for land cover classification, we propose the change-aware mask branch, which constrains feature classification using the fused feature map information, making the network focus more on the region of change. Notably, weight sharing is applied to the operations on the two temporal remote sensing images. A sketch of the final masking step is given below.
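As a concrete illustration of the post-processing, the following minimal sketch masks the two land cover predictions with the binary change map; the tensor names and the single-channel sigmoid change head are our assumptions, not taken from the paper.

```python
import torch

def compose_scd_result(lcc_t1, lcc_t2, change_logits, threshold=0.5):
    """Mask the two land cover classification maps with the predicted binary
    change map so that only changed pixels keep a semantic label.
    lcc_t1, lcc_t2: (B, N, H, W) semantic logits; change_logits: (B, 1, H, W)."""
    change_mask = (torch.sigmoid(change_logits) > threshold).long()  # (B, 1, H, W)
    sem_t1 = lcc_t1.argmax(dim=1, keepdim=True) + 1  # classes numbered 1..N
    sem_t2 = lcc_t2.argmax(dim=1, keepdim=True) + 1
    # Unchanged pixels are zeroed out, i.e., labeled "no change".
    return sem_t1 * change_mask, sem_t2 * change_mask
```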
During feature extraction, ResNet34 is used as the backbone [50]. ResNet34 consists of four layers, denoted $layer_i$ ($i = 1, 2, 3, 4$), each composed of several ResBlocks (3, 4, 6, and 3, respectively). After each layer, the feature map is downsampled, enlarging the receptive field but losing detailed information. The feature maps obtained from the backbone are directly subjected to classification to yield the land cover prediction results. To enhance the preservation of spatial details, we eliminate the downsampling operations in $layer_3$ and $layer_4$ of ResNet34. The global and local feature information of the two temporal remote sensing images is captured using the GLAM, and the feature maps are fused using a concatenation operation. The change information extraction module then produces the final changed-region prediction; it is composed of four attention-guided residual modules (Figure 3b), in which the coordinate attention mechanism (Figure 3c) is introduced to make the network focus on the important regions. In the change-aware mask branch, the two temporal feature maps are guided by change information to obtain the land cover classification.
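A minimal sketch of this backbone modification is shown below. The exact surgery (setting the stride-2 convolutions of layer3 and layer4 to stride 1) is our assumption; the paper only states that the downsampling operations are removed.

```python
import torch.nn as nn
from torchvision.models import resnet34

def build_backbone():
    """ResNet34 backbone with downsampling removed in layer3 and layer4,
    keeping the feature map at 1/8 of the input resolution (Figure 3)."""
    net = resnet34(weights="IMAGENET1K_V1")
    for layer in (net.layer3, net.layer4):
        for m in layer.modules():
            if isinstance(m, nn.Conv2d) and m.stride == (2, 2):
                m.stride = (1, 1)  # covers the 3x3 conv and the 1x1 shortcut
    return net
```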

3.2. Dual Temporal Feature Fusion

In SCD, fusing the information of the dual temporal remote sensing images is essential to detect changed regions and promote land cover classification. Before fusion, informative features must be extracted from the feature maps. Remote sensing images, unlike natural images, are characterized by a wider imaging range and a more complex set of features [22]. To extract robust features carrying both details and semantics, GLAM extracts global features and spatially detailed information through two branches, as shown in Figure 4. In the global feature extraction branch, the global attention map is obtained by
$Attention_{global} = \mathrm{Sigmoid}(F_{avg} \oplus F_{max})$ (3)

$F_{avg} = \mathrm{MLP}(\mathrm{AvgPool}(F))$ (4)

$F_{max} = \mathrm{MLP}(\mathrm{MaxPool}(F))$ (5)
where $F$ stands for the feature map from which the information is extracted, and $\mathrm{MaxPool}(\cdot)$ and $\mathrm{AvgPool}(\cdot)$ represent the max pooling and average pooling operations, respectively. $F_{avg}$ and $F_{max}$ indicate the global features; note that the two multi-layer perceptron (MLP) operations share the same weights. $Attention_{global}$ represents the global attention map, while $\mathrm{Sigmoid}(\cdot)$ and $\oplus$ indicate a sigmoid function and elementwise addition, respectively. Through this branch, the global features can be recognized and the location of the region of change identified. In contrast, in the local feature extraction branch, we use pointwise convolution operations to capture the exact spatial information of the feature map. The local attention map can be formulated as
$Attention_{local} = \mathrm{Sigmoid}(\mathrm{PConv}_1(\mathrm{PConv}_2(F)))$ (6)
where $\mathrm{PConv}_i(\cdot)$ represents the different pointwise convolution operations. In $\mathrm{PConv}_1(\cdot)$, a 1 × 1 convolution is performed on the extracted feature map, followed by a BN operation and ReLU; in $\mathrm{PConv}_2(\cdot)$, a 1 × 1 convolution with different weights extracts the local features, and a BN operation normalizes the feature map. $Attention_{local}$, the local attention map, is then obtained through the $\mathrm{Sigmoid}(\cdot)$ operation. This branch extracts more fine-grained features from the images, which contributes to more accurate results. The final attention map is obtained by elementwise combination of the global and local attention maps.
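A PyTorch sketch of GLAM assembled from Equations (3)–(6) follows. The channel-reduction ratio, the operation ordering inside the local branch, and the use of elementwise multiplication for the final combination are our assumptions.

```python
import torch
import torch.nn as nn

class GLAM(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Shared-weight MLP of the global branch (Equations (4) and (5)),
        # realized as two 1x1 convolutions on pooled channel descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        # Local branch (Equation (6)): two pointwise convolutions,
        # BN + ReLU after the first and BN after the second.
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.BatchNorm2d(channels // reduction),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, f):
        f_avg = self.mlp(torch.mean(f, dim=(2, 3), keepdim=True))  # AvgPool path
        f_max = self.mlp(torch.amax(f, dim=(2, 3), keepdim=True))  # MaxPool path
        att_global = torch.sigmoid(f_avg + f_max)                  # Equation (3)
        att_local = torch.sigmoid(self.local(f))                   # Equation (6)
        # Elementwise combination of the two maps (multiplication assumed),
        # applied to the input feature map.
        return f * att_global * att_local
```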
By leveraging GLAM, we mine the global and local feature information embedded within the two temporal feature maps. Subsequently, a concatenation operation is employed in conjunction with pointwise convolutions to fuse the extracted information. This dual operation amalgamates global and local contextual details, enriching the representation of the feature maps for subsequent processing and analysis.

3.3. Change-Aware Mask Branch

For SCD, we need to obtain the region of change and the land cover classification within it for the two temporal remote sensing images. The land cover classes outside the region of change are not needed, so the network should pay more attention to the region of change. After dual temporal feature fusion, the region of change is more prominent in the fused feature map, which is used to guide the two temporal land cover classification predictions. To make the network focus more on the region of change, we add a change-aware mask branch (Figure 5), which improves the ability of the network to extract the important features.
As shown in Figure 5a, the change-aware mask branch takes as input a single temporal feature map and the feature map from the fusion process. The fused feature map is input into the change-aware mask module, generating a change-aware feature map that contains prior knowledge of the changed region. Subsequently, the single temporal feature map and the change-aware feature map undergo elementwise product and elementwise addition operations, followed by an attention-guided residual module (Figure 3b). This series of operations culminates in the final land cover prediction of the change-aware mask branch. Through this mechanism, the network is steered to concentrate more on the changed region, refining its ability to extract the essential features.
In the change-aware mask module, the change-aware feature map can be obtained by
$F_c = \mathrm{Sigmoid}\left(\mathrm{Concat}\left(\mathrm{MaxPool}(F_{fusion}^{global}),\ \mathrm{AvgPool}(F_{fusion}^{global})\right)\right)$ (7)

$F_{fusion}^{global} = F_{fusion} \otimes A_{global}$ (8)
where $F_c$ represents the change-aware feature map, $\mathrm{Concat}(\cdot)$ denotes channel-wise concatenation, $\otimes$ denotes elementwise multiplication, and $F_{fusion}$ and $A_{global}$ represent the fused feature map and the global attention map (Equation (3)), respectively. The change-aware mask module aims to obtain the prior knowledge of the changed region. Firstly, $F_{fusion}$ is passed through the global feature extraction branch of GLAM (Figure 4) to obtain $F_{fusion}^{global}$, which attends to the changed region globally. Then, inspired by [51], we extract the channel attention map of $F_{fusion}^{global}$ as the change-aware feature map $F_c$. Eventually, the change-aware feature map and the single temporal feature map are sent to the change-aware mask branch to obtain the land cover classification results.
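A sketch of the change-aware mask module (Equations (7) and (8)) is given below. The 1 × 1 projection after concatenation and the residual form of the branch are our assumptions, since the paper does not spell them out.

```python
import torch
import torch.nn as nn

class ChangeAwareMask(nn.Module):
    """Change-aware mask module (Figure 5b): max- and average-pooled channel
    descriptors of F_fusion^global are concatenated and squashed by a sigmoid
    to yield the change-aware feature map F_c (Equation (7))."""
    def __init__(self, channels):
        super().__init__()
        # 1x1 conv restoring the channel count after concatenation is our
        # assumption; the paper does not specify this projection.
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, f_fusion_global):
        f_max = torch.amax(f_fusion_global, dim=(2, 3), keepdim=True)
        f_avg = torch.mean(f_fusion_global, dim=(2, 3), keepdim=True)
        return torch.sigmoid(self.proj(torch.cat([f_max, f_avg], dim=1)))

# Branch usage (Figure 5a, assumed form): the single temporal feature map is
# reweighted by the change-aware map, with a residual addition to preserve
# its original content:
#   out = attention_guided_residual(feat_t * f_c + feat_t)
```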

3.4. Loss Function

We used four loss functions to train CGMNet: the direct semantic class loss $L_{seg}^{direct}$; the change-aware mask semantic class loss $L_{seg}^{cam}$; the binary change loss $L_{change}$; and SCLoss $L_{sc}$ [19]. The total loss $L_{scd}$ is expressed as
$L_{scd} = L_{seg}^{direct} + L_{seg}^{cam} + L_{change} + L_{sc}$ (9)
The direct semantic class loss $L_{seg}^{direct}$ and the change-aware mask semantic class loss $L_{seg}^{cam}$ are multiclass cross-entropy losses between the prediction $P$ of each branch and the GT land cover classification $L$. For each pixel, $L_{seg}^{X}$ ($X = direct, cam$) is calculated as
$L_{seg}^{X} = -\frac{1}{N}\sum_{i=1}^{N} y_i \log(p_i)$ (10)
where $N$ is the number of land cover classes in the dataset, and $y_i$ and $p_i$ denote the GT label and the predicted probability of the $i$th class, respectively. While calculating $L_{seg}^{X}$ ($X = direct, cam$), the pixels within the unchanged area are excluded, and the two temporal results contribute equal weight, which means
$L_{seg}^{X} = \frac{L_{seg}^{t_1\_X} + L_{seg}^{t_2\_X}}{2}, \quad X = direct, cam$ (11)
The change loss $L_{change}$ is the binary cross-entropy loss between the predicted binary change map $C$ and the GT change map $L_c$; for each pixel, it is calculated as
$L_{change} = -y_c \log(p_c) - (1 - y_c)\log(1 - p_c)$ (12)
where y c and p c denote the GT label and the predicted probability of change, respectively.
SCLoss is task-specific for SCD: it rewards predictions with similar probability distributions in unchanged areas while penalizing them in changed areas, exploiting this characteristic of SCD datasets to improve the performance of the network. It is calculated using the cosine loss function:
$L_{sc} = \begin{cases} 1 - \cos(x_1, x_2), & y_c = 0 \\ \cos(x_1, x_2), & y_c = 1 \end{cases}$ (13)
where $x_1$ and $x_2$ are the feature vectors predicted for the corresponding pixels of the two temporal images, and $y_c$ indicates whether there is a change at that position, as in Equation (12).
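The four terms can be assembled as in the sketch below; the tensor names are ours, and the semantic loss shown would be computed once for the direct head and once for the change-aware mask head.

```python
import torch
import torch.nn.functional as F

def scd_loss(sem_t1, sem_t2, change_logit, y_t1, y_t2, y_c, x1, x2):
    """Total loss of Equation (9). sem_*: semantic logits (B, N, H, W);
    change_logit: (B, H, W) change logits; y_*: semantic GT with unchanged
    pixels marked ignore_index = -1; y_c: binary change GT (1 = changed);
    x1, x2: per-pixel feature vectors (B, C, H, W) used by SCLoss."""
    # Equation (11): equal-weight average of the two temporal branches;
    # unchanged pixels are excluded via ignore_index (Equation (10)).
    l_seg = 0.5 * (F.cross_entropy(sem_t1, y_t1, ignore_index=-1)
                   + F.cross_entropy(sem_t2, y_t2, ignore_index=-1))
    # Equation (12): binary cross-entropy on the change map.
    l_change = F.binary_cross_entropy_with_logits(change_logit, y_c.float())
    # Equation (13): reward feature similarity where unchanged (y_c = 0),
    # penalize it where changed (y_c = 1).
    cos = F.cosine_similarity(x1, x2, dim=1)
    l_sc = torch.where(y_c.bool(), cos, 1.0 - cos).mean()
    # In CGMNet, one such semantic term per classification head is summed here.
    return l_seg + l_change + l_sc
```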

4. Experiment and Analysis

4.1. Dataset

In this study, to evaluate the performance of the proposed CGMNet, we conducted comprehensive experiments on two publicly available datasets: the Landsat-SCD dataset [48] (available at https://doi.org/10.6084/m9.figshare.19946135.v1, accessed on 18 October 2023) and the SECOND dataset [45] (available at https://captain-whu.github.io/SCD/, accessed on 18 October 2023). The details are described as follows.
The Landsat-SCD dataset is a collection of Landsat images taken between 1990 and 2020 in Tumushuke, Xinjiang. The images include three bands, corresponding to the red (R), green (G), and blue (B) wavelengths, and the spatial resolution of the R-G-B composite images is 30 m. Figure 6 presents several samples from the dataset, which encompasses four land cover categories: farmland, desert, building, and water. The labels record the two temporal feature types within the region of change, with a wide variety of change categories and a discrete distribution of change areas. The dataset comprises 8468 image pairs of size 416 × 416 pixels; excluding the augmented images, there are 2385 image pairs, of which 1908 are used as the training set and 477 as the validation set.
The SECOND dataset contains two temporal remote sensing images from different sensors on multiple platforms, covering several cities (including Shanghai and Chengdu). The images also include three bands corresponding to the R, G, and B wavelengths, with spatial resolutions ranging from 0.5 m to 3 m. Figure 7 presents several samples from the SECOND dataset, which includes six land cover categories: non-vegetated ground, building, tree, low vegetation, water, and playground. There are 2968 image pairs of size 512 × 512 pixels; we randomly divided them into a training set of 2375 pairs and a testing set of 593 pairs.
These two datasets span a wide range of resolutions and diverse feature types, making them representative and able to effectively assess the robustness of the proposed method.

4.2. Evaluation Metrics

In this study, to quantitatively assess the accuracy of the change-aware guided multi-task network (CGMNet), we employed four indicators commonly used in semantic change detection research: the mean intersection-over-union (mIoU), the F1-score, the separated kappa coefficient (SeK) [45], and the Overall Score.
In SCD, mIoU and F1-score evaluate the network’s ability to detect regions of change. The mIoU is the average of the unchanged-region IoU (0_IoU) and the changed-region IoU (1_IoU). These indicators are calculated as follows:
$0\_IoU = \frac{TN}{TN + FN + FP}$ (14)

$1\_IoU = \frac{TP}{TP + FP + FN}$ (15)

$mIoU = \frac{0\_IoU + 1\_IoU}{2}$ (16)
where $TP$ is the number of accurately identified changed pixels, $FP$ is the number of unchanged pixels mistakenly detected as changed, $TN$ is the number of unchanged pixels detected correctly, and $FN$ is the number of changed pixels mistakenly identified as unchanged. The F1-score is calculated as follows:
$P = \frac{TP}{TP + FP}$ (17)

$R = \frac{TP}{TP + FN}$ (18)

$F1\_score = \frac{2 \times P \times R}{P + R}$ (19)
where $P$ and $R$ represent precision and recall, respectively. Note that, when calculating mIoU and F1-score, we treat SCD as BCD, i.e., the output is regarded simply as changed and unchanged regions.
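The binary metrics of Equations (14)–(19) reduce to simple counting, as in this sketch (array names are ours):

```python
import numpy as np

def binary_change_metrics(pred, gt):
    """mIoU and F1-score (Equations (14)-(19)), treating the SCD output as a
    binary changed/unchanged map. pred, gt: arrays of 0/1 change labels."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)      # changed pixels correctly identified
    fp = np.sum(pred & ~gt)     # unchanged pixels flagged as changed
    fn = np.sum(~pred & gt)     # changed pixels missed
    tn = np.sum(~pred & ~gt)    # unchanged pixels correctly identified
    iou0 = tn / (tn + fn + fp)              # Equation (14)
    iou1 = tp / (tp + fp + fn)              # Equation (15)
    miou = (iou0 + iou1) / 2                # Equation (16)
    precision = tp / (tp + fp)              # Equation (17)
    recall = tp / (tp + fn)                 # Equation (18)
    f1 = 2 * precision * recall / (precision + recall)  # Equation (19)
    return miou, f1, iou1
```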
In order to alleviate the impact of the class imbalance problem, we adopted the separated kappa (SeK) coefficient proposed in [45]. The SeK coefficient is calculated from the multi-class confusion matrix $Q = \{q_{i,j}\}$, where $q_{i,j}$ represents the number of pixels classified into class $i$ whose GT index is $j$ ($i, j \in \{0, 1, \ldots, N\}$, with 0 representing no change), and $q_{0,0} = 0$. The calculations are as follows:
$\rho = \frac{\sum_{i=0}^{N} q_{ii}}{\sum_{i=0}^{N}\sum_{j=0}^{N} q_{ij}}$ (20)

$\eta = \frac{\sum_{i=0}^{N}\left(\sum_{j=0}^{N} q_{ij} \times \sum_{j=0}^{N} q_{ji}\right)}{\left(\sum_{i=0}^{N}\sum_{j=0}^{N} q_{ij}\right)^{2}}$ (21)

$SeK = e^{1\_IoU - 1} \times \frac{\rho - \eta}{1 - \eta}$ (22)
In order to comprehensively assess the benchmark methods, a weighted Overall Score is adopted, which is defined as
$\text{Overall Score} = 0.3 \times mIoU + 0.7 \times SeK$ (23)
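These definitions translate directly into a few lines of NumPy, as in this sketch (function and argument names are ours):

```python
import numpy as np

def sek_and_overall(q, iou1, miou):
    """SeK (Equations (20)-(22)) and Overall Score (Equation (23)).
    q: (N+1, N+1) confusion matrix, index 0 = no-change; iou1 and miou come
    from the binary change metrics above."""
    q = q.astype(float).copy()
    q[0, 0] = 0.0                        # q_{0,0} is defined as 0
    total = q.sum()
    rho = np.trace(q) / total            # Equation (20)
    eta = np.sum(q.sum(axis=1) * q.sum(axis=0)) / total ** 2  # Equation (21)
    sek = np.exp(iou1 - 1.0) * (rho - eta) / (1.0 - eta)      # Equation (22)
    return sek, 0.3 * miou + 0.7 * sek   # Equation (23)
```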
Finally, three metrics are provided to measure the computational costs, including the size of the parameters (Params), the floating-point operations (FLOPs), and the inference (Infer.) time for 100 epochs. The FLOPs and Infer. time are measured by considering the calculations for a pair of input images, each with 512 × 512 pixels.
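The paper does not name a profiling tool; as an illustration, the parameter and FLOPs counts could be reproduced with a generic profiler such as thop (pytorch-OpCounter), with `CGMNet()` standing in for the trained network:

```python
import torch
from thop import profile  # pip install thop (pytorch-OpCounter)

model = CGMNet().eval()           # placeholder for the network under test
x1 = torch.randn(1, 3, 512, 512)  # a pair of 512 x 512 RGB inputs
x2 = torch.randn(1, 3, 512, 512)
with torch.no_grad():
    macs, params = profile(model, inputs=(x1, x2))
# thop reports multiply-accumulate operations, commonly quoted as FLOPs.
print(f"Params: {params / 1e6:.2f} M | FLOPs: {macs / 1e9:.2f} G")
```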

4.3. Training Details

The experimental framework for this study was established on an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory under the Ubuntu 20.04 system environment. All computational procedures and algorithms were implemented using the PyTorch library (https://pytorch.org/, version 1.12). For the optimization of the network weights, the stochastic gradient descent (SGD) method was employed, with the batch size chosen to fit the available GPU memory. The initial learning rate was set to 0.1, balancing convergence speed and training stability. To enhance the robustness and generalizability of our models, we incorporated data augmentation into the training regimen, namely the random rotation of images, a strategy designed to diversify the training data and help the models generalize from the training data to unseen data.
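A compact sketch of this setup is shown below; the momentum, weight decay, and the discrete rotation angles are our assumptions, as the text does not specify them.

```python
import random
import torch
import torchvision.transforms.functional as TF

model = CGMNet().cuda()                    # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)  # assumed values

def augment(img_t1, img_t2, label_t1, label_t2, change_map):
    """Random rotation applied identically to both temporal images and all
    label maps so the pair stays co-registered (nearest-neighbor by default,
    which preserves integer labels)."""
    angle = random.choice([0, 90, 180, 270])  # assumed angle set
    return tuple(TF.rotate(t, angle)
                 for t in (img_t1, img_t2, label_t1, label_t2, change_map))
```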

4.4. Performance Evaluation

4.4.1. Comparison Methods

In order to validate the effectiveness of the proposed CGMNet for region-of-change detection and land cover classification in SCD tasks, we selected several mainstream BCD methods, including CLNet [35], BIT [42], DESSN [36], and AERNet [37], as well as SCD methods, including HRSCD3 [18], HRSCD4 [18], SSCDl [19], and BiSRNet [19], as competing algorithms. Since hyperparameters strongly affect model performance, we followed the settings reported in the original publications to ensure fair comparisons.
CLNet uses cross-layer blocks that split one input into two parallel but asymmetric branches, incorporating multi-scale features and multi-level context information. BIT introduces a transformer into change detection for remote sensing images. DESSN designs a difference-enhancement module and a spatial–spectral nonlocal module to learn difference representations and strengthen edge integrity. HRSCD3 and HRSCD4 are two of the four structures for SCD proposed in [18] and show the better performance among them. SSCDl and BiSRNet were designed in [19]; BiSRNet aims to solve the problem of insufficient communication between the temporal branches and the change branch.

4.4.2. Quantitative Comparison

Table 1 and Table 2 report the results of the various approaches on the Landsat-SCD dataset and the SECOND dataset, respectively. On both datasets, the proposed CGMNet achieves the highest values across all accuracy metrics, which evaluate a network’s performance in detecting regions of change and in land cover classification. Compared with BiSRNet, CGMNet obtains a 0.35% improvement in mIoU on both datasets, and improvements of 0.76% and 0.79% in SeK on the Landsat-SCD and SECOND datasets, respectively. Among the SCD methods, HRSCD3 directly concatenates the two temporal images before feature extraction for binary change detection, while the other methods all utilize the information of the feature maps after feature extraction. HRSCD3 has the poorest mIoU, especially on the higher-resolution SECOND dataset, where it obtains only 68.84%, which illustrates the importance of fusing features after feature extraction for binary change detection. This is the motivation for GLAM, which mines the features of the two temporal feature maps for fusion; its superiority is discussed in Section 4.5. For the mIoU metric, which measures the detection of regions of change, the tables show that most multi-task methods score higher than the binary change detection methods, with the exception of AERNet on the Landsat-SCD dataset and the simplest model, HRSCD3. These mIoU values illustrate that multi-task learning can improve the performance of the network [49]. As shown in Table 1 and Table 2, our method has a number of parameters and an inference time similar to BiSRNet, showing that CGMNet strikes a good balance between accuracy and complexity/efficiency.

4.4.3. Qualitative Comparison

In order to further analyze the limitations and advantages of the compared methods, we visualize some prediction examples on the two publicly available datasets. Figure 8 shows prediction examples on the Landsat-SCD dataset, and Figure 9 shows prediction examples on the SECOND dataset. Consistent with the quantitative metrics, the examples show that the proposed CGMNet obtains more precise results than the other compared methods.
In spite of the low resolution of the Landsat-SCD dataset, the proposed CGMNet obtained fine results. As shown in the second and third remote sensing image pairs in Figure 8, when compared to other comparison methods, ours is able to detect the contour of the changed region more precisely. Additionally, CGMNet is capable of accurately classifying the change classes of land cover. Take the results of land cover classification in the first pair of images as an example; in the area marked by the red box, the result of the method proposed in this study is closest to the GT, and the other methods wrongly detect water at the perimeter of the desert. The results illustrate that CGMNet can pay more attention to the region of change while extracting the exact region of change, thus obtaining more accurate classification results.
The SECOND dataset has a higher resolution, providing a better visual representation of the performance of the methods. As shown in the first and third image pairs in Figure 9, our method detects the changed buildings and extracts their contours accurately. CGMNet also obtains more precise land cover classification than the other methods; in the second image pair in Figure 9, it produces an accurate land cover classification result despite the complex ground conditions.

4.5. Ablation Studies

In order to validate the effectiveness of the GLAM module and the change-aware mask branch, we conducted ablation studies on both the Landsat-SCD and SECOND datasets; Table 3 and Table 4 present the results. We compared three model configurations: the base model without GLAM and the change-aware mask branch (“Base”); the base model supplemented with GLAM but without the change-aware mask branch (“Base + GLAM”); and the complete proposed CGMNet (“Base + GLAM + Branch”). As shown in Table 3, compared to Base, Base + GLAM obtains a 3.10% mIoU improvement and an 8.25% SeK improvement on the Landsat-SCD dataset, which illustrates the ability of GLAM to extract important information from a low-resolution dataset. With the addition of the change-aware mask branch, CGMNet achieves a further 0.47% mIoU improvement and a 1.31% SeK improvement on the Landsat-SCD dataset. For the SECOND dataset, the findings in Table 4 likewise demonstrate the beneficial impact of both GLAM and the change-aware mask branch: together they improve the mIoU by 0.30% and the SeK by 1.03%. Notably, the cost of this performance gain is only a 0.12 Mb increase in the number of parameters (Tables 3 and 4). This underscores the efficacy of incorporating the GLAM module and the change-aware mask branch into the CGMNet framework.

5. Conclusions

In this paper, we introduced an innovative change-aware guided multi-task network (CGMNet) specifically designed for semantic change detection. This network is uniquely enhanced by the integration of a change-aware mask branch and a global and local attention mechanism (GLAM). The change-aware mask branch significantly improves the network’s focus on changed regions, leading to more precise land cover classification. Concurrently, GLAM is meticulously tailored to tackle the complexities of remote sensing images, adeptly capturing both global and spatially detailed features. The experimental results show that CGMNet achieves competitive performance when compared to state-of-the-art change detection methodologies across two publicly available datasets, with some improvements in specific scenarios. These results highlight the network’s strengths and its potential for application in the field of change detection. In terms of future work, our research will focus on advancing the network’s architecture, particularly in the realm of semi-supervised learning. This strategic shift aims to significantly reduce the labor-intensive process of data labeling, paving the way for more efficient and scalable applications in semantic change detection. Such innovations have the potential to greatly expand the applicability and impact of our network in various real-world scenarios.

Author Contributions

Conceptualization, L.T. and X.Z.; methodology, L.T. and X.C.; software, L.T.; validation, L.T. and X.Z.; formal analysis, L.T. and X.Z.; investigation, L.T. and X.Z.; resources, L.T. and X.Z.; writing—original draft preparation, L.T. and X.Z.; writing—review and editing, X.Z. and X.C.; visualization, L.T. and X.Z.; supervision, L.T., X.Z. and X.C.; project administration, X.Z.; funding acquisition, L.T. All authors have read and agreed to the published version of the manuscript.

Funding

This article is supported by the Sichuan Science and Technology Program (2023YFN0022).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ding, Q.; Shao, Z.; Huang, X.; Altan, O. DSA-Net: A novel deeply supervised attention-guided network for building change detection in high-resolution remote sensing images. Int. J. Appl. Earth Obs. Geoinf. 2021, 105, 102591. [Google Scholar] [CrossRef]
  2. Wang, R.; Wu, H.; Qiu, H.; Wang, F.; Liu, X.; Cheng, X. A Difference Enhanced Neural Network for Semantic Change Detection of Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2023, 20, 5509205. [Google Scholar] [CrossRef]
  3. Zhu, Q.; Guo, X.; Li, Z.; Li, D. A review of multi-class change detection for satellite remote sensing imagery. Geo-Spat. Inf. Sci. 2024, 27, 1–15. [Google Scholar] [CrossRef]
  4. Fang, S.; Li, K.; Li, Z. Changer: Feature interaction is what you need for change detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5610111. [Google Scholar] [CrossRef]
  5. Cui, F.; Jiang, J. MTSCD-Net: A network based on multi-task learning for semantic change detection of bitemporal remote sensing images. Int. J. Appl. Earth Obs. Geoinf. 2023, 118, 103294. [Google Scholar] [CrossRef]
  6. Ru, L.; Du, B.; Wu, C. Multi-temporal scene classification and scene change detection with correlation based fusion. IEEE Trans. Image Process. 2020, 30, 1382–1394. [Google Scholar] [CrossRef] [PubMed]
  7. He, Y.; Zhang, H.; Ning, X.; Zhang, R.; Chang, D.; Hao, M. Spatial-temporal semantic perception network for remote sensing image semantic change detection. Remote Sens. 2023, 15, 4095. [Google Scholar] [CrossRef]
  8. Zheng, Z.; Zhong, Y.; Wang, J.; Ma, A.; Zhang, L. Building damage assessment for rapid disaster response with a deep object-based semantic change detection framework: From natural disasters to man-made disasters. Remote Sens. Environ. 2021, 265, 112636. [Google Scholar] [CrossRef]
  9. Xia, H.; Tian, Y.; Zhang, L.; Li, S. A deep Siamese postclassification fusion network for semantic change detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5622716. [Google Scholar] [CrossRef]
  10. Volpi, M.; Tuia, D.; Bovolo, F.; Kanevski, M.; Bruzzone, L. Supervised change detection in VHR images using contextual information and support vector machines. Int. J. Appl. Earth Obs. Geoinf. 2013, 20, 77–85. [Google Scholar] [CrossRef]
  11. Wu, C.; Zhang, L.; Du, B. Kernel slow feature analysis for scene change detection. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2367–2384. [Google Scholar] [CrossRef]
  12. Tu, J.; Li, D.; Feng, W.; Han, Q.; Sui, H. Detecting damaged building regions based on semantic scene change from multi-temporal high-resolution remote sensing images. ISPRS Int. J. Geo-Inf. 2017, 6, 131. [Google Scholar] [CrossRef]
  13. Wang, Y.; Shao, Z.; Lu, T.; Wang, J.; Cheng, G.; Zuo, X.; Dang, C. Remote Sensing Pan-sharpening via Cross Spectral-Spatial Fusion Network. IEEE Geosci. Remote Sens. Lett. 2023, 21, 5000105. [Google Scholar] [CrossRef]
  14. Wang, J.; Lu, T.; Huang, X.; Zhang, R.; Feng, X. Pan-sharpening via conditional invertible neural network. Inf. Fusion 2024, 101, 101980. [Google Scholar] [CrossRef]
  15. Wang, Y.; Shao, Z.; Lu, T.; Liu, L.; Huang, X.; Wang, J.; Jiang, K.; Zeng, K. A lightweight distillation CNN-transformer architecture for remote sensing image super-resolution. Int. J. Digit. Earth 2023, 16, 3560–3579. [Google Scholar] [CrossRef]
  16. Zuo, X.; Shao, Z.; Wang, J.; Huang, X.; Wang, Y. A cross-stage features fusion network for building extraction from remote sensing images. Geo-Spat. Inf. Sci. 2024, 1–15. [Google Scholar] [CrossRef]
  17. Bai, T.; Wang, L.; Yin, D.; Sun, K.; Chen, Y.; Li, W.; Li, D. Deep learning for change detection in remote sensing: A review. Geo-Spat. Inf. Sci. 2023, 26, 262–288. [Google Scholar] [CrossRef]
  18. Daudt, R.C.; Le Saux, B.; Boulch, A.; Gousseau, Y. Multitask learning for large-scale semantic change detection. Comput. Vis. Image Underst. 2019, 187, 102783. [Google Scholar] [CrossRef]
  19. Ding, L.; Guo, H.; Liu, S.; Mou, L.; Zhang, J.; Bruzzone, L. Bi-temporal semantic reasoning for the semantic change detection in HR remote sensing images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5620014. [Google Scholar] [CrossRef]
  20. Niu, Y.; Guo, H.; Lu, J.; Ding, L.; Yu, D. SMNet: Symmetric Multi-Task Network for Semantic Change Detection in Remote Sensing Images Based on CNN and Transformer. Remote Sens. 2023, 15, 949. [Google Scholar] [CrossRef]
  21. Zhu, Q.; Guo, X.; Deng, W.; Shi, S.; Guan, Q.; Zhong, Y.; Zhang, L.; Li, D. Land-use/land-cover change detection based on a Siamese global learning framework for high spatial resolution remote sensing imagery. ISPRS J. Photogramm. Remote Sens. 2022, 184, 63–78. [Google Scholar] [CrossRef]
  22. Li, R.; Zheng, S.; Duan, C.; Wang, L.; Zhang, C. Land cover classification from remote sensing images based on multi-scale fully convolutional network. Geo-Spat. Inf. Sci. 2022, 25, 278–294. [Google Scholar] [CrossRef]
  23. Zhang, R.; Zhang, H.; Ning, X.; Huang, X.; Wang, J.; Cui, W. Global-aware siamese network for change detection on remote sensing images. ISPRS J. Photogramm. Remote Sens. 2023, 199, 61–72. [Google Scholar] [CrossRef]
  24. Ke, L.; Lin, Y.; Zeng, Z.; Zhang, L.; Meng, L. Adaptive change detection with significance test. IEEE Access 2018, 6, 27442–27450. [Google Scholar] [CrossRef]
  25. Celik, T. Unsupervised change detection in satellite images using principal component analysis and k-means clustering. IEEE Geosci. Remote Sens. Lett. 2009, 6, 772–776. [Google Scholar] [CrossRef]
  26. Tang, Y.; Huang, X.; Zhang, L. Fault-tolerant building change detection from urban high-resolution remote sensing imagery. IEEE Geosci. Remote Sens. Lett. 2013, 10, 1060–1064. [Google Scholar] [CrossRef]
  27. Huang, X.; Zhang, L.; Zhu, T. Building change detection from multitemporal high-resolution remotely sensed images based on a morphological building index. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013, 7, 105–115. [Google Scholar] [CrossRef]
  28. Ye, S.; Chen, D.; Yu, J. A targeted change-detection procedure by combining change vector analysis and post-classification approach. ISPRS J. Photogramm. Remote Sens. 2016, 114, 115–124. [Google Scholar] [CrossRef]
  29. Zhuang, H.; Deng, K.; Fan, H.; Yu, M. Strategies combining spectral angle mapper and change vector analysis to unsupervised change detection in multispectral images. IEEE Geosci. Remote Sens. Lett. 2016, 13, 681–685. [Google Scholar] [CrossRef]
  30. Zhu, Y.; Huang, B.; Gao, J.; Huang, E.; Chen, H. Adaptive polygon generation algorithm for automatic building extraction. IEEE Trans. Geosci. Remote Sens. 2021, 60, 4702114. [Google Scholar] [CrossRef]
  31. Ding, Q.; Shao, Z.; Huang, X.; Altan, O.; Hu, B. Time-series land cover mapping and urban expansion analysis using OpenStreetMap data and remote sensing big data: A case study of Guangdong-Hong Kong-Macao Greater Bay Area, China. Int. J. Appl. Earth Obs. Geoinf. 2022, 113, 103001. [Google Scholar] [CrossRef]
  32. Sun, Y.; Shao, Z.; Cheng, G.; Huang, X.; Wang, Z. Road and car extraction using UAV images via efficient dual contextual parsing network. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5632113. [Google Scholar] [CrossRef]
  33. Shao, Z.; Cheng, G.; Ma, J.; Wang, Z.; Wang, J.; Li, D. Real-time and accurate UAV pedestrian detection for social distancing monitoring in COVID-19 pandemic. IEEE Trans. Multimed. 2021, 24, 2069–2083. [Google Scholar] [CrossRef] [PubMed]
  34. Daudt, R.C.; Le Saux, B.; Boulch, A. Fully convolutional siamese networks for change detection. In Proceedings of the 2018 25th IEEE International Conference On Image Processing (ICIP), Athens, Greece, 7–10 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 4063–4067. [Google Scholar]
  35. Zheng, Z.; Wan, Y.; Zhang, Y.; Xiang, S.; Peng, D.; Zhang, B. CLNet: Cross-layer convolutional neural network for change detection in optical remote sensing imagery. ISPRS J. Photogramm. Remote Sens. 2021, 175, 247–267. [Google Scholar] [CrossRef]
  36. Lei, T.; Wang, J.; Ning, H.; Wang, X.; Xue, D.; Wang, Q.; Nandi, A.K. Difference enhancement and spatial–spectral nonlocal network for change detection in VHR remote sensing images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 4507013. [Google Scholar] [CrossRef]
  37. Zhang, J.; Shao, Z.; Ding, Q.; Huang, X.; Wang, Y.; Zhou, X.; Li, D. AERNet: An attention-guided edge refinement network and a dataset for remote sensing building change detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5617116. [Google Scholar] [CrossRef]
  38. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
  39. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  40. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229. [Google Scholar]
  41. Bandara, W.G.C.; Patel, V.M. A transformer-based siamese network for change detection. In Proceedings of the IGARSS 2022—2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 207–210. [Google Scholar]
  42. Chen, H.; Qi, Z.; Shi, Z. Remote sensing image change detection with transformers. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5607514. [Google Scholar] [CrossRef]
  43. Jiang, H.; Peng, M.; Zhong, Y.; Xie, H.; Hao, Z.; Lin, J.; Ma, X.; Hu, X. A survey on deep learning-based change detection from high-resolution remote sensing images. Remote Sens. 2022, 14, 1552. [Google Scholar] [CrossRef]
  44. Gao, S.; Li, W.; Sun, K.; Wei, J.; Chen, Y.; Wang, X. Built-up area change detection using multi-task network with object-level refinement. Remote Sens. 2022, 14, 957. [Google Scholar] [CrossRef]
  45. Yang, K.; Xia, G.S.; Liu, Z.; Du, B.; Yang, W.; Pelillo, M.; Zhang, L. Asymmetric siamese networks for semantic change detection in aerial images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5609818. [Google Scholar] [CrossRef]
  46. Zheng, Z.; Zhong, Y.; Tian, S.; Ma, A.; Zhang, L. ChangeMask: Deep multi-task encoder-transformer-decoder architecture for semantic change detection. ISPRS J. Photogramm. Remote Sens. 2022, 183, 228–239. [Google Scholar] [CrossRef]
  47. Zhou, Y.; Wang, J.; Ding, J.; Liu, B.; Weng, N.; Xiao, H. SIGNet: A Siamese graph convolutional network for multi-class urban change detection. Remote Sens. 2023, 15, 2464. [Google Scholar] [CrossRef]
  48. Yuan, P.; Zhao, Q.; Zhao, X.; Wang, X.; Long, X.; Zheng, Y. A transformer-based Siamese network and an open optical dataset for semantic change detection of remote sensing images. Int. J. Digit. Earth 2022, 15, 1506–1525. [Google Scholar] [CrossRef]
  49. Caruana, R. Multitask learning. Mach. Learn. 1997, 28, 41–75. [Google Scholar] [CrossRef]
  50. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  51. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
Figure 1. Difference between SCD and BCD.
Figure 2. Four strategies for semantic change detection.
Figure 3. Structure of the network. (a) Overview of the network. (b) Attention-guided residual module. (c) Coordinate attention mechanism. During feature extraction, the feature map is downsampled to 1/8 of the original image; the feature map is then predicted directly and upsampled to the original image size to attain the final result.
Figure 4. Global and local attention mechanism.
Figure 5. (a) Change-aware mask branch. (b) Change-aware mask module.
Figure 6. A selection of samples from the Landsat-SCD dataset. (a–d) represent four different scenarios in the Landsat-SCD dataset.
Figure 7. A selection of samples from the SECOND dataset. (a–d) represent four different scenarios in the SECOND dataset.
Figure 8. Visual comparisons of CGMNet and other compared methods on the Landsat-SCD dataset.
Figure 9. Visual comparisons of CGMNet and other compared methods on the SECOND dataset.
Table 1. Comparison with SOTA methods on the Landsat-SCD dataset.

| Methods | mIoU (%) | F1-Score (%) | SeK (%) | Overall Score (%) | Params (Mb) | FLOPs (Gbps) | Infer. Time (s/100e) |
|---|---|---|---|---|---|---|---|
| CLNet | 77.94 | 78.53 | - | - | 8.10 | 34.63 | - |
| BIT | 82.34 | 83.40 | - | - | 3.04 | 34.93 | - |
| DESSN | 78.50 | 79.21 | - | - | 19.35 | 147.21 | - |
| AERNet | 85.19 | 86.42 | - | - | 25.36 | 51.27 | - |
| HRSCD3 | 82.22 | 82.22 | 38.53 | 51.64 | 12.77 | 42.94 | 6.53 |
| HRSCD4 | 82.29 | 83.62 | 38.01 | 51.30 | 13.71 | 43.97 | 6.52 |
| SSCDl | 82.75 | 84.11 | 40.42 | 53.13 | 23.31 | 189.76 | 2.08 |
| BiSRNet | 84.89 | 86.20 | 46.67 | 58.13 | 23.39 | 190.30 | 2.32 |
| Ours | 85.24 | 86.52 | 47.43 | 58.77 | 24.45 | 196.86 | 3.23 |

In the original table, the top performance is shown in bold and the second best is underlined; only the SCD methods are compared in terms of computational costs.
Table 2. Comparison with SOTA methods on the SECOND dataset.

| Methods | mIoU (%) | F1-Score (%) | SeK (%) | Overall Score (%) | Params (Mb) | FLOPs (Gbps) | Infer. Time (s/100e) |
|---|---|---|---|---|---|---|---|
| CLNet | 66.21 | 64.94 | - | - | 8.10 | 34.63 | - |
| BIT | 70.31 | 70.32 | - | - | 3.04 | 34.93 | - |
| DESSN | 69.54 | 69.26 | - | - | 19.35 | 147.21 | - |
| AERNet | 70.63 | 70.99 | - | - | 25.36 | 51.27 | - |
| HRSCD3 | 68.84 | 69.04 | 16.61 | 32.28 | 12.77 | 42.94 | 6.53 |
| HRSCD4 | 72.13 | 73.25 | 20.27 | 35.83 | 13.71 | 43.97 | 6.52 |
| SSCDl | 72.33 | 73.01 | 21.29 | 36.60 | 23.31 | 189.76 | 2.08 |
| BiSRNet | 72.34 | 73.11 | 21.00 | 36.40 | 23.39 | 190.30 | 2.32 |
| Ours | 72.69 | 73.59 | 21.79 | 37.06 | 24.45 | 196.86 | 3.23 |

In the original table, the top performance is shown in bold and the second best is underlined; only the SCD methods are compared in terms of computational costs.
Table 3. Ablation experimental results on the Landsat-SCD dataset.

| Methods | mIoU (%) | F1-Score (%) | SeK (%) | Overall Score (%) | Params (Mb) |
|---|---|---|---|---|---|
| Base | 81.67 | 82.75 | 37.87 | 51.01 | 23.33 |
| Base + GLAM | 84.77 | 86.11 | 46.12 | 57.72 | 23.40 |
| Base + GLAM + Branch | 85.24 | 86.52 | 47.43 | 58.77 | 23.45 |

In the original table, the top performance is shown in bold and the second best is underlined.
Table 4. Ablation experimental results on the SECOND dataset.

| Methods | mIoU (%) | F1-Score (%) | SeK (%) | Overall Score (%) | Params (Mb) |
|---|---|---|---|---|---|
| Base | 72.39 | 73.17 | 20.76 | 36.25 | 23.33 |
| Base + GLAM | 72.42 | 73.20 | 21.03 | 36.45 | 23.40 |
| Base + GLAM + Branch | 72.69 | 73.59 | 21.79 | 37.06 | 23.45 |

In the original table, the top performance is shown in bold and the second best is underlined.