Article

Edge-Guided Hierarchical Network for Building Change Detection in Remote Sensing Images

School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(13), 5415; https://doi.org/10.3390/app14135415
Submission received: 30 May 2024 / Revised: 14 June 2024 / Accepted: 17 June 2024 / Published: 21 June 2024

Abstract

Building change detection monitors building changes by comparing and analyzing multi-temporal images acquired over the same area and plays an important role in land resource planning, smart city construction, and natural disaster assessment. Unlike change detection in conventional scenes, buildings in the building change detection task usually appear densely distributed and are prone to occlusion; at the same time, building change detection is easily disturbed by shadows cast by illumination and by similar-colored features around the buildings, which makes the edges of the changed regions difficult to distinguish. To address these problems, this paper uses edge information to guide the neural network to learn edge features related to changes and to suppress edge features unrelated to changes, so as to accurately extract building change information. First, an edge-extracted module is designed, which combines deep and shallow features to compensate for the lack of feature information at different resolutions and to extract the edge structure of the changed objects; second, an edge-guided module is designed to fuse the edge features with features at different levels and to guide the neural network to focus on confusing building edge regions by increasing the edge weights, thereby improving the network's ability to detect changed edges. The proposed building change detection algorithm has been validated on two publicly available datasets (the WHU and LEVIR-CD building change detection datasets). The experimental results show that the proposed model achieves F1 scores of 91.14% and 89.76%, respectively, demonstrating superior performance compared to recent deep learning-based change detection methods.

1. Introduction

Remote sensing image change detection [1] is a technique that analyzes the process of feature changes in remote sensing images of the same geographic area acquired at different times. Through change detection, researchers are able to identify whether the region of interest in two remote sensing images has changed and obtain an accurate change map for subsequent research [2]. Building change detection is a branch of remote sensing image change detection that monitors building changes by comparing and analyzing remote sensing images from multiple time phases. Currently, remote sensing building change detection has a wide range of applications in the fields of land resource planning [3,4], smart city construction [5], and humanitarian remote sensing. Notably, change detection plays a pivotal role in humanitarian remote sensing by aiding in natural disaster assessments and landmine and unexploded ordnance detection. In the aftermath of natural disasters, remote sensing technology can swiftly provide essential information that is critical for orchestrating emergency responses and effectively distributing resources [6,7,8]. Furthermore, it can detect subtle terrain changes, thereby enhancing the safety and efficiency of surveys in areas potentially littered with landmines and unexploded ordnance [9,10].
In recent years, deep learning models [11,12,13,14,15,16,17,18] with powerful feature representation capability have gradually taken their place in the field of change detection. Deep learning-based change detection methods feed input image pairs into neural networks to extract features. The change detection task is then treated as a pixel-by-pixel binary or multi-class classification task, generating binary difference maps and obtaining the final change maps through operations such as threshold segmentation. Based on the stage at which the multi-temporal images are fused, deep learning-based change detection methods can be categorized into early fusion methods and late fusion methods. Early fusion refers to stacking the multi-temporal images before the encoding stage. Geng et al. [19] sliced the two multi-spectral images into small blocks and stacked the corresponding blocks of the bi-temporal images as an input sequence, which was fed into a recurrent neural network to extract features. Peng et al. [20] constructed a change detection network based on Unet++ [21]; this network concatenates the bi-temporal data and feeds it as a whole into the neural network, extracting multi-scale change feature maps through dense connections. Late fusion uses multiple branches with shared weights to extract features from the different temporal phases, which are then fused during the decoding stage. Zhan et al. [22] fed the bi-temporal images into a five-layer Siamese neural network trained with a weighted contrastive loss to extract spatial features and then used the k-nearest neighbor technique to generate the final change maps. Chen et al. [23] chose ResNet50 as the baseline network and used spatial attention and channel attention modules to improve the discrimination of change features. Chen et al. [24] introduced a Transformer module to extract image context information and obtained strong change detection results. However, the above methods focus on fusing features from the two temporal images and ignore the fusion of shallow and deep information, which leads to coarse edges in the results. Since building change detection requires high edge clarity, it is difficult to achieve high-performance results in many scenarios if these methods are used directly for building change detection.
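To make the distinction between the two fusion strategies concrete, the following minimal PyTorch sketch contrasts early fusion (channel-wise stacking of the bi-temporal images before a single encoder) with late fusion (a shared-weight Siamese encoder whose features are fused afterwards). The layer sizes and module names are illustrative assumptions and do not reproduce any of the cited networks.

```python
import torch
import torch.nn as nn

class EarlyFusionCD(nn.Module):
    """Early fusion: stack the bi-temporal images along channels, then encode once."""
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(2 * in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(feat_ch, 2, 1)  # 2-class (change / no-change) logits

    def forward(self, x1, x2):
        return self.head(self.encoder(torch.cat([x1, x2], dim=1)))

class LateFusionCD(nn.Module):
    """Late fusion: a Siamese encoder with shared weights, features fused afterwards."""
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(2 * feat_ch, 2, 1)

    def forward(self, x1, x2):
        f1, f2 = self.encoder(x1), self.encoder(x2)  # same weights for both dates
        return self.head(torch.cat([f1, f2], dim=1))

if __name__ == "__main__":
    x1 = torch.randn(1, 3, 256, 256)
    x2 = torch.randn(1, 3, 256, 256)
    print(EarlyFusionCD()(x1, x2).shape, LateFusionCD()(x1, x2).shape)
```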
Some recent works have started to explore approaches that improve the accuracy of building edge change detection, for example by introducing additional branches to detect the edges of building change areas [25,26,27]. Although these methods improve the edge accuracy of change detection to some extent, capturing edge information from complexly distributed ground objects remains challenging: (1) Buildings often have color characteristics similar to their surrounding regions, making it difficult to capture the edges of changed buildings; (2) Buildings usually cast shadows, which can make the edges of the buildings hard to identify. These factors make it difficult for existing neural network models to discriminate the edges of buildings.
To address the above problems, we propose a novel edge-guided hierarchical network (EGHNet) to enhance the accuracy of change detection at the edges of buildings. In the proposed EGHNet, a new edge-extracted module is designed to enhance the ability to capture the edge information of buildings. In the edge-extracted module, we design a pixel-wise cross-scale feature comparison operator based on the observation that edge features in large-scale maps are denser than those in small-scale maps. This operator gives the edge-extracted module a sharper ability to capture information at the edges of remote sensing objects. Compared with traditional convolution operators, the designed operator can capture edge features more accurately. The edge-guided module performs an edge feature compensation process: it selectively supplements the edge features onto the multi-scale feature maps under the guidance of the spatial distribution cues captured by the edge-extracted module. The edge-guided module is designed to address the problem that deep convolutional features easily lose detail information, thereby ensuring that the model better detects the details of changed buildings. Additionally, the designed network structure is hierarchical, which allows it to better detect both macroscopic structures and microscopic details of remote sensing scenes, further enhancing the accuracy of change detection. The main contributions of this paper are summarized as follows:
(1)
An edge-guided hierarchical network is proposed that leverages edge information to accurately extract building change information; by focusing on the edges of changed regions, it enhances the accuracy of building change detection;
(2)
An edge-extracted module is designed to combine deep and shallow features for comprehensive edge structure extraction. Additionally, an edge-guided module is developed to fuse edge features with different levels of features, enhancing the network’s ability to detect and focus on the edges of changed regions;
(3)
The proposed method demonstrates superior performance on WHU and LEVIR-CD building change detection datasets, achieving high F1 scores and outperforming recent learning-based change detection methods.

2. Methods

2.1. Overall Architecture

High-quality edge information reflects both the geometric shape and location of buildings, aiding in their localization and detection. The encoder–decoder structure, through multiple downsampling steps, obtains high-level semantic features. However, this process irreversibly leads to a significant loss of image detail, affecting change detection performance, especially in the accuracy of edge segmentation.
To solve the above problems and improve the localization accuracy of edge regions, this study proposes an edge-guided building change detection algorithm, termed the Edge-Guided Hierarchical Network (EGHNet). The overall framework of the algorithm includes an encoder-decoder backbone, an edge-extracted module (EEM), and an edge-guided module (EGM). The overall structure is shown in Figure 1.
For a given pair of bi-temporal images, the images are fed into the encoder of the neural network to extract features that contain both deep semantic information and shallow detail information. The edge-extracted module utilizes both the deep and shallow features to capture the edge structure of the changed objects, thus effectively extracting an accurate edge map of the changes. The edge-guided module fuses the extracted edge information with features at different levels; guided by the edges, the detail information is re-injected into the decoding process, which avoids the loss of detail inherent in conventional encoder-decoder structures, enhances the feature representation during decoding, and uses the edge information to guide the generation of the change detection results.

2.2. Hierarchical Encoder-Decoder Architecture

The overall network follows an encoder-decoder architecture and contains two encoders and one decoder (denoted in green, orange, and blue in Figure 1, respectively).
The two encoders are a pair of Siamese neural network branches that share the same structure and weights and are used to extract features from the remote sensing images $X_1$ and $X_2$, respectively. Each encoder uses the ResNet-18 network [28] with the last three layers used for classification removed. Each encoder branch consists of five stages that output five levels of features $F_t^i$, where $t$ corresponds to the two acquisition times of images $X_1$ and $X_2$. The resolution of the $i$-th level feature is $C_i \times (H/2^{i-1}) \times (W/2^{i-1})$, and the numbers of feature channels of the five levels are $\{C_1, C_2, C_3, C_4, C_5\} = \{64, 64, 128, 256, 512\}$, respectively. In this way, the two encoders extract features independently from the bi-temporal data, and each sample pair in a multi-temporal image can be represented by hierarchical features.
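A minimal sketch of how such a five-level Siamese encoder could be assembled from torchvision's ResNet-18 with the classification layers removed is given below. The stage grouping, the `weights` API (torchvision ≥ 0.13), and the default torchvision strides are our assumptions, so the per-level resolutions may differ slightly from those reported above.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

class SiameseResNet18Encoder(nn.Module):
    """Shared-weight encoder returning five feature levels per image;
    channel counts follow the paper: 64, 64, 128, 256, 512."""
    def __init__(self, pretrained=True):
        super().__init__()
        net = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1 if pretrained else None)
        # Drop avgpool and fc (the classification layers); keep the backbone stages.
        self.stage1 = nn.Sequential(net.conv1, net.bn1, net.relu)  # 64 channels
        self.stage2 = nn.Sequential(net.maxpool, net.layer1)       # 64 channels
        self.stage3 = net.layer2                                   # 128 channels
        self.stage4 = net.layer3                                   # 256 channels
        self.stage5 = net.layer4                                   # 512 channels

    def forward_single(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        f4 = self.stage4(f3)
        f5 = self.stage5(f4)
        return [f1, f2, f3, f4, f5]

    def forward(self, x1, x2):
        # Both temporal branches share the same weights (Siamese structure).
        return self.forward_single(x1), self.forward_single(x2)

encoder = SiameseResNet18Encoder(pretrained=False)
feats_t1, feats_t2 = encoder(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256))
print([tuple(f.shape) for f in feats_t1])
```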
The expressive ability of a CNN model is closely related to its receptive field. Usually, shallow networks with small receptive fields can meet the localization needs of fine targets, while large-scale targets need deep networks to provide large receptive fields for localization. A single receptive field can hardly meet the needs of targets at different scales. In order to accurately identify changes of targets at different scales, EGHNet connects the features extracted by the encoder with the features generated by the decoder via skip connections, which maximally strengthens the multi-scale feature representation ability of the building change detection network. Each level of features in the decoder consists of two parts: one part is the features extracted by the encoder, for which the edge-guided module increases the weight of the target edges, and the other part is the features of the same resolution obtained by upsampling in the decoder. These two parts are skip-connected, concatenated along the channel dimension, passed through a convolutional layer, and then fed to the next level. The output layer of the network is normalized by a Softmax layer, and the final output is a change map corresponding to the two remote sensing images.
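The following minimal sketch illustrates one decoder level of the kind described above: the coarser decoder feature is upsampled, concatenated along the channel dimension with the skip-connected (edge-guided) encoder feature, and fused by a convolutional layer. The channel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderLevel(nn.Module):
    """One decoder level: upsample the coarser decoder feature, concatenate it with
    the skip-connected (edge-guided) encoder feature, and fuse with a 3x3 convolution."""
    def __init__(self, dec_ch, skip_ch, out_ch):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(dec_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, dec_feat, skip_feat):
        # Bring the decoder feature up to the resolution of the skip feature.
        dec_feat = F.interpolate(dec_feat, size=skip_feat.shape[-2:],
                                 mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([dec_feat, skip_feat], dim=1))

# Example: fuse a 512-channel decoder feature with a 256-channel edge-guided skip feature.
level = DecoderLevel(dec_ch=512, skip_ch=256, out_ch=256)
out = level(torch.randn(1, 512, 16, 16), torch.randn(1, 256, 32, 32))
print(out.shape)  # torch.Size([1, 256, 32, 32])
```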

2.3. Edge-Extracted Module

The low-level features of an image refer to features such as contours, edges, colors, texture shapes, etc., which can provide the exact location of the target features. Although the low-level features contain rich edge detail information, the image scene in the low-level features is more complex, and it is easy to introduce the edges of some non-changing target objects, which interferes with the change detection of the features. Therefore, this algorithm considers using high-level semantic features to assist the low-level features in locating the target object, exclude the interference of non-changing regions, and extract the edge information of changing features. The structure of the edge-extracted module to realize this function is shown in Figure 2.
In the edge-extracted module, the highest-resolution features and the lowest-resolution features jointly extract the edges of the change targets. As shown in Figure 2, this module takes four inputs, which are differenced pairwise to extract the change features:
$F^1 = \left| F_{t_1}^1 - F_{t_2}^1 \right|, \qquad F^5 = \left| F_{t_1}^5 - F_{t_2}^5 \right|$ (1)
The edge-extracted module estimates the edge region of the remote sensing image at each spatial location $(i, j)$. Specifically, it searches for suitable points within a $k \times k$ window centered at point $(ki, kj)$ in feature $F^1$. Each coarse location $F^5(i, j) \in \mathbb{R}^{1 \times 1 \times C}$ is sent to the edge-extracting unit together with the corresponding window $F^1(ki, kj)$. Each window $F^1(ki, kj) \in \mathbb{R}^{k \times k \times C}$ is split by position into $k^2$ vectors $V^1$ of size $1 \times 1 \times C$, which are then combined with $F^5(i, j)$ in the edge-extracting unit. The edge-extracting unit combines the coarse and fine features to extract the edge information and obtain the location feature $F'(i, j)$ at each position. The formula is as follows:
$F'(i, j) = \mathcal{M}\big( w\left[ V^1(ki, kj), F^5(i, j) \right] \;\big|\; i \in \{1, 2, \dots, k\},\ j \in \{1, 2, \dots, k\} \big)$ (2)
where $w$ denotes a learnable MLP, $\mathcal{M}$ denotes an operation that permutes the $1 \times 1 \times C$ vectors into a $k \times k \times C$ tensor according to their original positions, and $k$ denotes the resolution ratio between the two features.
Next, all location features are integrated according to their original locations by a concatenation operation to obtain the fused edge feature $F'$; this process can be denoted as follows:
$F' = \mathcal{M}\big( F'(i, j) \;\big|\; i \in \{1, 2, \dots, H\},\ j \in \{1, 2, \dots, W\} \big)$ (3)
where $\mathcal{M}$ denotes an operation that permutes the $k \times k \times C$ tensors into an $H \times W \times C$ tensor according to their original positions, and $H$ and $W$ denote the height and width of the feature $F^1$.
Based on the above fused feature $F'$, a 3 × 3 convolutional layer and a Sigmoid function are applied to finally obtain the edge feature $F_{edge}$; this process can be denoted as follows:
$F_{edge} = \sigma\big( Conv_{3 \times 3}(F') \big)$ (4)
where $\sigma$ denotes the Sigmoid function and $Conv_{3 \times 3}$ denotes a convolutional layer with a 3 × 3 kernel.
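As a rough illustration of Equations (1)–(4), the sketch below pairs every fine-resolution difference vector with the coarse difference vector of the window it belongs to and applies a shared pixel-wise MLP; nearest-neighbour upsampling of $F^5$ is used here as an efficient equivalent of the window splitting described above. The channel-alignment convolutions and the channel sizes are implementation assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeExtractedModule(nn.Module):
    """Sketch of the edge-extracted module (EEM): pair every high-resolution difference
    vector F1(ki, kj) with the coarse difference vector F5(i, j) of its window, apply a
    shared pixel-wise MLP (realised here as 1x1 convolutions), and predict an edge map."""
    def __init__(self, high_ch=64, low_ch=512, mid_ch=64):
        super().__init__()
        # Channel alignment of the two inputs (an implementation assumption).
        self.proj_high = nn.Conv2d(high_ch, mid_ch, 1)
        self.proj_low = nn.Conv2d(low_ch, mid_ch, 1)
        # The learnable mapping w applied to each [V1, F5] pair.
        self.mlp = nn.Sequential(
            nn.Conv2d(2 * mid_ch, mid_ch, 1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 1),
        )
        self.edge_head = nn.Conv2d(mid_ch, 1, 3, padding=1)

    def forward(self, f_t1_1, f_t2_1, f_t1_5, f_t2_5):
        f1 = self.proj_high(torch.abs(f_t1_1 - f_t2_1))  # fine difference, H x W
        f5 = self.proj_low(torch.abs(f_t1_5 - f_t2_5))   # coarse difference, H/k x W/k
        # Nearest-neighbour upsampling copies F5(i, j) to every position of its
        # k x k window, i.e. each fine vector is paired with its coarse counterpart.
        f5_up = F.interpolate(f5, size=f1.shape[-2:], mode="nearest")
        fused = self.mlp(torch.cat([f1, f5_up], dim=1))
        return torch.sigmoid(self.edge_head(fused))       # edge map F_edge in [0, 1]

eem = EdgeExtractedModule()
edge = eem(torch.randn(1, 64, 256, 256), torch.randn(1, 64, 256, 256),
           torch.randn(1, 512, 16, 16), torch.randn(1, 512, 16, 16))
print(edge.shape)  # torch.Size([1, 1, 256, 256])
```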

2.4. Edge-Guided Module

The encoder network performs multi-level feature extraction and difference feature extraction on the remote sensing images in order to better extract discriminative features between changed and unchanged regions. In Section 2.3, the edge-extracted module extracts the change information of the target objects by combining deep semantic features and shallow features. In order to further improve the edge detection performance for changed buildings and obtain finer features, the neural network uses the edge features to guide the building change detection process. Based on these considerations, the edge-guided module (EGM) is designed in this section. There are four edge-guided modules in the EGHNet structure, located in the skip connection part of each resolution level (shown as the gray modules in Figure 1). This module guides the network to improve the quality of the change detection results by fusing the edge features and the change features of the target objects, increasing the weight of the edge information in the change features at different resolutions, enhancing the saliency of the changed targets, and obtaining more discriminative features. Its specific structure is shown in Figure 3.
The input of this module consists of two parts: the extracted edge feature $F_{edge}$ and the bi-temporal spatial features $F_{t_1}^i$ and $F_{t_2}^i$ at different resolutions. The edge feature $F_{edge}$ is generated by the edge-extracted module (EEM) described in the previous subsection, while the bi-temporal features $F_{t_1}^i$ and $F_{t_2}^i$ come from the two encoder branches; Equation (5) is used to compute the change feature at the corresponding resolution:
$F^i = \left| F_{t_1}^i - F_{t_2}^i \right|$ (5)
Meanwhile, the edge feature $F_{edge}$ is downsampled $i-1$ times to generate a feature with the same spatial size as the change feature. The downsampled edge feature is then multiplied element-wise with the change feature $F^i$ to obtain the edge-weighted feature, which is added element-wise to $F^i$. Next, a 1 × 1 convolutional layer is used for channel alignment to obtain the fused feature $\tilde{F}^i$. The process is denoted as follows:
$\tilde{F}^i = Conv_{1 \times 1}\big[ F^i \oplus \big( F^i \otimes Maxpooling(F_{edge}) \big) \big]$ (6)
where $Conv_{1 \times 1}$ denotes a 1 × 1 convolution, $\oplus$ denotes element-wise addition, $\otimes$ denotes the Hadamard product (element-wise multiplication), and $Maxpooling$ denotes max pooling used as the downsampling method.
Through the above operation, the initial fusion of change features and edge features is realized, and the edge information in the deep features is given weights, which makes the network increase the attention to the edges.
To further explore the intrinsic relationships within the features, the EGM introduces a channel attention mechanism to filter the feature channels and retain the feature responses that are most beneficial for the change detection task. Specifically, the EGM uses channel-wise global max pooling (GMP) and global average pooling (GAP) to aggregate the feature $\tilde{F}^i$, denoted respectively as:
$F_m^i = Maxpooling(\tilde{F}^i)$ (7)
$F_a^i = Avepooling(\tilde{F}^i)$ (8)
The features $F_m^i$ and $F_a^i$ obtained in these two ways are then stacked along the channel dimension, and the attention value of each channel is obtained using a convolution and a Sigmoid function. Finally, the EGM multiplies the channel attention values with the fused feature $\tilde{F}^i$ to obtain the final output $F_e^i$. The process is denoted as follows:
$F_e^i = \tilde{F}^i \otimes \sigma\big[ Conv_{1 \times 1}\big( Concat(F_m^i; F_a^i) \big) \big]$ (9)
where $\otimes$ denotes the Hadamard product, $Conv_{1 \times 1}$ denotes a convolutional layer with a 1 × 1 kernel, $\sigma$ denotes the Sigmoid function, and $Concat(\cdot\,;\cdot)$ denotes stacking two features along the channel dimension.
In summary, the EGM realizes the fusion of edge features and change features, increases the weight of the edge portion of the features, and assigns weights to the feature channels through the attention mechanism, which directs the network to focus on the edge region of the building, and thus improves the change detection capability of the network.
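A minimal sketch of the edge-guided module following Equations (5)–(9) is given below, assuming a single-channel edge map and illustrative channel sizes; the exact layer configuration in the authors' implementation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeGuidedModule(nn.Module):
    """Sketch of the edge-guided module (EGM): re-weight the change feature with the
    edge map, then apply channel attention built from global max/avg pooling."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)      # channel alignment
        self.attn = nn.Conv2d(2 * channels, channels, kernel_size=1)  # channel attention

    def forward(self, f_t1, f_t2, edge):
        diff = torch.abs(f_t1 - f_t2)                                  # Eq. (5)
        # Max-pool the edge map down to the resolution of this level.
        edge_ds = F.adaptive_max_pool2d(edge, diff.shape[-2:])
        fused = self.fuse(diff + diff * edge_ds)                       # Eq. (6)
        # Channel attention from global max and average pooling, Eqs. (7)-(9).
        f_max = F.adaptive_max_pool2d(fused, 1)
        f_avg = F.adaptive_avg_pool2d(fused, 1)
        attn = torch.sigmoid(self.attn(torch.cat([f_max, f_avg], dim=1)))
        return fused * attn

egm = EdgeGuidedModule(channels=128)
out = egm(torch.randn(1, 128, 64, 64), torch.randn(1, 128, 64, 64),
          torch.randn(1, 1, 256, 256))
print(out.shape)  # torch.Size([1, 128, 64, 64])
```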

2.5. Loss Function

During the training process of the edge-guided change detection network algorithm, there are two types of supervision: one for the change detection results themselves and the other for the edge information of the change results. Let Y represent the labels of the data, Y e represent the edge maps of the data, Y ^ represent the change map results output by the network, and Y ^ e represent the edge map results output by the network.
The proposed algorithm supervises the change detection results using a weighted binary cross-entropy loss $L_{WBCE}$, which focuses more on the difference between the output value and the true value, converges faster, and assigns different weights to the pixel values. Its formula is as follows:
$L_{WBCE}(\hat{Y}, Y) = -\frac{1}{N} \sum_{i=1}^{N} \big[ \beta\, y_i \cdot \log \hat{y}_i + (1 - \beta)(1 - y_i) \cdot \log(1 - \hat{y}_i) \big]$ (10)
where $\beta$ denotes the class balance coefficient and $N$ denotes the number of pixels.
The numbers of pixels in the edge and background portions of the edge labels are extremely imbalanced, which limits the ability of the cross-entropy loss to supervise the model. Therefore, Dice Loss is used in this section to supervise the edge information of the change results. Dice Loss evaluates the similarity between two samples, and it performs better than the cross-entropy loss when dealing with data with an imbalanced distribution of positive and negative samples. The value of Dice Loss lies in the range of 0 to 1, and a smaller value indicates a higher similarity between the two samples. It is defined as follows:
$L_{dice} = 1 - \frac{2 \cdot |Y_e \cap \hat{Y}_e|}{|Y_e| + |\hat{Y}_e|} = 1 - \frac{2 \cdot TP}{2 \cdot TP + FP + FN}$ (11)
where TP, FP, and FN denote true positive, false positive, and false negative, respectively.
In summary, the overall loss of the network is:
$L = L_{WBCE}(Y, \hat{Y}) + \lambda\, L_{dice}(Y_e, \hat{Y}_e)$ (12)
where λ denotes the weight balance coefficient, which was set to 5 in the experiment.
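A minimal sketch of the training objective in Equations (10)–(12) is given below; the class-balance coefficient β, the numerical-stability constant, and the tensor shapes are illustrative assumptions.

```python
import torch

def weighted_bce(pred, target, beta=0.5, eps=1e-7):
    """Weighted binary cross-entropy of Eq. (10); `pred` holds change probabilities."""
    pred = pred.clamp(eps, 1.0 - eps)
    loss = -(beta * target * torch.log(pred) +
             (1.0 - beta) * (1.0 - target) * torch.log(1.0 - pred))
    return loss.mean()

def dice_loss(pred_edge, target_edge, eps=1e-7):
    """Dice loss of Eq. (11), computed on the predicted and reference edge maps."""
    inter = (pred_edge * target_edge).sum()
    return 1.0 - (2.0 * inter + eps) / (pred_edge.sum() + target_edge.sum() + eps)

def total_loss(pred_change, target_change, pred_edge, target_edge, lam=5.0):
    """Overall objective of Eq. (12): change supervision plus weighted edge supervision."""
    return weighted_bce(pred_change, target_change) + lam * dice_loss(pred_edge, target_edge)

# Toy usage with random probability maps and binary labels.
p_c, y_c = torch.rand(2, 1, 256, 256), torch.randint(0, 2, (2, 1, 256, 256)).float()
p_e, y_e = torch.rand(2, 1, 256, 256), torch.randint(0, 2, (2, 1, 256, 256)).float()
print(total_loss(p_c, y_c, p_e, y_e).item())
```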

3. Results

We conduct extensive experiments to demonstrate the effectiveness of our proposed model. This section describes the experimental configurations and reports the evaluation results.

3.1. Experimental Dataset

In order to fully validate the effectiveness of the proposed algorithm on building datasets, the WHU Building Change Detection dataset [29] and the LEVIR-CD dataset [30] are selected for the experiments. These two datasets were selected because of their high relevance to building change detection tasks and their high temporal and spatial resolutions. Both datasets are specifically designed for building change detection, contain multiple building types, and are rich in detail, which ensures that our experiments can validate the performance of the building change detection model across a wide range of scenarios.
The WHU dataset covers remotely sensed images of Christchurch after a magnitude 6.3 earthquake in 2011 and its reconstruction over the following four years. The dataset consists of a remote sensing image containing 12,796 buildings acquired in April 2012 and a remote sensing image containing 16,077 buildings of the same area acquired in 2016, both with a spatial resolution of 0.3 m. In this experiment, the whole image is cropped into 256 × 256 image blocks, 20% of which are randomly selected as the test set while the rest are used as the training set.
The LEVIR-CD dataset consists of 637 pairs of 1024 × 1024 pixel bi-temporal images, divided into training, validation, and test sets. The bi-temporal images are provided by Google Earth and come from 20 different areas in Texas. To be consistent with the experimental setup on the WHU dataset, we cut each pair of images into non-overlapping 256 × 256 pixel image blocks, which are assigned to the training, validation, and test sets according to the original dataset split. The main challenges of this dataset are the very uneven distribution of samples and the complexity of the building changes. Figure 4a–c shows examples from the LEVIR-CD dataset.
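As a simple illustration of the patch preparation described above, the following sketch cuts an image into non-overlapping 256 × 256 blocks; border handling and the exact train/validation/test assignment are simplifications here and follow the original dataset split in practice.

```python
import numpy as np

def tile_image(img: np.ndarray, patch: int = 256):
    """Cut an H x W x C image into non-overlapping patch x patch blocks,
    discarding any incomplete border (a simplification; padding is also common)."""
    h, w = img.shape[:2]
    tiles = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            tiles.append(img[top:top + patch, left:left + patch])
    return tiles

# Example: one 1024 x 1024 LEVIR-CD image yields 16 non-overlapping 256 x 256 patches.
tiles = tile_image(np.zeros((1024, 1024, 3), dtype=np.uint8))
print(len(tiles))  # 16
```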

3.2. Training Parameter Settings and Compared Methods

The network is implemented with the deep learning framework PyTorch 1.0.1 and trained on an NVIDIA RTX 3090 GPU (NVIDIA Corporation, Santa Clara, CA, USA). During training, the network uses Adam as the optimizer. The encoder is initialized with the ResNet-18 model pre-trained on ImageNet, and the remaining layers are initialized using the Xavier method. The initial learning rate for both datasets is 0.001. The kernel size of the convolutional layers in the decoder is set to 3, and the weight balance coefficient is set to 5.
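A minimal sketch of the optimizer and initialization settings stated above is given below; the placeholder model merely stands in for EGHNet, and the split between pretrained and Xavier-initialized parts is simplified.

```python
import torch
import torch.nn as nn

def xavier_init(module):
    """Xavier initialization for the layers that are not loaded from the
    ImageNet-pretrained ResNet-18 encoder."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Placeholder model standing in for the decoder / EEM / EGM parts of EGHNet.
model = nn.Sequential(nn.Conv2d(6, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 2, 1))
model.apply(xavier_init)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # initial learning rate 0.001
```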
In the experiments, four evaluation metrics computed from the confusion matrix are used: overall accuracy (OA), precision (PRE), recall (REC), and F1-score (F1).
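For reference, these four metrics can be computed from the binary confusion matrix as in the following sketch (the small epsilon terms are an assumption added to avoid division by zero):

```python
import numpy as np

def change_detection_metrics(pred: np.ndarray, label: np.ndarray):
    """Compute OA, precision, recall, and F1 from the binary confusion matrix."""
    pred, label = pred.astype(bool), label.astype(bool)
    tp = np.logical_and(pred, label).sum()
    fp = np.logical_and(pred, ~label).sum()
    fn = np.logical_and(~pred, label).sum()
    tn = np.logical_and(~pred, ~label).sum()
    oa = (tp + tn) / (tp + tn + fp + fn)
    pre = tp / (tp + fp + 1e-7)
    rec = tp / (tp + fn + 1e-7)
    f1 = 2 * pre * rec / (pre + rec + 1e-7)
    return {"OA": oa, "PRE": pre, "REC": rec, "F1": f1}

print(change_detection_metrics(np.random.rand(256, 256) > 0.5,
                               np.random.rand(256, 256) > 0.5))
```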
To validate the effectiveness of our proposed EGHNet, we compare our method with multiple deep learning-based methods, including FC-EF [31], FC-Siam-Conc [31], BIT-CD [32], FCN-PP [33], UNet++MSOF [20], DSIFN [34], ChangeFormer [35], HSAA-CD [36] that are described as follows:
(1)
FC-EF: Image-level fusion method, where the bi-temporal images are concatenated as a single input to an FCN;
(2)
FC-Siam-Conc: Feature-level fusion method, which employs a Siamese FCN to extract multi-level features and fuses the bi-temporal information by concatenation;
(3)
FCN-PP: An FCN with a pyramid pooling module, which provides a wider receptive field and fuses the multi-scale features;
(4)
UNet++MSOF: A densely connected “U-shaped” convolutional network with multiple side-output fusion;
(5)
DSIFN: An encoder–decoder architecture that extracts multi-scale features using a VGG-based encoder and fuses features through an attention-based decoder;
(6)
EGRCNN: An encoder–decoder architecture that contains two prediction branches, with one branch predicting the change area and the other predicting the edges of the change area;
(7)
BIT-CD: Transformer-based change detector, which fuses bi-temporal convolutional features using transformer encoders and decoders;
(8)
ChangeFormer: A transformer-based Siamese network architecture for Change Detection from a pair of co-registered remote sensing images;
(9)
HSAA-CD: A dual-branch CNN and transformer-based Siamese network.

3.3. Comparison Experiments on the WHU Dataset

We compare the proposed edge-guided building change detection algorithm with other change detection methods on the WHU building change detection dataset.
Figure 5 and Figure 6 show the qualitative comparison results of the multiple methods in a large building change scenario and a small building change scenario, respectively. In order to better observe the results, different colors are used to show the change detection results.
The comparison results show that FC-EF and FC-Siam-Conc produce many missed and false detections in the change detection maps (shown as the red and green parts in Figure 5 and Figure 6). The change maps generated by FCN-PP based on transfer learning are visually closer to the manually annotated labels, but there are still some missed detection regions in certain scenes. Unet++MSOF and DSIFN obtain relatively good detection results for target scenes at various scales, but their edges are not clear. BIT-CD achieves good detection results in most scenes; however, it produces a large region of false detections under the interference of similarly colored features (shown as the red part in Figure 5i). Across the various scenarios, it can be seen that the change maps predicted by the proposed EGHNet are more complete and have clearer details. In particular, our EGHNet still performs well under the interference of similarly colored features. This is because the edge-extracted module brings additional information to the feature extraction and reconstruction process, which positively impacts the performance of the network.
Table 1 presents the quantitative comparison of the different methods on the WHU dataset. The results demonstrate that our proposed EGHNet outperforms the other methods. Specifically, its precision (PRE) is 93.08%, recall (REC) is 89.65%, F1 score is 91.14%, and overall accuracy (OA) is 98.74%. These results clearly show the superiority of EGHNet on the WHU dataset, especially in terms of precision and recall, achieving significant improvements in several metrics.

3.4. Comparison Experiments on the LEVIR-CD Dataset

In the comparison experiments of the LEVIR-CD dataset, two building change scenarios in the LEVIR-CD dataset were selected for analysis, namely: large irregular building change and small densely distributed building change scenarios. Figure 7 and Figure 8 show the qualitative comparison results under the above two change scenarios in the LEVIR-CD dataset, respectively. For better observation, different colors are used to show the change detection results.
From Figure 7 and Figure 8, it can be seen that among the compared methods, FC-EF has the weakest detection capability: it obtains acceptable results only for large buildings and cannot detect targets well among sparse buildings and dense building clusters. The other methods also suffer from different degrees of false detection, among which FC-Siam-Conc and Unet++MSOF have the most. In contrast, the false detections of FCN-PP and DSIFN are reduced, but some missed detections remain. Our proposed EGHNet is able to accurately detect the changed regions and capture the edges of ground objects in various scenarios, and its results are visually very close to the manually annotated labels.
Table 2 shows the performance of the different methods on the LEVIR-CD dataset. The proposed EGHNet performs well in several respects: its F1 and OA are the best among all methods, at 89.76% and 99.02%, respectively, and its PRE and REC are 91.56% and 88.23%, respectively, which are among the best results. This confirms the effectiveness of the proposed algorithm.

3.5. Ablation Studies

In order to verify the effectiveness of the proposed module, ablation experiments are designed in this section. Given that the edge-extracted module and edge-guided module in the proposed algorithms support each other and cannot work alone, we eliminate both EEM and EGM in the ablation experiments and keep only the change detection branch as the baseline model, called EGHNet_backbone.
As shown in Table 3, the ablation experiments on the WHU dataset indicate that the addition of edge guidance significantly improves the performance of EGHNet_backbone by 4.07% in terms of F1 scores. These experimental results validate the effectiveness of introducing edge information into the change detection task and highlight the great potential of this method in improving change detection performance.
In Equation (12), the loss function of edge information accounts for an important part of the overall loss function, and suitable weights have an impact on the change detection effect of the network.
Therefore, we conduct an ablation experiment to explore the supervision of the edge map of the change target and the setting of the weight balance coefficient λ. In order to comprehensively reflect the influence of the λ value, the ablation experiment considers three cases: when λ = 0, no supervision is applied to the edge information; when λ = 1, the edge information is weighted equally with the change information; when λ > 1, the proportion of the edge loss in the objective function is increased. λ = 5 is chosen in this experiment. Table 4 reports the objective evaluation metrics of the proposed algorithm on the WHU dataset as the value of λ changes. From the data in Table 4, it can be seen that the change detection performance is improved when the edge information is supervised. Furthermore, the performance is further improved when the weight of the edge loss in the overall loss is increased (as shown in the third row, where λ = 5).

3.6. Efficiency Analysis

We evaluated the efficiency of our proposed method, EGHNet, by comparing it with several existing methods in terms of the number of parameters (Params.) and floating-point operations (FLOPs). All methods used an input size of 256 × 256 × 3 for consistency.
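A minimal sketch of how the parameter count can be obtained is given below; the FLOPs measurement is left to a profiling library such as thop as an assumption (the paper does not state which tool was used), and the placeholder model stands in for the compared networks.

```python
import torch
import torch.nn as nn

def count_parameters(model: nn.Module) -> float:
    """Number of trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# Placeholder model; in practice this would be one of the compared change detectors.
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 2, 1))
x = torch.randn(1, 3, 256, 256)  # the 256 x 256 x 3 input used for all methods
print(f"Params: {count_parameters(model):.2f} M")

# FLOPs are typically measured with a profiling tool such as `thop`
# (an assumption; uncomment if the package is installed):
# from thop import profile
# flops, params = profile(model, inputs=(x,))
```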
As shown in Table 5, FC-EF and FC-Siam-Conc have the lowest Params and FLOPs but also the lowest F1 scores, indicating lower accuracy. FCN-PP and Unet++MSOF have moderate Params and FLOPs with improved F1 scores, yet they still do not achieve the highest performance levels.
BIT-CD maintains a balance between computational efficiency and performance, with a low parameter count and FLOPs, achieving a high F1 score of 89.35%. ChangeFormer, despite its high computational demands, achieves a high F1 score of 88.03%.
Our proposed EGHNet demonstrates superior performance with the highest F1 score of 91.14% while maintaining a reasonable parameter count (25.06 M) and FLOPs (112.09 GFLOPs). This highlights EGHNet’s effectiveness in delivering top-tier accuracy without the highest computational cost, showcasing a better tradeoff between model size and performance compared to other complex models like ChangeFormer.

3.7. Case Analysis

To demonstrate the advantage of our proposed approach, we select the post-disaster reconstruction after the 6.3 magnitude earthquake in Christchurch, New Zealand, as a case. The bi-temporal images are obtained in the same area in April 2012 and 2016, respectively. The number of buildings in the area increased significantly during the four years. We chose two areas of size 256 × 256 for building change analysis. Figure 9 shows the addition of some new buildings to the original building stock in the area. Our EGHNet performs well in detecting the increase in buildings in this area. Specifically, the change map detected by EGHNet is almost the same as the ground truth.

4. Conclusions

Building change detection is challenging because buildings are usually densely distributed and prone to occlusion, and the detection is easily disturbed by shadows cast by illumination and by similarly colored features around the buildings. To address these problems, this paper proposes a building change detection model based on edge guidance, named EGHNet, to accurately extract building change information. The model includes a hierarchical encoder-decoder network, an edge-extracted module, and an edge-guided module. The encoder extracts shallow and deep features from the bi-temporal images; the edge-extracted module combines the shallow and deep features to extract the edge information of the target objects and thus obtain an accurate edge map of the changed objects. The edge-guided module fuses the edge information with the difference information, increases the weight of the target edges, and guides the neural network to focus on the easily confused building edge regions, thereby improving the network's change detection performance. We compare the proposed change detection model with existing models on two publicly available building change detection datasets. Experimental results demonstrate the superiority of the proposed model.
In future work, several directions can be pursued to further enhance the applicability of the proposed model. Extending the approach to detect other types of changes beyond buildings, such as natural landscape alterations or infrastructure developments, could broaden its utility. Integrating additional data sources, such as other satellite bands or sensor data, might provide more comprehensive insights and improve detection accuracy. Additionally, exploring transfer learning techniques could enable the model to be adapted to new datasets or geographic regions with minimal retraining, enhancing its generalizability and applicability to a wider range of change detection tasks. In real-world scenarios, our edge-focused methodology has potential cross-disciplinary applications, including humanitarian remote sensing, particularly in the detection of landmines and unexploded ordnance. Our approach is exceptionally adept at addressing the challenges posed by remotely triggered and non-metallic scatterable landmines. These landmines, which typically contain minimal metallic components, present significant detection challenges in complex environments. The enhanced edge detection capabilities of our proposed model could improve the identification and localization of these hazardous objects, facilitating safer and more efficient surveying of affected areas.

Author Contributions

Conceptualization, M.Y. and S.H.; methodology, Y.Z.; software, Y.F.; validation, M.Y., Y.Z. and S.H.; formal analysis, Y.F.; investigation, Y.Z.; resources, Y.Z.; data curation, Y.F.; writing—original draft preparation, M.Y.; writing—review and editing, Y.Z. and S.H.; visualization, Y.F.; supervision, Y.Z.; project administration, Y.Z.; funding acquisition, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 62171320.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available at https://chenhao.in/LEVIR/ (LEVIR-CD dataset, accessed on 12 January 2022) and https://study.rsgis.whu.edu.cn/pages/download/building_dataset.html (WHU-CD dataset, accessed on 5 May 2021).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Singh, A. Review article digital change detection techniques using remotely-sensed data. Int. J. Remote Sens. 1989, 10, 989–1003.
2. Domínguez, E.M.; Meier, E.; Small, D.; Schaepman, M.E.; Bruzzone, L.; Henke, D. A multisquint framework for change detection in high-resolution multitemporal SAR images. IEEE Trans. Geosci. Remote Sens. 2018, 56, 3611–3623.
3. Lv, Z.; Wang, F.; Cui, G.; Benediktsson, J.A.; Lei, T.; Sun, W. Spatial–spectral attention network guided with change magnitude image for land cover change detection using remote sensing images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4412712.
4. Rokni, K.; Ahmad, A.; Selamat, A.; Hazini, S. Water feature extraction and change detection using multitemporal Landsat imagery. Remote Sens. 2014, 6, 4173–4189.
5. Jiang, H.; Peng, M.; Zhong, Y.; Xie, H.; Hao, Z.; Lin, J.; Ma, X.; Hu, X. A Survey on Deep Learning-Based Change Detection from High-Resolution Remote Sensing Images. Remote Sens. 2022, 7, 1552.
6. Gong, M.; Zhao, J.; Liu, J.; Miao, Q.; Jiao, L. Change detection in synthetic aperture radar images based on deep neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2015, 27, 125–138.
7. Mahdavi, S.; Salehi, B.; Huang, W.; Amani, M.; Brisco, B. A PolSAR change detection index based on neighborhood information for flood mapping. Remote Sens. 2019, 11, 1854.
8. Lu, W.; Wei, L.; Nguyen, M. Bi-temporal Attention Transformer for Building Change Detection and Building Damage Assessment. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 4917–4935.
9. de Smet, T.S.; Nikulin, A. Catching “butterflies” in the morning: A new methodology for rapid detection of aerially deployed plastic land mines from UAVs. Lead. Edge 2018, 37, 367–371.
10. Baur, J.; Dewey, K.; Steinberg, G.; Nitsche, F.O. Modeling the Effect of Vegetation Coverage on Unmanned Aerial Vehicles-Based Object Detection: A Study in the Minefield Environment. Remote Sens. 2024, 16, 2046.
11. Huo, S.; Zhou, Y.; Zhang, L.; Feng, Y.; Xiang, W.; Kung, S.Y. Geometric Variation Adaptive Network for Remote Sensing Image Change Detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5609714.
12. Lee, J.; Wiratama, W.; Lee, W.; Marzuki, I.; Sim, D. Bilateral Attention U-Net with Dissimilarity Attention Gate for Change Detection on Remote Sensing Imageries. Appl. Sci. 2023, 13, 2485.
13. Yang, M.; Jiao, L.; Liu, F.; Hou, B.; Yang, S. Transferred deep learning-based change detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6960–6973.
14. Wen, Y.; Zhang, Z.; Cao, Q.; Niu, G. TransC-GD-CD: Transformer-based Conditional Generative Diffusion Change Detection Model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 7144–7158.
15. Hong, D.; Qiu, C.; Yu, A.; Quan, Y.; Liu, B.; Chen, X. Multi-Task Learning for Building Extraction and Change Detection from Remote Sensing Images. Appl. Sci. 2023, 13, 1037.
16. Noman, M.; Fiaz, M.; Cholakkal, H.; Narayan, S.; Anwer, R.M.; Khan, S.; Shahbaz Khan, F. Remote sensing change detection with transformers trained from scratch. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–14.
17. Sun, C.; Chen, H.; Du, C.; Jing, N. SemiBuildingChange: A Semi-supervised High-Resolution Remote Sensing Image Building Change Detection Method with a Pseudo Bi-Temporal Data Generator. IEEE Trans. Geosci. Remote Sens. 2023, 51, 5622319.
18. Peng, X.; Zhong, R.; Li, Z.; Li, Q. Optical remote sensing image change detection based on attention mechanism and image difference. IEEE Trans. Geosci. Remote Sens. 2020, 59, 7296–7307.
19. Wang, J.; Zhong, Y.; Zhang, L. Change Detection Based on Supervised Contrastive Learning for High-Resolution Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5601816.
20. Geng, J.; Fan, J.; Wang, H.; Ma, X. Change detection of marine reclamation using multispectral images via patch-based recurrent neural network. In Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA, 23–28 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 612–615.
21. Peng, D.; Zhang, Y.; Guan, H. End-to-end change detection for high resolution satellite images using improved UNet++. Remote Sens. 2019, 11, 1382.
22. Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. Unet++: A nested u-net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support; Springer: Cham, Switzerland, 2018; pp. 3–11.
23. Zhan, Y.; Fu, K.; Yan, M.; Sun, X.; Wang, H.; Qiu, X. Change detection based on deep siamese convolutional network for optical aerial images. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1845–1849.
24. Chen, J.; Yuan, Z.; Peng, J.; Chen, L.; Huang, H.; Zhu, J.; Liu, Y.; Li, H. DASNet: Dual attentive fully convolutional Siamese networks for change detection in high-resolution satellite images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 14, 1194–1206.
25. Zhang, H.; Lin, M.; Yang, G.; Zhang, L. ESCNet: An end-to-end superpixel-enhanced change detection network for very-high-resolution remote sensing images. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 28–42.
26. Bai, B.; Fu, W.; Lu, T.; Li, S. Edge-guided recurrent convolutional neural network for multitemporal remote sensing image building change detection. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5610613.
27. Jiang, Y.; Hu, L.; Zhang, Y.; Yang, X. WRICNet: A weighted rich-scale inception coder network for remote sensing image change detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4705313.
28. Song, S.; Zhang, Y.; Yuan, Y. Iterative Edge Enhancing Framework for Building Change Detection. IEEE Geosci. Remote Sens. Lett. 2023, 21, 6002605.
29. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
30. Ji, S.; Wei, S.; Lu, M. Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Trans. Geosci. Remote Sens. 2018, 57, 574–586.
31. Chen, H.; Shi, Z. A spatial-temporal attention-based method and a new dataset for remote sensing image change detection. Remote Sens. 2020, 12, 1662.
32. Daudt, R.C.; Le Saux, B.; Boulch, A. Fully convolutional siamese networks for change detection. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 4063–4067.
33. Chen, H.; Qi, Z.; Shi, Z. Remote sensing image change detection with transformers. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5607514.
34. Lei, T.; Zhang, Y.; Lv, Z.; Li, S.; Liu, S.; Nandi, A.K. Landslide inventory mapping from bitemporal images using deep convolutional neural networks. IEEE Geosci. Remote Sens. Lett. 2019, 16, 982–986.
35. Zhang, C.; Yue, P.; Tapete, D.; Jiang, L.; Shangguan, B.; Huang, L.; Liu, G. A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images. ISPRS J. Photogramm. Remote Sens. 2020, 166, 183–200.
36. Zhang, Y.; Zhao, Y.; Dong, Y.; Du, B. Self-supervised pre-training via multi-modality images with transformer for change detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5402711.
Figure 1. Structure of Edge Guided Network (EGHNet). The green downward arrow denotes down-sampling. The yellow upward arrow denotes upsampling. The white hollow rightward arrow indicates output after the Sigmoid activation function.
Figure 2. Structure of Edge-Extracted Module.
Figure 3. Structure of Edge-Guided Module.
Figure 4. Samples of the LEVIR-CD dataset and WHU. (ac) LEVIR-CD dataset. (df) WHU dataset.
Figure 5. Change detection results of different methods for large building change scenarios in the WHU dataset. (a) Prechange image; (b) post-change image; (c) label; (dj) are the detection results of FC-EF [31], FC-Siam-Conc [31], FCN-PP [33], UNet++MSOF [20], DSIFN [34], BIT-CD [32] and the proposed EGHNet, respectively. In the color classification, the true positive is white, the true negative is black, the false positive is red, and the false negative is green.
Figure 6. Change detection results of different methods for small building change scenes in the WHU dataset. (a) Prechange image; (b) post-change image; (c) label; (dj) are the detection results of FC-EF [31], FC-Siam-Conc [31], FCN-PP [33], UNet++MSOF [20], DSIFN [34], BIT-CD [32], and the proposed EGHNet, respectively. In the color classification, the true positive is white, the true negative is black, the false positive is red, and the false negative is green.
Figure 7. Change Detection Results of Different Methods for Large Irregular Building Change Scenarios in LEVIR-CD Dataset. (a) Prechange image; (b) post-change image; (c) label; (dj) are the detection results of FC-EF [31], FC-Siam-Conc [31], FCN-PP [33], UNet++MSOF [20], DSIFN [34], BIT-CD [32], and the proposed EGHNet, respectively. In the color classification, the true positive is white, the true negative is black, the false positive is red, and the false negative is green.
Figure 8. Change Detection Results of Different Methods for Small Densely Distributed Building Change Scenarios in LEVIR-CD Dataset. (a) Prechange image; (b) post-change image; (c) label; (dj) are the detection results of FC-EF [31], FC-Siam-Conc [31], FCN-PP [33], UNet++MSOF [20], DSIFN [34], BIT-CD [32], and the proposed EGHNet, respectively. In the color classification, the true positive is white, the true negative is black, the false positive is red, and the false negative is green.
Figure 9. Case on the post-disaster reconstruction of Christchurch in New Zealand. (a) T1 image. (b) T2 image. (c) Ground truth. (d) Change maps generated by EGHNet. In the color classification, the true positive is white, the true negative is black, the false positive is red, and the false negative is green.
Table 1. Comparison results of quantitative metrics of different methods on the WHU dataset. The best two Results are in bold and underlined.
| Methods | PRE (%) | REC (%) | F1 (%) | OA (%) |
|---|---|---|---|---|
| FC-EF | 74.56 | 73.71 | 73.14 | 96.29 |
| FC-Siam-Conc | 83.96 | 79.32 | 80.52 | 97.52 |
| FCN-PP | 92.00 | 83.02 | 86.54 | 98.37 |
| Unet++MSOF | 89.18 | 87.82 | 87.88 | 98.39 |
| DSIFN | 72.62 | 88.00 | 79.21 | 97.83 |
| BIT-CD | 88.50 | 90.24 | 89.35 | 96.99 |
| ChangeFormer | 89.69 | 86.43 | 88.03 | 98.95 |
| HSAA-CD | 85.55 | 83.54 | 84.56 | 98.39 |
| EGHNet | 93.08 | 89.65 | 91.14 | 98.74 |
Table 2. Comparative results of quantitative metrics of different methods on the LEVIR-CD dataset. The best two results are in bold and underlined.
| Methods | PRE (%) | REC (%) | F1 (%) | OA (%) |
|---|---|---|---|---|
| FC-EF | 78.73 | 79.78 | 78.72 | 98.27 |
| FC-Siam-Conc | 85.39 | 85.93 | 85.44 | 98.84 |
| FCN-PP | 91.58 | 84.17 | 87.54 | 99.01 |
| Unet++MSOF | 88.52 | 85.29 | 86.69 | 98.94 |
| DSIFN | 85.61 | 86.63 | 85.48 | 98.97 |
| BIT-CD | 89.24 | 89.37 | 89.31 | 98.92 |
| ChangeFormer | 89.39 | 87.73 | 88.56 | 98.81 |
| HSAA-CD | 89.14 | 88.83 | 89.02 | 98.83 |
| EGHNet | 91.56 | 88.23 | 89.76 | 99.02 |
Table 3. Results of ablation experiments for edge detection assisted change detection on WHU dataset.
| Methods | PRE (%) | REC (%) | F1 (%) | OA (%) |
|---|---|---|---|---|
| EGHNet_backbone | 92.17 | 82.77 | 87.07 | 98.79 |
| EGHNet | 93.08 | 89.65 | 91.14 | 98.74 |
Table 4. The weight balance coefficient affects the results of the ablation experiments on the WHU dataset.
| λ | PRE (%) | REC (%) | F1 (%) | OA (%) |
|---|---|---|---|---|
| 0 | 92.38 | 84.26 | 88.06 | 98.89 |
| 1 | 90.69 | 91.45 | 90.65 | 98.63 |
| 5 | 93.08 | 89.65 | 91.14 | 98.74 |
Table 5. Efficiency Analysis on the WHU Dataset.
| Methods | GFLOPs (G) | Params (M) | F1 (%) |
|---|---|---|---|
| FC-EF | 15.93 | 5.48 | 73.14 |
| FC-Siam-Conc | 21.71 | 5.47 | 80.52 |
| FCN-PP | 45.79 | 13.39 | 86.54 |
| Unet++MSOF | 45.43 | 11 | 87.88 |
| DSIFN | 82.26 | 50.44 | 79.21 |
| BIT-CD | 8.75 | 3.5 | 89.35 |
| ChangeFormer | 202.7 | 41.02 | 88.03 |
| EGHNet | 112.09 | 25.06 | 91.14 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
