In this study, KOMPSAT-3 imagery at product level 1G and aerial change detection datasets were used to assess the effectiveness of the proposed method. The KOMPSAT-3 sensor captures a panchromatic (PAN, 450–900 nm) image with a high spatial resolution of 0.7 m and multispectral (MS) images in the blue (B, 450–520 nm), green (G, 520–600 nm), red (R, 630–690 nm), and near-infrared (NIR, 760–900 nm) bands with a lower spatial resolution of 2.8 m. These images were pre-processed with radiometric, atmospheric, and geometric corrections. The radiometric correction converts the image pixel values (digital numbers, DNs) to reflectance values: the DNs are first converted to radiance and then to top-of-atmosphere (TOA) reflectance. The gain and offset values are provided by the KOMPSAT-3 specification [
28] to derive the TOA reflectance values. After the atmospheric errors are corrected, the geometric correction is performed. This study defines changes as the result of construction or destruction activities; natural seasonal changes are excluded. The KOMPSAT-3 dataset provides MS and PAN images with dimensions of 7342 × 6847 × 4 and 29,368 × 27,388 × 1, respectively. The images were captured over Seoul. For our experiments, the multispectral images were enhanced by IHS mean-filter pan-sharpening [
13] to produce high-resolution R, G, B, NIR, and PAN channels. Five cropped image pairs covering different areas with diverse surface characteristics were used for the dataset.
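The DN-to-reflectance chain described above can be sketched as follows. This is a minimal illustration, assuming a linear calibration (radiance = gain × DN + offset) and the standard TOA reflectance formula. The function names and the numeric constants are hypothetical placeholders, not the values from the KOMPSAT-3 specification.

```python
import math


def dn_to_radiance(dn, gain, offset):
    """Assumed linear sensor calibration: radiance = gain * DN + offset."""
    return gain * dn + offset


def radiance_to_toa_reflectance(radiance, esun, sun_elevation_deg,
                                earth_sun_dist_au=1.0):
    """Standard TOA reflectance: pi * L * d^2 / (ESUN * cos(solar zenith))."""
    solar_zenith = math.radians(90.0 - sun_elevation_deg)
    return (math.pi * radiance * earth_sun_dist_au ** 2
            / (esun * math.cos(solar_zenith)))


# Hypothetical constants for one band (NOT the KOMPSAT-3 values).
GAIN, OFFSET, ESUN = 0.02, 0.0, 1850.0

radiance = dn_to_radiance(5000, GAIN, OFFSET)
reflectance = radiance_to_toa_reflectance(radiance, ESUN,
                                          sun_elevation_deg=60.0)
print(f"radiance={radiance:.2f}, TOA reflectance={reflectance:.4f}")
```

In practice the per-band gain, offset, and solar irradiance would come from the sensor specification, and the sun elevation from the image metadata.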
Figure 6 shows one set of training images and its ground truth, with a size of 3232 × 2206, acquired in March 2014 and December 2015. The images contain multiple construction changes, residential districts, roads, playgrounds, and a small hill in the urban area. For training, the images were divided into patches with a size of 256 × 256 by a raster-scan sliding window with a stride of 70, generating 1024 pairs of patches and their ground truths. The networks were trained with the binary cross-entropy loss function and the Adam optimizer, using 30 epochs, a learning rate of 0.0001, β1 = 0.9, β2 = 0.999, and a batch size of 4. The proposed and conventional methods were trained on a computer with an Intel i7-6700 CPU @ 3.4 GHz (8 CPUs) and an NVIDIA GeForce GTX 1080 Ti GPU, and were implemented in PyTorch.
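For reference, the binary cross-entropy loss named above can be written out in a few lines. The sketch below is a plain-Python illustration over flattened per-pixel change probabilities, not the PyTorch code used in the experiments. The function name and the clamping constant `eps` are assumptions made for this example.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean BCE over predicted change probabilities and 0/1 change labels."""
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


# An uninformative prediction (0.5 everywhere) costs log(2) per pixel.
print(binary_cross_entropy([0.5, 0.5], [1, 0]))
# Confident, correct predictions cost less.
print(binary_cross_entropy([0.9, 0.1], [1, 0]))
```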
To evaluate the effectiveness of the proposed change detection system, four different areas were selected.
Figure 7 shows “Test area 1,” which has under-construction changes, acquired in March 2014 and December 2015. This area is in downtown Seoul, which has tall buildings, residential districts, playgrounds, and roads.
Figure 8 shows “Test area 2,” an urban area with building construction, acquired in March 2014 and October 2015.
Figure 9 shows “Test area 3,” acquired in March 2014 and October 2015, which also has construction taking place in a forested area. The two images have significant radiometric differences. In addition, “Test area 4,” located near a river, was acquired in March 2014 and October 2015 (
Figure 10). Note that this area contains a bridge over the river, where geometric distortion occurs owing to the different viewpoints. The image data for each test area were divided into 117 non-overlapping patches with a size of 256 × 256. All networks were tested on these patch pairs. All ground truths were generated manually.
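The patch division used for training (overlapping, stride 70) and testing (non-overlapping) can be sketched with one raster-scan helper. This is an illustrative sketch: the function names are hypothetical, partial patches at the image border are simply dropped (the text does not say how borders were handled), and the 2304 × 3328 test-image size is an assumed example chosen because it tiles into 9 × 13 = 117 patches.

```python
def patch_origins(length, patch=256, stride=256):
    """Start coordinates of patches that fit entirely along one axis."""
    if length < patch:
        return []
    return list(range(0, length - patch + 1, stride))


def patch_grid(height, width, patch=256, stride=256):
    """All (row, col) patch origins in raster-scan order."""
    return [(r, c)
            for r in patch_origins(height, patch, stride)
            for c in patch_origins(width, patch, stride)]


# Assumed test-image size (not stated in the text): 9 x 13 tiles, no overlap.
test_tiles = patch_grid(2304, 3328, patch=256, stride=256)
print(len(test_tiles))

# A smaller stride yields overlapping patches, as in the training setting.
print(patch_origins(512, patch=256, stride=128))
```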
Because KOMPSAT-3 and CDD have different image characteristics and information, an individual model was trained on each dataset, and separate analyses of these datasets are presented. We used two metrics to measure the detection accuracy, namely,
F1-score and kappa coefficient (KC), to assess the effectiveness of the proposed and conventional architectures. The
F1-score is the harmonic mean of precision and recall; its best value is 1.0 and its worst value is 0. In addition, KC is widely used to evaluate binary change detection because it is more informative for imbalanced data. The KC score is interpreted according to
Table 1 [
30,
31]. To assess the effectiveness of the proposed method, several conventional architectures (specifically, U-Net [
23], ATTUNet [
24] and Modified-UNet++ [
22]) were trained with the same hyperparameter settings and dataset as the proposed architecture and evaluated on the test datasets.
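Both metrics follow directly from the confusion counts of a binary change map. The sketch below uses the standard definitions of the F1-score and Cohen's kappa. The function names and the toy prediction are illustrative assumptions, and kappa is returned on a 0–1 scale, whereas the KC values in this section appear to be reported on a 0–100 scale.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN for flattened binary change maps."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def kappa(tp, fp, fn, tn):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - chance) / (1 - chance) if chance != 1 else 0.0


# Toy example: 10 pixels, 3 of which are truly changed.
pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
truth = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
counts = confusion_counts(pred, truth)
print(counts, f1_score(*counts[:3]), kappa(*counts))
```

Because kappa subtracts the agreement expected by chance, a map that labels every pixel as unchanged scores near zero even when its raw accuracy is high, which is why it suits imbalanced change maps.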
4.1. Performance Evaluation on the KOMPSAT-3 Dataset
Figure 12 shows the detection results of the proposed and conventional methods for “Test area 1.” In this experiment, the image order was set to the same order as in the training configuration, in which ‘Image P’ shows an under-construction area, and ‘Image C’ shows a completed construction area. As shown in
Figure 12, the proposed network subjectively yields a more accurate change map compared to the conventional approaches. The conventional U-Net and attention U-Net using DI generate more false positives than when using JF, owing to spectral information loss. JF can preserve the spectral information, which is the reflection characteristic of Earth’s surface; thus, the model can learn a better representation of changes. In addition, the proposed network consistently produces a better change map for reverse input order.
Figure 13 depicts the detection results for the reverse input order, which can be interpreted as the destruction change case. Conventional methods, such as U-Net, ATTUNet, and Modified-UNet++ using JF, yield worse change maps compared to the detection outcomes for the original order. These conventional approaches are significantly influenced by the input order established for change detection. However, the proposed algorithm consistently outperforms the conventional methods in terms of change detection, regardless of the input order.
Figure 14 shows the detection results of the proposed and conventional methods for “Test area 2,” in which the input order was the same as that of the training configuration. The conventional methods yield moderately precise change maps. However, the U-Nets generate more false positives; in particular, U-Net with DI gives the worst result. ATTUNet and Modified-UNet++ consistently achieve precise change maps using both DI and JF because the surface characteristics of this area are similar to those of the training dataset. Nevertheless, the proposed network yields a visually more accurate change map than the conventional approaches. In the reverse input order, the conventional methods with JF produce worse change maps than those for the original input order, as shown in
Figure 15. Their detection outcomes are significantly influenced by the input order, which means that the representation learned from JF depends on the order of the input images. Networks with DI generate relatively consistent change maps regardless of the input order. In addition, the proposed network maintains a consistent change map regardless of the input order.
In “Test area 3,” ATTUNet with JF results in a better change map than that of other networks, as shown in
Figure 16. We found that the proposed network cannot identify the changes in some regions, owing to the more complex surface characteristics of this area. Although the proposed algorithm performs worse than ATTUNet with JF in this test case, it maintains a low number of false positives. Furthermore, the proposed method still outperforms the conventional approaches with DI and Modified-UNet++. The conventional methods with DI fail to detect the changes in some regions, and Modified-UNet++ generates false positives in all the building regions. The proposed method consistently generates a good change map for the reversed input order, as shown in
Figure 17. ATTUNet with JF degrades significantly for the reverse order, producing many more false positives; almost all building regions are detected as changes. In addition, Modified-UNet++ yields a very inaccurate change map.
Figure 18 shows the change maps for “Test area 4.” In this area, the proposed network visually achieves better change detection than the conventional approaches. The conventional methods with DI produce worse change maps, with more false positives and false negatives, which implies that DI is ineffective for surfaces with very complex structures. The use of JF is more effective than the use of DI, resulting in better change maps; however, the proposed network still outperforms the conventional methods. For the reverse input order, the proposed algorithm generates a consistent change map, as shown in
Figure 19. The ATTUNet with JF is significantly degraded, producing many more false positives. Almost all the building regions are detected as changes. In addition, Modified-UNet++ yields a very unclear change map in which almost all regions are identified as changes.
Table 2 shows the values of
F1-score and KC to measure the detection accuracy objectively for the proposed and conventional algorithms. For “Test area 1,” all the algorithms achieve
F1-score and KC values greater than 0.75 and 75, respectively. In terms of KC interpretation, as tabulated in
Table 1, the conventional approaches with DI yield KC scores interpreted as “Good,” which means that they can identify the changes precisely, as shown in
Figure 12. In addition, the conventional approaches with JF achieve better scores than those with DI in terms of both
F1-score and KC, interpreted as “Very Good,” which means that they can detect changes very well. However, the proposed architecture yields the highest score in terms of
F1-score and KC, 0.88 and 87.11, respectively, which implies that the proposed approach can identify changes more precisely than the conventional methods. For “Test area 2,” U-Net with DI still results in the lowest values of
F1-score and KC, 0.55 and 54.37, respectively, which indicate a “Moderate” outcome. The other conventional approaches produce change maps interpreted as “Good,” achieving
F1-scores above 0.61 and KC values between 61 and 80. In this area, the proposed algorithm can identify the changes by achieving an
F1-score over 0.83 and a KC of 82.8, which implies a “Very Good” level and better detection than the other conventional approaches. For “Test area 3,” Modified-UNet++ results in “Fair” detection. In this area, ATTUNet yields the highest score by achieving an
F1-score of 0.63 and a KC of 62.87, resulting in a “Good” interpretation. Note that the proposed algorithm has a slightly worse score than ATTUNet for this case. However, the proposed architecture results in the highest values of
F1-score and KC, 0.40 and 39.78, respectively, for “Test area 4.” The conventional approaches achieve
F1-score and KC below 0.32 and 31, respectively. Overall, the proposed method yields better detection in terms of
F1-score and KC on average by achieving 0.68 and 66.93, respectively, which implies a “Good” detection rate for change detection. In addition, in terms of
F1-score and KC of the proposed method, the standard deviations are 0.19 and 21.11, indicating relatively stable change detection performance.
Table 3 shows the objective evaluation of the proposed and conventional approaches for the reverse image input order. The conventional approaches using JF show significant performance degradation compared with the results before reversing the image order; the trained networks appear to solve change detection only for the image order used in training. In contrast, the conventional methods using DI consistently produce the same results as before reversing the image order. However, the use of DI frequently results in poor detection or lower performance caused by spectral information loss. Overall, the proposed architecture yields consistent performance with a higher detection rate, even for the reverse image order.
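The contrast between the two input types can be illustrated with a toy example. The sketch assumes that DI is the absolute per-pixel difference of the two images and that JF stacks the two images along the channel axis; both definitions and the function names are assumptions made for illustration, and the pixel values are invented.

```python
def difference_image(img_a, img_b):
    """Assumed DI: absolute per-pixel difference, symmetric in its inputs."""
    return [abs(a - b) for a, b in zip(img_a, img_b)]


def joint_feature(img_a, img_b):
    """Assumed JF: both images kept and stacked as separate channels."""
    return [list(img_a), list(img_b)]


# Invented single-band pixel values for an image pair.
image_p = [10, 200, 30, 40]
image_c = [10, 50, 30, 90]

# DI is identical for both input orders, but the pixel values themselves
# are discarded, so only the magnitude of the change survives.
print(difference_image(image_p, image_c) == difference_image(image_c, image_p))

# JF preserves the values of both images, but swapping the inputs gives the
# network a different tensor, so a model trained on one order may not
# generalize to the other.
print(joint_feature(image_p, image_c) == joint_feature(image_c, image_p))
```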
To investigate the effectiveness of several existing architectures (U-Net [
23], ATTUNet [
24], and Modified-UNet++ [
22]), they were trained with both forward and reversed input image pairs, as a form of data augmentation.
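This order-swap augmentation can be sketched as follows. The helper name and the tuple layout are hypothetical, and the sketch assumes that the binary ground-truth mask is unchanged when the image order is reversed, since both construction and destruction count as change.

```python
def augment_with_reversed_pairs(samples):
    """Return the samples plus a copy of each with the image order swapped.

    samples: list of (image_p, image_c, change_mask) tuples.
    """
    augmented = []
    for image_p, image_c, mask in samples:
        augmented.append((image_p, image_c, mask))  # forward order
        augmented.append((image_c, image_p, mask))  # reversed order, same mask
    return augmented


# Placeholder strings stand in for image and mask arrays.
train = [("p0", "c0", "m0"), ("p1", "c1", "m1")]
train_aug = augment_with_reversed_pairs(train)
print(len(train), "->", len(train_aug))
```

Doubling the number of training pairs in this way also roughly doubles the number of iterations per epoch.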
Table 4 shows
F1-score and KC, which objectively measure the detection accuracy of the proposed and existing algorithms for forward input pairs from the data-augmented KOMPSAT-3 dataset. Because change detection using the DI feature as input is independent of the input order, we only evaluated the existing approaches using JF. The table shows that the existing methods yield higher accuracy on average than without data augmentation. In particular, we found that the
F1-score and KC for ‘Area 4’ of Modified-UNet++ improved the most. However, the proposed architecture outperforms the existing works on average for all the test datasets.
Table 5 shows the detection accuracy of the proposed and existing algorithms for the reversed input order with data-augmented training image pairs from the KOMPSAT-3 dataset. Although the detection accuracy of the existing methods improves over that shown in
Table 3, the proposed architecture yields better performance on average than the existing methods. Moreover, the proposed architecture achieves this without the increase in training time that data augmentation incurs.