Article
Peer-Review Record

DTT-CGINet: A Dual Temporal Transformer Network with Multi-Scale Contour-Guided Graph Interaction for Change Detection

Remote Sens. 2024, 16(5), 844; https://doi.org/10.3390/rs16050844
by Ming Chen, Wanshou Jiang * and Yuan Zhou
Reviewer 1:
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Reviewer 5: Anonymous
Reviewer 6: Anonymous
Submission received: 17 December 2023 / Revised: 19 February 2024 / Accepted: 21 February 2024 / Published: 28 February 2024
(This article belongs to the Section AI Remote Sensing)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors


Comments for author File: Comments.pdf

Comments on the Quality of English Language


Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors


Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

See attached file

Comments for author File: Comments.pdf

Comments on the Quality of English Language

Minor editing of English language required

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

This research focuses on change detection based on a transformer and a contour-guided graph interaction module. Below, I report some general observations that the authors should consider.

(1) Please improve the visualization of the figures. 

(2) Please report the key numerical results in the abstract. 

(3) Please revise the paper carefully. I can see some typos like: "Illustration Of Graph Interaction Module."===> "Illustration of Graph Interaction Module."

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 5 Report

Comments and Suggestions for Authors

This manuscript proposes an effective method for change detection. However, some minor issues need to be addressed before publication.

(1). Fig. 1 is difficult to understand. Why are Fig. 1(b)-T1 and Fig. 1(b)-T2 the same?

(2). The Introduction covers too few change detection methods, which makes it difficult to support the conclusions drawn in lines 102-112.

(3). Transformer-based change detection methods have been a research direction in recent years, but there is relatively little introduction in Related Work. The following manuscripts can be reviewed to improve the Related Work: 10.1109/JSTARS.2022.3177235; 10.3390/rs15071868; 10.1109/TGRS.2022.3169479.

(4). Section 3.3 only introduces the composition and structure of CGRM, but does not explain why these structures are designed in this way.

(5). Tables 3-5 show Params and FLOPs, but there is no description of these results in Section 4.5.1.

(6). The content of Section 5.2 is more like a parameter analysis; consider revising the subtitle.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 6 Report

Comments and Suggestions for Authors

This manuscript proposed a hybrid network for change detection, namely DTT-CGINet. Specifically, the authors proposed two modules: 1) a dual temporal transformer (DTT) to capture spatiotemporal contextual relationships between the “pre-change” and “post-change” images, and 2) a contour-guided graph interaction module (CGIM) to guide the model to distinguish contour features. The motivation for this work is reasonable, and the ablation studies have demonstrated its effectiveness. The excellent performance of this study in quantitative comparisons is impressive. Therefore, the reviewer believes that this work can be published in REMOTE SENSING after revisions. Before publication, several questions should be answered. Detailed comments are as follows.

 

1. Novelty

(1)    CGIM is one of the core parts of this work. As for this part, I have two questions:

a)  The original source of inspiration for CGIM is [1]. The original work used randomly initialized adjacency matrices, as the inputs of [1] came from different representations of the same object. However, using randomly initialized adjacency matrices in this work seems counterintuitive. In Figure 4, the graphs for t1 and t2 appear to use distinct adjacency matrices, which could lead to the nodes within t1 and t2 establishing entirely different connectivity patterns. The authors should provide explanations or experimental evidence that this approach is effective. Additionally, contour maps inherently contain valuable topological information for representing inter-feature connections, and it would be beneficial to leverage this relationship.

b)  The Graph Interaction Module (GIM) uses Joint Attention to enable information propagation between G1 and G2, similar to the concept of performing self- and cross-attention separately. As a result, attention computation inherently achieves both intra- and inter-graph node interactions. Therefore, the need for the GCN module is questionable. The authors should perform ablation experiments to validate the effectiveness of the GCN module.

(2)    The FPD and CBAM modules carry out downstream processing after CGIM. While the paper provides a citation for CBAM, the operational details of FPD are unclear. The authors should provide a citation for FPD or explain its specific functioning.
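To make the two sub-comments above concrete, here is a toy sketch in plain NumPy (all function names and the adjacency construction are hypothetical illustrations, not taken from the paper): (a) one way an adjacency matrix could be derived from a contour map instead of being randomly initialized, and (b) the kind of GCN propagation step whose necessity, relative to joint attention, the requested ablation would test.

```python
import numpy as np

def contour_adjacency(contour, grid=4):
    """Hypothetical alternative to random initialization: derive node
    affinities from a contour map. Each node is one grid cell; cells with
    similar mean contour strength are linked more strongly."""
    H, W = contour.shape
    h, w = H // grid, W // grid
    # mean contour strength per cell -> one scalar feature per node
    feats = contour.reshape(grid, h, grid, w).mean(axis=(1, 3)).ravel()
    A = np.exp(-np.abs(feats[:, None] - feats[None, :]))  # symmetric, in (0, 1]
    return A / A.sum(axis=1, keepdims=True)               # row-normalized

def gcn_step(X, A, W):
    """One graph-convolution step (the module questioned in comment b):
    aggregate neighbors via A, project with W, apply ReLU."""
    return np.maximum(A @ X @ W, 0.0)

def joint_attention(X1, X2):
    """Toy joint attention over the union of both graphs' nodes, so each
    node attends to intra- and inter-graph nodes in a single softmax."""
    X = np.concatenate([X1, X2], axis=0)
    s = X @ X.T / np.sqrt(X.shape[1])
    s = np.exp(s - s.max(axis=1, keepdims=True))
    out = (s / s.sum(axis=1, keepdims=True)) @ X
    return out[:len(X1)], out[len(X1):]

rng = np.random.default_rng(0)
A = contour_adjacency(rng.random((32, 32)))        # 16 nodes from a 32x32 map
X1, X2 = rng.random((16, 8)), rng.random((16, 8))  # node features of G1, G2
H1 = gcn_step(X1, A, rng.random((8, 8)))
Y1, Y2 = joint_attention(X1, X2)
print(A.shape, H1.shape, Y1.shape)                 # (16, 16) (16, 8) (16, 8)
```

The sketch only illustrates the structural difference: `gcn_step` mixes information strictly along the fixed adjacency A, while `joint_attention` lets every node interact with all nodes of both graphs, which is the overlap the ablation in comment b) would quantify.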

 

2. Presentation

1)  The rectangular boxes in Figure 1 are shown in different colors. Is this differentiation intentional and based on a specific rule? If so, the authors should provide corresponding explanations.

2)  Figure 4 presents several issues. First, multiple 1×1 convolutions are depicted, and it is hard to correlate them with the ‘ϕ’ described in the formulas; the authors should differentiate them clearly in the figure. According to the description, the result of ‘ϕ1’ undergoes a product operation with ‘C’, but the 1×1 conv in Figure 4 does not seem to involve dimension reduction (contrary to line 276). Additionally, the feature map dimension in the ‘ϕ2’ branch changes from ‘C’ to ‘C′’. The input size of the contour map is ‘W×H’, and after upsampling it remains ‘W×H’. Moreover, using the English letter ‘X’ in shape descriptions is non-standard (the multiplication sign ‘×’ should be used). Certain module names are hard to discern, and it is advisable to increase the font size for better readability.

3)  According to Figures 4 and 9, there is interaction in both CGIM and DTT-Encoder in the dual-branch network, but this interaction is not evident in Figure 3.

4)  In Figure 5, CEM is depicted as taking inputs from different ResBlocks, whereas, according to the description, it should take inputs from different layers of the ResNet (line 240). Additionally, the input to SCB should be a feature map; however, Figure 5 shows an image as the input, which is inappropriate.

5)  The representation of "ith" in Figure 6 is ambiguous. Does it refer to inputs at different scales, or does GIM include a multi-layered structure? The authors should provide a clear description to avoid ambiguity.

6)  The attention map in Figure 8 appears to be distorted, and a softmax module is missing.

7)  "a contour map of images" should be in the plural form ("contour maps of images") (line 215).

8)  The caption of Table 1 does not need to be entirely in uppercase.

9)  The symbol "L" in the loss function should be formatted using the \mathcal{} font.
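As an illustration of comment 9), the loss symbol set with \mathcal{} might look as follows (the individual loss terms and the weight \lambda here are placeholders, not the paper's actual formulation):

```latex
% Calligraphic loss symbol, as suggested in comment 9);
% the terms L_1, L_2 and the weight \lambda are placeholders.
\mathcal{L} = \mathcal{L}_{1} + \lambda \, \mathcal{L}_{2}
```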

3. Experiments

1) The experimental evaluation in this work is thorough, including comparisons with state-of-the-art methods on multiple datasets and extensive ablation studies, which is appreciated by the reviewer. However, as mentioned in the comment on the novelty section point 1, there is a lack of corresponding ablation experiments to demonstrate the necessity of the GCN module.

2) The paper discusses the use of pretrained weights for the ResNet backbone but does not mention the weights for the DTT module. Transformer models without pretrained weights can be challenging to train on small datasets. The authors should clarify whether pretrained weights were used for the DTT module or provide information on the weight initialization method.

 [1] Wang K, Zhang X, Lu Y, et al. CGRNet: Contour-guided graph reasoning network for ambiguous biomedical image segmentation[J]. Biomedical Signal Processing and Control, 2022, 75: 103621.
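Regarding comment 2), a common fallback when no pretrained checkpoint exists is truncated-normal initialization of the transformer weights. The sketch below (generic NumPy; whether DTT uses this or any other scheme is exactly what the comment asks the authors to state) shows the idea:

```python
import numpy as np

def trunc_normal(shape, std=0.02, rng=None):
    """Truncated-normal init (clipped at +/- 2*std), a common default for
    transformer weights trained from scratch; shown only to illustrate the
    kind of initialization detail the comment asks the authors to report."""
    rng = rng or np.random.default_rng()
    return np.clip(rng.normal(0.0, std, size=shape), -2 * std, 2 * std)

# e.g. a hypothetical query-projection matrix of an attention layer
W_q = trunc_normal((64, 64), rng=np.random.default_rng(0))
print(W_q.shape)                         # (64, 64)
print(float(np.abs(W_q).max()) <= 0.04)  # True: clipped at 2*std
```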

Comments on the Quality of English Language

Refer to the comments above.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors

There are no other comments.

Reviewer 6 Report

Comments and Suggestions for Authors

The quality of the manuscript has been greatly improved after the revision.

Comments on the Quality of English Language

The authors are advised to carefully proofread the whole manuscript. Some minor editorial changes are required. For instance, in the Abstract, "Then, use the feature ..." is a sentence fragment.
