Article
Peer-Review Record

SDAT-Former++: A Foggy Scene Semantic Segmentation Method with Stronger Domain Adaption Teacher for Remote Sensing Images

Remote Sens. 2023, 15(24), 5704; https://doi.org/10.3390/rs15245704
by Ziquan Wang, Yongsheng Zhang, Zhenchao Zhang *, Zhipeng Jiang, Ying Yu, Li Li and Lei Zhang
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 2 November 2023 / Revised: 8 December 2023 / Accepted: 10 December 2023 / Published: 12 December 2023
(This article belongs to the Special Issue Remote Sensing Image Classification and Semantic Segmentation)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This study developed a new model, SDAT-Former++, for semantic segmentation of foggy images. The model incorporates style-invariant and fog-invariant features in an attempt to bridge both the style gap and the domain gap. This is interesting work, because most semantic segmentation studies test their methods on high-quality images, while few focus on ill-posed or low-quality images, which are common in real applications. The authors compared their method with several SOTA models and tested it on four datasets. Experimental results showed the superiority of the proposed method in classification accuracy, as well as the effectiveness of the developed modules. In general, I would say that the methodological innovation and experimental evidence are sufficient for publication in the Remote Sensing journal. Still, I have some comments for the improvement of this manuscript.

1. Abstract: I suggest the authors include some quantitative evidence in the research findings.

2. Introduction: Lines 96-99, I suggest moving this paragraph to the Discussion section or simply removing it. In fact, this part indicates the differences from existing studies, which is a discussion point.

3. Methods, Section 3.2 (Sub-Modules): It would be nice to locate these six modules in the related figure so that readers can have a clear picture of what they are.

4. In Section 3.6, "Self-training on the target domain and consistency learning", I did not find "consistency learning" in Figure 2. Please clarify this.

5. In Section 3.5.1, you give the loss with different components. How are the component weights decided? How is the value of "m" determined? How are the parameters for positional encoding set?

6. How is the Teacher Network trained? Is there gradient propagation through the Teacher Network? (A sketch of the usual pattern is given after this list.)

7. In Table 4, the mIoU on the ACDC dataset does not improve steadily as more clear images are added. Please give some explanation.

8. In Figure 7, the feature maps appear sharper after positional encoding is applied, whereas they become fuzzier when masked learning is applied. Why is this? Please give more details in the text.
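
For context on question 6: teacher-student frameworks of this kind typically do not train the teacher by backpropagation at all; the teacher's weights are an exponential moving average (EMA) of the student's. The minimal PyTorch sketch below illustrates that common pattern only — it is an assumption about standard practice, not the authors' confirmed implementation.

import copy
import torch

def make_teacher(student: torch.nn.Module) -> torch.nn.Module:
    # The teacher starts as a copy of the student and is frozen:
    # no gradients ever propagate through it.
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, alpha: float = 0.999):
    # Once per training iteration:
    # teacher <- alpha * teacher + (1 - alpha) * student
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(alpha).add_(s_p, alpha=1.0 - alpha)

Under this pattern the answer to the reviewer's question would be "no gradient propagation in the teacher"; whether SDAT-Former++ follows it is exactly what the authors are asked to state.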

 

Comments on the Quality of English Language

Comments on writing:

English writing can be further improved.

1. Please check the singular and plural forms of nouns, e.g., in the title, "Remote Sensing Image" should be "Remote Sensing Images".

2. Please check the tense used in different sections, e.g., the present tense is usually used to describe methods (the past tense can also be used when an action has finished), while the past tense is used for results. The future tense is seldom used in academic writing.

 

3. Table 4: "different number clear images" -> "different number of clear images".

Author Response

Please refer to the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Main content and contributions:

This paper improves upon the authors' original ICIP version. Feature enhancement with masked learning and fog-pass filtering is introduced, which yields significant gains in accuracy and speed. The training process has also been simplified and improved.

Advantages: In this paper, the authors explain in detail how SDAT-Former works, make up for the shortcomings of the original ICIP version, and give a more convincing and intuitive explanation, which is very good. I think that SDAT-Former has indeed learnt the anti-fogging and scene segmentation capabilities. In terms of simplifying the training process, I have read the original paper, and the optimisation based on intermediate-domain pseudo-labels is indeed very tedious and lengthy; the authors' improvement is to the point. The experiments are also adequate.

Detailed issues:  

(1) There are some typos, and some sentences are rather convoluted, such as "The feature map of the visualisation model also shows ......" (line 502).

(2) In the ablation experiments section, please add experimental results for consistency learning.

(3) Some of the tables and images are stretched too large or too small; please resize them appropriately.

(4) There is still some literature on semantic segmentation in remote sensing images that should be cited.

 

Comments on the Quality of English Language

None.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The paper improves upon the authors' previous conference paper on SDAT-Former by introducing positional encoding to separate more high-dimensional details and weight perturbation to regularize knowledge transfer. The authors also add one more dataset compared to the previous work.

1) In general, the paper is in good form and requires some minor corrections before publication.

2) Lines 70-72: Current GAN models such as StyleGAN may not suffer from low-resolution problems. Please review the state of the art of GAN models and rewrite these two sentences.

3) Table 1 is confusing regarding the row for the "Ours" results: the cells in the "Ours" row do not align with the columns. Please correct this.

4) The visual images presented in Figures 3, 4, 5, 6, 8 and 9 are very small. It is nearly impossible to assess quality from these figures at present. Please increase the size of the imagery to provide a proper visual comparison. If paper length is a concern, I suggest the authors reduce the number of examples but present the remaining visual results properly.

5) This work uses several models across the whole task and appears to require substantial computational resources. Can you also compare the computational cost, such as GPU memory usage during training, for each method?

Comments on the Quality of English Language

6) Language quality is in general good but might require one more thorough revision before the second review round. Please check.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
