SAR-CDSS: A Semi-Supervised Cross-Domain Object Detection from Optical to SAR Domain
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This manuscript presents a semi-supervised cross-domain object detection method from the optical to the SAR domain. The structure of the manuscript is complete and the logic is clear. However, I have some concerns that the authors should address. I recommend this manuscript for publication after revision. My comments are below:
1. Do randomly mixed images need to be paired during domain mixing? Does the size of the object detection box need to be consistent? Do the pixel values of the images need to be preprocessed before and after mixing?
2. Does the proposed method impose any resolution requirements on the SAR and optical images?
3. Is the proposed method robust to signal-to-noise ratio? Please provide an analysis of the signal-to-noise ratio.
4. The paper still needs careful rewriting, with attention to standardized formatting.
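To illustrate the kind of analysis requested in comment 3, the following is a minimal sketch and not the authors' procedure. It assumes the common multiplicative speckle model, in which an intensity image is multiplied by unit-mean gamma noise whose shape parameter is the number of looks. The function names `add_speckle` and `snr_db` are hypothetical, and the constant test image is a placeholder for real SAR test data.

```python
import numpy as np

def add_speckle(intensity, looks, rng):
    """Multiply an intensity image by unit-mean gamma noise (assumed L-look speckle model)."""
    noise = rng.gamma(shape=looks, scale=1.0 / looks, size=intensity.shape)
    return intensity * noise

def snr_db(clean, noisy):
    """Signal-to-noise ratio in dB, treating (noisy - clean) as the noise term."""
    signal_power = np.mean(clean ** 2)
    noise_power = np.mean((noisy - clean) ** 2)
    return 10.0 * np.log10(signal_power / noise_power)

# Placeholder image; in practice this would be a SAR test image.
rng = np.random.default_rng(0)
clean = np.full((64, 64), 100.0)
for looks in (1, 4, 16):
    noisy = add_speckle(clean, looks, rng)
    # Fewer looks means stronger speckle and a lower SNR.
    print(f"looks={looks:2d}  SNR={snr_db(clean, noisy):.2f} dB")
```

Evaluating the detector on test images degraded at several such levels, and reporting the detection metric against SNR, would be one way to answer the question.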
Comments on the Quality of English Language
The English should be improved.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The authors of the paper present an approach that utilizes both SAR images and optical images for target detection using a neural network.
The overall research work is good and can be of interest to the remote sensing community. The results are promising and show the potential of the suggested approach. However, the main focus of the paper is the proposed neural network method, and since this paper is submitted to a remote sensing journal, I suggest that the authors strengthen the remote sensing part by dedicating a section to the data used, the spatial resolutions, and the number of acquisitions.
Also, the authors should be careful about statements regarding SAR backscatter images (e.g., in lines 55 to 57: "...Unlike optical images, SAR images exhibit speckles and intricate details related to color and texture, and this low degree of visualization brings challenges to SAR image interpretation..."). The information presented in SAR images is unrelated to the target's color. It is related to the texture of the target, such as surface roughness.
* The theoretical part is unnecessarily long; I suggest shortening it.
Please change the following for clarification:
* Line 82: "[7] proposed a multi-feature joint algorithm that extracts both the size and azimuth of the object." "Azimuth" is a term used for SAR acquisition, which can conflict with the meaning intended here. According to the cited reference, you are referring to azimuth angle estimation.
* In the table, I suggest using the term "the proposed algorithm," not "ours."
Comments on the Quality of English Language
The English is good; please re-check the spelling.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The unique imaging mechanism of SAR makes the acquisition and interpretation of SAR images more difficult, which seriously affects SAR image object detection. In response to this problem, the authors propose a semi-supervised cross-domain object detection method from the optical domain to the SAR domain, which can achieve knowledge transfer and SAR object detection using optical images and a small number of SAR images. First, a data augmentation method based on image mixing and strength exchange is proposed. Then, an adaptive optimization strategy is proposed at the feature level to filter out mixed-domain samples that deviate significantly from the SAR feature distribution. Finally, a detection head based on the normalized Wasserstein distance is embedded in the ViT model, which extracts global features while enhancing regions with fewer effective pixels in SAR images. This article has a certain degree of novelty, but there are issues in the article, and the main questions are as follows:
1. The article adopts a semi-supervised detection method, but the data used in the experiment consist of a large number of annotated optical images and a small number of annotated SAR images, which does not seem to reflect the idea of semi-supervised methods.
2. The article has some confusing descriptions, for example, line 214 describes ${x^{mix}}$ as a new image generated by randomly mixing the original image samples, and line 215 lacks the description of ${x^{mix}}$ as an initialized empty image.
3. The article has some descriptive errors, for example, the resolution of the HRSID dataset is 0.5m, 1m, and 3m, but the article describes it as 1m to 5m. The area calculated below the PR curve represents AP rather than mAP.
4. During the experiment, the authors used AP50 to measure the effectiveness of the model, but the analysis of the detection results refers to accuracy; these are two different indicators.
5. During the experiment, only AP50 was used as the evaluation indicator, which is too limited and does not reflect the overall performance of the model for small object detection. It is recommended to add the multi-scale indicators of the COCO evaluation system.
6. The article only conducted experiments on mixed images generated from optical and SAR images, and did not demonstrate significant results in using optical images to assist SAR images. It is recommended that the authors increase comparative experiments on SAR images.
7. The red boxes in Figures 4 and 5 represent missed targets, but the comments state them as false targets.
8. There are many formatting differences in the article, such as inconsistent formatting between lines 274 and 284, inconsistent formatting between lines 309 and 313, inconsistent formatting between lines 195 and 208, etc. The authors should carefully check the entire text and unify the writing format.
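To make the distinctions in comments 3 and 5 concrete, the following is a minimal sketch and not the authors' evaluation code. It computes AP for one class as the area under an interpolated precision-recall curve, takes mAP as the mean of per-class AP values, and assigns objects to COCO-style small, medium, and large buckets using the 32² and 96² pixel-area thresholds. The function names are hypothetical, detections are assumed to be already matched to ground truth at a fixed IoU threshold, and box area is used as a stand-in for the object area that COCO measures.

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """AP for ONE class: area under the interpolated precision-recall curve."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / (cum_tp + cum_fp)
    # Add sentinel points, then make precision non-increasing in recall.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    # Sum rectangle areas wherever recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

def mean_average_precision(per_class_ap):
    """mAP: the mean of per-class AP values."""
    return float(np.mean(per_class_ap))

def coco_size_bucket(width, height):
    """COCO-style scale bucket from box area (boundary handling simplified)."""
    area = width * height
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"

# Three detections for one class, two ground-truth objects.
ap = average_precision(scores=[0.9, 0.8, 0.7], is_tp=[1, 0, 1], num_gt=2)
print(round(ap, 4))
```

Reporting AP separately for each size bucket, alongside AP50, is what the multi-scale COCO indicators in comment 5 amount to.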
Comments on the Quality of English Language
Extensive editing of English language required.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 3 Report
Comments and Suggestions for Authors
In this paper, the authors propose a semi-supervised cross-domain object detection method from the optical domain to the SAR domain, which can achieve knowledge transfer and SAR object detection using optical images and a small number of SAR images. The revision of this paper is sufficient and the logic is clearer than before, except for some minor problems:
1. The descriptions in this paper are inconsistent. In Chapter "3. Methodology", the authors describe SAR images as the source domain and optical images as the target domain, and generate images with the same distribution as the source domain using the Domain Mix method, which seems to be inconsistent with the preceding and subsequent descriptions. The same situation exists in Figure 3, and it is recommended that the authors check this.
2. In line 412, the authors state that "we can find that using optical remote sensing data to assist SAR image target detection has achieved significant results", but Table 1 introduces the dataset. Should it be Table 2?
3. The article only introduces the dataset used in the experimental process and the ratio of labeled SAR images used for supervision, but does not describe how the training set, test set, and validation set are divided.
4. There are formatting errors in certain parts of the article, such as "Third, To enhance" on line 195 and "domain. we propose" on line 208. It is recommended that the authors carefully check and correct them.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf