Article
Peer-Review Record

Few-Shot Object Detection for Remote Sensing Imagery Using Segmentation Assistance and Triplet Head

Remote Sens. 2024, 16(19), 3630; https://doi.org/10.3390/rs16193630
by Jing Zhang 1,2,3,4,*, Zhaolong Hong 1, Xu Chen 1 and Yunsong Li 2,3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 29 July 2024 / Revised: 27 August 2024 / Accepted: 23 September 2024 / Published: 29 September 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This article studies few-shot object detection in remote sensing images. A Segmentation Assistance module is proposed to improve performance by using binary segmentation as an auxiliary task. The Tri-Head design uses knowledge distillation to alleviate forgetting of base-class knowledge while also improving the detection of novel classes. Finally, the classification loss function is improved to enhance classification accuracy. The proposed method achieves good performance on the NWPUv2 and DIOR datasets.
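For context, the following is a minimal sketch of the kind of logit-distillation term such a design can use to retain base-class knowledge; the function, tensor names, and temperature value are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of a logit-distillation term of the
# kind used to retain base-class knowledge; all names are illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between softened teacher and student class distributions."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t**2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * (t * t)
```

Softening both distributions with a temperature lets the student match the frozen teacher's relative class preferences rather than only its top prediction.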

Overall, this article addresses the real-time requirement that is often overlooked in few-shot object detection and conducts extensive experiments to validate the effectiveness of the method.

There are some issues and details that need to be resolved:

1. Some of the text in the figures is not clear enough, for example in Figures 7 and 9.

2. The analysis of the experimental results could be enriched, for example by analyzing trends in the results.

3. The SA module treats the entire bounding-box interior as foreground. Does the background within the box interfere with the extracted features?

4. Several recently developed unsupervised remote sensing image interpretation methods have not been discussed in the Introduction. These include the Spatial-Spectral Masked Auto-encoder and Nearest Neighbor-Based Contrastive Learning.

5. The text and border colors in the figures should be chosen more carefully; colors that contrast strongly with the background are preferable.

6. The role of each module listed in Tables 4 and 5 within the overall architecture should be explained in detail.

 

Author Response

Comments 1: Some of the text in the figures is not clear enough, for example in Figures 7 and 9.

Response 1: Thank you for pointing this out. We have redrawn Figures 4, 7, and 9 to make the bounding boxes and fonts clearer.

Comments 2: The analysis of the experimental results could be enriched, for example by analyzing trends in the results.

Response 2: We have added a trend analysis of the results; the new material appears in the first paragraph of the Discussion.

Comments 3: The SA module treats the entire bounding-box interior as foreground. Does the background within the box interfere with the extracted features?

Response 3: There may be some interference. We initially planned to use salient object detection to generate segmentation labels and thereby eliminate background interference within the box, but we could not find a salient object detection method that produced sufficiently high-quality labels. Under these circumstances, treating the entire bounding box as foreground is a reasonable compromise.
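For illustration, here is a minimal sketch of how box annotations can be rasterized into a binary foreground mask under this assumption; the function name and coordinate convention are hypothetical, not the paper's code.

```python
# Hypothetical sketch of rasterizing box annotations into a binary foreground
# mask when pixel-level labels are unavailable; names and the (x1, y1, x2, y2)
# coordinate convention are assumptions.
import numpy as np

def boxes_to_mask(boxes: np.ndarray, height: int, width: int) -> np.ndarray:
    """boxes: (N, 4) array of pixel coordinates (x1, y1, x2, y2)."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for x1, y1, x2, y2 in boxes.astype(int):
        # Clip to image bounds; every pixel inside the box counts as foreground.
        mask[max(y1, 0):min(y2, height), max(x1, 0):min(x2, width)] = 1
    return mask
```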

Comments 4: Several recently developed unsupervised remote sensing image interpretation methods have not been discussed in the Introduction. These include the Spatial-Spectral Masked Auto-encoder and Nearest Neighbor-Based Contrastive Learning.

Response 4: Thank you for your suggestion. However, our research topic is few-shot learning and object detection; we are not familiar with the recommended unsupervised methods, so we did not discuss them.

Comments 5: The text and border colors in the figures should be chosen more carefully; colors that contrast strongly with the background are preferable.

Response 5: Thank you for your suggestion; this was indeed a problem. We have redrawn the figures and selected text and bounding-box colors that contrast strongly with the background, as shown in Figures 4, 7, and 9.

Comments 6: The role of each module listed in Tables 4 and 5 within the overall architecture should be explained in detail.

Response 6: We have described the role of each module in detail in the third paragraph of the newly added Discussion section.

Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript enhances a YOLOv5s-based model to address the challenges of few-shot object detection in remote sensing imagery. The proposed Segmentation Assistance module and the distillation-based detection head, Triplet Head, are innovative and have been shown to be effective through extensive experiments on two datasets. The paper is well structured, with clear logic and precise expression, making it an excellent work. It would be even better if the code were made open source.

Author Response

Thank you for your review and comments. We will strive to make our code open source.

Reviewer 3 Report

Comments and Suggestions for Authors

I am writing to provide my review report on the manuscript titled "Few-shot Object Detection for Remote Sensing Imagery Using Segmentation Assistance and Triplet Head".

Overall, the paper covers important topics and is well presented.

This study proposes a simple few-shot object detection method based on the one-stage detector YOLOv5 with transfer learning.

Please refer to the review results below. 

1. It would aid understanding if the definitions of base classes and novel classes were introduced earlier in the paper, rather than at lines 388-392.

2. When visualizing the detection results in Figures 7 and 11, it is recommended to compare them with the results of Reference [23], or to include the results of Reference [23] in the visualization.

3. The confusion matrix in Figure 10 contains a background class that is not in the NWPUv2 dataset. A more detailed explanation is needed.

4. In Fig. 10, it appears that the misclassification rate for the airplane class is the highest. However, in lines 466-468 you state:

"Upon examining the matrix, it becomes evident that within base classes, the ‘bridge’ category exhibits a comparatively higher probability for misclassification."

Please clarify which description is correct.

Thank you.

Author Response

Comments 1: It would aid understanding if the definitions of base classes and novel classes were introduced earlier in the paper, rather than at lines 388-392.

Response 1: Thank you for your suggestion. We have moved the definitions of base classes and novel classes forward to the third paragraph of the Introduction (highlighted in red), specifically lines 40 to 45.

Comments 2: When visualizing the detection results in Figures 7 and 11, it is recommended to compare them with the results of Reference [23], or to include the results of Reference [23] in the visualization.

Response 2: We have added comparison results with Reference [23] to Figures 7 and 11.

Comments 3: The confusion matrix in Figure 10 contains a background class that is not in the NWPUv2 dataset. A more detailed explanation is needed.

Response 3: The background class is not one of the dataset's categories. It appears in the confusion matrix to indicate how many foreground targets are misclassified as background and how many background regions are misclassified as foreground, i.e., how many targets are missed or falsely detected. We have added the relevant explanation in lines 470 to 473.
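As an illustration of this bookkeeping, here is a minimal sketch of a confusion matrix with an extra background index; the class count, index convention, and matching step are assumptions, not the paper's code.

```python
# Illustrative sketch of how a detection confusion matrix gains an extra
# "background" row/column even though the dataset has no such class;
# the class count and matching convention are assumptions.
import numpy as np

NUM_CLASSES = 10        # e.g. the 10 NWPUv2 categories
BG = NUM_CLASSES        # extra index reserved for background

cm = np.zeros((NUM_CLASSES + 1, NUM_CLASSES + 1), dtype=int)

def update(cm, gt_label, pred_label):
    """pred_label is the matched prediction's class, or None for a missed target;
    gt_label is None when a prediction matched no ground truth (false alarm)."""
    row = BG if gt_label is None else gt_label
    col = BG if pred_label is None else pred_label
    cm[row, col] += 1   # misses land in the BG column, false alarms in the BG row
```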

Comments 4: In Fig. 10, it appears that the misclassification rate for the airplane class is the highest. However, in lines 466-468 you state: "Upon examining the matrix, it becomes evident that within base classes, the ‘bridge’ category exhibits a comparatively higher probability for misclassification." Please clarify which description is correct.

Response 4: Thank you for your question. The airplane class belongs to the novel classes, while the bridge class belongs to the base classes. Among the novel classes, the airplane category has the highest misclassification rate; among the base classes, the bridge class has the highest. In the text, we discuss these two cases separately.
