Peer-Review Record

YOLO-HR: Improved YOLOv5 for Object Detection in High-Resolution Optical Remote Sensing Images

Remote Sens. 2023, 15(3), 614; https://doi.org/10.3390/rs15030614

by Dahang Wan, Rongsheng Lu *, Sailei Wang, Siyuan Shen, Ting Xu and Xianli Lang

Reviewer 1: Ruiqian Zhang

Reviewer 2: Haowei Zhang

Reviewer 3: Hongming Zhu

Submission received: 1 December 2022 / Revised: 8 January 2023 / Accepted: 9 January 2023 / Published: 20 January 2023

(This article belongs to the Special Issue Advanced Application of Artificial Intelligence and Machine Vision in Remote Sensing II)

Round 1

Reviewer 1 Report

The submission proposes a lightweight object detection network for high-resolution remote sensing images, based on the YOLOv5 framework, in order to balance detection accuracy, speed, and the number of model parameters, as well as to take advantage of existing features. The manuscript also presents a series of experiments and discussions on the SIMD dataset. However, I have a number of major concerns with the manuscript, which are outlined below.

1- In the introduction section, the submission lists various target detection methods, but does not analyze the advantages, disadvantages and bottlenecks of these methods. For example, in line 82, the issues of the aforementioned methods are summarized abruptly, making the motivation of the proposed method less obvious.

2- In section 2, the subsection “Datasets of Remote Sensing Image Object Detection” is not relevant to this paper. Table 2 has the same problem. There are a considerable number of object detection algorithms designed for high-resolution remote sensing imagery, which are ignored in this paper. Please provide more descriptions and references for these algorithms in the Related Work section.

3- In section 2.2, it is recommended to add the reasons for choosing hybrid soft attention, i.e., its advantages and the problems it can solve.

4- In section 3, the scientific contributions of the proposed method should be stated more clearly. For example, the existing method flow (e.g., section 3.2; section 3.3) can be omitted or described briefly. The authors should devote more text to their own ideas.

5- In line 300, the paper states that “Figure 5 depicts a visual comparison of the heat map of some detection findings prior to and following the addition of the MAB module”. Meanwhile, Figure 5 labels its second and third rows “YOLO-HR without MAB” and “YOLO-HR with MAB”. However, as I understand it, the YOLO-HR algorithm includes MAB. What is meant by “YOLO-HR without MAB”? Does it refer to experiments with the “YOLOv5s+MPH” algorithm?

6- In Table 5, the ablation study on MAB seems to have been omitted. Why not add the experiments for “Yolov5s+MAB”? Moreover, since both “Yolov5s” and “YOLO-HR” have “Finetune” ablation experiments, why not add the results for “yolov5s+MPH+Finetune”?

7- Please replace Figure 7 with a clearer version.

8- Some presentation issues and mistakes should be corrected, including but not limited to:

1) Abbreviation inconsistency: Yolov5 (line 18) and YOLOv5 (line 1).

2) Font inconsistency: line 132; line 145; format of brackets in 322.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

This paper proposes YOLO-HR for object detection in high-resolution remote sensing images; its balance between effectiveness and efficiency is verified on the SIMD dataset. Overall, this paper is interesting. However, some critical issues need to be addressed before publication.

1. Abstract:

1) Please use one or two sentences to point out the importance of the research, not half of the abstract.

2) Please add the specific and detailed techniques you propose.

3) I suggest revising “between detection effect and speed” to “between effectiveness and efficiency”, here and in the corresponding places in the main manuscript.

2. The first paragraph (lines 28-43)

The organization is confusing, and I cannot grasp its meaning and emphasis. I think the first paragraph should briefly introduce the background of this study and point out its importance. It should be reorganized accordingly.

3. The deficiencies of existing works (lines 84-94)

Please add specific references to support your claims, e.g., “Some researchers employ a two-stage model for object recognition [XX-XX]”.

4. The subjects are missing in your contributions (lines 99-104).

5. Minor comment

Could you please clarify the differences between “the detection in remote sensing image” and “the detection in radar”? The latter can be seen in:

Joint detection threshold optimization and illumination time allocation strategy for cognitive tracking in a networked radar system, IEEE Trans. Signal Process., doi: 10.1109/TSP.2022.3188205.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

1. In Table 6, the comparison with the experimental results of state-of-the-art (SOTA) models is missing (Faster RCNN was proposed in 2016, and is no longer the model with the best performance). Please provide a more thorough review of, and more experimental comparisons with, SOTA methods.

2. Please fix some typographical problems, such as the centering problem of image 6 and the line break problem of equation 8.

3. We would like to see the results of the ablation experiments on the MAB module, which is one of the contributions of this paper, so as to prove the effectiveness of this attention module.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

I would like to thank the authors for their efforts on the additional experiments and the improved paper. However, I have some additional comments that need to be addressed:

Regarding Reply 2, the authors state that “the high resolution is actually the application of the network input to the high-resolution remote sensing data set” and “It is difficult to sort through thousands of high-resolution remote sensing image target detection network papers, but their basic network algorithms are relatively simple to summarize and compare”. However, I have a completely different view. First of all, the object of this paper is the detection of targets in remote sensing images. Therefore, it is necessary to summarize the literature of related work on RS object detection. Second, there are obvious differences between objects in remote sensing images and objects in natural scene images, such as object distribution, object scale, perspective, etc. This is the reason why many baseline methods in the CV field are not very accurate when applied to remote sensing images. Since the authors insist that Tables 1 and 2 are relevant to the paper, the algorithms created for the datasets in Table 1 should be summarized in Table 2. The existing Table 2 seems to be a list of network backbones rather than of object detection networks for RS images. Finally, if a review of papers on remote sensing imagery is not important, what can this paper bring to the field?

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

The authors are advised to recheck the whole manuscript before publication. In addition, it is suggested that they add references to clarify the difference between remote sensing image detection and radar detection.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

The overall content is well revised, including the introduction and the method part. But the experimental results part is still insufficient; please provide more comparisons with SOTA methods. Furthermore, the expression could be shortened and refined to avoid long sentences.

Author Response

Please see the attachment.

Author Response File: Author Response.docx
