Article
Peer-Review Record

Multi-Scale Object Detection with the Pixel Attention Mechanism in a Complex Background

Remote Sens. 2022, 14(16), 3969; https://doi.org/10.3390/rs14163969
by Jinsheng Xiao 1, Haowen Guo 1, Yuntao Yao 1, Shuhao Zhang 1, Jian Zhou 2,* and Zhijun Jiang 3,4
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 27 June 2022 / Revised: 3 August 2022 / Accepted: 11 August 2022 / Published: 16 August 2022

Round 1

Reviewer 1 Report

The authors propose a bidirectional multi-scale feature fusion network that fuses semantic features with shallow features to improve the detection of small objects in complex backgrounds. Some specific comments follow.

1. Figures 1-4 are too blurry. Could the authors replace them with high-resolution versions?

2. As a minor comment, is it appropriate to divide the angles directly into 180 categories? Too many categories may hurt the classification performance. If the angle were first classified into two coarse categories (i.e., 0<=angle<=90 and 90<angle<=180) and its detailed value then determined, would the performance improve? The authors should conduct related experiments and analyze the results (a sketch of such a coarse-then-fine head is given after these comments).

3. Do the comparison methods use the same backbone (i.e., ResNet50)? If not, the authors should re-run the experiments for a fair comparison; otherwise, they should justify the choice.

4. The ablation study should include not only quantitative but also qualitative analysis. For example, the authors should provide visual results demonstrating that the proposed method can accurately locate objects at different scales.

5. I am curious about the computational efficiency of the proposed method. Could the authors provide the running time and model size of all comparison methods?

6. Typo: line 182 on page 5: ‘dinappropriate’ should be ‘inappropriate’.

7. ‘In recent years, the attention mechanism has …’. Some attention-related works in the field of remote sensing could be cited here, such as ‘Hyperspectral Image Classification With Attention-Aided CNNs, 2021 TGRS’.
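
As a point of reference for comment 2, a coarse-then-fine angle head could look roughly like the PyTorch sketch below; the module name, layer sizes, and the split into two 90-bin halves are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical coarse-then-fine angle classification head (illustrative only).
import torch
import torch.nn as nn

class TwoStageAngleHead(nn.Module):
    def __init__(self, in_channels: int = 256, fine_bins: int = 90):
        super().__init__()
        # Stage 1: decide which half the angle falls in (roughly [0, 90) vs [90, 180)).
        self.coarse = nn.Linear(in_channels, 2)
        # Stage 2: one fine classifier per half, each over 90 one-degree bins.
        self.fine = nn.ModuleList([nn.Linear(in_channels, fine_bins) for _ in range(2)])
        self.fine_bins = fine_bins

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (N, in_channels) pooled per-object features.
        half = self.coarse(feats).argmax(dim=1)                          # (N,)
        fine_logits = torch.stack([f(feats) for f in self.fine], dim=1)  # (N, 2, 90)
        fine_bin = fine_logits[torch.arange(feats.size(0)), half].argmax(dim=1)
        # Predicted angle in whole degrees, 0..179.
        return half * self.fine_bins + fine_bin
```

Training such a head would use a cross-entropy loss on the coarse logits and on the fine logits of the ground-truth half; whether the extra stage actually helps is exactly the experiment the comment asks for.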

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

Aiming at the problem of object detection against complex backgrounds in remote sensing imagery, this paper proposes an object detection algorithm. The algorithm obtains feature maps containing both shallow features and deep semantic features by constructing a bidirectional multi-scale feature fusion network, and then uses a pixel attention module to focus on useful information. The paper also converts the angle regression problem into a classification problem to resolve boundary issues. Finally, ablation experiments on the DOTA dataset demonstrate the effectiveness of the proposed module and of the angle transformation, and comparisons with several popular object detection algorithms on three datasets show the superiority of the proposed algorithm.
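
As a rough illustration of the pixel attention idea summarized above, a generic pixel-level attention gate over a fused feature map could be sketched as follows; the module and its sizes are assumptions for illustration, not the authors' exact design.

```python
# Generic pixel (spatial) attention gate over a fused feature map (illustrative sketch).
import torch
import torch.nn as nn

class PixelAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolution producing one attention score per spatial location.
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        # fused: (N, C, H, W) feature map from a bidirectional fusion path.
        attn = torch.sigmoid(self.score(fused))   # (N, 1, H, W), values in (0, 1)
        return fused * attn                       # re-weight every spatial position

# Example: re-weight a fused 256-channel feature map.
x = torch.randn(2, 256, 64, 64)
y = PixelAttention(256)(x)   # same shape as x
```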

The work reported in this paper is substantial and its logic is clear, but there are some problems:

(1) Standardization of wording. First, the bottom-up path was already proposed by PANet, yet this paper claims to "design a new feature fusion network". Second, there are spelling mistakes such as "dinappropriate".

(2) The number or proportion of samples in the datasets used as training set, validation set and test set is not given.

 

(3) The description of the results is inaccurate. For example, in the comparison of different algorithms on DOTA-GF, the large-vehicle AP of the proposed algorithm differs considerably from the highest AP among the other four algorithms, yet the paper describes it as "close to the highest AP".

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

This paper designs a multi-scale object detection network with a multi-feature selection module based on an attention mechanism. The novelty of the proposed approach is low, and there are some serious issues with the experimental analysis.

1. In the Introduction and Related Work, this paper treats angle prediction as a classification problem and uses the circular smooth label (CSL). On the one hand, more relevant works that apply classification to object angle prediction should be introduced. On the other hand, please clarify the difference between the proposed method and the existing CSL method (a sketch of how a CSL target is constructed is given after these comments).

2. There are many problems with the organization of this paper. First, the visualization of the multi-scale feature maps in Figure 6 should be shown in the Results. Second, Figure 7 could be moved to the Results to verify the effectiveness of the proposed angle classification method compared with a five-parameter method. Third, Section 3.3 should focus on the proposed angle classification method; irrelevant descriptions should be deleted. Fourth, Figure 8 lacks comparative results and could be shown in Section 4.2.

3. In the experiments, this paper selects 6 object categories from the DOTA dataset. The motivation for selecting these 6 categories is not clear.

4. In Sections 4.2, 4.3, and 4.4, different comparison methods are chosen to verify the advancement of the proposed approach. For the experiments on DOTA, CSL (2020), RRPN (2018), RetinaNet (2020), and Xiao (2020) are selected as comparison algorithms; more existing methods for oriented object detection in remote sensing images should be added. Besides, the CSL results in Table 5 are lower than the published ones. Given the smaller number of object categories in these experiments, the APs of the 6 typical objects should be higher than those of CSL (2020), not lower. Please clarify this point.

5. In Section 4.3, this paper adds 138 domestic remote sensing images to the DOTA dataset. The paper should report the number of labeled instances of each object category. In particular, the satellite images collected from GF-6 and GF-2 could be analyzed separately.

6. In Section 4.4, the input image size for RoI-Transformer is 512*800, which differs from that of the other methods. Please clarify this point.

 

7. This paper lacks a qualitative analysis of different detection methods.
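
For context on comment 1, the circular smooth label converts angle regression into classification over (typically) 180 one-degree bins and smooths the one-hot target with a window that wraps around the angular boundary, so that bins 179 and 0 count as neighbours. The sketch below follows that general idea with a Gaussian window; the window type and radius are illustrative choices, not the settings of the paper under review.

```python
# Illustrative construction of a circular smooth label (CSL) target for one angle.
import numpy as np

def circular_smooth_label(angle: int, num_bins: int = 180, radius: int = 6) -> np.ndarray:
    """Return a soft classification target of length num_bins for an angle in degrees."""
    bins = np.arange(num_bins)
    # Circular distance, so bin 179 and bin 0 are treated as neighbours.
    dist = np.minimum(np.abs(bins - angle), num_bins - np.abs(bins - angle))
    label = np.exp(-dist.astype(float) ** 2 / (2 * radius ** 2))  # Gaussian window
    label[dist > radius] = 0.0                                    # zero outside the window
    return label

# Example: the target for 178 degrees also gives weight to bins 0..4.
print(circular_smooth_label(178)[:5])
```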

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

All my issues have been addressed.

Reviewer 2 Report

The authors have answered my questions; no further comments.

Reviewer 3 Report

The comments and suggestions have largely been addressed.
