Article
Peer-Review Record

A Decoupled Semantic–Detail Learning Network for Remote Sensing Object Detection in Complex Backgrounds

Electronics 2023, 12(14), 3201; https://doi.org/10.3390/electronics12143201
by Hao Ruan, Wenbin Qian *, Zhihong Zheng and Yingqiong Peng
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 27 June 2023 / Revised: 20 July 2023 / Accepted: 21 July 2023 / Published: 24 July 2023
(This article belongs to the Topic Computer Vision and Image Processing)

Round 1

Reviewer 1 Report

The paper proposes a decoupled semantic-detail learning network (DSDL-Net) for detecting multi-scale objects in complex remote sensing backgrounds. It incorporates a multi-receptive field feature fusion and detail mining module to learn higher semantic-level representations and preserve detail texture information. Additionally, an adaptive cross-level semantic-detail fusion network leverages a feature pyramid to fuse detailed and high-level semantic features. Experimental results demonstrate the approach's superiority over 12 benchmark models, achieving improved average precision on three remote sensing datasets.

1.    The authors propose a novel detection network, DSDL, to address the challenges of inaccurate localization and identification in remote sensing object detection. The approach seems promising and tackles an important problem in the field.

2.    The introduction of the MRFF-DM structure within the backbone is interesting. It is claimed to retain and compress detail features while learning high-quality semantic information. I would appreciate further clarification on the specific mechanisms employed in this structure and how it achieves the desired objectives.

3.    The CSDF structure for the integration of semantic and detailed information appears to be a key component of the proposed network. The concurrent execution of global and local attention processes under shared global attention conditions is intriguing. Add more details on how these processes are implemented and their impact on the network’s overall performance.

4.    The authors mention certain categories, including bridges, ports, overpasses, train stations, and vehicles, where the performance of the proposed network exhibits room for improvement. It would be valuable to discuss the specific challenges associated with detecting these categories and propose potential avenues for addressing these limitations in future work.

5.    It would also be helpful if the authors could provide insights into the computational requirements and training time of the DSDL network, as these factors can significantly impact the practical feasibility of the proposed approach.

Overall, the paper presents a promising approach for improving localization and identification accuracy in remote sensing object detection. Addressing the above-mentioned points will strengthen the manuscript and contribute to the advancement of the field.

The paper is well written and organized; the English requires minor editing.

Author Response

Please refer to the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

It is very important to have new technology for detecting objects in complex backgrounds using AI/ML. I believe the key to this paper is its combination of various methods. I have several comments and suggestions for improving the paper.

 

1)    Figure 1 shows the overall structure of DSDL-Net, but it is very difficult to understand from it the overall outline of your proposed method with MRFF-DM and CSDF. The structure also appears to be broken down across Figures 2, 4, 5, 6 and 7. For instance, in Figure 1, DM and MRFF appear to run in parallel, yet MRFF also appears to be the first step and DM the second. Then Figure 4 suddenly introduces a new A-SPPCSP, although there is no A-SPPCSP in Figure 2. In Figure 5, ACmix (ACmix Attention Module) is also introduced, with no explanation in Figure 4.

2)    Regarding the ablation experiment metrics: why do you propose to use AP to confirm the accuracy? Usually, when detecting objects in remote sensing data, average precision (AP) is not used; the result should instead be true or false. Of course, Formulas (22)-(24) are important for evaluating the accuracy.

3)    In Figures 8 and 9, since your proposed method is intended to detect objects in complex backgrounds, do you compare simple and complex backgrounds to demonstrate the benefit of your new method? In addition, the meaning of the legend in Figure 8 is difficult to understand, because the accuracy depends heavily on target object size and density. Can you please explain the test data and its conditions in your experiment in Ch. 4.4 more clearly and in more detail?

The information provided on the results is insufficient.

Author Response

Please refer to the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

No further comment.

N/A
