Article
Peer-Review Record

An All-Time Detection Algorithm for UAV Images in Urban Low Altitude

by Yuzhuo Huang, Jingyi Qu *, Haoyu Wang and Jun Yang
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 10 May 2024 / Revised: 14 July 2024 / Accepted: 17 July 2024 / Published: 18 July 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript presents an all-time detection algorithm for UAV in urban low-altitude environments, utilizing visible imagery during the day and infrared imagery at night, with the introduction of a defogging detection structure and SE-backbone for enhanced feature extraction in various lighting conditions. The authors are expected to reformulate the scientific values and novelties of this work. The following suggestions are for reference.

General concept comments

1.         The introduction can be improved. The current last paragraph can be reformulated as the main structure of the paper. Apart from that, please supplement an additional paragraph that clearly states the novelties or contributions of the paper in the last but one paragraph.

2.         Is Figure 2 necessary? It only showcases, qualitatively, potential UAV images, yet the paper declares "Data Availability Statement: Not applicable" at the end. Should the dataset at least be available on request? Otherwise, other researchers will not be able to reproduce the results, nor could they benefit from the proposed method. This is just for reference.

3.         It seems that the main novelties lie in the combination of the existing algorithms, such as combining “GridDehazeNet” and “YOLOv7”, introducing SE into CNN, etc. This is understandable for works with strong application background. However, the authors are expected to emphasize the issues they have tackled when transferring one existing method to a new research field, i.e., the image detection in low altitude of this work. Otherwise, the work will be deemed as simple combinations of mature methods (which should not be).

4.         The proposed method is aimed at “images” of UAV instead of video (please correct me if I am wrong). In the real-world applications, the reviewer thinks that the interested requirement is to detect UAV based on real-time video. However, this is not discussed in the paper. The authors should at least state how (theoretically) the proposed method can be extended to video-based cases (not necessarily in this paper). This should also be discussed in the conclusion as part of the future work.

Specific comments

1.         The organization of Figure 1 looks odd. Please consider supplementing captions for each subfigure (or simply keeping only the left column, as the reviewer finds no additional information in the other column) and unifying the sizes of those figures.

2.       English writing must be improved. For instance, Line 50, “Aim to solving the problems above, an all-time detection algorithm for …” is grammatically wrong.

Comments on the Quality of English Language

English writing can be improved. The paper should at least ensure grammatical correctness. Please refer to the last comment.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The authors propose a framework for all-weather drone recognition, but the article does not state the motivation for studying this problem. Some suggestions:

1. The paragraph below Figure 2 introduces how the UAV and background image data were acquired. The introduction mentions the acquisition of UAV visible-light images but does not describe how the UAV infrared images were acquired, and the paper contains no description of the UAV infrared images in the dataset.

2. The article describes the acquisition of foggy-weather data by compositing a scaled drone image onto the background image. This is not a reasonable way to handle the situation, because post-hoc compositing does not simulate the drone being obscured by the fog; in fact, when the target drone is far away, it may be invisible in real footage. This type of data synthesis is not realistic and cannot support the algorithm in this paper; the plausibility of the dataset is questionable.
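The reviewer's concern can be illustrated with the standard atmospheric scattering model often used to synthesize fog, I = J·t + A·(1 − t) with transmission t = exp(−β·d). A minimal sketch (the function name, airlight value, and scattering coefficient below are illustrative assumptions, not taken from the paper) shows why a clear drone pasted onto a foggy background is unrealistic: a drone at a large depth should fade toward the airlight and effectively disappear.

```python
import math

def fog_pixel(j, depth, airlight=0.9, beta=1.2):
    """Attenuate a scene radiance value j (in [0, 1]) at a given depth
    with the atmospheric scattering model I = J*t + A*(1 - t),
    where the transmission is t = exp(-beta * depth)."""
    t = math.exp(-beta * depth)
    return j * t + airlight * (1.0 - t)

# A nearby dark drone pixel stays dark; a distant one converges to the
# airlight value, i.e. the drone is genuinely occluded by the fog --
# an effect that pasting a clear drone image cannot reproduce.
near = fog_pixel(0.2, depth=0.1)   # still close to 0.2
far = fog_pixel(0.2, depth=5.0)    # close to the airlight 0.9
```

Applying such a per-pixel attenuation to the composited drone (with a depth consistent with its rendered scale) would address the occlusion effect the reviewer describes.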

3. In Section 4.1 the authors mention that different models are used to identify targets during daytime and nighttime. Since the two models are completely different, it seems the authors are merely combining two models rather than proposing a new model that can identify targets around the clock.

4. In Section 4.1 the authors mention using contrast and grayscale averages as the basis for deciding between day and night mode, which is questionable: in the dark, visible-light images may have lower grayscale and contrast than infrared images.
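For concreteness, a decision rule of the kind the reviewer is questioning might look like the sketch below (the function names and threshold values are hypothetical illustrations, not the paper's actual implementation). The reviewer's point is that fixed thresholds on these two statistics can misclassify, since a dark visible-light frame can score lower than an infrared frame on both.

```python
def mean_and_contrast(gray):
    """Mean gray level and RMS contrast of a flat list of 0-255 pixel values."""
    n = len(gray)
    mean = sum(gray) / n
    rms = (sum((g - mean) ** 2 for g in gray) / n) ** 0.5
    return mean, rms

def is_night(gray, mean_thresh=60.0, contrast_thresh=20.0):
    """Classify a frame as 'night' when both its mean gray level and its
    RMS contrast fall below fixed (assumed) thresholds."""
    mean, rms = mean_and_contrast(gray)
    return mean < mean_thresh and rms < contrast_thresh
```

A uniformly dark, low-contrast frame is flagged as night, while a bright, high-contrast frame is not; the failure mode the reviewer raises is exactly the case where a visible-light night frame and an infrared frame fall on the same side of these thresholds.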

5. In Sections 4.2.1 and 4.2.3, the authors do not adequately describe the methods used, so the reader is not informed of their details.

6. Section 5.2 is unnecessary; it covers material everyone already knows.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This work investigates a detection algorithm based on visible images during the day and infrared images at night, supported by datasets of urban visible and urban infrared backgrounds. The authors also tackle foggy environments, low-resolution infrared images, unclear object contours, and complex image backgrounds. To obtain effective information from deeper feature maps, the authors utilize a dehazing detection structure in the daytime, and a squeeze-and-excitation backbone with a space-to-depth path aggregation network and feature pyramid network structure at night. The proposed system achieves 96.3% and 94.7% mAP50 on the UAV-visible and UAV-infrared datasets and performs real-time object detection with inference speeds of 40.16 FPS and 28.57 FPS, respectively. The authors demonstrate a strong understanding of the latest developments in the field, making their insights particularly valuable. To further improve the quality of this paper, the authors should address the comments below.


1.      To improve the related work section, consider including a review of relevant approaches in the field. Conclude the section by summarizing the limitations of existing work and highlighting the specific problem your research tackles.

2.      In the experimental configuration, the learning rate is set to 1.0. However, exploring a range of learning rates between 0.1 and 1.0 (e.g., 0.1, 0.3, 0.5, 0.7) could potentially yield an optimized learning rate value that enhances the presented results.
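The sweep the reviewer suggests can be sketched as a simple grid search (the function name and candidate list below are illustrative; `train_and_eval` stands in for whatever training-plus-validation routine the authors use).

```python
def sweep_learning_rates(train_and_eval, candidates=(0.1, 0.3, 0.5, 0.7, 1.0)):
    """Grid-search a set of candidate learning rates.

    `train_and_eval` is a user-supplied callable mapping a learning rate
    to a validation metric (higher is better). Returns the best rate and
    its score.
    """
    best_lr, best_score = None, float("-inf")
    for lr in candidates:
        score = train_and_eval(lr)
        if score > best_score:
            best_lr, best_score = lr, score
    return best_lr, best_score
```

In practice each call would retrain the detector from the same initialization, so the sweep is expensive but embarrassingly parallel across candidates.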

3.      YOLOv8 may possibly have a higher false positive rate, while YOLOv9 offers improved accuracy alongside requiring 15% fewer parameters and 25% less computational resources. Considering factors like speed, accuracy, and efficiency, what is the motivation behind choosing YOLOv8?

4.      The proposed system is developed for air-ground non-cooperative UAV surveillance for air security. How will the proposed system tackle nano drones smaller than 30 mm? All the mentioned drones are large in size, while for surveillance, the authors should propose a solution that is both broad in scope and impactful in its potential outcomes.

Comments on the Quality of English Language

The text contains some grammatical typos/errors. Please proofread it carefully. A thorough grammar review would enhance the manuscript's clarity.

1.      Instead of " So we incorporate ", use " So, we incorporate ".

2.      Instead of " At night, as infrared images ", use " At night, infrared images ".

3.      Instead of " The Figure 1b is the foggy ", use " Figure 1b is the foggy ".

4.      Instead of " In daytime, We propose ", use " In daytime, we propose ".

5.      Instead of " At night, We propose ", use " At night, we propose ".

6.      Instead of " UAV detection requires the better detection accuracy ", use " UAV detection requires better detection accuracy ".

7.      Instead of " During the inclement weather conditions ", use " During inclement weather conditions ".

8.      Instead of " It is difficult to detect the small objects ", use " It is difficult to detect small objects ".

9.      Instead of " In order to eliminate the problem ", use " to eliminate the problem ".

10.  Instead of " accuracy in the foggy days ", use " accuracy in foggy days ".

11.  Instead of " algorithms requires a large number of datasets ", use " algorithms require many datasets ".

12.  Instead of " deep learning based approach ", use " deep learning-based approach ".

13.  Instead of " From the results, it can be seen that the YOLOv7 selected ", use " From the results, the YOLOv7 selected ".

14.  Instead of " The detection on the test set is able to achieve ", use " The detection on the test set can achieve ".

15.  Instead of " Recall and the Precision represents the detection accuracy ", use " Recall and the Precision represents detection accuracy ".

16.  Instead of " It can be seen that the top of curve is closest ", use " The top of curve is closest ".


17.  Instead of " On the basis of this model ", use " Based on this model,".


Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Efforts have been made by the authors to improve the paper quality and scientific soundness. Most of my comments are addressed suitably. There are still some minor comments, just for reference.

1.         Comment 2. The reviewer appreciates the authors’ efforts to make the data available. Still, should Figure 2 be improved? For instance, classify the different types of drones. It can be noted, e.g., that some of them are equipped with a camera, the number of propellers differs, etc. Also, what annotations are applied to the UAV images when using the dataset to train a network?

2.         Figure 3 obviously contains different types of urban sites. Again, there are no sub-captions therein, nor are these figures classified.

3.         Results are compared using tables and figures. Are the figures enough? Should the training process be included as well? This is just for reference. Please refer to the common style of other literature.

4.         The definite article is not necessary when citing a figure; e.g., in Line 472, one may use “Figure 12b” without “the”.

Comments on the Quality of English Language

 Minor editing of English language required. Please see the new comment 4.

Author Response

 Please see the attachment.

Author Response File: Author Response.pdf
