Article
Peer-Review Record

G-YOLO: A Lightweight Infrared Aerial Remote Sensing Target Detection Model for UAVs Based on YOLOv8

by Xiaofeng Zhao, Wenwen Zhang *, Yuting Xia *, Hui Zhang, Chao Zheng, Junyi Ma and Zhili Zhang
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 21 August 2024 / Revised: 12 September 2024 / Accepted: 12 September 2024 / Published: 18 September 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

1. The introduction does not sufficiently motivate the new model design. The authors should describe modern hardware requirements, the tasks that require object detection, and the problems that currently exist in this field.

2. The motivation, hypothesis, and aim of the paper should also be stated clearly at the end of the introduction.

3. In Figure 1, the changes relative to the original YOLO should be highlighted.

4. In Section 3.1, some motivation for the changes should be given. For instance, why do the authors use two Ghost convolutions instead of one? Did the authors test any combination of the suggested changes? The results provide some evidence for certain changes, but some intuition should also be given.

5. Section 4.2 should restate the hardware used for FPS testing. The motivation for FPS in the hundreds of frames (e.g., 500) should also be argued; one might imagine that 10-50 FPS would be sufficient. Technical parameters such as UAV speed would probably affect this requirement.

6. In general, the results seem interesting. However, a GitHub repository with the model is required.

7. Benchmark testing on datasets beyond the one used would also be interesting.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This paper presents modifications to the YOLOv8 network with a focus on UAV applications, aiming to create a low footprint network with high performance on infrared images. 

The paper offers a significant improvement in the number of parameters for the proposed network, but there are a number of improvements that need to be made.

There is insufficient explanation for the 'why' of the modifications, beyond that they seem to give good results.

In particular, why is the network especially well suited to IR images? I suspect it is not, really; rather, the changes make it more tractable and suitable for drones. The comparison is probably fine, but this should be pointed out.

GhostBottleneck, DWConv, and ODConv are not the authors' work, but they are not cited the first time they appear (around lines 85-97), which is confusing. A better explanation of these methods is needed.

SlideLoss may be the authors' own contribution, but it is hard to tell.

In the abstract, the FPS is said to improve by 71, but the point of comparison is needed (perhaps something like "FPS is 71 higher than the closest competitor, at 556 FPS vs. ??"). In fact, it is not clear what the closest competitor is, since the next highest FPS in Table 4 is 540.

In Table 4, YOLOv9c is claimed to have 51M parameters, but in "YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information" the authors report 25.3M parameters for v9c. This makes me question the entire table, so it needs to be corrected.

It is difficult to compare the results in Table 4, since most authors introduce new architectures and evaluate against a more standard dataset, such as COCO. Would it be possible to do this?

Some plots would make Table 4 easier to understand - perhaps a scatter plot of parameters vs. F1, and parameters vs. mAP50?

Figure 7 has very low impact, though I do think it is good to include a comparison like this. I would recommend fewer columns and perhaps fewer cases to illustrate the point better. Looking at the images, it is not clear that any of the methods behave differently. Having fewer, larger images, with some annotations of the differences, would be helpful.

Make Figure 1 bigger and make the caption more descriptive; explain what the pieces are (I know that is done in the paper, but adding more to the figure caption makes the paper easier to understand).

Please add more content to ALL the captions to make the paper better.

I do not understand Figure 3 at all. Can this be improved?

A graphic showing the function f(x) in Equation (4) would be helpful.

How was the value 0.1 in Equation (4) chosen?
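For reference, a weighting function of this shape can be sketched as follows. This is an illustrative assumption, not the authors' code: it takes Equation (4) to be the standard Slide Loss weighting from the literature, with `mu` the mean IoU threshold and `delta` the fixed 0.1 offset; the name `slide_weight` is hypothetical.

```python
import math

def slide_weight(x: float, mu: float, delta: float = 0.1) -> float:
    """Slide-style sample weighting (assumed form of Eq. 4).

    x     -- a sample's IoU with its ground-truth box
    mu    -- mean IoU over all samples, used as the easy/hard threshold
    delta -- fixed offset below mu (the 0.1 in Eq. 4)
    """
    if x <= mu - delta:
        # clearly easy samples keep unit weight
        return 1.0
    elif x < mu:
        # narrow band just below the threshold gets a constant boosted weight
        return math.exp(1.0 - mu)
    else:
        # x >= mu: boost decays as the sample gets easier (higher IoU)
        return math.exp(1.0 - x)
```

Plotting `slide_weight` over x in [0, 1] for a fixed mu would give the requested graphic, and the jump at x = mu - delta makes visible what role the 0.1 offset plays.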

Why did you have to simplify the dataset at line 321? "For analysis purposes" is not a sufficient explanation.

Table 1 is NOT the label counts; it is the image count. Can you provide the label count?

Explain the headers in Table 1. I think (0, 32x32) means small objects range in size from 0 pixels (which is impossible) to 32x32, but I had to work hard to figure this out when it could simply have been explained.

Some scatter plots for the ablation would be helpful.  Figure 6 is somewhat helpful but a few more key scatter plots would be helpful and easier to understand.

Line 402 should say 'objects', not 'objectives'.

Table 4 is already complicated, but given the emphasis on object size it would be nice to see the performance per size class (small, medium, large).

 

Comments on the Quality of English Language

Extensive English editing is needed. The abstract is good, but problems crop up as the paper develops. A few examples follow, though there are too many to enumerate individually. The sentence around lines 49-53 is too long and needs to be broken up to improve understandability. Another example is line 55, a run-on sentence in which one-stage and two-stage detectors are introduced but only the two-stage approach is elaborated, for no clear reason. The paragraph covering lines 49-80 is too long and should be broken up into parts on one-stage detectors, two-stage detectors, and YOLO.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The paper presents an innovative lightweight UAV-based target detection model for infrared small-target detection in complex environments. The model is based on YOLOv8. The article is well written and structured. It has a comprehensive review of related work, a detailed description of the proposed methodology, and ablation and comparative experiments. However, there are some minor issues that could be misleading in the presentation of the results and methodology. Please find the comments below:

- Please check Equation (4). It seems that all x values are in the range from \mu - 0.1 to \mu. You should use >= and <= in the first and third lines to obtain an unambiguous function.
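Concretely, the unambiguous piecewise form this comment asks for could look like the following sketch (assuming Equation (4) is a Slide-style weighting; \mu and the 0.1 offset are taken from the comment above):

```latex
f(x) =
\begin{cases}
1,          & x \le \mu - 0.1, \\
e^{1-\mu},  & \mu - 0.1 < x < \mu, \\
e^{1-x},    & x \ge \mu,
\end{cases}
```

With \le in the first branch and \ge in the third, every x falls into exactly one branch and the boundary points \mu - 0.1 and \mu are each covered once.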

- Figure 6 is misleading, as it combines metrics with different meanings and units (parameters, F1, mAP50, FPS, FLOPs), and the information for comparing different models with respect to a specific metric becomes too complicated. It would be better to use a separate figure for each metric.

- In Figure 7, the colors should be explained in the figure caption as well.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors addressed all my comments.

Author Response

Thank you again for your valuable comments on our paper; your comments show us the way forward.

Reviewer 2 Report

Comments and Suggestions for Authors

I downloaded the author response and the updated 'v2' paper. I see many changes in yellow, but not all. In particular, the references to GhostBottleneck and ODConv are in the response but NOT in the updated v2 paper. The reference to SEAttention is given as 46 but is not in brackets, and similarly the SlideLoss reference is not in yellow (these are all before line 103).

At this point, I am going to check ONLY the author response and recommend a revision so that the v2 matches the author response.

Please put something in the abstract about the relative improvement in FPS. I am of the opinion that abstracts ARE very important, and sometimes they may be the only thing a reader actually sees.

Please check the other parameter-count values in Table 4. I appreciate the correction.

I think Figure 8 really does help; thanks for the inclusion.

Table 1 is clearer, but it would be nice to see the relative number of objects in the set. If you cannot provide that, maybe change the caption again to say 'labeled object counts' instead of 'label counts'; I think that makes it clearer, although the class balance would be helpful (even if it is something like 'the classes were split roughly 20%-50%-20%', etc.).

I am still not seeing much impact from Figure 7. It is better, but I suggest adding arrows to highlight the detections where your method performs better. It may also be worth pointing out that, in some cases where your method detects an object, other versions detect it as well, but your parameter reductions are still beneficial.

Comments on the Quality of English Language

Improved

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
