Peer-Review Record

KPE-YOLOv5: An Improved Small Target Detection Algorithm Based on YOLOv5

Electronics 2023, 12(4), 817; https://doi.org/10.3390/electronics12040817
by Rujin Yang 1, Wenfa Li 1,2,*, Xinna Shang 1,3,*, Deping Zhu 1 and Xunyu Man 1
Reviewer 1:
Reviewer 2:
Submission received: 25 December 2022 / Revised: 18 January 2023 / Accepted: 3 February 2023 / Published: 6 February 2023

Round 1

Reviewer 1 Report

The authors have proposed an enhanced version of the YOLOv5 object detection algorithm that enables it to detect smaller objects. To achieve this, they first incorporated the K-means++ algorithm to determine the anchor box sizes more accurately. In addition, they introduced the scSE attention module and added a small-target detection layer. Overall, they improved the detection performance to a certain extent.
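For readers unfamiliar with anchor clustering, the following is a minimal sketch of K-means++ applied to ground-truth box sizes. It assumes the input is a list of (width, height) pairs and uses 1 - IoU as the distance, the choice made for YOLOv2 anchor clustering; the manuscript's exact distance metric and settings are not given in this record, and the function names are hypothetical. Note that the clustering runs once, before training.

```python
import random

def iou_wh(a, b):
    """IoU of two boxes described only by (width, height), aligned at one corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def dist(a, b):
    # 1 - IoU: boxes with similar shape and scale are "close" (assumed metric)
    return 1.0 - iou_wh(a, b)

def kmeanspp_init(boxes, k, rng):
    """K-means++ seeding: each new centroid is drawn with probability ~ D(x)^2."""
    centroids = [rng.choice(boxes)]
    while len(centroids) < k:
        d2 = [min(dist(b, c) for c in centroids) ** 2 for b in boxes]
        if sum(d2) == 0:  # every box coincides with a centroid
            centroids.append(rng.choice(boxes))
        else:
            centroids.append(rng.choices(boxes, weights=d2)[0])
    return centroids

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Return k anchor (width, height) pairs sorted from smallest to largest area."""
    rng = random.Random(seed)
    centroids = kmeanspp_init(boxes, k, rng)
    for _ in range(iters):
        # Assignment step: attach every box to its nearest centroid
        clusters = [[] for _ in range(k)]
        for b in boxes:
            j = min(range(k), key=lambda i: dist(b, centroids[i]))
            clusters[j].append(b)
        # Update step: move each centroid to the mean width/height of its cluster
        new = []
        for c, members in zip(centroids, clusters):
            if members:
                new.append((sum(w for w, _ in members) / len(members),
                            sum(h for _, h in members) / len(members)))
            else:
                new.append(c)  # keep an empty cluster's centroid unchanged
        if new == centroids:
            break
        centroids = new
    return sorted(centroids, key=lambda wh: wh[0] * wh[1])
```

With three detection layers and three anchors per layer (the YOLOv5 default), one would call kmeans_anchors(wh_list, k=9); a model with an extra small-target detection layer would need more anchors, which is an assumption about the configuration rather than a detail taken from the manuscript.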

Advantages:
1. The paper addresses a really important issue, the detection of very small objects, which is crucial for some object detection (OD) tasks such as those involving unmanned aerial vehicles. It is also important for detecting small faces.

2. The improvements were compared to some other algorithms in a fair way.

3. The organization of the paper is clear, and the flow of the writing is well maintained.

Issues: The paper currently has several problems that must be resolved before it can be published in this journal.

1. I have found many English language problems that require careful proofreading.

1a) PP2, Ln 41: classical -> vanilla or well-known

1b) PP2, Ln 39: "unmanned and unmanned aircraft" -> one of the "unmanned" is redundant.

1c) PP2, Ln 50: "and the detection has outperformed" -> detection performance

1d) PP3, Ln 107: basic -> basis architecture

1e) PP3, Ln 108: as figure -> in Fig. 1

1f) PP3, Ln 114: menas -> means [30].

1g) PP4, Ln 123: improves -> reduces

1h) PP4, Ln 127: cluster centers -> cluster centroids

1i) PP4, Ln 147: perspective -> point of view

1j) PP4, Ln 217: "K-YOLOv5s is an algorithm...box." -> wrong sentence construction. Please rephrase it.

1k) PP9, Table 5: remove "Add" from "Add location".

There are many more mistakes. Please have the manuscript proofread.

2. I have also noticed many scientifically debatable statements which are listed below:

2a) Page 1, Ln 33: "The first type of small target is known as an absolute small target, and it can be identified as such when its pixels are smaller than 32*32 pixels in the image." -> Is there a scientific paper that defines what counts as small (in relative or absolute terms)? If so, please cite this critical information.

2b) PP2, Ln 55: "The presence of fully-connected ..." -> The inefficiency of RCNN stems from the lack of a region proposal network, which was first proposed in Faster-RCNN. Vanilla RCNN is more of an image classification network that classifies every window found by the selective search mechanism.

3. There are also some points that must be improved in the revised version, listed below:

3a) PP5, Ln 155: of the ith channel -> channel-wise recalibration is not well defined or explained. Please elaborate on this part more.

3b) PP4, Ln 123: It is not clear how the authors incorporated the K-means++ clustering module into the pipeline. Please explain this, and also state whether this module compromises the end-to-end nature of YOLOv5.

3c) PP4, Ln 135: "cluster centers until..." -> is this procedure applied in every epoch?

3d) PP4, line 142 "attention module is..." -> please explain it and show its location in the whole network architecture.

3e) PP7, Table 1, "Size (%)" -> What are the criteria for these groups? If they refer to the size of the region, please explain this and point it out in the manuscript.

3f) YOLO is constantly being enhanced, and YOLOv8 is now available. As of January 2023, I think the paper must include a benchmark against YOLOv7 and YOLOv8 and show its merits, since the newer generations might have resolved the problem.
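To make the "channel-wise recalibration" question concrete, here is a minimal pure-Python sketch of an scSE block (concurrent spatial and channel squeeze-and-excitation). It is not the authors' code. It assumes feature maps stored as nested lists x[c][h][w], bias-free layers, a reduction ratio expressed only through the shapes of the hypothetical weight matrices w1 (C/r rows of length C) and w2 (C rows of length C/r), a hypothetical 1x1-convolution weight vector wq of length C, and element-wise addition to combine the two branches (some implementations use an element-wise maximum instead).

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def cse(x, w1, w2):
    """Channel SE: squeeze each channel to its spatial mean, pass the vector
    through a ReLU bottleneck (w1) and an expansion (w2), then rescale
    channel i by its sigmoid gate (the channel-wise recalibration)."""
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    hidden = [max(0.0, sum(wi * zi for wi, zi in zip(row, z))) for row in w1]
    gates = [sigmoid(sum(wi * hi for wi, hi in zip(row, hidden))) for row in w2]
    return [[[v * g for v in r] for r in ch] for ch, g in zip(x, gates)]

def sse(x, wq):
    """Spatial SE: a 1x1 convolution collapses the channels into one map,
    a sigmoid turns it into per-pixel gates, and every channel is rescaled."""
    H, W = len(x[0]), len(x[0][0])
    q = [[sigmoid(sum(wq[c] * x[c][i][j] for c in range(len(x))))
          for j in range(W)] for i in range(H)]
    return [[[ch[i][j] * q[i][j] for j in range(W)] for i in range(H)] for ch in x]

def scse(x, w1, w2, wq):
    """scSE output: sum of the channel-recalibrated and spatially recalibrated maps."""
    a, b = cse(x, w1, w2), sse(x, wq)
    return [[[u + v for u, v in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]
```

With all weights set to zero, every gate equals 0.5 and the block returns its input unchanged, which is a convenient sanity check; with trained weights, the cSE branch scales whole channels while the sSE branch emphasises spatial locations.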

 

Author Response

I thank the reviewers for their review and suggestions, and I have revised the manuscript in accordance with your suggestions:

1. I have proofread the vocabulary and grammar of the article and made the corrections.

2. Regarding the scientifically debatable statements:

2a) The definition of absolute small targets comes from the MS COCO dataset and is referenced elsewhere in the article. The source of the definition of relative small targets has been added as a reference in the corresponding part of the article.

2b) The reasons for RCNN's inefficiency have been re-explained in the manuscript.
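The MS COCO absolute size definition referred to above can be written as a tiny classifier. This sketch assumes the area is the box's pixel area (width times height) in the original image; COCO's official evaluation uses annotation area, and the treatment of areas exactly equal to 32² or 96² here is an assumption. The function name is hypothetical.

```python
SMALL_MAX_AREA = 32 * 32   # 1024 px^2: below this an object is "small" (absolute definition)
MEDIUM_MAX_AREA = 96 * 96  # 9216 px^2: below this "medium", otherwise "large"

def coco_size_class(width, height):
    """Classify a box by pixel area using the MS COCO small/medium/large thresholds."""
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"

# The threshold is on area, not on each side: a 40x20 box (800 px^2) is "small".
print(coco_size_class(40, 20))    # small
print(coco_size_class(50, 50))    # medium
print(coco_size_class(100, 100))  # large
```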

3. Regarding the points to be improved in the revised version:

3a) The core of channel-wise convolution lies in the sparsity of the connections between inputs and outputs: each output is connected to only some of the inputs. This is conceptually different from grouped convolution. In channel-wise convolution there is no strict partition of the inputs; instead, multiple related inputs are sampled with a certain stride (sliding along the channel dimension) to produce each output. This reduces the number of parameters while ensuring a certain degree of information flow between channels. This is basic knowledge about convolution and does not need to be explained specifically in the article.

3b) Re-clustering the anchor box sizes with the K-means++ algorithm is only a pre-processing operation on the dataset, so it does not break the end-to-end nature of the original model.

3c) Because re-clustering the anchor box sizes with the K-means++ algorithm is only a pre-processing operation on the dataset, the resulting anchor boxes are used in every epoch.

3d) The point at which the attention mechanism is inserted has been illustrated in the corresponding section of the article, and Figure 2 clearly shows where the attention mechanism is located in the backbone network.

3e) The specific meaning of size has been explained in the corresponding part of the article.

3f) A comparison of the model with YOLOv8 has been added to the comparative experiments section of the article.
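The channel-wise convolution described in response 3a can be sketched at a single spatial position: one shared 1-D kernel slides along the channel axis with a given stride, so each output sees only a few consecutive input channels. This is one common formulation, assumed here rather than taken from the manuscript; the function names are hypothetical, no padding is used, and bias terms are ignored in the parameter counts.

```python
def channel_wise_conv(x, kernel, stride=1):
    """Slide one shared 1-D kernel along the channel axis at one spatial position.

    Output o is connected only to input channels o*stride ... o*stride + k - 1,
    so the input/output connections are sparse.
    """
    k = len(kernel)
    n_out = (len(x) - k) // stride + 1
    return [sum(kernel[t] * x[o * stride + t] for t in range(k)) for o in range(n_out)]

def param_counts(c_in, c_out, groups, k):
    """Weights of a dense 1x1 conv, a grouped 1x1 conv, and a channel-wise conv."""
    dense = c_in * c_out                                   # every output sees every input
    grouped = groups * (c_in // groups) * (c_out // groups)  # strict partition into groups
    channel_wise = k                                       # one kernel shared along channels
    return dense, grouped, channel_wise

print(channel_wise_conv([1, 2, 3, 4, 5], [1, 1, 1]))            # [6, 9, 12]
print(channel_wise_conv([1, 2, 3, 4, 5], [1, 1, 1], stride=2))  # [6, 12]
print(param_counts(64, 64, groups=4, k=7))                      # (4096, 1024, 7)
```

The contrast with grouped convolution is visible in the counts: grouping divides the dense cost by the number of groups, whereas the sliding kernel's cost depends only on its length.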

Author Response File: Author Response.pdf

Reviewer 2 Report

The YOLO algorithm has been the target of various adaptations and improvements since its first version, partly because it is open source, and many works have focused on it. Your work focuses on YOLOv5, which is a very stable version, although more recent versions exist. The detection of small objects is a challenge for any deep learning computer vision method, and the objective of your work is relevant, as it takes one more step toward improving the detection of small objects by this type of algorithm.

The paper is well written and organized.

The computation time must be reported. In addition, you should analyze whether your algorithm's detection of large objects is degraded, and whether its computation time increases relative to the base YOLOv5.

Your algorithm may be good for small objects, but its computational cost may be too high for real-time applications. For these reasons, it is important to present an analysis of computation times.
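The requested timing analysis is typically reported as mean per-image latency and frames per second (FPS). The sketch below assumes a hypothetical detect callable that runs one forward pass on one image; warm-up calls are excluded from the measurement. For GPU inference, the device must be synchronized before the clock is read (in PyTorch, torch.cuda.synchronize()), otherwise asynchronous kernel launches make the model look faster than it is.

```python
import time
from statistics import mean

def benchmark(detect, images, warmup=5):
    """Return (mean latency in milliseconds, frames per second) for detect(img)."""
    for img in images[:warmup]:
        detect(img)  # untimed warm-up: lazy initialization, caches, kernel compilation
    times = []
    for img in images:
        t0 = time.perf_counter()
        detect(img)
        times.append(time.perf_counter() - t0)
    avg = mean(times)  # raises StatisticsError if images is empty
    return avg * 1000.0, 1.0 / avg
```

Running the same function over the same image set for the baseline YOLOv5 and the modified model gives a like-for-like comparison, provided hardware, input size, and batch size are held fixed.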

The organization of the paper should be described at the end of the introduction.

Avoid writing in the first person - "We".

The conclusion needs to be improved.

Author Response

I thank the reviewers for their review and suggestions, and I have revised the manuscript in accordance with your suggestions:

1. Evaluation metrics related to computation time have been added to the comparative experiments section of the article.

2. A description of the structure of the article has been added at the end of the introduction.

3. The use of the first person ("We") has been reduced in the article.

4. The conclusion section of the article has been revised.

Author Response File: Author Response.pdf
