The YOLO Framework: A Comprehensive Review of Evolution, Applications, and Benchmarks in Object Detection
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have done great work; the effort is evident in the paper.
There are some minor points to be looked into:
1. Line 185 - Why are some references superscripts?
Line 188 - "Error! Bookmark not defined."
Line 209 - "The architecture of SSD is illustrated in Error! Reference source not found." -- Reference missing.
2. Figure 2 is not provided.
3. References or figures are missing; this needs to be fixed.
4. Line 306 - Place the caption and table together; don't split them like this.
Line 1004 - The same applies to Table 9; also, do you need such a big table for this information?
5. Lines 380 and 381 - Avoid splitting words such as "showcasing".
6. Section 5, Line 601 - "YOLOv6: Enhanced Speed and Practicality." Is this a subsection? If so, it can be made bold or otherwise marked. Does this mean that YOLOv6 has enhanced speed? Which is its specialty? State what is being conveyed here. The same comment applies to all the points in this section.
Why does "1]" appear for each reference?
Overall, the article is a good review; however, 53,200 seems to be too many, and filtering 126 articles out of so many seems inefficient. How the study was performed needs to be examined.
Author Response
The authors have done great work; the effort is evident in the paper.
There are some minor points to be looked into:
- Line 185 - Why are some references superscripts?
Line 188 - "Error! Bookmark not defined."
Line 209 - "The architecture of SSD is illustrated in Error! Reference source not found." -- Reference missing.
Thank you. These issues have been resolved. The problems arose during the format conversion process. The original draft was not prepared using the journal template, which led to formatting inconsistencies during the conversion.
- Figure 2 is not provided.
Thank you. Figure 2 has been added back.
- References or figures are missing; this needs to be fixed.
Thank you. We have resolved the issues with the references and figures by reorganizing the draft according to the journal's formatting requirements.
- Line 306 - Place the caption and table together; don't split them like this.
Line 1004 - The same applies to Table 9; also, do you need such a big table for this information?
Thank you. The format has been adjusted.
- Lines 380 and 381 - Avoid splitting words such as "showcasing".
Thank you. This problem has been solved.
- Section 5, Line 601 - "YOLOv6: Enhanced Speed and Practicality." Is this a subsection? If so, it can be made bold or otherwise marked. Does this mean that YOLOv6 has enhanced speed? Which is its specialty? State what is being conveyed here. The same comment applies to all the points in this section.
Why does "1]" appear for each reference?
Thank you. 'YOLOv6: Enhanced Speed and Practicality' is a subsection title, and it is now highlighted. The '1]' artifact came from the format conversion.
Overall, the article is a good review; however, 53,200 seems to be too many, and filtering 126 articles out of so many seems inefficient. How the study was performed needs to be examined.
Thank you, and we apologize for any confusion. The number "53,200" represents the total number of results returned when searching for the keyword "YOLO" in top-tier publications. From this extensive pool, we prioritized high-impact and highly cited publications, focusing on those that have contributed significantly to the field.
Reviewer 2 Report
Comments and Suggestions for Authors
Please consider the attached comments.
Comments for author File: Comments.pdf
Author Response
The authors have presented "The YOLO Framework", proposing a framework along with a review of its evolution, applications, and benchmarks in object detection. However, some modifications need to be considered before the paper can proceed to acceptance. Here are the comments:
1-The “Abstract” is very long. Please reduce the size, reformulate it and clearly explain the main contribution of this research. Additional text can be moved to the “Introduction” section, if required.
Thank you. The abstract has been rewritten based on the comments.
2-At the end of the “Introduction” please add a new paragraph stating the structure of the remainder of the paper. For example: “The remainder of this paper is organized as follows. In Section…. etc.”
Thank you. A paragraph describing the organization of the remainder of the paper has been added.
3- Add a table of all abbreviations used at the beginning of the paper.
Thank you. Table 1 has been expanded.
4- The formatting of all references is wrong. Kindly fix.
Thank you. The format has been updated.
5- Some tables were constructed by the authors, but others need a "Reference". The authors did not provide references for some tables. Please clarify.
Thank you. The references have been added.
5- Regarding the performance analysis, I have the following three concerns:
A- The authors have chosen Mean Average Precision (mAP) to be 0.50. We know that it should be between 0 and 1, and the closer the value is to 1, the better the ranking performance. Therefore, you need to justify the reason for choosing this value in the same paragraph.
Thank you. In this study, mAP50 (mean Average Precision at an Intersection over Union (IoU) threshold of 0.50) is employed as the primary metric for evaluating and comparing performance across YOLO versions. The value 0.50 is the IoU threshold at which a predicted box is counted as a correct detection; it is not the mAP value itself, which still ranges from 0 to 1, with higher values indicating better performance. The choice of mAP50 simplifies the benchmarking process by focusing on straightforward detection tasks, providing a consistent and accessible measure of model effectiveness. By fixing the IoU threshold at 50%, mAP50 reflects the models' ability to identify objects with acceptable localization accuracy, making it particularly suitable for general-purpose evaluations and less complex detection scenarios. This approach ensures a balanced comparison, especially across diverse datasets and varying levels of object detection difficulty.
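To make the role of the 0.50 threshold concrete, the following is a minimal Python sketch of how an mAP50-style score can be computed. It is an illustration only, not the evaluation code used in the study. The function names `iou`, `average_precision`, and `map50` are hypothetical, and two choices are assumptions: detections are greedily matched to the best-overlapping unmatched ground-truth box, and AP uses all-point interpolation. Common toolkits differ in such details (for example, COCO-style evaluation uses 101-point interpolation), so their numbers may differ slightly.

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truths, iou_threshold=0.5):
    """AP for one class (hypothetical helper).

    detections: list of (image_id, confidence, box)
    ground_truths: dict mapping image_id -> list of boxes
    """
    n_gt = sum(len(boxes) for boxes in ground_truths.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truths.items()}
    tp_flags = []
    # Rank detections from most to least confident.
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truths.get(img, [])):
            if not matched[img][j]:
                overlap = iou(box, gt)
                if overlap > best_iou:
                    best_iou, best_j = overlap, j
        # A detection is a true positive only if IoU reaches the threshold (0.5 for mAP50).
        if best_j >= 0 and best_iou >= iou_threshold:
            matched[img][best_j] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)
    # Cumulative precision and recall along the ranked list.
    precisions, recalls, tp = [], [], 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += is_tp
        precisions.append(tp / k)
        recalls.append(tp / n_gt)
    # All-point interpolation: make precision non-increasing, then integrate over recall.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


def map50(dets_by_class, gts_by_class):
    """Mean of per-class AP with a fixed IoU threshold of 0.5."""
    aps = [average_precision(dets_by_class.get(c, []), gts_by_class[c], 0.5)
           for c in gts_by_class]
    return sum(aps) / len(aps) if aps else 0.0


# Toy example (hypothetical data): one class, one image, two ground-truth boxes.
gts = {"cell": {"img1": [(0, 0, 10, 10), (20, 20, 30, 30)]}}
dets = {"cell": [
    ("img1", 0.9, (0, 0, 10, 10)),    # exact match -> true positive
    ("img1", 0.8, (50, 50, 60, 60)),  # no overlap -> false positive
    ("img1", 0.7, (21, 21, 31, 31)),  # IoU = 81/119, about 0.68 -> true positive at 0.5
]}
print(round(map50(dets, gts), 4))  # 0.8333
```

In this toy example, raising the threshold to 0.75 turns the third detection into a false positive and lowers the AP to 0.5, which shows that the IoU threshold changes how strict the localization requirement is, while the resulting mAP always stays between 0 and 1.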
B- The authors provided mAP values for Blood Cell Detection, Facemask Detection, and Human Detection. However, these values need validation. The authors should provide real-life examples to justify the obtained values. For example, apply your algorithm to a real image. This will validate the reported accuracy.
We appreciate the reviewer’s insightful suggestion regarding the validation of mAP values using real-life examples. In this study, the mAP values for Blood Cell Detection, Facemask Detection, and Human Detection were obtained by evaluating the YOLO models on open-source benchmark datasets. These datasets provide a standardized framework for comparing model performance across diverse scenarios and are widely recognized for their reliability in object detection research.
However, we acknowledge the importance of validating these results on real-world samples to further substantiate the reported accuracies. Unfortunately, access to practical, domain-specific data, especially for applications like medical imaging, can be challenging due to privacy concerns and restrictions in obtaining proprietary datasets. Nonetheless, we aim to address this limitation in future work by collaborating with relevant organizations to secure access to real-world samples. This would allow us to evaluate the YOLO models in practical scenarios, providing additional validation for their applicability and robustness.
Thank you for highlighting this critical point, as it offers a valuable direction to enhance the rigor and relevance of our findings.
C- Your work needs to be compared with other researchers' work in the literature.
We sincerely thank the reviewer for their thoughtful suggestion to include comparisons with other studies in the literature. While we recognize the value of such comparisons, the focus of this paper is to provide a comprehensive review of the YOLO framework itself, tracing its evolution across various versions and examining their advancements, benchmarks, and applications.
Our primary aim is to analyze the architectural developments and performance improvements of successive YOLO versions (e.g., YOLOv8, YOLO-NAS, YOLOv9, YOLOv10, and YOLOv11) using standardized datasets and metrics such as mAP50. This approach allows for a consistent and focused evaluation of the YOLO framework, ensuring clarity and relevance for readers interested in the development of this specific family of models.
While comparisons with works from other researchers could offer additional insights, they often involve diverse datasets, training strategies, and specific application scenarios, which may introduce inconsistencies and detract from the central focus of this review. Instead, our intent is to present a unified perspective on YOLO’s evolution, highlighting its impact across domains and its adaptability to various real-world challenges.
We deeply appreciate the reviewer’s comment and will ensure that this distinction is made clear in the paper to better align with the reader’s expectations.
6- Kindly provide two paragraphs: one to explain and list the main "contributions" of this research, and another to list the main "Limitations" of this work.
Thank you for your thoughtful suggestions. The two paragraphs addressing the main contributions and limitations of this research have been incorporated into the paper. Additionally, the seventh concern has been resolved in conjunction with these updates. We appreciate the opportunity to refine the manuscript and ensure it meets the highest standards. Please let us know if further revisions are needed.
7-The “conclusion” needs reformulation to reflect the main contribution of the authors.
Thank you. This concern has been resolved in conjunction with the 6th comment.
Reviewer 3 Report
Comments and Suggestions for Authors
The paper is useful and interesting, but there are some comments:
1) The required problem definition for each of the algorithms and the solution algorithms are not presented in mathematical form, so it is not possible to see how the effect is achieved.
2) For each application area of the algorithms, it is desirable to present a specific example with its mathematical description and to show on it what specific effect was obtained.
3) Explain all the figures and tables so that their meaning can be better understood.
Author Response
1) The required problem definition for each of the algorithms and the solution algorithms are not presented in mathematical form, so it is not possible to see how the effect is achieved.
Thank you for your insightful comment. We acknowledge that the original Figure 2, which illustrates the architecture, was inadvertently omitted from the review draft during the formatting process. This figure has now been reinstated to enhance the clarity of the discussion. Additionally, details regarding the loss functions have been included to further elucidate the mathematical model underlying YOLO. These updates aim to provide a more comprehensive understanding of the problem definition and solution algorithms, addressing the core aspects of the mathematical framework effectively. We appreciate your feedback in guiding these improvements.
2) For each application area of the algorithms, it is desirable to present a specific example with its mathematical description and to show on it what specific effect was obtained.
Thank you for your valuable comment. We appreciate your suggestion to include specific examples with mathematical descriptions for each application area of the algorithms. While this review primarily focuses on summarizing the advancements and applications of different YOLO versions, we have enriched the paper by adding relevant details to clarify the mathematical foundations and effects in select application areas. For example, we have expanded discussions to include loss functions, bounding box regression, and classification metrics, illustrating how these contribute to performance in areas such as medical imaging and autonomous vehicles. These additions aim to provide a clearer understanding of how YOLO achieves its results across diverse domains while adhering to the scope of the review. We trust this enhancement addresses your concern effectively.
3) Explain all the figures and tables so that their meaning can be better understood.
Thank you for your insightful comment. To enhance the clarity and comprehensibility of the figures and tables presented in the paper, we have revised the accompanying descriptions and explanations. Each figure and table now includes detailed captions that outline their context, significance, and the data they represent. Additionally, we have expanded the discussions within the main text to provide better integration and interpretation of the visual data, ensuring that readers can fully understand their meaning and relevance to the topics being discussed. We hope these updates address your concerns and make the content more accessible and informative.
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
Correction approved.
Author Response
Thank you for your effort and comments.