Peer-Review Record

Efficient Small-Object Detection in Underwater Images Using the Enhanced YOLOv8 Network

by Minghua Zhang, Zhihua Wang, Wei Song, Danfeng Zhao * and Huijuan Zhao
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Appl. Sci. 2024, 14(3), 1095; https://doi.org/10.3390/app14031095
Submission received: 12 December 2023 / Revised: 17 January 2024 / Accepted: 23 January 2024 / Published: 27 January 2024
(This article belongs to the Special Issue Intelligent Computing and Remote Sensing)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

1. The paper conducts experiments on two datasets, UTDAC2020 and Pascal VOC, and compares the proposed method with other YOLOv8 variants and well-established detectors. However, it's crucial to include a broader comparison with a wider range of state-of-the-art object detection models to establish a more comprehensive benchmark.

2. The paper discusses experiments on UTDAC2020 and Pascal VOC, but it lacks a discussion on the generalization of the proposed method to other diverse datasets.

3. The paper uses Average Precision (AP), AP50, Parameters, FLOPS, and Model Size for evaluation. While these metrics are standard for object detection, it's important to justify their selection and discuss any limitations or biases associated with them.

4. The paper mentions consistent training parameters across experimental groups but lacks a detailed discussion on the sensitivity of the proposed method to hyperparameter choices.

5. The paper mentions early stopping if the model does not show improvement within 50 epochs. While early stopping is a common practice, it's essential to discuss its potential impact on the final model and whether different choices of the early-stopping criterion were explored (a minimal sketch of such a patience rule is given after this list).

6. The qualitative analysis mentions instances of false positives and false negatives in the detection results. Providing a more in-depth analysis of these cases, along with potential causes and solutions, would enhance the paper's transparency and help readers understand the model's limitations.

7. The paper discusses the model's performance under different scenarios, such as small object detection and variations in image resolution. However, it would be beneficial to include robustness testing under challenging conditions, such as occlusion, complex backgrounds, or noisy data, to assess the model's reliability in real-world scenarios (simple degradation probes of this kind are sketched after this list).

8. The paper mentions improvements in terms of reduction in model parameters, speed-up in training, and improvements in detection accuracy. However, it's crucial to explicitly state the evaluation metrics used to measure detection accuracy, and to compare the proposed model with existing state-of-the-art models to demonstrate its superiority.

9. The paper refers to experiments conducted on the UTDAC2020 underwater dataset. It's essential to discuss the generalization of the proposed improvements to other datasets and scenarios.

10. Neural networks are well known and have been used in previous studies such as PMID: 36642410 and PMID: 36166351. Therefore, the authors are encouraged to refer to more such works in this description to attract a broader readership.

11. The introduction of new components, such as Coordinate Attention and Deformable ConvNets v2, may incur additional computational overhead.

12. The paper should address the robustness of the proposed model to different challenges in object detection, such as occlusion, varying lighting conditions, and complex backgrounds. Additionally, limitations and scenarios where the proposed modifications may not perform optimally should be discussed to provide a balanced view of the model's capabilities.

13. The paper provides an overview of the modifications made to the YOLOv8 model, but it lacks specific implementation details such as hyperparameters used, training strategies, and any fine-tuning procedures.

14. The paper should include a detailed comparison with the baseline YOLOv8 model without the proposed modifications.
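Regarding comment 5: the patience rule mentioned there can be summarized with a minimal sketch, assuming a generic training loop and a validation metric to be maximized; all names, the callable signatures, and the 500-epoch cap are illustrative and not taken from the paper.

```python
# Minimal sketch of a patience-based early-stopping rule (illustrative only).
PATIENCE = 50  # stop if no improvement for 50 consecutive epochs

def train_with_early_stopping(train_one_epoch, evaluate, max_epochs=500):
    best_metric, best_epoch = float("-inf"), 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()            # one pass over the training data
        metric = evaluate()          # e.g., validation AP
        if metric > best_metric:
            best_metric, best_epoch = metric, epoch
        elif epoch - best_epoch >= PATIENCE:
            # No improvement for PATIENCE epochs: stop and keep the best checkpoint.
            break
    return best_metric, best_epoch
```

Reporting how sensitive the final AP is to the patience value (e.g., 30 vs. 50 vs. 100 epochs) would address this concern directly.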
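Regarding comments 7 and 12: robustness under degraded inputs can be probed by applying simple synthetic perturbations to validation images before evaluation. The sketch below is illustrative only; the noise levels, occlusion fraction, and the evaluation hook are assumptions, not details from the paper.

```python
import torch

def add_gaussian_noise(img: torch.Tensor, std: float = 0.05) -> torch.Tensor:
    """img: float tensor in [0, 1] with shape (C, H, W); returns a noisy copy."""
    return (img + std * torch.randn_like(img)).clamp(0.0, 1.0)

def add_random_occlusion(img: torch.Tensor, frac: float = 0.2) -> torch.Tensor:
    """Blank out a random rectangle whose sides cover `frac` of the image sides."""
    _, h, w = img.shape
    oh, ow = int(h * frac), int(w * frac)
    top = int(torch.randint(0, h - oh + 1, (1,)))
    left = int(torch.randint(0, w - ow + 1, (1,)))
    out = img.clone()
    out[:, top:top + oh, left:left + ow] = 0.0
    return out

image = torch.rand(3, 640, 640)          # dummy image standing in for a validation sample
occluded = add_random_occlusion(image)   # occlusion probe
for severity in (0.02, 0.05, 0.10):
    degraded = add_gaussian_noise(image, std=severity)
    # evaluate_detector(degraded) would be called here; plotting AP against
    # severity shows how gracefully the detector degrades.
```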

Comments on the Quality of English Language

English writing and presentation style should be improved.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

OBSERVATIONS ON MANUSCRIPT applsci-2794811: Efficient Small-Object Detection in Underwater Images Using the Enhanced YOLOv8 Network

 

The paper is interesting, since the authors propose a high-precision, lightweight underwater detector specifically optimized for underwater scenarios based on the YOLOv8 model. Their method achieves 52.12% AP on the underwater dataset UTDAC2020 with only 8.5 M parameters, 25.5 B FLOPS, and a 17 MB model size, surpassing the large YOLOv8l model (51.69% AP with 43.6 M parameters, 164.8 B FLOPS, and an 84 MB model size). Furthermore, by increasing the input image resolution to 1280 × 1280 pixels, the model achieves 53.18% AP, making it the state-of-the-art (SOTA) model for the UTDAC2020 underwater dataset. Additionally, the authors achieve 84.4% mAP on the Pascal VOC dataset with a substantial reduction in model parameters compared to previous well-established detectors. The experimental results demonstrate that the proposed lightweight method remains effective on underwater data and generalizes to common datasets.

However, there are some observations that the authors should consider to improve the document.

Please define AP, since it is not clear whether it is a unit or an abbreviation; the same applies to FLOPS and SOTA.
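For reference, the usual definitions are given below; whether the manuscript follows the Pascal VOC convention or the COCO convention (where AP is additionally averaged over IoU thresholds from 0.5 to 0.95) is an assumption the authors should state explicitly.

```latex
% AP: area under the precision-recall curve p(r) for one class;
% AP50: the same quantity computed at an IoU threshold of 0.5;
% mAP: AP averaged over the C object classes.
\[
\mathrm{AP} = \int_{0}^{1} p(r)\,\mathrm{d}r, \qquad
\mathrm{AP}_{50} = \mathrm{AP}\big|_{\mathrm{IoU}\ge 0.5}, \qquad
\mathrm{mAP} = \frac{1}{C}\sum_{c=1}^{C}\mathrm{AP}_{c}.
\]
```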

Line 32: Keywords should not repeat words that appear in the title of the work; this is only allowed for scientific names of animals, plants, or soils. Therefore, YOLOv8, underwater, and network should not appear.

Citations from the literature are not reported correctly in the body of the text; the authors must follow the instructions in the journal's author guide. For example, on line 41 the sentence reads: "underwater magnetism[1]…"; it should read: "underwater magnetism [1]…". This must be corrected throughout the text. Another example is line 42: "… underwater unexploded ordnance detection[2]…" should read "… underwater unexploded ordnance detection [2]…".

Likewise, the authors should leave a space between the final word of a sentence and the corresponding citation, for example on lines 41, 42, 49, 64, 76, 77, 78, 85, 94, 95, 99, 106, 109, 117, 123, 124, 134, 146, 147, 159, 161, 162, 179, 193, 195, 198, 203, 204, 208, 212, 214, 224, 226, 256, 273, and 451.

Another problem is the way the authors write the units of the quantities they report. Measurement units must be separated from the corresponding value by a space; for example, on line 24 the authors wrote: "the underwater dataset UTDAC2020, with only 8.5M parameters, 25.5B FLOPS, and 17MB model…". There are doubts here: if the authors are referring to the number of parameters used in their process, they should write 8500 parameters and not use the M, since M is a prefix that denotes thousands of units such as bytes, grams, etc. It is therefore suggested that line 24 be written as follows: "the underwater dataset UTDAC2020, with only 8500 parameters, 25.5 B FLOPS, and 17 MB model…".

Please review the entire document and leave a space between the corresponding value and the unit of measurement. The above applies to all units, except for the percentage symbol.

When referring to a multiplication or expressing a multiplicative relationship between numbers, the multiplication sign or, failing that, a capital X must be placed between them. For example, on line 284, 3 × 3 appears; the authors should write 3 X 3. Please review the entire document.

For numbers from 1000 to 9999 (i.e., less than 10,000), a comma should not be used to separate the thousands. For example, line 326 says: "... The dataset includes 5,168 training images and 1,293 validation...". This sentence should be written as follows: "... The dataset includes 5168 training images and 1293 validation...". Review the entire document.

Some figures (Figures 5, 6, and 7) are not displayed correctly, or do not show what the authors intend to show.

Even though the authors report important findings on improving the YOLOv8 model, the discussion is lacking: the authors base it on their own comments, when the model they developed could be explained in terms of the existing theory underlying the applications and modifications they made to the YOLOv8 model. For example, regarding lines 432-434: it is true that increasing the resolution of an image or feature map increases sharpness, so smaller objects can be observed; but why is this part relevant, why can it be done in your case, and what do other authors say about it? You should discuss your results against the literature. Please enrich the discussion. The article could be accepted if the authors improve the writing and enrich the discussion.

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The paper addresses a significant issue in marine exploration—efficient small object detection in underwater images. The proposed enhancements to the YOLOv8 model are tailored for underwater scenarios, making the research relevant and practical. The modifications to the YOLOv8 model, such as replacing the backbone with FasterNet-T0, adding a Prediction Head for Small Objects, and incorporating Deformable ConvNets and Coordinate Attention in the neck part, are innovative and well-motivated. These modifications specifically target the challenges posed by small, densely distributed, and occluded objects underwater. The comprehensive evaluation of the proposed method on the UTDAC2020 underwater dataset, comparison with YOLOv8l, and achieving state-of-the-art results on UTDAC2020 by increasing the input image resolution demonstrate the effectiveness of the proposed approach. Some suggestions on paper revision can be found below:
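As context for the Coordinate Attention module mentioned above, the following is a minimal sketch of such a block in the general form proposed by Hou et al. (CVPR 2021); the exact variant, channel width, and reduction ratio used in the manuscript may differ, so this is illustrative rather than a reproduction of the authors' implementation. The parameter count at the end gives a rough sense of the overhead one such block adds to the neck.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Minimal Coordinate Attention block (after Hou et al., CVPR 2021)."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool over width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool over height -> (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        x_h = self.pool_h(x)                      # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)  # (B, C, W, 1)
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # attention along height
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # attention along width
        return x * a_h * a_w

block = CoordinateAttention(256)
print(sum(p.numel() for p in block.parameters()))  # a few thousand extra parameters
print(block(torch.randn(1, 256, 40, 40)).shape)    # feature-map shape is preserved
```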

It is suggested to add more relevant work on the application of lightweight CNN models for object detection, such as doi.org/10.3390/app12052281 and doi.org/10.1007/s00170-022-10335-8. In addition, providing more details about the Prediction Head for Small Objects, especially how it contributes to improved small-object detection accuracy, would be valuable for readers to understand the novelty.
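In YOLO-style detectors, a "prediction head for small objects" usually means attaching an additional detection head to a higher-resolution (stride-4, P2) feature map so that small instances fall on a denser prediction grid. The sketch below illustrates only this general idea; the channel widths, class count, and DFL-style regression width are assumptions and are not taken from the manuscript.

```python
import torch
import torch.nn as nn

# Hypothetical channel widths for the P2-P5 feature maps (strides 4, 8, 16, 32).
FEATURE_CHANNELS = {"P2": 64, "P3": 128, "P4": 256, "P5": 512}
NUM_CLASSES = 4        # e.g., the UTDAC2020 categories (assumed)
REG_CHANNELS = 4 * 16  # DFL-style box-regression width (assumed)

# One 1x1 prediction layer per scale; adding the stride-4 "P2" entry is what
# turns a 3-scale detector into one with a dedicated small-object head.
heads = nn.ModuleDict({
    level: nn.Conv2d(ch, NUM_CLASSES + REG_CHANNELS, kernel_size=1)
    for level, ch in FEATURE_CHANNELS.items()
})

# Dummy feature maps for a 640 x 640 input.
feats = {
    "P2": torch.randn(1, 64, 160, 160),   # densest grid -> smallest objects
    "P3": torch.randn(1, 128, 80, 80),
    "P4": torch.randn(1, 256, 40, 40),
    "P5": torch.randn(1, 512, 20, 20),
}
for level, f in feats.items():
    print(level, tuple(heads[level](f).shape))
```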

Including visual examples of the detection results, particularly for challenging underwater scenarios, would enhance the paper's impact and help readers grasp the practical implications of the proposed method. Additionally, a discussion of potential limitations or scenarios where the proposed method might face challenges in underwater environments would contribute to a more balanced presentation of the research.

 

Overall, the paper presents an innovative and efficient approach to small object detection in underwater images using enhancements to the YOLOv8 model. With some clarifications, visual examples, discussions of limitations, and additional comparisons with other models, the paper has the potential to be a valuable contribution to the field of marine exploration and computer vision.

Comments on the Quality of English Language

The quality of English in the manuscript is generally good.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

OBSERVATIONS ON MANUSCRIPT applsci-2794811 (2): Efficient Small-Object Detection in Underwater Images Using the Enhanced YOLOv8 Network

The paper is interesting, since the authors propose a high-precision, lightweight underwater detector specifically optimized for underwater scenarios based on the YOLOv8 model. Their method achieves 52.12% AP on the underwater dataset UTDAC2020 with only 8.5 M parameters, 25.5 B FLOPS, and a 17 MB model size, surpassing the large YOLOv8l model (51.69% AP with 43.6 M parameters, 164.8 B FLOPS, and an 84 MB model size). Furthermore, by increasing the input image resolution to 1280 × 1280 pixels, the model achieves 53.18% AP, making it the state-of-the-art (SOTA) model for the UTDAC2020 underwater dataset. Additionally, the authors achieve 84.4% mAP on the Pascal VOC dataset with a substantial reduction in model parameters compared to previous well-established detectors. The experimental results demonstrate that the proposed lightweight method remains effective on underwater data and generalizes to common datasets.

The authors answered the questions and addressed the doubts that were raised during the review of the manuscript. Likewise, the authors highlighted the changes they made in the text, which facilitated the review. The article has improved substantially, so it can be considered publishable.

Comments for author File: Comments.pdf
