Underwater Small Target Detection Based on YOLOX Combined with MobileViT and Double Coordinate Attention
Round 1
Reviewer 1 Report
1.1 The main question addressed by the research is optimization of real-time object detection architectures for underwater applications.
1.2 The topic is not original, and the novelty is not clearly shown in the introduction.
1.3 It is not clearly shown whether the current models are insufficient for the field. I recommend describing a case study in the introduction.
1.4 I also recommend dividing the introduction into an introduction and a related-work review.
1.5 The paper adds a new model architecture to the subject area compared with other published material, but the authors do so without any explanation.
1.6 The main remark on the paper is that the authors should explain or prove every proposal they make.
2. Lines 75-77: "In underwater target detection, attention mechanisms are frequently used in feature extraction, and in mobile networks, attention mechanisms have proven their usefulness in computer vision through their ability to achieve high feature extraction at a relatively low cost." - The meaning is not clear. What does "high feature extraction" mean? Attention-like layers aim to "highlight" and help retain only the features that are valuable for the target.
3. Lines 52-67: The YOLO family and real-time object detection have advanced far beyond what this review covers; see, for instance, https://paperswithcode.com/task/real-time-object-detection, and also the YOLOv6 to YOLOv8 papers.
4. Lines 69-74: Please show a case as an example.
5. Lines 52-67: This text does not describe YOLOX and its benefits compared to other models.
6. Lines 85-108: The text does not describe MobileViT, SOTA in this field.
7. Lines 85-108: "Thus we choose MobileVIT[28] to combined with YOLOX[29] ." - This important point, which can be regarded as the aim of the work, is given without sufficient proof or explanation.
8. "The YOLOX uses mosaic data enhancement during the image pre-processing stage and selects four images from the dataset for stitching and testing, which can enrich image backgrounds." - I believe that augmentation cannot be regarded as part of any architecture.
9. Section 2.1 should contain an illustration of YOLOX.
10. Section 2.3: Does the layer correspond to the paper https://arxiv.org/abs/2103.02907? I cannot find any reference.
11. Line 253: "Because YOLOX has advantages in image enhancement, target classification, and label classification" - This important statement is given without any proof; please provide either references or research supporting all these results.
12. Figure 5 must highlight the differences between the original and proposed architectures.
13. The dataset in 4.1 does not have any reference. Where can we find URPC2020?
14. If URPC2020 is available anywhere, it would be interesting to compare the authors' results with those of other researchers.
15. "mAP is the standard to measure the accuracy of the model in target detection." - To the best of my knowledge, mAP can be taken as mAP50 or, for instance, mAP50:95 or another form; the authors should state how their measure is defined.
1. Sentences like "Thus we designed a proposed coordinate attention named double coordinate attention (DCA)" (line 227) are not correctly constructed.
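As an illustration of comment 15, the following is a minimal sketch, not the authors' evaluation code, of why the IoU threshold behind a reported mAP matters. The function names and the two boxes are hypothetical values chosen for the example; the threshold grid 0.50, 0.55, ..., 0.95 is the one commonly associated with mAP50:95. The same prediction counts as a correct detection at the mAP50 threshold but not at the stricter thresholds that mAP50:95 averages over.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds 0.50, 0.55, ..., 0.95 (the grid behind "mAP50:95").
IOU_THRESHOLDS = [0.50 + 0.05 * i for i in range(10)]


def matched_at(pred, truth, thresholds=IOU_THRESHOLDS):
    """Map each threshold to whether the prediction counts as a match."""
    overlap = iou(pred, truth)
    return {round(t, 2): overlap >= t for t in thresholds}


# Hypothetical boxes: the prediction is shifted 2 px to the right.
ground_truth = (0, 0, 10, 10)
prediction = (2, 0, 12, 10)

matches = matched_at(prediction, ground_truth)
for threshold, is_match in matches.items():
    print(f"IoU >= {threshold:.2f}: {'match' if is_match else 'miss'}")
```

A score reported simply as "mAP" is therefore hard to compare with other work unless the threshold, or the range of thresholds, is stated.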
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
It should be clearly stated whether the model used makes original contributions or just uses a combination of already existing modules.
All equations must be accompanied by much more pertinent justifications. There are many mathematical quantities that appear without explaining their meaning.
Better results are assumed through a balance between the accuracy of recognition and the necessary resources. It would have been useful to include, in the paragraph discussing the results, more concrete data such as computation time and computing resource requirements (for example, memory usage), as well as a discussion of the availability of these resources on the different devices directly involved in the applications under consideration.
From the logic of the presentation, a strong and direct connection must emerge between the mathematical support and the particular solution proposed as a personal contribution. Otherwise, the reader may be left with the impression that the mathematical support was introduced only formally because these are the usual requirements.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments for author File: Comments.pdf
The authors need to conduct a proper proof-reading of the overall manuscript. There are many typos and grammar mistakes.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
The authors addressed all my comments.
However, I also recommend adding the motivation for the work to the introduction.
Also, all points of contribution could be considered in the discussion.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf