Behavior Recognition of Squid Jigger Based on Deep Learning
Round 1
Reviewer 1 Report
This study examines the classification performance of three distinct deep learning models using a self-constructed dataset on the behavior of squid fishing crew members. The paper reveals that the LSTM-ResNet50 model attains the highest performance when NUM_FRAMES=8. The paper exhibits a certain level of innovation in its practical application.
The following are the suggested revisions for this paper:
1. Line 129: The behavior descriptions in Table 1 are excessively intricate. It is recommended to simplify these descriptions and incorporate any additional information into the main text.
2. Line 156: There is ambiguity in the statement "7-fold increase in the amount of source data."
3. Line 179, Line 210: Change the colors for 1frame in Figures 3 and 4 to red, green, and blue instead of red, yellow, and blue.
4. Line 199: In Section 2.3.1, clarify that the Xavier normal method improves the stability and convergence speed of the model training process, not the model itself.
5. Line 210: In Figure 4, the right part should be identified as the diagram illustrating the structure of the LSTM.
6. Line 253: Figure 6. The drawing is not standardized, and the placement of the “linearly map” is not appropriate. It needs to be modified.
7. Line 302: The description of the experimental models in the first paragraph of Section 3.2 is redundant and unnecessary. The three video classification models have already been introduced in a previous section, so there is no need to explain them again at this point.
8. Line 313: The selection criteria for the LSTM encoding model, currently given in the second paragraph of the experimental results in Section 3.2, should be described in the introduction of the LSTM-ResNet model in Section 2.3.2.
9. Line 417: Add a supplement in Section 4.2 explaining why ResNet50 outperforms ResNet152.
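As an illustration of comment 4, the following is a minimal sketch of Xavier (Glorot) normal initialization, which draws weights from a zero-mean normal distribution with standard deviation gain * sqrt(2 / (fan_in + fan_out)). The function name, the layer sizes, and the seed are assumptions made for this example and are not taken from the manuscript. The point is that the method only fixes the scale of the initial weights, which is why it affects the training process rather than the model architecture.

```python
import math
import random


def xavier_normal(fan_in, fan_out, gain=1.0, seed=0):
    """Return a fan_out x fan_in weight matrix and the standard deviation used.

    Hypothetical helper for illustration; not the manuscript's code.
    """
    # Standard deviation prescribed by Xavier (Glorot) normal initialization.
    std = gain * math.sqrt(2.0 / (fan_in + fan_out))
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, std) for _ in range(fan_in)]
               for _ in range(fan_out)]
    return weights, std


# Example layer sizes are arbitrary assumptions.
weights, std = xavier_normal(fan_in=256, fan_out=128)
print(f"target std: {std:.4f}")
```

Keeping the initial weight variance tied to the layer widths in this way is intended to keep activation and gradient magnitudes comparable across layers at the start of training.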
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
- Introduction: a better summarization of the current state of the art is needed, particularly about the main weaknesses and research gaps that still linger. This is important to better contextualize the research and to highlight the main contributions of the study.
- Table 1: the figures are too small, so it is very difficult to notice any difference between different classes.
- Image augmentation should only be applied to the training set, and never to the whole dataset prior to division. This is a serious mistake, because the same samples, with only slight differences, will be present in both the training and test sets, leading to serious bias. This completely invalidates the results. Unfortunately, this type of incorrect procedure has become quite common in the literature, which is concerning.
- Language needs some work, as there are several grammatically incorrect sentences.
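The augmentation point above can be sketched as follows: split the source clips first, then augment only the training portion, so that no augmented variant of a test clip can appear in the training set. This is a minimal sketch under stated assumptions; `split_then_augment`, `toy_augment`, the clip identifiers, and the 80/20 split are hypothetical and do not describe the authors' pipeline.

```python
import random


def split_then_augment(samples, augment, test_fraction=0.2, seed=0):
    """Split the source samples first, then augment the training part only."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_test = int(round(len(samples) * test_fraction))
    test = [samples[i] for i in indices[:n_test]]
    train = [samples[i] for i in indices[n_test:]]

    augmented_train = []
    for sample in train:
        augmented_train.append(sample)            # keep the original clip
        augmented_train.extend(augment(sample))   # add its augmented variants
    return augmented_train, test


def toy_augment(sample):
    # Hypothetical stand-in for real augmentations (flips, crops, etc.):
    # returns two labeled variants derived from one source clip.
    clip_id, label = sample
    return [(f"{clip_id}_aug{k}", label) for k in range(2)]


# Ten hypothetical source clips with three behavior classes.
clips = [(f"clip{i}", i % 3) for i in range(10)]
train, test = split_then_augment(clips, toy_augment)

# Every training item traces back to a source clip that is not in the test set.
train_sources = {clip_id.split("_aug")[0] for clip_id, _ in train}
test_sources = {clip_id for clip_id, _ in test}
print(train_sources.isdisjoint(test_sources))
```

Augmenting before the split would instead allow variants of the same source clip to fall on both sides, which is the leakage the comment describes.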
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
In this paper, the authors introduce how the 3D CNN, LSTM+ResNet, and TimeSformer models are applied to video classification tasks and, for the first time, to the EMS system. In addition, the paper tests and compares the performance of the three models in video classification and discusses the advantages and challenges of using them for video recognition. Through experiments, the authors obtained the accuracy and related metrics of video recognition for the different models. The authors' work is timely, new, and interesting, but the article needs several improvements before it can proceed further. Some of them are as follows:
1. There are many grammatical mistakes and typos in the paper that must be removed with detailed proofreading.
2. Moreover, the authors should further highlight the key contributions of this work. In this regard, I suggest adding the key contributions in bullet form in the second-to-last paragraph of the introduction section.
3. Please add a paper organization paragraph at the end of the introduction section.
4. The novelty of this work seems very limited; moreover, the 3D CNN network architecture diagram is too general.
5. The related work and literature review are very limited. Further literature can be added.
6. To learn more about underwater target detection and underwater communication, the authors can also refer to “Localization and Detection of Targets in Underwater Wireless Sensor Using Distance and Angle Based Algorithms,” IEEE Access, and “A Review of Underwater Localization Techniques, Algorithms and Challenges,” Journal of Sensors.
7. A comparison of the proposed work with other work is completely missing. The authors should compare the proposed work with other work to properly validate the performance of their model or approach. In particular, I suggest comparing it with the latest algorithms from recent years in a separate table.
8. Several references are incorrectly placed, and the formatting is not proper. Please revise all of them carefully.
9. Figure 1 and figure seem identical and give no information at all. Therefore, one of them should be deleted; I prefer to delete Figure 1.
10. Figure 2 and Table 1 captions are identical.
11. The caption of Figure 6, “This is a figure. Schemes follow the same formatting”, is very strange and must be revised.
12. Most of the figure and table captions are too general and short; please revise all of them and add a brief description to each.
13. The results of this work seem very general and limited. Therefore, I suggest adding more results and experimental work.
14. The references are very limited relative to the paper length. Further references can be added, especially from the last five years.
Extensive editing of English language required
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Although the introduction could still be slightly improved, overall I am satisfied with the modifications introduced by the authors.
There are still some language problems, but overall the text can be understood without much effort.
Author Response
I appreciate your recognition of my work. I am glad that I have been able to meet your expectations for academic writing. Your affirmation and praise are the best rewards for my efforts and dedication. We have invited native English-speaking friends and colleagues to review our paper to ensure its quality.
Reviewer 3 Report
NA
Author Response
Thank you very much for recognizing my work. I am delighted to meet your academic writing requirements. Your affirmation and praise are the best reward for my efforts and dedication.
I will continue to work hard and improve my writing skills and professional knowledge in order to present you with content that is of higher quality and accuracy in the future.