Article
Peer-Review Record

Underwater Accompanying Robot Based on SSDLite Gesture Recognition

Appl. Sci. 2022, 12(18), 9131; https://doi.org/10.3390/app12189131
by Tingzhuang Liu †, Yi Zhu †, Kefei Wu and Fei Yuan *
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 3 August 2022 / Revised: 23 August 2022 / Accepted: 6 September 2022 / Published: 11 September 2022
(This article belongs to the Special Issue Underwater Robot)

Round 1

Reviewer 1 Report

The article is devoted to the creation of an accompanying underwater robotic system based on gesture recognition. The article proposes an improved underwater gesture recognition algorithm.

The authors suggest recognizing the human first and then the gesture, which improves the recognition performance. The algorithm and the proposed robotic system were verified in a swimming pool. The results of the work may be of interest to scholars in the field of real-time image processing.

However, I have several questions and comments for the Authors. Please, find them below.

1.  "Therefore, at present, the most common way of underwater man-machine interaction is to capture the diver’s gesture for recognition and analysis..." there are not enough references to studies confirming the prevalence of this method.

2.  The "introduction" provides an insufficient overview of existing solutions. What else is used for marine exploration?

3. The abstract of the article states that "Our experiments show that the use of underwater datasets and target tracking can effectively improve the accuracy of gesture recognition by 40-100%." However, these values are not explained in the conclusions.

In the text of the paper there is the phrase "mAP index improves greatly after the addition of underwater person tracking. With the increase of the score threshold (i.e., remove network detection results below the threshold), the improvement range increased from 30% to 100%"; however, it is not clear from the text how these estimates were obtained.

4. The conclusions do not provide enough numerical estimates of the proposed algorithm.

 

Minor issues:

1. Line 3. "However, the underwater robots on the market have some problems, such as single function, poor effect, ..." What do "poor effect" and "single function" mean? Rewrite more specifically.

2. Line 6. Explain the abbreviation "KFC"

3. Line 12. "gesture Recognition" why is there a capital letter here?

4. Line 16,17. "Marine" why is this word capitalized?

5. Line 21. "diveRs"? Perhaps a letter is missing here.

6. Line 24,25,68,84. "man-machine": I believe the more common term is "human-machine".

7. Line 120, "Where, denotes the recovered image, and..." perhaps something is missing here. In formula (1) the notation is not explained.

8. Line 153. What does "Smmoth" mean?

9. Algorithm 1 "Underwater person tracking algorithm": perhaps it is better to replace "person" with "human", since the wording "human tracking algorithm" occurs earlier in the text.

 

Here are some articles on a similar topic that could improve your article:

"Underwater gesture recognition using classical computer vision and deep learning techniques"

"Gesture-based language for diver-robot underwater interaction"

The English language requires extensive proofreading prior to publication.

However, I believe that the peer-reviewed manuscript could be a great contribution to MDPI after relatively major revisions.

Author Response

Response to Reviewer’s Comments

Reply to Reviewer #1

Thank you so much for your attention to our manuscript. Your valuable comments and suggestions are really appreciated and help us revise the manuscript for resubmission. We have made modifications in the following aspects.

Blue font indicates each comment, black font indicates our response for each comment, and red font indicates our changes in the paper.

Comment:

"Therefore, at present, the most common way of underwater man-machine interaction is to capture the diver’s gesture for recognition and analysis..." there are not enough references to studies confirming the prevalence of this method.

Response to comment:

Thanks for this comment. During the preliminary research process, we retrieved and studied a large body of literature on gesture recognition for underwater human-machine interaction, including but not limited to the following articles:

  • Chiarella, D., Bibuli, M., Bruzzone, G., Caccia, M., Ranieri, A., Zereik, E., ... & Cutugno, P. (2015, May). Gesture-based language for diver-robot underwater interaction. In Oceans 2015-genova (pp. 1-9). IEEE.
  • Mišković, N., Bibuli, M., Birk, A., Caccia, M., Egi, M., Grammer, K., ... & Vukić, Z. (2016). Caddy—cognitive autonomous diving buddy: two years of underwater human-robot interaction. Marine Technology Society Journal, 50(4), 54-66.
  • Gustin, F., Rendulic, I., Miskovic, N., & Vukic, Z. (2016). Hand gesture recognition from multibeam sonar imagery. IFAC-PapersOnLine, 49(23), 470-475.
  • Xu, P. (2017). Gesture-based Human-robot Interaction for Field Programmable Autonomous Underwater Robots. arXiv preprint arXiv:1709.08945.
  • Chiarella, D., Bibuli, M., et al. (2018). A novel gesture-based language for underwater human–robot interaction. Journal of Marine Science & Engineering.
  • Jiang, Y., Zhao, M., Wang, C., Wei, F., Wang, K., & Qi, H. (2021). Diver’s hand gesture recognition and segmentation for human–robot interaction on AUV. Signal, Image and Video Processing, 15(8), 1899-1906.
  • Jiang, Y., Peng, X., Xue, M., Wang, C., & Qi, H. (2021). An underwater human–robot interaction using hand gestures for fuzzy control. International Journal of Fuzzy Systems, 23(6), 1879-1889.

Because we did not cite enough references in our paper, we failed to reflect the prevalence of this method. We have added some references to the paper to support our research.

Line 34: Therefore, at present, in order to realize the communication and interaction between the diver and the robot, many researchers [11-13] adopted the method of recognizing and analyzing the captured diver’s gesture to obtain the diver’s command and intention.

Comment:

The "introduction" provides an insufficient overview of existing solutions. What else is used for marine exploration?

Response to comment:

Thanks for this comment. Due to the scope of this paper, we focus only on the interaction between the underwater robot and the diver. In fact, in the field of marine exploration there are many existing solutions and lines of research, such as exploration using ROVs or AUVs. We have made additions in the "Introduction" of the paper.

Line 15: In recent years, underwater robots have begun to play an increasingly important role in marine exploration and development. Underwater robots are mainly divided into remotely operated vehicles (ROVs) and autonomous underwater vehicles (AUVs) [1,2]. ROVs can complete complex underwater tasks through the operation of personnel on shore and are widely used in many fields such as underwater mining, hull cleaning, and pipeline monitoring [3,4]. AUVs are mainly used in marine resource exploration, mineral resource development, and other fields due to their strong autonomy and large diving depth [5].

The research and application of underwater robots have become increasingly mature, with remarkable achievements in the field of marine exploration. However, in some important scenarios such as exploration of complex terrain, aquatic resource fishing, underwater archaeology, and marine biological censuses, divers remain the main actors.

Comment:

The abstract of the article states that "Our experiments show that the use of underwater datasets and target tracking can effectively improve the accuracy of gesture recognition by 40-100%." However, these values are not explained in the conclusions.

In the text of the paper there is the phrase "mAP index improves greatly after the addition of underwater person tracking. With the increase of the score threshold (i.e., remove network detection results below the threshold), the improvement range increased from 30% to 100%"; however, it is not clear from the text how these estimates were obtained.

Response to comment:

Thanks for this comment. Since the full numerical experimental results were not presented in our paper, there was not enough evidence to support the conclusion of the "improvement range increased from 40% to 100%" proposed in the paper. We now show the gesture recognition accuracy before and after tracking in Table 5. As can be seen from Table 5, when the score threshold is 0.5, the mAP of gesture recognition before tracking is 0.527 and after tracking is 0.735, an improvement of 39.5% (about 40%). When the score threshold is set to 0.9, the mAP before tracking is 0.327 and after tracking is 0.670, an improvement of 104.9% (about 105%). As the score threshold increases, the improvement rate rises from 40% to 105%, which supports our conclusion. We have made additions and revisions to the paper.

Line 283: We use the trained MobileNetV3-small model to test the underwater gesture recognition accuracy before and after tracking. Table 5 shows the mean average precision (mAP) of gesture recognition before and after the addition of tracking in detail. It can be seen from the table that mAP index improves greatly after the addition of underwater human tracking. With the increase of the score threshold (i.e., remove network detection results below the threshold), the improvement rate increased from 40% to 105%.

From Figure 12, it can be seen that as the score threshold increases, the performance of gesture recognition with underwater human tracking remains more stable and continues to maintain a high recognition accuracy. The reason is that tracking removes a large number of interfering pixels and makes gesture recognition focus on finding preset gestures in the person body image rather than the whole image. To sum up, adding the step of person tracking before underwater gesture recognition not only improves the detection speed (due to the smaller input image), but also greatly improves the accuracy of the recognition results.

Table 5. Gesture recognition accuracy before and after person tracking with MobileNetV3 Small model.

Score threshold    mAP (before tracking)    mAP (after tracking)    Improvement rate
0.5                0.527                    0.735                    39.5%
0.6                0.491                    0.735                    49.7%
0.7                0.418                    0.720                    72.2%
0.8                0.364                    0.671                    84.3%
0.9                0.327                    0.670                    104.9%
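As a quick check, the improvement rates in Table 5 follow directly from the before/after mAP values. The short Python sketch below is illustrative only: the mAP values are hard-coded from Table 5, and the score-threshold filter is a simplified stand-in for the network's post-processing, not the authors' code.

```python
# Illustrative sketch only: reproduces the improvement rates in Table 5 and the
# score-threshold filtering described above; not the authors' code.

# (score threshold, mAP before tracking, mAP after tracking) taken from Table 5
TABLE_5 = [
    (0.5, 0.527, 0.735),
    (0.6, 0.491, 0.735),
    (0.7, 0.418, 0.720),
    (0.8, 0.364, 0.671),
    (0.9, 0.327, 0.670),
]


def improvement_rate(before: float, after: float) -> float:
    """Relative mAP improvement in percent, e.g. (0.735 - 0.527) / 0.527 = 39.5%."""
    return (after - before) / before * 100.0


def filter_by_score(detections, threshold):
    """Remove network detection results whose confidence score is below the threshold."""
    return [d for d in detections if d["score"] >= threshold]


for thr, before, after in TABLE_5:
    print(f"threshold {thr:.1f}: mAP {before:.3f} -> {after:.3f} "
          f"(+{improvement_rate(before, after):.1f}%)")
```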

Comment:

The conclusions do not provide enough numerical estimates of the proposed algorithm.

Response to comment:

Thanks for this comment. Due to our negligence, we did not present more precise numerical results in the original paper to demonstrate the accuracy improvement before and after tracking. In the revision, we have added Table 5 to the paper to show our experimental results. As can be seen from Table 5 and Figure 12, our proposed algorithm greatly improves the accuracy of underwater gesture recognition, and as the score threshold increases, the gesture recognition accuracy with tracking decreases only slightly, so the recognition stability is higher.


Comment:

Line 3. "However, the underwater robots on the market have some problems, such as single function, poor effect, ..." what do "poor effect" and "single function" mean? Rewrite more specifically.

Response to comment:

Thanks for this comment. We overlooked this when writing the Abstract, and we have now specified in the paper what "poor effect" and "single function" mean.

Line 3: However, the underwater robots on the market have some problems, such as only a single function of object detection or tracking, the use of traditional algorithms with low accuracy and robustness, and the lack of effective interaction with divers.

Comment:

Line 6. Explain the abbreviation "KFC"

Response to comment:

Thanks for this comment. We have explained it in the paper.

Line 6, 175: change to “the kernelized correlation filters (KCF)”
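For readers unfamiliar with the abbreviation, KCF denotes the kernelized correlation filter tracker. A minimal sketch of how such a tracker is typically initialized on a detected person box and updated frame by frame is shown below, assuming OpenCV's contrib implementation; the video path and bounding box are placeholders, and this is not the authors' actual code.

```python
# Minimal sketch of kernelized correlation filter (KCF) tracking, assuming the
# OpenCV contrib implementation (opencv-contrib-python); not the authors' code.
# The video path and initial bounding box below are placeholders.
import cv2

cap = cv2.VideoCapture("pool_sequence.mp4")  # placeholder video path
ok, frame = cap.read()
if not ok:
    raise SystemExit("could not read the first frame")

# Bounding box of the detected person (x, y, width, height); in the paper's
# pipeline this would come from the person detection step.
person_box = (100, 80, 120, 240)

tracker = cv2.TrackerKCF_create()
tracker.init(frame, person_box)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, box = tracker.update(frame)
    if found:
        x, y, w, h = (int(v) for v in box)
        # Crop to the tracked person so that gesture recognition sees far fewer
        # interfering background pixels, as the response describes.
        person_crop = frame[y:y + h, x:x + w]
        # ... run the gesture recognition network on person_crop ...
```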

Comment:

Line 12. "gesture Recognition" why is there a capital letter here?

Response to comment:

Thanks for this comment. We have corrected this error in the paper.

Line 12: change the “gesture Recognition” to “gesture recognition”

Comment:

Line 16,17. "Marine" why is this word capitalized?

Response to comment:

Thanks for this comment. We have corrected this error in the paper.

Line 16, 17: change the “Marine” to “marine”

Comment:

Line 21. "diveRs"? Perhaps a letter is missing here

Response to comment:

Thanks for this comment. We have corrected this error in the paper.

Line 21: change the “dives” to “divers”

Comment:

Line 24,25,68,84.. "man-machine" I believe more common term is "human-machine"

Response to comment:

Thanks for this comment. We have uniformly changed the term to "human-machine" throughout the paper.

Line 24, 25, 35, 68, 84: change the “man-machine” to “human-machine”

 Comment:

Line 120, "Where, denotes the recovered image, and..." perhaps something is missing here. In formula (1) the notation is not explained.

Response to comment:

Thanks for this comment. We overlooked this during the proofreading of the paper. We have now explained the notation used in formula (1) in the paper.

Line 120: where  denotes the recovered image, and  corresponds to the direct light transmission map (DLTM),  is called the veiling light which is the background light at infinity under underwater imaging conditions,  represents back scattered light transmission map (BLTM).
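The notation symbols in the sentence above did not survive in this record. For reference, a common form of the underwater image formation model consistent with these definitions is sketched below; the symbol names are assumed here and may differ from those used in the paper.

```latex
% Common underwater image formation model (assumed notation, may differ from the paper's):
%   I(x)   - observed underwater image
%   J(x)   - recovered image
%   t_D(x) - direct light transmission map (DLTM)
%   B_inf  - veiling light (background light at infinity)
%   t_B(x) - back-scattered light transmission map (BLTM)
I(x) = J(x)\,t_D(x) + B_{\infty}\bigl(1 - t_B(x)\bigr)
```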

Comment:

Line 153. What is "Smmoth" mean?

Response to comment:

Thanks for this comment. We have corrected this error in the paper.

Line 153: change the “Smmoth” to “Smooth”

Comment:

Algorithm 1 "Underwater person tracking algorithm" perhaps it is better to replace "person" with "human" since the wording "human tracking algorithm " occurs earlier in the text.

Response to comment:

Thanks for this comment. We have unified the wording throughout the paper as "underwater human tracking".

Line 4, 42, 54, 102, 137, 272, 273, 282, 285, 289, 330, Figure 1, Figure 11 and Figure 12: change the “person tracking” to “human tracking”.

 

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper presents an underwater robot-based system with gesture recognition. The created prototype is tested and evaluated, showing improved visual processing features. Some limitations are explained and future work is outlined.

-It would be good if the authors, in the Introduction, replaced "Sec. 2,3,4,5" with "Section 2,3,4,5".

-It would be good if the variables used in Equation 1 were explained immediately after it. The same could be said for Equation 2. Equation 3 should be followed by "," and "Where" should begin with a lowercase "w". Check the text carefully for typos.

-The rows in Figures 6, 7, and 10 should be captioned with (a), (b), (c) like the previous figures. The same applies to Figures 13 and 16.

Author Response

Response to Reviewer’s Comments

Reply to Reviewer #2

Thank you so much for your attention to our manuscript. Your valuable comments and suggestions are really appreciated and help us revise the manuscript for resubmission. We have made modifications in the following aspects.

Blue font indicates each comment, black font indicates our response for each comment, and red font indicates our changes in the paper.

Comment:

It would be good if the authors, in the Introduction, replaced "Sec. 2,3,4,5" with "Section 2,3,4,5".

Response to comment:

Thanks for this comment. We have made changes in the paper.

Line 57-59: change the “Sec.” to “Section”

Comment:

It would be good if the variables used in Equation 1 were explained immediately after it. The same could be said for Equation 2. Equation 3 should be followed by "," and "Where" should begin with a lowercase "w". Check the text carefully for typos.

Response to comment:

Thanks for this comment. We overlooked this during the proofreading of the paper. We have explained the notation in Equation (1) and modified the text after Equation (3).

Line 120: where  denotes the recovered image, and  corresponds to the direct light transmission map (DLTM),  is called the veiling light which is the background light at infinity under underwater imaging conditions,  represents back scattered light transmission map (BLTM).

Line 153: change to “where N is the number of matched default boxes,”
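For context, the quantity N and the Smooth L1 term mentioned in this record appear in the standard SSD multibox loss, which SSDLite inherits. A textbook form is given below; it is assumed here for reference rather than copied from the paper's Equation (3).

```latex
% Standard SSD multibox loss (textbook form; assumed to correspond to the paper's Equation 3):
%   N      - number of matched default boxes
%   L_conf - softmax confidence loss over class scores c
%   L_loc  - Smooth L1 localization loss between predicted boxes l and ground-truth boxes g
%   alpha  - weighting term
L(x, c, l, g) = \frac{1}{N}\left( L_{\mathrm{conf}}(x, c) + \alpha\, L_{\mathrm{loc}}(x, l, g) \right)

% Smooth L1 used inside the localization loss:
\mathrm{smooth}_{L_1}(z) =
\begin{cases}
  0.5\,z^{2} & \text{if } |z| < 1 \\
  |z| - 0.5  & \text{otherwise}
\end{cases}
```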

Comment:

The rows in Figures 6, 7, and 10 should be captioned with (a), (b), (c) like the previous figures. The same applies to Figures 13 and 16.

Response to comment:

Thanks for this comment. We have edited several figures in the paper.


Author Response File: Author Response.pdf
