Peer-Review Record

Application of Target Detection Method Based on Convolutional Neural Network in Sustainable Outdoor Education

Sustainability 2023, 15(3), 2542; https://doi.org/10.3390/su15032542
by Xiaoming Yang 1,2, Shamsulariffin Samsudin 1,*, Yuxuan Wang 3, Yubin Yuan 1, Tengku Fadilah Tengku Kamalden 1 and Sam Shor Nahar bin Yaakob 4
Reviewer 1: Anonymous
Reviewer 3: Anonymous
Reviewer 4:
Submission received: 17 November 2022 / Revised: 10 January 2023 / Accepted: 28 January 2023 / Published: 31 January 2023
(This article belongs to the Section Sustainable Education and Approaches)

Round 1

Reviewer 1 Report

This work proposes a target detection method based on a submersible vision system. The method is described and some experimental results are shown. However, some problems exist.

1. The title does not match the task proposed in this work. The title is about outdoor education, while the method focuses on realizing the intelligence of underwater robots.

2. The relationship between outdoor education and target information acquisition based on underwater robots should be explained.

3. The contributions of the proposed underwater camera information recognition should be clearly described.

4. The details of the ROI target should be described.

5. In the experimental part, the competitors should be introduced and discussed based on the experimental results.

Author Response

This work proposes a target detection method based on a submersible vision system. The method is described and some experimental results are shown. However, some problems exist.

  1. The title does not match the task proposed in this work. The title is about outdoor education, while the method focuses on realizing the intelligence of underwater robots.

Reply: Thank you for your advice. We have adjusted the content of the article and added material on "outdoor education" to the introduction. Outdoor education can be understood simply as physical education conducted in an outdoor environment. During outdoor learning, the environment has an important influence on students and strengthens their learning impressions and perceptual experience. Before outdoor teaching formally begins, the organizer plans the teaching according to the actual outdoor conditions and teaching objectives, and prepares scientifically for problems that may arise in the outdoor environment, so that students can fully participate in learning outdoors. Diving is an outdoor sports activity carried out under high pressure.

  2. The relationship between outdoor education and target information acquisition based on underwater robots should be explained.

Reply: Thank you for your comments. The relationship between the two has been added at the end of the introduction. In outdoor education involving underwater robots, the acquisition of target information by intelligent robots can help students better learn about and understand the underwater world.

  3. The contributions of the proposed underwater camera information recognition should be clearly described.

Reply: Your suggestion is very useful. A clear explanation has been added at the end of the introduction section. Underwater camera information recognition plays an important role in the development of intelligent manned diving.

  4. The details of the ROI target should be described.

Reply: We have added this in Section 3.3. The Region of Interest (ROI) pooling layer extracts feature maps of the same size from ROIs of different sizes mapped onto the convolutional feature maps. ROI pooling uses only one pyramid level, turning a region of any size into a feature map of fixed size. It significantly accelerates training and testing while maintaining high detection accuracy. This layer has two inputs: the feature map obtained from a deep convolutional network with multiple convolutional and max-pooling layers, and the list of ROIs to be pooled.
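To make the ROI pooling description concrete, here is a minimal pure-Python sketch of max-based ROI pooling on a single channel. It is not the authors' implementation; the single channel, integer ROI coordinates on the feature map, and the floor/ceil bin boundaries are assumptions for illustration. The arguments `feature_map` and `roi` play the roles of the layer's two inputs.

```python
import math

def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one ROI of a single-channel feature map into an out_h x out_w grid.

    feature_map: list of rows (H x W) of numbers.
    roi: (x0, y0, x1, y1), inclusive integer coordinates on the feature map.
    """
    x0, y0, x1, y1 = roi
    roi_h = y1 - y0 + 1
    roi_w = x1 - x0 + 1
    pooled = []
    for i in range(out_h):
        # Row range of bin i: floor for the start, ceil for the end, so no bin is empty.
        r_start = y0 + math.floor(i * roi_h / out_h)
        r_end = y0 + math.ceil((i + 1) * roi_h / out_h)
        row = []
        for j in range(out_w):
            c_start = x0 + math.floor(j * roi_w / out_w)
            c_end = x0 + math.ceil((j + 1) * roi_w / out_w)
            # Each output cell is the maximum over its bin.
            row.append(max(feature_map[r][c]
                           for r in range(r_start, r_end)
                           for c in range(c_start, c_end)))
        pooled.append(row)
    return pooled

# Toy 4 x 4 feature map with values 0..15.
fm = [[4 * r + c for c in range(4)] for r in range(4)]
print(roi_max_pool(fm, (0, 0, 3, 3), 2, 2))  # 4 x 4 ROI -> [[5, 7], [13, 15]]
print(roi_max_pool(fm, (0, 0, 2, 2), 2, 2))  # 3 x 3 ROI -> [[5, 6], [9, 10]]
```

Both ROIs, although of different sizes, yield a 2 x 2 output, which is the fixed-size property the reply describes. In a full network the same operation would run for every channel and every ROI.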

  5. In the experimental part, the competitors should be introduced and discussed based on the experimental results.

Reply: Thank you for your advice. More comparisons with related studies have been added in Section 4.4. Compared with the study of Yaman et al. (2021), the classification performance metrics of our underwater recognition model are slightly higher.

Reviewer 2 Report

The paper considers only a minor modification of an object detection approach and does not include any novelty.

Author Response

The paper considers only a minor modification of an object detection approach and does not include any novelty.

Reply: Thank you for your comments. We have revised the statement of the article's innovations in the last paragraph of the introduction. The innovation of this research is an underwater camera data mining method based on deep learning for outdoor diving education.

Reviewer 3 Report

This manuscript proposed a novel deep learning-based method for sustainable development of outdoor education, where underwater data were processed and analysed using a convolutional neural network (CNN). The performance of the proposed method has been validated based on real image data, with satisfactory results. Overall, the topic of this research is interesting, and the manuscript was well organised and written. The detailed comments are given as follows.

1. The contribution and innovation of the manuscript should be clearly stated in the abstract and introduction.

2. Broaden and update the literature review on CNN/deep learning and its applications in real data/image processing, e.g., Torsional capacity evaluation of RC beams using an improved bird swarm algorithm optimised 2D convolutional neural network; Vision-based concrete crack detection using a hybrid framework considering noise effect.

3. The performance of a CNN depends heavily on its hyperparameter settings. How did the authors set them in this research to achieve optimal prediction performance?

4. How robust is the proposed method against noise?

5. In the abstract, it is not necessary to give the conclusions in detail.

6. More directions for future research should be included in the conclusion.

Author Response

  1. The contribution and innovation of the manuscript should be clearly stated in the abstract and introduction.

Reply: Thank you for your advice. The abstract and introduction now both emphasize the innovation of the article: a method based on a convolutional neural network for mining target information from underwater camera data. The research contributions have been added at the end of the abstract and at the end of the introduction.

  2. Broaden and update the literature review on CNN/deep learning and its applications in real data/image processing, e.g., Torsional capacity evaluation of RC beams using an improved bird swarm algorithm optimised 2D convolutional neural network; Vision-based concrete crack detection using a hybrid framework considering noise effect.

Reply: As requested, the literature review in the second section has been updated and expanded. Yu, Liang et al. (2022) established a data-driven model based on a two-dimensional convolutional neural network. Yu, Samali et al. (2022) proposed vision-based automatic recognition of the surface condition of concrete structures; their model combines state-of-the-art pre-trained convolutional neural networks, transfer learning, and decision-level image fusion.

  3. The performance of a CNN depends heavily on its hyperparameter settings. How did the authors set them in this research to achieve optimal prediction performance?

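Reply: Thank you for your advice. The hyperparameter settings have been added at the end of Section 3.4. The hyperparameters of the CNN model are set as follows. Padding: $P_h = K_h - 1$ and $P_w = K_w - 1$. Stride: for a stride of $S_h$ in height and $S_w$ in width, the output shape is $\lfloor (N_h - K_h + P_h + S_h)/S_h \rfloor \times \lfloor (N_w - K_w + P_w + S_w)/S_w \rfloor$ (rounded down), where $N_h \times N_w$ is the input size, $K_h \times K_w$ is the kernel size, and $P_h$, $P_w$ are the total numbers of padding rows and columns.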
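The padding and stride rule in this reply can be checked numerically. The helper below is a hypothetical sketch, not code from the paper; it assumes `p_h` and `p_w` denote the total padding rows and columns (both sides combined) and that all sizes are positive integers.

```python
def conv_output_shape(n_h, n_w, k_h, k_w, p_h, p_w, s_h=1, s_w=1):
    """Output (height, width) of a 2-D convolution.

    n_*: input size, k_*: kernel size, p_*: total padding (both sides combined),
    s_*: stride. Floor division implements the "rounded down" rule.
    """
    out_h = (n_h - k_h + p_h + s_h) // s_h
    out_w = (n_w - k_w + p_w + s_w) // s_w
    return out_h, out_w

# Padding P = K - 1 keeps the spatial size unchanged at stride 1.
print(conv_output_shape(8, 8, 3, 3, p_h=3 - 1, p_w=3 - 1))        # (8, 8)
# The same padding with stride 2 roughly halves each dimension.
print(conv_output_shape(8, 8, 3, 3, p_h=2, p_w=2, s_h=2, s_w=2))  # (4, 4)
```

Setting $P = K - 1$ at stride 1 is what makes the output size equal the input size, which is why that padding choice is common in CNN designs.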

  4. How robust is the proposed method against noise?

Reply: Thank you for raising this question. This research does not analyze robustness to noise; this analysis will be carried out in future research.

  5. In the abstract, it is not necessary to give the conclusions in detail.

Reply: The detailed conclusions in the abstract have been deleted.

  6. More directions for future research should be included in the conclusion.

Reply: Thank you for your advice. Following your suggestions, we have added content about future research to the conclusion. Future work will include a robustness analysis of the effect of noise and a study of the hyperparameter settings of the CNN model.

Reviewer 4 Report

The author used the Single Shot MultiBox Detector (SSD) algorithm to detect underwater objects. The author experimented on two datasets: underwater images from conventional cameras and from laser-gated cameras. The datasets contain four classes: person, fish, AUV, and other objects. The conventional dataset, comprising 1,073 images, was collected from various sources, such as search engines, open underwater datasets, and images taken by the research team in a laboratory pool. The laser-gated dataset contains 992 images in total. If I understand correctly, the author used the SSD algorithm with ResNet34 as the backbone convolutional neural network (CNN) to detect underwater objects. The features are then extracted and sent for classification (as shown in Figure 9).

The author should pay more attention to the following comments.

  • What is the research contribution? Please state the contributions clearly and convincingly.
  • I suggest removing 'image processing' from the title.
  • In section 3.3 'Algorithm and design of submersible camera model based on machine learning,' I suggest the author change the word 'machine learning' to 'deep learning technique.'
  • As shown in Figure 8, the author presented a flowchart of data filtering. For the decision box 'contains useless information', could the author describe how the algorithm keeps or discards images? Which technique does the author use?
  • As shown in Figure 9, in the 'feature extraction' box, could the author show which feature extraction technique is used? Could the author also explain more clearly why the 'softmax' was sent to parametric regression and the 'regressor' was sent to feature classification?
  • The author presented in the dataset section that two datasets contain four classes: person, fish, AUV, and other objects. So, why are seagrass, coral, and reef present in Figures 10, 11, and 12? 
  • The author used accuracy, precision, recall, specificity, and F1-score as evaluation metrics and visualized the experimental results for each class in a graph, as shown in Figure 13. Please report the overall performance as well.
  • The author evaluated the proposed method on two datasets. Could the author compare the results with other research that used these two datasets?
  • More experiments are required.
  • For the detection, could the author present the mAP value?
  • The discussion section is required.

Author Response

The author used the Single Shot MultiBox Detector (SSD) algorithm to detect underwater objects. The author experimented on two datasets: underwater images from conventional cameras and from laser-gated cameras. The datasets contain four classes: person, fish, AUV, and other objects. The conventional dataset, comprising 1,073 images, was collected from various sources, such as search engines, open underwater datasets, and images taken by the research team in a laboratory pool. The laser-gated dataset contains 992 images in total. If I understand correctly, the author used the SSD algorithm with ResNet34 as the backbone convolutional neural network (CNN) to detect underwater objects. The features are then extracted and sent for classification (as shown in Figure 9).

The author should pay more attention to the following comments.

  • What is the research contribution? Please state the contributions clearly and convincingly.

Reply: Thank you for your advice. We have updated our research contributions in the abstract and introduction. Diving is a new field in outdoor education, and the contribution of this research is to promote the application of deep learning technology in outdoor diving education.

  • I suggest removing 'image processing' from the title.

Reply: Thank you for your advice. We have revised the title according to your request. The revised title is: The Sustainable Development of Outdoor Education Using Convolutional Neural Network.

  • In section 3.3 'Algorithm and design of submersible camera model based on machine learning,' I suggest the author change the word 'machine learning' to 'deep learning technique.'

Reply: Thank you for your advice. The title of section 3.3 has been modified according to your requirements.

  • As shown in Figure 8, the author presented a flowchart of data filtering. For the decision box 'contains useless information', could the author describe how the algorithm keeps or discards images? Which technique does the author use?

Reply: Thank you for your comments. We have added specific data filtering content below Figure 8. Data filtering reduces the quantity of data without affecting the subsequent recognition results. The original data is passed through a filter, which eliminates images containing useless information or noise and keeps images containing valid information. Filtering follows defined criteria; this paper applies two filtering strategies to the original data: discarding images that contain useless information, and discarding images that contain too much noise.
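The reply names two filtering strategies but not their exact criteria. The sketch below shows one way such a filter could be structured on grayscale images; the contrast and noise measures, the thresholds `min_contrast` and `max_noise`, and all function names are hypothetical stand-ins, not the criteria used in the paper.

```python
import statistics

def contrast(img):
    """Population standard deviation of all pixel values (grayscale, 0-255)."""
    return statistics.pstdev([p for row in img for p in row])

def noise_level(img):
    """Mean absolute difference between each interior pixel and the mean of its 4 neighbours."""
    h, w = len(img), len(img[0])
    diffs = [abs(img[r][c] - (img[r - 1][c] + img[r + 1][c] + img[r][c - 1] + img[r][c + 1]) / 4)
             for r in range(1, h - 1) for c in range(1, w - 1)]
    return sum(diffs) / len(diffs) if diffs else 0.0

def keep_image(img, min_contrast=5.0, max_noise=20.0):
    # Hypothetical criterion 1: nearly uniform frames carry "useless information".
    if contrast(img) < min_contrast:
        return False
    # Hypothetical criterion 2: strong pixel-to-neighbour jumps indicate "too much noise".
    if noise_level(img) > max_noise:
        return False
    return True

def filter_images(images, **thresholds):
    """Keep only the images that pass both hypothetical criteria."""
    return [img for img in images if keep_image(img, **thresholds)]

blank = [[100] * 8 for _ in range(8)]                                             # uniform frame
gradient = [[10 * r + 10 * c for c in range(8)] for r in range(8)]                 # smooth content
speckle = [[0 if (r + c) % 2 == 0 else 255 for c in range(8)] for r in range(8)]   # noise-like
kept = filter_images([blank, gradient, speckle])
print(len(kept), kept[0] is gradient)  # 1 True
```

A real pipeline would tune such thresholds on labelled examples of useful and useless frames; the values here only demonstrate the control flow.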

  • As shown in Figure 9, in the 'feature extraction' box, could the author show which feature extraction technique is used? Could the author also explain more clearly why the 'softmax' was sent to parametric regression and the 'regressor' was sent to feature classification?

Reply: Thank you for your question. CNN-based feature extraction is used in Figure 9. After checking, we corrected Figure 9: "softmax" should feed "Feature Classification" and "Regressor" should feed "Parametric Regression". This is because the softmax layer is mainly used for multi-class classification: it computes a probability for each class from the extracted features, yielding a probability distribution over the classes. Softmax regression is a supervised learning algorithm that generalizes logistic regression to multiple classes, and it can achieve good classification results when combined with different learning algorithms. The regressor performs the parameter regression.
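As a minimal illustration of the classification branch described in this reply, the sketch below computes a softmax over four class scores; the class names follow the dataset description in Reviewer 4's report, and the scores are invented. The regressor branch, which outputs continuous parameters rather than probabilities, is not shown.

```python
import math

def softmax(scores):
    """Turn raw class scores into a probability distribution."""
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

CLASSES = ["person", "fish", "AUV", "other"]
logits = [1.0, 3.0, 0.5, 0.2]              # hypothetical scores for one detected box
probs = softmax(logits)
print([round(p, 3) for p in probs])
print(CLASSES[probs.index(max(probs))])    # fish
```

Subtracting the maximum score does not change the result, because softmax is invariant to adding a constant to every score, but it prevents overflow for large scores.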

  • The author presented in the dataset section that two datasets contain four classes: person, fish, AUV, and other objects. So, why are seagrass, coral, and reef present in Figures 10, 11, and 12? 

Reply: We apologize for not making this clear and have added an explanation in Section 3.4. The data in the model recognition part of this paper are underwater camera images collected by a Chinese underwater vehicle during diving, including images of coral; these are the images used in Figures 10, 11, and 12. The dataset used in the model classification and evaluation part includes people, fish, autonomous underwater vehicles, and other objects.

  • The author used accuracy, precision, recall, specificity, and F1-score as evaluation metrics and visualized the experimental results for each class in a graph, as shown in Figure 13. Please report the overall performance as well.

Reply: Thank you for your advice. We have added an overall evaluation of the model at the end of Section 4.4. The overall target recognition and classification results show that the model has good automatic underwater recognition and classification performance.
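One common way to report overall performance from per-class results is macro-averaging: compute each metric per class (one-vs-rest) and take the unweighted mean. The sketch below assumes that convention; the counts are invented, and micro-averaging (pooling counts before computing the metrics) is an equally valid choice that the reply does not specify.

```python
def class_metrics(tp, fp, fn, tn):
    """One-vs-rest metrics for a single class from its confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "f1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
    }

def macro_average(per_class):
    """Unweighted mean of each metric over classes (macro-averaging)."""
    keys = next(iter(per_class.values())).keys()
    return {k: sum(m[k] for m in per_class.values()) / len(per_class) for k in keys}

# Hypothetical (tp, fp, fn, tn) counts, for illustration only.
counts = {"person": (8, 2, 2, 88), "fish": (9, 1, 1, 89)}
per_class = {name: class_metrics(*c) for name, c in counts.items()}
overall = macro_average(per_class)
print({k: round(v, 3) for k, v in overall.items()})
```

Macro-averaging gives every class equal weight, so a rare class with poor results pulls the overall score down more than it would under micro-averaging.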

  • The author evaluated the proposed method on two datasets. Could the author compare the results with other research that used these two datasets?

Reply: Thank you for your advice. In Section 4.3, we have added a comparison with studies in the literature that used the same datasets.

  • More experiments are required.

Reply: Thank you very much for your advice. We acknowledge that the performance tests of the model in this research are not comprehensive enough, and we will add more experiments on the model in future research.

  • For the detection, could the author present the mAP value?

Reply: Thank you for your advice. In the experiments, we used the F1-score instead of mAP. mAP will be added in future research.
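For reference on the mAP value the reviewer asked about, the sketch below computes average precision (AP) for one class and averages AP over classes to obtain mAP. It assumes Pascal VOC-style conventions (IoU threshold 0.5, matching in descending score order, all-point interpolation) and a single image per class; other benchmarks, such as COCO, average over several IoU thresholds. It does not reproduce any result from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(detections, gt_boxes, iou_thr=0.5):
    """All-point interpolated AP for one class.

    detections: list of (score, box); gt_boxes: list of ground-truth boxes.
    """
    if not gt_boxes:
        return 0.0
    matched = set()
    tp_flags = []
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Match each detection to its best-overlapping ground-truth box.
        overlaps = [iou(box, g) for g in gt_boxes]
        best = max(range(len(gt_boxes)), key=lambda j: overlaps[j])
        if overlaps[best] >= iou_thr and best not in matched:
            matched.add(best)
            tp_flags.append(1)
        else:
            tp_flags.append(0)  # false positive or duplicate detection
    precisions, recalls, tp = [], [], 0
    for rank, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / rank)
        recalls.append(tp / len(gt_boxes))
    ap, prev_recall = 0.0, 0.0
    for i, r in enumerate(recalls):
        # Interpolated precision: best precision at this recall level or higher.
        ap += (r - prev_recall) * max(precisions[i:])
        prev_recall = r
    return ap

def mean_average_precision(ap_per_class):
    """mAP: unweighted mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)

gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (21, 21, 31, 31))]
ap_fish = average_precision(dets, gt)
print(round(ap_fish, 4))  # 0.8333
```

Unlike a single F1-score at one confidence threshold, AP summarizes the whole precision-recall curve obtained by sweeping the detection score threshold.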

  • The discussion section is required.

Reply: Thank you for this suggestion. We have added a Discussion section. It emphasizes that the research ideas and methods in this paper can detect underwater targets, which is of great significance for diving and outdoor sports, and that an optimized algorithm structure would further improve recognition accuracy and speed.

Round 2

Reviewer 1 Report

This work has been revised as suggested by the reviewers. However, the set of competitors is still limited.

Author Response

This work has been revised as suggested by the reviewers. However, the set of competitors is still limited.

Reply: Thank you for reviewing the manuscript. After checking the article, we further revised it.

Reviewer 2 Report

After revision, the paper has improved, but in my opinion it still requires a few improvements before it can be considered for publication.

1) Some contributions are described in the corrected manuscript, but what is the advantage of applying the authors' network over, for example, training YOLO (or any other network) on the same data for the same purpose?

2) Can the authors present also results in the form of confusion matrix?

3) What are the potential practical applications and your further plans in this area? Please improve the final sentence of the conclusions.

4) What is the train/test split (in %) used in the research procedure?

5) Was cross-validation performed for the results?

6) The use of simulated data (from computer game environments) is becoming much more popular. Please comment in your work on whether such data could be used similarly in the evaluation and validation process (for example, https://doi.org/10.3390/s20174960) and in training (https://doi.org/10.3390/s20185250) to improve the final results. In such an approach, one could simulate different animated models and use them for training and validation.

Author Response

After revision, the paper has improved, but in my opinion it still requires a few improvements before it can be considered for publication.

1) Some contributions are described in the corrected manuscript, but what is the advantage of applying the authors' network over, for example, training YOLO (or any other network) on the same data for the same purpose?

Reply: Thank you for your advice. The innovation points of the article have been updated in the introduction. Compared with other networks, the advantage of the model in this paper is that it trains a ResNet34-based network on the dataset to automatically recognize and classify underwater targets, improving the automation and intelligence of underwater optical image detection.

2) Can the authors present also results in the form of confusion matrix?

Reply: Thank you for your suggestion. Given the volume of result data, we chose line charts to present the results. Your suggestion is very useful, and in future research we will consider using confusion matrices to present the results.

3) What are the potential practical applications and your further plans in this area? Please improve the final sentence of the conclusions.

Reply: Thank you for your advice. The last sentence of the conclusion has been revised: the method in this paper is expected to be applied to underwater intelligent detection, laying a foundation for the development of this field.

4) What is the train/test split (in %) used in the research procedure?

Reply: Thank you for your comments. The ratio of training to test data in this article is 3:7 (30% and 70%); this has been added to Section 3.4.

5) Was cross-validation performed for the results?

Reply: We apologize; the results of this article were not cross-validated. Cross-validation will be added in a future study.

6) The use of simulated data (from computer game environments) is becoming much more popular. Please comment in your work on whether such data could be used similarly in the evaluation and validation process (for example, https://doi.org/10.3390/s20174960) and in training (https://doi.org/10.3390/s20185250) to improve the final results. In such an approach, one could simulate different animated models and use them for training and validation.

Reply: Thank you very much for your advice. We have understood what you mentioned and decided to use more simulation data for analysis in future research.

Reviewer 4 Report

I thank the author for taking the time to address all the concerns. The revised manuscript is ready to be published in the journal.

Author Response

Thank you for your comments.

Round 3

Reviewer 2 Report

Dear Authors,

I have a feeling that my suggestions were treated as "maybe" items for future work in your potential research. My idea was to improve your current work so that your readers get as much as possible from your paper. In my opinion, a few points have still not been correctly addressed:

1. In any YOLO method, you can include additional classes with your labels. So there is a plausible hypothesis that YOLO would achieve better detection results than your proposed architecture. Without training and comparing results with any other architecture, how can you say that yours is better?

2. A confusion matrix is a clear way of demonstrating the final classification results and gives the reader a better understanding of them. If you have all the results saved in your files, what is the problem with adding such a matrix as an attachment to your manuscript?

3. OK, thank you.

4. The division of your training/test sets is highly unusual. Normally, the train:test ratio is 7:3 or 8:2, but in your research it is totally different. What is the reason for such a division? Can you cite works that split the data in this way?

5. In my opinion, without cross-validation it is difficult to say whether your results are comparable or whether you just chose the best ones.

6. This remark should be addressed in the manuscript: please discuss in your work the potential advantages of using synthetic data to improve the quality of classification, with examples.

Author Response

I have a feeling that my suggestions were treated as "maybe" items for future work in your potential research. My idea was to improve your current work so that your readers get as much as possible from your paper. In my opinion, a few points have still not been correctly addressed:

Reply: We apologize for giving you this impression. We have further revised the article according to your specific requests below.

  1. In any YOLO method, you can include additional classes with your labels. So there is a plausible hypothesis that YOLO would achieve better detection results than your proposed architecture. Without training and comparing results with any other architecture, how can you say that yours is better?

Reply: Thank you for your advice. We have revised the description of the target detection algorithm in Section 4.4. We introduced a method for recognizing unique fish behaviors based on simulated feature point selection. This method combines feature point extraction with special behavior recognition, and its accuracy in detecting eating behavior is 96.02%. The underwater classification accuracy in this paper is close to that of previous studies, which shows that the CNN model can be used for automatic image detection.

  2. A confusion matrix is a clear way of demonstrating the final classification results and gives the reader a better understanding of them. If you have all the results saved in your files, what is the problem with adding such a matrix as an attachment to your manuscript?

Reply: We apologize for not making this revision properly before. We have added the confusion matrix in Section 4.4.
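For readers less familiar with confusion matrices, the sketch below shows how one is tabulated from true and predicted labels. The labels are hypothetical, and the class names follow the four classes listed in Reviewer 4's report; it is an illustration, not the matrix reported in Section 4.4.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

CLASSES = ["person", "fish", "AUV", "other"]
# Hypothetical labels for six validation images.
y_true = ["fish", "fish", "person", "AUV", "other", "fish"]
y_pred = ["fish", "other", "person", "AUV", "other", "fish"]
cm = confusion_matrix(y_true, y_pred, CLASSES)
for name, row in zip(CLASSES, cm):
    print(f"{name:>6}", row)
# Correct predictions lie on the diagonal.
accuracy = sum(cm[i][i] for i in range(len(CLASSES))) / len(y_true)
print(round(accuracy, 3))  # 0.833
```

Off-diagonal cells show which classes are confused with each other (here, one fish predicted as "other"), information that a per-class line chart does not convey.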

  3. OK, thank you.

Reply: Thank you very much for your review and support.

  4. The division of your training/test sets is highly unusual. Normally, the train:test ratio is 7:3 or 8:2, but in your research it is totally different. What is the reason for such a division? Can you cite works that split the data in this way?

Reply: We are very sorry for our negligence. We have deleted the incorrect statement and added the correct dataset split in Section 3.4: the dataset in this paper is divided into training, validation, and test sets in the ratio 7:2:1.
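A 7:2:1 split can be produced as in the sketch below. It is an illustration, not the authors' procedure: the random shuffle, the fixed seed, and assigning rounding remainders to the test set are assumptions, and the 1,073-image count is taken from Reviewer 4's description of the conventional dataset.

```python
import random

def split_7_2_1(items, seed=0):
    """Shuffle and split items into training, validation, and test sets (7:2:1)."""
    items = list(items)
    random.Random(seed).shuffle(items)   # fixed seed for a reproducible split
    n_train = len(items) * 7 // 10       # integer arithmetic avoids float rounding
    n_val = len(items) * 2 // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]       # the remainder goes to the test set
    return train, val, test

train, val, test = split_7_2_1(range(1073))
print(len(train), len(val), len(test))  # 751 214 108
```

A stratified split, which preserves the class proportions in each subset, would be a reasonable alternative when some classes are much rarer than others.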

  5. In my opinion, without cross-validation it is difficult to say whether your results are comparable or whether you just chose the best ones.

Reply: We apologize for not explaining the cross-validation clearly. A simple cross-validation (hold-out validation) is carried out in Section 4.4: the training set is used for model training, and the classification and recognition accuracy of the network on the test set is 0.966. The confusion matrix is drawn from the classification results on the validation set.
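The reply describes a single hold-out split. For comparison with what the reviewer asked for, the sketch below generates the folds of standard k-fold cross-validation, in which every sample is used for validation exactly once and the metric is reported as the mean (and spread) over the k folds. This is a generic illustration, not a procedure reported in the paper; `k_fold_indices` is a hypothetical helper name.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation over n samples.

    Samples are assigned to folds in order; shuffle beforehand if the data are sorted.
    """
    # The first n % k folds get one extra sample so every sample is used.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size

for train_idx, val_idx in k_fold_indices(10, 3):
    print(len(train_idx), val_idx)
```

In practice, a model would be trained on each `train_idx` subset and evaluated on the matching `val_idx` subset, and the k scores summarized, which addresses the concern that a single favorable split was chosen.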

  6. This remark should be addressed in the manuscript: please discuss in your work the potential advantages of using synthetic data to improve the quality of classification, with examples.

Reply: Thank you for your advice. We have added this to the discussion, describing how the use of synthetic data could affect this study and giving an example.
