Article
Peer-Review Record

Deep Learning for Deep Waters: An Expert-in-the-Loop Machine Learning Framework for Marine Sciences

J. Mar. Sci. Eng. 2021, 9(2), 169; https://doi.org/10.3390/jmse9020169
by Igor Ryazanov, Amanda T. Nylund, Debabrota Basu, Ida-Maja Hassellöv and Alexander Schliep
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 4 December 2020 / Revised: 29 January 2021 / Accepted: 2 February 2021 / Published: 7 February 2021
(This article belongs to the Section Coastal Engineering)

Round 1

Reviewer 1 Report

The author writes an abstract covering the complete paper.

The author introduces the domain and explains the problem.

The author proposes the methodology in detail.

The author discusses the dataset and the results.

The author discusses the results and concludes the work in a comprehensive manner.

Author Response

We thank Reviewer #1 for this positive feedback. We understand that no changes were requested based on this review.

Reviewer 2 Report

The authors introduced the use of ML for Marine Sciences. The major comments are as follows:

  1. The paper spends a long time introducing ML methods (CNN, augmentation). It is better to concentrate on the main aims of the paper than to provide long definitions of ML concepts.
  2. ResNet was used in the analysis; the main question is then why it was not used as a pretrained network to remove the need for a large dataset and augmentation.
  3. Both the train and test sets were balanced by data augmentation; it is always better to balance the train set but use the original imbalanced test set to be sure the results are generalisable.
  4. Why was data augmentation used to address the small dataset size, especially as there is a need for someone to validate the images? There are self-supervised, semi-supervised, and transfer learning concepts that help with small sample sizes; augmentation is usually a solution for balancing datasets. For instance, you could use a ResNet trained on other datasets.
  5. Why was a GMM used for data augmentation?

Author Response

We are grateful for the comments posed by Reviewer #2; please find our responses in the numbered list below.

  1. We concur with the reviewer that the exposition about machine learning is more extensive than would be standard for a submission to a machine learning journal or conference. However, we expect that more readers of the publication will come from Marine Science and related areas and thus, they will appreciate the broader contextualization of the proposed method. Therefore, we kindly suggest leaving the current introduction to machine learning as it is.
  2. Pretraining a deep neural network, such as ResNet, would be one of the natural ways to address the issue of sparsely labelled data. However, to perform the knowledge transfer, a compatible source dataset is required at the beginning. Unfortunately, despite extensive surveying of the existing literature and consulting with the representatives of the companies producing acoustic scanner devices, we concluded that there is no suitable publicly available data source.
    We acknowledge that applying transfer learning is one of the most natural directions to take in case suitable datasets become available, and we already mention it in the discussion section (lines 461-462) of the originally submitted manuscript.
  3. We fully agree that for a thorough evaluation, the test set needs to resemble the original data as closely as possible in terms of class balance. Our choice to balance the test dataset is dictated by two reasons. Firstly, the data imbalance is not necessarily an inherent property of this type of data. In particular, the next planned experiment with the submerged scanner involves putting it directly below the ferry lane, which would likely drastically change the data balance. Secondly, the available dataset is so small that setting aside a significant part of it before performing augmentation would likely compromise the example generation. Once more representative experimental data is available, we hope to continue evaluating the framework and testing its generalization ability.
  4. As mentioned above (Reply No. 2), knowledge transfer would be desirable for our problem, but is, unfortunately, not possible due to the unique nature of the dataset. We also considered self-supervised and semi-supervised methods; however, as the literature suggests, their application would require at least an order of magnitude more data. For example (see https://arxiv.org/pdf/2002.08721.pdf for a review), the sparsely labelled STL-10 dataset, on which self- and semi-supervised methods are evaluated, has 150k images with 5,000 training labels. Therefore, the size of the ship wake dataset was again a bottleneck, so we focused on data augmentation. That said, the expert-in-the-loop framework does not necessarily require this specific type of data augmentation and can be just as valid with other approaches when labelled data is scarce, so other methods of handling the scarcity are worth investigating.
  5. We were striving for a simple, efficient, and easy-to-train model for the augmentation step. GMMs have been used widely and with great success for similar tasks in other settings (see the sketch below). We do not believe that the choice is critical, but we are willing to consider other options in the future if there is evidence of potential for better results.
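For illustration, a minimal sketch of what GMM-based sample generation could look like is given below. The library calls, patch dimensions, and variable names are illustrative assumptions and do not reproduce the code used in the study.

```python
# Illustrative sketch only (not the code used in the study): fit a Gaussian
# mixture to flattened image patches and draw synthetic patches from it.
import numpy as np
from sklearn.mixture import GaussianMixture


def augment_with_gmm(patches, n_new, n_components=5, seed=0):
    """Fit a GMM to flattened patches and sample n_new synthetic patches."""
    n, h, w = patches.shape
    X = patches.reshape(n, h * w)  # flatten each patch into a feature vector
    gmm = GaussianMixture(
        n_components=n_components,
        covariance_type="diag",  # diagonal covariances keep fitting cheap on few samples
        random_state=seed,
    ).fit(X)
    samples, _ = gmm.sample(n_new)  # sample() returns (samples, component labels)
    return samples.reshape(n_new, h, w)


# Hypothetical usage with placeholder data standing in for labelled patches.
real_patches = np.random.rand(40, 32, 32)
synthetic_patches = augment_with_gmm(real_patches, n_new=200)
```

A diagonal covariance structure is one way to keep the number of fitted parameters small when only a few dozen labelled patches are available; the sketch then generates synthetic patches by drawing from the fitted mixture.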

Reviewer 3 Report

The manuscript presents an interesting topic and English is acceptable. However, I have two pressing concerns:

1) Why do the authors use only one dataset for the case study?

2) Where is a comparison with similar proposals in order to illustrate the quality of the proposal here?

Moreover, I also recommend adding a conclusions section. It is very strange to finish a manuscript with a discussion. I think it is mandatory to have a proper distribution of the sections.

I wish the authors the best of luck.

Author Response

We thank Reviewer #3 for the advice to include a conclusions section, which has now been added to the manuscript. As the journal guidelines state that a conclusions section is optional, and it is standard in many journals to combine the discussion and conclusion sections, it was omitted in the first version of the manuscript. However, as authors we of course want to increase the readability of the paper, and we have therefore added a conclusions section.

 

Changes in manuscript:

A conclusions section has been added (lines 477-485). The last paragraph of the discussion section has been moved to the conclusions section (lines 475-479 in the old manuscript). The last paragraph of the discussion has been rephrased (lines 468-476).

 

The responses to the other two comments/questions are listed below.

 

  1. We interpret the first question from Reviewer 3 as asking why we have only used one dataset in the study. As stated in the introduction (lines 70-72), there is currently only one publicly available dataset with annotated ship wakes in acoustic data, which is the one used in this study. If more datasets were available, we would of course have included them. However, the aim of the current study was to present a framework for how machine learning can be used to overcome the problem of small/scarce datasets in emerging applications of acoustic measurements within marine science, where labelled datasets are not yet widely available.

 

  2. As stated in the introduction, ship-induced vertical mixing has not been well studied (lines 47-48), and this is the first machine learning approach to this particular problem. In section “2.2 Machine Learning for Acoustic Detection in Marine Science”, we provide a review of previous studies involving machine learning and acoustic detection. As stated there, only one previous study has applied deep learning to acoustic classification (lines 124-129). However, the focus of that study was the identification of fish species, and there was no lack of labelled data. Hence it is not directly comparable to our proposed framework.

Suggested addition (lines 486-487) to the manuscript clarifying that there are no comparable studies:
“Since this is the first study of its kind, the performance cannot be compared to the result of previous studies.”

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The authors answered all my questions. It is still my recommendation that it is more correct to include only real data in the test set, even if all the data was used for the augmentation.

Author Response

Dear Reviewer,

Thank you for your input. As the Academic Editor elaborated a bit further on your statement, I hope it is OK that we address the issue to both the Academic Editor and you in the same reply.

The Academic Editor added the following:
I would have to agree with R2 on the issue of the augmented data set being used in the validation phase. I do believe it is best practice to use only the 'true' data here, while it may be necessary to augment the training data to create a generalized model. If you are worried about data size, would not something like 10-fold CV be appropriate (or in general better than holding back a single subset (3.5.3), which you say is noticeably different from the training data anyway)? This would help ensure the model is indeed generalizable. Reading further, in your results, while you do not call it 10-fold CV, you do something similar with 10 repeated tests on the combined real and synthetic data, and here I still agree with the reviewer that only 'real' data should be used.
Your reported skill is likely higher when the cleaner/smoother synthetic data also appear in the validation set. If you choose not to follow R2's and my advice on this, I suggest a solid justification in the text as to why this approach was chosen over using only the 'real' data.

Authors' answer:
The problem is the small number of examples and the precarious class imbalance. Clearly, with sufficient data we would have used a different evaluation scheme based on a large, separate set of measurements. As the myriad examples of adversarial misclassifications have shown, even with ample data, deep learning models are quite brittle due to overfitting. Deep learning models have the capacity to base classification on minor details (e.g. patches of specific textures in images), so we maintain that having a larger variety of images in the testing phase, even if obtained by augmentation, is prudent with numbers as small as these. In fact, we are not convinced that the smoother images are easier to classify, should the model indeed focus on specific texture patches.
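For clarity, the fold-based alternative described by the Academic Editor, with augmentation confined to each training fold and only real measurements scored, could be sketched roughly as below. The jitter augmentation and the logistic-regression classifier are stand-ins chosen for brevity; they are not the GMM/ResNet pipeline from the manuscript.

```python
# Illustrative sketch only: 10-fold evaluation in which synthetic examples
# never enter the validation fold. Augmentation and classifier are stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold


def augment_fold(X, y, factor=3, noise=0.05, seed=0):
    """Stand-in augmentation: add noisy copies of the real training examples."""
    rng = np.random.default_rng(seed)
    X_aug = np.concatenate([X + rng.normal(0.0, noise, X.shape) for _ in range(factor)])
    return np.concatenate([X, X_aug]), np.concatenate([y, np.tile(y, factor)])


def cross_validate_real_only(X_real, y_real, k=10):
    """Augment inside each training fold; score every model on real data only."""
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    scores = []
    for train_idx, val_idx in skf.split(X_real, y_real):
        X_tr, y_tr = augment_fold(X_real[train_idx], y_real[train_idx])
        model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        scores.append(model.score(X_real[val_idx], y_real[val_idx]))
    return float(np.mean(scores))
```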

Test-time augmentation, while not mainstream, has recently been gaining popularity for this and other reasons (e.g. basing classification on a majority vote over test data augmented via geometric transformations). See some of the references below, which provide additional information on our approach. We added two of the references and some commentary to Section 3.8.
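For illustration, a majority-vote test-time augmentation step over simple geometric transforms could look roughly like the sketch below, assuming a PyTorch-style classifier; the function name and the particular choice of transforms are ours and are not taken from the cited references.

```python
# Illustrative sketch only: majority-vote test-time augmentation over simple
# geometric transforms. `model` is any callable mapping a batch of images
# (N, C, H, W) to class logits; it stands in for the trained classifier.
import torch


def predict_with_tta(model, image):
    """Classify one image by majority vote over flipped/rotated copies."""
    views = torch.stack([
        image,                                    # original
        torch.flip(image, dims=[-1]),             # horizontal flip
        torch.flip(image, dims=[-2]),             # vertical flip
        torch.rot90(image, k=2, dims=[-2, -1]),   # 180-degree rotation
    ])
    with torch.no_grad():
        votes = model(views).argmax(dim=1)        # one predicted class per view
    return int(torch.mode(votes).values)          # majority class across views
```

With an even number of views, ties are possible; using an odd number of views, or averaging the softmax outputs instead of voting, avoids this.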

Given the proof-of-concept nature of the article, and the focus on leveraging the expert in the loop more effectively than just by labelling, we feel that this would, however, be well outside the scope.

Wang, Guotai, et al. "Aleatoric uncertainty estimation with test-time augmentation for medical image segmentation with convolutional neural networks." Neurocomputing 338 (2019): 34-45.
Zheng, Qinghe, et al. "A Full Stage Data Augmentation Method in Deep Convolutional Neural Network for Natural Image Classification." Discrete Dynamics in Nature and Society 2020 (2020).
Shanmugam, Divya, et al. "When and Why Test-Time Augmentation Works." arXiv preprint arXiv:2011.11156 (2020).

 

Best regards,

Ida-Maja Hassellöv on behalf of all the co-authors
