Article
Peer-Review Record

Side-Scan Sonar Image Classification Based on Style Transfer and Pre-Trained Convolutional Neural Networks

Electronics 2021, 10(15), 1823; https://doi.org/10.3390/electronics10151823
by Qiang Ge 1, Fengxue Ruan 1, Baojun Qiao 1, Qian Zhang 2, Xianyu Zuo 1 and Lanxue Dang 1,*
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 4 June 2021 / Revised: 24 July 2021 / Accepted: 26 July 2021 / Published: 29 July 2021
(This article belongs to the Section Computer Science & Engineering)

Round 1

Reviewer 1 Report

The manuscript proposes a CNN-based method for side-scan sonar image generation and a transfer-learning-based method for automatic classification of side-scan sonar images. The experiments demonstrate the efficacy of VGG16, VGG19, and ResNet for classification. However, the improvement from using 'similar side-scan sonar images' does not appear significant: Tables 3 and 4 show that only two extra airplane samples are correctly classified. The high overall accuracy listed in Table 2 may also come mostly from the large number of seafloor and wreck samples. To prove the usefulness of the Style Transfer Network, more samples should be put into testing, and more similar images should be generated from fewer real training samples.

Some questions and typos:
1. L139: Why is the 'fourth layer' emphasized here? I don't see how Eq. 4 can imply the 4th layer.
2. Eq. 5: The 'alpha' seems to have a different style compared to the 'beta'.
3. Fig. 4: The equation 'Lcontent=summation()...' and its left 'R^L' have inconsistent notation compared to Eq. (4) and R^l defined in L125.
4. L254: asS^l --> as S^l
5. Table 1: The sample images for seafloor and wreck are missing. They should be reported as well to demonstrate how difficult the classification problem is.
6. Fig. 6: Why do the victim and aircraft have a similar size? Is it true for the side-scan sonar?
7. Table 1, 3, 4: Downing victim --> Drowning victim?
8. L351: figure 10 --> Figure 10

Author Response

1. L139: Why is the 'fourth layer' emphasized here? I don't see how Eq. 4 can imply the 4th layer.

When extracting content features with a convolutional neural network, the low-level convolutions retain more detailed information. After style transfer, the edges of the "similar side-scan sonar image" would then be sharp, which is inconsistent with the blurred edges characteristic of real side-scan sonar images, whereas the high-level features mainly capture the layout of the image. We therefore select the features of the fourth convolutional layer for style transfer, and only the content features obtained by the fourth-layer convolution are stored. Accordingly, in Equation (4), the content term represents the content features of the fourth layer; the relevant text at line 123 of the paper has been corrected.
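For reference, the content loss this response describes can be written in the formulation of Gatys et al., which the response cites; here the layer l is fixed to the fourth convolutional layer, and the symbol names (F, P, p, x) follow Gatys et al. rather than necessarily the paper's own notation:

```latex
% Content loss evaluated at a single chosen layer l (here the fourth conv layer).
% F^l_{ij}: activation of filter i at position j for the generated image x;
% P^l_{ij}: the same activation for the content image p.
L_{\mathrm{content}}(\vec{p},\vec{x},l) \;=\; \frac{1}{2}\sum_{i,j}\bigl(F^{l}_{ij}-P^{l}_{ij}\bigr)^{2}
```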

 

2. Eq. 5: The 'alpha' seems to have a different style compared to the 'beta'.

We have corrected the formatting error in Equation (5).

 

3. Fig. 4: The equation 'Lcontent = summation(...)' and its left 'R^L' have inconsistent notation compared to Eq. (4) and R^l defined in L125.

We have modified the equation in Figure 4 and corrected the formatting error.

 

4. L254: asS^l --> as S^l

We have corrected this typo in the paper.

 

5. Table 1: The sample images for seafloor and wreck are missing. They should be reported as well to demonstrate how difficult the classification problem is.

We have added images of the seafloor and wreck to Figure 6 in Section 4.2.1.

 

6. Fig. 6: Why do the victim and aircraft have a similar size? Is it true for the side-scan sonar?

The side-scan sonar images in the data set were accumulated over a long period. Each time images were collected, the height of the sonar equipment above the seabed differed, so the apparent size of the detected objects also differed: objects appear relatively large when the equipment is close to the seabed and relatively small when it is higher. In addition, for ease of display, this paper rendered the aircraft image and the drowning-victim image at the same size, which makes the aircraft and the drowning victim appear similar in size. In the data set, however, the aircraft images are larger than the drowning-victim images, and the same is true of the wreck. To avoid misunderstanding, the aircraft image in Figure 6 has been replaced.

 

7. Tables 1, 3, 4: Downing victim --> Drowning victim?

We have corrected the spelling mistakes in Tables 1, 3, and 4.

8. L351: figure 10 --> Figure 10

We have corrected the capitalization errors in the paper.

Author Response File: Author Response.pdf

Reviewer 2 Report

The manuscript is devoted to relevant topics and could make a significant contribution to the development of the studied area. However, as presented, I cannot recommend it for publication. Essential revision of both the text itself and a more detailed description of the presented solutions is required.

Here are some comments:

In expression (1), the summation limits are not fully defined, which can lead to ambiguity. Expression (2) is incorrect. In Expression (3), the letter N must be written in italics. In addition, the summation limits are also undefined. The same applies to expression (4).

On line 157, the letter N should be in italics: N. Line 158 contains an undefined letter x. Line 164 contains an undefined letter k.

Figure 2 is not very informative. The sizes of the arrays are not indicated, there are also no other specific data. Expressions and letters in Figure 4 also require correction.

In general, the text describes the results of the study in a very general way.

I ask you to radically revise the manuscript once more and only then send it again.

Author Response

1.In expression (1), the summation limits are not fully defined, which can lead to ambiguity. Expression (2) is incorrect. In Expression (3), the letter N must be written in italics. In addition, the summation limits are also undefined. The same applies to expression (4).

We have defined i, j, and k in Expression (1), corrected the error in Expression (2), and defined i and j for Expressions (3) and (4) in Section 2.1.

 

2. On line 157, the letter N should be in italics: N. Line 158 contains an undefined letter x. Line 164 contains an undefined letter k.

These have been corrected in the corresponding parts of the paper: N is now italicized, the letter x was a typo and has been changed to the multiplication sign ×, and k is now defined.

 

3. Figure 2 is not very informative. The sizes of the arrays are not indicated, there are also no other specific data. Expressions and letters in Figure 4 also require correction.

We have added detailed data to Figure 2 and corrected the expressions and letters in Figure 4.

 

4. In general, the text describes the results of the study in a very general way.

In Section 4.2.3, we have rewritten the description of the experimental results: rather than only summarizing them, we now analyze them in terms of the underlying principles. We have also added a set of visualization experiments to verify the effectiveness of the model from a visual point of view.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

L208: 'depth feature': compared to the low- and mid-level features mentioned in L201, 'depth' is just a noun without a shallow or deep meaning. According to the context, I suggest using 'deep feature' or 'high-level feature'.

L294: If the extra images are generated by the style transfer network, I think calling them 'simulated side-scan sonar images' is more common than 'similar side-scan sonar images'.

Fig. 9 caption: How about rewriting in this way: "The first 12 channel images in the first convolution layer of (a) vgg19; and (b) pre-trained vgg19." It is grammatically more clear.

L423: work to?

It seems that Tables 3 and 4 didn't mention which side is the true categories or predicted categories.

Table 4: There are 5 wreck images and 1 seafloor image classified as airplane and seafloor (or the other way around, depending on whether the row or the column represents the true categories). What do these images look like? Are they visually like airplanes and seafloor? It would be nice to display them and investigate the reason for the few prediction errors. Since style transfer simulation can help correct airplane classification, can it be used to simulate more wreck images or seafloor images to reduce the classification error to zero?

Author Response

1. L208: 'depth feature': compared to the low- and mid-level features mentioned in L201, 'depth' is just a noun without a shallow or deep meaning. According to the context, I suggest using 'deep feature' or 'high-level feature'.

We have changed 'depth feature' to 'high-level feature' at L208.

 

2. L294: If the extra images are generated by the style transfer network, I think calling them 'simulated side-scan sonar images' is more common than 'similar side-scan sonar images'.

We have modified all the 'similar side-scan sonar images' to 'simulated side-scan sonar images' in our paper.

 

3. Fig. 9 caption: How about rewriting in this way: "The first 12 channel images in the first convolution layer of (a) vgg19; and (b) pre-trained vgg19." It is grammatically more clear.

We have modified the corresponding content in our paper.

 

4. L423: work to?

We have corrected the typo in our paper.

5. It seems that Tables 3 and 4 didn't mention which side is the true categories or predicted categories.

The rows represent the true categories and the columns the predicted categories. We have clarified this in our paper.
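As an illustration of this convention, a minimal sketch (not the authors' code; the label names and sample counts are hypothetical) of a confusion matrix with true categories on rows and predicted categories on columns:

```python
# Minimal confusion-matrix sketch: rows index the true category,
# columns the predicted category, matching Tables 3 and 4.
labels = ["airplane", "seafloor", "wreck"]  # hypothetical label order
index = {name: i for i, name in enumerate(labels)}

def confusion_matrix(y_true, y_pred):
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1  # row = true class, column = predicted class
    return cm

y_true = ["wreck", "wreck", "airplane", "seafloor", "wreck"]
y_pred = ["wreck", "airplane", "airplane", "seafloor", "wreck"]
cm = confusion_matrix(y_true, y_pred)
# cm[2][0] counts wreck images misclassified as airplane.
```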

6. Table 4: There are 5 wreck images and 1 seafloor image classified as airplane and seafloor (or the other way around, depending on whether the row or the column represents the true categories). What do these images look like? Are they visually like airplanes and seafloor? It would be nice to display them and investigate the reason for the few prediction errors. Since style transfer simulation can help correct airplane classification, can it be used to simulate more wreck images or seafloor images to reduce the classification error to zero?

In our paper, we have presented the five airplane images and the one seafloor image, explained the reason for the classification errors, and further verified it by experiment. We also generated more simulated side-scan sonar images for additional experiments, but the classification error was not further reduced: because the real images and the existing simulated images already train the pre-trained CNN well, adding more simulated data cannot reduce the classification error further. The remaining misclassifications stem from the limited feature-extraction capacity of the VGG19 network; in future work we will apply the deeper ResNet to side-scan sonar image classification.

Reviewer 2 Report

The manuscript is devoted to relevant topics and could make a significant contribution to the development of the studied area. However, as presented, I still cannot recommend it for publication.

Here are some comments:

In expression (1), the summation limits are still not fully defined, which can lead to ambiguity. It is not specified how many feature maps with subscript k and how many maps with subscript n are available. Expression (2) is still incorrect.

Expressions and some letters in Figure 4 still require correction.

In the text, in many places, there are no spaces between letters and punctuation marks. For some examples see lines 52, 55, 65, 82, 230, 457, 450, 458, 462.

The bibliography was compiled carelessly and without observing editorial requirements. Each entry in the reference list has undefined characters in square brackets: [J], [M], [C]. In one case, the pages are marked with the abbreviation "pp.", while in other cases they are marked as "p." or without any abbreviation. In some positions, the year of publication is indicated with parentheses (see for example lines 767, 471), and in others without brackets at all. It is necessary to streamline and unify the description of references.

Please carefully proofread and improve the language presentation. I can recommend acceptance of the manuscript only with the highest standard in terms of technical contribution and language presentation. Otherwise, I will be forced to reject the manuscript.

Author Response

1.In expression (1), the summation limits are still not fully defined, which can lead to ambiguity. It is not specified how many feature maps with subscript k and how many maps with subscript n are available. Expression (2) is still incorrect.

For expressions (1) and (2), we follow the paper "Image style transfer using convolutional neural networks" (DOI: 10.1109/CVPR.2016.265); our formulas are expressed in the same way as in that paper.
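For reference, in the cited formulation of Gatys et al., the style representation at layer l is a Gram matrix over vectorized feature maps, and the per-layer style error compares the Gram matrices of the style and generated images (symbol names follow Gatys et al.; the paper's own indices may differ):

```latex
% Gram matrix at layer l: inner products between the vectorized feature maps
% of filters i and j, summed over spatial positions k.
G^{l}_{ij} = \sum_{k} F^{l}_{ik} F^{l}_{jk}
% Per-layer style error between the style image (Gram matrix A^l) and the
% generated image (Gram matrix G^l); layer l has N_l filters of size M_l.
E_{l} = \frac{1}{4 N_l^{2} M_l^{2}} \sum_{i,j} \bigl(G^{l}_{ij} - A^{l}_{ij}\bigr)^{2}
```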

 

2.Expressions and some letters in Figure 4 still require correction.

The letters have been corrected. The expression in the figure follows the expression in "Image style transfer using convolutional neural networks" (DOI: 10.1109/CVPR.2016.265).

 

3. In the text, in many places, there are no spaces between letters and punctuation marks. For some examples see lines 52, 55, 65, 82, 230, 457, 450, 458, 462.

We have modified the corresponding content in our paper.

 

4. The bibliography was compiled carelessly and without observing editorial requirements. Each entry in the reference list has undefined characters in square brackets: [J], [M], [C]. In one case, the pages are marked with the abbreviation "pp.", while in other cases they are marked as "p." or without any abbreviation. In some positions, the year of publication is indicated with parentheses (see for example lines 767, 471), and in others without brackets at all. It is necessary to streamline and unify the description of references.

We have unified the format of the references in our paper.

 

5. Please carefully proofread and improve the language presentation. I can recommend acceptance of the manuscript only with the highest standard in terms of technical contribution and language presentation. Otherwise, I will be forced to reject the manuscript.

We have used a professional editing service to polish the language of our paper.

Round 3

Reviewer 2 Report

Expression (2) is still not clear. What does the letter L mean? It is not defined anywhere. Shouldn't "l" change from 0 to L-1?

Expression (2) in the text and in Figure 4 do not match. In the same figure, the summation limits for the sum symbols are not visible.

Author Response

1. Expression (2) is still not clear. What does the letter L mean? It is not defined anywhere. Shouldn't "l" change from 0 to L-1?

L is the maximum layer index in the CNN, so the value of l ranges from 0 to L. We have clarified this in our paper.

 

2. Expression (2) in the text and in Figure 4 do not match. In the same figure, the summation limits for the sum symbols are not visible.

Thank you for your careful review. There are indeed errors in Figure 4: we confused "l" and "L", and we have corrected them in Figure 4. After correction, L_style is obtained by summing the products of w_l and E_l from layer 0 to layer L.
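The corrected summation can be written compactly in the notation of Gatys et al., which the authors cite (w_l are the per-layer weights and E_l the per-layer style errors):

```latex
% Total style loss: weighted sum of the per-layer style errors E_l
% over layers 0..L, with layer weights w_l.
L_{\mathrm{style}}(\vec{a},\vec{x}) \;=\; \sum_{l=0}^{L} w_{l}\, E_{l}
```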
