Peer-Review Record

Precise and Robust Ship Detection for High-Resolution SAR Imagery Based on HR-SDNet

Remote Sens. 2020, 12(1), 167; https://doi.org/10.3390/rs12010167
by Shunjun Wei, Hao Su *, Jing Ming, Chen Wang, Min Yan, Durga Kumar, Jun Shi and Xiaoling Zhang
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 7 November 2019 / Revised: 19 December 2019 / Accepted: 27 December 2019 / Published: 2 January 2020
(This article belongs to the Special Issue Pattern Recognition and Image Processing for Remote Sensing)

Round 1

Reviewer 1 Report

Comments on ‘Precise and Robust Ship Detection for High-Resolution SAR Imagery Based on HR-SDNet’

In this paper, the authors propose a ship detection method based on a high-resolution ship detection network (HR-SDNet) for high-resolution SAR imagery. The SSDD dataset and TerraSAR-X images are used for demonstration. The presented work addresses a hot research topic in remote sensing applications, but the contribution is not significant. The main reason is that the mmdetection framework is already very good at object detection and can be used directly for SAR ship detection under low and medium sea conditions. The authors should pay more attention to both inshore and offshore ship detection, especially small ships and densely packed ships in coastal ports. Although the authors state this issue, the experimental results are not sufficient for validation. Therefore, I suggest the authors add more results to this manuscript.

Language should be checked carefully, as there are many grammatical errors in the text; a native English speaker would be helpful for language editing. Some unnecessary text should be removed, such as the text between Sections 2 and 2.1. The original SAR datasets should be given in the dataset description. More importantly, inshore and offshore ship detection results should be added, especially detection results in high sea conditions, which would better validate the proposed methodology.

Author Response

Hello, we have revised the manuscript according to your comments; please see the attachment for the revised article.


Author Response File: Author Response.pdf

Reviewer 2 Report

A - Content


It is clear that this paper represents a significant amount of work. However, this effort is rather undercut by the conclusion section, which does not really discuss the results presented but rather acts as a brief summary. There are many aspects of the results that potentially warrant explanation or discussion in the conclusion. Some of these, along with other queries and suggestions, are listed below.

1. The average precision (AP) is introduced as the metric and defined as an integral over the precision as a function of the recall (a minimal sketch of this computation is given after this list). What is the uncertainty on this value? How does it compare with the level of improvement quoted?

2. By separating the results, it is clear that the performance is quite different depending upon the size of the vessels. It might be interesting to see a breakdown of the performance by other available parameters, e.g. image resolution.

3. How was the ground truth determined for the bounding boxes and therefore the bounding box error? (I realise this work may not have been done by the authors, but I am curious).

4. Although the concept and overall layout of the HRFPN network are clear, the description is quite hard to follow, and it would be tricky for others to attempt to precisely recreate it using just this paper. A block diagram of a "residual unit" might be a useful addition.

5. How robust is the training for each network? If the training is repeated with random initialisation of the trainable parameters, how much do the results vary? Larger networks ought to be less vulnerable to local minima, so I would expect a small variation. It should be possible to convince the reader of the significance of the improvements made, for example the 3.3% gain mentioned in the abstract.

6. The false positives in the TSX images are dismissed a bit too abruptly as being on land, similar to ships, and having little impact. I agree that false positives clearly on land are less of a problem, as they can be filtered with a land mask, but not all of them are: some appear to be in tidal flats or part of harbour installations. Could it also be that the truth information is incomplete? I would recommend a brief discussion of the causes of false positives and their mitigation. It is hard to tell without the full-resolution image to look at, but it seems that there might be some false negatives as well, judging by the wake signatures present, although these could be due to the limitations of SAR rather than the analysis method presented.

7. It would be useful to have the robustness study (Section 3.6) results quantified using the established criteria (AP, AP50, APS, etc.). Although Figure 11 is very valuable in demonstrating to the reader what the image modifications look like, it is hard to assess how big the improvement is over the Cascade R-CNN.

8. It is claimed in the conclusion that the bounding boxes could be more precise using the proposed network. This does not seem unreasonable as a possibility, but I cannot find direct evidence in the paper to support the claim. Given that spatial accuracy was given as one of the primary motivations in the introduction (lines 90-97) for developing a network that maintains a high-resolution representation, I would expect a quantitative statement.

9. A lot of effort has been expended to compare the proposed network with a large number of existing solutions. It would be nice to clarify the nature of the comparison a bit further. For example, how do the models compare in terms of the total number of trainable parameters? It is clear from the comparison of the three different HRFPN networks (-W18, -W32 and -W40) that the number of trainable parameters has an effect. How much of the improvement over the existing networks is down to the architecture, and how much to the number of parameters?

10. It would be useful to have an explanation for the large improvement in APM and APL versus the modest change in APS in Tables 3, 4 and 5. At the very least, it seems worthy of comment.
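
For reference on point 1 above: average precision is the area under the precision-recall curve. The following is a minimal, illustrative sketch of that computation, not the paper's evaluation code; the box-matching step that produces the true-positive flags (e.g. an IoU threshold of 0.5) is assumed to have been done already.

```python
import numpy as np

def average_precision(scores, is_true_positive, num_ground_truth):
    """Approximate AP as the area under the precision-recall curve.

    scores           : detection confidence scores (any order)
    is_true_positive : 1 if the detection matched a ground-truth box, else 0
    num_ground_truth : total number of ground-truth ships
    """
    order = np.argsort(-np.asarray(scores))              # sort by descending score
    tp = np.asarray(is_true_positive, dtype=float)[order]
    fp = 1.0 - tp

    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(fp)
    recall = cum_tp / max(num_ground_truth, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-12)

    # Discrete approximation of the integral of precision over recall,
    # prepending the (recall = 0, precision = 1) point.
    recall = np.concatenate(([0.0], recall))
    precision = np.concatenate(([1.0], precision))
    return float(np.sum(np.diff(recall) * precision[1:]))

# Toy example: 4 detections, 3 ground-truth ships.
print(average_precision([0.9, 0.8, 0.6, 0.3], [1, 0, 1, 1], 3))
```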


B - Language


There are a number of areas where the standard of English dips below the minimum threshold for publication. Sometimes it is just individual words that need replacing; elsewhere, whole clauses would benefit from careful editing.

Author Response

Hello, we have revised the manuscript according to your comments; please see the attachment for our revised article.

Author Response File: Author Response.pdf

Reviewer 3 Report

The paper presents an approach of general interest that seems to provide good results.

The structure of the paper is good and the content is rich, but the quality of the text should be significantly improved, both linguistically and in terms of content, by improving the clarity of the explanations.

Several expressions are not used correctly in the text; for example:

Line 238 and following use "proposals" instead of "candidates"; up-sampling does not increase the resolution of the data, but merely provides a finer sampling; in line 279 and following, the ships are not "dense".


The text could, for example, be made significantly clearer by eliminating all the numbers in lines 478 and following and citing Table 5 instead.


The proposed approach is validated against "multiple popular baseline ship detectors", but all of them belong to the machine learning family. It would also be interesting to see how the proposed approach compares with standard CFAR algorithms.
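
For context, "standard CFAR algorithms" refers to classical constant false alarm rate detectors that threshold each pixel against a locally estimated clutter level. The sketch below is a minimal, illustrative cell-averaging CFAR, not taken from any of the compared methods; the window sizes and threshold factor are arbitrary assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def ca_cfar(intensity, guard=4, train=12, scale=5.0):
    """Cell-averaging CFAR: flag pixels whose intensity exceeds 'scale'
    times the mean clutter level estimated from a ring of training cells
    surrounding a guard window around the cell under test."""
    big = 2 * (guard + train) + 1                    # outer window size
    small = 2 * guard + 1                            # guard window size
    sum_big = uniform_filter(intensity, big) * big ** 2
    sum_small = uniform_filter(intensity, small) * small ** 2
    clutter_mean = (sum_big - sum_small) / (big ** 2 - small ** 2)
    return intensity > scale * clutter_mean

# Toy example: one bright point target on speckle-like clutter.
img = np.random.exponential(1.0, (128, 128))
img[64, 64] = 50.0
print(ca_cfar(img).sum(), "pixels flagged")
```

The threshold factor controls the trade-off between detection probability and false alarm rate, which is the usual tuning knob of such detectors.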


It is therefore recommended that the authors thoroughly rework the paper in terms of language and clarity of explanation, since it cannot be considered suitable for publication in its present state.

Author Response

Hello, we have revised the manuscript according to your comments; please see the attachment for our revised article.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have resolved most of my concerns. I have only one minor comment at this time: some other ship detection articles should be added and discussed in the introduction, especially detection methods using superpixels, which are also effective and efficient.


Author Response

Comment 1: Some other ship detection articles should be added and discussed in the introduction, especially detection methods using superpixels, which are also effective and efficient.

Response: Thank you very much for your comment. We have added ship detection methods based on superpixels to the introduction of the manuscript, as shown below:

In addition, many ship detection methods based on superpixels have been proposed. Li et al. [16] proposed an improved superpixel-level constant false alarm rate (CFAR) detection method. He et al. [17] proposed a method for automatically detecting ships using three superpixel-level dissimilarity measures. Lin et al. [18] proposed a superpixel-level Fisher vector to describe the difference between the target and the clutter. However, these methods still struggle to accurately detect ships in both inshore and offshore scenes.

Reviewer 3 Report

Please still consider revising the English language and simplifying the very long explanations.

Author Response

Comment 1: Please still consider revising the English language and simplifying the very long explanations.

Response: Thank you for your comment. We have corrected the English grammar and simplified some of the very long explanations.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

This paper proposes a ship detection strategy for high-resolution SAR imagery based on a neural architecture that maintains high-resolution features along a cascaded CNN to improve its detection accuracy. A modified non-maximum suppression algorithm is also used to better distinguish targets that are close to each other (if this is the meaning to assign to the phrase "dense ship objects"). This proposal has its rationale, and probably its merits, but the paper in its present form does not put the matter in the proper light and does not help the reader to fully understand the structure of the net and how it works. The language used, the abundance of acronyms, the non-uniqueness of the notation and the non-uniform terminology significantly worsen the situation. I do not comment on the experimental part, as it should be preceded by a clear introduction with a reasoned and structured state of the art (which could add tutorial value), and by a clear, concise and accurate presentation of the proposed method, understandable to a generally knowledgeable reader in remote sensing.
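
For background on the non-maximum suppression step mentioned above, the following is a minimal sketch of conventional greedy NMS, not the modified variant proposed by the authors; the box format (x1, y1, x2, y2) and the 0.5 IoU threshold are illustrative assumptions.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box and
    drop any remaining box whose IoU with it exceeds the threshold."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(-np.asarray(scores))
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the current box with all remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / np.maximum(area_i + areas - inter, 1e-12)
        order = order[1:][iou <= iou_threshold]
    return keep

# Two overlapping ship candidates and one isolated one: keeps boxes 0 and 2.
print(nms([[0, 0, 10, 10], [1, 1, 11, 11], [30, 30, 40, 40]], [0.9, 0.8, 0.7]))
```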


Some specific comments:

The same symbol C is used to denote the classification score, the corresponding block in Fig. 1 and the width of the feature maps.

I do not understand the phrase "in all architectures" associated with the RPN on page 7.

The same symbol f is used in eq. (2) to denote a linear penalty and in eqs. (3)-(4) to denote a regressor.

The symbols in the "discrete case" expression in eq. (12) are not made explicit: what are k, N, and Delta-r?

I do not see how the long edge and short edge of the image can be resized to 1000 and 600, respectively "without changing the aspect ratio" (p. 11, repeated twice).

The "clipping function" definitions in eqs. (13) and (14) seem to be inverted between the "linear" and "logarithmic" changes; moreover, the notation In(x) and the parameters k and b should be explained.

What is the meaning of "without burying" at line 1 of page 17?

Reviewer 2 Report

This article addresses the problem of detecting ships in SAR images at various scales. The proposed ship detection method consists of the convolutional architecture illustrated in Fig. 1.

A major problem with this article is that its main contribution seems to be stacking together pre-existing CNN architectures and exploiting them for SAR ship detection. The authors should clearly state how their approach innovates with respect to the standard HRNet, etc.

Concerning Sec. 2 (Methods), Fig. 1 suggests that the overall network is composed of two components; the introduction to Sec. 2 (l. 152) states that the network is composed of three modules; finally, Sec. 2 is subdivided into four subsections, suggesting that the architecture is composed of four modules. This contradictory information is totally misleading for the reader: the text, the figure and the organization of Sec. 2 should all indicate the same number of components of the network architecture. Also, the network components should be presented in Sec. 2 in the order in which the information is processed. In this sense, the HRNet module used to implement the backbone that extracts the features should be described before everything else, as Sec. 2.1. In Fig. 1, please highlight the four modules using the same names used in Sec. 2.

Regarding the choice of HRNet for feature extraction, it is not clear why it was chosen over other architectures such as, e.g., ResNet or its variants. My understanding is that HRNet has some extra robustness to scale variations, but this is not clarified anywhere in the text.

Regarding the training process, it is not clear how the network is trained. Sec. 2.4 provides some insights into the loss functions but no clear explanation of how the network is trained. Is each module trained on its own, or is the entire network trained end to end as a whole? The training procedure should be described in a section separate from Sec. 2, which should describe only the network architecture. In principle, the training process should be described in enough detail to allow the reader to reproduce the same training results.

In Figs. 3a and 3b, what are the all/train/test "number" entries mentioned in the legends? Did the authors mean "set" instead?
Fig. 5 is not needed, since the IoU metric is well known to the averagely knowledgeable person in the field.

In Table 3, please mark the best result for every AP metric in bold. Do the same for the following tables.

The numerical results show that the proposed method performs better than the considered references (e.g., Fig. 8 and Table 5); however, the considered references are designed for generic object detection and are therefore not meaningful baselines. Also, what is the complexity of the proposed method with respect to the references? It seems to be much higher, which makes the comparison not only of little meaning but also unfair.
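
One way to make the complexity comparison concrete would be to report trainable-parameter counts. The sketch below shows how this could be done for PyTorch models; the toy models are placeholders, not the authors' actual networks.

```python
import torch.nn as nn

def count_trainable_parameters(model: nn.Module) -> int:
    """Total number of trainable parameters in a PyTorch model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Placeholder models standing in for, e.g., a baseline detector and HR-SDNet.
toy_baseline = nn.Sequential(nn.Conv2d(1, 16, 3), nn.ReLU(), nn.Conv2d(16, 1, 1))
toy_variant = nn.Sequential(nn.Conv2d(1, 64, 3), nn.ReLU(), nn.Conv2d(64, 1, 1))

for name, model in [("baseline", toy_baseline), ("variant", toy_variant)]:
    print(f"{name}: {count_trainable_parameters(model):,} trainable parameters")
```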

Finally, the quality of the writing is very poor. For example, at l. 38 the sentence "SAR [] has the advantage in the independent of the illuminant" presumably means "SAR has the advantage of being independent from natural light sources". At l. 41, "These SAR images have been applied.." presumably means "SAR techniques have been applied"? At l. 51, "object detection [] remains to be a challenging task" should be "object detection remains a challenging task".

For all the above reasons, I cannot recommend publication of this paper until the above issues have been addressed.
