Article
Peer-Review Record

Multi-Path Interactive Network for Aircraft Identification with Optical and SAR Images

Remote Sens. 2022, 14(16), 3922; https://doi.org/10.3390/rs14163922
by Quanwei Gao 1, Zhixi Feng 1,2, Shuyuan Yang 1,2,*, Zhihao Chang 1 and Ruyu Wang 3
Submission received: 24 June 2022 / Revised: 25 July 2022 / Accepted: 8 August 2022 / Published: 12 August 2022
(This article belongs to the Section AI Remote Sensing)

Round 1

Reviewer 1 Report

This paper proposes a data fusion method for aircraft identification with optical and SAR images: a deep Multi-path Interactive Network (MIN) that employs a multi-modal IASM in the deep network. It can accurately identify aircraft under clouds, especially thick clouds. The effectiveness of MIN is verified by detecting aircraft under clouds of different thicknesses on the FACD dataset.

This paper is well presented, and as far as I could verify, the theoretical analysis is correct. The following are some detailed comments.

1. In Figure 7, the optical image in the cloudless scene is not typical and it is recommended for replacement.

2. The size of Figure 10 is not suitable, and it is recommended to adjust it appropriately.

3. The “-” in Table 1 should be centered.

4. Do the optical feature extraction branch and the SAR feature extraction branch use pre-trained weights? It is recommended to explain this in the paper.

5. The highest resolution of Gaofen-3 satellite imagery is 1 m, but it is described as 0.5 m in the paper. Is the description in the paper incorrect?

Author Response

Dear Editors and Reviewers,

Thanks very much for your valuable comments on our paper. The manuscript has certainly benefited from these insightful revision suggestions. In this revision, we have revised the Paper: remotesensing-1809157 “Multi-path Interactive Network for Aircraft Identification with Optical and SAR images” according to the reviewers’ comments and made a point-by-point response to these comments. Below are the changes we have made in our revised paper as well as responses to your comments.

Author Response File: Author Response.pdf

Reviewer 2 Report

This manuscript reports a method to detect objects using ResNet-34 with optical and radar images. The paper develops an Interactive Attention Sum-Max fusion module (IASM) to interact the features from multi-modal images. The experiments are well designed and show better performance.

The research is well conducted and suitable to publish in Remote Sensing.

The following observations are listed for the authors to revise the actual version.

1) In the abstract and introduction, the authors mention developing a Multi-path Interactive Network (MIN). It is better to state that "The backbone of MIN is ResNet-34" at the beginning to avoid misleading the readers.

2) The paper uses Faster R-CNN as the SOTA method to compare with the developed MIN. It is better to mention related work on other SOTA object detection methods, for example: Once learning [1], YOLO [2], SSD [3], Vision Transformer [4] and Swin Transformer [5]. There is no section presenting related work.

3) The fusion methods are well described in Subsection 2.2. It would be better to introduce a temporal/momentum influence, for example the information from the image acquired before the thick cloud appeared. In that case, the identification performance of such a heuristic method would be better than using the ML method alone. (This observation is suggested for future work.)

4) The last section is very brief. For a journal paper, it is better to conclude with the main contributions, future work and other points that support the developed models.

5) The research is categorized as object detection, but it focuses only on aircraft detection in a stable scenario. In particular, the developed method is applied by a human with the prior knowledge that aircraft are present. It would be better to develop the method for more dynamic and uncertain scenarios with different objects. For example, in Fig. 5, in the thick-cloud case, if the object is a truck (not an aircraft), it may still appear as C2; how would the developed model detect this truck? Even though Section 3 shows many performance evaluations for this simple scenario, the practical value of the research is limited and needs to be improved toward more realistic settings.

 References:

[1] Weigang L, Da Silva NC, 1999. A study of parallel neural networks. Int Joint Conf on Neural Networks, p.1113-1116. https://doi.org/10.1109/IJCNN.1999.831112

[2] Redmon J, Divvala S, Girshick R, et al., 2016. You only look once: unified, real-time object detection. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.779-788.

[3] Liu W, Anguelov D, Erhan D, et al., 2016. SSD: single shot multibox detector. 14th European Conf on Computer Vision, p.21-37.

[4] Dosovitskiy A, Beyer L, Kolesnikov A, et al., 2020. An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.

[5] Liu Z, Lin Y, Cao Y, et al., 2021. Swin transformer: hierarchical vision transformer using shifted windows. Proc IEEE/CVF Int Conf on Computer Vision, p.10012-10022.

Author Response

Dear Editors and Reviewers,

Thanks very much for your valuable comments on our paper. The manuscript has certainly benefited from these insightful revision suggestions. In this revision, we have revised the Paper: remotesensing-1809157 “Multi-path Interactive Network for Aircraft Identification with Optical and SAR images” according to the reviewers’ comments and made a point-by-point response to these comments. Below are the changes we have made in our revised paper as well as responses to your comments.

Author Response File: Author Response.pdf

Reviewer 3 Report

The topic of aircraft detection in a cloudy environment by fusing the features of optical and SAR images is very relevant. However, I have the following comments to offer.

1. I do not see much difference between Figs. 8b and 8d. IASM fusion gives almost the same results as the optical images alone.

2. Even in Fig. 11, not much visual difference can be observed among methods (b) to (h).

3. However, Fig. 12 does show the effectiveness of the IASM method.

4. All abbreviations in the paper must be expanded where they first appear.

5. On lines 302 and 303, Figure numbers 10 and 11 should be interchanged.

6. It would be appreciated if the authors could provide a comparison of the different algorithms with their proposed one in terms of time taken/number of operations.

Author Response

Dear Editors and Reviewers,

Thanks very much for your valuable comments on our paper. The manuscript has certainly benefited from these insightful revision suggestions. In this revision, we have revised the Paper: remotesensing-1809157 “Multi-path Interactive Network for Aircraft Identification with Optical and SAR images” according to the reviewers’ comments and made a point-by-point response to these comments. Below are the changes we have made in our revised paper as well as responses to your comments.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The manuscript was well revised and responds to all the comments of the reviewer. Even though the paper is not perfect, it is ready for publication at this stage of the research.

There is a sentence in the introduction that needs to be improved. Original: With the rapid development of object detection in computer vision, such as Once learning [2], YOLO [3], SSD [4], Vision Transformer [5] and SWIN Transformer [6].

It should be: With the rapid development of object detection in computer vision, such as Once learning [2], You Only Look Once  (YOLO) [3], Single Shot Detector (SSD) [4], Vision Transformer [5] and SWIN Transformer [6].

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

