Article
Peer-Review Record

SuperDet: An Efficient Single-Shot Network for Vehicle Detection in Remote Sensing Images

Electronics 2023, 12(6), 1312; https://doi.org/10.3390/electronics12061312
by Moran Ju 1,*, Buniu Niu 1, Sinian Jin 1 and Zhaoming Liu 2
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 9 February 2023 / Revised: 3 March 2023 / Accepted: 7 March 2023 / Published: 9 March 2023

Round 1

Reviewer 1 Report

The paper proposes an efficient single-shot network for vehicle detection in remote sensing images.

 

1.     The paper is well presented, though the entire manuscript has to be proofread to correct English usage and grammatical errors.

2.     Recent literature is not included in the survey. There are many works on vehicle detection in remote sensing images which are not cited. [25] to [33] mostly refer to works from 2017 to 2019.

3.     The notation used needs to be consistent. Every symbol used needs to be explained.

4.     Visual comparison results of state-of-the-art methods need to be included.

5.     Although comparisons with Faster RCNN, Faster RER-CNN, YOLO V2, YOLO V3, SSD, CorrNet and DAGN are included, comparisons with recent state-of-the-art methods are not included. The authors should include qualitative and comparative results against recent ones.

[1] Tan, Qulin, Juan Ling, Jun Hu, Xiaochun Qin, and Jiping Hu. "Vehicle detection in high resolution satellite remote sensing images based on deep learning." IEEE Access 8 (2020): 153394-153402.

[2] Zhang, Zhongyu, Yunpeng Liu, Tianci Liu, Zhiyuan Lin, and Sikui Wang. "DAGN: A real-time UAV remote sensing image vehicle detection framework." IEEE Geoscience and Remote Sensing Letters 17, no. 11 (2019): 1884-1888.

[3] Ghasemi Darehnaei, Zeinab, Mohammad Shokouhifar, Hossein Yazdanjouei, and Seyed Mohammad Jalal Rastegar Fatemi. "SI‐EDTL: swarm intelligence ensemble deep transfer learning for multiple vehicle detection in UAV images." Concurrency and Computation: Practice and Experience 34, no. 5 (2022): e6726.

[4] Wang, Yi, Syed Muhammad Arsalan Bashir, Mahrukh Khan, Qudrat Ullah, Rui Wang, Yilin Song, Zhe Guo, and Yilong Niu. "Remote sensing image super-resolution and object detection: Benchmark and state of the art." Expert Systems with Applications (2022): 116793.

6.     The computational complexity of the method needs to be compared with the existing ones, to substantiate its use for real time applications.

7.     With relevant theoretical proof, substantiate the credibility of ablation study results presented in TABLE 1.

8.     Accuracy and loss curves need to be plotted and discussed.

9.     The authors have presented results on only 2 image inputs. Include more (at least 2 more) to summarise the study result.

10.  The conclusion section needs to be refined. Report the accuracy obtained by the proposed method, the reasons for missed detections, the limitations, and the future work plan.

 

Author Response

Point 1: The paper is well presented, though the entire manuscript has to be proofread to correct English usage and grammatical errors.

 Response 1: We have checked and modified the paper to correct English usage and grammatical errors.

 

Point 2: Recent literature is not included in the survey. There are many works on vehicle detection in remote sensing images which are not cited. [25] to [33] mostly refer to works from 2017 to 2019.

Response 2: Thank you for your question and advice. Following your advice, we have added some recent works to the related work section. Furthermore, we have added a comparison of the advantages and disadvantages of different detectors in Table 1.

 

Point 3: Although comparisons with Faster RCNN, Faster RER-CNN, YOLO V2, YOLO V3, SSD, CorrNet and DAGN are included, comparisons with recent state-of-the-art methods are not included. The authors should include qualitative and comparative results against recent ones.

Response 3: Thank you for your question and advice. Although many object detectors have been proposed recently, they are not specifically designed for vehicle detection in remote sensing images and, more importantly, most of them have no open-source code. To verify the performance, we compare against the recent methods whose source code is available. Note that Faster RER-CNN, CorrNet and DAGN are all designed specifically for vehicle detection in remote sensing images. Our paper aims to combine a super-resolution method with a detection method for vehicle detection. We hope this can offer some new ideas for vehicle detection in remote sensing images.

Point 4: Visual comparison results of state-of-the-art methods need to be included.

Response 4: Thank you for your advice. We have added qualitative results of DAGN and ARFFDet in Figure 5 and Figure 6. Compared with the ground truth, it is clear that DAGN and ARFFDet miss some vehicles and assign wrong categories to others. SuperDet can effectively alleviate these problems by recovering the details of the vehicle from its low-resolution image.

 

Point 5:  With relevant theoretical proof, substantiate the credibility of ablation study results presented in TABLE 1.

Response 5: Thank you for your advice. We have added the relevant theoretical proof and explanation to the ablation study.

Point 6:  The computational complexity of the method needs to be compared with the existing ones, to substantiate its use for real time applications. Loss curves need to be plotted and discussed.

Response 6: Thank you for your question and advice. SuperDet improves detection performance by incorporating a super-resolution reconstruction method. To compare the computational cost and running time of each method, we use FPS as the measure of efficiency. Because we introduced the SRM module, we optimized the speed by simplifying the VDM. SuperDet is suitable for real-time detection on a Titan X GPU. We are now exploring ways to further simplify the whole model. We have added the loss curve to the experiment section.
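For readers who want to reproduce an FPS figure of the kind discussed in this response, the following is a minimal timing sketch. It is not the authors' benchmarking code: `measure_fps`, `dummy_detector`, the warm-up count, and the 2 ms sleep are all hypothetical stand-ins chosen for illustration.

```python
import time


def measure_fps(detect, images, warmup=2):
    """Return the frames per second achieved by `detect` over `images`.

    `detect` is any callable that takes one image. A few warm-up calls
    are made first and are excluded from the timed loop.
    """
    for image in images[:warmup]:
        detect(image)

    start = time.perf_counter()
    for image in images:
        detect(image)
    elapsed = time.perf_counter() - start
    return len(images) / elapsed


def dummy_detector(image):
    # Hypothetical stand-in for a real detector: sleeps about 2 ms per image.
    time.sleep(0.002)
    return []


if __name__ == "__main__":
    frames = [None] * 20  # placeholder inputs; a real test would load images
    print(f"{measure_fps(dummy_detector, frames):.1f} FPS")
```

When the detector runs on a GPU, execution is usually asynchronous, so a real measurement would need to synchronize the device before each clock reading; that step is omitted here because the sketch does not assume any particular framework.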

 

Point 7: The conclusion section needs to be refined. Report the accuracy obtained by the proposed method, the reasons for missed detections, the limitations, and the future work plan.

Response 7: Thank you for your advice. We have refined the conclusion section.

Author Response File: Author Response.pdf

Reviewer 2 Report

I have only minor comments on this paper:

General:
- I would be consistent with using acronyms. For example: Convolutional Neural Network and CNN.
- I would also be consistent with the use of Upper Case and lower case, because it is used unevenly in the paper.

Page 5: please correct "inconspicuous and small, ,"


Section IV Experiments. I would add a report on the computing time spent for each of the proposed techniques or an analysis of the computing complexity.

Author Response

Point 1: I would be consistent with using acronyms. For example: Convolutional Neural Network and CNN. I would also be consistent with the use of Upper Case and lower case, because it is used unevenly in the paper.

 Response 1: Thank you for your advice. We have checked and revised the paper so that acronyms and upper and lower case are used consistently.

 

Point 2: Page 5: please correct "inconspicuous and small, ,"

Response 2: Thank you for your advice. We have corrected the mistake.

 

Point 3: Section IV Experiments. I would add a report on the computing time spent for each of the proposed techniques or an analysis of the computing complexity.

Response 3: Thank you for your advice. To show the computing time spent for each of the proposed techniques, we added a paragraph to the ablation study that reports the computing time in terms of FPS.

Author Response File: Author Response.pdf

Reviewer 3 Report

This paper presents SuperDet, a single-shot detector that mainly consists of two modules, SRM and VDM, to detect vehicles in remote sensing applications.

Strength: The highlight of this paper is a new network that is suitable for real-time vehicle detection.

Weakness: The novelty of this work is not clear. The paper contains mostly implementation details. For instance, SRM is one of the core ideas of the paper. It acts as a feature enhancer to help in detection. However, it is a well-established concept and has been used in object detection in aerial images. A comparison with existing SRMs is missing, such as how the authors' SRM is better than existing ones. Similarly, VDM is also based on an existing network. One idea could be to write further details about the context expansion module. Moreover, the experiment section can be improved by adding further evaluations.

Comments:

·         Section 2.1 contains mostly a general description of existing detectors which are not designed for remote sensing applications. It would be better to focus only on the detectors which are designed specifically for remote sensing applications.

·         Section 2.2: it would be better to include a table which contains a brief overview of all related detectors, such as a comparison in terms of advantages and disadvantages

·         Comparison of proposed detector with the existing networks is missing in related work, such as advantages or disadvantages etc.

·         Section 3.1 is empty. It would be better to bring the Figure 1 and corresponding text down in this section, as they belong to the “architecture” section not to “Introduction”

·         VEDAI dataset: SuperDet is marginally better than DAGN. It would be better to include qualitative results of DAGN and Faster RER-CNN as well in Figure 5 (similar comment for Figure 6)

·         Table 2: not a fair comparison with detectors which are not designed for vehicle detection from aerial imagery

·         Table 2 and Table 4: should input be 512*512 rather than just 512?

·         Table 4: SuperDet is marginally better in mAP but ARFFDet has 36% more FPS. Please include a further comparison which clearly shows that your network is better than existing ones.

Author Response

Point 1: Section 2.1 contains mostly a general description of existing detectors which are not designed for remote sensing applications. It would be better to focus only on the detectors which are designed specifically for remote sensing applications.

 Response 1: Thank you for your question and advice. We first describe the general existing detectors because most of the detectors designed for remote sensing applications are improved versions of them. We have added an explanation of why we introduce these general detectors. Furthermore, to focus more on the detectors designed for remote sensing applications, we have added a comparison of these detectors as you advised.

Point 2: Section 2.2: it would be better to include a table which contains a brief overview of all related detectors, such as a comparison in terms of advantages and disadvantages. Comparison of proposed detector with the existing networks is missing in related work, such as advantages or disadvantages etc.

Response 2: Thank you for your advice. We have added a table comparing the advantages and disadvantages of the proposed detector with those of the existing networks.

Point 3:  Section 3.1 is empty. It would be better to bring the Figure 1 and corresponding text down in this section, as they belong to the “architecture” section not to “Introduction”.

Response 3: Thank you for your advice. We have moved Figure 1 to the “architecture” section.

Point 4:   VEDAI dataset: SuperDet is marginally better than DAGN. It would be better to include qualitative results of DAGN and Faster RER-CNN as well in Figure 5 (similar comment for Figure 6).

Response 4: Thank you for your advice. We have added qualitative results of DAGN and ARFFDet in Figure 5 and Figure 6. Compared with the ground truth, it is clear that DAGN and ARFFDet miss some vehicles and assign wrong categories to others. SuperDet can effectively alleviate these problems by recovering the details of the vehicle from its low-resolution image.

Point 5:  Table 2: not a fair comparison with detectors which are not designed for vehicle detection from aerial imagery. Table 2 and Table 4: should input be 512*512 rather than just 512?

Response 5: Thank you for your question and advice. Most of the detectors designed for remote sensing applications are improved versions of these general detectors, so we included the general detectors in the comparison. In addition, we added comparative experiments with the vehicle detectors that have open-source code. We have changed the input size in Table 2 and Table 4 to 512*512.

Point 6:  Table 4: SuperDet is marginally better in mAP but ARFFDet has 36% more FPS. Please include a further comparison which clearly shows that your network is better than existing ones.

Response 6: Thank you for your question and advice. We have added an explanation to the paper. Although ARFFDet is faster than SuperDet, it does not address the low resolution and inconspicuous features of the vehicle targets, which affects its detection accuracy. We designed SRM to recover the details of the vehicle from its low-resolution image, and our proposed detector achieves a better speed-to-accuracy tradeoff. SuperDet is suitable for real-time vehicle detection in remote sensing images.
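As a reading aid for the speed figure quoted in this exchange, the relation below converts an FPS advantage into per-image time. It assumes that "36% more FPS" means the FPS of ARFFDet is 1.36 times that of SuperDet; the symbols t_A and t_S (per-image time of ARFFDet and SuperDet) are introduced here for illustration and do not appear in the record.

```latex
t = \frac{1}{\mathrm{FPS}}, \qquad
\frac{t_A}{t_S} = \frac{\mathrm{FPS}_S}{\mathrm{FPS}_A} = \frac{1}{1.36} \approx 0.735
```

Under that assumption, ARFFDet would need roughly 26.5% less time per image than SuperDet, which is a smaller relative gap than the 36% FPS figure might suggest at first glance.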

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have addressed all the suggestions. The manuscript may be considered for publication in its current form after proofreading.

Author Response

Thank you for agreeing to publish our paper.

Reviewer 3 Report

The authors have made some improvements. However, novelty of their work is still not clear. The paper contains mostly implementation details.  

Further comments:

·         Paragraph in Section 3 along with Figure 2 should be moved into Section 3.1. Section 3 should have an overview of  the proposed method, for instance Table 1 can be used as a reference.

·         Figure 6 and 7:

o    It would be better to include a title (e.g., ground truth) on vertical axis of each row

o    It would be more helpful to understand if authors add some extra detail on each image, for instance: detected: X, False detected: Y

 

·         Line 340-345: Accuracy of SuperDet is just slightly better than ARFFDet but the latter is 36% faster  

Author Response

Point 1:  Paragraph in Section 3 along with Figure 2 should be moved into Section 3.1. Section 3 should have an overview of  the proposed method, for instance Table 1 can be used as a reference.

 Response 1: Thank you for your advice. We have moved the paragraph in Section 3 and Figure 2 into Section 3.1. Furthermore, we have added an overview of the proposed method.

Point 2: Figure 6 and 7: It would be better to include a title (e.g., ground truth) on the vertical axis of each row. It would be more helpful to understand if the authors add some extra detail on each image, for instance: detected: X, False detected: Y.

Response 2: Thank you for your advice. We have added a title on the vertical axis of each row, and all false detections and missed detections are now marked with red boxes.
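The per-image counts the reviewer asks for ("detected: X, False detected: Y") can be derived by matching predicted boxes to ground-truth boxes. The sketch below shows one common way to do this with greedy IoU matching; it is not taken from the paper, and the function names, the (x1, y1, x2, y2) box format, and the 0.5 IoU threshold are assumptions made for illustration. A detection with the wrong category is counted as a false detection, and the vehicle it overlaps is counted as missed.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_detections(predictions, ground_truth, iou_threshold=0.5):
    """Count correct, false, and missed detections for one image.

    predictions: list of (box, label, score)
    ground_truth: list of (box, label)
    """
    matched = set()  # indices of ground-truth boxes already claimed
    detected = 0
    false_detected = 0
    # Higher-scoring predictions get first pick of the ground-truth boxes.
    for box, label, _score in sorted(predictions, key=lambda p: p[2], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, (gt_box, gt_label) in enumerate(ground_truth):
            if idx in matched or gt_label != label:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            detected += 1
        else:
            false_detected += 1
    missed = len(ground_truth) - len(matched)
    return {"detected": detected, "false_detected": false_detected, "missed": missed}


if __name__ == "__main__":
    # Made-up boxes, purely to exercise the function.
    gt = [((0, 0, 10, 10), "car"), ((20, 20, 30, 30), "truck")]
    preds = [
        ((1, 1, 10, 10), "car", 0.9),    # overlaps the car
        ((20, 20, 30, 30), "car", 0.8),  # right place, wrong category
        ((50, 50, 60, 60), "car", 0.7),  # no vehicle here
    ]
    print(count_detections(preds, gt))
```

The returned dictionary could be rendered as a caption on each qualitative result, complementing the red boxes mentioned in the response.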

Point 3:   Line 340-345: Accuracy of SuperDet is just slightly better than ARFFDet but the latter is 36% faster.

Response 3: Thank you for your advice. This paper combines a super-resolution algorithm with a deep convolutional neural network (DCNN) based object detector, aiming to provide new ideas for vehicle and small-target detection in remote sensing images. Although ARFFDet is faster than SuperDet, it does not address the low resolution and inconspicuous features of the vehicle targets. We are now exploring ways to further simplify the whole model.

 

Author Response File: Author Response.pdf

Round 3

Reviewer 3 Report

The authors have addressed all the concerns.
