Open Access Article

Peer-Review Record

Spiral Search Grasshopper Features Selection with VGG19-ResNet50 for Remote Sensing Object Detection

Remote Sens. 2022, 14(21), 5398; https://doi.org/10.3390/rs14215398

by Andrzej Stateczny¹,*, Goru Uday Kiran², Garikapati Bindu³, Kanegonda Ravi Chythanya⁴ and Kondru Ayyappa Swamy⁵

Reviewer 1: Rei Sonobe

Reviewer 2: Rishabh Das

Reviewer 3: Anonymous

Submission received: 12 September 2022 / Revised: 23 October 2022 / Accepted: 25 October 2022 / Published: 27 October 2022

Round 1

Reviewer 1 Report

The authors evaluated the proposed method (the Spiral Search Grasshopper Optimization technique) for object detection using the DOTA and DIOR datasets. The results show that the proposed method increased exploitation, selected unique features, and was effective in overcoming the data imbalance and overfitting problems.

The paper is on an interesting topic and the results are useful for the RS community. I list here below some comments to improve the presentation.

3. Proposed Method

LL.158-159

Why did you choose VGG19 and Res-Net50 architectures?

I agree that they are popular, but if you paid attention to particular features of them, please describe them.

4. Simulation Setup

L.309 Parameter settings

Why did you choose this setting? If you referred to any literature, please provide references.

5. Results

DOTA and DIOR have some categories.

Could you provide more details? Which categories were easy or difficult to detect for each architecture?

The discussion aspect is weak and needs to be improved.

Is adding other architectures effective to improve the proposed method?

Author Response

Reviewer 1:

The paper is on an interesting topic and the results are useful for the RS community. I list here below some comments to improve the presentation.

Proposed Method

LL.158-159: Why did you choose VGG19 and Res-Net50 architectures? I agree that they are popular, but if you paid attention to particular features of them, please describe them.

Answer:

Thank you for your useful comments.

Both VGG-19 and ResNet can be trained up to the fully connected layers and still achieve compelling performance. Owing to their powerful representational ability, they have boosted the performance of many computer vision applications other than image classification, such as object detection.

The ResNet architecture does not lose much performance when layers are removed: it has many independent effective paths, and the majority of them remain intact after removing a couple of layers. In contrast, the VGG-19 network has only one effective path, so removing a single layer compromises that only path. VGG-19 is also a huge network, which means that it takes more time to train its parameters. As a result, as the network goes deeper, its performance saturates or even starts degrading rapidly. Therefore, the plain data from VGG-19 and the residual data from ResNet are collected together. From the collected data, matched features are selected for further processing. In order to select more informative features from the dataset, both the VGG and ResNet models are implemented in this research.

The same comment was also received from another reviewer. Accordingly, the following models have been implemented, and their results are already reported in Tables 1 and 3 in Section 5. Further experimentation with them will be carried out in the future:

Fast R-CNN
YOLO
EfficientDet

Simulation Setup

L.309 Parameter settings: Why did you choose this setting? If you referred to any literature, please provide references.

Answer:

Thank you for your important comments. The parameters used in this research are common (base) specifications taken from references [46] and [47]. Regarding the hyperparameters, the proposed model was tested with three constant learning rates: 0.1, 0.01, and 0.001. In that analysis, 0.1 was too large and 0.001 was too small; therefore, 0.01, which tended to work better for the tested networks, is used in this research.

Cited References

[46] Nakamura, Kensuke, Bilel Derbel, Kyoung-Jae Won, and Byung-Woo Hong. "Learning-Rate Annealing Methods for Deep Neural Networks." Electronics 10, no. 16 (2021): 2029.

[47] Xu, Zhangze, Wenyong Gui, Ali Asghar Heidari, Guoxi Liang, Huiling Chen, Chengwen Wu, Hamza Turabieh, and Majdi Mafarja. "Spiral motion mode embedded grasshopper optimization algorithm: design and analysis." IEEE Access 9 (2021): 71104-71132.
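The learning-rate comparison described in the answer above (0.1, 0.01, and 0.001) can be illustrated with a minimal sketch. This is not the authors' experiment: the quadratic toy loss, its curvature, the step count, and the function names `final_loss` and `sweep` are all assumptions chosen only to show how a sweep over constant learning rates is usually run and compared.

```python
def final_loss(learning_rate, curvature=25.0, steps=50, w0=1.0):
    """Run plain gradient descent on the toy loss 0.5 * curvature * w**2.

    The loss and its curvature are hypothetical stand-ins for a real
    training objective; they are not taken from the paper.
    """
    w = w0
    for _ in range(steps):
        grad = curvature * w          # derivative of the toy loss
        w -= learning_rate * grad     # constant learning rate, no schedule
    return 0.5 * curvature * w * w


def sweep(learning_rates):
    """Return (best_rate, {rate: final_loss}) for the candidate rates."""
    losses = {lr: final_loss(lr) for lr in learning_rates}
    best = min(losses, key=losses.get)
    return best, losses


best_rate, losses = sweep([0.1, 0.01, 0.001])
for lr, loss in losses.items():
    print(f"lr={lr}: final loss {loss:.3e}")
print("selected:", best_rate)
```

On this constructed toy problem the largest rate overshoots and diverges while the smallest converges slowly, which is the general pattern the answer describes. On a real network the comparison would be made on validation metrics rather than on a closed-form loss.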

Results: DOTA and DIOR have some categories. Could you provide more details? Which categories were easy or difficult to detect for each architecture?

Answer:

Thank you for your useful comments.

DOTA Dataset:

Fifteen categories are chosen and annotated in the DOTA dataset: plane, ship, storage tank, baseball diamond, tennis court, basketball court, ground track field, harbor, bridge, large vehicle, small vehicle, helicopter, roundabout, soccer ball field, and swimming pool. The categories were selected by experts in aerial image interpretation according to whether a kind of object is common and its value for real-world applications.

The first 10 categories in the DOTA dataset are common. The remaining 10 categories are difficult to detect for each architecture when split into large ones (harbor, bridge, large vehicle, helicopter, roundabout) and small ones (tennis court, basketball court, ground track field, soccer ball field), because there is an obvious difference between these two sub-categories in aerial images.

DIOR Dataset:

DIOR is one of the largest and most diverse open-source datasets in remote sensing image object detection. The dataset contains 23,463 images covering 20 common categories: aircraft, airport, baseball field, basketball court, bridge, chimney, dam, expressway service area, expressway toll station, harbour, ship, golf course, ground track field, overpass, stadium, storage tank, tennis court, train station, vehicle, and windmill. The training set contains 5862 images, the validation set contains 5863 images, and the remaining 11,738 images are used as the testing set.

In the DIOR dataset, some images (e.g., harbour, ship, stadium, train station, expressway toll station) may contain noise, and some targets in motion may be blurred. The dataset also has a high degree of inter-class similarity and intra-class diversity. These characteristics greatly increase the difficulty of detection.

The discussion aspect is weak and needs to be improved.

Answer:

Thank you for your valuable comments. We have improved the result analysis by adding the discussion at section 5.4.

The SSG model is used in this study to select the features more effectively and enhance object detection. The SSG keeps the balance between exploration and exploitation, which aids in the selection of the relevant features and helps to increase the exploitation of the feature selection. For better object representation in the images, the VGG-19 and ResNet50 models extract the features. The SSG technique improves object classification performance by choosing the relevant features from the extracted features. According to the result analysis, the SSG model achieves 78.42% and 82.45% mAP on the DIOR and DOTA datasets, respectively. The proposed SSG model performs better than the existing LO-Det [22], CF2PN [23], AOPG [24], and SLA [29] models.

The above statement is updated at section 5.4.

Is adding other architectures effective to improve the proposed method?

Answer:

Thank you for your useful comments. Analysis of other architectures may improve the performance; this is considered as future work of this research study.

The future work of this model involves applying new deep learning architectures such as DenseNet, ResNet101, and SqueezeNet for feature extraction and selection to analyze the classification performance.

The above statement is updated at conclusion.

Reviewer 2 Report

This research proposes the Spiral Search Grasshopper (SSG) Optimization technique, which increases exploitation and selects unique features for object detection to help overcome imbalanced data and overfitting problems. Overall, the manuscript is well written and the authors have done a good job explaining the methodology. The authors should work on the following comments to improve the manuscript:

In the literature review section, for each cited work the authors should discuss how the current research solves the open research problem.
In the literature review section, the authors do a good job discussing the research landscape. The authors should add a paragraph that summarizes and concludes the discussion in the literature review.
In section 3, the authors should discuss how and why the hyperparameters were chosen.
It is unclear why mAP was chosen as the performance metric.

Author Response

Reviewer 2:

In the literature review section, for each cited work the authors should discuss how the current research solves the open research problem.

Answer:

Thank you for your valuable comments. We have updated the literature review section and provided the research solution at the end.

In the literature review section, the authors do a good job discussing the research landscape. The authors should add a paragraph that summarizes and concludes the discussion in the literature review.

Answer:

Thank you for your important comments. We have summarized the overall findings of the literature at the end of section 2.

According to the literature, several deep learning algorithms have been used for object detection and have produced notable detection performance. Significant effort has gone into developing techniques for detecting objects in remote sensing images. However, many techniques are less effective at finding small objects in remote sensing images. In many studies on object detection using scene classification, the small-object categories face greater restrictions than the large-object categories. The efficiency of small-object detection in remote sensing images is also negatively affected by the lack of diversity in many methods. Additionally, overfitting and data imbalance affect the model's performance. To address these concerns, the SSG approach is used to increase feature exploitation, which supports the detection of small objects and reduces overfitting.

The above statement is updated at section 2.

In section 3, the authors should discuss how and why the hyperparameters were chosen.

Answer:

Thank you for your useful comments. The discussion about the hyperparameters is presented at section 3.3.

Regarding the hyperparameters, the proposed model was tested with three constant learning rates: 0.1, 0.01, and 0.001. In that analysis, 0.1 was too large and 0.001 was too small; therefore, 0.01, which tended to work better for the tested networks, is used in this research [46], [47].

Cited References:

[46] Nakamura, Kensuke, Bilel Derbel, Kyoung-Jae Won, and Byung-Woo Hong. "Learning-Rate Annealing Methods for Deep Neural Networks." Electronics 10, no. 16 (2021): 2029.

It is unclear why mAP was chosen as the performance metric.

Answer:

Thank you for your useful comments.

Mean Average Precision (mAP) is used as a standard metric to analyze the accuracy of an object detection model. It is used to evaluate object detection models such as Fast R-CNN, YOLO, and Mask R-CNN, including those built on VGG19 and Res-Net50. For each class, the Average Precision (AP) is calculated over recall values from 0 to 1, and mAP is the mean of the AP values over all classes. It measures how close the detections are to their true class and position. The higher the score, the more accurate the model is in its detections.
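As an illustration of the metric described in this answer, the following is a minimal sketch of one common way to compute AP and mAP (all-point interpolation of the precision-recall curve). It is not the authors' evaluation code. It assumes that each detection has already been matched to the ground truth (for example by an IoU threshold), so the inputs are only confidence scores and true-positive flags, and that every class has at least one ground-truth object; the function names are hypothetical.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the interpolated precision-recall curve for one class.

    Assumes detections were already matched to ground truth, and that
    num_ground_truth > 0.
    """
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Interpolation: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class maps class name -> (scores, is_true_positive, num_ground_truth)."""
    aps = {name: average_precision(*args) for name, args in per_class.items()}
    return sum(aps.values()) / len(aps), aps


# Made-up detections for two classes, for illustration only.
example = {
    "ship": ([0.9, 0.8, 0.7], [True, False, True], 2),
    "plane": ([0.9], [True], 1),
}
map_value, per_class_ap = mean_average_precision(example)
print(per_class_ap, map_value)
```

Benchmarks differ in the interpolation scheme and in the IoU threshold used for matching, so values computed this way are only comparable with results produced under the same protocol.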

Reviewer 3 Report

Comments on the article “Spiral Search Grasshopper Features Selection with VGG19-ResNet50 for Remote Sensing Object Detection” by Stateczny et al. are listed below.

Original research manuscripts, which should comprise at least 18 pages (https://www.mdpi.com/journal/remotesensing/instructions).

Related works are too long to catch.

Where do you get the dataset? Could you provide the address of the dataset?

Why do you select two feature extraction models (VGG and ResNet)? Is it enough if we only select one feature extraction method? Shall the features be similar or repeat?

What’s the main difference between the SSG which you proposed and the Saremi et al. [39]?

Although the results are better than other models, more comparisons are still needed, such as with YOLO, EfficientDet, and Faster R-CNN.

Show more details on the images for the readers; the tables alone are not enough.

There are no precision and recall results although the authors have given the equations.

Author Response

Reviewer 3:

Original research manuscripts, which should comprise at least 18 pages (https://www.mdpi.com/journal/remotesensing/instructions).

Answer:

Thank you for your valuable comments. We have prepared this research paper with 18 pages.

Related works are too long to catch.

Answer:

Thank you for your useful comments. To give readers a clear understanding of this work, we have provided a lengthy literature review.

For the readers' understanding, we have summarized the overall findings from the existing literature at the end of section 2.

The above statement is updated at section 2.

Where do you get the dataset? Could you provide the address of the dataset?

Answer:

Thank you for your important comments. We have provided the address of the specified datasets.

DOTA Dataset: https://datasets.superannotate.com/dota-dataset/

DIOR Dataset: Li, K., Wan, G., Cheng, G., Meng, L., Han, J., 2020. Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS Journal of Photogrammetry and Remote Sensing 159, 296– 307.

Why do you select two feature extraction models (VGG and ResNet)? Is it enough if we only select one feature extraction method? Shall the features be similar or repeat?

Answer:

Thank you for your valuable comments. We have selected two feature extraction models (VGG and ResNet) for the following reasons.

The ResNet architecture does not lose much performance when layers are removed: it has many independent effective paths, and the majority of them remain intact after removing a couple of layers. In contrast, the VGG-19 network has only one effective path, so removing a single layer compromises that only path. VGG-19 is also a huge network, which means that it takes more time to train its parameters. As a result, as the network goes deeper, its performance saturates or even starts degrading rapidly. Therefore, the plain data from VGG-19 and the residual data from ResNet are collected together. From the collected data, matched features are selected for further processing.

For the reasons stated above, it is not enough to use one feature extraction method. In order to obtain more informative features from the dataset, both the VGG and ResNet models are implemented in this research.

What’s the main difference between the SSG which you proposed and the Saremi et al. [39]?

Answer:

Thank you for your valuable comments. As per the reviewer’s comment, we have updated the main difference between the proposed SSG and Saremi et al. [39] at section 3.3.

[39] Mirjalili S.Z.; Mirjalili S.; Saremi S.; Faris H.; Aljarah I. Grasshopper optimization algorithm for multi-objective optimization problems. Applied Intelligence, 2018, 48(4), 805-820.

Although the results are better than other models, more comparisons are still needed, such as with YOLO, EfficientDet, and Faster R-CNN.

Answer:

Thank you for your important comments. We have updated the results with YOLO, EfficientDet, and Faster R-CNN in Tables 1 and 3 in sections 5.1 and 5.2, respectively.

Show more details on the images for the readers; the tables alone are not enough.

Answer:

Thank you for your useful comments. As per the reviewer’s comment, we have added more details on the images (Figures 6 to 11) in the results and discussion section.

There are no precision and recall results although the author has given the equations.

Answer:

Thank you for your valuable comments. We have updated the precision and recall results in Table 2 and 4 at section 5.1 and 5.2 respectively.

Round 2

Reviewer 1 Report

The authors improved the manuscript well and I think this paper can now be accepted for publication.

Author Response

Thank you for your valuable comments.

Reviewer 3 Report

I appreciate the authors' effort. I suggest the authors give detailed information about the difference between VGG and ResNet when they are used as the feature extraction network.

Author Response

Reviewer 3:

I appreciate the authors' effort. I suggest the authors give detailed information about the difference between VGG and ResNet when they are used as the feature extraction network.

Answer:

Thank you for your valuable comments. We have included detailed information about the difference between VGG and ResNet at section 3.2.2.

VGG19 is one of the popular algorithms used in image classification. When it is also used for feature extraction, it provides the following advantages.

First, the features of VGG19 are flat and capable of achieving optimal values. It is also a pre-trained model, trained on a large dataset, that can be fine-tuned to fit the images with ease. ResNet50, in turn, is much deeper than VGG19, yet its architecture size is substantially smaller because it uses global average pooling rather than large fully-connected layers. In ResNet50, networks with a large number of layers are trained easily without increasing the training error percentage.

However, the feature subspace is wider in VGG19 than in ResNet50, which creates more error in the architecture. For ResNet50, the subspace value is optimal, but the feature subspaces may overlap. When those features are used in the training and testing stages, the subspace error value of some classes is affected. Further, ResNet50 usually requires more training time, which can make it impractical in real-world applications.

In order to obtain more informative features, optimal values from both the VGG19 and ResNet50 models are collected. Hence, the outputs from VGG19 and ResNet50 are combined and applied in feature extraction for a better representation of the object.

The above statements are updated at section 3.2.2.
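The combination of VGG19 and ResNet50 outputs described in this response can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the response does not say how the two outputs are combined, so concatenation after per-vector L2 normalization is assumed here, the short lists stand in for real network features, and the binary mask stands in for whatever subset a feature selector such as SSG would return. All function names are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a feature vector to unit length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def fuse_features(vgg_features, resnet_features):
    """Concatenate the two normalized vectors (assumed fusion rule)."""
    return l2_normalize(vgg_features) + l2_normalize(resnet_features)


def select_features(fused, mask):
    """Keep the positions where the binary mask is 1.

    The mask is a placeholder for a feature selector's output.
    """
    if len(fused) != len(mask):
        raise ValueError("mask length must match the fused feature length")
    return [x for x, keep in zip(fused, mask) if keep]


# Toy vectors standing in for per-image backbone outputs.
fused = fuse_features([3.0, 4.0], [0.0, 2.0])
selected = select_features(fused, [1, 0, 0, 1])
print(fused, selected)
```

Normalizing each vector before concatenation keeps one backbone from dominating the fused representation purely because of its activation scale; other fusion rules are equally plausible.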
