Article
Peer-Review Record

RSCNet: An Efficient Remote Sensing Scene Classification Model Based on Lightweight Convolution Neural Networks

Electronics 2022, 11(22), 3727; https://doi.org/10.3390/electronics11223727
by Zhichao Chen 1,2, Jie Yang 1,3,*, Zhicheng Feng 1,2 and Lifang Chen 4
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 2 October 2022 / Revised: 9 November 2022 / Accepted: 9 November 2022 / Published: 14 November 2022
(This article belongs to the Section Artificial Intelligence)

Round 1

Reviewer 1 Report

This study presents a lightweight model, RSCNet, for remote sensing scene classification. Built on the ShuffleNet v2 backbone, RSCNet introduces three modifications. First, transfer learning is used to optimize the initialization weights of the backbone. Second, a lightweight channel attention mechanism is attached behind the backbone to strengthen important channel features. Third, label smoothing regularization is used to optimize the cross-entropy loss function. The experimental results show that the classification accuracy of RSCNet is 96.75% and 99.05% on the AID and UCMerced_LandUse datasets, respectively. Meanwhile, the proposed model requires 153.71M FLOPs, and a single inference on the CPU takes about 2.75 ms.
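For readers unfamiliar with the label smoothing regularization the summary refers to, it can be sketched in a few lines of plain NumPy (an illustrative sketch with an assumed smoothing factor `eps`, not the authors' implementation):

```python
import numpy as np

def smoothed_cross_entropy(logits, label, num_classes, eps=0.1):
    """Cross-entropy against a label-smoothed target distribution.

    The one-hot target is softened: the true class gets 1 - eps and the
    remaining probability mass eps is spread evenly over all classes.
    """
    # Softmax with max-subtraction for numerical stability.
    z = logits - np.max(logits)
    probs = np.exp(z) / np.sum(np.exp(z))
    # Smoothed target: (1 - eps) * one_hot + eps / num_classes.
    target = np.full(num_classes, eps / num_classes)
    target[label] += 1.0 - eps
    return float(-np.sum(target * np.log(probs)))

logits = np.array([4.0, 1.0, 0.5])
loss_smooth = smoothed_cross_entropy(logits, label=0, num_classes=3)
loss_hard = smoothed_cross_entropy(logits, label=0, num_classes=3, eps=0.0)
# With a confident, correct prediction the smoothed loss is larger than the
# hard-label loss, which discourages over-confident logits.
```

With `eps=0.0` this reduces to the standard cross-entropy loss; the nonzero `eps` penalizes over-confident predictions, which is the regularizing effect the reviewers discuss below.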


The paper is quite well-structured and the experimental results are intuitive. However, my main concern is the novelty of this work, because all three techniques used in RSCNet, i.e., transfer learning, channel attention, and label smoothing regularization, were proposed in previous works. Therefore, I suggest the authors conduct more experiments to justify the proposed network. Here are the comments in detail:

1. According to Table 2, transfer learning is the main source of the performance enhancement. However, the technique has already been adopted in previous works [1]-[2] in the remote sensing domain, where the volume of remote sensing images is relatively small. What is the training data ratio? The authors are encouraged to conduct more transfer-learning-related experiments with different options, such as small training data ratios (5%, 10%, 15%, or 20%) or larger datasets such as NWPU-RESISC45 with 31,500 images.

2. According to Table 1, transfer learning consistently improves the performance of many networks, i.e., by more than 3%. Meanwhile, BiMobileNet [52] achieves 96.87% when trained on the AID dataset with a training data ratio of 50%. Is it possible to achieve a similar boost when applying transfer learning on top of SOTA networks such as [52] or [3]-[5]?

3. In Figure 3, why is ECA applied only to the outputs of Conv5 instead of the other blocks?

4. Table 2 should be extended with more options, such as transfer learning with ECA or transfer learning with regularization. In addition, the authors may add more ablation results on other datasets.

5. Some SOTA networks such as [3]-[5] or [52] should be added for comparison. For example, [52] is included in Table 4 but excluded from Table 3.

6. The experimental sections can be reorganized for clarity. For example, the transfer learning experiments are conducted on the AID dataset, which has 10,000 images. Meanwhile, there are no similar results for the UCMerced_LandUse dataset, which has only 2,100 images.

7. Figure 11 looks confusing. Why does the inference time fluctuate so much for the other networks? Why do RSCNet and MobileNet v2 have similar execution times?

8. In the context of neural networks, FLOPs simply means the total number of floating-point operations required for a single forward pass. The authors should double-check the definition of FLOPs used in the manuscript.
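As a sanity check on that definition, the conventional per-layer count can be reproduced with a short calculation (an illustrative sketch with assumed layer shapes, not figures from the manuscript):

```python
def conv2d_flops(h_out, w_out, c_in, c_out, k):
    """Floating-point operations for one standard conv layer's forward pass.

    Each output element needs c_in * k * k multiplications and the same
    number of additions, counting multiply and add as separate operations.
    """
    macs = h_out * w_out * c_out * (c_in * k * k)
    return 2 * macs  # 1 multiply-accumulate = 1 multiply + 1 add

# Hypothetical 3x3 convolution: 56x56 output, 64 -> 128 channels.
flops = conv2d_flops(56, 56, 64, 128, 3)
```

Note that some papers count one multiply-accumulate (MAC) as a single FLOP, which halves the number above; this ambiguity is precisely why the definition used in the manuscript should be stated explicitly.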


[1] Chen, J. et al. Object detection in remote sensing images based on deep transfer learning. Multimedia Tools and Applications, 81 (9), 12093–12109.

[2] Zhang, M. et al. Segmenting across places: The need for fair transfer learning with satellite imagery. CVPR 2022 Workshops.

[3] Xie, J.; He, N.; Fang, L.; Plaza, A. Scale-free convolutional neural network for remote sensing scene classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6916–6928.

[4] Zhang, W.; Tang, P.; Zhao, L. Remote sensing image scene classification using CNN-CapsNet. Remote Sens. 2019, 11, 494.

[5] Cheng, G.; Yang, C.; Yao, X.; Guo, L.; Han, J. When deep learning meets metric learning: Remote sensing image scene classification via learning discriminative CNNs. IEEE Trans. Geosci. Remote Sens. 2018, 56, 2811–2821.


Author Response

Please see the attachment.

Author Response File: Author Response.doc

Reviewer 2 Report


The article presents "RSCNet: An efficient remote sensing scene classification model based on lightweight convolution neural networks". The topic is interesting, but the article lacks novelty. The authors must address the following comments.


1. What are the major contributions of the authors? Please list the contributions.

2. What is the novelty? Transfer learning is widely used in every field and is nowadays being replaced by vision transformers for classification; what is the novel contribution made by the authors?

3. The abstract must emphasize the proposed method rather than presenting a general summary of the article.

4. The authors must provide the full forms of all the acronyms used.

5. Why do the authors consider this to be attention? The proposed block is not attention; please refer to the article where the authors used an attention block with D-CNNs (doi.org/10.3390/jimaging8030070).


6. The authors should correct the numbering of the sections; the first section must be "1. Introduction" instead of "0. Introduction".

7. The authors should provide the specifications of the CPU and GPU along with the experimental setup parameters so that the results can be reproduced.

8. Why did the authors not add precision, recall, MCC, Cohen's Kappa, specificity, and sensitivity? These must be included.

9. Why did the authors not include ROC curves for the proposed model on both datasets?

10. Testing results must be added to the ablation study to highlight the importance of the proposed model.

11. In Table 2, change the (\) symbol to (x), or explain the meaning of (\).

12. Improve the quality of the confusion matrices; they are hard to read.


Author Response

Please see the attachment.

Author Response File: Author Response.doc

Reviewer 3 Report

The research work "RSCNet: An efficient remote sensing scene classification model based on lightweight convolution neural networks" is interesting. The authors are required to modify their paper according to the following remarks:

1. There are other recent efficient standard and lightweight CNNs not mentioned in the literature review (lines 36 to 48), such as:

A. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. Computer Vision and Pattern Recognition, 2015. https://doi.org/10.48550/arXiv.1505.04597

B. Awad, M.M.; Lauteri, M. Self-Organizing Deep Learning (SO-UNet)—A Novel Framework to Classify Urban and Peri-Urban Forests. Sustainability 2021, 13, 5548. https://doi.org/10.3390/su13105548


C. Tan, M.; Le, Q.V. EfficientNet: Rethinking model scaling for convolutional neural networks. Proceedings of the 36th International Conference on Machine Learning (ICML 2019), Long Beach, 9–15 June 2019, 6105–6114. http://proceedings.mlr.press/v97/tan19a.html

2. The second step, on lines 166 to 168, needs more elaboration. How is the RSSC model matched with the pre-trained weights by network layer names to complete the initialization of the weights?

3. The authors indicated on lines 172 to 174 that attention mechanisms increase a CNN's accuracy at the cost of increased complexity. What complexity? To what extent does a CNN's accuracy improve by using attention mechanisms? Add references to support the idea.

4. There is no need for subsection 3.4 because it is repeated later in other sections, such as the comparison of RSSC with lightweight models.

5. The authors indicated on lines 259 to 261 that ShuffleNet v2 is the best model because it starts with high accuracy and maintains it during the training epochs. That is not true: most of the time, starting with high accuracy and maintaining it means that the model is stuck in a local minimum instead of finding the global optimum solution.


6. On lines 278 to 280, the authors indicated that the loss of the prediction results in all dimensions is taken into account; how is it taken into account in all directions? Please explain.


Author Response

Please see the attachment.

Author Response File: Author Response.doc

Round 2

Reviewer 1 Report

The paper presents a lightweight CNN model for remote sensing based on ShuffleNet v2. Although the proposed network uses some well-known techniques, such as transfer learning and channel attention, it achieves good results for the given task. The paper is almost ready for publication in this forum. However, please consider the following comments for further improvement.
1. The authors should describe the motivation for this paper in more detail. According to the execution time graph, reducing the FLOPs of a model may not reduce its execution time, since the model may require more memory accesses than others.
2. The need for transfer learning can be highlighted to address the lack of data and verified with the results.
3. The roles of attention and regularization are hidden by that of transfer learning. The authors may reduce the amount of data and verify whether the techniques still give a comparable result.

Author Response

Please see the attachment.

Author Response File: Author Response.doc

Reviewer 2 Report

I would like to thank the authors for addressing some of my previous comments. The revised version is much improved compared with the initial version. However, the following concerns must be addressed.

1. In line 20 the authors define an acronym as "remote sensing image(RSI)". There must be a space between the full term and the acronym, as in "remote sensing image (RSI)". Please revise all acronyms, as this problem occurs throughout.

2. In line 35 the authors changed the way of defining an acronym, as in "CNN(Convolution Neural Network)"; however, this method is wrong. Please follow the same method as in line 20 for all acronyms, since consistency is very important in the article.

3. Please add the organization of the paper at the end of the introduction.

4. The acronym definition in line 70 is wrong; follow this method and be consistent with it: "Scale Invariant Feature Transform (SIFT)".

5. Where is the robustness check?

6. What is the relation of this sentence to the surrounding sentences: "Barshooi and Amirkhani [41], Elkorany and Elsharkawy [42] applied ShuffleNet v2 to COVID-19 chest film classification."? The article is not discussing COVID-19; please remove these references.

7. In line 124, define the acronym DWConv.

8. Specify the GPU model ("TITAN RTX"?) along with the manufacturer information.

9. In Tables 2, 3, 4, 5, 6, and 7, change (A_n /%) to A_n (%) in the header; follow this for the others as well.

10. Revise the confusion matrices so that they are readable. The confusion matrix must show the overall accuracy and the percentages of TN, TP, etc. What are classes 1 to 21? If these are classes, change the labels to the class names. It is necessary to add a high-level diagram of the confusion matrix. The authors can use MATLAB to generate high-level confusion matrices.

Author Response

Please see the attachment.

Author Response File: Author Response.doc

Reviewer 3 Report

The paper has been modified according to the reviewers' comments. Thank you for your efforts and for your interesting research work. I suggest paying attention to some minor English mistakes.

Author Response

Please see the attachment.

Author Response File: Author Response.doc

Round 3

Reviewer 2 Report

Thank you for addressing my comments. In the current version the confusion matrices have small fonts and are difficult to read. Please improve the confusion matrices. Thanks.

Author Response

Please see the attachment.

Author Response File: Author Response.doc
