Article
Peer-Review Record

Hyperspectral Image Classification Based on Transposed Convolutional Neural Network Transformer

Electronics 2023, 12(18), 3879; https://doi.org/10.3390/electronics12183879
by Baisen Liu 1,2,3, Zongting Jia 1,*, Penggang Guo 1 and Weili Kong 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 19 August 2023 / Revised: 9 September 2023 / Accepted: 11 September 2023 / Published: 14 September 2023

Round 1

Reviewer 1 Report

Ref.: electronics-2592809

Manuscript Title: Hyperspectral Image Classification based on Transposed CNN-Transformer

Reviewer Comment: The focus of the paper is a multi-scale spectral-spatial Transformer to capture high-level semantic features of HSI while preserving spectral information as much as possible.

- The novelty, clarity, logic, and organization of this paper are very good, and it presents substantial work. It needs some minor modifications. Although the paper is interesting, I have some concerns:

1. Title - The title reflects the results presented here.

2. Abstract 

- The abbreviation HSI should be expanded at first use.

- Quantified results are not mentioned.

3. Introduction

- Well written

- The significance of Transformer models for hyperspectral data needs to be mentioned.

4. Multi-scale Transposed CNN-transformer Feature Extraction

- Figure 1 - notations and abbreviations can be mentioned (e.g., cat, PE, etc.)

- Citations needed to support 2.1. Inception-Based Spectral-Spatial Information Enhancement Extraction

- Citations needed to support 2.2. Spatial Transpose Inception module

- Figure 2 (Spatial Transpose Inception module) - notations and abbreviations can be mentioned (e.g., cat, summing point, image, etc.)

5. Experimental Results

- Table 2 and Figure 8 give the same information.

- Fig. 8 - What do the colour legends represent? What is the Y-axis unit?

- Fig. 9 - What is the Y-axis unit?

- Is there a difference in execution time among the methods? If so, why is time not considered as a factor?

 6. Conclusion

- Limitations of the proposed work can be mentioned more in detail

- "Consequently, future research will delve into exploring HSI spectral-spatial feature extraction and post-classification processing." - This sentence is incomplete and hard to understand.

Author Response

Response to Reviewer 1's comments:

Dear Reviewer,

 

We truly appreciate your positive and constructive suggestions on our manuscript titled "Hyperspectral image classification based on Multi-scale CNN-transformer". These suggestions are highly valuable and have been instrumental in improving our paper, as well as providing important guidance for our research. We have carefully considered your suggestions and made the necessary revisions, which are highlighted in red throughout the manuscript. The main corrections and our responses to the reviewer's comments are as follows:

1. Comment

Title - The title reflects the results presented here.
Response: Thank you for your suggestion. We truly appreciate your feedback on our title, which we believe is a positive evaluation indicating that the title is consistent with our research results and does not require any modifications. However, if we have misunderstood your suggestion, please kindly provide us with more details so that we can get in touch with you and clarify any misunderstandings.

2. Comment

Abstract 

- The abbreviation HSI should be expanded at first use.

- Quantified results are not mentioned.

Response: Thank you for your feedback. We have made revisions to our abstract based on your suggestion. In the first paragraph, we have expanded the introduction of HSI and added the latest experimental results at the end.

3. Comment

Introduction

- Well written

- The significance of Transformer models for hyperspectral data needs to be mentioned.

Response: Thank you for your suggestion. We sincerely appreciate your recognition of our work.

4. Comment

Multi-scale Transposed CNN-transformer Feature Extraction

- Figure 1 - notations and abbreviations can be mentioned (e.g., cat, PE, etc.)

- Citations needed to support 2.1. Inception-Based Spectral-Spatial Information Enhancement Extraction

- Citations needed to support 2.2. Spatial Transpose Inception module

- Figure 2 (Spatial Transpose Inception module) - notations and abbreviations can be mentioned (e.g., cat, summing point, image, etc.)

Response: Thank you for your feedback. We have provided explanations for the abbreviations in Figures 1 and 2. Sections 2.1 and 2.2 describe the spectral spatial feature extraction module and the spatial transposed module, which were designed based on the characteristics of convolution. To support our design, we have included references 33 and 34, which discuss relevant convolution techniques.

5. Comment

Experimental Results

- Table 2 and Figure 8 give the same information.

- Fig. 8 - What do the colour legends represent? What is the Y-axis unit?

- Fig. 9 - What is the Y-axis unit?

- Is there a difference in execution time among the methods? If so, why is time not considered as a factor?

Response: Thank you for your feedback. Perhaps we did not express ourselves clearly. Table 2 shows the results of the parameter settings for patch size, 3D-Inception, and 2D-Inception. Figure 8 shows a comparison experiment of whether or not to include the transposed convolution module and the multi-head attention mechanism with or without sine functions. In Figure 8, the green color represents the experimental results without the transposed convolution, and the yellow color represents the experimental results with the transposed convolution. Since this was not explained in the manuscript, we have added an explanation below Figure 8. In Figure 9, we have also included a legend for clarification. There is no difference in execution time between the experiments, only a difference in whether or not a certain module is added.

6. Comment

Conclusion

- Limitations of the proposed work can be mentioned more in detail

- "Consequently, future research will delve into exploring HSI spectral-spatial feature extraction and post-classification processing." - This sentence is incomplete and hard to understand.

Response: Thank you for your suggestion. We have revised the last paragraph based on your feedback, and provided a more detailed explanation of the limitations and shortcomings of our work, as well as future research directions.

 

We have made every effort to improve the manuscript and have made some changes to it. We hope that these revisions are satisfactory.

 

Thank you again for your work on our paper. Wishing you all the best.

 

Yours Sincerely,

 

Authors

Reviewer 2 Report

This paper presented a multi-scale spectral-spatial Transformer for the classification of HSI. The developed method is able to capture high-level semantic features of HSI while preserving spectral information as much as possible. Experimental results demonstrate the effectiveness of the proposed method. My comments are:

 1. The contributions of this paper can be illustrated with bullets instead of within a single paragraph.

2. The literature review on the topic can be improved as there are a lot of new techniques in the field. The following SOTA methods should be mentioned, including DOI: 10.1109/LGRS.2019.2911322, DOI: 10.1109/TGRS.2021.3127536, DOI: 10.1109/TCI.2019.2911881.

3. The authors need to introduce the compared methods and add the corresponding references. It is not clear whether they represent the current SOTA methods.

 

4. There are a few typos in the paper. For instance, line 52 “HIS”. The authors need to thoroughly proofread the paper.

See my comments

Author Response

Response to Reviewer 2's comments:

Dear Reviewer,

 

We truly appreciate your positive and constructive suggestions on our manuscript titled "Hyperspectral image classification based on Multi-scale CNN-transformer". These suggestions are highly valuable and have been instrumental in improving our paper, as well as providing important guidance for our research. We have carefully considered your suggestions and made the necessary revisions, which are highlighted in red throughout the manuscript. The main corrections and our responses to the reviewer's comments are as follows:

1. Comment

The contributions of this paper can be illustrated with bullets instead of within a single paragraph.

Response: Thank you for your suggestion. Based on your feedback, we have now used bullet points to list our contributions.

2. Comment

The literature review on the topic can be improved as there are a lot of new techniques in the field. The following SOTA methods should be mentioned, including DOI: 10.1109/LGRS.2019.2911322, DOI: 10.1109/TGRS.2021.3127536, DOI: 10.1109/TCI.2019.2911881.

Response: Thank you for your feedback. Based on your suggestion, we have included the aforementioned references to make our article more comprehensive. The added references are positioned at 29, 30, and 31. As a result, the subsequent reference positions have changed, and we have indicated these changes in red.

3. Comment

The authors need to introduce the compared methods and add the corresponding references. It is not clear whether they represent the current SOTA methods.

Response: Thank you for your feedback. Based on your suggestion, we have added references to the comparative methods used in our study. These comparative methods are all recent, published within the last two years, and possess a certain level of novelty.

4. Comment

There are a few typos in the paper. For instance, line 52 “HIS”. The authors need to thoroughly proofread the paper.

Response: Thank you for your feedback. We take full responsibility for the low-level error in our manuscript. Based on your suggestion, we conducted a thorough review of the entire article and have made corrections to this type of error wherever they were found.

We have made every effort to improve the manuscript and have made some changes to it. We hope that these revisions are satisfactory.

 

Thank you again for your work on our paper. Wishing you all the best.

 

Yours Sincerely,

 

Authors

 

 

Author Response File: Author Response.pdf

Reviewer 3 Report

Authors contributions:

According to the authors, hyperspectral image (HSI) classification is challenging: traditional Convolutional Neural Networks (CNNs) have excellent feature extraction capabilities but struggle to capture deep semantic features, especially due to the large number of spectral bands in HSIs. This sets the stage for the need to explore alternative models.

The paper proposes a new approach, T-CNNTF, which combines the strengths of CNNs and Transformer models. It introduces a multi-scale spectral-spatial Transformer, aimed at capturing high-level semantic features while retaining spectral information.

The T-CNNTF method includes a spectral-spatial Inception module that utilizes multi-scale convolutional kernels to extract both spectral and spatial features from the hyperspectral data. This module enhances the ability to process both types of information effectively. Another component of the T-CNNTF approach is the spatial transposed Inception module. This module employs 2D convolution kernels to extract spatial features from the data, complementing the spectral features captured by the previous module.

The final classification results are obtained by applying a linear layer to the learnable tokens from the Transformer output. The approach is evaluated on three public datasets, and its performance is compared with other deep learning methods commonly used for HSI classification.

The experimental results demonstrate the superiority of the proposed T-CNNTF approach compared to other deep learning methods for HSI classification. This suggests that the combination of spectral-spatial processing and Transformer models is effective in addressing the challenges of HSI data.

The study validates the effectiveness of progressively combining various components across different dimensions (spectral and spatial) and integrating a Transformer. This indicates that the proposed method leverages the strengths of multiple techniques.

 

Limitations of this work:

The integration of multiple components, including spectral-spatial Inception modules, spatial transposed Inception modules, and Transformers, could lead to increased computational complexity. This might require significant computational resources and longer training times, making the approach less suitable for resource-constrained environments.

The T-CNNTF approach likely involves a range of hyperparameters that need to be tuned for optimal performance. Finding the right combination of hyperparameters for each module and their interaction can be time-consuming and require expertise.

The effectiveness of the proposed approach could be influenced by the quality and quantity of available hyperspectral data. If the dataset used for training is limited in size or diversity, the generalization capability of the model might be compromised.

The inclusion of multiple modules and mechanisms might improve classification performance, but it could also increase model complexity. There's a trade-off between complexity and performance, as overly complex models might be prone to overfitting or could be harder to deploy.

The experimental results are likely presented on specific public datasets. The approach's performance might vary when applied to different datasets with varying levels of complexity, noise, and class distributions.

 

I have some reviewer notes:

Abstract. Where can the results be implemented in practice? How will the work be continued?

Introduction. The aim of this work is not clearly presented.

Describe Figure 3 in the text.

Line 243. Where is AVIRIS available? Describe it.

Line 250. Where is ROSIS available?

Lines 267 to 272. Describe hardware as: Model (Manufacturer, City, Country). Describe software as: Software, Version (Manufacturer, City, Country).

Figure 6 does not have vertical and horizontal axis titles.

Figure 7 does not have vertical and horizontal axis titles.

Table 2. If the presented values are dimensionless, this should be stated.

Figure 8 does not have vertical and horizontal axis titles.

Figure 9 does not have vertical and horizontal axis titles.

Table 3. If the presented values are dimensionless, this should be stated.

Table 4. If the presented values are dimensionless, this should be stated.

Table 5. If the presented values are dimensionless, this should be stated.

The Discussion section is missing. It would be good to compare your results with at least three other papers (more is better).

Conclusion. Where can the results be implemented in practice?

 

I have some suggestions:

Improve the graphical presentation of your results. Provide a better description of the software and hardware that you use.

 Minor editing of English language required.

Author Response

Response to Reviewer 3's comments:

Dear Reviewer,

 

We truly appreciate your positive and constructive suggestions on our manuscript titled "Hyperspectral image classification based on Multi-scale CNN-transformer". These suggestions are highly valuable and have been instrumental in improving our paper, as well as providing important guidance for our research. We have carefully considered your suggestions and made the necessary revisions, which are highlighted in red throughout the manuscript. The main corrections and our responses to the reviewer's comments are as follows:

1. Comment

Abstract. Where can the results be implemented in practice? How will the work be continued?

Response: Thank you for your feedback. This result has made certain contributions to plant disease and pest detection, as well as geological surveys. Different objects absorb different spectra, resulting in varying reflectance spectra. Therefore, the reflectance spectra of plants affected by diseases or pests differ from those of healthy plants. The more accurate the identification and classification of objects, the more precise the detection of diseases and pests can be.

 

In terms of geological surveys, this method utilizes satellite imagery of ground features for classification, significantly reducing manpower and time costs. Our future plans involve deploying this method on hardware to achieve accelerated processing, further improving classification accuracy, and reducing model size, among other developments. Due to limitations in the abstract's length, we will elaborate on these aspects in the conclusion section.

2. Comment

Introduction. The aim of this work is not clearly presented.

Response: Thank you for your feedback. We acknowledge that the purpose of our study was not clearly stated in the article. Based on your suggestion, we have now included a direct statement about the objective of our study before the contributions section.

3. Comment

Describe Figure 3 in the text.

Line 243. Where is AVIRIS available? Describe it.

Line 250. Where is ROSIS available?

Lines 267 to 272. Describe hardware as: Model (Manufacturer, City, Country). Describe software as: Software, Version (Manufacturer, City, Country).

Response: Thank you for your feedback. Based on your advice, we have included a description of the computation mechanism of Figure 3 and provided an explanation of the symbols in the diagram at line 245 of the article.

 

Due to the high cost of spectral imaging instruments, our experiments in this study were based on publicly available datasets. The AVIRIS instrument used in our study was developed and is maintained by a NASA-affiliated laboratory, while ROSIS is a spectral imaging instrument developed by the German Aerospace Center (DLR) and has been in use since 2001. We did not include these details in the article due to space limitations, but if you believe it is necessary, please feel free to contact us.

 

Furthermore, based on your suggestion, we have revised the experimental setup description. We have now provided basic hardware parameters and manufacturer information, as well as software details such as the development framework, programming language, developer information, and their nationality. You can find this information in lines 297-306 of the article.

4. Comment

Figure 6 does not have vertical and horizontal axis titles.

Figure 7 does not have vertical and horizontal axis titles.

Table 2. If the presented values are dimensionless, this should be stated.

Figure 8 does not have vertical and horizontal axis titles.

Figure 9 does not have vertical and horizontal axis titles.

Table 3. If the presented values are dimensionless, this should be stated.

Table 4. If the presented values are dimensionless, this should be stated.

Table 5. If the presented values are dimensionless, this should be stated.

Response: Thank you for your feedback. Based on your suggestion, we have made modifications to the figures and tables:

Figures 6, 7, 8, and 9 now include axis labels to provide clarity.

Tables 2, 3, 4, and 5 have been updated with explanations or captions to enhance understanding.

We believe these changes will improve the readability and comprehension of the figures and tables. Thank you for bringing this to our attention.

5. Comment

The Discussion section is missing. It would be good to compare your results with at least three other papers (more is better).

Response: Thank you for your feedback. You are correct that our article lacked a comparative discussion between the methods. Based on your suggestion, we have added a comparative discussion of the experimental results in lines 388-393, 401-406, and 413-416 of the article.

By incorporating this comparative analysis, we aim to provide a more comprehensive evaluation and explanation of the performance differences observed among the different methods.

We appreciate your valuable input and believe that these additions will enhance the quality and completeness of our research.

 

6. Comment

Conclusion. Where can the results be implemented in practice?

Response: Thank you for your feedback. Your suggestions have made our article more comprehensive. Based on your advice, we have revised the conclusion section to introduce the domain in which our work was implemented and provide an outline of our future plans.

We believe that these modifications will provide a clearer understanding of the context and implications of our research. We appreciate your valuable input and strive to make our work more informative and impactful.

7. Suggestions

Improve the graphical presentation of your results. Provide a better description of the software and hardware that you use.

Response: Thank you very much for your valuable feedback. Based on your suggestion, we have added descriptions to the figures and provided more details about the hardware setup. These enhancements have further improved the content and structure of our article.

 

We greatly appreciate your input, and we will always keep your suggestions in mind for our future research work. Your feedback has been incredibly helpful, and we are committed to continuously improving and refining our work based on such valuable insights.

 

We have made every effort to improve the manuscript and have made some changes to it. We hope that these revisions are satisfactory.

 

Thank you again for your work on our paper. Wishing you all the best.

 

Yours Sincerely,

 

Authors

Author Response File: Author Response.pdf
