Article
Peer-Review Record

SATNet: A Spatial Attention Based Network for Hyperspectral Image Classification

Remote Sens. 2022, 14(22), 5902; https://doi.org/10.3390/rs14225902
by Qingqing Hong, Xinyi Zhong, Weitong Chen, Zhenghua Zhang, Bin Li, Hao Sun, Tianbao Yang and Changwei Tan *
Reviewer 1:
Reviewer 3: Anonymous
Submission received: 27 October 2022 / Revised: 13 November 2022 / Accepted: 15 November 2022 / Published: 21 November 2022

Round 1

Reviewer 1 Report

This manuscript proposes a spatial attention-based network (SATNet) with 3D convolution and ViT for hyperspectral image classification. The theoretical description of the paper is very detailed, and the idea is interesting. In the experimental results section, the proposed method outperforms the other compared methods. However, several concerns should be addressed:

(1)    The motivation and contribution of the manuscript need to be further clearly emphasized.

(2)    The authors should discuss the computational complexity of the algorithm.

(3)    Some SOTA methods regarding HSIC should be compared in your experiments, e.g., graph-based methods.

(4)    Some more methods regarding remote sensing using graph-based methods should be investigated in your introduction, e.g., Semi-Supervised Locality Preserving Dense Graph Neural Network With ARMA Filters and Context-Aware Learning for Hyperspectral Image Classification, Graph Sample and Aggregate-Attention Network for Hyperspectral Image Classification, Unsupervised Self-correlated Learning Smoothy Enhanced Locality Preserving Graph Convolution Embedding Clustering, Self-supervised Locality Preserving Low-pass Graph Convolutional Embedding, AF2GNN: Graph Convolution with Adaptive Filters and Aggregators, Deep hybrid: multi-graph neural network collaboration for hyperspectral image classification.

(5)    The authors use PCA to reduce the spectral dimension of hyperspectral images, which essentially converts hyperspectral images into natural images. In this process, most spectral features will be lost, which forfeits the significance of the rich spectral features of the original hyperspectral image. However, the authors deliberately avoided this problem rather than solving it. I think this algorithm has major defects.

(6)    The authors should add ablation experiments to analyze the role of 3D CNN and ViT.

(7)    Some large datasets should be included in your experiments, e.g., Houston 2013 and Xiongan.

All in all, from my existing knowledge, the classification accuracy of the proposed method is not outstanding and the algorithmic innovation is limited; the paper needs to be significantly revised before it can be published.

Author Response

Detailed Responses to Reviewers

Dear Editors and Reviewers:

Thank you for your letter and the reviewers’ comments concerning our manuscript entitled “SATNet: A Spatial Attention Based Network for Hyperspectral Image Classification”. These comments are valuable and very helpful for revising and improving our paper. We have studied the comments carefully and have made corrections that we hope meet with your approval. Following the instructions in your letter, we have uploaded the revised manuscript. The main corrections and our responses to the reviewers’ comments are as follows:

 

Responses to reviewer #1’s comments:

  1. The motivation and contribution of the manuscript need to be further clearly emphasized.

Response: Thank you for your comments. We provide a clearer statement of the motivation and contribution of this manuscript in the introduction to the paper. Lines 106–117 of the article further explain our motivation and contribution. It reads as follows.

“The aforementioned work shows that CNNs are capable of effective feature extraction; however, the feature maps they produce contain a lot of spatially duplicated data. Additionally, CNNs ignore certain spatial information because of the local nature of convolution, which affects classification performance. The new framework for HSI classification proposed in this paper is based on the vision transformer (ViT) model, spatial attention networks, and 3D OctConv, which we intend to apply to agriculture in the future, as shown in Figure 1 (a). The main contributions of this paper are as follows: (1) Using 3D OctConv to extract the spectral-spatial features of HSIs, which reduces the spatially redundant information of the feature maps and decreases the computational effort. (2) Combining 3D OctConv and ViT, which can extract local-global features and enhance the extraction of spatial-spectral feature information, improves the classification performance.”
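To make the redundancy-reduction claim concrete, the activation cost of octave convolution can be sketched with a little arithmetic (this illustration is ours, not the authors' code; the low-frequency channel ratio `alpha` is a free parameter of OctConv):

```python
# Illustration (not the paper's implementation): relative activation storage of
# octave convolution. OctConv keeps a fraction alpha of the channels at half
# spatial resolution, so those channels cost (H/2)*(W/2) = HW/4 per channel.

def octconv_activation_ratio(alpha: float) -> float:
    """Activation size relative to vanilla convolution with the same channels."""
    high = 1.0 - alpha        # full-resolution channels
    low = alpha * 0.25        # half-resolution channels cost a quarter each
    return high + low

# With the common setting alpha = 0.5, activations shrink to 62.5% of vanilla conv.
print(octconv_activation_ratio(0.5))   # 0.625
print(octconv_activation_ratio(0.0))   # 1.0 (reduces to vanilla convolution)
```

This is why splitting feature maps into high- and low-frequency parts reduces spatial redundancy without changing the number of channels.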

 

 

  2. The authors should discuss the computational complexity of the algorithm.

Response: Thank you for your comments. We compared the complexity and number of model parameters of all the models used in the manuscript; the results are shown in Table 9. It reads as follows.

“As indicated in Table 9, we also contrasted the computational complexity and number of model parameters of the various techniques on the different datasets. According to the table, Baseline, which employs only fully connected layers, has the most model parameters, followed by 3D Octave, HybridSN, and SATNet. Due to its use of all spectral bands, SSRN requires the most computational effort. On the IP dataset, HybridSN required more computational effort than SATNet, while on the UP and SA datasets, SATNet required more than HybridSN. Additionally, 3D Octave required less processing power than HybridSN since it divides the feature maps into frequency-based components.”

Table 9. Complexity and number of parameters of different algorithms.

 

| Method     | IP GFLOPs | IP Params. | UP GFLOPs | UP Params. | SA GFLOPs | SA Params. |
|------------|-----------|------------|-----------|------------|-----------|------------|
| Baseline   | 2.16      | 67.5M      | 0.98      | 30.6M      | 0.98      | 30.6M      |
| 3DCNN      | 0.187     | 0.82M      | 0.043     | 0.11M      | 0.459     | 0.20M      |
| SSRN       | 14.19     | 0.34M      | 7.29      | 0.19M      | 14.47     | 0.35M      |
| HybridSN   | 8.88      | 2.67M      | 1.70      | 1.19M      | 1.70      | 1.19M      |
| ViT        | 0.0726    | 1.9M       | 0.0357    | 0.8M       | 0.0357    | 0.8M       |
| 3D_Octave  | 5.46      | 22.1M      | 1.489     | 6.06M      | 1.489     | 6.06M      |
| Octave_ViT | 6.81      | 2.41M      | 2.10      | 0.93M      | 2.10      | 0.93M      |
| SATNet     | 7.89      | 2.40M      | 2.40      | 0.93M      | 2.40      | 0.93M      |
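As a rough illustration of how entries like those in Table 9 are derived, the sketch below (ours; the layer shapes are illustrative, not taken from the paper) counts the parameters and multiply-accumulate operations of a single 3D convolution layer:

```python
# Hedged sketch: estimating the "Params." and "GFLOPs" of one 3D conv layer.
# Layer shapes here are made up for illustration only.

def conv3d_params(in_ch, out_ch, k):
    """Weights (out*in*k^3) plus one bias per output channel."""
    return out_ch * in_ch * k**3 + out_ch

def conv3d_flops(in_ch, out_ch, k, out_spatial):
    """Two FLOPs (multiply + add) per MAC, per output element."""
    d, h, w = out_spatial
    return 2 * in_ch * k**3 * out_ch * d * h * w

p = conv3d_params(1, 8, 3)               # 8*1*27 + 8 = 224 parameters
f = conv3d_flops(1, 8, 3, (28, 23, 23))  # FLOPs for one illustrative layer
print(p, f / 1e9)                        # parameter count and GFLOPs
```

Summing these quantities over every layer of a network yields figures comparable to the table above.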

 

 

  3. Some SOTA methods regarding HSIC should be compared in your experiments, e.g., graph-based methods.

Response: Thank you for your constructive comments. Graph-convolution-based hyperspectral image classification analyzes HSIs from a non-Euclidean rather than a Euclidean perspective: spatial graph convolution treats each pixel in the image as a node and models the relationships along the edges between neighboring nodes, while spectral graph convolution replaces the information of all spectral bands with a mean value and classifies via unsupervised clustering. Graph convolution is a novel and effective approach to HSI classification that alleviates the problem of limited labeled samples and, to a certain extent, the local edge-vanishing phenomenon of convolutional neural networks.

However, HSIs have a wide range of spectral bands, and the sensitive bands differ across applications; in precision and smart agriculture, for example, spectral data are very important. Hyperspectral images also frequently suffer from the problems of homogeneous and heterogeneous spectra. We therefore believe that using the spectral mean rather than per-band features in a graph convolutional network partially discards the significant information in the spectral bands. A convolutional neural network instead extracts the spatial features and principal spectral features of each pixel based on Euclidean distance, which better alleviates this problem, and we intend to apply this model to agriculture in the future. For these reasons, we consider the two to be classification methods from different perspectives that cannot be directly added as comparisons.
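For readers unfamiliar with the graph-based formulation discussed above, one propagation step of a basic graph convolution can be sketched as follows (a toy NumPy illustration, not the cited papers' implementations):

```python
import numpy as np

# Toy illustration of one row-normalized graph-convolution step,
# X' = D^-1 * A_hat * X * W, on a 4-node chain graph.

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # adjacency of 4 "pixel" nodes
A_hat = A + np.eye(4)                       # add self-loops
D_inv = np.diag(1.0 / A_hat.sum(axis=1))    # row normalization

X = np.arange(8, dtype=float).reshape(4, 2) # 2 features per node (e.g. mean spectra)
W = np.eye(2)                               # identity weights for clarity

X_new = D_inv @ A_hat @ X @ W               # each node averages itself + neighbors
print(X_new)
```

With identity weights, each node's new feature is simply the mean over itself and its neighbors, which is the smoothing behavior the response refers to.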

 

 

  4. Some more methods regarding remote sensing using graph-based methods should be investigated in your introduction, e.g., Semi-Supervised Locality Preserving Dense Graph Neural Network With ARMA Filters and Context-Aware Learning for Hyperspectral Image Classification, Graph Sample and Aggregate-Attention Network for Hyperspectral Image Classification, Unsupervised Self-correlated Learning Smoothy Enhanced Locality Preserving Graph Convolution Embedding Clustering, Self-supervised Locality Preserving Low-pass Graph Convolutional Embedding, AF2GNN: Graph Convolution with Adaptive Filters and Aggregators, Deep hybrid: multi-graph neural network collaboration for hyperspectral image classification.

Response: Thank you for your comments and the references you provided. These articles present good content and novel ideas, and their improvements on graph neural networks for HSI classification offer significant new directions for the field; we have added them to our introduction in line 70 and lines 87-94. It reads as follows.

“The mainstream deep learning models for HSI classification include autoencoders (AEs) [38], deep belief networks (DBNs), convolutional neural networks (CNNs) [39,40] and graph convolutional network (GCN) [5,6,11].”

“For example, Ding et al. [13] proposed a semi-supervised network based on graph samples and aggregated attention for hyperspectral image classification, as well as an unsupervised clustering method that can retain local spectral and spatial features [14]. They additionally suggested a new self-supervised locality-preserving low-pass graph convolutional embedding method for large-scale hyperspectral image clustering, since unsupervised HSI classification is not well suited to complex large-scale HSI datasets [37].”

 

 

 

  5. The authors use PCA to reduce the spectral dimension of hyperspectral images, which is essentially to convert hyperspectral images into natural images. In this process, most spectral features will be lost, which will lose the significance of rich spectral features of the original hyperspectral image. However, the authors deliberately avoided this problem, but this problem was not solved. I think this algorithm has major defects.

Response: Thank you for your comments. Hyperspectral images have rich spectral information, and adjacent spectral bands carry similar information, which generates a large amount of spectral redundancy; reducing this redundancy helps improve hyperspectral classification performance. PCA maps all spectral bands of a hyperspectral image to K dimensions, removing the redundant spectral information to obtain K principal features. Therefore, we believe that PCA retains the spectral features with high information content but low correlation, discards the redundant features, and does not lose the significance of the rich spectral features.
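The PCA step described in this response can be sketched as follows (a minimal NumPy illustration with made-up dimensions; the paper's own cube sizes and choice of K may differ):

```python
import numpy as np

# Hedged sketch: flatten an HSI cube (H, W, B) to pixels x bands, project the
# centered spectra onto the top-K principal components via SVD, and reshape.

rng = np.random.default_rng(0)
H, W, B, K = 10, 10, 50, 15
cube = rng.standard_normal((H, W, B))       # stand-in for a real HSI cube

X = cube.reshape(-1, B)                     # (H*W, B) pixel spectra
Xc = X - X.mean(axis=0)                     # center each band
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
reduced = (Xc @ Vt[:K].T).reshape(H, W, K)  # keep K principal-component bands

print(reduced.shape)                        # (10, 10, 15)
```

The K retained components capture the directions of highest variance across the B original bands, which is the sense in which redundant spectral information is discarded.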

 

 

  6. The authors should add ablation experiments to analyze the role of 3D CNN and ViT.

Response: Thank you for your comments. Our manuscript already includes three ablation experiments: the ViT model alone, the 3D Octave convolution model alone, and 3D Octave convolution combined with ViT. We analyze their effects on HSI classification through these three experiments. Because our method uses 3D Octave convolution instead of vanilla 3D convolution, we did not add a separate 3D CNN ablation experiment.

 

  7. Some large datasets should be included in your experiments, e.g., Houston 2013 and Xiongan.

Response: Thank you for your useful comments. Because of the large size of the Houston 2013 dataset, the limitations of our computer configuration prevented us from running all the comparison experiments. Since the SSRN model uses all spectral bands, it consumes a large amount of memory, and our computer configuration is not sufficient to run it. In addition, when selecting the best spectral bands for Houston 2013, we ran out of memory at PCA = 110, so we were unable to carry out all the experiments. We can provide the results of the runs we completed if you need them.

 

Due to our oversight, some of the IP data in Tables 4 and 5 were results obtained on different computer devices; we have corrected them in order to unify the experimental environment. Because of an earlier wrong parameter setting for SSRN on the SA dataset, we have also corrected the SSRN results in Table 8, its experimental result plots (Figure 10(d), Figure 13(c)), and the comparison between SATNet and SSRN in lines 495-498. See the revised draft for details.

 

In addition, we have corrected some spelling errors in this revision.

Special thanks to you for your comments.

We have tried our best to improve the manuscript and have made a number of changes. If there are any other deficiencies in the article, we look forward to your further comments.

We earnestly appreciate the editors’ and reviewers’ work and hope that the revised manuscript will be accepted for publication.

Once again, thank you very much for your comments and suggestions.

Kind regards,

Xinyi Zhong

 

Author Response File: Author Response.docx

Reviewer 2 Report

As a researcher working in the same field, I am impressed by the technique introduced in the paper, because it sheds new light on the earlier results of several authors and obviously can be successfully used in practice. From this point of view, the subject of the paper fits well with the scope of the journal Remote Sensing.

 

The paper ends with numerical simulations that corroborate the theoretical results.

This manuscript contains new ideas and good results that help other researchers.

 

 


 

Therefore, I recommend publishing this work after taking these points into account.

1- The English writing of the paper needs to be improved. Please check the manuscript carefully for typos and grammatical errors; I found some within this manuscript, which have been excluded from my review. In addition, the English structure of the article, including punctuation, semicolons, and other structures, must be carefully reviewed.

2-In the introduction, the authors did not provide a strong motivation for the paper and the obtained results. In addition, they should discuss the main contributions of their work in detail after the motivation part. Then they should summarize the main structure of their paper in brief at the end of the introduction.

3- The literature review about the problem under study is not adequate. I suggest the authors keep the introductory part up to date with recent relevant developments and publications.

4- Future recommendations should be added to assist other researchers in extending the presented research analysis.

Sincerely Yours

 

 

Author Response

Detailed Responses to Reviewers

Dear Editors and Reviewers:

Thank you for your letter and the reviewers’ comments concerning our manuscript entitled “SATNet: A Spatial Attention Based Network for Hyperspectral Image Classification”. These comments are valuable and very helpful for revising and improving our paper. We have studied the comments carefully and have made corrections that we hope meet with your approval. Following the instructions in your letter, we have uploaded the revised manuscript. The main corrections and our responses to the reviewers’ comments are as follows:

 

Responses to reviewer #2’s comments:

  1. The English writing of the paper needs to be improved. Please check the manuscript carefully for typos and grammatical errors; I found some within this manuscript, which have been excluded from my review. In addition, the English structure of the article, including punctuation, semicolons, and other structures, must be carefully reviewed.

Response: Thank you so much for your useful comments. We apologize for the language problems in the original manuscript and the inconvenience they caused in your reading. The manuscript has been thoroughly revised with assistance from remote sensing experts and a native English speaker with an appropriate research background.

 

 

  2. In the introduction, the authors did not provide a strong motivation for the paper and the obtained results. In addition, they should discuss the main contributions of their work in detail after the motivation part. Then they should summarize the main structure of their paper in brief at the end of the introduction.

Response: Thank you for your comments. We provide a clearer statement of the motivation and contribution of this manuscript in the introduction, and the structure of the paper is described in the introduction's last paragraph. Lines 106-117 further explain our motivation and contribution, and the structure of the article is introduced in lines 118-123. It reads as follows.

“The aforementioned work shows that CNNs are capable of effective feature extraction; however, the feature maps they produce contain a lot of spatially duplicated data. Additionally, CNNs ignore certain spatial information because of the local nature of convolution, which affects classification performance. The new framework for HSI classification proposed in this paper is based on the vision transformer (ViT) model, spatial attention networks, and 3D OctConv, which we intend to apply to agriculture in the future, as shown in Figure 1 (a). The main contributions of this paper are as follows: (1) Using 3D OctConv to extract the spectral-spatial features of HSIs, which reduces the spatially redundant information of the feature maps and decreases the computational effort. (2) Combining 3D OctConv and ViT, which can extract local-global features and enhance the extraction of spatial-spectral feature information, improves the classification performance.

The paper is structured as follows: the second part presents the related work involved in the proposed model as well as a general introduction to it; the third part describes the three HSI datasets used for the experiments, the experimental setup, and the performance comparison; the fourth part discusses the method of this paper and the other comparative methods; and the fifth part concludes and offers implications for future work.”

 

 

  3. The literature review about the problem under study is not adequate. I suggest the authors keep the introductory part up to date with recent relevant developments and publications.

Response: Thank you for your comments. In the introduction, we have incorporated several good techniques for hyperspectral image classification that have recently been developed by other researchers, such as the graph-based methods “Semi-Supervised Locality Preserving Dense Graph Neural Network With ARMA Filters and Context-Aware Learning for Hyperspectral Image Classification, Graph Sample and Aggregate-Attention Network for Hyperspectral Image Classification, Unsupervised Self-correlated Learning Smoothy Enhanced Locality Preserving Graph Convolution Embedding Clustering, Self-supervised Locality Preserving Low-pass Graph Convolutional Embedding, AF2GNN: Graph Convolution with Adaptive Filters and Aggregators, Deep hybrid: multi-graph neural network collaboration for hyperspectral image classification”. Additionally, we included more references from recent years and deleted older ones in lines 68-70 and 87-94.

 

 

  4. Future recommendations should be added to assist other researchers in extending the presented research analysis.

Response: We gratefully appreciate your valuable comment. We have further analyzed the shortcomings of the proposed approach to gain a clearer understanding of where we can improve, and we have added directions in the text that we will continue to explore in the future. It reads as follows.

“In the future, HSI classification needs to be approached from a small-sample viewpoint, such as data augmentation using adversarial generative networks, in the hope of lowering the cost of labeling hyperspectral images, which is high when done manually. In addition, the use of a lightweight transformer network structure to reduce the significant computational cost of self-attention in the ViT model should be further investigated. To increase the training and inference speed of the ViT model and, hopefully, obtain higher classification accuracy in less time, we will make the model lightweight in future studies.”

 

Due to our oversight, some of the IP data in Tables 4 and 5 were results obtained on different computer devices; we have corrected them in order to unify the experimental environment. Because of an earlier wrong parameter setting for SSRN on the SA dataset, we have also corrected the SSRN results in Table 8, its experimental result plots (Figure 10(d), Figure 13(c)), and the comparison between SATNet and SSRN in lines 495-498. See the revised draft for details.

In addition, we have corrected some spelling errors in this revision.

Special thanks to you for your comments.

We have tried our best to improve the manuscript and have made a number of changes. If there are any other deficiencies in the article, we look forward to your further comments.

We earnestly appreciate the editors’ and reviewers’ work and hope that the revised manuscript will be accepted for publication.

Once again, thank you very much for your comments and suggestions.

Kind regards,

Xinyi Zhong

 

 

Author Response File: Author Response.docx

Reviewer 3 Report

- The paper should be interesting;

- it is a good idea to add a block diagram of the proposed research (step by step);

- it is a good idea to add more photos of measurements, sensors + arrows/labels what is what (if any);

- What is the result of the analysis?

- figures should have high quality;

- references to figures in the text should be added;

- text should be formatted;

- please add photos of the application of the proposed research, 2-3 photos;

- what will society have from the paper?

- axes, labels of figures should be bigger;

- Is there a possibility to use the proposed research for other topics, for example thermal imaging: "Ventilation Diagnosis of Angle Grinder Using Thermal Imaging";

- please compare the advantages/disadvantages of other approaches etc.;

- references should be from the Web of Science 2020-2022 (50% of all references, 30 references at least);

- Conclusion: point out what you have done;

- please add some sentences about future work.

Author Response

Detailed Responses to Reviewers

Dear Editors and Reviewers:

Thank you for your letter and the reviewers’ comments concerning our manuscript entitled “SATNet: A Spatial Attention Based Network for Hyperspectral Image Classification”. These comments are valuable and very helpful for revising and improving our paper. We have studied the comments carefully and have made corrections that we hope meet with your approval. Following the instructions in your letter, we have uploaded the revised manuscript. The main corrections and our responses to the reviewers’ comments are as follows:

Responses to reviewer #3’s comments:

  1. The paper should be interesting.

Response: Thank you for your comments. We will pay more attention to our writing in the future to improve our writing skills and make our papers more readable.

 

 

  2. It is a good idea to add a block diagram of the proposed research (step by step).

Response: We are appreciative of the reviewers’ suggestions. We have added the module block diagram and related content at the beginning of Section 2 of the article. It reads as follows.

“The method proposed in this paper is a new model of spatial attention network based on 3D OctConv and ViT, and the flow chart of this model is shown in Figure 2. Spatial attention, 3D OctConv, and ViT feature extraction are the three main modules of the framework. This section introduces each of modules separately as well as the model.”

 Figure 2. Flow of framework for SATNet.
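A minimal sketch of the spatial-attention idea behind the first module can be written as follows (a simplified illustration of ours; the paper's actual module may, for example, apply a learned convolution to the pooled maps rather than summing them):

```python
import numpy as np

# Simplified spatial attention: pool a feature cube over channels (mean and max),
# squash the combined map into (0, 1), and reweight every spatial position.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(feat):             # feat: (C, H, W)
    avg_map = feat.mean(axis=0)          # (H, W) channel-wise average pooling
    max_map = feat.max(axis=0)           # (H, W) channel-wise max pooling
    attn = sigmoid(avg_map + max_map)    # per-pixel weight in (0, 1)
    return feat * attn[None, :, :]       # broadcast the weights over channels

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 5, 5))       # toy feature cube
y = spatial_attention(x)
print(y.shape)                           # (8, 5, 5): same shape, reweighted
```

The output keeps the input's shape; informative spatial positions are emphasized and the rest suppressed before the features flow into the 3D OctConv and ViT modules.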

 

 

  3. It is a good idea to add more photos of measurements, sensors + arrows/labels what is what (if any).

Response: Thank you for your comments. Although we have a hyperspectral instrument, the datasets we use are open source and were downloaded from the internet rather than taken by us, so we are unable to provide photos of the instruments used to acquire them.

 

 

  4. What is the result of the analysis?

Response: Thank you for your comments. We put both the experimental results and their analysis in the performance-comparison part of the experiments. We conducted comparison experiments on three datasets, and the results show that our method has higher classification accuracy than the other methods. It reads as follows.

“From the above results, it can be concluded that all three metrics (OA, AA, and Kappa) show that the classification performance of our method on these three datasets was significantly better than that of the other methods. To comprehensively evaluate the classification performance, Figures 8, 9 and 10 show, from a visualization perspective, the classification maps obtained using the different methods on the IP, UP, and SA datasets, respectively. The comparison shows that on the IP dataset, the classification map of Baseline produced the most noise points, with many pixels misclassified on the boundaries between categories, followed by the maps of 3DCNN and ViT. The map of this paper's method had the fewest noise points and the clearest discrimination, with OctaveVit and 3D_Octave close behind. Since the spatial resolution of the IP dataset is larger than that of the UP and SA datasets, it is more likely to produce confusion and is more difficult to classify; accordingly, the UP and SA classification maps produced fewer noise points than those of the IP dataset. On the UP dataset, the classification map of ViT produced the most noise points and the largest error, while the map of the method in this paper had the highest quality and the fewest misclassified pixels, with OctaveVit and 3D_Octave second. For the SA dataset, the classification map of SSRN produced the most noise points and the worst classification effect, followed by ViT, Baseline, and 3DCNN; the OctaveVit and 3D_Octave maps had fewer noise points, and the map of the proposed method had the fewest noise points and was closest to the ground truth map.

Figures 11, 12 and 13 show the confusion matrices obtained on the IP, UP, and SA datasets using Baseline, 3DCNN, SSRN, HybridSN, ViT, 3D_Octave, and OctaveVit, respectively. Each row represents the predicted class, each column the actual class, and the values on the diagonal the correctly predicted results. The confusion matrix provides an intuitive view of the performance of the classification algorithm on each category: by tabulating the predicted and true results per category, the number of correct and incorrect identifications for each category can be clearly seen. For the IP dataset, Figure 11(h) shows that 22 Class II samples were predicted as Class XI and 16 Class XVI samples as Class XII. For the UP dataset, Figure 12(h) shows that 11 Class I samples were predicted as Class VIII and 6 Class IV samples as Class I; in addition, 11 Class III samples were predicted as Class VIII, 26 Class IV samples as Class VIII, and 15 Class VIII samples as Class IV, indicating that Classes I, III, and IV are not easily distinguished from Class VIII. For the SA dataset, Figure 13(h) shows that 3 Class VI samples were predicted as Class V and 2 Class IV samples as Class V.”
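How a confusion matrix such as those in Figures 11-13 is assembled can be sketched as follows (toy labels of ours, following the paper's orientation with rows as predicted and columns as actual classes):

```python
import numpy as np

# Build a confusion matrix from predicted and true labels; the diagonal
# holds the correctly predicted samples for each class.

def confusion_matrix(y_pred, y_true, n_classes):
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for p, t in zip(y_pred, y_true):
        cm[p, t] += 1                    # row = predicted class, column = actual
    return cm

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_pred, y_true, 3)
print(cm)
print(int(np.trace(cm)))                 # 4 correctly classified samples
```

Off-diagonal entries such as `cm[0, 2]` count samples of one class predicted as another, which is exactly how the Class II / Class XI confusions above are read off the figures.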

 

 

  5. Figures should have high quality.

Response: Thank you for your comments, and we apologize for the inconvenience. We have processed the figures to improve their quality; Figures 7 and 11-13 have been modified.

 

 

  6. References to figures in the text should be added.

Response: Thank you for your comments. We have added references to the figures; cited references have been added for Figures 4 and 5.

 

  7. Text should be formatted.

Response: Thank you for the reminder. We have corrected the text formatting. We modified the properties of all tables in the text to be non-wrapped and left-justified, with Tables 6-8 indented 3.5 cm from the left and the rest indented 4.5 cm. In addition, we bolded the serial numbers of the subfigure titles in Figures 7-13, for example (a).

 

 

  8. Please add photos of the application of the proposed research, 2-3 photos.

Response: Thank you for your comments. We have added two photos of the application in the introduction section. Figure 1(a) shows the application of hyperspectral images in agriculture, such as crop classification, disease detection, etc. Figure 1(b) shows the application of hyperspectral images in Earth observation.

 

 

  9. What will society have from the paper?

Response: Thank you for your comments. The purpose of hyperspectral image classification is to classify land covers using the theory and technology of classification. Improving the classification accuracy of hyperspectral images provides a solid and reliable foundation of land-cover information for subsequent hyperspectral image applications, such as agricultural monitoring, forest vegetation classification, and marine water pollution monitoring.

 

 

  10. Axes, labels of figures should be bigger.

Response: Thank you for the reminder. We have enlarged the axes and labels of the figures; Figures 7 and 11-13 have been modified.

 

 

  11. Is there a possibility to use the proposed research for other topics, for example thermal imaging: "Ventilation Diagnosis of Angle Grinder Using Thermal Imaging".

Response: Thank you for your comments. The proposed method can be applied in agriculture, for example to classify different crops in the field using hyperspectral images of farmland taken by drones. The research approach in this paper uses the rich spectral and spatial features of hyperspectral images to classify land covers. Hyperspectral images have hundreds of continuous spectral bands, whereas a thermal image is a map of infrared radiant energy, so from this perspective we think the method may not be very applicable.

 

 

  12. Please compare the advantages/disadvantages of other approaches etc.

Response: Thank you for your comments. Baseline uses a fully connected layer for hyperspectral image (HSI) classification; because the input of a fully connected layer is one-dimensional, the rich spatial information in the HSI data cube must be vectorized before it can be processed together with the spectral information, so Baseline cannot fully utilize spectral-spatial information. 3DCNN uses 3D convolution kernels, which match the cube structure of HSI data and can therefore exploit the spectral and spatial features of HSIs more fully, but it is computationally expensive and generates a large amount of spatially redundant information. HybridSN adds 2D convolution kernels along the spectral channel on top of the 3DCNN, which further improves the network's spectral-spatial feature extraction capability, but it still suffers from the problems above. SSRN adds spectral and spatial residual blocks to the network, which alleviates the degradation of classification accuracy, but because it uses all spectral bands there is a large amount of spectral redundancy, which increases the computational cost. ViT, 3D_Octave, and OctaveVit are our ablation experiments. ViT can extract global features but requires a large number of labeled samples for training, so it performs poorly with the small number of samples used. 3D_Octave extracts spectral-spatial features well and reduces spatially redundant information, but it has many more model parameters. OctaveVit combines these advantages while reducing the number of model parameters.

  1. References should be from the web of science 2020-2022 (50% of all references, 30 references at least).

Response: Thank you for your comments. We have updated and added the references cited in the article to ensure that at least 50% of them are from 2020-2022, for example:

  1. Ding, Y.; Zhang, Z.; Zhao, X.; Hong, D.; Li, W.; Cai, W.; Zhan, Y. AF2GNN: Graph Convolution with Adaptive Filters and Aggregator Fusion for Hyperspectral Image Classification. Inf. Sci. 2022, 602, 201-219.
  2. Yao, D.; Zhi-li, Z.; Xiao-feng, Z.; Wei, C.; Fang, H.; Yao-ming, C.; Cai, W.-W. Deep Hybrid: Multi-Graph Neural Network Collaboration for Hyperspectral Image Classification. Defence Technology 2022.
  3. Ding, Y.; Zhao, X.; Zhang, Z.; Cai, W.; Yang, N. Graph Sample and Aggregate-Attention Network for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1-5.
  4. Ding, Y.; Zhang, Z.; Zhao, X.; Cai, W.; Yang, N.; Hu, H.; Huang, X.; Cao, Y.; Cai, W. Unsupervised Self-Correlated Learning Smoothy Enhanced Locality Preserving Graph Convolution Embedding Clustering for Hyperspectral Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1-16.
  5. Ding, Y.; Zhang, Z.; Zhao, X.; Cai, Y.; Li, S.; Deng, B.; Cai, W. Self-Supervised Locality Preserving Low-Pass Graph Convolutional Embedding for Large-Scale Hyperspectral Image Clustering. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1-16.
  6. Madani, H.; McIsaac, K. Distance Transform-Based Spectral-Spatial Feature Vector for Hyperspectral Image Classification with Stacked Autoencoder. Remote Sens. 2021, 13, 1732.
  7. Paoletti, M.E.; Haut, J.M. Adaptable Convolutional Network for Hyperspectral Image Classification. Remote Sens. 2021, 13, 3637.
  8. Liu, Q.; Wu, Z.; Jia, X.; Xu, Y.; Wei, Z. From Local to Global: Class Feature Fused Fully Convolutional Network for Hyperspectral Image Classification. Remote Sens. 2021, 13, 5043.

  1. Conclusion: point out what have you done.

Response: Thank you for your comments. We had previously placed the summary of the work in the Discussion; we have now moved it to the Conclusion and updated the content of both the Discussion and the Conclusion. Here is what we have done.

“In this paper, a novel spatial attention network for HSI classification called SATNet was proposed. In our approach, we first employ PCA to reduce spectral redundancy by dimensionality reduction of the spectral bands. Secondly, to decrease the redundancy of spatial information, 3D OctConv is utilized. In addition, we apply a spatial attention module to process the input feature maps, enabling adaptive selection of spatial information to emphasize pixels that are similar or categorically useful to the central pixel. To extract the overall information of the feature maps and further enhance the spectral and spatial properties, the processed results are concatenated with the three-dimensional low-frequency map produced by OctConv and converted into two dimensions. The results show that classification performance is improved using our method: on the IP, UP and SA datasets, the OA reaches 99.06%, 99.67% and 99.984% respectively. The extensive experimental results demonstrate the superiority of our approach.”
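The first stage of the pipeline quoted above, PCA along the spectral axis, can be sketched in a few lines. This is a minimal illustration on synthetic data with hypothetical sizes (a 20×20×200 cube reduced to 30 components), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic HSI cube: 20x20 pixels, 200 spectral bands (sizes hypothetical).
H, W, B = 20, 20, 200
cube = rng.standard_normal((H, W, B))

# PCA along the spectral axis: treat every pixel as a B-dimensional sample.
X = cube.reshape(-1, B)
X = X - X.mean(axis=0)                  # center each band
_, _, Vt = np.linalg.svd(X, full_matrices=False)

k = 30                                  # retained principal components
reduced = (X @ Vt[:k].T).reshape(H, W, k)

print(reduced.shape)  # (20, 20, 30)
```

The reduced cube keeps the spatial grid intact while shrinking the spectral dimension, which is what allows the subsequent 3D OctConv and spatial attention stages to operate on a much smaller volume.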

  1. Please add some sentences about future work.

Response: We gratefully appreciate your valuable comment. We have further analyzed the shortcomings of the proposed approach to gain a clearer understanding of where it can be improved, and we have added to the text the directions we will continue to explore. Here is what we plan to do in the future.

“In the future, HSI classification needs to be studied from a small-sample viewpoint, for example through data augmentation using adversarial generative networks, in the hope of lowering the cost of labeling hyperspectral images, which is high when done manually. In addition, the use of a lightweight Transformer structure to reduce the significant computational cost of self-attention in the ViT model should be further investigated. In future work we will make the model lightweight in order to increase the training and inference speed of the ViT model and, hopefully, obtain higher classification accuracy in less time.”

 

Due to our oversight, some of the IP data in Table 4 and Table 5 were results obtained on different computer devices; we have corrected them to unify the experimental environment. Because the parameters of SSRN on the SA dataset were previously set incorrectly, we have corrected the SSRN results in Table 8 and the corresponding experimental result plots (Figure 10(d), Figure 13(c)), as well as the comparison between SATNet and SSRN in rows 495-498. See the revised draft for details.

In addition, we have corrected some spelling errors in this revision of the manuscript.

Special thanks to you for your comments.

We have tried our best to improve the manuscript and have made a number of changes. If there are any remaining deficiencies in the article, we look forward to your further comments.

We earnestly appreciate the Editors' and Reviewers' hard work and hope that the revised manuscript will be accepted for publication.

Once again, thank you very much for your comments and suggestions.

Kind regards,

Xinyi Zhong


Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

No more comments.

Reviewer 3 Report

---
