Article
Peer-Review Record

Vision Transformer-Based Ensemble Learning for Hyperspectral Image Classification

Remote Sens. 2023, 15(21), 5208; https://doi.org/10.3390/rs15215208
by Jun Liu 1, Haoran Guo 1, Yile He 1 and Huali Li 2,*
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 7 September 2023 / Revised: 18 October 2023 / Accepted: 30 October 2023 / Published: 2 November 2023
(This article belongs to the Special Issue Advances in Hyperspectral Remote Sensing Image Processing)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

In this paper, hyperspectral image classification is accomplished by ensembling seven different ViT variants. Specifically, to balance the tension between the limited number of training samples and the correlation between training and testing samples, spatial shuffling is employed as a preprocessing step. The reviewer does not consider the current manuscript suitable for publication.

1. The novelty of the proposed ViT-based ensemble learning framework is quite limited; ensembling by itself does not make a novel work. Moreover, according to the experimental results, the classification accuracy is not improved substantially over that achieved by the individual ViT variants, especially considering the added complexity of ensembling them.

2. The spatial shuffle preprocessing appears to be critical to the accuracy improvement, judging by the ablation results. The authors should therefore clarify the difference between the spatial shuffling employed in this manuscript and that employed in Ref. [46].
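For context on the shuffling discussed above: spatial shuffling of this kind is commonly implemented by randomly permuting the pixel positions within each training patch while keeping every pixel's spectral vector intact, so a classifier cannot exploit the spatial arrangement shared by neighboring training and testing samples. A minimal sketch, assuming a (height, width, bands) patch layout; the function name and shapes are illustrative, not taken from the manuscript:

```python
import numpy as np

def spatial_shuffle(patch, rng=None):
    """Randomly permute the spatial positions of a hyperspectral patch.

    patch: array of shape (H, W, B) -- H x W pixels, B spectral bands.
    Each pixel's spectral vector is kept intact; only pixel positions move.
    """
    rng = rng or np.random.default_rng()
    h, w, b = patch.shape
    flat = patch.reshape(h * w, b)   # one row per pixel
    perm = rng.permutation(h * w)    # random spatial permutation
    return flat[perm].reshape(h, w, b)

# Example: shuffle a 5x5 patch with 10 bands.
patch = np.arange(5 * 5 * 10, dtype=float).reshape(5, 5, 10)
shuffled = spatial_shuffle(patch, np.random.default_rng(0))
# The multiset of spectral vectors is unchanged; only positions differ.
```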

Author Response

Please see the attached file. Thanks.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The authors propose vision transformer-based ensemble learning for hyperspectral image classification. The proposed method provides a new solution for high-accuracy classification of hyperspectral images. However, it has the following issues that need to be addressed.

1. Why did the authors use seven ViT variants rather than more? Generally, ensemble learning needs dozens of base classifiers to construct an ensemble model.
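For reference on the point above: the first-level combination in such a framework is often a simple plurality vote over the base classifiers' per-pixel predictions, and even a handful of base models can help if they err on different pixels. A minimal sketch under that assumption (not the authors' actual implementation):

```python
import numpy as np

def majority_vote(predictions):
    """Combine per-pixel class predictions from several base classifiers.

    predictions: array of shape (n_classifiers, n_pixels) of class labels.
    Returns the most frequent label per pixel (ties go to the lowest label).
    """
    n_classifiers, n_pixels = predictions.shape
    n_classes = predictions.max() + 1
    votes = np.zeros((n_classes, n_pixels), dtype=int)
    for p in predictions:                       # tally each classifier's vote
        votes[p, np.arange(n_pixels)] += 1
    return votes.argmax(axis=0)

# Seven base classifiers, five pixels:
preds = np.array([
    [0, 1, 2, 1, 0],
    [0, 1, 2, 2, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 2, 1, 0],
    [0, 2, 2, 1, 1],
    [0, 1, 2, 1, 0],
    [0, 1, 2, 2, 0],
])
print(majority_vote(preds))  # -> [0 1 2 1 0]
```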

2. The Abstract needs to be reorganized so that it flows logically. For example, the sentence "This framework adopts a two-level ensemble strategy to achieve the final pixel-level classification" appears abruptly; as written, readers cannot extract useful information from it.

3. The introduction to Transformer-based HSI classification is inadequate. A search of the academic literature reveals many Transformer-based HSI classification works.

4. The motivation for the proposed method needs to be strengthened. First, why adopt ensemble learning rather than other methods to address the small-sample problem? Second, how is it ensured that the weak classifiers generated by the ViT variants are sufficiently diverse?

5. Currently, only two deep learning methods, CNN2D and PPF, are compared with the proposed method. To demonstrate its effectiveness, it is recommended that more deep-learning-based methods be added to the comparison.

6. This paper aims to address high-accuracy classification under limited samples. However, the number of training samples does not satisfy the usual small-sample condition: the training numbers per class for the SV and UP datasets are very high, reaching several hundred, whereas a training set is usually called small-sample only when it contains a few samples per class. The reviewer therefore suggests that the authors use small training sets in the experiments.

7. It is difficult to discern the differences among the classification results of the various methods in Figure 5. It is suggested that the authors highlight differing regions by enlarging them, or classify the unlabeled background with reference to the figure to demonstrate the consistency of the classification results.

8. Conclusion: the authors should not only summarize the advantages of the proposed method but also point out its deficiencies for future research.

9. Grammar and typos:

line 28: Hyperspectral image --> hyperspectral image

line 51: Random Forests (RF) --> Random Forests (RFs)
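For context on the small-sample point raised above: a common protocol is to draw a fixed, small number of labeled samples per class (e.g. 5-20) for training and use the remainder for testing. A minimal sketch of such a stratified split; the per-class count of 10 is illustrative, not a prescription:

```python
import numpy as np

def small_sample_split(labels, per_class=10, seed=0):
    """Pick `per_class` training indices per class; the rest are test indices.

    labels: 1-D array of class labels for all labeled pixels.
    """
    rng = np.random.default_rng(seed)
    train_idx = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)   # all pixels of class c
        rng.shuffle(idx)
        train_idx.extend(idx[:per_class])   # keep a small random subset
    train_idx = np.array(sorted(train_idx))
    test_idx = np.setdiff1d(np.arange(len(labels)), train_idx)
    return train_idx, test_idx

labels = np.repeat([0, 1, 2], 100)          # 3 classes, 100 pixels each
train, test = small_sample_split(labels, per_class=10)
# 10 training pixels per class, 270 test pixels in total.
```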

Author Response

Please see the attached file. Thanks.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have addressed my concerns. It seems that it is ready for publication.
