Article

TRS: Transformers for Remote Sensing Scene Classification

by Jianrong Zhang, Hongwei Zhao and Jiao Li
1 College of Computer Science and Technology, Jilin University, Changchun 130012, China
2 Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
3 Department of Jilin University Library, Jilin University, Changchun 130012, China
* Author to whom correspondence should be addressed.
Remote Sens. 2021, 13(20), 4143; https://doi.org/10.3390/rs13204143
Submission received: 11 September 2021 / Revised: 11 October 2021 / Accepted: 13 October 2021 / Published: 16 October 2021
(This article belongs to the Special Issue Deep Learning for Remote Sensing Image Classification)

Abstract

Remote sensing scene classification remains challenging due to the complexity and variety of scenes. With the development of attention-based methods, Convolutional Neural Networks (CNNs) have achieved competitive performance in remote sensing scene classification tasks. As an important attention-based model, the Transformer has achieved great success in natural language processing and has recently been applied to computer vision tasks. However, most existing methods divide the original image into multiple patches and encode the patches as the input of the Transformer, which limits the model’s ability to learn the overall features of the image. In this paper, we propose a new remote sensing scene classification method, the Remote Sensing Transformer (TRS), a powerful “pure CNNs → Convolution + Transformer → pure Transformers” structure. First, we integrate self-attention into ResNet in a novel way, replacing the 3 × 3 spatial convolutions in the bottleneck with our proposed Multi-Head Self-Attention layer. Then, we connect multiple pure Transformer encoders to further improve representation learning, relying entirely on attention. Finally, we use a linear classifier for classification. We train our model on four public remote sensing scene datasets: UC-Merced, AID, NWPU-RESISC45, and OPTIMAL-31. The experimental results show that TRS exceeds state-of-the-art methods and achieves higher accuracy.
Keywords: transformers; deep convolutional neural networks; multi-head self-attention; remote sensing scene classification
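To make the first step of the abstract concrete, the sketch below illustrates in PyTorch the general idea of swapping the 3 × 3 spatial convolution of a ResNet bottleneck for multi-head self-attention over spatial positions. It is a minimal sketch only: class and argument names (e.g., MHSABottleneck, mid_ch) are illustrative assumptions, positional encodings and other details of the published TRS model are omitted, and the code is not the authors' implementation.

    # Minimal sketch (assumed, not the paper's code): a ResNet-style bottleneck
    # whose 3 x 3 spatial convolution is replaced by multi-head self-attention
    # over the spatial positions of the feature map.
    import torch
    import torch.nn as nn

    class MHSABottleneck(nn.Module):
        def __init__(self, in_ch, mid_ch, heads=4):
            super().__init__()
            # 1 x 1 convolution reduces channels, as in a standard bottleneck.
            self.reduce = nn.Sequential(
                nn.Conv2d(in_ch, mid_ch, kernel_size=1, bias=False),
                nn.BatchNorm2d(mid_ch),
                nn.ReLU(inplace=True),
            )
            # Multi-head self-attention stands in for the usual 3 x 3 convolution.
            self.attn = nn.MultiheadAttention(mid_ch, heads, batch_first=True)
            # 1 x 1 convolution expands channels back for the residual connection.
            self.expand = nn.Sequential(
                nn.Conv2d(mid_ch, in_ch, kernel_size=1, bias=False),
                nn.BatchNorm2d(in_ch),
            )
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            identity = x
            y = self.reduce(x)                     # B x C x H x W
            b, c, h, w = y.shape
            tokens = y.flatten(2).transpose(1, 2)  # B x (H*W) x C: one token per spatial position
            tokens, _ = self.attn(tokens, tokens, tokens)
            y = tokens.transpose(1, 2).reshape(b, c, h, w)
            y = self.expand(y)
            return self.relu(y + identity)         # residual addition, as in ResNet

    if __name__ == "__main__":
        block = MHSABottleneck(in_ch=256, mid_ch=64)
        out = block(torch.randn(2, 256, 14, 14))
        print(out.shape)  # torch.Size([2, 256, 14, 14])

In the full model described by the abstract, attention-augmented convolutional stages of this kind would feed a stack of pure Transformer encoders and a linear classification head, following the "Convolution + Transformer → pure Transformers" progression.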

Share and Cite

MDPI and ACS Style

Zhang, J.; Zhao, H.; Li, J. TRS: Transformers for Remote Sensing Scene Classification. Remote Sens. 2021, 13, 4143. https://doi.org/10.3390/rs13204143
