Article

A Feature Embedding Network with Multiscale Attention for Hyperspectral Image Classification

1 School of Electronic Engineering, Xidian University, Xi’an 710071, China
2 Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, Collaborative Innovation Center of Quantum Information of Shaanxi Province, School of Artificial Intelligence, Xidian University, Xi’an 710071, China
* Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(13), 3338; https://doi.org/10.3390/rs15133338
Submission received: 15 May 2023 / Revised: 23 June 2023 / Accepted: 28 June 2023 / Published: 29 June 2023
(This article belongs to the Special Issue Deep Learning for Hyperspectral Image Classification)

Abstract

In recent years, convolutional neural networks (CNNs) have been widely used in the field of hyperspectral image (HSI) classification and achieved good classification results due to their excellent spectral–spatial feature extraction ability. However, most methods use the deep semantic features at the end of the network for classification, ignoring the spatial details contained in the shallow features. To solve the above problems, this article proposes a hyperspectral image classification method based on a Feature Embedding Network with Multiscale Attention (MAFEN). Firstly, a Multiscale Attention Module (MAM) is designed, which is able to not only learn multiscale information about features at different depths, but also extract effective information from them. Secondly, the deep semantic features can be embedded into the low-level features through the top-down channel, so that the features at all levels have rich semantic information. Finally, an Adaptive Spatial Feature Fusion (ASFF) strategy is introduced to adaptively fuse features from different levels. The experimental results show that the classification accuracies of MAFEN on four HSI datasets are better than those of the compared methods.

1. Introduction

Hyperspectral Image (HSI) is a three-dimensional data cube composed of hundreds of continuous spectral bands, which contains rich spectral–spatial information and is very helpful for ground object recognition. Therefore, HSI classification has been widely applied in environmental monitoring [1,2], mineral exploration [3], precision agriculture [4,5] and other fields.
In the early stages of HSI classification research, most methods mainly focused on the utilization of spectral features, such as kernel-based support vector machines [6], multinomial logistic regression [7,8] and random subspaces [9,10]. However, these methods only consider spectral information and ignore spatial features, so it is difficult for them to obtain good classification performance.
As deep-learning-based methods became widely applied and achieved excellent results in image classification [11,12], semantic segmentation [13] and natural language processing [14], researchers began to introduce them into HSI classification [15,16,17], and proposed many classification methods based on Convolutional Neural Networks (CNNs) [18,19,20]. Hu et al. [21] proposed Deep Convolutional Neural Networks (DCNNs), which used multiple 1D-CNNs to extract spectral features and improve the classification performance. Li et al. [22] adopted 3D-CNNs to effectively extract spectral–spatial features, thereby improving the classification performance. Since then, more deep learning methods based on spectral–spatial feature extraction have been used for HSI classification. Zhong et al. [23] designed an end-to-end Spectral–Spatial Residual Network (SSRN), which used continuous residual blocks to learn spectral and spatial features separately, so as to extract more discriminative features. Roy et al. [24] proposed Hybrid Spectral CNN (HybridSN) by combining the characteristics of 3D-CNN and 2D-CNN, which reduced the model’s complexity and obtained satisfactory performance. Mu et al. [25] designed a U-shaped deep network model with principal component features as the model input and edge features of space as the model label, which realized the adaptive fusion of these two features. The fusion features were combined with the spectral features extracted by the Long Short-Term Memory (LSTM) model for spectral–spatial feature classification. To fully exploit the spectral–spatial features of HSIs, Huang et al. [26] proposed a Dual-Branch Attention-Assisted CNN (DBAA-CNN). This network could extract sufficient diverse information, achieving higher classification accuracy. Lu et al. [27] proposed a new dual-branch network structure, where each branch learned pixel-level spectral features and patch-level spectral–spatial features, respectively. The features from the two branches were then combined to further enhance classification performance.
In order to obtain more abundant local spatial information, various classification methods based on multiscale feature extraction have been proposed. Yu et al. [28] proposed a Dual-Channel Convolution Network (DCCN) to maximize the use of global and multiscale information from HSIs. Zhang et al. [29] proposed a Multiscale Dense Network (MSDN), which made full use of different scales of information in the network to realize deep feature extraction and multiscale feature fusion. To utilize the correlation information between different levels, Song et al. [30] proposed a Deep Feature Fusion Network (DFFN), which introduced residual learning to alleviate the overfitting problem and fused the features of different levels to improve the classification accuracy.
Recently, a large number of studies have shown [31,32,33] that different spectral bands and spatial pixels have different contributions to HSI classification tasks, and highlighting bands and pixels rich in effective information through the attention mechanism can significantly improve HSI classification performance. Sun et al. [34] proposed a Spectral–Spatial Attention Network (SSAN). Firstly, a simple Spectral–Spatial Network (SSN) was constructed to extract spectral–spatial features. Then, the attention module was embedded into the SSN to suppress the interfering pixels, which achieved good results on three classical datasets, but the low computational efficiency of the attention module made it time consuming to train the SSAN. Lei et al. [35] proposed a Local Attention Network (LANet) to improve the semantic segmentation of HSIs by enhancing the scene-related representation in the encoding and decoding stages, which greatly improved the semantic representation of low-level features and further improved the segmentation performance. In addition, Transformers have also begun to be used in HSI classification due to their ability to model global features of images. Hong et al. [36] used Transformers to rethink the HSI classification process from a sequence perspective and proposed a new backbone network, SpectralFormer, to achieve high performance for the HSI classification task. Sun et al. [37] proposed a Spectral–Spatial Feature Tokenization Transformer (SSFTT) to capture spectral–spatial features and high-level semantic features. The encoder module of the Transformer was introduced into the network for feature representation and learning, which achieved good classification results and greatly improved the computational efficiency.
HSI classification is a kind of pixel-level classification, and the detail information of edges and shapes is crucial to improving the classification accuracy. However, the general HSI classification model based on deep learning usually only focuses on the use of deep semantic features for classification, and ignores the shallow features, which is not conducive to further improvement of classification performance. The Feature Pyramid Network (FPN) [38] embedded high-level features rich in semantic information into shallow features rich in detail information through a top-down path, so that all levels of features had rich semantic information. It achieved good results in the application of object detection [39,40], instance segmentation [41] and other computer vision fields. Based on FPN, Wang et al. [42] proposed an FPN with dual-filter feature fusion for HSI classification. The enhanced multiscale features were obtained by embedding dual-filter feature fusion modules in each horizontal branch of an FPN, and then the final feature representation obtained by fusing features of each level from top to bottom was used for classification, which achieved good performance. Fang et al. [43] used a convolutional attention module in bottom-up feature extraction to extract effective information, and then used a bidirectional pyramid for instance segmentation of HSI. Chen et al. [44] introduced coordinate attention in each horizontal branch to obtain more HSI features, and then added and fused the features of each level of FPN to achieve effective HSI classification of small samples.
Inspired by the idea of the FPN, this article proposes a Feature Embedding Network with Multiscale Attention (MAFEN) to make full use of both deep and shallow features through bottom-up feature extraction and top-down feature embedding. Firstly, a Multiscale Attention Module (MAM) is designed to express rich information for different levels of features. MAM first uses convolutional kernels with different receptive field sizes to extract multiscale information, and then uses spectral–spatial attention to suppress redundant information at each scale, so as to highlight the bands and pixels rich in effective information. Secondly, the deep semantic information is embedded into the shallow features through the top-down channel to enhance the representation ability of the features at different levels. Finally, an Adaptive Spatial Feature Fusion (ASFF) [45] strategy is introduced to automatically learn the fusion weight of each feature map through the network, so as to realize the adaptive fusion of features at different levels.
The main contributions of this article are as follows:
  • The MAM is designed to enhance the representation ability of features at different levels. Firstly, multiscale convolution is used to obtain rich information representation, and then the attention mechanism is used to highlight important information.
  • The ASFF strategy is introduced for feature fusion in HSIs to adaptively fuse features of different levels and improve classification performance.
  • The MAFEN is proposed, where the deep features are embedded into the shallow features through the top-down channel to enrich their semantic information, and the shallow features are adaptively fused with features at other levels.
The rest of this article is organized as follows: The MAFEN method is described in detail in Section 2. Section 3 presents the experiments and analysis. Section 4 concludes the article.

2. The Proposed Method

In this section, our proposed MAFEN for HSI classification is described in detail, and its overall framework is shown in Figure 1. Firstly, the MAFEN backbone network uses 3D-CNN and 2D-CNN to extract features at different depths from the dimensionality-reduced hyperspectral image. Secondly, the MAM is designed to enhance the representation ability of features at different levels through multiscale convolution, and the spectral–spatial attention mechanism is used to highlight important information and suppress redundant information. Then, the high-level semantic information is embedded into the low-level local spatial information through the top-down channel so that the features at different levels have rich semantics. Finally, ASFF is introduced to adaptively fuse the features at different levels and obtain the final feature representation for classification.

2.1. Multiscale Attention Module

CNNs are limited by fixed-size receptive fields, which may result in insufficient local spatial features. To obtain richer local information from features at different levels, a multiscale approach can be used to control the sizes of the convolutional kernels, thus obtaining different receptive fields. Moreover, the feature maps may contain redundant information that could degrade the representation performance, thereby affecting the final classification results. Therefore, we utilize spectral–spatial attention to extract crucial information from the features obtained using multiscale convolutions to enhance classification performance. We design an MAM that combines multiscale convolutions and spectral–spatial attention to obtain richer and more effective feature representations. Figure 2a,b illustrates the overall framework of the MAM and the structure of the spectral–spatial attention module, respectively, as described below.
As shown in Figure 2, firstly, the MAM convolved the features $F_i$ ($i = 1, 2, 3$) of different levels with three convolutional kernels of different sizes to obtain multiscale information, where the sizes of the convolutional kernels were $1 \times 1$, $3 \times 3$ and $5 \times 5$, respectively, and $F_1$, $F_2$ and $F_3$ represent the extracted low-level, mid-level and high-level features, respectively. Then, the spectral–spatial attention modules were employed to extract effective information from the features extracted by each convolutional kernel, where spectral attention and spatial attention were cascaded. Finally, the three features were fused by element-wise summation.
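As a rough illustration of this multiscale aggregation, the following PyTorch sketch builds the three parallel branches and sums their outputs; it is not the authors' released code, and the cascaded spectral–spatial attention is left as a pluggable placeholder (the make_attention argument, defaulting to nn.Identity) until the two attention blocks are sketched in the next subsections.

```python
# A hedged sketch of the MAM in Figure 2a: parallel 1x1, 3x3 and 5x5
# convolutions extract multiscale information, each branch passes through an
# attention submodule, and the three outputs are fused by summation.
import torch
import torch.nn as nn

class MAM(nn.Module):
    def __init__(self, channels, make_attention=nn.Identity):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5)])                 # three receptive-field sizes
        # Placeholder for the cascaded spectral-spatial attention described in
        # Sections 2.1.1 and 2.1.2; nn.Identity keeps this sketch runnable.
        self.attention = nn.ModuleList([make_attention() for _ in range(3)])

    def forward(self, x):                        # x: (B, channels, s, s)
        outs = [attn(conv(x)) for conv, attn in zip(self.branches, self.attention)]
        return outs[0] + outs[1] + outs[2]       # element-wise summation
```

Once the attention blocks below are defined, an attention factory such as lambda: nn.Sequential(SpectralAttention(channels), SpatialAttention()) could be passed in place of the default placeholder.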

2.1.1. Spectral Attention

The main purpose of spectral attention is to generate band weights $W_{spe}$ to recalibrate the importance of each spectral band. Considering that the patch block may contain pixels from other classes, using global average pooling may introduce interference to the pixels of the current class. Therefore, we only used the center vector $p_i$ to generate the weight $W_{spe}$.
The specific structure of the spectral attention module is shown in Figure 3. Firstly, the center vector $p_i \in \mathbb{R}^{1 \times 1 \times b}$ was taken from the input cube $F \in \mathbb{R}^{s \times s \times b}$, where $s \times s$ was the spatial size of $F$ and $b$ was the number of bands. Then, the band weight $W_{spe} \in \mathbb{R}^{1 \times 1 \times b}$ was obtained through the calculation of two convolutional layers with a kernel size of $1 \times 1$, as shown in Equation (1).
$W_{spe} = \sigma(W_2 \ast \delta(W_1 \ast p_i)),$  (1)
where $\sigma$ and $\delta$ represent the sigmoid and ReLU activation functions, respectively, $W_1$ and $W_2$ are the weight parameters of the two convolutional layers, and $\ast$ represents the convolution operation. Finally, as shown in Figure 2b, the band weight $W_{spe}$ was used to recalibrate the bands in the feature $F$ to highlight the useful spectral information, using Equation (2).
$\tilde{F} = W_{spe} \otimes F,$  (2)
where $\otimes$ represents element-wise multiplication and $\tilde{F}$ is the recalibrated feature.
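A minimal PyTorch sketch of this spectral attention, following Equations (1) and (2), is given below; the class name and the channel reduction ratio r are illustrative assumptions rather than details taken from the paper.

```python
# A hedged sketch of the spectral attention in Figure 3: the center vector of
# the patch passes through two 1x1 convolutions (ReLU, then sigmoid) to produce
# per-band weights that recalibrate the input feature.
import torch
import torch.nn as nn

class SpectralAttention(nn.Module):
    def __init__(self, bands, r=4):
        super().__init__()
        hidden = max(bands // r, 1)              # reduction ratio r is an assumption
        self.fc = nn.Sequential(                 # Equation (1): two 1x1 convolutions
            nn.Conv2d(bands, hidden, kernel_size=1), nn.ReLU(),
            nn.Conv2d(hidden, bands, kernel_size=1), nn.Sigmoid())

    def forward(self, x):                        # x: (B, bands, s, s)
        s = x.shape[-1]
        p = x[:, :, s // 2, s // 2].unsqueeze(-1).unsqueeze(-1)  # center vector p_i
        return x * self.fc(p)                    # Equation (2): W_spe recalibrates F
```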

2.1.2. Spatial Attention

Spatial attention aims to enhance the spatial information for pixels belonging to the same class as that of the central pixel, while suppressing pixels of other classes. Therefore, the spatial weight $W_{spa}$ should have the same width and height as those of the input feature $F$, with the specific structure shown in Figure 4. Firstly, global max pooling was applied to the input feature $F$ along the channel direction, as shown in Equation (3).
$F_{\max} = \max_{c} F(i, j),$  (3)
where $F(i, j)$ represents the values at position $(i, j)$ in the feature $F \in \mathbb{R}^{s \times s \times b}$, $\max_{c}$ represents taking the maximum value along the channel direction $c$ and $F_{\max} \in \mathbb{R}^{s \times s}$ is the feature map after global max pooling. Then, it is passed through two 2D convolutional layers to generate the spatial weight $W_{spa} \in \mathbb{R}^{s \times s}$, as shown in Equation (4).
$W_{spa} = \sigma(\delta(F_{\max} \ast W_1) \ast W_2),$  (4)
where $W_1$ and $W_2$ are the weight parameters of the two convolutional layers, $\sigma$ and $\delta$ represent the sigmoid and ReLU activation functions, respectively, and $\ast$ denotes the convolution operation. Finally, as shown in Figure 2b, the spatial weight $W_{spa}$ is used to recalibrate the spatial information in the feature $F$ and highlight the useful spatial information, using Equation (5).
$\tilde{F} = W_{spa} \otimes F,$  (5)
where $\otimes$ represents element-wise multiplication.
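A corresponding sketch of the spatial attention in Equations (3)–(5) follows; the hidden width and the 3 × 3 kernel size of the two convolutional layers are assumptions, since the text only specifies that two 2D convolutional layers are used.

```python
# A hedged sketch of the spatial attention in Figure 4: channel-wise max
# pooling gives F_max, two 2D convolutions (ReLU, then sigmoid) produce the
# spatial weight W_spa, which recalibrates the feature.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        self.conv = nn.Sequential(               # Equation (4): two 2D convolutions
            nn.Conv2d(1, hidden, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 1, kernel_size=3, padding=1), nn.Sigmoid())

    def forward(self, x):                        # x: (B, bands, s, s)
        f_max = x.max(dim=1, keepdim=True).values   # Equation (3): max over channels
        return x * self.conv(f_max)              # Equation (5): W_spa recalibrates F
```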

2.2. Feature Embedding Network

Deep neural networks learn the fine-grained features of local objects in HSIs in shallow layers, and high-level semantic features in deep layers. However, during the deep learning process, shallow features are often lost or even disappear, so they are generally not involved in the final HSI classification. In addition, different-depth features have different levels of information representation, and fully utilizing information at different levels is beneficial to improving the effectiveness of HSI classification. In this article, we propose a new Multiscale Attention Feature Embedding Network. The backbone of MAFEN consists of a spectral–spatial feature extraction channel and a deep feature embedding channel. The detailed description of the MAFEN is as follows.
Let $H \in \mathbb{R}^{w \times h \times b}$ represent the original HSI data, where $w$ and $h$ represent the width and height of the spatial dimension, respectively, and $b$ is the number of spectral bands. Each pixel in $H$ corresponds to a one-hot label vector $Y \in \mathbb{R}^{1 \times 1 \times K}$, where $K$ is the number of land cover classes. HSIs have rich spectral information, which leads to a large number of spectral dimensions and an increase in computational complexity; HSIs may also contain noise that interferes with classification. Using Principal Component Analysis (PCA) to perform dimensionality reduction can improve classification accuracy by removing noise and redundant information, and can also reduce computation time and resource consumption, thereby enhancing computational efficiency. Therefore, PCA is commonly used to preprocess HSI data. PCA reduces the number of spectral bands from $b$ to $l$ while maintaining the spatial size of the HSI. The resulting reduced-dimensional HSI data are represented as $H_{pca} \in \mathbb{R}^{w \times h \times l}$, where $l$ is the number of retained spectral bands. To fully leverage the spectral and spatial information provided by the HSI, a set of cubes $F_0 \in \mathbb{R}^{s_0 \times s_0 \times l}$ is extracted from $H_{pca}$, where $s_0 \times s_0$ represents the spatial size of the patch blocks in the HSI cube. The center pixel of each patch is denoted as $(x_i, y_i)$, and the true label of each patch is determined by the label of the center pixel.
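A minimal sketch of this preprocessing step (PCA reduction followed by patch extraction around each labeled pixel) is shown below; the use of scikit-learn, the reflect padding and the function names are illustrative assumptions, not the authors' implementation.

```python
# PCA band reduction and patch extraction as described above (a sketch).
import numpy as np
from sklearn.decomposition import PCA

def reduce_bands(H, l):
    """Reduce an HSI cube H of shape (w, h, b) to l spectral bands with PCA."""
    w, h, b = H.shape
    flat = H.reshape(-1, b)                      # pixels as rows
    H_pca = PCA(n_components=l).fit_transform(flat)
    return H_pca.reshape(w, h, l)

def extract_patches(H_pca, labels, s0):
    """Extract s0 x s0 patches centered on every labeled pixel."""
    m = s0 // 2
    padded = np.pad(H_pca, ((m, m), (m, m), (0, 0)), mode="reflect")
    cubes, y = [], []
    for i, j in zip(*np.nonzero(labels)):        # labeled pixels only
        cubes.append(padded[i:i + s0, j:j + s0, :])
        y.append(labels[i, j] - 1)               # label of the center pixel
    return np.stack(cubes), np.array(y)
```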
(1) Feature Extraction Channel: Given the $i$th feature $F_i \in \mathbb{R}^{s_i \times s_i \times l}$ ($i$ = 0, 1, 2, 3), $F_0$ represents the cube corresponding to the HSI input data, and $F_1$, $F_2$ and $F_3$ represent low-level, mid-level and high-level features, respectively. The feature map $F_{i+1}$ is obtained by applying two layers of convolutions (3D-CNN and 2D-CNN) and residual connections to each feature map $F_i$ in a bottom-up manner, as shown in Equations (6) and (7):
$M = \delta(BN(f_1(F_i, w_1))),$  (6)
$F_{i+1} = MP(\delta(BN(f_2(M, w_2))) + M),$  (7)
where $f_1(\cdot, w_1)$ represents a 3D convolution with a weight parameter $w_1$ and kernel size of $3 \times 3 \times 3$, and $f_2(\cdot, w_2)$ represents a 2D convolution with a weight parameter $w_2$ and kernel size of $3 \times 3$. $BN$ stands for batch normalization, and $\delta$ represents the activation function, which is ReLU here. $MP(\cdot)$ denotes the max pooling function.
The 3D-2D convolution is used to extract spectral–spatial features from the HSI data, resulting in three features with different levels of information. High-level features contain rich semantic information, while low-level features capture fine-grained local spatial information.
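A hedged PyTorch sketch of one bottom-up block corresponding to Equations (6) and (7) is given below; the number of 3D filters, the flattening of spectral slices into 2D channels and the pooling stride are assumptions made so that the sketch runs end to end.

```python
# One bottom-up extraction block: 3x3x3 3D convolution (Eq. (6)), 3x3 2D
# convolution with a residual connection, then max pooling (Eq. (7)).
import torch
import torch.nn as nn

class ExtractionBlock(nn.Module):
    def __init__(self, in_bands, mid_3d=8):
        super().__init__()
        ch2d = mid_3d * in_bands                 # spectral slices stacked as 2D channels
        # Equation (6): M = ReLU(BN(f1(F_i, w1)))
        self.conv3d = nn.Sequential(
            nn.Conv3d(1, mid_3d, kernel_size=3, padding=1),
            nn.BatchNorm3d(mid_3d), nn.ReLU())
        # Equation (7): F_{i+1} = MaxPool(ReLU(BN(f2(M, w2))) + M)
        self.conv2d = nn.Sequential(
            nn.Conv2d(ch2d, ch2d, kernel_size=3, padding=1),
            nn.BatchNorm2d(ch2d), nn.ReLU())
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):                        # x: (B, in_bands, s, s)
        m = self.conv3d(x.unsqueeze(1))          # (B, mid_3d, in_bands, s, s)
        m = m.flatten(1, 2)                      # (B, mid_3d * in_bands, s, s)
        return self.pool(self.conv2d(m) + m)     # residual addition, then max pooling
```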
(2) Deep Feature Embedding Channel: Multiscale attention was applied to the features $F_i$ of different depths in the three branches to extract effective spectral–spatial information, thereby enhancing the classification performance. Then, transpose convolution was applied to the deep features $F_i$ ($i = 3, 2$) to complete upsampling and obtain $\hat{F}_i$, as shown in Equation (8).
$\hat{F}_i = \varphi(F_i, \theta),$  (8)
where $\varphi(\cdot, \theta)$ represents the transpose convolution with a kernel size of $3 \times 3$ and weight parameter $\theta$. As a result, $\hat{F}_i$ has the same spatial resolution as $F_{i-1}$. Next, $\hat{F}_i$ and $F_{i-1}$ were added together for fusion, and the fused features were convolved as shown in Equation (9).
$F_{i-1} = f_3(\hat{F}_i \oplus F_{i-1}, w_3),$  (9)
where $f_3(\cdot, w_3)$ represents the convolution operation with a weight parameter $w_3$, and $\oplus$ represents element-wise addition for fusion. Through the above process, high-level features can be embedded into low-level features, enhancing the feature representation capability of the model.
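The following sketch illustrates one top-down embedding step corresponding to Equations (8) and (9); the channel widths, stride and padding of the transpose convolution are assumptions chosen so that the upsampled feature matches the spatial size of the shallower one.

```python
# One top-down embedding step: upsample the deeper feature with a 3x3
# transpose convolution (Eq. (8)), add it to the shallower feature and refine
# the sum with a convolution (Eq. (9)).
import torch
import torch.nn as nn

class EmbeddingStep(nn.Module):
    def __init__(self, deep_ch, shallow_ch):
        super().__init__()
        # Equation (8): upsample F_i to the resolution of F_{i-1}
        self.up = nn.ConvTranspose2d(deep_ch, shallow_ch, kernel_size=3,
                                     stride=2, padding=1, output_padding=1)
        # Equation (9): convolve the element-wise sum of the two features
        self.fuse = nn.Conv2d(shallow_ch, shallow_ch, kernel_size=3, padding=1)

    def forward(self, f_deep, f_shallow):
        f_up = self.up(f_deep)                   # upsampled deep feature
        return self.fuse(f_up + f_shallow)       # shallow feature with embedded semantics
```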

2.3. Adaptive Spatial Feature Fusion

In contrast to conventional feature fusion strategies, ASFF can learn the fusion weights for each feature map automatically through the network, achieving adaptive fusion. The specific structure is shown in Figure 5.
Firstly, the three different-level features $F_i \in \mathbb{R}^{s \times s \times b}$ ($i = 1, 2, 3$) were concatenated along the channel dimension to obtain the feature $F_s \in \mathbb{R}^{s \times s \times 3b}$. Then, a convolution operation was applied to change the channel length, as shown in Equation (10).
$\hat{F}_s = \delta(f_4(F_s, w_4)),$  (10)
where $f_4(\cdot, w_4)$ represents a 2D convolution with a kernel size of $1 \times 1$ and $\delta$ is the ReLU activation function. The resulting $\hat{F}_s$ has a size of $s \times s \times 3$. To obtain the feature fusion weights $\alpha$, $\beta$ and $\gamma$ of size $s \times s$, the Softmax function was applied to normalize the exponentials of the data along the channel direction of $\hat{F}_s$ at each position, as shown in Equation (11).
$\alpha, \beta, \gamma = \frac{\exp(\hat{F}_{s,i,j}^{k})}{\sum_{k=1}^{3} \exp(\hat{F}_{s,i,j}^{k})},$  (11)
where $\hat{F}_{s,i,j}^{k}$ represents the value of the $k$th channel of the feature $\hat{F}_s$ at position $(i, j)$, and the channels $k = 1, 2, 3$ yield $\alpha$, $\beta$ and $\gamma$, respectively. Therefore, the network can learn the weights for each feature automatically, enhancing the fusion capability. Next, the features $F_3$, $F_2$ and $F_1$ were multiplied element-wise by the weights $\alpha$, $\beta$ and $\gamma$ in each band, respectively, to obtain $\tilde{F}_3$, $\tilde{F}_2$ and $\tilde{F}_1$, which were then summed to obtain the final feature representation $F$. Finally, the feature $F$ was fed into a linear layer for classification.
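A compact sketch of this adaptive fusion, corresponding to Equations (10) and (11), is given below; it assumes the three branch features share the same channel count b and is not the authors' released code.

```python
# ASFF sketch: concatenate the three branch features, map them to three
# channels with a 1x1 convolution (Eq. (10)), take a per-pixel softmax to get
# the weights alpha, beta, gamma (Eq. (11)), and sum the weighted features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASFF(nn.Module):
    def __init__(self, b):
        super().__init__()
        self.weight_conv = nn.Conv2d(3 * b, 3, kernel_size=1)   # Equation (10)

    def forward(self, f1, f2, f3):               # each branch: (B, b, s, s)
        fs = torch.cat([f1, f2, f3], dim=1)      # (B, 3b, s, s)
        w = F.softmax(F.relu(self.weight_conv(fs)), dim=1)       # Equation (11)
        alpha, beta, gamma = w[:, 0:1], w[:, 1:2], w[:, 2:3]
        return alpha * f3 + beta * f2 + gamma * f1               # fused feature
```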

3. Experiment and Analysis

3.1. Dataset Description

In order to verify the performance of the proposed method, we selected four classical datasets for experiments, including Indian Pines, Kennedy Space Center (KSC), Pavia University and Salinas.
The Indian Pines dataset was a hyperspectral remote sensing image with a size of 145 × 145 and a spatial resolution of 20 m. It was acquired using an Airborne Visible/Infrared Imaging Spectrometer (AVIRIS). It contained 200 spectral bands and 16 land cover classes, for a total of 10,249 labeled samples. The false-color image and ground-truth label image are shown in Figure 6a. Table 1 lists the specific classes of the Indian Pines dataset and the number of training and testing samples for each class.
The KSC dataset was a hyperspectral remote sensing image with a size of 512 × 217 and a spatial resolution of 18 m. It was also acquired using an AVIRIS sensor. It contained 176 spectral bands and 13 land cover classes, for a total of 5211 labeled samples. The false-color image and ground-truth label image are shown in Figure 6b. Table 2 lists the specific classes of the KSC dataset and the number of training and testing samples for each class.
The Pavia University dataset was a hyperspectral remote sensing image with a size of 610 × 340 and a spatial resolution of 1.3 m. It was acquired using a Reflective Optics System Imaging Spectrometer (ROSIS). It contained 103 spectral bands and 9 land cover classes, for a total of 42,776 labeled samples. The false-color image and ground-truth label image are shown in Figure 6c. Table 3 lists the specific classes of the Pavia University dataset and the number of training and testing samples for each class.
The Salinas dataset was a hyperspectral remote sensing image with a size of 512 × 217 and a spatial resolution of 3.7 m. It was also acquired using an AVIRIS sensor. It contained 204 spectral bands and 16 land cover classes, for a total of 54,129 labeled samples. The false-color image and ground-truth label image are shown in Figure 6d. Table 4 lists the specific classes of the Salinas dataset and the number of training and testing samples for each class.

3.2. Experimental Setting

(1) Evaluation Metrics: To quantitatively assess the effectiveness of the proposed model, we used Overall Accuracy (OA), Average Accuracy (AA) and the Kappa coefficient as the evaluation metrics. A higher value for each metric indicated better classification performance.
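For reference, the three metrics can be computed as in the following sketch (assuming scikit-learn), where y_true and y_pred are 1-D arrays of class indices; this is a generic implementation, not code from the paper.

```python
# Overall Accuracy, Average Accuracy and Kappa from a confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix, cohen_kappa_score

def evaluate(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred)
    oa = np.trace(cm) / cm.sum()                       # Overall Accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))         # Average (per-class) Accuracy
    kappa = cohen_kappa_score(y_true, y_pred)          # Kappa coefficient
    return oa, aa, kappa
```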
(2) Configuration: The experiments were conducted using an Intel Xeon Silver 4114 2.2 GHz CPU, 128 GB RAM, and an NVIDIA GeForce RTX 2080 Ti 12 GB graphics card. The PyTorch deep learning framework was used to train the network, with epoch and batch_size set to 100 and 32, respectively. A learning rate decay strategy was employed, with an initial learning rate of 0.001 and a decay factor of 0.1 every 50 epochs. Adam was chosen as the optimization method for the experiments. Each method was tested five times, and the mean value was taken as the experimental result, along with the calculation of the standard deviation.
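A minimal training-loop sketch matching this configuration (Adam, initial learning rate 0.001, decay factor 0.1 every 50 epochs, 100 epochs, batch size 32) is shown below; the model, dataset and loss function are placeholders, not the authors' code.

```python
# Training configuration described above, expressed as a generic PyTorch loop.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train(model, train_set, epochs=100, batch_size=32):
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        scheduler.step()                         # decay by 0.1 every 50 epochs
    return model
```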
In the Indian Pines, Pavia University and Salinas datasets, the size of the patch block (Patch_Size) was set to 13, while in the KSC dataset it was set to 15. The dimension of PCA dimensionality reduction (PCA_Components) was set to 64, 128, 32 and 96 for the Indian Pines, KSC, Pavia University and Salinas datasets, respectively.
In practice, labeling samples of hyperspectral image data requires expert knowledge, which is time-consuming and expensive. Therefore, how to train the model well with a limited or low percentage of samples has become an important topic [17], and it is also a necessary way to test the effectiveness of the proposed model. In recent years, with the improvement of deep neural network models, the percentage of the training samples has tended to decrease from 30% [24] to 10%, 5% or 3% [17,20]. In this paper, we also used a low percentage of samples to train the proposed model. There were many available samples in the Pavia University and Salinas datasets, so the number of random training samples in these two datasets accounted for 3% of the total samples, and the number of random training samples in the Indian Pines and KSC datasets accounted for 10% of the total samples.
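A small sketch of such a random split is given below; sampling the chosen percentage independently within each class is an assumption about how the random selection is implemented.

```python
# Randomly split labeled pixels into training and testing sets at a given
# ratio (e.g., 0.10 for Indian Pines/KSC, 0.03 for Pavia University/Salinas).
import numpy as np

def split_indices(labels, ratio, seed=0):
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        if c == 0:
            continue                             # class 0 treated as unlabeled background
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_train = max(1, int(round(ratio * idx.size)))   # at least one sample per class
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)
```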

3.3. Experimental Results and Analysis

3.3.1. Classification Results

We compared the proposed MAFEN model with several representative methods to validate its effectiveness, including traditional methods such as SVM, deep-learning-based methods such as 3DCNN [22], SSRN [23], DFFN [30] and HybridSN [24], as well as attention-based methods such as Speformer [36] and SSFTT [37]. The detailed experimental results of these methods on the Indian Pines, KSC, Pavia University and Salinas datasets are as follows.
(1) Indian Pines: Firstly, all the models were evaluated using the Indian Pines dataset, and the quantitative experimental results are shown in Table 5, where the numbers in bold indicate the best results. The evaluation metrics show that the proposed MAFEN method performed the best, obtaining the highest OA, AA and Kappa values. Specifically, compared with SVM, the accuracy of 3DCNN was improved by 12.02%, which shows that deep learning has a significant advantage in HSI classification. SSRN had better classification performance than 3DCNN because it uses continuous residual blocks to learn spectral and spatial features separately. The classification performance of DFFN was lower than that of SSRN, which may be because the feature distribution of the Indian Pines dataset did not match well with the DFFN network structure and its way of fusing features. The poor accuracy of HybridSN and Speformer in the “Grass-pasture-mowed” (class 7, mint green) and “Oats” (class 9, yellow) categories was due to the fact that the numbers of training samples in these two categories were only three and two, respectively, which is challenging for HSI classification. SSFTT achieved 100% accuracy in the “Grass-pasture-mowed” category, but only 76.67% accuracy in the “Oats” category, because the region shape of this category is narrow and spatial features could not be fully extracted. However, the accuracy of the proposed model on these two categories was 100% and 93.34%, respectively, and the accuracy of each category was relatively close, which indicates that the model has good feature expression ability for samples with a small number of training samples and an irregular region shape.
Figure 7 shows the classification maps of these methods on the Indian Pines dataset. The compared methods performed poorly on the land cover objects with region edges and narrow shapes, but the proposed MAFEN model generated more accurate classification maps with better homogeneity in each region. This is because MAFEN enhances the feature representation of different categories through deep feature embedding and multiscale attention learning.
(2) KSC: Secondly, we evaluated all the models on the KSC dataset, and the quantitative experimental results are shown in Table 6. The KSC dataset had a sparse distribution of classes, and patches were less affected by interference from neighboring classes, which allowed for better extraction of pixel-level features. Therefore, the accuracy of various methods was relatively high. Among them, HybridSN and Speformer achieved 100% accuracy in five and three categories, respectively, and SSFTT achieved 100% accuracy in nine classes. The proposed MAFEN model achieved 100% accuracy in 10 classes, with OA, AA and Kappa values reaching 99.91%, 99.87% and 99.90%, respectively.
Figure 8 displays the classification maps of these methods on the KSC dataset. Several comparison methods had more noise points in the category “CP/Oak” (class 4, in cyan), which led to poor classification results. Among them, HybridSN achieved the best classification performance, reaching 95.51%. The proposed MAFEN model can better distinguish this category and achieved the best accuracy in this category, reaching 98.68%.
(3) Pavia University: We further evaluated all the models on the Pavia University dataset, and the quantitative experimental results are shown in Table 7. The Pavia University dataset had a large number of samples in each category and abundant training samples, resulting in good classification performance for all methods. Compared with other methods, the proposed MAFEN model achieved the best overall classification results and the best accuracy in most categories.
Figure 9 shows the classification maps of these methods on the Pavia University dataset. The “Gravel” (class 3, in orange) and “Bricks” (class 8, in steel blue) classes had similar spectra but differed in spatial details. From the classification maps, it can be seen that SSRN using deep features for classification could not effectively distinguish “Gravel” and “Bricks”. However, the proposed model performed well on “Gravel” and “Bricks”, which indicates that utilizing shallow local spatial details was beneficial for distinguishing the “Gravel” and “Bricks” classes.
(4) Salinas: Finally, we evaluated all the models on the Salinas dataset, and quantitative analysis results for different methods are presented in Table 8. The Salinas dataset had larger regions and regular shapes for different classes, which allowed for better extraction of spatial features. The proposed MAFEN model achieved an accuracy of 100% for six classes, with OA, AA and Kappa values reaching 99.82%, 99.80% and 99.80%, respectively. This further confirms the feature representation capability of the proposed model.
Figure 10 shows the classification maps of different methods on the Salinas dataset. From the top left corner of the classification maps, we can observe that due to the very similar spectra of “Grapes Untrained” (class 8, in steel blue) and “Vinyard Untrained” (class 15, in olive), the compared methods contained a lot of noise in the classification maps of these two classes. However, the proposed MAFEN model, which has better spectral–spatial feature representation capability, was able to distinguish these two classes, resulting in smoother and more accurate classification maps.

3.3.2. Parameter Analysis

(1) Impact of Patch_Size and PCA_Components on the OA: We analyzed the influence of Patch_Size and PCA_Components on the classification performance on the Indian Pines, KSC, Pavia University and Salinas datasets. Patch_Size was selected from (11, 13, 15, 17, 19) for all four datasets, and PCA_Components was selected from (32, 48, 64, 80, 96, 112, 128) for the Indian Pines, KSC and Salinas datasets. Since the Pavia University dataset has 103 spectral bands, PCA_Components was selected from (32, 48, 64, 80, 96) for this dataset. From Figure 11a, we can observe that the best performance was achieved when Patch_Size was set to 13 and PCA_Components was set to 64 on the Indian Pines dataset. From Figure 11b, the classification performance on the KSC dataset was strongly correlated with PCA_Components: as PCA_Components increased, the classification performance improved, and eventually the OA approached 100%. In Figure 11c, the classification performance was better when PCA_Components was in the range (32, 48). As can be seen from Figure 11d, smaller values of PCA_Components fit the Salinas dataset well.
(2) OA of Models with Different Percentages of Training Samples: Figure 12 presents the classification accuracy of the various methods with different percentages of training samples. Considering the differences in the number of available samples for each dataset, 2%, 4%, 6%, 8% and 10% of the labeled samples were selected as training samples for the Indian Pines and KSC datasets, while 0.5%, 1%, 2%, 3% and 4% of the labeled samples were selected for the Pavia University and Salinas datasets. From Figure 12, it can be seen that the proposed MAFEN method still performed well even with fewer training samples. In addition, as the percentage of training samples increased, the accuracy of all methods also increased. Among them, the accuracy of SSFTT and HybridSN was close to that of our method, showing better classification performance than the other compared methods.
(3) Computational Performance: The results of training and testing time consumed by SSRN, DFFN, HybridSN, Speformer, SSFTT and our proposed MAFEN method are listed in Table 9. It can be seen that there were obvious differences in the training and testing time of several methods on different datasets. Among them, SSFTT was the most computationally efficient, and the training time was much lower than that of the other methods on all datasets. In contrast, the training time of Speformer was longer, especially on the Indian Pines dataset, where the training time was more than four times longer than that of the other models. On the whole, our proposed method performed well with relatively short training time on different datasets.

3.3.3. Ablation Experiment

In order to thoroughly validate the effectiveness of each component in the proposed method, the ablation experiment was conducted on the Indian Pines, KSC, Pavia University and Salinas datasets to analyze the impact of the ASFF and MAM components. Four combinations were considered, where the Base network did not contain the MAM and ASFF modules. Three indicators, OA, AA and Kappa, were used to analyze the influence of different components on the whole model, and the experimental results are shown in Figure 13.
The Base network, which did not include the MAM and ASFF modules, had the worst classification performance on the four datasets. When the MAM or ASFF module was added, the classification performance improved significantly compared with that of the Base network, which verifies the effectiveness of the MAM and ASFF modules. Compared with the network containing only the ASFF module, the network containing only the MAM module achieved a larger improvement, showing that the attention mechanism can extract effective spectral–spatial information, which is more helpful for improving HSI classification performance. The MAFEN network with both modules achieved the best classification performance, which reflects that using the two modules simultaneously yields a better spectral–spatial feature representation. In summary, the results of the ablation experiment further prove the effectiveness of the proposed model.

4. Conclusions

The method named Feature Embedding Network with Multiscale Attention (MAFEN) is proposed in this article to improve the classification performance of Hyperspectral Images (HSIs). The MAFEN model first utilizes multiscale attention modules to extract informative features, then embeds deep features into shallow features to enhance the feature representation capability of the network. Finally, adaptive fusion is performed on features at different levels. Experiments were conducted on four commonly used HSI datasets, and comparisons were made with existing methods. The proposed MAFEN method demonstrated superior spectral–spatial feature representation capability, as it effectively utilized spatial details from shallow features and semantic information from deep features, resulting in a significant improvement in classification accuracies on all four datasets compared with those of several other methods. In the future, we will further study new attention-based networks to fully leverage the critical information in HSIs.

Author Contributions

Conceptualization, Y.L. and J.Z.; Data curation, J.F. and C.M.; Funding acquisition, Y.L. and C.M.; Investigation, J.Z. and J.F.; Methodology, Y.L. and J.Z.; Software, J.Z.; Supervision, Y.L.; Visualization, J.Z. and C.M.; Writing—original draft, J.Z.; Writing—review and editing, Y.L. and C.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Nos. 62077038, 61672405, 62176196 and 62271374).

Data Availability Statement

The data used in this study (Indian Pines, Kennedy Space Center, Pavia University and Salinas) are available at the Hyperspectral Remote Sensing Scenes page of the Grupo de Inteligencia Computacional (GIC) (ehu.eus) (accessed on 9 May 2023).

Acknowledgments

Thanks are due to Mohammed Abdullah Mahdi Alloaa for assistance with the English editing during the review process of this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhang, X.; Liu, L.; Chen, X.; Gao, Y.; Jiang, M. Automatically Monitoring Impervious Surfaces Using Spectral Generalization and Time Series Landsat Imagery from 1985 to 2020 in the Yangtze River Delta. J. Remote Sens. 2021, 2021, 9873816. [Google Scholar] [CrossRef]
  2. Yang, X.; Yu, Y. Estimating Soil Salinity Under Various Moisture Conditions: An Experimental Study. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2525–2533. [Google Scholar] [CrossRef]
  3. Avtar, R.; Sahu, N.; Aggarwal, A.K.; Chakraborty, S.; Kharrazi, A.; Yunus, A.P.; Dou, J.; Kurniawan, T.A. Exploring Renewable Energy Resources Using Remote Sensing and GIS—A Review. Resources 2019, 8, 149. [Google Scholar] [CrossRef]
  4. Weiss, M.; Jacob, F.; Duveiller, G. Remote Sensing for Agricultural Applications: A Meta-Review. Remote Sens. Environ. 2020, 236, 111402. [Google Scholar] [CrossRef]
  5. Liang, L.; Di, L.; Zhang, L.; Deng, M.; Qin, Z.; Zhao, S.; Lin, H. Estimation of Crop LAI Using Hyperspectral Vegetation Indices and a Hybrid Inversion Method. Remote Sens. Environ. 2015, 165, 123–134. [Google Scholar] [CrossRef]
  6. Ye, Q.; Huang, P.; Zhang, Z.; Zheng, Y.; Fu, L.; Yang, W. Multiview Learning With Robust Double-Sided Twin SVM. IEEE Trans. Cybern. 2022, 52, 12745–12758. [Google Scholar] [CrossRef]
  7. Haut, J.M.; Paoletti, M.E. Cloud Implementation of Multinomial Logistic Regression for UAV Hyperspectral Images. IEEE J. Miniat. Air Space Syst. 2020, 1, 163–171. [Google Scholar] [CrossRef]
  8. Li, J.; Bioucas-Dias, J.M.; Plaza, A. Spectral–Spatial Hyperspectral Image Segmentation Using Subspace Multinomial Logistic Regression and Markov Random Fields. IEEE Trans. Geosci. Remote Sens. 2012, 50, 809–823. [Google Scholar] [CrossRef]
  9. Du, B.; Zhang, L. Random-Selection-Based Anomaly Detector for Hyperspectral Imagery. IEEE Trans. Geosci. Remote Sens. 2011, 49, 1578–1589. [Google Scholar] [CrossRef]
  10. Du, B.; Zhang, L. Target Detection Based on a Dynamic Subspace. Pattern Recognit. 2014, 47, 344–358. [Google Scholar] [CrossRef]
  11. Luo, Y.; Cao, X.; Zhang, J.; Cao, X.; Guo, J.; Shen, H.; Wang, T.; Feng, Q. CE-FPN: Enhancing Channel Information for Object Detection. Multimed. Tools Appl. 2021, 81, 30685–30704. [Google Scholar] [CrossRef]
  12. Obaid, K.B.; Zeebaree, S.R.M.; Ahmed, O.M. Deep Learning Models Based on Image Classification: A Review. Int. J. Sci. Bus. 2020, 4, 75–81. [Google Scholar] [CrossRef]
  13. Yang, Y.; Hou, Y.-L.; Hou, Z.; Hao, X.; Shen, Y. Image-Level Supervised Instance Segmentation Using Instance-Wise Boundary. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; pp. 1069–1073. [Google Scholar]
  14. Zhang, Q.; Yuan, Q.; Li, Z.; Sun, F.; Zhang, L. Combined Deep Prior with Low-Rank Tensor SVD for Thick Cloud Removal in Multitemporal Images. ISPRS J. Photogramm. Remote Sens. 2021, 177, 161–173. [Google Scholar] [CrossRef]
  15. Liu, J.; Yang, Z.; Liu, Y.; Mu, C. Hyperspectral Remote Sensing Images Deep Feature Extraction Based on Mixed Feature and Convolutional Neural Networks. Remote Sens. 2021, 13, 2599. [Google Scholar] [CrossRef]
  16. Hong, D.; Gao, L.; Yokoya, N.; Yao, J.; Chanussot, J.; Du, Q.; Zhang, B. More Diverse Means Better: Multimodal Deep Learning Meets Remote-Sensing Imagery Classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 4340–4354. [Google Scholar] [CrossRef]
  17. Feng, J.; Zhao, N.; Shang, R.; Zhang, X.; Jiao, L. Self-Supervised Divide-and-Conquer Generative Adversarial Network for Classification of Hyperspectral Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–17. [Google Scholar] [CrossRef]
  18. Mu, C.; Dong, Z.; Liu, Y. A Two-Branch Convolutional Neural Network Based on Multi-Spectral Entropy Rate Superpixel Segmentation for Hyperspectral Image Classification. Remote Sens. 2022, 14, 1569. [Google Scholar] [CrossRef]
  19. Cao, X.; Fu, X.; Xu, C.; Meng, D. Deep Spatial-Spectral Global Reasoning Network for Hyperspectral Image Denoising. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14. [Google Scholar] [CrossRef]
  20. Mu, C.; Zeng, Q.; Liu, Y.; Qu, Y. A Two-Branch Network Combined With Robust Principal Component Analysis for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2021, 18, 2147–2151. [Google Scholar] [CrossRef]
  21. Hu, W.; Huang, Y.; Wei, L.; Zhang, F.; Li, H. Deep Convolutional Neural Networks for Hyperspectral Image Classification. J. Sens. 2015, 2015, 258619. [Google Scholar] [CrossRef]
  22. Li, Y.; Zhang, H.; Shen, Q. Spectral–Spatial Classification of Hyperspectral Imagery with 3D Convolutional Neural Network. Remote Sens. 2017, 9, 67. [Google Scholar] [CrossRef]
  23. Zhong, Z.; Li, J.; Luo, Z.; Chapman, M. Spectral–Spatial Residual Network for Hyperspectral Image Classification: A 3-D Deep Learning Framework. IEEE Trans. Geosci. Remote Sens. 2018, 56, 847–858. [Google Scholar] [CrossRef]
  24. Roy, S.K.; Krishna, G.; Dubey, S.R.; Chaudhuri, B.B. HybridSN: Exploring 3-D–2-D CNN Feature Hierarchy for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2020, 17, 277–281. [Google Scholar] [CrossRef]
  25. Mu, C.; Liu, Y.; Liu, Y. Hyperspectral Image Spectral–Spatial Classification Method Based on Deep Adaptive Feature Fusion. Remote Sens. 2021, 13, 746. [Google Scholar] [CrossRef]
  26. Huang, W.; Zhao, Z.; Sun, L.; Ju, M. Dual-Branch Attention-Assisted CNN for Hyperspectral Image Classification. Remote Sens. 2022, 14, 6158. [Google Scholar] [CrossRef]
  27. Lu, T.; Liu, M.; Fu, W.; Kang, X. Grouped Multi-Attention Network for Hyperspectral Image Spectral-Spatial Classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–12. [Google Scholar] [CrossRef]
  28. Yu, H.; Zhang, H.; Liu, Y.; Zheng, K.; Xu, Z.; Xiao, C. Dual-Channel Convolution Network with Image-Based Global Learning Framework for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
  29. Zhang, C.; Li, G.; Du, S. Multiscale Dense Networks for Hyperspectral Remote Sensing Image Classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 9201–9222. [Google Scholar] [CrossRef]
  30. Song, W.; Li, S.; Fang, L.; Lu, T. Hyperspectral Image Classification with Deep Feature Fusion Network. IEEE Trans. Geosci. Remote Sens. 2018, 56, 3173–3184. [Google Scholar] [CrossRef]
  31. Zhang, Z.; Liu, D.; Gao, D.; Shi, G. S3Net: Spectral–Spatial–Semantic Network for Hyperspectral Image Classification With the Multiway Attention Mechanism. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–17. [Google Scholar] [CrossRef]
  32. Zhang, X.; Sun, G.; Jia, X.; Wu, L.; Zhang, A.; Ren, J.; Fu, H.; Yao, Y. Spectral–Spatial Self-Attention Networks for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 963. [Google Scholar] [CrossRef]
  33. Mei, S.; Li, X.; Liu, X.; Cai, H.; Du, Q. Hyperspectral Image Classification Using Attention-Based Bidirectional Long Short-Term Memory Network. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–12. [Google Scholar] [CrossRef]
  34. Sun, H.; Zheng, X.; Lu, X.; Wu, S. Spectral–Spatial Attention Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3232–3245. [Google Scholar] [CrossRef]
  35. Ding, L.; Tang, H.; Bruzzone, L. LANet: Local Attention Embedding to Improve the Semantic Segmentation of Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2021, 59, 426–435. [Google Scholar] [CrossRef]
  36. Hong, D.; Han, Z.; Yao, J.; Gao, L.; Zhang, B.; Plaza, A.; Chanussot, J. SpectralFormer: Rethinking Hyperspectral Image Classification with Transformers. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15. [Google Scholar] [CrossRef]
  37. Sun, L.; Zhao, G.; Zheng, Y.; Wu, Z. Spectral–Spatial Feature Tokenization Transformer for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14. [Google Scholar] [CrossRef]
  38. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944. [Google Scholar]
  39. Yang, Q.; Zhang, T.; Qiu, T.; Xiao, Y.; Jiang, X. Double Feature Pyramid Networks for Classification and Localization on Object Detection. In Proceedings of the 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Prague, Czech Republic, 9–12 October 2022; pp. 1395–1400. [Google Scholar]
  40. Wenju, L.; Wanghui, C.; Liu, C.; Gan, Z. A Graph Attention Feature Pyramid Network for 3D Object Detection in Point Clouds. In Proceedings of the 2022 7th International Conference on Intelligent Informatics and Biomedical Science (ICIIBMS), Nara, Japan, 24–26 November 2022; Volume 7, pp. 94–98. [Google Scholar]
  41. Hu, M.; Li, Y.; Fang, L.; Wang, S. A2-FPN: Attention Aggregation Based Feature Pyramid Network for Instance Segmentation. arXiv 2021, arXiv:2105.03186. [Google Scholar]
  42. Wang, G.; Guo, W.; Wang, Y.; Wang, W. Feature Pyramid Network Based on Double Filter Feature Fusion for Hyperspectral Image Classification. In Proceedings of the 2022 16th IEEE International Conference on Signal Processing (ICSP), Beijing, China, 21–24 October 2022; Volume 1, pp. 240–244. [Google Scholar]
  43. Fang, L.; Jiang, Y.; Yan, Y.; Yue, J.; Deng, Y. Hyperspectral Image Instance Segmentation Using Spectral–Spatial Feature Pyramid Network. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–13. [Google Scholar] [CrossRef]
  44. Ding, C.; Chen, Y.; Li, R.; Wen, D.; Xie, X.; Zhang, L.; Wei, W.; Zhang, Y. Integrating Hybrid Pyramid Feature Fusion and Coordinate Attention for Effective Small Sample Hyperspectral Image Classification. Remote Sens. 2022, 14, 2355. [Google Scholar] [CrossRef]
  45. Liu, S.; Huang, D.; Wang, Y. Learning Spatial Fusion for Single-Shot Object Detection. arXiv 2019, arXiv:1911.09516. [Google Scholar]
Figure 1. The overall framework of the MAFEN for hyperspectral image classification. $F_0$ represents the cube corresponding to the HSI input data; $F_1$, $F_2$ and $F_3$ are three feature maps with different levels of information obtained using 3D-2D convolution, representing low-level, mid-level and high-level features, respectively. $F_1'$, $F_2'$ and $F_3'$ represent the final features of each branch.
Figure 2. The structure of MAM. (a) The overall framework of MAM. (b) The structure of the spectral–spatial attention module in MAM. Firstly, the MAM convolves the features $F_i$ of different levels with three convolutional kernels of different sizes to obtain multiscale information. Then, cascaded spectral attention and spatial attention are employed to extract effective information from the feature $F$ extracted by each convolution kernel.
Figure 3. Detailed structure diagram of the spectral attention module. The center vector $p_i$ is taken from the input cube $F$.
Figure 4. A detailed structure diagram of the spatial attention module. $F_{\max}$ is the feature map obtained by global max pooling of $F$ along the channel direction.
Figure 5. The specific structure of ASFF. $F_s$ is the feature obtained by concatenating three different-level features along the channel dimension, $\alpha$, $\beta$ and $\gamma$ are the feature fusion weights, $\tilde{F}_i$ represents the weighted features, and $F$ is the final feature representation.
Figure 6. False-color images and ground-truth maps. (a) Indian Pines. (b) KSC. (c) Pavia University. (d) Salinas.
Figure 7. Classification maps of the Indian Pines dataset. (a) SVM (OA = 79.60%); (b) 3DCNN (OA = 91.62%); (c) SSRN (OA = 95.75%); (d) DFFN (OA = 91.43%); (e) HybridSN (OA = 97.73%); (f) Speformer (OA = 93.92%); (g) SSFTT (OA = 98.49%); (h) MAFEN (OA = 99.10%).
Figure 8. Classification maps of the KSC dataset. (a) SVM (OA = 87.81%); (b) 3DCNN (OA = 92.74%); (c) SSRN (OA = 95.04%); (d) DFFN (OA = 97.39%); (e) HybridSN (OA = 98.98%); (f) Speformer (OA = 97.15%); (g) SSFTT (OA = 99.48%); (h) MAFEN (OA = 99.91%).
Figure 9. Classification maps of the Pavia University dataset. (a) SVM (OA = 93.94%); (b) 3DCNN (OA = 98.09%); (c) SSRN (OA = 95.76%); (d) DFFN (OA = 98.25%); (e) HybridSN (OA = 97.35%); (f) Speformer (OA = 96.28%); (g) SSFTT (OA = 98.26%); (h) MAFEN (OA = 99.09%).
Figure 10. Classification maps of the Salinas dataset. (a) SVM (OA = 93.30%); (b) 3DCNN (OA = 96.64%); (c) SSRN (OA = 96.27%); (d) DFFN (OA = 98.77%); (e) HybridSN (OA = 98.46%); (f) Speformer (OA = 98.49%); (g) SSFTT (OA = 98.89%); (h) MAFEN (OA = 99.82%).
Figure 11. Impact of Patch_Size and PCA_Components on the OA. (a) Indian Pines; (b) KSC; (c) Pavia University; (d) Salinas.
Figure 12. OA of models with different percentages of training samples. (a) Indian Pines; (b) KSC; (c) Pavia University; (d) Salinas.
Figure 13. Ablation experiment. (a) Indian Pines; (b) KSC; (c) Pavia University; (d) Salinas.
Table 1. The information for each class in the Indian Pines dataset.

No.  Class  Train  Test
1  Alfalfa  5  41
2  Corn-notill  143  1285
3  Corn-mintill  83  747
4  Corn  24  213
5  Grass-pasture  48  435
6  Grass-trees  73  657
7  Grass-pasture-mowed  3  25
8  Hay-windrowed  48  430
9  Oats  2  18
10  Soybean-notill  97  875
11  Soybean-mintill  245  2210
12  Soybean-clean  59  534
13  Wheat  20  185
14  Woods  126  1139
15  Buildings-Grass-Trees-Drives  39  347
16  Stone-Steel-Towers  9  84
Total    1024  9225
Table 2. The information for each class in the KSC dataset.

No.  Class  Train  Test
1  Scrub  76  685
2  Willow_swamp  24  219
3  CP_hammock  26  230
4  CP/Oak  25  227
5  Slash_pine  16  145
6  Oak/Broadleaf  23  206
7  Hardwood_swamp  10  95
8  Graminoid_marsh  43  388
9  Spartina_marsh  52  468
10  Catial_marsh  40  364
11  Salt_marsh  42  377
12  Mud_flats  50  453
13  Water  93  834
Total    520  4691
Table 3. The information for each class in the Pavia University dataset.
| No. | Class | Train | Test |
|---|---|---|---|
| 1 | Asphalt | 199 | 6432 |
| 2 | Meadows | 559 | 18,090 |
| 3 | Gravel | 63 | 2036 |
| 4 | Trees | 92 | 2972 |
| 5 | Metal Sheets | 40 | 1305 |
| 6 | Bare soil | 151 | 4878 |
| 7 | Bitumen | 40 | 1290 |
| 8 | Bricks | 110 | 3572 |
| 9 | Shadows | 28 | 919 |
| | Total | 1282 | 41,494 |
Table 4. The information for each class in the Salinas dataset.
| No. | Class | Train | Test |
|---|---|---|---|
| 1 | Brocoli_green_weeds_1 | 60 | 1949 |
| 2 | Brocoli_green_weeds_2 | 112 | 3614 |
| 3 | Fallow | 59 | 1917 |
| 4 | Fallow_rough_plow | 42 | 1352 |
| 5 | Fallow_smooth | 80 | 2598 |
| 6 | Stubble | 119 | 3840 |
| 7 | Celery | 107 | 3472 |
| 8 | Grapes_untrained | 338 | 10,933 |
| 9 | Soil_vinyard_develop | 186 | 6017 |
| 10 | Corn_senesced_green_weeds | 98 | 3180 |
| 11 | Lettuce_romaine_4wk | 32 | 1036 |
| 12 | Lettuce_romaine_5wk | 58 | 1869 |
| 13 | Lettuce_romaine_6wk | 27 | 889 |
| 14 | Lettuce_romaine_7wk | 32 | 1038 |
| 15 | Vinyard_untrained | 218 | 7050 |
| 16 | Vinyard_vertical_trellis | 54 | 1753 |
| | Total | 1622 | 52,507 |
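The splits in Tables 1–4 are drawn class by class from the labelled pixels of each scene. A minimal sketch of such a stratified random split is given below (the function name and the 10% ratio are illustrative assumptions, not the authors' sampling code).

```python
# Minimal sketch (illustrative, not the authors' sampling code): a stratified
# per-class random split over the labelled pixels, as summarised in Tables 1-4.
import numpy as np


def split_per_class(labels, train_ratio=0.1, seed=0):
    """Return (train_idx, test_idx) as flat pixel indices, sampled class by class."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        if cls == 0:  # label 0 is commonly the unlabelled background
            continue
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        n_train = max(1, int(round(train_ratio * idx.size)))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.asarray(train_idx), np.asarray(test_idx)
```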
Table 5. Classification results of different methods on the Indian Pines dataset.
| Class | SVM | 3DCNN | SSRN | DFFN | HybridSN | Speformer | SSFTT | MAFEN |
|---|---|---|---|---|---|---|---|---|
| 1 | 33.81 ± 0.05 | 89.74 ± 0.01 | 96.58 ± 4.78 | 91.16 ± 0.06 | 90.73 ± 13.83 | 94.63 ± 5.21 | 97.07 ± 3.58 | 100 ± 0.00 |
| 2 | 74.99 ± 0.03 | 85.01 ± 0.12 | 93.87 ± 2.94 | 93.47 ± 0.04 | 93.96 ± 1.88 | 89.90 ± 1.71 | 96.68 ± 0.68 | 96.76 ± 0.56 |
| 3 | 68.86 ± 0.02 | 86.41 ± 0.26 | 93.95 ± 4.99 | 89.12 ± 0.14 | 98.90 ± 0.67 | 89.97 ± 1.25 | 99.22 ± 0.41 | 99.57 ± 0.66 |
| 4 | 47.57 ± 0.04 | 96.19 ± 0.02 | 86.29 ± 2.17 | 92.41 ± 0.05 | 96.53 ± 2.23 | 97.65 ± 1.07 | 99.62 ± 0.55 | 99.15 ± 0.91 |
| 5 | 85.29 ± 0.03 | 88.19 ± 0.01 | 99.17 ± 0.78 | 81.38 ± 0.06 | 98.85 ± 1.32 | 97.06 ± 1.07 | 98.57 ± 1.48 | 99.86 ± 0.11 |
| 6 | 95.77 ± 0.03 | 88.16 ± 0.02 | 98.14 ± 0.50 | 96.53 ± 0.01 | 98.99 ± 0.67 | 99.33 ± 0.41 | 99.63 ± 0.39 | 99.51 ± 0.34 |
| 7 | 60.00 ± 0.24 | 87.50 ± 0.04 | 97.6 ± 3.20 | 93.60 ± 0.08 | 18.4 ± 36.8 | 71.20 ± 10.24 | 100 ± 0.00 | 100 ± 0.00 |
| 8 | 98.56 ± 0.01 | 100 ± 0.00 | 99.58 ± 0.45 | 100 ± 0.00 | 99.91 ± 0.19 | 99.86 ± 0.19 | 100 ± 0.00 | 99.90 ± 0.19 |
| 9 | 30.00 ± 0.08 | 88.23 ± 0.16 | 88.89 ± 7.03 | 66.67 ± 0.12 | 21.43 ± 7.36 | 66.67 ± 11.11 | 76.67 ± 7.36 | 93.34 ± 5.44 |
| 10 | 75.45 ± 0.02 | 89.35 ± 0.01 | 96.32 ± 1.88 | 89.86 ± 0.05 | 99.11 ± 0.15 | 92.78 ± 1.41 | 97.76 ± 1.12 | 99.43 ± 0.19 |
| 11 | 82.14 ± 0.01 | 93.71 ± 0.03 | 94.14 ± 3.04 | 89.69 ± 0.14 | 99.36 ± 0.25 | 95.70 ± 1.34 | 99.38 ± 0.33 | 99.38 ± 0.20 |
| 12 | 61.31 ± 0.01 | 93.63 ± 0.23 | 92.77 ± 2.25 | 77.21 ± 0.14 | 95.43 ± 1.39 | 80.56 ± 3.29 | 96.29 ± 1.36 | 98.58 ± 0.39 |
| 13 | 95.14 ± 0.02 | 100 ± 0.00 | 99.68 ± 0.65 | 95.19 ± 0.03 | 98.38 ± 1.41 | 99.68 ± 0.43 | 98.16 ± 2.04 | 99.89 ± 0.22 |
| 14 | 94.19 ± 0.02 | 97.99 ± 0.04 | 99.37 ± 0.13 | 97.76 ± 0.01 | 99.39 ± 0.52 | 98.53 ± 1.03 | 99.89 ± 0.21 | 99.84 ± 0.13 |
| 15 | 53.91 ± 0.04 | 88.92 ± 0.01 | 97.64 ± 1.05 | 94.30 ± 0.02 | 99.08 ± 1.05 | 90.72 ± 2.67 | 97.46 ± 2.90 | 100 ± 0.00 |
| 16 | 80.95 ± 0.05 | 95.29 ± 0.01 | 99.28 ± 0.95 | 98.05 ± 0.01 | 96.90 ± 1.78 | 91.90 ± 5.80 | 88.80 ± 6.80 | 97.14 ± 0.95 |
| OA | 79.60 ± 0.01 | 91.62 ± 0.01 | 95.75 ± 0.61 | 91.43 ± 0.04 | 97.73 ± 0.33 | 93.92 ± 0.91 | 98.49 ± 0.46 | 99.10 ± 0.06 |
| AA | 71.12 ± 0.01 | 91.78 ± 0.01 | 95.83 ± 0.61 | 90.40 ± 0.04 | 86.49 ± 0.33 | 91.01 ± 0.93 | 96.58 ± 0.46 | 98.90 ± 0.30 |
| Kappa | 76.66 ± 0.01 | 90.45 ± 0.01 | 95.16 ± 0.61 | 90.27 ± 0.04 | 97.41 ± 0.33 | 93.06 ± 1.04 | 98.28 ± 0.46 | 98.98 ± 0.07 |
Table 6. Classification results of different methods on the KSC dataset.
| Class | SVM | 3DCNN | SSRN | DFFN | HybridSN | Speformer | SSFTT | MAFEN |
|---|---|---|---|---|---|---|---|---|
| 1 | 95.01 ± 0.01 | 98.12 ± 0.34 | 99.68 ± 0.35 | 99.74 ± 0.21 | 100 ± 0.00 | 99.74 ± 0.06 | 99.80 ± 0.25 | 100 ± 0.00 |
| 2 | 85.84 ± 0.06 | 77.53 ± 8.56 | 98.54 ± 0.73 | 99.27 ± 1.07 | 97.35 ± 1.27 | 98.36 ± 0.89 | 100 ± 0.00 | 100 ± 0.00 |
| 3 | 86.93 ± 0.04 | 87.15 ± 4.09 | 70.78 ± 32.36 | 97.22 ± 2.26 | 98.43 ± 1.66 | 97.13 ± 2.05 | 100 ± 0.00 | 99.74 ± 0.35 |
| 4 | 50.66 ± 0.03 | 80.11 ± 1.30 | 82.91 ± 11.67 | 83.96 ± 7.69 | 95.51 ± 2.42 | 70.84 ± 2.29 | 91.37 ± 6.85 | 98.68 ± 1.47 |
| 5 | 31.45 ± 0.19 | 84.25 ± 6.46 | 72.28 ± 34.09 | 91.31 ± 0.94 | 97.93 ± 1.57 | 94.07 ± 2.81 | 100 ± 0.00 | 100 ± 0.00 |
| 6 | 52.75 ± 0.04 | 87.63 ± 4.32 | 77.57 ± 19.72 | 75.92 ± 6.51 | 98.64 ± 1.77 | 92.23 ± 1.62 | 99.51 ± 0.75 | 100 ± 0.00 |
| 7 | 75.16 ± 0.16 | 96.22 ± 4.05 | 69.57 ± 36.19 | 100 ± 0.00 | 80.00 ± 8.52 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00 |
| 8 | 89.28 ± 0.03 | 85.83 ± 3.65 | 99.48 ± 0.36 | 99.43 ± 0.50 | 100 ± 0.00 | 99.95 ± 0.10 | 100 ± 0.00 | 100 ± 0.00 |
| 9 | 97.73 ± 0.01 | 95.77 ± 0.90 | 100 ± 0.00 | 98.50 ± 0.63 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00 |
| 10 | 94.51 ± 0.02 | 93.43 ± 4.76 | 99.78 ± 0.44 | 99.89 ± 0.22 | 99.07 ± 1.87 | 99.67 ± 0.32 | 100 ± 0.00 | 100 ± 0.00 |
| 11 | 97.04 ± 0.01 | 93.86 ± 2.54 | 100 ± 0.00 | 99.63 ± 0.27 | 99.95 ± 0.11 | 99.95 ± 0.11 | 100 ± 0.00 | 99.84 ± 0.32 |
| 12 | 88.30 ± 0.04 | 95.74 ± 1.39 | 99.25 ± 0.38 | 99.38 ± 0.61 | 100 ± 0.00 | 93.55 ± 2.80 | 99.51 ± 0.87 | 100 ± 0.00 |
| 13 | 99.07 ± 0.01 | 98.67 ± 1.07 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00 |
| OA | 87.81 ± 0.01 | 92.74 ± 1.42 | 95.04 ± 3.31 | 97.39 ± 0.62 | 98.98 ± 0.91 | 97.15 ± 0.40 | 99.48 ± 0.43 | 99.91 ± 0.09 |
| AA | 80.29 ± 0.01 | 90.33 ± 1.46 | 89.99 ± 7.71 | 95.71 ± 0.99 | 97.45 ± 3.23 | 95.81 ± 0.50 | 99.25 ± 0.64 | 99.87 ± 0.14 |
| Kappa | 86.41 ± 0.01 | 91.91 ± 1.58 | 94.47 ± 3.70 | 97.10 ± 0.69 | 98.86 ± 1.02 | 96.83 ± 0.45 | 99.43 ± 0.48 | 99.90 ± 0.10 |
Table 7. Classification results of different methods on the Pavia University dataset.
| Class | SVM | 3DCNN | SSRN | DFFN | HybridSN | Speformer | SSFTT | MAFEN |
|---|---|---|---|---|---|---|---|---|
| 1 | 94.65 ± 0.01 | 98.75 ± 0.01 | 95.04 ± 2.61 | 99.36 ± 0.47 | 97.33 ± 4.49 | 93.59 ± 0.50 | 97.58 ± 1.50 | 99.29 ± 0.56 |
| 2 | 98.12 ± 0.01 | 99.35 ± 0.01 | 99.49 ± 0.59 | 99.96 ± 0.03 | 99.90 ± 0.05 | 99.29 ± 0.16 | 99.67 ± 0.08 | 99.97 ± 0.02 |
| 3 | 76.84 ± 0.04 | 91.83 ± 0.03 | 86.31 ± 6.14 | 97.89 ± 1.05 | 98.57 ± 0.79 | 87.65 ± 2.41 | 90.27 ± 3.54 | 91.68 ± 2.31 |
| 4 | 92.91 ± 0.03 | 93.00 ± 0.02 | 96.52 ± 0.85 | 90.51 ± 3.52 | 95.38 ± 1.03 | 95.34 ± 0.32 | 97.19 ± 1.14 | 98.48 ± 0.64 |
| 5 | 99.30 ± 0.01 | 98.57 ± 0.01 | 98.74 ± 1.54 | 96.88 ± 3.45 | 98.57 ± 0.90 | 99.97 ± 0.06 | 99.91 ± 0.15 | 100 ± 0.00 |
| 6 | 87.84 ± 0.02 | 99.68 ± 0.01 | 91.96 ± 4.58 | 99.40 ± 0.45 | 100 ± 0.00 | 96.87 ± 0.45 | 98.43 ± 1.15 | 99.79 ± 0.27 |
| 7 | 85.92 ± 0.02 | 99.70 ± 0.01 | 85.44 ± 12.85 | 99.18 ± 0.76 | 59.94 ± 5.94 | 83.16 ± 1.39 | 97.72 ± 1.31 | 98.60 ± 1.08 |
| 8 | 89.92 ± 0.01 | 96.56 ± 0.02 | 90.09 ± 5.72 | 97.99 ± 1.72 | 95.74 ± 3.04 | 93.51 ± 0.53 | 97.22 ± 0.94 | 97.87 ± 0.63 |
| 9 | 99.76 ± 0.01 | 93.65 ± 0.03 | 98.48 ± 1.22 | 78.24 ± 12.68 | 94.04 ± 3.03 | 98.74 ± 0.56 | 97.91 ± 2.61 | 99.41 ± 0.47 |
| OA | 93.94 ± 0.01 | 98.09 ± 0.01 | 95.76 ± 0.58 | 98.25 ± 0.42 | 97.35 ± 1.62 | 96.28 ± 0.16 | 98.26 ± 0.26 | 99.09 ± 0.03 |
| AA | 91.69 ± 0.01 | 96.79 ± 0.01 | 93.56 ± 0.87 | 95.49 ± 1.44 | 93.28 ± 5.22 | 94.24 ± 0.26 | 97.32 ± 0.49 | 98.34 ± 0.18 |
| Kappa | 91.93 ± 0.01 | 97.47 ± 0.01 | 94.37 ± 0.78 | 97.68 ± 0.57 | 96.48 ± 2.16 | 95.07 ± 0.21 | 97.69 ± 0.34 | 98.80 ± 0.05 |
Table 8. Classification results of different methods on the Salinas dataset.
| Class | SVM | 3DCNN | SSRN | DFFN | HybridSN | Speformer | SSFTT | MAFEN |
|---|---|---|---|---|---|---|---|---|
| 1 | 99.35 ± 0.01 | 99.96 ± 0.01 | 99.64 ± 0.30 | 97.98 ± 4.04 | 99.94 ± 0.12 | 99.68 ± 0.53 | 99.99 ± 0.02 | 100 ± 0.00 |
| 2 | 99.88 ± 0.01 | 99.52 ± 0.01 | 100 ± 0.00 | 99.62 ± 0.22 | 100 ± 0.00 | 99.75 ± 0.23 | 99.99 ± 0.01 | 99.99 ± 0.02 |
| 3 | 99.12 ± 0.01 | 98.37 ± 0.01 | 99.99 ± 0.02 | 99.99 ± 0.02 | 100 ± 0.00 | 98.41 ± 0.90 | 99.97 ± 0.04 | 100 ± 0.00 |
| 4 | 99.74 ± 0.01 | 95.63 ± 0.02 | 98.79 ± 0.91 | 91.72 ± 8.36 | 99.87 ± 0.12 | 99.56 ± 0.25 | 99.62 ± 0.20 | 99.50 ± 0.35 |
| 5 | 98.11 ± 0.01 | 99.11 ± 0.01 | 99.69 ± 0.34 | 99.85 ± 0.29 | 99.71 ± 0.43 | 99.41 ± 0.25 | 99.68 ± 0.24 | 99.73 ± 0.44 |
| 6 | 99.96 ± 0.01 | 100 ± 0.00 | 99.83 ± 0.33 | 98.68 ± 1.88 | 99.99 ± 0.01 | 99.98 ± 0.02 | 99.98 ± 0.04 | 99.99 ± 0.02 |
| 7 | 99.85 ± 0.01 | 99.94 ± 0.01 | 99.85 ± 0.14 | 99.83 ± 0.07 | 99.95 ± 0.09 | 99.37 ± 0.49 | 99.95 ± 0.04 | 100 ± 0.00 |
| 8 | 90.83 ± 0.01 | 94.36 ± 0.02 | 91.89 ± 6.05 | 99.08 ± 1.71 | 99.92 ± 0.12 | 97.77 ± 0.34 | 96.81 ± 1.11 | 99.81 ± 0.15 |
| 9 | 99.75 ± 0.01 | 99.74 ± 0.01 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00 | 99.18 ± 0.30 | 99.97 ± 0.06 | 100 ± 0.00 |
| 10 | 96.17 ± 0.01 | 98.38 ± 0.01 | 97.73 ± 1.11 | 100 ± 0.00 | 99.84 ± 0.16 | 99.53 ± 0.24 | 99.36 ± 0.35 | 99.97 ± 0.05 |
| 11 | 96.12 ± 0.02 | 98.01 ± 0.01 | 99.52 ± 0.45 | 99.71 ± 0.48 | 100 ± 0.00 | 99.15 ± 0.32 | 99.86 ± 0.08 | 100 ± 0.00 |
| 12 | 100 ± 0.00 | 99.92 ± 0.01 | 100 ± 0.00 | 97.15 ± 1.66 | 99.65 ± 0.70 | 99.05 ± 0.90 | 99.84 ± 0.32 | 100 ± 0.00 |
| 13 | 98.88 ± 0.01 | 99.52 ± 0.01 | 99.73 ± 0.54 | 97.16 ± 1.97 | 39.29 ± 4.85 | 98.90 ± 0.44 | 98.81 ± 0.69 | 99.91 ± 0.08 |
| 14 | 94.31 ± 0.02 | 95.87 ± 0.01 | 95.16 ± 3.96 | 93.47 ± 6.03 | 96.95 ± 5.82 | 96.59 ± 1.77 | 98.88 ± 0.16 | 98.92 ± 0.38 |
| 15 | 69.37 ± 0.03 | 87.70 ± 0.05 | 87.48 ± 6.68 | 97.57 ± 2.63 | 97.10 ± 5.19 | 95.53 ± 0.62 | 97.83 ± 0.41 | 99.43 ± 0.24 |
| 16 | 98.95 ± 0.01 | 99.49 ± 0.01 | 99.16 ± 0.38 | 100 ± 0.00 | 99.93 ± 0.14 | 99.60 ± 0.28 | 99.27 ± 0.19 | 99.54 ± 0.26 |
| OA | 93.30 ± 0.01 | 96.64 ± 0.01 | 96.27 ± 0.54 | 98.77 ± 0.69 | 98.46 ± 1.19 | 98.49 ± 0.12 | 98.89 ± 0.28 | 99.82 ± 0.06 |
| AA | 96.27 ± 0.01 | 97.84 ± 0.01 | 98.03 ± 0.28 | 98.24 ± 0.85 | 95.76 ± 2.91 | 98.84 ± 0.21 | 99.36 ± 0.13 | 99.80 ± 0.04 |
| Kappa | 92.52 ± 0.01 | 96.26 ± 0.01 | 95.85 ± 0.59 | 98.63 ± 0.77 | 98.28 ± 1.33 | 98.32 ± 0.13 | 98.77 ± 0.31 | 99.80 ± 0.07 |
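Tables 5–8 report per-class accuracy together with three summary statistics: overall accuracy (OA), average accuracy (AA) and the Kappa coefficient. For reference, a minimal sketch of how these summaries follow from a run's predictions is given below (scikit-learn is an assumed tool here, not necessarily the authors' evaluation code).

```python
# Minimal sketch of the summary metrics in Tables 5-8, computed from a run's
# predictions with scikit-learn (assumed tooling, not the authors' evaluation code).
import numpy as np
from sklearn.metrics import confusion_matrix, cohen_kappa_score


def summary_metrics(y_true, y_pred):
    """Return OA, AA and Kappa in percent."""
    cm = confusion_matrix(y_true, y_pred)
    oa = np.trace(cm) / cm.sum()                # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))  # mean of the per-class accuracies
    kappa = cohen_kappa_score(y_true, y_pred)   # chance-corrected agreement
    return 100 * oa, 100 * aa, 100 * kappa
```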
Table 9. Computational performance of the compared methods and the proposed method.
| Methods | Indian Pines (Train./Test., s) | KSC (Train./Test., s) | Pavia University (Train./Test., s) | Salinas (Train./Test., s) |
|---|---|---|---|---|
| SSRN | 78.90 / 1.69 | 115.47 / 1.90 | 129.75 / 9.71 | 449.59 / 24.43 |
| DFFN | 65.56 / 1.82 | 37.79 / 1.10 | 83.54 / 8.26 | 112.91 / 11.71 |
| HybridSN | 73.85 / 2.02 | 40.54 / 1.02 | 91.93 / 9.21 | 115.28 / 12.37 |
| Speformer | 394.06 / 23.99 | 118.99 / 5.70 | 58.15 / 12.07 | 198.74 / 52.78 |
| SSFTT | 25.18 / 0.65 | 26.64 / 0.76 | 8.78 / 1.15 | 10.79 / 1.46 |
| MAFEN | 66.13 / 1.93 | 86.21 / 2.67 | 65.59 / 7.66 | 88.75 / 9.76 |
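Table 9 reports wall-clock training and testing times in seconds. A minimal sketch of how such timings can be collected is given below (the helper and the commented routine names are illustrative placeholders, not the authors' benchmark code).

```python
# Minimal sketch (assumed setup, not the authors' benchmark harness): wall-clock
# timing of a training run and an inference pass, as reported in Table 9.
import time


def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Usage, with train_model and predict standing in for the actual routines:
# _, train_seconds = timed(train_model, model, train_loader)
# _, test_seconds = timed(predict, model, test_loader)
```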