Incorporating Attention Mechanism, Dense Connection Blocks, and Multi-Scale Reconstruction Networks for Open-Set Hyperspectral Image Classification

Zhou, Huaming; Wu, Haibin; Wang, Aili; Iwahori, Yuji; Yu, Xiaoyu

doi:10.3390/rs15184535

Open AccessArticle

Incorporating Attention Mechanism, Dense Connection Blocks, and Multi-Scale Reconstruction Networks for Open-Set Hyperspectral Image Classification

by

Huaming Zhou

¹,

Haibin Wu

^1,*

,

Aili Wang

¹

,

Yuji Iwahori

²

and

Xiaoyu Yu

³

¹

Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China

²

Department of Computer Science, Chubu University, Kasugai 487-0027, Aichi, Japan

³

College of Electron and Information, University of Electronic Science and Technology of China, Zhongshan Institute, Zhongshan 528402, China

^*

Author to whom correspondence should be addressed.

Remote Sens. 2023, 15(18), 4535; https://doi.org/10.3390/rs15184535

Submission received: 27 July 2023 / Revised: 11 September 2023 / Accepted: 13 September 2023 / Published: 15 September 2023

(This article belongs to the Special Issue Recent Advances in Processing Mixed Pixels for Hyperspectral Image)

Download

Browse Figures

Versions Notes

Abstract

:

Hyperspectral image classification plays a crucial role in various remote sensing applications. However, existing methods often struggle with the challenge of unknown classes, leading to decreased classification accuracy and limited generalization. In this paper, we propose a novel deep learning framework called IADMRN, which addresses the issue of unknown class handling in hyperspectral image classification. IADMRN combines the strengths of dense connection blocks and attention mechanisms to extract discriminative features from hyperspectral data. Furthermore, it employs a multi-scale deconvolution image reconstruction sub-network to enhance feature reconstruction and provide additional information for classification. To handle unknown classes, IADMRN utilizes an extreme value theory-based model to calculate the probability of unknown class membership. Experimental results on the three public datasets demonstrate that IADMRN outperforms state-of-the-art methods in terms of classification accuracy for both known and unknown classes. Experimental results show that the proposed methods outperform several state-of-the-art methods, which outperformed DCFSL by 8.47%, 6.57%, and 4.25%, and outperformed MDL4OW by 4.35%, 4.08%, and 2.47% on the Salinas, University of Pavia, and Indian Pines datasets, respectively. The proposed framework is computationally efficient and showcases the ability to effectively handle unknown classes in hyperspectral image classification tasks. Overall, IADMRN offers a promising solution for accurate and robust hyperspectral image classification, making it a valuable tool for remote sensing applications.

Keywords:

hyperspectral image classification; unknown class handling; dense connection blocks; attention mechanisms; multi-scale deconvolution; extreme value theory

Graphical Abstract

1. Introduction

Hyperspectral imaging is a robust technology that combines spatial and spectral information about ground objects to provide highly detailed information about the Earth’s surface. Hyperspectral imaging (HSI) comprises multiple bands of image, each corresponding to a specific wavelength range, and is capable of detecting and identifying unique spectral signatures of different materials [1]. The spectral detection range of hyperspectral technology far exceeds the perception range of the human eye, making it a powerful tool for environmental monitoring, agriculture, ecology, oceanography, geology, and land management [2]. The purpose of HSI classification is to classify each pixel into a corresponding ground class (e.g., building, soil, grassland, tree, river, road, and so on). As a core step in HSI data processing, HSI classification plays an irreplaceable role in most hyperspectral technology applications. Nonetheless, precise classification of hyperspectral images remains a provocative task, especially in open scenes where previously unseen categories may appear in the dataset [3].

Hyperspectral image classification has been an ongoing and active topic of research for decades, and many methodologies have been advanced to cope with this provocative problem. In the early stages, spectral-based classification methodologies employ the spectral information of hyperspectral images to classify pixels. Principal component analysis (PCA) [4] and linear discriminant analysis (LDA) [5] are two extensively utilized feature extraction methods that have been adopted successfully for hyperspectral image classification. Nevertheless, these methods may not capture the complex and nonlinear relationships between spectral and spatial features of hyperspectral images, restricting their performance. Feature-based classification methods derive features from hyperspectral images and subject them to classification. Local Binary Pattern (LBP), Gray Level Co-generation Matrix (GLCM), and Histogram of Oriented Gradients (HOG) are somewhat frequent feature extraction methods adopted in hyperspectral image classification. Those methods, by contrast, probably fail to capture the complex and advanced features of hyperspectral images, yielding limited properties in some cases.

Deep learning techniques have evolved rapidly in the fields of computer vision and pattern recognition in recent years, and they are believed to produce better performance than traditional shallow classifiers. Inspired by the aggressive adoption of deep learning, many deep learning classifiers have been developed for hyperspectral image classification with remarkable performance. Chen et al. [6] initially came up with a deep HSI classification network based on a superposition autoencoder (SAE). They gave a further presentation of a convolutional neural network (CNN)-based HSI classification method that incorporates sparse connectivity and weight sharing for efficient feature extraction, which adopts sparse connectivity and weight sharing to achieve effective feature extraction [7]. Subsequently, Zhong et al. raised the spectral-spatial residual network (SSRN) for HSI classification, which relieves the gradient disappearance or explosion issue as the complexity of the network layers increases [8]. Contrary to conventional algorithms, deep learning methods can extract features automatically from a large set of labeled data rather than requiring human design of specific feature schemas [9]. However, they rely on the use of known labeled samples for their optimal performance [10].

Whereas the objectives of these methods are to attain high classification accuracy with training samples, this is based on the closed assumption that all training and test data originate from the same labeling space. Nevertheless, it is extremely challenging in real-world applications to stick to this assumption. In such cases, traditional classification methods often produce incorrect classification results, meaning that unknown samples are classified into a known category. Comparing with the open set of natural images, the open set of HSIs has three remarkable differences: fewer samples, fewer categories, and lower openness. Firstly, the difficulty of sampling HSI leads to a much smaller sample size of available training data versus natural images. Secondly, HSIs often have only a few or a dozen categories, whereas natural image datasets such as the ImageNet dataset may have thousands of categories. Finally, the unknown samples usually belong to the tail categories with small sample sizes and occupy only a few parts of the research area. Thus, it is hard to use the open-set classification technique developed for natural image classification directly for HSI classification. To tackle this concern, little work has specialized in the open-set classification of HSI. Pal et al. [11] posited a residual 3D convolutional block attention module to extract discriminative prototypes for each known class, and then designed a meta-learning-based outlier point calibration network to distinguish between known and unknown samples. To improve the robustness of HSI classification methods in open-set while maintaining the classification accuracy of known classes, Jun Yue [12] proposed a spectral-spatial reconstruction framework that simultaneously performs spectral feature reconstruction, spatial feature reconstruction, and pixel-wise classification in open-set. Zhuo [13] proposes a feature consistency-based prototype network (FCPN) for open-set HSI classification, which makes full use of feature consistency between homogeneous samples without pseudo-unseen samples. Nonetheless, the existence of several challenges persists. Hyperspectral images encompass both spatial and spectral information, yielding high-dimensional datasets that are provocative for precise analysis and classification in open-set [14]. Aside from others, spectral mixing is a frequent problem in open-set classification of hyperspectral images, where the spectral features of different materials overlap, making accurate classification difficult [15].

Some key distinctions exist between open-set image classification and traditional closed-set image classification. In closed-set classification, all the categories are known, whereas in open-set classification, some categories are unknown. It means that in open-set classification, the model needs to have the capabilities to process the cases of unknown categories. When using deeper or more structurally complex network models in real-world open-set hyperspectral image classification tasks, they are extremely susceptible to the issues of overfitting and gradient vanishing. Aiming at the problem of unknown class samples being misclassified into known classes in the closed-set classification algorithm, we advance a novel hyperspectral open-set classification method: A model for integrating attention mechanism dense connection blocks and a multiscale reconstruction network for hyperspectral open set classification (IADMRN) gives a solution for the open set dataset of hyperspectral images. by constructing the fusion attention mechanism, densely connected block feature extraction sub-networks, multi-scale reconstruction sub-networks, classification sub-networks, and the EVT (Extreme Value Theory) extreme value model, thus enabling the network to realize the classification function in the open dataset. Moreover, the identification ability of the network for unknown classes is further strengthened to adjust to open-set classification environments in practical applications. The major novelties of this study are summarized as follows:

We propose a novel feature extraction network model structure composed of dense connection blocks combined with an attention mechanism. It builds the connection relationship between different layers to make optimal use of features and mitigate the gradient disappearance problem. By using the channel attention module and the space attention module, you can solve the problem of what to focus on and where to focus on in the channel and spatial dimensions. Additionally, we enhance the attention mechanism model by bringing in a depthwise separable convolution, which reinforces the attention allocation in spatial and channel dimensions by splitting the correlation between spatial and channel dimensions. It promotes the feature formulation ability during forward propagation of the network and sufficiently extracts the feature information of small targets.
We harness multi-task learning to perform classification and reconstruction simultaneously, thus permitting automatic identification of unknown classes. Deconvolutional filters with different sizes reconstruct different semantic information. Therefore, a multi-scale feature reconstruction architecture is introduced, which reconstructs the spatial context at multiple scales, thereby making full use of the rich spatial information and enhancing the robustness of the feature reconstruction in a complex background. Additionally, the multi-scale reconstruction network based on deconvolution helps to recover fine-grained details lost during feature extraction. The model further incorporates a multi-scale DeConv layer to fuse together and bolster the reconstructed network, rendering the reconstructed images intact and raising the classification accuracy.
Experiments were evaluated on the Salinas, University of Pavia, and Indian Pines datasets. It demonstrates that the proposed method can obtain superior classification performance for the known class and unknown classes compared to other state-of-the-art classification methods.
The rest of this article is organized as follows: Section 2 describes our proposed classification approach in detail. Section 3 reports the experimental results and evaluates the performance of the proposed method. It also analyzes the selection of experimental parameters in Section 4. Section 5 gives the conclusion.

2. Methodology

Considering that existing methods usually struggle to overcome the challenges of unknown categories, resulting in degraded classification accuracy and limited generalization, In this paper, we propose a new deep learning framework called IADMRN, which addresses the unknown class handling problem in hyperspectral image classification. The overview architecture of the IADMRN model is shown in Figure 1.

In the Section 2, the proposed IADMRN will be described in detail. Firstly, we briefly introduce the overall structure of the IADMRN. Secondly, the feature extraction module is introduced. Thirdly, the multi-scale image reconstruction network is presented. Finally, the classification sub-network is presented.

2.1. The Proposed IADMRN Framework for Open-Set HSI Classification

Figure 1 presents the holistic framework of our proposed IADMRN for Open-Set HSI Classification, which is exemplified by the Indian Pines dataset. To begin with, the hyperspectral image to be classified is injected. IADMRN conducts feature extraction using a feature extraction sub-network formed by dense connection blocks combined with an attention mechanism. Dense connection blocks combined with a depthwise separable convolutional attention mechanism empower the network to repurpose information at multiple levels, capturing the most relevant or discriminative categorical features. It further expands the feature expressivity of the network by segmenting its spatial and channel dimensions with an optional concern for important features.

The extracted features are transferred to a multi-scale image reconstruction network with deconvolution to reconstruct the image. By fusing feature maps through multi-scale DeConv layers, the image reconstruction sub-network refines the reconstructed image by recovering fine-grained details lost during feature extraction, reducing information loss, and enhancing feature representation capabilities. The reconstruction loss is arrived at by contrasting the reconstructed image with the original image, and the reconstruction loss is injected into the EVT extreme value theory model to evaluate the probability that the samples in the image belong to the unknown class. If the probability of the samples belonging to the unknown class is low, the samples are adjudged as known class and input to the FC and SoftMax layers for normal classification; if the probability of the samples belonging to the unknown class is strong, the samples are assessed as unknown class.

2.2. Feature Extraction Sub-Network

As shown in Figure 2, the dense connection block and an attention mechanism [16] form the feature extraction sub-network. The feature extraction sub-network of IADMRN is designed to efficiently extract relevant and discriminative features from hyperspectral images. It consists of densely connected blocks combined with an attention mechanism to allow for effective information propagation and selective feature extraction. The dense connection block establishes the connection relationship between different layers to make the best use of hyperspectral image features. Depthwise separable convolution [17] can strengthen the attention mechanism mode, which reinforces the attention allocation on the spectral and spatial dimensions of hyperspectral images by splitting the correlation between spatial and channel dimensions.

Specifically, depthwise separable convolution first performs an independent convolution operation for each channel, which improves the expressiveness of each channel in the spatial dimension. Then, the correlations between different channels are taken into account by point-by-point convolution operations, which can strengthen the attention mechanism model in the spectral dimension. Consequently, depth-separable convolution can effectively strengthen the allocation of attention to hyperspectral images in both spectral and spatial dimensions to improve the performance of the model.

2.2.1. Dense Connection Block

To be specific, in the dense connection block, the input of each layer is a join of the outputs of all the previous layers [18]. The structure results in adequate information transfer within the module and, at the same time, effectively mitigates the gradient disappearance problem [19]. In a dense connection block, assuming that the input of the current layer is a hyperspectral image

X

and the output is the hyperspectral feature map

H

, the layer can be represented as:

H = F_{k} ([X, H_{k - 1} (X), H_{k - 2} (X), \dots, H_{0} (X)])

(1)

where

[X, H_{k - 1} (X), H_{k - 2} (X), \dots, H_{0} (X)]

denotes the joining of the outputs of all previous layers.

F_{k}

denotes the operation of the current layer [20].

H_{i}

denotes the operation of the

i

layer, and

k

denotes the number of the current layer. Specifically,

F_{k}

usually consists of BN, ReLU, and two-dimensional convolution operations, which can be expressed as:

F_{k} (X) = B_{k} (R_{k} (C_{k} ([X, H_{k - 1} (X), H_{k - 2} (X), \dots, H_{0} (X)])))

(2)

where

B_{k}

denotes the BN operation.

R_{k}

denotes the ReLU operation, and

C_{k}

denotes the two-dimensional convolution operation [21].

By introducing dense connectivity inside the module, dense connection blocks can effectively mitigate the gradient disappearing problem, and improve the model performance of extracting hyperspectral image features, and improve the feature representation without increasing the model complexity. The direct connections among layers of dense connection blocks can strengthen the expressive power of hyperspectral feature information, and the outputs of each layer are directly attached to the insects of all subsequent layers. Consequently, a high degree of sharing and transferring of information among layers is thus achieved, leading the network to capture more effectively various features more effectively from the data.

2.2.2. Depthwise Separable Convolution Attention Mechanism

The attention mechanism consists of depthwise separable convolution, channel attention, and spatial attention. Depthwise convolution convolves hyperspectral images in groups without changing the channel depth, and extracts the spatial features of each channel of the hyperspectral image separately. Depthwise separable convolution decomposes the standard convolution operation into deep convolution and point-by-point convolution. Point-by-point convolution realizes the information interaction among channels by performing a convolution operation on the input with a convolution kernel. Point-by-point convolution extracts the spectral features of hyperspectral images on a point-by-point basis using kernel functions. Therefore, the depthwise separable convolution has the merit of effectively extracting the spectral-spatial cooperative feature of hyperspectral images.

Depthwise separable convolution can strengthen the attention mechanism mode, which reinforces the attention allocation in spatial and channel dimensions by splitting the correlation between spatial and channel dimensions of the hyperspectral image. Depthwise separable convolution is an improvement based on traditional convolution, which decomposes the convolution operation into two steps: depth convolution and point-by-point convolution. Thus, the number of parameters and computations can be significantly reduced, while maintaining the effectiveness of the convolution operation [22]. Hence, by adding depthwise separable convolution in front of the attention mechanism, it can make the feature map more accurate and thus improve the performance of the final task.

Z_{i, j, k} = W_{k}^{point} * (\sum_{p = 0}^{k - 1} W_{p}^{depth} * X_{i, j, p})

(3)

A_{c, i, j} = w_{c} \cdot w_{i, j}

(4)

Y_{c, i, j} = A_{c, i, j} \cdot Z_{i, j, c}

(5)

O_{i, j} = \sum_{c = 0}^{C - 1} Y_{c, i, j}

(6)

where

X

is the input hyperspectral image feature map,

Z

is the hyperspectral image feature map after depthwise separable convolution processing [23].

A

is the CBAM attention map.

Y

is the weighted feature map.

O

is the final output feature map.

W^{d e p t h}

and

W^{p o i n t}

denote the depth convolution kernel and point-by-point convolution kernel of depthwise separable convolution, respectively.

w_{c}

and

w_{i, j}

denote the channel attention weight and spatial attention weight of CBAM, respectively, and

C

is the number of channels [24].

The attention mechanism combined with the implementation of deep separable convolution can squeeze the inconsequential parts of the hyperspectral image feature map to values close to zero, thus rendering the model more attentive to the important feature parts and providing improved accuracy and robustness of the hyperspectral image feature representation [25].

In particular, assuming that the input hyperspectral image feature map of the current network is

X

, the feature map obtained after the first dense connection block is

Z

. Then, an attention mechanism consisting of depthwise separable convolution, channel attention, and spatial attention is performed

Z

. The final output hyperspectral image feature map is

O

, which can be expressed as:

O = C B A M (DepSepConv (Z))

(7)

where

DepSepConv (\cdot)

denotes the depthwise separable convolution operation and

C B A M (\cdot)

denotes the CBAM attention mechanism operation [26]. Specifically, assuming that the dimension of

Z

is

H \times W \times C

, the operation of the depthwise separable convolution can be expressed as:

Y = DepthwiseConv (DepthwiseBN (DepthwiseReLU (PointwiseConv (Z))))

(8)

where

DepthwiseConv (\cdot)

denotes the deep convolution operation,

DepthwiseBN (\cdot)

denotes the deep BN operation.

DepthwiseReLU (\cdot)

denotes the deep ReLU operation, and

PointwiseConv (\cdot)

denotes the point convolution operation.

Y

has the same dimension as

Z

[27]. The operation of the CBAM attention mechanism can be expressed as:

O = \sum_{c = 1}^{C} A_{c} \cdot Y_{:, :, c}

(9)

where

A_{c}

denotes the attention weight of channel

c

and

Y_{:, :, c}

denotes the hyperspectral image feature map of

Y

on the

c

channel.

2.2.3. The Structure of Feature Extraction Sub-Network

In the whole network, the first fused attention mechanism dense connection block and the second fused attention mechanism dense connection block are sequentially cascaded to form the fused attention mechanism dense connection network. Specifically, the output

X_{2}

of the first fused attention mechanism thickly connected block is used as an input

X_{3}

of the second fused attention mechanism thickly connected block, as follows:

X_{0} \overset{D B 1}{\to} X_{2} \overset{D B 2}{\to} X_{4}

(10)

where

D B 1

denotes the first fused attention mechanism’s thickly connected block and

D B 2

denotes the second fused attention mechanism’s thickly connected block.

Dense connection blocks are utilized to forge a robust connection between the different layers of the network, permitting information to circulate more freely through the network and reducing the risk of gradient disappearance. It contributes to the efficiency and robustness of the model and also helps to prevent overfitting by reducing the number of parameters required [28]. The attention mechanism in the hyperspectral image feature extraction sub-network is induced by the use of depthwise separable convolution. It allows one to selectively focus on important features in spatial and channel dimensions, which is particularly relevant for hyperspectral images containing small targets and subtle spectral differences. By focusing on these important features, the model can achieve higher accuracy and better generalization performance. By attending to these important features, the model can achieve higher accuracy and better generalization performance.

The feature extraction sub-network in IADMRN is efficient and effective in extracting high-quality hyperspectral image features for subsequent classification and reconstruction tasks. By incorporating dense connection blocks with an attention mechanism, the feature extraction sub-network can achieve superior performance for HSI classification of open sets.

2.3. Image Reconstruction Sub-Network

The image reconstruction sub-network consists of a multi-scale deconvolution network, shown in Figure 3. The multi-scale deconvolution network is made up of five branches. By performing element-by-element summation of hyperspectral image feature maps obtained from different branches, the reconstructed images benefit from the aggregated information, resulting in a more complete and richer representation. Figure 3 gives the structure of the image reconstruction sub-network.

Specifically, since hyperspectral images usually contain a large number of spectral bands, the dimension catastrophe problem is easily encountered in feature extraction and classification tasks. Image reconstruction can solve this problem by performing dimensionality reduction on the original hyperspectral image, which can improve the efficiency and accuracy of the classifier.

Second, image reconstruction can also help improve the robustness of classification. Hyperspectral images are usually affected by noise and other interferences, which can lead to a decrease in the accuracy of the classifier. By using image reconstruction techniques, the effects of these noises and other disturbances can be reduced, and the robustness of the classifier to these disturbances can be improved [29].

In conclusion, image reconstruction can contribute to enhancing the interpretability of the classifier [30]. It is difficult to understand the internal workings of hyperspectral image classifiers because they are usually black-box models. Through using image reconstruction techniques, the original hyperspectral image can be converted into a more understandable form, leading to a better understanding of the workings of the classifier and the classification results.

Multi-scale deconvolution image reconstruction in IADMRN is based on the concepts of upsampling and feature fusion to reconstruct high-quality hyperspectral images. It follows a step-by-step process that involves multiple deconvolution operations with different kernel sizes and the fusion of the reconstructed feature maps. Deconvolution can be seen as the inverse process of convolution, allowing the low-dimensional feature vectors to be mapped back into the spatial domain of the original feature map [31]. This upsampling restores the spatial resolution of the feature maps. The principle can be summarized as follows:

C_{1} = DeConv (F, W_{1})

(11)

C_{2 i} = DeConv (C_{2 i - 1}, W_{2 i}), i = 1, \dots, 4

(12)

C_{2 j + 1} {= BN}^{(1 + {(- 1)}^{j + 1}) / 2} (ReLU (DeConv (C_{2 j - 1}, W_{2 j + 1})), j = 1, \dots, 4

(13)

F

represents the hyperspectral image feature map obtained from the feature extraction sub-network.

C_{i}

represent the reconstructed hyperspectral image feature maps at different branches of the hyperspectral image reconstruction sub-network.

W_{i}

are the convolution kernels used in the deconvolution operation.

DeConv

stands for deconvolution operation.

The hyperspectral image feature maps are reconstructed at different scales by applying deconvolution operations with varying kernel sizes. This multi-scale approach captures hyperspectral image features at different levels of detail, ranging from fine-grained details to broader contextual information. By using different kernel sizes, the network can effectively reconstruct features at different scales, enhancing the representation of the reconstructed hyperspectral images.

Q_{r} = C_{2} + C_{4} + C_{6} + C_{8} + C_{9}

(14)

Q_{r}

stands for the final reconstructed hyperspectral image.

The image reconstruction sub-network employs a multi-scale approach through the use of deconvolution operations. By utilizing various kernel sizes (such as 1 × 1, 3 × 3, 5 × 5, 7 × 7, and 9 × 9), it facilitates the reconstruction of hyperspectral image features at different scales, effectively capturing both fine-grained details and broader contextual information. This comprehensive feature reconstruction enhances the fidelity of the reconstructed hyperspectral images [32]. Further, the sub-network incorporates a fusion strategy that combines the reconstructed hyperspectral image feature maps from multiple branches. By element-wise summarizing the hyperspectral image feature maps obtained at different stages, the reconstructed hyperspectral image benefits from the aggregated information, resulting in a more complete and informative representation. This information fusion mechanism enables the identification of unknown classes by comparing the fused feature map with the original image.

2.4. Classification Sub-Network

Classification sub-network leverages EVT extreme value modeling to identify unknown classes within the hyperspectral image feature map. By calculating the probabilities associated with each pixel using the reconstruction loss, the EVT model distinguishes between known and unknown classes.

EVT is a statistical method commonly used in the modeling and analysis of extreme events that can be used to predict events that are likely to occur under extreme conditions. In deep learning, EVT is used in classification problems to distinguish between normal and abnormal data. The classification principle of EVT extreme value modeling is based on the assumption of extreme value distribution, i.e., the distribution of data can be approximated under extreme conditions. In classification problems, the EVT extreme value model is used to evaluate whether new data are normal or not by fitting the normal data in the training dataset to a model of the extreme value distribution, which is then used to evaluate whether the new data are normal or not.

As shown in Figure 4, A solution from the EVT model trained on five known classes: A, B, C, D, and E. Each class has its own independent shape and scale parameters learned from the data, and supports a soft margin. However, the unknown classes in the data do not have these characteristics. Via kernel-free non-linear modeling, the EVT model supports open set recognition and can reject the four “?” inputs that lie beyond the support of the training set as “unknown.” This capability allows for the inclusion of unknown class samples during training, which improves the classifier’s generalization ability and enables effective handling of previously unseen or novel classes. To make accurate classification decisions, the sub-network employs a threshold-based approach. By comparing the probability of belonging to the unknown class with a predefined threshold, the model effectively separates known and unknown classes [33]. This mechanism enables the classifier to assign samples to the appropriate class with high confidence, ensuring reliable classification results.

2.4.1. Reconstruction Loss Calculation

The Classification sub-network incorporates the calculation of reconstruction loss, which measures the discrepancy between the original and reconstructed hyperspectral images. The mean square error (MSE) is employed as the loss function to quantify the difference. By including the image reconstruction task as a secondary objective, the sub-network effectively utilizes the hyperspectral information and enforces the reconstruction fidelity during training.

Q_{r} = f_{_{r}} (X_{4})

(15)

l_{r} (Q, Q_{r}) = MSE ∥ Q - Q_{r} ∥_{1}

(16)

where

Q_{r}

is the reconstructed instance,

f_{_{r}} (.)

is the reconstruction function also called the decoder, and

X_{4}

is the potential feature of the encoder

f_{D B} (.)

output. We use the

l_{r} (.)

distance as the reconstruction loss.

2.4.2. EVT Extreme Value Modeling

To identify unknown classes, the Classification sub-network utilizes EVT extreme value modeling. The reconstruction loss serves as input to an EVT model, where the probability of each pixel belonging to the unknown class is determined. By leveraging extreme value theory, the model captures the distribution of reconstruction loss and assigns probabilities to each pixel, enabling effective identification of unknown class samples [34].

EVT indicates that the tail for should be a Weibull distribution. For a large class of distributions

S

and a threshold

r

sufficiently large, with

\{S_{1}, \dots, S_{n}\}

,

n

independent and identically distributed samples, the cumulative distribution function can be approximated by the generalized Pareto distribution (GPD).

\begin{matrix} P (s - r \leq s | s > r) = \frac{F_{V} (r + s) - F_{V} (s)}{1 - F_{S} (r)} \\ = R_{k, u} (s) \end{matrix}

(17)

The parameters

k

and

u

can be estimated from the given tail data. Here,

s

is the reconstruction loss, and

R_{k, u} (s)

is the cumulative distribution function of the GPD.

2.4.3. Threshold-Based Classification Decision

The Classification sub-network employs a threshold-based approach to make classification decisions. A predefined threshold value is compared to the probability of belonging to the unknown class. If the probability exceeds the threshold, the sample is classified as an unknown class. Otherwise, it is classified using traditional classification techniques, such as the fully connected (FC) layer followed by the SoftMax layer.

C l a s s = \{\begin{array}{l} U n k n o w n, P > T \\ FC (SoftMax (F_{v})), P \leq T \end{array}

(18)

P

present the probability of belonging to the unknown class.

T

present threshold.

F_{v}

stands for sample by feature extraction subtask. By setting an appropriate threshold, the sub-network effectively separates samples into known and unknown classes, ensuring reliable and accurate classification outcomes. As mentioned earlier, the SoftMax function transforms the score vector into a probability vector, and the class with the largest probability is considered the predicted class. A naive solution to identify unknown classes is to consider those instances with the largest probability smaller than 0.5 as unknown, which is one of the baselines in the experiments (SoftMax with threshold = 0.5).

3. Results

3.1. Datasets Description

In this section, the proposed methodology is evaluated using three hyperspectral datasets to assess its performance and effectiveness. These datasets are carefully chosen to cover a diverse range of applications and scenarios, allowing for a comprehensive evaluation of the proposed methodology’s dominance and applicability.

(1): Salinas: The Salinas dataset is named after the Salinas Valley in California, USA, where the data were collected. The dataset provides high-resolution hyperspectral images captured by an airborne sensor, containing detailed spectral and spatial information about the agricultural region [35,36]. The Salinas dataset consists of 512 × 217 pixels, with a total of 224 spectral bands covering the wavelength range from 0.2 to 2.4 μm. In addition, some unannotated man-made materials have been annotated as unknown samples, and the reference map contains 17 classes in total. The number of classes is presented in Table 1.
(2): University of Pavia: The University of Pavia dataset is an extensively accessible hyperspectral remote sensing dataset, commonly adopted in the field of hyperspectral image analysis and classification. It was gathered by the Reflection Optical System Imaging Spectrometer (ROSIS) sensor in an agricultural area of Pavia, Italy. The dataset delivers high spatial resolution and covers a vast spectral range, yielding valuable information for a variety of land cover and land use applications [37]. It incorporates 103 bands after filtering out the 12 bands affected by noise and water absorption, with a scene size of 610 × 340 and a GSD of 1.3 m. Additionally, some buildings left unannotated are marked as unknown. They share distinct spectral profiles with the known land cover, and the reference map contains 10 classes. The detailed number of pixels available in each class, the false-color composite image, and the ground truth map are presented in Table 2.
(3): Indian Pines: The Indian Pines dataset provides both spatial and spectral information about the captured scene. It consists of a 145 × 145 pixel spatial resolution, meaning it contains 21,025 pixels in total. Each pixel in the dataset represents a hyperspectral signature containing spectral reflectance information across 145 different bands [38]. The Indian Pines dataset encompasses a diverse set of land cover classes, including crops, vegetation, bare soil, and man-made structures. It contains a total of 16 different land cover classes, making it suitable for various classification and analysis tasks. Since the number of instantiations was less than 10 for part classes, these tail classes were discarded in the experiments following the paper [35] and regarded as unknown classes, leading to 8 known classes. Details of the dataset can be found in Table 3.

3.2. Experimental Parameters Setting

(1): Implementation Details: All experiments were performed on an Intel(R) Xeon(R) 4208 CPU @ 2.10 GHz processor and Nvidia GeForce RTX 2080Ti graphics card. The AdaDelta optimizer is used for backpropagation in the training phase. The learning rate was set at 1.0 for the first 170 epochs and then at 0.1 for another 30 epochs before the training stopped. We adopted the early stopping mechanism to accelerate the training. If the loss does not decrease for five epochs, the learning proceeds immediately to the next phase.
(2): Evaluation Indexes: Three widely used objective indexes, that is, overall accuracy (OA), average accuracy (AA), and Kappa coefficient, are adopted to verify the classification effect of all methods. All objective results are listed by calculating the average of ten random training samples.

3.3. Comparison of the Proposed Methods with the State-of-the-Art Methods

The experiment mainly compares the proposed algorithm IADMRN with Convolutional Neural Network (CNN) [39], Residual Network (ResNet) [40], Deep Cross-Domain Few-Shot Learning (DCFSL) [41], multitask deep learning methods with unknown classes in the open world (MDL4OW) [11] and generative adversarial networks with open set domain adaptation (OS-GAN) [42] classification performance for the hyperspectral dataset. 200 training samples per class were used as the training sample number for hyperspectral classification, as shown in Table 4, Table 5 and Table 6. Compared with other methods, the IADMRN proposed in this paper has the highest classification accuracy for three datasets.

In the Salinas Valley dataset, which contains 16 agricultural classes, some of the man-made materials are not considered and are treated as unknown classes. As shown in Table 4, the OA of IADMRN was 92.53%. Among all classification groups, the classification method IADMRN proposed in this paper has the highest OA, AA, and Kappa coefficients, reaching 92.53%, 92.94%, and 90.92%, respectively. In addition, CNN, ResNet, and DCFSL methods cannot well identify the unknown class, and the classification accuracy for the unknown class is less than 5%. Compared with MDL4OW and OS-GAN, the recognition accuracy of OA increased by 4.35% and 8.47%, respectively. In contrast, our method obtains satisfactory classification accuracy with 80% class accuracy for the unknown class, which illustrates the effectiveness of the proposed open-set classification module.

For the UP dataset, we report the class-by-class accuracy, open OA, open AA, and 100 Kappa score of the proposed method and the comparison methods in Table 5. From this table, we can see that compared with the CNN, ResNet, DCFSL, OS-GAN, and MDL4OW methods, the OA of IADMRN increased by 20.06%, 13.43%, 9.6%, 6.57%, and 4.08%, respectively.

Table 6 shows the classification results on the Indian Pines dataset; again, our approach has produced the best results with an OA of 86.79%. Compared to other learning methods such as CNN, ResNet, DCFSL, OS-GAN, and MDL4OW, the improvements are 18.5%, 9.04%, 5.66%, 4.25%, and 2.47% in OA; 16.4%, 9.92%, 7.77%, 5.62%, and 3.05% in AA; and 22.01%, 11.2%, 7.94%, 6.44%, and 2.52% in the Kappa, respectively. For the classification accuracy of unknown classes, our method improved over OS-GAN and MDL4OW by 34.18% and 25.54%, respectively.

In addition, the quantitative classification results reported, we simultaneously visualized the classification maps corresponding to the results in Figure 5, Figure 6 and Figure 7.

In the Salinas dataset, the unknown class is the white area in the truth map, which is mainly distributed in the lower left and middle of the picture. The black areas in the classification map are the novel classes identified by the models. It can be seen that CNN, ResNet, and DCFSL methods fail to identify the unknown samples. In the classification diagrams of these methods, the black areas are hardly visible. These methods mentioned above ignore the difference between the known classes and unknown samples; therefore, they cannot correctly classify the unknown samples. Scatter is still present in the upper left region of the Figure 6d,e. And the black, unknown-class area in the middle region is not identified completely. Therefore, OS-GAN and MDL4OW methods classify the unknown class; however, there are still large problems of misclassification and failure to identify the unknown class. In Figure 7, it is obvious that the CNN, ResNet, and DCFSL methods cannot perform well on this dataset. These methods are unable to extract unknown class-discriminative features and thus produce poor classification maps.

Different from these methods, the proposed method in Figure 5f, Figure 6f and Figure 7f can well classify the unknown samples and can well align with the reference image. Based on these experimental results, it can be concluded that the proposed method achieves the best classification performance among all studied methods and is able to greatly improve the open-set classification accuracy. IADMRN achieves superior classification accuracy for unknown classes compared to other methods due to its advanced features and methodologies. The dense connection network model structure combined with an attention mechanism enhances feature extraction and discrimination, mitigating the problem of gradient vanishing. The incorporation of deep separable convolution improves the attention mechanism, allowing for better extraction of spatial and channel features relevant to unknown classes. The multi-scale deconvolution image reconstruction recovers fine-grained details and provides additional cues for distinguishing unknown classes. Multi-task learning combines classification and reconstruction tasks, leveraging reconstruction loss to improve classification accuracy. EVT-based unknown class identification accurately identifies unknown classes, enhancing classification performance. Overall, IADMRNs innovations enable it to leverage rich feature representations, handle unknown classes effectively, and achieve superior classification accuracy.

4. Discussion

In order to find the optimal network structure, it is necessary to experiment with different parameters, which play a crucial role in the size of the model and the complexity of the proposed IADMRN. In this paper, the optimal parameter combination is determined by analyzing the influence of parameters on the accuracy of classification results, including the Ablation Study, the threshold of SoftMax, and the number of branches in the feature fusion strategy.

4.1. The Threshold of SoftMax

The first parameter is the threshold for SoftMax. The Classification sub-network employs a threshold-based approach to make classification decisions. A predefined threshold value is compared to the probability of belonging to the unknown class. The SoftMax function transforms the score vector into a probability vector, and the class with the largest probability is considered the predicted class. We show a threshold analysis in Figure 8 to demonstrate that the effectiveness of the SoftMax threshold setting. As seen in Figure 8, the threshold value increases the higher the OA value. Up to the threshold value of 0.5, the OA value is the highest. Beyond 0.5, the OA value decreases as the threshold value increases. The choice of the SoftMax threshold can have implications for the performance of the model. If the threshold is too high, the model may be overly conservative and classify many samples as unknown classes, even though they belong to a known class. This can lead to lower accuracy on the known class classification task. On the other hand, if the threshold is too low, the model may be too lenient and classify many samples as known classes, even though they belong to an unknown class. This can lead to lower accuracy on the unknown class classification task.

4.2. The Number of Branches in Feature Fusion Strategy

The second parameter is the number of branches in the feature fusion strategy. This paper analyzes the correlation and complementarity of information in the deep network using multibranch feature fusion. IADMRN2, IADMRN3, IADMRN4, IADMRN5, and IADMRN6 refer to methods that fuse two, four, five, and six hierarchical branches. Among them, IADMRN2 represents the fusion of the first and second branches. IADMRN6 represents the duplicate fusion of the fifth branch and IADMRN5. It can be seen from Figure 9 that in different datasets, IADMRN5s precision values are superior to those of IADMRN2, IADMRN3, and IADMRN4. In addition, taking the Salinas dataset as an example, compared with the IADMRN2, the OA, AA, and Kappa values of the IADMRN5 fusion strategy increased by 2.28%, 3.32%, and 2.79%, respectively. As can be seen in Figure 9, IADMRN2 only serves as a feature map due to its smallest sensory field and cannot capture other scale features. Therefore, the reconstructed image is the worst, resulting in the lowest classification accuracy. When IADMRN5 contains deconvolutional layers with five different scale receptive fields, the features at different scales can be fully extracted, and the reconstruction performance is optimal.

To some extent, multiple layer fusion improves the image reconstruction performance. And the classification accuracy of IADMRN6 is equal to or even slightly lower than that of IADMRN5. This indicates that excessive fusion layers may bring redundant information, which increases the number of parameters and reduces the reconstruction speed and quality. Therefore, the IADMRN proposed in this paper uses five branches for feature fusion, and its structure is shown in Figure 3.

4.3. Ablation Study

In order to verify the effectiveness of each module in IADMRN, an ablation study is conducted on the Salinas dataset and the Pavia University dataset. On the basis of IADMRN, we tried to replace the dense block module with an ordinary convolution layer, which constitutes IADMRN without the Dense connection block module. We use ordinary convolution instead of depthwise separable convolution and deconvolution instead of Multi-scale deconvolution image reconstruction, representing IADMRN without the Depthwise separable convolution attention mechanism and IADMRN without the Multi-scale deconvolution image reconstruction, respectively. In the experimental setup of this ablation study, IADMRN without the Multi-scale deconvolution image reconstruction, IADMRN without the Depthwise separable convolution attention mechanism, and IADMRN without the Dense connection block module were compared with our proposed method.

The relevant accuracy indicators and results of the ablation study are shown in Figure 10. It can be seen from the table that each module proposed in this paper plays a very important role in improving the accuracy. After removing the Multi-scale deconvolution image reconstruction module, the OA is decreased by 2.5%. After removing the Depthwise separable convolution attention mechanism module, the OA is decreased by 2.1%. It can be concluded that each module in the proposed method contributes to the improvement of the accuracy and plays an important role in the performance of the network.

5. Conclusions

In this paper, we propose IADMRN, a novel framework for hyperspectral image classification with a focus on handling unknown classes. The framework combines advanced techniques such as feature extraction sub-networks with dense connection blocks and attention mechanisms, multi-scale deconvolution image reconstruction, and EVT-based unknown class identification. Through extensive evaluations on diverse hyperspectral datasets, including the Indian Pines, UP, and Salinas datasets, we demonstrated the dominance of IADMRN in terms of classification accuracy, particularly for unknown classes.

Our results showed that IADMRN outperforms existing methods in terms of classification accuracy for both known and unknown classes. The feature extraction sub-network effectively captures and utilizes discriminative features, mitigating the problem of gradient vanishing and enhancing classification performance. The multi-scale deconvolution image reconstruction leverages fine-grained details and contextual information, leading to improved classification accuracy. The integration of EVT-based unknown class identification enables accurate identification and assignment of unknown class labels, further enhancing classification results.

In conclusion, IADMRN presents a comprehensive and effective solution for hyperspectral image classification with unknown classes. Its innovative features and methodologies contribute to superior classification accuracy by addressing the limitations of existing methods. The framework’s versatility and performance across different datasets and applications make it a valuable tool in hyperspectral image analysis. Future work could explore potential enhancements, such as incorporating domain adaptation techniques or extending the framework to handle other types of data modalities, further advancing the field of hyperspectral image classification.

Author Contributions

Conceptualization, A.W., H.Z., H.W., Y.I. and X.Y.; methodology, software, validation, H.Z.; writing—review and editing, H.W., A.W., Y.I. and X.Y.; supervision, Y.I. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the high-end foreign expert introduction program (G2022012010L), Heilongjiang Natural Science Foundation Project (LH2023F034) and Reserved Leaders of Heilongjiang Provincial Leading Talent Echelon (2021).

Data Availability Statement

https://pan.baidu.com/s/1BlMWN9nADMSTuVQeK9U98g?pwd=iqcm (accessed on 15 June 2023).

Conflicts of Interest

The authors declare no conflict of interest.

References

Cao, X.; Yao, J.; Xu, Z.; Meng, D. Hyperspectral image classification with convolutional neural network and active learning. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4604–4616. [Google Scholar] [CrossRef]
Wang, L.; Shi, Y.; Zhang, Z. Hyperspectral Image Classification Combining Improved Local Binary Mode and Superpixel-level Decision. J. Signal Process. 2023, 39, 61–72. [Google Scholar] [CrossRef]
Fang, L.; Zhu, D.; Yue, J.; Zhang, B.; He, M. Geometric-Spectral Reconstruction Learning for Multi-Source Open-Set Classification with Hyperspectral and LiDAR Data. IEEE/CAA J. Autom. Sin. 2022, 9, 1892–1895. [Google Scholar] [CrossRef]
Imani, M.; Ghassemian, H. Principal component discriminant analysis for feature extraction and classification of hyperspectral images. In Proceedings of the 2014 Iranian Conference on Intelligent Systems (ICIS), Bam, Iran, 4–6 February 2014; pp. 1–5. [Google Scholar] [CrossRef]
Yuan, H.; Lu, Y.; Yang, L.; Luo, H.; Tang, Y.Y. Spectral-spatial linear discriminant analysis for hyperspectral image classification. In Proceedings of the 2013 IEEE International Conference on Cybernetics (CYBCO), Lausanne, Switzerland, 13–15 June 2013; pp. 144–149. [Google Scholar] [CrossRef]
Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep learning-based classification of hyperspectral data. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2014, 7, 2094–2107. [Google Scholar] [CrossRef]
Chen, Y.; Jiang, H.; Li, C.; Jia, X.; Ghamisi, P. Deep feature extraction and classification of hyperspectral images based on convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2016, 54, 6232–6251. [Google Scholar] [CrossRef]
Zhong, Z.; Li, J.; Luo, Z.; Chapman, M. Spectral–spatial residual network for hyperspectral image classification: A 3-D deep learning framework. IEEE Trans. Geosci. Remote Sens. 2017, 56, 847–858. [Google Scholar] [CrossRef]
Liu, Y.; Tang, Y.; Zhang, L.; Liu, L.; Song, M.; Gong, K.; Peng, Y.; Hou, J.; Jiang, T. Hyperspectral open set classification with unknown classes rejection towards deep networks. Int. J. Remote Sens. 2020, 41, 6355–6383. [Google Scholar] [CrossRef]
Duan, Z.; Chen, H.; Li, X.; Zhou, J.; Wang, Y. A Semi-Supervised Learning Method for Hyperspectral-Image Open Set Classification. Photogramm. Eng. Remote Sens. 2022, 88, 653–664. [Google Scholar] [CrossRef]
Pal, D.; Bundele, V.; Sharma, R.; Banerjee, B.; Jeppu, Y. Few-shotopen-set recognition of hyperspectral images with outlier calibration network. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; pp. 3801–3810. [Google Scholar]
Yue, J.; Fang, L.; He, M. Spectral-spatial latent reconstruction for open-set hyperspectral image classification. IEEE Trans. Image Process. 2022, 31, 5227–5241. [Google Scholar] [CrossRef]
Xie, Z.; Duan, P.; Liu, W.; Kang, X.; Wei, X.; Li, S. Feature Consistency-Based Prototype Network for Open-Set Hyperspectral Image Classification. IEEE Trans. Neural Netw. Learn. Syst. 2023, 1–11. [Google Scholar] [CrossRef]
Liu, Y.; Hou, J.; Peng, Y.; Jiang, T. Distance-based hyperspectral open-set classification of deep neural networks. Remote Sens. Lett. 2021, 12, 636–644. [Google Scholar] [CrossRef]
Tang, X.; Peng, Y.; Li, C.; Zhou, T. Open set domain adaptation based on multi-classifier adversarial network for hyperspectral image classification. J. Appl. Remote Sens. 2021, 15, 044514. [Google Scholar] [CrossRef]
Wen, C.; Hong, M.; Yang, X.; Jia, J. Pulmonary Nodule Detection Based on Convolutional Block Attention Module. In Proceedings of the 2019 Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019; pp. 8583–8587. [Google Scholar] [CrossRef]
Srivastava, H.; Sarawadekar, K. A Depthwise Separable Convolution Architecture for CNN Accelerator. In Proceedings of the 2020 IEEE Applied Signal Processing Conference (ASPCON), Kolkata, India, 7–9 October 2020; pp. 1–5. [Google Scholar]
Liu, S.; Shi, Q.; Zhang, L. Few-shot hyperspectral image classification with unknown classes using multitask deep learning. IEEE Trans. Geosci. Remote Sens. 2020, 59, 5085–5102. [Google Scholar] [CrossRef]
Liu, Y.; Hou, J.; Peng, Y.; Xu, Y.; Jiang, T. Hyperspectral Open Set Classification towards Deep Networks Based on Boxplot. IOP Conf. Ser. Earth Environ. Sci. 2021, 693, 012085. [Google Scholar] [CrossRef]
Li, S.; Song, W.; Fang, L.; Chen, Y.; Ghamisi, P.; Benediktsson, J.A. Deep learning for hyperspectral image classification: An overview. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6690–6709. [Google Scholar] [CrossRef]
Camps-Valls, G.; Bruzzone, L. Kernel-based methods for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2005, 43, 1351–1362. [Google Scholar] [CrossRef]
He, L.; Li, J.; Liu, C.; Li, S. Recent advances on spectral–spatial hyperspectral image classification: An overview and new guidelines. IEEE Trans. Geosci. Remote Sens. 2017, 56, 1579–1597. [Google Scholar] [CrossRef]
Yang, X.; Ye, Y.; Li, X.; Lau, R.Y.; Zhang, X.; Huang, X. Hyperspectral image classification with deep learning models. IEEE Trans. Geosci. Remote Sens. 2018, 56, 5408–5423. [Google Scholar] [CrossRef]
Chen, H.; Miao, F.; Chen, Y.; Xiong, Y.; Chen, T. A hyperspectral image classification method using multifeature vectors and optimized KELM. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 2781–2795. [Google Scholar] [CrossRef]
Hong, D.; Han, Z.; Yao, J.; Gao, L.; Zhang, B.; Plaza, A.; Chanussot, J. SpectralFormer: Rethinking hyperspectral image classification with transformers. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5518615. [Google Scholar] [CrossRef]
Hong, D.; Gao, L.; Yao, J.; Zhang, B.; Plaza, A.; Chanussot, J. Graph convolutional networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 5966–5978. [Google Scholar] [CrossRef]
Mou, L.; Lu, X.; Li, X.; Zhu, X.X. Nonlocal graph convolutional networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 8246–8257. [Google Scholar] [CrossRef]
Hang, R.; Li, Z.; Liu, Q.; Ghamisi, P.; Bhattacharyya, S.S. Hyperspectral image classification with attention-aided CNNs. IEEE Trans. Geosci. Remote Sens. 2020, 59, 2281–2293. [Google Scholar] [CrossRef]
Wang, H.; Wang, A.C.; Cang, S. Hyperspectral remote sensing image reconstruction method based on compressive sensing. Chin. J. Liq. Cryst. Disp. 2017, 32, 219–226. [Google Scholar] [CrossRef]
Guo, Y.L.; Li, Y.M.; Zhu, L.; Xu, D.Q.; Li, Y.; Tan, J.; Zhou, L.; Liu, G. Research of hyperspectral reconstruction based on HJ1A-CCD data. Huan Jing ke Xue = Huanjing Kexue 2013, 34, 69–76. [Google Scholar]
He, X.; Chen, Y.; Lin, Z. Spatial-spectral transformer for hyperspectral image classification. Remote Sens. 2021, 13, 498. [Google Scholar] [CrossRef]
Rudd, E.M.; Jain, L.P.; Scheirer, W.J.; Boult, T.E. The Extreme Value Machine. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 762–768. [Google Scholar] [CrossRef]
Pal, D.; Bose, S.; Banerjee, B.; Jeppu, Y. Extreme Value Meta-Learning for Few-Shot Open-Set Recognition of Hyperspectral Images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5512516. [Google Scholar] [CrossRef]
Sawant, S.S.; Manoharan, P. Unsupervised band selection based on weighted information entropy and 3D discrete cosine transform for hyperspectral image classification. Int. J. Remote Sens. 2020, 41, 3948–3969. [Google Scholar] [CrossRef]
Lee, H.; Kwon, H. Going deeper with contextual CNN for hyperspectral image classification. IEEE Trans. Image Process. 2017, 26, 4843–4855. [Google Scholar] [CrossRef]
Zhong, Z.; Li, J.; Ma, L.; Jiang, H.; Zhao, H. Deep residual networks for hyperspectral image classification. In Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA, 23–28 July 2017; pp. 1824–1827. [Google Scholar]
Bhosle, K.; Musande, V. Evaluation of deep learning CNN model for land use land cover classification and crop identification using hyperspectral remote sensing images. J. Indian Soc. Remote Sens. 2019, 47, 1949–1958. [Google Scholar] [CrossRef]
Li, Q.; Wong FK, K.; Fung, T. Mapping multi-layered mangroves from multispectral, hyperspectral, and LiDAR data. Remote Sens. Environ. 2021, 258, 112403. [Google Scholar] [CrossRef]
Morchhale, S.; Pauca, V.P.; Plemmons, R.J.; Torgersen, T.C. Classification of pixel-level fused hyperspectral and lidar data using deep convolutional neural networks. In Proceedings of the 2016 8th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), Los Angeles, CA, USA, 21–24 August 2016; pp. 1–5. [Google Scholar]
Liu, X.; Meng, Y.; Fu, M. Classification Research Based on Residual Network for Hyperspectral Image. In Proceedings of the 2019 IEEE 4th International Conference on Signal and Image Processing (ICSIP), Wuxi, China, 19–21 July 2019; pp. 911–915. [Google Scholar]
Li, Z.; Liu, M.; Chen, Y.; Xu, Y.; Li, W.; Du, Q. Deep Cross-Domain Few-Shot Learning for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5501618. [Google Scholar] [CrossRef]
Krishnendu, C.S.; Sowmya, V.; Soman, K.P. Impact of Dimension Reduced Spectral Features on Open Set Domain Adaptation for Hyperspectral Image Classification. In Evolution in Computational Intelligence; Bhateja, V., Peng, S.L., Satapathy, S.C., Zhang, Y.D., Eds.; Advances in Intelligent Systems and Computing; Springer: Singapore, 2021; Volume 1176. [Google Scholar] [CrossRef]

Figure 1. The framework of proposed IADMRN for Open-Set HSI Classification.

Figure 2. The structure of the feature extraction sub-network.

Figure 3. The structure of image reconstruction sub-network.

Figure 4. EVT model can identify five categories A, B, C, D, E and reject the “?” inputs that lie beyond the support of the training set as “unknown.”.

Figure 5. Classification maps obtained by different methods on the Salinas dataset with 200 training samples. (a) CNN. (b) ResNet. (c) DCFSL. (d) OS-GAN. (e) MDL4OW. (f) Our method.

Figure 6. Classification maps obtained by different methods on the Pavia Dataset with 200 training samples (a) CNN. (b) ResNet. (c) DCFSL. (d) OS-GAN. (e) MDL4OW. (f) Our method.

Figure 7. Classification maps obtained by different methods on the Indian Pines with 200 training samples. (a) CNN. (b) ResNet. (c) DCFSL. (d) OS-GAN. (e) MDL4OW. (f) Our method.

Figure 8. Comparison of the different threshold of SoftMax under IADMRN.

Figure 9. Comparison of the different branches combination in feature fusion strategy.

Figure 10. The ablation experiments at Salinas and Pavia University scene.

Table 1. Salinas Valley Dataset Labeled Sample Counts.

No	Name	Number
1	Weeds-2	3726
2	Stubble	3979
3	Lettuce-4wk	1068
4	Vinyard-U	7268
5	Lettuce-5wk	1927
6	Lettuce-6wk	916
7	Grapes	11,271
8	Vinyard-T	1807
9	Weeds-1	2009
10	Celery	3579
11	Fallow	1976
12	Fallow-P	1394
13	Fallow-S	2678
14	Corn	3278
15	Lettuce-7wk	1070
16	Soil	6203
17	Unknown	5613
Total Numbers		59,742

Table 2. University of Pavia Dataset Labeled Sample Counts.

No	Name	Number
1	Asphalt	6631
2	Meadows	18,649
3	Gravel	2099
4	Trees	3064
5	Metal S.	1345
6	Bare S.	5029
7	Bitumen	1330
8	Brick	3682
9	Shadow	947
10	Unknown	5163
	Total Numbers	47,939

Table 3. Indian Pines Dataset Labeled Sample Counts.

No	Name	Number
1	Corn-notill	1428
2	Corn-mintill	830
3	Grass-pasture	483
4	Haywindrowed	478
5	Soybean-notill	972
6	Soybean-mintill	2454
7	Soybean-clean	593
8	Woods	1265
9	Unknown	2007
Total Numbers		10,510

Table 4. Classification results on the Salinas Valley dataset by different classification methods.

Classes	CNN	ResNet	DCFSL	OS-GAN	MDL4OW	IADMRN
Weeds-1	82.16	81.22	85.47	87.76	90.41	91.75
Weeds-2	86.02	87.38	90.51	91.65	93.27	94.52
Fallow	90.25	89.16	92.04	94.07	95.71	96.94
Fallow-P	87.43	86.31	89.68	93.88	96.52	97.50
Fallow-S	77.29	79.03	84.75	83.27	86.81	87.44
Stubble	85.97	86.27	88.01	90.43	92.35	93.12
Celery	89.42	88.69	92.26	91.72	93.54	94.87
Grapes	71.03	72.64	74.19	76.96	80.14	82.21
Soil	89.64	90.55	93.13	92.75	96.06	97.83
Corn	85.21	87.94	91.27	93.41	93.15	94.06
Lettuce-4wk	84.16	83.08	86.39	89.02	93.76	95.21
Lettuce-5wk	83.29	85.45	90.84	93.08	95.73	96.64
Lcttuce-6wk	87.39	88.06	92.67	95.13	97.07	98.45
Lcttuce-7wk	88.34	87.24	90.64	95.94	96.32	97.81
Vinyard-U	73.61	69.21	78.65	80.06	81.13	83.19
Vinyard-T	87.12	88.13	90.59	92.37	94.25	96.95
Unknown	1.51	3.02	4.87	32.45	61.84	80.76
OA (%)	78.14	79.54	82.34	84.06	88.18	92.53
AA (%)	82.16	83.06	85.12	87.64	91.87	92.94
Kappa (%)	71.43	72.18	77.91	81.29	86.32	90.92

Table 5. Classification results on the University of Pavia dataset by different classification methods.

Classes	CNN	ResNet	DCFSL	OS-GAN	MDL4OW	IADMRN
Asphalt	78.21	85.43	88.46	89.24	91.32	93.59
Meadows	82.72	87.05	92.12	93.17	92.05	93.48
Gravel	84.56	86.61	89.37	91.56	93.79	97.52
Trees	85.09	89.03	92.05	91.28	93.52	95.16
Metal S.	76.10	78.85	86.53	88.74	87.31	89.91
Bare S.	77.21	83.49	89.46	91.59	91.07	95.35
Bitumen	80.43	83.12	87.91	88.63	90.62	92.06
Brick	75.15	76.59	83.19	82.04	86.51	89.19
Shadow	80.64	78.17	82.59	85.17	83.84	87.57
Unknown	1.79	2.41	3.35	31.61	57.84	79.23
OA (%)	73.66	80.29	84.12	87.15	89.64	93.72
AA (%)	75.59	82.73	84.03	89.79	92.12	94.06
Kappa (%)	72.12	75.44	80.47	85.96	88.57	91.19

Table 6. Classification results on the Indian Pines dataset by different classification methods.

Classes	CNN	ResNet	DCFSL	OS-GAN	MDL4OW	IADMRN
Asphalt	70.03	77.27	86.21	89.42	91.13	92.18
Meadows	69.87	74.25	88.59	89.57	90.08	91.49
Gravel	72.15	76.13	81.34	84.95	83.71	85.47
Trees	82.06	87.05	91.71	93.01	96.57	97.08
Metal S.	84.64	86.90	88.65	87.29	90.54	91.35
Bare S.	77.98	85.36	89.20	91.82	93.39	94.86
Bitumen	80.75	85.22	84.46	85.53	87.76	90.71
Brick	78.02	81.14	83.25	84.76	87.21	89.02
Unknown	0.79	3.07	5.81	25.44	34.08	59.62
OA (%)	68.29	77.75	81.13	82.54	84.32	86.79
AA (%)	73.41	79.89	82.04	84.19	86.76	89.81
Kappa (%)	61.54	72.35	75.61	77.11	81.03	83.55

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhou, H.; Wu, H.; Wang, A.; Iwahori, Y.; Yu, X. Incorporating Attention Mechanism, Dense Connection Blocks, and Multi-Scale Reconstruction Networks for Open-Set Hyperspectral Image Classification. Remote Sens. 2023, 15, 4535. https://doi.org/10.3390/rs15184535

AMA Style

Zhou H, Wu H, Wang A, Iwahori Y, Yu X. Incorporating Attention Mechanism, Dense Connection Blocks, and Multi-Scale Reconstruction Networks for Open-Set Hyperspectral Image Classification. Remote Sensing. 2023; 15(18):4535. https://doi.org/10.3390/rs15184535

Chicago/Turabian Style

Zhou, Huaming, Haibin Wu, Aili Wang, Yuji Iwahori, and Xiaoyu Yu. 2023. "Incorporating Attention Mechanism, Dense Connection Blocks, and Multi-Scale Reconstruction Networks for Open-Set Hyperspectral Image Classification" Remote Sensing 15, no. 18: 4535. https://doi.org/10.3390/rs15184535

APA Style

Zhou, H., Wu, H., Wang, A., Iwahori, Y., & Yu, X. (2023). Incorporating Attention Mechanism, Dense Connection Blocks, and Multi-Scale Reconstruction Networks for Open-Set Hyperspectral Image Classification. Remote Sensing, 15(18), 4535. https://doi.org/10.3390/rs15184535

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Incorporating Attention Mechanism, Dense Connection Blocks, and Multi-Scale Reconstruction Networks for Open-Set Hyperspectral Image Classification

Abstract

1. Introduction

2. Methodology

2.1. The Proposed IADMRN Framework for Open-Set HSI Classification

2.2. Feature Extraction Sub-Network

2.2.1. Dense Connection Block

2.2.2. Depthwise Separable Convolution Attention Mechanism

2.2.3. The Structure of Feature Extraction Sub-Network

2.3. Image Reconstruction Sub-Network

2.4. Classification Sub-Network

2.4.1. Reconstruction Loss Calculation

2.4.2. EVT Extreme Value Modeling

2.4.3. Threshold-Based Classification Decision

3. Results

3.1. Datasets Description

3.2. Experimental Parameters Setting

3.3. Comparison of the Proposed Methods with the State-of-the-Art Methods

4. Discussion

4.1. The Threshold of SoftMax

4.2. The Number of Branches in Feature Fusion Strategy

4.3. Ablation Study

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI