1. Introduction
Hyperspectral imaging is a powerful technology that combines spatial and spectral information about ground objects to provide highly detailed information about the Earth’s surface. A hyperspectral image (HSI) comprises multiple image bands, each corresponding to a specific wavelength range, and is capable of detecting and identifying the unique spectral signatures of different materials [
1]. The spectral detection range of hyperspectral technology far exceeds the perception range of the human eye, making it a powerful tool for environmental monitoring, agriculture, ecology, oceanography, geology, and land management [
2]. The purpose of HSI classification is to assign each pixel to a corresponding ground class (e.g., building, soil, grassland, tree, river, or road). As a core step in HSI data processing, HSI classification plays an irreplaceable role in most hyperspectral technology applications. Nonetheless, precise classification of hyperspectral images remains a challenging task, especially in open scenes where previously unseen categories may appear in the dataset [
3].
Hyperspectral image classification has been an ongoing and active topic of research for decades, and many methodologies have been advanced to cope with this challenging problem. In the early stages, spectral-based classification methodologies employed the spectral information of hyperspectral images to classify pixels. Principal component analysis (PCA) [
4] and linear discriminant analysis (LDA) [
5] are two extensively utilized feature extraction methods that have been adopted successfully for hyperspectral image classification. Nevertheless, these methods may not capture the complex and nonlinear relationships between the spectral and spatial features of hyperspectral images, restricting their performance. Feature-based classification methods derive features from hyperspectral images and then subject them to classification. The Local Binary Pattern (LBP), the Gray Level Co-occurrence Matrix (GLCM), and the Histogram of Oriented Gradients (HOG) are among the most frequently adopted feature extraction methods in hyperspectral image classification. However, these methods may fail to capture the complex, high-level features of hyperspectral images, yielding limited performance in some cases.
Deep learning techniques have evolved rapidly in the fields of computer vision and pattern recognition in recent years, and they are believed to produce better performance than traditional shallow classifiers. Inspired by the widespread adoption of deep learning, many deep learning classifiers have been developed for hyperspectral image classification with remarkable performance. Chen et al. [
6] first proposed a deep HSI classification network based on a stacked autoencoder (SAE). They later presented a convolutional neural network (CNN)-based HSI classification method that adopts sparse connectivity and weight sharing to achieve efficient feature extraction [
7]. Subsequently, Zhong et al. proposed the spectral-spatial residual network (SSRN) for HSI classification, which alleviates the vanishing or exploding gradient problem as the depth of the network increases [
8]. Contrary to conventional algorithms, deep learning methods can extract features automatically from a large set of labeled data rather than requiring human design of specific feature schemas [
9]. However, they rely on large numbers of labeled samples to achieve optimal performance [
10].
Whereas these methods aim to attain high classification accuracy on the training samples, they rest on the closed-set assumption that all training and test data originate from the same label space. Nevertheless, in real-world applications it is extremely difficult to uphold this assumption. In such cases, traditional classification methods often produce incorrect results, in that unknown samples are classified into a known category. Compared with the open set of natural images, the open set of HSIs has three remarkable differences: fewer samples, fewer categories, and lower openness. Firstly, the difficulty of sampling HSI leads to a much smaller amount of available training data than for natural images. Secondly, HSIs often have only a few or a dozen categories, whereas natural image datasets such as ImageNet may have thousands of categories. Finally, the unknown samples usually belong to tail categories with small sample sizes and occupy only a small portion of the study area. Thus, it is hard to directly apply open-set classification techniques developed for natural images to HSI classification. To date, only a little work has specialized in the open-set classification of HSI. Pal et al. [
11] proposed a residual 3D convolutional block attention module to extract discriminative prototypes for each known class, and then designed a meta-learning-based outlier calibration network to distinguish between known and unknown samples. To improve the robustness of HSI classification methods in open-set scenarios while maintaining the classification accuracy of known classes, Jun Yue [
12] proposed a spectral-spatial reconstruction framework that simultaneously performs spectral feature reconstruction, spatial feature reconstruction, and pixel-wise classification in an open-set setting. Zhuo [
13] proposed a feature consistency-based prototype network (FCPN) for open-set HSI classification, which makes full use of the feature consistency between homogeneous samples without requiring pseudo-unseen samples. Nonetheless, several challenges persist. Hyperspectral images encompass both spatial and spectral information, yielding high-dimensional datasets that are difficult to analyze and classify precisely in open-set scenarios [
14]. In addition, spectral mixing is a frequent problem in open-set classification of hyperspectral images: the spectral features of different materials overlap, making accurate classification difficult [
15].
Some key distinctions exist between open-set image classification and traditional closed-set image classification. In closed-set classification, all the categories are known, whereas in open-set classification, some categories are unknown. This means that in open-set classification, the model must be able to handle samples from unknown categories. Moreover, when deeper or more structurally complex network models are used in real-world open-set hyperspectral image classification tasks, they are extremely susceptible to overfitting and vanishing gradients. To address the problem of unknown-class samples being misclassified into known classes by closed-set classification algorithms, we advance a novel hyperspectral open-set classification method: a model integrating attention mechanism dense connection blocks and a multiscale reconstruction network for hyperspectral open-set classification (IADMRN). IADMRN handles open-set hyperspectral datasets by constructing a fused attention mechanism, densely connected block feature extraction sub-networks, multi-scale reconstruction sub-networks, classification sub-networks, and an EVT (extreme value theory) extreme value model, thus enabling the network to perform classification on open datasets. Moreover, the network’s ability to identify unknown classes is further strengthened to suit open-set classification environments in practical applications. The major novelties of this study are summarized as follows:
We propose a novel feature extraction network structure composed of dense connection blocks combined with an attention mechanism. It builds connections between different layers to make optimal use of features and mitigate the vanishing gradient problem. The channel attention module and the spatial attention module resolve what to focus on and where to focus in the channel and spatial dimensions, respectively. Additionally, we enhance the attention mechanism by introducing a depthwise separable convolution, which reinforces the attention allocation in the spatial and channel dimensions by decoupling the correlation between them. This promotes the feature representation ability during forward propagation of the network and allows the feature information of small targets to be extracted sufficiently.
We harness multi-task learning to perform classification and reconstruction simultaneously, thus permitting automatic identification of unknown classes. Deconvolutional filters of different sizes reconstruct different semantic information. Therefore, a multi-scale feature reconstruction architecture is introduced, which reconstructs the spatial context at multiple scales, thereby making full use of the rich spatial information and enhancing the robustness of the feature reconstruction against complex backgrounds. Additionally, the multi-scale reconstruction network based on deconvolution helps to recover fine-grained details lost during feature extraction. The model further incorporates multi-scale DeConv layers whose outputs are fused to bolster the reconstruction network, making the reconstructed images more complete and raising the classification accuracy.
Experiments were conducted on the Salinas, University of Pavia, and Indian Pines datasets. The results demonstrate that the proposed method obtains superior classification performance for both known and unknown classes compared with other state-of-the-art classification methods.
The rest of this article is organized as follows:
Section 2 describes our proposed classification approach in detail.
Section 3 reports the experimental results and evaluates the performance of the proposed method. It also analyzes the selection of experimental parameters in
Section 4.
Section 5 gives the conclusion.
2. Methodology
Considering that existing methods usually struggle to overcome the challenges posed by unknown categories, resulting in degraded classification accuracy and limited generalization, we propose in this paper a new deep learning framework called IADMRN, which addresses the unknown-class handling problem in hyperspectral image classification. The overall architecture of the IADMRN model is shown in
Figure 1.
In this section, the proposed IADMRN is described in detail. Firstly, we briefly introduce the overall structure of the IADMRN. Secondly, the feature extraction module is introduced. Thirdly, the multi-scale image reconstruction network is presented. Finally, the classification sub-network is presented.
2.1. The Proposed IADMRN Framework for Open-Set HSI Classification
Figure 1 presents the holistic framework of our proposed IADMRN for open-set HSI classification, exemplified here on the Indian Pines dataset. To begin with, the hyperspectral image to be classified is fed into the network. IADMRN conducts feature extraction using a feature extraction sub-network formed by dense connection blocks combined with an attention mechanism. Dense connection blocks combined with a depthwise separable convolutional attention mechanism empower the network to reuse information at multiple levels, capturing the most relevant and discriminative categorical features. This further expands the feature expressivity of the network by decoupling its spatial and channel dimensions and selectively attending to important features.
The extracted features are transferred to a multi-scale image reconstruction network with deconvolution to reconstruct the image. By fusing feature maps through multi-scale DeConv layers, the image reconstruction sub-network refines the reconstructed image by recovering fine-grained details lost during feature extraction, reducing information loss, and enhancing feature representation capabilities. The reconstruction loss is computed by comparing the reconstructed image with the original image and is then fed into the EVT extreme value model to evaluate the probability that each sample in the image belongs to the unknown class. If the probability of a sample belonging to the unknown class is low, the sample is judged to be of a known class and is input to the FC and SoftMax layers for normal classification; if the probability is high, the sample is judged to be of an unknown class.
2.2. Feature Extraction Sub-Network
As shown in
Figure 2, the dense connection block and an attention mechanism [
16] form the feature extraction sub-network. The feature extraction sub-network of IADMRN is designed to efficiently extract relevant and discriminative features from hyperspectral images. It consists of densely connected blocks combined with an attention mechanism to allow for effective information propagation and selective feature extraction. The dense connection block establishes the connection relationship between different layers to make the best use of hyperspectral image features. Depthwise separable convolution [
17] can strengthen the attention mechanism model, reinforcing the attention allocation on the spectral and spatial dimensions of hyperspectral images by splitting the correlation between the spatial and channel dimensions.
Specifically, depthwise separable convolution first performs an independent convolution operation for each channel, which improves the expressiveness of each channel in the spatial dimension. Then, the correlations between different channels are taken into account by point-by-point convolution operations, which strengthen the attention mechanism model in the spectral dimension. Consequently, depthwise separable convolution can effectively strengthen the allocation of attention to hyperspectral images in both the spectral and spatial dimensions, improving the performance of the model.
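To make the two-step factorization concrete, the following minimal NumPy sketch (illustrative shapes and random weights only, not the paper's actual layer configuration; BN and ReLU are omitted) applies one spatial kernel per channel and then a 1 × 1 point-by-point mixing step:

```python
import numpy as np

def depthwise_separable_conv(x, depth_kernels, point_weights):
    """Minimal depthwise separable convolution (stride 1, 'valid' padding).

    x:             (H, W, C) feature map
    depth_kernels: (k, k, C) one spatial kernel per input channel
    point_weights: (C, C_out) 1x1 convolution mixing the channels
    """
    H, W, C = x.shape
    k = depth_kernels.shape[0]
    Ho, Wo = H - k + 1, W - k + 1
    # Depthwise step: each channel is convolved independently in space.
    depth_out = np.zeros((Ho, Wo, C))
    for c in range(C):
        for i in range(Ho):
            for j in range(Wo):
                depth_out[i, j, c] = np.sum(x[i:i+k, j:j+k, c] * depth_kernels[:, :, c])
    # Pointwise (1x1) step: mix information across channels at each pixel.
    return depth_out @ point_weights

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 4))
out = depthwise_separable_conv(x, rng.normal(size=(3, 3, 4)), rng.normal(size=(4, 6)))
print(out.shape)  # (6, 6, 6)
```

Compared with a standard 3 × 3 convolution mapping 4 to 6 channels (3 · 3 · 4 · 6 = 216 weights), this factorization uses only 3 · 3 · 4 + 4 · 6 = 60, illustrating the parameter saving discussed above.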
2.2.1. Dense Connection Block
To be specific, in the dense connection block, the input of each layer is a concatenation of the outputs of all the previous layers [
18]. This structure results in adequate information transfer within the module and, at the same time, effectively mitigates the vanishing gradient problem [
19]. In a dense connection block, assuming that the input of the network is the hyperspectral image x_0 and the output of the l-th layer is the hyperspectral feature map x_l, the layer can be represented as:

x_l = H_l([x_0, x_1, …, x_{l−1}])

where [x_0, x_1, …, x_{l−1}] denotes the concatenation of the outputs of all previous layers, H_l(·) denotes the operation of the current layer [
20], and l denotes the index of the current layer. Specifically, H_l(·) usually consists of BN, ReLU, and two-dimensional convolution operations, which can be expressed as:

H_l(x) = Conv2D(ReLU(BN(x)))

where BN(·) denotes the BN operation, ReLU(·) denotes the ReLU operation, and Conv2D(·) denotes the two-dimensional convolution operation [
21].
By introducing dense connectivity inside the module, dense connection blocks can effectively mitigate the vanishing gradient problem and improve the model’s ability to extract hyperspectral image features, improving the feature representation without increasing the model complexity. The direct connections among the layers of dense connection blocks strengthen the expressive power of the hyperspectral feature information, and the output of each layer is directly attached to the inputs of all subsequent layers. Consequently, a high degree of information sharing and transfer among layers is achieved, enabling the network to capture various features from the data more effectively.
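The concatenation pattern described above can be sketched in a few lines of NumPy (a toy stand-in: the 1 × 1 projection plus ReLU below is a hypothetical substitute for the BN-ReLU-Conv2D composite layer, chosen only to keep the example runnable):

```python
import numpy as np

def dense_block(x0, layers):
    """Each layer receives the channel-wise concatenation of all previous outputs.

    x0:     (H, W, C0) input feature map
    layers: list of callables H_l mapping (H, W, C_in) -> (H, W, growth)
    """
    features = [x0]
    for H_l in layers:
        x_l = H_l(np.concatenate(features, axis=-1))  # x_l = H_l([x_0, ..., x_{l-1}])
        features.append(x_l)
    return np.concatenate(features, axis=-1)

# Toy stand-in for BN + ReLU + Conv2D: a 1x1 projection to `growth` channels.
def make_layer(c_in, growth, seed):
    w = np.random.default_rng(seed).normal(size=(c_in, growth))
    return lambda x: np.maximum(x @ w, 0.0)

x0 = np.ones((5, 5, 2))
layers = [make_layer(2, 3, 1), make_layer(5, 3, 2)]  # input channels grow: 2, then 2 + 3
out = dense_block(x0, layers)
print(out.shape)  # (5, 5, 8)  -> 2 + 3 + 3 channels
```

Note how the channel count of each layer's input grows with depth; every earlier output, including the raw input, remains directly reachable, which is what shortens the gradient paths.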
2.2.2. Depthwise Separable Convolution Attention Mechanism
The attention mechanism consists of depthwise separable convolution, channel attention, and spatial attention. Depthwise separable convolution decomposes the standard convolution operation into a depthwise convolution and a point-by-point convolution. The depthwise convolution convolves the hyperspectral image in groups without changing the channel depth, extracting the spatial features of each channel separately. The point-by-point convolution realizes the information interaction among channels by applying a 1 × 1 convolution kernel to the input, extracting the spectral features of the hyperspectral image on a per-pixel basis. Therefore, depthwise separable convolution has the merit of effectively extracting the spectral-spatial cooperative features of hyperspectral images.
Depthwise separable convolution can strengthen the attention mechanism model, reinforcing the attention allocation in the spatial and channel dimensions by splitting the correlation between the spatial and channel dimensions of the hyperspectral image. Depthwise separable convolution is an improvement on traditional convolution that decomposes the convolution operation into two steps: depthwise convolution and point-by-point convolution. Thus, the number of parameters and computations can be significantly reduced, while the effectiveness of the convolution operation is maintained [
22]. Hence, by adding depthwise separable convolution in front of the attention mechanism, it can make the feature map more accurate and thus improve the performance of the final task.
The process can be expressed as:

F′ = K_p ∗ (K_d ∗ F)
M = M_s(M_c(F′))
F_w = M ⊙ F′
F_out = F_w

where F is the input hyperspectral image feature map and F′ is the hyperspectral image feature map after depthwise separable convolution processing [
23]. M is the CBAM attention map, F_w is the weighted feature map, and F_out is the final output feature map. K_d and K_p denote the depthwise convolution kernel and the point-by-point convolution kernel of the depthwise separable convolution, respectively, M_c and M_s denote the channel attention weight and the spatial attention weight of CBAM, respectively, and C is the number of channels of the feature maps [
24].
The attention mechanism combined with the implementation of deep separable convolution can squeeze the inconsequential parts of the hyperspectral image feature map to values close to zero, thus rendering the model more attentive to the important feature parts and providing improved accuracy and robustness of the hyperspectral image feature representation [
25].
In particular, assume that the input hyperspectral image feature map of the current network is X and the feature map obtained after the first dense connection block is F. Then, an attention mechanism consisting of depthwise separable convolution, channel attention, and spatial attention is applied to F. The final output hyperspectral image feature map is F_out, which can be expressed as:

F′ = DSC(F)
F_out = CBAM(F′)

where DSC(·) denotes the depthwise separable convolution operation and CBAM(·) denotes the CBAM attention mechanism operation [
26]. Specifically, assuming that the dimension of F is H × W × C, the operation of the depthwise separable convolution can be expressed as:

F′ = PConv(ReLU(BN(DConv(F))))

where DConv(·) denotes the depthwise convolution operation, BN(·) denotes the BN operation, ReLU(·) denotes the ReLU operation, and PConv(·) denotes the point-by-point convolution operation. F′ has the same dimension as F [
27]. The operation of the CBAM attention mechanism can be expressed as:

F_out^c = w_c · F′^c, c = 1, 2, …, C

where w_c denotes the attention weight of channel c and F′^c denotes the hyperspectral image feature map of F′ on the c-th channel.
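As a rough illustration of per-channel attention weighting, the sketch below implements a squeeze-and-excite style channel attention in NumPy; the two-projection weighting network and its sizes are assumptions made for the example, not the exact CBAM configuration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(f, w1, w2):
    """Per-channel weighting: F_out on channel c is w_c times F' on channel c.

    f:  (H, W, C) feature map
    w1: (C, r)    squeeze projection to a small bottleneck
    w2: (r, C)    excitation projection back to C channels
    """
    squeeze = f.mean(axis=(0, 1))                    # (C,) global average per channel
    w = sigmoid(np.maximum(squeeze @ w1, 0.0) @ w2)  # (C,) weights in (0, 1)
    # Unimportant channels are squeezed toward zero, important ones kept.
    return f * w, w

rng = np.random.default_rng(42)
f = rng.normal(size=(4, 4, 8))
out, w = channel_attention(f, rng.normal(size=(8, 2)), rng.normal(size=(2, 8)))
print(out.shape)  # (4, 4, 8)
```

The sigmoid keeps each w_c strictly between 0 and 1, so inconsequential channels are attenuated rather than removed, matching the soft weighting described in the text.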
2.2.3. The Structure of Feature Extraction Sub-Network
In the whole network, the first fused attention mechanism dense connection block and the second fused attention mechanism dense connection block are sequentially cascaded to form the fused attention mechanism dense connection network. Specifically, the output F_1 of the first fused attention mechanism dense connection block is used as the input of the second fused attention mechanism dense connection block, as follows:

F_2 = D_2(F_1)

where D_1(·) denotes the first fused attention mechanism dense connection block, whose output is F_1, and D_2(·) denotes the second fused attention mechanism dense connection block.
Dense connection blocks are utilized to forge a robust connection between the different layers of the network, permitting information to circulate more freely through the network and reducing the risk of gradient disappearance. It contributes to the efficiency and robustness of the model and also helps to prevent overfitting by reducing the number of parameters required [
28]. The attention mechanism in the hyperspectral image feature extraction sub-network is implemented using depthwise separable convolution. It allows the network to selectively focus on important features in the spatial and channel dimensions, which is particularly relevant for hyperspectral images containing small targets and subtle spectral differences. By attending to these important features, the model can achieve higher accuracy and better generalization performance.
The feature extraction sub-network in IADMRN is efficient and effective in extracting high-quality hyperspectral image features for subsequent classification and reconstruction tasks. By incorporating dense connection blocks with an attention mechanism, the feature extraction sub-network can achieve superior performance for HSI classification of open sets.
2.3. Image Reconstruction Sub-Network
The image reconstruction sub-network consists of a multi-scale deconvolution network, shown in
Figure 3. The multi-scale deconvolution network is made up of five branches. By performing element-by-element summation of hyperspectral image feature maps obtained from different branches, the reconstructed images benefit from the aggregated information, resulting in a more complete and richer representation.
Figure 3 gives the structure of the image reconstruction sub-network.
Specifically, since hyperspectral images usually contain a large number of spectral bands, the curse of dimensionality is easily encountered in feature extraction and classification tasks. Image reconstruction can address this problem by performing dimensionality reduction on the original hyperspectral image, which can improve the efficiency and accuracy of the classifier.
Second, image reconstruction can also help improve the robustness of classification. Hyperspectral images are usually affected by noise and other interferences, which can lead to a decrease in the accuracy of the classifier. By using image reconstruction techniques, the effects of these noises and other disturbances can be reduced, and the robustness of the classifier to these disturbances can be improved [
29].
In conclusion, image reconstruction can contribute to enhancing the interpretability of the classifier [
30]. It is difficult to understand the internal workings of hyperspectral image classifiers because they are usually black-box models. By using image reconstruction techniques, the original hyperspectral image can be converted into a more understandable form, leading to a better understanding of the workings of the classifier and of the classification results.
Multi-scale deconvolution image reconstruction in IADMRN is based on the concepts of upsampling and feature fusion to reconstruct high-quality hyperspectral images. It follows a step-by-step process that involves multiple deconvolution operations with different kernel sizes and the fusion of the reconstructed feature maps. Deconvolution can be seen as the inverse process of convolution, allowing the low-dimensional feature vectors to be mapped back into the spatial domain of the original feature map [
31]. This upsampling restores the spatial resolution of the feature maps. The principle can be summarized as follows:
Y_i = DeConv(X, K_i), i = 1, 2, …, 5

where X represents the hyperspectral image feature map obtained from the feature extraction sub-network, Y_1, Y_2, …, Y_5 represent the reconstructed hyperspectral image feature maps at the different branches of the hyperspectral image reconstruction sub-network, K_1, K_2, …, K_5 are the convolution kernels used in the deconvolution operations, and DeConv(·) stands for the deconvolution operation.
The hyperspectral image feature maps are reconstructed at different scales by applying deconvolution operations with varying kernel sizes. This multi-scale approach captures hyperspectral image features at different levels of detail, ranging from fine-grained details to broader contextual information. By using different kernel sizes, the network can effectively reconstruct features at different scales, enhancing the representation of the reconstructed hyperspectral images.
Y = Y_1 + Y_2 + Y_3 + Y_4 + Y_5

where Y stands for the final reconstructed hyperspectral image.
The image reconstruction sub-network employs a multi-scale approach through the use of deconvolution operations. By utilizing various kernel sizes (such as 1 × 1, 3 × 3, 5 × 5, 7 × 7, and 9 × 9), it facilitates the reconstruction of hyperspectral image features at different scales, effectively capturing both fine-grained details and broader contextual information. This comprehensive feature reconstruction enhances the fidelity of the reconstructed hyperspectral images [
32]. Further, the sub-network incorporates a fusion strategy that combines the reconstructed hyperspectral image feature maps from multiple branches. By element-wise summation of the hyperspectral image feature maps obtained at the different branches, the reconstructed hyperspectral image benefits from the aggregated information, resulting in a more complete and informative representation. This information fusion mechanism enables the identification of unknown classes by comparing the fused feature map with the original image.
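A minimal sketch of the multi-branch fusion, using a box-filter stand-in for each deconvolution branch (the real network learns its deconvolution kernels; here only the multi-scale windows and the element-wise summation Y = Y_1 + … + Y_5 are illustrated):

```python
import numpy as np

def branch(f, k):
    """Stand-in for one branch with kernel size k: box smoothing, 'same' padding.

    A learned k x k deconvolution is replaced by a k x k mean filter so the
    example stays runnable; only the multi-scale windowing is illustrated.
    """
    H, W, C = f.shape
    pad = k // 2
    fp = np.pad(f, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(f)
    for i in range(H):
        for j in range(W):
            out[i, j] = fp[i:i+k, j:j+k].mean(axis=(0, 1))
    return out

def multi_scale_fuse(f, kernel_sizes=(1, 3, 5, 7, 9)):
    # Element-wise summation of the five branch outputs: Y = sum_i Y_i
    return sum(branch(f, k) for k in kernel_sizes)

f = np.random.default_rng(1).normal(size=(10, 10, 3))
fused = multi_scale_fuse(f)
print(fused.shape)  # (10, 10, 3)
```

The 1 × 1 branch preserves fine detail (it is the identity here), while the larger windows contribute broader context; summing them pixel-by-pixel aggregates all five scales into one map.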
2.4. Classification Sub-Network
The classification sub-network leverages EVT extreme value modeling to identify unknown classes within the hyperspectral image feature map. By calculating the probability associated with each pixel from the reconstruction loss, the EVT model distinguishes between known and unknown classes.
EVT is a statistical method commonly used in the modeling and analysis of extreme events, and it can be used to predict events that are likely to occur under extreme conditions. In deep learning, EVT is used in classification problems to distinguish between normal and abnormal data. The classification principle of EVT extreme value modeling is based on the assumption that the tail of the data distribution can be approximated under extreme conditions. In classification problems, the normal data in the training dataset are fitted to an extreme value distribution, and the resulting model is then used to evaluate whether new data are normal or not.
As shown in
Figure 4, the EVT model is trained on five known classes: A, B, C, D, and E. Each class has its own independent shape and scale parameters learned from the data and supports a soft margin. The unknown classes in the data, however, do not share these characteristics. Via kernel-free non-linear modeling, the EVT model supports open-set recognition and can reject the four “?” inputs that lie beyond the support of the training set as “unknown.” This capability allows unknown-class samples to be handled during training, which improves the classifier’s generalization ability and enables effective handling of previously unseen or novel classes. To make accurate classification decisions, the sub-network employs a threshold-based approach. By comparing the probability of belonging to the unknown class with a predefined threshold, the model effectively separates known and unknown classes [
33]. This mechanism enables the classifier to assign samples to the appropriate class with high confidence, ensuring reliable classification results.
2.4.1. Reconstruction Loss Calculation
The classification sub-network incorporates the calculation of the reconstruction loss, which measures the discrepancy between the original and reconstructed hyperspectral images. The mean square error (MSE) is employed as the loss function to quantify the difference. By including the image reconstruction task as a secondary objective, the sub-network effectively utilizes the hyperspectral information and enforces reconstruction fidelity during training.
L_rec(x, x̂) = ‖x − x̂‖_2^2, with x̂ = r(z)

where x̂ is the reconstructed instance, r(·) is the reconstruction function, also called the decoder, and z is the latent feature output by the encoder. We use the ℓ2 distance as the reconstruction loss.
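The MSE reconstruction loss can be sketched directly (the array values are illustrative only):

```python
import numpy as np

def reconstruction_loss(x, x_hat):
    """Mean squared (l2) reconstruction loss between original and reconstructed pixels."""
    return float(np.mean((x - x_hat) ** 2))

x = np.array([[1.0, 2.0], [3.0, 4.0]])      # original patch
x_hat = np.array([[1.0, 2.0], [3.0, 2.0]])  # reconstruction, one pixel off by 2
print(reconstruction_loss(x, x_hat))  # 1.0
```

A well-reconstructed known-class sample yields a small loss, while an unknown-class sample the decoder has never seen tends to yield a large one; this scalar is what the EVT model consumes next.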
2.4.2. EVT Extreme Value Modeling
To identify unknown classes, the classification sub-network utilizes EVT extreme value modeling. The reconstruction loss serves as input to an EVT model, where the probability of each pixel belonging to the unknown class is determined. By leveraging extreme value theory, the model captures the distribution of the reconstruction loss and assigns a probability to each pixel, enabling effective identification of unknown-class samples [
34].
EVT indicates that the tail of the distribution F should follow an extreme value distribution (e.g., Weibull). For a large class of distributions F and a sufficiently large threshold u, with X_1, X_2, …, X_n independent and identically distributed samples and x ≥ 0, the cumulative distribution function of the exceedances over u can be approximated by the generalized Pareto distribution (GPD):

G(x; ξ, σ) = 1 − (1 + ξx/σ)^(−1/ξ)

The parameters ξ and σ can be estimated from the given tail data. Here, x is the exceedance of the reconstruction loss over the threshold u, and G(x; ξ, σ) is the cumulative distribution function of the GPD.
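A small sketch of the GPD tail model, assuming the shape ξ and scale σ have already been estimated from tail data (the helper names and the numeric parameter values are hypothetical):

```python
import math

def gpd_cdf(x, xi, sigma):
    """CDF of the generalized Pareto distribution for an exceedance x >= 0."""
    if x < 0:
        return 0.0
    if abs(xi) < 1e-12:  # xi -> 0 limit: exponential tail
        return 1.0 - math.exp(-x / sigma)
    return 1.0 - (1.0 + xi * x / sigma) ** (-1.0 / xi)

def unknown_probability(loss, u, xi, sigma):
    """Map a reconstruction loss to an unknown-class probability.

    Losses at or below the tail threshold u are treated as typical of known
    classes (probability 0); above u, the GPD CDF of the exceedance is used.
    """
    if loss <= u:
        return 0.0
    return gpd_cdf(loss - u, xi, sigma)

# Illustrative parameters: threshold u = 1.0, shape xi = 0.1, scale sigma = 0.5.
print(round(unknown_probability(2.5, u=1.0, xi=0.1, sigma=0.5), 4))
```

The larger the loss exceeds the fitted tail threshold, the closer the probability climbs to 1, which is exactly the monotone score the threshold-based decision in the next subsection needs.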
2.4.3. Threshold-Based Classification Decision
The classification sub-network employs a threshold-based approach to make classification decisions. A predefined threshold value is compared to the probability of belonging to the unknown class. If the probability exceeds the threshold, the sample is classified as an unknown class. Otherwise, it is classified using traditional classification techniques, namely a fully connected (FC) layer followed by a SoftMax layer.
P_u denotes the probability of belonging to the unknown class, τ denotes the threshold, and f stands for the sample feature produced by the feature extraction sub-network; a sample is rejected as unknown when P_u > τ and is otherwise classified by the FC and SoftMax layers. By setting an appropriate threshold, the sub-network effectively separates samples into known and unknown classes, ensuring reliable and accurate classification outcomes. As mentioned earlier, the SoftMax function transforms the score vector into a probability vector, and the class with the largest probability is considered the predicted class. A naive solution for identifying unknown classes is to consider those instances whose largest probability is smaller than 0.5 as unknown, which is one of the baselines in the experiments (SoftMax with threshold = 0.5).
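The threshold-based decision rule can be sketched as follows (the scores and probabilities are illustrative values, not model outputs):

```python
import math

def softmax(scores):
    """Numerically stable SoftMax: score vector -> probability vector."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores, p_unknown, tau=0.5):
    """Reject as 'unknown' when the EVT unknown-class probability exceeds tau;
    otherwise return the arg-max of the SoftMax probabilities."""
    if p_unknown > tau:
        return "unknown"
    probs = softmax(scores)
    return probs.index(max(probs))

print(classify([2.0, 0.5, 0.1], p_unknown=0.1))  # 0
print(classify([2.0, 0.5, 0.1], p_unknown=0.9))  # unknown
```

The naive SoftMax-with-threshold baseline mentioned above corresponds to dropping the EVT probability and instead rejecting whenever `max(softmax(scores)) < 0.5`.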
4. Discussion
In order to find the optimal network structure, it is necessary to experiment with different parameters, which play a crucial role in the size and complexity of the proposed IADMRN. In this paper, the optimal parameter combination is determined by analyzing the influence of these parameters on the accuracy of the classification results, including the threshold of SoftMax, the number of branches in the feature fusion strategy, and an ablation study.
4.1. The Threshold of SoftMax
The first parameter is the threshold for SoftMax. The Classification sub-network employs a threshold-based approach to make classification decisions. A predefined threshold value is compared to the probability of belonging to the unknown class. The SoftMax function transforms the score vector into a probability vector, and the class with the largest probability is considered the predicted class. We show a threshold analysis in
Figure 8 to demonstrate the effectiveness of the SoftMax threshold setting. As seen in
Figure 8, the OA value rises as the threshold increases, reaching its maximum at a threshold of 0.5. Beyond 0.5, the OA value decreases as the threshold increases. The choice of the SoftMax threshold has implications for the performance of the model. If the threshold is too high, the model may be overly conservative and classify many samples as unknown even though they belong to a known class, which lowers the accuracy on the known-class classification task. On the other hand, if the threshold is too low, the model may be too lenient and classify many samples as known even though they belong to an unknown class, which lowers the accuracy on the unknown-class classification task.
4.2. The Number of Branches in Feature Fusion Strategy
The second parameter is the number of branches in the feature fusion strategy. This paper analyzes the correlation and complementarity of information in the deep network using multi-branch feature fusion. IADMRN2, IADMRN3, IADMRN4, IADMRN5, and IADMRN6 refer to variants that fuse two, three, four, five, and six hierarchical branches, respectively. Among them, IADMRN2 represents the fusion of the first and second branches, while IADMRN6 fuses a duplicate of the fifth branch with IADMRN5. It can be seen from
Figure 9 that, on the different datasets, the accuracy values of IADMRN5 are superior to those of IADMRN2, IADMRN3, and IADMRN4. In addition, taking the Salinas dataset as an example, compared with IADMRN2, the OA, AA, and Kappa values of the IADMRN5 fusion strategy increased by 2.28%, 3.32%, and 2.79%, respectively. As can be seen in
Figure 9, IADMRN2 relies on a single fused feature map; its receptive field is the smallest, so it cannot capture features at other scales. Its reconstructed image is therefore the worst, resulting in the lowest classification accuracy. Because IADMRN5 contains deconvolutional layers with five receptive fields of different scales, features at different scales can be fully extracted, and the reconstruction performance is optimal.
To some extent, multi-layer fusion improves image reconstruction performance. However, the classification accuracy of IADMRN6 is equal to or even slightly lower than that of IADMRN5, indicating that excessive fusion layers may introduce redundant information, which increases the number of parameters and reduces reconstruction speed and quality. Therefore, the IADMRN proposed in this paper uses five branches for feature fusion, and its structure is shown in
Figure 3.
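A minimal sketch of the multi-branch fusion idea, assuming five branches at progressively coarser scales that are resampled to a common size and concatenated along the channel axis; nearest-neighbour upsampling stands in here for the paper's learned multi-scale deconvolution layers:

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_branches(branches, target_hw):
    """Bring every branch to the target spatial size and concatenate
    along the channel axis (the IADMRN5-style five-branch fusion)."""
    fused = []
    for f in branches:
        factor = target_hw // f.shape[1]
        fused.append(upsample_nearest(f, factor))
    return np.concatenate(fused, axis=0)

# Five hypothetical branch outputs, each with 8 channels, at
# resolutions 32, 16, 8, 4, and 2.
branches = [np.ones((8, 32 // 2**i, 32 // 2**i)) for i in range(5)]
fused = fuse_branches(branches, 32)
print(fused.shape)  # -> (40, 32, 32)
```

Dropping the coarsest branches from this list mimics the IADMRN2-IADMRN4 variants, which is why their fused representation covers fewer scales.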
4.3. Ablation Study
In order to verify the effectiveness of each module in IADMRN, an ablation study is conducted on the Salinas and Pavia University datasets. Starting from the full IADMRN, we replace the dense block module with an ordinary convolution layer, which constitutes IADMRN without the dense connection block module. Similarly, we substitute ordinary convolution for the depthwise separable convolution, and plain deconvolution for the multi-scale deconvolution image reconstruction, yielding IADMRN without the depthwise separable convolution attention mechanism and IADMRN without the multi-scale deconvolution image reconstruction, respectively. In the experimental setup of this ablation study, these three ablated variants are compared with our proposed method.
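One reason the depthwise separable convolution matters in this ablation is its parameter budget; the following back-of-the-envelope count (the 64-channel, 3x3 configuration is illustrative, not taken from the paper) contrasts a standard convolution with its depthwise separable replacement:

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise
    convolution, as used in depthwise separable layers."""
    return c_in * k * k + c_in * c_out

# Swapping a 3x3 convolution (64 -> 64 channels) for its depthwise
# separable counterpart, as the ablation variants do in reverse.
standard = conv_params(64, 64, 3)                   # 36864
separable = depthwise_separable_params(64, 64, 3)   # 4672
print(standard, separable, round(standard / separable, 1))  # -> 36864 4672 7.9
```

The roughly eightfold parameter reduction is what makes the attention mechanism affordable; the ablation then measures how much accuracy each such design choice actually contributes.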
The relevant accuracy indicators and results of the ablation study are shown in
Figure 10. It can be seen from the figure that each module proposed in this paper plays a very important role in improving the accuracy. After removing the multi-scale deconvolution image reconstruction module, the OA decreases by 2.5%; after removing the depthwise separable convolution attention mechanism module, the OA decreases by 2.1%. It can be concluded that each module in the proposed method contributes to the improvement of the accuracy and plays an important role in the performance of the network.
5. Conclusions
In this paper, we propose IADMRN, a novel framework for hyperspectral image classification with a focus on handling unknown classes. The framework combines feature extraction sub-networks with dense connection blocks and attention mechanisms, multi-scale deconvolution image reconstruction, and EVT-based unknown class identification. Through extensive evaluations on diverse hyperspectral datasets, including the Indian Pines, UP, and Salinas datasets, we demonstrated the superiority of IADMRN in terms of classification accuracy, particularly for unknown classes.
Our results showed that IADMRN outperforms existing methods in terms of classification accuracy for both known and unknown classes. The feature extraction sub-network effectively captures and utilizes discriminative features, mitigating the problem of gradient vanishing and enhancing classification performance. The multi-scale deconvolution image reconstruction leverages fine-grained details and contextual information, leading to improved classification accuracy. The integration of EVT-based unknown class identification enables accurate identification and assignment of unknown class labels, further enhancing classification results.
In conclusion, IADMRN presents a comprehensive and effective solution for hyperspectral image classification with unknown classes. Its innovative features and methodologies contribute to superior classification accuracy by addressing the limitations of existing methods. The framework’s versatility and performance across different datasets and applications make it a valuable tool in hyperspectral image analysis. Future work could explore potential enhancements, such as incorporating domain adaptation techniques or extending the framework to handle other types of data modalities, further advancing the field of hyperspectral image classification.