Article

Research on Camouflage Target Classification and Recognition Based on Mid Wave Infrared Hyperspectral Imaging

School of Physics, Xidian University, Xi’an 710071, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(8), 1475; https://doi.org/10.3390/rs17081475
Submission received: 23 January 2025 / Revised: 18 April 2025 / Accepted: 19 April 2025 / Published: 21 April 2025

Abstract

Mid-wave infrared (MWIR) hyperspectral imaging integrates MWIR technology with hyperspectral remote sensing, enabling the capture of radiative information that is difficult to obtain in the visible spectrum, thus demonstrating significant value in camouflage recognition and stealth design. However, there is a notable lack of open-source datasets and effective classification methods in this field. To address these challenges, this study proposes a dual-channel attention convolutional neural network (DACNet). First, we constructed four MWIR camouflage datasets (GCL, SSCL, CW, and LC) to fill a critical data gap. Second, to address the issues of spectral confusion between camouflaged targets and backgrounds and blurred spatial boundaries, DACNet employs independent spectral and spatial branches to extract deep spectral–spatial features while dynamically weighting these features through channel and spatial attention mechanisms, significantly enhancing target–background differentiation. Our experimental results demonstrate that DACNet achieves an average accuracy (AA) of 99.96%, 99.45%, 100%, and 95.88%; an overall accuracy (OA) of 99.94%, 99.52%, 100%, and 96.39%; and Kappa coefficients of 99.91%, 99.41%, 100%, and 95.21% across the four datasets. The classification results exhibit sharp edges and minimal noise, outperforming five deep learning methods and three machine learning approaches. Additional generalization experiments on public datasets further validate DACNet’s superiority in providing an efficient and novel approach for hyperspectral camouflage data classification.


1. Introduction

Camouflaged targets are objects made from materials like camouflage fabrics, optical coatings, and camouflage nets [1,2,3] to achieve visual similarity with their surrounding environment, thereby evading detection by reconnaissance systems. The identification of such targets represents a key research focus and plays a critical role in modern warfare [4,5,6]. Nevertheless, their inherent concealment properties pose significant challenges for conventional target recognition algorithms, which frequently fail to accurately delineate the target–background boundaries. Recent advancements in hyperspectral remote sensing technology have led to its extensive adoption in various domains, including military reconnaissance [7,8] and environmental monitoring [9,10]. Characterized by its comprehensive radiative, spatial, and spectral data, as well as its characteristic unification of image and spectrum in a single data cube [11], hyperspectral imagery has emerged as an indispensable tool for camouflage target detection and identification.
Hyperspectral images can be divided into visible light and infrared categories. The former captures fine distinctions in the reflected light from object surfaces, delivering rich texture and color information. These images are instrumental in critical applications, such as medical diagnostics [12,13], mineral exploration [14,15], and crop monitoring [16,17], within specific spectral bands. However, the effective detection of targets using visible light hyperspectral imaging is significantly hampered by external environmental influences, including nighttime conditions, variable lighting, complex backgrounds, and smoke.
Moreover, the reconnaissance and detection of camouflaged targets are often conducted in complex background environments, which further limits the effectiveness of visible light hyperspectral imaging. In contrast, infrared imaging has gained significant attention from researchers worldwide due to its ability to acquire thermal information about the target's state and distribution without physical contact. The infrared spectrum includes short-, mid-, long-, and very long-wave infrared bands. Among these, mid-wave infrared hyperspectral images in the 3–5 μm range primarily detect the emitted radiation from the targets and the environment. This allows for the acquisition of radiation information that is difficult to obtain in the visible light spectrum, making it of considerable research value for camouflaged target detection, target stealth, and anti-stealth applications. Regarding the application of infrared hyperspectral imaging technology, in 2022, El Oufir, M.K. et al. [18] developed a hybrid snow density estimation model that retrieves density from near-infrared hyperspectral data using an ensemble-based system (EBS). This model reduces the number of classification errors, helps mitigate the risks of these estimation errors, and produces more robust density results than the other models. In 2023, Ren et al. [19] proposed a cloud phase classification method that simultaneously utilizes the far-infrared and thermal infrared bands. The potential features of the bands were analyzed based on the differences in the simulated cloud brightness temperature (BT) spectra of various cloud phases. The experimental results showed that the proposed method performs well and that far-infrared band features can significantly enhance cloud phase classification. In the same year, Zhang et al. [20] captured visible and near-infrared (VNIR) hyperspectral images of maize canopies at continuous time intervals. By applying reference panel calibration and locally weighted scatter plot smoothing to minimize the effects of ambient light and daily growth, they studied spectral variations across all relevant VNIR wavelengths throughout the day. In 2024, Jia et al. [21] employed near-infrared hyperspectral imaging technology to capture spectral images and extract soil reflectance features. The data processing phase involved extracting the mean spectral band reflectance values as inputs for the model, with the carbon storage values detected by chemical methods serving as the output. This approach provides essential support for the rapid, accurate, and non-destructive detection of carbon storage in modern wetland soils. An analysis of the current state of infrared hyperspectral research shows that studies have primarily focused on near-infrared and long-wave infrared hyperspectral imaging, while research on mid-wave infrared hyperspectral images remains relatively scarce.
In recent years, the rapid advancement of deep learning technologies in computer vision has led to an increasing number of applications of deep learning algorithms in hyperspectral image classification, yielding substantial research achievements in hyperspectral image processing. Notably, leveraging their inherent advantages of “parameter sharing” and “local perception”, convolutional neural networks (CNNs) [22,23] demonstrate exceptional capability in extracting deep features from data while maintaining manageable model parameter growth. Consequently, CNNs have emerged as a predominant and extensively studied model in the hyperspectral image processing domain.
Currently, the main trend in hyperspectral image classification is based on convolutional neural networks (CNNs) that combine spatial and spectral information, namely, classification methods based on spatio-spectral feature extraction [24]. This approach effectively enhances the classification accuracy and recognition performance of hyperspectral images. In 2022, Fırat, H. et al. [25] combined 3D and 2D convolutions for hyperspectral image classification, constructing a hybrid 3D/2D CNN based on spatio-spectral mixed feature extraction. This network captures depth information using 3D convolution and processes planar features using 2D convolution, gradually optimizing the loss function during training to achieve a high classification accuracy. In 2021, H. Firat et al. [26] proposed a ResNet-50 method based on 3D convolutional neural networks. This network addresses issues such as gradient characteristic loss and degradation caused by increasing depth during feature extraction. Chen et al. [27] introduced a deep learning method that classifies spectral feature maps. Initially, a polar coordinate transformation was employed to convert the spectral information of all the pixels in the image into spectral feature maps. Subsequently, these feature maps were classified using the proposed Deep Context Feature Fusion Network (DCFF-Net), achieving overall accuracies of 98.15%, 99.86%, and 99.98%. Li et al. [28] developed a dual-layer, dual-branch 3D/2D attention cascade network model called DBANet, which demonstrates significant advantages in feature extraction and classification performance, showcasing excellent performance in hyperspectral remote sensing image classification. Therefore, CNN-based hyperspectral classification algorithms effectively compensate for the shortcomings of traditional classification methods, improving accuracy and achieving remarkable results. However, most hyperspectral classification algorithms are currently based on open-source visible/near-infrared hyperspectral datasets. There is a lack of analysis and research on the application of algorithms in specific scenarios, particularly camouflage scenarios or when the target is in a camouflage state, where research using mid-wave infrared hyperspectral data for target classification and recognition is significantly insufficient.
In summary, in response to the various problems in current hyperspectral imaging and classification recognition research mentioned in the above analysis, the work we conducted in this study is as follows:
  • We conducted mid-wave infrared hyperspectral experiments on camouflage targets against complex backgrounds using a Hyper-Cam mid-wave infrared hyperspectral imager. Based on the experimental data, we established a comprehensive camouflage target database containing complex backgrounds, comprising four distinct datasets (GCL, SSCL, CW, and LC). These datasets are suitable for deep learning training applications.
  • We propose a dual-channel attention convolutional neural network (DACNet) based on spatial and spectral feature extraction. The network employs 1D and 2D convolution units to extract spectral and spatial information, respectively, while incorporating a dual-channel attention structure to effectively acquire deep-level information on camouflaged targets in hyperspectral images.
  • The classification and recognition performance of the DACNet network was tested on four mid-wave infrared hyperspectral camouflage datasets (GCL, SSCL, CW, and LC) and three public datasets (IP, PU, and SA). We compared DACNet against other convolutional neural networks to verify that the proposed model exhibits excellent classification and recognition performance on these datasets.

2. Methods

The DACNet network architecture comprises three fundamental components: (1) a Principal Component Analysis (PCA) module for dimensionality reduction, (2) dedicated 1D and 2D convolution units for independent spectral and spatial feature extraction, respectively, and (3) an integrated dual-channel attention mechanism embedded within both convolutional pathways. The operational workflow of DACNet proceeds in the following sequence:
(1)
Firstly, Principal Component Analysis (PCA) is applied to reduce the dimensionality of the hyperspectral data: thirty principal components are retained from the original spectral bands of the hyperspectral camouflage image for classification and recognition. This helps to better understand the structure of the data, extract the main features, and prevent issues related to dimensionality explosion in hyperspectral data.
(2)
A dual-channel attention mechanism is employed to independently enhance the weight of the spectral and spatial information during training, further amplifying the distinction between the target and background. The attention mechanism in the spectral extraction path consists of three layers: a 2D global average pooling layer with an output size of 1 × 1, followed by two fully connected layers with ReLU and Sigmoid activation functions. The attention mechanism in the spatial extraction path also comprises three layers: a max pooling layer, an average pooling layer, and a 2D convolution layer with an input channel size of 2, an output channel size of 1, a kernel size of (7,7), and a padding size of 3.
(3)
The spectral and spatial features in the hyperspectral image are fully explored through the 1D and 2D convolutional branches, which perform spectral and spatial feature extraction, respectively. The 1D convolutional branch consists of five 1D convolution layers: four layers have a kernel size of 3, a stride of 2, and a padding size of 1, and one layer has a kernel size determined by the input channel number and a stride of 1. The output channel sizes are 16, 32, 64, 128, and num_classes, where num_classes denotes the number of land cover classes. The 2D convolutional branch consists of five 2D convolution layers: four layers have a kernel size of (3,3) and a stride of 1, and one layer has a kernel size determined by the input channel number and data dimensions. The output channel sizes are 16, 32, 64, 128, and num_classes. Non-linear ReLU activation functions and batch normalization layers are introduced to prevent overfitting and reduce the convergence time. Finally, the spectral and spatial features obtained from the two branches are flattened into 1D vectors, fused, and aggregated, followed by a fully connected layer that outputs the classification results.
When the size of the input data is (1, 30, 13, 13) and the number of classes, num_classes, is 7, the visualization structure of the DACNet network is as shown in Figure 1. The size below each module in the figure represents the size of the hyperspectral image after being processed by that module. The overall network structure parameters are listed in Table 1. The total number of parameters in the model is 158,236.

2.1. Principal Component Analysis

Hyperspectral images contain a large amount of spectral and spatial information, which not only increases computational complexity and storage costs but may also lead to the “curse of dimensionality”, making data analysis more challenging. Principal Component Analysis (PCA) is a commonly used dimensionality reduction technique that projects correlated high-dimensional datasets onto a new lower-dimensional coordinate space, facilitating a better understanding of the data structure and extraction of key features. The fundamental idea of PCA is to transform multiple variables in the data into a set of new orthogonal vectors through a linear transformation. These combined variables, known as principal components, have maximum variance and are uncorrelated with each other. The workflow of PCA is illustrated in Figure 2.
(a)
Calculate the mean: The raw data are centralized by subtracting the mean of each feature so that the center of the data is at the origin.
(b)
Calculate the covariance matrix: The covariance matrix of the centralized data is calculated, which reflects the correlation between different features.
(c)
Calculate the eigenvalues and eigenvectors: Perform eigenvalue decomposition on the covariance matrix to obtain eigenvalues and corresponding eigenvectors. The eigenvectors represent the main data directions, and the eigenvalues represent the magnitude of the variance in these directions.
(d)
Selecting principal components: The eigenvectors corresponding to the largest eigenvalues are selected as the principal components; their eigenvalues represent the main variance in the data.
(e)
Projection: The raw data are projected onto the selected principal components to obtain a new low-dimensional feature space.
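For reference, the 30-component reduction used in this study can be reproduced with scikit-learn as sketched below. This is a minimal illustration assuming the hyperspectral cube is stored as a NumPy array of shape (H, W, B); the function name and the commented loader are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_bands(cube: np.ndarray, n_components: int = 30) -> np.ndarray:
    """Apply PCA along the spectral axis of a hyperspectral cube.

    cube: array of shape (H, W, B) with B spectral bands.
    Returns an array of shape (H, W, n_components).
    """
    h, w, b = cube.shape
    flat = cube.reshape(-1, b)                 # one spectrum per pixel
    pca = PCA(n_components=n_components)       # steps (a)-(d): centring, covariance,
                                               # eigendecomposition, component selection
    reduced = pca.fit_transform(flat)          # step (e): projection
    return reduced.reshape(h, w, n_components)

# Example: reduce an 82-band MWIR cube to 30 principal components
# cube = np.load("gcl_cube.npy")              # hypothetical file
# cube30 = reduce_bands(cube, 30)
```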

2.2. One- and Two-Dimensional Convolution Branches

In mid-wave infrared hyperspectral images containing camouflaged targets, classification and recognition are particularly challenging because the spectral differences between the targets and their backgrounds are minimal. Conventional approaches have significant limitations: one-dimensional convolution tends to lose critical spatial information, while two-dimensional convolution fails to adequately capture spectral characteristics. Although three-dimensional convolution can process spectral and spatial information simultaneously, it substantially increases the number of network parameters, slows training, and introduces redundancy and interference. Consequently, single-convolution-module architectures are poorly suited to this classification task.
To address these limitations, this study adopts a network architecture that extracts spectral and spatial features separately: a one-dimensional convolution branch extracts the spectral features, while a two-dimensional convolution branch extracts the spatial features. Finally, the outputs of the two branches are fused and aggregated before the classification result is produced. This dual-branch design overcomes the inherent constraints of single-convolution-module approaches and, by fully exploiting both the spectral and spatial information embedded in hyperspectral imagery, is well suited to the challenging task of camouflaged target recognition.
After the hyperspectral data undergo PCA dimensionality reduction and other processing operations, they are fed into the one- and two-dimensional convolution branches for computation. In the former, the data are first compressed to a size suitable for the one-dimensional convolution layer. They then pass through this layer, where processing is performed along the depth dimension to extract spectral information from the image and generate convolutional feature maps of the continuous spectral bands in the input layer. The specific calculation formula for a one-dimensional convolutional layer is as follows:
$$ p_{(1D)l,j}^{x} = f\left( \sum_{m} \sum_{h=0}^{H_l - 1} k_{l,j,m}^{h} \, p_{(l-1),m}^{(x+h)} + b_{l,j} \right) $$
Here, $H_l$ is the length of the convolution kernel; $j$ is the index of the current convolution kernel; $k_{l,j,m}^{h}$ is the value of the $j$-th convolution kernel in the $l$-th layer at position $h$; $p_{(l-1),m}^{(x+h)}$ represents the $m$-th feature map in the $(l-1)$-th layer; $b_{l,j}$ is the bias; and $p_{(1D)l,j}^{x}$ is the output of the one-dimensional convolution layer.
In the two-dimensional convolutional processing pathway, the input data undergo direct transformation through the 2D convolutional layer, which operates across the spatial (height and width) dimensions to systematically extract critical spatial information from imagery. This computational process produces comprehensive convolutional feature maps that preserve and enhance the continuous spectral band characteristics of the original input data. The precise mathematical representation of this two-dimensional convolution operation is defined as follows:
$$ p_{(2D)l,j}^{x,y} = f\left( \sum_{m} \sum_{h=0}^{H_l - 1} \sum_{w=0}^{W_l - 1} k_{l,j,m}^{h,w} \, p_{(l-1),m}^{(x+h),(y+w)} + b_{l,j} \right) $$
Here, $H_l$ and $W_l$ are the length and width of the convolution kernel; $j$ is the index of the current convolution kernel; $k_{l,j,m}^{h,w}$ is the value of the $j$-th convolution kernel in the $l$-th layer at position $(h, w)$; $p_{(l-1),m}^{(x+h),(y+w)}$ represents the $m$-th feature map in the $(l-1)$-th layer; $b_{l,j}$ is the bias; and $p_{(2D)l,j}^{x,y}$ is the output of the two-dimensional convolution layer.
Finally, the output data of both the one- and two-dimensional convolution branches are compressed to one dimension and fused together to obtain the output result of the DACNet network. The calculation result is as follows:
$$ F_{output} = \mathrm{Flatten}\left( p_{(1D)l,j}^{x} \right)_{1} + \mathrm{Flatten}\left( p_{(2D)l,j}^{x,y} \right)_{1} $$
Here, $F_{output}$ represents the output result of the DACNet network, and $\mathrm{Flatten}(\cdot)_{1}$ represents the operation of compressing data into one dimension.
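The following PyTorch sketch illustrates one possible reading of the two branches and the fusion formula above: the 1D branch operates on the centre-pixel spectrum of each PCA-reduced patch, the 2D branch on the full 13 × 13 patch, and the flattened outputs are fused and passed through a fully connected layer. Layer sizes follow the description in this section where stated; the centre-pixel input to the 1D branch, concatenation as the fusion operator, and the exact final kernel sizes are assumptions, and the attention modules of Section 2.3 are omitted here.

```python
import torch
import torch.nn as nn

class SpectralBranch(nn.Module):
    """1D branch: four Conv1d layers (kernel 3, stride 2, padding 1) plus a final
    Conv1d whose kernel spans the remaining sequence length."""
    def __init__(self, n_bands=30, num_classes=7):
        super().__init__()
        chans = [1, 16, 32, 64, 128]
        layers, length = [], n_bands
        for cin, cout in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv1d(cin, cout, 3, stride=2, padding=1),
                       nn.BatchNorm1d(cout), nn.ReLU()]
            length = (length + 2 * 1 - 3) // 2 + 1          # track sequence length
        layers += [nn.Conv1d(128, num_classes, kernel_size=length)]
        self.net = nn.Sequential(*layers)

    def forward(self, spectrum):                             # (B, 1, n_bands)
        return self.net(spectrum).flatten(1)                 # (B, num_classes)

class SpatialBranch(nn.Module):
    """2D branch: four Conv2d layers (kernel 3, stride 1) plus a final Conv2d
    covering the remaining spatial window."""
    def __init__(self, n_bands=30, patch=13, num_classes=7):
        super().__init__()
        chans = [n_bands, 16, 32, 64, 128]
        layers, size = [], patch
        for cin, cout in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(cin, cout, 3, stride=1),
                       nn.BatchNorm2d(cout), nn.ReLU()]
            size = size - 2                                   # 13 -> 11 -> 9 -> 7 -> 5
        layers += [nn.Conv2d(128, num_classes, kernel_size=size)]
        self.net = nn.Sequential(*layers)

    def forward(self, patch):                                 # (B, n_bands, H, W)
        return self.net(patch).flatten(1)                     # (B, num_classes)

class DualBranchHead(nn.Module):
    """Flatten both branch outputs, fuse them, and apply a fully connected layer."""
    def __init__(self, num_classes=7):
        super().__init__()
        self.spec = SpectralBranch(num_classes=num_classes)
        self.spat = SpatialBranch(num_classes=num_classes)
        self.fc = nn.Linear(2 * num_classes, num_classes)

    def forward(self, patch):
        # centre-pixel spectrum as the 1D-branch input (an assumption)
        center = patch[:, :, patch.shape[2] // 2, patch.shape[3] // 2]
        f_spec = self.spec(center.unsqueeze(1))               # (B, num_classes)
        f_spat = self.spat(patch)                             # (B, num_classes)
        return self.fc(torch.cat([f_spec, f_spat], dim=1))

x = torch.randn(4, 30, 13, 13)                                # four PCA-reduced 13x13 patches
print(DualBranchHead()(x).shape)                              # torch.Size([4, 7])
```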

2.3. Dual-Channel Attention Mechanism

The attention mechanism represents a deep learning module that emulates human attentional processes, with its fundamental principle being the dynamic allocation of weights according to the relevance between the input data and the current task objective, thereby amplifying informative feature channels while attenuating less relevant ones. Within the domain of hyperspectral image classification and recognition, this mechanism enables the precise processing of both spectral and spatial information, and its integration into convolutional neural networks significantly enhances classification performance. To specifically improve the classification capability of mid-wave infrared hyperspectral camouflage imagery, this investigation introduces a dual-channel attention architecture that incorporates distinct spectral and spatial attention modules into their respective extraction branches within the DACNet framework. This design intentionally reinforces spectral characteristics while diminishing spatial components in one pathway and, conversely, emphasizes spatial over spectral features in the other, creating a synergistic effect that magnifies the discriminative contrast between target and background regions and ultimately yields superior camouflage target recognition outcomes.
For the spectral attention module (channel attention), we embed the SENet attention mechanism in the spectral extraction branch of the network to enhance the weight of the spectral information and reduce that of the spatial information. After being weighted by SENet, the data enter the spectral extraction branch of DACNet for training. The implementation steps of the SENet attention mechanism are shown in Figure 3.
(a)
Squeeze: Using two-dimensional global average pooling, the two-dimensional features ($H \times W$) of each channel are compressed into a single real number with a global receptive field, reducing the feature map from $[h, w, c]$ to $[1, 1, c]$.
(b)
Excitation: Weights are generated for each channel through learnable parameters, with the number of output weight values equal to the number of channels in the input feature map; the weights have a size of $[1, 1, c]$.
(c)
Reweight: The normalized weights obtained in the previous step are applied to the features of each channel, completing the recalibration of the original features in the channel dimension. After reweighting, the feature data have a size of $[h, w, c] \times [1, 1, c] = [h, w, c]$.
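A minimal PyTorch sketch of this squeeze-and-excitation channel attention is shown below; the reduction ratio of the two fully connected layers is an assumed value, since the layer sizes are not reported in the text.

```python
import torch
import torch.nn as nn

class SpectralAttention(nn.Module):
    """SE-style channel attention: squeeze (global average pooling), excitation
    (two fully connected layers), and reweighting of the input channels."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)              # (B, C, H, W) -> (B, C, 1, 1)
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.squeeze(x).view(b, c)                       # squeeze
        w = self.excite(w).view(b, c, 1, 1)                  # excitation -> per-channel weights
        return x * w                                         # reweight (broadcast over H, W)

att = SpectralAttention(channels=30)
print(att(torch.randn(2, 30, 13, 13)).shape)                  # torch.Size([2, 30, 13, 13])
```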
For the spatial attention module, we embed the spatial attention component of the CBAM attention mechanism into the spatial extraction branch of the network, increasing the weight of the spatial information and reducing that of the spectral information. The workflow of the spatial attention mechanism is shown in Figure 4.
(a)
Firstly, the input feature map (size $[b, c, h, w]$) is subjected to maximum and average pooling along the channel dimension: average pooling computes the mean across the channels at each spatial location, while maximum pooling takes the channel-wise maximum. At this point, the size of each pooled feature map is $[b, 1, h, w]$.
(b)
The two pooled feature maps are stacked along the channel dimension, giving a size of $[b, 2, h, w]$. A convolutional layer with a kernel size of (7,7) is then applied to fuse the channel information and reduce the feature map from $[b, 2, h, w]$ to $[b, 1, h, w]$.
(c)
Subsequently, the convolved result is normalized with the Sigmoid function to obtain spatial weights, which are multiplied by the input feature map to produce the output feature map (size $[b, c, h, w]$).
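This spatial attention path maps directly onto a small PyTorch module; the sketch below follows the stated configuration (channel-wise average and maximum pooling, a 2-input/1-output convolution with a (7,7) kernel and padding of 3, and Sigmoid reweighting).

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: channel-wise average and maximum pooling,
    a 7x7 convolution over the stacked maps, and Sigmoid reweighting."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:     # x: (B, C, H, W)
        avg_map = x.mean(dim=1, keepdim=True)                 # (B, 1, H, W)
        max_map = x.max(dim=1, keepdim=True).values           # (B, 1, H, W)
        stacked = torch.cat([avg_map, max_map], dim=1)        # (B, 2, H, W)
        weights = self.sigmoid(self.conv(stacked))            # (B, 1, H, W)
        return x * weights                                    # reweight each spatial position

sa = SpatialAttention()
print(sa(torch.randn(2, 30, 13, 13)).shape)                    # torch.Size([2, 30, 13, 13])
```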

3. Results

3.1. Experiment and Dataset

On 21 December 2023, between 14:00 and 20:00, at the South Campus of Xidian University, Chang'an District, Xi'an City, Shaanxi Province, researchers employed the Hyper-Cam extended mid-wave spectral imaging instrument to acquire mid-wave infrared hyperspectral image data on camouflage target models across various background conditions and camouflage techniques. The collected experimental data served as the foundation for constructing a specialized mid-wave infrared hyperspectral test dataset specifically designed for camouflage target analysis. The configuration of the testing equipment and the detailed experimental setup are shown in Figure 5. The imaging system operated within the mid-wave infrared spectrum of 3–5 μm, achieving a spectral resolution of 13 cm−1 and an image size of 320 × 256 pixels.
Measurement targets include scaled models of camouflage tanks, aircraft, armored vehicles, and other targets. Camouflage methods include nets and coatings. The ground background includes a combination of sand, cement ground, bare soil, shrubs, grassland, forest, and water bodies. The test background is shown in Figure 6.
After noise removal and atmospheric correction, the dataset was constructed using the Region of Interest (ROI) tool in ENVI 5.3 to create prior-label truth samples. The specific information on the datasets is as follows:
(1)
Shrubs, Grassland, and Bare Soil Background (GCL) Dataset
The GCL dataset was captured at 16:42 on 21 December 2023, and the hyperspectral cube data are shown in Figure 7a. They consist of six targets and three types of land features. After removing the band near the 4.3 μm atmospheric absorption band, 82 bands with a pixel size of 320 × 256 remain after screening. The ground-truth map produced by the ROI tool is shown in Figure 7b. The target and land cover types corresponding to this dataset and the number of samples are listed in Table 2.
(2)
Sandy, Cement, Grassland, and Bare Soil Background (SSCL) Dataset
The SSCL dataset was captured at 16:55 on 21 December 2023, and the hyperspectral cube data are shown in Figure 8a. They consist of five targets and six types of land features. After removing the band near the 4.3 μm atmospheric absorption band, 82 bands with a pixel size of 320 × 256 remain after screening. The ground-truth map produced by the ROI tool is shown in Figure 8b. The target and land cover types corresponding to this dataset and the number of samples are listed in Table 3.
(3)
Grassland and camouflage net background (CW) dataset
The CW dataset was captured at 17:51 on 21 December 2023, and the hyperspectral cube data are shown in Figure 9a. They consist of five targets, two types of land features, and one camouflage net. After removing the band near the 4.3 μm atmospheric absorption band, 82 bands with a pixel size of 320 × 256 remain after screening. The ground-truth map produced by the ROI tool is shown in Figure 9b. The target and land cover types corresponding to the CW dataset and the number of samples are listed in Table 4.
(4)
Forest and Grassland Background (LC) Dataset
The LC dataset was captured at 18:35 on 21 December 2023, and the hyperspectral cube data are shown in Figure 10a. They consist of five targets and four types of land features. After removing the band near the 4.3 μm atmospheric absorption band, 82 bands with a pixel size of 320 × 256 remain after screening. The ground-truth map produced using the ROI tool is shown in Figure 10b. The target and land cover types corresponding to the LC dataset and the number of samples are listed in Table 5.
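Once the ROI-based ground-truth maps are available, labelled samples can be cut out of the corrected cubes. The sketch below shows one way to extract 13 × 13 patches around each labelled pixel and split them 1:9 into training and testing sets, matching the patch size and split ratio used later in Section 3.3; the function and its arguments are illustrative rather than the authors' exact preprocessing code.

```python
import numpy as np

def extract_patches(cube30, labels, patch=13, train_ratio=0.1, seed=0):
    """Cut patch x patch windows around every labelled pixel and split them
    1:9 into training and testing sets (labels: 0 = unlabelled background)."""
    pad = patch // 2
    padded = np.pad(cube30, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    coords = np.argwhere(labels > 0)                       # row-major order
    X = np.stack([padded[r:r + patch, c:c + patch, :] for r, c in coords])
    y = labels[labels > 0] - 1                              # classes -> 0-based, same order
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(train_ratio * len(y))
    train, test = idx[:n_train], idx[n_train:]
    # X has shape (N, patch, patch, bands); transpose to (N, bands, patch, patch) for PyTorch
    return X[train], y[train], X[test], y[test]
```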

3.2. Classification Evaluation Indicators

In order to effectively evaluate the classification performance of DACNet, this study used hyperspectral image classification performance evaluation indicators, namely the average classification accuracy (AA), overall classification accuracy (OA), and Kappa coefficient [29].
The average classification accuracy (AA) represents the average classification accuracy of each category, with equal emphasis on each. The calculation formula is as follows:
$$ AA = \frac{1}{N} \sum_{i=1}^{N} \frac{C_{i,i}}{\sum_{j=1}^{N} C_{i,j}} $$
The overall classification accuracy (OA) represents the proportion of correctly classified samples to the total number of samples and is an evaluation indicator of the overall performance of the classifier. The calculation formula is as follows:
$$ OA = \frac{\sum_{i=1}^{N} C_{i,i}}{\sum_{i=1}^{N} \sum_{j=1}^{N} C_{i,j}} $$
The Kappa coefficient is an indicator that measures the consistency of a classifier, taking into account the possibility of accidental correct classification and avoiding the impact of data imbalance. It evaluates the classifier performance by comparing the consistency between the actual and predicted categories, with values closer to 1 indicating better performance. The calculation formula is as follows:
$$ Kappa = \frac{N \times \sum_{i=1}^{n} C_{i,i} - \sum_{i=1}^{n} \left( \sum_{j=1}^{n} C_{i,j} \times \sum_{j=1}^{n} C_{j,i} \right)}{N^{2} - \sum_{i=1}^{n} \left( \sum_{j=1}^{n} C_{i,j} \times \sum_{j=1}^{n} C_{j,i} \right)} $$
where $C_{i,j}$ is an element of the confusion matrix, $n$ is the number of classes, and $N$ is the total number of samples.
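As a worked example, the three indicators can be computed from a confusion matrix in a few lines of NumPy. The helper below is equivalent to the AA, OA, and Kappa formulas above (the Kappa expression is rewritten in its observed-versus-expected-agreement form), and the toy matrix is purely illustrative.

```python
import numpy as np

def classification_metrics(conf: np.ndarray):
    """Compute OA, AA, and the Kappa coefficient from a confusion matrix
    whose rows are true classes and columns are predicted classes."""
    conf = conf.astype(float)
    total = conf.sum()
    per_class_acc = np.diag(conf) / conf.sum(axis=1)       # accuracy of each class
    aa = per_class_acc.mean()                               # average accuracy
    oa = np.diag(conf).sum() / total                        # overall accuracy
    expected = (conf.sum(axis=1) * conf.sum(axis=0)).sum() / total**2
    kappa = (oa - expected) / (1.0 - expected)              # chance-corrected agreement
    return oa, aa, kappa

# Toy 3-class example
cm = np.array([[50, 2, 0],
               [ 3, 45, 2],
               [ 0, 1, 47]])
print(classification_metrics(cm))
```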

3.3. Multi-Category Comparative Evaluation

To evaluate the classification performance of DACNet on the mid-wave infrared hyperspectral camouflage datasets, we conducted tests on the four constructed camouflage datasets: GCL, SSCL, CW, and LC. The experiments were carried out on a Windows system using an Intel® Core™ i7-6700 CPU, 16.0 GB of RAM, Python 3.9.16, and PyTorch 2.0.1.
In terms of the parameter settings, the number of principal components for dimensionality reduction was set to 30, the batch size to 128, the number of training epochs to 150, and the learning rate to 0.001. The cross-entropy loss function was used to calculate the loss, and the backpropagation algorithm, together with the Adagrad optimizer, was employed to train the optimal parameters, continuously improving the training accuracy.
In addition, to demonstrate the classification performance and advanced nature of the DACNet algorithm, we selected five representative deep-learning-based algorithms for comparison: 1DCNN [30], Two-Stream-2DCNN (TS2DCNN) [31], 3DCNN [32], HybridSN [33], and ResNet-50 [34]. The training results of these five algorithms on the camouflage datasets are compared and analyzed against those of the DACNet algorithm. For training, the input image block size was set to 13 × 13, and the training/testing set ratio was 1:9.
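For completeness, the training configuration described above (cross-entropy loss, Adagrad, 150 epochs, learning rate 0.001) corresponds to a standard PyTorch loop such as the sketch below. The data loader is assumed to yield batches of 13 × 13 PCA-reduced patches with per-pixel labels from the 1:9 split; the function itself is illustrative rather than the authors' exact code.

```python
import torch
import torch.nn as nn

def train(model, train_loader, num_epochs=150, lr=0.001, device="cpu"):
    """Train a classifier on 13x13 hyperspectral patches with the settings
    reported above: cross-entropy loss, Adagrad optimizer, 150 epochs."""
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adagrad(model.parameters(), lr=lr)
    for epoch in range(num_epochs):
        model.train()
        running_loss, correct, seen = 0.0, 0, 0
        for patches, labels in train_loader:                # patches: (B, 30, 13, 13)
            patches, labels = patches.to(device), labels.to(device)
            optimizer.zero_grad()
            logits = model(patches)
            loss = criterion(logits, labels)
            loss.backward()                                  # backpropagation
            optimizer.step()
            running_loss += loss.item() * labels.size(0)
            correct += (logits.argmax(1) == labels).sum().item()
            seen += labels.size(0)
        print(f"epoch {epoch + 1}: loss={running_loss / seen:.4f} "
              f"acc={correct / seen:.4f}")
    return model
```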

3.3.1. Comparison and Evaluation of GCL Dataset Classification Results

Figure 11 and Table 6 present the classification results of the GCL dataset and a comparison of the evaluation metrics, respectively. As shown in Table 6, among the six classification methods, 1DCNN and ResNet exhibit poor classification performance on the hyperspectral camouflage dataset, while Two-Stream-2DCNN shows moderate training results. Both the 3DCNN and HybridSN networks demonstrate superior classification performance, with evaluation metrics all exceeding 90%. In contrast, the DACNet model designed in this study achieves an OA of 99.94%, an AA of 99.96%, and a Kappa coefficient of 99.91%, with a classification accuracy of 100% for all four land cover categories, outperforming all other methods in terms of both classification effectiveness and accuracy. As shown in Figure 11, the HybridSN network missed some targets and failed to detect the target in the top-left corner of the image. Although both 1DCNN and 3DCNN identified all targets, the latter produced small and blurry boundary contours, while the former generated excessive noise in its classification results. Additionally, both the ResNet-50 and Two-Stream-2DCNN networks encountered misdetections; the latter misidentified some targets as backgrounds, while the former erroneously classified much of the background as targets. In contrast, the DACNet network accurately detected all targets without any missed or false detections, demonstrating superior classification and recognition performance compared to all other networks.

3.3.2. Comparison and Evaluation of SSCL Dataset Classification Results

Figure 12 and Table 7 present the classification results of the SSCL dataset and a comparison of the evaluation metrics, respectively. As shown in Table 7, the DACNet model achieved an OA of 99.52%, an AA of 99.45%, and a Kappa coefficient of 99.41%. Furthermore, among the seven land cover categories, five achieved 100% classification accuracy, making the DACNet the top-performing model across all evaluation metrics. The HybridSN network demonstrated the second-best performance, with all three evaluation metrics exceeding 96%. Among all the models evaluated, 1DCNN showed the lowest classification accuracy. Additionally, as shown in Figure 12, the 3DCNN and Two-Stream-2DCNN networks exhibited relatively poor performance, with both misclassifying certain areas between the grassland and targets, and the 3DCNN additionally failed to detect some targets. The 1DCNN network performed the worst overall, completely failing to recognize the camouflage targets. While DACNet shows minor misclassifications of some grassland pixels as targets, it successfully detects all targets and delivers superior classification results compared to all other networks.

3.3.3. Comparison and Evaluation of CW Dataset Classification Results

For the CW dataset, Figure 13 and Table 8 present the classification results and evaluation parameter comparisons, respectively. As shown in Table 8, the DACNet model achieves 100% accuracy across all metrics—OA, AA, Kappa coefficient—and prediction accuracy for all land cover types, outperforming all other network models. The 3DCNN and HybridSN networks demonstrate strong performance, with evaluation metrics exceeding 99%, although they were slightly inferior to DACNet. The remaining three network models show moderate performance. Figure 13 reveals the challenging nature of target identification in the CW dataset due to camouflage nets obscuring some small target models. While 3DCNN and ResNet successfully detect unconcealed targets, they fail to identify targets beneath camouflage nets. HybridSN partially detects concealed targets but with incomplete recognition. Two-Stream-2DCNN exhibits noticeable misidentification errors. In contrast, DACNet excels at clearly recognizing both exposed target models and accurately detecting concealed targets under camouflage nets while generating fewer false alarms. This superior performance stems from DACNet’s unique architecture, which independently processes spatial and spectral information before integration, combined with its attention mechanism, which enhances target–background discrimination. These capabilities enable DACNet to demonstrate exceptional generalization and detection performance for camouflaged targets in the CW dataset.

3.3.4. Comparison and Evaluation of LC Dataset Classification Results

For the LC dataset, Figure 14 and Table 9 present the classification results and evaluation parameter comparisons, respectively. As shown in Table 9, the DACNet model achieves an OA of 97.24%, an AA of 96.69%, and a Kappa coefficient of 96.33%. Notably, it classifies two land cover classes with 100% accuracy, surpassing the other five network models. The 1DCNN and ResNet networks exhibit relatively weaker performance on the LC dataset, with lower evaluation metric scores. The remaining three networks demonstrate strong classification performance, with all evaluation metrics exceeding 90%. However, for the fourth land cover class (camouflaged target model), all other networks achieve classification accuracies below 80%, while only 3DCNN reaches 93% accuracy—second to DACNet’s 96%. This performance gap arises because camouflaged targets in the LC dataset are highly concealed and spectrally similar to their background, making it challenging for conventional networks to extract the discriminative spectral features. In contrast, DACNet overcomes this limitation by leveraging separate convolutional branches to extract spatial and spectral information combined with an attention mechanism that refines the feature representation, leading to superior classification precision. Furthermore, Figure 14 illustrates that while the other five networks typically detect only three grassland target models, DACNet successfully identifies both occluded targets near tree trunks and highly concealed targets in grassland regions with significantly fewer false alarms. This demonstrates DACNet’s clear advantage in classification performance over all competing networks on the LC dataset.

3.4. Ablation Experiment

The DACNet network model mainly consists of 1D and 2D convolution branches and a dual-channel attention mechanism. To fully explore the role of each module in the classification and recognition processes of hyperspectral camouflage datasets, ablation experiments were conducted using DACNet. The experiments were divided into four conditions: (1) absent 1D convolution branch, (2) absent 2D convolution branch, (3) absent spectral attention SE module in the dual-channel attention mechanism, and (4) absent spatial attention SA module in the dual-channel attention mechanism. In each ablation experiment, the hyperparameter settings of the network model were consistent with those of the original model.
The results of the ablation experiments are shown in Table 10. It can be observed that the absence of a 1D convolution branch yields the worst performance, with the lowest OA, AA, and Kappa metrics across all four datasets. This demonstrates the critical role of the 1D convolution branch in spectral feature extraction. The absence of the 2D convolution branch similarly leads to a decrease, to some extent, in the evaluation metrics, as this branch is capable of accurately extracting spatial and texture features from hyperspectral data. Likewise, the absence of either the spectral or spatial attention modules in the dual-channel attention mechanism also results in a decrease in classification performance, although the impact is smaller than that of the convolution branches. Therefore, each module in the DACNet model plays a significant role in the classification performance of hyperspectral camouflage images.

3.5. Time Analysis

In the field of deep learning, network models typically require substantial computational resources, including high-performance hardware such as GPUs and TPUs. Longer running times correspond to increased utilization of computational resources, leading to higher costs. Therefore, when designing training models, in addition to accuracy and classification performance, time efficiency must also be treated as a critical consideration in balancing accuracy and computational time. For the time analysis in this study, the GPU used was an NVIDIA GeForce MX350, and the CPU was an Intel(R) Core(TM) i5-1035G1 @ 1.00 GHz. Table 11 shows the program running times of the six algorithmic models on the four camouflage datasets (GCL, SSCL, CW, and LC), with all results averaged over five test runs.
As shown in Table 11, the average running times of the DACNet network on the four datasets are 86.37 s, 84.15 s, 72.40 s, and 45.22 s, respectively. Its runtime efficiency is higher than that of the HybridSN, ResNet-50, and Two-Stream-2DCNN networks and second only to the 1DCNN model. Therefore, the proposed DACNet network model in this study effectively improves the model accuracy and classification performance while maintaining reasonable runtime efficiency when processing hyperspectral camouflage data.

3.6. Comparison with Traditional Camouflage Target Recognition Methods

To further validate the effectiveness of the DACNet network model and enhance the completeness of the research framework, we conducted supplementary experiments to address the limitations of existing works in terms of comparative dimensions. Considering that existing research predominantly focuses on performance comparisons among deep learning models and lacks a systematic analysis of traditional machine learning methods, this paper specifically selects three representative traditional machine learning algorithms—Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Naive Bayes (NB)—to conduct systematic comparative experiments. The experimental results are shown in Table 12.
As shown in Table 12, for the four types of mid-wave infrared hyperspectral camouflage data, the classification accuracy of DACNet remains superior to that of traditional machine learning models for camouflaged target recognition. Among the three machine learning methods, KNN demonstrates relatively better classification results, while SVM shows moderate performance. However, both exhibit lower evaluation metrics on the LC dataset, which aligns with the earlier observation that camouflaged targets are more concealed and exhibit smaller spectral distinctions from the background, thereby making classification more challenging. Furthermore, the NB model performs the worst, achieving Kappa coefficients of 0 across all four datasets, indicating that this model completely loses its effective discriminative capability in mid-wave infrared hyperspectral camouflage scenarios. In summary, the DACNet network demonstrates significant advantages over traditional machine learning methods in terms of both classification performance and accuracy.
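For readers reproducing this comparison, the three baselines can be run with scikit-learn on per-pixel spectra as sketched below; the kernel, neighbour count, and other hyperparameters are assumptions, since the paper does not report the settings used.

```python
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, cohen_kappa_score

def run_baselines(X_train, y_train, X_test, y_test):
    """Fit the three traditional baselines on per-pixel spectra (one row per
    labelled pixel) and report OA and Kappa for each."""
    models = {
        "SVM": SVC(kernel="rbf"),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "NB": GaussianNB(),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        print(f"{name}: OA={accuracy_score(y_test, pred):.4f} "
              f"Kappa={cohen_kappa_score(y_test, pred):.4f}")

# Usage: run_baselines(X_train, y_train, X_test, y_test) with spectra as rows
```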

3.7. Experiment on Public Datasets

The existing experimental validation datasets in this study are limited to singular controlled experimental scenarios (e.g., shrubland, grassland, etc.), and the current mid-wave infrared hyperspectral dataset has a restricted sample size and relatively small scale, which may constrain the model’s ability to analyze the spatial and spectral features of ground objects. This limitation makes it challenging to comprehensively evaluate the DACNet algorithm’s performance in terms of cross-scenario adaptability, model generalization capabilities, and superiority over existing hyperspectral classification methods. To systematically validate the generalizability of the DACNet algorithm and establish its universal theoretical foundation, this study conducts hyperspectral classification experiments on three public datasets—Indian Pines, Pavia University, and Salinas—to verify the generalization and broad applicability of the DACNet network model. The basic information on the three types of public datasets is as follows:
(1)
Indian Pines (IP) Dataset
The Indian Pines (IP) dataset is a hyperspectral dataset collected by the AVIRIS sensor on 12 June 1992, covering a 2.9 km × 2.9 km area in Northwestern Indiana. The dataset has a pixel size of 145 × 145 (21,025 pixels), a spatial resolution of 20 m, and contains 16 land cover classes.
(2)
Pavia University (PU) Dataset
The Pavia University (PU) data are part of a hyperspectral dataset acquired by a German airborne reflective optics system imaging spectrometer over the city of Pavia, Italy, in 2003. The spectrometer captured 115 spectral bands in the wavelength range of 0.43–0.86 μm with a spatial resolution of 1.3 m. A total of 12 bands were discarded due to noise, leaving 103 spectral bands for analysis. The dataset has dimensions of 610 × 340 (207,400 pixels) and includes nine land cover classes.
(3)
Salinas (SA) Dataset
The Salinas (SA) data are hyperspectral images collected by the AVIRIS imaging spectrometer over the Salinas Valley in California, USA, with a spatial resolution of 3.7 m. The original image contains 224 spectral bands, of which 20 bands (unusable due to water absorption effects) were removed, leaving 204 bands for use. The image has dimensions of 512 × 217 (111,104 pixels) and comprises 16 land cover classes.
Surface feature maps, ground-truth maps, and detailed class information for the IP, PU, and SA datasets are shown in Figure 15 and Table 13, respectively.

3.7.1. Comparison and Evaluation of IP Dataset Classification Results

Figure 16 and Table 14, respectively, show the classification result maps of the IP dataset and a comparison of the evaluation metrics. As illustrated in Table 14, DACNet achieves the best classification performance among the six methods, with OA, AA, and Kappa parameters reaching 95.89%, 92.17%, and 94.33%, respectively—all higher than those of the other networks. Additionally, DACNet outperforms the other methods in classifying the 11 land cover categories, achieving 100% accuracy for 4 of these and demonstrating exceptional classification capabilities. The ResNet-50 network ranks second to DACNet, with its three evaluation metrics reaching 95.45%, 92.14%, and 94.26%, respectively. The TS2DCNN, 3DCNN, and HybridSN networks also show strong performance, with all three evaluation metrics exceeding 90%. The 1DCNN network exhibits the lowest classification accuracy.

3.7.2. Comparison and Evaluation of PU Dataset Classification Results

Figure 17 and Table 15 show the classification result maps of the PU dataset and a comparison of the evaluation metrics, respectively. As illustrated in Table 15, among the six classification methods, the 1DCNN exhibits moderate performance. The TS2DCNN, 3DCNN, HybridSN, and ResNet-50 networks demonstrate similar classification performance on the PU dataset, although the HybridSN network shows relatively better results overall. The DACNet network proposed in this study achieves the best classification performance, outperforming other methods in classifying six land cover categories and achieving 100% accuracy for four of these. Its OA, AA, and Kappa parameters reach 98.99%, 98.15%, and 98.70%, respectively, surpassing those of the other networks.

3.7.3. Comparison and Evaluation of SA Dataset Classification Results

Figure 18 and Table 16, respectively, show the classification result maps of the SA dataset and a comparison of the evaluation metrics. As illustrated in Table 16, the classification performance of the 1DCNN network on the SA dataset shows significant improvement compared to the other two datasets, with the OA, AA, and Kappa parameters all exceeding 90%. The TS2DCNN, 3DCNN, HybridSN, and ResNet-50 networks achieve high classification performance on the SA dataset, with all three evaluation metrics surpassing 99%. The DACNet network proposed in this study achieves the best classification performance among the six methods, with OA, AA, and Kappa reaching 99.91%, 99.86%, and 99.90%, respectively. Among the 16 land cover categories, 15 achieve 100% classification accuracy, while one category achieves 99% accuracy.
In summary, the DACNet network model proposed in this study also demonstrated excellent classification performance on the three public hyperspectral datasets, outperforming the other five network models in classification and recognition and verifying the effectiveness of our model in extracting spatial, spectral, and texture features. This demonstrates that the DACNet network model generalizes well: it is not only suitable for classifying hyperspectral camouflage target data but also performs well on other categories of hyperspectral datasets.

4. Conclusions

Given the limited research on the classification and recognition of mid-wave infrared hyperspectral image data for camouflage targets in camouflage scenarios, this study is based on four datasets of camouflaged targets in complex backgrounds collected using a Hyper-Cam mid-wave infrared hyperspectral imager. A dual-channel attention convolutional neural network (DACNet) model, based on the extraction of spatial–spectral features, is constructed to explore its performance in target stealth recognition tasks.
The classification and recognition performances of the DACNet network model were evaluated using classification metrics. The results show that on the four camouflage datasets, GCL, SSCL, CW, and LC, the classification metrics (OA, AA, and Kappa) for the DACNet model are (99.94%, 99.96%, and 99.91%); (99.52%, 99.45%, and 99.41%); (100%, 100%, and 100%); and (96.39%, 95.88%, and 95.21%), respectively. In hyperspectral camouflage data processing, the DACNet model outperforms five contemporary deep learning methods (1DCNN, Two-Stream-2DCNN, 3DCNN, HybridSN, and ResNet-50) and three traditional machine learning recognition methods (SVM, KNN, and NB), demonstrating superior performance. Additionally, the classification results of DACNet were evaluated using classification maps. The results show that the classification maps of DACNet not only exhibit clearer edges but also contain less noise, making them closer to the ground-truth maps. Moreover, in all datasets, the DACNet model accurately identifies all camouflage targets, outperforming the other five network models in terms of classification and recognition performance. Furthermore, classification and recognition experiments were conducted on three public datasets (IP, PU, and SA) using DACNet, which also demonstrated excellent classification performance, verifying the universality and superiority of the DACNet network. Therefore, the model proposed in this study provides a new classification method for processing mid-wave infrared hyperspectral camouflage data.

Author Contributions

Conceptualization, S.Z. and Y.C.; data curation, S.Z. and Y.C.; methodology, S.Z.; validation, Y.C. and Z.W.; writing—original draft, S.Z.; writing—review and editing, L.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the 111 Project (Grant No. B17035) and the National Natural Science Foundation of China (61875156).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors are grateful to the anonymous reviewers for their critical reviews, insightful comments, and valuable suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Xu, J.; Qian, W.; Chen, Q. Calculation model of scattering depolarization for camouflaged target detection system. Optik 2018, 158, 341–348. [Google Scholar] [CrossRef]
  2. Hogan, G.B.; Cuthill, C.I.; Scott-Samuel, E.N. Dazzle camouflage and the confusion effect: The influence of varying speed on target tracking. Anim. Behav. 2017, 123, 349–353. [Google Scholar] [CrossRef] [PubMed]
  3. Wang, J.; Zhang, L.; Ma, Y.; Xie, W.; Dong, J.; Dong, Y.; Li, W.; Zhang, C. Construction of diverse adaptive camouflage nets based on soluble yellow-to-green switching electrochromic materials. Chem. Eng. J. 2024, 498, 155278. [Google Scholar] [CrossRef]
  4. Wang, T.; Yu, Z.; Fang, J.; Xie, J.; Yang, F.; Zhang, H.; Zhang, L.; Du, M.; Li, L.; Ning, X. Multidimensional fusion of frequency and spatial domain information for enhanced camouflaged object detection. Inf. Fusion 2025, 117, 102871. [Google Scholar] [CrossRef]
  5. Chen, T.; Ruan, H.; Wang, S.; Xiao, J.; Hu, X. A three-stage model for camouflaged object detection. Neurocomputing 2025, 614, 128784. [Google Scholar] [CrossRef]
  6. Zhang, M.; Shen, C.; Deng, Y.; Wang, L. Camouflaged object detection via boundary refinement. Multimed. Syst. 2025, 31, 56. [Google Scholar] [CrossRef]
  7. Zhao, J.; Zhou, B.; Wang, G.; Ying, J.; Liu, J.; Chen, Q. Spectral Camouflage Characteristics and Recognition Ability of Targets Based on Visible/Near-Infrared Hyperspectral Images. Photonics 2022, 9, 957. [Google Scholar] [CrossRef]
  8. Shimoni, M.; Haelterman, R.; Perneel, C. Hyperspectral Imaging for Military and Security Applications: Combining Myriad Processing and Sensing Techniques. IEEE Geosci. Remote Sens. Mag. 2019, 7, 101–117. [Google Scholar] [CrossRef]
  9. Krezhova, D.; Maneva, S.; Petrov, N.; Moskova, I.; Krezhov, K. Detection of environmental changes using hyperspectral remote sensing. In Proceedings of the International Physics Conference of the Balkan Physical Union, Istanbul, Turkey, 24–27 August 2015. [Google Scholar] [CrossRef]
  10. Yokoya, N.; Chan, J.C.-W.; Segl, K. Potential of Resolution-Enhanced Hyperspectral Data for Mineral Mapping Using Simulated EnMAP and Sentinel-2 Images. Remote Sens. 2016, 8, 172. [Google Scholar] [CrossRef]
  11. Hu, Y.; Zhang, J.; Ma, Y.; An, J.; Ren, G.; Li, X. Hyperspectral Coastal Wetland Classification Based on a Multiobject Convolutional Neural Network Model and Decision Fusion. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1110–1114. [Google Scholar] [CrossRef]
  12. Zhang, C.; Ma, X.; Zhang, A.; Yan, B.; Zhao, K.; Cheng, Q. Novel discretized gravitational search algorithm for effective medical hyperspectral band selection. J. Frankl. Inst. 2024, 361, 107269. [Google Scholar] [CrossRef]
  13. Petracchi, B.; Gazzoni, M.; Torti, E.; Marenzi, E.; Leporati, F. Machine Learning-Based Classification of Skin Cancer Hyperspectral Images. Procedia Comput. Sci. 2023, 22, 2856–2865. [Google Scholar] [CrossRef]
  14. van der Werff, H.; Hecker, C.; Baines, A.; Botha, A.E.J.; Fletcher, J.; Portela, B. Active Hyperspectral Scanning of Rock Face with a Supercontinuum Laser. Remote Sens. 2024, 16, 4631. [Google Scholar] [CrossRef]
  15. Chen, L.; Sui, X.; Liu, R.; Chen, H.; Li, Y.; Zhang, X.; Chen, H. Mapping Alteration Minerals Using ZY-1 02D Hyperspectral Remote Sensing Data in Coalbed Methane Enrichment Areas. Remote Sens. 2023, 15, 3590. [Google Scholar] [CrossRef]
  16. Zhang, T.; Xuan, C.; Ma, Y.; Tang, Z.; Gao, X. An efficient and precise dynamic neighbor graph network for crop mapping using unmanned aerial vehicle hyperspectral imagery. Comput. Electron. Agric. 2025, 230, 109838. [Google Scholar] [CrossRef]
  17. Wei, L.; Yu, M.; Zhong, Y.; Zhao, J.; Liang, Y.; Hu, X. Spatial–Spectral Fusion Based on Conditional Random Fields for the Fine Classification of Crops in UAVBorne Hyperspectral Remote Sensing Imagery. Remote Sens. 2019, 11, 780. [Google Scholar] [CrossRef]
  18. El Oufir, M.K.; Chokmani, K.; El Alem, A.; Bernier, M. Using Ensemble-Based Systems with Near-Infrared Hyperspectral Data to Estimate Seasonal Snowpack Density. Remote Sens. 2022, 14, 1089. [Google Scholar] [CrossRef]
  19. Ren, H.; Liu, L.; Ye, J.; Xie, H. Using Downwelling Far- and Thermal-Infrared Hyperspectral Radiance for Cloud Phase Classification in the Antarctic. Remote Sens. 2023, 16, 71. [Google Scholar] [CrossRef]
  20. Zhang, J.; Ma, D.; Wei, X.; Jin, J. Visible and Near-Infrared Hyperspectral Diurnal Variation Calibration for Corn Phenotyping Using Remote Sensing. Remote Sens. 2023, 15, 3057. [Google Scholar] [CrossRef]
  21. Jia, L.; Yang, F.; Chen, Y.; Peng, L.; Leng, H.; Zu, W.; Zang, Y.; Gao, L.; Zhao, M. Prediction of wetland soil carbon storage based on near infrared hyperspectral imaging and deep learning. Infrared Phys. Technol. 2024, 139, 105287. [Google Scholar] [CrossRef]
  22. Xu, S.; Geng, S.; Fan, T.; Li, C.; Gao, H. Hyperspectral image classification method based on narrowing semantic gap convolutional neural network. Int. J. Remote Sens. 2024, 45, 2208–2234. [Google Scholar] [CrossRef]
  23. Yi, W.; Kim, D.Y.; Jin, H.; Yoon, S.; Ahn, K.H. Early detection of pore clogging in microfluidic systems with 3D convolutional neural network. Sep. Purif. Technol. 2025, 359, 130428. [Google Scholar] [CrossRef]
  24. Yang, J.; Qin, J.; Qian, J.; Li, A.; Wang, L. A Multipath and Multiscale Siamese Network Based on Spatial-Spectral Features for Few-Shot Hyperspectral Image Classification. Remote Sens. 2023, 15, 4391. [Google Scholar] [CrossRef]
  25. Fırat, H.; Asker, M.E.; Bayındır, M.İ.; Hanbay, D. Hybrid 3D/2D Complete Inception Module and Convolutional Neural Network for Hyperspectral Remote Sensing Image Classification. Neural Process. Lett. 2022, 55, 1087–1130. [Google Scholar] [CrossRef]
  26. Firat, H.; Hanbay, D. Classification of Hyperspectral Images Using 3D CNN Based ResNet50. In Proceedings of the 2021 29th Signal Processing and Communications Applications Conference (SIU), Istanbul, Turkey, 9–11 June 2021. [Google Scholar]
  27. Chen, Z.; Chen, Y.; Wang, Y.; Wang, X.; Wang, X.; Xiang, Z. DCFF-Net: Deep Context Feature Fusion Network for High-Precision Classification of Hyperspectral Image. Remote Sens. 2024, 16, 3002. [Google Scholar] [CrossRef]
  28. Li, Z.; Chen, G.; Li, G.; Zhou, L.; Pan, X.; Zhao, W.; Zhang, W. DBANet: Dual-branch Attention Network for hyperspectral remote sensing image classification. Comput. Electr. Eng. 2024, 118, 109269. [Google Scholar] [CrossRef]
  29. Huang, J.; Zhang, Y.; Yang, F.; Chai, L. Attention-Guided Fusion and Classification for Hyperspectral and LiDAR Data. Remote Sens. 2023, 16, 94. [Google Scholar] [CrossRef]
  30. Zhang, L.; Zhang, L.; Du, B. Deep Learning for Remote Sensing Data: A Technical Tutorial on the State of the Art. IEEE Geosci. Remote Sens. Mag. 2016, 4, 22–40. [Google Scholar] [CrossRef]
  31. Li, X.; Ding, M.; Pižurica, A. Deep Feature Fusion via Two-Stream Convolutional Neural Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 2615–2629. [Google Scholar] [CrossRef]
  32. Li, Y.; Zhang, H.; Shen, Q. Spectral–Spatial Classification of Hyperspectral Imagery with 3D Convolutional Neural Network. Remote Sens. 2017, 9, 67. [Google Scholar] [CrossRef]
  33. Roy, S.K.; Krishna, G.; Dubey, S.R.; Chaudhuri, B.B. HybridSN: Exploring 3-D–2-D CNN Feature Hierarchy for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2020, 17, 277–281. [Google Scholar] [CrossRef]
  34. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
Figure 1. DACNet network structure flow chart.
Figure 2. PCA workflow diagram.
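Figure 2 summarizes the PCA step that compresses the raw MWIR cube along the spectral axis before classification. The snippet below is a minimal sketch of that preprocessing, assuming 30 retained components (matching the [30, 13, 13] network input in Table 1) and using scikit-learn's PCA as one possible implementation rather than the authors' exact pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_bands(cube: np.ndarray, n_components: int = 30) -> np.ndarray:
    """Reduce a (H, W, B) hyperspectral cube to n_components bands with PCA.

    Assumption: 30 components, matching the [30, 13, 13] input shape in Table 1.
    """
    h, w, b = cube.shape
    flat = cube.reshape(-1, b)                               # one spectrum per pixel
    reduced = PCA(n_components=n_components).fit_transform(flat)
    return reduced.reshape(h, w, n_components)

# Example with a synthetic cube; real MWIR data would be loaded from file.
cube = np.random.rand(100, 120, 120).astype(np.float32)
print(reduce_bands(cube).shape)                              # (100, 120, 30)
```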
Figure 3. SENet workflow diagram.
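Figure 3 depicts the squeeze-and-excitation (SE) channel attention applied in the spectral branch. A minimal PyTorch sketch of a standard SE block follows; the reduction ratio and layer sizes are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class SEAttention(nn.Module):
    """Standard squeeze-and-excitation channel attention (cf. Figure 3)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                  # squeeze: global average pool
        self.fc = nn.Sequential(                             # excitation: bottleneck MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                          # re-weight each spectral channel

# A 30-band, 13 x 13 patch, as in Table 1.
x = torch.randn(8, 30, 13, 13)
print(SEAttention(30)(x).shape)                               # torch.Size([8, 30, 13, 13])
```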
Figure 4. Workflow diagram of the spatial attention mechanism.
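Figure 4 depicts the spatial attention mechanism of the spatial branch. The sketch below follows a common formulation (channel-wise average and max pooling, a convolution, and a sigmoid gate); the authors' exact design may differ, so the kernel size and pooling choices should be read as assumptions.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial attention (cf. Figure 4): emphasize informative pixel locations."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_map = x.mean(dim=1, keepdim=True)                 # (B, 1, H, W)
        max_map = x.max(dim=1, keepdim=True).values           # (B, 1, H, W)
        attn = self.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn                                        # re-weight each spatial location

x = torch.randn(8, 30, 13, 13)
print(SpatialAttention()(x).shape)                             # torch.Size([8, 30, 13, 13])
```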
Figure 5. Experimental overview. (a) Experimental site. (b) Experimental instruments.
Figure 6. Layout of targets against different ground backgrounds. (a) Shrubs, grassland, and bare soil background. (b) Sandy, cement, grassland, and bare soil background. (c) Grassland and camouflage net background. (d) Forest and grassland background.
Figure 7. GCL dataset. (a) Cube plot of hyperspectral imaging data. (b) Ground-truth map.
Figure 8. SSCL dataset. (a) Cube plot of hyperspectral imaging data. (b) Ground-truth map.
Figure 9. CW dataset. (a) Cube plot of hyperspectral imaging data. (b) Ground-truth map.
Figure 10. LC dataset. (a) Cube plot of hyperspectral imaging data. (b) Ground-truth map.
Figure 11. Classification and recognition results on the GCL dataset. (a) Grayscale image. (b) Ground-truth map. (c) 1DCNN. (d) TS2DCNN. (e) 3DCNN. (f) HybridSN. (g) ResNet. (h) DACNet.
Figure 12. Classification and recognition results on the SSCL dataset. (a) Grayscale image. (b) Ground-truth map. (c) 1DCNN. (d) TS2DCNN. (e) 3DCNN. (f) HybridSN. (g) ResNet. (h) DACNet.
Figure 13. Classification and recognition results on the CW dataset. (a) Grayscale image. (b) Ground-truth map. (c) 1DCNN. (d) TS2DCNN. (e) 3DCNN. (f) HybridSN. (g) ResNet. (h) DACNet.
Figure 14. Classification and recognition results on the LC dataset. (a) Grayscale image. (b) Ground-truth map. (c) 1DCNN. (d) TS2DCNN. (e) 3DCNN. (f) HybridSN. (g) ResNet. (h) DACNet.
Figure 15. Ground cover and ground-truth maps of the IP, PU, and SA datasets. (a) Surface features of IP. (b) Ground truth of IP. (c) Surface features of PU. (d) Ground truth of PU. (e) Surface features of SA. (f) Ground truth of SA.
Figure 16. Classification and recognition results on the IP dataset. (a) Surface features image. (b) Ground-truth map. (c) 1DCNN. (d) TS2DCNN. (e) 3DCNN. (f) HybridSN. (g) ResNet. (h) DACNet.
Figure 17. Classification and recognition results on the PU dataset. (a) Surface features image. (b) Ground-truth map. (c) 1DCNN. (d) TS2DCNN. (e) 3DCNN. (f) HybridSN. (g) ResNet. (h) DACNet.
Figure 18. Classification and recognition results on the SA dataset. (a) Surface features image. (b) Ground-truth map. (c) 1DCNN. (d) TS2DCNN. (e) 3DCNN. (f) HybridSN. (g) ResNet. (h) DACNet.
Table 1. DACNet network structure parameters.
Branch | Layer | Kernel Size | Stride | Padding | Output Shape
Spectral attention mechanism and spectral-dimension feature extraction | SpectralAttention | - | - | - | [30, 13, 13]
 | Squeeze | - | - | - | [1, 30]
 | Conv1d | 3 | 2 | 1 | [16, 15]
 | ReLU | - | - | - | [16, 15]
 | Conv1d | 3 | 2 | 1 | [32, 8]
 | ReLU | - | - | - | [32, 8]
 | Conv1d | 3 | 2 | 1 | [64, 4]
 | ReLU | - | - | - | [64, 4]
 | Conv1d | 3 | 2 | 1 | [128, 2]
 | ReLU | - | - | - | [128, 2]
 | Conv1d | Spectral kernel | 1 | 0 | [7, 1]
Spatial attention mechanism and spatial-dimension feature extraction | SpatialAttention | - | - | - | [30, 13, 13]
 | Conv2d | (3, 3) | (1, 1) | 0 | [16, 11, 11]
 | ReLU | - | - | - | [16, 11, 11]
 | Conv2d | (3, 3) | (1, 1) | 0 | [32, 9, 9]
 | ReLU | - | - | - | [32, 9, 9]
 | Conv2d | (3, 3) | (1, 1) | 0 | [64, 7, 7]
 | ReLU | - | - | - | [64, 7, 7]
 | Conv2d | (3, 3) | (1, 1) | 0 | [128, 5, 5]
 | ReLU | - | - | - | [128, 5, 5]
 | Conv2d | Spatial kernel | (1, 1) | 0 | [7, 1, 1]
Fusion and aggregation of spatial-spectral features | Linear | - | - | - | 7
Total params | 158,236
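As a reading aid for Table 1, the following PyTorch sketch reproduces the listed layer dimensions: four strided 1-D convolutions over the squeezed 30-band spectrum, four unpadded 3 × 3 2-D convolutions over the 13 × 13 patch, class-sized projection layers (the "spectral kernel" and "spatial kernel" rows), and a final linear fusion. The attention modules are omitted, and the centre-pixel squeeze and concatenation-based fusion are assumptions made only to match the tabulated shapes; this is not the authors' released implementation.

```python
import torch
import torch.nn as nn

class DualBranchSketch(nn.Module):
    """Approximate reconstruction of the Table 1 layer dimensions (attention omitted)."""
    def __init__(self, bands: int = 30, n_classes: int = 7):
        super().__init__()
        # Spectral branch: the squeezed (1, 30) spectrum passes four strided 1-D
        # convolutions (kernel 3, stride 2, padding 1): 30 -> 15 -> 8 -> 4 -> 2.
        self.spectral = nn.Sequential(
            nn.Conv1d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv1d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv1d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv1d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv1d(128, n_classes, kernel_size=2),        # "spectral kernel": 2 -> 1
        )
        # Spatial branch: four unpadded 3 x 3 convolutions shrink 13 -> 11 -> 9 -> 7 -> 5.
        self.spatial = nn.Sequential(
            nn.Conv2d(bands, 16, 3), nn.ReLU(),
            nn.Conv2d(16, 32, 3), nn.ReLU(),
            nn.Conv2d(32, 64, 3), nn.ReLU(),
            nn.Conv2d(64, 128, 3), nn.ReLU(),
            nn.Conv2d(128, n_classes, kernel_size=5),        # "spatial kernel": 5 -> 1
        )
        # Fusion: Table 1 only lists a Linear layer with 7 outputs; concatenating the
        # two 7-dimensional branch outputs first is an assumption.
        self.fuse = nn.Linear(2 * n_classes, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (B, 30, 13, 13)
        center = x.shape[-1] // 2
        spec = x[:, :, center, center].unsqueeze(1)           # (B, 1, 30) centre-pixel spectrum
        spec = self.spectral(spec).flatten(1)                 # (B, 7)
        spat = self.spatial(x).flatten(1)                     # (B, 7)
        return self.fuse(torch.cat([spec, spat], dim=1))      # (B, 7) class logits

model = DualBranchSketch()
print(model(torch.randn(4, 30, 13, 13)).shape)                # torch.Size([4, 7])
```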
Table 2. Categories and corresponding sample sizes in the GCL dataset.
No. | Class Name | Number of Samples
1 | Grass | 1296
2 | Shrub | 1969
3 | Target Scaled Model | 382
4 | Cement Floor | 208
Total |  | 3855
Table 3. Categories and corresponding sample sizes in the SSCL dataset.
No. | Class Name | Number of Samples
1 | Artificial Cement | 759
2 | Grass | 666
3 | Sandy Land | 1469
4 | Shrub | 324
5 | Target Scaled Model | 238
6 | Marble | 720
7 | Bare Soil | 261
Total |  | 4437
Table 4. Categories and corresponding sample sizes in the CW dataset.
No. | Class Name | Number of Samples
1 | Target Scaled Model | 217
2 | Camouflage Net | 522
3 | Grass 1 | 1014
4 | Grass 2 | 1007
Total |  | 2760
Table 5. Category types and corresponding sample sizes in the LC dataset.
No. | Class Name | Number of Samples
1 | Target Scaled Model | 189
2 | Bare Soil | 755
3 | Grass | 345
4 | Shrub | 653
5 | Trunk | 275
Total |  | 2217
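Tables 2–5 list the labeled pixel counts available in the four camouflage scenes. The sketch below shows one typical way such samples are prepared for patch-based classifiers: a 13 × 13 window (the input size in Table 1) is cut around every labeled pixel and the set is split into training and test subsets. The reflect padding and the split ratio are illustrative assumptions, not the authors' protocol.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def extract_samples(cube: np.ndarray, gt: np.ndarray, patch: int = 13):
    """Cut a (patch x patch) window around every labeled pixel of a (H, W, B) cube."""
    pad = patch // 2
    padded = np.pad(cube, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    X, y = [], []
    for r, c in zip(*np.nonzero(gt)):                 # label 0 = unlabeled background
        X.append(padded[r:r + patch, c:c + patch, :])
        y.append(gt[r, c] - 1)                        # classes re-indexed from 0
    return np.stack(X), np.array(y)

# Synthetic stand-in for a camouflage scene (real data: the GCL/SSCL/CW/LC cubes).
cube = np.random.rand(64, 64, 30).astype(np.float32)
gt = np.random.randint(0, 5, size=(64, 64))
X, y = extract_samples(cube, gt)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.7, stratify=y, random_state=0)  # assumed split ratio
print(X_train.shape, X_test.shape)
```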
Table 6. Comparison of classification results of various networks on GCL dataset.
No. | 1DCNN | TS2DCNN | 3DCNN | HybridSN | ResNet-50 | DACNet
1 | 85.00 | 88.00 | 92.00 | 99.00 | 89.00 | 100.00
2 | 84.00 | 87.00 | 96.00 | 99.00 | 86.00 | 100.00
3 | 100.00 | 100.00 | 100.00 | 96.00 | 100.00 | 100.00
4 | 99.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
OA | 86.51 | 89.32 | 95.18 | 99.01 | 88.91 | 99.94
AA | 87.14 | 91.27 | 94.69 | 99.21 | 92.05 | 99.96
Kappa | 77.37 | 86.88 | 92.09 | 98.39 | 81.63 | 99.91
Explanation: Bold values indicate the highest accuracy achieved for each class and each overall metric.
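The OA, AA, and Kappa values reported in Tables 6–9 (and in Tables 14–16 below) are standard confusion-matrix statistics; a minimal sketch of computing them with scikit-learn is given here.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, cohen_kappa_score

def oa_aa_kappa(y_true, y_pred):
    """Overall accuracy, average (per-class) accuracy and Cohen's kappa, in percent."""
    cm = confusion_matrix(y_true, y_pred)
    oa = np.trace(cm) / cm.sum()                      # fraction of correctly labeled pixels
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))        # mean of per-class recalls
    kappa = cohen_kappa_score(y_true, y_pred)
    return 100 * oa, 100 * aa, 100 * kappa

# Toy example with four classes.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 1]
print(oa_aa_kappa(y_true, y_pred))
```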
Table 7. Comparison of classification results of various networks on the SSCL dataset.
No. | 1DCNN | TS2DCNN | 3DCNN | HybridSN | ResNet-50 | DACNet
1 | 0.00 | 0.00 | 100.00 | 100.00 | 100.00 | 100.00
2 | 63.00 | 72.00 | 97.00 | 97.00 | 97.00 | 99.00
3 | 42.00 | 67.00 | 95.00 | 96.00 | 97.00 | 100.00
4 | 27.00 | 76.00 | 97.00 | 90.00 | 98.00 | 100.00
5 | 72.00 | 83.00 | 94.00 | 99.00 | 90.00 | 99.00
6 | 65.00 | 70.00 | 97.00 | 99.00 | 97.00 | 100.00
7 | 27.00 | 45.00 | 99.00 | 99.00 | 98.00 | 100.00
OA | 61.87 | 77.26 | 95.79 | 97.92 | 94.59 | 99.52
AA | 41.84 | 80.03 | 95.15 | 96.26 | 94.10 | 99.45
Kappa | 50.41 | 76.47 | 94.72 | 97.40 | 93.13 | 99.41
Explanation: Bold values indicate the highest accuracy achieved for each class and each overall metric.
Table 8. Comparison of classification results of various networks on the CW dataset.
No. | 1DCNN | TS2DCNN | 3DCNN | HybridSN | ResNet-50 | DACNet
1 | 96.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
2 | 98.00 | 98.00 | 100.00 | 100.00 | 99.00 | 100.00
3 | 91.00 | 100.00 | 99.00 | 99.00 | 98.00 | 100.00
4 | 89.00 | 99.00 | 100.00 | 100.00 | 98.00 | 100.00
OA | 91.91 | 99.21 | 99.35 | 99.89 | 98.30 | 100.00
AA | 90.57 | 99.03 | 99.32 | 99.90 | 98.56 | 100.00
Kappa | 88.45 | 98.76 | 99.08 | 99.85 | 97.59 | 100.00
Explanation: Bold values indicate the highest accuracy achieved for each class and each overall metric.
Table 9. Comparison of classification results of various networks on LC dataset.
No. | 1DCNN | TS2DCNN | 3DCNN | HybridSN | ResNet-50 | DACNet
1 | 95.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
2 | 72.00 | 97.00 | 89.00 | 94.00 | 77.00 | 98.00
3 | 91.00 | 100.00 | 99.00 | 100.00 | 95.00 | 100.00
4 | 32.00 | 75.00 | 93.00 | 79.00 | 53.00 | 96.00
5 | 64.00 | 96.00 | 97.00 | 97.00 | 87.00 | 98.00
OA | 74.10 | 94.39 | 95.18 | 95.24 | 85.77 | 97.24
AA | 67.10 | 93.32 | 94.69 | 93.93 | 80.71 | 96.69
Kappa | 63.78 | 92.54 | 92.09 | 95.88 | 80.86 | 96.33
Explanation: Bold values indicate the highest accuracy achieved for each class and each overall metric.
Table 10. Experimental study on ablation of different modules in DACNet.
Ablated Models | GCL (OA / AA / Kappa) | SSCL (OA / AA / Kappa) | CW (OA / AA / Kappa) | LC (OA / AA / Kappa)
-w/o 1DCB | 92.71 / 93.18 / 93.98 | 92.55 / 92.31 / 93.25 | 96.43 / 96.85 / 97.14 | 86.24 / 88.57 / 87.68
-w/o 2DCB | 94.83 / 93.21 / 94.15 | 93.89 / 93.22 / 93.58 | 96.77 / 96.10 / 97.28 | 88.32 / 89.25 / 88.73
-w/o AM-SE | 99.82 / 99.85 / 99.69 | 99.45 / 99.32 / 99.31 | 99.62 / 99.42 / 99.78 | 94.74 / 92.04 / 92.97
-w/o AM-SA | 99.85 / 99.89 / 99.76 | 99.37 / 98.88 / 99.22 | 99.75 / 99.68 / 99.91 | 94.79 / 92.61 / 93.04
Full model | 99.94 / 99.96 / 99.91 | 99.52 / 99.45 / 99.41 | 100.00 / 100.00 / 100.00 | 97.24 / 96.69 / 96.33
Table 11. Running time of six algorithm models on various datasets.
Model | GCL Time (s) | SSCL Time (s) | CW Time (s) | LC Time (s)
1DCNN | 50.15 | 104.73 | 42.38 | 46.28
TS2DCNN | 200.98 | 98.69 | 55.84 | 72.63
3DCNN | 75.66 | 125.27 | 85.39 | 45.98
HybridSN | 392.37 | 480.50 | 355.31 | 227.87
ResNet-50 | 7436.27 | 6589.47 | 5897.76 | 3500.12
DACNet | 86.37 | 84.15 | 72.40 | 45.22
Table 12. Comparison of experimental results between DACNet and traditional methods.
Models | GCL (OA / AA / Kappa) | SSCL (OA / AA / Kappa) | CW (OA / AA / Kappa) | LC (OA / AA / Kappa)
SVM | 94.72 / 95.69 / 91.25 | 89.91 / 88.20 / 87.39 | 78.30 / 74.98 / 75.61 | 76.20 / 67.98 / 67.35
KNN | 93.05 / 92.25 / 88.56 | 89.55 / 87.22 / 86.93 | 86.61 / 84.53 / 80.85 | 74.21 / 62.36 / 63.96
NB | 51.10 / 25.00 / 0.00 | 33.71 / 14.29 / 0.00 | 40.12 / 25.00 / 0.00 | 34.45 / 20.00 / 0.00
DACNet | 99.94 / 99.96 / 99.91 | 99.52 / 99.45 / 99.41 | 100.00 / 100.00 / 100.00 | 97.24 / 96.69 / 96.33
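Table 12 benchmarks DACNet against per-pixel SVM, KNN, and naive Bayes (NB) classifiers. The sketch below shows how such baselines can be set up with scikit-learn; the hyper-parameters (RBF kernel, five neighbours, default Gaussian NB) are assumptions, since the settings used by the authors are not restated here.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X: per-pixel spectra (n_samples, n_bands); y: class labels. Synthetic stand-ins here.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 30)), rng.integers(0, 4, size=1000)

baselines = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "NB": GaussianNB(),
}
for name, clf in baselines.items():
    clf.fit(X[:700], y[:700])
    print(name, clf.score(X[700:], y[700:]))          # overall accuracy on held-out pixels
```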
Table 13. Types and quantities of land cover categories in IP, PU, and SA datasets.
Dataset | Category | Class | Number of Samples
IP | C1 | Alfalfa | 46
IP | C2 | Corn-notill | 1428
IP | C3 | Corn-mintill | 830
IP | C4 | Corn | 237
IP | C5 | Grass-pasture | 483
IP | C6 | Grass-trees | 730
IP | C7 | Grass-pasture-mowed | 28
IP | C8 | Hay-windrowed | 478
IP | C9 | Oats | 20
IP | C10 | Soybean-notill | 972
IP | C11 | Soybean-mintill | 2455
IP | C12 | Soybean-clean | 593
IP | C13 | Wheat | 205
IP | C14 | Woods | 1265
IP | C15 | Buildings-Grass-Trees-Drives | 386
IP | C16 | Stone-Steel-Towers | 93
PU | C1 | Asphalt | 6631
PU | C2 | Meadows | 18,649
PU | C3 | Gravel | 2099
PU | C4 | Trees | 3064
PU | C5 | Painted metal sheets | 1345
PU | C6 | Bare Soil | 5029
PU | C7 | Bitumen | 1330
PU | C8 | Self-Blocking Bricks | 3682
PU | C9 | Shadows | 947
SA | C1 | Brocoli_green_weeds_1 | 2009
SA | C2 | Brocoli_green_weeds_2 | 3726
SA | C3 | Fallow | 1976
SA | C4 | Fallow_rough_plow | 1394
SA | C5 | Fallow_smooth | 2678
SA | C6 | Stubble | 3959
SA | C7 | Celery | 3579
SA | C8 | Grapes_untrained | 11,271
SA | C9 | Soil_vinyard_develop | 6203
SA | C10 | Corn_senesced_green_weeds | 3278
SA | C11 | Lettuce_romaine_4wk | 1068
SA | C12 | Lettuce_romaine_5wk | 1927
SA | C13 | Lettuce_romaine_6wk | 916
SA | C14 | Lettuce_romaine_7wk | 1070
SA | C15 | Vinyard_untrained | 7268
SA | C16 | Vinyard_vertical_trellis | 1807
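Table 13 summarizes the public Indian Pines (IP), Pavia University (PU), and Salinas (SA) benchmarks used for the generalization experiments. These scenes are commonly distributed as MATLAB .mat files; the loading sketch below assumes that distribution, and the file and variable names are hypothetical placeholders rather than values taken from this paper.

```python
import scipy.io as sio

def load_scene(data_path: str, data_key: str, gt_path: str, gt_key: str):
    """Load a hyperspectral cube and its ground-truth map from .mat files."""
    cube = sio.loadmat(data_path)[data_key]           # (H, W, B) data cube
    gt = sio.loadmat(gt_path)[gt_key]                 # (H, W) labels, 0 = unlabeled
    return cube, gt

# Hypothetical file and variable names for the Indian Pines scene.
cube, gt = load_scene("Indian_pines_corrected.mat", "indian_pines_corrected",
                      "Indian_pines_gt.mat", "indian_pines_gt")
print(cube.shape, gt.shape)                           # e.g., (145, 145, 200), (145, 145)
```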
Table 14. Comparison of classification results of various networks on IP dataset.
Class Names | 1DCNN | TS2DCNN | 3DCNN | HybridSN | ResNet-50 | DACNet
Alfalfa | 0.00 | 96.00 | 100.00 | 93.00 | 94.00 | 96.00
Corn-notill | 41.00 | 95.00 | 94.00 | 95.00 | 95.00 | 95.00
Corn-mintill | 7.00 | 89.00 | 85.00 | 90.00 | 91.00 | 89.00
Corn | 11.00 | 96.00 | 89.00 | 93.00 | 92.00 | 94.00
Grass-pasture | 38.00 | 98.00 | 95.00 | 96.00 | 97.00 | 98.00
Grass-trees | 74.00 | 98.00 | 100.00 | 99.00 | 99.00 | 100.00
Grass-pasture-mowed | 0.00 | 83.00 | 93.00 | 75.00 | 93.00 | 95.00
Hay-windrowed | 84.00 | 97.00 | 100.00 | 99.00 | 99.00 | 100.00
Oats | 0.00 | 82.00 | 72.00 | 41.00 | 78.00 | 100.00
Soybean-notill | 44.00 | 94.00 | 92.00 | 97.00 | 97.00 | 98.00
Soybean-mintill | 47.00 | 94.00 | 94.00 | 96.00 | 95.00 | 96.00
Soybean-clean | 15.00 | 92.00 | 92.00 | 92.00 | 92.00 | 94.00
Wheat | 91.00 | 98.00 | 99.00 | 100.00 | 100.00 | 99.00
Woods | 72.00 | 99.00 | 98.00 | 97.00 | 93.00 | 95.00
Buildings-Grass-Trees-Drives | 45.00 | 91.00 | 88.00 | 93.00 | 92.00 | 93.00
Stone-Steel-Towers | 100.00 | 95.00 | 90.00 | 96.00 | 97.00 | 100.00
OA | 54.30 | 94.64 | 93.88 | 95.30 | 95.45 | 95.89
AA | 36.97 | 91.55 | 91.79 | 91.83 | 92.14 | 92.17
Kappa | 45.53 | 93.89 | 93.01 | 94.23 | 94.26 | 94.33
Explanation: Bold values indicate the highest accuracy achieved for each class and each overall metric.
Table 15. Comparison of classification results of various networks on PU dataset.
Class Names | 1DCNN | TS2DCNN | 3DCNN | HybridSN | ResNet-50 | DACNet
Asphalt | 72.00 | 99.00 | 98.00 | 98.00 | 98.00 | 98.00
Meadows | 90.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Gravel | 71.00 | 97.00 | 95.00 | 99.00 | 96.00 | 100.00
Trees | 90.00 | 99.00 | 98.00 | 100.00 | 100.00 | 99.00
Painted metal sheets | 99.00 | 99.00 | 100.00 | 100.00 | 100.00 | 100.00
Bare Soil | 85.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Bitumen | 75.00 | 100.00 | 99.00 | 99.00 | 99.00 | 99.00
Self-Blocking Bricks | 69.00 | 97.00 | 95.00 | 93.00 | 94.00 | 99.00
Shadows | 85.00 | 99.00 | 97.00 | 99.00 | 99.00 | 100.00
OA | 83.76 | 98.29 | 98.71 | 98.92 | 98.87 | 98.99
AA | 69.41 | 96.59 | 97.81 | 98.12 | 98.14 | 98.15
Kappa | 78.05 | 98.05 | 98.30 | 98.57 | 98.56 | 98.70
Explanation: Bold values indicate the highest accuracy achieved for each class and each overall metric.
Table 16. Comparison of classification results of various networks on SA dataset.
Class Names | 1DCNN | TS2DCNN | 3DCNN | HybridSN | ResNet-50 | DACNet
Brocoli_green_weeds_1 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Brocoli_green_weeds_2 | 99.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Fallow | 94.00 | 100.00 | 99.00 | 99.00 | 100.00 | 100.00
Fallow_rough_plow | 97.00 | 99.00 | 98.00 | 99.00 | 99.00 | 99.00
Fallow_smooth | 97.00 | 100.00 | 100.00 | 100.00 | 99.00 | 100.00
Stubble | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Celery | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Grapes_untrained | 79.00 | 100.00 | 100.00 | 99.00 | 99.00 | 100.00
Soil_vinyard_develop | 99.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Corn_senesced_green_weeds | 97.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Lettuce_romaine_4wk | 96.00 | 100.00 | 99.00 | 100.00 | 100.00 | 100.00
Lettuce_romaine_5wk | 93.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Lettuce_romaine_6wk | 90.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Lettuce_romaine_7wk | 88.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Vinyard_untrained | 78.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Vinyard_vertical_trellis | 98.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
OA | 91.03 | 99.84 | 99.72 | 99.78 | 99.69 | 99.91
AA | 93.94 | 99.83 | 99.78 | 99.79 | 99.72 | 99.86
Kappa | 90.00 | 99.82 | 99.68 | 99.76 | 99.68 | 99.90
Explanation: Bold values indicate the highest accuracy achieved for each class and each overall metric.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
