1. Introduction
Hyperspectral images (HSIs) provide rich spectral and spatial information and are widely employed in astronomy, the military, and agriculture [1,2,3,4,5]. HSI classification is a critical and common technique in these applications; it aims to exploit the spectral and spatial information of HSIs to identify surface objects on Earth [6,7]. Luo et al. [8] realized the complementation of different features by taking into account the neighborhood, tangential, and statistical distributions of each sample under different features; in addition, they constructed an embedded objective function to effectively accomplish feature reduction and HSI classification. Classical machine learning methods [9,10,11] have also been applied to HSI classification. The impressive classification performance of supervised learning methods generally requires the support of abundant labeled samples [12]. However, it is difficult for researchers to obtain accurate HSI labels [13]. Thus, how to accurately classify HSIs using only a few labels is a hot topic in remote sensing [14]. Active learning and semi-supervised learning provide viable solutions. Active learning actively selects the most informative samples from the unlabeled set and labels them manually to enlarge the training set [15]. Semi-supervised learning can complete HSI classification with limited labeled samples and a large number of unlabeled samples [16]. Luo et al. [17] proposed a novel sparse adaptive hypergraph discriminant analysis method, which reveals the structural relationships of HSIs through sparse representation to obtain discriminative embedded features. Zhang et al. [18] proposed a semi-supervised classification method based on simple linear iterative cluster segmentation, which effectively explores the spectral features of HSIs and achieves good classification accuracy.
The above techniques can, to some extent, deal with the shortage of training samples caused by labeling difficulty. However, when the distributions of the training and testing sample sets differ substantially, it is difficult for these methods to achieve satisfactory results. Domain adaptation (DA) transfers knowledge from a labeled domain (the source domain) to a similar but not identical domain (the target domain) by exploiting domain-invariant features [19,20]. When target domain labels are missing or insufficient, DA can exploit similar labeled samples in the source domain to solve the problem in the target domain [21]. According to whether the two domains share the same feature space, DA methods fall into two main categories: homogeneous and heterogeneous. In homogeneous DA, the feature spaces of both domains are consistent; Kumar et al. [22] developed a theory for gradual domain adaptation and reliably adapted the differing distributions between domains using their gradual shift structure. In heterogeneous DA, by contrast, the main difficulty is that the source and target domain data lie in different feature spaces; to address multi-source heterogeneous unsupervised DA problems, Liu et al. [23] presented a shared-fuzzy-equivalence-relation neural network containing multiple source branches and one target branch.
According to the type of learning model, DA methods can be classified as shallow-learning-based or deep-learning-based. A large number of shallow-learning-based domain adaptation methods exist; however, because shallow models cannot fit complex data distributions well, they suffer from under-fitting and insufficient distribution matching. Deep neural networks have a powerful non-linear representation capability and can thus extract discriminative and compact features from the input [24,25]. Given these advantages, DA methods based on deep neural networks have been extensively studied [26]. Long et al. [27] proposed the deep adaptation network (DAN), which reduces the marginal distribution discrepancy between domains by embedding the multiple-kernel maximum mean discrepancy (MK-MMD) into CNNs, thereby generalizing CNNs to the domain adaptation scenario. MMD is one of the most commonly used nonparametric measures of the distribution discrepancy across domains, and kernel selection is crucial to its effectiveness. In view of this, Liu et al. [28] proposed a class of non-parametric two-sample tests for learning deep kernels. To detect the discrepancy between natural and adversarial data, Gao et al. [29] further designed a simple and effective semantic-aware MMD based on these two-sample tests. Ganin and Lempitsky [30] applied the adversarial idea to domain adaptation and proposed the domain-adversarial neural network (DANN). Ma et al. [31] proposed a deep domain adaptation network containing three modules (a domain alignment module, a task allocation module, and a domain adaptation module), which successfully achieved the cross-domain classification of HSIs. Wang et al. [32] added a weighted-MMD-based term and a manifold regularization term to the objective function of a deep neural network, simultaneously achieving domain adaptation. Deng et al. [33] introduced metric learning into a deep embedding network and achieved same-scene and different-scene HSI classification.
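To make the role of MMD concrete, the sketch below shows the standard biased empirical estimate of the squared MMD with a single Gaussian kernel (DAN's MK-MMD additionally averages over a family of kernel bandwidths). It is an illustration only, not the implementation of any cited work, and the bandwidth `sigma` is a placeholder.

```python
import torch

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between two batches of features."""
    d2 = torch.cdist(x, y) ** 2               # pairwise squared distances
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    """Biased empirical estimate of the squared MMD between two domains."""
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2 * k_st

# toy usage: 64 source and 64 target feature vectors of dimension 128
src_feat = torch.randn(64, 128)
tgt_feat = torch.randn(64, 128) + 0.5        # shifted target distribution
print(mmd2(src_feat, tgt_feat).item())
```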
Recently, the broad learning system (BLS) [34] was proposed as an alternative to learning in a deep structure. The BLS mainly consists of three parts: the mapped feature (MF) layer, which performs feature mapping of the input data; the enhancement node (EN) layer, which expands the mapped features in breadth; and the output layer, which solves for the output weights through ridge regression. Compared with deep learning networks, BLS has the following advantages [35,36]: (1) its structure is simple and flexible, making it easy to integrate with other models; (2) through the EN layer, BLS achieves breadth expansion of features, enhancing its feature representation capability. Wang et al. [37] fused the graph convolution operation and BLS into a unified framework, fully utilizing the flexibility and feature breadth expansion ability of BLS to achieve efficient HSI classification. Guo et al. [38] exploited the fast training speed of BLS to pre-train multiple groups of classification models, and then built a dynamic integration structure over the classifier groups to determine vehicle classes. Kong et al. [39] expanded multi-level depth features using BLS and introduced block-diagonal constraints to enhance the independence between the multi-level features; this model achieved good results in HSI classification. Feng and Chen [40] achieved good performance in regression and classification tasks by organically combining a Takagi–Sugeno fuzzy system with BLS.
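As a rough illustration of this three-part structure, the following is a minimal BLS sketch under simplifying assumptions: the MF weights are left purely random (the original BLS fine-tunes them with a sparse autoencoder), bias terms are omitted, and all layer sizes are placeholders.

```python
import numpy as np

def train_bls(X, Y, n_map=100, n_enh=200, lam=1e-3, seed=0):
    """Minimal BLS sketch: random mapped features, random enhancement
    nodes, and ridge-regression output weights. Y is one-hot (N x C)."""
    rng = np.random.default_rng(seed)
    W_e = rng.standard_normal((X.shape[1], n_map))
    Z = np.tanh(X @ W_e)                      # mapped feature (MF) layer
    W_h = rng.standard_normal((n_map, n_enh))
    H = np.tanh(Z @ W_h)                      # enhancement node (EN) layer
    A = np.hstack([Z, H])                     # broad expansion [Z | H]
    # ridge regression: W = (A^T A + lam * I)^(-1) A^T Y
    W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
    return W_e, W_h, W

def predict_bls(X, W_e, W_h, W):
    Z = np.tanh(X @ W_e)
    H = np.tanh(Z @ W_h)
    return np.hstack([Z, H]) @ W              # class scores
```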
Due to the high labeling cost of HSIs, in real scenes the HSIs of interest (the target domain) often lack labels partially or entirely. To classify target-domain HSIs in this scenario, transfer learning can use a label-rich source domain to help the target domain complete the classification task. However, because different HSIs are acquired with different equipment, in different environments, and over different spatial areas, the distributions of the two domains often differ substantially, which makes knowledge transfer between them difficult. These problems are hard to solve with common transfer-learning-based methods because of the following limitations:
- (1)
Common transfer learning methods often align the two domain distributions only by minimizing the difference between the first- or second-order statistics of the two domains, which makes it difficult to adapt the two distributions comprehensively;
- (2)
Common transfer learning methods often ignore the difference in conditional distribution between the two domains, which easily confuses the discriminant features of the two domains and hinders improvement of the model's classification performance;
- (3)
Because the two HSIs are acquired over different spatial areas, their class prior distributions may not be consistent. This can lead to insufficient alignment of the class distributions and, when the classes of the two domains are unbalanced, often severely degrades model performance.
To solve the above problems, we propose CWDAN, which considers the marginal distribution, conditional distribution, and class prior distribution of the two domains. Specifically, when aligning the marginal distributions of the two domains, the differences between their first-order and second-order statistics are minimized through MMD and CORAL, respectively, so as to align the marginal distributions fully. Then, BLS is used to expand the width of the domain-adaptation features extracted by the proposed convolutional domain adaptation network (ConDAN), further enhancing the feature representation ability. In addition, a domain adaptation term based on the proposed class-weighted MMD (CWMMD) is added to BLS, which reduces the difference between the conditional distributions as well as the class weight bias, so that the model pays more attention to the classes occupying a higher proportion of the target domain, thus improving its classification performance there. The main novel aspects of our work are summarized below:
- (1)
We propose a convolutional domain adaptation network (ConDAN); by simultaneously reducing the two-domain differences between the first- and second-order statistics, sufficient and fast marginal distribution alignment is achieved (a sketch of such a joint MMD + CORAL objective is given after this list);
- (2)
We define a novel class-weighted maximum mean discrepancy (CWMMD) and impose a CWMMD-based domain adaptation term on the classical BLS. The weight of each class in the domain adaptation is adjusted according to the class prior distribution, focusing training on the important classes of the target domain, which alleviates class imbalance during conditional distribution alignment (see the second sketch after this list);
- (3)
Deep learning and broad learning are embedded in a unified framework, and the strong feature extraction ability of deep learning as well as the feature breadth expansion ability of broad learning are fully utilized to achieve the extraction and enhancement of domain-invariant features.
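As a hedged illustration of contribution (1), the snippet below combines the squared-MMD estimate shown earlier with the CORAL loss of Sun and Saenko (the squared Frobenius distance between domain covariances); the relative weighting of the two terms is a placeholder, not the paper's setting.

```python
import torch

def coral_loss(source, target):
    """CORAL: squared Frobenius distance between the d x d domain
    covariances, normalized by 4*d^2 (as in Deep CORAL)."""
    d = source.shape[1]
    cs = torch.cov(source.T)   # rows of source are samples, so transpose
    ct = torch.cov(target.T)
    return ((cs - ct) ** 2).sum() / (4 * d * d)

# illustrative marginal-alignment objective (the weight 1.0 is a placeholder):
# loss_marginal = mmd2(src_feat, tgt_feat) + 1.0 * coral_loss(src_feat, tgt_feat)
```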
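Since the text describes CWMMD only verbally, the following shows one plausible form of such a class-weighted conditional MMD: per-class MMD terms weighted by an estimated target class prior. The weighting scheme, pseudo-labeling of target samples, and kernel choice here are assumptions and may differ from the actual CWDAN formulation.

```python
import torch

def cwmmd(src, src_y, tgt, tgt_y, class_weights, sigma=1.0):
    """One plausible class-weighted MMD: per-class squared-MMD terms
    weighted by an estimated target class prior. In unsupervised DA,
    tgt_y would typically be pseudo-labels."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    loss = src.new_zeros(())
    for c, w in enumerate(class_weights):     # class_weights[c] ~ P_t(y = c)
        s, t = src[src_y == c], tgt[tgt_y == c]
        if len(s) == 0 or len(t) == 0:        # skip classes absent in a batch
            continue
        loss = loss + w * (k(s, s).mean() + k(t, t).mean() - 2 * k(s, t).mean())
    return loss
```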
The rest of this paper is organized as follows. The flowchart of CWDAN for HSI classification is presented in Section 2. The experimental results on eight real HSI data pairs are then presented and analyzed, followed by the discussion in Section 4.
4. Discussion
HSI dimension reduction methods usually fall into two categories: feature-learning-based and band-selection-based. Feature-learning-based methods map the original data to a low-dimensional subspace; common techniques include maximum noise fraction [49] and principal component analysis [50]. Band-selection-based methods select the bands carrying the most information according to a certain criterion; common techniques include equal-interval band selection [42] and progressive band selection [51]. However, the image transformation performed by feature-learning-based methods means that the transformed data no longer retain the original physical attributes, which is not conducive to understanding the original data. Compared with feature learning, band selection has the following advantages [52,53]: (1) the selected bands not only contain useful detailed information, but also retain their physical attributes intact; (2) the band selection operation is relatively simple.
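As a simple illustration of the second advantage, the sketch below performs equal-interval band selection on an H × W × B hyperspectral cube; the cube shape and target band count are placeholders.

```python
import numpy as np

def equal_interval_bands(cube, n_bands):
    """Equal-interval band selection: keep n_bands bands sampled
    uniformly along the spectral axis of an H x W x B cube."""
    B = cube.shape[-1]
    idx = np.linspace(0, B - 1, n_bands).astype(int)
    return cube[..., idx], idx

# e.g., reduce a 145 x 145 x 220 cube to 30 bands
hsi = np.random.rand(145, 145, 220)
reduced, kept = equal_interval_bands(hsi, 30)
```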
Common methods for alleviating the class distribution misalignment caused by class imbalance fall into two categories: data-resampling-based [54,55] and sample-reweighting-based [56,57]. The former alleviates the misalignment by oversampling the minority classes or undersampling the majority classes; however, oversampling minority classes can easily lead to over-fitting, while undersampling majority classes may lose information. The latter alleviates the misalignment by assigning a corresponding weight to each class during domain adaptation; however, most sample-reweighting-based methods pay the same attention to each class when designing the weight factor, without considering the relationship between the weight factor and the class prior distribution of the target domain. Considering that the classification task targets the target domain, the proposed method designs the weight factor based on the prior distribution of the target classes and focuses model training on the important classes of the target domain during domain adaptation, which improves the accuracy of cross-domain HSI classification.
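For concreteness, a naive version of the first (data-resampling) strategy might look as follows; the per-class sample budget is a placeholder and, as noted above, sampling with replacement risks over-fitting while discarding majority-class samples loses information.

```python
import numpy as np

def random_resample(X, y, n_per_class, seed=0):
    """Naive class rebalancing: undersample majority classes and
    oversample (with replacement) minority classes to n_per_class each."""
    rng = np.random.default_rng(seed)
    Xs, ys = [], []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        pick = rng.choice(idx, size=n_per_class, replace=len(idx) < n_per_class)
        Xs.append(X[pick])
        ys.append(y[pick])
    return np.concatenate(Xs), np.concatenate(ys)
```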
The OA of CWDAN is the highest on all data pairs, at a small cost in running time and number of parameters, owing to the advantages of the proposed method. These advantages are summarized as follows: (1) by simultaneously reducing the two-domain differences between the first- and second-order statistics, sufficient and fast marginal distribution alignment is achieved; (2) the weight of each class in the domain adaptation is adjusted based on the class prior distributions of the two domains, and training is focused on the important classes in the target domain, which alleviates class imbalance during conditional distribution alignment; (3) deep learning and broad learning are embedded in a unified framework, and the strong feature extraction ability of the former and the feature breadth expansion ability of the latter are fully utilized to extract and enhance domain-invariant features.
The disadvantages of CWDAN are summarized as follows: the model cannot adaptively adjust the trade-off parameters of its two marginal-alignment terms, which makes it difficult to weight the marginal distribution differences according to their importance and thus hinders further improvement of domain adaptation performance.