Article

Hierarchical Prototype-Aligned Graph Neural Network for Cross-Scene Hyperspectral Image Classification

Xi’an Research Institute of High Technology, Xi’an 710025, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Remote Sens. 2024, 16(13), 2464; https://doi.org/10.3390/rs16132464
Submission received: 23 May 2024 / Revised: 28 June 2024 / Accepted: 3 July 2024 / Published: 5 July 2024

Abstract

The objective of cross-scene hyperspectral image (HSI) classification is to develop models capable of adapting to the “domain gap” that exists between different scenes, enabling accurate object classification in previously unseen scenes. Researchers have devised various domain adaptation techniques aimed at aligning the statistical or spectral distributions of data from diverse scenes. However, most previous studies have overlooked the potential benefits of incorporating spatial topological information from hyperspectral imagery, which could provide a more accurate representation of the inherent data structure of HSIs. To address this issue, we introduce an innovative approach for cross-scene HSI classification founded on hierarchical prototype graph alignment. Specifically, this method leverages prototypes as representative embeddings of all samples within the same class. By employing multiple graph convolution and pooling operations, multi-scale domain alignment is attained. Beyond statistical distribution alignment, we integrate graph matching to effectively reconcile semantic and topological information. Experimental results on several datasets demonstrate significantly improved accuracy and generalization for cross-scene HSI classification tasks.


1. Introduction

Hyperspectral image classification stands as a prominent research domain within hyperspectral image data processing. Numerous methodologies have been devised for it, including traditional machine learning algorithms [1,2,3,4] and deep learning methods [5,6,7,8]. However, the efficacy of these techniques relies heavily on the availability of a substantial number of labeled samples, and in the domain of hyperspectral imagery, acquiring such samples is extremely challenging due to the significant time and manpower required [9]. In many practical hyperspectral image classification scenarios, it may even be impossible to obtain any labeled samples. Previous hyperspectral classification methods were typically applicable only to fixed scenarios, where the training and testing samples are independent and drawn from an identical distribution. Consequently, these methods cannot be transferred to other scenarios and cannot be tested on real-time acquired data without labels [10]. In such cases, transferring the knowledge of a well-trained model to unlabeled scenarios becomes a common challenge, known as the cross-scene classification of hyperspectral imagery.
In cross-scene hyperspectral classification tasks, the image data with known labels are typically denoted as the source domain (SD), while the image data with unknown labels are termed the target domain (TD). The primary goal is to leverage the labeled SD to train a model and transfer its knowledge to accurately classify land cover in the TD. However, practical hyperspectral image data acquisition is subject to various influencing factors, such as sensor noise variations, seasonal changes, and diverse weather conditions [11]. These factors lead to spectral reflectance disparities between the same land cover classes in the source and target domains. Consequently, direct application of the SD for classification in the TD often encounters spectral shift challenges. Therefore, the models trained within a single scene cannot be directly used in other scenes with unknown labels.
To address this challenge, many researchers employ domain adaptation techniques that reduce spectral shift at the feature level, aiming to learn models invariant to domain variations across different scenes. This enables a classifier trained on a source scene to be applied effectively to a target scene. For instance, Sun et al. proposed discriminative cross-view subspaces and utilized subspace alignment methods for unsupervised cross-scene remote sensing image classification [12]. Qin et al. introduced a tensor alignment approach for HSI classification, achieving subspace alignment across tensor domains [13]. Yang et al. introduced an ideal regularization discriminative multi-kernel subspace alignment method for HSI domain adaptation [14]. Furthermore, various deep learning methods enhance model classification performance in the target domain by employing adaptive layers to facilitate alignment between the source and target domains. Zhu et al. proposed a deep subdomain adaptation network by defining the subdomain concept and aligning relevant subdomains using local maximum mean discrepancy [15]. Long et al. introduced a deep adaptation network that adds three adaptive network layers and employs multi-kernel maximum mean discrepancy in each of them to achieve domain alignment [16]. Qu et al. introduced a physically constrained transfer learning method that significantly improved results in real remote sensing scenarios by sharing rich spatial information in hyperspectral imagery [17]. Moreover, the concepts of contrastive learning and adversarial networks are now widely utilized in cross-scene hyperspectral image classification. Ning et al. designed an instance-to-instance contrastive learning framework based on category matching, which helps extract domain-invariant and class-discriminative features simultaneously [18].
Wang et al. proposed a bi-classifier adversarial network that incorporates self-training strategies, effectively enhancing feature discriminative performance [19].
Liu et al. combined the adversarial learning strategy and the contrastive learning strategy to achieve cross-domain few-shot HSI classification [20].
Domain adaptation serves as an effective means of transferring knowledge from a labeled dataset (source domain) to an unlabeled dataset (target domain), finding extensive utility in cross-scene classification tasks for hyperspectral imagery [21,22]. However, most of the previous methods overlook the topological characteristics of hyperspectral imagery. These topological relationships that reflect global characteristics are not constrained by spatial coordinates alone and maintain their original properties even in the presence of deformations [10]. Such invariant features contribute to improving the classification performance, particularly in cross-scene applications with domain shift. Recently, graph neural networks have become increasingly popular for their effectiveness in representing and analyzing data within non-Euclidean spaces. They are particularly well-suited for graph-structured data, as they can model the intricate relationships between samples (or vertices). Given the absence of strict correspondence between local spatial relationships in two hyperspectral image scenes, exploiting the intrinsic topological relationships among land cover categories becomes paramount to better align the two domains based on non-local spatial relationships. As shown in Figure 1, even with distribution alignment alone, target domain samples may still be incorrectly aligned with different categories of source domain samples. While the classification boundaries in the source domain might suit its own data adequately, they could prove inadequate for discerning patterns in the target domain. However, aligning the topological structural information ensures the preservation of feature consistency and relationships within the scene, ultimately achieving category-level domain alignment.
In light of the challenge of leveraging spatial topological information in hyperspectral imagery for cross-scene classification tasks, an innovative approach grounded in hierarchical prototype graph alignment (HPGA) is proposed. The main contributions of our work are summarized as follows:
  • A generic, end-to-end differentiable model framework for conducting domain alignment on prototype graph structure data derived from both source and target domains in a hierarchical manner is proposed. By aligning topological and semantic information at different scales, the accuracy and generalization capability of hyperspectral image classification are improved.
  • Different scales of prototype graph structure data are obtained using differentiable graph pooling. This allows the model to analyze data at multiple levels, capturing richer semantic information at different hierarchies.
  • The problem of the cross-domain alignment of graph structures is transformed into a graph-matching problem. During network optimization, the graph optimal transport (GOT) distance is minimized in order to align the source and target domains. This approach leverages graph structure information to better address the spectral shift problem.
  • Experimental results demonstrate that the proposed hyperspectral cross-scene classification method based on hierarchical prototype graph alignment achieves excellent performance on several datasets. This indicates that the approach has strong generalization capabilities when dealing with hyperspectral cross-scene classification tasks.

2. Methodology

In this section, we begin by offering foundational insights into our approach through a discussion of essential definitions and notation. This includes an exploration of domain adaptation concepts and an introduction to graph-matching principles. Following this, we describe the proposed HPGA method in detail.

2.1. Preliminaries

2.1.1. Domain Adaptation

Machine learning methods generalize well under the assumption that training and testing data follow the same distribution. This assumption, however, is challenged when a model trained in one domain (the source domain) is tested in another (the target domain), where the data distributions may diverge. Domain adaptation (DA) addresses this challenge by seeking features that are invariant across domains, thus facilitating the improved generalization of models trained in the source domain to the target domain [16,23].
One classic approach to domain adaptation is maximum mean discrepancy (MMD) [24]. MMD, a kernel-based measure, seeks to minimize the distance between the feature distributions of the source and target domains. By reducing MMD, the feature distributions of the source and target domains converge, thereby enhancing the model's generalization performance. Let $X^s = \{x_i^s\}_{i=1}^{n_s} \subset \mathbb{R}^d$ and $X^t = \{x_j^t\}_{j=1}^{n_t} \subset \mathbb{R}^d$ denote the data in the source and target domains, respectively, where $d$ is the dimensionality of the data and $n_s$ and $n_t$ are the numbers of samples in the source and target domains. $Y^s$ and $Y^t$ denote the corresponding class labels. The computation of MMD involves mapping the samples from the source and target domains into a feature space using a mapping function $\phi$ and then calculating the distance between the two distributions in that space. The MMD distance between the two domains can be written as follows:
$$\mathrm{MMD}(X^s, X^t) = \left\| \frac{1}{n_s} \sum_{i=1}^{n_s} \phi(x_i^s) - \frac{1}{n_t} \sum_{j=1}^{n_t} \phi(x_j^t) \right\|_{\mathcal{H}}^2 \qquad (1)$$
where $\phi$ is a feature mapping function. The physical interpretation of MMD is the distance between the means of the source domain (SD) and target domain (TD) in the Reproducing Kernel Hilbert Space (RKHS), denoted by $\mathcal{H}$. An RKHS is a complete inner product space of functions; after squaring and expanding, the inner products in the RKHS can be replaced by a kernel function, so the MMD can be computed directly from kernel evaluations. If the two distributions are identical, the MMD is 0; otherwise, the MMD is greater than 0.

2.1.2. Graph Matching

For graph-structured data, addressing cross-domain alignment involves formulating a graph-matching problem. Here, we employ graph optimal transport (GOT) [25] for feature matching and graph topological structure matching. GOT employs two optimal transport distances: the Wasserstein distance (WD) [26] for node matching and the Gromov–Wasserstein distance (GWD) [27] for edge matching. The WD quantifies the distance between node embeddings across domains, disregarding the encoded topological information in the graph. Conversely, the GWD compares graph structures by assessing distances between pairs of nodes in each graph. By combining these two distances, the GOT framework effectively accounts for both node and edge information, thereby achieving enhanced graph matching. The formula for the WD is as follows:
$$D_{\mathrm{WD}}\big(P(X^s), P(X^t)\big) = \inf_{\gamma \in \Pi[P(X^s), P(X^t)]} \mathbb{E}_{(x^s, x^t) \sim \gamma} \big[ c(x^s, x^t) \big] \qquad (2)$$
$$c(x_i^s, x_j^t) = 1 - \frac{(x_i^s)^{\top} x_j^t}{\|x_i^s\|_2 \, \|x_j^t\|_2} \qquad (3)$$
where c ( x s , x t ) represents the cross-domain cost matrix obtained through the cosine distance, and Π [ P ( X s ) , P ( X t ) ] represents the joint distribution γ ( x s , x t ) over all pairs. The Gromov–Wasserstein distance (GWD) can be expressed as follows:
$$D_{\mathrm{GWD}}\big(P(X^s), P(X^t)\big) = \inf_{\gamma \in \Pi[P(X^s), P(X^t)]} \mathbb{E}_{(x^s, x^t) \sim \gamma,\, (\hat{x}^s, \hat{x}^t) \sim \gamma} \big[ L(x_i^s, x_j^t, \hat{x}_i^s, \hat{x}_j^t) \big] \qquad (4)$$
$$L(x_i^s, x_j^t, \hat{x}_i^s, \hat{x}_j^t) = \big\| c_1(x_i^s, \hat{x}_i^s) - c_2(x_j^t, \hat{x}_j^t) \big\| \qquad (5)$$
where $L(x_i^s, x_j^t, \hat{x}_i^s, \hat{x}_j^t)$ is the cost function that evaluates the intra-graph structural similarity between the two pairs of nodes $(x_i^s, \hat{x}_i^s)$ and $(x_j^t, \hat{x}_j^t)$. During the computation of these two distances, GOT learns an optimal transport matrix $T \in \mathbb{R}^{n \times m}$ that is shared between the WD and GWD. This matrix is used to combine the WD and GWD into the GOT distance and to optimize the alignment between the source and target domains. The optimal transport matrix $T$ possesses several characteristics that make it well suited to cross-domain alignment problems: (1) Self-normalization: the elements of $T$ sum to 1. (2) Sparsity: when solved exactly, the optimal $T$ contains at most $(2r - 1)$ non-zero elements, where $r = \max(n, m)$, which yields an alignment that is more interpretable and robust. (3) Efficiency: compared with traditional linear programming solvers, the optimal solution can be obtained through an iterative process of matrix–vector multiplications, which allows GOT to be applied to large-scale deep neural networks. The computation process of the GOT distance is illustrated in Figure 2.
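To make the computation concrete, below is a minimal PyTorch sketch of the GOT distance. It is a simplified rendering, not the exact solver used in this paper: it assumes uniform marginals, solves the plan $T$ with entropic Sinkhorn iterations from the node cost alone (rather than jointly with the GWD term as in the full GOT solver of [25]), and all function names are illustrative.

```python
import torch
import torch.nn.functional as F

def sinkhorn_plan(cost, eps=0.1, n_iter=50):
    # Entropic OT with uniform marginals via Sinkhorn iterations; only
    # matrix-vector products are needed, matching the efficiency property
    # noted above.
    n, m = cost.shape
    mu = torch.full((n,), 1.0 / n)
    nu = torch.full((m,), 1.0 / m)
    K = torch.exp(-cost / eps)              # Gibbs kernel
    u = torch.ones(n)
    for _ in range(n_iter):
        v = nu / (K.t() @ u)
        u = mu / (K @ v)
    return u[:, None] * K * v[None, :]      # transport plan T (sums to 1)

def cosine_cost(X, Y):
    # Eq. (3): c(x, y) = 1 - cosine similarity.
    return 1.0 - F.normalize(X, dim=1) @ F.normalize(Y, dim=1).t()

def got_distance(Xs, Xt, gamma=0.5):
    c = cosine_cost(Xs, Xt)                 # node-level cross-domain cost
    T = sinkhorn_plan(c.detach())           # plan solved on a detached cost
    wd = (T * c).sum()                      # WD term: node matching
    Cs, Ct = cosine_cost(Xs, Xs), cosine_cost(Xt, Xt)
    # GWD term: contract |Cs_ik - Ct_jl| against T twice. The 4-D tensor
    # is affordable here because prototype graphs have few nodes.
    L = (Cs[:, None, :, None] - Ct[None, :, None, :]).abs()
    gwd = torch.einsum('ijkl,ij,kl->', L, T, T)
    return gamma * wd + (1 - gamma) * gwd
```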

2.2. Overall Framework

This paper introduces HPGA, a novel method for cross-scene HSI classification that addresses the disparities between source and target domains through the alignment of hierarchical prototype graphs. Prototype representation is a method for describing graph structure features, where representative vertices can be identified using graph pooling algorithms [28]. Hierarchical prototype representation refers to clustering at different levels to obtain prototype representations at different granularities. The purpose of this approach is to capture general structural information across all graphs and enable reliable vertex partitioning at different levels. Unlike methods that apply multi-scale convolution on a single-layer graph, the proposed method encodes the HSI into multiple layers of graphs, allowing features to be learned at various scales. As depicted in Figure 3, our proposed algorithm centers on two key operations: domain alignment (including distribution alignment and graph alignment) and differentiable graph pooling.

2.3. Hierarchical Prototype Representation

In the target domain, the lack of supervision leads to initial nodes in graph construction frequently deviating from actual instances. Additionally, sample representations within the same category often display multi-modality [29], causing differences between the embedding space and the sample instances. As the scale of graph construction directly reflects the topological structure of the images at different scales, graph alignment at different scales allows for a more thorough comprehension of the scene, enabling the model to accurately capture the consistency of features and relationships (domain-invariant features) within the scene. Building on the approach of Snell et al. [30], which introduced a latent space where each class is represented by a prototype and data points of the corresponding class cluster around it, this paper adopts graph pooling algorithms to identify representative vertices as prototypes and construct graph topologies with larger receptive fields. Hierarchical prototype representation refers to pooling operations at different levels to obtain prototype representations at different granularities, gradually reducing the discrepancy between the node representations and the prototypes of the corresponding class across domains, ultimately achieving category-level prototype alignment.
Prototype representation learning involves clustering or pooling nodes based on the output of a graph neural network (GNN) to obtain a coarsened graph that serves as input to another GNN layer. The challenge in designing such a pooling layer lies in providing a general method for hierarchically pooling nodes across a series of input graphs, not just within a single graph. Specifically, given the output $Z = \mathrm{GNN}(A, X)$ of a GNN module and the graph adjacency matrix $A \in \mathbb{R}^{n \times n}$, a graph pooling operation produces a new coarsened graph with $m < n$ nodes, represented by a weighted adjacency matrix $A' \in \mathbb{R}^{m \times m}$ and node embeddings $Z' \in \mathbb{R}^{m \times d}$. This coarsened graph can serve as input to another GNN layer, and the process can be repeated $L$ times, resulting in a model with $L$ GNN layers. To achieve this, a clustering assignment matrix is defined that determines which prototype category each node belongs to based on the output of the GNN module; this matrix is used to generate the node features and adjacency matrix for the next GNN module. Let $S^{(l)} \in \mathbb{R}^{n_l \times n_{l+1}}$ denote the clustering assignment matrix learned at the $l$-th layer. Each row of $S^{(l)}$ corresponds to a node (or cluster) at the $l$-th layer, and each column corresponds to a cluster at the $(l+1)$-th layer. Intuitively, $S^{(l)}$ provides a soft assignment of each node at the $l$-th layer to a cluster in the coarsened graph at the next layer.
Assuming that the assignment matrix $S^{(l)}$ for the $l$-th layer has been computed, let $A^{(l)}$ denote the input adjacency matrix and $Z^{(l)}$ the input node embedding matrix of this layer. Based on these inputs, the pooling step produces a new coarsened adjacency matrix $A^{(l+1)}$ and a new embedding matrix $X^{(l+1)}$ via the following two formulas:
$$X^{(l+1)} = S^{(l)\top} Z^{(l)} \in \mathbb{R}^{n_{l+1} \times d} \qquad (6)$$
$$A^{(l+1)} = S^{(l)\top} A^{(l)} S^{(l)} \in \mathbb{R}^{n_{l+1} \times n_{l+1}} \qquad (7)$$
Equation (6) aggregates the node embeddings $Z^{(l)}$ according to the clustering assignment matrix $S^{(l)}$, producing embedded representations for $n_{l+1}$ prototypes. Similarly, Equation (7) transforms the adjacency matrix $A^{(l)}$ into a coarsened adjacency matrix that represents the connection strength between each pair of prototypes. The pooling layer thus achieves graph coarsening through Equations (6) and (7). Note that $A^{(l+1)}$ is a real-valued matrix representing a fully connected weighted graph, where each element $A_{ij}^{(l+1)}$ can be regarded as the connection strength between prototype $i$ and prototype $j$. Likewise, each row of $X^{(l+1)}$ is the embedded representation of one prototype. The coarsened adjacency matrix $A^{(l+1)}$ and the embedding matrix $X^{(l+1)}$ together serve as input for the next GNN layer.
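As a minimal illustration, the coarsening in Equations (6) and (7) amounts to two matrix products; the sketch below assumes dense PyTorch tensors and uses illustrative names.

```python
import torch

def coarsen(A, Z, S):
    # A: (n_l, n_l) adjacency, Z: (n_l, d) embeddings,
    # S: (n_l, n_{l+1}) soft cluster assignment.
    X_next = S.t() @ Z          # Eq. (6): prototype embeddings
    A_next = S.t() @ A @ S      # Eq. (7): prototype connection strengths
    return A_next, X_next
```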
Next, we discuss how to generate the assignment matrix $S^{(l)}$ and the embedding matrix $Z^{(l)}$ used in Equations (6) and (7). Two separate GNNs are used to generate these matrices, both taking the prototype representation $X^{(l)}$ and the coarsened adjacency matrix $A^{(l)}$ as inputs. The generation of the embedding matrix $Z^{(l)}$ can be expressed as
$$Z^{(l)} = \mathrm{GNN}_{l,\mathrm{embed}}\big(A^{(l)}, X^{(l)}\big) \qquad (8)$$
The above equation takes the adjacency matrix between prototypes at the $l$-th layer (from Equation (7)) and the prototype feature representation (from Equation (6)) and applies a standard GNN to obtain the new cluster node embeddings $Z^{(l)}$. In contrast, the pooling GNN at the $l$-th layer uses the same inputs $X^{(l)}$ and $A^{(l)}$ to generate an assignment matrix:
$$S^{(l)} = \mathrm{softmax}\big(\mathrm{GNN}_{l,\mathrm{pool}}(A^{(l)}, X^{(l)})\big) \qquad (9)$$
Here, the softmax is computed along the row dimension. The output dimension of $\mathrm{GNN}_{l,\mathrm{pool}}$ equals the maximum predefined number of prototypes at the $l$-th layer, which is a hyperparameter of the model. Note that the two GNNs share the same input data but have different network parameters and serve different purposes: the embedding GNN generates new embedding representations for the input nodes, while the pooling GNN generates a probabilistic assignment matrix that assigns the input nodes to $n_{l+1}$ clusters.
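A compact sketch of one such pooling layer follows. It uses a plain dense graph convolution as a stand-in for the GNN modules of Equations (8) and (9); the text does not restrict the modules to this particular operator, so treat it as one possible instantiation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseGCN(nn.Module):
    # A simple dense graph convolution, relu(A X W).
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)

    def forward(self, A, X):
        return F.relu(self.lin(A @ X))

class PoolLayer(nn.Module):
    # Two GNNs over the same (A, X): one yields the embeddings Z^{(l)},
    # the other the soft assignment S^{(l)}.
    def __init__(self, d_in, d_embed, n_next):
        super().__init__()
        self.gnn_embed = DenseGCN(d_in, d_embed)   # Eq. (8)
        self.gnn_pool = DenseGCN(d_in, n_next)     # Eq. (9), pre-softmax

    def forward(self, A, X):
        Z = self.gnn_embed(A, X)
        S = torch.softmax(self.gnn_pool(A, X), dim=-1)  # row-wise softmax
        return Z, S
```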

2.4. Domain Alignment

After obtaining graph structures at different levels, domain alignment between the source and target domains needs to be performed. Domain alignment in this context includes distribution alignment and graph alignment (graph matching). The maximum mean discrepancy (MMD) distance is a commonly used method for aligning marginal distributions. In practice, the unbiased estimate of the MMD distance compares the squared distance between empirical kernel mean embeddings. The formula is expressed as
$$\mathcal{L}_{\mathrm{MMD}}(Z^s, Z^t) = \frac{1}{n_s^2} \sum_{i=1}^{n_s} \sum_{j=1}^{n_s} r(z_i^s, z_j^s) + \frac{1}{n_t^2} \sum_{i=1}^{n_t} \sum_{j=1}^{n_t} r(z_i^t, z_j^t) - \frac{2}{n_s n_t} \sum_{i=1}^{n_s} \sum_{j=1}^{n_t} r(z_i^s, z_j^t) \qquad (10)$$
where the kernel function $r(\cdot, \cdot)$ represents the inner product of features in the RKHS, and $z$ denotes the feature vectors extracted from the source domain $Z^s$ and the target domain $Z^t$; specifically, $z_i^s$ is the $i$-th feature vector in the source domain and $z_j^t$ is the $j$-th feature vector in the target domain. In theory, to detect subtle distribution differences, characteristic kernels such as the Gaussian Radial Basis Function (GRBF) or the Laplacian kernel are typically chosen [31]; both are positive semi-definite functions satisfying the Mercer condition [32]. With the linear kernel $r(x_i, x_j) = x_i^{\top} x_j$, the MMD distance reduces to the mean difference between the two distributions in the input space, so selecting an appropriate non-linear kernel function is crucial. In the experiments of this paper, the GRBF is used as the kernel function:
$$r(x_i, x_j) = \exp\!\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right) \qquad (11)$$
This kernel function ensures that MMD is an unbiased estimator for differentiating between two probability distributions, where $\sigma$ is the Gaussian kernel parameter.
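For reference, Equations (10) and (11) translate directly into PyTorch, as in the sketch below; it is written in the biased V-statistic form for brevity, whereas the unbiased variant would exclude the $i = j$ terms of the within-domain sums.

```python
import torch

def grbf_kernel(X, Y, sigma=1.0):
    # Eq. (11): r(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    return torch.exp(-torch.cdist(X, Y) ** 2 / (2 * sigma ** 2))

def mmd_loss(Zs, Zt, sigma=1.0):
    # Eq. (10): within-domain kernel means minus twice the cross term.
    return (grbf_kernel(Zs, Zs, sigma).mean()
            + grbf_kernel(Zt, Zt, sigma).mean()
            - 2 * grbf_kernel(Zs, Zt, sigma).mean())
```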
Graph alignment is achieved by computing the graph optimal transport (GOT) distance, which combines the Wasserstein distance (WD) and the Gromov–Wasserstein distance (GWD). The WD quantifies the dissimilarity between node features of the two domains without considering topological information, whereas the GWD assesses the divergence in graph structures by measuring the distances between node pairs within each graph. By combining the two distances, the information of both nodes and edges is effectively considered, achieving better graph matching. As shown in Figure 2, the computation of the GOT distance can be represented as
$$\mathrm{GOT}(G^s, G^t) = \sum_{i, \hat{i}, j, \hat{j}} \Big[ \gamma\, T_{ij}\, c(g_i^s, g_j^t) + (1 - \gamma)\, T_{\hat{i}\hat{j}}\, L(g_i^s, g_j^t, \hat{g}_i^s, \hat{g}_j^t) \Big] \qquad (12)$$
where $T_{ij}$ and $T_{\hat{i}\hat{j}}$ are entries of the shared transport matrix for the two pairs of nodes $(g_i^s, g_j^t)$ and $(\hat{g}_i^s, \hat{g}_j^t)$, and $\gamma$ is set to 0.5 in the experiments. The GOT loss can therefore be represented as:
$$\mathcal{L}_{\mathrm{GOT}}(Z^s, Z^t) = \sum_{l=0}^{L} \mathrm{GOT}\big(G_s^l, G_t^l\big) \qquad (13)$$
Here, $G^l$ denotes the graph obtained after processing through $l$ layers of the GNN, with $l = 0$ denoting the original subgraph features generated from $Z^s$ and $Z^t$.
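Given the got_distance sketch from Section 2.1.2, the hierarchical accumulation of Equation (13) reduces to a sum over levels; feats_s and feats_t below are hypothetical lists holding the per-level node feature matrices of the two domains.

```python
def hierarchical_got_loss(feats_s, feats_t, gamma=0.5):
    # Eq. (13): accumulate the GOT distance over all L + 1 levels, where
    # level 0 holds the original subgraph features from Z^s and Z^t.
    return sum(got_distance(Xs, Xt, gamma)
               for Xs, Xt in zip(feats_s, feats_t))
```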

2.5. Loss Function

The cross-entropy loss function is used to predict the probabilities of the samples in the source and target domains and to compute the error against the labels. Specifically, for a labeled source domain sample $x_i^s$, the cross-entropy loss is defined as
$$\mathcal{L}_y(p_i, y_i) = -\sum_{c=1}^{C} y_{ic} \log p_{ic} \qquad (14)$$
where $y_i$ is the one-hot encoded label of the source domain sample and $p_i$ is the predicted probability distribution obtained through the softmax function. Therefore, the classification loss of the GNN on the source domain dataset is defined as the average of the cross-entropy losses over all labeled source domain samples:
$$\mathcal{L}_{\mathrm{GNN}}(Z^s, Y^s) = \frac{1}{n_s} \sum_{i=1}^{n_s} \mathcal{L}_y\big(S(g_i^s), y_i\big) \qquad (15)$$
where $S(\cdot)$ denotes the softmax function.
In practice, training the entire GNN network solely using the gradient signal from the graph classification task is challenging. Therefore, an auxiliary link prediction (LP) objective is used to train the entire GNN network. Specifically, in each layer, this objective minimizes a link prediction loss, which is defined as
$$\mathcal{L}_{\mathrm{LP}} = \sum_{l=0}^{L} \big\| A^{(l)} - S^{(l)} S^{(l)\top} \big\|_F \qquad (16)$$
where $\|\cdot\|_F$ denotes the Frobenius norm.
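Equation (16) renders directly into code under the same dense-tensor assumptions; As and Ss below are hypothetical lists of the per-level adjacency and assignment matrices.

```python
import torch

def link_prediction_loss(As, Ss):
    # Eq. (16): at every level, S S^T should reconstruct A, which pushes
    # neighboring nodes toward the same cluster assignment.
    return sum(torch.norm(A - S @ S.t(), p='fro')
               for A, S in zip(As, Ss))
```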
During the training process, the losses $\mathcal{L}_{\mathrm{LP}}$, $\mathcal{L}_{\mathrm{MMD}}$, and $\mathcal{L}_{\mathrm{GOT}}$ are incorporated into the final classification loss function, defined as
$$\mathcal{L}(X^s, X^t, Y^s) = \mathcal{L}_{\mathrm{GNN}}(Z^s, Y^s) + \lambda_1 \mathcal{L}_{\mathrm{MMD}}(Z^s, Z^t) + \lambda_2 \mathcal{L}_{\mathrm{GOT}}(Z^s, Z^t) + \lambda_3 \mathcal{L}_{\mathrm{LP}} \qquad (17)$$
where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are regularization parameters. $\lambda_1$ and $\lambda_2$ control the contributions of data distribution alignment and graph alignment, respectively, while $\lambda_3$ encourages neighboring nodes to be assigned together. During training of the entire GNN network, the weights are adjusted based on the losses for sample probability prediction, data distribution alignment, graph alignment, and link prediction, with the aim of achieving more accurate cross-domain HSI classification.
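Putting the pieces together, one training step could combine the four terms of Equation (17) as sketched below; logits_s, y_s, and the other tensors are placeholders assumed to be produced by the network and by the sketches above, and lam1, lam2, lam3 stand for $\lambda_1$, $\lambda_2$, $\lambda_3$.

```python
import torch.nn.functional as F

# Classification loss on labeled source samples (Eqs. (14)-(15)), plus the
# three regularizers weighted as in Eq. (17).
loss = (F.cross_entropy(logits_s, y_s)
        + lam1 * mmd_loss(Zs, Zt)
        + lam2 * hierarchical_got_loss(feats_s, feats_t)
        + lam3 * link_prediction_loss(As, Ss))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```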

3. Experiments

3.1. Dataset Description

Two commonly used hyperspectral image (HSI) datasets were utilized to assess the performance of our proposed algorithm. The specifics of each dataset are outlined below.

3.1.1. Houston

This dataset comprises scenes acquired by different sensors in and around the University of Houston in two different years (2013 and 2018). The Houston 2013 dataset consists of 144 spectral bands covering a wavelength range of 380–1050 nm with a spatial resolution of 2.5 m, for a total of 349 × 1905 pixels. The Houston 2018 dataset covers the same wavelength range with 48 spectral bands and has a spatial resolution of 1 m. The two scenes share the same seven land cover categories.
In the experiments, the 48 spectral bands (wavelength range: 0.38–1.05 μm) corresponding to the Houston 2018 scene were extracted from the Houston 2013 scene, and an overlapping region of 209 × 955 pixels was selected. The land cover categories and their sample counts are listed in Table 1, while Figure 4 shows the pseudocolor images and ground truth maps.

3.1.2. HyRANK

The HyRANK dataset was established within the framework of the International Society for Photogrammetry and Remote Sensing (ISPRS) scientific program. This dataset consists of satellite hyperspectral images collected using the Hyperion sensor (EO-1, USGS) and contains 176 spectral bands. The dataset comprises two labeled scenes, Dioni and Loukia, with dimensions of 250 × 1376 and 249 × 945 pixels, respectively. The dataset includes 12 consistent categories, and these categories along with their quantities are listed in Table 2. Pseudocolored images and ground truth maps for the two scenes are presented in Figure 5.

3.2. Experimental Settings

To comprehensively evaluate the performance of the proposed method, a comparison was made with several classical unsupervised deep domain adaptation algorithms, including DAN [16], DAAN [33], the Multi-Representation Adaptation Network (MRAN) [34], the Deep Subdomain Adaptation Network (DSAN) [15], the Heterogeneous Transfer Convolutional Neural Network (HTCNN) [35], and the Bi-Classifier Adversarial Network (BCAN) [19]. Regarding training details, for both datasets, 5% of the source domain samples were randomly selected for training. Notably, for all compared methods, only data from the source domain (SD) and target domain (TD), along with labels from the source domain, were used during model training; no label information from the target domain was leveraged. Network optimization was performed using the Nesterov Adam algorithm. To comprehensively assess the classification performance of the various methods, evaluation metrics including per-class accuracy (PA), overall accuracy (OA), average accuracy (AA), and the Kappa coefficient were employed. Specifically, OA is the fraction of correctly classified samples, PA is the accuracy for each class, AA is the average of all per-class accuracies, and the Kappa coefficient measures the degree of agreement between the classification results and the ground truth beyond chance.
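For clarity, all four metrics can be computed from a confusion matrix, as in the short NumPy sketch below (the function name is illustrative).

```python
import numpy as np

def metrics_from_confusion(cm):
    # cm[i, j]: number of samples of true class i predicted as class j.
    cm = cm.astype(float)
    pa = np.diag(cm) / cm.sum(axis=1)                  # per-class accuracy
    oa = np.diag(cm).sum() / cm.sum()                  # overall accuracy
    aa = pa.mean()                                     # average accuracy
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / cm.sum() ** 2
    kappa = (oa - pe) / (1 - pe)                       # Kappa coefficient
    return pa, oa, aa, kappa
```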
All experiments were conducted using the PyTorch framework on a computer equipped with an Intel Core i7-9700 processor (32 GB RAM) and an Nvidia RTX 3090 GPU (24 GB memory). Graph convolutions were implemented using the torch_geometric (PyTorch Geometric) library. Based on experimentation, a three-layer graph network yielded optimal results. Following the graph convolutions, feature construction was enhanced using a CNN layer and a Batch Normalization (BN) layer with kernel size, stride, and padding all set to one. Optimization utilized batch stochastic gradient descent with a learning rate of 0.01, a momentum of 0.5, and L2 norm decay set to $1 \times 10^{-4}$. Training ran for 2000 iterations on the Houston dataset and 400 iterations on the HyRANK dataset.

3.3. Experimental Results

Table 3 and Table 4, respectively, present the classification performance of all compared methods on the two cross-scene datasets, including the mean and standard deviation over ten experiments. The results indicate that HPGA consistently exhibits the best performance across all target domains. Particularly on the Houston 2018 dataset, HPGA achieves an average accuracy (AA) that is nearly 12% higher than the runner-up method. Although HPGA's overall accuracy (OA) on the Houston dataset is slightly lower than that of HTCNN, its improvement in average accuracy (AA) is far more pronounced, indicating that HPGA performs well across all categories rather than merely achieving high overall accuracy. This demonstrates that the proposed algorithm offers better balance and robustness across different categories, which is particularly important in practical hyperspectral image classification applications, where various complex land cover categories exist and good performance is needed on all of them. For the HyRANK dataset, HPGA achieves the best performance in OA, AA, and the Kappa metric. However, classification performance is weaker on some categories, such as the fourth class (Fruit Trees), the fifth class (Olive Groves), and the sixth class (Coniferous Forest). This may be because the topological relationships among land cover categories in forest scenes are significantly weaker than those in urban scenes. The experimental results on the Houston and HyRANK cross-scene datasets show that our proposed method, which incorporates topological relationships among land cover categories into the training process, significantly surpasses domain adaptation methods that focus solely on statistical feature alignment.
Classification maps are further shown in Figure 6 and Figure 7. The results in Figure 6 illustrate HPGA’s superiority, showing reduced noise and better alignment with actual land cover distribution. In the marked area (Figure 6), HPGA accurately classifies trees and avoids misclassifying non-residential buildings as roads. In the HyRANK dataset (Figure 7), HPGA utilizes topological structure information to achieve a more accurate classification with lower noise levels in marked regions. Experimental results confirm HPGA’s excellent performance and strong generalization capabilities across several datasets.
In cross-scene hyperspectral image classification, a significant spectral shift occurs where the spectral reflectance of the same land cover category differs between the source domain (SD) and the target domain (TD).
To further demonstrate the cross-domain learning capability and alignment performance of HPGA, we visualized the domain-invariant features it learns. Figure 8 presents the t-SNE visualization of the original features from the Houston dataset and of the features extracted by HPGA. In the figure, red hollow circles and blue hollow triangles represent samples from the source and target domains, respectively. The first row of each subplot displays the feature distribution of the source and target domains after dimensionality reduction of the original features, while the second row shows the feature distribution after domain adaptation. The figure shows that the original samples from the source and target domains exhibit inconsistent distributions for the same categories; after applying the proposed domain adaptation method, the distances between the feature distributions are effectively reduced. This allows a model trained on the source domain to be applied directly to the target domain without additional data labeling or model retraining. This phenomenon demonstrates that, through distribution alignment and topological structure alignment, HPGA projects data from both domains into a domain-invariant subspace, largely alleviating the spectral shift phenomenon, and further confirms the effectiveness and practicality of HPGA for cross-scene hyperspectral image classification.

4. Discussion

In this section, we examine how various modules impact the proposed method’s performance and conduct experiments to assess the sensitivity of hyperparameters.

4.1. Ablation Experiments

To assess the influence of individual loss terms on the model’s performance, we conducted ablation studies on all components of the loss function. The overall accuracy (OA), average accuracy (AA), and Kappa coefficient results for the two datasets are presented in Table 5. These results reveal that the model exhibits the poorest performance when solely utilizing the cross-entropy classification loss. Incorporating the MMD loss term results in a notable improvement in model accuracy on both datasets, with an increase of approximately 10%. This suggests that leveraging the MMD loss contributes significantly to enhancing the model’s generalization capability. Furthermore, with the introduction of the GOT loss, the model’s accuracy on the Houston dataset increases by 0.83%, while on the HyRANK dataset, it improves by 4.28%. This suggests that topological structure alignment provides different degrees of improvement for different datasets. Simultaneously, when the link prediction loss term is applied, the model’s accuracy on both datasets increases by 1% to 2%, indicating that the link prediction loss also contributes to the model’s performance improvement. The ablation experiment results indicate that each loss term contributes positively to enhancing the model’s performance and generalization capability. Notably, when all loss terms are combined, the model achieves an optimal classification performance on the target domain.

4.2. Parameter Sensitivity Analysis

A parameter sensitivity analysis was carried out to assess the impact of parameter variations on HPGA's performance across the two target domains. The regularization parameters $\lambda_1$ and $\lambda_2$ are crucial hyperparameters in the HPGA algorithm: they individually regulate the influence of distribution alignment and graph alignment on domain generalization. The values of $\lambda_1$ and $\lambda_2$ were varied within the set $\{0.001, 0.01, 0.1, 1, 10\}$. Figure 9 illustrates the change in HPGA's classification results (OA) on the two experimental datasets under different combinations of $\lambda_1$ and $\lambda_2$. The results indicate that the optimal parameters are $\lambda_1 = 1$ and $\lambda_2 = 0.1$ for the Houston dataset and $\lambda_1 = \lambda_2 = 0.01$ for the HyRANK dataset. This analysis underscores the robustness and stability of the HPGA algorithm across diverse datasets; depending on the application context and dataset characteristics, these hyperparameters can be fine-tuned to enhance performance.
In addition to $\lambda_1$ and $\lambda_2$, the parameter $\lambda_3$ is also an important parameter in the HPGA model. To find the optimal regularization parameter $\lambda_3$, a parameter sensitivity analysis was conducted on both sets of experimental data. $\lambda_1$ and $\lambda_2$ were both set to 0.1, and the HPGA model was then trained separately on each set of experimental data. The classification performance on each set was recorded, and the regularization parameter that yielded the best performance was selected. The experimental outcomes are presented in Table 6. Evidently, the regularization parameter $\lambda_3$ is more robust with respect to the results than $\lambda_1$ and $\lambda_2$.
To validate the effectiveness of the hierarchical prototype alignment method, we experimented with the number of layers $L$ involved in the alignment loss, as shown in Table 7. The results indicate that increasing the number of layers generally improves classification performance. For the Houston 2018 dataset, overall accuracy (OA) rises from 69.74% with $L = 1$ to 72.27% with $L = 3$, then drops to 68.64% with $L = 4$. Similarly, for the Loukia dataset, OA increases from 57.26% with $L = 1$ to 62.16% with $L = 3$, then decreases to 59.77% with $L = 4$. These findings suggest that while more layers initially enhance accuracy, an optimal depth ($L = 3$) exists beyond which performance declines due to increased complexity and potential overfitting. Therefore, the hierarchical prototype structure improves classification effectiveness.

5. Conclusions

Conventional domain adaptation techniques in hyperspectral imagery mainly tackle the spectral shift between source and target domains by aligning statistical feature distributions, often overlooking the intricate relationships among different land cover classes. In response, this paper introduces a novel approach for cross-scene hyperspectral classification that employs hierarchical prototype graph alignment. The method harnesses prototypes as representative embeddings of all samples within the same category and achieves domain alignment across multiple scales through a series of graph convolution and pooling operations. In addition to distribution alignment, graph matching is introduced, effectively harmonizing semantic and topological information. The proposed method significantly enhances the accuracy and generalization capacity of cross-scene hyperspectral image classification.

Author Contributions

Conceptualization, D.S., H.H. and F.H.; methodology, D.S. and H.H.; software, H.H. and F.Z.; validation, H.H., F.Z. and F.H.; formal analysis, F.Z.; writing—original draft preparation, H.H.; writing—review and editing, F.Z., F.H. and J.Z.; visualization, F.H.; supervision, X.S.; funding acquisition, F.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Publicly available datasets were analyzed in this study, which can be found here: https://github.com/YuxiangZhang-BIT/Data-CSHSI.

Acknowledgments

The authors would like to thank the authors of all the references used in this paper, the editors, and the anonymous reviewers for their detailed comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, K.; Zhang, G.; Li, X.; Xie, J. Face recognition based on improved Retinex and sparse representation. Procedia Eng. 2011, 15, 2010–2014. [Google Scholar] [CrossRef]
  2. Zhang, L.; Yang, M.; Feng, X. Sparse representation or collaborative representation: Which helps face recognition? In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 471–478. [Google Scholar]
  3. Li, W.; Zhang, Y.; Liu, N.; Du, Q.; Tao, R. Structure-aware collaborative representation for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 7246–7261. [Google Scholar] [CrossRef]
  4. Wang, R.; Chen, H.; Lu, Y.; Zhang, Q.; Nie, F.; Li, X. Discrete and Balanced Spectral Clustering with Scalability. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 14321–14336. [Google Scholar] [CrossRef]
  5. Zhang, M.; Li, W.; Du, Q. Diverse region-based CNN for hyperspectral image classification. IEEE Trans. Image Process. 2018, 27, 2623–2634. [Google Scholar] [CrossRef]
  6. Zhang, M.; Li, W.; Du, Q.; Gao, L.; Zhang, B. Feature extraction for classification of hyperspectral and LiDAR data using patch-to-patch CNN. IEEE Trans. Cybern. 2018, 50, 100–111. [Google Scholar] [CrossRef]
  7. Xu, X.; Li, W.; Ran, Q.; Du, Q.; Gao, L.; Zhang, B. Multisource remote sensing data classification based on convolutional neural network. IEEE Trans. Geosci. Remote Sens. 2017, 56, 937–949. [Google Scholar] [CrossRef]
  8. Zhao, X.; Tao, R.; Li, W.; Li, H.C.; Du, Q.; Liao, W.; Philips, W. Joint classification of hyperspectral and LiDAR data using hierarchical random walk and deep CNN architecture. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7355–7370. [Google Scholar] [CrossRef]
  9. Zhao, Q.; Wang, X.; Wang, B.; Wang, L.; Liu, W.; Li, S. A Dual-Attention Deep Discriminative Domain Generalization Model for Hyperspectral Image Classification. Remote Sens. 2023, 15, 5492. [Google Scholar] [CrossRef]
  10. Zhang, Y.; Li, W.; Zhang, M.; Qu, Y.; Tao, R.; Qi, H. Topological Structure and Semantic Information Transfer Network for Cross-Scene Hyperspectral Image Classification. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 2817–2830. [Google Scholar] [CrossRef]
  11. Bruzzone, L.; Prieto, D.F. Unsupervised retraining of a maximum likelihood classifier for the analysis of multitemporal remote sensing images. IEEE Trans. Geosci. Remote Sens. 2001, 39, 456–460. [Google Scholar] [CrossRef]
  12. Sun, H.; Liu, S.; Zhou, S.; Zou, H. Unsupervised cross-view semantic transfer for remote sensing image classification. IEEE Geosci. Remote Sens. Lett. 2015, 13, 13–17. [Google Scholar] [CrossRef]
  13. Qin, Y.; Bruzzone, L.; Li, B. Tensor alignment based domain adaptation for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 9290–9307. [Google Scholar] [CrossRef]
  14. Yang, W.; Peng, J.; Sun, W. Ideal regularized discriminative multiple kernel subspace alignment for domain adaptation in hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5833–5846. [Google Scholar] [CrossRef]
  15. Zhu, Y.; Zhuang, F.; Wang, J.; Ke, G.; Chen, J.; Bian, J.; Xiong, H.; He, Q. Deep subdomain adaptation network for image classification. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 1713–1722. [Google Scholar] [CrossRef]
  16. Long, M.; Cao, Y.; Wang, J.; Jordan, M. Learning transferable features with deep adaptation networks. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; PMLR: London, UK, 2015; pp. 97–105. [Google Scholar]
  17. Qu, Y.; Baghbaderani, R.K.; Li, W.; Gao, L.; Zhang, Y.; Qi, H. Physically constrained transfer learning through shared abundance space for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 10455–10472. [Google Scholar] [CrossRef]
  18. Ning, Y.; Peng, J.; Liu, Q.; Huang, Y.; Sun, W.; Du, Q. Contrastive learning based on category matching for domain adaptation in hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5301814. [Google Scholar] [CrossRef]
  19. Wang, H.; Cheng, Y.; Liu, X.; Kong, Y. Bi-classifier adversarial network for cross-scene hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2023, 20, 5504005. [Google Scholar] [CrossRef]
  20. Liu, F.; Gao, W.; Liu, J.; Tang, X.; Xiao, L. Adversarial Domain Alignment with Contrastive Learning for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5525720. [Google Scholar] [CrossRef]
  21. Li, Z.; Tang, X.; Li, W.; Wang, C.; Liu, C.; He, J. A two-stage deep domain adaptation method for hyperspectral image classification. Remote Sens. 2020, 12, 1054. [Google Scholar] [CrossRef]
  22. Zhang, Y.; Li, W.; Sun, W.; Tao, R.; Du, Q. Single-source domain expansion network for cross-scene hyperspectral image classification. IEEE Trans. Image Process. 2023, 32, 1498–1512. [Google Scholar] [CrossRef]
  23. Kang, G.; Jiang, L.; Yang, Y.; Hauptmann, A.G. Contrastive adaptation network for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4893–4902. [Google Scholar]
  24. Gretton, A.; Borgwardt, K.; Rasch, M.; Schölkopf, B.; Smola, A. A kernel method for the two-sample-problem. Adv. Neural Inf. Process. Syst. 2006, 19, 513–520. [Google Scholar]
  25. Chen, L.; Gan, Z.; Cheng, Y.; Li, L.; Carin, L.; Liu, J. Graph optimal transport for cross-domain alignment. In Proceedings of the International Conference on Machine Learning, Virtual, 12–18 July 2020; PMLR: London, UK, 2020; pp. 1542–1553. [Google Scholar]
  26. Peyré, G.; Cuturi, M. Computational optimal transport: With applications to data science. Found. Trends Mach. Learn. 2019, 11, 355–607. [Google Scholar] [CrossRef]
  27. Peyré, G.; Cuturi, M.; Solomon, J. Gromov-Wasserstein averaging of kernel and distance matrices. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; PMLR: London, UK, 2016; pp. 2664–2672. [Google Scholar]
  28. Wang, Z.; Luo, Y.; Huang, Z.; Baktashmotlagh, M. Prototype-matching graph network for heterogeneous domain adaptation. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 2104–2112. [Google Scholar]
  29. Xu, M.; Wang, H.; Ni, B.; Tian, Q.; Zhang, W. Cross-domain detection via graph-induced prototype alignment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12355–12364. [Google Scholar]
  30. Snell, J.; Swersky, K.; Zemel, R. Prototypical networks for few-shot learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 4080–4090. [Google Scholar]
  31. Gretton, A.; Borgwardt, K.M.; Rasch, M.J.; Schölkopf, B.; Smola, A. A kernel two-sample test. J. Mach. Learn. Res. 2012, 13, 723–773. [Google Scholar]
  32. Vapnik, V.N. An overview of statistical learning theory. IEEE Trans. Neural Netw. 1999, 10, 988–999. [Google Scholar] [CrossRef]
  33. Yu, C.; Wang, J.; Chen, Y.; Huang, M. Transfer learning with dynamic adversarial adaptation network. In Proceedings of the 2019 IEEE International Conference on Data Mining (ICDM), Beijing, China, 8–11 November 2019; pp. 778–786. [Google Scholar]
  34. Zhu, Y.; Zhuang, F.; Wang, J.; Chen, J.; Shi, Z.; Wu, W.; He, Q. Multi-representation adaptation network for cross-domain image classification. Neural Netw. 2019, 119, 214–221. [Google Scholar] [CrossRef]
  35. He, X.; Chen, Y.; Ghamisi, P. Heterogeneous transfer learning for hyperspectral image classification based on convolutional neural network. IEEE Trans. Geosci. Remote Sens. 2019, 58, 3246–3263. [Google Scholar] [CrossRef]
Figure 1. Illustration of joint distribution alignment and topological structure alignment for cross-scene learning.
Figure 2. Calculation flowchart of the GOT distance.
Figure 3. Flowchart of the proposed HPGA.
Figure 4. Pseudocolor image and ground truth map of Houston. (a) Pseudocolor image of Houston 2013. (b) Ground truth map of Houston 2013. (c) Pseudocolor image of Houston 2018. (d) Ground truth map of Houston 2018.
Figure 5. Pseudocolor image and ground truth map of HyRANK. (a) Pseudocolor image of Dioni. (b) Ground truth map of Dioni. (c) Pseudocolor image of Loukia. (d) Ground truth map of Loukia.
Figure 6. Classification result maps for the target scene Houston 2018. (a) False color image. (b) Ground truth. (c) DAN. (d) DAAN. (e) MRAN. (f) DSAN. (g) HTCNN. (h) BCAN. (i) HPGA.
Figure 7. Classification result maps for the target scene HyRANK. (a) False color image. (b) Ground truth. (c) DAN. (d) DAAN. (e) MRAN. (f) DSAN. (g) HTCNN. (h) BCAN. (i) HPGA.
Figure 8. 2D t-SNE visualization of features before and after domain adaptation on the Houston dataset.
Figure 9. The impact of parameters $\lambda_1$ and $\lambda_2$ on classification results.
Table 1. Number of source and target samples for the Houston dataset.

Class | Class Name | Houston 2013 (Source) | Houston 2018 (Target)
1 | Grass healthy | 345 | 1353
2 | Grass stressed | 365 | 4888
3 | Trees | 365 | 2766
4 | Water | 285 | 22
5 | Residential buildings | 319 | 5347
6 | Non-residential buildings | 408 | 32,459
7 | Road | 443 | 6365
Total | | 2530 | 53,200
Table 2. Number of source and target samples for the HyRANK dataset.

Class | Class Name | Dioni (Source) | Loukia (Target)
1 | Dense Urban Fabric | 1262 | 206
2 | Mineral Extraction Sites | 204 | 54
3 | Non-Irrigated Arable Land | 614 | 426
4 | Fruit Trees | 150 | 79
5 | Olive Groves | 1768 | 1107
6 | Coniferous Forest | 361 | 422
7 | Dense Sclerophyllous Vegetation | 5035 | 2996
8 | Sparse Sclerophyllous Vegetation | 6374 | 2361
9 | Sparsely Vegetated Areas | 1754 | 399
10 | Rocks and Sand | 492 | 453
11 | Water | 1612 | 1393
12 | Coastal Water | 398 | 421
Total | | 20,024 | 10,317
Table 3. Classification results of different methods on the Houston dataset.

Class | DAN | DAAN | MRAN | DSAN | HTCNN | BCAN | HPGA
1 | 55.95 ± 4.15 | 61.57 ± 3.42 | 56.39 ± 5.24 | 57.95 ± 6.22 | 4.85 ± 3.78 | 69.56 ± 7.56 | 45.08 ± 3.72
2 | 72.18 ± 6.85 | 76.94 ± 2.89 | 75.57 ± 6.48 | 67.90 ± 9.13 | 71.57 ± 7.35 | 82.82 ± 3.27 | 89.13 ± 1.46
3 | 62.87 ± 4.93 | 66.67 ± 3.46 | 68.00 ± 4.58 | 71.69 ± 5.77 | 35.75 ± 5.34 | 57.40 ± 8.28 | 72.60 ± 4.74
4 | 100.00 ± 0.00 | 72.73 ± 2.78 | 63.64 ± 4.81 | 81.82 ± 3.67 | 54.64 ± 6.98 | 63.18 ± 5.27 | 100.00 ± 0.00
5 | 56.33 ± 4.78 | 52.76 ± 7.44 | 66.50 ± 3.65 | 61.79 ± 8.23 | 54.40 ± 5.76 | 80.49 ± 3.66 | 80.14 ± 7.55
6 | 74.11 ± 3.57 | 69.64 ± 4.88 | 68.54 ± 2.71 | 70.26 ± 11.85 | 90.80 ± 4.98 | 76.07 ± 3.90 | 72.82 ± 4.71
7 | 31.80 ± 7.33 | 54.23 ± 8.23 | 54.36 ± 4.76 | 54.53 ± 7.54 | 44.05 ± 9.77 | 53.14 ± 3.19 | 55.48 ± 6.19
OA (%) | 66.05 ± 5.87 | 66.59 ± 6.72 | 66.95 ± 4.98 | 67.08 ± 2.48 | 74.72 ± 4.35 | 60.81 ± 6.99 | 72.27 ± 5.28
AA (%) | 64.74 ± 3.67 | 66.77 ± 4.87 | 64.71 ± 3.85 | 66.56 ± 3.62 | 50.72 ± 5.15 | 73.25 ± 5.84 | 78.36 ± 5.18
Kappa (%) | 47.82 ± 3.72 | 50.83 ± 6.77 | 51.51 ± 4.71 | 52.05 ± 1.20 | 55.24 ± 4.11 | 58.16 ± 6.16 | 58.29 ± 1.76
Table 4. Classification results of different methods on the HyRANK dataset.

Class | DAN | DAAN | MRAN | DSAN | HTCNN | BCAN | HPGA
1 | 10.68 ± 3.58 | 18.45 ± 6.41 | 11.65 ± 7.23 | 26.21 ± 4.37 | 5.34 ± 1.89 | 12.14 ± 8.32 | 10.22 ± 5.69
2 | 40.74 ± 7.12 | 14.81 ± 3.94 | 0.00 ± 0.00 | 18.52 ± 9.01 | 0.00 ± 0.00 | 31.85 ± 2.18 | 12.02 ± 3.57
3 | 23.47 ± 4.91 | 7.51 ± 1.47 | 3.52 ± 9.27 | 23.71 ± 6.81 | 45.07 ± 5.63 | 8.54 ± 2.89 | 69.48 ± 1.54
4 | 3.80 ± 2.22 | 10.13 ± 9.34 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 7.09 ± 1.93 | 0.00 ± 0.00
5 | 17.71 ± 9.84 | 10.12 ± 3.45 | 29.09 ± 8.57 | 48.69 ± 2.67 | 18.61 ± 4.55 | 17.52 ± 3.78 | 20.23 ± 5.41
6 | 4.98 ± 3.29 | 5.21 ± 2.43 | 40.52 ± 1.12 | 45.97 ± 6.54 | 2.61 ± 7.92 | 20.62 ± 4.78 | 13.88 ± 9.87
7 | 62.05 ± 4.67 | 81.91 ± 3.23 | 64.12 ± 6.98 | 60.58 ± 5.33 | 77.77 ± 8.45 | 85.34 ± 7.56 | 77.24 ± 2.88
8 | 65.14 ± 8.69 | 72.91 ± 5.43 | 57.65 ± 3.97 | 67.26 ± 9.15 | 62.22 ± 4.36 | 71.01 ± 7.83 | 69.72 ± 1.62
9 | 63.16 ± 7.19 | 7.02 ± 2.11 | 71.18 ± 9.45 | 37.09 ± 6.54 | 6.27 ± 8.22 | 57.84 ± 3.48 | 7.52 ± 5.17
10 | 0.00 ± 0.00 | 0.22 ± 0.15 | 23.18 ± 8.44 | 4.86 ± 1.75 | 0.00 ± 0.00 | 0.00 ± 0.00 | 58.28 ± 9.34
11 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00
12 | 95.96 ± 8.73 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00 | 97.86 ± 3.24 | 97.96 ± 7.89 | 57.72 ± 4.53
OA (%) | 56.31 ± 5.87 | 60.47 ± 3.21 | 58.32 ± 9.48 | 60.92 ± 7.63 | 58.63 ± 1.22 | 42.49 ± 2.91 | 62.16 ± 8.36
AA (%) | 40.64 ± 2.34 | 35.69 ± 9.12 | 41.74 ± 3.87 | 44.41 ± 6.58 | 34.65 ± 5.14 | 64.31 ± 1.82 | 44.84 ± 2.47
Kappa (%) | 45.94 ± 8.54 | 49.71 ± 3.98 | 49.14 ± 7.12 | 52.44 ± 1.35 | 47.47 ± 6.47 | 54.99 ± 2.44 | 53.03 ± 4.78
Table 5. Ablation experiments of HPGA on two datasets (✓ marks the loss terms used in each configuration).

$\mathcal{L}_{\mathrm{GNN}}$ | $\mathcal{L}_{\mathrm{MMD}}$ | $\mathcal{L}_{\mathrm{GOT}}$ | $\mathcal{L}_{\mathrm{LP}}$ | Houston OA (%) | Houston AA (%) | Houston Kappa (%) | HyRANK OA (%) | HyRANK AA (%) | HyRANK Kappa (%)
✓ | - | - | - | 61.22 ± 3.45 | 64.29 ± 2.31 | 50.85 ± 4.67 | 45.80 ± 5.24 | 32.16 ± 1.78 | 40.23 ± 2.59
✓ | ✓ | - | - | 68.42 ± 2.54 | 70.17 ± 4.13 | 54.76 ± 3.11 | 56.91 ± 3.98 | 37.22 ± 1.67 | 46.34 ± 2.46
✓ | ✓ | ✓ | - | 69.25 ± 5.47 | 75.84 ± 2.36 | 56.89 ± 4.89 | 61.19 ± 1.24 | 41.75 ± 3.21 | 52.19 ± 2.88
✓ | ✓ | ✓ | ✓ | 72.27 ± 5.28 | 78.36 ± 5.18 | 58.29 ± 1.76 | 62.16 ± 3.72 | 44.84 ± 2.47 | 53.03 ± 4.78
Table 6. Effect of parameter $\lambda_3$ on classification results (OA, %) on two datasets.

Dataset | $\lambda_3$ = 0.001 | 0.01 | 0.1 | 1 | 10
Houston 2018 | 68.85 ± 2.34 | 71.15 ± 3.45 | 71.06 ± 1.78 | 67.35 ± 4.56 | 67.04 ± 2.67
Loukia | 60.17 ± 3.12 | 59.82 ± 1.54 | 60.84 ± 2.67 | 59.77 ± 4.89 | 60.04 ± 2.45
Table 7. Effect of parameter $L$ on classification results (OA, %) on two datasets.

Dataset | L = 1 | L = 2 | L = 3 | L = 4
Houston 2018 | 69.74 ± 3.14 | 72.03 ± 4.23 | 72.27 ± 2.67 | 68.64 ± 3.21
Loukia | 57.26 ± 4.12 | 59.97 ± 2.45 | 62.16 ± 3.36 | 59.77 ± 1.78

