Hyperspectral Image Classification Based on Adaptive Global–Local Feature Fusion

Yang, Chunlan; Kong, Yi; Wang, Xuesong; Cheng, Yuhu

doi:10.3390/rs16111918

Open AccessArticle

Hyperspectral Image Classification Based on Adaptive Global–Local Feature Fusion

¹

School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China

²

School of Electronics and Electrical Engineering, Bengbu University, Bengbu 233030, China

^*

Author to whom correspondence should be addressed.

Remote Sens. 2024, 16(11), 1918; https://doi.org/10.3390/rs16111918

Submission received: 20 March 2024 / Revised: 15 May 2024 / Accepted: 22 May 2024 / Published: 27 May 2024

Download

Browse Figures

Review Reports Versions Notes

Abstract

Labeled hyperspectral image (HSI) information is commonly difficult to acquire, so the lack of valid labeled data becomes a major puzzle for HSI classification. Semi-supervised methods can efficiently exploit unlabeled and labeled data for classification, which is highly valuable. Graph-based semi-supervised methods only focus on HSI local or global data and cannot fully utilize spatial–spectral information; this significantly limits the performance of classification models. To solve this problem, we propose an adaptive global–local feature fusion (AGLFF) method. First, the global high-order and local graphs are adaptively fused, and their weight parameters are automatically learned in an adaptive manner to extract the consistency features. The class probability structure is then used to express the relationship between the fused feature and the categories and to calculate their corresponding pseudo-labels. Finally, the fused features are imported into the broad learning system as weights, and the broad expansion of the fused features is performed with the weighted broad network to calculate the model output weights. Experimental results from three datasets demonstrate that AGLFF outperforms other methods.

Keywords:

adaptive fusion; global–local features; class probability structure; weighted broad learning system; hyperspectral image classification

1. Introduction

Hyperspectral images typically include a large amount of approximately continuous spectral band information and spatial location information [1,2,3]. HSI classification distinguishes the corresponding categories of each pixel, which is a basic and key application technology in remote sensing and which can be successfully utilized in numerous fields such as mineral detection, environment detection, and crop monitoring [4,5,6]. Early applied HSI classification methods, including random forest [7], support vector machine (SVM) [8], and graph-based [9] methods, enhanced feature classification ability by exploring rich and effective spectral information. Other models, such as independent component analysis [10] and principal component analysis (PCA) [11], are often used to identify valid spectral features. However, the lack of spatial contextual information in these early methods has led to poor classification results. Therefore, several subsequent classification methods focused on studying rich spatial information of the HSI surface. For example, rich spatial and spectral information is integrated into sparse representation to achieve high-quality classification methods [12,13]. Markov random field [14,15] and superpixel methods [16,17] can fully explore spatial position information to achieve ideal classification. Although these methods can efficiently handle the classification task of HSI, they cannot effectively identify the small differences between different classes or the larger differences between the same classes.

Deep learning (DL) can efficiently extract high-order features [18] and has been successfully applied to HSI classification tasks. Previously proposed deep HSI methods, such as deep belief networks [19], directly captured HSI features without manual acquisition. Stacked autoencoder (SAE) [20] stacked several autoencoder layers to achieve recognition of each input spectral vector. However, these methods cannot consider both spectral and spatial information to deal with spectral variability. Consequently, numerous scholars used the convolutional neural network (CNN) [21] model to learn effective spatial and spectral features of HSI, obtaining promising results. Mei et al. [22] combined HSI spectral and spatial information into a feature learning CNN model to obtain better classification results. Kong et al. [23] designed intra-class and inter-class hypergraph models to extract high-quality spectral band information and effective spatial location information using the CNN model. Chen et al. [24] captured rich spatial–spectral data simultaneously with 3D CNN for HSI classification. Liu et al. [25] proposed a Siamese CNN to capture rich and effective spatial–spectral information and used SVM to achieve the final HSI classification. Yang et al. [26] designed 2D and 3D CNNs and improved the regression model for HSI classification. Wang et al. [27] used a residual network model to capture spatial–spectral information quickly.

DL can efficiently capture the desired features via multiple stacked units [28,29]. However, DL models usually require constant restructuring and extensive network training. In contrast, the broad learning system (BLS) constructs a flat formal network model [30] and easily enables network broad expansion. The original input HSI data are optimized by transformations to generate mapped features (MFs), which are transformed into enhancement nodes (ENs) by the generated random weight mappings. All ENs and MFs are simultaneously transmitted to the output layer, and the corresponding weights are calculated using the ridge regression theory. Jin et al. [31] constructed the graph regular BLS model by adding the manifold regular term to increase the final recognition ability. Kong et al. [32] used a novel graph regularized SAE structure, fine-tuned partial weight values of MFs and ENs in the BLS model, and finally realized classification through spectral clustering. Wang et al. [33] constructed domain adaptation BLS to dynamically adjust the conditions and marginal distributions to maintain data feature alignment by introducing manifold distribution constraints, thus significantly improving the performance of the BLS model. However, only a small amount of labeled data is available in the actual classification application task, and a large amount of unlabeled data is easily obtained but cannot be effectively utilized. The supervision method cannot solve this problem. BLS methods commonly adopt the supervised learning model, which cannot effectively utilize most unlabeled data and obtain relatively better classification performance.

Semi-supervised learning (SSL) can efficiently utilize a large amount of unlabeled and a few labeled data. Unlabeled data can provide considerable useful input information to improve the ability of the corresponding SSL model. Graph-based SSL methods are largely adopted owing to their efficient scalability. Basic graph SSL models usually use methods such as k-nearest neighbor [34] and non-negative local linear reconstruction [35] to construct data graphs, which are susceptible to the influence of composition strategies and nearest neighbor parameters and can only construct local data graph structures. Therefore, SSL methods combined with sparse graphs are constantly proposed. Morsier et al. [36] constructed the kernel low-rank sparse graph by solving the sample similarity relationship in Hilbert space and described the data relationship between sparse and low-rank constraints. Ma et al. [37] used a new robust non-negative and l1 paradigm-constrained SSL method to construct reliable inter-data by exploring the label structure relationship graphs. However, these algorithms do not sufficiently consider the class structure of the data. Therefore, Shao et al. [38] used the probability category model combined with the sparse representation method to obtain a reliable relationship graph among HSI data. This SSL algorithm ignores the construction of the global relationship graph. Ding et al. [39] obtained high-order information by exploring global neighbor relationships instead of using similarity measurements of HSI data. However, these methods only consider the global or local graphs and do not fully utilize both global and local information, which significantly limits the classification performance of the model.

To solve this problem, we propose an adaptive global–local feature fusion (AGLFF) method. The proposed method extracts the corresponding features of HSI through the adaptive fusion of global high-order and local graphs and uses the class probability (CP) structure to express the relationship between the data and the categories. Simultaneously, a weighted broad learning system (WBLS) is used to enable network broad expansion. Our contributions are as follows:

The global–local adaptive fusion graph is built to obtain consistent spatial–spectral data. Adaptive fusion can automatically learn the weight parameters of the global high-order and local graphs, which can realize feature smoothing of intra-class data and increase the discriminability of inter-class data.
The CP structure is used to express the relationship between the fused feature and the categories to better utilize unlabeled data, resulting in improved classification performance.
Adaptive fusion features are introduced into the BLS model as weights, and the WBLS model is used to expand the broad of the fused features to further enhance the expressiveness of data.

The rest of the paper is organized as follows. The classification process of the proposed AGLFF is detailed in Section 2. Detailed experimental results are reported and analyzed in Section 3. Section 4 concludes this paper.

2. Adaptive Global–Local Feature Fusion Method

Figure 1 shows the classification process of the proposed AGLFF, which mainly includes five steps. First, PCA is used to reduce the dimensions of the original HSI. Second, superpixels are fetched with simple linear iterative clustering (SLIC) [40], and then the global high-order and local graphs are adaptively fused to obtain consistent spatial–spectral features. Third, the CP structure is applied to calculate the pseudo-labels corresponding to the unlabeled samples. Fourth, the consistency features after fusion are extended by BLS to enhance the feature representation. Finally, the fused features are introduced into BLS as weights, and the output weights are calculated by the ridge regression theory.

2.1. Adaptive Feature Fusion

To fully utilize global and local information, consistent spatial–spectral features are obtained by adaptive fusion. HSI typically includes a lot of approximately continuous spectral band information. PCA is used to reduce the dimension of the original data. Define the HSI matrix

X \in R^{N \times b}

, with pixel data

x \in R^{b}

after dimensionality reduction, where N and b represent the pixel number and dimension value, respectively. Define a spatial coordinate matrix

K \in R^{N \times 2}

, where

k \in R^{2}

represents the spatial coordinate data after dimension reduction. SLIC is then used to segment the data after dimensionality reduction to generate superpixels. A reliable local adjacency graph is constructed based on the spatial–spectral information between superpixels and pixels. Inspired by [39], the nearest neighbors are determined based on the probabilistic neighbor relationship between the superpixels and pixels belonging to the same class. The statistical characteristic information of each superpixel after segmentation is expressed as its corresponding mean value. The average value of spectral features of the superpixel is

E = [e_{1}, \dots, e_{M}] \in R^{M \times b}

, while

F = [f_{1}, \dots, f_{M}] \in R^{M \times 2}

is the average value of spatial coordinates, where M represents the value of the superpixel. The smaller spectral distance

{∥x_{i} - e_{j}∥}_{2}^{2}

and the smaller spatial location distance

{∥k_{i} - f_{j}∥}_{2}^{2}

between pixel i and superpixel j correspond to a larger probability relationship

S_{i j}

of being in the same class. The local adjacency graph model between superpixels and pixels can be expressed as follows:

min_{s_{i}^{T} 1 = 1, 0 \leq s_{i} \leq 1} \sum_{i = 1}^{N} \sum_{j = 1}^{M} ({∥x_{i} - e_{j}∥}_{2}^{2} + λ {∥k_{i} - f_{j}∥}_{2}^{2}) S_{i j} + θ {∥S∥}_{F}^{2},

(1)

where

x_{i}

and

k_{i}

represent the spectral information and spatial location coordinates of pixel i, respectively, and

e_{j}

and

f_{j}

represent the average spectral information and spatial location coordinate values of superpixel j, respectively.

S_{i j}

represents the local neighbor relationship between pixel i and superpixel j. The addition of

{∥S∥}_{F}^{2}

prevents obtaining the trivial nearest neighbor solution with a probability value of 1, while

θ

and

λ

are the corresponding regularization parameters.

Owing to the long spatial distance and large spectral variation between intra-class pixels, the local adjacency graph can only obtain the neighboring relationship of a few nearby neighbor pixels and cannot obtain the global neighbor relationship. Global consistency features between superpixels and pixels are obtained using the graph topological consistency relationship, and feature aggregation between intra-class data can be further achieved. Inspired by [39], for two superpixels, the topological relationship remains highly consistent if the superpixels are connected through consecutive neighbors. The topological consistency relationship diagram is shown in the green dashed box in Figure 1. The superpixel P and superpixel Q are the same class. They are spatially located far away, and their spectral information is extremely variable, so the spatial–spectral correlation is low. However, they are connected by continuous neighbors, which means a higher topological consistency. Finding the similarity relationship between the superpixels can calculate their topological consistency relationship. The relationship model is expressed as follows:

min_{\begin{matrix} Z \end{matrix}} \sum_{r}^{M} (\frac{1}{2} \sum_{i, j = 1}^{M} A_{i j} {∥Z_{r i} - Z_{r j}∥}^{2} + ρ \sum_{i = 1}^{M} {∥Z_{r i} - I_{r i}∥}^{2}),

(2)

where

A_{i j}

is the similarity relationship of superpixels i and j,

Z \in R^{M \times M}

is the topological consistency relationship between the superpixels,

Z_{r i}

is the topological consistency relationship between superpixels r and i,

ρ

is the regular parameter, and

I

is the identity matrix. The first term in (2) corresponds to the graph topological consistency relationship assumption, and the second term prevents obtaining a trivial solution.

A

represents the corresponding similarity matrix between the superpixels, which can be calculated as follows:

min_{\begin{matrix} a_{i}^{T} 1 = 1, 0 \leq a_{i} \leq 1 \end{matrix}} \sum_{i, j = 1}^{M} ({∥e_{i} - e_{j}∥}_{2}^{2} + λ_{1} {∥f_{i} - f_{j}∥}_{2}^{2}) A_{i j} + θ_{1} {∥A∥}_{F}^{2},

(3)

where

e_{j}

and

f_{j}

represent the average spectral information and average spatial location coordinate values of superpixel j, respectively.

λ_{1}

and

θ_{1}

are regularization parameters. Equation (3) can be resolved according to [41], and (1) can be solved in the same manner. Equation (2) is independent of the value of r. Therefore, the problem can be solved by separating r:

min_{\begin{matrix} z_{r} \end{matrix}} \frac{1}{2} \sum_{i, j = 1}^{M} A_{i j} {∥Z_{r i} - Z_{r j}∥}^{2} + ρ \sum_{i = 1}^{M} {∥Z_{r i} - I_{r i}∥}^{2} .

(4)

Equation (4) is derived with respect to

z_{r}

and considers the value of zero to obtain its optimal solution

z_{r}^{*}

:

{Lz}_{r}^{*} + ρ (z_{r}^{*} - I_{r}) = 0,

(5)

where

L \in R^{M \times M}

denotes the Laplacian matrix of

A

, and

I_{r}

and

z_{r}^{*}

are the column vectors. Subsequently,

z_{r}^{*} = {(I + L / ρ)}^{- 1} I_{r} .

(6)

The rth column of

I

is

I_{r}

; therefore, the rth column of

{(I + L / ρ)}^{- 1}

is

z_{r}^{*}

. The optimal topological relationship can be expressed as follows:

Z = {(I + L / ρ)}^{- 1} .

(7)

HSI classification is the task that operates on all pixels; thus, the global consistency relationship

G

is calculated by combining the local neighbor relationship

S

in (1) with the topological relationship

Z

in (4). Subsequently,

G = SZ,

(8)

where

G \in R^{N \times M}

is a global consistency relationship between pixels and superpixels, which can effectively capture the global neighbor features. Considering that the neighbor relationship between pixels and superpixels cannot be fully expressed using global features. Therefore, the global and local effective features are adaptively fused to obtain the neighbor relationship between pixels and superpixels. The model can be expressed as follows:

\begin{matrix} min_{\begin{matrix} s_{i}^{T} 1 = 1, 0 \leq s_{i} \leq 1 \\ B + C = 1 \end{matrix}} \sum_{i = 1}^{N} \sum_{j = 1}^{M} B ({∥x_{i} - e_{j}∥}_{2}^{2} + λ {∥k_{i} - f_{j}∥}_{2}^{2}) S_{i j} + θ {∥S∥}_{F}^{2} \\ + \sum_{i = 1}^{N} \sum_{j = 1}^{M} C ({∥x_{i} - e_{j}∥}_{2}^{2} + λ {∥k_{i} - f_{j}∥}_{2}^{2}) G_{i j} + θ_{2} {∥G∥}_{F}^{2} . \end{matrix}

(9)

This can be simply transformed to

\begin{matrix} min_{\begin{matrix} s_{i}^{T} 1 = 1, 0 \leq s_{i} \leq 1 \\ B + C = 1 \end{matrix}} \sum_{i = 1}^{N} \sum_{j = 1}^{M} B ({∥x_{i} - e_{j}∥}_{2}^{2} + λ {∥k_{i} - f_{j}∥}_{2}^{2}) S_{i j} \\ + \sum_{i = 1}^{N} \sum_{j = 1}^{M} C ({∥x_{i} - e_{j}∥}_{2}^{2} + λ {∥k_{i} - f_{j}∥}_{2}^{2}) S_{i j} Z + λ_{2} {∥S∥}_{F}^{2}, \end{matrix}

(10)

where

λ_{2} = θ I + θ_{2} Z^{T} Z

. Equation (10) is independent of the value of i. Subsequently,

\begin{matrix} min_{\begin{matrix} s_{i}^{T} 1 = 1, 0 \leq s_{i} \leq 1 \\ B + C = 1 \end{matrix}} \sum_{j = 1}^{M} B ({∥x_{i} - e_{j}∥}_{2}^{2} + λ {∥k_{i} - f_{j}∥}_{2}^{2}) S_{i j} \\ + \sum_{j = 1}^{M} C ({∥x_{i} - e_{j}∥}_{2}^{2} + λ {∥k_{i} - f_{j}∥}_{2}^{2}) S_{i j} Z + λ_{2} {∥S∥}_{F}^{2} . \end{matrix}

(11)

Let

d_{i} = d_{i}^{x e} + λ d_{i}^{k f}

represent distance, where

d_{i}^{x e} = \sum_{j = 1}^{M} {∥x_{i} - e_{j}∥}_{2}^{2}

and

d_{i}^{k f} = \sum_{j = 1}^{M} {∥k_{i} - f_{j}∥}_{2}^{2}

; then, we have the following:

min_{\begin{matrix} s_{i}^{T} 1 = 1, 0 \leq s_{i} \leq 1 \\ B + C = 1 \end{matrix}} B d_{i} S_{i j} + C d_{i} S_{i j} Z + λ_{2} {∥S∥}_{F}^{2} .

(12)

Equation (12) can be transformed into a vector form as follows:

min_{\begin{matrix} s_{i}^{T} 1 = 1, 0 \leq s_{i} \leq 1 \\ B + C = 1 \end{matrix}} {∥s_{i} + \frac{B I + C Z}{2 λ_{2}}∥}_{2}^{2} .

(13)

The Lagrangian function can be expressed as follows:

\begin{matrix} L (s_{i}, α_{i}, ω_{i}, γ_{i}) = & \frac{1}{2} {∥s_{i} + \frac{B I + C Z}{2 λ_{2}}∥}_{2}^{2} - α_{i} (s_{i}^{T} 1 - 1) \\ - ω_{i}^{T} (s_{i} - 0) - γ_{i}^{T} (B + C - 1) . \end{matrix}

(14)

Subsequently, considering the derivative of (14) and setting it to zero gives

s_{i} + \frac{B I + C Z}{2 λ_{2}} d_{i} - α_{i} 1 - ω_{i} = 0 .

(15)

According to Karush–Kuhn–Tucker conditions, the desired result of (15) is expressed as follows:

S_{i j} = {(- \frac{B I + C Z}{2 λ_{2}} d_{i j} + α_{i})}_{+} .

(16)

According to the constraint

s_{i}^{T} 1 = 1

, we have

\begin{matrix} \sum_{j = 1}^{M} (- \frac{B I + C Z}{2 λ_{2}} d_{i j} + α_{i}) = 1 \\ \Rightarrow α_{i} = (1 + \frac{B I + C Z}{2 λ_{2}} d_{i j}) / M . \end{matrix}

(17)

After obtaining

α_{i}

, the corresponding

S_{i j}

can be determined. We denote

\hat{G} = \sqrt{D} (G Δ^{- 1} G^{T} + μ I) \sqrt{D}

as the regular matrix of

G

, where

G Δ^{- 1} G^{T} + μ I

increases the stability of the fusion model.

D = d i a g (\sum_{j = 1}^{N} {(G Δ^{- 1} G^{T} + μ I)}_{i j})

,

Δ \in R^{M \times M}

, and

Δ_{j j} = \sum_{i = 1}^{M} G_{i j}

; the parameter

μ

is empirically set to 0.1. Similarly, the regular matrix

\hat{S}

of S can be calculated. The fused feature can be expressed as

F_{u} = B \hat{S} + C \hat{G},

(18)

where B and C are fusion coefficients, and

F_{u}

is the fused feature. Algorithm 1 summarizes the fusion process.

Algorithm 1 Adaptive Feature Fusion Process

Input: PCA-based HSI representation

X \in R^{N \times b}

, pixel spatial coordinate

K \in R^{N \times 2}

, superpixel spectral feature

E = [e_{1}, \dots, e_{M}] \in R^{M \times b}

, and superpixel spatial coordinate

F = [f_{1}, \dots, f_{M}] \in R^{M \times 2}

.

(1): Obtain the local neighbor relationship $S$ between pixels and superpixels according to (1).
(2): Obtain the topological relationship $Z$ between superpixels according to (4).
(3): Obtain the global consistency relationship $G$ according to (8).
(4): Calculate adaptive fused feature according to (9).
(5): Obtain the optimized $S_{i j}$ according to (16) and (17).
(6): Obtain the adaptive fused feature $F_{u}$ according to (18).

Output: Adaptive fused feature

F_{u}

.

2.2. Class Probability Structure

Unlabeled data pixels have no label information and cannot be effectively used. We use a CP structure to calculate their pseudo-labels. The labeled samples generated via adaptive fusion are denoted as

X_{l}^{F} = [x_{1}^{F}, \dots, x_{l}^{F}] \in R^{l \times b}

, and their corresponding labels are denoted as

Y_{l} = [y_{1}, \dots, y_{l}] \in R^{l \times c}

. The unlabeled samples via adaptive fusion are denoted as

X_{u}^{F} = [x_{1}^{F}, \dots, x_{u}^{F}] \in R^{u \times b}

, where l denotes the number of known labeled data, b denotes the dimensionality of the data, u denotes the number of unlabeled data, c denotes the category number, and

N = l + u

is the number of total data pixels. At any

x_{i}^{F} \in X_{u}^{F}

, the similarity relationship with the labeled sample

X_{l}^{F}

is

\begin{matrix} min {∥x_{i}^{F} - X_{l}^{F} σ∥}_{F}^{2} + δ {∥σ∥}_{1} \\ s . t . σ \geq 0 \end{matrix},

(19)

where

δ

and

σ

are the desired regular term parameter and sparse coefficient, respectively. Equation (19) can be further optimally solved using the alternating direction multiplier method [42] to determine the CP vector as

p_{i} = σ^{T} Y_{l},

(20)

where

p_{i} = (p_{i 1}, \dots, p_{i k}, \dots, p_{i c}) \in R^{1 \times c}

, and

p_{i k}

is the probability value that the ith data belongs to the kth category. The CP matrix

P_{u} = [p_{1}; \dots; p_{u}] \in R^{u \times c}

of unlabeled samples can be obtained by label propagation of given labeled samples, and

P_{l} = [p_{1}; \dots; p_{l}] \in R^{l \times c}

is the corresponding CP matrix of the given labeled samples. Therefore, for any two samples i and j, the probability of belonging to the same class is denoted as

P_{i j} = \{\begin{matrix} 1, i = j \\ p_{i} p_{j}^{T}, i \neq j \end{matrix} .

(21)

The CP matrix

P

can be divided into four small matrix blocks and denoted as

P = (\begin{matrix} P_{l l} P_{l u} \\ P_{u l} P_{u u} \end{matrix}),

(22)

where

P_{l l}

is the numerical probability matrix of the same class in the labeled data, and

P_{u u}

is the numerical probability matrix of the same class in the unlabeled data.

P_{l u}

and

P_{u l}

are the numerical probability matrix of the same class in the unlabeled and labeled data, respectively. By calculating the index of the data with the maximum probability for each row of

P_{l u}

, the most similar labeled data can be obtained for all unlabeled data. Therefore, the pseudo-labels of the unlabeled data can be solved and expressed as

\begin{matrix} p_{i k} = max (p_{i}) \\ \Rightarrow y_{i}^{u} = y_{k}^{l} . \end{matrix}

(23)

2.3. Weighted Broad Learning System

Global–local fused features are introduced into the BLS model as weights to construct a WBLS. Using the WBLS to expand the broad of the fused features can further enhance the feature representation of the data. Given the adaptive fused HSI data

X^{F} = [X_{l}^{F}; X_{u}^{F}] \in R^{N \times b}

, the labels

Y^{*} = [Y_{l}; Y_{u}]

can be computed through the CP structure. The model uses randomly generated weights

W^{M}

and deviation values

β

to map

X^{F}

to the newly expanded MF, and we have

U_{i} = X^{F} W_{i}^{M} + β_{i}^{M}, i = 1, 2, \dots, G^{M},

(24)

where

G^{M}

is the number of feature groups included in the MF,

W^{M} = [W_{1}^{M}, \dots, W_{G^{M}}^{M}]

is the weight,

β^{M} = [β_{1}^{M}, \dots, β_{G^{M}}^{M}]

is the deviation value, and

U_{i}

denotes the ith group of MF.

W^{M}

is obtained via sparse autoencoder. Subsequently, the obtained MF features are mapped to EN through selected functions to further expand the feature broad. We have

H_{j} = ϕ_{j} ({UW}_{j}^{E} + β_{j}^{E}), j = 1, 2, \dots, G^{E},

(25)

where

ϕ_{j} (\cdot)

denotes the selected nonlinear function,

G^{E}

is the number of nodes contained in the EN,

W^{E} = [W_{1}^{E}, \dots, W_{G^{E}}^{E}]

is the weight, and

β^{E} = [β_{1}^{E}, \dots, β_{G^{E}}^{E}]

is the deviation value. Finally, MF and EN are combined and passed to the output layer, and the output is denoted as

Y^{*} = [U | H] W,

(26)

where

W

denotes the weights of the output layer. The fused global–local matrix

F_{u}

is added to the BLS as weights to construct the objective function of the WBLS. We have

\underset{W}{arg min} {∥F_{u} [U | H] W - Y^{*}∥}_{2}^{2} + ξ {∥W∥}_{2}^{2},

(27)

where

ξ

denotes a regularization parameter. Equation (27) can be optimized using the ridge regression theory. We have

W = {(ξ I + {[U | H]}^{T} F_{u}^{T} F_{u} [U | H])}^{- 1} {[U | H]}^{T} F_{u}^{T} Y^{*} .

(28)

The prediction of the WBLS can then be calculated as follows:

Y = [U | H] W .

(29)

AGLFF uses fused global–local features to achieve data sample smoothing. The pseudo-labels corresponding to unlabeled samples are calculated via the CP structure to effectively utilize HSI unlabeled data. Moreover, the fused features are added to the BLS as weights to enhance feature representation. Algorithm 2 summarizes the AGLFF method.

Algorithm 2 AGLFF Method

Input: Adaptive fused data

X^{F}

.

(1): Obtain the probability class matrix P according to (21).
(2): Obtain pseudo-labels $Y_{u}$ of unlabeled data $X_{u}^{F}$ according to (23).
(3): Obtain MF features $U$ according to (24) and calculate EN features $H$ according to (25).
(4): Obtain weights $W$ of WBLS according to (28).
(5): Obtain predictive labels $Y$ of AGLFF according to (29).

Output: Predictive labels

Y

.

3. Experiments and Analysis

The properties of the AGLFF method were used for assessment in three real HSI datasets, namely, Indian Pines (IP), Kennedy Space Center (KSC), and Pavia University (PU). The average accuracy (AA), overall accuracy (OA), consumed time (T, s), kappa coefficient, and accuracy of each category were used to evaluate the results, and the experimental results were averaged over 10 replicate experiments. The experimental methods were run with Pytorch and MATLAB R2016a.

3.1. HSI Datasets

The IP dataset was obtained over India and includes 145 × 145 pixels, 16 categories, and 200 spectral bands of agricultural scenes. Figure 2a,b shows the false-color image data and ground-truth map.

The KSC dataset was extracted over Florida, including 512 × 614 pixels, 13 categories, and 176 spectral bands of space center scenes. Figure 3a,b shows the false-color image data and ground-truth map.

The PU dataset was extracted over northern Italy, including 610 × 340 pixels, 9 categories, and 103 spectral bands of university campus scenes. Figure 4a,b shows the false-color image data and ground-truth map.

The sample selection for the IP, KSC, and PU datasets is presented in Table 1. We chose 30 samples of each class as labeled samples (represented by n.l.s.), and the remainder were unlabeled samples (represented by n.u.s.) for all methods. Considering the small number of grass-pasture-mowed and oats classes in the IP dataset, 15 were selected here as labeled samples, and the remainder were unlabeled samples.

3.2. Comparative Experiments

To verify the superiority of the proposed AGLFF, the following nine classification and comparison methods were adopted, including traditional methods (SVM [43] and ELM [44]), superpixel methods (SuperPCA [45] and S3PCA [46]), broad learning methods (BLS [30], SBLS [47], and AGLFF1), depth methods (GCN [48] and GCGCN [39]), and graph-based semi-supervised methods (NSCKL [49] and XPGN [50]). AGLFF1 was the AGLFF model without WBLS and contained only regular BLS. The hyperparameter values of AGLFF1 and AGLFF were the same. The selected methods used grid search to set corresponding parameters. To demonstrate the impact of fusion features on HSI classification, the classification effects of global, local, and global–local fusion features and their corresponding models were compared and analyzed. Figure 2, Figure 3 and Figure 4 and Table 2, Table 3 and Table 4 show that:

The classification results of AGLFF outperform the other methods because the model achieves a consistent fusion of global and local features. In addition, semi-supervised classification is performed by sufficient unlabeled data, and the fused features are added to the BLS model as weights, making the features smoother and obtaining higher classification accuracy. GCGCN is better than GCN because of the use of an efficient GCN method, which captures rich global spectral–spatial features. SBLS uses more unlabeled information for semi-supervised classification through BLS and obtains relatively good results. The GCN model has the worst classification results because it processes only the spectral information. AGLFF1 has the best classification results except for AGLFF and GCGCN due to the use of global–local fusion features, achieving higher classification accuracy in several classes in all three datasets. Hence, the proposed AGLFF outperforms the other nine methods by using the global–local fusion features and introducing them into the BLS model as weights, with an OA value of 96.11% and a kappa value of 95.23% for the IP dataset.
The four models—ELM, SVM, BLS, and SuperPCA—consume the shortest time. BLS is the model with the shortest time, except for ELM, SuperPCA, and SVM, mainly because the BLS model is relatively simple and the parameters can be calculated according to the inverse matrix. GCGCN and GCN depth methods consume the longest time. XPGN takes a longer time than AGLFF because it use three branch models to acquire information from various scales, and the training time is longer. NSCKL takes less time than AGLFF due to its relatively simple structure. AGLFF takes neither the most nor the least time because it takes some time to fuse the global–local features and calculate the class probability matrix between the samples. However, AGLFF has the best classification performance and can realize feature smoothing of intra-class data and increase the discriminability of inter-class data.
The accuracy obtained by all methods on the IP dataset is relatively low because of the small degree of difference classes; for example, corn-mintill and corn-notill are less distinguishable and more difficult to classify. All methods yielded better and less time-consuming results on the KSC dataset because the dataset contains fewer samples of classes and less inter-class similarity, making it easier to distinguish between categories. The AGLFF has the best results on the KSC dataset, with an OA value of 99.26% and a kappa value of 99.09%. For the proposed AGLFF, misclassification appears only in Class 13 (water), and the rest of the classes are classified correctly. This further illustrates the advantages of AGLFF.

3.3. Parameter Analysis

3.3.1. Semi-Supervised Label Ratio

For the performance evaluation of AGLFF, a different ratio of labeled data from each class was selected from the dataset for model training. Figure 5 shows classification results acquired by seven methods of different sample numbers on three datasets. According to the OAs of different methods, the classification results of all methods were enhanced significantly with the increase in the number of labeled samples. AGLFF outperformed other methods, particularly for 20% and 40% labeled samples of each class. The proposed method maintained a clear advantage, indicating that the AGLFF method has a more stable classification effect.

3.3.2. Analysis of Construction Graph

To better verify the superiority of the global–local fused feature model, its performance was compared with the global and local feature models on different datasets. From the classification results of the different feature models in Figure 6, the classification accuracy of the fused features for any number of labeled samples on the three datasets is determined to be higher than that of global and local features. AGLFF performs best, while the local feature model performs relatively poorly, because AGLFF can extract better inter-class discriminable information by searching a large range of intra-class neighbor samples to obtain more consistent information.

3.3.3. Parameter Settings and Analysis

We first analyze the relevant parameters

λ

,

λ_{1}

,

θ

,

θ_{1}

,

θ_{2}

,

ρ

, B, and C involved in the model composition to evaluate the performance of AGLFF. As parameters

λ

,

λ_{1}

,

θ

,

θ_{1}

, and

θ_{2}

contribute equally to the model, and B and C are complementary parameters that sum up to 1, for simplicity, only the effects of

λ

,

θ

,

ρ

, and C on the model performance were investigated. Parameters

λ

,

θ

, and

ρ

are within the range of

[0.001, 50]

. Figure 7a–c shows that the OAs on the three datasets vary with the variation in

λ

and

θ

; the OAs are at their maximum when

λ

is 30 and

θ

is 10. Therefore, parameters

λ

and

λ_{1}

are set to 30, while

θ

,

θ_{1}

, and

θ_{2}

are set to 10. Figure 7d shows that, as parameter

ρ

increases gradually from 0.001, the OAs increase and then decrease, reaching the maximum when

ρ

is at 0.1. Thus,

ρ

is set at 0.1. Parameter C is considered to be in the range

[0, 1]

. Figure 7e shows that the classification accuracy varies with the variation in C. When C is 0, it is a local feature, and the OA is the minimum value. When C is 1, it is a global feature. The C obtained by adaptive fusion is 0.8, and the classification accuracy is the highest at this time, indicating the superior performance of the adaptive fusion method.

The number of superpixels is also an important parameter affecting the performance of AGLFF. The number of superpixels M obtained by SLIC segmentation is in the range

[100, 1900]

. Figure 7f shows that the OAs gradually increase with the increasing number of superpixels M; the OAs become relatively large between 1100 and 1500. When M gradually increases from 1500, the OAs become smaller, indicating that extremely large superpixels will make the HSI excessively segmented and the classification effect becomes worse. Therefore, the number of superpixels M is set to 1500 for all datasets.

Subsequently, we set the number of groups

G^{M}

in the MF, the number of nodes

d^{M}

in the MF, and the number of nodes

G^{E}

in the EN of the BLS. Here, the number of groups

G^{M}

in the MF is the same as that of nodes

d^{M}

in the MF. Figure 8 shows that the OAs increase as

G^{E}

and

G^{M}

become larger and finally reach saturation. The performance of AGLFF can be gradually improved as

G^{E}

and

G^{M}

become larger, but extremely large values also lead to overly complex model calculation. Therefore, the corresponding

G^{M}

in MF and

G^{E}

in EN on the three datasets can be obtained as 30–500, 20–500, and 40–400, respectively.

3.4. Ablation Studies

To demonstrate the efficiency of the selected method, ablation studies were performed on local features (LFs), global features (GFs), global–local fusion features (FFs) and WBLS. For a detailed analysis, see Table 5 and Table 6. Compared with AGLFF-A and AGLFF-B, we can see that AGLFF has higher classification performance, indicating that the global–local fusion feature can realize feature smoothing of intra-class data and increase the discriminability of inter-class data. Compared with AGLFF-C, AGLFF has higher classification performance, which indicates that the WBLS model can expand the breadth of the fused features to further enhance the expressiveness of the data. Overall, a thorough study of the above experimental results with different compositions further validates the efficiency of our model.

4. Conclusions

To effectively utilize the spatial–spectral information of HSI, an AGLFF classification method is proposed. Adaptive fusion of global high-order and local data can realize feature smoothing of intra-class data and increase the discriminability of inter-class data. The adaptive method automatically learns the weight parameters of the global high-order and local data, which can reduce the number of parameters calculated. The probabilistic relationship between the fused features and the categories is then calculated by the CP structure, and the pseudo-labels corresponding to unlabeled data are obtained. Moreover, the fused features are also introduced into the BLS model as weights, and the feature broad expansion using WBLS increases the discriminability of the data. The validation results on three datasets show that the AGLFF method better extracts the consistent features of the sample data and obtains the best classification results compared to other methods. We will consider adding an adaptive weighting strategy to the BLS model in the future to improve its performance.

Author Contributions

All of the authors provided significant contributions to the work. C.Y. and Y.C. designed the experiments; C.Y., Y.K. and X.W. performed the experiments; C.Y., Y.K. and Y.C. analyzed the data; C.Y., Y.K. and Y.C. wrote the paper; X.W. reviewed and edited the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant 62176259 and Grant 61976215. This research was funded by the Key Research and Development Program of Jiangsu Province under Grant BE2022095. This research was also funded by the Key University Natural Science Research Program of Anhui Province under Grant KJ2021A1119.

Data Availability Statement

Publicly available datasets were analyzed in this study, which were obtained from: http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes (accessed on 6 October 2023).

Acknowledgments

The authors would like to thank the editors and anonymous reviewers for their detailed and constructive suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

HSI	Hyperspectral Image
AGLFF	Adaptive Global–Local Feature Fusion
SVM	Support Vector Machine
PCA	Principal Component Analysis
DL	Deep Learning
SAE	Stacked Autoencoder
CNN	Convolutional Neural Network
BLS	Broad Learning System
MF	Mapped Features
EN	Enhancement Nodes
SSL	Semi-supervised Learning
CP	Class Probability
WBLS	Weighted Broad Learning System
SLIC	Simple Linear Iterative Clustering
IP	Indian Pines
KSC	Kennedy Space Center
PU	Pavia University
AA	Average Accuracy
OA	Overall Accuracy

References

Yu, C.; Zhou, S.; Song, M.; Chang, C. Semisupervised hyperspectral band selection based on dual-constrained low-rank representation. IEEE Geosci. Remote Sens. Lett. 2022, 19, 5503005. [Google Scholar] [CrossRef]
Cheng, Y.; Chen, Y.; Kong, Y.; Wang, X. Soft instance-level domain adaptation with virtual classifier for unsupervised hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5509013. [Google Scholar] [CrossRef]
Prades, J.; Safont, G.; Salazar, A.; Vergara, L. Estimation of the Number of Endmembers in Hyperspectral Images Using Agglomerative Clustering. Remote Sens. 2020, 12, 3585. [Google Scholar] [CrossRef]
Wang, H.; Cheng, Y.; Wang, X. A Novel Hyperspectral Image Classification Method Using Class-Weighted Domain Adaptation Network. Remote Sens. 2023, 15, 999. [Google Scholar] [CrossRef]
Wang, H.; Wang, X.; Cheng, Y. Graph meta transfer network for heterogeneous few-shot hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2021, 61, 5501112. [Google Scholar] [CrossRef]
Kong, Y.; Wang, X.; Cheng, Y.; Chen, C.L.P. Multi-stage convolutional broad learning with block diagonal constraint for hyperspectral image classification. Remote Sens. 2021, 13, 3412. [Google Scholar] [CrossRef]
Ham, J.; Chen, Y.; Crawford, M.; Ghosh, J. Investigation of the random forest framework for classification of hyperspectral data. IEEE Trans. Geosci. Remote Sens. 2005, 43, 492–501. [Google Scholar] [CrossRef]
Melgani, F.; Bruzzone, L. Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1778–1790. [Google Scholar] [CrossRef]
Salazar, A.; Safont, G.; Vergara, L.; Vidal, E. Graph Regularization Methods in Soft Detector Fusion. IEEE Access 2023, 11, 144747–144759. [Google Scholar] [CrossRef]
Sun, W.; Du, Q. Graph-regularized fast and robust principal component analysis for hyperspectral band selection. IEEE Trans. Geosci. Remote Sens. 2018, 56, 3185–3195. [Google Scholar] [CrossRef]
Villa, A.; Benediktsson, J.; Chanussot, J.; Jutten, C. Hyperspectral image classification with independent component discriminant analysis. IEEE Trans. Geosci. Remote Sens. 2011, 49, 4865–4876. [Google Scholar] [CrossRef]
Shi, C.; Sun, J.; Wang, T.; Wang, L. Hyperspectral Image Classification Based on a 3D Octave Convolution and 3D Multiscale Spatial Attention Network. Remote Sens. 2023, 15, 257. [Google Scholar] [CrossRef]
Liu, W.; Liu, B.; He, P.; Hu, Q.; Gao, K.; Li, H. Masked Graph Convolutional Network for Small Sample Classification of Hyperspectral Images. Remote Sens. 2023, 15, 1869. [Google Scholar] [CrossRef]
Pan, C.; Gao, X.; Wang, Y.; Li, J. Markov random fields integrating adaptive interclass-pair penalty and spectral similarity for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 2520–2534. [Google Scholar] [CrossRef]
Ghamisi, P.; Benediktsson, J.; Ulfarsson, M. Spectral–spatial classification of hyperspectral images based on hidden markov random fields. IEEE Trans. Geosci. Remote Sens. 2014, 52, 2565–2574. [Google Scholar] [CrossRef]
Lu, T.; Li, S.; Fang, L.; Jia, X.; Benediktsson, J. From subpixel to superpixel: A novel fusion framework for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 4398–4411. [Google Scholar] [CrossRef]
Cai, Y.; Zhang, Z.; Ghamisi, P.; Ding, Y.; Liu, X.; Cai, Z.; Gloaguen, R. Superpixel contracted neighborhood contrastive subspace clustering network for hyperspectral images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5530113. [Google Scholar] [CrossRef]
Wang, K.; Wang, X.; Zhang, T.; Cheng, Y. Few-shot learning with deep balanced network and acceleration strategy. Int. J. Mach. Learn Cybern. 2022, 13, 133–144. [Google Scholar] [CrossRef]
Chen, Y.; Zhao, X.; Jia, X. Spectral–spatial classification of hyperspectral data based on deep belief network. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 2015, 8, 2381–2392. [Google Scholar] [CrossRef]
Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep learning-based classification of hyperspectral data. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2014, 7, 2094–2107. [Google Scholar] [CrossRef]
Cai, Y.; Liu, X.; Cai, Z. BS-Nets: An end-to-end framework for band selection of hyperspectral image. IEEE Trans. Geosci. Remote Sens. 2020, 58, 1969–1984. [Google Scholar] [CrossRef]
Mei, S.; Ji, J.; Hou, J.; Li, X.; Du, Q. Learning sensor-specific spatial-spectral features of hyperspectral images via convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 4520–4533. [Google Scholar] [CrossRef]
Kong, Y.; Wang, X.; Cheng, Y. Spectral–spatial feature extraction for HSI classification based on supervised hypergraph and sample expanded CNN. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2018, 11, 4128–4140. [Google Scholar] [CrossRef]
Chen, Y.; Jiang, H.; Li, C.; Jia, X.; Ghamisi, P. Deep feature extraction and classification of hyperspectral images based on convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2016, 54, 6232–6251. [Google Scholar] [CrossRef]
Liu, B.; Yu, X.; Zhang, P.; Yu, A.; Fu, Q.; Wei, X. Supervised deep feature extraction for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2018, 56, 1909–1921. [Google Scholar] [CrossRef]
Yang, X.; Ye, Y.; Li, X.; Lau, R.Y.; Zhang, X.; Huang, X. Hyperspectral image classification with deep learning models. IEEE Trans. Geosci. Remote Sens. 2018, 56, 5408–5423. [Google Scholar] [CrossRef]
Mou, L.; Ghamisi, P.; Zhu, X. Unsupervised spectral–spatial feature learning via deep residual conv–deconv network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2018, 56, 391–406. [Google Scholar] [CrossRef]
Kong, Y.; Wang, X.; Cheng, Y.; Chen, Y.; Chen, C.L.P. Graph domain adversarial network with dual-weighted pseudo-label loss for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6005105. [Google Scholar] [CrossRef]
Ding, Y.; Pan, S.; Chong, Y. Robust spatial-spectral block-diagonal structure representation with fuzzy class probability for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 1747–1762. [Google Scholar] [CrossRef]
Chen, C.L.P.; Liu, Z. Broad learning system: An effective and efficient incremental learning system without the need for deep architecture. IEEE Trans. Neural Netw. Learn Syst. 2018, 29, 10–24. [Google Scholar] [CrossRef]
Jin, J.; Chen, C.L.P. Regularized robust broad learning system for uncertain data modeling. Neurocomputing 2018, 322, 58–69. [Google Scholar] [CrossRef]
Kong, Y.; Cheng, Y.; Chen, C.L.P.; Wang, X. Hyperspectral image clustering based on unsupervised broad learning. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1741–1745. [Google Scholar] [CrossRef]
Wang, H.; Wang, X.; Chen, C.L.P.; Cheng, Y. Hyperspectral image classification based on domain adaptation broad learning. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2020, 13, 3006–3018. [Google Scholar] [CrossRef]
Camps-Valls, G.; Marsheva, T.V.B.; Zhou, D. Semi-supervised graph-based hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2007, 45, 3044–3054. [Google Scholar] [CrossRef]
Zhang, Y.; Cao, G.; Shafique, A.; Fu, P. Label propagation ensemble for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2019, 12, 3623–3636. [Google Scholar] [CrossRef]
De Morsier, F.; Borgeaud, M.; Gass, V.; Thiran, J.-P.; Tuia, D. Kernel low-rank and sparse graph for unsupervised and semi-supervised classification of hyperspectral images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 3410–3420. [Google Scholar] [CrossRef]
Ma, J.; Chow, T.W. Robust non-negative sparse graph for semi-supervised multi-label learning with missing labels. Inf. Sci. 2018, 422, 336–351. [Google Scholar] [CrossRef]
Shao, Y.; Sang, N.; Gao, C.; Ma, L. Spatial and class structure regularized sparse representation graph for semi-supervised hyperspectral image classification. Pattern Recognit. 2018, 81, 81–94. [Google Scholar] [CrossRef]
Ding, Y.; Guo, Y.; Chong, Y.; Pan, S.; Feng, J. Global consistent graph convolutional network for hyperspectral image classification. IEEE Trans. Instrum. Meas. 2021, 70, 5501516. [Google Scholar] [CrossRef]
Achanta, R.; Shaji, A.; Smith, K.; Lucchi, A.; Fua, P.; Süsstrunk, S. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Machine Intell. 2012, 34, 2274–2282. [Google Scholar] [CrossRef]
Nie, F.; Wang, X.; Huang, H. Clustering and projected clustering with adaptive neighbors. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), New York, NY, USA, 24–27 August 2014; pp. 977–986. [Google Scholar]
Lin, Z.; Liu, R.; Su, Z. Linearized alternating direction method with adaptive penalty for low-rank representation. In Proceedings of the Advances in Neural Information Processing Systems, Granada, Spain, 20 September 2011; pp. 612–620. [Google Scholar]
Wu, Y.; Yang, X.; Plaza, A.; Qiao, F.; Gao, L.; Zhang, B.; Cui, Y. Approximate computing of remotely sensed data: SVM hyperspectral image classification as a case study. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 2016, 9, 5806–5818. [Google Scholar] [CrossRef]
Zhai, H.; Zhang, H.; Zhang, L.; Li, P.; Plaza, A. A new sparse subspace clustering algorithm for hyperspectral remote sensing imagery. IEEE Geosci. Remote Sens. Lett. 2017, 14, 43–47. [Google Scholar] [CrossRef]
Jiang, J.; Ma, J.; Chen, C.; Wang, Z.; Cai, Z.; Wang, L. SuperPCA: A superpixelwise PCA approach for unsupervised feature extraction of hyperspectral imagery. IEEE Trans. Geosci. Remote Sens. 2018, 56, 4581–4593. [Google Scholar] [CrossRef]
Zhang, X.; Jiang, X.; Jiang, J.; Zhang, Y.; Liu, X.; Cai, Z. Spectral—Spatial and superpixelwise PCA for unsupervised feature extraction of hyperspectral imagery. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5502210. [Google Scholar] [CrossRef]
Kong, Y.; Wang, X.; Cheng, Y.; Chen, C.L.P. Hyperspectral imagery classification based on semi-supervised broad learning system. Remote Sens. 2018, 10, 685. [Google Scholar] [CrossRef]
Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar] [CrossRef]
Su, Y.; Gao, L.; Jiang, M.; Plaza, A.; Sun, X.; Zhang, B. NSCKL: Normalized Spectral Clustering With Kernel-Based Learning for Semisupervised Hyperspectral Image Classification. IEEE Trans. Cybern. 2023, 53, 6649–6662. [Google Scholar] [CrossRef]
Xi, B.; Li, J.; Li, Y.; Song, R.; Xiao, Y.; Du, Q.; Chanussot, J. Semi-supervised Cross-scale Graph Prototypical Network for Hyperspectral Image Classification. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 9337–9351. [Google Scholar] [CrossRef]

Figure 1. Framework of the AGLFF model.

Figure 2. Classification maps from the IP dataset. (a) False-color image. (b) Ground-truth map. (c) SVM. (d) ELM. (e) SuperPCA. (f) S3PCA. (g) BLS. (h) SBLS. (i) GCN. (j) GCGCN. (k) NSCKL. (l) XPGN. (m) AGLFF1. (n) AGLFF.

Figure 3. Classification maps from the KSC dataset. (a) False-color image. (b) Ground-truth map. (c) SVM. (d) ELM. (e) SuperPCA. (f) S3PCA. (g) BLS. (h) SBLS. (i) GCN. (j) GCGCN. (k) NSCKL. (l) XPGN. (m) AGLFF1. (n) AGLFF.

Figure 4. Classification maps from the PU dataset. (a) False-color image. (b) Ground-truth map. (c) SVM. (d) ELM. (e) SuperPCA. (f) S3PCA. (g) BLS. (h) SBLS. (i) GCN. (j) GCGCN. (k) NSCKL. (l) XPGN. (m) AGLFF1. (n) AGLFF.

Figure 5. Classification performance of different models with different label ratios. (a) IP. (b) KSC. (c) PU.

Figure 6. Classification performance of the fused, global, and local features. (a) IP. (b) KSC. (c) PU.

Figure 7. Classification performance of the AGLFF model with different parameters. (a) OA versus parameters

λ

and

θ

on IP. (b) OA versus parameters

λ

and

θ

on KSC. (c) OA versus parameters

λ

and

θ

on PU. (d) OA versus parameter

ρ

on three datasets. (e) OA versus parameter C on three datasets. (f) OA versus parameter M on three datasets.

Figure 7. Classification performance of the AGLFF model with different parameters. (a) OA versus parameters

λ

and

θ

on IP. (b) OA versus parameters

λ

and

θ

on KSC. (c) OA versus parameters

λ

and

θ

on PU. (d) OA versus parameter

ρ

on three datasets. (e) OA versus parameter C on three datasets. (f) OA versus parameter M on three datasets.

Figure 8. OA versus

G^{E}

and

G^{M}

. (a) IP. (b) KSC. (c) PU.

Figure 8. OA versus

G^{E}

and

G^{M}

. (a) IP. (b) KSC. (c) PU.

Table 1. Description of unlabeled and labeled samples in the IP, KSC, and PU datasets. “n.l.s.” represents labeled samples, and “n.u.s.” represents unlabeled samples.

IP				KSC			PU
Class	Surface Object	n.l.s	n.u.s	Surface Object	n.l.s	n.u.s	Surface Object	n.l.s	n.u.s
1	Alfalfa	30	16	Scrub	30	731	Asphalt	30	6601
2	Corn-notill	30	1398	Willow swamp	30	213	Meadows	30	18,619
3	Corn-mintill	30	800	Cabbage palm hammock	30	226	Gravel	30	2069
4	Corn	30	207	Slash pine	30	222	Trees	30	3034
5	Grass-pasture	30	453	Oak/broadleaf	30	131	Painted metal sheets	30	1345
6	Grass-trees	30	700	Hardwood	30	199	Bare soil	30	4999
7	Grass-pasture-mowed	15	13	Swamp	30	75	Bitumen	30	1300
8	Hay-windrowed	30	448	Graminoid marsh	30	401	Self-blocking bricks	30	3652
9	Oats	15	5	Spartina marsh	30	490	Shadows	30	917
10	Soybean-notill	30	942	Cattail marsh	30	374
11	Soybean-mintill	30	2425	Salt marsh	30	389
12	Soybean-clean	30	563	Mud flats	30	473
13	Wheat	30	175	Water	30	897
14	Woods	30	1235
15	Buildings-grass-trees-drives	30	356
16	Stone-steel-towers	30	63

Table 2. Classification performance of different methods on the IP dataset.

Class	SVM	ELM	SuperPCA	S3PCA	BLS	SBLS	GCN	GCGCN	NSCKL	XPGN	AGLFF1	AGLFF
1	20.48	35.98	100	100	100	100	95.00	100	90.39	98.13	98.96	96.88
2	55.49	63.72	92.65	91.82	64.89	92.17	56.71	91.15	94.69	88.80	90.53	90.79
3	42.77	43.67	96.28	93.52	55.76	94.88	51.50	92.53	89.38	96.11	96.08	96.21
4	36.5	36.01	88.41	96.20	51.82	99.90	84.64	99.95	87.05	99.47	99.92	99.89
5	78.45	82.96	95.14	96.03	86.36	94.08	83.71	97.04	93.71	97.09	96.91	99.40
6	91.33	94.22	97.14	97.14	88.87	99.80	94.03	96.79	98.19	99.57	99.36	99.74
7	39.80	36.61	92.86	92.86	83.16	98.57	92.31	96.15	99.87	99.16	98.81	99.92
8	98.93	99.29	99.55	100	90.87	99.82	96.61	100	100	100	100	100
9	30.76	29.28	100	100	100	100	100	100	73.68	99.71	63.33	100
10	47.49	57.41	89.52	90.70	63.64	86.88	77.47	91.51	95.58	95.15	95.56	91.43
11	71.92	74.56	93.73	95.31	81.70	89.69	56.56	95.38	93.24	88.67	90.20	94.68
12	53.36	52.18	96.67	97.34	69.82	97.80	58.29	95.81	95.92	96.10	97.13	98.94
13	89.38	92.81	99.43	99.43	90.59	99.43	100	99.31	98.02	99.39	99.62	97.93
14	93.40	93.97	90.20	91.55	90.87	97.23	80.03	99.26	99.91	99.44	99.77	99.10
15	50.86	57.67	98.53	98.60	80.16	99.44	69.55	97.50	90.65	98.48	99.91	93.35
16	85.47	89.58	97.82	98.41	91.88	99.05	98.41	99.52	92.30	98.17	98.94	99.38
OA (%)	64.35	68.37	94.61	95.79	81.36	93.95	69.24	95.35	94.74	94.83	95.03	96.11
AA (%)	61.65	64.99	95.49	96.18	80.02	96.80	65.27	96.80	93.29	97.09	95.31	97.35
Kappa (%)	59.92	64.40	92.99	93.67	80.32	92.29	80.39	94.67	93.99	94.10	94.77	95.23
T (s)	3.76	1.51	1.95	3.98	3.17	528.75	580.00	641.00	47.97	399.54	367.83	392.57

Table 3. Classification performance of different methods on the KSC dataset.

Class	SVM	ELM	SuperPCA	S3PCA	BLS	SBLS	GCN	GCGCN	NSCKL	XPGN	AGLFF1	AGLFF
1	93.72	83.35	96.72	96.72	88.60	98.80	80.15	97.84	98.98	98.87	97.78	99.40
2	76.49	77.84	98.71	99.53	76.74	96.06	80.02	95.85	95.59	98.12	96.62	99.48
3	73.15	61.93	97.23	98.23	70.36	96.19	76.66	97.39	99.91	100	99.85	100
4	45.16	68.55	92.17	97.75	75.61	80.23	27.63	99.64	87.08	84.98	99.73	99.24
5	60.70	71.25	96.18	97.71	69.06	99.08	72.56	98.93	88.26	98.82	99.24	98.31
6	48.55	82.48	98.05	97.49	74.12	75.34	80.51	100	99.17	98.06	100	100
7	68.69	61.33	100	100	72.22	79.33	95.60	100	100	99.89	99.91	99.28
8	64.89	74.69	97.72	100	76.51	98.35	85.02	100	99.85	99.76	100	100
9	80.65	92.59	96.73	95.51	91.87	96.29	84.59	100	100	100	99.82	99.45
10	98.8	98.19	92.51	89.30	100	97.33	91.64	99.48	99.09	98.11	100	100
11	91.85	96.99	99.49	99.49	99.15	96.50	89.18	100	99.71	91.28	99.82	100
12	79.96	89.06	99.10	100	92.38	92.30	76.20	96.89	96.06	97.65	97.72	99.65
13	99.50	98.44	96.07	96.66	100	99.02	99.49	100	100	99.91	94.74	94.81
OA (%)	82.19	85.68	97.09	97.82	88.38	93.62	83.72	99.18	98.29	96.97	98.28	99.26
AA (%)	75.55	81.28	96.97	97.57	83.59	92.24	79.94	99.09	97.21	97.34	98.86	99.53
Kappa (%)	79.68	84.11	96.43	96.85	87.01	92.68	81.89	98.99	98.09	96.16	98.08	99.09
T (s)	3.35	1.39	1.51	14.55	2.03	243.17	289.28	356.25	234.17	254.39	173.13	186.29

Table 4. Classification performance of different methods on the PU dataset.

Class	SVM	ELM	SuperPCA	S3PCA	BLS	SBLS	GCN	GCGCN	NSCKL	XPGN	AGLFF1	AGLFF
1	91.71	95.36	81.03	94.97	97.82	86.53	69.78	94.59	97.11	95.89	91.76	96.94
2	91.15	94.16	86.27	90.02	97.29	97.53	54.10	98.11	99.90	99.82	98.91	99.11
3	60.59	59.05	94.10	99.14	60.98	98.44	69.69	99.35	87.69	89.13	99.12	98.55
4	74.12	75.01	78.83	95.00	87.24	87.32	91.23	96.11	94.35	97.37	84.1	94.52
5	95.67	99.12	97.11	99.27	100	99.87	98.74	99.77	100	100	99.84	99.82
6	60.51	60.56	94.62	98.67	71.02	99.37	65.34	99.69	98.66	99.16	99.29	99.56
7	54.89	54.57	96.79	98.53	57.91	99.97	86.64	99.53	98.61	99.19	99.87	99.89
8	80.07	72.14	92.89	95.68	80.42	94.12	72.26	97.84	96.86	98.71	98.93	97.94
9	99.91	99.98	98.32	99.13	100	90.59	99.93	96.80	90.29	99.29	94.59	96.61
OA (%)	80.04	80.98	91.00	96.24	87.19	95.09	66.19	97.71	97.15	97.09	96.76	98.01
AA (%)	78.74	78.88	91.11	96.71	83.63	94.86	58.39	97.98	95.94	97.62	96.27	98.11
Kappa (%)	75.19	75.62	84.14	91.98	83.19	93.47	78.63	96.98	96.21	96.53	95.69	97.35
T (s)	3.36	1.99	3.84	123.64	5.76	1121.99	1783.00	1653.00	756.82	1057.19	897.01	932.35

Table 5. Ablation experiments with different compositions.

Algorithm	AGLFF-A	AGLFF-B	AGLFF-C	AGLFF
LFs	✓
GFs		✓
FFs			✓	✓
WBLS	✓	✓		✓

Table 6. Ablation analysis results for OA (%).

Dataset	AGLFF-A	AGLFF-B	AGLFF-C	AGLFF
IP	92.87	94.76	95.03	96.11
KSC	96.05	98.09	98.28	99.26
PU	94.07	96.19	96.76	98.01

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Yang, C.; Kong, Y.; Wang, X.; Cheng, Y. Hyperspectral Image Classification Based on Adaptive Global–Local Feature Fusion. Remote Sens. 2024, 16, 1918. https://doi.org/10.3390/rs16111918

AMA Style

Yang C, Kong Y, Wang X, Cheng Y. Hyperspectral Image Classification Based on Adaptive Global–Local Feature Fusion. Remote Sensing. 2024; 16(11):1918. https://doi.org/10.3390/rs16111918

Chicago/Turabian Style

Yang, Chunlan, Yi Kong, Xuesong Wang, and Yuhu Cheng. 2024. "Hyperspectral Image Classification Based on Adaptive Global–Local Feature Fusion" Remote Sensing 16, no. 11: 1918. https://doi.org/10.3390/rs16111918

APA Style

Yang, C., Kong, Y., Wang, X., & Cheng, Y. (2024). Hyperspectral Image Classification Based on Adaptive Global–Local Feature Fusion. Remote Sensing, 16(11), 1918. https://doi.org/10.3390/rs16111918

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Hyperspectral Image Classification Based on Adaptive Global–Local Feature Fusion

Abstract

1. Introduction

2. Adaptive Global–Local Feature Fusion Method

2.1. Adaptive Feature Fusion

2.2. Class Probability Structure

2.3. Weighted Broad Learning System

3. Experiments and Analysis

3.1. HSI Datasets

3.2. Comparative Experiments

3.3. Parameter Analysis

3.3.1. Semi-Supervised Label Ratio

3.3.2. Analysis of Construction Graph

3.3.3. Parameter Settings and Analysis

3.4. Ablation Studies

4. Conclusions

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI