Figure 1 shows the classification process of the proposed AGLFF, which mainly includes five steps. First, PCA is used to reduce the dimensions of the original HSI. Second, superpixels are fetched with simple linear iterative clustering (SLIC) [
40], and then the global high-order and local graphs are adaptively fused to obtain consistent spatial–spectral features. Third, the CP structure is applied to calculate the pseudo-labels corresponding to the unlabeled samples. Fourth, the consistency features after fusion are extended by BLS to enhance the feature representation. Finally, the fused features are introduced into BLS as weights, and the output weights are calculated by the ridge regression theory.
2.1. Adaptive Feature Fusion
To fully utilize global and local information, consistent spatial–spectral features are obtained by adaptive fusion. HSI typically includes a lot of approximately continuous spectral band information. PCA is used to reduce the dimension of the original data. Define the HSI matrix
, with pixel data
after dimensionality reduction, where
N and
b represent the pixel number and dimension value, respectively. Define a spatial coordinate matrix
, where
represents the spatial coordinate data after dimension reduction. SLIC is then used to segment the data after dimensionality reduction to generate superpixels. A reliable local adjacency graph is constructed based on the spatial–spectral information between superpixels and pixels. Inspired by [
39], the nearest neighbors are determined based on the probabilistic neighbor relationship between the superpixels and pixels belonging to the same class. The statistical characteristic information of each superpixel after segmentation is expressed as its corresponding mean value. The average value of spectral features of the superpixel is
, while
is the average value of spatial coordinates, where
M represents the value of the superpixel. The smaller spectral distance
and the smaller spatial location distance
between pixel
i and superpixel
j correspond to a larger probability relationship
of being in the same class. The local adjacency graph model between superpixels and pixels can be expressed as follows:
where
and
represent the spectral information and spatial location coordinates of pixel
i, respectively, and
and
represent the average spectral information and spatial location coordinate values of superpixel
j, respectively.
represents the local neighbor relationship between pixel
i and superpixel
j. The addition of
prevents obtaining the trivial nearest neighbor solution with a probability value of 1, while
and
are the corresponding regularization parameters.
Owing to the long spatial distance and large spectral variation between intra-class pixels, the local adjacency graph can only obtain the neighboring relationship of a few nearby neighbor pixels and cannot obtain the global neighbor relationship. Global consistency features between superpixels and pixels are obtained using the graph topological consistency relationship, and feature aggregation between intra-class data can be further achieved. Inspired by [
39], for two superpixels, the topological relationship remains highly consistent if the superpixels are connected through consecutive neighbors. The topological consistency relationship diagram is shown in the green dashed box in
Figure 1. The superpixel P and superpixel Q are the same class. They are spatially located far away, and their spectral information is extremely variable, so the spatial–spectral correlation is low. However, they are connected by continuous neighbors, which means a higher topological consistency. Finding the similarity relationship between the superpixels can calculate their topological consistency relationship. The relationship model is expressed as follows:
where
is the similarity relationship of superpixels
i and
j,
is the topological consistency relationship between the superpixels,
is the topological consistency relationship between superpixels
r and
i,
is the regular parameter, and
is the identity matrix. The first term in (
2) corresponds to the graph topological consistency relationship assumption, and the second term prevents obtaining a trivial solution.
represents the corresponding similarity matrix between the superpixels, which can be calculated as follows:
where
and
represent the average spectral information and average spatial location coordinate values of superpixel
j, respectively.
and
are regularization parameters. Equation (
3) can be resolved according to [
41], and (
1) can be solved in the same manner. Equation (
2) is independent of the value of
r. Therefore, the problem can be solved by separating
r:
Equation (
4) is derived with respect to
and considers the value of zero to obtain its optimal solution
:
where
denotes the Laplacian matrix of
, and
and
are the column vectors. Subsequently,
The
rth column of
is
; therefore, the
rth column of
is
. The optimal topological relationship can be expressed as follows:
HSI classification is the task that operates on all pixels; thus, the global consistency relationship
is calculated by combining the local neighbor relationship
in (
1) with the topological relationship
in (
4). Subsequently,
where
is a global consistency relationship between pixels and superpixels, which can effectively capture the global neighbor features. Considering that the neighbor relationship between pixels and superpixels cannot be fully expressed using global features. Therefore, the global and local effective features are adaptively fused to obtain the neighbor relationship between pixels and superpixels. The model can be expressed as follows:
This can be simply transformed to
where
. Equation (
10) is independent of the value of
i. Subsequently,
Let
represent distance, where
and
; then, we have the following:
Equation (
12) can be transformed into a vector form as follows:
The Lagrangian function can be expressed as follows:
Subsequently, considering the derivative of (
14) and setting it to zero gives
According to Karush–Kuhn–Tucker conditions, the desired result of (
15) is expressed as follows:
According to the constraint
, we have
After obtaining
, the corresponding
can be determined. We denote
as the regular matrix of
, where
increases the stability of the fusion model.
,
, and
; the parameter
is empirically set to 0.1. Similarly, the regular matrix
of
S can be calculated. The fused feature can be expressed as
where
B and
C are fusion coefficients, and
is the fused feature. Algorithm 1 summarizes the fusion process.
Algorithm 1 Adaptive Feature Fusion Process |
Input: PCA-based HSI representation , pixel spatial coordinate , superpixel spectral feature , and superpixel spatial coordinate .
- (1)
Obtain the local neighbor relationship between pixels and superpixels according to ( 1). - (2)
Obtain the topological relationship between superpixels according to ( 4). - (3)
Obtain the global consistency relationship according to ( 8). - (4)
Calculate adaptive fused feature according to ( 9). - (5)
Obtain the optimized according to ( 16) and ( 17). - (6)
Obtain the adaptive fused feature according to ( 18). Output: Adaptive fused feature .
|
2.2. Class Probability Structure
Unlabeled data pixels have no label information and cannot be effectively used. We use a CP structure to calculate their pseudo-labels. The labeled samples generated via adaptive fusion are denoted as
, and their corresponding labels are denoted as
. The unlabeled samples via adaptive fusion are denoted as
, where
l denotes the number of known labeled data,
b denotes the dimensionality of the data,
u denotes the number of unlabeled data,
c denotes the category number, and
is the number of total data pixels. At any
, the similarity relationship with the labeled sample
is
where
and
are the desired regular term parameter and sparse coefficient, respectively. Equation (
19) can be further optimally solved using the alternating direction multiplier method [
42] to determine the CP vector as
where
, and
is the probability value that the
ith data belongs to the
kth category. The CP matrix
of unlabeled samples can be obtained by label propagation of given labeled samples, and
is the corresponding CP matrix of the given labeled samples. Therefore, for any two samples
i and
j, the probability of belonging to the same class is denoted as
The CP matrix
can be divided into four small matrix blocks and denoted as
where
is the numerical probability matrix of the same class in the labeled data, and
is the numerical probability matrix of the same class in the unlabeled data.
and
are the numerical probability matrix of the same class in the unlabeled and labeled data, respectively. By calculating the index of the data with the maximum probability for each row of
, the most similar labeled data can be obtained for all unlabeled data. Therefore, the pseudo-labels of the unlabeled data can be solved and expressed as
2.3. Weighted Broad Learning System
Global–local fused features are introduced into the BLS model as weights to construct a WBLS. Using the WBLS to expand the broad of the fused features can further enhance the feature representation of the data. Given the adaptive fused HSI data
, the labels
can be computed through the CP structure. The model uses randomly generated weights
and deviation values
to map
to the newly expanded MF, and we have
where
is the number of feature groups included in the MF,
is the weight,
is the deviation value, and
denotes the
ith group of MF.
is obtained via sparse autoencoder. Subsequently, the obtained MF features are mapped to EN through selected functions to further expand the feature broad. We have
where
denotes the selected nonlinear function,
is the number of nodes contained in the EN,
is the weight, and
is the deviation value. Finally, MF and EN are combined and passed to the output layer, and the output is denoted as
where
denotes the weights of the output layer. The fused global–local matrix
is added to the BLS as weights to construct the objective function of the WBLS. We have
where
denotes a regularization parameter. Equation (
27) can be optimized using the ridge regression theory. We have
The prediction of the WBLS can then be calculated as follows:
AGLFF uses fused global–local features to achieve data sample smoothing. The pseudo-labels corresponding to unlabeled samples are calculated via the CP structure to effectively utilize HSI unlabeled data. Moreover, the fused features are added to the BLS as weights to enhance feature representation. Algorithm 2 summarizes the AGLFF method.
Algorithm 2 AGLFF Method |
Input: Adaptive fused data .
- (1)
Obtain the probability class matrix P according to ( 21). - (2)
Obtain pseudo-labels of unlabeled data according to ( 23). - (3)
Obtain MF features according to ( 24) and calculate EN features according to ( 25). - (4)
Obtain weights of WBLS according to ( 28). - (5)
Obtain predictive labels of AGLFF according to ( 29). Output: Predictive labels .
|