Article

Joint Learning of Correlation-Constrained Fuzzy Clustering and Discriminative Non-Negative Representation for Hyperspectral Band Selection

College of Computer Science, Liaocheng University, Liaocheng 252059, China
*
Author to whom correspondence should be addressed.
Sensors 2023, 23(10), 4838; https://doi.org/10.3390/s23104838
Submission received: 14 March 2023 / Revised: 14 May 2023 / Accepted: 15 May 2023 / Published: 17 May 2023
(This article belongs to the Section Sensing and Imaging)

Abstract

Hyperspectral band selection plays an important role in overcoming the curse of dimensionality. Recently, clustering-based band selection methods have shown promise in the selection of informative and representative bands from hyperspectral images (HSIs). However, most existing clustering-based band selection methods involve the clustering of original HSIs, limiting their performance because of the high dimensionality of hyperspectral bands. To tackle this problem, a novel hyperspectral band selection method termed joint learning of correlation-constrained fuzzy clustering and discriminative non-negative representation for hyperspectral band selection (CFNR) is presented. In CFNR, graph regularized non-negative matrix factorization (GNMF) and constrained fuzzy C-means (FCM) are integrated into a unified model to perform clustering on the learned feature representation of bands rather than on the original high-dimensional data. Specifically, the proposed CFNR aims to learn the discriminative non-negative representation of each band for clustering by introducing GNMF into the model of the constrained FCM and making full use of the intrinsic manifold structure of HSIs. Moreover, based on the band correlation property of HSIs, a correlation constraint, which enforces the similarity of clustering results between neighboring bands, is imposed on the membership matrix of FCM in the CFNR model to obtain clustering results that meet the needs of band selection. The alternating direction multiplier method is adopted to solve the joint optimization model. Compared with existing methods, CFNR can obtain a more informative and representative band subset and can thus improve the reliability of hyperspectral image classification. Experimental results on five real hyperspectral datasets demonstrate that CFNR can achieve superior performance compared with several state-of-the-art methods.

1. Introduction

Hyperspectral images (HSIs) are typically generated by capturing hundreds of narrow and continuous electromagnetic bands from the radiation of ground objects via hyperspectral sensors. Thus, HSIs can provide an abundance of spectral and spatial information regarding target objects [1]. HSIs are currently used in a wide variety of applications, such as target detection [2], land cover classification [3], urban management [4], and soil investigation [5]. Hyperspectral classification is an important task in such applications because it can identify the category of the existing land cover in each pixel of an HSI. However, HSIs usually have high-dimensional features and large amounts of redundant information, inducing the Hughes phenomenon in the applications of hyperspectral classification [6]. Band selection that reduces redundant information by selecting a set of representative bands from an HSI is an effective method to tackle this problem [7].
Band selection methods generally fall into two categories: supervised and unsupervised methods. Supervised methods rely on labeled samples, which are typically costly to obtain in practice [8]. Conversely, unsupervised methods can perform band selection without using labeled samples, providing greater flexibility in practical applications [9]. Over the past few decades, researchers have introduced various unsupervised methods, which can be grouped into ranking-based, searching-based, and clustering-based methods. Ranking-based methods commonly use certain indicators to measure the significance of each band, such as maximum variance-based principal component analysis (MVPCA) [10] and density-based spatial clustering of applications with noise [11]. Unfortunately, the performance of ranking-based methods is limited because they rarely consider the high correlation between bands [12]. Searching-based methods usually select the representative bands based on effective objective functions used to optimize given criteria, such as the volume gradient-based band selection method [13] and particle swarm optimization-based method [14]. However, these methods have high computational complexity during optimization [4]. Clustering-based methods first perform clustering on all bands, and then select a representative band from each cluster. Typical examples include Ward’s linkage strategy using divergence (WaLuDi) [15], enhanced fast density-peak-based clustering (E-FDPC) [16], adaptive subspace partition strategy (ASPS) [9], and region-aware hierarchical latent feature representation learning-guided clustering (HLFC) [17].
Clustering-based methods can generally provide superior performance by considering the similarity among bands [16]. However, the performance of clustering-based methods is degraded when handling high-dimensional data [17]. The concept of combining representation learning with clustering has been applied to many fields to improve clustering performance in the case of high-dimensional data. For example, He et al. [18] proposed a spatial weighted matrix distance-based fuzzy clustering algorithm that first uses variable-based principal component analysis for dimensionality reduction, and then divides multivariate time series data into different clusters. Gu et al. [19] introduced fuzzy double C-means based on the sparse self-representation method, in which a discriminative feature set is obtained via sparse self-representation followed by the use of the fuzzy double C-means to obtain superior clustering results. Notably, in the aforementioned studies, representation learning and clustering remain independent stages. Some researchers have proposed integrating representation learning and clustering into one framework. For instance, Lu et al. [20] presented subspace clustering constrained sparse non-negative matrix factorization (SC-NMF) for hyperspectral unmixing. In SC-NMF, subspace clustering is embedded into the non-negative matrix factorization to extract endmembers and corresponding abundance accurately. The joint framework of representation learning and clustering has yielded excellent results in some practical applications but has not been investigated in band selection applications. Therefore, it is a challenge to design an effective joint model of clustering and representation learning for the band selection task and introduce appropriate regularizations into the joint model based on problem-dependent information.
To address the abovementioned issues, a novel clustering-based band selection method called joint learning of correlation-constrained fuzzy clustering and discriminative non-negative representation (CFNR) is proposed in this paper. CFNR aims to perform clustering on the discriminative non-negative representation of all bands rather than on the original high-dimensional hyperspectral bands, as well as fully consider the band correlation property and intrinsic manifold structure of HSIs, by which an informative and representative band subset for hyperspectral classification is selected. Specifically, CFNR can perform clustering and representation learning jointly on the target HSIs by integrating the objective function of graph regularized non-negative matrix factorization (GNMF) into the constrained fuzzy C-means (FCM) model. Therefore, effective clustering results are expected to be obtained by conducting fuzzy clustering on the feature representation of bands learned by GNMF. Specifically, GNMF is used in CFNR to obtain discriminative non-negative representation of all bands by taking advantage of the intrinsic manifold structure of HSIs. Furthermore, a correlation constraint is imposed on the FCM membership matrix to exploit prior information regarding the existing strong correlation between neighboring bands in an HSI, by which adjacent bands are enforced to possess similar cluster assignments. This condition is expected to improve the effectiveness of clustering for band selection. Finally, an information entropy-based method is employed to select a representative band from each cluster. Five real hyperspectral datasets are used to evaluate the performance of CFNR and compare it with five representative methods, demonstrating that the proposed CFNR method can provide superior performance. The main contributions of this study are listed as follows.
  • A novel band selection method called CFNR, by which fuzzy clustering and learning of discriminative non-negative representation can be performed simultaneously, is developed to select representative bands for the classification task of HSIs. CFNR conducts clustering on the discriminative non-negative representation of all bands rather than the original high-dimensional hyperspectral bands. Compared with existing related band selection methods, the advantage of the proposed CFNR is the integration of representation learning and clustering into a unified model.
  • A correlation constraint based on problem-dependent information is imposed on the membership matrix in the CFNR model to take full advantage of the high correlation between adjacent bands in HSIs. Furthermore, GNMF, through which the intrinsic manifold structure of HSIs can be fully exploited, is utilized in CFNR to learn the discriminative non-negative representation of all bands. In addition, the alternating direction multiplier method (ADMM) is used to solve the optimization problem furnished by the proposed joint model.
  • The performance of our proposed method is validated through comparison with five representative band selection methods on five real datasets. Experimental results show that CFNR demonstrates superior performance for band selection.
The remainder of this paper is organized as follows. Section 2 introduces the theoretical foundations of the research problem. Section 3 presents the proposed model as well as its solution. Section 4 details the experimental results on the five real datasets and the corresponding analysis, including a discussion of the findings from our experiments and the advantages and limitations of our method. Finally, the paper is summarized in Section 5.

2. Related Work

2.1. Constrained FCM

FCM, one of the most popular fuzzy clustering methods, divides a dataset $\mathbf{X} = [\mathbf{x}_{1}, \mathbf{x}_{2}, \ldots, \mathbf{x}_{L}] \in \mathbb{R}^{N \times L}$ into K clusters using a soft clustering assignment strategy. Compared with hard clustering methods, such as K-means, the advantage of FCM lies in its use of a membership matrix $\mathbf{U} \in \mathbb{R}^{K \times L}$ to indicate the assignment probability of each sample to different clusters [21]. For example, $U_{k,l}$ represents the probability that data point $\mathbf{X}_{:,l}$ belongs to the k-th cluster. FCM can be formally expressed as the following minimization problem:
$$
\min_{\mathbf{U},\mathbf{C}} \sum_{k=1}^{K}\sum_{l=1}^{L} U_{k,l}^{m}\big\|(\mathbf{X}^{T})_{l,:}-\mathbf{C}_{k,:}\big\|^{2},\quad \mathrm{s.t.}\ \sum_{k=1}^{K} U_{k,l}=1, \tag{1}
$$
where $\mathbf{C} = [\mathbf{c}_{1}, \mathbf{c}_{2}, \ldots, \mathbf{c}_{k}, \ldots, \mathbf{c}_{K}]$ denotes the matrix formed by the centroid vectors $\mathbf{c}_{k}$, $k = 1, 2, \ldots, K$; $m > 1$ denotes the fuzzification factor and is set to 2 in this study; and $\|\cdot\|$ denotes the Euclidean distance between the l-th data point and the k-th cluster center.
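For reference, the following is a minimal NumPy sketch of the standard FCM iteration in Equation (1); the function and variable names are illustrative and are not part of the original method (here the data matrix is passed with one sample per row).

```python
import numpy as np

def fcm(X, K, m=2, n_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy C-means on X of shape (L, N): L samples (rows), N features."""
    rng = np.random.default_rng(seed)
    L = X.shape[0]
    U = rng.random((K, L))
    U /= U.sum(axis=0, keepdims=True)                          # enforce sum_k U[k, l] = 1
    for _ in range(n_iter):
        Um = U ** m
        C = (Um @ X) / Um.sum(axis=1, keepdims=True)           # centroid update
        d2 = ((X[None, :, :] - C[:, None, :]) ** 2).sum(-1)    # squared distances, shape (K, L)
        d2 = np.maximum(d2, 1e-12)
        U_new = 1.0 / d2 ** (1.0 / (m - 1))                    # membership update
        U_new /= U_new.sum(axis=0, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return U, C
```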
The objective function of problem (1) is nonconvex; thus, the solutions of FCM are prone to being trapped in local minima and are sensitive to the initial values [22]. To better address the needs of a specific application, different constraints are imposed on the solution $\mathbf{U}$ in problem (1) [23,24]. Thus, the general model of constraint-based FCM can be given as
$$
\min_{\mathbf{U},\mathbf{C}} \sum_{k=1}^{K}\sum_{l=1}^{L} U_{k,l}^{m}\big\|(\mathbf{X}^{T})_{l,:}-\mathbf{C}_{k,:}\big\|^{2}+\beta\, g(\mathbf{U}),\quad \mathrm{s.t.}\ \sum_{k=1}^{K} U_{k,l}=1, \tag{2}
$$
where $g(\mathbf{U})$ denotes the regularization term used to impose the specific constraint on $\mathbf{U}$, and $\beta$ represents the corresponding regularization parameter.

2.2. GNMF

Non-negative matrix factorization (NMF) has become a widely used method for low-dimensional representation learning of high-dimensional data owing to its simple structure and meaningful explainability [25]. Numerous variants of NMF have been recently proposed to further improve NMF performance in different applications [26,27]. For example, GNMF [28], which introduces graph-based regularization into the standard NMF, was proposed to exploit the intrinsic manifold structure of data. Rather than the standard NMF, GNMF is used in the current study to achieve the discriminative non-negative representation of all bands in an HSI. Specifically, NMF aims to find two low-rank non-negative matrices $\mathbf{A} \in \mathbb{R}^{N \times P}$ and $\mathbf{S} \in \mathbb{R}^{L \times P}$ that satisfy the condition $\mathbf{X} \approx \mathbf{A}\mathbf{S}^{T}$ [29]. The NMF model can generally be given as
$$
\min_{\mathbf{A},\mathbf{S}} \big\|\mathbf{X}-\mathbf{A}\mathbf{S}^{T}\big\|_{F}^{2},\quad \mathrm{s.t.}\ \mathbf{A}\ge 0,\ \mathbf{S}\ge 0, \tag{3}
$$
where $\|\cdot\|_{F}$ represents the Frobenius norm of matrices, and $\mathbf{A}$ and $\mathbf{S}$ represent the basis and coefficient matrices, respectively. The GNMF model [28] is defined on the basis of the standard NMF model by first constructing a graph indicating the nearest neighbors of each sample and then adding a regularization term based on the obtained graph. The GNMF model can formally be expressed as
$$
\min_{\mathbf{A},\mathbf{S}} \big\|\mathbf{X}-\mathbf{A}\mathbf{S}^{T}\big\|_{F}^{2}+\frac{\lambda}{2}\mathrm{Tr}\big(\mathbf{S}^{T}\mathbf{L}\mathbf{S}\big),\quad \mathrm{s.t.}\ \mathbf{A}\ge 0,\ \mathbf{S}\ge 0, \tag{4}
$$
with
$$
\mathbf{L} = \mathbf{D} - \mathbf{W}, \tag{5}
$$
where $\mathrm{Tr}(\cdot)$ denotes the trace of the corresponding matrix; $\mathbf{L}$ represents the graph Laplacian matrix; $\mathbf{W}$ represents the weight matrix of the graph; and $\mathbf{D}$ represents a diagonal matrix with $D_{ii} = \sum_{j} W_{ij}$.
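As an illustration, the following sketch builds a k-nearest-neighbor graph over the bands and runs the multiplicative GNMF updates of [28] for the factorization in Equations (4) and (5). The neighborhood size, the binary edge weights, and the absorption of the 1/2 factor into the parameter lam are illustrative choices and not specifications from this paper.

```python
import numpy as np

def knn_graph(X, k=5):
    """Symmetrized binary k-NN affinity matrix W for the columns of X (one column per band)."""
    B = X.T                                                  # bands as rows, shape (L, N)
    d = ((B[:, None, :] - B[None, :, :]) ** 2).sum(-1)       # pairwise squared distances
    np.fill_diagonal(d, np.inf)
    idx = np.argsort(d, axis=1)[:, :k]                       # k nearest neighbors of each band
    L_n = B.shape[0]
    W = np.zeros((L_n, L_n))
    W[np.repeat(np.arange(L_n), k), idx.ravel()] = 1.0
    return np.maximum(W, W.T)                                # symmetrize

def gnmf(X, P, lam=1.0, k=5, n_iter=200, eps=1e-10, seed=0):
    """GNMF: min ||X - A S^T||_F^2 + lam * Tr(S^T L S), with A >= 0 and S >= 0."""
    rng = np.random.default_rng(seed)
    N, L_n = X.shape
    A = rng.random((N, P))
    S = rng.random((L_n, P))
    W = knn_graph(X, k)
    D = np.diag(W.sum(axis=1))
    for _ in range(n_iter):
        A *= (X @ S) / (A @ (S.T @ S) + eps)                                # basis update
        S *= (X.T @ A + lam * W @ S) / (S @ (A.T @ A) + lam * D @ S + eps)  # coefficient update
    return A, S
```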

3. Proposed Methodology

Figure 1 illustrates a flowchart of the proposed method, in which clustering assignments and representation learning are jointly achieved. Specifically, the original hyperspectral data cube is first transformed into a two-dimensional matrix representation. Subsequently, GNMF is applied to learn a low-dimensional non-negative representation of the HSIs with clustering discriminability by exploiting the intrinsic manifold structure of HSIs. Furthermore, correlation-constrained FCM, which can effectively preserve the local similarity between the membership vectors of adjacent bands, is adopted on the basis of the obtained feature representation of each band to conduct clustering analysis for all the bands. Representative bands for subsequent classification tasks are then generated by selecting a band from each cluster using an information entropy-based method.

3.1. Correlation-Constrained FCM

To make full use of the high correlation among adjacent bands during clustering, we design an efficient correlation constraint for FCM. This constraint is designed on the principle that similar samples should have similar cluster assignments. Specifically, the design of the correlation constraint is inspired by the total variation (TV) regularization [30], which has been demonstrated to be quite efficient for image recovery and imaging inverse problems [31]. According to [32], TV regularization can be expressed as $\mathrm{TV}(\mathbf{S}) = \sum_{i}\sum_{j\in N_{S}(i)}\|\mathbf{s}_{i}-\mathbf{s}_{j}\|_{1}$, where $\mathbf{S} = [\mathbf{s}_{1}, \mathbf{s}_{2}, \ldots, \mathbf{s}_{n}, \ldots, \mathbf{s}_{N}]$ indicates the matrix formed by the pixels $\mathbf{s}_{n}$, $n = 1, 2, \ldots, N$; $N_{S}(i)$ denotes the set of indexes of the neighboring pixels of the i-th pixel; and $\|\cdot\|_{1}$ represents the $\ell_{1}$-norm of vectors. For the convenience of computation, the TV regularization can be implemented by the matrix operation $\|\mathbf{F}\mathbf{S}\|_{1,1}$, where $\|\mathbf{S}\|_{1,1} = \sum_{i=1}^{M}\|\mathbf{S}_{:,i}\|_{1}$ and $\mathbf{F}$ denotes the linear operator used to compute the differences between $\mathbf{s}_{i}$ and its neighbors. In this study, the correlation constraint is imposed on matrix $\mathbf{U}$ via the regularization term $g(\mathbf{U})$, which is given by
$$
g(\mathbf{U}) = \|\mathbf{U}\mathbf{H}\|_{1,1}, \tag{6}
$$
where the linear operator $\mathbf{H}\in\mathbb{R}^{L\times(L-1)}$ is used to compute the difference between two adjacent bands. Specifically, the difference vector $\mathbf{d}_{l}$, $l = 1, 2, \ldots, (L-1)$, which denotes the difference between the membership vectors of bands $\mathbf{X}_{:,l}$ and $\mathbf{X}_{:,(l+1)}$, is computed using $\mathbf{U}\mathbf{H} = [\mathbf{d}_{1}, \mathbf{d}_{2}, \ldots, \mathbf{d}_{l}, \ldots, \mathbf{d}_{(L-1)}]$, with $\mathbf{H}$ defined by
$$
\mathbf{H} = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
-1 & 1 & 0 & \cdots & 0 \\
0 & -1 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
0 & 0 & 0 & \cdots & -1
\end{bmatrix}. \tag{7}
$$
Based on the abovementioned definition, the correlation-constrained FCM can be written as
$$
\min_{\mathbf{U},\mathbf{C}} \sum_{k=1}^{K}\sum_{l=1}^{L} U_{k,l}^{m}\big\|(\mathbf{X}^{T})_{l,:}-\mathbf{C}_{k,:}\big\|^{2}+\beta\,\|\mathbf{U}\mathbf{H}\|_{1,1},\quad \mathrm{s.t.}\ \sum_{k=1}^{K} U_{k,l}=1. \tag{8}
$$
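The difference operator $\mathbf{H}$ in Equation (7) and the regularizer in Equation (6) can be formed directly; the following is a small sketch with illustrative names.

```python
import numpy as np

def band_difference_operator(L):
    """H of shape (L, L-1): column l has +1 at row l and -1 at row l+1,
    so U @ H stacks the differences between memberships of adjacent bands."""
    H = np.zeros((L, L - 1))
    idx = np.arange(L - 1)
    H[idx, idx] = 1.0
    H[idx + 1, idx] = -1.0
    return H

def correlation_penalty(U, H):
    """g(U) = ||U H||_{1,1}: sum of absolute differences between the
    membership vectors of neighboring bands."""
    return np.abs(U @ H).sum()
```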

3.2. CFNR Model

Traditional clustering-based band selection methods typically fail to provide good clustering results owing to the high dimensionality of the original hyperspectral bands. To address this problem, discriminative non-negative representation learning is applied to each band of the target HSI. This is inspired by a study [33] in which hyperspectral bands were expressed as sparse linear representations of several basis vectors via NMF. In this study, the objective function of GNMF in Equation (4) is introduced into the model of correlation-constrained FCM in Equation (8) to simultaneously perform non-negative representation learning and clustering. Consequently, the CFNR model can be expressed as
$$
\min_{\mathbf{U},\mathbf{C},\mathbf{A},\mathbf{S}} \sum_{k=1}^{K}\sum_{l=1}^{L} U_{k,l}^{m}\big\|\mathbf{S}_{l,:}-\mathbf{C}_{k,:}\big\|^{2}+\frac{\alpha}{2}\big\|\mathbf{X}-\mathbf{A}\mathbf{S}^{T}\big\|_{F}^{2}+\frac{\lambda}{2}\mathrm{Tr}\big(\mathbf{S}^{T}\mathbf{L}\mathbf{S}\big)+\beta\|\mathbf{U}\mathbf{H}\|_{1,1},\quad \mathrm{s.t.}\ \sum_{k=1}^{K} U_{k,l}=1,\ \mathbf{A}\ge 0,\ \mathbf{S}\ge 0. \tag{9}
$$
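For concreteness, the value of the joint objective in Equation (9) can be evaluated as follows; this is only a sketch with illustrative names, where Lap stands for the graph Laplacian $\mathbf{L}$ of Equation (5) and H for the operator of Equation (7).

```python
import numpy as np

def cfnr_objective(U, C, A, S, X, Lap, H, alpha, lam, beta, m=2):
    """Value of the CFNR objective in Equation (9)."""
    # fuzzy clustering term computed on the learned representation S (L x P)
    d2 = ((S[None, :, :] - C[:, None, :]) ** 2).sum(-1)           # (K, L) squared distances
    clustering = ((U ** m) * d2).sum()
    # GNMF reconstruction and manifold-regularization terms
    reconstruction = 0.5 * alpha * np.linalg.norm(X - A @ S.T, 'fro') ** 2
    manifold = 0.5 * lam * np.trace(S.T @ Lap @ S)
    # correlation constraint on the membership matrix
    correlation = beta * np.abs(U @ H).sum()
    return clustering + reconstruction + manifold + correlation
```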
Overall, the CFNR model demonstrates the following advantages.
  • CFNR combines non-negative representation learning and clustering into one model to improve the performance of band selection.
  • CFNR can learn discriminative non-negative representation based on manifold learning by preserving the internal manifold structure of HSIs in a low-dimensional space.
  • CFNR maximizes the strong correlation between adjacent bands of HSIs, which is beneficial for obtaining superior clustering results for band selection.

3.3. Solution of the CFNR Model

ADMM [34] is an effective method for solving large optimization problems. The principle of ADMM is to break up the problem into subproblems that can be solved iteratively by introducing additional variables [35]. Alternate optimization is achieved by fixing other variables and optimizing the desired variables [36]. In this study, ADMM is adopted to solve the optimization problem expressed in Equation (9). To facilitate model optimization, the non-negative constraint is first integrated into the objective function in Equation (9), which can be rewritten as
$$
\min_{\mathbf{U},\mathbf{C},\mathbf{A},\mathbf{S}} \sum_{k=1}^{K}\sum_{l=1}^{L} U_{k,l}^{m}\big\|\mathbf{S}_{l,:}-\mathbf{C}_{k,:}\big\|^{2}+\frac{\alpha}{2}\big\|\mathbf{X}-\mathbf{A}\mathbf{S}^{T}\big\|_{F}^{2}+\frac{\lambda}{2}\mathrm{Tr}\big(\mathbf{S}^{T}\mathbf{L}\mathbf{S}\big)+\beta\|\mathbf{U}\mathbf{H}\|_{1,1}+\iota_{\mathbb{R}_{+}}(\mathbf{A})+\iota_{\mathbb{R}_{+}}(\mathbf{S}),\quad \mathrm{s.t.}\ \sum_{k=1}^{K} U_{k,l}=1, \tag{10}
$$
where $\iota_{\mathbb{R}_{+}}(\mathbf{A})$ is an indicator function that takes the value zero if every entry of matrix $\mathbf{A}$ is non-negative and the value $+\infty$ otherwise.
To solve the optimization problem given in Equation (10) via ADMM, seven auxiliary variables, $\mathbf{V}_{1}, \mathbf{V}_{2}, \mathbf{V}_{3}, \mathbf{V}_{4}, \mathbf{V}_{5}, \mathbf{V}_{6}$, and $\mathbf{V}_{7}$, are introduced into the objective function in Equation (10). Subsequently, the optimization problem is reformulated as
$$
\begin{aligned}
\min_{\mathbf{U},\mathbf{C},\mathbf{A},\mathbf{S},\mathbf{V}_{1},\ldots,\mathbf{V}_{7}}\ & \sum_{k=1}^{K}\sum_{l=1}^{L} U_{k,l}^{m}\big\|(\mathbf{V}_{1})_{l,:}-\mathbf{C}_{k,:}\big\|^{2}+\frac{\alpha}{2}\big\|\mathbf{X}-\mathbf{A}\mathbf{V}_{2}^{T}\big\|_{F}^{2}+\frac{\lambda}{2}\mathrm{Tr}\big(\mathbf{V}_{5}^{T}\mathbf{L}\mathbf{V}_{5}\big)+\beta\|\mathbf{V}_{4}\|_{1,1}+\iota_{\mathbb{R}_{+}}(\mathbf{V}_{6})+\iota_{\mathbb{R}_{+}}(\mathbf{V}_{7}),\\
\mathrm{s.t.}\ & \sum_{k=1}^{K} U_{k,l}=1,\ \mathbf{V}_{1}=\mathbf{S},\ \mathbf{V}_{2}=\mathbf{S},\ \mathbf{V}_{3}=\mathbf{U},\ \mathbf{V}_{4}=\mathbf{V}_{3}\mathbf{H},\ \mathbf{V}_{5}=\mathbf{S},\ \mathbf{V}_{6}=\mathbf{S},\ \mathbf{V}_{7}=\mathbf{A}.
\end{aligned} \tag{11}
$$
Based on the objective function in Equation (11), the augmented Lagrange function is written as
$$
\begin{aligned}
&\mathcal{L}(\mathbf{U},\mathbf{C},\mathbf{A},\mathbf{S},\mathbf{V}_{1},\ldots,\mathbf{V}_{7},\mathbf{Z}_{1},\ldots,\mathbf{Z}_{7})\\
&= \sum_{k=1}^{K}\sum_{l=1}^{L} U_{k,l}^{m}\big\|(\mathbf{V}_{1})_{l,:}-\mathbf{C}_{k,:}\big\|^{2}+\frac{\alpha}{2}\big\|\mathbf{X}-\mathbf{A}\mathbf{V}_{2}^{T}\big\|_{F}^{2}+\frac{\lambda}{2}\mathrm{Tr}\big(\mathbf{V}_{5}^{T}\mathbf{L}\mathbf{V}_{5}\big)+\beta\|\mathbf{V}_{4}\|_{1,1}+\iota_{\mathbb{R}_{+}}(\mathbf{V}_{6})+\iota_{\mathbb{R}_{+}}(\mathbf{V}_{7})\\
&\quad-\langle \mathbf{Z}_{7},\mathbf{A}-\mathbf{V}_{7}\rangle+\frac{\rho}{2}\big\|\mathbf{A}-\mathbf{V}_{7}\big\|_{F}^{2}
-\langle \mathbf{Z}_{1},\mathbf{S}-\mathbf{V}_{1}\rangle+\frac{\rho}{2}\big\|\mathbf{S}-\mathbf{V}_{1}\big\|_{F}^{2}
-\langle \mathbf{Z}_{2},\mathbf{S}-\mathbf{V}_{2}\rangle+\frac{\rho}{2}\big\|\mathbf{S}-\mathbf{V}_{2}\big\|_{F}^{2}\\
&\quad-\langle \mathbf{Z}_{3},\mathbf{U}-\mathbf{V}_{3}\rangle+\frac{\rho}{2}\big\|\mathbf{U}-\mathbf{V}_{3}\big\|_{F}^{2}
-\langle \mathbf{Z}_{4},\mathbf{V}_{3}\mathbf{H}-\mathbf{V}_{4}\rangle+\frac{\rho}{2}\big\|\mathbf{V}_{3}\mathbf{H}-\mathbf{V}_{4}\big\|_{F}^{2}
-\langle \mathbf{Z}_{5},\mathbf{S}-\mathbf{V}_{5}\rangle+\frac{\rho}{2}\big\|\mathbf{S}-\mathbf{V}_{5}\big\|_{F}^{2}\\
&\quad-\langle \mathbf{Z}_{6},\mathbf{S}-\mathbf{V}_{6}\rangle+\frac{\rho}{2}\big\|\mathbf{S}-\mathbf{V}_{6}\big\|_{F}^{2},
\end{aligned} \tag{12}
$$
where the matrices $\mathbf{Z}_{1}, \mathbf{Z}_{2}, \ldots, \mathbf{Z}_{7}$ are Lagrange multipliers; $\rho > 0$ is the penalty parameter; and $\langle\cdot,\cdot\rangle$ denotes the inner product operator.
Next, we apply ADMM to optimize the variables $\mathbf{A}$, $\mathbf{S}$, $\mathbf{U}$, $\mathbf{C}$, and $\mathbf{V}_{1}$–$\mathbf{V}_{7}$ according to Equation (12). Note that t denotes the iteration index.
A-update: To perform an optimization of  A , we ignore terms that are not related to  A  in the objective function given in Equation (12). The simplified optimization problem can be given as
$$
\mathbf{A}^{(t+1)} = \arg\min_{\mathbf{A}} \frac{\alpha}{2}\big\|\mathbf{X}-\mathbf{A}(\mathbf{V}_{2}^{T})^{t}\big\|_{F}^{2}+\frac{\rho}{2}\big\|\mathbf{A}-\mathbf{V}_{7}^{t}-\boldsymbol{\zeta}_{7}^{t}\big\|_{F}^{2}. \tag{13}
$$
By setting the derivative of the objective function in Equation (13) to zero, the solution of  A  can be obtained as
$$
\mathbf{A}^{(t+1)} = \big(\rho\mathbf{V}_{7}^{t}+\rho\boldsymbol{\zeta}_{7}^{t}+\alpha\mathbf{X}\mathbf{V}_{2}^{t}\big)\big(\alpha(\mathbf{V}_{2}^{T})^{t}\mathbf{V}_{2}^{t}+\rho\mathbf{I}\big)^{-1}, \tag{14}
$$
where $\mathbf{I}$ denotes the identity matrix, $\boldsymbol{\zeta}_{2}=\mathbf{Z}_{2}/\rho$, and $\boldsymbol{\zeta}_{7}=\mathbf{Z}_{7}/\rho$.
$\mathbf{V}_{7}$-update: The variable $\mathbf{V}_{7}$ is optimized in accordance with Equation (15):
$$
\mathbf{V}_{7}^{(t+1)} = \arg\min_{\mathbf{V}_{7}} \iota_{\mathbb{R}_{+}}(\mathbf{V}_{7})+\frac{\rho}{2}\big\|\mathbf{A}^{(t+1)}-\mathbf{V}_{7}-\boldsymbol{\zeta}_{7}^{t}\big\|_{F}^{2}. \tag{15}
$$
According to Equation (15), $\mathbf{A}^{(t+1)}-\boldsymbol{\zeta}_{7}^{t}$ needs to be projected onto the non-negative quadrant. This is achieved by the update rule of $\mathbf{V}_{7}$ in Equation (16):
$$
\mathbf{V}_{7}^{(t+1)} = \max\big(\mathbf{A}^{(t+1)}-\boldsymbol{\zeta}_{7}^{t},\,0\big). \tag{16}
$$
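Both steps have simple closed forms; a NumPy sketch of Equations (14) and (16) follows, with illustrative names and zeta7 denoting $\boldsymbol{\zeta}_{7}=\mathbf{Z}_{7}/\rho$.

```python
import numpy as np

def update_A(X, V2, V7, zeta7, alpha, rho):
    """Closed-form A-update of Equation (14)."""
    P = V2.shape[1]
    rhs = rho * (V7 + zeta7) + alpha * X @ V2        # (N, P)
    lhs = alpha * V2.T @ V2 + rho * np.eye(P)        # (P, P)
    return rhs @ np.linalg.inv(lhs)

def update_V7(A_new, zeta7):
    """Projection onto the non-negative quadrant, Equation (16)."""
    return np.maximum(A_new - zeta7, 0.0)
```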
C-update: According to Equation (12), matrix  C  can be optimized by solving the sub-optimization problem given in Equation (17):
$$
\mathbf{C}_{k,:}^{(t+1)} = \arg\min_{\mathbf{C}} \sum_{k=1}^{K}\sum_{l=1}^{L}(U_{k,l}^{m})^{t}\big\|(\mathbf{V}_{1}^{t})_{l,:}-\mathbf{C}_{k,:}\big\|^{2}. \tag{17}
$$
Based on Equation (17), the optimization process of  C  is the same as that in the standard FCM. Thus, the update rule of  C  can be written as Equation (18):
$$
\mathbf{C}_{k,:}^{(t+1)} = \frac{\sum_{l=1}^{L}(U_{k,l}^{m})^{t}\,(\mathbf{V}_{1}^{t})_{l,:}}{\sum_{l=1}^{L}(U_{k,l}^{m})^{t}}. \tag{18}
$$
S-update: By fixing the other variables that are not related to $\mathbf{S}$ in Equation (12), the suboptimization problem with respect to $\mathbf{S}$ in Equation (19) can be written as
$$
\mathbf{S}^{(t+1)} = \arg\min_{\mathbf{S}} \frac{\rho}{2}\big\|\mathbf{S}-\mathbf{V}_{1}^{t}-\boldsymbol{\zeta}_{1}^{t}\big\|_{F}^{2}+\frac{\rho}{2}\big\|\mathbf{S}-\mathbf{V}_{2}^{t}-\boldsymbol{\zeta}_{2}^{t}\big\|_{F}^{2}+\frac{\rho}{2}\big\|\mathbf{S}-\mathbf{V}_{5}^{t}-\boldsymbol{\zeta}_{5}^{t}\big\|_{F}^{2}+\frac{\rho}{2}\big\|\mathbf{S}-\mathbf{V}_{6}^{t}-\boldsymbol{\zeta}_{6}^{t}\big\|_{F}^{2}. \tag{19}
$$
By setting the derivative of the objective function in Equation (19) to zero, the optimization of  S  can be simply expressed as
$$
\mathbf{S}^{(t+1)} = \frac{1}{4}\big(\mathbf{V}_{1}^{t}+\boldsymbol{\zeta}_{1}^{t}+\mathbf{V}_{2}^{t}+\boldsymbol{\zeta}_{2}^{t}+\mathbf{V}_{5}^{t}+\boldsymbol{\zeta}_{5}^{t}+\mathbf{V}_{6}^{t}+\boldsymbol{\zeta}_{6}^{t}\big), \tag{20}
$$
where $\boldsymbol{\zeta}_{1}=\mathbf{Z}_{1}/\rho$, $\boldsymbol{\zeta}_{5}=\mathbf{Z}_{5}/\rho$, and $\boldsymbol{\zeta}_{6}=\mathbf{Z}_{6}/\rho$.
$\mathbf{V}_{1}$-update: The suboptimization problem with respect to $\mathbf{V}_{1}$ can be expressed as Equation (21) by fixing the irrelevant variables in Equation (12):
$$
\begin{aligned}
\mathbf{V}_{1}^{(t+1)} &= \arg\min_{\mathbf{V}_{1}} \sum_{k=1}^{K}\sum_{l=1}^{L}(U_{k,l}^{m})^{t}\big\|(\mathbf{V}_{1})_{l,:}-\mathbf{C}_{k,:}^{(t+1)}\big\|^{2}+\frac{\rho}{2}\big\|\mathbf{V}_{1}-\mathbf{S}^{(t+1)}+\boldsymbol{\zeta}_{1}^{t}\big\|_{F}^{2}\\
&= \arg\min_{\mathbf{V}_{1}} \sum_{l=1}^{L}\sum_{k=1}^{K}(U_{k,l}^{m})^{t}\big\|(\mathbf{V}_{1})_{l,:}-\mathbf{C}_{k,:}^{(t+1)}\big\|^{2}+\frac{\rho}{2}\sum_{l=1}^{L}\big\|(\mathbf{V}_{1})_{l,:}-(\mathbf{S}^{(t+1)})_{l,:}+(\boldsymbol{\zeta}_{1}^{t})_{l,:}\big\|^{2}.
\end{aligned} \tag{21}
$$
We set the derivative of the objective function in Equation (21) to zero. Subsequently, the update rule of  V 1  is obtained and formulated as
$$
(\mathbf{V}_{1}^{(t+1)})_{l,:} = \frac{\sum_{k=1}^{K}(U_{k,l}^{m})^{t}\,\mathbf{C}_{k,:}^{(t+1)}+\frac{\rho}{2}\big((\mathbf{S}^{(t+1)})_{l,:}-(\boldsymbol{\zeta}_{1}^{t})_{l,:}\big)}{\sum_{k=1}^{K}(U_{k,l}^{m})^{t}+\frac{\rho}{2}}. \tag{22}
$$
Using matrix operations, Equation (22) can be rewritten as
$$
\mathbf{V}_{1}^{(t+1)} = \Big(\big((\mathbf{U}^{m})^{t}\big)^{T}\mathbf{C}^{(t+1)}+\frac{\rho}{2}\big(\mathbf{S}^{(t+1)}-\boldsymbol{\zeta}_{1}^{t}\big)\Big)\,./\,\Big(\big(\mathbf{1}^{T}(\mathbf{U}^{m})^{t}\big)^{T}+\frac{\rho}{2}\Big), \tag{23}
$$
where $\mathbf{1}$ denotes a column vector of ones and $./$ denotes element-wise division applied to each column of the numerator.
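A vectorized sketch of the $\mathbf{V}_{1}$-update in Equation (23), with illustrative names:

```python
import numpy as np

def update_V1(U, C_new, S_new, zeta1, rho, m=2):
    """V1-update of Equation (23): a weighted blend of the cluster centroids and S."""
    Um = U ** m                                            # (K, L)
    num = Um.T @ C_new + (rho / 2.0) * (S_new - zeta1)     # (L, P)
    den = Um.sum(axis=0)[:, None] + rho / 2.0              # (L, 1), broadcast over columns
    return num / den
```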
V 2 -update: Fixing variables that are irrelevant to  V 2  in Equation (12), the subproblem concerning  V 2  is formulated as
$$
\mathbf{V}_{2}^{(t+1)} = \arg\min_{\mathbf{V}_{2}} \frac{\alpha}{2}\big\|\mathbf{X}-\mathbf{A}^{(t+1)}\mathbf{V}_{2}^{T}\big\|_{F}^{2}+\frac{\rho}{2}\big\|\mathbf{V}_{2}-\mathbf{S}^{(t+1)}+\boldsymbol{\zeta}_{2}^{t}\big\|_{F}^{2}. \tag{24}
$$
By setting the derivative of the objective function in Equation (24) to zero, the update rule of  V 2  is expressed as
$$
\mathbf{V}_{2}^{(t+1)} = \big(\rho\mathbf{S}^{(t+1)}-\rho\boldsymbol{\zeta}_{2}^{t}+\alpha\mathbf{X}^{T}\mathbf{A}^{(t+1)}\big)\big(\alpha(\mathbf{A}^{T})^{(t+1)}\mathbf{A}^{(t+1)}+\rho\mathbf{I}\big)^{-1}. \tag{25}
$$
V 5 -update: To optimize  V 5 , the corresponding subproblem is formulated as Equation (26):
$$
\mathbf{V}_{5}^{(t+1)} = \arg\min_{\mathbf{V}_{5}} \frac{\lambda}{2}\mathrm{Tr}\big(\mathbf{V}_{5}^{T}\mathbf{L}\mathbf{V}_{5}\big)+\frac{\rho}{2}\big\|\mathbf{V}_{5}-\mathbf{S}^{(t+1)}+\boldsymbol{\zeta}_{5}^{t}\big\|_{F}^{2}. \tag{26}
$$
We set the derivative of the objective function in Equation (26) to zero and obtain
$$
\mathbf{V}_{5}^{(t+1)} = (\lambda\mathbf{L}+\rho\mathbf{I})^{-1}\big(\rho\mathbf{S}^{(t+1)}-\rho\boldsymbol{\zeta}_{5}^{t}\big). \tag{27}
$$
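The $\mathbf{V}_{5}$-update amounts to solving a single linear system; a sketch of Equation (27), where Lap denotes the graph Laplacian and the names are illustrative:

```python
import numpy as np

def update_V5(S_new, zeta5, Lap, lam, rho):
    """V5-update of Equation (27): solve (lam * Lap + rho * I) V5 = rho * (S - zeta5)."""
    return np.linalg.solve(lam * Lap + rho * np.eye(Lap.shape[0]),
                           rho * (S_new - zeta5))
```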
V 6 -update: The suboptimization problem regarding  V 6  is shown by Equation (28):
$$
\mathbf{V}_{6}^{(t+1)} = \arg\min_{\mathbf{V}_{6}} \iota_{\mathbb{R}_{+}}(\mathbf{V}_{6})+\frac{\rho}{2}\big\|\mathbf{S}^{(t+1)}-\mathbf{V}_{6}-\boldsymbol{\zeta}_{6}^{t}\big\|_{F}^{2}. \tag{28}
$$
The update rule of $\mathbf{V}_{6}$ is obtained by projecting $\mathbf{S}^{(t+1)}-\boldsymbol{\zeta}_{6}^{t}$ onto the non-negative quadrant, expressed as
$$
\mathbf{V}_{6}^{(t+1)} = \max\big(\mathbf{S}^{(t+1)}-\boldsymbol{\zeta}_{6}^{t},\,0\big). \tag{29}
$$
U -update: We derive the update steps of  U  by referring to [35]. Specifically, the subproblem of  U  can be reformulated as Equation (30):
$$
\mathbf{U}^{(t+1)} = \arg\min_{\mathbf{U}} \sum_{k=1}^{K}\sum_{l=1}^{L} U_{k,l}^{2}\big\|(\mathbf{V}_{1}^{(t+1)})_{l,:}-\mathbf{C}_{k,:}^{(t+1)}\big\|^{2}-\langle\mathbf{Z}_{3}^{t},\mathbf{U}-\mathbf{V}_{3}^{t}\rangle+\frac{\rho}{2}\big\|\mathbf{U}-\mathbf{V}_{3}^{t}\big\|_{F}^{2},\quad \mathrm{s.t.}\ \sum_{k=1}^{K}U_{k,l}=1. \tag{30}
$$
Subsequently, Equation (30) can be rewritten as
$$
\arg\min_{\mathbf{U}} \sum_{k=1}^{K}\sum_{l=1}^{L}\big(\tilde{A}_{k,l}^{t}\,U_{k,l}^{2}+\tilde{B}_{k,l}^{t}\,U_{k,l}\big),\quad \mathrm{s.t.}\ \sum_{k=1}^{K}U_{k,l}=1, \tag{31}
$$
with
$$
\tilde{A}_{k,l}^{t} = \big\|(\mathbf{V}_{1}^{(t+1)})_{l,:}-\mathbf{C}_{k,:}^{(t+1)}\big\|^{2}+\frac{\rho}{2},\qquad \tilde{B}_{k,l}^{t} = -(\mathbf{Z}_{3}^{t})_{k,l}-\rho\,(\mathbf{V}_{3}^{t})_{k,l}. \tag{32}
$$
To solve the subproblem given in Equation (31), the Lagrangian multiplier method is used, and the obtained update rule of  U  can be expressed as Equation (33):
$$
U_{k,l}^{(t+1)} = \frac{\tilde{Q}_{l}^{t}-\tilde{B}_{k,l}^{t}}{2\tilde{A}_{k,l}^{t}}, \tag{33}
$$
where $\tilde{Q}_{l}^{t} = \Big(1+\sum_{k=1}^{K}\frac{\tilde{B}_{k,l}^{t}}{2\tilde{A}_{k,l}^{t}}\Big)\Big/\sum_{k=1}^{K}\frac{1}{2\tilde{A}_{k,l}^{t}}$.
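In other words, each column of $\mathbf{U}$ is obtained by solving a small quadratic program under a sum-to-one constraint; the following is a sketch of the update in Equations (31)–(33), with illustrative names:

```python
import numpy as np

def update_U(V1_new, C_new, V3, Z3, rho):
    """U-update of Equations (31)-(33): per band l, minimize
    sum_k (A_tilde[k,l] * u_k^2 + B_tilde[k,l] * u_k) subject to sum_k u_k = 1."""
    d2 = ((V1_new[None, :, :] - C_new[:, None, :]) ** 2).sum(-1)   # (K, L) squared distances
    A_t = d2 + rho / 2.0                                           # A_tilde, Equation (32)
    B_t = -Z3 - rho * V3                                           # B_tilde, Equation (32)
    # Lagrange multiplier Q_tilde (one value per band) enforcing the sum-to-one constraint
    Q_t = (1.0 + (B_t / (2.0 * A_t)).sum(axis=0)) / (1.0 / (2.0 * A_t)).sum(axis=0)
    return (Q_t[None, :] - B_t) / (2.0 * A_t)                      # Equation (33)
```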
V 3 -update: The subproblem with respect to  V 3  is shown by Equation (34):
$$
\mathbf{V}_{3}^{(t+1)} = \arg\min_{\mathbf{V}_{3}} \frac{\rho}{2}\big\|\mathbf{V}_{3}-\mathbf{U}^{(t+1)}+\boldsymbol{\zeta}_{3}^{t}\big\|_{F}^{2}+\frac{\rho}{2}\big\|\mathbf{V}_{3}\mathbf{H}-\mathbf{V}_{4}^{t}+\boldsymbol{\zeta}_{4}^{t}\big\|_{F}^{2}. \tag{34}
$$
By setting the derivative of the objective function in Equation (34) to zero [32], the solution of  V 3  is obtained as
$$
\mathbf{V}_{3}^{(t+1)} = \big(\mathbf{U}^{(t+1)}-\boldsymbol{\zeta}_{3}^{t}+\mathbf{V}_{4}^{t}\mathbf{H}^{T}-\boldsymbol{\zeta}_{4}^{t}\mathbf{H}^{T}\big)\big(\mathbf{I}+\mathbf{H}\mathbf{H}^{T}\big)^{-1}, \tag{35}
$$
where $\boldsymbol{\zeta}_{3}=\mathbf{Z}_{3}/\rho$ and $\boldsymbol{\zeta}_{4}=\mathbf{Z}_{4}/\rho$.
V 4 -update: A suboptimization problem with respect to  V 4  is written as
$$
\mathbf{V}_{4}^{(t+1)} = \arg\min_{\mathbf{V}_{4}} \beta\|\mathbf{V}_{4}\|_{1,1}+\frac{\rho}{2}\big\|\mathbf{V}_{4}-\mathbf{V}_{3}^{(t+1)}\mathbf{H}+\boldsymbol{\zeta}_{4}^{t}\big\|_{F}^{2}. \tag{36}
$$
The soft threshold [32] is employed to update  V 4 , and the update rule of  V 4  is given as
$$
\mathbf{V}_{4}^{(t+1)} = \mathrm{soft}\big(\beta/\rho,\ \mathbf{V}_{3}^{(t+1)}\mathbf{H}-\boldsymbol{\zeta}_{4}^{t}\big). \tag{37}
$$
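This is a standard element-wise soft-thresholding (shrinkage) step; a minimal sketch of Equation (37) follows, with illustrative names.

```python
import numpy as np

def soft_threshold(tau, M):
    """Element-wise soft-thresholding operator soft(tau, M)."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def update_V4(V3_new, H, zeta4, beta, rho):
    """V4-update of Equation (37)."""
    return soft_threshold(beta / rho, V3_new @ H - zeta4)
```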
Based on the updating rules obtained for the variables $\mathbf{A}$, $\mathbf{S}$, $\mathbf{U}$, $\mathbf{C}$, and $\mathbf{V}_{1}$–$\mathbf{V}_{7}$, the detailed algorithm steps of our proposed CFNR method are summarized in Algorithm 1.
Algorithm 1: CFNR for Hyperspectral Band Selection

3.4. Information Entropy-Based Method for Representative Band Selection

After dividing all bands into different clusters, the next task is to select a group of representative bands from the obtained clusters. While most current band selection methods rely on selecting the band closest to its cluster centroid in terms of Euclidean distance [12,37,38], this may be ineffective when dealing with noisy bands. In CFNR, considering the sensitivity of FCM to noise [39,40], we aim to select the target band subset via a method that can effectively reduce the effect of noise on band selection. Specifically, to obtain the target band subset based on the clustering results, the proposed CFNR approach adopts the information entropy-based method [41,42], by which the band that contains the maximum amount of information in each cluster is selected as the representative band. This method is based on the assumption that bands should be selected according to the amount of information they contain. Specifically, the information entropy of all bands in each cluster is calculated and then sorted in descending order. Subsequently, the first band in each cluster is selected as the representative band. In this study, the information entropy $H(\mathbf{X}_{:,l})$ of band $\mathbf{X}_{:,l}$, $l = 1, 2, \ldots, L$, is calculated by
$$
H(\mathbf{X}_{:,l}) = -\sum_{\omega\in\Theta} p(\omega)\log p(\omega), \tag{38}
$$
where $\omega$ denotes a grayscale value; $\Theta$ denotes a gray space, which contains all grayscale values of band $\mathbf{X}_{:,l}$; and $p(\omega)$ denotes the probability of $\omega$ in band $\mathbf{X}_{:,l}$, which can be calculated using a grayscale histogram. Notably, the effects of noise interference on band selection can be avoided to a certain extent using the information entropy-based method [39].
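A sketch of the entropy-based selection step is given below; quantizing each band to 256 grayscale levels for the histogram is an illustrative choice rather than a specification from the paper.

```python
import numpy as np

def band_entropy(band, n_bins=256):
    """Information entropy of one band (vector of pixel values), Equation (38)."""
    b = band - band.min()
    b = np.floor(b / (b.max() + 1e-12) * (n_bins - 1)).astype(int)   # quantize to gray levels
    p = np.bincount(b, minlength=n_bins).astype(float)
    p /= p.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def select_representative_bands(X, labels):
    """Pick the maximum-entropy band from each cluster.
    X: (N, L) matrix with one band per column; labels: cluster index of each band."""
    selected = []
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        entropies = [band_entropy(X[:, l]) for l in members]
        selected.append(int(members[int(np.argmax(entropies))]))
    return sorted(selected)
```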

4. Results and Discussion

4.1. Datasets

The five HSI datasets used in the experiments are concisely described. Table 1 and Figure 2 show the main information and images of these datasets, respectively.
The Botswana dataset comprises data obtained by the NASA EO-1 satellite over the Okavango Delta, Botswana in 2001. The size of the images in this dataset is 1476 × 256 pixels with a spatial resolution of 30 m. A total of 145 bands were retained in the experiments after removing noisy and uncalibrated bands with numbers (10–15, 82–97, 102–119, 134–164, and 187–220). This dataset has 14 identified classes.
The Pavia University dataset was acquired by the ROSIS sensor over the University of Pavia in Italy. The size of the images in this dataset is 610 × 340 pixels with a spatial resolution of 1.3 m, 115 spectral bands, and 9 classes. A total of 103 bands were retained in the experiments by excluding 12 noisy bands.
Indian Pines, one of the widely used test datasets for hyperspectral classification, comprises data obtained using the AVIRIS sensor in 1992. This dataset comprises images with a size of 145 × 145 pixels and a spatial resolution of 20 m with 220 bands in the spectral range of 400–2500 nm. The bands with numbers (104–108, 150–163, and 220) were removed to reduce the influence of water absorption bands, and the 200 retained bands were used in the experiments. In addition, this dataset includes 16 ground truth classes.
The Salinas dataset comprises data obtained by the AVIRIS sensor over the Salinas Valley in California, USA. The size of images in the Salinas dataset is 512 × 217 pixels with a spatial resolution of 3.7 m. A total of 204 bands were retained in the experiments after removing noisy and uncalibrated bands (108–112, 154–167, and 224). This dataset contains 16 identified classes.
The Pavia Centre dataset was acquired by the ROSIS sensor over the city centre of Pavia in Italy. This dataset originally contains 115 bands; after excluding the bands that do not contain information, 103 bands are retained in the experiments. The dataset has 9 classes of ground cover and an image size of 1096 × 715 pixels.

4.2. Compared Methods

The proposed CFNR is compared with five representative band selection methods to evaluate its performance. These band selection methods are briefly introduced as follows.
1. MVPCA [10]: MVPCA is a representative ranking-based band selection method that first constructs a loading factor matrix from the eigenvalues and eigenvectors of the data covariance matrix. All bands are then ranked in accordance with the loading factor matrix, and the top-ranked bands are ultimately selected as the representative band subset.
2. WaLuDi [15]: According to the correlation measure between bands based on Kullback–Leibler divergence, WaLuDi uses a hierarchical clustering algorithm based on Ward’s linkage method to progressively merge bands into clusters until an ideal subset of bands can be obtained.
3. E-FDPC [16]: As a clustering-based band selection method, E-FDPC divides bands into clusters and calculates the score for each band by weighting the local density and intracluster distance. Based on the obtained scores, E-FDPC selects the bands with high scores as representative bands. In addition, E-FDPC can automatically determine the optimal number of representative bands through the introduction of an isolated-point-stopping criterion.
4. ASPS [9]: ASPS is a clustering-based band selection method that first roughly divides the HSI to obtain a limited number of equal-sized subcubes. Subsequently, ASPS adjusts the subcubes by using the intercluster to intracluster distance ratio to obtain the subcubes with low correlation. Finally, ASPS selects the band with the least noise from each subcube as a representative band to form the target band subset.
5. HLFC [17]: HLFC is also a clustering-based band selection method. HLFC separates an HSI into multiple regions through a superpixel segmentation algorithm and then learns the corresponding low-dimensional latent features of each region. Subsequently, all the latent features of the regions are integrated into a unified feature representation of the HSI. Finally, HLFC performs K-means clustering on the unified feature representation, and the representative bands are selected from the obtained clusters using the information entropy-based method.

4.3. Experimental Setup

Two classifiers, namely linear discriminant analysis (LDA) [43] and support vector machine (SVM) [44], which adopts the radial basis function as the kernel function, are employed in the experiments to test the performance of the proposed methods. The proposed method is then compared with five representative band selection methods, comprising the clustering-based methods WaLuDi, E-FDPC, ASPS, HLFC, and the classical ranking-based method MVPCA. All the methods were implemented using MATLAB 2016b and executed on a computer using the Windows 10 operating system and Intel Core i7-9700K 3.60 GHz CPU.
The number of selected bands K in the experiments ranges from 5 to 50 with an interval of 5. Each experiment is performed 10 times, and the average results are reported. In CFNR, the matrix $\mathbf{U}$ is randomly initialized in [0, 1], and $\mathbf{A}$ and $\mathbf{S}$ are initialized using K-means [45]. Twenty percent of the samples are randomly selected for training, and the rest are used for prediction. The values of the tolerance $\varepsilon$, maximum number of iterations T, and penalty parameter $\rho$ are set to $10^{-5}$, 200, and $10^{-3}$, respectively. Each of the Lagrange multipliers $\mathbf{Z}_{1}$–$\mathbf{Z}_{7}$ is initialized as an all-ones matrix. Moreover, the model has four hyperparameters: the three regularization parameters $\alpha$, $\lambda$, and $\beta$ and the dimension P of the non-negative representation. The empirical values of these regularization parameters are shown in Table 2. For the dimension P of the non-negative representation, an excessively large value of P results in a loss of explanation ability for the low-dimensional representation learning, decreasing its benefits in band selection. Conversely, an excessively small value of P may lead to substantial information loss. Therefore, the value of P is empirically set to 40. In addition, Table 3, Table 4, Table 5, Table 6 and Table 7 show the number of training and testing samples for each class in the five datasets. The experimental performance is evaluated using three criteria: overall accuracy (OA), average overall accuracy (AOA), and Kappa coefficient (Kappa).
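For readers who wish to reproduce the evaluation protocol, the following sketch trains an RBF-kernel SVM on a random 20% split of the labeled pixels restricted to the selected bands and reports OA and Kappa; scikit-learn is used here purely as an illustration, since the original experiments were carried out in MATLAB.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

def evaluate_band_subset(pixels, labels, band_idx, train_ratio=0.2, seed=0):
    """OA and Kappa of an RBF-kernel SVM trained on the selected bands.
    pixels: (num_labeled_pixels, L) spectra; band_idx: indices of the selected bands."""
    Xs = pixels[:, band_idx]
    X_tr, X_te, y_tr, y_te = train_test_split(
        Xs, labels, train_size=train_ratio, stratify=labels, random_state=seed)
    clf = SVC(kernel='rbf').fit(X_tr, y_tr)
    y_pred = clf.predict(X_te)
    return accuracy_score(y_te, y_pred), cohen_kappa_score(y_te, y_pred)
```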

4.4. Experimental Results

In this section, the results of a series of experiments conducted on five real datasets to demonstrate the effectiveness of our proposed CFNR method are presented.

4.4.1. Classification Performance Comparison

Table 8 shows the values of AOA and Kappa obtained using different methods on the five datasets, where AOA and Kappa represent the average performance over the number of bands ranging from 5 to 50 with an interval of 5. In Table 8, the columns represent the dataset, classifier, and names of the methods, and the rows represent the classification accuracy of the dataset with different methods. The values in red font represent the best results. As shown in Table 8, the classification performance of the proposed CFNR outperforms that of the other methods on all five datasets. The performance of HLFC is the second best when using the SVM classifier on the Pavia University and Indian Pines datasets. When using the LDA classifier, the Kappa of HLFC is the same as that of our proposed method on the Pavia University dataset. MVPCA performs poorly on all five datasets when using the SVM classifier, with CFNR, HLFC, ASPS, and WaLuDi providing better performance. For example, for the Botswana dataset, HLFC and ASPS exhibit good performance when using the SVM, but CFNR yields superior results. CFNR demonstrates a better AOA than MVPCA, WaLuDi, E-FDPC, ASPS, and HLFC by 10.40%, 1.58%, 14.98%, 0.09%, and 1.06%, respectively. Similar performance is demonstrated in the case of the other four datasets when using the SVM classifier. The advantage of CFNR on the Indian Pines and Pavia University datasets is not evident compared with HLFC and ASPS in the LDA classifier. However, CFNR still achieves good results in the LDA classifier. Overall, the effectiveness of the CFNR method is demonstrated by comparing its AOA and Kappa with those of other methods. In addition, Figure 3, Figure 4, Figure 5, Figure 6 and Figure 7 show the curves of the OA values, based on which all six band selection methods are compared when using SVM and LDA classifiers on the five datasets.
(1) Botswana dataset: Figure 3a,b shows the results of using the SVM and LDA classifiers on the Botswana dataset. According to Figure 3a, CFNR provides satisfactory performance for most of the selected bands. For example, CFNR achieves excellent performance when 10, 40, and 45 bands are selected. When 25 and 35 bands are selected, the OA values of CFNR are similar to those of ASPS, but remain higher than those of the other methods. CFNR demonstrates the second-best performance when the number of selected bands is 15, 20, and 30. In particular, in the case of 50 selected bands, the performance of CFNR is similar to that of HLFC, while surpassing those of WaLuDi, ASPS, MVPCA, and E-FDPC. HLFC demonstrates the second-best performance when the number of selected bands is 10. The performance of CFNR is considerably better than that of HLFC when 50 bands are selected. Although the performance of CFNR is similar to that of WaLuDi when 5 bands are selected, CFNR performs better than HLFC, ASPS, MVPCA, and E-FDPC. Furthermore, CFNR exhibits excellent performance when using the LDA classifier. As shown in Figure 3b, CFNR exhibits the best performance when the number of selected bands ranges from 5 to 25. Although CFNR performs slightly worse than ASPS when 30, 40, and 50 bands are selected, CFNR still outperforms HLFC, WaLuDi, and E-FDPC. When 45 bands are selected, the performance of CFNR is similar to that of WaLuDi but is still better than that of HLFC, E-FDPC, and MVPCA.
(2) Pavia University dataset: Figure 4a,b verifies the performance of CFNR on the Pavia University dataset. As shown in Figure 4a, CFNR achieves superior results when using the SVM classifier. For example, except when 10 and 30 bands are selected, the proposed method achieves excellent performance. Although the OA of CFNR is inferior to that of ASPS when 30 bands are selected, it still outperforms the other methods. At 10 bands, the performance of CFNR is similar to that of HLFC and better than those of WaLuDi, ASPS, E-FDPC, and MVPCA. When 10 and 15 bands are selected, the performance of CFNR is similar to that of HLFC. In the rest of the cases, the performances of CFNR exceed those of HLFC. Considering the results of the LDA classifier shown in Figure 4b, the performance of CFNR is not inferior to the other methods. Specifically, in cases of selecting 20, 45, and 50 bands, the performances of CFNR are similar to those of HLFC and ASPS and superior to those of WaLuDi, MVPCA, and E-FDPC. When 10 bands are selected, CFNR performs similarly to HLFC, and it outperforms ASPS, WaLuDi, E-FDPC, and MVPCA. When 30 bands are selected, the performance of CFNR is superior to those of WaLuDi, MVPCA, and E-FDPC, while HLFC achieves the second-best performance and is better than CFNR. Furthermore, the OA values of CFNR, ASPS, WaLuDi, and HLFC are similar when 40 bands are selected, but CFNR still performs slightly better than the other methods.
(3) Indian Pines dataset: Similarly, for the Indian Pines dataset, Figure 5a,b shows that our proposed method exhibits outstanding performance compared with that of the other methods. In particular, CFNR has a distinct advantage in experiments conducted on the SVM classifier. As shown in Figure 5a, our proposed method works best on almost all bands in the SVM classifier. CFNR achieves satisfactory classification performance when the numbers of selected bands are 5–25 and 45. When 30 and 50 bands are selected, the performance of CFNR is similar to that of HLFC and better than that of WaLuDi, MVPCA, and E-FDPC. In other cases, CFNR performs no worse than the other methods. According to Figure 5b, the performance of CFNR is superior for most of the selected bands when using the LDA classifier. At 5, 25, 45, and 50 bands, CFNR demonstrates excellent performance. CFNR achieves the second-best performance with 20 selected bands. Moreover, when the number of selected bands is 15 and 35, the performance of CFNR is similar to that of WaLuDi and superior to the other methods. When 5 bands are selected, HLFC demonstrates the second-best performance. At 10 bands, the performances of CFNR, WaLuDi, and HLFC are similar and superior to those of ASPS, E-FDPC, and MVPCA.
(4) Salinas dataset: Figure 6a,b shows the results for the SVM and LDA classifiers on the Salinas dataset. According to Figure 6a, CFNR outperforms most methods when using the SVM classifier. Specifically, CFNR demonstrates excellent performance when 5 bands are selected. For 10–30 selected bands, the OA values of CFNR are the second best. For 35–50 selected bands, the performance of CFNR is similar to that of ASPS and surpasses those of WaLuDi, HLFC, MVPCA, and E-FDPC. In addition, the advantage of CFNR is more apparent when using the LDA classifier, as shown in Figure 6b. In Figure 6b, the OA of CFNR is the best when the number of selected bands ranges from 10 to 35. CFNR performs similarly to HLFC when the number of selected bands ranges from 40 to 50 and still outperforms ASPS, WaLuDi, and E-FDPC. When 5 bands are selected, the performance of CFNR is similar to that of WaLuDi, E-FDPC, and HLFC but still better than those of ASPS and MVPCA.
(5) Pavia Centre dataset: Figure 7a,b verifies the performance of CFNR on the Pavia Centre dataset. CFNR exhibits good performance for the SVM classifier in Figure 7a. For example, when the number of selected bands is 5, the proposed CFNR method achieves the second-best performance, and its performance is better than those of MVPCA, E-FDPC, ASPS, and HLFC. At 10 bands, the OA of CFNR is similar to that of E-FDPC and better than those of MVPCA, WaLuDi, ASPS, and HLFC. When the number of selected bands ranges from 20 to 50, CFNR has a slight advantage over the other methods. Considering the results of the LDA classifier shown in Figure 7b, the performance of CFNR is not inferior to those of the other methods. In particular, CFNR performs best when the number of selected bands is 5, 15, and 20. When 25–50 bands are selected, the performances of CFNR are similar to those of HLFC and ASPS but superior to those of E-FDPC, MVPCA, and WaLuDi.
In addition, to provide an intuitive description of the quality of the bands selected by CFNR, Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12 display the classification maps afforded by SVM and LDA when CFNR is used to select 30 bands on each of the 5 datasets. By comparing the ground truth and classification maps afforded by SVM and LDA as shown in Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12, we can see that CFNR evidently demonstrates satisfactory results under the condition of removing 79%, 85%, 70%, 85%, and 70% of the bands in the Botswana, Indian Pines, Pavia University, Salinas, and Pavia Centre datasets, respectively.
Overall, our proposed CFNR method shows satisfactory results on the five datasets. According to the experimental results, the performance of CFNR is outstanding when using the SVM classifier. In addition, although CFNR does not perform as well with the LDA classifier as with the SVM classifier, it is still superior to the other comparison methods in many cases. Therefore, CFNR provides good classification and can select a band subset that meets the requirements of hyperspectral classification applications, verifying the effectiveness of our method.

4.4.2. Convergence Analysis

Figure 13 shows the convergence curves of CFNR on the five datasets for the case of 30 selected bands to demonstrate the convergence of the CFNR method. The CFNR algorithm is run for 50 iterations on each of the 5 datasets, and the normalized cost of the CFNR objective function is plotted. The algorithm converges after 15 iterations on the Indian Pines dataset. On the Pavia University and Botswana datasets, the proposed method converges after approximately 35 and 5 iterations, respectively. When CFNR is tested on the Salinas and the Pavia Centre datasets, it converges after 7 and 9 iterations, respectively.

4.5. Experimental Discussion

The findings from our experiments as well as the advantages and limitations of our method are presented in this section; moreover, suggestions for future work are presented.
1. Principal findings and comparison with other studies. CFNR is compared with ranking-based and clustering-based band selection methods. The experimental results in Figure 3, Figure 4, Figure 5, Figure 6 and Figure 7 reveal that the ranking-based and clustering-based methods exhibit relatively poor performance when the number of selected bands is small. This finding implies that it is difficult for a small number of bands to provide sufficient information. In addition, as shown in Table 8 and Figure 3, Figure 4, Figure 5, Figure 6 and Figure 7, most clustering-based methods outperform ranking-based methods. The presumed reason is that the ranking-based band selection methods are typically based on a single criterion. These findings are consistent with those of previous studies [46,47]. Moreover, as shown in Figure 3, Figure 4, Figure 5, Figure 6 and Figure 7, the OA values of all methods increase with the increasing number of bands, but the rate of increase becomes progressively slower. This is because as the number of selected bands increases, more feature information is included in the representative bands, but the redundancy among them also increases [48].
2. Advantages of the proposed method. According to the experiments shown in Figure 3, Figure 4, Figure 5, Figure 6 and Figure 7 and Table 8, the main advantages of CFNR lie in its superior performance with the SVM classifier and its robustness across datasets. As shown in Figure 3, Figure 4, Figure 5, Figure 6 and Figure 7, compared with other clustering-based band selection methods, the proposed CFNR shows good performance for all 5 datasets when the numbers of selected bands are 5–10 and 35–50 in the SVM classifier. On the one hand, the non-negative representation based on manifold learning in CFNR successfully finds low-dimensional discriminative representations for clustering the HSIs. On the other hand, the constrained FCM model in CFNR provides the improved clustering results required for band selection tasks. Table 8 shows that most methods perform substantially worse on the Indian Pines dataset than on the other four datasets, which may be because important feature information is contained in the removed noisy bands of the Indian Pines dataset [9]. However, the CFNR method still demonstrates better performance than the other methods on the Indian Pines dataset when using SVM. This finding indicates that the bands selected by our proposed method are highly discriminative for the SVM classifier.
3. Limitations of the study. One limitation of our proposed approach lies in the presence of four hyperparameters, which introduces some inconvenience to the application of the proposed CFNR method. Nevertheless, the excellent performance of the proposed method compensates for this limitation to some extent. Furthermore, the influence of noise is considered to a certain extent using the information entropy-based method in CFNR. However, how noise is handled merits further consideration. Determining the hyperparameter values through an adaptive solution and removing noise in representation learning should also be emphasized in future research.

5. Conclusions

A novel hyperspectral band selection method named CFNR is introduced in this study. In the proposed method, GNMF is integrated into the FCM model, by which clustering can be performed on the discriminative non-negative representation of all bands of a target HSI. Specifically, by exploiting the intrinsic manifold structure of HSIs with the help of GNMF, the discriminative non-negative representation of each band is determined. A correlation constraint is imposed on the membership matrix in the model of the proposed method to exploit the band correlation property of HSIs. Consequently, the similarity of clustering assignments among neighboring bands is enforced. This condition is favorable for obtaining clustering results that are consistent with the requirements of band selection. In addition, the proposed approach adopts the information entropy-based method to select a representative band subset from the obtained clusters. Compared with existing clustering-based band selection methods, CFNR designs an effective joint learning model of clustering and representation learning for band selection. As a result, clustering can be performed on the discriminative non-negative representation of all bands rather than on the original high-dimensional hyperspectral bands. Additionally, ADMM is used to provide an optimized solution for the proposed CFNR model. Various experiments on the Indian Pines, Botswana, Pavia University, Salinas, and Pavia Centre datasets indicate that the proposed method can provide superior performance compared with several state-of-the-art methods.

Author Contributions

Conceptualization, Z.L. and W.W.; Data curation, Z.L.; Formal analysis, Z.L. and W.W.; Funding acquisition, W.W.; Investigation, Z.L. and W.W.; Methodology, Z.L. and W.W.; Project administration, W.W.; Resources, W.W.; Software, Z.L.; Supervision, W.W.; Validation, Z.L.; Visualization, Z.L. and W.W.; Writing—original draft, Z.L. and W.W.; Writing—review and editing, Z.L. and W.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Discipline with Strong Characteristics of Liaocheng University—Intelligent Science and Technology under Grant 319462208.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We would like to thank Qi Wang (Computer Science and the Center for Optical Imagery Analysis and Learning, Northwestern Polytechnical University, Xi’an, China) and Chang Tang (School of Computer Science and the Key Laboratory of Geological Survey and Evaluation of Ministry of Education, China University of Geosciences, Wuhan, China) for sharing the source codes.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
HSIs: Hyperspectral remote sensing images
MVPCA: Maximum-variance principal component analysis
WaLuDi: Ward's linkage strategy using divergence
E-FDPC: Enhanced fast density-peak-based clustering
ASPS: Adaptive subspace partition strategy
HLFC: Region-aware hierarchical latent feature representation learning-guided clustering
NMF: Non-negative matrix factorization
GNMF: Graph regularized non-negative matrix factorization
FCM: Fuzzy C-means
ADMM: Alternating direction multiplier method
SVM: Support vector machine
LDA: Linear discriminant analysis
OA: Overall accuracy
AOA: Average overall accuracy
CFNR: Joint learning of correlation-constrained fuzzy clustering and discriminative non-negative representation

References

  1. Wang, C.; Gong, M.; Zhang, M.; Chan, Y. Unsupervised Hyperspectral Image Band Selection via Column Subset Selection. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1411–1415. [Google Scholar] [CrossRef]
  2. Gao, C.; Wu, Y.; Hao, X. Hierarchical Suppression Based Matched Filter for Hyperspertral Imagery Target Detection. Sensors 2021, 21, 144. [Google Scholar] [CrossRef] [PubMed]
  3. Song, M.; Yu, C.; Xie, H.; Chang, C.-I. Progressive Band Selection Processing of Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2020, 17, 1762–1766. [Google Scholar] [CrossRef]
  4. Brabant, C.; Alvarez-Vanhard, E.; Laribi, A.; Morin, G.; Thanh Nguyen, K.; Thomas, A.; Houet, T. Comparison of Hyperspectral Techniques for Urban Tree Diversity Classification. Remote Sens. 2019, 11, 1269. [Google Scholar] [CrossRef]
  5. Angelopoulou, T.; Chabrillat, S.; Pignatti, S.; Milewski, R.; Karyotis, K.; Brell, M.; Ruhtz, T.; Bochtis, D.; Zalidis, G. Evaluation of Airborne HySpex and Spaceborne PRISMA Hyperspectral Remote Sensing Data for Soil Organic Matter and Carbonates Estimation. Remote Sens. 2023, 15, 1106. [Google Scholar] [CrossRef]
  6. Baisantry, M.; Sao, A.K.; Shukla, D.P. Band Selection Using Combined Divergence–Correlation Index and Sparse Loadings Representation for Hyperspectral Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5011–5026. [Google Scholar] [CrossRef]
  7. Jia, Y.; Shi, Y.; Luo, J.; Sun, H. Y-Net: Identification of Typical Diseases of Corn Leaves Using a 3D-2D Hybrid CNN Model Combined with a Hyperspectral Image Band Selection Module. Sensors 2023, 23, 1494. [Google Scholar] [CrossRef]
  8. Wei, Y.; Hu, H.; Xu, H.; Mao, X. Unsupervised Hyperspectral Band Selection via Multimodal Evolutionary Algorithm and Subspace Decomposition. Sensors 2023, 23, 2129. [Google Scholar] [CrossRef]
  9. Wang, Q.; Li, Q.; Li, X. Hyperspectral Band Selection via Adaptive Subspace Partition Strategy. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2019, 12, 4940–4950. [Google Scholar] [CrossRef]
  10. Chang, C.-I.; Du, Q.; Sun, T.-L.; Althouse, M. A joint band prioritization and band-decorrelation approach to band selection for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 1999, 37, 2631–2641. [Google Scholar] [CrossRef]
  11. Datta, A.; Ghosh, S.; Ghosh, A. Combination of Clustering and Ranking Techniques for Unsupervised Band Selection of Hyperspectral Images. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2015, 8, 2814–2823. [Google Scholar] [CrossRef]
  12. Zeng, M.; Cai, Y.; Cai, Z.; Liu, X.; Hu, P.; Ku, J. Unsupervised Hyperspectral Image Band Selection Based on Deep Subspace Clustering. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1889–1893. [Google Scholar] [CrossRef]
  13. Geng, X.; Sun, K.; Ji, L.; Zhao, Y. A Fast Volume-Gradient-Based Band Selection Method for Hyperspectral Image. IEEE Trans. Geosci. Remote Sens. 2014, 52, 7111–7119. [Google Scholar] [CrossRef]
  14. Yang, H.; Du, Q.; Chen, G. Particle Swarm Optimization-Based Hyperspectral Dimensionality Reduction for Urban Land Cover Classification. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2012, 5, 544–554. [Google Scholar] [CrossRef]
  15. Martínez-Usó, A.; Pla, F.; Sotoca, J.M.; García-Sevilla, P. Clustering-Based Hyperspectral Band Selection Using Information Measures. IEEE Trans. Geosci. Remote Sens. 2007, 45, 4158–4171. [Google Scholar] [CrossRef]
  16. Jia, S.; Tang, G.; Zhu, J.; Li, Q. A Novel Ranking-Based Clustering Approach for Hyperspectral Band Selection. IEEE Trans. Geosci. Remote Sens. 2016, 54, 88–102. [Google Scholar] [CrossRef]
  17. Wang, J.; Tang, C.; Liu, X.; Zhang, W.; Li, W.; Zhu, X.; Wang, L.; Zomaya, A.Y. Region-Aware Hierarchical Latent Feature Representation Learning-Guided Clustering for Hyperspectral Band Selection. IEEE Trans. Cybern. 2022. early access. [Google Scholar] [CrossRef]
  18. He, H.; Tan, Y. Unsupervised Classification of Multivariate Time Series Using VPCA and Fuzzy Clustering With Spatial Weighted Matrix Distance. IEEE Trans. Cybern. 2020, 50, 1096–1105. [Google Scholar] [CrossRef]
  19. Gu, J.; Jiao, L.; Yang, S.; Liu, F. Fuzzy Double C-Means Clustering Based on Sparse Self-Representation. IEEE Trans. Fuzzy Syst. 2018, 26, 612–626. [Google Scholar] [CrossRef]
  20. Lu, X.; Dong, L.; Yuan, Y. Subspace Clustering Constrained Sparse NMF for Hyperspectral Unmixing. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3007–3019. [Google Scholar] [CrossRef]
  21. Zhang, X.; Pan, W.; Wu, Z.; Chen, J.; Mao, Y.; Wu, R. Robust Image Segmentation Using Fuzzy C-Means Clustering with Spatial Information Based on Total Generalized Variation. IEEE Access 2020, 8, 95681–95697. [Google Scholar] [CrossRef]
  22. Chang, X.; Wang, Q.; Liu, Y.; Wang, Y. Sparse Regularization in Fuzzy c -Means for High-Dimensional Data Clustering. IEEE Trans. Cybern. 2017, 47, 2616–2627. [Google Scholar] [CrossRef] [PubMed]
  23. Zhou, J.; Pedrycz, W.; Gao, C.; Lai, Z.; Wan, J.; Ming, Z. Robust Jointly Sparse Fuzzy Clustering With Neighborhood Structure Preservation. IEEE Trans. Fuzzy Syst. 2022, 30, 1073–1087. [Google Scholar] [CrossRef]
  24. Wang, C.; Pedrycz, W.; Zhou, M.; Li, Z. Sparse Regularization-Based Fuzzy C-Means Clustering Incorporating Morphological Grayscale Reconstruction and Wavelet Frames. IEEE Trans. Fuzzy Syst. 2021, 29, 1826–1840. [Google Scholar] [CrossRef]
  25. Yang, Z.; Zhang, Y.; Xiang, Y.; Yan, W.; Xie, S. Non-Negative Matrix Factorization With Dual Constraints for Image Clustering. IEEE Trans. Syst. Man Cybern. 2020, 50, 2524–2533. [Google Scholar] [CrossRef]
  26. Wang, Y.; Zhang, Y. Nonnegative Matrix Factorization: A Comprehensive Review. IEEE Trans. Knowl. Data Eng. 2013, 25, 1336–1353. [Google Scholar] [CrossRef]
  27. Feng, X.; Li, H.; Wang, R.; Du, Q.; Jia, X.; Plaza, A. Hyperspectral Unmixing Based on Nonnegative Matrix Factorization: A Comprehensive Review. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2022, 15, 4414–4436. [Google Scholar] [CrossRef]
  28. Cai, D.; He, X.; Han, J.; Huang, T.S. Graph Regularized Nonnegative Matrix Factorization for Data Representation. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 1548–1560. [Google Scholar]
  29. Lu, X.; Wu, H.; Yuan, Y.; Yan, P.; Li, X. Manifold Regularized Sparse NMF for Hyperspectral Unmixing. IEEE Trans. Geosci. Remote Sens. 2013, 51, 2815–2826. [Google Scholar] [CrossRef]
  30. Rudin, L.I.; Osher, S.; Fatemi, E. Nonlinear total variation based noise removal algorithms. Physica D 1992, 60, 259–268. [Google Scholar] [CrossRef]
  31. Afonso, M.V.; Bioucas-Dias, J.M.; Figueiredo, M.A.T. An Augmented Lagrangian Approach to the Constrained Optimization Formulation of Imaging Inverse Problems. IEEE Trans. Image Process. 2011, 20, 681–695. [Google Scholar] [CrossRef]
  32. Iordache, M.D.; Bioucas-Dias, J.M.; Plaza, A. Total Variation Spatial Regularization for Sparse Hyperspectral Unmixing. IEEE Trans. Geosci. Remote Sens. 2012, 50, 4484–4502. [Google Scholar] [CrossRef]
  33. Li, J.; Qian, Y. Clustering-based hyperspectral band selection using sparse nonnegative matrix factorization. J. Zhejiang Univ. Sci. C 2011, 12, 542–549. [Google Scholar] [CrossRef]
  34. Eckstein, J.; Bertsekas, D.P. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 1992, 55, 293–318. [Google Scholar] [CrossRef]
  35. Guo, L.; Chen, L.; Lu, X.; Chen, C.L.P. Membership Affinity Lasso for Fuzzy Clustering. IEEE Trans. Fuzzy Syst. 2020, 28, 294–307. [Google Scholar] [CrossRef]
  36. Zhang, W.; Xue, X.; Zheng, X.; Fan, Z. NMFLRR: Clustering scRNA-Seq Data by Integrating Nonnegative Matrix Factorization With Low Rank Representation. IEEE J. Biomed. Health Inform. 2022, 26, 1394–1405. [Google Scholar] [CrossRef]
  37. Sun, W.; Peng, J.; Yang, G.; Du, Q. Correntropy-Based Sparse Spectral Clustering for Hyperspectral Band Selection. IEEE Geosci. Remote Sens. Lett. 2020, 17, 484–488. [Google Scholar] [CrossRef]
  38. He, C.; Zhang, Y.; Gong, D.; Song, X.; Sun, X. A Multitask Bee Colony Band Selection Algorithm With Variable-Size Clustering for Hyperspectral Images. IEEE Trans. Evol. Comput. 2022, 26, 1566–1580. [Google Scholar] [CrossRef]
  39. Zhang, M.; Ma, J.; Gong, M. Unsupervised Hyperspectral Band Selection by Fuzzy Clustering With Particle Swarm Optimization. IEEE Geosci. Remote Sens. Lett. 2017, 14, 773–777. [Google Scholar] [CrossRef]
  40. Zhang, Z.; Wang, D.; Sun, X.; Zhuang, L.; Liu, R.; Ni, L. Spatial Sampling and Grouping Information Entropy Strategy Based on Kernel Fuzzy C-Means Clustering Method for Hyperspectral Band Selection. Remote Sens. 2022, 14, 5058. [Google Scholar] [CrossRef]
  41. Li, S.; Peng, B.; Fang, L.; Li, Q. Hyperspectral Band Selection via Optimal Combination Strategy. Remote Sens. 2022, 14, 2858. [Google Scholar] [CrossRef]
  42. Sun, H.; Ren, J.; Zhao, H.; Yuen, P.; Tschannerl, J. Novel Gumbel-Softmax Trick Enabled Concrete Autoencoder with Entropy Constraints for Unsupervised Hyperspectral Band Selection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13. [Google Scholar] [CrossRef]
  43. Bandos, T.V.; Bruzzone, L.; Camps-Valls, G. Classification of Hyperspectral Images With Regularized Linear Discriminant Analysis. IEEE Trans. Geosci. Remote Sens. 2009, 47, 862–873. [Google Scholar] [CrossRef]
  44. Melgani, F.; Bruzzone, L. Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1778–1790. [Google Scholar] [CrossRef]
  45. Gillis, N.; Kuang, D.; Park, H. Hierarchical Clustering of Hyperspectral Images Using Rank-Two Nonnegative Matrix Factorization. IEEE Trans. Geosci. Remote Sens. 2015, 53, 2066–2078. [Google Scholar] [CrossRef]
  46. Sun, X.; Shen, X.; Pang, H.; Fu, X. Multiple Band Prioritization Criteria-Based Band Selection for Hyperspectral Imagery. Remote Sens. 2022, 14, 5679. [Google Scholar] [CrossRef]
  47. Wang, W.; Wang, W.; Liu, H. Correlation-Guided Ensemble Clustering for Hyperspectral Band Selection. Remote Sens. 2022, 14, 1156. [Google Scholar] [CrossRef]
  48. Yuan, Y.; Lin, J.; Wang, Q. Dual-Clustering-Based Hyperspectral Band Selection by Contextual Analysis. IEEE Trans. Geosci. Remote Sens. 2016, 54, 1431–1445. [Google Scholar] [CrossRef]
Figure 1. Flowchart of the proposed CFNR method.
Figure 2. Images of the five HSI datasets. (a) Botswana. (b) Pavia University. (c) Indian Pines. (d) Salinas. (e) Pavia Centre.
Figure 3. OA of the SVM and LDA classifiers with different numbers of selected bands on the Botswana dataset. (a) OA by SVM. (b) OA by LDA.
Figure 4. OA of the SVM and LDA classifiers with different numbers of selected bands on the Pavia University dataset. (a) OA by SVM. (b) OA by LDA.
Figure 5. OA of the SVM and LDA classifiers with different numbers of selected bands on the Indian Pines dataset. (a) OA by SVM. (b) OA by LDA.
Figure 6. OA of the SVM and LDA classifiers with different numbers of selected bands on the Salinas dataset. (a) OA by SVM. (b) OA by LDA.
Figure 7. OA of the SVM and LDA classifiers with different numbers of selected bands on the Pavia Centre dataset. (a) OA by SVM. (b) OA by LDA.
Figure 8. Classification map of CFNR on the Botswana dataset when the number of selected bands is 30. (a) Ground truth. (b) SVM. (c) LDA.
Figure 9. Classification map of CFNR on the Pavia University dataset when the number of selected bands is 30. (a) Ground truth. (b) SVM. (c) LDA.
Figure 10. Classification map of CFNR on the Indian Pines dataset when the number of selected bands is 30. (a) Ground truth. (b) SVM. (c) LDA.
Figure 11. Classification map of CFNR on the Salinas dataset when the number of selected bands is 30. (a) Ground truth. (b) SVM. (c) LDA.
Figure 12. Classification map of CFNR on the Pavia Centre dataset when the number of selected bands is 30. (a) Ground truth. (b) SVM. (c) LDA.
Figure 13. Convergence curves of our proposed method on the five datasets.
Table 1. Information of the five HSI datasets.
Dataset | Pixels | Spatial Resolution | Classes | Bands
Botswana | 1476 × 256 | 30 m/pixel | 14 | 145
Pavia University | 610 × 340 | 1.3 m/pixel | 9 | 103
Indian Pines | 145 × 145 | 20 m/pixel | 16 | 200
Salinas | 512 × 217 | 3.7 m/pixel | 16 | 204
Pavia Centre | 1096 × 715 | 1.3 m/pixel | 9 | 102
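The datasets in Table 1 are commonly distributed as MATLAB .mat files containing a three-dimensional cube of size rows × columns × bands. The listing below is a minimal, illustrative sketch (the file and variable names are assumptions, not the authors' exact files) of how such a cube can be rearranged into the bands × pixels matrix on which clustering-based band selection methods such as CFNR operate.

```python
# Minimal sketch: load an HSI cube and arrange it as a band matrix X (bands x pixels),
# the layout typically used by clustering-based band selection methods such as CFNR.
# Assumption: the dataset is stored as a MATLAB .mat file; file/variable names are illustrative.
import numpy as np
from scipy.io import loadmat

def load_band_matrix(mat_path: str, var_name: str) -> np.ndarray:
    cube = loadmat(mat_path)[var_name].astype(np.float64)   # shape (H, W, B)
    h, w, b = cube.shape
    x = cube.reshape(h * w, b).T                             # shape (B, H*W): one row per band
    # Normalize each band to [0, 1] so distances between bands are comparable.
    x_min = x.min(axis=1, keepdims=True)
    x_max = x.max(axis=1, keepdims=True)
    return (x - x_min) / (x_max - x_min + 1e-12)

# Hypothetical usage for the Indian Pines scene:
# X = load_band_matrix("Indian_pines_corrected.mat", "indian_pines_corrected")
# X.shape -> (200, 145 * 145), matching the 200 bands listed in Table 1.
```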
Table 2. Hyperparameter settings for the five datasets.
Dataset | α | λ | β
Botswana | 10 | 1.2 | 0.02
Pavia University | 12 | 1.1 | 0.03
Indian Pines | 12 | 1.2 | 0.01
Salinas | 11 | 1.2 | 0.02
Pavia Centre | 12 | 1.1 | 0.01
Table 3. Number of training and testing samples in the Botswana dataset.
Class | Name | Training Samples | Testing Samples
1 | Water | 54 | 216
2 | Hippo grass | 20 | 81
3 | Floodplain grasses 1 | 50 | 201
4 | Floodplain grasses 2 | 43 | 172
5 | Reeds | 54 | 215
6 | Riparian | 54 | 215
7 | Firescar | 52 | 207
8 | Island interior | 41 | 162
9 | Acacia woodlands | 63 | 251
10 | Acacia shrublands | 50 | 198
11 | Acacia grasslands | 61 | 244
12 | Short mopane | 36 | 145
13 | Mixed mopane | 54 | 214
14 | Exposed soils | 19 | 76
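The splits in Tables 3–7 correspond to drawing roughly 20% of the labeled pixels of each class for training and using the remaining labeled pixels for testing. The snippet below is a minimal sketch of such a per-class (stratified) split; the 20% ratio, the fixed random seed, and the treatment of unlabeled background pixels are illustrative assumptions rather than the authors' exact protocol.

```python
# Minimal sketch of a per-class (stratified) train/test split like the ones tabulated above.
# Assumptions: labels is a 1-D array of class ids with 0 meaning unlabeled; the 20% ratio
# and the seed are illustrative choices, not necessarily the authors' exact protocol.
import numpy as np

def stratified_split(labels: np.ndarray, train_ratio: float = 0.2, seed: int = 0):
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        if c == 0:                      # skip unlabeled background pixels
            continue
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        n_train = max(1, int(round(train_ratio * idx.size)))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)
```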
Table 4. Number of training and testing samples in the Pavia University dataset.
Class | Name | Training Samples | Testing Samples
1 | Asphalt | 1326 | 5305
2 | Meadows | 3730 | 14,919
3 | Gravel | 420 | 1679
4 | Trees | 613 | 2451
5 | Painted metal sheets | 269 | 1076
6 | Bare Soil | 1006 | 4023
7 | Bitumen | 266 | 1064
8 | Self-Blocking Bricks | 736 | 2946
9 | Shadows | 1894 | 7576
Table 5. Number of training and testing samples in the Indian Pines dataset.
Class | Name | Training Samples | Testing Samples
1 | Alfalfa | 9 | 37
2 | Corn-notill | 286 | 1142
3 | Corn-mintill | 166 | 664
4 | Corn | 47 | 190
5 | Grass-pasture | 97 | 386
6 | Grass-trees | 146 | 584
7 | Grass-pasture-mowed | 6 | 22
8 | Hay-windrowed | 96 | 382
9 | Oats | 4 | 16
10 | Soybean-notill | 194 | 778
11 | Soybean-mintill | 491 | 1964
12 | Soybean-clean | 119 | 474
13 | Wheat | 41 | 164
14 | Woods | 253 | 1012
15 | Buildings-Grass-Trees-Drives | 77 | 309
16 | Stone-Steel-Towers | 19 | 74
Table 6. Number of training and testing samples in the Salinas dataset.
Class | Name | Training Samples | Testing Samples
1 | Brocoli_green_weeds_1 | 402 | 2009
2 | Brocoli_green_weeds_2 | 745 | 3726
3 | Fallow | 395 | 1976
4 | Fallow_rough_plow | 279 | 1394
5 | Fallow_smooth | 536 | 2678
6 | Stubble | 792 | 3959
7 | Celery | 716 | 3579
8 | Grapes_untrained | 2254 | 11,271
9 | Soil_vinyard_develop | 1241 | 6203
10 | Corn_senesced_green_weeds | 656 | 3278
11 | Lettuce_romaine_4wk | 214 | 1068
12 | Lettuce_romaine_5wk | 385 | 1927
13 | Lettuce_romaine_6wk | 16 | 916
14 | Lettuce_romaine_7wk | 203 | 1017
15 | Vinyard_untrained | 1454 | 7268
16 | Vinyard_vertical_trellis | 361 | 1807
Table 7. Number of training and testing samples in the Pavia Centre dataset.
Class | Name | Training Samples | Testing Samples
1 | Water | 168 | 842
2 | Trees | 164 | 820
3 | Asphalt | 163 | 816
4 | Self-Blocking Bricks | 162 | 808
5 | Bitumen | 162 | 808
6 | Tiles | 252 | 1260
7 | Shadows | 95 | 476
8 | Meadows | 16 | 824
9 | Bare Soil | 164 | 820
Table 8. Performance comparison on the five datasets, where the values in red bold font represent the best results. CFNR is our proposed method.
Dataset | Classifier | CFNR | MVPCA [10] | WaLuDi [15] | E-FDPC [16] | ASPS [9] | HLFC [17]
Botswana | AOA (SVM) | 89.143 | 78.744 | 87.561 | 74.160 | 89.050 | 88.081
Botswana | Kappa (SVM) | 0.8824 | 0.7695 | 0.8652 | 0.7200 | 0.8813 | 0.8709
Botswana | AOA (LDA) | 90.324 | 84.930 | 89.923 | 83.022 | 89.160 | 89.917
Botswana | Kappa (LDA) | 0.8968 | 0.8406 | 0.8926 | 0.8204 | 0.8847 | 0.8925
Pavia University | AOA (SVM) | 89.423 | 77.230 | 88.150 | 85.219 | 88.820 | 88.834
Pavia University | Kappa (SVM) | 0.8582 | 0.6661 | 0.8409 | 0.7999 | 0.8497 | 0.8499
Pavia University | AOA (LDA) | 81.372 | 76.708 | 81.129 | 78.232 | 81.230 | 81.359
Pavia University | Kappa (LDA) | 0.7685 | 0.7120 | 0.7656 | 0.7308 | 0.7679 | 0.7685
Indian Pines | AOA (SVM) | 79.451 | 64.231 | 77.893 | 64.971 | 77.812 | 78.556
Indian Pines | Kappa (SVM) | 0.7634 | 0.5805 | 0.7450 | 0.5888 | 0.7435 | 0.7534
Indian Pines | AOA (LDA) | 65.569 | 52.446 | 64.788 | 59.720 | 65.460 | 64.698
Indian Pines | Kappa (LDA) | 0.6314 | 0.4913 | 0.6236 | 0.5722 | 0.6304 | 0.6225
Salinas | AOA (SVM) | 98.465 | 82.713 | 91.812 | 92.111 | 98.049 | 92.333
Salinas | Kappa (SVM) | 0.9826 | 0.8055 | 0.9086 | 0.9120 | 0.9767 | 0.9145
Salinas | AOA (LDA) | 88.257 | 81.738 | 87.493 | 87.908 | 82.830 | 87.845
Salinas | Kappa (LDA) | 0.8725 | 0.8030 | 0.8644 | 0.8688 | 0.8143 | 0.8682
Pavia Centre | AOA (SVM) | 94.019 | 73.041 | 89.250 | 92.222 | 86.631 | 85.5632
Pavia Centre | Kappa (SVM) | 0.9309 | 0.7027 | 0.8562 | 0.9132 | 0.9573 | 0.8461
Pavia Centre | AOA (LDA) | 96.233 | 89.001 | 81.849 | 96.091 | 81.014 | 95.961
Pavia Centre | Kappa (LDA) | 0.9471 | 0.8478 | 0.7742 | 0.9451 | 0.7644 | 0.9433
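For completeness, the average overall accuracy (AOA) and the kappa coefficient reported in Table 8 follow their standard definitions based on the confusion matrix of the testing samples. The listing below is a minimal sketch of these metrics, not code released with the paper.

```python
# Minimal sketch of overall accuracy (OA) and Cohen's kappa, the metrics reported in Table 8,
# computed from the ground-truth and predicted labels of the testing samples.
import numpy as np

def oa_and_kappa(y_true: np.ndarray, y_pred: np.ndarray):
    classes = np.unique(np.concatenate([y_true, y_pred]))
    index = {c: i for i, c in enumerate(classes)}
    # Confusion matrix: conf[i, j] counts samples of class i predicted as class j.
    conf = np.zeros((classes.size, classes.size), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        conf[index[t], index[p]] += 1
    n = conf.sum()
    oa = np.trace(conf) / n                                     # observed agreement
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / n ** 2   # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return 100 * oa, kappa   # OA in percent and kappa, as reported in Table 8
```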
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
