Article

Unified Low-Rank Subspace Clustering with Dynamic Hypergraph for Hyperspectral Image

School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2021, 13(7), 1372; https://doi.org/10.3390/rs13071372
Submission received: 15 February 2021 / Revised: 16 March 2021 / Accepted: 28 March 2021 / Published: 2 April 2021

Abstract

Low-rank representation with hypergraph regularization has achieved great success in hyperspectral imagery: it can explore the global structure of the data and further incorporate local information. However, existing hypergraph learning methods either construct the hypergraph from a fixed similarity matrix or optimize it adaptively in the original feature space; they do not update the hypergraph in the low-dimensional subspace. In addition, the clustering performance obtained by existing k-means-based clustering methods is unstable, since k-means is sensitive to the initialization of the cluster centers. To address these issues, we propose a novel unified low-rank subspace clustering method with a dynamic hypergraph for hyperspectral images (HSIs). In our method, the hypergraph is adaptively learned from the low-rank subspace feature, which can effectively capture a more complex manifold structure. In addition, we introduce a rotation matrix to learn continuous and discrete clustering labels simultaneously, without the information loss caused by relaxation. The unified model jointly learns the hypergraph and the discrete clustering labels, in which the subspace feature is adaptively learned by considering the optimal dynamic hypergraph with the self-taught property. Experimental results on real HSIs show that the proposed methods achieve better performance than eight state-of-the-art clustering methods.

Graphical Abstract

1. Introduction

Hyperspectral image (HSI) classification is an important problem in the remote sensing community. Extensive prior literature addresses classification in the supervised setting [1,2,3], in which training the classifier relies on labeled data (with ground-truth information). However, labeled datasets are scarce and, in some applications, impossible to obtain manually [4]. To exploit unlabeled remote sensing data, unsupervised classification methods, which segment the dataset into several groups with no prior label information, are therefore necessary.
According to the existing literature, clustering methods can be divided into several categories [5]. The two most popular categories suited to the characteristics of HSIs are centroid-based methods and spectral-based methods. Among the centroid-based clustering methods, k-means [6] and fuzzy c-means (FCM) [7] receive the most attention due to their computational efficiency and simplicity; they group pixels by iteratively minimizing the distance between each pixel and its cluster centroid. Recently, spectral-based clustering methods have become highly popular and have been widely used for hyperspectral data clustering. In general, these methods first construct a similarity matrix from the original data and then apply a centroid-based method to the eigenspace of the Laplacian matrix to segment the pixels. Specifically, locally spectral clustering (LSC) [8] and globally spectral clustering (GSC) [9] use the local and global neighbors of each pixel, respectively, to construct a similarity matrix representing the relationship between pairs of pixels, and then apply k-means to the eigenspace of the Laplacian matrix; however, they cannot distinguish the subspaces to which the pixels belong. Moreover, the large spectral variability results in a uniform feature point distribution, which increases the difficulty of HSI clustering [5]. The recently developed sparse subspace clustering (SSC) [10,11] and low-rank subspace clustering (LRSC) [12,13] methods use the sparse and low-rank representation coefficients to define the adjacency matrix, and apply spectral clustering to obtain the segmentation result. Compared with SSC, LRSC is better at exploring the global structure information by finding the lowest-rank representation of all the data jointly. Nevertheless, the original LRSC model cannot explore the local latent structure information of the data while exploiting the corresponding subspaces.
Inspired by the theory of manifold learning in image processing [14], Lu et al. [15] proposed graph-embedded low-rank representation (GLRR) by incorporating graph regularization into the low-rank representation objective function. However, general graph regularization only models pairwise relationships between two pixels and cannot excavate the complex high-order relationships among pixels. In fact, the relationship between the hyperspectral pixels we are interested in is not just a pairwise relationship between two pixels, but a plural or even more complex relationship. Instead of considering pairwise relations, the hypergraph, first proposed by Berge [16], models the data manifold structure by exploring the high-order relations among data points. Zhou et al. [17] combined the methodology of spectral clustering to extend the ordinary undirected graph to the hypergraph. Since then, hypergraphs have been widely used for feature extraction [18], band selection [19], dimension reduction [20], and noise reduction [21] in hyperspectral images. According to the prior literature, hypergraph methods based on representation learning fall into two categories. The first uses the representation coefficients to construct the hypergraph: for example, [2,22] regard sparse and low-rank coefficients, respectively, as a new feature to measure the similarity of the pixels and to adaptively select neighbors for constructing the hypergraph. The second uses the hypergraph as a regularizer to optimize the representation coefficients by capturing the intrinsic geometrical structure. Gao et al. [23] first introduced the hypergraph into sparse coding, where the hypergraph explores the similarity among the pixels within the same hyperedge and simultaneously encourages their sparse representation coefficients to be similar to each other. Motivated by this idea, hypergraph regularization was subsequently introduced into non-negative matrix factorization [24], sparse NMF [25], and low-rank representation [26,27].
It is noteworthy that there are two main problems in existing hypergraph-based representation learning methods. First, the pre-constructed hypergraph is usually learned from the original data with a fixed distance measurement and is not optimized dynamically. To address this, Zhang et al. [28] proposed a unified framework for data structure estimation and feature selection, which updates the hypergraph weights during the hypergraph learning process. In Reference [29], a dynamic hypergraph structure learning method was proposed, in which the incidence matrix of the hypergraph is learned by considering the data correlation in both the label space and the feature space. In addition, the original feature space may contain various kinds of noise, which can degrade performance since these methods depend heavily on the constructed hypergraph. Zhu et al. [30,31] proposed an unsupervised sparse feature selection method embedding a hypergraph Laplacian regularizer, in which the hypergraph is learned dynamically from the optimized sparse subspace feature. Alternatively, the hypergraph can be adaptively learned from the latent representation space, which robustly characterizes the intrinsic data structure [32,33]. Second, the clustering performance obtained by existing k-means-based methods is unstable because the initialization of the cluster centers has too much impact on k-means. Therefore, it is necessary to construct a unified framework that directly generates discrete clustering labels [34,35,36]. However, the existing unified clustering frameworks are based on the ordinary graph structure, which may lead to significant information loss and reduce clustering performance.
To address these issues, we propose a novel unified dynamic hypergraph low-rank subspace clustering method for hyperspectral images, termed UDHLR. First, we develop a dynamic hypergraph low-rank subspace clustering method, termed DHLR, in which hypergraph regularization is used to preserve the local complex structure of the low-dimensional data and the hypergraph is adaptively learned from the low-rank subspace feature. However, the DHLR algorithm works in two separate steps: learning the low-rank coefficient matrix as a similarity graph, and then generating the discrete clustering labels with the k-means method. Therefore, we integrate these two subtasks into a unified framework, in which the low-rank representation coefficients, the hypergraph structure, and the discrete clustering labels are optimized jointly, each using the results of the others, to obtain an overall optimal result. The main contributions of our methods are summarized as follows:
(1)
Instead of pre-constructing a fixed hypergraph incidence and weight matrices, the hypergraph is adaptively learned from the low-rank subspace feature. The dynamically constructed hypergraph is well structured and theoretically suitable for clustering.
(2)
The proposed method simultaneously optimizes continuous labels and discrete cluster labels through a rotation matrix, without the information loss caused by relaxation.
(3)
It jointly learns the similarity hypergraph from the learned low-rank subspace data and the discrete clustering labels by solving a unified optimization problem, in which the low-rank subspace feature and the hypergraph are adaptively learned by considering the clustering performance, and the continuous clustering labels serve only as intermediate products.
The remainder of this paper is organized as follows: Section 2 revisits low-rank representation and the hypergraph. Section 3 describes the proposed DHLR and UDHLR models. Section 4 presents the experimental settings and results. Section 5 discusses the computational complexity. Finally, conclusions are drawn in Section 6. The framework of the proposed methods is shown in Figure 1.

2. Related Work

The important notations in the paper are summarized in Table 1.

2.1. Low-Rank Representation

Let $X = [x_1, x_2, \ldots, x_n] \in R^{d \times n}$ denote a hyperspectral image with $n$ samples, where $x_i \in R^d$ represents the $i$-th pixel with $d$ spectral bands. Low-rank representation (LRR) seeks the lowest-rank representation for clustering by solving the following objective function:
$\min_{Z} \operatorname{rank}(Z) \quad \mathrm{s.t.} \quad X = XZ + N,$  (1)
where $Z \in R^{n \times n}$ denotes the lowest-rank representation matrix under a self-expressive dictionary [13], $\operatorname{rank}(Z)$ is the rank of matrix $Z$, and $N$ is a sparse matrix of outliers. However, the rank minimization problem is NP-hard and difficult to optimize, so the nuclear norm is adopted instead, yielding the following optimization [13]:
$\min_{Z} \|Z\|_* \quad \mathrm{s.t.} \quad X = XZ + N,$  (2)
where $\|Z\|_* = \sum_{i} \delta_i$ is the nuclear norm of matrix $Z$ and $\delta_i$ is the $i$-th singular value of $Z$. The representation matrix $Z$ can be obtained by solving the above problem with the inexact augmented Lagrange multiplier (ALM) method [37]. Finally, the adjacency matrix $Z + Z^T$, whose entries serve as edge weights, is constructed from the obtained low-rank coefficient matrix, and the clustering result is obtained by applying k-means to the eigenspace of the Laplacian matrix induced by this adjacency matrix.
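As a concrete illustration (not the authors' implementation), the sketch below builds such an adjacency matrix from a learned coefficient matrix Z and feeds it to off-the-shelf spectral clustering; the use of scikit-learn's SpectralClustering and the absolute values taken on Z are assumptions of this example.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def lrr_spectral_clustering(Z, n_clusters):
    """Cluster pixels from a learned low-rank coefficient matrix Z (n x n)."""
    # Symmetric affinity from the coefficients; taking absolute values is a
    # common choice when Z is not constrained to be nonnegative.
    affinity = np.abs(Z) + np.abs(Z).T
    model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed")
    return model.fit_predict(affinity)
```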

2.2. Hypergraph

The relationships between pixels that we are interested in are not just pairwise relationships between two pixels, but plural or even more complex relationships. Simply compressing such a multivariate relationship into a pairwise relationship between two pixels inevitably loses a lot of useful information and thus affects the accuracy of feature learning to a certain extent [19].
Let $G = (V, E, W)$ denote a hypergraph, where $V = \{v_i\}_{i=1}^{n}$ and $E = \{e_i\}_{i=1}^{n}$ are the sets of vertices and hyperedges, respectively. The dataset $X$ is used to make up the set of vertices $V$. $W = \operatorname{diag}(w(e_1), w(e_2), \ldots, w(e_n))$ denotes the weight matrix of the hyperedges. For simplicity, we only consider the case where each hyperedge contains the same number of vertices. For a given hyperedge $e_i \in E$, the hyperedge weight is constructed as $w(e_i) = \sum_{v_j \in N(v_i)} \exp(-\|v_i - v_j\|^2 / 2\sigma^2)$, in which $N(v_i)$ is the set of nearest neighbors of $v_i$ and $\sigma$ is the kernel parameter. An incidence matrix $H$ denotes the relationship between vertices and hyperedges, with entries defined as:
$h(v_i, e_j) = 1$ if $v_i \in e_j$, and $h(v_i, e_j) = 0$ otherwise.  (3)
The degree of each vertex $v_i$ is defined as $d(v_i) = \sum_{j=1}^{n} w(e_j) h(v_i, e_j)$, and the degree of hyperedge $e_i$ is defined as $d(e_i) = \sum_{j=1}^{n} h(v_j, e_i)$. Then, $D_v = \operatorname{diag}(d(v_1), d(v_2), \ldots, d(v_n))$ and $D_e = \operatorname{diag}(d(e_1), d(e_2), \ldots, d(e_n))$ are the vertex-degree and hyperedge-degree matrices, respectively. Finally, the normalized hypergraph Laplacian matrix is
$L = I - D_v^{-\frac{1}{2}} H W D_e^{-1} H^T D_v^{-\frac{1}{2}}.$  (4)
Thus, the hypergraph can well represent the local structure information and the complex relationships between pixels. It is worth noting that the quality of $L$ depends on $H$ and $W$; we use $L(H, W)$ to denote the hypergraph Laplacian hereafter.
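A minimal sketch of this construction is given below, assuming a k-nearest-neighbor hyperedge for every pixel and the Gaussian weights defined above; the function name and the choice to include each centroid vertex in its own hyperedge are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def hypergraph_laplacian(X, k=10, sigma=1.0):
    """KNN hypergraph on the columns of X (d x n) and its normalized Laplacian
    L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)    # pairwise squared distances
    np.fill_diagonal(dist2, np.inf)                        # exclude self as a neighbor
    H = np.zeros((n, n))                                   # incidence matrix (vertices x hyperedges)
    w = np.zeros(n)                                        # hyperedge weights
    for i in range(n):
        nbrs = np.argsort(dist2[:, i])[:k]                 # k nearest neighbors of pixel i
        H[i, i] = 1.0                                      # centroid vertex joins its own hyperedge
        H[nbrs, i] = 1.0
        w[i] = np.exp(-dist2[nbrs, i] / (2.0 * sigma ** 2)).sum()
    De = H.sum(axis=0)                                     # hyperedge degrees d(e)
    Dv = H @ w                                             # vertex degrees d(v)
    Dv_isqrt = 1.0 / np.sqrt(np.maximum(Dv, 1e-12))
    Theta = (Dv_isqrt[:, None] * H) @ np.diag(w / De) @ (H.T * Dv_isqrt[None, :])
    return np.eye(n) - Theta, H, w
```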

3. Materials and Methods

The conventional hypergraph construction is based only on the original features and is independent of the features learned in the low-rank subspace. There is no guarantee that such a hypergraph is optimal for modeling the pixelwise relationships among the subspace features; this suboptimal hypergraph structure can in turn lead to a suboptimal solution when learning the incidence matrix. To address these problems, we propose to learn a dynamic hypergraph that explores the intrinsic complex local structure of the pixels in their low-dimensional feature space. In addition, the hypergraph-based manifold regularization helps the low-rank representation coefficients capture the global structure information of the hyperspectral data. Finally, the proposed model learns a rotation matrix to obtain continuous labels and discrete cluster labels simultaneously in one step.

3.1. Dynamic Hypergraph-Based Low-Rank Subspace Clustering

Based on Section 2.2, a hypergraph structure can be used to maintain the local relationship of the original data [17,28]. First, we propose to preserve the local complex structure of the low-dimensional data by the hypergraph regularization. To do this, we design the following objective function:
$\min_{X = XZ + N,\, Z \ge 0} \|Z\|_* + \lambda_1 \sum_{e \in E} \sum_{Xz_i, Xz_j \in V} \frac{w(e)\, h(Xz_i, e)\, h(Xz_j, e)}{d(e)} \left\| \frac{Xz_i}{\sqrt{d(Xz_i)}} - \frac{Xz_j}{\sqrt{d(Xz_j)}} \right\|^2 + \lambda_2 \|N\|_{2,1}.$  (5)
Obviously, Equation (5) is equivalent to:
$\min_{X = XZ + N,\, Z \ge 0} \|Z\|_* + \lambda_1 \operatorname{Tr}\!\left(XZ\, L(H, W)\, Z^T X^T\right) + \lambda_2 \|N\|_{2,1},$  (6)
where $L(H, W)$ is the hypergraph Laplacian, and $\lambda_1$ and $\lambda_2$ are two tuning parameters. However, $H$ is pre-constructed from the original data and is usually not learned dynamically. In this paper, we propose to update the hypergraph $H$ based on the low-dimensional subspace information and, furthermore, to couple it with the learning of $Z$ in a unified framework. To achieve this, we design the final objective function as follows:
$\min_{Z, N, H, W} \|Z\|_* + \lambda_1 \operatorname{Tr}\!\left(XZ\, L(H, W)\, Z^T X^T\right) + \lambda_2 \|N\|_{2,1} + \lambda_3 \|W\|_F^2 \quad \mathrm{s.t.} \quad X = XZ + N,\ Z \ge 0,\ w^T \mathbf{1} = 1,\ w_i > 0,$  (7)
where $W = \operatorname{diag}(w)$, and the norm regularization on the hyperedge weights is used to avoid overfitting. On the one hand, $Z$ preserves the global structures via the low-rank constraint to conduct subspace learning. On the other hand, it also preserves the local structures via the second term of Equation (7) to select the informative features. The proposed dynamic hypergraph low-rank subspace clustering method is termed DHLR.

3.2. Optimization Algorithm for Solving Problem (7)

In order to solve problem (7), the variable J is introduced to make (7) separable for optimization as follows:
$\min_{J, Z, N, H, W} \|J\|_* + \lambda_1 \operatorname{Tr}\!\left(XZ\, L(H, W)\, Z^T X^T\right) + \lambda_2 \|N\|_{2,1} + \lambda_3 \|W\|_F^2 \quad \mathrm{s.t.} \quad X = XZ + N,\ Z = J,\ Z \ge 0,\ w^T \mathbf{1} = 1,\ w_i > 0.$  (8)
The optimization problem (8) can be solved with ADMM algorithm by minimizing the following augmented Lagrangian formulation:
$\mathcal{L}(J, Z, N, H, W) = \|J\|_* + \lambda_1 \operatorname{Tr}\!\left(XZ\, L(H, W)\, Z^T X^T\right) + \lambda_2 \|N\|_{2,1} + \lambda_3 \|W\|_F^2 - \eta (w^T \mathbf{1} - 1) + \frac{\mu}{2} \left( \left\| X - XZ - N + \frac{C_1}{\mu} \right\|_F^2 + \left\| Z - J + \frac{C_2}{\mu} \right\|_F^2 \right),$  (9)
where $C_1$, $C_2$, and $\eta$ are Lagrange multipliers, and $\mu$ is a positive penalty parameter. The variables $J$, $Z$, $N$, $H$, $W$ and the Lagrange multipliers are obtained by alternately solving (9) for each variable with the other variables fixed. The detailed solution steps are as follows:
Update J : Fixing variables Z , N , H , W , we can obtain the solution of J by solving the following problem:
$J^{t+1} = \arg\min_{J} \|J\|_* + \frac{\mu^t}{2} \left\| Z^t - J + \frac{C_2^t}{\mu^t} \right\|_F^2.$  (10)
By introducing the singular value thresholding (SVT) operator [38], the solution of J is given as:
$J^{t+1} = \Theta_{1/\mu^t}\!\left( Z^t + \frac{C_2^t}{\mu^t} \right),$  (11)
where Θ denotes the SVT operator.
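For reference, a minimal implementation of the SVT operator used in Equation (11) could look as follows (a sketch; the variable names are assumptions):

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: the proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt   # soft-threshold the singular values

# J update of Equation (11), with Z, C2 and mu taken from the current iteration:
# J_new = svt(Z + C2 / mu, 1.0 / mu)
```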
Update Z : Fixing variables J , N , H , W , we can obtain the objective function about Z as follows:
$Z^{t+1} = \arg\min_{Z \ge 0} \lambda_1 \operatorname{Tr}\!\left(XZ\, L(H^t, W^t)\, Z^T X^T\right) + \frac{\mu^t}{2} \left( \left\| X - XZ - N^t + \frac{C_1^t}{\mu^t} \right\|_F^2 + \left\| Z - J^{t+1} + \frac{C_2^t}{\mu^t} \right\|_F^2 \right).$  (12)
Problem (12) has a closed-form solution as a quadratic minimization problem, which is:
$Z^{t+1} = \left( \frac{\lambda_1}{\mu^t} X^T L(H^t, W^t) X + X^T X + I \right)^{-1} \left( X^T X - X^T N^t + J^{t+1} + \frac{X^T C_1^t - C_2^t}{\mu^t} \right).$  (13)
Update N : Fixing variables J , Z , H , W , we can obtain the solution of N by solving the following problem:
$N^{t+1} = \arg\min_{N} \lambda_2 \|N\|_{2,1} + \frac{\mu^t}{2} \left\| X - XZ^{t+1} - N + \frac{C_1^t}{\mu^t} \right\|_F^2.$  (14)
The objective function on the variable N can be rewritten as:
$N^{t+1} = \arg\min_{N} \frac{\lambda_2}{\mu^t} \|N\|_{2,1} + \frac{1}{2} \left\| P^{t+1} - N \right\|_F^2,$  (15)
in which $P^{t+1} = X - XZ^{t+1} + \frac{C_1^t}{\mu^t}$. The $i$-th column of $N^{t+1}$ is
$N_i^{t+1} = \frac{\|P_i^{t+1}\|_2 - \lambda_2/\mu^t}{\|P_i^{t+1}\|_2} P_i^{t+1}$ if $\|P_i^{t+1}\|_2 > \lambda_2/\mu^t$, and $N_i^{t+1} = 0$ otherwise,  (16)
where $P_i$ and $N_i$ are the $i$-th columns of matrices $P$ and $N$, respectively.
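The column-wise shrinkage of Equation (16) is the proximal operator of the L2,1 norm; a minimal sketch follows (the names are assumptions):

```python
import numpy as np

def prox_l21(P, tau):
    """Column-wise shrinkage of Equation (16): proximal operator of tau * ||.||_{2,1}."""
    norms = np.linalg.norm(P, axis=0)                      # l2 norm of each column
    scale = np.where(norms > tau, (norms - tau) / np.maximum(norms, 1e-12), 0.0)
    return P * scale                                       # broadcast over the columns

# N update, with P = X - X @ Z + C1 / mu:
# N_new = prox_l21(P, lam2 / mu)
```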
Update $H$ and $D_e$: According to the definition of the hypergraph in Section 2.2, the hyperedges are constructed from the original data, so noise may affect the accuracy of the hypergraph. To tackle this problem, we use the noise-free low-dimensional subspace feature to learn the hyperedges. Following [30], the formulation for constructing the set of hyperedges is:
$e_i^{t+1} = \left\{ v_j^{t+1} \,\middle|\, Xz_j^{t+1} \in N(Xz_i^{t+1}) \right\}, \quad i, j = 1, \ldots, n,$  (17)
in which $N(\cdot)$ denotes the set of near-neighbor pixels. In this work, the hyperedge $e_i^{t+1}$ collects the top $K$ most similar neighbors of $Xz_i^{t+1}$, excluding $Xz_i^{t+1}$ itself. After producing the incidence matrix $H^{t+1}$, $D_e^{t+1}$ is readily obtained by
$d(e_i^{t+1}) = \sum_{j=1}^{n} h(v_j^{t+1}, e_i^{t+1}), \qquad D_e^{t+1} = \operatorname{diag}(d^{t+1}).$  (18)
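A sketch of this dynamic hyperedge update, rebuilding H and De from the subspace feature XZ via a top-K neighbor search, is shown below (an illustration with assumed names, not the authors' code):

```python
import numpy as np

def update_hyperedges(X, Z, k=10):
    """Rebuild the incidence matrix H and hyperedge degrees De from the
    low-rank subspace feature XZ (Equations (17)-(18))."""
    F = X @ Z                                              # subspace feature, one column per pixel
    sq = np.sum(F ** 2, axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (F.T @ F)
    np.fill_diagonal(dist2, np.inf)                        # exclude the pixel itself
    n = F.shape[1]
    H = np.zeros((n, n))
    for i in range(n):
        H[np.argsort(dist2[:, i])[:k], i] = 1.0            # top-K neighbors form hyperedge e_i
    De = H.sum(axis=0)                                     # d(e_i) = sum_j h(v_j, e_i)
    return H, De
```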
Update W : Fixing variables J , Z , N , H , we can obtain the objective function about W as follows:
$W^{t+1} = \arg\min_{W > 0} \lambda_1 \operatorname{Tr}\!\left(XZ^{t+1} L(H^{t+1}, W) (Z^{t+1})^T X^T\right) + \lambda_3 \|W\|_F^2 - \eta (w^T \mathbf{1} - 1),$  (19)
in which $L(H^{t+1}, W) = I - (D_v^t)^{-\frac{1}{2}} H^{t+1} W (D_e^{t+1})^{-1} (H^{t+1})^T (D_v^t)^{-\frac{1}{2}}$. Letting $B^{t+1} = (D_e^{t+1})^{-1} (H^{t+1})^T (D_v^t)^{-\frac{1}{2}} (Z^{t+1})^T X^T X Z^{t+1} (D_v^t)^{-\frac{1}{2}}$ and $b^{t+1} = \operatorname{diag}(B^{t+1})$, Equation (19) can be rewritten in the following form:
$w^{t+1} = \arg\min_{w_i > 0} -\lambda_1 (b^{t+1})^T w + \lambda_3 \|w\|_2^2 - \eta (w^T \mathbf{1} - 1).$  (20)
Then Equation (20) can be rewritten as the following form:
$w^{t+1} = \arg\min_{w_i > 0} \left\| w - \frac{\lambda_1 b^{t+1} + \eta \mathbf{1}}{2 \lambda_3} \right\|_2^2.$  (21)
According to the Karush–Kuhn–Tucker conditions, the closed-form solution for w i is:
$w_i^{t+1} = \left[ \frac{\lambda_1 b_i^{t+1} + \eta}{2 \lambda_3} \right]_+, \quad i = 1, \ldots, n.$  (22)
Then we further obtain W t + 1 = diag w t + 1 and
$d(v_i^{t+1}) = \sum_{j=1}^{n} w(e_j^{t+1}) h(v_i^{t+1}, e_j^{t+1}), \qquad D_v^{t+1} = \operatorname{diag}(d^{t+1}).$  (23)
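A sketch of the weight update of Equations (22) and (23) follows, assuming the multiplier eta is given and using the definition of b above (the function and variable names are assumptions):

```python
import numpy as np

def update_weights(XZ, H, De, Dv_prev, lam1, lam3, eta):
    """Hyperedge weight update of Equation (22) and vertex degrees of Equation (23).
    A sketch: eta (the multiplier of w^T 1 = 1) is assumed to be given."""
    Dv_isqrt = 1.0 / np.sqrt(np.maximum(Dv_prev, 1e-12))
    M = (Dv_isqrt[:, None] * (XZ.T @ XZ)) * Dv_isqrt[None, :]   # Dv^{-1/2} (XZ)^T XZ Dv^{-1/2}
    B = (1.0 / De)[:, None] * (H.T @ M)                         # De^{-1} H^T Dv^{-1/2} ... Dv^{-1/2}
    b = np.diag(B)
    w = np.maximum((lam1 * b + eta) / (2.0 * lam3), 0.0)        # closed form, projected to >= 0
    Dv = H @ w                                                  # updated vertex degrees
    return w, Dv
```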
Update the Lagrange multipliers C 1 , C 2 and penalty parameter μ by
$C_1^{t+1} = C_1^t + \mu^t \left( X - XZ^{t+1} - N^{t+1} \right), \qquad C_2^{t+1} = C_2^t + \mu^t \left( Z^{t+1} - J^{t+1} \right), \qquad \mu^{t+1} = \min(\mu_{\max}, \rho \mu^t).$  (24)
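This ADMM bookkeeping step is straightforward; a minimal sketch is given below (rho and mu_max are assumed hyperparameters):

```python
def update_multipliers(X, Z, N, J, C1, C2, mu, rho=1.1, mu_max=1e6):
    """Lagrange multiplier and penalty updates of Equation (24)."""
    C1 = C1 + mu * (X - X @ Z - N)
    C2 = C2 + mu * (Z - J)
    mu = min(mu_max, rho * mu)
    return C1, C2, mu
```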
The entire procedure for solving DHLR method is summarized in Algorithm 1.
Algorithm 1 The DHLR algorithm for HSI clustering
  • Input: A 2-D matrix of the HSI $X \in R^{d \times n}$, the number of desired clusters $c$, and the regularization parameters $\lambda_1, \lambda_2, \lambda_3$.
  • Initialization: Initialize the hypergraph $L(H^0, W^0)$ using the original data.
  • while ($\|Z^t - J^t\| > \varepsilon$ or $\|X - XZ^t - N^t\| > \varepsilon$) and $t \le MaxIter$ do
  •    1. Update J t + 1 by solving Equation (11).
  •    2. Update Z t + 1 by solving Equation (13).
  •    3. Update N t + 1 by solving Equation (16).
  •    4. Update H t + 1 and D e t + 1 by solving Equations (17) and (18).
  •    5. Update W t + 1 and D v t + 1 by solving Equations (22) and (23).
  •    6. Update L H t + 1 , W t + 1 according to Equation (4).
  •    7. Update the Lagrange multipliers and penalty parameter by (24).
  • end while
  •    8. Construct the adjacency matrix $M = Z + Z^T$.
  •    9. Apply spectral clustering to the Laplacian matrix induced by the adjacency matrix $M$.
  • Output: the cluster assignment for X .

3.3. Unified Dynamic Hypergraph-Based Low-Rank Subspace Clustering

Most existing hypergraph-based clustering methods contain two independent processes: hypergraph construction and clustering. The hypergraph is used to construct a similarity matrix, and then spectral clustering or k-means is applied to produce the final clustering labels [39]. Although this approach is very popular in clustering applications, it can produce very unstable performance, since the initialization of the cluster centers has too much impact on k-means [8]. To address this problem, we propose a unified framework that exploits the correlation between the similarity hypergraph and the discrete cluster labels for the clustering task. It updates the dynamic hypergraph with an optimal low-rank subspace feature and then directly generates the discrete cluster labels by introducing a rotation matrix. Thus, the proposed model can not only make use of the optimal dynamic hypergraph and the global low-dimensional feature information, but also obtain the discrete clustering labels directly. To achieve the above purpose, the objective function is written as
$\min_{Z, N, H, W, F, Q, Y} \|Z\|_* + \lambda_1 \operatorname{Tr}\!\left(XZ\, L(H, W)\, Z^T X^T\right) + \lambda_2 \|N\|_{2,1} + \lambda_3 \|W\|_F^2 + \lambda_4 \operatorname{Tr}\!\left(F^T L(H, W) F\right) + \lambda_5 \|Y - FQ\|_F^2 \quad \mathrm{s.t.} \quad X = XZ + N,\ Z \ge 0,\ F^T F = I_c,\ F \in R^{n \times c},\ Q^T Q = I_c,\ w^T \mathbf{1} = 1,\ w_i > 0,\ Y \in \mathrm{Idx},$  (25)
where $\lambda_4$ and $\lambda_5$ are penalty parameters. In general, $F = [f_1, f_2, \ldots, f_n]^T \in R^{n \times c}$ (s.t. $F \in \mathrm{Idx}$) is the cluster indicator matrix in spectral clustering. To avoid the NP-hard problem caused by the discrete constraint on $F$, $F \in R^{n \times c}$ is relaxed into the continuous domain, and the orthogonality constraint is adopted to make it computationally tractable. To achieve an ideal clustering structure, [40] proposed imposing a rank constraint on the hypergraph Laplacian matrix $L$ induced by the representation matrix $Z$, namely $\operatorname{rank}(L) = n - c$. Under this constraint, we can directly partition the data into $c$ clusters. The rank constraint problem is equivalent to minimizing $\sum_{i=1}^{c} \sigma_i(L)$ [41]. According to Ky Fan's theorem [42], $\sum_{i=1}^{c} \sigma_i(L) = \min_{F^T F = I} \operatorname{Tr}(F^T L F)$. To generate the discrete clustering labels, we introduce a rotation matrix $Q \in R^{c \times c}$. According to the spectral solution invariance property [43], the last term finds a proper orthonormal $Q$ that makes $FQ$ approximate the real discrete clustering labels, where $Y \in R^{n \times c}$ is the discrete label matrix. In fact, Equation (25) is not a simple union of terms: it exploits the relationship between the dynamic hypergraph and the clustering labels. Ideally, $z_{ij} \neq 0$ if and only if pixels $i$ and $j$ are in the same cluster, that is, $y_i = y_j$, and vice versa. Therefore, the inferred labels and the similarity hypergraph matrix provide feedback to each other. From this point of view, our clustering framework has the self-taught property.

3.4. Optimization Algorithm for Solving Problem (25)

In order to solve problem (25), the variable J is introduced to make (25) separable for optimization as follows:
$\min_{J, Z, N, H, W, F, Q, Y} \|J\|_* + \lambda_1 \operatorname{Tr}\!\left(XZ\, L(H, W)\, Z^T X^T\right) + \lambda_2 \|N\|_{2,1} + \lambda_3 \|W\|_F^2 + \lambda_4 \operatorname{Tr}\!\left(F^T L(H, W) F\right) + \lambda_5 \|Y - FQ\|_F^2 \quad \mathrm{s.t.} \quad X = XZ + N,\ Z = J,\ Z \ge 0,\ F^T F = I_c,\ Q^T Q = I_c,\ w^T \mathbf{1} = 1,\ w_i > 0,\ Y \in \mathrm{Idx}.$  (26)
Then, (26) can be rewritten into the following augmented Lagrangian formulation:
$\mathcal{L}(J, Z, N, H, W, F, Q, Y) = \|J\|_* + \lambda_1 \operatorname{Tr}\!\left(XZ\, L(H, W)\, Z^T X^T\right) + \lambda_2 \|N\|_{2,1} + \lambda_3 \|W\|_F^2 + \lambda_4 \operatorname{Tr}\!\left(F^T L(H, W) F\right) + \lambda_5 \|Y - FQ\|_F^2 - \eta (w^T \mathbf{1} - 1) + \frac{\mu}{2} \left( \left\| X - XZ - N + \frac{C_1}{\mu} \right\|_F^2 + \left\| Z - J + \frac{C_2}{\mu} \right\|_F^2 \right).$  (27)
The steps to update J , Z , N and H are similar to those of DHLR except for updating W , F , Q and Y .
Update W : Fixing variables J , Z , N , H , F , Q , Y , we can obtain the solution of W by solving the following problem:
$W^{t+1} = \arg\min_{W > 0} \lambda_1 \operatorname{Tr}\!\left(XZ^{t+1} L(H^{t+1}, W) (Z^{t+1})^T X^T\right) + \lambda_3 \|W\|_F^2 + \lambda_4 \operatorname{Tr}\!\left((F^t)^T L(H^{t+1}, W) F^t\right) - \eta (w^T \mathbf{1} - 1).$  (28)
Letting $B^{t+1} = (D_e^{t+1})^{-1} (H^{t+1})^T (D_v^t)^{-\frac{1}{2}} (Z^{t+1})^T X^T X Z^{t+1} (D_v^t)^{-\frac{1}{2}}$, $S^{t+1} = (D_e^{t+1})^{-1} (H^{t+1})^T (D_v^t)^{-\frac{1}{2}} F^t (F^t)^T (D_v^t)^{-\frac{1}{2}}$, $b^{t+1} = \operatorname{diag}(B^{t+1})$, and $s^{t+1} = \operatorname{diag}(S^{t+1})$, and using the fact that $W$ is a diagonal matrix, Equation (28) can be rewritten in the following form:
$w^{t+1} = \arg\min_{w_i > 0} -\lambda_1 (b^{t+1})^T w - \lambda_4 (s^{t+1})^T w + \lambda_3 \|w\|_2^2 - \eta (w^T \mathbf{1} - 1).$  (29)
Similar to the solution of problem (20), the closed-form solution for w i is:
$w_i^{t+1} = \left[ \frac{\lambda_1 b_i^{t+1} + \lambda_4 s_i^{t+1} + \eta}{2 \lambda_3} \right]_+, \quad i = 1, \ldots, n.$  (30)
We further obtain $W^{t+1} = \operatorname{diag}(w^{t+1})$, and $D_v^{t+1}$ via the same formulation as Equation (23). The iterative optimization of $H$ and $W$ in the dynamic hypergraph is illustrated in Figure 2.
Update $F$: With the other variables fixed, it is equivalent to solving
$F^{t+1} = \arg\min_{F^T F = I_c} \lambda_4 \operatorname{Tr}\!\left(F^T L(H^{t+1}, W^{t+1}) F\right) + \lambda_5 \|Y^t - F Q^t\|_F^2.$  (31)
The solution of variable F can be efficiently obtained via the algorithm proposed by [44].
Update Q : By fixing other variables, we have
$Q^{t+1} = \arg\min_{Q^T Q = I_c} \lambda_5 \|Y^t - F^{t+1} Q\|_F^2.$  (32)
This is an orthogonal Procrustes problem [45] with a closed-form solution $Q = UV^T$, where $U$ and $V$ are the left and right parts of the SVD decomposition of $Y^T F$.
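A minimal sketch of this Procrustes step is given below; in this sketch the SVD is taken of $F^T Y$, for which $Q = UV^T$ maximizes the alignment (the function and variable names are assumptions):

```python
import numpy as np

def update_Q(Y, F):
    """Rotation update of Equation (32): orthogonal Procrustes,
    min ||Y - F Q||_F over Q with Q^T Q = I."""
    U, _, Vt = np.linalg.svd(F.T @ Y)   # SVD of F^T Y
    return U @ Vt
```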
Update $Y$: With the other variables fixed, the problem becomes
$Y^{t+1} = \arg\min_{Y \in \mathrm{Idx}} \lambda_5 \|Y - F^{t+1} Q^{t+1}\|_F^2.$  (33)
Noting that $\operatorname{Tr}(Y^T Y) = n$, the above subproblem can be rewritten as
$Y^{t+1} = \arg\max_{Y \in \mathrm{Idx}} \operatorname{Tr}\!\left(Y^T F^{t+1} Q^{t+1}\right).$  (34)
The optimal solution of variate Y is:
$Y_{ij}^{t+1} = 1$ if $j = \arg\max_k (F^{t+1} Q^{t+1})_{ik}$, and $Y_{ij}^{t+1} = 0$ otherwise.  (35)
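The discrete label update of Equation (35) simply one-hot encodes the row-wise argmax of $FQ$, as in the sketch below (the names are assumptions):

```python
import numpy as np

def update_Y(F, Q):
    """Discrete label update of Equation (35): one-hot rows from the argmax of F Q."""
    FQ = F @ Q
    Y = np.zeros_like(FQ)
    Y[np.arange(FQ.shape[0]), FQ.argmax(axis=1)] = 1.0     # pick the best cluster per pixel
    return Y
```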
Update the Lagrange multipliers and penalty parameter like DHLR in Equation (24). The details of the UDHLR algorithm optimization are summarized in Algorithm 2.
Algorithm 2 The UDHLR algorithm for HSI clustering
  • Input: A 2-D matrix of the HSI $X \in R^{d \times n}$, the number of desired clusters $c$, and the regularization parameters $\lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5$.
  • Initialization: Initialize the hypergraph $L(H^0, W^0)$ using the original data, and randomly initialize $F^0$ and $Q^0$.
  • while ($\|Z^t - J^t\| > \varepsilon$ or $\|X - XZ^t - N^t\| > \varepsilon$) and $t \le MaxIter$ do
  •    1. Update J t + 1 , Z t + 1 , N t + 1 , H t + 1 and D e t + 1 as Algorithm 1.
  •    2. Update W t + 1 and D v t + 1 by solving Equations (30) and (23).
  •    3. Update L H t + 1 , W t + 1 according to Equation (4).
  •    4. Update F t + 1 by solving Equation (31).
  •    5. Update Q t + 1 by solving Equation (32).
  •    6. Update Y t + 1 by solving Equation (35).
  •    7. Update the Lagrange multipliers and penalty parameter by (24).
  • end while
  • Output: the cluster label Y for data X .

4. Results

4.1. Experimental Datasets

To validate the effectiveness of the proposed methods, we conduct experiments on three real-world hyperspectral datasets, namely Indian Pines, Salinas-A, and Jasper Ridge. Table 2 summarizes the detailed information of these three datasets.

4.1.1. Indian Pines

The Indian Pines dataset was collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over Northwestern Indiana in 1992. The image has a spatial resolution of 20 m and 220 spectral bands ranging from 0.4 to 2.5 μm. In our tests, 20 spectral bands (104–108, 150–163, and 220) are removed due to water absorption and noise [46]. The size of the image is 145 × 145 pixels, with 16 classes in total. Following, e.g., [47], nine main classes were used in our experiment: corn-no-till, corn-minimum-till, grass pasture, grass-trees, hay-windrowed, soybean-no-till, soybean-minimum-till, soybean-clean, and woods.

4.1.2. Salinas-A

The original dataset is the Salinas Valley scene, acquired by the AVIRIS sensor over the Salinas Valley, California in 1998. The size of this image is 512 × 217 pixels, and it contains 224 spectral bands with a spatial resolution of 3.7 m per pixel. There are 16 classes in the Salinas Valley scene. Following, e.g., [48], a subset of this scene, denoted Salinas-A hereinafter, is adopted; it consists of 86 × 83 pixels with 6 classes, and 204 bands remain after removing the noisy bands. The subset corresponds to rows [591–678] and columns [158–240] of the Salinas Valley image.

4.1.3. Jasper Ridge

The Jasper Ridge dataset contains 512 × 614 pixels and 224 spectral bands. After removing spectral bands 1–3, 108–112, 154–166, and 220–224, which are affected by water vapor and atmospheric effects, 198 spectral bands remain. Since the ground truth is too complex to obtain for the whole image, we consider a sub-image of 100 × 100 pixels with four classes, whose first pixel starts from the (105, 269)-th pixel of the original image.

4.2. Experimental Setup

4.2.1. Evaluation Metrics

In the experiments, the normalized mutual information (NMI) is employed to gauge the clustering performance quantitatively; it measures the overlap between the experimentally obtained labels and the ground-truth labels. Given two variables $A$ and $B$, NMI is defined as [49]:
$\mathrm{NMI}(A, B) = \frac{I(A, B)}{\sqrt{H(A) H(B)}},$  (36)
where $I(A, B)$ is the mutual information between $A$ and $B$, and $H(A)$ and $H(B)$ denote the entropies of $A$ and $B$, respectively. Obviously, if $A$ is identical to $B$, $\mathrm{NMI}(A, B)$ equals 1; if $A$ is independent of $B$, $\mathrm{NMI}(A, B)$ becomes 0.
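Since NMI is invariant to permutations of the cluster indices, it can be computed directly from the raw clustering labels; a hypothetical toy example using scikit-learn is shown below (the geometric averaging option matches the square-root normalization above).

```python
from sklearn.metrics import normalized_mutual_info_score

labels_true = [0, 0, 1, 1, 2, 2]        # toy ground-truth labels
labels_pred = [1, 1, 0, 0, 2, 2]        # toy clustering result (permuted cluster ids)
nmi = normalized_mutual_info_score(labels_true, labels_pred,
                                   average_method="geometric")   # equals 1.0 here
```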
In addition, we evaluate the clustering performance by measuring the user's accuracy, producer's accuracy, overall accuracy (OA), average accuracy (AA), and the κ coefficient. For a dataset with $n$ pixels, let $y_i$ be the clustering label of pixel $x_i$ obtained by the clustering method and $g_i$ the ground-truth label of $x_i$. The OA is obtained by
$\mathrm{OA} = \frac{1}{n} \sum_{i=1}^{n} \delta\!\left(\mathrm{map}(y_i), g_i\right),$  (37)
where $\delta(x, y) = 1$ if $x = y$ and $\delta(x, y) = 0$ otherwise, and $\mathrm{map}(\cdot)$ is the optimal mapping function that permutes the clustering labels to match the ground-truth labels. The best mapping can be found with the Kuhn–Munkres algorithm [50]. The average accuracy (AA) is the ratio between the number of correct predictions in each class and the total number of samples of that class, averaged over the classes. For clustering tasks, the clustering labels obtained in the experiment must be aligned to the class labels of the ground truth. To this end, a simple exhaustive search over all permutations of the cluster labels is used to maximize the resulting OA, as was done in [51]. We note that this alignment is the most beneficial for maximizing the OA measure; alternative alignments might be better suited to maximizing AA or κ [52].
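A sketch of this OA computation, using the Hungarian algorithm from SciPy to find the best cluster-to-class mapping, is shown below (the function and variable names are assumptions; labels are assumed to be integers in 0..c-1):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def overall_accuracy(y_pred, y_true, n_clusters):
    """OA with the best cluster-to-class mapping found by the Hungarian algorithm."""
    # Contingency table: how many pixels of true class t fall into predicted cluster p.
    count = np.zeros((n_clusters, n_clusters))
    for p, t in zip(y_pred, y_true):
        count[p, t] += 1
    rows, cols = linear_sum_assignment(-count)          # maximize the total agreement
    mapping = {int(r): int(c) for r, c in zip(rows, cols)}
    return np.mean([mapping[int(p)] == int(t) for p, t in zip(y_pred, y_true)])
```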

4.2.2. Compared Methods

In order to evaluate the clustering performance of the proposed DHLR and UDHLR algorithms, eight clustering methods are selected for fair comparison. The first category comprises two centroid-based methods, k-means [6] and fuzzy c-means (FCM) [7]. The number of iterations of k-means is set to 200 in our experiments, and the fuzziness exponent of FCM is set to 2. For the second category, we compare against classical spectral-based clustering approaches using both a globally connected graph (GSC [8]) and a locally connected graph (LSC [9]); their graph weights are constructed with a Gaussian kernel. The third category comprises four subspace-based spectral clustering methods, namely SSC [10,11], LRSC [12,13], GLRSC [15], and the hypergraph-regularized LRSC (HGLRSC) described in [52].
For SSC and LRSC, the regularization parameter is chosen by an exhaustive search over the set $\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 1, 10^{1}, 10^{2}, 10^{3}, 10^{4}\}$, selecting the value that yields the best clustering result. The same is done for the parameter pair $\lambda_1$ and $\lambda_2$ in both GLRSC and HGLRSC, searching over the same set. These parameter values are used throughout the remainder of the experimental results.

4.3. Parameters Tuning

There are five parameters, $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$, and $\lambda_5$, in UDHLR, and three parameters, $\lambda_1$, $\lambda_2$, and $\lambda_3$, in DHLR. In this section, we evaluate the parameter sensitivity of the proposed methods on the three datasets and investigate different parameter settings. In the experiments, we tune each parameter in the range $\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 1, 10^{1}, 10^{2}, 10^{3}, 10^{4}\}$ and observe the variation of OA with the value of each parameter.
(1) Parameter analysis in DHLR: In DHLR, $\lambda_1$ is the manifold regularization parameter, $\lambda_2$ is the noise regularization parameter, and $\lambda_3$ is the penalty parameter of the hyperedge weight matrix $W$. Figure 3 shows the OA of DHLR with respect to $\lambda_1$. For the Indian Pines dataset, OA peaks at $\lambda_1 = 0.1$; for the Salinas-A dataset, we set $\lambda_1 = 1$ to obtain the best result; for the Jasper Ridge dataset, the clustering results are best when $\lambda_1 = 1$. Figure 4 shows the OA of DHLR with respect to $\lambda_2$. For the Indian Pines dataset, OA peaks at $\lambda_2 = 1000$; for the Salinas-A dataset, we set $\lambda_2 = 1$; for the Jasper Ridge dataset, the best results are obtained with $\lambda_2 = 0.01$. Figure 5 shows the OA of DHLR with respect to $\lambda_3$. According to Figure 5, the proposed method achieves better performance with $\lambda_3$ set to 1000, 0.01, and 0.001 for the Indian Pines, Salinas-A, and Jasper Ridge datasets, respectively.
(2) Parameter analysis in UDHLR: In addition to the three parameters $\lambda_1$, $\lambda_2$, and $\lambda_3$ shared with DHLR, $\lambda_4$ is the parameter of the label manifold regularization and $\lambda_5$ controls the discrete label learning. Figure 6 shows the OA of UDHLR with respect to $\lambda_1$. For the Indian Pines dataset, the best results are achieved when $\lambda_1 = 10$; for the Salinas-A dataset, when $\lambda_1 = 1000$; and for the Jasper Ridge dataset, when $\lambda_1 = 0.001$. Figure 7 shows the OA of UDHLR with respect to $\lambda_2$. For the Indian Pines dataset, the best results are achieved when $\lambda_2 = 0.01$; for the Salinas-A dataset, when $\lambda_2 = 1$; and for the Jasper Ridge dataset, when $\lambda_2 = 100$. Figure 8 shows the OA of UDHLR with respect to $\lambda_3$. UDHLR performs well when $\lambda_3$ is set to 1, 0.01, and 100 for the Indian Pines, Salinas-A, and Jasper Ridge datasets, respectively. In UDHLR, $\lambda_4$ and $\lambda_5$ play a vital role in the clustering performance. Figure 9 shows the OA values on the three datasets when tuning $\lambda_4$ with the other parameters fixed. As can be seen, the best result is achieved when $\lambda_4 = 10$ for the Indian Pines dataset; for the Salinas-A dataset, we set $\lambda_4 = 1000$; and the results in Figure 9c show that UDHLR performs well when $\lambda_4 = 1000$ for the Jasper Ridge dataset. Figure 10 shows the OA of UDHLR with respect to $\lambda_5$. For the Indian Pines dataset, OA peaks at $\lambda_5 = 0.01$; for the Salinas-A dataset, we set $\lambda_5 = 0.001$; and for the Jasper Ridge dataset, the best results are obtained with $\lambda_5 = 1$.

4.4. Investigation of Clustering Performance

Both the clustering maps and quantitative evaluation results are given in this section. The presented results clearly demonstrate that DHLR and UDHLR outperform the other methods on the three datasets. We run all the methods 100 times independently and report the mean clustering results in the corresponding tables for the three datasets. In addition, the corresponding variances of the methods on the three datasets are reported in Figure 11.
(1) Indian Pines: Figure 12 shows the clustering maps of the Indian Pines dataset, and Table 3 gives the quantitative clustering results. In general, the graph-based methods perform better than the methods without a graph. Specifically, the k-means and FCM methods perform poorly, with many misclassifications in the cluster map, because they do not explore the local geometrical structure of the data. Compared with k-means and FCM, the GSC and LSC methods improve the clustering results by applying k-means to the eigenspace of the Laplacian matrices. In contrast, the subspace clustering methods obtain much better performance by using subspace learning to model the complex inherent structure of the HSI data. Compared with k-means, SSC and LRSC perform much better on this dataset, obtaining OA increments of 3.82% and 5.58%, respectively. However, the learned representation coefficient matrix cannot capture the essential geometric structure information, so the clustering results are still not very high. GLRSC and HGLRSC improve the clustering performance of LRSC by optimizing the low-rank representation coefficients with graph and hypergraph regularization, which shows the advantage of incorporating the latent geometric structure information. Unfortunately, their hypergraph is fixed: it is constructed from the original data and is not optimized adaptively. The proposed DHLR algorithm improves by 4.49% over the classical LRSC, and by more than 4.07% over HGLRSC. Furthermore, the proposed UDHLR method obtains the best results, with a 2.07% improvement over DHLR.
(2) Salinas-A: Figure 13 illustrates the visual results on the Salinas-A dataset, and Table 4 gives the corresponding quantitative clustering results. Among the compared algorithms, GLRSC and HGLRSC combine graph theory and representation learning for HSI data clustering, whereas SSC and LRSC only use representation learning to obtain the new feature, and GSC and LSC only use graph theory for clustering. It can be seen from Table 4 that the clustering accuracy of GSC, LSC, SSC, and LRSC is lower than that of GLRSC and HGLRSC, which indicates that learning with local geometric structure information can improve HSI clustering markedly. In addition, the k-means and FCM methods perform worse than the spectral-based methods. Compared with the aforementioned methods, the proposed DHLR and UDHLR effectively improve the clustering performance by optimizing the hypergraph adaptively. As shown in Table 4, UDHLR achieves the highest OA among all methods. We can see that the proposed DHLR and UDHLR algorithms effectively preserve the detailed structure information and show an obvious advantage over the other clustering methods.
(3) Jasper Ridge: Figure 14 and Table 5 show the visual and quantitative clustering results on the Jasper Ridge dataset, respectively. From Figure 14 and Table 5, we can see that the centroid-based and spectral-based clustering methods (k-means, FCM, GSC, LSC, SSC, and LRSC) achieve poorer clustering performance than the methods that combine graph or hypergraph regularization. GLRSC obtains a much higher clustering accuracy than LRSC, and HGLRSC obtains higher clustering precision than both LRSC and GLRSC. The proposed DHLR and UDHLR algorithms outperform the other state-of-the-art clustering methods significantly. Among them, UDHLR achieves the best clustering result, with an OA of 92.56%, which again demonstrates the advantage of the proposed algorithm.

5. Discussion

In this section, we discuss the computational complexity of the proposed DHLR and UDHLR methods. The main computational cost of the DHLR algorithm lies in updating $J^{t+1}$, $Z^{t+1}$, and $N^{t+1}$, each of which requires about $O(n^2 r)$ operations, where, as in [13], $r$ is the rank of the dictionary with an orthogonal basis. Updating $H^{t+1}$ requires constructing an $n \times n$ matrix, with a time complexity of $O(n d^2)$, and the complexity of updating $W^{t+1}$ is also $O(n d^2)$. In addition, updating the Lagrange multipliers takes $O(n d)$, which is small enough to be neglected. The additional complexity of the UDHLR algorithm comes from updating $F^{t+1}$, $Q^{t+1}$, and $Y^{t+1}$; the variables $J^{t+1}$, $Z^{t+1}$, $N^{t+1}$, $H^{t+1}$, and $W^{t+1}$ are updated as in DHLR. The complexity of updating $F^{t+1}$ is $O(n c^2 + c^3)$; solving $Q^{t+1}$ involves an SVD with complexity $O(n c^2 + c^3)$; and updating $Y^{t+1}$ requires $O(n c^2)$. Therefore, the total complexity of UDHLR is $O(n^2 r + n d^2 + n c^2 + c^3)$. Although the number of clusters $c$ is small, the computational complexity of the proposed methods is considerably higher than that of the original LRSC algorithm because of the matrix inversion and SVD involved. In the future, we will consider parallel computing to increase the running speed.

6. Conclusions

In this paper, we propose a novel unified adaptive hypergraph-regularized low-rank subspace learning method for hyperspectral clustering. In the proposed framework, the low-rank and hypergraph terms are used to explore the global and local structure information of the data, and the last two terms are used to learn the continuous and discrete labels. Specifically, the hypergraph is adaptively learned from the low-rank subspace feature rather than from a fixed incidence matrix, which makes it theoretically better suited for clustering. Moreover, the proposed model learns a rotation matrix to obtain continuous and discrete cluster labels simultaneously, avoiding the information loss caused by relaxation in many existing spectral clustering methods. It jointly learns the similarity hypergraph from the learned low-rank subspace data and the discrete clustering labels by solving one optimization problem, in which the subspace feature is adaptively learned by considering the clustering performance, and the continuous clustering labels serve only as intermediate products. The experimental results demonstrate that the proposed DHLR and UDHLR outperform the existing clustering methods. However, the computational complexity of each iteration is high and should be reduced in terms of running time. In the future, we will optimize the complexity of the proposed method and extend the hypergraph learning to large-scale hyperspectral image clustering.

Author Contributions

Conceptualization, J.X. and L.X.; methodology, L.X.; software, J.X.; validation, J.X., and L.X.; formal analysis, L.X. and J.Y.; writing—original draft preparation, J.X.; writing—review and editing, L.X. and J.Y.; visualization, J.X.; supervision, L.X. and J.Y.; project administration, L.X.; funding acquisition, L.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grants No. 61871226 and 61571230), the Jiangsu Provincial Social Developing Project (Grant No. BE2018727), and the National Major Research Plan of China (Grant No. 2016YFF0103604).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data reported in this paper are available on https://rslab.ut.ac.ir/data, accessed on 15 February 2021.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wang, Q.; He, X.; Li, X. Locality and structure regularized low rank representation for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2018, 57, 911–923. [Google Scholar] [CrossRef] [Green Version]
  2. Liu, J.; Xiao, Z.; Chen, Y.; Yang, J. Spatial-spectral graph regularized kernel sparse representation for hyperspectral image classification. ISPRS Int. J. Geo-Inform. 2017, 6, 258. [Google Scholar] [CrossRef]
  3. Liu, J.; Wu, Z.; Xiao, Z.; Yang, J. Classification of hyperspectral images using kernel fully constrained least squares. ISPRS Int. J. Geo-Inform. 2017, 6, 344. [Google Scholar] [CrossRef] [Green Version]
  4. Shen, Y.; Xiao, L.; Chen, J.; Pan, D. A Spectral-Spatial Domain-Specific Convolutional Deep Extreme Learning Machine for Supervised Hyperspectral Image Classification. IEEE Access 2019, 7, 132240–132252. [Google Scholar] [CrossRef]
  5. Zhang, H.; Zhai, H.; Zhang, L.; Li, P. Spectral—spatial sparse subspace clustering for hyperspectral remote sensing images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 3672–3684. [Google Scholar] [CrossRef]
  6. Lloyd, S. Least squares quantization in PCM. IEEE Trans. Inform. Theory 1982, 28, 129–137. [Google Scholar] [CrossRef]
  7. Peizhuang, W. Pattern recognition with fuzzy objective function algorithms (James C. Bezdek). SIAM Rev. 1983, 25, 442. [Google Scholar] [CrossRef]
  8. Ng, A.Y.; Jordan, M.I.; Weiss, Y. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems; MIT Press: Vancouver, BC, Canada, 2002; pp. 849–856. [Google Scholar]
  9. Belkin, M.; Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neur. Comput. 2003, 15, 1373–1396. [Google Scholar] [CrossRef] [Green Version]
  10. Elhamifar, E.; Vidal, R. Sparse subspace clustering. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Los Alamitos, CA, USA, 20–25 June 2009; pp. 2790–2797. [Google Scholar]
  11. Elhamifar, E.; Vidal, R. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 2765–2781. [Google Scholar] [CrossRef] [Green Version]
  12. Vidal, R.; Favaro, P. Low rank subspace clustering (LRSC). Pattern Recognit. Lett. 2014, 43, 47–61. [Google Scholar] [CrossRef] [Green Version]
  13. Liu, G.; Lin, Z.; Yan, S.; Sun, J.; Yu, Y.; Ma, Y. Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 171–184. [Google Scholar] [CrossRef] [Green Version]
  14. Zheng, M.; Bu, J.; Chen, C.; Wang, C.; Zhang, L.; Qiu, G.; Cai, D. Graph regularized sparse coding for image representation. IEEE Trans. Image Process. 2010, 20, 1327–1336. [Google Scholar] [CrossRef] [PubMed]
  15. Lu, X.; Wang, Y.; Yuan, Y. Graph-regularized low-rank representation for destriping of hyperspectral images. IEEE Trans. Geosci. Remote Sens. 2013, 51, 4009–4018. [Google Scholar] [CrossRef]
  16. Berge, C. Hypergraphs; North-Holland: Amsterdam, The Netherlands, 1989. [Google Scholar]
  17. Zhou, D.; Huang, J.; Schölkopf, B. Learning with hypergraphs: Clustering, classification, and embedding. Adv. Neural Inf. Process. Syst. 2006, 19, 1601–1608. [Google Scholar]
  18. Yuan, H.; Tang, Y.Y. Learning with hypergraph for hyperspectral image feature extraction. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1695–1699. [Google Scholar] [CrossRef]
  19. Bai, X.; Guo, Z.; Wang, Y.; Zhang, Z.; Zhou, J. Semisupervised hyperspectral band selection via spectral—Spatial hypergraph model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2774–2783. [Google Scholar] [CrossRef] [Green Version]
  20. Du, W.; Qiang, W.; Lv, M.; Hou, Q.; Zhen, L.; Jing, L. Semi-supervised dimension reduction based on hypergraph embedding for hyperspectral images. Int. J. Remote Sens. 2018, 39, 1696–1712. [Google Scholar] [CrossRef]
  21. Chang, Y.; Yan, L.; Zhong, S. Hyper-laplacian regularized unidirectional low-rank tensor recovery for multispectral image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4260–4268. [Google Scholar]
  22. Huang, H.; Chen, M.; Duan, Y. Dimensionality reduction of hyperspectral image using spatial-spectral regularized sparse hypergraph embedding. Remote Sens. 2019, 11, 1039. [Google Scholar] [CrossRef] [Green Version]
  23. Gao, S.; Tsang, I.W.H.; Chia, L.T. Laplacian sparse coding, hypergraph laplacian sparse coding, and applications. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 92–104. [Google Scholar] [CrossRef] [PubMed]
  24. Zeng, K.; Yu, J.; Li, C.; You, J.; Jin, T. Image clustering by hyper-graph regularized non-negative matrix factorization. Neurocomputing 2014, 138, 209–217. [Google Scholar] [CrossRef]
  25. Wang, W.; Qian, Y.; Tang, Y.Y. Hypergraph-regularized sparse NMF for hyperspectral unmixing. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 681–694. [Google Scholar] [CrossRef]
  26. Yin, M.; Gao, J.; Lin, Z. Laplacian regularized low-rank representation and its applications. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 504–517. [Google Scholar] [CrossRef] [PubMed]
  27. Zeng, M.; Ning, B.; Hu, C.; Gu, Q.; Cai, Y.; Li, S. Hyper-Graph Regularized Kernel Subspace Clustering for Band Selection of Hyperspectral Image. IEEE Access 2020, 8, 135920–135932. [Google Scholar] [CrossRef]
  28. Zhang, Z.; Bai, L.; Liang, Y.; Hancock, E. Joint hypergraph learning and sparse regression for feature selection. Pattern Recognit. 2017, 63, 291–309. [Google Scholar] [CrossRef] [Green Version]
  29. Zhang, Z.; Lin, H.; Gao, Y.; BNRist, K. Dynamic Hypergraph Structure Learning. In Proceedings of the International Joint Conferences on Artificial Intelligence Organization, Stockholm, Sweden, 13–19 June 2018; pp. 3162–3169. [Google Scholar]
  30. Zhu, X.; Zhu, Y.; Zhang, S.; Hu, R.; He, W. Adaptive Hypergraph Learning for Unsupervised Feature Selection. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence IJCAI, Melbourne, VIC, Australia, 19–25 August 2017; pp. 3581–3587. [Google Scholar]
  31. Zhu, X.; Zhang, S.; Zhu, Y.; Zhu, P.; Gao, Y. Unsupervised Spectral Feature Selection with Dynamic Hyper-graph Learning. IEEE Trans. Knowl. Data Eng. 2020. [Google Scholar] [CrossRef]
  32. Tang, C.; Liu, X.; Wang, P.; Zhang, C.; Li, M.; Wang, L. Adaptive hypergraph embedded semi-supervised multi-label image annotation. IEEE Trans. Multimed. 2019, 21, 2837–2849. [Google Scholar] [CrossRef]
  33. Ding, D.; Yang, X.; Xia, F.; Ma, T.; Liu, H.; Tang, C. Unsupervised feature selection via adaptive hypergraph regularized latent representation learning. Neurocomputing 2020, 378, 79–97. [Google Scholar] [CrossRef]
  34. Kang, Z.; Peng, C.; Cheng, Q.; Xu, Z. Unified Spectral Clustering with Optimal Graph. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), San Francisco, CA, USA, 4–9 February 2017. [Google Scholar]
  35. Han, Y.; Zhu, L.; Cheng, Z.; Li, L.; Liu, X. Discrete Optimal Graph Clustering. IEEE Trans. Cybernet. 2020, 50, 1697–1710. [Google Scholar] [CrossRef] [Green Version]
  36. Yang, Y.; Shen, F.; Huang, Z.; Shen, H.T. A Unified Framework for Discrete Spectral Clustering. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; pp. 2273–2279. [Google Scholar]
  37. Lin, Z.; Chen, M.; Ma, Y. The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices. arXiv 2010, arXiv:1009.5055. [Google Scholar]
  38. Liu, G.; Yan, S. Latent Low-Rank Representation for subspace segmentation and feature extraction. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 1615–1622. [Google Scholar] [CrossRef]
  39. Boley, D.; Chen, Y.; Bi, J.; Wang, J.Z.; Huang, J.; Nie, F.; Huang, H.; Rahimi, A.; Recht, B. Spectral Rotation versus K-Means in Spectral Clustering. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, Bellevue, WA, USA, 14–18 July 2013. [Google Scholar]
  40. Kang, Z.; Peng, C.; Cheng, Q. Twin Learning for Similarity and Clustering: A Unified Kernel Approach. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), San Francisco, CA, USA, 4–9 February 2017. [Google Scholar]
  41. Mohar, B. The Laplacian spectrum of graphs. In Graph Theory, Combinatorics, and Applications; Wiley: Hoboken, NJ, USA, 1991; pp. 871–898. [Google Scholar]
  42. Fan, K. On a Theorem of Weyl Concerning Eigenvalues of Linear Transformations I. Proc. Natl. Acad. Sci. USA 1949, 35, 652–655. [Google Scholar] [CrossRef] [Green Version]
  43. Yu, S.X.; Shi, J. Multiclass Spectral Clustering. In Proceedings of the IEEE International Conference on Computer Vision, Nice, France, 13–16 October 2003. [Google Scholar]
  44. Wen, Z.; Yin, W. A feasible method for optimization with orthogonality constraints. Math. Programm. 2013, 142, 397–434. [Google Scholar] [CrossRef] [Green Version]
  45. Nie, F.; Zhang, R.; Li, X. A generalized power iteration method for solving quadratic problem on the Stiefel manifold. Sci. China Inf. Sci. 2017, 60, 112101. [Google Scholar] [CrossRef]
  46. Zhong, Y.; Zhang, L.; Gong, W. Unsupervised remote sensing image classification using an artificial immune network. Int. J. Remote Sens. 2011, 32, 5461–5483. [Google Scholar] [CrossRef]
  47. Ji, R.; Gao, Y.; Hong, R.; Liu, Q. Spectral-Spatial Constraint Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2014, 3, 1811–1824. [Google Scholar] [CrossRef]
  48. Ul Haq, Q.S.; Tao, L.; Sun, F.; Yang, S. A Fast and Robust Sparse Approach for Hyperspectral Data Classification Using a Few Labeled Samples. Geosci. Remote Sens. IEEE Trans. 2012, 50, 2287–2302. [Google Scholar] [CrossRef]
  49. Strehl, A.; Ghosh, J. Cluster Ensembles—A Knowledge Reuse Framework for Combining Multiple Partitions. J. Mach. Learn. Res. 2002, 3, 583–617. [Google Scholar]
  50. Lovasz, L.; Plummer, M.D. Matching Theory; AMS Chelsea Publishing: Amsterdam, North Holland, 1986. [Google Scholar]
  51. Murphy, J.M.; Maggioni, M. Unsupervised Clustering and Active Learning of Hyperspectral Images with Nonlinear Diffusion. IEEE Trans. Geosci. Remote Sens. 2019, 57, 1829–1845. [Google Scholar] [CrossRef] [Green Version]
  52. Xu, J.; Fowler, J.E.; Xiao, L. Hypergraph-Regularized Low-Rank Subspace Clustering Using Superpixels for Unsupervised Spatial-Spectral Hyperspectral Classification. IEEE Geosci. Remote Sens. Lett. 2020, 1–5. [Google Scholar] [CrossRef]
Figure 1. Illustration of the proposed method.
Figure 2. Illustration of updating the dynamic hypergraph.
Figure 3. The OA of DHLR with different $\lambda_1$ on three datasets.
Figure 4. The OA of DHLR with different $\lambda_2$ on three datasets.
Figure 5. The OA of DHLR with different $\lambda_3$ on three datasets.
Figure 6. The OA of UDHLR with different $\lambda_1$ on three datasets.
Figure 7. The OA of UDHLR with different $\lambda_2$ on three datasets.
Figure 8. The OA of UDHLR with different $\lambda_3$ on three datasets.
Figure 9. The OA of UDHLR with different $\lambda_4$ on three datasets.
Figure 10. The OA of UDHLR with different $\lambda_5$ on three datasets.
Figure 11. Histogram of the clustering accuracy with variance.
Figure 12. Indian Pines dataset. (a) Ground-truth. (b) k-means, 40.66%. (c) FCM, 40.70%. (d) GSC, 42.33%. (e) LSC, 44.13%. (f) SSC, 44.48%. (g) LRSC, 46.24%. (h) GLRSC, 46.86%. (i) HGLRSC, 47.38%. (j) DHLR, 51.45%. (k) UDHLR, 53.52%.
Figure 13. Salinas-A dataset. (a) Ground-truth. (b) k-means, 65.12%. (c) FCM, 61.21%. (d) GSC, 69.27%. (e) LSC, 61.36%. (f) SSC, 66.49%. (g) LRSC, 74.79%. (h) GLRSC, 75.15%. (i) HGLRSC, 75.45%. (j) DHLR, 79.37%. (k) UDHLR, 84.31%.
Figure 14. Jasper Ridge dataset. (a) Ground-truth. (b) k-means, 75.56%. (c) FCM, 75.28%. (d) GSC, 70.75%. (e) LSC, 77.55%. (f) SSC, 79.52%. (g) LRSC, 80.12%. (h) GLRSC, 81.09%. (i) HGLRSC, 81.16%. (j) DHLR, 82.89%. (k) UDHLR, 92.56%.
Table 1. Important notation used in this paper.

Notation | Definition
d | Number of bands
n | Number of pixels
c | Number of classes
X | Hyperspectral image
Z | Low-rank representation matrix
N | Noise matrix
G | A hypergraph
V | The vertices of the hypergraph
E | The hyperedges of the hypergraph
W | The weights of the hyperedges
H | The incidence matrix of the hypergraph
L | Hypergraph Laplacian matrix
Q | Rotation matrix
F | The continuous label indicator matrix
Y | The label matrix
t | Number of iterations
Table 2. Detailed information of the three hyperspectral datasets.

Datasets | Size (N) | Dim (D) | Classes (C)
Salinas-A | 7138 | 204 | 6
Jasper Ridge | 10,000 | 198 | 4
Indian Pines | 21,025 | 200 | 9
Table 3. Performance of Indian Pines dataset.

Class | k-means | FCM | GSC | LSC | SSC | LRSC | GLRSC | HGLRSC | DHLR | UDHLR
User's accuracy (%)
C1 | 26.50 | 42.61 | 29.15 | 11.16 | 56.97 | 56.97 | 56.90 | 23.84 | 0.55 | 0.62
C2 | 0.00 | 15.83 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.35 | 0.00 | 0.00
C3 | 7.04 | 12.27 | 18.51 | 4.63 | 28.77 | 6.04 | 44.26 | 57.34 | 65.79 | 65.79
C4 | 40.70 | 35.48 | 31.73 | 30.12 | 88.35 | 95.85 | 90.22 | 76.43 | 83.93 | 87.81
C5 | 99.39 | 98.98 | 91.41 | 99.59 | 99.80 | 100 | 99.79 | 99.59 | 99.59 | 99.59
C6 | 17.46 | 31.51 | 0.00 | 4.24 | 0.00 | 0.72 | 0.72 | 0.61 | 50.20 | 0.72
C7 | 66.05 | 43.56 | 73.95 | 88.86 | 45.87 | 45.87 | 46.23 | 81.28 | 59.92 | 85.61
C8 | 0.49 | 0.00 | 27.52 | 0.00 | 31.11 | 30.78 | 31.92 | 0.32 | 41.20 | 42.50
C9 | 61.28 | 67.23 | 59.35 | 76.89 | 56.11 | 72.72 | 65.45 | 56.10 | 88.17 | 88.17
Producer's accuracy (%)
C1 | 23.39 | 31.17 | 30.13 | 27.35 | 27.98 | 27.98 | 28.20 | 22.55 | 21.05 | 22.50
C2 | 0.00 | 14.66 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 2.63 | 0.00 | 0.00
C3 | 6.27 | 17.13 | 26.13 | 17.42 | 50.88 | 11.53 | 31.65 | 35.62 | 93.42 | 93.96
C4 | 80.21 | 88.92 | 86.49 | 90.00 | 89.30 | 72.91 | 78.92 | 80.19 | 83.15 | 81.38
C5 | 63.36 | 64.79 | 71.86 | 49.74 | 99.18 | 90.72 | 96.06 | 84.25 | 100 | 100
C6 | 18.15 | 21.52 | 0.00 | 25.30 | 0.00 | 12.50 | 12.50 | 12.76 | 24.01 | 2.80
C7 | 43.87 | 53.80 | 44.11 | 39.67 | 43.70 | 43.48 | 43.69 | 42.30 | 41.90 | 40.57
C8 | 60.00 | 0.00 | 21.91 | 0.00 | 25.67 | 25.36 | 25.42 | 100 | 24.80 | 24.53
C9 | 72.88 | 73.85 | 73.00 | 75.78 | 69.40 | 75.94 | 88.69 | 87.05 | 100 | 100
OA (%) | 40.66 | 40.70 | 42.33 | 44.13 | 44.48 | 46.24 | 46.96 | 47.38 | 51.45 | 53.52
AA (%) | 35.43 | 38.61 | 36.85 | 35.05 | 45.22 | 45.44 | 48.39 | 43.99 | 54.37 | 52.32
κ | 0.284 | 0.308 | 0.299 | 0.303 | 0.341 | 0.361 | 0.372 | 0.353 | 0.423 | 0.429
NMI (%) | 43.08 | 41.17 | 43.55 | 46.68 | 47.28 | 45.31 | 46.12 | 46.58 | 48.68 | 54.26
Table 4. Performance of Salinas-A dataset.

Class | k-means | FCM | GSC | LSC | SSC | LRSC | GLRSC | HGLRSC | DHLR | UDHLR
User's accuracy (%)
C1 | 0.00 | 99.74 | 100 | 99.74 | 99.48 | 99.74 | 99.48 | 99.48 | 0.00 | 99.74
C2 | 92.85 | 0.00 | 100 | 0.00 | 62.50 | 94.48 | 87.98 | 89.28 | 90.58 | 92.04
C3 | 53.83 | 48.06 | 49.63 | 46.62 | 54.03 | 99.86 | 97.70 | 99.27 | 98.22 | 100
C4 | 100 | 99.85 | 99.40 | 99.85 | 99.85 | 0.00 | 0.00 | 0.00 | 99.85 | 99.70
C5 | 87.23 | 95.11 | 92.86 | 95.11 | 98.12 | 99.74 | 99.37 | 99.24 | 90.48 | 97.49
C6 | 53.53 | 53.46 | 39.38 | 55.69 | 37.30 | 52.71 | 59.86 | 58.74 | 59.04 | 42.88
Producer's accuracy (%)
C1 | 0.00 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0.00 | 100
C2 | 94.23 | 0.00 | 33.02 | 0.00 | 30.55 | 88.58 | 95.59 | 96.49 | 96.20 | 42.53
C3 | 94.15 | 89.28 | 70.35 | 97.26 | 50.30 | 97.19 | 96.06 | 95.94 | 77.77 | 96.76
C4 | 52.41 | 55.66 | 98.82 | 90.57 | 96.69 | 0.00 | 0.00 | 0.00 | 89.85 | 97.39
C5 | 63.42 | 94.88 | 98.01 | 58.64 | 90.74 | 54.47 | 54.68 | 54.38 | 99.86 | 99.74
C6 | 100 | 33.75 | 91.04 | 34.45 | 100 | 100 | 93.16 | 97.04 | 95.77 | 99.65
OA (%) | 65.12 | 61.21 | 69.27 | 61.36 | 66.49 | 74.79 | 75.15 | 75.45 | 79.37 | 84.31
AA (%) | 64.57 | 66.04 | 80.21 | 66.17 | 75.21 | 74.42 | 74.07 | 74.34 | 73.03 | 88.64
κ | 0.582 | 0.515 | 0.631 | 0.517 | 0.589 | 0.689 | 0.691 | 0.696 | 0.742 | 0.808
NMI (%) | 70.84 | 62.67 | 67.64 | 63.88 | 64.38 | 84.02 | 81.69 | 83.33 | 81.10 | 86.30
Table 5. Performance of Jasper Ridge dataset.

Class | k-means | FCM | GSC | LSC | SSC | LRSC | GLRSC | HGLRSC | DHLR | UDHLR
User's accuracy (%)
C1 | 97.39 | 97.13 | 56.68 | 72.17 | 63.15 | 95.50 | 78.07 | 78.44 | 95.13 | 92.38
C2 | 59.14 | 56.22 | 99.87 | 99.21 | 97.92 | 67.55 | 90.10 | 90.01 | 91.91 | 100
C3 | 90.07 | 93.28 | 42.17 | 79.65 | 71.49 | 100 | 97.40 | 97.32 | 97.65 | 84.22
C4 | 0.00 | 0.00 | 99.46 | 0.00 | 100 | 0.13 | 2.52 | 2.52 | 1.19 | 87.38
Producer's accuracy (%)
C1 | 93.35 | 95.38 | 100 | 100 | 99.63 | 99.88 | 99.92 | 100 | 99.81 | 92.91
C2 | 100 | 100 | 98.83 | 99.90 | 99.54 | 100 | 99.96 | 99.89 | 100 | 95.08
C3 | 71.19 | 71.49 | 40.94 | 69.76 | 58.29 | 71.94 | 60.23 | 60.43 | 72.13 | 88.60
C4 | 0.00 | 0.00 | 34.70 | 0.00 | 49.02 | 0.01 | 5.47 | 5.38 | 2.75 | 91.26
OA (%) | 75.56 | 75.28 | 70.75 | 77.55 | 79.52 | 80.12 | 81.08 | 81.16 | 82.89 | 92.56
AA (%) | 61.65 | 61.66 | 74.55 | 62.76 | 83.14 | 65.79 | 67.02 | 67.07 | 68.33 | 90.99
κ | 0.662 | 0.659 | 0.606 | 0.690 | 0.719 | 0.723 | 0.732 | 0.733 | 0.757 | 0.894
NMI (%) | 73.56 | 74.45 | 70.24 | 74.48 | 69.25 | 78.43 | 68.72 | 68.82 | 71.46 | 77.18
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.


