Article

Classification of Hyperspectral Images with Robust Regularized Block Low-Rank Discriminant Analysis

1 School of Electronic and Information Engineering, Hebei University of Technology, Tianjin 300401, China
2 Computer Science Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA
3 College of Resources and Environment, Huazhong Agricultural University, Wuhan 430070, China
4 Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
5 Post Doctoral Fellow, School of Energy and Environmental Engineering, Hebei University of Technology, Tianjin 300401, China
* Author to whom correspondence should be addressed.
Remote Sens. 2018, 10(6), 817; https://doi.org/10.3390/rs10060817
Submission received: 22 April 2018 / Revised: 12 May 2018 / Accepted: 17 May 2018 / Published: 24 May 2018
(This article belongs to the Section Remote Sensing Image Processing)

Abstract

Classification of Hyperspectral Images (HSIs) has received considerable attention over the past few decades. In remote sensing image classification, labeled samples are often insufficient or hard to obtain, whereas unlabeled samples are frequently abundant. When labeled samples are scarce, overfitting may occur. To address this issue, we propose a novel approach for HSI feature extraction, called robust regularized Block Low-Rank Discriminant Analysis (BLRDA), a robust and efficient feature extraction method that improves HSI classification accuracy with few labeled samples. To reduce the rapidly growing computational complexity of the low-rank method, we divide the entire image into blocks and implement the low-rank representation for each block separately. Because the regularized graph of discriminant analysis must be symmetric, the k-nearest neighbor algorithm is applied to handle the whole low-rank graph integrally. The low-rank representation and kNN maximally capture and preserve the global and local geometry of the data, respectively, and the performance of regularized discriminant analysis feature extraction is clearly improved. Extensive experiments on multi-class hyperspectral images show that the proposed BLRDA is a robust and efficient feature extraction method. Even with simple supervised and semi-supervised classifiers (nearest neighbor and SVM) and arbitrarily chosen parameters, the method achieves significant results with few labeled samples, outperforming similar feature extraction methods.

Graphical Abstract

1. Introduction

With the advancement of remotely-sensed hyperspectral imaging instruments, Hyperspectral Images (HSIs) have gained widespread attention throughout the globe. An HSI contains rich information owing to its many spectral bands (often more than a hundred) [1] and provides comprehensive spectral information about the physical properties of materials. This technique is applied in agricultural monitoring [2,3], forestry [4], ecosystem monitoring [5], mineral identification [6,7], environmental pollution monitoring [8] and urban growth analysis [9,10]. For HSI classification, good results usually require many labeled samples. The images are regarded as high-dimensional points that lie in, or nearly in, low-dimensional manifolds. Recently, numerous dimensionality reduction algorithms have been proposed to preserve the local manifold structure of the data, such as Locally Linear Embedding (LLE) [11], Isomap [12] and Laplacian Eigenmaps [13]. The main limitation is the lack of sufficient labeled training data, since identifying and labeling samples is a daunting task. The main drawback of machine learning approaches is overfitting [14,15] due to the limited number of training samples when dealing with small-sample problems in high-dimensional and nonlinear cases, which is referred to as the “Hughes phenomenon” [16,17].
To resolve the issue above, semi-supervised learning was proposed to utilize both labeled data and the information conveyed by the marginal distribution of the unlabeled samples to boost algorithmic performance [18,19,20]. Graph embedding maps each node to a low-dimensional feature vector while trying to maintain the connections between vertices [20]. Cai et al. proposed Semi-supervised Discriminant Analysis (SDA), a novel method that takes advantage of both labeled and unlabeled samples [21]. Nevertheless, the performance of semi-supervised learning algorithms relies heavily on the graph construction process. The k-nearest neighbor graph [12], Locally Linear Embedding (LLE) neighbors [13,22] and other traditional methods depend mainly on pairwise Euclidean distances. However, these methods are not robust to noise because they use Euclidean distances to determine the pairwise weights. Yan et al. proposed the $l_1$-graph for Semi-Supervised Learning (SSL) [23]. Zhu et al. presented a series of novel semi-supervised learning approaches based on graph representation [18,24,25]. Belkin and Niyogi [26] proposed a regression function that fits the labels of the labeled data while maintaining the smoothness of the data. Jebara et al. provided a b-matching graph for SSL, which ensures that each node has the same number of edges in the balanced graph [22]. Zhou et al. [18] conducted semi-supervised learning with local and global consistency.
Low-rank representation has been proposed to construct an undirected graph (LR-graph) [27,28,29], which jointly obtains the graph of all the data under a low-rank constraint. The graph regards a finite group of samples, in which each sample is associated with a vertex and each weight represents the similarity of two connected vertices [20,30,31]. In [32], Zhang et al. introduced a Low-Rank Matrix Recovery (LRMR) method for HSI restoration, which can remove various types of noise. Veganzones et al. proposed an approach that partitions the image into patches and solves the fusion problem of each patch by low-rank representation individually [33]. Based on Low-Rank Representation (LRR) and a Learned Dictionary (LD), a hyperspectral anomaly detector has been put forward, which assumes that the hyperspectral image can be decomposed into a low-rank matrix and a sparse matrix, representing the background and anomalies, respectively [34]. To address the failure to preserve spatial information, a Tensor Sparse and Low-rank Graph for Discriminant Analysis (TSLGDA) was proposed [35]. In [36], He et al. suggested a spatial-spectral mixed-noise removal method for HSIs. The Sparse and Low-rank Graph-based Discriminant Analysis (SLGDA) incorporates sparsity and low rank to preserve both local and global information simultaneously [30]. Qi et al. presented a multi-task joint sparse and low-rank representation model for high-resolution satellite image interpretation [37].
The computation of the LRR graph is a serious concern: as the number of data points (i.e., pixels in HSIs) grows, the computational complexity of LRR increases rapidly. To resolve this issue, we explore the low-rank structure via block theory. Figure 1 shows the formulation of the proposed regularized block low-rank discriminant analysis feature extraction for HSI classification. As shown in Figure 1, we first preprocess the hyperspectral image with the Image Fusion and Recursive Filtering (IFRF) feature, which removes noise and redundant information simultaneously [17]. However, our goal is to improve the graph adjacency between pixels for the regularization term of regularized discriminant analysis. Consequently, inspired by the LRR algorithm, we aim to exploit the low-rank structure. LRR uses all samples as the dictionary, where each sample is represented as a linear combination of dictionary atoms. The low-rank hypothesis is a global constraint, which guarantees that data points in the same subspace are clustered into the same class [29]. When all samples are distributed in independent subspaces, the coefficients under the low-rank assumption reveal the membership of the samples: the within-cluster affinities are dense, and the between-cluster affinities are all zeros.
Afterward, we divide the processed image into blocks (subsets) of pixels and implement the low-rank representation on each block image. We then combine the subsets' feature representations into a complete feature graph. Furthermore, the k-nearest neighbor algorithm is applied to the integral low-rank graph to satisfy the symmetric-matrix requirement of the regularized graph in discriminant analysis; at the same time, the k-nearest neighbor preserves the local information of the image [38]. We then perform semi-supervised discriminant analysis for feature extraction, which takes advantage of the labeled samples and the distribution of all samples. Finally, supervised and semi-supervised classifiers are applied. We perform comprehensive experiments on several real multi-class HSIs. The main contributions of the paper are summarized as follows.
  • Inspired by LRR and the semi-supervised discriminant analysis algorithm, we propose a robust feature extraction method, regularized block low-rank discriminant analysis. The block LRR alleviates the growing computational complexity of LRR while capturing the global structure of the data.
  • After image fusion and recursive filtering, we implement our proposed regularized block low-rank discriminant analysis feature extraction method. The kNN approach simultaneously addresses two issues: SDA's symmetry requirement and capturing the local information of the HSIs. Consequently, the BLRDA method maximally preserves the local geometry of the data.
  • Extensive experiments on several multi-class HSIs demonstrate that our proposed BLRDA method is a novel feature extraction method not yet evident in the literature. It can enhance the performance of hyperspectral image classification significantly. Even with simple supervised and semi-supervised classifiers, the feature graph achieves significant performance in HSI classification with few labeled samples. Moreover, the BLRDA feature extraction method is much more robust than similar methods.
The remainder of the paper is organized as follows. Section 2 gives a preliminary introduction. Section 3 describes HSI classification based on regularized block low-rank discriminant analysis. To verify the reproducibility of our proposed method, several experiments on real-world hyperspectral images are carried out in Section 4. Section 5 provides the discussion, and conclusions are presented in Section 6.

2. Preliminary

2.1. Regularization

To avoid overfitting, regularization is implemented. Regularization adds rules (restrictions) to the objective function being trained, which shrinks the solution space and reduces the possibility of finding a wrong solution. The constraint can be interpreted as a priori knowledge (the regularization term is equivalent to introducing a prior distribution over the parameters).
The regularization constraint has a guiding role. When optimizing the error function, it influences the direction of gradient descent, so that the final solution tends to conform to the a priori knowledge (for example, an $l_1$-norm prior indicates that the original problem is more likely to be relatively simple and tends to produce sparse parameters).
Regularization has roughly two functions. From the model-modification point of view, it balances the two terms in the learning process, such as bias versus variance, fitting ability versus generalization ability, loss function versus generalization ability, and empirical risk versus structural risk. From the model-solution point of view, regularization makes a unique solution possible. For example, least-squares fitting may admit infinitely many solutions, but adding an $l_1$ or $l_2$ regularization term can lead to a unique solution.
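As a small illustration of this last point (not part of the original experiments), the following Python sketch shows how an underdetermined least-squares problem, which admits infinitely many solutions, becomes uniquely solvable once an $l_2$ (ridge) penalty is added; the data and the regularization strength are arbitrary.

```python
import numpy as np

# Minimal sketch: an underdetermined least-squares problem (more unknowns
# than equations) has infinitely many solutions; adding an l2 (ridge)
# penalty makes the problem strictly convex and its solution unique.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))      # 10 equations, 50 unknowns
y = rng.normal(size=10)

lam = 0.1                          # illustrative regularization strength
# Ridge solution: (X^T X + lam * I) w = X^T y; X^T X alone is singular here.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Without regularization, lstsq returns only the minimum-norm solution
# out of the infinitely many solutions with the same fitting error.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_ridge.shape, w_ls.shape)
```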
In practice, overfitting [14,15] may happen due to limited training samples [16]. To address the overfitting issue, semi-supervised learning was proposed to utilize both labeled and unlabeled samples to enhance algorithmic performance. The prior assumption of consistency is the key to SSL, which indicates that nearby points are likely to have the same label or similar embeddings [18]. Cai et al. proposed Semi-supervised Discriminant Analysis (SDA), which takes advantage of both labeled and unlabeled samples [21].
The performance of semi-supervised learning algorithms relies heavily on the graph construction process. Hence, inspired by low-rank representation, we aim to improve the semi-supervised discriminant analysis feature extraction method via a novel regularized low-rank graph term. Below, we give a brief outline of two important techniques: semi-supervised discriminant analysis and the low-rank representation algorithm.

2.2. Overview of Semi-supervised Discriminant Analysis

Semi-supervised Discriminant Analysis (SDA) is derived from Linear Discriminant Analysis (LDA). LDA is a supervised method that minimizes within-class covariance while maximizing between-class covariance. The objective function of LDA is as follows:
$$a_{opt} = \arg\max_{a} \frac{a^{T} S_b a}{a^{T} S_w a} \quad (1)$$
where $S_b$ is the between-class scatter matrix and $S_w$ is the within-class scatter matrix.
Overfitting may occur when there are not enough training samples. A possible way to deal with overfitting is to learn from both labeled and unlabeled data by imposing a regularizer [21]. Applying the notion of the regularized graph, a smoothness penalty is incorporated into the objective function of linear discriminant analysis, which alleviates the overfitting problem. The labeled samples in the SDA algorithm are used to maximize the separability of different classes, while the graph adjacency over all samples is used to estimate the underlying geometric information; this is the regularized graph.
Given a set of samples $x_1, \ldots, x_m, x_{m+1}, \ldots, x_{m+l}$, where $N = m + l$, the first $m$ samples are labeled as $y_1, \ldots, y_m$, and the remaining $l$ samples are unlabeled. There are $c$ classes. SDA [21] seeks a projective vector $a$ that incorporates the prior assumption of consistency through a regularization term:
$$a_{opt} = \arg\max_{a} \frac{a^{T} S_b a}{a^{T} S_t a + \alpha J(a)} \quad (2)$$
Here, $S_t$ is the total scatter matrix, and $\alpha$ is the parameter that balances the complexity and the empirical loss of the model. The regularization term $J(a)$ controls the learning complexity of the hypothesis family.
Given a set of examples, $S_{ij}$ is the graph adjacency that models the relationships of nearby data points. Typically, the k-nearest neighbor graph is used to compute the relationships between neighboring points, where an edge between nearest neighbors carries their pairwise weight.
The model can easily incorporate the a priori knowledge through the regularization term $J(a)$, as follows:
$$J(a) = \sum_{ij} \left(a^{T} x_i - a^{T} x_j\right)^{2} S_{ij} = 2\sum_{i} a^{T} x_i D_{ii} x_i^{T} a - 2\sum_{ij} a^{T} x_i S_{ij} x_j^{T} a = 2 a^{T} X (D - S) X^{T} a = 2 a^{T} X L X^{T} a \quad (3)$$
The diagonal matrix $D$ is given by the column (or row, since $S$ is symmetric) sums of $S$, i.e., $D_{ii} = \sum_j S_{ij}$. The Laplacian matrix [23] is $L = D - S$. Then, the SDA objective function with the regularization term $J(a)$ is:
$$a_{opt} = \arg\max_{a} \frac{a^{T} S_b a}{a^{T} \left(S_t + \alpha X L X^{T}\right) a} \quad (4)$$
We obtain the projective vectors $a$ by solving the following generalized eigenvalue problem, where $d$ is the rank of the weight matrix of the labeled graph:
$$S_b a = \lambda \left(S_t + \alpha X L X^{T}\right) a \quad (5)$$
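As an illustrative sketch (not code from the paper), the generalized eigenvalue problem in (5) can be solved numerically as follows; the small ridge term and the function name `sda_projection` are assumptions made for the example.

```python
import numpy as np
from scipy.linalg import eigh

def sda_projection(Sb, St, X, L, alpha, d, eps=1e-6):
    """Minimal sketch of solving S_b a = lambda (S_t + alpha X L X^T) a.

    A small ridge (eps * I) is added so that the right-hand matrix is
    positive definite, as scipy.linalg.eigh requires for generalized
    problems. All inputs are assumed to be numpy arrays of matching sizes.
    """
    B = St + alpha * X @ L @ X.T
    B = B + eps * np.eye(B.shape[0])
    # eigh(A, B) solves A a = lambda B a; eigenvalues come in ascending order.
    vals, vecs = eigh(Sb, B)
    return vecs[:, ::-1][:, :d]    # keep the d leading eigenvectors
```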

2.3. Low-Rank Representation

Low-rank representation was first proposed to construct a graph (LR-graph) [23], which jointly obtains the graph according to the low-rank constraint. It can efficiently capture the global structure of the data [39].
Suppose $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{m \times n}$ is the sample set, where each column is a sample. We represent the feature matrix $X$ on a given dictionary $A$ [29] as:
$$X = AZ \quad (6)$$
where $Z = [z_1, z_2, \ldots, z_n] \in \mathbb{R}^{l \times n}$ is the low-rank coefficient matrix and $z_i$ is the representation coefficient of $x_i$, i.e., the coefficients of a linear combination of dictionary atoms. The so-called low-rank representation is the matrix of these coefficients.
For example, in HSIs, samples of the same class exhibit similar spectra, while samples from different classes do not. Owing to this intra-class similarity and inter-class dissimilarity, each sample can be well represented by other samples of the same class, but not by samples of other classes. Therefore, when represented on the dictionary $A$, a sample $x_i$ from the i-th class will produce significant coefficients on the component $A_i$ and small coefficients on the other components $A_j$ ($j \neq i$). With appropriate permutations of the columns of $X$, the matrix $Z$ will exhibit an apparent block-diagonal structure, as shown in Figure 2.
However, the actual labels of the samples in $X$ are unknown, so it is intractable to directly reveal the block-diagonal structure of $Z$. Nevertheless, the underlying block-diagonal structure makes $Z$ low-rank [40]. Therefore, we implicitly exploit the low-rank property of $Z$ instead of the block-diagonal structure. Furthermore, each sample can be well represented due to the intra-class similarity. The following low-rank framework [29] searches for the lowest-rank solution:
$$\min_{Z} \; \mathrm{rank}(Z) \quad \mathrm{s.t.} \quad X = AZ \quad (7)$$
Due to the discrete nature of the rank function, the above optimization problem is difficult to solve. It is known from the matrix completion literature (e.g., [41]) that the problem can be relaxed to a convex one:
$$\min_{Z} \; \|Z\|_{*} \quad \mathrm{s.t.} \quad X = AZ \quad (8)$$
where $\|\cdot\|_{*}$ is the nuclear norm (i.e., trace norm) [42], the sum of the singular values of the matrix. A more reasonable formulation takes into account the noise or corruption present in real-world situations, where the residual matrix is often sparse:
$$\min_{Z, E} \; \|Z\|_{*} + \lambda \|E\|_{l} \quad \mathrm{s.t.} \quad X = AZ + E \quad (9)$$
where $\|\cdot\|_{l}$ can be the $l_{2,1}$-norm or the $l_1$-norm. Here, the $l_{2,1}$-norm is chosen as the error term, i.e., $\|E\|_{2,1} = \sum_{j=1}^{n} \sqrt{\sum_{i=1}^{n} E_{ij}^{2}}$, and $\lambda$ balances the low-rank term and the error term. The inexact Augmented Lagrange Multiplier (ALM) method [43,44] is applied to obtain the optimal solution $Z^{*}$.
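For readers implementing (9), the two proximal operators the inexact ALM relies on have well-known closed forms: singular value thresholding for the nuclear norm [42] and column-wise shrinkage for the $l_{2,1}$-norm. A minimal sketch (our own illustration, not the authors' code) is given below; these helpers are reused in the Algorithm 1 sketch later.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * ||.||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)          # shrink the singular values
    return (U * s) @ Vt

def l21_shrink(M, tau):
    """Column-wise shrinkage: proximal operator of tau * ||.||_{2,1}."""
    out = np.zeros_like(M)
    norms = np.linalg.norm(M, axis=0)      # l2 norm of each column
    keep = norms > tau
    out[:, keep] = M[:, keep] * (1.0 - tau / norms[keep])
    return out
```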

2.4. Image Fusion and Recursive Filtering Feature

Initially, we extract the spatial information of the HSI using Image Fusion and Recursive Filtering (IFRF), one of the simplest approaches to image fusion [17]. The IFRF feature of an HSI contains the main information of the HSI and eradicates noise and redundant information simultaneously. It plays a vital role in identifying objects and can be used to discriminate between different classes in the classification problem [17]. Hence, we first preprocess the HSI with IFRF to eliminate noise and redundant information.
Suppose $R = (r_1, r_2, \ldots, r_D) \in \mathbb{R}^{M \times D}$ represents the original hyperspectral image, which has $D$ spectral bands and $M$ pixels, with $r_i$ denoting the i-th band of the whole image. We spectrally partition the hyperspectral bands into multiple subsets, each composed of $K$ contiguous bands. Defining $N$ as the number of subsets, we have $N = \lfloor D/K \rfloor$, where $\lfloor \cdot \rfloor$ denotes the floor operation, i.e., the largest integer no greater than $D/K$. The i-th ($i \in \{1, 2, \ldots, N\}$) subset is given as follows:
$$P^{i} = \begin{cases} (r_i, \ldots, r_{i+K}), & \text{if } (i+K) \le D \\ (r_i, \ldots, r_D), & \text{otherwise} \end{cases} \quad (10)$$
Afterward, the adjacent bands of each subset are fused by an image fusion method (here, averaging). For example, the i-th fusion band $Q_i$ is computed as follows:
$$Q_i = \frac{\sum_{n=1}^{N_i} P_n^{i}}{N_i} \quad (11)$$
Here, $P_n^{i}$ refers to the n-th band of the i-th subset of the hyperspectral image, and $N_i$ is the number of bands in the i-th subset. Image fusion removes noisy pixels and redundant information from each subset. Then, we obtain the i-th feature by recursive filtering of the fusion band $Q_i$:
$$O_i = \mathrm{RF}_{\delta_s, \delta_r}(Q_i) \quad (12)$$
Here, $\mathrm{RF}$ denotes the recursive filtering transform, $\delta_s$ is the spatial standard deviation of the filter, and $\delta_r$ is the range standard deviation [45]. After image fusion and recursive filtering, we obtain the feature image $O = [O_1, \ldots, O_N] \in \mathbb{R}^{M \times N}$. Let $X = O^{T} = [x_1, x_2, \ldots, x_M] \in \mathbb{R}^{N \times M}$ be the preprocessed feature vectors, where $x_i$ represents a pixel with $N$ bands (dimensions) of the hyperspectral image.
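A rough sketch of this preprocessing is given below. It is our own illustration: the band grouping and averaging fusion follow the equations above, whereas the edge-preserving recursive filter $\mathrm{RF}_{\delta_s,\delta_r}$ of [45] is replaced by a plain Gaussian filter as a stand-in (so $\delta_r$ is unused here), and the function name and default values are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ifrf_features(cube, K, sigma_s=200, sigma_r=0.3):
    """Sketch of IFRF-style fusion over groups of K adjacent bands.

    cube: HSI as an (H, W, D) array. The recursive filter of Gastal &
    Oliveira [45] is approximated here by a Gaussian filter placeholder,
    since its implementation is not part of this paper; sigma_r is
    therefore unused in this stand-in.
    """
    H, W, D = cube.shape
    n_subsets = D // K                                        # N = floor(D / K)
    feats = []
    for i in range(n_subsets):
        fused = cube[:, :, i * K:(i + 1) * K].mean(axis=2)    # averaging fusion Q_i
        feats.append(gaussian_filter(fused, sigma=2.0))       # placeholder for RF
    return np.stack(feats, axis=2)                            # (H, W, N) feature image
```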

3. Methodology

In the above section, we reviewed classic regularized discriminant analysis, semi-supervised discriminant analysis and LRR, which can efficiently capture the global structure of the data. Furthermore, labeled and unlabeled samples can be exploited simultaneously by depicting the underlying block-diagonal structure of $Z$. With this in mind, we now present the robust regularized block low-rank discriminant analysis feature extraction method and provide insight into the optimization method based on the inexact ALM algorithm. The framework of HSI classification applying the proposed graph is given in the remainder of this section.

3.1. Regularized Block Low-Rank Discriminant Analysis

Let $X = [x_1, x_2, \ldots, x_M] \in \mathbb{R}^{N \times M}$ be the HSI feature vectors, where $N$ is the number of channels (data dimension) and $M$ is the number of pixels (samples). Determining an appropriate subspace for classification is an important task. Computing the low-rank representation feature graph is onerous: the running time increases rapidly with a growing number of samples. Hence, we explore the low-rank structure via block theory, which divides the whole image into blocks for low-rank representation.
We choose a block size of $S$ pixels for each block of the partition. Let $\{g_1, g_2, \ldots, g_m\}$ denote the index sets of the $m$ blocks of the image, where each $g_i$ contains $S$ pixels. According to $\{g_1, g_2, \ldots, g_m\}$, we group the vectors belonging to the same block together as $X = \{X_{g_1}, X_{g_2}, \ldots, X_{g_m}\}$, $A = \{A_{g_1}, A_{g_2}, \ldots, A_{g_m}\}$ and $E = \{E_{g_1}, E_{g_2}, \ldots, E_{g_m}\}$. The LRR optimization problem for each block is then converted into the following form:
$$\min_{Z_{g_i}, E_{g_i}} \; \|Z_{g_i}\|_{*} + \lambda \|E_{g_i}\|_{l} \quad \mathrm{s.t.} \quad X_{g_i} = A_{g_i} Z_{g_i} + E_{g_i} \quad (13)$$
where we choose the $l_{2,1}$-norm as the error term $\|\cdot\|_{l}$, i.e., $\|E_{g_i}\|_{2,1} = \sum_{j=1}^{n} \sqrt{\sum_{i=1}^{n} (E_{g_i})_{ij}^{2}}$. Here, we use the inexact ALM method [43,44] to obtain the optimal solution $Z_{g_i}^{*}$.
To solve problem (13), we introduce an auxiliary variable $J_i$ for $Z_{g_i}$ and convert it into the following equivalent problem [29]:
$$\min_{Z_{g_i}, E_{g_i}, J_i} \; \|J_i\|_{*} + \lambda \|E_{g_i}\|_{2,1} \quad \mathrm{s.t.} \quad X_{g_i} = A_{g_i} Z_{g_i} + E_{g_i}, \; Z_{g_i} = J_i \quad (14)$$
With the intermediate variable $J_i$, we minimize the following augmented Lagrange function via the inexact ALM method [43]:
$$\mathcal{L}(Z_{g_i}, E_{g_i}, J_i, Y_1, Y_2) = \|J_i\|_{*} + \lambda \|E_{g_i}\|_{2,1} + \mathrm{tr}\!\left[Y_1^{T}\!\left(X_{g_i} - A_{g_i} Z_{g_i} - E_{g_i}\right)\right] + \mathrm{tr}\!\left[Y_2^{T}\!\left(Z_{g_i} - J_i\right)\right] + \frac{\mu}{2}\left(\|X_{g_i} - A_{g_i} Z_{g_i} - E_{g_i}\|_F^{2} + \|Z_{g_i} - J_i\|_F^{2}\right) \quad (15)$$
where $Y_1$ and $Y_2$ are the Lagrangian multipliers and $\mu > 0$ is the penalty parameter. To minimize problem (15), we update the variables $Z_{g_i}$, $J_i$, $E_{g_i}$, $Y_1$ and $Y_2$ alternately with the others fixed. Each sub-problem is convex in its variable and therefore admits a unique solution. The inexact ALM is applied to update the variables $J_i$, $Z_{g_i}$ and $E_{g_i}$ iteratively. The optimization process is outlined in Algorithm 1.
After the block low-rank representation, we obtain $Z_{g_i} \in \mathbb{R}^{S \times S}$. Then, we combine these feature representation subsets into a whole feature graph $Z = \{Z_{g_1}, Z_{g_2}, \ldots, Z_{g_m}\} = [z_1, z_2, \ldots, z_M] \in \mathbb{R}^{S \times M}$.
The combined low-rank graph is composed of multiple block graphs and is therefore asymmetric. Traditionally, symmetrization is performed by averaging the matrix with its transpose to satisfy the symmetry requirement of SDA. However, the graph adjacency matrix $Z$ is not a square matrix. Our previous work shows that the k-nearest neighbor is an effective symmetrization process that can significantly improve the performance of SDA [38]. Hence, the kNN algorithm integrally handles the combined low-rank graph, addressing two issues: performing the symmetrization and maximally preserving the local information of the image through the kNN's locality property.
Samples $z_i$ and $z_j$ are considered neighbors if $z_i$ is among the k nearest neighbors of $z_j$ or $z_j$ is among the k nearest neighbors of $z_i$. Here, we use the heat kernel weighting [13] method to assign the weights of $S$ as follows:
$$S_{ij} = \begin{cases} \exp\!\left(-\dfrac{\|z_i - z_j\|^{2}}{2\sigma^{2}}\right), & \text{if } z_i \in N_k(z_j) \text{ or } z_j \in N_k(z_i) \\ 0, & \text{otherwise} \end{cases} \quad (16)$$
where $N_k(z_i)$ denotes the k nearest neighbors of $z_i$ in (16).
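The following sketch (an illustration under our own assumptions, not the authors' code) shows how the symmetric kNN heat-kernel graph of Formula (16) can be built from the combined low-rank coefficients; the helper name and the use of SciPy's `cdist` are our choices.

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_heat_kernel_graph(Z, k=5, sigma=0.1):
    """Sketch of the symmetric kNN graph of Formula (16).

    Z: columns are the combined low-rank representation coefficients z_i.
    Two samples are connected if either is among the other's k nearest
    neighbors; edge weights use the heat kernel.
    """
    Zt = Z.T                                          # samples as rows
    d2 = cdist(Zt, Zt, metric='sqeuclidean')          # pairwise squared distances
    M = Zt.shape[0]
    S = np.zeros((M, M))
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]           # skip self (distance 0)
    for i in range(M):
        for j in nn[i]:
            w = np.exp(-d2[i, j] / (2.0 * sigma ** 2))
            S[i, j] = w
            S[j, i] = w                               # neighbor of either side
    return S
```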
Therefore, the BLRDA method can achieve high-quality feature representations for classification.
Algorithm 1 Solving problem (15) by the inexact ALM.
Input: Mapped data graph $X_{g_i}$, regularization parameter $\lambda$. Initialize $Z_{g_i} = J_i = 0$, $E_{g_i} = 0$, $Y_1 = 0$, $Y_2 = 0$, $\mu = 10^{-6}$, $\mu_{\max} = 10^{6}$, $\rho = 1.1$ and $\varepsilon = 10^{-8}$.
1: while not converged do
2:   Fix the other variables and update $J_i$: $J_i = \arg\min \frac{1}{\mu}\|J_i\|_{*} + \frac{1}{2}\|J_i - (Z_{g_i} + Y_2/\mu)\|_F^{2}$
3:   Fix the other variables and update $Z_{g_i}$: $Z_{g_i} = (I + A_{g_i}^{T} A_{g_i})^{-1}\left(A_{g_i}^{T}(X_{g_i} - E_{g_i}) + J_i + (A_{g_i}^{T} Y_1 - Y_2)/\mu\right)$
4:   Fix the other variables and update $E_{g_i}$: $E_{g_i} = \arg\min \frac{\lambda}{\mu}\|E_{g_i}\|_{2,1} + \frac{1}{2}\|E_{g_i} - (X_{g_i} - A_{g_i} Z_{g_i} + Y_1/\mu)\|_F^{2}$
5:   Update the multipliers: $Y_1 = Y_1 + \mu(X_{g_i} - A_{g_i} Z_{g_i} - E_{g_i})$, $Y_2 = Y_2 + \mu(Z_{g_i} - J_i)$
6:   Update the parameter $\mu$ by $\mu = \min(\rho\mu, \mu_{\max})$
7:   Check the convergence conditions: $\|X_{g_i} - A_{g_i} Z_{g_i} - E_{g_i}\|_{\infty} < \varepsilon$ and $\|Z_{g_i} - J_i\|_{\infty} < \varepsilon$
8: end while
Output: The low-rank representation graph $Z_{g_i}$.
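For concreteness, a compact Python sketch of Algorithm 1 for a single block is given below. It reuses the `svt` and `l21_shrink` helpers sketched in Section 2.3, and the default parameter values mirror the initialization above; it is an illustrative implementation under those assumptions, not the authors' released code.

```python
import numpy as np

def block_lrr_inexact_alm(X, A, lam=0.1, rho=1.1, mu=1e-6, mu_max=1e6,
                          eps=1e-8, max_iter=500):
    """Sketch of Algorithm 1 for one block:
    min ||Z||_* + lam * ||E||_{2,1}  s.t.  X = A Z + E, via inexact ALM.
    Uses svt() and l21_shrink() from the earlier sketch.
    """
    d, n = X.shape
    k = A.shape[1]
    Z = np.zeros((k, n)); J = np.zeros((k, n)); E = np.zeros((d, n))
    Y1 = np.zeros((d, n)); Y2 = np.zeros((k, n))
    inv_term = np.linalg.inv(np.eye(k) + A.T @ A)       # (I + A^T A)^{-1}
    for _ in range(max_iter):
        J = svt(Z + Y2 / mu, 1.0 / mu)                               # step 2
        Z = inv_term @ (A.T @ (X - E) + J + (A.T @ Y1 - Y2) / mu)    # step 3
        E = l21_shrink(X - A @ Z + Y1 / mu, lam / mu)                # step 4
        R1 = X - A @ Z - E
        R2 = Z - J
        Y1 = Y1 + mu * R1                                            # step 5
        Y2 = Y2 + mu * R2
        mu = min(rho * mu, mu_max)                                   # step 6
        if max(np.abs(R1).max(), np.abs(R2).max()) < eps:            # step 7
            break
    return Z
```

In the paper's setting the dictionary A is the block's own data matrix $X_{g_i}$, so one would typically call `block_lrr_inexact_alm(X_block, X_block)` for each block.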

3.2. BLRDA Feature Extraction for HSI Classification

The semi-supervised discriminant analysis method can effectively alleviate the overfitting problem when only a few labeled samples are available. Given a set of labeled samples $\{x_i, y_i\}_{i=1}^{l}$ from $c$ classes and unlabeled samples $\{x_i\}_{i=l+1}^{m}$, let $l_k$ be the number of samples of the k-th class. The algorithmic procedure of HSI classification applying the regularized block low-rank discriminant analysis feature extraction method, for both supervised and semi-supervised classification, is stated below:
Step 1 Construct the adjacency graph: construct the block low-rank and kNN graph $S$ of Formula (16) for the regularization term, and compute the graph Laplacian $L = D - S$.
Step 2 Construct the labeled graph: for the labeled graph, construct the matrix
$$W = \begin{bmatrix} W^{l \times l} & 0 \\ 0 & 0 \end{bmatrix} \quad (17)$$
and define
$$\tilde{I} = \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}$$
where $I$ is an $l \times l$ identity matrix.
Step 3 Eigen-problem: compute the eigenvectors of the generalized eigenvector problem
$$X W X^{T} a = \lambda X (\tilde{I} + \alpha L) X^{T} a \quad (18)$$
where $X = [x_1, \ldots, x_m, x_{m+1}, \ldots, x_{m+l}]$. The matrix $W$ is of rank $d$, so we obtain $d$ eigenvectors, denoted $\{a_1, a_2, \ldots, a_d\}$.
Step 4 Regularized discriminant analysis embedding: let the transformation matrix be $A = [a_1, a_2, \ldots, a_d] \in \mathbb{R}^{M \times d}$. Finally, the samples are embedded into the d-dimensional subspace by
$$x \rightarrow z = A^{T} x \quad (19)$$
We can obtain the between-class scatter matrix $S_b$ and the total scatter matrix $S_t$ as
$$X W X^{T} = X_l W^{l \times l} X_l^{T} = S_b \quad (20)$$
$$X \tilde{I} X^{T} = X_l X_l^{T} = S_t \quad (21)$$
Therefore, the eigen-problem in Formula (18) is the same as the eigen-problem in Formula (5), and we can obtain the projective matrix $A$:
$$S_b A = \lambda \left(S_t + \alpha X L X^{T}\right) A \quad (22)$$
where $S$ is the graph after kNN, i.e., $S_{ij}$ models the relationships of nearby data points, $L = D - S$ is the Laplacian matrix [23], and $D$ is the diagonal matrix with $D_{ii} = \sum_j S_{ij}$.
Then, the graph $S$ can be embedded into the d-dimensional subspace as the SDA embedding matrix $\Phi$:
$$\Phi = A^{T} S \quad (23)$$
Step 5 HSI classification: finally, perform classification with the supervised and semi-supervised classifiers, i.e., the nearest neighbor and SVM classifiers, which are simple and widely used classifiers.
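As an illustration of Step 5 (not the exact experimental code), the embedded features can be fed to nearest neighbor and SVM classifiers, e.g., via scikit-learn; the hyperparameters shown are placeholders, not the values used in the experiments.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def classify_embedded(Phi, labels, train_idx, test_idx):
    """Sketch of Step 5: classify the d-dimensional embedded features Phi
    (columns are samples) with nearest neighbor and SVM classifiers.
    Hyperparameters are illustrative assumptions."""
    Xf = Phi.T                                         # samples as rows
    nn = KNeighborsClassifier(n_neighbors=1).fit(Xf[train_idx], labels[train_idx])
    svm = SVC(kernel='rbf', C=1.0, gamma='scale').fit(Xf[train_idx], labels[train_idx])
    return nn.predict(Xf[test_idx]), svm.predict(Xf[test_idx])
```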

4. Experiments and Analysis

To investigate the performance of the BLRDA feature extraction method, we conducted extensive experiments on several real multi-class hyperspectral images. In this section, we introduce the hyperspectral images used in the present study and illustrate the performance of our proposed methods. The experimental analysis was performed on a machine with an Intel Core CPU at 2.60 GHz and 8 GB of RAM.

4.1. Experimental Setup

4.1.1. Datasets

We evaluate the proposed method on three hyperspectral images, namely the Indian Pines image, the Pavia University scene and the Salinas image (http://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes).
  • The Indian Pines image covers the agricultural Indian Pines test site in Northwestern Indiana and was acquired by the AVIRIS sensor. It has 145 × 145 pixels and 224 spectral bands in the wavelength range from 400 nm to 2500 nm. The scene comprises two-thirds agriculture and one-third forest or other natural perennial vegetation. It includes two major dual-lane highways, a rail line, as well as some low-density housing, other built structures and smaller roads. Since the image was taken in June, some of the crops present, corn and soybeans, are in early stages of growth with less than 5% coverage. The ground truth is divided into sixteen classes, which are not all mutually exclusive. Twenty water absorption bands (Nos. 104–108, 150–163 and 220) were removed. The Indian Pines image's color composite and the ground truth image are shown in Figure 3.
  • The Pavia University scene was acquired over an urban area surrounding the University of Pavia, Italy, on 8 July 2002. It was recorded by a Reflective Optics System Imaging Spectrometer, which has a 1.3-m spatial resolution and spectral coverage ranging from 0.43 to 0.86 μm. The image has 115 bands of size 610 × 340. After removing 12 channels, 103 channels are left for testing. The Pavia University scene's color composite and the corresponding ground truth image are shown in Figure 4.
  • The Salinas image was acquired over Salinas Valley, CA, USA, by the AVIRIS sensor. It has a high spatial resolution of 3.7-m pixels. The image includes 224 bands and comprises 512 lines with 217 samples. Similar to the Indian Pines image, 20 water absorption bands (Nos. 108–112, 154–167 and 224) were discarded. This image is available only as at-sensor radiance data. The image contains vegetables, bare soil, vineyard fields, and so on, in 16 classes. Figure 5 shows the color composite and the corresponding ground truth of the Salinas image.

4.1.2. Evaluation Criteria

To evaluate the proposed method on HSIs, we use the following evaluation criteria.
Classification Accuracy (CA) refers to the percentage of correctly classified pixels in each class. In the field of remote sensing classification, the confusion matrix [46] is frequently used, defined as $M = [m_{ij}]_{n \times n}$, where $m_{ij}$ denotes the number of pixels labeled as class j that actually belong to class i, and n is the number of classes. The reliability of classification depends on the diagonal values of the confusion matrix; higher diagonal values indicate favorable results.
The three primary indicators we use are the Overall Accuracy (OA), the Average Accuracy (AA) and the kappa coefficient [47]. OA is the percentage of pixels correctly classified, and AA is the average of the per-class percentages of correctly classified pixels. To make the measurement more objective, the kappa coefficient is also used. Whereas OA and AA measure how many pixels are classified correctly under the assumption that the reference classification (ground truth) is true, the kappa coefficient regards both the classification and the reference as independent class assignments of equal reliability and measures how well they agree. The big advantage of the kappa coefficient over overall accuracy is that it considers chance agreement, i.e., the probability that the classification and the reference agree by mere chance, and corrects for it; assuming statistical independence, this probability can be estimated from the confusion matrix [48,49].
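A small sketch of how OA, AA and the kappa coefficient are computed from a confusion matrix is given below; it is a standard computation, included for clarity rather than taken from the paper.

```python
import numpy as np

def accuracy_metrics(conf):
    """Sketch of OA, AA and the kappa coefficient from an n x n confusion
    matrix conf, where conf[i, j] counts pixels of true class i labeled j."""
    total = conf.sum()
    oa = np.trace(conf) / total                        # overall accuracy
    per_class = np.diag(conf) / conf.sum(axis=1)       # per-class accuracy
    aa = per_class.mean()                              # average accuracy
    # chance agreement: classification and reference assumed independent
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total ** 2
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa
```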

4.1.3. Classifier Settings

We conduct a series of experiments to test and verify the proposed BLRDA feature extraction method under the supervised and semi-supervised classifiers.
  • The Support Vector Machine (SVM) classifier is a commonly-used supervised learning model for classification and regression analysis.
  • Nearest Neighbor (NN) is used as the semi-supervised classifier, which stores all available samples and classifies new samples based on similarity measures (e.g., distance functions).

4.1.4. Comparative Algorithms

To demonstrate the significant improvement achieved by the regularized block low-rank discriminant analysis feature extraction method in hyperspectral image classification, several comparative methods are included in the paper. For fairness, these comparative graphs are incorporated into the same regularized discriminant analysis algorithm, as described below.
  • BSDA (Block Sparse Representation Discriminant Analysis) method [43]: the Sparse Representation (SR) graph obtains the sparse representation by solving the problem $\hat{a} = \arg\min_a \|y - Xa\|_1$. The weight of the graph is $W_{ij} = a_j^{i}$.
  • BKDA (Block k-Nearest Neighbor Discriminant Analysis) method [43]: Euclidean distance is employed as the similarity measure, and the Gaussian kernel is adopted for the k-nearest neighbor (kNN) feature graph. The number of nearest neighbors is set to five.
  • BLEDA (Block Locally Linear Embedding Discriminant Analysis) method [11]: Locally Linear Embedding (LLE) reconstructs each sample from its neighboring points and minimizes the $l_2$ reconstruction error (a small sketch of the weight computation is given after this list):
    $$\min \sum_{i} \Big\| x_i - \sum_{j} W_{ij} x_j \Big\|^{2} \quad \mathrm{s.t.} \quad \sum_{j} W_{ij} = 1$$
    $W_{ij} = 0$ if $x_j$ does not belong to the neighbors of $x_i$. The number of nearest neighbors k is set to five.
  • Image Fusion and Recursive Filtering (IFRF) method mentioned in Section 2.4.
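The sketch below illustrates the LLE reconstruction-weight computation used by the BLEDA baseline; it is our own illustrative implementation (the regularization constant `reg` is an assumption), not the authors' code.

```python
import numpy as np
from scipy.spatial.distance import cdist

def lle_weights(X, k=5, reg=1e-3):
    """Sketch of LLE reconstruction weights: each sample (a column of X) is
    reconstructed from its k nearest neighbors with weights summing to one;
    a small regularizer stabilizes the local Gram matrix."""
    n = X.shape[1]
    W = np.zeros((n, n))
    d2 = cdist(X.T, X.T, metric='sqeuclidean')
    for i in range(n):
        idx = np.argsort(d2[i])[1:k + 1]                   # k nearest neighbors of x_i
        G = X[:, idx] - X[:, [i]]                          # neighbors centered on x_i
        C = G.T @ G
        C = C + reg * np.trace(C) * np.eye(k)              # regularized local Gram
        w = np.linalg.solve(C, np.ones(k))
        W[i, idx] = w / w.sum()                            # enforce sum-to-one
    return W
```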

4.2. Supervised and Semi-Supervised HSI Classification Results

To examine the performance of the combined BLRDA feature extraction method, we perform experiments on the three HSIs. Each algorithm is evaluated over ten independent runs, resampling the training samples in each run, and we report the mean values. Unlike most existing supervised and semi-supervised HSI experiments, we test the performance of all comparative methods using only a small portion of the labeled samples; in practical scenarios, labeled samples are difficult to obtain, whereas unlabeled ones are usually abundant. Table 1 lists the training and testing data for all classes. For the Indian Pines image, the Pavia University scene and the Salinas image, the training sets are approximately 6%, 4% and 0.4%, respectively, which are minimal compared to the entire datasets. The training sets are chosen randomly. Considering classes with very few samples, we impose a minimum threshold of training samples per class, set to five, which reduces the imbalance among classes with few samples. In our experiments, the filter's spatial and range standard deviations $\delta_s$ and $\delta_r$ are 200 and 0.3, respectively. The parameter $\sigma$ in the k-nearest neighbor weighting $S_{ij} = \exp(-\|z_i - z_j\|^{2}/2\sigma^{2})$ is 0.1, chosen arbitrarily rather than tuned. Hence, the following results are not obtained under the best parameters, which demonstrates the robustness and superiority of our method.
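A sketch of the training-set selection described above (random per-class sampling with a minimum of five samples per class) might look as follows; the function name and seed handling are our assumptions.

```python
import numpy as np

def sample_training_set(labels, rate, min_per_class=5, seed=0):
    """Sketch of random training-set selection: roughly `rate` of the labeled
    pixels per class, but never fewer than `min_per_class` samples."""
    rng = np.random.default_rng(seed)
    train_idx = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        n_train = max(min_per_class, int(round(rate * idx.size)))
        n_train = min(n_train, idx.size)                  # guard very small classes
        train_idx.extend(rng.choice(idx, size=n_train, replace=False))
    return np.array(train_idx)
```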
Initially, we utilize different graph construction methods to obtain the regularized graph of the given hyperspectral images. Then, the SDA algorithm is implemented for feature extraction. Furthermore, the NN method and SVM classifier are applied for final classification in the derived low-dimensional feature subspace. Table 2, Table 3 and Table 4 show the detailed classification results for CA, OA and AA, as well as the kappa coefficient, obtained from the various methods, where bold numbers indicate the best results among the graph algorithms. Figure 6, Figure 7 and Figure 8 show classification maps (randomly selected from the above experiments) obtained by several methods for the three hyperspectral images, together with the corresponding OA values. Note that the maps shown correspond to single random runs for each method and are therefore not directly comparable. These results reveal the supervised and semi-supervised classification performance of the different feature extraction methods. From these results, we can see the following.
In most cases, our proposed BLRDA feature extraction method brings about the highest classification accuracy. Therefore, it significantly improves the classification performance of hyperspectral images, which indicates that the BLRDA method is a superior HSI feature extraction method for both NN and SVM classifiers. Consequently, our feature extraction method is robust to both supervised and semi-supervised classification.
From Table 2 and Table 4, we reach the same conclusion that the proposed approaches BLRDA + NN and BLRDA + SVM are preferable to the other methods, particularly BLRDA + SVM. For example, with our methods, the Corn-notill, Corn-mintill, Grass-pasture, Grass-pasture-mowed, Oats, Soybean-notill and Buildings-Grass-Trees-Drives classes are significantly improved. For the Pavia University scene, BLRDA + SVM gives better results. The traditional graph construction methods (such as the kNN-graph and LLE-graph) may perform well in some classes, but they are not as stable as our algorithm.

4.3. Running Time

We analyzed the running times of the different models on the Indian Pines, Pavia University and Salinas images. We used 10 separate runs and report the mean running time. As shown in Table 5, the execution time of the BLRDA method is slightly longer than that of the others. Although our algorithm is slower than the traditional kNN algorithms, its performance is much better than these baselines at an acceptable running time.

4.4. Robustness of the BLRDA Algorithm

In the above subsection, we evaluated the performance of the proposed BLRDA method. To fully reveal the superiority of the proposed method, we analyze the robustness of BLRDA in this subsection. Considering practical situations, we analyze robustness with respect to the number of labeled samples and to noise.

4.4.1. Robustness to the Size of Labeled Samples

We analyze the impact of different sizes of training and testing sets in this subsection. We perform experiments on the three images, namely the Indian Pines image, the Pavia University scene and the Salinas image. Each algorithm is run 10 independent times, and we report the mean values. Figure 9 shows the supervised and semi-supervised classification accuracy of the HSIs with different feature extraction methods, comparing the overall classification results for different training sizes per class. The percentage of training samples grows from 2% to 14% for the Indian Pines image, from 2% to 8% for the Pavia University scene and from 0.2% to 0.8% for the Salinas image.
In most cases, our proposed BLRDA method consistently achieves the best results and is robust to variations in the label percentage. As the size of the training set increases, OA generally increases for all methods, which show a similar trend. For a fixed training set, the BLRDA method is usually superior to the others, for both the NN and the SVM classifier. Similarly, the three classification criteria increase simultaneously with the number of training samples.
It is noteworthy that the method we proposed achieves higher classification accuracy even at very low label rates, while some other compared algorithms are not as robust as our BLRDA algorithm, especially when the label rate is low. Due to the high cost and difficulty of labeling data, our proposed graph for the SDA algorithm is much more robust and suitable for real-world HSIs.

4.4.2. Robustness on Simulated Noisy Hyperspectral Images

In the simulated experiment, we evaluated the noise robustness of the BLRDA method on the three hyperspectral images. Zero-mean Gaussian noise with different variances was added to all bands. For the Indian Pines and Salinas images, the variance ranges from 50 to 250; for the Pavia University scene, it varies from 100 to 500. The noise intensity is equal across bands. Figure 10 gives an example of noisy image samples for a randomly selected band of the Indian Pines image; the other two hyperspectral images are similar.
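The simulated-noise setup can be reproduced with a few lines such as the following sketch; the function name and seed handling are assumptions.

```python
import numpy as np

def add_gaussian_noise(cube, variance, seed=0):
    """Sketch of the simulated-noise experiment: zero-mean Gaussian noise of
    equal intensity is added to every band of the (H, W, D) data cube."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, np.sqrt(variance), size=cube.shape)
    return cube + noise
```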
We compare the performance of the different methods in these noisy environments. Each algorithm is evaluated over ten independent runs, resampling the training samples in each run, and we report the mean values. The labeled sample rates are 6%, 2% and 0.2%, respectively, as shown in Table 6. Figure 11, Figure 12 and Figure 13 show classification maps (randomly selected from the above experiments) for the three noisy hyperspectral images together with the corresponding OA values; as before, the maps correspond to single random runs and are not directly comparable. These results show the NN classification results of the different feature extraction algorithms for the noise variance σ indicated in the captions of the figures. We can see that our method is robust to both noise and labeled sample size. With few labeled samples, our BLRDA method is more powerful than the other methods, which benefits from the robustness of the low-rank representation to noise and the global property of the LR-graph. As the noise gradually increases, the performance of some methods drops substantially, whereas our method is robust to noise and suffers little performance degradation. Overall, BLRDA performs very well on all three experimental hyperspectral images.

4.5. Parameters of the BLRDA Graph Effect

We evaluate the parameters of the BLRDA method over 10 independent runs per algorithm and report the mean values. Table 7 and Table 8 show the performance for different block sizes and reduced dimensions. In Table 7, the block size increases over (25, 50, 75 and 100), and the training sets are about 6%, 4% and 0.4% for the Indian Pines image, the Pavia University scene and the Salinas image, respectively.
In general, the classification results increase slightly with growing block size and reduced dimension. However, increasing the block size also increases the running time considerably. Therefore, a small block size can be used in practice for efficiency, with minimal loss of classification accuracy.
From Table 8, we can conclude that the reduced dimension is robust, even with a small block size. Overall, our proposed BLRDA method is much more robust and effective for supervised and semi-supervised classification of HSIs.

5. Discussion

Classification of HSIs plays a pivotal role in understanding HSIs. In the present work, we propose a novel approach for HSI feature extraction, regularized Block Low-Rank Discriminant Analysis (BLRDA). Our goal is to enhance the classification accuracy of HSIs through an effective feature extraction method. Experimental results on the three images show that BLRDA is a competitive feature extraction method compared with the other methods for HSI classification.
From the supervised and semi-supervised experiments illustrated in Table 2, Table 3 and Table 4, we observe that the BLRDA method is an effective feature extraction method, which achieves the highest classification accuracy among the compared methods. The performance is remarkable even with simple supervised and semi-supervised classifiers (nearest neighbor and SVM) and arbitrarily given parameters. In some cases, traditional graph construction methods (such as the kNN-graph and the LLE-graph) may perform well for some classes, but they are not as stable as our proposed algorithm. The LR-graph captures the global data structure better and obtains the representation of all samples under a global low-rank constraint, whereas the common k-Nearest Neighbor (kNN) and Locally Linear Embedding (LLE) graphs use fixed global parameters and Euclidean distances to determine the graph weights, making their graph structures unstable and sensitive to noise. Additionally, the SR-graph lacks global constraints, which greatly degrades performance when the data are grossly corrupted. The LR-graph addresses these drawbacks: it jointly obtains the graph of the whole data and is demonstrated to be robust to noise. Further, the k-nearest neighbor is used to integrally handle the combined low-rank graph, which performs two functions: preserving the local information of the image and satisfying the algorithmic requirements for the subsequent dimension reduction procedure. Consequently, BLRDA significantly improves the classification performance of hyperspectral images, which indicates that it is a superior HSI feature graph for both supervised and semi-supervised classification.
We analyzed robustness with respect to the number of labeled samples and to noise. In Figure 9, we find that the BLRDA method is usually superior to the others for both the NN and SVM classifiers and is robust to variations in the label percentage. Figure 9 also shows that the proposed method achieves higher classification accuracy even at meager label rates, whereas some of the other compared algorithms are not as robust, in particular when the label rate is low. Given the scarcity of labeled samples, our proposed method is much more robust and suitable for real-world HSI classification. As shown in Table 6 and Figure 11, Figure 12 and Figure 13, the results of our method are robust to noise and labeled sample size. As the noise increases, the performance of BLRDA drops very slowly, from 0.9765 to 0.9481 in the case of the Indian Pines image, which has little effect in practical situations. However, for the comparative methods BSDA, BKDA, BLEDA and IFRF, the overall accuracy drops quickly as the noise increases. Similar results are observed for the other two HSIs. This benefits from the robustness of the low-rank representation to noise, whereas k-Nearest Neighbor (kNN) and Locally Linear Embedding (LLE) neighbors use fixed global parameters with Euclidean distances, which are sensitive to noise, and the LR-graph better captures the global property of the data. Therefore, the BLRDA method is demonstrated to be robust to the number of labeled samples and to noise, which indicates that it is highly competitive in practical situations.
We evaluate the effect of the BLRDA parameters, namely block size and reduced dimension, as shown in Table 7 and Table 8. From the tables, we can see that the classification results increase slightly with growing block size and reduced dimension. Therefore, small block sizes could be used in practice for efficiency, with a classification accuracy loss so small that it can be ignored. Moreover, from Table 8, we can conclude that the reduced dimension is robust for a medium-sized block.
The proposed BLRDA method performs very well and is competitive for supervised and semi-supervised classification of HSIs. In future work, we plan to explore further real-world applications and improve the BLRDA method to make it more efficient. Inspired by a reviewer's suggestion, we could also divide the image into blocks with a suitable community detection algorithm [50,51,52,53] instead of by regular pixel positions, since the way of dividing blocks may affect the subsequent low-rank representation.

6. Conclusions

In this paper, we presented a novel graph for HSI feature extraction, which is referred to as Regularized Block Low-rank Discriminant Analysis (BLRDA). To reduce computational complexity, the entire image is divided into blocks, and the low-rank representation for each group is implemented separately. The global structure of the hyperspectral image can be captured by low-rank representation. Additionally, the local geometrical structure is preserved by the k-nearest neighbor algorithm. Therefore, the performance of image classification has been enhanced by the BLRDA method. Experiments on several real multi-class hyperspectral images indicate that our proposed BLRDA method is an efficient and robust feature extraction method for both supervised and semi-supervised classifiers.

Author Contributions

B.Z. conceived of and designed the experiments and wrote the paper. W.D. performed some experiments and analyzed the data. K.X. provided comments and ideas for the work. Y.L. and A.A. contributed to the review and revision. S.C. and A.A. carefully edited the language to improve the writing quality.

Funding

This research was funded by [Hebei Province Natural Science Foundation] grant number [E2016202341] and [Research Project of Science and Technology for Hebei Province Higher Education Institutions] grant number [BJ2014013].

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Xu, L.; Zhang, H.; Zhao, M.; Chu, D.; Li, Y. Integrating Spectral and Spatial Features for Hyperspectral Image Classification Using Low-Rank Representation. In Proceedings of the IEEE International Conference on Industrial Technology (ICIT), Toronto, ON, Canada, 22–25 March 2017; pp. 1024–1029. [Google Scholar]
  2. Haboudane, D.; Miller, J.R.; Pattey, E.; Zarco-Tejada, P.J.; Strachan, I.B. Hyperspectral vegetation indices and novel algorithms for predicting green LAI of crop canopies: Modeling and validation in the context of precision agriculture. Remote Sens. Environ. 2004, 90, 337–352. [Google Scholar] [CrossRef]
  3. Dorigo, W.A.; Zurita-Milla, R.; de Wit, A.J.; Brazile, J.; Singh, R.; Schaepman, M.E. A review on reflective remote sensing and data assimilation techniques for enhanced agroecosystem modeling. Int. J. Appl. Earth Obs. Geoinform. 2007, 9, 165–193. [Google Scholar] [CrossRef]
  4. Dalponte, M.; Bruzzone, L.; Vescovo, L.; Gianelle, D. The role of spectral resolution and classifier complexity in the analysis of hyperspectral images of forest areas. Remote Sens. Environ. 2009, 113, 2345–2355. [Google Scholar] [CrossRef]
  5. Ustin, S.L.; Roberts, D.A.; Gamon, J.A.; Asner, G.P.; Green, R.O. Using imaging spectroscopy to study ecosystem processes and properties. AIBS Bull. 2004, 54, 523–534. [Google Scholar] [CrossRef]
  6. Heinz, D.C. Fully constrained least squares linear spectral mixture analysis method for material quantification in hyperspectral imagery. IEEE Trans. Geosci. Remote Sens. 2001, 39, 529–545. [Google Scholar] [CrossRef]
  7. Kruse, F.A.; Boardman, J.W.; Huntington, J.F. Comparison of airborne hyperspectral data and EO-1 Hyperion for mineral mapping. IEEE Trans. Geosci. Remote Sens. 2003, 41, 1388–1400. [Google Scholar] [CrossRef]
  8. Cocks, T.; Jenssen, R.; Stewart, A.; Wilson, I.; Shields, T. The HyMapTM Airborne Hyperspectral Sensor: The System, Calibration and Performance. In Proceedings of the 1st EARSeL workshop on Imaging Spectroscopy, Zurich, Switzerland, 6–8 October 1998; pp. 37–42. [Google Scholar]
  9. De Morsier, F.; Borgeaud, M.; Gass, V.; Thiran, J.P.; Tuia, D. Kernel low-rank and sparse graph for unsupervised and semi-supervised classification of hyperspectral images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 3410–3420. [Google Scholar] [CrossRef]
  10. Hughes, G. On the mean accuracy of statistical pattern recognizers. IEEE Trans. Inf. Theory 1968, 14, 55–63. [Google Scholar] [CrossRef]
  11. Roweis, S.T.; Saul, L.K. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290, 2323–2326. [Google Scholar] [CrossRef] [PubMed]
  12. Tenenbaum, J.B.; De Silva, V.; Langford, J.C. A global geometric framework for nonlinear dimensionality reduction. Science 2000, 290, 2319–2323. [Google Scholar] [CrossRef] [PubMed]
  13. Belkin, M.; Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 2003, 15, 1373–1396. [Google Scholar] [CrossRef]
  14. Ul Haq, Q.S.; Tao, L.; Sun, F.; Yang, S. A fast and robust sparse approach for hyperspectral data classification using a few labeled samples. IEEE Trans. Geosci. Remote Sens. 2012, 50, 2287–2302. [Google Scholar] [CrossRef]
  15. Kuo, B.C.; Chang, K.Y. Feature extractions for small sample size classification problem. IEEE Trans. Geosci. Remote Sens. 2007, 45, 756–764. [Google Scholar] [CrossRef]
  16. Gu, Y.; Wang, Q.; Wang, H.; You, D.; Zhang, Y. Multiple kernel learning via low-rank nonnegative matrix factorization for classification of hyperspectral imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2739–2751. [Google Scholar] [CrossRef]
  17. Kang, X.; Li, S.; Benediktsson, J.A. Feature extraction of hyperspectral images with image fusion and recursive filtering. IEEE Trans. Geosci. Remote Sens. 2014, 52, 3742–3752. [Google Scholar] [CrossRef]
  18. Zhou, D.; Bousquet, O.; Lal, T.N.; Weston, J.; Schölkopf, B. Learning with local and global consistency. In Advances in Neural Information Processing Systems; MIT Press: Vancouver, BC, Canada, 2004; pp. 321–328. [Google Scholar]
  19. Guo, Z.; Zhang, Z.; Xing, E.P.; Faloutsos, C. Semi-Supervised Learning based on Semiparametric Regularization. In Proceedings of the SIAM International Conference on Data Mining, Atlanta, Georgia, 24–26 April 2008; pp. 132–142. [Google Scholar]
  20. Subramanya, A.; Talukdar, P.P. Graph-based semi-supervised learning. Synth. Lect. Artif. Intell. Mach. Learn. 2014, 8, 1–125. [Google Scholar] [CrossRef]
  21. Cai, D.; He, X.; Han, J. Semi-Supervised Discriminant Analysis. In Proceedings of the IEEE 11th International Conference on Computer Vision, Rio de Janeiro, Brazil, 14–20 October 2007; pp. 1–7. [Google Scholar]
  22. Jebara, T.; Wang, J.; Chang, S.F. Graph Construction and b-Matching for Semi-Supervised Learning. In Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, QC, Canada, 14–18 June 2009; pp. 441–448. [Google Scholar]
  23. Yan, S.; Wang, H. Semi-supervised Learning by Sparse Representation. In Proceedings of the SIAM International Conference on Data Mining, Sparks, NV, USA, 30 April–2 May 2009; pp. 792–801. [Google Scholar]
  24. Zhu, X.; Lafferty, J.; Rosenfeld, R. Semi-Supervised Learning with Graphs. Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, PA, USA, 2005. [Google Scholar]
  25. Zhu, X.; Ghahramani, Z.; Lafferty, J.D. Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), Washington, DC, USA, 21–24 August 2003; pp. 912–919. [Google Scholar]
  26. Belkin, M.; Matveeva, I.; Niyogi, P. Regularization and semi-supervised learning on large graphs. In International Conference on Computational Learning Theory; Springer: Berlin/Heidelberg, Germany, 2004; pp. 624–638. [Google Scholar]
  27. Zhuang, L.; Gao, H.; Lin, Z.; Ma, Y.; Zhang, X.; Yu, N. Non-Negative Low Rank and Sparse Graph for Semi-Supervised Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012; pp. 2328–2335. [Google Scholar]
28. Zhuang, L.; Gao, S.; Tang, J.; Wang, J.; Lin, Z.; Ma, Y.; Yu, N. Constructing a nonnegative low-rank and sparse graph with data-adaptive features. IEEE Trans. Image Process. 2015, 24, 3717–3728. [Google Scholar] [CrossRef] [PubMed]
  29. Liu, G.; Lin, Z.; Yu, Y. Robust Subspace Segmentation by Low-Rank Representation. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–25 June 2010; pp. 663–670. [Google Scholar]
  30. Li, W.; Liu, J.; Du, Q. Sparse and Low-Rank Graph for Discriminant Analysis of Hyperspectral Imagery. IEEE Trans. Geosci. Remote Sens. 2016, 54, 4094–4105. [Google Scholar] [CrossRef]
31. Rohban, M.H.; Rabiee, H.R. Supervised neighborhood graph construction for semi-supervised classification. Pattern Recognit. 2012, 45, 1363–1372. [Google Scholar] [CrossRef]
  32. Zhang, H.; He, W.; Zhang, L.; Shen, H.; Yuan, Q. Hyperspectral image restoration using low-rank matrix recovery. IEEE Trans. Geosci. Remote Sens. 2014, 52, 4729–4743. [Google Scholar] [CrossRef]
33. Veganzones, M.A.; Simoes, M.; Licciardi, G.; Yokoya, N.; Bioucas-Dias, J.M.; Chanussot, J. Hyperspectral super-resolution of locally low-rank images from complementary multisource data. IEEE Trans. Image Process. 2016, 25, 274–288. [Google Scholar] [CrossRef] [PubMed]
  34. Niu, Y.; Wang, B. Hyperspectral Anomaly Detection Based on Low-Rank Representation and Learned Dictionary. Remote Sens. 2016, 8, 289. [Google Scholar] [CrossRef]
  35. Pan, L.; Li, H.C.; Deng, Y.J.; Zhang, F.; Chen, X.D.; Du, Q. Hyperspectral Dimensionality Reduction by Tensor Sparse and Low-Rank Graph-Based Discriminant Analysis. Remote Sens. 2017, 9, 452. [Google Scholar] [CrossRef]
  36. He, W.; Zhang, H.; Zhang, L.; Shen, H. Total-variation-regularized low-rank matrix factorization for hyperspectral image restoration. IEEE Trans. Geosci. Remote Sens. 2016, 54, 178–188. [Google Scholar] [CrossRef]
  37. Qi, K.; Liu, W.; Yang, C.; Guan, Q.; Wu, H. Multi-Task Joint Sparse and Low-Rank Representation for the Scene Classification of High-Resolution Remote Sensing Image. Remote Sens. 2016, 9, 10. [Google Scholar] [CrossRef]
38. Zu, B.; Xia, K.; Pan, Y.; Niu, W. A Novel Graph Constructor for Semisupervised Discriminant Analysis: Combined Low-Rank and k-Nearest Neighbor Graph. Comput. Intell. Neurosci. 2017, 2017. [Google Scholar] [CrossRef] [PubMed]
39. Cortes, C.; Mohri, M. On transductive regression. Adv. Neural Inf. Process. Syst. 2007, 19, 305. [Google Scholar]
  40. Zhang, L.; Wei, W.; Zhang, Y.; Shen, C.; van den Hengel, A.; Shi, Q. Cluster sparsity field for hyperspectral imagery denoising. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 631–647. [Google Scholar]
  41. Candès, E.J.; Li, X.; Ma, Y.; Wright, J. Robust principal component analysis? J. ACM (JACM) 2011, 58, 11. [Google Scholar] [CrossRef]
42. Cai, J.F.; Candès, E.J.; Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 2010, 20, 1956–1982. [Google Scholar] [CrossRef]
43. Lin, Z.; Chen, M.; Ma, Y. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv 2010, arXiv:1009.5055. [Google Scholar]
44. Liu, G.; Lin, Z.; Yan, S.; Sun, J.; Yu, Y.; Ma, Y. Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 171–184. [Google Scholar] [CrossRef] [PubMed]
  45. Gastal, E.S.; Oliveira, M.M. Domain transform for edge-aware image and video processing. In ACM Transactions on Graphics (ToG); ACM: New York, NY, USA, 2011; Volume 30, p. 69. [Google Scholar]
  46. Congalton, R.G. A review of assessing the accuracy of classifications of remotely sensed data. Remote Sens. Environ. 1991, 37, 35–46. [Google Scholar] [CrossRef]
47. Richards, J.A.; Jia, X. Remote Sensing Digital Image Analysis; Springer: Berlin/Heidelberg, Germany, 1999; Volume 3. [Google Scholar]
  48. Thompson, W.D.; Walter, S.D. A reappraisal of the kappa coefficient. J. Clin. Epidemiol. 1988, 41, 949–958. [Google Scholar] [CrossRef]
  49. Gwet, K. Kappa statistic is not satisfactory for assessing the extent of agreement between raters. Stat. Methods Int. Reliab. Assess. 2002, 1, 1–6. [Google Scholar]
  50. Javed, M.A.; Younis, M.S.; Latif, S.; Qadir, J.; Baig, A. Community detection in networks: A multidisciplinary review. J. Netw. Comput. Appl. 2018. [Google Scholar] [CrossRef]
  51. Agreste, S.; De Meo, P.; Fiumara, G.; Piccione, G.; Piccolo, S.; Rosaci, D.; Sarné, G.M.; Vasilakos, A.V. An empirical comparison of algorithms to find communities in directed graphs and their application in web data analytics. IEEE Trans. Big Data 2017, 3, 289–306. [Google Scholar] [CrossRef]
  52. He, K.; Li, Y.; Soundarajan, S.; Hopcroft, J.E. Hidden community detection in social networks. Inf. Sci. Int. J. 2017, 425, 92–106. [Google Scholar] [CrossRef]
  53. Nathan, E.; Zakrzewska, A.; Riedy, J.; Bader, D.A. Local Community Detection in Dynamic Graphs Using Personalized Centrality. Algorithms 2017, 10, 102. [Google Scholar] [CrossRef]
Figure 1. Formulation of the proposed regularized block low-rank discriminant analysis feature extraction for Hyperspectral Image (HSI) classification. IFRF, Image Fusion and Recursive Filtering; SDA, Semi-supervised Discriminant Analysis.
Figure 2. The representation with a diagonal-block structure.
Figure 3. Indian Pines dataset. (a) Three-band color composite of the Indian Pines image. (b,c) Ground truth image and reference data.
Figure 4. Pavia University scene dataset. (a) Three-band color composite of the Pavia University scene. (b,c) Ground truth image and reference data.
Figure 5. Salinas dataset. (a) Three-band color composite of the Salinas image. (b,c) Ground truth image and reference data.
Figure 6. Supervised and semi-supervised classification results (Indian Pines image) obtained by: (a) BLRDA + NN; (b) BSDA + NN; (c) BKDA + NN; (d) BLEDA + NN; (e) IFRF + NN; (f) BLRDA + SVM; (g) BSDA + SVM; (h) BKDA + SVM; (i) BLEDA + SVM; (j) IFRF + SVM.
Figure 7. Supervised and semi-supervised classification results (Pavia University scene) obtained by: (a) BLRDA + NN; (b) BSDA + NN; (c) BKDA + NN; (d) BLEDA + NN; (e) IFRF + NN; (f) BLRDA + SVM; (g) BSDA + SVM; (h) BKDA + SVM; (i) BLEDA + SVM; (j) IFRF + SVM.
Figure 8. Supervised and semi-supervised classification results (Salinas image) obtained by: (a) BLRDA + NN; (b) BSDA + NN; (c) BKDA + NN; (d) BLEDA + NN; (e) IFRF + NN; (f) BLRDA + SVM; (g) BSDA + SVM; (h) BKDA + SVM; (i) BLEDA + SVM; (j) IFRF + SVM.
Figure 9. Supervised and semi-supervised classification accuracy of HSIs with different percentages of training samples. LLE, Locally Linear Embedding.
Figure 10. Noisy hyperspectral image example: three-band color composite of the noisy Indian Pines image.
Figure 11. NN classification results on noisy Indian Pines image (noise variance σ = 250) obtained by: (a) BLRDA; (b) BSDA; (c) BKDA; (d) BLEDA; (e) IFRF.
Figure 12. NN classification results on noisy Pavia University scene (noise variance σ = 500) obtained by: (a) BLRDA; (b) BSDA; (c) BKDA; (d) BLEDA; (e) IFRF.
Figure 13. NN classification results on noisy Salinas image (noise variance σ = 500) obtained by: (a) BLRDA; (b) BSDA; (c) BKDA; (d) BLEDA; (e) IFRF.
Table 1. Training and testing samples for the three hyperspectral images.

Class | Indian Pines image (Train / Test / Samples) | Pavia University scene (Train / Test / Samples) | Salinas image (Train / Test / Samples)
C1 | 8 / 38 / 46 | 271 / 6360 / 6631 | 14 / 1995 / 2009
C2 | 91 / 1337 / 1428 | 752 / 17,897 / 18,649 | 20 / 3706 / 3726
C3 | 55 / 775 / 830 | 90 / 2009 / 2099 | 13 / 1963 / 1976
C4 | 20 / 217 / 237 | 128 / 2936 / 3064 | 11 / 1383 / 1394
C5 | 34 / 449 / 483 | 59 / 1286 / 1345 | 16 / 2662 / 2678
C6 | 49 / 681 / 730 | 207 / 4822 / 5029 | 21 / 3938 / 3959
C7 | 7 / 21 / 28 | 59 / 1271 / 1330 | 20 / 3559 / 3579
C8 | 34 / 444 / 478 | 153 / 3529 / 3682 | 51 / 11,220 / 11,271
C9 | 7 / 13 / 20 | 43 / 904 / 947 | 30 / 6173 / 6203
C10 | 64 / 908 / 972 |  | 19 / 3259 / 3278
C11 | 153 / 2302 / 2455 |  | 10 / 1058 / 1068
C12 | 41 / 552 / 593 |  | 13 / 1914 / 1927
C13 | 18 / 187 / 205 |  | 9 / 907 / 916
C14 | 81 / 1184 / 1265 |  | 10 / 1060 / 1070
C15 | 29 / 357 / 386 |  | 35 / 7233 / 7268
C16 | 11 / 82 / 93 |  | 13 / 1794 / 1807
Total | 702 / 9547 / 10,249 | 1762 / 41,014 / 42,776 | 305 / 53,824 / 54,129
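The per-class splits summarized in Table 1 amount to drawing a fixed number of labeled pixels per class at random and keeping the rest for testing. The sketch below is a minimal illustration of such a stratified split; the function name, the `gt` label map and the `train_counts` dictionary are our own assumptions, not code from the paper.

```python
import numpy as np

def stratified_split(gt, train_counts, seed=0):
    """Randomly draw a fixed number of labeled pixels per class.

    gt           -- 2-D array of class labels (0 = unlabeled background)
    train_counts -- dict mapping class id -> number of training pixels
    Returns flat pixel indices of the training and test sets.
    """
    rng = np.random.default_rng(seed)
    labels = gt.ravel()
    train_idx, test_idx = [], []
    for cls, n_train in train_counts.items():
        idx = rng.permutation(np.flatnonzero(labels == cls))
        train_idx.extend(idx[:n_train])   # first n_train pixels for training
        test_idx.extend(idx[n_train:])    # remaining pixels for testing
    return np.array(train_idx), np.array(test_idx)

# Hypothetical usage for the first three Indian Pines classes of Table 1:
# train, test = stratified_split(gt, {1: 8, 2: 91, 3: 55})
```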
Table 2. Supervised and semi-supervised classification results for the Indian Pines image. BLRDA, Block Low-Rank Discriminant Analysis. BSDA, Block Sparse Representation Discriminant Analysis. BKDA, Block k-Nearest Neighbor Discriminant Analysis. BLEDA, Block Locally Linear Embedding Discriminant Analysis. IFRF, Image Fusion and Recursive Filtering.

Class | NN: BLRDA | BSDA | BKDA | BLEDA | IFRF | SVM: BLRDA | BSDA | BKDA | BLEDA | IFRF
1 | 1 | 0.9868 | 0.9423 | 0.9351 | 0.9181 | 1 | 0.9675 | 0.9893 | 0.9919 | 0.9925
2 | 0.9646 | 0.9348 | 0.924 | 0.8979 | 0.8492 | 0.9561 | 0.9339 | 0.928 | 0.9133 | 0.9615
3 | 0.9621 | 0.9251 | 0.9321 | 0.9333 | 0.8658 | 0.9629 | 0.9588 | 0.9383 | 0.9148 | 0.9674
4 | 0.9237 | 0.9126 | 0.9334 | 0.9498 | 0.8186 | 0.9879 | 0.8822 | 0.983 | 0.9689 | 0.9499
5 | 0.9736 | 0.9787 | 0.9748 | 0.9639 | 0.9719 | 0.9912 | 0.9737 | 0.948 | 0.9629 | 0.9880
6 | 0.9916 | 0.987 | 0.9808 | 0.9807 | 0.9488 | 0.9963 | 0.993 | 0.9968 | 0.9990 | 0.9824
7 | 0.9886 | 0.8523 | 0.8115 | 0.9181 | 0.5556 | 1 | 0.858 | 0.9569 | 0.9788 | 0.7186
8 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
9 | 0.8884 | 0.8567 | 0.8629 | 0.8881 | 0.7147 | 1 | 0.9643 | 0.9714 | 0.9857 | 0.8865
10 | 0.9414 | 0.9569 | 0.9447 | 0.9548 | 0.8705 | 0.9824 | 0.9353 | 0.943 | 0.9579 | 0.9534
11 | 0.9884 | 0.9757 | 0.9767 | 0.9743 | 0.9239 | 0.9596 | 0.9496 | 0.949 | 0.9562 | 0.9670
12 | 0.9825 | 0.9578 | 0.9234 | 0.9677 | 0.8124 | 0.9782 | 0.9672 | 0.967 | 0.9315 | 0.9543
13 | 1 | 1 | 1 | 1 | 0.9815 | 1 | 1 | 0.9989 | 1 | 1
14 | 0.9951 | 0.9973 | 0.9954 | 0.9969 | 0.9962 | 0.9996 | 0.9994 | 0.9975 | 0.9963 | 0.9944
15 | 0.9731 | 0.9753 | 0.9867 | 0.974 | 0.9653 | 0.9929 | 0.9719 | 0.967 | 0.9518 | 0.9672
16 | 0.9878 | 0.9878 | 0.9878 | 0.9878 | 0.9816 | 0.9877 | 0.9836 | 0.9863 | 0.9781 | 0.9755
OA | 0.9979 | 0.9662 | 0.9567 | 0.9571 | 0.9112 | 0.9713 | 0.9572 | 0.9568 | 0.9544 | 0.9703
AA | 0.9967 | 0.956 | 0.9453 | 0.9563 | 0.8859 | 0.9966 | 0.9513 | 0.9633 | 0.9645 | 0.9537
Kappa | 0.9976 | 0.9615 | 0.9506 | 0.951 | 0.8986 | 0.9967 | 0.9512 | 0.9506 | 0.9480 | 0.9661
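Tables 2–4 report per-class accuracy together with Overall Accuracy (OA), Average Accuracy (AA) and the Kappa coefficient [46,47,48]. For readers who want to recompute these scores, the sketch below shows one standard way to obtain all three from a confusion matrix; the function and variable names are ours and only illustrative.

```python
import numpy as np

def oa_aa_kappa(conf):
    """OA, AA and Cohen's kappa from a square confusion matrix
    (rows = reference classes, columns = predicted classes)."""
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    oa = np.trace(conf) / total                            # overall accuracy
    aa = np.mean(np.diag(conf) / conf.sum(axis=1))         # mean of per-class accuracies
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total**2  # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa
```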
Table 3. Supervised and semi-supervised classification results for the Pavia University scene. BLRDA, Block Low-Rank Discriminant Analysis. BSDA, Block Sparse Representation Discriminant Analysis. BKDA, Block k-Nearest Neighbor Discriminant Analysis. BLEDA, Block Locally Linear Embedding Discriminant Analysis. IFRF, Image Fusion and Recursive Filtering.

Class | NN: BLRDA | BSDA | BKDA | BLEDA | IFRF | SVM: BLRDA | BSDA | BKDA | BLEDA | IFRF
1 | 0.9826 | 0.972 | 0.9711 | 0.966 | 0.9039 | 0.9655 | 0.9423 | 0.9504 | 0.9566 | 0.9756
2 | 0.9968 | 0.9929 | 0.9925 | 0.9911 | 0.9833 | 0.9951 | 0.9431 | 0.9937 | 0.9941 | 0.9967
3 | 0.9724 | 0.9615 | 0.9345 | 0.9575 | 0.8753 | 0.9641 | 0.9635 | 0.9424 | 0.9439 | 0.9451
4 | 0.9906 | 0.9967 | 0.9889 | 0.9939 | 0.9634 | 0.9951 | 0.9946 | 0.9704 | 0.9861 | 0.9817
5 | 1 | 1 | 0.9996 | 0.9984 | 0.9966 | 0.9979 | 0.9586 | 0.8561 | 0.9241 | 0.94
6 | 0.9982 | 0.9973 | 0.9959 | 0.9961 | 0.9795 | 0.9973 | 0.9783 | 0.9989 | 0.9979 | 0.9988
7 | 0.9722 | 0.9539 | 0.9502 | 0.9772 | 0.8787 | 0.9731 | 0.897 | 0.9659 | 0.9737 | 0.966
8 | 0.9651 | 0.9436 | 0.9447 | 0.9472 | 0.8510 | 0.9600 | 0.9991 | 0.943 | 0.9488 | 0.9569
9 | 0.9501 | 0.9661 | 0.9145 | 0.927 | 0.8811 | 0.9557 | 0.9352 | 0.9202 | 0.9251 | 0.9573
OA | 0.9887 | 0.9831 | 0.9801 | 0.9815 | 0.9465 | 0.9847 | 0.9797 | 0.9755 | 0.9771 | 0.9828
AA | 0.9809 | 0.9762 | 0.967 | 0.9731 | 0.9236 | 0.9782 | 0.9636 | 0.9543 | 0.9611 | 0.9687
Kappa | 0.9850 | 0.978 | 0.9736 | 0.9754 | 0.9289 | 0.9796 | 0.9731 | 0.9674 | 0.9696 | 0.9772
Table 4. Supervised and semi-supervised classification results for the Salinas image. BLRDA, Block Low-Rank Discriminant Analysis. BSDA, Block Sparse Representation Discriminant Analysis. BKDA, Block k-Nearest Neighbor Discriminant Analysis. BLEDA, Block Locally Linear Embedding Discriminant Analysis. IFRF, Image Fusion and Recursive Filtering.

Class | NN: BLRDA | BSDA | BKDA | BLEDA | IFRF | SVM: BLRDA | BSDA | BKDA | BLEDA | IFRF
1 | 1 | 1 | 1 | 0.9999 | 0.9929 | 1 | 1 | 1 | 0.9983 | 1
2 | 1 | 0.9992 | 0.9956 | 0.9999 | 0.9842 | 1 | 0.9948 | 0.9919 | 0.998 | 1
3 | 0.9992 | 0.988 | 0.9976 | 0.9965 | 0.9839 | 0.9997 | 0.9957 | 0.9957 | 0.9871 | 0.9979
4 | 0.9772 | 0.9567 | 0.9589 | 0.9455 | 0.9002 | 0.9834 | 0.9387 | 0.9455 | 0.9609 | 0.9449
5 | 0.9914 | 0.9996 | 0.9994 | 0.9969 | 0.9732 | 1 | 1 | 0.9983 | 0.9993 | 0.9918
6 | 0.9999 | 0.9992 | 0.9996 | 1 | 1 | 1 | 0.997 | 0.9992 | 0.9937 | 1
7 | 0.9988 | 0.9993 | 0.9993 | 0.9966 | 0.9864 | 0.9971 | 0.9979 | 0.9978 | 0.9946 | 0.9953
8 | 0.9927 | 0.9859 | 0.9831 | 0.9878 | 0.9547 | 0.9875 | 0.98 | 0.9918 | 0.9882 | 0.9826
9 | 0.9981 | 0.9967 | 0.9949 | 0.9924 | 0.9967 | 0.9999 | 0.9969 | 0.9948 | 0.9988 | 0.9993
10 | 0.9935 | 0.9969 | 0.9936 | 0.9951 | 0.9850 | 0.9962 | 0.987 | 0.9918 | 0.9592 | 0.9933
11 | 0.9998 | 0.9967 | 0.9976 | 0.9863 | 0.9222 | 1 | 0.9976 | 0.9944 | 0.9951 | 0.9939
12 | 0.9737 | 0.9644 | 0.971 | 0.9741 | 0.9308 | 0.9979 | 0.9751 | 0.9868 | 0.9675 | 0.9961
13 | 0.9825 | 0.9842 | 0.9907 | 0.9678 | 0.8912 | 1 | 0.9983 | 0.9893 | 0.9886 | 0.9733
14 | 0.9627 | 0.9766 | 0.9879 | 0.9845 | 0.9789 | 0.9895 | 0.9862 | 0.9707 | 0.9622 | 0.9697
15 | 0.9617 | 0.9677 | 0.9802 | 0.9567 | 0.8273 | 0.9963 | 0.91 | 0.9646 | 0.9475 | 0.9869
16 | 0.9994 | 1 | 1 | 0.9999 | 0.9970 | 1 | 0.9948 | 0.9994 | 1 | 0.9998
OA | 0.99 | 0.9883 | 0.99 | 0.9866 | 0.9500 | 0.9931 | 0.9795 | 0.9893 | 0.9871 | 0.9905
AA | 0.989 | 0.9885 | 0.9911 | 0.9869 | 0.9565 | 0.9946 | 0.9861 | 0.9903 | 0.9887 | 0.9891
Kappa | 0.9888 | 0.9869 | 0.9888 | 0.9851 | 0.9443 | 0.9923 | 0.9771 | 0.9881 | 0.9856 | 0.9894
Table 5. Run time of different methods on real-world HSIs (units).

Image | NN: BLRDA | BSDA | BKDA | BLEDA | IFRF | SVM: BLRDA | BSDA | BKDA | BLEDA | IFRF
Indian Pines | 32.05 | 69.36 | 19.60 | 68.03 | 12.24 | 7.54 | 68.78 | 8.16 | 67.92 | 73.87
Pavia U | 238.11 | 385.92 | 176.01 | 292.81 | 17.72 | 125.41 | 215.38 | 140.19 | 233.12 | 62.92
Salinas | 307.85 | 297.81 | 243.32 | 230.61 | 8.58 | 163.50 | 163.43 | 125.95 | 128.94 | 9.70
Table 6. Overall accuracy with Gaussian noise of varying variance on the three HSIs.

Hyperspectral Image | σ | BLRDA | BSDA | BKDA | BLEDA | IFRF
Indian Pines image (6%) | 50 | 0.9765 | 0.9485 | 0.9412 | 0.9396 | 0.9061
 | 100 | 0.9672 | 0.9300 | 0.9224 | 0.9192 | 0.8847
 | 150 | 0.9629 | 0.9005 | 0.9042 | 0.8934 | 0.8667
 | 200 | 0.9601 | 0.8843 | 0.8904 | 0.8779 | 0.8387
 | 250 | 0.9481 | 0.8627 | 0.8760 | 0.8462 | 0.8163
Pavia University scene (2%) | 100 | 0.9734 | 0.9607 | 0.9534 | 0.9668 | 0.9099
 | 200 | 0.9725 | 0.9626 | 0.9336 | 0.9431 | 0.9018
 | 300 | 0.9627 | 0.9525 | 0.9261 | 0.9271 | 0.9085
 | 400 | 0.9583 | 0.9361 | 0.9123 | 0.9130 | 0.8913
 | 500 | 0.9529 | 0.9279 | 0.9052 | 0.9017 | 0.8952
Salinas image (0.2%) | 50 | 0.9753 | 0.9642 | 0.9518 | 0.9453 | 0.9238
 | 100 | 0.9678 | 0.9280 | 0.9198 | 0.9280 | 0.9147
 | 150 | 0.9495 | 0.8996 | 0.8900 | 0.8970 | 0.8958
 | 200 | 0.9482 | 0.8726 | 0.8619 | 0.8742 | 0.8899
 | 250 | 0.9321 | 0.8679 | 0.8503 | 0.8353 | 0.8720
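In the robustness experiment of Table 6, zero-mean Gaussian noise with the listed variance σ is added to the images before feature extraction and classification. A minimal sketch of such a corruption step is given below; it is our own illustration, assuming the image is stored as a rows × cols × bands array, and is not the authors' code.

```python
import numpy as np

def add_gaussian_noise(cube, variance, seed=0):
    """Corrupt a hyperspectral cube with zero-mean Gaussian noise.

    The table reports the noise *variance*, so the standard deviation
    passed to the sampler is its square root.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=0.0, scale=np.sqrt(variance), size=cube.shape)
    return cube + noise

# Hypothetical usage: noisy = add_gaussian_noise(indian_pines, variance=250)
```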
Table 7. Classification accuracy of the BLRDA method with different block sizes.

Image | Block Size | NN: OA | AA | Kappa | Time | SVM: OA | AA | Kappa | Time
Indian Pines image | 25 | 0.9784 | 0.9673 | 0.9754 | 28.06 | 0.9721 | 0.9831 | 0.9681 | 64.28
 | 50 | 0.9757 | 0.9699 | 0.9712 | 34.01 | 0.9728 | 0.9836 | 0.9713 | 69.96
 | 75 | 0.9762 | 0.9676 | 0.9728 | 40.88 | 0.9754 | 0.9791 | 0.9719 | 76.70
 | 100 | 0.9776 | 0.9725 | 0.9744 | 46.39 | 0.9782 | 0.9819 | 0.9751 | 82.40
University of Pavia image | 25 | 0.9884 | 0.9814 | 0.9846 | 212.05 | 0.9833 | 0.9699 | 0.9779 | 295.10
 | 50 | 0.9887 | 0.9809 | 0.9850 | 238.11 | 0.9828 | 0.9687 | 0.9772 | 315.92
 | 75 | 0.9877 | 0.9817 | 0.9837 | 258.57 | 0.9847 | 0.9736 | 0.9796 | 339.07
 | 100 | 0.9863 | 0.9777 | 0.9818 | 277.80 | 0.9854 | 0.9734 | 0.9807 | 378.05
Salinas image | 25 | 0.9963 | 0.9958 | 0.9960 | 315.90 | 0.9917 | 0.9922 | 0.9907 | 283.85
 | 50 | 0.9979 | 0.9967 | 0.9976 | 340.80 | 0.9930 | 0.9924 | 0.9923 | 315.72
 | 75 | 0.9981 | 0.9973 | 0.9979 | 383.07 | 0.9930 | 0.9917 | 0.9922 | 353.74
 | 100 | 0.9985 | 0.9978 | 0.9984 | 412.47 | 0.9941 | 0.9938 | 0.9934 | 383.43
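Table 7 varies the spatial block size used when the image is partitioned before the block-wise low-rank representation. As a rough, assumed illustration of that partitioning step (non-overlapping tiles, with possibly smaller blocks at the image border), one could write:

```python
import numpy as np

def split_into_blocks(cube, block_size):
    """Yield non-overlapping spatial blocks of a rows x cols x bands cube;
    blocks on the right and bottom borders may be smaller than block_size."""
    rows, cols, _ = cube.shape
    for r in range(0, rows, block_size):
        for c in range(0, cols, block_size):
            yield cube[r:r + block_size, c:c + block_size, :]

# Hypothetical usage: blocks = list(split_into_blocks(pavia, block_size=50))
```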
Table 8. Classification accuracy of the BLRDA method with different reduced dimensions.

Image | Dimension | NN: OA | AA | Kappa | SVM: OA | AA | Kappa
Indian Pines image | 12 | 0.9976 | 0.9965 | 0.9974 | 0.9720 | 0.9743 | 0.9680
 | 14 | 0.9977 | 0.9966 | 0.9974 | 0.9702 | 0.9750 | 0.9660
 | 16 | 0.9980 | 0.9968 | 0.9978 | 0.9711 | 0.9772 | 0.9670
 | 18 | 0.9980 | 0.9968 | 0.9978 | 0.9688 | 0.9750 | 0.9644
 | 20 | 0.9980 | 0.9968 | 0.9978 | 0.9705 | 0.9755 | 0.9663
University of Pavia image | 12 | 0.9883 | 0.9800 | 0.9845 | 0.9827 | 0.9690 | 0.9771
 | 14 | 0.9885 | 0.9808 | 0.9848 | 0.9823 | 0.9678 | 0.9765
 | 16 | 0.9887 | 0.9810 | 0.9850 | 0.9824 | 0.9686 | 0.9767
 | 18 | 0.9888 | 0.9812 | 0.9851 | 0.9832 | 0.9698 | 0.9777
 | 20 | 0.9890 | 0.9816 | 0.9854 | 0.9835 | 0.9683 | 0.9781
Salinas image | 12 | 0.9981 | 0.9967 | 0.9979 | 0.9921 | 0.9923 | 0.9909
 | 14 | 0.9981 | 0.9969 | 0.9979 | 0.9931 | 0.9918 | 0.9920
 | 16 | 0.9983 | 0.9971 | 0.9981 | 0.9919 | 0.9925 | 0.9907
 | 18 | 0.9983 | 0.9970 | 0.9981 | 0.9954 | 0.9950 | 0.9946
 | 20 | 0.9983 | 0.9970 | 0.9982 | 0.9960 | 0.9954 | 0.9953
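Table 8 varies the number of retained feature dimensions. Once a projection matrix has been learned by the discriminant analysis step, keeping d dimensions is a single matrix product; the generic sketch below, with our own variable names, is only meant to make that explicit and does not reproduce the paper's implementation.

```python
import numpy as np

def project(features, W, d):
    """Project samples (n x bands) onto the first d columns of a learned
    projection matrix W (bands x k), giving an n x d feature matrix."""
    return np.asarray(features) @ np.asarray(W)[:, :d]

# Hypothetical usage: reduced = project(pixels, W, d=16)
```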
