1. Introduction
A hyperspectral image (HSI) records a wide range of electromagnetic wave data reflected by the earth’s surface. HSI has been widely used in agricultural mapping [1] and mineral identification [2], and due to its high-resolution spectral record of the land covers, HSI data is suitable for the classification of different objects on land [3,4,5]. However, among all the HSI data acquired, labeled data is very limited. In this situation, semi-supervised learning (SSL) provides a promising way to deal with both the limited labeled data and the rich unlabeled data [6,7].
In recent years, many groups have applied SSL methods to the HSI classification area. Typical SSL methods include the self-training method [8], the collaborative training method [9], the generative model method [10] and the graph-based method [11]. The self-training method [8] adds pseudo-labels to high-confidence unlabeled samples in each iteration until all the unlabeled samples are labeled. Collaborative learning [9], which combines active learning (AL) with SSL, was proposed to make HSI classification performance more reasonable and promising with limited labeled samples. Generative models such as expectation-maximization algorithms with finite-mixture models [10] have also been applied to HSI classification. It is worth mentioning the self-training method based on convolutional neural networks (CNN) proposed by Wu et al. [12]. In their work, the authors propose a CNN-based classification framework which uses self-training to gradually assign pseudo-labels to unlabeled samples by clustering and employs spatial constraints to regulate the self-training process. It is an attractive work that combines the spatial neighborhood information with the spectral information and achieves high performance on several datasets. However, the CNN-based method can be time-consuming at the training stage, and the performance of a self-training model is highly dependent on the initial samples it chooses.
Among all SSL methods, the graph-based method [13] has attracted attention from many researchers because its mathematical formulation is easy to analyze and a closed-form solution can be obtained. On the other hand, sparse representation (SR), owing to its solid mathematical foundation, provides a reliable way to describe the linkage between samples, which helps the graph building. The SR method was first introduced by Yan et al. [13] and Cheng et al. [14] to generate the L1-graph. Afterwards, SR-based graph methods were applied to HSI classification [13,14,15,16,17]. For example, Gu et al. [15] proposed the L1-graph semi-supervised learning model for hyperspectral image classification, and Shao et al. [17] presented the probabilistic class structure regularized sparse representation (PCSSR) approach, which outperforms state-of-the-art algorithms in graph construction in most cases [18,19]. Different from normal SR methods, the PCSSR approach introduces a probabilistic class structure regularizer into the SR model, where the probabilistic class structure reflects the probabilistic relationship between each sample and each class, and further calculates the distance between every two samples based on their probabilistic relationship. The SR process is then guided by this distance information. Therefore, the key point of the PCSSR algorithm is the distance information and how to generate it properly. In the previous study, however, the researchers only apply another SR process to compute the distance information, which focuses on the spectral information but fails to utilize the spatial information.
Despite its highly discriminative capability to achieve high classification accuracy, PCSSR suffers from the limitation of neglecting the spatial information of HSI. Since sample pixels have the characteristic of spatial continuity, failing to consider spatial information would miss this important characteristic that is beneficial for enhancing classification capability. Referring to the classification results in the PCSSR paper, within a wide range of land covers for a certain class, we may observe mislabeled pixels that have been classified into a wrong class. Therefore, we conclude that classification results obtained by using only spectral information lack spatial continuity and smoothness.
In order to address the above-mentioned limitation, this work aims to incorporate spatial distance information into PCSSR to improve its discriminative capability. In addition, for better estimating the spatial distance, we propose a new measurement method for spatial distance called class adjusted spatial distance (CASD). This new method takes into account both the spatial distance and the class difference between every two pixels. By such means, pixels belonging to the same class but separated by a long spatial distance are assigned a relatively small CASD value, providing appropriate discriminative information. The effectiveness of employing CASD for the regularization process in PCSSR is thoroughly verified by the experimental results. Experimental results on different datasets demonstrate that the proposed method can significantly improve the classification accuracy by incorporating the spatial information in the CASD metric. Compared with state-of-the-art classification models, the proposed method achieves the highest overall accuracies of 99.71%, 97.13%, and 97.07% on the BOT, KSC, and truncated IND PINE datasets, respectively, with a small number of training samples selected from each class. Specifically, the main contributions of this paper include the following two aspects.
We propose the concept of the CASD. The calculation of the CASD is based mainly on the planar Euclidean distance and the shortest path algorithm. The CASD takes the class similarity between samples into consideration, which makes the measurement of distance more accurate and reasonable.
We apply the CASD to estimate the distance information needed in the PCSSR algorithm. The results show that this approach can enhance the performance of the PCSSR algorithm when enough training samples are provided. We achieve the highest improvements in classification accuracy of 8.65% and 3.85% on the KSC and BOT datasets when the number of labeled samples selected from each class reaches 20, and of 15.97% on the truncated IND PINE dataset when the number of labeled samples selected from each class reaches 15.
2. Related Works
This section provides a brief discussion of existing graph construction methods for HSI classification. In the graph-based SSL process, label propagation (LP) is a crucial step for transferring labels from a limited number of labeled samples to abundant unlabeled samples [6], given a graph which denotes the connection among all samples. The basic idea of the LP algorithm is to assume that similar samples should have similar labels; the mathematical way of achieving this purpose is to define an energy function (see Equation (8)) for the given graph that is used to judge the “smoothness” of the classification results: if the results meet the assumption of LP (i.e., similar samples should have similar labels), the value of the energy function will be small, and vice versa.
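For concreteness, a standard quadratic form of this energy function, reproduced here ahead of its formal statement as Equation (8) in Section 3.3, is the following, where $w_{ij}$ is the edge weight between samples $i$ and $j$, and $f_i$, $f_j$ are their predicted label vectors:

$$E(F) = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} \left\lVert f_i - f_j \right\rVert^2$$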
To implement the above-mentioned procedure, we first need a well-constructed graph with an accurate adjacency matrix. The adjacency matrix of the graph reflects the relationship between samples, and a well-constructed graph should denote the similarity between samples faithfully. Therefore, we need a proper method to generate an accurate similarity matrix, i.e., the adjacency matrix of the graph. Different from traditional graph construction methods, SR-based methods are capable of learning the local relationship from samples and computing well-discriminated edge weights of the graph, and are therefore robust to noise and parameter variations. We discuss below some representative methods in these two categories.
2.1. Traditional Graph Construction Methods
The process of graph construction is crucial in graph-based SSL and mainly involves two steps: building the graph adjacency structure and calculating the graph edge weights. For building the graph adjacency structure, k-nearest neighbors (KNN) and the $\epsilon$-ball neighborhood are the two most popular approaches [20]. As for graph weighting methods, Zhou et al. [21] use the Gaussian kernel (GK) function to calculate the edge weights; however, if only a few labeled samples are provided, it is hard to determine the hyper-parameters of the function [22]. Wang et al. [22] propose a non-negative local linear reconstruction (LLR) that uses the neighborhood information of each data point to construct the graph in a more reliable and stable way: first, they approximate the entire graph as a series of overlapping linear neighborhood patches; then, they find the edge weights of each linear neighborhood patch; finally, they aggregate all the edge weights together to form the edge weight matrix of the entire graph. Ma et al. [23] consider that sparsity is essential for improving the efficiency of SSL algorithms; therefore, they propose a local linear embedding (LLE)-based weight which can capture the local geometric properties of hyperspectral data and weights the graph edges at a low computational cost. Zhuang et al. [24] proposed the nonnegative low-rank and sparse (NNLRS) approach, which uses both the low-rankness and the sparsity of high-dimensional data samples to construct a good graph; the obtained graph can capture both the local low-dimensional linear structures and the global cluster or subspace structures of the data samples. A minimal sketch of the classic KNN + GK construction is given below.
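The sketch below illustrates the traditional KNN adjacency plus GK weighting described above; the function name and the default values of `k` and `sigma` are our illustrative choices, not taken from the cited works.

```python
# A minimal sketch of traditional KNN + Gaussian-kernel (GK) graph
# construction; sigma and k are the manually tuned parameters that
# this subsection identifies as the weakness of this family of methods.
import numpy as np
from scipy.spatial.distance import cdist

def knn_gk_graph(X, k=10, sigma=1.0):
    """X: (n, d) array of sample spectra; returns an (n, n) adjacency matrix."""
    dist = cdist(X, X)                           # pairwise Euclidean distances
    W = np.exp(-dist ** 2 / (2 * sigma ** 2))    # GK edge weights
    np.fill_diagonal(W, 0.0)                     # no self-loops
    # KNN adjacency structure: keep only each sample's k nearest neighbors.
    nearest = np.argsort(dist, axis=1)[:, 1:k + 1]   # skip column 0 (itself)
    mask = np.zeros_like(W, dtype=bool)
    np.put_along_axis(mask, nearest, True, axis=1)
    W[~mask] = 0.0
    return np.maximum(W, W.T)                    # symmetrize the graph
```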
However, these traditional methods share the same disadvantage: they all rely on fixed, manually tuned parameters. As a result, this category of graph construction methods is very sensitive to data noise and parameter variations.
2.2. SR-Based Graph Construction Methods
Unlike the traditional graph generation approaches, SR-based methods can learn the local relationship from samples and compute well-discriminated edge weights of the graph. By encoding a certain sample as a sparse linear combination of all the other samples, the sparse coefficients of the linear combination can be viewed as the edge weights from that sample to all the other samples [13,14]. By doing so, the graph demanded by the LP algorithm can be generated.
Going beyond most SR-based methods, Shao et al. [17] proposed the probabilistic class structure regularized sparse representation (PCSSR) approach. In their work, the authors incorporate into the SR model a probabilistic class structure that reflects the probabilistic relationship between each sample and each class. Further, with the probabilistic class structure provided, the distance between every two samples can be acquired according to the difference between their probabilistic class labels. Finally, a class structure regularization is developed using the distance between every two samples. The authors claim that, with the class structure regularizer, PCSSR can learn a more discriminative graph from the data, and as shown in their experimental results, the PCSSR method outperforms the state of the art on Hyperion and airborne visible infrared imaging spectrometer (AVIRIS) hyperspectral data. The class structure regularizer and the full model of PCSSR are shown in Equations (1) and (2), respectively:

$$R(W) = \sum_{i,j} d_{ij} \lvert w_{ij} \rvert = \left\lVert D \odot W \right\rVert_1 \quad (1)$$

$$\min_{W,E} \lVert W \rVert_1 + \lambda \lVert E \rVert_1 + \beta \left\lVert D \odot W \right\rVert_1 \quad \text{s.t.} \quad X = XW + E, \ \operatorname{diag}(W) = 0 \quad (2)$$

where $W$ is the adjacency matrix we need to obtain for the LP algorithm, $D$ is the distance matrix whose entry $d_{ij}$ represents the distance between two samples based on the difference between their probabilistic class labels, $X$ denotes all samples in the training set and testing set, $E$ models the noise, and $\lambda$, $\beta$ are tradeoff parameters.
However, the probabilistic class structure used in the PCSSR paper is obtained only through another SR process, which fails to take into account the abundant spatial information in HSI datasets. Despite its highly discriminative capability to achieve high classification accuracy, PCSSR thus suffers from the limitation of neglecting the spatial information of HSI. Since sample pixels have the characteristic of spatial continuity, failing to consider spatial information would miss this important characteristic that is beneficial for enhancing classification capability. Therefore, we conclude that classification results obtained by using only spectral information lack spatial continuity and smoothness. In order to address the above-mentioned limitation, our work aims to incorporate spatial distance information into PCSSR to improve its discriminative capability, as introduced and tested in the following sections.
3. Modeling and Algorithms
This section details the proposed HSI classification approach, which introduces CASD into an SR graph-based method in order to take advantage of spatial information for improving the classification accuracy. The fundamental idea is to use our proposed CASD, instead of the distance matrix $D$ acquired by the SR process in the original PCSSR method, to measure the distance between any two samples. The CASD-assisted PCSSR can achieve a more accurate and reasonable measurement of sample distances. We further employ the LP algorithm to predict the probability of each unlabeled pixel belonging to a certain class.
Figure 1 illustrates the general flow of the proposed CASD-assisted HSI classification method. In what follows, we describe in detail the main steps in this classification flow.
3.1. Class Adjusted Spatial Distance
For the purpose of incorporating spatial information into PCSSR, we propose using CASD to replace the distance matrix $D$ required by the SR process in the original PCSSR. We first provide a brief introduction to the planar Euclidean distance (PED). Consider two points $p_1 = (x_1, y_1)$ and $p_2 = (x_2, y_2)$ in a plane. The PED between these two points is defined as:

$$d_{\mathrm{PED}}(p_1, p_2) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \quad (3)$$
As we have discussed, to improve the performance of the PCSSR algorithm, a proper distance measurement between every two samples is needed. The distance matrix should reflect the similarity or difference among samples. Since each sample is just an area on the ground, the simplest way to measure the distance between two samples is by calculating the spatial distance, i.e., the PED between them. The distribution of land covers is usually continuous, so if a sample belongs to some class $c$, the samples in its spatial neighborhood are likely to belong to the same class. Thus, we can use the PED between two samples to represent their similarity.
However, PED has a limitation for measuring the distance information that PCSSR needs. It is possible that two samples distant from each other belong to the same class, which is not unusual in land cover classification. In this case, PCSSR using PED would fail to classify such samples correctly. To overcome this limitation, we introduce the class adjusted spatial distance (CASD) to replace the naïve planar Euclidean distance. Generally speaking, the CASD is a distance measurement which considers not only the Euclidean distance between two samples but also their class difference. We mainly use the Euclidean distance and a shortest path algorithm to compute the CASD.
We first generate a complete undirected graph $G = (V, E)$, where $V$ represents all the $n$ samples and each edge in $E$ is valued with the Euclidean distance between the corresponding two samples. The distance from a sample point to itself is defined as $0$. Then, we check all the labeled samples (vertices) in the complete graph $G$. If two labeled samples belong to the same class, we change the edge weight between them to $0$. In this way, we make samples of the same class “closer” to each other. At the last step, we apply a shortest path algorithm (for example, the Dijkstra algorithm [25]) between every two vertices in the graph $G$ and revalue the edge weight between them with the length of the computed shortest path. We define this new edge weight as “the class adjusted spatial distance”. The above process is illustrated in Figure 2, and Algorithm 1 is described below.
Algorithm 1: Compute CASD for every two samples
Input: Array with the spatial coordinates of the labeled samples $X_L$ and the coordinates of the unlabeled samples $X_U$; the label vector $y$ that notes the class of every labeled sample.
Output: The adjacency matrix $D$.
1. Build the complete undirected graph $G = (V, E)$ on all samples and weight the edges in the graph by the PED: $e_{ij} = d_{\mathrm{PED}}(p_i, p_j)$, with $e_{ii} = 0$.
2. Update the edge weight between every two labeled samples $v_i$, $v_j$ according to the following equation: $e_{ij} = 0$ if $y_i = y_j$.
3. Calculate the shortest path between every two vertices $v_i$, $v_j$ in the graph and update the edge weight: $d_{ij} = $ length of the shortest path from $v_i$ to $v_j$.
The element $d_{ij}$ in the output adjacency matrix $D$ represents the calculated CASD between the i-th sample and the j-th sample. A minimal code sketch of this procedure is given below.
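The following Python sketch illustrates Algorithm 1 under our assumptions (2-D pixel coordinates, integer class labels, labeled samples listed first); the function name and the use of SciPy’s shortest-path routine are illustrative choices, not part of the original MATLAB implementation.

```python
# A sketch of Algorithm 1: CASD between every two samples.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import shortest_path

def compute_casd(coords, labels):
    """coords: (n, 2) pixel coordinates, labeled samples first;
    labels: (l,) integer classes of the labeled samples."""
    # Step 1: complete graph weighted by the planar Euclidean distance (PED).
    E = cdist(coords, coords)
    # Step 2: make same-class labeled samples "closer" by shrinking their
    # edges. SciPy's dense csgraph format treats exact zeros as non-edges,
    # so a tiny positive weight stands in for the zero weight of Algorithm 1.
    l = len(labels)
    same_class = labels[:, None] == labels[None, :]
    E[:l, :l][same_class] = 1e-12
    # Step 3: all-pairs shortest paths (Dijkstra); path lengths are the CASDs.
    return shortest_path(E, method="D", directed=False)
```

Note that the all-pairs shortest-path computation on a complete graph dominates the cost of this sketch, but it only needs to run once per dataset.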
3.2. CASD-Assisted PCSSR
Based upon the CASD metric defined in Section 3.1, we now describe how to generate the graph for the LP algorithm by using the PCSSR flow. To start with, the PCSSR-based graph generation method is derived from the typical SR-based method. For every sample, the SR-based method aims to encode it as a sparse linear combination of the other samples [13,14]. The typical SR model is formulated as follows:

$$\min_{W} \lVert W \rVert_1 \quad \text{s.t.} \quad X = XW, \ \operatorname{diag}(W) = 0 \quad (4)$$

where $X$ denotes all the samples in the training set and testing set, and $\lVert \cdot \rVert_1$ represents the L1 norm. By solving this regularization model, we can obtain the graph weight matrix $W$ demanded in the following LP process. A rough sketch of this construction is shown below.
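As an illustration, the sketch below approximates the constrained model (4) with an unconstrained Lasso relaxation solved per sample; this is a common simplification, not the exact solver used by the cited works, and `alpha` is a hypothetical regularization weight.

```python
# A rough L1-graph sketch: encode each sample as a sparse linear
# combination of all the other samples (Lasso relaxation of Eq. (4)).
import numpy as np
from sklearn.linear_model import Lasso

def sr_graph(X, alpha=1e-3):
    """X: (d, n) matrix whose columns are samples; returns (n, n) edge weights."""
    d, n = X.shape
    W = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)          # enforce w_ii = 0
        model = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
        model.fit(X[:, others], X[:, i])             # x_i ~ X_{-i} w_i
        W[i, others] = np.abs(model.coef_)           # sparse coefficients as edges
    return (W + W.T) / 2                             # symmetrize for LP
```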
Furthermore, due to the complex working environment and contamination during data transmission, many hyperspectral images are corrupted by different types and amounts of noise, two common types of which are stripping noise and salt-and-pepper noise. Therefore, considering the corrupted data and the noise during collection, the model can be rewritten as follows to enhance the robustness against noise:

$$\min_{W,E} \lVert W \rVert_1 + \lambda \lVert E \rVert_1 \quad \text{s.t.} \quad X = XW + E, \ \operatorname{diag}(W) = 0 \quad (5)$$

where $X$ denotes all the samples in the training set and testing set, $E$ models the noise, and $\lambda$ is a tradeoff parameter that controls the sparsity of $E$.
In the next step, we come to a point of divergence from the original paper: the original PCSSR paper next introduces a probabilistic class structure $P$, where each entry $p_{ic}$ represents the possibility that sample $i$ belongs to class $c$, and then calculates the distance matrix $D$ based on the probabilistic class structure $P$, where each $d_{ij}$ is derived from the difference between the probabilistic class labels of samples $i$ and $j$. It is necessary to state that, in the original PCSSR paper, the probabilistic class structure is generated through a standard SR process, and one of the aims of our work is to introduce spatial information into PCSSR.
Therefore, instead of computing the probabilistic class structure, we run Algorithm 1, as proposed in Section 3.1, to get the CASD information between every two samples and apply the CASD information as the new distance matrix $D$, where $d_{ij}$ measures the distance between the i-th and j-th samples. If the two samples are close to each other (by category or in space), $d_{ij}$ will be a small number, which means they are similar to each other. The additional regularizer for the graph edge matrix $W$ is as follows:

$$R(W) = \sum_{i,j} d_{ij} \lvert w_{ij} \rvert = \left\lVert D \odot W \right\rVert_1 \quad (6)$$
Obviously, to acquire a smaller $R(W)$, $w_{ij}$ needs to be small when $d_{ij}$ is a large number. Through this regularizer, the linkage between two far-away samples will be regularized into a weak linkage. Once we obtain the similarity (or distance) matrix $D$ by calculating CASD, the final formula of our CASD-assisted PCSSR approach is:

$$\min_{W,E} \lVert W \rVert_1 + \lambda \lVert E \rVert_1 + \beta \left\lVert D \odot W \right\rVert_1 \quad \text{s.t.} \quad X = XW + E, \ \operatorname{diag}(W) = 0 \quad (7)$$

where $\lambda$ controls the sparsity of the noise term $E$ and $\beta$ controls the effect of the class structure regularizer. The model formulated in (7) is a constrained optimization problem and can be relaxed and solved by Lagrange multiplier methods, for example the alternating direction method of multipliers (ADM) [26]. However, ADM has the disadvantage of introducing extra variables and requiring parameter tuning. In this work, following the original PCSSR method [17], we employ ADM with adaptive penalty (ADMAP) [27], which overcomes the above-mentioned limitations, to solve problem (7).
3.3. Label Propagation
After getting the sparse graph and its adjacency matrix $W$, we can obtain the final prediction result by running the LP algorithm on the obtained graph. As mentioned in Section 2, the main purpose of the LP algorithm is to transfer labels from the labeled samples to the unlabeled samples, and during this process, a prediction matrix is generated. Furthermore, the generated prediction results should meet the basic assumption of the LP algorithm that similar samples should have similar labels. The mathematical way of achieving the purpose of the LP algorithm is to define an energy function $E(F)$ on the given graph and to minimize it:

$$E(F) = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} \left\lVert f_i - f_j \right\rVert^2 \quad (8)$$
where $f_i$ and $f_j$ are respectively the predicted label vectors of the i-th and j-th data samples, and $F$ is composed of all the predicted label vectors. The matrix $W$ is the adjacency matrix of the graph needed for the LP process.
In order to maintain experimental consistency with the original PCSSR paper, we follow the LP formulation used therein. The full explanation and the adapted formulas are detailed as follows.
The labeled samples are expressed as $X_L = \{x_1, \dots, x_l\}$, and the large number of unlabeled samples as $X_U = \{x_{l+1}, \dots, x_{l+u}\}$. There are in total $C$ classes denoted as $\{1, \dots, C\}$. Let $n = l + u$ be the total number of data samples; usually, the value $l$ is much smaller than $u$. The matrix $W$, the adjacency matrix of graph $G$ which can be obtained from the PCSSR process, implies the similarity or the connection between every two samples. Next, we define a label matrix $Y_L$ with $l$ rows, where each row $y_i$ is a one-hot vector representing the class that the corresponding labeled sample $x_i$ belongs to. $F \in \mathbb{R}^{n \times C}$ is the prediction matrix, of which each element $F_{ij}$ represents the probability of the i-th sample belonging to the j-th class; $F_L$ is the upper $l$ rows of $F$, while $F_U$ is the lower $u$ rows of $F$. The LP solution is then formulated as:

$$F^{*} = \arg\min_{F:\; F_L = Y_L} \operatorname{tr}\left(F^{\top} L F\right) \quad (9)$$
where the expression after $\arg\min$ can be viewed as the energy function, and $f_i$ is the predicted label vector of the data sample $x_i$. $L = \Delta - W$ is the Laplacian matrix, where $\Delta$ is a diagonal matrix with $\Delta_{ii} = \sum_{j} w_{ij}$.
Then we split $L$ into 4 blocks according to the numbers of labeled and unlabeled samples:

$$L = \begin{bmatrix} L_{ll} & L_{lu} \\ L_{ul} & L_{uu} \end{bmatrix}$$

Finally, we get the prediction matrix that records the possibility of each unlabeled sample belonging to each class:

$$F_U = -L_{uu}^{-1} L_{ul} Y_L \quad (10)$$
The final prediction result for every unlabeled sample is given by:

$$\hat{y}_i = \arg\max_{1 \le j \le C} F_{ij} \quad (11)$$

where $\hat{y}_i$ denotes the class that the unlabeled sample $x_i$ is most likely to belong to. A compact sketch of this LP step is given below.
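The sketch below implements this LP step under the stated definitions; solving the linear system directly replaces the explicit inverse in Equation (10).

```python
# A compact sketch of label propagation (Eqs. (8)-(11)): W is the graph
# from CASD-assisted PCSSR, Y_L holds one-hot labels for the first l samples.
import numpy as np

def label_propagation(W, Y_L):
    """W: (n, n) symmetric adjacency matrix; Y_L: (l, C) one-hot label matrix."""
    l = Y_L.shape[0]
    Delta = np.diag(W.sum(axis=1))             # diagonal degree matrix
    L = Delta - W                              # graph Laplacian
    L_uu, L_ul = L[l:, l:], L[l:, :l]          # blocks for the unlabeled samples
    F_U = np.linalg.solve(L_uu, -L_ul @ Y_L)   # Eq. (10) without inverting L_uu
    return F_U.argmax(axis=1)                  # Eq. (11): most likely class
```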
4. Experimental Results and Analysis
In this section, we test the CASD-assisted PCSSR algorithm on six different hyperspectral datasets. The algorithm is implemented in MATLAB 2019b and runs on a laptop with an i5-7300HQ CPU and a GTX 1050 Ti GPU. We use traditional graph-based algorithms for comparison. The code and datasets used to generate the results and figures are available on Code Ocean [28].
4.1. Experimental Datasets
Two groups of datasets are used to evaluate our model. The first group includes the whole Botswana (BOT) dataset, the whole Kennedy Space Center (KSC) dataset, and the truncated Indian Pines (truncated IND PINE) dataset, in which the labeled ground blocks are relatively discrete and distant from each other. Different from the first group, the labeled samples in the second group are less discrete and always appear in bulk; this group includes the whole Indian Pines (IND PINE) dataset, the whole Salinas (SAL) dataset, and the whole Pavia University (PAV) dataset.
The BOT dataset was collected by the Hyperion sensor on the EO-1 satellite over the Okavango Delta, Botswana, in May 2001. The 242 spectral bands of the Hyperion image range from 357 to 2576 nm with a spatial resolution of 30 m. A total of 145 bands in BOT remain after removing some un-calibrated and noisy bands. The KSC, IND PINE and SAL datasets were gathered by the AVIRIS sensor over the Kennedy Space Center on March 23, 1996, over the Indian Pines test site in north-western Indiana in 1992, and over the Salinas Valley, California, respectively, with 224 spectral reflectance bands in the wavelength range from 400 to 2500 nm. The un-calibrated bands and the noisy bands covering the water absorption features are removed and only 200 bands remain. The PAV dataset was acquired by the reflective optics system imaging spectrometer (ROSIS) sensor over Pavia University with 103 spectral bands and a spatial resolution of 1.3 m. Before analysis, samples which contain no information are discarded.
More information on these six datasets can be found in [29], and all datasets can be downloaded from [28]. The ground truth of every dataset is shown in Figure 3 and Figure 4. The sample size of each class in each dataset is shown in Table 1 and Table 2.
4.2. Experimental Setup
In this part, we evaluate the performance of our CASD-assisted PCSSR algorithm on all datasets, and its performance on Group I datasets is compared to the traditional graph-based classification methods stated in [17], including the original PCSSR graph method, the Gaussian kernel (GK) graph method, the nonnegative local linear reconstruction (LLR) graph method, the local linear embedding (LLE) graph method, the nonnegative low-rank and sparse (NNLRS) graph method, and the SR graph method. Our CASD-assisted PCSSR approach is implemented under the same label propagation framework as the other models, and the hyper-parameters of the other models stay the same as in [17]. The process of hyper-parameter determination during our model development is stated in Section 4.4.
We separate every dataset into two parts, i.e., the training set and the testing set; in our case, the latter is much larger than the former. For each dataset, we randomly pick 3/5/10/15/20 samples per class as the training set (the labeled samples), and the rest as the testing set (the unlabeled samples). An example of dividing the IND PINE dataset is illustrated in Figure 5. To accord with [17], we run our algorithm 20 times for each dataset. The mean of the overall accuracy (OA), the average accuracy (AA), and the Kappa coefficient are used to evaluate the classification results. A small sketch of these metrics is given below.
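For reference, the following is a small sketch of how these three metrics can be computed from integer ground-truth and predicted labels; the helper name is ours, and scikit-learn’s Kappa implementation is used for brevity.

```python
# OA, AA, and Kappa from integer ground-truth and predicted labels.
import numpy as np
from sklearn.metrics import confusion_matrix, cohen_kappa_score

def evaluate(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred)
    oa = np.trace(cm) / cm.sum()                    # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))      # mean per-class accuracy
    kappa = cohen_kappa_score(y_true, y_pred)       # chance-corrected agreement
    return oa, aa, kappa
```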
4.3. Results and Discussion
Figure 6 shows how the classification overall accuracy (OA) changes with the number of labeled samples on the six datasets, and Figure 7 demonstrates the visualized classification results. For the KSC dataset, as illustrated in Figure 6a, the CASD-assisted PCSSR-graph method performs better than the other methods when the number of labeled samples is more than 5 per class, finally achieving an accuracy of about 97%, about 10% higher than the other methods. For the BOT dataset, as illustrated in Figure 6b, the CASD-assisted PCSSR-graph method performs better than the other methods when the number of labeled samples is more than 5 per class, finally achieving an accuracy of about 99%, about 5% higher than the other methods. For the truncated IND PINE dataset presented in Figure 6c, the performance of our method surpasses the other methods throughout, and obtains an accuracy of about 96%, about 16% higher than the other methods, when the number of labeled samples is 15 per class.
Furthermore, the classification accuracy of each class, the overall accuracy (OA), the average accuracy (AA), and the Kappa coefficient for the different graph-based methods on the three datasets are shown in Table 3, Table 4 and Table 5, where the highest value of each row is shown in bold. For the BOT dataset, Table 3 shows that our method outperforms all the other algorithms with the best class-specific accuracies on almost all indices; the only exception is Class 2, where our method achieves an accuracy of 99.93% whereas the highest accuracy is 100.00%. For the KSC dataset, Table 4 shows that our method achieves better performance than all the other algorithms on almost all indices; the only exception is Class 11, where our method achieves an accuracy of 99.64% whereas the highest accuracy is 99.70%. For the truncated IND PINE dataset, Table 5 shows that our method outperforms all the other algorithms once again with the best class-specific accuracies on almost all indices; the only exception is Class 8, where our method achieves an accuracy of 99.64% whereas the highest accuracy is 100.00%.
All the above figures and tables show that the classification accuracies of our model are more satisfactory than those of the other traditional graph-based methods. Based on the above experimental results, we can come to the following conclusions:
For the datasets in Group I, the CASD-assisted PCSSR algorithm does not perform well when only a small number of labeled samples is provided. However, as more labeled samples are given, our method gradually surpasses the other graph-based methods, finally by more than 5% in overall accuracy. This result indicates that the introduction of spatial information can effectively improve the classification accuracy of traditional spectral-focused algorithms when a relatively larger training set is given.
For the datasets in Group II, our algorithm achieves very high accuracy on the SAL dataset, while it performs worse on the whole IND PINE dataset than on the truncated one in Group I.
Conclusion 1 states that the performance of the CASD-assisted PCSSR algorithm is highly related to the number of labeled samples for each class: a lack of labeled samples leads to low accuracy, and increasing the number of labeled samples improves the result effectively.
Since the result of the PCSSR algorithm is regularized by the probabilistic class structure generated by our CASD algorithm, the distances (CASDs) between samples have a great effect on the final performance of our algorithm. We can perform the following operation to visualize the effect of the distances: for every unlabeled sample, find the labeled sample with the shortest CASD to it, then mark that unlabeled sample accordingly; a small sketch follows below. The classification results on the BOT dataset (with three labeled samples per class) are shown in Figure 8. Please note that “the visualization of the CASD” is an independent process, intended only for a better understanding of how well the CASD is measured; it is not an intermediate result of the CASD-assisted PCSSR algorithm.
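A short sketch of this marking operation, assuming `D` is the CASD matrix from the Algorithm 1 sketch with labeled samples listed first:

```python
# Mark every unlabeled sample with the class of its CASD-nearest labeled
# sample; used only to visualize the CASD, not inside the main pipeline.
import numpy as np

def casd_nearest_label(D, labels):
    """D: (n, n) CASD matrix, labeled samples first; labels: (l,) classes."""
    l = len(labels)
    nearest = np.argmin(D[l:, :l], axis=1)   # index of the closest labeled sample
    return np.asarray(labels)[nearest]
```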
It is easy to see that if the labeled samples selected from different categories are very limited, they cannot be sufficiently assigned to every ground block in the testing set. During the classification of such a sample block, if the samples of the same category are far away or samples of different classes are nearby, misclassification is likely to happen. The flaws in the probabilistic class structure generated by the spatial algorithm can interfere with the following sparse representation process, finally resulting in a decrease in accuracy. As the number of labeled samples increases, the probability that a block is assigned labeled samples rises, the accuracy of the algorithm improves, and finally, the OA improves.
The spatial algorithm performs well only when the samples to predict are close to the labeled samples; as the distance increases, the reliability of the prediction drops. Besides, the classification boundary delineated by the spatial algorithm does not take into account the edge information of the hyperspectral image. Therefore, samples in the intersecting area between classes are more affected by neighboring samples and more likely to be assigned to an incorrect category. If there are many unlabeled samples near the intersecting area, the classification result based on CASD can be unsatisfactory (Figure 9). To sum up, the classification effect will be relatively poor at category boundaries away from the training samples. Conversely, if the ground blocks to be classified in the dataset are broken and scattered, the classification boundary is more likely to fall in negligible areas (the black background area), and the classification center is more likely to fall within the ground block that needs to be classified. Therefore, with enough training samples given, the classification results on the more scattered datasets are generally better.
4.4. Parameters Sensitivity Analysis
In this subsection, we discuss the parameter sensitivity of our model using the truncated IND PINE dataset with 10 labeled samples selected from every class. There are two parameters in the PCSSR algorithm, $\lambda$ and $\beta$: $\lambda$ controls the sparsity of the noise term $E$, while $\beta$ controls the effect of the class structure regularizer. We repeat 50 runs for each fixed parameter configuration and present the average results. For example, in Figure 10a, we use a fixed $\beta$ value and vary the $\lambda$ value to observe the classification results; for each $\lambda$ value to be observed, we repeat 50 runs, calculate the classification accuracy of each run, and finally obtain the average accuracy. During the experiment, we first keep $\beta$ fixed and vary the value of $\lambda$ over a grid of values with a fixed step. As we can see from Figure 10a, the algorithm reaches its optimal performance at a particular $\lambda$ value. We then fix $\lambda$ at that value and let $\beta$ change. As illustrated in Figure 10b,c, the OA stays basically the same over a wide range of $\beta$ and drops when $\beta$ is larger. The results show that sparsity and the probabilistic structure both matter in the classification process, though the variation of performance is not large when the parameters change. A sketch of this sweep procedure is given below.
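The sensitivity procedure can be summarized by the sketch below; `run_classification` is a hypothetical stand-in for one full CASD-assisted PCSSR + LP run returning the OA.

```python
# Average OA over repeated runs for every (lambda, beta) configuration.
import numpy as np

def sensitivity_sweep(lambdas, betas, run_classification, repeats=50):
    oa = np.zeros((len(lambdas), len(betas)))
    for i, lam in enumerate(lambdas):
        for j, beta in enumerate(betas):
            runs = [run_classification(lam, beta) for _ in range(repeats)]
            oa[i, j] = np.mean(runs)     # mean OA for this configuration
    return oa
```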
5. Conclusions
This paper has developed a novel graph construction method called the CASD-assisted PCSSR algorithm. The proposed method introduces spatial information into the classification process on the SR graph, so that the “distance” between two samples can be measured by both spatial distance and class difference. The experimental results show that the CASD-assisted PCSSR algorithm is an effective method for hyperspectral data classification and can achieve relatively high performance when enough training samples are provided.
Our method also has limitations. Firstly, the number of training samples should be sufficient for the training process: if the training set is very limited while the ground blocks to predict are numerous, the final performance might not be as good. However, due to the sparse representation model used in this work, we only need a relatively small training set to accomplish model training. Secondly, categorizing by CASD does not ensure a well-delineated intersection line between classes, which means samples close to that line might be badly classified. Nevertheless, the final output of the model can be corrected by the following sparse representation process, since the CASD algorithm only provides a “suggestion” to the PCSSR algorithm. Our future work is to extract the edge information from the hyperspectral data; applying it to the CASD algorithm may compensate for the lack of classification accuracy in the intersecting areas between classes.