Article

Unsupervised Cluster-Wise Hyperspectral Band Selection for Classification

by Mateus Habermann 1,*, Elcio Hideiti Shiguemori 1 and Vincent Frémont 2

1 Institute for Advanced Studies, Sao Jose dos Campos 12228-001, Brazil
2 Department of Automatics and Robotics, Nantes Université, École Centrale de Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(21), 5374; https://doi.org/10.3390/rs14215374
Submission received: 17 August 2022 / Revised: 6 October 2022 / Accepted: 18 October 2022 / Published: 27 October 2022
(This article belongs to the Topic Big Data and Artificial Intelligence)

Abstract

A hyperspectral image provides fine details about the scene under analysis, due to its multiple bands. However, the resulting high dimensionality in the feature space may render a classification task unreliable, mainly due to overfitting and the Hughes phenomenon. In order to attenuate such problems, one can resort to dimensionality reduction (DR). Thus, this paper proposes a new DR algorithm, which performs an unsupervised band selection technique following a clustering approach. More specifically, the data set was split into a predefined number of clusters, after which the bands were iteratively selected based on the parameters of a separating hyperplane, which provided the best separation in the feature space, in a one-versus-all scenario. Then, a fine-tuning of the initially selected bands took place based on the separability of clusters. A comparison with five other state-of-the-art frameworks shows that the proposed method achieved the best classification results in 60% of the experiments.

1. Introduction

In pattern recognition problems, the separation among classes in the feature space is of great importance for the success of the classifier [1]. An appropriate separation may be achieved by means of effective data representation  [2,3]. When it comes to hyperspectral image (HSI) classification, by selecting the right bands, one can provide a wider class separation [4], as well as attenuate the negative effects of the Hughes phenomenon [5] and avoid the overfitting of the classifier [6,7,8].
In such a scenario, feature extraction (FE), i.e., a combination of the original spectral bands, is capable of tackling the aforementioned problems, but it is not a recommended approach for dimensionality reduction of hyperspectral data, because the resulting features do not carry the physical information any longer [9], impairing, consequently, a proper understanding of the model [10,11]. Band selection (BS), on the other hand, is as good as FE in terms of providing class separability; moreover, it keeps the original information about the spectral bands [9,12]. Since a BS method provides suitable bands for a given task, it is possible to design tailored sensors to perform that application, consequently avoiding redundant and irrelevant bands [13].
BS methods can be grouped into three major categories [14]: wrapper methods, in which the selection of bands occurs during the training phase of the classifier, so the classifier must be trained from scratch every time a band subset is assessed; embedded methods, in which the classifier selects the bands by itself, for example, Lasso [15]; and filter methods, in which the band selection process takes place before the classifier training phase and is independent of the classifier to be used [2].
In terms of the available data set to train the algorithm, a band selection framework can be considered supervised [16,17,18,19], semi-supervised [20,21,22,23], or unsupervised [24,25,26]. The last ends up being the most feasible in real applications due to the difficulty in acquiring labeled samples [27].
It is known that unsupervised state-of-the-art BS frameworks follow either a ranking-based [28] or a clustering-based approach [29]. Ranking-based methods sort the spectral bands in relation to a specific criterion. However, they fail in terms of avoiding correlated bands [30]. Clustering-based BS frameworks, on the other hand, aim to find the most representative bands of each cluster of the data set, decreasing the correlation amongst bands [31]. Thus, clustering in the BS literature is normally used to form clusters of spectral bands. For instance, in [32], the authors propose a BS method that uses dynamic programming to cluster the spectral bands, which are considered continuous. In [33], the density peak is used for the clustering of the bands, due to its capability of tackling non-spherical data. The authors propose the weight between the normalized local density and cluster distance. In [34], the BS algorithm is performed based on correntropy-based clustering of the spectral bands. In [29], the authors propose a BS algorithm that initially clusters the spectral bands based on Euclidean distance. Then one band per cluster is selected, resulting in a band subset whose bands are ranked in a fine-tuning step. In [35], the proposed BS uses a self-tuning algorithm to cluster the spectral bands. In [13], a kernel-based probabilistic clustering of spectral bands is proposed, based on the assumption that there is a smooth transition between two adjacent clusters. Finally, in [36] the most representative bands for each pixel are selected by means of an attention mask. Then, an autoencoder reconstructs the original image using the selected bands. In the end, the final bands are selected by a clustering method.
As those cited papers performed the clustering operation on the spectral bands, structural information on the data set was not taken into account. Moreover, since the final objective of band selection frequently lies in a better classification of the data instances, an analysis based solely on the most representative bands (without looking at class separation in the feature space) ends up being of secondary importance. Furthermore, in an unsupervised BS framework, the filter approach is normally used. Thus, the band selection takes place in a preprocessing step, i.e., before the use of the classifier itself [37], and one does not know beforehand which classifier will be used. For this reason, this paper presents a BS framework that seeks to maximize the distance among classes in the feature space. Therefore, our main purpose is not the representation of the data set by a few bands [38], but rather the selection of bands that best separate the classes. This class separability during the selection of bands is the gap we propose to fill in relation to other approaches. Since this framework is set to work in an unsupervised environment, the actual classes are represented by clusters.
Thus, in the proposed approach, the bands were iteratively selected based on data set portions, which, in turn, were defined by clustering algorithms. Eleven clustering methods were evaluated in order to provide the best match between the resulting clustering and the actual data classes. Thus, the clusters formed may be deemed as representatives of the actual classes, which enables an analysis based on the separability of the classes in the feature space; consequently, structural information was taken into account. Once the clusters were formed, a one-versus-all approach was adopted. In this way, the selected bands were those that provided the best separability between the cluster and the rest of the data set. Then, those bands were subjected to a fine-tuning procedure, which consisted of placing these bands into some clusters in order to select a combination of those that provided the biggest cluster separability in the feature space. The proposed method bears the acronym CW due to its cluster-wise approach.
The contributions of this paper are as follows:
  • The use of a cluster-wise approach to solving the unsupervised band selection problem;
  • Once two clusters were formed, the selection of bands was based on the parameters of a hyperplane defined by a single-layer neural network;
  • Fine-tuning of the selected bands based on cluster separability in the feature space.
In Section 2, the proposed method is presented. In Section 3, the results of the proposed method are compared to five competitors by using three classifiers and three hyperspectral images commonly used in the BS literature. In Section 4, the advantages and limitations of the method are discussed. Finally, in Section 5, we offer the conclusion of this work.

2. Method

Every BS algorithm is supposed to select relevant features (refer to [39] for a thorough definition of feature relevance). In short, a relevant spectral band (i) should provide useful information [40]; and (ii) should not be redundant [14]. Since the proposed band selection framework is designed for classification purposes, the bands considered to provide useful information are those that provide maximum separation between clusters in the feature space. Redundancy between spectral bands is measured in this work by correlation.
Therefore, following this reasoning, the proposed method is composed of three parts: Data clustering; Selection of bands of interest; and Redundancy reduction.

2.1. Data Clustering

In unsupervised problems, approaches such as data reconstruction [2] and data structure analysis (DSA) render feature selection feasible.
Clustering of the data entries can find natural groupings in data sets and, when used for band selection, is therefore considered a DSA-based approach.
Inspired by [41], the proposed method also performs clustering of the data entries. However, here, we adopted a partitional clustering instead of a hierarchical one, as illustrated in Figure 1a. With partitional clustering, each resulting cluster $C_i$, $i \in \{1, 2, \ldots, k\}$, could be taken as a representative of a real class if (i) $k$ equals the number of classes present in the data set, and (ii) the clustering algorithm is appropriate for the data set at hand.
Since one generally wants to classify objects present in a known scene, it is plausible to suppose that the number $k$ of classes is known beforehand.

Choice of the Clustering Algorithm

As for the fitness of a clustering algorithm to hyperspectral data, 11 methods were evaluated. It is worth noting that our focus was not on the best clustering algorithm available in the literature, but rather just to use some well-established clustering algorithms to show the efficiency of the proposed method.
The input data are the Salinas hyperspectral image [42], with 224 bands and 16 classes.
So, each clustering algorithm was set to find $k = 16$ clusters (as we will see later in this paper, the proposed method sets $k$ equal to the number of classes in the image). It is important to clarify that, at this point, the focus is on the comparison of clustering methods, so the data labels will be used.
The measure of agreement between the two partitions (the clustering result and the Salinas ground truth) was computed by means of the adjusted Rand index ($r$) [1]. In short, let $\kappa_1$ and $\kappa_2$ be two different clusterings of a given data set, where $\kappa_1$ is the real class partition of the Salinas image and $\kappa_2$ is the result of a clustering algorithm.
Considering all pairs of vectors $x_j$ and $x_l$, with $j \neq l$, let $\alpha_1$ be the number of times that both vectors belong to the same cluster in both $\kappa_1$ and $\kappa_2$. Moreover, let $\alpha_2$ be the number of times the vectors belong to different clusters in $\kappa_1$ and different clusters in $\kappa_2$.
Finally, the adjusted Rand index between the clusterings $\kappa_1$ and $\kappa_2$ is given by
$$r = \frac{\alpha_1 + \alpha_2}{m}, \tag{1}$$
where $m$ is the number of possible vector pairs in the data set.
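The pair-counting definition above can be illustrated with a minimal sketch. The function name `rand_index` and the toy labelings are hypothetical and are not taken from the paper. Note that the expression as written counts agreeing pairs directly, which is the plain Rand statistic; chance-corrected variants apply a further normalization that is not shown here.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two labelings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    if len(labels_a) < 2:
        raise ValueError("at least two samples are needed to form a pair")
    alpha_1 = 0  # pairs grouped together in both labelings
    alpha_2 = 0  # pairs separated in both labelings
    m = 0        # total number of pairs
    for j, l in combinations(range(len(labels_a)), 2):
        same_a = labels_a[j] == labels_a[l]
        same_b = labels_b[j] == labels_b[l]
        if same_a and same_b:
            alpha_1 += 1
        elif not same_a and not same_b:
            alpha_2 += 1
        m += 1
    return (alpha_1 + alpha_2) / m


truth = [0, 0, 1, 1]
print(rand_index(truth, [1, 1, 0, 0]))  # same partition, different label names
print(rand_index(truth, [0, 1, 0, 1]))  # only 2 of the 6 pairs agree
```

The double loop is quadratic in the number of samples, so for a full hyperspectral image a contingency-table formulation would likely be preferable; the loop is kept here because it mirrors the definition term by term.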
For each clustering algorithm, $r$ was calculated 10 times. Table 1 shows the mean values for the k-means and k-medoids algorithms with different distance metrics. K-means using the cosine similarity measure has the best outcome. For the sake of clarity, the bigger the value of $r$, the more similar the clusterings $\kappa_1$ and $\kappa_2$.
Consequently, all of the clusters throughout this paper were obtained by k-means using the cosine similarity measure.
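As an illustration of clustering with the cosine measure, the sketch below implements a simple spherical variant of k-means in NumPy. It is an assumption about one common way to realize cosine-based k-means, not the implementation used in the paper; the function name `cosine_kmeans`, the random initialization, and the iteration cap are all hypothetical choices. It also assumes that no pixel is an all-zero vector.

```python
import numpy as np


def cosine_kmeans(X, k, n_iter=50, seed=0):
    """Cluster the rows of X by cosine similarity (spherical k-means sketch)."""
    rng = np.random.default_rng(seed)
    # Unit-normalize rows so that a dot product equals the cosine similarity.
    U = X / np.linalg.norm(X, axis=1, keepdims=True)
    centers = U[rng.choice(len(U), size=k, replace=False)]
    labels = np.full(len(U), -1)
    for _ in range(n_iter):
        # Assign each pixel to the center with the highest cosine similarity.
        new_labels = np.argmax(U @ centers.T, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for c in range(k):
            members = U[labels == c]
            if len(members):  # keep the previous center if a cluster empties
                mean = members.mean(axis=0)
                centers[c] = mean / np.linalg.norm(mean)
    return labels


# Two groups that differ in direction but not in magnitude range.
X = np.array([[1.0, 0.1], [2.0, 0.1], [3.0, 0.2],
              [0.1, 1.0], [0.2, 2.0], [0.1, 3.0]])
print(cosine_kmeans(X, k=2))
```

Because only the direction of each spectrum matters, pixels of the same material under different illumination intensities tend to fall in the same cluster, which is one plausible reason for the good agreement reported for this measure.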
It is worth mentioning that an appropriate partitional clustering is able to turn supervised band selection algorithms into unsupervised ones, by taking the resulting clusters as class representatives, and the degree of success depends on the $r$ values. This paper follows that approach, by considering [4] as a reference. Since $0 \le r \le 1$, where 1 means the two clustering outcomes match identically, $r = 0.7941$ indicates a good match between the clusters and the real classes of the Salinas HSI. At this point, we opted to analyze a hyperspectral image not used in Section 3 in order to maintain the unsupervised nature of the proposed approach.

2.2. Selection of Bands of Interest

Once the initial data set is split into k clusters, it is time to present the proposed band selection algorithm, which has k iterations. At each iteration, two steps take place: (i) the selection of candidate bands and (ii) fine-tuning.

2.2.1. Selection of Candidate Bands

Let $C_0 = [b_1, b_2, \ldots, b_d] \in \mathbb{R}^{n \times d}$ be the initial cluster, i.e., the HSI, where $b_j$ is the $j$-th band vector whose $\ell_2$ norm is scaled to 1, $n$ is the number of pixels and $d$ is the dimensionality of the data set.
Let $C_i$, $i \in \{1, 2, \ldots, k\}$, be the $k$ clusters after the partitional clustering of $C_0$, where $k$ is the number of classes in the data set.
The following properties hold for the clusters:
  • $C_i \neq \emptyset$, for $i \in \{1, 2, \ldots, k\}$;
  • $\bigcup_{i=1}^{k} C_i = C_0$;
  • $C_i \cap C_l = \emptyset$, with $i \neq l$ and $i, l \in \{1, 2, \ldots, k\}$.
For each cluster, a one-versus-all binary classification was performed between $C_i$ and $C_0 \setminus C_i$.
As in [4], we used a single-layer neural network to generate the separating hyperplane f. As an illustration, both the one-versus-all classification and the hyperplane f are shown in Figure 1b.
The cross-entropy loss function of the neural network is given by
$$L_f = -\frac{1}{\eta} \sum_{j=1}^{\eta} \left[ y_j \log(\hat{y}_j) + (1 - y_j) \log(1 - \hat{y}_j) \right], \tag{2}$$
where $\eta$ is the cardinality of the set containing the data points (since we make $|C_0 \setminus C_i| \approx |C_i|$ in order to balance the two clusters, $\eta \le n$); $y_j \in \{0, 1\}$ is the expected output for the input vector $x_j \in \mathbb{R}^{d \times 1}$, where label 1 corresponds to cluster $C_i$; and $\hat{y}_j$ is the calculated output given by
$$\hat{y}_j = \frac{1}{1 + e^{-z_j}}, \tag{3}$$
which is the sigmoid activation function, where $e$ is Euler's number and $z_j$ is the hyperplane equation
$$z_j = x_j^{(1)} w^{(1)} + x_j^{(2)} w^{(2)} + \cdots + x_j^{(d)} w^{(d)} + \beta, \tag{4}$$
where $w \in \mathbb{R}^{d \times 1}$ and $\beta \in \mathbb{R}$, both calculated by a single-layer neural network, are the parameters of the hyperplane $f$.
The training phase of the network consists of 2000 training epochs, using the backpropagation algorithm, with 70% of the data set used for training and the remaining 30% for testing.
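The construction of a balanced one-versus-all data set with a 70/30 split could look like the sketch below. The paper does not state how the balancing is achieved, so random subsampling of the larger side is an assumption here, as are the function name `one_vs_all_split` and its parameters.

```python
import numpy as np


def one_vs_all_split(X, labels, i, train_frac=0.7, seed=0):
    """Balanced 'cluster i versus the rest' data set, split into train and test."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(labels == i)   # pixels of cluster i
    neg = np.flatnonzero(labels != i)   # pixels of every other cluster
    m = min(len(pos), len(neg))
    # Assumption: balance by randomly subsampling the larger side.
    pos = rng.choice(pos, size=m, replace=False)
    neg = rng.choice(neg, size=m, replace=False)
    idx = rng.permutation(np.concatenate([pos, neg]))
    y = (labels[idx] == i).astype(float)  # label 1 corresponds to cluster i
    cut = int(round(train_frac * len(idx)))
    return (X[idx[:cut]], y[:cut]), (X[idx[cut:]], y[cut:])


labels = np.array([0] * 10 + [1] * 30 + [2] * 60)
X = np.arange(200.0).reshape(100, 2)
(train_X, train_y), (test_X, test_y) = one_vs_all_split(X, labels, i=0)
print(train_X.shape, test_X.shape)
```

With 10 pixels in cluster 0, the balanced set holds 20 pixels, of which 14 go to training and 6 to testing.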
After the neural network's training, a given input vector $x_j$ will cause either $z_j \ge 0$ or $z_j < 0$. As $z_j$ is the argument of a sigmoid function, if
  • $z_j \ge 0$, then $\hat{y}_j \ge 0.5$ and $\mathrm{round}(\hat{y}_j) = 1$,
  • $z_j < 0$, then $\hat{y}_j < 0.5$ and $\mathrm{round}(\hat{y}_j) = 0$,
where $\mathrm{round}(\cdot)$ maps values $\hat{y}_j \ge 0.5$ to 1 and values $\hat{y}_j < 0.5$ to 0.
The band selection is based on the magnitude of the weight vector components $w^{(l)}$, $l \in \{1, \ldots, d\}$. Indeed, according to (4), the biggest weights in magnitude, $|w^{(l)}|$, will strongly determine the sign of $z_j$. Therefore, the bands $x_j^{(l)}$ related to the biggest $|w^{(l)}|$ are the most relevant for the binary one-versus-all classification and are, consequently, initially selected.
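A minimal sketch of this weight-based ranking is given below. It trains a single sigmoid unit by plain gradient descent on the cross-entropy loss and then orders the bands by $|w^{(l)}|$. The learning rate, the zero initialization, the synthetic data, and the names `train_single_layer` and `rank_bands` are assumptions for illustration; the train/test split and band normalization described in the text are omitted for brevity.

```python
import numpy as np


def train_single_layer(X, y, epochs=2000, lr=0.1):
    """Gradient descent on the mean cross-entropy of sigmoid(X @ w + beta)."""
    n, d = X.shape
    w = np.zeros(d)
    beta = 0.0
    for _ in range(epochs):
        z = X @ w + beta                  # hyperplane equation
        y_hat = 1.0 / (1.0 + np.exp(-z))  # sigmoid activation
        err = y_hat - y                   # gradient of the loss w.r.t. z
        w -= lr * (X.T @ err) / n
        beta -= lr * err.mean()
    return w, beta


def rank_bands(w):
    """Band indices sorted from the largest to the smallest |w|."""
    return np.argsort(-np.abs(w))


rng = np.random.default_rng(0)
n = 200
y = np.repeat([0.0, 1.0], n // 2)
X = rng.normal(0.0, 1.0, size=(n, 3))
X[:, 1] += 4.0 * y  # only the band with index 1 separates the two groups
w, beta = train_single_layer(X, y)
print(rank_bands(w))  # the separating band is expected to come first
```

Comparing weight magnitudes is only meaningful when the bands share a common scale, which is why the text scales each band vector to unit norm before training.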
In order to provide an illustrative view on this matter, Figure 2 depicts a 2D situation in which a linear classifier, represented by a line segment, separates two different clusters, in red and blue colors, composed of synthetic data of variables $v_1$ and $v_2$. In Figure 2a, the clusters are linearly separable, and it is easy to perceive that this separation is provided by variable $v_2$, whereas variable $v_1$ bears similar values for both clusters. It is worth noting that the green line's parameters $w^{(1)}$ and $w^{(2)}$, calculated by a single-layer neural network ($\beta$ is omitted), indicate the relative importance of variables in this binary classification. That is, $|w^{(2)}| = 4.8126 > |w^{(1)}| = 0.5782$ indicates a higher relevance of variable $v_2$ in relation to $v_1$. A similar situation occurs in Figure 2b, but this time $|w^{(1)}| > |w^{(2)}|$, indicating that variable $v_1$ provides better separability between the clusters. Figure 2c,d shows that the same analysis is valid even when the clusters overlap.
According to the proposed method, the number s of selected bands is defined by the user.
Since the method has $k$ iterations, the selection of $(s/k) \in \mathbb{N}$ bands per iteration would be sufficient. However, at each iteration, the method selects $4(s/k)$ bands, from which only $s/k$ are kept after the fine-tuning step. It is worth noting that numbers of candidate bands other than $4(s/k)$ have not been tested.

2.2.2. Fine-Tuning

At each iteration $i \in \{1, 2, \ldots, k\}$, $4(s/k)$ bands are selected based on the biggest weights $|w^{(l)}|$, according to (4).
Those bands are then placed in $s/k$ clusters $q_l$, $l \in \{1, 2, \ldots, (s/k)\}$, by means of k-means (Euclidean), and from each cluster one band $b$ will be initially selected.
By picking one band from each cluster $q$, several tuples $t$ are formed. An example is shown in Figure 3. The exact number of tuples is $|q_1| \times |q_2| \times \cdots \times |q_{s/k}|$. Formally, at iteration $i$ the set containing all tuples is given by
$$Q = \{ (b_i^{(1)}, \ldots, b_j^{(s/k)}),\ b_j^{(l)} \in q_l,\ l \in \{1, \ldots, (s/k)\} \}. \tag{5}$$
Note that this approach for refining the band selection is based on [43]; however, here we adopted a different criterion to assess the importance of each tuple of bands.
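The set of tuples is simply the Cartesian product of the band clusters, which the following sketch enumerates. The cluster contents are hypothetical band indices chosen only for illustration.

```python
from itertools import product
from math import prod

# Hypothetical band clusters q_1, q_2, q_3 holding candidate band indices.
band_clusters = [[3, 7], [12], [20, 21, 25]]

# One band is picked from each cluster to form a tuple.
tuples = list(product(*band_clusters))

# The number of tuples is the product of the cluster sizes: 2 * 1 * 3.
expected_count = prod(len(q) for q in band_clusters)
print(len(tuples), expected_count)
print(tuples[0])
```

Since the count grows multiplicatively with the cluster sizes, an exhaustive evaluation of every tuple stays affordable only while the number of candidate bands per iteration is small.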
For each tuple $t \in Q$, its bands were evaluated according to the class separability they provided between $C_i$ and $C_0 \setminus C_i$. At this point, the data sets of both clusters contain only the bands in $t$.
The class separability is measured by the index $\rho \in \mathbb{R}$,
$$\rho = \frac{\mathrm{trace}(S_w + S_b)}{\mathrm{trace}(S_w)}, \tag{6}$$
where
$$S_w = p_{C_i} \Sigma_{C_i} + p_{(C_0 \setminus C_i)} \Sigma_{(C_0 \setminus C_i)},$$
and
$$S_b = p_{C_i} (\mu_{C_i} - \mu_0)(\mu_{(C_0 \setminus C_i)} - \mu_0)^{\top} p_{(C_0 \setminus C_i)},$$
where $\mu_{C_i}$ is the mean of cluster $C_i$, $\mu_0$ is the global mean, $\Sigma$ is the covariance matrix, and $p$ is the a priori probability. Since the clusters $C_i$ and $C_0 \setminus C_i$ are balanced, i.e., $|C_i| = |C_0 \setminus C_i|$, then $p_{C_i} = p_{(C_0 \setminus C_i)} = 0.5$.
According to (6), the bigger the $\rho$, the more compact the clusters are, and the more distant they are from each other.
Finally, for each tuple $t \in Q$ there is a corresponding $\rho$ value, and only the bands in $t^{\max}$, whose $\rho$ is the biggest, are selected at iteration $i$.
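The fine-tuning criterion can be sketched as follows. The code uses the textbook two-class between-cluster scatter, $S_b = \sum_c p_c (\mu_c - \mu_0)(\mu_c - \mu_0)^{\top}$, which is an assumption and may differ from the exact expression used by the authors. The function names `separability` and `best_tuple`, as well as the synthetic data, are hypothetical.

```python
import numpy as np


def separability(A, B):
    """rho = trace(S_w + S_b) / trace(S_w) for two groups with equal priors."""
    p = 0.5
    mu_a, mu_b = A.mean(axis=0), B.mean(axis=0)
    mu_0 = p * mu_a + p * mu_b  # global mean under equal priors
    S_w = p * np.cov(A, rowvar=False) + p * np.cov(B, rowvar=False)
    # Assumption: standard between-cluster scatter summed over both groups.
    S_b = (p * np.outer(mu_a - mu_0, mu_a - mu_0)
           + p * np.outer(mu_b - mu_0, mu_b - mu_0))
    return np.trace(np.atleast_2d(S_w + S_b)) / np.trace(np.atleast_2d(S_w))


def best_tuple(cluster, rest, tuples):
    """Tuple of band indices with the largest rho between the two groups."""
    return max(tuples, key=lambda t: separability(cluster[:, list(t)],
                                                  rest[:, list(t)]))


rng = np.random.default_rng(0)
cluster = rng.normal(size=(50, 3))
rest = rng.normal(size=(50, 3))
rest[:, 2] += 8.0  # the groups differ only in the band with index 2
print(best_tuple(cluster, rest, [(0, 1), (0, 2)]))
```

When the two groups coincide, $S_b$ vanishes and $\rho$ equals 1, so values well above 1 indicate tuples whose bands pull the cluster away from the rest of the data.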

2.3. Redundancy Reduction

Let $\Psi \in \mathbb{R}^{d \times d}$ be the correlation matrix of the data set $C_0$, calculated according to Pearson's correlation coefficient.
Let $\psi$ be a set whose $j$-th element is the band most correlated to band $b_j$. Set $\psi$ is calculated before the iterations start, according to Algorithm 1. For the sake of clarity, $\psi_j = b_l$, for instance, means that $b_l$ is the band most correlated to band $b_j$.
Algorithm 1 starts by creating the correlation matrix $\Psi$, according to Pearson's correlation. After the initialization of matrix $I$ and vector $\psi$, we sort each column of $\Psi$ in descending order and store the indices $idx$, which correspond to the band indices. The first position of $\psi$ is assigned the band $b_{I(1,1)}$. Then the remaining positions of $\psi$ receive the band most correlated to the band corresponding to that position, in such a way that no band is assigned to more than one position. The output is the vector $\psi$.
Given $t^{\max}$ and $\psi$, the subset $\delta_{t^{\max}}$ of the bands most correlated to those in $t^{\max}$ is given by Algorithm 2.
Algorithm 2 takes $t^{\max}$ and $\psi$ as input. Given the bands in $t^{\max}$, the algorithm finds the bands most correlated to them and inserts them in subset $\delta_{t^{\max}}$, which is the output of the algorithm.
Algorithm 1 Most correlated bands
$\Psi = \mathrm{corr}(C_0)$                 ▷ Pearson's correlation
Initialize matrix $I$
$\psi = \emptyset$
for all the columns $c \in \Psi$ do
    $[values, idx] = \mathrm{sort}(c, \mathrm{descend})$         ▷ $idx \in \mathbb{N}^{1 \times d}$
    $I = [I; idx]$
$\psi_1 \leftarrow b_{I(1,1)}$
for $i = 2 : d$ do
    for $j = 1 : d$ do
        if $b_{I(i,j)} \notin \psi$ then
            $\psi \leftarrow \psi \cup b_{I(i,j)}$
            Break
Return: $\psi$
Algorithm 2 The most correlated bands to a given subset
Input: $t^{\max}$, $\psi$
$\delta_{t^{\max}} = \emptyset$
for $j = 1 : |t^{\max}|$ do          ▷ Vector cardinality
    for $l = 1 : |\psi|$ do
        if $\psi_l = t_j^{\max}$ then
            $\delta_{t^{\max}} \leftarrow \delta_{t^{\max}} \cup b_l$
Return: $\delta_{t^{\max}}$
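One possible reading of Algorithms 1 and 2 is sketched below in NumPy. Two points are assumptions rather than statements from the paper: a band is not allowed to be its own most correlated band (the diagonal of the correlation matrix is masked), and a position for which no unused candidate remains is marked with -1. The function names `most_correlated` and `correlated_to_subset` and the synthetic bands are hypothetical.

```python
import numpy as np


def most_correlated(C0):
    """psi[j] = index of the band most correlated with band j, each used once."""
    corr = np.corrcoef(C0, rowvar=False)   # Pearson correlation between bands
    np.fill_diagonal(corr, -np.inf)        # assumption: exclude self-correlation
    d = corr.shape[0]
    psi = np.full(d, -1)                   # -1 marks "no unused candidate left"
    used = set()
    for j in range(d):
        for cand in np.argsort(-corr[:, j]):  # descending correlation
            cand = int(cand)
            if cand != j and cand not in used:
                psi[j] = cand
                used.add(cand)
                break
    return psi


def correlated_to_subset(t_max, psi):
    """Positions l whose entry psi[l] is one of the bands in t_max."""
    return [l for t in t_max for l in range(len(psi)) if psi[l] == t]


rng = np.random.default_rng(0)
x, y = rng.normal(size=500), rng.normal(size=500)
noise = 0.01 * rng.normal(size=(500, 2))
# Bands 0 and 1 are nearly identical, and so are bands 2 and 3.
C0 = np.column_stack([x, x + noise[:, 0], y, y + noise[:, 1]])
psi = most_correlated(C0)
print(psi, correlated_to_subset([0], psi))
```

In this toy data set, selecting band 0 would mark band 1 for discarding, which is the redundancy-reduction behavior the two algorithms aim for.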
Finally, at each iteration $i$, the bands in $t^{\max}$ are selected and inserted in $S$, which is the final subset of selected bands. So,
$$S \leftarrow S \cup t^{\max}. \tag{7}$$
Once we have $t^{\max}$, its most correlated bands $\delta_{t^{\max}}$ are inserted in subset $D$, which contains the bands to be discarded. That is,
$$D \leftarrow D \cup \delta_{t^{\max}}. \tag{8}$$
Then, for the next iteration $i + 1$ the data set is updated according to
$$C_0 \leftarrow C_0 \setminus (S \cup D). \tag{9}$$
The method iterates until $i = k$. Then, the final output is the subset $S$ of selected bands.
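The three set updates performed at the end of each iteration can be written compactly with Python sets. The function name `update_band_sets` and the band indices are hypothetical; the sketch tracks band indices only, not the pixel data.

```python
def update_band_sets(remaining, selected, discarded, t_max, delta):
    """Apply the three updates once; every argument holds band indices."""
    selected = selected | set(t_max)                # add the chosen tuple to S
    discarded = discarded | set(delta)              # add its correlated bands to D
    remaining = remaining - (selected | discarded)  # shrink the working band set
    return remaining, selected, discarded


remaining, S, D = set(range(8)), set(), set()
remaining, S, D = update_band_sets(remaining, S, D, t_max=[2, 5], delta=[3, 6])
print(sorted(remaining), sorted(S), sorted(D))  # [0, 1, 4, 7] [2, 5] [3, 6]
```

Removing both the selected and the discarded bands from the working set ensures that later iterations can neither reselect a band nor pick one that is highly correlated with an earlier choice.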

2.4. Proposed Method’s Overview

Algorithm 3 presents the overview of the proposed method.
Algorithm 3 Proposed band selection algorithm
Input: Data set $C_0$, number $k$ of classes
$S = \emptyset$                        ▷ Subset of selected bands
$D = \emptyset$                     ▷ Subset of bands to be discarded
Proceed to k-means clustering (cosine distance) of $C_0$ into $k$ clusters $C_i$
for $i = 1 : k$ do
    Proceed to a binary classification between clusters $C_i$ and $C_0 \setminus C_i$ (one-versus-all) using a single-layer neural net
    Select the $4(s/k) \in \mathbb{N}$ bands related to the biggest separating hyperplane parameters $|w|$, according to (4)
    Proceed to the band selection fine-tuning, according to Section 2.2.2
    Update subset of selected bands $S$ according to (7)
    Update subset $D$ according to (8)
    Update data set according to (9)
Return: $S$

3. Results

Normally the quality of the subset of selected bands is assessed by the performance of the classifiers. So, this approach is adopted here, with support vector machine (SVM), K-nearest neighbor (KNN), and classification and regression tree (CART) classifiers [1], via three hyperspectral data sets used in several BS papers: Botswana, Indian Pines, and Pavia University. All of them can be downloaded at [42].

3.1. Competitors

The proposed method is compared to five state-of-the-art BS methods: ASPS, MPWR, ONR, UBS, and VGBS.

3.1.1. ASPS

ASPS [44] is the acronym for hyperspectral band selection via adaptive subspace partition strategy. Its framework begins with a two-step partition of the data cube, starting with the coarse partition of the image cube into a predetermined number of parts, and then there is a fine subspace partition, from which bands with low noise levels are finally selected.

3.1.2. MPWR

In [45], the authors proposed a manifold-preserving and weakly redundant (MPWR) unsupervised band selection method. A manifold-preserving band-importance metric was used to measure the band-wise essentiality. Concerning the redundancy caused by the correlated bands, this paper establishes a constrained band-weight optimization model. Thus, both band-wise manifold-preserving capability and intraband correlation are integrated into the BS method.

3.1.3. ONR

The approach called "hyperspectral band selection via optimal neighborhood reconstruction" (ONR) [46] explores different band combinations in order to reconstruct the original data set, and also applies a noise reducer to minimize the influence of noisy bands.

3.1.4. UBS

UBS is used as the acronym for the approach presented in [47]. This method is based on the spectral decomposition of a matrix, from which the loading-factors matrix can be constructed for band prioritization, according to the corresponding eigenvalues and eigenvectors.

3.1.5. VGBS

The authors of [48] state that there is a relation between the volume of a sub-simplex and the volume gradient of a simplex. Based on this, they proposed a BS method called VGBS. It is unsupervised and seeks to remove the most redundant band based only on the gradient of volume instead of calculating the volumes of all sub-simplexes.

3.2. Experimental Results

In order to compare the outcome of the proposed method, five band subsets of different sizes were selected from scratch: 10, 20, 30, 40, and 50 bands. A set with 50 bands does not necessarily contain the 10-band set, for example, due to the nature of the neural networks.
We compare the results to other BS methods for each hyperspectral image separately. All of the classifier accuracies exhibited here are the mean values of ten runs.
Our approach is dubbed CW—the unsupervised cluster-wise method.

3.2.1. (Case 1) Botswana HSI

The Botswana image is composed of 242 spectral bands and has 14 classes. See further details about this image at [42].
Table 2 shows the results of the BS methods using Botswana HSI. Bold values represent the highest scores attained. Figure 4 presents the same results in an illustrative way. In the figure, different marks are used in order to identify the competitors. The line connecting the marks does not mean interpolation, but rather helps the reader.
Since the Botswana image has 14 classes, the proposed method has the same number of iterations (see Algorithm 3). For each one-versus-all case, a single-layer neural net is run, and the curves of test-set error versus epoch are shown in Figure 5. Clearly, there is convergence in all cases, meaning that it is possible to find a hyperplane to separate the clusters $C_i$ and $C_0 \setminus C_i$. The different curve shapes indicate how fast the algorithm converged. For instance, in Figure 5, Cluster 1 versus All converged faster than Cluster 6 versus All.
Concerning the results, there were five different sets of selected bands, and each set was subjected to three classifiers. Thus, in total, we had 15 different experiments, from which the proposed CW framework surpassed its competitors in 9 out of 15 cases.

3.2.2. (Case 2) Indian Pines HSI

This image has 224 bands and 16 classes. Further details about this image can be found at [42].
Table 3 shows the accuracies and standard deviations of the results.
Out of 15 experiments, the proposed CW method achieved the best results 10 times. In Figure 6, it is possible to have a visual idea of the performance of all BS methods.
According to Figure 7, for all iterations, the single-layer neural network converged. This convergence is seen by the fact that the error was lower as the number of epochs increased.

3.2.3. (Case 3) Pavia University HSI

This image has 102 spectral bands and 9 classes. See more information about this image at [42].
In Table 4, we see the results, in terms of overall accuracy and standard deviation, of all methods. The proposed CW algorithm has the best accuracies in 8 out of 15 experiments. These results are also shown in Figure 8.
Figure 9 indicates that all the one-versus-all separating hyperplane solutions converged.

3.3. Remark

Finally, since this is a paper about band selection, Table 5 presents all of the bands selected by the proposed method CW.

4. Discussion

The proposed method was introduced in Section 2.2 and its performance was reviewed in Section 3.2.
It is still important to emphasize the advantages of using the proposed method CW. On the other hand, the deficiencies of the algorithm will also be highlighted.

4.1. Pros

As shown in Table 2, Table 3 and Table 4, with their corresponding Figure 4, Figure 6 and Figure 8, each classifier, due to its intrinsic characteristics, performed differently when compared to the others. SVM classifiers are well known for their effectiveness in high-dimensional spaces, such as the feature space of a hyperspectral image. For this reason, we see SVM outperforming the other classifiers. An exception appears in Table 4 and Figure 8, where SVM is less accurate than the other two classifiers. Given that KNN and CART exhibit stable performances as the number of bands increases, the reason for the poor performance of SVM in this case, as the dimensionality became higher, may lie in the fact that the Pavia University data were not sufficiently discriminative for the SVM algorithm (this phenomenon happened with all competitors). When it comes to the KNN classifier, we set the number of neighbors equal to 7, i.e., K = 7, in all experiments; the objective here was not to find the best settings for the KNN but to provide equal conditions for the comparison of the band selection methods. As KNN classifies an input pattern based on its neighbors in the feature space, it performed better than the CART classifier, whose decision trees rely on binary rules, which become more complex as dimensionality increases.
In general, the proposed CW method outperformed its competitors in lower dimensions, due to the fact that the CW algorithm selected its bands based on the class separations in the feature space. Thus, even in lower dimensions, we saw a good performance of the proposed method, which was designed to be used as a filter method, i.e., a preprocessing step of hyperspectral data classification tasks. As the dimension increased, the CW method maintained good results when compared to its competitors. In fact, considering all of the results, we see that the proposed method achieved the best results in $(9 + 10 + 8)/45 = 60\%$ of the experiments. This is likely due to the fact that the CW method is capable of selecting the best spectral bands for each individual cluster (or class) in a one-versus-all fashion, even in an unsupervised case. Moreover, the cluster separability criterion used during the band selection process makes the job of the classifiers easier.
In terms of processing times, the proposed method does not appear amongst the fastest ones, as Figure 10 indicates. However, its high mean accuracy compensates for this fact. Moreover, the mean processing time of the CW method was less than 50 s, which poses no problem for offline applications.

4.2. Cons

The proposed CW algorithm is not capable of addressing all the issues concerning a band selection application, such as the optimal number of bands to be selected. In fact, we do not address this topic in this paper. Here, the number s of bands to be selected is a user-defined parameter.
Moreover, it is necessary to know the number k of classes in the scene depicted by the image. Even though a remote sensing expert may easily infer the number of classes in a given scene, this topic remains unsolved.

5. Conclusions

The high dimensionality of a hyperspectral image can be useful in terms of good discrimination amongst objects and classes. On the other hand, it can also be a source of problems, such as the curse of dimensionality and overfitting of the classifier.
In order to alleviate such issues, this paper proposes a novel unsupervised band selection framework based on partitional clustering, in which each cluster stands for a real class of the data set. A hyperplane was used to separate all clusters in a one-versus-all fashion. After this, we proceeded to fine-tuning the initially selected bands based on the cluster separability in the feature space.
The proposed method achieved the best classification results in 60% of the experiments.
In future works, it is advisable to verify the performance of support vector machines in finding a separating hyperplane between clusters. Furthermore, other numbers of initially chosen bands should be tested, rather than only $4(s/k)$. Moreover, some more recent clustering algorithms could be tested in order to check their effects on the final results. Finally, one could use optimization algorithms to find a suitable subset of bands during the fine-tuning process.

Author Contributions

Conceptualization, M.H. and V.F.; methodology, M.H.; software, M.H.; validation, M.H., V.F. and E.H.S.; formal analysis, M.H.; writing—original draft preparation, M.H.; writing—review and editing, M.H.; supervision, V.F. and E.H.S.; funding acquisition, V.F. All authors have read and agreed to the published version of the manuscript.

Funding

This work was carried out in the framework of the NExT Senior Talent Chair DeepCoSLAM, which was funded by the French Government, through the program Investments for the Future managed by the National Agency for Research ANR-16-IDEX-0007, and with the support of Région Pays de la Loire and Nantes Métropole.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found at http://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes (accessed on 16 August 2022).

Acknowledgments

The authors thank the reviewers for their critiques and comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Theodoridis, S.; Koutroumbas, K. Pattern Recognition, 4th ed.; Elsevier: Amsterdam, The Netherlands, 2009. [Google Scholar]
  2. Guyon, I.; Elisseeff, A. An Introduction to Variable and Feature Selection. J. Mach. Learn. Res. 2003, 3, 1157–1182. [Google Scholar]
  3. Chen, C.H. Statistical Pattern Recognition, 1st ed.; Spartan Books: Washington, DC, USA, 1973. [Google Scholar]
  4. Habermann, M.; Frémont, V.; Shiguemori, E.H. Supervised band selection in hyperspectral images using single-layer neural networks. Int. J. Remote Sens. 2019, 40, 3900–3926. [Google Scholar] [CrossRef]
  5. Houle, M.E.; Kriegel, H.P.; Kröger, P.; Schubert, E.; Zimek, A. Can Shared-Neighbor Distances Defeat the Curse of Dimensionality? In Scientific and Statistical Database Management; Gertz, M., Ludäscher, B., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 482–500. [Google Scholar]
  6. Duda, R.O.; Hart, P.E.; Stork, D.G. Pattern Classification, 2nd ed.; Wiley: New York, NY, USA, 2001. [Google Scholar]
  7. Haykin, S.S. Neural Networks and Learning Machines, 3rd ed.; Pearson Education: Upper Saddle River, NJ, USA, 2009. [Google Scholar]
  8. Kuncheva, L.I.; Matthews, C.E.; Arnaiz-González, Á.; Rodríguez, J.J. Feature Selection from High-Dimensional Data with Very Low Sample Size: A Cautionary Tale. arXiv 2020, arXiv:abs/2008.12025. [Google Scholar]
  9. Shang, X.; Song, M.; Wang, Y.; Yu, C.; Yu, H.; Li, F.; Chang, C. Target-Constrained Interference-Minimized Band Selection for Hyperspectral Target Detection. IEEE Trans. Geosci. Remote Sens. 2020, 59, 6044–6064. [Google Scholar] [CrossRef]
  10. Zeng, M.; Cai, Y.; Cai, Z.; Liu, X.; Hu, P.; Ku, J. Unsupervised Hyperspectral Image Band Selection Based on Deep Subspace Clustering. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1889–1893. [Google Scholar] [CrossRef]
  11. Feng, J.; Chen, J.; Sun, Q.; Shang, R.; Cao, X.; Zhang, X.; Jiao, L. Convolutional Neural Network Based on Bandwise-Independent Convolution and Hard Thresholding for Hyperspectral Band Selection. IEEE Trans. Cybern. 2021, 51, 4414–4428. [Google Scholar] [CrossRef] [PubMed]
  12. Ding, X.; Li, H.; Yang, J.; Dale, P.; Chen, X.; Jiang, C.; Zhang, S. An Improved Ant Colony Algorithm for Optimized Band Selection of Hyperspectral Remotely Sensed Imagery. IEEE Access 2020, 8, 25789–25799. [Google Scholar] [CrossRef]
  13. Bevilacqua, M.; Berthoumieu, Y. Multiple-Feature Kernel-Based Probabilistic Clustering for Unsupervised Band Selection. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6675–6689. [Google Scholar] [CrossRef] [Green Version]
  14. Tang, J.; Alelyani, S.; Liu, H. Feature selection for classification: A review. In Data Classification; CRC Press: Boca Raton, FL, USA, 2014; pp. 37–64. [Google Scholar] [CrossRef]
  15. Liu, T.; Xiao, J.; Huang, Z.; Kong, E.; Liang, Y. BP Neural Network Feature Selection Based on Group Lasso Regularization. In Proceedings of the 2019 Chinese Automation Congress (CAC), Hangzhou, China, 22–24 November 2019; pp. 2786–2790. [Google Scholar] [CrossRef]
  16. Cao, X.; Xiong, T.; Jiao, L. Supervised Band Selection Using Local Spatial Information for Hyperspectral Image. IEEE Geosci. Remote Sens. Lett. 2016, 13, 329–333. [Google Scholar] [CrossRef]
  17. Gao, J.; Du, Q.; Gao, L.; Sun, X.; Wu, Y.; Zhang, B. Ant colony optimization for supervised and unsupervised hyperspectral band selection. In Proceedings of the 2013 5th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), Gainesville, FL, USA, 25–28 June 2013; pp. 1–4. [Google Scholar] [CrossRef]
  18. Habermann, M.; Frémont, V.; Shiguemori, E.H. Problem-based band selection for hyperspectral images. In Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA, 23–28 July 2017; pp. 1800–1803. [Google Scholar] [CrossRef] [Green Version]
  19. Wei, X.; Cai, L.; Liao, B.; Lu, T. Local-View-Assisted Discriminative Band Selection with Hypergraph Autolearning for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 2042–2055. [Google Scholar] [CrossRef]
  20. Feng, J.; Jiao, L.; Liu, F.; Sun, T.; Zhang, X. Mutual-Information-Based Semi-Supervised Hyperspectral Band Selection with High Discrimination, High Information, and Low Redundancy. IEEE Trans. Geosci. Remote Sens. 2015, 53, 2956–2969. [Google Scholar] [CrossRef]
  21. Guo, Z.; Yang, H.; Bai, X.; Zhang, Z.; Zhou, J. Semi-supervised hyperspectral band selection via sparse linear regression and hypergraph models. In Proceedings of the 2013 IEEE International Geoscience and Remote Sensing Symposium—IGARSS, Melbourne, Australia, 21–26 July 2013; pp. 1474–1477. [Google Scholar] [CrossRef] [Green Version]
  22. Cao, X.; Wei, C.; Ge, Y.; Feng, J.; Zhao, J.; Jiao, L. Semi-Supervised Hyperspectral Band Selection Based on Dynamic Classifier Selection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 1289–1298. [Google Scholar] [CrossRef]
  23. Bai, J.; Xiang, S.; Shi, L.; Pan, C. Semisupervised Pair-Wise Band Selection for Hyperspectral Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2798–2813. [Google Scholar] [CrossRef]
  24. Karoui, M.S.; Djerriri, K.; Boukerch, I. Unsupervised Hyperspectral Band Selection by Sequentially Clustering A Mahalanobis-Based Dissimilarity of Spectrally Variable Endmembers. In Proceedings of the 2020 Mediterranean and Middle-East Geoscience and Remote Sensing Symposium (M2GARSS), Tunis, Tunisia, 9–11 March 2020; pp. 33–36. [Google Scholar] [CrossRef]
  25. Sui, C.; Tian, Y.; Xu, Y.; Xie, Y. Unsupervised Band Selection by Integrating the Overall Accuracy and Redundancy. IEEE Geosci. Remote Sens. Lett. 2015, 12, 185–189. [Google Scholar] [CrossRef]
  26. Yang, C.; Bruzzone, L.; Zhao, H.; Tan, Y.; Guan, R. Superpixel-Based Unsupervised Band Selection for Classification of Hyperspectral Images. IEEE Trans. Geosci. Remote Sens. 2018, 56, 7230–7245. [Google Scholar] [CrossRef]
  27. Wang, Q.; Zhang, F.; Li, X. Optimal Clustering Framework for Hyperspectral Band Selection. IEEE Trans. Geosci. Remote Sens. 2018, 56, 5910–5922. [Google Scholar] [CrossRef]
  28. Xu, B.; Li, X.; Hou, W.; Wang, Y.; Wei, Y. A Similarity-Based Ranking Method for Hyperspectral Band Selection. IEEE Trans. Geosci. Remote Sens. 2021, 59, 9585–9599. [Google Scholar] [CrossRef]
  29. Datta, A.; Ghosh, S.; Ghosh, A. Combination of Clustering and Ranking Techniques for Unsupervised Band Selection of Hyperspectral Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2814–2823. [Google Scholar] [CrossRef]
  30. Jia, S.; Tang, G.; Zhu, J.; Li, Q. A Novel Ranking-Based Clustering Approach for Hyperspectral Band Selection. IEEE Trans. Geosci. Remote Sens. 2016, 54, 88–102. [Google Scholar] [CrossRef]
  31. Xie, W.; Lei, J.; Yang, J.; Li, Y.; Du, Q.; Li, Z. Deep Latent Spectral Representation Learning-Based Hyperspectral Band Selection for Target Detection. IEEE Trans. Geosci. Remote Sens. 2020, 58, 2015–2026. [Google Scholar] [CrossRef]
  32. Zhang, F.; Wang, Q.; Li, X. Hyperspectral image band selection via global optimal clustering. In Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA, 23–28 July 2017; pp. 1–4. [Google Scholar] [CrossRef]
  33. Tang, G.; Jia, S.; Li, J. An enhanced density peak-based clustering approach for hyperspectral band selection. In Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 26–31 July 2015; pp. 1116–1119. [Google Scholar] [CrossRef]
  34. Sun, W.; Peng, J.; Yang, G.; Du, Q. Correntropy-Based Sparse Spectral Clustering for Hyperspectral Band Selection. IEEE Geosci. Remote Sens. Lett. 2020, 17, 484–488. [Google Scholar] [CrossRef]
  35. Kumar, V.; Hahn, J.; Zoubir, A.M. Band selection for hyperspectral images based on self-tuning spectral clustering. In Proceedings of the 21st European Signal Processing Conference (EUSIPCO 2013), Marrakech, Morocco, 9–13 September 2013; pp. 1–5. [Google Scholar]
  36. Dou, Z.; Gao, K.; Zhang, X.; Wang, H.; Han, L. Band Selection of Hyperspectral Images Using Attention-Based Autoencoders. IEEE Geosci. Remote Sens. Lett. 2021, 18, 147–151. [Google Scholar] [CrossRef]
  37. Damodaran, B.B.; Courty, N.; Lefèvre, S. Sparse Hilbert Schmidt Independence Criterion and Surrogate-Kernel-Based Feature Selection for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2385–2398. [Google Scholar] [CrossRef] [Green Version]
  38. Sun, W.; Peng, J.; Yang, G.; Du, Q. Fast and Latent Low-Rank Subspace Clustering for Hyperspectral Band Selection. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3906–3915. [Google Scholar] [CrossRef]
  39. Kohavi, R.; John, G.H. Wrappers for Feature Subset Selection. Artif. Intell. 1997, 97, 273–324. [Google Scholar] [CrossRef] [Green Version]
  40. Cai, Y.; Liu, X.; Cai, Z. BS-Nets: An End-to-End Framework for Band Selection of Hyperspectral Image. IEEE Trans. Geosci. Remote Sens. 2020, 58, 1969–1984. [Google Scholar] [CrossRef] [Green Version]
  41. Habermann, M.; Frémont, V.; Shiguemori, E.H. Unsupervised Hyperspectral Band Selection Using Clustering and Single-Layer Neural Network. Revue Française de Photogrammétrie et de Télédétection 2018, 217–218, 33–42. [Google Scholar] [CrossRef]
  42. Graña, M.; Veganzons, M.A.; Ayerdi, B. Hyperspectral Remote Sensing Scenes. Available online: http://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes (accessed on 16 August 2022).
  43. Fu, X.; Shang, X.; Sun, X.; Yu, H.; Song, M.; Chang, C.I. Underwater Hyperspectral Target Detection with Band Selection. Remote Sens. 2020, 12, 1056. [Google Scholar] [CrossRef] [Green Version]
  44. Wang, Q.; Li, Q.; Li, X. Hyperspectral Band Selection via Adaptive Subspace Partition Strategy. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 4940–4950. [Google Scholar] [CrossRef]
  45. Sui, C.; Li, C.; Feng, J.; Mei, X. Unsupervised Manifold-Preserving and Weakly Redundant Band Selection Method for Hyperspectral Imagery. IEEE Trans. Geosci. Remote Sens. 2020, 58, 1156–1170. [Google Scholar] [CrossRef]
  46. Wang, Q.; Zhang, F.; Li, X. Hyperspectral Band Selection via Optimal Neighborhood Reconstruction. IEEE Trans. Geosci. Remote Sens. 2020, 58, 8465–8476. [Google Scholar] [CrossRef]
  47. Chang, C.-I.; Du, Q.; Sun, T.-L.; Althouse, M.L.G. A joint band prioritization and band-decorrelation approach to band selection for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 1999, 37, 2631–2641. [Google Scholar] [CrossRef] [Green Version]
  48. Geng, X.; Sun, K.; Ji, L.; Zhao, Y. A Fast Volume-Gradient-Based Band Selection Method for Hyperspectral Image. IEEE Trans. Geosci. Remote Sens. 2014, 52, 7111–7119. [Google Scholar] [CrossRef]
Figure 1. (a) Illustration of k clusters after the partitional clustering. (b) A representation of one-versus-all binary classification by means of a line segment f, where $v_1$ and $v_2$ are variables that enable the 2D representation.
Figure 2. A 2D binary classification illustration using synthetic data, in variables $v_1$ and $v_2$. The hyperplane parameters $w^{(1)}$, $w^{(2)}$, and $\beta$ (the latter not shown here) are calculated by a single-layer neural network, whose result is depicted by a green line segment. The magnitude of each neural network weight (i.e., each hyperplane parameter) indicates the relevance of its corresponding feature. (a) Two linearly separable classes. Clearly, attribute $v_2$ provides good separation between the clusters, which is corroborated by $w^{(2)} > w^{(1)}$. (b) A situation similar to (a), but this time $v_1$ provides the separation between the clusters, and $w^{(1)} > w^{(2)}$. In (c,d), the same conclusion can be drawn, even though the clusters overlap.
Figure 3. Each cluster provides one band for the composition of tuple t. In this example, the band $b_i \in q_1$, the band $b_j \in q_2$, and the band $b_l \in q_{s/k}$, among others connected by the dashed line, form the tuple t. Each possible combination of bands in different clusters gives rise to all $t \in Q$.
Figure 4. Overall accuracies for the Botswana image according to KNN, CART, and SVM classifiers.
Figure 5. Error versus epoch curve of each one-versus-all case for the Botswana image.
Figure 6. Overall accuracies for the Indian Pines image according to KNN, CART, and SVM classifiers.
Figure 7. Error versus epoch curve of each one-versus-all case for the Indian Pines image. Only the first 200 training epochs are shown here.
Figure 8. Overall accuracies for the Pavia University image according to KNN, CART, and SVM classifiers.
Figure 9. Error versus epoch curve of each one-versus-all case for the Pavia University image. Only the first 50 training epochs are shown here.
Figure 10. Mean processing time of all images and all classifiers together.
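
The tuple construction described in the caption of Figure 3 can be sketched as a Cartesian product. The three groups below are hypothetical stand-ins for $q_1, \dots, q_{s/k}$ and are not taken from the paper; the sketch only enumerates the candidate set Q and does not reproduce the separability criterion used to choose among the tuples.

```python
from itertools import product
from math import prod

# Hypothetical groups of candidate band indices (illustrative values only).
groups = [[4, 11, 17], [21, 24], [41, 69, 101]]

# Q: every tuple t that takes exactly one band from each group.
Q = list(product(*groups))

print(len(Q))  # 3 * 2 * 3 = 18 candidate tuples
print(Q[0])    # (4, 21, 41)
```

Because the size of Q is the product of the group sizes, exhaustive enumeration grows quickly with the number of groups, which is presumably why the conclusions mention optimization algorithms as a possible alternative for the fine-tuning step.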
Table 1. Adjusted Rand index ($0 \le r \le 1$) (mean values out of 10 runs for Salinas HSI).

| Clustering Algorithm | r |
|---|---|
| K-means (Euclidean) | 0.6997 |
| K-means (cityblock) | 0.7382 |
| K-means (cosine) | 0.7941 |
| K-means (correlation) | 0.7170 |
| K-medoids (Euclidean) | 0.7062 |
| K-medoids (Mahalanobis) | 0.7685 |
| K-medoids (cityblock) | 0.7402 |
| K-medoids (Minkowski) | 0.7396 |
| K-medoids (Chebychev) | 0.7269 |
| K-medoids (Spearman) | 0.7674 |
| K-medoids (Jaccard) | 0.6146 |

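
For readers who wish to reproduce a score such as those in Table 1, the following is a minimal sketch of the adjusted Rand index computed from a contingency table. The function name `adjusted_rand_index` and the toy label vectors are assumptions for illustration; the values in Table 1 come from the paper's experiments, not from this code.

```python
import numpy as np
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between a reference labelling and a clustering."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    classes, ti = np.unique(labels_true, return_inverse=True)
    clusters, pi = np.unique(labels_pred, return_inverse=True)

    # contingency[i, j] = number of samples in class i that fall in cluster j
    contingency = np.zeros((len(classes), len(clusters)), dtype=int)
    np.add.at(contingency, (ti, pi), 1)

    sum_cells = sum(comb(int(n), 2) for n in contingency.ravel())
    sum_rows = sum(comb(int(n), 2) for n in contingency.sum(axis=1))
    sum_cols = sum(comb(int(n), 2) for n in contingency.sum(axis=0))
    total_pairs = comb(len(labels_true), 2)

    expected = sum_rows * sum_cols / total_pairs  # index expected by chance
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:                     # degenerate case, e.g. one cluster on both sides
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

# The index ignores how clusters are named: a relabelled perfect partition scores 1.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

A value close to 1 indicates that the clustering agrees with the reference classes, which is the sense in which Table 1 compares the clustering algorithms.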
Table 2. Botswana image results (overall accuracy and standard deviation in %).

| Classifier | Method | 10 Bands | 20 Bands | 30 Bands | 40 Bands | 50 Bands |
|---|---|---|---|---|---|---|
| KNN | CW | 90.86 ± 0.69 | 88.81 ± 0.27 | 90.86 ± 0.91 | 91.48 ± 0.97 | 93.22 ± 0.69 |
| KNN | ASPS | 85.63 ± 0.75 | 89.73 ± 0.94 | 91.07 ± 0.76 | 89.01 ± 0.63 | 91.58 ± 0.72 |
| KNN | MPWR | 81.52 ± 0.99 | 86.45 ± 1.43 | 90.76 ± 0.56 | 90.35 ± 1.05 | 90.25 ± 0.89 |
| KNN | ONR | 90.66 ± 0.91 | 92.61 ± 0.53 | 89.94 ± 0.64 | 91.38 ± 0.89 | 91.99 ± 0.86 |
| KNN | UBS | 88.50 ± 1.07 | 89.53 ± 1.05 | 87.99 ± 0.86 | 89.94 ± 0.66 | 90.04 ± 0.96 |
| KNN | VGBS | 88.09 ± 0.79 | 90.55 ± 1.00 | 88.50 ± 1.30 | 87.17 ± 1.29 | 88.09 ± 1.10 |
| CART | CW | 84.80 ± 1.23 | 86.55 ± 1.27 | 84.91 ± 1.49 | 85.93 ± 1.03 | 86.04 ± 1.13 |
| CART | ASPS | 81.72 ± 1.04 | 85.32 ± 1.27 | 83.78 ± 1.15 | 83.98 ± 1.38 | 84.50 ± 0.96 |
| CART | MPWR | 72.59 ± 1.27 | 81.31 ± 1.13 | 84.29 ± 1.02 | 85.52 ± 1.13 | 85.01 ± 1.47 |
| CART | ONR | 83.37 ± 0.74 | 84.80 ± 1.35 | 84.91 ± 1.06 | 84.60 ± 1.01 | 84.70 ± 1.46 |
| CART | UBS | 80.39 ± 1.14 | 83.26 ± 1.07 | 83.68 ± 1.36 | 85.32 ± 0.84 | 85.01 ± 0.98 |
| CART | VGBS | 83.98 ± 0.81 | 85.22 ± 0.95 | 82.24 ± 1.35 | 83.47 ± 0.65 | 86.24 ± 1.44 |
| SVM | CW | 89.73 ± 0.72 | 94.76 ± 0.91 | 94.97 ± 0.69 | 93.94 ± 0.61 | 94.15 ± 0.56 |
| SVM | ASPS | 87.78 ± 0.67 | 91.27 ± 0.67 | 93.84 ± 0.66 | 92.09 ± 0.69 | 94.05 ± 0.55 |
| SVM | MPWR | 87.06 ± 1.06 | 90.86 ± 0.97 | 93.73 ± 0.67 | 94.76 ± 0.75 | 94.15 ± 0.61 |
| SVM | ONR | 92.91 ± 0.36 | 94.25 ± 0.48 | 93.83 ± 0.62 | 94.76 ± 0.46 | 94.14 ± 0.77 |
| SVM | UBS | 89.42 ± 0.82 | 92.50 ± 0.98 | 92.71 ± 0.67 | 93.42 ± 0.86 | 92.91 ± 0.89 |
| SVM | VGBS | 90.04 ± 0.97 | 92.81 ± 0.58 | 93.63 ± 1.06 | 93.01 ± 0.68 | 93.53 ± 0.65 |

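
A win rate such as the 60% reported in the conclusions can be tallied by finding, for each experiment, the method with the highest mean overall accuracy. The sketch below does this for only one of the nine classifier/image blocks (KNN on the Botswana image, means copied from Table 2). It ignores the standard deviations, and the counting rule itself is an assumption about how the figure was obtained.

```python
import numpy as np

methods = ["CW", "ASPS", "MPWR", "ONR", "UBS", "VGBS"]
band_counts = [10, 20, 30, 40, 50]

# Mean overall accuracy (%) of the KNN classifier on the Botswana image (Table 2).
knn_botswana = np.array([
    [90.86, 88.81, 90.86, 91.48, 93.22],  # CW
    [85.63, 89.73, 91.07, 89.01, 91.58],  # ASPS
    [81.52, 86.45, 90.76, 90.35, 90.25],  # MPWR
    [90.66, 92.61, 89.94, 91.38, 91.99],  # ONR
    [88.50, 89.53, 87.99, 89.94, 90.04],  # UBS
    [88.09, 90.55, 88.50, 87.17, 88.09],  # VGBS
])

# Best method for each number of selected bands (column-wise argmax).
winners = [methods[i] for i in knn_botswana.argmax(axis=0)]
cw_share = winners.count("CW") / len(winners)

print(dict(zip(band_counts, winners)))
# {10: 'CW', 20: 'ONR', 30: 'ASPS', 40: 'CW', 50: 'CW'}
print(f"CW wins {cw_share:.0%} of these five experiments")
# CW wins 60% of these five experiments
```

Extending the same tally to all three classifiers and all three images would cover the 45 experiments summarized in Tables 2 to 4.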
Table 3. Indian Pines image results (overall accuracy and standard deviation in %).

| Classifier | Method | 10 Bands | 20 Bands | 30 Bands | 40 Bands | 50 Bands |
|---|---|---|---|---|---|---|
| KNN | CW | 76.81 ± 1.13 | 80.00 ± 0.54 | 78.14 ± 0.31 | 81.13 ± 0.59 | 79.45 ± 0.70 |
| KNN | ASPS | 68.61 ± 0.68 | 66.50 ± 0.93 | 66.41 ± 0.90 | 63.19 ± 0.33 | 63.32 ± 0.27 |
| KNN | MPWR | 69.66 ± 0.89 | 70.63 ± 0.59 | 72.52 ± 0.43 | 73.59 ± 0.49 | 70.63 ± 1.12 |
| KNN | ONR | 71.35 ± 0.35 | 70.99 ± 0.61 | 67.41 ± 1.62 | 71.45 ± 0.57 | 72.03 ± 0.96 |
| KNN | UBS | 64.62 ± 0.52 | 63.15 ± 0.69 | 64.78 ± 1.09 | 64.98 ± 0.82 | 63.22 ± 0.72 |
| KNN | VGBS | 69.14 ± 0.80 | 70.21 ± 0.90 | 67.94 ± 0.74 | 70.41 ± 0.55 | 69.98 ± 0.61 |
| CART | CW | 70.82 ± 0.78 | 74.21 ± 0.67 | 72.55 ± 0.86 | 73.50 ± 0.70 | 72.75 ± 0.81 |
| CART | ASPS | 69.49 ± 0.41 | 69.20 ± 0.33 | 71.94 ± 0.46 | 73.20 ± 0.77 | 73.20 ± 0.58 |
| CART | MPWR | 63.45 ± 0.66 | 67.32 ± 0.73 | 70.73 ± 0.70 | 71.58 ± 0.65 | 71.28 ± 0.88 |
| CART | ONR | 71.28 ± 0.99 | 75.41 ± 0.76 | 73.30 ± 1.40 | 74.57 ± 1.22 | 73.33 ± 0.82 |
| CART | UBS | 71.71 ± 0.96 | 73.66 ± 0.78 | 74.67 ± 0.83 | 74.24 ± 1.12 | 73.69 ± 0.86 |
| CART | VGBS | 70.44 ± 0.30 | 70.60 ± 0.57 | 71.22 ± 1.28 | 70.83 ± 0.75 | 71.32 ± 0.78 |
| SVM | CW | 84.58 ± 0.80 | 86.70 ± 0.16 | 84.29 ± 5.14 | 87.97 ± 4.48 | 87.61 ± 0.78 |
| SVM | ASPS | 81.39 ± 0.58 | 80.36 ± 0.65 | 82.60 ± 0.58 | 80.72 ± 0.52 | 83.68 ± 0.20 |
| SVM | MPWR | 72.88 ± 0.95 | 78.82 ± 0.64 | 81.07 ± 0.82 | 84.39 ± 0.50 | 83.77 ± 0.50 |
| SVM | ONR | 82.70 ± 0.31 | 84.75 ± 0.70 | 84.10 ± 0.60 | 86.93 ± 0.82 | 84.88 ± 4.00 |
| SVM | UBS | 79.61 ± 0.51 | 82.86 ± 0.23 | 76.91 ± 0.61 | 82.08 ± 0.37 | 79.94 ± 3.60 |
| SVM | VGBS | 76.59 ± 0.70 | 79.97 ± 0.79 | 79.06 ± 0.52 | 80.68 ± 0.75 | 80.07 ± 1.04 |

Table 4. Pavia University image results (overall accuracy and standard deviation in %).

| Classifier | Method | 10 Bands | 20 Bands | 30 Bands | 40 Bands | 50 Bands |
|---|---|---|---|---|---|---|
| KNN | CW | 92.19 ± 0.26 | 92.12 ± 0.48 | 91.73 ± 0.15 | 90.37 ± 0.24 | 90.99 ± 0.15 |
| KNN | ASPS | 87.67 ± 0.11 | 90.24 ± 0.39 | 89.90 ± 0.39 | 89.33 ± 0.29 | 90.54 ± 0.40 |
| KNN | MPWR | 91.80 ± 0.39 | 90.99 ± 0.14 | 92.37 ± 0.93 | 90.45 ± 0.41 | 91.19 ± 1.19 |
| KNN | ONR | 88.89 ± 0.19 | 92.30 ± 0.15 | 91.82 ± 0.29 | 91.02 ± 0.16 | 91.60 ± 0.11 |
| KNN | UBS | 86.00 ± 0.42 | 88.15 ± 0.44 | 87.84 ± 0.22 | 88.10 ± 0.22 | 88.17 ± 0.10 |
| KNN | VGBS | 84.26 ± 0.39 | 87.77 ± 0.34 | 86.42 ± 0.28 | 87.24 ± 0.39 | 88.30 ± 0.60 |
| CART | CW | 89.54 ± 0.12 | 89.18 ± 0.35 | 89.31 ± 0.29 | 89.04 ± 0.39 | 89.04 ± 0.37 |
| CART | ASPS | 83.23 ± 0.35 | 86.69 ± 0.26 | 86.74 ± 0.48 | 87.06 ± 0.37 | 86.47 ± 0.10 |
| CART | MPWR | 89.29 ± 0.67 | 88.82 ± 0.83 | 89.67 ± 0.71 | 89.03 ± 0.95 | 88.81 ± 0.81 |
| CART | ONR | 85.37 ± 0.13 | 89.63 ± 0.23 | 89.03 ± 0.26 | 88.69 ± 0.16 | 89.00 ± 0.25 |
| CART | UBS | 85.02 ± 0.49 | 87.27 ± 0.43 | 86.26 ± 0.31 | 86.50 ± 0.23 | 87.42 ± 0.36 |
| CART | VGBS | 85.79 ± 0.23 | 88.26 ± 0.31 | 88.12 ± 0.32 | 88.49 ± 0.20 | 88.04 ± 0.31 |
| SVM | CW | 94.97 ± 2.38 | 90.77 ± 12.59 | 75.56 ± 16.17 | 68.26 ± 16.51 | 67.01 ± 15.36 |
| SVM | ASPS | 89.58 ± 9.48 | 88.67 ± 9.41 | 40.19 ± 12.39 | 48.71 ± 15.96 | 40.56 ± 10.81 |
| SVM | MPWR | 94.93 ± 3.95 | 91.85 ± 8.64 | 70.66 ± 13.81 | 39.13 ± 17.51 | 54.86 ± 15.50 |
| SVM | ONR | 91.31 ± 12.39 | 52.29 ± 24.56 | 46.62 ± 10.86 | 43.36 ± 21.41 | 43.48 ± 12.96 |
| SVM | UBS | 89.32 ± 2.49 | 54.75 ± 18.87 | 53.18 ± 13.99 | 38.06 ± 16.05 | 45.21 ± 13.29 |
| SVM | VGBS | 91.29 ± 13.59 | 89.74 ± 14.54 | 74.77 ± 17.88 | 50.58 ± 8.86 | 51.20 ± 12.23 |

Table 5. All the bands selected by the proposed CW method.

| Image | Number of Bands | Selected Bands |
|---|---|---|
| Botswana | 10 | 4 11 17 21 24 41 69 101 105 120 |
| Botswana | 20 | 7 10 12 28 32 33 37 39 40 49 55 58 59 65 67 73 75 76 113 124 |
| Botswana | 30 | 2 4 5 7 21 27 29 30 32 33 34 35 37 43 44 47 54 56 57 58 61 62 71 74 78 80 82 118 120 124 |
| Botswana | 40 | 2 3 4 5 6 8 13 14 16 27 28 31 32 33 34 35 36 39 41 42 52 54 58 60 63 65 69 70 72 74 78 88 89 92 97 100 101 105 135 142 |
| Botswana | 50 | 1 2 4 5 6 8 10 16 19 21 22 23 24 25 26 27 30 31 32 33 34 36 41 42 47 57 58 59 66 67 69 72 74 75 77 78 87 89 94 96 98 100 101 102 104 109 110 113 130 144 |
| Indian Pines | 10 | 16 20 21 33 34 39 92 97 119 128 |
| Indian Pines | 20 | 8 10 15 16 17 19 26 27 30 33 36 43 46 47 64 78 97 98 117 133 |
| Indian Pines | 30 | 5 6 7 8 9 15 27 30 35 37 39 40 46 56 57 62 63 64 71 73 74 75 76 78 82 92 98 168 173 174 |
| Indian Pines | 40 | 4 6 7 9 10 16 17 19 27 30 32 33 34 35 36 40 46 50 52 53 57 63 69 72 74 84 92 93 97 99 100 117 121 122 126 137 139 140 142 199 |
| Indian Pines | 50 | 6 9 11 12 15 20 22 23 25 26 29 30 31 32 33 36 41 42 43 44 45 46 49 50 51 55 56 59 65 71 73 74 75 76 77 84 92 95 98 102 114 117 119 121 122 130 138 168 172 199 |
| Pavia University | 10 | 21 42 55 70 72 73 75 83 85 98 |
| Pavia University | 20 | 15 18 28 46 49 55 56 60 61 63 65 71 83 85 88 89 91 95 99 103 |
| Pavia University | 30 | 10 16 20 22 31 36 38 40 50 54 59 61 62 64 65 67 70 72 74 77 80 83 85 91 92 94 96 98 100 102 |
| Pavia University | 40 | 3 10 11 14 16 17 18 20 23 27 38 39 44 46 51 56 58 59 61 62 63 65 67 69 71 72 74 75 76 78 80 83 84 85 91 94 96 98 100 103 |
| Pavia University | 50 | 9 10 11 13 15 17 18 20 23 25 26 28 29 31 33 35 37 40 41 44 56 57 59 61 62 63 65 66 67 69 71 72 73 75 77 79 81 83 84 85 88 90 92 94 95 96 98 100 102 103 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Habermann, M.; Shiguemori, E.H.; Frémont, V. Unsupervised Cluster-Wise Hyperspectral Band Selection for Classification. Remote Sens. 2022, 14, 5374. https://doi.org/10.3390/rs14215374
