#### **1. Introduction**

Hyperspectral image classification has received considerable interest in recent years [1–23]. Its band selection (BS) issue has also been studied extensively [24–57]. In general, there are two approaches to BS. One is to select bands one at a time, sequentially; this is referred to as sequential multiple band selection (SQMBS). In this case, a criterion that can be used to select bands, according to priorities ranked by the criterion, is usually required. Such a criterion is referred to as a band prioritization (BP) criterion, and it can be designed from two perspectives. One type of BP criterion is based on data characteristics or statistics, such as variance, signal-to-noise ratio (SNR), entropy, and information divergence (ID), to calculate a priority score for each individual band in order to rank all bands [25]. As a result, such BP-based SQMBS is generally unsupervised and is not adaptive to any particular application; in other words, the same selected bands are applied to all applications. The other type of BP criterion is supervised and is adaptive to a particular application, such as classification [26–57], target detection [49,50], endmember extraction [51], spectral unmixing [52], etc. Unfortunately, one of the major problems with BP-derived BS methods is how to deal with band correlation. Since hyperspectral imagery has very high interband correlation, the fact that a band has a high priority to be selected implies that its adjacent bands also have high priorities to be selected. To avoid this dilemma, band decorrelation may be required to remove redundant bands from a group of selected bands. However, this also raises two issues: how to select a band correlation criterion to measure the correlation between two bands, and how to determine the threshold at which two bands are sufficiently decorrelated.

As an alternative to BP-based SQMBS methods, another approach, referred to as simultaneous multiple band selection (SMMBS), is to select multiple bands simultaneously as a band subset. This approach avoids the band prioritization and band decorrelation issues encountered in SQMBS. However, the price paid for these advantages is the need for an effective search strategy to find an optimal band subset, since doing so generally requires an exhaustive search, which is practically infeasible. To address this issue, several works have recently been proposed, such as band clustering [58–60], particle swarm optimization (PSO) [35], the firefly algorithm (FA) [36], multitask sparsity pursuit (MTSP) [38], the multigraph determinantal point process (MDPP) [43], dominant set extraction BS (DSEBS) [40], etc. Of particular interest is a new concept, band subset selection (BSS), which addresses this issue and is quite different from the aforementioned SMMBS methods in the search strategy used for finding an optimal set of multiple bands. It considers a selected band as a desired endmember. Accordingly, finding an optimal set of endmembers from all data sample vectors can be translated into selecting an optimal band subset simultaneously from all bands. With this interpretation, two sequential algorithms designed to realize an N-finder algorithm (N-FINDR) [61] numerically, called sequential N-FINDR (SQ N-FINDR) and successive N-FINDR (SC N-FINDR) [62–65], can be redesigned to find desired band subsets, yielding SQ BSS and SC BSS algorithms. These two algorithms were recently developed for SMMBS in applications of anomaly detection [66] and spectral unmixing and classification [67,68]. This paper further extends BSS to hyperspectral image classification and has several aspects not found in [66–68].
First and foremost is the criterion used for BSS: the minimum variance resulting from a linearly constrained finite impulse response (FIR) filter arising in adaptive beamforming in array signal processing [69–72]. This linearly constrained minimum variance (LCMV)-based BSS interprets signal sources as class signature vectors and linearly constrains the class signature vectors to find an optimal band subset for classification. It is very different from constrained energy minimization (CEM)-based BS [26], which constrains a single selected band, and also from constrained multiple band selection (CMBS) [68], which extends CEM-BS by constraining multiple bands as band subsets rather than as class signature vectors, as LCMV-BSS does. Secondly, two new SQ BSS and SC BSS algorithms are developed for LCMV-BSS, specifically for classification, referred to as SQ LCMV-BSS and SC LCMV-BSS. Thirdly, the classifier used to evaluate BS performance is also an LCMV classifier, which is particularly designed to best utilize the bands selected by LCMV-BSS. Fourthly, despite the fact that LCMV-BSS may not exhaust all possible band combinations, to the authors' best knowledge, LCMV-BSS is probably the only BSS algorithm that numerically searches band subsets among all possible band combinations, in contrast to other SMMBS algorithms such as PSO, FA, MTSP, MDPP, and DSEBS, which are designed to examine only a very small selected set of band subsets. Finally, and most importantly, the proposed LCMV-BSS is very easy to implement because, unlike many BS methods, there are no parameters that need to be tuned. This is a tremendous advantage, since such parameters must be adapted to various applications.

#### **2. LCMV Criterion for BSS**

Suppose that there are $M$ classes of interest and each class is specified by a class signature vector, denoted by $\mathbf{d}\_1, \mathbf{d}\_2, \cdots, \mathbf{d}\_M$. We can now form a class signature matrix, denoted by $\mathbf{D} = [\mathbf{d}\_1\,\mathbf{d}\_2 \cdots \mathbf{d}\_M]$. The goal is to design an FIR linear filter with $L$ filter coefficients $\{w\_1, w\_2, \cdots, w\_L\}$, denoted by an $L$-dimensional vector $\mathbf{w} = (w\_1, w\_2, \cdots, w\_L)^T$, that minimizes the filter output energy subject to the following constraint:

$$\mathbf{D}^T \mathbf{w} = \mathbf{c} \text{ where } \mathbf{d}\_j^T \mathbf{w} = \sum\_{l=1}^L w\_l d\_{jl} = c\_j \text{ for } 1 \le j \le M \tag{1}$$

where $\mathbf{c} = (c\_1, c\_2, \cdots, c\_M)^T$ is a constraint vector. Using (1), we derive the following linearly constrained optimization problem:

$$\min\_{\mathbf{w}} \left\{ \mathbf{w}^T \mathbf{R} \mathbf{w} \right\} \text{ subject to} \\ \mathbf{D}^T \mathbf{w} = \mathbf{c} \tag{2}$$

where $\mathbf{R} = (1/N)\sum\_{i=1}^{N} \mathbf{r}\_i \mathbf{r}\_i^T$ is the sample autocorrelation matrix of the image. The solution to (2) is called the LCMV-based classifier and is obtained in [69,71,72] as

$$\boldsymbol{\delta}^{\text{LCMV}}(\mathbf{r}) = \left(\mathbf{w}^{\text{LCMV}}\right)^{T}\mathbf{r} \tag{3}$$

with

$$\mathbf{w}^{\text{LCMV}} = \mathbf{R}^{-1} \mathbf{D} \left( \mathbf{D}^T \mathbf{R}^{-1} \mathbf{D} \right)^{-1} \mathbf{c}. \tag{4}$$
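For completeness, (4) follows from (2) by the standard Lagrange-multiplier argument used in [69–72]; a brief sketch:

```latex
% Lagrangian for (2) with multiplier vector \boldsymbol{\lambda}
L(\mathbf{w},\boldsymbol{\lambda})
  = \mathbf{w}^T \mathbf{R}\mathbf{w}
  - 2\boldsymbol{\lambda}^T\!\left(\mathbf{D}^T\mathbf{w}-\mathbf{c}\right)

% Setting \partial L/\partial \mathbf{w} = \mathbf{0}:
2\mathbf{R}\mathbf{w} - 2\mathbf{D}\boldsymbol{\lambda} = \mathbf{0}
  \;\Rightarrow\; \mathbf{w} = \mathbf{R}^{-1}\mathbf{D}\boldsymbol{\lambda}

% Enforcing the constraint \mathbf{D}^T\mathbf{w} = \mathbf{c}:
\mathbf{D}^T\mathbf{R}^{-1}\mathbf{D}\,\boldsymbol{\lambda} = \mathbf{c}
  \;\Rightarrow\; \boldsymbol{\lambda}
  = \left(\mathbf{D}^T\mathbf{R}^{-1}\mathbf{D}\right)^{-1}\mathbf{c}
```

Substituting $\boldsymbol{\lambda}$ back into $\mathbf{w} = \mathbf{R}^{-1}\mathbf{D}\boldsymbol{\lambda}$ gives (4).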

Substituting (4) into the objective function of (2) yields

$$\begin{aligned} & \left(\mathbf{w}^{\text{LCMV}}\right)^{T}\mathbf{R}\,\mathbf{w}^{\text{LCMV}}\\ &= \left[\mathbf{R}^{-1}\mathbf{D}\left(\mathbf{D}^{T}\mathbf{R}^{-1}\mathbf{D}\right)^{-1}\mathbf{c}\right]^{T}\mathbf{R}\left[\mathbf{R}^{-1}\mathbf{D}\left(\mathbf{D}^{T}\mathbf{R}^{-1}\mathbf{D}\right)^{-1}\mathbf{c}\right] \\ &= \mathbf{c}^{T}\left(\mathbf{D}^{T}\mathbf{R}^{-1}\mathbf{D}\right)^{-1}\mathbf{D}^{T}\mathbf{R}^{-1}\mathbf{D}\left(\mathbf{D}^{T}\mathbf{R}^{-1}\mathbf{D}\right)^{-1}\mathbf{c} = \mathbf{c}^{T}\left(\mathbf{D}^{T}\mathbf{R}^{-1}\mathbf{D}\right)^{-1}\mathbf{c} \end{aligned} \tag{5}$$

According to [70], (5) is the minimum variance weighted by $\mathbf{R}^{-1}$. As a matter of fact, (5) can also be viewed as the minimal $\mathbf{R}^{-1}$-weighted least squares error (LSE) caused by misclassification errors from operating $\delta^{\text{LCMV}}$ on the entire image cube. For those who would like to learn more about LCMV, details can be found in [69–71].
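As a quick numerical sanity check of (1), (4), and (5), the following minimal sketch builds a synthetic image cube and verifies the identities above; all dimensions and variable names here are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
L, M, N = 6, 2, 500                     # bands, classes, pixels (synthetic)

r = rng.standard_normal((N, L))         # data sample vectors r_i as rows
R = (r.T @ r) / N                       # sample autocorrelation matrix R
D = rng.standard_normal((L, M))         # class signature matrix D
c = np.ones(M)                          # constraint vector c

Rinv_D = np.linalg.solve(R, D)          # R^{-1} D without forming R^{-1}
A = D.T @ Rinv_D                        # D^T R^{-1} D  (M x M)
w = Rinv_D @ np.linalg.solve(A, c)      # LCMV weight vector, Equation (4)

# Constraint (1) is satisfied: D^T w = c
assert np.allclose(D.T @ w, c)

# Minimum output variance matches (5): w^T R w = c^T (D^T R^{-1} D)^{-1} c
assert np.isclose(w @ R @ w, c @ np.linalg.solve(A, c))
```

Using `np.linalg.solve` instead of explicitly inverting $\mathbf{R}$ is numerically safer when the autocorrelation matrix is poorly conditioned.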

#### **3. Band Subset Selection**

A BS problem is generally described as follows. Assume that $J(\cdot)$ is a generic objective function of $\Omega\_{\rm BS}$ to be optimized for BS, where $\Omega\_{\rm BS}$ is a band subset selected from a full band set $\Omega$. For a given number $n\_{\rm BS}$ of selected bands, a BS method finds an optimal band subset $\Omega\_{\rm BS}^{\*}$ with $|\Omega\_{\rm BS}| = n\_{\rm BS}$ that satisfies the following optimization problem:

$$\Omega\_{\rm BS}^{\*} = \arg \left\{ \max / \min\_{\Omega\_{\rm BS} \subseteq \Omega,\ |\Omega\_{\rm BS}| = n\_{\rm BS}} J(\Omega\_{\rm BS}) \right\}. \tag{6}$$

Depending upon how the objective function $J(\Omega\_{\rm BS})$ is designed, the optimization in (6) can be performed by either maximization or minimization over all possible band subsets $\Omega\_{\rm BS}$ contained in $\Omega$ with $|\Omega\_{\rm BS}| = n\_{\rm BS}$.
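For a small band count, (6) can be solved literally by enumerating every $n\_{\rm BS}$-band subset. The sketch below does exactly that with a placeholder objective (total band variance, chosen only for illustration; it is not a criterion used in this paper):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
L, n_BS = 8, 3                            # small full band set, subset size
data = rng.standard_normal((100, L))      # synthetic pixels x bands

def J(subset):
    """Placeholder objective: sum of per-band variances over the subset."""
    return data[:, list(subset)].var(axis=0).sum()

# Exhaustive maximization over all subsets with |Omega_BS| = n_BS, as in (6)
best = max(itertools.combinations(range(L), n_BS), key=J)
assert len(best) == n_BS
```

For real hyperspectral imagery this enumeration is exactly what the following discussion shows to be infeasible, which is why search strategies are needed.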

Since solving (6) requires exhausting all possible $n\_{\rm BS}$-band combinations to find an optimal band subset $\Omega\_{\rm BS}^{\*}$, doing so directly is practically impossible. Accordingly, many approaches have been investigated that design various criteria or features to define $J(\Omega\_{\rm BS})$ and solve (6). One traditional approach is to design a BP criterion to rank all bands, from which BS can be carried out by selecting bands according to the priorities calculated by that BP criterion. Such an approach generally results in an SQMBS method, which selects multiple bands one at a time, sequentially. As noted in the introduction, one major issue arising from this approach is how to deal with redundant bands caused by band correlation. As an alternative, another BP-derived SQMBS method specifies a particular application, such as minimum estimated abundance covariance (MEAC) for classification [34], which can generate feature vectors for BP, and then takes advantage of the sequential forward floating search (SFFS) and sequential backward floating search (SBFS) developed in [73] to derive forward and backward BS methods. However, the band correlation issue still remains.

In contrast to SQMBS, many recent efforts have been directed to SMMBS, which selects multiple bands simultaneously. Two main issues are also associated with SMMBS. One is determining the number $n\_{\rm BS}$ of bands to be selected, which is also an issue in SQMBS. Generally, $n\_{\rm BS}$ can be determined either by trial and error or by the virtual dimensionality (VD) developed in [69,74]. The other, more critical issue is how to find appropriate $n\_{\rm BS}$ bands. Suppose that $n\_{\rm BS} = p$ is the number of bands to be selected, $\Omega\_p = \{\mathbf{B}\_{l\_1}, \mathbf{B}\_{l\_2}, \ldots, \mathbf{B}\_{l\_p}\}$ is a $p$-band subset selected from a full band set $\Omega = \{\mathbf{B}\_1, \mathbf{B}\_2, \cdots, \mathbf{B}\_L\}$, where $L$ is the total number of bands and $\mathbf{B}\_{l\_j}$ is the $j$th selected band. In order to find an optimal band subset $\Omega\_p^{\*}$, we must run through all possible $\binom{L}{p} = \frac{L!}{p!(L-p)!}$ $p$-combinations among $L$ bands. Practically, this is infeasible if $L$ is large, as it is in hyperspectral imagery. In this case, developing an effective search strategy for finding an optimal set of multiple bands, a problem that does not exist in SQMBS, is a great challenge for SMMBS.
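To make the scale of this search space concrete, consider a 200-band image with $p = 10$ (representative values chosen for illustration):

```python
from math import comb

L, p = 200, 10                 # band count typical of hyperspectral sensors
n_subsets = comb(L, p)         # L! / (p! (L - p)!)
assert n_subsets > 10**16      # roughly 2.2 x 10^16 candidate subsets
```

Even evaluating a billion subsets per second would take hundreds of days, which is why SMMBS methods rely on guided search rather than enumeration.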

A simple SMMBS approach is to group or combine bands into clusters, each of which produces a representative band for BS using certain band measure criteria [58–60]. In particular, the concept in [58] is similar to Fisher's ratio, using mutual information as a band prioritization criterion for clustering. Most interestingly, a band group-wise method was developed in [38], which forms band combinations by compressive sensing and uses a multitask sparsity pursuit (MTSP)-based criterion to select band combinations based on linear sparse representation via an evolution-based search strategy. Another SMMBS approach is to narrow the search range by specifying particular parameters that limit the candidate optimal sets to a small number of band subsets, and then to apply an optimization algorithm such as PSO [35] or FA [36] to find an optimal band subset from the selected candidate set.

Most recently, two other promising approaches have been reported. One uses graph-based representations in which each path specifies a particular band subset. For example, Yuan et al. [43] proposed a graph-based SMMBS method, called multigraph determinantal point process (MDPP), which makes use of multiple graphs to discover a structured and diverse band subset from a graph in which each node represents a band and the edges are weighted by the similarity between bands. Accordingly, a path represents a possible band subset. A search algorithm called mixture determinantal point process (Mix-DPP) was then developed to find a diverse subset that can be a potential optimal band combination. The other is DSEBS, which exploits structure information via a set of local spatial–spectral filters and uses a graph-based clustering search strategy derived from dominant set extraction to find a potential optimal band subset [40].

In addition to the above-mentioned approaches, there is also a new approach, called BSS, which considers the problem of multiple band selection as an endmember-finding problem. If a desired selected band is interpreted as an endmember and the full band set as the entire data set, then a band subset can be interpreted as a set of endmembers. Consequently, finding an optimal set of $n\_{\rm BS}$ bands can be carried out in a way similar to finding an optimal set of $n\_{\rm BS}$ endmembers. This BSS-based approach has recently proved to be very promising and has great potential in various applications such as anomaly detection [65], spectral unmixing [66], and target detection [67]. This paper presents another new application of BSS, to hyperspectral image classification, with LCMV used as a criterion particularly designed for classification.
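The endmember-finding interpretation can be illustrated with an SC N-FINDR-style successive replacement sweep. The sketch below is a simplified illustration, not the paper's exact SC LCMV-BSS algorithm; in particular, using the band-restricted minimum variance of (5) as the per-subset score, and minimizing it, are assumptions of this sketch.

```python
import numpy as np

def lcmv_score(bands, r, D, c):
    """Minimum variance c^T (D^T R^{-1} D)^{-1} c computed on the given bands only."""
    rb = r[:, bands]                          # restrict pixels to selected bands
    R = (rb.T @ rb) / rb.shape[0]             # band-restricted autocorrelation
    Db = D[bands, :]                          # band-restricted class signatures
    A = Db.T @ np.linalg.solve(R, Db)
    return c @ np.linalg.solve(A, c)

def sc_bss(r, D, c, p, seed=0):
    """SC-style sweep: for each slot, keep the band that minimizes the score."""
    L = r.shape[1]
    rng = np.random.default_rng(seed)
    subset = [int(b) for b in rng.choice(L, size=p, replace=False)]
    for j in range(p):                        # one successive pass over the p slots
        candidates = [b for b in range(L) if b not in subset or b == subset[j]]
        subset[j] = min(
            candidates,
            key=lambda b: lcmv_score(subset[:j] + [b] + subset[j + 1:], r, D, c),
        )
    return sorted(subset)

rng = np.random.default_rng(2)
r = rng.standard_normal((300, 12))            # 300 synthetic pixels, 12 bands
D = rng.standard_normal((12, 3))              # 3 hypothetical class signatures
c = np.ones(3)
picked = sc_bss(r, D, c, p=4)
assert len(picked) == 4 and len(set(picked)) == 4
```

Each sweep evaluates at most $p(L - p + 1)$ subsets rather than all $\binom{L}{p}$ combinations, which is the practical appeal of the successive replacement strategy.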
