#### *3.2. Wrapper Methods*

Wrapper methods search for the subset of features that gives the best classification performance, evaluating candidate subsets with the classifier itself. Although generally considered to outperform filter methods, wrappers are computationally demanding and can suffer from overfitting [82].

Two of the studies reviewed implemented genetic algorithms (GA), in which wavebands are encoded as genes that are subsequently grouped into chromosomes. These chromosomes are allowed to evolve over many generations. In each generation, a chosen classifier is applied to each chromosome and the resulting classification accuracy is taken as its fitness score, so that chromosomes with higher fitness are more likely to reproduce and pass their genes on to the next generation. Both studies used the same dataset of lab-measured tropical mangrove leaves [49,50]. Despite sharing a dataset and feature selector, the two studies selected different bands, although their methodologies did differ. That these different selections achieved similar classification performance demonstrates that multiple band selections can perform classification equally well. The ensemble of chromosomes used in [50] helped to identify key regions for discriminating the target species, related to biophysical and biochemical aspects of the vegetation, that may have been missed had the study relied upon the first single chromosome to reach the stopping criterion. This is apparent when comparing the bands selected in both studies: [49] selected no VIS bands, leading the authors to conclude that pigments were not significant for the discrimination of the target species. However, the importance of the VIS, particularly the green region, became apparent in [50], where 21 out of 120 total bands were selected from 513 ± 19 nm.
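The GA procedure described above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation used in [49] or [50]: the synthetic two-class "spectra", the nearest-centroid classifier used as the fitness function, the population size, the mutation rate, and all function names are assumptions made for the example. Fitness is computed by resubstitution for brevity, whereas a real wrapper would normally use held-out or cross-validated accuracy.

```python
import random

def make_toy_spectra(n_per_class=30, n_bands=20, informative=(3, 11), seed=0):
    """Hypothetical two-class spectra: only the `informative` bands differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_bands)]
            for b in informative:
                row[b] += 3.0 * label
            X.append(row)
            y.append(label)
    return X, y

def centroid_accuracy(X, y, bands):
    """Accuracy of a nearest-centroid classifier restricted to `bands` (resubstitution)."""
    if not bands:
        return 0.0
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, lab in zip(X, y) if lab == c]
        centroids[c] = [sum(r[b] for r in rows) / len(rows) for b in bands]
    correct = 0
    for x, lab in zip(X, y):
        dists = {c: sum((x[b] - m) ** 2 for b, m in zip(bands, centroids[c]))
                 for c in classes}
        correct += min(dists, key=dists.get) == lab
    return correct / len(y)

def ga_band_selection(X, y, n_bands, pop_size=20, generations=15, p_mut=0.05, seed=1):
    """Evolve 0/1 chromosomes (one gene per waveband); fitness = classifier accuracy."""
    rng = random.Random(seed)

    def fitness(chrom):
        return centroid_accuracy(X, y, [i for i, g in enumerate(chrom) if g])

    pop = [[rng.randint(0, 1) for _ in range(n_bands)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Fitter chromosomes are proportionally more likely to become parents.
        weights = [fitness(ch) + 1e-9 for ch in pop]
        children = []
        while len(children) < pop_size:
            a, b = rng.choices(pop, weights=weights, k=2)
            cut = rng.randrange(1, n_bands)            # single-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # mutation
            children.append(child)
        pop = children
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best
    return [i for i, g in enumerate(best) if g], fitness(best)

X, y = make_toy_spectra()
bands, score = ga_band_selection(X, y, n_bands=20)
print("selected bands:", bands, "fitness:", round(score, 3))
```

Running the sketch with different seeds typically returns different band subsets with similar fitness, which mirrors the observation that several band selections can classify equally well.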

Forward feature selection (FFS) is a wrapper method that begins with a model containing the single feature that best discriminates the classes, with new features iteratively added to the model based on their ability to improve class discrimination [83]. FFS was implemented by [27] in their comparison between floral and leaf spectra; however, only the results for leaf spectra are discussed here. The leaf spectra within this study were constrained to 475–900 nm at 1 nm increments, with only eight wavebands being selected. These bands came from narrow regions of the spectra, occurring at 450–499 nm in the blue, and the red minimum and red edge from 650–749 nm. In a similar spectral range of 402.9 to 989.1 nm of airborne collected spectra, a very different feature selection trend was observed by [11] following the use of the FFS variant sequential floating feature selection (SFFS). Wavebands were selected from across the entire reduced spectrum, with a notable gap in selection occurring in the NIR between 800 and 849 nm. Selection differences exhibited between these studies could be related to the differences in target species, leaf or canopy scale spectra, or version of FFS used. The only VIS-SWIR study in this review to use FFS applied it to AVIRIS imagery of urban street trees [9]. However, feature selection was only performed to identify spectral regions responsible for species separability, with all bands used for classification. These informative spectral regions matched a number of known informative regions from the literature, such as water absorption in the NIR, cellulose and lignin features in the SWIR, and bands associated with photosynthetic pigments in the VIS. Interestingly, however, the red minimum and red edge, which were frequently selected elsewhere, were not selected in this study, nor was the majority of the NIR.
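The greedy loop at the core of FFS can be sketched as follows. This is an illustrative sketch rather than the procedure of [27], [11], or [9]: the four-band reflectance values, the nearest-centroid scoring function, and the choice to stop after a fixed number of bands (`n_select`) are all assumptions for the example.

```python
def centroid_accuracy(X, y, bands):
    """Hypothetical scoring function: nearest-centroid accuracy using only `bands`."""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, lab in zip(X, y) if lab == c]
        centroids[c] = [sum(r[b] for r in rows) / len(rows) for b in bands]
    correct = 0
    for x, lab in zip(X, y):
        dists = {c: sum((x[b] - m) ** 2 for b, m in zip(bands, centroids[c]))
                 for c in classes}
        correct += min(dists, key=dists.get) == lab
    return correct / len(y)

def forward_feature_selection(X, y, n_select, score_fn):
    """Start from the single best band, then add one band at a time greedily."""
    selected, history = [], []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < n_select:
        # Try each unused band alongside those already selected.
        best_band = max(remaining, key=lambda b: score_fn(X, y, selected + [b]))
        selected.append(best_band)
        remaining.remove(best_band)
        history.append(score_fn(X, y, selected))
    return selected, history

# Hypothetical reflectance values for four bands; only band index 2 separates the classes.
X = [
    [0.1, 0.5, 0.0, 0.3],
    [0.2, 0.4, 0.1, 0.3],
    [0.1, 0.5, 1.0, 0.3],
    [0.2, 0.4, 0.9, 0.3],
]
y = [0, 0, 1, 1]

selected, history = forward_feature_selection(X, y, n_select=2, score_fn=centroid_accuracy)
print(selected, history)
```

In this toy case the first band chosen is index 2, because it is the only band that separates the two classes on its own. Floating variants such as SFFS additionally allow previously selected features to be dropped after each addition, which this sketch does not attempt.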

#### *3.3. Embedded Methods*

Despite being described as a wrapper method in [8], recursive feature elimination with a support vector machine (SVM-RFE) is considered to be an embedded method [84]. Embedded methods differ from wrappers in that they do not treat the classifier as a black box; rather, features are selected using information gained whilst training the classifier [85]. A claimed strength of SVM as a classifier is its reported independence from the Hughes effect, or curse of dimensionality [86,87]. However, it has been shown that SVM classifications can be affected by the Hughes effect and can benefit from dimensionality reduction of the inputs, especially when sample sizes are small [88].

To use the SVM as a feature selection method, [8] implemented recursive feature elimination (RFE) with an SVM, determining that, from the original 401 bands, the optimal number of features to include for classification was 20, after subsets of 1–5, 10, 15, 20, and 30 features were all evaluated. The 20 bands selected demonstrated a number of trends that were not apparent in the other feature selection methods implemented in the same study. Firstly, the bands formed four distinct contiguous clusters at 520–530 nm, 745–775 nm, 1005–1030 nm, and 2295–2305 nm, with a final single band at 2345 nm. Secondly, the wavelengths of certain selected bands were also unique amongst the methods used: SVM-RFE was the only feature selection method implemented in [8] to select bands from the NIR plateau, and the only method not to select bands from the NSWIR. Although not reported in a manner suitable for inclusion in Table 1, [17] also performed feature ranking with an SVM. As with [8], [17] identified the optimal number of features to be between 15 and 20, depending on the dataset, pre-processing, and feature selection methods used. Unlike [8], where the SVM selected bands from distinct contiguous regions, [17] report the SVM selecting bands evenly spread over the entire spectrum.
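As a rough illustration of how SVM-RFE works, the sketch below trains a linear SVM, removes the band with the smallest squared weight, and repeats until the requested number of bands remains. It is not the implementation used in [8] or [17]. The Pegasos-style subgradient solver stands in for whichever SVM library those studies used, the bias term is omitted on the assumption that the toy data are centred, and the synthetic data, parameter values, and function names are all hypothetical.

```python
import random

def make_toy_spectra(n_per_class=40, n_bands=10, informative=(2, 7), seed=0):
    """Hypothetical centred two-class data with labels in {-1, +1}."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (-1, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_bands)]
            for b in informative:
                row[b] += 1.5 * label
            X.append(row)
            y.append(label)
    return X, y

def train_linear_svm(X, y, bands, lam=0.01, epochs=30, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularised hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(bands)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * X[i][b] for wj, b in zip(w, bands))
            w = [(1.0 - eta * lam) * wj for wj in w]      # shrink (regularisation)
            if margin < 1.0:                               # hinge loss is active
                w = [wj + eta * y[i] * X[i][b] for wj, b in zip(w, bands)]
    return w

def svm_rfe(X, y, n_keep):
    """Repeatedly retrain and drop the band with the smallest squared weight."""
    bands = list(range(len(X[0])))
    eliminated = []
    while len(bands) > n_keep:
        w = train_linear_svm(X, y, bands)
        worst = min(range(len(bands)), key=lambda j: w[j] ** 2)
        eliminated.append(bands.pop(worst))
    return bands, eliminated

X, y = make_toy_spectra()
kept, eliminated = svm_rfe(X, y, n_keep=3)
print("kept:", kept, "elimination order:", eliminated)
```

Because the ranking comes from the weights learned during training rather than from repeated external accuracy evaluations, the sketch also shows why SVM-RFE is classed as an embedded method rather than a wrapper.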

Random forest (RF) is an ensemble classification method in which a number of decision tree classifiers are each trained on a sub-sample of the dataset, with their results combined via a voting system. For each tree, approximately one third of the samples are withheld for validation purposes, known as the out-of-bag (OOB) samples, with the remaining in-the-bag samples being used to construct the decision tree [89].
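The "one third" figure follows from sampling with replacement: when n samples are drawn with replacement from n, the expected share never drawn is (1 − 1/n)^n, which approaches 1/e ≈ 0.368 for large n. The short sketch below illustrates only this bagging step, not tree construction or voting. The sample count, number of trees, and function name are assumptions for the example.

```python
import random

def bootstrap_split(n_samples, rng):
    """Draw a bootstrap sample; indices never drawn form the out-of-bag (OOB) set."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    oob = [i for i in range(n_samples) if i not in drawn]
    return in_bag, oob

rng = random.Random(42)
n_samples = 1000          # hypothetical number of training spectra
fractions = []
for _ in range(25):       # 25 hypothetical trees
    in_bag, oob = bootstrap_split(n_samples, rng)
    fractions.append(len(oob) / n_samples)

mean_oob_fraction = sum(fractions) / len(fractions)
print(f"mean OOB fraction: {mean_oob_fraction:.3f}")
```

Each tree therefore has its own OOB set, so every sample can be used for validation by those trees that did not see it during training.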

Of the original 72 bands in [29], spanning 384.8 nm to 1054.3 nm, eight were selected for classification via RF. Although no other feature selection method was implemented in this study, a previous study by [10] performed feature selection on the same data using the spectral angle mapper (SAM) add-on Selector. This resulted in the selection of far more bands (31). Upon binning the bands at 50 nm, a clear difference between the selection methods is evident (Figure 1). The RF selected bands of [29] are focused in the 400–550 nm region with a single band from the red edge at 706 nm, whereas the SAM bands are focused along the red edge and NIR plateau between 650 and 950 nm, with additional bands in the 350–450 and 1000–1050 nm regions.
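The 50 nm binning used for these comparisons can be sketched as below. This is a minimal illustration that assumes bins are aligned to multiples of 50 nm (for example 450–499 nm and 700–749 nm, as quoted in the text). The function names are hypothetical, and the listed wavelengths are invented for the example apart from the 706 nm red edge band.

```python
from collections import Counter

def bin_label(wavelength_nm, width=50):
    """Return the label of the bin containing a wavelength, e.g. 706 -> '700-749'."""
    start = int(wavelength_nm // width) * width
    return f"{start}-{start + width - 1}"

def bin_counts(wavelengths_nm, width=50):
    """Count how many selected bands fall into each bin."""
    return Counter(bin_label(w, width) for w in wavelengths_nm)

# Hypothetical selected band centres (nm); only 706 nm is taken from the text.
selected = [412.0, 448.5, 502.3, 531.9, 706.0]
counts = bin_counts(selected)
for label, n in sorted(counts.items(), key=lambda kv: int(kv[0].split("-")[0])):
    print(label, n)
```

Comparing bin counts rather than exact band centres makes selections from sensors with different band positions and widths directly comparable, at the cost of hiding differences within a bin.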

As with the bands selected in [29], the RF selected bands in [36] fell within four bins in the VIS and VNIR regions. However, in [29], band selection was focused on the green region, with limited selection in the red and on the NIR plateau apart from a single band near the red edge inflection point. This focus was seemingly reversed in [36], with bands falling into the bins along the red edge up to the NIR plateau shoulder and the remaining bin occurring at the blue/green edge. The Chan and Paelinckx study [36] also offers a comparison with an alternative feature selection method, using the best-first search (BFS) algorithm as a wrapper. The two band selection techniques differ greatly in the VIS and VNIR regions, with only the bins at 450–499 nm and 700–749 nm in common. However, band selection is more similar at longer wavelengths, where the majority of bands were selected by both methods.

The wavebands selected via RF in [8] are in direct opposition to those selected by RF in [36]. Whereas the bands selected in [36] mainly occurred along the red edge and NIR plateau shoulder, no band was selected in this region by [8]. Instead, focus was placed on the green, yellow, and red regions of the VIS wavelengths, an area completely ignored by the RF selector in [36], though significant for their BFS selection. Additionally, [8] provided the top 20 informative bands determined by an RF classifier using the full 201 waveband dataset. Although these two implementations of RF differed in the bands selected, the overall trend was very similar, with high selection rates in the VIS, low in the NIR, and similar selection throughout the SWIR.

Additionally, a study by [33] produced waveband selections similar to those in [8], with similar results in the VIS except that no bands were selected in the early green (500–549 nm) and the red edge bin was selected rather than the red minimum. The biggest difference between [33] and all other RF studies is the reduced selection at longer wavelengths. Although all studies essentially ignored the NIR, [33] selected only two bands from the SWIR, both within the same NSWIR bin at the water absorption feature near 1400–1449 nm.
