1. Introduction
Hyperspectral remote-sensing images (HSIs) contain rich spectral and spatial information and integrate image and spectral information in a single data cube. Their applications have yielded important value in mineral exploration [1], environmental monitoring [2], precision agriculture and forestry [3] and national defense and the military [4]. However, the large volume and high dimensionality of HSI data can easily lead to the curse of dimensionality [5]. Moreover, because HSIs have numerous bands, strong correlation between adjacent bands and a large amount of redundant information, information recognition and feature extraction are difficult and their accuracy is limited. Therefore, rather than using the original HSI directly, the bands with the highest separability should be selected.
In recent years, scholars have conducted a large amount of research on hyperspectral BS methods. Jiang et al. [6] proposed a BS method based on minimum redundancy maximum relevance (MRMR), which computed the correlation between each band and the labels and the redundancy between each band and the other bands; band subsets were then selected by maximizing the relevance and minimizing the redundancy.
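As an illustration of this criterion only (a minimal sketch, not the implementation of [6]), the following Python code performs greedy MRMR selection: each candidate band is scored by its mutual information with the class labels minus its average mutual information with the bands already chosen. The 16-bin discretization used to estimate band-to-band mutual information is our assumption.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def mrmr_select(X, y, k, bins=16):
    """Greedy MRMR: maximize relevance to the labels while minimizing
    average redundancy with the bands already selected."""
    relevance = mutual_info_classif(X, y)  # I(band; labels) per band
    # Discretize each band so band-to-band mutual information can be estimated.
    disc = np.stack(
        [np.digitize(X[:, b], np.histogram_bin_edges(X[:, b], bins))
         for b in range(X.shape[1])], axis=1)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        candidates = [b for b in range(X.shape[1]) if b not in selected]
        scores = [relevance[b]
                  - np.mean([mutual_info_score(disc[:, b], disc[:, s])
                             for s in selected])
                  for b in candidates]
        selected.append(candidates[int(np.argmax(scores))])
    return selected
```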
Xu et al. [7] used structural similarity to measure the relationship between bands and ranked bands according to their similarity and significant differences to select representative band subsets. To improve the efficiency of band selection, a band-grouping strategy was introduced into BS. Wang et al. [8] grouped the bands and then applied the clonal selection algorithm to select a representative band from each group as the subset. Wang et al. [9] proposed a fast neighborhood grouping BS method (FNGBS), which divided the HSI into several groups using a coarse-to-fine strategy and simultaneously selected the most relevant and information-rich band for each group based on local density and information entropy. Although ranking-based BS methods can quickly obtain feature subsets, they often overlook the intrinsic structure of HSI data, resulting in high correlations among the selected bands. In contrast, filtering-based methods are designed on the basis of particular mathematical models or principles, allowing them to exploit the structure and characteristics of HSIs to select task-relevant bands and effectively remove redundant ones. Kira et al. [10] proposed Relief, a statistical approach with advantages in both time and accuracy. Fu et al. [11] proposed a novel adjacent band grouping and normalized matching filter for BS (NGNMF), which preserved spatial–spectral information while reducing data dimensionality; however, limitations in the type of filter and the parameter settings may restrict its applicability and performance. Cluster-based methods offer the flexibility to select bands according to the actual characteristics of the data, thereby reducing the correlation within band subsets. Wang et al. [12] proposed a hyperspectral band selection method based on an adaptive subspace partitioning strategy (ASPS), which divided the HSI into multiple sub-cubes by maximizing the ratio of inter-class to intra-class distance and selected the least noisy bands within each sub-cube. Zhang [13] selected representative bands based on a similarity measure, computing suitable weights for the measure with the coefficient of variation and then embedding the weighted measure into K-means as its kernel. Wang et al. [14] proposed an area-aware hierarchical latent feature representation learning-guided clustering (HLFC) method, which reflected band similarity by constructing a similarity map, learned latent features hierarchically and then clustered them with K-means. However, cluster-based methods do not consider the overall performance after merging the representative bands, and their search for band subsets is not guided by a classifier, leading to relatively low accuracy. In recent years, deep learning has also been applied to hyperspectral band selection and has achieved good results [15,16,17,18,19,20,21,22,23,24], but it requires a large number of training samples, while labeled samples for HSI are limited.
To overcome the aforementioned issues, the wrapper method based on a heuristic search, with its strong ability to search the feature space and its direct use of classification performance, has been widely applied to HSI BS. Su et al. [25] used particle swarm optimization (PSO) to select hyperspectral bands while automatically identifying the optimal number of selected bands, using two particle swarms: an external swarm to determine the appropriate number of bands and an internal swarm for band selection. It was shown that automatically selecting a variable number of bands gives better classification results than a fixed number, but using two PSOs to search greatly increases the complexity. Medjahed et al. [26] applied the grey wolf optimizer (GWO) to the BS problem and selected band subsets effectively by optimizing the objective function. Medjahed et al. [27] proposed a BS method based on the sine cosine algorithm (SCA), which combined SCA with KNN to select band subsets according to classification accuracy; the results showed that the method produced accurate classifications, but the accuracy achieved with the selected bands was still not high. Most existing optimization algorithms have many parameters, tend to fall into local optima and cannot solve complex optimization problems. As a result, new algorithms continue to be proposed, such as greylag goose optimization (GGO) [28], the Coati optimization algorithm (COA) [29], the parrot optimizer (PO) [30], the reptile search algorithm (RSA) [31] and MOQEA/D [32], as well as approaches based on machine learning (ML) [33], deep neural networks (DNNs) [34], image segmentation [35] and so on [36]. In addition, there are winners of the CEC competitions, such as LSHADE [37], COLSHADE [38], IMODE [39], KGE [40] and SASS [41].
The wild horse optimizer (WHO) [42] is a new meta-heuristic algorithm that has been employed successfully for the optimization of practical problems owing to its few parameters, strong optimization capability and relatively low time complexity. For example, it has been employed for extracting model parameters in photovoltaic systems [43], solving nonlinear multi-objective optimization problems in energy management [44] and solving link failure problems in underwater channels [45]. Although WHO can achieve satisfactory results on some practical problems, it still has shortcomings, such as a limited exploitation capability and stagnation at locally optimal solutions. It is therefore necessary to improve WHO according to the practical problem at hand. Ewees et al. [46] proposed an improved version of WHO (WHOW) that uses the spiral position-update strategy of the whale optimization algorithm (WOA) to update positions in WHO; the experimental results indicated the advantages of WHOW in solving different optimization problems and its outstanding feature selection ability on most benchmark datasets. Zheng et al. [47] proposed an improved WHO (IWHO), which utilized a random running strategy and a waterhole competition mechanism to enhance its exploitation capability and then used a dynamic inertia weight strategy to refine the global optimal solution; simulation tests and application experiments demonstrated the superior optimization ability of the improved algorithm. However, according to the no free lunch (NFL) theorem [48], no single optimization algorithm can address all optimization problems, and algorithms need to be adapted to the actual problem. BS is essentially an NP-hard problem [49], and as the number of bands grows, the optimization processes of the above algorithms may converge prematurely or even stagnate.
To solve these problems, this paper proposes a hyperspectral BS algorithm based on an enhanced WHO (IBSWHO), which can overcome the shortcomings of the original WHO and automatically select informative bands while maintaining excellent classification precision. The main contributions of this paper are described as follows.
The Sobol sequence, Cauchy mutation and a dynamic random search technique help IBSWHO escape local optima and increase its search efficiency and exploitation ability.
A binary-coded version of WHO is proposed and applied to image feature selection.
HSI BS is formulated as a binary optimization problem, and IBSWHO is applied to select the optimal bands and their number automatically; its performance is verified on typical HSI datasets. A sketch of this binary formulation is given below.
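As a minimal sketch of this binary formulation (our illustration, not the exact scheme of Section 2: the sigmoid transfer function is a common choice for binarizing continuous metaheuristics, and `classify_error` is a hypothetical placeholder for the SVM evaluation used in Section 3):

```python
import numpy as np

def binarize(position, rng):
    """Map a continuous WHO position to a binary band mask via a sigmoid
    transfer function; a 1 means the corresponding band is selected."""
    prob = 1.0 / (1.0 + np.exp(-position))
    mask = (rng.random(position.shape) < prob).astype(int)
    if mask.sum() == 0:                       # guarantee at least one band
        mask[rng.integers(mask.size)] = 1
    return mask

def fitness(mask, X, y, classify_error):
    """Objective to minimize: classification error rate on selected bands."""
    return classify_error(X[:, np.flatnonzero(mask)], y)
```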
The rest of this paper is organized as follows: Section 2 introduces the proposed BS algorithm (IBSWHO) and its specific steps. Experimental results and a comparative study are presented in Section 3. Finally, the full paper and future work are summarized in Section 4.
3. Experiments
In this section, two sets of experiments are conducted to test the optimization and convergence performance of IBSWHO on benchmark functions and its effectiveness for HSI BS. All experiments were conducted on a simulation platform with an Intel(R) Core(TM) i7-8750H CPU @ 2.20 GHz, 8 GB RAM, the Windows 11 operating system and MATLAB R2021a.
3.1. Optimization Performance Test
For testing the optimization and convergence performance of IBSWHO, nine commonly used nonlinear benchmark functions are selected, and PSO, GWO, WHO and a recently proposed improved WHO (IWHO) are used for comparison. The details of the test functions and the parameters of the optimization algorithms are shown in Table 1, Table 2 and Table 3, respectively. The results of the comparative experiments are shown in Table 4, with the best results highlighted in bold. Among the benchmark functions, F1–F3 are unimodal functions, F4–F6 are multimodal functions and F7–F9 are fixed-dimension multimodal functions. To verify the optimization ability of IBSWHO on complex problems, the CEC2019 test functions, which are more complex and difficult than the other benchmark functions, are also used, and IBSWHO is compared with three recent state-of-the-art algorithms (SASS, COLSHADE and KGE) as well as the basic optimization algorithms PSO, GWO and WHO. The experimental results are shown in Table 5.
To ensure the fairness and objectivity of the experiments, the population size of all algorithms is set to 30, the maximum number of iterations is set to 500, each algorithm is run 30 times independently and the mean and standard deviation over the 30 runs are reported for dimensions of 30, 200 and 500. The accuracy and quality of each algorithm's solutions are evaluated by comparing their mean values, while stability is indicated by the standard deviation.
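A sketch of this evaluation protocol (illustrative only; `optimizer` is a hypothetical callable that returns the best fitness found in one independent run):

```python
import numpy as np

def benchmark(optimizer, objective, dim, pop=30, iters=500, runs=30, seed=0):
    """Run an optimizer `runs` times independently and report the mean and
    standard deviation of the best fitness values, as in Tables 4 and 5."""
    rng = np.random.default_rng(seed)
    best = np.array([optimizer(objective, dim, pop, iters,
                               np.random.default_rng(rng.integers(2**32)))
                     for _ in range(runs)])
    return best.mean(), best.std()
```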
It can be seen from Table 4 that the proposed IBSWHO achieves the greatest advantages in accuracy and stability compared with the other four algorithms on most functions, whether unimodal, multimodal or fixed-dimension multimodal. Therefore, when solving optimization problems, IBSWHO escapes local optima most readily and has the best exploration and exploitation ability.
From Table 5, it can be seen that, compared with the other six algorithms, IBSWHO has the average value closest to the theoretical optimum on eight of the functions, demonstrating the best convergence performance and search success rate. Therefore, IBSWHO is capable of solving complex problems. However, this does not hold for every test function (e.g., F2, F6, CEC04 and CEC08), which is to be expected, as the NFL theorem implies that no single optimization algorithm can solve all problems.
3.2. Experimental Results and Analysis of Band Selection
To verify the effectiveness of IBSWHO for HSI BS, it is compared with several meta-heuristic optimization algorithms and with common and advanced band selection methods. To find the band subset that minimizes the classification error rate, all algorithms employ the classification error rate under the SVM classifier as the objective function, given that SVM stands out as one of the most competitive classifiers for small-sample problems. The LIBSVM library is used to implement the SVM classifier, with the radial basis function (RBF) as the kernel. During the training phase, the two SVM parameters (c and γ) are selected through 5-fold cross-validation.
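An equivalent sketch of this objective in Python, using scikit-learn in place of the MATLAB LIBSVM interface (a substitution for illustration; the grid values for c and γ are our assumptions, not those used in the paper):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def svm_error_rate(X_sel, y):
    """5-fold cross-validated error of an RBF-kernel SVM on the selected
    bands, with (C, gamma) tuned by grid search."""
    grid = GridSearchCV(SVC(kernel="rbf"),
                        {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1]},
                        cv=5)
    grid.fit(X_sel, y)
    return 1.0 - grid.best_score_   # classification error rate to minimize
```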
3.2.1. Description of Datasets
The experiment is conducted using three commonly used hyperspectral datasets, which were obtained by different sensors, namely the Indian Pines, Pavia University, and Salinas datasets.
- (1) Indian Pines: The Indian Pines dataset was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) in 1992 over the Indian Pines test site in Indiana, USA. The image is 145 × 145 pixels, with a spectral range of 0.4–2.5 μm and a spatial resolution of 20 m, and it contains 220 bands and 16 ground-truth classes. In this paper, the bands covering the absorption region are removed, leaving a total of 200 bands. The pseudo-color image and the ground-truth image are shown in Figure 4.
- (2) Pavia University: The Pavia University dataset is a portion of the hyperspectral data acquired by the German airborne Reflective Optics System Imaging Spectrometer (ROSIS) over Pavia, Italy, in 2003. The image is 610 × 340 pixels, with a spectral range of 0.43–0.86 μm and a spatial resolution of about 1.3 m. It contains 115 bands and 9 ground-truth classes. In this paper, 103 available bands are used for the subsequent research after removing 12 noisy bands. The pseudo-color image and the ground-truth image are shown in Figure 5.
- (3) Salinas: The Salinas dataset was also acquired by AVIRIS, over the Salinas Valley in California, USA. The image is 512 × 217 pixels, with a spectral range of 0.4–2.5 μm and a spatial resolution of 3.7 m. It contains 224 bands and 16 kinds of ground objects. The pseudo-color image and the ground-truth image are shown in Figure 6.
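For reference, a minimal Python sketch for loading one of these cubes and flattening it into a pixels-by-bands matrix (the .mat file names and variable keys follow commonly distributed copies of the datasets and are assumptions; adjust them to your local files):

```python
from scipy.io import loadmat

# Indian Pines: 145 x 145 x 200 cube plus a ground-truth map.
cube = loadmat("Indian_pines_corrected.mat")["indian_pines_corrected"]
gt = loadmat("Indian_pines_gt.mat")["indian_pines_gt"]

H, W, B = cube.shape
X = cube.reshape(-1, B).astype(float)   # rows: pixels, columns: bands
y = gt.ravel()
X, y = X[y > 0], y[y > 0]               # label 0 marks unlabeled background
```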
3.2.2. Experimental Parameter Settings
Firstly, the proposed IBSWHO is compared with two basic band selection methods (MRMR and Relief) and three recently proposed effective BS methods (ASPS, FNGBS and NGNMF). Additionally, other optimization algorithms, including PSO, GWO, GA and WHO, are used for further comparison, with the parameters set as shown in Table 3. The selected band subsets are classified by SVM, and the classification accuracy is used to assess their discrimination ability. For each dataset, 20% of the samples are randomly selected as training data, and the remaining 80% are used as testing data. To ensure the fairness of the experimental results, each algorithm is repeated 10 times on each dataset with an initial population of 20 and a maximum of 20 iterations. Numerous evaluation metrics exist for assessing HSI classification performance; in this paper, we report the averages of OA (overall accuracy), AA (average accuracy), Kappa (Kappa coefficient) and the per-class accuracy of the band subsets. The definitions of OA, AA and Kappa are given in Appendix A.
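For reference, the three metrics can be computed from a confusion matrix as follows (standard definitions, consistent with Appendix A):

```python
import numpy as np

def classification_metrics(conf):
    """OA, AA and Kappa from a confusion matrix `conf`
    (rows: true classes, columns: predicted classes)."""
    n = conf.sum()
    oa = np.trace(conf) / n                            # overall accuracy
    aa = np.mean(np.diag(conf) / conf.sum(axis=1))     # mean per-class accuracy
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / n**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```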
3.2.3. Analysis of Experimental Results
Table 6 lists the average results of IBSWHO and the competing methods on the Indian Pines dataset. As shown in the table, IBSWHO has the highest classification accuracy and the best performance among all methods. The accuracy of Class 3 (Corn-mintill) increased from 40.06% to 77.56%, and Class 12 (Soybean-clean) improved by 7.98% compared with the best-performing NGNMF. However, IBSWHO performs poorly in some categories, possibly because land-cover categories with similar spectra are harder to distinguish. The classification result for Class 9 (Oats) is not ideal owing to the small sample size of this category. Classification maps for all methods on the Indian Pines dataset are shown in Figure 7. The black background represents unlabeled pixels, colors consistent with the true sample color represent correctly classified pixels and inconsistent colors represent misclassified pixels; the more correctly classified pixels there are, the smoother the classification map. From the figure we can see that the classification map obtained by IBSWHO is the smoothest; for example, the Soybean-clean region it produces contains fewer misclassified pixels than those of the other methods.
The averages of the experimental results on the Pavia University dataset are shown in Table 7.
It can be seen from Table 7 that the classification result of IBSWHO is the highest among all algorithms; in particular, the accuracy of Class 7 (Bitumen) increased from 9.96% to 86.94%, and that of Class 6 (Bare-soil) increased by 18.37% compared with ASPS. IBSWHO's Kappa coefficient is greater than 0.93, indicating that the predicted labels are generally consistent with the true labels, so IBSWHO has a strong optimization ability on the Pavia University dataset. The classification maps for this dataset are shown in Figure 8, where the map produced by IBSWHO is the smoothest; for example, its Bricks region has the fewest misclassified pixels.
The average results on the Salinas dataset are shown in Table 8.
It is observed from Table 8 that the OAs of the BS methods using optimization algorithms are all above 90%, indicating that the optimization algorithms have a good band selection effect on the Salinas dataset. Furthermore, the results of IBSWHO are still the best; for instance, the accuracy of Class 14 (Lettuce-7wk) is 3.85% higher than that of PSO, and Class 15 (Vinyard-untrained) improved by 16.33% over PSO. Compared with the three advanced methods, IBSWHO demonstrates the best performance, indicating that it is the most suitable BS method for the Salinas dataset. The classification maps are shown in Figure 9, where the map of IBSWHO is the smoothest and clearest.
In summary, in terms of classification, the wrapper method based on an optimization algorithm is more effective than the basic ranking and filtering methods. The reason is that the ranking and filtering methods use mutual information, rather than a classifier, as the criterion for selecting the band subset, so they are fast but less accurate. Compared with the other optimization algorithms and advanced band selection techniques, although the accuracy of IBSWHO is lower than that of a comparison algorithm in some classes, its overall accuracy is the highest, which reflects the effectiveness of the added modification strategies. Moreover, IBSWHO performs best on all three datasets, which also demonstrates its robustness.
In terms of convergence performance, the curves of fitness versus iterations are displayed in Figure 10. As can be observed, the initial fitness of IBSWHO is the lowest on all datasets, indicating that the Sobol sequence used in the initialization phase enhances population diversity and provides a better starting solution. As the iterations increase, the curves of GA, PSO, GWO and WHO tend to stabilize, while IBSWHO maintains a declining trend, which indicates that the mutation strategy and the dynamic random search technique improve the exploration and exploitation ability of IBSWHO and help find a better band subset. Thus, IBSWHO has an excellent ability to find the band subset with the best classification accuracy.
3.2.4. Comparison of the Number of Selected Bands
Because bands are selected adaptively according to the task, neither the selected bands nor their number is fixed. For the band selection experiments above, the average number of bands in the optimal band subsets is presented in Table 9.
From the table, it can be observed that GWO selects the fewest bands. This is because GWO is an efficient search algorithm with diverse search strategies, allowing it to explore different band subsets effectively and find solutions with a minimal number of bands. However, although GWO selects the fewest bands, as shown in Table 6, Table 7 and Table 8, its classification accuracy is not high. This indicates that GWO may discard bands with high information content, so the features are not well preserved. After GWO, the proposed IBSWHO selects the fewest bands, and its classification accuracy is the highest. This suggests that IBSWHO selects the most critical and effective features, and the small number of bands indicates the strong generalization ability of the algorithm.
3.2.5. Statistical Significance Evaluation
In order to evaluate the statistical significance of the differences between IBSWHO and other comparison algorithms for classification, we carried out a nonparametric McNemar test [55]. It is based on a standardized normal test statistic:
$$ Z = \frac{f_{12} - f_{21}}{\sqrt{f_{12} + f_{21}}}, $$
where $f_{12}$ indicates the number of samples correctly classified by Method 1 but incorrectly classified by Method 2, and $f_{21}$ is the number of samples correctly classified by Method 2 but incorrectly classified by Method 1. For a 5% significance level, if $|Z| > 1.96$, there is a significant performance difference between the two methods.
Table 10 presents the values of $|Z|$ comparing IBSWHO with the other comparison methods. The results of the McNemar test show that the performance of IBSWHO is statistically different from that of the other methods, with the only exception being NGNMF on the Salinas dataset. The reason is that the Salinas dataset contains large land-cover areas and a sufficient number of samples, and the similar-pixel selection strategy in NGNMF can adaptively extract land-cover features, making it more robust in complex land-cover scenarios. NGNMF can therefore achieve good classification results, but the proposed IBSWHO method still outperforms it.
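A minimal sketch of this test (the standard McNemar computation; the variable names are ours):

```python
import numpy as np

def mcnemar_z(correct1, correct2):
    """Standardized McNemar statistic for two methods evaluated on the same
    test samples; `correct1`/`correct2` are boolean arrays marking which
    samples each method classified correctly."""
    f12 = np.sum(correct1 & ~correct2)  # right by Method 1, wrong by Method 2
    f21 = np.sum(~correct1 & correct2)  # wrong by Method 1, right by Method 2
    return (f12 - f21) / np.sqrt(f12 + f21)

# |Z| > 1.96 indicates a significant difference at the 5% level.
```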
3.2.6. Effect of Hyperparametric Population Size on Accuracy
The population size $N$ is defined at the initial stage of the algorithm; in this paper, $N$ is set to 20. Its value has a large impact on the convergence speed and solution accuracy of the algorithm. In this subsection, we discuss the effect of $N$ on the accuracy of all algorithms on the Indian Pines dataset. The relationship among $N$, OA and running time on the Indian Pines dataset is shown in Figure 11. According to the figure, the classification accuracy reaches its maximum when the population size is 20 or 30. However, a population size of 30 increases the running time considerably while making little difference to the classification accuracy, so a population size of 20 was selected in this experiment.
3.2.7. Ablation Analysis of IBSWHO
To further demonstrate the effectiveness of the incorporated strategies, we conducted ablation experiments on the three datasets. The three strategies are denoted Sobol, Cauchy and Dynamic, respectively. Each experiment was performed ten times and the results were averaged. The experimental results are shown in Table 11.
The Indian Pines dataset suffers from limited training data and an unbalanced category distribution. As shown in Table 11, the inclusion of the Sobol strategy resulted in a relatively significant improvement, with OA improving by 0.88% and AA improving by 1.48% over WHO. This is because the Sobol sequence is a low-discrepancy random number generator whose points cover the search space uniformly without clustering, allowing more accurate results with small sample sizes (a sketch of such an initialization is given below).
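A minimal sketch of such an initialization (using SciPy's quasi-Monte Carlo module as an illustrative stand-in for the construction detailed in Section 2):

```python
from scipy.stats import qmc

def sobol_init(pop_size, dim, lower, upper, seed=0):
    """Initialize a population from a scrambled Sobol low-discrepancy
    sequence, giving more uniform coverage of the search space than
    uniform random sampling (SciPy warns, but still works, when
    pop_size is not a power of two)."""
    sampler = qmc.Sobol(d=dim, scramble=True, seed=seed)
    unit = sampler.random(pop_size)          # points in [0, 1)^dim
    return qmc.scale(unit, lower, upper)     # map to the search bounds
```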
The Pavia University and Salinas datasets are large in scale and have high band correlation, making redundant information difficult to remove. The Cauchy mutation and the dynamic random search technique help the algorithm jump out of local optima more quickly and expand the search range so that band subsets are selected more efficiently (a generic sketch of the mutation follows below). The Cauchy mutation is more effective on the Pavia University dataset, improving OA by 0.7% over WHO, while the dynamic random search technique is more effective on the Salinas dataset, improving OA by 0.63% over WHO. Moreover, different combinations of these strategies have different effects on different datasets, and only when all three strategies are added at the same time are the classification results the best on every dataset, which shows that combining the three strategies increases the robustness of the proposed IBSWHO.
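A generic form of the Cauchy mutation mentioned above (an illustrative sketch; the exact operator and its scale schedule are given in Section 2):

```python
import numpy as np

def cauchy_mutate(best, rng, scale=1.0):
    """Perturb the current best solution with heavy-tailed Cauchy noise;
    the heavy tails occasionally produce long jumps that help the search
    escape local optima."""
    return best + scale * rng.standard_cauchy(best.shape)

# Example: rng = np.random.default_rng(0); trial = cauchy_mutate(best, rng)
```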
In summary, the combination of the three strategies allows IBSWHO to achieve excellent results in both band selection and classification, and removing any one strategy reduces the classification accuracy.
4. Summary and Prospects
BS is a non-transformational feature selection method, which is important for improving the classification accuracy of HSIs, removing highly correlated redundant bands and reducing computational complexity. In this paper, a new HSI BS method based on an enhanced wild horse optimizer (IBSWHO) is proposed. IBSWHO improves its ability to jump out of local optima by increasing population diversity and by using mutation to expand the search range, and it automatically selects the most appropriate band subset for the HSI classification task. To verify the effectiveness of IBSWHO, we use SVM as the classifier to compare IBSWHO with advanced band selection methods and other optimization algorithms on three commonly used hyperspectral datasets, using overall accuracy, average accuracy, the Kappa coefficient and per-class accuracy as evaluation indicators. According to the experimental results, IBSWHO's classification accuracy is satisfactory, and for some classes with complex spectral features it is the best among the comparison methods. Therefore, IBSWHO can select excellent bands for classification tasks, separate classes well and improve classification accuracy. Moreover, IBSWHO has a small number of parameters, does not easily fall into local optima and converges stably toward the global optimum. With fixed parameters, IBSWHO achieves good results both on the benchmark functions and on the hyperspectral band selection task, so it is a generally applicable and robust algorithm that does not require extensive hyperparameter tuning.
However, since only the classification accuracy is used as the objective function, the trade-off between classification accuracy and the number of selected bands is not well balanced. At the same time, the improvement strategies increase the time complexity of the algorithm to some extent. Therefore, in future work, we will explore higher-quality objective functions and improve the execution efficiency of the algorithm.