1. Introduction
Machine learning and data mining have expanded into many fields, including active matter, molecular and materials science, natural language processing (NLP) and biomedicine [1,2,3]. To build more complex machine learning models, many datasets with high-dimensional feature spaces have been created [4,5]. However, as the dimensionality of data increases, more and more features become redundant, and it becomes harder to train models with high generalization ability. Feature selection is therefore necessary to address these problems. Feature selection is a critical step in data mining and machine learning that identifies the most relevant and useful features within a dataset. By eliminating redundant or unnecessary features, predictive models can be made more effective and precise: classification accuracy improves, algorithms generalize better to new data, overfitting is reduced, and predictions become more accurate. In addition to these benefits, feature selection can help uncover hidden relationships within the data and provide more insightful explanations for predictive models.
Similar to supervised and semi-supervised feature selection, unsupervised feature selection methods can be classified into three main approaches [6,7], determined by the selection strategy employed: filter, wrapper, and embedded methods [8,9]. Filter methods rank the features by assigning each feature a meaningful score computed with a statistical metric. Wrapper methods, in contrast, evaluate the feature subsets produced by the selection algorithm with a classifier, whose feedback guides the selection [10]; as a result, wrapper methods are generally more accurate than filter methods [11], although they may consume more computational resources. Finally, in embedded methods, feature selection is regarded as a component of the machine learning training phase, which makes them a special case of wrapper methods [12,13].
Meanwhile, feature selection can be regarded as a global combinatorial optimization problem: a subset of the original feature set constitutes the solution, which can be obtained using exhaustive or heuristic search techniques [14]. Compared with heuristic search methods, exhaustive search typically has a much higher computational cost, especially for high-dimensional datasets [4]. Using meta-heuristic search techniques is therefore a more practical way to solve the feature selection problem [14], and it is essential to choose features with efficient methods.
Evolutionary algorithms (EAs) have recently been applied to feature selection because of their global search capability. Numerous researchers have used various swarm intelligence techniques to address feature selection, including the cuckoo search (CS) [15], genetic algorithm (GA) [11], particle swarm optimization (PSO) [12], whale optimization algorithm (WOA) [16], sparrow search algorithm (SSA) [17], Harris hawks optimization (HHO) [18,19] and variants of these algorithms. For instance, Hegazy et al. [20] enhance the basic SSA structure to increase solution accuracy, reliability, and convergence speed. Additionally, Behrouz et al. propose an unsupervised probabilistic feature selection algorithm using ant colony optimization [21].
The golden jackal optimization (GJO) algorithm is one such EA, and research has demonstrated that it is both efficient and simple to apply [22]. Nevertheless, the conventional GJO algorithm may have certain drawbacks, such as insufficient exploitation of the problem space. In addition, the no free lunch (NFL) theorem states that no single algorithm is capable of solving every optimization problem [23]. Moreover, the conventional GJO algorithm was designed for continuous optimization problems, so it may not be the best choice for feature selection tasks, whose solution spaces are binary. These circumstances motivate us to improve the conventional GJO and make it better suited for feature selection tasks.
The main contributions of this paper are summarized as follows:
We aim to simultaneously reduce the number of selected features and improve the classification accuracy. Specifically, we design a fitness function to achieve these optimization objectives jointly.
We propose an improved binary golden jackal optimization algorithm (IBGJO) to solve the designed fitness function. First, IBGJO introduces a chaotic tent map to improve the exploitation capability of conventional GJO. Second, a new position-updating mechanism by cosine similarity is proposed to balance the exploitation and exploration capabilities of the algorithm. Finally, a binarization strategy is introduced to transfer the continuous solution space to the binary ones, making it suitable for dealing with feature selection solutions.
We conduct various experiments to assess the performance of the proposed IBGJO with the comparative algorithms on 28 classical UC Irvine (UCI) Machine Learning Repository datasets in terms of average fitness value, average classification accuracy, average CPU running time and average number of selected features.
The rest of this paper is organized as follows. Section 2 gives a brief overview of related work. Section 3 formulates the fitness function for feature selection. Section 4 gives the details of IBGJO. Section 5 presents the experimental results. Finally, Section 6 concludes this paper and suggests future work.
2. Related Work
The significance of wrapper-based selection techniques in feature selection optimization cannot be overlooked [24,25]. These methods treat feature selection as a black box and employ meta-heuristic algorithms together with classifiers to obtain the optimal subset [26]. Numerous classical meta-heuristic algorithms have been modified to tackle the feature selection problem, such as the binary bat algorithm (BBA) [27], bare bones particle swarm optimization (BPSO) [12], binary grey wolf optimization (BGWO) [28], and the binary gravitational search algorithm (BGSA) [29].
In recent times, an increasing number of novel wrapper-based algorithms have been proposed to enhance feature selection optimization, owing to its vital significance. For instance, Al-Tashi et al. [30] examine binary optimization utilizing hybrid grey wolf optimization for feature selection; a binary version of the hybrid grey wolf optimization (GWO) and PSO is suggested to resolve feature selection issues. In 2019 [31], binary variants of the butterfly optimization algorithm (BOA) were proposed and used to choose the best feature subset for classification. A self-adaptive particle swarm optimization (SaPSO) approach is suggested by Xue et al., especially for large-scale feature selection [32]. The two-archive multi-objective artificial bee colony algorithm (TMABC-FS) is a multi-objective feature selection approach that Zhang et al. investigate to satisfy diverse decision-makers' criteria [33]. To increase the predictability of a hospitalization expense model, a novel method based on the GA was proposed in 2019 for feature selection and parameter optimization of the support vector machine (SVM) [34]. Aiming to find distinguishing characteristics across several class labels, Zhang et al. [35] offer an embedded multi-label feature selection approach with manifold regularization. To develop a more affordable computational model for voice-analysis-based emotion categorization, Dey et al. [36] offer a meta-heuristic feature selection (FS) method employing a hybrid of the equilibrium optimization (EO) and golden ratio optimization (GRO) algorithms. Wang and Chen [37] propose an improved whale optimization algorithm (CMWOA) that integrates chaotic and multi-swarm techniques to accomplish parameter optimization and feature selection simultaneously for SVM in 2020. For feature selection in medical diagnosis, a hybrid crow search optimization method integrated with chaos theory and the fuzzy c-means algorithm was proposed in 2020 [38].
Several leading-edge researchers have focused on GJO algorithms for optimizing feature selection. Initially designed to address continuous problems, GJO requires transfer functions to convert it into a binary form (BGJO) [39] that can effectively handle feature selection optimization. While these studies have made strides in addressing feature selection challenges across a variety of contexts, the NFL theorem [23] holds that no method can solve every optimization problem, and none of the aforementioned research has identified optimal subsets of variables across all datasets tested. Nonetheless, given the strong potential of conventional GJO in this area, our aim in this study is to incorporate several enhancement factors into conventional GJO with the objective of improving the efficiency of feature selection optimization.
3. Problem Formulation
In this study, feature selection aims to minimize the number of chosen features while improving the classification accuracy, which can be defined as a multi-objective optimization problem [40]. To jointly consider the two optimization objectives, we construct the following fitness function:

$$\mathrm{Fitness} = \alpha \cdot Err + \beta \cdot \frac{N_s}{N_t}, \quad (1)$$

where $N_s$ and $N_t$ stand for the number of chosen features and the total number of features, respectively, and $Err$ is the classification error rate of a certain classifier. Additionally, the weights used to balance these two goals are $\alpha$ and $\beta = 1 - \alpha$.
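As an illustration, a minimal Python sketch of this fitness function (the function and variable names, and the default weight value, are ours rather than the paper's) could be:

```python
import numpy as np

def fitness(solution, error_rate, alpha=0.9):
    """Weighted sum of the classification error and the selected-feature ratio.

    solution   : binary vector, 1 = feature selected, 0 = feature dropped
    error_rate : classification error Err of the chosen classifier on the subset
    alpha      : weight on the error term; beta = 1 - alpha weights the feature ratio
    """
    solution = np.asarray(solution)
    n_selected = solution.sum()      # N_s: number of chosen features
    n_total = solution.size          # N_t: total number of features
    beta = 1.0 - alpha
    return alpha * error_rate + beta * n_selected / n_total
```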
The formulated feature selection problem has a nonlinear discrete search space with numerous potential local minimum points. As a result, we suggest the binary IBGJO algorithm to address the feature selection problem.
4. Proposed Improved Golden Jackal Optimization Algorithm for Feature Selection
This section first provides a succinct overview of the conventional GJO algorithm and its key principles, and then gives a comprehensive discussion of the proposed IBGJO algorithm's formulation.
4.1. Conventional Golden Jackal Optimization
The conventional GJO algorithm is inspired by the hunting behavior of golden jackal pairs and adopts a swarm-based approach [22]. Figure 1 shows the entire foraging process of the golden jackal pair, which includes searching for prey, tracking and surrounding the prey, attacking the prey, and capturing the prey. This section delves into the mathematical modeling of the conventional GJO algorithm.
4.1.1. Search Space Formulation
Similar to other metaheuristic methods, the initial solutions of the golden jackal optimization algorithm are uniformly distributed over the search space:

$$Y_0 = LB + rand \times (UB - LB), \quad (2)$$

where $Y_0$ represents the initial randomized population, and $UB$ and $LB$ denote the upper and lower boundaries of the decision variables. Moreover, $rand$ is a random number that falls within the range of $[0, 1]$. The initialization procedure involves generating a foundational $\mathit{Prey}$ matrix, with the male and female jackals occupying the first and second positions, correspondingly. The composition of the $\mathit{Prey}$ matrix is illustrated as follows:

$$\mathit{Prey} = \begin{bmatrix} Y_{1,1} & Y_{1,2} & \cdots & Y_{1,d} \\ Y_{2,1} & Y_{2,2} & \cdots & Y_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ Y_{n,1} & Y_{n,2} & \cdots & Y_{n,d} \end{bmatrix}, \quad (3)$$
where $Y_{i,j}$ stands for the $i$-th prey's $j$-th dimension. There are $n$ preys and $d$ variables in total. Each prey position can be regarded as a candidate solution. During optimization, an objective function is used to assess the fitness of each prey, with the resulting fitness values being compiled into a matrix:

$$F_{OA} = \begin{bmatrix} f(Y_{1,1}, Y_{1,2}, \ldots, Y_{1,d}) \\ f(Y_{2,1}, Y_{2,2}, \ldots, Y_{2,d}) \\ \vdots \\ f(Y_{n,1}, Y_{n,2}, \ldots, Y_{n,d}) \end{bmatrix}, \quad (4)$$
where $f$ is the objective function, $Y_{i,j}$ is the value of the $j$-th dimension of the $i$-th prey, and $F_{OA}$ is the matrix storing each prey's fitness. The male jackal (MJ) is the fittest individual, and the female jackal (FMJ) is the second fittest. The jackal couple then locates the appropriate prey.
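As a rough sketch (function names and the toy objective are ours), the random initialization of the $\mathit{Prey}$ matrix and the construction of the fitness matrix could look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_population(n, d, lb, ub):
    """Random initialization: Y0 = LB + rand * (UB - LB), one row per prey."""
    return lb + rng.random((n, d)) * (ub - lb)

def fitness_matrix(prey, f):
    """Apply the objective f to every prey row to build F_OA."""
    return np.array([f(y) for y in prey])

# Toy example: 20 preys, 10 decision variables in [0, 1]
prey = init_population(20, 10, lb=0.0, ub=1.0)
foa = fitness_matrix(prey, f=lambda y: y.sum())   # placeholder objective for illustration
male_idx, female_idx = np.argsort(foa)[:2]        # MJ = best (lowest) fitness, FMJ = second best
```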
4.1.2. Searching for Prey (Exploration Stage)
With their remarkable capability to detect and pursue prey, jackals can usually track down food successfully. Nevertheless, there are instances when their attempts fail, and the potential prey evades capture, prompting the jackals to give up and search for alternative sources of sustenance. During hunts, the MJ takes the lead, while the FMJ follows closely behind, and the mathematically modelled jackal pairs hunt as follows:
$$Y_1(t) = Y_M(t) - E \cdot \left| Y_M(t) - rl \cdot \mathit{Prey}(t) \right|, \quad (5a)$$
$$Y_2(t) = Y_{FM}(t) - E \cdot \left| Y_{FM}(t) - rl \cdot \mathit{Prey}(t) \right|, \quad (5b)$$

where $t$ represents the current iteration and $\mathit{Prey}(t)$ is the vector indicating the prey's position, while $Y_M(t)$ and $Y_{FM}(t)$ are the positions of the MJ and FMJ, respectively. The revised positions of the MJ and FMJ in relation to the prey are represented by $Y_1(t)$ and $Y_2(t)$. The prey's evasive energy $E$ is computed as:

$$E = E_1 \times E_0, \quad (6a)$$
$$E_0 = 2r - 1, \quad (6b)$$
$$E_1 = c_1 \times \left(1 - \frac{t}{T}\right), \quad (6c)$$

where $E_0$ depicts the beginning state of the prey's energy and $E_1$ represents the prey's declining energy, $r$ is a random value between 0 and 1, $T$ stands for the maximum iteration number, and $c_1$ is a constant value of 1.5. Consequently, $E_1$ decreases linearly across iterations from 1.5 to 0. In Equations (5a) and (5b), the distance between the jackal and the prey is calculated by $|Y(t) - rl \cdot \mathit{Prey}(t)|$. Depending on how well the prey manages to evade the jackal, this distance is either added to or deducted from the jackal's present location. The vector of random numbers $rl$ in Equations (5a) and (5b) represents the Lévy movement and is based on the Lévy flight. The prey is multiplied by $rl$ to imitate Lévy-style prey movement, which is comparable to MPA [41], and is computed as follows:

$$rl = 0.05 \times LF(y), \quad (7)$$

where $LF(y)$ is the Lévy flight function, which is calculated as follows:

$$LF(y) = 0.01 \times \frac{u \times \sigma}{|v|^{1/\beta}}, \quad \sigma = \left( \frac{\Gamma(1+\beta)\sin(\pi\beta/2)}{\Gamma\!\left(\frac{1+\beta}{2}\right)\beta\, 2^{\frac{\beta-1}{2}}} \right)^{1/\beta}, \quad (8)$$

where $\beta$ is a constant set to 1.5 and $u$, $v$ are random values inside of $(0, 1)$. The jackal positions are updated by averaging Equations (5a) and (5b), which results in the following:

$$Y(t+1) = \frac{Y_1(t) + Y_2(t)}{2}. \quad (9)$$
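To make the update concrete, here is a hedged Python sketch of the escape energy, the Lévy-flight factor $rl$, and the exploration update; it reflects our reading of Equations (5a)-(9), and the helper names and random-number generator are ours:

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def evading_energy(t, T, c1=1.5):
    """E = E1 * E0, with E1 = c1 * (1 - t/T) and E0 = 2*r - 1, r ~ U(0, 1)."""
    e0 = 2.0 * rng.random() - 1.0
    e1 = c1 * (1.0 - t / T)
    return e1 * e0

def levy_flight(dim, beta=1.5):
    """Levy-flight step LF(y), used to build rl = 0.05 * LF(y)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return 0.01 * u / np.abs(v) ** (1 / beta)

def exploration_update(prey, male, female, t, T):
    """Equations (5a), (5b) and (9): move relative to MJ and FMJ, then average."""
    e = evading_energy(t, T)
    rl = 0.05 * levy_flight(prey.size)
    y1 = male - e * np.abs(male - rl * prey)      # Eq. (5a)
    y2 = female - e * np.abs(female - rl * prey)  # Eq. (5b)
    return (y1 + y2) / 2.0                        # Eq. (9)
```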
4.1.3. Tracking and Pouncing the Prey (Exploitation Stage)
As prey are pursued by jackals, their evasive energy declines, leading to the eventual encirclement of the prey by a pair of jackals identified in an earlier phase. Once encircled, the prey is attacked and consumed by the jackals. The following mathematical model is a representation of the hunting behaviour of male and female jackals that hunt in pairs, which is as follows:
$$Y_1(t) = Y_M(t) - E \cdot \left| rl \cdot Y_M(t) - \mathit{Prey}(t) \right|, \quad (10a)$$
$$Y_2(t) = Y_{FM}(t) - E \cdot \left| rl \cdot Y_{FM}(t) - \mathit{Prey}(t) \right|, \quad (10b)$$

where $\mathit{Prey}(t)$ is the position vector of the prey during the current iteration $t$, and $Y_M(t)$ and $Y_{FM}(t)$ indicate the positions of the MJ and FMJ. The updated MJ and FMJ positions in relation to the prey are represented by $Y_1(t)$ and $Y_2(t)$. Equation (6a) determines the prey's evading energy $E$, and the jackal positions are updated in accordance with Equation (9).
The purpose of $rl$ in Equations (10a) and (10b) is to allow for arbitrary behavior in the exploitation stage, favoring exploration and avoiding local optima. $rl$ is determined using Equation (7), and in the final iterations this component helps the algorithm avoid stagnating in local optima. The factor can also be interpreted in terms of the jackals moving closer to the prey: in nature, obstacles typically prevent the jackals from moving properly and swiftly toward their prey, and this is what $rl$ models during the exploitation stage.
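Analogously, a sketch of the exploitation update of Equations (10a), (10b) and (9), reusing the `evading_energy` and `levy_flight` helpers from the previous sketch, might read:

```python
def exploitation_update(prey, male, female, t, T):
    """Equations (10a), (10b) and (9): rl now perturbs the jackal positions."""
    e = evading_energy(t, T)
    rl = 0.05 * levy_flight(prey.size)
    y1 = male - e * np.abs(rl * male - prey)      # Eq. (10a)
    y2 = female - e * np.abs(rl * female - prey)  # Eq. (10b)
    return (y1 + y2) / 2.0                        # Eq. (9)
```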
4.1.4. Switching from Exploration to Exploitation
The escape energy of the prey is utilized in the conventional GJO algorithm to transition from exploration to exploitation. Throughout the evasive behavior, the prey's energy decreases significantly; accordingly, the evasive energy is represented by Equation (6a). In every iteration, the initial energy $E_0$ varies arbitrarily within the range of $-1$ to 1. The prey is physically weakening when the $E_0$ value decreases from 0 to $-1$, whereas an increase of $E_0$ from 0 to 1 indicates an improvement in the strength of the prey.
As shown in Figure 2, the evasive energy $E$ decreases over the iteration procedure. When $|E| > 1$, the jackal pair searches for prey in different areas (exploration), and when $|E| < 1$, the jackals attack the prey and engage in predation (exploitation), as depicted in Figure 1.
To sum up, the conventional GJO search procedure starts with the random generation of a population of prey (possible solutions). The MJ and FMJ hunting couple estimates the location of the prey during iterations, and each prospective member of the population updates its distance from the jackal pair. To balance exploration and exploitation, the $E_1$ parameter is decreased from 1.5 to 0. When $|E| > 1$, the hunting pair of golden jackals diverges from its victim to explore, and when $|E| < 1$, it converges on the prey. The conventional GJO algorithm terminates once an end criterion is satisfied. Algorithm 1 presents the conventional GJO algorithm's pseudo-code.
Algorithm 1 Conventional Golden Jackal Optimization

Require: The size of population $N$, solution dimension $D$, the max number of iterations $T$, lower and upper bounds $LB$, $UB$, the fitness function, the golden jackal positions $Y_M$, $Y_{FM}$, etc.
Ensure: The best solution $Y_M$ found in the search process
1: Initialize the population through a random mechanism
2: while $t < T$ do
3: Calculate the fitness values of the preys
4: $Y_M \leftarrow$ best prey (male jackal position)
5: $Y_{FM} \leftarrow$ second best prey (female jackal position)
6: for each prey do
7: Update the evading energy $E$ using Equations (6a), (6b) and (6c)
8: Update $rl$ using Equations (7) and (8)
9: if $|E| > 1$ then // Exploration phase
10: Update the prey position using Equations (5a), (5b) and (9)
11: if $|E| \le 1$ then // Exploitation phase
12: Update the prey position using Equations (10a), (10b) and (9)
13: end for
14: $t \leftarrow t + 1$
15: end while
16: return $Y_M$
4.2. The Proposed IBGJO
This section introduces the enhancement factors in the proposed IBGJO algorithm, including the population initialization strategy based on the chaotic tent map, the position update mechanism based on cosine similarity, and the sigmoid function used to discretize the continuous solution space. Finally, the complexity of the IBGJO algorithm is analyzed.
4.2.1. Chaotic Tent Map for Population Initialization
In the conventional GJO algorithm, the initial population is generated randomly, which can make it difficult to retain population diversity and hinder the algorithm's ability to reach the optimal solution. In contrast, the chaotic tent map (CTM) mechanism is characterized by randomness, ergodicity, and regularity. It can be used either to generate the initial population or as a perturbation during the optimization process [37,42]. This approach reduces the risk of the algorithm becoming trapped in a suboptimal local solution, thereby improving its search efficiency compared with the original algorithm. The CTM mechanism is described in Algorithm 2.
Algorithm 2 Chaotic Tent Map (CTM) Mechanism

Define and initialize the related parameters: the size of population $N$, solution dimension $D$, chaotic tent map threshold $a$, lower boundary $LB$ and upper boundary $UB$.
1: for $i = 1$ to $N$ do
2: for $j = 1$ to $D$ do
3: if $z_{i,j} < a$ then
4: $z_{i,j+1} = z_{i,j} / a$
5: else
6: $z_{i,j+1} = (1 - z_{i,j}) / (1 - a)$
7: $x_{i,j} = LB + z_{i,j} \times (UB - LB)$
8: return the chaotic sequence $x$
9: for $i = 1$ to $N$ do
10: for $j = 1$ to $D$ do
11: if $x_{i,j} \le 0.5$ then
12: $x_{i,j} = 0$
13: else
14: $x_{i,j} = 1$
15: return $x$
where $a$ is the tent map's threshold, generally set to 0.5. In IBGJO, we use the CTM as the initialization mechanism. Considering the different dimensions of the datasets, we take the Hillvalley dataset from the 28 datasets as an example of population initialization. As shown in Figure 3, the number of golden jackals in the population is 20 and the dimension is 100; the points produced by random population initialization are denoted in red, while the points produced by CTM population initialization are shown in blue. As can be seen, compared with the random mechanism, the CTM mechanism exhibits better distribution and randomness. Therefore, the initialized population is spread more evenly over the search space, which is more conducive to the algorithm's optimization efficiency and solution accuracy.
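A minimal sketch of CTM-based initialization follows (the threshold $a = 0.5$ is taken from the text above; the exact way the chaotic sequence is mapped into $[LB, UB]$ is our assumption):

```python
import numpy as np

def ctm_init(n, d, lb, ub, a=0.5, seed=2):
    """Chaotic tent map population initialization (continuous part of Algorithm 2)."""
    rng = np.random.default_rng(seed)
    pop = np.empty((n, d))
    for i in range(n):
        z = rng.random()                                    # chaotic state of individual i
        for j in range(d):
            z = z / a if z < a else (1.0 - z) / (1.0 - a)   # tent map iteration
            pop[i, j] = lb + z * (ub - lb)                  # map into the search space
    return pop

pop = ctm_init(20, 100, lb=0.0, ub=1.0)   # e.g., 20 jackals, 100 dimensions (Hillvalley example)
```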
4.2.2. Cosine Similarity for Position Update
The conventional GJO algorithm updates the position of the jackals by Equation (9) during the iteration process, which is equivalent to taking the mean of the two candidate positions. Although this method ensures smooth position updates, it has some drawbacks. The most obvious flaw is that it does not consider the correlation between different features. When features are correlated, the mean update mechanism may cause some features to be overemphasized or ignored, thereby affecting the model's performance. In addition, when the data distribution is uneven, the mean update mechanism may lead to poor prediction performance on specific data. Therefore, we propose cosine similarity (CS) for updating the positions of the FMJ and MJ. Compared with the mean update mechanism, using cosine similarity takes the correlation between different features into account and thus updates the positions more accurately. In addition, cosine similarity is not affected by vector length or data distribution and is suitable for high-dimensional data [43]. The mathematical model of cosine similarity is defined as follows:

$$\cos(\theta) = \frac{Y_{FM} \cdot Y_M}{\|Y_{FM}\|\,\|Y_M\|}, \quad (11)$$

where $Y_{FM}$ and $Y_M$ represent the positions of the FMJ and MJ, respectively, and $\cdot$ denotes the dot product. $\|Y_{FM}\|$ and $\|Y_M\|$ represent the lengths (norms) of the FMJ and MJ position vectors, respectively. The value range of $\cos(\theta)$ is $[-1, 1]$. In this paper, we compute the cosine similarity between the golden jackal pair and use its absolute value as the weight of the position update in Equation (12). The pseudo-code of the resulting IBGJO is given in Algorithm 3.
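Since the exact weighted form of Equation (12) is a design choice of the algorithm, the sketch below only illustrates one plausible reading, in which $|\cos(\theta)|$ weights the MJ-based candidate position and its complement weights the FMJ-based one; all names here are ours:

```python
import numpy as np

def cosine_similarity(y_fmj, y_mj):
    """Equation (11): cos(theta) = (Y_FMJ . Y_MJ) / (||Y_FMJ|| * ||Y_MJ||)."""
    denom = np.linalg.norm(y_fmj) * np.linalg.norm(y_mj)
    return float(np.dot(y_fmj, y_mj) / denom) if denom > 0 else 0.0

def cs_position_update(y1, y2, y_fmj, y_mj):
    """Assumed form of Equation (12): |cos(theta)| weights the two candidate positions."""
    w = abs(cosine_similarity(y_fmj, y_mj))
    return w * y1 + (1.0 - w) * y2
```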
Algorithm 3 Improved Binary Golden Jackal Optimization

Require: The size of population $N$, solution dimension $D$, the max number of iterations $T$, lower and upper bounds $LB$, $UB$, the fitness function, the golden jackal positions $Y_M$, $Y_{FM}$, etc.
Ensure: The best solution $Y_M$ found in the search process
1: Initialize the population through the chaotic tent map mechanism of Algorithm 2
2: while $t < T$ do
3: Calculate the fitness values of the preys
4: $Y_M \leftarrow$ best prey (male jackal position)
5: $Y_{FM} \leftarrow$ second best prey (female jackal position)
6: for each prey do
7: Update the evading energy $E$ using Equations (6a), (6b) and (6c)
8: Update $rl$ using Equations (7) and (8)
9: if $|E| > 1$ then // Exploration phase
10: Update the prey position using Equations (5a), (5b) and (12)
11: if $|E| \le 1$ then // Exploitation phase
12: Update the prey position using Equations (10a), (10b) and (12)
13: end for
14: $t \leftarrow t + 1$
15: end while
16: return $Y_M$
4.2.3. Sigmoid Binary Mechanism
The solutions in conventional GJO are continuous and can be updated directly using Equations (5a), (5b), (10a) and (10b). However, the solution space of the formulated feature selection problem is discrete, which conventional GJO cannot handle. Therefore, a binary mechanism is introduced to map the solutions from the continuous space to the discrete one, making the algorithm suitable for feature selection. For the solution mappings in this work, the commonly used S-shaped transfer function [44], i.e., the sigmoid function, is applied to both conventional GJO and IBGJO. The binary mechanism is elucidated in Equations (13) and (14) as follows:

$$S(Y_{i,j}) = \frac{1}{1 + e^{-Y_{i,j}}}, \quad (13)$$

$$Y_{i,j}^{bin} = \begin{cases} 1, & S(Y_{i,j}) \ge r \\ 0, & \text{otherwise,} \end{cases} \quad (14)$$

where $Y_{i,j}^{bin}$ is the converted binary solution of the feature selection problem, and $r$ is a random number in $[0, 1]$ used as the threshold.
Figure 4 presents the binary mechanism that we used in this paper.
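A small sketch of this binarization step is given below; whether the random threshold $r$ is drawn once per solution or per dimension is our assumption (we draw it per dimension):

```python
import numpy as np

def binarize(position, rng=None):
    """S-shaped transfer (Eq. (13)) followed by random-threshold binarization (Eq. (14))."""
    rng = rng or np.random.default_rng()
    s = 1.0 / (1.0 + np.exp(-np.asarray(position, dtype=float)))  # sigmoid transfer
    r = rng.random(s.shape)                                       # random threshold r
    return (s >= r).astype(int)                                   # 1 = feature kept, 0 = dropped
```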
4.3. Feature Selection Based on IBGJO
When employing the proposed IBGJO, a solution to the formulated feature selection problem can be viewed as a golden jackal. Consequently, a solution can be stated as follows:

$$X_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,D}], \quad x_{i,j} \in \{0, 1\},$$

where $D$ represents the number of features while $N$ is the number of individuals; thus, the IBGJO population is expressed as follows:

$$X = [X_1, X_2, \ldots, X_N]^{T}.$$
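For example, a binary jackal can be decoded into the corresponding feature subset as follows (a small sketch; the toy data and column indexing are ours):

```python
import numpy as np

def select_features(X, solution):
    """Keep only the columns of X whose bits are 1 in the binary solution."""
    mask = np.asarray(solution, dtype=bool)
    return X[:, mask]

X = np.random.rand(150, 8)                    # toy dataset: 150 samples, 8 features
jackal = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # one binary solution with D = 8
X_subset = select_features(X, jackal)         # shape (150, 4)
```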
4.4. Computational Complexity
The complexity of the conventional GJO algorithm depends on various factors, including the population size $N$, the solution dimension $D$, and the number of iterations $T$. Either the exploration phase or the exploitation phase is performed for each individual in each iteration. Therefore, the overall time complexity of conventional GJO consists of the exploration and exploitation phases and is given as follows:

$$O(\mathrm{GJO}) = O(T \times N \times D).$$

Since the structure of the proposed IBGJO is similar to that of conventional GJO, the computational complexity of IBGJO is also determined to be $O(T \times N \times D)$. As a result, for a given feature selection problem, IBGJO does not require noticeably more computation time than conventional GJO, as both algorithms possess equivalent computational complexity. Notably, the average execution time of IBGJO in the experimental results is even better than that of conventional GJO; this may be attributed to the enhancement factors employed in IBGJO, which improve its search ability and promote faster convergence.
5. Experiments and Analysis
In this section, we conduct experiments to evaluate the performance of the proposed IBGJO algorithm for dealing with feature selection problems. First, the datasets and setups used in the experiments are introduced. Then, the results obtained by IBGJO and several comparison algorithms are presented and analyzed.
5.1. Datasets and Setup
This subsection describes the datasets used in the experiments and the corresponding parameter settings.
5.1.1. Benchmark Datasets
This section introduces the benchmark datasets used to evaluate the different algorithms. Because the UCI repository covers multiple fields, such as life, social, and physical sciences, many research works use UCI datasets as benchmark data; for example, 10, 14, 16 and 20 UCI datasets were respectively selected as experimental data in [21,45,46,47]. The datasets used in our experiments are therefore drawn from those works and have been expanded to 28 datasets. By using these well-known datasets, we intend to facilitate comparisons with existing algorithms and provide a basis for future research. The primary information of these datasets is shown in Table 1.
5.1.2. Experiment Setup
We compare IBGJO with several other algorithms for the feature selection experiments, including BCS, BGWO, BHBA, BMPA, and BGJO. It should be noted that all algorithms use the same binary mechanism, and BGJO is the binary version of the conventional GJO algorithm. The IBGJO parameters are based on those of the conventional GJO algorithm, which has only one adaptive coefficient vector; unlike the other algorithms, conventional GJO and IBGJO require no additional tuning. The critical parameter choices for these algorithms are presented in Table 2, with specific values based on prior evidence of consistently strong performance in the literature for each algorithm, enabling a fair comparison.

Moreover, because both the proposed IBGJO and the comparison algorithms are meta-heuristics, the population size and the number of iterations directly impact their performance. To guarantee a fair comparison, the population size and the number of iterations must be consistent across algorithms; in this study they are set to 20 and 200, respectively. Additionally, to prevent random bias, each algorithm is independently run 30 times on the chosen datasets, as suggested by the central limit theorem. The experiments were run on an Intel(R) Core(TM) i9-12900KF CPU with 64 GB of RAM. The trials were implemented in Python 3.9.12 using a KNN classifier [48] ($k = 5$) based on Euclidean distance. It is worth noting that we follow the common approach employed in several previous works, where 80% of the instances are used for training and the remaining instances are reserved for testing. Moreover, the weights $\alpha$ and $\beta$ in the fitness function are set to fixed values for all runs.
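Under this setup (KNN with $k = 5$, Euclidean distance, an 80/20 split), the per-solution error-rate evaluation could be sketched as follows; the use of scikit-learn and the handling of empty subsets are our assumptions, not necessarily the authors' implementation:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def error_rate(X, y, solution, seed=0):
    """Train a 5-NN classifier on the selected features and return the test error."""
    mask = np.asarray(solution, dtype=bool)
    if not mask.any():                     # penalize empty feature subsets
        return 1.0
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[:, mask], y, test_size=0.2, random_state=seed)
    knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
    knn.fit(X_tr, y_tr)
    return 1.0 - knn.score(X_te, y_te)
```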
5.2. Feature Selection Results
This section presents the feature selection results of various algorithms in terms of average fitness function value, convergence speed, average accuracy, and average CPU time. Also, the best results are shown in bold.
5.2.1. Performance Evaluation
To explicitly demonstrate the performance of the various algorithms, the fitness function values they achieved are shown in Table 3, which reports the average fitness value and standard deviation (std) obtained by each algorithm on each dataset. In terms of the average fitness values on the 28 datasets, BCS, BGWO, BHBA, BMPA, BGJO, and IBGJO achieve the best performance on 3, 3, 5, 7, 9, and 14 datasets, respectively. This supports our conjecture that BGJO has good exploration ability but lacks exploitation performance, and shows that the improvement factors introduced into conventional BGJO are effective. Compared with conventional BGJO, IBGJO achieves a better average fitness value on 21 datasets. Moreover, IBGJO obtains the best stds of the fitness values on 11 datasets, which means that IBGJO is more stable than the other algorithms for feature selection.
Due to space restrictions, the convergence curves are divided into three figures, and each curve is taken from the 15th run. The convergence behavior of the various algorithms during the optimization process is shown in Figure 5, Figure 6 and Figure 7. These figures demonstrate that the proposed IBGJO has the best curves on 20 datasets and the best convergence capability among all comparison algorithms. Overall, the proposed IBGJO performs better than the other comparison algorithms in solving the formulated feature selection problem. Note that the effectiveness of the different improvement factors is further verified and discussed in Section 5.3.
5.2.2. Features Selection Accuracy of Algorithms
The feature selection accuracy obtained by the various algorithms is shown in Table 4. The IBGJO algorithm achieves the best average accuracy on 14 datasets and obtains better accuracy than conventional BGJO on 21 datasets. Thus, compared with the other algorithms, IBGJO has the best performance in terms of feature selection accuracy on these selected datasets. The reason could be that the improvement factors balance the exploration and exploitation abilities, improving the algorithm's performance. However, it is crucial to recognize that accuracy and the number of selected features form a challenging tradeoff that varies across datasets. Therefore, according to Table 4 and Table 5, it can be concluded that the proposed IBGJO algorithm displays superior overall feature selection performance across the selected datasets compared with the other algorithms.
5.2.3. Number of Selected Features
The numbers of features selected from the datasets by the various techniques are displayed in Table 5. Similar to the accuracy results, the table reports numerical statistics. BMPA obtains the best average number of selected features on the majority of the datasets (20 of 28), which may be regarded as the best result in this respect compared with the other algorithms, as shown in Table 5. Meanwhile, IBGJO selects fewer features than BGJO on 15 datasets. It is important to note that there exists a tradeoff between accuracy and the number of selected features, making it challenging to achieve optimal results for both objectives on every dataset.
5.2.4. Algorithm Execution Time
The average running time of all algorithms is shown in Table 6. Based on the data presented in Table 6, it is evident that IBGJO has an advantage in execution time: among the 28 datasets, our algorithm converges in the least average time on 19 datasets. This means that our algorithm has higher efficiency and faster convergence and can select the best subset of features in less time, thereby improving the performance of the model. This result shows that our algorithm has higher practicability and feasibility in practical applications.
5.3. Effectiveness of the Improved Factors
In this section, we conduct experiments to evaluate the effectiveness of the factors introduced in IBGJO. To observe whether these factors can improve the performance of BGJO, we use BGJO, BGJO with the CTM mechanism (T-BGJO), BGJO with CS (C-BGJO), and BGJO with both the CTM mechanism and CS (IBGJO) to solve the formulated feature selection problem. The tests are conducted on nine selected datasets: Arrhythmia, Diabetes, Heart-StatLog, Ionosphere, Krvskp, Lung, Parkinsons, Thyroid and WDBC. The numerical results obtained by these algorithms are listed in Table 7. Overall, all algorithms obtain the same results on the Diabetes dataset; this may be because this dataset has the lowest solution dimension, making it easy to solve. Furthermore, the convergence behavior of the different improvement factors during optimization is shown in Figure 8. The remaining outcomes are discussed in detail as follows.
5.3.1. Effectiveness of the Chaotic Tent Map (CTM) Mechanism
It can be seen from Table 7 that, compared with the conventional BGJO algorithm, the T-BGJO algorithm does not have a clear advantage in fitness function value or classification accuracy. However, T-BGJO consistently selects fewer features than BGJO. Therefore, the CTM mechanism mainly contributes to reducing the number of selected features, and combining it with the cosine similarity factor helps IBGJO obtain better overall performance.
5.3.2. Effectiveness of Cosine Similarity Position Update
As shown in Table 7, C-BGJO outperforms BGJO and T-BGJO in terms of classification accuracy and the fitness values obtained on most datasets, especially the medium-dimensional ones. This is due to the ability of the proposed CS position-updating mechanism to adaptively modify the search scope and thus enhance BGJO's exploration capability. Note that, compared with the position update mechanism of conventional BGJO, CS requires additional computation in each iteration, which could increase the convergence time; however, it makes the search capability of IBGJO more robust.
5.3.3. Effectiveness of CTM and CS
It can be seen from Table 7 that the CTM mechanism effectively improves the representativeness and diversity of the initial population through the chaotic tent map and prevents the algorithm from falling into local optima, while using CS as the jackal position update strategy accelerates convergence and helps IBGJO converge faster. The combination of the two enhancement factors effectively improves the performance of BGJO for feature selection.

To summarize, incorporating the two enhancement factors and the binary mechanism elevates the performance of the conventional BGJO algorithm and makes it well suited for feature selection. Furthermore, these components exhibit a complementary relationship. For instance, utilizing the CTM mechanism alone on small-sized datasets may cause the algorithm to encounter local optima frequently; incorporating CS is essential to address this problem.
5.4. Limitation of IBGJO
Although the experimental results show that the proposed IBGJO algorithm outperforms the comparative algorithms, it still has some limitations. One limitation is its sensitivity to parameter settings, which requires careful tuning for optimal performance. Additionally, the scalability of IBGJO to large-scale or high-dimensional datasets is a concern, as its computational cost may become prohibitive. The generalization of IBGJO to different domains and problem types also needs further exploration, as specific data characteristics may influence its performance. Furthermore, the interpretability of the selected feature subsets is not guaranteed, as the algorithm prioritizes classification performance over intuitive feature combinations. The effectiveness and applicability of IBGJO could be improved by first reducing the dimensionality of the original dataset, then applying the IBGJO algorithm for feature selection, and further optimizing the algorithm parameters.
6. Conclusions
The focus of this research work is to improve the classification performance of machine learning by addressing the issue of feature selection. An improved version of the BGJO algorithm, referred to as the IBGJO algorithm, is proposed to solve the feature selection problem. The IBGJO algorithm incorporates the CTM mechanism, CS location updating mechanism, and S-shape binary mechanism, designed to improve the performance of conventional BGJO and make it suitable for feature selection problems. Utilizing these improved factors allows the algorithm to balance its exploitation and exploration abilities while maintaining population diversity.
We conducted experiments on 28 well-known datasets and found that IBGJO outperforms other state-of-the-art algorithms such as BCS, BGWO, BHBA, BMPA and BGJO in terms of feature selection. We also evaluated the effectiveness of the improvement factors and found that they help to enhance the performance of the conventional golden jackal optimization algorithm. In the future, we plan to propose additional ways of updating the population positions and to combine them with other evolutionary algorithms to tackle a broader range of optimization problems.