1. Introduction
Researchers worldwide have paid increasing attention to handling massive datasets in recent years, driven by the rapid growth of data. These datasets can contain duplicate and irrelevant features. Thus, in order to solve this issue and increase the effectiveness of both supervised and unsupervised learning algorithms, feature selection is crucial [1,2,3]. Finding the best subset of features in high-dimensional datasets is clearly a challenging task, because a dataset with N features contains 2^N feature subsets. Traditional mathematical techniques cannot produce the desired outcome in an acceptable amount of time; hence, meta-heuristic algorithms are frequently used for feature selection problems.
Metaheuristic algorithms have been successfully used to solve complex engineering and science computation problems, such as function optimization [4,5,6], engineering optimization [7,8,9], and feature selection problems [10]. Researchers have proposed binary metaheuristic algorithms or improved versions for feature selection, such as binary particle swarm optimization (BPSO) [11], the binary artificial bee colony (BABC) [12], the binary gravitational search algorithm (BGSA) [13], binary grey wolf optimization (BGWO) [14], the binary salp swarm algorithm (BSSA) [15], the binary bat algorithm (BBA) [16], the binary whale optimization algorithm (BWOA) [17], binary spotted hyena optimization (BSHO) [18], binary emperor penguin optimization (BEPO) [19], binary Harris hawks optimization (BHHO) [20], binary equilibrium optimization (BEO) [21], binary atom search optimization (BASO) [22], the binary dragonfly algorithm (BDA) [23], the binary jaya algorithm (BJA) [24], binary coronavirus herd immunity optimization (BCHIO) [25], the binary butterfly optimization algorithm (BBOA) [26], binary black widow optimization (BBWO) [27], the binary slime mould algorithm (BSMA) [28], binary golden eagle optimization (BGEO) [29], and so on.
Metaheuristics are employed to find the best solution to complex optimization problems. Several agents facilitate the search process through a population of solutions that evolves over many iterations according to a set of rules or mathematical equations. These iterations continue until the result satisfies a set of predetermined requirements. The final (near-optimal) solution is then taken as the optimal solution, and the system is considered to have reached a state of convergence [30].
In contrast to exact methods, which find optimal solutions but require a long computational time, heuristic methods find near-optimal solutions quite quickly [31]. However, most of these methods are problem-specific. As the word “meta” indicates, metaheuristics are one level higher than heuristics. Metaheuristics have been very successful because they can provide solutions at an acceptable computational cost. By combining good heuristics with classical metaheuristics, very good solutions can be obtained for many real-world problems.
Binary particle swarm optimization (BPSO) and its variants have been widely used for the FS problem. In 2020, the authors of ref. [32] proposed a self-adaptive PSO with a local search strategy to find less-correlated feature subsets. The authors of ref. [33] proposed an improved version of the SSA algorithm, called the ISSA, to solve the FS problem. Furthermore, a binary chaotic horse herd optimization algorithm for feature selection (BCHOAFS) was proposed by Esin Ayşe Zaimoğlu [34]. The authors of ref. [35] used four families of transfer functions in the binary AOA (BAOA) to test 10 low-dimensional and 10 high-dimensional datasets. Several wrapper-based feature selection algorithms using binary meta-heuristic algorithms have been proposed, and the S-shaped and V-shaped transfer functions are most commonly used, as in the modified binary SSA (MBSSA) [36].
In recent years, to increase the efficiency of transfer functions, time-varying S-shaped and V-shaped transfer functions have been proposed and applied to the binary dragonfly algorithm (BDA) [37] and the BMPA [38]. The results showed that the time-varying S-shaped transfer function performs better than the V-shaped one. Besides single-objective algorithms, multi-objective algorithms also play an important role in the feature selection problem. A multi-objective whale optimization algorithm (WOA) [39] was proposed for data classification, in which filter and wrapper fitness functions were optimized simultaneously. Another paper studied a multi-label feature selection algorithm using improved multi-objective particle swarm optimization (PSO) [40], with the purpose of searching for a Pareto set of non-dominated solutions (feature subsets). Liu proposed a novel feature selection method utilizing a filtered and supported sequential forward search technique, called multi-objective ant colony optimization (MOACO), in the context of support vector machines (SVMs) [41]. A new method known as the improved multi-objective salp swarm algorithm (IMOSSA) has also been tested on a feature selection task [42]. Other kinds of transfer functions have been used in feature selection problems as well. An automata-based improved BEO (AIEOU) [43] used a U-shaped transfer function to select the best subset of features; this algorithm applied both learning automata and adaptive β-hill climbing (AβHC) to find the best parameters to form a better equilibrium pool. Many binary metaheuristic algorithms have been introduced to select the best subset of features. However, given the importance of feature selection in many fields, it is still necessary to design algorithms that can obtain higher accuracy with a smaller subset of features.
An important step in the feature selection problem is mapping a continuous space to a binary one, and the transfer function plays a significant role in this process. Moreover, using transfer functions is one of the easiest ways to convert an algorithm from continuous to binary without modifying its structure. The shapes of transfer functions are classified into two families: S-shaped and V-shaped [44]. An S-shaped transfer function increases monotonically within the [0, 1] interval; its main attribute is that it is asymmetric, with a single saturation value over its entire domain. A V-shaped transfer function is symmetric, so two values of equal magnitude but opposite sign map to the same probability in the [0, 1] interval, which enhances the diversity of the population within that range.
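For concreteness, the two families can be sketched as follows; the sigmoid and |tanh| forms below are the most common representatives in the literature [44], not necessarily the exact functions used in any one algorithm:

```python
import math

# Representative S-shaped and V-shaped transfer functions: both map a real
# step value to a probability in [0, 1].
def s_shaped(x):
    """Sigmoid: monotonically increasing and asymmetric about x = 0."""
    return 1.0 / (1.0 + math.exp(-x))

def v_shaped(x):
    """|tanh|: symmetric, so x and -x give the same probability."""
    return abs(math.tanh(x))
```

Note that s_shaped(0) = 0.5 while v_shaped(0) = 0, which is why V-shaped functions tend to preserve positions for small steps.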
A common drawback of the usual transfer functions used in binary algorithms is that they do not evolve during the search for the optimal solution; that is, they change the probability of flipping parameter values in a nonadaptive manner. Thus, they provide poor exploration or exploitation, being static functions that cannot change as time goes on.
In this paper, a powerful transfer function is proposed to convert continuous search spaces into binary ones. The transfer function is dynamic during this process, which enhances the search ability of the BGOA in the exploration phase. The transfer function then changes gradually while the proposed algorithm switches from exploration to exploitation, finally reaching a good result. In the end, the K-nearest neighbors (KNN) algorithm is applied to classify the selected feature subset.
The main contributions of this paper can be summarized as follows:
A time-varying Gaussian transfer function is introduced.
A new binary grasshopper optimization algorithm based on time-varying Gaussian transfer functions (BGOA-TVG) is proposed.
The BGOA-TVG achieves a balance between its exploration and exploitation capabilities and improves the convergence speed of the algorithm.
The BGOA-TVG can effectively deal with high-dimensional feature selection problems.
Compared with binary metaheuristic optimization algorithms proposed in recent years, the excellent performance of the BGOA-TVG is verified.
The rest of the paper is organized as follows: Section 2 presents a brief introduction to the feature selection problem. In Section 3, the basic grasshopper optimization algorithm is discussed. The enhanced transfer function is presented in Section 4. Section 5 shows the results of the tests. In Section 6, the proposed method is demonstrated within the EEG analysis field. Finally, Section 7 concludes the paper and suggests some directions for future studies.
2. Feature Selection Problem
The feature selection problem is an NP-hard optimization problem [45]: as the number of features in a dataset increases, the search space of the problem grows exponentially. Feature selection is a useful way to find a relevant subset of fewer features from an initial dataset to reduce dimensions and training times [46]. However, traditional mathematical methods cannot solve high-dimensional feature selection problems in a reasonable time, and tests show that metaheuristic algorithms are better at finding subsets of features [47,48,49,50,51,52,53,54]. There are three selection strategies in feature selection: wrapper-based, filter-based, and hybrid filter–wrapper-based methods [55]. In the wrapper-based strategy, the optimal subset is determined by the precision of the learning algorithm. In the filter-based method, the chosen subset is unrelated to the learning process. The hybrid approach combines both methods. The wrapper-based method outperforms the others in terms of accuracy, but it requires more CPU resources and a longer testing time. The authors of [56,57,58] provided a new method to extract the optimal feature subset to enhance the accuracy of the calculation. Others have proposed correlation feature selection [59] and in-depth analyses of the use of search methods such as the best-first, greedy step-wise, genetic, linear forward selection, and rank searches [60].
A feature selection module is applied prior to the classification method to optimize efficiency and precision by eliminating irrelevant features and to reduce the time complexity of determining the class to which a document belongs [61].
The most important thing in feature selection is to extract subsets and determine whether to select each element of the set according to the accuracy rate. In the binary algorithm, individuals traverse a set and indicate whether an element is selected using 0s and 1s. In Equation (1), N is the number of features in the dataset and S is a subset selected by the algorithms; in the binary algorithm, it is determined by the values of an individual.
Feature selection is a multi-objective optimization problem for which we aim to minimize the subset of selected features and maximize the classification accuracy, which is described as a fitness function as follows:

Fitness = α · γ + β · |S|/|N|

where γ is the resulting classification error, |S| is the number of selected features of the subset, |N| is the total number of features of the dataset, |S|/|N| is the feature selection ratio of the subset to the total set, and α and β are parameters in the interval [0, 1] with β = 1 − α.
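As a sketch, the weighted fitness above can be implemented as follows, assuming β = 1 − α; the classification error would normally come from a wrapper classifier such as KNN, but here it is passed in directly:

```python
# Sketch of the weighted feature-selection fitness: alpha weights the
# classification error, beta = 1 - alpha weights the selection ratio.
def fitness(mask, error, alpha=0.99):
    """mask: list of 0/1 flags, one per feature; error: classification error."""
    beta = 1.0 - alpha
    n_selected = sum(mask)
    n_total = len(mask)
    return alpha * error + beta * (n_selected / n_total)
```

A subset with a lower error and fewer selected features yields a lower (better) fitness value, so both objectives are minimized jointly.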
3. Grasshopper Optimization Algorithm (GOA)
The grasshopper optimization algorithm is a population-based swarm intelligence algorithm introduced by Mirjalili et al. in 2017 [62], which models the behaviour of grasshopper swarms in nature. There are two essential phases in this algorithm: the exploration and exploitation of the search space. Through social interactions during the food search process, the swarm of grasshoppers switches between the phases. The swarm moves slowly and covers small distances in the larval stage. In contrast, it moves quickly and covers large distances in adulthood.
There are three evolutionary operators in the position-updating process of individuals in swarms [62]: the social interaction operator S_i, the gravity force operator G_i, and the wind advection operator A_i in Equation (2). The movement of individuals in the swarm is described as follows:

X_i = S_i + G_i + A_i  (2)

where X_i defines the position of the i-th grasshopper.
The social interaction operator in Equation (2) is calculated as shown in Equation (3):

S_i = Σ_{j=1, j≠i}^{N} s(d_ij) d̂_ij  (3)

where N is the number of grasshoppers in the swarm, d_ij = |x_j − x_i| represents the distance between the i-th and the j-th grasshopper, s is a function that defines the strength of the social forces and is calculated as shown in Equation (4), and d̂_ij = (x_j − x_i)/d_ij is the unit vector from the i-th grasshopper to the j-th.
The strength of the social forces is defined as

s(r) = f e^{−r/l} − e^{−r}  (4)

where f and l are two constants that indicate the intensity of attraction and the attraction length scale, respectively, and r is a real value.
The gravity force operator G_i in Equation (2) is calculated as shown in Equation (5) below:

G_i = −g ê_g  (5)

where g is the gravitational constant, and ê_g is a unity vector towards the center of the earth. The minus sign preceding g represents the effect of an individual's flight in overcoming gravity.
The wind advection operator A_i in Equation (2) is calculated as shown in Equation (6) below:

A_i = u ê_w  (6)

where u is a constant drift, and ê_w is a unity vector in the direction of the wind.
Equation (2) can be expanded to Equation (7) as follows:

X_i = Σ_{j=1, j≠i}^{N} s(|x_j − x_i|) (x_j − x_i)/d_ij − g ê_g + u ê_w  (7)
However, the mathematical model in Equation (7) cannot be used directly to solve optimization problems, mainly because the grasshoppers quickly reach their comfort zone and the swarm does not converge to a specified point, according to a test in ref. [62]. The authors of the GOA suggested a modified version of Equation (7), shown in Equation (8), to solve optimization problems [62]: the gravity operator is not considered (the gravity factor is set to 0), and the wind direction is always assumed to point towards a target. Accordingly, Equation (2) becomes Equation (8) as follows:

X_i^d = c ( Σ_{j=1, j≠i}^{N} c ((ub_d − lb_d)/2) s(|x_j^d − x_i^d|) (x_j − x_i)/d_ij ) + T̂_d  (8)

where ub_d is the upper bound in the d-th dimension, lb_d is the lower bound in the d-th dimension, and T̂_d is the value of the d-th dimension in the target (the best solution found so far). The coefficient c reduces the comfort zone in proportion to the number of iterations and is calculated in Equation (9) as follows:

c = c_max − t (c_max − c_min)/L  (9)
where c_max is the maximum value, c_min is the minimum value, t indicates the current iteration, and L is the maximum number of iterations. In ref. [62], c_max = 1 and c_min = 0.00001 are used. Equation (8) shows that the next position of a grasshopper is defined based on its current position, the positions of all other grasshoppers, and the position of the target. Algorithm 1 shows the pseudocode of the GOA algorithm.
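The two scalar ingredients above can be sketched directly; the constants f = 0.5 and l = 1.5, like c_max = 1 and c_min = 0.00001, are the values reported for the original GOA [62]:

```python
import math

# Social force strength s(r), Equation (4), with the original GOA constants
# f = 0.5 (attraction intensity) and l = 1.5 (attraction length scale).
def s(r, f=0.5, l=1.5):
    return f * math.exp(-r / l) - math.exp(-r)

# Linearly decreasing comfort-zone coefficient c, Equation (9).
def comfort_coeff(t, max_iter, c_max=1.0, c_min=0.00001):
    return c_max - t * (c_max - c_min) / max_iter
```

With these constants, s(r) is negative (repulsion) at short distances and positive (attraction) at longer ones, which creates the comfort zone between the two regimes.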
4. Our Proposed BGOA-TVG Method
A binary search space is commonly considered a hypercube [63] (a four-dimensional hypercube, for instance, is formed by moving a three-dimensional cube). In this search space, the search agents of a binary optimization algorithm can only move to nearer or farther corners of the hypercube by flipping various numbers of bits. Therefore, to design the binary version of the GOA, the velocity- and position-updating processes should be modified.
In the continuous version of the GOA, the swarm of grasshoppers moves around the search space utilizing direction vectors, and the position values lie in the continuous real domain. In the binary space, since only two values (“0” and “1”) are dealt with, the position cannot be updated directly using Equation (8). The way to change the position and velocity is outlined below.
In binary spaces, position updating means switching between the values “0” and “1”. This switching should be based on the probability of updating the binary solution’s elements from 0 to 1 and vice versa. The main problem here is how to transfer the concept of velocity from the real domain to a binary space.
To achieve this, a transfer function is important for mapping velocity values to probability values for updating the positions. In other words, a transfer function defines the probability of changing a position element from 0 to 1 and vice versa. In general, transfer functions force search agents to move in a binary space. According to ref. [64], the following concepts should be taken into consideration when selecting a transfer function to map velocity values to probability values:
(1) The range of a transfer function should be bounded in the interval [0, 1], as this represents the probability that a particle will change its position.
(2) A transfer function should have a high probability of changing position for large absolute values of velocity. Particles with large absolute values for their velocities are probably far from the best solution, so they should switch their positions in the next iteration.
(3) A transfer function should also have a small probability of changing position for small absolute values of velocity.
(4) The return value of a transfer function should increase as the velocity rises. Particles that are moving away from the best solution should have a higher probability of changing their position vectors in order to return to their previous positions.
(5) The return value of a transfer function should decrease as the velocity is reduced.
These concepts guarantee that a transfer function is able to map the process of searching from a continuous search space to a binary search space while preserving similar concepts of the search for a particular evolutionary algorithm. Following BPSO, the summation term in Equation (8), defined as ΔX in Equation (10), is analogous to the velocity vector (step) in PSO [65]:

ΔX_i^d = c ( Σ_{j=1, j≠i}^{N} c ((ub_d − lb_d)/2) s(|x_j^d − x_i^d|) (x_j − x_i)/d_ij )  (10)

The transfer function defines the probability of updating the binary solution’s elements from 0 to 1 and vice versa. In the BGOA, the probability of changing the position elements is based on the step vector values.
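The step-to-bit mechanism can be sketched as follows; the V-shaped |tanh| mapping is one common choice from the literature, and the helper name is illustrative:

```python
import math
import random

# Sketch of how a step value is turned into a bit update: the step `delta`
# is mapped to a flip probability by a V-shaped transfer function, and the
# bit is flipped with that probability.
def update_bit_v(bit, delta):
    prob = abs(math.tanh(delta))   # large |delta| -> high flip probability
    if random.random() < prob:
        return 1 - bit             # flip 0 -> 1 or 1 -> 0
    return bit
```

A step of zero never flips the bit, while a large step almost always does, matching concepts (2), (3), and (4) above.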
New time-varying transfer functions are proposed to enhance the search ability of the BGOA in the search space. The first transfer function (time-varying sin) is proposed to convert positions in the continuous space into the binary search space, and the binary position is mapped as shown in Equations (11)–(13), where the control parameter increases linearly over the iterations, so that Equation (11) switches the algorithm smoothly from the exploration phase to the exploitation phase.
Figure 1 shows the time-varying transfer function. It enhances the capability of the exploration in the first phase, as shown by the blue curve in
Figure 1. In this phase, the diversity is extremely high, so the swarm can search all of the space. The red curve shows the phase between exploration and exploitation, which has a lower level of diversity than the first phase and searches more around the good solutions. The last phase, shown by the purple curve, changes slowly for the last iterations.
To avoid local optima, the GOA uses Equation (8) to update the best solution. In the BGOA-TVG, a new time-varying V-shaped transfer function combined with a Gaussian mutation is proposed, as shown in Figure 2. The binary solutions are generated based on the TVG, as shown in Equation (14), and are defined as follows:

where μ is in the range of [0.05, 10], and σ is in the range of [0.01, 10], so that the function switches efficiently from the exploration to the exploitation phase over time.
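A minimal illustrative sketch of a time-varying Gaussian transfer function is given below; the Gaussian form and the linear schedules for μ and σ are our assumptions for illustration, with only the parameter ranges [0.05, 10] and [0.01, 10] taken from the text:

```python
import math

# Illustrative time-varying Gaussian transfer function.  A wide Gaussian
# early in the run keeps flip probabilities high (exploration); a narrow
# one late in the run concentrates them (exploitation).
def tvg(x, t, max_iter, mu_max=10.0, mu_min=0.05, sig_max=10.0, sig_min=0.01):
    frac = t / max_iter
    mu = mu_max - frac * (mu_max - mu_min)        # assumed linear decrease
    sigma = sig_max - frac * (sig_max - sig_min)  # assumed linear decrease
    return math.exp(-((abs(x) - mu) ** 2) / (2.0 * sigma ** 2))
```

The returned value always lies in (0, 1], so it can be used directly as a flip probability.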
In Figure 2, the blue curve is the initial status of the combined function, which provides both exploration and exploitation, and the purple curve is the final status, which provides maximal exploitation. Because the parameters μ and σ in the function change with each iteration, the intermediate transfer function also changes constantly. The yellow curves depict diverse scenarios characterized by varying parameters.
Algorithm 1 shows the pseudocode of the BGOA-TVG algorithm. Ref. [65] showed that when the distances between grasshoppers are normalized to [1, 4], individuals can experience both attraction and repulsion forces, which balances exploration and exploitation in the algorithm. Hence, we normalize the distances between individuals to the closed interval [1, 4]. Figure 3 shows the flowchart of the proposed algorithm.
Algorithm 1: Pseudocode of the BGOA-TVG algorithm.
Initialize c_max, c_min, and Max_Iterations
Initialize a population of solutions X_i (i = 1, 2, …, n)
Evaluate each solution in the population
Set T as the best solution
While (t < Max_Iterations)
    Update c using Equation (9)
    For each search agent
        Normalize the distances between grasshoppers in [1, 4]
        Update the step vector ΔX of the current solution using Equation (10)
        For i = 1 : dim
            Use Equation (8) to obtain the current position
            Use Equations (10)–(13) to obtain the binary position
            Use Equations (14)–(17) to obtain the final position
            Calculate the flip probability based on Equations (13), (15), and (16)
        End
        Reevaluate the fitness of each individual in the population
        If there is a better solution, replace T with it
        Update T
    End
End
Return T
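The overall structure of Algorithm 1 can be sketched as a highly simplified, self-contained loop; the toy fitness, the reduced step rule, and all parameter values below are illustrative stand-ins, not the paper's exact method:

```python
import math
import random

# Highly simplified driver loop mirroring the structure of Algorithm 1.
def toy_fitness(mask):
    """Stand-in fitness: fraction of selected features (to be minimized)."""
    return sum(mask) / len(mask)

def bgoa_sketch(n_agents=6, dim=8, max_iter=20, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_agents)]
    best = min(pop, key=toy_fitness)[:]           # T, the best solution so far
    for t in range(max_iter):
        c = 1.0 - t * (1.0 - 1e-5) / max_iter     # shrinking coefficient, Eq. (9)
        for agent in pop:
            for d in range(dim):
                # step towards the target plus a shrinking random perturbation
                step = c * (best[d] - agent[d]) + rng.gauss(0.0, c)
                prob = abs(math.tanh(step))       # V-shaped mapping to [0, 1]
                if rng.random() < prob:
                    agent[d] = 1 - agent[d]       # flip the bit
        cand = min(pop, key=toy_fitness)
        if toy_fitness(cand) < toy_fitness(best):
            best = cand[:]                        # replace T with the better one
    return best
```

In the real algorithm, the step rule would follow Equation (10), the flip probability would come from the time-varying transfer functions, and the fitness would wrap a KNN classifier.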
Computational Complexity
The proposed transfer functions do not change the computational complexity of the algorithm during each iteration. The core of the grasshopper optimization algorithm is to find the current optimal value in a loop, and the factors that affect the overall complexity are the population size N, the number of dimensions D, and the number of iterations T. Since the social interaction term requires the pairwise distances between all individuals, the overall complexity of the BGOA-TVG is at most O(T × N² × D), where D is the number of dimensions.