Feature selection is a crucial step in machine learning, particularly when dealing with high-dimensional datasets. Its primary objective is to reduce the dimensionality of the dataset by identifying the features that contribute most to predictive accuracy, while discarding irrelevant or noisy attributes. This process not only enhances computational efficiency but also minimizes redundancy among the selected features. Feature selection is essential in various domains, including text categorization, data mining, pattern recognition, and signal processing [33], where it improves model performance by focusing on the most informative attributes and discarding superfluous ones.
Feature selection is a challenging problem, acknowledged as non-deterministic polynomial hard (NP-hard) [34], which makes exact (exhaustive) search methods impractical due to their computational complexity and time requirements. Heuristic and metaheuristic algorithms, and their hybridizations, therefore become essential in this context [35]. When crafting a metaheuristic algorithm for an NP-hard problem, a delicate balance between exploration and exploitation must be maintained to optimize the search [36]. The GWO is recognized in the literature for striking the right equilibrium between exploration and exploitation, while the PCC stands out as a fast heuristic method for identifying and eliminating highly correlated features [37]. Hence, we employ PCC and GWO as the heuristic and metaheuristic components of our integrated PCC–GWO feature selection algorithm. This strategy harnesses the advantages of both methods concurrently, combining the speed of the heuristic PCC with the precision of the metaheuristic GWO to enhance the feature selection process.
Algorithm 1 outlines the PCC–GWO feature selection approach, offering a hybrid method for selecting an optimal feature subset for each base learner within the ensemble learning model. Initially, the algorithm employs the PCC method to compute an importance score for each feature. Subsequently, these scores serve as heuristic knowledge to guide the GWO during the search process. To achieve this, the importance scores are normalized within the range of [0, 1], and a roulette wheel selection method is utilized to choose features for each grey wolf within the initial population generation procedure. The subsequent sections delve into the specifics of the PCC–GWO algorithm, encompassing both the PCC and GWO phases, facilitating a comprehensive understanding of the feature selection process.
Algorithm 1. Feature selection using the PCC–GWO algorithm.
Input: Full heart disease dataset
Output: Optimal feature subset for the machine learning model

Heuristic Feature Selection: Calculation of Importance Scores using PCC:
1. For (i = 1 : Number of Features)
2.     Calculate the correlation of feature i with the class: CCi
3.     Calculate the correlation of feature i in relation to the other features: CFi
4.     Calculate the PCC importance score of feature i: ISi = CCi/CFi
5. End For

Metaheuristic Feature Selection: Final Feature Subset Selection using GWO:
1. t = 0 % Initial population
2. For (s = 1 : PopSize)
3.     for (i = 1 : Number of Features)
4.         Calculate the probability of selecting feature i in solution s using Equation (3)
5.         Select or reject feature i using roulette wheel selection
6.     end for
7.     Calculate the fitness of grey wolf s using Equation (4)
8. End For
9. Set the best solution as the alpha wolf: Xα
10. Set the second-best solution as the beta wolf: Xβ
11. Set the third-best solution as the delta wolf: Xδ
12. For (t = 1 : MaxIter) % Population updating
13.     for each grey wolf s
14.         Update a, Ai, Ci, rAi, and rCi
15.         Calculate the updating factor towards the alpha wolf using Equation (13)
16.         Calculate the updating factor towards the beta wolf using Equation (14)
17.         Calculate the updating factor towards the delta wolf using Equation (15)
18.         if (|Ai| ≥ 1)
19.             Update wolf s by searching for prey using Equation (16)
20.         else if (|Ai| < 1)
21.             Update wolf s by attacking prey using Equation (16)
22.         end if
23.     end for
24.     for (s = 1 : PopSize) % Fitness evaluation
25.         Calculate the fitness of grey wolf s using Equation (4)
26.     end for
27.     Update the best solution as the alpha wolf: Xα
28.     Update the second-best solution as the beta wolf: Xβ
29.     Update the third-best solution as the delta wolf: Xδ
30. End For
31. Return Xα as the optimized feature subset
4.1.1. Calculating the Importance Score of Features Using PCC
The PCC measures the degree and direction of the linear relationship between two variables [38]. PCC values range from −1 to +1: a value of zero indicates that there is no correlation between the two variables, while values near −1 or +1 indicate a strong association. The PCC is determined by:

r = Σi (xi − x̄)(yi − ȳ) / √( Σi (xi − x̄)² · Σi (yi − ȳ)² )

where x̄ and ȳ are the means of the two variables x and y, respectively, and xi and yi denote the i-th values of the variables x and y.
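As a concrete illustration, the coefficient can be computed directly from this definition; the following minimal sketch uses NumPy (the function name pearson_corr is ours, not from the paper):

```python
import numpy as np

def pearson_corr(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # deviations from the mean of x
    yc = y - y.mean()  # deviations from the mean of y
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

# A perfect linear relationship gives r = +1; a perfect inverse one gives r = -1.
x = [1.0, 2.0, 3.0, 4.0]
print(pearson_corr(x, [2.0, 4.0, 6.0, 8.0]))   # → 1.0
print(pearson_corr(x, [8.0, 6.0, 4.0, 2.0]))   # → -1.0
```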
By computing the correlation coefficient between each feature and the target variable, the method identifies the most informative features for accurate classification. Then, by considering the correlation of each feature with all the other features in the dataset, it identifies redundant or highly correlated features that provide little additional information. The selection status of each feature is then determined based on a threshold value derived from its correlation coefficients. Finally, the GWO algorithm repeats the selection process over multiple iterations, and the feature subset with the highest fitness value is selected as the final solution. This provides an effective way to identify and select the most valuable features in high-dimensional datasets, leading to improved predictive accuracy and better performance of machine learning models. The overall operation of the PCC can be summarized as follows:
- (1) The correlation coefficient of each feature i with the class is computed as CCi;
- (2) The correlation coefficient of each feature i in relation to the other features is calculated as CFi;
- (3) The importance score of each feature i is calculated as ISi = CCi/CFi.
Concerning the PCC, if the value of ISi is greater than a specific threshold TH (ISi > TH), the feature i is selected; otherwise, it is not chosen. However, in the proposed combined PCC–GWO algorithm, the importance scores obtained by the PCC are used to guide the search process of the GWO for achieving a better level of convergence.
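The three-step scoring above can be sketched as follows; the use of absolute correlations and of the mean over the other features when aggregating CFi are our assumptions, since the paper only states ISi = CCi/CFi:

```python
import numpy as np

def pcc_importance_scores(X, y):
    """Importance score ISi = CCi / CFi for each feature (a sketch).

    CCi: absolute Pearson correlation of feature i with the class y.
    CFi: mean absolute Pearson correlation of feature i with the other
    features (aggregation choices are our assumptions).
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    # Correlation matrix of [features | target]; last row/column is the target.
    corr = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    cc = np.abs(corr[:-1, -1])          # feature-vs-class correlations
    feat_corr = np.abs(corr[:-1, :-1])  # feature-vs-feature correlations
    # Average correlation with the *other* features (drop the self-correlation of 1).
    cf = (feat_corr.sum(axis=1) - 1.0) / (n_features - 1)
    return cc / cf

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] + 0.1 * rng.normal(size=200)   # target driven mainly by feature 0
scores = pcc_importance_scores(X, y)
print(scores.argmax())                     # feature 0 gets the highest score
```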
4.1.2. Feature Subset Selection Using GWO
The GWO was originally introduced by Mirjalili et al. [39]. It is based on the hunting behavior and social order of grey wolves found in nature. The social hierarchy of grey wolves is described by four types of wolves, which are the following:
Alpha (α): the best solution;
Beta (β): the second best solution;
Delta (δ): the third best solution;
Omega (ω): the rest of the grey wolves.
Similar to other metaheuristic algorithms, the GWO initiates its search procedure by creating an initial population of viable solutions. Subsequently, it undergoes iterative phases, comprising a fitness assessment and population adaptation, until it fulfills a predefined stopping condition, such as reaching a specific number of iterations.
Representation of Feasible Solutions: The encoding of a feasible solution X (i.e., a grey wolf) is depicted in Figure 3. A solution is a binary vector whose length equals the number of features: if the i-th variable equals 1, feature i is selected by the grey wolf; otherwise, it is not selected. The feature subset is thus expressed as follows:

X = (x1, x2, …, xn), xi ∈ {0, 1}

where n is the number of features.
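A minimal illustration of this binary encoding (the feature names below are illustrative heart-disease attributes, not necessarily the paper's exact columns):

```python
import numpy as np

# A grey wolf is a binary vector over the features: 1 = selected, 0 = not selected.
wolf = np.array([1, 0, 1, 1, 0])
# Illustrative feature names (assumed for this example only).
feature_names = ["age", "chol", "thalach", "oldpeak", "slope"]
selected = [name for name, bit in zip(feature_names, wolf) if bit == 1]
print(selected)   # → ['age', 'thalach', 'oldpeak']
```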
Initial Population Generation: As mentioned above, the original GWO algorithm starts its search process with a random population of grey wolves. In the proposed combined PCC–GWO algorithm, however, the importance scores of the features obtained by the PCC are utilized to generate a set of near-optimal initial solutions for the GWO. To this end, the normalized importance score NISi of each feature i is first calculated; then, the probability of selecting feature i in each solution (grey wolf) s is expressed using the roulette wheel selection method, as follows:

pi = NISi / Σj NISj
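The roulette-wheel seeding of the initial population might be sketched as follows; min-max normalization of the scores and independent per-feature draws are our assumptions, not the paper's exact Equation (3):

```python
import numpy as np

def init_population(importance_scores, pop_size, seed=0):
    """Seed the initial wolves from PCC importance scores (a sketch).

    Each feature is kept with probability equal to its min-max-normalized
    score; this per-feature Bernoulli reading of the roulette wheel is an
    assumption.
    """
    rng = np.random.default_rng(seed)
    s = np.asarray(importance_scores, dtype=float)
    probs = (s - s.min()) / (s.max() - s.min())  # normalize into [0, 1]
    return (rng.random((pop_size, s.size)) < probs).astype(int)

pop = init_population([0.9, 0.1, 0.5, 0.7], pop_size=1000)
print(pop.shape)         # (1000, 4)
print(pop.mean(axis=0))  # per-feature selection rates track the normalized scores
```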
Fitness Evaluation: The original dataset is separated into train and test datasets. The train dataset is used for the optimization procedure via the GWO by means of K-fold cross-validation, while the test dataset remains unseen for the final evaluation of the generalizability of the trained model. The fitness function of the GWO, which assigns a quality to each solution and is to be maximized, is as follows:

Fitness = μ · accuracy + (1 − μ) · (N − |S|) / N

where accuracy is the total accuracy of the base learner on the validation dataset, |S| is the number of selected features, N is the total number of features, and μ is a parameter (0 < μ < 1) that determines the relative importance of accuracy and the number of selected features in the fitness value. The higher μ is, the higher the impact of accuracy on the fitness value. We set μ = 0.99 to ensure that high-accuracy solutions are achieved, while the number of features is minimized as a secondary objective.
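A sketch of such a fitness function, assuming the common weighted form μ·accuracy + (1 − μ)·(N − |S|)/N rather than the paper's exact Equation (4):

```python
def fitness(accuracy, n_selected, n_total, mu=0.99):
    """Fitness to maximize (a sketch; the exact Equation (4) is assumed).

    Weighs validation accuracy against the fraction of features kept:
    mu * accuracy + (1 - mu) * (n_total - n_selected) / n_total.
    """
    return mu * accuracy + (1.0 - mu) * (n_total - n_selected) / n_total

# With equal accuracy, the smaller feature subset earns a slightly higher fitness.
print(fitness(0.90, 5, 13) > fitness(0.90, 9, 13))   # → True
```

With μ = 0.99 the subset-size term only breaks ties between solutions of near-identical accuracy, matching the stated priority on accuracy first.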
Population Updating: At every iteration of the GWO, after the fitness evaluation of all the wolves, the three best wolves, α, β, and δ, lead the optimizer's hunting process, while ω simply obeys and follows them. Encircling, hunting, and attacking are the three well-organized steps that the GWO performs during the optimization process. The encircling process is determined by the following equations:

D = |C · Xp(t) − X(t)|
X(t + 1) = Xp(t) − A · D
where t indicates the number of iterations, X represents the location vector of the wolf, and Xp represents the location vector of the prey. Moreover, A and C represent the vector coefficients, expressed as follows:

A = 2a · r1 − a
C = 2 · r2
where r1 and r2 are random vectors in the range [0, 1], and the elements of the vector a start at 2 and fall linearly to 0 during the execution of the algorithm, as follows:

a = 2 − 2t / MaxIter

where MaxIter denotes the maximum number of iterations.
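These coefficient updates can be sketched as follows (the function name and vector shapes are ours):

```python
import numpy as np

def gwo_coefficients(t, max_iter, dim, rng):
    """Compute a, A, and C for one wolf at iteration t (a sketch)."""
    a = 2.0 - 2.0 * t / max_iter   # a falls linearly from 2 to 0
    r1 = rng.random(dim)           # r1, r2 drawn uniformly from [0, 1)
    r2 = rng.random(dim)
    A = 2.0 * a * r1 - a           # A = 2a*r1 - a, components in [-a, a]
    C = 2.0 * r2                   # C = 2*r2, components in [0, 2)
    return a, A, C

rng = np.random.default_rng(1)
a0, A0, C0 = gwo_coefficients(0, 100, 4, rng)     # first iteration: a = 2
aN, AN, CN = gwo_coefficients(100, 100, 4, rng)   # last iteration:  a = 0
print(a0, aN)   # → 2.0 0.0
```

As a shrinks, |A| tends to drop below 1, shifting the swarm from exploration (searching for prey) to exploitation (attacking prey), as in the |Ai| test of Algorithm 1.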
The GWO keeps the top three solutions (α, β, and δ) obtained so far and compels ω to modify its position in order to follow them. As a result, a series of equations run for each search candidate to simulate the GWO hunting process. First, the distance parameters D for the alpha, beta, and delta wolves are expressed as follows:

Dα = |C1 · Xα − X|
Dβ = |C2 · Xβ − X|
Dδ = |C3 · Xδ − X|
Then, the moving vectors of the grey wolf X towards the alpha, beta, and delta wolves are calculated as Equations (12)–(14), respectively:

X1 = Xα − A1 · Dα
X2 = Xβ − A2 · Dβ
X3 = Xδ − A3 · Dδ

Finally, the movement of the grey wolf X is obtained through the aggregation of the three moving vectors according to Equation (15):

X(t + 1) = (X1 + X2 + X3) / 3
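The leader-guided update can be sketched as follows, assuming the canonical GWO aggregation (the helper name and argument layout are ours):

```python
import numpy as np

def update_position(X, leaders, A, C):
    """One GWO position update for wolf X (canonical aggregation, a sketch).

    leaders = [X_alpha, X_beta, X_delta]; A and C hold one coefficient
    vector per leader (A1..A3, C1..C3).
    """
    candidates = []
    for Xl, Ai, Ci in zip(leaders, A, C):
        D = np.abs(Ci * Xl - X)         # distance term D_alpha / D_beta / D_delta
        candidates.append(Xl - Ai * D)  # moving vector X1 / X2 / X3
    return np.mean(candidates, axis=0)  # X(t+1) = (X1 + X2 + X3) / 3

# With A = 0 the update collapses to the centroid of the three leaders.
X = np.zeros(3)
leaders = [np.full(3, 1.0), np.full(3, 2.0), np.full(3, 3.0)]
A = [np.zeros(3)] * 3
C = [np.ones(3)] * 3
print(update_position(X, leaders, A, C))   # → [2. 2. 2.]
```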