The data module handles raw data, preprocesses it, and prepares a features list from all the relevant parameters. The feature selection module is based on PSO; it takes the features list as input and returns an optimized set of features based on their respective scores, which are calculated using the Gaussian Mixture Model (GMM). The random sampling module takes the data restricted to the essential and optimized features and generates different random samples of data. A model is trained separately on each random sample. The most crucial module is our hybrid ensemble learning module, which again uses PSO for optimization by tuning the hyper-parameters of all the models. The last module is the evaluation module, which uses different evaluation metrics to assess the performance of all the models.
3.2.2. PSO-Based Feature Selection
Feature selection is the method for choosing a subgroup of significant features to construct a model. It aims to enhance the data quality by selecting the best features for the model performance. The process of feature selection by using PSO is shown in
Figure 4.
We have used a variation of the binary particle swarm optimization (BPSO) algorithm, which is a binary version of PSO [42]. The primary functions are the same as in PSO; BPSO likewise maintains both local best (pbest) and global best (gbest) solutions. Here, each component of a particle's position takes one of two values, i.e., 0 (not selected) or 1 (selected). The position and velocity of a particle in d dimensions at any given time t are defined in Equations (1) and (2).
where $x$ is the position of a particle; $v$ is the velocity of the particle; the acceleration factors are represented as $c_1$ and $c_2$; $r_1$ and $r_2$ represent two random numbers in the range [0,1]; d refers to the dimension in the search space, and t is the iteration number.
For each particle, its velocity is updated and changed to probability. Additionally, the position of the particle is updated based on its updated velocity. The position and velocity of a particle can be updated by using Equations (3) and (4).
where $v^{t+1}$ and $x^{t+1}$ refer to the updated velocity and position of a particle, respectively, and rand is a random number uniformly distributed between 0 and 1.
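The update described above can be sketched in a few lines of Python. This is a minimal sketch of the standard BPSO step with a sigmoid transfer function, not a reproduction of Equations (1) to (4), which are not shown in this text. The function names `sigmoid` and `bpso_step`, the acceleration factors `c1 = c2 = 1.49445`, and the velocity clamp `v_max` are assumptions made for illustration; only the inertia weight 0.729 comes from the text.

```python
import math
import random


def sigmoid(v):
    """Map a velocity to a selection probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def bpso_step(x, v, pbest, gbest, w=0.729, c1=1.49445, c2=1.49445,
              v_max=6.0, rng=random):
    """Return (new_x, new_v) for one particle after one iteration.

    x, pbest, gbest are lists of 0/1 flags (1 = feature selected);
    v is a list of real-valued velocities of the same length.
    c1, c2, and v_max are illustrative defaults (assumptions).
    """
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2 = rng.random(), rng.random()
        vel = (w * v[d]
               + c1 * r1 * (pbest[d] - x[d])
               + c2 * r2 * (gbest[d] - x[d]))
        # Clamp the velocity so the sigmoid does not saturate (assumption).
        vel = max(-v_max, min(v_max, vel))
        new_v.append(vel)
        # The velocity is turned into a probability; the bit is set to 1
        # when a uniform random number falls below that probability.
        new_x.append(1 if rng.random() < sigmoid(vel) else 0)
    return new_x, new_v
```

Because each bit is resampled from its probability, a feature with a strongly positive velocity is almost always selected, while a velocity near zero leaves the choice close to a coin flip.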
The most vital role is played by pbest and gbest values that guide the particles toward the global optimum. Equations (5) and (6) are used to update pbest and gbest values, respectively.
where x is the position (solution), pbest is the local best solution, gbest is the global best solution, and f(·) is the fitness function defined in Equation (7).
where α is a hyper-parameter that decides the tradeoff between the classifier performance P and the size of the selected feature subset, which is measured relative to the total number of features in the dataset. Accuracy, precision, or F-score can be used to evaluate the classifier performance. We have used Gaussian Mixture Modeling for feature evaluation.
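As an illustration of how such a fitness value and the pbest/gbest bookkeeping might fit together, the following sketch uses a weighted-sum fitness that is common in the BPSO feature selection literature. It is an assumption, since Equations (5) to (7) are not reproduced in this text: the exact form of the authors' fitness, the default `alpha = 0.9`, and the function names are hypothetical. The argument `perf` stands in for the GMM-based score, whose computation is not specified here.

```python
def fitness(perf, n_selected, n_total, alpha=0.9):
    """Weighted sum of model performance and subset compactness.

    Higher is better. `perf` is a performance score in [0, 1]
    (a placeholder for the GMM-based evaluation), and the second
    term rewards smaller feature subsets. The formula and the
    default alpha are assumptions, not the paper's Equation (7).
    """
    if n_selected == 0:
        return 0.0  # an empty subset cannot be evaluated
    return alpha * perf + (1.0 - alpha) * (n_total - n_selected) / n_total


def update_bests(x, fx, pbest, f_pbest, gbest, f_gbest):
    """Update one particle's pbest and the swarm's gbest (maximization)."""
    if fx > f_pbest:
        pbest, f_pbest = list(x), fx
    if f_pbest > f_gbest:
        gbest, f_gbest = list(pbest), f_pbest
    return pbest, f_pbest, gbest, f_gbest
```

With `alpha` close to 1 the search favors performance; lowering it pushes the swarm toward smaller feature subsets.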
Inertia weight, being the most significant parameter in BPSO, controls the exploitation and exploration behavior, and it is essential to achieve a balance between the two. In the literature, several inertia weight strategies have been proposed to improve PSO performance. Here, we use our previously proposed velocity-boost inertia weight scheme [43]. We start with a constant inertia weight value of 0.729 and a velocity boost threshold (VBT). Then, in each iteration, we observe the particle's pbest until VBT is reached. If there is no improvement in the pbest, we assign a new inertia weight to update the particles' velocities. Based on the previously proposed inertia weights in Equation (8) [44] and Equation (9) [45], we define our new inertia weight in Equation (10).
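The control flow of this scheme can be sketched as follows. This is one possible reading of the description, under stated assumptions: the stagnation counter, the function names, and the `boosted_w` argument are hypothetical, and the value of the new inertia weight must come from Equation (10), which is not reproduced in this text. A larger fitness is assumed to be better.

```python
def track_stagnation(stagnant_iters, f_pbest_old, f_pbest_new):
    """Count consecutive iterations without a pbest improvement."""
    if f_pbest_new > f_pbest_old:
        return 0  # pbest improved, so the counter is reset
    return stagnant_iters + 1


def select_inertia(stagnant_iters, vbt, boosted_w, base_w=0.729):
    """Pick the inertia weight for the next velocity update.

    The constant weight is used until pbest has stagnated for `vbt`
    iterations; after that the caller-supplied `boosted_w` is used.
    `boosted_w` is a placeholder for the weight defined by the
    paper's Equation (10).
    """
    return boosted_w if stagnant_iters >= vbt else base_w
```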
3.2.4. PSO Optimized Hybrid Ensemble Learning Model
The most critical objective of machine learning is to prepare a balanced model that operates perfectly in all circumstances; however, real-life examples and situations are not often ideal. Ensemble learning is the procedure of merging several models to achieve a more comprehensive and better-supervised model. The fundamental idea of ensemble learning is that if a specific weak regressor fails to provide accurate prediction, other regressors can take care of it and improve the results.
We have experimented with a novel machine learning-based hybrid approach that combines various models, namely CatBoost, XGBoost, LightBoost, RandomForest, and GradientBoosting. We then randomly selected three different combinations of these models for the experiments.
- (A) Hyper-parameter Tuning Using PSO
The hyper-parameters of these hybrid models were optimized using PSO. To reach its highest performance, a machine learning model must be tuned for the problem at hand; otherwise, it will fail to achieve the best performance. It is nearly impossible to adjust the system manually every time; therefore, automated hyper-parameter tuning reduces the labor-intensive work of experimenting with various machine learning model configurations. Hyper-parameter tuning enhances the precision of ML algorithms and increases reproducibility. It plays a vital role in producing more accurate results for any ML model [46]. Genetic Algorithms (GA) and PSO are simple and have been found to be more efficient in exploring large hyper-parameter spaces [47]. Moreover, PSO can reach the same optimization level as GA, usually at a lower cost in terms of generations. In Table 2, the critical hyper-parameters for the base learners of the selected ensemble are shown along with their value ranges.
PSO requires several parameters to execute, and their values depend on the problem. In our case, we aim to find optimal values for all the hyper-parameters. The PSO parameters, along with their respective values, are presented in Table 3.
The fitness function depends on the RMSE of the regressor, which should be minimized. Hyper-parameter values are updated in such a way that each solution advances toward its local best and the global best. These hyper-parameter values are then used to update the position of each particle. In each iteration of the algorithm, the fitness condition and the termination condition are verified before calculating the pbest and gbest values. As shown in Table 3, RMSE is chosen as the fitness criterion, and the algorithm terminates when the number of iterations exceeds 200 or the difference in RMSE between two iterations remains <0.3%. Fitter particles have a lower RMSE value for their configuration. Once the termination condition is reached, the particle with the best fitness value is chosen as the most desired solution.
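The tuning loop and its two termination criteria can be sketched as below. This is a minimal continuous PSO written under stated assumptions, not the authors' implementation: the function name `pso_minimize`, the swarm size, the acceleration factors, and the reading of "<0.3%" as a relative change in the best RMSE between consecutive iterations are all assumptions. The `objective` argument is a hypothetical callable that trains a model with the given hyper-parameter vector and returns its RMSE.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, max_iter=200, tol=0.003,
                 w=0.729, c1=1.49445, c2=1.49445, seed=None):
    """Minimize objective(params) within the box constraints in `bounds`.

    Stops after `max_iter` iterations or when the best RMSE changes by
    less than `tol` (relative) between two consecutive iterations.
    Returns (best_params, best_rmse, iterations_run).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    iterations = 0
    for iterations in range(1, max_iter + 1):
        prev_f = gbest_f
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep each hyper-parameter inside its allowed range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            f = objective(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
        # Relative change of the best RMSE since the previous iteration.
        if prev_f > 0 and abs(prev_f - gbest_f) / prev_f < tol:
            break
    return gbest, gbest_f, iterations
```

Read literally, the 0.3% criterion also stops the search after any single iteration in which the global best does not improve, so a practical implementation may want to require several such iterations in a row before stopping.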
The reasons for choosing PSO include:
Being an increasingly popular meta-heuristic algorithm, PSO has a more robust global-search ability.
In most cases, PSO has substantially improved computational effectiveness.
It is easier to implement as compared with other meta-heuristic algorithms.
- (B) Learners for optimized-ensemble model
We introduce all the models that we used for ensemble learning below.
- (1) GradientBoosting
Decision tree-based Gradient Boosting Machine (GB) is a robust ensemble machine learning algorithm. Boosting is the most conventional ensemble approach, in which models are added to the ensemble in sequence and each later model improves on the performance of the former models. The first algorithm that could practically perform boosting was AdaBoost, and Gradient Boosting is a generalization of AdaBoost that enhances its performance. It also aims to further improve performance by borrowing concepts from bootstrap aggregation, e.g., randomly choosing the samples and features while fitting the ensemble models.
The key reason for selecting GB is that it performs well and is one of the most general boosting algorithms. Later versions of GB, such as XGBoost and LightBoost, are powerful variations that can play a vital role in various complex predictive learning problems.
- (2) CatBoost
CatBoost belongs to the boosting algorithm family that also includes XGBoost and LightGBM. Just like these algorithms, CatBoost is an open-source machine learning library. It is an improved implementation of the Gradient Boosting decision tree (DT) technique. It is built upon symmetric DTs and has various advantages, such as a limited number of parameters, suitability for both categorical and numerical variables, good generalization capability, and higher prediction speed and accuracy.
The main reason for selecting CatBoost is that it handles the challenges associated with categorical variables proficiently. Additionally, it is a time-efficient model, as little to no time needs to be spent on parameter adjustment: high-quality results can still be obtained with the default parameters, without introducing many changes.
- (3) XGBoost
Gradient Boosting (GB) is being used for both classification and regression-related problems, and it belongs to an ensemble machine learning-based class of algorithms. XGBoost aims for higher speed and better performance and implements Gradient Boosting decision trees upon which ensemble models are constructed. The prediction errors of the prior models are minimized by adding trees one by one to the ensemble model.
It has recently earned great popularity in applied machine learning, primarily for structured data. In Python, XGBoost is distributed as its own package, which provides a scikit-learn-compatible interface and can easily be installed in a development environment.
- (4) LightBoost
Like CatBoost and XGBoost, Light Gradient Boosted Machine, known as LightGBM for short, belongs to the boosting algorithm family. LightGBM is also an open-source machine learning library and is an improved implementation of the GB algorithm.
Here, feature selection is added automatically, and a significant focus is placed on boosting examples with larger gradients. This enables efficient training by reducing the training time and enhancing prediction outcomes, which is the reason for selecting LightGBM for ensemble modeling. Besides that, it works well with tabular and structured data in classification and regression modeling tasks. LightGBM, along with XGBoost, has undoubtedly accelerated the popularity of GB models.
- (5) RandomForest
Random Forest is an ensemble machine learning algorithm that uses bagging as the ensemble approach and decision trees as the individual models. The key reason for selecting Random Forest is that it is an extremely popular and extensively applied machine learning algorithm. Besides that, it is widely applicable and performs effectively on classification and regression predictive modeling problems. Additionally, its dependence on fewer hyper-parameters makes it easy to use and implement.
- (C) Predictions
We used all the random samples to train three different combinations of optimized hybrid ensemble learning models. The prediction module generates separate predictions for each model in an ensemble combination. The final predictions are obtained by taking the average of the predictions of all the models.
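The averaging step can be illustrated with a short sketch. It assumes an unweighted element-wise mean over the models of one ensemble combination; the function name `average_predictions` and the input layout (one list of predictions per model) are chosen here for illustration.

```python
def average_predictions(per_model_predictions):
    """Average predictions element-wise across the models of an ensemble.

    `per_model_predictions` holds one list of predictions per model;
    all lists must cover the same samples in the same order.
    """
    n_models = len(per_model_predictions)
    if n_models == 0:
        raise ValueError("at least one model's predictions are required")
    n_samples = len(per_model_predictions[0])
    if any(len(p) != n_samples for p in per_model_predictions):
        raise ValueError("all models must predict the same number of samples")
    # zip(*...) groups the predictions that belong to the same sample.
    return [sum(column) / n_models for column in zip(*per_model_predictions)]
```

For example, three models predicting `[1.0, 2.0]`, `[3.0, 4.0]`, and `[5.0, 6.0]` for two samples yield the final predictions `[3.0, 4.0]`.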