4.3.2. Feature-Based Comparative Analysis

This section applies the comparative analysis to datasets containing the indicators chosen by different feature selection techniques: best-first search, evolutionary search, PSO search, and PCA dimension reduction. Table A7 in Appendix A.3 lists the features chosen by each method. For the analysis, machine learning models, namely OLS, an ensemble method (bagging), SVR (with a polynomial kernel), and MLP (with one hidden layer of nine neurons), were applied to each of these datasets. The aim is to identify both the best feature selection method and the best machine learning method. To evaluate the predictive performance of the models and obtain robust results, 10-fold cross-validation on a rolling basis is used, and each model is run ten times. The results averaged over these 100 prediction trials, reported as RMSE and Pearson's *r*, are shown in Tables 13 and 14.
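The evaluation protocol described above can be sketched as follows. The synthetic dataset, the hyperparameters not stated in the text (bagging ensemble size, polynomial degree), and the single cross-validation pass shown here (the study averages ten repetitions) are assumptions for illustration only:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import BaggingRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
# Synthetic stand-in for a dataset of selected indicators (hypothetical).
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)

# The four model families compared in the text.
models = {
    "OLS": LinearRegression(),
    "Bagging": BaggingRegressor(n_estimators=10, random_state=0),
    "SVR": SVR(kernel="poly", degree=2),
    "MLP": MLPRegressor(hidden_layer_sizes=(9,), max_iter=2000, random_state=0),
}

def rolling_cv_scores(model, X, y, n_splits=10):
    """Rolling-origin CV: each fold trains on the past and tests on the next block."""
    rmses, rs = [], []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        rmses.append(mean_squared_error(y[test_idx], pred) ** 0.5)
        rs.append(pearsonr(y[test_idx], pred)[0])
    return np.mean(rmses), np.mean(rs)

for name, model in models.items():
    rmse, r = rolling_cv_scores(model, X, y)
    print(f"{name}: RMSE={rmse:.3f}, r={r:.3f}")
```

Repeating this loop ten times and averaging, as the study does, would smooth out run-to-run variation from the stochastic models (bagging and MLP).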

**Table 13.** RMSE of different models on different data sets.

**Table 14.** Pearson's *r* of different models on different data sets.


According to Tables 13 and 14, the models applied to the full set of indicators achieve higher accuracy than those applied to any of the reduced datasets. It can therefore be concluded that none of the feature selection methods improves model accuracy. Moreover, the models applied to the PCA-reduced data have the lowest accuracy of all, so the PCA reduction method is not a promising feature selection approach for this research data. To identify the best model, the models are also compared with one another on each dataset. Table 15 summarizes these comparisons: the SVR model has the highest accuracy on every dataset, and the MLP the lowest.
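A minimal sketch of the PCA comparison above, contrasting a model fitted on all indicators with one fitted on PCA-reduced data; the synthetic dataset and the retained component count are assumptions, not values from the study:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
# Hypothetical indicator matrix; the target depends on only two columns.
X = rng.normal(size=(200, 10))
y = X[:, 0] - 0.5 * X[:, 9] + 0.1 * rng.normal(size=200)

def rolling_rmse(model, X, y, n_splits=10):
    """Average RMSE over rolling-origin cross-validation folds."""
    errs = []
    for tr, te in TimeSeriesSplit(n_splits=n_splits).split(X):
        model.fit(X[tr], y[tr])
        errs.append(mean_squared_error(y[te], model.predict(X[te])) ** 0.5)
    return np.mean(errs)

# SVR on all indicators vs. SVR on a PCA-reduced representation.
full = make_pipeline(StandardScaler(), SVR(kernel="poly", degree=2))
reduced = make_pipeline(StandardScaler(), PCA(n_components=3),
                        SVR(kernel="poly", degree=2))

print("all indicators:", rolling_rmse(full, X, y))
print("PCA-reduced   :", rolling_rmse(reduced, X, y))
```

Because PCA ranks components by variance rather than by predictive relevance, directions that carry most of the signal for the target can be discarded, which is one plausible explanation for the lower accuracy observed on the PCA-reduced data.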


**Table 15.** Order of the models in terms of accuracy.
