5.3.1. ANOVA Results under Different Imbalance Ratios
We conducted comprehensive ANOVA experiments to understand and quantify the impact of different parameters on CNN model performance across various levels of imbalance. ANOVA was conducted on datasets with imbalance ratios of 25:1, 15:1, and 1:1, assessing the main and interaction effects of LR, DR 1, DR 2, and KS on CNN model accuracy.
Table 5 presents partial results from the fractional factorial design experiments conducted on the CIFAR-10 dataset, encompassing a total of 250 data points. For clarity and conciseness, Table 5 displays only rows 1–4, 124–126, and 247–250, representing the overall dataset. Each parameter configuration was tested four times, and the accuracy, F1 score, G-mean, P-mean, and recall values were obtained for the minority class in each experiment. The same experiments were conducted on the Fashion-MNIST dataset. These data were used for variance analysis to assess the main and interaction effects of the different factors on model performance. Repeated experiments allow us to better understand the impact of each parameter on model performance and to identify the optimal parameter combination for enhancing the model's classification performance on imbalanced datasets. The data in the table demonstrate the consistency and variability of the results under identical parameter configurations, providing reliable supporting data for the subsequent variance analysis.
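The main and interaction effects reported below can be reproduced with a standard two-way ANOVA over the replicated runs. The sketch below is illustrative only: the factor labels and accuracy values are made up, not the paper's measurements, and only two factors are shown for brevity.

```python
import numpy as np
from scipy import stats

def two_way_anova(y, a_levels, b_levels):
    """Two-way ANOVA with replication for factors A and B (balanced design).

    y        : 1-D array of responses (e.g. minority-class accuracy per run)
    a_levels : factor-A level label for each observation
    b_levels : factor-B level label for each observation
    Returns {"A": (F, p), "B": (F, p), "AxB": (F, p)}.
    """
    y, a, b = map(np.asarray, (y, a_levels, b_levels))
    A, B = np.unique(a), np.unique(b)
    grand = y.mean()

    # Sums of squares for each main effect and for the cell means.
    ss_a = sum(len(y[a == ai]) * (y[a == ai].mean() - grand) ** 2 for ai in A)
    ss_b = sum(len(y[b == bi]) * (y[b == bi].mean() - grand) ** 2 for bi in B)
    ss_cells = sum(len(y[(a == ai) & (b == bi)]) *
                   (y[(a == ai) & (b == bi)].mean() - grand) ** 2
                   for ai in A for bi in B)
    ss_ab = ss_cells - ss_a - ss_b                 # interaction sum of squares
    ss_err = ((y - grand) ** 2).sum() - ss_cells   # residual (within-cell) variation

    df_a, df_b = len(A) - 1, len(B) - 1
    df_ab = df_a * df_b
    df_err = len(y) - len(A) * len(B)
    ms_err = ss_err / df_err

    out = {}
    for name, ss, df in (("A", ss_a, df_a), ("B", ss_b, df_b), ("AxB", ss_ab, df_ab)):
        F = (ss / df) / ms_err
        out[name] = (F, stats.f.sf(F, df, df_err))  # upper-tail p-value
    return out

# Example: 3 LR levels x 2 KS levels, 4 replicates per cell,
# with a strong synthetic LR effect injected.
rng = np.random.default_rng(0)
lr = np.repeat(["0.001", "0.031", "0.1"], 8)
ks = np.tile(np.repeat(["3x3", "5x5"], 4), 3)
base = {"0.001": 0.60, "0.031": 0.80, "0.1": 0.40}
acc = np.array([base[l] for l in lr]) + rng.normal(0.0, 0.01, lr.size)
results = two_way_anova(acc, lr, ks)   # {"A": (F, p), "B": (F, p), "AxB": (F, p)}
```

Extending this to the full LR, DR 1, DR 2, KS design follows the same pattern with additional factors and interaction terms.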
Comparing the results from Table 6, Table 7 and Table 8, it is evident that LR, KS, and their interaction significantly impact the model accuracy across all imbalance ratios. The results indicate that these two parameters are crucial tuning factors across different levels of imbalanced datasets.
In the CIFAR-10 dataset with an imbalance ratio of 25:1, both LR and KS significantly affected model accuracy, with p-values of and 0.036, respectively. In the same dataset, the interaction effect between LR and KS (LR × KS) also showed a significant impact, with an F-value of 27.600 and a p-value of . The combination of these two parameters significantly influenced the model performance.
LR and KS exhibited significant effects on the CIFAR-10 dataset with an imbalance ratio of 15:1, with p-values of and , respectively. The interaction effect between LR and KS (LR × KS) was particularly noteworthy, with an F-value of 38.768 and a p-value of , indicating a highly significant impact.
In the balanced CIFAR-10 dataset, LR and KS still significantly affected accuracy, with p-values of and , respectively. Furthermore, the interaction effect between LR and KS (LR × KS) also demonstrated a significant impact, with an F-value of 209.146 and a p-value of . However, DR 1 and DR 2 did not exhibit significant main or interaction effects in any dataset, with p-values greater than 0.1.
As with CIFAR-10, we carried out the same experiments on the Fashion-MNIST dataset and obtained similar results. Across different imbalance ratios, we found significant main effects and interaction effects of LR and KS on the CNN model accuracy. In the Fashion-MNIST dataset with an imbalance ratio of 25:1, LR and KS exhibited p-values of and , respectively. The interaction effect (LR × KS) showed an F-value of 27.322 and a p-value of . Under the 15:1 imbalance ratio, LR and KS had p-values and F-values of and , respectively, with an interaction effect (LR × KS) p-value of and F-value of 13.800. In the balanced dataset, LR and KS significantly influenced the CNN model accuracy with p-values of and , respectively. The interaction effect (LR × KS) had an F-value of 12.222, although its impact was slightly diminished compared with that in the highly imbalanced dataset. These results indicate that the observed effects are not incidental but hold robustly across datasets and imbalance ratios.
The previous section demonstrated the significant impact of various parameters on accuracy. To further explore the effects of these parameters on recall under different imbalance ratios, we conducted an analysis of variance to obtain the p-values for the main effects and interaction effects of learning rate, dropout rate 1, dropout rate 2, and kernel size on the accuracy of minority class samples. Due to the substantial differences in the p-values, we standardized each p-value using the following formula:
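The formula itself is not reproduced in the source; a min-max standardization consistent with the surrounding description would take the following form (our reconstruction, not necessarily the authors' exact formula):

```latex
\tilde{p}_i = \frac{p_i - p_{\min}}{p_{\max} - p_{\min}}
```

where \(p_i\) is the p-value of the \(i\)-th effect, and \(p_{\min}\) and \(p_{\max}\) are the smallest and largest p-values among the effects compared at a given imbalance ratio, so that each \(\tilde{p}_i\) lies in \([0, 1]\).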
This standardization ensures that the effects of different parameters at various imbalance ratios can be compared on the same scale, supporting a fair and accurate analysis. Using this method, we were able to visualize and compare the impact of each parameter on recall.
As can be seen from Figure 6, the impact of different parameters on model performance varied across imbalance ratios. In highly imbalanced datasets, LR, KS, and their interaction significantly influenced the model. In moderately imbalanced datasets, the effects of LR and KS were more pronounced. In balanced datasets, the interaction effects of LR and KS (LR × KS) and of DR 1 and KS (DR 1 × KS) significantly impacted model performance. Based on these findings, the parameters of CNN models can be selected and adjusted more precisely to improve their performance on imbalanced data.
In summary, LR and KS significantly impacted overall model performance across all imbalanced datasets, especially highly imbalanced ones, and the interaction between these two parameters also significantly influenced performance. In balanced datasets, LR and KS still significantly affected model performance, although less pronouncedly than in highly imbalanced datasets, whereas secondary parameters such as DR 1 and DR 2 had only a minor impact. Subsequent optimization should therefore focus on LR and KS, exploring their optimal combinations to enhance the performance of CNN models on imbalanced datasets.
5.3.2. Detailed Analysis of 25:1 Imbalanced Data on the CIFAR-10 Dataset
To gain a deeper understanding of the impact of the parameters on extremely imbalanced datasets, we conducted a detailed analysis of data with an imbalance ratio of 25:1. At this ratio, model performance is highly sensitive to parameter settings, making the identification of critical parameters crucial for optimization.
Because the independent variables that significantly impact the evaluation metrics are LR and KS, we do not discuss the main effect results of the dropout rate in detail.
Figure 7 shows the main effects of LR on various metrics, highlighting the significant impact of LR on each metric. The best performance for each metric occurred when LR ranged from 0.001 to 0.031, peaking at approximately 0.031. Therefore, selecting an appropriate LR is crucial for imbalanced data classification and can significantly enhance various metrics.
Figure 8 shows the overall impact of KS on various evaluation metrics for imbalanced and balanced data. The results indicate that smaller kernel sizes perform better in terms of accuracy, F1 score, G-mean, and P-mean, reflecting better overall balanced performance. On the other hand, larger kernel sizes significantly improve recall, but cause a decline in other performance metrics; thus, there is a significant trade-off when choosing the kernel size for imbalanced data classification. The selection of the most suitable kernel configuration should consider the specific performance requirements of the application and the importance of the different evaluation metrics.
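The trade-off described above, where larger kernels raise recall while lowering F1 and G-mean, can be made concrete by computing the minority-class metrics from a confusion matrix. The helper below is an illustrative sketch, not the paper's evaluation code (P-mean is omitted because its definition is not given in this excerpt):

```python
import numpy as np

def minority_metrics(y_true, y_pred, minority=1):
    """Recall, F1, and G-mean for the minority class in a binary problem."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == minority) & (y_pred == minority))
    fn = np.sum((y_true == minority) & (y_pred != minority))
    fp = np.sum((y_true != minority) & (y_pred == minority))
    tn = np.sum((y_true != minority) & (y_pred != minority))

    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity on the minority class
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0   # accuracy on the majority class
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    g_mean = np.sqrt(recall * specificity)             # geometric mean of both class accuracies
    return {"recall": recall, "f1": f1, "g_mean": g_mean}

# Example: all 4 minority samples recovered, at the cost of 2 false positives.
m = minority_metrics([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                     [1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
```

In this example, recall is perfect while F1 and G-mean are pulled down by the false positives, mirroring the recall-versus-balance trade-off observed for large kernels.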
Figure 9 illustrates the interaction effects of different parameter combinations on the accuracy of imbalanced data classification. In the first subplot, the x-axis represents LR, the y-axis represents mean accuracy, and the colors denote different levels of DR 1. Accuracy increases as the learning rate increases from 0.001 to 0.031 but declines sharply at a learning rate of 0.1, while the impact of the different DR 1 levels on accuracy is relatively minor. As in the first subplot, the effect of different DR 2 levels on accuracy is also relatively small. The combined effects of DR 1 and DR 2 on accuracy are more complex.
In summary, LR and KS are the primary factors that significantly affect the various performance metrics, while DR 1, DR 2, and their interactions have a minor impact. Optimization should therefore focus on the learning rate, the kernel size, and their interaction. LR = 0.01, a moderate learning rate, provides an adequate gradient update step size during training, allowing the model to learn the features of the training data effectively while avoiding the oscillations and instability caused by overly large gradient steps. DR 1 = 0.02 and DR 2 = 0.02, relatively low dropout rates, help prevent overfitting while preserving the model's training efficiency and generalization ability; these two parameters enhance the model's robustness while maintaining its simplicity. KS = 3 × 3, a smaller kernel size, allows the model to better capture detailed image features, improving its ability to recognize minority class samples, and also reduces the number of parameters and hence the computational complexity. In the dataset with an imbalance ratio of 25:1, using fractional factorial design and ANOVA, we identified the optimal parameter combination as A3B2C2D1.