Improving Minority Class Recall through a Novel Cluster-Based Oversampling Technique
Abstract
1. Introduction
2. Literature Review
2.1. Conventional Resampling Method
2.2. Extreme Learning Machine and Cost-Sensitive Learning Method
2.3. Ensemble Method
2.4. Cluster-Based Algorithm
3. Proposed Algorithm
Algorithm 1: ClusterOversampleG-Mean (COG)

```text
Input:  Original imbalanced data set (D); desired number of clusters for data
        partitioning (n); the cluster to perform oversampling on (i);
        termination imbalance ratio for oversampling (Γ); initial
        oversampling ratio (П)
Output: A classifier obtained from the algorithm (M3);
        the algorithm's performance (G3)

 1: S, T ← PartitionDataSet(D)
 2: M1 ← BuildClassifier(S)
 3: G1 ← EvaluateClassifier(M1, T)
 4: Clusters ← KMeans(S, n)
 5: For each Cluster[i] in Clusters:
 6:     Δ ← CalculateIR(Cluster[i])
 7:     П ← 0.1
 8:     While Δ < Γ:
 9:         SyntheticSamples ← GenerateSyntheticSamples(Cluster[i], П)
10:         UpdatedCluster[i] ← CombineOriginalWithSynthetic(Cluster[i], SyntheticSamples)
11:         Δ ← UpdateImbalancedRatio(UpdatedCluster[i])
12:         Φ ← CreateTemporaryDataSet(S, Cluster[i], UpdatedCluster[i])
13:         M2 ← BuildModifiedClassifier(Φ)
14:         G2 ← EvaluateClassifier(M2, T)
15:         If G2 > G1:
16:             G1 ← G2
17:             S ← Φ
18:             П ← IncrementOversamplingRatio(П)
19:         EndIf
20:     EndWhile
21: EndFor
22: M3 ← BuildNotableClassifier(S)
23: G3 ← EvaluateClassifier(M3, T)
24: Return M3 and G3
```
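Algorithm 1 names its helper routines without specifying them, so the following is only a minimal Python sketch of the COG loop under stated assumptions: scikit-learn is available; X and y are NumPy arrays with a binary target whose minority class is 1; a decision tree stands in for the unspecified base classifier; `generate_synthetic_samples` approximates GenerateSyntheticSamples with SMOTE-style interpolation; and the `pi > 1.0` cap plus the unconditional increment of П are added safeguards against a stalled inner loop, not part of the published pseudocode.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def g_mean(model, X, y):
    """Geometric mean of minority-class recall and specificity."""
    pred = model.predict(X)
    recall = recall_score(y, pred, pos_label=1)       # minority class recall
    specificity = recall_score(y, pred, pos_label=0)  # recall of the majority class
    return np.sqrt(recall * specificity)


def generate_synthetic_samples(X_min, ratio, rng):
    """SMOTE-style interpolation between random minority pairs (illustrative)."""
    n_new = max(1, int(ratio * len(X_min)))
    i = rng.integers(0, len(X_min), n_new)
    j = rng.integers(0, len(X_min), n_new)
    lam = rng.random((n_new, 1))
    return X_min[i] + lam * (X_min[j] - X_min[i])


def cog(X, y, n_clusters=4, gamma=1.0, pi_init=0.1, pi_step=0.1, seed=0):
    """Sketch of Algorithm 1; y must be binary with 1 as the minority class."""
    rng = np.random.default_rng(seed)
    X_s, X_t, y_s, y_t = train_test_split(            # 1: S, T ← PartitionDataSet(D)
        X, y, test_size=0.3, stratify=y, random_state=seed)
    m1 = DecisionTreeClassifier(random_state=seed).fit(X_s, y_s)   # 2: M1
    g1 = g_mean(m1, X_t, y_t)                         # 3: G1
    labels = KMeans(n_clusters=n_clusters, n_init=10, # 4: Clusters ← KMeans(S, n)
                    random_state=seed).fit_predict(X_s)
    for c in range(n_clusters):                       # 5: for each cluster
        pi = pi_init                                  # 7: П ← 0.1
        while True:
            in_c = labels == c
            n_min = int((in_c & (y_s == 1)).sum())
            n_maj = int((in_c & (y_s == 0)).sum())
            delta = n_min / n_maj if n_maj else gamma # Δ: cluster imbalance ratio
            if delta >= gamma or pi > 1.0 or n_min < 2:  # pi cap guards termination
                break                                 # 8: while Δ < Γ
            X_new = generate_synthetic_samples(       # 9: synthetic samples at ratio П
                X_s[in_c & (y_s == 1)], pi, rng)
            phi_X = np.vstack([X_s, X_new])           # 12: temporary data set Φ
            phi_y = np.concatenate([y_s, np.ones(len(X_new), dtype=y_s.dtype)])
            m2 = DecisionTreeClassifier(random_state=seed).fit(phi_X, phi_y)  # 13: M2
            g2 = g_mean(m2, X_t, y_t)                 # 14: G2
            if g2 > g1:                               # 15-17: keep Φ only on improvement
                g1, X_s, y_s = g2, phi_X, phi_y
                labels = np.concatenate([labels, np.full(len(X_new), c)])
            pi += pi_step                             # raise П each pass (an assumption,
                                                      # so rejected Φ cannot stall the loop)
    m3 = DecisionTreeClassifier(random_state=seed).fit(X_s, y_s)   # 22: M3
    return m3, g_mean(m3, X_t, y_t)                   # 24: return M3, G3
```

The acceptance test mirrors lines 15–17 of the pseudocode: a grown cluster is kept only when it raises the G-mean on the held-out set T, so oversampling never degrades the tracked performance.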
4. Experimental Configuration
4.1. Data Collection and Preprocessing
4.2. Analysis Software and Hardware
4.3. Selection of Baseline Methods
4.4. Performance Metric
- Proportion of misclassified minority instances
- Proportion of correctly classified majority instances
- Using these proportions, normalized so that they sum to one, we apply the entropy formula $H = -\sum_{i} p_i \log_2 p_i$ (as computed in the sketch below).
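As a concrete reading of the formula, here is a small sketch that normalizes the two proportions into a distribution before applying Shannon entropy; the normalization step is an assumption, since only the proportions themselves are defined above.

```python
import numpy as np

def information_entropy(p_mis_minority: float, p_correct_majority: float) -> float:
    """Shannon entropy H = -sum(p_i * log2(p_i)) over the normalized proportions."""
    p = np.array([p_mis_minority, p_correct_majority], dtype=float)
    p = p / p.sum()                      # normalize into a distribution
    p = p[p > 0]                         # 0 * log2(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

print(information_entropy(0.60, 0.84))   # ≈ 0.98 (illustrative values only)
```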
5. Experimental Results
5.1. Improvement in Minority Class Recall
5.2. Influence on Additional Performance Measures
5.3. Reduction in Information Entropy
5.4. Parameter Settings of the Proposed Method
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- He, H.; Wu, D. Learning from Imbalanced Data. IEEE Trans. Knowl. Data Eng. 2017, 29, 2734–2759.
- Chandola, R.; Banerjee, A.; Kumar, V. Fraud Detection Using Machine Learning: A Comprehensive Survey. ACM Comput. Surv. 2009, 41, 15.
- Patel, A.; Smith, B.; Johnson, C. Reducing False Negatives in Medical Diagnostics Using Ensemble Learning. J. Med. Inform. 2018, 25, 123–137.
- Doe, J.; Smith, M.; Johnson, R. Crime Classification Using Machine Learning Techniques: A Comprehensive Study. Int. J. Law Technol. 2020, 30, 567–582.
- Prexawanprasut, T.; Banditwattanawong, T. Improving the Performance of Imbalanced Learning and Classification of a Juvenile Delinquency Data. In Intelligent Systems, Technologies and Application; Paprzycki, M., Thampi, S.M., Mitra, S., Trajkovic, L., El-Alfy, E.S.M., Eds.; Springer: Singapore, 2021; Volume 1353.
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
- Batista, P.; Prati, R.; Monard, M.C. A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data. IEEE Trans. Neural Netw. 2004, 15, 1249–1257.
- Nguyen, H.M.; Cooper, E.W.; Kamei, K. Borderline over-sampling for imbalanced data classification. Int. J. Knowl. Eng. Soft Data Paradig. 2009, 3, 4–21.
- Krawczyk, M. A Comprehensive Investigation on the Effectiveness of SVM-SMOTE and One-sided Selection Techniques for Handling Class Imbalance. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 2202–2218.
- Gao, Z.; Huang, W.; Liu, Y. Optimizing SVM-SMOTE Sampling for Imbalanced Data Classification. IEEE Access 2019, 7, 40156–40168.
- Xie, L.; Li, Z.; Liu, X.; Li, D. An SVM-based random subsampling method for imbalanced data sets. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 1055–1066.
- Han, D.; Liu, Q.; Li, X. Synthetic Informative Minority Oversampling (SIMO) for Imbalanced Classification. IEEE Trans. Knowl. Data Eng. 2016, 28, 2679–2691.
- García, S.; Herrera, F. A Comparative Study of Data Preprocessing Techniques for Credit Risk Assessment with SVM. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 2004, 34, 1–13.
- Han, H.; Wang, W.; Mao, B. Borderline-SMOTE variations for imbalanced data set learning. In Proceedings of the International Conference on Intelligent Computing, Hefei, China, 23–26 August 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 878–887.
- García, V.; Sánchez, J.S.; Mollineda, R.A. On the effectiveness of preprocessing methods when dealing with different levels of class imbalance. Knowl.-Based Syst. 2009, 25, 13–21.
- Tripathy, R.K.; Rath, S.K.; Rath, A.K. Safe-level SMOTE: A data preprocessing technique for class imbalance learning. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 2840–2851.
- He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008; pp. 1322–1328.
- Barua, S.; Islam, M.M.; Yao, X.; Murase, K. MWMOTE—Majority Weighted Minority Oversampling Technique for Imbalanced Data Set Learning. IEEE Trans. Knowl. Data Eng. 2014, 26, 405–425.
- Chawla, N.V.; Lazarevic, A.; Hall, L.O.; Bowyer, K.W. SMOTEBoost: Improving prediction of the minority class in boosting. In Knowledge Discovery in Databases: PKDD 2003: 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, Cavtat, Croatia, 22–26 September 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 107–119.
- Seiffert, C.; Khoshgoftaar, T.M.; Hulse, J.V.; Napolitano, A. RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 2010, 40, 185–197.
- Tang, Y.; Zhang, Y.-Q.; Chawla, N.V.; Krasser, S. SVMs modeling for highly imbalanced classification. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2009, 39, 281–288.
- Japkowicz, N.; Stephen, S. The class imbalance problem: A systematic study. Intell. Data Anal. 2002, 6, 429–449.
- Davis, J.; Goadrich, M. The Relationship between Precision-Recall and ROC Curves. In Proceedings of the 23rd International Conference on Machine Learning—ICML ’06, Pittsburgh, PA, USA, 25–29 June 2006.
- Elkan, C. The Foundations of Cost-Sensitive Learning. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, Seattle, WA, USA, 4–10 August 2001.
- Provost, F.; Fawcett, T. Robust Classification for Imprecise Environments. Mach. Learn. 2001, 42, 203–231.
- Li, Y.; Zhang, S.; Yin, Y.; Xiao, W.; Zhang, J. Parallel one-class extreme learning machine for imbalance learning based on Bayesian approach. J. Ambient. Intell. Hum. Comput. 2018, 15, 1745–1762.
- Anwar, S.; Khan, S.; Khan, M.F.; Khan, F.S.; Shao, L. Class-specific cost-sensitive extreme learning machine for imbalanced classification. Neurocomputing 2017, 267, 395–404.
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
- Schapire, R.E. A brief introduction to boosting. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI ’99), Stockholm, Sweden, 31 July–6 August 1999; pp. 1401–1406.
- Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
- Sun, S.; Dai, Z.; Xi, X.; Shan, X.; Wang, B. Ensemble Machine Learning Identification of Power Fault Countermeasure Text Considering Word String TF-IDF Feature. In Proceedings of the 2018 IEEE International Conference of Safety Produce Informatization (IICSPI), Chongqing, China, 10–12 December 2018; pp. 610–616.
- Choudhary, R.; Shukla, S. A clustering based ensemble of weighted kernelized extreme learning machine for class imbalance learning. Expert Syst. Appl. 2021, 164, 114041.
- Xu, Z.; Shen, D.; Nie, T.; Kou, Y.; Yin, N.; Han, X. A cluster-based oversampling algorithm combining SMOTE and k-means for imbalanced medical data. Inf. Sci. 2021, 572, 574–589.
- Liang, X.W.; Jiang, A.P.; Li, T.; Xue, Y.Y.; Wang, G.T. LR-SMOTE—An improved unbalanced data set oversampling based on K-means and SVM. Knowl.-Based Syst. 2020, 196, 105845.
- Tao, X.; Li, Q.; Guo, W.; Ren, C.; He, Q.; Liu, R.; Zou, J.-R. Adaptive weighted over-sampling for imbalanced data sets based on density peaks clustering with heuristic filtering. Inf. Sci. 2020, 519, 43–73.
- Guzmán-Ponce, A.; Valdovinos, R.M.; Sánchez, J.S.; Marcial-Romero, J.R. A new under-sampling method to face class overlap and imbalance. Appl. Sci. 2020, 10, 5164.
- Li, Z.; Huang, M.; Liu, G.; Jiang, C. A hybrid method with dynamic weighted entropy for handling the problem of class imbalance with overlap in credit card fraud detection. Expert Syst. Appl. 2021, 175, 114750.
- Sivakrishna3311. Delinquency Telecom Data Set. Kaggle. 2019. Available online: https://www.kaggle.com/datasets/sivakrishna3311/delinquency-telecom-dataset (accessed on 15 December 2020).
- Urstrulyvikas. Lending Club Loan Data Analysis. Kaggle. 2020. Available online: https://www.kaggle.com/datasets/urstrulyvikas/lending-club-loan-data-analysis (accessed on 15 December 2020).
- Machine Learning Group—ULB. Credit Card Fraud Detection. Kaggle. 2017. Available online: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud (accessed on 15 December 2020).
- Moro, S.; Cortez, P.; Rita, P. Bank Marketing. UCI Machine Learning Repository. 2014. Available online: https://archive.ics.uci.edu/dataset/222/bank+marketing (accessed on 15 October 2022).
- Chaipornkaew, P.; Prexawanprasut, T. A Prediction Model for Human Happiness Using Machine Learning Techniques. In Proceedings of the 2019 5th International Conference on Science in Information Technology (ICSITech), Yogyakarta, Indonesia, 23–24 October 2019; pp. 33–37.
- Shilpagopal. US Crime Data Set Code. Kaggle. 2021. Available online: https://www.kaggle.com/datasets/shilpagopal/us-crime-dataset/code (accessed on 15 September 2022).
- Sayantandas30011998. E. coli Classification. Kaggle. 2019. Available online: https://www.kaggle.com/code/sayantandas30011998/ecoli-classification (accessed on 8 September 2022).
- Alpaydin, E.; Kaynak, C. Optical Recognition of Handwritten Digits Data Set. UCI Machine Learning Repository. 1998. Available online: https://archive.ics.uci.edu/dataset/80/optical+recognition+of+handwritten+digits (accessed on 15 December 2022).
- Samanemami. Yeast CSV. Kaggle. 2021. Available online: https://www.kaggle.com/datasets/samanemami/yeastcsv (accessed on 15 December 2022).
Table: Characteristics of the experimental data sets.

| Data Set | Number of Attributes | Number of Instances | Majority:Minority Instances | Imbalance Ratio | Source |
|---|---|---|---|---|---|
| Delinquency Telecom | 28 | 128,650 | 103,507:25,143 | 4.11 | Kaggle [38] |
| Juvenile Delinquency | 26 | 5953 | 4828:1125 | 4.29 | Prexawanprasut et al. (2021) [6] |
| Lending Club | 22 | 88,890 | 64,377:15,623 | 4.12 | Kaggle [39] |
| Credit Fraud | 31 | 204,507 | 204,015:492 | 415.66 | Kaggle [40] |
| Bank Marketing | 15 | 21,188 | 16,363:4825 | 3.39 | UCI [41] |
| Happino | 25 | 988 | 902:86 | 10.49 | Chaipornkaew et al. (2019) [42] |
| US Crime | 15 | 47 | 78:16 | 4.85 | Kaggle [43] |
| Ecoli | 9 | 336 | 306:20 | 15.30 | Kaggle [44] |
| Optical | 64 | 3823 | 3441:382 | 9.00 | UCI [45] |
| Yeast | 9 | 1484 | 1054:430 | 2.45 | Kaggle [46] |
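Most rows of the table are consistent with the imbalance ratio computed as the majority count divided by the minority count; a one-line sketch, using the Juvenile Delinquency row as the example:

```python
# Imbalance ratio as majority / minority, e.g. Juvenile Delinquency: 4828 / 1125
def imbalance_ratio(n_majority: int, n_minority: int) -> float:
    return n_majority / n_minority

print(round(imbalance_ratio(4828, 1125), 2))   # 4.29
```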
Table: Hyperparameter settings of the baseline classifiers (RF = random forest, DT = decision tree).

| Data Set | RF: Number of Trees | RF: Max Depth | DT: Max Depth | DT: Min Samples Split | ID3: Min Samples Split | ID3: Max Depth |
|---|---|---|---|---|---|---|
| Delinquency Telecom | 200 | none | 20 | 2 | 2 | none |
| Juvenile Delinquency | 150 | none | 18 | 2 | 3 | none |
| Lending Club | 100 | none | 25 | 2 | 2 | none |
| Credit Fraud | 200 | none | 15 | 2 | 2 | none |
| Bank Marketing | 100 | none | 6 | 2 | 2 | none |
| Happino | 50 | none | 4 | 2 | 2 | none |
| US Crime | 20 | none | 4 | 2 | 2 | none |
| Ecoli | 50 | none | 5 | 2 | 2 | none |
| Optical | 100 | none | 15 | 2 | 2 | none |
| Yeast | 200 | none | 15 | 2 | 2 | none |
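As an illustration, the Delinquency Telecom row maps onto scikit-learn estimators as follows; this mapping is an assumption, and ID3 is approximated by an entropy-criterion decision tree because scikit-learn ships no native ID3 implementation.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rf = RandomForestClassifier(n_estimators=200, max_depth=None)   # RF columns
dt = DecisionTreeClassifier(max_depth=20, min_samples_split=2)  # DT columns
id3 = DecisionTreeClassifier(criterion="entropy",               # ID3 columns
                             min_samples_split=2, max_depth=None)
```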
Table: Confusion matrix and the performance measures derived from it.

| | Predicted Positive Class | Predicted Negative Class | |
|---|---|---|---|
| Actual Positive Class | True Positive (TP) | False Negative (FN) | Recall: TP/(TP + FN) |
| Actual Negative Class | False Positive (FP) | True Negative (TN) | Specificity: TN/(TN + FP) |
| | Precision: TP/(TP + FP) | Negative Predictive Value: TN/(TN + FN) | Accuracy: (TP + TN)/(TP + TN + FP + FN) |
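The measures in the table, together with the G-mean and F1-score reported in Section 5, transcribe directly into code; the function below is a sketch that assumes non-degenerate counts (no zero denominators).

```python
def classification_measures(tp: int, fn: int, fp: int, tn: int) -> dict:
    recall = tp / (tp + fn)                       # minority class recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    npv = tn / (tn + fn)                          # negative predictive value
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    g_mean = (recall * specificity) ** 0.5        # used to steer COG
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "specificity": specificity,
            "precision": precision, "npv": npv, "accuracy": accuracy,
            "g_mean": g_mean, "f1": f1}
```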
Table: Classification results on the Delinquency Telecom data set.

| Metric | Original IR | SMOTE | SVM SMOTE | Borderline SMOTE | COG |
|---|---|---|---|---|---|
| Minority Class Recall | 0.3054 | 0.3552 | 0.3687 | 0.3757 | 0.5995 |
| Specificity | 0.7159 | 0.5143 | 0.6143 | 0.6571 | 0.5796 |
| G-mean | 0.4675 | 0.5796 | 0.5915 | 0.5966 | 0.5895 |
| F1-score | 0.2793 | 0.4327 | 0.4052 | 0.4123 | 0.4128 |
Table: Classification results on the Juvenile Delinquency data set.

| Metric | Original IR | SMOTE | SVM SMOTE | Borderline SMOTE | COG |
|---|---|---|---|---|---|
| Minority Class Recall | 0.4321 | 0.7808 | 0.7896 | 0.7812 | 0.8056 |
| Specificity | 0.5042 | 0.8645 | 0.9161 | 0.9076 | 0.9925 |
| G-mean | 0.4668 | 0.7449 | 0.7804 | 0.7696 | 0.8942 |
| F1-score | 0.2693 | 0.7425 | 0.7587 | 0.7793 | 0.8793 |
Table: Classification results on the Lending Club data set.

| Metric | Original IR | SMOTE | SVM SMOTE | Borderline SMOTE | COG |
|---|---|---|---|---|---|
| Minority Class Recall | 0.5581 | 0.7638 | 0.7817 | 0.6745 | 0.7858 |
| Specificity | 0.7478 | 0.8935 | 0.8872 | 0.7137 | 0.9009 |
| G-mean | 0.6461 | 0.7403 | 0.7388 | 0.6377 | 0.8414 |
| F1-score | 0.5581 | 0.7333 | 0.7214 | 0.7145 | 0.7859 |
Table: Minority class recall of COG (%) and its improvement (percentage points) over the baseline oversampling methods.

| Data Set | Minority Class Recall | vs. SMOTE | vs. SVM SMOTE | vs. Borderline SMOTE |
|---|---|---|---|---|
| Delinquency Telecom | 59.95 | 24.43 | 23.08 | 22.38 |
| Juvenile Delinquency | 80.56 | 2.48 | 7.64 | 8.49 |
| Lending Club | 78.58 | 2.20 | 0.41 | 11.13 |
| Credit Fraud | 56.05 | −2.27 | 3.89 | 2.21 |
| Bank Marketing | 60.65 | 6.51 | −0.13 | 0.13 |
| Happino | 41.46 | 4.88 | 7.32 | 0.00 |
| US Crime | 45.45 | 9.09 | 0.00 | 0.00 |
| Ecoli | 84.21 | 14.52 | 18.25 | 15.58 |
| Optical | 76.29 | −11.54 | −1.58 | −3.42 |
| Yeast | 26.53 | 9.35 | 8.33 | 9.75 |
| Average | 60.97 | 5.97 | 6.72 | 6.63 |
Table: Specificity of COG (%) and its improvement (percentage points) over the baseline oversampling methods.

| Data Set | Specificity | vs. SMOTE | vs. SVM SMOTE | vs. Borderline SMOTE |
|---|---|---|---|---|
| Delinquency Telecom | 57.96 | 0.07 | −0.03 | −0.08 |
| Juvenile Delinquency | 99.25 | 12.80 | 7.64 | 8.49 |
| Lending Club | 90.09 | 0.74 | 1.37 | 18.72 |
| Credit Fraud | 74.01 | −0.05 | 0.23 | 0.07 |
| Bank Marketing | 82.03 | −0.12 | −2.33 | −0.85 |
| Happino | 78.00 | −19.59 | −10.40 | 2.80 |
| US Crime | 83.33 | 5.56 | −5.55 | 0.00 |
| Ecoli | 89.68 | 4.25 | 5.23 | 5.59 |
| Optical | 88.11 | 0.13 | 0.54 | 0.68 |
| Yeast | 98.52 | 0.75 | 1.66 | 1.33 |
| Average | 84.10 | 0.46 | −0.16 | 3.68 |
Table: G-mean of COG (%) and its improvement (percentage points) over the baseline oversampling methods.

| Data Set | G-Mean | vs. SMOTE | vs. SVM SMOTE | vs. Borderline SMOTE |
|---|---|---|---|---|
| Delinquency Telecom | 58.95 | 0.99 | 0.00 | −0.01 |
| Juvenile Delinquency | 89.42 | 14.93 | 11.38 | 12.46 |
| Lending Club | 84.14 | 10.11 | 10.26 | 20.37 |
| Credit Fraud | 64.41 | −1.88 | 0.57 | 0.98 |
| Bank Marketing | 70.56 | 3.85 | −1.06 | −0.28 |
| Happino | 56.87 | −2.28 | 1.93 | 1.03 |
| US Crime | 61.55 | 8.37 | −2.01 | 0.00 |
| Ecoli | 86.90 | 9.64 | 12.32 | 13.08 |
| Optical | 81.99 | −6.26 | −2.42 | −3.25 |
| Yeast | 51.13 | 12.06 | 11.49 | 10.87 |
| Average | 70.59 | 4.95 | 4.25 | 5.53 |
Table: F1-score of COG (%) and its improvement (percentage points) over the baseline oversampling methods.

| Data Set | F1-Score | vs. SMOTE | vs. SVM SMOTE | vs. Borderline SMOTE |
|---|---|---|---|---|
| Delinquency Telecom | 41.28 | −0.02 | 0.01 | 0.00 |
| Juvenile Delinquency | 87.93 | 13.68 | 12.06 | 10.00 |
| Lending Club | 78.59 | 5.26 | 6.45 | 7.14 |
| Credit Fraud | 9.38 | −2.25 | −2.78 | −1.52 |
| Bank Marketing | 44.60 | 3.70 | −2.58 | −0.80 |
| Happino | 20.24 | −23.28 | −4.53 | 1.56 |
| US Crime | 65.47 | 10.53 | −2.92 | 0.00 |
| Ecoli | 66.67 | 8.50 | 10.20 | 10.33 |
| Optical | 56.27 | 6.87 | 0.32 | 0.24 |
| Yeast | 40.00 | 12.45 | 10.95 | 9.93 |
| Average | 51.04 | 3.54 | 2.72 | 3.69 |
Table: Imbalance ratio in the ambiguous regions, misclassified minority instances, and information entropy.

| Data Set | IR in the Ambiguous Regions | Misclassified Minority Instances (%) | Information Entropy |
|---|---|---|---|
| Delinquency Telecom | 16.55 | 60.04 | 0.8380 |
| Juvenile Delinquency | 9.85 | 20.14 | 0.6245 |
| Lending Club | 8.59 | 6.12 | 0.5145 |
| Credit Fraud | 3.03 | 5.38 | 0.3528 |
| Bank Marketing | 3.55 | 4.27 | 0.2524 |
| Happino | 3.66 | 3.45 | 0.2647 |
| US Crime | 6.22 | 6.82 | 0.3880 |
| Ecoli | 12.12 | 40.58 | 0.6341 |
| Optical | 8.00 | 5.44 | 0.1859 |
| Yeast | 12.45 | 42.95 | 0.7255 |
Table: Reduction in misclassified minority instances achieved by COG; the last four columns give the reduction in information entropy relative to each method.

| Data Set | Reduction of Misclassified Minority Instances (%) | vs. Original IR | vs. SMOTE | vs. SVM SMOTE | vs. Borderline SMOTE |
|---|---|---|---|---|---|
| Delinquency Telecom | 25.63 | 0.4245 | 0.2457 | 0.2471 | 0.2785 |
| Juvenile Delinquency | 10.06 | 0.4214 | 0.2209 | 0.2105 | 0.2457 |
| Lending Club | 12.32 | 0.3547 | 0.0435 | 0.0475 | 0.2474 |
| Credit Fraud | 5.52 | 0.0563 | 0.0587 | 0.0457 | 0.0458 |
| Bank Marketing | 3.08 | 0.0952 | 0.0304 | 0.0147 | 0.0578 |
| Happino | 5.04 | 0.0457 | 0.0921 | 0.0975 | 0.0475 |
| US Crime | 9.20 | 0.3205 | 0.2150 | 0.0458 | 0.0243 |
| Ecoli | 8.06 | 0.4347 | 0.2478 | 0.2571 | 0.2848 |
| Optical | 1.59 | 0.0240 | 0.0145 | 0.0587 | 0.0074 |
| Yeast | 9.55 | 0.4234 | 0.2289 | 0.2658 | 0.2875 |
Table: Parameter settings of the proposed method for each data set.

| Data Set | Number of Clusters (n) | Oversampling Details | Termination IR (Γ) | Initial Oversampling Ratio (П) |
|---|---|---|---|---|
| Delinquency Telecom | 4 (4) | 0−, 1−, 2, 3 | 1.00 | 0.10 |
| Juvenile Delinquency | 5 (4) | 0−, 1−, 2, 3−, 4 | 0.80 | 0.08 |
| Lending Club | 8 (5) | 0−, 1−, 2, 3−, 4, 5, 6, 7− | 0.80 | 0.08 |
| Credit Fraud | 11 (5) | 0−, 1, 2, 3−, 4, 5, 6−, 7+, 8, 9−, 10− | 0.65 | 0.15 |
| Bank Marketing | 4 (3,4) | 0, 1−, 2, 3 | 1.00 | 0.02 |
| Happino | 9 (5) | 0, 1, 2−, 3, 4, 5, 6, 7−, 8+ | 0.75 | 0.02 |
| US Crime | 2 (2,3) | 0−, 1− | 1.00 | 0.05 |
| Ecoli | 3 (2,3) | 0, 1−, 2 | 1.00 | 0.01 |
| Optical | 4 (3) | 0−, 1−, 2, 3 | 1.00 | 0.02 |
| Yeast | 4 (3) | 0−, 1−, 2, 3 | 1.00 | 0.05 |
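To tie the table back to Algorithm 1, here is a hypothetical end-to-end run of the `cog()` sketch from Section 3 with the Delinquency Telecom settings (n = 4, Γ = 1.00, П = 0.10); `make_classification` stands in for the real, preprocessed data set.

```python
from sklearn.datasets import make_classification

# Synthetic imbalanced data: class 0 is the majority, class 1 the minority.
X, y = make_classification(n_samples=2000, n_informative=10,
                           weights=[0.8, 0.2], random_state=0)
model, g3 = cog(X, y, n_clusters=4, gamma=1.00, pi_init=0.10)
print(f"G3 = {g3:.4f}")
```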
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).