Impact of Imbalanced Datasets Preprocessing in the Performance of Associative Classifiers
Abstract
1. Introduction
2. Previous Works
2.1. Credit Scoring
2.2. Computational Intelligence Models for Financial Applications
2.3. Data Preprocessing for Financial Applications
3. Materials and Methods
3.1. Datasets
3.2. Associative Classifiers
3.2.1. Hybrid Associative Classifier with Translation
- For each association $(x^\mu, y^\mu)$ in the training set, the external product $y^\mu (x^\mu)^T$ is computed, where $(x^\mu)^T$ is the transpose of the input vector $x^\mu$.
- Sum the external products to obtain the matrix $M = \alpha \sum_{\mu=1}^{p} y^\mu (x^\mu)^T$, where $\alpha$ is a normalization parameter (usually $\alpha = 1$). Each component of the matrix is defined as $m_{ij} = \alpha \sum_{\mu=1}^{p} y_i^\mu x_j^\mu$.
- Translate the pattern to classify, $\tilde{x}$, according to the average $\bar{x}$ of the training patterns, as $\hat{x} = \tilde{x} - \bar{x}$.
- Determine the components of the output vector (class) $y$ for the pattern to classify $\hat{x}$. To do so, it is considered that $y_i = 1$ if $\sum_{j} m_{ij}\hat{x}_j = \max_{h} \sum_{j} m_{hj}\hat{x}_j$, and $y_i = 0$ otherwise. Thus, the class $k$ will be returned if and only if the obtained vector has value 1 in its $k$-th component, and 0 in the remaining components (a code sketch of these steps follows this list).
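To make these steps concrete, here is a minimal sketch in Python (NumPy), assuming one-hot class vectors and assuming that the training patterns are translated by the same mean as the query pattern, as in the original HACT formulation; the function names are illustrative, not taken from the authors' code.

```python
import numpy as np

def hact_train(X, Y, alpha=1.0):
    """Sketch of HACT training. X: (p, n) patterns; Y: (p, m) one-hot classes."""
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                      # translation (assumed for training too)
    M = alpha * sum(np.outer(y, x) for x, y in zip(Xc, Y))  # sum of y (x)^T
    return M, x_mean

def hact_classify(M, x_mean, x):
    """Lernmatrix-style recall: winner-take-all over the class scores."""
    scores = M @ (x - x_mean)            # translate the pattern, then recall
    y = (scores == scores.max()).astype(int)
    # the class k is returned iff the output vector is 1 only at component k
    return int(np.argmax(y)) if y.sum() == 1 else None
```

With `Y` built as `np.eye(m)[labels]`, `hact_classify` returns the predicted label index, or `None` when the recall is ambiguous (tied scores).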
3.2.2. Extended Gamma Classifier
3.2.3. Naïve Associative Classifier
3.2.4. Smallest Normalized Difference Associative Memory
3.3. Sampling Algorithms for Imbalanced Data
1. Oversampling algorithms. These techniques are based on the creation of synthetic instances of the minority class, either through replication or by creating new instances based on the existing ones (the core idea is sketched after this list).
2. Undersampling algorithms. These methods are based on the elimination of instances of the majority class.
3. Hybrid algorithms. These methods are a combination of oversampling and undersampling techniques.
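As an illustration of the first family, the sketch below generates synthetic minority instances by interpolating each seed pattern with one of its nearest minority-class neighbors, which is the core idea behind SMOTE [53]. It is a simplified sketch under our own naming and parameter choices, not the reference implementation.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic instances from the minority patterns X_min."""
    rng = np.random.default_rng(rng)
    # pairwise distances within the minority class; exclude self-matches
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    neighbors = np.argsort(d, axis=1)[:, :k]    # k nearest minority neighbors
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))            # random seed instance
        j = rng.choice(neighbors[i])            # one of its neighbors
        gap = rng.random()                      # interpolation factor in [0, 1]
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Undersampling, conversely, would drop majority-class instances, either at random (as in RUS) or guided by neighborhood rules (as in TL and NCL).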
4. Experimental Methodology
4.1. Dataset Selection
4.2. Validation Methods Selection
4.3. Performance Measure Selection
1. Correctly classify a positive instance (True Positive, TP)
2. Correctly classify a negative instance (True Negative, TN)
3. Incorrectly classify a positive instance (False Negative, FN)
4. Incorrectly classify a negative instance (False Positive, FP)

A sketch showing how these four counts combine into the evaluation measures follows.
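From these four counts the per-class rates are derived, and from those the measure reported in this paper, the Area under the ROC curve (AUC). The sketch below uses the single-point approximation AUC = (TPR + TNR)/2, a common formula for crisp (non-scoring) classifiers; whether the experiments use exactly this computation is our assumption.

```python
def rates(tp, tn, fn, fp):
    """True positive rate (sensitivity) and true negative rate (specificity)."""
    return tp / (tp + fn), tn / (tn + fp)

def single_point_auc(tp, tn, fn, fp):
    """AUC of a single discrete classifier: (TPR + TNR) / 2 (assumed formula)."""
    tpr, tnr = rates(tp, tn, fn, fp)
    return (tpr + tnr) / 2
```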
4.4. Sampling Algorithms Selection
4.5. Classifiers Selection
4.6. Statistical Test Selection
5. Experimental Results
5.1. Results of the Sampling Algorithms
5.2. Impact of the Sampling Algorithms on the Performance of Associative Classifiers
5.2.1. Impact on the Performance of the HACT Classifier
5.2.2. Impact on the Performance of the Extended Gamma Classifier
5.2.3. Impact on the Performance of the NAC Classifier
5.2.4. Impact on the Performance of the SNDAM Classifier
5.3. Statistical Analysis
6. Conclusions and Future Works
- About sampling methods:
- All of the oversampling methods tested obtained balanced datasets, although at the cost of increasing the cardinality of the data.
- All of the undersampling methods analyzed (CNN, CNNTL, NCL, OSS, SBC and TL), except RUS, failed to obtain balanced datasets when the imbalance ratio of the original set was greater than 4.0. However, for moderate imbalance ratios (less than 4.0), the NCL and TL algorithms obtained good results.
- The CNN, CNNTL, OSS and SBC algorithms systematically reversed the class proportions of the datasets, turning the majority class into a minority one.
- The SBC algorithm behaved very poorly on financial data, since it systematically eliminated all the instances of the majority class.
- Both SMOTE-ENN and SMOTE-TL obtained good results with respect to data balancing.
- SPIDER2 obtained better-balanced datasets than SPIDER.
- About the impact of sampling on the associative classifiers:
- The HACT, Extended Gamma and NAC classifiers do not benefit from financial data balancing.
- Undersampling algorithms do not benefit the SNDAM classifier. However, oversampling and hybrid methods do increase the performance of SNDAM over imbalanced financial data.
- There is a significant improvement, at a 95% confidence level, in the Area under the ROC curve of SNDAM when imbalanced financial data are sampled with SMOTE and SMOTE-ENN.
- As future works, we propose:
- To design undersampling algorithms that are robust to high imbalance ratios, in order to overcome the limitations found in the evaluated algorithms.
- To apply the proposed methodology to other supervised classifiers, for instance Deep Neural Networks and other Deep Learning algorithms.
- To choose other datasets, from areas of interest other than finance, in order to perform experiments similar to those presented in this paper.
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Bischl, B.; Kühn, T.; Szepannek, G. On Class Imbalance Correction for Classification Algorithms in Credit Scoring. In Operations Research Proceedings 2014; Springer: Basel, Switzerland, 2014; pp. 37–43.
- García, V.; Marqués, A.; Sánchez, J.S. On the use of data filtering techniques for credit risk prediction with instance-based models. Expert Syst. Appl. 2012, 39, 13267–13276.
- Marqués, A.I.; García, V.; Sánchez, J.S. On the suitability of resampling techniques for the class imbalance problem in credit scoring. J. Oper. Res. Soc. 2013, 64, 1060–1070.
- Banasik, J.; Crook, J.; Thomas, L. Sample selection bias in credit scoring models. J. Oper. Res. Soc. 2003, 54, 822–832.
- Su, H.; Qi, W.; Yang, C.; Aliverti, A.; Ferrigno, G.; De Momi, E. Deep Neural Network Approach in Human-Like Redundancy Optimization for Anthropomorphic Manipulators. IEEE Access 2019, 7, 124207–124216.
- Su, H.; Yang, C.; Mdeihly, H.; Rizzo, A.; Ferrigno, G.; De Momi, E. Neural Network Enhanced Robot Tool Identification and Calibration for Bilateral Teleoperation. IEEE Access 2019, 7, 122041–122051.
- Goh, R.; Lee, L. Credit Scoring: A Review on Support Vector Machines and Metaheuristic Approaches. Adv. Oper. Res. 2019, 2019, 1–30.
- Wang, T.; Li, J. An improved support vector machine and its application in P2P lending personal credit scoring. In IOP Conference Series: Materials Science and Engineering; IOP Publishing: Bristol, UK, 2019; p. 062041.
- Luo, J.; Yan, X.; Tian, Y. Unsupervised quadratic surface support vector machine with application to credit risk assessment. Eur. J. Oper. Res. 2020, 280, 1008–1017.
- Akkoç, S. Exploring the Nature of Credit Scoring: A Neuro Fuzzy Approach. Fuzzy Econ. Rev. 2019, 24, 3–24.
- Livieris, I.E. Forecasting economy-related data utilizing weight-constrained recurrent neural networks. Algorithms 2019, 12, 85.
- Munkhdalai, L.; Lee, J.Y.; Ryu, K.H. A Hybrid Credit Scoring Model Using Neural Networks and Logistic Regression. In Advances in Intelligent Information Hiding and Multimedia Signal Processing; Springer: Basel, Switzerland, 2020; pp. 251–258.
- Feng, X.; Xiao, Z.; Zhong, B.; Dong, Y.; Qiu, J. Dynamic weighted ensemble classification for credit scoring using Markov Chain. Appl. Intell. 2019, 49, 555–568.
- Guo, S.; He, H.; Huang, X. A multi-stage self-adaptive classifier ensemble model with application in credit scoring. IEEE Access 2019, 7, 78549–78559.
- Pławiak, P.; Abdar, M.; Acharya, U.R. Application of new deep genetic cascade ensemble of SVM classifiers to predict the Australian credit scoring. Appl. Soft Comput. 2019, 84, 105740.
- Xiao, J.; Zhou, X.; Zhong, Y.; Xie, L.; Gu, X.; Liu, D. Cost-sensitive semi-supervised selective ensemble model for customer credit scoring. Knowl.-Based Syst. 2020, 189, 105118.
- Shen, K.-Y.; Sakai, H.; Tzeng, G.-H. Comparing two novel hybrid MRDM approaches to consumer credit scoring under uncertainty and fuzzy judgments. Int. J. Fuzzy Syst. 2019, 21, 194–212.
- Zhang, W.; He, H.; Zhang, S. A novel multi-stage hybrid model with enhanced multi-population niche genetic algorithm: An application in credit scoring. Expert Syst. Appl. 2019, 121, 221–232.
- Maldonado, S.; Peters, G.; Weber, R. Credit scoring using three-way decisions with probabilistic rough sets. Inf. Sci. 2020, 507, 700–714.
- García, V.; Marqués, A.I.; Sánchez, J.S. Exploring the synergetic effects of sample types on the performance of ensembles for credit risk and corporate bankruptcy prediction. Inf. Fusion 2019, 47, 88–101.
- Louzada, F.; Ara, A.; Fernandes, G.B. Classification methods applied to credit scoring: Systematic review and overall comparison. Surv. Oper. Res. Manag. Sci. 2016, 21, 117–134.
- Lessmann, S.; Baesens, B.; Seow, H.-V.; Thomas, L.C. Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. Eur. J. Oper. Res. 2015, 247, 124–136.
- Brown, I.; Mues, C. An experimental comparison of classification algorithms for imbalanced credit scoring data sets. Expert Syst. Appl. 2012, 39, 3446–3453.
- Su, H.; Yang, C.; Ferrigno, G.; De Momi, E. Improved human–robot collaborative control of redundant robot for teleoperated minimally invasive surgery. IEEE Robot. Autom. Lett. 2019, 4, 1447–1453.
- Wolpert, D.H. The supervised learning no-free-lunch theorems. In Soft Computing and Industry; Springer: London, UK, 2002; pp. 25–42.
- Villuendas-Rey, Y.; Rey-Benguría, C.F.; Ferreira-Santiago, Á.; Camacho-Nieto, O.; Yáñez-Márquez, C. The naïve associative classifier (NAC): A novel, simple, transparent, and accurate classification model evaluated on financial data. Neurocomputing 2017, 265, 105–115.
- López, V.; Fernández, A.; García, S.; Palade, V.; Herrera, F. An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Inf. Sci. 2013, 250, 113–141.
- Piramuthu, S. On preprocessing data for financial credit risk evaluation. Expert Syst. Appl. 2006, 30, 489–497.
- Abdou, H.A.; Pointon, J. Credit scoring, statistical techniques and evaluation criteria: A review of the literature. Intell. Syst. Account. Financ. Manag. 2011, 18, 59–88.
- Su, H.; Ovur, S.E.; Zhou, X.; Qi, W.; Ferrigno, G.; De Momi, E. Depth vision guided hand gesture recognition using electromyographic signals. Adv. Robot. 2020, 1–13.
- Beaver, W.H. Financial ratios as predictors of failure. J. Account. Res. 1966, 4, 71–111.
- Altman, E.I. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. J. Financ. 1968, 23, 589–609.
- Damrongsakmethee, T.; Neagoe, V.-E. Principal component analysis and ReliefF cascaded with decision tree for credit scoring. In Proceedings of the Computer Science On-line Conference, Zlin, Czech Republic, 24–27 April 2019; pp. 85–95.
- Kozodoi, N.; Lessmann, S.; Papakonstantinou, K.; Gatsoulis, Y.; Baesens, B. A multi-objective approach for profit-driven feature selection in credit scoring. Decis. Support Syst. 2019, 120, 106–117.
- Srinivasan, V.; Kim, Y.H. Credit granting: A comparative analysis of classification procedures. J. Financ. 1987, 42, 665–681.
- Abellán, J.; Castellano, J.G. A comparative study on base classifiers in ensemble methods for credit scoring. Expert Syst. Appl. 2017, 73, 1–10.
- Boughaci, D.; Alkhawaldeh, A.A. Appropriate machine learning techniques for credit scoring and bankruptcy prediction in banking and finance: A comparative study. Risk Decis. Anal. 2018, 1–10.
- Greene, W. Sample selection in credit-scoring models. Jpn. World Econ. 1998, 10, 299–316.
- Crone, S.F.; Finlay, S. Instance sampling in credit scoring: An empirical study of sample size and balancing. Int. J. Forecast. 2012, 28, 224–238.
- Dal Pozzolo, A.; Caelen, O.; Bontempi, G. When is undersampling effective in unbalanced classification tasks? In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Porto, Portugal, 7–11 September 2015; pp. 200–215.
- Santiago-Montero, R. Hybrid Associative Pattern Classifier with Translation (In Spanish: Clasificador Híbrido de Patrones Basado en la Lernmatrix de Steinbuch y el Linear Associator de Anderson Kohonen). Master's Thesis, Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico City, Mexico, 2003.
- Cleofas-Sánchez, L.; García, V.; Marqués, A.; Sánchez, J.S. Financial distress prediction using the hybrid associative memory with translation. Appl. Soft Comput. 2016, 44, 144–152.
- López-Yáñez, I.; Argüelles-Cruz, A.J.; Camacho-Nieto, O.; Yáñez-Márquez, C. Pollutants time-series prediction using the Gamma classifier. Int. J. Comput. Int. Syst. 2011, 4, 680–711.
- Ramirez, A.; Lopez, I.; Villuendas, Y.; Yanez, C. Evolutive improvement of parameters in an associative classifier. IEEE Lat. Am. Trans. 2015, 13, 1550–1555.
- Villuendas-Rey, Y.; Yanez-Marquez, C.; Anton-Vargas, J.A.; Lopez-Yanez, I. An extension of the gamma associative classifier for dealing with hybrid data. IEEE Access 2019, 7, 64198–64205.
- Serrano-Silva, Y.O.; Villuendas-Rey, Y.; Yáñez-Márquez, C. Automatic feature weighting for improving financial Decision Support Systems. Decis. Support Syst. 2018, 107, 78–87.
- Ramírez-Rubio, R.; Aldape-Pérez, M.; Yáñez-Márquez, C.; López-Yáñez, I.; Camacho-Nieto, O. Pattern classification using smallest normalized difference associative memory. Pattern Recogn. Lett. 2017, 93, 104–112.
- Cleofas-Sánchez, L.; Sánchez, J.S.; García, V.; Valdovinos, R.M. Associative learning on imbalanced environments: An empirical study. Expert Syst. Appl. 2016, 54, 387–397.
- González, S.; García, S.; Li, S.-T.; Herrera, F. Chain based sampling for monotonic imbalanced classification. Inf. Sci. 2019, 474, 187–204.
- Nejatian, S.; Parvin, H.; Faraji, E. Using sub-sampling and ensemble clustering techniques to improve performance of imbalanced classification. Neurocomputing 2018, 276, 55–66.
- Yan, Y.; Liu, R.; Ding, Z.; Du, X.; Chen, J.; Zhang, Y. A parameter-free cleaning method for SMOTE in imbalanced classification. IEEE Access 2019, 7, 23537–23548.
- Li, Y.; Wang, J.; Wang, S.; Liang, J.; Li, J. Local dense mixed region cutting + global rebalancing: A method for imbalanced text sentiment classification. Int. J. Mach. Learn. Cybern. 2019, 10, 1805–1820.
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
- He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008; pp. 1322–1328.
- Han, H.; Wang, W.-Y.; Mao, B.-H. Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In Proceedings of the International Conference on Intelligent Computing, Hefei, China, 23–26 August 2005; pp. 878–887.
- Bunkhumpornpat, C.; Sinapiromsaran, K.; Lursinsap, C. Safe-Level-SMOTE: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Bangkok, Thailand, 27–30 April 2009; pp. 475–482.
- Batista, G.E.; Prati, R.C.; Monard, M.C. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor. Newsl. 2004, 6, 20–29.
- Tang, S.; Chen, S.-P. The generation mechanism of synthetic minority class examples. In Proceedings of the 2008 International Conference on Information Technology and Applications in Biomedicine, Shenzhen, China, 30–31 May 2008; pp. 444–447.
- Tomek, I. Two modifications of CNN. IEEE Trans. Syst. Man Cybern. 1976, 6, 769–772.
- Hart, P. The condensed nearest neighbor rule. IEEE Trans. Inf. Theory 1968, 14, 515–516.
- Kubat, M.; Matwin, S. Addressing the curse of imbalanced training sets: One-sided selection. In Proceedings of the 14th International Conference on Machine Learning (ICML), Nashville, TN, USA, 8–12 July 1997; pp. 179–186.
- Laurikkala, J. Improving identification of difficult small classes by balancing class distribution. In Proceedings of the Conference on Artificial Intelligence in Medicine in Europe, Cascais, Portugal, 1–4 July 2001; pp. 63–66.
- Yen, S.-J.; Lee, Y.-S. Under-sampling approaches for improving prediction of the minority class in an imbalanced dataset. In Intelligent Control and Automation; Springer: Berlin/Heidelberg, Germany, 2006; pp. 731–740.
- Stefanowski, J.; Wilk, S. Selective pre-processing of imbalanced data for improving classification performance. In Proceedings of the International Conference on Data Warehousing and Knowledge Discovery, Turin, Italy, 1–5 September 2008; pp. 283–292.
- Napierała, K.; Stefanowski, J.; Wilk, S. Learning from imbalanced data in presence of noisy and borderline examples. In Proceedings of the International Conference on Rough Sets and Current Trends in Computing, Warsaw, Poland, 28–30 June 2010; pp. 158–167.
- Larson, S.C. The shrinkage of the coefficient of multiple correlation. J. Educ. Psychol. 1931, 22, 45–55.
- Stone, M. Cross-validatory choice and assessment of statistical predictions. J. R. Stat. Soc. 1974, 36, 111–147.
- Geisser, S. The predictive sample reuse method with applications. J. Am. Stat. Assoc. 1975, 70, 320–328.
- Dietterich, T.G. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput. 1998, 10, 1895–1923.
- Alcalá-Fdez, J.; Sánchez, L.; García, S.; del Jesus, M.J.; Ventura, S.; Garrell, J.M.; Otero, J.; Romero, C.; Bacardit, J.; Rivas, V.M.; et al. KEEL: A software tool to assess evolutionary algorithms for data mining problems. Soft Comput. 2009, 13, 307–318.
- Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437.
- Triguero, I.; González, S.; Moyano, J.M.; García López, S.; Alcalá Fernández, J.; Luengo Martín, J.; Fernández, A.; del Jesús, M.J.; Sánchez, L.; Herrera, F. KEEL 3.0: An open source software for multi-stage analysis in data mining. Int. J. Comput. Intell. Syst. 2017, 10, 1238–1249.
- Hernández-Castaño, J.A.; Villuendas-Rey, Y.; Camacho-Nieto, O.; Yáñez-Márquez, C. Experimental Platform for Intelligent Computing (EPIC). Computación y Sistemas 2018, 22, 245–253.
- Hernández-Castaño, J.A.; Villuendas-Rey, Y.; Camacho-Nieto, O.; Rey-Benguría, C.F. A New Experimentation Module for the EPIC Software. Res. Comput. Sci. 2018, 147, 243–252.
- García, S.; Herrera, F. An extension on "Statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons. J. Mach. Learn. Res. 2008, 9, 2677–2694.
- Friedman, M. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc. 1937, 32, 675–701.
- Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 1979, 6, 65–70.
Dataset | Instances | Num. Attributes | Cat. Attributes | Missing | IR |
---|---|---|---|---|---|
Australian | 690 | 8 | 6 | No | 1.25 |
Default credit | 30,000 | 13 | 10 | No | 3.52 |
German | 1000 | 7 | 13 | No | 2.33 |
Give me credit | 150,000 | 10 | 0 | No | 13.96 |
Iranian | 1002 | 28 | 0 | Yes | 19.04 |
Japanese | 690 | 6 | 9 | Yes | 1.21 |
Polish_1 | 7027 | 64 | 0 | Yes | 24.93 |
Polish_2 | 10,173 | 64 | 0 | Yes | 24.43 |
Polish_3 | 10,503 | 64 | 0 | Yes | 20.22 |
Polish_4 | 9792 | 64 | 0 | Yes | 18.01 |
Polish_5 | 5910 | 64 | 0 | Yes | 13.41 |
Qualitative | 250 | 0 | 6 | No | 1.34 |
The PAKDD | 20,000 | 10 | 9 | Yes | 4.12 |
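The IR column is the imbalance ratio: the number of majority-class instances divided by the number of minority-class instances. For example, the German dataset contains 700 good and 300 bad payers, so IR = 700/300 ≈ 2.33, matching the table:

```python
def imbalance_ratio(n_majority, n_minority):
    """IR = size of the majority class over size of the minority class."""
    return n_majority / n_minority

print(round(imbalance_ratio(700, 300), 2))  # German credit -> 2.33
```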
Parameter | Meaning | Recommendation |
---|---|---|
Attribute weights vector | The vector of weights of the attributes, which indicates the importance of each attribute. | Computed by Differential Evolution [40] |
Initial value of θ | Indicates how different two numerical values can be while the extended generalized similarity operator still considers them similar. | … (initial value) |
Stop parameter ρ | The maximum value allowed for θ when looking for the disambiguation of patterns near the border; when θ = ρ, the CAG stops iterating and disambiguates the class. | … if there is at least a numeric attribute; otherwise use … |
Pause parameter | At this pause, the pattern to be classified is evaluated to determine whether or not it belongs to the unknown class; on this decision depends whether the normal operation of the algorithm continues. | … if there is at least a numeric attribute; otherwise use … |
Unknown-class threshold | The threshold used to decide whether the pattern to be classified belongs to the unknown class or to one of the known classes. | |
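The sketch below illustrates only the control flow that these parameters govern; everything in it is our assumption: `gamma_votes` is a toy stand-in for the extended generalized similarity operator, and the step size and default values are illustrative, not the table's recommendations.

```python
import numpy as np

def gamma_votes(x, X_train, y_train, theta, weights):
    """Toy similarity votes: an attribute contributes its weight when it
    differs from x by at most theta; votes are averaged per class."""
    sim = ((np.abs(X_train - x) <= theta) * weights).sum(axis=1)
    return {c: sim[y_train == c].mean() for c in np.unique(y_train)}

def cag_classify(x, X_train, y_train, weights,
                 theta0=0.0, rho=5.0, pause=2.0, threshold=0.1, step=1.0):
    """Control-flow sketch: theta grows from theta0; at the pause the
    unknown-class check runs; at theta = rho the CAG stops iterating
    and disambiguates the class."""
    theta = theta0
    while True:
        votes = gamma_votes(x, X_train, y_train, theta, weights)
        best = max(votes.values())
        if theta >= pause and best < threshold:
            return None                          # unknown class
        winners = [c for c, v in votes.items() if v == best]
        if len(winners) == 1 or theta >= rho:
            return winners[0]                    # disambiguated, or forced stop
        theta += step                            # relax similarity and retry
```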
Name | Acronym | Reference |
---|---|---|
Synthetic Minority Over-sampling TEchnique | SMOTE | [53] |
ADAptive SYNthetic Sampling | ADASYN | [54] |
Borderline-Synthetic Minority Over-sampling TEchnique | SMOTE-BL | [55] |
Safe Level Synthetic Minority Over-sampling TEchnique | SMOTE-SL | [56] |
Random Oversampling | ROS | [57] |
Adjusting the Direction Of the synthetic Minority clasS examples | ADOMS | [58] |
Name | Acronym | Reference |
---|---|---|
Tomek’s modification of Condensed Nearest Neighbor | TL | [59] |
Condensed Nearest Neighbor | CNN | [60] |
Condensed Nearest Neighbor + Tomek’s modification of Condensed Nearest Neighbor | CNNTL | [57] |
One-Sided Selection | OSS | [61] |
Random Undersampling | RUS | [57] |
Neighborhood Cleaning Rule | NCL | [62] |
Under-Sampling Based on Clustering | SBC | [63] |
Name | Acronym | Reference |
---|---|---|
Synthetic Minority Over-sampling Technique + Edited Nearest Neighbor | SMOTE-ENN | [57] |
Synthetic Minority Over-sampling Technique + Tomek’s modification of Condensed Nearest Neighbor | SMOTE-TL | [57] |
Selective Preprocessing of Imbalanced Data | SPIDER | [64] |
Selective Preprocessing of Imbalanced Data 2 | SPIDER2 | [65] |
Datasets | Original | ADASYN | ADOMS | SMOTE-BL | ROS | SMOTE-SL | SMOTE |
---|---|---|---|---|---|---|---|
Australian | 1.25 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Default credit | 3.52 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
German | 2.33 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Give me credit | 13.96 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Iranian | 19.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Japanese | 1.26 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Polish_1 | 24.93 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Polish_2 | 24.43 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Polish_3 | 20.22 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Polish_4 | 18.01 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Polish_5 | 13.41 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Qualitative | 1.34 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
The PAKDD | 4.12 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Datasets | Original | CNN | CNNTL | NCL | OSS | RUS | SBC | TL |
---|---|---|---|---|---|---|---|---|
Australian | 1.25 | 3.37 | 8.59 | 1.35 | 5.50 | 1.00 | 2.22 | 1.04 |
Default credit | 3.52 | 1.13 | 2.24 | 1.78 | 1.21 | 1.00 | - | 2.69 |
German | 2.33 | 1.12 | 2.96 | 1.08 | 1.75 | 1.00 | 1.96 | 1.68 |
Give me credit | 13.96 | 1.59 | 1.36 | 10.84 | 1.39 | 1.00 | 2.00 | 12.95 |
Iranian | 19.00 | 2.37 | 1.47 | 14.53 | 2.18 | 1.00 | - | 17.91 |
Japanese | 1.26 | 3.45 | 9.07 | 1.33 | 5.30 | 1.00 | 1.96 | 1.01 |
Polish_1 | 24.93 | 2.53 | 1.40 | 21.14 | 2.50 | 1.00 | - | 23.70 |
Polish_2 | 24.43 | 2.62 | 1.41 | 20.31 | 2.56 | 1.00 | - | 22.97 |
Polish_3 | 20.22 | 2.58 | 1.33 | 16.33 | 2.45 | 1.00 | - | 18.86 |
Polish_4 | 18.01 | 2.38 | 1.25 | 14.42 | 2.22 | 1.00 | - | 16.72 |
Polish_5 | 13.41 | 1.84 | 1.15 | 10.23 | 1.62 | 1.00 | - | 12.28 |
Qualitative | 1.34 | 6.36 | 5.85 | 1.28 | 6.27 | 1.00 | - | 1.32 |
The PAKDD | 4.12 | 1.51 | 1.15 | 2.41 | 1.41 | 1.00 | 2.00 | 3.84 |
Datasets | SMOTE-ENN | SMOTE-TL | SPIDER | SPIDER2 | Original |
---|---|---|---|---|---|
Australian | 1.03 | 1.31 | 1.17 | 1.49 | 1.25 |
Default credit | 1.39 | 1.50 | 1.45 | 1.02 | 3.52 |
German | 1.19 | 1.66 | 1.06 | 1.30 | 2.33 |
Give me credit | 1.13 | 1.17 | 4.22 | 2.62 | 13.96 |
Iranian | 1.18 | 1.14 | 6.18 | 3.89 | 19.00 |
Japanese | 1.02 | 1.28 | 1.15 | 1.46 | 1.26 |
Polish_1 | 1.24 | 1.19 | 6.87 | 4.14 | 24.93 |
Polish_2 | 1.28 | 1.21 | 6.49 | 3.84 | 24.43 |
Polish_3 | 1.32 | 1.26 | 5.50 | 3.31 | 20.22 |
Polish_4 | 1.30 | 1.24 | 5.08 | 3.13 | 18.01 |
Polish_5 | 1.29 | 1.25 | 3.98 | 2.51 | 13.41 |
Qualitative | 1.01 | 1.01 | 1.29 | 1.31 | 1.34 |
The PAKDD | 1.33 | 1.16 | 1.64 | 1.05 | 4.12 |
Datasets | Original | ADASYN | ADOMS | SMOTE-BL | ROS | SMOTE-SL | SMOTE |
---|---|---|---|---|---|---|---|
Australian | 0.59 | 0.59 | 0.59 | 0.59 | 0.61 | 0.59 | 0.60 |
Default credit | 0.95 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.97 |
German | 0.64 | 0.63 | 0.64 | 0.64 | 0.64 | 0.64 | 0.64 |
Give me credit | 0.63 | 0.63 | 0.60 | 0.63 | 0.64 | 0.63 | 0.62 |
Iranian | 0.61 | 0.60 | 0.61 | 0.60 | 0.61 | 0.60 | 0.60 |
Japanese | 0.64 | 0.65 | 0.64 | 0.66 | 0.64 | 0.62 | 0.63 |
Polish_1 | 0.70 | 0.65 | 0.66 | 0.62 | 0.65 | 0.64 | 0.66 |
Polish_2 | 0.58 | 0.58 | 0.58 | 0.58 | 0.58 | 0.58 | 0.58 |
Polish_3 | 0.52 | 0.53 | 0.52 | 0.51 | 0.52 | 0.53 | 0.52 |
Polish_4 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |
Polish_5 | 0.53 | 0.52 | 0.52 | 0.53 | 0.52 | 0.52 | 0.52 |
Qualitative | 0.55 | 0.55 | 0.55 | 0.55 | 0.55 | 0.56 | 0.55 |
The PAKDD | 0.62 | 0.62 | 0.62 | 0.62 | 0.62 | 0.62 | 0.62 |
Datasets | Original | CNNTL | NCL | OSS | RUS | SBC | TL |
---|---|---|---|---|---|---|---|
Australian | 0.59 | 0.58 | 0.59 | 0.57 | 0.59 | 0.64 | 0.59 |
Default credit | 0.95 | 0.89 | 0.95 | 0.88 | 0.96 | - | 0.95 |
German | 0.64 | 0.64 | 0.63 | 0.64 | 0.64 | 0.50 | 0.64 |
Give me credit | 0.64 | 0.62 | 0.63 | 0.61 | 0.63 | 0.64 | 0.63 |
Iranian | 0.61 | 0.61 | 0.61 | 0.61 | 0.61 | - | 0.61 |
Japanese | 0.64 | 0.64 | 0.65 | 0.67 | 0.65 | 0.50 | 0.64 |
Polish_1 | 0.66 | 0.62 | 0.61 | 0.61 | 0.66 | - | 0.62 |
Polish_2 | 0.58 | 0.58 | 0.58 | 0.58 | 0.58 | - | 0.58 |
Polish_3 | 0.52 | 0.50 | 0.52 | 0.51 | 0.51 | - | 0.52 |
Polish_4 | 0.50 | 0.49 | 0.50 | 0.50 | 0.50 | - | 0.50 |
Polish_5 | 0.53 | 0.52 | 0.53 | 0.52 | 0.53 | - | 0.53 |
Qualitative | 0.55 | 0.55 | 0.56 | 0.55 | 0.56 | - | 0.55 |
The PAKDD | 0.62 | 0.63 | 0.62 | 0.63 | 0.62 | 0.53 | 0.63 |
Datasets | Original | SMOTE-ENN | SMOTE-TL | SPIDER2 | SPIDER |
---|---|---|---|---|---|
Australian | 0.59 | 0.60 | 0.57 | 0.59 | 0.59 |
Default credit | 0.95 | 0.96 | 0.96 | 0.95 | 0.95 |
German | 0.64 | 0.63 | 0.63 | 0.62 | 0.63 |
Give me credit | 0.63 | 0.63 | 0.63 | 0.63 | 0.63 |
Iranian | 0.61 | 0.61 | 0.61 | 0.61 | 0.61 |
Japanese | 0.64 | 0.64 | 0.65 | 0.65 | 0.65 |
Polish_1 | 0.70 | 0.65 | 0.62 | 0.60 | 0.62 |
Polish_2 | 0.58 | 0.58 | 0.58 | 0.58 | 0.58 |
Polish_3 | 0.52 | 0.52 | 0.52 | 0.53 | 0.52 |
Polish_4 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |
Polish_5 | 0.53 | 0.52 | 0.52 | 0.52 | 0.53 |
Qualitative | 0.55 | 0.56 | 0.56 | 0.55 | 0.55 |
The PAKDD | 0.62 | 0.62 | 0.62 | 0.62 | 0.63 |
Datasets | Original | ADASYN | ADOMS | SMOTE-BL | ROS | SMOTE-SL | SMOTE |
---|---|---|---|---|---|---|---|
Australian | 0.84 | 0.85 | 0.83 | 0.85 | 0.83 | 0.83 | 0.83 |
Default credit | 0.99 | 0.98 | 0.99 | 0.97 | 0.99 | 0.99 | 0.99 |
German | 0.67 | 0.63 | 0.61 | 0.64 | 0.67 | 0.67 | 0.64 |
Give me credit | 0.68 | 0.67 | 0.54 | 0.67 | 0.69 | 0.69 | 0.65 |
Iranian | 0.59 | 0.51 | 0.50 | 0.52 | 0.59 | 0.59 | 0.51 |
Japanese | 0.68 | 0.63 | 0.57 | 0.58 | 0.67 | 0.64 | 0.63 |
Polish_1 | 0.82 | 0.85 | 0.83 | 0.85 | 0.82 | 0.82 | 0.82 |
Polish_2 | 0.57 | 0.55 | 0.50 | 0.60 | 0.61 | 0.58 | 0.61 |
Polish_3 | 0.76 | 0.50 | 0.50 | 0.50 | 0.76 | 0.58 | 0.50 |
Polish_4 | 0.71 | 0.50 | 0.50 | 0.50 | 0.71 | 0.55 | 0.50 |
Polish_5 | 0.71 | 0.50 | 0.50 | 0.50 | 0.72 | 0.62 | 0.50 |
Qualitative | 0.75 | 0.50 | 0.50 | 0.50 | 0.75 | 0.67 | 0.50 |
The PAKDD | 0.79 | 0.50 | 0.50 | 0.51 | 0.78 | 0.71 | 0.50 |
Datasets | Original | CNNTL | NCL | OSS | RUS | SBC | TL |
---|---|---|---|---|---|---|---|
Australian | 0.84 | 0.84 | 0.87 | 0.84 | 0.83 | 0.84 | 0.85 |
Default credit | 0.99 | 0.94 | 0.99 | 0.94 | 0.99 | - | 0.99 |
German | 0.67 | 0.69 | 0.66 | 0.68 | 0.67 | 0.53 | 0.66 |
Give me credit | 0.68 | 0.68 | 0.68 | 0.69 | 0.68 | 0.67 | 0.70 |
Iranian | 0.59 | 0.60 | 0.59 | 0.60 | 0.59 | - | 0.59 |
Japanese | 0.68 | 0.70 | 0.68 | 0.69 | 0.67 | 0.50 | 0.68 |
Polish_1 | 0.82 | 0.85 | 0.86 | 0.83 | 0.82 | - | 0.85 |
Polish_2 | 0.61 | 0.60 | 0.60 | 0.61 | 0.61 | - | 0.61 |
Polish_3 | 0.76 | 0.76 | 0.75 | 0.74 | 0.75 | - | 0.76 |
Polish_4 | 0.71 | 0.68 | 0.71 | 0.69 | 0.70 | - | 0.71 |
Polish_5 | 0.71 | 0.72 | 0.71 | 0.72 | 0.71 | - | 0.71 |
Qualitative | 0.75 | 0.73 | 0.75 | 0.73 | 0.74 | - | 0.75 |
The PAKDD | 0.79 | 0.78 | 0.79 | 0.79 | 0.79 | 0.55 | 0.78 |
Datasets | Original | SMOTE-ENN | SMOTE-TL | SPIDER2 | SPIDER |
---|---|---|---|---|---|
Australian | 0.84 | 0.83 | 0.85 | 0.86 | 0.85 |
Default credit | 0.99 | 1.00 | 1.00 | 0.98 | 0.98 |
German | 0.67 | 0.65 | 0.65 | 0.65 | 0.65 |
Give me credit | 0.68 | 0.69 | 0.69 | 0.68 | 0.68 |
Iranian | 0.59 | 0.51 | 0.51 | 0.59 | 0.59 |
Japanese | 0.68 | 0.63 | 0.63 | 0.67 | 0.67 |
Polish_1 | 0.82 | 0.82 | 0.85 | 0.87 | 0.86 |
Polish_2 | 0.57 | 0.56 | 0.57 | 0.60 | 0.61 |
Polish_3 | 0.76 | 0.50 | 0.50 | 0.76 | 0.76 |
Polish_4 | 0.71 | 0.50 | 0.50 | 0.71 | 0.71 |
Polish_5 | 0.71 | 0.50 | 0.50 | 0.71 | 0.72 |
Qualitative | 0.75 | 0.50 | 0.50 | 0.75 | 0.74 |
The PAKDD | 0.79 | 0.50 | 0.50 | 0.80 | 0.80 |
Datasets | Original | ADASYN | ADOMS | SMOTE-BL | ROS | SMOTE-SL | SMOTE |
---|---|---|---|---|---|---|---|
Australian | 0.83 | 0.83 | 0.84 | 0.84 | 0.84 | 0.84 | 0.83 |
Default credit | 0.99 | 0.98 | 1.00 | 0.98 | 0.99 | 0.99 | 0.99 |
German | 0.67 | 0.65 | 0.67 | 0.67 | 0.67 | 0.66 | 0.67 |
Give me credit | 0.69 | 0.69 | 0.68 | 0.68 | 0.70 | 0.68 | 0.69 |
Iranian | 0.61 | 0.59 | 0.60 | 0.61 | 0.61 | 0.60 | 0.59 |
Japanese | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |
Polish_1 | 0.84 | 0.84 | 0.84 | 0.84 | 0.83 | 0.84 | 0.83 |
Polish_2 | 0.60 | 0.61 | 0.61 | 0.61 | 0.61 | 0.61 | 0.61 |
Polish_3 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |
Polish_4 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |
Polish_5 | 0.50 | 0.50 | 0.50 | 0.50 | 0.51 | 0.50 | 0.50 |
Qualitative | 0.53 | 0.51 | 0.52 | 0.50 | 0.53 | 0.50 | 0.51 |
The PAKDD | 0.62 | 0.59 | 0.62 | 0.60 | 0.62 | 0.59 | 0.60 |
Datasets | Original | CNNTL | NCL | OSS | RUS | SBC | TL |
---|---|---|---|---|---|---|---|
Australian | 0.83 | 0.84 | 0.83 | 0.83 | 0.84 | 0.82 | 0.83 |
Default credit | 0.99 | 0.94 | 0.99 | 0.94 | 0.99 | - | 0.99 |
German | 0.67 | 0.65 | 0.58 | 0.67 | 0.67 | 0.52 | 0.57 |
Give me credit | 0.69 | 0.64 | 0.68 | 0.68 | 0.69 | 0.68 | 0.69 |
Iranian | 0.61 | 0.53 | 0.68 | 0.54 | 0.61 | - | 0.62 |
Japanese | 0.50 | 0.50 | 0.50 | 0.49 | 0.54 | 0.50 | 0.50 |
Polish_1 | 0.83 | 0.83 | 0.83 | 0.81 | 0.85 | - | 0.83 |
Polish_2 | 0.61 | 0.55 | 0.61 | 0.59 | 0.59 | - | 0.61 |
Polish_3 | 0.50 | 0.50 | 0.50 | 0.50 | 0.52 | - | 0.50 |
Polish_4 | 0.50 | 0.50 | 0.52 | 0.51 | 0.50 | - | 0.50 |
Polish_5 | 0.50 | 0.49 | 0.55 | 0.50 | 0.50 | - | 0.53 |
Qualitative | 0.53 | 0.54 | 0.53 | 0.53 | 0.52 | - | 0.54 |
The PAKDD | 0.62 | 0.50 | 0.65 | 0.52 | 0.50 | 0.50 | 0.57 |
Datasets | Original | SMOTE-ENN | SMOTE-TL | SPIDER2 | SPIDER |
---|---|---|---|---|---|
Australian | 0.83 | 0.84 | 0.82 | 0.82 | 0.83 |
Default credit | 0.99 | 0.98 | 1.00 | 0.98 | 0.98 |
German | 0.67 | 0.60 | 0.58 | 0.51 | 0.51 |
Give me credit | 0.69 | 0.70 | 0.70 | 0.68 | 0.69 |
Iranian | 0.61 | 0.73 | 0.65 | 0.61 | 0.62 |
Japanese | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |
Polish_1 | 0.84 | 0.83 | 0.84 | 0.82 | 0.83 |
Polish_2 | 0.60 | 0.61 | 0.61 | 0.61 | 0.61 |
Polish_3 | 0.50 | 0.50 | 0.50 | 0.51 | 0.50 |
Polish_4 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |
Polish_5 | 0.50 | 0.53 | 0.52 | 0.51 | 0.50 |
Qualitative | 0.53 | 0.53 | 0.52 | 0.53 | 0.52 |
The PAKDD | 0.62 | 0.53 | 0.56 | 0.61 | 0.61 |
Datasets | Original | ADASYN | ADOMS | SMOTE-BL | ROS | SMOTE-SL | SMOTE |
---|---|---|---|---|---|---|---|
Australian | 0.80 | 0.79 | 0.79 | 0.81 | 0.80 | 0.80 | 0.80 |
Default credit | 1.00 | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 |
German | 0.61 | 0.62 | 0.61 | 0.62 | 0.61 | 0.61 | 0.62 |
Give me credit | 0.58 | 0.60 | 0.59 | 0.62 | 0.58 | 0.59 | 0.58 |
Iranian | 0.57 | 0.58 | 0.59 | 0.59 | 0.57 | 0.58 | 0.59 |
Japanese | 0.65 | 0.67 | 0.66 | 0.66 | 0.65 | 0.65 | 0.68 |
Polish_1 | 0.82 | 0.82 | 0.83 | 0.83 | 0.82 | 0.82 | 0.82 |
Polish_2 | 0.91 | 0.91 | 0.91 | 0.90 | 0.91 | 0.78 | 0.91 |
Polish_3 | 0.54 | 0.58 | 0.55 | 0.56 | 0.54 | 0.54 | 0.57 |
Polish_4 | 0.52 | 0.52 | 0.53 | 0.52 | 0.52 | 0.52 | 0.53 |
Polish_5 | 0.52 | 0.55 | 0.54 | 0.54 | 0.52 | 0.53 | 0.55 |
Qualitative | 0.54 | 0.59 | 0.56 | 0.56 | 0.54 | 0.54 | 0.57 |
The PAKDD | 0.58 | 0.62 | 0.60 | 0.61 | 0.58 | 0.58 | 0.61 |
Datasets | Original | CNNTL | NCL | OSS | RUS | SBC | TL |
---|---|---|---|---|---|---|---|
Australian | 0.80 | 0.71 | 0.81 | 0.77 | 0.80 | 0.81 | 0.82 |
Default credit | 1.00 | 0.97 | 1.00 | 0.96 | 0.99 | - | 1.00 |
German | 0.61 | 0.59 | 0.65 | 0.62 | 0.62 | 0.52 | 0.64 |
Give me credit | 0.58 | 0.58 | 0.63 | 0.59 | 0.59 | 0.62 | 0.61 |
Iranian | 0.57 | 0.57 | 0.62 | 0.58 | 0.63 | - | 0.60 |
Japanese | 0.65 | 0.72 | 0.68 | 0.68 | 0.70 | 0.50 | 0.67 |
Polish_1 | 0.82 | 0.70 | 0.81 | 0.79 | 0.82 | - | 0.84 |
Polish_2 | 0.91 | 0.78 | 0.81 | 0.81 | 0.80 | - | 0.85 |
Polish_3 | 0.54 | 0.56 | 0.56 | 0.58 | 0.65 | - | 0.54 |
Polish_4 | 0.52 | 0.55 | 0.53 | 0.55 | 0.60 | - | 0.53 |
Polish_5 | 0.52 | 0.58 | 0.54 | 0.56 | 0.60 | - | 0.53 |
Qualitative | 0.54 | 0.57 | 0.58 | 0.57 | 0.62 | - | 0.56 |
The PAKDD | 0.58 | 0.62 | 0.65 | 0.61 | 0.68 | 0.53 | 0.61 |
Datasets | Original | SMOTE-ENN | SMOTE-TL | SPIDER2 | SPIDER |
---|---|---|---|---|---|
Australian | 0.80 | 0.85 | 0.81 | 0.84 | 0.84 |
Default credit | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
German | 0.61 | 0.64 | 0.64 | 0.64 | 0.64 |
Give me credit | 0.58 | 0.64 | 0.64 | 0.61 | 0.60 |
Iranian | 0.57 | 0.65 | 0.64 | 0.59 | 0.58 |
Japanese | 0.65 | 0.66 | 0.68 | 0.65 | 0.65 |
Polish_1 | 0.82 | 0.87 | 0.84 | 0.85 | 0.85 |
Polish_2 | 0.91 | 0.91 | 0.77 | 0.84 | 0.86 |
Polish_3 | 0.54 | 0.59 | 0.60 | 0.54 | 0.54 |
Polish_4 | 0.52 | 0.54 | 0.54 | 0.52 | 0.52 |
Polish_5 | 0.52 | 0.55 | 0.55 | 0.53 | 0.53 |
Qualitative | 0.54 | 0.62 | 0.61 | 0.54 | 0.54 |
The PAKDD | 0.58 | 0.67 | 0.67 | 0.59 | 0.59 |
i | Algorithm | z | p | Holm |
---|---|---|---|---|
6 | SBC | 2.768916 | 0.005624 | 0.008333 |
5 | CNNTL | 1.861075 | 0.062734 | 0.010000 |
4 | OSS | 1.770291 | 0.076679 | 0.012500 |
3 | TL | 0.408529 | 0.682886 | 0.016667 |
2 | NCL | 0.136176 | 0.891682 | 0.025000 |
1 | RUS | 0.045392 | 0.963795 | 0.050000 |
i | Algorithm | z | p | Holm |
---|---|---|---|---|
6 | ADOMS | 3.994502 | 0.000065 | 0.008333 |
5 | SMOTE | 3.041268 | 0.002356 | 0.010000 |
4 | ADASYN | 2.995876 | 0.002737 | 0.012500 |
3 | SMOTE-BL | 2.496564 | 0.012540 | 0.016667 |
2 | SMOTE-SL | 0.998625 | 0.317976 | 0.025000 |
1 | Original | 0.136176 | 0.891682 | 0.050000 |
i | Algorithm | z | p | Holm |
---|---|---|---|---|
6 | SBC | 3.404405 | 0.000663 | 0.008333 |
5 | RUS | 1.361762 | 0.173273 | 0.010000 |
4 | CNNTL | 0.817057 | 0.413896 | 0.012500 |
3 | OSS | 0.635489 | 0.525110 | 0.016667 |
2 | Original | 0.499313 | 0.617559 | 0.025000 |
1 | NCL | 0.272352 | 0.785351 | 0.050000 |
i | Algorithm | z | p | Holm |
---|---|---|---|---|
4 | SMOTE-ENN | 2.790782 | 0.005258 | 0.012500 |
3 | SMOTE-TL | 2.108590 | 0.034980 | 0.016667 |
2 | Original | 0.496139 | 0.619796 | 0.025000 |
1 | SPIDER | 0.186052 | 0.852404 | 0.050000 |
i | Algorithm | z | p | Holm |
---|---|---|---|---|
6 | Original | 2.995876 | 0.002737 | 0.008333 |
5 | ROS | 2.995876 | 0.002737 | 0.010000 |
4 | SMOTE-SL | 2.814308 | 0.004888 | 0.012500 |
3 | ADOMS | 1.225586 | 0.220355 | 0.016667 |
2 | ADASYN | 0.226960 | 0.820455 | 0.025000 |
1 | SMOTE-BL | 0.226960 | 0.820455 | 0.050000 |
i | Algorithm | z | p | Holm |
---|---|---|---|---|
6 | SBC | 2.950484 | 0.003173 | 0.008333 |
5 | CNNTL | 2.314995 | 0.020613 | 0.010000 |
4 | Original | 2.088035 | 0.036795 | 0.012500 |
3 | OSS | 1.679506 | 0.093053 | 0.016667 |
2 | TL | 0.726273 | 0.467671 | 0.025000 |
1 | NCL | 0.090784 | 0.927664 | 0.050000 |
i | Algorithm | z | p | Holm |
---|---|---|---|---|
4 | Original | 4.279198 | 0.000019 | 0.012500 |
3 | SPIDER | 2.914816 | 0.003559 | 0.016667 |
2 | SPIDER2 | 2.790782 | 0.005258 | 0.025000 |
1 | SMOTE-TL | 1.178330 | 0.238665 | 0.050000 |
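For reference, the Holm column in the tables above holds the step-down thresholds α/i for significance level α = 0.05, where i is the row index: 0.05/6 = 0.008333, 0.05/5 = 0.010, and so on up to 0.05/1 = 0.050. A hypothesis is rejected when its p-value falls below its threshold; for instance, in the first table SBC (p = 0.005624 < 0.008333) differs significantly from the control, while the remaining algorithms do not. A one-line check:

```python
def holm_thresholds(k=6, alpha=0.05):
    """Holm step-down thresholds alpha/i for rows i = k..1, as tabulated."""
    return {i: round(alpha / i, 6) for i in range(k, 0, -1)}

print(holm_thresholds())
# {6: 0.008333, 5: 0.01, 4: 0.0125, 3: 0.016667, 2: 0.025, 1: 0.05}
```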