Research on User Default Prediction Algorithm Based on Adjusted Homogenous and Heterogeneous Ensemble Learning
Abstract
1. Introduction
2. Data and Methods
2.1. Experimental Data
2.2. Data Preprocessing
2.3. Feature Extraction
2.4. Ensemble Learning Models
2.4.1. Random Forest
2.4.2. Multi-Grained Cascade Forest
2.4.3. Categorical Features Gradient Boosting
2.4.4. Random Undersampling Boosting
2.4.5. Stacking
2.5. Model Indicator
3. Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Feature Name | Feature Value Type | Number of Missing Values | Missing Rate |
|---|---|---|---|
| Return on equity | float | 2 | 0.85% |
| Main business profit margin | float | 45 | 19.23% |
| Main business net interest rate | float | 45 | 19.23% |
| Main business cash ratio | float | 45 | 19.23% |
| Profit growth rate | float | 61 | 26.07% |
| Sales growth rate | float | 76 | 32.48% |
| Total asset growth rate | float | 57 | 24.36% |
| Growth rate of accounts receivable | float | 66 | 28.21% |
| Rate of capital accumulation | float | 57 | 24.36% |
| Number of accounts receivable turnover | float | 18 | 7.69% |
| Average turnover days of receivables | float | 49 | 20.94% |
| Number of inventory turnover | float | 20 | 8.55% |
| Average turnover days of the inventory | float | 57 | 24.36% |
| Turnover of current assets | float | 2 | 0.85% |
| Turnover of total capital | float | 2 | 0.85% |
| Quick ratio | float | 2 | 0.85% |
| Current ratio | float | 2 | 0.85% |
| Interest protection multiple | float | 116 | 49.57% |
| Operating cash flow liability ratio | float | 4 | 1.71% |
| Asset–liability ratio | float | 0 | 0.00% |
| Long-term asset suitability rate | float | 10 | 4.27% |
| Equity and liability ratio | float | 0 | 0.00% |
| Net asset | float | 0 | 0.00% |
| Net cash flow from operating activities | float | 0 | 0.00% |
| Net margin | float | 0 | 0.00% |
| Net assets and year-end loan balance ratio | float | 24 | 10.26% |
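The missing values tallied above are later filled by two strategies, hot-deck and mean imputation, whose downstream effects are compared in the results. A minimal plain-Python sketch of the two strategies, assuming a simple random-donor hot-deck (the authors' exact donor-matching scheme is not specified here):

```python
import random

def mean_impute(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def hot_deck_impute(values, rng=None):
    """Replace None entries with a randomly drawn observed value (the donor)."""
    rng = rng or random.Random(0)
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]

# Toy "Quick ratio" column with one gap, for illustration only.
quick_ratio = [1.2, None, 0.8, 1.5]
print(mean_impute(quick_ratio))      # gap filled with the column mean
print(hot_deck_impute(quick_ratio))  # gap filled with an observed donor value
```

Mean imputation preserves the column mean but shrinks its variance; hot-deck imputation keeps filled values inside the observed distribution, which is one reason the two strategies yield different feature-selection and model results below.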
| Feature Selection Method | Hot-Deck Imputation | Mean Imputation |
|---|---|---|
| IV < 0.001, VIF ≥ 10 | 10 | 9 |
| IV < 0.01, VIF ≥ 10 | 7 | 6 |
| AIC Regression | 11 | 11 |
| BIC Regression | 9 | 9 |
| Lasso Regression | 9 | 7 |
| Elastic Net Regression | 11 | 7 |
| PCA | 20 | 20 |
| Parameter | Parameter Value |
|---|---|
| iterations | 1000 |
| learning_rate | 0.03 |
| depth | 6 |
| grow_policy | 'Depthwise' |
| l2_leaf_reg | 3 |
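These settings correspond directly to keyword arguments of CatBoost's classifier constructor. A hedged sketch of how they would be wired up (the surrounding training calls are illustrative, not the authors' exact pipeline):

```python
# Hyperparameters from the table above, keyed by CatBoost's argument names.
catboost_params = {
    "iterations": 1000,         # number of boosting rounds (trees)
    "learning_rate": 0.03,      # shrinkage applied to each tree's contribution
    "depth": 6,                 # maximum depth of each tree
    "grow_policy": "Depthwise", # split all leaves at the current depth level
    "l2_leaf_reg": 3,           # L2 regularization on leaf values
}

# With the catboost package installed, the model would be built as:
#   from catboost import CatBoostClassifier
#   model = CatBoostClassifier(**catboost_params, verbose=False)
#   model.fit(X_train, y_train)
print(sorted(catboost_params))
```

The `'Depthwise'` grow policy makes CatBoost build symmetric level-by-level trees rather than its leaf-wise `'Lossguide'` alternative.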
| Model | Hot Deck | Mean |
|---|---|---|
| SVM | 0.6761 | 0.8451 |
| DT | 0.7746 | 0.9577 |
| KNN | 0.6619 | 0.8732 |
| LDA | 0.7606 | 0.8873 |
| gcForest | 0.9718 | 0.9296 |
| RF | 0.9859 | 0.9859 |
| CatBoost | 1.0000 | 0.9859 |
| RUSBoost | 1.0000 | 0.9859 |
| Imbalance-XGBoost | 0.9577 | 0.9577 |
| Stacking | 0.9296 | 0.9718 |
| Model | IV-VIF 0.01 | IV-VIF 0.001 | AIC | BIC | Lasso | Elastic Net | PCA |
|---|---|---|---|---|---|---|---|
| SVM | 0.7607 | 0.7746 | 0.8732 | 0.9155 | 0.9014 | 0.9014 | 0.8309 |
| DT | 0.8309 | 0.8732 | 0.9718 | 0.9718 | 0.9718 | 0.9859 | 0.8873 |
| KNN | 0.7887 | 0.8592 | 0.9155 | 0.9437 | 0.9155 | 0.9155 | 0.8873 |
| LDA | 0.8028 | 0.8028 | 0.8873 | 0.9014 | 0.8873 | 0.8873 | 0.8592 |
| gcForest | 0.8732 | 0.8591 | 0.9859 | 0.9859 | 0.9577 | 0.9577 | 0.9296 |
| RF | 0.9014 | 0.9014 | 1.0000 | 0.9718 | 0.9859 | 0.9718 | 0.9155 |
| CatBoost | 0.8873 | 0.8169 | 0.9859 | 0.9859 | 0.9859 | 0.9859 | 0.9014 |
| RUSBoost | 0.8028 | 0.8308 | 0.9718 | 0.9859 | 0.9577 | 0.9718 | 0.9014 |
| Imbalance-XGBoost | 0.8028 | 0.8028 | 0.9859 | 0.9859 | 0.9437 | 0.9437 | 0.8732 |
| Stacking | 0.8591 | 0.9014 | 0.9859 | 0.9718 | 0.9437 | 0.9437 | 0.8873 |
| Model (Imputation) | Accuracy | Specificity | Sensitivity | F1-Score | Kappa | MCC |
|---|---|---|---|---|---|---|
| CatBoost (Hot deck) | 100% | 100% | 100% | 100% | 100% | 100% |
| RUSBoost (Hot deck) | 100% | 100% | 100% | 100% | 100% | 100% |
| gcForest (Hot deck) | 97.18% | 97.29% | 97.06% | 97.06% | 94.36% | 94.36% |
| RF (Mean) | 98.59% | 97.29% | 100% | 98.55% | 97.18% | 97.22% |
| Imbalance-XGBoost (Hot deck) | 95.77% | 85.71% | 50% | 61.54% | 34.78% | 37.79% |
| Stacking (Mean) | 97.18% | 94.59% | 100% | 97.14% | 94.37% | 94.52% |
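All six metrics reported above derive from the binary confusion matrix. A self-contained sketch using the standard textbook formulas (not the authors' code; the example counts are illustrative):

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Compute the reported metrics from binary confusion-matrix counts."""
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    specificity = tn / (tn + fp)          # true-negative rate
    sensitivity = tp / (tp + fn)          # true-positive rate (recall)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_e = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    kappa = (accuracy - p_e) / (1 - p_e)
    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": accuracy, "specificity": specificity,
            "sensitivity": sensitivity, "f1": f1,
            "kappa": kappa, "mcc": mcc}

m = classification_metrics(tp=50, fp=2, tn=140, fn=3)
print({k: round(v, 4) for k, v in m.items()})
```

Note how Kappa and MCC expose imbalance that accuracy hides: a model with 95.77% accuracy but 50% sensitivity on the rare default class scores only ~0.35–0.38 on these chance-corrected measures, as the table shows.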
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lu, Y.; Wang, K.; Sun, H.; Qu, H.; Chen, J.; Liu, W.; Chang, C. Research on User Default Prediction Algorithm Based on Adjusted Homogenous and Heterogeneous Ensemble Learning. Appl. Sci. 2024, 14, 5711. https://doi.org/10.3390/app14135711