Incorporating Digital Footprints into Credit-Scoring Models through Model Averaging
Abstract
1. Introduction
2. Literature Review
2.1. Personal Credit Scoring
2.2. Digital Footprints
2.3. Model Averaging
3. Digital Footprints Data Processing and Forecasting Method
3.1. Variable Selection
3.2. Model Averaging Estimation and Weight Choice
- 1. Construction of the candidate models
- 2. Parameter estimation
- 3. Calculation of the Kullback–Leibler (KL) loss for the model-averaged logistic regression
- 4. Selection of the optimal weight vector
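The four steps above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the KL-type weight criterion has the form G(w) = -2 Σ_i [y_i η_i(w) - log(1 + exp(η_i(w)))] + λ Σ_m w_m k_m. Here η(w) = Σ_m w_m η_m averages the candidate linear predictors, and k_m is the number of parameters in candidate m. This form is used in the optimal model-averaging literature for generalized linear models. The penalties λ = 2 (AIC-type) and λ = log n (BIC-type) are illustrative choices. All function names, the Newton-Raphson fit, the projected-gradient optimizer, and the simulated data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function: (1 + tanh(z/2)) / 2.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Step 2: maximum-likelihood logistic regression via Newton-Raphson.
    X is assumed to already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (y - p)                             # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])      # Fisher information
        step = np.linalg.solve(hess + 1e-8 * np.eye(X.shape[1]), grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def candidate_predictors(X, y, candidates):
    """Steps 1-2: fit each candidate (a list of column indices of X) and
    stack its fitted linear predictor as one column of E (n x M)."""
    betas = [fit_logistic(X[:, cols], y) for cols in candidates]
    E = np.column_stack([X[:, cols] @ b for cols, b in zip(candidates, betas)])
    return betas, E


def kl_criterion(w, E, y, k, lam):
    """Step 3 (assumed form): -2 * log-likelihood of the averaged fit
    plus lam * sum_m w_m k_m, averaging on the linear-predictor scale."""
    eta = E @ w
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    return -2.0 * loglik + lam * np.dot(w, k)


def project_to_simplex(v):
    """Euclidean projection onto {w : w_m >= 0, sum_m w_m = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ind = np.arange(1, v.size + 1)
    rho = np.nonzero(u * ind > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def optimal_weights(E, y, k, lam=2.0, n_iter=5000):
    """Step 4: minimise the convex criterion over the simplex by projected
    gradient descent with step 1/L (L bounds the gradient's Lipschitz constant)."""
    M = E.shape[1]
    L = 0.5 * np.linalg.eigvalsh(E.T @ E).max()
    w = np.full(M, 1.0 / M)                              # start from equal weights
    for _ in range(n_iter):
        grad = -2.0 * E.T @ (y - sigmoid(E @ w)) + lam * k
        w = project_to_simplex(w - grad / L)
    return w


def predict_proba(X_new, candidates, betas, w):
    # Weighted average of candidate linear predictors, then the logistic link.
    eta = sum(wm * (X_new[:, cols] @ b) for wm, cols, b in zip(w, candidates, betas))
    return sigmoid(eta)


# Hypothetical simulated example with three nested candidate models.
rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=(n, 3))
X = np.column_stack([np.ones(n), Z])                     # intercept + 3 covariates
y = (rng.uniform(size=n) < sigmoid(-0.5 + Z[:, 0] + 0.5 * Z[:, 1])).astype(float)
candidates = [[0, 1], [0, 1, 2], [0, 1, 2, 3]]
k = np.array([len(c) for c in candidates], dtype=float)
betas, E = candidate_predictors(X, y, candidates)
w_aic = optimal_weights(E, y, k, lam=2.0)                # AIC-type penalty
w_bic = optimal_weights(E, y, k, lam=np.log(n))          # BIC-type penalty
print(np.round(w_aic, 3), np.round(w_bic, 3))
print(predict_proba(X[:5], candidates, betas, w_aic))
```

The log-likelihood is concave in the linear predictor, and the predictor is linear in w, so G is convex in w. A fixed step of 1/L therefore decreases G monotonically while keeping the weights on the simplex.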
4. Empirical Analysis
4.1. Dataset
4.2. Model Performance Evaluation
4.3. Validity of Digital Footprint Variables
5. Conclusions and Future Research
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Variable | Description | Values or Categories |
---|---|---|
Loan amount (RMB) | | [99, 40,000] |
Term | Repayment term selected by the borrower | 1 to 12 months |
Penalty interest rate | | [0.00045, 0.00097] |
Platform interest rate | | [0.000277, 0.00065] |
Total Credit Limit of Circulating Card (RMB) | | (Low, 10,000); [10,000, 30,000); [30,000, 50,000); [50,000, 100,000); |
Maximum Months on Book | Longest time between the account or loan opening date and the current date | |
Current highest overdue status | | Write-off; Bad Debt; Payment Stopped; Frozen; Account Closed; Normal; Not Activated |
Number of loans | | |
Total Credit Limit (RMB) | | |
Longest months on book for loan accounts | | |
Number of inquiries for CC approval in the last month | | |
Age | | |
Gender | | Male; Female |
Marital status | | Married; Not Married |
Number of active cities in the last 90 days | | |
Length of registration (days) | | |
Platform Consumption Power Levels | | |
Consumption Frequency Levels | | [0, 3); [3, 6); [6, 10); [10, +∞) |
Consumption Scenario Level | | [0, 2); [2, 4); [4, +∞) |
User’s Number of Transactions in 360 Days | | [0, 2]; (2, 16]; (16, +∞) |
User’s Successful Transaction Amount in 180 Days (RMB) | | [0, 50]; (50, 400]; (400, +∞) |
User’s Successful Transaction Amount in 90 Days (RMB) | | [0, 25]; (25, 200]; (200, +∞) |
User’s Number of Successful Transactions in 360 Days | | [0, 1]; (1, 14]; (14, +∞) |
User’s Number of Successful Transactions in 90 Days | | [0, 1]; (1, 4]; (4, +∞) |
User’s Number of Successful Food Delivery Transactions in 180 Days | | [0, 2]; (2, 11]; (11, +∞) |
User’s Number of Successful Food Delivery Transactions in 360 Days | | [0, 2]; (2, 15]; (15, +∞) |
User’s Number of Successful Food Delivery Transactions in 90 Days | | |
Number of Channels Visited by User in 90 Days | | |
Number of Days User Visited in 360 Days | | |
Number of Days User Visited in 90 Days | | |
Number of Days User Visited Food Delivery Service in 180 Days | | |
Annual Active User Tag | | Active Customer; Inactive Customer |
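The interval notation in the table above distinguishes left-closed bins such as [0, 3); [3, 6) from right-closed bins such as [0, 2]; (2, 16]. The sketch below shows how a raw value could be mapped to its category with Python's `bisect` module. It assumes each category is coded as a 0-based ordinal index; the paper may instead use dummy variables. The helper names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def bin_left_closed(x, cuts):
    """Category index for bins [0, c1); [c1, c2); ...; [ck, +inf)."""
    return bisect_right(cuts, x)


def bin_right_closed(x, cuts):
    """Category index for bins [0, c1]; (c1, c2]; ...; (ck, +inf)."""
    return bisect_left(cuts, x)


# Cut points read off the table; the ordinal coding itself is an assumption.
CONSUMPTION_FREQUENCY_CUTS = [3, 6, 10]   # [0, 3); [3, 6); [6, 10); [10, +inf)
TRANSACTIONS_360D_CUTS = [2, 16]          # [0, 2]; (2, 16]; (16, +inf)

print([bin_left_closed(x, CONSUMPTION_FREQUENCY_CUTS) for x in [0, 2.9, 3, 9, 10, 25]])
# [0, 0, 1, 2, 3, 3]
print([bin_right_closed(x, TRANSACTIONS_360D_CUTS) for x in [0, 2, 3, 16, 17]])
# [0, 0, 1, 1, 2]
```

A value exactly on a cut point goes to the upper bin with `bisect_right` and to the lower bin with `bisect_left`. This matches the bracket conventions in the two kinds of rows.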
Subject | Y | A1 | A2 | B1 | B2 | B3 |
---|---|---|---|---|---|---|
1 | * | * | * | |||
2 | * | * | * | * | ||
3 | * | * | * | * | ||
4 | * | * | * | * | ||
5 | * | * | * | * | * | |
6 | * | * | * | * | * | |
7 | * | * | * | * | * | |
8 | * | * | * | * | * | * |
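The extracted table lost the positions of its empty cells. Its star counts per row are still informative: one row has three marks, three rows have four, three have five, and one has six. This pattern is consistent with candidate models that regress Y on A1 and A2 plus every subset of {B1, B2, B3}. The sketch below enumerates such candidates under that assumption. The order of rows 2 to 7 in the original table cannot be recovered, so the numbering here is illustrative only.

```python
from itertools import combinations

BASE = ["A1", "A2"]              # marked in every row of the table
OPTIONAL = ["B1", "B2", "B3"]    # marked in only some rows


def enumerate_candidates(base, optional):
    """All models containing the base regressors plus any subset of the optional ones."""
    models = []
    for r in range(len(optional) + 1):
        for subset in combinations(optional, r):
            models.append(base + list(subset))
    return models


for i, cols in enumerate(enumerate_candidates(BASE, OPTIONAL), start=1):
    print(i, "Y ~", " + ".join(cols))
```

With three optional groups this yields 2^3 = 8 candidates. The first is the smallest model and the last is the full model.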
Actual class | Predicted positive | Predicted negative |
---|---|---|
Actual positive | TP | FN |
Actual negative | FP | TN |
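The confusion matrix above yields accuracy and the F-score directly. ACC = (TP + TN) / (TP + FN + FP + TN), precision = TP / (TP + FP), and recall = TP / (TP + FN). The sketch below assumes "F-Score" means F1, the harmonic mean of precision and recall. It also assumes a 0.5 threshold on predicted probabilities. Which outcome the paper treats as "positive" is not stated here, so `positive=1` is an assumption, and the data are made up for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, FP, TN) for binary labels."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + fp + tn)


def f_score(tp, fn, fp, beta=1.0):
    """F-beta score; beta = 1 gives F1."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
probs = [0.9, 0.3, 0.8, 0.2, 0.6, 0.1, 0.7, 0.4]
y_pred = [1 if p >= 0.5 else 0 for p in probs]   # 0.5 threshold is an assumption
tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print(tp, fn, fp, tn)               # 3 1 1 3
print(accuracy(tp, fn, fp, tn))     # 0.75
print(f_score(tp, fn, fp))          # 0.75
```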
Model | AUC | ACC | F-Score |
---|---|---|---|
AIC | 0.7640 | 0.6676 | 0.7732 |
BIC | 0.7633 | 0.6621 | 0.7654 |
S-AIC | 0.7638 | 0.6764 | 0.7632 |
S-BIC | 0.7633 | 0.6621 | 0.7670 |
OPT | 0.7740 | 0.7495 | 0.7911 |
Logistics1 | 0.7600 | 0.7370 | 0.7620 |
Logistics8 | 0.7601 | 0.7390 | 0.7812 |
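Unlike ACC and the F-score, AUC does not depend on a classification threshold. It equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that pairwise definition. It is hypothetical helper code for illustration, not the paper's evaluation script. Its cost is O(n_pos · n_neg), so a rank-based formula would be preferable for large samples.

```python
def auc(y_true, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5      # ties count one half
    return wins / (len(pos) * len(neg))


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```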
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, L.; Zhu, J.; Zheng, C.; Zhang, Z. Incorporating Digital Footprints into Credit-Scoring Models through Model Averaging. Mathematics 2024, 12, 2907. https://doi.org/10.3390/math12182907