ChurnKB: A Generative AI-Enriched Knowledge Base for Customer Churn Feature Engineering
Abstract
1. Introduction
2. Background
2.1. Customer Journey, Cognitive Status, and Behaviours
2.2. Customer Churn from an Analytical Perspective
Method | Strengths | Limitations |
---|---|---|
Rule-based text mining | Explicit, interpretable rules | Requires manual rule creation, limited generalization [52] |
Sentiment analysis | Captures basic sentiment polarity (positive/negative) | May miss subtle or domain-specific sentiment nuances [53] |
Domain-specific lexicon | Provides structured vocabulary for churn indicators | Static and requires regular updates to remain relevant [52] |
Generative AI (proposed approach) | Extracts latent features, dynamically adapts to text, and generates new insights | Computationally expensive, potential risk of hallucination [54] |
2.3. Churn-Related Analysis in Various Domains
3. Method
3.1. Developing a Customer ChurnKB
3.1.1. Comprehensive Literature Review
3.1.2. Feature Identification
- Frustration: Frustration arises when customers encounter obstacles, delays, or challenges in their interactions with a company. This emotion may be triggered by poor customer service or unresolved issues.
- Disconnection: Disconnection occurs when customers no longer feel emotionally or personally connected to a brand. This feeling may result from shifts in company values or inconsistent communication.
- Reduced Usage: Customers who are dissatisfied, frustrated, or disconnected from a company are likely to reduce their usage of its products or services.
- Seeking Alternatives: Dissatisfied, frustrated, and disconnected customers are prone to seek alternatives actively. They might research and explore other companies or businesses offering similar products or services.
3.1.3. Taxonomy Development
3.1.4. Taxonomy Validation/Modification (Feedback Loop)
3.1.5. Developing a Sub-Instance to Phrasal/Lexical List Connector API
3.1.6. Linked List Validation (Feedback Loop)
3.1.7. Developing a Sub-Instance Score Calculator API
Algorithm 1. The process of sub-instance score calculation in APIs based on textual data mining techniques.
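The exact procedure is given in Algorithm 1. As a minimal sketch of one way such a sub-instance score could be computed, assuming each sub-instance is linked to a phrasal/lexical list (Section 3.1.5), the snippet below counts lexicon matches in a customer's text and normalises by text length; the function name, the scoring rule, and the example lexicon are illustrative assumptions, not the paper's API.

```python
import re

def sub_instance_score(text, lexicon):
    """Illustrative score for one sub-instance: the fraction of tokens in
    `text` that match the sub-instance's phrasal/lexical list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in lexicon)
    return hits / len(tokens)

# Hypothetical lexicon for the "frustration" sub-instance (not the paper's list).
frustration_lexicon = {"annoyed", "frustrating", "waiting", "useless", "never"}
print(sub_instance_score("I am so annoyed, I keep waiting and nothing works", frustration_lexicon))
```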
3.2. Developing a Knowledge Base-Enhanced Classifier for Identifying Customer Churn-Related Patterns
- The first step involves data curation, defined as the process of transforming raw data into contextualised data and knowledge [12], thereby enhancing the efficiency of ML algorithms. Drawing on two recent studies [12,75], raw textual data are prepared and curated before proceeding with further analysis. The curation process involves three sub-steps: (i) data cleaning, (ii) feature extraction, and (iii) feature enrichment. Customer interactions include communications such as reviews and chat logs collected over a desired period. Cleaning removes punctuation, stop words, and special characters while normalizing the text. Next, feature extraction applies part-of-speech tagging, named entity recognition, and keyword identification to capture key linguistic patterns. Finally, feature enrichment incorporates synonym expansion (e.g., WordNet [76]) and stemming to standardize word forms, enhancing the dataset for churn prediction analysis (a minimal curation sketch is given after this list).
- Each customer feeling and behaviour represented in ChurnKB (e.g., dissatisfaction) may be caused by various reasons (e.g., poor customer service). These reasons, in turn, may lead to several feelings or behaviours (e.g., anger), which are considered sub-instances within ChurnKB. In the second step, as outlined in Algorithm 2, these sub-instances are linked to the curated data and extracted features by initialising an empty list for each sub-instance; extracted features (e.g., stemmed keywords or phrases) are then added to the corresponding sub-instance list (an illustrative linking sketch is given after Algorithm 2 below).
- In the third step, the extracted features from the previous step are used as input for the corresponding sub-instance-related APIs, enabling the score calculation process for each sub-instance.
- In the fourth step, the calculated scores are used as input features for the churn classifier. ML algorithms such as Random Forest, Logistic Regression, and XGBoost can be applied. The model is trained as a binary classifier to predict whether a customer is likely to churn. A feedback loop evaluates the KB’s performance, as described in Section 4.1.
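To make the first step concrete, the following is a minimal sketch of the three curation sub-steps using NLTK and WordNet. The function name, the example sentence, and the exact NLTK resource names are illustrative assumptions (resource names vary across NLTK versions), and named entity recognition is omitted for brevity.

```python
import re
import nltk
from nltk.corpus import stopwords, wordnet
from nltk.stem import PorterStemmer

# One-off downloads for the NLTK resources used below (names may differ by NLTK version).
nltk.download("stopwords"); nltk.download("wordnet")
nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def curate(raw_text):
    # (i) Cleaning: normalise case, drop punctuation/special characters and stop words.
    text = re.sub(r"[^a-z\s]", " ", raw_text.lower())
    tokens = [t for t in nltk.word_tokenize(text) if t not in stop_words]

    # (ii) Feature extraction: part-of-speech tags as a simple linguistic feature.
    pos_tags = nltk.pos_tag(tokens)

    # (iii) Feature enrichment: WordNet synonym expansion plus stemming.
    synonyms = {lemma.name().lower()
                for token in tokens
                for syn in wordnet.synsets(token)
                for lemma in syn.lemmas()}
    stems = [stemmer.stem(t) for t in tokens]

    return {"tokens": tokens, "pos_tags": pos_tags,
            "synonyms": synonyms, "stems": stems}

print(curate("The support agent never resolved my billing issue!"))
```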
Algorithm 2. Linking extracted features from input textual data to the sub-instances in ChurnKB.
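As a rough illustration of the linking step that Algorithm 2 formalises, the sketch below initialises an empty list per sub-instance and appends every curated feature whose stem matches that sub-instance's phrasal/lexical list. The two-entry `churn_kb` dictionary and the prefix-matching rule are illustrative assumptions, not the contents of ChurnKB.

```python
# Hypothetical fragment of ChurnKB: sub-instance -> phrasal/lexical list (stem form).
churn_kb = {
    "frustration": {"annoy", "wait", "frustrat", "useless"},
    "seeking_alternatives": {"competitor", "switch", "cancel", "alternativ"},
}

def link_features(extracted_features, kb):
    """Initialise an empty list per sub-instance, then append every curated
    feature (e.g., a stemmed keyword) that matches that sub-instance's list."""
    linked = {sub_instance: [] for sub_instance in kb}
    for feature in extracted_features:
        for sub_instance, lexicon in kb.items():
            if any(feature.startswith(term) for term in lexicon):
                linked[sub_instance].append(feature)
    return linked

stems = ["wait", "annoy", "switch", "bill"]  # e.g., output of the curation step
print(link_features(stems, churn_kb))
# {'frustration': ['wait', 'annoy'], 'seeking_alternatives': ['switch']}
```

The populated lists then feed the corresponding sub-instance score APIs (step three), and the resulting scores form the feature vector consumed by the churn classifier in step four.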
4. Evaluation
4.1. Evaluating the ChurnKB Development Approach
- H1: The structure of the initial churn taxonomy, including its concepts and instances, is relevant to customer churn.
- H2: The use of the first-person pronoun “I”, absolute words, and certainty words by customers in their communications or feedback is relevant to identifying customers’ churn-related cognitive and behavioural patterns.
- H3: Churn-related features (i.e., sub-instances) and the corresponding phrasal/lexical list are relevant to identifying customers’ churn-related cognitive and behavioural patterns.
- H4: The developed classifier produces reliable results when ChurnKB is applied to enhance the feature engineering phase.
4.2. Evaluating Churn Classifier Performance
4.2.1. Data
4.2.2. Evaluation Metrics
4.2.3. Results
4.2.4. Statistical Validation
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Tueanrat, Y.; Papagiannidis, S.; Alamanos, E. Going on a Journey: A Review of the Customer Journey Literature. J. Bus. Res. 2021, 125, 336–353. [Google Scholar] [CrossRef]
- Knowles, C. Customer Churn Costing Australian Businesses Millions, Report Finds. 2021. Available online: https://itbrief.com.au/story/customer-churn-costing-australian-businesses-millions-report-finds (accessed on 13 April 2025).
- Ahn, J.; Hwang, J.; Kim, D.; Choi, H.; Kang, S. A Survey on Churn Analysis in Various Business Domains. IEEE Access 2020, 8, 220816–220839. [Google Scholar] [CrossRef]
- Wu, X.; Li, P.; Zhao, M.; Liu, Y.; Crespo, R.G.; Herrera-Viedma, E. Customer Churn Prediction for Web Browsers. Expert Syst. Appl. 2022, 209, 118177. [Google Scholar] [CrossRef]
- Kim, K.; Jun, C.H.; Lee, J. Improved Churn Prediction in Telecommunication Industry by Analyzing a Large Network. Expert Syst. Appl. 2014, 41, 6575–6584. [Google Scholar] [CrossRef]
- Amiri, H.; Daume, H., III. Target-Dependent Churn Classification in Microblogs. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; Volume 29. [Google Scholar]
- Verbeke, W.; Martens, D.; Baesens, B. Social network analysis for customer churn prediction. Appl. Soft Comput. 2014, 14, 431–446. [Google Scholar] [CrossRef]
- Abdul-Rahman, S.; Ali, M.F.A.M.; Bakar, A.A.; Mutalib, S. Enhancing churn forecasting with sentiment analysis of steam reviews. Soc. Netw. Anal. Min. 2024, 14, 178. [Google Scholar] [CrossRef]
- Glyph AI. How Large Language Models Extract Insights from Support Calls. 2023. Available online: https://www.joinglyph.com/blog/how-llms-are-used-to-extract-insights-from-support-calls (accessed on 13 April 2025).
- Luzmo. How to Perform Churn Analysis Using AI. 2024. Available online: https://www.luzmo.com/blog/churn-analysis (accessed on 13 April 2025).
- De, S.; Prabu, P. Predicting Customer Churn: A Systematic Literature Review. J. Discret. Math. Sci. Cryptogr. 2022, 25, 1965–1985. [Google Scholar] [CrossRef]
- Beheshti, A.; Benatallah, B.; Tabebordbar, A.; Motahari-Nezhad, H.R.; Barukh, M.C.; Nouri, R. Datasynapse: A social data curation foundry. Distrib. Parallel Databases 2019, 37, 351–384. [Google Scholar] [CrossRef]
- Beheshti, A.; Vaghani, K.; Benatallah, B.; Tabebordbar, A. CrowdCorrect: A Curation Pipeline for Social Data Cleansing and Curation. In Proceedings of the Information Systems in the Big Data Era: CAiSE Forum 2018, Tallinn, Estonia, 11–15 June 2018; pp. 24–38. [Google Scholar]
- Beheshti, A. Knowledge base 4.0: Using crowdsourcing services for mimicking the knowledge of domain experts. In Proceedings of the 2022 IEEE International Conference on Web Services (ICWS), Barcelona, Spain, 10–16 July 2022; pp. 425–427. [Google Scholar]
- Domingos, P. A few useful things to know about machine learning. Commun. ACM 2012, 55, 78–87. [Google Scholar] [CrossRef]
- Shahabikargar, M.; Beheshti, A.; Khatami, A.; Nguyen, R.; Zhang, X.; Alinejad-Rokny, H. Domain Knowledge Enhanced Text Mining for Identifying Mental Disorder Patterns. In Proceedings of the 2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA), Shenzhen, China, 13–16 October 2022; pp. 1–10. [Google Scholar]
- Barukh, M.C.; Zamanirad, S.; Baez, M.; Beheshti, A.; Benatallah, B.; Casati, F.; Yao, L.; Sheng, Q.Z.; Schiliro, F. Cognitive augmentation in processes. In Next-Gen Digital Services. A Retrospective and Roadmap for Service Computing of the Future: Essays Dedicated to Michael Papazoglou on the Occasion of His 65th Birthday and His Retirement; Springer: Cham, Switzerland, 2021; pp. 123–137. [Google Scholar]
- Beheshti, A.; Yang, J.; Sheng, Q.Z.; Benatallah, B.; Casati, F.; Dustdar, S.; Nezhad, H.R.M.; Zhang, X.; Xue, S. ProcessGPT: Transforming Business Process Management with Generative Artificial Intelligence. arXiv 2023, arXiv:2306.01771. [Google Scholar]
- Kapiche. The Definitive Guide to Text Analytics for Customer Experience. 2023. Available online: https://www.kapiche.com/blog/the-definitive-guide-to-text-analytics-for-cx (accessed on 13 April 2025).
- Brooks, M.; Amershi, S.; Lee, B.; Drucker, S.M.; Kapoor, A.; Simard, P. FeatureInsight: Visual support for error-driven feature ideation in text classification. In Proceedings of the 2015 IEEE Conference on Visual Analytics Science and Technology (VAST), Chicago, IL, USA, 25–30 October 2015; pp. 105–112. [Google Scholar]
- Kotni, V.D.P. Paradigm shift from attracting footfalls for retail store to getting hits for e-stores: An evaluation of decision-making attributes in e-tailing. Glob. Bus. Rev. 2017, 18, 1215–1237. [Google Scholar] [CrossRef]
- Blackwell, R.D.; Miniard, P.W.; Engel, J.F. Consumer Behavior, 10th ed.; Dryden Press: Chicago, IL, USA, 2006. [Google Scholar]
- Solomon, R.; Bamossy, G.; Askegaard, S.; Hogg, M. Consumer Behaviour: European Perspective, 4th ed.; Prentice Hall: Harlow, UK, 2010. [Google Scholar]
- Valaskova, K.; Kramarova, K.; Bartosova, V. Multi criteria models used in Slovak consumer market for business decision making. Procedia Econ. Financ. 2015, 26, 174–182. [Google Scholar] [CrossRef]
- Khawaja, S.; Zia, T.; Sokić, K.; Qureshi, F.H. The impact of emotions on consumer behaviour: Exploring gender differences. Mark. Consum. Res. 2023, 88, 69–80. [Google Scholar]
- Havlena, W.J.; Holbrook, M.B. The varieties of consumption experience: Comparing two typologies of emotion in consumer behavior. J. Consum. Res. 1986, 13, 394–404. [Google Scholar] [CrossRef]
- Chandrashekar, G.; Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 2014, 40, 16–28. [Google Scholar] [CrossRef]
- Zheng, A.; Casari, A. Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2018. [Google Scholar]
- Xie, J.; Sage, M.; Zhao, Y.F. Feature selection and feature learning in machine learning applications for gas turbines: A review. Eng. Appl. Artif. Intell. 2023, 117, 105591. [Google Scholar] [CrossRef]
- Bengio, Y.; Courville, A.; Vincent, P. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1798–1828. [Google Scholar] [CrossRef]
- Reddy, G.T.; Reddy, M.P.K.; Lakshmanna, K.; Kaluri, R.; Rajput, D.S.; Srivastava, G.; Baker, T. Analysis of dimensionality reduction techniques on big data. IEEE Access 2020, 8, 54776–54788. [Google Scholar] [CrossRef]
- Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182. [Google Scholar]
- Cohen, I. Optimizing Feature Generation. 2019. Available online: https://medium.com/towards-data-science/optimizing-feature-generation-dab98a049f2e (accessed on 19 April 2025).
- McKinsey. Data Preprocessing vs. Feature Engineering. 2023. Available online: https://www.iguazio.com/questions/data-preprocessing-vs-feature-engineering-whats-the-difference/ (accessed on 13 April 2025).
- Pfingsten, T.; Herrmann, D.J.; Schnitzler, T.; Feustel, A.; Scholkopf, B. Feature selection for troubleshooting in complex assembly lines. IEEE Trans. Autom. Sci. Eng. 2007, 4, 465–469. [Google Scholar] [CrossRef]
- Chai, X.; Deshpande, O.; Garera, N.; Gattani, A.; Lam, W.; Lamba, D.S.; Liu, L.; Tiwari, M.; Tourn, M.; Vacheri, Z.; et al. Social Media Analytics: The Kosmix Story. IEEE Data Eng. Bull. 2013, 36, 4–12. [Google Scholar]
- Beheshti, A. Empowering Generative AI with Knowledge Base 4.0: Towards Linking Analytical, Cognitive, and Generative Intelligence. In Proceedings of the International Conference on Web Services (ICWS), Chicago, IL, USA, 2–8 July 2023. [Google Scholar]
- Thematic. Sentiment Analysis: Comprehensive Beginner’s Guide. 2023. Available online: https://getthematic.com/sentiment-analysis (accessed on 13 April 2025).
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
- Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv 2013, arXiv:1312.6114. [Google Scholar]
- Cao, Y.; Li, S.; Liu, Y.; Yan, Z.; Dai, Y.; Yu, P.S.; Sun, L. A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT. arXiv 2023, arXiv:2303.04226. [Google Scholar]
- Shi, Y.; Wang, B.; Yu, Y.; Tang, X.; Huang, C.; Dong, J. Robust anomaly detection for multivariate time series through temporal GCNs and attention-based VAE. Knowl.-Based Syst. 2023, 275, 110725. [Google Scholar] [CrossRef]
- Wu, J.; Plataniotis, K.; Liu, L.; Amjadian, E.; Lawryshyn, Y. Interpretation for Variational Autoencoder Used to Generate Financial Synthetic Tabular Data. Algorithms 2023, 16, 121. [Google Scholar] [CrossRef]
- Park, N.; Mohammadi, M.; Gorde, K.; Jajodia, S.; Park, H.; Kim, Y. Data synthesis based on generative adversarial networks. arXiv 2018, arXiv:1806.03384. [Google Scholar] [CrossRef]
- Kate, P.; Ravi, V.; Gangwar, A. FinGAN: Chaotic generative adversarial network for analytical customer relationship management in banking and insurance. Neural Comput. Appl. 2023, 35, 6015–6028. [Google Scholar] [CrossRef]
- Li, B.; Xie, J. Study on the Prediction of Imbalanced Bank Customer Churn Based on Generative Adversarial Network. J. Phys. Conf. Ser. 2020, 1624, 032054. [Google Scholar] [CrossRef]
- Hofmann, P.; Rückel, T.; Urbach, N. Innovating with Artificial Intelligence: Capturing the Constructive Functional Capabilities of Deep Generative Learning. In Proceedings of the 54th Hawaii International Conference on System Sciences, Kauai, HI, USA, 5–8 January 2021. [Google Scholar]
- Tirado-Olivares, S.; Navío-Inglés, M.; O’Connor-Jiménez, P.; Cózar-Gutiérrez, R. From Human to Machine: Investigating the Effectiveness of the Conversational AI ChatGPT in Historical Thinking. Educ. Sci. 2023, 13, 803. [Google Scholar] [CrossRef]
- Su, J.; Yang, W. Unlocking the power of ChatGPT: A framework for applying generative AI in education. ECNU Rev. Educ. 2023, 6, 355–366. [Google Scholar]
- Aydın, Ö.; Karaarslan, E. Is ChatGPT leading generative AI? What is beyond expectations? J. Eng. Smart Syst. 2023, 11, 118–134. [Google Scholar]
- Azaria, A. ChatGPT Usage and Limitations; HAL Open Science: Lyon, France, 2022. [Google Scholar]
- Datavid. How Is Text Mining Different from Data Mining? 2022. Available online: https://datavid.com/blog/text-mining-vs-data-mining (accessed on 13 April 2025).
- International Research Journal of Engineering Science, Technology and Innovation. Sentiment Analysis: Techniques, Limitations, and Case Studies in Data Extraction and Classification. 2023. Available online: https://www.interesjournals.org/articles/sentiment-analysis-techniques-limitations-and-case-studies-in-data-extraction-and-classification-99020.html (accessed on 13 April 2025).
- MIT xPRO. Exploring the Shift from Traditional to Generative AI. 2024. Available online: https://curve.mit.edu/exploring-shift-traditional-generative-ai (accessed on 13 April 2025).
- Periáñez, Á.; Saas, A.; Guitart, A.; Magne, C. Churn prediction in mobile social games: Towards a complete assessment using survival ensembles. In Proceedings of the 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Montreal, QC, Canada, 17–19 October 2016; pp. 564–573. [Google Scholar]
- Tamaddoni Jahromi, A.; Sepehri, M.M.; Teimourpour, B.; Choobdar, S. Modeling customer churn in a non-contractual setting: The case of telecommunications service providers. J. Strateg. Mark. 2010, 18, 587–598. [Google Scholar] [CrossRef]
- Buckinx, W.; Van den Poel, D. Customer base analysis: Partial defection of behaviourally loyal clients in a non-contractual FMCG retail setting. Eur. J. Oper. Res. 2005, 164, 252–268. [Google Scholar] [CrossRef]
- Lejeune, M.A. Measuring the impact of data mining on churn management. Internet Res. 2001, 11, 375–387. [Google Scholar] [CrossRef]
- Agrawal, S.; Das, A.; Gaikwad, A.; Dhage, S. Customer churn prediction modelling based on behavioural patterns analysis using deep learning. In Proceedings of the 2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE), Shah Alam, Malaysia, 11–12 July 2018; pp. 1–6. [Google Scholar]
- Rahman, M.; Kumar, V. Machine Learning-Based Customer Churn Prediction in Banking. In Proceedings of the 2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 5–7 November 2020; pp. 1196–1201. [Google Scholar]
- Karvana, K.G.M.; Yazid, S.; Syalim, A.; Mursanto, P. Customer churn analysis and prediction using data mining models in banking industry. In Proceedings of the 2019 International Workshop on Big Data and Information Security (IWBIS), Bali, Indonesia, 11 October 2019; pp. 33–38. [Google Scholar]
- Miao, X.; Wang, H. Customer Churn Prediction on Credit Card Services using Random Forest Method. In Proceedings of the 2022 7th International Conference on Financial Innovation and Economic Development (ICFIED 2022), Online, 14–16 January 2022; Atlantis Press: Dordrecht, The Netherlands, 2022; pp. 649–656. [Google Scholar]
- Zhang, R.; Li, W.; Tan, W.; Mo, T. Deep and shallow model for insurance churn prediction service. In Proceedings of the 2017 IEEE International Conference on Services Computing (SCC), Honolulu, HI, USA, 25–30 June 2017; pp. 346–353. [Google Scholar]
- Kilimci, Z.H.; Yörük, H.; Akyokus, S. Sentiment analysis based churn prediction in mobile games using word embedding models and deep learning algorithms. In Proceedings of the 2020 International Conference on Innovations in Intelligent Systems and Applications (INISTA), Novi Sad, Serbia, 24–26 August 2020; pp. 1–7. [Google Scholar]
- Wang, A.X.; Chukova, S.S.; Nguyen, B.P. Data-Centric AI to Improve Churn Prediction with Synthetic Data. In Proceedings of the 2023 3rd International Conference on Computer, Control and Robotics (ICCCR), Shanghai, China, 24–26 March 2023; pp. 409–413. [Google Scholar]
- Hasumoto, K.; Goto, M. Predicting customer churn for platform businesses: Using latent variables of variational autoencoder as consumers’ purchasing behavior. Neural Comput. Appl. 2022, 34, 18525–18541. [Google Scholar] [CrossRef]
- Lemon, K.N.; Verhoef, P.C. Understanding customer experience throughout the customer journey. J. Mark. 2016, 80, 69–96. [Google Scholar] [CrossRef]
- Rezvani, N.; Beheshti, A.; Tabebordbar, A. Linking Textual and Contextual Features for Intelligent Cyberbullying Detection in Social Media. In Proceedings of the 18th International Conference on Advances in Mobile Computing & Multimedia, Chiang Mai, Thailand, 30 November–2 December 2020; pp. 3–10. [Google Scholar]
- Troop, N.A.; Chilcot, J.; Hutchings, L.; Varnaite, G. Expressive writing, self-criticism, and self-reassurance. Psychol. Psychother. Theory Res. Pract. 2013, 86, 374–386. [Google Scholar] [CrossRef]
- De Choudhury, M.; Counts, S.; Horvitz, E. Major life changes and behavioral markers in social media: Case of childbirth. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work, San Antonio, TX, USA, 23–27 February 2013; pp. 1431–1442. [Google Scholar]
- Fernandes, A.C.; Dutta, R.; Velupillai, S.; Sanyal, J.; Stewart, R.; Chandran, D. Identifying suicide ideation and suicidal attempts in a psychiatric clinical research database using natural language processing. Sci. Rep. 2018, 8, 7426. [Google Scholar] [CrossRef]
- Pennebaker, J.W.; Boyd, R.L.; Jordan, K.; Blackburn, K. The Development and Psychometric Properties of LIWC2015; Technical report; Pennebaker Conglomerates: Austin, TX, USA, 2015. [Google Scholar]
- Mohammad, S.M.; Turney, P.D. Crowdsourcing a word–emotion association lexicon. Comput. Intell. 2013, 29, 436–465. [Google Scholar] [CrossRef]
- Park, K.; Hong, J.S.; Kim, W. A methodology combining cosine similarity with classifier for text classification. Appl. Artif. Intell. 2020, 34, 396–411. [Google Scholar] [CrossRef]
- Beheshti, A.; Benatallah, B.; Nouri, R.; Tabebordbar, A. CoreKG: A knowledge lake service. Proc. VLDB Endow. 2018, 11, 1942–1945. [Google Scholar] [CrossRef]
- Miller, G.A. WordNet: A lexical database for English. Commun. ACM 1995, 38, 39–41. [Google Scholar] [CrossRef]
- Heaton, J. Goodfellow, I.; Bengio, Y.; Courville, A.: Deep Learning. The MIT Press, 2016, 800 pp, ISBN: 0262035618. Genet. Program. Evolvable Mach. 2018, 19, 305–307. [Google Scholar] [CrossRef]
- Arik, S.Ö.; Pfister, T. Tabnet: Attentive interpretable tabular learning. Proc. AAAI Conf. Artif. Intell. 2021, 35, 6679–6687. [Google Scholar] [CrossRef]
Model | Recall | Precision | F1 Score |
---|---|---|---|
(A) Scenario One: Numerical Data Only. | |||
Tabnet | 0.5000 | 0.7800 | 0.6094 |
Gradient Boosting | 0.4970 | 0.7446 | 0.5894 |
XGBoost | 0.5005 | 0.6830 | 0.5752 |
MLP | 0.5065 | 0.6190 | 0.5571 |
LightGBM | 0.4823 | 0.6711 | 0.5567 |
(B) Scenario Two: Textual Data Only. | |||
Linear Discriminant Analysis | 0.6308 | 0.7966 | 0.7021 |
MLP | 0.6410 | 0.7463 | 0.6897 |
AdaBoost | 0.6229 | 0.7661 | 0.6850 |
Tabnet | 0.6301 | 0.7188 | 0.6715 |
XGBoost | 0.6057 | 0.7565 | 0.6698 |
(C) Scenario Three: Numerical + Textual Data. | |||
XGBoost | 0.7085 | 0.8946 | 0.7891 |
LightGBM | 0.7013 | 0.8963 | 0.7842 |
AdaBoost | 0.7193 | 0.8531 | 0.7754 |
Tabnet | 0.6623 | 0.7969 | 0.7234
MLP | 0.6494 | 0.7812 | 0.7092 |
Metric | T-Statistic | p-Value |
---|---|---|
AUC | 5.297949 | 0.000253 |
Precision | 6.428685 | 0.000049 |
Recall | 7.819004 | 0.000008 |
F1 | 8.786979 | 0.000003 |
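The table above reports a t-statistic and p-value per metric. A common way to obtain such values is a paired t-test over per-fold metric scores of two model setups; the sketch below shows that computation with SciPy, using placeholder per-fold F1 values rather than the paper's data (the exact test and pairing used are those specified in Section 4.2.4).

```python
from scipy.stats import ttest_rel

# Placeholder per-fold F1 scores for two model setups (illustrative values only).
f1_baseline = [0.57, 0.58, 0.56, 0.59, 0.57, 0.58, 0.56, 0.58, 0.57, 0.59]
f1_full     = [0.78, 0.79, 0.77, 0.80, 0.79, 0.78, 0.79, 0.80, 0.78, 0.79]

t_stat, p_value = ttest_rel(f1_full, f1_baseline)  # paired t-test across folds
print(f"F1: t = {t_stat:.3f}, p = {p_value:.6f}")
```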
Model Setup | F1 Score | Improvement over Baseline |
---|---|---|
Baseline (A): numerical features only | 0.5752 | — |
ChurnKB-enabled (B): textual features only | 0.6698 | +9.4% |
Full model (C): numerical + textual features | 0.7891 | +21.3% |