Article

ChurnKB: A Generative AI-Enriched Knowledge Base for Customer Churn Feature Engineering

1 School of Computing, Macquarie University, Sydney, NSW 2109, Australia
2 College of Engineering and Information Technology, University of Dubai, Dubai 14143, United Arab Emirates
3 Prospa Pty Ltd., 4-16 Yurong St, Darlinghurst, NSW 2010, Australia
4 College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
* Authors to whom correspondence should be addressed.
Algorithms 2025, 18(4), 238; https://doi.org/10.3390/a18040238
Submission received: 16 February 2025 / Revised: 11 April 2025 / Accepted: 14 April 2025 / Published: 21 April 2025

Abstract:
Customers are the cornerstone of business success across industries. Companies invest significant resources in acquiring new customers and, more importantly, retaining existing ones. However, customer churn remains a major challenge, leading to substantial financial losses. Addressing this issue requires a deep understanding of customers’ cognitive status and behaviours, as well as early signs of churn. Predictive and Machine Learning (ML)-based analysis, when trained with appropriate features indicative of customer behaviour and cognitive status, can be highly effective in mitigating churn. A robust ML-driven churn analysis depends on a well-developed feature engineering process. Traditional churn analysis studies have primarily relied on demographic, product usage, and revenue-based features, overlooking the valuable insights embedded in customer–company interactions. Recognizing the importance of domain knowledge and human expertise in feature engineering and building on our previous work, we propose the Customer Churn-related Knowledge Base (ChurnKB) to enhance feature engineering for churn prediction. ChurnKB utilizes textual data mining techniques such as Term Frequency-Inverse Document Frequency (TF-IDF), cosine similarity, regular expressions, word tokenization, and stemming to identify churn-related features within customer-generated content, including emails. To further enrich the structure of ChurnKB, we integrate Generative AI, specifically large language models, which offer flexibility in handling unstructured text and uncovering latent features, to identify and refine features related to customer cognitive status, emotions, and behaviours. 
Additionally, feedback loops are incorporated to validate and enhance the effectiveness of ChurnKB. Integrating knowledge-based features into machine learning models (e.g., Random Forest, Logistic Regression, Multilayer Perceptron, and XGBoost) improves predictive performance compared to the baseline, with XGBoost’s F1 score increasing from 0.5752 to 0.7891. Beyond churn prediction, this approach can potentially support applications such as personalized marketing, cyberbullying detection, hate speech identification, and mental health monitoring, demonstrating its broader impact on business intelligence and online safety.

1. Introduction

Customer churn refers to a customer’s tendency to discontinue business with a company within a given period or contract [1]. Nearly all businesses encounter customer churn, often exacerbated by the limited visibility of customer data. This lack of insight leads to customer relationship crises and churn, making it difficult for companies to predict when customers are likely to leave [2]. Retaining existing customers is significantly more profitable than acquiring new ones, as retained customers generate higher profit margins [3]. Moreover, attracting a new customer costs 5–10 times more than retaining an existing one [4]. Therefore, preventing customer churn is critical for businesses offering ongoing services. Addressing churn effectively requires a deep understanding of customers’ cognitive status and behaviour, enabling businesses to anticipate and mitigate churn risks proactively. To this end, leveraging analytical technologies to identify early signs of churn can be of great benefit.
Extensive research has been conducted on customer churn analysis, utilizing various analytical and machine learning (ML) techniques to gain insights into customer churn tendencies. Most of these studies rely on feature engineering as a fundamental prerequisite for model training and predictive analysis [3,5,6]. However, traditional churn-related analytical studies have predominantly focused on demographic, product usage, and revenue-based features. Some recent studies have incorporated social network analysis features into churn models, yielding promising results [5,6,7]. While some prior studies have explored social media sentiment analysis [8], call transcripts [9], and support tickets [10] for churn prediction, there is still a gap in systematically integrating a structured knowledge base and generative AI into feature engineering for churn analysis. Our approach, ChurnKB, extends prior work by combining knowledge-driven text mining with generative AI to enhance feature extraction and improve predictive performance [11].
Leveraging knowledge bases, which integrate domain knowledge and experts’ experience, can enhance the feature engineering process [12,13,14]. Human expertise and intuition play a crucial role in feature engineering, a fundamental step in ML projects that involves designing and selecting relevant features [15]. In our previous work [16], we explored various psychological and cognitive science sources to develop a domain-specific knowledge base related to mental health disorders, referred to as the Mental Disorder Knowledge Base (mKB). Our findings indicated that depression is closely linked to negative emotions such as sadness, anger, and disgust. Building upon the promising results of that study and recognizing that negative cognitive status, emotions, and feelings (e.g., dissatisfaction with a product or service) increase the likelihood of churn [1], we propose the development of the Customer Churn-related Knowledge Base (ChurnKB) to enhance customer churn analysis.
Recent advancements in generative AI (GenAI) have expanded the opportunities available to businesses, allowing them to enhance various aspects of their operations beyond what was previously considered achievable [17]. GenAI is already transforming industries such as journalism, visual arts, and customer service by enabling automated content creation [18].
ChurnKB leverages generative AI in conjunction with textual data mining techniques to extract churn-related features from customer-generated content, such as emails, chat logs with support agents, website reviews, and social media feedback. Unlike traditional NLP-based approaches (e.g., rule-based text mining, sentiment analysis, and domain-specific lexicons), which rely on predefined heuristics, generative AI offers the flexibility to handle unstructured text, uncover latent features, and generate contextually enriched representations of customer behaviour [19].
As the first contribution of this study, we propose the development of ChurnKB and leverage generative AI, specifically Large Language Models (LLMs), to refine and enhance its structure. This includes utilizing LLMs for feature generation to build a more comprehensive and adaptable knowledge base. To ensure its robustness, we implement a feedback loop approach that validates both the structural integrity of ChurnKB and the effectiveness of GenAI in improving its framework.
A key factor in effectively utilizing ML algorithms is the selection of meaningful features that capture essential aspects of the data. In customer churn analysis, behavioural patterns that indicate a propensity to churn are particularly important. However, despite the availability of numerous tools for handling ML data and algorithms, support for cognitive and behavioural features remains limited [20]. As our second contribution, we explore the integration of ChurnKB into ML models by leveraging cosine similarity and Term Frequency-Inverse Document Frequency (TF-IDF) techniques. These techniques facilitate the quantification of churn-related features, allowing ML classifiers to better capture subtle emotional cues and behavioural signals indicative of churn. Additionally, we assess the effectiveness of ChurnKB-derived features by comparing model performance with and without knowledge-enhanced feature representations.
The remainder of this paper is organized as follows: Section 2 provides background studies on customer churn, feature engineering, and generative AI. Section 3 details our proposed approach for developing ChurnKB and its integration into the customer churn classification pipeline. Section 4 presents the evaluation results of the proposed approach, followed by a discussion of the findings in Section 5. Finally, Section 6 concludes the paper with remarks on future research directions.

2. Background

2.1. Customer Journey, Cognitive Status, and Behaviours

The customer journey encompasses the entire experience a customer has with a business, including interactions at various touchpoints (i.e., points of interaction) such as purchasing and post-purchase services [1]. Optimizing these touchpoints is essential in enhancing customer experience. Customer service, as a key component of the journey, involves communication with customers, follow-ups after sales, and gathering qualitative data on customer sentiments. Combining these data with quantitative factors provides valuable insights into customer feedback and trends [1].
The customer journey significantly influences customer cognitive status, satisfaction, and overall churn risk [1,21]. Meanwhile, customer behaviour refers to the activities and decision-making processes customers undergo when purchasing and consuming products [22,23,24]. Various factors, including personal attributes (e.g., gender, age, education, and income), psychological influences (e.g., motivation, perception, and attitudes), social elements (e.g., family and reference groups), and cultural aspects (e.g., social class), shape customer cognitive status and behaviour [25]. Emotional responses such as pleasure, fear, anger, and sadness further impact the experiences and purchasing decisions of customers, ultimately influencing their likelihood of staying with or leaving a business [26].

2.2. Customer Churn from an Analytical Perspective

Understanding and predicting customer churn is crucial in today’s data-driven world. Churn analysis aims to identify patterns and reasons behind customer departures. By analysing data from various sources, businesses can gain insights into why customers leave. ML models help predict potential churners, enabling companies to take action and retain valuable customers [27]. An ML workflow relies on a good combination of data, features, and models [20]. Features act as a bridge between raw data and insights. Selecting the right features simplifies model training and improves performance. Poor feature selection may require more complex models to achieve the same accuracy [28].
A feature is a measurable property of data used in ML. Features can be numerical (e.g., age and income), categorical (e.g., gender and country), textual (e.g., TF-IDF and word embeddings), temporal (e.g., date and time), geospatial (e.g., location and distance), or transactional (e.g., product prices) [28]. Some models work better with specific feature types. The number of features also matters. Too few may lead to poor predictions, while too many can increase complexity and costs [28]. Feature engineering is the process of transforming raw data into useful features for ML models. It involves creating new features, modifying existing ones, and ensuring they are informative and relevant [28,29,30]. Effective feature engineering enhances a model’s predictive power and performance [31].
Feature engineering transforms raw data into structured representations for ML, optimizing model performance [27,29,32]. It encompasses feature generation, preprocessing, extraction, and selection. Feature generation derives new attributes from existing data, enhancing interpretability and prediction performance [33]. Preprocessing ensures data consistency through cleaning, transformation, and aggregation, supporting meaningful feature extraction [34]. Feature extraction identifies informative patterns, leveraging techniques like embeddings and named entity recognition [12]. Feature selection refines model inputs by prioritizing relevant attributes, reducing dimensionality and noise [35]. Additionally, domain knowledge-based feature engineering integrates expert insights to improve feature relevance, with approaches such as rule-based engineering and external knowledge incorporation [16,36]. Studies have explored advanced feature engineering strategies, including data curation [12], weak supervision [16], and visualization-driven ideation [20], further enhancing interpretability and predictive power.
Artificial Intelligence (AI) refers to the simulation of human intelligence in computer systems. A key advancement in this field is GenAI, which mimics human creativity and problem solving by learning patterns from existing data [37]. As shown in Table 1, compared to traditional feature engineering techniques, which often rely on domain-specific heuristics or handcrafted rules, GenAI provides a more adaptive approach by leveraging deep learning to dynamically extract meaningful features from unstructured data. While rule-based text mining and sentiment analysis require predefined keywords and lexicons, generative AI can identify implicit signals, sentiment nuances, and context-aware patterns that may be overlooked by static methods [38]. GenAI uses deep learning techniques to generate human-like content in various domains, including text, images, and structured data. Its ability to produce realistic and innovative outputs has made it valuable in multiple industries.
GenAI has revolutionized various tasks by providing models like Generative Adversarial Networks (GANs) [39] and Variational Autoencoders (VAEs) [40]. These models are particularly useful for feature engineering, especially when dealing with unstructured data [41], anomaly detection [42], data augmentation [43], synthetic data generation [43,44], imbalanced data [45,46], and data imputation [41,47]. Combining generative models with traditional feature engineering methods can improve the overall efficiency of machine learning models.
As GenAI continues to evolve, its impact on technology and creativity is expanding. Well-known applications include DALL-E (https://openai.com/dall-e-2 (accessed on 13 April 2025)); VALL-E (https://www.microsoft.com/en-us/research/project/vall-e-x/ (accessed on 13 April 2025)); ProcessGPT [18]; and ChatGPT (https://openai.com/blog/chatgpt (accessed on 13 April 2025)), a widely recognized conversational AI tool. ChatGPT builds on OpenAI’s GPT series, first introduced in 2018, and is based on transformer architectures, which excel in natural language processing (NLP) tasks such as text generation and question answering [48].
Building on our previous research [16] and the increasing influence of GenAI in various industries [49], we propose the use of generative models to improve feature engineering for customer churn prediction. One of the key challenges in this approach is Prompt Engineering (PE), which involves designing strategic queries to maximize the quality of AI-generated responses. To address this challenge, we conducted a comprehensive literature review on customer journey analysis, with a focus on behavioural and cognitive aspects. Using the acquired knowledge, we developed structured prompts and fed them into a generative model. Since these models have inherent limitations [50,51], such as generating incorrect information, we integrated a feedback loop where domain experts verified the reliability of AI-generated content.
Table 1. Comparison of generative AI vs. traditional NLP feature engineering.

| Method | Strengths | Limitations |
|---|---|---|
| Rule-based text mining | Explicit, interpretable rules | Requires manual rule creation, limited generalization [52] |
| Sentiment analysis | Captures basic sentiment polarity (positive/negative) | May miss subtle or domain-specific sentiment nuances [53] |
| Domain-specific lexicon | Provides structured vocabulary for churn indicators | Static and requires regular updates to remain relevant [52] |
| Generative AI (proposed approach) | Extracts latent features, dynamically adapts to text, and generates new insights | Computationally expensive, potential risk of hallucination [54] |

2.3. Churn-Related Analysis in Various Domains

This section reviews customer churn studies across different industries, focusing on their feature engineering approaches. Churn is commonly defined as inactivity over a prolonged period [55], though the definition varies across businesses due to market competition and flexible service contracts. For example, in Internet-based industries (e.g., online banking, education, and entertainment), churn is frequent due to low switching costs and minimal investment requirements [56,57,58].
Most churn prediction models use log data, which record user interactions within a system (https://www.sumologic.com/glossary/log-file/ (accessed on 13 April 2025)). Businesses employ log-based churn models to analyse transaction logs (e.g., purchases and cancellations), communication logs (e.g., emails and calls), and activity logs (e.g., login data) [3]. Various industries, including telecommunication, banking, and insurance, rely on churn analysis to improve customer retention.
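As a minimal sketch of the log-based feature derivation described above, the following turns a per-event activity log into per-customer features such as login counts and recency. The schema (`customer_id`, `login_date` columns) is illustrative, not from the paper:

```python
import pandas as pd

def log_features(activity_log: pd.DataFrame) -> pd.DataFrame:
    """Aggregate a raw activity log (hypothetical columns 'customer_id'
    and 'login_date') into simple per-customer churn features."""
    log = activity_log.copy()
    log["login_date"] = pd.to_datetime(log["login_date"])
    grouped = log.groupby("customer_id")["login_date"]
    feats = pd.DataFrame({
        # activity volume: how often the customer logs in
        "login_count": grouped.count(),
        # recency: days since each customer's last login, relative to
        # the most recent event in the whole log
        "days_since_last_login": (log["login_date"].max() - grouped.max()).dt.days,
    })
    return feats.reset_index()
```

Features like these would then be combined with the knowledge-based features introduced later in the paper.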
In social media churn prediction, Amiri et al. were the first to explore churn in microblogs (e.g., Twitter). Their study suggested that identifying churn-related content enables companies to offer personalized assistance to at-risk customers. Their main feature categories were ‘demographic churn indicators’, ‘content churn indicators’, and ‘context churn indicators’ [6]. For telecom churn analysis, Agrawal et al. found that long-term contract customers were less likely to churn compared to those on short-term contracts. They also noted that high monthly charges increase churn likelihood. Their study used demographic data, subscription details, user billing profile, and tenure as key predictive features [59].
In the banking sector, churn is a major issue due to the availability of multiple service providers [60]. Customers who close all accounts and cease banking relationships pose financial risks to institutions [61]. Studies suggest that frequent credit card use is linked to lower churn rates, with factors such as total transaction amount; transaction frequency; and revolving balance, i.e., the portion of credit card spending that goes unpaid at the end of a billing cycle, being significant predictors [62].
For insurance churn prediction, Zhang et al. proposed a dynamic modelling technique to assess monthly churn likelihood. Their model incorporated features such as partner status, gender, customer lifetime, age, premium amounts, discounts, and insurance type count [63]. They highlighted that discount possession and variations over time were key churn indicators. The study also suggested that external factors like competitor campaigns and final price offers could influence customer retention.
While previous studies have employed sentiment analysis and text mining techniques for churn prediction [8,64], these approaches often focus on individual feature extraction. Our approach builds upon these works by integrating a domain-specific knowledge base enriched with generative AI to enhance feature engineering by capturing features related to customer behavioural and cognitive patterns.
Despite extensive research, there are limited studies leveraging GenAI for feature engineering in churn prediction. Existing works mainly utilize GANs [45,46] and VAEs [65] to address a major challenge in churn analysis, i.e., the imbalance between churned and non-churned data, which can affect prediction accuracy. GANs are employed to generate synthetic customer data that mimic real churn patterns, helping to balance the dataset and improve model performance. Similarly, VAEs capture the underlying distribution of customer behaviours and generate diverse yet realistic synthetic samples, enhancing the robustness of churn prediction models [66].
In this study, we employ GenAI to enhance ChurnKB by integrating features related to customer cognitive status, emotions, and behaviours. This approach aims to improve churn prediction by capturing more meaningful and personalized insights into customer retention.

3. Method

3.1. Developing a Customer ChurnKB

In the first section of the method, a pipeline to construct ChurnKB is proposed. In this pipeline, previous studies on customer journey, cognitive science, and customer behaviour are leveraged to understand the concepts and behavioural patterns associated with customer churn. We also discuss how we utilise GenAI to enhance the structure of the proposed ChurnKB. As illustrated in Figure 1A, the process of developing ChurnKB involves seven steps: (1) a comprehensive literature review, (2) feature identification, (3) taxonomy development, (4) taxonomy validation/modification (Feedback Loop), (5) the development of a sub-instance to phrasal/lexical list connector API, (6) linked lists validation (feedback loop), and (7) the development of a sub-instance score calculator API. Each of these steps is briefly explained in the following sections and discussed in more detail in Section 4.

3.1.1. Comprehensive Literature Review

A review of previous studies on customer experience, customer journey, and customer churn was conducted to understand the emotional and behavioural aspects of churn. This analysis provides a holistic view of the customer experience, identifying potential dissatisfaction sources, pain points, emotions, and behaviours. Customer interactions and communications with a company influence their perceived value, which, in turn, affects customer satisfaction and the decision to continue or discontinue their relationship with the company [1,67].

3.1.2. Feature Identification

The previous step generates a set of churn-related features. Based on this, in the current step, various concepts and sub-concepts, as well as relationships, that may indicate a customer’s propensity to churn are identified. In developing ChurnKB, two primary concepts that suggest customers’ likelihood of leaving a company are identified as their feelings and the resulting behaviours. Among these, dissatisfaction is one of the most significant emotions contributing to churn. Factors such as poor customer service, low product or service quality, ineffective communication, and lack of value for money are commonly associated with dissatisfaction [1].
However, this research requires more detailed emotional and behavioural features related to these factors, which are not comprehensively covered in the existing literature. We use GenAI to extract granular churn-related features. Unlike rule-based NLP, it detects hidden patterns in customer interactions and adapts to evolving language, making it valuable for churn prediction. GenAI also generates new insights beyond predefined heuristics, improving the understanding of customer dissatisfaction and churn.
By interacting with a generative model, we retrieve detailed features to be incorporated into ChurnKB. For example, the query “In addition to dissatisfaction, what are the primary emotions that might lead customers to leave a company or business?” produces the following outcome:
  • Frustration: Frustration arises when customers encounter obstacles, delays, or challenges in their interactions with a company. This emotion may be triggered by poor customer service or unresolved issues.
  • Disconnection: Disconnection occurs when customers no longer feel emotionally or personally connected to a brand. This feeling may result from shifts in company values or inconsistent communication.
As another example, the query “What are the main probable behaviours of customers with a tendency to churn, derived from those feelings?” produces the following outcome:
  • Reduced Usage: Customers who are dissatisfied or frustrated or feel disconnected from a company are likely to reduce their usage of the company’s products or services.
  • Seeking Alternatives: Dissatisfied, frustrated, and disconnected customers are prone to seek alternatives actively. They might research and explore other companies or businesses offering similar products or services.
Continuing the question-answering process with GenAI yields a comprehensive set of features related to customer churn. By integrating insights from customer journey studies with GenAI interactions, we can identify that disconnection might be caused by factors such as no emotional connection and a lack of interest in the product/service. Similarly, frustration could arise from difficulty in using the product/service, complicated processes, or frequent disruptions or downtime.
Additionally, behaviours like cancelling a contract might result from an intention to churn. These feelings and emotions can be reflected in or predicted by the content of customers’ interactions and communications with the company. Such communications might reveal specific emotions like anger, disgust, and sadness. They might also indicate intent to churn, the customer’s personality type, or negative feedback. Consequently, any combination of these emotions and behaviours could lead to outcomes such as decreased frequency and/or duration of product/service usage, the seeking of alternatives, or negative word-of-mouth communication, eventually resulting in churn.
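The elicitation dialogue above can be sketched as a small parser that turns the model’s bulleted answers into candidate feature names for ChurnKB. This assumes responses follow the “• Name: description” layout shown in the examples; the function name and bullet characters are illustrative:

```python
import re

def parse_feature_bullets(response: str) -> dict:
    """Parse a bulleted LLM answer of the form '• Name: description'
    into a {feature_name: description} mapping. Accepts '•', '-', or
    '*' as bullet characters."""
    features = {}
    for line in response.splitlines():
        m = re.match(r"\s*[•\-\*]\s*([^:]+):\s*(.+)", line)
        if m:
            features[m.group(1).strip().lower()] = m.group(2).strip()
    return features
```

In the paper’s pipeline, parsed features would still pass through the expert feedback loop before entering the taxonomy.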

3.1.3. Taxonomy Development

In the third step, a churn taxonomy is developed based on the features identified in the previous steps. A taxonomy is a hierarchical structure that organises concepts, instances, and their relationships [12,36]. This taxonomy comprises four levels: churn-related concepts (level 1), sub-concepts (level 2), instances (level 3), and sub-instances (level 4). At level 1, the taxonomy includes two primary concepts: customer feelings and customer behaviour. Level 2 includes sub-concepts such as dissatisfaction and reduced usage, while level 3 includes instances such as poor customer service and decreased frequency of use and sub-instances such as anger and disgust associated with the feelings and behaviours of customers likely to churn.
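The four-level hierarchy described above can be sketched as nested dictionaries; the entries below are illustrative examples drawn from the text, not the full taxonomy:

```python
# Level 1: concepts -> Level 2: sub-concepts -> Level 3: instances
# -> Level 4: lists of sub-instances (illustrative excerpt only).
churn_taxonomy = {
    "customer feelings": {
        "dissatisfaction": {
            "poor customer service": ["anger", "disgust"],
        },
    },
    "customer behaviour": {
        "reduced usage": {
            "decreased frequency of use": [],
        },
    },
}

def sub_instances(taxonomy: dict) -> list:
    """Flatten the taxonomy down to its level-4 sub-instances."""
    return [s for concept in taxonomy.values()
              for sub_concept in concept.values()
              for subs in sub_concept.values()
              for s in subs]
```

The flattened sub-instance list is what the connector and score-calculator APIs described later operate on.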

3.1.4. Taxonomy Validation/Modification (Feedback Loop)

Generative AI is known to hallucinate or generate misleading information. To verify the quality of AI-generated features used to enhance the taxonomy structure, the fourth step implements a feedback loop to collect domain experts’ insights on the relevance of the included elements. A questionnaire is designed and reviewed by an expert for relevance. The expert also suggests a threshold for modifying the taxonomy based on expert feedback: if a certain percentage of experts mark an item as ‘irrelevant’, it is removed; otherwise, it is retained. This ensures that only elements with strong expert consensus are kept, refining the taxonomy for churn analysis.
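A minimal sketch of this threshold-based pruning is shown below. The vote format and the 0.5 threshold are illustrative assumptions; the paper does not specify the exact threshold value:

```python
def prune_by_feedback(items: list, votes: dict, threshold: float = 0.5) -> list:
    """Drop taxonomy items that a sufficient share of experts marked
    'irrelevant'. `votes` maps item -> list of bools (True = irrelevant).
    Items with no votes are kept by default."""
    kept = []
    for item in items:
        marks = votes.get(item, [])
        irrelevant_share = sum(marks) / len(marks) if marks else 0.0
        if irrelevant_share < threshold:
            kept.append(item)
    return kept
```

The same logic applies to the second feedback loop (Section 3.1.6), where the voted-on items are lexicon entries rather than taxonomy elements.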

3.1.5. Developing a Sub-Instance to Phrasal/Lexical List Connector API

In textual documents, there is a clear link between certain words and phrases and the presence of distinctive content or emotion (e.g., profanity [68], self-criticism [69], negative emotion [70], and suicidal ideation or attempts [71]). Consequently, we propose the use of lexical/phrasal resources to identify customers who are experiencing emotions such as dissatisfaction or frustration. This can be done by analysing text from various forms of communication between customers and a business (e.g., voice call transcripts, emails, feedback, or posts on business-related online platforms).
In this step, each sub-instance (level-4) is provided by a ‘sub-instance to phrasal/lexical list connector API’. Such APIs are developed and embedded into the corresponding ChurnKB sub-instances. Each API is an ML algorithm that connects the corresponding sub-instance to the appropriate lexical or phrasal category.
There are several lexical sources available, including the LIWC2015 lexicon [72], which contains categories related to churn, such as anger, sadness, disgust, and fear, along with the corresponding group of words related to each category. Additionally, the NRC emotional lexicon [73] includes categories like negative emotion and their associated words. On the other hand, since some churn-related sub-instances, such as ask for compensation, frustration, neglect, and intent to churn, are not included in either of the available lexical sources, we leveraged GenAI to create lists of words/phrases related to a specific sub-instance indicative of a customer’s propensity to churn, e.g., I’m leaving, discontinue, and This is unacceptable.
All of the churn-related categories included in the lexical sources are joined together to create the main database in the form of an Excel file. As shown in Figure 2, each column of the Excel file acts as a category related to one of the churn sub-instances. For example, a dedicated API connects the sadness sub-instance to the sadness category, another API links the intent to churn sub-instance to the list of words/phrases indicating an intention to churn, and so on.
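A minimal sketch of this connector step is shown below: the lexicon spreadsheet is loaded into a column-per-sub-instance mapping, and each sub-instance is linked to its term list. The column names and the file name `churn_lexicons.xlsx` are illustrative assumptions:

```python
import pandas as pd

def load_lexicons(df: pd.DataFrame) -> dict:
    """Turn the lexicon spreadsheet (one column per churn sub-instance,
    as in Figure 2) into {sub_instance: set of lowercased terms}.
    In practice `df` would come from pd.read_excel('churn_lexicons.xlsx')."""
    return {col: set(df[col].dropna().astype(str).str.lower())
            for col in df.columns}

def connect(sub_instance: str, lexicons: dict) -> set:
    """The 'connector API': map a ChurnKB sub-instance name to its
    linked lexical/phrasal category."""
    return lexicons[sub_instance]
```

Each ChurnKB sub-instance would call `connect` with its own name to retrieve the word/phrase list used later for score calculation.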

3.1.6. Linked List Validation (Feedback Loop)

Similar to Section 3.1.4, in the sixth step, we implement a feedback loop to gather domain experts’ opinions on the relevance of the elements included at the sub-instance level (i.e., Level 4) of the taxonomy, as well as the corresponding lists of words/phrases. A questionnaire is used to facilitate this. An expert is also asked to confirm the appropriateness of the questions before distribution among other experts.

3.1.7. Developing a Sub-Instance Score Calculator API

In the final step, several APIs are developed, each corresponding to a sub-instance in the customer churn taxonomy. These APIs employ ML algorithms to calculate sub-instance scores (e.g., an anger score) by analysing the customer’s chat logs (i.e., textual input data). The resulting scores, represented as cosine similarity values, form a key component of our proposed ChurnKB, facilitating automated knowledge-based analysis. Algorithm 1 outlines the process of calculating sub-instance scores for APIs using NLP and text mining techniques such as TF-IDF, regular expressions, word tokenization, and stemming. The input for this process is curated textual data, with details of the curation process provided in Section 3.2.
Algorithm 1 The process of sub-instance score calculation in APIs based on textual data mining techniques.
  • Input: Curated input textual data; the ChurnKB sub-instances; linked sub-instance-related lexicons
  • Output: Calculated sub-instance scores
  • for each sub-instance in the ChurnKB do
  •    Read the curated input textual data;
  •    Read the linked lexicon;
  •    Apply appropriate text mining techniques (e.g., cosine similarity calculation);
  • end for
We employ Term Frequency-Inverse Document Frequency (TF-IDF) and cosine similarity to quantify textual features related to customer churn. TF-IDF identifies key terms in customer interactions, while cosine similarity measures textual similarity between customer-generated content and churn-related lexicons. These methods allow us to capture churn signals efficiently without requiring extensive domain-specific rules.
In text classification, cosine similarity quantifies the similarity between two documents. Its values range from 0 to 1, with 0 denoting no similarity and 1 signifying identical documents. For a pair of documents, such as “doc1” and “doc2”, the similarity is expressed through the following formula:
\[ \mathrm{CosineSim}(\mathrm{doc}_1, \mathrm{doc}_2) = \frac{\mathrm{doc}_1 \cdot \mathrm{doc}_2}{\lVert \mathrm{doc}_1 \rVert \times \lVert \mathrm{doc}_2 \rVert} \]
which is equal to:
\[ \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \times \sqrt{\sum_{i=1}^{n} B_i^2}} \]
where $A_i$ and $B_i$ represent the components of vectors doc1 and doc2, respectively [74]. In this study, doc1 and doc2 correspond to a customer-related document and a lexical/phrasal source document, respectively.
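As a minimal, stdlib-only sketch of this scoring step (the lexicon, chat text, and function name below are illustrative assumptions, not the paper's actual API), the similarity between a customer document and a sub-instance lexicon can be computed over term-frequency vectors:

```python
from collections import Counter
from math import sqrt

def cosine_similarity(doc1_terms, doc2_terms):
    """Cosine similarity between two token lists, using term-frequency vectors."""
    a, b = Counter(doc1_terms), Counter(doc2_terms)
    dot = sum(a[t] * b[t] for t in set(a) & set(b))   # numerator: shared-term products
    norm_a = sqrt(sum(v * v for v in a.values()))      # ||doc1||
    norm_b = sqrt(sum(v * v for v in b.values()))      # ||doc2||
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical anger lexicon and a customer chat fragment (illustrative only)
anger_lexicon = ["angry", "furious", "unacceptable", "terrible", "cancel"]
chat = "this is unacceptable i am angry and will cancel my plan".split()

anger_score = cosine_similarity(chat, anger_lexicon)   # a value strictly between 0 and 1
```

In the actual pipeline, the vectors would be TF-IDF-weighted rather than raw term frequencies, and the text would first pass through tokenization and stemming.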
The final version of the churn taxonomy, embedded with the sub-instance score calculator APIs, is called ChurnKB. Figure 1B illustrates a snapshot of ChurnKB, showing a small part of the inter-connected feelings and behaviours of customers at risk of churn. ChurnKB contains different concepts and sub-concepts associated with churn-related feelings and behaviours, instances and sub-instances, and the relationships among them. Different feelings can give rise to different customer behaviours. In this figure, solid connections mainly indicate relationships between concepts, sub-concepts, and the corresponding instances, whereas dotted arrows indicate the connections between instances and the corresponding sub-instances. Examples of the relationships in ChurnKB include the following: poor customer service can make customers feel unimportant, which, in turn, may lead to a lack of interest in the service/product. As a reflection of this lack of interest, customers’ behaviour or consumption patterns may change, for example, through decreased frequency of use, decreased duration of use, decreased spend, or decreased purchase frequency. Additionally, a sense of unimportance can directly contribute to decreased spend or decreased purchase frequency.

3.2. Developing a Knowledge Base-Enhanced Classifier for Identifying Customer Churn-Related Patterns

This work enhances ML classification using ChurnKB-driven features. By combining cognitive science and data science, key features of customer behaviour are identified. ChurnKB’s insights on customer feelings and behaviours are used for feature extraction from textual data.
Figure 3 shows the pipeline for a KB-enhanced churn classifier. The method adopts a scalable, service-oriented architecture, allowing for seamless integration of new analytical features.
  • The first step involves data curation, defined as the process of transforming raw data into contextualised data and knowledge [12], thereby enhancing the efficiency of ML algorithms. Drawing inspiration from two recent studies [12,75], it is crucial to prepare and curate raw textual data before proceeding with further analysis. The curation process involves three steps: (i) data cleaning, (ii) feature extraction, and (iii) feature enrichment. Customer interactions include communications such as reviews and chat logs over a given period. Cleaning removes punctuation, stop words, and special characters while normalizing the text. Next, feature extraction applies part-of-speech tagging, named entity recognition, and keyword identification to capture key linguistic patterns. Finally, feature enrichment incorporates synonym expansion (e.g., WordNet [76]) and stemming to standardize word forms, enhancing the dataset for churn prediction analysis.
  • Each customer feeling and behaviour represented in ChurnKB (e.g., dissatisfaction) may be caused by various reasons (e.g., poor customer service). These reasons, in turn, may lead to several feelings or behaviours (e.g., anger), which are considered sub-instances within ChurnKB. In the second step, as outlined in Algorithm 2, these sub-instances are linked to the curated data and extracted features by initialising an empty list for each instance. Extracted features (e.g., stemmed keywords or phrases) are then added to the corresponding sub-instance list.
  • In the third step, the extracted features from the previous step are used as input for the corresponding sub-instance-related APIs, enabling the score calculation process for each sub-instance.
  • In the fourth step, the calculated scores are used as input features for the churn classifier. ML algorithms like Random Forest, Logistic Regression, and XGBoost can be applied. The model is trained as a binary classifier to predict if a customer is likely to churn. A feedback loop evaluates the KB’s performance, as described in Section 4.1.
Algorithm 2 Linking extracted features from input textual data to the sub-instances in ChurnKB.
  • Input: Curated input textual data
  • Output: List of features that are linked to the sub-instances of the ChurnKB
  • Extract features from the textual data;
  • Churn-related sub-instances = an array of the sub-instances in the ChurnKB; % (e.g., Negative Feedback, Anger, etc.)
  • for each churn-related sub-instance do
  •    Generate an empty list;
  •    for each feature in the extracted features do
  •      Add the feature to the corresponding sub-instance list;
  •      Link the extracted features to the sub-instances in ChurnKB;
  •    end for
  • end for
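The linking step in Algorithm 2 can be sketched as follows (the sub-instance names, lexicons, and membership-based matching rule are illustrative assumptions, not the actual ChurnKB content):

```python
def link_features_to_sub_instances(extracted_features, sub_instance_lexicons):
    """Attach each extracted feature (e.g., a stemmed keyword) to every
    ChurnKB sub-instance whose lexicon contains it."""
    linked = {name: [] for name in sub_instance_lexicons}  # empty list per sub-instance
    for name, lexicon in sub_instance_lexicons.items():
        for feature in extracted_features:
            if feature in lexicon:
                linked[name].append(feature)
    return linked

# Illustrative sub-instances and lexicons
lexicons = {
    "anger": {"angry", "furious", "unacceptable"},
    "negative_feedback": {"disappointed", "unacceptable", "worst"},
}
features = ["angry", "unacceptable", "slow"]
linked = link_features_to_sub_instances(features, lexicons)
```

Each sub-instance's linked feature list then serves as the input to the corresponding score calculator API in the third step.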

4. Evaluation

To address potential limitations of generative models [50,51] and assess the effectiveness of our ChurnKB, we conducted a survey and gathered feedback from experts in the field. Additionally, we evaluated our approach by applying features derived from ChurnKB to customer churn classification tasks using various ML algorithms and evaluation metrics, as detailed in the following sections.

4.1. Evaluating the ChurnKB Development Approach

We formulated four hypotheses to test and validate the information in the knowledge base and the proposed method. For this purpose, we used an iterative approach that closely mirrors reinforcement learning with human feedback.
To test the hypotheses, we designed and distributed anonymous questionnaires consisting of multiple-choice items. Participants were asked to rate each statement on a Likert scale ranging from 1 to 5 (1: irrelevant; 2: weakly relevant; 3: neutral; 4: relevant; 5: strongly relevant). No identifying information was collected, and the identities of participants remained undisclosed throughout the study. This approach focuses on refining the knowledge base through multiple feedback loops, gathering domain experts’ opinions on the relevance of the taxonomy’s elements.
Initially, the included questions were reviewed by domain experts to ensure their relevance to the churn problem. Once the questionnaires were finalized, the experts recommended that a 20% threshold be applied in the analysis of the collected responses.
The questionnaire first explains its purpose and goals. This is followed by the target hypothesis, a snapshot of the ChurnKB components related to the hypothesis questions, and, finally, the corresponding questions.
Four hypotheses, designed to validate our approach to developing ChurnKB, are outlined below:
  • H1: The structure of the initial churn taxonomy, including its concepts and instances, is relevant to customer churn.
  • H2: The use of the pronoun “I”, absolute words, and certainty words by customers in their communications or feedback is relevant to identifying customers’ churn-related cognitive and behavioural patterns.
  • H3: Churn-related features (i.e., sub-instances) and the corresponding phrasal/lexical list are relevant to identifying customers’ churn-related cognitive and behavioural patterns.
  • H4: The developed classifier leads to reliable results derived from the application of ChurnKB to enhance the feature engineering phase.
The initial version of the churn taxonomy was developed based on customer journey studies and GenAI capabilities. To test H1, we presented various parts of the initial churn taxonomy, including different concepts and instances, to the experts. We asked them to evaluate the relevance of each part to customer churn. An example question designed for this purpose is shown in Figure 4A.
To test H2, various questions were posed to validate an assumption, leading to modifications in the initial churn taxonomy. During the development of the initial churn taxonomy, we noted that several sub-instances, such as sadness and negative emotions, were present in a KB developed in previous work [16]. This KB includes different feelings and mental statuses that individuals suffering from mental health issues, such as depression, are likely to experience. Based on this, the use of terms like I, absolute words, and certainty words in communication could reflect specific psychological and emotional statuses.
Motivated by that understanding, we hypothesised that dissatisfied customers may experience a negative emotional status (e.g., similar to depressed individuals) that confirms their decision to leave a company. Therefore, those individuals may use the pronoun I, absolute words, and certainty words in their feedback, reviews, and/or interactions with a company to emphasise a specific decision of theirs. Hence, we came up with the idea of adding those three as sub-instances of negative word of mouth, assumed to be probable behavioural signs of unhappy customers who wish to discourage others from using the product/service.
To validate H2, we asked experts for their opinions on adding those sub-instances to the latest version of the churn taxonomy. As shown in Figure 4E, we presented experts with the part of the taxonomy to which those three sub-instances had been added. An example question concerning the relevance of absolute words to customer dissatisfaction is illustrated in Figure 4B. Analysis of the experts’ responses showed that the majority found all three sub-instances relevant to the churn taxonomy. Hence, we incorporated them into it.
To test H3, we focused on questions related to the sub-instances (level 4), as well as the relevance of the generated lists, which include related words and phrases corresponding to some of the sub-instances. Figure 4C illustrates an example question asked in this regard. After analysing expert responses, as shown in Figure 5A,B, we noticed that there were two sub-instances that received a corresponding ‘irrelevant’ rate of more than 20% from the experts. Hence, we removed them. These sub-instances are embarrassment and envy. Embarrassment refers to feeling self-conscious or ashamed about a situation. Envy, on the other hand, refers to the feeling of jealousy towards others who might have a better experience.
To test H4, as shown in Figure 4D, we asked the experts to assess the reasonableness and reliability of the churn classifier’s performance based on the results derived by applying ChurnKB to enhance feature engineering. For this purpose, examples of customer chat logs and the churn status of the corresponding customers were shown to the experts. The analytical results of the classifier were also presented, and the experts were asked for their opinions on these results. Section 5 provides further discussion of the final evaluation results of the four hypotheses tested in this study.

4.2. Evaluating Churn Classifier Performance

4.2.1. Data

For our experiments and to evaluate the effectiveness of applying the feature engineering derived from ChurnKB, we need a churn-related dataset that includes both textual and other data types, such as categorical and numerical data. The only available dataset that meets these criteria is the churn prediction with text and interpretability dataset (https://github.com/aws-samples/churn-prediction-with-text-and-interpretability/blob/main/README.md (accessed on 13 April 2025)). While the dataset is specific to the telecom industry, the underlying methodology of ChurnKB is designed to be adaptable across various sectors. Customer churn behaviours, though differing in industry-specific manifestations, share common cognitive and behavioural indicators, such as dissatisfaction, reduced engagement, and negative sentiment patterns, that ChurnKB could potentially capture.
The dataset contains 3333 rows and 21 columns. Of these, 2850 samples (85.50%) belong to the ‘churn = no’ class, while 483 samples (14.50%) belong to the ‘churn = yes’ class. Each column represents a different feature type. The features include categorical variables such as state, international plan, and voice-mail plan, and numerical variables like account length and total length of day calls. Additionally, the dataset includes a specific column for text data, which contains chat logs between customers and agents.

4.2.2. Evaluation Metrics

This study employs the F1 score as the primary evaluation metric, given its effectiveness in handling imbalanced datasets. The F1 score balances precision and recall, which is crucial for accurately identifying churners and optimizing retention strategies. Incorporating both precision and recall ensures reliable and actionable predictions for churn mitigation. Precision measures how many predicted churners actually churned, while recall assesses how well the model captures actual churn cases. The F1 score is calculated as follows:
\[ F_1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \]
where
\[ \mathrm{Precision} = \frac{\text{Number of correctly predicted churn customers}}{\text{Total number of customers predicted as churn}} \]
and
\[ \mathrm{Recall} = \frac{\text{Number of correctly predicted churn customers}}{\text{Total number of actual churn customers}}. \]
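A quick worked example of these formulas (the counts below are made up purely for illustration):

```python
def f1_score(true_positives, predicted_churn, actual_churn):
    """F1 from churn counts: precision = TP / predicted, recall = TP / actual."""
    precision = true_positives / predicted_churn
    recall = true_positives / actual_churn
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 80 correctly flagged churners out of 100 predicted,
# with 120 actual churners in the data
score = f1_score(80, 100, 120)  # precision = 0.8, recall ≈ 0.667, F1 ≈ 0.727
```

Because F1 is the harmonic mean, it stays low unless both precision and recall are high, which is why it is well suited to the imbalanced churn dataset described above.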

4.2.3. Results

For experimentation and evaluation, the data were processed through a PyCaret (https://github.com/pycaret/pycaret (accessed on 13 April 2025)) pipeline to compare different machine learning models and assess the effect of ChurnKB-enabled features on their analytical performance. PyCaret does not provide built-in support for deep learning models such as TensorFlow or PyTorch-based neural networks in its standard modules. However, it includes various machine learning models via scikit-learn, XGBoost, LightGBM, CatBoost, logistic regression, and decision trees. To further validate the effectiveness of ChurnKB features, we also experimented separately with neural network-based models. For performance comparison, we report only the three top-performing models in each scenario due to space limitations, along with Multilayer Perceptron (MLP) and TabNet, which are designed for tabular data. These deep learning models are more suitable for tabular data than traditional recurrent architectures like Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRUs), which are optimized for sequential data. In addition, we utilised 31 sub-instances from ChurnKB, which can be supplied with an appropriate lexical source.
We compared three scenarios, i.e., (A) a dataset with only numerical data, (B) a dataset with only textual data, and (C) a dataset that includes both textual and numerical data. The evaluation results are reported in Table 2 (A), (B), and (C), respectively. Among various classification models, XGBoost appears consistently across all comparisons, with its performance improving when textual features are included. In the first scenario, XGBoost achieves an F1 score of 0.5752, while in the second scenario, its performance improves to 0.6698. The most notable performance boost occurs in the third scenario, where the inclusion of both numerical and textual features leads to an F1 score of 0.7891. These results highlight the positive impact of textual features on predictive performance. Given XGBoost’s strong and consistent performance across all settings, we selected it as the target classifier for our research.
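The three scenarios differ only in which feature blocks are passed to the classifier; a schematic of the per-customer feature assembly (the helper name and example values are illustrative assumptions, not the paper's code):

```python
def assemble_features(numeric=None, churnkb_scores=None):
    """Concatenate per-customer feature blocks: scenario A uses numerical
    features only, B uses ChurnKB textual scores only, and C uses both."""
    row = []
    if numeric is not None:
        row.extend(numeric)
    if churnkb_scores is not None:
        row.extend(churnkb_scores)
    return row

numeric = [128.0, 265.1, 110]   # e.g., account length, total day minutes, day calls
kb_scores = [0.41, 0.12, 0.05]  # e.g., anger, dissatisfaction, frustration scores

scenario_a = assemble_features(numeric=numeric)                            # (A) numerical only
scenario_b = assemble_features(churnkb_scores=kb_scores)                   # (B) textual only
scenario_c = assemble_features(numeric=numeric, churnkb_scores=kb_scores)  # (C) both
```

In the full setup, scenario C would supply the classifier with the dataset's numerical columns plus the 31 ChurnKB sub-instance scores.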
All the comparison results indicate that applying ChurnKB-enabled feature engineering positively impacts the performance of different classifiers, particularly the XGBoost model. While MLP [77] and TabNet [78] demonstrate competitive performance, XGBoost remains the highest-performing model across all evaluation metrics, including the F1 score. The superior performance of XGBoost can be attributed to its ability to efficiently capture complex feature interactions. Additionally, XGBoost requires fewer computational resources and less training time than deep learning models, making it a more practical choice for real-world churn prediction applications.

4.2.4. Statistical Validation

To verify that ChurnKB’s improvements are not due to random variations, a paired t-test was conducted. Table 3 presents the results, comparing model performance with and without ChurnKB-derived features. The T-statistic measures the magnitude of performance differences, while the p-value indicates statistical significance. A larger absolute T-statistic strengthens evidence against random chance, and a smaller p-value (typically < 0.05) suggests the improvement is unlikely to be random. The results show statistically significant gains across all metrics, confirming that ChurnKB-derived features enhance model performance.
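The paired t-test operates on per-fold score pairs from the two configurations; a stdlib-only sketch of the t-statistic computation (the cross-validation scores below are synthetic, chosen only to illustrate the calculation, not the paper's results):

```python
from math import sqrt

def paired_t_statistic(scores_a, scores_b):
    """T-statistic of paired differences: mean(d) / (std(d) / sqrt(n))."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance of differences
    return mean / sqrt(var / n)

# Synthetic per-fold F1 scores with and without ChurnKB-derived features
with_kb = [0.78, 0.81, 0.79, 0.82, 0.77]
without_kb = [0.57, 0.59, 0.58, 0.60, 0.56]
t_stat = paired_t_statistic(with_kb, without_kb)  # large positive t-statistic
```

With 5 folds, the statistic would be compared against a t-distribution with n − 1 = 4 degrees of freedom to obtain the p-value (e.g., via `scipy.stats.ttest_rel`).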
For the purpose of conducting an ablation study and measuring the impact of ChurnKB-enabled features, we considered the results derived from XGBoost, which demonstrated the strongest predictive performance. Table 4 presents the impact of ChurnKB-enhanced features in three experimental setups: (A) a baseline model using only numerical features (without ChurnKB); (B) a ChurnKB-enabled model using only ChurnKB-derived textual features (lexicon-based churn indicators from ChurnKB); and (C) a full model combining both numerical and ChurnKB-enhanced features.
The ablation study confirms that ChurnKB contributes significantly to churn prediction performance. The ChurnKB-derived textual features alone improve the F1 score by +9.4% compared to the baseline. When combined with numerical features, ChurnKB enhances overall performance by +21.3%, demonstrating its effectiveness in feature enrichment.

5. Discussion

As illustrated in Figure 6, ChurnKB was validated in terms of its elements, content, and performance, leveraging human expertise for refinement. Participants’ responses support Hypotheses 1–4. Part (i) confirms that all participants find the components of the developed knowledge base relevant to customer churn. In Part (ii), ten participants recognize the three sub-instances (i.e., the use of the pronoun I, absolute words, and certainty words) as relevant to churn, while the rest perceive them as neutral. Since the majority consider these sub-instances relevant, we decided to incorporate them into ChurnKB.
Furthermore, Part (iii) shows that all experts, except one, recognize the relevance of the sub-instances used in the developed classifier for identifying churn-related behavioural patterns. One expert, however, considers them only weakly relevant. Part (iv), which assesses the churn classifier’s outcomes using the modified version of ChurnKB, indicates that all but one participant find the results relevant.
However, it is essential to note that some validity concerns should be considered before drawing conclusive results from our study. Although the overall survey results support Hypotheses 1, 2, 3, and 4, there is still room for improvement in the proposed approach, for instance, by weighting some of the concepts and instances to reflect their greater influence on customers’ churn decisions. These matters will be further investigated in our future work.
On the other hand, the classifier was developed and evaluated using the extracted features from ChurnKB. Although the study is currently limited to a telecom dataset, the framework can be applied to other industries, such as banking, insurance, and e-commerce. For instance, in banking, customer churn often manifests through declining transaction frequency, reduced credit card usage, and negative sentiment in customer support interactions. In insurance, churn indicators include policy cancellations, inquiries about competitor policies, and dissatisfaction with claim processes. ChurnKB can be extended to these domains by adapting its feature extraction pipeline to industry-specific text sources (e.g., customer service transcripts, complaint records, and online reviews). Additionally, integrating industry-specific lexicons and knowledge bases would further refine ChurnKB’s feature representation, enhancing its applicability across sectors.

6. Conclusions

In this study, we introduced ChurnKB, a knowledge base for churn analysis, integrating customer-generated textual data, such as chat logs, with churn analysis. By doing so, it captures cognitive and behavioural patterns indicative of churn risk. To enhance ChurnKB, we developed an interactive GenAI-enabled feature engineering approach, leveraging generative AI to refine its structure and content.
We validated our approach through expert evaluation, using a structured questionnaire to assess ChurnKB’s structure, content, and analytical outcomes. The results confirm its relevance and potential for improving churn prediction. However, some limitations were identified: some features showed only weak relevance for detecting churn-related patterns, and some participants rated linguistic indicators such as the pronoun “I”, absolute words, and certainty words as neutral rather than relevant.
Despite these challenges, integrating customer–company interactions into churn analysis improves upon traditional approaches based on demographic and product usage features. Our results demonstrate the effectiveness of ChurnKB-derived features, significantly boosting model performance, with XGBoost’s F1 score increasing from 0.5752 to 0.7891 when combining textual and numerical data. This underscores the value of cognitive and behavioural insights in churn analysis. Future work will refine ChurnKB by addressing limitations and optimizing feature selection for further accuracy improvements.
Future research could explore ChurnKB’s application in banking, insurance, and e-commerce to assess its generalizability. Integrating real-time interactions, such as live chat and social media feedback, may enhance early churn detection. Additionally, explainable AI techniques could improve transparency and help businesses justify predictions. AI-driven knowledge bases like ChurnKB may shift churn management from reactive to proactive. They could also enhance customer experience and automate large-scale churn risk detection. These advancements may lead to more effective and responsible business decision making.

Author Contributions

Conceptualization, M.S. and A.B.; validation, M.S. and A.B.; formal analysis, M.S. and A.B.; investigation, M.S., A.B., X.Z., W.M., E.J.F. and A.J.; resources, M.S. and A.B.; writing—original draft preparation, M.S. and A.B.; writing—review and editing, M.S., A.B., W.M. and A.J.; visualization, M.S., A.B., A.H. and N.S.; supervision, A.B., X.Z. and J.F.; project administration, M.S. and A.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The dataset used for supporting the proposed approach in this paper is publicly available at the following web page address: https://github.com/aws-samples/churn-prediction-with-text-and-interpretability/blob/main/README.md.

Acknowledgments

We acknowledge the Centre for Applied Artificial Intelligence at Macquarie University (Sydney, NSW, Australia) and Prospa Advance Pty Limited (Sydney, NSW, Australia) for supporting and funding this research.

Conflicts of Interest

Author Eu Jin Foo was employed by the company Prospa. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Tueanrat, Y.; Papagiannidis, S.; Alamanos, E. Going on a Journey: A Review of the Customer Journey Literature. J. Bus. Res. 2021, 125, 336–353. [Google Scholar] [CrossRef]
  2. Knowles, C. Customer Churn Costing Australian Businesses Millions, Report Finds. 2021. Available online: https://itbrief.com.au/story/customer-churn-costing-australian-businesses-millions-report-finds (accessed on 13 April 2025).
  3. Ahn, J.; Hwang, J.; Kim, D.; Choi, H.; Kang, S. A Survey on Churn Analysis in Various Business Domains. IEEE Access 2020, 8, 220816–220839. [Google Scholar] [CrossRef]
  4. Wu, X.; Li, P.; Zhao, M.; Liu, Y.; Crespo, R.G.; Herrera-Viedma, E. Customer Churn Prediction for Web Browsers. Expert Syst. Appl. 2022, 209, 118177. [Google Scholar] [CrossRef]
  5. Kim, K.; Jun, C.H.; Lee, J. Improved Churn Prediction in Telecommunication Industry by Analyzing a Large Network. Expert Syst. Appl. 2014, 41, 6575–6584. [Google Scholar] [CrossRef]
  6. Amiri, H.; Daume, H., III. Target-Dependent Churn Classification in Microblogs. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; Volume 29. [Google Scholar]
  7. Verbeke, W.; Martens, D.; Baesens, B. Social network analysis for customer churn prediction. Appl. Soft Comput. 2014, 14, 431–446. [Google Scholar] [CrossRef]
  8. Abdul-Rahman, S.; Ali, M.F.A.M.; Bakar, A.A.; Mutalib, S. Enhancing churn forecasting with sentiment analysis of steam reviews. Soc. Netw. Anal. Min. 2024, 14, 178. [Google Scholar] [CrossRef]
  9. AI, G. How Large Language Models Extract Insights from Support Calls. 2023. Available online: https://www.joinglyph.com/blog/how-llms-are-used-to-extract-insights-from-support-calls (accessed on 13 April 2025).
  10. Luzmo. How to Perform Churn Analysis Using AI. 2024. Available online: https://www.luzmo.com/blog/churn-analysis (accessed on 13 April 2025).
  11. De, S.; Prabu, P. Predicting Customer Churn: A Systematic Literature Review. J. Discret. Math. Sci. Cryptogr. 2022, 25, 1965–1985. [Google Scholar] [CrossRef]
  12. Beheshti, A.; Benatallah, B.; Tabebordbar, A.; Motahari-Nezhad, H.R.; Barukh, M.C.; Nouri, R. Datasynapse: A social data curation foundry. Distrib. Parallel Databases 2019, 37, 351–384. [Google Scholar] [CrossRef]
  13. Beheshti, A.; Vaghani, K.; Benatallah, B.; Tabebordbar, A. CrowdCorrect: A Curation Pipeline for Social Data Cleansing and Curation. In Proceedings of the Information Systems in the Big Data Era: CAiSE Forum 2018, Tallinn, Estonia, 11–15 June 2018; pp. 24–38. [Google Scholar]
  14. Beheshti, A. Knowledge base 4.0: Using crowdsourcing services for mimicking the knowledge of domain experts. In Proceedings of the 2022 IEEE International Conference on Web Services (ICWS), Barcelona, Spain, 10–16 July 2022; pp. 425–427. [Google Scholar]
  15. Domingos, P. A few useful things to know about machine learning. Commun. ACM 2012, 55, 78–87. [Google Scholar] [CrossRef]
  16. Shahabikargar, M.; Beheshti, A.; Khatami, A.; Nguyen, R.; Zhang, X.; Alinejad-Rokny, H. Domain Knowledge Enhanced Text Mining for Identifying Mental Disorder Patterns. In Proceedings of the 2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA), Shenzhen, China, 13–16 October 2022; pp. 1–10. [Google Scholar]
  17. Barukh, M.C.; Zamanirad, S.; Baez, M.; Beheshti, A.; Benatallah, B.; Casati, F.; Yao, L.; Sheng, Q.Z.; Schiliro, F. Cognitive augmentation in processes. In Next-Gen Digital Services. A Retrospective and Roadmap for Service Computing of the Future: Essays Dedicated to Michael Papazoglou on the Occasion of His 65th Birthday and His Retirement; Springer: Cham, Switzerland, 2021; pp. 123–137. [Google Scholar]
  18. Beheshti, A.; Yang, J.; Sheng, Q.Z.; Benatallah, B.; Casati, F.; Dustdar, S.; Nezhad, H.R.M.; Zhang, X.; Xue, S. ProcessGPT: Transforming Business Process Management with Generative Artificial Intelligence. arXiv 2023, arXiv:2306.01771. [Google Scholar]
  19. Kapiche. The Definitive Guide to Text Analytics for Customer Experience. 2023. Available online: https://www.kapiche.com/blog/the-definitive-guide-to-text-analytics-for-cx (accessed on 13 April 2025).
  20. Brooks, M.; Amershi, S.; Lee, B.; Drucker, S.M.; Kapoor, A.; Simard, P. FeatureInsight: Visual support for error-driven feature ideation in text classification. In Proceedings of the 2015 IEEE Conference on Visual Analytics Science and Technology (VAST), Chicago, IL, USA, 25–30 October 2015; pp. 105–112. [Google Scholar]
  21. Kotni, V.D.P. Paradigm shift from attracting footfalls for retail store to getting hits for e-stores: An evaluation of decision-making attributes in e-tailing. Glob. Bus. Rev. 2017, 18, 1215–1237. [Google Scholar] [CrossRef]
  22. Blackwell, R.D.; Miniard, P.W.; Engel, J.F. Consumer Behavior, 10th ed.; Dryden Press: Chicago, IL, USA, 2006. [Google Scholar]
  23. Solomon, R.; Bamossy, G.; Askegaard, S.; Hogg, M. Consumer Behaviour: European Perspective, 4th ed.; Prentice Hall: Harlow, UK, 2010. [Google Scholar]
  24. Valaskova, K.; Kramarova, K.; Bartosova, V. Multi criteria models used in Slovak consumer market for business decision making. Procedia Econ. Financ. 2015, 26, 174–182. [Google Scholar] [CrossRef]
  25. Khawaja, S.; Zia, T.; Sokić, K.; Qureshi, F.H. The impact of emotions on consumer behaviour: Exploring gender differences. Mark. Consum. Res. 2023, 88, 69–80. [Google Scholar]
  26. Havlena, W.J.; Holbrook, M.B. The varieties of consumption experience: Comparing two typologies of emotion in consumer behavior. J. Consum. Res. 1986, 13, 394–404. [Google Scholar] [CrossRef]
  27. Chandrashekar, G.; Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 2014, 40, 16–28. [Google Scholar] [CrossRef]
  28. Zheng, A.; Casari, A. Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2018. [Google Scholar]
  29. Xie, J.; Sage, M.; Zhao, Y.F. Feature selection and feature learning in machine learning applications for gas turbines: A review. Eng. Appl. Artif. Intell. 2023, 117, 105591. [Google Scholar] [CrossRef]
  30. Bengio, Y.; Courville, A.; Vincent, P. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1798–1828. [Google Scholar] [CrossRef]
  31. Reddy, G.T.; Reddy, M.P.K.; Lakshmanna, K.; Kaluri, R.; Rajput, D.S.; Srivastava, G.; Baker, T. Analysis of dimensionality reduction techniques on big data. IEEE Access 2020, 8, 54776–54788. [Google Scholar] [CrossRef]
  32. Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182. [Google Scholar]
  33. Cohen, I. Optimizing Feature Generation. 2019. Available online: https://medium.com/towards-data-science/optimizing-feature-generation-dab98a049f2e (accessed on 19 April 2025).
  34. McKinsey. Data Preprocessing vs. Feature Engineering. 2023. Available online: https://www.iguazio.com/questions/data-preprocessing-vs-feature-engineering-whats-the-difference/ (accessed on 13 April 2025).
  35. Pfingsten, T.; Herrmann, D.J.; Schnitzler, T.; Feustel, A.; Scholkopf, B. Feature selection for troubleshooting in complex assembly lines. IEEE Trans. Autom. Sci. Eng. 2007, 4, 465–469. [Google Scholar] [CrossRef]
  36. Chai, X.; Deshpande, O.; Garera, N.; Gattani, A.; Lam, W.; Lamba, D.S.; Liu, L.; Tiwari, M.; Tourn, M.; Vacheri, Z.; et al. Social Media Analytics: The Kosmix Story. IEEE Data Eng. Bull. 2013, 36, 4–12. [Google Scholar]
  37. Beheshti, A. Empowering Generative AI with Knowledge Base 4.0: Towards Linking Analytical, Cognitive, and Generative Intelligence. In Proceedings of the International Conference on Web Services (ICWS), Chicago, IL, USA, 2–8 July 2023. [Google Scholar]
  38. Thematic. Sentiment Analysis: Comprehensive Beginner’s Guide. 2023. Available online: https://getthematic.com/sentiment-analysis (accessed on 13 April 2025).
  39. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  40. Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv 2013, arXiv:1312.6114. [Google Scholar]
  41. Cao, Y.; Li, S.; Liu, Y.; Yan, Z.; Dai, Y.; Yu, P.S.; Sun, L. A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT. arXiv 2023, arXiv:2303.04226. [Google Scholar]
  42. Shi, Y.; Wang, B.; Yu, Y.; Tang, X.; Huang, C.; Dong, J. Robust anomaly detection for multivariate time series through temporal GCNs and attention-based VAE. Knowl.-Based Syst. 2023, 275, 110725. [Google Scholar] [CrossRef]
  43. Wu, J.; Plataniotis, K.; Liu, L.; Amjadian, E.; Lawryshyn, Y. Interpretation for Variational Autoencoder Used to Generate Financial Synthetic Tabular Data. Algorithms 2023, 16, 121. [Google Scholar] [CrossRef]
  44. Park, N.; Mohammadi, M.; Gorde, K.; Jajodia, S.; Park, H.; Kim, Y. Data synthesis based on generative adversarial networks. arXiv 2018, arXiv:1806.03384. [Google Scholar] [CrossRef]
  45. Kate, P.; Ravi, V.; Gangwar, A. FinGAN: Chaotic generative adversarial network for analytical customer relationship management in banking and insurance. Neural Comput. Appl. 2023, 35, 6015–6028. [Google Scholar] [CrossRef]
  46. Li, B.; Xie, J. Study on the Prediction of Imbalanced Bank Customer Churn Based on Generative Adversarial Network. J. Phys. Conf. Ser. 2020, 1624, 032054. [Google Scholar] [CrossRef]
  47. Hofmann, P.; Rückel, T.; Urbach, N. Innovating with Artificial Intelligence: Capturing the Constructive Functional Capabilities of Deep Generative Learning. In Proceedings of the 54th Hawaii International Conference on System Sciences, Kauai, HI, USA, 5–8 January 2021. [Google Scholar]
  48. Tirado-Olivares, S.; Navío-Inglés, M.; O’Connor-Jiménez, P.; Cózar-Gutiérrez, R. From Human to Machine: Investigating the Effectiveness of the Conversational AI ChatGPT in Historical Thinking. Educ. Sci. 2023, 13, 803. [Google Scholar] [CrossRef]
  49. Su, J.; Yang, W. Unlocking the power of ChatGPT: A framework for applying generative AI in education. ECNU Rev. Educ. 2023, 6, 355–366. [Google Scholar]
  50. Aydın, Ö.; Karaarslan, E. Is ChatGPT leading generative AI? What is beyond expectations? J. Eng. Smart Syst. 2023, 11, 118–134. [Google Scholar]
  51. Azaria, A. ChatGPT Usage and Limitations; HAL Open Science: Lyon, France, 2022. [Google Scholar]
  52. Datavid. How Is Text Mining Different from Data Mining? 2022. Available online: https://datavid.com/blog/text-mining-vs-data-mining (accessed on 13 April 2025).
  53. International Research Journal of Engineering Science, Technology and Innovation. Sentiment Analysis: Techniques, Limitations, and Case Studies in Data Extraction and Classification. 2023. Available online: https://www.interesjournals.org/articles/sentiment-analysis-techniques-limitations-and-case-studies-in-data-extraction-and-classification-99020.html (accessed on 13 April 2025).
  54. MIT xPRO. Exploring the Shift from Traditional to Generative AI. 2024. Available online: https://curve.mit.edu/exploring-shift-traditional-generative-ai (accessed on 13 April 2025).
  55. Periáñez, Á.; Saas, A.; Guitart, A.; Magne, C. Churn prediction in mobile social games: Towards a complete assessment using survival ensembles. In Proceedings of the 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Montreal, QC, Canada, 17–19 October 2016; pp. 564–573. [Google Scholar]
  56. Tamaddoni Jahromi, A.; Sepehri, M.M.; Teimourpour, B.; Choobdar, S. Modeling customer churn in a non-contractual setting: The case of telecommunications service providers. J. Strateg. Mark. 2010, 18, 587–598. [Google Scholar] [CrossRef]
  57. Buckinx, W.; Van den Poel, D. Customer base analysis: Partial defection of behaviourally loyal clients in a non-contractual FMCG retail setting. Eur. J. Oper. Res. 2005, 164, 252–268. [Google Scholar] [CrossRef]
  58. Lejeune, M.A. Measuring the impact of data mining on churn management. Internet Res. 2001, 11, 375–387. [Google Scholar] [CrossRef]
  59. Agrawal, S.; Das, A.; Gaikwad, A.; Dhage, S. Customer churn prediction modelling based on behavioural patterns analysis using deep learning. In Proceedings of the 2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE), Shah Alam, Malaysia, 11–12 July 2018; pp. 1–6. [Google Scholar]
  60. Rahman, M.; Kumar, V. Machine Learning-Based Customer Churn Prediction in Banking. In Proceedings of the 2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 5–7 November 2020; pp. 1196–1201. [Google Scholar]
  61. Karvana, K.G.M.; Yazid, S.; Syalim, A.; Mursanto, P. Customer churn analysis and prediction using data mining models in banking industry. In Proceedings of the 2019 International Workshop on Big Data and Information Security (IWBIS), Bali, Indonesia, 11 October 2019; pp. 33–38. [Google Scholar]
  62. Miao, X.; Wang, H. Customer Churn Prediction on Credit Card Services using Random Forest Method. In Proceedings of the 2022 7th International Conference on Financial Innovation and Economic Development (ICFIED 2022), Online, 14–16 January 2022; Atlantis Press: Dordrecht, The Netherlands, 2022; pp. 649–656. [Google Scholar]
  63. Zhang, R.; Li, W.; Tan, W.; Mo, T. Deep and shallow model for insurance churn prediction service. In Proceedings of the 2017 IEEE International Conference on Services Computing (SCC), Honolulu, HI, USA, 25–30 June 2017; pp. 346–353. [Google Scholar]
  64. Kilimci, Z.H.; Yörük, H.; Akyokus, S. Sentiment analysis based churn prediction in mobile games using word embedding models and deep learning algorithms. In Proceedings of the 2020 International Conference on Innovations in Intelligent Systems and Applications (INISTA), Novi Sad, Serbia, 24–26 August 2020; pp. 1–7. [Google Scholar]
  65. Wang, A.X.; Chukova, S.S.; Nguyen, B.P. Data-Centric AI to Improve Churn Prediction with Synthetic Data. In Proceedings of the 2023 3rd International Conference on Computer, Control and Robotics (ICCCR), Shanghai, China, 24–26 March 2023; pp. 409–413. [Google Scholar]
  66. Hasumoto, K.; Goto, M. Predicting customer churn for platform businesses: Using latent variables of variational autoencoder as consumers’ purchasing behavior. Neural Comput. Appl. 2022, 34, 18525–18541. [Google Scholar] [CrossRef]
  67. Lemon, K.N.; Verhoef, P.C. Understanding customer experience throughout the customer journey. J. Mark. 2016, 80, 69–96. [Google Scholar] [CrossRef]
  68. Rezvani, N.; Beheshti, A.; Tabebordbar, A. Linking Textual and Contextual Features for Intelligent Cyberbullying Detection in Social Media. In Proceedings of the 18th International Conference on Advances in Mobile Computing & Multimedia, Chiang Mai, Thailand, 30 November–2 December 2020; pp. 3–10. [Google Scholar]
  69. Troop, N.A.; Chilcot, J.; Hutchings, L.; Varnaite, G. Expressive writing, self-criticism, and self-reassurance. Psychol. Psychother. Theory Res. Pract. 2013, 86, 374–386. [Google Scholar] [CrossRef]
  70. De Choudhury, M.; Counts, S.; Horvitz, E. Major life changes and behavioral markers in social media: Case of childbirth. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work, San Antonio, TX, USA, 23–27 February 2013; pp. 1431–1442. [Google Scholar]
  71. Fernandes, A.C.; Dutta, R.; Velupillai, S.; Sanyal, J.; Stewart, R.; Chandran, D. Identifying suicide ideation and suicidal attempts in a psychiatric clinical research database using natural language processing. Sci. Rep. 2018, 8, 7426. [Google Scholar] [CrossRef]
  72. Pennebaker, J.W.; Boyd, R.L.; Jordan, K.; Blackburn, K. The Development and Psychometric Properties of LIWC2015; Technical report; Pennebaker Conglomerates: Austin, TX, USA, 2015. [Google Scholar]
  73. Mohammad, S.M.; Turney, P.D. Crowdsourcing a word–emotion association lexicon. Comput. Intell. 2013, 29, 436–465. [Google Scholar] [CrossRef]
  74. Park, K.; Hong, J.S.; Kim, W. A methodology combining cosine similarity with classifier for text classification. Appl. Artif. Intell. 2020, 34, 396–411. [Google Scholar] [CrossRef]
  75. Beheshti, A.; Benatallah, B.; Nouri, R.; Tabebordbar, A. CoreKG: A knowledge lake service. Proc. VLDB Endow. 2018, 11, 1942–1945. [Google Scholar] [CrossRef]
  76. Miller, G.A. WordNet: A lexical database for English. Commun. ACM 1995, 38, 39–41. [Google Scholar] [CrossRef]
  77. Heaton, J. Ian Goodfellow, Yoshua Bengio, and Aaron Courville: Deep Learning. The MIT Press, 2016, 800 pp, ISBN: 0262035618. Genet. Program. Evolvable Mach. 2018, 19, 305–307. [Google Scholar] [CrossRef]
  78. Arik, S.Ö.; Pfister, T. Tabnet: Attentive interpretable tabular learning. Proc. AAAI Conf. Artif. Intell. 2021, 35, 6679–6687. [Google Scholar] [CrossRef]
Figure 1. The process of constructing the Churn Knowledge Base (ChurnKB). (A) The pipeline for ChurnKB development, consisting of seven steps spanning literature review, feature identification, taxonomy development, validation, and API integration for linking churn-related phrases and calculating scores. Feedback loops ensure refinement through expert validation. (B) A hierarchical snapshot of ChurnKB, illustrating relationships between customer feelings, behaviours, and churn-related concepts across four levels, from broad concepts to granular sub-instances.
Figure 2. A snapshot of some of the phrasal/lexical sources linked to the ChurnKB sub-instances. Each column of the Excel file acts as a category corresponding to one of the churn sub-instances.
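The column-per-category layout in Figure 2 maps naturally to a dictionary of lexicons. A minimal loader sketch, assuming a CSV export of that sheet; the column names and phrases below are illustrative, not taken from ChurnKB itself:

```python
import csv
import io

# Illustrative stand-in for a CSV export of the Excel sheet in Figure 2:
# one column per ChurnKB sub-instance, one phrase per cell.
sheet = io.StringIO(
    "frustration,switching_intent\n"
    "fed up,cancel my account\n"
    "this is unacceptable,moving to a competitor\n"
)

def load_lexicons(fh) -> dict:
    """Map each column header (sub-instance) to its non-empty phrase list."""
    reader = csv.DictReader(fh)
    lexicons = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, phrase in row.items():
            if phrase:
                lexicons[name].append(phrase)
    return lexicons

lexicons = load_lexicons(sheet)
# lexicons["frustration"] -> ["fed up", "this is unacceptable"]
```

Once loaded this way, each sub-instance's phrase list can be matched against customer text in the feature-linking step.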
Figure 3. The pipeline for developing a knowledge base-enhanced churn classifier. The process includes the following steps: (1) data curation: preprocessing of customer interactions to extract relevant linguistic features; (2) feature linking: mapping of extracted features to churn-related concepts in the churn knowledge base; (3) score calculation: assigning numerical scores to features; and (4) classifier development: using these scores as inputs for a churn prediction model validated via expert feedback.
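The "feature linking" and "score calculation" steps of Figure 3 amount to scoring each customer email against the lexicon of every ChurnKB sub-instance. The paper uses TF-IDF with cosine similarity; the stdlib sketch below substitutes raw term-frequency vectors for TF-IDF to stay self-contained, and the sub-instance names and lexicons are illustrative:

```python
import math
import re
from collections import Counter

# Illustrative stand-in for ChurnKB: each sub-instance has a small lexicon.
CHURN_KB = {
    "frustration": "annoyed frustrated unacceptable fed up disappointed",
    "switching_intent": "cancel switch competitor close account leave",
}

def tokenize(text: str) -> list:
    return re.findall(r"[a-z]+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def churn_feature_scores(email: str) -> dict:
    """One similarity score per ChurnKB sub-instance, each in [0, 1]."""
    email_tf = Counter(tokenize(email))
    return {name: cosine(email_tf, Counter(tokenize(lexicon)))
            for name, lexicon in CHURN_KB.items()}

scores = churn_feature_scores("I am fed up and want to cancel my account.")
```

The resulting score vector is what step (4) feeds into the churn classifier as numerical features.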
Figure 4. (AD) Snapshots of sample questions and corresponding responses used to evaluate H1, H2, H3, and H4, respectively. (E) A snapshot of the negative word-of-mouth sub-concept, along with its corresponding instances and sub-instances, included in the second version of ChurnKB.
Figure 5. Survey questions on the relevance of two ChurnKB sub-instances and their corresponding words and phrases; both received an 'irrelevant' rating from more than 20% of experts and were therefore removed from the initial version of ChurnKB.
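The pruning rule behind Figure 5 (drop any sub-instance that more than 20% of expert raters marked as irrelevant) can be sketched as a simple threshold filter over the votes; the ratings below are illustrative:

```python
def prune_kb(ratings: dict, threshold: float = 0.20) -> list:
    """ratings maps sub-instance -> list of expert votes ('relevant'/'irrelevant');
    returns the sub-instances whose irrelevance rate stays at or below the threshold."""
    kept = []
    for sub_instance, votes in ratings.items():
        irrelevant_rate = votes.count("irrelevant") / len(votes)
        if irrelevant_rate <= threshold:
            kept.append(sub_instance)
    return kept

# Hypothetical expert votes for two sub-instances:
ratings = {
    "fed_up_phrases": ["relevant"] * 9 + ["irrelevant"],         # 10% irrelevant -> kept
    "weather_smalltalk": ["relevant"] * 7 + ["irrelevant"] * 3,  # 30% irrelevant -> removed
}
surviving = prune_kb(ratings)  # -> ["fed_up_phrases"]
```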
Figure 6. Survey-based evaluation of ChurnKB hypotheses. (i) Assessment of the initial churn taxonomy structure. (ii) Validation of new sub-instances (e.g., pronoun usage and absolute and certainty words). (iii) Evaluation of churn-related sub-instances and lexical lists. (iv) Assessment of the ChurnKB-enhanced classifier’s outcomes.
Table 2. Model performance comparison across the three data scenarios, with models in each scenario ranked by F1 score. XGBoost appears in every scenario, and its performance improves steadily from Scenario A to Scenario C.
(A) Scenario One: Numerical Data Only

Model | Recall | Precision | F1 Score
Tabnet | 0.5000 | 0.7800 | 0.6094
Gradient Boosting | 0.4970 | 0.7446 | 0.5894
XGBoost | 0.5005 | 0.6830 | 0.5752
MLP | 0.5065 | 0.6190 | 0.5571
LightGBM | 0.4823 | 0.6711 | 0.5567

(B) Scenario Two: Textual Data Only

Model | Recall | Precision | F1 Score
Linear Discriminant Analysis | 0.6308 | 0.7966 | 0.7021
MLP | 0.6410 | 0.7463 | 0.6897
AdaBoost | 0.6229 | 0.7661 | 0.6850
Tabnet | 0.6301 | 0.7188 | 0.6715
XGBoost | 0.6057 | 0.7565 | 0.6698

(C) Scenario Three: Numerical + Textual Data

Model | Recall | Precision | F1 Score
XGBoost | 0.7085 | 0.8946 | 0.7891
LightGBM | 0.7013 | 0.8963 | 0.7842
AdaBoost | 0.7193 | 0.8531 | 0.7754
Tabnet | 0.6623 | 0.7969 | 0.7234
MLP | 0.6494 | 0.7812 | 0.7092
Table 3. Paired t-test results comparing model performance with and without ChurnKB-derived features across multiple evaluation metrics. Statistically significant improvements (p-value < 0.05) in all metrics suggest that ChurnKB features enhance model performance.
Metric | T-Statistic | p-Value
AUC | 5.297949 | 0.000253
Precision | 6.428685 | 0.000049
Recall | 7.819004 | 0.000008
F1 | 8.786979 | 0.000003
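The paired t-test in Table 3 compares the same model with and without ChurnKB-derived features over matched evaluation folds, testing whether the per-fold metric differences are significantly above zero. A minimal stdlib sketch of the statistic; the per-fold F1 scores below are illustrative, not the paper's actual fold results:

```python
import math
import statistics

def paired_t_statistic(with_kb: list, without_kb: list) -> float:
    """t-statistic for paired samples: mean difference over its standard error."""
    diffs = [a - b for a, b in zip(with_kb, without_kb)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Illustrative per-fold F1 scores for the same folds, with/without ChurnKB features:
f1_with = [0.78, 0.80, 0.79, 0.77, 0.81]
f1_without = [0.57, 0.58, 0.59, 0.56, 0.58]
t = paired_t_statistic(f1_with, f1_without)
# A t well above the critical value (2.776 for df = 4, two-sided alpha = 0.05)
# indicates a significant gain.
```

In practice, `scipy.stats.ttest_rel` computes both the statistic and the p-values of the kind reported in Table 3.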
Table 4. Ablation study results, measuring the impact of ChurnKB-enabled features.
Model Setup | F1 Score | Improvement over Baseline
Baseline (A): numerical only | 0.5752 |
ChurnKB-enabled (B): textual features only | 0.6698 | +9.4%
Full model (C): numerical + textual features | 0.7891 | +21.3%
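The improvement column in Table 4 is the percentage-point difference in F1 score relative to the numerical-only baseline, which a quick check of the arithmetic reproduces up to rounding:

```python
BASELINE_F1 = 0.5752  # Scenario A: numerical features only

def improvement_points(f1: float, baseline: float = BASELINE_F1) -> float:
    """Percentage-point gain in F1 over the baseline."""
    return (f1 - baseline) * 100

textual_only = improvement_points(0.6698)  # ~9.46 points (table reports +9.4%)
full_model = improvement_points(0.7891)    # ~21.39 points (table reports +21.3%)
```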
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Shahabikargar, M.; Beheshti, A.; Mansoor, W.; Zhang, X.; Foo, E.J.; Jolfaei, A.; Hanif, A.; Shabani, N. ChurnKB: A Generative AI-Enriched Knowledge Base for Customer Churn Feature Engineering. Algorithms 2025, 18, 238. https://doi.org/10.3390/a18040238

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
