1. Introduction
Measuring users’ satisfaction is a critical part of assessing successful interaction between humans and technologies. The telecommunications industry has emerged as a prominent sector in developed nations. The escalation of competition has been propelled by the proliferation of operators and advancements in technology [
1]. Enterprises are implementing diverse tactics to sustain themselves in this highly competitive marketplace. According to the extant literature [
2], three principal strategies have been proposed to augment revenue generation: (1) procuring a new clientele, (2) upselling to the extant clientele, and (3) prolonging the retention duration of the clientele. Upon analysing these strategies while considering their respective return on investment (RoI), it has been determined that the third strategy yields the greatest financial benefit [
2]. This discovery corroborates the idea that maintaining an existing customer is more cost effective than obtaining a new one [
3] and is also regarded as a less complicated tactic than the upselling technique [
4]. To execute the third strategy, corporations must address the potential occurrence of customer churn, which describes the phenomenon of customers transitioning from one provider to another [
5].
The pursuit of customer satisfaction is a key driver for telecommunication companies in the face of intense global competition. Numerous studies have linked customer satisfaction to both customer loyalty and customer churn [6, 7, 8]. In the telecommunications industry, customer churn is the act of customers switching from one telecommunications service provider to another [
9]. According to recent research, the expense associated with acquiring a new customer surpasses that of retaining an existing customer [
10]. Currently, corporations exhibit a heightened level of interest in retaining their clientele. As demonstrated in the literature review, a multitude of investigations have been carried out in various sectors pertaining to customer relationship management (CRM) to handle customer retention and develop a proficient framework for forecasting customer churn.
Customer involvement is a crucial aspect of the operations of small, medium, and large enterprises. The success or failure of a business can be influenced by various factors, such as customer relations, loyalty, trust, support, feedback, opinions, and surveys, either independently or in combination. Understanding the requirements and comfort levels of customers is important in both commercial and individual-based sectors, particularly for customer satisfaction during or after service consumption. Researchers have used a range of channels to obtain and anticipate customer feedback, such as social media platforms, electronic questionnaires, telephone calls, email correspondence, online mobile applications, and websites [8]. By incorporating customer feedback, comments, advice, suggestions, recommendations, and viewpoints, it is feasible to improve the quality and increase the volume of services [11, 12].
Telecommunications is a vital global industry that has the potential to make a significant impact across various sectors, including business, defence, investment, production, and individual domains. The provision of a swift, dependable, protected, and precise service has the potential to enhance the calibre of service offered by the relevant communication enterprises. Thus, the anticipation of customer feedback holds significant importance for the progress of the nation. Several techniques from statistics, computer science, and theoretical and mathematical fields have been suggested and simulated to precisely forecast customer satisfaction. This is carried out to enhance service quality in accordance with the requirements and expectations of customers [
13,
14].
In Saudi Arabia, customer feedback holds significant importance for both government and private sector companies. The establishment of diverse departments to oversee and address customer grievances across multiple industries through varied approaches and mediums has fostered a highly competitive telecommunications market [
15]. The telecommunications sector in Saudi Arabia is currently experiencing notable changes in various aspects, including technological innovations, service provision, a competitive landscape, and the extension of telecommunications services to non-traditional domains. The services mentioned above include managed infrastructure, data/colocation centres, and cloud services. Some of the most prominent telecommunication firms operating in Saudi Arabia are the STC, Integrated Telecom Company (ITC), Saudi Mobily Company (Etisalat), Zain, Virgin, and Go Telecom [
16]. Saudi Arabia is one of the most densely populated nations in the Gulf Cooperation Council (GCC) region, with a demographic composition that is predominantly youthful. Emerging nations exhibit a strong inclination towards the adoption and application of cutting-edge technology in various domains, such as education, research, commerce, manufacturing, and more. The uncertain business situations of the future have been amplified by the high-speed 5G network and the COVID-19 pandemic, as stated in [
17]. In 2021, the 5G awards were bestowed upon STC, Mobily, and Zain. According to [
18], approximately 11 million individuals use the Twitter platform via both smartphones and computers. The user base is observed to be expanding at a swift pace owing to the growing population and interest [
18].
Our study centred on assessing the efficacy of a collection of deep learning (DL) models in predicting customer attrition in the telecommunications industry. Four architectures were evaluated: long short-term memory (LSTM), gated recurrent unit (GRU), bidirectional LSTM (BiLSTM), and a convolutional neural network combined with LSTM (CNN-LSTM). Methods for data preparation, feature engineering, and feature selection were developed to support them.
This research contributes to the domain of customer satisfaction analysis by using Arabic tweets about Saudi telecommunications companies. It demonstrates the ability of several models using DL, including LSTM, GRU, BiLSTM, and CNN-LSTM, to predict customer satisfaction. The significance of social media as a platform where customers may express their positive and negative experiences with telecommunications services and products was further confirmed. The study’s findings have real-world relevance for Saudi Arabia’s telecommunications sector because they shed light on customer satisfaction and reveal opportunities for service enhancement. This information can inform business decisions, reduce customer churn due to dissatisfaction, and enhance customer service and loyalty.
The remainder of this paper is organised as follows. The literature review examines the pertinent research in the field. The methodology section describes the dataset and the model architectures used. The experimental results section presents and analyses the findings. Finally, the paper closes with a discussion and a concluding summary.
2. Background of Study
Various methodologies have been used to forecast customer attrition in telecommunications firms. The majority of these methodologies employ machine learning (ML) and data mining techniques. The predominant body of literature has centred on the implementation of a singular data-mining technique for knowledge extraction, while alternative studies have prioritised the evaluation of multiple approaches for the purpose of churn prediction.
In their study, Brandusoiu et al. [
19] introduced a sophisticated data-mining approach to predict churn among prepaid customers. This approach used a dataset containing call details for 3333 customers, with 21 distinct features. The dependent churn variable in this dataset was binary, with values of either ‘Yes’ or ‘No’. The features cover the number of incoming and outgoing messages as well as voicemail for individual customers. The authors used principal component analysis (PCA) to reduce the dimensionality of the data. They then applied three ML algorithms, namely neural networks, a support vector machine (SVM), and Bayes networks, to predict churn, and evaluated them using the area under the receiver operating characteristic curve (AUC–ROC). The AUC values obtained for Bayes networks, neural networks, and SVM were 99.10%, 99.55%, and 99.70%, respectively. That study used a small dataset that was free of missing data. He et al. [
20] proposed a model that employed the neural network algorithm to tackle the problem of customer churn in a large telecommunications company in China that had a customer base of around 5.23 million. The metric used to assess the precision of predictions was the general accuracy rate, which yielded a score of 91.1%. Idris [
21] addressed the issue of churn in the telecommunications industry by presenting a methodology that employed genetic programming alongside AdaBoost. The efficacy of the model was evaluated using two established datasets: one from Orange Telecom and the other from cell2cell. The cell2cell dataset achieved an accuracy rate of 89%, while the other dataset achieved a rate of 63%. Huang et al. [
22] investigated customer churn within the context of the big data platform. The aim of the researchers was to exhibit noteworthy enhancement in churn prediction by leveraging big data, which is dependent on the magnitude, diversity, and speed of the data. The handling of data derived from the Operation Support and Business Support divisions of the largest telecommunications corporation in China required the implementation of a big data platform to enable the requisite manipulations. The use of the random forest algorithm was evaluated using the AUC metric.
A rough set theory-based churn prediction model was proposed by Makhtar et al. [23] for the telecommunications sector. In that research, the rough set classification technique outperformed the linear regression, decision tree, and voted perceptron neural network methods. The problem of skewed datasets in churn prediction has been the subject of several studies. This problem arises when the number of churned customers is lower than the number of active customers. Amin et al. [24] compared six alternative oversampling strategies in the context of telecommunication churn prediction. The results showed that genetic algorithm-based rules-generation oversampling algorithms outperformed the other oversampling techniques evaluated.
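To make the idea of oversampling concrete, the following is a minimal sketch of plain random oversampling, in which minority-class rows are duplicated until the classes are balanced. It is not the genetic algorithm-based method evaluated by Amin et al.; the function name `random_oversample`, the class labels, and the toy data are all hypothetical and serve only as an illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (illustrative sketch only)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Original rows of this class, used as the sampling pool.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 8 active customers and 2 churned customers.
rows = [[i] for i in range(10)]
labels = ["active"] * 8 + ["churn"] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

After resampling, both classes contain the same number of rows, and the added churn rows are copies of the original churn rows. In practice, resampling of this kind would be applied to the training split only, so that duplicated rows do not leak into the test set.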
Burez and Van den Poel [
25] investigated the issue of imbalanced datasets in churn prediction models. They compared the efficacy of random sampling, advanced undersampling, gradient-boosting models, and weighted random forests. The models were evaluated using metrics such as AUC and Lift. The findings indicate that the undersampling technique outperformed the other techniques tested. Users of social media platforms, including Twitter, Facebook, and Instagram, tend to comment on and evaluate a company’s offerings because these platforms provide a means to express opinions and exchange ideas about products [9]. Sentiment analysis, also referred to as feedback mining, uses natural language processing (NLP), statistical analysis, and ML to extract and classify feedback from textual inputs based on criteria such as subjectivity and polarity recognition [6]. Furthermore, Pavaloaia and colleagues provided a concise definition of sentiment analysis as a social media tool that evaluates the presence of positive and negative keywords in text messages linked to a social media post [10].
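The keyword-based view of sentiment analysis described above can be sketched in a few lines. This is a minimal illustration only: the two word lists are hypothetical English lexicons chosen for readability, and the function `keyword_polarity` is an assumed name, not a method taken from the cited works.

```python
# Hypothetical lexicons; a real system would use a curated sentiment lexicon.
POSITIVE = {"good", "great", "fast", "excellent"}
NEGATIVE = {"bad", "slow", "terrible", "poor"}


def keyword_polarity(text):
    """Label text by counting positive and negative keywords."""
    # Lowercase and strip simple punctuation from each whitespace token.
    tokens = [t.strip(".,!?;:") for t in text.lower().split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(keyword_polarity("Great coverage and fast internet!"))
print(keyword_polarity("The network is slow and support is terrible."))
```

Keyword counting of this kind ignores negation, dialect, and context, which is one reason the studies reviewed here turn to ML-based classifiers.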
Recognition of the need for sentiment analysis is increasing [9, 25]. This is attributed to the growing demand for the estimation and organisation of unstructured data from social media. The task of text mining is challenging because it involves the identification of topical words across various subjects. To effectively categorise these words into either positive or negative polarity, it is imperative to conduct sentiment analysis. Additionally, selecting appropriate sentiment signals for real-time analysis is crucial in this process [26, 27]. The increasing prevalence of textual content sharing on social media has led to an increase in the use of text-mining and sentiment analysis techniques [28, 29, 30].
The study conducted by [
31] involved an analysis of consumer sentiment expressed in a Jordanian dialect across the Facebook pages of multiple telecommunication businesses in Jordan, as well as on Twitter. Four fundamental classifiers were applied to the manually categorised sentiments: SVM, K-nearest neighbour (k-NN), naïve Bayes, and decision tree (DT). The results of that study showed the superiority of SVM over the three other widely used sentiment classifiers. In [
27], the researchers aimed to ascertain the sentiment of user-generated content but were constrained to classifying comments instead of the actual posts.
Furthermore, [
32] employed Twitter as a medium for conducting sentiment analysis by scrutinising tweets in the English language originating from diverse businesses in Saudi Arabia. The researchers used the K-nearest neighbour and naïve Bayes algorithms to classify attitudes into three categories (positive, negative, and neutral) based on daily and monthly trend observations. Nonetheless, the exclusion of Arabic opinions may have resulted in a less comprehensive dataset.
Another study used sentiment analysis of Facebook posts to assess how effectively social media posts support self-marketing strategies on social media platforms. Furthermore, according to a study conducted by [33], sentiment analysis revealed a rise in negative sentiment among followers during phases of reduced user-generated activity. This is noteworthy, as sentiment analysis consistently yields insights beyond those derived from solely analysing comments, likes, and shares of articles. Research has demonstrated that a single published article can generate a substantial number of comments, which can be subjected to sentiment analysis using an ML-based approach.
The researchers [34, 35, 36] used a range of deep learning approaches to establish the relationship between organisations and their clients, drawing on feedback, quality assessments, comments, and surveys conducted across many domains. In the field of natural language processing (NLP), these approaches have garnered significant interest because of their exceptional accuracy in text categorisation. These methods have proven indispensable in many sectors, including commercial and consumer interactions, as well as in predicting societal implications for future trends.
5. Results
This section presents the results of various DL models, namely BiLSTM, CNN-LSTM, GRU, and LSTM, for sentiment analysis of Arabic customer satisfaction. Several evaluation metrics, such as accuracy, precision, and the F1 score, were used to assess the quality of these models.
Table 9 shows the results of the DL models. The training accuracy for BiLSTM was 97.84%, while the test accuracy was 96.40%. With a sensitivity of 91.67% and a specificity of 98.58%, it showed a reasonable balance. Its overall classification ability was reflected in an AUC of 96.44% and an F1 score of 94.14%, the latter considering both precision and recall. CNN-LSTM reached a test accuracy of 96.82%, slightly higher than BiLSTM’s 96.40%. Its specificity remained high, at 98.58%, while its sensitivity increased to 93.02%. Despite a slight drop in AUC (96.17%), the F1 score improved to 94.86%. GRU, similar to CNN-LSTM, had a sensitivity of 93.02% and a specificity of 98.58%. Its F1 score was also 94.86%, but its AUC was higher, at 96.57%.
When compared to other models, LSTM achieved the best results. Its test accuracy was 97.03%, which was nearly as high as its 98.04% training accuracy. LSTM also had the highest sensitivity (93.34%) and specificity (98.72%) of all the models, indicating that it was the best at making the right positive and negative identifications. It performed admirably across the board, with an F1 score of 95.19% and an AUC of 96.35%.
Figure 9 shows a comparison of the performance of the models.
All models performed very well on the task. LSTM led in accuracy, sensitivity, specificity, and F1 score, while GRU achieved the highest AUC (96.57%).
The LSTM model was set to train for 20 epochs, and early stopping halted training at epoch 8. The training accuracy of the model was 98.04%, and the testing accuracy was 97.03%, as shown in
Figure 10a,b. The model achieved a sensitivity of 93.34%, a specificity of 98.72%, and an F1 score of 95.19%. Additionally, the model achieved an AUC of 96.35%.
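The early-stopping behaviour mentioned here can be illustrated with a short sketch. The monitored quantity, the patience value, and the loss values below are all assumptions made for illustration; the source states only the epoch budget (20) and the epoch at which training stopped.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training halts, i.e. the first
    epoch where the validation loss has failed to improve for `patience`
    consecutive epochs. Returns len(val_losses) if never triggered."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss   # new best validation loss, reset the counter
            wait = 0
        else:
            wait += 1     # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses)


# Hypothetical validation losses over a 20-epoch budget.
losses = [0.50, 0.40, 0.35, 0.33, 0.32, 0.34, 0.33, 0.36] + [0.37] * 12
print(early_stopping_epoch(losses, patience=3))
```

With these made-up losses, the best value occurs at epoch 5 and the three following epochs bring no improvement, so training halts at epoch 8 rather than running the full 20 epochs.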
The GRU model was set to train for 20 epochs, and early stopping halted training at epoch 7. The training accuracy of the model was 98.07%, and the testing accuracy was 96.82%, as shown in
Figure 11a,b. The model achieved a sensitivity of 93.02%, a specificity of 98.58%, and an F1 score of 94.86%. Additionally, the model achieved an AUC of 96.57%.
The BiLSTM model was set to train for 20 epochs, and early stopping halted training at epoch 12. The training accuracy of the model was 97.84%, and the testing accuracy was 96.40%, as shown in
Figure 12a,b. The model achieved a sensitivity of 91.67%, a specificity of 98.58%, and an F1 score of 94.14%. Additionally, the model achieved an AUC of 96.44%.
The CNN-LSTM model was set to train for 20 epochs, and early stopping halted training at epoch 12. The training accuracy of the model was 97.82%, and the testing accuracy was 96.82%, as shown in
Figure 12a,b. The model achieved a sensitivity of 93.02%, a specificity of 98.58%, and an F1 score of 94.86%. Additionally, the model achieved an AUC of 96.17%.
This study’s customer satisfaction level findings help improve services and retain regular clients. This research detailed the models’ sensitivity, specificity, and positive and negative predictive values, as described in
Figure 13. LSTM produced 2704 true positives (TPs) and 1177 true negatives (TNs), with only 35 false positives (FPs) and 84 false negatives (FNs). GRU produced 2700 TPs and 1173 TNs, with 39 FPs and 88 FNs. BiLSTM and CNN-LSTM generated exactly the same numbers: 2700 TPs, 39 FPs, 88 FNs, and 1177 TNs.
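As a minimal sketch of how such evaluation metrics follow from confusion-matrix counts, the function below applies the standard definitions. The function name `classification_metrics` is hypothetical, and the counts passed in are the LSTM figures quoted above. Which class the study treats as positive is not stated in this section, so the mapping of the counts to TP and TN here is an assumption; accuracy is unaffected by that choice.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # also called recall
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# LSTM counts as quoted in the text; the positive-class mapping is assumed.
lstm = classification_metrics(tp=2704, tn=1177, fp=35, fn=84)
print(f"accuracy = {lstm['accuracy']:.2%}")
```

The accuracy computed this way is (2704 + 1177) / 4000, or about 97.0%, in line with the test accuracy reported for LSTM. The class-dependent metrics (sensitivity, specificity, precision, F1) change if the two classes are swapped.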
Although the models differed somewhat in their proportions of correct predictions, FPs, and FNs, all of them performed respectably. LSTM had the highest proportion of correct positive and negative identifications, demonstrating its superior ability to detect customer satisfaction. The confusion matrices of the deep learning models are presented in
Figure 14.
Figure 15 shows a comparison of the confusion matrices of the DL models.
6. Discussion
The phenomenon of customer churn represents a significant challenge and a top priority for major corporations. Owing to its significant impact on corporate revenues, particularly within the telecommunications industry, companies are actively pursuing strategies to forecast potential customer churn. Hence, identifying the determinants that contribute to customer attrition is crucial to implementing appropriate measures aimed at mitigating this phenomenon. Our study’s primary objective was to create a churn prediction model that can aid telecommunication operators in identifying customers who are at a higher risk of churning.
This paper used Arabic tweets about Saudi telecommunications companies. New restrictions on Twitter, put in place in January 2023, limit the number of tweets a single user or application can collect in a given period, which restricts data collection with Python scripts. This makes it more difficult to collect large datasets of tweets, which is often necessary for data mining and other research purposes. This study compared four models for predicting customer satisfaction: LSTM, GRU, BiLSTM, and CNN-LSTM. The research confirmed the significance of customers’ use of social media to share their experiences, both good and bad, with a company’s services or products.
Figure 16 shows the ROC curves of the deep learning models. The problem was addressed by creating and training DL models on the open-source AraCust dataset. The LSTM model stood out with the highest test accuracy for text classification (97.03%), alongside a training accuracy of 98.04%.
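The AUC values reported for these models summarise their ROC curves in a single number. As a minimal sketch, the AUC can be computed as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc`, the labels, and the scores below are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0    # positive ranked above negative
            elif p == n:
                wins += 0.5    # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = satisfied) and model scores.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
print(roc_auc(y_true, y_score))
```

In this toy example, eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9. The pairwise form is quadratic in the number of examples and is meant for illustration; library implementations typically use a sorting-based computation.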
The results of comparing the proposed deep learning models with existing models for sentiment analysis of Arabic customer satisfaction on the AraCust dataset are presented in
Table 10. This is related to the telecommunication sectors of Saudi Arabia. Almuqren et al. [
49] proposed two models: Bi-GRU and LSTM. The Bi-GRU model achieved an accuracy of 95.16%, while the LSTM model achieved 94.66% accuracy. Aftan and Shah [
50] proposed three other models: RNN, CNN, and AraBERT. The AraBERT model achieved 94.33% accuracy, the RNN model achieved an accuracy of 91.35%, and the CNN model achieved 88.34% accuracy. Almuqren et al. [
46] proposed a SentiChurn model and obtained an accuracy of 95.8%. In this study, we proposed several DL models; the best result, a test accuracy of 97.03%, was achieved by the LSTM model, which is higher than the accuracies reported in these existing studies.
7. Conclusions
The significance of conducting research in the telecommunications industry lies in its potential to enhance the interaction between users and technologies and, therefore, to improve companies’ profitability. It is widely acknowledged that the ability to forecast customer churn is critical to protecting the revenue of telecommunications enterprises. Therefore, the objective of this study was to construct a predictive system for customer churn in Saudi Arabian telecommunications companies. This study used DL and sentiment analysis to support important decisions about increasing customer loyalty and satisfaction. As social media continues to shape public opinion, this research can help the telecommunications industry better serve its customers and address their concerns. Sentiment analysis was used to assess customer satisfaction with STC, Mobily, and Zain services and to inform business decisions. The study confirmed social media’s value as a platform for consumers to share their positive and negative experiences with a company’s products or services. Communication is vital to Saudi life, so online discussions are inevitable. In this study, sophisticated DL models were trained on AraCust, a publicly available dataset collected from Arabic tweets. The proposed models were LSTM, GRU, BiLSTM, and CNN-LSTM. The LSTM model had the highest test accuracy (97.03%) in text classification, with a training accuracy of 98.04%. The model’s superior sensitivity in identifying customer satisfaction showed its potential to help telecommunications providers reduce customer churn caused by dissatisfaction with their offerings. In future work, the authors aim to enhance the existing research model by incorporating more sophisticated DL techniques, such as transformer models and time series models, to improve its precision.
This research paper made a substantial contribution to the domain of customer satisfaction analysis in the Arabic language. This is a crucial area of investigation, given the large number of Arabic speakers worldwide. The study showed the ability of different deep learning models to accurately predict customer satisfaction by analysing Arabic tweets. It also highlighted the importance of social media platforms as valuable channels through which customers can share their experiences, which helps business owners improve service quality and maintain customer loyalty.