Explainable Deep Learning for COVID-19 Vaccine Sentiment in Arabic Tweets Using Multi-Self-Attention BiLSTM with XLNet

Sweidan, Asmaa Hashem; El-Bendary, Nashwa; Taie, Shereen A.; Idrees, Amira M.; Elhariri, Esraa

doi:10.3390/bdcc9020037


by Asmaa Hashem Sweidan ¹, Nashwa El-Bendary ², Shereen A. Taie ¹, Amira M. Idrees ³,* and Esraa Elhariri ¹

¹ Faculty of Computers and Artificial Intelligence, Fayoum University, Fayoum 63514, Egypt
² College of Computing and Information Technology, Arab Academy for Science, Technology, and Maritime Transport (AASTMT), Aswan 81516, Egypt
³ College of Business, King Khalid University, Abha 61421, Saudi Arabia
* Author to whom correspondence should be addressed.

Big Data Cogn. Comput. 2025, 9(2), 37; https://doi.org/10.3390/bdcc9020037

Submission received: 29 November 2024 / Revised: 18 January 2025 / Accepted: 24 January 2025 / Published: 10 February 2025

(This article belongs to the Special Issue Application of Deep Learning and Convolution Neural Networks for Social Healthcare)


Abstract:

The COVID-19 pandemic has generated a vast corpus of online conversations regarding vaccines, predominantly on social media platforms like X (formerly known as Twitter). However, analyzing sentiment in Arabic text is challenging due to the diverse dialects and lack of readily available sentiment analysis resources for the Arabic language. This paper proposes an explainable Deep Learning (DL) approach designed for sentiment analysis of Arabic tweets related to COVID-19 vaccinations. The proposed approach utilizes a Bidirectional Long Short-Term Memory (BiLSTM) network with Multi-Self-Attention (MSA) mechanism for capturing contextual impacts over long spans within the tweets, while having the sequential nature of Arabic text constructively learned by the BiLSTM model. Moreover, the XLNet embeddings are utilized to feed contextual information into the model. Subsequently, two essential Explainable Artificial Intelligence (XAI) methods, namely Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), have been employed for gaining further insights into the features’ contributions to the overall model performance and accordingly achieving reasonable interpretation of the model’s output. Obtained experimental results indicate that the combined XLNet with BiLSTM model outperforms other implemented state-of-the-art methods, achieving an accuracy of 93.2% and an F-measure of 92% for average sentiment classification. The integration of LIME and SHAP techniques not only enhanced the model’s interpretability, but also provided detailed insights into the factors that influence the classification of emotions. These findings underscore the model’s effectiveness and reliability for sentiment analysis in low-resource languages such as Arabic.

Keywords:

Arabic sentiment analysis (ASA); deep learning; explainable artificial intelligence (XAI); multi-self-attention; natural language processing; XLNet embeddings

1. Introduction

The COVID-19 pandemic prompted extensive public discussion worldwide, with people sharing their views on COVID-19 vaccines and debating the pros and cons of vaccination. Social media platforms like X (formerly known as Twitter) became a central hub for sharing information and exchanging views that reflect what people think about COVID-19 vaccines. However, comprehending these conversations poses significant challenges, particularly when they are written in a low-resource language such as Arabic. This is due to the complex nature of the Arabic language in terms of grammar, syntax, and dialectal variation, in addition to the limited amount of annotated data available for training Natural Language Processing (NLP) models [1,2].

During the COVID-19 pandemic, people’s hesitancy towards vaccination or their refusal to get vaccinated put public health at significant risk. Thus, leveraging text analysis models to gain more insights about the attitudes towards COVID-19 vaccines, as expressed by social media users, can give a thorough understanding of how people feel and recognize potential reasons people might have for resisting vaccination [3].

In Arabic sentiment analysis, traditional Machine Learning (ML) models face challenges in accurately capturing sentiment context. Moreover, the limited availability of reliable and interpretable models for sentiment analysis in low-resource languages like Arabic presents a notable difficulty in utilizing Explainable Artificial Intelligence (XAI). Consequently, the lack of interpretability impairs the ability to comprehend the decision-making processes of these models, thereby limiting their effectiveness in analyzing essential sentiment aspects within Arabic text.

This paper presents a comprehensive approach for Arabic sentiment analysis of COVID-19 vaccine-related tweets. The primary component of this approach is a Multi-Self-Attention BiLSTM model, which leverages the strengths of both architectures to effectively capture long-term contextual dependencies and complex grammatical characteristics of the Arabic language [4,5]. Moreover, it employs XLNet embeddings for further improving sentiment classification accuracy [6]. In addition, in order to ensure model transparency and provide valuable insights into the factors influencing sentiment predictions, the proposed approach in this research incorporates two prominent XAI techniques: LIME and SHAP.

The proposed approach used a BiLSTM paired with a Multi-Self-Attention (MSA) mechanism to effectively analyze long-term contextual relationships based on the MSA’s ability to capture sequential dependencies and dynamic context weighting [4,5]. Additionally, it is particularly effective for handling the rich morphology and grammar of the Arabic language, thus enhancing the model’s ability to extract subtle meanings from the text. That is due to the ability of the combined Multi-Self-Attention BiLSTM model to capture the sequential nature of the Arabic text while focusing on the important parts of the input contextual information. Moreover, XLNet embeddings were chosen to extract features due to their ability to outperform other models such as BERT for capturing bidirectional contextual dependencies [6]. Also, the autoregressive pre-training technique of XLNet presents suitable handling for Arabic sentiment analysis through effectively managing the complex relationships in text, despite the Arabic language’s morphological complexities.

The SHAP and LIME XAI techniques were chosen for this study for their established capabilities in enhancing the interpretability of ML models, particularly complex DL models. They were employed to ensure transparency and to address the challenges of sentiment analysis in Arabic tweets, as together they provide both the global and local insights that this task requires. SHAP offers a consistent approach, based on cooperative game theory, to attribute feature contributions while ensuring a fair distribution of feature importance and comprehensive model interpretability [6]. LIME, on the other hand, enables model-agnostic local interpretation, which facilitates understanding how features contribute to individual predictions [7]. Other XAI techniques are less effective for text data and suffer from several limitations: Gradient-weighted Class Activation Mapping (Grad-CAM) is more effective for visual models, and Deep Learning Important FeaTures (DeepLIFT) shows high computational complexity that hinders its practicality for large-scale datasets [8,9].

The major contributions of this manuscript are summarized as follows:

Designing an explainable DL approach for sentiment analysis of Arabic tweets by utilizing MSA-BiLSTM and XLNet embeddings.
Developing a robust model, which is capable of accurately classifying the sentiment of Arabic tweets related to COVID-19 vaccines.
Generating interpretability of decisions within the proposed sentiment analysis approach to provide transparency into the factors influencing its predictions.

The remainder of this paper is organized as follows.

In Section 2, existing research on sentiment analysis is reviewed, focusing on studies analyzing sentiment in Arabic social media data related to the COVID-19 vaccination topic. Section 3 describes the proposed explainable DL approach, encompassing the MSA-BiLSTM architecture, the use of XLNet embeddings, and the deployment of LIME and SHAP XAI techniques. Section 4 presents the experimental results obtained by evaluating the proposed approach on a benchmark dataset of Arabic tweets related to COVID-19 vaccines and discusses the key insights derived from the XAI explanations. Finally, Section 5 highlights the contributions and summarizes the key findings of the conducted study, with a discussion of potential future directions.

To enhance clarity, throughout the following sections, the corresponding English translations of the Arabic terms/phrases employed to explain the functionalities of the proposed approach modules will be included in parentheses () alongside the Arabic text.

2. Related Work

This section presents recent work related to the work proposed in this study.

In [10], the authors developed a model to predict individual awareness of COVID-19 precautionary procedures in five main regions of Saudi Arabia. The dataset used consists of Arabic COVID-19 related tweets from the curfew period. For this study, the authors implemented several ML models. Among them, the Support Vector Machine (SVM) classifier with bigrams in TF-IDF demonstrated the best performance, achieving an accuracy of 85%.

Furthermore, a manually annotated dataset of COVID-19 vaccine-related Arabic tweets, entitled ArCovidVac, was introduced by Mubarak et al. in [11]. The constructed dataset comprises 10,000 Arabic tweets, which cover various countries in the Arab region. The tweets are annotated based on their stance on the vaccination process into a pro-vaccination, against-vaccination, or neutral stance. Further analysis was performed to investigate the public stance over time. Benchmark experiments were conducted considering various tasks, including tweet classification and stance detection, using several transformer architectures. The AraBERT model achieved an F1-score of 80% on the constructed dataset for the binary classification task.

Moreover, in [12], the author developed a large-scale dataset, which substantially consists of seven databases, containing more than one million Arabic tweets on the COVID-19 vaccine-related discussions. Based on text analysis, the author identified that among the most frequent words, the Arabic terms “جرعة، كورونا”, corresponding to (corona, dose) English terms, were the most popular terms in discussions.

Also, in [2], the authors proposed an approach to explore the sentiment polarity of Arabic tweets shared publicly by the population of the Gulf region, on particular types of COVID-19-related vaccines. In the presented study, the authors compiled an annotated and filtered dataset of 32,476 Arabic tweets relevant to the topic of the COVID-19 vaccine (ASAVACT). Word embeddings (AraVec and FastText) and TF-IDF were used as feature extraction. Several variants of the Gated Recurrent Unit (GRU) DL model, namely Stacked Gated Recurrent Unit (SGRU), Stacked Bidirectional Gated Recurrent Unit (SBi-GRU), and the ensemble of SGRU, in addition to the AraBERT model, were investigated for analyzing the data. The ensemble of the SGRU model proved its superiority over other proposed models by achieving an accuracy of 81.67% on the ASAVACT dataset, demonstrating its effectiveness in analyzing sentiment polarity.

In addition to the models presented, the authors in [13] introduced COVID-Twitter-BERT (CT-BERT), a domain-specific transformer-based model derived from the BERT-LARGE base model [14]. CT-BERT extracts features to create contextual embeddings that encapsulate semantic details for COVID-19-related tweets. This model was pre-trained on a large corpus of 160 million COVID-19 tweets and was designed to analyze tweets regarding COVID-19 stances and vaccination sentiments. The CT-BERT model achieved a mean F1-score of 0.833.

In the same vein, the authors of [15] demonstrated a hybrid Convolutional Neural Network (CNN) and BiLSTM model to analyze Arabic tweets related to COVID-19 vaccines. By employing various word embedding techniques (Aravec, FastText, and ArWordVec), the proposed model effectively captured both local and global contextual information within the tweets. Experimental results highlighted the effectiveness of the proposed model using three benchmark datasets, affirming its superiority over several current DL models in classifying Arabic tweets related to COVID-19 vaccines. The proposed model achieved F-measures of 80.5%, 76.76%, and 87% on ArCovidVac, AraCOVID19-SSD, and SenWave datasets, respectively.

Additionally, in [16], the author of [12] with other collaborators introduced an approach for identifying self-reported side effects of COVID-19 vaccines in Arabic tweets and grouping the extracted symptoms in clusters based on their co-occurrence patterns, using Biterm Topic Modeling (BTM) and SVM methods. An experimental analysis of 65,387 Arabic tweets related to six COVID-19 vaccines extracted 51 distinct symptoms. The key insight highlighted several potential connections between certain groups of symptoms. The proposed approach achieved an accuracy and an F1-score of 90.9% and 0.876, respectively.

In [17], the authors designed an ensemble-based BERT model for determining COVID-19 vaccination sentiment using Twitter data. The proposed model was validated using the IRMiDis dataset, released by Basu et al. [18], which includes 2792 English language tweets about COVID-19 vaccination, labeled as Anti-Vax, Pro-Vax, or Neutral, and it achieved a micro-F1 score and an accuracy of 0.532.

In [19], the authors of [17] developed an explainable DL approach for the sentiment analysis of tweets about COVID-19 vaccination. The proposed approach combined a pre-trained Bidirectional Encoder Representations from Transformers (BERT) language model with a Long Short-Term Memory (LSTM) network to effectively understand the context of the tweets. A significant sentiment classification performance was achieved with an F1-score of 0.88, using the same IRMiDis dataset from [18]. Furthermore, in a related study [20], the authors proposed a framework that combines multi-aspect-based sentiment analysis with LIME XAI techniques for enhancing the interpretability of sentiment prediction. The proposed approach utilized a Latent Dirichlet Allocation (LDA) algorithm for aspect extraction, along with a hierarchical neural network architecture for sentiment prediction. The proposed approach outperformed the state-of-the-art models in accuracy and interpretability by achieving an accuracy and F1-score of 93.17% and 0.9318, respectively, using several benchmark datasets of annotated reviews and tweets from different domains.

Based on the reviewed studies, conventional sentiment analysis approaches showed recurrent problems in understanding the subtle difference in meanings, especially in low-resource languages. This can lead to errors in classifying the polarity of vaccine-related tweets. Moreover, the complexity of natural languages makes it difficult for traditional sentiment analysis models to provide transparent explanations for their predictions. To address these challenges and gain a deeper understanding of the factors influencing vaccine sentiment, this research employs XAI techniques jointly with Multi-Self-Attention BiLSTM with XLNet embedding-based sentiment analysis, which provide an interpretable decision-making process of the proposed approach and build trust in its results.

Table 1 presents a summary of several related studies to COVID-19 vaccination sentiment analysis from social media posts.

3. Proposed Approach

This section describes the proposed XAI-based sentiment analysis model for Arabic tweets concerning COVID-19 vaccines. As depicted in Figure 1, the proposed approach comprises five key phases, namely data preparation, preprocessing, feature extraction, classification, and XAI.

3.1. Dataset Preparation

The aim of this phase is to prepare the dataset for the proposed model. The dataset is generated from two different datasets, namely Dataset-I and Dataset-II. After fetching each dataset using tweets IDs and neglecting empty tweets, the prepared tweets from the two datasets are combined to form the used dataset.

3.2. Preprocessing

In the proposed approach, data preprocessing is a key preliminary phase, which substantially impacts the model performance. The preprocessing pipeline employed in the current study utilizes our previously developed methodology outlined in [21].

The preprocessing steps include data cleaning to handle missing values and outliers, normalization to ensure consistent feature scales, and feature extraction to derive meaningful representations from the raw data. By adhering to these established preprocessing steps, we establish a robust foundation for the subsequent phases of our research and facilitate direct comparison with the findings of study [21], as illustrated in Figure 2.

The common steps of the preprocessing phase in sentiment analysis are listed as follows:

Tokenizing social input text: Present each word in the input social text as a token that could be further divided into several sub-words [21].
Converting Franco-Arabic text into Arabic: Check Google’s API and match the Franco-Arabic words, which are Arabic words created using a combination of Latin characters, with their corresponding original Arabic words, such as writing the word (تطعيم) (Vaccination) as “tat3eem” [22].
Removing elongation: Remove repeated letters used for emphasizing/expressing strong emotions, such as writing the word as (!أخبااار رااائعة) (Great News!), to reduce words to their standard form. In the proposed approach, at most two repeated consecutive letters are kept. Also, elongation (also known as تطويل “Tatweel”), which means stretching/lengthening words by increasing the length of some characters (by using the “kashida” character “-”), is adjusted to unify the letters that can appear in different forms [23].
Letter normalization: Use the normalization procedure to eliminate punctuation and diacritics “Tashkeel”, such as (ضمة “damma” or ُ in the word كُورونا (Corona)).
Word stemming: Break Arabic words down to their word root form or stem using the Information Science Research Institute (ISRI) rule-based stemmer, such as stemming the text [24]: ( للجميع الآن متاح كورونا لقاح رائعة! أخبار) (Great News! The COVID-19 vaccine is now available for everyone) to be ( متاح، آن، جميع كورونا، لقاح، رائع، خبر،) (news, great, vaccine, corona, available, now, everyone).
Lemmatization: Use morphological analysis of words by removing inflectional endings (suffixes, prefixes, etc.), such as mapping the word (أخبار ) (news) to the lemma (خبر ) (news) and the word (اجراءات) (procedures) to the lemma (اجراء) (procedure). It is used to reduce morphological variations.
Part of Speech (PoS) Tagging: The proposed method develops a PoS tagger using a rule-based linguistic approach. Words in a given text are annotated according to their definitions and contextual usage. For instance, the word (الملتزمين) (adhering ones) is assigned the tag (NNS: Noun Phrase Plural) because it represents the plural form of the noun (الملتزم) (adhering one) and is preceded by the preposition (من) (of). The complete set of tags and their conventional meanings follow the International Corpus of Arabic (ICA) tagset, employing a rule-based tagging approach.
Polarity Identification: PoS tagging is used to reduce the dimensionality of the matrix and consequently enhance model performance. The Stanford PoS tagging tool is utilized to assign polarity tags to words, such as tagging the PoS (الاحترازية) (precautionary) as negative.
This paper uses ArSenL, a supervised Arabic sentiment analysis tool, for PoS tagging.

These steps have reduced ambiguity and improved the input quality for subsequent embedding and modeling. Table 2 illustrates examples of tweet preprocessing with sentiment labeling.
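As an illustration of the elongation-removal and letter-normalization steps listed above, the following is a minimal sketch using only Python's standard `re` module. It is not the authors' pipeline from [21]: the function name `normalize_tweet`, the chosen Unicode ranges, and the order of the operations are assumptions made for this example, and the Franco-Arabic conversion, stemming, lemmatization, and PoS tagging steps are not covered.

```python
import re

# Assumption: the tatweel/kashida character is U+0640 and the common
# tashkeel marks (fathatan ... sukun) occupy U+064B-U+0652.
TATWEEL = "\u0640"
DIACRITICS = re.compile(r"[\u064B-\u0652]")
# Three or more repetitions of the same character.
REPEATS = re.compile(r"(.)\1{2,}")
# Anything that is neither a word character nor whitespace.
PUNCTUATION = re.compile(r"[^\w\s]")


def normalize_tweet(text: str) -> str:
    """Hypothetical helper: light normalization of an Arabic tweet."""
    text = text.replace(TATWEEL, "")        # drop kashida stretching
    text = DIACRITICS.sub("", text)         # drop tashkeel marks
    text = REPEATS.sub(r"\1\1", text)       # keep at most two repeated letters
    text = PUNCTUATION.sub(" ", text)       # replace punctuation with spaces
    return " ".join(text.split())           # collapse whitespace
```

Calling `normalize_tweet` on an elongated phrase such as the "Great News!" example above shortens each run of repeated letters to two characters and strips the exclamation mark; the remaining steps of the list would then operate on this cleaned text.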

3.3. Feature Extraction

In the proposed approach, feature extraction is the main phase responsible for generating word vectors. The feature extraction phase employed in the current study utilizes our previously developed XLNet methodology outlined in [21]. XLNet is a transformer-based model that enhances BERT by capturing the powerful relationships between context words while retaining the strengths of autoregressive models. It is widely used in many natural language processing tasks for its strong performance [6,21].
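For reference, the autoregressive pre-training mentioned above is usually written as the permutation language modeling objective from the XLNet literature [6]. The notation below is the standard one rather than notation defined in this paper: $x$ is a token sequence of length $T$, $\mathcal{Z}_{T}$ is the set of all permutations of the index sequence $[1, \ldots, T]$, and $z_{t}$ and $z_{<t}$ denote the $t$-th element and the first $t-1$ elements of a permutation $z$.

$$\max_{\theta} \; \mathbb{E}_{z \sim \mathcal{Z}_{T}} \left[ \sum_{t=1}^{T} \log p_{\theta}\left(x_{z_{t}} \mid x_{z_{<t}}\right) \right]$$

Because each token is predicted from varying subsets of the other tokens across permutations, the model can use context from both directions while remaining autoregressive.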

3.4. Classification

This phase aims to identify all topics concerning the impact of COVID-19 vaccines on the input tweets. The proposed model utilizes a BiLSTM DL model, which analyzes text bi-directionally, making it capable of handling long documents effectively. The key advantage of using BiLSTM is its capability to capture sequential data by considering prior information. The BiLSTM model employed in the current study utilizes our previously developed methodology outlined in [21].
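To make the idea of applying multi-head self-attention on top of BiLSTM outputs more concrete, the following NumPy sketch computes generic scaled dot-product attention over a matrix of hidden states. It is only an illustration under stated assumptions: the function `multi_head_self_attention`, the random projection matrices (standing in for learned parameters), and the tensor shapes are hypothetical and are not taken from the architecture in [21].

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(H, num_heads, rng):
    """Generic multi-head self-attention over hidden states H of shape (T, d).

    H plays the role of the per-token BiLSTM outputs. The projection
    matrices are random placeholders for parameters that would be learned.
    Returns the attended states (T, d) and the weights (num_heads, T, T).
    """
    T, d = H.shape
    assert d % num_heads == 0, "d must be divisible by num_heads"
    d_k = d // num_heads
    Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))

    def split_heads(M):
        # (T, d) -> (num_heads, T, d_k)
        return M.reshape(T, num_heads, d_k).transpose(1, 0, 2)

    Q, K, V = split_heads(H @ Wq), split_heads(H @ Wk), split_heads(H @ Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # (num_heads, T, T)
    weights = softmax(scores, axis=-1)                 # each row sums to 1
    context = weights @ V                              # (num_heads, T, d_k)
    merged = context.transpose(1, 0, 2).reshape(T, d)  # concatenate the heads
    return merged @ Wo, weights
```

Each head produces its own T-by-T weight matrix, so different heads can emphasize different words of the same tweet before the representations are merged.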

3.5. XAI Mechanisms

The black-box nature of AI models makes it difficult to understand the related decision-making processes, leading to complexity in interpreting the outcomes and ensuring fairness. As a result, there is a growing need for XAI techniques for improving trust in AI models [25].

XAI is the capability of AI systems, such as ML or DL models, to give meaningful justification for their decision-making processes. The major goal of XAI is to make the behavior of AI systems more understandable to human users by clarifying the underlying mechanisms of their processes and decisions [26].

Two widely used XAI techniques are LIME and SHAP. These techniques offer insights into the aspects that influence a model’s prediction, thus improving transparency and accountability in ML and DL models [27]. The two XAI techniques employed in the current study are illustrated in Algorithms 1 and 2.

Algorithm 1 LIME: Generating Explanations for Arabic Text Sentiment

Input: An Arabic sentence (instance) $X_{i}$; BiLSTM model f
Output: Feature importance scores (word importance); Explanations

1: Create $N_{s}$ perturbed samples $X_{j}$ around the input instance $X_{i}$.
2: Use the provided BiLSTM black-box model f to predict the sentiment probability distribution $y_{j}$ for each perturbed sample $X_{j}$.
3: Calculate the weight $w_{j}$ of each perturbed sample $X_{j}$ based on its cosine similarity to $X_{i}$, computed on XLNet word embedding vectors.
4: Train a simple, interpretable model on the weighted perturbed samples $(X_{j}, y_{j}, w_{j})$.
5: Extract feature importance scores from the trained interpretable model.

Algorithm 2 SHAP: Generating Explanations for Arabic Text Sentiment

Input: An Arabic sentence (instance) $X_{i}$; BiLSTM model f; Reference value r; Number of features N
Output: SHAP values $ϕ$ for all features of $X_{i}$

1: Initialize a list to store SHAP values for each feature: $ϕ$ = [0] ∗ N.
2: Generate all possible feature (word) subsets from $X_{i}$ and store them in a list S. /∗ Sampling techniques should be considered to reduce the computational load, especially for long sentences. ∗/
3: For each subset s in S:
   Compute the prediction of the BiLSTM model on the subset: $f(s)$.
   4: For each feature j in s:
      4.1: Compute the prediction of the BiLSTM model on the subset without feature j: $f(s - j)$.
      4.2: Calculate the marginal contribution of feature j: $MC_{j}(s) = f(s) - f(s - j)$.
      4.3: Update the SHAP value of feature j: $ϕ$[j] = $ϕ$[j] + $MC_{j}(s)$.
5: For each feature j:
   Calculate the average marginal contribution: $ϕ$[j] = $ϕ$[j] / $|S_{j}|$, where $|S_{j}|$ is the number of subsets containing feature j.
6: Return the list of SHAP values $ϕ$.

For applying the LIME technique, as detailed in Algorithm 1, consider the following input instance: “الوضع الوبائي الحالي خطير جدا” (The current epidemiological situation is very serious).

The following dataset of $N_{s}$ perturbed samples is generated around the input instance:

“الوضع الوبائي خطير جدا” (The epidemiological situation is very serious);
“الوضع الوبائي الحالي خطير” (The current epidemiological situation is serious);
“الوبائي الحالي خطير جدا” (The current epidemic is very serious).

After training the interpretable model using the generated samples and their corresponding weights, the model shows the impact of the most important coefficients as follows:

“خطير” (serious) → High negative coefficient;
“جدا” (very) → High negative coefficient;
“الوضع” (situation) → Medium negative coefficient.
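The LIME walkthrough above can be sketched in a few lines of NumPy. This is a simplified illustration, not the LIME library and not the authors' implementation: `toy_negative_probability` is a made-up stand-in for the BiLSTM black box, the tokens are English glosses of the Arabic words, and the sample weights use cosine similarity on binary word-presence vectors instead of XLNet embeddings.

```python
import numpy as np


def toy_negative_probability(tokens):
    """Hypothetical black box: probability of negative sentiment."""
    cue = {"serious": 2.0, "very": 1.0, "situation": 0.5}  # made-up weights
    score = sum(cue.get(t, 0.0) for t in tokens) - 1.5
    return 1.0 / (1.0 + np.exp(-score))


def cosine_similarity(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def lime_explain(tokens, predict, num_samples=200, seed=0):
    """Fit a weighted linear surrogate around one instance."""
    rng = np.random.default_rng(seed)
    n = len(tokens)
    # Step 1: perturb by randomly dropping words (1 = keep, 0 = drop).
    masks = rng.integers(0, 2, size=(num_samples, n))
    masks[0] = 1  # keep the unperturbed instance as the first sample
    # Step 2: query the black box on every perturbed sample.
    y = np.array([predict([t for t, keep in zip(tokens, m) if keep])
                  for m in masks])
    # Step 3: weight each sample by its similarity to the original instance.
    original = np.ones(n)
    w = np.array([cosine_similarity(m, original) for m in masks])
    # Step 4: weighted least squares with an intercept column.
    X = np.hstack([np.ones((num_samples, 1)), masks])
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(X * sw, y * sw[:, 0], rcond=None)
    # Step 5: one coefficient per word (the intercept is dropped).
    return dict(zip(tokens, coef[1:]))
```

With the glossed tokens `["situation", "epidemiological", "current", "serious", "very"]`, the surrogate assigns its largest coefficient to "serious", followed by "very" and "situation", which mirrors the ordering of the coefficients listed above.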

On the other hand, in Algorithm 2, the SHAP technique generates an explanation for the prediction of an instance “اللقاح فعال وآمن” (The vaccine is effective and safe). After training the model, the SHAP values represent the contribution of each word to the model’s prediction as follows:

“فعال” (effective) → Positive SHAP value;
“آمن” (safe) → Positive SHAP value;
“اللقاح” (vaccine) → Neutral SHAP value.
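The SHAP example above can be reproduced on a toy scale with an exact Shapley computation. Note the assumptions: the sketch uses the standard Shapley weighting $|s|!\,(N - |s| - 1)!/N!$ over subsets, which is not identical to the unweighted average described in Algorithm 2; `toy_positive_score` is a made-up value function rather than the trained BiLSTM; tokens are English glosses and are assumed to be unique.

```python
from itertools import combinations
from math import factorial


def toy_positive_score(words):
    """Hypothetical value function: positive-sentiment score of a word set."""
    base = 0.5                                   # plays the role of the reference value
    weights = {"effective": 0.4, "safe": 0.3, "vaccine": 0.0}  # made-up
    score = base + sum(weights.get(w, 0.0) for w in words)
    if "effective" in words and "safe" in words:
        score += 0.1                             # small interaction term
    return score


def shapley_values(tokens, value):
    """Exact Shapley values by enumerating every subset of the other tokens."""
    n = len(tokens)
    phi = {t: 0.0 for t in tokens}
    for j in tokens:
        others = [t for t in tokens if t != j]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of j when added to subset s.
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi
```

For the three glossed words, "effective" and "safe" receive positive values and "vaccine" receives zero, and the values sum to the difference between the score of the full sentence and the reference score. Enumeration costs grow exponentially with sentence length, which is why sampling is suggested in Algorithm 2.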

4. Results and Discussion

This section discusses the details of the conducted experiments to investigate which features impact the proposed model’s ability to recognize user opinion about COVID-19 vaccines and evaluate its performance. Conducted experiments were carried out using Google Colab (Version: 1.0.0).

4.1. Datasets and Evaluation Metrics

The dataset of Arabic tweets regarding COVID-19 vaccines in the Arab region used by the proposed model is built from two different datasets, as presented in Table 3. The utilized datasets in this study are (1) Dataset-I, which consists of 1.1 million Arabic tweets about the COVID-19 vaccines [28], and (2) Dataset-II, known as the ArCovidVac dataset, which contains 10,002 Arabic tweets about COVID-19 vaccination, annotated using an Arabic Speech-Act and Sentiment Corpus of tweets (ArSAS) [29].

The performance of the proposed model is evaluated using the primary measurements, namely Accuracy, Recall (Sensitivity), Precision, and F-measure, as shown in Equations (1)–(4). These metrics are calculated using the terms TP (True Positive), TN (True Negative), FP (False Positive), and FN (False Negative).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$

$$\text{Recall (Sensitivity)} = \frac{TP}{TP + FN} \qquad (2)$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (3)$$

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (4)$$
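Equations (1)–(4) translate directly into code. The sketch below is a plain-Python illustration for a single class treated as positive; the function name `classification_metrics` and the guard that returns 0.0 when a denominator is zero are choices made for this example, not details from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Equations (1)-(4) from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
    }
```

As a usage example with made-up counts (not results from this study), `classification_metrics(tp=90, tn=85, fp=15, fn=10)` gives an accuracy of 0.875 and a recall of 0.9. For the three-class setting (positive, negative, neutral), such per-class values would typically be computed one class at a time and then averaged.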

4.2. Experimental Analysis

This section elaborates on the effectiveness and performance of the proposed approach. Moreover, a comparative analysis of XLNet and BERT sentiment analysis models considering Arabic tweets about COVID-19 vaccines is conducted, and the results are summarized in Table 4. Based on the presented experimental results, the XLNet model outperforms the BERT model with an accuracy and F-measure of 93.2% and 92% for COVID-19 vaccine opinion recognition, respectively.

Figure 3 shows the training and validation loss curves for 300 epochs, including mean loss values and standard deviations, which are calculated by cross-validation. It is indicated from the illustrated mean training loss curve that the proposed model is learning from the training data effectively without stagnating too early. In addition, the mean validation loss curve shows a consistent reduction that demonstrates good generalization throughout the validation dataset for effectively avoiding overfitting. Furthermore, the standard deviations of both training and validation loss curves offer insights into the model’s stability.

Figure 4 shows the training and validation accuracy curves for 300 epochs, including mean accuracy values and standard deviations, which are calculated through cross-validation. Figure 4 shows that both training and validation accuracies increased throughout the training process, which is a good indicator of model learning and its ability to make correct predictions. Moreover, the small difference between training and validation accuracies is a strong indicator of the model generalization and not overfitting. The consistent standard deviations of both the training and validation accuracy curves also demonstrate the consistency of the model’s performance and generalization.

As previously discussed, the black-box nature of ML models for predictive analytics can make it difficult to comprehend their internal workings and how they arrive at their outputs. For instance, in this paper, it is not readily understandable how the model assesses user opinions about COVID-19 vaccines from Arabic tweets, due to the model’s inherent complexity. To address this issue, two explanation models that approximate the behavior of the proposed model are developed to illustrate how it operates and generates predictions. This process, known as XAI, encompasses techniques such as LIME and SHAP. These techniques reveal the predicted probabilities of a particular sentiment based on the words in a specific tweet. By analyzing the contributions of individual words to the model’s predictions, these XAI techniques enhance understanding of the model’s behavior and build trust in its outputs.

In the following sections, the effectiveness and performance of the proposed model will be described through utilizing LIME and SHAP XAI techniques.

4.3. LIME-Based Model Analysis

To highlight the power of the proposed model and analyze its performance, the outputs generated from LIME are examined.

In Figures 5–7, LIME prediction probability graphs are depicted to highlight word importance and clarify the proposed model’s decision on a certain sample for COVID-19 vaccine opinion recognition. The three tweet instances illustrated in Figures 5–7 clarify how LIME improves the understanding of the proposed model’s functionality. LIME analyzes all words in a tweet and assigns probability scores to each word according to the expressed opinions (positive/negative/neutral), their context, and their syntactic meaning. The opinion with the highest probability score is considered the tweet’s final sentiment.

Figure 5 shows the prediction probabilities of the target positive-sentiment tweet: “ الوباء انتشار من الحد في تساهم كورونا ضد فيروس الاحترازية الإجراءات” (precautionary procedures against the coronavirus contribute to reduce the spread of the epidemic), and the contribution of each individual word to different sentiment classifications.

As shown, the sentence is classified as negative, neutral, and positive sentiment with a probability of 0.08, 0.02, and 0.9, respectively. It is noticed that positive sentiment is the dominant class with the highest probability, indicating that the tweet conveys a positive message. From column (Not Positive, Positive), LIME gives the words “احترازي” (precautionary), “تساهم” (contribute) and “الحد” (reduce/limit) high positive contributions with scores of 0.43, 0.25 and 0.15, respectively, indicating that they are key contributors representing a positive message. On the other hand, in column (Not Negative, Negative), the word “ضد”(against) has the highest negative contribution score of 0.11, as it reflects a negative connotation. In the column (Not Neutral, Neutral), the words “احترازي” (precautionary), “تساهم” (contribute), and “الحد” (reduce/limit) have significant contributions towards neutral sentiment, but these contributions are smaller than their contributions to positive sentiment. This reflects the effectiveness of the proposed model in capturing the overall positive sentiment of the tweet, despite the presence of words in the tweet that might be interpreted as neutral.

Figure 6 explains the prediction probabilities of the target negative-sentiment tweet: “أشك في اللقاحات الموجودة ضد كوفيد - ۱۹ لم تثبت فعاليتها بشكل كاف وتثير شكوك” (I doubt the existing vaccines against COVID-19, as their effectiveness has not been sufficiently proven and they raise doubts) and the contribution of each individual word to the different sentiment classifications. As shown, the sentence is classified as negative, neutral, and positive sentiment with probabilities of 0.7, 0.06, and 0.24, respectively. Negative sentiment is the dominant class, indicating that the tweet conveys a negative message. In the column (Not Negative, Negative), LIME gives the word “شك” (doubt) a high contribution to negative sentiment with a score of 0.56. This is reasonable, as doubt is a strong negative indicator in this context. The word “ضد” (against) also contributes substantially to negative sentiment, with a score of 0.26. Other words such as “شكوك” (doubts), “لقاح” (vaccine), and “كوفيد - ۱۹” (COVID-19) contribute to the negative sentiment with lower scores.

On the other hand, in the column (Not Positive, Positive), the word “تثير” (raise/cause) has a high contribution to positive sentiment with a score of 0.43. On its own, “تثير” can convey a neutral or even positive tone; combined with “شكوك” (doubts), however, it intensifies the negative sentiment. This suggests that the model interprets the general action of “تثير” without accounting for the negativity implied in this particular context. Meanwhile, the word “موجود” (exist) contributes a moderate score of 0.2 to positive sentiment, since acknowledging the existence of vaccines can be read as factual and therefore neutral or slightly positive. Moreover, in the column (Not Neutral, Neutral), “تثير” also has a noticeable contribution to neutral sentiment with a score of 0.07, which reinforces the observation that this word, taken individually, has a more neutral interpretation.

In brief, the model correctly identifies the sentence as predominantly negative, with the word “شك” (doubt) as the main driver of this negativity. The detailed analysis across the three columns reveals that some words, like “تثير” (raise/cause), contribute differently depending on the sentiment class being analyzed. The model has difficulty fully capturing the negative connotation of the phrase “تثير الشكوك” (raise doubts) when scoring the positive class.

Figure 7 explains the prediction probabilities for the target neutral-sentiment tweet: “التطعيم ضد كورونا متوفر في كل مراكز الصحة عشان تاخده لازم تسجل بياناتك على الموقع” (COVID-19 vaccination is available in all health centers; to get it, you must register your details on the website).

The figure also shows the contribution of each individual word in that tweet to the different sentiment classifications. The sentence is classified as negative, neutral, and positive sentiment with probabilities of 0.1, 0.6, and 0.3, respectively. Neutral sentiment is the dominant class, with a weaker though non-negligible positive component, indicating that the tweet provides factual and informative content.

The tweet offers details about the availability of COVID-19 vaccination and the registration steps required on the website to access it. LIME gives the words “توفر” (provides), “أخذ” (get) and “لزم” (must) scores of 0.21, 0.19 and 0.14, respectively, towards neutral sentiment, indicating that they are the key contributors to the factual tone. The word “توفر” (provides) has the highest contribution to positive sentiment, with a score of 0.4, as it reflects a sense of facilitation or accessibility, while the words “لزم” (must) and “ضد” (against) have minimal contribution scores towards negative sentiment, which slightly shift the tone towards constraint or opposition. This reflects that the tweet conveys information with minimal emotion, focusing on facts such as the registration requirements and the availability of COVID-19 vaccination in health centers. The LIME visualization illustrates why the model classified the tweet as neutral, with the word “توفر” leaning slightly towards positivity without dominating the overall sentiment.
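The decision rule applied in the three LIME examples above (the class with the highest probability wins) can be written down directly. The sketch below uses the class probabilities reported for Figures 5 to 7; the `final_sentiment` helper and the "margin over the runner-up" it reports are illustrative additions, not part of the original analysis.

```python
# Class probabilities reported for the three LIME examples (Figures 5-7).
lime_examples = {
    "Figure 5": {"negative": 0.08, "neutral": 0.02, "positive": 0.90},
    "Figure 6": {"negative": 0.70, "neutral": 0.06, "positive": 0.24},
    "Figure 7": {"negative": 0.10, "neutral": 0.60, "positive": 0.30},
}


def final_sentiment(probabilities):
    """Return the most probable class and its margin over the runner-up."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    (top_label, top_p), (_, second_p) = ranked[0], ranked[1]
    return top_label, round(top_p - second_p, 2)


decisions = {name: final_sentiment(p) for name, p in lime_examples.items()}
for name, (label, margin) in decisions.items():
    print(f"{name}: {label} (margin over runner-up: {margin})")
```

The margin makes the qualitative remarks above concrete: the positive example is decided by a wide margin, whereas the neutral example leads the positive class by a much smaller one.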

4.4. SHAP-Based Model Analysis

The SHAP XAI technique is also used to highlight word importance and clarify the proposed model’s decision on individual samples for COVID-19 vaccine opinion recognition. Figure 8, Figure 9 and Figure 10 present SHAP values for the same three instances explained in the previous subsection on LIME-based model analysis. These figures show SHAP force plots, which visualize the contributions of individual words in a tweet to the model’s prediction. In these plots, the $\mathit{base\ value}$ (the average output across the dataset) represents the expected model output when no input features are provided. The final model output for the given input instance is denoted by $f(\mathit{inputs})$. Each word in the tweet is represented by an arrow and a specific color. The length and direction of the arrow reflect the contribution of each word to the final prediction. Red arrows represent words that contribute positively to the model prediction, and blue arrows represent words that contribute negatively. The contribution of each word to the final prediction is accumulated along the horizontal scale.
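The accumulation along the horizontal scale reflects the additivity property of Shapley values: the base value plus the sum of all word contributions equals the model output for the instance. The sketch below illustrates this by computing exact Shapley values through enumeration of all word orderings. It is a toy under stated assumptions: `toy_model` and its weights are invented, the baseline is taken as the output with no words present (whereas SHAP uses an expectation over background data), and practical SHAP explainers approximate these values rather than enumerating orderings.

```python
import itertools

# Hypothetical toy scorer (NOT the paper's model): additive word weights plus
# one interaction, so that attribution is not trivially the weight itself.
WORD_WEIGHTS = {"doubt": -0.5, "against": -0.2, "vaccine": 0.05, "raise": 0.1}


def toy_model(present_words):
    score = 0.1 + sum(WORD_WEIGHTS[w] for w in present_words)
    if "raise" in present_words and "doubt" in present_words:
        score -= 0.1  # assumed: "raise doubts" is more negative than its parts
    return score


def exact_shapley(words, model):
    """Average each word's marginal contribution over all word orderings."""
    phi = {w: 0.0 for w in words}
    orderings = list(itertools.permutations(words))
    for order in orderings:
        present = set()
        for w in order:
            before = model(present)
            present.add(w)
            phi[w] += model(present) - before
    return {w: total / len(orderings) for w, total in phi.items()}


words = ["doubt", "against", "vaccine", "raise"]
base_value = toy_model(set())      # output with no words present
phi = exact_shapley(words, toy_model)
f_inputs = toy_model(set(words))   # output for the full tweet

for w, v in sorted(phi.items(), key=lambda kv: kv[1]):
    print(f"{w:>8s} {v:+.3f}")
print("base value + sum(phi) =", round(base_value + sum(phi.values()), 6))
print("f(inputs)             =", round(f_inputs, 6))
```

In this toy, the interaction penalty is split evenly between "raise" and "doubt", which is how Shapley values distribute joint effects among the words that cause them.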

Figure 8 shows the prediction probabilities and word importance of the correctly classified positive opinion instance. The words “احترازى” (precautionary), “تساهم” (contributes), “في” (in), “فيروس” (virus), and “الحد” (limit), in the zone of red arrows, positively influence the prediction. These words increase the likelihood of the predicted class, yielding a confident classification with a probability of 92%.

From Figure 8, it can be recognized that the proposed model correctly predicts the opinion of the given instance as a positive opinion.

Similarly, Figure 9 illustrates the prediction probabilities and word importance of the correctly classified negative opinion instance. The model output is negative (−0.65), indicating that the prediction is pushed towards the negative class; the words “شك” (doubt), “ضد” (against), “لم” (not), and “شكوك” (doubts) drive the model’s correct classification of this instance as a negative opinion.

Finally, Figure 10 shows the word contributions for a model prediction that results in a moderately neutral score of 0.1992, which is slightly less than the $\mathit{base\ value}$.

The figure illustrates that words like “لازم” (must), “ضد” (against), and “كورونا” (corona) contribute negatively, while words like “متوفر” (available), “تسجيل” (register), and “التطعيم” (vaccination) contribute positively to the model prediction. This blend of positive and negative contributions explains why the proposed model classified the given instance as a neutral opinion.

Figure 11 shows a Beeswarm plot of SHAP calculations for the fourteen highest-ranking words. The words are listed in descending order of their Mean Absolute SHAP Value (MAV), with the most influential words at the top. Each dot corresponds to one tweet in the dataset.

The Beeswarm plot illustrates how the different words in each tweet affect the prediction of the proposed XLNet model regarding COVID-19 vaccine intake opinions. A positive SHAP value indicates that a word shifts the model prediction above its expected value, while a negative SHAP value shifts it below. The plot in Figure 11 is based on an analysis of all words in the dataset and showcases the most informative words, such as “كورونا” (corona), “حماية” (protection), “ضد” (against), “لازم” (must), “الأزمة” (crisis), “يتلقى” (receive), and “لقاح” (vaccine).
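The ordering used in a Beeswarm plot can be reproduced with a few lines: average the absolute SHAP values of each word over the tweets in which it occurs, then sort in descending order. The sketch below assumes hypothetical per-tweet SHAP values; the numbers are invented for illustration and are not taken from Figure 11.

```python
# Hypothetical per-tweet SHAP values for a few words (invented numbers,
# one entry per tweet in which the word occurs).
shap_values = {
    "corona": [0.30, -0.25, 0.20, -0.35],
    "protection": [0.28, 0.22, 0.18],
    "against": [-0.20, -0.15, 0.05],
    "must": [-0.05, 0.04],
}


def mean_absolute_value(values):
    """Mean Absolute SHAP Value (MAV) of one word."""
    return sum(abs(v) for v in values) / len(values)


# Most influential words first, as on the vertical axis of a Beeswarm plot.
ranking = sorted(shap_values, key=lambda w: mean_absolute_value(shap_values[w]), reverse=True)
for word in ranking:
    print(f"{word:>10s}  MAV = {mean_absolute_value(shap_values[word]):.3f}")
```

Taking absolute values before averaging matters: a word such as "corona" in this toy pushes some tweets up and others down, so its signed mean is small even though it is the most influential word overall.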

Figure 12 shows Arabic word clouds for the most frequent hashtags in the used dataset, along with their English translations.

This visualization illustrates the most frequent words that are highly related to COVID-19 vaccination. The word clouds in Figure 12 reveal several insights for positive, negative, and neutral opinions. For instance, in Figure 12a, some of the most frequent words in the “COVID-19 vaccines positive opinion” class are “تلقيت” (received), “أشجع الجميع” (encourage everyone), “اللقاح” (vaccine), “بأمان” (safely), “شكرا” (thanks), and “فخور” (proud), which reflect a supportive opinion towards vaccination. In contrast, the most frequent words in the “COVID-19 vaccines negative opinion” class, depicted in Figure 12b, such as “بصداع” (headache), “أشعر بالقلق” (feeling worried), “اللقاح” (vaccine), “خطيرة” (serious), “بألم شديد” (severe pain), and “تعب” (tired), highlight hesitancy and adverse reactions associated with vaccination. Finally, for the neutral opinion class, Figure 12c shows negative words such as “ألم” (pain) and “مشاكل” (problems) alongside positive terms like “مريحة” (comfortable), “بأمان” (safely), and “إيجابية” (positive), reflecting a mixed viewpoint. This co-occurrence of positive and negative associations reflects the ambiguity of neutral opinions on COVID-19 vaccination.
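A word cloud sizes each word by its frequency, so the underlying computation is a per-class word count. The following sketch shows that step on a hypothetical mini-corpus of tokenized English glosses; the tweets are invented and the `top_words` helper is an illustrative name, not part of the original pipeline.

```python
from collections import Counter

# Hypothetical mini-corpus (invented, already tokenized English glosses).
tweets_by_class = {
    "positive": [["received", "vaccine", "thanks"],
                 ["proud", "received", "vaccine", "safely"]],
    "negative": [["vaccine", "headache", "worried"],
                 ["severe", "pain", "vaccine", "headache"]],
}


def top_words(token_lists, k=3):
    """Count tokens across all tweets of one class and return the k most frequent."""
    counts = Counter(token for tokens in token_lists for token in tokens)
    return counts.most_common(k)


frequencies = {label: top_words(docs) for label, docs in tweets_by_class.items()}
for label, pairs in frequencies.items():
    print(label, pairs)
```

As in Figure 12, a topic word such as "vaccine" is frequent in every class, so it is the class-specific words around it that carry the sentiment signal.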

4.5. Comparative Analysis Discussion

As previously presented, the proposed approach outperforms most of the surveyed related studies in this paper. However, because those studies use different Arabic COVID-19 vaccination datasets, direct comparison is challenging and potentially unfair. Given the limited literature that uses the ArCovidVac dataset for developing sentiment analysis models, the comparative study in this paper considers the approach presented in [15]. In that approach, the authors proposed a conventional sentiment analysis approach based on a hybrid CNN-BiLSTM model for recognizing the polarity of Arabic tweets on COVID-19 vaccination, deploying various word embedding techniques. That approach achieved an F-measure of 80.5% on the ArCovidVac dataset and did not employ any XAI techniques for interpreting the achieved performance. In contrast, the approach proposed in this study combines XLNet embeddings with a BiLSTM model and achieves superior results, with 93.2% accuracy and a 92% F-measure. Additionally, the integration of LIME and SHAP XAI techniques enhances interpretability and provides detailed sentiment insights, with a focus on the model’s effectiveness for Arabic as a low-resource language.

4.6. Limitations

Despite the strong performance achieved by the proposed approach on COVID-19 vaccine sentiment in Arabic tweets, several potential limitations need to be considered. Analyzing polarity in Arabic text presents linguistic challenges due to the diverse dialects and the limited availability of sentiment analysis resources for the Arabic language. In addition, the narrow scope of COVID-19 vaccine sentiment in Arabic tweets might limit the model’s effectiveness on, and generalizability to, other topics, domains, or languages. Moreover, the performance of the model might be influenced by biases in the text that stem from sentiment class imbalance. Furthermore, although XLNet embeddings capture rich contextual information that significantly enhances sentiment analysis performance, the model’s reliance on them limits its adaptability to other embedding techniques, which may affect its versatility and performance across diverse datasets. In the same vein, while LIME and SHAP enhance interpretability, they can be computationally intensive, which affects scalability and efficiency in real-time applications.

5. Conclusions and Future Work

This paper presents an interpretable XAI approach for aspect-based sentiment analysis of Arabic tweets on the COVID-19 vaccine. The proposed model utilizes XLNet features in conjunction with LIME and SHAP XAI techniques to explore the potential for accurate and interpretable sentiment classification. LIME and SHAP were observed to enhance model interpretability by identifying the key aspect features that influence classification decisions. Experimental results demonstrate that the XLNet model surpasses the BERT model, achieving an accuracy of 93.2% and an F-measure of 92% in COVID-19 vaccine opinion recognition. The proposed model has also proven effective in identifying subtle sentiment aspects. Moreover, the integration of SHAP and LIME within the XAI framework has improved the understanding of, and confidence in, the predictions of the XLNet model, thereby enhancing its practical applicability. Although the proposed approach outperformed the compared models, several challenges were encountered in the sentiment analysis of Arabic text, such as diverse dialects and limited resources. In addition, reliance on XLNet embeddings limits adaptability to other methods. Moreover, although LIME and SHAP improve interpretability, they may face challenges with highly complex models or large datasets, which can reduce explanation reliability.

In future work, the proposed approach will be advanced by exploring the effects of various feature extraction techniques on the accuracy of sentiment analysis. Also, other XAI techniques, like DeepLIFT or DeepSHAP, could be investigated to provide more accurate and consistent interpretations. Additionally, this approach can be further enhanced by applying it to other low-resource languages to assess its effectiveness and address the previously mentioned potential limitations.

Author Contributions

Conceptualization, A.H.S. and N.E.-B.; methodology, A.H.S., E.E. and N.E.-B.; software, A.H.S. and E.E.; validation, N.E.-B. and A.M.I.; investigation, A.H.S., E.E. and N.E.-B.; data curation, A.H.S.; writing—original draft preparation, A.H.S. and E.E.; writing—review and editing, N.E.-B., S.A.T. and A.M.I.; visualization, A.H.S., E.E. and N.E.-B.; supervision, N.E.-B. and S.A.T.; project administration, N.E.-B. All authors have read and agreed to the published version of the manuscript.

Funding

The authors extend their appreciation to the Deanship of Research and Graduate Studies at King Khalid University, KSA, for funding this work through General Research Project under grant number GRP/50/45.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

For experimental purposes, we generated the dataset from two publicly available datasets: ArCovidVac and an Arabic Twitter dataset of 1.1 million Arabic posts regarding the COVID-19 vaccine, published on Mendeley.

Acknowledgments

The authors extend their appreciation to the Deanship of Research and Graduate Studies at King Khalid University, KSA, for funding this work through General Research Project under grant number GRP/50/45.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Aljedaani, W.; Abuhaimed, I.; Rustam, F.; Mkaouer, M.W.; Ouni, A.; Jenhani, I. Automatically detecting and understanding the perception of COVID-19 vaccination: A middle east case study. Soc. Netw. Anal. Min. 2022, 12, 128.
2. Alhumoud, S.; Al Wazrah, A.; Alhussain, L.; Alrushud, L.; Aldosari, A.; Altammami, R.N.; Almukirsh, N.; Alharbi, H.; Alshahrani, W. ASAVACT: Arabic sentiment analysis for vaccine-related COVID-19 tweets using deep learning. PeerJ Comput. Sci. 2023, 9, e1507.
3. Zeid, N.; Tang, L.; Amith, M.T. The spread of COVID-19 vaccine information in Arabic on YouTube: A network exposure study. Digit. Health 2023, 9, 20552076231205714.
4. Wankhade, M.; Annavarapu, C.S.R.; Abraham, A. CBMAFM: CNN-BiLSTM Multi-Attention Fusion Mechanism for Sentiment Classification. Multimed. Tools Appl. 2024, 83, 51755–51786.
5. Wang, Y.; Cheng, X.; Meng, X. Sentiment Analysis with an Integrated Model of BERT and Bi-LSTM Based on Multi-Head Attention Mechanism. Int. J. Comput. Sci. 2023, 50, 255–262.
6. Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.G.; Salakhutdinov, R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In Proceedings of the Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019.
7. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’16), San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
8. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626.
9. Shrikumar, A.; Greenside, P.; Shcherbina, A.; Kundaje, A. Not Just a Black Box: Learning Important Features Through Propagating Activation Differences. arXiv 2017, arXiv:1605.01713.
10. Aljameel, S.S.; Alabbad, D.A.; Alzahrani, N.A.; Alqarni, S.M.; Alamoudi, F.A.; Babili, L.M.; Aljaafary, S.K.; Alshamrani, F.M. A Sentiment Analysis Approach to Predict an Individual’s Awareness of the Precautionary Procedures to Prevent COVID-19 Outbreaks in Saudi Arabia. Int. J. Environ. Res. Public Health 2021, 18, 218.
11. Mubarak, H.; Hassan, S.; Chowdhury, S.A.; Alam, F. ArCovidVac: Analyzing Arabic Tweets About COVID-19 Vaccination. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, Marseille, France, 20–25 June 2022; Calzolari, N., Béchet, F., Blache, P., Choukri, K., Cieri, C., Declerck, T., Goggi, S., Isahara, H., Maegaard, B., Mariani, J., et al., Eds.; European Language Resources Association: Paris, France, 2022; pp. 3220–3230.
12. Alhazmi, H. Arabic Twitter conversation dataset about the COVID-19 vaccine. Data 2022, 7, 152.
13. Müller, M.; Salathé, M.; Kummervold, P. COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter. Front. Artif. Intell. 2023, 6, 1023281.
14. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
15. Abdelhady, N.; Hassan, A.; Soliman, T.; Farghally, M.F. Stacked-CNN-BiLSTM-COVID: An effective stacked ensemble deep learning framework for sentiment analysis of Arabic COVID-19 tweets. J. Cloud Comput. 2024, 13, 85.
16. Alhumayani, M.K.; Alhazmi, H.N. Detecting reported side effects of COVID-19 vaccines from Arabic twitter (X) data. IEEE Access 2024, 12, 55367–55388.
17. Kumar, A.; Roy, P.K.; Singh, J.P. Bidirectional Encoder Representations from Transformers for the COVID-19 vaccine stance classification. In Proceedings of the Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation, Gandhinagar, India, 13–17 December 2021; Mehta, P., Mandl, T., Majumder, P., Mitra, M., Eds.; CEUR Workshop Proceedings: Aachen, Germany, 2021; Volume 3159, pp. 1216–1220.
18. Basu, M.; Poddar, S.; Ghosh, S.; Ghosh, K. Overview of the FIRE 2021 track: Information Retrieval from Microblogs during Disasters (IRMiDis). In Proceedings of the FIRE’21: Proceedings of the 13th Annual Meeting of the Forum for Information Retrieval Evaluation, Virtual, 13–17 December 2021.
19. Kumar, A.; Singh, J.P.; Singh, A.K. Explainable BERT-LSTM Stacking for Sentiment Analysis of COVID-19 Vaccination. IEEE Trans. Comput. Soc. Syst. 2023, 1–11.
20. Prakash, J.; Vijay, A.A. A multi-aspect framework for explainable sentiment analysis. Pattern Recognit. Lett. 2024, 178, 122–129.
21. Sweidan, A.H.; El-Bendary, N.; Elhariri, E. Autoregressive Feature Extraction with Topic Modeling for Aspect-based Sentiment Analysis of Arabic as a Low-resource Language. In ACM Transactions on Asian and Low-Resource Language Information Processing; Association for Computing Machinery: New York, NY, USA, 2024; Volume 23.
22. Hamed, I.; Vu, N.T.; Abdennadher, S. Arzen: A speech corpus for code-switched egyptian arabic-english. In Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France, 11–16 May 2020; pp. 4237–4246.
23. Bouamor, H.; Habash, N.; Oflazer, K. A Multidialectal Parallel Corpus of Arabic. In Proceedings of the LREC, 2014, Reykjavik, Iceland, 26–31 May 2014; pp. 1240–1245.
24. Taghva, K.; Elkhoury, R.; Coombs, J. Arabic stemming without a root dictionary. In Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC’05)-Volume II, Las Vegas, NV, USA, 4–6 April 2005; Volume 1, pp. 152–157.
25. Ali, S.; Abuhmed, T.; El-Sappagh, S.; Muhammad, K.; Alonso-Moral, J.M.; Confalonieri, R.; Guidotti, R.; Del Ser, J.; Díaz-Rodríguez, N.; Herrera, F. Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence. Inf. Fusion 2023, 99, 101805.
26. Peters, U. Explainable AI lacks regulative reasons: Why AI and human decision-making are not equally opaque. AI Ethics 2023, 3, 963–974.
27. Gaspar, D.; Silva, P.; Silva, C. Explainable AI for Intrusion Detection Systems: LIME and SHAP Applicability on Multi-Layer Perceptron. IEEE Access 2024, 12, 30164–30175.
28. Alhazmi, H. Arabic Twitter Conversation Dataset about COVID-19 Vaccine. 2022. Available online: https://data.mendeley.com/datasets/zmwfnsms9n/1 (accessed on 20 July 2024).
29. Mubarak, H.; Hassan, S.; Chowdhury, S.A.; Alam, F. ArCovidVac: Arabic COVID-19 Vaccine Sentiment Dataset. 2022. Available online: https://huggingface.co/datasets/arbml/ArCovidVac (accessed on 20 July 2024).

Figure 1. Schematic diagram of the proposed XAI-based sentiment analysis approach.

Figure 2. Preprocessing workflow.

Figure 3. Training and validation loss with cross-validation.

Figure 4. Training and validation accuracy with cross-validation.

Figure 5. LIME interpretation for predicting a positive opinion based on the proposed model.

Figure 6. LIME interpretation for predicting a negative opinion based on the proposed model.

Figure 7. LIME interpretation for predicting a neutral opinion based on the proposed model.

Figure 8. SHAP force plot for predicting a positive opinion based on the proposed model.

Figure 9. SHAP force plot for predicting a negative opinion based on the proposed model.

Figure 10. SHAP force plot for predicting a neutral opinion based on the proposed model.

Figure 11. Beeswarm plot of SHAP calculations for the fourteen highest-ranking words.

Figure 12. Word cloud of COVID-19 vaccine Arabic tweets with their English translations.

Table 1. Summary of surveyed state-of-the-art studies for literature review.

Study	Task	Methods	Dataset	Performance
[10] (2021)	Prediction of individual COVID-19 precautionary awareness	SVM, NB, KNN	collected dataset of Arabic COVID-19-related tweets	Accuracy = 85% (SVM with bigrams TF-IDF)
[11] (2022)	Stance detection in Arabic tweets	Transformer architectures	ArCovidVac (10K Arabic tweets)	F1-score = 80%
[12] (2022)	Analysis of Arabic tweets related to vaccine	Text analysis	1M+ Arabic COVID-19 tweets	–
[2] (2023)	Sentiment polarity analysis of Arabic tweets	SGRU, SBi-GRU, ensemble, AraBERT	ASAVACT (32,476 Arabic tweets)	Accuracy = 81.67%
[13] (2023)	COVID-19 vaccination sentiment classification	COVID-Twitter-BERT (CT-BERT) transformer-based model	Training: collected corpus of 160 M COVID-19 tweets Testing: Twitter-related data.	F1-score: 0.833
[15] (2024)	Tweet classification on COVID-19 vaccines	hybrid CNN-BiLSTM	SenWave, AraCOVID19-SSD and ArCovidVac	F-measures of 76.76%, 87%, and 80.5% on the datasets, respectively.
[16] (2024)	Detection of vaccine side effects	BTM, SVM	65,387 Arabic tweets	Accuracy = 90.9%, F1-score = 0.876
[17] (2021)	COVID-19 vaccination stance classification	ensemble-based CT-BERT model	validated by the FIRE-2021 shared task [18]	validation F1-score = 0.86 testing F1-score = 0.532
[19] (2023)	XAI based sentiment analysis of COVID-19 vaccination tweets	CT-BERT-LSTM	Tested by the FIRE-2021 shared task [18]	F1-score = 0.88
[20] (2024)	XAI based Multi-aspect sentiment analysis	Hierarchical neural network	Benchmark datasets of reviews, tweets	Accuracy = 93.17%, F1-score = 0.9318

Table 2. Illustrative examples of tweet preprocessing with sentiment labeling.

Raw Tweet	Pre-Processed Tweet	Sentiment Words	Word Polarity Score	Total Score	Sentiment
أخبار رائعة لقاح كورونا متاح الآن للجميع فوق سن ۱۸. سجل للحصول على موعدك اليوم وكن جزءًا من الحل! #اللقاح_فعال #كوفيد - ۱۹ #الصحة_العامة	خبر، رائع، لقاح، كورونا، متاح، الان، جميع، فوق، سن، سجل، حصل، موعد، يوم، كن، جزء، حل	رائعة; متاح; سجل	0.6; 0.2; 0.1	0.9	Positive
Great news! The COVID-19 vaccine is now available to everyone over the age of 18. Register for your appointment today and be part of the solution! #VaccineWorks #COVID-19 #PublicHealth	news, great, vaccine, Corona, available, now, all, above, age, register, get, appointment, day, be, part, solution	great; available; register			
التطعيم ضد كورونا متوفر دلوقتي في كل مراكز الصحة. عشان تاخده لازم تسجل بياناتك على الموقع	تطعيم، ضد، كورونا، توفر، مركز، صح، أخذ، لزم، سجل، بيان، موقع	ضد; توفر; أخذ; لزم	−0.60; 0.4; 0.15; 0.05	0	Neutral
Corona vaccination is now available in health centers. To get it, you must register your data on the website	vaccination, against, Corona, availability, center, correct, take, necessary, record, statement, location	against; provide; get; must			
اشك في اللقاحات الموجوده ضد كوفيد - ۱۹ لم تثبت فعاليتها بشكل كافٍ وتثير شكوك	شك، لقاح، موجود، ضد، كوفيد - ۱۹، ثبت، فعالية، كاف، ثار، شك	اشك; تثير; شكوك	−0.54; 0.27; −0.63	−0.9	Negative
I doubt the existing vaccines against COVID-19; their effectiveness has not been sufficiently proven and they raise doubts.	doubt, vaccine, exist, against, COVID-19, proved, effectiveness, enough, raise, doubt	doubt; raise; doubts			
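The total scores in Table 2 are the sums of the word polarity scores of each tweet. The sketch below reproduces that arithmetic from the values in the table. The labeling rule in `label_from_total`, including its tolerance around zero, is an assumption made for illustration; the table itself only shows the resulting labels.

```python
# Word polarity scores from Table 2 (English glosses of the sentiment words).
examples = {
    "tweet_1": {"great": 0.6, "available": 0.2, "register": 0.1},
    "tweet_2": {"against": -0.60, "provide": 0.4, "get": 0.15, "must": 0.05},
    "tweet_3": {"doubt": -0.54, "raise": 0.27, "doubts": -0.63},
}


def label_from_total(total, tolerance=0.05):
    # Assumed decision rule: totals within +/- tolerance of zero count as Neutral.
    if total > tolerance:
        return "Positive"
    if total < -tolerance:
        return "Negative"
    return "Neutral"


results = {}
for name, scores in examples.items():
    total = round(sum(scores.values()), 2)  # rounding removes float noise
    results[name] = (total, label_from_total(total))
    print(name, total, results[name][1])
```

The second tweet shows why a summed score alone can hide structure: one strongly negative word and three mildly positive ones cancel to a neutral total.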

Table 3. Dataset description.

Dataset	Number of Tweets	Class Distribution
Dataset-I (An Arabic Twitter dataset) [28]	1.1 M Arabic tweets	Positive: 400,000 Neutral: 300,000 Negative: 400,000
Dataset-II (ArCovidVac dataset) [29]	10,002 Arabic tweets	Positive: 4000 Neutral: 3000 Negative: 3002
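The class counts in Table 3 can be turned into class shares and a simple imbalance ratio, which gives a quick sense of how evenly the three sentiment classes are represented. The sketch below uses the counts from the table; the helper names and the choice of the largest-to-smallest class ratio as an imbalance measure are illustrative assumptions.

```python
# Class counts from Table 3.
datasets = {
    "Dataset-I": {"Positive": 400_000, "Neutral": 300_000, "Negative": 400_000},
    "Dataset-II": {"Positive": 4000, "Neutral": 3000, "Negative": 3002},
}


def class_shares(counts):
    """Percentage of tweets in each class, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def imbalance_ratio(counts):
    """Size of the largest class divided by the size of the smallest class."""
    return round(max(counts.values()) / min(counts.values()), 2)


shares = {name: class_shares(counts) for name, counts in datasets.items()}
ratios = {name: imbalance_ratio(counts) for name, counts in datasets.items()}
for name in datasets:
    print(name, shares[name], "imbalance ratio:", ratios[name])
```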

Table 4. Performance comparison of XLNet and BERT models.

Metric	XLNet	BERT
Accuracy	93.2%	85.0%
Recall (Sensitivity)	92.3%	85.2%
Precision	92.0%	82.1%
F-measure	92.0%	81.0%
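For reference, the F-measure of a single precision and recall pair is their harmonic mean. The sketch below applies that formula to the precision and recall values in Table 4 and prints the result next to the reported F-measure. How the reported values were averaged across the three classes is not stated here and is treated as unknown; an F-measure averaged per class need not equal the harmonic mean of averaged precision and recall, so the two columns are not expected to match exactly.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values from Table 4 (percent).
reported = {
    "XLNet": {"precision": 92.0, "recall": 92.3, "f_measure": 92.0},
    "BERT": {"precision": 82.1, "recall": 85.2, "f_measure": 81.0},
}

harmonic = {}
for model, m in reported.items():
    harmonic[model] = f_measure(m["precision"], m["recall"])
    print(f"{model}: harmonic mean = {harmonic[model]:.2f}, "
          f"reported F-measure = {m['f_measure']:.1f}")
```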

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Sweidan, A.H.; El-Bendary, N.; Taie, S.A.; Idrees, A.M.; Elhariri, E. Explainable Deep Learning for COVID-19 Vaccine Sentiment in Arabic Tweets Using Multi-Self-Attention BiLSTM with XLNet. Big Data Cogn. Comput. 2025, 9, 37. https://doi.org/10.3390/bdcc9020037

