Leveraging Tweets for Artificial Intelligence Driven Sentiment Analysis on the COVID-19 Pandemic

Alkhaldi, Nora A.; Asiri, Yousef; Mashraqi, Aisha M.; Halawani, Hanan T.; Abdel-Khalek, Sayed; Mansour, Romany F.

doi:10.3390/healthcare10050910

by Nora A. Alkhaldi ¹, Yousef Asiri ², Aisha M. Mashraqi ², Hanan T. Halawani ²,*, Sayed Abdel-Khalek ³ and Romany F. Mansour ⁴

¹ Department of Computer Science, College of Computer Sciences and Information Technology, King Faisal University, Al-Ahsa 31982, Saudi Arabia
² Department of Computer Science, College of Computer Science and Information Systems, Najran University, Najran 61441, Saudi Arabia
³ Department of Mathematics, College of Science, Taif University, Taif 21944, Saudi Arabia
⁴ Department of Mathematics, Faculty of Science, New Valley University, El-Kharga 72511, Egypt
* Author to whom correspondence should be addressed.

Healthcare 2022, 10(5), 910; https://doi.org/10.3390/healthcare10050910

Submission received: 4 April 2022 / Revised: 9 May 2022 / Accepted: 10 May 2022 / Published: 13 May 2022

(This article belongs to the Special Issue Clinical Decision-Making Processes in COVID-19 Pandemic: Changes and Effects)

Abstract

The COVID-19 pandemic has been a disastrous event that has aggravated several psychological issues, such as depression, owing to abrupt social changes and loss of employment. At the same time, social scientists and psychologists have become increasingly interested in understanding how people express emotions and sentiments during pandemics. As COVID-19 cases rose and lockdowns became stricter, people expressed their sentiments on social media, which offers a deeper understanding of human psychology during catastrophic events. By exploiting user-generated content on social media platforms such as Twitter, people’s thoughts and sentiments can be examined, which aids in introducing health intervention policies and awareness campaigns. Recent natural language processing (NLP) and deep learning (DL) models have shown noteworthy performance in sentiment analysis. With this in mind, this paper presents a new sunflower optimization with deep-learning-driven sentiment analysis and classification (SFODLD-SAC) model for COVID-19 tweets. The presented SFODLD-SAC model focuses on the identification of people’s sentiments during the COVID-19 pandemic. To accomplish this, the SFODLD-SAC model initially preprocesses the tweets through stemming and the removal of stopwords, usernames, links, punctuation, and numerals. In addition, the TF-IDF model is applied to extract useful features from the preprocessed data. Moreover, the cascaded recurrent neural network (CRNN) model is employed to analyze and classify sentiments. Finally, the SFO algorithm is utilized to optimally adjust the hyperparameters involved in the CRNN model. The novelty of this study lies in the design of the SFODLD-SAC technique with an SFO-based hyperparameter optimizer for analyzing people’s sentiments on COVID-19. The simulation analysis of the SFODLD-SAC model is performed using a benchmark dataset from the Kaggle repository. Extensive comparative results report the promising performance of the SFODLD-SAC model over recent state-of-the-art models, with a maximum accuracy of 99.65%.

Keywords:

COVID-19; sentiment analysis; Twitter; mental illness; deep learning; natural language processing

1. Introduction

COVID-19 is a communicable disease that spreads mainly through tiny droplets released when an infected individual sneezes, coughs, or talks. It has also become a source of anxiety, depression, and stress, owing to the false information posted on social media. The mental well-being of people is severely affected by the fast spread of incorrect information on social media [1,2]. Because of lockdowns and social distancing, people are mainly dependent on, or even addicted to, the internet and mobile phones, and reports indicate that the highest number of activities are performed on social media [1]. During lockdowns, traffic on social media increased sharply [3]. Among social media platforms, Twitter ranks first in spreading COVID-19 news [4,5]. The damaging aspect of such news is its subjectivity: it mostly involves personal thoughts and confusion, which leads to intentionally fake information, negativity, and uncertainty in the community [6]. Meanwhile, this situation has attracted the interest of researchers seeking quantitative analyses that give a complete picture. This study mainly aims at sentiment analysis of Twitter datasets with regard to COVID-19 through a supervised machine learning algorithm.

During lockdowns, individuals, particularly teenagers, usually spend more time on Twitter, and users are more active than at any other time. The reason behind this is the wish to receive up-to-date news regarding COVID-19. Meanwhile, they share their thoughts and feelings with friends and society through this medium. Therefore, in this pandemic situation, the analysis of Twitter data has received attention from the research community. Sentiment analysis (SA) is a technical study that deals with the opinions, attitudes, and emotions of people [7]. It is considered an efficient way to assess people’s opinions on specific topics. Additionally, SA can reveal several impacts on the community in various ways, and it summarizes the different anxieties and mental health conditions that arise during the pandemic. The depression status and panic disorder of individuals in a community can be quickly identified from the SA outcome [8]. The only solution to bring positivity to society is to apply various virtual depression optimizers for the depressed person. It should be mentioned that the success of most applications is based on the sentiments of social media users. SA of active users is considered one of the efficient ways of tracking public opinion. In this pandemic situation, these kinds of studies have made important contributions to helping policymakers and governments.

Based on this background, this paper presents a new sunflower optimization with deep-learning-driven sentiment analysis and classification (SFODLD-SAC) model for COVID-19 tweets. The presented SFODLD-SAC model initially preprocesses the tweets through stemming and the removal of stopwords, usernames, links, punctuation, and numerals. In addition, the TF-IDF model is applied to extract useful features from the preprocessed data. Moreover, a cascaded recurrent neural network (CRNN) model is employed to analyze and classify sentiments. Finally, the SFO algorithm is utilized to optimally adjust the hyperparameters involved in the CRNN model. The simulation analysis of the SFODLD-SAC model is performed using a benchmark dataset from the Kaggle repository. In short, the paper’s contributions are as follows:

An intelligent SFODLD-SAC model is presented consisting of TF-IDF-based feature extraction, CRNN classification, and SFO-based hyperparameter optimization for COVID-19 tweet analysis. To the best of our knowledge, the SFODLD-SAC model has never been presented in the literature;
The SFODLD-SAC technique involves the design of an SFO algorithm to optimally choose the hyperparameters, which helps in increasing the classification accuracy and avoids computational overhead;
The performance of the SFODLD-SAC model is validated using a benchmark dataset from the Kaggle repository, and the results are investigated under distinct sizes of training/testing data.

The rest of this paper is organized as follows: Section 2 offers related research, and Section 3 discusses the proposed model. Then, Section 4 elaborates on the experimental validation with the benchmark Kaggle dataset, and Section 5 draws the conclusions of the paper.

2. Literature Review

This section offers a detailed review of existing SA models related to COVID-19. Researchers in [9] analyzed Indian people’s sentiment during the lockdown, using popular hashtags to measure negativity and positivity. Samuel et al. [10] highlighted public sentiments related to the COVID-19 pandemic using two machine learning (ML) classification techniques. The researchers in [11] presented an architecture in which a deep-learning-based language model built on a long short-term memory (LSTM) recurrent neural network was applied for sentiment analysis during the increase in COVID-19 cases in India. In [12], COVID-19 tweet data were analyzed using a bidirectional encoder representations from Transformers (BERT) model. Gulati et al. [13] implemented a comparative analysis of ML-based classifiers, which were applied to more than 72,000 tweets related to COVID-19. Mujahid et al. [14] employed a Twitter dataset comprising 17,155 tweets regarding e-learning. ML and DL methods showed potential, suitability, and capability for object detection, natural language processing, and image processing tasks. Luo and Xu [15] presented a DL method to explore customer opinion regarding restaurant features and to discover reviews with mismatched ratings. Their study strengthens the extant literature by analyzing restaurant reviews posted during the COVID-19 pandemic and identifying a DL algorithm for text mining tasks [16].

Singh et al. [17] proposed a DL technique for SA of Twitter data based on COVID-19 analyses. The suggested model depends on an LSTM–RNN-based network and improves feature weighting through an attention layer. This approach makes use of an improved feature transformation architecture through the attention model. Yin et al. [18] conducted a study of COVID-19 vaccination discussions on Twitter. The authors analyzed the deliberations of individuals on this topic and the emotional polarization between vaccine brands and perceptions of countries. The results showed that the majority of individuals trust the usefulness of vaccines and are ready to be vaccinated. In another study [19], the authors focused on improving the understanding of public awareness of COVID-19 pandemic trends and uncovering meaningful themes of concern posted by Twitter users in the English language. An NLP method and the latent Dirichlet allocation model were utilized to cluster and identify themes based on keyword analysis, along with identifying the most common Twitter topics. In [20], data from an Arabic COVID-19-related tweet dataset were gathered and processed with ML prediction models. The results showed that applying SVM classification together with bigrams in TF-IDF outperformed other algorithms, with 85% accuracy.

Lyu et al. [21] identified sentiments and topics in COVID-19 vaccine-related conversations among the public on social networking platforms and distinguished the relevant changes in sentiments and topics over time, for a better understanding of public emotions, perceptions, and concerns that might affect the achievement of herd immunity objectives. Basiri et al. [22] presented a methodology based on the fusion of four DL methods and one traditional supervised ML method for SA of COVID-19-related tweets from eight countries. Moreover, the authors analyzed COVID-19-related searches using Google Trends for a better understanding of the changes in sentiment patterns at different places and times. Imran et al. [23] analyzed the reaction of citizens from various cultures to the novel COVID-19 and people’s sentiments regarding subsequent actions taken by many countries. A deep LSTM model was utilized for assessing the emotions and sentiment polarities of extracted tweets. In [24], GloVe and fastText were tested as word embeddings. Data collected from Twitter were prepared as stemmed and unstemmed datasets.

In short, SA can be considered a meaningful source of data mining, particularly in circumstances that require examining massive quantities of publicly relevant data, such as investigating public behavior concerning the COVID-19 pandemic and its effect on people’s lives. Furthermore, it is desirable to improve decision makers’ countermeasures and offer them a simple method with a collection of common rules that assists complex decision-making processes, based on people’s sentiments and on examining and sorting an essential set of key features of COVID-19 posts. Thus, the study proposed in this paper differs from earlier research in combining decision support systems (DSS) with SA for improving government decisions at the time of COVID-19. The use of the SFODLD-SAC model offers more insights and achieves better performance than other state-of-the-art techniques.

3. Materials and Methods

In this study, a novel SFODLD-SAC model was developed for the identification and classification of sentiments on COVID-19 tweets. The presented SFODLD-SAC model follows a series of processes—namely, preprocessing, TF-IDF feature extraction, CRNN classification, and SFO-based parameter optimization. Figure 1 illustrates the pipeline of the SFODLD-SAC model. The workflow of each module in the SFODLD-SAC model is elaborated in the following subsections.

3.1. Data Used

In this section, the performance of the SFODLD-SAC model on the COVID-19 tweet dataset is investigated [25]. The dataset holds 2750 instances with 11 class labels. The details related to the dataset are given in Table 1. Some sample tweets related to COVID-19 are provided in Table 2.

3.2. Data Preprocessing

At first, the SFODLD-SAC model preprocessed the tweets in distinct ways, such as stemming and the removal of stopwords, usernames, links, punctuation, and numerals [25]. The cleaning steps were as follows:

Removing usernames and links in tweets, which do not affect SA;
Removing punctuation marks and symbols such as hashtags, and converting the text to lower case;
Removing stopwords and numerals.

In addition, stemming was performed to reduce the terms to their root forms, which also helps reduce the complexity of the text features. Then, the TextBlob approach was used to determine the sentiment scores. Afterward, the TF-IDF model was executed on the preprocessed data to generate a collection of feature vectors.
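The preprocessing and TF-IDF steps described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the stopword list, the suffix-stripping function `crude_stem` (a stand-in for a real stemmer), and the particular TF-IDF weighting (relative term frequency times the logarithm of inverse document frequency) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; a real pipeline would use a fuller one).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "in", "on", "of", "to", "and", "at", "for"}


def crude_stem(token):
    """Rough suffix stripping, standing in for a real stemmer (hypothetical helper)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(tweet):
    """Apply the cleaning steps listed in Section 3.2 and return stemmed tokens."""
    text = re.sub(r"https?://\S+", " ", tweet)  # remove links
    text = re.sub(r"@\w+", " ", text)           # remove usernames
    text = text.lower()                         # convert to lower case
    text = re.sub(r"[^a-z\s]", " ", text)       # remove punctuation, '#', and numerals
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return [crude_stem(t) for t in tokens]


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


tokens = preprocess("@user Staying home in the #COVID19 lockdowns! https://t.co/abc 2020")
weights = tfidf([["covid", "lockdown"], ["covid", "vaccine"]])
```

With this weighting, a term that occurs in every document (here "covid") receives a weight of zero, which is why TF-IDF emphasizes terms that discriminate between tweets.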

3.3. Sentiment Classification Using CRNN Model

For the effective recognition and classification of sentiments, the CRNN model was exploited [26]. An RNN is a type of artificial neural network (ANN) that extends the feedforward neural network (FFNN) with recurrent connections (loops). Unlike an FFNN, an RNN can process an input sequence using a recurrent hidden layer whose activation depends on that of the previous step. Given the sequential dataset $(x_{1}, x_{2}, \dots, x_{T})$, where $x_{i}$ denotes the data at the $i$-th time step, the RNN updates the recurrent hidden state $h_{t}$ as follows:

$$h_{t} = \begin{cases} 0, & \text{if } t = 0, \\ \phi(h_{t-1}, x_{t}), & \text{otherwise,} \end{cases}$$

(1)

where $\phi$ indicates a nonlinear function. The RNN thus produces an output sequence $(y_{1}, y_{2}, \dots, y_{T})$, and data classification is eventually performed using the output $y_{T}$. In the traditional RNN model, the update rule of the recurrent hidden layer in (1) can be implemented by

$$h_{t} = \phi(W x_{t} + U h_{t-1}),$$

(2)

where $W$ and $U$ represent the coefficient matrices for the input and for the activation of the recurrent hidden units, respectively. The sequential probability $p(x_{1}, x_{2}, \dots, x_{T})$ can be factorized as follows:

$$p(x_{1}, x_{2}, \dots, x_{T}) = p(x_{1}) \cdots p(x_{T} \mid x_{1}, \dots, x_{T-1}).$$

(3)

Next, the conditional likelihood distribution can be modeled by a recurrent network. The tweets can be processed as sequence data, and a recurrent network is employed to model the sequence [26]. In contrast to the LSTM unit, the GRU has fewer parameters relevant for classification and requires fewer training instances. Therefore, the GRU was chosen as the key element of the RNN. The essential components of the GRU are two gating units that control the data flow within the unit. Figure 2 depicts the framework of CRNN.

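The conditional probability at each time step is modeled from the hidden state, and the GRU hidden state is a linear interpolation between the previous state $h_{t-1}$ and the candidate state $\tilde{h}_{t}$:

$$p(x_{t} \mid x_{1}, \dots, x_{t-1}) = \phi(h_{t}),$$

(4)

$$h_{t} = (1 - u_{t}) h_{t-1} + u_{t} \tilde{h}_{t}.$$

(5)

Here, $u_{t}$ symbolizes the update gate, computed as follows:

$$u_{t} = \sigma(w_{u} x_{t} + v_{u} h_{t-1}).$$

(6)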
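As an illustration of Equations (1), (5), and (6), the following sketch performs one gated recurrent update in plain Python. It rests on an assumption: the text does not define the candidate state, so the sketch computes it with the plain RNN rule of Equation (2) using tanh, whereas a full GRU would also apply a reset gate. The parameter dictionary keys (`w_u`, `v_u`, `W`, `U`) mirror the symbols in the equations, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def gated_step(x_t, h_prev, p):
    """One recurrent update following Eqs. (5) and (6).

    The candidate state uses tanh(W x_t + U h_{t-1}), i.e., Eq. (2);
    this form is an assumption, since the text leaves the candidate undefined.
    """
    # Update gate, Eq. (6): u_t = sigma(w_u x_t + v_u h_{t-1})
    u = [sigmoid(a + b)
         for a, b in zip(matvec(p["w_u"], x_t), matvec(p["v_u"], h_prev))]
    # Candidate state (assumed form)
    cand = [math.tanh(a + b)
            for a, b in zip(matvec(p["W"], x_t), matvec(p["U"], h_prev))]
    # Interpolation, Eq. (5): h_t = (1 - u_t) h_{t-1} + u_t * candidate
    return [(1 - ui) * hi + ui * ci for ui, hi, ci in zip(u, h_prev, cand)]


def run_sequence(xs, p, hidden_size):
    """Process a whole sequence, starting from h_0 = 0 as in Eq. (1)."""
    h = [0.0] * hidden_size
    for x_t in xs:
        h = gated_step(x_t, h, p)
    return h


# Toy parameters with one input feature and one hidden unit (illustrative values).
params = {"w_u": [[0.0]], "v_u": [[0.0]], "W": [[1.0]], "U": [[0.0]]}
final_state = run_sequence([[0.5], [-0.2], [0.1]], params, hidden_size=1)
```

When the update gate is close to zero, the previous state is carried over almost unchanged, which is how the gate controls the data flow within the unit.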

3.4. Parameter Optimization

Finally, the SFO algorithm was utilized to optimally adjust the hyperparameters involved in the CRNN model. Gomes et al. [27] introduced an approach for flowering plants based on a flower pollination technique that takes into account the biological process of reproduction.

Generally, the SFO algorithm involves six steps, as given in Figure 3. It starts with the parameter initiation process, during which the number of sunflowers, maximum iterations, and solution dimension space are initialized. Then, the sunflower parameters such as pollination rate, mortality rate, and survival rate are fixed. In the third step, the optimal objective of every sunflower is arbitrarily chosen. Next, the optimal sunflower is updated. Afterward, the new sunflower is produced depending upon the pollination and mortality rate. In the final step, the termination condition is checked, and the process continues until the stopping criteria are fulfilled. The mathematical modeling of the SFO algorithm is given in what follows.

For this algorithm, we considered the peculiar nature of sunflowers in detecting the optimal direction toward the sun. Pollination was considered to occur randomly, with minimal distance between flower $i$ and flower $i + 1$. In nature, a flower patch releases billions of pollen gametes. For simplicity, it was assumed that each sunflower generates only one pollen gamete and reproduces individually. Next, the amount of heat $Q$ received by each plant is given by

$$Q_{i} = \frac{P}{4 \pi r_{i}^{2}},$$

(7)

where $P$ denotes the source power, and $r_{i}$ indicates the distance between the current best plant and plant $i$.

The sunflower’s direction toward the sun can be represented as follows:

$$\vec{s}_{i} = \frac{X^{*} - X_{i}}{\lVert X^{*} - X_{i} \rVert}, \quad i = 1, 2, \dots, n_{p}.$$

(8)

The step of the sunflowers in direction $\vec{s}$ is evaluated by

$$d_{i} = \lambda \times P_{i}(\lVert X_{i} + X_{i-1} \rVert) \times \lVert X_{i} + X_{i-1} \rVert,$$

(9)

where $\lambda$ represents a constant value and $P_{i}(\lVert X_{i} + X_{i-1} \rVert)$ denotes the pollination probability, i.e., sunflower $i$ pollinates with its neighbor $i - 1$, creating a new individual in an arbitrary position that varies according to the distance between the flowers. Specifically, individuals near the sun take small steps in a local refinement search. Additionally, it is necessary to bound the maximal step taken by an individual. Hence, it is defined as

$$d_{\max} = \frac{\lVert X_{\max} - X_{\min} \rVert}{2 \times N_{pop}},$$

(10)

where $X_{\max}$ and $X_{\min}$ indicate the upper and lower bounds, and $N_{pop}$ represents the number of plants in the population. The new position of the plant can be expressed as follows:

$$\vec{X}_{i+1} = \vec{X}_{i} + d_{i} \times \vec{s}_{i}.$$

(11)
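Equations (7) to (11) can be sketched as a handful of small functions. This is an illustrative reading rather than the exact optimizer used in the study: the pollination probability is passed in as a plain number `p_i` (its functional form is not given in the text), the step is clipped to $d_{\max}$, and all function names are hypothetical.

```python
import math


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def heat(power, r_i):
    """Eq. (7): heat received at distance r_i from the source."""
    return power / (4 * math.pi * r_i ** 2)


def direction(x_best, x_i):
    """Eq. (8): unit vector pointing from plant i toward the best plant."""
    diff = [b - a for a, b in zip(x_i, x_best)]
    length = norm(diff)
    if length == 0:
        return [0.0] * len(diff)  # plant i already coincides with the best plant
    return [c / length for c in diff]


def max_step(x_max, x_min, n_pop):
    """Eq. (10): upper bound on the step length."""
    return norm([hi - lo for hi, lo in zip(x_max, x_min)]) / (2 * n_pop)


def step_length(x_i, x_prev, lam, p_i, d_max):
    """Eq. (9), clipped to d_max. p_i is supplied as a number (assumption)."""
    s = norm([a + b for a, b in zip(x_i, x_prev)])
    return min(lam * p_i * s, d_max)


def move(x_i, x_best, d_i):
    """Eq. (11): move plant i by d_i along its direction toward the best plant."""
    s_i = direction(x_best, x_i)
    return [a + d_i * c for a, c in zip(x_i, s_i)]


# Toy two-dimensional example with illustrative values.
d_max = max_step([10.0, 0.0], [0.0, 0.0], n_pop=5)
d = step_length([3.0, 0.0], [1.0, 0.0], lam=0.5, p_i=0.5, d_max=d_max)
new_position = move([0.0, 0.0], [3.0, 4.0], d)
```

In a hyperparameter search, each position vector would encode one candidate CRNN configuration, and the best plant would be the configuration with the lowest fitness value.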

The SFO approach derives a fitness function (FF) for achieving enhanced classification performance. In this case, the classifier error rate to be minimized was taken as the FF, determined by Equation (12). The best solution has a minimal error rate, and the worst solution has a high error rate.

$$\mathrm{ClassifierErrorRate}(x_{i}) = \frac{\text{number of misclassified tweets}}{\text{total number of tweets}} \times 100.$$

(12)
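Equation (12) translates directly into code. The sketch below computes the error rate from true and predicted labels and selects the candidate with the lowest value; the `evaluate` callback and the hyperparameter name `hidden_units` are hypothetical placeholders for training and scoring a CRNN, which is not shown here.

```python
def classifier_error_rate(y_true, y_pred):
    """Eq. (12): percentage of misclassified tweets."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true) * 100


def best_candidate(candidates, evaluate):
    """Return the candidate whose fitness (error rate) is lowest."""
    return min(candidates, key=evaluate)


# One of four toy predictions is wrong.
error = classifier_error_rate([0, 1, 2, 2], [0, 1, 1, 2])

# Hypothetical candidates and a stand-in fitness in place of real training runs.
candidates = [{"hidden_units": 32}, {"hidden_units": 64}]
chosen = best_candidate(candidates, lambda c: abs(c["hidden_units"] - 64))
```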

4. Performance Validation

4.1. Result Analysis

Figure 4 illustrates a set of confusion matrices formed by the SFODLD-SAC model on a test dataset. The figures indicate that the SFODLD-SAC model ensured the effective identification of distinct class labels on 70% of the training set (TRS) and 30% of the testing set (TSS).
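The per-class measures reported in this section can be derived from a multi-class confusion matrix in a one-vs-rest fashion. The sketch below assumes the standard definitions of accuracy, precision, recall, specificity, F-score, and MCC, which the text does not spell out; the toy matrix is illustrative and is not taken from Figure 4.

```python
import math


def one_vs_rest_metrics(cm, k):
    """Metrics for class k, where cm[i][j] counts true class i predicted as j."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp
    fp = sum(row[k] for row in cm) - tp
    tn = total - tp - fn - fp

    def safe(num, den):
        return num / den if den else 0.0

    precision = safe(tp, tp + fp)
    recall = safe(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe(tp + tn, total),
        "precision": precision,
        "recall": recall,
        "specificity": safe(tn, tn + fp),
        "f_score": safe(2 * precision * recall, precision + recall),
        "mcc": safe(tp * tn - fp * fn, mcc_den),
    }


# Toy two-class confusion matrix (illustrative values only).
toy_cm = [[8, 2],
          [1, 9]]
metrics = one_vs_rest_metrics(toy_cm, 0)
```

Averaging each measure over all class labels gives macro-averaged values of the kind summarized in the average-performance figures.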

Table 3 provides the detailed classification outcomes of the SFODLD-SAC model on 70% of TRS. The experimental results revealed that the proposed model provided effective outcomes under all class labels.

Figure 5 reports a brief result of the SFODLD-SAC model on 70% of TRS in terms of $accu_{y}$, $prec_{n}$, and $reca_{l}$. The results indicated that the SFODLD-SAC model accomplished effective results under each class. For instance, the SFODLD-SAC model identified class 0 with $accu_{y}$, $prec_{n}$, and $reca_{l}$ of 99.64, 99.69, and 99.43%, respectively. In line with this, the SFODLD-SAC model identified class 5 with $accu_{y}$, $prec_{n}$, and $reca_{l}$ of 99.74, 99.44, and 97.78%, respectively. Moreover, the SFODLD-SAC model identified class 10 with $accu_{y}$, $prec_{n}$, and $reca_{l}$ of 99.01, 96.95, and 91.91%, respectively.

Figure 6 offers detailed results of the SFODLD-SAC model on 70% of TRS in terms of $spec_{y}$, $F_{score}$, and MCC. The experimental values denoted that the SFODLD-SAC model led to proficient performance levels in all classes. For instance, the SFODLD-SAC model recognized class 0 with $spec_{y}$, $F_{score}$, and MCC of 99.66, 98.04, and 97.85%, respectively. In line with this, the SFODLD-SAC model recognized class 5 with $spec_{y}$, $F_{score}$, and MCC of 99.94, 98.60, and 98.46%, respectively. In addition, the SFODLD-SAC model categorized class 10 with $spec_{y}$, $F_{score}$, and MCC of 99.71, 94.36, and 93.86%, respectively.

Figure 7 highlights the average classification performance of the SFODLD-SAC model on 70% of TRS. The results indicated that the SFODLD-SAC model accomplished an average $accu_{y}$, $prec_{n}$, and $reca_{l}$ of 99.48, 97.22, and 97.14%, respectively. Thus, the SFODLD-SAC model accomplished effective sentiment classification on tweets.

Table 4 provides the detailed classification outcomes of the SFODLD-SAC model on 30% of TSS. Figure 8 showcases a comparative result of the SFODLD-SAC model on 30% of TSS in terms of $accu_{y}$, $prec_{n}$, and $reca_{l}$. The figure shows that the SFODLD-SAC technique attained improved performance under all class labels. For instance, the SFODLD-SAC model recognized class 0 with $accu_{y}$, $prec_{n}$, and $reca_{l}$ of 99.52, 96.05, and 98.65%, respectively. Moreover, the SFODLD-SAC method identified class 5 with $accu_{y}$, $prec_{n}$, and $reca_{l}$ of 99.76, 98.57, and 98.57%, respectively. Furthermore, the SFODLD-SAC model recognized class 10 with $accu_{y}$, $prec_{n}$, and $reca_{l}$ of 99.76, 100, and 97.40%, respectively.

Figure 9 presents a detailed comparative study of the SFODLD-SAC model on 30% of TSS in terms of $spec_{y}$, $F_{score}$, and MCC. The experimental values revealed that the SFODLD-SAC model gained better results under each class. For instance, the SFODLD-SAC model identified class 0 with $spec_{y}$, $F_{score}$, and MCC of 99.60, 97.33, and 97.08%, respectively. At the same time, the SFODLD-SAC model identified class 5 with $spec_{y}$, $F_{score}$, and MCC of 99.87, 98.57, and 98.44%, respectively. Also, the SFODLD-SAC model identified class 10 with $spec_{y}$, $F_{score}$, and MCC of 100, 98.68, and 98.56%, respectively.

Figure 10 showcases the average classification performance of the SFODLD-SAC model on 30% of TSS. The results revealed that the SFODLD-SAC model provided average $accu_{y}$, $prec_{n}$, and $reca_{l}$ values of 99.76, 98.12, and 98.05%, respectively. Therefore, the SFODLD-SAC model accomplished effective sentiment classification on tweets.

The training accuracy (TA) and validation accuracy (VA) attained by the SFODLD-SAC model on tweet sentiment classification are demonstrated in Figure 11. Based on the experimental outcomes, the SFODLD-SAC model gained maximum values of TA and VA. Specifically, VA seemed to be higher than TA.

The training loss (TL) and validation loss (VL) achieved by the SFODLD-SAC model on tweet sentiment classification are shown in Figure 12. Based on the experimental outcomes, it can be inferred that the SFODLD-SAC model accomplished the least values of TL and VL. Specifically, VL seemed to be lower than TL. The results denoted that the SFODLD-SAC model exhibited its ability to categorize different classes on the test datasets.

4.2. Discussion

To highlight the advantages of the SFODLD-SAC model, a comparative study with recent approaches [12] was conducted, the results of which are shown in Table 5 and Figure 13. The experimental outcomes showed that the SVM and DT models had the lowest classification performance among the compared methods. At the same time, the RF and XGBoost models accomplished slightly improved outcomes. In addition, the extra tree classifier accomplished reasonable performance with $accu_{y}$, $prec_{n}$, $reca_{l}$, and $F1_{score}$ of 92.32, 93.08, 92.42, and 92.13%, respectively.

However, the SFODLD-SAC model accomplished superior outcomes with maximum $accu_{y}$, $prec_{n}$, $reca_{l}$, and $F1_{score}$ of 99.65, 98.12, 98.05, and 98.06%, respectively. The above-mentioned results and discussion demonstrate that the SFODLD-SAC model accomplished effective classification performance on COVID-19 tweets. The enhanced performance of the proposed model is due to the optimal hyperparameter tuning of the CRNN model using the SFO algorithm.

5. Conclusions

In this study, a novel SFODLD-SAC model was introduced for the recognition and classification of sentiments on COVID-19 tweets. At the initial stage, the SFODLD-SAC model preprocessed the tweets through stemming and the removal of stopwords, usernames, links, punctuation, and numerals. Then, the TF-IDF model was applied to extract useful features from the preprocessed data. Afterward, the features were passed into the CRNN model to analyze and classify sentiments. Lastly, the SFO algorithm was utilized to optimally adjust the hyperparameters of the CRNN model. A simulation analysis of the SFODLD-SAC model was performed using a benchmark dataset from the Kaggle repository. Extensive comparative results report the promising performance of the SFODLD-SAC model over other recent state-of-the-art models, with maximum $accu_{y}$, $prec_{n}$, $reca_{l}$, and $F1_{score}$ of 99.65, 98.12, 98.05, and 98.06%, respectively. Thus, the presented SFODLD-SAC model can be applied for enhanced SA on COVID-19 tweets, as well as in big data environments to analyze sentiments in real time. In the future, outlier detection and clustering models can be employed to improve the sentiment classification performance. Moreover, the proposed SFODLD-SAC model can be extended to the design of an ensemble voting-based fusion model to improve classification performance. In addition, metaheuristic feature selection techniques can be designed to reduce the curse of dimensionality. Finally, different data preprocessing approaches can be employed to improve the input data quality.

Author Contributions

Conceptualization, N.A.A. and H.T.H.; methodology, S.A.-K. and R.F.M.; software, S.A.-K. and R.F.M.; validation, Y.A., A.M.M. and H.T.H.; formal analysis, N.A.A., A.M.M. and H.T.H.; investigation, S.A.-K. and Y.A.; resources, H.T.H.; data curation, A.M.M.; writing—original draft preparation, R.F.M., S.A.-K. and Y.A.; writing—review and editing, N.A.A., A.M.M. and H.T.H.; visualization, Y.A., A.M.M. and H.T.H.; supervision, S.A.-K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Taif University.

Data Availability Statement

Data sharing is not applicable to this article, as no datasets were generated during the current study.

Acknowledgments

Taif University Researchers Supporting Project number (TURSP-2020/154), Taif University, Taif, Saudi Arabia.

Conflicts of Interest

The authors declare that they have no conflicts of interest. The manuscript was written with the contributions of all authors. All authors have given approval to the final version of the manuscript.

References

Mohan, S.; Solanki, A.K.; Taluja, H.K.; Singh, A. Predicting the impact of the third wave of COVID-19 in India using hybrid statistical machine learning models: A time series forecasting and sentiment analysis approach. Comput. Biol. Med. 2022, 144, 105354. [Google Scholar] [CrossRef] [PubMed]
Kaur, H.; Ahsaan, S.U.; Alankar, B.; Chang, V. A proposed sentiment analysis deep learning algorithm for analyzing COVID-19 tweets. Inf. Syst. Front. 2021, 23, 1417–1429. [Google Scholar] [CrossRef] [PubMed]
Mansour, R.F.; Escorcia-Gutierrez, J.; Gamarra, M.; Gupta, D.; Castillo, O.; Kumar, S. Unsupervised deep learning based variational autoencoder model for COVID-19 diagnosis and classification. Pattern Recognit. Lett. 2021, 151, 267–274. [Google Scholar] [CrossRef] [PubMed]
Muthumayil, K.; Buvana, M.; Sekar, K.R.; Amraoui, A.E.; Nouaouri, I.; Mansour, R.F. Optimized convolutional neural network for automatic detection of COVID-19. Comput. Mater. Contin. 2021, 70, 1159–1175. [Google Scholar] [CrossRef]
Xue, Y.; Onzo, B.M.; Mansour, R.F.; Su, S.B. Deep Convolutional Neural Network Approach for COVID-19 Detection. Comput. Syst. Sci. Eng. 2022, 42, 201–211. [Google Scholar] [CrossRef]
Satu, M.S.; Khan, M.I.; Mahmud, M.; Uddin, S.; Summers, M.A.; Quinn, J.M.; Moni, M.A. TClustVID: A novel machine learning classification model to investigate topics and sentiment in COVID-19 tweets. Knowl.-Based Syst. 2021, 226, 107126.
Naseem, U.; Razzak, I.; Khushi, M.; Eklund, P.W.; Kim, J. COVIDSenti: A large-scale benchmark Twitter data set for COVID-19 sentiment analysis. IEEE Trans. Comput. Soc. Syst. 2021, 8, 1003–1015.
Pano, T.; Kashef, R. A complete VADER-based sentiment analysis of bitcoin (BTC) tweets during the era of COVID-19. Big Data Cogn. Comput. 2020, 4, 33.
Alamoodi, A.H.; Zaidan, B.B.; Zaidan, A.A.; Albahri, O.S.; Mohammed, K.I.; Malik, R.Q.; Almahdi, E.M.; Chyad, M.A.; Tareq, Z.; Albahri, A.S.; et al. Sentiment analysis and its applications in fighting COVID-19 and infectious diseases: A systematic review. Expert Syst. Appl. 2021, 167, 114155.
Samuel, J.; Ali, G.G.; Rahman, M.; Esawi, E.; Samuel, Y. COVID-19 public sentiment insights and machine learning for tweets classification. Information 2020, 11, 314.
Chandra, R.; Krishna, A. COVID-19 sentiment analysis via deep learning during the rise of novel cases. PLoS ONE 2021, 16, e0255615.
Chintalapudi, N.; Battineni, G.; Amenta, F. Sentimental analysis of COVID-19 tweets using deep learning models. Infect. Dis. Rep. 2021, 13, 329–339.
Gulati, K.; Kumar, S.S.; Boddu, R.S.K.; Sarvakar, K.; Sharma, D.K.; Nomani, M.Z.M. Comparative analysis of machine learning-based classification models using sentiment classification of tweets related to COVID-19 pandemic. Mater. Today Proc. 2022, 51, 38–41.
Mujahid, M.; Lee, E.; Rustam, F.; Washington, P.B.; Ullah, S.; Reshi, A.A.; Ashraf, I. Sentiment analysis and topic modeling on tweets about online education during COVID-19. Appl. Sci. 2021, 11, 8438.
Luo, Y.; Xu, X. Comparative study of deep learning models for analyzing online restaurant reviews in the era of the COVID-19 pandemic. Int. J. Hosp. Manag. 2021, 94, 102849.
Rustam, F.; Khalid, M.; Aslam, W.; Rupapara, V.; Mehmood, A.; Choi, G.S. A performance comparison of supervised machine learning models for Covid-19 tweets sentiment analysis. PLoS ONE 2021, 16, e0245909.
Singh, C.; Imam, T.; Wibowo, S.; Grandhi, S. A Deep Learning Approach for Sentiment Analysis of COVID-19 Reviews. Appl. Sci. 2022, 12, 3709.
Yin, H.; Song, X.; Yang, S.; Li, J. Sentiment analysis and topic modeling for COVID-19 vaccine discussions. World Wide Web 2022, 1–17.
Boon-Itt, S.; Skunkan, Y. Public perception of the COVID-19 pandemic on Twitter: Sentiment analysis and topic modeling study. JMIR Public Health Surveill. 2020, 6, e21978.
Aljameel, S.S.; Alabbad, D.A.; Alzahrani, N.A.; Alqarni, S.M.; Alamoudi, F.A.; Babili, L.M.; Aljaafary, S.K.; Alshamrani, F.M. A sentiment analysis approach to predict an individual’s awareness of the precautionary procedures to prevent COVID-19 outbreaks in Saudi Arabia. Int. J. Environ. Res. Public Health 2021, 18, 218.
Lyu, J.C.; Le Han, E.; Luli, G.K. COVID-19 vaccine–related discussion on Twitter: Topic modeling and sentiment analysis. J. Med. Internet Res. 2021, 23, e24435.
Basiri, M.E.; Nemati, S.; Abdar, M.; Asadi, S.; Acharya, U.R. A novel fusion-based deep learning model for sentiment analysis of COVID-19 tweets. Knowl.-Based Syst. 2021, 228, 107242.
Imran, A.S.; Daudpota, S.M.; Kastrati, Z.; Batra, R. Cross-cultural polarity and emotion detection using sentiment analysis and deep learning on COVID-19 related tweets. IEEE Access 2020, 8, 181074–181090.
Agustiningsih, K.K.; Utami, E.; Alsyaibani, M.A. Sentiment Analysis of COVID-19 Vaccines in Indonesia on Twitter Using Pre-Trained and Self-Training Word Embeddings. J. Ilmu Komput. Dan Inf. 2022, 15, 39–46.
Sentiment Analysis of COVID-19 Related Tweets. Available online: https://www.kaggle.com/competitions/sentiment-analysis-of-covid-19-related-tweets/data?select=validation.csv (accessed on 1 April 2022).
Shankar, K.; Perumal, E.; Díaz, V.G.; Tiwari, P.; Gupta, D.; Saudagar, A.K.J.; Muhammad, K. An optimal cascaded recurrent neural network for intelligent COVID-19 detection using Chest X-ray images. Appl. Soft Comput. 2021, 113, 107878.
Gomes, G.F.; da Cunha, S.S.; Ancelotti, A.C. A sunflower optimization (SFO) algorithm applied to damage identification on laminated composite plates. Eng. Comput. 2019, 35, 619–626.

Figure 1. Overall process of SFODLD-SAC technique.

Figure 2. CRNN structure.

Figure 3. Flowchart of SFO.

Figure 4. Confusion matrix of SFODLD-SAC technique. (a) The training set (TRS) and (b) the testing set (TSS).

Figure 5. $Acc_{y}$, $Prec_{n}$, and $Reca_{l}$ analysis of SFODLD-SAC technique on 70% of TRS for classes 0–10.

Figure 6. $Spec_{y}$, $F_{score}$, and $MCC$ analysis of SFODLD-SAC technique on 70% of TRS for classes 0–10.

Figure 7. Average analysis of SFODLD-SAC technique on 70% of TRS.

Figure 8. $Acc_{y}$, $Prec_{n}$, and $Reca_{l}$ analyses of SFODLD-SAC technique on 30% of TSS for classes 0–10.

Figure 9. $Spec_{y}$, $F_{score}$, and $MCC$ analyses of SFODLD-SAC technique on 30% of TSS for classes 0–10.

Figure 10. Average analysis of SFODLD-SAC technique on 30% of TSS.

Figure 11. TA and VA analyses of SFODLD-SAC technique.

Figure 12. TL and VL analyses of SFODLD-SAC technique.

Figure 13. Comparative analysis of SFODLD-SAC technique with existing approaches.

Table 1. Dataset details.

Class Label	Class Name	No. of Instances
Class 0	Optimistic	250
Class 1	Thankful	250
Class 2	Empathetic	250
Class 3	Pessimistic	250
Class 4	Anxious	250
Class 5	Sad	250
Class 6	Annoyed	250
Class 7	Denial	250
Class 8	Surprise	250
Class 9	Official report	250
Class 10	Joking	250
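
The class counts in Table 1 fix the overall dataset size, and the 70%/30% partition named in Tables 3 and 4 then determines how many tweets fall into each set. The following is a minimal sketch of that arithmetic, assuming a simple shuffled, non-stratified split. The function name `split_indices`, the seed, and the use of Python's `random` module are illustrative assumptions, not details given by the authors.

```python
import random

# Class sizes from Table 1: 11 sentiment classes with 250 tweets each.
CLASS_COUNTS = {label: 250 for label in range(11)}

def split_indices(n_items, train_percent=70, seed=0):
    """Shuffle item indices and split them into train and test parts.

    The split point uses integer arithmetic to avoid floating-point
    rounding surprises when multiplying by 0.7.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary assumption
    n_train = n_items * train_percent // 100
    return indices[:n_train], indices[n_train:]

total = sum(CLASS_COUNTS.values())
train_idx, test_idx = split_indices(total)
print(total, len(train_idx), len(test_idx))  # 2750 1925 825
```

Under these assumptions the training set (TRS) holds 1925 tweets and the testing set (TSS) holds 825.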

Table 2. Sample tweets.

ID	Tweets	Labels
1	NO JOKE I WILL HOP ON A PLANE RN! (Well after COVID-19 lol)	(0) (10)
2	Has anyone else FB ads been killing it since this coronavirus hit?	(0) (5) (10)
3	Im waiting for someone to say to me that all this corona thing is just an April fool’s joke	(3) (4)
4	He is a liar. Proven day night. Time again. Lies when the truth will do. COVID-19	(6)
5	NEW: U.S. CoronaVirus death toll reaches 4000 after nearly 900 new deaths were reported today (BNO News) COVID-19 CoronaVirusOutbreak	(8)
6	Coronavirus impact Govt extends I-T deadlines related to Sections 80C, 80D	(5) (8)
7	That moment you realize your new medication has side effects identical to coronavirus symptoms how will I know?	(4) (9)
8	Watch the government play off Corona virus as a big April Fool’s Joke	(10)
9	The problem of poverty has now covered the cover of religion. The issue has changed. There is relief from corona. All is well	(0) (4)
10	My mental health hasn’t suffered at all under the coronavirus quarantine! Ha-ha, April Fools.	(10)
11	i cannot die before watching a concert live coronavirus pls try to understand	(5) (10)
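
Several sample tweets in Table 2 carry more than one label, written as parenthesized class IDs. As an illustration only, the sketch below parses such a label field and maps the IDs to the class names of Table 1. The helper names `parse_labels` and `to_multi_hot` are hypothetical, and the multi-hot encoding is one common way to represent such labels, not necessarily the representation used by the authors.

```python
import re

# Class names in label order, as listed in Table 1.
CLASS_NAMES = [
    "Optimistic", "Thankful", "Empathetic", "Pessimistic", "Anxious", "Sad",
    "Annoyed", "Denial", "Surprise", "Official report", "Joking",
]

def parse_labels(field):
    """Extract integer class IDs from a string such as '(0) (5) (10)'."""
    return [int(match) for match in re.findall(r"\((\d+)\)", field)]

def to_multi_hot(label_ids, n_classes=len(CLASS_NAMES)):
    """Encode a list of class IDs as a 0/1 indicator vector."""
    vector = [0] * n_classes
    for label_id in label_ids:
        vector[label_id] = 1
    return vector

ids = parse_labels("(0) (5) (10)")  # label field of sample tweet 2
names = [CLASS_NAMES[i] for i in ids]
print(ids, names, to_multi_hot(ids))
```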

Table 3. Result analysis of SFODLD-SAC technique with distinct measures on 70% of TRS.

Training Set (70%)
Class Labels	Accuracy	Precision	Recall	Specificity	F-Score	MCC
0	99.64	96.69	99.43	99.66	98.04	97.85
1	99.12	96.47	93.71	99.66	95.07	94.60
2	99.48	96.05	98.27	99.60	97.14	96.86
3	99.38	98.84	94.48	99.89	96.61	96.30
4	99.12	91.92	99.45	99.08	95.54	95.14
5	99.74	99.44	97.78	99.94	98.60	98.46
6	99.84	100.00	98.20	100.00	99.09	99.01
7	99.69	98.25	98.25	99.83	98.25	98.07
8	99.79	98.31	99.43	99.83	98.86	98.75
9	99.48	96.53	97.66	99.66	97.09	96.81
10	99.01	96.95	91.91	99.71	94.36	93.86
Average	99.48	97.22	97.14	99.71	97.15	96.88
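
The "Average" row of Table 3 is consistent with an unweighted (macro) mean over the 11 classes. The short check below recomputes it for three of the columns; the per-class values are copied from Table 3, while the dictionary and function names are illustrative.

```python
from statistics import mean

# Per-class values (classes 0-10) copied from Table 3.
TABLE3 = {
    "accuracy": [99.64, 99.12, 99.48, 99.38, 99.12, 99.74,
                 99.84, 99.69, 99.79, 99.48, 99.01],
    "precision": [96.69, 96.47, 96.05, 98.84, 91.92, 99.44,
                  100.00, 98.25, 98.31, 96.53, 96.95],
    "recall": [99.43, 93.71, 98.27, 94.48, 99.45, 97.78,
               98.20, 98.25, 99.43, 97.66, 91.91],
}

# Reported "Average" row of Table 3 for the same columns.
REPORTED = {"accuracy": 99.48, "precision": 97.22, "recall": 97.14}

def macro_average(values):
    """Unweighted mean over classes, rounded to two decimals."""
    return round(mean(values), 2)

for name, values in TABLE3.items():
    print(name, macro_average(values), REPORTED[name])
```

Each recomputed mean agrees with the reported average to two decimal places.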

Table 4. Result analysis of SFODLD-SAC technique with distinct measures on 30% of TSS.

Testing Set (30%)
Class Labels	Accuracy	Precision	Recall	Specificity	F-Score	MCC
0	99.52	96.05	98.65	99.60	97.33	97.08
1	99.88	100.00	98.67	100.00	99.33	99.26
2	99.64	96.25	100.00	99.60	98.09	97.91
3	99.39	100.00	92.75	100.00	96.24	95.99
4	99.88	98.53	100.00	99.87	99.26	99.20
5	99.76	98.57	98.57	99.87	98.57	98.44
6	99.76	100.00	97.59	100.00	98.78	98.65
7	99.64	96.34	100.00	99.60	98.14	97.96
8	99.64	97.37	98.67	99.73	98.01	97.82
9	99.27	96.20	96.20	99.60	96.20	95.80
10	99.76	100.00	97.40	100.00	98.68	98.56
Average	99.65	98.12	98.05	99.81	98.06	97.88
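
The per-class measures in Tables 3 and 4 can be derived from a multiclass confusion matrix such as the one in Figure 4 by treating each class in turn as the positive class. The sketch below assumes the standard one-vs-rest definitions and a matrix whose rows are true classes and whose columns are predicted classes. The 3-class matrix `TOY_CM` is made up for illustration and is not data from the paper.

```python
import math

def one_vs_rest_metrics(cm, k):
    """Return metrics in percent for class k, treated as the positive class.

    cm[i][j] counts samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                 # true class k, predicted otherwise
    fp = sum(row[k] for row in cm) - tp  # predicted k, true class otherwise
    tn = total - tp - fn - fp

    def pct(num, den):
        return 100.0 * num / den if den else 0.0

    precision = pct(tp, tp + fp)
    recall = pct(tp, tp + fn)
    f_den = precision + recall
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": pct(tp + tn, total),
        "precision": precision,
        "recall": recall,
        "specificity": pct(tn, tn + fp),
        "f_score": 2 * precision * recall / f_den if f_den else 0.0,
        "mcc": pct(tp * tn - fp * fn, mcc_den),
    }

# Hypothetical 3-class confusion matrix (not taken from the paper).
TOY_CM = [
    [8, 1, 1],
    [0, 9, 1],
    [2, 0, 8],
]
metrics = one_vs_rest_metrics(TOY_CM, 0)
print(metrics)
```

Because the true negatives of one class include every correct prediction for the other ten, one-vs-rest accuracy and specificity tend to be high in an 11-class setting, so precision, recall, F-score, and MCC are the more discriminating columns.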

Table 5. Comparative analysis of SFODLD-SAC technique with existing approaches.

Methods	Accuracy	Precision	Recall	F1 Score
Random Forest	90.13	91.22	90.30	90.29
XGBoost Algorithm	90.16	90.35	90.39	90.36
Support Vector Machine	89.43	89.29	89.12	89.18
Extra Tree Classifier	92.32	93.08	92.42	92.13
Decision Tree	89.29	89.47	89.21	89.29
SFODLD-SAC	99.65	98.12	98.05	98.06

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Alkhaldi, N.A.; Asiri, Y.; Mashraqi, A.M.; Halawani, H.T.; Abdel-Khalek, S.; Mansour, R.F. Leveraging Tweets for Artificial Intelligence Driven Sentiment Analysis on the COVID-19 Pandemic. Healthcare 2022, 10, 910. https://doi.org/10.3390/healthcare10050910

