Article

Leveraging Tweets for Artificial Intelligence Driven Sentiment Analysis on the COVID-19 Pandemic

1
Department of Computer Science, College of Computer Sciences and Information Technology, King Faisal University, Al-Ahsa 31982, Saudi Arabia
2
Department of Computer Science, College of Computer Science and Information Systems, Najran University, Najran 61441, Saudi Arabia
3
Department of Mathematics, College of Science, Taif University, Taif 21944, Saudi Arabia
4
Department of Mathematics, Faculty of Science, New Valley University, El-Kharga 72511, Egypt
*
Author to whom correspondence should be addressed.
Healthcare 2022, 10(5), 910; https://doi.org/10.3390/healthcare10050910
Submission received: 4 April 2022 / Revised: 9 May 2022 / Accepted: 10 May 2022 / Published: 13 May 2022

Abstract

The COVID-19 pandemic has been a disastrous event that has aggravated several psychological issues such as depression, given abrupt social changes and lack of employment. At the same time, social scientists and psychologists have gained significant interest in understanding the way people express emotions and sentiments during pandemics. During the rise in COVID-19 cases with stricter lockdowns, people expressed their sentiments on social media. This offers a deep understanding of human psychology during catastrophic events. By exploiting user-generated content on social media such as Twitter, people’s thoughts and sentiments can be examined, which aids in introducing health intervention policies and awareness campaigns. Recent developments in natural language processing (NLP) and deep learning (DL) models have shown noteworthy performance in sentiment analysis. With this in mind, this paper presents a new sunflower optimization with deep-learning-driven sentiment analysis and classification (SFODLD-SAC) model for COVID-19 tweets. The presented SFODLD-SAC model focuses on the identification of people’s sentiments during the COVID-19 pandemic. To accomplish this, the SFODLD-SAC model initially preprocesses the tweets in distinct ways such as stemming and the removal of stopwords, usernames, links, punctuation, and numerals. In addition, the TF-IDF model is applied to extract useful features from the preprocessed data. Moreover, the cascaded recurrent neural network (CRNN) model is employed to analyze and classify sentiments. Finally, the SFO algorithm is utilized to optimally adjust the hyperparameters involved in the CRNN model. The novelty of this study lies in the design of the SFODLD-SAC technique with an SFO-based hyperparameter optimizer for analyzing people’s sentiments on COVID-19. The simulation analysis of the SFODLD-SAC model is performed using a benchmark dataset from the Kaggle repository. Extensive comparative results report the promising performance of the SFODLD-SAC model over recent state-of-the-art models, with a maximum accuracy of 99.65%.

1. Introduction

COVID-19 is a communicable disease that spreads mainly through the tiny droplets released by an infected individual when sneezing, coughing, or talking. It has also become a source of anxiety, depression, and stress, owing to the false information posted on social media. The mental well-being of people is severely affected by the fast spread of incorrect information on social media [1,2]. Due to lockdowns and social distancing, people are mainly dependent on, or even addicted to, the internet and mobile phones, as revealed by reports indicating that the highest number of activities are performed on social media [1]. During lockdowns, traffic on social media increased sharply [3]. Among all social media platforms, Twitter ranks first in spreading COVID news [4,5]. The devastating part is that such news is subjective, because it mostly involves personal thoughts and confusion, which leads to intentional fake information, negativity, and uncertainty in the human community [6]. Meanwhile, this situation has attracted the interest of researchers seeking to perform quantitative analyses and build a complete picture. This study mainly aims at sentiment analysis based on Twitter datasets with regard to COVID-19 through a supervised machine learning algorithm.
During lockdowns, individuals, particularly teenagers, usually spend more time on Twitter, and users are more active than at any other time. The reason behind this is to receive up-to-date information regarding COVID-19. Meanwhile, they share their thoughts and feelings with friends and society through this medium. Therefore, in this pandemic situation, the analysis of Twitter data has received attention from the research community. Sentiment analysis (SA) is a technical study that deals with the opinions, attitudes, and emotions of people [7]. It is considered an efficient way to gauge people’s opinions on specific topics. SA can also convey several kinds of impact on the community. Additionally, SA summarizes the different anxieties and mental health conditions that arise among people during the pandemic. The depression status and panic disorder of individuals in a community can be quickly identified from the SA outcome [8]. The only solution to bring positivity to society is to apply various virtual depression optimizers for depressed persons. It should be mentioned that the success of most applications is based on the sentiments of social users. SA of active users is considered one of the efficient ways of tracking public opinion. In this pandemic situation, these kinds of studies have made important contributions to helping policymakers and governments.
Based on this background, this paper presents a new sunflower optimization with deep-learning-driven sentiment analysis and classification (SFODLD-SAC) model for COVID-19 tweets. The presented SFODLD-SAC model initially preprocesses the tweets in distinct ways such as stemming and the removal of stopwords, usernames, links, punctuation, and numerals. In addition, the TF-IDF model is applied to extract useful features from the preprocessed data. Moreover, a cascaded recurrent neural network (CRNN) model is employed to analyze and classify sentiments. Finally, the SFO algorithm is utilized to optimally adjust the hyperparameters involved in the CRNN model. The simulation analysis of the SFODLD-SAC model is performed using a benchmark dataset from the Kaggle repository. In short, the paper’s contributions are as follows:
  • An intelligent SFODLD-SAC model is presented consisting of TF-IDF-based feature extraction, CRNN classification, and SFO-based hyperparameter optimization for COVID-19 tweet analysis. To the best of our knowledge, the SFODLD-SAC model has never been presented in the literature;
  • The SFODLD-SAC technique involves the design of an SFO algorithm to optimally choose the hyperparameters, which helps in increasing the classification accuracy and avoids computational overhead;
  • The performance of the SFODLD-SAC model is validated using a benchmark dataset from the Kaggle repository, and the results are investigated under distinct sizes of training/testing data.
The rest of this paper is organized as follows: Section 2 offers related research, and Section 3 discusses the proposed model. Then, Section 4 elaborates on the experimental validation with the benchmark Kaggle dataset, and Section 5 draws the conclusions of the paper.

2. Literature Review

This section offers a detailed review of existing SA models related to COVID-19. Researchers in [9] analyzed Indian people’s sentiment during the lockdown. They used some popular hashtags for measuring negativity and positivity in people. Samuel et al. [10] highlighted public sentiments related to the COVID-19 pandemic using two machine learning (ML) classification techniques. The researchers in [11] presented an architecture in which a deep-learning-based language model was applied through a long short-term memory (LSTM) recurrent neural network for sentiment analysis during the increase in COVID-19 cases in India. In [12], COVID-19 tweet data were analyzed using a bidirectional encoder representations from Transformers (BERT) model. Gulati et al. [13] carried out a comparative analysis of ML-based classifiers, which were applied to more than 72,000 tweets related to COVID-19. Mujahid et al. [14] employed a Twitter dataset comprising 17,155 tweets regarding e-learning. ML and DL methods showed the potential, suitability, and capability for object detection, natural language processing, and image processing tasks. Luo and Xu [15] presented a DL method to explore customer opinion regarding restaurant features and to discover reviews with mismatched ratings. This study strengthens the extant literature by analyzing restaurant reviews posted during the COVID-19 pandemic and finding a DL algorithm for text mining tasks [16].
Singh et al. [17] proposed a DL technique for SA of Twitter data related to COVID-19. The suggested model depends on an LSTM–RNN-based network with feature weighting improved by an attention layer. This approach makes use of an improved feature transformation architecture through the attention model. Yin et al. [18] conducted a study of COVID-19 vaccination discussions on Twitter. The authors analyzed the deliberations of individuals on this topic and the emotional polarization between vaccine brands and perceptions of countries. The results showed that the majority of individuals trust the usefulness of vaccines and are willing to be vaccinated. In another study [19], the authors focused on increasing the understanding of public awareness of COVID-19 pandemic trends and uncovering meaningful themes of concern posted by Twitter users in the English language. An NLP method and the latent Dirichlet allocation model were utilized to classify, cluster, and identify themes based on keyword analysis, along with identifying the most common Twitter topics. In [20], data from an Arabic COVID-19-based tweet dataset were gathered. The data were processed with ML prediction models. The results showed that applying SVM classification together with bigrams in TF-IDF outperformed other algorithms, with 85% accuracy.
Lyu et al. [21] identified sentiments and topics in COVID-19 vaccine-related conversations among the public on social networking platforms and discerned the relevant changes in sentiments and topics over time for a better understanding of public emotions, perceptions, and concerns that might affect the achievement of herd immunity objectives. Basiri et al. [22] presented a methodology based on the fusion of four DL models and one traditional supervised ML method for SA of COVID-related tweets from eight countries. Moreover, the authors analyzed COVID-based searches using Google Trends for a better understanding of the changes in sentiment patterns at different places and times. Imran et al. [23] analyzed the reaction of citizens from various cultures to the novel COVID-19 and people’s sentiments regarding subsequent actions taken by many countries. A deep LSTM model was utilized for assessing the emotions and sentiment polarities of extracted tweets. In [24], GloVe and fastText were tested as word embeddings. Data collected from Twitter were prepared as stemmed and unstemmed datasets.
In short, SA can be considered a meaningful source of data mining, particularly for circumstances relevant to the requirement of examining massive quantities of publicly relevant data, such as investigating public behavior concerning the COVID-19 pandemic and its outcome on people’s lives. Furthermore, it is desirable to improve decision makers’ countermeasures and offer them an effortless method with a collection of common rules that assist complex decision-making processes depending on people’s sentiments and via examining and sorting an essential set of key features for COVID-19 posts. Thus, the proposed study in this paper varies from earlier research in combining DSS with SA for improving government decisions at the time of COVID-19. The use of the SFODLD-SAC model offers more insights and achieves better performance than other state-of-the-art techniques.

3. Materials and Methods

In this study, a novel SFODLD-SAC model was developed for the identification and classification of sentiments on COVID-19 tweets. The presented SFODLD-SAC model follows a series of processes—namely, preprocessing, TF-IDF feature extraction, CRNN classification, and SFO-based parameter optimization. Figure 1 illustrates the pipeline of the SFODLD-SAC model. The workflow of each module in the SFODLD-SAC model is elaborated in the following subsections.

3.1. Data Used

In this section, the performance of the SFODLD-SAC model on the COVID-19 tweet dataset is investigated [25]. The dataset holds 2750 instances with 11 class labels. The details related to the dataset are given in Table 1. Some sample tweets related to COVID-19 are provided in Table 2.

3.2. Data Preprocessing

At first, the SFODLD-SAC model preprocessed the tweets in distinct ways such as stemming and the removal of stopwords, usernames, links, punctuation, and numerals [25]:
  • Removing usernames and links in tweets that do not affect SA;
  • Removing punctuation marks and hashtag symbols, and converting the text to lower case;
  • Removing stopwords and numerals.
In addition, stemming was performed to reduce the terms to their root forms. Reducing terms in this way also helps to reduce the complexity of the text features. Then, the TextBlob approach was used to determine the sentiment scores. Afterward, the TF-IDF model was applied to the preprocessed data to extract useful features and generate a collection of feature vectors.
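The cleaning steps and the TF-IDF weighting described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the stopword list, the suffix-stripping `crude_stem` helper, and the particular TF-IDF variant (term frequency normalized by document length, multiplied by log(N/df)) are all assumptions chosen to keep the example self-contained.

```python
import math
import re
from collections import Counter

# Assumptions for illustration only: a tiny stopword list and a crude
# suffix stripper stand in for a full stopword corpus and a real stemmer.
STOPWORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "to", "and", "for", "at"}
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(token):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(tweet):
    """Remove links, usernames, punctuation and numerals; lowercase; drop stopwords; stem."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", tweet)  # links
    text = re.sub(r"@\w+", " ", text)                    # usernames
    text = re.sub(r"[^A-Za-z\s]", " ", text)             # punctuation, '#', numerals
    tokens = text.lower().split()
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


def tf_idf(docs):
    """Return one {term: weight} dictionary per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


tokens = preprocess("@user Staying home during #lockdown 2020! https://t.co/abc")
vectors = tf_idf([["stay", "home"], ["stay", "safe"]])
```

With this variant, a term that occurs in every document receives a weight of zero, so only terms that discriminate between tweets contribute to the feature vector.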

3.3. Sentiment Classification Using CRNN Model

For the effective recognition and classification of sentiments, the CRNN model was exploited [26]. An RNN is a type of artificial neural network (ANN) that extends a feedforward neural network (FFNN) with recurrent connections (loops). Unlike an FFNN, an RNN is able to process an input sequence using a recurrent hidden layer whose activation depends on that of the previous step. Given the sequential data $x_1, x_2, \ldots, x_T$, where $x_i$ denotes the data at the $i$-th time step, the RNN updates the recurrent hidden state $h_t$ as follows:
$$h_t = \begin{cases} 0, & \text{if } t = 0, \\ \phi(h_{t-1}, x_t), & \text{otherwise}, \end{cases} \qquad (1)$$
where $\phi$ indicates a nonlinear function. Accordingly, the RNN produces the outputs $y_1, y_2, \ldots, y_T$. Eventually, data classification is implemented using the final output $y_T$. In the traditional RNN model, the update rule of the recurrent hidden layer in (1) can be implemented by
$$h_t = \phi(W x_t + U h_{t-1}), \qquad (2)$$
where $W$ and $U$ represent the coefficient matrices for the input and for the activation of the recurrent hidden units, respectively. The probability of the sequence, $p(x_1, x_2, \ldots, x_T)$, can be factorized as follows:
$$p(x_1, x_2, \ldots, x_T) = p(x_1) \cdots p(x_T \mid x_1, \ldots, x_{T-1}). \qquad (3)$$
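As a minimal sketch of the recurrence $h_t = \phi(W x_t + U h_{t-1})$ with $h_0 = 0$, the following pure-Python example folds a short sequence into its final hidden state. The choice of tanh for $\phi$, the list-of-lists matrix layout, and the function names `rnn_step` and `run_rnn` are assumptions made for illustration; the source does not specify them.

```python
import math


def rnn_step(x_t, h_prev, W, U):
    """Compute h_t = tanh(W x_t + U h_prev); W and U are lists of rows."""
    hidden = []
    for w_row, u_row in zip(W, U):
        pre = sum(w * x for w, x in zip(w_row, x_t))
        pre += sum(u * h for u, h in zip(u_row, h_prev))
        hidden.append(math.tanh(pre))
    return hidden


def run_rnn(sequence, W, U):
    """Start from h_0 = 0 and return the final hidden state h_T."""
    h = [0.0] * len(W)
    for x_t in sequence:
        h = rnn_step(x_t, h, W, U)
    return h


# One hidden unit, one input feature, two time steps.
final_state = run_rnn([[1.0], [0.0]], W=[[1.0]], U=[[0.5]])
```

The final state depends on the whole sequence through the chain of hidden states, which is why a classifier can be attached to the last step only.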
Next, the conditional likelihood distribution can be modeled by utilizing a recurrent network:
$$p(x_t \mid x_1, \ldots, x_{t-1}) = \phi(h_t). \qquad (4)$$
The tweets can be processed as sequence data, and a recurrent network is employed to model spectral sequence [26]. In contrast to the LSTM unit, the gated recurrent unit (GRU) has fewer parameters and needs fewer training instances. Therefore, the GRU was chosen as the key element of the RNN. The essential components of the GRU are two gating units that control the data flow within the unit. Figure 2 depicts the framework of CRNN. The hidden state of the GRU is updated as
$$h_t = (1 - u_t)\, h_{t-1} + u_t\, \tilde{h}_t, \qquad (5)$$
where $\tilde{h}_t$ is the candidate activation and $u_t$ symbolizes the update gate, given by
$$u_t = \sigma(w_u x_t + v_u h_{t-1}). \qquad (6)$$
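The role of the update gate can be illustrated with a scalar sketch: the gate $u_t = \sigma(w_u x_t + v_u h_{t-1})$ decides how far the new state moves from $h_{t-1}$ toward the candidate $\tilde{h}_t$. The scalar setting and the function names are assumptions for illustration, and the candidate value is passed in directly because the text does not define how it is computed.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def update_gate(x_t, h_prev, w_u, v_u):
    """u_t = sigmoid(w_u * x_t + v_u * h_prev) for scalar input and state."""
    return sigmoid(w_u * x_t + v_u * h_prev)


def gru_interpolate(h_prev, h_candidate, u_t):
    """h_t = (1 - u_t) * h_prev + u_t * h_candidate."""
    return (1.0 - u_t) * h_prev + u_t * h_candidate


u = update_gate(x_t=0.0, h_prev=0.0, w_u=1.0, v_u=1.0)
h_new = gru_interpolate(h_prev=1.0, h_candidate=3.0, u_t=u)
```

A gate near 0 keeps the previous state almost unchanged, while a gate near 1 overwrites it with the candidate.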

3.4. Parameter Optimization

Finally, the SFO algorithm was utilized to optimally adjust the hyperparameters involved in the CRNN model. Gomes et al. [27] introduced the SFO algorithm, a nature-inspired approach based on flowering plants that takes into account the pollination technique involved in the biological process of reproduction.
Generally, the SFO algorithm involves six steps, as given in Figure 3. It starts with the parameter initiation process, during which the number of sunflowers, maximum iterations, and solution dimension space are initialized. Then, the sunflower parameters such as pollination rate, mortality rate, and survival rate are fixed. In the third step, the optimal objective of every sunflower is arbitrarily chosen. Next, the optimal sunflower is updated. Afterward, the new sunflower is produced depending upon the pollination and mortality rate. In the final step, the termination condition is checked, and the process continues until the stopping criteria are fulfilled. The mathematical modeling of the SFO algorithm is given in what follows.
This algorithm considers the peculiar behavior of sunflowers in finding the best orientation toward the sun. Pollination is considered to occur randomly, with minimal distance between flower $i$ and flower $i+1$. In reality, a flower patch releases billions of pollen gametes; for simplicity, it was assumed that each sunflower only generates one pollen gamete and reproduces individually. The amount of heat $Q$ received by each plant is given by
$$Q_i = \frac{P}{4 \pi r_i^2}, \qquad (7)$$
where $P$ denotes the source power, and $r_i$ indicates the distance between plant $i$ and the current best plant. The sunflower’s direction toward the sun can be represented as follows:
$$s_i = \frac{X^* - X_i}{\lVert X^* - X_i \rVert}, \quad i = 1, 2, \ldots, n_p, \qquad (8)$$
where $X^*$ denotes the best plant. The step of the sunflowers in direction $s$ is evaluated by
$$d_i = \lambda \times P_i(\lVert X_i + X_{i-1} \rVert) \times \lVert X_i + X_{i-1} \rVert, \qquad (9)$$
where $\lambda$ represents a constant value, and $P_i(\lVert X_i + X_{i-1} \rVert)$ denotes the pollination probability, i.e., sunflower $i$ pollinates with its neighbor $i-1$, creating a new individual in an arbitrary position that varies according to the distance between the flowers. Specifically, individuals nearer the sun take smaller steps, performing a local refinement search. Additionally, it is necessary to bound the maximal step taken by an individual. Hence, it is defined as
$$d_{\max} = \frac{\lVert X_{\max} - X_{\min} \rVert}{2 \times N_{pop}}, \qquad (10)$$
where $X_{\max}$ and $X_{\min}$ indicate the upper and lower bounds, and $N_{pop}$ represents the number of plants in the population. The new position is then expressed as follows:
$$X_{t+1} = X_i + d_i \times s_i. \qquad (11)$$
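The geometric part of the update (unit direction toward the best plant, the bound on the step length, and the move itself) can be sketched as below. This is an illustrative fragment under stated assumptions, not the full SFO algorithm: the step length is passed in by the caller because the form of the pollination probability is not given in the text, and the function names `orientation`, `max_step`, and `move` are hypothetical.

```python
import math


def orientation(best, x):
    """Unit vector pointing from plant x toward the best plant."""
    diff = [b - xi for b, xi in zip(best, x)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:  # the plant already sits on the best position
        return [0.0] * len(x)
    return [d / norm for d in diff]


def max_step(x_max, x_min, n_pop):
    """Largest allowed step: ||x_max - x_min|| / (2 * n_pop)."""
    span = math.sqrt(sum((hi - lo) ** 2 for hi, lo in zip(x_max, x_min)))
    return span / (2 * n_pop)


def move(x, best, step, d_max):
    """Move x toward the best plant by min(step, d_max) along the unit direction."""
    s = orientation(best, x)
    d = min(step, d_max)
    return [xi + d * si for xi, si in zip(x, s)]


d_max = max_step(x_max=[10.0, 10.0], x_min=[0.0, 0.0], n_pop=5)
new_position = move(x=[0.0, 0.0], best=[3.0, 4.0], step=10.0, d_max=1.0)
```

In a hyperparameter search, each position vector would encode one candidate configuration of the classifier, and the best plant would be the configuration with the lowest fitness value found so far.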
The SFO approach derives a fitness function (FF) for achieving enhanced classification performance. In this case, the classifier error rate, which is to be minimized, was taken as the FF, as determined by Equation (12). The best solution has a minimal error rate, and the worst solution has a high error rate.
$$\mathrm{ClassifierErrorRate}(x_i) = \frac{\text{number of misclassified tweets}}{\text{total number of tweets}} \times 100. \qquad (12)$$
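The error-rate fitness is simply the percentage of misclassified tweets. A minimal sketch follows; the function name and the label-list interface are assumptions for illustration.

```python
def classifier_error_rate(y_true, y_pred):
    """Percentage of predictions that differ from the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labeled tweet is required")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true) * 100


# One of four tweets is misclassified.
rate = classifier_error_rate([0, 1, 2, 2], [0, 1, 1, 2])
```

An optimizer that minimizes this value is equivalently maximizing classification accuracy on the evaluated set.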

4. Performance Validation

4.1. Result Analysis

Figure 4 illustrates a set of confusion matrices formed by the SFODLD-SAC model on a test dataset. The figures indicate that the SFODLD-SAC model ensured the effective identification of distinct class labels on 70% of the training set (TRS) and 30% of the testing set (TSS).
Table 3 provides the detailed classification outcomes of the SFODLD-SAC model on 70% of TRS. The experimental results revealed that the proposed model provided effective outcomes under all class labels.
Figure 5 reports a brief result of the SFODLD-SAC model on 70% of TRS in terms of $accu_y$, $prec_n$, and $reca_l$. The results indicated that the SFODLD-SAC model accomplished effective results under each class. For instance, the SFODLD-SAC model identified class 0 with $accu_y$, $prec_n$, and $reca_l$ of 99.64, 99.69, and 99.43%, respectively. In line with this, the SFODLD-SAC model identified class 5 with $accu_y$, $prec_n$, and $reca_l$ of 99.74, 99.44, and 97.78%, respectively. Moreover, the SFODLD-SAC model identified class 10 with $accu_y$, $prec_n$, and $reca_l$ of 99.01, 96.95, and 91.91%, respectively.
Figure 6 offers detailed results of the SFODLD-SAC model on 70% of TRS in terms of $spec_y$, $F_{score}$, and MCC. The experimental values denoted that the SFODLD-SAC model led to proficient performance levels in all classes. For instance, the SFODLD-SAC model recognized class 0 with $spec_y$, $F_{score}$, and MCC of 99.66, 98.04, and 97.85%, respectively. In line with this, the SFODLD-SAC model recognized class 5 with $spec_y$, $F_{score}$, and MCC of 99.94, 98.60, and 98.46%, respectively. In addition, the SFODLD-SAC model categorized class 10 with $spec_y$, $F_{score}$, and MCC of 99.71, 94.36, and 93.86%, respectively.
Figure 7 highlights the average classification performance of the SFODLD-SAC model on 70% of TRS. The results indicated that the SFODLD-SAC model accomplished average $accu_y$, $prec_n$, and $reca_l$ values of 99.48, 97.22, and 97.14%, respectively. Thus, the SFODLD-SAC model accomplished effective sentiment classification on tweets.
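The per-class measures reported above can be derived from a multiclass confusion matrix by treating each class in a one-vs-rest manner. The sketch below assumes this convention, which the text does not state explicitly, and uses a small hypothetical two-class matrix rather than the paper's data; the function names are likewise illustrative.

```python
import math


def per_class_metrics(cm, k):
    """One-vs-rest metrics for class k; cm[i][j] counts true class i predicted as j."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp
    fp = sum(row[k] for row in cm) - tp
    tn = total - tp - fn - fp

    def ratio(num, den):
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, total),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f_score": ratio(2 * precision * recall, precision + recall),
        "mcc": ratio(tp * tn - fp * fn,
                     math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


def macro_average(cm, name):
    """Unweighted mean of one metric over all classes."""
    return sum(per_class_metrics(cm, k)[name] for k in range(len(cm))) / len(cm)


# Hypothetical confusion matrix: rows are true classes, columns are predictions.
example_cm = [[8, 2], [1, 9]]
class0 = per_class_metrics(example_cm, 0)
mean_recall = macro_average(example_cm, "recall")
```

Under this convention, per-class accuracy counts true negatives from all other classes, which explains why it can stay high even when the recall of a class is noticeably lower.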
Table 4 provides the detailed classification outcomes of the SFODLD-SAC model on 30% of TSS. Figure 8 showcases a comparative result of the SFODLD-SAC model on 30% of TSS in terms of $accu_y$, $prec_n$, and $reca_l$. The figure shows that the SFODLD-SAC technique attained improved performance under all class labels. For instance, the SFODLD-SAC model recognized class 0 with $accu_y$, $prec_n$, and $reca_l$ of 99.52, 96.05, and 98.65%, respectively. Moreover, the SFODLD-SAC method identified class 5 with $accu_y$, $prec_n$, and $reca_l$ of 99.76, 98.57, and 98.57%, respectively. Furthermore, the SFODLD-SAC model recognized class 10 with $accu_y$, $prec_n$, and $reca_l$ of 99.76, 100, and 97.40%, respectively.
Figure 9 presents a detailed comparative study of the SFODLD-SAC model on 30% of TSS in terms of $spec_y$, $F_{score}$, and MCC. The experimental values revealed that the SFODLD-SAC model gained better results under each class. For instance, the SFODLD-SAC model identified class 0 with $spec_y$, $F_{score}$, and MCC of 99.60, 97.33, and 97.08%, respectively. At the same time, the SFODLD-SAC model identified class 5 with $spec_y$, $F_{score}$, and MCC of 99.87, 98.57, and 98.44%, respectively. Also, the SFODLD-SAC model identified class 10 with $spec_y$, $F_{score}$, and MCC of 100, 98.68, and 98.56%, respectively.
Figure 10 showcases the average classification performance of the SFODLD-SAC model on 30% of TSS. The results revealed that the SFODLD-SAC model provided average $accu_y$, $prec_n$, and $reca_l$ values of 99.76, 98.12, and 98.05%, respectively. Therefore, the SFODLD-SAC model accomplished effective sentiment classification on tweets.
The training accuracy (TA) and validation accuracy (VA) attained by the SFODLD-SAC model on COVID-19 tweet sentiment classification are demonstrated in Figure 11. Based on the experimental outcomes, the SFODLD-SAC model gained maximum values of TA and VA. Specifically, VA seemed to be higher than TA.
The training loss (TL) and validation loss (VL) achieved by the SFODLD-SAC model on COVID-19 tweet sentiment classification are shown in Figure 12. Based on the experimental outcomes, it can be inferred that the SFODLD-SAC model accomplished the least values of TL and VL. Specifically, VL seemed to be lower than TL. The results denoted that the SFODLD-SAC model exhibited its ability in categorizing different classes on the test datasets.

4.2. Discussion

To highlight the supremacy of the SFODLD-SAC model, a comparative study with recent approaches [12] was conducted, the results of which are shown in Table 5 and Figure 13. The experimental outcomes stated that the SVM and DT models showed the least classification performance over the other methods. At the same time, the RF and XGBoost models accomplished slightly improved outcomes over the other techniques. In addition, the extra tree classifier accomplished reasonable performance with $accu_y$, $prec_n$, $reca_l$, and $F1_{score}$ of 92.32, 93.08, 92.42, and 92.13%, respectively.
However, the SFODLD-SAC model accomplished superior outcomes with maximum $accu_y$, $prec_n$, $reca_l$, and $F1_{score}$ of 99.65, 98.12, 98.05, and 98.06%, respectively. The above-mentioned results and discussion demonstrate that the SFODLD-SAC model accomplished effective classification performance on COVID-19 tweets. The enhanced performance of the proposed model is due to the optimal hyperparameter tuning of the CRNN model using the SFO algorithm.

5. Conclusions

In this study, a novel SFODLD-SAC model was introduced for the recognition and classification of sentiments on COVID-19 tweets. At the initial stage, the SFODLD-SAC model preprocessed the tweets in distinct ways, such as stemming and the removal of stopwords, usernames, links, punctuation, and numerals. Then, the TF-IDF model was applied to extract useful features from the preprocessed data. Afterward, the features were passed into the CRNN model to analyze and classify sentiments. Lastly, the SFO algorithm was utilized to optimally adjust the hyperparameters of the CRNN model. A simulation analysis of the SFODLD-SAC model was performed using a benchmark dataset from the Kaggle repository. Extensive comparative results report the promising performance of the SFODLD-SAC model over other recent state-of-the-art models, with maximum $accu_y$, $prec_n$, $reca_l$, and $F1_{score}$ of 99.65, 98.12, 98.05, and 98.06%, respectively. Thus, the presented SFODLD-SAC model can be applied for enhanced SA on COVID-19 tweets, as well as in big data environments to analyze sentiments in real time. In the future, outlier detection and clustering models can be employed to improve the sentiment classification performance. Moreover, the proposed SFODLD-SAC model can be extended to the design of an ensemble voting-based fusion model to improve classification performance. In addition, metaheuristic feature selection techniques can be designed to reduce the curse of dimensionality. Finally, different data preprocessing approaches can be employed to improve the input data quality.

Author Contributions

Conceptualization, N.A.A. and H.T.H.; methodology, S.A.-K. and R.F.M.; software, S.A.-K. and R.F.M.; validation, Y.A., A.M.M. and H.T.H.; formal analysis, N.A.A., A.M.M. and H.T.H.; investigation, S.A.-K. and Y.A.; resources, H.T.H.; data curation, A.M.M.; writing—original draft preparation, R.F.M., S.A.-K. and Y.A.; writing—review and editing, N.A.A., A.M.M. and H.T.H.; visualization, Y.A., A.M.M. and H.T.H.; supervision, S.A.-K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Taif University.

Data Availability Statement

Data sharing is not applicable to this article, as no datasets were generated during the current study.

Acknowledgments

Taif University Researchers Supporting Project number (TURSP-2020/154), Taif University, Taif, Saudi Arabia.

Conflicts of Interest

The authors declare that they have no conflicts of interest. The manuscript was written with the contributions of all authors. All authors have given approval to the final version of the manuscript.

References

  1. Mohan, S.; Solanki, A.K.; Taluja, H.K.; Singh, A. Predicting the impact of the third wave of COVID-19 in India using hybrid statistical machine learning models: A time series forecasting and sentiment analysis approach. Comput. Biol. Med. 2022, 144, 105354.
  2. Kaur, H.; Ahsaan, S.U.; Alankar, B.; Chang, V. A proposed sentiment analysis deep learning algorithm for analyzing COVID-19 tweets. Inf. Syst. Front. 2021, 23, 1417–1429.
  3. Mansour, R.F.; Escorcia-Gutierrez, J.; Gamarra, M.; Gupta, D.; Castillo, O.; Kumar, S. Unsupervised deep learning based variational autoencoder model for COVID-19 diagnosis and classification. Pattern Recognit. Lett. 2021, 151, 267–274.
  4. Muthumayil, K.; Buvana, M.; Sekar, K.R.; Amraoui, A.E.; Nouaouri, I.; Mansour, R.F. Optimized convolutional neural network for automatic detection of COVID-19. Comput. Mater. Contin. 2021, 70, 1159–1175.
  5. Xue, Y.; Onzo, B.M.; Mansour, R.F.; Su, S.B. Deep Convolutional Neural Network Approach for COVID-19 Detection. Comput. Syst. Sci. Eng. 2022, 42, 201–211.
  6. Satu, M.S.; Khan, M.I.; Mahmud, M.; Uddin, S.; Summers, M.A.; Quinn, J.M.; Moni, M.A. TClustVID: A novel machine learning classification model to investigate topics and sentiment in COVID-19 tweets. Knowl.-Based Syst. 2021, 226, 107126.
  7. Naseem, U.; Razzak, I.; Khushi, M.; Eklund, P.W.; Kim, J. COVIDSenti: A large-scale benchmark Twitter data set for COVID-19 sentiment analysis. IEEE Trans. Comput. Soc. Syst. 2021, 8, 1003–1015.
  8. Pano, T.; Kashef, R. A complete VADER-based sentiment analysis of bitcoin (BTC) tweets during the era of COVID-19. Big Data Cogn. Comput. 2020, 4, 33.
  9. Alamoodi, A.H.; Zaidan, B.B.; Zaidan, A.A.; Albahri, O.S.; Mohammed, K.I.; Malik, R.Q.; Almahdi, E.M.; Chyad, M.A.; Tareq, Z.; Albahri, A.S.; et al. Sentiment analysis and its applications in fighting COVID-19 and infectious diseases: A systematic review. Expert Syst. Appl. 2021, 167, 114155.
  10. Samuel, J.; Ali, G.G.; Rahman, M.; Esawi, E.; Samuel, Y. COVID-19 public sentiment insights and machine learning for tweets classification. Information 2020, 11, 314.
  11. Chandra, R.; Krishna, A. COVID-19 sentiment analysis via deep learning during the rise of novel cases. PLoS ONE 2021, 16, e0255615.
  12. Chintalapudi, N.; Battineni, G.; Amenta, F. Sentimental analysis of COVID-19 tweets using deep learning models. Infect. Dis. Rep. 2021, 13, 329–339.
  13. Gulati, K.; Kumar, S.S.; Boddu, R.S.K.; Sarvakar, K.; Sharma, D.K.; Nomani, M.Z.M. Comparative analysis of machine learning-based classification models using sentiment classification of tweets related to COVID-19 pandemic. Mater. Today Proc. 2022, 51, 38–41.
  14. Mujahid, M.; Lee, E.; Rustam, F.; Washington, P.B.; Ullah, S.; Reshi, A.A.; Ashraf, I. Sentiment analysis and topic modeling on tweets about online education during COVID-19. Appl. Sci. 2021, 11, 8438.
  15. Luo, Y.; Xu, X. Comparative study of deep learning models for analyzing online restaurant reviews in the era of the COVID-19 pandemic. Int. J. Hosp. Manag. 2021, 94, 102849.
  16. Rustam, F.; Khalid, M.; Aslam, W.; Rupapara, V.; Mehmood, A.; Choi, G.S. A performance comparison of supervised machine learning models for Covid-19 tweets sentiment analysis. PLoS ONE 2021, 16, e0245909.
  17. Singh, C.; Imam, T.; Wibowo, S.; Grandhi, S. A Deep Learning Approach for Sentiment Analysis of COVID-19 Reviews. Appl. Sci. 2022, 12, 3709.
  18. Yin, H.; Song, X.; Yang, S.; Li, J. Sentiment analysis and topic modeling for COVID-19 vaccine discussions. World Wide Web 2022, 1–17.
  19. Boon-Itt, S.; Skunkan, Y. Public perception of the COVID-19 pandemic on Twitter: Sentiment analysis and topic modeling study. JMIR Public Health Surveill. 2020, 6, e21978.
  20. Aljameel, S.S.; Alabbad, D.A.; Alzahrani, N.A.; Alqarni, S.M.; Alamoudi, F.A.; Babili, L.M.; Aljaafary, S.K.; Alshamrani, F.M. A sentiment analysis approach to predict an individual’s awareness of the precautionary procedures to prevent COVID-19 outbreaks in Saudi Arabia. Int. J. Environ. Res. Public Health 2021, 18, 218.
  21. Lyu, J.C.; Le Han, E.; Luli, G.K. COVID-19 vaccine–related discussion on Twitter: Topic modeling and sentiment analysis. J. Med. Internet Res. 2021, 23, e24435.
  22. Basiri, M.E.; Nemati, S.; Abdar, M.; Asadi, S.; Acharya, U.R. A novel fusion-based deep learning model for sentiment analysis of COVID-19 tweets. Knowl.-Based Syst. 2021, 228, 107242.
  23. Imran, A.S.; Daudpota, S.M.; Kastrati, Z.; Batra, R. Cross-cultural polarity and emotion detection using sentiment analysis and deep learning on COVID-19 related tweets. IEEE Access 2020, 8, 181074–181090.
  24. Agustiningsih, K.K.; Utami, E.; Alsyaibani, M.A. Sentiment Analysis of COVID-19 Vaccines in Indonesia on Twitter Using Pre-Trained and Self-Training Word Embeddings. J. Ilmu Komput. Dan Inf. 2022, 15, 39–46.
  25. Sentiment Analysis of COVID-19 Related Tweets. Available online: https://www.kaggle.com/competitions/sentiment-analysis-of-covid-19-related-tweets/data?select=validation.csv (accessed on 1 April 2022).
  26. Shankar, K.; Perumal, E.; Díaz, V.G.; Tiwari, P.; Gupta, D.; Saudagar, A.K.J.; Muhammad, K. An optimal cascaded recurrent neural network for intelligent COVID-19 detection using Chest X-ray images. Appl. Soft Comput. 2021, 113, 107878.
  27. Gomes, G.F.; da Cunha, S.S.; Ancelotti, A.C. A sunflower optimization (SFO) algorithm applied to damage identification on laminated composite plates. Eng. Comput. 2019, 35, 619–626.
Figure 1. Overall process of SFODLD-SAC technique.
Figure 2. CRNN structure.
Figure 3. Flowchart of SFO.
Figure 4. Confusion matrix of SFODLD-SAC technique. (a) The training set (TRS) and (b) the testing set (TSS).
Figure 5. $Acc_y$, $Prec_n$, and $Reca_l$ analysis of SFODLD-SAC technique on 70% of TRS for classes 0–10.
Figure 6. $Spec_y$, $F_{score}$, and MCC analysis of SFODLD-SAC technique on 70% of TRS for classes 0–10.
Figure 7. Average analysis of SFODLD-SAC technique on 70% of TRS.
Figure 8. $Acc_y$, $Prec_n$, and $Reca_l$ analyses of SFODLD-SAC technique on 30% of TSS for classes 0–10.
Figure 9. $Spec_y$, $F_{score}$, and MCC analyses of SFODLD-SAC technique on 30% of TSS for classes 0–10.
Figure 10. Average analysis of SFODLD-SAC technique on 30% of TSS.
Figure 11. TA and VA analyses of SFODLD-SAC technique.
Figure 12. TL and VL analyses of SFODLD-SAC technique.
Figure 13. Comparative analysis of SFODLD-SAC technique with existing approaches.
Table 1. Dataset details.

| Class Label | Class Name | No. of Instances |
|---|---|---|
| Class 0 | Optimistic | 250 |
| Class 1 | Thankful | 250 |
| Class 2 | Empathetic | 250 |
| Class 3 | Pessimistic | 250 |
| Class 4 | Anxious | 250 |
| Class 5 | Sad | 250 |
| Class 6 | Annoyed | 250 |
| Class 7 | Denial | 250 |
| Class 8 | Surprise | 250 |
| Class 9 | Official report | 250 |
| Class 10 | Joking | 250 |
Table 2. Sample tweets.

| ID | Tweets | Labels |
|---|---|---|
| 1 | NO JOKE I WILL HOP ON A PLANE RN! (Well after COVID-19 lol) | (0) (10) |
| 2 | Has anyone else FB ads been killing it since this coronavirus hit? | (0) (5) (10) |
| 3 | Im waiting for someone to say to me that all this corona thing is just an April fool’s joke | (3) (4) |
| 4 | He is a liar. Proven day night. Time again. Lies when the truth will do. COVID-19 | (6) |
| 5 | NEW: U.S. CoronaVirus death toll reaches 4000 after nearly 900 new deaths were reported today (BNO News) COVID-19 CoronaVirusOutbreak | (8) |
| 6 | Coronavirus impact Govt extends I-T deadlines related to Sections 80C, 80D | (5) (8) |
| 7 | That moment you realize your new medication has side effects identical to coronavirus symptoms how will I know? | (4) (9) |
| 8 | Watch the government play off Corona virus as a big April Fool’s Joke | (10) |
| 9 | The problem of poverty has now covered the cover of religion. The issue has changed. There is relief from corona. All is well | (0) (4) |
| 10 | My mental health hasn’t suffered at all under the coronavirus quarantine! Ha-ha, April Fools. | (10) |
| 11 | i cannot die before watching a concert live coronavirus pls try to understand | (5) (10) |
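The label column of Table 2 uses the class IDs of Table 1, and a tweet may carry more than one label. The following is a minimal sketch of how such a label string could be decoded; the function names `parse_labels` and `to_multi_hot` are hypothetical helpers introduced for illustration and are not part of the paper's pipeline.

```python
import re

# Class IDs and names as listed in Table 1.
CLASS_NAMES = {
    0: "Optimistic", 1: "Thankful", 2: "Empathetic", 3: "Pessimistic",
    4: "Anxious", 5: "Sad", 6: "Annoyed", 7: "Denial", 8: "Surprise",
    9: "Official report", 10: "Joking",
}


def parse_labels(label_field):
    """Extract integer class IDs from a label string such as '(0) (5) (10)'."""
    return [int(m) for m in re.findall(r"\((\d+)\)", label_field)]


def to_multi_hot(ids, n_classes=11):
    """Hypothetical helper: encode class IDs as a 0/1 vector of length n_classes."""
    vec = [0] * n_classes
    for i in ids:
        vec[i] = 1
    return vec


# Labels of sample tweet 2 in Table 2.
ids = parse_labels("(0) (5) (10)")
names = [CLASS_NAMES[i] for i in ids]
print(ids, names, to_multi_hot(ids))
```

Whether the classifier consumes one label per tweet or a multi-hot vector is not stated in this part of the article, so the encoding step is an assumption.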
Table 3. Result analysis of SFODLD-SAC technique with distinct measures on 70% of TRS.

Training Set (70%)

| Class Labels | Accuracy (%) | Precision (%) | Recall (%) | Specificity (%) | F-Score (%) | MCC (%) |
|---|---|---|---|---|---|---|
| 0 | 99.64 | 96.69 | 99.43 | 99.66 | 98.04 | 97.85 |
| 1 | 99.12 | 96.47 | 93.71 | 99.66 | 95.07 | 94.60 |
| 2 | 99.48 | 96.05 | 98.27 | 99.60 | 97.14 | 96.86 |
| 3 | 99.38 | 98.84 | 94.48 | 99.89 | 96.61 | 96.30 |
| 4 | 99.12 | 91.92 | 99.45 | 99.08 | 95.54 | 95.14 |
| 5 | 99.74 | 99.44 | 97.78 | 99.94 | 98.60 | 98.46 |
| 6 | 99.84 | 100.00 | 98.20 | 100.00 | 99.09 | 99.01 |
| 7 | 99.69 | 98.25 | 98.25 | 99.83 | 98.25 | 98.07 |
| 8 | 99.79 | 98.31 | 99.43 | 99.83 | 98.86 | 98.75 |
| 9 | 99.48 | 96.53 | 97.66 | 99.66 | 97.09 | 96.81 |
| 10 | 99.01 | 96.95 | 91.91 | 99.71 | 94.36 | 93.86 |
| Average | 99.48 | 97.22 | 97.14 | 99.71 | 97.15 | 96.88 |
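The per-class measures in Table 3 can be read as one-vs-rest quantities derived from a multiclass confusion matrix such as the one in Figure 4. The sketch below assumes the standard definitions of accuracy, precision, recall, specificity, F-score, and MCC. The function name `one_vs_rest_metrics` is hypothetical and the 3-class matrix is made-up toy data, not the paper's results.

```python
import math


def one_vs_rest_metrics(cm, k):
    """Metrics for class k from a square confusion matrix cm[true][pred].

    Assumes every denominator is nonzero (class k occurs and is predicted).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                       # class-k items predicted as another class
    fp = sum(cm[i][k] for i in range(n)) - tp  # other items predicted as class k
    tn = total - tp - fn - fp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f_score": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn) / denom,
    }


# Toy 3-class matrix (made-up counts, not the paper's data).
toy_cm = [
    [8, 1, 1],
    [0, 9, 1],
    [2, 0, 8],
]
for k in range(3):
    metrics = one_vs_rest_metrics(toy_cm, k)
    print(k, {name: round(100 * v, 2) for name, v in metrics.items()})
```

Because the true negatives of a one-vs-rest split cover all other classes, per-class accuracy and specificity tend to sit well above recall when there are many classes, which matches the pattern in the table.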
Table 4. Result analysis of SFODLD-SAC technique with distinct measures on 30% of TSS.

Testing Set (30%)

| Class Labels | Accuracy (%) | Precision (%) | Recall (%) | Specificity (%) | F-Score (%) | MCC (%) |
|---|---|---|---|---|---|---|
| 0 | 99.52 | 96.05 | 98.65 | 99.60 | 97.33 | 97.08 |
| 1 | 99.88 | 100.00 | 98.67 | 100.00 | 99.33 | 99.26 |
| 2 | 99.64 | 96.25 | 100.00 | 99.60 | 98.09 | 97.91 |
| 3 | 99.39 | 100.00 | 92.75 | 100.00 | 96.24 | 95.99 |
| 4 | 99.88 | 98.53 | 100.00 | 99.87 | 99.26 | 99.20 |
| 5 | 99.76 | 98.57 | 98.57 | 99.87 | 98.57 | 98.44 |
| 6 | 99.76 | 100.00 | 97.59 | 100.00 | 98.78 | 98.65 |
| 7 | 99.64 | 96.34 | 100.00 | 99.60 | 98.14 | 97.96 |
| 8 | 99.64 | 97.37 | 98.67 | 99.73 | 98.01 | 97.82 |
| 9 | 99.27 | 96.20 | 96.20 | 99.60 | 96.20 | 95.80 |
| 10 | 99.76 | 100.00 | 97.40 | 100.00 | 98.68 | 98.56 |
| Average | 99.65 | 98.12 | 98.05 | 99.81 | 98.06 | 97.88 |
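The Average row of Table 4 appears to be the unweighted (macro) mean over the 11 classes. The following sketch checks that reading by recomputing the means from the per-class values transcribed from the table; the variable names are illustrative only.

```python
# Per-class values transcribed from Table 4 (testing set), classes 0-10.
table4 = {
    "Accuracy":    [99.52, 99.88, 99.64, 99.39, 99.88, 99.76, 99.76, 99.64, 99.64, 99.27, 99.76],
    "Precision":   [96.05, 100.00, 96.25, 100.00, 98.53, 98.57, 100.00, 96.34, 97.37, 96.20, 100.00],
    "Recall":      [98.65, 98.67, 100.00, 92.75, 100.00, 98.57, 97.59, 100.00, 98.67, 96.20, 97.40],
    "Specificity": [99.60, 100.00, 99.60, 100.00, 99.87, 99.87, 100.00, 99.60, 99.73, 99.60, 100.00],
    "F-Score":     [97.33, 99.33, 98.09, 96.24, 99.26, 98.57, 98.78, 98.14, 98.01, 96.20, 98.68],
    "MCC":         [97.08, 99.26, 97.91, 95.99, 99.20, 98.44, 98.65, 97.96, 97.82, 95.80, 98.56],
}

# Average row as reported in Table 4.
reported_average = {
    "Accuracy": 99.65, "Precision": 98.12, "Recall": 98.05,
    "Specificity": 99.81, "F-Score": 98.06, "MCC": 97.88,
}

# Unweighted mean over classes, rounded to two decimals like the table.
macro_average = {
    metric: round(sum(values) / len(values), 2) for metric, values in table4.items()
}

for metric in table4:
    print(f"{metric:12s} macro mean = {macro_average[metric]:.2f}, "
          f"reported = {reported_average[metric]:.2f}")
```

Under this assumption the recomputed means agree with the reported Average row to two decimal places for all six measures.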
Table 5. Comparative analysis of SFODLD-SAC technique with existing approaches.

| Methods | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) |
|---|---|---|---|---|
| Random Forest | 90.13 | 91.22 | 90.30 | 90.29 |
| XGBoost Algorithm | 90.16 | 90.35 | 90.39 | 90.36 |
| Support Vector Machine | 89.43 | 89.29 | 89.12 | 89.18 |
| Extra Tree Classifier | 92.32 | 93.08 | 92.42 | 92.13 |
| Decision Tree | 89.29 | 89.47 | 89.21 | 89.29 |
| SFODLD-SAC | 99.65 | 98.12 | 98.05 | 98.06 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

MDPI and ACS Style

Alkhaldi, N.A.; Asiri, Y.; Mashraqi, A.M.; Halawani, H.T.; Abdel-Khalek, S.; Mansour, R.F. Leveraging Tweets for Artificial Intelligence Driven Sentiment Analysis on the COVID-19 Pandemic. Healthcare 2022, 10, 910. https://doi.org/10.3390/healthcare10050910
