Article

Semantic Analysis and Topic Modelling of Web-Scrapped COVID-19 Tweet Corpora through Data Mining Methodologies

1 School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar 751024, Odisha, India
2 School of Computer Applications, KIIT Deemed to be University, Bhubaneswar 751024, Odisha, India
3 Department of Mathematics, Pandit Deendayal Energy University, Gandhinagar 382426, Gujarat, India
4 Faculty of Economics and Administrative Sciences, Universidad Católica de la Santísima Concepción, Concepción 4030000, Chile
5 Department of Computer Science and Engineering, School of Computing and Information Technology, Manipal University Jaipur, Jaipur 303007, Rajasthan, India
6 ERCIM Postdoctoral Fellow, Department of ICT and Science, Norwegian University of Science and Technology, Ankeret, B-315, Ålesund, Torgarden, 7491 Trondheim, Norway
7 Department of CSE, School of Engineering and Technology, CHRIST Deemed to be University, Bengaluru 560029, Karnataka, India
* Author to whom correspondence should be addressed.
Healthcare 2022, 10(5), 881; https://doi.org/10.3390/healthcare10050881
Submission received: 30 March 2022 / Revised: 2 May 2022 / Accepted: 5 May 2022 / Published: 10 May 2022

Abstract

The outbreak of coronavirus disease (COVID-19) took a toll on the social, healthcare, economic, and psychological well-being of people. In the past couple of months, many organizations, individuals, and governments have used Twitter to convey their sentiments on COVID-19, the lockdown, the pandemic, and related hashtags. This paper analyzes the psychological reactions and discourse of Twitter users related to COVID-19. Latent Dirichlet Allocation (LDA) is used for topic modeling. In addition, a Bidirectional Long Short-Term Memory (BiLSTM) model and several classification techniques (random forest, support vector machine, logistic regression, naive Bayes, decision tree, logistic regression with a stochastic gradient descent optimizer, and a majority voting classifier) are applied to analyze sentiment polarity. The effectiveness of these approaches, along with LDA modeling, has been tested, validated, and compared on several benchmark datasets and on a newly generated dataset. To achieve better results, a dual-dataset approach is used to determine the frequency of positive and negative tweets and to generate word clouds, which helps identify the most effective model for analyzing the corpora. The experimental results show that the BiLSTM approach outperforms the other approaches, with an accuracy of 96.7%.
Keywords: COVID-19 sentiment analysis; BiLSTM; Latent Dirichlet Allocation (LDA); topic modeling; natural language processing

MDPI and ACS Style

Gourisaria, M.K.; Chandra, S.; Das, H.; Patra, S.S.; Sahni, M.; Leon-Castro, E.; Singh, V.; Kumar, S. Semantic Analysis and Topic Modelling of Web-Scrapped COVID-19 Tweet Corpora through Data Mining Methodologies. Healthcare 2022, 10, 881. https://doi.org/10.3390/healthcare10050881

Chicago/Turabian Style

Gourisaria, Mahendra Kumar, Satish Chandra, Himansu Das, Sudhansu Shekhar Patra, Manoj Sahni, Ernesto Leon-Castro, Vijander Singh, and Sandeep Kumar. 2022. "Semantic Analysis and Topic Modelling of Web-Scrapped COVID-19 Tweet Corpora through Data Mining Methodologies" Healthcare 10, no. 5: 881. https://doi.org/10.3390/healthcare10050881
