Article

Punctuation Restoration with Transformer Model on Social Media Data

by Adebayo Mustapha Bakare 1, Kalaiarasi Sonai Muthu Anbananthen 1,*, Saravanan Muthaiyah 2, Jayakumar Krishnan 1 and Subarmaniam Kannan 1

1 Faculty of Information Science and Technology, Multimedia University, Melaka 75450, Malaysia
2 Faculty of Management, Multimedia University, Cyberjaya 63100, Malaysia
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(3), 1685; https://doi.org/10.3390/app13031685
Submission received: 20 December 2022 / Revised: 15 January 2023 / Accepted: 18 January 2023 / Published: 28 January 2023

Abstract

Sentiment analysis faces several key challenges. One major problem is determining the sentiment of complex sentences, paragraphs, and text documents. A paragraph with multiple parts may carry multiple sentiment values, and predicting a single overall sentiment for such a paragraph does not produce all the information businesses and brands need. Therefore, a paragraph with multiple sentences should be separated into simple sentences, from which all the possible sentiments can be extracted effectively. Splitting a paragraph this way requires that it be properly punctuated, yet most social media texts are improperly punctuated, which makes sentence separation challenging. This study proposes a punctuation-restoration algorithm based on the transformer model approach. We evaluated different Bidirectional Encoder Representations from Transformers (BERT) models for the transformer encoding, as well as the neural network used on top of the encoder for prediction. Based on our evaluation, RoBERTa-Large with a bidirectional long short-term memory (LSTM) network provided the best accuracy, 97% and 90%, for restoring punctuation on Amazon and Telekom data, respectively. Precision, recall, and F1-score were also used as evaluation criteria.
Keywords: punctuation restoration; transformer models; Bidirectional Encoder Representations from Transformers (BERT); long short-term memory (LSTM)
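
The abstract describes a token-level architecture: a pretrained transformer encoder producing contextual embeddings, followed by a bidirectional LSTM and a per-token punctuation classifier. The following is a minimal sketch of that idea in PyTorch with Hugging Face Transformers; the model name, label set, and layer sizes are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

# Assumed punctuation label set: "O" means no punctuation after the token.
PUNCT_LABELS = ["O", "COMMA", "PERIOD", "QUESTION"]

class PunctuationRestorer(nn.Module):
    def __init__(self, encoder_name="roberta-large", lstm_hidden=256):
        super().__init__()
        # Pretrained transformer encoder (RoBERTa-Large, per the abstract)
        self.encoder = AutoModel.from_pretrained(encoder_name)
        # Bidirectional LSTM over the encoder's token representations
        self.lstm = nn.LSTM(
            input_size=self.encoder.config.hidden_size,
            hidden_size=lstm_hidden,
            batch_first=True,
            bidirectional=True,
        )
        # Per-token classifier over the punctuation labels
        self.classifier = nn.Linear(2 * lstm_hidden, len(PUNCT_LABELS))

    def forward(self, input_ids, attention_mask):
        # Contextual embeddings from the transformer encoder
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # BiLSTM pass, then one punctuation logit vector per token
        lstm_out, _ = self.lstm(hidden_states)
        return self.classifier(lstm_out)

# Usage on an unpunctuated social-media-style sentence (untrained weights,
# so the predictions here are random until the model is fine-tuned):
tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = PunctuationRestorer()
batch = tokenizer("so i ordered this phone it arrived late", return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
preds = logits.argmax(-1)  # one punctuation label index per subword token

Restored punctuation from a model like this is what allows a multi-sentence review to be split into simple sentences before sentence-level sentiment analysis.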

Share and Cite

MDPI and ACS Style

Bakare, A.M.; Anbananthen, K.S.M.; Muthaiyah, S.; Krishnan, J.; Kannan, S. Punctuation Restoration with Transformer Model on Social Media Data. Appl. Sci. 2023, 13, 1685. https://doi.org/10.3390/app13031685

AMA Style

Bakare AM, Anbananthen KSM, Muthaiyah S, Krishnan J, Kannan S. Punctuation Restoration with Transformer Model on Social Media Data. Applied Sciences. 2023; 13(3):1685. https://doi.org/10.3390/app13031685

Chicago/Turabian Style

Bakare, Adebayo Mustapha, Kalaiarasi Sonai Muthu Anbananthen, Saravanan Muthaiyah, Jayakumar Krishnan, and Subarmaniam Kannan. 2023. "Punctuation Restoration with Transformer Model on Social Media Data." Applied Sciences 13, no. 3: 1685. https://doi.org/10.3390/app13031685

APA Style

Bakare, A. M., Anbananthen, K. S. M., Muthaiyah, S., Krishnan, J., & Kannan, S. (2023). Punctuation Restoration with Transformer Model on Social Media Data. Applied Sciences, 13(3), 1685. https://doi.org/10.3390/app13031685

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.
