An Assessment of Deep Learning Models and Word Embeddings for Toxicity Detection within Online Textual Comments

Dessì, Danilo; Recupero, Diego Reforgiato; Sack, Harald

doi:10.3390/electronics10070779


by Danilo Dessì ^1,2,*,†, Diego Reforgiato Recupero ^3,† and Harald Sack ^1,2,†

¹ FIZ Karlsruhe–Leibniz Institute for Information Infrastructure, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
² Karlsruhe Institute of Technology, Institute AIFB, Kaiserstraße 89, 76133 Karlsruhe, Germany
³ Department of Mathematics and Computer Science, University of Cagliari, 09124 Cagliari, Italy
^* Author to whom correspondence should be addressed.
^† The authors equally contributed to the research performed in this paper.

Electronics 2021, 10(7), 779; https://doi.org/10.3390/electronics10070779

Submission received: 11 February 2021 / Revised: 15 March 2021 / Accepted: 20 March 2021 / Published: 25 March 2021

(This article belongs to the Special Issue Deep Learning and Explainability for Sentiment Analysis)


Abstract

Today, a growing number of people interact online, and the explosion of online communication produces a large volume of textual comments. However, a major drawback of online environments is that comments shared on digital platforms can hide hazards such as fake news, insults, harassment, and, more generally, content that may hurt someone's feelings. In this scenario, detecting this kind of toxicity plays an important role in moderating online communication. Deep learning technologies have recently delivered impressive performance in Natural Language Processing applications, including Sentiment Analysis and emotion detection, across numerous datasets. Such models do not need pre-defined, hand-picked features; instead, they learn sophisticated features from the input datasets by themselves. In this domain, word embeddings have been widely used to represent words in Sentiment Analysis tasks and have proved very effective. Therefore, in this paper, we investigate the use of deep learning and word embeddings to detect six different types of toxicity within online comments. In doing so, we evaluate the most suitable deep learning layers and state-of-the-art word embeddings for identifying toxicity. The results suggest that Long Short-Term Memory layers in combination with mimicked word embeddings are a good choice for this task.

Keywords: deep learning; word embeddings; toxicity detection; binary classification
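
As an illustration of the "binary classification" framing, the following minimal sketch shows one plausible way to turn multi-label comments into six separate binary datasets, one per toxicity type. It assumes each type is handled as an independent binary task; the label names (`type_1` to `type_6`), the helper `to_binary_tasks`, and the toy comments are hypothetical placeholders and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

# Placeholder label names (assumption): the abstract only states that
# six toxicity types are considered, not what they are called.
TOXICITY_TYPES: List[str] = [f"type_{i}" for i in range(1, 7)]


def to_binary_tasks(
    comments: List[Tuple[str, Set[str]]],
) -> Dict[str, List[Tuple[str, int]]]:
    """Split multi-label comments into one binary dataset per toxicity type."""
    tasks: Dict[str, List[Tuple[str, int]]] = {t: [] for t in TOXICITY_TYPES}
    for text, labels in comments:
        for toxicity_type in TOXICITY_TYPES:
            # Target is 1 if the comment carries this toxicity type, else 0.
            tasks[toxicity_type].append((text, int(toxicity_type in labels)))
    return tasks


# Toy, invented examples (assumption); not drawn from any real dataset.
toy_comments = [
    ("thanks for the helpful answer", set()),
    ("placeholder comment tagged with two types", {"type_1", "type_4"}),
]
binary_tasks = to_binary_tasks(toy_comments)
```

Each entry of `binary_tasks` could then be fed to its own classifier, for instance one built from word embeddings and recurrent layers, although the exact training setup is not specified in the abstract.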

