Article

Computational Linguistics Based Emotion Detection and Classification Model on Social Networking Data

1. Department of Information Technology, College of Computer and Information Sciences, King Saud University, P.O. Box 145111, Riyadh 11451, Saudi Arabia
2. Department of Applied Linguistics, College of Languages, Princess Nourah Bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
3. Department of Computer Sciences, College of Computing and Information System, Umm Al-Qura University, Mecca 24382, Saudi Arabia
4. Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 22254, Saudi Arabia
5. Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah 21589, Saudi Arabia
6. Department of Computer Science, Faculty of Computers and Information Technology, Future University in Egypt, New Cairo 11835, Egypt
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(19), 9680; https://doi.org/10.3390/app12199680
Submission received: 11 August 2022 / Revised: 10 September 2022 / Accepted: 22 September 2022 / Published: 27 September 2022
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract: Computational linguistics (CL) is the application of computer science to the analysis and comprehension of written and spoken language. Emotion classification and sentiment analysis (SA) are two of the most widely used techniques in the Natural Language Processing (NLP) field. Emotion analysis refers to the task of recognizing the attitude toward a topic or target; the attitude may be a polarity (negative or positive) or an emotional state such as sadness, joy, or anger. Classifying posts and mining opinions manually is a difficult task, and the subjectivity of the data has made this an open problem in the domain. Therefore, this article develops a computational linguistics-based emotion detection and classification model on social networking data (the CLBEDC-SND technique), which investigates the recognition and classification of emotions in social networking data. To attain this, the presented CLBEDC-SND model performs several stages of data pre-processing to make the data compatible with further processing. In addition, the CLBEDC-SND model performs vectorization and sentiment scoring using a fuzzy approach. For emotion classification, the presented CLBEDC-SND model employs an extreme learning machine (ELM). Finally, the parameters of the ELM model are optimally tuned using the shuffled frog leaping optimization (SFLO) algorithm. The performance of the CLBEDC-SND model is validated using benchmark datasets. The experimental results demonstrate the better performance of the CLBEDC-SND model over other models.

1. Introduction

Computational linguistics has become a significant area within linguistics. The computational techniques it employs have their origin in artificial intelligence (AI) and computer science. However, its main objective remains the modeling of human language, and thus it endures as part of the humanities [1]. Computational linguistics studies how language models should be constructed so that computers can better understand them; that is, it not only studies the usage of language in human activities, but also implements particular formal techniques that permit the exact construction of hypotheses and their subsequent automated evaluation using linguistic data (corpora) [2]. The formal component of computational linguistics relies on AI techniques [3,4,5].
Sentiment analysis (SA) has become a growing domain at the intersection of computer science and linguistics that endeavours to automatically determine the sentiment presented in text. Sentiment is categorized into negative or positive evaluations articulated through language [6]. The most appropriate sources of opinionated text are posts from social networking sites, and their analysis has gained popularity as many approaches have become available for performing SA, evaluating the opinions articulated regarding public figures or current events, and categorizing them as negative or positive [7]. SA can thus be described as the categorization of text as articulating a negative or positive opinion. Emotion detection in text addresses the limitations of this framing: rather than concentrating on the negative or positive opinion articulated, it endeavours to determine the human emotion that is expressed [8,9,10].
Identifying the emotions articulated by an individual is a very difficult task that even humans may struggle with [11]. Automating this identification and establishing a reliable way of detecting the expressed emotion is also challenging, not only because of the inadequate availability of training data but also because of the limited information present in a short text. The emotion categories that systems choose are not always grounded in the research conducted by psychologists on emotion theory [12]. Several systems adapt the emotion categories on the basis of their own findings, separating or merging emotions without regard for the scientific standard. Moreover, many available systems build their training data on the basis of the presence of particular keywords [13]. With purely keyword-based techniques, there is no way to validate the accuracy of the classifier, as the classifier is almost exclusively trained to recognize those keywords and categorize the text accordingly [14,15].
This article develops a computational linguistics-based emotion detection and classification model on social networking data (the CLBEDC-SND technique). The presented CLBEDC-SND technique performs different stages of data pre-processing to make the data compatible with further processing. In addition, the CLBEDC-SND model performs vectorization and sentiment scoring using a fuzzy approach. For emotion classification, the presented CLBEDC-SND model employs an extreme learning machine (ELM). Finally, the parameters of the ELM model are optimally tuned through the use of the shuffled frog leaping optimization (SFLO) algorithm. The performance of the CLBEDC-SND method is validated on a benchmark dataset.

2. Related Works

Kabir and Madria [16] formulated a neural network (NN) method that makes use of manually labelled data for emotion classification on COVID-19 tweets. The authors provide manually labelled tweet data on COVID-19 emotional responses as well as regular tweet data, and constructed a customized question-answering RoBERTa method for extracting the phrases from tweets that are mainly responsible for the respective emotions. Zhang et al. [17] developed a model for emotion detection in online social networks (OSNs) from a user-level view and cast this task as a multi-label learning problem. Firstly, the authors identify emotion temporal correlations, label correlations, and social correlations from annotated Twitter data. Secondly, based on these observations, they implement a factor-graph-based emotion recognition method that incorporates emotion temporal, label, and social correlations into a common structure and identifies multiple emotions via the multi-label learning method.
Vijayvergia and Kumar [18] devised a new method for emotion identification that can be used in real time because of its relatively small memory footprint and much shorter run time. Existing well-performing methods for emotion identification could not be used in real time because they integrate large deep learning (DL) models that make them significantly slower. This study suggests a method that leverages many shallow models to surpass the performance of a single large model by combining their strengths and avoiding their weaknesses. Such shallow models operate independently, which enables them to run in parallel to ensure a short execution time. Rashid et al. [19] described the Aimens system, which identifies emotions in textual dialogues. This mechanism used a DL-based LSTM method to detect emotions such as anger, happiness, and sadness in contextual conversation. The main input was a mixture of doc2vec and word2vec embeddings.
Feng et al. [20] presented a user-group-based topic-emotion model called UGTE for topic discovery and emotion detection that eases the feature sparsity issue of short texts. Specifically, the features of every user are used to discover groups of persons who express the same emotions, and UGTE efficiently aggregates the short texts in a group into long pseudo-documents. Riza and Charibaldi [21] used the long short-term memory (LSTM) technique, since it proved superior in earlier works. FastText word embeddings are used to improve upon GloVe and Word2Vec, which cannot manage the out-of-vocabulary (OOV) issue. Imran et al. [22] aimed to examine the reactions of citizens from various cultures to the novel Coronavirus and individual sentiment regarding the subsequent measures taken by various countries. Deep LSTM methods are used to estimate the emotions and sentiment polarity from the collected tweets. Shrivastava et al. [23] modelled a sequence-based convolutional neural network (CNN) with word embeddings for detecting emotions. An attention mechanism is implemented in the presented method, which enables the CNN to concentrate on the features, or the words, that have the greatest effect on the classification. Shelke et al. [24] developed a Leaky-ReLU-activated Deep Neural Network (LRA-DNN) comprising pre-processing, feature extraction, ranking, and classification. Ranks are allocated to every extracted feature in the ranking phase; lastly, the data are categorized, and precise output is obtained from the classification process.
Although different ML and DL models for emotion classification are available in the literature, there is still a need to enhance classification performance. Owing to the continual deepening of models, different parameters have a significant impact on the efficiency of the applied model. Since trial-and-error hyperparameter tuning is a tedious and error-prone process, metaheuristic algorithms can be applied. Therefore, in this work, we employ the SFLO algorithm for the parameter selection of the ELM model. The ultimate goal of this study is to formulate a framework for generalizing recently accumulated data and to support businesses in understanding the minds of customers and in mass media monitoring, as it enables an overview of the wider public opinion behind certain topics.

3. The Proposed Model

In this article, a new CLBEDC-SND technique has been developed for the recognition and classification of emotions in social networking data. The presented CLBEDC-SND model performs distinct stages of data pre-processing to make the data compatible with further processing. In addition, the CLBEDC-SND model performs vectorization and sentiment scoring using a fuzzy approach. At last, the SFLO with ELM model is applied for emotion recognition and classification. Figure 1 demonstrates the overall working process of the presented CLBEDC-SND algorithm.

3.1. Data Pre-Processing

The presented CLBEDC-SND model performs distinct stages of data pre-processing to make the data compatible with further processing. Social media data always contain unwanted parts, and the pre-processing stage removes those parts from the comment, which eventually helps to improve performance [24]. In this study, this stage comprises five distinct steps, namely tokenization, punctuation removal, stop word removal, lemmatization, and URL removal. The pre-processing function can be expressed in mathematical form as:
p_r = λ_p[I_t]   (1)
In Equation (1), p_r indicates the output of the pre-processing function, I_t denotes the input dataset, and λ_p denotes the pre-processing function, defined as follows:
λ_p = [λ_tk, λ_pr, λ_sr, λ_lm, λ_ur]   (2)
In Equation (2), λ_tk denotes the tokenization function, λ_pr the punctuation removal function, λ_sr the stop word removal function, λ_lm the lemmatization function, and λ_ur the URL removal function.
At first, the input text is subjected to tokenization, where the whole text is divided into small units called tokens. This step helps the machine to easily understand the text, and it is expressed as follows:
λ_tk = [I_t1, I_t2, I_t3, …, I_tn]   (3)
Punctuation removal is performed after tokenization, where the punctuation marks (; ! . , : ?) are removed from the tokenized dataset for further examination of the text, as shown in the following:
λ_p1 = λ_pr[I_t]   (4)
Stop words are the most commonly used words that contribute barely any meaning to the language, such as “the”, “a”, “is”, “are”, “or”, and so on. These words carry little information for analyzing emotion in text, so they are removed to optimize the quality of the text.
The lemmatization procedure takes place after the removal of stop words, where the root word is derived. The root word is the meaningful base form of a word, called the lemma. For instance, excites, excited, exciting, and excitement share a similar meaning, and the lemma, or root word, is “excite”.
Lastly, the URL removal procedure is carried out. A URL indicates the location of a resource and does not provide any essential data for analyzing sentiment.
Finally, the data are checked for incomplete entries, since even a small number of incomplete entries might cause mispredictions of the emotion in the text. At this point, the pre-processed dataset is examined for irrelevant or incomplete entries. When an incomplete entry exists, it is replaced with its complete, meaningful form so that the information is correctly synthesized and conveys the intended emotion. For instance, “c u tmrw” is replaced by “see you tomorrow”, “gd mrng” is replaced by “good morning”, “b4” is replaced by “before”, “2day” is replaced by “today”, and “Lol” is replaced by “Laugh out loud”. Subsequently, the completed dataset undergoes the feature extraction process.
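The pipeline above can be sketched in a few lines of Python. The stop-word list, slang dictionary, and suffix-stripping "lemmatizer" below are simplified, hypothetical stand-ins for the full resources (e.g., NLTK's stop-word corpus and a WordNet lemmatizer) a real system would use:

```python
import re

STOP_WORDS = {"the", "a", "is", "are", "or"}          # tiny illustrative list
SLANG = {"c": "see", "u": "you", "tmrw": "tomorrow",  # incomplete-text replacements
         "gd": "good", "mrng": "morning", "b4": "before",
         "2day": "today", "lol": "laugh out loud"}

def lemma(token):
    # crude suffix stripping; a real system would use a WordNet lemmatizer
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    text = re.sub(r"https?://\S+", "", text)           # URL removal
    tokens = re.findall(r"[a-z0-9']+", text.lower())   # tokenization + punctuation removal
    tokens = [SLANG.get(t, t) for t in tokens]         # expand chat abbreviations
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [lemma(t) for t in tokens]

print(preprocess("Excited!! c u tmrw http://t.co/xyz"))
```

Note that the crude suffix stripper yields stems such as "excit" rather than true lemmas like "excite"; this is exactly the gap a dictionary-based lemmatizer closes.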

3.2. Vectorization and Sentiment Scoring

At this stage, the CLBEDC-SND model performs vectorization and sentiment scoring using a fuzzy approach.
Vectorization: the text dataset is transformed into vector format for use by the ELM model [25]. For this purpose, the Python library Gensim is used to implement word embeddings that can capture context and learn word relations and the syntactic and semantic similarity of words in the document. Word2Vec is used to vectorize the text dataset, and the output is used as the primary weights for these models.
Fuzzy sentiment scoring: the degree of sentiment expressed within the text is extracted as a feature. The procedure first recognizes opinionated words and linguistic hedges with a Part-Of-Speech tagger, then applies the WordNet and SentiWordNet dictionaries to assign a polarity to each opinionated word as a primary score value f(u_s). Afterwards, this value is adapted according to the type and presence of linguistic hedges, using the fuzzy logic functions developed by Zadeh, to obtain the final sentiment score as follows. When an opinion word has a complement hedge (“not”), the fuzzy score is reformed as:
f(u_s) = 1 − u_s   (5)
When the hedge is a concentrator (“extremely”), then:
f(u_s) = [u_s]^2   (6)
When the hedge is a dilator (“somewhat”), then the fuzzy score is deduced as:
f(u_s) = [u_s]^(1/2)   (7)
The final fuzzy score is normalized and applied as a sentiment feature in the classification algorithm.
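Equations (5)-(7) can be expressed directly as a small hedge-adjustment function. This is an illustrative sketch; the hedge words are the examples given in the text:

```python
def fuzzy_score(u_s, hedge=None):
    """Adjust a base sentiment score u_s in [0, 1] for a linguistic hedge:
    complement ("not"), concentrator ("extremely"), dilator ("somewhat")."""
    if hedge == "not":
        return 1.0 - u_s          # Equation (5)
    if hedge == "extremely":
        return u_s ** 2           # Equation (6)
    if hedge == "somewhat":
        return u_s ** 0.5         # Equation (7)
    return u_s                    # no hedge present

print(fuzzy_score(0.81, "extremely"))  # concentrator squares the score (≈ 0.656)
print(fuzzy_score(0.81, "somewhat"))   # dilator takes the square root (≈ 0.9)
```

The concentrator pushes moderate scores toward 0 while the dilator pulls them toward 1, which matches the intuition that "extremely good" is a stronger claim than "somewhat good".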

3.3. Emotion Classification

For accurate and timely emotion classification, the presented CLBEDC-SND model employs the ELM model. Huang et al. [26] devised the ELM model for the single hidden layer feedforward neural network (SLFN). The algorithm arbitrarily chooses the input weights and analytically determines the output weights of the SLFN. After the input weights and hidden layer biases are selected at random, the SLFN is regarded as a linear system, and the output weights are determined via a generalized inverse of the hidden layer output matrix. This model attains a fast learning speed compared to classical feedforward network learning algorithms while achieving good generalization performance. Further, ELM tends to achieve the minimum norm of the weights along with the minimum training error. The output weights are evaluated by the Moore-Penrose (MP) generalized inverse. The learning speed of ELM is hundreds of times faster than classical learning algorithms, with good performance compared to gradient-based learning mechanisms. Unlike conventional gradient-based learning algorithms, which work only for differentiable activation functions, the ELM mechanism can be used to train SLFNs with many non-differentiable activation functions. Figure 2 portrays the infrastructure of ELM.
For a sequence of training instances {(x_j, t_j)}, j = 1, …, N, with N instances and m classes, the SLFN with activation function g(x) and L hidden nodes is formulated below [27], where w_i = [w_i1, …, w_in]^T indicates the input weight vector, b_i denotes the bias of the i-th hidden node, β_i = [β_i1, …, β_im]^T is the weight vector linking the i-th hidden node and the output nodes, w_i · x_j is the inner product of w_i and x_j, and t_j is the network output in terms of input x_j:
∑_{i=1}^{L} β_i g(w_i · x_j + b_i) = t_j,  j = 1, 2, …, N   (8)
Equation (8) can be written compactly as Equation (9):
Hβ = T   (9)
where
H = [ g(w_1·x_1 + b_1) ⋯ g(w_L·x_1 + b_L) ; ⋮ ; g(w_1·x_N + b_1) ⋯ g(w_L·x_N + b_L) ]_{N×L},  β = [ β_1^T ; ⋯ ; β_L^T ]_{L×m},  T = [ t_1^T ; ⋯ ; t_N^T ]_{N×m}
In the expression, H indicates the hidden layer output matrix of the network, whereas β denotes the output weight matrix.
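The closed-form training of Equations (8) and (9), with β solved via the Moore-Penrose pseudo-inverse, can be sketched with NumPy. This is a generic illustration, not the authors' code; the hidden-layer size, tanh activation, and toy XOR data are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, T, L):
    """Train an SLFN the ELM way: input weights W and biases b are random
    and never updated; the output weights beta solve H @ beta = T in the
    least-squares sense via the Moore-Penrose pseudo-inverse."""
    W = rng.standard_normal((L, X.shape[1]))
    b = rng.standard_normal(L)
    H = np.tanh(X @ W.T + b)          # hidden layer output matrix, N x L
    beta = np.linalg.pinv(H) @ T      # beta = H^+ T  (Equation (9))
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W.T + b) @ beta

# toy example: fit an XOR-style target with 20 hidden nodes
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
W, b, beta = elm_train(X, T, L=20)
print(np.round(elm_predict(X, W, b, beta)).ravel())
```

Because no gradient iterations are involved, training reduces to one matrix factorization, which is where ELM's speed advantage over backpropagation comes from.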

3.4. Hyperparameter Tuning

At the final stage, the parameters of the ELM model are optimally tuned by the use of the SFLO algorithm. The SFLO algorithm mimics the sub-population co-evolution of a species of frogs seeking food locations [28]. It integrates random and deterministic methods and has very effective computational power and global search efficiency. The frog population of a wetland is divided into distinct sub-populations, each of which has its own culture, and a local search is conducted for local optimization within each sub-population. Initially, a local search is implemented in all the sub-populations; that is, an update operation is performed on the individual frog with the worst fitness value in each sub-population, and the update rule is given below:
D_s = rand() × (X_b − X_w)   (10)
The upgraded solution is:
X_w′ = X_w + D_s,  −D_max ≤ D_s ≤ D_max   (11)
Here, rand() signifies a random number uniformly distributed between 0 and 1, D_s characterizes the adjustment vector of the frog, and D_max represents the maximum permissible step size of an individual frog in every jump.
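Equations (10) and (11) amount to the following one-step update (an illustrative sketch that clamps each component to ±D_max for simplicity; the vectors and d_max value are hypothetical):

```python
import random

def leap(x_w, x_b, d_max):
    """One SFLO jump: step the worst frog x_w toward the best frog x_b by a
    random fraction (Eq. (10)), clamping each component to +/- d_max (Eq. (11))."""
    step = [max(-d_max, min(d_max, random.random() * (b - w)))
            for w, b in zip(x_w, x_b)]
    return [w + s for w, s in zip(x_w, step)]

random.seed(0)
print(leap([0.0, 0.0], [4.0, 4.0], d_max=1.0))  # both components clamp to d_max: [1.0, 1.0]
```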

3.4.1. Global Search Process

Step 1. Initialization. Define the number of frogs n in each sub-population, the total number of frogs N in the population, and the number of sub-populations m.
Step 2. The initial population of N frogs is generated at random (many initial solutions are randomly produced), and the fitness values of all the frogs in P = {X_1, X_2, …, X_N} are evaluated. In the S-dimensional space, the i-th frog is represented by X_i = [X_i1, X_i2, …, X_iS].
Step 3. Sort the frogs by fitness value in descending order, record the frog with the current optimal fitness value as X_g, and divide the frog population into sub-populations. Specifically, the N frogs are allocated to m sub-populations M_1, M_2, M_3, …, M_m, each encompassing n frogs, which satisfy N = m × n:
M_k = {X_{k+m(l−1)} ∈ P | 1 ≤ l ≤ n},  1 ≤ k ≤ m   (12)
In Equation (12), m indicates the number of sub-populations and M_k denotes the k-th sub-population. The first frog is distributed into the first sub-population, the second frog into the second sub-population, the m-th frog into the m-th sub-population, and the (m+1)-th frog again into the first sub-population, continuing in descending order; the recursion is repeated until every frog is distributed.
Step 4. Based on Equations (10) and (11) of the SFLO algorithm and the constraints of the problem being solved, meta-evolution is conducted in every sub-population.
Step 5. Once every sub-population has evolved for a fixed number of iterations L_max, all the sub-populations are mixed. After a round of meta-evolution for all the sub-populations, the frogs are again sorted in descending order and re-divided into sub-populations, and the current global optimal solution X_g is updated.
Step 6. Iterative termination condition. Once the convergence condition is met, the implementation procedure ends, or else it proceeds to Step 3.
SFLA generally accepts three criteria to regulate the runtime of the process:
(i) After p consecutive rounds of global information exchange, the global optimum X_g has not been significantly improved.
(ii) The number of function evaluations fixed before the procedure is implemented has been reached.
(iii) The fitness value of the optimum converges to the current test outcomes, and the overall error is smaller than a specific threshold value.
No matter which stopping criterion is fulfilled, the procedure completes the entire search cycle and outputs the optimum.

3.4.2. Local Search Process

Step 4-1. Set i_m = 0, where i_m counts the sub-populations and is compared to the number of sub-populations m. Set i_n = 0, where i_n counts the local search iterations and is compared to L_max. The local search algorithm is the detailed procedure of Step 4 of the global search technique.
Step 4-2. Identify X_w and X_b in the i_m-th sub-population. Then, D_s is evaluated from Equation (10), and the worst solution is updated via Equation (11) to improve the location of the worst frog in the sub-population. If the updated fitness value of the worst frog is better than the existing fitness value, X_w′ replaces X_w; if the updated fitness value is worse than the current fitness value, the global optimum X_g is used in place of X_b, and the local search step of Equations (10) and (11) is performed again. If the updated fitness value of the worst frog is now better than the existing one, X_w′ replaces X_w; otherwise, a solution is randomly generated to replace the worst frog X_w. The fitness values of the updated sub-population are sorted in descending order. Set i_n = i_n + 1 and repeat these steps until i_n = L_max.
Step 4-3. Set i_m = i_m + 1, and return to Step 4-2 until i_m = m.
Step 4-4. The global data exchange is implemented if i m = m .
The SFLO algorithm derives a fitness function (FF) for acquiring enhanced classification outcomes. In this paper, the classifier error rate, to be minimized, is taken as the FF, as presented below in Equation (13). The optimal solution has the lowest error rate, whereas the worst one has the highest error rate.
fitness(x_i) = Classifier Error Rate(x_i) = (number of misclassified samples / total number of samples) × 100   (13)
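Putting the global and local search steps together, a minimal SFLO loop for a generic minimization objective looks roughly like this. It is a sketch under assumed population sizes, bounds, and per-component step clamping, demonstrated on a toy sphere function rather than on ELM parameters with Equation (13):

```python
import numpy as np

rng = np.random.default_rng(1)

def sflo(fitness, dim, n_memeplexes=5, frogs_per=6, local_iters=10,
         rounds=20, d_max=2.0, lo=-5.0, hi=5.0):
    """Minimal shuffled frog leaping: sort, split into memeplexes,
    repeatedly improve each memeplex's worst frog, then reshuffle."""
    n = n_memeplexes * frogs_per                       # N = m x n
    P = rng.uniform(lo, hi, (n, dim))
    for _ in range(rounds):
        P = P[np.argsort([fitness(x) for x in P])]     # best frog first
        x_g = P[0].copy()                              # global best X_g
        for k in range(n_memeplexes):
            idx = np.arange(k, n, n_memeplexes)        # partition as in Eq. (12)
            for _ in range(local_iters):
                sub = idx[np.argsort([fitness(P[i]) for i in idx])]
                i_b, i_w = sub[0], sub[-1]             # local best / worst
                for target in (P[i_b], x_g):           # leap per Eqs. (10)-(11)
                    step = np.clip(rng.random() * (target - P[i_w]),
                                   -d_max, d_max)
                    cand = P[i_w] + step
                    if fitness(cand) < fitness(P[i_w]):
                        P[i_w] = cand                  # accept the improvement
                        break
                else:                                  # no improvement: random frog
                    P[i_w] = rng.uniform(lo, hi, dim)
    return min(P, key=fitness)

best = sflo(lambda x: float(np.sum(x ** 2)), dim=2)    # toy sphere objective
print(np.round(best, 3))
```

In the paper's setting, each frog would encode a candidate set of ELM hyperparameters and `fitness` would be the classification error rate of Equation (13) on held-out data.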

4. Results and Discussion

The proposed model is simulated using Python 3.6.5 on a PC with an i5-8600K CPU, a GeForce 1050Ti 4 GB GPU, 16 GB RAM, a 250 GB SSD, and a 1 TB HDD. The parameter settings are as follows: learning rate 0.01, dropout 0.5, batch size 5, epoch count 50, and ReLU activation. In this study, the emotion classification results of the CLBEDC-SND method are tested using a dataset comprising 3500 samples. The dataset holds samples under 7 different class labels, as depicted in Table 1.
The emotion classification results of the CLBEDC-SND model are presented in the form of a confusion matrix in Figure 3. The figure implies that the CLBEDC-SND model properly identified all of the emotions that exist in the dataset.
Table 2 and Figure 4 provide the overall emotion classification outcomes of the CLBEDC-SND method on the entire dataset. The experimental values indicated that the CLBEDC-SND method achieved improved results under all classes. For instance, in the FEAR class, the CLBEDC-SND model offered an accuracy of 98.49%, precision of 93.91%, recall of 95.60%, specificity of 98.97%, and F-score of 94.75%. Moreover, in the JOY class, the CLBEDC-SND approach rendered an accuracy of 99.03%, precision of 96.60%, recall of 96.60%, specificity of 99.43%, and F-score of 96.60%. Likewise, in the ANGER class, the CLBEDC-SND algorithm provided an accuracy of 98.54%, precision of 95.35%, recall of 94.40%, specificity of 99.23%, and F-score of 94.87%.
Table 3 and Figure 5 offer the overall emotion classification outcomes of the CLBEDC-SND approach on the 70% of the data used as the training (TR) dataset. The experimental values showed that the CLBEDC-SND method achieved improved results under all classes. For example, in the FEAR class, the CLBEDC-SND algorithm presented an accuracy of 98.57%, precision of 94.21%, recall of 96.07%, specificity of 99%, and F-score of 95.13%. Further, in the JOY class, the CLBEDC-SND technique rendered an accuracy of 99.06%, precision of 95.89%, recall of 97.32%, specificity of 99.34%, and F-score of 96.60%. Similarly, in the ANGER class, the CLBEDC-SND approach granted an accuracy of 98.37%, precision of 94.90%, recall of 93.84%, specificity of 99.14%, and F-score of 94.37%.
Table 4 and Figure 6 demonstrate the complete emotion classification outcomes of the CLBEDC-SND approach on the 30% of the data used as the testing (TS) dataset. The experimental values specified that the CLBEDC-SND technique gained improved results under all classes. For example, in the FEAR class, the CLBEDC-SND approach offered an accuracy of 98.29%, precision of 93.15%, recall of 94.44%, specificity of 98.90%, and F-score of 93.79%. Further, in the JOY class, the CLBEDC-SND technique granted an accuracy of 98.95%, precision of 98.11%, recall of 95.12%, specificity of 99.66%, and F-score of 96.59%. Likewise, in the ANGER class, the CLBEDC-SND approach presented an accuracy of 98.95%, precision of 96.48%, recall of 95.80%, specificity of 99.45%, and F-score of 96.14%.
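For reference, the per-class measures quoted above follow the usual one-vs-rest confusion-matrix definitions, sketched here on a hypothetical two-class matrix (not the paper's data):

```python
import numpy as np

def per_class_metrics(cm, k):
    """Accuracy, precision, recall, specificity, and F-score for class k
    of a confusion matrix cm (rows = true labels, columns = predictions)."""
    cm = np.asarray(cm, dtype=float)
    tp = cm[k, k]
    fn = cm[k, :].sum() - tp            # class-k samples predicted as other classes
    fp = cm[:, k].sum() - tp            # other-class samples predicted as class k
    tn = cm.sum() - tp - fn - fp
    accuracy = (tp + tn) / cm.sum()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, specificity, f_score

cm = [[45, 5],                          # hypothetical counts
      [10, 40]]
print([round(v, 3) for v in per_class_metrics(cm, 0)])  # [0.85, 0.818, 0.9, 0.8, 0.857]
```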
The training accuracy (TRA) and validation accuracy (VLA) attained by the CLBEDC-SND algorithm on the test dataset are exemplified in Figure 7. The experimental results imply that the CLBEDC-SND algorithm gained maximum values of TRA and VLA; notably, the VLA is greater than the TRA.
The training loss (TRL) and validation loss (VLL) reached by the CLBEDC-SND method on the test dataset are given in Figure 8. The TRL and VLL values need to be low for enhanced classification results. The experimental outcome showed that the CLBEDC-SND technique established minimal values of TRL and VLL; in particular, the VLL is lower than the TRL.
A clear precision-recall inspection of the CLBEDC-SND algorithm on the test dataset is portrayed in Figure 9. The precision-recall curve demonstrates the tradeoff between precision and recall for various threshold values. A high area under the curve denotes both high recall and high precision, where high precision relates to a low false positive rate and high recall relates to a low false negative rate. The figure indicates that the CLBEDC-SND technique resulted in enhanced precision-recall values under all classes.
A brief ROC study of the CLBEDC-SND algorithm on the test dataset is portrayed in Figure 10. The ROC curve is a graph presenting the performance of a classification model at all classification thresholds; it is a probability curve that shows how well the model can distinguish between classes. The outcomes denote that the CLBEDC-SND method has shown its ability to categorize the distinct classes of the test dataset.
To confirm the enhanced emotion classification results of the CLBEDC-SND model, a detailed comparative analysis is shown in Table 5 and Figure 11 [24]. The obtained results revealed that the CLBEDC-SND method accomplished improved results under all measures. For example, with respect to accuracy, the CLBEDC-SND method reached an increased accuracy of 98.72%, whereas the LRA-DNN, DNN, CNN, and ANN models offered decreased accuracies of 94.48%, 92.08%, 89.93%, and 88.05%, respectively. In the meantime, with respect to precision, the CLBEDC-SND technique gained an increased precision of 95.49%, whereas the LRA-DNN, DNN, CNN, and ANN approaches presented decreased precisions of 87.96%, 86.74%, 80.47%, and 78.43%, correspondingly. Furthermore, with respect to recall, the CLBEDC-SND method attained an increased recall of 95.53%, whereas the LRA-DNN, DNN, CNN, and ANN models presented decreased recalls of 91.70%, 90.53%, 87.59%, and 84.75%, correspondingly. Next, with respect to F-score, the CLBEDC-SND technique attained an increased F-score of 99.25%, whereas the LRA-DNN, DNN, CNN, and ANN methodologies rendered decreased F-scores of 91.67%, 88.39%, 86.10%, and 83.93%, correspondingly.
From these experimental evaluations, it is concluded that the CLBEDC-SND model offers superior emotion classification results over other models. The enhanced performance of the CLBEDC-SND model is due to the inclusion of fuzzy-based sentiment scoring and SFLO-based optimal parameter tuning. Therefore, the proposed model can be employed in real time on social networking sites such as Facebook, Twitter, and Instagram for emotion classification.

5. Conclusions

In this article, a novel CLBEDC-SND approach was designed for the recognition and classification of emotions in social networking data. The presented CLBEDC-SND model performs distinct stages of data pre-processing to make the data compatible with further processing. In addition, the CLBEDC-SND model performs vectorization and sentiment scoring using a fuzzy approach. For emotion classification, the presented CLBEDC-SND model employs the ELM approach. Finally, the parameters of the ELM method are optimally tuned by the use of the SFLO algorithm. The performance of the CLBEDC-SND method was validated using benchmark datasets. The experimental outcomes demonstrate the better performance of the CLBEDC-SND method over other models, with a maximum accuracy of 98.72%. In future, feature selection approaches will be integrated into the CLBEDC-SND technique to enhance classification performance and reduce complexity. Moreover, the presented CLBEDC-SND technique can be tested on real-time large-scale datasets.

Author Contributions

Conceptualization, H.H.A.-B.; Data curation, H.J.A.; Formal analysis, H.J.A.; Funding acquisition, M.K.N.; Project administration, A.Y. and M.O.; Resources, A.Y.; Software, O.A.; Supervision, O.A.; Validation, R.A.; Visualization, R.A.; Writing—original draft, H.H.A.-B.; Writing—review & editing, M.K.N. and M.O. All authors have read and agreed to the published version of the manuscript.

Funding

Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R281), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: (22UQU4310373DSR38).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing not applicable to this article as no datasets were generated during the current study.

Conflicts of Interest

The authors declare that they have no conflicts of interest. The manuscript was written through the contributions of all authors. All authors have given approval to the final version of the manuscript.

References

  1. Nandwani, P.; Verma, R. A review on sentiment analysis and emotion detection from text. Soc. Netw. Anal. Min. 2021, 11, 1–19. [Google Scholar] [CrossRef] [PubMed]
  2. Ahire, V.; Borse, S. Emotion detection from social media using machine learning techniques: A survey. In Applied Information Processing Systems; Springer: Singapore, 2022; pp. 83–92. [Google Scholar]
  3. Zad, S.; Heidari, M.; James, H., Jr.; Uzuner, O. Emotion detection of textual data: An interdisciplinary survey. In Proceedings of the 2021 IEEE World AI IoT Congress (AIIoT), Seattle, WA, USA, 10–13 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 255–261. [Google Scholar]
  4. Sailunaz, K.; Dhaliwal, M.; Rokne, J.; Alhajj, R. Emotion detection from text and speech: A survey. Soc. Netw. Anal. Min. 2018, 8, 1–26. [Google Scholar] [CrossRef]
  5. Gaind, B.; Syal, V.; Padgalwar, S. Emotion detection and analysis on social media. arXiv 2019, arXiv:1901.08458. [Google Scholar]
  6. Graterol, W.; Diaz-Amado, J.; Cardinale, Y.; Dongo, I.; Lopes-Silva, E.; Santos-Libarino, C. Emotion detection for social robots based on NLP transformers and an emotion ontology. Sensors 2021, 21, 1322. [Google Scholar] [CrossRef] [PubMed]
  7. Sasidhar, T.T.; Premjith, B.; Soman, K.P. Emotion detection in Hinglish (Hindi + English) code-mixed social media text. Procedia Comput. Sci. 2020, 171, 1346–1352. [Google Scholar] [CrossRef]
  8. Mustakim, N.; Rabu, R.; Mursalin, G.M.; Hossain, E.; Sharif, O.; Hoque, M.M. CUET-NLP@ TamilNLP-ACL2022: Multi-Class Textual Emotion Detection from Social Media using Transformer. In Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, Dublin, Ireland, 26 May 2022; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 199–206. [Google Scholar]
  9. Acheampong, F.A.; Nunoo-Mensah, H.; Chen, W. Transformer models for text-based emotion detection: A review of BERT-based approaches. Artif. Intell. Rev. 2021, 54, 5789–5829. [Google Scholar] [CrossRef]
  10. Vasantharajan, C.; Benhur, S.; Kumarasen, P.K.; Ponnusamy, R.; Thangasamy, S.; Priyadharshini, R.; Durairaj, T.; Sivanraju, K.; Sampath, A.; Chakravarthi, B.R.; et al. TamilEmo: Finegrained emotion detection dataset for Tamil. arXiv 2022, arXiv:2202.04725. [Google Scholar]
  11. De, A.; Mishra, S. Augmented Intelligence in Mental Health Care: Sentiment Analysis and Emotion Detection with Health Care Perspective. In Augmented Intelligence in Healthcare: A Pragmatic and Integrated Analysis; Springer: Singapore, 2022; pp. 205–235. [Google Scholar]
  12. Aslam, N.; Rustam, F.; Lee, E.; Washington, P.B.; Ashraf, I. Sentiment Analysis and Emotion Detection on Cryptocurrency Related Tweets Using Ensemble LSTM-GRU Model. IEEE Access 2022, 10, 39313–39324. [Google Scholar] [CrossRef]
  13. Al-Wesabi, F.N. A hybrid intelligent approach for content authentication and tampering detection of arabic text transmitted via internet. Comput. Mater. Contin. 2021, 66, 195–211. [Google Scholar] [CrossRef]
  14. Kaur, R.; Kautish, S. Multimodal sentiment analysis: A survey and comparison. In Research Anthology on Implementing Sentiment Analysis across Multiple Disciplines; IGI Global: Hershey, PA, USA, 2022; pp. 1846–1870. [Google Scholar]
  15. Kumar, S.; Prabha, R.; Samuel, S. Sentiment Analysis and Emotion Detection with Healthcare Perspective. In Augmented Intelligence in Healthcare: A Pragmatic and Integrated Analysis; Springer: Singapore, 2022; pp. 189–204. [Google Scholar]
  16. Kabir, M.Y.; Madria, S. EMOCOV: Machine learning for emotion detection, analysis and visualization using COVID-19 tweets. Online Soc. Netw. Media 2021, 23, 100135. [Google Scholar] [CrossRef] [PubMed]
  17. Zhang, X.; Li, W.; Ying, H.; Li, F.; Tang, S.; Lu, S. Emotion detection in online social networks: A multilabel learning approach. IEEE Internet Things J. 2020, 7, 8133–8143. [Google Scholar] [CrossRef]
  18. Vijayvergia, A.; Kumar, K. Selective shallow models strength integration for emotion detection using GloVe and LSTM. Multimed. Tools Appl. 2021, 80, 28349–28363. [Google Scholar] [CrossRef]
  19. Rashid, U.; Iqbal, M.W.; Skiandar, M.A.; Raiz, M.Q.; Naqvi, M.R.; Shahzad, S.K. Emotion Detection of Contextual Text using Deep learning. In Proceedings of the 2020 4th International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), Istanbul, Turkey, 22–24 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–5. [Google Scholar]
  20. Feng, J.; Rao, Y.; Xie, H.; Wang, F.L.; Li, Q. User group based emotion detection and topic discovery over short text. World Wide Web 2020, 23, 1553–1587. [Google Scholar] [CrossRef]
  21. Riza, M.A.; Charibaldi, N. Emotion Detection in Twitter Social Media Using Long Short-Term Memory (LSTM) and Fast Text. Int. J. Artif. Intell. Robot 2021, 3, 15–26. [Google Scholar] [CrossRef]
  22. Imran, A.S.; Daudpota, S.M.; Kastrati, Z.; Batra, R. Cross-cultural polarity and emotion detection using sentiment analysis and deep learning on COVID-19 related tweets. IEEE Access 2020, 8, 181074–181090. [Google Scholar] [CrossRef]
  23. Shrivastava, K.; Kumar, S.; Jain, D.K. An effective approach for emotion detection in multimedia text data using sequence based convolutional neural network. Multimed. Tools Appl. 2019, 78, 29607–29639. [Google Scholar] [CrossRef]
  24. Shelke, N.; Chaudhury, S.; Chakrabarti, S.; Bangare, S.L.; Yogapriya, G.; Pandey, P. An efficient way of text-based emotion analysis from social media using LRA-DNN. Neurosci. Inform. 2022, 2, 100048. [Google Scholar] [CrossRef]
  25. Gagliardi, I.; Artese, M.T. Semantic unsupervised automatic keyphrases extraction by integrating word embedding with clustering methods. Multimodal Technol. Interact. 2020, 4, 30. [Google Scholar] [CrossRef]
  26. Huang, G.-B.; Zhu, Q.-Y.; Siew, C.-K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
  27. Al-Shamiri, A.K.; Sadollah, A.; Kim, J.H. Harmony search algorithms for optimizing extreme learning machines. In International Conference on Harmony Search Algorithm; Springer: Singapore, 2020; pp. 11–20. [Google Scholar]
  28. Jazebi, S.J.; Ghaffari, A. RISA: Routing scheme for Internet of Things using shuffled frog leaping optimization algorithm. J. Ambient. Intell. Humaniz. Comput. 2020, 11, 4273–4283. [Google Scholar] [CrossRef]
Figure 1. Working process of CLBEDC-SND approach.
Figure 2. Architecture of ELM.
Figure 3. Confusion matrices of CLBEDC-SND approach (a) Entire dataset, (b) 70% of TR data, and (c) 30% of TS data.
Figure 4. Result analysis of CLBEDC-SND approach under entire dataset.
Figure 5. Result analysis of CLBEDC-SND approach under 70% of TR data.
Figure 6. Result analysis of CLBEDC-SND approach under 30% of TS data.
Figure 7. TRA and VLA analysis of CLBEDC-SND approach.
Figure 8. TRL and VLL analysis of CLBEDC-SND approach.
Figure 9. Precision-recall analysis of CLBEDC-SND approach.
Figure 10. ROC curve analysis of CLBEDC-SND approach.
Figure 11. Comparative analysis of CLBEDC-SND technique with existing algorithms.
Table 1. Dataset details.

Class                      No. of Samples
FEAR                       500
JOY                        500
ANGER                      500
SAD                        500
DISGUST                    500
GUILT                      500
SHAME                      500
Total Number of Samples    3500
Table 2. Result analysis of CLBEDC-SND approach with distinct class labels under entire dataset.

Entire Dataset
Labels      Accuracy   Precision   Recall   Specificity   F-Score
FEAR        98.49      93.91       95.60    98.97         94.75
JOY         99.03      96.60       96.60    99.43         96.60
ANGER       98.54      95.35       94.40    99.23         94.87
SAD         99.06      96.61       96.80    99.43         96.70
DISGUST     98.63      95.93       94.40    99.33         95.16
GUILT       98.89      95.28       97.00    99.20         96.13
SHAME       98.80      96.36       95.20    99.40         95.77
Average     98.78      95.72       95.71    99.29         95.71
Table 3. Result analysis of CLBEDC-SND approach with distinct class labels under 70% of TR data.

Training Phase (70%)
Labels      Accuracy   Precision   Recall   Specificity   F-Score
FEAR        98.57      94.21       96.07    99.00         95.13
JOY         99.06      95.89       97.32    99.34         96.60
ANGER       98.37      94.90       93.84    99.14         94.37
SAD         99.31      98.02       97.19    99.67         97.60
DISGUST     98.49      96.38       93.51    99.38         94.92
GUILT       98.98      95.21       97.25    99.25         96.22
SHAME       98.82      95.97       95.69    99.33         95.83
Average     98.80      95.80       95.84    99.30         95.81
Table 4. Result analysis of CLBEDC-SND approach with distinct class labels under 30% of TS data.

Testing Phase (30%)
Labels      Accuracy   Precision   Recall   Specificity   F-Score
FEAR        98.29      93.15       94.44    98.90         93.79
JOY         98.95      98.11       95.12    99.66         96.59
ANGER       98.95      96.48       95.80    99.45         96.14
SAD         98.48      93.24       95.83    98.90         94.52
DISGUST     98.95      94.74       96.92    99.24         95.82
GUILT       98.67      95.43       96.53    99.09         95.98
SHAME       98.76      97.28       94.08    99.55         95.65
Average     98.72      95.49       95.53    99.25         95.50
Table 5. Comparative analysis of CLBEDC-SND approach with existing algorithms.

Methods                           Accuracy   Precision   Recall   F-Score
CLBEDC-SND                        98.72      95.49       95.53    99.25
LRA-DNN                           94.48      87.96       91.70    91.67
Deep Neural Network (DNN)         92.08      86.74       90.53    88.39
Convolutional Neural Network (CNN) 89.93     80.47       87.59    86.10
Artificial Neural Network (ANN)   88.05      78.43       84.75    83.93
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Al-Baity, H.H.; Alshahrani, H.J.; Nour, M.K.; Yafoz, A.; Alghushairy, O.; Alsini, R.; Othman, M. Computational Linguistics Based Emotion Detection and Classification Model on Social Networking Data. Appl. Sci. 2022, 12, 9680. https://doi.org/10.3390/app12199680

