3.2. Word Embedding
Word embedding is a methodology used to transform textual data (words) into vectors. The main concept behind word embedding is that words that are similar to each other will lie close to each other in the vector space. Consequently, each word is represented by an n-dimensional dense vector. Unlike other embedding methods such as Word2vec, GloVe obtains word vectors by incorporating global information (word co-occurrence).
We adopted the GloVe (Global Vectors) model in our approach since it was designed to examine both the local context and the global statistics of words before embedding them. The central notion of the GloVe model is that it emphasizes the co-occurrence probabilities of words within a corpus of texts in order to embed them as meaningful vectors, as indicated in Equation (2). In other words, we are interested in the frequency with which a word $j$ occurs in the context of a word $i$ across our corpus of texts:

$P_{ij} = P(j \mid i) = \dfrac{X_{ij}}{X_i}$ (2)

where $X$ is the word-word co-occurrence matrix, $X_{ij}$ is the frequency with which word $j$ appears in the context of word $i$, and $X_i = \sum_k X_{ik}$ is the total number of co-occurrences of word $i$.
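As a minimal illustration of Equation (2), the following sketch computes the co-occurrence probabilities from a toy matrix (the vocabulary and counts are invented for illustration and are not taken from our corpus):

import numpy as np

# Toy word-word co-occurrence matrix X; rows are target words i, columns are context words j.
# Assumed (illustrative) vocabulary order: ["ice", "steam", "solid", "gas"].
X = np.array([
    [0., 2., 8., 1.],   # context counts around "ice"
    [2., 0., 1., 7.],   # context counts around "steam"
    [8., 1., 0., 0.],
    [1., 7., 0., 0.],
])

X_i = X.sum(axis=1, keepdims=True)   # X_i: total co-occurrences of each word i
P = X / X_i                          # P_ij = X_ij / X_i, as in Equation (2)

# The ratio P(solid | ice) / P(solid | steam) is large, reflecting that "solid"
# is far more characteristic of "ice" than of "steam".
print(P[0, 2] / P[1, 2])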
The GloVe model’s objective is to construct a function $F$ that can predict such ratios given two word vectors $w_i$ and $w_j$ and a context word vector $\tilde{w}_k$ as parameters, as shown in Equation (3):

$F(w_i, w_j, \tilde{w}_k) = \dfrac{P_{ik}}{P_{jk}}$ (3)
Subsequently, GloVe learns the appropriate word vectors $w_i$ and $\tilde{w}_j$ during training in order to minimize this weighted least-squares problem. Furthermore, a weighting function $f(X_{ij})$ must be employed to limit the significance of very frequent co-occurrences (such as “this is”) and to prevent uncommon co-occurrences from receiving the same weight as common ones.
In conclusion, the GloVe model takes advantage of a relevant source of information to fulfill the required word-similarity task: co-occurrence probability ratios. It then constructs an objective function $J$ that maps word vectors to corpus statistics. Finally, GloVe minimizes $J$ by learning the word vectors.
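For completeness, the weighted least-squares objective $J$, as defined in the original GloVe formulation [29], can be written as

$J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2$

where $V$ is the vocabulary size, $w_i$ and $\tilde{w}_j$ are the word and context word vectors, $b_i$ and $\tilde{b}_j$ are bias terms, and $f$ is the weighting function discussed above.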
Accordingly, to generate the embedding matrix for all words, we relied on the GloVe embedding [29]. GloVe is an unsupervised learning approach that creates vector representations of words. The resulting representations exhibit interesting linear substructures of the word vector space and are trained on aggregated global word-word co-occurrence statistics extracted from a corpus. In our model, we used glove.840B.300d, which provides 300-dimensional embedding vectors.
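As a minimal sketch of how such an embedding matrix can be assembled (assuming a tokenizer-style word_index dictionary and the glove.840B.300d.txt file; the function names are illustrative, not part of our released code):

import numpy as np

EMBEDDING_DIM = 300

def load_glove_vectors(path="glove.840B.300d.txt"):
    # Read the pre-trained GloVe vectors into a {word: vector} dictionary.
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if len(values) == EMBEDDING_DIM:   # skip malformed lines
                vectors[word] = np.asarray(values, dtype="float32")
    return vectors

def build_embedding_matrix(word_index, glove_vectors):
    # Row idx holds the GloVe vector of the word with index idx;
    # out-of-vocabulary words keep an all-zero row.
    matrix = np.zeros((len(word_index) + 1, EMBEDDING_DIM), dtype="float32")
    for word, idx in word_index.items():
        vector = glove_vectors.get(word)
        if vector is not None:
            matrix[idx] = vector
    return matrix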
3.3. Classification Model
In the classification model, we implemented three experiments. In the first experiment, we applied machine learning algorithms, namely support vector machine (SVM), random forest (RF), and decision tree (DT). In the second experiment, we trained long short-term memory (LSTM) and gated recurrent unit (GRU) models on our dataset; these are among the most popular deep learning models for sequence data. Finally, in the third experiment, we used the BERT transformer to predict fake news.
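The classical baselines of the first experiment can be reproduced with scikit-learn pipelines; the sketch below assumes TF-IDF features and placeholder train/test splits (train_texts, train_labels, test_texts), and the hyper-parameters shown are illustrative rather than those tuned in our experiments:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline

# One pipeline per baseline: TF-IDF features followed by the classifier.
baselines = {
    "SVM": make_pipeline(TfidfVectorizer(), LinearSVC()),
    "RF": make_pipeline(TfidfVectorizer(), RandomForestClassifier(n_estimators=100)),
    "DT": make_pipeline(TfidfVectorizer(), DecisionTreeClassifier()),
}

# train_texts, train_labels, and test_texts are placeholders for the dataset splits.
# for name, model in baselines.items():
#     model.fit(train_texts, train_labels)
#     predictions = model.predict(test_texts)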
LSTM and GRU are deep learning methodologies based on recurrent neural networks (RNNs). They were developed because plain RNNs are ineffective at learning long-term dependencies. GRU may also be regarded as a variant of the LSTM, since both are constructed similarly and give similarly strong results in certain circumstances. Using the LSTM or GRU architecture, long-term dependencies can be retained and learned over arbitrary time intervals. Additionally, LSTM and GRU do not depend on any vectorization method, such as TF-IDF [30].
LSTM improves on the RNN by including a module called the constant error carousel (CEC). The CEC propagates a constant error signal across time, leveraging a well-designed “gate” structure to prevent backpropagated errors from vanishing or exploding. As it switches to control data flow and memory, the “gate” structure computes the internal value of the CEC depending on the current input values and the previous context.
Therefore, for any given sequence of words $x_t$, the input gate, forget gate, and output gate in the LSTM structure are denoted as $i_t$, $f_t$, and $o_t$, respectively. The weights and biases associated with these gates are $W_i$, $W_f$, $W_o$ and $b_i$, $b_f$, $b_o$, respectively. At every step, the LSTM updates two states: the hidden state $h_t$ and the cell state $c_t$; $\sigma$ denotes the sigmoid function. The above parameters are related as follows:

$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$
$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$
$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$
$\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)$
$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$
$h_t = o_t \odot \tanh(c_t)$

where $\tilde{c}_t$ is the candidate cell state, $W_c$ and $b_c$ are its weight and bias, and $\odot$ denotes element-wise multiplication.
GRU was likewise intended to address the long-term dependency issue of vanishing and exploding gradients, which is another well-known motivation for the LSTM [31]. Accordingly, GRU is purpose-built to work with sequential data that exhibits patterns across time steps, such as time-series data. The construction of GRU is simpler than that of the LSTM, which comprises three gates (an input gate, a forget gate, and an output gate), whereas the GRU uses only an update gate and a reset gate. As a result, the training speed of GRU is somewhat faster than that of the LSTM.
The update gate specifies the amount of information that should be passed to the next state cell; when the update gate value is larger, more information is carried forward. The reset gate specifies how much prior information should be erased: some information from the previous cell is ignored or forgotten when the value of the reset gate is close to zero [31]. The following equations describe the update gate $z_t$ and the reset gate $r_t$:

$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$
$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$
The hidden state $h_t$ associated with time step $t$ is calculated as a linear interpolation between the previous activation $h_{t-1}$ and the candidate hidden state $\tilde{h}_t$. The hidden state and the candidate hidden state are defined as follows:

$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$
$\tilde{h}_t = \phi(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)$
where $W_z$, $W_r$, and $W_h$ denote the input weight matrices. Moreover, $U_z$, $U_r$, and $U_h$ are the recurrent weight matrices. The vectors $b_z$, $b_r$, and $b_h$ denote the bias vectors. Finally, $\phi$ denotes the activation function.
Our model examines the effect of stacking LSTM or GRU layers. Therefore, we experimented with a single LSTM or GRU layer, two stacked layers, and finally three stacked layers. In the first configuration, the LSTM or GRU layer has 128 neurons. In the second configuration, we stack another LSTM/GRU layer with 64 neurons. Finally, in the third configuration, we stack a further LSTM/GRU layer with 32 neurons.
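A minimal Keras-style sketch of the triple-stacked configuration (layer sizes 128/64/32 as described above; the maximum sequence length and the helper name build_stacked_model are illustrative assumptions):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense  # use GRU instead of LSTM for the GRU variant

def build_stacked_model(embedding_matrix, rnn_layer=LSTM, max_len=300):
    # The embedding layer is initialized with the pre-trained GloVe matrix and frozen.
    model = Sequential([
        Embedding(input_dim=embedding_matrix.shape[0],
                  output_dim=embedding_matrix.shape[1],
                  weights=[embedding_matrix],
                  input_length=max_len,
                  trainable=False),
        rnn_layer(128, return_sequences=True),   # first recurrent layer
        rnn_layer(64, return_sequences=True),    # second stacked layer
        rnn_layer(32),                           # third stacked layer
        Dense(1, activation="sigmoid"),          # 0 = real news, 1 = fake news
    ])
    return model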
During our experimentation, we framed the task as binary classification. Accordingly, we trained our LSTM and GRU models to return zero for real news and one for fake news. We used the Adam optimizer with an early-stopping condition to avoid overfitting. In addition, we used a learning rate of 0.001 and a batch size of 64.
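Training with the stated settings (Adam, learning rate 0.001, batch size 64, early stopping) could then look as follows; the patience and epoch count are illustrative assumptions, and X_train/y_train/X_val/y_val stand in for the padded sequences and labels:

from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

def train(model, X_train, y_train, X_val, y_val):
    # Binary cross-entropy matches the 0 = real / 1 = fake labeling.
    model.compile(optimizer=Adam(learning_rate=0.001),
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    early_stop = EarlyStopping(monitor="val_loss", patience=3,   # patience value is assumed
                               restore_best_weights=True)
    return model.fit(X_train, y_train,
                     validation_data=(X_val, y_val),
                     epochs=20,                                  # epoch budget is illustrative
                     batch_size=64,
                     callbacks=[early_stop])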
Recently, most natural language processing applications have employed Transformers [32]. The Transformer is a two-part architecture (an encoder and a decoder) that transforms one sequence into another. Nonetheless, it differs from the sequence-to-sequence models mentioned above in that it does not rely on recurrent networks (GRU, LSTM, etc.). Bidirectional Encoder Representations from Transformers (BERT) is a Google-developed, transformer-based machine learning approach for pre-training natural language processing (NLP) models [33,34]. Unlike traditional directional models that read the input text sequentially (either left-to-right or right-to-left), the Transformer encoder reads the full sequence of words simultaneously. This property enables the model to infer the context of a word from its surroundings (the words to its left and right). As a result, it is classified as bidirectional, although it is more accurate to describe it as non-directional [35,36].
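One possible realization of the BERT experiment uses the Hugging Face transformers library; the checkpoint name bert-base-uncased and the inference details below are illustrative assumptions rather than the exact configuration reported here:

from transformers import BertTokenizer, BertForSequenceClassification

# Pre-trained BERT encoder with a binary classification head (0 = real news, 1 = fake news).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# The tokenizer prepares whole articles at once; the encoder attends over the full sequence,
# which is what gives BERT its bidirectional (non-directional) view of each word's context.
# texts = ["example news article ..."]                            # placeholder input
# inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
# logits = model(**inputs).logits
# predictions = logits.argmax(dim=-1)                             # predicted class per article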