A Comprehensive Analysis of Fake News Detection Models: A Systematic Literature Review and Current Challenges

Mishra, Alok; Sadia, Halima

doi:10.3390/engproc2023059028

Open Access Proceeding Paper

by Alok Mishra * and Halima Sadia

Department of Computer Science & Engineering, Integral University, Lucknow 226026, India

* Author to whom correspondence should be addressed.

Presented at the International Conference on Recent Advances on Science and Engineering, Dubai, United Arab Emirates, 4–5 October 2023.

Eng. Proc. 2023, 59(1), 28; https://doi.org/10.3390/engproc2023059028

Published: 12 December 2023

(This article belongs to the Proceedings of Eng. Proc., 2023, RAiSE-2023)


Abstract

In today’s age of social networking, inconsistencies in web news have become a pressing concern. These discrepancies can mislead individuals when they make important purchase decisions. Despite the existing research in this area, more empirical and rigorous investigation into the inconsistencies reported in reviews is needed. False reporting and disinformation on social media platforms can significantly affect societal stability and peace. Fake news is frequently disseminated on social media and can easily influence and deceive populations and governments. Many researchers are working toward distinguishing fake news from genuine news on social media platforms, because the timely identification of fake news can help prevent its spread. Our study focuses on how machine learning and deep learning algorithms are used to detect fraudulent data. The most fundamental and practical techniques deployed in recent years are investigated, classified, and described across numerous datasets in an extended review model. Additionally, simulation platforms and reported performance indicators are reviewed in detail. This review provides a comprehensive analysis of key research findings and examines pertinent issues of interest to academics and professionals who wish to improve the reliability of automated fake news detection (FND) models.

Keywords:

deep learning; machine learning; used data sources; simulation platforms; FND models

1. Introduction

The recognition of fake news has become a prominent area of research [1]. Previously, yellow journalism was a standard means of spreading fake news, often aiming to sensationalize stories such as amusing news, accidents, rumors, and crime reports [2]. In the digital era, the propagation of fake news has become even easier, as users can quickly pass false information to their acquaintances, friends, and beyond by leveraging the unique characteristics of social media platforms [3]. Because social media platforms are so widely used, fake news can circulate cyclically. Furthermore, the comments attached to misleading information may change over time, undermining the credibility of genuine news. As discussed in [4], fake news spreads faster than genuine news. The impacts of fake news range from misleading governments to influencing entire populations [5]. Different approaches have been employed to detect fake news, including machine learning, language analysis, and knowledge-based approaches [6]. Social networking sites and technological advancements offer numerous avenues for propagating hoax news. Furthermore, recent literature highlights the various benefits of fake news detection (FND). In the modern age of digital technology, online fake news has garnered significant attention across web-based news sites, social networking channels, and digital media [7]. Despite this, most individuals lack the competence or the time to verify news sources and ensure their credibility [8,9]. Our research aims to analyze various fake news detection models. We conduct a comprehensive literature review and analyze performance metrics and datasets. The paper comprises four sections covering literature reviews and research design, algorithms and feature extraction, simulation tools and applications, and an architectural view of fake news detection models.

2. Related Works

Numerous studies have aimed to identify and expose false information. The authors of [10] created an FND system that uses a backtracking approach and demonstrated faster performance than traditional models. The authors of [11] developed a framework for detecting fake customer reviews and found that the AdaBoost classifier outperformed the other classifiers. Fake News Tracker is a tool for gathering social context and generating datasets. In [12], the authors analyzed the sharing patterns of real and fake news on Facebook using term frequency-inverse document frequency (TF-IDF) and latent Dirichlet allocation (LDA). The authors of [13] utilized a computational stylistic evaluation based on natural language processing (NLP). These approaches have shown promising results in improving the accuracy and efficiency of fake news detection.

Several studies have focused on identifying and classifying fake news. In 2018, Jang et al. used evolution tree analysis to distinguish between genuine and fake news. Altunbey and Alatas proposed a two-step strategy in 2019 involving preprocessing and vector transformation [14]. Jadhav and Thepade utilized the deep structured semantic model (DSSM) and recurrent neural network (RNN) classifiers for fake news identification in the same year [15]. In 2020, FNDNet and Bernoulli’s Naive Bayes classifier were developed to classify fake news [16]. Umer et al. proposed a hybrid convolutional neural network and long short-term memory (CNN-LSTM) model for news categorization [17]. Agarwal et al. introduced a deep learning system using CNN and RNN with a dropout layer to address overfitting [18]. Various deep learning models for detecting fake news have been proposed, including linguistic models, hybrid support vector machine (SVM), coupled ConvNet architecture, and multi-view attention networks [19,20,21,22,23,24,25,26]. These models have shown improved accuracy and the ability to handle ambiguity and low-quality data [27]. TF-IDF features, n-grams, and kernel sizes have also been optimized for better performance [28,29,30]. Overall, these models provide efficient solutions for verifying news on social media.

Researchers have also conducted studies to uncover and distinguish false information on social media platforms, including Twitter [31]. These studies used natural language processing techniques and deep learning algorithms, such as LSTM and ensemble learning models, to identify fake news in various languages and domains [32]. One study presented a system called fake news detection (FEND), which uses a clustering approach to identify fraudulent events and topics [33]. Another study analyzed reviews and ratings in the high-tech industry [34]. Furthermore, a self-adaptive harmony search algorithm was used to optimize an ensemble learning model, with promising results in detecting fake news across different domains [35]. In recent years, there has consequently been a surge in research articles on developing and implementing FND models based on deep learning techniques.

3. Characteristics of Existing Models: Algorithmic Classification, Feature Extraction Methods, and Dataset Utilization

3.1. Data Source

Table 1 lists the datasets employed by previous studies to assess the performance of their FND models. These datasets serve as benchmark resources for both training and testing. A significant challenge relating to FND is the lack of benchmark datasets that are both large-scale and accurately labeled with ground truth. To address this issue, researchers have utilized diverse datasets with varying characteristics. For instance, some datasets focus specifically on political statements, such as PolitiFact, LIAR, and Weibo. Some data sources, like Twitter, contain tweets, while others, like FNC-1, are based on news articles. These data sources vary not only in size, but also in the labels assigned to instances and the modalities they encompass. Furthermore, many studies have collected their own data from news stories or social networking sites. Overall, utilizing different datasets allows researchers to explore and evaluate their FND models across various contexts and domains.

3.2. NLP Methods Used in FND

Natural language processing (NLP) helps computers understand and use human language. Data preparation is crucial for detecting fake news. Pre-processing can handle missing words, convert attributes, and manage complex structures. Feature extraction combines variables to reduce the complexity of the data. Models use social context features to obtain relevant information. N-gram techniques group sequences of adjacent tokens into feature vectors. Linguistic feature extraction characterizes fake news using various classes of linguistic features. Word embedding models, such as bidirectional encoder representations from transformers (BERT) and global vectors for word representation (GloVe), generate vectors for downstream tasks; these models are tabulated in Table 2. This review summarizes the NLP tasks, features, and challenges for future research.
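As an illustration of the n-gram and TF-IDF features mentioned in this section, the following minimal Python sketch builds word n-grams and TF-IDF weights using only the standard library. The function names (word_ngrams, tfidf_vectors), the whitespace tokenizer, the unsmoothed variant idf = log(N/df), and the three toy headlines are assumptions made for this example; they are not taken from any of the reviewed models.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n=1):
    """Return one {n-gram: TF-IDF weight} dict per document.

    tf  = occurrences of the n-gram in the document / n-grams in the document
    idf = log(N / df), where df counts the documents containing the n-gram
    (an unsmoothed variant chosen for this illustration).
    """
    grams_per_doc = [word_ngrams(doc, n) for doc in docs]
    num_docs = len(docs)

    # Document frequency: each n-gram is counted once per document.
    doc_freq = Counter()
    for grams in grams_per_doc:
        doc_freq.update(set(grams))

    vectors = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams) or 1  # avoid division by zero for empty documents
        vectors.append({
            gram: (count / total) * math.log(num_docs / doc_freq[gram])
            for gram, count in counts.items()
        })
    return vectors


# Toy headlines (hypothetical data).
headlines = [
    "miracle cure shocks doctors",
    "doctors publish vaccine study",
    "miracle diet shocks experts",
]
bigrams = word_ngrams(headlines[0], 2)
weights = tfidf_vectors(headlines, n=1)
print(bigrams)
print(weights[0])
```

In this toy corpus, a term that occurs in only one headline ("cure") receives a larger weight than a term shared by two headlines ("miracle"), which is the behavior that makes TF-IDF useful for highlighting distinctive vocabulary.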

3.3. Algorithmic Classification

Machine learning plays a crucial role in FND models, which can be categorized into supervised and unsupervised learning approaches. Unsupervised learning makes training data simpler to collect because it extracts useful feature information from unlabeled data. However, unsupervised learning systems are frequently less effective than supervised methods. Supervised learning techniques rely on the information found in labeled data, with classification being the most popular method. However, data labeling is frequently expensive, so the need for labeled data is a significant obstacle to supervised learning. Deep learning, a more recent research paradigm, is frequently employed in identification models, including those for FND, owing to its notable advances in complex natural language processing tasks. Figure 1 categorizes the commonly used algorithms in fake news identification models. Table 3 lists the most popular machine learning models along with a few benefits and drawbacks.

Unsupervised learning techniques in shallow models include k-means, while supervised learning techniques comprise evolution tree analysis, SVM, hybrid SVM, Bernoulli’s Naive Bayes, LDA, and voting classifiers.
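To make the supervised setting concrete, the sketch below implements a small Bernoulli Naive Bayes text classifier, one of the shallow supervised techniques listed above, in plain Python. It is a minimal illustration under stated assumptions: the function names, the whitespace tokenizer, the Laplace smoothing constant alpha, and the four labeled training headlines are all hypothetical and do not reproduce the setup of any reviewed study.

```python
import math


def train_bernoulli_nb(docs, labels, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on whitespace-tokenized documents.

    Returns (vocab, log_prior, cond_prob), where cond_prob[c][w] is the
    Laplace-smoothed probability that word w occurs in a document of class c.
    """
    vocab = sorted({w for doc in docs for w in doc.lower().split()})
    log_prior, cond_prob = {}, {}
    for c in sorted(set(labels)):
        class_docs = [set(d.lower().split()) for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        cond_prob[c] = {
            w: (sum(w in d for d in class_docs) + alpha) / (len(class_docs) + 2 * alpha)
            for w in vocab
        }
    return vocab, log_prior, cond_prob


def predict_bernoulli_nb(model, text):
    """Return the class with the highest log-posterior for the given text."""
    vocab, log_prior, cond_prob = model
    present = set(text.lower().split())
    scores = {}
    for c, prior in log_prior.items():
        score = prior
        # Bernoulli model: every vocabulary word contributes, present or absent.
        for w in vocab:
            p = cond_prob[c][w]
            score += math.log(p if w in present else 1.0 - p)
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical labeled training data.
train_docs = [
    "shocking miracle cure revealed",
    "you will not believe this miracle",
    "senate passes budget bill",
    "central bank holds interest rates",
]
train_labels = ["fake", "fake", "real", "real"]

model = train_bernoulli_nb(train_docs, train_labels)
pred_a = predict_bernoulli_nb(model, "miracle cure shocks everyone")
pred_b = predict_bernoulli_nb(model, "senate budget bill")
print(pred_a, pred_b)
```

The example also shows why labeled data matters for supervised methods: the classifier can only associate words with a class if annotated examples of that class are available at training time.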

4. FND Model Architectural Views and Performance Measurements Utilized in Traditional FND Models

4.1. The FND Model Architectural View

The model is designed to handle various data types, such as text, image, audio, and video. The architecture comprises several elements and procedures: data input, feature extraction, feature representation, cross-modal fusion, machine learning algorithms, model training and evaluation, and model output. The model receives the different data types and applies feature extraction techniques tailored to each type. The extracted features are represented in a standard format and combined using cross-modal fusion techniques. Machine learning algorithms, such as classification or regression algorithms, are then applied to the fused feature representations to predict the authenticity of the news. The algorithm is trained on labeled data, and its performance is evaluated using appropriate metrics. Finally, the model generates the output, indicating whether the news is fake or genuine. This architecture provides a comprehensive framework for an effective FND model that considers various data types, as shown in Figure 2a,b.
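The pipeline described above (per-modality feature extraction, a common representation, cross-modal fusion, and a classifier) can be sketched as follows. This is a deliberately simplified illustration and not the architecture of any reviewed model: the NewsItem type, the toy text and image features, fusion by plain concatenation, and the hand-set linear weights are all assumptions made for the example. In a real system the weights would be learned from labeled data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class NewsItem:
    text: str
    # Hypothetical stand-in for decoded image data, intensities in [0, 1].
    image_pixels: Sequence[float]


def text_features(item: NewsItem) -> List[float]:
    """Toy text features: token count, share of upper-case tokens, '!' count."""
    tokens = item.text.split()
    upper = sum(token.isupper() for token in tokens)
    return [float(len(tokens)), upper / max(len(tokens), 1), float(item.text.count("!"))]


def image_features(item: NewsItem) -> List[float]:
    """Toy image features: mean and variance of the pixel intensities."""
    pixels = list(item.image_pixels)
    if not pixels:
        return [0.0, 0.0]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [mean, variance]


def fuse(item: NewsItem,
         extractors: Dict[str, Callable[[NewsItem], List[float]]]) -> List[float]:
    """Cross-modal fusion by concatenating per-modality feature vectors."""
    fused: List[float] = []
    for name in sorted(extractors):  # fixed order keeps the layout stable
        fused.extend(extractors[name](item))
    return fused


def classify(features: List[float], weights: List[float], bias: float) -> str:
    """Linear decision rule over the fused features."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return "fake" if score > 0 else "genuine"


extractors = {"text": text_features, "image": image_features}
# Feature layout after sorting: [img_mean, img_var, n_tokens, upper_ratio, n_exclaim]
weights = [0.0, 0.0, 0.0, 2.0, 0.5]  # hand-set, illustrative only
bias = -1.0

item_a = NewsItem("SHOCKING cure found !!!", [0.2, 0.8, 0.5])
item_b = NewsItem("Council approves new budget", [0.4, 0.4, 0.4])
fused_a = fuse(item_a, extractors)
label_a = classify(fused_a, weights, bias)
label_b = classify(fuse(item_b, extractors), weights, bias)
print(fused_a, label_a, label_b)
```

Concatenation is only the simplest fusion choice; it is used here because it keeps the data flow from input to output easy to follow.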

Table 4 lists the performance metrics that have been used to determine the reliability of existing fake news identification approaches. The most commonly employed metrics are discussed in detail below.

F1 score: The F1 score is the harmonic mean of precision and recall, as shown in Equation (1). It is commonly used to assess and rank the performance of a model.

\mathrm{F1\ score} = \frac{2T^{p}}{2T^{p} + F^{p} + F^{n}} \qquad (1)

Precision: Precision is the ratio of correctly predicted positive observations to the total number of predicted positive observations, as shown in Equation (2).

Pes = \frac{T^{p}}{T^{p} + F^{p}} \qquad (2)

Accuracy: Accuracy is the number of correctly predicted observations divided by the total number of observations, as shown in Equation (3).

Ac = \frac{T^{p} + T^{n}}{T^{p} + T^{n} + F^{p} + F^{n}} \qquad (3)

Recall: Recall, also known as the true positive rate, is the proportion of actual positive observations that are correctly identified, as shown in Equation (4).

Re = \frac{T^{p}}{T^{p} + F^{n}} \qquad (4)

T^{p} stands for true positives, T^{n} for true negatives, F^{p} for false positives, and F^{n} for false negatives.
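For readers who want to see why the harmonic mean of precision and recall reduces to the count-based form in Equation (1), the short derivation below substitutes Equations (2) and (4) into the harmonic mean. It uses the same symbols as the text (Pes for precision, Re for recall); the numerical values in the final line are hypothetical and chosen only to illustrate the arithmetic.

```latex
\begin{aligned}
\mathrm{F1\ score}
  &= \frac{2 \cdot Pes \cdot Re}{Pes + Re}
   = \frac{2 \cdot \dfrac{T^{p}}{T^{p}+F^{p}} \cdot \dfrac{T^{p}}{T^{p}+F^{n}}}
          {\dfrac{T^{p}}{T^{p}+F^{p}} + \dfrac{T^{p}}{T^{p}+F^{n}}} \\[6pt]
  &= \frac{2\,(T^{p})^{2}}{T^{p}\,(T^{p}+F^{n}) + T^{p}\,(T^{p}+F^{p})}
   = \frac{2\,T^{p}}{2\,T^{p} + F^{p} + F^{n}} \\[6pt]
\text{e.g. } T^{p}=40,\ F^{p}=10,\ F^{n}=20:\quad
  & Pes = \tfrac{40}{50} = 0.8,\quad Re = \tfrac{40}{60} \approx 0.667,\quad
    \mathrm{F1\ score} = \tfrac{80}{110} \approx 0.727
\end{aligned}
```

The second line follows from multiplying the numerator and denominator by (T^{p}+F^{p})(T^{p}+F^{n}) and then cancelling one factor of T^{p}. Note that true negatives do not appear in the F1 score, which is why it is often preferred over accuracy when genuine news greatly outnumbers fake news.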

The confusion matrix is a comprehensive summary of the prediction results in a classification problem. It provides a breakdown of the count values for correct and incorrect predictions, categorized by each class, as shown in Table 4.
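As a small worked example, the following Python sketch tallies the four confusion-matrix cells for a binary fake/real task and derives precision, recall, F1 score, and accuracy from them using the standard definitions of these metrics. The function names, the choice of "fake" as the positive class, and the eight example labels are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive="fake"):
    """Count true positives, true negatives, false positives, false negatives."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Compute precision, recall, F1 score, and accuracy from the four counts."""
    total = tp + tn + fp + fn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Hypothetical ground-truth labels and model predictions.
y_true = ["fake", "fake", "fake", "real", "real", "real", "real", "fake"]
y_pred = ["fake", "fake", "real", "real", "fake", "real", "real", "real"]

counts = confusion_counts(y_true, y_pred)
scores = classification_metrics(*counts)
print(counts)
print(scores)
```

With these labels the counts are 2 true positives, 3 true negatives, 1 false positive, and 2 false negatives, so accuracy (5/8) and F1 score (4/7) differ noticeably, which illustrates why several metrics are usually reported together.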

4.2. The Applications Explored for FND

Fake news can permeate fields such as health, education, democracy, politics, COVID-19, and more, posing detrimental effects on individuals and society. Therefore, recent FND models have concentrated their applications in different domains, as depicted in Figure 3. A significant portion of the analyzed articles primarily emphasize politics. The remaining studies delve into FND within domains such as culture, tourism, COVID-19, e-commerce, and marketing. This analysis provides valuable insights for future researchers, allowing them to explore emerging fields and gather innovative information for advancing FND models.

5. The Consequences of False News and Research Problems, As Well As the Future Scope of the FND Model

5.1. Consequences of Fake News

False news can severely affect people, society, and even global affairs. Its dissemination can spread inaccurate information, damage reputations, target sensitive subjects, intensify polarization, erode trust in the media, and have tangible economic and real-life implications. It can also influence the public’s perception, impact elections, and threaten democracy. Inaccurate information about vaccines can be especially damaging to public health initiatives.

5.2. Gaps in Research and Future Scopes

This study contributes to society by raising awareness about the prevalence of fake news and its impact on social networking sites today [47]. The primary objective of detecting fake news is to improve society. Previous studies, such as that of Savyan et al. [48], have utilized deep learning techniques such as LSTM and neural networks (NNs) to develop models that improve the identification of misleading news. However, FND still presents various challenges. To combat false news on social media websites effectively, it is crucial to detect the subjects and creators of fake news, which can help eliminate a broad range of false information [49]. Nonetheless, addressing the problem of FND remains complex, as shown in Figure 3.

The primary challenge posed by fake news is its inherently multimodal and multilingual nature. It encompasses information presented in various languages and auditory, visual, and textual formats. Furthermore, fake news often involves conversation in a specific language that users may not be familiar with [50].

6. Conclusions

This study presents an in-depth review of FND models and describes the relevant recent developments. A variety of machine learning and deep learning approaches are included in the survey, along with information on datasets, algorithms, their characteristics, and related difficulties. The paper also discusses research gaps and concerns that must be addressed to develop new FND approaches, as well as performance measures used to assess the efficacy of these models. Overall, the study is a helpful tool for encouraging future researchers to concentrate on creating original and ingenious false news recognition methods. It provides a thorough grasp of the issues currently being faced, as well as potential solutions and future developments in the field of FND.

Author Contributions

For this research article, the authors made the following contributions based on the CRediT taxonomy: Conceptualization, A.M. and H.S.; Methodology, A.M.; Software, H.S.; Validation, A.M. and H.S.; Formal Analysis, A.M.; Investigation, A.M.; Resources, A.M.; Data Curation, A.M.; Writing – Original Draft Preparation, A.M.; Writing – Review and Editing, A.M.; Visualization, A.M.; Supervision, H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Sengupta, E.; Nagpal, R.; Mehrotra, D.; Srivastava, G. ProBlock: A novel approach for fake news detection. Clust. Comput. 2021, 24, 3779–3795.
2. Islam, M.R.; Liu, S.; Wang, X.; Xu, G. Deep learning for misinformation detection on online social networks: A survey and new perspectives. Soc. Netw. Anal. Min. 2020, 10, 1–20.
3. Habib, A.; Asghar, M.Z.; Khan, A.; Habib, A.; Khan, A. False information detection in online content and its role in decision making: A systematic literature review. Soc. Netw. Anal. Min. 2019, 9, 50.
4. Yang, C.; Zhou, X.; Zafarani, R. CHECKED: Chinese COVID-19 fake news dataset. Soc. Netw. Anal. Min. 2019, 11, 1–8.
5. Kim, G.; Ko, Y. Effective fake news detection using graph and summarization techniques. Pattern Recognit. Lett. 2021, 151, 135–139.
6. Vereshchaka, A.; Cosimini, S.; Dong, W. Analyzing and distinguishing fake and real news to mitigate the problem of disinformation. Comput. Math. Organ. Theory 2020, 26, 350–364.
7. Bondielli, A.; Marcelloni, F. A survey on fake news and rumour detection techniques. Inf. Sci. 2019, 497, 38–55.
8. Zhou, X.; Zafarani, R. A survey of fake news: Fundamental theories, detection methods, and opportunities. ACM Comput. Surv. 2020, 53, 1–40.
9. D’Ulizia, A.; Caschera, M.C.; Ferri, F.; Grifoni, P. Fake news detection: A survey of evaluation datasets. PeerJ Comput. Sci. 2021, 7, e518.
10. Ko, H.; Hong, J.Y.; Kim, S.; Mesicek, L.; Na, I.S. Human-machine interaction: A case study on fake news detection using a backtracking based on a cognitive system. Cogn. Syst. Res. 2019, 55, 77–81.
11. Barbado, R.; Araque, O.; Iglesias, C.A. A framework for fake review detection in online consumer electronics retailers. Inf. Process. Manag. 2019, 56, 1234–1244.
12. Xu, K.; Wang, F.; Wang, H.; Yang, B. Detecting fake news over online social media via domain reputations and content understanding. Tsinghua Sci. Technol. 2021, 25, 20–27.
13. de Oliveira, N.R.; Medeiros, D.S.V.; Mattos, D.M.F. A sensitive stylistic approach to identify fake news on social networking. IEEE Signal Process. Lett. 2020, 27, 1250–1254.
14. Ozbay, F.A.; Alatas, B. Fake news detection within online social media using supervised artificial intelligence algorithms. Phys. A 2020, 540, 15.
15. Jadhav, S.S.; Thepade, S.D. Fake news identification and classification using dssm and improved recurrent neural network classifier. Appl. Artif. Intell. Int. J. 2019, 33, 1058–1068.
16. Kaur, S.; Kumar, P.; Kumaraguru, P. Automating fake news detection system using multi-level voting model. Soft Comput. 2020, 24, 9049–9069.
17. Umer, M.; Imtiaz, Z.; Ullah, S.; Mehmood, A.; Choi, G.S.; On, B.-W. Fake news stance detection using deep learning architecture (CNN-LSTM). IEEE Access 2020, 8, 156695–156706.
18. Agarwal, A.; Mittal, M.; Pathak, A.; Goyal, L.M. Fake news detection using a blend of neural networks: An application of deep Learning. SN Comput. Sci. 2020, 10 (Suppl. S2), S96–S101.
19. Shrivastava, G.; Kumar, P.; Ojha, R.P.; Srivastava, P.K.; Mohan, S.; Srivastava, G. Defensive modeling of fake news through online social networks. IEEE Trans. Comput. Soc. Syst. 2020, 7, 1159–1167.
20. Choudhary, A.; Arora, A. Linguistic feature based learning model for fake news detection and classification. Expert. Syst. Appl. 2020, 169, 114171.
21. Setiawan, R.; Ponnam, V.S.; Sengan, S.; Anam, M.; Subbiah, C.; Phasinam, K.; Vairaven, M.; Ponnusamy, S. Certain investigation of fake news detection from facebook and twitter using artificial intelligence approach. Wirel. Pers. Commun. 2021, 127, 1–9.
22. Raj, C.; Meel, P. ConvNet frameworks for multi-modal fake news detection. Appl. Intell. 2021, 51, 8132–8148.
23. Javed, M.S.; Majeed, H.; Mujtaba, H.; Beg, M.O. Fake reviews classification using deep learning ensemble of shallow convolutions. J. Comput. Soc. Sci. 2021, 4, 883–902.
24. Saleh, H.; Alharbi, A.; Alsamhi, S.H. OPCNN-FAKE: Optimized convolutional neural network for fake news detection. IEEE Access 2021, 9, 129471–129489.
25. Ali, H.; Khan, M.S.; AlGhadhban, A.; Alazmi, M.; Alzamil, A.; Al-Utaibi, K.; Qadir, J. All your fake detector are belong to us: Evaluating adversarial robustness of fake-news detectors under black-box settings. IEEE Access 2021, 9, 81678–81692.
26. Ni, S.; Li, J.; Kao, H.-Y. MVAN: Multi-view attention networks for fake news detection on social media. IEEE Access 2021, 9, 106907–106917.
27. Han, B.; Han, X.; Zhang, H.; Li, J.; Cao, X. Fighting fake news: Two stream networks for deepfake detection via learnable SRM. IEEE Trans. Biom. Behav. Identity Sci. 2021, 3, 320–331.
28. Kaliyar, R.K.; Goswami, A.; Narang, P. DeepFakE: Improving fake news detection using tensor decomposition-based deep neural network. J. Supercomput. 2020, 77, 1015–1037.
29. Choudhary, M.; Chouhan, S.S.; Pilli, E.S.; Vipparthi, S.K. Ber-ConvoNet: A deep learning framework for fake news classification. Appl. Soft Comput. 2021, 110, 107614.
30. Verma, P.K.; Agrawal, P.; Amorim, I.; Prodan, R. WELFake: Word embedding over linguistic features for fake news detection. IEEE Trans. Comput. Soc. Syst. 2021, 8, 881–893.
31. Zervopoulos, A.; Alvanou, A.G.; Bezas, K.; Papamichail, A.; Maragoudakis, M.; Kermanidis, K. Deep learning for fake news detection on Twitter regarding the 2019 Hong Kong protests. Neural Comput. Appl. 2021, 34, 969–982.
32. Meesad, P. Thai fake news detection based on information retrieval, natural language processing and machine learning. SN Comput. Sci. 2021, 2, 1–17.
33. Zhang, C.; Gupta, A.; Kauten, C.; Deokar, A.V.; Qin, X. Detecting fake news for reducing misinformation risks using analytics approaches. Eur. J. Oper. Res. 2019, 279, 1036–1052.
34. Kauffmann, E.; Peral, J.; Gil, D.; Ferrández, A.; Sellers, R.; Mora, H. A framework for big data analytics in commercial social networks: A case study on sentiment analysis and fake review detection for marketing decision-making. Ind. Mark. Manag. 2020, 90, 523–537.
35. Huang, Y.F.; Chen, P.H. Fake news detection using an ensemble learning model based on Self-Adaptive Harmony Search algorithms. Expert. Syst. Appl. 2020, 159, 30.
36. Zheng, L.; Elhai, J.D.; Miao, M.; Wang, Y.; Wang, Y.; Gan, Y. Health-related fake news during the COVID-19 pandemic: Perceived trust and information search. Internet Res. 2022, 32, 768–789.
37. Jwa, H.; Oh, D.; Park, K.; Kang, J.; Lim, H. ExBAKE: Automatic fake news detection model based on bidirectional encoder representations from transformers (BERT). Appl. Sci. 2019, 9, 4062.
38. Braşoveanu, A.M.P.; Andonie, R. Integrating machine learning techniques in semantic fake news detection. Neural Process Lett. 2021, 53, 3055–3072.
39. Ozbay, F.A.; Alatas, B. Adaptive Salp swarm optimization algorithms with inertia weights for novel fake news detection model in online social media. Multimed. Tools Appl. 2021, 80, 34333–34357.
40. Shishah, W. Fake news detection using BERT model with joint learning. Arab. J. Sci. Eng. 2021, 46, 9115–9127.
41. Paka, W.S.; Bansal, R.; Kaushik, A.; Sengupta, S.; Chakraborty, T. Cross-SEAN: A cross-stitch semi-supervised neural attention model for COVID-19 fake news detection. Appl. Soft Comput. 2021, 107, 107393.
42. Song, C.; Shu, K.; Wu, B. Temporally evolving graph neural network for fake news detection. Inf. Process Manag. 2021, 58, 102712.
43. Mehta, D.; Dwivedi, A.; Patra, A.; Anand Kumar, M. A transformer-based architecture for fake news classification. Soc. Netw. Anal. Min. 2021, 11, 1–12.
44. de Souza, M.C.; Nogueira, B.M.; Rossi, R.G.; Marcacini, R.M.; Dos Santos, B.N.; Rezende, S.O. A network-based positive and unlabeled learning approach for fake news detection. Mach. Learn. 2021, 111, 3549–3592.
45. Faustini, P.H.A.; Covões, T.F. Fake news detection in multiple platforms and languages. Expert. Syst. Appl. 2020, 158, 113503.
46. Dong, X.; Victor, U.; Qian, L. Two-path deep semisupervised learning for timely fake news detection. IEEE Trans. Comput. Soc. Syst. 2020, 7, 1386–1398.
47. Alsaeedi, A.; Al-Sarem, M. Detecting rumors on social media on a CNN deep learning technique. Arab. J. Sci. Eng. 2020, 45, 1–32.
48. Savyan, P.; Bhanu, S.M.S. UbCadet: Detection of compromised accounts in Twitter based on user behavioural profiling. Multimed. Tools Appl. 2020, 79, 1–37.
49. Kapusta, J.; Obonya, J. Improvement of misleading and fake news classification for effective languages by morphological group analysis. Informatics 2020, 7, 4.
50. Hakak, S.; Alazab, M.; Khan, S.; Gadekallu, T.R.; Maddikunta, P.K.R.; Khan, W.Z. An ensemble machine learning approach through effective feature extraction to classify fake news. Futur. Gener. Comput. Syst. 2021, 117, 47–58.

Figure 1. Existing FND models use natural language processing challenges.

Figure 2. (a) The comprehensive procedure of the FND model. (b) Algorithmic categorization of the existing FND model performance measure.

Figure 3. Existing applications of FND models.

Table 1. Overview of the types of data and sources of data used in conventional FND models.

Type of Data	Source of Data with References
News articles	False or True news, Snopes Fake Legit news, FND datasets [36], Data collected (https://www.kaggle.com/ (accessed on 17 August 2023)) related to the United States Presidential election 2016 [37], http://www.fakenewschallenge.org/ (accessed on 18 September 2023) [17]
Social media articles	Reuters and Kaggle dataset [16], Horne2017_FakeNewsData [20]
Social media data	LIAR dataset [15,23,38,39], PolitiFact and Pyemia [40], Kaggle [21,25], MICC-F220 dataset [22], BuzzFeed Political News and ISOT dataset [39], COVID-19 dataset [41], Twitter, Weibo, Fake Newsnet dataset [42], Kaggle, Fake Newsnet, ISOT and FA-KES5 dataset [24]
Political comments	BuzzFeed and PolitiFact [28]

Table 2. Word vector model benefits and drawbacks, along with references.

Reference	Benefit	Drawback	Model
[40,43]	It records and detects the context of a text or set of words.	Requires significant computational resources during the inference phase.	BERT
[38]	Independent of local statistics.	Relies on global statistics of word co-occurrences.	GloVe
[44]	Exhibits faster performance compared to Word2Vec.	Less suitable for shorter documents.	Doc2Vec
[17,45]	Uses smaller vectors while keeping contextual information to handle the semantic meaning of several words inside a document.	Lacks common representations in shorter documents and struggles with handling unknown words.	Word2Vec
[26,45,46]	Offers a more straightforward implementation.	Disregards semantic relationships between words and does not consider the word order within a specific document.	Bag of words
[11,15,16,22,25,33,34,40,47]	Contains information about both less relevant words and more important words.	Exhibits slower performance when dealing with larger vocabularies.	TF-IDF

Table 3. Machine learning based model benefits and drawbacks, along with references.

Reference	Advantage	Disadvantage	Machine Learning Model
[33]	Well-suited for handling large-scale data.	Sensitive to the initialization and parameter K, and it has poor performance with non-convex data.	K-means
[34,45]	Dependable generalization capabilities and valuable insights from limited-scale datasets.	Performance is affected by the parameters of the kernel function and is unsuitable for multiple classification functions or huge datasets.	SVM
[12,18]	Captures more meaningful underlying representations.	It necessitates an extensive computational cost.	RNN
[17]	Superior performance compared to individual networks.	May lose important information depending on the feature extraction technique.	CNN LSTM
[6]	Innovative method for predicting the spread of false news using user comments and news content.	It necessitates more training time.	GRU
[41]	It extracts user information precisely.	However, it raises certain concerns about the detection of deceptive information.	GNN
[17,32,38]	It obtains a new set of characteristics from fake news	It takes extra time for both training and testing.	LSTM
[18,24,45]	Reduced susceptibility to overfitting.	Longer training duration.	CNN
[40,43,46]	Integrates supplementary knowledge obtained from extensive-scale data.	Performance can be influenced by noise due to the absence of data pre-processing.	BERT
[39]	It solves challenging issues, yields promising outcomes, and is statistically significant.	It is not suitable for distributed and parallel datasets.	ASSO-OSIW and GWO

Table 4. Performance measurement of the existing fake news identification model.

Reference	Approach
[11]	t-test
[15]	Nemenyi test, Holm test
[45]	Wilcoxon, false positive, true negative
[34]	Reliability, validity, AVE and CR score
[13]	Jaccard similarity
[6]	False positive rate (FPR), true positive rate (TPR), false positive (FP), false negative (FN), true positive (TP), true negative (TN)
[16]	Mean squared error (MSE), data loss
[40]	Binary classification metrics, f2, hamming loss
[42]	Cross-entropy loss, false positive rate (FPR), false negative rate (FNR)
[30]	McNemar’s test
[27]	Macro-F1, micro-F1


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
