Search Results (25)

Search Parameters:
Keywords = name disambiguation

24 pages, 2403 KB  
Article
Named Entity Recognition with Feature-Enhanced BiLSTM and CRF for Fine-Grained Aspect Identification in Large-Scale Textual Reviews
by Shaheen Khatoon, Jibran Mir and Azhar Mahmood
Mach. Learn. Knowl. Extr. 2026, 8(4), 88; https://doi.org/10.3390/make8040088 - 2 Apr 2026
Viewed by 524
Abstract
Named Entity Recognition (NER) plays a crucial role in Aspect-Based Sentiment Identification (ABSI), enabling the extraction of domain-specific aspects and their associated sentiment expressions from unstructured textual reviews. In complex domains such as movie reviews, sentiment is frequently conveyed through references to named entities (e.g., actors, directors, or movie titles) and other contextual cues. However, many existing ABSI approaches treat NER as a separate preprocessing step, limiting the effective modeling of entity–aspect–opinion relationships. Integrating NER directly into the ABSI framework allows entity-specific opinions to be more accurately identified, overlapping aspects to be disambiguated, and contextual sentiment expressions to be captured more effectively. To address these challenges, this study proposes an integrated NER-based aspect identification model built on feature-enhanced LSTM and BiLSTM architectures. Linguistic features, including Part-of-Speech (POS) tags and chunking information, are incorporated to enrich contextual representations, while a Conditional Random Field (CRF) decoding layer models inter-label dependencies for coherent sequence-level predictions of named entities, aspects, and associated opinion expressions. Compared with large transformer-based models, the proposed BiLSTM-CRF architecture offers lower computational complexity and fewer parameters, and it allows explicit integration and analysis of linguistic features that are often only implicitly encoded in transformer attention mechanisms. The model is evaluated through multiple experimental variants across three domains. Four configurations are applied to movie-review data to jointly extract person names, movie titles, and aspect-opinion pairs, while six configurations assess cross-domain robustness on restaurant and laptop review datasets.
Results show that the BiLSTM-CRF model augmented with POS features consistently outperforms baseline configurations in the movie domain and remains competitive across domains, achieving an F1-score of 0.89. These findings demonstrate that explicit linguistic feature integration within CRF-based sequence modeling can provide an effective and computationally efficient alternative to large-scale transformer fine-tuning for structured, entity-linked ABSI tasks. Full article
(This article belongs to the Section Learning)
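The CRF decoding layer this abstract describes picks the best whole tag sequence rather than labeling each token independently. A minimal pure-Python sketch of the underlying Viterbi decoding, with toy emission and transition scores standing in for the trained BiLSTM and CRF weights:

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence for one sentence.

    emissions:   list of {tag: score} dicts, one per token (the BiLSTM output).
    transitions: {(prev_tag, tag): score} pairwise scores (the CRF parameters).
    """
    # best[t] = (score, path) of the best partial sequence ending in tag t
    best = {t: (emissions[0][t], [t]) for t in tags}
    for em in emissions[1:]:
        step = {}
        for t in tags:
            score, path = max(
                (best[p][0] + transitions.get((p, t), 0.0) + em[t], best[p][1])
                for p in tags
            )
            step[t] = (score, path + [t])
        best = step
    return max(best.values())[1]
```

A strongly negative transition score such as `("O", "I"): -10.0` is how a CRF discourages label sequences like an I tag that does not follow a B tag.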
25 pages, 2085 KB  
Article
SPR-RAG: Semantic Parsing Retriever-Enhanced Question Answering for Power Policy
by Yufang Wang, Tongtong Xu and Yihui Zhu
Algorithms 2025, 18(12), 802; https://doi.org/10.3390/a18120802 - 17 Dec 2025
Viewed by 611
Abstract
To address the limitations of Retrieval-Augmented Generation (RAG) systems in handling long policy documents, mitigating information dilution, and reducing hallucinations in engineering-oriented applications, this paper proposes SPR-RAG, a retrieval-augmented framework designed for knowledge-intensive vertical domains such as electric power policy analysis. With practicality and interpretability as core design goals, SPR-RAG introduces a Semantic Parsing Retriever (SPR), which integrates community detection–based entity disambiguation and transforms natural language queries into logical forms for structured querying over a domain knowledge graph, thereby retrieving verifiable triple-based evidence. To further resolve retrieval bias arising from diverse policy writing styles and inconsistencies between user queries and policy text expressions, a question-repository–based indirect retrieval mechanism is developed. By generating and matching latent questions, this module enables more robust retrieval of non-structured contextual evidence. The system then fuses structured and unstructured evidence into a unified dual-source context, providing the generator with an interpretable and reliable grounding signal. Experiments conducted on real electric power policy corpora demonstrate that SPR-RAG achieves 90.01% faithfulness—representing a 5.26% improvement over traditional RAG—and 76.77% context relevance, with a 5.96% gain. These results show that SPR-RAG effectively mitigates hallucinations caused by ambiguous entity names, textual redundancy, and irrelevant retrieved content, thereby improving the verifiability and factual grounding of generated answers. Overall, SPR-RAG demonstrates strong deployability and cross-domain transfer potential through its “Text → Knowledge Graph → RAG” engineering paradigm. 
The framework provides a practical and generalizable technical blueprint for building high-trust, industry-grade question–answering systems, offering substantial engineering value and real-world applicability. Full article
(This article belongs to the Section Algorithms for Multidisciplinary Applications)
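SPR-RAG's fusion of triple-based and passage-based evidence into one grounding context can be sketched as below; the section labels and formatting are illustrative assumptions, not the paper's exact prompt template:

```python
def build_dual_context(triples, passages, max_passages=3):
    """Fuse structured (knowledge-graph triples) and unstructured
    (retrieved policy text) evidence into one grounding context
    for the generator."""
    lines = ["[Structured evidence]"]
    lines += [f"({s}, {p}, {o})" for s, p, o in triples]
    lines.append("[Textual evidence]")
    lines += [f"- {t}" for t in passages[:max_passages]]
    return "\n".join(lines)
```

The generator then answers only from this dual-source context, which is what makes the produced answer verifiable against the retrieved triples and passages.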
29 pages, 2935 KB  
Article
Optimising Contextual Embeddings for Meaning Conflation Deficiency Resolution in Low-Resourced Languages
by Mosima A. Masethe, Sunday O. Ojo and Hlaudi D. Masethe
Computers 2025, 14(9), 402; https://doi.org/10.3390/computers14090402 - 22 Sep 2025
Cited by 1 | Viewed by 1239
Abstract
Meaning conflation deficiency (MCD) presents a continual obstacle in natural language processing (NLP), especially for low-resourced and morphologically complex languages, where polysemy and contextual ambiguity diminish model precision in word sense disambiguation (WSD) tasks. This paper examines the optimisation of contextual embedding models, namely XLNet, ELMo, BART, and their improved variations, to tackle MCD in linguistic settings. Utilising Sesotho sa Leboa as a case study, we devised an enhanced XLNet architecture with specific hyperparameter optimisation, dynamic padding, early stopping, and class-balanced training. Comparative assessments reveal that the optimised XLNet attains an accuracy of 91% and exhibits balanced precision–recall metrics of 92% and 91%, respectively, surpassing both its baseline counterpart and competing models. Optimised ELMo attained the greatest overall metrics (accuracy: 92%, F1-score: 96%), whilst optimised BART demonstrated significant accuracy improvements (96%) despite a reduced recall. The results demonstrate that fine-tuning contextual embeddings using MCD-specific methodologies significantly improves semantic disambiguation for under-represented languages. This study offers a scalable and flexible optimisation approach suitable for additional low-resource language contexts. Full article
21 pages, 2655 KB  
Article
A Hybrid Approach for Geo-Referencing Tweets: Transformer Language Model Regression and Gazetteer Disambiguation
by Thomas Edwards, Padraig Corcoran and Christopher B. Jones
ISPRS Int. J. Geo-Inf. 2025, 14(9), 321; https://doi.org/10.3390/ijgi14090321 - 22 Aug 2025
Cited by 2 | Viewed by 1988
Abstract
Recent approaches to geo-referencing X posts have focused on the use of language modelling techniques that learn geographic region-specific language and use this to infer geographic coordinates from text. These approaches rely on large amounts of labelled data to build accurate predictive models. However, obtaining significant volumes of geo-referenced data from Twitter, recently renamed X, can be difficult. Further, existing language modelling approaches can require the division of a given area into a grid or set of clusters, which can be dataset-specific and challenging for location prediction at a fine-grained level. Regression-based approaches in combination with deep learning address some of these challenges as they can assign coordinates directly without the need for clustering or grid-based methods. However, such approaches have received only limited attention for the geo-referencing task. In this paper, we adapt state-of-the-art neural network models for the regression task, focusing on geo-referencing wildlife Tweets where there is a limited amount of data. We experiment with different transfer learning techniques for improving the performance of the regression models, and we also compare our approach to recently developed Large Language Models and prompting techniques. We show that using a location names extraction method in combination with regression-based disambiguation, and purely regression when names are absent, leads to significant improvements in locational accuracy over using only regression. Full article
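The hybrid pipeline described here, gazetteer lookup disambiguated by a regression estimate with pure regression as the fallback, can be sketched as follows; the gazetteer entries and the `extract_names`/`regress` callables are hypothetical stand-ins for the paper's trained components, and the coordinates are toy values:

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def georeference(text, gazetteer, extract_names, regress):
    """Gazetteer candidates disambiguated by the regression estimate;
    fall back to pure regression when no place name is found."""
    estimate = regress(text)
    candidates = [coord for name in extract_names(text)
                  for coord in gazetteer.get(name, [])]
    if not candidates:
        return estimate
    # pick the gazetteer sense nearest the regression estimate
    return min(candidates, key=lambda c: haversine_km(c, estimate))
```

The design point is that the regression model never needs a grid or clustering step: it emits coordinates directly, and the gazetteer only refines them when a place name is actually present.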
16 pages, 12177 KB  
Article
An Advanced Natural Language Processing Framework for Arabic Named Entity Recognition: A Novel Approach to Handling Morphological Richness and Nested Entities
by Saleh Albahli
Appl. Sci. 2025, 15(6), 3073; https://doi.org/10.3390/app15063073 - 12 Mar 2025
Cited by 9 | Viewed by 2799
Abstract
Named Entity Recognition (NER) is a fundamental task in Natural Language Processing (NLP) that supports applications such as information retrieval, sentiment analysis, and text summarization. While substantial progress has been made in NER for widely studied languages like English, Arabic presents unique challenges due to its morphological richness, orthographic ambiguity, and the frequent occurrence of nested and overlapping entities. This paper introduces a novel Arabic NER framework that addresses these complexities through architectural innovations. The proposed model incorporates a Hybrid Feature Fusion Layer, which integrates external lexical features using a cross-attention mechanism and a Gated Lexical Unit (GLU) to filter noise, while a Compound Span Representation Layer employs Rotary Positional Encoding (RoPE) and Bidirectional GRUs to enhance the detection of complex entity structures. Additionally, an Enhanced Multi-Label Classification Layer improves the disambiguation of overlapping spans and assigns multiple entity types where applicable. The model is evaluated on three benchmark datasets—ANERcorp, ACE 2005, and a custom biomedical dataset—achieving an F1-score of 93.0% on ANERcorp and 89.6% on ACE 2005, significantly outperforming state-of-the-art methods. A case study further highlights the model’s real-world applicability in handling compound and nested entities with high confidence. By establishing a new benchmark for Arabic NER, this work provides a robust foundation for advancing NLP research in morphologically rich languages. Full article
(This article belongs to the Special Issue Techniques and Applications of Natural Language Processing)
9 pages, 1925 KB  
Proceeding Paper
A New Approach for Carrying Out Sentiment Analysis of Social Media Comments Using Natural Language Processing
by Mritunjay Ranjan, Sanjay Tiwari, Arif Md Sattar and Nisha S. Tatkar
Eng. Proc. 2023, 59(1), 181; https://doi.org/10.3390/engproc2023059181 - 17 Jan 2024
Cited by 13 | Viewed by 9156
Abstract
Business and science are using sentiment analysis to extract and assess subjective information from the web, social media, and other sources using NLP, computational linguistics, text analysis, image processing, audio processing, and video processing. It models polarity, attitudes, and urgency from positive, negative, or neutral inputs. Unstructured data make emotion assessment difficult. Unstructured consumer data allow businesses to market, engage, and connect with consumers on social media. Text data are instantly assessed for user sentiment. Opinion mining identifies a text’s positive, negative, or neutral opinions, attitudes, views, emotions, and sentiments. Text analytics uses machine learning to evaluate “unstructured” natural language text data. These data can help firms make money and decisions. Sentiment analysis shows how individuals feel about things, services, organizations, people, events, themes, and qualities. Reviews, forums, blogs, social media, and other articles use it. Data-driven (DD) methods find complicated semantic representations of texts without feature engineering. DD sentiment analysis is three-tiered: document-level analysis determines the polarity and sentiment of a whole document, aspect-based analysis assesses document segments for emotion and polarity, and word-level analysis recognizes the polarity of individual words as positive, negative, or neutral. Our innovative method captures sentiments from text comments. The syntactic layer encompasses processes such as sentence-level normalisation, identification of ambiguities at paragraph boundaries, part-of-speech (POS) tagging, text chunking, and lemmatization. Pragmatics includes personality recognition, sarcasm detection, metaphor comprehension, aspect extraction, and polarity detection; semantics includes word sense disambiguation, concept extraction, named entity recognition, anaphora resolution, and subjectivity detection. Full article
(This article belongs to the Proceedings of Eng. Proc., 2023, RAiSE-2023)
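The document-level tier of the pipeline above can be illustrated with a minimal lexicon-based polarity scorer; the tiny lexicon is a toy, and the paper's actual pipeline adds POS tagging, chunking, sarcasm detection, and the other layers it lists:

```python
# Toy polarity lexicon; a real system would use a curated resource.
LEXICON = {"great": 1, "love": 1, "excellent": 1,
           "bad": -1, "terrible": -1, "boring": -1}

def document_polarity(text):
    """Classify a comment as positive, negative, or neutral by
    summing word-level polarity scores over the whole document."""
    score = sum(LEXICON.get(w.strip(".,!?").lower(), 0)
                for w in text.split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```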
23 pages, 6647 KB  
Article
Name Disambiguation Scheme Based on Heterogeneous Academic Sites
by Dojin Choi, Junhyeok Jang, Sangho Song, Hyeonbyeong Lee, Jongtae Lim, Kyoungsoo Bok and Jaesoo Yoo
Appl. Sci. 2024, 14(1), 192; https://doi.org/10.3390/app14010192 - 25 Dec 2023
Cited by 6 | Viewed by 3563
Abstract
Academic researchers publish their work in various formats, such as papers, patents, and research reports, on different academic sites. When searching for a particular researcher’s work, it can be challenging to pinpoint the right individual, especially when there are multiple researchers with the same name. In order to handle this issue, we propose a name disambiguation scheme for researchers with the same name based on heterogeneous academic sites. The proposed scheme collects and integrates research results from these varied academic sites, focusing on attributes crucial for disambiguation. It then employs clustering techniques to identify individuals who share the same name. Additionally, we implement both the proposed rule-based name disambiguation method and an existing deep-learning-based identification method, and we propose a multi-classifier that executes the most appropriate disambiguation scheme depending on the metadata available in the academic sites. We consider various researchers’ achievements and the metadata of articles registered in various academic search sites. The proposed scheme showed an exceptionally high F1-measure value of 0.99, while the proposed multi-classifier achieved an F1-measure value of 0.67. Full article
(This article belongs to the Special Issue Recent Applications of Big Data Management and Analytics)
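The clustering step, grouping records that share an author name by evidence such as co-authors, can be sketched with a greedy Jaccard-similarity pass; the 0.3 threshold and the co-author-only features are illustrative assumptions, not the paper's tuned configuration:

```python
def jaccard(a, b):
    """Jaccard similarity of two sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_records(records, threshold=0.3):
    """Greedily group publication records of a shared author name:
    a record joins the first cluster whose accumulated co-author set
    is similar enough, otherwise it starts a new (presumed new
    person) cluster."""
    clusters = []  # each: {"coauthors": set, "records": [ids]}
    for rec in records:
        for c in clusters:
            if jaccard(c["coauthors"], rec["coauthors"]) >= threshold:
                c["coauthors"] |= rec["coauthors"]
                c["records"].append(rec["id"])
                break
        else:
            clusters.append({"coauthors": set(rec["coauthors"]),
                             "records": [rec["id"]]})
    return [c["records"] for c in clusters]
```

Each resulting cluster is then treated as one individual; heterogeneous sites contribute extra attributes (affiliation, venue, year) that a fuller implementation would fold into the similarity.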
10 pages, 254 KB  
Article
Findings on Ad Hoc Contractions
by Sing Choi and Kazem Taghva
Information 2023, 14(7), 391; https://doi.org/10.3390/info14070391 - 10 Jul 2023
Viewed by 2005
Abstract
Abbreviations are often overlooked, since their frequency and acceptance are almost second nature in everyday communication. Business names, handwritten notes, online messaging, professional domains, and different languages all have their own sets of abbreviations. The abundance and frequent introduction of new abbreviations cause multiple areas of overlap and ambiguity, which means documents often lose their clarity. We reverse engineered the process of creating these ad hoc abbreviations and revealed some preliminary statistics on what makes them easier or harder to define. In addition, we generated candidate definitions for which it proved difficult for a word sense disambiguation model to select the correct one. Full article
(This article belongs to the Special Issue Novel Methods and Applications in Natural Language Processing)
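One simple model of an ad hoc contraction is dropping non-initial vowels; reversing it yields exactly the ambiguous candidate sets the abstract describes. A sketch, where the vowel-dropping rule is an assumed illustration rather than the authors' exact procedure:

```python
VOWELS = set("aeiou")

def contract(word):
    """Form an ad hoc contraction by keeping the first letter and
    dropping every later vowel (one common shortening pattern)."""
    return word[0] + "".join(c for c in word[1:] if c not in VOWELS)

def candidate_definitions(abbrev, vocabulary):
    """Reverse the process: every vocabulary word whose contraction
    matches the abbreviation is a candidate expansion, left for a
    word sense disambiguation model to rank."""
    return [w for w in vocabulary if contract(w) == abbrev]
```

Both "message" and "massage" contract to "mssg" under this rule, the kind of collision that makes selecting the correct definition hard.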
14 pages, 2538 KB  
Article
ERF-XGB: Ensemble Random Forest-Based XG Boost for Accurate Prediction and Classification of E-Commerce Product Review
by Daniyal M. Alghazzawi, Anser Ghazal Ali Alquraishee, Sahar K. Badri and Syed Hamid Hasan
Sustainability 2023, 15(9), 7076; https://doi.org/10.3390/su15097076 - 23 Apr 2023
Cited by 29 | Viewed by 5325
Abstract
Recently, the concept of e-commerce product review evaluation has become a research topic of significant interest in sentiment analysis. The sentiment polarity estimation of product reviews is a great way to obtain a buyer’s opinion on products. It offers significant advantages for online shopping customers to evaluate the service and product qualities of the purchased products. However, the issues related to polysemy, disambiguation, and word dimension mapping create prediction problems in analyzing online reviews. In order to address such issues and enhance sentiment polarity classification, this paper proposes a new sentiment analysis model, the Ensemble Random Forest-based XG Boost (ERF-XGB) approach, for the accurate binary classification of online e-commerce product review sentiments. Two different Internet Movie Database (IMDB) datasets and the Chinese Emotional Corpus (ChnSentiCorp) dataset are used for estimating online reviews. First, the datasets are preprocessed through tokenization, lemmatization, and stemming operations. The Harris hawk optimization (HHO) algorithm then selects the corresponding features from the two datasets. Finally, the sentiments of online reviews are classified into positive and negative categories using the proposed ERF-XGB approach. Hyperparameter tuning is used to find the optimal parameter values that improve the performance of the proposed ERF-XGB algorithm. The performance of the proposed ERF-XGB approach is analyzed using evaluation indicators, namely accuracy, recall, precision, and F1-score, and compared against existing approaches. The proposed ERF-XGB approach effectively predicts sentiments of online product reviews with an accuracy rate of about 98.7% for the ChnSentiCorp dataset and 98.2% for the IMDB dataset. Full article
20 pages, 676 KB  
Article
Enhancement of Question Answering System Accuracy via Transfer Learning and BERT
by Kai Duan, Shiyu Du, Yiming Zhang, Yanru Lin, Hongzhuo Wu and Quan Zhang
Appl. Sci. 2022, 12(22), 11522; https://doi.org/10.3390/app122211522 - 13 Nov 2022
Cited by 6 | Viewed by 4319
Abstract
Entity linking and predicate matching are two core tasks in Chinese Knowledge Base Question Answering (CKBQA). Compared with the English entity linking task, Chinese entity linking is extremely complicated, making accurate Chinese entity linking difficult. Meanwhile, strengthening the correlation between entities and predicates is key to the accuracy of a question answering system. Therefore, we put forward a Bidirectional Encoder Representation from Transformers and transfer learning Knowledge Base Question Answering (BAT-KBQA) framework, which is based on feature-enhanced Bidirectional Encoder Representation from Transformers (BERT), and then perform a Named Entity Recognition (NER) task appropriate for Chinese datasets using transfer learning and the Bidirectional Long Short-Term Memory-Conditional Random Field (BiLSTM-CRF) model. We utilize a BERT-CNN (Convolutional Neural Network) model for entity disambiguation of the question and candidate entities; based on the set of entities and predicates, a BERT-Softmax model with answer entity predicate features is introduced for predicate matching. The definitive answer is then determined by integrating the entity and predicate scores. The experimental results indicate that our model considerably enhances the overall performance of Knowledge Base Question Answering (KBQA) and has the potential to generalize. The model also performs better on the dataset supplied by the NLPCC-ICCPOL2016 KBQA task, with a mean F1 score of 87.74% compared to BB-KBQA. Full article
(This article belongs to the Topic Recent Advances in Data Mining)
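The final step, integrating entity and predicate scores to choose an answer, can be sketched as a weighted sum; the combination form and the `alpha` weight are assumptions for illustration, not the paper's exact scoring rule:

```python
def pick_answer(candidates, alpha=0.6):
    """Choose the answer whose combined entity-linking and
    predicate-matching score is highest. Each candidate is a dict
    with 'answer', 'entity_score', and 'predicate_score' keys."""
    def combined(c):
        return alpha * c["entity_score"] + (1 - alpha) * c["predicate_score"]
    return max(candidates, key=combined)["answer"]
```

This captures the abstract's point that an answer is accepted only when both the linked entity and the matched predicate support it, rather than either signal alone.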
22 pages, 6734 KB  
Article
G2O-Pose: Real-Time Monocular 3D Human Pose Estimation Based on General Graph Optimization
by Haixun Sun, Yanyan Zhang, Yijie Zheng, Jianxin Luo and Zhisong Pan
Sensors 2022, 22(21), 8335; https://doi.org/10.3390/s22218335 - 30 Oct 2022
Cited by 6 | Viewed by 4112
Abstract
Monocular 3D human pose estimation calculates a 3D human pose from monocular images or videos. It still faces challenges due to the lack of depth information. Traditional methods have tried to disambiguate poses by building a pose dictionary or using temporal information, but these methods are too slow for real-time application. In this paper, we propose a real-time method named G2O-pose, which has a high running speed without greatly affecting accuracy. In our work, we regard the 3D human pose as a graph and solve the problem by general graph optimization (G2O) under multiple constraints. The constraints are implemented by algorithms including 3D bone proportion recovery, human orientation classification, and reverse joint correction and suppression. When the depth of the human body does not change much, our method outperforms previous non-deep-learning methods in terms of running speed, with only a slight decrease in accuracy. Full article
(This article belongs to the Section Physical Sensors)
12 pages, 1074 KB  
Article
Entity Linking Method for Chinese Short Text Based on Siamese-Like Network
by Yang Zhang, Jin Liu, Bo Huang and Bei Chen
Information 2022, 13(8), 397; https://doi.org/10.3390/info13080397 - 22 Aug 2022
Cited by 6 | Viewed by 3254
Abstract
Entity linking plays a fundamental role in knowledge engineering and data mining and is the basis of various downstream applications such as content analysis, relationship extraction, and question answering. Most existing entity linking models rely on sufficient context for disambiguation but do not work well for concise and sparse short texts. In addition, most methods use pre-trained models to directly calculate the similarity between the entity text to be disambiguated and the candidate entity text, and do not dig deeper into the relationship between them. This article proposes an entity linking method for Chinese short texts based on Siamese-like networks to address the above shortcomings. In the entity disambiguation task, the features of the Siamese-like network are used to deeply parse the semantic relationships in the text and make full use of the feature information of the entity text to be disambiguated, capturing the interdependent features within sentences through an attention mechanism, aiming to find the most critical elements in the entity text description. Experiments on the CCKS2019 dataset show that the F1 value of the method reaches 87.29%, an increase of 11.02% over that of the baseline method, fully validating the superiority of the model. Full article
(This article belongs to the Special Issue Intelligence Computing and Systems)
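The disambiguation step scores the mention's context against each candidate-entity description; below, a bag-of-words cosine deliberately stands in for the paper's Siamese-like encoder and attention mechanism, and the candidate descriptions are toy data:

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    num = sum(a[w] * b[w] for w in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def link_entity(context, candidates):
    """Rank candidate-entity descriptions against the mention's
    context and return the best candidate's name."""
    ctx = Counter(context.lower().split())
    return max(candidates,
               key=lambda c: cosine(ctx, Counter(c["desc"].lower().split())))["name"]
```

A Siamese architecture replaces the raw word counts with two weight-shared encoders, so the same comparison happens in a learned semantic space instead of surface-token overlap.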
15 pages, 1209 KB  
Article
Computationally Efficient Context-Free Named Entity Disambiguation with Wikipedia
by Michael Angelos Simos and Christos Makris
Information 2022, 13(8), 367; https://doi.org/10.3390/info13080367 - 2 Aug 2022
Cited by 7 | Viewed by 4585
Abstract
The induction of the semantics of unstructured text corpora is a crucial task for modern natural language processing and artificial intelligence applications. The Named Entity Disambiguation task comprises the extraction of Named Entities and their linking to an appropriate representation from a concept ontology based on the available information. This work introduces novel methodologies, leveraging domain knowledge extraction from Wikipedia in a simple yet highly effective approach. In addition, we introduce a fuzzy logic model with a strong focus on computational efficiency. We also present a new measure, decisive in both methods for the entity linking selection and the quantification of the confidence of the produced entity links, namely the relative commonness measure. The experimental results of our approach on established datasets revealed state-of-the-art accuracy and run-time performance in the domain of fast, context-free Wikification, by relying on an offline pre-processing stage on the corpus of Wikipedia. The methods introduced can be leveraged as stand-alone NED methodologies, propitious for applications on mobile devices, or in the context of vastly reducing the complexity of deep neural network approaches as a first context-free layer. Full article
(This article belongs to the Special Issue Knowledge Management and Digital Humanities)
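Commonness of a candidate sense is standardly estimated from Wikipedia anchor statistics; the "relative commonness" below, a top-to-runner-up ratio, is an assumed reading of the measure this abstract names, not its exact definition in the paper:

```python
def commonness(anchor_counts):
    """anchor_counts: {sense: times the surface form links to that
    sense in Wikipedia}. Returns {sense: commonness}, each count
    normalised by the surface form's total link count."""
    total = sum(anchor_counts.values())
    return {s: n / total for s, n in anchor_counts.items()}

def relative_commonness(anchor_counts):
    """Confidence of the top sense: its commonness relative to the
    runner-up. Infinite when there is no competing sense."""
    ranked = sorted(commonness(anchor_counts).values(), reverse=True)
    if len(ranked) < 2 or ranked[1] == 0:
        return float("inf")
    return ranked[0] / ranked[1]
```

Because both quantities come from a single offline pass over Wikipedia's anchor texts, linking at query time is a dictionary lookup, which is what makes context-free Wikification this fast.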
17 pages, 353 KB  
Article
The “In-Between Land” of Suspicion and Ambiguity: Plotting the MS Estonia Shipwreck
by Siim Sorokin
Humanities 2022, 11(4), 92; https://doi.org/10.3390/h11040092 - 22 Jul 2022
Cited by 2 | Viewed by 3446
Abstract
The present article is multidisciplinary, drawing on and synthesizing narrative media theories, philosophy of epistemology, conspiracy theory research, and creativity studies. I will explore the following central theoretical problem: whether it is conceptually enriching to (i) further develop the notion of and hence advance the scholarship in “conspiracy theorizing” and (ii) in doing so, would it be productive to ponder the role of peoples’ affective state of suspicion in engaging with ambiguous representations, something that is thrown into especially sharp relief by the conspiracist discourse. Accordingly, my point of departure is the concept of ambiguity and the related semantic field (including its antithesis, closure). Hereby, the concept of suspicion is introduced and treated as a creativity-enhancing, productive affect rooted in narrative thinking and construction. In particular, a specific manifestation of ambiguity apparent in digital sense-making discourses is foregrounded—a self-reproduced ambiguity. These dynamics are explored in the context of, while aspiring to overcome the scholarly emphasis on its negative valence, the practice of “conspiracy theorizing”. This popular practice is hence reconceptualized as contra-plotting. It is understood as a form of sense-making undertaken by the plotters of suspicion in challenging official explanations found unsatisfying and straining one’s belief. Such activity emerges and becomes instrumental in the face of explanatory uncertainty, such as the unsolved nature (“the how”) of the shipwreck, and is posited to be an individual and collaborative creative construction characterized by “continual interpretation”. For, as I will argue, the functional outcome of contra-plotting is to self-reproduce—not to obtain closure for the—ambiguity. Motivated by the suspicious stance, it is a necessary operative mode of such interpretation itself. 
In attempting to overcome their suspicions about official explanations, plotters inadvertently also ‘plot’ suspicion. Consequently, such an interpretative process corresponding to disambiguation plotting always feeds back into its own ever-expanding (narrative) ‘middle’, searching for yet immediately disregarding, as if by design, any final crystallized ‘truth’. In this context, the perhaps more understated meaning of “to interpret”—namely, to creatively supplement “deficiencies” (supplentio)—may gain in conceptual relevance. In staking the proposed theoretical apparatus, I will draw on my preliminary findings from analytical work on ‘real-time’ digital discussions—observable as a chronological forum archive—on the 1994 shipwreck of the cruise ferry MS Estonia. In order to instrumentalize the outlined tentative theoretical vocabulary, an interpretative close reading of posts from different time periods from the conspiracist forum Para-Web will be provided. This analysis combines textual and narrative analyses. The article ends with some concluding thoughts and aims for further research. Full article
17 pages, 750 KB  
Article
AR-Sanad 280K: A Novel 280K Artificial Sanads Dataset for Hadith Narrator Disambiguation
by Somaia Mahmoud, Omar Saif, Emad Nabil, Mohammad Abdeen, Mustafa ElNainay and Marwan Torki
Information 2022, 13(2), 55; https://doi.org/10.3390/info13020055 - 23 Jan 2022
Cited by 9 | Viewed by 10500
Abstract
Determining hadith authenticity is vitally important in the Islamic religion because hadiths record the sayings and actions of Prophet Muhammad (PBUH), and they are the second source of Islamic teachings following the Quran. When authenticating a hadith, the reliability of the hadith narrators is a big factor that hadith scholars consider. However, many narrators share similar names, and the narrators’ full names are not usually included in the narration chains of hadiths. Thus, first, ambiguous narrators need to be identified. Then, their reliability level can be determined. There are no available datasets that could help address this problem of identifying narrators. Here, we present a new dataset that contains narration chains (sanads) with identified narrators. The AR-Sanad 280K dataset has around 280K artificial sanads and can be used to identify 18,298 narrators. After creating the AR-Sanad 280K dataset, we address narrator disambiguation in several experimental setups. Hadith narrator disambiguation is modeled as a multiclass classification problem with 18,298 class labels. We test different representations and models in our experiments. The best results were achieved by fine-tuning a BERT-based deep learning model (AraBERT). We obtained a 92.9 Micro F1 score and a 30.2 sanad error rate (SER) on the validation set of our artificial AR-Sanad 280K dataset. Furthermore, we extracted a real test set from the sanads of the famous six books of Islamic hadith. Evaluating the best model on this real test data, we achieved an 83.5 Micro F1 score and a 60.6 SER. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
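The sanad error rate (SER) reported above is read here as the fraction of chains containing at least one misidentified narrator; that reading is an assumption about the metric, and the narrator IDs are toy values:

```python
def sanad_error_rate(gold, pred):
    """Fraction of narration chains (sanads) with at least one
    wrongly identified narrator; an assumed reading of SER.
    gold and pred are parallel lists of narrator-ID lists."""
    wrong = sum(1 for g, p in zip(gold, pred)
                if any(a != b for a, b in zip(g, p)))
    return wrong / len(gold)
```

SER is stricter than per-narrator micro F1: a single wrong narrator anywhere in a chain counts the whole sanad as an error, which explains how a 92.9 Micro F1 score can coexist with a 30.2 SER.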