Search Results (247)

Search Parameters:
Keywords = Arabic language model

25 pages, 400 KB  
Article
An Automated Unsupervised Model Using Probabilistic Mixture Models and Textual Analysis for Arabic Fake News Detection
by Nuha Zamzami, Hanen Himdi and Rehab K. Qarout
Mathematics 2026, 14(8), 1250; https://doi.org/10.3390/math14081250 - 9 Apr 2026
Abstract
Alongside the coronavirus pandemic (COVID-19), some in the medical publication industry have observed an "infodemic" that has spread even more widely than the virus itself. Given the lack of sufficient pandemic preparedness measures in many countries, people shared millions of posts on social media without questioning their veracity or accuracy, particularly within Arabic-speaking communities. This study investigates an unsupervised model for detecting fake news in Arabic to fight the infodemic. While there has been much research on fake news detection (FND) in English, Arabic has received comparatively little attention in the literature. We examine the use of distribution-based clustering techniques for Arabic FND and compare their performance. Moreover, we conduct a comprehensive linguistic analysis, identifying significant differences in textual features between real and fake posts that can improve fake news detection. Our research shows the potential of online learning techniques to enhance model performance, yielding accuracy of up to 92%. By addressing the unique challenges posed by Arabic-language posts, our research offers practical implications for developing effective strategies to reduce infodemics and their social consequences and for strategic planning to control current and future infodemics.
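As a rough illustration of the distribution-based clustering idea, the sketch below fits a two-component mixture to TF-IDF vectors of posts. The TF-IDF pipeline, Gaussian mixture family, and toy posts are assumptions for illustration, not the authors' exact probabilistic mixture models.

```python
# Minimal sketch: unsupervised mixture-model clustering of posts.
# Assumption: Gaussian mixture over TF-IDF features stands in for the
# paper's mixture models; the posts below are hypothetical examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.mixture import GaussianMixture

posts = ["عاجل: علاج معجزة يقضي على الفيروس",        # hypothetical fake-style post
         "وزارة الصحة تعلن تحديث إجراءات الوقاية",     # hypothetical real-style post
         "شارك قبل الحذف: الفيروس مؤامرة",
         "دراسة محكمة حول فعالية اللقاحات"]
X = TfidfVectorizer().fit_transform(posts).toarray()

gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
clusters = gmm.fit_predict(X)   # unsupervised: two components ~ real vs. fake
soft = gmm.predict_proba(X)     # per-post component responsibilities
```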

24 pages, 5711 KB  
Article
Image Captioning Through Deep Learning: An Adaptation of the BLIP-2 Model to Arabic
by Ahmed Fathy Abdelaal, Enrique Costa-Montenegro, Silvia García-Méndez, Hatem Mohamed Noaman and Mohammed Kayed
Appl. Sci. 2026, 16(7), 3226; https://doi.org/10.3390/app16073226 - 26 Mar 2026
Viewed by 334
Abstract
Image captioning using deep learning bridges computer vision and natural language processing, enabling machines to generate human-like textual descriptions for images. While significant progress has been made for English, image captioning in Arabic remains under-explored due to the language's morphological complexity, right-to-left script, and scarcity of annotated datasets. This paper addresses this gap by adapting the BLIP-2 (Bootstrapped Language-Image Pre-training) model for Arabic caption generation, leveraging machine-translated datasets such as Flickr 30k to overcome resource limitations. BLIP-2 combines a vision transformer (ViT) for image encoding and a CamelBERT large language model (LLM) for text generation, enhanced by a lightweight Querying Transformer (Q-Former) for cross-modal alignment. Despite challenges such as translation artifacts and linguistic nuances, our experiments demonstrate promising results in generating coherent Arabic captions. In short, this study highlights the potential of BLIP-2 for multilingual applications while underscoring the need for native Arabic datasets and further optimization. Ultimately, this work contributes to advancing inclusive artificial intelligence technologies for Arabic-speaking communities, with applications in assistive tools, education, and content creation.
(This article belongs to the Section Computing and Artificial Intelligence)
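For a sense of the Q-Former's role in this architecture, the sketch below shows learnable query tokens cross-attending to ViT patch features to produce a fixed-length prefix for the language model. The dimensions, head count, and single attention layer are simplifying assumptions, not the actual BLIP-2 module.

```python
import torch
import torch.nn as nn

class MiniQFormer(nn.Module):
    """Learnable query tokens cross-attend to frozen image features and
    emit a fixed-length soft prompt for the language model (sketch)."""
    def __init__(self, n_queries=32, dim=768):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, dim) * 0.02)
        self.xattn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.proj = nn.Linear(dim, dim)   # map into the LLM embedding space

    def forward(self, image_feats):       # image_feats: (B, patches, dim) from the ViT
        q = self.queries.unsqueeze(0).expand(image_feats.size(0), -1, -1)
        out, _ = self.xattn(q, image_feats, image_feats)
        return self.proj(out)             # (B, n_queries, dim) prefix embeddings

prompt = MiniQFormer()(torch.randn(2, 196, 768))   # 2 images, 14x14 patches
```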

18 pages, 37747 KB  
Article
Factually Consistent Prompting with LLMs for Cross-Lingual Dialogue Summarization
by Zhongtian Bao, Wenjian Ding, Yao Zhang, Jun Wang, Zhe Sun, Andrzej Cichocki and Zhenglu Yang
Computers 2026, 15(3), 197; https://doi.org/10.3390/computers15030197 - 21 Mar 2026
Viewed by 280
Abstract
Recent breakthroughs in large language models have made it feasible to summarize cross-lingual dialogue effectively, which is essential in the context of global communication. However, existing methodologies encounter difficulties in maintaining factual consistency across multiple dialogue exchanges and lack clear explanations of the summarization process. This paper presents a novel factually consistent prompting technique with large language models to address these challenges in cross-lingual dialogue summarization. First, we propose a factual replacement mechanism that enhances information analysis by incorporating noise information into summarization candidates. We adopt a self-guidance framework to enforce factual consistency, enhancing information-flow tracking in cross-lingual hybrid dialogue scenarios with the assistance of GPT-based models. Furthermore, we introduce a view-aware chain-of-thought-driven architecture to improve the interpretability and transparency of the cross-lingual dialogue summarization process. Comprehensive experimental evaluations on cross-lingual summarization tasks spanning English, French, Spanish, Russian, Chinese, and Arabic, as well as hybrid cross-lingual tasks, substantiate that the proposed model achieves superior performance relative to state-of-the-art baselines.

24 pages, 2520 KB  
Article
MAFQA: A Dataset for Benchmarking Multi-Hop Arabic Fatwa Question Answering
by Manal Ali Al-Qahtani, Bader Fahad Alkhamees and Mourad Ykhlef
Data 2026, 11(3), 64; https://doi.org/10.3390/data11030064 - 20 Mar 2026
Viewed by 279
Abstract
Developing reliable Arabic question answering (QA) systems for Islamic fatwas requires datasets that capture the linguistic complexity and multi-step reasoning inherent in jurisprudential inquiries. However, the existing Arabic religious QA datasets primarily focus on direct retrieval or classification, often failing to address the multi-hop reasoning necessary for complex fatwa questions. To bridge this gap, we introduce MAFQA, a benchmark dataset specifically designed for multi-hop Arabic fatwa question answering. MAFQA was constructed from an extensive corpus of authentic fatwa records sourced from authoritative Islamic institutions. The dataset was developed via a semi-automated pipeline that integrates expert-guided identification of complex inquiries with a structured decomposition framework. This framework employs automated reasoning-pattern classification, semantic feature extraction, and template-guided annotation of subquestions and subanswers, followed by rigorous validation to ensure contextual grounding, logical coherence, and structural consistency. To evaluate the utility of the dataset, we conduct an extensive benchmarking study using Arabic-specialized, multilingual, and instruction-tuned language models across two primary tasks: question decomposition (QD) and generative question answering (QA). Performance is assessed using a comprehensive suite of lexical, semantic, relevance, and faithfulness metrics. Experimental results demonstrate that Arabic-specialized models consistently outperform their multilingual counterparts, with AraT5-base and AraBART achieving the highest performance in terms of lexical similarity, semantic alignment, and answer faithfulness.
(This article belongs to the Section Information Systems and Data Management)

16 pages, 4745 KB  
Article
Automated Construction of a Multi-Dialectal Saudi Corpus Using Generative Language Models
by Khalid Almeman
Electronics 2026, 15(6), 1241; https://doi.org/10.3390/electronics15061241 - 17 Mar 2026
Viewed by 302
Abstract
The lack of high-quality linguistic resources, especially large and diverse Arabic dialect corpora, is a major challenge in the development of Natural Language Processing (NLP) applications. By taking advantage of the generative power of Large Language Models (LLMs), this research proposes an efficient approach for the automatic construction of a large-scale corpus of Saudi dialects. We translated 51,840 sentences from Modern Standard Arabic (MSA) into three major Saudi dialects, Qassim (Central), Makkah/Jeddah (Western), and Al-Ahsa (Eastern), using Google's Gemini 1.5 Pro model. Only two items were flagged by the system as invalid outputs and removed, yielding a pipeline-level invalid-output rate below 0.01%. Both quantitative and qualitative differences between MSA and its dialects were discovered through extensive linguistic analyses. Dialectal sentences had significantly higher lexical density and type-token ratios, while being consistently shorter and more concise. These results suggest that the generated dialectal outputs reflect expected tendencies of informal registers in this controlled, domain-specific setting, while highlighting persistent challenges for dialectal NLP, particularly orthographic variation and the lack of standardized spelling.
(This article belongs to the Special Issue Low-Resource Languages in the Age of Large Language Models)
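A minimal sketch of such a translation loop, assuming the public google-generativeai Python SDK; the prompt wording is hypothetical and the authors' exact pipeline, validation, and batching are not reproduced here.

```python
# Sketch under stated assumptions: documented google-generativeai calls,
# a hypothetical prompt, no retry/validation logic.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

def to_dialect(msa_sentence: str, dialect: str) -> str:
    prompt = (f"Translate the following Modern Standard Arabic sentence "
              f"into the {dialect} Saudi dialect. Return only the translation.\n"
              f"{msa_sentence}")
    return model.generate_content(prompt).text.strip()

for dialect in ["Qassim", "Makkah/Jeddah", "Al-Ahsa"]:
    print(to_dialect("كيف حالك اليوم؟", dialect))
```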

25 pages, 1893 KB  
Article
Contribution to Sarcasm Detection in Arabic Using Natural Language Processing Techniques
by Mennat Allah Hassan, Silvia García-Méndez and Francisco de Arriba-Pérez
Appl. Sci. 2026, 16(6), 2724; https://doi.org/10.3390/app16062724 - 12 Mar 2026
Viewed by 295
Abstract
Sarcasm detection remains a challenging task in Natural Language Processing (NLP), especially for low-resource and non-standardized languages. Hence, this study addresses Franco-Arabic, a widely used form of online communication in which Arabic words are written with Latin characters and numerals. Its informal nature and orthographic variation complicate sarcasm identification and limit the applicability of existing NLP models. We propose an approach that integrates transformer-based representations with auxiliary linguistic features and rule-based cues to capture both contextual meaning and sentiment-driven inconsistencies. This research opens the door to practical applications. In particular, future work will investigate integrating sarcasm detection into the marketing sector, where accurate recognition of sarcastic reviews can enhance sentiment analysis, customer segmentation, and personalized communication strategies.
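A minimal sketch of the fusion idea: concatenating a contextual sentence embedding with auxiliary linguistic features before classification. The encoder checkpoint, the 12-dimensional auxiliary vector, and the head sizes are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class SarcasmClassifier(nn.Module):
    """Sketch: contextual [CLS] embedding fused with rule-based /
    linguistic cue features (hypothetical dimensionality)."""
    def __init__(self, model_name="bert-base-multilingual-cased", n_aux=12):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        dim = self.encoder.config.hidden_size
        self.head = nn.Sequential(nn.Linear(dim + n_aux, 256),
                                  nn.ReLU(), nn.Linear(256, 2))

    def forward(self, input_ids, attention_mask, aux_feats):
        # [CLS] token as the contextual sentence representation
        h = self.encoder(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state[:, 0]
        # concatenate auxiliary cues before the classification head
        return self.head(torch.cat([h, aux_feats], dim=-1))
```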

33 pages, 2017 KB  
Article
GTHL-Emo: Adaptive Imbalance-Aware and Correlation-Aligned Training for Arabic Multi-Label Emotion Detection
by Mashary N. Alrasheedy, Sabrina Tiun and Fariza Fauzi
Electronics 2026, 15(6), 1169; https://doi.org/10.3390/electronics15061169 - 11 Mar 2026
Viewed by 353
Abstract
Multi-label emotion detection (MLED) suffers from long-tailed label distributions and structured inter-label correlations, which jointly suppress rare label recall and yield incoherent predictions. We present Graph Neural Network-Enhanced Transformer with Hybrid Loss Weighting (GTHL-Emo), a unified framework that addresses both challenges without heavy additional machinery. First, an adaptive imbalance-aware training scheme combines binary cross-entropy, asymmetric focal, and pairwise ranking losses under a learned batch-wise controller, emphasizing rare labels while stabilizing thresholding. Second, a lightweight correlation alignment module learns transformer-based label embeddings and aligns their predicted affinities with empirical co-occurrence via Kullback–Leibler (KL) regularization, smoothing rare label predictions through correlated frequent labels. A transformer encoder with learnable attention pooling provides semantic representations, and a dynamic GraphSAGE layer captures inter-instance structural dependencies. Comprehensive evaluation across three Arabic benchmarks—SemEval-2018-Ec-Ar, ExaAEC, and SemEval-2025 (Track A, Arq)—demonstrates competitive or leading performance. On SemEval-2018-Ec-Ar, GTHL-Emo attained a Jaccard accuracy of 58.70%, micro-F1 score of 71.02%, and macro-F1 score of 60.48%. On ExaAEC, it achieved a Jaccard accuracy of 65.99%, micro-F1 score of 70.72%, and macro-F1 score of 68.71%. On SemEval-2025-Arq, it obtained a Jaccard accuracy of 41.47%, micro-F1 score of 56.78%, and macro-F1 score of 56.69%. Ablation studies revealed that the GraphSAGE structure and ranking loss contributed most significantly (1.45% and 1.46% Jaccard accuracy drops, respectively), while label correlation alignment provided consistent improvements across the scales. These findings demonstrate that jointly optimizing imbalance-aware objectives and label dependencies yields robust Arabic MLED with minimal overhead.
(This article belongs to the Special Issue Deep Learning Approaches for Natural Language Processing)
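The hybrid objective can be sketched as below; the focal exponents, the pairwise ranking form, and the softmax controller over loss weights are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(logits, targets, log_w):
    """BCE + asymmetric focal + pairwise ranking, mixed by learned weights
    (sketch; exponents and controller form are assumptions)."""
    bce = F.binary_cross_entropy_with_logits(logits, targets)
    p = torch.sigmoid(logits)
    # asymmetric focal: penalize easy negatives more aggressively (gamma- = 4)
    focal = -(targets * (1 - p) * F.logsigmoid(logits)
              + (1 - targets) * p.pow(4) * F.logsigmoid(-logits)).mean()
    # pairwise ranking: every positive label should outscore every negative
    diff = logits.unsqueeze(2) - logits.unsqueeze(1)          # (B, L, L)
    mask = targets.unsqueeze(2) * (1 - targets).unsqueeze(1)  # pos-vs-neg pairs
    rank = (F.softplus(-diff) * mask).sum() / mask.sum().clamp(min=1)
    w = torch.softmax(log_w, dim=0)   # learned batch-wise mixing weights
    return w[0] * bce + w[1] * focal + w[2] * rank

log_w = torch.nn.Parameter(torch.zeros(3))   # optimized jointly with the model
loss = hybrid_loss(torch.randn(8, 11), torch.randint(0, 2, (8, 11)).float(), log_w)
```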

17 pages, 1701 KB  
Article
CLIP-ArASL: A Lightweight Multimodal Model for Arabic Sign Language Recognition
by Naif Alasmari
Appl. Sci. 2026, 16(5), 2573; https://doi.org/10.3390/app16052573 - 7 Mar 2026
Viewed by 251
Abstract
Arabic sign language (ArASL) is the primary communication medium for Deaf and hard-of-hearing people across Arabic-speaking communities. Most current ArASL recognition systems are based solely on visual features and do not incorporate linguistic or semantic information that could improve generalization and semantic grounding. This paper introduces CLIP-ArASL, a lightweight CLIP-style multimodal approach for static ArASL letter recognition that aligns visual hand gestures with bilingual textual descriptions. The approach integrates an EfficientNet-B0 image encoder with a MiniLM text encoder to learn a shared embedding space using a hybrid objective that combines contrastive and cross-entropy losses. This design supports supervised classification on seen classes and zero-shot prediction on unseen classes using textual class representations. The proposed approach is evaluated on two public datasets, ArASL2018 and ArASL21L. Under supervised evaluation, recognition accuracies of 99.25±0.14% and 91.51±1.29% are achieved, respectively. Zero-shot performance is assessed by withholding 20% of gesture classes during training and predicting them using only their textual descriptions. In this setting, accuracies of 55.2±12.15% on ArASL2018 and 37.6±9.07% on ArASL21L are obtained. These results show that multimodal vision–language alignment supports semantic transfer and enables recognition of unseen classes.
(This article belongs to the Special Issue Machine Learning in Computer Vision and Image Processing)
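A minimal sketch of the hybrid contrastive plus cross-entropy objective; the embedding size, temperature, class count, and equal loss weighting are assumptions, and the actual encoders (EfficientNet-B0, MiniLM) are omitted.

```python
import torch
import torch.nn.functional as F

def clip_style_losses(img_emb, txt_emb, class_txt_emb, labels, temp=0.07):
    """Contrastive image-text alignment + supervised classification against
    embedded class descriptions (sketch; 0.07 temperature is an assumption)."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    sim = img @ txt.t() / temp                      # (B, B) similarity matrix
    tgt = torch.arange(img.size(0), device=img.device)
    contrastive = (F.cross_entropy(sim, tgt) + F.cross_entropy(sim.t(), tgt)) / 2
    # classify against class-description embeddings; at test time the same
    # matmul scores unseen classes from their text alone (zero-shot)
    cls_logits = img @ F.normalize(class_txt_emb, dim=-1).t() / temp
    return contrastive + F.cross_entropy(cls_logits, labels)

loss = clip_style_losses(torch.randn(8, 384), torch.randn(8, 384),
                         torch.randn(28, 384), torch.randint(0, 28, (8,)))
```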

17 pages, 306 KB  
Article
Multimodal AI Screening of Developmental Language Disorder in Tunisian Arabic Children: Clinical Markers and Computational Detection
by Faten Bouhajeb, Redha Touati and Selçuk Güven
Behav. Sci. 2026, 16(3), 375; https://doi.org/10.3390/bs16030375 - 6 Mar 2026
Viewed by 380
Abstract
Developmental Language Disorder (DLD) is a common neurodevelopmental condition that affects language acquisition in children. However, standardized diagnostic tools for Tunisian Arabic, a widely spoken yet underrepresented dialect, are still lacking. This study presents a multimodal biomedical informatics framework that integrates clinical assessments, speech recordings, and artificial intelligence (AI) for early DLD detection. Three linguistic tasks (the CLT Task, the Arabic Verb Evaluation Task, and the Nonword Repetition Task) were adapted for Tunisian Arabic, and spontaneous speech samples were collected from children with typical development and those with DLD. Statistical analyses revealed significant deficits in verb production, past-tense morphology, and phonological memory in the DLD group. For automated screening, we developed two systems: a Random Forest classifier based on structured clinical and linguistic features and a multimodal deep learning model using Wav2Vec2 acoustic embeddings. The best model achieved an F1 score of 0.85, demonstrating the feasibility of AI-assisted DLD screening. This work introduces the first standardized dataset and computational baseline for DLD in Tunisian Arabic, providing clinically relevant tools for early identification and supporting research on underrepresented Arabic dialects. This work also highlights future implications, including potential applications in early screening, the integration of acoustic markers, and the development of culturally adapted assessment tools for underrepresented languages.
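A minimal sketch of the feature-based screening branch with scikit-learn; the feature matrix below is synthetic, standing in hypothetically for the clinical/linguistic features (verb production, past-tense accuracy, nonword-repetition scores, and similar), not the study's data.

```python
# Sketch: Random Forest screening on structured features; X and y are
# synthetic placeholders (0 = typical development, 1 = DLD).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))        # 60 children x 8 clinical/linguistic features
y = rng.integers(0, 2, size=60)

clf = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                             random_state=0)
print(cross_val_score(clf, X, y, cv=5, scoring="f1").mean())
```
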
22 pages, 3288 KB  
Article
An Intelligent Real-Time System for Sentence-Level Recognition of Continuous Saudi Sign Language Using Landmark-Based Temporal Modeling
by Adel BenAbdennour, Mohammed Mukhtar, Osama Almolike, Bilal A. Khawaja and Abdulmajeed M. Alenezi
Sensors 2026, 26(5), 1652; https://doi.org/10.3390/s26051652 - 5 Mar 2026
Viewed by 439
Abstract
A persistent challenge for Deaf and Hard-of-Hearing individuals is the communication gap between sign language users and the hearing community, particularly in regions with limited automated translation resources. In Saudi Arabia, this gap is amplified by the reliance on Saudi Sign Language (SSL) and the scarcity of real-time, sentence-level translation systems. This paper presents a real-time system for sentence-level recognition of continuous SSL and direct mapping to natural spoken Arabic. The proposed system operates end-to-end on live video streams or pre-recorded content, extracting spatio-temporal landmark features using the MediaPipe Holistic framework. For classification, the input feature vector consists of 225 features derived from hand and body pose landmarks. These features are processed by a Bidirectional Long Short-Term Memory (BiLSTM) network trained on the ArabSign (ArSL) dataset to perform direct sentence-level classification over a vocabulary of 50 continuous Arabic sign language sentences, supported by an idle-based segmentation mechanism that enables natural, uninterrupted signing. Experimental evaluation demonstrates robust generalization: under a Leave-One-Signer-Out (LOSO) cross-validation protocol, the model attains a mean sentence-level accuracy of 94.2%, outperforming the fixed signer-independent split baseline of 92.07%, while maintaining real-time performance suitable for interactive use. To enhance linguistic fluency, an optional post-recognition refinement stage is incorporated using a large language model (LLM), followed by text-to-speech synthesis to produce audible Arabic output; this refinement operates strictly as post-processing and is not included in the reported recognition accuracy metrics. The results demonstrate that direct sentence-level modeling, combined with landmark-based feature extraction and real-time segmentation, provides an effective and practical solution for continuous SSL sentence recognition in real time.
(This article belongs to the Special Issue Sensor Systems for Gesture Recognition (3rd Edition))
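The recognition backbone can be sketched directly from the abstract's numbers (225 landmark features per frame, 50 sentence classes); the hidden size, layer count, and final-step pooling below are assumptions.

```python
import torch
import torch.nn as nn

class SignSentenceBiLSTM(nn.Module):
    """Sketch: BiLSTM over per-frame landmark features, one label per clip."""
    def __init__(self, n_feats=225, hidden=128, n_classes=50):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):              # x: (batch, frames, 225) landmark features
        out, _ = self.lstm(x)
        return self.head(out[:, -1])   # classify from the final time step

model = SignSentenceBiLSTM()
logits = model(torch.randn(4, 60, 225))   # 4 clips of 60 frames each
```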

18 pages, 284 KB  
Article
AI Adoption in K–12 Education: A Model of Skills Transformation, Productivity, and Institutional Readiness
by Tarek Elmourad, Lycourgos Hadjiphanis, Kyriakos Christofi, Pieris Chourides and Alexios Kythreotis
Educ. Sci. 2026, 16(2), 337; https://doi.org/10.3390/educsci16020337 - 19 Feb 2026
Viewed by 749
Abstract
The integration of artificial intelligence (AI) into education in the United Arab Emirates (UAE) is moving rapidly as it becomes increasingly mandatory, yet schools often lack the expertise and resources to achieve successful implementation. AI integration is not just about having access to AI large language models and supplying schools with the right machines; it is also about ensuring the right preparation before introducing the tools and connectivity. Existing research has primarily focused on technological capabilities or individual attitudes, offering limited insight into how human, organizational, and well-being factors jointly shape institutional readiness. This study examines the determinants of AI adoption readiness in K–12 education using quantitative survey data collected from 602 teachers across public and private schools in the UAE. The study builds on the technology–organization–environment framework and behavioral perception theory to test the influence of multiple factors on perceived usefulness: professional development availability, school encouragement, access to AI tools, work–life balance, teaching experience, and institutional readiness. The findings suggest that cultural readiness is as important as technical readiness. They also underscore the primacy of perception, leadership support, and infrastructure alignment in shaping the transformation. This study presents an empirically tested explanatory framework that may inform policymakers and school leaders seeking to conduct AI transformation, and it offers practical implications for designing professional development, leadership strategy, and implementation models to support sustainable AI integration.

29 pages, 911 KB  
Article
Boundary-Focused Large Language Model Adaptation for Style Change Detection in Multi-Authored Text
by Abeer Saad Alsheddi and Mohamed El Bachir Menai
Appl. Sci. 2026, 16(4), 1981; https://doi.org/10.3390/app16041981 - 17 Feb 2026
Viewed by 280
Abstract
The style change detection (SCD) task involves identifying the locations of writing-style changes in multi-authored documents, with applications in plagiarism detection, security, and commerce. The introduction of decoder-based Large Language Models (LLMs) marks a pivotal shift for such applications. Segment boundaries for SCD models can be represented by concatenating two consecutive segments as pairs; however, LLMs restrict their input lengths, and these concatenated inputs may exceed the limit. This paper seeks to bridge this gap and exploit the power of LLMs by introducing Boundary-Focused LLM Adaptation for SCD (BF-LLMA-SCD). The proposed solution adapts decoder-based LLMs for SCD using QLoRA. BF-LLMA-SCD truncates long inputs by preserving the text nearest the examined boundary and discarding the far ends of both segments. BF-LLMA-SCD was trained on three PAN datasets. Comparisons with the top-performing state-of-the-art solutions show that BF-LLMA-SCD achieved the best F1 results on PAN 2021 and PAN 2022/D1, while obtaining competitive results on PAN 2022/D3. BF-LLMA-SCD was also trained on an Arabic SCD dataset comprising three difficulty levels, where it achieved an F1 score above 0.99 on easy instances.
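The boundary-focused truncation idea reduces to keeping the tokens nearest the examined boundary; a minimal sketch follows, with the even budget split between the two segments as an assumption.

```python
def truncate_around_boundary(left_tokens, right_tokens, max_len):
    """Keep the tokens nearest the examined boundary, dropping the far
    ends of both segments so the pair fits the LLM context window
    (sketch; the 50/50 budget split is an assumption)."""
    half = max_len // 2
    return left_tokens[-half:], right_tokens[:half]

left, right = truncate_around_boundary(list(range(500)), list(range(500)), 256)
assert len(left) + len(right) <= 256
```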

29 pages, 2340 KB  
Article
Target-Aware Bilingual Stance Detection in Social Media Using Transformer Architecture
by Abdul Rahaman Wahab Sait and Yazeed Alkhurayyif
Electronics 2026, 15(4), 830; https://doi.org/10.3390/electronics15040830 - 14 Feb 2026
Viewed by 260
Abstract
Stance detection has emerged as an essential tool in natural language processing for understanding how individuals express agreement, disagreement, or neutrality toward specific targets in social and online discourse. It plays a crucial role in bilingual and multilingual environments, including English-Arabic social media ecosystems, where differences in language structure, discourse style, and data availability pose significant challenges for reliable stance modelling. Existing approaches often struggle with target awareness, cross-lingual generalization, robustness to noisy user-generated text, and the interpretability of model decisions. This study aims to build a reliable, explainable target-aware bilingual stance-detection framework that generalizes across heterogeneous stance formats and languages without retraining on a dataset specific to the target language. Thus, a unified dual-encoder architecture based on mDeBERTa-v3 is proposed. Cross-language contrastive learning offers an auxiliary training objective to align English and Arabic stance representations in a common semantic space. Robustness-oriented regularization is used to mitigate the effects of informal language, vocabulary variation, and adversarial noise. To promote transparency and trustworthiness, the framework incorporates token-level rationale extraction, enables fine-grained interpretability, and supports analysis of hallucination. The proposed model is tested on a combined bilingual test set and two structurally distinct zero-shot benchmarks: MT-CSD and AraStance. Experimental results show consistent performance, with accuracies of 85.0% and 86.8% and F1-scores of 84.7% and 86.8% on the zero-shot benchmarks, confirming stable performance and realistic generalization. Ultimately, these findings reveal that effective bilingual stance detection can be achieved via explicit target conditioning, cross-lingual alignment, and explainability-driven design.

22 pages, 2071 KB  
Article
An Empirical Study of Transformer-Based Neural Machine Translation for English to Arabic
by Fares Alrashidi and Hassan I. Mathkour
Information 2026, 17(2), 198; https://doi.org/10.3390/info17020198 - 14 Feb 2026
Viewed by 414
Abstract
Neural machine translation (NMT) performance is strongly influenced by tokenization strategies, particularly for morphologically rich languages such as Arabic. Despite the importance of tokenization, there is a lack of controlled, reproducible studies examining its impact under low-resource conditions, which limits our understanding of how different methods affect translation quality and training dynamics. This paper presents a controlled experimental study analyzing the impact of different tokenization methods on English → Arabic (EN → AR) translation using a Tiny Transformer model under low-resource conditions. The study aims to provide a systematic and reproducible comparison that isolates the effect of tokenization choices under fixed modeling and training constraints. Experiments are conducted with identical architecture, training steps, decoding procedure, and evaluation pipeline to ensure reproducibility. Translation quality is assessed using multiple metrics, including BLEU, ChrF++, TER, and BERTScore, revealing substantial divergences and demonstrating empirically, in the context of low-resource Arabic NMT, that BLEU alone is insufficient for reliable evaluation. While the limitations of BLEU are known in general, our results provide new evidence that, under low-resource conditions and across different tokenization strategies, reliance on BLEU can lead to misleading conclusions about translation quality. Training dynamics are analyzed using TensorBoard, linking tokenization strategies to differences in convergence, saturation, and stability. For validation, a small-scale English → German (EN → DE) experiment confirms that the Tiny Transformer setup reproduces expected behavior. The contribution of this work lies in establishing controlled empirical evidence and practical insights, rather than absolute performance gains, for low-resource Arabic NMT: tokenization choice critically affects both translation quality and optimization dynamics, offering practical guidance for low-resource Arabic NMT research. Overall, byte-pair encoding (BPE) achieves the strongest balance across surface-level and semantic metrics under controlled low-resource conditions (BLEU: 8.57, ChrF++: 18.56, TER: 97.38, BERTScore-F1: 0.785). Character-level tokenization yields higher semantic similarity than subword-based methods, as reflected by BERTScore, but remains weaker in structural fidelity and surface-form accuracy, while SentencePiece exhibits intermediate behavior, favoring semantic adequacy over exact n-gram matching.
(This article belongs to the Special Issue Human and Machine Translation: Recent Trends and Foundations)
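For reference, several of the reported metrics can be computed with the sacrebleu package as sketched below; the toy sentences are placeholders, and BERTScore (also used in the paper) comes from the separate bert-score package.

```python
# Sketch: scoring one system output with complementary metrics.
import sacrebleu

hyps = ["the cat sat on the mat"]              # placeholder hypothesis
refs = [["the cat is sitting on the mat"]]     # placeholder reference stream

print("BLEU  ", sacrebleu.corpus_bleu(hyps, refs).score)
print("chrF++", sacrebleu.corpus_chrf(hyps, refs, word_order=2).score)
print("TER   ", sacrebleu.metrics.TER().corpus_score(hyps, refs).score)
```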

28 pages, 2122 KB  
Article
AraCoNER: Arabic Complex NER with Gold and Silver Labels
by Wesam Alruwaili, Najwa Altwaijry and Isra Al-Turaiki
Electronics 2026, 15(4), 750; https://doi.org/10.3390/electronics15040750 - 10 Feb 2026
Viewed by 367
Abstract
Named entity recognition (NER) is a fundamental task in natural language processing. Recently, non-traditional entity names (the target of so-called complex NER) have increasingly emerged, including long noun phrases and ambiguous names: for example, Birds of Prey (and the Fantabulous Emancipation of One Harley Quinn), Among Us, and Chicago, which may refer to a city or a novel. Such rapidly growing entity names pose significant challenges for NER. Arabic NER research is usually limited to flat and nested entities, overlooking complex entities due to limited resources, the language's rich morphology, and context ambiguity. Such tasks require high-quality annotated data, yet most existing approaches rely on supervised learning, and acquiring large annotated datasets is costly and labor-intensive. We construct our corpus by leveraging the strong performance of large language models (LLMs), which have driven recent advances in dataset generation. We propose an Arabic complex NER (AraCoNER) dataset with semantically ambiguous and complex named entities, using both gold and silver labels. We investigate several agent-based annotation frameworks in addition to the plain LLM to determine the most efficient annotator for our task. We then introduce LLMAAA+, an LLM-agent-based framework that integrates an LLM-powered agent as an annotator into an active learning loop to efficiently select what should be labeled. Instead of solely synthesizing the training data from LLMs, we enhance both the annotation and training phases, generating pseudo-labels and using k-NN sampling to select in-context examples. This approach ensures both efficiency and quality, with cost-effective and minimal human involvement. Our results show that combining an LLM (GPT-4) with a structured agent framework (Google ADK) yields the highest annotation accuracy, even with a limited number of annotated examples, supporting the proposed LLM-agent-based active learning framework.
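The k-NN selection of in-context examples can be sketched as cosine similarity over sentence embeddings; the embedding source and the value of k are assumptions, not the paper's settings.

```python
import numpy as np

def knn_in_context(query_emb, pool_embs, pool_examples, k=5):
    """Pick the k labeled examples nearest the query sentence embedding
    to serve as in-context demonstrations in the annotation prompt
    (sketch; embedding model and k are assumptions)."""
    sims = pool_embs @ query_emb / (
        np.linalg.norm(pool_embs, axis=1) * np.linalg.norm(query_emb) + 1e-9)
    top = np.argsort(-sims)[:k]
    return [pool_examples[i] for i in top]

rng = np.random.default_rng(0)
demos = knn_in_context(rng.normal(size=64), rng.normal(size=(100, 64)),
                       [f"example {i}" for i in range(100)])
```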
