Search Results (1,114)

Search Parameters:
Keywords = Natural Language Processing (NLP)

23 pages, 2615 KB  
Article
A Simulation-Based Risk Assessment Model for Comparative Analysis of Collisions in Autonomous and Non-Autonomous Haulage Trucks
by Malihe Goli, Amin Moniri-Morad, Mario Aguilar, Masoud S. Shishvan, Mahdi Shahsavar and Javad Sattarvand
Appl. Sci. 2025, 15(17), 9702; https://doi.org/10.3390/app15179702 - 3 Sep 2025
Abstract
The implementation of autonomous haulage trucks in open-pit mines represents a progressive advancement in the mining industry, but it poses potential safety risks that require thorough assessment. This study proposes an integrated model that combines discrete-event simulation (DES) with a risk matrix to assess collisions in three operational scenarios: non-autonomous, hybrid, and fully autonomous truck operations. To achieve these objectives, a comprehensive dataset was collected and analyzed using statistical models and natural language processing (NLP) techniques. Multiple scenarios were then developed and simulated to compare collision risks and evaluate the impact of eliminating human intervention in hauling operations. A risk matrix was designed to assess the likelihood and severity of collisions in each scenario, emphasizing the impact on both human safety and project operations. The results revealed an inverse relationship between the number of autonomous trucks and the frequency of collisions, underscoring the potential safety advantages of fully autonomous operations. Collision probabilities improved by approximately 91.7% and 90.7% in the third scenario compared to the first and second scenarios, respectively. Furthermore, high-risk areas were identified at intersections with high traffic. These findings offer valuable insights into enhancing safety protocols and integrating advanced monitoring technologies in open-pit mining operations, particularly those utilizing autonomous haulage truck fleets.
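The risk-matrix step described above can be sketched minimally. The 1-5 likelihood/severity scales, the category thresholds, and the `relative_improvement` helper below are illustrative assumptions, not the study's actual model.

```python
# Illustrative risk matrix: the scales and thresholds here are
# assumptions, not the categories used in the study.
def risk_level(likelihood: int, severity: int) -> str:
    """Map a 1-5 likelihood and a 1-5 severity score to a risk category."""
    score = likelihood * severity
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"

def relative_improvement(p_baseline: float, p_new: float) -> float:
    """Percent reduction in collision probability vs. a baseline scenario."""
    return 100.0 * (p_baseline - p_new) / p_baseline
```

A per-scenario comparison like the reported ~91.7% figure reduces to this kind of relative-probability calculation over simulated collision counts.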
18 pages, 568 KB  
Article
Beyond Cross-Entropy: Discounted Least Information Theory of Entropy (DLITE) Loss and the Impact of Loss Functions on AI-Driven Named Entity Recognition
by Sonia Pascua, Michael Pan and Weimao Ke
Information 2025, 16(9), 760; https://doi.org/10.3390/info16090760 - 2 Sep 2025
Abstract
Loss functions play a significant role in shaping model behavior in machine learning, yet their design implications remain underexplored in natural language processing tasks such as Named Entity Recognition (NER). This study investigates the performance and optimization behavior of five loss functions—L1, L2, Cross-Entropy (CE), KL Divergence (KL), and the proposed DLITE (Discounted Least Information Theory of Entropy) Loss—within transformer-based NER models. DLITE introduces a bounded, entropy-discounting approach to penalization, prioritizing recall and training stability, especially under noisy or imbalanced data conditions. We conducted empirical evaluations across three benchmark NER datasets: Basic NER, CoNLL-2003, and the Broad Twitter Corpus. While CE and KL achieved the highest weighted F1-scores in clean datasets, DLITE Loss demonstrated distinct advantages in macro recall, precision–recall balance, and convergence stability—particularly in noisy environments. Our findings suggest that the choice of loss function should align with application-specific priorities, such as minimizing false negatives or managing uncertainty. DLITE adds a new dimension to model design by enabling more measured predictions, making it a valuable alternative in high-stakes or real-world NLP deployments.
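The contrast between unbounded cross-entropy and a bounded loss can be illustrated with a toy sketch; `bounded_loss` below is a stand-in for DLITE's bounded, entropy-discounting behavior, not the paper's actual formula.

```python
import math

def cross_entropy(p_true: float) -> float:
    """Standard CE on the true-class probability; grows without bound as p -> 0."""
    return -math.log(p_true)

def bounded_loss(p_true: float) -> float:
    """Toy bounded loss in [0, 1]; illustrates saturation on hard or
    mislabeled examples, standing in for (not reproducing) DLITE."""
    return 1.0 - p_true
```

On a mislabeled example (true-class probability near 0), CE dominates the total loss while a bounded loss saturates, which is one mechanism behind the recall and training-stability advantages reported under noisy data.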
36 pages, 8964 KB  
Article
Verified Language Processing with Hybrid Explainability
by Oliver Robert Fox, Giacomo Bergami and Graham Morgan
Electronics 2025, 14(17), 3490; https://doi.org/10.3390/electronics14173490 - 31 Aug 2025
Abstract
The volume and diversity of digital information have led to a growing reliance on Machine Learning (ML) techniques, such as Natural Language Processing (NLP), for interpreting and accessing appropriate data. While vector and graph embeddings represent data for similarity tasks, current state-of-the-art pipelines lack guaranteed explainability, failing to accurately determine similarity for given full texts. These considerations also apply to classifiers exploiting generative language models with logical prompts, which fail to correctly distinguish between logical implication, indifference, and inconsistency, despite being explicitly trained to recognise the first two classes. We present a novel pipeline designed for hybrid explainability to address this. Our methodology combines graphs and logic to produce First-Order Logic (FOL) representations, creating machine- and human-readable representations through Montague Grammar (MG). The preliminary results indicate the effectiveness of this approach in accurately capturing full text similarity. To the best of our knowledge, this is the first approach to differentiate between implication, inconsistency, and indifference for text classification tasks. To address the limitations of existing approaches, we use three self-contained datasets annotated for this three-way classification task to determine the suitability of these approaches in capturing sentence structure equivalence, logical connectives, and spatiotemporal reasoning. We also use these data to compare the proposed method with language models pre-trained for detecting sentence entailment. The results show that the proposed method outperforms state-of-the-art models, indicating that natural language understanding cannot be easily generalised by training over extensive document corpora. This work offers a step toward more transparent and reliable Information Retrieval (IR) from extensive textual data.
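The three-way distinction the pipeline targets can be shown on toy propositional facts; this sketch does not reproduce the paper's Montague-Grammar-derived FOL machinery, only the classification it aims at.

```python
def classify(premise: set, hypothesis: set) -> str:
    """Toy three-way classifier over sets of atomic facts, where negation
    is written as a 'not ' prefix. A stand-in for, not an implementation
    of, the paper's FOL-based reasoning."""
    if hypothesis <= premise:
        return "implication"
    for h in hypothesis:
        negated = h[4:] if h.startswith("not ") else "not " + h
        if negated in premise:
            return "inconsistency"
    return "indifference"
```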
23 pages, 1540 KB  
Review
Revolutionizing Oncology Through AI: Addressing Cancer Disparities by Improving Screening, Treatment, and Survival Outcomes via Integration of Social Determinants of Health
by Amit Kumar Srivastav, Aryan Singh, Shailesh Singh, Brian Rivers, James W. Lillard and Rajesh Singh
Cancers 2025, 17(17), 2866; https://doi.org/10.3390/cancers17172866 - 31 Aug 2025
Abstract
Background: Social determinants of health (SDOH) are critical contributors to cancer disparities, influencing prevention, early detection, treatment access, and survival outcomes. Addressing these disparities is essential to achieving equitable oncology care. Artificial intelligence (AI) is revolutionizing oncology by leveraging advanced computational methods to address SDOH-driven disparities through predictive analytics, data integration, and precision medicine. Methods: This review synthesizes findings from systematic reviews and original research on AI applications in cancer-focused SDOH research. Key methodologies include machine learning (ML), natural language processing (NLP), deep learning-based medical imaging, and explainable AI (XAI). Special emphasis is placed on AI’s ability to analyze large-scale oncology datasets, including electronic health records (EHRs), geographic information systems (GIS), and real-world clinical trial data, to enhance cancer risk stratification, optimize screening programs, and improve resource allocation. Results: AI has demonstrated significant advancements in cancer diagnostics, treatment planning, and survival prediction by integrating SDOH data. AI-driven radiomics and histopathology have enhanced early detection, particularly in underserved populations. Predictive modeling has improved personalized oncology care, enabling stratification based on socioeconomic and environmental factors. However, challenges remain, including AI bias in screening, trial underrepresentation, and treatment recommendation disparities. Conclusions: AI holds substantial potential to reduce cancer disparities by integrating SDOH into risk prediction, screening, and treatment personalization. Ethical deployment, bias mitigation, and robust regulatory frameworks are essential to ensuring fairness in AI-driven oncology. Integrating AI into precision oncology and public health strategies can bridge cancer care gaps, enhance early detection, and improve treatment outcomes for vulnerable populations.
(This article belongs to the Special Issue Innovations in Addressing Disparities in Cancer)
14 pages, 657 KB  
Article
Pretrained Models Against Traditional Machine Learning for Detecting Fake Hadith
by Jawaher Alghamdi, Adeeb Albukhari and Thair Al-Dala’in
Electronics 2025, 14(17), 3484; https://doi.org/10.3390/electronics14173484 - 31 Aug 2025
Abstract
The proliferation of fake news, particularly in sensitive domains like religious texts, necessitates robust authenticity verification methods. This study addresses the growing challenge of authenticating Hadith, where traditional methods relying on analysis of the chain of narrators (Isnad) and the content (Matn) are increasingly strained by the sheer volume in circulation. To combat this issue, machine learning (ML) and natural language processing (NLP) techniques, specifically through transfer learning, are explored to automate Hadith classification into Genuine and Fake categories. This study utilizes an imbalanced dataset of 8544 Hadiths (7008 authentic and 1536 fake) to systematically investigate the collective impact of linguistic and contextual features, particularly the chain of narrators (Isnad), on Hadith authentication. For the first time in this specialized domain, state-of-the-art pre-trained language models (PLMs) such as Multilingual BERT (mBERT), CamelBERT, and AraBERT are evaluated alongside classical algorithms like logistic regression (LR) and support vector machine (SVM) for Hadith authentication. Our best-performing model, AraBERT, achieved a 99.94% F1-score when the chain of narrators was included, demonstrating the profound effectiveness of contextual elements (Isnad) in improving accuracy. These results provide novel insights into the role of computational methods in Hadith authentication and reinforce traditional scholarly emphasis. This research represents a significant advancement in combating misinformation in this important field.
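The effect of including the Isnad alongside the Matn can be sketched with a toy keyword baseline; the featurization and voting rule below are illustrative assumptions, not the AraBERT model or the features from the study.

```python
def featurize(matn: str, isnad: str, include_isnad: bool = True) -> set:
    """Bag-of-words features from the content (Matn), optionally with the
    narrator chain (Isnad) appended, mirroring the with/without-Isnad setup."""
    text = matn + (" " + isnad if include_isnad else "")
    return set(text.lower().split())

def predict(features: set, genuine_vocab: set, fake_vocab: set) -> str:
    """Toy vote: label by which vocabulary overlaps the features more."""
    g = len(features & genuine_vocab)
    f = len(features & fake_vocab)
    return "Genuine" if g >= f else "Fake"
```

Dropping the Isnad removes narrator tokens from the feature set, which is the ablation whose impact the study measures.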
26 pages, 555 KB  
Concept Paper
Do We Need a Voice Methodology? Proposing a Voice-Centered Methodology: A Conceptual Framework in the Age of Surveillance Capitalism
by Laura Caroleo
Societies 2025, 15(9), 241; https://doi.org/10.3390/soc15090241 - 30 Aug 2025
Abstract
This paper explores the rise of voice-based social media as a pivotal transformation in digital communication, situated within the broader era of chatbots and voice AI. Platforms such as Clubhouse, X Spaces, and Discord foreground vocal interaction, reshaping norms of participation, identity construction, and platform governance. This shift from text-centered communication to hybrid digital orality presents new sociological and methodological challenges, calling for the development of voice-centered analytical approaches. In response, the paper introduces a multidimensional methodological framework for analyzing voice-based social media platforms in the context of surveillance capitalism and AI-driven conversational technologies. We propose a high-level reference architecture for a machine-learning-for-social-science pipeline that integrates digital methods techniques, automatic speech recognition (ASR) models, and natural language processing (NLP) models within a reflexive and ethically grounded framework. To illustrate its potential, we outline possible stages of a proof-of-concept (PoC) audio analysis machine learning pipeline, demonstrated through a conceptual use case involving the collection, ingestion, and analysis of X Spaces. While not a comprehensive empirical study, this pipeline proposal highlights the technical and ethical challenges in voice analysis. By situating the voice as a central axis of online sociality and examining it in relation to AI-driven conversational technologies in an era of post-orality, the study contributes to ongoing debates on surveillance capitalism, platform affordances, and the evolving dynamics of digital interaction. In this rapidly evolving landscape, we urgently need a robust vocal methodology to ensure that voice is not just processed but understood.
29 pages, 434 KB  
Article
Comparative Analysis of Natural Language Processing Techniques in the Classification of Press Articles
by Kacper Piasta and Rafał Kotas
Appl. Sci. 2025, 15(17), 9559; https://doi.org/10.3390/app15179559 - 30 Aug 2025
Abstract
The study undertook a comprehensive review and comparative analysis of natural language processing techniques for news article classification, with a particular focus on Java libraries. The dataset comprised more than 200,000 items of news metadata sourced from The Huffington Post. Both traditional algorithms based on mathematical statistics and deep learning models were evaluated. The libraries chosen for testing were Apache OpenNLP, Stanford CoreNLP, Waikato Weka, and the Huggingface ecosystem with the PyTorch backend. The efficacy of the trained models in forecasting specific topics was evaluated, and diverse methodologies for feature extraction and the analysis of word-vector representations were explored. The study considered aspects such as hardware resource management, implementation simplicity, learning time, and the detection quality of the resulting model, and it examined a range of techniques for attribute selection, feature filtering, vector representation, and the handling of imbalanced datasets. Advanced techniques for word selection and named entity recognition were employed. The study compared different models and configurations in terms of their performance and the resources they consumed. Furthermore, it addressed the difficulties encountered when processing lengthy texts with transformer neural networks and presented potential solutions such as sequence truncation and segment analysis. The elevated computational cost inherent to Java-based environments may present challenges in machine learning tasks. The OpenNLP model achieved 84% accuracy, Weka and CoreNLP attained 86% and 88%, respectively, and DistilBERT emerged as the top performer with an accuracy of 92%. Deep learning models demonstrated superior performance, training time, and ease of implementation compared to conventional statistical algorithms.
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications—2nd Edition)
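The long-text strategies mentioned, sequence truncation and segment analysis, can be sketched as follows; the 512-token limit, the overlap stride, and mean-pooling aggregation are illustrative defaults, not necessarily the configurations used in the study.

```python
def truncate(tokens: list, max_len: int = 512) -> list:
    """Keep only the first max_len tokens of a long document."""
    return tokens[:max_len]

def segment(tokens: list, max_len: int = 512, stride: int = 256) -> list:
    """Split a long token sequence into overlapping windows."""
    if len(tokens) <= max_len:
        return [tokens]
    return [tokens[i:i + max_len] for i in range(0, len(tokens) - stride, stride)]

def aggregate(segment_scores: list) -> float:
    """Mean-pool per-segment class scores into a document-level score."""
    return sum(segment_scores) / len(segment_scores)
```

Truncation discards the document tail; segmentation keeps it at the cost of running the model once per window, which is the trade-off the review discusses.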
25 pages, 4657 KB  
Article
Identifying Methodological Language in Psychology Abstracts: A Machine Learning Approach Using NLP and Embedding-Based Clustering
by Konstantinos G. Stathakis, George Papageorgiou and Christos Tjortjis
Big Data Cogn. Comput. 2025, 9(9), 224; https://doi.org/10.3390/bdcc9090224 - 29 Aug 2025
Abstract
Research articles are valuable resources for Information Retrieval and Natural Language Processing (NLP) tasks, offering opportunities to analyze key components of scholarly content. This study investigates the presence of methodological terminology in psychology research over the past 30 years (1995–2024) by applying a novel NLP and Machine Learning pipeline to a large corpus of 85,452 abstracts, as well as the extent to which this terminology forms distinct thematic groupings. Combining glossary-based extraction, contextualized language model embeddings, and dual-mode clustering, this study offers a scalable framework for the exploration of methodological transparency in scientific text via deep semantic structures. A curated glossary of 365 method-related keywords served as a gold-standard reference for term identification, using direct and fuzzy string matching. Retrieved terms were encoded with SciBERT, averaging embeddings across contextual occurrences to produce unified vectors. These vectors were clustered using unsupervised and weighted unsupervised approaches, yielding six and ten clusters, respectively. Cluster composition was analyzed using weighted statistical measures to assess term importance within and across groups. A total of 78.16% of the examined abstracts contained glossary terms, with an average of 1.8 terms per abstract, highlighting an increasing presence of methodological terminology in psychology and reflecting a shift toward greater transparency in research reporting. This work goes beyond the use of static vectors by incorporating contextual understanding in the examination of methodological terminology, while offering a scalable and generalizable approach to semantic analysis in scientific texts, with implications for meta-research, domain-specific lexicon development, and automated scientific knowledge discovery.
(This article belongs to the Special Issue Machine Learning Applications in Natural Language Processing)
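The glossary-matching and embedding-averaging steps can be sketched with the standard library; the fuzzy-match cutoff and the toy vectors are illustrative, and the study's contextual vectors come from SciBERT rather than anything shown here.

```python
import difflib

def match_terms(abstract: str, glossary: list, cutoff: float = 0.85) -> set:
    """Find glossary terms via direct or fuzzy (close-match) comparison."""
    words = abstract.lower().split()
    found = set()
    for term in glossary:
        if term in words or difflib.get_close_matches(term, words, n=1, cutoff=cutoff):
            found.add(term)
    return found

def average_embedding(occurrence_vectors: list) -> list:
    """Average per-occurrence contextual vectors into one unified term vector."""
    n = len(occurrence_vectors)
    dim = len(occurrence_vectors[0])
    return [sum(v[i] for v in occurrence_vectors) / n for i in range(dim)]
```

Fuzzy matching catches misspelled term occurrences that exact matching would miss, which is why the pipeline combines both.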
30 pages, 21387 KB  
Article
An Intelligent Docent System with a Small Large Language Model (sLLM) Based on Retrieval-Augmented Generation (RAG)
by Taemoon Jung and Inwhee Joe
Appl. Sci. 2025, 15(17), 9398; https://doi.org/10.3390/app15179398 - 27 Aug 2025
Abstract
This study designed and empirically evaluated a method to enhance information accessibility for museum and art gallery visitors using a small Large Language Model (sLLM) based on the Retrieval-Augmented Generation (RAG) framework. Over 199,000 exhibition descriptions were collected and refined, and a question-answering dataset of 102,000 pairs reflecting user personas was constructed to develop DocentGemma, a domain-optimized language model. This model was fine-tuned through Low-Rank Adaptation (LoRA) based on Google’s Gemma2-9B and integrated with FAISS and OpenSearch-based document retrieval systems within the LangChain framework. Performance evaluation was conducted using a dedicated Q&A benchmark for the docent domain, comparing the model against five commercial and open-source LLMs (including GPT-3.5 Turbo, LLaMA3.3-70B, and Gemma2-9B). DocentGemma achieved an accuracy of 85.55% and a perplexity of 3.78, demonstrating competitive language generation and response accuracy within the domain-specific context. To enhance retrieval relevance, a Spatio-Contextual Retriever (SC-Retriever) was introduced, which combines semantic similarity and spatial proximity based on the user’s query and location. An ablation study confirmed that integrating both modalities improved retrieval quality, with the SC-Retriever achieving a recall@1 of 53.45% and a Mean Reciprocal Rank (MRR) of 68.12, representing a 17.5–20% gain in search accuracy compared to baseline models such as GTE and SpatialNN. System performance was further validated through field deployment at three major exhibition venues in Seoul (the Seoul History Museum, the Hwan-ki Museum, and the Hanseong Baekje Museum). A user test involving 110 participants indicated high response credibility and an average satisfaction score of 4.24. To ensure accessibility, the system supports various output formats, including multilingual speech and subtitles. This work illustrates a practical application of integrating LLM-based conversational capabilities into traditional docent services and suggests potential for further development toward location-aware interactive systems and AI-driven cultural content services.
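The SC-Retriever's blend of semantic similarity and spatial proximity can be sketched as a weighted score; the exponential distance decay, the `scale`, and the `alpha` weight are assumptions for illustration, not the paper's formulation.

```python
import math

def spatial_score(user_xy: tuple, doc_xy: tuple, scale: float = 50.0) -> float:
    """Proximity in (0, 1], decaying with Euclidean distance (e.g., meters)."""
    return math.exp(-math.dist(user_xy, doc_xy) / scale)

def combined_score(semantic_sim: float, user_xy: tuple, doc_xy: tuple,
                   alpha: float = 0.7) -> float:
    """Weighted blend of query-document semantic similarity and the
    user's spatial proximity to the exhibit the document describes."""
    return alpha * semantic_sim + (1.0 - alpha) * spatial_score(user_xy, doc_xy)
```

Ranking candidate exhibit descriptions by such a combined score is the kind of two-modality retrieval the ablation study compares against semantic-only and spatial-only baselines.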
19 pages, 1190 KB  
Article
A Lightweight AI System to Generate Headline Messages for Inventory Status Summarization
by Bongjun Ji, Yukwan Hwang, Donghun Kim, Jungmin Park, Minhyeok Ryu and Yongkyu Cho
Systems 2025, 13(9), 741; https://doi.org/10.3390/systems13090741 - 26 Aug 2025
Abstract
In the manufacturing supply chain, management reports often begin with concise messages that summarize key inventory insights. Traditionally, human analysts crafted these summary messages manually by sifting through complex data, a process that is both time-consuming and prone to inconsistency. In this study, we present an AI-based system that automatically generates high-quality inventory insight summaries, referred to as “headline messages,” from real-world inventory data. The proposed system leverages lightweight natural language processing (NLP) and machine learning models to achieve accurate and efficient performance. Historical messages are first clustered using a sentence-transformer MiniLM model that provides fast semantic embeddings; the resulting clusters are used to derive key message categories and to define structured input features. Then, an explainable, low-complexity classifier is trained to predict appropriate headline messages from current inventory metrics using minimal computational resources. Through empirical experiments with real enterprise data, we demonstrate that this approach can reproduce expert-written headline messages with high accuracy while reducing report generation time from hours to minutes. This study makes three contributions. First, it introduces a lightweight approach that transforms inventory data into concise messages. Second, the proposed approach mitigates confusion by maintaining interpretability and fact-based control, and it aligns wording with domain-specific terminology. Third, it reports an industrial validation and deployment case study, demonstrating that the system can be integrated with enterprise data pipelines to generate large-scale weekly reports. These results demonstrate the practical and technological value of combining small-scale language models with interpretable machine learning to deliver insights.
(This article belongs to the Special Issue Data-Driven Analysis of Industrial Systems Using AI)
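The two-stage idea, deriving message categories and then mapping current metrics to a category with an interpretable classifier, can be sketched as below. The keyword rule stands in for MiniLM-embedding clustering, and all category names and thresholds are illustrative assumptions.

```python
def message_category(msg: str) -> str:
    """Toy categorizer standing in for embedding-based clustering of
    historical headline messages."""
    text = msg.lower()
    if "excess" in text or "overstock" in text:
        return "EXCESS"
    if "shortage" in text or "stockout" in text:
        return "SHORTAGE"
    return "NORMAL"

def headline_for(metrics: dict) -> str:
    """Interpretable rule: days of inventory cover -> headline category."""
    cover_days = metrics["on_hand"] / max(metrics["daily_demand"], 1e-9)
    if cover_days > 60:
        return "EXCESS"
    if cover_days < 7:
        return "SHORTAGE"
    return "NORMAL"
```

Because the classifier is a transparent rule over named metrics, each generated headline can be traced back to the inventory figures that produced it, which is the interpretability and fact-based control the abstract emphasizes.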
26 pages, 389 KB  
Article
Integrating AI with Meta-Language: An Interdisciplinary Framework for Classifying Concepts in Mathematics and Computer Science
by Elena Kramer, Dan Lamberg, Mircea Georgescu and Miri Weiss Cohen
Information 2025, 16(9), 735; https://doi.org/10.3390/info16090735 - 26 Aug 2025
Abstract
Providing students with effective learning resources is essential for improving educational outcomes—especially in complex and conceptually diverse fields such as Mathematics and Computer Science. To better understand how these subjects are communicated, this study investigates the linguistic structures embedded in academic texts from selected subfields within both disciplines. In particular, we focus on meta-languages—the linguistic tools used to express definitions, axioms, intuitions, and heuristics within a discipline. The primary objective of this research is to identify which subfields of Mathematics and Computer Science share similar meta-languages. Identifying such correspondences may enable the rephrasing of content from less familiar subfields using styles that students already recognize from more familiar areas, thereby enhancing accessibility and comprehension. To pursue this aim, we compiled text corpora from multiple subfields across both disciplines. We compared their meta-languages using a combination of supervised (Neural Network) and unsupervised (clustering) learning methods. Specifically, we applied several clustering algorithms—K-means, Partitioning around Medoids (PAM), Density-Based Clustering, and Gaussian Mixture Models—to analyze inter-discipline similarities. To validate the resulting classifications, we used XLNet, a deep learning model known for its sensitivity to linguistic patterns. The model achieved an accuracy of 78% and an F1-score of 0.944. Our findings show that subfields can be meaningfully grouped based on meta-language similarity, offering valuable insights for tailoring educational content more effectively. To further verify these groupings and explore their pedagogical relevance, we conducted both quantitative and qualitative research involving student participation. This paper presents findings from the qualitative component—namely, a content analysis of semi-structured interviews with software engineering students and lecturers.
(This article belongs to the Special Issue Advancing Educational Innovation with Artificial Intelligence)
28 pages, 2252 KB  
Review
Technical Review: Architecting an AI-Driven Decision Support System for Enhanced Online Learning and Assessment
by Saipunidzam Mahamad, Yi Han Chin, Nur Izzah Nasuha Zulmuksah, Md Mominul Haque, Muhammad Shaheen and Kanwal Nisar
Future Internet 2025, 17(9), 383; https://doi.org/10.3390/fi17090383 - 26 Aug 2025
Abstract
The rapid expansion of online learning platforms has necessitated advanced systems to address scalability, personalization, and assessment challenges. This paper presents a comprehensive review of artificial intelligence (AI)-based decision support systems (DSSs) designed for online learning and assessment, synthesizing advancements from 2020 to 2025. By integrating machine learning, natural language processing, knowledge-based systems, and deep learning, AI-DSSs enhance educational outcomes through predictive analytics, automated grading, and personalized learning paths. This study examines system architecture, data requirements, model selection, and user-centric design, emphasizing their roles in achieving scalability and inclusivity. Through case studies of a MOOC platform using NLP and an adaptive learning system employing reinforcement learning, this paper highlights significant improvements in grading efficiency (up to 70%) and student performance (12–20% grade increases). Performance metrics, including accuracy, response time, and user satisfaction, are analyzed alongside evaluation frameworks combining quantitative and qualitative approaches. Technical challenges, such as model interpretability and bias, ethical concerns like data privacy, and implementation barriers, including cost and adoption resistance, are critically assessed, with proposed mitigation strategies. Future directions explore generative AI, multimodal integration, and cross-cultural studies to enhance global accessibility. This review offers a robust framework for researchers and practitioners, providing actionable insights for designing equitable, efficient, and scalable AI-DSSs to transform online education.
(This article belongs to the Special Issue Generative Artificial Intelligence in Smart Societies)
19 pages, 4875 KB  
Article
Insights into People’s Perceptions Towards Urban Public Spaces Through Analysis of Social Media Reviews: A Case Study of Shanghai
by Lingyue Li and Lie Wang
Buildings 2025, 15(17), 3033; https://doi.org/10.3390/buildings15173033 - 26 Aug 2025
Abstract
Urban public space is a crucial constituent of livable city construction. A pleasant and comfortable public space is not simply spacious, bright, and accessible but also subjectively preferred by the citizens who use it. Understanding how citizens experience and perceive these spaces therefore matters and can significantly aid urban design and well-being improvement. This research constructs a perception lexicon for 129 sites of public street space, a significant type of public space, in Shanghai and identifies how citizens comment on these sites through sentiment analysis of social platform texts. A Chinese natural language processing (NLP) tool is applied to quantify citizens’ feelings about the urban street environment on a 0–1 scoring scale. Six types of built environment elements and five categories of urban public spaces are identified. Pleasantly perceived sites are primarily located in the urban center, with a sporadic distribution in the outskirts, and are typically “high-density” and “multi-function” in nature. Among the five categories of urban public spaces, sites that are commercially dynamic with cultural, artistic, and historical elements, or that offer gourmet food and good walkability, generally receive higher sentiment scores, whereas scores for ancient town commercial streets (many of them antique streets), once popular and major contributors to the tourism economy, are unsatisfactory. The NLP-based text analysis also quantifies the intensity of emotional perceptions toward the six types of built environment elements and their associations with the general perception. This study not only offers insights for designers and policy makers in public space optimization but also showcases a scalable, data-driven approach for integrating public emotional and experiential dimensions into urban livability assessments. Full article
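The abstract does not name the Chinese NLP tool used, so as an illustration only, the sketch below mimics the 0–1 scoring idea with a tiny hand-built English lexicon. The `POSITIVE`/`NEGATIVE` word lists, the `sentiment_score` helper, and the neutral fallback of 0.5 are assumptions for demonstration, not the study's actual pipeline.

```python
# Hypothetical sentiment lexicon; the study instead applies a trained
# Chinese NLP model to social-platform review texts.
POSITIVE = {"pleasant", "comfortable", "walkable", "vibrant", "beautiful"}
NEGATIVE = {"crowded", "noisy", "dirty", "boring", "inaccessible"}

def sentiment_score(review: str) -> float:
    """Return a 0-1 sentiment score: 1.0 fully positive, 0.0 fully
    negative, 0.5 neutral (no sentiment-bearing words found)."""
    tokens = review.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos + neg == 0:
        return 0.5
    return pos / (pos + neg)

print(round(sentiment_score("a pleasant walkable street but noisy at night"), 2))
```

Averaging such per-review scores over all reviews of a site yields the kind of site-level sentiment score the study maps across Shanghai's street spaces.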

29 pages, 848 KB  
Article
Applying Additional Auxiliary Context Using Large Language Model for Metaphor Detection
by Takuya Hayashi and Minoru Sasaki
Big Data Cogn. Comput. 2025, 9(9), 218; https://doi.org/10.3390/bdcc9090218 - 25 Aug 2025
Abstract
Metaphor detection is challenging in natural language processing (NLP) because it requires recognizing nuanced semantic shifts beyond literal meaning, and conventional models often falter when contextual cues are limited. We propose a method to enhance metaphor detection by augmenting input sentences with auxiliary context generated by ChatGPT. In our approach, ChatGPT produces semantically relevant sentences that are inserted before, after, or on both sides of a target sentence, allowing us to analyze the impact of context position and length on classification. Experiments on three benchmark datasets (MOH-X, VUA_All, VUA_Verb) show that this context-enriched input consistently outperforms the no-context baseline across accuracy, precision, recall, and F1-score, with the MOH-X dataset achieving the largest F1 gain. These improvements are statistically significant based on two-tailed t-tests. Our findings demonstrate that generative models can effectively enrich context for metaphor understanding, highlighting context placement and quantity as critical factors. Finally, we outline future directions, including advanced prompt engineering, optimizing context lengths, and extending this approach to multilingual metaphor detection. Full article
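A minimal sketch of the augmentation step described above: auxiliary sentences (generated by ChatGPT in the paper, hand-written here) are inserted before, after, or on both sides of the target sentence before the combined text is passed to a classifier. The `augment` helper and its `position` argument are hypothetical names, and the downstream metaphor classifier is omitted.

```python
def augment(target: str, context: list[str], position: str = "both") -> str:
    """Build an augmented input by placing auxiliary context sentences
    before, after, or on both sides of the target sentence."""
    if position == "before":
        return " ".join(context) + " " + target
    if position == "after":
        return target + " " + " ".join(context)
    if position == "both":
        half = len(context) // 2
        before = " ".join(context[:half])
        after = " ".join(context[half:])
        return f"{before} {target} {after}".strip()
    raise ValueError(f"unknown position: {position}")

aux = ["The debate grew heated.", "Neither side would yield."]
print(augment("He shot down all my arguments.", aux, position="before"))
```

Varying `position` and the number of context sentences reproduces, in miniature, the context-placement and context-length comparison the paper reports.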

36 pages, 590 KB  
Review
Machine Translation in the Era of Large Language Models: A Survey of Historical and Emerging Problems
by Duygu Ataman, Alexandra Birch, Nizar Habash, Marcello Federico, Philipp Koehn and Kyunghyun Cho
Information 2025, 16(9), 723; https://doi.org/10.3390/info16090723 - 25 Aug 2025
Abstract
Historically regarded as one of the most challenging tasks on the path to complete artificial intelligence (AI), machine translation (MT) research has seen continuous devotion over the past decade, resulting in cutting-edge architectures for modeling sequential information. While most statistical models traditionally relied on learning from parallel translation examples, recent research exploring self-supervised and multi-task learning methods has extended the capabilities of MT models, eventually allowing the creation of general-purpose large language models (LLMs). In addition to their versatility in providing useful translations across languages and domains, LLMs can in principle perform any natural language processing (NLP) task given a sufficient number of task-specific examples. While LLMs have now reached a point where they can both replace and augment traditional MT models, the extent of their advantages and the ways in which they leverage translation capabilities across multilingual NLP tasks remain a wide area for exploration. In this literature survey, we present the current position of MT research with a historical look at different modeling approaches to MT, how these might be advantageous for solving particular problems, and which problems are solved or remain open in light of recent developments. We also discuss how MT models led to the development of prominent LLM architectures, how they continue to support LLM performance across different tasks by providing a means for cross-lingual knowledge transfer, and how the possibilities that LLM technology brings are redefining the task. Full article
(This article belongs to the Special Issue Human and Machine Translation: Recent Trends and Foundations)
