Search Results (956)

Search Parameters:
Keywords = natural language understanding

18 pages, 439 KB  
Article
Understanding and Predicting Tourist Behavior Through Large Language Models
by Anna Dalla Vecchia, Simone Mattioli, Sara Migliorini and Elisa Quintarelli
Big Data Cogn. Comput. 2026, 10(4), 117; https://doi.org/10.3390/bdcc10040117 - 11 Apr 2026
Viewed by 49
Abstract
Understanding and predicting how tourists move through a city is a challenging task, as it involves a complex interplay of spatial, temporal, and social factors. Traditional recommender systems often rely on structured data, trying to capture the nature of the problem. However, recent advances in Large Language Models (LLMs) open new possibilities for reasoning over richer, text-based representations of user context, even without a dedicated pre-training phase. In this study, we investigate the potential of LLMs to interpret and predict tourist movements in a real-world application scenario involving tourist visits to Verona, a municipality in Northern Italy, between 2014 and 2023. We propose an incremental prompt engineering approach that gradually enriches the model input, from spatial features alone to richer behavioral information, including visit histories, time information, and user cluster patterns. The approach is evaluated using six open-source models, enabling us to compare their accuracy and efficiency across various levels of contextual enrichment. The results provide a first insight into the ability of LLMs to incorporate spatio-temporal contextual factors, thus improving predictions while maintaining computational efficiency. The analysis of the model-generated explanations completes the picture by adding an interpretability dimension that most existing next-PoI prediction solutions lack. Overall, the study demonstrates the potential of LLMs to integrate multiple contextual dimensions in tourism mobility, highlighting the possibility of a more text-oriented, adaptive, and explainable Tourism Recommender System (T-RS).
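The incremental prompt-enrichment idea described in the abstract can be sketched as a prompt builder that adds one block of context per level. This is a minimal illustration only; all field names and wording below are hypothetical and not taken from the paper.

```python
# Hypothetical sketch: each optional argument corresponds to one
# enrichment level (spatial -> history -> time -> cluster patterns).
def build_prompt(spatial, history=None, time_info=None, cluster=None):
    parts = [f"Candidate POIs and distances: {spatial}"]
    if history:
        parts.append(f"User's visit history: {history}")
    if time_info:
        parts.append(f"Current time context: {time_info}")
    if cluster:
        parts.append(f"Typical pattern of the user's cluster: {cluster}")
    parts.append("Predict the user's next POI and explain why.")
    return "\n".join(parts)
```

A spatial-only prompt and a fully enriched prompt then differ only in the number of context blocks, which makes the contribution of each level easy to ablate.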
(This article belongs to the Section Large Language Models and Embodied Intelligence)

35 pages, 51987 KB  
Article
Structurally Consistent and Grounding-Aware Stagewise Reasoning for Referring Remote Sensing Image Segmentation
by Shan Dong, Jianlin Xie, Liang Chen, He Chen, Baogui Qi and Yunqiu Ge
Remote Sens. 2026, 18(7), 1015; https://doi.org/10.3390/rs18071015 - 28 Mar 2026
Viewed by 272
Abstract
Referring Remote Sensing Image Segmentation (RRSIS) is a representative multimodal understanding task for remote sensing, which segments designated targets from remote sensing images according to free-form natural language descriptions. However, complex remote sensing characteristics, such as cluttered backgrounds, large-scale variations, small scattered targets, and repetitive textures, lead to unstable visual grounding and further spatial grounding drift, resulting in inaccurate segmentation results. Existing approaches typically perform implicit visual–linguistic fusion across encoding and decoding stages, entangling spatial grounding with mask refinement. This tightly coupled formulation lacks explicit structural constraints and is prone to cross-modal ambiguity, especially in complex remote sensing layouts. To address these limitations, we propose a Structurally consistent and Grounding-aware Stagewise Reasoning Framework (SGSRF) that follows a grounding-first, segmentation-second paradigm. The framework decomposes inference into three cascaded stages with progressively imposed structural constraints. First, Cross-modal Consistency Refinement (CCR) lays the foundation for stable spatial grounding by enhancing visual–textual structural alignment via CLIP-based features and Structural Consistency Regularization (SCR), producing well-aligned multimodal representations and reliable grounding cues. Second, Grounding-aware Prompt Generation (GPG) bridges grounding and segmentation by converting aligned representations into complementary sparse and dense prompts, which serve as explicit grounding guidance for the segmentation model. Third, Grounding Modulated Segmentation (GMS) leverages the Segment Anything Model (SAM) to generate fine-grained mask predictions under the joint guidance of prompts and grounding cues, improving spatial grounding stability and robustness to background interference and scale variation. Extensive experiments on three remote sensing benchmarks, namely RefSegRS, RRSIS-D, and RISBench, demonstrate that SGSRF achieves state-of-the-art performance. The proposed stagewise paradigm integrates structural alignment, explicit grounding, and prompt-driven segmentation into a unified framework, providing a practical and robust solution for RRSIS in real-world Earth observation applications.

28 pages, 502 KB  
Article
Emotional Framing in Prompts Modulates Large Language Model Performance
by Manuel Gozzi and Francesca Fallucchi
Big Data Cogn. Comput. 2026, 10(4), 102; https://doi.org/10.3390/bdcc10040102 - 24 Mar 2026
Viewed by 612
Abstract
Large Language Models (LLMs) demonstrate remarkable performance across a variety of natural language understanding tasks, yet their sensitivity to emotional framing in user prompts remains underexplored. This paper presents an empirical study investigating how four emotional tones—joy, apathy, anger, and fear—affect LLM performance on the SuperGLUE benchmark. We evaluate five instruction-tuned, open-weight models across eight diverse tasks, systematically modulating input prompts with affective cues while keeping semantic content constant. Results reveal that prompts framed with joy and apathy lead to consistently higher accuracy, with gains of up to 4.5 percentage points compared to fear-framed inputs, which yield the lowest performance. These findings demonstrate that affective modulation in user prompts measurably impacts LLM reasoning and task outcomes, suggesting that emotional framing is not merely stylistic but functionally relevant to model behavior. Our study provides a reproducible experimental framework and an open-source prompt set, offering a foundation for future research on affect-aware prompting strategies and their implications in human–AI interaction.
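The experimental manipulation the abstract describes — varying affective cues while holding semantic content constant — can be sketched as a prompt wrapper. The framing phrases below are illustrative stand-ins, not the paper's actual prompt set.

```python
# Hypothetical affective frames; the paper's open-source prompt set
# would be used in practice instead of these example strings.
AFFECTIVE_FRAMES = {
    "joy": "I'm genuinely excited to work through this with you!",
    "apathy": "Whatever. Here is a task.",
    "anger": "I'm fed up with wrong answers. Get this right.",
    "fear": "I'm terrified of getting this wrong, please be careful.",
    "neutral": "",
}

def frame_prompt(task_prompt: str, emotion: str = "neutral") -> str:
    """Prepend an affective cue while leaving the task text unchanged."""
    frame = AFFECTIVE_FRAMES[emotion]
    return f"{frame}\n{task_prompt}".strip()
```

Because only the prefix varies, any accuracy difference between conditions can be attributed to the affective framing rather than to changes in the task itself.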

51 pages, 2633 KB  
Review
Large-Scale Model-Enhanced Vision-Language Navigation: Recent Advances, Practical Applications, and Future Challenges
by Zecheng Li, Xiaolin Meng, Xu He, Youdong Zhang and Wenxuan Yin
Sensors 2026, 26(7), 2022; https://doi.org/10.3390/s26072022 - 24 Mar 2026
Viewed by 728
Abstract
The ability to autonomously navigate and explore complex 3D environments in a purposeful manner, while integrating visual perception with natural language interaction in a human-like way, represents a longstanding research objective in Artificial Intelligence (AI) and embodied cognition. Vision-Language Navigation (VLN) has evolved from geometry-driven to semantics-driven and, more recently, knowledge-driven approaches. With the introduction of Large Language Models (LLMs) and Vision-Language Models (VLMs), recent methods have achieved substantial improvements in instruction interpretation, cross-modal alignment, and reasoning-based planning. However, existing surveys primarily focus on traditional VLN settings and offer limited coverage of LLM-based VLN, particularly in relation to Sim2Real transfer and edge-oriented deployment. This paper presents a structured review of LLM-enabled VLN, covering four core components: instruction understanding, environment perception, high-level planning, and low-level control. Edge deployment and implementation requirements, datasets, and evaluation protocols are summarized, along with an analysis of task evolution from path-following to goal-oriented and demand-driven navigation. Key challenges, including reasoning complexity, spatial cognition, real-time efficiency, robustness, and Sim2Real adaptation, are examined. Future research directions, such as knowledge-enhanced navigation, multimodal integration, and world-model-based frameworks, are discussed. Overall, LLM-driven VLN is progressing toward deeper cognitive integration, supporting the development of more explainable, generalizable, and deployable embodied navigation systems.

18 pages, 2996 KB  
Article
A Multimodal Agentic AI Framework for Intuitive Human–Robot Collaboration
by Xiaoyun Liang and Jiannan Cai
Sensors 2026, 26(6), 1958; https://doi.org/10.3390/s26061958 - 20 Mar 2026
Viewed by 631
Abstract
Widespread acceptance of collaborative robots in human-involved scenarios requires accessible and intuitive interfaces for lay workers and non-expert users. Existing interfaces often rely on users to plan and issue low-level commands, necessitating extensive knowledge of robot control. This study proposes a multimodal agentic AI framework integrating natural user interfaces (NUIs) to foster effortless, human-like partnerships in human–robot collaboration (HRC), enhancing intuitiveness and operational efficiency. First, it allows users to instruct robots verbally in plain language, coupled with gaze to indicate objects precisely. Second, it offloads users' workload for robot motion planning by understanding context and reasoning about task decomposition. Third, coordinating with AI agents built on large language models (LLMs), the system interprets users' requests effectively and provides feedback to establish transparent communication. This proof-of-concept study included experiments demonstrating a practical implementation of the agentic AI framework on a mobile manipulation robot in the collaborative task of human–robot wood assembly. Seven participants were recruited to interact with this AI-integrated agentic robotic system. Task performance and user experience were measured in terms of completion time, intervention rate, and the NASA TLX workload survey, and insights into practical applications were summarized through a qualitative analysis. This study highlights the potential of NUIs and agentic AI-embodied robots to overcome existing HRC barriers and contributes to improving HRC intuitiveness and efficiency.
(This article belongs to the Special Issue Advanced Sensors and AI Integration for Human–Robot Teaming)

21 pages, 281 KB  
Review
Citation Inaccuracies and the Need for Multi-Level Oversight in AI-Assisted Medical Writing
by Vaikunthan Rajaratnam, Usama Farghaly Omar, Kristen Kee and Arun-Kumar Kaliya-Perumal
Standards 2026, 6(1), 10; https://doi.org/10.3390/standards6010010 - 20 Mar 2026
Viewed by 419
Abstract
Generative artificial intelligence (AI)-based large language models (LLMs) are increasingly being used in medical writing to improve efficiency and broaden access to knowledge. However, concerns have emerged regarding the accuracy of the citations they generate. This review discusses the issue of citation inaccuracies in AI-assisted medical writing and its implications for scientific reliability and accountability in academic medicine. Published literature describing citation errors in AI-generated content, particularly in medical and academic contexts, was examined to understand the nature and persistence of this problem and to consider potential safeguards. Reports consistently describe citation inaccuracies, including fabricated references, incorrect bibliographic details, and incomplete source information such as missing authors, journal titles, publication years, or digital object identifiers. Although these tools continue to evolve, such errors continue to be reported and highlight limitations in their reliability. While LLMs offer clear benefits in supporting medical writing, their outputs require careful verification. As developers continue to address these challenges, responsible use will depend on continued human oversight, improved transparency, greater user awareness, and institutional and policy-level guidance to ensure accurate and trustworthy use of generative AI in medical writing.
29 pages, 3711 KB  
Article
Artificial Intelligence Chatbots as Assistants for Media Users: The Cases of El País and El Espectador
by Gema Sánchez-Muñoz, Isabel García Casado and David Varona Aramburu
Journal. Media 2026, 7(1), 59; https://doi.org/10.3390/journalmedia7010059 - 18 Mar 2026
Viewed by 436
Abstract
In recent months, some media outlets have launched artificial intelligence-based chatbots that serve as assistants to users in their search, selection, and consumption of content. This research analyses two such examples: Vera, a conversational assistant launched by the Spanish newspaper El País, and the model used by the Colombian newspaper El Espectador, which operates on the WhatsApp platform. Both chatbots share the same approach: they are tools designed for users to interact with newspaper content. This interaction takes place through natural language conversations: the technology understands users' questions or requests and provides answers based on the content hosted in the newspapers. This changes the way media content is explored. We are moving from a paradigm centred on search engines and keywords to one in which conversation determines the discovery of content. The research analyses the results of these two pioneering experiences in the Spanish-language media. The aim is to understand the extent to which they are changing the relationship with content and how they are affecting the media.
(This article belongs to the Special Issue Reimagining Journalism in the Era of Digital Innovation)

20 pages, 2618 KB  
Article
A Deep Hybrid Recommendation Method for Multimodal Information Integrating Content Generated by Large Language Models
by Chao Duan, Wenlong Zhang, Zhongtao Yu, Senyao Li, Xuelian Wan and Qionghao Huang
Information 2026, 17(3), 298; https://doi.org/10.3390/info17030298 - 18 Mar 2026
Viewed by 326
Abstract
Item description information plays a crucial role in helping users understand the basic situation of an item and is also vital auxiliary information in recommendation systems. Traditional methods obtain this data through platform backend data or web scraping techniques, but these data are often static, relatively fixed, and insufficiently descriptive. In recent years, large language models (LLMs) like generative pre-trained transformer (GPT) have become powerful tools in natural language processing, bringing new hope for LLM-based recommendations. However, does the text information generated by large language models help improve recommendation accuracy? How can the information produced by generative artificial intelligence be integrated with existing multi-source heterogeneous information? In this paper, we propose a novel deep hybrid recommendation method for multimodal information integrating content generated by large language models (DML). We first explore the use of large language models to generate detailed descriptive information about movies. Next, we perform a weighted fusion of the generated text information with existing movie category information and user demographic data, among other multi-source heterogeneous information. Finally, we use the fused information to predict movie ratings. The results indicate that the multimodal information deep hybrid recommendation method, which integrates content generated by large language models, provides substantial evidence of superior performance relative to existing baseline models.
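The weighted-fusion step the abstract mentions can be sketched in a few lines for the simple case where each information source has already been embedded into a vector of the same dimension. This is a generic illustration under that assumption; the paper's actual weighting scheme and fusion architecture are not specified here.

```python
def weighted_fusion(vectors, weights):
    """Weighted sum of same-length per-modality feature vectors.

    vectors: list of equal-length feature vectors, one per source
             (e.g. LLM-generated description, genre, demographics).
    weights: one non-negative weight per source; normalized internally.
    """
    total = sum(weights)
    norm = [w / total for w in weights]          # weights sum to 1
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(norm, vectors))
            for i in range(dim)]
```

In a learned system the weights would typically be trainable parameters rather than fixed constants, but the fused vector feeds the rating predictor the same way in either case.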
(This article belongs to the Special Issue Generative AI Transformations in Industrial and Societal Applications)

18 pages, 2341 KB  
Article
Structure-Aware Lightweight Document-Level Event Extraction via Code-Based Large Language Models
by Xing Xu, Jianbin Zhao, Pengfei Zhang, Yaduo Liu, Bingyang Yu, Puyuan Zheng, Dingyuan Hu, Zhongchen Deng, Ping Zong, Guoxin Zhang, Zhonghong Ou, Meina Song and Yifan Zhu
Electronics 2026, 15(6), 1187; https://doi.org/10.3390/electronics15061187 - 12 Mar 2026
Viewed by 368
Abstract
Document-level Event Extraction (DEE) requires identifying complex event records and arguments dispersed across unstructured texts. However, applying general Large Language Models (LLMs) to DEE is intrinsically hindered by their lack of inductive bias for rigid structural constraints, often leading to schema violations and suboptimal performance in complex structural prediction tasks. To address this, we propose the Structure-Aware Lightweight DEE framework, termed SALE, which leverages the structural reasoning potential of Code-Based LLMs (Code-LLMs) as a favorable inductive preference. We leverage the natural isomorphism between event schemas and programming object definitions, formulating event extraction as a Python 3.9 class instantiation task to bridge the gap between semantic understanding and structural adherence. Specifically, SALE employs a novel two-stage training paradigm: First, a Structure-Aware Fine-tuning stage injects general structural knowledge via diverse code-style instruction tasks derived from broad Information Extraction (IE) datasets; second, an Event Extraction Alignment stage utilizes a reward-based alignment loss—optimized via policy gradient—to adapt this capability to document-level intricacies. The effectiveness of SALE stems from the synergy between its structure-aware prompting and the specialized alignment stage built on a code-oriented backbone. Extensive experiments on established news-domain benchmarks (RAMS and WikiEvents) demonstrate that our approach significantly outperforms representative supervised and general LLM baselines in cross-task zero-shot and few-shot transfer settings (e.g., surpassing supervised baselines by over 7% in F1 score). Furthermore, SALE maintains a highly efficient inference profile and parameter-efficient footprint, offering a practical and scalable solution for vertical domain applications.
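The "event extraction as class instantiation" formulation can be illustrated with a toy schema: the event type becomes a class, argument roles become fields, and the model's output is source code that instantiates the class. The schema and argument values below are invented for illustration and are not from the paper's benchmarks.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical event schema rendered as a Python class, in the spirit
# of SALE's code-style formulation; role names are illustrative.
@dataclass
class TransferOwnership:
    """Argument roles become typed fields; missing roles stay None."""
    giver: Optional[str] = None
    recipient: Optional[str] = None
    artifact: Optional[str] = None

# The extraction target for a document is then code like this, which a
# Python parser can validate against the schema automatically:
predicted = TransferOwnership(giver="Acme Corp",
                              recipient="Globex",
                              artifact="patent portfolio")
```

A practical benefit of this framing is that schema violations (unknown roles, wrong arity) surface as ordinary `TypeError`s when the generated code is executed, giving a cheap structural check on model output.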
(This article belongs to the Section Artificial Intelligence)

21 pages, 976 KB  
Article
A GraphRAG-Based Question-Answering System for Explainable and Advanced Reasoning over Air Quality Insights
by Christos Mountzouris, Grigorios Protopsaltis and John Gialelis
Air 2026, 4(1), 6; https://doi.org/10.3390/air4010006 - 10 Mar 2026
Viewed by 494
Abstract
Exposure to poor indoor air quality (IAQ) conditions represents a major public health concern, with adverse effects on human health and well-being. The adoption of innovative technological solutions can support timely risk awareness, enable informed decision-making, and ultimately mitigate this health burden. In this context, Large Language Models (LLMs) emerge as a promising technological avenue through the Retrieval-Augmented Generation (RAG) paradigm, which extends their inherent natural language understanding capabilities with explicit access to external knowledge bases, enabling evidence-grounded reasoning and informed recommendations. The present work introduces an integrated GraphRAG-based Question Answering (QA) system that couples a domain-specific knowledge graph encoding fundamental IAQ concepts and relationships with a RAG-based natural language interface, thereby enabling explainable, context-aware, and advanced analytical reasoning over IAQ data. The evaluation results demonstrate the effectiveness of the proposed QA system across both retrieval and generation stages. The retrieval mechanism achieved a context recall of 0.914 and a precision of 0.838, while the generation mechanism attained a faithfulness score of 0.906 and an answer relevancy score of 0.891.
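The retrieval metrics reported above can be understood through a simplified set-overlap version. Note this is only a sketch of what the metrics measure: evaluation frameworks commonly used for RAG (e.g. RAGAS) compute LLM-judged variants of these scores rather than exact set overlap.

```python
# Simplified, set-based approximations of the retrieval metrics;
# real RAG evaluations typically use softer, LLM-judged matching.
def context_precision(retrieved, relevant):
    """Fraction of retrieved context chunks that are actually relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def context_recall(retrieved, relevant):
    """Fraction of the relevant chunks that the retriever found."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0
```

Under this reading, a recall of 0.914 with a precision of 0.838 means the retriever finds nearly all the evidence needed for an answer while including a modest amount of irrelevant context.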

23 pages, 527 KB  
Systematic Review
Knowledge Graph Applications in Cultural Heritage: A ROSES-Based Systematic Review
by Liangbing Zhu, Safawi Abdul Rahman and Hazila Timan
Information 2026, 17(3), 269; https://doi.org/10.3390/info17030269 - 9 Mar 2026
Viewed by 587
Abstract
Knowledge Graphs (KGs) are increasingly adopted in cultural heritage research to address challenges of semantic heterogeneity, data fragmentation, and cross-institutional knowledge integration. Despite the rapid growth of KG-based heritage systems, a comprehensive and methodologically rigorous synthesis of existing applications remains limited. To address this gap, this study conducts a ROSES-based systematic review of KG applications in cultural heritage, aiming to examine prevailing application domains, methodological patterns, and emerging research trends. Following the Reporting Standards for Systematic Evidence Syntheses (ROSES), a structured search was conducted in Scopus, Web of Science, and IEEE Xplore. After duplicate removal, screening, eligibility assessment, and quality appraisal, 248 peer-reviewed studies published between 2015 and 2024 were retained for final synthesis. A mixed-method approach combining descriptive analysis and thematic synthesis was employed to analyze KG construction strategies, technological components, application contexts, and reported outcomes. The results indicate that KGs are primarily applied in five interconnected areas: digital recording and preservation, knowledge management and integration, protection and restoration support, cultural transmission and education, and research and innovation. Methodologically, the literature reveals a transition from ontology-driven and manually curated knowledge models toward hybrid approaches integrating artificial intelligence techniques such as natural language processing and machine learning. However, persistent challenges remain, including ontology alignment, scalability, evaluation inconsistency, and limited cross-project interoperability. This review contributes a consolidated and transparent evidence base for KG applications in cultural heritage and advances a conceptual understanding of KGs as socio-technical infrastructures that mediate cultural knowledge representation and interpretation. The findings offer methodological insights and practical implications for researchers, heritage professionals, and system designers, while highlighting directions for future interdisciplinary research.
(This article belongs to the Section Information Applications)

53 pages, 5533 KB  
Systematic Review
Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review
by Matthew Lisondra, Beno Benhabib and Goldie Nejat
Robotics 2026, 15(3), 55; https://doi.org/10.3390/robotics15030055 - 4 Mar 2026
Cited by 1 | Viewed by 2509
Abstract
Rapid advancements in foundation models, including Large Language Models, Vision-Language Models, Multimodal Large Language Models, and Vision-Language-Action models, have opened new avenues for embodied AI in mobile service robotics. By combining foundation models with the principles of embodied AI, where intelligent systems perceive, reason, and act through physical interaction, mobile service robots can achieve more flexible understanding, adaptive behavior, and robust task execution in dynamic real-world environments. Despite this progress, embodied AI for mobile service robots continues to face fundamental challenges related to the translation of natural language instructions into executable robot actions, multimodal perception in human-centered environments, uncertainty estimation for safe decision-making, and computational constraints for real-time onboard deployment. In this paper, we present the first systematic review of foundation models in mobile service robotics, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Using an OpenAlex literature search, we considered 7506 papers spanning the years 1968–2025. Our detailed analysis identified these four main challenges and examined how recent advances in foundation models have addressed them. We further examine real-world applications in domestic assistance, healthcare, and service automation, highlighting how foundation models enable context-aware, socially responsive, and generalizable robot behaviors. Beyond technical considerations, we discuss the ethical, societal, human-interaction, and physical design and ergonomic implications associated with deploying foundation-model-enabled service robots in human environments. Finally, we outline future research directions emphasizing reliability and lifelong adaptation, privacy-aware and resource-constrained deployment, as well as the governance and human-in-the-loop frameworks required for safe, scalable, and trustworthy mobile service robotics.
(This article belongs to the Special Issue Embodied Intelligence: Physical Human–Robot Interaction)

20 pages, 2574 KB  
Article
Quantitative Evaluation and Domain Adaptation of Vision–Language Models for Mixed-Reality Interpretation of Indoor Environmental Computational Fluid Dynamics Visualizations
by Soushi Futamura and Tomohiro Fukuda
Technologies 2026, 14(3), 157; https://doi.org/10.3390/technologies14030157 - 4 Mar 2026
Viewed by 569
Abstract
In built environmental design, incorporating building user participation and verifying indoor thermal performance at early design stages have become increasingly important. Although Computational Fluid Dynamics (CFD) analysis is widely used to predict indoor thermal environments, its results are difficult for non-expert stakeholders to interpret, even when visualized using Mixed Reality (MR). Interpreting CFD visualizations in MR requires quantitative reasoning that explicitly cross-references visual features with legend information, rather than relying on prior color–value associations learned from natural images. This study investigates the capability of Vision–Language Models (VLMs) to interpret MR visualizations of CFD results and respond to user queries. We focus on indoor temperature distributions and airflow velocities visualized in MR. A novel dataset was constructed, consisting of MR images with CFD results superimposed onto real indoor spaces, paired with domain-specific question–answer annotations requiring legend-based reasoning. Using this dataset, a general-purpose VLM (Qwen2.5-VL) was fine-tuned. Experimental results show that the baseline model achieved less than 30% accuracy, whereas fine-tuning improved accuracy to over 60% across all categories while largely preserving general reasoning performance. These results demonstrate that domain adaptation enables VLMs to quantitatively interpret physical information embedded in MR visualizations, supporting non-experts' understanding of built environmental design.
(This article belongs to the Section Construction Technologies)
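The "legend-based reasoning" the abstract describes can be illustrated with a toy sketch (not the paper's method): a pixel colour sampled from the CFD overlay is mapped to a physical value by finding the nearest entry on the MR colour bar, rather than relying on colour–value priors learned from natural images. The legend values below are hypothetical.

```python
# Illustrative sketch of legend-based reasoning: resolve a sampled pixel
# colour to a temperature via the colour bar, not via learned colour priors.

def legend_value(pixel, legend):
    """Return the value whose legend colour is closest to `pixel`.

    pixel  -- (r, g, b) tuple sampled from the CFD overlay
    legend -- list of ((r, g, b), value) pairs read from the MR colour bar
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(legend, key=lambda entry: dist2(entry[0], pixel))[1]

# Hypothetical three-entry legend: blue = 18 °C, green = 23 °C, red = 28 °C
legend = [((0, 0, 255), 18.0), ((0, 255, 0), 23.0), ((255, 0, 0), 28.0)]
print(legend_value((10, 240, 20), legend))  # nearest to green -> 23.0
```

A fine-tuned VLM must learn to perform this cross-referencing implicitly from the image and its legend, which is what distinguishes the task from ordinary natural-image question answering.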
20 pages, 4824 KB  
Article
CIR-SQL: A Dual-Model Intent Recognition Framework for Chinese Text-to-SQL
by Yao Wang, Huiyong Lv and Yurong Qian
AI 2026, 7(3), 91; https://doi.org/10.3390/ai7030091 - 4 Mar 2026
Abstract
In Industry 4.0 environments, operators and production managers frequently query industrial databases for production monitoring, quality control, and equipment maintenance using natural language. Existing Chinese NL2SQL systems often process semantic, program, and schema information in a single encoder, which leads to semantic-program interference and frequent structural or schema errors in the generated SQL. We present CIR-SQL, a dual-model framework that separates intent recognition from SQL generation via structured intermediate representations, decoupling semantic understanding from program synthesis. CIR-SQL employs a seven-category intent classification system (simple_select, count_query, filter_query, max_min_query, sort_query, join_query, group_by_query) and leverages large language models for intent recognition and structured information extraction. A three-level hierarchical backtracking strategy (SQL, context, intent) further improves robustness by correcting different error types. The architecture is particularly suited to Industry 4.0 scenarios where Chinese-speaking operators interact with complex industrial databases containing production data, quality metrics, and equipment status information. Full article
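The dual-model idea can be sketched in miniature: an intent label (hard-coded below; in CIR-SQL it is produced by an LLM classifier) selects one of the seven templates, which a second stage fills with extracted slots. The table and column names are hypothetical, not from the paper.

```python
# Minimal sketch of intent-driven SQL generation: the seven intent categories
# from the abstract each map to a template; slot filling is a separate stage.

TEMPLATES = {
    "simple_select":  "SELECT {cols} FROM {table}",
    "count_query":    "SELECT COUNT(*) FROM {table}",
    "filter_query":   "SELECT {cols} FROM {table} WHERE {cond}",
    "max_min_query":  "SELECT {agg}({col}) FROM {table}",
    "sort_query":     "SELECT {cols} FROM {table} ORDER BY {col} {order}",
    "join_query":     "SELECT {cols} FROM {t1} JOIN {t2} ON {on}",
    "group_by_query": "SELECT {col}, COUNT(*) FROM {table} GROUP BY {col}",
}

def generate_sql(intent: str, **slots) -> str:
    """Fill the template selected by the recognized intent."""
    return TEMPLATES[intent].format(**slots)

print(generate_sql("filter_query", cols="line_id, defect_rate",
                   table="quality_metrics", cond="defect_rate > 0.05"))
```

Separating the classifier from the generator in this way is what lets structural errors (wrong template) be corrected independently of schema errors (wrong slots), which is the role of the hierarchical backtracking strategy.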
15 pages, 736 KB  
Article
Reducing Energy Footprint of LLM Inference Through FPGA-Based Heterogeneous Computing Platforms
by Thiago Cormie Monteiro and Andrea Guerrieri
Electronics 2026, 15(5), 1052; https://doi.org/10.3390/electronics15051052 - 3 Mar 2026
Abstract
Artificial Intelligence (AI) has emerged as a transformative force, increasingly integrated into diverse aspects of modern society, from healthcare and education to business and entertainment. Among the most influential AI technologies are large language models (LLMs), such as generative pretrained transformers (GPTs). These models are designed to process vast amounts of data and perform complex computations, enabling advanced capabilities in natural language understanding and generation. However, deployment and operation of such systems require significant computational resources, leading to substantial energy consumption. While general-purpose hardware such as GPUs is limited by fixed-precision architectures, field-programmable gate arrays (FPGAs) offer the bit-level reconfigurability needed to exploit ultra-low-bitwidth representations. This allows power-intensive multiplications to be replaced by streamlined logic-based accumulations, maximizing the energy benefits of model quantization. This paper addresses the problem of the energy impact of LLMs by leveraging innovative FPGA-based heterogeneous computing platforms. Results demonstrate that ternary matrix multiplication (MatMul) achieves a 23% speedup and a remarkable 96% reduction in digital signal processor (DSP) utilization. Furthermore, the final optimized design shows a 52% reduction in total energy consumption compared to the baseline, making heterogeneous computing a compelling solution for power- and resource-constrained embedded applications. Full article
(This article belongs to the Special Issue New Trends for Power Optimizations in FPGA-Based Embedded Systems)
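Why ternary weights suit FPGAs can be seen in a small software sketch (illustrative only, not the paper's hardware design): with weights restricted to {-1, 0, +1}, every "multiplication" in a matrix–vector product collapses to an add, a subtract, or a skip, so DSP multiplier blocks can be replaced by plain logic adders.

```python
# Illustrative ternary matrix-vector product: no multiplications, only
# conditional add/subtract/skip, mirroring the logic-based accumulation
# that replaces DSP multipliers on the FPGA.

def ternary_matvec(W, x):
    """y = W @ x for a ternary weight matrix W (entries -1, 0, or +1)."""
    y = []
    for row in W:
        acc = 0
        for w, xi in zip(row, x):
            if w == 1:       # +1 weight: accumulate
                acc += xi
            elif w == -1:    # -1 weight: subtract instead of multiply
                acc -= xi
            # 0 weight: skip the operand entirely
        y.append(acc)
    return y

W = [[1, 0, -1],
     [-1, 1, 1]]
x = [2.0, 3.0, 4.0]
print(ternary_matvec(W, x))  # [-2.0, 5.0]
```

On reconfigurable hardware these conditional accumulations map onto simple adder logic, which is consistent with the large DSP-utilization reduction the abstract reports.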
Show Figures

Figure 1

Back to TopTop