MDPI - Publisher of Open Access Journals

18 pages, 3027 KiB

Open Access Article

Domain-Specialized Large Language Model for Corrosion Analysis: Construction and Evaluation of Corr-Lora-RAG

by Weitong Wu, Di Xu, Liangan Liu, Bingqin Wang, Yadi Zhao, Xuequn Cheng and Xiaogang Li

Appl. Sci. 2025, 15(16), 9226; https://doi.org/10.3390/app15169226 - 21 Aug 2025

This study proposes a large language model, Corr-Lora-RAG, designed to address the complexity and uncertainty inherent in corrosion data. A dedicated corrosion knowledge database (CKD) was constructed, and dataset generation code was provided to enhance the model’s reproducibility and adaptability. Based on the Qwen2.5-7B model, the Corr-Lora model was developed by integrating prompt engineering and low-rank adaptation (LoRA) supervised fine-tuning (SFT) techniques, thereby improving the understanding and expression of domain-specific knowledge in the field of corrosion. Furthermore, the Corr-Lora-RAG model was built using retrieval-augmented generation (RAG) technology, enabling dynamic access to external knowledge. Experimental results demonstrate that the proposed model outperforms baseline models in terms of accuracy, completeness, and domain relevance, and exhibits knowledge generation capabilities comparable to those of large-scale language models under limited computational resources. This approach provides an intelligent solution for corrosion risk assessment, standards compliance analysis, and protective strategy formulation, and offers a valuable reference for the development of specialized language models in other engineering fields. Full article
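
The abstract's point about working under limited computational resources follows from how LoRA is defined: the pretrained weight matrix stays frozen and only two small low-rank factors are trained. A minimal sketch of that parameter arithmetic is given here; the layer shape and rank are illustrative assumptions, not values reported in the article.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two LoRA factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def full_params(d_out: int, d_in: int) -> int:
    """Parameters in the frozen dense weight matrix W (d_out x d_in)."""
    return d_out * d_in


# Hypothetical layer shape and rank, chosen only for illustration.
d_out, d_in, rank = 4096, 4096, 8
lora = lora_trainable_params(d_out, d_in, rank)
full = full_params(d_out, d_in)
ratio = lora / full
print(f"LoRA trains {lora} of {full} weights ({ratio:.4%})")
```

Because the trainable count grows linearly with the rank while the dense count grows with the product of the two dimensions, the saving becomes larger as layers get wider.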

(This article belongs to the Topic Advanced Development and Applications of AI-Generated Content (AIGC))

18 pages, 1143 KiB

Open Access Article

Enhancing Clinical Decision Support with Adaptive Iterative Self-Query Retrieval for Retrieval-Augmented Large Language Models

by Srinivasagam Prabha, Cesar A. Gomez-Cabello, Syed Ali Haider, Ariana Genovese, Maissa Trabilsy, Nadia G. Wood, Sanjay Bagaria, Cui Tao and Antonio J. Forte

Bioengineering 2025, 12(8), 895; https://doi.org/10.3390/bioengineering12080895 - 21 Aug 2025

Abstract

Retrieval-Augmented Generation (RAG) offers a promising strategy to harness large language models (LLMs) for delivering up-to-date, accurate clinical guidance while reducing physicians’ cognitive burden, yet its effectiveness hinges on query clarity and structure. We propose an adaptive Self-Query Retrieval (SQR) framework that integrates three refinement modules, namely PICOT (Population, Intervention, Comparison, Outcome, Time), SPICE (Setting, Population, Intervention, Comparison, Evaluation), and Iterative Query Refinement (IQR), to automatically restructure and iteratively enhance clinical questions until they meet predefined retrieval-quality thresholds. We implemented SQR on Gemini-1.0 Pro and benchmarked it using thirty postoperative rhinoplasty queries, evaluating responses for accuracy and relevance on a three-point Likert scale and for retrieval quality via precision, recall, and F1 score; statistical significance was assessed by one-way ANOVA with Tukey post-hoc testing. The full SQR pipeline achieved 87% accuracy (Likert 2.4 ± 0.7) and 100% relevance (Likert 3.0 ± 0.0), significantly outperforming a non-refined RAG baseline (50% accuracy, 80% relevance; p < 0.01 and p = 0.03). Precision, recall, and F1 rose from 0.17, 0.39, and 0.24 to 0.53, 1.00, and 0.70, respectively, while PICOT-only and SPICE-only variants yielded intermediate improvements. These findings demonstrate that automated structuring and iterative enhancement of queries via SQR substantially elevate LLM-based clinical decision support, and its model-agnostic architecture enables rapid adaptation across specialties, data sources, and LLM platforms. Full article
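
For readers less familiar with the retrieval-quality metrics named in this abstract, the following is a minimal sketch of how precision, recall, and F1 are computed for a single query. The document identifiers are made up for illustration, and the article's own evaluation procedure (for example, how scores are averaged across the thirty queries) is not specified here.

```python
def retrieval_metrics(retrieved: set[str], relevant: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one query's retrieved documents."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical query: four documents retrieved, two of them relevant,
# and no relevant document missed.
retrieved = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc1", "doc2"}
p, r, f1 = retrieval_metrics(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Note that an F1 score averaged over queries need not equal the F1 computed from the averaged precision and recall, so small differences between the two are expected.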

(This article belongs to the Special Issue Application of Artificial Intelligence in Complex Diseases)

18 pages, 433 KiB

Open Access Article

A Retrieval-Augmented Generation Method for Question Answering on Airworthiness Regulations

by Tao Zheng, Shiyu Shen and Changchang Zeng

Electronics 2025, 14(16), 3314; https://doi.org/10.3390/electronics14163314 - 20 Aug 2025

Abstract

Civil aviation airworthiness regulations are the fundamental basis for the design and operational safety of aircraft. Their provisions exhibit a high degree of specialization, cross-disciplinary complexity, and hierarchical structure. Moreover, the regulations are frequently updated, posing unique challenges for automated question-answering systems. Large language models (LLMs) have demonstrated remarkable capabilities in dialog and reasoning; however, when tackling knowledge-intensive tasks in the field of civil aviation regulations, they still face challenges such as difficulty in updating knowledge and a scarcity of high-quality domain-specific datasets. This study introduces a retrieval-augmented generation (RAG) approach that integrates retrieval modules with generative models to enable more efficient knowledge acquisition and updating, encompassing data processing and retrieval-based reasoning. The data processing stage comprises document conversion, information extraction, and document parsing modules. Additionally, a high-quality airworthiness regulation QA dataset was specifically constructed, covering multiple-choice, true/false, and fill-in-the-blank questions, with a total of 4688 entries. The retrieval-based reasoning stage employs vector search and re-ranking strategies, combined with prompt optimization, to enhance the model’s reasoning capabilities in specific airworthiness certification regulation comprehension tasks. A series of experiments demonstrates the effectiveness of the retrieval-augmented generation approach in this domain, significantly improving answer accuracy and retrieval hit rates. Full article
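
The two-stage retrieval described in this abstract (vector search followed by re-ranking) can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the bag-of-words "embedding" stands in for a neural encoder, the coverage-based re-ranker stands in for whatever re-ranking model the authors used, and the passages are invented rather than quoted from any regulation.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would use a neural encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(query: str, passages: list[str], k: int = 3, top: int = 1) -> list[str]:
    q = embed(query)
    # Stage 1: vector search keeps the k passages most similar to the query.
    candidates = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]
    # Stage 2: re-rank the candidates by the fraction of query terms they cover.
    q_terms = set(q)

    def coverage(passage: str) -> float:
        return len(q_terms & set(embed(passage))) / max(len(q_terms), 1)

    return sorted(candidates, key=coverage, reverse=True)[:top]


# Invented passages, for illustration only.
passages = [
    "the engine fire extinguishing system must be tested",
    "each fuel tank must withstand vibration loads",
    "cabin emergency exits must be marked",
]
best = retrieve_then_rerank("fuel tank vibration", passages)
print(best)
```

Splitting retrieval this way lets a cheap similarity search narrow the candidate set before a more expensive scorer decides the final order.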

(This article belongs to the Special Issue The Future of AI-Generated Content (AIGC))

28 pages, 8325 KiB

Open Access Article

Tunnel Rapid AI Classification (TRaiC): An Open-Source Code for 360° Tunnel Face Mapping, Discontinuity Analysis, and RAG-LLM-Powered Geo-Engineering Reporting

by Seyedahmad Mehrishal, Junsu Leem, Jineon Kim, Yulong Shao, Il-Seok Kang and Jae-Joon Song

Remote Sens. 2025, 17(16), 2891; https://doi.org/10.3390/rs17162891 - 20 Aug 2025

Viewed by 53

Abstract

Accurate and efficient rock mass characterization is essential in geotechnical engineering, yet traditional tunnel face mapping remains time consuming, subjective, and potentially hazardous. Recent advances in digital technologies and AI offer automation opportunities, but many existing solutions are hindered by slow 3D scanning, computationally intensive processing, and limited integration flexibility. This paper presents Tunnel Rapid AI Classification (TRaiC), an open-source MATLAB-based platform for rapid and automated tunnel face mapping. TRaiC integrates single-shot 360° panoramic photography, AI-powered discontinuity detection, 3D textured digital twin generation, rock mass discontinuity characterization, and Retrieval-Augmented Generation with Large Language Models (RAG-LLM) for automated geological interpretation and standardized reporting. The modular eight-stage workflow includes simplified 3D modeling, trace segmentation, 3D joint network analysis, and rock mass classification using RMR, with outputs optimized for Geo-BIM integration. Initial evaluations indicate substantial reductions in processing time and expert assessment workload. Producing a lightweight yet high-fidelity digital twin, TRaiC enables computational efficiency, transparency, and reproducibility, serving as a foundation for future AI-assisted geotechnical engineering research. Its graphical user interface and well-structured open-source code make it accessible to users ranging from beginners to advanced researchers. Full article

(This article belongs to the Special Issue Machine Learning and Remote/Proximal Sensing for Rock Mass Characterization and Slope Analyses)

23 pages, 4794 KiB

Open Access Article

IHGR-RAG: An Enhanced Retrieval-Augmented Generation Framework for Accurate and Interpretable Power Equipment Condition Assessment

by Zhenhao Ye, Donglian Qi, Hanlin Liu and Siqi Zhang

Electronics 2025, 14(16), 3284; https://doi.org/10.3390/electronics14163284 - 19 Aug 2025

Viewed by 180

Abstract

Condition assessment of power equipment is crucial for optimizing maintenance strategies. However, knowledge-driven approaches rely heavily on manual alignment between equipment failure characteristics and guideline information, while data-driven methods predominantly depend on on-site experiments to detect abnormal conditions. Both face challenges in terms of inefficiency and timeliness limitations. With the growing integration of information systems, a significant portion of condition assessment-related information is represented in textual formats, such as system alerts and experimental records. Although Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) show promise in processing such text-based information, their practical application is constrained by LLMs’ hallucinations and RAG’s coarse-grained retrieval mechanisms, which struggle with semantically similar but contextually distinct guideline items. To address these issues, this paper proposes an enhanced RAG framework that integrates hierarchical and global retrieval mechanisms (IHGR-RAG). The framework comprehensively incorporates three optimization strategies: a query rewriting mechanism based on few-shot learning prompt engineering, an integrated approach combining hierarchical and global retrieval mechanisms, and a zero-shot chain-of-thought generation optimization pipeline. Additionally, a Task-Specific Quantitative Evaluation Benchmark is developed to rigorously evaluate model performance. Experimental results indicate that IHGR-RAG achieves accuracy improvements of 4.14% and 5.12% in the task of matching the solely correct guideline item, compared to conventional RAG and standalone hierarchical methods, respectively. Ablation studies confirm the effectiveness of each module. This work advances dynamic health monitoring for power equipment by balancing interpretability, accuracy, and domain adaptability, providing a cost-effective optimization pathway for scenarios with limited annotated data. Full article

(This article belongs to the Special Issue Advances in Condition Monitoring and Fault Diagnosis)

14 pages, 1467 KiB

Open Access Article

MDKAG: Retrieval-Augmented Educational QA Powered by a Multimodal Disciplinary Knowledge Graph

by Xu Zhao, Guozhong Wang and Yufei Lu

Appl. Sci. 2025, 15(16), 9095; https://doi.org/10.3390/app15169095 - 18 Aug 2025

Viewed by 106

Abstract

With the accelerated digital transformation in education, the efficient integration of massive multimodal instructional resources and the support for interactive question answering (QA) remains a prominent challenge. This study introduces Multimodal Disciplinary Knowledge-Augmented Generation (MDKAG), a framework integrating retrieval-augmented generation (RAG) with a multimodal disciplinary knowledge graph (MDKG). MDKAG first extracts high-precision entities from digital textbooks, lecture slides, and classroom videos by using the Enhanced Representation through Knowledge Integration 3.0 (ERNIE 3.0) model and then links them into a graph that supports fine-grained retrieval. At inference time, the framework retrieves graph-adjacent passages, integrates multimodal data, and feeds them into a large language model (LLM) to generate context-aligned answers. An answer-verification module checks semantic overlap and entity coverage to filter hallucinations and triggers incremental graph updates when new concepts appear. Experiments on three university courses show that MDKAG reduces hallucination rates by up to 23% and increases answer accuracy by 11% over text-only RAG and knowledge-augmented generation (KAG) baselines, demonstrating strong adaptability across subject domains. The results indicate that MDKAG offers an effective route for scalable knowledge organization and reliable interactive QA in education. Full article
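
The answer-verification step mentioned in this abstract checks overlap with the retrieved evidence and coverage of expected entities. A minimal sketch of that idea follows, under loud assumptions: plain token overlap stands in for the semantic overlap measure, entity matching is a simple substring test, and both thresholds are hypothetical rather than taken from the article.

```python
def verify_answer(
    answer: str,
    evidence: str,
    entities: list[str],
    min_overlap: float = 0.5,   # hypothetical threshold
    min_coverage: float = 0.5,  # hypothetical threshold
) -> bool:
    """Accept an answer only if it overlaps the evidence and mentions enough entities."""
    answer_tokens = set(answer.lower().split())
    evidence_tokens = set(evidence.lower().split())
    # Share of answer tokens that also occur in the retrieved evidence.
    overlap = len(answer_tokens & evidence_tokens) / len(answer_tokens) if answer_tokens else 0.0
    # Share of expected entities that the answer mentions.
    mentioned = [e for e in entities if e.lower() in answer.lower()]
    coverage = len(mentioned) / len(entities) if entities else 1.0
    return overlap >= min_overlap and coverage >= min_coverage


# Invented course material, for illustration only.
evidence = "a binary search tree keeps keys in sorted order"
entities = ["binary search tree"]
grounded = verify_answer("a binary search tree keeps keys in sorted order", evidence, entities)
ungrounded = verify_answer("a hash table stores keys in random buckets", evidence, entities)
print(grounded, ungrounded)
```

An answer rejected by such a check could be regenerated or flagged, and terms it introduces that are absent from the graph are candidates for the incremental updates the abstract describes.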

21 pages, 806 KiB

Open Access Tutorial

Multi-Layered Framework for LLM Hallucination Mitigation in High-Stakes Applications: A Tutorial

by Sachin Hiriyanna and Wenbing Zhao

Computers 2025, 14(8), 332; https://doi.org/10.3390/computers14080332 - 16 Aug 2025

Viewed by 499

Abstract

Large language models (LLMs) now match or exceed human performance on many open-ended language tasks, yet they continue to produce fluent but incorrect statements, which is a failure mode widely referred to as hallucination. In low-stakes settings this may be tolerable; in regulated or safety-critical domains such as financial services, compliance review, and client decision support, it is not. Motivated by these realities, we develop an integrated mitigation framework that layers complementary controls rather than relying on any single technique. The framework combines structured prompt design, retrieval-augmented generation (RAG) with verifiable evidence sources, and targeted fine-tuning aligned with domain truth constraints. Our interest in this problem is practical. Individual mitigation techniques have matured quickly, yet teams deploying LLMs in production routinely report difficulty stitching them together in a coherent, maintainable pipeline. Decisions about when to ground a response in retrieved data, when to escalate uncertainty, how to capture provenance, and how to evaluate fidelity are often made ad hoc. Drawing on experience from financial technology implementations, where even rare hallucinations can carry material cost, regulatory exposure, or loss of customer trust, we aim to provide clearer guidance in the form of an easy-to-follow tutorial. This paper makes four contributions. First, we introduce a three-layer reference architecture that organizes mitigation activities across input governance, evidence-grounded generation, and post-response verification. Second, we describe a lightweight supervisory agent that manages uncertainty signals and triggers escalation (to humans, alternate models, or constrained workflows) when confidence falls below policy thresholds. Third, we analyze common but under-addressed security surfaces relevant to hallucination mitigation, including prompt injection, retrieval poisoning, and policy evasion attacks. Finally, we outline an implementation playbook for production deployment, including evaluation metrics, operational trade-offs, and lessons learned from early financial-services pilots. Full article
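
The supervisory agent's threshold-based escalation can be pictured with a short routing sketch. The threshold values, the route names, and the idea of a single scalar confidence score are all assumptions made for illustration; the tutorial's actual agent design is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy thresholds; real values would be set per deployment."""
    answer_threshold: float = 0.8
    fallback_threshold: float = 0.5


def route(confidence: float, policy: Policy = Policy()) -> str:
    """Map a confidence score in [0, 1] to an escalation path."""
    if confidence >= policy.answer_threshold:
        return "answer"            # confident enough to respond directly
    if confidence >= policy.fallback_threshold:
        return "alternate_model"   # retry with another model or a constrained workflow
    return "human_review"          # lowest confidence goes to a person


for score in (0.9, 0.6, 0.2):
    print(score, route(score))
```

Keeping the thresholds in one policy object means they can be reviewed and changed without touching the routing logic.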

22 pages, 3187 KiB

Open Access Article

Automated Clinical Trial Data Analysis and Report Generation by Integrating Retrieval-Augmented Generation (RAG) and Large Language Model (LLM) Technologies

by Sheng-Ming Kuo, Shao-Kuo Tai, Hung-Yu Lin and Rung-Ching Chen

AI 2025, 6(8), 188; https://doi.org/10.3390/ai6080188 - 15 Aug 2025

Viewed by 681

Abstract

Retrieval-Augmented Generation (RAG) combined with Large Language Models (LLMs) introduces a new paradigm for clinical-trial data analysis that is both real-time and knowledge-traceable. This study targets a multi-site, real-world data environment. It builds a hierarchical RAG pipeline spanning an electronic health record (EHR), National Health Insurance (NHI) billing codes, and image-vector indices. The LLM is optimized through lightweight LoRA/QLoRA fine-tuning and reinforcement-learning-based alignment. The system first retrieves key textual and imaging evidence from heterogeneous data repositories and then fuses these artifacts into the contextual window for clinical report generation. Experimental results show marked improvements over traditional manual statistics and prompt-only models in retrieval accuracy, textual coherence, and response latency while reducing human error and workload. In evaluation, the proposed multimodal RAG-LLM workflow achieved statistically significant gains in three core metrics—recall, factual consistency, and expert ratings—and substantially shortened overall report-generation time, demonstrating clear efficiency advantages versus conventional manual processes. However, LLMs alone often face challenges such as limited real-world grounding, hallucination risks, and restricted context windows. Similarly, RAG systems, while improving factual consistency, depend heavily on retrieval quality and may yield incoherent synthesis if evidence is misaligned. These limitations underline the complementary nature of integrating RAG and LLM architectures in a clinical reporting context. Quantitatively, the proposed system achieved a Composite Quality Index (CQI) of 78.3, outperforming strong baselines such as Med-PaLM 2 (72.6) and PMC-LLaMA (74.3), and reducing the report drafting time by over 75% (p < 0.01). These findings confirm the practical feasibility of the framework to support fully automated clinical reporting. Full article

18 pages, 3219 KiB

Open Access Article

Designing Trustworthy AI Systems for PTSD Follow-Up

by María Cazares, Jorge Miño-Ayala, Iván Ortiz and Roberto Andrade

Technologies 2025, 13(8), 361; https://doi.org/10.3390/technologies13080361 - 15 Aug 2025

Viewed by 200

Abstract

Post-Traumatic Stress Disorder (PTSD) poses complex clinical challenges due to its emotional volatility, contextual sensitivity, and need for personalized care. Conventional AI systems often fall short in therapeutic contexts due to lack of explainability, ethical safeguards, and narrative understanding. We propose a hybrid neuro-symbolic architecture that combines Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), symbolic controllers, and ensemble classifiers to support clinicians in PTSD follow-up. The proposal integrates real-time anonymization, session memory through patient-specific RAG, and a Human-in-the-Loop (HITL) interface. It ensures clinical safety via symbolic logic rules derived from trauma-informed protocols. The proposed architecture enables safe, personalized AI-driven responses by combining statistical language modeling with explicit therapeutic constraints. Through modular integration, it supports affective signal adaptation, longitudinal memory, and ethical traceability. A comparative evaluation against state-of-the-art approaches highlights improvements in contextual alignment, privacy protection, and clinician supervision. Full article

(This article belongs to the Special Issue AI-Enabled Smart Healthcare Systems)

15 pages, 1844 KiB

Open Access Article

Artificial Intelligence Agent-Enabled Predictive Maintenance: Conceptual Proposal and Basic Framework

by Wenyu Jiang and Fuwen Hu

Computers 2025, 14(8), 329; https://doi.org/10.3390/computers14080329 - 15 Aug 2025

Viewed by 486

Abstract

Predictive maintenance (PdM) represents a significant evolution in maintenance strategies. However, challenges such as system integration complexity, data quality, and data availability are intricately intertwined, collectively impacting the successful deployment of PdM systems. Recently, large model-based agents, or agentic artificial intelligence (AI), have evolved from simple task automation to active problem-solving and strategic decision-making. As such, we propose an AI agent-enabled PdM method that leverages an agentic AI development platform to streamline the development of a multimodal data-based fault detection agent, a RAG (retrieval-augmented generation)-based fault classification agent, a large model-based fault diagnosis agent, and a digital twin-based fault handling simulation agent. This approach breaks through the limitations of traditional PdM, which relies heavily on single models. This combination of “AI workflow + large reasoning models + operational knowledge base + digital twin” integrates the concepts of BaaS (backend as a service) and LLMOps (large language model operations), constructing an end-to-end intelligent closed loop from data perception to decision execution. Furthermore, a tentative prototype is demonstrated to show the technology stack and the system integration methods of the agentic AI-based PdM. Full article

(This article belongs to the Special Issue Adaptive Decision Making Across Industries with AI and Machine Learning: Frameworks, Challenges, and Innovations)

28 pages, 968 KiB

Open Access Article

EVuLLM: Ethereum Smart Contract Vulnerability Detection Using Large Language Models

by Eleni Mandana, George Vlahavas and Athena Vakali

Electronics 2025, 14(16), 3226; https://doi.org/10.3390/electronics14163226 - 14 Aug 2025

Viewed by 427

Abstract

Smart contracts have become integral to decentralized applications, yet their programmability introduces critical security risks, exemplified by high-profile exploits such as the DAO and Parity Wallet incidents. Existing vulnerability detection methods, including static and dynamic analysis, as well as machine learning-based approaches, often struggle with emerging threats and rely heavily on large, labeled datasets. This study investigates the effectiveness of open-source, lightweight large language models (LLMs) fine-tuned using parameter-efficient techniques, including Quantized Low-Rank Adaptation (QLoRA), for smart contract vulnerability detection. We introduce the EVuLLM dataset to address the scarcity of diverse evaluation resources and demonstrate that our fine-tuned models achieve up to 94.78% accuracy, surpassing the performance of larger proprietary models, while significantly reducing computational requirements. Moreover, we emphasize the advantages of lightweight models deployable on local hardware, such as enhanced data privacy, reduced reliance on internet connectivity, lower infrastructure costs, and improved control over model behavior, factors that are especially critical in security-sensitive blockchain applications. We also explore Retrieval-Augmented Generation (RAG) as a complementary strategy, achieving competitive results with minimal training. Our findings highlight the practicality of using locally hosted LLMs for secure, efficient, and reproducible smart contract analysis, paving the way for broader adoption of AI-driven security in blockchain ecosystems. Full article

(This article belongs to the Special Issue Network Security and Cryptography Applications)

25 pages, 3348 KiB

Open Access Article

An AI-Assisted Thermodynamic Equilibrium Simulator: A Case Study on Steam Methane Reforming in Isothermal and Adiabatic Reactors

by Julles Mitoura dos Santos Junior, Antonio Carlos Daltro de Freitas and Adriano Pinto Mariano

Processes 2025, 13(8), 2508; https://doi.org/10.3390/pr13082508 - 8 Aug 2025

Viewed by 512

Abstract

This study presents TeS v.3, a thermodynamic equilibrium simulator integrated with an artificial intelligence (AI) agent, ThermoAgent, to enhance the analysis of complex chemical systems. Developed in Python, the simulator employs Gibbs energy minimization for isothermal reactors and entropy maximization for adiabatic reactors. ThermoAgent leverages the LangChain framework to interpret natural language commands, autonomously execute simulations, and query a scientific knowledge base through a Retrieval-Augmented Generation (RAG) approach. The validation of TeS v.3 demonstrated high accuracy, with coefficients of determination of R² > 0.95 against reference simulation data and strong correlation (R² > 0.88) with experimental data from the steam methane reforming (SMR) process. The SMR analysis correctly distinguished the high conversions in isothermal reactors from the limited conversions in adiabatic reactors, which result from the drop in reaction temperature. ThermoAgent successfully executed simulations and provided justified analyses, combining generated data with information from reference publications. The successful integration of the simulator with the AI agent represents a significant advancement, offering a powerful tool that accurately calculates equilibrium and accelerates knowledge extraction through intuitive interaction. Full article

(This article belongs to the Special Issue Artificial Intelligence (AI) and Automation-Driven Innovations in Chemical Engineering)

16 pages, 505 KiB

Open Access Article

Retrieval-Augmented Text-to-CSEQL Generation for Cross-Platform Cyberspace Assets Query

by Ye Li, Yuwei Li, Fan Shi, Pengfei Xue, Chengxi Xu and Luolin Hu

Electronics 2025, 14(16), 3164; https://doi.org/10.3390/electronics14163164 - 8 Aug 2025

Viewed by 246

Abstract

Cyberspace search engines (CSEs) are systems designed to search and index information about cyberspace assets. Effectively mining data across diverse platforms is hindered by the complexity and diversity of different CSE syntaxes. While Text-to-CSEQL offers a promising solution by translating natural language (NL) questions into cyberspace search engine query language (CSEQL), existing prompt-based methods still struggle due to the platform-specific intricacies of CSEQL. To address this limitation, we propose an LLM-based approach leveraging Retrieval-Augmented Generation (RAG). Specifically, to overcome the inability of traditional methods to retrieve relevant syntax fields effectively, we propose a novel hybrid retrieval mechanism combining keyword and dense retrieval, leveraging both field values and their semantic descriptions. Furthermore, we integrate these retrieved fields and the relevant few-shot examples into a redesigned prompt template adapted from the COSTAR framework. For comprehensive evaluation, we construct a Text-to-CSEQL dataset and introduce a new domain-specific metric, field match (FM). Extensive experiments demonstrate our method’s ability to adapt to platform-specific characteristics. Compared to prompt-based methods, it achieves an average accuracy improvement of 43.15% when generating CSEQL queries for diverse platforms. Moreover, our method also outperforms techniques designed for single-platform CSEQL generation. Full article
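
The hybrid retrieval idea in this abstract, which combines keyword matching on fields with dense matching on their semantic descriptions, can be sketched as a weighted score. This is an illustration under stated assumptions, not the authors' method: the Jaccard similarity stands in for an embedding-based dense score, the weight `alpha` is hypothetical, and the field names and descriptions are invented rather than taken from any particular search engine's syntax.

```python
def tokens(text: str) -> set[str]:
    """Lowercased whitespace tokens of a string."""
    return set(text.lower().split())


def keyword_score(question: str, field_name: str) -> float:
    """1.0 if the field name appears verbatim in the question, else 0.0."""
    return 1.0 if field_name.lower() in tokens(question) else 0.0


def dense_score(question: str, description: str) -> float:
    """Stand-in for embedding similarity: Jaccard overlap with the field description."""
    q, d = tokens(question), tokens(description)
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def hybrid_rank(question: str, fields: dict[str, str], alpha: float = 0.5) -> list[str]:
    """Rank field names by a weighted mix of keyword and dense scores (alpha is hypothetical)."""
    scored = {
        name: alpha * keyword_score(question, name) + (1 - alpha) * dense_score(question, desc)
        for name, desc in fields.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Invented field catalogue, for illustration only.
fields = {
    "port": "network port number the service listens on",
    "country": "country where the asset is located",
    "title": "title of the web page",
}
ranking = hybrid_rank("find assets located in germany with port 443 open", fields)
print(ranking)
```

In this toy example the keyword signal surfaces `port`, while the description overlap surfaces `country` even though the question never uses that word, which is the kind of complementarity a hybrid retriever relies on.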

(This article belongs to the Special Issue New Insights into Natural Language Processing and Large Language Models)

25 pages, 1436 KiB

Open Access Review

Large Language Models for Structured and Semi-Structured Data, Recommender Systems and Knowledge Base Engineering: A Survey of Recent Techniques and Architectures

by Alma Smajić, Ratomir Karlović, Mieta Bobanović Dasko and Ivan Lorencin

Electronics 2025, 14(15), 3153; https://doi.org/10.3390/electronics14153153 - 7 Aug 2025

Viewed by 638

Abstract

Large Language Models (LLMs) are reshaping recommendation systems through enhanced language understanding, reasoning, and integration with structured data. This systematic review analyzes 88 studies published between 2023 and 2025, categorized into three thematic areas: data processing, technical identification, and LLM-based recommendation architectures. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, the review highlights key trends such as the use of knowledge graphs, Retrieval-Augmented Generation (RAG), domain-specific fine-tuning, and robustness improvements. Findings reveal that while LLMs significantly advance semantic reasoning and personalization, challenges remain in hallucination mitigation, fairness, and domain adaptation. Technical innovations, including graph-augmented retrieval methods and human-in-the-loop validation, show promise in addressing these limitations. The review also considers the broader macroeconomic implications associated with the deployment of LLM-based systems, particularly as they relate to scalability, labor dynamics, and resource-intensive implementation in real-world recommendation contexts, emphasizing both productivity gains and potential labor market shifts. This work provides a structured overview of current methods and outlines future directions for developing reliable and efficient LLM-based recommendation systems. Full article

(This article belongs to the Special Issue Advances in Algorithm Optimization and Computational Intelligence)

28 pages, 15658 KiB

Open Access Article

Unifying Flood-Risk Communication: Empowering Community Leaders Through AI-Enhanced, Contextualized Storytelling

by Michal Zajac, Connor Kulawiak, Shenglin Li, Caleb Erickson, Nathan Hubbell and Jiaqi Gong

Hydrology 2025, 12(8), 204; https://doi.org/10.3390/hydrology12080204 - 4 Aug 2025

Abstract

Floods pose a growing threat globally, causing tragic loss of life, billions in economic damage annually, and disproportionately affecting socio-economically vulnerable populations. This paper aims to improve flood-risk communication for community leaders by exploring the application of artificial intelligence. We categorize U.S. flood information sources, review communication modalities and channels, synthesize the literature on community leaders’ roles in risk communication, and analyze existing technological tools. Our analysis reveals three key challenges: the fragmentation of flood information, information overload that impedes decision-making, and the absence of a unified communication platform to address these issues. We find that AI techniques can organize data and significantly enhance communication effectiveness, particularly when delivered through infographics and social media channels. Based on these findings, we propose FLAI (Flood Language AI), an AI-driven flood communication platform that unifies fragmented flood data sources. FLAI employs knowledge graphs to structure fragmented data sources and utilizes a retrieval-augmented generation (RAG) framework to enable large language models (LLMs) to produce contextualized narratives, including infographics, maps, and cost–benefit analyses. Beyond flood management, FLAI’s framework demonstrates how AI can transform public service data management and institutional AI readiness. By centralizing and organizing information, FLAI can significantly reduce the cognitive burden on community leaders, helping them communicate timely, actionable insights to save lives and build flood resilience.
