Search Results (125)

Search Parameters:
Keywords = multimodal LLM

36 pages, 163598 KB  
Article
Multi-Weather DomainShifter: A Comprehensive Multi-Weather Transfer LLM Agent for Handling Domain Shift in Aerial Image Processing
by Yubo Wang, Ruijia Wen, Hiroyuki Ishii and Jun Ohya
J. Imaging 2025, 11(11), 395; https://doi.org/10.3390/jimaging11110395 - 6 Nov 2025
Abstract
Recent deep learning-based remote sensing analysis models often struggle with performance degradation due to domain shifts caused by illumination variations (clear to overcast), changing atmospheric conditions (clear to foggy, dusty), and physical scene changes (clear to snowy). Addressing domain shift in aerial image segmentation is challenging due to limited training data availability, including costly data collection and annotation. We propose Multi-Weather DomainShifter, a comprehensive multi-weather domain transfer system that augments single-domain images into various weather conditions without additional laborious annotation, coordinated by a large language model (LLM) agent. Specifically, we utilize Unreal Engine to construct a synthetic dataset featuring images captured under diverse conditions such as overcast, foggy, and dusty settings. We then propose a latent space style transfer model that generates alternate domain versions based on real aerial datasets. Additionally, we present a multi-modal snowy scene diffusion model with LLM-assisted scene descriptors to add snowy elements into scenes. Multi-Weather DomainShifter integrates these two approaches into a tool library and leverages the agent for tool selection and execution. Extensive experiments on the ISPRS Vaihingen and Potsdam datasets demonstrate that domain shift caused by weather change in aerial images leads to significant performance drops, and then verify our proposal’s capacity to adapt models to perform well in shifted domains while maintaining their effectiveness in the original domain. Full article
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)
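A minimal sketch of the tool-library-plus-agent pattern this abstract describes, in Python; the tool names, the selection prompt, and the dispatch logic are illustrative assumptions, not the paper's implementation:

```python
# Hypothetical sketch of an agent-coordinated weather-augmentation tool library.
from typing import Callable, Dict

def latent_style_transfer(image_path: str, target: str) -> str:
    """Placeholder: re-style an aerial image toward overcast/foggy/dusty domains."""
    return f"{image_path}.{target}.styled.png"

def snow_diffusion(image_path: str, scene_description: str) -> str:
    """Placeholder: add snowy elements via a multimodal diffusion model."""
    return f"{image_path}.snowy.png"

TOOLS: Dict[str, Callable[..., str]] = {
    "style_transfer": latent_style_transfer,   # illumination / atmospheric shifts
    "snow_diffusion": snow_diffusion,          # physical scene change (snow)
}

def choose_tool(target_domain: str) -> str:
    """Stand-in for the LLM agent's tool-selection step: ask the model which
    tool fits the requested target domain and parse its one-word answer."""
    prompt = (f"Available tools: {list(TOOLS)}. "
              f"Which tool converts a clear aerial image to '{target_domain}'? "
              "Answer with the tool name only.")
    # reply = llm(prompt)  # any chat-completion call would do here
    reply = "snow_diffusion" if target_domain == "snowy" else "style_transfer"
    return reply.strip()

def augment(image_path: str, target_domain: str) -> str:
    tool = TOOLS[choose_tool(target_domain)]
    if tool is snow_diffusion:
        return tool(image_path, scene_description=f"aerial scene, {target_domain}")
    return tool(image_path, target=target_domain)

print(augment("vaihingen_tile_03.png", "foggy"))
```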
23 pages, 933 KB  
Article
Multimodal Semantic Fusion of Heterogeneous Data Silos
by Abdurrahman Alshareef and Bernard P. Zeigler
Systems 2025, 13(11), 987; https://doi.org/10.3390/systems13110987 - 4 Nov 2025
Abstract
Maintaining consistency in complex systems is a continuous challenge that requires active coordination. Data management systems often face the issue of segregated data silos due to various organizational and technical factors. Integrating them when needed can present challenges due to heterogeneity and multimodality. Recent advances in AI models with enhanced multimodal inference and semantic reasoning capabilities offer an opportunity to resolve interoperability issues at both the schema and data levels. In this paper, we discuss ways to leverage such models to mitigate a variety of heterogeneous timing and data barriers across disparate silos. We also examine their fusion and propose ways to formally define it as a foundational means for self-evolving unified meta-space in light of recent model enablements and active inference. Assessing the degree of fusion is necessary to understand and determine how silos, as subsystems, collectively interact, and therefore to control their integration while preserving data source independence. Adherence to a principled design that handles complexity can guide crucial decisions and enhance controllability over the reasoning process. We formalize a foundation for separating prior knowledge from observed data, showing how to leverage inference in both cases with examples and real data. The resulting approach enables advanced inference while providing statistical evidence from observed data by applying reasoning at multiple steps. To conclude, we discuss the implications of this approach for complex systems more generally. Full article

17 pages, 2127 KB  
Article
Leveraging Large Language Models for Real-Time UAV Control
by Kheireddine Choutri, Samiha Fadloun, Ayoub Khettabi, Mohand Lagha, Souham Meshoul and Raouf Fareh
Electronics 2025, 14(21), 4312; https://doi.org/10.3390/electronics14214312 - 2 Nov 2025
Viewed by 378
Abstract
As drones become increasingly integrated into civilian and industrial domains, the demand for natural and accessible control interfaces continues to grow. Conventional manual controllers require technical expertise and impose cognitive overhead, limiting their usability in dynamic and time-critical scenarios. To address these limitations, this paper presents a multilingual voice-driven control framework for quadrotor drones, enabling real-time operation in both English and Arabic. The proposed architecture combines offline Speech-to-Text (STT) processing with large language models (LLMs) to interpret spoken commands and translate them into executable control code. Specifically, Vosk is employed for bilingual STT, while Google Gemini provides semantic disambiguation, contextual inference, and code generation. The system is designed for continuous, low-latency operation within an edge–cloud hybrid configuration, offering an intuitive and robust human–drone interface. While speech recognition and safety validation are processed entirely offline, high-level reasoning and code generation currently rely on cloud-based LLM inference. Experimental evaluation demonstrates an average speech recognition accuracy of 95% and end-to-end command execution latency between 300 and 500 ms, validating the feasibility of reliable, multilingual, voice-based UAV control. This research advances multimodal human–robot interaction by showcasing the integration of offline speech recognition and LLMs for adaptive, safe, and scalable aerial autonomy. Full article
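A rough sketch of the described pipeline, offline speech-to-text feeding a cloud LLM whose output is checked against an offline whitelist; the Vosk model directory, Gemini model id, prompt, and command set are assumptions, not the authors' code:

```python
# Offline Vosk STT -> LLM command generation -> offline safety validation.
import json
import vosk                          # offline speech recognition (pip install vosk)
import google.generativeai as genai  # cloud LLM client
# genai.configure(api_key="...")     # required before use

ALLOWED = {"takeoff", "land", "move_forward", "move_back", "turn_left", "turn_right"}

def transcribe(wav_bytes: bytes, model_dir: str = "vosk-model-small-en-us-0.15") -> str:
    rec = vosk.KaldiRecognizer(vosk.Model(model_dir), 16000)
    rec.AcceptWaveform(wav_bytes)
    return json.loads(rec.FinalResult()).get("text", "")

def command_from_speech(wav_bytes: bytes) -> str:
    text = transcribe(wav_bytes)
    llm = genai.GenerativeModel("gemini-1.5-flash")   # placeholder model id
    prompt = (f"Spoken drone instruction (English or Arabic): '{text}'. "
              f"Reply with exactly one command from {sorted(ALLOWED)}.")
    cmd = llm.generate_content(prompt).text.strip()
    # offline safety validation: reject anything outside the whitelist
    return cmd if cmd in ALLOWED else "hover"
```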

25 pages, 5575 KB  
Article
Multi-Agent Multimodal Large Language Model Framework for Automated Interpretation of Fuel Efficiency Analytics in Public Transportation
by Zhipeng Ma, Ali Rida Bahja, Andreas Burgdorf, André Pomp, Tobias Meisen, Bo Nørregaard Jørgensen and Zheng Grace Ma
Appl. Sci. 2025, 15(21), 11619; https://doi.org/10.3390/app152111619 - 30 Oct 2025
Viewed by 186
Abstract
Enhancing fuel efficiency in public transportation requires the integration of complex multimodal data into interpretable, decision-relevant insights. However, traditional analytics and visualization methods often yield fragmented outputs that demand extensive human interpretation, limiting scalability and consistency. This study presents a multi-agent framework that leverages multimodal large language models (LLMs) to automate data narration and energy insight generation. The framework coordinates three specialized agents, including a data narration agent, an LLM-as-a-judge agent, and an optional human-in-the-loop evaluator, to iteratively transform analytical artifacts into coherent, stakeholder-oriented reports. The system is validated through a real-world case study on public bus transportation in Northern Jutland, Denmark, where fuel efficiency data from 4006 trips are analyzed using Gaussian Mixture Model clustering. Comparative experiments across five state-of-the-art LLMs and three prompting paradigms identify GPT-4.1 mini with Chain-of-Thought prompting as the optimal configuration, achieving 97.3% narrative accuracy while balancing interpretability and computational cost. The findings demonstrate that multi-agent orchestration significantly enhances factual precision, coherence, and scalability in LLM-based reporting. The proposed framework establishes a replicable and domain-adaptive methodology for AI-driven narrative generation and decision support in energy informatics. Full article
(This article belongs to the Special Issue Enhancing User Experience in Automation and Control Systems)
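A minimal sketch of the clustering step behind the narration agents, assuming per-trip fuel-efficiency features and three mixture components; the column names and data are placeholders:

```python
# Cluster per-trip fuel efficiency with a Gaussian Mixture Model, then hand a
# compact summary to a narration LLM (which an LLM-as-a-judge agent would review).
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture

trips = pd.DataFrame({
    "km_per_litre": np.random.default_rng(0).normal(2.4, 0.4, 500),   # toy data
    "avg_speed_kmh": np.random.default_rng(1).normal(28, 6, 500),
})

gmm = GaussianMixture(n_components=3, random_state=0).fit(trips)
trips["cluster"] = gmm.predict(trips[["km_per_litre", "avg_speed_kmh"]])

summary = trips.groupby("cluster").agg(["mean", "count"]).round(2)
narration_prompt = (
    "You are a data narration agent for a bus operator. Summarise these fuel-"
    f"efficiency clusters for a non-technical stakeholder:\n{summary.to_string()}"
)
# report = llm(narration_prompt)   # then judged and revised by the second agent
```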

34 pages, 62676 KB  
Article
Multimodal LLM vs. Human-Measured Features for AI Predictions of Autism in Home Videos
by Parnian Azizian, Mohammadmahdi Honarmand, Aditi Jaiswal, Aaron Kline, Kaitlyn Dunlap, Peter Washington and Dennis P. Wall
Algorithms 2025, 18(11), 687; https://doi.org/10.3390/a18110687 - 29 Oct 2025
Viewed by 335
Abstract
Autism diagnosis remains a critical healthcare challenge, with current assessments contributing to an average diagnostic age of 5 years, which extends to 8 in underserved populations. With the FDA approval of CanvasDx in 2021, human-in-the-loop AI diagnostics entered the pediatric market in the form of the first medical device for clinically precise autism diagnosis at scale, while fully automated deep learning approaches have remained underdeveloped. However, the importance of early autism detection, ideally before 3 years of age, underscores the value of developing even more automated AI approaches, due to their potential for scale, reach, and privacy. We present the first systematic evaluation of multimodal LLMs as direct replacements for human annotation in AI-based autism detection. Evaluating seven Gemini model variants (1.5–2.5 series) on 50 YouTube videos shows clear generational progression: version 1.5 models achieve 72–80% accuracy, version 2.0 models reach 80%, and version 2.5 models attain 85–90%, with the best model (2.5 Pro) achieving 89.6% classification accuracy using validated autism detection AI models (LR5)—comparable to the 88% clinical baseline and approaching crowdworker performance of 92–98%. The 24% improvement across two generations suggests the gap is closing. LLMs demonstrate high within-model consistency versus moderate human agreement, with distinct assessment strategies: LLMs focus on language/behavioral markers, crowdworkers prioritize social-emotional engagement, clinicians balance both. While LLMs have yet to match the highest-performing subset of human annotators in their ability to extract behavioral features that are useful for human-in-the-loop AI diagnosis, their rapid improvement and advantages in consistency, scalability, cost, and privacy position them as potentially viable alternatives for aiding diagnostic processes in the future. Full article
(This article belongs to the Special Issue Algorithms for Computer Aided Diagnosis: 2nd Edition)
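An illustrative sketch of the feature-to-classifier pattern the abstract evaluates; the five behavioral features, ratings, and training data below are invented placeholders, not the validated LR5 model:

```python
# Ordinal behavioural features (rated by a clinician, crowdworker, or multimodal
# LLM watching a home video) feed a small logistic-regression classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

FEATURES = ["eye_contact", "responds_to_name", "expressive_language",
            "repetitive_behaviour", "social_smiling"]          # rated 0-3

X_train = np.random.default_rng(0).integers(0, 4, size=(200, len(FEATURES)))
y_train = (X_train[:, 3] > 1).astype(int)                      # toy labels only

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

llm_ratings = {"eye_contact": 1, "responds_to_name": 0, "expressive_language": 1,
               "repetitive_behaviour": 3, "social_smiling": 1}  # from an LLM pass
x = np.array([[llm_ratings[f] for f in FEATURES]])
print("predicted probability of positive class:", clf.predict_proba(x)[0, 1])
```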

21 pages, 3340 KB  
Article
Orthodontic Biomechanical Reasoning with Multimodal Language Models: Performance and Clinical Utility
by Arda Arısan, Celal Genç and Gökhan Serhat Duran
Bioengineering 2025, 12(11), 1165; https://doi.org/10.3390/bioengineering12111165 - 27 Oct 2025
Viewed by 366
Abstract
Background: Multimodal large language models (LLMs) are increasingly being explored as clinical support tools, yet their capacity for orthodontic biomechanical reasoning has not been systematically evaluated. This retrospective study assessed their ability to analyze treatment mechanics and explored their potential role in supporting orthodontic decision-making. Methods: Five publicly available models (GPT-o3, Claude 3.7 Sonnet, Gemini 2.5 Pro, GPT-4.0, and Grok) analyzed 56 standardized intraoral photographs illustrating a diverse range of active orthodontic force systems commonly encountered in clinical practice. Three experienced orthodontists independently scored the outputs across four domains—observation, interpretation, biomechanics, and confidence—using a 5-point scale. Inter-rater agreement and consistency were assessed, and statistical comparisons were made between models. Results: GPT-o3 achieved the highest composite score (3.34/5.00; 66.8%), significantly outperforming all other models. The performance ranking was followed by Claude (57.8%), Gemini (52.6%), GPT-4.0 (48.8%), and Grok (38.8%). Inter-rater reliability among the expert evaluators was excellent, with ICC values ranging from 0.786 (Confidence Evaluation) to 0.802 (Observation). Model self-reported confidence showed poor calibration against expert-rated output quality. Conclusions: Multimodal LLMs show emerging potential for assisting orthodontic biomechanical assessment. With expert-guided validation, these models may contribute meaningfully to clinical decision support across diverse biomechanical scenarios encountered in routine orthodontic care. Full article
(This article belongs to the Special Issue New Tools for Multidisciplinary Treatment in Dentistry)
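A quick sketch of the reported inter-rater reliability check using pingouin's intraclass correlation; the long-format layout, rater labels, and toy ratings are assumptions:

```python
# Three orthodontists rate each photograph on a 5-point scale; ICC measures agreement.
import pandas as pd
import pingouin as pg

scores = pd.DataFrame({
    "photo":  [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "rater":  ["A", "B", "C"] * 3,
    "rating": [4, 4, 5, 2, 3, 3, 5, 4, 5],        # toy values
})
icc = pg.intraclass_corr(data=scores, targets="photo", raters="rater", ratings="rating")
print(icc[["Type", "ICC", "CI95%"]])
```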

15 pages, 2174 KB  
Article
BoxingPro: An IoT-LLM Framework for Automated Boxing Coaching via Wearable Sensor Data Fusion
by Man Zhu, Pengfei Huang, Xiaolong Xu, Houpeng He and Lijie Zhang
Electronics 2025, 14(21), 4155; https://doi.org/10.3390/electronics14214155 - 23 Oct 2025
Viewed by 366
Abstract
The convergence of Internet of Things (IoT) and Artificial Intelligence (AI) has enabled personalized sports coaching, yet a significant gap remains: translating low-level sensor data into high-level, contextualized feedback. Large Language Models (LLMs) excel at reasoning and instruction but lack a native understanding of physical kinematics. This paper introduces BoxingPro, a novel framework that bridges this semantic gap by fusing wearable sensor data with LLMs for automated boxing coaching. Our core contribution is a dedicated translation methodology that converts multi-modal time-series data (IMU) and visual data (video) into structured linguistic prompts, enabling off-the-shelf LLMs to perform sophisticated biomechanical reasoning without extensive retraining. Our evaluation with professional boxers showed that the generated feedback achieved an average expert rating of over 4.0/5.0 on key criteria like biomechanical correctness and actionability. This work establishes a new paradigm for integrating sensor-based systems with LLMs, with potential applications extending far beyond boxing to any domain requiring physical skill assessment. Full article
(This article belongs to the Special Issue Techniques and Applications in Prompt Engineering and Generative AI)
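A sketch of the sensor-to-prompt translation idea, deriving a few kinematic features from a single punch's IMU trace and phrasing them for an off-the-shelf LLM; the sampling rate, activity threshold, and prompt wording are assumptions:

```python
# Convert an IMU accelerometer trace into structured language for coaching feedback.
import numpy as np

def punch_features(accel: np.ndarray, fs: float = 100.0) -> dict:
    """accel: (n_samples, 3) accelerometer trace in m/s^2 for one punch."""
    magnitude = np.linalg.norm(accel, axis=1)
    active = magnitude > 15.0                      # crude punch-activity threshold
    return {
        "peak_acceleration_ms2": float(magnitude.max()),
        "punch_duration_ms": float(active.sum() / fs * 1000.0),
        "smoothness": float(np.std(np.diff(magnitude))),   # lower = smoother
    }

def coaching_prompt(features: dict, punch_type: str = "jab") -> str:
    return (f"You are a boxing coach. For a {punch_type} with "
            f"peak acceleration {features['peak_acceleration_ms2']:.1f} m/s^2, "
            f"duration {features['punch_duration_ms']:.0f} ms and jerk score "
            f"{features['smoothness']:.2f}, give one biomechanical correction "
            "and one drill, in plain language.")

trace = np.random.default_rng(0).normal(0, 8, size=(120, 3))   # toy IMU trace
print(coaching_prompt(punch_features(trace)))
```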

29 pages, 549 KB  
Article
Catch Me If You Can: Rogue AI Detection and Correction at Scale
by Fatemeh Stodt, Jan Stodt, Mohammed Alshawki, Javad Salimi Sratakhti and Christoph Reich
Electronics 2025, 14(20), 4122; https://doi.org/10.3390/electronics14204122 - 21 Oct 2025
Viewed by 427
Abstract
Modern AI systems can strategically misreport information when incentives diverge from truthfulness, posing risks for oversight and deployment. Prior studies often examine this behavior within a single paradigm; systematic, cross-architecture evidence under a unified protocol has been limited. We introduce the Strategy Elicitation Battery (SEB), a standardized probe suite for measuring deceptive reporting across large language models (LLMs), reinforcement-learning agents, vision-only classifiers, multimodal encoders, state-space models, and diffusion models. SEB uses Bayesian inference tasks with persona-controlled instructions, schema-constrained outputs, deterministic decoding where supported, and a probe mix (near-threshold, repeats, neutralized, cross-checks). Estimates use clustered bootstrap intervals, and significance is assessed with a logistic regression by architecture; a mixed-effects analysis is planned once the per-round agent/episode traces are exported. On the latest pre-correction runs, SEB shows a consistent cross-architecture pattern in deception rates: ViT 80.0%, CLIP 15.0%, Mamba 10.0%, RL agents 10.0%, Stable Diffusion 10.0%, and LLMs 5.0% (20 scenarios/architecture). A logistic regression on per-scenario flags finds a significant overall architecture effect (likelihood-ratio test vs. intercept-only: χ²(5) = 41.56, p = 7.22 × 10⁻⁸). Holm-adjusted contrasts indicate ViT is significantly higher than all other architectures in this snapshot; the remaining pairs are not significant. Post-correction acceptance decisions are evaluated separately using residual deception and override rates under SEB-Correct. Latency varies by architecture (sub-second to minutes), enabling pre-deployment screening broadly and real-time auditing for low-latency classes. Results indicate that SEB-Detect deception flags are not confined to any one paradigm, that distinct architectures can converge to similar levels under a common interface, and that reporting interfaces and incentive framing are central levers for mitigation. We operationalize “deception” as reward-sensitive misreport flags, and we separate detection from intervention via a correction wrapper (SEB-Correct), supporting principled acceptance decisions for deployment. Full article
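A sketch of the reported architecture-effect test, a logistic regression of per-scenario deception flags on architecture with statsmodels' built-in likelihood-ratio test against the intercept-only model; the data frame below is reconstructed from the quoted rates (20 scenarios per architecture), not the paper's raw flags:

```python
# Logistic regression of deception flags on architecture, with the LR test vs. null.
import pandas as pd
import statsmodels.formula.api as smf

counts = {"ViT": 16, "CLIP": 3, "Mamba": 2, "RL": 2, "StableDiffusion": 2, "LLM": 1}
rows = [{"architecture": arch, "deceived": int(i < k)}
        for arch, k in counts.items() for i in range(20)]
df = pd.DataFrame(rows)

fit = smf.logit("deceived ~ C(architecture)", data=df).fit(disp=False)
print(f"LR test vs intercept-only: chi2({fit.df_model:.0f}) = {fit.llr:.2f}, "
      f"p = {fit.llr_pvalue:.2e}")
```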

17 pages, 1203 KB  
Article
Exploration of Stability Judgments: Assessing Multimodal LLMs in Game-Inspired Physical Reasoning Tasks
by Mury Fajar Dewantoro, Febri Abdullah, Yi Xia, Ibrahim Khan, Ruck Thawonmas, Wenwen Ouyang and Fitra Abdurrachman Bachtiar
Appl. Sci. 2025, 15(20), 11253; https://doi.org/10.3390/app152011253 - 21 Oct 2025
Viewed by 285
Abstract
This study extends our previous investigation into whether multimodal large language models (MLLMs) can perform physical reasoning, using a game environment as the testbed. Stability served as a foundational scenario to probe model understanding of physical reasoning. We evaluated twelve models, combining those from the earlier study with six additional open-weight models, across three tasks designed to capture different aspects of reasoning. Human participants were included as a reference point, consistently achieving the highest accuracy, underscoring the gap between model and human performance. Among MLLMs, the GPT series continued to perform strongly, with GPT-4o showing reliable results in image-based tasks, while the Qwen2.5-VL series reached the highest overall scores in this extended study and in some cases surpassed commercial counterparts. Simpler binary tasks yielded balanced performance across modalities, suggesting that models can capture certain basic aspects of reasoning, whereas more complex multiple-choice tasks led to sharp declines in accuracy. Structured inputs such as XML improved results in the prediction task, where Qwen2.5-VL outperformed GPT variants in our earlier work. These findings demonstrate progress in scaling and modality design for physical reasoning, while reaffirming that human participants remain superior across all tasks. Full article

15 pages, 517 KB  
Systematic Review
Generative AI Chatbots Across Domains: A Systematic Review
by Lama Aldhafeeri, Fay Aljumah, Fajr Thabyan, Maram Alabbad, Sultanh AlShahrani, Fawzia Alanazi and Abeer Al-Nafjan
Appl. Sci. 2025, 15(20), 11220; https://doi.org/10.3390/app152011220 - 20 Oct 2025
Viewed by 744
Abstract
The rapid advancement of large language models (LLMs) has significantly transformed the development and deployment of generative AI chatbots across various domains. This systematic literature review (SLR) analyzes 39 primary studies published between 2020 and 2025 to explore how these models are utilized, the sectors in which they are deployed, and the broader trends shaping their use. The findings reveal that models such as GPT-3.5, GPT-4, and LLaMA variants have been widely adopted, with applications spanning education, healthcare, business services, and beyond. As adoption increases, research continues to emphasize the need for more adaptable, context-aware, and responsible chatbot systems. The insights from this review aim to guide the effective integration of LLM-based chatbots, highlighting best practices such as domain-specific fine-tuning, retrieval-augmented generation (RAG), and multi-modal interaction design. This review maps the current landscape of LLM-based chatbot development, explores the sectors and primary use cases in each domain, analyzes the types of generative AI models used in chatbot applications, and synthesizes the reported limitations and future directions to guide effective strategies for their design and deployment across domains. Full article

23 pages, 1945 KB  
Article
A Symmetry-Informed Multimodal LLM-Driven Approach to Robotic Object Manipulation: Lowering Entry Barriers in Mechatronics Education
by Jorge Gudiño-Lau, Miguel Durán-Fonseca, Luis E. Anido-Rifón and Pedro C. Santana-Mancilla
Symmetry 2025, 17(10), 1756; https://doi.org/10.3390/sym17101756 - 17 Oct 2025
Viewed by 444
Abstract
The integration of Large Language Models (LLMs), particularly Visual Language Models (VLMs), into robotics promises more intuitive human–robot interactions; however, challenges remain in efficiently translating high-level commands into precise physical actions. This paper presents a novel architecture for vision-based object manipulation that leverages a VLM’s reasoning capabilities while incorporating symmetry principles to enhance operational efficiency. Implemented on a Yahboom DOFBOT educational robot with a Jetson Nano platform, our system introduces a prompt-based framework that uniquely embeds symmetry-related cues to streamline feature extraction and object detection from visual data. This methodology, which utilizes few-shot learning, enables the VLM to generate more accurate and contextually relevant commands for manipulation tasks by efficiently interpreting the symmetric and asymmetric features of objects. The experimental results in controlled scenarios demonstrate that our symmetry-informed approach significantly improves the robot’s interaction efficiency and decision-making accuracy compared to generic prompting strategies. This work contributes a robust method for integrating fundamental vision principles into modern generative AI workflows for robotics. Furthermore, its implementation on an accessible educational platform shows its potential to simplify complex robotics concepts for engineering education and research. Full article
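An illustrative few-shot prompt in the spirit of the symmetry-informed framework, pairing symmetry descriptions with grasp decisions; the exemplars, command format, and vlm call are assumptions rather than the paper's prompt:

```python
# Few-shot prompt that embeds symmetry cues before asking the VLM for a grasp command.
FEW_SHOT = """You control a small educational robot arm with a parallel gripper.
Example 1: object = cylindrical can, symmetry = rotational about vertical axis.
  -> grasp: any side approach, gripper perpendicular to the axis.
Example 2: object = mug with handle, symmetry = single mirror plane through handle.
  -> grasp: approach along the mirror plane, avoid the handle side.
"""

def manipulation_prompt(object_hint: str) -> str:
    return (FEW_SHOT +
            f"New image attached. Object hint: {object_hint}. "
            "State the object's symmetry, then one grasp command as "
            "'grasp(approach_angle_deg, gripper_width_mm)'.")

# response = vlm.generate(images=[frame], text=manipulation_prompt("small box"))
print(manipulation_prompt("small box"))
```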

51 pages, 4751 KB  
Review
Large Language Models and 3D Vision for Intelligent Robotic Perception and Autonomy
by Vinit Mehta, Charu Sharma and Karthick Thiyagarajan
Sensors 2025, 25(20), 6394; https://doi.org/10.3390/s25206394 - 16 Oct 2025
Viewed by 834
Abstract
With the rapid advancement of artificial intelligence and robotics, the integration of Large Language Models (LLMs) with 3D vision is emerging as a transformative approach to enhancing robotic sensing technologies. This convergence enables machines to perceive, reason, and interact with complex environments through natural language and spatial understanding, bridging the gap between linguistic intelligence and spatial perception. This review provides a comprehensive analysis of state-of-the-art methodologies, applications, and challenges at the intersection of LLMs and 3D vision, with a focus on next-generation robotic sensing technologies. We first introduce the foundational principles of LLMs and 3D data representations, followed by an in-depth examination of 3D sensing technologies critical for robotics. The review then explores key advancements in scene understanding, text-to-3D generation, object grounding, and embodied agents, highlighting cutting-edge techniques such as zero-shot 3D segmentation, dynamic scene synthesis, and language-guided manipulation. Furthermore, we discuss multimodal LLMs that integrate 3D data with touch, auditory, and thermal inputs, enhancing environmental comprehension and robotic decision-making. To support future research, we catalog benchmark datasets and evaluation metrics tailored for 3D-language and vision tasks. Finally, we identify key challenges and future research directions, including adaptive model architectures, enhanced cross-modal alignment, and real-time processing capabilities, which pave the way for more intelligent, context-aware, and autonomous robotic sensing systems. Full article
(This article belongs to the Special Issue Advanced Sensors and AI Integration for Human–Robot Teaming)

33 pages, 2007 KB  
Review
Review of Artificial Intelligence Techniques for Breast Cancer Detection with Different Modalities: Mammography, Ultrasound, and Thermography Images
by Aigerim Mashekova, Michael Yong Zhao, Vasilios Zarikas, Olzhas Mukhmetov, Nurduman Aidossov, Eddie Yin Kwee Ng, Dongming Wei and Madina Shapatova
Bioengineering 2025, 12(10), 1110; https://doi.org/10.3390/bioengineering12101110 - 15 Oct 2025
Viewed by 868
Abstract
Breast cancer remains one of the most prevalent cancers worldwide, necessitating reliable, efficient, and precise diagnostic methods. Meanwhile, the rapid development of artificial intelligence (AI) presents significant opportunities for integration into various fields, including healthcare, by enabling the processing of medical data and the early detection of cancer. This review examines the major medical imaging techniques used for breast cancer detection, specifically mammography, ultrasound, and thermography, and identifies widely used publicly available datasets in this domain. It also surveys traditional machine learning and deep learning approaches commonly applied to the analysis of mammographic, ultrasound, and thermographic images, discussing key studies in the field and evaluating the potential of different AI techniques for breast cancer detection. Furthermore, the review highlights the development and integration of explainable artificial intelligence (XAI) to enhance transparency and trust in medical imaging-based diagnoses. Finally, it considers potential future directions, including the application of large language models (LLMs) and multimodal LLMs in breast cancer diagnosis, emphasizing recent research aimed at advancing the precision, accessibility, and reliability of diagnostic systems. Full article

24 pages, 2328 KB  
Review
Large Language Model Agents for Biomedicine: A Comprehensive Review of Methods, Evaluations, Challenges, and Future Directions
by Xiaoran Xu and Ravi Sankar
Information 2025, 16(10), 894; https://doi.org/10.3390/info16100894 - 14 Oct 2025
Viewed by 1407
Abstract
Large language model (LLM)-based agents are rapidly emerging as transformative tools across biomedical research and clinical applications. By integrating reasoning, planning, memory, and tool use capabilities, these agents go beyond static language models to operate autonomously or collaboratively within complex healthcare settings. This review provides a comprehensive survey of biomedical LLM agents, spanning their core system architectures, enabling methodologies, and real-world use cases such as clinical decision making, biomedical research automation, and patient simulation. We further examine emerging benchmarks designed to evaluate agent performance under dynamic, interactive, and multimodal conditions. In addition, we systematically analyze key challenges, including hallucinations, interpretability, tool reliability, data bias, and regulatory gaps, and discuss corresponding mitigation strategies. Finally, we outline future directions in areas such as continual learning, federated adaptation, robust multi-agent coordination, and human AI collaboration. This review aims to establish a foundational understanding of biomedical LLM agents and provide a forward-looking roadmap for building trustworthy, reliable, and clinically deployable intelligent systems. Full article

23 pages, 37453 KB  
Article
LLM-Driven Adaptive Prompt Optimization Framework for ADS-B Anomaly Detection
by Siqi Li, Buhong Wang, Zhengyang Zhao, Yong Yang and Yongjian Guan
Aerospace 2025, 12(10), 906; https://doi.org/10.3390/aerospace12100906 - 9 Oct 2025
Viewed by 1021
Abstract
The Automatic Dependent Surveillance-Broadcast (ADS-B) is a key component of the new-generation air traffic surveillance system. However, it is vulnerable to security threats due to its plaintext transmission and lack of authentication mechanisms. Existing ADS-B anomaly detection methods still suffer from significant limitations, including low anomaly detection rates and limited adaptability. To address these issues, this paper proposes a novel ADS-B anomaly detection framework driven by large language models (LLMs). The approach utilizes pre-trained LLMs and a self-iterative prompt optimization loop, which integrates historical trajectories and multimodal features to refine expert-initialized prompts. The optimized prompts guide the LLM in identifying ADS-B anomalies. The advantage of the proposed framework lies in overcoming the limited adaptability of traditional models. Experimental results show that the proposed method achieves excellent performance on key metrics: an anomaly detection rate of 98.55%, a false alarm rate controlled at 3.61%, a miss detection rate reduced to 1.45%, and a recall of 96.39%. Compared to traditional detection methods, this method improves detection accuracy by an average of more than 12%. Furthermore, experiments on multi-type anomaly detection tasks validate that the framework exhibits strong adaptability and good generalization, providing effective technical support for the development of aviation data security protection systems. Full article
(This article belongs to the Section Air Traffic and Transportation)
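A hedged sketch of a self-iterative prompt-optimization loop like the one described: score the current prompt on labelled trajectories, then ask the LLM to revise it using the misclassified cases; llm() stands in for any chat-completion call and the verdict parsing is simplified:

```python
# Iteratively refine an expert-initialized anomaly-detection prompt from its errors.
from typing import Callable, List, Tuple

def optimise_prompt(prompt: str,
                    trajectories: List[Tuple[str, bool]],   # (track summary, is_anomalous)
                    llm: Callable[[str], str],
                    rounds: int = 5) -> str:
    for _ in range(rounds):
        errors = []
        for summary, truth in trajectories:
            verdict = "anomalous" in llm(f"{prompt}\nTrack: {summary}\nVerdict:").lower()
            if verdict != truth:
                errors.append((summary, truth))
        if not errors:
            break                                   # prompt already separates the set
        feedback = "\n".join(f"- {s} (true label: {'anomalous' if t else 'normal'})"
                             for s, t in errors[:10])
        prompt = llm("Improve this ADS-B anomaly detection prompt so it handles the "
                     f"cases below.\nPrompt:\n{prompt}\nMisclassified:\n{feedback}\n"
                     "Return only the revised prompt.")
    return prompt
```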
