Search Results (474)

Search Parameters:
Keywords = multi-modal AI

31 pages, 897 KB  
Review
A Survey of Large Language Models: Evolution, Architectures, Adaptation, Benchmarking, Applications, Challenges, and Societal Implications
by Seyed Mahmoud Sajjadi Mohammadabadi, Burak Cem Kara, Can Eyupoglu, Can Uzay, Mehmet Serkan Tosun and Oktay Karakuş
Electronics 2025, 14(18), 3580; https://doi.org/10.3390/electronics14183580 - 9 Sep 2025
Abstract
This survey provides an in-depth review of large language models (LLMs), highlighting the significant paradigm shift they represent in artificial intelligence. Our purpose is to consolidate state-of-the-art advances in LLM design, training, adaptation, evaluation, and application for both researchers and practitioners. To accomplish this, we trace the evolution of language models and describe core approaches, including parameter-efficient fine-tuning (PEFT). The methodology involves a thorough survey of real-world LLM applications across the scientific, engineering, healthcare, and creative sectors, coupled with a review of current benchmarks. Our findings indicate that high training and inference costs are shaping market structures, raising economic and labor concerns, while also underscoring a persistent need for human oversight in assessment. Key trends include the development of unified multimodal architectures capable of processing varied data inputs and the emergence of agentic systems that exhibit complex behaviors such as tool use and planning. We identify critical open problems, such as detectability, data contamination, generalization, and benchmark diversity. Ultimately, we conclude that overcoming these complex technical, economic, and social challenges necessitates collaborative advancements in adaptation, evaluation, infrastructure, and governance. Full article
(This article belongs to the Section Artificial Intelligence)
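Parameter-efficient fine-tuning (PEFT), one of the core adaptation approaches the survey covers, can be illustrated with a minimal LoRA-style adapter: the pretrained weights stay frozen and only two small low-rank matrices are trained. The sketch below is a generic illustration in PyTorch, not code from the paper; the layer size and rank are arbitrary.

```python
# Minimal LoRA-style adapter sketch (illustrative only; not from the surveyed paper).
# Layer sizes and the rank r are hypothetical.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update W + (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")  # only the low-rank factors A and B
```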
25 pages, 4660 KB  
Article
Dual-Stream Former: A Dual-Branch Transformer Architecture for Visual Speech Recognition
by Sanghun Jeon, Jieun Lee and Yong-Ju Lee
AI 2025, 6(9), 222; https://doi.org/10.3390/ai6090222 - 9 Sep 2025
Abstract
This study proposes Dual-Stream Former, a novel architecture that integrates a Video Swin Transformer and Conformer designed to address the challenges of visual speech recognition (VSR). The model captures spatiotemporal dependencies, achieving a state-of-the-art character error rate (CER) of 3.46%, surpassing traditional convolutional neural network (CNN)-based models, such as 3D-CNN + DenseNet-121 (CER: 5.31%), and transformer-based alternatives, such as vision transformers (CER: 4.05%). The Video Swin Transformer captures multiscale spatial representations with high computational efficiency, whereas the Conformer back-end enhances temporal modeling across diverse phoneme categories. Evaluation of a high-resolution dataset comprising 740,000 utterances across 185 classes highlighted the effectiveness of the model in addressing visually confusing phonemes, such as diphthongs (/ai/, /au/) and labio-dental sounds (/f/, /v/). Dual-Stream Former achieved phoneme recognition error rates of 10.39% for diphthongs and 9.25% for labiodental sounds, surpassing those of CNN-based architectures by more than 6%. Although the model’s large parameter count (168.6 M) poses resource challenges, its hierarchical design ensures scalability. Future work will explore lightweight adaptations and multimodal extensions to increase deployment feasibility. These findings underscore the transformative potential of Dual-Stream Former for advancing VSR applications such as silent communication and assistive technologies by achieving unparalleled precision and robustness in diverse settings. Full article
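The character error rate (CER) figures quoted above are normalized edit distances between the predicted and reference character sequences. A minimal reference implementation, purely for illustration and not the paper's evaluation code, is:

```python
# Character error rate (CER): Levenshtein distance between hypothesis and reference,
# divided by the reference length. Illustrative sketch only.
def cer(reference: str, hypothesis: str) -> float:
    ref, hyp = list(reference), list(hypothesis)
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(f"{cer('visual speech', 'visuol speach'):.4f}")  # 2 edits / 13 chars ~ 0.1538
```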

38 pages, 15014 KB  
Article
Web-Based Multimodal Deep Learning Platform with XRAI Explainability for Real-Time Skin Lesion Classification and Clinical Decision Support
by Serra Aksoy, Pinar Demircioglu and Ismail Bogrekci
Cosmetics 2025, 12(5), 194; https://doi.org/10.3390/cosmetics12050194 - 8 Sep 2025
Abstract
Background: Skin cancer represents one of the most prevalent malignancies worldwide, with melanoma accounting for approximately 75% of skin cancer-related deaths despite comprising fewer than 5% of cases. Early detection dramatically improves survival rates from 14% to over 99%, highlighting the urgent need for accurate and accessible diagnostic tools. While deep learning has shown promise in dermatological diagnosis, existing approaches lack clinical explainability and deployable interfaces that bridge the gap between research innovation and practical healthcare applications. Methods: This study implemented a comprehensive multimodal deep learning framework using the HAM10000 dataset (10,015 dermatoscopic images across seven diagnostic categories). Three CNN architectures (DenseNet-121, EfficientNet-B3, ResNet-50) were systematically compared, integrating patient metadata, including age, sex, and anatomical location, with dermatoscopic image analysis. The first implementation of XRAI (eXplanation with Region-based Attribution for Images) explainability for skin lesion classification was developed, providing spatially coherent explanations aligned with clinical reasoning patterns. A deployable web-based clinical interface was created, featuring real-time inference, comprehensive safety protocols, risk stratification, and evidence-based cosmetic recommendations for benign conditions. Results: EfficientNet-B3 achieved superior performance with 89.09% test accuracy and 90.08% validation accuracy, significantly outperforming DenseNet-121 (82.83%) and ResNet-50 (78.78%). Test-time augmentation improved performance by 1.00 percentage point to 90.09%. The model demonstrated excellent performance for critical malignant conditions: melanoma (81.6% confidence), basal cell carcinoma (82.1% confidence), and actinic keratoses (88% confidence). XRAI analysis revealed clinically meaningful attention patterns focusing on irregular pigmentation for melanoma, ulcerated borders for basal cell carcinoma, and surface irregularities for precancerous lesions. Error analysis showed that misclassifications occurred primarily in visually ambiguous cases with high correlation (0.855–0.968) between model attention and ideal features. The web application successfully validated real-time diagnostic capabilities with appropriate emergency protocols for malignant conditions and comprehensive cosmetic guidance for benign lesions. Conclusions: This research successfully developed the first clinically deployable skin lesion classification system combining diagnostic accuracy with explainable AI and practical patient guidance. The integration of XRAI explainability provides essential transparency for clinical acceptance, while the web-based deployment democratizes access to advanced dermatological AI capabilities. Comprehensive validation establishes readiness for controlled clinical trials and potential integration into healthcare workflows, particularly benefiting underserved regions with limited specialist availability. This work bridges the critical gap between research-grade AI models and practical clinical utility, establishing a foundation for responsible AI integration in dermatological practice. Full article
(This article belongs to the Special Issue Feature Papers in Cosmetics in 2025)
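Test-time augmentation of the kind credited with the 1.00-percentage-point gain typically averages predictions over simple image transforms. The sketch below shows the general pattern with a toy stand-in model and seven output classes (matching the HAM10000 categories); the transforms and model are illustrative assumptions, not the paper's pipeline.

```python
# Test-time augmentation (TTA) sketch: average softmax outputs over simple flips.
# Illustrative only; the model and transforms here are hypothetical.
import torch
import torch.nn as nn

def tta_predict(model: nn.Module, image: torch.Tensor) -> torch.Tensor:
    """image: (1, 3, H, W) tensor; returns class probabilities averaged over views."""
    views = [image,
             torch.flip(image, dims=[-1]),   # horizontal flip
             torch.flip(image, dims=[-2])]   # vertical flip
    with torch.no_grad():
        probs = [torch.softmax(model(v), dim=1) for v in views]
    return torch.stack(probs).mean(dim=0)

# Toy stand-in classifier with 7 outputs (the HAM10000 diagnostic categories).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 7))
probs = tta_predict(model.eval(), torch.rand(1, 3, 224, 224))
print(probs.argmax(dim=1).item(), probs.sum().item())  # predicted class; probabilities sum to 1
```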

28 pages, 21851 KB  
Article
A Critical Assessment of Modern Generative Models’ Ability to Replicate Artistic Styles
by Andrea Asperti, Franky George, Tiberio Marras, Razvan Ciprian Stricescu and Fabio Zanotti
Big Data Cogn. Comput. 2025, 9(9), 231; https://doi.org/10.3390/bdcc9090231 - 6 Sep 2025
Abstract
In recent years, advancements in generative artificial intelligence have led to the development of sophisticated tools capable of mimicking diverse artistic styles, opening new possibilities for digital creativity and artistic expression. This paper presents a critical assessment of the style replication capabilities of contemporary generative models, evaluating their strengths and limitations across multiple dimensions. We examine how effectively these models reproduce traditional artistic styles while maintaining structural integrity and compositional balance in the generated images. The analysis is based on a new large dataset of AI-generated works imitating artistic styles of the past, holding potential for a wide range of applications: the “AI-Pastiche” dataset. This study is supported by extensive user surveys, collecting diverse opinions on the dataset and investigating both technical and aesthetic challenges, including the ability to generate outputs that are realistic and visually convincing, the versatility of models in handling a wide range of artistic styles, and the extent to which they adhere to the content and stylistic specifications outlined in prompts, preserving cohesion and integrity in generated images. This paper aims to provide a comprehensive overview of the current state of generative tools in style replication, offering insights into their technical and artistic limitations, potential advancements in model design and training methodologies, and emerging opportunities for enhancing digital artistry, human–AI collaboration, and the broader creative landscape. Full article

22 pages, 4937 KB  
Article
Multimodal AI for UAV: Vision–Language Models in Human–Machine Collaboration
by Maroš Krupáš, Ľubomír Urblík and Iveta Zolotová
Electronics 2025, 14(17), 3548; https://doi.org/10.3390/electronics14173548 - 6 Sep 2025
Abstract
Recent advances in multimodal large language models (MLLMs)—particularly vision–language models (VLMs)—introduce new possibilities for integrating visual perception with natural-language understanding in human–machine collaboration (HMC). Unmanned aerial vehicles (UAVs) are increasingly deployed in dynamic environments, where adaptive autonomy and intuitive interaction are essential. Traditional UAV autonomy has relied mainly on visual perception or preprogrammed planning, offering limited adaptability and explainability. This study introduces a novel reference architecture, the multimodal AI–HMC system, based on which a dedicated UAV use case architecture was instantiated and experimentally validated in a controlled laboratory environment. The architecture integrates VLM-powered reasoning, real-time depth estimation, and natural-language interfaces, enabling UAVs to perform context-aware actions while providing transparent explanations. Unlike prior approaches, the system generates navigation commands while also communicating the underlying rationale and associated confidence levels, thereby enhancing situational awareness and fostering user trust. The architecture was implemented in a real-time UAV navigation platform and evaluated through laboratory trials. Quantitative results showed a 70% task success rate in single-obstacle navigation and 50% in a cluttered scenario, with safe obstacle avoidance at flight speeds of up to 0.6 m/s. Users approved 90% of the generated instructions and rated explanations as significantly clearer and more informative when confidence visualization was included. These findings demonstrate the novelty and feasibility of embedding VLMs into UAV systems, advancing explainable, human-centric autonomy and establishing a foundation for future multimodal AI applications in HMC, including robotics. Full article
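The command-plus-rationale pattern described above can be approximated by asking the VLM for a structured reply and gating low-confidence commands back to the human operator. The field names and the 0.7 threshold in this sketch are assumptions for illustration, not values from the paper.

```python
# Sketch of structured VLM output for UAV navigation: command, rationale, confidence.
# Field names and the confidence threshold are illustrative assumptions.
import json
from dataclasses import dataclass

@dataclass
class NavDecision:
    command: str      # e.g. "yaw_left_15deg", "hold_position"
    rationale: str    # natural-language explanation shown to the operator
    confidence: float # model-reported confidence in [0, 1]

def parse_vlm_reply(reply_text: str) -> NavDecision:
    data = json.loads(reply_text)
    return NavDecision(data["command"], data["rationale"], float(data["confidence"]))

def gate(decision: NavDecision, threshold: float = 0.7) -> str:
    if decision.confidence >= threshold:
        return f"EXECUTE {decision.command} ({decision.rationale})"
    return f"ASK OPERATOR: {decision.rationale} (confidence {decision.confidence:.2f})"

reply = '{"command": "hold_position", "rationale": "Obstacle 1.2 m ahead", "confidence": 0.55}'
print(gate(parse_vlm_reply(reply)))
```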

15 pages, 604 KB  
Review
Advancing Precision Neurology and Wearable Electrophysiology: A Review on the Pivotal Role of Medical Physicists in Signal Processing, AI, and Prognostic Modeling
by Constantinos Koutsojannis, Athanasios Fouras and Dionysia Chrysanthakopoulou
Biophysica 2025, 5(3), 40; https://doi.org/10.3390/biophysica5030040 - 5 Sep 2025
Abstract
Medical physicists are transforming physiological measurements and electrophysiological applications by addressing challenges like motion artifacts and regulatory compliance through advanced signal processing, artificial intelligence (AI), and statistical rigor. Their innovations in wearable electrophysiology achieve 8–12 dB signal-to-noise ratio (SNR) improvements in EEG, 60% motion artifact reduction, and 94.2% accurate AI-driven arrhythmia detection at 12 μW power. In precision neurology, machine learning (ML) with evoked potentials (EPs) predicts spinal cord injury (SCI) recovery and multiple sclerosis (MS) progression with 79.2% accuracy based on retrospective data from 560 SCI/MS patients. By integrating multimodal data (EPs, MRI), developing quantum sensors, and employing federated learning, these approaches can enhance diagnostic precision and prognostic accuracy. Clinical applications span epilepsy, stroke, cardiac monitoring, and chronic pain management, reducing diagnostic errors by 28% and optimizing treatments like deep brain stimulation (DBS). In this paper, we review the current state of wearable devices and provide some insight into possible future directions. Embedding medical physicists into standardization efforts is critical to overcoming barriers like quantum sensor power consumption, advancing personalized, evidence-based healthcare. Full article

77 pages, 2936 KB  
Review
Enhancing Smart Grid Security and Efficiency: AI, Energy Routing, and T&D Innovations (A Review)
by Hassam Ishfaq, Sania Kanwal, Sadeed Anwar, Mubarak Abdussalam and Waqas Amin
Energies 2025, 18(17), 4747; https://doi.org/10.3390/en18174747 - 5 Sep 2025
Abstract
This paper presents an in-depth review of cybersecurity challenges and advanced solutions in modern power-generation systems, with particular emphasis on smart grids. It examines vulnerabilities in devices such as smart meters (SMs), Phasor Measurement Units (PMUs), and Remote Terminal Units (RTUs) to cyberattacks, including False Data Injection Attacks (FDIAs), Denial of Service (DoS), and Replay Attacks (RAs). The study evaluates cutting-edge detection and mitigation techniques, such as Cluster Partition, Fuzzy Broad Learning System (CP-BLS), multimodal deep learning, and autoencoder models, achieving detection accuracies of up to 99.99% for FDIA identification. It explores critical aspects of power generation, including resource assessment, environmental and climatic factors, policy and regulatory frameworks, grid and storage integration, and geopolitical and social dimensions. The paper also addresses the transmission and distribution (T&D) system, emphasizing the role of smart-grid technologies and advanced energy-routing strategies that leverage Artificial Neural Networks (ANNs), Generative Adversarial Networks (GANs), and game-theoretic approaches to optimize energy flows and enhance grid stability. Future research directions include high-resolution forecasting, adaptive optimization, and the integration of quantum–AI methods to improve scalability, reliability, and resilience. Full article
(This article belongs to the Special Issue Smart Grid and Energy Storage)

23 pages, 1928 KB  
Systematic Review
Eye Tracking-Enhanced Deep Learning for Medical Image Analysis: A Systematic Review on Data Efficiency, Interpretability, and Multimodal Integration
by Jiangxia Duan, Meiwei Zhang, Minghui Song, Xiaopan Xu and Hongbing Lu
Bioengineering 2025, 12(9), 954; https://doi.org/10.3390/bioengineering12090954 - 5 Sep 2025
Abstract
Deep learning (DL) has revolutionized medical image analysis (MIA), enabling early anomaly detection, precise lesion segmentation, and automated disease classification. However, its clinical integration faces two major challenges: reliance on limited, narrowly annotated datasets that inadequately capture real-world patient diversity, and the inherent “black-box” nature of DL decision-making, which complicates physician scrutiny and accountability. Eye tracking (ET) technology offers a transformative solution by capturing radiologists’ gaze patterns to generate supervisory signals. These signals enhance DL models through two key mechanisms: providing weak supervision to improve feature recognition and diagnostic accuracy, particularly when labeled data are scarce, and enabling direct comparison between machine and human attention to bridge interpretability gaps and build clinician trust. This approach also extends effectively to multimodal learning models (MLMs) and vision–language models (VLMs), supporting the alignment of machine reasoning with clinical expertise by grounding visual observations in diagnostic context, refining attention mechanisms, and validating complex decision pathways. Conducted in accordance with the PRISMA statement and registered in PROSPERO (ID: CRD42024569630), this review synthesizes state-of-the-art strategies for ET-DL integration. We further propose a unified framework in which ET innovatively serves as a data efficiency optimizer, a model interpretability validator, and a multimodal alignment supervisor. This framework paves the way for clinician-centered AI systems that prioritize verifiable reasoning, seamless workflow integration, and intelligible performance, thereby addressing key implementation barriers and outlining a path for future clinical deployment. Full article
(This article belongs to the Section Biosignal Processing)
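One common way gaze can serve as the weak supervisory signal described above is an auxiliary loss that penalizes divergence between the model's attention map and a radiologist's gaze heatmap. The sketch below is a generic illustration of that idea; the KL-divergence form and normalization are assumptions, not the reviewed papers' methods.

```python
# Sketch of gaze maps as weak supervision: align model attention with a gaze heatmap.
# Purely illustrative; the loss form is an assumption.
import torch
import torch.nn.functional as F

def gaze_alignment_loss(model_attn: torch.Tensor, gaze_map: torch.Tensor) -> torch.Tensor:
    """Both inputs: (B, H, W) non-negative maps; returns KL(gaze || attention)."""
    attn = torch.log_softmax(model_attn.flatten(1), dim=1)               # log-normalized attention
    gaze = gaze_map.flatten(1)
    gaze = gaze / gaze.sum(dim=1, keepdim=True).clamp_min(1e-8)          # normalized gaze density
    return F.kl_div(attn, gaze, reduction="batchmean")

loss = gaze_alignment_loss(torch.rand(2, 16, 16), torch.rand(2, 16, 16))
print(loss.item())
```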

19 pages, 329 KB  
Review
Artificial Intelligence-Driven Personalization in Breast Cancer Screening: From Population Models to Individualized Protocols
by Filippo Pesapane, Luca Nicosia, Lucrezia D’Amelio, Giulia Quercioli, Mariassunta Roberta Pannarale, Francesca Priolo, Irene Marinucci, Maria Giorgia Farina, Silvia Penco, Valeria Dominelli, Anna Rotili, Lorenza Meneghetti, Anna Carla Bozzini, Sonia Santicchia and Enrico Cassano
Cancers 2025, 17(17), 2901; https://doi.org/10.3390/cancers17172901 - 4 Sep 2025
Abstract
Conventional breast cancer screening programs are predominantly age-based, applying uniform intervals and modalities across broad populations. While this model has reduced mortality, it entails harms—including overdiagnosis, false positives, and missed interval cancers—prompting interest in risk-stratified approaches. In recent years, artificial intelligence (AI) has emerged as a critical enabler of this paradigm shift. This narrative review examines how AI-driven tools are advancing breast cancer screening toward personalization, with a focus on mammographic risk models, multimodal risk prediction, and AI-enabled clinical decision support. We reviewed studies published from 2015 to 2025, prioritizing large cohorts, randomized trials, and prospective validations. AI-based mammographic risk models generally improve discrimination versus classical models and are being externally validated; however, evidence remains heterogeneous across subtypes and populations. Emerging multimodal models integrate genetics, clinical data, and imaging; AI is also being evaluated for triage and personalized intervals within clinical workflows. Barriers remain—explainability, regulatory validation, and equity. Widespread adoption will depend on prospective clinical benefit, regulatory alignment, and careful integration. Overall, AI-based mammographic risk models generally improve discrimination versus classical models and are being externally validated; however, evidence remains heterogeneous across molecular subtypes, with signals strongest for ER-positive disease and limited data for fast-growing and interval cancers. Prospective trials demonstrating outcome benefit and safe interval modification are still pending. Accordingly, adoption should proceed with safeguards, equity monitoring, and clear separation between risk prediction, lesion detection, triage, and decision-support roles. Full article
(This article belongs to the Special Issue Advances in Oncological Imaging (2nd Edition))
25 pages, 13849 KB  
Article
When Action Speaks Louder than Words: Exploring Non-Verbal and Paraverbal Features in Dyadic Collaborative VR
by Dennis Osei Tutu, Sepideh Habibiabad, Wim Van den Noortgate, Jelle Saldien and Klaas Bombeke
Sensors 2025, 25(17), 5498; https://doi.org/10.3390/s25175498 - 4 Sep 2025
Abstract
Soft skills such as communication and collaboration are vital in both professional and educational settings, yet difficult to train and assess objectively. Traditional role-playing scenarios rely heavily on subjective trainer evaluations—either in real time, where subtle behaviors are missed, or through time-intensive post hoc analysis. Virtual reality (VR) offers a scalable alternative by immersing trainees in controlled, interactive scenarios while simultaneously capturing fine-grained behavioral signals. This study investigates how task design in VR shapes non-verbal and paraverbal behaviors during dyadic collaboration. We compared two puzzle tasks: Task 1, which provided shared visual access and dynamic gesturing, and Task 2, which required verbal coordination through separation and turn-taking. From multimodal tracking data, we extracted features including gaze behaviors (eye contact, joint attention), hand gestures, facial expressions, and speech activity, and compared them across tasks. A clustering analysis explored whether or not tasks could be differentiated by their behavioral profiles. Results showed that Task 2, the more constrained condition, led participants to focus more visually on their own workspaces, suggesting that interaction difficulty can reduce partner-directed attention. Gestures were more frequent in shared-visual tasks, while speech became longer and more structured when turn-taking was enforced. Joint attention increased when participants relied on verbal descriptions rather than on a visible shared reference. These findings highlight how VR can elicit distinct soft skill behaviors through scenario design, enabling data-driven analysis of collaboration. This work contributes to scalable assessment frameworks with applications in training, adaptive agents, and human-AI collaboration. Full article
(This article belongs to the Special Issue Sensing Technology to Measure Human-Computer Interactions)
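The clustering analysis mentioned in the abstract can be illustrated by clustering per-session behavioral features and checking whether the two task designs separate. The feature names and values below are synthetic placeholders, not the study's data.

```python
# Sketch of clustering per-session multimodal behavioral features (synthetic data).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# columns: eye-contact ratio, joint-attention ratio, gestures/min, speech seconds/turn
sessions = np.array([
    [0.42, 0.31, 6.1, 3.2],   # shared-visual task (Task 1-like)
    [0.45, 0.28, 5.7, 3.0],
    [0.18, 0.52, 1.9, 7.8],   # verbally coordinated task (Task 2-like)
    [0.21, 0.49, 2.3, 8.4],
])
X = StandardScaler().fit_transform(sessions)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # sessions from the two task designs should fall into separate clusters
```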

10 pages, 1081 KB  
Proceeding Paper
Insights into the Emotion Classification of Artificial Intelligence: Evolution, Application, and Obstacles of Emotion Classification
by Marselina Endah Hiswati, Ema Utami, Kusrini Kusrini and Arief Setyanto
Eng. Proc. 2025, 103(1), 24; https://doi.org/10.3390/engproc2025103024 - 3 Sep 2025
Abstract
In this systematic literature review, we examined the integration of emotional intelligence into artificial intelligence (AI) systems, focusing on advancements, challenges, and opportunities in emotion classification technologies. Accurate emotion recognition in AI holds immense potential in healthcare, the IoT, and education. However, challenges such as computational demands, limited dataset diversity, and real-time deployment complexity remain significant. In this review, we included research on emerging solutions like multimodal data processing, attention mechanisms, and real-time emotion tracking to address these issues. By overcoming these issues, AI systems enhance human–AI interactions and expand real-world applications. Recommendations for improving accuracy and scalability in emotion-aware AI are provided based on the review results. Full article

23 pages, 552 KB  
Article
Flipping the Script: The Impact of a Blended Literacy Learning Intervention on Comprehension
by Michael J. Hockwater
Educ. Sci. 2025, 15(9), 1147; https://doi.org/10.3390/educsci15091147 - 3 Sep 2025
Abstract
This qualitative action research case study explored how a blended literacy learning intervention combining the flipped classroom model with youth-selected multimodal texts influenced sixth-grade Academic Intervention Services (AIS) students’ comprehension of figurative language. The study was conducted over four months in a New York State middle school and involved seven students identified as at-risk readers. Initially, students engaged with teacher-created instructional videos outside of class and completed analytical activities during class time. However, due to low engagement and limited comprehension gains, the intervention was revised to incorporate student autonomy through the selection of multimodal texts such as graphic novels, song lyrics, and YouTube videos. Data was collected through semi-structured interviews, journal entries, surveys, and classroom artifacts, and then analyzed using inductive coding and member checking. Findings indicate that students demonstrated increased comprehension of figurative language when given choice in both texts and instructional videos. Participants reported increased motivation, deeper engagement, and enhanced meaning-making, particularly when reading texts that reflected their personal interests and experiences. The study concludes that a blended literacy model emphasizing autonomy and multimodality can support comprehension and bridge the gap between in-school and out-of-school literacy practices. Full article
(This article belongs to the Special Issue Digital Literacy Environments and Reading Comprehension)

22 pages, 1688 KB  
Article
LumiCare: A Context-Aware Mobile System for Alzheimer’s Patients Integrating AI Agents and 6G
by Nicola Dall’Ora, Lorenzo Felli, Stefano Aldegheri, Nicola Vicino and Romeo Giuliano
Electronics 2025, 14(17), 3516; https://doi.org/10.3390/electronics14173516 - 2 Sep 2025
Abstract
Alzheimer’s disease is a growing global health concern, demanding innovative solutions for early detection, continuous monitoring, and patient support. This article reviews recent advances in Smart Wearable Medical Devices (SWMDs), Internet of Things (IoT) systems, and mobile applications used to monitor physiological, behavioral, and cognitive changes in Alzheimer’s patients. We highlight the role of wearable sensors in detecting vital signs, falls, and geolocation data, alongside IoT architectures that enable real-time alerts and remote caregiver access. Building on these technologies, we present LumiCare, a conceptual, context-aware mobile system that integrates multimodal sensor data, chatbot-based interaction, and emerging 6G network capabilities. LumiCare uses machine learning for behavioral analysis, delivers personalized cognitive prompts, and enables emergency response through adaptive alerts and caregiver notifications. The system includes the LumiCare Companion, an interactive mobile app designed to support daily routines, cognitive engagement, and safety monitoring. By combining local AI processing with scalable edge-cloud architectures, LumiCare balances latency, privacy, and computational load. While promising, this work remains at the design stage and has not yet undergone clinical validation. Our analysis underscores the potential of wearable, IoT, and mobile technologies to improve the quality of life for Alzheimer’s patients, support caregivers, and reduce healthcare burdens. Full article
(This article belongs to the Special Issue Smart Bioelectronics, Wearable Systems and E-Health)

18 pages, 1660 KB  
Article
AI Gem: Context-Aware Transformer Agents as Digital Twin Tutors for Adaptive Learning
by Attila Kovari
Computers 2025, 14(9), 367; https://doi.org/10.3390/computers14090367 - 2 Sep 2025
Abstract
Recent developments in large language models allow for real-time, context-aware tutoring. AI Gem, presented in this article, is a layered architecture that integrates personalization, adaptive feedback, and curricular alignment into transformer-based tutoring agents. The architecture combines retrieval-augmented generation, a Bayesian learner model, and policy-based dialog in a verifiable and deployable software stack. The opportunities are scalable tutoring, multimodal interaction, and augmentation of teachers through content tools and analytics. Risks are factual errors, bias, over-reliance, latency, cost, and privacy. The paper positions AI Gem as a design framework with testable hypotheses. A scenario-based walkthrough and new diagrams assign each learner step to the ten layers. Governance guidance covers data privacy across jurisdictions and operation in resource-constrained environments. Full article
(This article belongs to the Special Issue Recent Advances in Computer-Assisted Learning (2nd Edition))
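The Bayesian learner-model layer can be illustrated with classic Bayesian knowledge tracing, which updates the probability that a learner has mastered a skill after each response. The parameter values below are illustrative assumptions, not the article's configuration.

```python
# Bayesian knowledge tracing sketch (illustrative parameters, not the article's values).
def bkt_update(p_known: float, correct: bool,
               p_slip: float = 0.1, p_guess: float = 0.2, p_learn: float = 0.15) -> float:
    """Posterior probability the learner has mastered a skill after one response."""
    if correct:
        evidence = p_known * (1 - p_slip) + (1 - p_known) * p_guess
        posterior = p_known * (1 - p_slip) / evidence
    else:
        evidence = p_known * p_slip + (1 - p_known) * (1 - p_guess)
        posterior = p_known * p_slip / evidence
    return posterior + (1 - posterior) * p_learn   # chance of learning from the feedback step

p = 0.3
for outcome in [True, True, False, True]:
    p = bkt_update(p, outcome)
    print(f"mastery estimate: {p:.3f}")
```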

22 pages, 47099 KB  
Article
Deciphering Emotions in Children’s Storybooks: A Comparative Analysis of Multimodal LLMs in Educational Applications
by Bushra Asseri, Estabrag Abaker, Maha Al Mogren, Tayef Alhefdhi and Areej Al-Wabil
AI 2025, 6(9), 211; https://doi.org/10.3390/ai6090211 - 2 Sep 2025
Abstract
Emotion recognition capabilities in multimodal AI systems are crucial for developing culturally responsive educational technologies yet remain underexplored for Arabic language contexts, where culturally appropriate learning tools are critically needed. This study evaluated the emotion recognition performance of two advanced multimodal large language models, GPT-4o and Gemini 1.5 Pro, when processing Arabic children’s storybook illustrations. We assessed both models across three prompting strategies (zero-shot, few-shot, and chain-of-thought) using 75 images from seven Arabic storybooks, comparing model predictions with human annotations based on Plutchik’s emotional framework. GPT-4o consistently outperformed Gemini across all conditions, achieving the highest macro F1-score of 59% with chain-of-thought prompting compared to Gemini’s best performance of 43%. Error analysis revealed systematic misclassification patterns, with valence inversions accounting for 60.7% of errors, while both models struggled with culturally nuanced emotions and ambiguous narrative contexts. These findings highlight fundamental limitations in current models’ cultural understanding and emphasize the need for culturally sensitive training approaches to develop effective emotion-aware educational technologies for Arabic-speaking learners. Full article
(This article belongs to the Special Issue Exploring the Use of Artificial Intelligence in Education)
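Macro F1, the headline metric above, is the unweighted mean of per-class F1 scores, so rare emotion classes count as much as frequent ones. A minimal illustration with toy labels (not the study's annotations):

```python
# Macro F1 sketch: unweighted mean of per-class F1 scores. Toy labels only.
from sklearn.metrics import f1_score

human_labels = ["joy", "sadness", "fear", "joy", "anger", "fear", "sadness", "joy"]
model_preds  = ["joy", "fear",    "fear", "joy", "anger", "sadness", "sadness", "joy"]

macro_f1 = f1_score(human_labels, model_preds, average="macro")
print(f"macro F1 = {macro_f1:.2f}")
```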