Search Results (2,479)

Search Parameters:
Keywords = multimodal learning

40 pages, 5708 KB  
Review
Advances on Multimodal Remote Sensing Foundation Models for Earth Observation Downstream Tasks: A Survey
by Guoqing Zhou, Lihuang Qian and Paolo Gamba
Remote Sens. 2025, 17(21), 3532; https://doi.org/10.3390/rs17213532 (registering DOI) - 24 Oct 2025
Abstract
Remote sensing foundation models (RSFMs) have demonstrated excellent feature extraction and reasoning capabilities under the self-supervised learning paradigm of “unlabeled datasets—model pre-training—downstream tasks”. These models achieve superior accuracy and performance compared to existing models across numerous open benchmark datasets. However, when confronted with multimodal data, such as optical, LiDAR, SAR, text, video, and audio, RSFMs exhibit limitations in cross-modal generalization and multi-task learning. Although several reviews have addressed RSFMs, there is currently no comprehensive survey dedicated to vision–X (vision, language, audio, position) multimodal RSFMs (MM-RSFMs). To fill this gap, this article provides a systematic review of MM-RSFMs from a novel perspective. First, the key technologies underlying MM-RSFMs are reviewed and analyzed, and the available multimodal RS pre-training datasets are summarized. Then, recent advances in MM-RSFMs are classified according to the development of backbone networks and cross-modal interaction methods of vision–X, such as vision–vision, vision–language, vision–audio, vision–position, and vision–language–audio. Finally, potential challenges are analyzed, and perspectives for MM-RSFMs are outlined. This survey reveals that current MM-RSFMs face the following key challenges: (1) a scarcity of high-quality multimodal datasets, (2) limited capability for multimodal feature extraction, (3) weak cross-task generalization, (4) absence of unified evaluation criteria, and (5) insufficient security measures.
(This article belongs to the Section AI Remote Sensing)
15 pages, 1889 KB  
Article
Predicting Sarcopenia in Peritoneal Dialysis Patients: A Multimodal Ultrasound-Based Logistic Regression Analysis and Nomogram Model
by Shengqiao Wang, Xiuyun Lu, Juan Chen, Xinliang Xu, Jun Jiang and Yi Dong
Diagnostics 2025, 15(21), 2685; https://doi.org/10.3390/diagnostics15212685 - 23 Oct 2025
Abstract
Objective: This study aimed to evaluate the diagnostic value of logistic regression and nomogram models based on multimodal ultrasound in predicting sarcopenia in patients with peritoneal dialysis (PD). Methods: A total of 178 patients with PD admitted to our nephrology department between June 2024 and April 2025 were enrolled. According to the 2019 Asian Working Group for Sarcopenia (AWGS) diagnostic criteria, patients were categorized into sarcopenia and non-sarcopenia groups. Ultrasound examinations were used to measure the muscle thickness (MT), pennation angle (PA), fascicle length (FL), attenuation coefficient (Atten Coe), and echo intensity (EI) of the right gastrocnemius medial head. The clinical characteristics of the groups were compared using the Mann–Whitney U test. Binary logistic regression was used to identify sarcopenia risk factors to construct clinical prediction models and nomograms. Receiver operating characteristic (ROC) curves were used to assess the model accuracy and stability. Results: The sarcopenia group exhibited significantly lower MT, PA, and FL, but higher Atten Coe and EI than the non-sarcopenia group (all p < 0.05). A multimodal ultrasound logistic regression model was developed using machine learning—Logit(P) = −7.29 − 1.18 × MT − 0.074 × PA + 0.48 × FL + 0.52 × Atten Coe + 0.13 × EI (p < 0.05)—achieving an F1-score of 0.785. The area under the ROC curve (ROC-AUC) was 0.902, with an optimal cut-off value of 0.45 (sensitivity 77.3%, specificity 56.7%). Nomogram consistency analysis showed no statistical difference between the ultrasound diagnosis and the appendicular skeletal muscle index (ASMI) measured by bioelectrical impedance analysis (BIA) (Z = 0.415, p > 0.05). Conclusions: The multimodal ultrasound-based prediction model effectively assists clinicians in identifying patients with PD at a high risk of sarcopenia, enabling early intervention to improve clinical outcomes.
(This article belongs to the Section Medical Imaging and Theranostics)
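The reported model reduces to a single inverse-logit computation, so the risk score can be reproduced directly from the published coefficients. Below is a minimal Python sketch; the patient measurements are hypothetical, and applying the 0.45 cut-off to the predicted probability is an assumption about how the authors operationalized it.

```python
import math

# Coefficients quoted in the abstract; features are the paper's ultrasound
# measurements of the gastrocnemius medial head.
INTERCEPT = -7.29
COEFS = {"MT": -1.18, "PA": -0.074, "FL": 0.48, "AttenCoe": 0.52, "EI": 0.13}

def sarcopenia_probability(mt, pa, fl, atten_coe, ei):
    """Logistic model: Logit(P) = intercept + sum of coefficient * feature."""
    logit = (INTERCEPT + COEFS["MT"] * mt + COEFS["PA"] * pa
             + COEFS["FL"] * fl + COEFS["AttenCoe"] * atten_coe
             + COEFS["EI"] * ei)
    return 1.0 / (1.0 + math.exp(-logit))  # inverse logit

# Hypothetical measurements for one patient (illustrative values only).
p = sarcopenia_probability(mt=1.4, pa=18.0, fl=4.5, atten_coe=1.1, ei=60.0)
print(f"P(sarcopenia) = {p:.3f}, high risk: {p >= 0.45}")  # 0.45 = reported cut-off
```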

22 pages, 1087 KB  
Article
Modeling the Internal and Contextual Attention for Self-Supervised Skeleton-Based Action Recognition
by Wentian Xin, Yue Teng, Jikang Zhang, Yi Liu, Ruyi Liu, Yuzhi Hu and Qiguang Miao
Sensors 2025, 25(21), 6532; https://doi.org/10.3390/s25216532 - 23 Oct 2025
Abstract
Multimodal contrastive learning has achieved significant performance advantages in self-supervised skeleton-based action recognition. Previous methods are limited by modality imbalance, which reduces alignment accuracy and makes it difficult to combine important spatial–temporal frequency patterns, leading to confusion between modalities and weaker feature representations. To overcome these problems, we explore intra-modality feature-wise self-similarity and inter-modality instance-wise cross-consistency, and discover two inherent correlations that benefit recognition: (i) Global Perspective expresses how action semantics carry a broad and high-level understanding, which supports the use of globally discriminative feature representations. (ii) Focus Adaptation refers to the role of the frequency spectrum in guiding attention toward key joints by emphasizing compact and salient signal patterns. Building upon these insights, we propose MICA, a novel language–skeleton contrastive learning framework comprising two key components: (a) Feature Modulation, which constructs a skeleton–language action conceptual domain to minimize the expected information gain between vision and language modalities. (b) Frequency Feature Learning, which introduces a Frequency-domain Spatial–Temporal block (FreST) that focuses on sparse key human joints in the frequency domain with compact signal energy. Extensive experiments demonstrate that our method achieves remarkable action recognition performance on widely used benchmark datasets, including NTU RGB+D 60 and NTU RGB+D 120. Especially on the challenging PKU-MMD dataset, MICA achieves at least a 4.6% improvement over classical methods such as CrosSCLR and AimCLR, effectively demonstrating its ability to capture internal and contextual attention information.
(This article belongs to the Special Issue Deep Learning for Perception and Recognition: Method and Applications)
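The FreST idea — that key joints concentrate compact signal energy in the frequency spectrum — can be illustrated with a simple energy-ranking step. The sketch below is not the paper's FreST block; the array shapes, the top-k rule, and the NTU-style 25-joint layout are assumptions.

```python
import numpy as np

def select_salient_joints(skeleton, k=8):
    """Rank joints by temporal spectral energy and keep the top-k.

    skeleton: array of shape (T, J, C) -- T frames, J joints, C coordinates.
    """
    spectrum = np.fft.rfft(skeleton, axis=0)           # per-joint temporal spectrum
    energy = (np.abs(spectrum) ** 2).sum(axis=(0, 2))  # total energy per joint
    top = np.argsort(energy)[-k:]                      # most active joints
    return skeleton[:, top, :], top

frames = np.random.randn(64, 25, 3)  # e.g., an NTU-style 25-joint sequence
sparse, idx = select_salient_joints(frames)
print(sparse.shape, idx)
```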

26 pages, 1979 KB  
Review
From Single-Sensor Constraints to Multisensor Integration: Advancing Sustainable Complex Ore Sorting
by Sefiu O. Adewuyi, Angelina Anani, Kray Luxbacher and Sehliselo Ndlovu
Minerals 2025, 15(11), 1101; https://doi.org/10.3390/min15111101 - 23 Oct 2025
Abstract
Processing complex ore remains a challenge due to energy-intensive grinding and complex beneficiation and pyrometallurgical treatments that consume large amounts of water whilst generating significant waste and polluting the environment. Sensor-based ore sorting, which separates ore particles based on their physical or chemical properties before downstream processing, is emerging as a transformative technology in mineral processing. However, its application to complex and heterogeneous ores remains limited by the constraints of single-sensor systems. In addition, existing hybrid sensor strategies are fragmented, and a consolidated framework for implementation is lacking. This review explores these challenges and underscores the potential of multimodal sensor integration for complex ore pre-concentration. A multi-sensor framework integrating machine learning and computer vision is proposed to overcome limitations in handling complex ores and enhance sorting efficiency. This approach can improve recovery rates, reduce energy and water consumption, and optimize process performance, thereby supporting more sustainable mining practices that contribute to the United Nations Sustainable Development Goals (UNSDGs). This work provides a roadmap for advancing efficient, resilient, and next-generation mineral processing operations.
(This article belongs to the Section Mineral Processing and Extractive Metallurgy)

25 pages, 2557 KB  
Article
Modality-Resilient Multimodal Industrial Anomaly Detection via Cross-Modal Knowledge Transfer and Dynamic Edge-Preserving Voxelization
by Jiahui Xu, Jian Yuan, Mingrui Yang and Weishu Yan
Sensors 2025, 25(21), 6529; https://doi.org/10.3390/s25216529 - 23 Oct 2025
Abstract
Achieving high-precision anomaly detection with incomplete sensor data is a critical challenge in industrial automation and intelligent manufacturing. This incompleteness often results from sensor failures, environmental interference, occlusions, or acquisition cost constraints. This study explicitly targets both types of incompleteness commonly encountered in industrial multimodal inspection: (i) incomplete sensor data within a given modality, such as partial point cloud loss or image degradation, and (ii) incomplete modalities, where one sensing channel (RGB or 3D) is entirely unavailable. By jointly addressing intra-modal incompleteness and cross-modal absence within a unified cross-distillation framework, our approach enhances anomaly detection robustness under both conditions. First, a teacher–student cross-modal distillation mechanism enables robust feature learning from both RGB and 3D modalities, allowing the student network to accurately detect anomalies even when a modality is missing during inference. Second, a dynamic voxel resolution adjustment with edge-retention strategy alleviates the computational burden of 3D point cloud processing while preserving crucial geometric features. By jointly enhancing robustness to missing modalities and improving computational efficiency, our method offers a resilient and practical solution for anomaly detection in real-world manufacturing scenarios. Extensive experiments demonstrate that the proposed method achieves both high robustness and efficiency across multiple industrial scenarios, establishing new state-of-the-art performance that surpasses existing approaches in both accuracy and speed. This method provides a robust solution for high-precision perception under complex detection conditions, significantly enhancing the feasibility of deploying anomaly detection systems in real industrial environments.
(This article belongs to the Section Industrial Sensors)
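The teacher–student cross-modal distillation mechanism can be sketched as a feature-matching objective: a teacher trained on both RGB and 3D supervises a student that sees only one modality. The loss below is a generic feature-distillation stand-in, not the paper's exact formulation; the embedding shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_feat, teacher_feat):
    """Match student features (single available modality) to the frozen
    multimodal teacher's features via mean-squared error."""
    return F.mse_loss(student_feat, teacher_feat.detach())

# Hypothetical B x D feature embeddings.
teacher = torch.randn(4, 256)                       # teacher saw RGB + 3D
student = torch.randn(4, 256, requires_grad=True)   # student saw RGB only
loss = distillation_loss(student, teacher)
loss.backward()
print(loss.item())
```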

24 pages, 5556 KB  
Article
Efficient Wearable Sensor-Based Activity Recognition for Human–Robot Collaboration in Agricultural Environments
by Sakorn Mekruksavanich and Anuchit Jitpattanakul
Informatics 2025, 12(4), 115; https://doi.org/10.3390/informatics12040115 - 23 Oct 2025
Abstract
This study focuses on human awareness, a critical component in human–robot interaction, particularly within agricultural environments where interactions are enriched by complex contextual information. The main objective is identifying human activities occurring during collaborative harvesting tasks involving humans and robots. To achieve this, we propose a novel and lightweight deep learning model, named 1D-ResNeXt, designed explicitly for recognizing activities in agriculture-related human–robot collaboration. The model is built as an end-to-end architecture incorporating feature fusion and a multi-kernel convolutional block strategy. It utilizes residual connections and a split–transform–merge mechanism to mitigate performance degradation and reduce model complexity by limiting the number of trainable parameters. Sensor data were collected from twenty individuals with five wearable devices placed on different body parts. Each sensor was embedded with tri-axial accelerometers, gyroscopes, and magnetometers. Under real field conditions, the participants performed several sub-tasks commonly associated with agricultural labor, such as lifting and carrying loads. Before classification, the raw sensor signals were pre-processed to eliminate noise. The cleaned time-series data were then input into the proposed deep learning network for sequential pattern recognition. Experimental results showed that the chest-mounted sensor achieved the highest F1-score of 99.86%, outperforming other sensor placements and combinations. An analysis of temporal window sizes (0.5, 1.0, 1.5, and 2.0 s) demonstrated that the 0.5 s window provided the best recognition performance, indicating that key activity features in agriculture can be captured over short intervals. Moreover, a comprehensive evaluation of sensor modalities revealed that multimodal fusion of accelerometer, gyroscope, and magnetometer data yielded the best accuracy at 99.92%. The combination of accelerometer and gyroscope data offered an optimal compromise, achieving 99.49% accuracy while maintaining lower system complexity. These findings highlight the importance of strategic sensor placement and data fusion in enhancing activity recognition performance while reducing the need for extensive data and computational resources. This work contributes to developing intelligent, efficient, and adaptive collaborative systems, offering promising applications in agriculture and beyond, with improved safety, cost-efficiency, and real-time operational capability.
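A core preprocessing step here is cutting the wearable streams into fixed temporal windows before classification. The sketch below assumes a 50 Hz sampling rate and 50% overlap (neither is stated in the abstract); the 0.5 s window is the size the study found best.

```python
import numpy as np

def sliding_windows(signal, rate_hz=50, win_s=0.5, overlap=0.5):
    """Cut a (T, channels) sensor stream into fixed-length windows."""
    win = int(rate_hz * win_s)
    step = max(1, int(win * (1 - overlap)))
    return np.stack([signal[i:i + win]
                     for i in range(0, len(signal) - win + 1, step)])

# 9 channels = tri-axial accelerometer + gyroscope + magnetometer of one device.
stream = np.random.randn(1000, 9)
batch = sliding_windows(stream)
print(batch.shape)  # (num_windows, 25, 9) for 0.5 s windows at 50 Hz
```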

23 pages, 6498 KB  
Article
A Cross-Modal Deep Feature Fusion Framework Based on Ensemble Learning for Land Use Classification
by Xiaohuan Wu, Houji Qi, Keli Wang, Yikun Liu and Yang Wang
ISPRS Int. J. Geo-Inf. 2025, 14(11), 411; https://doi.org/10.3390/ijgi14110411 - 23 Oct 2025
Abstract
Land use classification based on multi-modal data fusion has gained significant attention due to its potential to capture the complex characteristics of urban environments. However, effectively extracting and integrating discriminative features derived from heterogeneous geospatial data remain challenging. This study proposes an ensemble learning framework for land use classification by fusing cross-modal deep features from both physical and socioeconomic perspectives. Specifically, the framework utilizes the Masked Autoencoder (MAE) to extract global spatial dependencies from remote sensing imagery and applies long short-term memory (LSTM) networks to model spatial distribution patterns of points of interest (POIs) based on type co-occurrence. Furthermore, we employ inter-modal contrastive learning to enhance the representation of physical and socioeconomic features. To verify the superiority of the ensemble learning framework, we apply it to map the land use distribution of Beijing. By coupling various physical and socioeconomic features, the framework achieves an average accuracy of 84.33%, surpassing several comparative baseline methods. Furthermore, the framework demonstrates comparable performance when applied to a Shenzhen dataset, confirming its robustness and generalizability. The findings highlight the importance of fully extracting and effectively integrating multi-source deep features in land use classification, providing a robust solution for urban planning and sustainable development.
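The inter-modal contrastive learning step can be illustrated with a standard symmetric InfoNCE objective that pulls together imagery and POI embeddings of the same spatial unit. This is a common formulation, not necessarily the paper's exact loss; the embedding size and temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(img_emb, poi_emb, tau=0.07):
    """Symmetric InfoNCE between imagery and POI embeddings of the same parcels."""
    img = F.normalize(img_emb, dim=1)
    poi = F.normalize(poi_emb, dim=1)
    logits = img @ poi.t() / tau       # scaled cosine similarities
    labels = torch.arange(len(img))    # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```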

28 pages, 2038 KB  
Article
Cognitive-Inspired Multimodal Learning Framework for Hazard Identification in Highway Construction with BIM–GIS Integration
by Jibiao Zhou, Zewei Li, Zhan Shi, Xinhua Mao and Chao Gao
Sustainability 2025, 17(21), 9395; https://doi.org/10.3390/su17219395 - 22 Oct 2025
Abstract
Highway construction remains one of the most hazardous sectors in the infrastructure domain, where persistent accident rates challenge the vision of sustainable and safe development. Traditional hazard identification methods rely on manual inspections that are often slow, error-prone, and unable to cope with complex and dynamic site conditions. To address these limitations, this study develops a cognitive-inspired multimodal learning framework integrated with BIM–GIS-enabled digital twins to advance intelligent hazard identification and digital management for highway construction safety. The framework introduces three key innovations: a biologically grounded attention mechanism that simulates inspector search behavior, an adaptive multimodal fusion strategy that integrates visual, textual, and sensor information, and a closed-loop digital twin platform that synchronizes physical and virtual environments in real time. The system was validated across five highway construction projects over an 18-month period. Results show that the framework achieved a hazard detection accuracy of 91.7% with an average response time of 147 ms. Compared with conventional computer vision methods, accuracy improved by 18.2%, while gains over commercial safety systems reached 24.8%. Field deployment demonstrated a 34% reduction in accidents and a 42% increase in inspection efficiency, delivering a positive return on investment within 8.7 months. By linking predictive safety analytics with BIM–GIS semantics and site telemetry, the framework enhances construction safety, reduces delays and rework, and supports more resource-efficient, low-disruption project delivery, highlighting its potential as a sustainable pathway toward zero-accident highway construction.

22 pages, 4655 KB  
Article
Rural Settlement Mapping and Its Spatiotemporal Dynamics Monitoring in the Yellow River Delta Using Multi-Modal Fusion of Landsat Optical and Sentinel-1 SAR Polarimetric Decomposition Data by Leveraging Deep Learning
by Jiantao Liu, Yan Zhang, Fei Meng, Jianhua Gong, Dong Zhang, Yu Peng and Can Zhang
Remote Sens. 2025, 17(21), 3512; https://doi.org/10.3390/rs17213512 - 22 Oct 2025
Abstract
The Yellow River Delta (YRD) is a vital agricultural and ecologically fragile zone in China. Understanding the spatial pattern and evolutionary characteristics of Rural Settlements Area (RSA) in this region is crucial for both ecological protection and sustainable development. This study focuses on Dongying, a key YRD city, and compares four advanced deep learning models—U-Net, DeepLabv3+, TransUNet, and TransDeepLab—using fused Sentinel-1 radar and Landsat optical imagery to identify the optimal method for RSA mapping. Results show that TransUNet, integrating polarization and optical features, achieves the highest accuracy, with Precision, Recall, F1 score, and mIoU of 89.27%, 80.70%, 84.77%, and 85.39%, respectively. Accordingly, TransUNet was applied for the spatiotemporal extraction of RSA in 2002, 2008, 2015, 2019, and 2023. The results indicate that medium-sized settlements dominate, showing a “dense in the west/south, sparse in the east/north” pattern with clustered distribution. Settlement patches are generally regular but grow more complex over time while maintaining strong connectivity. In summary, the proposed method offers technical support for RSA identification in the YRD, and the extracted multi-temporal settlement data can serve as a valuable reference for optimizing settlement layout in the region.
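The reported Precision, Recall, F1, and mIoU all follow from the binary confusion matrix of the predicted settlement mask. A minimal computation sketch, with random masks standing in for real predictions and labels:

```python
import numpy as np

def segmentation_scores(pred, truth):
    """Binary-mask Precision/Recall/F1 and mean IoU over the two classes."""
    tp = np.sum((pred == 1) & (truth == 1))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    tn = np.sum((pred == 0) & (truth == 0))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    miou = 0.5 * (tp / (tp + fp + fn) + tn / (tn + fp + fn))
    return precision, recall, f1, miou

pred = np.random.randint(0, 2, (256, 256))   # hypothetical settlement mask
truth = np.random.randint(0, 2, (256, 256))
print(segmentation_scores(pred, truth))
```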

39 pages, 1188 KB  
Review
A Scoping Review of AI-Based Approaches for Detecting Autism Traits Using Voice and Behavioral Data
by Hajarimino Rakotomanana and Ghazal Rouhafzay
Bioengineering 2025, 12(11), 1136; https://doi.org/10.3390/bioengineering12111136 - 22 Oct 2025
Abstract
This scoping review systematically maps the rapidly evolving application of Artificial Intelligence (AI) in Autism Spectrum Disorder (ASD) diagnostics, specifically focusing on computational behavioral phenotyping. Recognizing that observable traits like speech and movement are critical for early, timely intervention, the study synthesizes AI’s use across eight key behavioral modalities. These include voice biomarkers, conversational dynamics, linguistic analysis, movement analysis, activity recognition, facial gestures, visual attention, and multimodal approaches. The review analyzed 158 studies published between 2015 and 2025, revealing that modern Machine Learning and Deep Learning techniques demonstrate highly promising diagnostic performance in controlled environments, with reported accuracies of up to 99%. Despite this significant capability, the review identifies critical challenges that impede clinical implementation and generalizability. These persistent limitations include pervasive issues with dataset heterogeneity, gender bias in samples, and small overall sample sizes. By detailing the current landscape of observable data types, computational methodologies, and available datasets, this work establishes a comprehensive overview of AI’s current strengths and fundamental weaknesses in ASD diagnosis. The article concludes by providing actionable recommendations aimed at guiding future research toward developing diagnostic solutions that are more inclusive, generalizable, and ultimately applicable in clinical settings.

25 pages, 1741 KB  
Article
Event-Aware Multimodal Time-Series Forecasting via Symmetry-Preserving Graph-Based Cross-Regional Transfer Learning
by Shu Cao and Can Zhou
Symmetry 2025, 17(11), 1788; https://doi.org/10.3390/sym17111788 - 22 Oct 2025
Abstract
Forecasting real-world time series in domains with strong event sensitivity and regional variability poses unique challenges, as predictive models must account for sudden disruptions, heterogeneous contextual factors, and structural differences across locations. In tackling these challenges, we draw on the concept of symmetry, referring to the balance and invariance patterns across temporal, multimodal, and structural dimensions that help reveal consistent relationships and recurring patterns within complex systems. This study is based on two multimodal datasets covering 12 tourist regions and more than 3 years of records, ensuring robustness and practical relevance of the results. In many applications, such as monitoring economic indicators, assessing operational performance, or predicting demand patterns, short-term fluctuations are often triggered by discrete events, policy changes, or external incidents, which conventional statistical and deep learning approaches struggle to model effectively. To address these limitations, we propose an event-aware multimodal time-series forecasting framework with graph-based regional transfer built upon an enhanced PatchTST backbone. The framework unifies multimodal feature extraction, event-sensitive temporal reasoning, and graph-based structural adaptation. Unlike Informer, Autoformer, FEDformer, or PatchTST, our model explicitly addresses naive multimodal fusion, event-agnostic modeling, and weak cross-regional transfer by introducing an event-aware Multimodal Encoder, a Temporal Event Reasoner, and a Multiscale Graph Module. Experiments on diverse multi-region multimodal datasets demonstrate that our method achieves substantial improvements over eight state-of-the-art baselines in forecasting accuracy, event response modeling, and transfer efficiency. Specifically, our model achieves a 15.06% improvement in the event recovery index, a 15.1% reduction in MAE, and a 19.7% decrease in event response error compared to PatchTST, highlighting its empirical impact on tourism event economics forecasting.
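Since the framework builds on a PatchTST backbone, its tokenization step splits each series into overlapping patches before the transformer. A minimal sketch; the patch length and stride are illustrative defaults, not values from the paper.

```python
import torch

def patchify(series, patch_len=16, stride=8):
    """Split series of shape (B, T) into overlapping patches (B, N, patch_len),
    the tokenization step of a PatchTST-style backbone."""
    return series.unfold(dimension=1, size=patch_len, step=stride)

x = torch.randn(2, 96)       # batch of 2 series, 96 time steps
print(patchify(x).shape)     # torch.Size([2, 11, 16])
```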

21 pages, 1732 KB  
Review
Artificial Intelligence in Clinical Oncology: From Productivity Enhancement to Creative Discovery
by Masahiro Kuno, Hiroki Osumi, Shohei Udagawa, Kaoru Yoshikawa, Akira Ooki, Eiji Shinozaki, Tetsuo Ishikawa, Junna Oba, Kensei Yamaguchi and Kazuhiro Sakurada
Curr. Oncol. 2025, 32(11), 588; https://doi.org/10.3390/curroncol32110588 - 22 Oct 2025
Abstract
Modern clinical oncology faces an unprecedented data complexity that exceeds human analytical capacity, making artificial intelligence (AI) integration essential rather than optional. This review examines the dual impact of AI on productivity enhancement and creative discovery in cancer care. We trace the evolution from traditional machine learning to deep learning and transformer-based foundation models, analyzing their clinical applications. AI enhances productivity by automating diagnostic tasks, streamlining documentation, and accelerating research workflows across imaging modalities and clinical data processing. More importantly, AI enables creative discovery by integrating multimodal data to identify computational biomarkers, performing unsupervised phenotyping to reveal hidden patient subgroups, and accelerating drug development. Finally, we introduce the FUTURE-AI framework, outlining the essential requirements for translating AI models into clinical practice. This ensures the responsible deployment of AI, which augments rather than replaces clinical judgment, while maintaining patient-centered care.

12 pages, 3307 KB  
Article
Redefining MRI-Based Skull Segmentation Through AI-Driven Multimodal Integration
by Michel Beyer, Alexander Aigner, Alexandru Burde, Alexander Brasse, Sead Abazi, Lukas B. Seifert, Jakob Wasserthal, Martin Segeroth, Mohamed Omar and Florian M. Thieringer
J. Imaging 2025, 11(11), 372; https://doi.org/10.3390/jimaging11110372 - 22 Oct 2025
Abstract
Skull segmentation in magnetic resonance imaging (MRI) is essential for cranio-maxillofacial (CMF) surgery planning, yet manual approaches are time-consuming and error-prone. Computed tomography (CT) provides superior bone contrast but exposes patients to ionizing radiation, which is particularly concerning in pediatric care. This study presents an AI-based workflow that enables skull segmentation directly from routine MRI. Using 186 paired CT–MRI datasets, CT-based segmentations were transferred to MRI via multimodal registration to train dedicated deep learning models. Performance was evaluated against manually segmented CT ground truth using Dice Similarity Coefficient (DSC), Mean Surface Distance (MSD), and Hausdorff Distance (HD). AI achieved higher performance on CT (DSC 0.981) than MRI (DSC 0.864), with MSD and HD also favoring CT. Despite lower absolute accuracy on MRI, the approach substantially improved segmentation quality compared with manual MRI methods, particularly in clinically relevant regions. This automated method enables accurate skull modeling from standard MRI without radiation exposure or specialized sequences. While CT remains more precise, the presented framework enhances MRI utility in surgical planning, reduces manual workload, and supports safer, patient-specific treatment, especially for pediatric and trauma cases.
(This article belongs to the Section AI in Imaging)
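The headline numbers are Dice Similarity Coefficients, which compare predicted and reference binary masks. A minimal implementation follows; random masks stand in for real skull segmentations, and surface metrics such as MSD and HD would normally require dedicated mesh/distance tooling.

```python
import numpy as np

def dice(a, b):
    """Dice Similarity Coefficient between two binary masks: 2|A∩B| / (|A|+|B|)."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

# Hypothetical voxel masks; the paper reports DSC 0.981 (CT) vs. 0.864 (MRI).
pred = np.random.rand(64, 64, 64) > 0.5
truth = np.random.rand(64, 64, 64) > 0.5
print(f"DSC = {dice(pred, truth):.3f}")
```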

18 pages, 2025 KB  
Article
A Priori Prediction of Neoadjuvant Chemotherapy Response in Breast Cancer Using Deep Features from Pre-Treatment MRI and CT
by Deok Hyun Jang, Laurentius O. Osapoetra, Lakshmanan Sannachi, Belinda Curpen, Ana Pejović-Milić and Gregory J. Czarnota
Cancers 2025, 17(20), 3394; https://doi.org/10.3390/cancers17203394 - 21 Oct 2025
Abstract
Background: Response to neoadjuvant chemotherapy (NAC) is a key prognostic indicator in breast cancer, yet current assessment relies on postoperative pathology. This study investigated the use of deep features derived from pre-treatment MRI and CT scans, in conjunction with clinical variables, to predict treatment response a priori. Methods: Two response endpoints were analyzed: pathologic complete response (pCR) versus non-pCR, and responders versus non-responders, with response defined as a reduction in tumor size of at least 30%. Intratumoral and peritumoral segmentations were generated on contrast-enhanced T1-weighted (CE-T1) and T2-weighted MRI, as well as contrast-enhanced CT images of tumors. Deep features were extracted from these regions using ResNet10, ResNet18, ResNet34, and ResNet50 architectures pre-trained with MedicalNet. Handcrafted radiomic features were also extracted for comparison. Feature selection was conducted with minimum redundancy maximum relevance (mRMR) followed by recursive feature elimination (RFE), and classification was performed using XGBoost across ten independent data partitions. Results: A total of 177 patients were analyzed in this study. ResNet34-derived features achieved the highest overall classification performance under both criteria, outperforming handcrafted features and deep features from other ResNet architectures. For distinguishing pCR from non-pCR, ResNet34 achieved a balanced accuracy of 81.6%, whereas handcrafted radiomics achieved 77.9%. For distinguishing responders from non-responders, ResNet34 achieved a balanced accuracy of 73.5%, compared with 70.2% for handcrafted radiomics. Conclusions: Deep features extracted from routinely acquired MRI and CT, when combined with clinical information, improve the prediction of NAC response in breast cancer. This multimodal framework demonstrates the value of deep learning-based approaches as a complement to handcrafted radiomics and provides a basis for more individualized treatment strategies.
(This article belongs to the Special Issue CT/MRI/PET in Cancer)
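The classification pipeline — deep features filtered by mRMR, refined by RFE, classified by XGBoost — can be approximated with scikit-learn and xgboost. The sketch below omits the mRMR pre-filter and uses synthetic data shaped like the study's cohort (177 patients); the feature count, selected-feature count, and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.feature_selection import RFE
from xgboost import XGBClassifier

# Synthetic stand-in for ResNet34 deep features: 177 patients x 64 features,
# binary pCR vs. non-pCR labels (all values random here).
rng = np.random.default_rng(0)
X = rng.normal(size=(177, 64))
y = rng.integers(0, 2, size=177)

# Recursive feature elimination wrapped around an XGBoost classifier
# (the abstract's mRMR pre-filtering step is omitted in this sketch).
selector = RFE(XGBClassifier(n_estimators=50, eval_metric="logloss"),
               n_features_to_select=10)
selector.fit(X, y)

# Final classifier trained on the retained features.
clf = XGBClassifier(n_estimators=50, eval_metric="logloss")
clf.fit(X[:, selector.support_], y)
print("features kept:", int(selector.support_.sum()))
```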

19 pages, 284 KB  
Article
Teachers’ Perceptions and Students’ Strategies in Using AI-Mediated Informal Digital Learning for Career ESL Writing
by Lan Thi Huong Nguyen, Hanh Dinh, Thi Bich Nguyen Dao and Ngoc Giang Tran
Educ. Sci. 2025, 15(10), 1414; https://doi.org/10.3390/educsci15101414 - 21 Oct 2025
Abstract
This study aims to explore teachers’ perceptions and students’ strategies when integrating AI-mediated informal digital learning of English tools (AI-IDLE) into career ESL writing instruction. This case study involved six university instructors and over 300 students in an English writing course. Although AI-IDLE has broadened English access beyond classrooms, existing research on writing skills often neglects students’ diverse strategies that correspond to their professional aspirations, as well as teachers’ perceptions. The data included a demographic questionnaire, think-aloud protocols for real-time assessment of cognitive processes during the task, and semi-structured interviews for teachers’ validation. Findings reveal three major student strategies: (1) explicit genre understanding, (2) student-driven selection of digital multimodal tools—such as Grammarly, ChatGPT, Canva with Magic Write, and Invideo—to integrate text with images, sound, and layout for improved rhetorical accessibility, and (3) alignment with students’ post-graduation career needs. Students’ work with these AI tools demonstrated that when they created projects aligned with professional identities and future job needs, they became more aware of how to improve their writing; however, the teachers expressed hopes and doubts about the tools’ effectiveness and the authenticity of the students’ work. Suggestions for using AI-IDLE to improve writing are provided.