Search Results (2,122)

Search Parameters:
Keywords = multimodal feature

25 pages, 7560 KB  
Article
RTMF-Net: A Dual-Modal Feature-Aware Fusion Network for Dense Forest Object Detection
by Xiaotan Wei, Zhensong Li, Yutong Wang and Shiliang Zhu
Sensors 2025, 25(18), 5631; https://doi.org/10.3390/s25185631 (registering DOI) - 10 Sep 2025
Abstract
Multimodal remote sensing object detection has gained increasing attention due to its ability to leverage complementary information from different sensing modalities, particularly visible (RGB) and thermal infrared (TIR) imagery. However, existing methods typically depend on deep, computationally intensive backbones and complex fusion strategies, limiting their suitability for real-time applications. To address these challenges, we propose a lightweight and efficient detection framework named RGB-TIR Multimodal Fusion Network (RTMF-Net), which introduces innovations in both the backbone architecture and the fusion mechanism. Specifically, RTMF-Net adopts a dual-stream structure with modality-specific enhancement modules tailored to the characteristics of RGB and TIR data. The visible-light branch integrates a Convolutional Enhancement Fusion Block (CEFBlock) to improve multi-scale semantic representation with low computational overhead, while the thermal branch employs a Dual-Laplacian Enhancement Block (DLEBlock) to enhance frequency-domain structural features and weak texture cues. To further improve cross-modal feature interaction, a Weighted Denoising Fusion Module is designed, incorporating an Enhanced Fusion Attention (EFA) mechanism that adaptively suppresses redundant information and emphasizes salient object regions. Additionally, a Shape-Aware Intersection over Union (SA-IoU) loss function is proposed to improve localization robustness by introducing an aspect-ratio penalty into the traditional IoU metric. Extensive experiments on the ODinMJ and LLVIP multimodal datasets demonstrate that RTMF-Net achieves competitive performance, with mean Average Precision (mAP) scores of 98.7% and 95.7%, respectively, while maintaining a lightweight structure of only 4.3M parameters and 11.6 GFLOPs. These results confirm the effectiveness of RTMF-Net in balancing accuracy and efficiency, making it well suited for real-time remote sensing applications.
(This article belongs to the Section Sensing and Imaging)
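
The abstract describes SA-IoU only as adding an aspect-ratio penalty to the standard IoU metric. Below is a minimal sketch of that idea, assuming a CIoU-style consistency term and a hypothetical weight lam; the paper's exact formulation may differ.

```python
import math

def iou(box_a, box_b):
    """Standard IoU for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def shape_aware_iou_loss(pred, target, lam=1.0):
    """IoU loss plus an aspect-ratio consistency penalty (CIoU-style).

    The penalty grows as the predicted width/height ratio deviates
    from the target's; lam is a hypothetical weight, not the paper's.
    """
    w_p, h_p = pred[2] - pred[0], pred[3] - pred[1]
    w_t, h_t = target[2] - target[0], target[3] - target[1]
    v = (4 / math.pi ** 2) * (math.atan(w_t / h_t) - math.atan(w_p / h_p)) ** 2
    return 1.0 - iou(pred, target) + lam * v

print(shape_aware_iou_loss((0, 0, 10, 5), (0, 0, 10, 10)))  # penalized vs plain IoU
```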

24 pages, 5241 KB  
Article
CogMamba: Multi-Task Driver Cognitive Load and Physiological Non-Contact Estimation with Multimodal Facial Features
by Yicheng Xie and Bin Guo
Sensors 2025, 25(18), 5620; https://doi.org/10.3390/s25185620 - 9 Sep 2025
Abstract
The cognitive load of drivers directly affects the safety and practicality of advanced driver-assistance systems, especially in autonomous driving scenarios where drivers must quickly take back control of the vehicle after performing non-driving-related tasks (NDRTs). However, existing driver cognitive load detection methods have shortcomings, such as invasive detection equipment that cannot be deployed inside vehicles and approaches limited to eye-movement detection, which restrict their practical application. To achieve more efficient and practical cognitive load detection, this study proposes a multi-task, non-contact cognitive load and physiological state estimation model based on RGB video, named CogMamba. The model utilizes multimodal features extracted from facial video and introduces the Mamba architecture to efficiently capture local and global temporal dependencies, jointly estimating cognitive load, heart rate (HR), and respiratory rate (RR). Experimental results demonstrate that CogMamba achieves superior performance on two public datasets and shows strong robustness in cross-dataset generalization tests. This study provides insights for non-contact driver state monitoring in real-world driving scenarios.
(This article belongs to the Section Physical Sensors)
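
A sketch of the multi-task layout the abstract describes: one shared temporal encoder feeding three heads for cognitive load, HR, and RR. A GRU stands in for the paper's Mamba blocks (which need an external package), and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskDriverState(nn.Module):
    """Shared temporal encoder with three task heads.

    Input is a sequence of per-frame facial feature vectors;
    feat_dim, hidden, and the class count are placeholders.
    """
    def __init__(self, feat_dim=64, hidden=128, n_load_classes=3):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.load_head = nn.Linear(hidden, n_load_classes)  # cognitive load
        self.hr_head = nn.Linear(hidden, 1)                 # heart rate
        self.rr_head = nn.Linear(hidden, 1)                 # respiratory rate

    def forward(self, x):                 # x: (batch, time, feat_dim)
        _, h = self.encoder(x)            # h: (1, batch, hidden)
        h = h.squeeze(0)
        return self.load_head(h), self.hr_head(h), self.rr_head(h)

model = MultiTaskDriverState()
load_logits, hr, rr = model(torch.randn(2, 300, 64))
print(load_logits.shape, hr.shape, rr.shape)  # (2, 3) (2, 1) (2, 1)
```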

21 pages, 873 KB  
Article
MBSCL-Net: Multi-Branch Spectral Network and Contrastive Learning for Next-Point-of-Interest Recommendation
by Sucheng Wang, Jinlai Zhang and Tao Zeng
Sensors 2025, 25(18), 5613; https://doi.org/10.3390/s25185613 - 9 Sep 2025
Abstract
Next-point-of-interest (POI) recommendation aims to model user preferences from historical information to predict future mobility behavior, with significant application value in fields such as urban planning, traffic management, and business decision optimization. However, existing methods often overlook the differences among location, time, and category features, fail to fully utilize information from the various modalities, and lack effective means of handling users' incidental behavior; they also fall short in capturing users' personalized preferences. To address these issues, we propose a new method called Multi-Branch Spectral Network with Contrastive Learning (MBSCL-Net) for next-POI recommendation. We use a multi-head attention mechanism to separately capture the distinct features of location, time, and category information, and then fuse the captured features to effectively integrate cross-modal information, avoid feature confusion, and model multi-modal information. We convert the time-domain information of user check-ins into frequency-domain information through a Fourier transform, directly enhancing the low-frequency signals of users' periodic behavior and suppressing occasional high-frequency noise, thereby greatly alleviating the noise interference caused by introducing too much information. Additionally, we introduce a contrastive learning loss to distinguish user behavior patterns and better model personalized preferences. Extensive experiments on two real-world datasets demonstrate that MBSCL-Net outperforms state-of-the-art (SOTA) methods.
(This article belongs to the Section Intelligent Sensors)
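
The frequency-domain step can be made concrete: transform the check-in signal with an FFT, boost low-frequency (periodic) components, and damp high-frequency (incidental) ones. The cutoff and scaling factors below are illustrative, not the paper's learned values.

```python
import numpy as np

def spectral_denoise(seq, keep_ratio=0.2, boost=1.5, damp=0.3):
    """Boost low-frequency components of a check-in signal and damp
    high-frequency ones via the real FFT; parameters are illustrative."""
    spec = np.fft.rfft(seq)
    cutoff = max(1, int(len(spec) * keep_ratio))
    spec[:cutoff] *= boost   # periodic behaviour lives here
    spec[cutoff:] *= damp    # incidental visits look like noise
    return np.fft.irfft(spec, n=len(seq))

# Hourly visit counts: a daily rhythm plus random one-off check-ins.
t = np.arange(24 * 7)
signal = np.sin(2 * np.pi * t / 24) + 0.5 * np.random.randn(t.size)
print(spectral_denoise(signal)[:5])
```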

17 pages, 3935 KB  
Article
Markerless Force Estimation via SuperPoint-SIFT Fusion and Finite Element Analysis: A Sensorless Solution for Deformable Object Manipulation
by Qingqing Xu, Ruoyang Lai and Junqing Yin
Biomimetics 2025, 10(9), 600; https://doi.org/10.3390/biomimetics10090600 - 8 Sep 2025
Abstract
Contact-force perception is a critical component of safe robotic grasping. With the rapid advances in embodied intelligence technology, humanoid robots have enhanced their multimodal perception capabilities. Conventional force sensors face limitations, such as complex spatial arrangements, installation challenges at multiple nodes, and potential interference with robotic flexibility. Consequently, these conventional sensors are unsuitable for biomimetic robot requirements in object perception, natural interaction, and agile movement. Therefore, this study proposes a sensorless external force detection method that integrates SuperPoint-Scale Invariant Feature Transform (SIFT) feature extraction with finite element analysis to address force perception challenges. A visual analysis method based on the SuperPoint-SIFT feature fusion algorithm was implemented to reconstruct a three-dimensional displacement field of the target object. Subsequently, the displacement field was mapped to the contact force distribution using finite element modeling. Experimental results demonstrate a mean force estimation error of 7.60% (isotropic) and 8.15% (anisotropic), with RMSE < 8%, validated by flexible pressure sensors. To enhance the model's reliability, a dual-channel video comparison framework was developed. By analyzing the consistency of the deformation patterns and mechanical responses between the actual compression and finite element simulation video keyframes, the proposed approach provides a novel solution for real-time force perception in robotic interactions. The proposed solution is suitable for applications such as precision assembly and medical robotics, where sensorless force feedback is crucial.
(This article belongs to the Special Issue Bio-Inspired Intelligent Robot)
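
The classical half of the SuperPoint-SIFT pipeline can be sketched with OpenCV: match SIFT keypoints between an undeformed and a deformed frame to obtain sparse displacement vectors, which the finite element model would then map to contact forces. The learned SuperPoint detector and the FEA step are omitted here.

```python
import cv2
import numpy as np

def sparse_displacements(img_before, img_after, ratio=0.75):
    """Match SIFT keypoints between an undeformed and a deformed frame
    (8-bit grayscale arrays) and return 2D displacement vectors.

    Covers only the classical-SIFT half of the paper's SuperPoint-SIFT
    fusion; the learned-detector side is not reproduced here.
    """
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img_before, None)
    kp2, des2 = sift.detectAndCompute(img_after, None)
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    disps = []
    for m, n in matches:
        if m.distance < ratio * n.distance:      # Lowe's ratio test
            p1 = np.array(kp1[m.queryIdx].pt)
            p2 = np.array(kp2[m.trainIdx].pt)
            disps.append(p2 - p1)                # pixel displacement
    return np.array(disps)
```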

38 pages, 15014 KB  
Article
Web-Based Multimodal Deep Learning Platform with XRAI Explainability for Real-Time Skin Lesion Classification and Clinical Decision Support
by Serra Aksoy, Pinar Demircioglu and Ismail Bogrekci
Cosmetics 2025, 12(5), 194; https://doi.org/10.3390/cosmetics12050194 - 8 Sep 2025
Abstract
Background: Skin cancer represents one of the most prevalent malignancies worldwide, with melanoma accounting for approximately 75% of skin cancer-related deaths despite comprising fewer than 5% of cases. Early detection dramatically improves survival rates from 14% to over 99%, highlighting the urgent need for accurate and accessible diagnostic tools. While deep learning has shown promise in dermatological diagnosis, existing approaches lack clinical explainability and deployable interfaces that bridge the gap between research innovation and practical healthcare applications. Methods: This study implemented a comprehensive multimodal deep learning framework using the HAM10000 dataset (10,015 dermatoscopic images across seven diagnostic categories). Three CNN architectures (DenseNet-121, EfficientNet-B3, ResNet-50) were systematically compared, integrating patient metadata, including age, sex, and anatomical location, with dermatoscopic image analysis. The first implementation of XRAI (eXplanation with Region-based Attribution for Images) explainability for skin lesion classification was developed, providing spatially coherent explanations aligned with clinical reasoning patterns. A deployable web-based clinical interface was created, featuring real-time inference, comprehensive safety protocols, risk stratification, and evidence-based cosmetic recommendations for benign conditions. Results: EfficientNet-B3 achieved superior performance with 89.09% test accuracy and 90.08% validation accuracy, significantly outperforming DenseNet-121 (82.83%) and ResNet-50 (78.78%). Test-time augmentation improved performance by 1.00 percentage point to 90.09%. The model demonstrated excellent performance for critical malignant conditions: melanoma (81.6% confidence), basal cell carcinoma (82.1% confidence), and actinic keratoses (88% confidence). XRAI analysis revealed clinically meaningful attention patterns focusing on irregular pigmentation for melanoma, ulcerated borders for basal cell carcinoma, and surface irregularities for precancerous lesions. Error analysis showed that misclassifications occurred primarily in visually ambiguous cases with high correlation (0.855–0.968) between model attention and ideal features. The web application successfully validated real-time diagnostic capabilities with appropriate emergency protocols for malignant conditions and comprehensive cosmetic guidance for benign lesions. Conclusions: This research successfully developed the first clinically deployable skin lesion classification system combining diagnostic accuracy with explainable AI and practical patient guidance. The integration of XRAI explainability provides essential transparency for clinical acceptance, while the web-based deployment democratizes access to advanced dermatological AI capabilities. Comprehensive validation establishes readiness for controlled clinical trials and potential integration into healthcare workflows, particularly benefiting underserved regions with limited specialist availability. This work bridges the critical gap between research-grade AI models and practical clinical utility, establishing a foundation for responsible AI integration in dermatological practice.
(This article belongs to the Special Issue Feature Papers in Cosmetics in 2025)
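
A minimal sketch of the metadata fusion described in the Methods: pooled EfficientNet-B3 image features concatenated with an encoded patient-metadata vector (age, sex, site) before the classifier. The metadata encoder size and fusion layout are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
from torchvision import models

class LesionClassifier(nn.Module):
    """EfficientNet-B3 image features fused with patient metadata."""
    def __init__(self, n_meta=10, n_classes=7):
        super().__init__()
        backbone = models.efficientnet_b3(weights=None)
        feat_dim = backbone.classifier[1].in_features  # 1536 for B3
        backbone.classifier = nn.Identity()            # expose pooled features
        self.backbone = backbone
        self.meta = nn.Sequential(nn.Linear(n_meta, 32), nn.ReLU())
        self.head = nn.Linear(feat_dim + 32, n_classes)

    def forward(self, image, metadata):
        f = self.backbone(image)          # (B, 1536)
        m = self.meta(metadata)           # (B, 32)
        return self.head(torch.cat([f, m], dim=1))

logits = LesionClassifier()(torch.randn(2, 3, 300, 300), torch.randn(2, 10))
print(logits.shape)  # (2, 7) class logits
```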

23 pages, 9447 KB  
Article
Multi-Modal Side-Channel Analysis Based on Isometric Compression and Combined Clustering
by Xiaoyong Kou, Wei Yang, Lunbo Li and Gongxuan Zhang
Symmetry 2025, 17(9), 1483; https://doi.org/10.3390/sym17091483 - 8 Sep 2025
Abstract
Side-channel analysis (SCA) poses a persistent threat to cryptographic hardware by exploiting unintended physical leakages. To address the limitations of traditional single-modality SCA methods, we propose a novel multi-modal side-channel analysis framework that targets the recovery of encryption keys by leveraging the imperfections inherent in hardware implementations. The core objective is to extract and classify information-rich segments from power and electromagnetic (EM) signals in order to recover secret keys without profiling or labeling. Our approach introduces a unified pipeline combining joint peak-based segmentation, isometric compression of variable-length trace segments, and multi-modal feature fusion. A key component of the framework is unsupervised clustering, which serves to automatically classify trace segments corresponding to different cryptographic operations (e.g., different key-dependent leakage classes), thereby enabling key byte hypothesis testing and full key reconstruction. Experimental results on an FPGA-based AES-128 implementation demonstrate that our method achieves up to 99.2% clustering accuracy and successfully recovers the entire encryption key using as few as 1–3 traces. Moreover, the proposed approach significantly reduces sample complexity and maintains resilience in low signal-to-noise conditions. These results highlight the practicality of our technique for side-channel vulnerability assessment and its potential to inform the design of more robust cryptographic hardware.
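
The isometric compression and clustering steps can be sketched as follows: resample variable-length trace segments onto a fixed grid, then cluster them without labels. The toy segments below stand in for peak-segmented power/EM traces; key-hypothesis testing is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

def isometric_compress(segment, target_len=256):
    """Resample a variable-length trace segment onto a fixed grid so
    segments of different lengths become comparable feature vectors."""
    src = np.linspace(0.0, 1.0, num=len(segment))
    dst = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(dst, src, segment)

# Synthetic stand-ins for two key-dependent leakage classes.
rng = np.random.default_rng(0)
segments = [rng.standard_normal(rng.integers(200, 400)) + label
            for label in (0, 0, 3, 3, 3, 0)]
X = np.stack([isometric_compress(s) for s in segments])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # segments grouped by leakage class, no labels needed
```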

16 pages, 846 KB  
Article
MMKT: Multimodal Sentiment Analysis Model Based on Knowledge-Enhanced and Text-Guided Learning
by Chengkai Shi and Yunhua Zhang
Appl. Sci. 2025, 15(17), 9815; https://doi.org/10.3390/app15179815 (registering DOI) - 7 Sep 2025
Abstract
Multimodal Sentiment Analysis (MSA) aims to predict subjective human emotions by leveraging multimodal information. However, existing research inadequately utilizes the explicit sentiment semantics of individual words in text and overlooks noise interference from non-dominant modalities, such as irrelevant movements in the visual modality and background noise in the audio modality. To address these issues, we propose a multimodal sentiment analysis model based on knowledge enhancement and text-guided learning (MMKT). The model constructs a sentiment knowledge graph for the textual modality using the SenticNet knowledge base. This graph directly annotates word-level sentiment polarity, strengthening the model's understanding of emotional vocabulary, and global sentiment knowledge features are generated through graph embedding computations to enhance the multimodal fusion process. Simultaneously, a dynamic text-guided learning approach is introduced that leverages multi-scale textual features to actively suppress redundant or conflicting information in the visual and audio modalities, thereby generating purer cross-modal representations. Finally, the concatenated textual, cross-modal, and knowledge features are used for sentiment prediction. Experimental results on the CMU-MOSEI and Twitter2019 datasets demonstrate the superior performance of the MMKT model.
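
One plausible reading of the dynamic text-guided learning is a text-conditioned gate over each non-dominant modality; the single sigmoid gate below is a deliberate simplification, with all dimensions assumed.

```python
import torch
import torch.nn as nn

class TextGuidedGate(nn.Module):
    """Use text features to gate a non-dominant modality, damping
    channels that conflict with the textual sentiment cues."""
    def __init__(self, text_dim=128, mod_dim=64):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(text_dim, mod_dim), nn.Sigmoid())

    def forward(self, text_feat, modality_feat):
        # text_feat: (B, text_dim); modality_feat: (B, T, mod_dim)
        g = self.gate(text_feat).unsqueeze(1)   # (B, 1, mod_dim)
        return modality_feat * g                # suppress noisy channels

out = TextGuidedGate()(torch.randn(2, 128), torch.randn(2, 50, 64))
print(out.shape)  # (2, 50, 64)
```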

14 pages, 1276 KB  
Protocol
Integration of EHR and ECG Data for Predicting Paroxysmal Atrial Fibrillation in Stroke Patients
by Alireza Vafaei Sadr, Manvita Mareboina, Diana Orabueze, Nandini Sarkar, Seyyed Sina Hejazian, Ajith Vemuri, Ravi Shah, Ankit Maheshwari, Ramin Zand and Vida Abedi
Bioengineering 2025, 12(9), 961; https://doi.org/10.3390/bioengineering12090961 (registering DOI) - 7 Sep 2025
Abstract
Predicting paroxysmal atrial fibrillation (PAF) is challenging due to its transient nature. Existing methods often rely solely on electrocardiogram (ECG) waveforms or Electronic Health Record (EHR)-based clinical risk factors. We hypothesized that explicitly balancing the contributions of these heterogeneous data sources could improve prediction accuracy. We developed a Transformer-based deep learning model that integrates 12-lead ECG signals and 47 structured EHR variables from 189 patients with cryptogenic stroke, including 49 with PAF. By systematically varying the relative contributions of ECG and EHR data, we identified an optimal ratio for prediction. The best performance (accuracy: 0.70, sensitivity: 0.72, specificity: 0.87, area under the receiver operating characteristic curve (AUROC): 0.65, area under the precision-recall curve (AUPRC): 0.43) was achieved under 5-fold cross-validation when EHR data contributed one-third and ECG data two-thirds of the model's input. This multimodal approach outperformed unimodal models, improving accuracy by 35% over EHR-only and 5% over ECG-only methods. Our results support the value of combining ECG and structured EHR information to improve accuracy and sensitivity in this pilot cohort, motivating validation in larger studies.
(This article belongs to the Special Issue Machine Learning Technology in Predictive Healthcare)
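
The fixed modality ratio can be sketched directly: project each modality to a shared width and scale by the reported one-third EHR to two-thirds ECG split before the downstream encoder. Dimensions are placeholders, and the paper's actual weighting mechanism may differ.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Project each modality to a shared width, then scale by a fixed
    contribution ratio before a downstream Transformer encoder."""
    def __init__(self, ehr_dim=47, ecg_dim=256, d_model=128, ehr_w=1/3):
        super().__init__()
        self.ehr_proj = nn.Linear(ehr_dim, d_model)
        self.ecg_proj = nn.Linear(ecg_dim, d_model)
        self.ehr_w, self.ecg_w = ehr_w, 1.0 - ehr_w  # 1/3 EHR, 2/3 ECG

    def forward(self, ehr, ecg):
        return self.ehr_w * self.ehr_proj(ehr) + self.ecg_w * self.ecg_proj(ecg)

fused = WeightedFusion()(torch.randn(4, 47), torch.randn(4, 256))
print(fused.shape)  # (4, 128)
```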

23 pages, 10200 KB  
Article
Real-Time Driver State Detection Using mmWave Radar: A Spatiotemporal Fusion Network for Behavior Monitoring on Edge Platforms
by Shih-Pang Tseng, Wun-Yang Wu, Jhing-Fa Wang and Dawei Tao
Electronics 2025, 14(17), 3556; https://doi.org/10.3390/electronics14173556 - 7 Sep 2025
Abstract
Fatigue and distracted driving are among the leading causes of traffic accidents, highlighting the importance of efficient and non-intrusive driver monitoring systems. Traditional camera-based methods are often limited by lighting variations, occlusions, and privacy concerns. In contrast, millimeter-wave (mmWave) radar offers a non-contact, privacy-preserving, and environment-robust alternative. This study introduces a novel deep learning model, RTSFN (Radar-based Temporal-Spatial Fusion Network), which simultaneously analyzes the temporal motion changes and spatial posture features of the driver. RTSFN incorporates a cross-gated fusion mechanism that dynamically integrates multi-modal information, enhancing feature complementarity and stabilizing behavior recognition. Experimental results show that RTSFN detects dangerous driving states with an average F1 score of 94%, recognizes specific high-risk behaviors with an average F1 score of 97%, and runs in real time on edge devices such as the NVIDIA Jetson Orin Nano, demonstrating strong potential for deployment in intelligent transportation and in-vehicle safety systems.
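
A sketch of one way cross-gated fusion can be wired, with each branch gating the other before concatenation; the abstract does not specify RTSFN's exact design, so this is an assumed layout.

```python
import torch
import torch.nn as nn

class CrossGatedFusion(nn.Module):
    """Each branch gates the other: temporal features decide how much
    spatial evidence passes through, and vice versa."""
    def __init__(self, dim=128):
        super().__init__()
        self.gate_t = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.gate_s = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, temporal, spatial):
        fused_s = spatial * self.gate_t(temporal)  # temporal gates spatial
        fused_t = temporal * self.gate_s(spatial)  # spatial gates temporal
        return torch.cat([fused_t, fused_s], dim=-1)

out = CrossGatedFusion()(torch.randn(2, 128), torch.randn(2, 128))
print(out.shape)  # (2, 256)
```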

33 pages, 4897 KB  
Review
Recent Advances in Sensor Fusion Monitoring and Control Strategies in Laser Powder Bed Fusion: A Review
by Alexandra Papatheodorou, Nikolaos Papadimitriou, Emmanuel Stathatos, Panorios Benardos and George-Christopher Vosniakos
Machines 2025, 13(9), 820; https://doi.org/10.3390/machines13090820 (registering DOI) - 6 Sep 2025
Abstract
Laser Powder Bed Fusion (LPBF) has emerged as a leading additive manufacturing (AM) process for producing complex metal components. Despite its advantages, the inherent LPBF process complexity leads to challenges in achieving consistent quality and repeatability. To address these concerns, recent research efforts have focused on sensor fusion techniques for process monitoring and on developing more elaborate control strategies. Sensor fusion combines information from multiple in situ sensors to provide more comprehensive insights into process characteristics such as melt pool behavior, spatter formation, and layer integrity. By leveraging multimodal data sources, sensor fusion enhances the detection and diagnosis of process anomalies in real time. Closed-loop control systems may utilize this fused information to adjust key process parameters, such as laser power, focal depth, and scanning speed, to mitigate defect formation during the build process. This review focuses on the current state of the art in sensor fusion monitoring and control strategies for LPBF. In terms of sensor fusion, recent advances extend beyond CNN-based approaches to include graph-based, attention, and transformer architectures. Among these, feature-level integration has shown the best balance between accuracy and computational cost. However, the limited volume of available experimental data, class-imbalance issues, and lack of standardization still hinder further progress. In terms of control, a trend away from purely physics-based towards Machine Learning (ML)-assisted and hybrid strategies can be observed. These strategies show promise for more adaptive and effective quality enhancement. The biggest challenge is broader validation on more complex part geometries and under realistic conditions using commercial LPBF systems.
(This article belongs to the Special Issue In Situ Monitoring of Manufacturing Processes)
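
Feature-level integration, which the review identifies as the best accuracy/cost balance, amounts to concatenating per-sensor feature vectors before a single classifier. The sensors, feature counts, and data below are illustrative only, not from any specific LPBF setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Feature-level fusion: per-sensor feature vectors (e.g., photodiode
# intensity statistics and melt-pool image descriptors) are
# concatenated before one classifier. All values here are synthetic.
rng = np.random.default_rng(1)
photodiode_feats = rng.standard_normal((200, 8))
camera_feats = rng.standard_normal((200, 16))
labels = rng.integers(0, 2, 200)            # 0 = nominal, 1 = defect

fused = np.hstack([photodiode_feats, camera_feats])   # (200, 24)
clf = RandomForestClassifier(random_state=0).fit(fused, labels)
print(clf.predict(fused[:5]))
```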

26 pages, 8009 KB  
Article
Bearing Fault Diagnosis Based on Golden Cosine Scheduler-1DCNN-MLP-Cross-Attention Mechanisms (GCOS-1DCNN-MLP-Cross-Attention)
by Aimin Sun, Kang He, Meikui Dai, Liyong Ma, Hongli Yang, Fang Dong, Chi Liu, Zhuo Fu and Mingxing Song
Machines 2025, 13(9), 819; https://doi.org/10.3390/machines13090819 (registering DOI) - 6 Sep 2025
Abstract
In contemporary industrial machinery, bearings are vital components, so the ability to diagnose bearing faults is extremely important. Current methodologies face challenges in feature extraction and perform suboptimally in environments with high noise levels. This paper proposes an enhanced multimodal feature-fusion bearing fault diagnosis model that integrates a 1DCNN-dual-MLP framework with an enhanced two-way cross-attention mechanism for in-depth feature fusion. First, the raw fault time-series data undergo a fast Fourier transform (FFT). The original time-series data are then input into a multi-layer perceptron (MLP) and a one-dimensional convolutional neural network (1DCNN), while the frequency-domain data are fed into a second MLP, extracting deep features in both the time and frequency domains. These features are passed to a serial bidirectional cross-attention mechanism for feature fusion. In addition, a GCOS learning rate scheduler was developed to automatically adjust the learning rate. Over fifteen independent experiments on the Case Western Reserve University bearing dataset, the fusion model achieved an average accuracy of 99.83%. Even in a high-noise environment (0 dB), the model achieved an accuracy of 90.66%, and its accuracy remains at 86.73% under 0 dB noise combined with variable operating conditions, fully demonstrating its exceptional robustness.
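
The bidirectional cross-attention fusion can be sketched with two nn.MultiheadAttention blocks, one per direction, over time-domain and frequency-domain embeddings; the dimensions and mean-pooling are assumptions, and the stand-in tensors replace the 1DCNN/MLP branch outputs.

```python
import torch
import torch.nn as nn

class BidirectionalCrossAttention(nn.Module):
    """Fuse time-domain and frequency-domain deep features with one
    attention block per direction (dimensions assumed)."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.t2f = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.f2t = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, time_emb, freq_emb):  # both (B, L, dim)
        a, _ = self.t2f(time_emb, freq_emb, freq_emb)  # time queries freq
        b, _ = self.f2t(freq_emb, time_emb, time_emb)  # freq queries time
        return torch.cat([a.mean(dim=1), b.mean(dim=1)], dim=-1)

# Stand-ins for the 1DCNN and MLP branch outputs; in the paper the
# frequency branch sees FFT magnitudes of the raw vibration signal.
fused = BidirectionalCrossAttention()(torch.randn(2, 16, 64),
                                      torch.randn(2, 16, 64))
print(fused.shape)  # (2, 128)
```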

24 pages, 4829 KB  
Article
Home Robot Interaction Based on EEG Motor Imagery and Visual Perception Fusion
by Tie Hua Zhou, Dongsheng Li, Zhiwei Jian, Wei Ding and Ling Wang
Sensors 2025, 25(17), 5568; https://doi.org/10.3390/s25175568 - 6 Sep 2025
Abstract
Amid intensifying demographic aging, home robots based on intelligent technology have shown great potential for assisting the daily life of the elderly. This paper proposes a multimodal human–robot interaction system that integrates EEG signal analysis and visual perception, aiming to give home robots the ability to perceive the intentions and environment of elderly users. First, a channel selection strategy identifies the most discriminative electrode channels for Motor Imagery (MI) EEG signals; signal representation is then improved by combining Filter Bank Common Spatial Patterns (FBCSP), wavelet packet decomposition, and nonlinear features, and one-vs-rest Support Vector Regression (SVR) is used for four-class classification. Second, the YOLOv8 model identifies objects in indoor scenes; object confidence and spatial distribution are extracted, and scene recognition is performed with a machine learning technique. Finally, the EEG classification results are combined with the scene recognition results to establish a scene-intention correspondence, enabling recognition of the intention-driven task types of the elderly in different home scenes. Performance evaluation shows that the proposed method attains a recognition accuracy of 83.4%, indicating good classification accuracy and practical value for multimodal perception and human–robot collaborative interaction, and providing technical support for the development of smarter, more personalized home assistance robots.
(This article belongs to the Section Electronic Sensors)
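
One-vs-rest SVR four-class classification can be sketched as fitting one regressor per class on binary targets and taking the argmax of the four predicted scores; the random vectors below stand in for FBCSP/wavelet/nonlinear feature vectors.

```python
import numpy as np
from sklearn.svm import SVR

# One regressor per MI class, trained on binary targets
# (1.0 for that class, 0.0 otherwise); predict by argmax.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((80, 12))   # stand-in EEG feature vectors
y_train = rng.integers(0, 4, 80)          # four MI classes

models = [SVR(kernel="rbf").fit(X_train, (y_train == cls).astype(float))
          for cls in range(4)]

X_test = rng.standard_normal((5, 12))
scores = np.stack([m.predict(X_test) for m in models], axis=1)  # (5, 4)
print(scores.argmax(axis=1))  # predicted class per test sample
```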

29 pages, 1588 KB  
Review
A Review of Dynamic Traffic Flow Prediction Methods for Global Energy-Efficient Route Planning
by Pengyang Qi, Chaofeng Pan, Xing Xu, Jian Wang, Jun Liang and Weiqi Zhou
Sensors 2025, 25(17), 5560; https://doi.org/10.3390/s25175560 - 5 Sep 2025
Abstract
Urbanization and traffic congestion caused by the surge in car ownership have exacerbated energy consumption and carbon emissions, making dynamic traffic flow prediction and energy-efficient route planning key to solving this problem. Dynamic traffic flow prediction captures the spatio-temporal evolution of traffic flow through advanced algorithms and models, providing forward-looking information for traffic management and travel decision-making. Energy-efficient route planning optimizes travel routes based on the prediction results, reducing the time vehicles spend on congested road sections and thereby cutting fuel consumption and exhaust emissions. However, existing research mostly applies a single model in isolation, lacks systematic comparison of the adaptability, generalization ability, and fusion potential of different models across scenarios, and has not exploited the advantages of heterogeneous graph neural networks for integrating multi-source heterogeneous traffic data. This paper systematically reviews global studies from 2020 to 2025, focusing on the integration of dynamic traffic flow prediction methods with energy-efficient route planning. By surveying the application of statistical models, machine learning, deep learning, and hybrid methods in traffic forecasting, and comparing their performance with RMSE, MAPE, and other indicators, it highlights the strengths of LSTM, graph neural networks, and other models in capturing spatio-temporal features, and notes that the potential of heterogeneous graph neural networks for multi-source heterogeneous data integration remains underexplored. To address the disconnect between traffic prediction and path planning, an integrated framework is constructed in which real-time prediction results feed path algorithms such as A* and Dijkstra through multi-objective cost functions that balance distance, time, and energy consumption. Finally, the challenges of data quality, algorithm efficiency, and multimodal adaptation are analyzed, and directions toward standardized evaluation platforms and open-source toolkits are proposed, providing theoretical support and a practical path for the sustainable development of intelligent transportation systems.
(This article belongs to the Section Vehicular Sensing)
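
The multi-objective cost function feeding Dijkstra can be sketched as a weighted scalarization of distance, time, and energy per edge. The weights and toy graph below are illustrative; in the reviewed framework, the time and energy terms would come from the traffic-flow predictor.

```python
import heapq

def dijkstra_multiobjective(graph, start, goal, w=(0.4, 0.3, 0.3)):
    """Dijkstra over a scalarized cost: w1*distance + w2*time + w3*energy.

    graph maps node -> list of (neighbor, distance, time, energy);
    weights and edge values are illustrative placeholders."""
    pq, seen = [(0.0, start, [start])], set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, dist, time, energy in graph.get(node, []):
            edge = w[0] * dist + w[1] * time + w[2] * energy
            heapq.heappush(pq, (cost + edge, nbr, path + [nbr]))
    return float("inf"), []

graph = {"A": [("B", 2, 5, 1), ("C", 4, 2, 2)],
         "B": [("D", 3, 3, 1)], "C": [("D", 1, 1, 3)]}
print(dijkstra_multiobjective(graph, "A", "D"))  # (cost, route)
```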

24 pages, 893 KB  
Article
Multi-Modal Topology-Aware Graph Neural Network for Robust Chemical–Protein Interaction Prediction
by Jianshi Wang
Int. J. Mol. Sci. 2025, 26(17), 8666; https://doi.org/10.3390/ijms26178666 (registering DOI) - 5 Sep 2025
Abstract
Reliable prediction of chemical–protein interactions (CPIs) remains a key challenge in drug discovery, especially under sparse or noisy biological data. We present MM-TCoCPIn, a Multi-Modal Topology-aware Chemical–Protein Interaction Network that integrates three causally grounded modalities (network topology, biomedical semantics, and 3D protein structure) into an interpretable graph learning framework. The model processes topological features via a CTC (Comprehensive Topological Characteristics)-based encoder, literature-derived semantics via SciBERT (Scientific Bidirectional Encoder Representations from Transformers), and structural geometry via a GVP-GNN (Geometric Vector Perceptron Graph Neural Network) applied to AlphaFold2 contact graphs. Evaluation on datasets from STITCH, STRING, and PubMed shows that MM-TCoCPIn achieves state-of-the-art performance (AUC = 0.93, F1 = 0.92), outperforming uni-modal baselines. Importantly, ablation and counterfactual analyses confirm that each modality contributes distinct biological insight: topology ensures robustness, semantics enhance recall, and structure sharpens precision. This framework offers a scalable and causally interpretable solution for CPI modeling, bridging the gap between predictive accuracy and mechanistic understanding.
(This article belongs to the Section Molecular Informatics)
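
At its simplest, the tri-modal fusion reduces to concatenating the three encoder outputs and scoring the chemical-protein pair with an MLP. The sketch below uses stand-in dimensions for the CTC, SciBERT, and GVP-GNN embeddings and is not the paper's architecture.

```python
import torch
import torch.nn as nn

class TriModalFusion(nn.Module):
    """Concatenate topology, text, and structure embeddings, then
    score a chemical-protein pair (all dimensions assumed)."""
    def __init__(self, topo_dim=32, text_dim=768, struct_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(topo_dim + text_dim + struct_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, topo, text, struct):
        return torch.sigmoid(self.mlp(torch.cat([topo, text, struct], dim=-1)))

p = TriModalFusion()(torch.randn(4, 32), torch.randn(4, 768), torch.randn(4, 128))
print(p.shape)  # (4, 1) interaction probabilities
```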

21 pages, 13169 KB  
Article
Automated Rice Seedling Segmentation and Unsupervised Health Assessment Using Segment Anything Model with Multi-Modal Feature Analysis
by Hassan Rezvan, Mohammad Javad Valadan Zoej, Fahimeh Youssefi and Ebrahim Ghaderpour
Sensors 2025, 25(17), 5546; https://doi.org/10.3390/s25175546 - 5 Sep 2025
Abstract
This research presents a fully automated two-step method for segmenting rice seedlings and assessing their health by integrating spectral, morphological, and textural features. Driven by the global need for increased food production, the proposed method enhances monitoring and control in agricultural processes. Seedling locations are first identified with the excess-green-minus-excess-red index, which enables automated point-prompt inputs for the Segment Anything Model to achieve precise segmentation and masking. Morphological features are extracted from the generated masks, while spectral and textural features are derived from the corresponding red–green–blue imagery. Health assessment is conducted through anomaly detection with a one-class support vector machine, which identifies seedlings exhibiting abnormal morphology or spectral signatures suggestive of stress. The proposed method is validated by visual inspection and the silhouette score, confirming effective separation of anomalies. For segmentation, the method achieved mean Dice scores ranging from 72.6 to 94.7. For plant health assessment, silhouette scores ranged from 0.31 to 0.44 across both datasets and various growth stages. Applied across three consecutive rice growth stages, the framework facilitates temporal monitoring of seedling health. The findings highlight the potential of advanced segmentation and anomaly detection techniques to support timely interventions, such as pruning or replacing unhealthy seedlings, to optimize crop yield.
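
The vegetation index and the anomaly detector are both standard components and easy to sketch: the excess-green-minus-excess-red index, ExG - ExR = (2g - r - b) - (1.4r - g), locates green seedlings, and a one-class SVM flags outlying per-seedling feature vectors. The toy features below are synthetic, not from the paper's datasets.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def exg_minus_exr(rgb):
    """Excess-green-minus-excess-red index on a float RGB image in [0, 1]:
    ExG = 2g - r - b, ExR = 1.4r - g."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (2 * g - r - b) - (1.4 * r - g)

rng = np.random.default_rng(0)
print(exg_minus_exr(rng.random((8, 8, 3))).mean())  # index on a toy image

# Toy per-seedling feature vectors (e.g., mean index, mask area,
# texture stats); OneClassSVM marks outliers (-1) as possible stress.
healthy = rng.normal(loc=0.0, scale=1.0, size=(60, 3))
stressed = rng.normal(loc=4.0, scale=1.0, size=(3, 3))
X = np.vstack([healthy, stressed])
flags = OneClassSVM(nu=0.1, gamma="scale").fit(healthy).predict(X)
print(flags[-5:])  # trailing entries include the injected anomalies
```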
