Search Results (1,577)

Search Parameters:
Keywords = multi-modal fusion

24 pages, 3303 KB  
Article
A Modular Framework for RGB Image Processing and Real-Time Neural Inference: A Case Study in Microalgae Culture Monitoring
by José Javier Gutiérrez-Ramírez, Ricardo Enrique Macias-Jamaica, Víctor Manuel Zamudio-Rodríguez, Héctor Arellano Sotelo, Dulce Aurora Velázquez-Vázquez, Juan de Anda-Suárez and David Asael Gutiérrez-Hernández
Eng 2025, 6(9), 221; https://doi.org/10.3390/eng6090221 - 2 Sep 2025
Abstract
Recent progress in computer vision and embedded systems has facilitated real-time monitoring of bioprocesses; however, lightweight and scalable solutions for resource-constrained settings remain limited. This work presents a modular framework for monitoring Chlorella vulgaris growth by integrating RGB image processing with multimodal sensor fusion. The system incorporates a Logitech C920 camera and low-cost pH and temperature sensors within a compact photobioreactor. It extracts RGB channel statistics, luminance, and environmental data to generate a 10-dimensional feature vector. A feedforward artificial neural network (ANN) with ReLU activations, dropout layers, and SMOTE-based data balancing was trained to classify growth phases: lag, exponential, and stationary. The optimized model, quantized to 8 bits, was deployed on an ESP32 microcontroller, achieving 98.62% accuracy with 4.8 ms inference time and a 13.48 kB memory footprint. Robustness analysis confirmed tolerance to geometric transformations, though variable lighting reduced performance. Principal component analysis (PCA) retained 95% variance, supporting the discriminative power of the features. The proposed system outperformed previous vision-only methods, demonstrating the advantages of multimodal fusion for early detection. Limitations include sensitivity to lighting and validation limited to a single species. Future directions include incorporating active lighting control and extending the model to multi-species classification for broader applicability. Full article
(This article belongs to the Special Issue Artificial Intelligence for Engineering Applications, 2nd Edition)
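The abstract above describes concatenating RGB channel statistics, luminance, and sensor readings into a 10-dimensional feature vector that feeds a small feedforward classifier. The sketch below illustrates that general pattern in Python; the exact feature composition, layer sizes, and sensor set are assumptions for illustration, not details taken from the paper, and the 8-bit quantization and ESP32 deployment steps are omitted.

```python
# Hedged sketch of a 10-D feature vector (assumed composition) and a small
# ReLU/dropout classifier for three growth phases (lag/exponential/stationary).
import numpy as np
import torch
import torch.nn as nn

def rgb_features(frame: np.ndarray, ph: float, temp_c: float, hours: float) -> np.ndarray:
    """frame: HxWx3 uint8 RGB image. Returns a 10-dimensional feature vector."""
    rgb = frame.astype(np.float32) / 255.0
    means = rgb.reshape(-1, 3).mean(axis=0)          # per-channel mean (3)
    stds = rgb.reshape(-1, 3).std(axis=0)            # per-channel std  (3)
    luminance = (0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]).mean()
    return np.concatenate([means, stds, [luminance, ph, temp_c, hours]]).astype(np.float32)

class GrowthPhaseANN(nn.Module):
    """Feedforward net with ReLU activations and dropout; sizes are placeholders."""
    def __init__(self, in_dim: int = 10, hidden: int = 32, n_classes: int = 3, p: float = 0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Example: classify one synthetic frame with placeholder sensor readings.
frame = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
x = torch.from_numpy(rgb_features(frame, ph=7.2, temp_c=25.0, hours=48.0)).unsqueeze(0)
print(GrowthPhaseANN()(x).softmax(dim=-1))
```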
20 pages, 4992 KB  
Article
Path Loss Prediction Model of 5G Signal Based on Fusing Data and XGBoost—SHAP Method
by Tingting Xu, Nuo Xu, Jay Gao, Yadong Zhou and Haoran Ma
Sensors 2025, 25(17), 5440; https://doi.org/10.3390/s25175440 - 2 Sep 2025
Abstract
The accurate prediction of path loss is essential for planning and optimizing communication networks, as it directly impacts the user experience. In 5G signal propagation, the mix of varied terrain and dense high-rise buildings poses significant challenges. For example, signals are more prone to multipath effects, and occlusion and shadowing occur frequently, leading to strong nonlinearities and uncertainties in the signal path. Traditional and shallow models often fail to accurately depict 5G signal characteristics in complex terrains, limiting the accuracy of path loss modeling. To address this issue, our research introduces innovative feature engineering and prediction models for 5G signals. By utilizing smartphones as signal receivers and creating a multimodal system that captures 3D structures and obstructions in the N1 and N78 bands in China, the study aimed to overcome the shortcomings of traditional linear models, especially in mountainous areas. It employed the XGBoost algorithm with Optuna for hyperparameter tuning, improving model performance. After training on real 5G data, the model achieved a breakthrough in 5G signal path loss prediction, with an R2 of 0.76 and an RMSE of 3.81 dBm. Additionally, SHAP values were employed to interpret the results, revealing the relative impact of various environmental features on 5G signal path loss. This research enhances the accuracy and stability of predictions and offers a technical framework and theoretical foundation for planning and optimizing wireless communication networks in complex environments and terrains. Full article
(This article belongs to the Section Communications)
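The abstract above outlines a pipeline of XGBoost regression tuned with Optuna and interpreted with SHAP. A minimal sketch of that pipeline follows, using synthetic placeholder features and search ranges; the paper's actual features, data, and tuning space are not reproduced here.

```python
# Hedged sketch of an XGBoost + Optuna + SHAP pipeline on synthetic data.
import numpy as np
import optuna
import shap
import xgboost as xgb
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))            # placeholder features (e.g., distance, clutter, ...)
y = 100 + 20 * np.log10(1 + np.abs(X[:, 0]) * 50) + rng.normal(scale=3.0, size=2000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

def objective(trial: optuna.Trial) -> float:
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 600),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
    }
    model = xgb.XGBRegressor(**params)
    model.fit(X_tr, y_tr)
    return float(np.sqrt(mean_squared_error(y_te, model.predict(X_te))))  # RMSE

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)

best = xgb.XGBRegressor(**study.best_params).fit(X_tr, y_tr)
explainer = shap.TreeExplainer(best)      # per-feature contributions to predicted path loss
shap_values = explainer.shap_values(X_te)
print(study.best_value, np.abs(shap_values).mean(axis=0))
```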
17 pages, 1447 KB  
Article
A Novel Prediction Model for Multimodal Medical Data Based on Graph Neural Networks
by Lifeng Zhang, Teng Li, Hongyan Cui, Quan Zhang, Zijie Jiang, Jiadong Li, Roy E. Welsch and Zhongwei Jia
Mach. Learn. Knowl. Extr. 2025, 7(3), 92; https://doi.org/10.3390/make7030092 - 2 Sep 2025
Abstract
Multimodal medical data provides a broad and realistic basis for disease diagnosis. Computer-aided diagnosis (CAD) powered by artificial intelligence (AI) is becoming increasingly prominent in disease diagnosis. CAD for multimodal medical data requires addressing the issues of data fusion and prediction. Traditionally, the prediction performance of CAD models has been limited by complicated dimensionality reduction. Therefore, this paper proposes a fusion and prediction model—EPGC—for multimodal medical data based on graph neural networks. Firstly, we select features from unstructured multimodal medical data and quantify them. Then, we transform the multimodal medical data into a graph structure by treating each patient as a node and establishing edges based on the similarity of features between patients; normalization of the data is also essential in this process. Finally, we build a node prediction model based on graph neural networks and classify the nodes, thereby predicting the patients’ diseases. The model is validated on two publicly available heart disease datasets. Compared with existing models, which typically involve dimensionality reduction, classification, or complex deep learning networks, the proposed model achieves outstanding results on the experimental dataset. This demonstrates that the fusion and diagnosis of multimodal data can be achieved effectively without dimensionality reduction or intricate deep learning networks. We hope this exploration of unstructured multimodal medical data with deep learning will support breakthroughs in related fields. Full article
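The abstract above turns tabular patient records into a graph in which each patient is a node and edges connect patients with similar features. A minimal numpy sketch of that construction follows; the cosine-similarity measure, the threshold, and the single GCN-style propagation step are assumptions for illustration rather than EPGC's actual design.

```python
# Patient-similarity graph: nodes are patients, edges join similar feature vectors.
import numpy as np

def build_patient_graph(features: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """features: (n_patients, n_features), already normalized. Returns an adjacency matrix."""
    norms = np.linalg.norm(features, axis=1, keepdims=True) + 1e-12
    cos = (features / norms) @ (features / norms).T       # cosine similarity between patients
    adj = (cos >= threshold).astype(np.float32)
    np.fill_diagonal(adj, 1.0)                            # self-loops
    return adj

def propagate(adj: np.ndarray, features: np.ndarray) -> np.ndarray:
    """One GCN-style propagation step: D^-1/2 A D^-1/2 X."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return d_inv_sqrt @ adj @ d_inv_sqrt @ features

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 12))                             # 50 patients, 12 quantified features
X = (X - X.mean(axis=0)) / X.std(axis=0)                  # normalization step noted in the abstract
A = build_patient_graph(X)
H = propagate(A, X)                                       # node embeddings for a downstream classifier
print(A.sum(), H.shape)
```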
25 pages, 487 KB  
Review
Deformable and Fragile Object Manipulation: A Review and Prospects
by Yicheng Zhu, David Yang and Yangming Lee
Sensors 2025, 25(17), 5430; https://doi.org/10.3390/s25175430 - 2 Sep 2025
Abstract
Deformable object manipulation (DOM) is a primary bottleneck for the real-world application of autonomous robots, requiring advanced frameworks for sensing, perception, modeling, planning, and control. When fragile objects such as soft tissues or fruits are involved, ensuring safety becomes the paramount concern, fundamentally altering the manipulation problem from one of pure trajectory optimization to one of constrained optimization and real-time adaptive control. Existing DOM methodologies, however, often fall short of addressing fragility constraints as a core design feature, leading to significant gaps in real-time adaptiveness and generalization. This review systematically examines individual components in DOM with a focus on their effectiveness in handling fragile objects. We identified key limitations in current approaches and, based on this analysis, discussed a promising framework that utilizes both low-latency reflexive mechanisms and global optimization to dynamically adapt to specific object instances. Full article
(This article belongs to the Special Issue Advanced Robotic Manipulators and Control Applications)
16 pages, 2827 KB  
Article
A Dual-Modality CNN Approach for RSS-Based Indoor Positioning Using Spatial and Frequency Fingerprints
by Xiangchen Lai, Yunzhi Luo and Yong Jia
Sensors 2025, 25(17), 5408; https://doi.org/10.3390/s25175408 - 2 Sep 2025
Abstract
Indoor positioning systems based on received signal strength (RSS) achieve indoor positioning by leveraging the position-related features inherent in spatial RSS fingerprint images. Their positioning accuracy and robustness are directly influenced by the quality of fingerprint features. However, the inherent spatial low-resolution characteristic of spatial RSS fingerprint images makes it challenging to effectively extract subtle fingerprint features. To address this issue, this paper proposes an RSS-based indoor positioning method that combines enhanced spatial frequency fingerprint representation with fusion learning. First, bicubic interpolation is applied to improve image resolution and reveal finer spatial details. Then, a 2D fast Fourier transform (2D FFT) converts the enhanced spatial images into frequency domain representations to supplement spectral features. These spatial and frequency fingerprints are used as dual-modality inputs for a parallel convolutional neural network (CNN) model with efficient multi-scale attention (EMA) modules. The model extracts modality-specific features and fuses them to generate enriched representations. Each modality—spatial, frequency, and fused—is passed through a dedicated fully connected network to predict 3D coordinates. A coordinate optimization strategy is introduced to select the two most reliable outputs for each axis (x, y, z), and their average is used as the final estimate. Experiments on seven public datasets show that the proposed method significantly improves positioning accuracy, reducing the mean positioning error by up to 47.1% and root mean square error (RMSE) by up to 54.4% compared with traditional and advanced time–frequency methods. Full article
(This article belongs to the Section Navigation and Positioning)
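The abstract above prepares two inputs per fingerprint: a bicubically upsampled spatial image and its 2D FFT as a frequency-domain counterpart. The short sketch below shows one way to produce such a dual-modality pair; the grid size, scale factor, and use of SciPy's cubic-spline zoom as a bicubic stand-in are assumptions.

```python
# Dual-modality input preparation: upsampled spatial fingerprint + log-magnitude spectrum.
import numpy as np
from scipy.ndimage import zoom

rss = np.random.uniform(-100, -30, size=(8, 8)).astype(np.float32)   # tiny RSS fingerprint (dBm)

spatial = zoom(rss, zoom=4, order=3)               # cubic-spline upsampling (bicubic-like), 8x8 -> 32x32
spectrum = np.fft.fftshift(np.fft.fft2(spatial))   # 2D FFT of the enhanced spatial image
frequency = np.log1p(np.abs(spectrum)).astype(np.float32)            # frequency-domain fingerprint

# The two arrays would feed the two parallel CNN branches as separate modalities.
print(spatial.shape, frequency.shape)
```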
26 pages, 13537 KB  
Article
GeoJapan Fusion Framework: A Large Multimodal Model for Regional Remote Sensing Recognition
by Yaozong Gan, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa and Miki Haseyama
Remote Sens. 2025, 17(17), 3044; https://doi.org/10.3390/rs17173044 - 1 Sep 2025
Abstract
Recent advances in large multimodal models (LMMs) have opened new opportunities for multitask recognition from remote sensing images. However, existing approaches still face challenges in effectively recognizing the complex geospatial characteristics of regions such as Japan, whose location along a seismic belt leads to highly diverse urban environments and cityscapes that differ from those in other regions. To overcome these challenges, we propose the GeoJapan Fusion Framework (GFF), a multimodal architecture that integrates a large language model (LLM) and a vision–language model (VLM) and strengthens multimodal alignment through an in-context learning mechanism to support multitask recognition for Japanese remote sensing images. The GFF also incorporates a cross-modal feature fusion mechanism with low-rank adaptation (LoRA) to enhance representation alignment and enable efficient model adaptation. To support the construction of the GFF, we build the GeoJapan dataset, a substantial collection of high-quality Japanese remote sensing images designed to facilitate multitask recognition using LMMs. We conducted extensive experiments and compared our method with state-of-the-art LMMs. The experimental results show that GFF outperforms previous approaches across multiple tasks, demonstrating its promise for multimodal multitask remote sensing recognition. Full article
(This article belongs to the Special Issue Remote Sensing Image Classification: Theory and Application)
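The abstract above mentions low-rank adaptation (LoRA) for efficient model adaptation. The snippet below is a generic LoRA adapter around a frozen linear layer, included only to illustrate the mechanism; the rank, scaling, and where such adapters sit inside the GFF are assumptions, not the paper's configuration.

```python
# Generic LoRA adapter: frozen base weight plus a trainable low-rank update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # pretrained weights stay frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)               # update starts at zero (identity behavior)
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))
print(out.shape, sum(p.numel() for p in layer.parameters() if p.requires_grad))
```

Only the two low-rank matrices are trainable, which is what makes this kind of adaptation cheap relative to full fine-tuning.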
17 pages, 2227 KB  
Article
Remaining Useful Life Prediction of Turbine Engines Using Multimodal Transfer Learning
by Jiaze Li and Zeliang Yang
Machines 2025, 13(9), 789; https://doi.org/10.3390/machines13090789 - 1 Sep 2025
Abstract
Remaining useful life (RUL) prediction is a core technology in prognostics and health management (PHM), crucial for ensuring the safe and efficient operation of modern industrial systems. Although deep learning methods have shown potential in RUL prediction, they often face two major challenges: an insufficient generalization ability when distribution gaps exist between training data and real-world application scenarios, and the difficulty of comprehensively capturing complex equipment degradation processes with single-modal data. A key challenge in current research is how to effectively fuse multimodal data and leverage transfer learning to address RUL prediction in small-sample and cross-condition scenarios. This paper proposes an innovative deep multimodal fine-tuning regression (DMFR) framework to address these issues. First, the DMFR framework utilizes a Convolutional Neural Network (CNN) and a Transformer Network to extract distinct modal features, thereby achieving a more comprehensive understanding of data degradation patterns. Second, a fusion layer is employed to seamlessly integrate these multimodal features, extracting fused information to identify latent features, which are subsequently utilized in the predictor. Third, a two-stage training algorithm combining supervised pre-training and fine-tuning is proposed to accomplish transfer alignment from the source domain to the target domain. This paper utilized the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) turbine engine dataset publicly released by NASA to conduct comparative transfer experiments on various RUL prediction methods. The experimental results demonstrate significant performance improvements across all tasks. Full article
(This article belongs to the Section Machines Testing and Maintenance)
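The abstract above extracts features with a CNN branch and a Transformer branch and fuses them before the RUL predictor. A rough PyTorch sketch of that dual-branch-plus-fusion structure follows; channel counts, window length, and layer sizes are placeholders, and the two-stage pre-training/fine-tuning procedure is not shown.

```python
# Dual-branch (CNN + Transformer) feature extraction with a fused regression head.
import torch
import torch.nn as nn

class DualBranchRUL(nn.Module):
    def __init__(self, n_sensors: int = 14, d_model: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(                         # local degradation patterns
            nn.Conv1d(n_sensors, d_model, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.proj = nn.Linear(n_sensors, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)   # long-range trends
        self.head = nn.Sequential(nn.Linear(2 * d_model, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_sensors) sensor windows
        c = self.cnn(x.transpose(1, 2)).squeeze(-1)       # (batch, d_model)
        t = self.transformer(self.proj(x)).mean(dim=1)    # (batch, d_model)
        return self.head(torch.cat([c, t], dim=-1)).squeeze(-1)  # fused -> RUL estimate

model = DualBranchRUL()
print(model(torch.randn(8, 30, 14)).shape)                # 30-cycle windows, 14 sensors
```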
16 pages, 11354 KB  
Article
MTC-BEV: Semantic-Guided Temporal and Cross-Modal BEV Feature Fusion for 3D Object Detection
by Qiankai Xi, Li Ma, Jikai Zhang, Hongying Bai and Zhixing Wang
World Electr. Veh. J. 2025, 16(9), 493; https://doi.org/10.3390/wevj16090493 - 1 Sep 2025
Abstract
We propose MTC-BEV, a novel multi-modal 3D object detection framework for autonomous driving that achieves robust and efficient perception by combining spatial, temporal, and semantic cues. MTC-BEV integrates image and LiDAR features in the Bird’s-Eye View (BEV) space, where heterogeneous modalities are aligned and fused through the Bidirectional Cross-Modal Attention Fusion (BCAP) module with positional encodings. To model temporal consistency, the Temporal Fusion (TTFusion) module explicitly compensates for ego-motion and incorporates past BEV features. In addition, a segmentation-guided BEV enhancement projects 2D instance masks into BEV space, highlighting semantically informative regions. Experiments on the nuScenes dataset demonstrate that MTC-BEV achieves a nuScenes Detection Score (NDS) of 72.4% at 14.91 FPS, striking a favorable balance between accuracy and efficiency. These results confirm the effectiveness of the proposed design, highlighting the potential of semantic-guided cross-modal and temporal fusion for robust 3D object detection in autonomous driving. Full article
(This article belongs to the Special Issue Electric Vehicle Autonomous Driving Based on Image Recognition)
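The abstract above fuses camera and LiDAR features in BEV space with bidirectional cross-modal attention. The sketch below approximates that idea with standard multi-head attention over flattened BEV grids; the grid size, channel width, and the absence of positional encodings are simplifications relative to the BCAP module described.

```python
# Bidirectional cross-modal attention between camera and LiDAR BEV feature maps.
import torch
import torch.nn as nn

class BiCrossModalBEV(nn.Module):
    def __init__(self, channels: int = 128, heads: int = 4):
        super().__init__()
        self.cam_from_lidar = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.lidar_from_cam = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.out = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, cam_bev: torch.Tensor, lidar_bev: torch.Tensor) -> torch.Tensor:
        # cam_bev, lidar_bev: (batch, C, H, W) BEV grids
        b, c, h, w = cam_bev.shape
        cam = cam_bev.flatten(2).transpose(1, 2)          # (batch, H*W, C) token sequences
        lid = lidar_bev.flatten(2).transpose(1, 2)
        cam_att, _ = self.cam_from_lidar(cam, lid, lid)   # camera tokens query LiDAR
        lid_att, _ = self.lidar_from_cam(lid, cam, cam)   # LiDAR tokens query camera
        fused = torch.cat([cam_att, lid_att], dim=-1).transpose(1, 2).reshape(b, 2 * c, h, w)
        return self.out(fused)                            # fused BEV features for the detection head

fusion = BiCrossModalBEV()
print(fusion(torch.randn(1, 128, 32, 32), torch.randn(1, 128, 32, 32)).shape)
```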
15 pages, 4475 KB  
Case Report
The Role of Targeted Therapy and Immunotherapy in Metastatic GNET/Clear Cell Sarcoma (CCS) of the Gastrointestinal Tract: A Case Report
by Raluca Ioana Mihaila, Andreea Veronica Lazescu, Daniela Luminița Zob and Dana Lucia Stanculeanu
Curr. Issues Mol. Biol. 2025, 47(9), 706; https://doi.org/10.3390/cimb47090706 - 1 Sep 2025
Abstract
Background: Gastrointestinal neuroectodermal tumour (GNET), also known as clear cell sarcoma (CCS) of the gastrointestinal tract, is a rare neural crest-derived malignancy characterized by EWSR1-ATF1 or EWSR1-CREB1 fusions. Due to its rarity, there is limited evidence and no established guidelines for standard management. GNET is aggressive, with high rates of local recurrence, metastasis, and mortality. Case Presentation: We report the case of a 46-year-old woman with a family history of gastrointestinal cancers who was diagnosed in 2020 with an intestinal GNET. She underwent a segmental enterectomy as the first step of multimodal therapy. After three years of follow-up, she developed hepatic and peritoneal metastases. In November 2023, she began combined therapy with the anti-VEGF tyrosine kinase inhibitor cabozantinib and the immune checkpoint inhibitor nivolumab. The patient has maintained stable disease for 18 months with good tolerance and no adverse events. Molecular analysis of the tumour, which showed an EWSR1-CREB1 fusion, supported the selection of targeted therapy and immunotherapy as the preferred treatment approach. Conclusions: Immunotherapy and targeted therapy show promise for GNET/CCS treatment, but clinical standards are lacking, and evidence comes primarily from case reports. Additional data are needed to determine the best sequence and combination of therapies for this very rare disease. Full article
(This article belongs to the Special Issue Future Challenges of Targeted Therapy of Cancers: 2nd Edition)
21 pages, 3439 KB  
Article
Multimodal Emotion Recognition Based on Graph Neural Networks
by Zhongwen Tu, Raoxin Yan, Sihan Weng, Jiatong Li and Wei Zhao
Appl. Sci. 2025, 15(17), 9622; https://doi.org/10.3390/app15179622 - 1 Sep 2025
Abstract
Emotion recognition remains a challenging task in human–computer interaction. With advancements in multimodal computing, multimodal emotion recognition has become increasingly important. To address the existing limitations in multimodal fusion efficiency, emotional–semantic association mining, and long-range context modeling, we propose an innovative graph neural network (GNN)-based framework. Our methodology integrates three key components: (1) a hierarchical sequential fusion (HSF) multimodal integration approach, (2) a sentiment–emotion enhanced joint learning framework, and (3) a context-similarity dual-layer graph architecture (CS-BiGraph). The experimental results demonstrate that our method achieves 69.1% accuracy on the IEMOCAP dataset, establishing new state-of-the-art performance. For future work, we will explore robust extensions of our framework under real-world scenarios with higher noise levels and investigate the integration of emerging modalities for broader applicability. Full article
(This article belongs to the Special Issue Advanced Technologies and Applications of Emotion Recognition)
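The abstract above lists a hierarchical sequential fusion (HSF) component that merges modalities step by step. The sketch below shows one plausible reading of sequential fusion, combining text and audio first and then visual features; the modality order, gating, and dimensions are assumptions, and the graph and joint-learning components are omitted.

```python
# Sequential (staged) fusion of three modality embeddings with simple gating.
import torch
import torch.nn as nn

class PairwiseFuse(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        ab = torch.cat([a, b], dim=-1)
        return self.gate(ab) * self.proj(ab)              # gated joint representation

class HierarchicalSequentialFusion(nn.Module):
    def __init__(self, dim: int = 128, n_classes: int = 6):
        super().__init__()
        self.fuse_text_audio = PairwiseFuse(dim)
        self.fuse_with_visual = PairwiseFuse(dim)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, text: torch.Tensor, audio: torch.Tensor, visual: torch.Tensor):
        stage1 = self.fuse_text_audio(text, audio)        # merge text and audio first
        stage2 = self.fuse_with_visual(stage1, visual)    # then bring in visual cues
        return self.classifier(stage2)

model = HierarchicalSequentialFusion()
print(model(torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 128)).shape)
```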
31 pages, 11349 KB  
Article
CSGI-Net: A Cross-Sample Graph Interaction Network for Multimodal Sentiment Analysis
by Erlin Tian, Shuai Zhao, Zuhe Li, Haoran Chen, Yifan Gao and Yushan Pan
Electronics 2025, 14(17), 3493; https://doi.org/10.3390/electronics14173493 - 31 Aug 2025
Abstract
With the widespread application of multimodal data in sentiment analysis, effectively integrating information from different modalities to improve the accuracy and robustness of sentiment analysis has become a critical issue. Although current fusion methods using Transformer architectures have enhanced inter-modal interaction and alignment to some extent, challenges such as the neglect of intra-modal feature complexity and the imbalance in multimodal data optimization limit the full utilization of modality-specific information by multimodal models. To address these challenges, we propose a novel multimodal sentiment analysis model: Cross-Sample Graph Interaction Network (CSGI-Net). Specifically, CSGI-Net facilitates interaction and learning between each sample and its similar samples within the same modality, thereby capturing the common emotional characteristics among similar samples. During the training process, CSGI-Net quantifies and calculates the optimization differences between modalities and dynamically adjusts the optimization amplitude based on these differences, thereby providing under-optimized modalities with more opportunities for improvement. Experimental results demonstrate that CSGI-Net achieves superior performance on two major multimodal sentiment analysis datasets: CMU-MOSI and CMU-MOSEI. Full article
(This article belongs to the Section Artificial Intelligence)
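The abstract above describes quantifying optimization differences between modalities and adjusting each modality's optimization amplitude accordingly. The sketch below shows a generic version of that idea, scaling per-modality gradients by relative loss; the specific rule CSGI-Net uses is not given in the abstract, so this is only an illustrative stand-in.

```python
# Generic modality-balancing step: damp gradients of well-optimized modalities,
# boost gradients of under-optimized ones, based on relative per-modality loss.
import torch
import torch.nn as nn

def modality_scales(losses: dict, strength: float = 0.5) -> dict:
    """Lower relative loss => better optimized => smaller gradient scale (and vice versa)."""
    mean_loss = torch.stack(list(losses.values())).mean()
    return {m: float((l / mean_loss).clamp(1.0 - strength, 1.0 + strength)) for m, l in losses.items()}

# Toy encoders standing in for per-modality branches.
text_enc, audio_enc = nn.Linear(300, 64), nn.Linear(74, 64)
out = text_enc(torch.randn(4, 300)).sum() + audio_enc(torch.randn(4, 74)).sum()
out.backward()

# Placeholder per-modality losses; in practice these would come from unimodal heads.
losses = {"text": torch.tensor(0.35), "audio": torch.tensor(0.90)}
scales = modality_scales(losses)

# Scale each encoder's gradients before the optimizer step.
for name, enc in {"text": text_enc, "audio": audio_enc}.items():
    for p in enc.parameters():
        if p.grad is not None:
            p.grad.mul_(scales[name])
print(scales)
```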
30 pages, 2137 KB  
Review
A SPAR-4-SLR Systematic Review of AI-Based Traffic Congestion Detection: Model Performance Across Diverse Data Types
by Doha Bakir, Khalid Moussaid, Zouhair Chiba, Noreddine Abghour and Amina El omri
Smart Cities 2025, 8(5), 143; https://doi.org/10.3390/smartcities8050143 - 30 Aug 2025
Abstract
Traffic congestion remains a major urban challenge, impacting economic productivity, environmental sustainability, and commuter well-being. This systematic review investigates how artificial intelligence (AI) techniques contribute to detecting traffic congestion. Following the SPAR-4-SLR protocol, we analyzed 44 peer-reviewed studies covering three data categories—spatiotemporal, probe, and hybrid/multimodal—and four AI model types—shallow machine learning (SML), deep learning (DL), probabilistic reasoning (PR), and hybrid approaches. Each model category was evaluated against metrics such as accuracy, the F1-score, computational efficiency, and deployment feasibility. Our findings reveal that SML techniques, particularly decision trees combined with optical flow, are optimal for real-time, low-resource applications. CNN-based DL models excel in handling unstructured and variable environments, while hybrid models offer improved robustness through multimodal data fusion. Although PR methods are less common, they add value when integrated with other paradigms to address uncertainty. This review concludes that no single AI approach is universally the best; rather, model selection should be aligned with the data type, application context, and operational constraints. This study offers actionable guidance for researchers and practitioners aiming to build scalable, context-aware AI systems for intelligent traffic management. Full article
(This article belongs to the Special Issue Cost-Effective Transportation Planning for Smart Cities)
22 pages, 2406 KB  
Article
Research on Driving Fatigue Assessment Based on Physiological and Behavioral Data
by Ge Zhang, Zhangyu Song, Xiu-Li Li, Wenqing Li and Kuai Liang
Electronics 2025, 14(17), 3469; https://doi.org/10.3390/electronics14173469 - 29 Aug 2025
Abstract
Driving fatigue is a crucial factor affecting road traffic safety. Accurately assessing the driver’s fatigue status is critical for accident prevention. This paper explores methods for assessing driving fatigue under different conditions based on multimodal physiological and behavioral data. Physiological data such as heart rate, brainwaves, electromyography, and pupil diameter were collected through experiments, along with behavioral data such as posture changes, vehicle acceleration, and throttle usage. The results show that physiological and behavioral indicators are significantly sensitive to driving fatigue, and the fusion of multimodal data can effectively improve the accuracy of fatigue detection. Based on this, a comprehensive driving fatigue assessment model was constructed, and its applicability and reliability in different driving scenarios were verified. This study provides a theoretical basis for the development and application of driver fatigue monitoring systems, helping to achieve real-time fatigue warnings and protections, thereby improving driving safety. Full article
(This article belongs to the Special Issue Techniques and Applications of Multimodal Data Fusion)
23 pages, 1466 KB  
Article
TMU-Net: A Transformer-Based Multimodal Framework with Uncertainty Quantification for Driver Fatigue Detection
by Yaxin Zhang, Xuegang Xu, Yuetao Du and Ningchao Zhang
Sensors 2025, 25(17), 5364; https://doi.org/10.3390/s25175364 - 29 Aug 2025
Abstract
Driver fatigue is a prevalent issue frequently contributing to traffic accidents, prompting the development of automated fatigue detection methods based on various data sources, particularly reliable physiological signals. However, challenges in accuracy, robustness, and practicality persist, especially for cross-subject detection. Multimodal data fusion can enhance the effective estimation of driver fatigue. In this work, we leverage the advantages of multimodal signals to propose a novel Multimodal Attention Network (TMU-Net) for driver fatigue detection, achieving precise fatigue assessment by integrating electroencephalogram (EEG) and electrooculogram (EOG) signals. The core innovation of TMU-Net lies in its unimodal feature extraction module, which combines causal convolution, ConvSparseAttention, and Transformer encoders to effectively capture spatiotemporal features, and a multimodal fusion module that employs cross-modal attention and uncertainty-weighted gating to dynamically integrate complementary information. By incorporating uncertainty quantification, TMU-Net significantly enhances robustness to noise and individual variability. Experimental validation on the SEED-VIG dataset demonstrates TMU-Net’s superior performance stability across 23 subjects in cross-subject testing, effectively leveraging the complementary strengths of EEG (2 Hz full-band and five-band features) and EOG signals for high-precision fatigue detection. Furthermore, attention heatmap visualization reveals the dynamic interaction mechanisms between EEG and EOG signals, confirming the physiological rationality of TMU-Net’s feature fusion strategy. Practical challenges and future research directions for fatigue detection methods are also discussed. Full article
(This article belongs to the Special Issue AI and Smart Sensors for Intelligent Transportation Systems)
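The abstract above fuses EEG and EOG features with cross-modal attention and uncertainty-weighted gating. The simplified sketch below shows an uncertainty-weighted gate in which each branch predicts a log-variance and fusion weights come from the normalized (negative) log-variances; the feature dimensions and two-branch layout are assumptions, and TMU-Net's actual gating may differ.

```python
# Uncertainty-weighted gating between two modality feature streams (EEG, EOG).
import torch
import torch.nn as nn

class UncertaintyGatedFusion(nn.Module):
    def __init__(self, dim: int = 64, n_classes: int = 2):
        super().__init__()
        self.log_var_eeg = nn.Linear(dim, 1)              # per-sample uncertainty estimate, EEG branch
        self.log_var_eog = nn.Linear(dim, 1)              # per-sample uncertainty estimate, EOG branch
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, eeg: torch.Tensor, eog: torch.Tensor) -> torch.Tensor:
        # eeg, eog: (batch, dim) features from the unimodal extractors
        prec = torch.cat([-self.log_var_eeg(eeg), -self.log_var_eog(eog)], dim=-1)
        w = torch.softmax(prec, dim=-1)                   # higher uncertainty -> lower fusion weight
        fused = w[:, :1] * eeg + w[:, 1:] * eog
        return self.classifier(fused)

fusion = UncertaintyGatedFusion()
print(fusion(torch.randn(8, 64), torch.randn(8, 64)).shape)
```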
47 pages, 2691 KB  
Systematic Review
Buzzing with Intelligence: A Systematic Review of Smart Beehive Technologies
by Josip Šabić, Toni Perković, Petar Šolić and Ljiljana Šerić
Sensors 2025, 25(17), 5359; https://doi.org/10.3390/s25175359 - 29 Aug 2025
Abstract
Smart-beehive technologies represent a paradigm shift in beekeeping, transitioning from traditional, reactive methods toward proactive, data-driven management. This systematic literature review investigates the current landscape of intelligent systems applied to beehives, focusing on the integration of IoT-based monitoring, sensor modalities, machine learning techniques, and their applications in precision apiculture. The review adheres to PRISMA guidelines and analyzes 135 peer-reviewed publications identified through searches of Web of Science, IEEE Xplore, and Scopus between 1990 and 2025. It addresses key research questions related to the role of intelligent systems in early problem detection, hive condition monitoring, and predictive intervention. Common sensor types include environmental, acoustic, visual, and structural modalities, each supporting diverse functional goals such as health assessment, behavior analysis, and forecasting. A notable trend toward deep learning, computer vision, and multimodal sensor fusion is evident, particularly in applications involving disease detection and colony behavior modeling. Furthermore, the review highlights a growing corpus of publicly available datasets critical for the training and evaluation of machine learning models. Despite the promising developments, challenges remain in system integration, dataset standardization, and large-scale deployment. This review offers a comprehensive foundation for the advancement of smart apiculture technologies, aiming to improve colony health, productivity, and resilience in increasingly complex environmental conditions. Full article