Search Results (344)

Search Parameters:
Keywords = dual-modal data

21 pages, 11316 KB  
Article
Multimodal Fusion Prediction of Radiation Pneumonitis via Key Pre-Radiotherapy Imaging Feature Selection Based on Dual-Layer Attention Multiple-Instance Learning
by Hao Wang, Dinghui Wu, Shuguang Han, Jingli Tang and Wenlong Zhang
J. Imaging 2026, 12(4), 158; https://doi.org/10.3390/jimaging12040158 - 8 Apr 2026
Abstract
Radiation pneumonitis (RP), one of the most common and severe complications in locally advanced non-small cell lung cancer (LA-NSCLC) patients following thoracic radiotherapy, presents significant challenges in prediction due to the complexity of clinical risk factors, incomplete multimodal data, and unavailable slice-level annotations in pre-radiotherapy CT images. To address these challenges, we propose a multimodal fusion framework based on Dual-Layer Attention-Based Adaptive Bag Embedding Multiple-Instance Learning (DAAE-MIL) for accurate RP prediction. This study retrospectively collected data from 995 LA-NSCLC patients who received thoracic radiotherapy between November 2018 and April 2025. After screening, the eligible cohort (n = 670) was split into a training set (n = 535) and an independent test set (n = 135). The proposed framework first extracts pre-radiotherapy CT image features using a fine-tuned C3D network, followed by the DAAE-MIL module, which screens critical instances and generates bag-level representations, thereby enhancing the accuracy of deep feature extraction. Subsequently, clinical data, radiomics features, and CT-derived deep features are integrated to construct a multimodal prediction model. The proposed model demonstrates promising RP prediction performance across multiple evaluation metrics, outperforming both state-of-the-art and unimodal RP prediction approaches. On the test set, it achieves an accuracy (ACC) of 0.93 and an area under the curve (AUC) of 0.97. This study validates that the proposed method effectively addresses the limitations of single-modal prediction and the unknown key features in pre-radiotherapy CT images while providing significant clinical value for RP risk assessment.
(This article belongs to the Section Medical Imaging)
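The abstract describes attention-based multiple-instance pooling only at a high level. As a minimal sketch of the generic attention-MIL aggregation such frameworks build on (not the paper's DAAE-MIL itself; the dimensions, weight matrices, and variable names below are invented for illustration), instance features can be scored and combined into a bag-level embedding like this:

```python
import numpy as np

def attention_mil_pool(instances, V, w):
    # Attention-based MIL pooling: score each instance, softmax-normalize,
    # and return the attention-weighted bag-level embedding.
    scores = np.tanh(instances @ V) @ w            # one score per instance
    scores = scores - scores.max()                 # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # attention weights, sum to 1
    return alpha @ instances, alpha                # bag embedding, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))   # 5 CT-slice instances, each an 8-dim deep feature
V = rng.normal(size=(8, 4))   # hypothetical projection weights
w = rng.normal(size=4)        # hypothetical scoring vector
bag, alpha = attention_mil_pool(X, V, w)
```

Because the softmax weights sum to 1, the bag embedding is a convex combination of the instance features; the most highly weighted instances play the role of the "critical instances" the module screens for.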

28 pages, 5258 KB  
Article
Dual-View Entropy-Driven AIS–Sonar Fusion for Surface and Underwater Target Discrimination
by Xiaoshuang Zhang, Jiayi Che, Xiaodan Xiong, Yucheng Zhang, Xinbo He, Mengsha Deng and Dezhi Wang
J. Mar. Sci. Eng. 2026, 14(7), 675; https://doi.org/10.3390/jmse14070675 - 4 Apr 2026
Viewed by 165
Abstract
Distinguishing surface targets from underwater targets in complex marine environments is challenging when relying solely on physical sonar features. To address the high uncertainty inherent in single-modal features and the conflicts arising from heterogeneous data, we propose a Dual-View Entropy-Driven Negation Dempster–Shafer (DVE-NDS) fusion method that integrates AIS kinematic priors with passive sonar signals. First, a heterogeneous recognition framework is constructed. LOFAR and DEMON features are extracted via convolutional neural networks (CNNs), while a Negation Basic Probability Assignment (Negation BPA) strategy is introduced to transform AIS spatiotemporal mismatches into effective "negation support" for non-cooperative underwater targets. Instead of relying on a single conflict coefficient, the proposed method jointly considers evidence self-information and inter-source consistency. Evidence quality is quantified using improved Deng entropy and negation belief entropy, while mutual trust is evaluated via the Jousselme distance. Heterogeneous evidence is weighted and corrected by generated coupling weights, effectively suppressing low-quality evidence and sharpening decision boundaries. Simulation results confirm that DVE-NDS improves macro-F1 over classical fusion, indicating the framework's potential for handling conflicting evidence, though the current validation remains simulation-based and should be regarded as a methodological proof-of-concept.
(This article belongs to the Special Issue Emerging Computational Methods in Intelligent Marine Vehicles)
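The DVE-NDS weighting scheme cannot be reconstructed from the abstract alone, but the classical Dempster's rule of combination it extends is standard. A minimal sketch over the frame {surface, underwater} (the mass values assigned to the sonar and AIS sources below are invented for illustration):

```python
def dempster_combine(m1, m2):
    # Dempster's rule: multiply masses of all focal-element pairs, accumulate
    # mass on non-empty intersections, and renormalize by 1 - conflict.
    combined, conflict = {}, 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb   # mass lost to empty intersections
    return {fs: v / (1.0 - conflict) for fs, v in combined.items()}, conflict

S, U = frozenset({"surface"}), frozenset({"underwater"})
theta = S | U                              # full frame of discernment
m_sonar = {S: 0.2, U: 0.6, theta: 0.2}     # illustrative sonar evidence
m_ais = {U: 0.5, theta: 0.5}               # illustrative AIS "negation support"
fused, k = dempster_combine(m_sonar, m_ais)
```

The conflict mass `k` is exactly the quantity that entropy- and distance-based weighting schemes such as DVE-NDS try to manage before combination, rather than relying on the single coefficient alone.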

16 pages, 3039 KB  
Article
A Preclinical Study of a PSMA Ligand-Based Dual-Modality Probe for Radical Prostatectomy
by Haoxi Zhou, Zhiqiang Chen, Long Yi, Baojun Wang, Shaoxi Niu, Yu Gao and Xu Zhang
Pharmaceuticals 2026, 19(4), 564; https://doi.org/10.3390/ph19040564 - 1 Apr 2026
Viewed by 294
Abstract
Purpose: Prostate-specific membrane antigen (PSMA) is a well-established molecular target in prostate cancer (PCa). Both radionuclide imaging and near-infrared fluorescence (NIRF) imaging offer high sensitivity for in vivo tumor detection. PSMA-targeted dual-modality probes integrating these two imaging techniques provide complementary preoperative and intraoperative tumor visualization, thereby improving surgical guidance in PCa. In this study, we aimed to develop a novel dual-labeled PSMA probe combining radioactive and fluorescent properties to achieve precise tumor delineation during radical prostatectomy (RP). Methods: A high-affinity PSMA-targeted fluorescent probe (PSMA-DF) was synthesized using solid-phase synthesis. Subsequent radiolabeling with the radionuclide [68Ga]Ga yielded the dual-modal PSMA-targeted molecular probe [68Ga]Ga-PSMA-DF. The probe was systematically evaluated both in vitro and in vivo, and its safety profile was assessed through acute toxicity testing. Tumor-bearing nude mouse models were established using PSMA-positive 22Rv1 and PSMA-negative PC-3 PCa cell lines. Imaging performance, tumor-targeting specificity, and biodistribution of the probe were comprehensively evaluated using micro-PET imaging, in vivo fluorescence imaging, and biodistribution studies. Results: High-quality, high-purity PSMA-DF with excellent optical properties was successfully prepared. Following radiolabeling with [68Ga]Ga, the dual-modality radionuclide–fluorescence probe [68Ga]Ga-PSMA-DF was successfully constructed. In vitro cellular uptake studies demonstrated that 22Rv1 cells had relatively high uptake of the probe, reaching 7.34 ± 0.55 IA%/10⁶ cells at 120 min. In contrast, PC-3 cells and blocked 22Rv1 cells displayed minimal uptake, confirming the specific targeting ability of the probe. In vivo evaluations were conducted on tumor-bearing mice using micro-PET/CT and NIRF imaging. The results revealed that [68Ga]Ga-PSMA-DF achieved high specific tumor accumulation in 22Rv1 xenografts, with the peak tumor uptake (SUVmax = 1.748 ± 0.132) and tumor-to-muscle ratio (11.542 ± 1.511) observed at 120 min. Notably, high-contrast fluorescence imaging was also achieved at later time points, yielding a tumor-to-background ratio (TBR) of 6.559 ± 1.415 at 48 h. Ex vivo biodistribution data were consistent with the in vivo imaging findings. Conclusions: This preclinical study demonstrates that [68Ga]Ga-PSMA-DF exhibits high and specific uptake in PCa models, supporting its potential as a dual-modality tracer for both PET/CT imaging and real-time intraoperative fluorescence guidance during PCa surgery.
(This article belongs to the Section Medicinal Chemistry)

23 pages, 7126 KB  
Article
Dual-Modal Chicken Mortality Detection Using Dynamic Hybrid Convolution-Based Feature Fusion
by Tian Hua, Qian Fan, Runhao Chen, Yulin Bi, Hao Bai, Zhixiu Wang, Guobin Chang and Wenming Zhao
Animals 2026, 16(7), 1057; https://doi.org/10.3390/ani16071057 - 31 Mar 2026
Viewed by 249
Abstract
In large-scale caged broiler farms, daily inspection of dead broilers is essential for flock health management and disease prevention. To address the significant performance degradation of existing methods under challenging conditions such as poor lighting, severe occlusion, and complex backgrounds, this paper proposes a dual-modal dynamic hybrid convolutional feature fusion method for dead bird detection based on an improved YOLO11 framework, termed YOLO11-DualDynConv-FF. First, a dual-modal fusion network architecture was developed to combine RGB and infrared (IR) images, enabling the model to simultaneously process both modalities. By integrating complementary information from RGB and IR data, the proposed method significantly improved detection accuracy and efficiency under low-light conditions. Second, a dynamic hybrid convolution feature fusion module was designed to merge multi-scale feature maps with contextual information, allowing the network to capture fine-grained details and adapt better to complex farming environments. In addition, an occlusion-aware module was introduced to specifically address the physical occlusion challenges prevalent in crowded cage settings. Comparative experiments and ablation studies involving multiple object detection networks were conducted to evaluate the proposed method. The results show that the improved YOLO11 model achieves superior performance, with precision, recall, F1-score, and mAP@0.5 reaching 92.6%, 79.0%, 0.85, and 80.1%, respectively. These results represent improvements of 2.0%, 5.0%, 0.17, and 12.1%, respectively, over the original YOLO11 model. Compared with existing approaches, the proposed model is better suited to complex real-world poultry farming environments and achieves higher detection accuracy, providing a valuable reference for intelligent monitoring in caged poultry farming.
(This article belongs to the Section Poultry)

20 pages, 7082 KB  
Article
Machine Learning-Powered Smart Sensing of Copper Ions in Water Based on a Carbon Dot-Incorporated Hydrogel Platform: An Easy Path from Bench to Onsite Detection
by Ramanand Bisauriya, Richa Gupta, Ashwin S. Deshpande, Ansh Agarwal, Aryan Agarwal and Roberto Pizzoferrato
Sensors 2026, 26(7), 2142; https://doi.org/10.3390/s26072142 - 31 Mar 2026
Viewed by 188
Abstract
Water supplies contaminated by heavy metals pose a serious threat to human health, especially in areas without access to centralized testing facilities. While copper is a necessary heavy metal in trace levels, high concentrations can have detrimental effects on health, such as oxidative stress, cognitive impairment, and liver damage. Due to their expense, complexity, and reliance on laboratories, conventional detection techniques are accurate but unsuitable for real-time, dispersed deployment. Machine learning offers a potent solution to these constraints by facilitating the automatic, precise, and quick interpretation of complicated sensor data. It makes it possible to make decisions in real time without requiring a large laboratory infrastructure. In this work, a dual-mode optical sensor was developed using colorimetric and fluorometric images of carbon dots embedded in hydrogels at Cu²⁺ concentrations of 0, 20, 50, 100, 200, and 500 μM. Data augmentation was used to expand the RGB image dataset for each modality, and these data were interpolated to provide responses at 1 µM intervals (0–500 µM). We trained a comprehensive set of supervised machine learning models, including Logistic Regression, Support Vector Machines, Random Forest, and XGBoost, to categorize water samples into five risk-informed quality levels. The system achieved classification accuracies exceeding 96%. Furthermore, we built a simple user interface to make the system practically deployable on a mobile phone. Together, these results demonstrate a scalable, interpretable, cost-effective, and quick solution for real-time water quality monitoring in resource-constrained environments. Since the proposed method focuses on classifying concentration ranges rather than precise quantification, a formal limit of detection (LOD) was not calculated; instead, the lowest concentration in the experimental dataset serves as the minimum detectable level.
(This article belongs to the Collection Optical Chemical Sensors: Design and Applications)
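As a toy illustration of the classification step only (not the paper's pipeline: the RGB response model and the nearest-centroid classifier below are invented stand-ins for its colorimetric data and Random Forest/XGBoost models), concentration classes can be predicted from mean-RGB readings like this:

```python
import numpy as np

rng = np.random.default_rng(42)
levels = [0, 20, 50, 100, 200, 500]   # Cu2+ concentrations (uM) from the study

def rgb_response(c, n, rng):
    # hypothetical mean-RGB readout: channels drift monotonically with concentration
    base = np.array([200 - 0.25 * c, 180 - 0.15 * c, 60 + 0.10 * c])
    return base + rng.normal(0.0, 2.0, size=(n, 3))

Xtr = np.vstack([rgb_response(c, 40, rng) for c in levels])
ytr = np.repeat(np.arange(6), 40)
centroids = np.vstack([Xtr[ytr == k].mean(axis=0) for k in range(6)])

def classify(samples):
    # nearest-centroid: assign each RGB reading to the closest class mean
    d = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

Xte = np.vstack([rgb_response(c, 20, rng) for c in levels])
yte = np.repeat(np.arange(6), 20)
acc = (classify(Xte) == yte).mean()
```

Even this crude classifier separates the synthetic classes well, which is why the paper frames the task as classifying concentration ranges rather than regressing an exact value.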

21 pages, 930 KB  
Article
DBCF-Net: A Dual-Branch Cross-Scale Fusion Network for Heterogeneous Satellite–UAV Change Detection
by Yan Ren, Ruiyong Li, Pengbo Zhai and Xinyu Chen
Remote Sens. 2026, 18(7), 1009; https://doi.org/10.3390/rs18071009 - 27 Mar 2026
Viewed by 282
Abstract
Heterogeneous change detection (HCD) using satellite and Unmanned Aerial Vehicle (UAV) imagery is a pivotal task in remote sensing and Earth observation. However, the effective utilization of such multi-source data is significantly hindered by extreme spatial resolution disparities and distinct radiometric characteristics. Existing deep learning methods, often based on weight-sharing Siamese architectures, struggle to bridge these domain gaps, leading to spectral pseudo-changes and blurred detection boundaries. To address these challenges, we propose a novel Dual-Branch Cross-Scale Fusion Network (DBCF-Net) specifically tailored for heterogeneous satellite–UAV change detection. We introduce a Difference-Aware Attention Module (DAAM) to explicitly align cross-modal feature spaces and suppress domain-related noise through a hybrid local–global attention mechanism. Furthermore, an Adaptive Gated Fusion Module (AGFM) is designed to dynamically weight multi-scale interactions, ensuring the preservation of high-frequency spatial details from UAV imagery while maintaining the semantic consistency of satellite data. Extensive experiments on the Heterogeneous Satellite–UAV Dataset (HSUD) demonstrate that DBCF-Net achieves state-of-the-art performance, reaching an F1-score of 88.75% and an IoU of 80.58%. This study provides a robust technical framework for heterogeneous sensor fusion and high-precision monitoring in complex remote sensing scenarios.
(This article belongs to the Section Remote Sensing Image Processing)

14 pages, 23202 KB  
Article
Design and Application of a Mobile Ultra-Audio Frequency Electromagnetic Measurement System
by Hongyu Ruan, Zucan Lin, Keyu Zhou, Yongqing Wang, Qisheng Zhang and Hui Zhang
Sensors 2026, 26(7), 2095; https://doi.org/10.3390/s26072095 - 27 Mar 2026
Viewed by 294
Abstract
Although high-frequency electromagnetic methods, such as Radio Magnetotellurics (RMT) and Controlled-Source Radio Magnetotellurics (CSRMT), are highly effective for shallow-to-medium depth exploration, deploying traditional transmitter–receiver setups remains labor-intensive and significantly slows down large-scale surveys. To overcome these logistical bottlenecks, we developed a mobile Ultra-Audio Frequency Electromagnetic (UAEM) measurement system. While the hardware is designed with dual-mode capabilities supporting conventional controlled-source operations, this paper specifically focuses on its application in a Signals of Opportunity (SOOP) mode. By utilizing pre-existing, stable anthropogenic signals, including Amplitude Modulation (AM) broadcasts and naval very low frequency communications, the system effectively functions as a broadband RMT receiver. Technical evaluations demonstrate that the instrument operates across a 1 Hz to 1000 kHz bandwidth with a high sampling rate of 2.5 MHz. Furthermore, it achieves a dynamic range of 143 dB and maintains an apparent resistivity measurement accuracy of better than 3%. Thanks to its modular, vehicle-towed design, the UAEM system enables continuous, on-the-move data acquisition wherever ambient field sources are available. This approach eliminates the need for dedicated transmitter deployment, fundamentally reducing exploration costs and boosting overall survey efficiency.
(This article belongs to the Special Issue Advanced Sensing Technologies for Space Electromagnetic Environments)

18 pages, 1175 KB  
Article
Cross-Modal Few-Shot Learning via Siamese Similarity Networks on CLIP Embeddings for Fine-Grained Image Classification
by Julius Olaniyan, Silas Formunyuy Verkijika and Ibidun C. Obagbuwa
Appl. Sci. 2026, 16(7), 3181; https://doi.org/10.3390/app16073181 - 26 Mar 2026
Viewed by 256
Abstract
Fine-grained image classification under few-shot learning conditions remains a significant challenge due to limited labeled data and high intra-class similarity. This paper proposes a novel cross-modal framework that integrates Contrastive Language-Image Pretraining (CLIP) embeddings within a Siamese similarity network to enable robust and label-efficient classification. By leveraging the semantic alignment between textual class descriptions and visual representations, the model forms hybrid similarity pairs of image-to-image and image-to-text within a shared latent space, facilitating discriminative learning under low-shot scenarios. The architecture employs a dual-branch CLIP encoder and a contrastive loss function to optimize intra-class compactness and inter-class separability. Experiments conducted on benchmark datasets including miniImageNet and CUB-200-2011 demonstrate substantial improvements over zero-shot and few-shot baselines, achieving 70.32% accuracy, 71.15% F1-score, and 68.47% mAP on 5-way 1-shot and 78.41% accuracy, 79.02% F1-score, and 76.83% mAP on 5-way 5-shot tasks (averaged over 600 episodes with 95% confidence intervals on the CUB-200-2011 dataset). Extended evaluations under 10-way settings show similarly strong performance. Ablation studies further validate the critical roles of contrastive learning, normalization, and cross-modal embeddings in enhancing generalization. This work presents a scalable and interpretable paradigm for fine-grained classification in data-scarce domains.
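A minimal sketch of the pair-scoring idea, assuming precomputed CLIP-style embeddings in a shared latent space (the vectors, dimensionality, and margin below are synthetic; this is not the paper's exact loss):

```python
import numpy as np

def l2norm(x):
    # project an embedding onto the unit sphere of the shared latent space
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def pair_similarity(a, b):
    # cosine similarity between two L2-normalized embeddings
    return float(l2norm(a) @ l2norm(b))

def contrastive_loss(sim, same_class, margin=0.5):
    # pull same-class pairs together; push different-class pairs past the margin
    d = 1.0 - sim                                   # cosine distance
    return d ** 2 if same_class else max(0.0, margin - d) ** 2

rng = np.random.default_rng(1)
img_a = rng.normal(size=512)                    # hypothetical image embedding
img_b = img_a + 0.1 * rng.normal(size=512)      # near-duplicate (same class)
txt_c = rng.normal(size=512)                    # unrelated text embedding
sim_pos = pair_similarity(img_a, img_b)
sim_neg = pair_similarity(img_a, txt_c)
```

The same `pair_similarity` works for image-to-image and image-to-text pairs because both modalities live in one normalized embedding space, which is the hybrid-pair trick the framework relies on.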

20 pages, 4497 KB  
Article
Remote Sensing Identification of Benggang Using a Two-Stream Network with Multimodal Feature Enhancement and Sparse Attention
by Xuli Rao, Qihao Chen, Kexin Zhu, Zhide Chen, Jinshi Lin and Yanhe Huang
Electronics 2026, 15(6), 1331; https://doi.org/10.3390/electronics15061331 - 23 Mar 2026
Viewed by 209
Abstract
Benggang, a typical landform of severe erosion and a geohazard in the red-soil hilly regions of southern China, is characterized by a fragmented texture, irregular boundaries, and high similarity to background objects such as bare soil and roads, which poses a dual challenge of "multiscale variability + strong noise" for automated identification at regional scales. To address insufficient information from a single modality and the limited representation of cross-scale features, this study proposes a dual-stream feature-fusion network (DF-Net) for multisource data consisting of a digital orthophoto map (DOM) and a digital elevation model (DEM). The method adopts ResNeSt50d as the backbone of the two branches: on the DOM side, a Canny-edge channel is stacked to enhance high-frequency boundary information; on the DEM side, derived terrain factors, including slope, aspect, curvature, and hillshade, are introduced to provide morphological constraints. In the cross-modal fusion stage, a multiscale sparse attention fusion module is designed, which acquires contextual information via multiwindow average pooling and suppresses noise interference through top-K sparsification. In the decision stage, a multibranch ensemble is employed to improve classification stability. Taking Anxi County, Fujian Province, as the study area, a coregistered dataset of GF-2 (1 m) DOM and ALOS (12.5 m) DEMs is constructed, and a zonal partitioning strategy is adopted to evaluate the model's generalization ability. The experimental results show that DF-Net achieves 97.44% accuracy, 85.71% recall, and an 82.98% F1 score in the independent test zone, outperforming multiple mainstream CNN/transformer classification models. This study indicates that the strategy of "multimodal feature enhancement + sparse attention fusion" tailored to Benggang erosional landforms can significantly improve recognition performance under complex backgrounds, providing technical support for rapid Benggang surveys and governance-effectiveness assessments.
(This article belongs to the Section Artificial Intelligence)

22 pages, 3299 KB  
Article
DualStream-RTNet: A Multimodal Deep Learning Framework for Grape Cultivar Classification and Soluble Solid Content Prediction
by Zhiguo Liu, Yufei Song, Aoran Liu, Xi Meng, Chang Liu, Shanshan Li, Xiangqing Wang and Guifa Teng
Foods 2026, 15(6), 1095; https://doi.org/10.3390/foods15061095 - 20 Mar 2026
Viewed by 301
Abstract
Accurate and non-destructive evaluation of grape quality is crucial for intelligent viticulture, yet most existing approaches address cultivar classification and soluble solid content (SSC) prediction as independent tasks based on single-modality data, limiting robustness and practical applicability. This study proposes DualStream-RTNet, a unified multimodal deep learning framework that simultaneously performs grape cultivar classification and SSC prediction by integrating RGB-HSV fused images and PCA-compressed hyperspectral spectra. The dual-stream architecture enables the complementary learning of external chromatic–textural cues and internal physicochemical information, while a Transformer-enhanced fusion module strengthens global representation and cross-modal correlation. A dataset of 864 berries from five grape cultivars was used to validate the model. DualStream-RTNet achieved 93.64% classification accuracy, outperforming ResNet18 and other CNN baselines, and produced more compact and consistent confusion-matrix patterns. For SSC prediction, it consistently yielded the highest performance across cultivars, with R²p values up to 0.9693 and RMSE as low as 0.2567, surpassing the PLSR, SVR, LSTM, and Transformer regression models. These results demonstrate the superiority of the proposed framework in capturing both visual and spectral characteristics. DualStream-RTNet provides an efficient and scalable solution for comprehensive grape quality assessment, offering strong potential for real-time sorting, precision grading, and smart agricultural applications.
(This article belongs to the Section Food Engineering and Technology)
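The PCA compression applied to the hyperspectral branch is a standard step and can be sketched with a plain SVD (the synthetic low-rank spectra below are a stand-in for the real berry data; band count and component count are illustrative):

```python
import numpy as np

def pca_compress(spectra, n_components):
    # Center the spectra and project onto the top principal components via SVD.
    mean = spectra.mean(axis=0)
    centered = spectra - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]           # (n_components, n_bands)
    return centered @ components.T, components, mean

rng = np.random.default_rng(0)
# synthetic stand-in: 864 berries x 200 spectral bands with low-rank structure
latent = rng.normal(size=(864, 5))
mixing = rng.normal(size=(5, 200))
spectra = latent @ mixing + 0.01 * rng.normal(size=(864, 200))
scores, comps, mu = pca_compress(spectra, n_components=5)

# reconstructing from 5 components should recover almost all spectral variance
recon = scores @ comps + mu
rel_err = np.linalg.norm(spectra - recon) / np.linalg.norm(spectra)
```

Feeding the low-dimensional `scores` rather than raw bands into the spectral stream is what keeps the fusion network compact.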

23 pages, 5079 KB  
Article
Dual-Stream Transformer with Kalman-Based Sensor Fusion for Wearable Fall Detection
by Abheek Pradhan, Sana Alamgeer, Rakesh Suvvari, Syed Tousiful Haque and Anne H. H. Ngu
Big Data Cogn. Comput. 2026, 10(3), 90; https://doi.org/10.3390/bdcc10030090 - 17 Mar 2026
Viewed by 440
Abstract
Wearable fall detection systems face a fundamental challenge: while gyroscope data provide valuable orientation cues, naively combining raw gyroscope and accelerometer signals can degrade performance due to noise contamination. To overcome this challenge, we present a dual-stream transformer architecture that incorporates (i) Kalman-based sensor fusion to convert noisy gyroscope angular velocities into stable orientation estimates (roll, pitch, yaw), maintaining an internal state of body pose, and (ii) processing accelerometer and orientation streams in separate encoder pathways before fusion to prevent cross-modal interference. Our architecture further integrates Squeeze-and-Excitation channel attention and Temporal Attention Pooling to focus on fall-critical temporal patterns. Evaluated on the SmartFallMM dataset using 21-fold leave-one-subject-out cross-validation, the dual-stream Kalman transformer achieves 91.10% F1, outperforming single-stream Kalman transformers (89.80% F1) by 1.30% and single-stream baseline transformers (88.96% F1) by 2.14%. We further evaluate the model in real time using a watch-based SmartFall App on five participants, maintaining an average F1 score of 83% and an accuracy of 90%. These results indicate robust performance in both offline and real-world deployment settings, establishing a new state-of-the-art for inertial-measurement-unit-based fall detection on commodity smartwatch devices.
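As a scalar illustration of the Kalman-based fusion idea (the paper estimates full roll/pitch/yaw; the noise parameters, sampling rate, and signals below are invented), a one-dimensional filter that integrates the gyro rate and corrects with an accelerometer-derived angle looks like this:

```python
import numpy as np

def kalman_pitch(gyro_rate, accel_pitch, dt=0.02, q=0.001, r=0.09):
    # 1-D Kalman filter: predict pitch by integrating gyro angular rate,
    # then correct with the noisy accelerometer-derived pitch measurement.
    x, p = 0.0, 1.0                        # state estimate and its variance
    out = []
    for w, z in zip(gyro_rate, accel_pitch):
        x, p = x + w * dt, p + q           # predict step
        k = p / (p + r)                    # Kalman gain
        x, p = x + k * (z - x), (1 - k) * p  # update step
        out.append(x)
    return np.array(out)

rng = np.random.default_rng(0)
t = np.arange(0, 5, 0.02)
true_pitch = 0.5 * np.sin(t)                                # ground truth (rad)
gyro = np.gradient(true_pitch, 0.02) + rng.normal(0, 0.05, t.size)  # noisy rate
accel = true_pitch + rng.normal(0, 0.3, t.size)             # noisy angle
est = kalman_pitch(gyro, accel)
err_raw = np.abs(accel - true_pitch).mean()
err_kf = np.abs(est - true_pitch).mean()
```

The filtered estimate tracks the true angle far more closely than the raw accelerometer angle, which is the stability argument for feeding orientation (not raw gyro) into the transformer's second stream.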

16 pages, 310 KB  
Article
A Regularized Backbone-Level Cross-Modal Interaction Framework for Stable Temporal Reasoning in Video-Language Models
by Geon-Woo Kim and Ho-Young Jung
Mathematics 2026, 14(6), 996; https://doi.org/10.3390/math14060996 - 15 Mar 2026
Viewed by 300
Abstract
Deep learning approaches for egocentric video understanding often lack a principled theoretical treatment of stability, particularly when dealing with the sparse, noisy, and temporally ambiguous observations characteristic of first-person imaging. In this work, we frame egocentric video question answering not merely as a classification task, but as an ill-posed inverse problem aimed at reconstructing latent semantic intent from stochastically perturbed visual signals. To address the instability inherent in standard dual-encoder architectures, we present a framework with a mathematical interpretation that incorporates gated cross-modal interaction within the transformer backbone. Formally, the video-side update analyzed in this work is defined as a learnable convex combination of unimodal feature representations and cross-modal attention residuals; the full implementation applies analogous gated cross-modal updates bidirectionally. From a regularization perspective, the gating mechanism can be interpreted as an adaptive parameter that balances data fidelity against language-conditioned structural constraints during feature reconstruction. We provide the Bounded Update Property (Lemma 1) and an analytical layer-wise sensitivity bound and empirically demonstrate that the proposed framework achieves measurable improvements in both accuracy and stability on the EgoTaskQA and MSR-VTT benchmarks. On EgoTaskQA, our model improves accuracy from 27.0% to 31.7% (+4.7 pp) and reduces the accuracy drop under 50% frame drop from 3.93 pp to 0.94 pp. On MSR-VTT, our model improves accuracy by 13.0 pp over the dual-encoder baseline. Under severe perturbation (50% frame drop) on MSR-VTT, our model retains 97.7% of its clean performance, whereas the baseline exhibits near-zero drop accompanied by majority-class behavior. These results provide empirical evidence that the proposed interaction induces stable behavior under perturbations in an ill-posed multimodal inference setting, mitigating sensitivity to sampling variability while preserving query-relevant temporal structure. Furthermore, an entropy-based analysis indicates that the gating mechanism prevents excessive diffusion of attention, promoting coherent temporal reasoning. Overall, this work offers a mathematically informed perspective on designing interaction mechanisms for stable multimodal systems, with a focus on robust reasoning under temporal ambiguity.
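The video-side update described above, a learnable convex combination of the unimodal feature and the cross-modal attention residual, can be sketched directly (the shapes and the zero gate logits below are illustrative, not the paper's trained values):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_cross_modal_update(h_video, cross_attn, gate_logits):
    # Convex combination g*h + (1-g)*attn with a per-dimension gate g in (0,1):
    # g near 1 keeps the unimodal feature, g near 0 trusts the cross-modal signal.
    g = sigmoid(gate_logits)
    return g * h_video + (1.0 - g) * cross_attn

rng = np.random.default_rng(0)
h = rng.normal(size=16)        # video-side token feature
attn = rng.normal(size=16)     # attention readout from the language side
out = gated_cross_modal_update(h, attn, gate_logits=np.zeros(16))  # g = 0.5
```

Because g lies strictly in (0, 1) elementwise, each output coordinate stays between the corresponding unimodal and cross-modal values, which is the intuition behind the Bounded Update Property the paper proves.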

12 pages, 494 KB  
Article
Neuromuscular Profile of CrossFit® Athletes: Part 1—Isometric and Ballistic Performance
by Diego A. Alonso-Aubin, Ester Jiménez-Ormeño, César Gallo-Salazar, Verónica Giráldez-Costas, Diana Ruiz-Vicente, Sara Zafra-Díaz, Francisco Areces-Corcuera and Carlos Ruiz-Moreno
J. Funct. Morphol. Kinesiol. 2026, 11(1), 118; https://doi.org/10.3390/jfmk11010118 - 15 Mar 2026
Abstract
Background: CrossFit® has gained widespread popularity as a high-intensity training modality, yet evidence describing neuromuscular performance characteristics in this population remains limited. This study aimed to evaluate isometric and ballistic strength profiles in trained CrossFit® athletes and to identify sex-based differences in absolute and relative neuromuscular performance. Methods: Seventy-two athletes (41 males and 31 females) participated in the study, completing two maximal isometric mid-thigh pull (IMTP) tests and three countermovement jump (CMJ) tests within a single testing session. Assessments were conducted using a dual force plate system (Hawkin Dynamics, Westbrook, ME, USA). Results: In the IMTP, males exhibited substantially higher absolute isometric force outputs, including peak force (3059 ± 576 vs. 1899 ± 324 N; p < 0.001) and relative peak force (36.34 ± 6.74 vs. 30.99 ± 4.41 N/kg; p < 0.001). Rates of force development were also greater in males for both early (0–50 ms: 7665 ± 5420 vs. 4001 ± 3021 N/s; p < 0.001) and late phases (0–250 ms: 5350 ± 1832 vs. 3035 ± 886 N/s; p < 0.001). However, no significant sex differences were detected in time to peak force (2.31 ± 1.27 vs. 1.94 ± 1.04 s) or dynamic strength index (0.72 ± 0.12 vs. 0.73 ± 0.12 a.u.). In CMJ ballistic performance, males achieved higher jump height (0.33 ± 0.07 vs. 0.23 ± 0.05 m; p < 0.001), jump momentum (215 ± 27.9 vs. 131 ± 19.1 kg·m/s; p < 0.001), and modified reactive strength index (0.46 ± 0.11 vs. 0.32 ± 0.08 a.u.; p < 0.001). Relative propulsive and braking forces were also moderately greater in males. Notably, sex differences were reduced when variables were normalized to body mass or peak force, indicating comparable relative neuromuscular function across sexes.
Conclusions: These findings provide descriptive neuromuscular performance data for CrossFit® athletes and show that sex-based differences primarily reflect disparities in absolute force-production capacity rather than intrinsic neuromuscular efficiency. Such insights may support more precise, evidence-informed, and sex-specific training prescriptions to optimize performance. Full article
(This article belongs to the Special Issue Biomechanical and Neuromuscular Perspectives in Resistance Training)
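The metrics reported above follow standard force-plate definitions. Assuming the usual formulas (relative peak force = peak force / body mass; dynamic strength index = ballistic peak force / isometric peak force; rate of force development = change in force / time window), they can be computed as in this sketch. The function names are illustrative, not taken from the study's analysis pipeline.

```python
def relative_peak_force(peak_force_n, body_mass_kg):
    """Peak force normalized to body mass (N/kg)."""
    return peak_force_n / body_mass_kg

def dynamic_strength_index(cmj_peak_force_n, imtp_peak_force_n):
    """DSI: ballistic (CMJ) peak force relative to isometric (IMTP)
    peak force, dimensionless (a.u.)."""
    return cmj_peak_force_n / imtp_peak_force_n

def rate_of_force_development(force_at_t_n, force_at_onset_n, window_s):
    """RFD over a window from force onset (N/s), e.g. 0-50 ms or 0-250 ms."""
    return (force_at_t_n - force_at_onset_n) / window_s
```

For example, an athlete producing 2160 N in the CMJ against a 3000 N IMTP peak would have a DSI of 0.72, matching the magnitude of the group means reported above.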

33 pages, 5767 KB  
Article
Hyper-Thyro Vision: An Integrated Framework for Hyperthyroidism Diagnostic Facial Image Analysis Based on Deep Learning
by Poonyisa Thepmangkorn and Suchada Sitjongsataporn
Biomimetics 2026, 11(3), 210; https://doi.org/10.3390/biomimetics11030210 - 15 Mar 2026
Abstract
This paper presents an integrated multi-modal framework for detecting hyperthyroidism-associated abnormalities, namely exophthalmos and thyroid-related neck swelling, through the joint analysis of frontal facial and neck images using a deep learning-based approach. The objective of this research is to develop an integrated AI framework that improves hyperthyroid-related abnormality detection by simultaneously analyzing facial images of both the eye and neck, grounded in clinical pattern knowledge. The multi-modal framework mimics a biological visual mechanism by using a dual-pathway architecture that concurrently processes foveal-like details of the eyes and neck. It integrates these high-resolution visual embeddings with quantitative morphological measurements to simulate a clinician’s ability to fuse observation with physical assessment. The proposed system employs a multi-faceted decision-making process derived from three distinct data components: two from frontal face analysis and one from neck region analysis. Specifically, eye regions extracted from facial images are preprocessed using the YOLOv11s model. The proposed system leverages a dual-pathway processing architecture to extract comprehensive diagnostic features. For the eye dataset, the framework utilizes a face mesh-based eye landmark (FMEL) method to extract both eye regions and perform eyes unfold processing. These regions are subsequently analyzed by the proposed sclera map unwrapping engine (SMUE) to derive quantitative sclera metrics from both the left and right eyes. To optimize classification, a dual-branch architecture is employed by integrating CNN visual embeddings with SMUE-derived statistical features through a feature fusion layer. Simultaneously, the neck processing path executes the neck region of interest (ROI) prediction {upper, lower} to segment critical regions for goiter assessment via the proposed neck μσ ensemble thresholding (NSET) algorithm.
The experimental results demonstrate that the proposed algorithm for eye analysis achieved a mean average precision (mAP50) of 96.4%, with a specific mAP50 of 98.6% for the hyperthyroid class. Regarding quantitative scleral measurement, the SMUE process revealed distinct morphological differences, with the experimental data group exhibiting consistently higher pixel distances across the reference points compared with the normal group. Furthermore, the proposed NSET algorithm yielded the highest performance for swollen neck classification with an mAP50 of 92.0%, significantly outperforming the baseline deep learning models while maintaining lower computational complexity. Full article
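The abstract names a neck μσ ensemble thresholding (NSET) algorithm without specifying its details. A generic μ + k·σ intensity threshold over a neck ROI, one plausible building block of such an ensemble, might look like the sketch below; the function name and parameter `k` are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def mu_sigma_threshold(roi, k=1.0):
    """Binary mask of ROI pixels brighter than mean + k * std.
    A generic mu/sigma rule; an NSET-style method would presumably
    ensemble several such thresholds over upper/lower neck ROIs
    (details assumed here)."""
    mu, sigma = roi.mean(), roi.std()
    return roi > (mu + k * sigma)
```

Computing the threshold from the ROI's own statistics makes the rule adaptive to per-image brightness, which is one reason mean/std thresholds are cheap relative to learned segmentation models.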

23 pages, 13226 KB  
Article
DDAF-Net: Decoupled and Differentiated Attention Fusion Network for Object Detection
by Bo Yu, Guanghui Zhang, Qun Wang and Lei Wang
Sensors 2026, 26(6), 1812; https://doi.org/10.3390/s26061812 - 13 Mar 2026
Abstract
The fusion of data from visible (RGB) and infrared (IR) sensors is essential for robust all-day and all-weather object detection. However, existing methods often suffer from modality redundancy and noise interference. To address these challenges, we propose the Decoupled and Differentiated Attention Fusion Network (DDAF-Net). Architecturally, DDAF-Net employs a decoupled backbone with a Siamese weight-sharing strategy to extract modality-common features, while parallel branches capture modality-specific features. To effectively integrate these features, we design the Differentiated Attention Fusion Module (DAFM). First, we introduce Spatial Residual Unshuffle Embedding (SRUE) to achieve lossless downsampling while preserving global semantic information. Second, differentiated attention mechanisms are applied for feature enhancement: Dual-Norm Alignment Attention (DNAA) facilitates effective modal alignment and enhances semantic consistency in modality-common features, while Sparse Purification Attention (SPA) enables selective utilization of complementary information by suppressing noise and focusing on salient regions in modality-specific features. Finally, the Adaptive Complementary Fusion Module (ACFM) integrates these components by using modality-common features as a baseline and dynamically weighting the complementary modality-specific information. Extensive experiments on public datasets such as LLVIP and M3FD demonstrate that DDAF-Net achieves state-of-the-art performance. These results validate the effectiveness of our proposed decoupling–enhancement–fusion paradigm. Full article
(This article belongs to the Section Physical Sensors)
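The ACFM is described as using modality-common features as a baseline while dynamically weighting complementary modality-specific information. A minimal sketch of that pattern, with an assumed per-channel sigmoid gate standing in for the module's learned weighting, could be:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_complementary_fusion(f_common, f_rgb, f_ir, w):
    """Fuse features with the modality-common map as the baseline and a
    per-channel gate splitting the weight between RGB- and IR-specific
    complements. The gate parameter w is an assumption; the paper's ACFM
    presumably learns its weighting from the features themselves."""
    gate = sigmoid(w)  # in (0, 1), one weight per channel
    return f_common + gate * f_rgb + (1.0 - gate) * f_ir
```

Keeping the common features as an ungated baseline means the fusion degrades gracefully when one modality-specific branch is uninformative, which matches the decoupling–enhancement–fusion motivation described above.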
