Search Results (639)

Search Parameters:
Keywords = Adaptive Fusion Attention Module

18 pages, 3092 KB  
Article
Adverse-Weather Image Restoration Method Based on VMT-Net
by Zhongmin Liu, Xuewen Yu and Wenjin Hu
J. Imaging 2025, 11(11), 376; https://doi.org/10.3390/jimaging11110376 - 26 Oct 2025
Abstract
To address global semantic loss, local detail blurring, and spatial–semantic conflict during image restoration under adverse weather conditions, we propose an image restoration network that integrates Mamba with Transformer architectures. We first design a Vision-Mamba–Transformer (VMT) module that combines the long-range dependency modeling of Vision Mamba with the global contextual reasoning of Transformers, facilitating the joint modeling of global structures and local details, thus mitigating information loss and detail blurring during restoration. Second, we introduce an Adaptive Content Guidance (ACG) module that employs dynamic gating and spatial–channel attention to enable effective inter-layer feature fusion, thereby enhancing cross-layer semantic consistency. Finally, we embed the VMT and ACG modules into a U-Net backbone, achieving efficient integration of multi-scale feature modeling and cross-layer fusion, significantly improving reconstruction quality under complex weather conditions. The experimental results show that on Snow100K-S/L, VMT-Net improves PSNR over the baseline by approximately 0.89 dB and 0.36 dB, with SSIM gains of about 0.91% and 0.11%, respectively. On Outdoor-Rain and Raindrop, it performs similarly to the baseline and exhibits superior detail recovery in real-world scenes. Overall, the method demonstrates robustness and strong detail restoration across diverse adverse-weather conditions. Full article
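The inter-layer fusion idea behind the ACG module (a dynamic gate deciding, per position, how much of an attention-reweighted skip feature to mix into the decoder feature) can be illustrated with a minimal PyTorch sketch. The class name ACGFusion, the channel sizes, and the exact attention layout are assumptions for illustration, not the authors' implementation:

```python
import torch
import torch.nn as nn

class ACGFusion(nn.Module):
    """Gated inter-layer fusion: channel and spatial attention re-weight the
    skip (encoder) feature, then a sigmoid gate mixes it with the decoder
    feature in a convex, per-pixel combination."""
    def __init__(self, channels: int):
        super().__init__()
        self.channel_att = nn.Sequential(                 # squeeze-and-excite style
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid())
        self.spatial_att = nn.Sequential(                 # 7x7 conv over pooled maps
            nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())
        self.gate = nn.Sequential(                        # dynamic gate from both streams
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, dec, skip):
        skip = skip * self.channel_att(skip)
        pooled = torch.cat([skip.mean(1, keepdim=True),
                            skip.amax(1, keepdim=True)], dim=1)
        skip = skip * self.spatial_att(pooled)
        g = self.gate(torch.cat([dec, skip], dim=1))
        return g * skip + (1 - g) * dec                   # convex per-pixel mix

x_dec = torch.randn(1, 64, 32, 32)
x_skip = torch.randn(1, 64, 32, 32)
print(ACGFusion(64)(x_dec, x_skip).shape)                 # torch.Size([1, 64, 32, 32])
```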

21 pages, 3381 KB  
Article
Aero-Engine Ablation Defect Detection with Improved CLR-YOLOv11 Algorithm
by Yi Liu, Jiatian Liu, Yaxi Xu, Qiang Fu, Jide Qian and Xin Wang
Sensors 2025, 25(21), 6574; https://doi.org/10.3390/s25216574 - 25 Oct 2025
Abstract
Aero-engine ablation detection is a critical task in aircraft health management, yet existing rotation-based object detection methods often face challenges of high computational complexity and insufficient local feature extraction. This paper proposes an improved YOLOv11 algorithm incorporating Context-guided Large-kernel attention and a Rotated detection head, called CLR-YOLOv11. The model achieves synergistic improvement in both detection efficiency and accuracy through dual structural optimization, with its innovations primarily embodied in the following three tightly coupled strategies: (1) Targeted Data Preprocessing Pipeline Design: To address challenges such as limited sample size, low overall image brightness, and noise interference, we designed an ordered data augmentation and normalization pipeline. This pipeline is not a mere stacking of techniques but strategically enhances sample diversity through geometric transformations (random flipping, rotation), hybrid augmentations (Mixup, Mosaic), and pixel-value transformations (histogram equalization, Gaussian filtering). All processed images subsequently undergo Z-Score normalization. This order-aware pipeline design effectively improves the quality, diversity, and consistency of the input data. (2) Context-Guided Feature Fusion Mechanism: To overcome the limitations of traditional Convolutional Neural Networks in modeling long-range contextual dependencies between ablation areas and surrounding structures, we replaced the original C3k2 layer with the C3K2CG module. This module adaptively fuses local textural details with global semantic information through a context-guided mechanism, enabling the model to more accurately understand the gradual boundaries and spatial context of ablation regions. (3) Efficiency-Oriented Large-Kernel Attention Optimization: To expand the receptive field while strictly controlling the additional computational overhead introduced by rotated detection, we replaced the C2PSA module with the C2PSLA module. By employing large-kernel decomposition and a spatial selective focusing strategy, this module significantly reduces computational load while maintaining multi-scale feature perception capability, ensuring the model meets the demands of real-time applications. Experiments on a self-built aero-engine ablation dataset demonstrate that the improved model achieves 78.5% mAP@0.5:0.95, representing a 4.2% improvement over the YOLOv11-obb model without the specialized data augmentation. This study provides an effective solution for high-precision real-time aviation inspection tasks. Full article
(This article belongs to the Special Issue Advanced Neural Architectures for Anomaly Detection in Sensory Data)
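Of the three strategies, the ordered preprocessing pipeline in (1) is the most directly reproducible. A rough sketch with PIL and torchvision, treating all parameter values as placeholders and omitting Mosaic (which stitches four images and needs dataset access):

```python
import random

import torch
from PIL import Image, ImageFilter, ImageOps
from torchvision import transforms

def mixup(img_a: torch.Tensor, img_b: torch.Tensor, alpha: float = 0.2):
    """Blend two image tensors; labels would be mixed with the same lam."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    return lam * img_a + (1 - lam) * img_b, lam

def augment(img: Image.Image) -> torch.Tensor:
    # 1) geometric transformations: random flip and small rotation
    if random.random() < 0.5:
        img = ImageOps.mirror(img)
    img = img.rotate(random.uniform(-15, 15))
    # 2) pixel-value transformations: histogram equalization for dim frames,
    #    then a light Gaussian filter to suppress sensor noise
    img = ImageOps.equalize(img)
    img = img.filter(ImageFilter.GaussianBlur(radius=1))
    # 3) Z-score normalization last, as the abstract specifies
    x = transforms.ToTensor()(img)
    return (x - x.mean()) / (x.std() + 1e-6)

img = Image.new("RGB", (224, 224), (40, 40, 40))   # stand-in for an ablation image
print(augment(img).shape)                          # torch.Size([3, 224, 224])
```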

22 pages, 1512 KB  
Article
A Data-Driven Multi-Granularity Attention Framework for Sentiment Recognition in News and User Reviews
by Wenjie Hong, Shaozu Ling, Siyuan Zhang, Yinke Huang, Yiyan Wang, Zhengyang Li, Xiangjun Dong and Yan Zhan
Appl. Sci. 2025, 15(21), 11424; https://doi.org/10.3390/app152111424 - 25 Oct 2025
Abstract
Sentiment analysis plays a crucial role in domains such as financial news, user reviews, and public opinion monitoring, yet existing approaches face challenges when dealing with long and domain-specific texts due to semantic dilution, insufficient context modeling, and dispersed emotional signals. To address these issues, a multi-granularity attention-based sentiment analysis model built on a transformer backbone is proposed. The framework integrates sentence-level and document-level hierarchical modeling, a different-dimensional embedding strategy, and a cross-granularity contrastive fusion mechanism, thereby achieving unified representation and dynamic alignment of local and global emotional features. Static word embeddings combined with dynamic contextual embeddings enhance both semantic stability and context sensitivity, while the cross-granularity fusion module alleviates sparsity and dispersion of emotional cues in long texts, improving robustness and discriminability. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of the proposed model. On the Financial Forum Reviews dataset, it achieves an accuracy of 0.932, precision of 0.928, recall of 0.925, F1-score of 0.926, and AUC of 0.951, surpassing state-of-the-art baselines such as BERT and RoBERTa. On the Financial Product User Reviews dataset, the model obtains an accuracy of 0.902, precision of 0.898, recall of 0.894, and AUC of 0.921, showing significant improvements for short-text sentiment tasks. On the Financial News dataset, it achieves an accuracy of 0.874, precision of 0.869, recall of 0.864, and AUC of 0.895, highlighting its strong adaptability to professional and domain-specific texts. Ablation studies further confirm that the multi-granularity transformer structure, the different-dimensional embedding strategy, and the cross-granularity fusion module each contribute critically to overall performance improvements. Full article
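The sentence-to-document aggregation with cross-granularity fusion can be sketched compactly: attention pooling yields the local (sentence-weighted) document vector, uniform mean pooling yields the global view, and a learned gate fuses the two. Names and dimensions below are illustrative, not the paper's code:

```python
import torch
import torch.nn as nn

class GranularityFusion(nn.Module):
    """Toy cross-granularity fusion: an attention layer scores each sentence
    vector, the document vector is the weighted sum, and a learned gate fuses
    it with the plain mean of all sentence vectors."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, sent_vecs):                  # (batch, n_sentences, dim)
        w = torch.softmax(self.score(sent_vecs), dim=1)
        local = (w * sent_vecs).sum(dim=1)         # attention-pooled document vector
        global_ = sent_vecs.mean(dim=1)            # uniform pooling as global context
        g = torch.sigmoid(self.gate(torch.cat([local, global_], dim=-1)))
        return g * local + (1 - g) * global_

doc = torch.randn(2, 12, 768)                      # e.g. 12 sentence embeddings from BERT
print(GranularityFusion(768)(doc).shape)           # torch.Size([2, 768])
```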

23 pages, 11034 KB  
Article
UEBNet: A Novel and Compact Instance Segmentation Network for Post-Earthquake Building Assessment Using UAV Imagery
by Ziying Gu, Shumin Wang, Kangsan Yu, Yuanhao Wang and Xuehua Zhang
Remote Sens. 2025, 17(21), 3530; https://doi.org/10.3390/rs17213530 - 24 Oct 2025
Abstract
Unmanned aerial vehicle (UAV) remote sensing is critical in assessing post-earthquake building damage. However, intelligent disaster assessment via remote sensing faces formidable challenges from complex backgrounds, substantial scale variations in targets, and diverse spatial disaster dynamics. To address these issues, we propose UEBNet, a high-precision post-earthquake building instance segmentation model that systematically enhances damage recognition by integrating three key modules. Firstly, the Depthwise Separable Convolutional Block Attention Module suppresses background noise that visually resembles damaged structures. This is achieved by expanding the receptive field using multi-scale pooling and dilated convolutions. Secondly, the Multi-feature Fusion Module generates scale-robust feature representations for damaged buildings with significant size differences by processing feature streams from different receptive fields in parallel. Finally, the Adaptive Multi-Scale Interaction Module accurately reconstructs the irregular contours of damaged buildings through an advanced feature alignment mechanism. Extensive experiments were conducted using UAV imagery collected after the Ms 6.8 earthquake in Tingri County, Tibet Autonomous Region, China, on 7 January 2025, and the Ms 6.2 earthquake in Jishishan County, Gansu Province, China, on 18 December 2023. Results indicate that UEBNet enhances segmentation mean Average Precision (mAPseg) and bounding box mean Average Precision (mAPbox) by 3.09% and 2.20%, respectively, with equivalent improvements of 2.65% in F1-score and 1.54% in overall accuracy, outperforming state-of-the-art instance segmentation models. These results demonstrate the effectiveness and reliability of UEBNet in accurately segmenting earthquake-damaged buildings in complex post-disaster scenarios, offering valuable support for emergency response and disaster relief. Full article
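One plausible reading of the Depthwise Separable Convolutional Block Attention Module is a CBAM variant whose spatial branch uses a depthwise separable, dilated convolution to widen the receptive field. A hedged sketch under that assumption, with illustrative sizes:

```python
import torch
import torch.nn as nn

class DSCBAM(nn.Module):
    """CBAM-style attention: channel attention from pooled statistics, then a
    spatial branch built from a depthwise (groups=2) dilated conv followed by
    a pointwise conv. A guess at the named module, not the authors' code."""
    def __init__(self, channels: int, dilation: int = 2):
        super().__init__()
        self.mlp = nn.Sequential(                       # shared MLP for avg/max stats
            nn.Linear(channels, channels // 8), nn.ReLU(),
            nn.Linear(channels // 8, channels))
        self.spatial = nn.Sequential(
            nn.Conv2d(2, 2, 5, padding=2 * dilation, dilation=dilation, groups=2),
            nn.Conv2d(2, 1, 1), nn.Sigmoid())           # depthwise then pointwise

    def forward(self, x):
        b, c, _, _ = x.shape
        ca = torch.sigmoid(self.mlp(x.mean((2, 3))) + self.mlp(x.amax((2, 3))))
        x = x * ca.view(b, c, 1, 1)
        sa = self.spatial(torch.cat([x.mean(1, keepdim=True),
                                     x.amax(1, keepdim=True)], dim=1))
        return x * sa

print(DSCBAM(64)(torch.randn(1, 64, 40, 40)).shape)     # torch.Size([1, 64, 40, 40])
```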

24 pages, 987 KB  
Article
Meta-Learning Enhanced 3D CNN-LSTM Framework for Predicting Durability of Mechanical Metal–Concrete Interfaces in Building Composite Materials with Limited Historical Data
by Fangyuan Cui, Lie Liang and Xiaolong Chen
Buildings 2025, 15(21), 3848; https://doi.org/10.3390/buildings15213848 - 24 Oct 2025
Abstract
We propose a novel meta-learning enhanced 3D CNN-LSTM framework for durability prediction. The framework integrates 3D microstructural data from micro-CT scanning with environmental time-series data through a dual-branch architecture: a 3D CNN branch extracts spatial degradation patterns from volumetric data, while an LSTM network processes temporal environmental factors. To address data scarcity, we incorporate a prototypical network-based meta-learning module that learns class prototypes from limited support samples and generalizes predictions to new corrosion scenarios through distance-based probability estimation. Additionally, we develop a dynamic feature fusion mechanism that adaptively combines spatial, environmental, and mechanical features using trainable attention coefficients, enabling context-aware representation learning. Finally, an interface damage visualization component identifies critical degradation zones and propagation trajectories, providing interpretable engineering insights. Experimental validation on laboratory specimens demonstrates superior accuracy (74.6% in 1-shot scenarios) compared to conventional methods, particularly in aggressive corrosion environments where data scarcity typically hinders reliable prediction. The visualization system generates interpretable 3D damage maps with an average Intersection-over-Union of 0.78 compared to ground truth segmentations. This work establishes a unified computational framework bridging microstructure analysis with macroscopic durability assessment, offering practical value for infrastructure maintenance decision-making under uncertainty. The modular design facilitates extension to diverse interface types and environmental conditions. Full article
(This article belongs to the Section Construction Management, and Computers & Digitization)
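The prototypical-network component is standard enough to sketch exactly: class prototypes are means of support embeddings, and queries are classified by a softmax over negative squared distances. Only the embedding dimension and class count below are invented:

```python
import torch
import torch.nn.functional as F

def prototypical_probs(support, support_labels, query, n_classes):
    """Distance-based class probabilities as in prototypical networks: each
    class prototype is the mean of its support embeddings, and a query is
    classified by softmax over negative squared distances to prototypes."""
    protos = torch.stack([support[support_labels == c].mean(0)
                          for c in range(n_classes)])        # (n_classes, dim)
    dists = torch.cdist(query, protos) ** 2                  # (n_query, n_classes)
    return F.softmax(-dists, dim=1)

# 1-shot toy example: 3 corrosion classes, 64-d embeddings from the CNN-LSTM trunk
support = torch.randn(3, 64)
labels = torch.tensor([0, 1, 2])
query = torch.randn(5, 64)
print(prototypical_probs(support, labels, query, 3).shape)   # torch.Size([5, 3])
```
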
24 pages, 11432 KB  
Article
MRDAM: Satellite Cloud Image Super-Resolution via Multi-Scale Residual Deformable Attention Mechanism
by Liling Zhao, Zichen Liao and Quansen Sun
Remote Sens. 2025, 17(21), 3509; https://doi.org/10.3390/rs17213509 - 22 Oct 2025
Abstract
High-resolution meteorological satellite cloud imagery plays a crucial role in diagnosing and forecasting severe convective weather phenomena characterized by suddenness and locality, such as tropical cyclones. However, constrained by imaging principles and various internal/external interferences during satellite data acquisition, current satellite imagery often fails to meet the spatiotemporal resolution requirements for fine-scale monitoring of these weather systems. Particularly for real-time tracking of tropical cyclone genesis-evolution dynamics and capturing detailed cloud structure variations within cyclone cores, existing spatial resolutions remain insufficient. Therefore, developing super-resolution techniques for meteorological satellite cloud imagery through software-based approaches holds significant application potential. This paper proposes a Multi-scale Residual Deformable Attention Model (MRDAM) based on Generative Adversarial Networks (GANs), specifically designed for satellite cloud image super-resolution tasks considering their morphological diversity and non-rigid deformation characteristics. The generator architecture incorporates two key components: a Multi-scale Feature Progressive Fusion Module (MFPFM), which enhances texture detail preservation and spectral consistency in reconstructed images, and a Deformable Attention Additive Fusion Module (DAAFM), which captures irregular cloud pattern features through adaptive spatial-attention mechanisms. Comparative experiments against multiple GAN-based super-resolution baselines demonstrate that MRDAM achieves superior performance in both objective evaluation metrics (PSNR/SSIM) and subjective visual quality, confirming its suitability for satellite cloud image super-resolution tasks. Full article
(This article belongs to the Special Issue Neural Networks and Deep Learning for Satellite Image Processing)

21 pages, 3685 KB  
Article
MSRLNet: A Multi-Source Fusion and Feedback Network for EEG Feature Recognition in ADHD
by Qiulei Han, Ze Song, Hongbiao Ye, Yan Sun, Jian Zhao, Lijuan Shi and Zhejun Kuang
Brain Sci. 2025, 15(11), 1132; https://doi.org/10.3390/brainsci15111132 - 22 Oct 2025
Abstract
Background: Electroencephalography (EEG) has been widely used in Attention Deficit Hyperactivity Disorder (ADHD) recognition, but existing methods still suffer from limitations in dynamic modeling, small-sample adaptability, and training stability. This study proposes a Multi-Source Fusion and Feedback Network (MSRLNet) to enhance EEG-based ADHD recognition. Methods: MSRLNet comprises three modules: (1) Multi-Source Feature Fusion (MSFF), combining microstate and statistical features to improve interpretability; (2) a CNN-GRU Parallel Module (CGPM) for multi-scale temporal modeling; and (3) Performance Feedback–driven Parameter Optimization (PFPO) to enhance training stability. Feature-level data augmentation is introduced to alleviate overfitting in small-sample scenarios. Results: On a public dataset, MSRLNet achieved an accuracy of 98.90%, an F1-score of 98.98%, and a kappa of 0.979, all exceeding comparative approaches. Conclusions: MSRLNet shows high accuracy and robustness in ADHD EEG feature recognition, verifying its potential application value in clinical auxiliary diagnosis. Full article
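The CNN-GRU Parallel Module lends itself to a compact sketch: a 1D CNN branch and a bidirectional GRU branch run side by side over the EEG window and their outputs are concatenated. Layer sizes and the pooling choice are assumptions:

```python
import torch
import torch.nn as nn

class CGPM(nn.Module):
    """Parallel CNN and GRU branches over an EEG window: the CNN sees local
    temporal patterns, the bidirectional GRU sees longer context, and the two
    summaries are concatenated into one feature vector."""
    def __init__(self, in_ch: int, hidden: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(in_ch, hidden, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))
        self.gru = nn.GRU(in_ch, hidden, batch_first=True, bidirectional=True)

    def forward(self, x):                                    # x: (batch, time, channels)
        c = self.cnn(x.transpose(1, 2)).squeeze(-1)          # (batch, hidden)
        _, h = self.gru(x)                                   # h: (2, batch, hidden)
        g = torch.cat([h[0], h[1]], dim=-1)                  # (batch, 2*hidden)
        return torch.cat([c, g], dim=-1)                     # fused feature

eeg = torch.randn(8, 256, 19)                  # 8 windows, 256 samples, 19 electrodes
print(CGPM(19)(eeg).shape)                     # torch.Size([8, 192])
```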

18 pages, 10539 KB  
Article
Coal Shearer Drum Detection in Underground Mines Based on DCS-YOLO
by Tao Hu, Jinbo Qiu, Libo Zheng, Zehai Yu and Cong Liu
Electronics 2025, 14(20), 4132; https://doi.org/10.3390/electronics14204132 - 21 Oct 2025
Abstract
To address the challenges of low illumination, heavy dust, and severe occlusion in fully mechanized mining faces, this paper proposes a shearer drum detection algorithm named DCS-YOLO. To enhance the model’s ability to effectively capture features under drum deformation and occlusion, a C3k2_DCNv4 module based on deformable convolution (DCNv4) is incorporated into the network. This module adaptively adjusts convolution sampling points according to the drum’s size and position, enabling efficient and precise multi-scale feature extraction. To overcome the limitations of conventional convolution in global feature modeling, a convolution and attention fusion module (CAFM) is constructed, which combines lightweight convolution with attention mechanisms to selectively reweight feature maps at different resolutions. Under low-light conditions, the Shape-IoU loss function is employed to achieve accurate regression of irregular drum boundaries while considering both positional and shape similarity. In addition, GSConv is adopted to achieve model lightweighting while maintaining efficient feature extraction capability. Experiments were conducted on a dataset built from shearer drum images collected in underground coal mines. The results demonstrate that, compared with YOLOv11n, the proposed method reduces parameters and FLOPs by 7.7% and 4.6%, respectively, while improving precision, recall, mAP@0.5, and mAP@0.5:0.95 by 2.9%, 3.2%, 1.1%, and 3.3%, respectively. These findings highlight the significant advantages of the proposed approach in both model lightweighting and detection performance. Full article
(This article belongs to the Section Artificial Intelligence)

19 pages, 4123 KB  
Article
A Feature-Enhancement 6D Pose Estimation Method for Weakly Textured and Occluded Targets
by Xiaoqing Liu, Kaijun Zhou, Qingyuan Zeng and Peng Li
Electronics 2025, 14(20), 4125; https://doi.org/10.3390/electronics14204125 - 21 Oct 2025
Abstract
To achieve real-time and accurate pose estimation for weakly textured or occluded targets, this study proposes a feature-enhancement 6D pose estimation method based on DenseFusion. Firstly, in the image feature extraction stage, skip connections and attention modules, which can effectively fuse deep and shallow features, are introduced to enhance the richness and effectiveness of image features. Secondly, in the point cloud feature extraction stage, PointNet is applied to the initial feature extraction of the point cloud. Then, the K-nearest neighbor method and the Pool globalization method are applied to obtain richer point cloud features. Subsequently, in the dense feature fusion stage, an adaptive feature selection module is introduced to further preserve and enhance effective features. Finally, we add a supervision network to the original pose estimation network to enhance the training results. Experimental results show that the improved method performs significantly better than classic methods on both the LineMOD and Occlusion LineMOD datasets, and all enhancements improve the real-time performance and accuracy of pose estimation. Full article
(This article belongs to the Special Issue Digital Intelligence Technology and Applications, 2nd Edition)
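The K-nearest-neighbor enrichment step for point cloud features is a common pattern: for each point, gather the features of its k nearest neighbors and max-pool them into a locally aware descriptor. A sketch with invented shapes:

```python
import torch

def knn_group(points, feats, k=16):
    """For each point, gather features of its k nearest neighbours and
    max-pool them, enriching per-point features with local geometric
    context (the K-nearest-neighbor step the abstract mentions)."""
    dists = torch.cdist(points, points)              # (batch, n, n) pairwise distances
    idx = dists.topk(k, largest=False).indices       # (batch, n, k) neighbour indices
    b, n, c = feats.shape
    gathered = feats.gather(
        1, idx.reshape(b, n * k, 1).expand(-1, -1, c)).reshape(b, n, k, c)
    return gathered.amax(dim=2)                      # (batch, n, c) local max-pooled

pts = torch.randn(2, 500, 3)                         # 500 points per cloud
ft = torch.randn(2, 500, 64)                         # per-point features from PointNet
print(knn_group(pts, ft).shape)                      # torch.Size([2, 500, 64])
```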

19 pages, 1603 KB  
Article
BiLSTM-LN-SA: A Novel Integrated Model with Self-Attention for Multi-Sensor Fire Detection
by Zhaofeng He, Yu Si, Liyuan Yang, Nuo Xu, Xinglong Zhang, Mingming Wang and Xiaoyun Sun
Sensors 2025, 25(20), 6451; https://doi.org/10.3390/s25206451 - 18 Oct 2025
Abstract
Multi-sensor fire detection technology has been widely adopted in practical applications; however, existing methods still suffer from high false alarm rates and inadequate adaptability in complex environments due to their limited capacity to capture deep time-series dependencies in sensor data. To enhance robustness and accuracy, this paper proposes a novel model named BiLSTM-LN-SA, which integrates a Bidirectional Long Short-Term Memory (BiLSTM) network with Layer Normalization (LN) and a Self-Attention (SA) mechanism. The BiLSTM module extracts intricate time-series features and long-term dependencies. The incorporation of Layer Normalization mitigates feature distribution shifts across different environments, thereby improving the model’s adaptability to cross-scenario data and its generalization capability. Simultaneously, the Self-Attention mechanism dynamically recalibrates the importance of features at different time steps, adaptively enhancing fire-critical information and enabling deeper, process-aware feature fusion. Extensive evaluation on a real-world dataset demonstrates the superiority of the BiLSTM-LN-SA model, which achieves a test accuracy of 98.38%, an F1-score of 0.98, and an AUC of 0.99, significantly outperforming existing methods including EIF-LSTM, rTPNN, and MLP. Notably, the model also maintains low false positive and false negative rates of 1.50% and 1.85%, respectively. Ablation studies further elucidate the complementary roles of each component: the self-attention mechanism is pivotal for dynamic feature weighting, while layer normalization is key to stabilizing the learning process. This validated design confirms the model’s strong generalization capability and practical reliability across varied environmental scenarios. Full article
(This article belongs to the Section Sensor Networks)
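The three named components compose naturally in PyTorch. A minimal sketch, assuming one plausible ordering (BiLSTM, then LayerNorm, then self-attention with a residual connection) and illustrative sizes:

```python
import torch
import torch.nn as nn

class BiLSTMLNSA(nn.Module):
    """BiLSTM over the multi-sensor sequence, LayerNorm on its outputs,
    self-attention to re-weight time steps, then mean-pool and classify.
    Sizes and the exact composition are assumptions, not the paper's code."""
    def __init__(self, n_sensors: int, hidden: int = 64, n_classes: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(n_sensors, hidden, batch_first=True, bidirectional=True)
        self.norm = nn.LayerNorm(2 * hidden)
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=4, batch_first=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                       # x: (batch, time, n_sensors)
        h, _ = self.lstm(x)
        h = self.norm(h)                        # stabilize feature distributions
        a, _ = self.attn(h, h, h)               # re-weight fire-critical time steps
        return self.head((h + a).mean(dim=1))   # residual + temporal mean-pool

x = torch.randn(4, 60, 5)                       # 60 readings from 5 sensors
print(BiLSTMLNSA(5)(x).shape)                   # torch.Size([4, 2])
```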

26 pages, 10675 KB  
Article
DFAS-YOLO: Dual Feature-Aware Sampling for Small-Object Detection in Remote Sensing Images
by Xiangyu Liu, Shenbo Zhou, Jianbo Ma, Yumei Sun, Jianlin Zhang and Haorui Zuo
Remote Sens. 2025, 17(20), 3476; https://doi.org/10.3390/rs17203476 - 18 Oct 2025
Abstract
In remote sensing imagery, detecting small objects is challenging due to the limited representational ability of feature maps when resolution changes. This issue is mainly reflected in two aspects: (1) upsampling causes feature shifts, making feature fusion difficult to align; (2) downsampling leads to the loss of details. Although recent advances in object detection have been remarkable, small-object detection remains unresolved. In this paper, we propose Dual Feature-Aware Sampling YOLO (DFAS-YOLO) to address these issues. First, the Soft-Aware Adaptive Fusion (SAAF) module corrects upsampling by applying adaptive weighting and spatial attention, thereby reducing errors caused by feature shifts. Second, the Global Dense Local Aggregation (GDLA) module employs parallel convolution, max pooling, and average pooling with channel aggregation, combining their strengths to preserve details after downsampling. Furthermore, the detection head is redesigned based on object characteristics in remote sensing imagery. Extensive experiments on the VisDrone2019 and HIT-UAV datasets demonstrate that DFAS-YOLO achieves competitive detection accuracy compared with recent models. Full article
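The GDLA idea, downsampling through three parallel views (strided convolution, max pooling, average pooling) followed by channel aggregation, can be sketched as follows; the structure is inferred from the abstract, not taken from released code:

```python
import torch
import torch.nn as nn

class GDLA(nn.Module):
    """Detail-preserving downsampling: keep three complementary halved-resolution
    views of the input (strided conv, max pool, avg pool) and aggregate them
    over channels with a 1x1 convolution."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
        self.pool_max = nn.MaxPool2d(2)
        self.pool_avg = nn.AvgPool2d(2)
        self.aggregate = nn.Conv2d(3 * channels, channels, 1)   # channel aggregation

    def forward(self, x):
        views = [self.conv(x), self.pool_max(x), self.pool_avg(x)]
        return self.aggregate(torch.cat(views, dim=1))

x = torch.randn(1, 128, 64, 64)
print(GDLA(128)(x).shape)                       # torch.Size([1, 128, 32, 32])
```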

21 pages, 4746 KB  
Article
YOLO-PV: An Enhanced YOLO11n Model with Multi-Scale Feature Fusion for Photovoltaic Panel Defect Detection
by Wentao Cai and Hongfang Lv
Energies 2025, 18(20), 5489; https://doi.org/10.3390/en18205489 - 17 Oct 2025
Abstract
Photovoltaic (PV) panel defect detection is essential for maintaining power generation efficiency and ensuring the safe operation of solar plants. Conventional detectors often suffer from low accuracy and limited adaptability to multi-scale defects. To address these issues, we propose YOLO-PV, an enhanced YOLO11n-based model incorporating three novel modules: the Enhanced Hybrid Multi-Scale Block (EHMSB), the Efficient Scale-Specific Attention Block (ESMSAB), and the ESMSAB-FPN for refined multi-scale feature fusion. YOLO-PV is evaluated on the PVEL-AD dataset and compared against representative detectors including YOLOv5n, YOLOv6n, YOLOv8n, YOLO11n, Faster R-CNN, and RT-DETR. Experimental results demonstrate that YOLO-PV achieves a 6.7% increase in Precision, a 2.9% improvement in mAP@0.5, and a 4.4% improvement in mAP@0.5:0.95, while maintaining real-time performance. These results highlight the effectiveness of the proposed modules in enhancing detection accuracy for PV defect inspection, providing a reliable and efficient solution for smart PV maintenance. Full article

22 pages, 1678 KB  
Article
Image Completion Network Considering Global and Local Information
by Yubo Liu, Ke Chen and Alan Penn
Buildings 2025, 15(20), 3746; https://doi.org/10.3390/buildings15203746 - 17 Oct 2025
Abstract
Accurate depth image inpainting in complex urban environments remains a critical challenge due to occlusions, reflections, and sensor limitations, which often result in significant data loss. We propose a hybrid deep learning framework that explicitly combines local and global modelling through Convolutional Neural Networks (CNNs) and Transformer modules. The model employs a multi-branch parallel architecture, where the CNN branch captures fine-grained local textures and edges, while the Transformer branch models global semantic structures and long-range dependencies. We introduce an optimized attention mechanism, Agent Attention, which differs from existing efficient/linear attention methods by using learnable proxy tokens tailored for urban scene categories (e.g., façades, sky, ground). A content-guided dynamic fusion module adaptively combines multi-scale features to enhance structural alignment and texture recovery. The framework is trained with a composite loss function incorporating pixel accuracy, perceptual similarity, adversarial realism, and structural consistency. Extensive experiments on the Paris StreetView dataset demonstrate that the proposed method achieves state-of-the-art performance, outperforming existing approaches in PSNR, SSIM, and LPIPS metrics. The study highlights the potential of multi-scale modeling for urban depth inpainting and discusses challenges in real-world deployment, ethical considerations, and future directions for multimodal integration. Full article
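Agent Attention itself is published elsewhere, so its core two-step computation can be sketched faithfully: a small bank of proxy tokens first attends to all image tokens, then the image tokens attend to the proxies, giving linear rather than quadratic cost in token count. The per-category proxies this paper describes are simplified here to a plain learnable bank:

```python
import torch
import torch.nn as nn

class AgentAttention(nn.Module):
    """Linear-cost attention through learnable proxy (agent) tokens:
    agents attend to all tokens, then tokens attend to the agents.
    Cost is O(n_tokens * n_agents) instead of O(n_tokens^2)."""
    def __init__(self, dim: int, n_agents: int = 16):
        super().__init__()
        self.agents = nn.Parameter(torch.randn(n_agents, dim) / dim ** 0.5)
        self.qkv = nn.Linear(dim, 3 * dim)
        self.scale = dim ** -0.5

    def forward(self, x):                        # x: (batch, n_tokens, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        a = self.agents.expand(x.size(0), -1, -1)            # (batch, n_agents, dim)
        agent_feats = torch.softmax(a @ k.transpose(1, 2) * self.scale, -1) @ v
        return torch.softmax(q @ a.transpose(1, 2) * self.scale, -1) @ agent_feats

tokens = torch.randn(2, 1024, 256)               # e.g. a 32x32 feature map, flattened
print(AgentAttention(256)(tokens).shape)         # torch.Size([2, 1024, 256])
```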

17 pages, 2475 KB  
Article
YOLO-LMTB: A Lightweight Detection Model for Multi-Scale Tea Buds in Agriculture
by Guofeng Xia, Yanchuan Guo, Qihang Wei, Yiwen Cen, Loujing Feng and Yang Yu
Sensors 2025, 25(20), 6400; https://doi.org/10.3390/s25206400 - 16 Oct 2025
Abstract
Tea bud targets are typically located in complex environments characterized by multi-scale variations, high density, and strong color resemblance to the background, which pose significant challenges for rapid and accurate detection. To address these issues, this study presents YOLO-LMTB, a lightweight multi-scale detection model based on the YOLOv11n architecture. First, a Multi-scale Edge-Refinement Context Aggregator (MERCA) module is proposed to replace the original C3k2 block in the backbone. MERCA captures multi-scale contextual features through hierarchical receptive field collaboration and refines edge details, thereby significantly improving the perception of fine structures in tea buds. Furthermore, a Dynamic Hyperbolic Token Statistics Transformer (DHTST) module is developed to replace the original PSA block. This module dynamically adjusts feature responses and statistical measures through attention weighting using learnable threshold parameters, effectively enhancing discriminative features while suppressing background interference. Additionally, a Bidirectional Feature Pyramid Network (BiFPN) is introduced to replace the original network structure, enabling the adaptive fusion of semantically rich and spatially precise features via bidirectional cross-scale connections while reducing computational complexity. On a self-built tea bud dataset, experimental results demonstrate that, compared to the original model, the YOLO-LMTB model achieves a 2.9% improvement in precision (P), along with increases of 1.6% and 2.0% in mAP50 and mAP50-95, respectively. Simultaneously, the number of parameters is reduced by 28.3% and the model size by 22.6%. To further validate the effectiveness of the improvement scheme, experiments were also conducted on public datasets. The results demonstrate that each enhancement module boosts the model’s detection performance and exhibits strong generalization capability. The model not only excels in multi-scale tea bud detection but also offers a valuable reference for reducing computational complexity, thereby providing a technical foundation for the practical application of intelligent tea-picking systems. Full article
(This article belongs to the Section Smart Agriculture)
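BiFPN's signature operation, fast normalized feature fusion, is simple to show: each input map gets a learnable non-negative scalar weight, normalized to sum to one without a softmax. The feature names in the usage lines are hypothetical:

```python
import torch
import torch.nn as nn

class FastNormalizedFusion(nn.Module):
    """BiFPN-style weighted fusion: each input feature map gets a learnable,
    non-negative scalar weight, normalized so the weights sum to one."""
    def __init__(self, n_inputs: int):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))

    def forward(self, feats):                    # list of same-shape tensors
        w = torch.relu(self.w)                   # keep weights non-negative
        w = w / (w.sum() + 1e-4)                 # fast normalization (no softmax)
        return sum(wi * f for wi, f in zip(w, feats))

p4_td = torch.randn(1, 64, 40, 40)               # hypothetical top-down P4 feature
p4_in = torch.randn(1, 64, 40, 40)               # lateral P4 from the backbone
print(FastNormalizedFusion(2)([p4_td, p4_in]).shape)  # torch.Size([1, 64, 40, 40])
```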

20 pages, 2565 KB  
Article
GBV-Net: Hierarchical Fusion of Facial Expressions and Physiological Signals for Multimodal Emotion Recognition
by Jiling Yu, Yandong Ru, Bangjun Lei and Hongming Chen
Sensors 2025, 25(20), 6397; https://doi.org/10.3390/s25206397 - 16 Oct 2025
Abstract
A core challenge in multimodal emotion recognition lies in the precise capture of the inherent multimodal interactive nature of human emotions. Addressing the limitation of existing methods, which often process visual signals (facial expressions) and physiological signals (EEG, ECG, EOG, and GSR) in isolation and thus fail to exploit their complementary strengths effectively, this paper presents a new multimodal emotion recognition framework called the Gated Biological Visual Network (GBV-Net). This framework enhances emotion recognition accuracy through deep synergistic fusion of facial expressions and physiological signals. GBV-Net integrates three core modules: (1) a facial feature extractor based on a modified ConvNeXt V2 architecture incorporating lightweight Transformers, specifically designed to capture subtle spatio-temporal dynamics in facial expressions; (2) a hybrid physiological feature extractor combining 1D convolutions, Temporal Convolutional Networks (TCNs), and convolutional self-attention mechanisms, adept at modeling local patterns and long-range temporal dependencies in physiological signals; and (3) an enhanced gated attention fusion module capable of adaptively learning inter-modal weights to achieve dynamic, synergistic integration at the feature level. A thorough investigation of the publicly accessible DEAP and MAHNOB-HCI datasets reveals that GBV-Net surpasses contemporary methods. Specifically, on the DEAP dataset, the model attained classification accuracies of 95.10% for Valence and 95.65% for Arousal, with F1-scores of 95.52% and 96.35%, respectively. On MAHNOB-HCI, the accuracies achieved were 97.28% for Valence and 97.73% for Arousal, with F1-scores of 97.50% and 97.74%, respectively. These experimental findings substantiate that GBV-Net effectively captures deep-level interactive information between multimodal signals, thereby improving emotion recognition accuracy. Full article
(This article belongs to the Section Biomedical Sensors)
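The gated attention fusion at the heart of GBV-Net can be illustrated with the generic gated-multimodal pattern: a gate network reads both modality vectors and emits per-dimension weights for a convex combination. A sketch of the idea, not GBV-Net itself:

```python
import torch
import torch.nn as nn

class GatedAttentionFusion(nn.Module):
    """Adaptive weighting of a facial-expression feature and a physiological
    feature: the gate looks at both vectors and outputs per-dimension weights
    so the network can lean on whichever modality is more informative."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, face_feat, physio_feat):   # both (batch, dim)
        g = self.gate(torch.cat([face_feat, physio_feat], dim=-1))
        return g * face_feat + (1 - g) * physio_feat

face = torch.randn(8, 512)                       # e.g. from the ConvNeXt V2 branch
physio = torch.randn(8, 512)                     # e.g. from the 1D-conv/TCN branch
print(GatedAttentionFusion(512)(face, physio).shape)  # torch.Size([8, 512])
```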
