Search Results (700)

Search Parameters:
Keywords = attention-guided feature enhancement

40 pages, 15849 KB  
Article
Incorporating Structural Prior Knowledge into YOLO for Robust Infrastructure Damage Detection
by Zichen Zhang and Chengjun Guo
Buildings 2026, 16(11), 2105; https://doi.org/10.3390/buildings16112105 (registering DOI) - 25 May 2026
Abstract
Vision-based structural defect detection methods based on YOLOv11 have achieved promising performance in recent years; however, their robustness in real engineering environments remains limited due to illumination variation, shadow occlusion, surface contamination, and complex background textures. Existing data-driven approaches primarily rely on visual appearance features while neglecting the intrinsic geometric continuity and morphological characteristics associated with structural failures such as cracks and spalling. To address these challenges, this study proposes an enhanced defect detection framework termed GCA-YOLO for intelligent structural inspection. The proposed method integrates a Geometric Constraint Attention (GCA) module and a Residual Efficient Channel Attention (RECA) module to improve feature representation. Instead of explicit physical simulation, the GCA module embeds morphology-guided geometric priors into the attention mechanism using differentiable gradient and Laplacian operators. This enforces structural continuity perception and suppresses geometrically inconsistent responses caused by background noise. Furthermore, a geometry confidence gating mechanism adaptively modulates the contribution of morphological features, while the RECA module recalibrates channel-wise responses to enhance the representation of weak and low-contrast defects. To comprehensively evaluate the proposed method, experiments were conducted on three representative datasets, including a public crack dataset and two self-built datasets (one for peeling/detachment and one for crack defects). These datasets were collected from diverse civil infrastructure scenarios such as bridges, tunnels, and pavements under challenging conditions including low illumination, shadow occlusion, complex textures, and heterogeneous backgrounds. 
Compared with the baseline YOLOv11 model, GCA-YOLO improves mAP@0.5 by 2.2%, 2.5%, and 15.9% on the public crack dataset, the self-built peeling/detachment dataset, and the self-built crack dataset, respectively. Recall improves by 4.6%, 3.8%, and 33.1%, respectively, showing that the dual-attention design localizes defects more completely and reduces missed detections. Despite these gains, the framework remains lightweight and adds no significant computational overhead. Across the defect categories and engineering scenarios tested, the framework shows strong robustness, stable generalization, and favorable detection efficiency, indicating promising potential for intelligent infrastructure inspection, urban safety monitoring, and practical engineering deployment.
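The channel recalibration performed by RECA can be illustrated with a minimal pure-Python sketch of efficient channel attention (ECA): pool each channel to a scalar, slide a small 1D kernel across the channel axis, squash with a sigmoid, and rescale. The residual form `x + x * w`, the fixed kernel values, and all function names are assumptions for illustration, not the authors' RECA module.

```python
import math

def global_avg_pool(feat):
    # feat: C channels, each an H x W nested list -> one mean per channel
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]

def conv1d_same(vec, kernel):
    # slide an odd-length kernel across the channel axis with zero padding,
    # so each channel's weight depends only on its nearest neighbouring channels
    pad = len(kernel) // 2
    out = []
    for i in range(len(vec)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - pad
            if 0 <= idx < len(vec):
                acc += w * vec[idx]
        out.append(acc)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def residual_eca(feat, kernel=(0.25, 0.5, 0.25)):
    # kernel is fixed here for illustration; in a trained network it would be learned
    pooled = global_avg_pool(feat)
    weights = [sigmoid(z) for z in conv1d_same(pooled, kernel)]
    # assumed residual form: out = x + x * w_c (identity path keeps weak responses alive)
    out = [[[v + v * w for v in row] for row in ch] for ch, w in zip(feat, weights)]
    return out, weights

feat = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
    [[4.0, 4.0], [4.0, 4.0]],
]
out, weights = residual_eca(feat)
print([round(w, 3) for w in weights])  # → [0.562, 0.818, 0.905]
```

Channels with stronger pooled responses (and stronger neighbours) receive larger weights, while the identity path ensures that a weakly weighted channel is never fully suppressed.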

32 pages, 9709 KB  
Article
HSSD-YOLO: A Motion-Blur-Robust Object Detection Framework for Real-Time Seed Detection in High-Speed Pneumatic Seeders
by Yizheng Yao, Zishun Huang, Jiaqi Li, Xueyu Sun and Ying Zang
Agriculture 2026, 16(11), 1160; https://doi.org/10.3390/agriculture16111160 (registering DOI) - 25 May 2026
Abstract
For high-speed pneumatic seeders, accurate real-time seed detection underpins downstream quality assessments including seed counting, seeding-rate estimation, and uniformity evaluation. Under high-speed operating conditions, seeds exhibit rapid motion, dense distribution, frequent occlusion, and severe motion-blur-induced edge degradation, posing substantial challenges for vision-based detection. This study proposes HSSD-YOLO, an improved detection algorithm built upon YOLOv11, incorporating three modules: a Motion Blur Enhanced Stem module (MBE-Stem) employing learnable Sobel gradient operators for edge feature extraction under motion blur; an Attention-enhanced Deformable Convolutional Network (ADCN) with a Residual Spatial-Channel Attention (RSCA) mechanism for adaptive sampling of irregularly shaped seeds; and an Edge-Guided Adaptive Recalibration Feature Pyramid Network (EGAR-FPN) injecting edge prior information into multi-scale feature fusion. On a self-constructed dataset of indica rice, japonica rice, and wheat seeds, HSSD-YOLO achieves 96.6% mAP@0.5 and 77.4% mAP@0.5–0.95, surpassing YOLOv11n by 2.5 and 5.4 percentage points, respectively, with only 5.2 M parameters. Ablation studies confirm that the combined gains of the modules exceed the sum of their individual gains. Under the conditions evaluated, HSSD-YOLO outperformed all compared algorithms, providing the per-frame detection foundation for downstream seeding-quality tasks; empirical validation of those tasks on continuous video and embedded hardware remains outside the present scope.
(This article belongs to the Special Issue Intelligent Agricultural Seeding Equipment)
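A learnable Sobel stem starts from fixed gradient kernels and lets training adjust them. The sketch below is a hypothetical illustration rather than the MBE-Stem itself: it applies Sobel-initialized filters to a step edge before and after a crude horizontal motion blur (`horizontal_motion_blur` is an assumed stand-in for real blur) to show how blur lowers and spreads the gradient peak that such a stem has to recover.

```python
import math

# Sobel kernels used as the initial weights of two gradient filters; in a learnable
# stem these values would be updated by backpropagation, here they stay fixed.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def correlate_valid(img, kernel):
    # 'valid' 2D cross-correlation, as computed by a conv layer without padding
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [[sum(kernel[a][b] * img[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]

def gradient_magnitude(img, kx=SOBEL_X, ky=SOBEL_Y):
    gx, gy = correlate_valid(img, kx), correlate_valid(img, ky)
    return [[math.hypot(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(gx, gy)]

def horizontal_motion_blur(img, width=3):
    # crude linear motion blur: average `width` pixels along each row, clamping at the borders
    half = width // 2
    n = len(img[0])
    return [[sum(row[min(max(j + d, 0), n - 1)] for d in range(-half, half + 1)) / width
             for j in range(n)] for row in img]

sharp = [[0, 0, 0, 1, 1, 1] for _ in range(5)]  # vertical step edge
blurred = horizontal_motion_blur(sharp)
peak_sharp = max(map(max, gradient_magnitude(sharp)))
peak_blurred = max(map(max, gradient_magnitude(blurred)))
print(peak_sharp, round(peak_blurred, 3))  # → 4.0 2.667
```

The blurred edge yields a lower, wider gradient response; letting the kernel weights adapt during training is one way a network can compensate for that loss of edge contrast.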

6080 KB  
Proceeding Paper
Advancing Colorectal Polyp Detection in Colonoscopy Through Region-Guided Deep Learning
by Fairooz Nahiyan, Simoon Nahar, Taslim Alam, Md. Khaliluzzaman and Mohammad Mahadi Hassan
Eng. Proc. 2026, 124(1), 118; https://doi.org/10.3390/engproc2026124118 - 22 May 2026
Abstract
In colonoscopy, accurate detection of colorectal polyps is key to effective prevention and treatment, but manual identification can limit diagnostic accuracy. Colorectal polyps are abnormal tissue growths in the colon or rectum, and their varied sizes, shapes, and textures can make them difficult to find. Researchers have now turned to deep learning, and to the YOLOv11 detection framework in particular, to automate the recognition and accurate localization of these growths. Specifically, the proposed method modifies the conventional YOLOv11 detection workflow by generating bounding box annotations from polyp segmentation masks, applying region-aware data preprocessing and augmentation, and training the detector under region-guided supervision to enhance localization precision and detection robustness. Deriving bounding boxes from polyp segmentation masks not only provides exact spatial supervision but also avoids the inconsistency of manual box labeling. Region-aware preprocessing and augmentation focus on polyp-relevant regions and suppress background noise, which sharpens feature discrimination for small or irregular polyps. Region-guided supervision additionally provides explicit guidance for localizing objects within the anatomical polyp regions, which helps achieve accurate boundaries and prevent false detections. The proposed YOLOv11-based polyp detection system was evaluated on the publicly available Kvasir-SEG dataset of annotated colonoscopy images. Enhanced data preprocessing, extensive training, and appropriate hyperparameter choices improved the reliability and usability of the model. The model achieved an Intersection over Union (IoU) score of 0.9764 and an overall accuracy of 99.00%, with well-balanced precision, recall, and F1-scores.
Its mean Average Precision (mAP) of 0.9937 at an Intersection over Union threshold of 0.5, and 0.9935 averaged over thresholds from 0.5 to 0.95, shows that the model detects polyps consistently and reliably. The proposed system was also compared with the Segment Anything Model (SAM), YOLO-Seg, and SAM2, and the comparison confirmed the efficacy of the method.
(This article belongs to the Proceedings of The 6th International Electronic Conference on Applied Sciences)
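Two steps in this pipeline are easy to make concrete: deriving a tight bounding box from a binary segmentation mask, and scoring a detection against it by IoU at the ten thresholds (0.50 to 0.95 in steps of 0.05) conventionally behind mAP@0.5:0.95. This is a minimal sketch with hypothetical names and a toy mask; it shows only the per-box match decision, not the full precision-recall averaging that produces mAP.

```python
def mask_to_box(mask):
    # tight box (x_min, y_min, x_max, y_max), inclusive pixel coordinates, around foreground pixels
    ys = [i for i, row in enumerate(mask) if any(row)]
    if not ys:
        return None  # empty mask: no polyp, no box
    xs = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
    return (min(xs), min(ys), max(xs), max(ys))

def box_iou(a, b):
    # +1 because coordinates are inclusive pixel indices
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = iw * ih

    def area(r):
        return (r[2] - r[0] + 1) * (r[3] - r[1] + 1)

    return inter / (area(a) + area(b) - inter)

# COCO-style IoU thresholds 0.50, 0.55, ..., 0.95
THRESHOLDS = [round(0.5 + 0.05 * k, 2) for k in range(10)]

mask = [[0] * 6 for _ in range(6)]
for i in range(1, 4):
    for j in range(2, 5):
        mask[i][j] = 1  # 3 x 3 polyp region

gt_box = mask_to_box(mask)       # derived annotation
pred_box = (2, 1, 4, 4)          # hypothetical detector output, one row too tall
iou = box_iou(pred_box, gt_box)  # 9 / 12
matched = [iou >= t for t in THRESHOLDS]
print(gt_box, iou, sum(matched))  # → (2, 1, 4, 3) 0.75 6
```

A box that is one pixel row too tall still counts as a hit at 0.5 but misses at the stricter thresholds, which is why mAP averaged over 0.5 to 0.95 is the more demanding figure.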

20 pages, 58594 KB  
Article
FLKFormer: Frequency-Enhanced Large-Kernel Framework for Object Detection in UAV Imagery
by Yunhao Chen, Wen-Zhun Huang, Zhen Wang, Sihao Zeng and Chen Yang
Remote Sens. 2026, 18(11), 1686; https://doi.org/10.3390/rs18111686 - 22 May 2026
Viewed by 87
Abstract
UAV object detection remains challenging due to large scale variation, dense small objects, frequent occlusion, and complex background interference. Existing CNN-based detectors are often limited by weak small-object representation, while Transformer-based detectors may not adequately preserve local details in dense aerial scenes. This paper proposes a dual-path detection framework that integrates frequency-domain enhancement with large-kernel convolution and Transformer-based global modeling. An FFT Large-Kernel Convolution (FFLKC) module is introduced to enhance high-frequency details and enlarge the effective receptive field. A Transformer pathway with Full-Process Feature Attention (FPFA) is designed to strengthen long-range dependency modeling and semantic representation. A Frequency-Semantic Memory-guided Adaptive Fusion (FMSAF) module is further employed to integrate local detail features and global contextual information. Experiments on UAVDT and VisDrone demonstrate that the proposed method achieves superior overall detection performance and stronger small-object perception than mainstream detectors. The method reaches 58.7 AP and 51.8 APS (AP on small objects) on UAVDT, and 39.4 AP and 30.5 APS on VisDrone. Qualitative and quantitative results verify the effectiveness of the proposed design in improving detection quality under complex UAV backgrounds.
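The benefit of performing large-kernel convolution in the frequency domain follows from the convolution theorem: circular convolution becomes element-wise multiplication of spectra, so the cost no longer grows with kernel size. The 1D pure-Python sketch below uses a toy radix-2 FFT and is not the FFLKC module, which operates on 2D feature maps; it checks that both routes agree and adds a simple high-frequency gain (`boost_high_freq`, a hypothetical helper) as an assumed stand-in for detail enhancement.

```python
import cmath

def fft(x):
    # recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + [even[k] - tw[k] for k in range(n // 2)]

def ifft(X):
    # inverse via the conjugation identity: ifft(X) = conj(fft(conj(X))) / n
    n = len(X)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in X])]

def circular_conv_direct(x, k):
    # O(n * len(k)): cost grows with kernel size
    n = len(x)
    return [sum(k[j] * x[(i - j) % n] for j in range(len(k))) for i in range(n)]

def circular_conv_fft(x, k):
    # O(n log n) regardless of kernel size: multiply spectra, transform back
    kp = list(k) + [0.0] * (len(x) - len(k))
    return [v.real for v in ifft([a * b for a, b in zip(fft(x), fft(kp))])]

def boost_high_freq(x, cutoff, gain):
    # scale every bin at least `cutoff` away from DC; mirrored bins are scaled alike,
    # so the output stays real and the mean (DC bin) is unchanged
    X = fft(x)
    n = len(X)
    for f in range(n):
        if min(f, n - f) >= cutoff:
            X[f] *= gain
    return [v.real for v in ifft(X)]

x = [float(i % 5) for i in range(16)]
kernel = [1.0 / 9.0] * 9  # a "large" 9-tap averaging kernel
direct = circular_conv_direct(x, kernel)
spectral = circular_conv_fft(x, kernel)
print(max(abs(a - b) for a, b in zip(direct, spectral)) < 1e-9)  # → True
sharpened = boost_high_freq(x, cutoff=4, gain=2.0)
```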
26 pages, 4609 KB  
Article
A DoveNet-Based Method for Plant Disease Image Generation
by Xinyue Sun, Xiangyan Meng and Qiufeng Wu
Appl. Sci. 2026, 16(11), 5208; https://doi.org/10.3390/app16115208 - 22 May 2026
Viewed by 69
Abstract
Image generation of plant disease in the natural environment has always been a challenging task. Traditional methods for generating plant disease images lack sufficient diversity and detailed lesions. This paper therefore applies an image harmonization method that generates diverse disease images by combining different backgrounds and target regions. To construct the dataset, we captured real disease images of soybean and rice in natural environments. Next, the Squeeze-and-Excitation (SE) attention mechanism was integrated into the domain verification network (DoveNet), together with a mask guide generator, to focus more attention on lesions. Two discriminators work together to capture global and local features, preserving critical contextual information. Finally, the improved DoveNet achieved an MSE of 43.77, a PSNR of 33.02, and an SSIM of 0.9806, corresponding to a reduction of 3.61 in MSE, an increase of 0.50 in PSNR, and a 2.49% improvement in SSIM compared with the original DoveNet. Visual Turing tests further confirmed that images generated by the improved DoveNet were of much better quality and more convincing.
(This article belongs to the Section Agricultural Science and Technology)
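For reference, a Squeeze-and-Excitation block squeezes each channel to its global average, passes the result through two fully connected layers (ReLU, then sigmoid), and rescales each channel by the resulting weight. The pure-Python sketch below uses hand-picked weights, omits biases, and does not model how the authors combine SE with the mask guide generator inside DoveNet.

```python
import math

def se_block(feat, w1, w2):
    # feat: C x H x W; w1: (C/r) x C squeeze weights; w2: C x (C/r) excitation weights (biases omitted)
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]      # global average pool
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]  # FC + ReLU
    scale = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))     # FC + sigmoid
             for row in w2]
    # excitation: rescale each channel by its own weight in (0, 1)
    return [[[v * s for v in r] for r in ch] for ch, s in zip(feat, scale)], scale

feat = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(4)]  # 4 channels of 2 x 2
w1 = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.2, 0.6, -0.1]]              # reduction ratio r = 2
w2 = [[1.0, -0.5], [0.3, 0.8], [-0.7, 0.2], [0.1, 0.1]]
out, scale = se_block(feat, w1, w2)
print([round(s, 3) for s in scale])
```

Because the weights depend on global channel statistics, channels that respond to lesion-like patterns can be amplified relative to background channels once the weights are learned.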
33 pages, 6735 KB  
Article
ADDFNet: A Robotic Grasping Depth Map Completion Network Integrating Differential Enhancement Convolution and Hybrid Attention
by Nan Liu, Yi-Horng Lai, Yue Wu, Jiaen Wang and Xian Yu
Actuators 2026, 15(6), 280; https://doi.org/10.3390/act15060280 - 22 May 2026
Viewed by 68
Abstract
In the field of industrial robotic vision, accurate recognition and localization of transparent objects pose significant challenges. Unlike opaque objects, transparent objects are difficult to distinguish in RGB images, and due to refraction and reflection, their depth information often suffers from large-area missing or erroneous values, leading to failed grasp pose prediction. Therefore, depth completion is crucial for transparent object grasping tasks. However, existing depth completion methods still exhibit obvious limitations. Multi-stage optimization methods, while achieving high accuracy, involve complex pipelines and high computational costs. Single-stage end-to-end networks, when processing sparse edge features of transparent objects that are also contaminated by background interference, are constrained by the receptive field and smoothing effect of conventional convolutions, often resulting in contour blurring and loss of geometric details. Moreover, existing methods still lack sufficient capability in modeling multi-directional gradient variations of transparent objects under complex backgrounds. To address these issues, this paper proposes ADDFNet for transparent object depth completion, achieving synergistic improvement in accuracy and robustness through two key designs: MDAM and CMFR. To tackle the problem of sparse edge features of transparent objects that are easily disturbed by noise, we design the Multi-directional Differential Attention Module (MDAM), which explicitly extracts multi-directional gradient information through multi-branch differential convolution. Within MDAM, we introduce the Detail Enhancement Differential sub-Module (DEDM) and the Dynamic Convolution with Symmetry-enhanced Geometry Attention sub-module (DSCA) to enhance the network’s perception of fine contours and improve global–local synergistic modeling capability. 
To address insufficient cross-modal information interaction, we introduce the Cross-Modal Feature Refinement (CMFR) module, which utilizes RGB context to guide and enhance depth features layer by layer during the encoding stage, improving the accuracy and robustness of depth completion while mitigating feature degradation caused by traditional simple fusion approaches. Experimental results on the ClearPose and TransCG datasets demonstrate that ADDFNet outperforms comparison methods in terms of RMSE, REL, MAE, and threshold accuracy metrics, exhibiting more stable performance in edge recovery and internal detail reconstruction of transparent objects.
(This article belongs to the Special Issue Actuation and Sensing of Intelligent Soft Robots—2nd Edition)
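The depth-completion metrics named above are commonly defined as in the sketch below: errors are averaged only over pixels with valid ground truth, and threshold accuracy is the share of pixels whose ratio max(p/g, g/p) falls under a threshold. The thresholds 1.05, 1.10, and 1.25 and the function name are assumptions; the paper's exact evaluation masks (for example, restricting scoring to transparent-object pixels) may differ.

```python
import math

def depth_metrics(pred, gt, thresholds=(1.05, 1.10, 1.25)):
    # only pixels with valid (positive) ground truth are scored; predictions assumed positive
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    n = len(pairs)
    return {
        "RMSE": math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n),
        "MAE": sum(abs(p - g) for p, g in pairs) / n,
        "REL": sum(abs(p - g) / g for p, g in pairs) / n,  # mean absolute relative error
        # threshold accuracy: share of pixels with max(p/g, g/p) below each threshold
        "delta": {t: sum(max(p / g, g / p) < t for p, g in pairs) / n for t in thresholds},
    }

gt = [1.0, 2.0, 0.0, 4.0]  # 0.0 marks a missing depth reading (e.g. on a transparent surface)
pred = [1.0, 2.15, 0.5, 3.0]
m = depth_metrics(pred, gt)
print(round(m["RMSE"], 4), round(m["MAE"], 4), round(m["REL"], 4), m["delta"])
```

Lower RMSE, MAE, and REL are better; higher threshold accuracy is better, and the tight 1.05 threshold is the most sensitive to small errors along object edges.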
24 pages, 62422 KB  
Article
GDBNet: A Three-Branch Semantic Segmentation Network Integrating CNN and Transformer for Land Cover Classification in Ski Resorts
by Zhiwei Yi, Lingjia Gu, Ruifei Zhu, Junwei Tian and He Mi
Remote Sens. 2026, 18(10), 1666; https://doi.org/10.3390/rs18101666 - 21 May 2026
Viewed by 88
Abstract
As a critical component of ice-snow tourism, land cover classification for ski resorts is crucial to ice-snow resource management. However, there is currently a scarcity of datasets and methods capable of high-precision mapping for such fine-grained scenarios. Although Transformers with long-sequence interactions and convolutional neural networks (CNNs) have emerged as mainstream solutions, their performance remains limited on high-resolution remote sensing data characterized by small datasets and high heterogeneity. Targeting land cover classification in ski resort areas, this study proposes a triple-branch segmentation framework integrating CNNs and Transformers to extract global, detail, and boundary features (GDBNet), and constructs the first high-resolution ski resort land cover dataset (LULC_SKI), with a resolution of 0.75 m, using the JiLin-1 satellite constellation. The framework employs a backbone combining SegFormer with dual CNN branches. SegFormer captures global semantic context, while dual ResNet-18 branches extract local semantics and edge details, respectively. The neck integrates two specialized feature interaction modules, the proposed Pixel-Guided Feature Attention (PG-AFM) and Boundary-Guided Feature Attention (BG-AFM), which synergistically fuse these heterogeneous feature representations for enhanced multi-scale modeling. For the segmentation head, a multi-task learning approach supervises both semantic and edge outputs. LULC_SKI covers seven representative ski resorts in Jilin Province, China, comprising 10,000 multi-seasonal images annotated with six land cover classes: roads, vegetation, built-up areas, ski runs, water bodies, and cropland. Experiments demonstrate that GDBNet achieves 85.44% mIoU and 91.84% mF1 on LULC_SKI, outperforming other advanced models, with particularly significant improvements for linear objects such as roads and ski runs.
Extensive experimental comparisons show that GDBNet delivers consistently excellent performance on both the iSAID and LoveDA datasets, underscoring the superiority of our proposed method. Ablation studies validate the effectiveness of the triple-branch architecture, attention modules, and multi-task supervision. This work proposes a modular framework for land cover classification in complex ski resort scenarios.
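The mIoU and mF1 scores reported above are conventionally computed per class from a pixel confusion matrix and then averaged over classes, as in this pure-Python sketch. The toy labels are hypothetical, and skipping classes absent from both maps is one common convention that may not match the paper's evaluation code.

```python
def confusion_matrix(gt, pred, num_classes):
    # rows: ground-truth class, columns: predicted class
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm

def miou_mf1(cm):
    k = len(cm)
    ious, f1s = [], []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, actually another class
        fn = sum(cm[c]) - tp                       # actually c, predicted another class
        if tp + fp + fn == 0:
            continue                               # class absent from both maps: skip
        ious.append(tp / (tp + fp + fn))
        f1s.append(2 * tp / (2 * tp + fp + fn))
    return sum(ious) / len(ious), sum(f1s) / len(f1s)

gt = [0, 0, 1, 1, 2, 2]    # flattened toy label map
pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(gt, pred, 3)
miou, mf1 = miou_mf1(cm)
print(round(miou, 4), round(mf1, 4))  # → 0.5 0.6556
```

Because every class contributes equally to the mean, thin classes such as roads and ski runs weigh as much as large ones, which is why boundary-focused improvements can move mIoU noticeably.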

26 pages, 3005 KB  
Article
EcoTomHybridNet: Policy-Guided Adaptive CNN–Transformer Inference for Resource-Aware Edge-Based Tomato Leaf Disease Classification
by Oussama Nabil and Cherkaoui Leghris
Future Internet 2026, 18(5), 271; https://doi.org/10.3390/fi18050271 - 21 May 2026
Viewed by 137
Abstract
Tomato (Solanum lycopersicum) cultivation is highly vulnerable to fungal, bacterial, and viral leaf diseases that can significantly reduce crop yield and fruit quality when not detected at early stages. Although recent deep learning approaches have achieved remarkable performance in plant disease classification, many state-of-the-art architectures remain computationally expensive and therefore difficult to deploy on resource-constrained edge devices commonly used in smart agriculture environments. To address this challenge, this paper introduces EcoTomHybridNet, an adaptive resource-aware CNN–Transformer framework designed for efficient tomato leaf disease classification under edge-computing constraints. The proposed architecture combines a lightweight convolutional backbone with a dual-branch inference mechanism composed of a fast convolutional branch for computationally efficient prediction and a Transformer-enhanced branch with local self-attention for richer contextual feature extraction. Unlike conventional lightweight hybrid models relying on static inference pipelines, EcoTomHybridNet integrates a lightweight policy-guided routing mechanism that dynamically allocates inputs between the fast convolutional branch and the Transformer-enhanced branch according to input complexity. This adaptive inference strategy dynamically reduces unnecessary Transformer computations for simpler samples while preserving strong predictive performance on more challenging inputs through policy-guided branch allocation. To further improve representation capability without significantly increasing computational complexity, the proposed student network is trained using knowledge distillation from a ViT-Tiny teacher model. Experimental results on the PlantVillage tomato dataset demonstrate that EcoTomHybridNet achieves 99.42% test accuracy and 99.0% validation accuracy under the full hybrid inference configuration. 
Additional validation strategies, including 5-fold cross-validation and robustness evaluation under Gaussian noise and motion blur perturbations, indicate stable performance across different data splits and moderate image degradations, suggesting improved generalization capability beyond simple dataset memorization. Furthermore, adaptive routing experiments using a lightweight threshold-based policy mechanism achieved 99.20% test accuracy while reducing computational complexity from 0.36 GFLOPs to 0.25 GFLOPs per image, corresponding to approximately 30% computational savings. These results demonstrate the effectiveness of policy-guided adaptive inference for balancing predictive performance and computational efficiency in edge-oriented plant disease classification. Overall, EcoTomHybridNet provides an efficient and adaptive framework for intelligent plant disease monitoring in IoT-enabled smart agriculture systems.
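A threshold-based routing policy can be sketched as follows: samples on which the fast CNN branch is confident stay there, and the rest are escalated to the Transformer branch, so the average cost depends on the escalation rate. The function names, branch costs, and confidences below are hypothetical; only the last line uses the reported 0.36 and 0.25 GFLOPs, which give the stated saving of roughly 30%.

```python
def route(confidences, threshold):
    # confident samples stay on the fast CNN branch; the rest are escalated to the Transformer branch
    return ["fast" if c >= threshold else "transformer" for c in confidences]

def mean_gflops(routes, fast_cost, escalated_cost):
    # escalated samples are assumed to pay fast + Transformer cost, since the fast branch already ran
    return sum(escalated_cost if r == "transformer" else fast_cost for r in routes) / len(routes)

def relative_savings(baseline, adaptive):
    return 1.0 - adaptive / baseline

conf = [0.99, 0.95, 0.62, 0.97, 0.40, 0.91, 0.88, 0.99]  # hypothetical fast-branch confidences
routes = route(conf, threshold=0.9)
cost = mean_gflops(routes, fast_cost=0.20, escalated_cost=0.36)  # hypothetical per-branch costs
print(routes.count("transformer"), round(cost, 3))  # → 3 0.26
print(round(relative_savings(0.36, 0.25), 3))       # → 0.306
```

Raising the threshold escalates more samples, which tends to protect accuracy on hard inputs at the price of a higher average cost; the threshold is the single knob that trades one for the other.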

18 pages, 5622 KB  
Article
MscaVPR: Multi-Scale Coordinate Attention Network for Robust Visual Place Recognition
by Xiaohan Gao, Zhinong Zhong, Yongjian Tan, Ning Jing, Anran Yang and Qingren Jia
Sensors 2026, 26(10), 3261; https://doi.org/10.3390/s26103261 - 21 May 2026
Viewed by 247
Abstract
Visual place recognition (VPR) aims to localize a query image by matching its visual representation against a geotagged database. One major challenge in VPR is learning place representations that remain robust under appearance changes, viewpoint variations, and perceptual aliasing. However, existing VPR methods still show limitations in adaptive multi-scale feature fusion and viewpoint-aware training supervision, which may hinder robustness under severe viewpoint changes. In this paper, we propose MscaVPR, a VPR framework that combines multi-scale feature enhancement with azimuth-aware training. Specifically, a Multi-Scale Spatial Pyramid Attention (MSPA) module is incorporated to aggregate regional features across different spatial scales, and Coordinate Attention (CA) is used to encode positional cues for spatially refined feature learning. To further enhance viewpoint robustness, we design an azimuth-guided training strategy that selects hard positive samples with significant viewpoint discrepancies and optimizes them using an azimuth-aware auxiliary loss function. Experimental results on multiple benchmark datasets indicate that MscaVPR generally outperforms the strong baseline and demonstrates improved performance under challenging conditions. In particular, Recall@1 is improved by 2.1%, 1.9%, and 1.9% on the AmsterTime, SVOX-Night, and SVOX-Sun datasets, respectively. The results demonstrate that explicitly incorporating azimuth cues provides an effective complement to existing multi-scale and attention-based VPR methods.
(This article belongs to the Section Navigation and Positioning)
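Coordinate Attention factorizes attention into two 1D poolings, one along each spatial axis, so that the resulting gates retain where along the height and width a feature occurs. This pure-Python sketch follows that structure with a shared transform, ReLU as the nonlinearity, batch normalization omitted, and hand-picked weights; it illustrates CA in general, not the MscaVPR implementation.

```python
import math

def _matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def coordinate_attention(x, w1, wh, ww):
    # x: C x H x W feature map; w1: C_mid x C shared transform; wh, ww: C x C_mid per-axis transforms
    C, H, W = len(x), len(x[0]), len(x[0][0])
    z_h = [[sum(x[c][i]) / W for c in range(C)] for i in range(H)]                       # pool along width
    z_w = [[sum(x[c][i][j] for i in range(H)) / H for c in range(C)] for j in range(W)]  # pool along height
    # shared transform + ReLU over the concatenated H + W positions (BatchNorm omitted)
    shared = [[max(0.0, v) for v in _matvec(w1, z)] for z in z_h + z_w]
    f_h, f_w = shared[:H], shared[H:]
    g_h = [[_sigmoid(v) for v in _matvec(wh, f)] for f in f_h]  # H x C gates, one per row
    g_w = [[_sigmoid(v) for v in _matvec(ww, f)] for f in f_w]  # W x C gates, one per column
    return [[[x[c][i][j] * g_h[i][c] * g_w[j][c] for j in range(W)]
             for i in range(H)] for c in range(C)]

x = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
     [[0.5, 0.0, -1.0], [2.0, 1.0, 0.0]]]
w1 = [[0.3, 0.2]]     # C_mid = 1
wh = [[1.0], [-0.5]]  # C x C_mid
ww = [[0.5], [0.8]]
y = coordinate_attention(x, w1, wh, ww)
```

Each output value is scaled by one row gate and one column gate, so the attention map is position-aware at a cost proportional to H + W rather than H × W.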

19 pages, 26232 KB  
Article
Blind-Spot KAN-Based Background Reconstruction Network with Prior Purification for Hyperspectral Anomaly Detection
by Lifeng Yu, Yifan Liu and Hongmin Gao
Remote Sens. 2026, 18(10), 1628; https://doi.org/10.3390/rs18101628 - 19 May 2026
Viewed by 176
Abstract
Hyperspectral anomaly detection (HAD) aims to identify rare targets without relying on prior target knowledge. However, background spectra in hyperspectral images often lie on highly complex and nonlinear manifolds, making accurate modeling challenging. Although models with strong nonlinear approximation capabilities, such as Kolmogorov–Arnold Networks (KANs), provide a promising solution for capturing such complexity, self-supervised reconstruction-based HAD methods still suffer from a fundamental issue known as anomaly leakage. When the model has high representation capacity, anomalous signatures tend to be partially reconstructed, which reduces residual contrast and degrades detection performance. To address this issue, we propose a Blind-Spot KAN-based background reconstruction network with prior purification (BKP-Net), which mitigates anomaly leakage from both data and model perspectives. Specifically, we first introduce a Background Prior Purification (BPP) module to construct a cleaner background prior. This module suppresses and replaces potential outlier pixels through spatial clustering and robust weighted mean estimation. We then design a Blind-Spot KAN-based Reconstruction backbone (BKCN) to model complex nonlinear background characteristics while preventing direct information flow from the center pixel, thereby reducing anomaly leakage during reconstruction. In addition, separable convolutions are employed to enhance spatial–spectral feature representation, followed by an attention-guided fusion mechanism to suppress cross-domain interference. Furthermore, a band-wise Guided Reconstruction Refinement (GRR) strategy is introduced in the detection phase to improve structural consistency between the reconstructed background and the original hyperspectral image, leading to more reliable anomaly discrimination. 
Experimental results on four hyperspectral datasets demonstrate that the proposed method achieves competitive performance compared with several representative traditional and deep learning-based detectors.
(This article belongs to the Special Issue Super Resolution of Hyperspectral Imagery with Computer Vision)
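Why a blind spot limits anomaly leakage can be seen with a toy, non-learned reconstruction: each pixel's spectrum is predicted as the mean of its 3 × 3 neighbours, with or without the centre pixel. When the centre is excluded, an isolated anomaly cannot contribute to its own reconstruction, so its residual stays larger. Neighbour averaging and the function names here are assumptions standing in for the paper's KAN-based backbone.

```python
import math

def reconstruct(cube, i, j, blind_spot=True):
    # predict pixel (i, j) as the mean spectrum of its 3 x 3 neighbourhood
    H, W, B = len(cube), len(cube[0]), len(cube[0][0])
    acc, n = [0.0] * B, 0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if blind_spot and di == 0 and dj == 0:
                continue  # blind spot: the centre never feeds its own reconstruction
            ii, jj = i + di, j + dj
            if 0 <= ii < H and 0 <= jj < W:
                acc = [a + b for a, b in zip(acc, cube[ii][jj])]
                n += 1
    return [a / n for a in acc]

def anomaly_map(cube, blind_spot=True):
    # anomaly score = Euclidean norm of the spectral reconstruction residual
    return [[math.dist(cube[i][j], reconstruct(cube, i, j, blind_spot))
             for j in range(len(cube[0]))] for i in range(len(cube))]

cube = [[[1.0, 1.0, 1.0] for _ in range(5)] for _ in range(5)]  # uniform 3-band background
cube[2][2] = [1.0, 5.0, 1.0]                                     # single anomalous pixel
blind = anomaly_map(cube, blind_spot=True)
leaky = anomaly_map(cube, blind_spot=False)
print(round(blind[2][2], 3), round(leaky[2][2], 3))  # → 4.0 3.556
```

Even in this simple averaging model, letting the centre pixel contribute shrinks the anomaly's residual; a high-capacity learned reconstructor can leak far more, which is the motivation for the blind-spot design.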

25 pages, 795 KB  
Article
From Prediction to Planning: A Spectral-Temporal GNN and Bi-Directional Decoding RL Framework
by Peiming Zhang, Jiangang Lu, Jiajia Fu, Xinyue Di, Kai Fang, Jie Tang and Cui Yang
Signals 2026, 7(3), 47; https://doi.org/10.3390/signals7030047 - 19 May 2026
Viewed by 158
Abstract
Accurately capturing spatiotemporal dependencies and enabling effective decision support are core challenges in Intelligent Transportation Systems (ITS). Existing research often treats traffic prediction and path planning as isolated tasks. Moreover, mainstream prediction models struggle with long-term periodic patterns, while Reinforcement Learning (RL)-based planning often suffers from inefficient exploration in sparse topologies. To address these issues, this paper proposes a unified framework combining a spectral-temporal Graph Neural Network (GNN) and bi-directional decoding RL. Specifically, a time-frequency dual-stream adaptive learning module is introduced for prediction. The Fast Fourier Transform (FFT) and a Gated Recurrent Unit (GRU) are employed to capture global frequency periodicities and local temporal dynamics, respectively. Their adaptive fusion effectively mitigates the long-sequence information forgetting problem. For path planning, the task is formulated as sequence generation. A graph-aware attention encoder with adjacency masking is designed, and heuristic feature embeddings are incorporated to guide efficient exploration. Furthermore, a bi-directional autoregressive decoding strategy enhances robustness against topological bottlenecks. On PEMSD4 and PEMSD8, the proposed predictor achieves MAE/RMSE/MAPE values of 18.211/30.433/12.006 and 13.587/23.566/8.955, respectively. Path-planning simulations on the PEMSD4-derived sparse topology further demonstrate stable bi-directional RL optimization, faster convergence with heuristic guidance, and, through the sparsity-aware encoder, fewer redundant attention interactions in sparse road networks. These results validate the effectiveness of the proposed “predict-then-plan” paradigm.
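The role of the FFT stream, exposing global periodicities that a recurrent model may forget over long sequences, can be illustrated by recovering the dominant period of a series from its magnitude spectrum. PEMSD4 and PEMSD8 are recorded at 5-minute intervals (288 steps per day); to keep a naive pure-Python DFT fast, this sketch instead uses a hypothetical hourly series with a 24-step daily cycle, and the function names are illustrative.

```python
import cmath
import math

def dft_magnitudes(x):
    # naive O(n^2) DFT magnitude spectrum of the mean-removed series (bins 0..n/2)
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    return [abs(sum(xc[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)))
            for f in range(n // 2 + 1)]

def dominant_period(x):
    mags = dft_magnitudes(x)
    f = max(range(1, len(mags)), key=lambda k: mags[k])  # skip the DC bin
    return len(x) / f

# hypothetical hourly flow over 8 days: strong daily cycle plus a weaker 6-hour ripple
series = [100 + 40 * math.sin(2 * math.pi * t / 24) + 10 * math.sin(2 * math.pi * t / 6)
          for t in range(24 * 8)]
print(dominant_period(series))  # → 24.0
```

A single spectral peak summarizes a pattern spanning the whole window, which is complementary to a GRU that processes the same window step by step.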

27 pages, 1812 KB  
Article
Prototype-Guided Attention for Graph Neural Networks
by Yiran Sun and Hak-Keung Lam
Appl. Sci. 2026, 16(10), 5028; https://doi.org/10.3390/app16105028 - 18 May 2026
Viewed by 100
Abstract
The graph neural network (GNN) has demonstrated strong performance in modelling graph-structured data across multiple application domains. However, existing GNN models do not fully exploit the information inherent in data. In particular, only the labels of labelled data are used as ground truth for supervision through loss functions, while the rich feature information embedded in those data is not fully explored. To address this limitation, we propose a prototype-guided attention mechanism for GNNs, a novel architecture that constructs class prototypes from labelled data and leverages them as task-relevant information to guide representation learning. By incorporating this information as input through an attention mechanism, the resulting node embeddings capture more comprehensive and accurate graph representations, which are then passed to subsequent GNN layers. The proposed architecture can be integrated with various existing GNN models to enhance their learning capability, demonstrating wide applicability and flexibility. Experiments on node classification tasks across multiple benchmark datasets demonstrate that the proposed attention-based GNN architecture outperforms the corresponding GNN baselines in prediction performance, highlighting its effectiveness and potential for graph learning tasks.
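The core idea, class prototypes computed as means of labelled node embeddings and then used as keys and values for attention, can be sketched in a few lines. The scaled dot-product scoring, the residual combination, the function names, and the toy embeddings are all assumptions; the paper's exact attention form and how prototypes are refreshed during training are not reproduced here.

```python
import math

def class_prototypes(emb, labels, train_idx, num_classes):
    # prototype of class c = mean embedding of labelled training nodes of class c
    # (assumes every class has at least one labelled node)
    protos = []
    for c in range(num_classes):
        members = [emb[i] for i in train_idx if labels[i] == c]
        protos.append([sum(col) / len(members) for col in zip(*members)])
    return protos

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def prototype_attention(h, protos):
    # scaled dot-product attention: node embedding as query, prototypes as keys and values
    scores = [sum(a * b for a, b in zip(h, p)) / math.sqrt(len(h)) for p in protos]
    attn = softmax(scores)
    context = [sum(w * p[k] for w, p in zip(attn, protos)) for k in range(len(h))]
    # assumed combination: residual sum, passed on to the next GNN layer
    return [a + b for a, b in zip(h, context)], attn

emb = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
labels = [0, 0, -1, 1, 1, -1]  # -1: unlabelled node
protos = class_prototypes(emb, labels, train_idx=[0, 1, 3, 4], num_classes=2)
h_new, attn = prototype_attention(emb[5], protos)
print([round(a, 3) for a in attn])
```

The unlabelled node closest to the class-1 prototype attends mostly to it, so feature information from labelled nodes reaches nodes that never contribute to the loss directly.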

27 pages, 4438 KB  
Article
DOM-MUSE: A Deformable Omnidirectional State Space Architecture for Efficient Speech Enhancement
by Tsung-Jung Li, Bo-Yu Su, Jung-Shan Lin and Jeih-Weih Hung
Electronics 2026, 15(10), 2159; https://doi.org/10.3390/electronics15102159 - 18 May 2026
Abstract
Transformer-based speech enhancement (SE) architectures suffer from high computational complexity, while existing lightweight state space model (SSM) approaches are constrained to fixed one-dimensional scanning that cannot fully exploit the two-dimensional time–frequency structure of speech spectrograms. To address these limitations, we propose DOM-MUSE, a lightweight U-Net-style SE framework built upon the Mamba-2 SSM with four targeted innovations. First, a Deformable Feature Extractor (DFE) predicts per-location spatial offsets that warp the feature sampling grid to align with speech formant trajectories and harmonic structures, providing geometrically coherent inputs to the state space model. Second, a DOM Mamba Block with Cross-Dimensional Gated Fusion (CDGF) deploys two parallel Mamba-2 instances that scan the time and frequency axes independently, and uses Taylor Channel Attention (TCA) to derive semantic gates that modulate each SSM output before fusion. Third, a Phase-Guided Feature Conditioner (PGFC) computes local phase-gradient gates that suppress noise-dominated activations before the SSM stage, making the feature extraction pathway implicitly phase-aware. Fourth, an Attention-Based Skip Connection (ABSC) replaces conventional concatenation skip connections with a learned channel gate that adaptively controls the information flow from the encoder to the decoder. Experiments on the VoiceBank-DEMAND benchmark demonstrate that DOM-MUSE outperforms the reproduced MUSE baseline on all five evaluation metrics, namely PESQ (+0.077), CSIG (+0.058), CBAK (+0.026), COVL (+0.070), and STOI (+0.002), while reducing the parameter count by 24% (from 0.51 M to 0.39 M).
Notably, DOM-MUSE also surpasses MUSE++ on perceptual quality metrics (PESQ +0.061, COVL +0.032), even though MUSE++ employs dynamic SNR augmentation and an augmented multi-objective loss that DOM-MUSE deliberately omits, demonstrating that the proposed architectural innovations yield genuine improvements independent of the training strategy. When DOM-MUSE is additionally trained under the same augmented protocol as MUSE++, it achieves a PESQ of 3.46 and a COVL of 4.22, further confirming that architectural and training improvements are complementary.
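
The abstract says only that ABSC replaces concatenation with a learned channel gate. One plausible reading, sketched here purely as an assumption, is a squeeze-and-excitation-style gate. Each encoder channel is average-pooled, passed through a per-channel affine map and a sigmoid, and the resulting scalar in (0, 1) scales that channel before it is added to the matching decoder channel. `gated_skip` and its `weight`/`bias` parameters are hypothetical; in a real model they would be learned.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_skip(encoder, decoder, weight, bias):
    """Channel-gated additive skip connection (hypothetical form).

    encoder, decoder: lists of channels, each a flattened time-frequency map of floats.
    weight, bias: one gate parameter pair per channel.
    Returns decoder[c] + gate_c * encoder[c] for every channel c.
    """
    out = []
    for c, (enc, dec) in enumerate(zip(encoder, decoder)):
        pooled = sum(enc) / len(enc)                  # global average pool of the encoder channel
        gate = sigmoid(weight[c] * pooled + bias[c])  # scalar gate in (0, 1)
        out.append([d + gate * e for e, d in zip(enc, dec)])
    return out
```

Adding gated encoder features instead of concatenating them keeps the decoder's channel count unchanged, so the layer after the skip needs no extra input channels; a gate driven close to 0 effectively closes that skip path for the channel.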

25 pages, 14321 KB  
Article
A Woodblock New Year Painting Style Classification Method Based on Structural-Aware Attention and Frequency-Domain Style Statistics
by Hua Wei, Zhihua Diao, Junxiang Diao, Liqin Wen, Binbin Sun, Xiaoxuan Chen and Luping Yin
Electronics 2026, 15(10), 2158; https://doi.org/10.3390/electronics15102158 - 18 May 2026
Abstract
To address the subtle style differences, high inter-class similarity, and complex structural and texture features of woodblock New Year paintings, this paper proposes a style classification method for woodblock New Year paintings based on an improved ResNeXt-50. The method introduces SA-CBAM at the middle- and high-level feature stages; by combining channel attention with edge-enhanced spatial attention, it guides the model to focus on key structural regions such as human contours. Furthermore, a single-stage 2D discrete wavelet transform (2D-DWT) is introduced to separate deep features into low-frequency global structural components and high-frequency local detail components, enabling effective representation of both overall composition and fine-grained carving textures. A Gram matrix is then computed over the fused features to model their statistics, characterizing the overall style distribution in terms of channel correlations. The model is trained and tested on a dataset of 4043 independent images across six categories, achieving an overall classification accuracy of 97.68% and significantly outperforming mainstream models such as the Vision Transformer. Ablation experiments further verify the complementary contributions of each module to structural perception, frequency-domain feature representation, and style statistical modeling, demonstrating the effectiveness and application potential of the proposed method for the digital preservation and fine-grained style recognition of woodblock New Year paintings.
(This article belongs to the Section Artificial Intelligence)
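
The two statistics named in the abstract above can be sketched in a few lines. This is a generic illustration under stated assumptions, not the authors' code: the wavelet is assumed to be Haar (the abstract says only "single-stage 2D-DWT"), it is applied to one even-sized 2D channel at a time, and the Gram matrix is assumed to be normalized by the number of spatial positions. Function names are hypothetical.

```python
def haar_dwt2(img):
    """Single-level 2D Haar DWT (orthonormal scaling) of one even-sized 2D channel.

    Returns (ll, (d_rows, d_cols, d_diag)): the low-frequency approximation and three
    high-frequency detail bands, each half the input size along both axes.
    """
    ll, d_rows, d_cols, d_diag = [], [], [], []
    for y in range(0, len(img), 2):
        r_ll, r_rows, r_cols, r_diag = [], [], [], []
        for x in range(0, len(img[0]), 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            r_ll.append((a + b + c + d) / 2)    # local average: coarse structure
            r_rows.append((a + b - c - d) / 2)  # top row minus bottom row
            r_cols.append((a - b + c - d) / 2)  # left column minus right column
            r_diag.append((a - b - c + d) / 2)  # diagonal difference
        ll.append(r_ll)
        d_rows.append(r_rows)
        d_cols.append(r_cols)
        d_diag.append(r_diag)
    return ll, (d_rows, d_cols, d_diag)


def gram_matrix(features):
    """Channel Gram matrix G[i][j] = <F_i, F_j> / N for C channels of N flattened positions."""
    n = len(features[0])
    return [[sum(p * q for p, q in zip(fi, fj)) / n for fj in features] for fi in features]
```

For a C × H × W feature map, `haar_dwt2` would run per channel, and `gram_matrix` would take the C fused channels flattened to length H·W. The result is a C × C matrix that discards spatial layout and keeps only channel co-activation, which is why Gram statistics are a common style descriptor. Band-naming conventions (LH/HL) differ between libraries, so the detail bands are named here by the difference they compute.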

24 pages, 3112 KB  
Article
Anime Character Style Classification Based on Frequency-Domain Decoupling and Multi-Scale Feature Fusion
by Yunfeng Chen, Junxiang Diao, Hua Wei and Zhihua Diao
Electronics 2026, 15(10), 2157; https://doi.org/10.3390/electronics15102157 - 17 May 2026
Abstract
Automatic classification of anime character painting styles is of great significance to the digital cultural industry and visual content production. Existing methods are prone to shortcut learning when handling complex color rendering and cannot fully decouple high-frequency line drafts from low-frequency colors. To solve this problem, this study proposes an improved deep learning classification method based on EfficientNetV2-B0. First, random amplitude scaling (RAS) is applied at the data input stage: by randomly perturbing low-frequency amplitudes, it decouples colors from line-draft structures and suppresses, at the source, the model's excessive dependence on global color information. Second, edge-guided coordinate attention (EG-CA) is integrated into the backbone network; it uses edge weights to enhance the perception of line and contour features and improves the model's ability to capture fine-grained structural features. Third, adaptive scale feature aggregation (ASFA) is designed for the multi-scale feature fusion stage; it efficiently fuses shallow textures and deep semantics through dynamic weighting, enhancing the model's discriminative ability on complex painting styles. On a dataset of 7887 images in four categories, the model reaches a classification accuracy of 95.81%, significantly outperforming mainstream models such as MViTv2-T, while having only 7.84 M parameters and an inference speed of 68.83 FPS. Ablation experiments show that the three modules together improve the accuracy of the baseline model by 6.06%. These results indicate that the proposed method provides reliable technical support for the structured management and copyright traceability of anime images.
(This article belongs to the Section Artificial Intelligence)
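
The RAS step described above can be illustrated with a Fourier-domain sketch. The abstract says only that low-frequency amplitudes are randomly perturbed; the square low-frequency mask, the single uniform scale factor, and the naive DFT (chosen so the example needs only the standard library) are all assumptions, and the function names are hypothetical. Because only amplitudes change and phases are kept, the line structure carried by the phase and by the higher-frequency bins is left untouched, while the coarse color and brightness distribution varies.

```python
import cmath
import random


def dft2(img):
    """Naive 2D DFT of a small real-valued 2D list (O(H^2 W^2); for illustration only)."""
    h, w = len(img), len(img[0])
    return [[sum(img[y][x] * cmath.exp(-2j * cmath.pi * (u * y / h + v * x / w))
                 for y in range(h) for x in range(w))
             for v in range(w)] for u in range(h)]


def idft2(spec):
    """Inverse of dft2, returning the real part (the spectra used here stay Hermitian)."""
    h, w = len(spec), len(spec[0])
    return [[(sum(spec[u][v] * cmath.exp(2j * cmath.pi * (u * y / h + v * x / w))
                  for u in range(h) for v in range(w)) / (h * w)).real
             for x in range(w)] for y in range(h)]


def random_amplitude_scaling(img, radius=1, low=0.5, high=1.5, rng=random):
    """Scale the amplitude of low-frequency bins by one random factor, keeping all phases.

    A bin counts as low frequency if its wrap-around distance from DC is <= radius on both
    axes. Multiplying a complex coefficient by a positive real changes its amplitude but not
    its phase, and one factor over a symmetric mask keeps the spectrum Hermitian.
    """
    h, w = len(img), len(img[0])
    scale = rng.uniform(low, high)
    spec = dft2(img)
    for u in range(h):
        for v in range(w):
            if min(u, h - u) <= radius and min(v, w - v) <= radius:
                spec[u][v] *= scale
    return idft2(spec)
```

For an RGB image the function would be applied to each channel, possibly with its own random factor, and a practical pipeline would use an FFT library instead of the naive transform.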
