
Search Results (418)

Search Parameters:
Keywords = hierarchical attention mechanism

20 pages, 5076 KB  
Article
Hybrid-Domain Synergistic Transformer for Hyperspectral Image Denoising
by Haoyue Li and Di Wu
Appl. Sci. 2025, 15(17), 9735; https://doi.org/10.3390/app15179735 - 4 Sep 2025
Abstract
Hyperspectral image (HSI) denoising is challenged by complex spatial-spectral noise coupling. Existing deep learning methods, primarily designed for RGB images, fail to address HSI-specific noise distributions and spectral correlations. This paper proposes a Hybrid-Domain Synergistic Transformer (HDST) integrating frequency-domain enhancement and multiscale modeling. Key contributions include (1) a Fourier-based preprocessing module decoupling spectral noise; (2) a dynamic cross-domain attention mechanism adaptively fusing spatial-frequency features; and (3) a hierarchical architecture combining global noise modeling and detail recovery. Experiments on realistic and synthetic datasets show HDST outperforms state-of-the-art methods in PSNR, with fewer parameters. Visual results confirm effective noise suppression without spectral distortion. The framework provides a robust solution for HSI denoising, demonstrating potential for high-dimensional visual data processing. Full article
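The Fourier-based preprocessing idea in HDST can be illustrated with a per-band frequency-domain filter. The sketch below is a minimal numpy stand-in, not the authors' module: the function name, the fixed low-pass mask, and the `keep_ratio` parameter are all illustrative assumptions, used only to show how spatial noise can be suppressed in each spectral band independently.

```python
import numpy as np

def fourier_lowpass_denoise(cube, keep_ratio=0.25):
    """Suppress high-frequency spatial noise in each spectral band.

    cube: (H, W, B) hyperspectral cube. A simple illustrative stand-in
    for a Fourier-domain preprocessing step, not HDST's learned module.
    """
    H, W, B = cube.shape
    out = np.empty_like(cube, dtype=float)
    # Centered low-pass mask keeping a fraction of spatial frequencies.
    ys = np.fft.fftshift(np.fft.fftfreq(H))
    xs = np.fft.fftshift(np.fft.fftfreq(W))
    mask = (np.abs(ys)[:, None] <= 0.5 * keep_ratio) & \
           (np.abs(xs)[None, :] <= 0.5 * keep_ratio)
    for b in range(B):
        spec = np.fft.fftshift(np.fft.fft2(cube[:, :, b]))
        out[:, :, b] = np.fft.ifft2(np.fft.ifftshift(spec * mask)).real
    return out
```

In HDST itself the frequency-domain step feeds learned attention rather than a fixed mask; the fixed mask here only demonstrates the band-wise decoupling idea.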
19 pages, 805 KB  
Article
A Multi-Level Feature Fusion Network Integrating BERT and TextCNN
by Yangwu Zhang, Mingxiao Xu and Guohe Li
Electronics 2025, 14(17), 3508; https://doi.org/10.3390/electronics14173508 - 2 Sep 2025
Abstract
With the rapid growth of job-related crimes in developing economies, there is an urgent need for intelligent judicial systems to standardize sentencing practices. This study proposes a Multi-Level Feature Fusion Network (MLFFN) to enhance the accuracy and interpretability of sentencing predictions in job-related crime cases. The model integrates hierarchical legal feature representation, beginning with benchmark judgments (including starting-point penalties and additional penalties) as the foundational input. The frontend of MLFFN employs an attention mechanism to dynamically fuse word-level, segment-level, and position-level embeddings, generating a global feature encoding that captures critical legal relationships. The backend utilizes sliding-window convolutional kernels to extract localized features from the global feature map, preserving nuanced contextual factors that influence sentencing ranges. Trained on a dataset of job-related crime cases, MLFFN achieves a 6%+ performance improvement over the baseline models (BERT-base-Chinese, TextCNN, and ERNIE) in sentencing prediction accuracy. Our findings demonstrate that explicit modeling of legal hierarchies and contextual constraints significantly improves judicial AI systems. Full article
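The frontend's attention-based fusion of word-, segment-, and position-level embeddings can be sketched as attention over levels. Everything below is an illustrative assumption rather than MLFFN's trained module: in a real model the query vector would be learned, and the per-token weights would come from the trained attention head.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_levels(word_emb, seg_emb, pos_emb, query):
    """Attention-weighted fusion of three embedding levels.

    Each embedding: (seq_len, dim); query: (dim,) scoring vector
    (learned in a trained model, fixed here for illustration).
    Returns a fused (seq_len, dim) representation."""
    levels = np.stack([word_emb, seg_emb, pos_emb], axis=0)  # (3, L, D)
    scores = levels @ query                                   # (3, L)
    weights = softmax(scores, axis=0)   # per-token weights over the 3 levels
    return (weights[:, :, None] * levels).sum(axis=0)
```

With a zero query the weights are uniform and the fusion reduces to a plain average of the three levels, which is a useful sanity check.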

23 pages, 9065 KB  
Article
Multi-Scale Guided Context-Aware Transformer for Remote Sensing Building Extraction
by Mengxuan Yu, Jiepan Li and Wei He
Sensors 2025, 25(17), 5356; https://doi.org/10.3390/s25175356 - 29 Aug 2025
Abstract
Building extraction from high-resolution remote sensing imagery is critical for urban planning and disaster management, yet remains challenging due to significant intra-class variability in architectural styles and multi-scale distribution patterns of buildings. To address these limitations, we propose the Multi-Scale Guided Context-Aware Network (MSGCANet), a Transformer-based multi-scale guided context-aware network. Our framework integrates a Contextual Exploration Module (CEM) that synergizes asymmetric and progressive dilated convolutions to hierarchically expand receptive fields, enhancing discriminability for dense building features. We further design a Window-Guided Multi-Scale Attention Mechanism (WGMSAM) to dynamically establish cross-scale spatial dependencies through adaptive window partitioning, enabling precise fusion of local geometric details and global contextual semantics. Additionally, a cross-level Transformer decoder leverages deformable convolutions for spatially adaptive feature alignment and joint channel-spatial modeling. Experimental results show that MSGCANet achieves IoU values of 75.47%, 91.53%, and 83.10%, and F1-scores of 86.03%, 95.59%, and 90.78% on the Massachusetts, WHU, and Inria datasets, respectively, demonstrating robust performance across these datasets. Full article
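The core of window-guided attention is restricting self-attention to local windows so that cost stays linear in image size. The sketch below uses a fixed window size rather than WGMSAM's adaptive partitioning, and the function name and scaling are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(feat, win=4):
    """Self-attention restricted to non-overlapping win x win windows.

    feat: (H, W, D) feature map with H and W divisible by win.
    Each window's win*win tokens attend only to each other."""
    H, W, D = feat.shape
    out = np.empty_like(feat, dtype=float)
    for i in range(0, H, win):
        for j in range(0, W, win):
            x = feat[i:i + win, j:j + win].reshape(-1, D)  # window tokens
            attn = softmax(x @ x.T / np.sqrt(D), axis=-1)  # scaled dot-product
            out[i:i + win, j:j + win] = (attn @ x).reshape(win, win, D)
    return out
```

A constant feature map passes through unchanged (attention over identical tokens returns the same token), which makes the mechanics easy to verify.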
(This article belongs to the Section Optical Sensors)

18 pages, 3388 KB  
Article
Analysis of Interfacial Properties in Flax Yarn-Reinforced Epoxy Resin Composites
by Xinlong Wang, Hongjun Li, Duncan Camilleri, B. Y. R. Surnam, Zhenyu Wu, Xiaoying Cheng, Lin Shi and Wenqi Lu
Fibers 2025, 13(9), 118; https://doi.org/10.3390/fib13090118 - 29 Aug 2025
Abstract
With the increasing demand for green materials, natural fiber-reinforced composites have garnered significant attention due to their environmental benefits and cost-effectiveness. However, the weak interfacial bonding between flax fibers and resin matrices limits their broader application. This study systematically investigates the interfacial properties of single-ply and double-ply flax yarn-reinforced epoxy resin composites, focusing on interfacial shear strength (IFSS) and its influencing factors. Pull-out tests were conducted to evaluate the mechanical behavior of yarns under varying embedded lengths, while scanning electron microscopy (SEM) was employed to characterize interfacial failure modes. Critical embedded lengths were determined as 1.49 mm for single-ply and 2.71 mm for double-ply configurations. Results demonstrate that the tensile strength and elastic modulus of flax yarns decrease significantly with increasing gauge length. Single-ply yarns exhibit higher IFSS (30.90–32.03 MPa) compared to double-ply yarns (20.61–25.21 MPa), attributed to their tightly aligned fibers and larger interfacial contact area. Single-ply composites predominantly fail through interfacial debonding, whereas double-ply composites exhibit a hybrid failure mechanism involving interfacial separation, fiber slippage, and matrix fracture, caused by stress inhomogeneity from their multi-strand twisted structure. The study reveals that interfacial failure originates from the incompatibility between hydrophilic fibers and hydrophobic resin, coupled with stress concentration effects induced by the yarn’s multi-level hierarchical structure. These findings provide theoretical guidance for optimizing interfacial design in flax fiber composites to enhance load-transfer efficiency, advancing their application in lightweight, eco-friendly materials. Full article
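The IFSS values reported above follow from the standard pull-out relation τ = F_max / (π d l), which treats the embedded yarn as a cylinder of diameter d and embedded length l. The helper below illustrates the unit bookkeeping only; the function name and the cylindrical-interface assumption are illustrative, and the paper's exact data reduction may differ.

```python
import numpy as np

def interfacial_shear_strength(f_max_n, diameter_mm, embedded_len_mm):
    """Apparent IFSS from a pull-out test: tau = F_max / (pi * d * l).

    f_max_n: peak pull-out force in N; dimensions in mm.
    Returns tau in MPa (N/mm^2 == MPa)."""
    interface_area_mm2 = np.pi * diameter_mm * embedded_len_mm
    return f_max_n / interface_area_mm2
```

For example, a 30 N peak force on a 0.5 mm yarn embedded 1.0 mm gives roughly 19 MPa, the same order as the single-ply values reported above.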

17 pages, 3054 KB  
Article
Building Instance Extraction via Multi-Scale Hybrid Dual-Attention Network
by Qingqing Hu, Yiran Peng, Chi Zhang, Yunqi Lin, KinTak U and Junming Chen
Buildings 2025, 15(17), 3102; https://doi.org/10.3390/buildings15173102 - 29 Aug 2025
Abstract
Accurate building instance segmentation from high-resolution remote sensing images remains challenging due to complex urban scenes featuring occlusions, irregular building shapes, and heterogeneous textures. To address these issues, we propose a novel Multi-Scale Hybrid Dual-Attention Network (MS-HDAN), which integrates a dual-stream encoder, multi-scale feature extraction, and a hybrid attention mechanism. Specifically, the encoder is designed with a Local Feature Extraction Pathway (LFEP) and a Global Context Modeling Pathway (GCMP), enabling simultaneous capture of structural details and long-range semantic dependencies. A Local-Global Collaborative Perception Enhancement Module (LG-CPEM) is introduced to fuse the outputs from both streams, enhancing contextual representation. The decoder adopts a hierarchical up-sampling structure with skip connections and incorporates a dual-attention module to refine boundary-level details and suppress background noise. Extensive experiments on benchmark urban building datasets demonstrate that MS-HDAN significantly outperforms existing state-of-the-art methods, particularly in handling densely distributed and structurally complex buildings. The proposed framework offers a robust and scalable solution for real-world applications, such as urban planning, where precise building segmentation is crucial. Full article
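The "dual-attention" refinement can be sketched with the generic channel-then-spatial gating recipe (CBAM-style). This is only an illustration of the idea, not MS-HDAN's module: the sigmoid gates below are hand-rolled, whereas the paper's module is learned.

```python
import numpy as np

def dual_attention(feat):
    """Channel + spatial attention sketch over a (H, W, C) feature map.

    Channel gate: sigmoid of each channel's global average response.
    Spatial gate: sigmoid of each location's mean over channels."""
    chan = 1.0 / (1.0 + np.exp(-feat.mean(axis=(0, 1))))   # (C,)
    x = feat * chan[None, None, :]
    spat = 1.0 / (1.0 + np.exp(-x.mean(axis=2)))           # (H, W)
    return x * spat[:, :, None]
```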

33 pages, 2412 KB  
Review
Untangling the Complexity of Two-Component Signal Transduction in Bacteria
by Patrycja Wadach, Dagmara Jakimowicz and Martyna Gongerowska-Jac
Microorganisms 2025, 13(9), 2013; https://doi.org/10.3390/microorganisms13092013 - 28 Aug 2025
Abstract
Two-component systems (TCSs) are ubiquitous in bacteria and are central to their ability to sense and respond to diverse environmental and intracellular cues. Classically composed of a sensor histidine kinase and a cognate response regulator, TCSs control processes ranging from metabolism and development to virulence and antibiotic resistance. In addition to their biological roles, TCSs are garnering attention in synthetic biology and antimicrobial drug development. While canonical architectures have been extensively studied, increasing evidence highlights the remarkable diversity in their organization and regulation. Despite substantial progress, key questions remain regarding the prevalence and physiological relevance of non-canonical TCSs, the mechanisms ensuring signal fidelity, and the potential for engineering these systems. This review explores non-typical TCSs, focusing on their varied transcriptional regulation, alternative response regulator activities, varied control by phosphorylation, and negative control mechanisms. We discuss how bacteria manage signaling specificity among numerous TCSs through cross-talk, hierarchical interactions, and phosphorelay systems and how these features shape adaptive responses. By synthesizing current understanding and highlighting still existing knowledge gaps, this review offers a novel perspective on TCS diversity, indicating directions for future research and potential translational applications in biotechnology and medicine. Full article
(This article belongs to the Section Molecular Microbiology and Immunology)

24 pages, 4455 KB  
Article
HDAMNet: Hierarchical Dilated Adaptive Mamba Network for Accurate Cloud Detection in Satellite Imagery
by Yongcong Wang, Yunxin Li, Xubing Yang, Rui Jiang and Li Zhang
Remote Sens. 2025, 17(17), 2992; https://doi.org/10.3390/rs17172992 - 28 Aug 2025
Abstract
Cloud detection is one of the primary challenges in preprocessing high-resolution remote sensing imagery, the accuracy of which is severely constrained by the multi-scale and complex morphological characteristics of clouds. Many approaches have been proposed to detect clouds; however, these methods still face significant challenges, particularly in handling the complexities of multi-scale cloud clusters and reliably distinguishing clouds from snow, ice, and complex cloud shadows. To overcome these challenges, this paper proposes a novel cloud detection network based on the state space model (SSM), termed the Hierarchical Dilated Adaptive Mamba Network (HDAMNet). This network utilizes an encoder–decoder architecture, significantly expanding the receptive field and improving the capture of fine-grained cloud boundaries by introducing the Hierarchical Dilated Cross Scan (HDCS) mechanism in the encoder module. The multi-resolution adaptive feature extraction (MRAFE) module integrates multi-scale semantic information to reduce channel confusion and emphasize essential features effectively. The Layer-wise Adaptive Attention (LAA) mechanism adaptively recalibrates features at skip connections, balancing fine-grained boundaries with global semantic information. On three public cloud detection datasets, HDAMNet achieves state-of-the-art performance across key evaluation metrics. Particularly noteworthy is its superior performance in identifying small-scale cloud clusters, delineating complex cloud–shadow boundaries, and mitigating interference from snow and ice. Full article

27 pages, 2279 KB  
Article
HQRNN-FD: A Hybrid Quantum Recurrent Neural Network for Fraud Detection
by Yao-Chong Li, Yi-Fan Zhang, Rui-Qing Xu, Ri-Gui Zhou and Yi-Lin Dong
Entropy 2025, 27(9), 906; https://doi.org/10.3390/e27090906 - 27 Aug 2025
Abstract
Detecting financial fraud is a critical aspect of modern intelligent financial systems. Despite the advances brought by deep learning in predictive accuracy, challenges persist—particularly in capturing complex, high-dimensional nonlinear features. This study introduces a novel hybrid quantum recurrent neural network for fraud detection (HQRNN-FD). The model utilizes variational quantum circuits (VQCs) incorporating angle encoding, data reuploading, and hierarchical entanglement to project transaction features into quantum state spaces, thereby facilitating quantum-enhanced feature extraction. For sequential analysis, the model integrates a recurrent neural network (RNN) with a self-attention mechanism to effectively capture temporal dependencies and uncover latent fraudulent patterns. To mitigate class imbalance, the synthetic minority over-sampling technique (SMOTE) is employed during preprocessing, enhancing both class representation and model generalizability. Experimental evaluations reveal that HQRNN-FD attains an accuracy of 0.972 on publicly available fraud detection datasets, outperforming conventional models by 2.4%. In addition, the framework exhibits robustness against quantum noise and improved predictive performance with increasing qubit numbers, validating its efficacy and scalability for imbalanced financial classification tasks. Full article
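Angle encoding, the first step of the VQC described above, maps each classical feature to a single-qubit rotation; the register state is the tensor product of the rotated qubits. The statevector sketch below shows only this encoding step, not HQRNN-FD's full circuit (no data reuploading or entangling layers), and the function name is illustrative.

```python
import numpy as np

def angle_encode(features):
    """Angle encoding of classical features into a product state.

    Each feature x sets one qubit to cos(x/2)|0> + sin(x/2)|1>
    (an RY(x) rotation of |0>); the register is their Kronecker product.
    Returns a statevector of length 2**len(features)."""
    state = np.array([1.0])
    for x in features:
        qubit = np.array([np.cos(x / 2.0), np.sin(x / 2.0)])
        state = np.kron(state, qubit)
    return state
```

Because each qubit is a unit vector, the encoded state always has norm 1, and all-zero features map to the basis state |0...0>.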
(This article belongs to the Special Issue Quantum Computing in the NISQ Era)

24 pages, 3398 KB  
Article
DEMNet: Dual Encoder–Decoder Multi-Frame Infrared Small Target Detection Network with Motion Encoding
by Feng He, Qiran Zhang, Yichuan Li and Tianci Wang
Remote Sens. 2025, 17(17), 2963; https://doi.org/10.3390/rs17172963 - 26 Aug 2025
Abstract
Infrared dim and small target detection aims to accurately localize targets within complex backgrounds or clutter. However, under extremely low signal-to-noise ratio (SNR) conditions, single-frame detection methods often fail to effectively detect such targets. In contrast, multi-frame detection can exploit temporal cues to significantly improve the probability of detection (Pd) and reduce false alarms (Fa). Existing multi-frame approaches often employ 3D convolutions/RNNs to implicitly extract temporal features. However, they typically lack explicit modeling of target motion. To address this, we propose a Dual Encoder–Decoder Multi-Frame Infrared Small Target Detection Network with Motion Encoding (DEMNet) that explicitly incorporates motion information into the detection process. The first multi-level encoder–decoder module leverages spatial and channel attention mechanisms to fuse hierarchical features across multiple scales, enabling robust spatial feature extraction from each frame of the temporally aligned input sequence. The second encoder–decoder module encodes both inter-frame target motion and intra-frame target positional information, followed by 3D convolution to achieve effective motion information fusion. Extensive experiments demonstrate that DEMNet achieves state-of-the-art performance, outperforming recent advanced methods such as DTUM and SSTNet. For the DAUB dataset, compared to the second-best model, DEMNet improves Pd by 2.42 percentage points and reduces Fa by 4.13 × 10⁻⁶ (a 68.72% reduction). For the NUDT dataset, it improves Pd by 1.68 percentage points and reduces Fa by 0.67 × 10⁻⁶ (a 7.26% reduction) compared to the next-best model. Notably, DEMNet demonstrates even greater advantages on test sequences with SNR ≤ 3. Full article
(This article belongs to the Special Issue Recent Advances in Infrared Target Detection)

23 pages, 16581 KB  
Article
SLD-YOLO: A Lightweight Satellite Component Detection Algorithm Based on Multi-Scale Feature Fusion and Attention Mechanism
by Yonghao Li, Hang Yang, Bo Lü and Xiaotian Wu
Remote Sens. 2025, 17(17), 2950; https://doi.org/10.3390/rs17172950 - 25 Aug 2025
Abstract
Space-based on-orbit servicing missions impose stringent requirements for precise identification and localization of satellite components, while existing detection algorithms face dual challenges of insufficient accuracy and excessive computational resource consumption. This paper proposes SLD-YOLO, a lightweight satellite component detection model based on improved YOLO11, balancing accuracy and efficiency through structural optimization and lightweight design. First, we design RLNet, a lightweight backbone network that employs reparameterization mechanisms and hierarchical feature fusion strategies to reduce model complexity by 19.72% while maintaining detection accuracy. Second, we propose the CSP-HSF multi-scale feature fusion module, used in conjunction with PSConv downsampling, to effectively improve the model’s perception of multi-scale objects. Finally, we introduce SimAM, a parameter-free attention mechanism in the detection head to further improve feature representation capability. Experiments on the UESD dataset demonstrate that SLD-YOLO achieves measurable improvements compared to the baseline YOLO11s model across five satellite component detection categories: mAP50 increases by 2.22% to 87.44%, mAP50:95 improves by 1.72% to 63.25%, while computational complexity decreases by 19.72%, parameter count reduces by 25.93%, model file size compresses by 24.59%, and inference speed reaches 90.4 FPS. Validation experiments on the UESD_edition2 dataset further confirm the model’s robustness. This research provides an effective solution for target detection tasks in resource-constrained space environments, demonstrating practical engineering application value. Full article
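SimAM, the parameter-free attention used in the detection head, weights each neuron by how much it deviates from its channel's mean, with no learned parameters. The sketch below follows the published closed-form energy (a per-channel variance computed over all positions; the paper uses n = H·W − 1 in the denominator, a detail simplified here):

```python
import numpy as np

def simam(x, lam=1e-4):
    """Parameter-free SimAM attention over a (H, W, C) feature map.

    Neurons that deviate most from their channel mean get the largest
    sigmoid gate; lam is a small regularizer."""
    mu = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    energy_inv = (x - mu) ** 2 / (4.0 * (var + lam)) + 0.5
    return x * (1.0 / (1.0 + np.exp(-energy_inv)))  # sigmoid gate
```

For a constant channel the deviation term vanishes and every position gets the same gate, sigmoid(0.5), which is why SimAM adds no cost when a feature map carries no contrast.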
(This article belongs to the Special Issue Advances in Remote Sensing Image Target Detection and Recognition)

26 pages, 3068 KB  
Article
EAR-CCPM-Net: A Cross-Modal Collaborative Perception Network for Early Accident Risk Prediction
by Wei Sun, Lili Nurliyana Abdullah, Fatimah Binti Khalid and Puteri Suhaiza Binti Sulaiman
Appl. Sci. 2025, 15(17), 9299; https://doi.org/10.3390/app15179299 - 24 Aug 2025
Abstract
Early traffic accident risk prediction in complex road environments poses significant challenges due to the heterogeneous nature and incomplete semantic alignment of multimodal data. To address this, we propose a novel Early Accident Risk Cross-modal Collaborative Perception Mechanism Network (EAR-CCPM-Net) that integrates hierarchical fusion modules and cross-modal attention mechanisms to enable semantic interaction between visual, motion, and textual modalities. The model is trained and evaluated on the newly constructed CAP-DATA dataset, incorporating advanced preprocessing techniques such as bilateral filtering and a rigorous MINI-Train-Test sampling protocol. Experimental results show that EAR-CCPM-Net achieves an AUC of 0.853, AP of 0.758, and improves the Time-to-Accident (TTA0.5) from 3.927 s to 4.225 s, significantly outperforming baseline methods. These findings demonstrate that EAR-CCPM-Net effectively enhances early-stage semantic perception and prediction accuracy, providing an interpretable solution for real-world traffic risk anticipation. Full article

37 pages, 10467 KB  
Article
Cascaded Hierarchical Attention with Adaptive Fusion for Visual Grounding in Remote Sensing
by Huming Zhu, Tianqi Gao, Zhixian Li, Zhipeng Chen, Qiuming Li, Kongmiao Miao, Biao Hou and Licheng Jiao
Remote Sens. 2025, 17(17), 2930; https://doi.org/10.3390/rs17172930 - 23 Aug 2025
Abstract
Visual grounding for remote sensing (RSVG) is the task of localizing the referred object in remote sensing (RS) images by parsing free-form language descriptions. However, RSVG faces the challenge of low detection accuracy due to unbalanced multi-scale grounding capabilities, where large objects have more prominent grounding accuracy than small objects. Based on Faster R-CNN, we propose Faster R-CNN in Visual Grounding for Remote Sensing (FR-RSVG), a two-stage method for grounding RS objects. Building on this foundation, to enhance the ability to ground multi-scale objects, we propose Faster R-CNN with Adaptive Vision-Language Fusion (FR-AVLF), which introduces a layered Adaptive Vision-Language Fusion (AVLF) module. Specifically, this method can adaptively fuse deep or shallow visual features according to the input text (e.g., location-related or object characteristic descriptions), thereby optimizing semantic feature representation and improving grounding accuracy for objects of different scales. Given that RSVG is essentially an expanded form of RS object detection, and considering the knowledge the model acquired in prior RS object detection tasks, we propose Faster R-CNN with Adaptive Vision-Language Fusion Pretrained (FR-AVLFPRE). To further enhance model performance, we propose Faster R-CNN with Cascaded Hierarchical Attention Grounding and Multi-Level Adaptive Vision-Language Fusion Pretrained (FR-CHAGAVLFPRE), which introduces a cascaded hierarchical attention grounding mechanism, employs a more advanced language encoder, and improves upon AVLF by proposing Multi-Level AVLF, significantly improving localization accuracy in complex scenarios. Extensive experiments on the DIOR-RSVG dataset demonstrate that our model surpasses most existing advanced models. 
To validate the generalization capability of our model, we conducted zero-shot inference experiments on shared categories between DIOR-RSVG and both Complex Description DIOR-RSVG (DIOR-RSVG-C) and OPT-RSVG datasets, achieving performance superior to most existing models. Full article
(This article belongs to the Section AI Remote Sensing)

24 pages, 4538 KB  
Article
CNN–Transformer-Based Model for Maritime Blurred Target Recognition
by Tianyu Huang, Chao Pan, Jin Liu and Zhiwei Kang
Electronics 2025, 14(17), 3354; https://doi.org/10.3390/electronics14173354 - 23 Aug 2025
Abstract
In maritime blurred image recognition, ship collision accidents frequently result from three primary blur types: (1) motion blur from vessel movement in complex sea conditions, (2) defocus blur due to water vapor refraction, and (3) scattering blur caused by sea fog interference. This paper proposes a dual-branch recognition method specifically designed for motion blur, which represents the most prevalent blur type in maritime scenarios. Conventional approaches exhibit constrained computational efficiency and limited adaptability across different modalities. To overcome these limitations, we propose a hybrid CNN–Transformer architecture: the CNN branch captures local blur characteristics, while the enhanced Transformer module models long-range dependencies via attention mechanisms. The CNN branch employs a lightweight ResNet variant, in which conventional residual blocks are substituted with Multi-Scale Gradient-Aware Residual Block (MSG-ARB). This architecture employs learnable gradient convolution for explicit local gradient feature extraction and utilizes gradient content gating to strengthen blur-sensitive region representation, significantly improving computational efficiency compared to conventional CNNs. The Transformer branch incorporates a Hierarchical Swin Transformer (HST) framework with Shifted Window-based Multi-head Self-Attention for global context modeling. The proposed method incorporates blur invariant Positional Encoding (PE) to enhance blur spectrum modeling capability, while employing DyT (Dynamic Tanh) module with learnable α parameters to replace traditional normalization layers. This architecture achieves a significant reduction in computational costs while preserving feature representation quality. Moreover, it efficiently computes long-range image dependencies using a compact 16 × 16 window configuration. 
The proposed feature fusion module synergistically integrates CNN-based local feature extraction with Transformer-enabled global representation learning, achieving comprehensive feature modeling across different scales. To evaluate the model’s performance and generalization ability, we conducted comprehensive experiments on four benchmark datasets: VAIS, GoPro, Mini-ImageNet, and Open Images V4. Experimental results show that our method achieves superior classification accuracy compared to state-of-the-art approaches, while simultaneously enhancing inference speed and reducing GPU memory consumption. Ablation studies confirm that the DyT module effectively suppresses outliers and improves computational efficiency, particularly when processing low-quality input data. Full article
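The DyT (Dynamic Tanh) module mentioned above replaces a normalization layer with an elementwise squashing y = γ · tanh(αx) + β, where α, γ, and β are learned per-channel. The sketch fixes them as scalars for illustration; the bounded tanh is what gives the outlier suppression reported in the ablations.

```python
import numpy as np

def dyt(x, alpha=0.5, gamma=1.0, beta=0.0):
    """Dynamic Tanh (DyT): y = gamma * tanh(alpha * x) + beta.

    alpha (steepness), gamma, and beta are learned per channel in a real
    model; fixed scalars here for illustration. Output is bounded by
    |gamma| + |beta|, so extreme activations are squashed."""
    return gamma * np.tanh(alpha * x) + beta
```

An input of 1000 maps to essentially 1.0, while small inputs pass through nearly linearly with slope gamma * alpha, which is how DyT mimics a normalization layer's stabilizing effect without batch statistics.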

28 pages, 7371 KB  
Article
Deep Fuzzy Fusion Network for Joint Hyperspectral and LiDAR Data Classification
by Guangen Liu, Jiale Song, Yonghe Chu, Lianchong Zhang, Peng Li and Junshi Xia
Remote Sens. 2025, 17(17), 2923; https://doi.org/10.3390/rs17172923 - 22 Aug 2025
Abstract
Recently, Transformers have made significant progress in the joint classification task of HSI and LiDAR due to their efficient modeling of long-range dependencies and adaptive feature learning mechanisms. However, existing methods face two key challenges: first, the feature extraction stage does not explicitly model category ambiguity; second, the feature fusion stage lacks a dynamic perception mechanism for inter-modal differences and uncertainties. To this end, this paper proposes a Deep Fuzzy Fusion Network (DFNet) for the joint classification of hyperspectral and LiDAR data. DFNet adopts a dual-branch architecture, integrating CNN and Transformer structures, respectively, to extract multi-scale spatial–spectral features from hyperspectral and LiDAR data. To enhance the model’s discriminative robustness in ambiguous regions, both branches incorporate fuzzy learning modules that model class uncertainty through learnable Gaussian membership functions. In the modality fusion stage, a Fuzzy-Enhanced Cross-Modal Fusion (FECF) module is designed, which combines membership-aware attention mechanisms with fuzzy inference operators to achieve dynamic adjustment of modality feature weights and efficient integration of complementary information. DFNet, through a hierarchical design, realizes uncertainty representation within and fusion control between modalities. The proposed DFNet is evaluated on three public datasets, and the extensive experimental results indicate that the proposed DFNet considerably outperforms other state-of-the-art methods. Full article
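The learnable Gaussian membership functions in DFNet's fuzzy learning modules can be sketched directly: each class k gets a center c_k and width s_k, and a feature's membership is exp(−‖x − c_k‖²/(2s_k²)) per dimension. Centers and widths would be trained in DFNet; the version below fixes them for illustration.

```python
import numpy as np

def gaussian_membership(x, centers, sigmas):
    """Fuzzy class memberships via Gaussian membership functions.

    x: (N, D) features; centers: (K, D); sigmas: (K, D) positive widths.
    Returns (N, K) memberships in (0, 1]; a feature sitting exactly on a
    center has membership 1 for that class."""
    d2 = (((x[:, None, :] - centers[None, :, :]) ** 2)
          / (2.0 * sigmas[None] ** 2)).sum(axis=-1)
    return np.exp(-d2)
```

Unlike a softmax posterior, memberships need not sum to 1 across classes, which is exactly what lets the model represent ambiguous regions where several classes are partially plausible.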

22 pages, 23322 KB  
Article
MS-PreTE: A Multi-Scale Pre-Training Encoder for Mobile Encrypted Traffic Classification
by Ziqi Wang, Yufan Qiu, Yaping Liu, Shuo Zhang and Xinyi Liu
Big Data Cogn. Comput. 2025, 9(8), 216; https://doi.org/10.3390/bdcc9080216 - 21 Aug 2025
Abstract
Mobile traffic classification serves as a fundamental component in network security systems. In recent years, pre-training methods have significantly advanced this field. However, as mobile traffic is typically mixed with third-party services, the deep integration of such shared services results in highly similar TCP flow characteristics across different applications. This makes it challenging for existing traffic classification methods to effectively identify mobile traffic. To address the challenge, we propose MS-PreTE, a two-phase pre-training framework for mobile traffic classification. MS-PreTE introduces a novel multi-level representation model to preserve traffic information from diverse perspectives and hierarchical levels. Furthermore, MS-PreTE incorporates a focal-attention mechanism to enhance the model’s capability in discerning subtle differences among similar traffic flows. Evaluations demonstrate that MS-PreTE achieves state-of-the-art performance on three mobile application datasets, boosting the F1 score for Cross-platform (iOS) to 99.34% (up by 2.1%), Cross-platform (Android) to 98.61% (up by 1.6%), and NUDT-Mobile-Traffic to 87.70% (up by 2.47%). Moreover, MS-PreTE exhibits strong generalization capabilities across four real-world traffic datasets. Full article
