Search Results (486)

Search Parameters:
Keywords = pyramid neural network

23 pages, 6924 KiB  
Article
A Dynamic Multi-Scale Feature Fusion Network for Enhanced SAR Ship Detection
by Rui Cao and Jianghua Sui
Sensors 2025, 25(16), 5194; https://doi.org/10.3390/s25165194 - 21 Aug 2025
Viewed by 239
Abstract
This study aims to develop an enhanced YOLO algorithm to improve the ship detection performance of synthetic aperture radar (SAR) in complex marine environments. Current SAR ship detection methods face numerous challenges in complex sea conditions, including environmental interference, false detection, and multi-scale changes in detection targets. To address these issues, this study adopts a technical solution that combines multi-level feature fusion with a dynamic detection mechanism. First, a cross-stage partial dynamic channel transformer module (CSP_DTB) was designed, which combines the transformer architecture with a convolutional neural network to replace the last two C3k2 layers in the YOLOv11n main network, thereby enhancing the model’s feature extraction capabilities. Second, a general dynamic feature pyramid network (RepGFPN) was introduced to reconstruct the neck network architecture, enabling more efficient multi-scale feature fusion and information propagation. Additionally, a lightweight dynamic decoupled dual-alignment head (DYDDH) was constructed to enhance the collaborative performance of localization and classification tasks through task-specific feature decoupling. Experimental results show that the proposed DRGD-YOLO algorithm achieves significant performance improvements. On the HRSID dataset, the algorithm achieves a mean average precision (mAP50) of 93.1% at an IoU threshold of 0.50 and an mAP50–95 of 69.2% over the IoU threshold range of 0.50–0.95. Compared to the baseline YOLOv11n algorithm, the proposed method improves mAP50 and mAP50–95 by 3.3 and 4.6 percentage points, respectively. The proposed DRGD-YOLO algorithm not only significantly improves the accuracy and robustness of SAR ship detection but also demonstrates broad application potential in fields such as maritime surveillance, fisheries management, and maritime safety monitoring, providing technical support for the development of intelligent marine monitoring technology.
Full article
(This article belongs to the Section Navigation and Positioning)
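The mAP50 and mAP50–95 figures quoted above both rest on box IoU. As a minimal sketch (standard definitions, not the authors' evaluation code; names are illustrative):

```python
def iou(a, b):
    """Intersection-over-Union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

# mAP50 scores detections at the single IoU threshold 0.50;
# mAP50-95 averages AP over thresholds 0.50, 0.55, ..., 0.95.
THRESHOLDS = [0.50 + 0.05 * i for i in range(10)]
```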

27 pages, 13262 KiB  
Article
MLP-MFF: Lightweight Pyramid Fusion MLP for Ultra-Efficient End-to-End Multi-Focus Image Fusion
by Yuze Song, Xinzhe Xie, Buyu Guo, Xiaofei Xiong and Peiliang Li
Sensors 2025, 25(16), 5146; https://doi.org/10.3390/s25165146 - 19 Aug 2025
Viewed by 348
Abstract
Limited depth of field in modern optical imaging systems often results in partially focused images. Multi-focus image fusion (MFF) addresses this by synthesizing an all-in-focus image from multiple source images captured at different focal planes. While deep learning-based MFF methods have shown promising results, existing approaches face significant challenges. Convolutional Neural Networks (CNNs) often struggle to capture long-range dependencies effectively, while Transformer and Mamba-based architectures, despite their strengths, suffer from high computational costs and rigid input size constraints, frequently necessitating patch-wise fusion during inference—a compromise that undermines the realization of a true global receptive field. To overcome these limitations, we propose MLP-MFF, a novel lightweight, end-to-end MFF network built upon the Pyramid Fusion Multi-Layer Perceptron (PFMLP) architecture. MLP-MFF is specifically designed to handle flexible input scales, efficiently learn multi-scale feature representations, and capture critical long-range dependencies. Furthermore, we introduce a Dual-Path Adaptive Multi-scale Feature-Fusion Module based on Hybrid Attention (DAMFFM-HA), which adaptively integrates hybrid attention mechanisms and allocates weights to optimally fuse multi-scale features, thereby significantly enhancing fusion performance. Extensive experiments on public multi-focus image datasets demonstrate that our proposed MLP-MFF achieves competitive, and often superior, fusion quality compared to current state-of-the-art MFF methods, all while maintaining a lightweight and efficient architecture. Full article
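For intuition, classical MFF picks each pixel from whichever source image is locally sharper; MLP-MFF learns this decision end-to-end instead. A toy per-pixel selector (the focus measure and all names are illustrative, not from the paper):

```python
def fuse_by_focus(img_a, img_b, sharp_a, sharp_b):
    """Pick each pixel from the source with the higher focus measure."""
    return [a if sa >= sb else b
            for a, b, sa, sb in zip(img_a, img_b, sharp_a, sharp_b)]
```

A learned method replaces the hand-crafted sharpness maps with a predicted decision map, which avoids the seams such hard selection produces near focus boundaries.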

16 pages, 3585 KiB  
Article
FedTP-NILM: A Federated Time Pattern-Based Framework for Privacy-Preserving Distributed Non-Intrusive Load Monitoring
by Chi Zhang, Biqi Liu, Xuguang Hu, Zhihong Zhang, Zhiyong Ji and Chenghao Zhou
Machines 2025, 13(8), 718; https://doi.org/10.3390/machines13080718 - 12 Aug 2025
Viewed by 231
Abstract
Existing non-intrusive load monitoring (NILM) methods predominantly rely on centralized models, which introduce privacy vulnerabilities and lack scalability in large industrial park scenarios equipped with distributed energy resources. To address this issue, a Federated Temporal Pattern-based NILM framework (FedTP-NILM) is proposed. It aims to ensure data privacy while enabling efficient load monitoring in distributed and heterogeneous environments, thereby extending the applicability of NILM technology in large-scale industrial park scenarios. First, a federated aggregation method is proposed, which integrates the FedYogi optimization algorithm with a secret sharing mechanism to enable the secure aggregation of local data. Second, a pyramid neural network architecture is presented to capture complex temporal dependencies in load identification tasks. It integrates temporal encoding, pooling, and decoding modules, along with an enhanced feature extractor, to better learn and distinguish multi-scale temporal patterns. In addition, a hybrid data augmentation strategy is proposed to expand the distribution range of samples by adding noise and linear mixing. Finally, experimental results validate the effectiveness of the proposed federated learning framework, demonstrating superior performance in both distributed energy device identification and privacy preservation. Full article
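The paper's aggregation combines FedYogi (an adaptive server optimizer) with secret sharing; as a simpler stand-in for the underlying idea, plain size-weighted federated averaging looks like this (a sketch, not the FedTP-NILM implementation):

```python
def fed_avg(client_params, client_sizes):
    """Size-weighted average of client parameter vectors (FedAvg)."""
    total = sum(client_sizes)
    agg = [0.0] * len(client_params[0])
    for params, n in zip(client_params, client_sizes):
        for i, p in enumerate(params):
            agg[i] += p * n / total  # weight each client by its data share
    return agg
```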

13 pages, 11739 KiB  
Article
DeepVinci: Organ and Tool Segmentation with Edge Supervision and a Densely Multi-Scale Pyramid Module for Robot-Assisted Surgery
by Li-An Tseng, Yuan-Chih Tsai, Meng-Yi Bai, Mei-Fang Li, Yi-Liang Lee, Kai-Jo Chiang, Yu-Chi Wang and Jing-Ming Guo
Diagnostics 2025, 15(15), 1917; https://doi.org/10.3390/diagnostics15151917 - 30 Jul 2025
Viewed by 371
Abstract
Background: Automated surgical navigation can be separated into three stages: (1) organ identification and localization, (2) identification of the organs requiring further surgery, and (3) automated planning of the operation path and steps. With its ideal visual and operating system, the da Vinci surgical system provides a promising platform for automated surgical navigation. This study focuses on the first step in automated surgical navigation by identifying organs in gynecological surgery. Methods: Due to the difficulty of collecting da Vinci gynecological endoscopy data, we propose DeepVinci, a novel end-to-end high-performance encoder–decoder network based on convolutional neural networks (CNNs) for pixel-level organ semantic segmentation. Specifically, to overcome the drawback of a limited field of view, we incorporate a densely multi-scale pyramid module and feature fusion module, which can also enhance the global context information. In addition, the system integrates an edge supervision network to refine the segmented results on the decoding side. Results: Experimental results show that DeepVinci can achieve state-of-the-art accuracy, obtaining dice similarity coefficient and mean pixel accuracy values of 0.684 and 0.700, respectively. Conclusions: The proposed DeepVinci network presents a practical and competitive semantic segmentation solution for da Vinci gynecological surgery. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

27 pages, 5740 KiB  
Article
Localization of Multiple GNSS Interference Sources Based on Target Detection in C/N0 Distribution Maps
by Qidong Chen, Rui Liu, Qiuzhen Yan, Yue Xu, Yang Liu, Xiao Huang and Ying Zhang
Remote Sens. 2025, 17(15), 2627; https://doi.org/10.3390/rs17152627 - 29 Jul 2025
Viewed by 377
Abstract
The localization of multiple interference sources in Global Navigation Satellite Systems (GNSS) can be achieved using carrier-to-noise ratio (C/N0) information provided by GNSS receivers, such as those embedded in smartphones. However, in increasingly prevalent complex scenarios—such as the coexistence of multiple directional interferences, increased diversity and density of GNSS interference, and the presence of multiple low-power interference sources—conventional localization methods often fail to provide reliable results, thereby limiting their applicability in real-world environments. This paper presents a multi-interference-source localization method using object detection in GNSS C/N0 distribution maps. The proposed method first exploits the similarity between C/N0 data reported by GNSS receivers and image grayscale values to construct C/N0 distribution maps, thereby transforming the problem of multi-source GNSS interference localization into an object detection and localization task based on image processing techniques. Subsequently, an Oriented Squeeze-and-Excitation-based Faster Region-based Convolutional Neural Network (OSF-RCNN) framework is proposed to process the C/N0 distribution maps. Building upon the Faster R-CNN framework, the proposed method integrates an Oriented RPN (Region Proposal Network) to regress the orientation angles of directional antennas, effectively addressing their rotational characteristics. Additionally, the Squeeze-and-Excitation (SE) mechanism and the Feature Pyramid Network (FPN) are integrated at key stages of the network to improve sensitivity to small targets, thereby enhancing detection and localization performance for low-power interference sources. The simulation results verify the effectiveness of the proposed method in accurately localizing multiple interference sources under the increasingly prevalent complex scenarios described above. Full article
(This article belongs to the Special Issue Advanced Multi-GNSS Positioning and Its Applications in Geoscience)

27 pages, 11177 KiB  
Article
Robust Segmentation of Lung Proton and Hyperpolarized Gas MRI with Vision Transformers and CNNs: A Comparative Analysis of Performance Under Artificial Noise
by Ramtin Babaeipour, Matthew S. Fox, Grace Parraga and Alexei Ouriadov
Bioengineering 2025, 12(8), 808; https://doi.org/10.3390/bioengineering12080808 - 28 Jul 2025
Viewed by 428
Abstract
Accurate segmentation in medical imaging is essential for disease diagnosis and monitoring, particularly in lung imaging using proton and hyperpolarized gas MRI. However, image degradation due to noise and artifacts—especially in hyperpolarized gas MRI, where scans are acquired during breath-holds—poses challenges for conventional segmentation algorithms. This study evaluates the robustness of deep learning segmentation models under varying Gaussian noise levels, comparing traditional convolutional neural networks (CNNs) with modern Vision Transformer (ViT)-based models. Using a dataset of proton and hyperpolarized gas MRI slices from 56 participants, we trained and tested Feature Pyramid Network (FPN) and U-Net architectures with both CNN (VGG16, VGG19, ResNet152) and ViT (MiT-B0, B3, B5) backbones. Results showed that ViT-based models, particularly those using the SegFormer backbone, consistently outperformed CNN-based counterparts across all metrics and noise levels. The performance gap was especially pronounced in high-noise conditions, where transformer models retained higher Dice scores and lower boundary errors. These findings highlight the potential of ViT-based architectures for deployment in clinically realistic, low-SNR environments such as hyperpolarized gas MRI, where segmentation reliability is critical. Full article
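The Dice scores reported above follow the standard overlap definition. A minimal sketch on flat binary masks (generic formula, not the authors' evaluation code):

```python
def dice(pred, target):
    """Dice similarity coefficient, 2|A∩B| / (|A| + |B|), on 0/1 masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    # Convention: two empty masks count as a perfect match.
    return 2.0 * inter / denom if denom else 1.0
```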

32 pages, 6141 KiB  
Perspective
A Brief Perspective on Deep Learning Approaches for 2D Semantic Segmentation
by Shazia Sulemane, Nuno Fachada and João P. Matos-Carvalho
Eng 2025, 6(7), 165; https://doi.org/10.3390/eng6070165 - 18 Jul 2025
Viewed by 601
Abstract
Semantic segmentation is a vast field with many contributions, which can be difficult to organize and comprehend due to the amount of research available. Advancements in technology and processing power over the past decade have led to a significant increase in the number of developed models and architectures. This paper provides a brief perspective on 2D segmentation by summarizing the mechanisms of various neural network models and the tools and datasets used for their training, testing, and evaluation. Additionally, this paper discusses methods for identifying new architectures, such as Neural Architecture Search, and explores the emerging research field of continuous learning, which aims to develop models capable of learning continuously from new data. Full article
(This article belongs to the Special Issue Artificial Intelligence for Engineering Applications, 2nd Edition)

15 pages, 1142 KiB  
Technical Note
Terrain and Atmosphere Classification Framework on Satellite Data Through Attentional Feature Fusion Network
by Antoni Jaszcz and Dawid Połap
Remote Sens. 2025, 17(14), 2477; https://doi.org/10.3390/rs17142477 - 17 Jul 2025
Viewed by 273
Abstract
Surface, terrain, or even atmosphere analysis using images or their fragments is important due to the possibilities of further processing. In particular, attention is necessary for satellite and/or drone images. Analyzing image elements by classifying the given classes is important for obtaining information about space for autonomous systems, identifying landscape elements, or monitoring and maintaining the infrastructure and environment. Hence, in this paper, we propose a neural classifier architecture that analyzes different features by the parallel processing of information in the network and combines them with a feature fusion mechanism. The neural architecture model takes into account different types of features by extracting them by focusing on spatial, local patterns and multi-scale representation. In addition, the classifier is guided by an attention mechanism for focusing more on different channels, spatial information, and even feature pyramid mechanisms. Atrous convolutional operators were also used in such an architecture as better context feature extractors. The proposed classifier architecture is the main element of the modeled framework for satellite data analysis, which is based on the possibility of training depending on the client’s desire. The proposed methodology was evaluated on three publicly available classification datasets for remote sensing: satellite images, Visual Terrain Recognition, and USTC SmokeRS, where the proposed model achieved accuracy scores of 97.8%, 100.0%, and 92.4%, respectively. The obtained results indicate the effectiveness of the proposed attention mechanisms across different remote sensing challenges. Full article
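The atrous (dilated) convolutions mentioned above enlarge context without adding parameters: for stacked stride-1 convolutions, the receptive field is 1 plus the sum of dilation × (kernel − 1) per layer. A sketch of that standard formula (numbers illustrative):

```python
def receptive_field(kernels, dilations):
    """Receptive field of stacked stride-1 convs: 1 + sum d * (k - 1)."""
    return 1 + sum(d * (k - 1) for k, d in zip(kernels, dilations))

# Three 3x3 layers with dilations 1, 2, 4 see a 15-pixel-wide context,
# versus 7 pixels for the same stack without dilation.
print(receptive_field([3, 3, 3], [1, 2, 4]))  # 15
```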

21 pages, 24495 KiB  
Article
UAMS: An Unsupervised Anomaly Detection Method Integrating MSAA and SSPCAB
by Zhe Li, Wenhui Chen and Weijie Wang
Symmetry 2025, 17(7), 1119; https://doi.org/10.3390/sym17071119 - 12 Jul 2025
Viewed by 390
Abstract
Anomaly detection methods play a crucial role in automated quality control within modern manufacturing systems. In this context, unsupervised methods are increasingly favored due to their independence from large-scale labeled datasets. However, existing methods present limited multi-scale feature extraction ability and may fail to effectively capture subtle anomalies. To address these challenges, we propose UAMS, a pyramid-structured normalizing flow framework that leverages the symmetry in feature recombination to harmonize multi-scale interactions. The proposed framework integrates a Multi-Scale Attention Aggregation (MSAA) module for cross-scale dynamic fusion, as well as a Self-Supervised Predictive Convolutional Attention Block (SSPCAB) for spatial channel attention and masked prediction learning. Experiments on the MVTecAD dataset show that UAMS substantially outperforms state-of-the-art unsupervised methods in terms of detection and localization accuracy, while maintaining high inference efficiency. For example, when comparing UAMS against the baseline model on the carpet category, the AUROC is improved from 90.8% to 94.5%, and AUPRO is improved from 91.0% to 92.9%. These findings validate the potential of the proposed method for use in real industrial inspection scenarios. Full article
(This article belongs to the Section Computer)
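The AUROC quoted above has a useful rank interpretation: the probability that a random anomalous sample scores higher than a random normal one. A minimal pairwise sketch (generic metric, not the authors' code):

```python
def auroc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))
```

Real evaluations use a sort-based O(n log n) equivalent; the quadratic form above is just the definition made executable.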

26 pages, 6233 KiB  
Article
A Method for Recognizing Dead Sea Bass Based on Improved YOLOv8n
by Lizhen Zhang, Chong Xu, Sai Jiang, Mengxiang Zhu and Di Wu
Sensors 2025, 25(14), 4318; https://doi.org/10.3390/s25144318 - 10 Jul 2025
Viewed by 317
Abstract
Deaths occur during the culture of sea bass, and if dead fish are not removed promptly, the result is water pollution and the continued spread of sea bass deaths. Therefore, it is necessary to promptly detect dead fish and take countermeasures. Existing object detection algorithms, when applied to the task of detecting dead sea bass, often suffer from excessive model complexity, high computational cost, and reduced accuracy in the presence of occlusion. To overcome these limitations, this study introduces YOLOv8n-Deadfish, a lightweight and high-precision detection model. First, the homemade sea bass death recognition dataset was expanded to enhance the generalization ability of the neural network. Second, the C2f-faster–EMA (efficient multi-scale attention) convolutional module was designed to replace the C2f module in the backbone network of YOLOv8n, reducing redundant calculations and memory access, thereby more effectively extracting spatial features. Then, a weighted bidirectional feature pyramid network (BiFPN) was introduced to achieve a more thorough integration of deep and shallow features. Finally, to compensate for the weak generalization and slow convergence of the CIoU loss function in detection tasks, the Inner-CIoU loss function was used to accelerate bounding box regression and further improve the detection performance of the model. The experimental results show that the YOLOv8n-Deadfish model has an accuracy, recall, and mean average precision of 90.0%, 90.4%, and 93.6%, respectively, an improvement of 2.0, 1.4, and 1.3 percentage points over the original base network YOLOv8n. The number of model parameters and GFLOPs were reduced by 23.3% and 18.5%, respectively, and the detection speed was improved from the original 304.5 FPS to 424.6 FPS. This method can provide a technical basis for the identification of dead sea bass in the process of intelligent aquaculture. Full article
(This article belongs to the Section Smart Agriculture)
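The weighted BiFPN mentioned above fuses features with learnable per-input weights rather than plain addition. A sketch of the fast normalized fusion rule used in BiFPN-style necks (scalar stand-ins for feature maps; not the paper's implementation):

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse inputs with ReLU-clamped learnable weights, normalized to sum ~1."""
    ws = [max(w, 0.0) for w in weights]  # clamp keeps weights non-negative
    s = sum(ws) + eps                    # eps avoids division by zero
    return sum(w * f for w, f in zip(ws, features)) / s
```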

24 pages, 2149 KiB  
Article
STA-3D: Combining Spatiotemporal Attention and 3D Convolutional Networks for Robust Deepfake Detection
by Jingbo Wang, Jun Lei, Shuohao Li and Jun Zhang
Symmetry 2025, 17(7), 1037; https://doi.org/10.3390/sym17071037 - 1 Jul 2025
Viewed by 694
Abstract
Recent advancements in deep learning have driven the rapid proliferation of deepfake generation techniques, raising substantial concerns over digital security and trustworthiness. Most current detection methods primarily focus on spatial or frequency domain features but show limited effectiveness when dealing with compressed videos and cross-dataset scenarios. Observing that mainstream generation methods use frame-by-frame synthesis without adequate temporal consistency constraints, we introduce the Spatiotemporal Attention 3D Network (STA-3D), a novel framework that combines a lightweight spatiotemporal attention module with a 3D convolutional architecture to improve detection robustness. The proposed attention module adopts a symmetric multi-branch architecture, where each branch follows a nearly identical processing pipeline to separately model temporal-channel, temporal-spatial, and intra-spatial correlations. Our framework additionally implements Spatial Pyramid Pooling (SPP) layers along the temporal axis, enabling adaptive modeling regardless of input video length. Furthermore, we mitigate the inherent asymmetry in the quantity of authentic and forged samples by replacing standard cross-entropy loss with focal loss for training. This integration facilitates the simultaneous exploitation of inter-frame temporal discontinuities and intra-frame spatial artifacts, achieving competitive performance across various benchmark datasets under different compression conditions: for the intra-dataset setting on FF++, it improves the average accuracy by 1.09 percentage points compared to the existing SOTA, with a more significant gain of 1.63 percentage points under the most challenging C40 compression level (particularly for NeuralTextures, achieving an improvement of 4.05 percentage points); while for the cross-dataset setting, AUC is enhanced by 0.24 percentage points on the DFDC-P dataset. Full article
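The focal loss used above to offset the real/fake class imbalance is the standard binary form of Lin et al.; a minimal sketch (generic formula, not the authors' training code):

```python
import math

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss: (1 - p_t)^gamma down-weights easy, well-classified samples."""
    p_t = p if y == 1 else 1.0 - p      # probability assigned to the true class
    a_t = alpha if y == 1 else 1.0 - alpha
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma = 0 and alpha = 1 it reduces to ordinary cross-entropy; raising gamma shifts the gradient budget toward hard examples.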

16 pages, 3892 KiB  
Article
Fault Diagnosis Method for Shearer Arm Gear Based on Improved S-Transform and Depthwise Separable Convolution
by Haiyang Wu, Hui Zhou, Chang Liu, Gang Cheng and Yusong Pang
Sensors 2025, 25(13), 4067; https://doi.org/10.3390/s25134067 - 30 Jun 2025
Viewed by 342
Abstract
To address the limitations in time–frequency feature representation of shearer arm gear faults and the issues of parameter redundancy and low training efficiency in standard convolutional neural networks (CNNs), this study proposes a diagnostic method based on an improved S-transform and a Depthwise Separable Convolutional Neural Network (DSCNN). First, the improved S-transform is employed to perform time–frequency analysis on the vibration signals, converting the original one-dimensional signals into two-dimensional time–frequency images to fully preserve the fault characteristics of the gear. Then, a neural network model combining standard convolution and depthwise separable convolution is constructed for fault identification. The experimental dataset includes five gear conditions: tooth deficiency, tooth breakage, tooth wear, tooth crack, and normal. The performance of various frequency-domain and time-frequency methods—Wavelet Transform, Fourier Transform, S-transform, and Gramian Angular Field (GAF)—is compared using the same network model. Furthermore, Grad-CAM is applied to visualize the responses of key convolutional layers, highlighting the regions of interest related to gear fault features. Finally, four typical CNN architectures are analyzed and compared: Deep Convolutional Neural Network (DCNN), InceptionV3, Residual Network (ResNet), and Pyramid Convolutional Neural Network (PCNN). Experimental results demonstrate that frequency–domain representations consistently outperform raw time-domain signals in fault diagnosis tasks. Grad-CAM effectively verifies the model’s accurate focus on critical fault features. Moreover, the proposed method achieves high classification accuracy while reducing both training time and the number of model parameters. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
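The parameter savings that motivate depthwise separable convolution are easy to make concrete: a k×k standard convolution couples every input channel to every output channel, while the separable version factors this into a per-channel k×k depthwise filter plus a 1×1 pointwise mix. A counting sketch (channel sizes illustrative, biases ignored):

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a k x k standard convolution."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """One k x k depthwise filter per input channel, then a 1 x 1 pointwise conv."""
    return c_in * k * k + c_in * c_out

print(standard_conv_params(64, 128, 3))        # 73728
print(depthwise_separable_params(64, 128, 3))  # 8768
```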

24 pages, 25315 KiB  
Article
PAMFPN: Position-Aware Multi-Kernel Feature Pyramid Network with Adaptive Sparse Attention for Robust Object Detection in Remote Sensing Imagery
by Xiaofei Yang, Suihua Xue, Lin Li, Sihuan Li, Yudong Fang, Xiaofeng Zhang and Xiaohui Huang
Remote Sens. 2025, 17(13), 2213; https://doi.org/10.3390/rs17132213 - 27 Jun 2025
Viewed by 555
Abstract
Deep learning methods have achieved remarkable success in remote sensing object detection. Existing object detection methods focus on integrating convolutional neural networks (CNNs) and Transformer networks to explore local and global representations to improve performance. However, existing methods relying on fixed convolutional kernels and dense global attention mechanisms suffer from computational redundancy and insufficient discriminative feature extraction, particularly for small and rotation-sensitive targets. To address these limitations, we propose the Position-Aware Multi-Kernel Feature Pyramid Network (PAMFPN), which integrates adaptive sparse position modeling and multi-kernel dynamic fusion to achieve robust feature representation. First, we design a position-interactive context module (PICM) that incorporates distance-aware sparse attention and dynamic positional encoding. It selectively focuses computation on sparse targets through a decay function that suppresses background noise while enhancing spatial correlations of critical regions. Second, we design a dual-kernel adaptive fusion (DKAF) architecture by combining region-sensitive attention (RSA) and reconfigurable context aggregation (RCA). RSA employs orthogonal large-kernel convolutions to capture anisotropic spatial features for arbitrarily oriented targets, while RCA dynamically adjusts the kernel scales based on content complexity, effectively addressing scale variations and intraclass diversity. Extensive experiments on three benchmark datasets (DOTA-v1.0, SSDD, NWPU VHR-10) demonstrate the effectiveness and versatility of the proposed PAMFPN. This work bridges the gap between efficient computation and robust feature fusion in remote sensing detection, offering a universal solution for real-world applications. Full article
(This article belongs to the Special Issue AI-Driven Hyperspectral Remote Sensing of Atmosphere and Land)

20 pages, 67212 KiB  
Article
KPV-UNet: KAN PP-VSSA UNet for Remote Image Segmentation
by Shuiping Zhang, Qiang Rao, Lei Wang, Tang Tang and Chen Chen
Electronics 2025, 14(13), 2534; https://doi.org/10.3390/electronics14132534 - 23 Jun 2025
Viewed by 569
Abstract
Semantic segmentation of remote sensing images is a key technology for land cover interpretation and target identification. Although convolutional neural networks (CNNs) have achieved remarkable success in this field, their inherent limitation of local receptive fields restricts their ability to model long-range dependencies and global contextual information. As a result, CNN-based methods often struggle to capture the comprehensive spatial context necessary for accurate segmentation in complex remote sensing scenes, leading to issues such as the misclassification of small objects and blurred or imprecise object boundaries. To address these problems, this paper proposes a new hybrid architecture called KPV-UNet, which integrates the Kolmogorov–Arnold Network (KAN) and the Pyramid Pooling Visual State Space Attention (PP-VSSA) block. KPV-UNet introduces a deep feature refinement module based on KAN and incorporates PP-VSSA to enable scalable long-range modeling. This design effectively captures global dependencies and abundant localized semantic content extracted from complex feature spaces, overcoming CNNs’ limitations in modeling long-range dependencies and global context in large-scale complex scenes. In addition, we designed an Auxiliary Local Monitoring (ALM) block that significantly enhances KPV-UNet’s perception of local content. Experimental results demonstrate that KPV-UNet outperforms state-of-the-art methods on the Vaihingen, LoveDA Urban, and WHDLD datasets, achieving mIoU scores of 84.03%, 51.27%, and 62.87%, respectively. The proposed method not only improves segmentation accuracy but also produces clearer and more connected object boundaries in visual results. Full article
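The mIoU scores quoted above average per-class intersection-over-union over the classes present. A minimal sketch on flat label lists (generic metric, not the authors' evaluation code):

```python
def mean_iou(pred, target, n_classes):
    """Mean per-class intersection-over-union on flat label lists."""
    ious = []
    for c in range(n_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes absent from both prediction and target
            ious.append(inter / union)
    return sum(ious) / len(ious)
```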
24 pages, 6594 KiB  
Article
GAT-Enhanced YOLOv8_L with Dilated Encoder for Multi-Scale Space Object Detection
by Haifeng Zhang, Han Ai, Donglin Xue, Zeyu He, Haoran Zhu, Delian Liu, Jianzhong Cao and Chao Mei
Remote Sens. 2025, 17(13), 2119; https://doi.org/10.3390/rs17132119 - 20 Jun 2025
Viewed by 531
Abstract
The problem of inadequate object detection accuracy in complex remote sensing scenarios has been identified as a primary concern. Traditional YOLO-series algorithms face challenges such as poor robustness in small object detection and strong interference from complex backgrounds. In this paper, a multi-scale feature fusion framework based on an improved YOLOv8_L is proposed. Combining a graph attention network (GAT) with a Dilated Encoder network significantly improves detection and recognition performance for space remote sensing objects. The main improvements are as follows: the original Feature Pyramid Network (FPN) structure is abandoned in favor of an adaptive fusion strategy based on multi-level backbone features, the representation of multi-scale objects is enhanced through upsampling and feature stacking, and the FPN is reconstructed. Local features extracted by convolutional neural networks are mapped to graph-structured data, and the node-level attention mechanism of the GAT captures the global topological associations among space objects, compensating for the weakness of the convolution operation in weight allocation and realizing GAT integration. The Dilated Encoder network covers targets of different scales through differentiated receptive fields, and feature weight allocation is further optimized by combining it with a Convolutional Block Attention Module (CBAM). Reflecting the characteristics of space missions, an annotated dataset containing 8000 satellite and space station images is constructed, covering a variety of lighting, attitude, and scale conditions and providing benchmark support for model training and verification. Experimental results on the space object dataset show that the enhanced algorithm achieves a mean average precision (mAP) of 97.2%, a 2.1% improvement over the original YOLOv8_L.
Comparative experiments with six other models demonstrate that the proposed algorithm outperforms its counterparts. Ablation studies further validate the synergistic effect between the graph attention network (GAT) and the Dilated Encoder. The results indicate that the model maintains a high detection accuracy under challenging conditions, including strong light interference, multi-scale variations, and low-light environments. Full article
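The node-level attention the abstract describes follows the standard GAT formulation: each node attends to its neighbors with coefficients computed from learned projections, which lets the network reallocate weight globally rather than with fixed convolutional kernels. The following is a minimal single-head sketch of that mechanism in NumPy (an illustration of the generic GAT operation, not this paper's detection pipeline); `gat_attention` and its parameter shapes are assumptions for the example.

```python
import numpy as np

def gat_attention(h, adj, W, a, slope=0.2):
    """Single-head graph attention:
    e_ij = LeakyReLU(a^T [W h_i || W h_j]), softmaxed over each node's
    neighbours (self-loops included); output h'_i = sum_j alpha_ij * W h_j.

    h: (N, F) node features   adj: (N, N) adjacency, nonzero where edge exists
    W: (F, Fp) projection     a:   (2*Fp,) attention vector
    """
    Wh = h @ W                                        # (N, Fp) projected nodes
    Fp = Wh.shape[1]
    # Raw scores decompose into a source term and a target term
    e = (Wh @ a[:Fp])[:, None] + (Wh @ a[Fp:])[None, :]
    e = np.where(e > 0, e, slope * e)                 # LeakyReLU
    e = np.where(adj > 0, e, -1e9)                    # mask non-neighbours
    att = np.exp(e - e.max(axis=1, keepdims=True))    # stable row softmax
    att = att / att.sum(axis=1, keepdims=True)
    return att @ Wh, att

rng = np.random.default_rng(1)
N, F, Fp = 5, 8, 4
adj = np.eye(N) + (rng.random((N, N)) > 0.6)          # random graph + self-loops
h = rng.normal(size=(N, F))
out, att = gat_attention(h, adj, rng.normal(size=(F, Fp)), rng.normal(size=2 * Fp))
```

Each row of `att` sums to one over the node's neighborhood, which is exactly the data-dependent weight allocation that a fixed convolution kernel cannot provide.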
(This article belongs to the Special Issue Remote Sensing Image Thorough Analysis by Advanced Machine Learning)
