Search Results (72)

Search Parameters:
Keywords = dynamic receptive field fusion

23 pages, 9065 KB  
Article
Multi-Scale Guided Context-Aware Transformer for Remote Sensing Building Extraction
by Mengxuan Yu, Jiepan Li and Wei He
Sensors 2025, 25(17), 5356; https://doi.org/10.3390/s25175356 - 29 Aug 2025
Abstract
Building extraction from high-resolution remote sensing imagery is critical for urban planning and disaster management, yet remains challenging due to significant intra-class variability in architectural styles and multi-scale distribution patterns of buildings. To address these limitations, we propose the Multi-Scale Guided Context-Aware Network (MSGCANet), a Transformer-based framework. It integrates a Contextual Exploration Module (CEM) that synergizes asymmetric and progressive dilated convolutions to hierarchically expand receptive fields, enhancing discriminability for dense building features. We further design a Window-Guided Multi-Scale Attention Mechanism (WGMSAM) to dynamically establish cross-scale spatial dependencies through adaptive window partitioning, enabling precise fusion of local geometric details and global contextual semantics. Additionally, a cross-level Transformer decoder leverages deformable convolutions for spatially adaptive feature alignment and joint channel-spatial modeling. Experimental results show that MSGCANet achieves IoU values of 75.47%, 91.53%, and 83.10%, and F1-scores of 86.03%, 95.59%, and 90.78% on the Massachusetts, WHU, and Inria datasets, respectively, demonstrating robust performance across these datasets.
(This article belongs to the Section Optical Sensors)
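
The "progressive dilated convolutions" mentioned in this abstract are a standard technique. Below is a minimal PyTorch sketch of the general idea; it is illustrative only, and the block name, channel count, and dilation rates are assumptions rather than the authors' CEM.

```python
import torch
import torch.nn as nn

class ProgressiveDilatedBlock(nn.Module):
    """Stacked 3x3 convolutions with growing dilation rates (1, 2, 4).

    The effective receptive field grows to 15x15 while the parameter
    count stays that of three plain 3x3 convolutions.
    """
    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(
                # padding == dilation keeps the spatial size unchanged
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )

    def forward(self, x):
        for stage in self.stages:
            x = x + stage(x)  # residual: small-receptive-field cues survive
        return x

feats = torch.randn(1, 64, 128, 128)
print(ProgressiveDilatedBlock(64)(feats).shape)  # torch.Size([1, 64, 128, 128])
```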

21 pages, 5171 KB  
Article
FDBRP: A Data–Model Co-Optimization Framework Towards Higher-Accuracy Bearing RUL Prediction
by Muyu Lin, Qing Ye, Shiyue Na, Dongmei Qin, Xiaoyu Gao and Qiang Liu
Sensors 2025, 25(17), 5347; https://doi.org/10.3390/s25175347 - 28 Aug 2025
Abstract
This paper proposes the Feature fusion and Dilated causal convolution model for Bearing Remaining useful life Prediction (FDBRP), an integrated framework for accurate Remaining Useful Life (RUL) prediction of rolling bearings that combines three key innovations: (1) a data augmentation module employing sliding-window processing and two-dimensional feature concatenation with label normalization to enhance signal representation and improve model generalizability, (2) a feature fusion module incorporating an enhanced graph convolutional network for spatial modeling, an improved multi-scale temporal convolution for dynamic pattern extraction, and an efficient multi-scale attention mechanism to optimize spatiotemporal feature consistency, and (3) an optimized dilated convolution module that uses interval sampling to expand the receptive field and residual connections to regularize the network and strengthen its ability to capture long-range dependencies. Experimental validation showcases the effectiveness of the proposed approach, achieving a high average score of 0.756564 and a lower average error of 10.903656 in RUL prediction for test bearings compared to state-of-the-art benchmarks. This highlights the superior RUL prediction capability of the proposed methodology.
(This article belongs to the Section Industrial Sensors)
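
A dilated causal convolution with a residual connection, as in innovation (3), can be sketched as follows. This is a generic illustration; the block name and hyperparameters are assumed, not taken from FDBRP.

```python
import torch
import torch.nn as nn

class DilatedCausalResBlock(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 2):
        super().__init__()
        # Left-pad so each output step only sees past samples (causality).
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.act = nn.ReLU()

    def forward(self, x):                        # x: (batch, channels, time)
        y = nn.functional.pad(x, (self.pad, 0))  # pad on the left only
        y = self.act(self.conv(y))
        return x + y                             # residual keeps gradients flowing

x = torch.randn(8, 32, 100)  # e.g. 100 time steps of bearing features
print(DilatedCausalResBlock(32)(x).shape)  # torch.Size([8, 32, 100])
```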

19 pages, 14216 KB  
Article
LRA-YOLO: A Lightweight Power Equipment Detection Algorithm Based on Large Receptive Field and Attention Guidance
by Jiwen Yuan, Lei Hu and Qimin Hu
Information 2025, 16(9), 736; https://doi.org/10.3390/info16090736 - 26 Aug 2025
Abstract
Power equipment detection is a critical component of power transmission line inspection. However, existing power equipment detection algorithms often suffer from large model sizes and high computational complexity. This paper proposes a lightweight power equipment detection algorithm based on a large receptive field and attention guidance. First, we propose a lightweight large-receptive-field feature extraction module, CRepLK, which reparameterizes multiple branches into a single large-kernel convolution to improve the multi-scale detection capability of the model; second, we propose a lightweight ELA-guided Dynamic Sampling Fusion (LEDSF) neck, which alleviates, to a certain extent, the feature misalignment problem inherent in conventional neck networks; finally, we propose a lightweight Partial Asymmetric Detection Head (PADH), which exploits feature-map redundancy to make the detection head significantly lighter. Experimental results show that on the Insplad power equipment dataset, the number of parameters, computational cost (GFLOPs), and model weight size are reduced by 46.8%, 44.1%, and 46.4%, respectively, compared with the baseline model, while mAP is improved by 1%. Comparative experiments on three power equipment datasets show that our model achieves a compelling balance between efficiency and detection performance in power equipment detection scenarios.
(This article belongs to the Special Issue Intelligent Image Processing by Deep Learning, 2nd Edition)
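
Folding parallel branches into one large kernel relies on convolution being linear in its weights. The sketch below merges a 3x3 branch into a 7x7 kernel at inference time; it demonstrates the general reparameterization trick, since CRepLK's actual branch topology is not specified in the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

c = 16
big = nn.Conv2d(c, c, 7, padding=3, bias=False)
small = nn.Conv2d(c, c, 3, padding=1, bias=False)

x = torch.randn(1, c, 32, 32)
y_train = big(x) + small(x)  # training-time two-branch output

# Fold: zero-pad the 3x3 weight to 7x7 and add it to the large kernel.
fused_w = big.weight + F.pad(small.weight, (2, 2, 2, 2))
y_infer = F.conv2d(x, fused_w, padding=3)  # single-branch inference

print(torch.allclose(y_train, y_infer, atol=1e-4))  # True
```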

23 pages, 4531 KB  
Article
RDL-YOLO: A Method for the Detection of Leaf Pests and Diseases in Cotton Based on YOLOv11
by Xingchao Zhang, Li Li, Zhihua Bian, Chenxu Dai, Zhanlin Ji and Jinyun Liu
Agronomy 2025, 15(8), 1989; https://doi.org/10.3390/agronomy15081989 - 19 Aug 2025
Abstract
Accurate identification of cotton leaf pests and diseases is essential for sustainable cultivation but is challenged by complex backgrounds, diverse pest morphologies, and varied symptoms, where existing deep learning models often show insufficient robustness. To address these challenges, the RDL-YOLO model is proposed in this study. In the proposed model, RepViT-Atrous Convolution (RepViT-A) is employed as the backbone network to enhance local–global interaction and improve the response intensity and extraction accuracy of key lesion features. In addition, the Dilated Dense Convolution (DDC) module is designed to achieve a dynamic multi-scale receptive field, enabling the network to adapt to lesions of different shapes and sizes. LDConv further optimizes feature fusion. Experimental results showed that the mean Average Precision (mAP) of the proposed model reached 77.1%, a 3.7% improvement over the baseline YOLOv11. Compared with leading detectors such as Real-Time Detection Transformer (RT-DETR), You Only Look Once version 11 (YOLOv11), D-FINE, and Spatial Transformer Network-YOLO (STN-YOLO), RDL-YOLO exhibits superior performance, enhanced reliability, and strong generalization capabilities in tests on the cotton leaf dataset and public datasets. This advancement offers a practical technical solution for improved agricultural pest and disease management.
(This article belongs to the Special Issue Smart Pest Control for Building Farm Resilience)

24 pages, 3961 KB  
Article
Hierarchical Multi-Scale Mamba with Tubular Structure-Aware Convolution for Retinal Vessel Segmentation
by Tao Wang, Dongyuan Tian, Haonan Zhao, Jiamin Liu, Weijie Wang, Chunpei Li and Guixia Liu
Entropy 2025, 27(8), 862; https://doi.org/10.3390/e27080862 - 14 Aug 2025
Abstract
Retinal vessel segmentation plays a crucial role in diagnosing various retinal and cardiovascular diseases and serves as a foundation for computer-aided diagnostic systems. Blood vessels in color retinal fundus images, captured using fundus cameras, are often affected by illumination variations and noise, making it difficult to preserve vascular integrity and posing a significant challenge for vessel segmentation. In this paper, we propose HM-Mamba, a novel hierarchical multi-scale Mamba-based architecture that incorporates tubular structure-aware convolution to extract both local and global vascular features for retinal vessel segmentation. First, we introduce a tubular structure-aware convolution to reinforce vessel continuity and integrity. Building on this, we design a multi-scale fusion module that aggregates features across varying receptive fields, enhancing the model’s robustness in representing both primary trunks and fine branches. Second, we integrate multi-branch Fourier transform with the dynamic state modeling capability of Mamba to capture both long-range dependencies and multi-frequency information. This design enables robust feature representation and adaptive fusion, thereby enhancing the network’s ability to model complex spatial patterns. Furthermore, we propose a hierarchical multi-scale interactive Mamba block that integrates multi-level encoder features through gated Mamba-based global context modeling and residual connections, enabling effective multi-scale semantic fusion and reducing detail loss during downsampling. Extensive evaluations on five widely used benchmark datasets—DRIVE, CHASE_DB1, STARE, IOSTAR, and LES-AV—demonstrate the superior performance of HM-Mamba, yielding Dice coefficients of 0.8327, 0.8197, 0.8239, 0.8307, and 0.8426, respectively.
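
The multi-branch Fourier transform mentioned here can be illustrated with a learnable per-frequency filter: transforming a feature map to the frequency domain, scaling each bin, and transforming back mixes information globally. This is a hedged sketch of the generic technique; HM-Mamba's actual branches and Mamba state modeling are not reproduced, and the class name is invented.

```python
import torch
import torch.nn as nn

class FourierBranch(nn.Module):
    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        # One learnable gain per channel and frequency bin (rfft2 halves W).
        self.filter = nn.Parameter(torch.ones(channels, height, width // 2 + 1))

    def forward(self, x):                        # x: (B, C, H, W)
        spec = torch.fft.rfft2(x, norm="ortho")  # complex spectrum
        spec = spec * self.filter                # global frequency-domain mixing
        return torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")

x = torch.randn(2, 32, 64, 64)
print(FourierBranch(32, 64, 64)(x).shape)  # torch.Size([2, 32, 64, 64])
```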

20 pages, 3862 KB  
Article
BlueberryNet: A Lightweight CNN for Real-Time Ripeness Detection in Automated Blueberry Processing Systems
by Bojian Yu, Hongwei Zhao and Xinwei Zhang
Processes 2025, 13(8), 2518; https://doi.org/10.3390/pr13082518 - 10 Aug 2025
Abstract
Blueberries are valued for their flavor and health benefits, but inconsistent ripeness at harvest complicates post-harvest food processing such as sorting and quality control. To address this, we propose a lightweight convolutional neural network (CNN) to detect blueberry ripeness in complex field environments, supporting efficient and automated food processing workflows. To meet the low-power and low-resource demands of embedded systems used in smart processing lines, we introduce a Grouped Large Kernel Reparameterization (GLKRep) module. This design reduces computational cost while enhancing the model’s ability to recognize ripe blueberries under complex lighting and background conditions. We also propose a Unified Adaptive Multi-Scale Fusion (UMSF) detection head that adaptively integrates multi-scale features using a dynamic receptive field. This enables the model to detect blueberries of various sizes accurately, a common challenge in real-world harvests. During training, a Semantics-Aware IoU (SAIoU) loss function is used to improve the alignment between predicted and ground truth regions by emphasizing semantic consistency. The model achieves 98.1% accuracy with only 2.6M parameters, outperforming existing methods. Its high accuracy, compact size, and low computational load make it suitable for real-time deployment in embedded sorting and grading systems, bridging field detection and downstream food-processing tasks.
(This article belongs to the Section AI-Enabled Process Engineering)
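
One common reading of a "dynamic receptive field" is a per-pixel softmax gate over parallel branches with different dilations, so each location picks the field size it needs. The sketch below shows that pattern; it is an assumption, since UMSF's actual design is not given in the abstract.

```python
import torch
import torch.nn as nn

class DynamicRFFusion(nn.Module):
    def __init__(self, channels: int, dilations=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        self.gate = nn.Conv2d(channels, len(dilations), 1)  # per-pixel logits

    def forward(self, x):
        w = torch.softmax(self.gate(x), dim=1)  # (B, n_branches, H, W)
        outs = [branch(x) for branch in self.branches]
        return sum(w[:, i : i + 1] * o for i, o in enumerate(outs))

x = torch.randn(1, 48, 80, 80)
print(DynamicRFFusion(48)(x).shape)  # torch.Size([1, 48, 80, 80])
```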

24 pages, 1471 KB  
Article
WDM-UNet: A Wavelet-Deformable Gated Fusion Network for Multi-Scale Retinal Vessel Segmentation
by Xinlong Li and Hang Zhou
Sensors 2025, 25(15), 4840; https://doi.org/10.3390/s25154840 - 6 Aug 2025
Abstract
Retinal vessel segmentation in fundus images is critical for diagnosing microvascular and ophthalmologic diseases. However, the task remains challenging due to significant vessel width variation and low vessel-to-background contrast. To address these limitations, we propose WDM-UNet, a novel spatial-wavelet dual-domain fusion architecture that integrates spatial and wavelet-domain representations to simultaneously enhance local detail and global context. The encoder combines a Deformable Convolution Encoder (DCE), which adaptively models complex vascular structures through dynamic receptive fields, and a Wavelet Convolution Encoder (WCE), which captures semantic and structural context through low-frequency components and hierarchical wavelet convolution. These features are further refined by a Gated Fusion Transformer (GFT), which employs gated attention to enhance multi-scale feature integration. In the decoder, depthwise separable convolutions are used to reduce the computational overhead without compromising representational capacity. To preserve fine structural details and facilitate contextual information flow across layers, the model incorporates skip connections with a hierarchical fusion strategy, enabling effective integration of shallow and deep features. We evaluated WDM-UNet on three public datasets: DRIVE, STARE, and CHASE_DB1. The quantitative evaluations demonstrate that WDM-UNet consistently outperforms state-of-the-art methods, achieving 96.92% accuracy, 83.61% sensitivity, and an 82.87% F1-score on the DRIVE dataset, with superior performance across all benchmark datasets in both segmentation accuracy and robustness, particularly in complex vascular scenarios.
(This article belongs to the Section Sensing and Imaging)
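
The low-/high-frequency split behind a wavelet encoder can be shown with a single-level 2-D Haar transform: the LL band carries smooth structure, the other three carry edge detail. This sketches the standard transform, not WDM-UNet's WCE; the function name is illustrative.

```python
import torch

def haar_dwt2(x):              # x: (B, C, H, W) with even H and W
    a = x[..., 0::2, 0::2]     # the four samples of each 2x2 block
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2   # low frequency: smooth structure
    lh = (a - b + c - d) / 2   # detail along the width axis
    hl = (a + b - c - d) / 2   # detail along the height axis
    hh = (a - b - c + d) / 2   # diagonal detail
    return ll, lh, hl, hh

x = torch.randn(1, 3, 256, 256)
ll, lh, hl, hh = haar_dwt2(x)
print(ll.shape)  # torch.Size([1, 3, 128, 128]); feed ll to the next level
```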

27 pages, 5228 KB  
Article
Detection of Surface Defects in Steel Based on Dual-Backbone Network: MBDNet-Attention-YOLO
by Xinyu Wang, Shuhui Ma, Shiting Wu, Zhaoye Li, Jinrong Cao and Peiquan Xu
Sensors 2025, 25(15), 4817; https://doi.org/10.3390/s25154817 - 5 Aug 2025
Abstract
Automated surface defect detection in steel manufacturing is pivotal for ensuring product quality, yet it remains an open challenge owing to the extreme heterogeneity of defect morphologies—ranging from hairline cracks and microscopic pores to elongated scratches and shallow dents. Existing approaches, whether classical vision pipelines or recent deep-learning paradigms, struggle to simultaneously satisfy the stringent demands of industrial scenarios: high accuracy on sub-millimeter flaws, insensitivity to texture-rich backgrounds, and real-time throughput on resource-constrained hardware. Although contemporary detectors have narrowed the gap, they still exhibit pronounced sensitivity–robustness trade-offs, particularly in the presence of scale-varying defects and cluttered surfaces. To address these limitations, we introduce MBY (MBDNet-Attention-YOLO), a lightweight yet powerful framework that synergistically couples the MBDNet backbone with the YOLO detection head. Specifically, the backbone embeds three novel components: (1) HGStem, a hierarchical stem block that enriches low-level representations while suppressing redundant activations; (2) Dynamic Align Fusion (DAF), an adaptive cross-scale fusion mechanism that dynamically re-weights feature contributions according to defect saliency; and (3) C2f-DWR, a depth-wise residual variant that progressively expands receptive fields without incurring prohibitive computational costs. Building upon this enriched feature hierarchy, the neck employs our proposed MultiSEAM module—a cascaded squeeze-and-excitation attention mechanism operating at multiple granularities—to harmonize fine-grained and semantic cues, thereby amplifying weak defect signals against complex textures. Finally, we integrate the Inner-SIoU loss, which refines the geometric alignment between predicted and ground-truth boxes by jointly optimizing center distance, aspect ratio consistency, and IoU overlap, leading to faster convergence and tighter localization. Extensive experiments on two publicly available steel-defect benchmarks—NEU-DET and PVEL-AD—demonstrate the superiority of MBY. Without bells and whistles, our model achieves 85.8% mAP@0.5 on NEU-DET and 75.9% mAP@0.5 on PVEL-AD, surpassing the best-reported results by significant margins while maintaining real-time inference on an NVIDIA Jetson Xavier. Ablation studies corroborate the complementary roles of each component, underscoring MBY’s robustness across defect scales and surface conditions. These results suggest that MBY strikes an appealing balance between accuracy, efficiency, and deployability, offering a pragmatic solution for next-generation industrial quality-control systems.
(This article belongs to the Section Sensing and Imaging)
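
MultiSEAM builds on squeeze-and-excitation attention; for reference, the standard SE primitive is sketched below. The multi-granularity cascade itself is omitted, and the reduction ratio is an assumed default.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                           # x: (B, C, H, W)
        s = x.mean(dim=(2, 3))                      # squeeze: global average pool
        w = self.fc(s).unsqueeze(-1).unsqueeze(-1)  # excitation: channel gates
        return x * w                                # re-weight channels

x = torch.randn(2, 64, 40, 40)
print(SEBlock(64)(x).shape)  # torch.Size([2, 64, 40, 40])
```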

41 pages, 86958 KB  
Article
An Efficient Aerial Image Detection with Variable Receptive Fields
by Wenbin Liu, Liangren Shi and Guocheng An
Remote Sens. 2025, 17(15), 2672; https://doi.org/10.3390/rs17152672 - 2 Aug 2025
Abstract
This article presents VRF-DETR, a lightweight real-time object detection framework for aerial remote sensing images, aimed at addressing the challenge of insufficient receptive fields for easily confused categories due to differences in height and perspective. Based on the RT-DETR architecture, our approach introduces three key innovations: the multi-scale receptive field adaptive fusion (MSRF2) module replaces the Transformer encoder with parallel dilated convolutions and spatial-channel attention to dynamically adjust receptive fields for confusable objects; the gated multi-scale context (GMSC) block reconstructs the backbone using Gated Multi-Scale Context units with attention-gated convolution (AGConv), reducing parameters while enhancing multi-scale feature extraction; and the context-guided fusion (CGF) module optimizes feature fusion via context-guided weighting to resolve multi-scale semantic conflicts. Evaluations were conducted on the VisDrone2019 and UAVDT datasets, where VRF-DETR achieved an mAP50 of 52.1% and an mAP50-95 of 32.2% on the VisDrone2019 validation set, surpassing RT-DETR by 4.9% and 3.5%, respectively, while reducing parameters by 32% and FLOPs by 22%. It maintains real-time performance (62.1 FPS) and generalizes effectively, outperforming state-of-the-art methods in accuracy-efficiency trade-offs for aerial object detection.
(This article belongs to the Special Issue Deep Learning Innovations in Remote Sensing)

22 pages, 4611 KB  
Article
MMC-YOLO: A Lightweight Model for Real-Time Detection of Geometric Symmetry-Breaking Defects in Wind Turbine Blades
by Caiye Liu, Chao Zhang, Xinyu Ge, Xunmeng An and Nan Xue
Symmetry 2025, 17(8), 1183; https://doi.org/10.3390/sym17081183 - 24 Jul 2025
Abstract
Performance degradation of wind turbine blades often stems from geometric asymmetry induced by damage. Existing methods for assessing damage face challenges in balancing accuracy and efficiency due to their limited ability to capture fine-grained geometric asymmetries associated with multi-scale damage under complex background interference. To address this, based on the high-speed detection model YOLOv10-N, this paper proposes a novel detection model named MMC-YOLO. First, the Multi-Scale Perception Gated Convolution (MSGConv) Module was designed, which constructs a full-scale receptive field through multi-branch fusion and channel rearrangement to enhance the extraction of geometric asymmetry features. Second, the Multi-Scale Enhanced Feature Pyramid Network (MSEFPN) was developed, integrating dynamic path aggregation and an SENetv2 attention mechanism to suppress background interference and amplify damage response. Finally, the Channel-Compensated Filtering (CCF) module was constructed to preserve critical channel information using a dynamic buffering mechanism. Evaluated on a dataset of 4818 wind turbine blade damage images, MMC-YOLO achieves an 82.4% mAP [0.5:0.95], representing a 4.4% improvement over the baseline YOLOv10-N model, and a 91.1% recall rate, an 8.7% increase, while maintaining a lightweight parameter count of 4.2 million. This framework significantly enhances geometric asymmetry defect detection accuracy while ensuring real-time performance, meeting engineering requirements for high efficiency and precision.
(This article belongs to the Special Issue Symmetry and Its Applications in Image Processing)
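
"Channel rearrangement" is commonly implemented as a ShuffleNet-style channel shuffle, which interleaves channels across groups so grouped branches can exchange information. This is one plausible reading of MSGConv's rearrangement step, not something the abstract confirms.

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    b, c, h, w = x.shape
    # (B, g, C/g, H, W) -> swap group and channel axes -> flatten back
    return (
        x.view(b, groups, c // groups, h, w)
         .transpose(1, 2)
         .reshape(b, c, h, w)
    )

x = torch.arange(8, dtype=torch.float32).view(1, 8, 1, 1)
print(channel_shuffle(x, 2).flatten().tolist())
# [0.0, 4.0, 1.0, 5.0, 2.0, 6.0, 3.0, 7.0] -- the two groups interleaved
```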

24 pages, 9664 KB  
Article
Frequency-Domain Collaborative Lightweight Super-Resolution for Fine Texture Enhancement in Rice Imagery
by Zexiao Zhang, Jie Zhang, Jinyang Du, Xiangdong Chen, Wenjing Zhang and Changmeng Peng
Agronomy 2025, 15(7), 1729; https://doi.org/10.3390/agronomy15071729 - 18 Jul 2025
Abstract
In rice detection tasks, accurate identification of leaf streaks, pest and disease distribution, and spikelet hierarchies relies on high-quality images to distinguish texture and hierarchy. However, existing images often suffer from texture blurring and contour shifting due to equipment and environment limitations, which degrades detection performance. Since pest and disease symptoms are largely global while tiny details are mostly local, we propose a rice image reconstruction method based on an adaptive two-branch heterogeneous structure. The method consists of a low-frequency branch (LFB) that recovers global features, using orientation-aware extended receptive fields to capture streaky global patterns such as pests and diseases, and a high-frequency branch (HFB) that enhances detail edges through an adaptive enhancement mechanism to boost the clarity of local detail regions. By introducing the dynamic weight fusion mechanism (CSDW) and a lightweight gating network (LFFN), the method resolves the unbalanced fusion of frequency information for rice images in traditional approaches. Experiments on the 4× downsampled rice test set demonstrate that the proposed method achieves a 62% reduction in parameters compared to EDSR, 41% lower computational cost (30 G) than MambaIR-light, and an average PSNR improvement of 0.68% over the other methods in the study, while balancing memory usage (227 M) and inference speed. In downstream task validation, rice panicle maturity detection achieves a 61.5% increase in mAP50 (0.480 → 0.775) compared to interpolation methods, and leaf pest detection shows a 2.7% improvement in average mAP50 (0.949 → 0.975). This research provides an effective solution for lightweight rice image enhancement, with its dual-branch collaborative mechanism and dynamic fusion strategy establishing a new paradigm in agricultural rice image processing.
(This article belongs to the Collection AI, Sensors and Robotics for Smart Agriculture)

21 pages, 4936 KB  
Article
A Lightweight Pavement Defect Detection Algorithm Integrating Perception Enhancement and Feature Optimization
by Xiang Zhang, Xiaopeng Wang and Zhuorang Yang
Sensors 2025, 25(14), 4443; https://doi.org/10.3390/s25144443 - 17 Jul 2025
Abstract
To address the current issue of large computations and the difficulty in balancing model complexity and detection accuracy in pavement defect detection models, a lightweight pavement defect detection algorithm, PGS-YOLO, is proposed based on YOLOv8, which integrates perception enhancement and feature optimization. The algorithm first designs the Receptive-Field Convolutional Block Attention Module Convolution (RFCBAMConv) and the Receptive-Field Convolutional Block Attention Module C2f-RFCBAM, based on which we construct an efficient Perception Enhanced Feature Extraction Network (PEFNet) that enhances multi-scale feature extraction capability by dynamically adjusting the receptive field. Secondly, the dynamic upsampling module DySample is introduced into the efficient feature pyramid, constructing a new feature fusion pyramid (Generalized Dynamic Sampling Feature Pyramid Network, GDSFPN) to optimize the multi-scale feature fusion effect. In addition, a shared detail-enhanced convolution lightweight detection head (SDCLD) was designed, which significantly reduces the model’s parameters and computation while improving localization and classification performance. Finally, Wise-IoU was introduced to optimize the training performance and detection accuracy of the model. Experimental results show that PGS-YOLO increases mAP50 by 2.8% and 2.9% on the complete GRDDC2022 dataset and the Chinese subset, respectively, outperforming the other detection models. The number of parameters and computations are reduced by 10.3% and 9.9%, respectively, compared to the YOLOv8n model, with an average frame rate of 69 frames per second, offering good real-time performance. In addition, on the CRACK500 dataset, PGS-YOLO improved mAP50 by 2.3%, achieving a better balance between model complexity and detection accuracy.
(This article belongs to the Topic Applied Computing and Machine Intelligence (ACMI))

29 pages, 16466 KB  
Article
DMF-YOLO: Dynamic Multi-Scale Feature Fusion Network-Driven Small Target Detection in UAV Aerial Images
by Xiaojia Yan, Shiyan Sun, Huimin Zhu, Qingping Hu, Wenjian Ying and Yinglei Li
Remote Sens. 2025, 17(14), 2385; https://doi.org/10.3390/rs17142385 - 10 Jul 2025
Abstract
Target detection in UAV aerial images has found increasingly widespread applications in emergency rescue, maritime monitoring, and environmental surveillance. However, traditional detection models suffer significant performance degradation due to challenges including substantial scale variations, high proportions of small targets, and dense occlusions in UAV-captured images. To address these issues, this paper proposes DMF-YOLO, a high-precision detection network based on YOLOv10 improvements. First, we design Dynamic Dilated Snake Convolution (DDSConv) to adaptively adjust the receptive field and dilation rate of convolution kernels, enhancing local feature extraction for small targets with weak textures. Second, we construct a Multi-scale Feature Aggregation Module (MFAM) that integrates dual-branch spatial attention mechanisms to achieve efficient cross-layer feature fusion, mitigating information conflicts between shallow details and deep semantics. Finally, we propose an Expanded Window-based Bounding Box Regression Loss Function (EW-BBRLF), which optimizes localization accuracy through dynamic auxiliary bounding boxes, effectively reducing missed detections of small targets. Experiments on the VisDrone2019 and HIT-UAV datasets demonstrate that DMF-YOLOv10 achieves 50.1% and 81.4% mAP50, respectively, significantly outperforming the baseline YOLOv10s by 27.1% and 2.6%, with parameter increases limited to 24.4% and 11.9%. The method exhibits superior robustness in dense scenarios, complex backgrounds, and long-range target detection. This approach provides an efficient solution for UAV real-time perception tasks and offers novel insights for multi-scale object detection algorithm design.

22 pages, 2583 KB  
Article
Helmet Detection in Underground Coal Mines via Dynamic Background Perception with Limited Valid Samples
by Guangfu Wang, Dazhi Sun, Hao Li, Jian Cheng, Pengpeng Yan and Heping Li
Mach. Learn. Knowl. Extr. 2025, 7(3), 64; https://doi.org/10.3390/make7030064 - 9 Jul 2025
Abstract
The underground coal mine environment is complex and dynamic, making visual object detection a crucial component of underground safety management and a key factor in ensuring the safe operation of workers. We study this in the context of helmet-wearing detection in underground mines, where over 25% of the targets are small objects. To address challenges such as the lack of effective samples for unworn helmets, significant background interference, and the difficulty of detecting small helmet targets, this paper proposes a novel underground helmet-wearing detection algorithm that combines dynamic background awareness with a limited number of valid samples. The algorithm begins by analyzing the distribution of visual surveillance data and spatial biases in underground environments. Using data augmentation, it then expands the training set by introducing positive and negative helmet-wearing samples from ordinary scenes. Thereafter, building on YOLOv10, the algorithm incorporates a background awareness module with region masks to reduce the adverse effects of complex underground backgrounds on helmet-wearing detection, and adds a convolution and attention fusion module in the detection head to enhance the model’s perception of small helmet-wearing objects by enlarging the detection receptive field. By analyzing the aspect ratio distribution of helmet-wearing data, the algorithm improves the aspect ratio constraints in the loss function, further enhancing detection accuracy. Experimental results demonstrate that the proposed algorithm detects small helmet-wearing objects in complex underground scenes, reducing background false detections by 14% and achieving accuracy, recall, and average precision of 94.4%, 89%, and 95.4%, respectively. Compared to other mainstream object detection algorithms, it improves detection accuracy by 6.7%, 5.1%, and 11.8% over YOLOv9, YOLOv10, and RT-DETR, respectively. The algorithm can be applied to real-time helmet-wearing detection in underground coal mine scenes, providing safety alerts for standardized worker operations and enhancing the level of underground security intelligence.
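
An aspect-ratio constraint in a box-regression loss typically takes the CIoU form: a penalty on the mismatch between predicted and ground-truth width/height ratios, weighted more strongly once overlap is already good. The sketch below shows that baseline form; the paper's improved constraint is not spelled out in the abstract.

```python
import math
import torch

def iou_aspect_loss(pred, target, eps: float = 1e-7):
    """pred, target: (N, 4) boxes as (x1, y1, x2, y2)."""
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Aspect-ratio consistency term (as in CIoU): penalize w/h mismatch.
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wt, ht = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    v = (4 / math.pi ** 2) * (
        torch.atan(wt / (ht + eps)) - torch.atan(wp / (hp + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)
    return (1 - iou + alpha * v).mean()

pred = torch.tensor([[0.0, 0.0, 4.0, 2.0]])    # wide box
target = torch.tensor([[0.0, 0.0, 2.0, 4.0]])  # tall box, same area
print(iou_aspect_loss(pred, target))  # higher than the plain 1 - IoU = 0.667
```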

31 pages, 20469 KB  
Article
YOLO-SRMX: A Lightweight Model for Real-Time Object Detection on Unmanned Aerial Vehicles
by Shimin Weng, Han Wang, Jiashu Wang, Changming Xu and Ende Zhang
Remote Sens. 2025, 17(13), 2313; https://doi.org/10.3390/rs17132313 - 5 Jul 2025
Abstract
Unmanned Aerial Vehicles (UAVs) face a significant challenge in balancing high accuracy and high efficiency when performing real-time object detection tasks, especially amidst intricate backgrounds, diverse target scales, and stringent onboard computational resource constraints. To tackle these difficulties, this study introduces YOLO-SRMX, a lightweight real-time object detection framework specifically designed for infrared imagery captured by UAVs. Firstly, the model utilizes ShuffleNetV2 as an efficient lightweight backbone and integrates the novel Multi-Scale Dilated Attention (MSDA) module. This strategy not only facilitates a substantial 46.4% reduction in parameter volume but also, through the flexible adaptation of receptive fields, boosts the model’s robustness and precision in multi-scale object recognition tasks. Secondly, within the neck network, multi-scale feature extraction is facilitated through the design of novel composite convolutions, ConvX and MConv, based on a “split–differentiate–concatenate” paradigm. Furthermore, the lightweight GhostConv is incorporated to reduce model complexity. By synthesizing these principles, a novel composite receptive field lightweight convolution, DRFAConvP, is proposed to further optimize multi-scale feature fusion efficiency and promote model lightweighting. Finally, the Wise-IoU loss function is adopted to replace the traditional bounding box loss. This is coupled with a dynamic non-monotonic focusing mechanism formulated using the concept of outlier degrees. This mechanism intelligently assigns elevated gradient weights to anchor boxes of moderate quality by assessing their relative outlier degree, while concurrently diminishing the gradient contributions from both high-quality and low-quality anchor boxes. Consequently, this approach enhances the model’s localization accuracy for small targets in complex scenes. Experimental evaluations on the HIT-UAV dataset corroborate that YOLO-SRMX achieves an mAP50 of 82.8%, representing a 7.81% improvement over the baseline YOLOv8s model; an F1 score of 80%, marking a 3.9% increase; and a substantial 65.3% reduction in computational cost (GFLOPs). YOLO-SRMX demonstrates an exceptional trade-off between detection accuracy and operational efficiency, thereby underscoring its considerable potential for efficient and precise object detection on resource-constrained UAV platforms.
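
The dynamic non-monotonic focusing mechanism follows the Wise-IoU v3 pattern: an "outlier degree" beta normalizes each anchor's IoU loss by a running mean, and the gradient weight peaks for moderate beta while decaying for very easy and very hard anchors. Below is a sketch under assumed hyperparameters; the alpha and delta values are common defaults, not confirmed by the paper.

```python
import torch

def wiou_focus_weight(l_iou: torch.Tensor, running_mean: float,
                      alpha: float = 1.9, delta: float = 3.0) -> torch.Tensor:
    beta = l_iou.detach() / running_mean  # outlier degree; no gradient through it
    return beta / (delta * alpha ** (beta - delta))

l_iou = torch.tensor([0.05, 0.3, 0.6, 0.95])  # easy ... very hard anchors
print(wiou_focus_weight(l_iou, running_mean=0.3))
# weights peak for the moderately hard anchors and fall off at both extremes
```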