
Search Results (2,898)

Search Parameters:
Keywords = multi-scale fusion

21 pages, 1688 KB  
Article
Sparse-Gated RGB-Event Fusion for Small Object Detection in the Wild
by Yangsi Shi, Miao Li, Nuo Chen, Yihang Luo, Shiman He and Wei An
Remote Sens. 2025, 17(17), 3112; https://doi.org/10.3390/rs17173112 (registering DOI) - 6 Sep 2025
Abstract
Detecting small moving objects under challenging lighting conditions, such as overexposure and underexposure, remains a critical challenge in computer vision applications including surveillance, autonomous driving, and anti-UAV systems. Traditional RGB-based detectors often suffer from degraded object visibility under highly dynamic illumination, leading to suboptimal performance. To address these limitations, we propose a novel RGB-Event fusion framework that leverages the complementary strengths of the RGB and event modalities for enhanced small object detection. Specifically, we introduce a Temporal Multi-Scale Attention Fusion (TMAF) module to encode motion cues from event streams at multiple temporal scales, thereby enhancing the saliency of small object features. Furthermore, we design a Sparse Noisy Gated Attention Fusion (SNGAF) module, inspired by the mixture-of-experts paradigm, which employs a sparse gating mechanism to adaptively combine multiple fusion experts based on input characteristics, enabling flexible and robust RGB-Event feature integration. Additionally, we present RGBE-UAV, a new RGB-Event dataset tailored for small moving object detection under diverse exposure conditions. Extensive experiments on our RGBE-UAV and the public DSEC-MOD datasets demonstrate that our method outperforms existing state-of-the-art RGB-Event fusion approaches, validating its effectiveness and generalization under complex lighting conditions.
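The SNGAF module's sparse gating follows the mixture-of-experts pattern the abstract names. As a rough, hypothetical sketch (the expert pool, gate shapes, and Shazeer-style noisy top-k formulation below are illustrative assumptions, not the paper's implementation), gating a handful of fusion experts so that only k of them run per input can be written as:

```python
import numpy as np

def noisy_topk_gate(x, w_gate, w_noise, k=2, rng=None):
    """Noisy top-k gating: score each fusion expert, perturb the scores
    with learned-noise-scaled Gaussian noise, keep the k highest, and
    softmax-normalize the surviving weights (all others stay zero)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    noise_scale = np.log1p(np.exp(x @ w_noise))          # softplus
    logits = x @ w_gate + rng.standard_normal(w_gate.shape[1]) * noise_scale
    top = np.argsort(logits)[-k:]                        # indices of the k best experts
    gates = np.zeros_like(logits)
    e = np.exp(logits[top] - logits[top].max())
    gates[top] = e / e.sum()                             # sparse, normalized gate vector
    return gates

def fuse(rgb_feat, ev_feat, experts, gates):
    """Weighted sum of expert outputs; experts with zero gate are skipped."""
    out = np.zeros_like(rgb_feat)
    for g, f in zip(gates, experts):
        if g > 0:
            out += g * f(rgb_feat, ev_feat)
    return out

# Toy fusion experts over a shared feature vector (invented for the sketch)
experts = [
    lambda r, e: r + e,             # additive fusion
    lambda r, e: r * e,             # multiplicative fusion
    lambda r, e: 0.5 * (r + e),     # averaging fusion
    lambda r, e: np.maximum(r, e),  # max fusion
]

rng = np.random.default_rng(42)
x = rng.standard_normal(8)          # gating input, e.g. pooled RGB+event features
w_gate = rng.standard_normal((8, 4))
w_noise = rng.standard_normal((8, 4))
gates = noisy_topk_gate(x, w_gate, w_noise, k=2, rng=rng)
fused = fuse(rng.standard_normal(8), rng.standard_normal(8), experts, gates)
print(np.count_nonzero(gates))      # exactly k experts are active
```

Only the selected experts are evaluated at all, which is what makes the gate "sparse" rather than a plain attention softmax over all experts.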
20 pages, 3214 KB  
Article
FDMNet: A Multi-Task Network for Joint Detection and Segmentation of Three Fish Diseases
by Zhuofu Liu, Zigan Yan and Gaohan Li
J. Imaging 2025, 11(9), 305; https://doi.org/10.3390/jimaging11090305 (registering DOI) - 6 Sep 2025
Abstract
Fish diseases are one of the primary causes of economic losses in aquaculture. Existing deep learning models have progressed in fish disease detection and lesion segmentation. However, many models still have limitations, such as detecting only a single type of fish disease or completing only a single task within fish disease detection. To address these limitations, we propose FDMNet, a multi-task learning network. Built upon the YOLOv8 framework, the network incorporates a semantic segmentation branch with a multi-scale perception mechanism. FDMNet performs detection and segmentation simultaneously. The detection and segmentation branches use the C2DF dynamic feature fusion module to address information loss during local feature fusion across scales. Additionally, we use uncertainty-based loss weighting together with PCGrad to mitigate conflicting gradients between tasks, improving the stability and overall performance of FDMNet. On a self-built image dataset containing three common fish diseases, FDMNet achieved 97.0% mAP50 for the detection task and 85.7% mIoU for the segmentation task. Relative to the multi-task YOLO-FD baseline, FDMNet’s detection mAP50 improved by 2.5% and its segmentation mIoU by 5.4%. On the dataset constructed in this study, FDMNet achieved competitive accuracy in both detection and segmentation. These results suggest potential practical utility.
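The pairing of uncertainty-based loss weighting with PCGrad described above is a combination of two published techniques. As a generic sketch (Kendall-style homoscedastic uncertainty weighting plus PCGrad gradient projection; the toy gradients are invented and this is not FDMNet's actual training code):

```python
import numpy as np

def uncertainty_weighted_loss(losses, log_vars):
    """Homoscedastic uncertainty weighting: L = sum_i exp(-s_i) * L_i + s_i,
    where each s_i is a learned per-task log-variance."""
    return sum(np.exp(-s) * L + s for L, s in zip(losses, log_vars))

def pcgrad(g_det, g_seg):
    """PCGrad: if the detection and segmentation gradients conflict
    (negative dot product), project each onto the normal plane of the
    other before summing, so neither task's update fights the other."""
    g1, g2 = g_det.copy(), g_seg.copy()
    if g1 @ g_seg < 0:
        g1 -= (g1 @ g_seg) / (g_seg @ g_seg) * g_seg
    if g2 @ g_det < 0:
        g2 -= (g2 @ g_det) / (g_det @ g_det) * g_det
    return g1 + g2   # combined, conflict-free update direction

# Conflicting toy gradients (angle > 90 degrees between the two tasks)
g_det = np.array([1.0, 0.0])
g_seg = np.array([-0.5, 1.0])
update = pcgrad(g_det, g_seg)
print(update)  # → [0.8 1.4]
```

The uncertainty weights handle the *scale* mismatch between the two losses, while PCGrad handles *direction* conflicts between their gradients; the two mechanisms are complementary.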
21 pages, 5521 KB  
Article
AMS-YOLO: Asymmetric Multi-Scale Fusion Network for Cannabis Detection in UAV Imagery
by Xuelin Li, Huanyin Yue, Jianli Liu and Aonan Cheng
Drones 2025, 9(9), 629; https://doi.org/10.3390/drones9090629 (registering DOI) - 6 Sep 2025
Abstract
Cannabis is a strictly regulated plant in China, and its illegal cultivation presents significant challenges for social governance. Traditional manual patrol methods suffer from low coverage efficiency, while satellite imagery struggles to identify illicit plantations due to its limited spatial resolution, particularly for sparsely distributed and concealed cultivation. UAV remote sensing technology, with its high resolution and mobility, provides a promising solution for cannabis monitoring. However, existing detection methods still face challenges in terms of accuracy and robustness, particularly due to varying target scales, severe occlusion, and background interference. In this paper, we propose AMS-YOLO, a cannabis detection model tailored for UAV imagery. The model incorporates an asymmetric backbone network to improve texture perception by directing the model’s focus towards directional information. Additionally, it features a multi-scale fusion neck structure, incorporating partial convolution mechanisms to effectively improve cannabis detection in small target and complex background scenarios. To evaluate the model’s performance, we constructed a cannabis remote sensing dataset consisting of 1972 images. Experimental results show that AMS-YOLO achieves an mAP of 90.7% while maintaining efficient inference speed, outperforming existing state-of-the-art detection algorithms. This method demonstrates strong adaptability and practicality in complex environments, offering robust technical support for monitoring illegal cannabis cultivation.
26 pages, 7650 KB  
Article
ACD-DETR: Adaptive Cross-Scale Detection Transformer for Small Object Detection in UAV Imagery
by Yang Tong, Hui Ye, Jishen Yang and Xiulong Yang
Sensors 2025, 25(17), 5556; https://doi.org/10.3390/s25175556 - 5 Sep 2025
Abstract
Small object detection in UAV imagery remains challenging due to complex aerial perspectives and the presence of dense, small targets with blurred boundaries. To address these challenges, we propose ACD-DETR, an adaptive end-to-end Transformer detector tailored for UAV-based small object detection. The framework introduces three core modules: the Multi-Scale Edge-Enhanced Feature Fusion Module (MSEFM) to preserve fine-grained details; the Omni-Grained Boundary Calibrator (OG-BC) for boundary-aware semantic fusion; and the Dynamic Position Bias Attention-based Intra-scale Feature Interaction (DPB-AIFI) to enhance spatial reasoning. Furthermore, we introduce ACD-DETR-SBA+, a fusion-enhanced variant that removes OG-BC and DPB-AIFI while deploying densely connected Semantic–Boundary Aggregation (SBA) modules to intensify boundary–semantic fusion. This design sacrifices computational efficiency in exchange for higher detection precision, making it suitable for resource-rich deployment scenarios. On the VisDrone2019 dataset, ACD-DETR achieves 50.9% mAP@0.5, outperforming the RT-DETR-R18 baseline by 3.6 percentage points, while reducing parameters by 18.5%. ACD-DETR-SBA+ further improves accuracy to 52.0% mAP@0.5, demonstrating the benefit of SBA-based fusion. Extensive experiments on the VisDrone2019 and DOTA datasets demonstrate that ACD-DETR achieves a state-of-the-art trade-off between accuracy and efficiency, while ACD-DETR-SBA+ achieves further performance improvements at higher computational cost. Ablation studies and visual analyses validate the effectiveness of the proposed modules and design strategies.
(This article belongs to the Section Remote Sensors)

19 pages, 11406 KB  
Article
A Pool Drowning Detection Model Based on Improved YOLO
by Wenhui Zhang, Lu Chen and Jianchun Shi
Sensors 2025, 25(17), 5552; https://doi.org/10.3390/s25175552 - 5 Sep 2025
Abstract
Drowning constitutes the leading cause of injury-related fatalities among adolescents. In swimming pool environments, traditional manual surveillance exhibits limitations, while existing technologies suffer from the poor adaptability of wearable devices. Vision models based on YOLO still face challenges in edge deployment efficiency, robustness in complex water conditions, and multi-scale object detection. To address these issues, we propose YOLO11-LiB, a drowning object detection model based on YOLO11n, featuring three key enhancements. First, we design the Lightweight Feature Extraction Module (LGCBlock), which integrates the Lightweight Attention Encoding Block (LAE) and effectively combines Ghost Convolution (GhostConv) with dynamic convolution (DynamicConv). This optimizes the downsampling structure and the C3k2 module in the YOLO11n backbone network, significantly reducing model parameters and computational complexity. Second, we introduce the Cross-Channel Position-aware Spatial Attention Inverted Residual with Spatial–Channel Separate Attention module (C2PSAiSCSA) into the backbone. This module embeds the Spatial–Channel Separate Attention (SCSA) mechanism within the Inverted Residual Mobile Block (iRMB) framework, enabling more comprehensive and efficient feature extraction. Finally, we redesign the neck structure as the Bidirectional Feature Fusion Network (BiFF-Net), which integrates the Bidirectional Feature Pyramid Network (BiFPN) and Frequency-Aware Feature Fusion (FreqFusion). The enhanced YOLO11-LiB model was validated against mainstream algorithms through comparative experiments, and ablation studies were conducted. Experimental results demonstrate that YOLO11-LiB achieves a drowning-class mean average precision (DmAP50) of 94.1% with merely 2.02 M parameters and a model size of 4.25 MB, an effective balance between accuracy and efficiency that provides a high-performance solution for real-time drowning detection in swimming pool scenarios.
(This article belongs to the Section Intelligent Sensors)
17 pages, 1128 KB  
Article
Research on a Lightweight Textile Defect Detection Algorithm Based on WSF-RTDETR
by Jun Chen, Shubo Zhang, Yingying Yang, Weiqian Li and Gangfeng Wang
Processes 2025, 13(9), 2851; https://doi.org/10.3390/pr13092851 - 5 Sep 2025
Abstract
Textile defect detection technology has become a core component of industrial quality control. With the advancement of artificial intelligence technologies, an increasing number of intelligent recognition methods are being actively researched and deployed in textile defect detection. To further improve detection accuracy and quality, we propose a new lightweight model named WSF-RTDETR with reduced computational resource requirements. Firstly, we integrated WTConv convolution with residual blocks to form a lightweight WTConv-Block module, which enhances the capability of capturing detailed features of tiny defective targets while reducing computational overhead. Subsequently, a lightweight slimneck-SSFF feature fusion architecture was constructed to enhance feature fusion performance. In addition, the Focaler–MPDIoU loss function was presented, incorporating dynamic weight adjustment and a multi-scale perception mechanism to improve the detection accuracy and convergence speed for tiny defective targets. Finally, we conducted experiments on a textile defect dataset to further validate the effectiveness of the WSF-RTDETR model. The results demonstrate that the model improves mean average precision (mAP50) by 4.71% while reducing GFLOPs and the number of parameters by 24.39% and 31.11%, respectively. The improvements in both detection performance and computational efficiency provide an effective and reliable solution for industrial textile defect detection.
27 pages, 2800 KB  
Article
A Hierarchical Multi-Feature Point Cloud Lithology Identification Method Based on Feature-Preserved Compressive Sampling (FPCS)
by Xiaolei Duan, Ran Jing, Yanlin Shao, Yuangang Liu, Binqing Gan, Peijin Li and Longfan Li
Sensors 2025, 25(17), 5549; https://doi.org/10.3390/s25175549 - 5 Sep 2025
Abstract
Lithology identification is a critical technology for geological resource exploration and engineering safety assessment. However, traditional methods suffer from insufficient feature representation and low classification accuracy due to challenges such as weathering, vegetation cover, and spectral overlap in complex sedimentary rock regions. This study proposes a hierarchical multi-feature random forest algorithm based on Feature-Preserved Compressive Sampling (FPCS). Using 3D laser point cloud data from the Manas River outcrop in the southern margin of the Junggar Basin as the test area, we integrate graph signal processing and multi-scale feature fusion to construct a high-precision lithology identification model. The FPCS method establishes a geologically adaptive graph model constrained by geodesic distance and gradient-sensitive weighting, employing a three-tier graph filter bank (low-pass, band-pass, and high-pass) to extract macroscopic morphology, interface gradients, and microscopic fracture features of rock layers. A dynamic gated fusion mechanism optimizes multi-level feature weights, significantly improving identification accuracy in lithological transition zones. Experimental results on five million test samples demonstrate an overall accuracy (OA) of 95.6% and a mean accuracy (mAcc) of 94.3%, representing improvements of 36.1% and 20.5%, respectively, over the PointNet model. These findings confirm the robust engineering applicability of the FPCS-based hierarchical multi-feature approach for point cloud lithology identification.
(This article belongs to the Section Remote Sensors)
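The three-tier graph filter bank mentioned in the abstract (low-, band-, and high-pass responses of a signal defined on a graph) can be sketched with a spectral decomposition of the graph Laplacian. The filter shapes, the toy chain graph, and the per-point feature signal below are illustrative assumptions, not the FPCS formulation itself:

```python
import numpy as np

def laplacian(W):
    """Symmetric-normalized graph Laplacian from an affinity matrix W."""
    d = W.sum(1)
    Dih = np.diag(1.0 / np.sqrt(d))
    return np.eye(len(W)) - Dih @ W @ Dih

def filter_bank(L, x):
    """Three-tier spectral filter bank on a graph signal x. Eigenvalues
    of the normalized Laplacian lie in [0, 2]; low frequencies capture
    smooth macrostructure, mid frequencies capture interface gradients,
    and high frequencies capture fine detail."""
    lam, U = np.linalg.eigh(L)
    xh = U.T @ x                                        # graph Fourier transform
    low  = U @ (np.exp(-4 * lam) * xh)                  # low-pass: macro morphology
    band = U @ (lam * np.exp(-4 * (lam - 1) ** 2) * xh) # band-pass: interfaces
    high = U @ ((lam / lam.max()) * xh)                 # high-pass: micro fractures
    return low, band, high

# Tiny 4-node chain graph standing in for a point-cloud neighborhood graph
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)
x = np.array([1.0, 0.9, 0.1, 0.0])   # e.g. one geometric feature per point
low, band, high = filter_bank(laplacian(W), x)
```

Running the three responses through separate feature extractors and fusing them with learned gates would mirror the "dynamic gated fusion" step the abstract describes.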
25 pages, 1812 KB  
Article
YOLO-EDH: An Enhanced Ore Detection Algorithm
by Lei Wan, Xueyu Huang and Zeyang Qiu
Minerals 2025, 15(9), 952; https://doi.org/10.3390/min15090952 - 5 Sep 2025
Abstract
Mineral identification technology is a key technology in the construction of intelligent mines. In ore classification and detection, mining scenarios present challenges, such as diverse ore types, significant scale variations, and complex surface textures. Traditional detection models often suffer from insufficient multi-scale feature representation and weak dynamic adaptability, leading to the missed detection of small targets and misclassification of similar minerals. To address these issues, this paper proposes an efficient multi-scale ore classification and detection model, YOLO-EDH. To begin, standard convolution is replaced with deformable convolution, which efficiently captures irregular defect patterns, significantly boosting the model’s robustness and generalization ability. The C3k2 module is then combined with a modified dynamic convolution module, which avoids unnecessary computational overhead while enhancing the flexibility and feature representation. Additionally, a content-guided attention fusion (HGAF) module is introduced before the detection phase, ensuring that the model assigns the correct importance to various feature maps, thereby highlighting the most relevant object details. Experimental results indicate that YOLO-EDH surpasses YOLOv11, improving the precision, recall, and mAP50 by 0.9%, 1.7%, and 1.6%, respectively. In conclusion, YOLO-EDH offers an efficient solution for ore detection in practical applications, with considerable potential for industries like intelligent mine resource sorting and safety production monitoring, showing notable commercial value.

18 pages, 2778 KB  
Article
YOLO-MARS for Infrared Target Detection: Towards near Space
by Bohan Liu, Yeteng Han, Pengxi Liu, Sha Luo, Jie Li, Tao Zhang and Wennan Cui
Sensors 2025, 25(17), 5538; https://doi.org/10.3390/s25175538 - 5 Sep 2025
Abstract
In response to problems in near-space infrared target detection such as large target scale variations, strong background noise, and blurred features caused by low contrast, this paper proposes an efficient detection model, YOLO-MARS, based on YOLOv8. The model introduces a Space-to-Depth (SPD) convolution module into the backbone, which retains the detailed features of smaller targets through downsampling without information loss, alleviating the loss of target features caused by traditional downsampling. The Grouped Multi-Head Self-Attention (GMHSA) module is added after the backbone’s SPPF module to improve cross-scale global modeling of target-area feature responses while suppressing interference from complex thermal-noise backgrounds. In addition, a Light Adaptive Spatial Feature Fusion (LASFF) detection head is designed to mitigate the scale sensitivity of infrared targets (especially smaller ones) in the feature pyramid. It uses a shared weighting mechanism to achieve adaptive fusion of multi-scale features, reducing computational complexity while improving target localization and classification accuracy. To address the extreme scarcity of near-space data, we integrated 284 near-space images with the HIT-UAV dataset through physical equivalence analysis (atmospheric transmittance, contrast, and signal-to-noise ratio) to construct the NS-HIT dataset. The experimental results show that, compared to YOLOv8, YOLO-MARS increases mAP@0.5 by 5.4% while the number of parameters grows by only 10%. YOLO-MARS significantly improves detection accuracy while meeting model complexity requirements, providing an efficient and reliable solution for applications in near-space infrared target detection.
(This article belongs to the Section Sensing and Imaging)

20 pages, 2226 KB  
Article
RST-Net: A Semantic Segmentation Network for Remote Sensing Images Based on a Dual-Branch Encoder Structure
by Na Yang, Chuanzhao Tian, Xingfa Gu, Yanting Zhang, Xuewen Li and Feng Zhang
Sensors 2025, 25(17), 5531; https://doi.org/10.3390/s25175531 - 5 Sep 2025
Abstract
High-resolution remote sensing images often suffer from inadequate fusion between global and local features, leading to the loss of long-range dependencies and blurred spatial details, while also exhibiting limited adaptability to multi-scale object segmentation. To overcome these limitations, this study proposes RST-Net, a semantic segmentation network featuring a dual-branch encoder structure. The encoder integrates a ResNeXt-50-based CNN branch for extracting local spatial features and a Shunted Transformer (ST) branch for capturing global contextual information. To further enhance multi-scale representation, the multi-scale feature enhancement module (MSFEM) is embedded in the CNN branch, leveraging atrous and depthwise separable convolutions to dynamically aggregate features. Additionally, the residual dynamic feature fusion (RDFF) module is incorporated into skip connections to improve interactions between encoder and decoder features. Experiments on the Vaihingen and Potsdam datasets show that RST-Net achieves promising performance, with MIoU scores of 77.04% and 79.56%, respectively, validating its effectiveness in semantic segmentation tasks.

18 pages, 15698 KB  
Article
MDEM: A Multi-Scale Damage Enhancement MambaOut for Pavement Damage Classification
by Shizheng Zhang, Kunpeng Wang, Pu Li, Min Huang and Jianxiang Guo
Sensors 2025, 25(17), 5522; https://doi.org/10.3390/s25175522 - 4 Sep 2025
Abstract
Pavement damage classification is crucial for road maintenance and driving safety. However, owing to varying scales, irregular shapes, small area ratios, and frequent overlap with background noise, traditional methods struggle to achieve accurate recognition. To address these challenges, a novel pavement damage classification model based on MambaOut is designed, named Multi-scale Damage Enhancement MambaOut (MDEM). The model incorporates two key modules to improve damage classification performance. The Multi-scale Dynamic Feature Fusion Block (MDFF) adaptively integrates multi-scale information to enhance feature extraction, effectively distinguishing visually similar cracks at different scales. The Damage Detail Enhancement Block (DDE) emphasizes fine structural details while suppressing background interference, thereby improving the representation of small-scale damage regions. Experiments were conducted on multiple datasets, including CQU-BPMDD, CQU-BPDD, and Crack500-PDD. On the CQU-BPMDD dataset, MDEM outperformed the baseline model with improvements of 2.01% in accuracy, 2.64% in precision, 2.7% in F1-score, and 4.2% in AUC. The extensive experimental results demonstrate that MDEM significantly surpasses MambaOut and other comparable methods in pavement damage classification tasks. It effectively addresses challenges such as varying scales, irregular shapes, small damage areas, and background noise, enhancing inspection accuracy in real-world road maintenance.
(This article belongs to the Section Sensing and Imaging)

23 pages, 1476 KB  
Article
Dynamically Optimized Object Detection Algorithms for Aviation Safety
by Yi Qu, Cheng Wang, Yilei Xiao, Haijuan Ju and Jing Wu
Electronics 2025, 14(17), 3536; https://doi.org/10.3390/electronics14173536 - 4 Sep 2025
Abstract
Infrared imaging technology demonstrates significant advantages in aviation safety monitoring due to its exceptional all-weather operational capability and anti-interference characteristics, particularly in scenarios requiring real-time detection of aerial objects such as airport airspace management. However, traditional infrared target detection algorithms face critical challenges in complex sky backgrounds, including low signal-to-noise ratio (SNR), small target dimensions, and strong background clutter, leading to insufficient detection accuracy and reliability. To address these issues, this paper proposes the AFK-YOLO model based on the YOLO11 framework: it integrates an ADown downsampling module, which utilizes a dual-branch strategy combining average pooling and max pooling to effectively minimize feature information loss during spatial resolution reduction; introduces the KernelWarehouse dynamic convolution approach, which adopts kernel partitioning and a contrastive attention-based cross-layer shared kernel repository to address the challenge of linear parameter growth in conventional dynamic convolution methods; and establishes a feature decoupling pyramid network (FDPN) that replaces static feature pyramids with a dynamic multi-scale fusion architecture, utilizing parallel multi-granularity convolutions and an EMA attention mechanism to achieve adaptive feature enhancement. Experiments demonstrate that the AFK-YOLO model achieves 78.6% mAP on a self-constructed aerial infrared dataset—a 2.4 percentage point improvement over the baseline YOLO11—while meeting real-time requirements for aviation safety monitoring (416.7 FPS), reducing parameters by 6.9%, and compressing weight size by 21.8%. The results demonstrate the effectiveness of dynamic optimization methods in improving the accuracy and robustness of infrared target detection under complex aerial environments, thereby providing reliable technical support for the prevention of mid-air collisions. 
(This article belongs to the Special Issue Computer Vision and AI Algorithms for Diverse Scenarios)

20 pages, 8561 KB  
Article
LCW-YOLO: An Explainable Computer Vision Model for Small Object Detection in Drone Images
by Dan Liao, Rengui Bi, Yubi Zheng, Cheng Hua, Liangqing Huang, Xiaowen Tian and Bolin Liao
Appl. Sci. 2025, 15(17), 9730; https://doi.org/10.3390/app15179730 - 4 Sep 2025
Abstract
Small targets in drone imagery are often difficult to accurately locate and identify due to scale imbalance, limited pixel representation, and dynamic environmental interference, and balancing the model’s detection accuracy against its resource consumption also poses challenges. Therefore, we propose an interpretable computer vision framework based on YOLOv12m, called LCW-YOLO. First, we adopt multi-scale heterogeneous convolutional kernels to improve the lightweight channel-level and spatial attention combined context (LA2C2f) structure, enhancing spatial perception capabilities while reducing model computational load. Second, to enhance feature fusion capabilities, we propose the Convolutional Attention Integration Module (CAIM), enabling the fusion of original features across channels, spatial dimensions, and layers, thereby strengthening contextual attention. Finally, the model incorporates Wise-IoU (WIoU) v3, which dynamically allocates loss weights for detected objects. This allows the model to adjust its focus on samples of average quality during training based on object difficulty, thereby improving the model’s generalization capabilities. According to experimental results, LCW-YOLO reduces parameters by 0.4 M and improves mAP@0.5 by 3.3% on the VisDrone2019 dataset compared to YOLOv12m, and it improves mAP@0.5 by 1.9% on the UAVVaste dataset. In the task of identifying small objects with drones, LCW-YOLO, as an explainable AI (XAI) model, provides visual detection results and effectively balances accuracy, lightweight design, and generalization capabilities.
(This article belongs to the Special Issue Explainable Artificial Intelligence Technology and Its Applications)

20 pages, 9291 KB  
Article
BGWL-YOLO: A Lightweight and Efficient Object Detection Model for Apple Maturity Classification Based on the YOLOv11n Improvement
by Zhi Qiu, Wubin Ou, Deyun Mo, Yuechao Sun, Xingzao Ma, Xianxin Chen and Xuejun Tian
Horticulturae 2025, 11(9), 1068; https://doi.org/10.3390/horticulturae11091068 - 4 Sep 2025
Abstract
China is the world’s leading producer of apples. However, the current classification of apple maturity is predominantly reliant on manual expertise, a process that is both inefficient and costly. In this study, we utilize a diverse array of apples of varying ripeness levels as the research subjects. We propose a lightweight target detection model, termed BGWL-YOLO, which is based on YOLOv11n and incorporates the following specific improvements. To enhance the model’s capacity for multi-scale feature fusion, a bidirectional weighted feature pyramid network (BiFPN) is introduced in the neck. In response to the problem of redundant computation in convolutional neural networks, GhostConv is used to replace the standard convolution. The Wise-Inner-MPDIoU (WIMIoU) loss function is introduced to improve the localization accuracy of the model. Finally, the LAMP pruning algorithm is utilized to further compress the model size. The experimental results demonstrate that the BGWL-YOLO model attains a detection and recognition precision of 83.5%, a recall of 81.7%, and a mean average precision of 90.1% on the test set. A comparative analysis reveals that the number of parameters has been reduced by 65.3%, the computational demands have been decreased by 57.1%, the frames per second (FPS) have been boosted by 5.8% on the GPU and 32.8% on the CPU, and, most notably, the model size has been reduced by 74.8%. This substantial reduction in size is highly advantageous for deployment on compact smart devices, thereby facilitating the advancement of smart agriculture.
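The BiFPN mentioned in this abstract rests on fast normalized weighted fusion: each fusion node learns one non-negative scalar per input feature map and normalizes the weights before summing. A minimal sketch (the feature shapes and weight values are invented for illustration):

```python
import numpy as np

def fast_normalized_fusion(feats, w, eps=1e-4):
    """BiFPN-style fast normalized fusion: clamp the learnable weights
    to be non-negative (ReLU), normalize them to sum to ~1, then take a
    weighted sum of the already-resized input feature maps."""
    w = np.maximum(w, 0.0)          # ReLU keeps weights non-negative
    w = w / (w.sum() + eps)         # cheap normalization (no softmax)
    return sum(wi * f for wi, f in zip(w, feats))

# Two same-resolution feature maps (after up/downsampling) and raw weights
p_td = np.ones((8, 8)) * 2.0        # top-down pathway feature
p_in = np.ones((8, 8)) * 4.0        # lateral input feature
fused = fast_normalized_fusion([p_td, p_in], np.array([1.0, 3.0]))
print(fused[0, 0])                  # ≈ 0.25 * 2 + 0.75 * 4 = 3.5 (up to eps)
```

The sum-based normalization was chosen in the BiFPN paper over softmax because it is cheaper at inference time while behaving almost identically.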

17 pages, 1294 KB  
Article
SPARSE-OTFS-Net: A Sparse Robust OTFS Signal Detection Algorithm for 6G Ubiquitous Coverage
by Yunzhi Ling and Jun Xu
Electronics 2025, 14(17), 3532; https://doi.org/10.3390/electronics14173532 - 4 Sep 2025
Abstract
With the evolution of 6G technology toward global coverage and multidimensional integration, OTFS modulation has become a research focus due to its advantages in high-mobility scenarios. However, existing OTFS signal detection algorithms face challenges such as pilot contamination, Doppler spread degradation, and diverse interference in complex environments. This paper proposes the SPARSE-OTFS-Net algorithm, which establishes a comprehensive signal detection solution by innovatively integrating sparse random pilot design, compressive sensing-based frequency offset estimation with closed-loop cancellation, and joint denoising techniques combining an autoencoder, residual learning, and multi-scale feature fusion. The algorithm employs deep learning to dynamically generate non-uniform pilot distributions, reducing pilot contamination by 60%. Through orthogonal matching pursuit, it achieves super-resolution frequency offset estimation with tracking errors controlled within 20 Hz, effectively addressing Doppler spread degradation. The multi-stage denoising mechanism of deep neural networks suppresses various kinds of interference while preserving time-frequency domain signal sparsity. Simulation results demonstrate that under large frequency offset, multipath, and low-SNR conditions, multi-kernel convolution achieves a significant reduction in computational complexity while exhibiting outstanding performance in tracking error and weak multipath detection. In 1000 km/h high-speed mobility scenarios, Doppler error estimation accuracy reaches ±25 Hz (approaching the Cramér–Rao bound), with a BER of 5.0 × 10⁻⁶ (a 7× improvement over the single-Gaussian CNN’s 3.5 × 10⁻⁵). In 1024-user interference scenarios with a BER = 10⁻⁵ requirement, the SNR demand decreases from 11.4 dB to 9.2 dB (a 2.2 dB reduction), while EVM is maintained at 6.5% under 1024-user concurrency (compared to 16.5% for conventional MMSE), effectively increasing concurrent user capacity in 6G ultra-massive connectivity scenarios. These results validate the superior performance of SPARSE-OTFS-Net in 6G ultra-massive connectivity applications and provide critical technical support for realizing integrated space–air–ground networks.
(This article belongs to the Section Microwave and Wireless Communications)
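Orthogonal matching pursuit, the sparse-recovery routine the abstract relies on for frequency offset estimation, can be sketched in a few lines. This is a generic OMP implementation on a toy Gaussian dictionary, not SPARSE-OTFS-Net's estimator; the dictionary size and the 2-sparse coefficient vector are invented:

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily pick the dictionary column
    most correlated with the current residual, then re-fit the selected
    columns to y by least squares and update the residual."""
    residual, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ x_s
    x = np.zeros(A.shape[1])
    x[support] = x_s
    return x

# Toy delay-Doppler-style dictionary: recover a 2-sparse coefficient vector
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 128))
A /= np.linalg.norm(A, axis=0)       # unit-norm atoms
x_true = np.zeros(128)
x_true[[10, 77]] = [1.5, -2.0]
x_hat = omp(A, A @ x_true, k=2)
print(np.flatnonzero(x_hat))         # → [10 77], support recovered exactly
```

In a compressive-sensing frequency estimator, the columns of `A` would instead be candidate delay-Doppler atoms, and the recovered support positions map back to offset estimates.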
