Search Results (426)

Search Parameters:
Keywords = Efficient Attention Pyramid

25 pages, 73925 KB  
Article
Attention-Guided Edge-Optimized Network for Real-Time Detection and Counting of Pre-Weaning Piglets in Farrowing Crates
by Ning Kong, Tongshuai Liu, Guoming Li, Lei Xi, Shuo Wang and Yuepeng Shi
Animals 2025, 15(17), 2553; https://doi.org/10.3390/ani15172553 - 30 Aug 2025
Abstract
Accurate, real-time, and cost-effective detection and counting of pre-weaning piglets are critical for improving piglet survival rates. However, achieving this remains technically challenging due to high computational demands, frequent occlusion, social behaviors, and cluttered backgrounds in commercial farming environments. To address these challenges, this study proposes a lightweight and attention-enhanced piglet detection and counting network based on an improved YOLOv8n architecture. The design includes three key innovations: (i) the standard C2f modules in the backbone were replaced with a novel, efficient Multi-Scale Spatial Pyramid Attention (MSPA) module to enhance the multi-scale feature representation while maintaining a low computational cost; (ii) an improved Gather-and-Distribute (GD) mechanism was incorporated into the neck to facilitate feature fusion and accelerate inference; and (iii) the detection head and the sample assignment strategy were optimized to better align the classification and localization tasks, thereby improving the overall performance. Experiments on the custom dataset demonstrated the model's superiority over state-of-the-art counterparts, achieving 88.5% precision and a 93.8% mAP@0.5. Furthermore, ablation studies showed that the model reduced the parameters, floating point operations (FLOPs), and model size by 58.45%, 46.91% and 56.45% compared to those of the baseline YOLOv8n, respectively, while achieving a 2.6% improvement in the detection precision and a 4.41% reduction in the counting MAE. The trained model was deployed on a Raspberry Pi 4B with ncnn to verify the effectiveness of the lightweight design, reaching an average inference time below 87 ms per image. These findings confirm that the proposed method offers a practical, scalable solution for intelligent pig farming, combining high accuracy, efficiency, and real-time performance in resource-limited environments.
(This article belongs to the Section Pigs)
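Several results in this listing are reported as mAP@0.5, which counts a predicted box as correct when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. A minimal IoU sketch (hypothetical box coordinates in (x1, y1, x2, y2) form, not taken from any of the papers):

```python
def iou(box_a, box_b):
    # Boxes as (x1, y1, x2, y2); intersection area over union area.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A prediction shifted 2 px against a 10x10 ground-truth box:
# overlap 8*10 = 80, union 100 + 100 - 80 = 120, IoU = 2/3,
# so it counts as a match at the 0.5 threshold.
print(iou((0, 0, 10, 10), (2, 0, 12, 10)))
```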

30 pages, 25011 KB  
Article
Multi-Level Contextual and Semantic Information Aggregation Network for Small Object Detection in UAV Aerial Images
by Zhe Liu, Guiqing He and Yang Hu
Drones 2025, 9(9), 610; https://doi.org/10.3390/drones9090610 - 29 Aug 2025
Abstract
In recent years, detection methods for generic object detection have achieved significant progress. However, due to the large number of small objects in aerial images, mainstream detectors struggle to achieve a satisfactory detection performance. The challenges of small object detection in aerial images are primarily twofold: (1) Insufficient feature representation: The limited visual information for small objects makes it difficult for models to learn discriminative feature representations. (2) Background confusion: Abundant background information introduces more noise and interference, causing the features of small objects to easily be confused with the background. To address these issues, we propose a Multi-Level Contextual and Semantic Information Aggregation Network (MCSA-Net). MCSA-Net includes three key components: a Spatial-Aware Feature Selection Module (SAFM), a Multi-Level Joint Feature Pyramid Network (MJFPN), and an Attention-Enhanced Head (AEHead). The SAFM employs a sequence of dilated convolutions to extract multi-scale local context features and combines a spatial selection mechanism to adaptively merge these features, thereby obtaining the critical local context required for the objects, which enriches the feature representation of small objects. The MJFPN introduces multi-level connections and weighted fusion to fully leverage the spatial detail features of small objects in feature fusion and enhances the fused features further through a feature aggregation network. Finally, the AEHead is constructed by incorporating a sparse attention mechanism into the detection head. The sparse attention mechanism efficiently models long-range dependencies by computing the attention between the most relevant regions in the image while suppressing background interference, thereby enhancing the model's ability to perceive targets and effectively improving the detection performance. Extensive experiments on four datasets, VisDrone, UAVDT, MS COCO, and DOTA, demonstrate that the proposed MCSA-Net achieves an excellent detection performance, particularly in small object detection, surpassing several state-of-the-art methods.
(This article belongs to the Special Issue Intelligent Image Processing and Sensing for Drones, 2nd Edition)

26 pages, 23082 KB  
Article
SPyramidLightNet: A Lightweight Shared Pyramid Network for Efficient Underwater Debris Detection
by Yi Luo and Osama Eljamal
Appl. Sci. 2025, 15(17), 9404; https://doi.org/10.3390/app15179404 - 27 Aug 2025
Abstract
Underwater debris detection plays a crucial role in marine environmental protection. However, existing object detection algorithms generally suffer from excessive model complexity and insufficient detection accuracy, making it difficult to meet the real-time detection requirements in resource-constrained underwater environments. To address this challenge, this paper proposes a novel lightweight object detection network named the Shared Pyramid Lightweight Network (SPyramidLightNet). The network adopts an improved architecture based on YOLOv11 and achieves an optimal balance between detection performance and computational efficiency by integrating three core innovative modules. First, the Split–Merge Attention Block (SMAB) employs a dynamic kernel selection mechanism and split–merge strategy, significantly enhancing feature representation capability through adaptive multi-scale feature fusion. Second, the C3 GroupNorm Detection Head (C3GNHead) introduces a shared convolution mechanism and GroupNorm normalization strategy, substantially reducing the computational complexity of the detection head while maintaining detection accuracy. Finally, the Shared Pyramid Convolution (SPyramidConv) replaces traditional pooling operations with a parameter-sharing multi-dilation-rate convolution architecture, achieving more refined and efficient multi-scale feature aggregation. Extensive experiments on underwater debris datasets demonstrate that SPyramidLightNet achieves 0.416 on the mAP@0.5:0.95 metric, significantly outperforming mainstream algorithms including Faster-RCNN, SSD, RT-DETR, and the YOLO series. Meanwhile, compared to the baseline YOLOv11, the proposed algorithm achieves an 11.8% parameter compression and a 17.5% computational complexity reduction, with an inference speed reaching 384 FPS, meeting the stringent requirements for real-time detection. Ablation experiments and visualization analyses further validate the effectiveness and synergistic effects of each core module. This research provides important theoretical guidance for the design of lightweight object detection algorithms and lays a solid foundation for the development of automated underwater debris recognition and removal technologies.
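The parameter-sharing, multi-dilation-rate idea behind modules like SPyramidConv can be illustrated with a 1D toy: one kernel is reused at several dilation rates, so the receptive field grows without adding parameters. This is an independent sketch under that assumption, not the paper's implementation, and the function names are invented:

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1D convolution with a dilated kernel (pure-Python toy)."""
    span = dilation * (len(kernel) - 1)  # receptive-field extent of the kernel
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
        for i in range(len(signal) - span)
    ]

def shared_pyramid_conv1d(signal, kernel, dilations=(1, 2, 3)):
    # The SAME kernel weights are applied at every dilation rate, so the
    # pyramid aggregates multi-scale context at the cost of one kernel.
    outs = [dilated_conv1d(signal, kernel, d) for d in dilations]
    n = min(len(o) for o in outs)  # crop to the common valid length
    return [sum(o[i] for o in outs) / len(outs) for i in range(n)]

# On a linear ramp, a [1, 0, -1] difference kernel at dilation d yields -2d
# everywhere, so the three-scale average is (-2 - 4 - 6) / 3 = -4.
print(shared_pyramid_conv1d(list(range(8)), [1, 0, -1]))
```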

28 pages, 4317 KB  
Article
Multi-Scale Attention Networks with Feature Refinement for Medical Item Classification in Intelligent Healthcare Systems
by Waqar Riaz, Asif Ullah and Jiancheng (Charles) Ji
Sensors 2025, 25(17), 5305; https://doi.org/10.3390/s25175305 - 26 Aug 2025
Abstract
The increasing adoption of artificial intelligence (AI) in intelligent healthcare systems has elevated the demand for robust medical imaging and vision-based inventory solutions. For an intelligent healthcare inventory system, accurate recognition and classification of medical items, including medicines and emergency supplies, are crucial for ensuring inventory integrity and timely access to life-saving resources. This study presents a hybrid deep learning framework, EfficientDet-BiFormer-ResNet, that integrates three specialized components: EfficientDet’s Bidirectional Feature Pyramid Network (BiFPN) for scalable multi-scale object detection, BiFormer’s bi-level routing attention for context-aware spatial refinement, and ResNet-18 enhanced with triplet loss and Online Hard Negative Mining (OHNM) for fine-grained classification. The model was trained and validated on a custom healthcare inventory dataset comprising over 5000 images collected under diverse lighting, occlusion, and arrangement conditions. Quantitative evaluations demonstrated that the proposed system achieved a mean average precision (mAP@0.5:0.95) of 83.2% and a top-1 classification accuracy of 94.7%, outperforming conventional models such as YOLO, SSD, and Mask R-CNN. The framework excelled in recognizing visually similar, occluded, and small-scale medical items. This work advances real-time medical item detection in healthcare by providing an AI-enabled, clinically relevant vision system for medical inventory management. Full article
(This article belongs to the Section Intelligent Sensors)
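The triplet loss with Online Hard Negative Mining mentioned above can be sketched in plain Python: among the available negatives, training targets the one closest to the anchor, i.e. the most confusable item. The squared-distance metric, the margin value, and the tuple embeddings here are illustrative assumptions, not the paper's settings:

```python
def sq_dist(u, v):
    # Squared Euclidean distance between two embedding vectors.
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negatives, margin=0.2):
    # Online hard negative mining: pick the negative embedding that lies
    # CLOSEST to the anchor, then enforce the usual triplet margin.
    hard_neg = min(negatives, key=lambda n: sq_dist(anchor, n))
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, hard_neg) + margin)

# The near negative at (0.3, 0) is mined, not the easy one at (1, 0):
# loss = max(0, 0.01 - 0.09 + 0.2) = 0.12.
print(triplet_loss((0.0, 0.0), (0.1, 0.0), [(1.0, 0.0), (0.3, 0.0)]))
```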

24 pages, 103094 KB  
Article
A Method for Automated Detection of Chicken Coccidia in Vaccine Environments
by Ximing Li, Qianchao Wang, Lanqi Chen, Xinqiu Wang, Mengting Zhou, Ruiqing Lin and Yubin Guo
Vet. Sci. 2025, 12(9), 812; https://doi.org/10.3390/vetsci12090812 - 26 Aug 2025
Abstract
Vaccines play a crucial role in the prevention and control of chicken coccidiosis, effectively reducing economic losses in the poultry industry and significantly improving animal welfare. To ensure the production quality and immune effect of vaccines, accurate detection of chicken Coccidia oocysts in vaccines is essential. However, this task remains challenging due to the minute size of oocysts, variable spatial orientation, and morphological similarity among species. Therefore, we propose YOLO-Cocci, a chicken Coccidia detection model based on YOLOv8n, designed to improve the detection accuracy of chicken Coccidia oocysts in vaccine environments. Firstly, an efficient multi-scale attention (EMA) module was added to the backbone to enhance feature extraction and enable more precise focus on oocyst regions. Secondly, we developed the inception-style multi-scale fusion pyramid network (IMFPN) as an efficient neck. By integrating richer low-level features and applying convolutional kernels of varying sizes, IMFPN effectively preserves the features of small objects and enhances feature representation, thereby improving detection accuracy. Finally, we designed a lightweight feature-reconstructed and partially decoupled detection head (LFPD-Head), which enhances detection accuracy while reducing both model parameters and computational cost. The experimental results show that YOLO-Cocci achieves an mAP@0.5 of 89.6%, an increase of 6.5% over the baseline model, while reducing the number of parameters and computation by 14% and 12%, respectively. Notably, in the detection of Eimeria necatrix, mAP@0.5 increased by 14%. To verify the practical utility of the improved detection algorithm, we developed client software that performs automatic detection and visualizes the results. This study helps advance the automated assessment of vaccine quality and thereby promotes animal welfare.

39 pages, 4783 KB  
Article
Sparse-MoE-SAM: A Lightweight Framework Integrating MoE and SAM with a Sparse Attention Mechanism for Plant Disease Segmentation in Resource-Constrained Environments
by Benhan Zhao, Xilin Kang, Hao Zhou, Ziyang Shi, Lin Li, Guoxiong Zhou, Fangying Wan, Jiangzhang Zhu, Yongming Yan, Leheng Li and Yulong Wu
Plants 2025, 14(17), 2634; https://doi.org/10.3390/plants14172634 - 24 Aug 2025
Abstract
Plant disease segmentation has achieved significant progress with the help of artificial intelligence. However, deploying high-accuracy segmentation models in resource-limited settings faces three key challenges, as follows: (A) Traditional dense attention mechanisms incur quadratic computational complexity growth (O(n²d)), rendering them ill-suited for low-power hardware. (B) Naturally sparse spatial distributions and large-scale variations in the lesions on leaves necessitate models that concurrently capture long-range dependencies and local details. (C) Complex backgrounds and variable lighting in field images often induce segmentation errors. To address these challenges, we propose Sparse-MoE-SAM, an efficient framework based on an enhanced Segment Anything Model (SAM). This deep learning framework integrates sparse attention mechanisms with a two-stage mixture of experts (MoE) decoder. The sparse attention dynamically activates key channels aligned with lesion sparsity patterns, reducing self-attention complexity while preserving long-range context. Stage 1 of the MoE decoder performs coarse-grained boundary localization; Stage 2 achieves fine-grained segmentation by leveraging specialized experts within the MoE, significantly enhancing edge discrimination accuracy. The expert repository—comprising standard convolutions, dilated convolutions, and depthwise separable convolutions—dynamically routes features through optimized processing paths based on input texture and lesion morphology. This enables robust segmentation across diverse leaf textures and plant developmental stages. Further, we design a sparse attention-enhanced Atrous Spatial Pyramid Pooling (ASPP) module to capture multi-scale contexts for both extensive lesions and small spots. Evaluations on three heterogeneous datasets (PlantVillage Extended, CVPPP, and our self-collected field images) show that Sparse-MoE-SAM achieves a mean Intersection-over-Union (mIoU) of 94.2%—surpassing standard SAM by 2.5 percentage points—while reducing computational costs by 23.7% compared to the original SAM baseline. The model also demonstrates balanced performance across disease classes and enhanced hardware compatibility. Our work validates that integrating sparse attention with MoE mechanisms sustains accuracy while drastically lowering computational demands, enabling the scalable deployment of plant disease segmentation models on mobile and edge devices.
(This article belongs to the Special Issue Advances in Artificial Intelligence for Plant Research)
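Several entries above replace dense O(n²d) attention with sparse variants that attend only to the most relevant regions. A single-query, top-k toy in plain Python shows the core idea (this is a generic sketch, not any paper's implementation; real versions batch this over many queries and select regions, not raw keys):

```python
import math

def sparse_attention(q, keys, values, k=2):
    """Top-k sparse attention for one query vector: score all keys, but
    softmax and aggregate over only the k best-matching ones; the rest
    are treated as background and skipped, which is the saving when
    the number of keys n is much larger than k."""
    scores = [sum(a * b for a, b in zip(q, key)) for key in keys]
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    exps = {i: math.exp(scores[i]) for i in top}
    z = sum(exps.values())
    out = [0.0] * len(values[0])
    for i in top:
        w = exps[i] / z  # softmax weight restricted to the top-k set
        for d in range(len(out)):
            out[d] += w * values[i][d]
    return out
```

With k=1 this degenerates to hard selection of the single best-matching value; larger k trades compute for a smoother mixture.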

29 pages, 23079 KB  
Article
An Aircraft Skin Defect Detection Method with UAV Based on GB-CPP and INN-YOLO
by Jinhong Xiong, Peigen Li, Yi Sun, Jinwu Xiang and Haiting Xia
Drones 2025, 9(9), 594; https://doi.org/10.3390/drones9090594 - 22 Aug 2025
Abstract
To address the problems of low coverage rate and low detection accuracy in UAV-based aircraft skin defect detection under complex real-world conditions, this paper proposes a method combining a Greedy-based Breadth-First Search Coverage Path Planning (GB-CPP) approach with an improved YOLOv11 architecture (INN-YOLO). GB-CPP generates collision-free, near-optimal flight paths on the 3D aircraft surface using a discrete grid map. INN-YOLO enhances detection capability by reconstructing the neck with the BiFPN (Bidirectional Feature Pyramid Network) for better feature fusion, integrating the SimAM (Simple Attention Mechanism) with convolution for efficient small-target extraction, as well as employing RepVGG within the C3k2 layer to improve feature learning and speed. The model is deployed on a Jetson Nano for real-time edge inference. Results show that GB-CPP achieves 100% surface coverage with a redundancy rate not exceeding 6.74%. INN-YOLO was experimentally validated on three public datasets (10,937 images) and a self-collected dataset (1559 images), achieving mAP@0.5 scores of 42.30%, 84.10%, 56.40%, and 80.30%, representing improvements of 10.70%, 2.50%, 3.20%, and 6.70% over the baseline models, respectively. The proposed GB-CPP and INN-YOLO framework enables efficient, high-precision, and real-time UAV-based aircraft skin defect detection. Full article
(This article belongs to the Section Artificial Intelligence in Drones (AID))
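The greedy breadth-first-search coverage idea behind GB-CPP can be sketched on a 2D grid: from the current cell, BFS to the nearest uncovered free cell, append that path, and repeat until everything is covered. This is an independent toy under the assumptions that the free-cell set is connected and revisits count as redundancy; the actual method plans on a discretized 3D aircraft surface:

```python
from collections import deque

def coverage_path(free_cells, start):
    """Greedy BFS coverage sketch over a set of free (x, y) grid cells."""
    def bfs(src, targets):
        # Shortest 4-connected path from src to the nearest cell in targets.
        prev, seen, q = {src: None}, {src}, deque([src])
        while q:
            cur = q.popleft()
            if cur in targets:
                path = []
                while cur is not None:
                    path.append(cur)
                    cur = prev[cur]
                return path[::-1]
            x, y = cur
            for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nxt in free_cells and nxt not in seen:
                    seen.add(nxt)
                    prev[nxt] = cur
                    q.append(nxt)

    path, todo = [start], set(free_cells) - {start}
    while todo:
        leg = bfs(path[-1], todo)   # greedy: go to the NEAREST uncovered cell
        path.extend(leg[1:])
        todo -= set(leg)            # everything traversed is now covered
    return path
```

Cells visited more than once are the redundancy the paper bounds (its reported rate is at most 6.74%); here they simply appear multiple times in the returned path.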

20 pages, 16392 KB  
Article
PCC-YOLO: A Fruit Tree Trunk Recognition Algorithm Based on YOLOv8
by Yajie Zhang, Weiliang Jin, Baoxing Gu, Guangzhao Tian, Qiuxia Li, Baohua Zhang and Guanghao Ji
Agriculture 2025, 15(16), 1786; https://doi.org/10.3390/agriculture15161786 - 21 Aug 2025
Abstract
With the development of smart agriculture, the precise identification of fruit tree trunks by orchard management robots has become a key technology for achieving autonomous navigation. To address the difficulty of distinguishing tree trunks from low-contrast orchard backgrounds, this study introduces PCC-YOLO (PENet, CoT-Net, and Coord-SE attention-based YOLOv8), a new trunk detection model based on YOLOv8. A pyramid enhancement network (PENet) strengthens the model's feature extraction under low-contrast conditions, a context transformer (CoT-Net) module strengthens global perception, and a combination of coordinate attention (Coord-Att) and SENetV2 optimizes target localization accuracy. Experimental results show that PCC-YOLO achieves a mean average precision (mAP) of 82.6% on a self-built orchard dataset (5000 images) and a detection speed of 143.36 FPS, a 4.8% improvement over the baseline YOLOv8 model, while maintaining a low computational load (7.8 GFLOPs). The model demonstrates a superior balance of accuracy, speed, and computational cost compared to the baseline YOLOv8 and other common YOLO variants, offering an efficient solution for the real-time autonomous navigation of orchard management robots.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

18 pages, 7729 KB  
Article
A Lightweight Traffic Sign Detection Model Based on Improved YOLOv8s for Edge Deployment in Autonomous Driving Systems Under Complex Environments
by Chen Xing, Haoran Sun and Jiafu Yang
World Electr. Veh. J. 2025, 16(8), 478; https://doi.org/10.3390/wevj16080478 - 21 Aug 2025
Abstract
Traffic sign detection is a core function of autonomous driving systems, requiring real-time and accurate target recognition in complex road environments. Existing lightweight detection models struggle to balance accuracy, efficiency, and robustness under computational constraints of vehicle-mounted edge devices. To address this, we propose a lightweight model integrating FasterNet, Efficient Multi-scale Attention (EMA), Bidirectional Feature Pyramid Network (BiFPN), and Group Separable Convolution (GSConv) based on YOLOv8s (FEBG-YOLOv8s). Key innovations include reconstructing the Cross Stage Partial Network 2 with Focus (C2f) module using FasterNet blocks to minimize redundant computation; integrating an EMA mechanism to enhance robustness against small and occluded targets; refining the neck network based on BiFPN via channel compression, downsampling layers, and skip connections to optimize shallow–deep semantic fusion; and designing a GSConv-based hybrid serial–parallel detection head (GSP-Detect) to preserve cross-channel information while reducing computational load. Experiments on Tsinghua–Tencent 100K (TT100K) show FEBG-YOLOv8s improves mean Average Precision at Intersection over Union 0.5 (mAP50) by 3.1% compared to YOLOv8s, with 4 million fewer parameters and 22.5% lower Giga Floating-Point Operations (GFLOPs). Generalizability experiments on the CSUST Chinese Traffic Sign Detection Benchmark (CCTSDB) validate robustness, with 3.3% higher mAP50, demonstrating its potential for real-time traffic sign detection on edge platforms. Full article
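The BiFPN used here (and in several other entries) combines feature maps with learnable non-negative scalar weights via fast normalized fusion: out = Σᵢ wᵢ·fᵢ / (Σᵢ wᵢ + ε) with wᵢ = ReLU(wᵢ). A minimal sketch on flattened feature vectors, with hypothetical values and weights:

```python
def fused(features, weights, eps=1e-4):
    """BiFPN-style fast normalized fusion of same-shape feature vectors."""
    w = [max(0.0, wi) for wi in weights]  # ReLU keeps each weight >= 0
    total = sum(w) + eps                  # eps avoids division by zero
    return [
        sum(wi * f[i] for wi, f in zip(w, features)) / total
        for i in range(len(features[0]))
    ]

# Two flattened feature maps, the shallow one weighted 3x the deep one.
print(fused([[4.0, 8.0], [0.0, 2.0]], [3.0, 1.0]))
```

Because the weights are normalized to sum to roughly one, the fusion stays numerically stable without the softmax that a learned attention over inputs would require; that is the design point of the "fast" variant.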

17 pages, 8132 KB  
Article
DGNCA-Net: A Lightweight and Efficient Insulator Defect Detection Model
by Qiang Chen, Yuanfeng Luo, Wu Yuan, Ruiliang Zhang and Yunshou Mao
Algorithms 2025, 18(8), 528; https://doi.org/10.3390/a18080528 - 20 Aug 2025
Abstract
This paper proposes a lightweight DGNCA-Net insulator defect detection algorithm based on improvements to the YOLOv11 framework, addressing the issues of high computational complexity and low detection accuracy for small targets in machine vision-based insulator defect detection methods. Firstly, to enhance the model’s ability to perceive multi-scale targets while reducing computational overhead, a lightweight Ghost-backbone network is designed. This network integrates the improved Ghost modules with the original YOLOv11 backbone layers to improve feature extraction efficiency. Meanwhile, the original C2PSA module is replaced with a CSPCA module incorporating Coordinate Attention, thereby strengthening the model’s spatial awareness and target localization capabilities. Secondly, to improve the detection accuracy of small insulator defects in complex scenes and reduce redundant feature information, a DC-PUFPN neck network is constructed. This network combines deformable convolutions with a progressive upsampling feature pyramid structure to optimize the Neck part of YOLOv11, enabling efficient feature fusion and information transfer, while retaining the original C3K2 module. Additionally, a composite loss function combining Wise-IoUv3 and Focal Loss is adopted to further accelerate model convergence and improve detection accuracy. Finally, the effectiveness and advancement of the proposed DGNCA-Net algorithm in insulator defect detection tasks are comprehensively validated through ablation studies, comparative experiments, and visualization results. Full article
(This article belongs to the Section Algorithms for Multidisciplinary Applications)

29 pages, 4839 KB  
Article
FED-UNet++: An Improved Nested UNet for Hippocampus Segmentation in Alzheimer’s Disease Diagnosis
by Liping Yang, Wei Zhang, Shengyu Wang, Xiaoru Yu, Bin Jing, Nairui Sun, Tengchao Sun and Wei Wang
Sensors 2025, 25(16), 5155; https://doi.org/10.3390/s25165155 - 19 Aug 2025
Abstract
The hippocampus is a key structure involved in the early pathological progression of Alzheimer’s disease. Accurate segmentation of this region is vital for the quantitative assessment of brain atrophy and the support of diagnostic decision-making. To address limitations in current MRI-based hippocampus segmentation methods—such as indistinct boundaries, small target size, and limited feature representation—this study proposes an enhanced segmentation framework called FED-UNet++. The residual feature reconstruction block (FRBlock) is introduced to strengthen the network’s ability to capture boundary cues and fine-grained structural details in shallow layers. The efficient attention pyramid (EAP) module enhances the integration of multi-scale features and spatial contextual information. The dynamic frequency context network (DFCN) mitigates the decoder’s limitations in capturing long-range dependencies and global semantic structures. Experimental results on the benchmark dataset demonstrate that FED-UNet++ achieves superior performance across multiple evaluation metrics, with an IoU of 74.95% and a Dice coefficient of 84.43% ± 0.21%, outperforming the baseline model in both accuracy and robustness. These findings confirm that FED-UNet++ is highly effective in segmenting small and intricate brain structures like the hippocampus, providing a robust and practical tool for MRI-based analysis of neurodegenerative diseases. Full article
(This article belongs to the Special Issue Sensors for Human Activity Recognition: 3rd Edition)

19 pages, 2717 KB  
Article
EASD: Exposure Aware Single-Step Diffusion Framework for Monocular Depth Estimation in Autonomous Vehicles
by Chenyuan Zhang and Deokwoo Lee
Appl. Sci. 2025, 15(16), 9130; https://doi.org/10.3390/app15169130 - 19 Aug 2025
Abstract
Monocular depth estimation (MDE) is a cornerstone of computer vision and is applied to diverse practical areas such as autonomous vehicles, robotics, etc., yet even the latest methods suffer substantial errors in high-dynamic-range (HDR) scenes where over- or under-exposure erases critical texture. To address this challenge in real-world autonomous driving scenarios, we propose the Exposure-Aware Single-Step Diffusion Framework for Monocular Depth Estimation (EASD). EASD leverages a pre-trained Stable Diffusion variational auto-encoder, freezing its encoder to extract exposure-robust latent RGB and depth representations. A single-step diffusion process then predicts the clean depth latent vector, eliminating iterative error accumulation and enabling real-time inference suitable for autonomous vehicle perception pipelines. To further enhance robustness under extreme lighting conditions, EASD introduces an Exposure-Aware Feature Fusion (EAF) module—an attention-based pyramid that dynamically modulates multi-scale features according to global brightness statistics. This mechanism suppresses bias in saturated regions while restoring detail in under-exposed areas. Furthermore, an Exposure-Balanced Loss (EBL) jointly optimises global depth accuracy, local gradient coherence and reliability in exposure-extreme regions—key metrics for safety-critical perception tasks such as obstacle detection and path planning. Experimental results on NYU-v2, KITTI, and related benchmarks demonstrate that EASD reduces absolute relative error by an average of 20% under extreme illumination, using only 60,000 labelled images. The framework achieves real-time performance (<50 ms per frame) and strikes a superior balance between accuracy, computational efficiency, and data efficiency, offering a promising solution for robust monocular depth estimation in challenging automotive lighting conditions such as tunnel transitions, night driving and sun glare. Full article

27 pages, 13262 KB  
Article
MLP-MFF: Lightweight Pyramid Fusion MLP for Ultra-Efficient End-to-End Multi-Focus Image Fusion
by Yuze Song, Xinzhe Xie, Buyu Guo, Xiaofei Xiong and Peiliang Li
Sensors 2025, 25(16), 5146; https://doi.org/10.3390/s25165146 - 19 Aug 2025
Abstract
Limited depth of field in modern optical imaging systems often results in partially focused images. Multi-focus image fusion (MFF) addresses this by synthesizing an all-in-focus image from multiple source images captured at different focal planes. While deep learning-based MFF methods have shown promising results, existing approaches face significant challenges. Convolutional Neural Networks (CNNs) often struggle to capture long-range dependencies effectively, while Transformer and Mamba-based architectures, despite their strengths, suffer from high computational costs and rigid input size constraints, frequently necessitating patch-wise fusion during inference—a compromise that undermines the realization of a true global receptive field. To overcome these limitations, we propose MLP-MFF, a novel lightweight, end-to-end MFF network built upon the Pyramid Fusion Multi-Layer Perceptron (PFMLP) architecture. MLP-MFF is specifically designed to handle flexible input scales, efficiently learn multi-scale feature representations, and capture critical long-range dependencies. Furthermore, we introduce a Dual-Path Adaptive Multi-scale Feature-Fusion Module based on Hybrid Attention (DAMFFM-HA), which adaptively integrates hybrid attention mechanisms and allocates weights to optimally fuse multi-scale features, thereby significantly enhancing fusion performance. Extensive experiments on public multi-focus image datasets demonstrate that our proposed MLP-MFF achieves competitive, and often superior, fusion quality compared to current state-of-the-art MFF methods, all while maintaining a lightweight and efficient architecture. Full article
17 pages, 3569 KB  
Article
A Real-Time Mature Hawthorn Detection Network Based on Lightweight Hybrid Convolutions for Harvesting Robots
by Baojian Ma, Bangbang Chen, Xuan Li, Liqiang Wang and Dongyun Wang
Sensors 2025, 25(16), 5094; https://doi.org/10.3390/s25165094 - 16 Aug 2025
Viewed by 386
Abstract
Accurate real-time detection of hawthorn by vision systems is a fundamental prerequisite for automated harvesting. This study addresses the challenges in hawthorn orchards—including target overlap, leaf occlusion, and environmental variations—which lead to compromised detection accuracy, high computational resource demands, and poor real-time performance in existing methods. To overcome these limitations, we propose YOLO-DCL (group shuffling convolution and coordinate attention integrated with a lightweight head based on YOLOv8n), a novel lightweight hawthorn detection model. The backbone network employs dynamic group shuffling convolution (DGCST) for efficient and effective feature extraction. Within the neck network, coordinate attention (CA) is integrated into the feature pyramid network (FPN), forming an enhanced multi-scale feature pyramid network (HSPFN); this integration further optimizes the C2f structure. The detection head is designed utilizing shared convolution and batch normalization to streamline computation. Additionally, the PIoUv2 (powerful intersection over union version 2) loss function is introduced to significantly reduce model complexity. Experimental validation demonstrates that YOLO-DCL achieves a precision of 91.6%, recall of 90.1%, and mean average precision (mAP) of 95.6%, while simultaneously reducing the model size to 2.46 MB with only 1.2 million parameters and 4.8 GFLOPs computational cost. To rigorously assess real-world applicability, we developed and deployed a detection system based on the PySide6 framework on an NVIDIA Jetson Xavier NX edge device. Field testing validated the model’s robustness, high accuracy, and real-time performance, confirming its suitability for integration into harvesting robots operating in practical orchard environments. Full article
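Group-shuffling convolutions of the kind referenced here (the DGCST module) rely on a channel-shuffle step that mixes information across convolution groups. The following is a generic NumPy sketch of that standard operation, not the authors' DGCST code; the array layout (channels-first, single image) is an assumption for illustration.

```python
import numpy as np

def channel_shuffle(x, groups):
    # x: feature map of shape (C, H, W). Split channels into `groups`,
    # transpose the group and within-group axes, and flatten back so that
    # channels from different groups are interleaved.
    c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    return (x.reshape(groups, c // groups, h, w)
             .transpose(1, 0, 2, 3)
             .reshape(c, h, w))
```

With 4 channels and 2 groups, the channel order [0, 1, 2, 3] becomes [0, 2, 1, 3], so a subsequent grouped convolution sees channels from both original groups.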
(This article belongs to the Section Sensors and Robotics)
27 pages, 10760 KB  
Article
U-MoEMamba: A Hybrid Expert Segmentation Model for Cabbage Heads in Complex UAV Low-Altitude Remote Sensing Scenarios
by Rui Li, Xue Ding, Shuangyun Peng and Fapeng Cai
Agriculture 2025, 15(16), 1723; https://doi.org/10.3390/agriculture15161723 - 9 Aug 2025
Viewed by 448
Abstract
To address the challenges of missed and incorrect segmentation in cabbage head detection under complex field conditions using UAV-based low-altitude remote sensing, this study proposes U-MoEMamba, an innovative dynamic state-space framework with a mixture-of-experts (MoE) collaborative segmentation network. The network constructs a dynamic multi-scale expert architecture, integrating three expert paradigms—multi-scale convolution, attention mechanisms, and Mamba pathways—for efficient and accurate segmentation. First, we design the MambaMoEFusion module, a collaborative expert fusion block that employs a lightweight gating network to dynamically integrate outputs from different experts, enabling adaptive selection and optimal feature aggregation. Second, we propose an MSCrossDualAttention module as an attention expert branch, leveraging a dual-path interactive attention mechanism to jointly extract shallow details and deep semantic information, effectively capturing the contextual features of cabbages. Third, the VSSBlock is incorporated as an expert pathway to model long-range dependencies via visual state-space representation. Evaluation on datasets of different cabbage growth stages shows that U-MoEMamba achieves an mIoU of 89.51% on the early-heading dataset, outperforming SegMamba and EfficientPyramidMamba by 3.91% and 1.4%, respectively. On the compact heading dataset, it reaches 91.88%, with improvements of 2.41% and 1.65%. This study provides a novel paradigm for intelligent monitoring of open-field crops. Full article
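The mixture-of-experts fusion described for the MambaMoEFusion module amounts to a gating network producing softmax weights over expert outputs, which are then combined by a weighted sum. This is a hedged, minimal NumPy sketch of that generic MoE pattern under a global (non-spatial) gate, not the U-MoEMamba implementation; the shapes and function names are illustrative assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def moe_fuse(expert_outputs, gate_logits):
    # expert_outputs: (E, C, H, W) feature maps from E expert branches.
    # gate_logits: (E,) scores from a gating network.
    # Returns the gate-weighted sum over experts, shape (C, H, W).
    w = softmax(gate_logits, axis=0)
    return np.tensordot(w, expert_outputs, axes=(0, 0))
```

In the article's setting, the three expert branches would be the multi-scale convolution, attention, and Mamba pathways, and the lightweight gating network would predict the logits from the input features; spatially varying gates are a straightforward extension.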
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)