Search Results (10,965)

Search Parameters:
Keywords = AP-2

21 pages, 5222 KB  
Article
False Positive Patterns in UAV-Based Deep Learning Models for Coastal Debris Detection
by Ye-Been Do, Bo-Ram Kim, Jeong-Seok Lee and Tae-Hoon Kim
J. Mar. Sci. Eng. 2025, 13(10), 1910; https://doi.org/10.3390/jmse13101910 - 4 Oct 2025
Abstract
Coastal debris is a global environmental issue that requires systematic monitoring strategies based on reliable statistical data. Recent advances in remote sensing and deep learning-based object detection have enabled the development of efficient coastal debris monitoring systems. In this study, two state-of-the-art object detection models—RT-DETR and YOLOv10—were applied to UAV-acquired images for coastal debris detection. Their false positive characteristics were analyzed to provide guidance on model selection under different coastal environmental conditions. Quantitative evaluation using mean average precision (mAP@0.5) showed comparable performance between the two models (RT-DETR: 0.945, YOLOv10: 0.957). However, bounding box label accuracy revealed a significant gap, with RT-DETR achieving 80.18% and YOLOv10 only 53.74%. Class-specific analysis indicated that both models failed to detect Metal and Glass and showed low accuracy for fragmented debris, while buoy-type objects with high structural integrity (Styrofoam Buoy, Plastic Buoy) were consistently identified. Error analysis further revealed that RT-DETR tended to overgeneralize by misclassifying untrained objects as similar classes, whereas YOLOv10 exhibited pronounced intra-class confusion in fragment-type objects. These findings demonstrate that mAP alone is insufficient to evaluate model performance in real-world coastal monitoring. Instead, model assessment should account for training data balance, coastal environmental characteristics, and UAV imaging conditions. Future studies should incorporate diverse coastal environments and apply dataset augmentation to establish statistically robust and standardized monitoring protocols for coastal debris. Full article
(This article belongs to the Section Ocean Engineering)
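The mAP@0.5 figures quoted throughout these results count a predicted box as a true positive when its intersection-over-union (IoU) with a ground-truth box is at least 0.5, which is how the entry above can report near-identical mAP for two models whose per-box label accuracy differs sharply. A minimal IoU sketch (function and box values are ours, for illustration only):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# At the mAP@0.5 threshold this prediction counts as a true positive:
print(iou((0, 0, 10, 10), (2, 0, 12, 10)))  # 2/3, i.e. >= 0.5
```

Note that IoU says nothing about the predicted class label, which is why label accuracy can lag far behind mAP.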
28 pages, 25154 KB  
Article
A Progressive Target-Aware Network for Drone-Based Person Detection Using RGB-T Images
by Zhipeng He, Boya Zhao, Yuanfeng Wu, Yuyang Jiang and Qingzhan Zhao
Remote Sens. 2025, 17(19), 3361; https://doi.org/10.3390/rs17193361 - 4 Oct 2025
Abstract
Drone-based target detection using visible and thermal (RGB-T) images is critical in disaster rescue, intelligent transportation, and wildlife monitoring. However, persons typically occupy fewer pixels and exhibit more varied postures than vehicles or large animals, making them difficult to detect in unmanned aerial vehicle (UAV) remote sensing images with complex backgrounds. We propose a novel progressive target-aware network (PTANet) for person detection using RGB-T images. A global adaptive feature fusion module (GAFFM) is designed to fuse the texture and thermal features of persons. A progressive focusing strategy is used: a person segmentation auxiliary branch (PSAB) enhances target discrimination during training, while a cross-modality background mask (CMBM) suppresses irrelevant background regions in the inference phase. Extensive experiments demonstrate that the proposed PTANet achieves high accuracy and generalization performance, reaching 79.5%, 47.8%, and 97.3% mean average precision (mAP)@50 on three drone-based person detection benchmarks (VTUAV-det, RGBTDronePerson, and VTSaR), with only 4.72 M parameters. Deployed on an embedded edge device with TensorRT acceleration and quantization, PTANet achieves an inference time of 11.177 ms per 640 × 640 image, indicating promising potential for real-time onboard person detection. The source code is publicly available on GitHub. Full article
24 pages, 14242 KB  
Article
DBA-YOLO: A Dense Target Detection Model Based on Lightweight Neural Networks
by Zhiyong He, Jiahong Yang, Hongtian Ning, Chengxuan Li and Qiang Tang
J. Imaging 2025, 11(10), 345; https://doi.org/10.3390/jimaging11100345 - 4 Oct 2025
Abstract
Current deep learning-based dense target detection models face dual challenges in industrial scenarios: high computational complexity leading to insufficient inference efficiency on mobile devices, and missed/false detections caused by dense small targets, high inter-class similarity, and complex background interference. To address these issues, this paper proposes DBA-YOLO, a lightweight model based on YOLOv10, which significantly reduces computational complexity through model compression and algorithm optimization while maintaining high accuracy. Key improvements include the following: (1) a C2f PA module for enhanced feature extraction, (2) a parameter-refined BIMAFPN neck structure to improve small target detection, and (3) a DyDHead module integrating scale, space, and task awareness for spatial feature weighting. To validate DBA-YOLO, we constructed a real-world dataset from cigarette package images. Experiments on SKU-110K and our dataset show that DBA-YOLO achieves 91.3% detection accuracy (1.4% higher than baseline), with mAP and mAP75 improvements of 2–3%. Additionally, the model reduces parameters by 3.6%, balancing efficiency and performance for resource-constrained devices. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
20 pages, 7348 KB  
Article
A Sketch-Based Cross-Modal Retrieval Model for Building Localization Without Satellite Signals
by Haihua Du, Jiawei Fan, Yitao Huang, Longyang Lin and Jiuchao Qian
Electronics 2025, 14(19), 3936; https://doi.org/10.3390/electronics14193936 - 4 Oct 2025
Abstract
In existing non-satellite navigation systems, visual localization is widely adopted for its high precision. However, in scenarios with highly similar building structures, traditional visual localization methods that rely on direct coordinate prediction often suffer from decreased accuracy or even failure. Moreover, as scene complexity increases, their robustness tends to decline. To address these challenges, this paper proposes a Sketch Line Information Consistency Generation (SLIC) model for indirect building localization. Instead of regressing geographic coordinates, the model retrieves candidate building images that correspond to hand-drawn sketches, and these retrieved results serve as proxies for localization in satellite-denied environments. Within the model, the Line-Attention Block and Relation Block are designed to extract fine-grained line features and structural correlations, thereby improving retrieval accuracy. Experiments on multiple architectural datasets demonstrate that the proposed approach achieves high precision and robustness, with mAP@2 values ranging from 0.87 to 1.00, providing a practical alternative to conventional coordinate-based localization methods. Full article
(This article belongs to the Special Issue Recent Advances in Autonomous Localization and Navigation System)
24 pages, 9586 KB  
Article
Optimized Recognition Algorithm for Remotely Sensed Sea Ice in Polar Ship Path Planning
by Li Zhou, Runxin Xu, Jiayi Bian, Shifeng Ding, Sen Han and Roger Skjetne
Remote Sens. 2025, 17(19), 3359; https://doi.org/10.3390/rs17193359 - 4 Oct 2025
Abstract
Collisions between ships and sea ice pose a significant threat to maritime safety, making it essential to detect sea ice and perform safety-oriented path planning for polar navigation. This paper utilizes an optimized You Only Look Once version 5 (YOLOv5) model, designated as YOLOv5-ICE, for the detection of sea ice in satellite imagery, with the resultant detection data being employed to input obstacle coordinates into a ship path planning system. The enhancements include the Squeeze-and-Excitation (SE) attention mechanism, improved spatial pyramid pooling, and the Flexible ReLU (FReLU) activation function. The improved YOLOv5-ICE shows enhanced performance, with its mAP increasing by 3.5% compared to the baseline YOLOv5 and also by 1.3% compared to YOLOv8. YOLOv5-ICE demonstrates robust performance in detecting small sea ice targets within large-scale satellite images and excels in high ice concentration regions. For path planning, the Any-Angle Path Planning on Grids algorithm is applied to simulate routes based on detected sea ice floes. The objective function incorporates the path length, number of ship turns, and sea ice risk value, enabling path planning under varying ice concentrations. By integrating detection and path planning, this work proposes a novel method to enhance navigational safety in polar regions. Full article
15 pages, 2159 KB  
Article
Benchmarking Lightweight YOLO Object Detectors for Real-Time Hygiene Compliance Monitoring
by Leen Alashrafi, Raghad Badawood, Hana Almagrabi, Mayda Alrige, Fatemah Alharbi and Omaima Almatrafi
Sensors 2025, 25(19), 6140; https://doi.org/10.3390/s25196140 - 4 Oct 2025
Abstract
Ensuring hygiene compliance in regulated environments—such as food processing facilities, hospitals, and public indoor spaces—requires reliable detection of personal protective equipment (PPE) usage, including gloves, face masks, and hairnets. Manual inspection is labor-intensive and unsuitable for continuous, real-time enforcement. This study benchmarks three lightweight object detection models—YOLOv8n, YOLOv10n, and YOLOv12n—for automated PPE compliance monitoring using a large curated dataset of over 31,000 annotated images. The dataset spans seven classes representing both compliant and non-compliant conditions: glove, no_glove, mask, no_mask, incorrect_mask, hairnet, and no_hairnet. All evaluations were conducted using both detection accuracy metrics (mAP@50, mAP@50–95, precision, recall) and deployment-relevant efficiency metrics (inference speed, model size, GFLOPs). Among the three models, YOLOv10n achieved the highest mAP@50 (85.7%) while maintaining competitive efficiency, indicating strong suitability for resource-constrained IoT-integrated deployments. YOLOv8n provided the highest localization accuracy at stricter thresholds (mAP@50–95), while YOLOv12n favored ultra-lightweight operation at the cost of reduced accuracy. The results provide practical guidance for selecting nano-scale detection models in real-time hygiene compliance systems and contribute a reproducible, deployment-aware evaluation framework for computer vision in hygiene-critical settings. Full article
(This article belongs to the Section Internet of Things)
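Several entries above report precision and recall alongside mAP; in detection benchmarking these come from counting matched and unmatched boxes at the chosen IoU threshold. A generic sketch (not the authors' code; the example counts are invented for illustration):

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, and miss counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# e.g. 857 matched detections, 143 false alarms, 121 missed ground truths
p, r = precision_recall(857, 143, 121)
print(p, r)  # 0.857 and roughly 0.876
```

High precision with low recall means few false alarms but many misses; the nano-scale models compared above trade these off differently.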
21 pages, 3489 KB  
Article
GA-YOLOv11: A Lightweight Subway Foreign Object Detection Model Based on Improved YOLOv11
by Ning Guo, Min Huang and Wensheng Wang
Sensors 2025, 25(19), 6137; https://doi.org/10.3390/s25196137 - 4 Oct 2025
Abstract
Modern subway platforms are generally equipped with platform screen door systems to enhance safety, but the gap between the platform screen doors and train doors may cause passengers or objects to become trapped, leading to accidents. Addressing the excessive parameter counts and computational complexity of existing foreign object intrusion detection algorithms, as well as their false positives and false negatives on small objects, this article introduces a lightweight deep learning model based on YOLOv11n, named GA-YOLOv11. First, a lightweight GhostConv convolution module is introduced into the backbone network to reduce computational resource waste in irrelevant areas, thereby lowering model complexity and computational load. Additionally, the GAM attention mechanism is incorporated into the head network to enhance the model’s ability to distinguish features, enabling precise identification of object location and category and significantly reducing the probability of false positives and false negatives. Experimental results demonstrate that, compared with the original YOLOv11n model, the improved model achieves improvements of 3.3%, 3.2%, 1.2%, and 3.5% in precision, recall, mAP@0.5, and mAP@0.5:0.95, respectively, while the number of parameters and GFLOPs are reduced by 18% and 7.9%, respectively, at the same model size. The improved model is more lightweight while ensuring real-time performance and accuracy, making it well suited to detecting foreign objects in subway platform gaps. Full article
(This article belongs to the Special Issue Image Processing and Analysis for Object Detection: 3rd Edition)
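The parameter and GFLOP reductions quoted for GA-YOLOv11 and the other lightweight models in these results are sums of per-layer counts. For a plain 2-D convolution the standard counts are as below (a generic sketch, not taken from any of the papers; conventions vary, and some profilers report multiply-accumulates rather than FLOPs):

```python
def conv2d_params(c_in, c_out, k, bias=True):
    # weight tensor of shape c_out x c_in x k x k, plus one bias per output channel
    return c_out * c_in * k * k + (c_out if bias else 0)

def conv2d_flops(c_in, c_out, k, h_out, w_out):
    # each multiply-accumulate counted as 2 FLOPs, once per output element
    return 2 * c_out * h_out * w_out * c_in * k * k

print(conv2d_params(64, 128, 3))         # 73856
print(conv2d_flops(64, 128, 3, 80, 80))  # 943718400, i.e. about 0.94 GFLOPs
```

GhostConv-style modules shrink these totals by generating part of the output channels with cheap linear operations instead of full convolutions.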
14 pages, 1893 KB  
Article
Anti-Photoaging Effects of a Standardized Hot Water Extract of Petasites japonicus Leaves in Ultraviolet B-Exposed Hairless Mice
by Hyeon-A Song, Min-Ji Park, Chae-Won Lee, Sangsu Park, Jong Kil Lee, Kyung-Sook Chung and Kyung-Tae Lee
Pharmaceuticals 2025, 18(10), 1490; https://doi.org/10.3390/ph18101490 - 3 Oct 2025
Abstract
Background: Ultraviolet B (UVB) radiation accelerates skin aging by inducing oxidative stress, collagen degradation, and cellular senescence. Although Petasites japonicus is known for its antioxidant properties, its anti-photoaging potential remains underexplored. This research explored the protective properties of a hot water extract from P. japonicus leaves (KP-1) against photoaging caused by UVB exposure. Methods: Hairless mice were exposed to UVB three times per week and orally administered KP-1 for 13 weeks. Wrinkle formation, epidermal thickness, skin hydration, and collagen content were assessed. Protein expression related to MAPK/AP-1, TGF-β/Smad2/3, and p53/p21 pathways was analyzed by Western blotting. Results: KP-1 significantly reduced UVB-induced wrinkle area, epidermal and dermal thickening, and transepidermal water loss while restoring collagen density and skin hydration. KP-1 inhibited MMP-1 expression, enhanced COL1A1 levels, suppressed MAPK/AP-1 activation, and activated TGF-β/Smad2/3 signaling. It also balanced p53/p21 expression and restored cyclin D1 and CDK4 levels, thereby preventing UVB-induced senescence. Conclusions: The findings of this research revealed that KP-1 can serve as a promising natural substance for safeguarding the skin from damage and aging caused by UVB exposure. Full article
21 pages, 7207 KB  
Article
Optimization Algorithm for Detection of Impurities in Polypropylene Random Copolymer Raw Materials Based on YOLOv11
by Mingchen Dai and Xuedong Jing
Electronics 2025, 14(19), 3934; https://doi.org/10.3390/electronics14193934 - 3 Oct 2025
Abstract
Impurities in polypropylene random copolymer (PPR) raw materials can seriously affect the performance of the final product, and efficient, accurate impurity detection is crucial to ensuring high production quality. To address the high small-target miss rates, weak anti-interference ability, and difficulty in balancing accuracy and speed of existing detection methods in complex industrial scenarios, this paper proposes an enhanced machine vision detection algorithm based on YOLOv11. First, the FasterLDConv module dynamically adjusts the position of sampling points through linear deformable convolution (LDConv), improving feature extraction for small-scale targets on complex backgrounds while remaining lightweight. Second, the IR-EMA attention mechanism combines an efficient reverse residual architecture with multi-scale attention, enabling the model to jointly capture feature channel dependencies and spatial relationships and thereby enhancing its sensitivity to weak impurity features. Third, a DC-DyHead deformable dynamic detection head is constructed, embedding deformable convolutions into the spatial perceptual attention of DyHead to strengthen feature modelling of anomalous and occluded impurities. Finally, an enhanced InnerMPDIoU loss function optimises the bounding box regression strategy, addressing the excessive penalties that the traditional CIoU loss imposes on small targets and its insufficient gradient guidance when boxes barely overlap. The results indicate that the average precision (mAP@0.5) of the improved algorithm on the self-made PPR impurity dataset reached 88.6%, which is 2.3% higher than that of the original YOLOv11n, while precision (P) and recall (R) increased by 2.4% and 2.8%, respectively. This study provides a reliable technical solution for the quality inspection of PPR raw materials and serves as a reference for algorithm optimisation in industrial small-target detection. Full article
15 pages, 3332 KB  
Article
YOLOv11-XRBS: Enhanced Identification of Small and Low-Detail Explosives in X-Ray Backscatter Images
by Baolu Yang, Zhe Yang, Xin Wang, Baozhong Mu, Jie Xu and Hong Li
Sensors 2025, 25(19), 6130; https://doi.org/10.3390/s25196130 - 3 Oct 2025
Abstract
Identifying concealed explosives in X-ray backscatter (XRBS) imagery remains a critical challenge, primarily due to low image contrasts, cluttered backgrounds, small object sizes, and limited structural details. To address these limitations, we propose YOLOv11-XRBS, an enhanced detection framework tailored to the characteristics of XRBS images. A dedicated dataset (SBCXray) comprising over 10,000 annotated images of simulated explosive scenarios under varied concealment conditions was constructed to support training and evaluation. The proposed framework introduces three targeted improvements: (1) adaptive architectural refinement to enhance multi-scale feature representation and suppress background interference, (2) a Size-Aware Focal Loss (SaFL) strategy to improve the detection of small and weak-feature objects, and (3) a recomposed loss function with scale-adaptive weighting to achieve more accurate bounding box localization. The experiments demonstrated that YOLOv11-XRBS achieves better performance compared to both existing YOLO variants and classical detection models such as Faster R-CNN, SSD512, RetinaNet, DETR, and VGGNet, achieving a mean average precision (mAP) of 94.8%. These results confirm the robustness and practicality of the proposed framework, highlighting its potential deployment in XRBS-based security inspection systems. Full article
(This article belongs to the Special Issue Advanced Spectroscopy-Based Sensors and Spectral Analysis Technology)
37 pages, 10380 KB  
Article
FEWheat-YOLO: A Lightweight Improved Algorithm for Wheat Spike Detection
by Hongxin Wu, Weimo Wu, Yufen Huang, Shaohua Liu, Yanlong Liu, Nannan Zhang, Xiao Zhang and Jie Chen
Plants 2025, 14(19), 3058; https://doi.org/10.3390/plants14193058 - 3 Oct 2025
Abstract
Accurate detection and counting of wheat spikes are crucial for yield estimation and variety selection in precision agriculture. However, challenges such as complex field environments, morphological variations, and small target sizes hinder the performance of existing models in real-world applications. This study proposes FEWheat-YOLO, a lightweight and efficient detection framework optimized for deployment on agricultural edge devices. The architecture integrates four key modules: (1) FEMANet, a mixed aggregation feature enhancement network with Efficient Multi-scale Attention (EMA) for improved small-target representation; (2) BiAFA-FPN, a bidirectional asymmetric feature pyramid network for efficient multi-scale feature fusion; (3) ADown, an adaptive downsampling module that preserves structural details during resolution reduction; and (4) GSCDHead, a grouped shared convolution detection head for reduced parameters and computational cost. Evaluated on a hybrid dataset combining GWHD2021 and a self-collected field dataset, FEWheat-YOLO achieved a COCO-style AP of 51.11%, AP@50 of 89.8%, and AP scores of 18.1%, 50.5%, and 61.2% for small, medium, and large targets, respectively, with an average recall (AR) of 58.1%. In wheat spike counting tasks, the model achieved an R2 of 0.941, MAE of 3.46, and RMSE of 6.25, demonstrating high counting accuracy and robustness. The proposed model requires only 0.67 M parameters, 5.3 GFLOPs, and 1.6 MB of storage, while achieving an inference speed of 54 FPS. Compared to YOLOv11n, FEWheat-YOLO improved AP@50, AP_s, AP_m, AP_l, and AR by 0.53%, 0.7%, 0.7%, 0.4%, and 0.3%, respectively, while reducing parameters by 74%, computation by 15.9%, and model size by 69.2%. These results indicate that FEWheat-YOLO provides an effective balance between detection accuracy, counting performance, and model efficiency, offering strong potential for real-time agricultural applications on resource-limited platforms. Full article
(This article belongs to the Special Issue Advances in Artificial Intelligence for Plant Research)
22 pages, 16284 KB  
Article
C5LS: An Enhanced YOLOv8-Based Model for Detecting Densely Distributed Small Insulators in Complex Railway Environments
by Xiaoai Zhou, Meng Xu and Peifen Pan
Appl. Sci. 2025, 15(19), 10694; https://doi.org/10.3390/app151910694 - 3 Oct 2025
Abstract
The complex environment along railway lines, characterized by low imaging quality, strong background interference, and densely distributed small objects, causes existing detection models to suffer from low accuracy in practical applications. To tackle these challenges, this study develops a robust and lightweight insulator detection model optimized for these demanding railway scenarios. To this end, we release a dedicated, comprehensive dataset named complexRailway that covers typical railway scenarios and addresses the limitations of existing insulator datasets, such as the lack of small-scale objects in high-interference backgrounds. On this basis, we present CutP5-LargeKernelAttention-SIoU (C5LS), an improved YOLOv8 variant with three key improvements: (1) removing the P5 branch from YOLOv8’s detection head to improve feature extraction for small- and medium-sized targets while reducing computational redundancy, (2) integrating a lightweight Large Separable Kernel Attention (LSKA) module to expand the receptive field and improve contextual modeling, and (3) replacing CIoU with SIoU loss to refine localization accuracy and accelerate convergence. Experimental results demonstrate that C5LS reaches 94.7% in mAP@0.5 and 65.5% in mAP@0.5–0.95, outperforming the baseline model by 1.9% and 3.5%, respectively. With an inference speed of 104 FPS and a model size of 13.9 MB, the model balances high precision with lightweight deployment. By providing stable and accurate insulator detection, C5LS not only offers a reliable spatial positioning basis for subsequent defect identification but also constitutes an efficient, practical intelligent monitoring solution for these failure-prone insulators, thereby enhancing the operational safety and maintenance efficiency of the railway power system. Full article
27 pages, 3475 KB  
Article
Pillar-Bin: A 3D Object Detection Algorithm for Communication-Denied UGVs
by Cunfeng Kang, Yukun Liu, Junfeng Chen and Siqi Tang
Drones 2025, 9(10), 686; https://doi.org/10.3390/drones9100686 - 3 Oct 2025
Abstract
Addressing the challenge of acquiring high-precision leader Unmanned Ground Vehicle (UGV) pose information in real time for communication-denied leader–follower formations, this study proposes Pillar-Bin, a 3D object detection algorithm based on the PointPillars framework. Pillar-Bin introduces an Interval Discretization Strategy (Bin) within the detection head, mapping critical target parameters (dimensions, center, heading angle) to predefined intervals for joint classification-residual regression optimization, which effectively suppresses environmental noise and enhances localization accuracy. Simulation results on the KITTI dataset demonstrate that Pillar-Bin significantly outperforms PointPillars in detection accuracy: in the 3D detection mode, the mean Average Precision (mAP) increased by 2.95%, while in the bird’s eye view (BEV) detection mode, mAP improved by 0.94%. With a processing rate of 48 frames per second (FPS), the proposed algorithm enhances detection accuracy while maintaining the high real-time performance of the baseline method. To evaluate Pillar-Bin’s real-vehicle performance, a leader UGV pose extraction scheme was designed. Real-vehicle experiments show absolute X/Y positioning errors below 5 cm and heading angle errors under 5° in Cartesian coordinates, with the pose extraction pipeline running at 46 FPS. The proposed Pillar-Bin algorithm and its pose extraction scheme provide efficient and accurate leader pose information for formation control, demonstrating practical engineering utility. Full article
20 pages, 3740 KB  
Article
Wildfire Target Detection Algorithms in Transmission Line Corridors Based on Improved YOLOv11_MDS
by Guanglun Lei, Jun Dong, Yi Jiang, Li Tang, Li Dai, Dengyong Cheng, Chuang Chen, Daochun Huang, Tianhao Peng, Biao Wang and Yifeng Lin
Appl. Sci. 2025, 15(19), 10688; https://doi.org/10.3390/app151910688 - 3 Oct 2025
Abstract
To address the issues of small-target missed detections, false alarms from cloud and fog interference, and low computational efficiency in traditional wildfire detection for transmission line corridors, this paper proposes a YOLOv11_MDS detection model that integrates Multi-Scale Convolutional Attention (MSCA) and Distribution-Shifted Convolution (DSConv). The MSCA module is embedded in the backbone and neck to enhance multi-scale dynamic feature extraction of flame and smoke through collaborative depth strip convolution and channel attention. The DSConv with a quantized dynamic shift mechanism significantly reduces computational complexity while maintaining detection accuracy. Experiments show that the improved model achieves an mAP@0.5 of 88.21%, 2.93 percentage points higher than the original YOLOv11, along with a 3.33% increase in recall and a frame rate of 242 FPS, with notable improvements in detecting small targets (pixel occupancy < 1%). Generalization tests demonstrate mAP improvements of 0.4% and 0.7% on benchmark datasets, effectively resolving false and missed detections in complex backgrounds. This study provides an engineering solution for real-time wildfire monitoring of transmission lines that balances accuracy and efficiency. Full article
20 pages, 57579 KB  
Article
Radar–Camera Fusion in Perspective View and Bird’s Eye View for 3D Object Detection
by Yuhao Xiao, Xiaoqing Chen, Yingkai Wang and Zhongliang Fu
Sensors 2025, 25(19), 6106; https://doi.org/10.3390/s25196106 - 3 Oct 2025
Abstract
Three-dimensional object detection based on the fusion of millimeter-wave radar and cameras is increasingly gaining attention due to characteristics of low cost, high accuracy, and strong robustness. Recently, the bird’s eye view (BEV) fusion paradigm has dominated radar–camera fusion-based 3D object detection methods. In the BEV fusion paradigm, the detection accuracy is jointly determined by the precision of both image BEV features and radar BEV features. The precision of image BEV features is significantly influenced by depth estimation accuracy, whereas estimating depth from a monocular image is naturally a challenging, ill-posed problem. In this article, we propose a novel approach to enhance depth estimation accuracy by fusing camera perspective view (PV) features and radar perspective view features, thereby improving the precision of image BEV features. The refined image BEV features are then fused with radar BEV features to achieve more accurate 3D object detection results. To realize PV fusion, we designed a radar image generation module based on radar cross-section (RCS) and depth information, accurately projecting radar data into the camera view to generate radar images. The radar images are used to extract radar PV features. We present a cross-modal feature fusion module using the attention mechanism to dynamically fuse radar PV features with camera PV features. Comprehensive evaluations on the nuScenes 3D object detection dataset demonstrate that the proposed dual-view fusion paradigm outperforms the BEV fusion paradigm, achieving state-of-the-art performance with 64.2 NDS and 56.3 mAP. Full article
(This article belongs to the Section Sensing and Imaging)