RLRD-YOLO: An Improved YOLOv8 Algorithm for Small Object Detection from an Unmanned Aerial Vehicle (UAV) Perspective
Abstract
1. Introduction
1. A novel Receptive Field Attention Convolution (RFCBAMConv) module is introduced to replace standard convolutions and refine the C2f module within the backbone. By integrating the strengths of the Convolutional Block Attention Module (CBAM) [11] and Receptive Field Attention Convolution (RFAConv) [12], this component significantly boosts the network's feature extraction capacity. Furthermore, the Spatial Pyramid Pooling Fast (SPPF) layer is augmented with Large Separable Kernel Attention (LSKA) [13], which broadens the overall feature-map perception and enhances multi-scale target processing.
2. The neck of the model adopts the Reparameterized Generalized Feature Pyramid Network (RepGFPN) [14] in place of the original architecture. RepGFPN effectively integrates deep semantic information with shallow, detailed features, improving feature complementarity across levels. Additionally, a small-target detection layer is integrated into RepGFPN to strengthen both classification and localization for smaller objects.
3. The original detection head is replaced with a Dynamic Head (Dyhead) [15], which improves performance in complex environments and accommodates diverse target features by dynamically adjusting parameters. By improving scale perception, spatial awareness, and task-specific comprehension, this change increases the model's capacity to recognize small targets.
4. Experimental results on the VisDrone2019 [16] dataset show that RLRD-YOLO significantly outperforms YOLOv8n in detection accuracy. Compared with other leading algorithms, RLRD-YOLO achieves superior performance on key metrics, validating its effectiveness. Generalization experiments on an infrared UAV dataset further confirm its reliability across scenarios.
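As context for the RFCBAMConv design, the CBAM-style channel and spatial attention that it builds on can be sketched as follows. This is a minimal NumPy illustration with random stand-in weights, not the actual module: in practice the channel MLP is learned and the spatial gate uses a learned 7×7 convolution rather than the simple average used here.

```python
import numpy as np

def channel_attention(x, reduction=2):
    """CBAM channel attention: pool over H and W, pass both pooled
    vectors through a shared tiny MLP, and gate channels with a sigmoid."""
    c, h, w = x.shape
    avg = x.mean(axis=(1, 2))          # (C,) average-pooled descriptor
    mx = x.max(axis=(1, 2))            # (C,) max-pooled descriptor
    # Shared 2-layer MLP (random weights here; learned in the real module)
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0)
    att = 1 / (1 + np.exp(-(mlp(avg) + mlp(mx))))   # sigmoid gate in (0, 1)
    return x * att[:, None, None]

def spatial_attention(x):
    """CBAM spatial attention: channel-wise mean/max maps, sigmoid gate.
    (CBAM fuses the two maps with a 7x7 conv; an average stands in here.)"""
    avg = x.mean(axis=0, keepdims=True)
    mx = x.max(axis=0, keepdims=True)
    gate = 1 / (1 + np.exp(-(avg + mx) / 2))
    return x * gate

x = np.random.default_rng(1).standard_normal((8, 16, 16))
y = spatial_attention(channel_attention(x))
assert y.shape == x.shape
```

RFAConv additionally weights each position of the convolution's receptive field; RFCBAMConv combines that idea with the two gates above.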
2. Related Work
2.1. YOLO Detection Algorithm
1. Backbone Network: YOLOv8 employs the C2f module, an improvement over the earlier C3 module of CSPDarkNet53 [22]. By merging outputs from all bottleneck modules, C2f accelerates training and improves the extraction of key features.
2. Neck Network: The neck network is critical for fusing features across levels. YOLOv8 integrates FPN and PAN in its architecture, enabling more effective extraction and fusion of information from multi-resolution feature maps.
3. Head Network: YOLOv8's head shifts from an anchor-based scheme to an anchor-free design, employing a decoupled architecture that handles classification and regression separately, thereby boosting overall performance. In addition, the loss function is refined by applying binary cross-entropy for classification, alongside distribution focal loss [23] and Complete Intersection over Union (CIoU) [24] for regression, enhancing both accuracy and efficiency.
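The CIoU term mentioned above combines overlap, normalized center distance, and aspect-ratio consistency in one score; a minimal sketch:

```python
import math

def ciou(box1, box2):
    """Complete IoU between two (x1, y1, x2, y2) boxes:
    CIoU = IoU - center_dist^2 / diag^2 - alpha * v."""
    x1, y1, x2, y2 = box1
    X1, Y1, X2, Y2 = box2
    # Intersection and union
    iw = max(0.0, min(x2, X2) - max(x1, X1))
    ih = max(0.0, min(y2, Y2) - max(y1, Y1))
    inter = iw * ih
    a1 = (x2 - x1) * (y2 - y1)
    a2 = (X2 - X1) * (Y2 - Y1)
    iou = inter / (a1 + a2 - inter)
    # Squared center distance, normalized by the enclosing-box diagonal
    cx1, cy1 = (x1 + x2) / 2, (y1 + y2) / 2
    cx2, cy2 = (X1 + X2) / 2, (Y1 + Y2) / 2
    rho2 = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
    cw = max(x2, X2) - min(x1, X1)
    ch = max(y2, Y2) - min(y1, Y1)
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (math.atan((X2 - X1) / (Y2 - Y1))
                              - math.atan((x2 - x1) / (y2 - y1))) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return iou - rho2 / c2 - alpha * v

assert abs(ciou((0, 0, 2, 2), (0, 0, 2, 2)) - 1.0) < 1e-9
```

The regression loss is then `1 - ciou(pred, target)`, which still provides a gradient when boxes do not overlap, unlike plain IoU.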
2.2. Improvement of YOLO Detection Algorithm
1. Backbone Network Improvement: Zhong et al. [25] modified the backbone by replacing traditional stride convolutions and pooling layers with SPD-Conv [26] layers, significantly reducing fine-grained information loss and improving feature representation. Liu et al. [27] proposed an efficient spatio-temporal interaction module that enhances spatial feature preservation, deepens the network, and effectively captures large-scale target information. Zhou et al. [28] incorporated the DualC2f module into the backbone; leveraging group convolution for efficient filter arrangement, it enhances inter-channel communication and information retention, significantly increasing model accuracy. Xiao et al. [29] presented the Efficient Multi-scale Convolution Module (EMCM) to strengthen the C2f architecture; by incorporating a Multi-scale Attention Module into EMCM's multi-branch structure, C2f's capacity to extract features in complex environments is further improved. Although these backbone improvements enhance feature extraction, they can also increase computational expense and introduce problems such as overfitting or poor feature discrimination when detecting small targets.
2. Neck Network Improvement: Li et al. [30] proposed the Bi-PAN-FPN framework to enhance the neck of YOLOv8s and improve detection performance. Nie et al. [31] introduced the HPANet architecture as an alternative to the original PANet, improving its capacity to fuse feature maps across scales while decreasing the number of network parameters. Li et al. [32] introduced MPA-FPN, an architecture that addresses the shortcomings of conventional Feature Pyramid Networks under scale variation; by enhancing the feature fusion approach, MPA-FPN mitigates conflicts among nonadjacent features and strengthens interactions between low-level and high-level information. Xiao et al. [33] introduced the Multi-Scale Ghost Generalized Feature Pyramid Network (MSGFPN) to improve the combination of feature-map data across scales while reducing the parameter count. Zhao et al. [34] integrated Efficient Multi-Scale Attention (EMA) [35] into the neck, improving the model's sensitivity to both spatial and textural information. While neck enhancements improve feature fusion, they may also introduce feature redundancy, potentially increasing confusion between targets and noise in complex backgrounds.
3. Head Network Improvement: Zhang et al. [36] addressed the network's insensitivity to aspect-ratio differences between predicted and ground-truth boxes in aerial imagery by optimizing the detection head, replacing the baseline model's CIoU with the enhanced Inner-MPDIoU loss to improve recognition of small-target features. Peng et al. [37] replaced the detection head of YOLOv8 with a lightweight, multi-attention Dynamic Head; substituting FasterConv for the original convolution reduces the head's weight and improves its efficiency in leveraging feature maps that integrate local and global information. Li et al. [38] used a head-decoupling strategy in conjunction with MAM, enabling the classification task to focus on semantic information and the localization task to emphasize boundary information, thereby enhancing detection performance. Lin et al. [39] introduced the normalized Wasserstein distance for training to mitigate the sensitivity of IoU-based metrics to localization errors on small objects. These detection-head enhancements improve classification and localization for small targets; nevertheless, cluttered backgrounds can still cause false and missed detections, and multi-target scenarios may limit generalization.
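To illustrate the normalized Wasserstein distance used by Lin et al. [39]: each box is modeled as a 2-D Gaussian, and the Wasserstein distance between the Gaussians is mapped through an exponential to a similarity in (0, 1]. This is a sketch; the constant `c` is dataset-dependent, and 12.8 here is only a placeholder.

```python
import math

def nwd(box1, box2, c=12.8):
    """Normalized Wasserstein distance between (x1, y1, x2, y2) boxes,
    each modeled as a Gaussian N((cx, cy), diag(w/2, h/2))."""
    g = lambda b: (
        (b[0] + b[2]) / 2, (b[1] + b[3]) / 2,   # center
        (b[2] - b[0]) / 2, (b[3] - b[1]) / 2,   # half width / half height
    )
    p, q = g(box1), g(box2)
    # 2-Wasserstein distance between the two Gaussians
    w2 = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return math.exp(-w2 / c)

# Identical boxes give similarity 1; a shift degrades it smoothly,
# whereas IoU drops to 0 as soon as tiny boxes stop overlapping.
assert nwd((0, 0, 4, 4), (0, 0, 4, 4)) == 1.0
assert nwd((0, 0, 4, 4), (1, 1, 5, 5)) > nwd((0, 0, 4, 4), (5, 5, 9, 9))
```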
3. Methods
3.1. Improvements to the Backbone
3.1.1. Improvement of C2f
3.1.2. Improvement of SPPF
3.2. Improvement of Neck
3.3. Improvement of Head
1. Scale-aware attention
2. Spatial-aware attention
3. Task-aware attention
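The three attentions act sequentially on the level, spatial, and channel dimensions of the feature pyramid. As a simplified sketch of the scale-aware branch only: Dyhead learns a 1×1 convolution followed by a hard sigmoid to weight each pyramid level; the sketch below reduces that learned mapping to a scalar gate, which is an assumption made purely for illustration.

```python
import numpy as np

def scale_aware_attention(pyramid):
    """Dyhead-style scale-aware gating over a list of feature levels,
    each of shape (C, H, W): weight every level by a hard sigmoid of
    its global mean activation (learned 1x1 conv omitted)."""
    hard_sigmoid = lambda t: float(np.clip((t + 1) / 2, 0.0, 1.0))
    out = []
    for level in pyramid:
        w = hard_sigmoid(level.mean())   # one scalar gate per level
        out.append(level * w)
    return out

rng = np.random.default_rng(0)
pyr = [rng.standard_normal((4, s, s)) for s in (32, 16, 8)]  # P3-P5 stand-ins
gated = scale_aware_attention(pyr)
assert all(g.shape == p.shape for g, p in zip(gated, pyr))
```

The spatial-aware branch (deformable sampling) and task-aware branch (dynamic channel activation) then refine each gated level in turn.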
4. Experimental Design and Analysis of Results
4.1. Experimental Data
4.2. Experimental Equipment
4.3. Evaluation Metrics
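The metrics reported below (P, R, mAP@0.5, mAP@0.5:0.95) follow the standard detection definitions: mAP@0.5 averages per-class AP, where AP is the area under the precision-recall curve built from confidence-ranked predictions matched to ground truth at IoU ≥ 0.5. A minimal sketch (real evaluations also handle per-image matching and, for mAP@0.5:0.95, average over ten IoU thresholds):

```python
import numpy as np

def average_precision(scores, is_tp, n_gt):
    """AP for one class: area under the precision-recall curve.
    scores: confidence per prediction; is_tp: whether each prediction
    matched a ground truth at IoU >= 0.5; n_gt: ground-truth count."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    hits = np.asarray(is_tp, dtype=float)[order]
    tp = np.cumsum(hits)
    fp = np.cumsum(1.0 - hits)
    recall = tp / n_gt
    precision = tp / (tp + fp)
    # Rectangle-rule integration over recall, starting from recall = 0
    r = np.concatenate(([0.0], recall))
    return float(np.sum((r[1:] - r[:-1]) * precision))

# A detector that ranks 2 true positives above 1 false positive on 2 GTs
ap = average_precision([0.9, 0.8, 0.3], [True, True, False], 2)
assert abs(ap - 1.0) < 1e-9
```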
4.4. Comparison Experiment
4.5. Ablation Experiment
4.6. Generalization Experiment
4.7. Embedded Porting of Models
4.8. Visual Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Shi, W.; Lyu, X.; Han, L. SONet: A Small Object Detection Network for Power Line Inspection Based on YOLOv8. IEEE Trans. Power Deliv. 2024, 39, 2973–2984.
2. He, Y.; Li, J. TSRes-YOLO: An Accurate and Fast Cascaded Detector for Waste Collection and Transportation Supervision. Eng. Appl. Artif. Intell. 2023, 126, 106997.
3. Bakirci, M. Enhancing Vehicle Detection in Intelligent Transportation Systems via Autonomous UAV Platform and YOLOv8 Integration. Appl. Soft Comput. 2024, 164, 112015.
4. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
5. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
6. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer: Cham, Switzerland, 2016; pp. 21–37.
7. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016.
8. Huang, T.; Zhu, J.; Liu, Y.; Tan, Y. UAV Aerial Image Target Detection Based on BLUR-YOLO. Remote Sens. Lett. 2023, 14, 186–196.
9. Tang, P.; Ding, Z.; Lv, M.; Jiang, M.; Xu, W. YOLO-RSFM: An Efficient Road Small Object Detection Method. IET Image Process. 2024, 18, 4263–4274.
10. Tahir, N.U.A.; Long, Z.; Zhang, Z.; Asim, M.; ElAffendi, M. PVswin-YOLOv8s: UAV-Based Pedestrian and Vehicle Detection for Traffic Management in Smart Cities Using Improved YOLOv8. Drones 2024, 8, 84.
11. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
12. Zhang, X.; Liu, C.; Yang, D.; Song, T.; Ye, Y.; Li, K.; Song, Y. RFAConv: Innovating Spatial Attention and Standard Convolutional Operation. arXiv 2023, arXiv:2304.03198.
13. Lau, K.W.; Po, L.M.; Rehman, Y.A.U. Large Separable Kernel Attention: Rethinking the Large Kernel Attention Design in CNN. Expert Syst. Appl. 2024, 236, 121352.
14. Xu, X.; Jiang, Y.; Chen, W.; Huang, Y.; Zhang, Y.; Sun, X. DAMO-YOLO: A Report on Real-Time Object Detection Design. arXiv 2022, arXiv:2211.15444.
15. Dai, X.; Chen, Y.; Xiao, B.; Chen, D.; Liu, M.; Yuan, L.; Zhang, L. Dynamic Head: Unifying Object Detection Heads with Attentions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 7373–7382.
16. Du, D.; Zhu, P.; Wen, L.; Bian, X.; Lin, H.; Hu, Q.; Peng, T.; Zheng, J.; Wang, X.; Zhang, Y.; et al. VisDrone-DET2019: The Vision Meets Drone Object Detection in Image Challenge Results. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Seoul, Republic of Korea, 27 October–2 November 2019.
17. Wang, Y.; Li, H.; Li, X.; Wang, Z.; Zhang, B. UAV Image Target Localization Method Based on Outlier Filter and Frame Buffer. Chin. J. Aeronaut. 2024, 37, 375–390.
18. Jocher, G. YOLOv5 by Ultralytics. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 9 June 2020).
19. Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLOv8. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 10 January 2023).
20. Lin, T.Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
21. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018.
22. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
23. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000.
24. Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint Triplets for Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6569–6578.
25. Zhong, R.; Peng, E.; Li, Z.; Ai, Q.; Han, T.; Tang, Y. SPD-YOLOv8: A Small-Size Object Detection Model of UAV Imagery in Complex Scene. J. Supercomput. 2024, 80, 1–21.
26. Sunkara, R.; Luo, T. No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Grenoble, France, 19–23 September 2022; Springer: Cham, Switzerland, 2022; pp. 443–459.
27. Liu, H.; Duan, X.; Lou, H.; Gu, J.; Chen, H.; Bi, L. Improved GBS-YOLOv5 Algorithm Based on YOLOv5 Applied to UAV Intelligent Traffic. Sci. Rep. 2023, 13, 9577.
28. Zhou, S.; Zhou, H. Detection Based on Semantics and a Detail Infusion Feature Pyramid Network and a Coordinate Adaptive Spatial Feature Fusion Mechanism Remote Sensing Small Object Detector. Remote Sens. 2024, 16, 2416.
29. Xiao, L.; Li, W.; Zhang, X.; Jiang, H.; Wan, B.; Ren, D. EMG-YOLO: An Efficient Fire Detection Model for Embedded Devices. Digit. Signal Process. 2025, 156, 104824.
30. Li, Y.; Fan, Q.; Huang, H.; Han, Z.; Gu, Q. A Modified YOLOv8 Detection Network for UAV Aerial Image Recognition. Drones 2023, 7, 304.
31. Nie, H.; Pang, H.; Ma, M.; Zheng, R. A Lightweight Remote Sensing Small Target Image Detection Algorithm Based on Improved YOLOv8. Sensors 2024, 24, 2952.
32. Li, Z.; He, Q.; Zhao, H.; Yang, W. DoubleM-Net: Multi-Scale Spatial Pyramid Pooling-Fast and Multi-Path Adaptive Feature Pyramid Network for UAV Detection. Int. J. Mach. Learn. Cybern. 2024, 15, 5781–5805.
33. Xiao, L.; Li, W.; Yao, S.; Liu, H.; Ren, D. High-Precision and Lightweight Small-Target Detection Algorithm for Low-Cost Edge Intelligence. Sci. Rep. 2024, 14, 23542.
34. Zhao, S.; Chen, J.; Ma, L. Subtle-YOLOv8: A Detection Algorithm for Tiny and Complex Targets in UAV Aerial Imagery. Signal Image Video Process. 2024, 18, 8949–8964.
35. Ouyang, D.; He, S.; Zhang, G.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient Multi-Scale Attention Module with Cross-Spatial Learning. In Proceedings of the ICASSP 2023—IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes, Greece, 4–9 June 2023; pp. 1–5.
36. Zhang, Z.; Xie, X.; Guo, Q.; Xu, J. Improved YOLOv7-Tiny for Object Detection Based on UAV Aerial Images. Electronics 2024, 13, 2969.
37. Peng, H.; Xie, H.; Liu, H.; Guan, X. LGFF-YOLO: Small Object Detection Method of UAV Images Based on Efficient Local-Global Feature Fusion. J. Real-Time Image Process. 2024, 21, 167.
38. Li, H.; Ling, L.; Li, Y.; Zhang, W. DFE-Net: Detail Feature Extraction Network for Small Object Detection. Vis. Comput. 2024, 40, 8853–8866.
39. Lin, Y.; Li, J.; Shen, S.; Wang, H.; Zhou, H. GDRS-YOLO: More Efficient Multiscale Features Fusion Object Detector for Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2024, 21, 6008505.
40. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018.
41. Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A New Backbone That Can Enhance Learning Capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 390–391.
42. Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. In Proceedings of the Computer Vision—ECCV 2024, Milan, Italy, 29 September–4 October 2024; Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G., Eds.; Springer Nature Switzerland: Cham, Switzerland, 2025; pp. 1–21.
43. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458.
44. Wang, J.; Yu, J.; He, Z. ARFP: A Novel Adaptive Recursive Feature Pyramid for Object Detection in Aerial Images. Appl. Intell. 2022, 52, 12844–12859.
45. Suo, J.; Wang, T.; Zhang, X.; Chen, H.; Zhou, W.; Shi, W. HIT-UAV: A High-Altitude Infrared Thermal Dataset for Unmanned Aerial Vehicle-Based Object Detection. Sci. Data 2023, 10, 227.
46. Zhao, X.; Xia, Y.; Zhang, W.; Zheng, C.; Zhang, Z. YOLO-ViT-Based Method for Unmanned Aerial Vehicle Infrared Vehicle Target Detection. Remote Sens. 2023, 15, 3778.
Experimental environment:

Component | Specification
---|---
CPU | 18 vCPU AMD EPYC 9754 128-Core Processor |
GPU | RTX 4090D |
PyTorch | PyTorch 1.13.1 |
CUDA | CUDA 12.1 |
Python | Python 3.8 |
Comparison with the YOLOv8n baseline on VisDrone2019:

Model | P/% | R/% | mAP@0.5/% | mAP@0.5:0.95/% | Params/M | FLOPs/G
---|---|---|---|---|---|---
YOLOv8n | 44.9 | 33.7 | 33.9 | 19.7 | 3.0 | 8.1 |
Ours | 56.8 | 43.2 | 46.1 | 28.1 | 4.0 | 27.4 |
Comparison with YOLO-series models on VisDrone2019:

Model | P/% | R/% | mAP@0.5/% | mAP@0.5:0.95/% | Params/M | FLOPs/G
---|---|---|---|---|---|---
YOLOv3 | 56.2 | 42.3 | 44.9 | 27.6 | 103.7 | 283.0 |
YOLOv5n | 43.9 | 31.1 | 32.5 | 18.7 | 1.8 | 4.2 |
YOLOv5s | 46.3 | 37.2 | 36.7 | 21.6 | 7.2 | 15.8 |
YOLOv8s | 50.9 | 38.1 | 39.3 | 23.5 | 11.2 | 28.5 |
YOLOv8m | 52.2 | 41.4 | 42.5 | 25.8 | 25.9 | 78.7 |
YOLOv8l | 56.0 | 42.9 | 44.4 | 27.3 | 43.6 | 164.9 |
YOLOv9s | 50.2 | 38.9 | 39.5 | 23.5 | 7.2 | 26.7 |
YOLOv10s | 51.4 | 38.2 | 40.0 | 24.0 | 8.1 | 24.6 |
YOLOv11s | 50.9 | 38.8 | 40.3 | 24.2 | 9.4 | 21.4 |
RLRD-YOLO | 56.8 | 43.2 | 46.1 | 28.1 | 4.0 | 27.4 |
Comparison with other improved detection algorithms on VisDrone2019:

Model | P/% | R/% | mAP@0.5/% | mAP@0.5:0.95/% | Params/M | FLOPs/G
---|---|---|---|---|---|---
ARFP+Faster-RCNN | 45.2 | 35.8 | 26.4 | 15.0 | 41.3 | 207 |
YOLO-RSFM | 53.5 | 36.4 | 35.1 | 20.2 | 7.1 | 29.0 |
PVswin-YOLOv8s | 54.5 | 41.8 | 43.3 | 24.6 | 21.6 | - |
GBS-YOLOv5 | 41.4 | 35.5 | 35.3 | 20.0 | - | - |
DDSC-YOLO | 52.9 | 40.3 | 42.2 | 25.5 | 4.99 | - |
Subtle-YOLOv8 | 54.4 | 43.4 | 45.1 | 26.6 | 11.8 | 72.9 |
Improved YOLOv7-Tiny | 51.2 | 42.1 | 41.5 | 23.6 | 6.39 | 21.0 |
LGFF-YOLO | - | - | 43.5 | 22.9 | 4.15 | 12.4 |
DFE-Net | 48.4 | 39.1 | 39.1 | 23.9 | 19.7 | 19.7 |
RLRD-YOLO | 56.8 | 43.2 | 46.1 | 28.1 | 4.0 | 27.4 |
Per-class AP@0.5 (%) on VisDrone2019 (Tri = tricycle, Awn-Tri = awning tricycle):

Model | Pedestrian | People | Bicycle | Car | Van | Truck | Tri | Awn-Tri | Bus | Motor | mAP@0.5
---|---|---|---|---|---|---|---|---|---|---|---
YOLOv5s | 39.7 | 31.6 | 11.5 | 77.3 | 41.4 | 33.0 | 24.9 | 13.5 | 53.1 | 41.8 | 36.7 |
YOLOv8n | 35.5 | 28.1 | 8.6 | 76.4 | 39.2 | 30.0 | 22.7 | 12.2 | 48.5 | 38.0 | 33.9 |
YOLOv8s | 42.5 | 32.5 | 12.6 | 79.2 | 44.7 | 36.1 | 28.8 | 17.0 | 55.5 | 44.5 | 39.3 |
YOLOv8m | 46.6 | 36.3 | 15.5 | 81.2 | 47.6 | 40.8 | 30.9 | 17.5 | 59.9 | 48.3 | 42.5 |
YOLOv9s | 43.9 | 32.5 | 12.9 | 79.8 | 45.0 | 36.5 | 28.5 | 17.0 | 55.7 | 44.3 | 39.5 |
YOLOv10s | 43.6 | 34.4 | 14.7 | 80.3 | 45.3 | 37.0 | 28.7 | 16.2 | 54.3 | 46.0 | 40.0 |
YOLO-RSFM | 36.3 | 25.4 | 12.3 | 76.1 | 34.6 | 34.1 | 23.7 | 17.3 | 52.3 | 39.2 | 35.1 |
PVswin-YOLOv8s | 45.9 | 35.7 | 16.4 | 81.5 | 49.1 | 42.4 | 32.8 | 17.7 | 62.9 | 48.2 | 43.3 |
GBS-YOLOv5 | 42.3 | 32.8 | 8.8 | 78.2 | 39.0 | 30.9 | 22.2 | 11.7 | 47.7 | 39.1 | 35.3 |
DDSC-YOLO | 48.9 | 40.3 | 15.1 | 83.1 | 47.8 | 35.0 | 27.6 | 16.3 | 59.1 | 48.6 | 42.2 |
Improved YOLOv7-Tiny | 45.3 | 41.7 | 13.1 | 83.8 | 45.3 | 38.5 | 25.6 | 20.7 | 53.3 | 47.7 | 41.5 |
LGFF-YOLO | 40.9 | 31.2 | 12.3 | 78.7 | 44.6 | 35.3 | 27.1 | 14.8 | 55.9 | 42.1 | 38.3 |
DFE-Net | 46.4 | 38.1 | 14.9 | 79.6 | 42.4 | 35.4 | 25.9 | 13.3 | 51.6 | 45.0 | 39.3 |
RLRD-YOLO | 52.0 | 42.4 | 18.8 | 84.3 | 50.9 | 40.8 | 33.7 | 18.6 | 65.8 | 53.9 | 46.1 |
Ablation results on VisDrone2019 (✓ = module included, – = not included):

RFCBAMConv | SPPF_LSKA | RepGFPN | Dyhead | mAP@0.5/% | mAP@0.5:0.95/% | Params/M | FLOPs/G
---|---|---|---|---|---|---|---
– | – | – | – | 33.9 | 19.7 | 3.0 | 8.1
✓ | – | – | – | 35.1 | 20.6 | 3.1 | 8.7
✓ | ✓ | – | – | 35.5 | 20.9 | 3.2 | 9.2
✓ | ✓ | ✓ | – | 42.3 | 25.4 | 3.2 | 19.7
✓ | ✓ | ✓ | ✓ | 46.1 | 28.1 | 4.0 | 27.4
Generalization results on the infrared UAV dataset:

Model | mAP@0.5/% | mAP@0.5:0.95/% | Params/M | FLOPs/G
---|---|---|---|---
YOLOv3-tiny | 86.2 | 53.1 | 12.1 | 19.0 |
YOLOv5n | 93.0 | 59.0 | 1.8 | 4.2 |
YOLOv5s | 93.9 | 61.0 | 7.2 | 15.8 |
YOLOv8n | 92.7 | 59.0 | 3.0 | 8.1 |
YOLOv8s | 94.6 | 61.5 | 11.2 | 28.5 |
YOLOv9s | 93.9 | 61.3 | 7.2 | 26.7 |
YOLOv10n | 93.4 | 59.6 | 2.7 | 8.4 |
YOLOv10s | 94.7 | 61.6 | 8.1 | 24.6 |
YOLOv11n | 93.4 | 60.2 | 2.6 | 6.3 |
YOLOv11s | 94.6 | 62.1 | 9.4 | 21.4 |
YOLO-ViT [46] | 94.5 | - | 17.3 | 33.1 |
RLRD-YOLO | 95.1 | 62.8 | 4.0 | 27.4 |
Embedded deployment results:

Model | mAP@0.5/% | mAP@0.5:0.95/% | Params/M | FPS
---|---|---|---|---
YOLOv8n | 33.9 | 19.7 | 3.0 | 39.2
YOLOv8s | 39.3 | 23.5 | 11.2 | 19.3
YOLOv11n | 34.4 | 19.9 | 2.6 | 40.9
RLRD-YOLO | 46.1 | 28.1 | 4.0 | 21.7
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, H.; Li, Y.; Xiao, L.; Zhang, Y.; Cao, L.; Wu, D. RLRD-YOLO: An Improved YOLOv8 Algorithm for Small Object Detection from an Unmanned Aerial Vehicle (UAV) Perspective. Drones 2025, 9, 293. https://doi.org/10.3390/drones9040293