Evaluating YOLOv4 and YOLOv5 for Enhanced Object Detection in UAV-Based Surveillance
Abstract
1. Introduction
2. Related Work
3. Methodology
- Limited Battery Life: UAVs rely on batteries for power, and running resource-intensive algorithms such as YOLO can drain the battery quickly, limiting flight time [20].
- Processing Power and Memory: Implementing YOLO on UAVs may require specialized hardware or software optimizations to achieve real-time performance [20]. Because inference speed depends on the model's computational demands, a slower model may call for more capable hardware. Three embedded platforms with different processing capabilities are listed in Table 2 for comparison.
- Payload Constraints: UAVs often have weight restrictions, and the equipment needed for YOLO deployment, such as cameras and processing units, can reduce flight time and overall performance, especially on small UAVs [21].
- Data Transmission: Transmitting high-resolution images or video streams from the UAV to a ground station for YOLO processing can strain the communication bandwidth and introduce latency, which is especially challenging for real-time applications [22].
- Real-Time Processing: Achieving real-time YOLO inference on UAVs is difficult with limited computational resources, which can reduce the system's responsiveness in dynamic environments.
- Object Size and Distance: YOLO may struggle to detect small or distant objects, a significant limitation in UAV applications such as search and rescue or wildlife monitoring.
- Cost: Deploying YOLO on UAVs may require investment in hardware, software, and training, increasing the overall cost of UAV operations. As Table 2 shows, cost rises with hardware performance.
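The bandwidth and battery constraints above lend themselves to quick back-of-envelope checks. The sketch below is illustrative only; the resolution, compression ratio, battery capacity, and power figures are assumptions, not measurements from this study.

```python
# Back-of-envelope feasibility checks for two of the constraints above:
# downlink bandwidth for streaming frames, and battery endurance under
# an added compute load. All figures are illustrative assumptions.

def downlink_mbps(width, height, fps, bits_per_pixel=24, compression_ratio=50):
    """Approximate compressed video bitrate in Mbit/s."""
    raw_bps = width * height * bits_per_pixel * fps
    return raw_bps / compression_ratio / 1e6

def flight_minutes(battery_wh, baseline_w, compute_w):
    """Endurance when an onboard accelerator adds a constant compute load."""
    return battery_wh / (baseline_w + compute_w) * 60

# 1080p at 30 fps with an assumed 50:1 H.264-style compression ratio
print(f"downlink: {downlink_mbps(1920, 1080, 30):.1f} Mbit/s")
# assumed 90 Wh battery, 180 W hover power, 30 W Jetson-class accelerator
print(f"endurance: {flight_minutes(90, 180, 30):.1f} min")
```

Even under generous compression, a live 1080p stream consumes tens of Mbit/s, which motivates onboard inference rather than ground-station processing for latency-critical missions.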
| | AGX Orin 64 GB | AGX Xavier 64 GB | Jetson Nano |
|---|---|---|---|
| AI performance | 275 TOPS | 32 TOPS | 472 GFLOPS |
| GPU | 2048-core | 512-core | 128-core |
| Memory speed | 204.8 GB/s | 136.5 GB/s | 25.6 GB/s |
| Power | 15 W–60 W | 10 W–30 W | 5 W–10 W |
| Cost | $1800 | $1400 | $500 |
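The cost–performance trade-off in Table 2 can be made explicit by computing efficiency ratios. Note that the Jetson Nano figure is quoted in GFLOPS rather than TOPS, so treating it as 0.472 "TOPS" is an assumption for rough comparison across precisions, not an exact equivalence.

```python
# Efficiency ratios derived from Table 2. The Jetson Nano figure is in
# GFLOPS, not INT8 TOPS, so its row is indicative rather than directly
# comparable to the other two boards.

boards = {
    # name: (peak perf, max power in W, cost in USD)
    "AGX Orin 64GB":   (275,   60, 1800),
    "AGX Xavier 64GB": (32,    30, 1400),
    "Jetson Nano":     (0.472, 10, 500),   # 472 GFLOPS treated as 0.472 TOPS (assumed)
}

for name, (tops, watts, usd) in boards.items():
    print(f"{name:16s} {tops / watts:7.3f} TOPS/W {tops / usd * 1000:7.2f} TOPS per $1000")
```

The ratios show why the choice is not linear: the Orin costs under 30% more than the Xavier but delivers roughly eight times the peak throughput per watt and per dollar.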
4. Experiment
4.1. MS-COCO Evaluation
- Surveillance and Monitoring: High precision is vital in applications such as military surveillance or border monitoring, where false positives can trigger unnecessary alerts or actions. For example, intruder or vehicle detection requires high AP to avoid misidentifying benign objects as threats. Recall also matters, ensuring no critical objects (e.g., intruders) are missed, while a balanced AP ensures consistent performance.
- Search and Rescue: In disaster scenarios, UAVs often scan large areas to locate missing persons or critical items. High AP ensures that the system can accurately detect objects such as humans or vehicles, minimizing false positives that could divert rescue efforts. Precision and recall trade-offs must be optimized for speed and resource efficiency.
- Inspection and Maintenance: In industrial UAV applications, such as inspecting wind turbines or power lines, high AP ensures the accurate detection of defects or anomalies, reducing the risk of missing critical issues or flagging false positives that increase operational costs.
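The AP metric referred to throughout these scenarios is computed from a confidence-ranked list of detections: each detection is marked as a true or false positive against the ground truth at a given IoU threshold, and AP is the area under the interpolated precision–recall curve. A minimal sketch (the example detections are hypothetical):

```python
# Minimal average-precision computation from a ranked detection list.

def average_precision(is_tp, num_gt):
    """is_tp: booleans for detections already sorted by confidence
    (descending); num_gt: number of ground-truth objects."""
    tp = fp = 0
    precisions, recalls = [], []
    for hit in is_tp:
        tp += hit
        fp += not hit
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # interpolation: precision at recall r is the max precision at recall >= r
    ap, prev_recall = 0.0, 0.0
    for i in range(len(precisions)):
        ap += (recalls[i] - prev_recall) * max(precisions[i:])
        prev_recall = recalls[i]
    return ap

# five detections scored against three ground-truth objects (hypothetical)
print(round(average_precision([True, False, True, True, False], 3), 3))  # 0.833
```

The official COCO evaluation additionally averages AP over ten IoU thresholds (0.50 to 0.95 in steps of 0.05) and over object-size categories, which is what the AP columns in the tables below report.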
4.2. UAV-Captured Real Images Evaluation
5. Results and Discussion
5.1. Detection Accuracy
5.2. Speed and Efficiency
5.3. Confidence Values
5.4. Suitability for UAV Applications
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Kuznetsova, A.; Maleva, T.; Soloviev, V. YOLOv5 versus YOLOv3 for Apple Detection; Springer: Berlin/Heidelberg, Germany, 2021; pp. 349–358.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. Available online: http://pjreddie.com/yolo/ (accessed on 19 September 2023).
- Jiang, P.; Ergu, D.; Liu, F.; Cai, Y.; Ma, B. A Review of YOLO Algorithm Developments. In Procedia Computer Science; Elsevier B.V.: Amsterdam, The Netherlands, 2021; pp. 1066–1073.
- Nepal, U.; Eslamiat, H. Comparing YOLOv3, YOLOv4 and YOLOv5 for Autonomous Landing Spot Detection in Faulty UAVs. Sensors 2022, 22, 464.
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
- YOLOv5 v1.0 Commits · Ultralytics/yolov5 · GitHub. Available online: https://github.com/ultralytics/yolov5/commits/v1.0 (accessed on 18 September 2023).
- Zhang, P.; Li, D. EPSA-YOLO-V5s: A novel method for detecting the survival rate of rapeseed in a plant factory based on multiple guarantee mechanisms. Comput. Electron. Agric. 2022, 193, 106714.
- Kırac, E.; Özbek, S. Deep Learning Based Object Detection with Unmanned Aerial Vehicle Equipped with Embedded System. J. Aviat. 2024, 8, 15–25.
- Kim, J.; Cho, J. RGDiNet: Efficient Onboard Object Detection with Faster R-CNN for Air-to-Ground Surveillance. Sensors 2021, 21, 1677.
- Tang, G.; Ni, J.; Zhao, Y.; Gu, Y.; Cao, W. A Survey of Object Detection for UAVs Based on Deep Learning. Remote Sens. 2024, 16, 149.
- Lin, T.-Y.; Maire, M.; Belongie, S.; Bourdev, L.; Girshick, R.; Hays, J.; Perona, P.; Ramanan, D.; Zitnick, C.L.; Dollár, P. Microsoft COCO: Common Objects in Context. arXiv 2014, arXiv:1405.0312.
- YOLOv4 GitHub—AlexeyAB/darknet at 3d4242a6e534fe44afd3c0bf0de92e0f4e9ce23f. Available online: https://github.com/AlexeyAB/darknet/tree/3d4242a6e534fe44afd3c0bf0de92e0f4e9ce23f (accessed on 19 September 2023).
- Carrio, A.; Sampedro, C.; Rodriguez-Ramos, A.; Campoy, P. A review of deep learning methods and applications for unmanned aerial vehicles. J. Sens. 2017, 2017, 3296874.
- Salvini, P. Urban robotics: Towards responsible innovations for our cities. Robot. Auton. Syst. 2018, 100, 278–286.
- Schedl, D.C.; Kurmi, I.; Bimber, O. Search and Rescue with Airborne Optical Sectioning. Nat. Mach. Intell. 2020, 2, 783–790.
- Xu, R.; Lin, H.; Lu, K.; Cao, L.; Liu, Y. A forest fire detection system based on ensemble learning. Forests 2021, 12, 217.
- Park, S.E.; Eem, S.H.; Jeon, H. Concrete crack detection and quantification using deep learning and structured light. Constr. Build. Mater. 2020, 252, 119096.
- Tan, L.; Huangfu, T.; Wu, L.; Chen, W. Comparison of RetinaNet, SSD, and YOLO v3 for real-time pill identification. BMC Med. Inform. Decis. Mak. 2021, 21, 324.
- Li, S.; Ozo, M.M.O.I.; Wagter, C.D.; de Croon, G.C.H.E. Autonomous drone race: A computationally efficient vision-based navigation and control strategy. Robot. Auton. Syst. 2020, 133, 103621.
- Plastiras, G.; Kyrkou, C.; Theocharides, T. EdgeNet—Balancing accuracy and performance for edge-based convolutional neural network object detectors. In ACM International Conference Proceeding Series; Association for Computing Machinery: New York, NY, USA, 2019.
- Bayer, R.; Priest, J.; Tözün, P. Reaching the Edge of the Edge: Image Analysis in Space. arXiv 2024, arXiv:2301.04954.
- Yang, Q.; Yang, J.H. HD video transmission of multi-rotor Unmanned Aerial Vehicle based on 5G cellular communication network. Comput. Commun. 2020, 160, 688–696.
- Jetson Modules, Support, Ecosystem, and Lineup | NVIDIA Developer. Available online: https://developer.nvidia.com/embedded/jetson-modules (accessed on 19 October 2023).
- Liu, L.; Liu, Y.; Gao, X.-Z.; Zhang, X. An Immersive Human-Robot Interactive Game Framework Based on Deep Learning for Children's Concentration Training. Healthcare 2022, 10, 1779.
- COCO—Common Objects in Context Metrics. Available online: https://cocodataset.org/#detection-eval (accessed on 7 November 2023).
| YOLO Version | Release Date | Backbone Architecture | Key Features |
|---|---|---|---|
| YOLOv1 | June 2016 | Custom 24-layer CNN | Custom network designed for efficiency |
| YOLOv2 | December 2016 | Darknet-19 | Batch normalization for faster training and improved performance |
| YOLOv3 | April 2018 | Darknet-53 | Deeper architecture for improved feature extraction |
| YOLOv4 | April 2020 | CSPDarknet53 | Improved efficiency and accuracy through feature reuse |
| YOLOv5 | June 2020 | Modified CSPDarknet53 | Scalable architecture with multiple model sizes |
| Metric | YOLOv4 | YOLOv5 |
|---|---|---|
| AP, IoU = 0.50:0.95, area = all, maxDets = 100 | 0.456 | 0.506 |
| AP, IoU = 0.50, area = all, maxDets = 100 | 0.715 | 0.750 |
| AP, IoU = 0.75, area = all, maxDets = 100 | 0.492 | 0.549 |
| AP, IoU = 0.50:0.95, area = small, maxDets = 100 | 0.358 | 0.392 |
| AP, IoU = 0.50:0.95, area = medium, maxDets = 100 | 0.603 | 0.675 |
| AP, IoU = 0.50:0.95, area = large, maxDets = 100 | 0.601 | 0.702 |
| AR, IoU = 0.50:0.95, area = all, maxDets = 1 | 0.193 | 0.211 |
| AR, IoU = 0.50:0.95, area = all, maxDets = 10 | 0.545 | 0.590 |
| AR, IoU = 0.50:0.95, area = all, maxDets = 100 | 0.603 | 0.643 |
| AR, IoU = 0.50:0.95, area = small, maxDets = 100 | 0.508 | 0.534 |
| AR, IoU = 0.50:0.95, area = medium, maxDets = 100 | 0.736 | 0.793 |
| AR, IoU = 0.50:0.95, area = large, maxDets = 100 | 0.781 | 0.856 |
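Every row above is parameterized by an IoU threshold: a detection counts as a true positive only if its box overlaps a ground-truth box by at least that fraction. For axis-aligned boxes in `(x1, y1, x2, y2)` form, IoU is intersection area over union area:

```python
# Intersection-over-union for two axis-aligned boxes (x1, y1, x2, y2).

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # zero if boxes are disjoint
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# two 10x10 boxes offset by 5 in x: intersection 50, union 150, IoU ~ 0.333
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

This makes the strictness ordering of the rows intuitive: a detection that passes at IoU 0.50 may fail at 0.75, which is why AP at 0.75 is consistently lower than AP at 0.50 in the tables.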
| Metric | YOLOv4 | YOLOv5 |
|---|---|---|
| AP, IoU = 0.50:0.95, area = all, maxDets = 100 | 0.691 | 0.762 |
| AP, IoU = 0.50, area = all, maxDets = 100 | 0.877 | 0.891 |
| AP, IoU = 0.75, area = all, maxDets = 100 | 0.785 | 0.834 |
| AP, IoU = 0.50:0.95, area = small, maxDets = 100 | 0.250 | 0.315 |
| AP, IoU = 0.50:0.95, area = medium, maxDets = 100 | 0.588 | 0.658 |
| AP, IoU = 0.50:0.95, area = large, maxDets = 100 | 0.803 | 0.876 |
| AR, IoU = 0.50:0.95, area = all, maxDets = 1 | 0.501 | 0.547 |
| AR, IoU = 0.50:0.95, area = all, maxDets = 10 | 0.762 | 0.830 |
| AR, IoU = 0.50:0.95, area = all, maxDets = 100 | 0.768 | 0.834 |
| AR, IoU = 0.50:0.95, area = small, maxDets = 100 | 0.431 | 0.516 |
| AR, IoU = 0.50:0.95, area = medium, maxDets = 100 | 0.710 | 0.771 |
| AR, IoU = 0.50:0.95, area = large, maxDets = 100 | 0.857 | 0.923 |
| Metric | YOLOv4 | YOLOv5 |
|---|---|---|
| AP, IoU = 0.50:0.95, area = all, maxDets = 100 | 0.548 | 0.614 |
| AP, IoU = 0.50, area = all, maxDets = 100 | 0.827 | 0.846 |
| AP, IoU = 0.75, area = all, maxDets = 100 | 0.606 | 0.669 |
| AP, IoU = 0.50:0.95, area = small, maxDets = 100 | 0.375 | 0.413 |
| AP, IoU = 0.50:0.95, area = medium, maxDets = 100 | 0.621 | 0.696 |
| AP, IoU = 0.50:0.95, area = large, maxDets = 100 | 0.719 | 0.824 |
| AR, IoU = 0.50:0.95, area = all, maxDets = 1 | 0.189 | 0.209 |
| AR, IoU = 0.50:0.95, area = all, maxDets = 10 | 0.554 | 0.614 |
| AR, IoU = 0.50:0.95, area = all, maxDets = 100 | 0.645 | 0.701 |
| AR, IoU = 0.50:0.95, area = small, maxDets = 100 | 0.497 | 0.530 |
| AR, IoU = 0.50:0.95, area = medium, maxDets = 100 | 0.711 | 0.776 |
| AR, IoU = 0.50:0.95, area = large, maxDets = 100 | 0.807 | 0.889 |
| | Person | Bus | Car |
|---|---|---|---|
| YOLOv4 | 0.62 ± 0.14 | 0.67 ± 0.21 | 0.54 ± 0.12 |
| YOLOv5 | 0.68 ± 0.14 | 0.72 ± 0.20 | 0.60 ± 0.12 |
| | Person | Bus | Car |
|---|---|---|---|
| YOLOv4 | 0.57 ± 0.20 | 0.67 ± 0.15 | 0.56 ± 0.19 |
| YOLOv5 | 0.62 ± 0.22 | 0.74 ± 0.15 | 0.60 ± 0.21 |
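Confidence tables like the two above summarize per-detection scores as mean ± standard deviation for each class. Given raw scores, the same summary can be reproduced with the standard library; the example scores are hypothetical, and this sketch assumes a sample (n − 1) standard deviation, whereas the paper's convention is not stated.

```python
# Mean +/- standard deviation summary of detection confidence scores.
from statistics import mean, stdev

def summarize(scores):
    """Format a list of confidence scores as 'mean ± std' (sample stdev)."""
    return f"{mean(scores):.2f} \u00b1 {stdev(scores):.2f}"

# hypothetical confidence scores for one class on one evaluation set
print(summarize([0.55, 0.61, 0.72, 0.48, 0.66]))
```

The standard deviation column is worth reading alongside the means: two models with similar average confidence can differ substantially in how consistently they score detections.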
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Alhassan, M.A.M.; Yılmaz, E. Evaluating YOLOv4 and YOLOv5 for Enhanced Object Detection in UAV-Based Surveillance. Processes 2025, 13, 254. https://doi.org/10.3390/pr13010254