A Lightweight Network for UAV Multi-Scale Feature Fusion-Based Object Detection
Abstract
1. Introduction
- The integration of the MSCA [19] module within the backbone network improves multi-scale feature fusion, efficiently combining features from various scales to boost detection accuracy.
- A 160 × 160 small-target detection head was introduced in place of the 20 × 20 large-target detection head, specifically to enhance the detection of small targets.
- The MSFB module is constructed in the neck network to fuse shallow, mid-level, and deep features, thereby enhancing the network’s capacity to capture more complex features.
- To optimize anchor boxes, the Focal-EIoU loss function [20] was utilized, which minimizes the influence of poor-quality anchor boxes and improves the regression accuracy by emphasizing high-quality boxes.
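The last contribution can be made concrete. Focal-EIoU scales the EIoU penalty (IoU term plus center-distance, width, and height penalties) by an IoU-based focal factor, so well-overlapping, high-quality boxes dominate the regression gradient. Below is a minimal pure-Python sketch; the (x1, y1, x2, y2) box format and γ = 0.5 are illustrative assumptions, not the authors' exact implementation.

```python
def eiou_terms(box_a, box_b):
    """IoU plus the EIoU penalty terms for two (x1, y1, x2, y2) boxes."""
    # Intersection area
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    wa, ha = box_a[2] - box_a[0], box_a[3] - box_a[1]
    wb, hb = box_b[2] - box_b[0], box_b[3] - box_b[1]
    union = wa * ha + wb * hb - inter
    iou = inter / union if union > 0 else 0.0

    # Smallest enclosing box: its diagonal normalizes the distance penalty,
    # its width/height normalize the shape penalties
    cw = max(box_a[2], box_b[2]) - min(box_a[0], box_b[0])
    ch = max(box_a[3], box_b[3]) - min(box_a[1], box_b[1])
    c2 = cw ** 2 + ch ** 2

    # Squared distance between box centers
    d2 = (((box_a[0] + box_a[2]) - (box_b[0] + box_b[2])) / 2) ** 2 + \
         (((box_a[1] + box_a[3]) - (box_b[1] + box_b[3])) / 2) ** 2

    return iou, d2 / c2, (wa - wb) ** 2 / cw ** 2, (ha - hb) ** 2 / ch ** 2


def focal_eiou_loss(pred, target, gamma=0.5):
    """Focal-EIoU: the EIoU loss reweighted by IoU**gamma so that
    high-quality (high-IoU) boxes contribute more to the gradient."""
    iou, dist_pen, w_pen, h_pen = eiou_terms(pred, target)
    eiou = (1.0 - iou) + dist_pen + w_pen + h_pen
    return (iou ** gamma) * eiou
```

One consequence of the IoU**gamma factor in this simplified form is that boxes with zero overlap receive zero weight; the full formulation in [20] handles the reweighting over the whole set of anchors.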
2. Related Work
2.1. YOLO Networks
2.2. UAV Object Detection
3. Methods
3.1. Overall Architecture
3.2. C2f_SEPConv Module
3.3. Detection Head Adjustment
3.4. MSFB Module
3.5. Loss Function
3.6. MSCA
4. Experiments
4.1. Datasets
4.2. Implementation Details and Evaluation Metrics
4.3. Experimental Results and Analysis
4.4. Ablation Study
4.5. Visualization
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
UAV | Unmanned Aerial Vehicle |
PConv | Partial Convolution |
SE | Squeeze-and-Excitation |
MSCA | Multi-Scale Cross-Axis Attention |
MSFB | Multi-Scale Fusion Block |
LMSF-YOLOv8s | Lightweight Multi-Scale Fusion-YOLOv8s |
GFLOPs | Giga Floating-Point Operations |
mAP | Mean Average Precision |
P | Precision |
R | Recall |
IOU | Intersection Over Union |
FPS | Frames Per Second |
FPN | Feature Pyramid Network |
PAN | Path Aggregation Network |
CBAM | Convolutional Block Attention Module |
FC | Fully Connected |
GAP | Global Average Pooling |
DWConv | Depth-Wise Convolution |
References
- Citroni, R.; Di Paolo, F.; Livreri, P. A novel energy harvester for powering small UAVs: Performance analysis, model validation and flight results. Sensors 2019, 19, 1771.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
- Wang, J.; Chen, Y.; Dong, Z.; Gao, M. Improved YOLOv5 network for real-time multi-scale traffic sign detection. Neural Comput. Appl. 2022, 35, 7853–7865.
- Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696.
- Wang, C.Y.; Yeh, I.H.; Mark Liao, H.Y. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2025.
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Chen, J.; Kao, S.H.; He, H.; Zhuo, W.; Wen, S.; Lee, C.H.; Chan, S.H.G. Run, don’t walk: Chasing higher FLOPS for faster neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 12021–12031.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
- Shao, H.; Zeng, Q.; Hou, Q.; Yang, J. MCANet: Medical image segmentation with multi-scale cross-axis attention. arXiv 2023, arXiv:2312.08866.
- Zhang, Y.F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 2022, 506, 146–157.
- Du, D.; Zhu, P.; Wen, L.; Bian, X.; Lin, H.; Hu, Q.; Peng, T.; Zheng, J.; Wang, X.; Zhang, Y.; et al. VisDrone-DET2019: The vision meets drone object detection in image challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019; pp. 213–226.
- Li, Y.; Wang, J.; Zhang, K.; Yi, J.; Wei, M.; Zheng, L.; Xie, W. Lightweight object detection networks for UAV aerial images based on YOLO. Chin. J. Electron. 2024, 33, 997–1009.
- Zhang, P.; Deng, H.; Chen, Z. RT-YOLO: A residual feature fusion triple attention network for aerial image target detection. Comput. Mater. Contin. 2023, 75, 1411–1430.
- Wang, J.; Zhang, F.; Zhang, Y.; Liu, Y.; Cheng, T. Lightweight object detection algorithm for UAV aerial imagery. Sensors 2023, 23, 5786.
- Cao, J.; Bao, W.; Shang, H.; Yuan, M.; Cheng, Q. GCL-YOLO: A GhostConv-based lightweight YOLO network for UAV small object detection. Remote Sens. 2023, 15, 4932.
- Sui, J.; Chen, D.; Zheng, X.; Wang, H. A new algorithm for small target detection from the perspective of unmanned aerial vehicles. IEEE Access 2024, 12, 29690–29697.
- Xiao, Y.; Di, N. SOD-YOLO: A lightweight small object detection framework. Sci. Rep. 2024, 14, 25624.
- Yang, R.; Zhang, J.; Shang, X.; Li, W. Lightweight small target detection algorithm with multi-feature fusion. Electronics 2023, 12, 2739.
- Xu, L.; Zhao, Y.; Zhai, Y.; Huang, L.; Ruan, C. Small object detection in UAV images based on YOLOv8n. Int. J. Comput. Intell. Syst. 2024, 17, 223.
- Mei, J.; Zhu, W. BGF-YOLOv10: Small object detection algorithm from unmanned aerial vehicle perspective based on improved YOLOv10. Sensors 2024, 24, 6911.
- Wang, Y.; Zou, H.; Yin, M.; Zhang, X. SMFF-YOLO: A scale-adaptive YOLO algorithm with multi-level feature fusion for object detection in UAV scenes. Remote Sens. 2023, 15, 4580.
- Wang, X.; He, N.; Hong, C.; Sun, F.; Han, W.; Wang, Q. YOLO-ERF: Lightweight object detector for UAV aerial images. Multimed. Syst. 2023, 29, 3329–3339.
- Tahir, N.U.A.; Long, Z.; Zhang, Z.; Asim, M.; ELAffendi, M. PVswin-YOLOv8s: UAV-based pedestrian and vehicle detection for traffic management in smart cities using improved YOLOv8. Drones 2024, 8, 84.
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 1580–1589.
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324.
- Liu, W.; Lu, H.; Fu, H.; Cao, Z. Learning to upsample by learning to sample. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 6027–6037.
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000.
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3974–3983.
- Wang, Z.; Su, Y.; Kang, F.; Wang, L.; Lin, Y.; Wu, Q.; Li, H.; Cai, Z. PC-YOLO11s: A lightweight and effective feature extraction method for small target image detection. Sensors 2025, 25, 348.
- Liu, Y.; He, M.; Hui, B. ESO-DETR: An Improved Real-Time Detection Transformer Model for Enhanced Small Object Detection in UAV Imagery. Drones 2025, 9, 143.
- Zhu, G.; Zhu, F.; Wang, Z.; Yang, S.; Li, Z. EDANet: Efficient Dynamic Alignment of Small Target Detection Algorithm. Electronics 2025, 14, 242.
| Method | mAP@0.5/% | Params/M | GFLOPs |
|---|---|---|---|
| YOLOv8s | 38.0 | 11.1 | 28.5 |
| +GhostNet | 36.9 | 9.5 | 24.9 |
| +MobileNetV3 | 36.4 | 10.4 | 20.8 |
| +C2f_SEPConv | 38.2 | 9.9 | 25.6 |
| Combination | mAP@0.5/% | mAP@0.5:0.95/% | Params/M | GFLOPs |
|---|---|---|---|---|
| P3 + P4 + P5 | 38.0 | 22.9 | 11.1 | 28.5 |
| P2 + P3 + P4 + P5 | 40.9 | 24.4 | 10.6 | 37.0 |
| P2 + P3 + P4 | 41.3 | 25.1 | 7.4 | 34.1 |
| Loss Function | mAP@0.5/% | mAP@0.5:0.95/% | P/% | R/% |
|---|---|---|---|---|
| CIoU [37] | 38.0 | 22.9 | 45.9 | 36.9 |
| DIoU [37] | 38.2 | 22.8 | 49.4 | 37.7 |
| EIoU [20] | 37.8 | 22.9 | 49.6 | 37.4 |
| Focal-EIoU [20] | 38.6 | 23.1 | 49.6 | 38.1 |
| Method | mAP@0.5/% | mAP@0.5:0.95/% | Params/M | GFLOPs | FPS |
|---|---|---|---|---|---|
| Faster R-CNN [15] | 34.7 | 20.6 | 43.15 | 199.2 | — |
| SSD [2] | 21.2 | 13.1 | 25.96 | 84.9 | — |
| YOLOv5s | 35.4 | 20.5 | 7.1 | 16.5 | — |
| YOLOv5l | 41.1 | 24.4 | 47.1 | 109.3 | — |
| YOLOv8n | 32.5 | 19.1 | 3.01 | 8.1 | 183.2 |
| YOLOv8s | 38.0 | 22.9 | 11.12 | 28.5 | 147.8 |
| YOLOv8m | 42.1 | 25.8 | 25.84 | 78.7 | 109.4 |
| BGF-YOLOv10 [30] | 39.5 | — | 2.0 | 8.6 | 37 |
| SOD-YOLO [29] | 37.6 | — | 3.0 | 12.5 | — |
| BDH-YOLO [26] | 42.9 | 26.2 | 9.39 | — | — |
| PVswin-YOLOv8 [33] | 43.3 | 26.4 | 21.6 | — | 161.2 |
| PC-YOLO11s [39] | 43.8 | 26.3 | 7.1 | — | — |
| ESO-DETR [40] | 41.0 | 24.0 | 14.9 | 66 | 120 |
| EDANet [41] | 39.1 | 23.1 | 6.16 | 5.4 | — |
| Ours | 44.4 | 26.9 | 7.97 | 38.3 | 117.6 |
Per-class results of the YOLOv8s baseline:

| Class | mAP@0.5/% | mAP@0.5:0.95/% | P/% | R/% |
|---|---|---|---|---|
| all | 38.0 | 22.9 | 49.5 | 36.9 |
| pedestrian | 40.0 | 18.1 | 52.3 | 36.9 |
| people | 31.3 | 11.8 | 54.5 | 25.9 |
| bicycle | 11.3 | 5.0 | 25.9 | 15.3 |
| car | 78.8 | 56.3 | 72.9 | 76.3 |
| van | 44.0 | 31.0 | 49.4 | 44.1 |
| truck | 36.3 | 24.2 | 52.4 | 34.8 |
| tricycle | 26.5 | 14.7 | 39.0 | 27.3 |
| awning-tricycle | 15.6 | 9.8 | 29.3 | 17.5 |
| bus | 53.7 | 39.4 | 67.6 | 49.0 |
| motor | 42.4 | 18.9 | 51.5 | 42.4 |
Per-class results of LMSF-YOLOv8s (ours):

| Class | mAP@0.5/% | mAP@0.5:0.95/% | P/% | R/% |
|---|---|---|---|---|
| all | 44.4 | 26.9 | 53.9 | 42.5 |
| pedestrian | 51.8 | 24.9 | 58.8 | 47.3 |
| people | 41.6 | 17.5 | 58.9 | 37.3 |
| bicycle | 16.8 | 7.3 | 29.9 | 19.9 |
| car | 84.0 | 60.9 | 74.5 | 81.9 |
| van | 49.1 | 35.5 | 54.7 | 47.2 |
| truck | 38.5 | 25.7 | 52.2 | 36.9 |
| tricycle | 32.2 | 18.4 | 46.9 | 31.0 |
| awning-tricycle | 17.9 | 11.3 | 37.6 | 19.9 |
| bus | 60.2 | 43.8 | 70.0 | 52.2 |
| motor | 51.4 | 24.1 | 55.1 | 51.2 |
| Method | mAP@0.5/% | mAP@0.5:0.95/% | P/% | R/% | Params/M | GFLOPs |
|---|---|---|---|---|---|---|
| YOLOv8s | 38.0 | 22.9 | 49.5 | 36.9 | 11.1 | 28.5 |
| +A | 38.6 | 23.1 | 49.6 | 38.1 | 11.1 | 28.5 |
| +B | 39.2 | 23.2 | 49.6 | 38.7 | 11.4 | 29.3 |
| +C | 38.2 | 22.9 | 50.5 | 37.3 | 9.9 | 25.6 |
| +D | 41.3 | 25.1 | 51.4 | 40.1 | 7.4 | 34.1 |
| +E | 40.2 | 24.4 | 50.7 | 39.6 | 12.38 | 35.2 |
| +A + B | 39.7 | 23.4 | 49.6 | 39.4 | 11.4 | 29.3 |
| +A + B + C | 40.0 | 23.5 | 50.8 | 39.7 | 10.2 | 26.4 |
| +A + B + C + D | 42.6 | 26.1 | 52.9 | 41.8 | 6.5 | 32.1 |
| +A + B + C + D + E | 44.3 | 26.9 | 53.9 | 42.5 | 7.9 | 38.3 |
| Method | mAP@0.5/% | mAP@0.5:0.95/% | P/% | R/% |
|---|---|---|---|---|
| YOLOv8s | 54.5 | 36.5 | 77.8 | 49.1 |
| LMSF-YOLOv8s (Ours) | 56.6 | 37.8 | 79.4 | 50.9 |
Deng, S.; Wan, Y. A Lightweight Network for UAV Multi-Scale Feature Fusion-Based Object Detection. Information 2025, 16, 250. https://doi.org/10.3390/info16030250