MCF-YOLOv5: A Small Target Detection Algorithm Based on Multi-Scale Feature Fusion Improved YOLOv5
Abstract
1. Introduction
2. Related Work
2.1. Target Detection
2.2. Small Target Detection
3. Methods
3.1. Data Augmentation
3.2. Attention Module
3.3. Construction of a Bidirectional Feature Fusion Network (FPN)
4. Experiments
4.1. Datasets
4.2. Training Setup
4.3. Evaluation Indicators and Model Validity
4.4. Ablation Study of the Proposed Model
4.5. Comparison with Other Detection Models
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Tong, K.; Wu, Y.; Zhou, F. Recent advances in small object detection based on deep learning: A review. Image Vis. Comput. 2020, 97, 103910.
- Zhao, Z.Q.; Zheng, P.; Xu, S.T.; Wu, X. Object detection with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3212–3232.
- Zaidi, S.S.A.; Ansari, M.S.; Aslam, A.; Kanwal, N.; Asghar, M.; Lee, B. A survey of modern deep learning based object detection models. Digit. Signal Process. 2022, 126, 103514.
- Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond empirical risk minimization. arXiv 2017, arXiv:1710.09412.
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
- Zhang, J.; Meng, Y.; Chen, Z. A small target detection method based on deep learning with considerate feature and effectively expanded sample size. IEEE Access 2021, 9, 96559–96572.
- Kisantal, M.; Wojna, Z.; Murawski, J.; Naruniec, J.; Cho, K. Augmentation for small object detection. arXiv 2019, arXiv:1902.07296.
- Chen, C.; Liu, M.Y.; Tuzel, O.; Xiao, J. R-CNN for small object detection. In Proceedings of the Computer Vision–ACCV 2016: 13th Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016; pp. 214–230.
- Romano, Y.; Isidoro, J.; Milanfar, P. RAISR: Rapid and accurate image super resolution. IEEE Trans. Comput. Imaging 2016, 3, 110–125.
- Zhang, Y.; Bai, Y.; Ding, M.; Ghanem, B. Multi-task generative adversarial network for detecting small objects in the wild. Int. J. Comput. Vis. 2020, 128, 1810–1828.
- Li, J.; Liang, X.; Wei, Y.; Xu, T.; Feng, J.; Yan, S. Perceptual Generative Adversarial Networks for Small Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1951–1959.
- Pang, Y.; Cao, J.; Wang, J.; Han, J. JCS-Net: Joint classification and super-resolution network for small-scale pedestrian detection in surveillance images. IEEE Trans. Inf. Forensics Secur. 2019, 14, 3322–3331.
- Olorunshola, O.E.; Irhebhude, M.E.; Evwiekpaefe, A.E. A comparative study of YOLOv5 and YOLOv7 object detection algorithms. J. Comput. Soc. Inform. 2023, 2, 1–12.
- Wang, M.; Yang, W.; Wang, L.; Chen, D.; Wei, F.; KeZiErBieKe, H.; Liao, Y. FE-YOLOv5: Feature enhancement network based on YOLOv5 for small object detection. J. Vis. Commun. Image Represent. 2023, 90, 103752.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the Computer Vision–ECCV 2018: 15th European Conference, Munich, Germany, 8–14 September 2018; pp. 3–19.
- Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 13708–13717.
- Cao, Y.; He, Z.; Wang, L.; Wang, W.; Yuan, Y.; Zhang, D.; Zhang, J.; Zhu, P.; Van Gool, L.; Han, J.; et al. VisDrone-DET2021: The Vision Meets Drone Object Detection Challenge Results. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, BC, Canada, 11–17 October 2021; pp. 2847–2854.
- Zhu, Z.; Liang, D.; Zhang, S.; Huang, X.; Li, B.; Hu, S. Traffic-Sign Detection and Classification in the Wild. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2110–2118.
- Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-Captured Scenarios. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, BC, Canada, 11–17 October 2021; pp. 2778–2788.

Model | mAP | mAP50 | mAP75 | APsmall | APmedium | APlarge |
---|---|---|---|---|---|---|
YOLOv5s | 18.4 | 33.1 | 15.6 | 10.5 | 26.5 | 36.1 |
MCF-YOLOv5 | 22.1 | 38.3 | 21.5 | 13.8 | 32.1 | 40.1 |

Model | mAP | mAP50 | mAP75 | APsmall | APmedium | APlarge |
---|---|---|---|---|---|---|
YOLOv5s | 62.5 | 79.4 | 60.4 | 48.2 | 69.7 | 76.5 |
MCF-YOLOv5 | 63.8 | 85.1 | 62.5 | 51.8 | 72.2 | 79.8 |
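
The two tables above list raw scores only, so the size of each improvement has to be worked out by hand. The following is a minimal sketch of that arithmetic in Python. The numbers are copied from the two tables; the names `first_table`, `second_table`, and `gains` are hypothetical helpers introduced for illustration and are not part of the paper.

```python
# Metric values (percent) copied from the two tables above.
METRICS = ["mAP", "mAP50", "mAP75", "APsmall", "APmedium", "APlarge"]

first_table = {
    "YOLOv5s":    [18.4, 33.1, 15.6, 10.5, 26.5, 36.1],
    "MCF-YOLOv5": [22.1, 38.3, 21.5, 13.8, 32.1, 40.1],
}
second_table = {
    "YOLOv5s":    [62.5, 79.4, 60.4, 48.2, 69.7, 76.5],
    "MCF-YOLOv5": [63.8, 85.1, 62.5, 51.8, 72.2, 79.8],
}


def gains(table, baseline="YOLOv5s", improved="MCF-YOLOv5"):
    """Return {metric: (absolute gain in points, relative gain in percent)}."""
    out = {}
    for name, base, new in zip(METRICS, table[baseline], table[improved]):
        absolute = round(new - base, 1)
        relative = round(100 * (new - base) / base, 1)
        out[name] = (absolute, relative)
    return out


if __name__ == "__main__":
    for label, table in [("first table", first_table), ("second table", second_table)]:
        print(label)
        for metric, (absolute, relative) in gains(table).items():
            print(f"  {metric:9s} {absolute:+.1f} points ({relative:+.1f}%)")
```

Reporting both forms is a deliberate choice: an absolute gain of a few points on a low baseline such as APsmall corresponds to a much larger relative change than the same gain on APlarge.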

Components added to YOLOv5 (of Mixup+Mosaic, CA, SODL) | mAP | mAP50 | mAP75 | APsmall | APmedium | APlarge |
---|---|---|---|---|---|---|
None (YOLOv5 baseline) | 18.4 | 33.1 | 15.6 | 10.5 | 26.5 | 36.1 |
One of the three | 18.9 | 33.4 | 15.7 | 10.9 | 26.5 | 36.4 |
One of the three | 19.5 | 33.8 | 16.5 | 11.7 | 27.2 | 37.1 |
One of the three | 20.3 | 34.7 | 18.1 | 12.6 | 28.6 | 37.8 |
Two of the three | 19.9 | 34.5 | 17.2 | 12.1 | 29.5 | 38.3 |
Two of the three | 20.8 | 36.0 | 19.5 | 12.9 | 30.4 | 38.7 |
Two of the three | 21.7 | 37.9 | 20.8 | 13.3 | 31.2 | 39.3 |
All three (Mixup+Mosaic, CA, SODL) | 22.1 | 38.3 | 21.5 | 13.8 | 32.1 | 40.1 |

Model | P | R | mAP | mAP50 | mAP75 | APsmall | APlarge |
---|---|---|---|---|---|---|---|
SSD | 38.7 | 30.3 | 17.6 | 27.9 | 11.4 | 9.5 | 33.4 |
YOLOv5s | 40.5 | 33.2 | 18.4 | 33.1 | 15.6 | 10.7 | 36.1 |
YOLOX [5] | 41.9 | 34.5 | 19.0 | 34.8 | 16.5 | 10.9 | 37.8 |
YOLOv7s [16] | 43.5 | 35.4 | 19.5 | 36.4 | 17.3 | - | - |
TPH-YOLOv5 [31] | 44.7 | 36.6 | 21.4 | 37.6 | 19.7 | 12.9 | - |
YOLOv5m | 44.5 | 36.3 | 20.9 | 36.9 | 19.3 | 12.3 | 41.5 |
MCF-YOLOv5 (ours) | 45.2 | 37.0 | 22.1 | 38.3 | 21.1 | 13.8 | 40.1 |
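
The comparison table reports precision (P) and recall (R) as separate columns. One common way to combine them into a single figure is the F1 score, the harmonic mean 2PR / (P + R). F1 is not reported in the table; the sketch below derives it from the listed P and R values purely as an illustration, and the names `PR`, `f1`, and `f1_scores` are hypothetical.

```python
# Precision and recall (percent) copied from the comparison table above.
PR = {
    "SSD": (38.7, 30.3),
    "YOLOv5s": (40.5, 33.2),
    "YOLOX": (41.9, 34.5),
    "YOLOv7s": (43.5, 35.4),
    "TPH-YOLOv5": (44.7, 36.6),
    "YOLOv5m": (44.5, 36.3),
    "MCF-YOLOv5": (45.2, 37.0),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 0.0 if total == 0 else 2 * precision * recall / total


# Derived values, rounded to one decimal place like the table entries.
f1_scores = {model: round(f1(p, r), 1) for model, (p, r) in PR.items()}

if __name__ == "__main__":
    for model, score in sorted(f1_scores.items(), key=lambda kv: kv[1]):
        print(f"{model:12s} F1 = {score}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, each derived score sits closer to the model's recall than to its precision.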

Model | Input Size | Params (M) | FLOPs (G) | FPS |
---|---|---|---|---|
YOLOv5s | 640 × 640 | 7.2 | 16.21 | 95.2 |
MCF-YOLOv5 (ours) | 640 × 640 | 10.31 | 25.52 | 88.5 |
YOLOv5m | 640 × 640 | 21.2 | 48.7 | 67.1 |
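
The cost side of the table above is easier to judge as percentage changes against each YOLOv5 variant. The following is a minimal sketch of that calculation. The values are copied from the table; the dictionary `MODELS`, its key names, and the helper `relative_change` are hypothetical and introduced only for this example.

```python
# Values copied from the table above (all at 640 x 640 input).
MODELS = {
    "YOLOv5s":    {"params_m": 7.2,   "gflops": 16.21, "fps": 95.2},
    "MCF-YOLOv5": {"params_m": 10.31, "gflops": 25.52, "fps": 88.5},
    "YOLOv5m":    {"params_m": 21.2,  "gflops": 48.7,  "fps": 67.1},
}


def relative_change(model, reference, key):
    """Percentage change of `key` for `model` relative to `reference`."""
    ref = MODELS[reference][key]
    return round(100 * (MODELS[model][key] - ref) / ref, 1)


if __name__ == "__main__":
    for reference in ("YOLOv5s", "YOLOv5m"):
        print(f"MCF-YOLOv5 vs {reference}")
        for key in ("params_m", "gflops", "fps"):
            change = relative_change("MCF-YOLOv5", reference, key)
            print(f"  {key:9s} {change:+.1f}%")
```

Run against the table values, the comparison shows MCF-YOLOv5 is heavier and somewhat slower than YOLOv5s, and lighter and faster than the medium-sized YOLOv5 model.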
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gao, S.; Gao, M.; Wei, Z. MCF-YOLOv5: A Small Target Detection Algorithm Based on Multi-Scale Feature Fusion Improved YOLOv5. Information 2024, 15, 285. https://doi.org/10.3390/info15050285