A Multi-Scale Traffic Object Detection Algorithm for Road Scenes Based on Improved YOLOv5
Abstract
1. Introduction
- We add a fourth detection head, dedicated to extremely small objects, to the three detection heads of the original YOLOv5, which reduces false and missed detections of extremely small objects in complex traffic images.
- The content-aware reassembly of features (CARAFE) module is used for feature fusion, strengthening the feature fusion capability of the neck. It is a lightweight operator that introduces only a small number of extra parameters and little additional computation.
- The SPD-Conv CNN module replaces the original convolution module, improving detection accuracy on low-resolution images and extremely small objects. It replaces the original strided convolution and pooling layers with a space-to-depth layer followed by a non-strided convolution layer.
- An effective attention mechanism, the Normalization-based Attention Module (NAM), is added to the neck, improving the accuracy and robustness of the model. NAM applies a weight sparsity penalty to its attention modules, making them more computationally efficient while retaining similar performance.
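To make the first contribution concrete, the sketch below counts the prediction grid of each detection head for a 640 × 640 input. It assumes, since the text above does not say so, that the added head works at stride 4 (the P2 level), that the original three heads use YOLOv5's usual strides of 8, 16 and 32, and that each grid cell carries three anchors; `head_grids` is a hypothetical helper.

```python
# Hypothetical helper. Assumptions: the extra small-object head sits at stride 4
# (P2), the original YOLOv5 heads use strides 8, 16 and 32, 3 anchors per cell.
def head_grids(img_size=640, strides=(4, 8, 16, 32), anchors_per_cell=3):
    """Return {stride: (grid_side, anchor_predictions)} for a square input."""
    grids = {}
    for stride in strides:
        side = img_size // stride  # feature-map side length at this stride
        grids[stride] = (side, side * side * anchors_per_cell)
    return grids


for stride, (side, n) in head_grids().items():
    print(f"stride {stride:>2}: {side}x{side} grid, {n} anchor predictions")

three = sum(n for _, n in head_grids(strides=(8, 16, 32)).values())
four = sum(n for _, n in head_grids().values())
print(f"three heads: {three} predictions, four heads: {four}")
```

Under these assumptions an 8 × 8-pixel object spans roughly 2 × 2 cells on the stride-4 grid but only about one cell at stride 8, which is the intuition behind a dedicated small-object head; the cost is that the finest head alone produces roughly three times as many predictions as the original three heads combined.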
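CARAFE upsamples by predicting, for every output location, a normalized k_up × k_up kernel from the feature content and using it to reassemble the neighborhood around the corresponding source location. The pure-Python sketch below covers only the reassembly step for a single channel. The kernel-prediction branch of the real module is replaced by kernel logits passed in by hand, and the function names, the zero padding and the defaults `scale=2`, `k_up=3` are illustrative assumptions rather than the paper's configuration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def carafe_reassemble(feat, kernel_logits, scale=2, k_up=3):
    """Content-aware reassembly for one channel (sketch, not the official op).

    feat:          H x W nested list (input feature map).
    kernel_logits: (H*scale) x (W*scale) nested list; each entry is a flat list
                   of k_up*k_up logits. Assumption: supplied directly instead of
                   coming from CARAFE's kernel-prediction branch.
    """
    h, w = len(feat), len(feat[0])
    r = k_up // 2
    out = [[0.0] * (w * scale) for _ in range(h * scale)]
    for oy in range(h * scale):
        for ox in range(w * scale):
            weights = softmax(kernel_logits[oy][ox])  # kernel sums to 1
            sy, sx = oy // scale, ox // scale  # source location in the input
            acc, idx = 0.0, 0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    y, x = sy + dy, sx + dx
                    if 0 <= y < h and 0 <= x < w:  # zero padding outside
                        acc += weights[idx] * feat[y][x]
                    idx += 1
            out[oy][ox] = acc
    return out


feat = [[1.0, 2.0], [3.0, 4.0]]
n_taps = 9  # k_up * k_up with k_up = 3
# Centre-heavy logits reduce reassembly to nearest-neighbour upsampling;
# uniform logits turn it into a 3x3 box filter around the source pixel.
centre = [[[50.0 if i == n_taps // 2 else 0.0 for i in range(n_taps)]
           for _ in range(4)] for _ in range(4)]
uniform = [[[0.0] * n_taps for _ in range(4)] for _ in range(4)]
near = carafe_reassemble(feat, centre)
box = carafe_reassemble(feat, uniform)
print([[round(v, 3) for v in row] for row in near])
print(round(box[0][0], 4))
```

The two extreme cases show that the predicted kernels interpolate between nearest-neighbour copying and local averaging; in the actual module the logits come from a lightweight convolutional kernel-prediction branch rather than being supplied by hand.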
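The SPD-Conv building block can be illustrated in pure Python. Space-to-depth with scale 2 splits a C × H × W map into four sub-maps taken at alternating rows and columns and stacks them along the channel axis, giving 4C × H/2 × W/2 with no value discarded; a stride-1 convolution then mixes the stacked channels. For brevity this sketch assumes a 1 × 1 kernel for the non-strided convolution and uses hypothetical helper names; it illustrates the idea, not the paper's implementation.

```python
def space_to_depth(x, scale=2):
    """x: C x H x W nested lists -> (C*scale*scale) x (H/scale) x (W/scale)."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    if h % scale or w % scale:
        raise ValueError("H and W must be divisible by scale")
    out = []
    for col_off in range(scale):
        for row_off in range(scale):
            for ch in range(c):
                # Keep every scale-th row and column starting at this offset.
                out.append([row[col_off::scale] for row in x[ch][row_off::scale]])
    return out


def conv1x1(x, weights):
    """Stride-1 1x1 convolution (assumed kernel size); weights is C_out x C_in."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(wo[ci] * x[ci][r][q] for ci in range(len(x))) for q in range(w)]
         for r in range(h)]
        for wo in weights
    ]


def spd_conv(x, weights, scale=2):
    """Space-to-depth followed by a non-strided convolution (SPD-Conv sketch)."""
    return conv1x1(space_to_depth(x, scale), weights)


x = [[[4 * r + q for q in range(4)] for r in range(4)]]  # 1 x 4 x 4, values 0..15
s2d = space_to_depth(x)
print(len(s2d), s2d[0])  # 4 channels; the first is what a stride-2 sample keeps
print(spd_conv(x, [[1, 1, 1, 1]]))
```

A plain stride-2 subsample of this input would keep only the first sub-map (4 of the 16 values); space-to-depth retains all 16 as channels, so the following stride-1 convolution still sees the full-resolution information.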
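NAM's channel sub-module reuses the batch-normalization scale factors γ as channel-importance weights: each channel of the normalized map is scaled by w_i = |γ_i| / Σ_j |γ_j|, passed through a sigmoid, and the resulting gate multiplies the input, while training adds an L1 penalty on γ that encourages sparse attention weights. The pure-Python sketch below follows that description for a single image. It assumes BN statistics are computed from the input itself (a batch of one) and uses absolute values of γ; the helper names and the penalty weight `p` are assumptions, not the authors' code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def nam_channel_attention(x, gamma, beta, eps=1e-5):
    """NAM-style channel attention for one image (sketch).

    x:     list of C channels, each a flat list of activations.
    gamma: BN scale factors (one per channel); beta: BN shifts.
    Assumption: BN statistics are taken from x itself (batch of one).
    """
    total = sum(abs(g) for g in gamma) or 1.0  # guard against all-zero gamma
    out = []
    for ch, vals in enumerate(x):
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        w = abs(gamma[ch]) / total  # channel importance from the BN scale factor
        out.append([
            v * sigmoid(w * (gamma[ch] * (v - mean) / math.sqrt(var + eps) + beta[ch]))
            for v in vals
        ])
    return out


def l1_sparsity_penalty(gamma, p=1e-4):
    """Regularization term added to the training loss: p * sum(|gamma|)."""
    return p * sum(abs(g) for g in gamma)


x = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
y = nam_channel_attention(x, gamma=[0.0, 1.0], beta=[0.0, 0.0])
print(y[0])  # gamma = 0 gives a constant gate of 0.5, so values are halved
print([round(v, 3) for v in y[1]])
print(l1_sparsity_penalty([0.5, -1.5], p=0.1))
```

A channel whose γ has been driven to zero by the penalty receives a constant gate of 0.5, i.e. no content-dependent re-weighting, while channels with large γ are gated according to their normalized activations.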
2. Related Work
3. Methods
3.1. Dataset Construction
3.2. Data Augmentation
3.3. Algorithm Optimization
3.3.1. Additional Detection Head
3.3.2. Content-Aware Reassembly of Feature Module
3.3.3. SPD-Conv CNN Module
3.3.4. Normalization-Based Attention Module
4. Experiments
4.1. Implementation Details
4.2. Model Algorithm Evaluation Index
4.3. Comparison of Multi-Scale YOLOv5s Models’ Performances for Each Category
4.4. Comparison of Multi-Scale YOLOv5s Models’ Performances with Different Attention Mechanisms
4.5. Ablation Experiments
4.6. Methods’ Comparative Experiment
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
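
Per-category AP@0.5 (%) of YOLOv5s and its single-module variants (All = mAP@0.5 over the 11 categories).

Methods | All | Car | Truck | Bus | Person | Fire | Smoke | Cone | Div | Suit | Box | Moto |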
---|---|---|---|---|---|---|---|---|---|---|---|---|
YOLOv5s | 78.3 | 95.1 | 93.4 | 64.9 | 81.3 | 98.1 | 99.5 | 76.2 | 62.0 | 57.9 | 61.3 | 72.1 |
YOLOv5s-F | 81.9 | 94.9 | 94.1 | 67.9 | 85.7 | 97.9 | 99.5 | 81.7 | 72.3 | 70.5 | 65.7 | 70.8 |
YOLOv5s-C | 79.8 | 96.3 | 94.0 | 68.8 | 80.2 | 98.0 | 99.2 | 75.1 | 67.7 | 60.0 | 60.3 | 78.6 |
YOLOv5s-S | 78.6 | 95.9 | 93.8 | 65.1 | 83.6 | 98.1 | 99.5 | 70.6 | 68.3 | 57.1 | 60.8 | 72.5 |
YOLOv5s-N | 81.3 | 98.2 | 95.6 | 70.1 | 80.7 | 98.7 | 99.6 | 78.6 | 69.1 | 61.9 | 61.7 | 80.7 |
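The All column is consistent with the unweighted mean of the 11 per-category values, the usual definition of mAP over classes. The short check below recomputes that mean for each row, with values copied from the table; because the per-category entries are themselves rounded to one decimal, agreement is only expected to within about 0.1 percentage points.

```python
# Per-category AP@0.5 (%) copied from the table above, columns Car ... Moto.
CATEGORIES = ["Car", "Truck", "Bus", "Person", "Fire", "Smoke",
              "Cone", "Div", "Suit", "Box", "Moto"]
ROWS = {
    # method: (reported All, per-category values)
    "YOLOv5s":   (78.3, [95.1, 93.4, 64.9, 81.3, 98.1, 99.5, 76.2, 62.0, 57.9, 61.3, 72.1]),
    "YOLOv5s-F": (81.9, [94.9, 94.1, 67.9, 85.7, 97.9, 99.5, 81.7, 72.3, 70.5, 65.7, 70.8]),
    "YOLOv5s-C": (79.8, [96.3, 94.0, 68.8, 80.2, 98.0, 99.2, 75.1, 67.7, 60.0, 60.3, 78.6]),
    "YOLOv5s-S": (78.6, [95.9, 93.8, 65.1, 83.6, 98.1, 99.5, 70.6, 68.3, 57.1, 60.8, 72.5]),
    "YOLOv5s-N": (81.3, [98.2, 95.6, 70.1, 80.7, 98.7, 99.6, 78.6, 69.1, 61.9, 61.7, 80.7]),
}


def mean_ap(per_class):
    """Unweighted mean of per-class AP values (the usual mAP definition)."""
    return sum(per_class) / len(per_class)


diffs = {}
for method, (reported, per_class) in ROWS.items():
    assert len(per_class) == len(CATEGORIES)
    diffs[method] = abs(mean_ap(per_class) - reported)
    print(f"{method:<10} recomputed {mean_ap(per_class):.2f}  reported {reported}")
```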

Methods | Params (M) | FLOPs@640 (B) | mAP@0.5 (%) | Precision (%) | Recall (%) |
---|---|---|---|---|---|
YOLOv5s-FCS | 12.5 | 25.5 | 83.1 | 96.5 | 82.0 |
YOLOv5s-FCS-NAM | 16.1 | 32.1 | 85.4 | 97.2 | 87.0 |
YOLOv5s-FCS-SE | 15.8 | 30.7 | 83.5 | 96.2 | 85.3 |
YOLOv5s-FCS-CA | 15.5 | 30.1 | 84.9 | 96.9 | 85.0 |
YOLOv5s-FCS-CBAM | 16.5 | 35.7 | 85.1 | 97.1 | 86.9 |

Methods | Params (M) | FLOPs@640 (B) | mAP@0.5 (%) | Precision (%) | Recall (%) |
---|---|---|---|---|---|
YOLOv5s-FCS | 12.5 | 25.5 | 65.4 | 77.6 | 67.9 |
YOLOv5s-FCS-NAM | 16.1 | 32.1 | 69.7 | 80.9 | 69.3 |
YOLOv5s-FCS-SE | 15.8 | 30.7 | 66.3 | 79.2 | 70.6 |
YOLOv5s-FCS-CA | 15.5 | 30.1 | 67.9 | 80.1 | 65.4 |
YOLOv5s-FCS-CBAM | 16.5 | 35.7 | 68.1 | 78.6 | 67.8 |

Methods | Params (M) | FLOPs@640 (B) | mAP@0.5 (%) | Precision (%) | Recall (%) |
---|---|---|---|---|---|
YOLOv5s-FCS | 12.5 | 25.5 | 79.6 | 87.6 | 78.9 |
YOLOv5s-FCS-NAM | 16.1 | 32.1 | 82.4 | 90.7 | 76.3 |
YOLOv5s-FCS-SE | 15.8 | 30.7 | 80.6 | 88.2 | 80.9 |
YOLOv5s-FCS-CA | 15.5 | 30.1 | 81.7 | 91.6 | 81.2 |
YOLOv5s-FCS-CBAM | 16.5 | 35.7 | 81.4 | 90.6 | 75.5 |
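The three attention-mechanism tables share the same parameter and FLOP counts and differ only in accuracy. The sketch below, using values copied from them, computes each attention module's added cost and mAP@0.5 gain relative to YOLOv5s-FCS; the tables are referred to by their order of appearance, since the dataset behind each one is not named here.

```python
# (Params M, FLOPs B) per variant, shared by all three attention tables above.
COST = {"FCS": (12.5, 25.5), "NAM": (16.1, 32.1), "SE": (15.8, 30.7),
        "CA": (15.5, 30.1), "CBAM": (16.5, 35.7)}
# mAP@0.5 (%) from each attention table, in order of appearance.
MAP = [
    {"FCS": 83.1, "NAM": 85.4, "SE": 83.5, "CA": 84.9, "CBAM": 85.1},
    {"FCS": 65.4, "NAM": 69.7, "SE": 66.3, "CA": 67.9, "CBAM": 68.1},
    {"FCS": 79.6, "NAM": 82.4, "SE": 80.6, "CA": 81.7, "CBAM": 81.4},
]


def overhead(variant):
    """Extra params (M) and FLOPs (B) of a variant relative to YOLOv5s-FCS."""
    p, f = COST[variant]
    p0, f0 = COST["FCS"]
    return round(p - p0, 1), round(f - f0, 1)


def gains(table):
    """mAP@0.5 gain of each attention variant over YOLOv5s-FCS, rounded."""
    return {k: round(v - table["FCS"], 1) for k, v in table.items() if k != "FCS"}


for i, table in enumerate(MAP, start=1):
    g = gains(table)
    print(f"table {i}: gains {g}, best {max(g, key=g.get)}")
for v in ("NAM", "SE", "CA", "CBAM"):
    print(v, "extra params/FLOPs:", overhead(v))
```

On these numbers NAM gives the largest mAP@0.5 gain in all three tables, while CBAM is the most expensive variant in both parameters and FLOPs.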

Methods | F (fourth head) | C (CARAFE) | S (SPD-Conv) | N (NAM) | mAP@0.5 (%) | Precision (%) | Recall (%) |
---|---|---|---|---|---|---|---|
YOLOv5s | | | | | 78.3 | 96.0 | 81.0 |
YOLOv5s-F | + | | | | 81.9 | 96.5 | 82.0 |
YOLOv5s-C | | + | | | 79.8 | 96.0 | 81.0 |
YOLOv5s-S | | | + | | 78.6 | 96.2 | 81.2 |
YOLOv5s-N | | | | + | 81.3 | 96.9 | 82.1 |
YOLOv5s-CSN | | + | + | + | 84.1 | 96.9 | 82.3 |
YOLOv5s-FSN | + | | + | + | 84.2 | 97.0 | 82.2 |
YOLOv5s-FCN | + | + | | + | 83.9 | 96.9 | 82.0 |
YOLOv5s-FCS | + | + | + | | 83.1 | 96.5 | 82.0 |
YOLOv5s-FCSN | + | + | + | + | 85.4 | 97.2 | 87.0 |
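Two views of the ablation table can be computed directly from it: the gain of each module added alone to YOLOv5s, and the drop in mAP@0.5 when that module is removed from the full YOLOv5s-FCSN model. The letters follow the table (F, C, S, N); the script is an illustrative analysis of the reported numbers, not part of the original study.

```python
# mAP@0.5 (%) from the ablation table, keyed by the enabled modules ("" = baseline).
ABLATION = {
    "": 78.3, "F": 81.9, "C": 79.8, "S": 78.6, "N": 81.3,
    "CSN": 84.1, "FSN": 84.2, "FCN": 83.9, "FCS": 83.1, "FCSN": 85.4,
}
MODULES = "FCSN"

alone = {m: round(ABLATION[m] - ABLATION[""], 1) for m in MODULES}
# Removing module m from FCSN leaves the other three letters, in table order.
removed = {m: round(ABLATION["FCSN"] - ABLATION[MODULES.replace(m, "")], 1)
           for m in MODULES}

print("gain when added alone:", alone)
print("drop when removed from FCSN:", removed)
print("sum of solo gains:", round(sum(alone.values()), 1),
      "vs full-model gain:", round(ABLATION["FCSN"] - ABLATION[""], 1))
```

Removing N costs the most (2.3 points), and the solo gains sum to 8.4 points against a full-model improvement of 7.1, so the modules' contributions are not additive.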

Methods | Params (M) | FLOPs@640 (B) | mAP@0.5 (%) | Precision (%) | Recall (%) |
---|---|---|---|---|---|
YOLOv4 | 62.1 | 128.4 | 74.0 | 89.6 | 80.1 |
YOLOv4-tiny | 6.1 | 3.4 | 75.9 | 90.7 | 80.5 |
SSD | 50.4 | 114.2 | 70.1 | 85.4 | 73.2 |
Faster R-CNN | 67.9 | 147.2 | 73.6 | 88.9 | 77.8 |
YOLOv5x | 86.7 | 205.7 | 85.3 | 97.1 | 85.0 |
YOLOv5l | 46.5 | 109.1 | 82.9 | 96.2 | 81.1 |
YOLOv5m | 21.2 | 49.0 | 80.6 | 96.2 | 82.4 |
YOLOv5s | 7.2 | 16.5 | 78.3 | 96.0 | 81.0 |
YOLOv5n | 1.9 | 4.5 | 69.7 | 87.6 | 78.9 |
Multi-scale YOLOv5s | 16.1 | 32.1 | 85.4 | 97.2 | 87.0 |
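One way to read the comparison table is to ask which detectors are not beaten on both axes at once, that is, no other model reaches at least the same mAP@0.5 with no more parameters. The sketch below computes that Pareto front from the table's Params and mAP@0.5 columns; the dominance criterion is an illustrative choice, not one used in the source.

```python
# (Params in M, mAP@0.5 in %) copied from the comparison table above.
MODELS = {
    "YOLOv4": (62.1, 74.0), "YOLOv4-tiny": (6.1, 75.9), "SSD": (50.4, 70.1),
    "Faster R-CNN": (67.9, 73.6), "YOLOv5x": (86.7, 85.3), "YOLOv5l": (46.5, 82.9),
    "YOLOv5m": (21.2, 80.6), "YOLOv5s": (7.2, 78.3), "YOLOv5n": (1.9, 69.7),
    "Multi-scale YOLOv5s": (16.1, 85.4),
}


def dominates(a, b):
    """a dominates b: no more params, no less mAP, and strictly better in one."""
    (pa, ma), (pb, mb) = a, b
    return pa <= pb and ma >= mb and (pa < pb or ma > mb)


def pareto_front(models):
    """Names of models not dominated by any other, sorted by parameter count."""
    front = [n for n, v in models.items()
             if not any(dominates(w, v) for m, w in models.items() if m != n)]
    return sorted(front, key=lambda n: models[n][0])


print(pareto_front(MODELS))
```

On these numbers the multi-scale model dominates YOLOv5x, matching its mAP@0.5 (85.4 vs 85.3) with about 19% of the parameters (16.1 M vs 86.7 M); the same check could be repeated with FLOPs as the cost axis.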

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, A.; Sun, S.; Zhang, Z.; Feng, M.; Wu, C.; Li, W. A Multi-Scale Traffic Object Detection Algorithm for Road Scenes Based on Improved YOLOv5. Electronics 2023, 12, 878. https://doi.org/10.3390/electronics12040878