A Robust Lightweight Network for Pedestrian Detection Based on YOLOv5-x
Abstract
1. Introduction
- The RainDet3000 dataset is proposed to fill a gap in existing pedestrian detection datasets, which do not target rainy conditions, and to provide a more realistic detection scenario for network training.
- The bottleneck layer structure of GhostNet has been optimized using a compact bilinear pooling algorithm to enhance the network’s feature learning capability while maintaining its lightweight architecture. The resulting CBP-GNet is then utilized as the backbone network for RSTDet-Lite.
- The proposed Simple-BiFPN extends BiFPN and is more computationally efficient than YOLOv5’s PANet feature fusion network because it eliminates redundant computation. In addition, the CBAM attention module is inserted between the backbone network and the feature fusion network to optimize the assignment of feature weights, enabling the network to learn more effective features.
- The REP structure enhances the capacity of the network through structural reparameterization. During training, it increases the number of trainable parameters, allowing the network to capture richer and more abstract semantic concepts without increasing inference time, so performance improves without compromising efficiency in real-time applications.
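The reparameterization idea in the last contribution can be illustrated with a toy example. The sketch below is not the paper's REP block: it assumes a RepVGG-style layout (a 3-tap branch, a 1-tap branch, and an identity shortcut), works on 1-D signals in plain Python, and omits batch-normalization folding. All function names are hypothetical. It shows only why parallel linear branches used during training can be collapsed into a single kernel for inference.

```python
# Toy 1-D illustration of structural reparameterization (RepVGG-style branch merging).
# Assumption: the branch layout and all names are illustrative, not the paper's REP block.

def conv1d_same(x, kernel):
    """Cross-correlate x with an odd-length kernel, zero-padding to keep the length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]

def multi_branch(x, k3, k1):
    """Training-time form: 3-tap branch + 1-tap branch + identity shortcut."""
    y3 = conv1d_same(x, k3)
    y1 = conv1d_same(x, [k1])
    return [a + b + c for a, b, c in zip(y3, y1, x)]

def merge_branches(k3, k1):
    """Fold the 1-tap branch and the identity into the centre tap of the 3-tap kernel."""
    return [k3[0], k3[1] + k1 + 1.0, k3[2]]

x = [1.0, -2.0, 0.5, 3.0, 4.0]
k3, k1 = [0.25, -0.5, 0.75], 2.0

train_out = multi_branch(x, k3, k1)                  # three branches, more parameters
infer_out = conv1d_same(x, merge_branches(k3, k1))   # one kernel, same output
```

Because every branch is linear, the merged kernel reproduces the multi-branch output exactly, which is why the extra training-time parameters add no cost at inference.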
2. Related Work
YOLOv5 Algorithm
3. Method
3.1. Design of Backbone Network
3.2. Proposed Simple-BiFPN Structure
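As background for this section, BiFPN (introduced with EfficientDet by Tan et al.) combines feature maps using fast normalized fusion: each input receives a learnable non-negative weight, and the weights are normalized by their sum plus a small epsilon. The sketch below shows that weighting on plain Python lists. Whether Simple-BiFPN keeps this exact fusion rule is an assumption, and the function and variable names are hypothetical.

```python
# Minimal sketch of BiFPN-style fast normalized fusion on flat feature vectors.
# Assumption: Simple-BiFPN may or may not use this exact rule; names are illustrative.

def fast_normalized_fusion(features, weights, eps=1e-4):
    """Blend equally sized feature vectors with non-negative, normalized weights."""
    clipped = [max(w, 0.0) for w in weights]  # ReLU keeps every weight >= 0
    total = sum(clipped) + eps                # eps avoids division by zero
    return [
        sum(w * f[i] for w, f in zip(clipped, features)) / total
        for i in range(len(features[0]))
    ]

p_top_down = [1.0, 2.0, 3.0]  # stand-in for an upsampled higher-level feature
p_lateral = [3.0, 2.0, 1.0]   # stand-in for the same-level backbone feature
fused = fast_normalized_fusion([p_top_down, p_lateral], weights=[1.0, 1.0])
```

With equal weights, the result is close to the element-wise mean of the inputs; a weight that becomes negative during training is clipped to zero, so that input stops contributing.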
3.3. Incorporating the Spatial Attention Mechanism CBAM
3.4. REP Structures Combining Structural Reparameterization Ideas
4. Experiments
4.1. Experimental Environment Configuration and Data Introduction
4.2. Experimental Environment Configuration and Data Introduction
4.3. Analysis of Experimental Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Ahonen, T.; Hadid, A.; Pietikainen, M. Face description with local binary patterns: Application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 2037–2041.
2. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1.
3. Wu, B.; Nevatia, R. Detection of multiple, partially occluded humans in a single image by bayesian combination of edgelet part detectors. In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05), Beijing, China, 17–21 October 2005; Volume 1.
4. Ye, L.; Keogh, E. Time series shapelets: A new primitive for data mining. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009.
5. Lienhart, R.; Maydt, J. An extended set of haar-like features for rapid object detection. In Proceedings of the International Conference on Image Processing, Rochester, NY, USA, 22–25 September 2002; Volume 1.
6. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014.
7. Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015.
8. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99.
9. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
10. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
11. Tian, Z.; Shen, C.; Chen, H.; He, T. Fcos: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019.
12. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
13. Zhang, Y.; Zhou, A.; Zhao, F.; Wu, H. A lightweight vehicle-pedestrian detection algorithm based on attention mechanism in traffic scenarios. Sensors 2022, 22, 8480.
14. Sun, H.; Dong, X.; Wang, J.; Chen, Z. Based on the improved YOLOv4-tiny lightweight pedestrian in school target detection algorithm. Comput. Eng. Appl. 2023, 35, 13895–13906.
15. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. Ghostnet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020.
16. Zhang, B.; Kang, Q.; Li, J.; Guo, J.; Chen, S. Lightweight YOLOv4 Object Detection Algorithm. Comput. Eng. 2022, 48, 206–214.
17. Roszyk, K.; Nowicki, M.R.; Skrzypczyński, P. Adopting the YOLOv4 architecture for low-latency multispectral pedestrian detection in autonomous driving. Sensors 2022, 22, 1082.
18. Li, M.-L.; Sun, G.-B.; Yu, J.-X. A pedestrian detection network model based on improved YOLOv5. Entropy 2023, 25, 381.
19. Sha, M.; Zeng, K.; Tao, Z.; Wang, Z.; Liu, Q. Lightweight Pedestrian Detection Based on Feature Multiplexed Residual Network. Electronics 2023, 12, 918.
20. Zhao, Q.; Ma, W.; Zheng, C.; Li, L. Exploration of Vehicle Target Detection Method Based on Lightweight YOLOv5 Fusion Background Modeling. Appl. Sci. 2023, 13, 4088.
21. Sun, Z.; Liu, C.A.; Qu, H.; Xie, G. PVformer: Pedestrian and vehicle detection algorithm based on Swin transformer in rainy scenes. Sensors 2022, 22, 5667.
22. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
23. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016.
24. Gao, Y.; Beijbom, O.; Zhang, N.; Darrell, T. Compact bilinear pooling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016.
25. Lin, T.-Y.; RoyChowdhury, A.; Maji, S. Bilinear CNN models for fine-grained visual recognition. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015.
26. Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020.
27. Ghiasi, G.; Lin, T.-Y.; Le, Q.V. Nas-fpn: Learning scalable feature pyramid architecture for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019.
28. Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. Repvgg: Making vgg-style convnets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021.
29. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016.
30. Wen, L.; Du, D.; Cai, Z.; Lei, Z.; Chang, M.C.; Qi, H.; Lim, J.; Yang, M.H.; Lyu, S. UA-DETRAC: A new benchmark and protocol for multi-object detection and tracking. Comput. Vis. Image Underst. 2020, 193, 102907.
31. Dollár, P.; Wojek, C.; Schiele, B.; Perona, P. Pedestrian detection: A benchmark. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009.
32. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The kitti dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
33. Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F.; Madhavan, V.; Darrell, T. Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020.
Item | Parameter |
---|---|
CPU | Intel(R) Core i5-10200H |
GPU | NVIDIA GeForce GTX 2060 |
Operating System | Ubuntu 16.04 LTS |
Memory | 8 GB |
Deep learning framework version | PyTorch 1.8 |
Development language | Python 3.9 |
Backbone | mAP (%) | Model Size (MB) | Model Scaling Ratio (%) | FPS |
---|---|---|---|---|
- | 49.91 | 86.1 | - | 30.5 |
GhostNet | 43.9 | 21.3 | 75.2 | 48.6 |
CBP-GNet | 50.1 | 24.1 | 72 | 47.9 |
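The table does not define the Model Scaling Ratio column. The reported values appear consistent, to within about 0.1 percentage points, with the relative size reduction against the 86.1 baseline in the first row, that is, (1 − size / baseline) × 100. The sketch below checks that reading; the formula is inferred from the numbers rather than stated in the source, and the names are hypothetical.

```python
# Check an assumed definition of "Model Scaling Ratio" against the table values.
# Assumption: ratio = (1 - size / baseline) * 100; the source does not state this formula.

BASELINE_MB = 86.1  # model size in the first (unmodified backbone) row

def scaling_ratio(size_mb, baseline_mb=BASELINE_MB):
    """Percentage reduction in model size relative to the baseline."""
    return (1.0 - size_mb / baseline_mb) * 100.0

# backbone name -> (model size, reported scaling ratio), copied from the table
reported = {"GhostNet": (21.3, 75.2), "CBP-GNet": (24.1, 72.0)}

for name, (size_mb, table_value) in reported.items():
    print(f"{name}: computed {scaling_ratio(size_mb):.1f}%, reported {table_value}%")
```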
Feature Fusion Network | mAP (%) | Model Size (MB) | Model Scaling Ratio (%) | FPS |
---|---|---|---|---|
PANet | 50.1 | 24.1 | 72 | 47.9 |
NAS-FPN | 51.2 | 25.9 | 69.9 | 34.6 |
Simple-BiFPN | 52.99 | 11.4 | 86.8 | 48.8 |
Network (CBP-GNet + Simple-BiFPN) | mAP (%) | Model Size (MB) | Model Scaling Ratio (%) | FPS |
---|---|---|---|---|
- | 52.99 | 11.4 | 86.8 | 48.8 |
+CBAM | 53.09 | 11.4 | 86.8 | 49.8 |
+CBAM + REP | 54.47 | 12.3 | 85.7 | 49.8 |
Model | Input Image Size | Recall (%) | mAP (%) | AP60 | Model Size (MB) | FPS |
---|---|---|---|---|---|---|
YOLOv4 | 416 × 416 | 66.43 | 45.91 | 44.31 | 244.0 | 18.6 |
YOLOv5x | 640 × 640 | 73.41 | 49.91 | 44.86 | 42.3 | 30.5 |
YOLOv7 | 640 × 640 | 74.51 | 50.9 | 47.6 | 37.1 | 48.3 |
Improved YOLOv4-Tiny [14] | 416 × 416 | 54.32 | 30.40 | 22.1 | 7.1 | 32.1 |
Improved YOLOv4 [16] | 416 × 416 | 79.15 | 47.93 | 41.92 | 187.0 | 29.0 |
PVformer [21] | 640 × 640 | 83.35 | 52.44 | 45.3 | 145.4 | 19.1 |
RSTDet-Lite | 416 × 416 | 88.31 | 54.47 | 51.68 | 12.3 | 49.8 |
Model | Input Image Size | Recall (%) | mAP (%) | AP60 | Model Size (MB) | FPS |
---|---|---|---|---|---|---|
YOLOv5x | 640 × 640 | 89.23 | 43.1 | 36.9 | 42.3 | 24.9 |
YOLOX | 416 × 416 | 91.21 | 45.77 | 39.81 | 19.8 | 29.8 |
CA-MobileNetv2-YOLOv4 | 640 × 640 | 92.33 | 43.89 | 38.93 | 40.1 | 31.2 |
PVformer | 640 × 640 | 94.65 | 47.23 | 45.67 | 145.4 | 25.9 |
RSTDet-Lite | 416 × 416 | 95.46 | 49.89 | 45.32 | 12.3 | 43.1 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Chen, Y.; Wang, C.; Zhang, C. A Robust Lightweight Network for Pedestrian Detection Based on YOLOv5-x. Appl. Sci. 2023, 13, 10225. https://doi.org/10.3390/app131810225