Traffic Sign Detection Based on Lightweight Multiscale Feature Fusion Network
Abstract
1. Introduction
2. Methodology and Models
2.1. Feature Extraction
2.1.1. Lightweight Feature Extraction Network
2.1.2. Multiscale Feature Fusion Network
- Top-down module. To obtain a rich feature representation, the multiscale fusion module designed in this paper selected seven feature maps of different sizes for fusion. These feature maps were the outputs of Bottleneck_2, Bottleneck_4, Bottleneck_7, Bottleneck_9, Bottleneck_11, Bottleneck_13, and Bottleneck_15 in the lightweight feature extraction network, with sizes of 512 × 512 × 24, 256 × 256 × 40, 128 × 128 × 80, 64 × 64 × 112, 32 × 32 × 160, 16 × 16 × 200, and 8 × 8 × 240, respectively.
- 2. Bottom-up Module. To overcome the limitations of a unidirectional information flow, a bottom-up module was designed in this paper. The bottom-up module consisted mainly of three parts: max pooling, lateral connections, and cross-stage connections. The purple arrows in the figure represent max pooling, and the other arrows show the cross-level connections. Through cross-level connections, more features could be fused at no additional cost. The feature map after max pooling, the feature map after the cross-level connection, and the feature map obtained in the top-down module were added pixel by pixel, and the result was passed through a convolution for feature re-extraction to obtain the output of the bottom-up module.
- 3. Layer-by-Layer Connection Module. To improve the efficiency of network detection, the connection module fused the feature maps generated by the bottom-up module by summation and then performed feature re-extraction by convolution to generate the fused feature map.
2.2. Hybrid Attention Module
2.2.1. Spatial Attention Module
- Input the feature map, where the superscripts indicate the height, width, and number of channels of the feature map.
- Convolve the input feature map to generate the spatial feature map.
- Compress the feature map along the channel direction to generate a single-channel feature map.
- Generate the spatial attention weight map by pooling, convolution, and reshaping operations.
- Obtain the spatial attention map by multiplying the spatial attention weight map with the spatial feature map; then add the spatial attention map to the input to obtain the final output of the spatial attention subnetwork.
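The steps above can be sketched as follows. This is an illustrative stand-in, not the authors' implementation: the learned convolutions are replaced by an identity and a channel-wise mean, and a sigmoid is assumed for normalizing the weight map.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(x):
    # x: feature map with shape (C, H, W).
    spatial_feat = x                           # stand-in for the conv output
    squeezed = x.mean(axis=0, keepdims=True)   # compress the channel direction
    weights = sigmoid(squeezed)                # (1, H, W) spatial weight map
    attended = weights * spatial_feat          # re-weight every spatial location
    return attended + x                        # residual connection to the input

out = spatial_attention(np.random.randn(8, 16, 16))
assert out.shape == (8, 16, 16)
```

The weight map has a single channel, so each spatial position receives one scalar weight that is broadcast across all channels.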
2.2.2. Channel Attention Module
- Input the feature map, where the superscripts indicate the height, width, and number of channels of the feature map.
- Perform a convolution on the input feature map to generate the channel-domain basic feature map.
- Compute global maximum pooling and global average pooling over the basic feature map.
- Compress the channels through a squeeze convolution, and then restore the feature maps to the original channel dimension through fully connected layers to obtain the outputs of the global maximum pooling branch and the global average pooling branch, respectively.
- Add the outputs of the global maximum pooling branch and the global average pooling branch pixel by pixel to obtain the channel-domain attention weight map.
- Obtain the channel-domain attention map by multiplying the channel-domain attention weight map with the channel-domain basic feature map; then add the channel-domain attention map to the input to obtain the final output of the channel-domain attention subnetwork.
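The channel branch can be sketched in the same style. Again this is only an assumed minimal form: random matrices stand in for the learned squeeze/expand layers, the reduction ratio `r` is a hypothetical parameter, and a sigmoid is assumed for the weight map.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, r=4):
    # x: feature map with shape (C, H, W); r: assumed channel reduction ratio.
    c = x.shape[0]
    w1 = rng.standard_normal((c // r, c)) * 0.1  # squeeze weights (illustrative)
    w2 = rng.standard_normal((c, c // r)) * 0.1  # expand weights (illustrative)
    gmp = x.max(axis=(1, 2))                     # global maximum pooling -> (C,)
    gap = x.mean(axis=(1, 2))                    # global average pooling -> (C,)
    branch_max = w2 @ np.maximum(w1 @ gmp, 0)    # squeeze, ReLU, expand
    branch_avg = w2 @ np.maximum(w1 @ gap, 0)
    weights = sigmoid(branch_max + branch_avg)   # per-channel attention weights
    attended = weights[:, None, None] * x        # re-weight every channel
    return attended + x                          # residual connection

out = channel_attention(rng.standard_normal((8, 6, 6)))
assert out.shape == (8, 6, 6)
```

In contrast to the spatial branch, the weight vector here has one scalar per channel, broadcast over all spatial positions.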
2.3. Detection Network
3. Experiments and Results
3.1. Experiment Platform
3.2. Ablation Study
3.3. Structural Experiments
3.4. Evaluation of TT100K
3.4.1. Comparative Experiment of Similar Algorithm
3.4.2. Different Types of Traffic Sign Detection
4. Discussion
- (1) The meanings of traffic signs are expressed through shapes, colors, graphics, and words, and signs with the same meaning in different countries or regions may differ in all four. It is therefore a challenge for a network to recognize not only the categories present in the training set but also data not previously seen. In the next step of this study, the transfer capability of the network should be investigated so that the network can be applied more widely.
- (2) When testing the network, experiments were conducted only under everyday conditions, such as normal conditions, partial occlusion, and uneven lighting. However, during actual driving, traffic sign detection in bad weather is more important for driving safety. Therefore, more challenging environments, such as fog, haze, rain, or snow, should be selected for experimental analysis in future work.
- (3) While driving, a driver pays different amounts of attention to the traffic signs in view, focusing on those relevant to the current maneuver and ignoring the rest. For example, at an intersection where the planned route is straight ahead, the signs for a right-turn lane may also be in the driver's field of vision, but the driver automatically ignores this information because it is not meaningful for the behavioral decision, which reduces the information-processing burden on the brain. The next step of this research should therefore try to construct a network that assigns different attention to the traffic signs in an image, further reducing information redundancy and making the network more suitable for complex traffic environments.
5. Conclusions
- (1) To address the high real-time requirements of traffic sign detection and the redundancy of existing convolutional neural networks, a lightweight feature extraction network was designed, and a key-point detection method was adopted in place of the original anchor-box traversal method. In addition, a feature-interleaving module was designed to realize multiscale extraction of feature information, addressing the problems that traffic sign sizes in a traffic scene vary widely and that the semantic information obtained by existing networks is limited.
- (2) To improve detection when a traffic sign occupied a small portion of the image, was densely arranged, or was surrounded by excessive background information, a hybrid attention module was designed and constructed. It was divided into a spatial attention branch and a channel attention branch, which assigned different weights to different spatial locations and to different channels, respectively.
- (3) Experiments showed that the proposed algorithm achieved 85% recognition accuracy for targets of different scales and for most categories. Compared with Faster R-CNN, CornerNet, and CenterNet, the recall and precision of the proposed algorithm were significantly higher, and better real-time performance was achieved. Therefore, the proposed network was robust, had high recognition accuracy, and achieved good real-time performance.
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Lu, Q.-C.; Zhang, L.; Xu, P.-C.; Cui, X.; Li, J. Modeling network vulnerability of urban rail transit under cascading failures: A Coupled Map Lattices approach. Reliab. Eng. Syst. Saf. 2022, 221, 108320.
- Zhang, W.; Wang, Q.; Fan, H.; Tang, Y. Contextual and Multi-Scale Feature Fusion Network for Traffic Sign Detection. In Proceedings of the 2020 10th Institute of Electrical and Electronics Engineers International Conference on Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), Xi’an, China, 10–13 October 2020.
- Sun, W.; Du, H.; Zhang, X.; Zhao, Y.; Yang, C. Traffic Sign Recognition Method based on Multi-layer Feature CNN and Extreme Learning Machine. J. Univ. Electron. Sci. Technol. China 2018, 47, 343–349.
- Wu, L.; Li, H.; He, J.; Chen, X. Traffic Sign Detection Method Based on Faster R-CNN. J. Phys. Conf. Ser. 2019, 1176, 32045.
- Li, H.; Sun, F.; Liu, L.; Wang, L. A Novel Traffic Sign Detection Method via Color Segmentation and Robust Shape Matching. Neurocomputing 2015, 169, 77–88.
- Yu, L.; Xia, X.; Zhou, K. Traffic Sign Detection Based on Visual Co-saliency in Complex Scenes. Appl. Intell. 2019, 49, 764–790.
- Yu, C.; Hou, J.; Hou, C. Traffic Sign Detection Based on Saliency Map and Fourier Descriptor. Comput. Eng. 2017, 43, 28–34.
- Zhang, F.; Ji, R.; Jiao, S.; Qi, K. A Novel Saliency Computation Model for Traffic Sign Detection. In Proceedings of the 2017 2nd International Conference on Image, Vision and Computing (ICIVC), Chengdu, China, 2–4 June 2017.
- Yin, S.; Deng, J.; Zhang, D.; Du, J.-Y. Traffic Sign Recognition Based on Deep Convolutional Neural Network. Optoelectron. Lett. 2017, 13, 476–480.
- Zhu, Z.; Liang, D.; Zhang, S.; Huang, X.; Li, B.; Hu, S. Traffic-Sign Detection and Classification in the Wild. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
- Xie, K.; Ge, S.; Ye, Q.; Luo, Z. Traffic Sign Recognition Based on Attribute-Refinement Cascaded Convolutional Neural Networks. In Proceedings of the Pacific Rim Conference on Multimedia, Xi’an, China, 15–16 September 2016.
- Zhu, Y.; Zhang, C.; Zhou, D.; Wang, X.; Bai, X.; Liu, W. Traffic Sign Detection and Recognition Using Fully Convolutional Network Guided Proposals. Neurocomputing 2016, 214, 758–766.
- Zhang, Z.; Zhou, X.; Chan, S.; Chen, S.; Liu, H. Faster R-CNN for Small Traffic Sign Detection. In CCF Chinese Conference on Computer Vision; Springer: Singapore, 2017; pp. 155–165.
- Zuo, Z.; Yu, K.; Zhou, Q.; Wang, X.; Li, T. Traffic Signs Detection Based on Faster R-CNN. In Proceedings of the 2017 IEEE 37th International Conference on Distributed Computing Systems Workshops (ICDCSW), Atlanta, GA, USA, 5–8 June 2017; pp. 286–288.
- Luo, H.; Yang, Y.; Tong, B.; Wu, F.; Fan, B. Traffic Sign Recognition Using A Multi-Task Convolutional Neural Network. IEEE Trans. Intell. Transp. Syst. 2018, 19, 1100–1111.
- Zhu, Y.; Liao, M.; Yang, M.; Liu, W. Cascaded Segmentation-Detection Networks for Text-Based Traffic Sign Detection. IEEE Trans. Intell. Transp. Syst. 2018, 19, 209–219.
- Cheng, P.; Liu, W.; Zhang, Y.; Ma, H. LOCO: Local Context Based Faster R-CNN for Small Traffic Sign Detection. In Proceedings of the International Conference on Multimedia Modeling, Bangkok, Thailand, 5–7 February 2018; Springer: Cham, Switzerland; pp. 329–341.
- Pei, S.; Tang, F.; Ji, Y.; Fan, J.; Ning, Z. Localized Traffic Sign Detection with Multi-scale Deconvolution Networks. In Proceedings of the 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), Tokyo, Japan, 23–27 July 2018; pp. 1–7.
- Li, J.; Liang, X.; Wei, Y.; Xu, T.; Feng, J.; Yan, S. Perceptual Generative Adversarial Networks for Small Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1951–1959.
- Heng, L.; Qing, K. Traffic Sign Image Synthesis with Generative Adversarial Networks. In Proceedings of the 24th International Conference on Pattern Recognition, Beijing, China, 20–24 August 2018; pp. 2540–2545.
- Xiang, C.; Zhang, L.; Tang, Y.; Zou, W.; Xu, C. MS-CapsNet: A Novel Multi-Scale Capsule Network. IEEE Signal Process. Lett. 2018, 25, 1850–1854.
- Zhang, J.; Xie, Z.; Sun, J.; Zou, X.; Wang, J. A Cascaded R-CNN With Multiscale Attention and Imbalanced Samples for Traffic Sign Detection. IEEE Access 2020, 8, 29742–29754.
- Yuan, Y.; Zhi, X.; Qi, W. An Incremental Framework for Video-Based Traffic Sign Detection, Tracking, and Recognition. IEEE Trans. Intell. Transp. Syst. 2017, 18, 1918–1929.
- Lee, H.; Kim, K. Simultaneous Traffic Sign Detection and Boundary Estimation using Convolutional Neural Network. IEEE Trans. Intell. Transp. Syst. 2018, 19, 1652–1663.
- Kong, X.; Zhang, J.; Deng, L.; Liu, Y. Research Advances on Vehicle Parameter Identification Based on Machine Vision. China J. Highw. Transp. 2021, 34, 13–30.
- Zhou, K.; Zhan, Y.; Fu, D. Learning Region-Based Attention Network for Traffic Sign Recognition. Sensors 2021, 21, 686.
- Lian, J.; Yin, Y.; Li, L.; Wang, Z.; Zhou, Y. Small Object Detection in Traffic Scenes Based on Attention Feature Fusion. Sensors 2021, 21, 3031.
- Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet Classification with Deep Convolutional Neural Networks; NIPS; Curran Associates Inc.: New York, NY, USA, 2012; pp. 1097–1105.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. arXiv 2018, arXiv:1608.06993.
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level Accuracy with 50× Fewer Parameters and <0.5MB Model Size. arXiv 2016, arXiv:1602.07360.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
- Zhang, M.; Xu, S.; Song, W.; He, Q.; Wei, Q. Lightweight Underwater Object Detection Based on YOLO v4 and Multi-Scale Attentional Feature Fusion. Remote Sens. 2021, 13, 4706.
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. arXiv 2017, arXiv:1707.01083.
- Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807.
- Howard, A.; Sandler, M.; Chen, B.; Wang, W.; Chen, L.-C.; Tan, M.; Chu, G.; Vasudevan, V.; Zhu, Y.; Pang, R.; et al. Searching for MobileNetV3. arXiv 2019, arXiv:1905.02244.
- Howard, A.; Zhmoginov, A.; Chen, L.C.; Sandler, M.; Zhu, M. Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation. arXiv 2019, arXiv:1801.04381.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10778–10787.
- Guo, M.-H.; Xu, T.-X.; Liu, J.-J.; Liu, Z.-N.; Jiang, P.-T.; Mu, T.-J.; Zhang, S.-H.; Martin, R.R.; Cheng, M.-M.; Hu, S.-M. Attention Mechanisms in Computer Vision: A Survey. Comput. Vis. Media 2022, 8, 331–368.
- Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint Triplets for Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 6569–6578.
- Ghiasi, G.; Lin, T.Y.; Le, Q.V. NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 7029–7038.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. arXiv 2018, arXiv:1807.06521.
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 3141–3149.
- Law, H.; Deng, J. CornerNet: Detecting Objects as Paired Keypoints. Int. J. Comput. Vis. 2020, 128, 642–656.
| Model | Avg. Accuracy (%) Small | Avg. Accuracy (%) Mid | Avg. Accuracy (%) Large | Recall (%) Small | Recall (%) Mid | Recall (%) Large | F1 (%) Small | F1 (%) Mid | F1 (%) Large |
|---|---|---|---|---|---|---|---|---|---|
| Model 1 | 82.7 | 91.1 | 89.8 | 78.6 | 87.9 | 87.5 | 80.6 | 89.5 | 88.6 |
| Model 2 | 83.3 | 91.7 | 90.4 | 80.8 | 88.6 | 88.2 | 82.0 | 90.1 | 89.2 |
| Model 3 | 83.4 | 91.8 | 90.6 | 81.5 | 89.8 | 89.0 | 82.4 | 90.7 | 89.8 |
| Model 4 | 84.7 | 93.1 | 91.9 | 82.6 | 91.3 | 89.8 | 83.6 | 92.2 | 90.8 |
| Model 5 | 84.1 | 91.8 | 91.3 | 81.6 | 90.8 | 88.9 | 82.8 | 91.3 | 90.1 |
| Model 6 | 84.6 | 92.7 | 91.4 | 84.5 | 92.8 | 91.7 | 84.6 | 92.7 | 91.5 |
| Model 7 | 85.0 | 92.8 | 91.9 | 85.2 | 93.8 | 93.1 | 85.1 | 93.3 | 92.5 |
| Model 8 | 86.7 | 93.7 | 92.8 | 89.8 | 94.9 | 94.7 | 88.2 | 94.3 | 93.7 |
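The F1 columns in the table above are consistent with the usual harmonic mean of precision and recall; e.g., Model 1 on small targets has an accuracy of 82.7% and a recall of 78.6%, which yields the listed F1 of 80.6%:

```python
def f1(p, r):
    # Harmonic mean of precision and recall, both given in percent.
    return 2 * p * r / (p + r)

print(round(f1(82.7, 78.6), 1))  # -> 80.6 (Model 1, small targets)
```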
| Model | Avg. Accuracy (%) Small | Avg. Accuracy (%) Mid | Avg. Accuracy (%) Large | Recall (%) Small | Recall (%) Mid | Recall (%) Large | F1 (%) Small | F1 (%) Mid | F1 (%) Large |
|---|---|---|---|---|---|---|---|---|---|
| Model 1 | 84.5 | 92.6 | 92.1 | 82.7 | 91.8 | 88.9 | 83.5 | 92.2 | 90.5 |
| Model 2 | 84.9 | 92.9 | 92.5 | 83.6 | 92.2 | 89.4 | 84.2 | 92.5 | 91.0 |
| Model 3 | 85.6 | 93.2 | 91.8 | 87.7 | 93.5 | 92.1 | 86.6 | 93.3 | 91.9 |
| Model 4 | 84.9 | 92.0 | 91.6 | 83.9 | 89.3 | 88.5 | 84.4 | 90.6 | 90.0 |
| Model 5 | 85.2 | 92.3 | 91.8 | 84.2 | 91.7 | 91.2 | 84.7 | 92.0 | 91.5 |
| Model 6 | 85.7 | 92.8 | 92.3 | 86.2 | 93.9 | 94.6 | 85.9 | 93.3 | 93.0 |
| Model 7 | 85.2 | 93.2 | 92.5 | 89.3 | 94.6 | 93.6 | 87.2 | 93.9 | 93.0 |
| Model 8 | 86.0 | 93.3 | 91.5 | 89.5 | 94.8 | 94.4 | 87.7 | 94.0 | 92.9 |
| Model 9 | 86.7 | 93.7 | 92.8 | 89.8 | 94.9 | 94.7 | 88.2 | 94.3 | 93.7 |
| Method | Avg. Accuracy (%) Small | Avg. Accuracy (%) Mid | Avg. Accuracy (%) Large | Recall (%) Small | Recall (%) Mid | Recall (%) Large | FPS |
|---|---|---|---|---|---|---|---|
| Faster R-CNN | 76.5 | 88.7 | 88.2 | 76.2 | 89.3 | 88.6 | 12.0 |
| CornerNet | 79.6 | 91.2 | 91.0 | 78.3 | 90.4 | 91.8 | 4.3 |
| CenterNet | 82.7 | 91.0 | 90.8 | 78.7 | 91.5 | 93.6 | 7.9 |
| Ours | 86.7 | 95.5 | 95.0 | 92.8 | 97.1 | 96.5 | 10.8 |
| Classes | p3 | p5 | p6 | p10 | p12 | p19 | p23 | p26 | p27 | pg |
|---|---|---|---|---|---|---|---|---|---|---|
| Precision (%) | 87.0 | 94.1 | 86.5 | 86.2 | 76.2 | 93.2 | 92.1 | 90.2 | 91.2 | 90.2 |

| Classes | ph4 | ph4.5 | ph5 | pl20 | pl30 | pl40 | pl50 | pl60 | pl70 | pl80 |
|---|---|---|---|---|---|---|---|---|---|---|
| Precision (%) | 80.2 | 90.5 | 69.5 | 82.3 | 90.3 | 90.0 | 85.5 | 90.2 | 86.8 | 92.0 |

| Classes | pl100 | pl120 | pm20 | pm30 | pm55 | pn | pne | pr40 | w13 | w32 |
|---|---|---|---|---|---|---|---|---|---|---|
| Precision (%) | 96.5 | 97.5 | 90.5 | 90.9 | 93.9 | 86.9 | 96.8 | 93.6 | 70.2 | 90.2 |

| Classes | w55 | w57 | w59 | i2 | i4 | i5 | il60 | il80 | il100 | ip |
|---|---|---|---|---|---|---|---|---|---|---|
| Precision (%) | 88.5 | 91.9 | 87.5 | 84.6 | 86.5 | 87.3 | 92.2 | 96.5 | 95.3 | 85.6 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lin, S.; Zhang, Z.; Tao, J.; Zhang, F.; Fan, X.; Lu, Q. Traffic Sign Detection Based on Lightweight Multiscale Feature Fusion Network. Sustainability 2022, 14, 14019. https://doi.org/10.3390/su142114019