Research on an Intelligent Driving Algorithm Based on the Double Super-Resolution Network
Abstract
1. Introduction
2. Materials and Methods
2.1. Dual Super-Resolution Learning
2.2. You Only Look One-Level Feature
- (1) The C5 feature alone provides sufficient semantic information for detecting objects at different scales, which enables a single-in, multiple-out (SiMo) encoder to achieve results comparable to a multiple-in, multiple-out (MiMo) encoder; a sketch of a single-level encoder in this spirit follows this list;
- (2) The benefit of multi-scale feature fusion is far less important than that of the divide-and-conquer strategy, so multi-scale feature fusion may not be the most significant contribution of FPN.
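Building on these observations, YOLOF replaces the feature pyramid with a single-in, single-out dilated encoder that operates on C5 only. The following is a minimal, hypothetical PyTorch sketch in that spirit, not the authors' implementation; class names, channel widths, and dilation rates are illustrative assumptions. Stacked dilated residual blocks enlarge the receptive field so that one feature level can cover objects of many scales.

```python
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    """A 3x3 dilated conv sandwiched between 1x1 convs, with a residual add."""
    def __init__(self, channels: int, dilation: int, mid_channels: int = 128):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, mid_channels, 1),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, 3,
                      padding=dilation, dilation=dilation),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, channels, 1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)

class SingleLevelEncoder(nn.Module):
    """Single-in, single-out encoder: consumes only C5, no feature pyramid."""
    def __init__(self, c5_channels: int = 2048, out_channels: int = 512):
        super().__init__()
        self.project = nn.Sequential(
            nn.Conv2d(c5_channels, out_channels, 1),
            nn.BatchNorm2d(out_channels),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
        )
        # Increasing dilation rates widen the receptive field on one level.
        self.blocks = nn.Sequential(
            *[DilatedResidualBlock(out_channels, d) for d in (2, 4, 6, 8)]
        )

    def forward(self, c5):
        return self.blocks(self.project(c5))

# Example: a ResNet-style C5 map at 1/32 resolution of a 640x640 input.
c5 = torch.randn(1, 2048, 20, 20)
print(SingleLevelEncoder()(c5).shape)  # torch.Size([1, 512, 20, 20])
```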
2.3. Involution
2.4. Network Structure
2.5. Loss Function
3. Results
3.1. Construction of Dataset
3.2. Network Model Evaluation Index
- (1) PA (Pixel Accuracy): the ratio of correctly classified pixels to all pixels, as shown in Formula (5);
- (2) MPA (Mean Pixel Accuracy): the pixel accuracy computed per class and then averaged over all classes, as shown in Formula (6);
- (3) MIOU (Mean Intersection over Union): the intersection of the ground truth and the prediction divided by their union, averaged over all classes, as shown in Formula (7);
- (4) DICE: twice the intersection of the prediction and the ground truth, divided by the sum of the prediction and the ground truth, as shown in Formula (8), where X represents the ground-truth region and Y the predicted region; standard forms of Formulas (5)–(8) are given below.
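Formulas (5)–(8) are referenced above but not reproduced here; the block below gives the standard definitions that these descriptions correspond to. The symbols are assumed rather than taken from the original: k + 1 is the number of classes, p_{ij} is the number of pixels of class i predicted as class j, and X and Y are the ground-truth and predicted regions.

```latex
% Standard segmentation metrics; symbol names are assumed, not the original's.
\begin{align}
\mathrm{PA}   &= \frac{\sum_{i=0}^{k} p_{ii}}{\sum_{i=0}^{k}\sum_{j=0}^{k} p_{ij}} \tag{5} \\
\mathrm{MPA}  &= \frac{1}{k+1}\sum_{i=0}^{k} \frac{p_{ii}}{\sum_{j=0}^{k} p_{ij}} \tag{6} \\
\mathrm{MIoU} &= \frac{1}{k+1}\sum_{i=0}^{k}
                 \frac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}} \tag{7} \\
\mathrm{DICE} &= \frac{2\,|X \cap Y|}{|X| + |Y|} \tag{8}
\end{align}
```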
3.3. Analysis of Training Results
- (1) The IOU value increased from 91.17% to 95.23%;
- (2) The PA value increased from 94.42% to 98.99%;
- (3) The DICE value increased from 56.59% to 60.49%.
- (1) The network with VGG16 as the backbone achieves an effect similar to ResNet50 and ResNet101 with far fewer parameters. On the lane-line portion of the road, the segmentation accuracy of VGG16 and ResNet50 is comparable: both segment the lane lines clearly and outperform ResNet101. On the tire shape at the bottom of the car, VGG16 is slightly more accurate than ResNet50 and ResNet101 and fits the tire contour better.
- (2) The network with CSPDarkNet53 as the backbone segments both the lane lines and the tires at the bottom of the vehicle more accurately than VGG16, ResNet50, and ResNet101, and fits the lane-line and tire shapes more closely.
- (1) Only half of the input channels take part in the convolutional computation, which greatly reduces the amount of computation and memory consumption;
- (2) During back-propagation, a completely independent gradient path is added, which prevents the feature loss caused by excessive convolution and avoids reusing duplicated gradient information; a minimal sketch of such a cross-stage partial block is given below.
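These two properties follow from the cross-stage partial (CSP) design used by CSPDarkNet53. Below is a minimal, hypothetical PyTorch sketch of a CSP-style block, not the authors' code; names and sizes are illustrative. The input channels are split in half, only one half passes through the convolutional stage, and the bypassed half rejoins at the concatenation, so its gradient flows back on a path independent of the stage.

```python
import torch
import torch.nn as nn

class CSPBlock(nn.Module):
    """Cross-stage partial block: convolve half the channels, bypass the rest."""
    def __init__(self, channels: int, num_convs: int = 2):
        super().__init__()
        half = channels // 2
        layers = []
        for _ in range(num_convs):
            layers += [
                nn.Conv2d(half, half, 3, padding=1, bias=False),
                nn.BatchNorm2d(half),
                nn.ReLU(inplace=True),
            ]
        self.stage = nn.Sequential(*layers)           # sees only half the input
        self.fuse = nn.Conv2d(channels, channels, 1)  # transition after concat

    def forward(self, x):
        # `skip` reaches the output untouched, giving it an independent
        # gradient path and halving the work done by the conv stage.
        skip, dense = torch.chunk(x, 2, dim=1)
        return self.fuse(torch.cat([skip, self.stage(dense)], dim=1))

x = torch.randn(1, 64, 80, 80)
print(CSPBlock(64)(x).shape)  # torch.Size([1, 64, 80, 80])
```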
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Saleh, M.; Hatzopoulou, M. Greenhouse gas emissions attributed to empty kilometers in automated vehicles. Transp. Res. Part D Transp. Environ. 2020, 88, 102567.
- Gawron, J.H.; Keoleian, G.A.; De Kleine, R.D.; Wallington, T.J.; Kim, H.C. Deep decarbonization from electrified autonomous taxi fleets: Life cycle assessment and case study in Austin, TX. Transp. Res. Part D Transp. Environ. 2019, 73, 130–141.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241.
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv 2014, arXiv:1412.7062.
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Adam, H.; Schroff, F. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 June 2017; pp. 2881–2890.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Yu, J.; Yu, Z. Mono-Vision Based Lateral Localization System of Low-Cost Autonomous Vehicles Using Deep Learning Curb Detection. Actuators 2021, 10, 57.
- Mehta, S.; Rastegari, M.; Caspi, A.; Shapiro, L.; Hajishirzi, H. Espnet: Efficient spatial pyramid of dilated convolutions for semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 552–568.
- Mehta, S.; Rastegari, M.; Shapiro, L.; Hajishirzi, H. Espnetv2: A light-weight, power efficient, and general purpose convolutional neural network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9190–9200.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
- Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131.
- Han, S.; Mao, H.; Dally, W.J. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv 2015, arXiv:1510.00149.
- Li, D.; Hu, J.; Wang, C.; Zhu, L.; Zhang, T.; Chen, Q. Involution: Inverting the inherence of convolution for visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 12321–12330.
- Wang, L.; Li, D.; Zhu, Y.; Shan, Y.; Tian, L. Dual super-resolution learning for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3774–3783.
- Kim, S.W.; Kook, H.K.; Sun, J.Y.; Kang, M.C.; Ko, S.J. Parallel feature pyramid network for object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 234–250.
- Chen, Q.; Wang, Y.; Yang, T.; Zhang, X.; Cheng, J.; Sun, J. You only look one-level feature. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13039–13048.
| Model | Estimated Total Size (MB) | Params Size (MB) |
|---|---|---|
| DSRL | 8091.60 | 231.03 |
| MY | 5438.59 | 40.88 |
| Evaluating Indicator | DSRL | MY |
|---|---|---|
| IOU | 91.17% | 95.23% |
| PA | 94.42% | 98.99% |
| DICE | 56.59% | 60.49% |
| Evaluating Indicator | DSRL | MY |
|---|---|---|
| IOU | 95.23% | 92.25% |
| PA | 98.99% | 97.86% |
| DICE | 60.49% | 60.25% |
| Backbone | Estimated Total Size (MB) | Params Size (MB) |
|---|---|---|
| VGG16 | 3751.10 | 76.87 |
| ResNet50 | 6948.62 | 113.41 |
| ResNet101 | 6517.05 | 185.86 |
| CSPDarkNet53 | 5438.59 | 40.88 |
| Backbone | IOU | PA | DICE |
|---|---|---|---|
| VGG16 | 94.38% | 96.51% | 60.21% |
| ResNet50 | 92.25% | 97.65% | 60.25% |
| ResNet101 | 91.44% | 96.55% | 60.22% |
| CSPDarkNet53 | 95.23% | 96.55% | 60.49% |
| Image | DSRL (s) | MY (s) |
|---|---|---|
| Picture 1 | 1.36 | 1.11 |
| Picture 2 | 2.05 | 1.70 |
| Picture 3 | 2.25 | 1.71 |
| Picture 4 | 2.50 | 1.72 |
| Picture 5 | 2.16 | 1.73 |