Efficient Roadside Vehicle Line-Pressing Identification in Intelligent Transportation Systems with Mask-Guided Attention
Abstract
1. Introduction
- We propose VLPI-RC, a large-scale dataset containing diverse scenarios, enabling a more comprehensive evaluation of model performance.
- We propose a method that fuses vehicle features with lane line features and performs vehicle line-pressing identification end-to-end. Automated feature learning improves the model’s efficiency and allows it to adapt to more complex environments.
- We introduce a mask-guided attention mechanism that uses lane line masks as prior information, enabling the model to capture the relationship between vehicle and lane line features more effectively and to focus on the image regions where line-pressing occurs (an illustrative sketch follows this list).
- We propose BBCL to address the data imbalance issue and introduce a hard example mining strategy into contrastive learning, helping the model learn more discriminative features.
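As a rough illustration of the mask-guided attention idea referenced above, the PyTorch sketch below uses a lane line mask, resized to the feature resolution, as a spatial prior that re-weights the fused vehicle/lane features. The layer sizes, the residual re-weighting, and the module name are illustrative assumptions; the authors’ exact design is given in Section 4.4.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskGuidedAttention(nn.Module):
    """Illustrative mask-guided spatial attention (not the paper's exact module)."""

    def __init__(self, channels: int):
        super().__init__()
        # small conv head that turns features + mask prior into a spatial attention map
        self.attn = nn.Sequential(
            nn.Conv2d(channels + 1, channels // 4, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, kernel_size=1),
        )

    def forward(self, feats: torch.Tensor, lane_mask: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) fused vehicle/lane features; lane_mask: (B, 1, H0, W0) binary mask
        mask = F.interpolate(lane_mask.float(), size=feats.shape[-2:], mode="nearest")
        logits = self.attn(torch.cat([feats, mask], dim=1))   # (B, 1, H, W)
        weights = torch.sigmoid(logits + mask)                 # bias attention toward lane pixels
        return feats + feats * weights                         # residual re-weighting
```

In use, the backbone’s fused feature map and the lane line mask of the same vehicle crop would pass through such a module before pooling and classification.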
2. Related Work
3. VLPI-RC Dataset
- BrnoCompSpeed [20]: This dataset consists of 21 videos collected by traffic cameras, each about 60 min long at a resolution of 1920 × 1080. It covers seven different scenes, each recorded from three perspectives (left, middle, and right), as shown in Figure 3a. We sampled 11,139 video frames for data annotation.
- Private Dataset: We collected 4032 images at a resolution of 1920 × 1080 using multiple roadside cameras deployed in Beijing and Shanghai, China, covering diverse traffic scenarios such as highways and urban intersections. As illustrated in Figure 3d, the dataset also includes recordings under varying environmental conditions, including rainy weather and nighttime, to comprehensively evaluate the robustness of the proposed method in real-world settings.
4. Method
4.1. Overview
4.2. Robust Input Augmentation
4.3. Feature Fusion Module
4.4. Mask-Guided Attention Module
4.5. Learning Balanced and Discriminative Features
5. Experiments
5.1. Implementation Details
5.2. Performance Metric
5.3. Comparison with State-of-the-Art Methods
5.4. Ablation Study
5.4.1. Impact of Loss Function
5.4.2. Impact of Attention Mechanisms
5.4.3. Impact of Backbone Networks
5.4.4. Impact of Image Size
5.4.5. Impact of Data Quantity
5.4.6. Impact of Inaccurate Bounding Boxes
5.5. Identification Speed
6. Conclusions and Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Mittal, U.; Chawla, P.; Tiwari, R. EnsembleNet: A hybrid approach for vehicle detection and estimation of traffic density based on faster R-CNN and YOLO models. Neural Comput. Appl. 2023, 35, 4755–4774.
- Jin, G.; Wang, M.; Zhang, J.; Sha, H.; Huang, J. STGNN-TTE: Travel time estimation via spatial–temporal graph neural network. Future Gener. Comput. Syst. 2022, 126, 70–81.
- Yao, A.; Huang, M.; Qi, J.; Zhong, P. Attention mask-based network with simple color annotation for UAV vehicle re-identification. IEEE Geosci. Remote Sens. Lett. 2021, 19, 8014705.
- Zhu, W.; Wang, Z.; Wang, X.; Hu, R.; Liu, H.; Liu, C.; Wang, C.; Li, D. A Dual Self-Attention mechanism for vehicle re-Identification. Pattern Recognit. 2023, 137, 109258.
- Yu, L.; Du, B.; Hu, X.; Sun, L.; Han, L.; Lv, W. Deep spatio-temporal graph convolutional network for traffic accident prediction. Neurocomputing 2021, 423, 135–147.
- Zhao, C.; Chang, X.; Xie, T.; Fujita, H.; Wu, J. Unsupervised anomaly detection based method of risk evaluation for road traffic accident. Appl. Intell. 2023, 53, 369–384.
- 5GAA. C-V2X Use Cases and Service Level Requirements Volume II. 2023. Available online: https://5gaa.org/c-v2x-use-cases-and-service-level-requirements-volume-ii (accessed on 22 April 2025).
- Lee, H.; Jeong, S.; Lee, J. Robust detection system of illegal lane changes based on tracking of feature points. IET Intell. Transp. Syst. 2013, 7, 20–27.
- HD, A.K.; Prabhakar, C. Vehicle abnormality detection and classification using model based tracking. Int. J. Adv. Res. Comput. Sci. 2017, 8, 842.
- Arun Kumar, H.D.; Prabhakar, C.J. Detection and Tracking of Lane Crossing Vehicles in Traffic Video for Abnormality Analysis. Int. J. Eng. Adv. Technol. 2021, 10, 1–9.
- Zhou, Z.; Li, R.; Gao, Y.; Zhang, C.; Hei, X. SLDNet: A Branched, Spatio-Temporal Convolution Neural Network for Detecting Solid Line Driving Violation in Intelligent Transportation Systems. In Proceedings of the 2020 Information Communication Technologies Conference (ICTC), Nanjing, China, 29–31 May 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 313–317.
- Gao, F.; Zhou, M.; Weng, L.; Lu, S. An automatic verification method for vehicle line-pressing violation based on CNN and geometric projection. J. Ambient. Intell. Humaniz. Comput. 2021, 14, 1889–1901.
- Wu, S.; Ge, F.; Zhang, Y. A Vehicle Line-Pressing Detection Approach Based on YOLOv5 and DeepSort. In Proceedings of the 2022 IEEE 22nd International Conference on Communication Technology (ICCT), Nanjing, China, 11–14 November 2022; pp. 1745–1749.
- Zheng, G.; Lin, J.; Qin, Y.; Tan, B. A novel vehicle line-pressing detection framework based on 3D object detection. In Proceedings of the Fourth International Conference on Signal Processing and Computer Science (SPCS 2023), Guilin, China, 25–27 August 2023; SPIE: Bellingham, WA, USA, 2023; Volume 12970, pp. 243–250.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Neven, D.; De Brabandere, B.; Georgoulis, S.; Proesmans, M.; Van Gool, L. Towards end-to-end lane detection: An instance segmentation approach. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 286–291.
- Li, G.; Qiu, Y.; Yang, Y.; Li, Z.; Li, S.; Chu, W.; Green, P.; Li, S.E. Lane Change Strategies for Autonomous Vehicles: A Deep Reinforcement Learning Approach Based on Transformer. IEEE Trans. Intell. Veh. 2023, 8, 2197–2211.
- Biparva, M.; Fernández-Llorca, D.; Gonzalo, R.I.; Tsotsos, J.K. Video Action Recognition for Lane-Change Classification and Prediction of Surrounding Vehicles. IEEE Trans. Intell. Veh. 2022, 7, 569–578.
- Zhang, X.; Li, Y.; Zhan, R.; Chen, J.; Li, J. The Line Pressure Detection for Autonomous Vehicles Based on Deep Learning. J. Adv. Transp. 2022, 2022, 4489770.
- Sochor, J.; Juránek, R.; Špaňhel, J.; Maršík, L.; Široký, A.; Herout, A.; Zemčík, P. Comprehensive Data Set for Automatic Single Camera Visual Speed Measurement. IEEE Trans. Intell. Transp. Syst. 2019, 20, 1633–1643.
- Dong, Z.; Wu, Y.; Pei, M.; Jia, Y. Vehicle Type Classification Using a Semisupervised Convolutional Neural Network. IEEE Trans. Intell. Transp. Syst. 2015, 16, 2247–2256.
- Guerrero-Gomez-Olmedo, R.; Lopez-Sastre, R.J.; Maldonado-Bascon, S.; Fernandez-Caballero, A. Vehicle Tracking by Simultaneous Detection and Viewpoint Estimation. In Proceedings of the 5th International Work-Conference on the Interplay Between Natural and Artificial Computation, IWINAC 2013, Mallorca, Spain, 10–14 June 2013; pp. 306–316.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, Red Hook, NY, USA, 4–9 December 2017; pp. 6000–6010.
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7794–7803.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3146–3154.
- Williams, C.K.; Barber, D. Bayesian classification with Gaussian processes. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 1342–1351.
- Qin, Y.; Yan, C.; Liu, G.; Li, Z.; Jiang, C. Pairwise Gaussian loss for convolutional neural networks. IEEE Trans. Ind. Inform. 2020, 16, 6324–6333.
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An imperative style, high-performance deep learning library. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32.
- Sutskever, I.; Martens, J.; Dahl, G.; Hinton, G. On the importance of initialization and momentum in deep learning. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 17–19 June 2013; pp. 1139–1147.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
- Zhang, H.; Wu, C.; Zhang, Z.; Zhu, Y.; Lin, H.; Zhang, Z.; Sun, Y.; He, T.; Mueller, J.; Manmatha, R.; et al. ResNeSt: Split-attention networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 2736–2746.
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6848–6856.
- Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520.
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324.
- Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; Kwon, Y.; Michael, K.; Fang, J.; Yifu, Z.; Wong, C.; Montes, D.; et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation; Zenodo: Geneva, Switzerland, 2022.
- Wang, T.; Xinge, Z.; Pang, J.; Lin, D. Probabilistic and geometric depth: Detecting objects in perspective. In Proceedings of the Conference on Robot Learning (PMLR), Auckland, New Zealand, 14–18 December 2022; pp. 1475–1485.
Dataset | Total Images | Total Samples | Normal Samples | Line-Pressing Samples | Imbalance Ratio |
---|---|---|---|---|---|
BrnoCompSpeed [20] | 11,139 | 22,599 | 18,502 | 4097 | 4.51 |
BIT-Vehicle [21] | 2196 | 2324 | 1795 | 529 | 3.39 |
GRAM-RTM [22] | 939 | 3824 | 3438 | 386 | 8.90 |
Private Dataset | 4050 | 5769 | 4439 | 1330 | 3.33 |
Total | 18,324 | 34,516 | 28,174 | 6342 | 4.44 |
| Dataset | Training (N/L) | Validation (N/L) | Test (N/L) |
|---|---|---|---|
| BrnoCompSpeed | 7400/1638 | 3701/820 | 7401/1639 |
| BIT-Vehicle | 718/211 | 359/106 | 718/212 |
| GRAM-RTM | 1375/154 | 688/78 | 1375/154 |
| Private Dataset | 1775/532 | 888/266 | 1776/532 |
| Total | 11,268/2535 | 5636/1270 | 11,270/2537 |
| Ground Truth | Predicted Normal Class | Predicted Line-Pressing Class |
|---|---|---|
| Normal Class | TP (True Positive) | FN (False Negative) |
| Line-Pressing Class | FP (False Positive) | TN (True Negative) |
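For reference, all of the threshold-based metrics reported below follow directly from this confusion matrix (AUC additionally needs the full score distribution). A minimal helper, taking the normal class as positive exactly as laid out in the table above:

```python
import math

def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Binary metrics from the confusion matrix (normal class treated as positive)."""
    ppv = tp / (tp + fp)                          # positive predictive value (precision)
    npv = tn / (tn + fn)                          # negative predictive value
    sen = tp / (tp + fn)                          # sensitivity (recall)
    spe = tn / (tn + fp)                          # specificity
    acc = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * ppv * sen / (ppv + sen)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) or 1.0
    mcc = (tp * tn - fp * fn) / mcc_den           # Matthews correlation coefficient
    return {"PPV": ppv, "NPV": npv, "SPE": spe, "SEN": sen, "ACC": acc, "F1": f1, "MCC": mcc}
```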
| Sub-Dataset Name | Method | PPV | NPV | SPE | SEN | ACC | F1 * | MCC * | AUC * |
|---|---|---|---|---|---|---|---|---|---|
| BrnoCompSpeed [20] | Original Image † | 72.03 | 89.87 | 95.58 | 51.37 | 87.57 | 59.97 | 0.5391 | 88.54 |
| | H.D et al. [9,10] | 25.19 | 87.72 | 58.59 | 62.97 | 59.38 | 35.98 | 0.1668 | 60.78 |
| | SLDNet [11] | 90.64 | 96.98 | 98.03 | 86.21 | 95.88 | 88.37 | 0.8591 | 98.13 |
| | Gao et al. [12] | 76.32 | 96.29 | 94.26 | 83.59 | 92.32 | 79.79 | 0.7518 | 97.51 |
| | Wu et al. [13] | 77.23 | 96.35 | 94.53 | 83.83 | 92.59 | 80.40 | 0.7593 | 97.84 |
| | Zheng et al. [14] | 86.88 | 96.91 | 97.12 | 86.03 | 95.11 | 86.45 | 0.8347 | 98.17 |
| | Ours | 96.91 | 99.43 | 99.31 | 97.44 | 98.97 | 97.17 | 0.9654 | 99.84 |
| BIT-Vehicle [21] | Original Image † | 71.33 | 86.54 | 94.01 | 50.47 | 84.09 | 59.12 | 0.5074 | 87.59 |
| | H.D et al. [9,10] | 81.33 | 89.92 | 95.68 | 63.68 | 88.39 | 71.43 | 0.6503 | 79.68 |
| | SLDNet [11] | 96.79 | 95.83 | 99.16 | 85.38 | 96.02 | 90.73 | 0.8849 | 97.84 |
| | Gao et al. [12] | 97.42 | 96.88 | 99.30 | 89.15 | 96.99 | 93.10 | 0.9133 | 98.97 |
| | Wu et al. [13] | 95.59 | 97.66 | 98.75 | 91.98 | 97.20 | 93.75 | 0.9198 | 98.92 |
| | Zheng et al. [14] | 98.97 | 97.41 | 99.72 | 91.04 | 97.74 | 94.84 | 0.9353 | 98.88 |
| | Ours | 99.05 | 99.58 | 99.72 | 98.58 | 99.46 | 98.82 | 0.9847 | 99.97 |
| GRAM-RTM [22] | Original Image † | 83.67 | 94.97 | 98.84 | 53.25 | 94.24 | 65.08 | 0.6400 | 92.62 |
| | H.D et al. [9,10] | 12.06 | 92.50 | 44.87 | 67.53 | 47.16 | 20.47 | 0.0753 | 56.20 |
| | SLDNet [11] | 85.93 | 97.27 | 98.62 | 75.32 | 96.27 | 80.28 | 0.7843 | 96.29 |
| | Gao et al. [12] | 89.51 | 98.12 | 98.91 | 83.12 | 97.32 | 86.20 | 0.8478 | 97.10 |
| | Wu et al. [13] | 92.03 | 98.06 | 99.20 | 82.47 | 97.51 | 86.99 | 0.8577 | 97.32 |
| | Zheng et al. [14] | 87.90 | 98.83 | 98.62 | 89.61 | 97.71 | 88.75 | 0.8748 | 97.71 |
| | Ours | 93.17 | 99.71 | 99.20 | 97.40 | 99.02 | 95.24 | 0.9472 | 99.86 |
| Private Dataset | Original Image † | 60.00 | 83.01 | 92.68 | 36.65 | 79.77 | 45.51 | 0.3552 | 77.96 |
| | H.D et al. [9,10] | 32.85 | 87.06 | 55.69 | 72.37 | 59.53 | 45.19 | 0.2363 | 64.03 |
| | SLDNet [11] | 86.20 | 91.91 | 96.57 | 71.62 | 90.81 | 78.23 | 0.7298 | 85.45 |
| | Gao et al. [12] | 86.60 | 93.20 | 96.45 | 76.50 | 91.85 | 81.24 | 0.7630 | 85.74 |
| | Wu et al. [13] | 87.77 | 93.32 | 96.79 | 76.88 | 92.20 | 81.96 | 0.7729 | 86.04 |
| | Zheng et al. [14] | 90.51 | 94.38 | 97.47 | 80.64 | 93.59 | 85.29 | 0.8143 | 86.80 |
| | Ours | 91.64 | 98.41 | 97.41 | 94.74 | 96.79 | 93.16 | 0.9109 | 98.98 |
Loss Function | ACC | F1 | MCC | AUC |
---|---|---|---|---|
Softmax [28] | 97.86 | 94.07 | 0.9278 | 99.40 |
Focal Loss [32] | 98.08 | 94.75 | 0.9358 | 99.66 |
Softmax + BBCL ( = 0.1) | 98.51 | 95.99 | 0.9508 | 99.74 |
Softmax + BBCL ( = 0.05) | 98.65 | 96.34 | 0.9551 | 99.75 |
Softmax + BBCL ( = 0.01) | 98.57 | 96.14 | 0.9526 | 99.75 |
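BBCL itself is defined in Section 4.5 and is not reproduced here. As a hedged sketch of its two stated ingredients, class-balanced weighting and hard example mining inside a contrastive objective, the PyTorch function below illustrates the idea; the temperature, the hard-negative count hard_k, and the weighting scheme are assumptions for illustration, not the authors’ formulation.

```python
import torch
import torch.nn.functional as F

def balanced_contrastive_loss(embeddings: torch.Tensor,
                              labels: torch.Tensor,
                              temperature: float = 0.1,
                              hard_k: int = 16) -> torch.Tensor:
    """Class-balanced supervised contrastive loss with hard-negative mining (illustrative)."""
    z = F.normalize(embeddings, dim=1)                     # (B, D) L2-normalised features
    sim = z @ z.t() / temperature                          # pairwise similarities
    same = labels.unsqueeze(0).eq(labels.unsqueeze(1))
    eye = torch.eye(len(labels), dtype=torch.bool, device=z.device)
    pos, neg = same & ~eye, ~same

    # class-balanced anchor weights: minority-class (line-pressing) anchors count more
    counts = torch.bincount(labels, minlength=2).float().clamp(min=1)
    anchor_w = (counts.sum() / (2.0 * counts))[labels]

    losses = []
    for i in range(len(labels)):
        if not pos[i].any():
            continue
        neg_sim = sim[i][neg[i]]
        if neg_sim.numel() > hard_k:                       # hard example mining:
            neg_sim = neg_sim.topk(hard_k).values          # keep only the hardest negatives
        pos_sim = sim[i][pos[i]]
        denom = torch.cat([pos_sim, neg_sim]).exp().sum()
        losses.append(anchor_w[i] * (-(pos_sim.exp() / denom).log().mean()))
    return torch.stack(losses).mean() if losses else sim.new_zeros(())
```

In the ablation above, such a term is added to the standard softmax cross-entropy loss (the “Softmax + BBCL” rows).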
Method | ACC | F1 | MCC | AUC |
---|---|---|---|---|
ResNet50 | 97.65 | 93.75 | 0.9235 | 99.62 |
ResNet50 (Mask-Guided Attention) | 98.65 | 96.34 | 0.9551 | 99.75 |
Backbone | ACC | F1 | Params (M) | FLOPs (G) | FPS |
---|---|---|---|---|---|
ResNet18 | 98.46 | 95.86 | 11.49 | 1.24 | 269.20 |
ResNet34 | 98.56 | 96.12 | 21.60 | 2.45 | 190.02 |
ResNet50 | 98.65 | 96.34 | 26.82 | 2.97 | 154.85 |
ResNet101 | 98.70 | 96.46 | 45.81 | 5.40 | 94.18 |
DenseNet121 | 98.13 | 94.95 | 7.92 | 1.95 | 77.94 |
DenseNet169 | 98.36 | 95.54 | 14.19 | 2.32 | 57.58 |
DenseNet201 | 98.49 | 95.92 | 20.59 | 2.96 | 48.62 |
ResNeSt50 | 98.63 | 96.25 | 28.77 | 4.02 | 74.85 |
ResNeSt101 | 98.73 | 96.55 | 49.66 | 7.92 | 41.37 |
ResNeSt200 | 98.82 | 96.81 | 71.59 | 12.67 | 21.39 |
ShuffleNetV1 | 98.07 | 94.80 | 1.75 | 0.13 | 183.07 |
ShuffleNetV2 | 98.20 | 95.12 | 1.66 | 0.11 | 174.30 |
MobileNetV1 | 97.86 | 94.27 | 4.17 | 0.45 | 301.23 |
MobileNetV2 | 98.14 | 94.99 | 2.59 | 0.22 | 209.08 |
MobileNetV3 | 97.78 | 94.04 | 3.11 | 0.16 | 148.45 |
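The Params and FPS columns can be approximated with a plain timing loop; the snippet below uses an off-the-shelf torchvision ResNet50 as a stand-in, so its numbers will differ slightly from the full model, which adds the fusion and mask-guided attention modules. FLOPs are usually obtained with a profiler such as thop or fvcore (not shown).

```python
import time
import torch
from torchvision.models import resnet50

model = resnet50(num_classes=2).eval()
params_m = sum(p.numel() for p in model.parameters()) / 1e6   # parameter count in millions

x = torch.randn(1, 3, 128, 128)   # 128 x 128 crop, the input size used in the main experiments
with torch.no_grad():
    for _ in range(10):           # warm-up iterations
        model(x)
    start = time.perf_counter()
    for _ in range(100):
        model(x)
    fps = 100 / (time.perf_counter() - start)

print(f"Params: {params_m:.2f} M, FPS: {fps:.1f}")
```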
| Backbone | Input Size | ACC | F1 | FLOPs (G) | FPS |
|---|---|---|---|---|---|
| ResNet50 | 32 × 32 | 97.46 | 93.20 | 0.19 | 164.53 |
| | 64 × 64 | 98.17 | 95.04 | 0.74 | 160.52 |
| | 128 × 128 | 98.65 | 96.34 | 2.97 | 156.65 |
| | 256 × 256 | 98.67 | 96.41 | 11.86 | 151.38 |
| ShuffleNetV2 | 32 × 32 | 97.43 | 93.00 | 0.01 | 185.08 |
| | 64 × 64 | 97.82 | 94.18 | 0.03 | 184.92 |
| | 128 × 128 | 98.20 | 95.12 | 0.11 | 170.99 |
| | 256 × 256 | 98.27 | 95.27 | 0.43 | 164.40 |
Sample | N/L | ACC | F1 | MCC | AUC |
---|---|---|---|---|---|
20% | 2253/505 | 96.94 | 91.81 | 0.8994 | 99.03 |
40% | 4507/1012 | 97.41 | 93.04 | 0.9146 | 99.19 |
60% | 6760/1519 | 98.07 | 94.79 | 0.9361 | 99.57 |
80% | 9014/2026 | 98.44 | 95.77 | 0.9482 | 99.57 |
100% | 11,268/2535 | 98.65 | 96.34 | 0.9551 | 99.75 |
Perturbation Level | ACC | F1 | MCC | AUC |
---|---|---|---|---|
0% (GT BBox) | 98.65 | 96.34 | 0.9551 | 99.75 |
5% | 98.58 | 96.16 | 0.9530 | 99.75 |
10% | 98.45 | 95.81 | 0.9486 | 99.71 |
15% | 98.27 | 95.33 | 0.9427 | 99.66 |
20% | 97.95 | 94.44 | 0.9319 | 99.63 |
25% | 97.55 | 93.40 | 0.9190 | 99.53 |
30% | 97.26 | 92.65 | 0.9098 | 99.42 |
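The exact perturbation protocol is described in Section 5.4.6; the helper below shows one plausible way to jitter a ground-truth box by a given fraction of its size (e.g. level = 0.10 for the 10% row) and is an assumption for illustration only.

```python
import random

def perturb_bbox(box, level, img_w, img_h):
    """Randomly shift each corner of (x1, y1, x2, y2) by up to +/- level of the
    box width/height, clipped to the image; a plausible stand-in for detector noise."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    nx1 = min(max(x1 + random.uniform(-level, level) * w, 0.0), img_w - 1.0)
    ny1 = min(max(y1 + random.uniform(-level, level) * h, 0.0), img_h - 1.0)
    nx2 = min(max(x2 + random.uniform(-level, level) * w, nx1 + 1.0), float(img_w))
    ny2 = min(max(y2 + random.uniform(-level, level) * h, ny1 + 1.0), float(img_h))
    return nx1, ny1, nx2, ny2
```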
| Method | Object Detection Method | Avg. Time on Object Detection | Line-Pressing Identification Method | Avg. Time for Line-Pressing Identification | Total FPS |
|---|---|---|---|---|---|
| NVIDIA A40 | | | | | |
| H.D et al. [9,10] | Yolov5s | 5.7629 ms | Distance Calculation | 0.0013 ms | 173.42 |
| SLDNet [11] | Mask R-CNN [15] | 68.0569 ms | ResNet34 | 3.8103 ms | 13.91 |
| Gao et al. [12] | Yolov5s | 5.7626 ms | Chassis Pose Fitting | 0.0078 ms | 173.25 |
| Wu et al. [13] | Yolov5s | 5.7611 ms | Chassis Pose Fitting | 0.0083 ms | 173.26 |
| Zheng et al. [14] | PGD [42] | 79.5759 ms | Overlap Determination | 0.8299 ms | 12.44 |
| Ours | Yolov5s | 5.7614 ms | MobileNetV1 | 3.4768 ms | 108.29 |
| | Yolov5m | 7.8747 ms | MobileNetV1 | 3.4773 ms | 88.10 |
| | Yolov5l | 10.3311 ms | MobileNetV1 | 3.4777 ms | 72.41 |
| | Yolov5x | 15.3634 ms | MobileNetV1 | 3.4745 ms | 53.11 |
| NVIDIA Jetson AGX | | | | | |
| H.D et al. [9,10] | Yolov5s | 26.3693 ms | Distance Calculation | 0.0073 ms | 37.91 |
| SLDNet [11] | Mask R-CNN [15] | 402.1801 ms | ResNet34 | 11.1265 ms | 2.41 |
| Gao et al. [12] | Yolov5s | 26.2754 ms | Chassis Pose Fitting | 0.0493 ms | 37.98 |
| Wu et al. [13] | Yolov5s | 26.7673 ms | Chassis Pose Fitting | 0.0434 ms | 37.29 |
| Zheng et al. [14] | PGD [42] | 487.7193 ms | Overlap Determination | 0.8299 ms | 2.04 |
| Ours | Yolov5s | 26.3865 ms | MobileNetV1 | 10.4323 ms | 27.16 |
| | Yolov5m | 49.7241 ms | MobileNetV1 | 10.4362 ms | 16.62 |
| | Yolov5l | 82.3505 ms | MobileNetV1 | 10.4377 ms | 10.77 |
| | Yolov5x | 145.3109 ms | MobileNetV1 | 10.4335 ms | 6.42 |
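The Total FPS column is simply the reciprocal of the summed per-frame stage latencies; for example, our method with Yolov5s on the A40 gives 1000 / (5.7614 + 3.4768) ≈ 108.2, matching the reported 108.29 up to rounding of the averaged times.

```python
def total_fps(det_ms: float, ident_ms: float) -> float:
    """Pipeline throughput implied by the two per-frame latencies (in milliseconds)."""
    return 1000.0 / (det_ms + ident_ms)

print(round(total_fps(5.7614, 3.4768), 1))   # ~108.2 FPS, cf. 108.29 in the table
```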
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).