CNN-Based Vehicle Bottom Face Quadrilateral Detection Using Surveillance Cameras for Intelligent Transportation Systems
Abstract
1. Introduction
2. Related Works
2.1. Corner-Based Approach
2.2. PSA-Based Approach
2.3. Line-Based Approach
3. Implementation Details
3.1. Implementation Using YOLO
3.2. Implementation of Corner-Based Approach
3.3. Implementation of PSA-Based Approach
3.4. Implementation of Line-Based Approach
3.4.1. Line Parameter-Based Method
3.4.2. Intersection Point-Based Method
4. Experiments
4.1. Dataset and Training
4.2. Evaluation Criteria
4.3. Evaluation Results
4.4. Result Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviation of Variations | Encoding Target: Origin | Encoding Target: Offset |
---|---|---|
C1 | center of the anchor | ratio to the anchor size |
C2 | center of the anchor | residual to the anchor size |
C3 | bottom-left corner of the anchor | ratio to the anchor size |
C4 | bottom-left corner of the anchor | residual to the anchor size |
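
The ratio-based rows of the table above (C1 and C3) can be illustrated with a small sketch. It shows one plausible reading of "ratio to the anchor size": the corner's displacement from the chosen origin, divided by the anchor width and height. The anchor format `(cx, cy, w, h)`, the image-coordinate convention (y grows downward, so the bottom-left corner is at `cy + h / 2`), and all function names are assumptions made for illustration. The residual-based variants (C2 and C4) are left out because their exact definition is not stated in this part of the text.

```python
from typing import Tuple

Point = Tuple[float, float]
Anchor = Tuple[float, float, float, float]  # (cx, cy, w, h), assumed format


def anchor_origin(anchor: Anchor, origin: str) -> Point:
    """Return the reference point of the anchor used as encoding origin.

    Assumes image coordinates where y increases downward.
    """
    cx, cy, w, h = anchor
    if origin == "center":
        return (cx, cy)
    if origin == "bottom_left":
        return (cx - w / 2.0, cy + h / 2.0)
    raise ValueError(f"unknown origin: {origin}")


def encode_ratio(corner: Point, anchor: Anchor, origin: str) -> Point:
    """Encode a corner as an offset from the origin, as a ratio to anchor size."""
    _, _, w, h = anchor
    ox, oy = anchor_origin(anchor, origin)
    return ((corner[0] - ox) / w, (corner[1] - oy) / h)


def decode_ratio(offset: Point, anchor: Anchor, origin: str) -> Point:
    """Invert encode_ratio: recover the corner position in pixels."""
    _, _, w, h = anchor
    ox, oy = anchor_origin(anchor, origin)
    return (ox + offset[0] * w, oy + offset[1] * h)


if __name__ == "__main__":
    anchor = (100.0, 50.0, 40.0, 20.0)
    corner = (110.0, 55.0)
    print(encode_ratio(corner, anchor, "center"))       # C1-style origin
    print(encode_ratio(corner, anchor, "bottom_left"))  # C3-style origin
```

The same corner yields different targets depending on the origin, which is the only difference between the C1-style and C3-style calls in this sketch.
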

Abbreviation of Variations | Encoding Target: Position | Encoding Target: Size | Encoding Target: Angle |
---|---|---|---|
PSA1 | C2 applied | residual to the anchor size | two angles |
PSA2 | C2 applied | residual to the anchor size | cosine and sine of two angles |
PSA3 | C2 applied | log-scale offset | two angles |
PSA4 | C2 applied | log-scale offset | cosine and sine of two angles |
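
Two of the encoding targets named in the table above, the log-scale size offset and the cosine/sine angle representation, are common in anchor-based detectors and can be sketched generically. The code below is a minimal illustration under assumptions: the log-scale offset is taken to be `log(size / anchor_size)`, angles are in radians, and every function name is hypothetical. It is not confirmed to match the exact formulation used for PSA1 to PSA4.

```python
import math
from typing import Tuple


def encode_log_size(size: float, anchor_size: float) -> float:
    """Log-scale offset of a positive size relative to a positive anchor size."""
    return math.log(size / anchor_size)


def decode_log_size(target: float, anchor_size: float) -> float:
    """Invert encode_log_size; the decoded size is always positive."""
    return anchor_size * math.exp(target)


def encode_angle(theta: float) -> Tuple[float, float]:
    """Represent an angle by its cosine and sine (two regression targets)."""
    return (math.cos(theta), math.sin(theta))


def decode_angle(cos_t: float, sin_t: float) -> float:
    """Recover the angle in (-pi, pi]; works even if the pair is not unit length."""
    return math.atan2(sin_t, cos_t)


if __name__ == "__main__":
    t = encode_log_size(40.0, 20.0)
    print(t, decode_log_size(t, 20.0))

    # Angles just either side of the +/-pi wrap-around map to nearby (cos, sin)
    # pairs, whereas the raw angle values differ by almost 2*pi.
    print(encode_angle(math.pi - 0.01))
    print(encode_angle(-math.pi + 0.01))
```

The cosine/sine pair avoids the discontinuity of a raw angle at the wrap-around point, at the cost of one extra regression output per angle.
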

Approach | Method | Vehicle Bounding Box AP (%) |
---|---|---|
Corner-based | C1 | 89.42 |
Corner-based | C2 | 88.36 |
Corner-based | C3 | 89.46 |
Corner-based | C4 | 89.34 |
PSA-based | PSA1 | 90.28 |
PSA-based | PSA2 | 90.63 |
PSA-based | PSA3 | 90.11 |
PSA-based | PSA4 | 90.36 |
Line-based | L1 | 87.98 |
Line-based | L2 | 87.45 |
Line-based | L3 | 87.66 |

Vehicle BFQ Detection (Strict = using strict threshold; Loose = using loose threshold)

Approach | Method | F1 Score (Strict) | Precision (Strict) | Recall (Strict) | Average Position Error (Strict) | F1 Score (Loose) | Precision (Loose) | Recall (Loose) | Average Position Error (Loose) |
---|---|---|---|---|---|---|---|---|---|
Corner-based | C1 | 0.84 | 0.88 | 0.81 | 0.0521 | 0.92 | 0.96 | 0.88 | 0.0641 |
Corner-based | C2 | 0.86 | 0.90 | 0.83 | 0.0509 | 0.92 | 0.96 | 0.89 | 0.0614 |
Corner-based | C3 | 0.85 | 0.88 | 0.81 | 0.0515 | 0.92 | 0.96 | 0.89 | 0.0641 |
Corner-based | C4 | 0.86 | 0.90 | 0.83 | 0.0513 | 0.92 | 0.96 | 0.89 | 0.0617 |
PSA-based | PSA1 | 0.82 | 0.85 | 0.78 | 0.0602 | 0.89 | 0.93 | 0.86 | 0.0727 |
PSA-based | PSA2 | 0.84 | 0.88 | 0.80 | 0.0605 | 0.91 | 0.95 | 0.87 | 0.0726 |
PSA-based | PSA3 | 0.82 | 0.86 | 0.78 | 0.0604 | 0.90 | 0.95 | 0.86 | 0.0739 |
PSA-based | PSA4 | 0.83 | 0.87 | 0.79 | 0.0615 | 0.91 | 0.95 | 0.87 | 0.0744 |
Line-based | L1 | 0.73 | 0.77 | 0.68 | 0.0533 | 0.83 | 0.88 | 0.78 | 0.0703 |
Line-based | L2 | 0.75 | 0.80 | 0.70 | 0.0526 | 0.83 | 0.89 | 0.79 | 0.0670 |
Line-based | L3 | 0.50 | 0.54 | 0.48 | 0.0389 | 0.63 | 0.67 | 0.60 | 0.0586 |
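
As a reading aid for the table above, the F1 score is conventionally the harmonic mean of precision and recall. The sketch below applies that standard definition to two precision/recall pairs copied from the table; the function name is hypothetical. Because the tabulated precision and recall are already rounded to two decimals, recomputing F1 from them may differ from the tabulated F1 in the last digit for some rows, and this excerpt does not state whether the reported F1 values were computed from unrounded counts.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision/recall pairs taken from the table (strict threshold).
    print(round(f1_score(0.88, 0.81), 2))  # C1: 0.84
    print(round(f1_score(0.90, 0.83), 2))  # C2: 0.86
```
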
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kim, G.; Jung, H.G.; Suhr, J.K. CNN-Based Vehicle Bottom Face Quadrilateral Detection Using Surveillance Cameras for Intelligent Transportation Systems. Sensors 2023, 23, 6688. https://doi.org/10.3390/s23156688