Deep Feature-Level Sensor Fusion Using Skip Connections for Real-Time Object Detection in Autonomous Driving
Abstract
1. Introduction and Literature Review
- TVNet is a novel end-to-end deep learning framework that simultaneously extracts sensor-specific features, performs feature-level fusion, and detects objects from thermal and visible camera inputs (a minimal fusion sketch follows this list).
- A comparative experimental analysis of early, late, and feature-level fusion of visible and thermal cameras for obstacle detection.
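The common idea behind both networks, sensor-specific encoders whose mid-level features are fused and then carried forward through a skip connection to a detection head, can be sketched compactly. The following Keras sketch is a minimal illustration under assumed layer sizes, names, and head layout; it is not the published RVNet/TVNet architecture.

```python
from tensorflow.keras import layers, Model

def conv_block(x, filters, name):
    """3x3 conv + ReLU followed by 2x2 max-pooling."""
    x = layers.Conv2D(filters, 3, padding="same", activation="relu", name=name)(x)
    return layers.MaxPooling2D(2, name=name + "_pool")(x)

visible = layers.Input(shape=(416, 416, 3), name="visible")
thermal = layers.Input(shape=(416, 416, 1), name="thermal")

# Sensor-specific encoders (branch weights are not shared).
v = conv_block(visible, 16, "v1")
v = conv_block(v, 32, "v2")
t = conv_block(thermal, 16, "t1")
t = conv_block(t, 32, "t2")

# Feature-level fusion: concatenate mid-level feature maps of both branches.
fused = layers.Concatenate(name="fusion")([v, t])           # 104x104x64
f = conv_block(fused, 64, "f1")
f = conv_block(f, 128, "f2")                                # 26x26x128

# Skip connection: downsample the fused mid-level features to the deeper
# resolution and concatenate, so fine spatial detail bypasses the pooling.
skip = layers.AveragePooling2D(4, name="skip_pool")(fused)  # 26x26x64
f = layers.Concatenate(name="skip_merge")([f, skip])

# Illustrative single-scale head: 3 anchors x (4 box + 1 objectness + 5 classes).
head = layers.Conv2D(3 * (4 + 1 + 5), 1, name="det_head")(f)

model = Model(inputs=[visible, thermal], outputs=head)
model.summary()
```

The skip connection here simply pools the fused mid-level map to the deeper resolution and concatenates it, which is one common way to let fine detail reach the detection head after several pooling stages.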
2. Deep Learning Framework
2.1. RVNet: Radar and Visible Camera Fusion and Object Detection
2.2. TVNet: Thermal and Visible Camera Fusion and Object Detection
2.3. Training
3. Experimental Results and Discussion
3.1. Results
3.1.1. RVNet
3.1.2. TVNet
3.1.3. Discussion
4. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- John, V.; Karunakaran, N.M.; Guo, C.; Kidono, K.; Mita, S. Free Space, Visible and Missing Lane Marker Estimation using the PsiNet and Extra Trees Regression. In Proceedings of the 24th International Conference on Pattern Recognition, Beijing, China, 20–24 August 2018; pp. 189–194.
- Jazayeri, A.; Cai, H.; Zheng, J.Y.; Tuceryan, M. Vehicle Detection and Tracking in Car Video Based on Motion Model. IEEE Trans. Intell. Transp. Syst. 2011, 12, 583–595.
- Kafi, M.A.; Challal, Y.; Djenouri, D.; Doudou, M.; Bouabdallah, A.; Badache, N. A Study of Wireless Sensor Networks for Urban Traffic Monitoring: Applications and Architectures. Procedia Comput. Sci. 2013, 19, 617–626.
- Nellore, K.; Hancke, G. A Survey on Urban Traffic Management System Using Wireless Sensor Networks. Sensors 2016, 16, 157.
- Curiac, D.-I.; Volosencu, C. Urban Traffic Control System Architecture Based on Wireless Sensor-Actuator Networks. In Proceedings of the 2nd International Conference on Manufacturing Engineering, Quality and Production Systems (MEQAPS’10), Constanța, Romania, 3–5 September 2010.
- Bombini, L.; Cerri, P.; Medici, P.; Alessandretti, G. Radar-Vision Fusion for Vehicle Detection. 2006. Available online: http://citeseerx.ist.psu.edu/viewdoc/versions?doi=10.1.1.218.7866 (accessed on 5 January 2021).
- Sugimoto, S.; Tateda, H.; Takahashi, H.; Okutomi, M. Obstacle detection using millimeter-wave radar and its visualization on image sequence. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), Cambridge, UK, 26 August 2004; Volume 3, pp. 342–345.
- Fang, Y.; Masaki, I.; Horn, B. Depth-based target segmentation for intelligent vehicles: Fusion of radar and binocular stereo. IEEE Trans. Intell. Transp. Syst. 2002, 3, 196–202.
- Garcia, F.; Cerri, P.; Broggi, A.; de la Escalera, A.; Armingol, J.M. Data fusion for overtaking vehicle detection based on radar and optical flow. In Proceedings of the IEEE Intelligent Vehicles Symposium, Alcala de Henares, Spain, 3–7 June 2012; pp. 494–499.
- Zhong, Z.; Liu, S.; Mathew, M.; Dubey, A. Camera Radar Fusion for Increased Reliability in ADAS Applications. Electron. Imaging Auton. Veh. Mach. 2018, 1, 258-1–258-4.
- Wang, X.; Xu, L.; Sun, H.; Xin, J.; Zheng, N. On-Road Vehicle Detection and Tracking Using MMW Radar and Monovision Fusion. IEEE Trans. Intell. Transp. Syst. 2016, 17, 2075–2084.
- Chadwick, S.; Maddern, W.; Newman, P. Distant Vehicle Detection Using Radar and Vision. arXiv 2019, arXiv:1901.10951.
- John, V.; Mita, S. RVNet: Deep Sensor Fusion of Monocular Camera and Radar for Image-Based Obstacle Detection in Challenging Environments. In Proceedings of the Pacific-Rim Symposium on Image and Video Technology (PSIVT), Sydney, Australia, 18–22 November 2019; Springer: Cham, Switzerland, 2019.
- Gaisser, F.; Jonker, P.P. Road user detection with convolutional neural networks: An application to the autonomous shuttle WEpod. In Proceedings of the International Conference on Machine Vision Applications (MVA), Nagoya, Japan, 8–12 May 2017; pp. 101–104.
- Milch, S.; Behrens, M. Pedestrian Detection with Radar and Computer Vision. 2001. Available online: http://citeseerx.ist.psu.edu/viewdoc/versions?doi=10.1.1.20.9264 (accessed on 5 January 2021).
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.E.; Fu, C.; Berg, A.C. SSD: Single Shot MultiBox Detector. arXiv 2015, arXiv:1512.02325.
- Shah, P.; Merchant, S.N.; Desai, U.B. Multifocus and multispectral image fusion based on pixel significance using multiresolution decomposition. Signal Image Video Process. 2013, 7, 95–109.
- Liu, Z.; Laganière, R. Context enhancement through infrared vision: A modified fusion scheme. Signal Image Video Process. 2007, 1, 293–301.
- Flitti, F.; Collet, C.; Slezak, E. Image fusion based on pyramidal multiband multiresolution markovian analysis. Signal Image Video Process. 2008, 3, 275–289.
- Shah, P.; Reddy, B.C.S.; Merchant, S.N.; Desai, U.B. Context enhancement to reveal a camouflaged target and to assist target localization by fusion of multispectral surveillance videos. Signal Image Video Process. 2013, 7, 537–552.
- Zhang, Y.; Zhang, L.; Bai, X.; Zhang, L. Infrared and visual image fusion through infrared feature extraction and visual information preservation. Infrared Phys. Technol. 2017, 83, 227–237.
- John, V.; Mita, S.; Liu, Z.; Qi, B. Pedestrian detection in thermal images using adaptive fuzzy C-means clustering and convolutional neural networks. In Proceedings of the 14th IAPR International Conference on Machine Vision Applications (MVA), Tokyo, Japan, 18–22 May 2015.
- Shah, P.; Srikanth, T.V.; Merchant, S.N.; Desai, U.B. Multimodal image/video fusion rule using generalized pixel significance based on statistical properties of the neighborhood. Signal Image Video Process. 2014, 8, 723–738.
- Shopovska, I.; Jovanov, L.; Philips, W. Deep Visible and Thermal Image Fusion for Enhanced Pedestrian Visibility. Sensors 2019, 19, 3727.
- Ma, J.; Yu, W.; Liang, P.; Li, C.; Jiang, J. FusionGAN: A generative adversarial network for infrared and visible image fusion. Inf. Fusion 2019, 48, 11–26.
- Zhao, Y.; Fu, G.; Wang, H.; Zhang, S. The Fusion of Unmatched Infrared and Visible Images Based on Generative Adversarial Networks. Math. Probl. Eng. 2020, 2020.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A multimodal dataset for autonomous driving. arXiv 2019, arXiv:1903.11027.
- FLIR. 2015. Available online: http://www.flir.com (accessed on 5 January 2021).
- Manjunath, A.; Liu, Y.; Henriques, B.; Engstle, A. Radar Based Object Detection and Tracking for Autonomous Driving. In Proceedings of the IEEE MTT-S International Conference on Microwaves for Intelligent Mobility (ICMIM), Munich, Germany, 15–17 April 2018; pp. 1–4.
- John, V.; Nithilan, M.K.; Mita, S.; Tehrani, H.; Konishi, M.; Ishimaru, K.; Oishi, T. Sensor Fusion of Intensity and Depth Cues using the ChiNet for Semantic Segmentation of Road Scenes. In Proceedings of the Intelligent Vehicles Symposium, Changshu, China, 26–30 June 2018.
- Everingham, M.; Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Prechelt, L. Early Stopping-But When? In Neural Networks: Tricks of the Trade; Orr, G.B., Müller, K.R., Eds.; Springer: Berlin/Heidelberg, Germany, 1996; Volume 1524, pp. 55–69.
| Sensor | Milliwave Radar | Visible Camera |
|---|---|---|
| Weather | Not affected by rain, snow and fog [30] | Affected by rain, snow and fog |
| Illumination | Not affected | Affected |
| Data density | Sparse appearance | Dense appearance |
| Object boundary | No | Yes |
| Object speed | Yes | No |
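Because the radar delivers sparse returns while the camera delivers a dense appearance map (see the table above), a standard way to bring the two into a common representation is to project each radar point onto the image plane and rasterize its range and radial velocity as extra image channels. The sketch below illustrates that projection under an assumed pinhole model; the intrinsic matrix `K`, the camera-frame coordinate convention, and the two-channel layout are assumptions for illustration, not the paper's exact radar preprocessing.

```python
import numpy as np

def radar_to_sparse_image(points_xyz, velocities, K, img_h=416, img_w=416):
    """Project 3-D radar points (camera coordinates: x right, y down,
    z forward) to pixels and splat range/velocity channels."""
    radar_img = np.zeros((img_h, img_w, 2), dtype=np.float32)
    for (x, y, z), vel in zip(points_xyz, velocities):
        if z <= 0:  # point behind the camera
            continue
        col = int(round(K[0, 0] * x / z + K[0, 2]))
        row = int(round(K[1, 1] * y / z + K[1, 2]))
        if 0 <= col < img_w and 0 <= row < img_h:
            radar_img[row, col, 0] = z    # range channel
            radar_img[row, col, 1] = vel  # radial-velocity channel
    return radar_img

# Toy usage with an assumed pinhole intrinsic matrix.
K = np.array([[500.0, 0.0, 208.0],
              [0.0, 500.0, 208.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[1.0, 0.2, 20.0], [-2.0, 0.1, 35.0]])
vel = np.array([0.5, -3.0])
sparse = radar_to_sparse_image(pts, vel, K)
print(np.argwhere(sparse[..., 0] > 0))  # occupied pixel locations
```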
| Sensor | Thermal Camera | Visible Camera |
|---|---|---|
| Weather | Not affected by rain, snow and fog [30] | Affected by rain, snow and fog |
| Illumination | Not affected by low light and low visibility | Affected by low light and low visibility |
| Lens flare | Not affected by direct sunlight and headlights | Affected by direct sunlight and headlights |
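Given registered thermal and visible frames, the early-fusion baseline compared in the experiments amounts to stacking the thermal frame as a fourth input channel before the first convolution. A minimal sketch, assuming pixel-aligned 416x416 frames (the paper's exact preprocessing is not reproduced here):

```python
import numpy as np

def early_fuse(visible_rgb, thermal):
    """Stack a single-channel thermal frame onto an RGB frame -> HxWx4."""
    assert visible_rgb.shape[:2] == thermal.shape[:2], "frames must be registered"
    return np.concatenate([visible_rgb, thermal[..., None]], axis=-1)

rgb = np.random.rand(416, 416, 3).astype(np.float32)
ir = np.random.rand(416, 416).astype(np.float32)
fused = early_fuse(rgb, ir)
print(fused.shape)  # (416, 416, 4)
```

Early fusion therefore requires accurate pixel registration between the sensors; late fusion, by contrast, runs a detector per sensor and merges the resulting boxes, at the cost of the extra computing time visible in the result tables below.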
| Algorithm | Average Precision | Computing Time (ms) |
|---|---|---|
| Tiny YOLOv3 [27] | 0.40 | 10 |
| Late fusion | 0.40 | 14 |
| RVNet | 0.56 | 17 |
| Algorithm | Average Precision | Computing Time (ms) |
|---|---|---|
| Tiny YOLOv3 (visible) [27] | 0.59 | 10 |
| Tiny YOLOv3 (thermal) [27] | 0.58 | 10 |
| Early fusion | 0.55 | 10 |
| Late fusion | 0.56 | 17 |
| TVNet | 0.61 | 17 |
| Algorithm | Average Precision | Computing Time (ms) |
|---|---|---|
| Tiny YOLOv3 (visible) [27] | 0.21 | 10 |
| Tiny YOLOv3 (thermal) [27] | 0.61 | 10 |
| Early fusion | 0.59 | 10 |
| Late fusion | 0.60 | 20 |
| TVNet | 0.60 | 17 |
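The average precision values above follow the PASCAL VOC evaluation protocol cited in the references (Everingham et al.). As a hedged sketch of how such numbers are produced, the following computes VOC-2007-style 11-point interpolated AP with greedy IoU matching at a 0.5 threshold; the single-image simplification and the (x1, y1, x2, y2) box format are assumptions made for brevity.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def average_precision(dets, gts, iou_thr=0.5):
    """dets: list of (score, box); gts: list of boxes (one image for brevity)."""
    dets = sorted(dets, key=lambda d: -d[0])  # highest confidence first
    matched = set()
    tp = np.zeros(len(dets))
    fp = np.zeros(len(dets))
    for i, (_, box) in enumerate(dets):
        best, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            o = iou(box, gt)
            if o > best and j not in matched:
                best, best_j = o, j
        if best >= iou_thr:
            tp[i] = 1
            matched.add(best_j)
        else:
            fp[i] = 1
    rec = np.cumsum(tp) / max(len(gts), 1)
    prec = np.cumsum(tp) / (np.cumsum(tp) + np.cumsum(fp))
    # 11-point interpolation (VOC 2007 style): average the precision
    # envelope at recall levels 0.0, 0.1, ..., 1.0.
    ap = 0.0
    for t in np.arange(0.0, 1.1, 0.1):
        mask = rec >= t
        ap += (prec[mask].max() if mask.any() else 0.0) / 11.0
    return ap

dets = [(0.9, (10, 10, 50, 50)), (0.6, (100, 100, 140, 150))]
gts = [(12, 12, 48, 52)]
print(round(average_precision(dets, gts), 3))
```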
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
John, V.; Mita, S. Deep Feature-Level Sensor Fusion Using Skip Connections for Real-Time Object Detection in Autonomous Driving. Electronics 2021, 10, 424. https://doi.org/10.3390/electronics10040424