Semantic Scene Completion in Autonomous Driving: A Two-Stream Multi-Vehicle Collaboration Approach
Abstract
1. Introduction
- A novel collaboration approach, TSMV, is proposed for collaborative semantic scene completion. Its two-stream architecture effectively fuses features from collaborating vehicles and alleviates the boundary-ambiguity problem caused by feature misalignment in multi-vehicle cooperative perception.
- An NSCAT module is proposed that combines self-attention and cross-attention transformers. NSCAT recurrently aggregates the features of the two streams through local and global interaction (an illustrative sketch follows this list).
- Experiments on the V2VSSC and SemanticOPV2V datasets show that TSMV outperforms prior LiDAR-based and camera-based methods.
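As a purely illustrative aid (the paper provides no code), the sketch below shows one plausible reading of the two-stream interaction described above: windowed self-attention inside each stream as a coarse stand-in for neighborhood attention (local interaction), cross-attention from the ego stream to the collaborative stream (global interaction), and recurrent aggregation over a few steps. All module names, tensor shapes, the window size, and the depth are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class TwoStreamBlock(nn.Module):
    """One hypothetical interaction step: windowed self-attention within each
    stream (local), then cross-attention from the ego stream to the
    collaborative stream (global). Layer choices are assumptions."""

    def __init__(self, dim=64, heads=4, window=8):
        super().__init__()
        self.window = window
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_s = nn.LayerNorm(dim)
        self.norm_c = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def _to_windows(self, x):
        # (B, H, W, C) -> (B * num_windows, window * window, C)
        b, h, w, c = x.shape
        k = self.window
        x = x.reshape(b, h // k, k, w // k, k, c).permute(0, 1, 3, 2, 4, 5)
        return x.reshape(-1, k * k, c)

    def _from_windows(self, x, shape):
        b, h, w, c = shape
        k = self.window
        x = x.reshape(b, h // k, w // k, k, k, c).permute(0, 1, 3, 2, 4, 5)
        return x.reshape(b, h, w, c)

    def _local(self, x):
        # Self-attention restricted to non-overlapping windows, a coarse
        # stand-in for true neighborhood attention.
        tokens = self._to_windows(self.norm_s(x))
        out, _ = self.self_attn(tokens, tokens, tokens)
        return x + self._from_windows(out, x.shape)

    def forward(self, ego, collab):
        # ego, collab: BEV feature maps of shape (B, H, W, C)
        ego, collab = self._local(ego), self._local(collab)
        b, h, w, c = ego.shape
        q = self.norm_c(ego).reshape(b, h * w, c)
        kv = self.norm_c(collab).reshape(b, h * w, c)
        fused, _ = self.cross_attn(q, kv, kv)  # global interaction
        ego = ego + fused.reshape(b, h, w, c)
        return ego + self.ffn(ego), collab


# Recurrent aggregation over a few interaction steps (depth is an assumption).
blocks = nn.ModuleList([TwoStreamBlock() for _ in range(2)])
ego = torch.randn(1, 64, 64, 64)      # ego-stream BEV features (B, H, W, C)
collab = torch.randn(1, 64, 64, 64)   # collaborative-stream BEV features
for blk in blocks:
    ego, collab = blk(ego, collab)
```

A faithful implementation would use true neighborhood attention (e.g., the Neighborhood Attention Transformer cited in the references) rather than non-overlapping windows; the window-based variant is used here only to keep the sketch self-contained.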
2. Related Work
2.1. Semantic Scene Completion
2.2. Multi-Vehicle Perception
2.3. Transformer in Feature Fusion
3. Methodology
3.1. Overall Architecture
3.1.1. Spatial Graph Construction
3.1.2. Feature Extraction and Compression
3.1.3. Head and Losses
3.2. Two-Stream Multi-Vehicle Collaboration
4. Experiments
4.1. Datasets and Evaluation Metrics
4.2. Implementation Details
4.3. Comparison
4.3.1. LiDAR-Based Results on V2VSSC
4.3.2. Camera-Based Results on SemanticOPV2V
4.4. Run Time Efficiency
4.5. Detection Visualization
4.6. Ablation Studies
4.6.1. Contribution of Major Components
4.6.2. Impact of the Kernel Size in the Neighborhood
4.6.3. Performance of Positional Error
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Fayyad, J.; Jaradat, M.A.; Gruyer, D.; Najjaran, H. Deep Learning Sensor Fusion for Autonomous Vehicle Perception and Localization: A Review. Sensors 2020, 20, 4220. [Google Scholar] [CrossRef] [PubMed]
- Wang, R.; Luo, X.; Ye, Q.; Jiang, Y.; Liu, W. Research on Visual Perception of Speed Bumps for Intelligent Connected Vehicles Based on Lightweight FPNet. Sensors 2024, 24, 2130. [Google Scholar] [CrossRef]
- Wang, J.; Li, F.; An, Y.; Zhang, X.; Sun, H. Toward Robust LiDAR-Camera Fusion in BEV Space via Mutual Deformable Attention and Temporal Aggregation. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 5753–5764. [Google Scholar] [CrossRef]
- Li, Z.; Wang, W.; Li, H.; Xie, E.; Sima, C.; Lu, T.; Qiao, Y.; Dai, J. Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 1–18. [Google Scholar]
- Liang, T.; Xie, H.; Yu, K.; Xia, Z.; Lin, Z.; Wang, Y.; Tang, T.; Wang, B.; Tang, Z. Bevfusion: A simple and robust lidar-camera fusion framework. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), New Orleans, LA, USA, 28 November–9 December 2022; Volume 35, pp. 10421–10434. [Google Scholar]
- Yan, Y.; Mao, Y.; Li, B. SECOND: Sparsely Embedded Convolutional Detection. Sensors 2018, 18, 3337. [Google Scholar] [CrossRef]
- Rist, C.; Emmerichs, D.; Enzweiler, M.; Gavrila, D. Semantic Scene Completion Using Local Deep Implicit Functions on LiDAR Data. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 7205–7218. [Google Scholar] [CrossRef]
- Zhang, Z.; Han, X.; Dong, B.; Li, T.; Yin, B.; Yang, X. Point Cloud Scene Completion with Joint Color and Semantic Estimation from Single RGB-D Image. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 11079–11095. [Google Scholar] [CrossRef]
- Wilson, J.; Song, J.; Fu, Y.; Zhang, A.; Capodieci, A.; Jayakumar, P.; Barton, K.; Ghaffari, M. MotionSC: Data Set and Network for Real-Time Semantic Mapping in Dynamic Environments. IEEE Robot. Autom. Lett. 2022, 7, 8439–8446. [Google Scholar] [CrossRef]
- Park, J.; Yoo, H.; Wang, Y. Drivable Dirt Road Region Identification Using Image and Point Cloud Semantic Segmentation Fusion. IEEE Trans. Intell. Transp. Syst. 2022, 23, 13203–13216. [Google Scholar] [CrossRef]
- Meyer, G.P.; Charland, J.; Pandey, S.; Laddha, A.; Gautam, S.; Vallespi-Gonzalez, C.; Wellington, C.K. LaserFlow: Efficient and Probabilistic Object Detection and Motion Forecasting. IEEE Robot. Autom. Lett. 2021, 6, 526–533. [Google Scholar] [CrossRef]
- Yuan, Z.; Song, X.; Bai, L.; Wang, Z.; Ouyang, W. Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection for Autonomous Driving. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 2068–2078. [Google Scholar] [CrossRef]
- Rong, Y.; Wei, X.; Lin, T.; Wang, Y.; Kasneci, E. DynStatF: An Efficient Feature Fusion Strategy for LiDAR 3D Object Detection. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vancouver, BC, Canada, 17–24 June 2023; pp. 3238–3247. [Google Scholar] [CrossRef]
- Ullah, I.; Ali, F.; Khan, H.; Khan, F.; Bai, X. Ubiquitous computation in internet of vehicles for human-centric transport systems. Comput. Hum. Behav. 2024, 161, 108394. [Google Scholar] [CrossRef]
- Naeem, H.M.Y.; Bhatti, A.I.; Butt, Y.A.; Ahmed, Q.; Bai, X. Energy Efficient Solution for Connected Electric Vehicle and Battery Health Management Using Eco-Driving Under Uncertain Environmental Conditions. IEEE Trans. Intell. Veh. 2024, 9, 4621–4631. [Google Scholar] [CrossRef]
- Nardini, G.; Virdis, A.; Campolo, C.; Molinaro, A.; Stea, G. Cellular-V2X Communications for Platooning: Design and Evaluation. Sensors 2018, 18, 1527. [Google Scholar] [CrossRef] [PubMed]
- Ku, Y.J.; Baidya, S.; Dey, S. Uncertainty-Aware Task Offloading for Multi-Vehicle Perception Fusion Over Vehicular Edge Computing. IEEE Trans. Veh. Technol. 2023, 72, 14906–14923. [Google Scholar] [CrossRef]
- Cui, G.; Zhang, W.; Xiao, Y.; Yao, L.; Fang, Z. Cooperative Perception Technology of Autonomous Driving in the Internet of Vehicles Environment: A Review. Sensors 2022, 22, 5535. [Google Scholar] [CrossRef]
- Li, Y.; Ma, D.; An, Z.; Wang, Z.; Zhong, Y.; Chen, S.; Feng, C. V2X-Sim: Multi-Agent Collaborative Perception Dataset and Benchmark for Autonomous Driving. IEEE Robot. Autom. Lett. 2022, 7, 10914–10921. [Google Scholar] [CrossRef]
- Malik, S.; Khan, M.A.; El-Sayed, H. Collaborative Autonomous Driving—A Survey of Solution Approaches and Future Challenges. Sensors 2021, 21, 3783. [Google Scholar] [CrossRef]
- Chen, Q.; Tang, S.; Yang, Q.; Fu, S. Cooper: Cooperative Perception for Connected Autonomous Vehicles Based on 3D Point Clouds. In Proceedings of the IEEE 39th International Conference on Distributed Computing Systems (ICDCS), Dallas, TX, USA, 7–10 July 2019; pp. 514–524. [Google Scholar] [CrossRef]
- Rawashdeh, Z.Y.; Wang, Z. Collaborative Automated Driving: A Machine Learning-based Method to Enhance the Accuracy of Shared Information. In Proceedings of the International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 3961–3966. [Google Scholar] [CrossRef]
- Hu, Y.; Fang, S.; Lei, Z.; Zhong, Y.; Chen, S. Where2comm: Communication-efficient collaborative perception via spatial confidence maps. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), New Orleans, LA, USA, 28 November–9 December 2022. [Google Scholar]
- Xu, R.; Xiang, H.; Xia, X.; Han, X.; Li, J.; Ma, J. Opv2v: An open benchmark dataset and fusion pipeline for perception with vehicle-to-vehicle communication. In Proceedings of the International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 2583–2589. [Google Scholar]
- Zhang, Y.; Li, J.; Luo, K.; Yang, Y.; Han, J.; Liu, N.; Qin, D.; Han, P.; Xu, C. V2VSSC: A 3D Semantic Scene Completion Benchmark for Perception with Vehicle to Vehicle Communication. arXiv 2024, arXiv:2402.04671. [Google Scholar]
- Xu, R.; Xiang, H.; Tu, Z.; Xia, X.; Yang, M.H.; Ma, J. V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022. [Google Scholar]
- Song, S.; Yu, F.; Zeng, A.; Chang, A.X.; Savva, M.; Funkhouser, T. Semantic Scene Completion from a Single Depth Image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 190–198. [Google Scholar] [CrossRef]
- Firman, M.; Aodha, O.M.; Julier, S.; Brostow, G.J. Structured Prediction of Unobserved Voxels from a Single Depth Image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 5431–5440. [Google Scholar] [CrossRef]
- Dai, A.; Chang, A.X.; Savva, M.; Halber, M.; Funkhouser, T.; Niessner, M. ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2432–2443. [Google Scholar] [CrossRef]
- Li, J.; Liu, Y.; Yuan, X.; Zhao, C.; Siegwart, R.; Reid, I.; Cadena, C. Depth Based Semantic Scene Completion With Position Importance Aware Loss. IEEE Robot. Autom. Lett. 2020, 5, 219–226. [Google Scholar] [CrossRef]
- Sakaridis, C.; Dai, D.; Van Gool, L. Semantic Foggy Scene Understanding with Synthetic Data. Int. J. Comput. Vis. 2018, 126, 973–992. [Google Scholar] [CrossRef]
- Li, J.; Wang, P.; Han, K.; Liu, Y. Anisotropic Convolutional Neural Networks for RGB-D Based Semantic Scene Completion. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 8125–8138. [Google Scholar] [CrossRef] [PubMed]
- Behley, J.; Garbade, M.; Milioto, A.; Quenzel, J.; Behnke, S.; Stachniss, C.; Gall, J. SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9296–9306. [Google Scholar] [CrossRef]
- Wang, X.; Zhu, Z.; Xu, W.; Zhang, Y.; Wei, Y.; Chi, X.; Ye, Y.; Du, D.; Lu, J.; Wang, X. OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 17804–17813. [Google Scholar] [CrossRef]
- Li, Y.; Li, S.; Liu, X.; Gong, M.; Li, K.; Chen, N.; Wang, Z.; Li, Z.; Jiang, T.; Yu, F.; et al. Sscbench: A large-scale 3d semantic scene completion benchmark for autonomous driving. arXiv 2023, arXiv:2306.09001. [Google Scholar]
- Xu, R.; Guo, Y.; Han, X.; Xia, X.; Xiang, H.; Ma, J. OpenCDA: An Open Cooperative Driving Automation Framework Integrated with Co-Simulation. In Proceedings of the International Conference on Intelligent Transportation Systems (ITSC), Indianapolis, IN, USA, 19–22 September 2021; pp. 1155–1162. [Google Scholar] [CrossRef]
- Song, R.; Liang, C.; Cao, H.; Yan, Z.; Zimmer, W.; Gross, M.; Festag, A.; Knoll, A. Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024. [Google Scholar]
- Yuan, Y.; Cheng, H.; Sester, M. Keypoints-Based Deep Feature Fusion for Cooperative Vehicle Detection of Autonomous Driving. IEEE Robot. Autom. Lett. 2022, 7, 3054–3061. [Google Scholar] [CrossRef]
- Wang, T.H.; Manivasagam, S.; Liang, M.; Yang, B.; Zeng, W.; Urtasun, R. V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 605–621. [Google Scholar] [CrossRef]
- Chen, Q.; Ma, X.; Tang, S.; Guo, J.; Yang, Q.; Fu, S. F-cooper: Feature based cooperative perception for autonomous vehicle edge computing system using 3D point clouds. In Proceedings of the ACM/IEEE Symposium on Edge Computing, Arlington, VA, USA, 7–9 November 2019; pp. 88–100. [Google Scholar] [CrossRef]
- Mehr, E.; Jourdan, A.; Thome, N.; Cord, M.; Guitteny, V. DiscoNet: Shapes Learning on Disconnected Manifolds for 3D Editing. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3473–3482. [Google Scholar] [CrossRef]
- Fang, S.; Li, H. Multi-Vehicle Cooperative Simultaneous LiDAR SLAM and Object Tracking in Dynamic Environments. IEEE Trans. Intell. Transp. Syst. 2024, 25, 11411–11421. [Google Scholar] [CrossRef]
- Yin, H.; Tian, D.; Lin, C.; Duan, X.; Zhou, J.; Zhao, D.; Cao, D. V2VFormer++: Multi-Modal Vehicle-to-Vehicle Cooperative Perception via Global-Local Transformer. IEEE Trans. Intell. Transp. Syst. 2024, 25, 2153–2166. [Google Scholar] [CrossRef]
- Luo, G.; Shao, C.; Cheng, N.; Zhou, H.; Zhang, H.; Yuan, Q.; Li, J. EdgeCooper: Network-Aware Cooperative LiDAR Perception for Enhanced Vehicular Awareness. IEEE J. Sel. Areas Commun. 2024, 42, 207–222. [Google Scholar] [CrossRef]
- Xu, R.; Tu, Z.; Xiang, H.; Shao, W.; Zhou, B.; Ma, J. CoBEVT: Cooperative Bird’s Eye View Semantic Segmentation with Sparse Transformers. In Proceedings of the Conference on Robot Learning, Atlanta, GA, USA, 14–18 December 2023; Volume 205, pp. 989–1000. [Google Scholar]
- Yu, H.; Luo, Y.; Shu, M.; Huo, Y.; Yang, Z.; Shi, Y.; Guo, Z.; Li, H.; Hu, X.; Yuan, J.; et al. DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 21329–21338. [Google Scholar] [CrossRef]
- Lu, Y.; Hu, Y.; Zhong, Y.; Wang, D.; Chen, S.; Wang, Y. An Extensible Framework for Open Heterogeneous Collaborative Perception. arXiv 2024, arXiv:2401.13964. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010. [Google Scholar]
- Xu, C.; Jia, W.; Wang, R.; Luo, X.; He, X. MorphText: Deep Morphology Regularized Accurate Arbitrary-shape Scene Text Detection. IEEE Trans. Multimed. 2022, 25, 4199–4212. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 4 May 2021. [Google Scholar]
- Xu, C.; Jia, W.; Wang, R.; He, X.; Zhao, B.; Zhang, Y. Semantic Navigation of PowerPoint-Based Lecture Video for AutoNote Generation. IEEE Trans. Learn. Technol. 2022, 16, 1–17. [Google Scholar] [CrossRef]
- Xu, C.; Jia, W.; Cui, T.; Wang, R.; Zhang, Y.F.; He, X. Arbitrary-shape scene text detection via visual-relational rectification and contour approximation. IEEE Trans. Multimed. 2022, 25, 4052–4066. [Google Scholar] [CrossRef]
- Bai, X.; Hu, Z.; Zhu, X.; Huang, Q.; Chen, Y.; Fu, H.; Tai, C.L. TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 1080–1089. [Google Scholar] [CrossRef]
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable Transformers for End-to-End Object Detection. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 4 May 2021. [Google Scholar]
- Ramachandran, P.; Parmar, N.; Vaswani, A.; Bello, I.; Levskaya, A.; Shlens, J. Stand-alone self-attention in vision models. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 9992–10002. [Google Scholar] [CrossRef]
- Dong, X.; Bao, J.; Chen, D.; Zhang, W.; Yu, N.; Yuan, L.; Chen, D.; Guo, B. CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 12114–12124. [Google Scholar] [CrossRef]
- Hassani, A.; Walton, S.; Li, J.; Li, S.; Shi, H. Neighborhood Attention Transformer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 6185–6194. [Google Scholar] [CrossRef]
- Zhou, Y.; Tuzel, O. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4490–4499. [Google Scholar] [CrossRef]
- Yu, Z.; Shu, C.; Deng, J.; Lu, K.; Liu, Z.; Yu, J.; Yang, D.; Li, H.; Chen, Y. FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin. arXiv 2023, arXiv:2311.12058. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
- Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Xia, X.; Hang, P.; Xu, N.; Huang, Y.; Xiong, L.; Yu, Z. Advancing Estimation Accuracy of Sideslip Angle by Fusing Vehicle Kinematics and Dynamics Information With Fuzzy Logic. IEEE Trans. Veh. Technol. 2021, 70, 6577–6590. [Google Scholar] [CrossRef]
- Li, Y.; Zhuang, Y.; Hu, X.; Gao, Z.; Hu, J.; Chen, L.; He, Z.; Pei, L.; Chen, K.; Wang, M.; et al. Toward Location-Enabled IoT (LE-IoT): IoT Positioning Techniques, Error Sources, and Error Mitigation. IEEE Internet Things J. 2021, 8, 4035–4062. [Google Scholar] [CrossRef]
- Tsukada, M.; Oi, T.; Ito, A.; Hirata, M.; Esaki, H. AutoC2X: Open-source software to realize V2X cooperative perception among autonomous vehicles. In Proceedings of the 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall), Victoria, BC, Canada, 18 November–16 December 2020; pp. 1–6. [Google Scholar] [CrossRef]
LiDAR-based SSC results on the V2VSSC dataset (Perfect setting):

Method | IoU | mIoU | cIoU | Road | Car | Terrain | Building | Veg. | Pole
---|---|---|---|---|---|---|---|---|---
No Fusion | 0.512 | 0.418 | 0.604 | 0.643 | 0.566 | 0.464 | 0.324 | 0.383 | 0.126
Late Fusion | 0.540 | 0.433 | 0.608 | 0.670 | 0.546 | 0.478 | 0.368 | 0.419 | 0.115
Early Fusion | 0.570 | 0.461 | 0.649 | 0.682 | 0.616 | 0.508 | 0.340 | 0.444 | 0.174
F-Cooper [40] | 0.556 | 0.455 | 0.640 | 0.676 | 0.604 | 0.513 | 0.323 | 0.440 | 0.172
V2X-ViT [26] | 0.580 | 0.440 | 0.580 | 0.658 | 0.501 | 0.558 | 0.359 | 0.433 | 0.131
V2VSSC [25] | 0.566 | 0.462 | 0.658 | 0.672 | 0.644 | 0.499 | 0.336 | 0.446 | 0.177
TSMV (Ours) | 0.586 | 0.492 | 0.687 | 0.708 | 0.666 | 0.577 | 0.338 | 0.481 | 0.180
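For reference when reading these tables: IoU denotes the class-agnostic occupancy IoU and mIoU the mean of the per-class semantic IoUs, following standard semantic scene completion practice. The sketch below computes these two quantities from dense voxel label grids under those standard definitions; it is not the benchmark's evaluation code, and the cIoU variant reported above is not reproduced.

```python
import numpy as np


def ssc_metrics(pred, gt, num_classes, free_label=0):
    """Standard SSC metrics from dense voxel label grids.

    pred, gt : integer arrays of identical shape, one semantic label per voxel
               (free space encoded as `free_label`).
    Returns the class-agnostic occupancy IoU and the mean IoU over the
    occupied semantic classes.
    """
    pred, gt = pred.ravel(), gt.ravel()

    # Geometric completion: occupied vs. free, ignoring the semantic class.
    p_occ, g_occ = pred != free_label, gt != free_label
    iou = (p_occ & g_occ).sum() / max((p_occ | g_occ).sum(), 1)

    # Semantic completion: IoU per occupied class, then averaged.
    ious = []
    for c in range(num_classes):
        if c == free_label:
            continue
        inter = ((pred == c) & (gt == c)).sum()
        union = ((pred == c) | (gt == c)).sum()
        if union > 0:
            ious.append(inter / union)
    return iou, float(np.mean(ious)) if ious else 0.0


# Toy example on a random grid with six semantic classes plus free space.
rng = np.random.default_rng(0)
pred = rng.integers(0, 7, size=(32, 32, 4))
gt = rng.integers(0, 7, size=(32, 32, 4))
print(ssc_metrics(pred, gt, num_classes=7))
```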
Robustness comparison under the Perfect and Noisy pose settings (V2VSSC):

Method | IoU (Perfect) | mIoU (Perfect) | cIoU (Perfect) | IoU (Noisy) | mIoU (Noisy) | cIoU (Noisy)
---|---|---|---|---|---|---
No Fusion | 0.512 | 0.418 | 0.604 | 0.512 | 0.418 | 0.604
Late Fusion | 0.540 | 0.433 | 0.608 | 0.526 | 0.403 | 0.544
Early Fusion | 0.570 | 0.461 | 0.649 | 0.559 | 0.441 | 0.613
F-Cooper [40] | 0.556 | 0.455 | 0.640 | 0.552 | 0.447 | 0.626
V2X-ViT [26] | 0.580 | 0.440 | 0.580 | 0.570 | 0.431 | 0.565
V2VSSC [25] | 0.566 | 0.462 | 0.658 | 0.559 | 0.443 | 0.618
TSMV (Ours) | 0.586 | 0.492 | 0.687 | 0.572 | 0.468 | 0.648
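The Noisy columns probe robustness to imperfect localization. A common protocol, assumed here since the exact noise model is not shown in this excerpt, perturbs each collaborator's 2D pose with zero-mean Gaussian noise before its features are warped into the ego frame; a minimal sketch:

```python
import numpy as np


def noisy_pose(x, y, yaw, pos_std=0.2, yaw_std_deg=0.2, rng=None):
    """Perturb a collaborator's 2D pose with zero-mean Gaussian noise.

    Mimics a typical robustness protocol for cooperative perception
    (the noise magnitudes and model are assumptions, not the paper's values):
    translation noise in metres, heading noise in degrees.
    """
    rng = rng or np.random.default_rng()
    x = x + rng.normal(0.0, pos_std)
    y = y + rng.normal(0.0, pos_std)
    yaw = yaw + np.deg2rad(rng.normal(0.0, yaw_std_deg))
    return x, y, yaw


def ego_from_collab(pose_ego, pose_collab):
    """Homogeneous transform mapping collaborator coordinates into the ego
    frame, built from the (possibly noisy) 2D poses."""
    def to_mat(x, y, yaw):
        c, s = np.cos(yaw), np.sin(yaw)
        return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])
    return np.linalg.inv(to_mat(*pose_ego)) @ to_mat(*pose_collab)


# Example: perturb the collaborator's pose, then build the warping transform.
T = ego_from_collab((0.0, 0.0, 0.0), noisy_pose(10.0, 2.0, 0.1))
```

Feature misalignment induced by such perturbations is exactly what the two-stream design targets, which is consistent with TSMV's comparatively small drop from the Perfect to the Noisy setting in the table above.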
Camera-based results on SemanticOPV2V: IoU (%) for occupancy prediction (Occ. Pred.) and semantic occupancy prediction (Sem. Occ. Pred.), with class frequencies in parentheses:

Class | RL (Occ. Pred.) | CoHFF [37] (Occ. Pred.) | TSMV (Occ. Pred.) | CoHFF [37] (Sem. Occ. Pred.) | TSMV (Sem. Occ. Pred.)
---|---|---|---|---|---
Mean IoU | 57.12 | 65.24 | 65.76 | 31.72 | 34.04
Building (5.40%) | 67.50 | 44.08 | 43.63 | 24.00 | 31.49
Fence (0.85%) | 59.40 | 63.63 | 64.15 | 22.86 | 27.71
Terrain (4.80%) | 43.60 | 76.99 | 77.37 | 10.21 | 12.89
Pole (0.39%) | 66.30 | 61.57 | 61.29 | 46.33 | 49.93
Road (40.53%) | 51.47 | 87.90 | 88.68 | 64.80 | 65.80
Sidewalk (35.64%) | 45.46 | 86.27 | 86.49 | 57.44 | 63.11
Vegetation (1.11%) | 43.61 | 33.26 | 33.94 | 10.47 | 10.86
Vehicles (9.14%) | 41.40 | 70.15 | 72.62 | 82.21 | 84.11
Wall (2.01%) | 71.51 | 80.48 | 81.13 | 38.23 | 37.74
Guard rail (0.04%) | 49.67 | 39.91 | 39.76 | 23.81 | 17.97
Traffic signs (0.05%) | 68.98 | 57.06 | 58.10 | 0.24 | 6.93
Bridge (0.04%) | 76.53 | 81.59 | 81.97 | 0.00 | 0.00
Runtime efficiency comparison (LiDAR-based, V2VSSC, Perfect setting):

Method | Parameters (M) | Time (ms) | mIoU (Perfect)
---|---|---|---
No Fusion | 10.03 | 50 | 0.418
Late Fusion | 10.03 | 50 | 0.433
Early Fusion | 10.03 | 54 | 0.461
V2X-ViT [26] | 15.07 | 88 | 0.440
V2VSSC [25] | 10.03 | 31 | 0.462
TSMV (Ours) | 11.23 | 48 | 0.492
Ablation of the major components (mIoU under the Perfect setting):

Ego Stream | Fuse Stream | NSCAT | mIoU (Perfect)
---|---|---|---
✓ | × | × | 0.418
× | ✓ | × | 0.444
✓ | ✓ | × | 0.465
✓ | ✓ | ✓ | 0.492
Impact of the neighborhood kernel size:

Kernel Size | mIoU | Road | Car | Terrain | Building | Veg. | Pole
---|---|---|---|---|---|---|---
3 | 0.475 | 0.700 | 0.673 | 0.528 | 0.303 | 0.462 | 0.183
5 | 0.477 | 0.709 | 0.662 | 0.532 | 0.334 | 0.461 | 0.170
7 | 0.480 | 0.711 | 0.658 | 0.563 | 0.347 | 0.466 | 0.132
9 | 0.492 | 0.708 | 0.666 | 0.577 | 0.338 | 0.481 | 0.180
11 | 0.486 | 0.715 | 0.653 | 0.586 | 0.333 | 0.487 | 0.143
mIoU under increasing positional error:

Method | 0 | 0–500 | 0–1000 | 0–2000 | 0–3000
---|---|---|---|---|---
No Fusion | 0.418 | 0.418 | 0.418 | 0.418 | 0.418
Late Fusion | 0.433 | 0.414 | 0.403 | 0.398 | 0.397
Early Fusion | 0.461 | 0.416 | 0.404 | 0.397 | 0.395
V2VSSC [25] | 0.462 | 0.427 | 0.415 | 0.409 | 0.406
TSMV | 0.492 | 0.452 | 0.440 | 0.435 | 0.433
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).