Real-Time Drivable Region Mapping Using an RGB-D Sensor with Loop Closure Refinement and 3D Semantic Map-Merging
Abstract
1. Introduction
- Semantic visual SLAM and map-merging methods are proposed to identify drivable areas in large-scale environments. Our method visualizes drivable areas, which traditional visual SLAM cannot depict;
- A loop closure refinement for semantic information is developed to build seamless semantic 3D maps. This approach lets semantic segmentation and visual SLAM results be used simultaneously to create a unified semantic map, minimizing the gap between the semantic information and the world/agent data;
- A 3D point cloud registration-based semantic map-merging method operates not only on 3D point descriptors but also on semantic labels to create a large-scale map. By using fewer points in overlapping areas, the proposed method accelerates the map-merging process;
- Our proposed system was tested in real-world experiments with a single RGB-D camera attached to a vehicle. We constructed a large-scale drivable map from multiple segmented datasets, demonstrating higher accuracy than existing methods.
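The label-aware matching idea in the contributions above — pairing map points not only by geometry but also by semantic class — can be illustrated with a toy sketch. This is a hedged simplification, not the paper's actual implementation: the function name, the `(x, y, z, label)` point format, and the brute-force search are all assumptions made for illustration.

```python
import math

def semantic_correspondences(src, dst, max_dist):
    """Associate each source point with its nearest destination point
    that carries the SAME semantic label (e.g. 'road' vs 'curb').
    Points are (x, y, z, label) tuples. Gating the search on matching
    labels shrinks the candidate set, which is the intuition behind
    using semantic labels alongside point descriptors when merging."""
    pairs = []
    for sx, sy, sz, sl in src:
        best, best_d = None, max_dist
        for i, (dx, dy, dz, dl) in enumerate(dst):
            if dl != sl:  # label gate: skip mismatched semantic classes
                continue
            d = math.dist((sx, sy, sz), (dx, dy, dz))
            if d < best_d:
                best, best_d = i, d
        if best is not None:
            pairs.append((best, best_d))
    return pairs

# Toy maps: each source point matches only same-label destinations.
src = [(0.0, 0.0, 0.0, "road"), (1.0, 0.0, 0.0, "curb")]
dst = [(0.1, 0.0, 0.0, "road"), (0.05, 0.0, 0.0, "curb"), (1.1, 0.0, 0.0, "curb")]
matches = semantic_correspondences(src, dst, 0.5)
```

In a real pipeline, the brute-force loop would be replaced by a k-d tree or descriptor-based search; the label gate carries over unchanged.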
2. Related Work
3. Methodology
3.1. Problem Formulation
3.2. System Overview
3.3. Drivable Region Mapping
3.4. Map-Merging
Algorithm 1 Map-merging
Algorithm 2 Extracting point clouds in the overlapping region
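The bodies of the algorithms are not reproduced in this excerpt. As a sketch only — assuming maps are plain lists of `(x, y, z)` tuples, an illustrative simplification rather than the paper's data structure — one plausible way to extract the points of one map that lie in the overlapping region of another uses a coarse grid hash:

```python
import math

def extract_overlap(map_a, map_b, radius):
    """Return the points of map_a that fall within `radius` of any point
    of map_b — a simple proxy for the 'overlapping region' between two
    partial maps. A grid hash with cell size `radius` keeps the neighbor
    search from degenerating to O(|A| * |B|)."""
    cell = radius
    grid = {}
    for q in map_b:
        key = tuple(int(c // cell) for c in q)
        grid.setdefault(key, []).append(q)

    def near(p):
        kx, ky, kz = (int(c // cell) for c in p)
        # Any neighbor within `radius` must sit in one of the 27 adjacent cells.
        return any(
            math.dist(p, q) <= radius
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            for dz in (-1, 0, 1)
            for q in grid.get((kx + dx, ky + dy, kz + dz), ())
        )

    return [p for p in map_a if near(p)]

map_a = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
map_b = [(0.2, 0.0, 0.0)]
print(extract_overlap(map_a, map_b, 0.5))  # → [(0.0, 0.0, 0.0)]
```

Restricting subsequent registration to such an overlap subset is one way a merge can be run on fewer points, as the contributions describe.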
4. Experiments and Results
4.1. Datasets
4.2. Drivable Region Mapping Results
4.3. Map-Merging Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
| | Map1 | Map2 | Map3 | Map4 | Map5 | Map6 | Map7 | Map8 |
|---|---|---|---|---|---|---|---|---|
| Length (m) | 491 | 754 | 1062 | 911 | 1482 | 531 | 374 | 510 |
| Area (m²) | 15,947 | 37,046 | 38,689 | 33,237 | 76,572 | 11,966 | 9215 | 5030 |
| Images | 2472 | 3119 | 4755 | 3447 | 6423 | 3361 | 1495 | 3987 |
| Time (s) | 164 | 211 | 326 | 231 | 434 | 168 | 106 | 156 |
| | Map1 | Map2 | Map3 | Merged Map |
|---|---|---|---|---|
| Length (m) | 3179 | 2219 | 5861 | 11,285 |
| Images | 5673 | 3563 | 11,671 | 19,745 |
| RMSE | 5.674 | 11.20 | 14.75 | 18.79 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ha, C.; Yang, D.; Wang, G.; Kim, S.C.; Jo, H. Real-Time Drivable Region Mapping Using an RGB-D Sensor with Loop Closure Refinement and 3D Semantic Map-Merging. Appl. Sci. 2024, 14, 11613. https://doi.org/10.3390/app142411613