Neural Radiance Field Dynamic Scene SLAM Based on Ray Segmentation and Bundle Adjustment
Abstract
1. Introduction
- We propose a joint dynamic pixel detection and segmentation method based on semantic segmentation and Conditional Random Fields (CRFs), which adaptively adjusts the loss function to enable tracking and mapping in dynamic environments. By maintaining a set of dynamic pixels, we fill in and complete the occluded regions they leave behind (a CRF segmentation sketch follows this list).
- We propose a semantic-based neural implicit dynamic SLAM framework that uses an implicit truncated signed distance field (TSDF) representation, enabling tracking and mapping in dynamic environments (a TSDF ray-rendering sketch follows this list).
- We propose loop closure detection and keyframe selection strategies based on Lucas–Kanade (LK) optical flow, performing inter-frame matching by computing optical flow vectors. We incorporate loop closure keyframes into the optimization, bringing loop closure detection into the neural implicit SLAM system (an LK matching sketch follows this list).
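A minimal sketch of the dynamic–static labeling step, assuming per-pixel dynamic probabilities from a semantic segmentation network: the unary term is the negative log-likelihood of each label, a Potts pairwise term smooths labels over the 4-neighborhood, and the energy is minimized here with iterated conditional modes. The function name, solver choice, and parameters are illustrative assumptions, not the paper's exact CRF formulation.

```python
import numpy as np

def crf_dynamic_segmentation(p_dynamic, n_iters=5, smoothness=2.0):
    """Label pixels static (0) or dynamic (1) with a pairwise CRF (sketch).

    p_dynamic: HxW array of dynamic probabilities (assumed to come from a
    semantic segmentation network). Unary = -log p(label); pairwise = Potts
    over 4-neighbours; minimized by iterated conditional modes (ICM).
    """
    eps = 1e-6
    unary = np.stack([-np.log(1.0 - p_dynamic + eps),  # cost of static label
                      -np.log(p_dynamic + eps)])       # cost of dynamic label
    labels = (p_dynamic > 0.5).astype(np.int64)        # initialize from unaries
    H, W = labels.shape
    for _ in range(n_iters):
        padded = np.pad(labels, 1, mode="edge")
        # 4-neighbour labels: up, down, left, right.
        neigh = np.stack([padded[:-2, 1:-1], padded[2:, 1:-1],
                          padded[1:-1, :-2], padded[1:-1, 2:]])
        cost = np.empty((2, H, W))
        for lbl in (0, 1):
            # Unary cost plus a penalty for each disagreeing neighbour.
            cost[lbl] = unary[lbl] + smoothness * (neigh != lbl).sum(axis=0)
        labels = cost.argmin(axis=0)                   # ICM update
    return labels
```

The resulting mask can serve both purposes the contribution describes: masking dynamic pixels out of the tracking and mapping losses, and maintaining the dynamic-pixel set used for filling and completion.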
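For the TSDF representation, a hedged sketch of how TSDF samples along a camera ray can be converted into rendering weights that peak at the surface crossing, yielding rendered depth and color. The sigmoid-product weighting and the constants below are assumptions in the spirit of TSDF-based neural implicit SLAM systems; the paper's exact weighting may differ.

```python
import torch

def render_ray_tsdf(tsdf, colors, z_vals, trunc=0.06, beta=10.0):
    """Render depth and color from TSDF samples along one ray (sketch).

    tsdf, z_vals: (N,) samples along the ray; colors: (N, 3).
    The product of two opposed sigmoids forms a bump centred where the
    TSDF crosses zero, i.e. at the surface.
    """
    w = torch.sigmoid(beta * tsdf / trunc) * torch.sigmoid(-beta * tsdf / trunc)
    w = w / (w.sum() + 1e-8)                  # normalize weights along the ray
    depth = (w * z_vals).sum()                # expected ray depth
    color = (w[:, None] * colors).sum(dim=0)  # expected ray color
    return depth, color
```

During optimization, rendered depth and color at pixels the CRF marks static can be compared against the observed frame, while rays through dynamic pixels are down-weighted or excluded, which is one way to realize the adaptive loss adjustment described above.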
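Finally, a sketch of the LK-based inter-frame matching using OpenCV's pyramidal Lucas–Kanade tracker: corners are tracked between consecutive frames, the mean flow magnitude can drive keyframe selection, and the matched pairs can score loop-closure candidates. The thresholding policy in the trailing comment is an illustrative assumption, not the paper's exact criterion.

```python
import cv2
import numpy as np

def lk_frame_matching(prev_gray, curr_gray, max_corners=500):
    """Match two grayscale frames with pyramidal Lucas-Kanade optical flow.

    Returns matched point pairs and the mean flow magnitude; how these feed
    keyframe selection and loop-closure scoring is an assumption here.
    """
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_corners,
                                  qualityLevel=0.01, minDistance=7)
    if pts is None:
        return np.empty((0, 2)), np.empty((0, 2)), 0.0
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    good = status.ravel() == 1                 # keep successfully tracked points
    p0 = pts[good].reshape(-1, 2)
    p1 = nxt[good].reshape(-1, 2)
    mean_flow = float(np.linalg.norm(p1 - p0, axis=1).mean()) if len(p0) else 0.0
    return p0, p1, mean_flow

# Illustrative keyframe rule (threshold in pixels is an assumption):
# insert_keyframe = mean_flow > 20.0 or len(p0) < 100
```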
2. Related Work
2.1. Dynamic Visual SLAM
2.2. Neural Implicit SLAM
3. Methodology
3.1. Dynamic and Static Pixel Segmentation Based on Conditional Random Fields
3.2. Neural Implicit Rendering for Dynamic Environments
3.3. Optical Flow-Based Dynamic Tracking and Loop Closure Detection
4. Experiments
4.1. Experimental Settings
4.2. Results on TUM RGB-D
4.3. Results on Bonn
4.4. Results on KITTI
4.5. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Mur-Artal, R.; Montiel, J.M.M.; Tardós, J.D. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Trans. Robot. 2015, 31, 1147–1163.
- Li, M.; Liu, S.; Zhou, H. SGS-SLAM: Semantic Gaussian splatting for neural dense SLAM. arXiv 2024, arXiv:2402.03246.
- Newcombe, R.A.; Izadi, S.; Hilliges, O.; Molyneaux, D.; Kim, D.; Davison, A.J.; Kohli, P.; Shotton, J.; Hodges, S.; Fitzgibbon, A. KinectFusion: Real-time dense surface mapping and tracking. In Proceedings of the 2011 10th IEEE International Symposium on Mixed and Augmented Reality, Basel, Switzerland, 26–29 October 2011; pp. 127–136.
- Whelan, T.; Leutenegger, S.; Salas-Moreno, R.F.; Glocker, B.; Davison, A.J. ElasticFusion: Dense SLAM without a pose graph. In Proceedings of Robotics: Science and Systems, Rome, Italy, 13–17 July 2015; Volume 11, p. 3.
- Dai, A.; Nießner, M.; Zollhöfer, M.; Izadi, S.; Theobalt, C. BundleFusion: Real-time globally consistent 3D reconstruction using on-the-fly surface reintegration. ACM Trans. Graph. 2017, 36, 1.
- Yang, X.; Ming, Y.; Cui, Z.; Calway, A. FD-SLAM: 3-D reconstruction using features and dense matching. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 8040–8046.
- Li, M.; Huang, J.; Sun, L.; Tian, A.X.; Deng, T.; Wang, H. NGM-SLAM: Gaussian splatting SLAM with radiance field submap. arXiv 2024, arXiv:2405.05702.
- Dai, W.; Zhang, Y.; Li, P.; Fang, Z.; Scherer, S. RGB-D SLAM in dynamic environments using point correlations. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 373–389.
- Tang, B.; Cao, S. A review of VSLAM technology applied in augmented reality. IOP Conf. Ser. Mater. Sci. Eng. 2020, 782, 042014.
- Cheng, J.; Zhang, L.; Chen, Q.; Hu, X.; Cai, J. A review of visual SLAM methods for autonomous driving vehicles. Eng. Appl. Artif. Intell. 2022, 114, 104992.
- Kerl, C.; Sturm, J.; Cremers, D. Dense visual SLAM for RGB-D cameras. In Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, 3–7 November 2013; pp. 2100–2106.
- Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. NeRF: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106.
- Li, M.; He, J.; Jiang, G.; Wang, H. DDN-SLAM: Real-time dense dynamic neural implicit SLAM with joint semantic encoding. arXiv 2024, arXiv:2401.01545.
- He, J.; Li, M.; Wang, Y.; Wang, H. OVD-SLAM: An online visual SLAM for dynamic environments. IEEE Sens. J. 2023, 23, 13210–13219.
- Li, M.; He, J.; Wang, Y.; Wang, H. End-to-end RGB-D SLAM with multi-MLPs dense neural implicit representations. IEEE Robot. Autom. Lett. 2023, 8, 7138–7145.
- Zhou, H.; Guo, Z.; Liu, S.; Zhang, L.; Wang, Q.; Ren, Y.; Li, M. MoD-SLAM: Monocular dense mapping for unbounded 3D scene reconstruction. arXiv 2024, arXiv:2402.03762.
- Sucar, E.; Liu, S.; Ortiz, J.; Davison, A.J. iMAP: Implicit mapping and positioning in real-time. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 6229–6238.
- Zhu, Z.; Peng, S.; Larsson, V.; Xu, W.; Bao, H.; Cui, Z.; Oswald, M.R.; Pollefeys, M. NICE-SLAM: Neural implicit scalable encoding for SLAM. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12786–12796.
- Johari, M.M.; Carta, C.; Fleuret, F. ESLAM: Efficient dense SLAM system based on hybrid representation of signed distance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 17408–17419.
- Wang, H.; Wang, J.; Agapito, L. Co-SLAM: Joint coordinate and sparse parametric encodings for neural real-time SLAM. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 13293–13302.
- Yu, C.; Liu, Z.; Liu, X.J.; Xie, F.; Yang, Y.; Wei, Q.; Fei, Q. DS-SLAM: A semantic visual SLAM towards dynamic environments. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1168–1174.
- Bescos, B.; Fácil, J.M.; Civera, J.; Neira, J. DynaSLAM: Tracking, mapping, and inpainting in dynamic scenes. IEEE Robot. Autom. Lett. 2018, 3, 4076–4083.
- Bescos, B.; Campos, C.; Tardós, J.D.; Neira, J. DynaSLAM II: Tightly-coupled multi-object tracking and SLAM. IEEE Robot. Autom. Lett. 2021, 6, 5191–5198.
- Du, Z.J.; Huang, S.S.; Mu, T.J.; Zhao, Q.; Martin, R.R.; Xu, K. Accurate dynamic SLAM using CRF-based long-term consistency. IEEE Trans. Vis. Comput. Graph. 2020, 28, 1745–1757.
- Wang, Y.; Xu, K.; Tian, Y.; Ding, X. DRG-SLAM: A semantic RGB-D SLAM using geometric features for indoor dynamic scene. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; pp. 1352–1359.
- Rünz, M.; Agapito, L. Co-Fusion: Real-time segmentation, tracking and fusion of multiple objects. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 4471–4478.
- Rosinol, A.; Leonard, J.J.; Carlone, L. NeRF-SLAM: Real-time dense monocular SLAM with neural radiance fields. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; pp. 3437–3444.
- Teed, Z.; Deng, J. DROID-SLAM: Deep visual SLAM for monocular, stereo, and RGB-D cameras. Adv. Neural Inf. Process. Syst. 2021, 34, 16558–16569.
- Baker, S.; Matthews, I. Lucas–Kanade 20 years on: A unifying framework. Int. J. Comput. Vis. 2004, 56, 221–255.
- Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A benchmark for the evaluation of RGB-D SLAM systems. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 7–12 October 2012; pp. 573–580.
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
- Palazzolo, E.; Behley, J.; Lottes, P.; Giguere, P.; Stachniss, C. ReFusion: 3D reconstruction in dynamic environments for RGB-D cameras exploiting residuals. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 7855–7862.
- Campos, C.; Elvira, R.; Rodríguez, J.J.G.; Montiel, J.M.; Tardós, J.D. ORB-SLAM3: An accurate open-source library for visual, visual–inertial, and multimap SLAM. IEEE Trans. Robot. 2021, 37, 1874–1890.
- Xu, Z.; Niu, J.; Li, Q.; Ren, T.; Chen, C. NID-SLAM: Neural implicit representation-based RGB-D SLAM in dynamic environments. arXiv 2024, arXiv:2401.01189.
ATE RMSE (m) on TUM RGB-D dynamic sequences:

Sequence | ORB-SLAM3 | NICE-SLAM | ESLAM | DynaSLAM | NID-SLAM | Ours
---|---|---|---|---|---|---
fr3/w/xyz | 0.507 | 0.305 | 0.716 | 0.087 | 0.091 | 0.045
fr3/w/half | 0.254 | 0.627 | 0.241 | 0.065 | 0.074 | 0.036
fr3/w/static | 0.109 | 0.093 | 0.036 | 0.039 | 0.041 | 0.022
fr3/w/rpy | 0.595 | 0.724 | 0.189 | 0.107 | 0.115 | 0.064
fr3/w/xyz_v | 0.764 | 0.583 | 0.297 | 0.078 | 0.081 | 0.043
fr3/w/half_v | 0.351 | 0.296 | 0.175 | 0.059 | 0.064 | 0.034
fr3/s/xyz | 0.013 | 0.394 | 0.025 | 0.053 | 0.061 | 0.021
fr3/s/half | 0.026 | 0.109 | 0.019 | 0.031 | 0.037 | 0.016
ATE RMSE (m) on the Bonn RGB-D dynamic dataset:

Sequence | ORB-SLAM3 | NICE-SLAM | ESLAM | Ours
---|---|---|---|---
balloon1 | 0.078 | 2.234 | 0.204 | 0.026
balloon2 | 0.245 | 1.989 | 0.236 | 0.031
move1 | 0.230 | 0.213 | 0.079 | 0.023
move2 | 0.127 | 0.816 | 0.103 | 0.027
crowd1 | 0.335 | 1.765 | 0.317 | 0.015
crowd2 | 0.762 | 3.481 | 1.143 | 0.024
person1 | 0.723 | 0.233 | 0.147 | 0.042
person2 | 0.971 | 0.467 | 0.453 | 0.066
ATE RMSE (m) on KITTI sequences:

Sequence | ORB-SLAM3 | NICE-SLAM | ESLAM | DynaSLAM | NID-SLAM | Ours
---|---|---|---|---|---|---
KITTI 00 | 1.7 | 7.0 | 5.9 | 1.4 | 4.2 | 1.2
KITTI 01 | 10.4 | 47.0 | 38.6 | 9.4 | 28.2 | 8.1
KITTI 02 | 5.4 | 36.5 | 23.8 | 6.7 | 20.1 | 5.7
KITTI 03 | 0.7 | 5.0 | 4.4 | 0.6 | 1.8 | 0.5
KITTI 04 | 0.4 | 2.3 | 1.8 | 0.2 | 0.6 | 0.3
Metric | w/o Dynamic–Static Seg | w/ Dynamic–Static Seg
---|---|---
ATE RMSE (m) ↓ | 0.078 | 0.026
STD (m) ↓ | 0.051 | 0.009
Metric | w/o Loop Detection | w/ Loop Detection
---|---|---
ATE RMSE (m) ↓ | 0.059 | 0.031
STD (m) ↓ | 0.033 | 0.012
Method | Track. (ms) | Map. (ms) | FPS | GPU Usage
---|---|---|---|---
ESLAM | – | – | 7.5 | 7.6 G
NICE-SLAM | – | – | 0.08 | 14.1 G
DynaSLAM | – | – | 13.7 | 8.8 G
Ours | – | – | 15.3 | 5.7 G
Zhang, Y.; Feng, G. Neural Radiance Field Dynamic Scene SLAM Based on Ray Segmentation and Bundle Adjustment. Sensors 2025, 25, 1679. https://doi.org/10.3390/s25061679