Neural Surfel Reconstruction: Addressing Loop Closure Challenges in Large-Scale 3D Neural Scene Mapping
Abstract
1. Introduction
- We propose Neural Surfel Reconstruction, a 3D neural reconstruction system with loop closure constraints, which is the first to combine learned geometric neural features with surfel elements.
- We design a novel surface representation based on a new type of surfel equipped with neural descriptors that unify geometry and position for robust 3D reconstruction (an illustrative sketch of such a surfel follows this list).
- We employ pose graph optimization over the neural surfels to improve tasks such as scene reconstruction in environments with large loops.
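As a rough illustration only, the hypothetical `NeuralSurfel` container below shows what such an element might store: the geometric attributes of a classical surfel plus a learned latent descriptor. The field names, the 32-dimensional descriptor size, and the `point_ids` bookkeeping are assumptions made for this sketch, not the paper's specification.

```python
# Hypothetical container for a neural surfel (illustrative only): classical
# surfel geometry plus a learned latent descriptor for the SDF decoder.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class NeuralSurfel:
    position: np.ndarray        # (3,) surfel center in the world frame
    normal: np.ndarray          # (3,) unit surface normal
    radius: float               # spatial extent of the surfel's volume
    descriptor: np.ndarray = field(     # latent code; 32-D is an assumed size
        default_factory=lambda: np.zeros(32, dtype=np.float32))
    point_ids: np.ndarray = field(      # indices of associated cloud points
        default_factory=lambda: np.empty(0, dtype=np.int64))
```

In such a design, pose graph optimization would act on the geometric fields, while the descriptor conditions the local SDF prediction.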
2. Related Works
2.1. Classical Geometry-Based Methods
2.2. Learning-Based Methods
3. Neural Surfel Reconstruction
3.1. Neural Surfel Representation and Extraction
- Densely and randomly sample neural surfels from the original point cloud, at a rate of roughly one surfel per 200–1000 points. This initializes an overcomplete set of surfels covering the object’s surfaces.
- Prune surfels that are too close to each other based on distance and normal deviation thresholds. This removes redundant surfels representing the same local region.
- For each remaining surfel, associate nearby points using K-Nearest Neighbor (KNN) search and generate an initial volumetric bounding region. Specifically, set a K value (e.g., K = 5) and calculate the voxel size as twice the distance between the surfel and its K-th nearest neighbor. This provides a spatial extent for the nearby points.
- Re-assign the points within each surfel’s volume to that surfel.
- Further cluster the points within each surfel’s volume into different surface patches and create one surfel per patch. Then, update each surfel’s associated points according to the cluster that its center falls into.
- Remove outlier surfels whose associated point sets contain fewer than 20 points; the regions they leave behind expose missing surfaces, and new surfels are initialized at the centroids of those point clusters. A compact code sketch of the full extraction procedure follows this list.
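The sketch below walks through these extraction steps. It is a minimal illustration under assumptions, not the authors' implementation: it expects per-point normals to be available, uses a KD-tree for the neighborhood queries, substitutes DBSCAN for the unspecified patch-clustering step, and the numeric defaults (sampling rate, 0.05 m pruning distance, 20° normal deviation) are placeholder values.

```python
# Minimal sketch of the surfel extraction pipeline described above (not the
# authors' code). Inputs: points (N, 3) and unit normals (N, 3).
import numpy as np
from scipy.spatial import cKDTree
from sklearn.cluster import DBSCAN   # stand-in for the unspecified clustering

def extract_surfels(points, normals, sample_rate=500, k=5,
                    min_dist=0.05, max_normal_dev_deg=20.0, min_points=20):
    n = points.shape[0]

    # 1. Randomly sample candidate surfels (roughly 1 per `sample_rate` points,
    #    cf. the 1:200-1000 rate in the text).
    idx = np.random.choice(n, size=max(1, n // sample_rate), replace=False)
    centers, center_normals = points[idx], normals[idx]

    # 2. Prune surfels that are mutually too close and have similar normals.
    keep = np.ones(len(centers), dtype=bool)
    center_tree = cKDTree(centers)
    cos_thresh = np.cos(np.deg2rad(max_normal_dev_deg))
    for i in range(len(centers)):
        if not keep[i]:
            continue
        for j in center_tree.query_ball_point(centers[i], r=min_dist):
            if j != i and keep[j] and center_normals[i] @ center_normals[j] > cos_thresh:
                keep[j] = False
    centers = centers[keep]

    # 3. Volume per surfel: voxel size = 2 * distance to the K-th nearest point.
    point_tree = cKDTree(points)
    knn_dist, _ = point_tree.query(centers, k=k)
    voxel_sizes = 2.0 * knn_dist[:, -1]

    surfels = []
    for c, s in zip(centers, voxel_sizes):
        # 4. Re-assign the points that fall inside this surfel's volume.
        local = np.asarray(point_tree.query_ball_point(c, r=s))
        if local.size < min_points:
            continue
        # 5. Cluster the local points into surface patches and create one
        #    surfel per patch (DBSCAN here is an assumption).
        labels = DBSCAN(eps=s / 4.0, min_samples=3).fit_predict(points[local])
        for lab in set(labels):
            if lab == -1:               # DBSCAN noise label
                continue
            patch = local[labels == lab]
            # 6. Drop outlier surfels supported by fewer than `min_points`
            #    points; surviving patches get a surfel at the cluster centroid.
            if patch.size < min_points:
                continue
            nrm = normals[patch].mean(axis=0)
            nrm /= np.linalg.norm(nrm)
            surfels.append({"center": points[patch].mean(axis=0),
                            "normal": nrm, "size": float(s),
                            "point_ids": patch})
    return surfels
```

Calling `extract_surfels(points, normals)` on a point cloud with normals returns a list of surfel records holding center, normal, volume size, and associated point indices, to which the later stages could attach neural descriptors.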
3.2. SDF Prediction
3.3. Training and Inference
3.4. Neural Surfel Pose Graph
3.5. Mesh Generation
4. Results and Evaluation
4.1. Data Preparation
4.2. Surfel Graph Optimization
4.3. Object Reconstruction
4.4. Scene Reconstruction
4.5. Failure Cases and Limitations
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Groueix, T.; Fisher, M.; Kim, V.G.; Russell, B.C.; Aubry, M. A papier-mâché approach to learning 3d surface generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 216–224.
- Whelan, T.; Leutenegger, S.; Salas-Moreno, R.; Glocker, B.; Davison, A. ElasticFusion: Dense SLAM without a pose graph. In Proceedings of the Robotics: Science and Systems, Rome, Italy, 13–17 July 2015; p. 11.
- Weise, T.; Wismer, T.; Leibe, B.; Van Gool, L. Online loop closure for real-time interactive 3D scanning. Comput. Vis. Image Underst. 2011, 115, 635–648.
- Behley, J.; Stachniss, C. Efficient Surfel-Based SLAM using 3D Laser Range Data in Urban Environments. In Proceedings of the Robotics: Science and Systems, Pittsburgh, PA, USA, 26–30 June 2018; Volume 2018, p. 59.
- Park, J.J.; Florence, P.; Straub, J.; Newcombe, R.; Lovegrove, S. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 165–174.
- Sucar, E.; Liu, S.; Ortiz, J.; Davison, A.J. iMAP: Implicit mapping and positioning in real-time. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 6229–6238.
- Chabra, R.; Lenssen, J.E.; Ilg, E.; Schmidt, T.; Straub, J.; Lovegrove, S.; Newcombe, R. Deep local shapes: Learning local sdf priors for detailed 3d reconstruction. In Computer Vision–ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXIX 16; Springer: Berlin/Heidelberg, Germany, 2020; pp. 608–625.
- Peng, S.; Niemeyer, M.; Mescheder, L.; Pollefeys, M.; Geiger, A. Convolutional occupancy networks. In Computer Vision–ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part III 16; Springer: Berlin/Heidelberg, Germany, 2020; pp. 523–540.
- Zhu, Z.; Peng, S.; Larsson, V.; Xu, W.; Bao, H.; Cui, Z.; Oswald, M.R.; Pollefeys, M. Nice-slam: Neural implicit scalable encoding for slam. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12786–12796.
- Yang, X.; Li, H.; Zhai, H.; Ming, Y.; Liu, Y.; Zhang, G. Vox-Fusion: Dense tracking and mapping with voxel-based neural implicit representation. In Proceedings of the 2022 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Singapore, 17–21 October 2022; pp. 499–507.
- Jiang, C.; Sud, A.; Makadia, A.; Huang, J.; Nießner, M.; Funkhouser, T. Local implicit grid representations for 3d scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6001–6010.
- Newcombe, R.A.; Izadi, S.; Hilliges, O.; Molyneaux, D.; Kim, D.; Davison, A.J.; Kohli, P.; Shotton, J.; Hodges, S.; Fitzgibbon, A. Kinectfusion: Real-time dense surface mapping and tracking. In Proceedings of the 2011 10th IEEE International Symposium on Mixed and Augmented Reality, Basel, Switzerland, 26–29 October 2011; pp. 127–136.
- Dai, A.; Nießner, M.; Zollhöfer, M.; Izadi, S.; Theobalt, C. Bundlefusion: Real-time globally consistent 3d reconstruction using on-the-fly surface reintegration. ACM Trans. Graph. 2017, 36, 1.
- Curless, B.; Levoy, M. A volumetric method for building complex models from range images. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, New Orleans, LA, USA, 4–9 August 1996; pp. 303–312.
- Keller, M.; Lefloch, D.; Lambers, M.; Izadi, S.; Weyrich, T.; Kolb, A. Real-time 3d reconstruction in dynamic scenes using point-based fusion. In Proceedings of the 2013 International Conference on 3D Vision-3DV 2013, Washington, DC, USA, 29 June–1 July 2013; pp. 1–8.
- Lefloch, D.; Weyrich, T.; Kolb, A. Anisotropic point-based fusion. In Proceedings of the 2015 18th International Conference on Information Fusion (Fusion), Washington, DC, USA, 6–9 July 2015; pp. 2121–2128.
- Lefloch, D.; Kluge, M.; Sarbolandi, H.; Weyrich, T.; Kolb, A. Comprehensive use of curvature for robust and accurate online surface reconstruction. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2349–2365.
- Pfister, H.; Zwicker, M.; Van Baar, J.; Gross, M. Surfels: Surface elements as rendering primitives. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, New Orleans, LA, USA, 23–28 July 2000; pp. 335–342.
- Yan, Z.; Ye, M.; Ren, L. Dense visual SLAM with probabilistic surfel map. IEEE Trans. Vis. Comput. Graph. 2017, 23, 2389–2398.
- Zhang, J.; Singh, S. LOAM: Lidar odometry and mapping in real-time. In Proceedings of the Robotics: Science and Systems, Berkeley, CA, USA, 12–16 July 2014; Volume 2, pp. 1–9.
- Cui, J.; Schwertfeger, S. CP+: Camera Poses Augmentation with Large-scale LiDAR Maps. In Proceedings of the 2022 IEEE International Conference on Real-time Computing and Robotics (RCAR), Guiyang, China, 17–22 July 2022; pp. 69–74.
- Vizzo, I.; Chen, X.; Chebrolu, N.; Behley, J.; Stachniss, C. Poisson surface reconstruction for LiDAR odometry and mapping. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi'an, China, 30 May–5 June 2021; pp. 5624–5630.
- Ruan, J.; Li, B.; Wang, Y.; Sun, Y. Slamesh: Real-time lidar simultaneous localization and meshing. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 3546–3552.
- Weder, S.; Schonberger, J.; Pollefeys, M.; Oswald, M.R. Routedfusion: Learning real-time depth map fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4887–4897.
- Weder, S.; Schonberger, J.L.; Pollefeys, M.; Oswald, M.R. Neuralfusion: Online depth fusion in latent space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 3162–3172.
- Teed, Z.; Deng, J. Droid-slam: Deep visual slam for monocular, stereo, and rgb-d cameras. Adv. Neural Inf. Process. Syst. 2021, 34, 16558–16569.
- Gkioxari, G.; Malik, J.; Johnson, J. Mesh r-cnn. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9785–9795.
- Wang, N.; Zhang, Y.; Li, Z.; Fu, Y.; Liu, W.; Jiang, Y.G. Pixel2mesh: Generating 3d mesh models from single rgb images. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 52–67.
- Yang, X.; Cao, M.; Li, C.; Zhao, H.; Yang, D. Learning Implicit Neural Representation for Satellite Object Mesh Reconstruction. Remote Sens. 2023, 15, 4163.
- Chen, Z.; Zhang, H. Learning implicit fields for generative shape modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5939–5948.
- Mescheder, L.; Oechsle, M.; Niemeyer, M.; Nowozin, S.; Geiger, A. Occupancy networks: Learning 3d reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4460–4470.
- Boulch, A.; Marlet, R. Poco: Point convolution for surface reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 6302–6314.
- Choy, C.B.; Xu, D.; Gwak, J.; Chen, K.; Savarese, S. 3d-r2n2: A unified approach for single and multi-view 3d object reconstruction. In Computer Vision–ECCV 2016, Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part VIII 14; Springer: Berlin/Heidelberg, Germany, 2016; pp. 628–644.
- Wang, W.; Gao, F.; Shen, Y. Res-NeuS: Deep Residuals and Neural Implicit Surface Learning for Multi-View Reconstruction. Sensors 2024, 24, 881.
- Li, Z.; Müller, T.; Evans, A.; Taylor, R.H.; Unberath, M.; Liu, M.Y.; Lin, C.H. Neuralangelo: High-fidelity neural surface reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 8456–8465.
- Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106.
- Azinović, D.; Martin-Brualla, R.; Goldman, D.B.; Nießner, M.; Thies, J. Neural rgb-d surface reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 6290–6301.
- Huang, J.; Huang, S.S.; Song, H.; Hu, S.M. Di-fusion: Online implicit 3d reconstruction with deep priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8932–8941.
- Li, K.; Tang, Y.; Prisacariu, V.A.; Torr, P.H. Bnv-fusion: Dense 3d reconstruction using bi-level neural volume fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 6166–6175.
- Jiang, C.; Shao, H. Fast 3D Reconstruction of UAV Images Based on Neural Radiance Field. Appl. Sci. 2023, 13, 10174.
- Ge, Y.; Guo, B.; Zha, P.; Jiang, S.; Jiang, Z.; Li, D. 3D Reconstruction of Ancient Buildings Using UAV Images and Neural Radiation Field with Depth Supervision. Remote Sens. 2024, 16, 473.
- Wang, P.; Liu, L.; Liu, Y.; Theobalt, C.; Komura, T.; Wang, W. Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. arXiv 2021, arXiv:2106.10689.
- Zhang, X.; Bi, S.; Sunkavalli, K.; Su, H.; Xu, Z. Nerfusion: Fusing radiance fields for large-scale scene reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5449–5458.
- Xu, Q.; Xu, Z.; Philip, J.; Bi, S.; Shu, Z.; Sunkavalli, K.; Neumann, U. Point-nerf: Point-based neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5438–5448.
- Cao, J.; Zhao, X.; Schwertfeger, S. Large-Scale Indoor Visual–Geometric Multimodal Dataset and Benchmark for Novel View Synthesis. Sensors 2024, 24, 5798.
- Zhou, Y.; Zeng, Z.; Chen, A.; Zhou, X.; Ni, H.; Zhang, S.; Li, P.; Liu, L.; Zheng, M.; Chen, X. Evaluating modern approaches in 3d scene reconstruction: Nerf vs gaussian-based methods. In Proceedings of the 2024 6th International Conference on Data-Driven Optimization of Complex Systems (DOCS), Hangzhou, China, 16–18 August 2024; pp. 926–931.
- Gao, Y.; Cao, Y.P.; Shan, Y. SurfelNeRF: Neural Surfel Radiance Fields for Online Photorealistic Reconstruction of Indoor Scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 108–118.
- Kerbl, B.; Kopanas, G.; Leimkühler, T.; Drettakis, G. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Trans. Graph. 2023, 42, 139:1–139:14.
- Cui, J.; Cao, J.; Zhong, Y.; Wang, L.; Zhao, F.; Wang, P.; Chen, Y.; He, Z.; Xu, L.; Shi, Y.; et al. LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives. arXiv 2024, arXiv:2404.09748.
- Handa, A.; Pătrăucean, V.; Stent, S.; Cipolla, R. Scenenet: An annotated model generator for indoor scene understanding. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 5737–5743.
- Lorensen, W.E.; Cline, H.E. Marching cubes: A high resolution 3D surface construction algorithm. In Seminal Graphics: Pioneering Efforts that Shaped the Field; ACM, Inc.: New York, NY, USA, 1998; pp. 347–353.
- Sumner, R.W.; Schmid, J.; Pauly, M. Embedded deformation for shape manipulation. In ACM SIGGRAPH 2007 Papers; ACM, Inc.: New York, NY, USA, 2007; p. 80-es.
- Chen, J.; Izadi, S.; Fitzgibbon, A. KinÊtre: Animating the world with the human body. In Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology, Cambridge, MA, USA, 7–10 October 2012; pp. 435–444.
- Chang, A.X.; Funkhouser, T.; Guibas, L.; Hanrahan, P.; Huang, Q.; Li, Z.; Savarese, S.; Savva, M.; Song, S.; Su, H.; et al. Shapenet: An information-rich 3d model repository. arXiv 2015, arXiv:1512.03012.
- Straub, J.; Whelan, T.; Ma, L.; Chen, Y.; Wijmans, E.; Green, S.; Engel, J.J.; Mur-Artal, R.; Ren, C.; Verma, S.; et al. The Replica dataset: A digital replica of indoor spaces. arXiv 2019, arXiv:1906.05797.
- Dai, A.; Chang, A.X.; Savva, M.; Halber, M.; Funkhouser, T.; Nießner, M. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5828–5839.
- Vizzo, I.; Guadagnino, T.; Mersch, B.; Wiesmann, L.; Behley, J.; Stachniss, C. Kiss-icp: In defense of point-to-point icp–simple, accurate, and robust registration if done the right way. IEEE Robot. Autom. Lett. 2023, 8, 1029–1036.
| Scene | Initial ASE (m) | Optimized ASE (m) |
|---|---|---|
| SceneNet Bedroom | 0.9781 | 0.0809 |
| SceneNet Living-room | 0.5323 | 0.1341 |
| SceneNet Office | 0.7122 | 0.0731 |
| Replica Office 2 | 0.6441 | 0.1202 |
| Replica Office 3 | 0.6231 | 0.0911 |
| Replica Apartment 0 | 1.2021 | 0.3218 |
| Replica Room 0 | 0.4230 | 0.0208 |
| Replica Room 1 | 0.7413 | 0.0730 |
| CD (↓) | Ours | DeepSDF | DeepLS |
|---|---|---|---|
| sofa | 0.132 | 0.141 | 0.044 |
| chair | 0.204 | 0.117 | 0.030 |
| lamp | 0.832 | 1.034 | 0.078 |
| table | 0.553 | 0.341 | 0.032 |
| Method | SceneNet [50] CD (↓) | SceneNet Normal (↑) | SceneNet F-Score (↑) | Replica [55] CD (↓) | Replica Normal (↑) | Replica F-Score (↑) |
|---|---|---|---|---|---|---|
| DeepSDF [5] | 4.611 | 0.510 | 0.068 | 8.123 | 0.122 | 0.232 |
| DeepLS [7] | 12.836 | 0.001 | 0.470 | 5.642 | 0.041 | 0.341 |
| ConvONet [8] | 0.076 | 0.510 | 0.692 | 0.082 | 0.412 | 0.703 |
| LIG [11] | 0.059 | 0.517 | 0.623 | 0.043 | 0.519 | 0.663 |
| POCO [32] | 0.062 | 0.547 | 0.652 | 0.041 | 0.621 | 0.688 |
| Poisson | 0.084 | 0.374 | 0.401 | 0.120 | 0.476 | 0.612 |
| Ours | 0.056 | 0.578 | 0.694 | 0.048 | 0.682 | 0.692 |
| Poisson* | 0.049 | 0.445 | 0.854 | 0.086 | 0.511 | 0.820 |
| File Size (MB) | Point Clouds | Ours | Poisson | Poisson* |
|---|---|---|---|---|
| SceneNet | 132.8 | 2.8 | 4.8 | 1224.3 |
| ScanNet++ | 42.9 | 7.1 | 12.3 | 612.2 |
| Dyson Lab | 28.0 | 0.8 | 10.6 | 295.6 |
| UAV | 411.7 | 3.2 | 34.9 | 3422.1 |