Incremental SFM 3D Reconstruction Based on Deep Learning
Abstract
1. Introduction
2. Related Work
3. The Proposed Algorithm
3.1. UAV Dataset
3.2. Selecting Image Pairs to Match
3.3. SuperPoint and SuperGlue Overview
3.3.1. Feature Extraction
- During pretraining of the feature point detector, synthetic images with pseudo-labels are used. The output is a heatmap of the same size as the input image, where each pixel value gives the probability that the pixel is an interest point. Sparse feature points are obtained by applying non-maximum suppression (NMS), and the resulting model is called MagicPoint [27]. The synthetic images, which have clear edge and corner features, are generated randomly and include triangles, squares, lines, cubes, checkerboards, and stars, among others. Because they are computer-generated, the ground-truth coordinates of the feature points in each image are known exactly. To further improve robustness, Gaussian noise and circles containing no feature points are randomly added to the images during training, which yields better generalization and robustness than classical detectors.
- The feature point detector is then trained on unlabeled real images using the Homography Adaptation mechanism. First, N (N = 100) random homography transformations are applied to each unlabeled image to generate N warped images. These warped images are fed into the MagicPoint model, and the detected feature points are projected back onto the original image. The union of these projected feature points serves as the ground truth for training (see the sketch after this list). Detected in this way, the feature points are more abundant and exhibit a degree of homography invariance.
- Finally, the original images, together with their homographically transformed counterparts, are fed into the SuperPoint network. Using the feature point locations and the known correspondences between the two views, the network is trained to output both feature points and descriptors.
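To make the Homography Adaptation step above concrete, the following is a minimal NumPy/OpenCV sketch of the warp-aggregate-NMS logic, not the authors' implementation: `detector` stands in for the pretrained MagicPoint model (any callable returning a heatmap), and `sample_random_homography` is an illustrative corner-jitter sampler.

```python
import numpy as np
import cv2

def sample_random_homography(w, h, jitter=0.15):
    """Illustrative sampler: perturb the four image corners, fit a homography."""
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    offsets = np.random.uniform(-jitter, jitter, size=(4, 2)) * [w, h]
    dst = (src + offsets).astype(np.float32)
    return cv2.getPerspectiveTransform(src, dst)

def nms_heatmap(heatmap, conf_thresh=0.015, nms_radius=4):
    """Keep only local maxima of the interest-point heatmap (simple NMS)."""
    kernel = np.ones((2 * nms_radius + 1, 2 * nms_radius + 1), np.uint8)
    local_max = cv2.dilate(heatmap, kernel)           # max filter
    keep = (heatmap == local_max) & (heatmap > conf_thresh)
    ys, xs = np.nonzero(keep)
    return np.stack([xs, ys], axis=1)                 # (x, y) coordinates

def homography_adaptation(image, detector, num_homographies=100):
    """Average detector heatmaps over random warps, then run NMS."""
    h, w = image.shape[:2]
    accum = np.zeros((h, w), np.float32)
    counts = np.zeros((h, w), np.float32) + 1e-6      # avoid division by zero
    ones = np.ones((h, w), np.float32)
    for _ in range(num_homographies):
        H = sample_random_homography(w, h)
        warped = cv2.warpPerspective(image, H, (w, h))
        heat = detector(warped)                       # MagicPoint-style heatmap
        H_inv = np.linalg.inv(H)
        accum += cv2.warpPerspective(heat, H_inv, (w, h))   # project back
        counts += cv2.warpPerspective(ones, H_inv, (w, h))  # visibility mask
    return nms_heatmap(accum / counts)
```

Averaging by the visibility mask rather than dividing by N keeps border pixels, which are seen by fewer warps, from being unfairly suppressed.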
3.3.2. Feature Matching
3.3.3. Matching Strategy
3.4. Bundle Adjustment
3.5. Multi-View Stereo
Geometric Filtering
4. Experiments
4.1. Feature Matching
4.2. Depth Map Estimation
4.3. Multi-View Stereo
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Snavely, N.; Seitz, S.M.; Szeliski, R. Photo tourism: Exploring photo collections in 3D. ACM Trans. Graph. 2006, 25, 835–846. [Google Scholar]
- Zheng, E.L.; Wu, C.C. Structure from Motion using Structure-less Resection. In Proceedings of the International Conference on Computer Vision (ICCV2015), Santiago, Chile, 13–16 December 2015; p. 240. [Google Scholar]
- Schonberger, J.L.; Frahm, J.M. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4104–4113. [Google Scholar]
- Griwodz, C.; Gasparini, S.; Castan, F. AliceVision Meshroom: An open-source 3D reconstruction pipeline. In Proceedings of the 12th ACM Multimedia Systems Conference, Istanbul, Turkey, 28–30 September 2021; pp. 241–247. [Google Scholar]
- Gao, L.; Zhao, Y.B.; Han, J.C.; Liu, H.X. Research on Multi-View 3D Reconstruction Technology Based on SFM. Sensors 2022, 22, 4366. [Google Scholar] [CrossRef] [PubMed]
- Yin, H.Y.; Yu, H.Y. Incremental SFM 3D reconstruction based on monocular. In Proceedings of the 13th International Symposium on Computational Intelligence and Design (ISCID2020), Hangzhou, China, 12–13 December 2020; pp. 17–21. [Google Scholar]
- Triggs, B.; McLauchlan, P.F.; Hartley, R.I. Bundle adjustment—A modern synthesis. In Proceedings of the International Workshop on Vision Algorithms, Corfu, Greece, 21–22 September 1999; pp. 298–372. [Google Scholar]
- Xue, Y.D.; Zhang, S.; Zhou, M.L.; Zhu, H.H. Novel SfM-DLT method for metro tunnel 3D reconstruction and visualization. Undergr. Space 2021, 6, 134–141. [Google Scholar] [CrossRef]
- Qu, Y.; Huang, J.; Zhang, X. Rapid 3D reconstruction for image sequence acquired from UAV camera. Sensors 2018, 18, 225. [Google Scholar] [CrossRef]
- Lindenberger, P.; Sarlin, P.E.; Larsson, V.; Pollefeys, M. Pixel-perfect structure-from-motion with featuremetric refinement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–18 October 2021; pp. 5987–5997. [Google Scholar]
- Yao, Y.; Luo, Z.X.; Li, S.W.; Fang, T.; Quan, L. MVSNet: Depth inference for unstructured multi-view stereo. In Proceedings of the European Conference on Computer Vision (ECCV2018), Munich, Germany, 8–14 September 2018; pp. 767–783. [Google Scholar]
- Yao, Y.; Luo, Z.X.; Li, S.W.; Shen, T.W.; Fang, T.; Quan, L. Recurrent MVSNet for high-resolution multi-view stereo depth inference. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR2019), Long Beach, CA, USA, 16–20 June 2019; pp. 5525–5534. [Google Scholar]
- Dai, Y.C.; Zhu, Z.D.; Rao, Z.B.; Li, B. MVS2: Deep unsupervised multi-view stereo with multi-view symmetry. In Proceedings of the International Conference on 3D Vision, Quebec City, QC, Canada, 16–19 September 2019; pp. 1–8. [Google Scholar]
- Huang, B.C.; Yi, H.W.; Huang, C.; He, Y.J.; Liu, J.B.; Liu, X. M3VSNet: Unsupervised multi-metric multi-view stereo network. In Proceedings of the IEEE International Conference on Image Processing (ICIP2021), Anchorage, AK, USA, 19–22 September 2021; pp. 3163–3167. [Google Scholar]
- Mildenhall, B.; Srinivasan, P.P.; Tancik, M. NeRF: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106. [Google Scholar] [CrossRef]
- Tancik, M.; Casser, V.; Yan, X. Block-NeRF: Scalable large scene neural view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 8248–8258. [Google Scholar]
- Chen, A.; Xu, Z.; Zhao, F. MVSNeRF: Fast generalizable radiance field reconstruction from multi-view stereo. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 14124–14133. [Google Scholar]
- Garbin, S.J.; Kowalski, M.; Johnson, M. FastNeRF: High-fidelity neural rendering at 200FPS. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 14346–14355. [Google Scholar]
- DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperPoint: Self-supervised interest point detection and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 224–236. [Google Scholar]
- Sarlin, P.E.; DeTone, D.; Malisiewicz, T. Superglue: Learning feature matching with graph neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 4938–4947. [Google Scholar]
- Zhou, W.; Chen, X. Global convergence of a new hybrid Gauss–Newton structured BFGS method for nonlinear least squares problems. SIAM J. Optim. 2010, 20, 2422–2441. [Google Scholar] [CrossRef]
- Ma, F.; Karaman, S. Sparse-to-dense: Depth prediction from sparse depth samples and a single image. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA2018), Brisbane, Australia, 21–25 May 2018; pp. 4796–4803. [Google Scholar]
- Simo-Serra, E.; Trulls, E.; Ferraz, L. Discriminative learning of deep convolutional feature point descriptors. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015; pp. 118–126. [Google Scholar]
- Yi, K.M.; Trulls, E.; Lepetit, V. Lift: Learned invariant feature transform. In Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 467–483. [Google Scholar]
- Dusmanu, M.; Rocco, I.; Pajdla, T. D2-Net: A trainable CNN for joint detection and description of local features. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR2019), Long Beach, CA, USA, 16–20 June 2019; pp. 8092–8101. [Google Scholar]
- Sun, J.; Shen, Z.; Wang, Y. LoFTR: Detector-free local feature matching with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR2021), Virtual, 19–25 June 2021; pp. 8922–8931. [Google Scholar]
- DeTone, D.; Malisiewicz, T.; Rabinovich, A. Toward geometric deep SLAM. arXiv 2017, arXiv:1707.07410. [Google Scholar]
- Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transportation distances. Adv. Neural Inf. Process. Syst. 2013, 26, 2292–2300. [Google Scholar]
- Van Etten, A. You only look twice: Rapid multi-scale object detection in satellite imagery. arXiv 2018, arXiv:1805.09512. [Google Scholar]
- Ranganathan, A. The Levenberg-Marquardt algorithm. Tutor. LM Algorithm 2004, 11, 101–110. [Google Scholar]
- Li, Y.Y.; Fan, S.Y.; Sun, Y.B.; Wang, Q.; Sun, S.L. Bundle adjustment method using sparse BFGS solution. Remote Sens. Lett. 2018, 9, 789–798. [Google Scholar] [CrossRef]
- Zhao, S.H.; Li, Y.Y.; Cao, J.; Cao, X.X. A BFGS-Corrected Gauss-Newton Solver for Bundle Adjustment. Acta Sci. Nat. Univ. Pekin. 2020, 56, 1013–1019. [Google Scholar]
- Sinha, A.; Murez, Z.; Bartolozzi, J.; Badrinarayanan, V.; Rabinovich, A. DELTAS: Depth estimation by learning triangulation and densification of sparse points. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference (ECCV2020), Glasgow, UK, 23–28 August 2020; pp. 104–121. [Google Scholar]
- Yao, Y.; Luo, Z.X.; Li, S.W.; Zhang, J.Y. BlendedMVS: A large-scale dataset for generalized multi-view stereo networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR2020), Seattle, WA, USA, 14–19 June 2020; pp. 1790–1799. [Google Scholar]
| Method | Feature Points | Match Points | Time (s) | FPS |
|---|---|---|---|---|
| SIFT + SuperGlue + RANSAC | 2392 | 1398 | 0.052 | 19.23 |
| SuperPoint + SuperGlue + RANSAC | 1927 | 1631 | 0.040 | 25.00 |
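The RANSAC stage shared by both rows is a geometric-verification step applied to the matcher's output. Below is a minimal, matcher-agnostic sketch using OpenCV's fundamental-matrix RANSAC; the `(N, 2)` keypoint arrays and `(K, 2)` match-index pairs are assumed formats for illustration, not the exact interface of the SuperGlue code.

```python
import numpy as np
import cv2

def ransac_filter(kpts0, kpts1, matches, thresh=1.0, confidence=0.999):
    """Keep only matches consistent with a single fundamental matrix.

    kpts0, kpts1: (N, 2) / (M, 2) keypoint locations in each image.
    matches:      (K, 2) index pairs from the matcher (e.g. SuperGlue).
    """
    pts0 = kpts0[matches[:, 0]].astype(np.float64)
    pts1 = kpts1[matches[:, 1]].astype(np.float64)
    F, inlier_mask = cv2.findFundamentalMat(
        pts0, pts1, cv2.FM_RANSAC, thresh, confidence)
    if F is None:                      # degenerate geometry: keep nothing
        return matches[:0]
    return matches[inlier_mask.ravel() == 1]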
| Dataset | Colmap RMSE / REL | Meshroom RMSE / REL | MVSNet [10] RMSE / REL | R-MVSNet [11] RMSE / REL | Ours RMSE / REL |
|---|---|---|---|---|---|
| DTU | 1.183 / 0.213 | 1.343 / 0.238 | 2.56 / 0.323 | 2.68 / 0.318 | 0.875 / 0.129 |
| NYU-Depth-v2 | 1.524 / 0.325 | 2.191 / 0.376 | 3.117 / 0.512 | 4.235 / 0.54 | 1.479 / 0.213 |
| BlendedMVS | 2.21 / 0.491 | 3.412 / 0.837 | 2.361 / 0.419 | 3.168 / 0.612 | 1.779 / 0.290 |
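For reference, the RMSE and REL columns follow the standard depth-evaluation definitions, RMSE = sqrt(mean((d̂ − d)²)) and REL = mean(|d̂ − d| / d) over valid pixels. A minimal sketch, assuming metric depth maps with zeros marking missing ground truth:

```python
import numpy as np

def depth_metrics(pred, gt):
    """RMSE and mean absolute relative error over valid ground-truth pixels."""
    valid = gt > 0                          # zeros assumed to mark missing depth
    diff = pred[valid] - gt[valid]
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    rel = float(np.mean(np.abs(diff) / gt[valid]))
    return rmse, rel
```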
| Reconstruction Stage | Colmap (125 images) | Meshroom (125 images) | Ours (125 images) | Colmap (136 images) | Meshroom (136 images) | Ours (136 images) |
|---|---|---|---|---|---|---|
| Feature Detection | 46 s | 2.3 min | 53 s | 51 s | 2.7 min | 59 s |
| Feature Matching | 1.3 min | 2.6 min | 1.1 min | 1.5 min | 3.1 min | 1.2 min |
| SFM | 7.7 min | 9.2 min | 8.3 min | 8.6 min | 11 min | 9.0 min |
| Global BA | 15 s | 38 s | 10 s | 17 s | 43 s | 12 s |
| MVS | 51.3 min | 63.1 min | 32.7 min | 57.6 min | 70.5 min | 36.1 min |
| 3D Points | 8.9 M | 11.3 M | 13.2 M | 26.3 M | 28.5 M | 31.4 M |