Multi-View Stereo Vision Patchmatch Algorithm Based on Data Augmentation
Abstract
1. Introduction
2. Related Work
2.1. Traditional MVS
2.2. Learning-Based MVS
3. Proposed Algorithm
3.1. Multi-Scale Feature Extraction
3.2. Learning-Based Patchmatch
3.2.1. Initialization and Local Perturbation
3.2.2. Adaptive Propagation
3.2.3. Adaptive Evaluation
- Feature mapping
- Cost matching
- Adaptive Spatial Cost Aggregation
- Depth Regression
3.3. Dynamic Interval d
3.4. Data Augmentation
3.4.1. View Mask
3.4.2. Gamma Correction
3.4.3. Color Transformation and Blurring
3.5. Loss Function
4. Experiments
4.1. Datasets
- DTU dataset [1]
- The DTU dataset is an indoor multi-view stereo dataset captured in a controlled laboratory environment. It contains 128 scenes, each imaged from 49 viewpoints under different lighting conditions. All scenes are scanned in the same standardized setup, and the camera follows the same trajectory for every scene. The DTU dataset is used to train and evaluate our network. Following the setup common to most multi-view stereo works, 79 of the 128 scenes form the training set, for which ground-truth depth maps are provided, and 22 scenes form the test set, on which the reconstructed 3D point clouds are evaluated.
- Tanks and Temples dataset [2]
4.2. Implementation Details
4.3. Benchmark Performance
4.3.1. Evaluation on DTU Dataset
4.3.2. Calculation Time and Memory Consumption
4.3.3. Evaluation on Tanks and Temples Dataset
4.4. Ablation Study
4.4.1. Data Augmentation (DA)
4.4.2. Dynamic Interval d
4.4.3. Adaptive Propagation (AP) and Adaptive Evaluation (AE)
4.4.4. Number of Iterations of Patchmatch
4.4.5. Number of Views
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Aanæs, H.; Jensen, R.R.; Vogiatzis, G.; Tola, E.; Dahl, A.B. Large-Scale Data for Multiple-View Stereopsis. Int. J. Comput. Vis. 2016, 120, 153–168.
2. Knapitsch, A.; Park, J.; Zhou, Q.-Y.; Koltun, V. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Trans. Graph. 2017, 36, 78.
3. Yao, Y.; Luo, Z.; Li, S.; Fang, T.; Quan, L. MVSNet: Depth inference for unstructured multi-view stereo. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 767–783.
4. Yao, Y.; Luo, Z.; Li, S.; Shen, T.; Fang, T.; Quan, L. Recurrent MVSNet for High-resolution Multi-view Stereo Depth Inference. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 5525–5534.
5. Luo, K.; Guan, T.; Ju, L.; Huang, H.; Luo, Y. P-MVSNet: Learning Patch-Wise matching confidence aggregation for Multi-View Stereo. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 10452–10461.
6. Yu, Z.; Gao, S. Fast-MVSNet: Sparse-to-Dense Multi-View Stereo With Learned Propagation and Gauss-Newton Refinement. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1949–1958.
7. Wang, F.; Galliani, S.; Vogel, C.; Speciale, P.; Pollefeys, M. PatchmatchNet: Learned Multi-View Patchmatch Stereo. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 14194–14203.
8. Huang, P.-H.; Matzen, K.; Kopf, J.; Ahuja, N.; Huang, J.-B. DeepMVS: Learning Multi-view Stereopsis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2821–2830.
9. Peng, R.; Wang, R.; Wang, Z.; Lai, Y.; Wang, R. Rethinking depth estimation for multi-view stereo: A unified representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8645–8654.
10. Xu, H.; Zhou, Z.; Qiao, Y.; Kang, W.; Wu, Q. Self-supervised Multi-view Stereo via Effective Co-Segmentation and Data-Augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, Held Virtually, 2–9 February 2021; Volume 35, pp. 3030–3038.
11. Sinha, S.N.; Mordohai, P.; Pollefeys, M. Multi-view stereo via graph cuts on the dual of an adaptive tetrahedral mesh. In Proceedings of the 2007 IEEE 11th International Conference on Computer Vision, Rio de Janeiro, Brazil, 14–21 October 2007; pp. 1–8.
12. Ulusoy, A.O.; Black, M.J.; Geiger, A. Semantic Multi-view Stereo: Jointly estimating objects and voxels. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4531–4540.
13. Li, Z.; Wang, K.; Zuo, W.; Meng, D.; Zhang, L. Detail-preserving and Content-aware Variational Multi-view Stereo Reconstruction. IEEE Trans. Image Proc. 2016, 25, 864–877.
14. Locher, A.; Perdoch, M.; Van Gool, L. Progressive Prioritized Multi-view Stereo. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3244–3252.
15. Galliani, S.; Lasinger, K.; Schindler, K. Massively parallel multiview stereopsis by surface normal diffusion. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 873–881.
16. Schonberger, J.L.; Frahm, J.-M. Structure-from-Motion Revisited. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 4104–4113.
17. Xu, Q.; Tao, W. Multi-scale geometric consistency guided multi-view stereo. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5483–5492.
18. Ji, M.; Gall, J.; Zheng, H.; Liu, Y.; Fang, L. SurfaceNet: An End-to-end 3D Neural Network for Multiview Stereopsis. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2307–2315.
19. Wang, F.; Galliani, S.; Vogel, C.; Pollefeys, M. IterMVS: Iterative probability estimation for efficient multi-view stereo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8606–8615.
20. Wei, Z.; Zhu, Q.; Min, C.; Chen, Y.; Wang, G. Bidirectional Hybrid LSTM Based Recurrent Neural Network for Multi-view Stereo. IEEE Trans. Vis. Comput. Graph. 2022.
21. Yan, J.; Wei, Z.; Yi, H.; Ding, M.; Zhang, R.; Chen, Y.; Wang, G.; Tai, Y.-W. Dense hybrid recurrent multi-view stereo net with dynamic consistency checking. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020, Proceedings, Part IV; Springer International Publishing: Cham, Switzerland, 2020; pp. 674–689.
22. Gao, S.; Li, Z.; Wang, Z. Cost Volume Pyramid Network with Multi-strategies Range Searching for Multi-view Stereo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4877–4886.
23. Gu, X.; Fan, Z.; Zhu, S.; Dai, Z.; Tan, F.; Tan, P. Cascade cost volume for high-resolution multi-view stereo and stereo matching. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2495–2504.
24. Cheng, S.; Xu, Z.; Zhu, S.; Li, Z.; Li, L.E.; Ramamoorthi, R.; Su, H. Deep stereo using adaptive thin volume representation with uncertainty awareness. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2521–2531.
25. Liao, J.; Ding, Y.; Shavit, Y.; Huang, D.; Ren, S.; Guo, J.; Feng, W.; Zhang, K. WT-MVSNet: Window-based Transformers for Multi-view Stereo. arXiv 2022, arXiv:2205.14319.
26. Luo, K.; Guan, T.; Ju, L.; Wang, Y.; Chen, Z.; Luo, Y. Attention-Aware Multi-View Stereo. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1587–1596.
27. Zhang, J.; Yao, Y.; Li, S.; Luo, Z.; Fang, T. Visibility-aware Multi-view Stereo Network. arXiv 2020, arXiv:2008.07928.
28. Wei, Z.; Zhu, Q.; Min, C.; Chen, Y.; Wang, G. AA-RMVSNet: Adaptive aggregation recurrent multi-view stereo network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 6187–6196.
29. Xu, Q.; Tao, W. Learning inverse depth regression for multi-view stereo with correlation cost volume. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12508–12515.
30. Duggal, S.; Wang, S.; Ma, W.-C.; Hu, R.; Urtasun, R. DeepPruner: Learning Efficient Stereo Matching via Differentiable PatchMatch. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4384–4393.
31. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning; PMLR: Vienna, Austria, 2020; pp. 1597–1607.
32. Xie, Q.; Dai, Z.; Hovy, E.; Luong, M.-T.; Le, Q.V. Unsupervised Data Augmentation for Consistency Training. Adv. Neural Inf. Proc. Syst. 2020, 33, 6256–6268.
33. Campbell, N.D.; Vogiatzis, G.; Hernández, C.; Cipolla, R. Using multiple hypotheses to improve depth-maps for multi-view stereo. In Computer Vision–ECCV 2008: 10th European Conference on Computer Vision, Marseille, France, 12–18 October 2008, Proceedings, Part I 10; Springer: Berlin/Heidelberg, Germany, 2008; pp. 766–779.
34. Furukawa, Y.; Ponce, J. Accurate, dense, and robust multiview stereopsis. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1362–1376.
35. Chen, R.; Han, S.; Xu, J.; Su, H. Point-based multi-view stereo network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1538–1547.
36. Wang, S.; Li, B.; Dai, Y. Efficient Multi-View Stereo by Iterative Dynamic Cost Volume. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 8655–8664.
37. Ma, X.; Gong, Y.; Wang, Q.; Huang, J.; Chen, L.; Yu, F. EPP-MVSNet: Epipolar-assembling based Depth Prediction for Multi-view Stereo. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 5732–5740.

Quantitative results on the DTU evaluation set (lower is better).

Algorithm | Acc. (mm) | Comp. (mm) | Overall (mm) |
---|---|---|---|
Camp [33] | 0.835 | 0.554 | 0.695 |
Furu [34] | 0.613 | 0.941 | 0.777 |
Gipuma [15] | 0.283 | 0.873 | 0.578 |
COLMAP [16] | 0.400 | 0.664 | 0.532 |
SurfaceNet [18] | 0.450 | 1.040 | 0.745 |
MVSNet [3] | 0.396 | 0.527 | 0.462 |
P-MVSNet [5] | 0.406 | 0.434 | 0.420 |
R-MVSNet [4] | 0.383 | 0.452 | 0.417 |
Point-MVSNet [35] | 0.342 | 0.411 | 0.376 |
Fast-MVSNet [6] | 0.336 | 0.403 | 0.370 |
AA-RMVSNet [28] | 0.376 | 0.339 | 0.357 |
CasMVSNet [23] | 0.325 | 0.385 | 0.355 |
CVP-MVSNet [22] | 0.296 | 0.406 | 0.351 |
UCS-Net [24] | 0.338 | 0.349 | 0.344 |
PatchmatchNet [7] | 0.427 | 0.277 | 0.352 |
UniMVSNet [9] | 0.352 | 0.278 | 0.315 |
Effi-MVS [36] | 0.321 | 0.313 | 0.317 |
Ours | 0.417 | 0.272 | 0.344 |
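
The Overall column in the DTU table can be read as the average of accuracy and completeness. The short check below is a sketch under that assumption (the averaging rule is the convention commonly used for DTU, not something stated in this section); the rows are copied from the table, and the names `dtu_rows`, `overall_score`, and `max_overall_gap` are illustrative only. Differences of about 0.0005 mm are expected, because the published values are rounded to three decimals.

```python
# Rows copied from the DTU table: (accuracy mm, completeness mm, reported overall mm).
dtu_rows = {
    "Gipuma": (0.283, 0.873, 0.578),
    "MVSNet": (0.396, 0.527, 0.462),
    "PatchmatchNet": (0.427, 0.277, 0.352),
    "Ours": (0.417, 0.272, 0.344),
}


def overall_score(acc_mm, comp_mm):
    """Assumed definition of 'Overall': the mean of accuracy and completeness."""
    return (acc_mm + comp_mm) / 2.0


def max_overall_gap(rows):
    """Largest absolute difference between reported and recomputed overall scores."""
    return max(abs(overall_score(acc, comp) - reported)
               for acc, comp, reported in rows.values())


if __name__ == "__main__":
    for name, (acc, comp, reported) in dtu_rows.items():
        recomputed = overall_score(acc, comp)
        print(f"{name}: recomputed {recomputed:.4f} mm, reported {reported:.3f} mm")
```

Under this reading, the proposed method's gain over PatchmatchNet comes from both terms: accuracy improves from 0.427 to 0.417 mm and completeness from 0.277 to 0.272 mm.
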

F-score (%) on the Tanks and Temples intermediate (Int.) and advanced (Adv.) sets (higher is better).

Algorithm | Int. Mean | Fam. | Fra. | Hor. | Lig. | M60 | Pan. | Pla. | Tra. | Adv. Mean | Aud. | Bal. | Cou. | Mus. | Pal. | Tem. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
COLMAP [16] | 42.14 | 50.41 | 22.25 | 25.63 | 56.43 | 44.83 | 46.97 | 48.53 | 42.04 | 27.24 | 16.02 | 25.23 | 34.70 | 41.51 | 18.05 | 27.94 |
Point-MVSNet [35] | 48.27 | 61.79 | 41.15 | 34.20 | 50.79 | 51.97 | 50.85 | 52.38 | 43.06 | - | - | - | - | - | - | - |
UCS-Net [24] | 54.83 | 76.09 | 53.16 | 43.03 | 54.00 | 55.60 | 51.49 | 57.38 | 47.89 | - | - | - | - | - | - | - |
P-MVSNet [5] | 55.62 | 70.04 | 44.64 | 40.22 | 65.20 | 55.08 | 55.17 | 60.37 | 54.29 | - | - | - | - | - | - | - |
CasMVSNet [23] | 56.84 | 76.37 | 58.45 | 46.26 | 55.81 | 56.11 | 54.06 | 58.18 | 49.51 | 31.12 | 19.81 | 38.46 | 29.10 | 43.87 | 27.36 | 28.11 |
ACMP [33] | 58.41 | 70.30 | 54.06 | 54.11 | 61.65 | 54.16 | 57.60 | 58.12 | 57.25 | 37.44 | 30.12 | 34.68 | 44.58 | 50.64 | 27.20 | 37.43 |
VisMVSNet [27] | 60.03 | 77.40 | 60.23 | 47.07 | 63.44 | 62.21 | 57.28 | 60.54 | 52.07 | 33.78 | 20.79 | 38.77 | 32.45 | 44.20 | 28.73 | 37.70 |
CVP-MVSNet [22] | 54.03 | 76.50 | 47.74 | 36.34 | 55.12 | 57.28 | 54.28 | 57.43 | 47.54 | - | - | - | - | - | - | - |
PatchmatchNet [7] | 53.15 | 66.99 | 52.64 | 43.24 | 54.87 | 52.87 | 49.54 | 54.21 | 50.81 | 32.31 | 23.69 | 37.73 | 30.04 | 41.80 | 28.31 | 32.29 |
AA-RMVSNet [28] | 61.51 | 77.77 | 59.53 | 51.53 | 64.02 | 64.05 | 59.47 | 60.85 | 55.50 | 33.53 | 20.96 | 40.15 | 32.05 | 46.01 | 29.28 | 32.71 |
Fast-MVSNet [6] | 47.39 | 65.18 | 39.59 | 34.98 | 47.81 | 49.16 | 46.20 | 53.27 | 42.91 | - | - | - | - | - | - | - |
EPP-MVSNet [37] | 61.68 | 77.86 | 60.54 | 52.96 | 62.33 | 61.69 | 60.34 | 62.44 | 55.30 | 35.72 | 21.28 | 39.74 | 35.34 | 49.21 | 30.00 | 38.75 |
UniMVSNet [9] | 64.36 | 81.20 | 66.43 | 53.11 | 63.46 | 66.09 | 64.84 | 62.23 | 57.53 | 38.96 | 28.33 | 44.36 | 39.74 | 52.89 | 33.80 | 34.63 |
Effi-MVS [36] | 56.88 | 72.21 | 51.02 | 51.78 | 58.63 | 58.71 | 56.21 | 57.07 | 49.38 | 34.39 | 20.22 | 42.39 | 33.73 | 45.08 | 29.81 | 35.09 |
Ours | 54.79 | 68.10 | 54.60 | 45.65 | 57.32 | 53.43 | 48.21 | 57.64 | 53.33 | 33.97 | 24.51 | 39.43 | 33.24 | 42.53 | 30.26 | 33.83 |
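
The Mean columns of the Tanks and Temples table appear to be plain averages over the per-scene scores. The sketch below recomputes both means for the "Ours" row, assuming an unweighted average; the variable and function names are illustrative and not taken from the paper.

```python
# Per-scene scores copied from the "Ours" row of the Tanks and Temples table.
intermediate_scores = [68.10, 54.60, 45.65, 57.32, 53.43, 48.21, 57.64, 53.33]
advanced_scores = [24.51, 39.43, 33.24, 42.53, 30.26, 33.83]


def mean_score(scores):
    """Unweighted mean over scenes (assumed aggregation rule)."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Reported means in the table: 54.79 (intermediate) and 33.97 (advanced).
    print(f"intermediate mean: {mean_score(intermediate_scores):.3f}")
    print(f"advanced mean: {mean_score(advanced_scores):.3f}")
```

Both recomputed values agree with the reported means to within rounding of the two-decimal table entries.
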

Ablation on data augmentation (DA), DTU evaluation set.

Setting | Acc. (mm) | Comp. (mm) | Overall (mm) |
---|---|---|---|
None | 0.429 | 0.283 | 0.356 |
DA | 0.417 | 0.272 | 0.344 |

Ablation on the dynamic interval d, DTU evaluation set.

Setting | Acc. (mm) | Comp. (mm) | Overall (mm) |
---|---|---|---|
Static d | 0.429 | 0.283 | 0.356 |
Dynamic d | 0.417 | 0.272 | 0.344 |

Ablation on adaptive propagation (AP) and adaptive evaluation (AE), DTU evaluation set.

Setting | Acc. (mm) | Comp. (mm) | Overall (mm) |
---|---|---|---|
None | 0.464 | 0.351 | 0.407 |
AP | 0.437 | 0.293 | 0.365 |
AE | 0.421 | 0.326 | 0.373 |
AP and AE | 0.417 | 0.272 | 0.344 |

Ablation on the number of Patchmatch iterations, DTU evaluation set.

Iterations | Acc. (mm) | Comp. (mm) | Overall (mm) |
---|---|---|---|
1,1,1 | 0.443 | 0.283 | 0.363 |
2,2,1 | 0.417 | 0.272 | 0.344 |
3,3,1 | 0.416 | 0.273 | 0.344 |
4,4,1 | 0.417 | 0.272 | 0.344 |
5,5,1 | 0.417 | 0.272 | 0.344 |

Ablation on the number of input views N, DTU evaluation set.

N | Acc. (mm) | Comp. (mm) | Overall (mm) |
---|---|---|---|
2 | 0.453 | 0.342 | 0.397 |
3 | 0.432 | 0.311 | 0.371 |
4 | 0.428 | 0.280 | 0.354 |
5 | 0.417 | 0.272 | 0.344 |
6 | 0.419 | 0.281 | 0.350 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pan, F.; Wang, P.; Wang, L.; Li, L. Multi-View Stereo Vision Patchmatch Algorithm Based on Data Augmentation. Sensors 2023, 23, 2729. https://doi.org/10.3390/s23052729