Robust Tracking and Clean Background Dense Reconstruction for RGB-D SLAM in a Dynamic Indoor Environment
Abstract
1. Introduction
- A dynamic feature-removal method is proposed, based on a dynamic object mask combined with a depth constraint and a projection-error check. Dynamic object masks are obtained by Mask-RCNN and further refined by connected component analysis and a reference-frame-based optimization;
- We build an optimization model over point, line, and plane features to obtain higher trajectory accuracy and more robust tracking;
- We propose a static-background reconstruction method that uses already-reconstructed information and ray casting to determine pending regions. Dedicated region-handling methods extract static regions from the pending regions, and these static regions are used to reconstruct the static background model.
2. Related Works
3. Proposed Methods
3.1. Dynamic Object Mask Generation
3.1.1. Dynamic Object Mask Extraction Using Mask-RCNN
3.1.2. Mask Optimization by Connected Component Analysis Method
3.1.3. Mask Optimization Using a Reference Frame
3.2. Dynamic Feature Removal
3.2.1. Dynamic Feature Removal by a Mask
3.2.2. Depth Constraint and Projection Error Check
3.3. Subpixel Tracking
3.4. Optimization Model of Points, Lines, and Planes
3.4.1. Line Structure Constraints
3.4.2. Plane Structure Constraints
3.4.3. Sparse Tracking and Reconstruction in Dynamic Scenes
3.5. Mask Optimization by Multiview Projection
3.6. Detailed Overview of Dense Background Reconstruction in Dynamic Scenes
3.7. Determination of Pending Areas
3.8. Divisible and Indivisible Situation Handling
3.8.1. Small Noise Block Removal
3.8.2. Dynamic Block Removal
3.8.3. Static Block Extraction in the Bounding Box
3.8.4. Floating Block Processing in the Bounding Box
3.8.5. Indivisible Situation Handling
3.9. Bounding Box Tracking and Noncontact Static Region Extraction
3.10. Mesh Generation and Postprocessing
4. Experimental Results and Discussion
4.1. Experiments on TUM Datasets
4.1.1. ATE Evaluation Experiment
4.1.2. RPE Evaluation Experiment
4.1.3. Results Analysis of the ATE and RPE
4.1.4. Evaluation of Tracking Rate
4.1.5. Comparison of the Error Metrics and Tracking Rate among the Components of the Proposed Method
4.1.6. Dense Reconstruction of the TUM Dataset
4.2. Experiments on Real Scenes
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Saputra, M.R.U.; Markham, A.; Trigoni, N. Visual SLAM and Structure from Motion in Dynamic Environments: A Survey. ACM Comput. Surv. 2018, 51, 37.
- Chang, J.; Dong, N.; Li, D. A Real-Time Dynamic Object Segmentation Framework for SLAM System in Dynamic Scenes. IEEE Trans. Instrum. Meas. 2021, 70, 2513709.
- Sun, Y.X.; Liu, M.; Meng, M.Q.H. Improving RGB-D SLAM in dynamic environments: A motion removal approach. Robot. Auton. Syst. 2017, 89, 110–122.
- Mur-Artal, R.; Tardos, J.D. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras. IEEE Trans. Robot. 2017, 33, 1255–1262.
- He, K.M.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Fan, Y.C.; Zhang, Q.C.; Liu, S.F.; Tang, Y.L.; Jing, X.; Yao, J.T.; Han, H. Semantic SLAM With More Accurate Point Cloud Map in Dynamic Environments. IEEE Access 2020, 8, 112237–112252.
- Dai, W.; Zhang, Y.; Li, P.; Fang, Z.; Scherer, S. RGB-D SLAM in dynamic environments using point correlations. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 373–389.
- Wang, Y.B.; Huang, S.D. Motion Segmentation based Robust RGB-D SLAM. In Proceedings of the World Congress on Intelligent Control and Automation (WCICA), Shenyang, China, 27–30 June 2014; pp. 3122–3127.
- Liu, Y.; Miura, J. RDMO-SLAM: Real-time visual SLAM for dynamic environments using semantic label prediction with optical flow. IEEE Access 2021, 9, 106981–106997.
- Cheng, J.Y.; Sun, Y.X.; Meng, M.Q.H. Improving monocular visual SLAM in dynamic environments: An optical-flow-based approach. Adv. Robot. 2019, 33, 576–589.
- Brasch, N.; Bozic, A.; Lallemand, J.; Tombari, F. Semantic Monocular SLAM for Highly Dynamic Environments. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 393–400.
- Wang, R.Z.; Wan, W.H.; Wang, Y.K.; Di, K.C. A New RGB-D SLAM Method with Moving Object Detection for Dynamic Indoor Scenes. Remote Sens. 2019, 11, 1143.
- Liu, G.H.; Zeng, W.L.; Feng, B.; Xu, F. DMS-SLAM: A General Visual SLAM System for Dynamic Scenes with Multiple Sensors. Sensors 2019, 19, 3714.
- Bescos, B.; Facil, J.M.; Civera, J.; Neira, J. DynaSLAM: Tracking, Mapping, and Inpainting in Dynamic Scenes. IEEE Robot. Autom. Lett. 2018, 3, 4076–4083.
- Zhang, C.Y.; Huang, T.; Zhang, R.C.; Yi, X.F. PLD-SLAM: A New RGB-D SLAM Method with Point and Line Features for Indoor Dynamic Scene. ISPRS Int. J. Geo-Inf. 2021, 10, 163.
- MacQueen, J. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 27 December 1965–7 January 1966; pp. 281–297.
- Yang, S.Q.; Fan, G.H.; Bai, L.L.; Zhao, C.; Li, D.X. SGC-VSLAM: A Semantic and Geometric Constraints VSLAM for Dynamic Indoor Environments. Sensors 2020, 20, 2432.
- Han, S.; Xi, Z. Dynamic scene semantics SLAM based on semantic segmentation. IEEE Access 2020, 8, 43563–43570.
- Cui, L.Y.; Ma, C.W. SOF-SLAM: A Semantic Visual SLAM for Dynamic Environments. IEEE Access 2019, 7, 166528–166539.
- Cui, L.Y.; Ma, C.W. SDF-SLAM: Semantic Depth Filter SLAM for Dynamic Environments. IEEE Access 2020, 8, 95301–95311.
- Yu, C.; Liu, Z.X.; Liu, X.J.; Xie, F.G.; Yang, Y.; Wei, Q.; Fei, Q. DS-SLAM: A Semantic Visual SLAM towards Dynamic Environments. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1168–1174.
- Cheng, J.; Wang, Z.; Zhou, H.; Li, L.; Yao, J. DM-SLAM: A feature-based SLAM system for rigid dynamic scenes. ISPRS Int. J. Geo-Inf. 2020, 9, 202.
- Zhao, X.; Zuo, T.; Hu, X. OFM-SLAM: A Visual Semantic SLAM for Dynamic Indoor Environments. Math. Probl. Eng. 2021, 9, 202–219.
- Xiao, L.; Wang, J.; Qiu, X.; Rong, Z.; Zou, X. Dynamic-SLAM: Semantic monocular visual localization and mapping based on deep learning in dynamic environment. Robot. Auton. Syst. 2019, 117, 1–16.
- Liu, Y.; Wu, Y.L.; Pan, W.Z. Dynamic RGB-D SLAM Based on Static Probability and Observation Number. IEEE Trans. Instrum. Meas. 2021, 70, 8503411.
- Xie, W.F.; Liu, X.P.; Zheng, M.H. Moving Object Segmentation and Detection for Robust RGBD-SLAM in Dynamic Environments. IEEE Trans. Instrum. Meas. 2021, 70, 5001008.
- Ran, T.; Yuan, L.; Zhang, J.B.; Tang, D.X.; He, L. RS-SLAM: A Robust Semantic SLAM in Dynamic Environments Based on RGB-D Sensor. IEEE Sens. J. 2021, 21, 20657–20664.
- Ai, Y.B.; Rui, T.; Lu, M.; Fu, L.; Liu, S.; Wang, S. DDL-SLAM: A Robust RGB-D SLAM in Dynamic Environments Combined With Deep Learning. IEEE Access 2020, 8, 162335–162342.
- Zhang, L.; Wei, L.Q.; Shen, P.Y.; Wei, W.; Zhu, G.M.; Song, J. Semantic SLAM Based on Object Detection and Improved Octomap. IEEE Access 2018, 6, 75545–75559.
- Runz, M.; Buffier, M.; Agapito, L. MaskFusion: Real-Time Recognition, Tracking and Reconstruction of Multiple Moving Objects. In Proceedings of the IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Munich, Germany, 16–20 October 2018; pp. 10–20.
- Xu, B.B.; Li, W.B.; Tzoumanikas, D.; Bloesch, M.; Davison, A.; Leutenegger, S. MID-Fusion: Octree-based Object-Level Multi-Instance Dynamic SLAM. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 5231–5237.
- Scona, R.; Jaimez, M.; Petillot, Y.R.; Fallon, M.; Cremers, D. StaticFusion: Background Reconstruction for Dense RGB-D SLAM in Dynamic Environments. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 3849–3856.
- Palazzolo, E.; Behley, J.; Lottes, P.; Giguere, P.; Stachniss, C. ReFusion: 3D Reconstruction in Dynamic Environments for RGB-D Cameras Exploiting Residuals. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 4–8 November 2019; pp. 7855–7862.
- Grompone von Gioi, R.; Jakubowicz, J.; Morel, J.-M.; Randall, G. LSD: A Line Segment Detector. Image Process. Line 2012, 2, 35–55.
- Feng, C.; Taguchi, Y.; Kamat, V.R. Fast Plane Extraction in Organized Point Clouds Using Agglomerative Hierarchical Clustering. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 6218–6225.
ATE RMSE (m) on the TUM fr3 sequences ("s" = sitting, "w" = walking):

Sequence | DS-SLAM | OFM-SLAM | DM-SLAM | Dyna-SLAM | ORB-SLAM2 | PLD-SLAM | ReFusion | Xie [28] | Liu [27] | Our Method |
---|---|---|---|---|---|---|---|---|---|---|
Fr3_s_static | 0.0065 | 0.0134 | 0.0063 | 0.0064 | 0.0083 | 0.0063 | 0.009 | 0.007 | 0.0086 | 0.0060 |
Fr3_s_xyz | - | 0.0130 | - | 0.013 | 0.0095 | 0.0092 | 0.040 | 0.013 | 0.0090 | 0.0117 |
Fr3_s_rpy | 0.0187 | 0.0160 | 0.0230 | 0.0302 | 0.019 | 0.0220 | - | 0.043 | 0.0204 | 0.021 |
Fr3_s_half | 0.0148 | 0.0257 | 0.0178 | 0.0191 | 0.035 | 0.0145 | 0.110 | 0.019 | 0.0149 | 0.0173 |
Fr3_w_static | 0.0081 | 0.041 | 0.0079 | 0.0080 | 0.390 | 0.0065 | 0.017 | 0.010 | 0.0108 | 0.016 |
Fr3_w_xyz | 0.0247 | 0.306 | 0.0148 | 0.0158 | 0.614 | 0.0144 | 0.099 | 0.014 | 0.0156 | 0.0140 |
Fr3_w_rpy | 0.4442 | 0.104 | 0.0328 | 0.0402 | 0.973 | 0.2212 | 0.104 | 0.033 | - | 0.0303 |
Fr3_w_half | 0.303 | 0.307 | 0.0274 | 0.0274 | 0.789 | 0.0261 | - | 0.028 | 0.0359 | 0.0227 |
Translational RPE (RMSE and S.D., m/s) on the TUM fr3 sequences:

Sequence | ORB-SLAM2 RMSE | ORB-SLAM2 S.D. | Dyna-SLAM RMSE | Dyna-SLAM S.D. | DS-SLAM RMSE | DS-SLAM S.D. | Fan [8] RMSE | Fan [8] S.D. | Our Method RMSE | Our Method S.D. |
---|---|---|---|---|---|---|---|---|---|---|
Fr3_s_static | 0.0095 | 0.0046 | 0.0126 | 0.0067 | 0.0078 | 0.0038 | 0.0087 | 0.0038 | 0.0073 | 0.0036 |
Fr3_s_xyz | 0.0118 | 0.0057 | 0.0147 | 0.0079 | - | - | - | - | 0.0143 | 0.0076 |
Fr3_s_rpy | 0.0264 | 0.0211 | 0.0316 | 0.0191 | - | - | - | - | 0.0326 | 0.0185 |
Fr3_s_half | 0.0229 | 0.0166 | 0.0192 | 0.009 | - | - | - | - | 0.0222 | 0.0107 |
Fr3_w_static | 0.1928 | 0.1773 | 0.0089 | 0.0044 | 0.0102 | 0.0038 | 0.0102 | 0.0049 | 0.0144 | 0.0081 |
Fr3_w_xyz | 0.4834 | 0.3663 | 0.0217 | 0.0119 | 0.0333 | 0.0229 | 0.0204 | 0.0107 | 0.0182 | 0.0087 |
Fr3_w_rpy | 0.3880 | 0.2823 | 0.0448 | 0.0262 | 0.1503 | 0.1168 | 0.0616 | 0.0357 | 0.0425 | 0.0239 |
Fr3_w_half | 0.3216 | 0.2629 | 0.0284 | 0.0149 | 0.0297 | 0.0152 | 0.0274 | 0.0140 | 0.0243 | 0.0109 |
Rotational RPE (RMSE and S.D., deg/s) on the TUM fr3 sequences:

Scene | Sequence | ORB-SLAM2 RMSE | ORB-SLAM2 S.D. | Dyna-SLAM RMSE | Dyna-SLAM S.D. | DS-SLAM RMSE | DS-SLAM S.D. | Fan [8] RMSE | Fan [8] S.D. | Our Method RMSE | Our Method S.D. |
---|---|---|---|---|---|---|---|---|---|---|---|
Less Dynamic Scenes | Fr3_s_static | 0.2881 | 0.1244 | 0.3416 | 0.1642 | 0.2735 | 0.1215 | 0.2782 | 0.1210 | 0.2673 | 0.1178 |
 | Fr3_s_xyz | 0.4976 | 0.2772 | 0.5162 | 0.2882 | - | - | - | - | 0.5062 | 0.2759 |
 | Fr3_s_rpy | 0.7613 | 0.3954 | 0.833 | 0.470 | - | - | - | - | 0.8206 | 0.4154 |
 | Fr3_s_half | 0.576 | 0.2651 | 0.649 | 0.3155 | - | - | - | - | 0.6945 | 0.3427 |
Highly Dynamic Scenes | Fr3_w_static | 3.5991 | 3.2457 | 0.2612 | 0.1259 | 0.2690 | 0.1215 | 0.2631 | 0.1119 | 0.3336 | 0.1630 |
 | Fr3_w_xyz | 8.8419 | 6.6762 | 0.6284 | 0.3848 | 0.8266 | 0.2826 | 0.6227 | 0.3807 | 0.6033 | 0.3749 |
 | Fr3_w_rpy | 7.5906 | 5.4768 | 0.9894 | 0.5701 | 3.0042 | 2.3065 | 1.3831 | 0.8319 | 1.0791 | 0.6658 |
 | Fr3_w_half | 6.6515 | 5.3990 | 0.7842 | 0.4012 | 0.8142 | 0.4101 | 0.7440 | 0.3459 | 0.8101 | 0.3947 |
Tracking-rate comparison (NSTF: number of successfully tracked frames; STR: successful tracking rate):

Scene | Sequence | Total | ORB-SLAM2 NSTF | ORB-SLAM2 STR | Dyna-SLAM NSTF | Dyna-SLAM STR | DS-SLAM NSTF | DS-SLAM STR | Fan [8] NSTF | Fan [8] STR | Our Method NSTF | Our Method STR |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Less Dynamic Scenes | Fr3_s_static | 679 | 675 | 99.4% | 675 | 99.4% | 676 | 99.6% | 676 | 99.6% | 679 | 100% |
 | Fr3_s_xyz | 1219 | 1219 | 100% | 1219 | 100% | - | - | - | - | 1219 | 100% |
 | Fr3_s_rpy | 795 | 773 | 97% | 760 | 96% | - | - | - | - | 781 | 98% |
 | Fr3_s_half | 1074 | 1074 | 100% | 1074 | 100% | - | - | - | - | 1074 | 100% |
Highly Dynamic Scenes | Fr3_w_static | 717 | 714 | 99.6% | 375 | 52.3% | 714 | 99.6% | 714 | 99.6% | 717 | 100% |
 | Fr3_w_xyz | 827 | 809 | 97.8% | 757 | 91.5% | 826 | 99.9% | 826 | 99.9% | 827 | 100% |
 | Fr3_w_rpy | 866 | 825 | 95.3% | 546 | 63.1% | 864 | 99.8% | 864 | 99.8% | 858 | 98% |
 | Fr3_w_half | 1021 | 942 | 99.3% | 525 | 51.4% | 1018 | 99.7% | 1018 | 99.7% | 1021 | 100% |
Average | | 900 | 879 | 98.6% | 741 | 81.7% | - | - | - | - | 897 | 99.5% |
Ablation results on the TUM fr3 sequences (M: mask-based dynamic-feature removal; CR: depth-constraint and projection-error check; LP: line and plane constraints):

Sequence | Total Frames | M ATE | M NSTF | M STR | M + CR ATE | M + CR NSTF | M + CR STR | M + CR + LP ATE | M + CR + LP NSTF | M + CR + LP STR |
---|---|---|---|---|---|---|---|---|---|---|
Fr3_s_static | 679 | 0.0068 | 679 | 100% | 0.0063 | 676 | 100% | 0.0060 | 679 | 100% |
Fr3_s_xyz | 1219 | 0.0132 | 1219 | 100% | 0.0126 | 1216 | 100% | 0.0117 | 1219 | 100% |
Fr3_s_rpy | 795 | 0.0273 | 748 | 94.1% | 0.0228 | 756 | 95.1% | 0.021 | 781 | 98% |
Fr3_s_half | 1074 | 0.0207 | 1074 | 100% | 0.0197 | 1074 | 100% | 0.0173 | 1074 | 100% |
Fr3_w_static | 717 | 0.0172 | 717 | 100% | 0.0167 | 717 | 100% | 0.016 | 717 | 100% |
Fr3_w_xyz | 827 | 0.0212 | 827 | 100% | 0.0182 | 827 | 100% | 0.0140 | 827 | 100% |
Fr3_w_rpy | 866 | 0.0332 | 820 | 94.7% | 0.0342 | 838 | 96.8% | 0.0303 | 858 | 98% |
Fr3_w_half | 1021 | 0.0459 | 1021 | 100% | 0.0366 | 1021 | 100% | 0.0227 | 1021 | 100% |
ATE RMSE (m) on the real-scene sequences:

Sequence | ORB-SLAM2 | Our Method |
---|---|---|
Less dynamic real scenes | 0.1792 | 0.004 |
Highly dynamic real scenes | 0.421 | 0.0008 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhu, F.; Zheng, S.; Huang, X.; Wang, X. Robust Tracking and Clean Background Dense Reconstruction for RGB-D SLAM in a Dynamic Indoor Environment. Machines 2022, 10, 892. https://doi.org/10.3390/machines10100892