DLD-SLAM: RGB-D Visual Simultaneous Localisation and Mapping in Indoor Dynamic Environments Based on Deep Learning
Abstract
1. Introduction
- Building on the ORB-SLAM3 algorithm, the GCNv2-tiny network replaces the conventional ORB method for feature point extraction and matching, improving the efficiency and robustness of the system.
- The lightweight GSConv [10] module is applied to the YOLOv5s network model, reducing the parameter count of the network and improving the computational efficiency of the target detection algorithm. The detector is then combined with the depth information of the RGB-D camera to obtain masks of potential dynamic targets, which identify the regions where dynamic feature points are located (a minimal sketch of such a block follows this list).
- A novel method for rejecting dynamic feature points is designed. We propose a dynamic probability, computed from LK (Lucas–Kanade) optical flow, semantic labels, and each point's state in the last frame, which is added to the tracking thread. With this method, truly dynamic feature points are rejected and static feature points are retained for pose estimation, effectively suppressing the interference of dynamic objects with localisation.
- Experiments are carried out to validate the above designs. Firstly, feature point detection and matching are evaluated to verify the accuracy and robustness of the system; then, the training accuracy and detection results of the lightweight target detection network are analysed; finally, pose estimation in dynamic environments is evaluated on the TUM dataset, demonstrating the efficiency of our algorithm and the effectiveness of our approach in dealing with dynamic objects.
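To make the GSConv idea above concrete, here is a minimal PyTorch sketch of a GSConv-style block in the spirit of [10]: a dense convolution produces half of the output channels, a cheap depthwise convolution derives the other half from it, and the two halves are concatenated and channel-shuffled. The class names, kernel sizes, activation, and shuffle implementation are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class ConvBNAct(nn.Module):
    """Convolution + batch norm + SiLU, the pattern used throughout YOLOv5."""
    def __init__(self, c_in, c_out, k=1, s=1, g=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class GSConv(nn.Module):
    """GSConv-style block (sketch): a dense convolution produces half of the
    output channels, a depthwise convolution produces the other half, and
    the two halves are concatenated and channel-shuffled."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_half = c_out // 2
        self.dense = ConvBNAct(c_in, c_half, k, s)
        self.depthwise = ConvBNAct(c_half, c_half, 5, 1, g=c_half)

    def forward(self, x):
        x1 = self.dense(x)
        x2 = torch.cat((x1, self.depthwise(x1)), dim=1)
        # Channel shuffle: interleave the dense and depthwise halves so the
        # two information streams mix across the channel dimension.
        b, c, h, w = x2.shape
        return x2.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)

# Example: replacing a dense 3x3/stride-2 neck convolution.
x = torch.randn(1, 64, 80, 80)
y = GSConv(64, 128, k=3, s=2)(x)  # -> torch.Size([1, 128, 40, 40])
```

The saving comes from computing half of the output channels with a depthwise convolution, which costs roughly 1/k² of its dense counterpart, which is consistent with the parameter reduction reported in Section 4.2.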
2. Related Work
2.1. Visual SLAM Based on Deep Learning
2.2. Dynamic Visual SLAM
3. Methods
3.1. Feature Extraction and Matching Based on GCNv2-tiny Network
3.2. Lightweight YOLOv5 Target Detection Algorithm
3.3. Dynamic Object Detection Based on Target Detection and Depth Image
3.4. Dynamic Feature Point Rejection Strategy Based on Optical Flow
Algorithm 1: Dynamic feature point rejection algorithm.
Input: The current frame's feature points located inside the semantic segmentation masks;
Output: The set S containing all of the current frame's static feature points;
1 Collect all feature points inside the semantic segmentation masks into the set P;
2 Classify the feature points in P according to the masks' semantic labels (high, medium, and low dynamic levels) and assign a label value l_i to every feature point: l_i = 1 (high), l_i = 0.5 (medium), l_i = 0 (low);
3 for each feature point p_i in P do
4  Calculate the mean motion velocity v̄ of all the feature points and the LK optical-flow velocity v_i of p_i;
5  if v_i > v̄ then
6   s_i = 1;
7  else s_i = 0;
8  end if
9  Following the method in steps 5–8, judge the state of p_i in the last frame and obtain the value of s'_i;
10 Calculate the dynamic probability P_d(p_i) of p_i from l_i, s_i, and s'_i;
11 if P_d(p_i) is below the threshold then
12  save the feature point p_i to the set S;
13 else reject the feature point p_i;
14 end if
15 end for
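For concreteness, the following is a minimal Python sketch of Algorithm 1 built on OpenCV's pyramidal LK optical flow (cv2.calcOpticalFlowPyrLK). The weighted-sum fusion, the weight values, and the threshold are assumptions for illustration; the paper defines the actual dynamic probability computation.

```python
import cv2
import numpy as np

def reject_dynamic_points(prev_gray, curr_gray, points, label_values,
                          prev_states, weights=(0.4, 0.4, 0.2), threshold=0.5):
    """Sketch of Algorithm 1 (assumed weights/threshold, not the paper's values).

    prev_gray, curr_gray: consecutive greyscale frames (uint8 images)
    points:       (N, 2) float32 pixel coordinates inside detection masks
    label_values: (N,) semantic label value per point (1.0 / 0.5 / 0.0)
    prev_states:  (N,) motion state of each point in the last frame (0 or 1)
    Returns the static subset of `points` and the current motion states.
    """
    pts = points.reshape(-1, 1, 2).astype(np.float32)
    # Track each point into the current frame with pyramidal LK optical flow.
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    flow = np.linalg.norm((nxt - pts).reshape(-1, 2), axis=1)
    tracked = status.ravel() == 1

    # Steps 4-8: a point is "moving" if its flow magnitude exceeds the mean
    # flow magnitude of all tracked points.
    mean_flow = flow[tracked].mean()
    curr_states = (flow > mean_flow).astype(np.float32)

    # Step 10: dynamic probability fusing the semantic label, the current
    # state, and the last-frame state (the weighted sum is an assumption).
    w_label, w_curr, w_prev = weights
    dyn_prob = w_label * label_values + w_curr * curr_states + w_prev * prev_states

    # Steps 11-14: keep only points whose dynamic probability stays below the
    # threshold; everything else is rejected as dynamic.
    keep = tracked & (dyn_prob < threshold)
    return points[keep], curr_states
```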
4. Experimental Results
4.1. Feature Extraction and Matching
4.2. Target Detection Network Training and Performance
4.3. Trajectory Accuracy Verification Experiment in the Dynamic Environment
4.3.1. Trajectory Accuracy
4.3.2. The Efficiency of the Algorithm
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Abaspur Kazerouni, I.; Fitzgerald, L.; Dooly, G.; Toal, D. A Survey of State-of-the-Art on Visual SLAM. Expert Syst. Appl. 2022, 205, 117734.
2. Mur-Artal, R.; Tardós, J.D. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras. IEEE Trans. Robot. 2017, 33, 1255–1262.
3. Campos, C.; Elvira, R.; Rodríguez, J.J.G.; Montiel, J.M.M.; Tardós, J.D. ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap SLAM. IEEE Trans. Robot. 2021, 37, 1874–1890.
4. Qin, T.; Li, P.; Shen, S. VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator. IEEE Trans. Robot. 2018, 34, 1004–1020.
5. Forster, C.; Pizzoli, M.; Scaramuzza, D. SVO: Fast Semi-Direct Monocular Visual Odometry. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014.
6. Chengqi, D.; Kaitao, Q.; Rong, X. Comparative Study of Deep Learning Based Features in SLAM. In Proceedings of the 2019 4th Asia-Pacific Conference on Intelligent Robot Systems (ACIRS), Nagoya, Japan, 13–15 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 250–254.
7. Mohamed, A.; Tharwat, M.; Magdy, M.; Abubakr, T.; Nasr, O.; Youssef, M. DeepFeat: Robust Large-Scale Multi-Features Outdoor Localization in LTE Networks Using Deep Learning. IEEE Access 2022, 10, 3400–3414.
8. Yi, K.M.; Trulls, E.; Lepetit, V.; Fua, P. LIFT: Learned Invariant Feature Transform. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 467–483.
9. Ballester, I.; Fontán, A.; Civera, J.; Strobl, K.H.; Triebel, R. DOT: Dynamic Object Tracking for Visual SLAM. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi'an, China, 30 May–5 June 2021; pp. 11705–11711.
10. Li, H.; Li, J.; Wei, H.; Liu, Z.; Zhan, Z.; Ren, Q. Slim-Neck by GSConv: A Better Design Paradigm of Detector Architectures for Autonomous Vehicles. arXiv 2022, arXiv:2206.02424.
11. Xie, Y.; Tang, Y.; Tang, G.; Hoff, W. Learning To Find Good Correspondences Of Multiple Objects. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 2779–2786.
12. Dusmanu, M.; Rocco, I.; Pajdla, T.; Pollefeys, M.; Sivic, J.; Torii, A.; Sattler, T. D2-Net: A Trainable CNN for Joint Description and Detection of Local Features. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 8084–8093.
13. Revaud, J.; Weinzaepfel, P.; Souza, C.D.; Humenberger, M. R2D2: Repeatable and Reliable Detector and Descriptor. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Curran Associates Inc.: Red Hook, NY, USA, 2019; pp. 12414–12424.
14. DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperPoint: Self-Supervised Interest Point Detection and Description. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018.
15. Tang, J.; Ericson, L.; Folkesson, J.; Jensfelt, P. GCNv2: Efficient Correspondence Prediction for Real-Time SLAM. IEEE Robot. Autom. Lett. 2019, 4, 3505–3512.
16. Clark, R.; Wang, S.; Wen, H.; Markham, A.; Trigoni, N. VINet: Visual-Inertial Odometry as a Sequence-to-Sequence Learning Problem. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31.
17. Zhang, R.; Zhang, X. Geometric Constraint-Based and Improved YOLOv5 Semantic SLAM for Dynamic Scenes. ISPRS Int. J. Geo-Inf. 2023, 12, 211.
18. Zhang, X.; Zhang, R.; Wang, X. Visual SLAM Mapping Based on YOLOv5 in Dynamic Scenes. Appl. Sci. 2022, 12, 11548.
19. Kim, D.-H.; Kim, J.-H. Effective Background Model-Based RGB-D Dense Visual Odometry in a Dynamic Environment. IEEE Trans. Robot. 2016, 32, 1565–1573.
20. Kerl, C.; Sturm, J.; Cremers, D. Dense Visual SLAM for RGB-D Cameras. In Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, 3–7 November 2013; pp. 2100–2106.
21. Guohao, F.; Lele, B.; Cheng, Z. Geometric Constraint-Based Visual SLAM Under Dynamic Indoor Environment. Comput. Eng. Appl. 2021, 57, 203–212.
22. Zhang, C.; Zhang, R.; Jin, S.; Yi, X. PFD-SLAM: A New RGB-D SLAM for Dynamic Indoor Environments Based on Non-Prior Semantic Segmentation. Remote Sens. 2022, 14, 2445.
23. Yu, C.; Liu, Z.; Liu, X.-J.; Xie, F.; Yang, Y.; Wei, Q.; Fei, Q. DS-SLAM: A Semantic Visual SLAM towards Dynamic Environments. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1168–1174.
24. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
25. Zhong, F.; Wang, S.; Zhang, Z.; Chen, C.; Wang, Y. Detect-SLAM: Making Object Detection and SLAM Mutually Beneficial. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 1001–1010.
26. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Volume 9905, pp. 21–37.
27. Liu, Y.; Miura, J. RDS-SLAM: Real-Time Dynamic SLAM Using Semantic Segmentation Methods. IEEE Access 2021, 9, 23772–23785.
28. Li, A.; Wang, J.; Xu, M. DP-SLAM: A Visual SLAM with Moving Probability towards Dynamic Environments. Inf. Sci. 2020, 556, 128–142.
29. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
30. Bescos, B.; Fácil, J.M.; Civera, J.; Neira, J. DynaSLAM: Tracking, Mapping, and Inpainting in Dynamic Scenes. IEEE Robot. Autom. Lett. 2018, 3, 4076–4083.
31. Wu, W.; Guo, L.; Gao, H.; You, Z.; Liu, Y.; Chen, Z. YOLO-SLAM: A Semantic SLAM System towards Dynamic Environment with Geometric Constraint. Neural Comput. Appl. 2022, 34, 6011–6026.
32. Wei, S.; Wang, S.; Li, H.; Liu, G.; Yang, T.; Liu, C. A Semantic Information-Based Optimized vSLAM in Indoor Dynamic Environments. Appl. Sci. 2023, 13, 8790.
33. Wang, X.; Zhang, X. MCBM-SLAM: An Improved Mask-Region-Convolutional Neural Network-Based Simultaneous Localization and Mapping System for Dynamic Environments. Electronics 2023, 12, 3596.
34. Xiao, J.; Owens, A.; Torralba, A. SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 1625–1632.
35. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
36. Wang, C.-Y.; Liao, H.-Y.M.; Wu, Y.-H.; Chen, P.-Y.; Hsieh, J.-W.; Yeh, I.-H. CSPNet: A New Backbone That Can Enhance Learning Capability of CNN. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 1571–1580.
37. Lucas, B.D.; Kanade, T. An Iterative Image Registration Technique with an Application to Stereo Vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence—Volume 2, Vancouver, BC, Canada, 24–28 August 1981; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1981; pp. 674–679.
| Method | Perspective | Initial Matching | Correct Matching | Correct Rate (%) |
|---|---|---|---|---|
| ORB | 1 | 329 | 46 | 13.98 |
| ORB | 2 | 374 | 97 | 25.94 |
| ORB | 3 | 261 | 17 | 6.51 |
| ORB | 4 | 302 | 5 | 1.66 |
| GCNv2-tiny | 1 | 341 | 124 | 36.36 |
| GCNv2-tiny | 2 | 563 | 386 | 68.56 |
| GCNv2-tiny | 3 | 367 | 89 | 24.25 |
| GCNv2-tiny | 4 | 435 | 92 | 21.15 |
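Here, the correct rate is simply the ratio of correct matches to initial matches; for example, for GCNv2-tiny at perspective 1, 124/341 ≈ 36.36%.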
| Metric | YOLOv5s | GS-YOLOv5s | Promotion Rate (%) |
|---|---|---|---|
| mAP@0.5 | 94.291 | 93.473 | −0.87 |
| mAP@0.5:0.95 | 75.689 | 79.418 | 4.93 |
| FPS | 112 | 135 | 20.54 |
| Params (M) | 15.2 | 12.7 | 16.45 |
| FLOPs (G) | 15.6 | 13.2 | 15.38 |
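The promotion rate is the relative change with respect to the YOLOv5s baseline, signed so that an improvement is positive: for example, FPS rises by (135 − 112)/112 ≈ 20.54%, while the parameter count falls by (15.2 − 12.7)/15.2 ≈ 16.45%.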
| Sequence | ORB-SLAM3 RMSE | ORB-SLAM3 Mean | ORB-SLAM3 Median | ORB-SLAM3 S.D. | DLD-SLAM RMSE | DLD-SLAM Mean | DLD-SLAM Median | DLD-SLAM S.D. | Promotion Rate of RMSE (%) |
|---|---|---|---|---|---|---|---|---|---|
| fr3-w-xyz | 0.6847 | 0.6097 | 0.6306 | 0.3116 | 0.0185 | 0.0163 | 0.0147 | 0.0088 | 97.29 |
| fr3-w-rpy | 0.8003 | 0.6846 | 0.6584 | 0.4145 | 0.0424 | 0.0307 | 0.0229 | 0.0293 | 94.71 |
| fr3-w-halfsphere | 0.7057 | 0.6481 | 0.6041 | 0.2792 | 0.0219 | 0.0186 | 0.0156 | 0.0118 | 96.89 |
| fr3-w-static | 0.4028 | 0.3680 | 0.3017 | 0.1638 | 0.0056 | 0.0049 | 0.0043 | 0.0028 | 98.61 |
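The promotion rate of RMSE is the relative reduction with respect to ORB-SLAM3, i.e. (RMSE_ORB-SLAM3 − RMSE_DLD-SLAM)/RMSE_ORB-SLAM3 × 100; for fr3-w-xyz, (0.6847 − 0.0185)/0.6847 ≈ 97.29%.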
| Sequence | ORB-SLAM3 | DS-SLAM | Detect-SLAM | DynaSLAM | DLD-SLAM |
|---|---|---|---|---|---|
| fr3-w-xyz | 0.6847 | 0.0257 | 0.0254 | 0.0156 | 0.0185 |
| fr3-w-rpy | 0.8003 | 0.4453 | 0.4559 | 0.0358 | 0.0424 |
| fr3-w-halfsphere | 0.7057 | 0.0346 | 0.2021 | 0.0179 | 0.0219 |
| fr3-w-static | 0.4028 | 0.0072 | 0.0069 | 0.0011 | 0.0056 |
| Method | Front End (ms) | Per Frame (ms) |
|---|---|---|
| ORB-SLAM3 | – | 46.81 |
| DynaSLAM | 310.87 | 376.36 |
| DS-SLAM | 42.61 | 78.46 |
| Detect-SLAM | 57.74 | 96.14 |
| DLD-SLAM | 31.32 | 65.82 |