The Method of Static Semantic Map Construction Based on Instance Segmentation and Dynamic Point Elimination
Abstract
1. Introduction
- Building on ORB-SLAM3, we use multithreading to add an instance segmentation thread. This thread runs an FPN (Feature Pyramid Network) [16] + Mask R-CNN network, implemented in C++, to extract the semantic information of image frames. Because ORB-SLAM3 itself is written mainly in C++, the new thread integrates cleanly with the existing modules of the system.
- We propose a new method that combines the FPN + Mask R-CNN deep learning network with global dense optical flow to obtain semantic information and eliminate the dynamic points on objects in dynamic scenes. This addresses the redundant tracking problem of visual odometry and effectively improves the accuracy and robustness of ORB-SLAM3 in dynamic scenes.
- Our system fuses 2D semantic information with the 3D point cloud to construct a semantic map with perceptual information, further improving the robot's ability to perceive and understand its surrounding environment.
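The dynamic point elimination idea in the second contribution can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Instance`, `mean_flow_in_mask`, `dynamic_mask`, and `keep_static_points`, the mean-flow threshold rule, and the toy 4x4 data are all assumptions made for illustration. The sketch also assumes that the instance masks and a dense flow field are already available and that the flow has been compensated for camera motion.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

Flow = List[List[Tuple[float, float]]]  # flow[row][col] = (dx, dy) in pixels
Mask = List[List[bool]]                 # mask[row][col] = pixel belongs to instance


@dataclass
class Instance:
    """One segmented object (hypothetical container, not from the paper)."""
    label: str
    mask: Mask


def mean_flow_in_mask(flow: Flow, mask: Mask) -> float:
    """Average optical-flow magnitude over the pixels of one instance."""
    total, count = 0.0, 0
    for r, row in enumerate(mask):
        for c, inside in enumerate(row):
            if inside:
                dx, dy = flow[r][c]
                total += hypot(dx, dy)
                count += 1
    return total / count if count else 0.0


def dynamic_mask(instances: List[Instance], flow: Flow, threshold: float) -> Mask:
    """Union of the masks of all instances whose mean flow exceeds the threshold."""
    height, width = len(flow), len(flow[0])
    out = [[False] * width for _ in range(height)]
    for inst in instances:
        if mean_flow_in_mask(flow, inst.mask) > threshold:  # assumed decision rule
            for r in range(height):
                for c in range(width):
                    if inst.mask[r][c]:
                        out[r][c] = True
    return out


def keep_static_points(keypoints: List[Tuple[int, int]], dyn: Mask) -> List[Tuple[int, int]]:
    """Drop every (x, y) keypoint that falls on a dynamic pixel."""
    return [(x, y) for (x, y) in keypoints if not dyn[y][x]]


# Toy 4x4 frame: a moving "person" in the top-left, a still "chair" in the bottom-right.
person = Instance("person", [[r < 2 and c < 2 for c in range(4)] for r in range(4)])
chair = Instance("chair", [[r >= 2 and c >= 2 for c in range(4)] for r in range(4)])
flow = [[(3.0, 4.0) if (r < 2 and c < 2) else (0.0, 0.0) for c in range(4)]
        for r in range(4)]
keypoints = [(0, 0), (1, 1), (3, 3), (2, 0)]

dyn = dynamic_mask([person, chair], flow, threshold=1.0)
print(keep_static_points(keypoints, dyn))  # [(3, 3), (2, 0)]
```

Only the keypoints on the moving instance are removed; points on the static instance and on the background remain available for tracking.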
2. Related Work
2.1. Improvement of Visual Odometry Performance
2.2. Visual SLAM in a Dynamic Scene
2.3. Semantic Information of Maps
3. System Description
3.1. System Components
3.2. Semantic Segmentation
3.3. Dynamic Points Elimination
3.4. Static Semantic Map Construction
4. Experimental Results
4.1. Dynamic Object Eliminating Experiment
4.2. Dataset Experiment
4.3. Real Scene Test
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Wang, S.; Wu, Z.; Zhang, W. An Overview of SLAM. In Proceedings of the Chinese Intelligent Systems Conference, CISC 2018, Wenzhou, China, 1 January 2019; pp. 673–681.
- Davison, A.J.; Reid, I.D.; Molton, N.D.; Stasse, O. MonoSLAM: Real-time single camera SLAM. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 1052–1067.
- Mur-Artal, R.; Tardos, J.D. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras. IEEE Trans. Robot. 2017, 33, 1255–1262.
- Carlos, C.; Richard, E.; Gomez, R.J.J. ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap SLAM. IEEE Trans. Robot. 2021, 1–17.
- Friedrich, F.; Davide, S. Visual odometry: Part II: Matching, robustness, optimization, and applications. IEEE Rob. Autom. Mag. 2012, 19, 78–90.
- Jorge, F.-P.; Jose, R.A.; Juan Manuel, R.-M. Visual simultaneous localization and mapping: A survey. Artif. Intell. Rev. 2012, 43, 55–81.
- Xia, L.; Cui, J.; Shen, R.; Xu, X.; Gao, Y.; Li, X. A survey of image semantics-based visual simultaneous localization and mapping: Application-oriented solutions to autonomous navigation of mobile robots. Int. J. Adv. Rob. Syst. 2020, 17, 1729881420919185.
- Smirnov, E.A.; Timoshenko, D.M.; Andrianov, S.N. Comparison of Regularization Methods for ImageNet Classification with Deep Convolutional Neural Networks. AASRI Procedia 2014, 6, 89–94.
- Ross, G.; Jeff, D.; Trevor, D.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Vijay, B.; Alex, K.; Roberto, C. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Ren, S.; He, K.; Ross, G.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 16th IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the 14th European Conference on Computer Vision, ECCV 2016, Amsterdam, The Netherlands, 8–16 October 2016; pp. 21–37.
- Raul, M.-A.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Trans. Rob. 2015, 31, 1147–1163.
- Lin, T.-Y.; Piotr, D.; Ross, G.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- Cui, L.; Ma, C.; Wen, F. Direct-ORB-SLAM: Direct Monocular ORB-SLAM. In Proceedings of the 2nd International Conference on Computer Information Science and Application Technology, CISAT 2019, Guangzhou, China, 30 August–1 September 2019; p. 032016.
- Zhang, F.; Rui, T.; Yang, C.; Shi, J. LAP-SLAM: A Line-Assisted Point-Based Monocular VSLAM. Electronics 2019, 8, 2079–9292.
- Lianos, K.-N.; Schonberger, J.L.; Pollefeys, M.; Sattler, T. VSO: Visual Semantic Odometry. In Proceedings of the Computer Vision-ECCV 2018—15th European Conference, Munich, Germany, 8–14 September 2018; pp. 246–263.
- Zhu, A.Z.; Atanasov, N.; Daniilidis, K. Event-Based Visual Inertial Odometry. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5816–5824.
- Yu, C.; Liu, Z.; Liu, X.-J.; Xie, F.; Yang, Y.; Wei, Q.; Fei, Q. DS-SLAM: A Semantic Visual SLAM towards Dynamic Environments. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018, Madrid, Spain, 27 December 2018; pp. 1168–1174.
- Bescos, B.; Facil, J.M.; Civera, J.; Neira, J. DynaSLAM: Tracking, Mapping, and Inpainting in Dynamic Scenes. IEEE Robot. Autom. 2018, 3, 4076–4083.
- Deyvid, K.; Aljoša, O.; Jörg, S.; Leibe, B. Scene flow propagation for semantic mapping and object discovery in dynamic street scenes. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea, 9–14 October 2016; pp. 1785–1792.
- Palazzolo, E.; Behley, J.; Lottes, P.; Giguere, P.; Stachniss, C. ReFusion: 3D Reconstruction in Dynamic Environments for RGB-D Cameras Exploiting Residuals. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 7855–7862.
- Rünz, M.; Agapito, L. Co-fusion: Real-time segmentation, tracking and fusion of multiple objects. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 4471–4478.
- Weinmann, M.; Jutzi, B.; Hinz, S.; Mallet, C. Semantic point cloud interpretation based on optimal neighborhoods, relevant features and efficient classifiers. ISPRS J. Photogramm. Remote Sens. 2015, 105, 286–304.
- Qi, X.; Yang, S.; Yan, Y. Deep Learning Based Semantic Labelling of 3D Point Cloud in Visual SLAM. In Proceedings of the 2018 3rd International Conference on Automation, Control and Robotics Engineering, CACRE 2018, Chengdu, China, 19–22 July 2018; pp. 12–23.
- Guan, P.; Cao, Z.; Chen, E.; Liang, S.; Tan, M.; Yu, J. A real-time semantic visual SLAM approach with points and objects. Int. J. Adv. Rob. Syst. 2020, 17, 1729881420905443.
- Yue, Y.; Zhao, C.; Wu, Z.; Yang, C.; Wang, Y.; Wang, D. Collaborative Semantic Understanding and Mapping Framework for Autonomous Systems. IEEE/ASME Trans. Mechatronics 2021, 26, 978–989.
- Qin, T.; Chen, T.; Chen, Y.; Su, Q. AVP-SLAM: Semantic Visual Mapping and Localization for Autonomous Vehicles in the Parking Lot. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 5939–5945.
- Li, W.; Gu, J.; Chen, B. Incremental Instance-Oriented 3D Semantic Mapping via RGB-D Cameras for Unknown Indoor Scene. Discret. Dyn. Nat. Soc. 2020, 2020, 2528954.
- ORBSLAM2_with_Pointcloud_Map. Available online: https://github.com/gaoxiang12/ORBSLAM2_with_pointcloud_map (accessed on 19 July 2021).
- Andreas, T.; Karl-Peter, F.; Hannes, F. REST-Net: A dynamic rule-based IDS for VANETs. In Proceedings of the 7th IFIP Wireless and Mobile Networking Conference, WMNC 2014, Vilamoura, Portugal, 20–22 May 2014; pp. 1–8.
- Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollar, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the 13th European Conference on Computer Vision, ECCV 2014, Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
- Farnebäck, G. Polynomial Expansion for Orientation and Motion Estimation. Ph.D. Dissertation, Linköping University Electronic Press, Linköping, Sweden, 2002.
- Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A benchmark for the evaluation of RGB-D SLAM systems. In Proceedings of the 25th IEEE/RSJ International Conference on Robotics and Intelligent Systems, IROS 2012, Vilamoura, Algarve, Portugal, 7–12 October 2012; pp. 573–580.
- evo. Available online: https://github.com/MichaelGrupp/evo (accessed on 31 July 2021).
Sequences | ORB-SLAM3 RMSE | ORB-SLAM3 Median | ORB-SLAM3 Mean | ORB-SLAM3 S.D. | Our System RMSE | Our System Median | Our System Mean | Our System S.D. |
---|---|---|---|---|---|---|---|---|
fr3-walking-xyz | 0.6687 | 0.5124 | 0.5823 | 0.3288 | 0.0150 | 0.0110 | 0.0128 | 0.0078 |
fr3-walking-rpy | 0.8461 | 0.7803 | 0.7738 | 0.3424 | 0.0314 | 0.0203 | 0.0256 | 0.0183 |
fr3-walking-static | 0.1072 | 0.0788 | 0.0933 | 0.0528 | 0.0073 | 0.0059 | 0.0065 | 0.0033 |
fr3-walking-halfsphere | 0.5939 | 0.4562 | 0.5092 | 0.3055 | 0.0180 | 0.0150 | 0.0162 | 0.0079 |
fr3-sitting-static | 0.0087 | 0.0068 | 0.0075 | 0.0044 | 0.0065 | 0.0048 | 0.0056 | 0.0033 |
Sequences | ORB-SLAM3 RMSE | ORB-SLAM3 Median | ORB-SLAM3 Mean | ORB-SLAM3 S.D. | Our System RMSE | Our System Median | Our System Mean | Our System S.D. |
---|---|---|---|---|---|---|---|---|
fr3-walking-xyz | 0.0255 | 0.0163 | 0.0207 | 0.0148 | 0.0121 | 0.0080 | 0.0099 | 0.0069 |
fr3-walking-rpy | 0.0281 | 0.0180 | 0.0221 | 0.0172 | 0.0197 | 0.0123 | 0.0153 | 0.0124 |
fr3-walking-static | 0.0290 | 0.0065 | 0.0128 | 0.0260 | 0.0066 | 0.0051 | 0.0057 | 0.0032 |
fr3-walking-halfsphere | 0.0236 | 0.0145 | 0.0188 | 0.0143 | 0.0128 | 0.0093 | 0.0107 | 0.0069 |
fr3-sitting-static | 0.0048 | 0.0037 | 0.0041 | 0.0024 | 0.0056 | 0.0042 | 0.0049 | 0.0027 |
Sequences | APE RMSE Improvement | APE Median Improvement | APE Mean Improvement | APE S.D. Improvement | RPE RMSE Improvement | RPE Median Improvement | RPE Mean Improvement | RPE S.D. Improvement |
---|---|---|---|---|---|---|---|---|
fr3-walking-xyz | 97.76% | 97.85% | 97.80% | 97.63% | 52.55% | 50.92% | 52.17% | 53.38% |
fr3-walking-rpy | 96.29% | 97.40% | 96.69% | 94.66% | 29.89% | 31.67% | 30.77% | 27.91% |
fr3-walking-static | 93.19% | 92.51% | 93.03% | 93.75% | 77.24% | 21.54% | 55.47% | 87.69% |
fr3-walking-halfsphere | 96.97% | 96.71% | 96.82% | 97.41% | 45.76% | 35.86% | 43.09% | 51.75% |
fr3-sitting-static | 25.29% | 29.41% | 25.33% | 25.00% | - | - | - | - |
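The improvement percentages are consistent with the usual relative-reduction formula, (baseline − ours) / baseline × 100. The short check below is a sketch under that assumption; the function name `improvement_percent` is hypothetical, and the input values are the fr3-walking-xyz entries of the first ORB-SLAM3 versus Our System table, which this formula identifies as the APE results.

```python
def improvement_percent(baseline: float, ours: float) -> float:
    """Relative error reduction of `ours` with respect to `baseline`, in percent."""
    return (baseline - ours) / baseline * 100.0


# fr3-walking-xyz, first ORB-SLAM3 vs. Our System table.
orb_slam3 = {"RMSE": 0.6687, "Median": 0.5124, "Mean": 0.5823, "S.D.": 0.3288}
our_system = {"RMSE": 0.0150, "Median": 0.0110, "Mean": 0.0128, "S.D.": 0.0078}

for name in orb_slam3:
    gain = improvement_percent(orb_slam3[name], our_system[name])
    print(f"{name}: {gain:.2f}%")
# RMSE: 97.76%
# Median: 97.85%
# Mean: 97.80%
# S.D.: 97.63%
```

Applied to the fr3-sitting-static RPE values (0.0048 for ORB-SLAM3, 0.0056 for our system), the same formula gives a negative result, which is presumably why those cells are shown as dashes.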
Sequences | ORB-SLAM3 APE (Average Value) | Our APE (Average Value) | Improvements (APE) |
---|---|---|---|
fr3-walking-xyz | 0.5230 | 0.0116 | 97.78% |
fr3-walking-rpy | 0.6856 | 0.0239 | 96.51% |
fr3-walking-static | 0.0830 | 0.0057 | 93.13% |
fr3-walking-halfsphere | 0.4662 | 0.0142 | 96.95% |
Sequences | ORB-SLAM3 RPE (Average Value) | Our RPE (Average Value) | Improvements (RPE) |
---|---|---|---|
fr3-walking-xyz | 0.0193 | 0.0092 | 52.33% |
fr3-walking-rpy | 0.0213 | 0.0149 | 30.04% |
fr3-walking-static | 0.0185 | 0.0051 | 72.43% |
fr3-walking-halfsphere | 0.0178 | 0.0099 | 44.38% |
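The "Average Value" columns appear to be the arithmetic mean of the four statistics (RMSE, Median, Mean, S.D.) reported per sequence. That reading is an inference from the numbers, not something stated in the surviving text. The sketch below checks it for fr3-walking-xyz; the helper name `average_value` is hypothetical.

```python
from statistics import mean
from typing import Sequence


def average_value(stats: Sequence[float]) -> float:
    """Arithmetic mean of the RMSE, Median, Mean and S.D. entries of one table row."""
    return mean(stats)


# fr3-walking-xyz rows, in the order RMSE, Median, Mean, S.D.
ape_orb_slam3 = [0.6687, 0.5124, 0.5823, 0.3288]
ape_ours = [0.0150, 0.0110, 0.0128, 0.0078]
rpe_orb_slam3 = [0.0255, 0.0163, 0.0207, 0.0148]
rpe_ours = [0.0121, 0.0080, 0.0099, 0.0069]

for label, row in [("ORB-SLAM3 APE", ape_orb_slam3), ("Our APE", ape_ours),
                   ("ORB-SLAM3 RPE", rpe_orb_slam3), ("Our RPE", rpe_ours)]:
    print(f"{label}: {average_value(row):.6f}")
```

The computed means agree with the tabulated 0.5230, 0.0116, 0.0193 and 0.0092 to within 0.0001. Several tabulated averages look truncated rather than rounded to four decimals, so small last-digit differences are expected.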
Algorithm | RMSE | Median | Mean | S.D. |
---|---|---|---|---|
DS-SLAM | 0.0171 | 0.0118 | 0.0140 | 0.0098 |
DynaSLAM | 0.0133 | 0.0097 | 0.0112 | 0.0071 |
Our | 0.0150 | 0.0110 | 0.0128 | 0.0078 |
Algorithm | RMSE | Median | Mean | S.D. |
---|---|---|---|---|
DS-SLAM | 0.0139 | 0.0080 | 0.0105 | 0.0091 |
DynaSLAM | 0.0122 | 0.0082 | 0.0098 | 0.0073 |
Our | 0.0121 | 0.0080 | 0.0099 | 0.0069 |
Algorithm | Mean Tracking Time (ms) |
---|---|
DS-SLAM | 102.9 |
DynaSLAM | 594.4 |
Our SLAM | 371.6 |
Algorithm Name | Instance Segmentation | Compute Dynamic-Static Mask | Dynamic Points Elimination |
---|---|---|---|
Time (ms) | 336.098 | 19.461 | 2.452 |
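As a quick reading aid for the per-stage timings, the sketch below sums the three stages and reports each stage's share of that sum. The variable names are hypothetical, and whether the 371.6 ms mean tracking time reported for our system is composed exactly of these stages is not stated in the surviving text, so the comparison is only indicative.

```python
# Per-stage times in milliseconds, copied from the table above.
stage_ms = {
    "Instance Segmentation": 336.098,
    "Compute Dynamic-Static Mask": 19.461,
    "Dynamic Points Elimination": 2.452,
}

total_ms = sum(stage_ms.values())
shares = {name: ms / total_ms * 100.0 for name, ms in stage_ms.items()}

print(f"Sum of stages: {total_ms:.3f} ms")
for name, share in shares.items():
    print(f"{name}: {share:.2f}% of the summed stage time")
```

Instance segmentation accounts for roughly 94% of the summed stage time, so the segmentation network dominates the cost of the added processing.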
© 2021 by the authors.
Li, J.; Zhang, R.; Liu, Y.; Zhang, Z.; Fan, R.; Liu, W. The Method of Static Semantic Map Construction Based on Instance Segmentation and Dynamic Point Elimination. Electronics 2021, 10, 1883. https://doi.org/10.3390/electronics10161883