Deep Learning for Visual SLAM: The State-of-the-Art and Future Trends
Abstract
1. Introduction
- Adding auxiliary modules based on deep learning.
- Replacing the original modules of a traditional VSLAM system with deep learning modules.
- Replacing the traditional VSLAM system with end-to-end deep neural networks.
Title and Reference | Year | Deep Learning |
---|---|---|
Keyframe-based monocular SLAM: design, survey, and future directions [20] | 2017 | No |
Survey and evaluation of monocular visual-inertial SLAM algorithms for augmented reality [21] | 2019 | No |
Collaborative visual SLAM for multiple agents: A brief survey [22] | 2019 | No |
A state-of-the-art review on mobile robotics tasks using artificial intelligence and visual data [23] | 2021 | Yes |
SLAM; definition and evolution [24] | 2021 | Yes |
Role of deep learning in loop closure detection for visual and LiDAR SLAM: A survey [25] | 2021 | Yes |
A review of visual SLAM methods for autonomous driving vehicles [26] | 2022 | No |
Advances in visual simultaneous localisation and mapping techniques for autonomous vehicles: A review [27] | 2022 | Yes |
A survey of state-of-the-art on visual SLAM [28] | 2022 | Yes |
Visual SLAM algorithms and their application for AR, mapping, localization and wayfinding [29] | 2022 | No |
A comprehensive survey of visual SLAM algorithms [30] | 2022 | Yes |
An overview on visual SLAM: From tradition to semantic [31] | 2022 | Yes |
Overview of deep learning application on visual SLAM [19] | 2022 | Yes |
Perception and navigation in autonomous systems in the era of learning: A survey [32] | 2022 | Yes |
Approaches, challenges, and applications for deep visual odometry: Toward complicated and emerging areas [33] | 2022 | Yes |
In-depth review of augmented reality: Tracking technologies, development tools, AR displays, collaborative AR, and security concerns [34] | 2023 | No |
Augmented reality-based guidance in product assembly and maintenance/repair perspective: A state of the art review on challenges and opportunities [35] | 2023 | No |
Automated guided vehicles and autonomous mobile robots for recognition and tracking in civil engineering [36] | 2023 | Yes |
- A brief description of the historical development and some problem statements of deep learning-based VSLAM tasks is presented. Although the historical evolution of SLAM systems is conventionally divided into two main periods, the past (1985–1999) and the present (2001–2023), we introduce a different interpretation based on the development of deep models, which began around 2017.
- We provide a new and complete classification and overview of the recent VSLAM methods based on three ways to integrate deep learning into traditional VSLAM systems: (1) adding auxiliary modules based on deep learning, (2) replacing the original modules of a traditional VSLAM system with deep learning modules, and (3) replacing the traditional VSLAM system with end-to-end deep neural networks. These three approaches have reached different levels of maturity owing to their short development histories.
- A description of multi-modal VSLAM datasets suitable for supervised training and testing helps to select the most appropriate datasets for intra-cross and inter-cross validation. Most VSLAM datasets contain real data obtained from multi-modal sensors; however, several datasets include simulated data generated with third-party software tools.
- A critical analysis of advantages and disadvantages guides further research on the integration of deep learning into VSLAM methods applied in many practical fields.
2. Problem Formulation of VSLAM Based on Deep Learning
3. Deep Learning Models for Visual SLAM
3.1. Auxiliary Modules Based on Deep Learning
Method | Year | Main Subject | Data | Learning Strategy (SP: supervised, US: unsupervised) | Dataset
---|---|---|---|---|---
2D3D-MatchNet [48] | 2019 | Feature extraction | Monocular, LiDAR data | + | Oxford 2D-3D Patches Dataset | |
SP-Flow [49] | 2020 | Feature extraction | Monocular, Stereo, Depth | + | KITTI Visual Odometry, TUM RGB-D | |
LIFT-SLAM [50] | 2021 | Feature extraction | Monocular, Inertial data | + | KITTI Visual Odometry, EuRoC MAV |
[51] | 2018 | Semantic segmentation | Monocular | + | CARLA simulator | |
[52] | 2019 | Semantic segmentation | Monocular | + | KITTI Visual Odometry, TUM-mono | |
ObjectFusion [53] | 2019 | Semantic segmentation | Monocular, Depth | + | Own dataset | |
Deep SAFT [54] | 2020 | Semantic segmentation | Monocular | + | TUM RGB-D, ICL-NUIM | |
EF-Razor [55] | 2020 | Semantic segmentation | Monocular, Depth | + | TUM RGB-D | |
RoomSLAM [56] | 2020 | Semantic segmentation | Monocular | + | MIT Stata Center, TUM RGB-D |
USS-SLAM [57] | 2020 | Semantic segmentation | Monocular | + | Pascal VOC, SBD, COCO | |
[58] | 2022 | Semantic segmentation | Monocular, Depth | + | Virtual KITTI 2, KITTI Visual Odometry, Extended CMU Seasons, RobotCar Seasons | |
[59] | 2020 | Semantic segmentation | Monocular, Inertial | + | ADVIO |
[60] | 2022 | Semantic segmentation | Monocular | | TUM RGB-D |
[45] | 2022 | Pose estimation | Monocular | + | KITTI Visual Odometry, TUM RGB-D, own dataset | |
[46] | 2022 | Pose estimation | Monocular | + | KITTI Visual Odometry | |
ObjectFusion [61] | 2022 | Pose estimation | Monocular, Depth | + | SceneNet RGB-D, ScanNet | |
Cowan-GGR [62] | 2022 | Pose estimation | Monocular | + | KITTI Visual Odometry, MidAir, Synthetic images | |
TransPoseNet [63] | 2023 | Pose estimation | Monocular, Depth | + | RGB-D 7-Scenes | |
ORGPoseNet, ORGMapNet [8] | 2023 | Pose estimation | Monocular | + | RGB-D 7-Scenes, RIO10, Oxford RobotCar | |
LKN [64] | 2019 | Map construction | Monocular | + | KITTI Visual Odometry, ApolloScape | |
DRM-SLAM [65] | 2020 | Map construction | Monocular, Depth | + | NYU RGB-D V2, TUM RGB-D, ICL-NUIM | |
Mask-RCNN [66] | 2020 | Map construction | Monocular | + | Own dataset | |
[3] | 2021 | Map construction | Stereo | + | Own agricultural dataset | |
[67] | 2020 | Loop closure | Monocular | + (SP and US) | City Centre, KITTI Visual Odometry, Gardens Point Walking |
Triplet Loss [68] | 2021 | Loop closure | Monocular | + | TUM2, City Centre | |
[69] | 2022 | Loop closure | Monocular | + | KITTI Visual Odometry, Oxford RobotCar | |
PlaceNet [70] | 2023 | Loop closure | Monocular | + | CityScape, subset of ADE20K |
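To make the "auxiliary module" idea concrete, the sketch below shows how a learned keypoint detector/descriptor can be attached to an otherwise classical VSLAM front-end, in the spirit of the feature-extraction entries above (e.g., SP-Flow [49]). It is a minimal PyTorch illustration of the general pattern, not a reimplementation of any listed method; the network, layer sizes, and function names are assumptions made for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyKeypointNet(nn.Module):
    """Toy keypoint/descriptor head: a shared encoder yields a per-pixel score
    map and dense descriptors (illustrative only, not a published model)."""
    def __init__(self, desc_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.score_head = nn.Conv2d(64, 1, 1)        # keypoint "interestingness"
        self.desc_head = nn.Conv2d(64, desc_dim, 1)  # per-pixel descriptor

    def forward(self, gray):                          # gray: (B, 1, H, W)
        feat = self.encoder(gray)
        score = torch.sigmoid(self.score_head(feat))
        desc = F.normalize(self.desc_head(feat), dim=1)
        return score, desc

def extract_keypoints(score, desc, top_k=500):
    """Select the top-k scoring pixels and return coordinates, descriptors, and
    confidences in a form a classical matcher / PnP back-end could consume."""
    b, _, h, w = score.shape
    vals, idx = score.view(b, -1).topk(top_k, dim=1)
    ys = torch.div(idx, w, rounding_mode="floor")
    xs = idx % w
    d = desc.view(b, desc.shape[1], -1).gather(
        2, idx.unsqueeze(1).expand(-1, desc.shape[1], -1))
    return torch.stack([xs, ys], dim=-1), d.transpose(1, 2), vals

# Usage: the learned keypoints/descriptors replace hand-crafted (e.g., ORB) features
# in the front-end, while tracking, mapping, and pose optimization stay classical.
net = TinyKeypointNet().eval()
with torch.no_grad():
    score, desc = net(torch.rand(1, 1, 240, 320))
    kpts, descs, conf = extract_keypoints(score, desc)
```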
3.2. Deep Learning Modules
Method | Year | Main Subject | Data | Learning Strategy (SP: supervised, US: unsupervised) | Dataset
---|---|---|---|---|---
[82] | 2017 | Camera relocalization | Monocular | + | RGB-D 7-Scenes | |
DistanceNet [83] | 2019 | Distance estimation | Monocular | + | KITTI Visual Odometry | |
DDL-SLAM [84] | 2020 | Object segmentation, Background inpainting | Monocular, Depth | + | TUM RGB-D, PASCAL VOC | |
PSPNet-SLAM [85] | 2020 | Object segmentation | Monocular, Depth | + | TUM RGB-D | |
[86] | 2022 | Path planning | Monocular | + | Own dataset | |
DEM [87] | 2020 | Scene reconstruction | Monocular, Depth | + | NYU-Depth-v2, KITTI Visual Odometry |
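As an illustration of module replacement, the hedged sketch below swaps the hand-crafted dynamic-feature filter of a classical pipeline for an off-the-shelf semantic segmentation network, broadly in the spirit of DDL-SLAM [84] and PSPNet-SLAM [85]. The torchvision DeepLabV3 model is only a stand-in segmenter; the class indices and helper names are assumptions for the example, not the published implementations.

```python
import numpy as np
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# Stand-in segmenter (the papers above use PSPNet, DUNet, and similar backbones);
# a torchvision DeepLabV3 pretrained on VOC-style labels is assumed to be available.
segmenter = deeplabv3_resnet50(weights="DEFAULT").eval()
DYNAMIC_CLASSES = {15}  # VOC-style index 15 = "person"; extend with other movable classes

def dynamic_mask(rgb):
    """Return a boolean HxW mask that is True where a potentially moving object is.
    `rgb` is a float tensor of shape (3, H, W); ImageNet normalization is assumed
    and omitted here for brevity."""
    with torch.no_grad():
        logits = segmenter(rgb.unsqueeze(0))["out"][0]   # (num_classes, H, W)
    labels = logits.argmax(0).cpu().numpy()
    return np.isin(labels, list(DYNAMIC_CLASSES))

def filter_keypoints(keypoints, mask):
    """Drop (x, y) keypoints that fall inside dynamic regions before they reach
    tracking and bundle adjustment."""
    return [kp for kp in keypoints if not mask[int(kp[1]), int(kp[0])]]

# Usage sketch: mask = dynamic_mask(frame); static_kpts = filter_keypoints(orb_kpts, mask)
```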
3.3. End-to-End Deep Neural Networks
Method | Year | Main Subject | Data | Learning Strategy (SP: supervised, US/RL: unsupervised or reinforcement learning) | Dataset
---|---|---|---|---|---
vCNN [90] | 2022 | Sub-VSLAM | Monocular | + | M2CAI 2016 Challenge | |
PU-PoseNet [47] | 2022 | Pose estimation | Monocular | + | KITTI Visual Odometry | |
[91] | 2022 | Pose estimation | Monocular, Inertial | + | EuRoC, own dataset | |
VIOLearner [92] | 2018 | Trajectory estimation | Monocular, Depth, Inertial | + | KITTI Visual Odometry | |
DeepMLE [93] | 2022 | Depth estimator | Monocular, Depth | + | KITTI Visual Odometry, Virtual KITTI 2, DeMoN | |
PoseConvGRU [94] | 2020 | Ego-motion estimation | Monocular | + | KITTI Visual Odometry, Malaga 2013 | |
DeepAVO [95] | 2022 | Ego-motion estimation | Monocular | + | KITTI Visual Odometry, Malaga, ApolloScape, own dataset | |
DeepVO [96] | 2017 | Visual odometry | Monocular | + | KITTI Visual Odometry | |
UnDeepVO [97] | 2018 | Visual odometry | Monocular, Depth, Stereo | + | KITTI Visual Odometry |
HVIOnet [98] | 2022 | Visual–inertial odometry | Monocular, Inertial | + | EuRoC, ROS-based simulation dataset |
SelfVIO [99] | 2022 | Visual–inertial odometry | Monocular | + | KITTI Visual Odometry, EuRoC, Cityscapes | |
[100] | 2021 | Loop closure | Monocular | + | Own dataset | |
MGRL [101] | 2021 | Visual navigation | Monocular | + | AI2-THOR framework | |
VGF-Net [102] | 2021 | Drone navigation | Monocular, Depth, GPS | + | Own dataset |
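For the end-to-end family, a dominant design is a recurrent CNN that regresses relative 6-DoF poses directly from image sequences, as popularized by DeepVO [96]. The sketch below captures that pattern in a few dozen lines of PyTorch; the layer sizes, loss weighting, and names are illustrative assumptions rather than the published architecture.

```python
import torch
import torch.nn as nn

class RecurrentVO(nn.Module):
    """DeepVO-flavoured sketch: a CNN encodes stacked consecutive frames, an LSTM
    models temporal context, and a linear head regresses the 6-DoF relative pose
    (3 translations + 3 Euler angles) per time step. Sizes are illustrative."""
    def __init__(self, hidden=512):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(6, 16, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 8)),
        )
        self.rnn = nn.LSTM(64 * 4 * 8, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 6)

    def forward(self, frames):                     # frames: (B, T+1, 3, H, W)
        b, tp1, c, h, w = frames.shape
        pairs = torch.cat([frames[:, :-1], frames[:, 1:]], dim=2)   # (B, T, 6, H, W)
        x = pairs.reshape(b * (tp1 - 1), 2 * c, h, w)
        feat = self.cnn(x).flatten(1).reshape(b, tp1 - 1, -1)
        out, _ = self.rnn(feat)
        return self.head(out)                      # (B, T, 6) relative poses

def pose_loss(pred, gt, k=100.0):
    """Weighted MSE on translation and rotation, the usual supervision for this family."""
    return ((pred[..., :3] - gt[..., :3]) ** 2).mean() + \
           k * ((pred[..., 3:] - gt[..., 3:]) ** 2).mean()

# Usage: model = RecurrentVO(); loss = pose_loss(model(seq), gt_rel_poses)
```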
4. Datasets for Visual SLAM
5. Discussion and Future Trends
- Complex Types of Unsupervised Learning: Currently, supervised learning prevails, with its need for access to large labeled datasets. Unsupervised learning is preferable from a practical point of view; however, it provides lower precision, which matters for many near-photogrammetric VSLAM tasks. Meta-learning, reinforcement learning, and life-long learning, as well as the development of efficient architectures, should help to compensate for these inherent limitations of unsupervised learning.
- Robustness in Challenging Scenes: The robustness of algorithms plays an important role in practical applications. Deep learning models that are well trained on datasets still need to be robust to sensor noise, lighting, weather conditions, and complex scenarios when deployed in real environments; both ego-motion and the motion of surrounding objects must be estimated accurately. At the same time, the performance of some sensors, such as LiDAR and RADAR, degrades in extreme meteorological conditions. Conventional VSLAM systems based on stable visual landmarks are of limited use and currently cannot be considered an effective solution. Another problem is failure under fast motion. Some algorithms are tested on multiple datasets, including their own; however, the problem of generalization is far from being solved.
- Real-time Deployment: Real-time implementation is one of the main factors positively influencing the practical development of VSLAM for autonomous systems. At present, most deep learning architectures tend to use complex modules, which increase the computational load and cost. Nevertheless, it is difficult to expect that lightweight neural solutions will soon be adequate for a problem as complex as VSLAM. In addition, a large gap remains between simulation environments and real scenes.
- Multi-task and Multi-modal Architectures: Deep networks that use multi-task and multi-modal paradigms are attracting particular attention. Although datasets typically provide multi-modal data from sensors such as color cameras, stereo cameras, event-based cameras, IMUs, LiDAR, etc., deep learning networks cannot fuse or process all types of data directly due to their inherent limitations. Data fusion remains an open issue not only for VSLAM but for many other problems as well (a minimal fusion sketch is given after this list).
- Dynamic Environment: The first VSLAM and traditional SLAM methods focused on a static environment and failed in dynamic environments. Only recently have end-to-end deep networks been able to run as real-time systems in dynamic environments, primarily owing to the high performance of trackers with recognition capabilities. Moreover, some solutions involve not only semantic but also instance segmentation. Dynamic scenes with moving objects are typical environments for autonomous systems (road networks, highways, etc.).
- Navigation Control: Path planning and navigation control are the core modules for autonomous vehicles that affect safety. Many VSLAM algorithms lack an efficient control technique for navigation. Navigation control is highly dependent on the state of the environment and the performance of the algorithms. This issue requires future work for fully autonomous vehicles.
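The fusion sketch referenced in the multi-task/multi-modal item above: a toy visual-inertial network that embeds an image pair with a CNN and an IMU window with an LSTM, then concatenates the two embeddings before regressing a relative pose. It mirrors the general structure of learned VIO systems such as HVIOnet [98] or SelfVIO [99] only loosely; all layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class SimpleVIFusion(nn.Module):
    """Toy visual-inertial fusion: a CNN embeds an image pair, an LSTM embeds the
    IMU window recorded between the two frames, and the concatenated embeddings
    are regressed to a 6-DoF relative pose. All sizes/names are assumptions."""
    def __init__(self, feat=256):
        super().__init__()
        self.visual = nn.Sequential(
            nn.Conv2d(6, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat),
        )
        self.inertial = nn.LSTM(6, feat, batch_first=True)   # 3-axis accel + 3-axis gyro
        self.fuse = nn.Sequential(nn.Linear(2 * feat, feat), nn.ReLU(), nn.Linear(feat, 6))

    def forward(self, image_pair, imu_window):
        # image_pair: (B, 6, H, W) two stacked RGB frames; imu_window: (B, K, 6)
        v = self.visual(image_pair)
        _, (h, _) = self.inertial(imu_window)
        return self.fuse(torch.cat([v, h[-1]], dim=1))        # (B, 6) relative pose

# Usage: pose = SimpleVIFusion()(torch.rand(2, 6, 240, 320), torch.rand(2, 50, 6))
```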
6. Conclusions
Funding
Data Availability Statement
Conflicts of Interest
References
- Palomeras, N.; Carreras, M.; Andrade-Cetto, J. Active SLAM for autonomous underwater exploration. Remote Sens. 2019, 11, 2827. [Google Scholar] [CrossRef]
- Fang, B.; Mei, G.; Yuan, X.; Wang, L.; Wang, Z.; Wang, J. Visual SLAM for robot navigation in healthcare facility. Pattern Recognit. 2021, 113, 107822. [Google Scholar] [CrossRef] [PubMed]
- Chen, M.; Tang, Y.; Zou, X.; Huang, Z.; Zhou, H.; Chen, S. 3D global mapping of large-scale unstructured orchard integrating eye-in-hand stereo vision and SLAM. Comput. Electron. Agric. 2021, 187, 106237. [Google Scholar] [CrossRef]
- Ouyang, M.; Shi, X.; Wang, Y.; Tian, Y.; Shen, Y.; Wang, D.; Wang, P.; Cao, Z. A collaborative visual SLAM framework for service robots. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021), Prague, Czech Republic, 27 September–1 October 2021; pp. 8679–8685. [Google Scholar] [CrossRef]
- Kuo, C.-Y.; Huang, C.-C.; Tsai, C.-H.; Shi, Y.-S.; Smith, S. Development of an immersive SLAM-based VR system for teleoperation of a mobile manipulator in an unknown environment. Comput. Ind. 2021, 132, 103502. [Google Scholar] [CrossRef]
- Li, W.; Wang, J.; Liu, M.; Zhao, S. Real-time occlusion handling for augmented reality assistance assembly systems with monocular images. J. Manuf. Syst. 2022, 62, 561–574. [Google Scholar] [CrossRef]
- Sucar, E.; Liu, S.; Ortiz, J.; Davison, A.J. iMAP: Implicit mapping and positioning in real-time. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 6229–6238. [Google Scholar]
- Qiao, C.; Xiang, Z.; Wang, X.; Chen, S.; Fan, Y.; Zhao, X. Objects matter: Learning object relation graph for robust absolute pose. Neurocomputing 2023, 521, 11–26. [Google Scholar] [CrossRef]
- Davison, A.J.; Reid, I.D.; Molton, N.D.; Stasse, O. MonoSLAM: Real-time single camera SLAM. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 1052–1067. [Google Scholar] [CrossRef]
- Klein, G.; Murray, D. Parallel tracking and mapping for small AR workspaces. In Proceeding of the 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, Nara, Japan, 13–16 November 2007; pp. 1–10. [Google Scholar] [CrossRef]
- Zou, D.P.; Tan, P. CoSLAM: Collaborative visual SLAM in dynamic environments. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 354–366. [Google Scholar] [CrossRef]
- Mur-Artal, R.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Trans. Robot. 2015, 31, 1147–1163. [Google Scholar] [CrossRef]
- Liu, H.M.; Zhang, G.F.; Bao, H.J. Robust keyframe-based monocular SLAM for augmented reality. In Proceedings of the 2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR-Adjunct), Merida, Mexico, 19–23 September 2016; pp. 1–10. [Google Scholar] [CrossRef]
- Campos, C.; Elvira, R.; Rodrıguez, J.J.G.; Montiel, J.M.; Tardos, J.D. ORB-SLAM3: An accurate open-source library for visual, visual–inertial, and multimap SLAM. IEEE Trans. Robot. 2021, 37, 1874–1890. [Google Scholar] [CrossRef]
- Forster, C.; Zhang, Z.C.; Gassner, M.; Werlberger, M.; Scaramuzza, D. SVO: Semi-direct visual odometry for monocular and multicamera systems. IEEE Trans. Robot. 2017, 33, 249–265. [Google Scholar] [CrossRef]
- Qin, T.; Li, P.L.; Shen, S.J. VINS-mono: A robust and versatile monocular visual-inertial state estimator. IEEE Trans. Robot. 2018, 34, 1004–1020. [Google Scholar] [CrossRef]
- Zou, D.P.; Wu, Y.X.; Pei, L.; Ling, H.B.; Yu, W.X. StructVIO: Visual-inertial odometry with structural regularity of man-made environments. IEEE Trans. Robot. 2019, 35, 999–1013. [Google Scholar] [CrossRef]
- Sun, Y.; Liu, M.; Meng, M.Q.-H. Motion removal for reliable RGB-D SLAM in dynamic environments. Robot. Auton. Syst. 2018, 108, 115–128. [Google Scholar] [CrossRef]
- Li, S.; Zhang, D.; Xian, Y.; Li, B.; Zhang, T.; Zhong, C. Overview of deep learning application on visual SLAM. Displays 2022, 74, 102298. [Google Scholar] [CrossRef]
- Younes, G.; Asmar, D.; Shammas, E.; Zelek, J. Keyframe-based monocular SLAM: Design, survey, and future directions. Robot. Auton. Syst. 2017, 98, 67–88. [Google Scholar] [CrossRef]
- Li, J.; Yang, B.; Chen, D.; Wang, N.; Zhang, G.; Bao, H. Survey and evaluation of monocular visual-inertial SLAM algorithms for augmented reality. Virtual Real. Intell. Hardw. 2019, 1, 386–410. [Google Scholar] [CrossRef]
- Zou, D.; Tan, P.; Yu, W. Collaborative visual SLAM for multiple agents: A brief survey. Virtual Real. Intell. Hardw. 2019, 1, 461–482. [Google Scholar] [CrossRef]
- Cebollada, S.; Payá, L.; Flores, M.; Peidró, A.; Reinoso, O. A state-of-the-art review on mobile robotics tasks using artificial intelligence and visual data. Expert Syst. Appl. 2021, 167, 114195. [Google Scholar] [CrossRef]
- Taheri, H.; Xia, Z.C. SLAM; definition and evolution. Eng. Appl. Artif. Intell. 2021, 97, 104032. [Google Scholar] [CrossRef]
- Arshad, S.; Kim, G.-W. Role of deep learning in loop closure detection for visual and LiDAR SLAM: A survey. Sensors 2021, 21, 1243. [Google Scholar] [CrossRef] [PubMed]
- Cheng, J.; Zhang, L.; Chen, Q.; Hu, X.; Cai, J. A review of visual SLAM methods for autonomous driving vehicles. Eng. Appl. Artif. Intell. 2022, 114, 104992. [Google Scholar] [CrossRef]
- Bala, J.A.; Adeshina, S.A.; Aibinu, A.M. Advances in visual simultaneous localisation and mapping techniques for autonomous vehicles: A review. Sensors 2022, 22, 8943. [Google Scholar] [CrossRef] [PubMed]
- Kazerouni, I.A.; Fitzgerald, L.; Dooly, G.; Toal, D. A survey of state-of-the-art on visual SLAM. Expert Syst. Appl. 2022, 205, 117734. [Google Scholar] [CrossRef]
- Theodorou, C.; Velisavljevic, V.; Dyo, V.; Nonyelu, F. Visual SLAM algorithms and their application for AR, mapping, localization and wayfinding. Array 2022, 15, 100222. [Google Scholar] [CrossRef]
- Macario Barros, A.; Michel, M.; Moline, Y.; Corre, G.; Carrel, F. A comprehensive survey of visual SLAM algorithms. Robotics 2022, 11, 24. [Google Scholar] [CrossRef]
- Chen, W.; Shang, G.; Ji, A.; Zhou, C.; Wang, X.; Xu, C.; Li, Z.; Hu, K. An overview on visual SLAM: From tradition to semantic. Remote Sens. 2022, 14, 3010. [Google Scholar] [CrossRef]
- Tang, Y.; Zhao, C.; Wang, J.; Zhang, C.; Sun, Q.; Zheng, W.X.; Du, W.; Qian, F.; Kurths, J. Perception and navigation in autonomous systems in the era of learning: A survey. IEEE Trans. Neural Netw. Learn. Syst. 2022, in press. [CrossRef]
- Wang, K.; Ma, S.; Chen, J.; Ren, F.; Lu, J. Approaches, challenges, and applications for deep visual odometry: Toward complicated and emerging areas. IEEE Trans. Cogn. Dev. Syst. 2022, 14, 35–49. [Google Scholar] [CrossRef]
- Syed, T.A.; Siddiqui, M.S.; Abdullah, H.B.; Jan, S.; Namoun, A.; Alzahrani, A.; Nadeem, A.; Alkhodre, A.B. In-depth review of augmented reality: Tracking technologies, development tools, AR displays, collaborative AR, and security concerns. Sensors 2023, 23, 146. [Google Scholar] [CrossRef]
- Eswaran, M.; Gulivindala, A.K.; Inkulu, A.K.; Bahubalendruni, R.M.V.A. Augmented reality-based guidance in product assembly and maintenance/repair perspective: A state of the art review on challenges and opportunities. Expert Syst. Appl. 2023, 213, 118983. [Google Scholar] [CrossRef]
- Zhang, J.; Yang, X.; Wang, W.; Guan, J.; Ding, L.; Lee, V.C.S. Automated guided vehicles and autonomous mobile robots for recognition and tracking in civil engineering. Autom. Constr. 2023, 146, 104699. [Google Scholar] [CrossRef]
- Martinelli, F.; Mattogno, S.; Romanelli, F. A resilient solution to Range-Only SLAM based on a decoupled landmark range and bearing reconstruction. Robot. Auton. Syst. 2023, 160, 104324. [Google Scholar] [CrossRef]
- Ila, V.; Porta, J.M.; Andrade-Cetto, J. Amortized constant time state estimation in Pose SLAM and hierarchical SLAM using a mixed Kalman-information filter. Robot. Auton. Syst. 2011, 59, 310–318. [Google Scholar] [CrossRef]
- Bonetto, E.; Goldschmid, P.; Pabst, M.; Black, M.J.; Ahmad, A. iRotate: Active visual SLAM for omnidirectional robots. Robot. Auton. Syst. 2022, 154, 104102. [Google Scholar] [CrossRef]
- Xie, H.; Chen, W.; Wang, J. Hierarchical forest based fast online loop closure for low-latency consistent visual-inertial SLAM. Robot. Auton. Syst. 2022, 151, 104035. [Google Scholar] [CrossRef]
- Lee, S.J.; Choi, H.; Hwang, S.S. Real-time depth estimation using recurrent CNN with sparse depth cues for SLAM system. Int. J. Control Autom. Syst. 2020, 18, 206–216. [Google Scholar] [CrossRef]
- Soares, J.C.V.; Gattass, M.; Meggiolaro, M.A. Crowd-SLAM: Visual SLAM towards crowded environments using object detection. J. Intell. Robot. Syst. 2021, 102, 50. [Google Scholar] [CrossRef]
- Liu, Y.; Miura, J. RDS-SLAM: Real-time dynamic SLAM using semantic segmentation methods. IEEE Access 2021, 9, 23772–23785. [Google Scholar] [CrossRef]
- Mur-Artal, R.; Tardós, J.D. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans. Robot. 2017, 33, 1255–1262. [Google Scholar] [CrossRef]
- Zhu, Z.; Wang, J.; Xu, M.; Lin, S.; Chen, Z. InterpolationSLAM: An effective visual SLAM system based on interpolation network. Eng. Appl. Artif. Intell. 2022, 115, 105333. [Google Scholar] [CrossRef]
- Song, C.; Niu, M.; Liu, Z.; Cheng, J.; Wang, P.; Li, H.; Hao, L. Spatial-temporal 3D dependency matching with self-supervised deep learning for monocular visual sensing. Neurocomputing 2022, 481, 11–21. [Google Scholar] [CrossRef]
- Xiu, H.; Liang, Y.; Zeng, H.; Li, Q.; Liu, H.; Fan, B.; Li, C. Robust self-supervised monocular visual odometry based on prediction-update pose estimation network. Eng. Appl. Artif. Intell. 2022, 116, 105481. [Google Scholar] [CrossRef]
- Feng, M.; Hu, S.; Ang, M.H.; Lee, G.H. 2D3D-MatchNet: Learning to match keypoints across 2D image and 3D point cloud. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA 2019), Montreal, QC, Canada, 20–24 May 2019; pp. 4790–4796. [Google Scholar]
- Qin, Z.; Yin, M.; Li, G.; Yang, F. SP-Flow: Self-supervised optical flow correspondence point prediction for real-time SLAM. Comput. Aided Geom. Des. 2020, 82, 101928. [Google Scholar] [CrossRef]
- Bruno, H.M.S.; Colombini, E.L. LIFT-SLAM: A deep-learning feature-based monocular visual SLAM method. Neurocomputing 2021, 455, 97–110. [Google Scholar] [CrossRef]
- Kaneko, M.; Iwami, K.; Ogawa, T.; Yamasaki, T.; Aizawa, K. Mask-SLAM: Robust feature-based monocular SLAM by masking using semantic segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2018), Salt Lake City, UT, USA, 18–22 June 2018; pp. 371–379. [Google Scholar]
- Shao, C.; Zhang, C.; Fang, Z.; Yang, G. A deep learning-based semantic filter for RANSAC-based fundamental matrix calculation and the ORB-SLAM system. IEEE Access 2019, 8, 3212–3223. [Google Scholar] [CrossRef]
- Tian, G.; Liu, L.; Ri, J.H.; Liu, Y.; Sun, Y. ObjectFusion: An object detection and segmentation framework with RGB-D SLAM and convolutional neural networks. Neurocomputing 2019, 345, 3–14. [Google Scholar] [CrossRef]
- Xu, L.; Feng, C.; Kamat, V.R.; Menassa, C.C. A scene-adaptive descriptor for visual SLAM-based locating applications in built environments. Autom. Constr. 2020, 112, 103067. [Google Scholar] [CrossRef]
- Liu, W.; Mo, Y.; Jiao, J.; Deng, Z. EF-Razor: An effective edge-feature processing method in visual SLAM. IEEE Access 2020, 8, 140798–140805. [Google Scholar] [CrossRef]
- Rusli, I.; Trilaksono, B.R.; Adiprawita, W. RoomSLAM: Simultaneous localization and mapping with objects and indoor layout structure. IEEE Access 2020, 8, 196992–197004. [Google Scholar] [CrossRef]
- Jin, S.; Chen, L.; Sun, R.; McLoone, S. A novel vSLAM framework with unsupervised semantic segmentation based on adversarial transfer learning. Appl. Soft Comput. J. 2020, 90, 106153. [Google Scholar] [CrossRef]
- Wu, J.; Shi, Q.; Lu, Q.; Liu, X.; Zhu, X.; Lin, Z. Learning invariant semantic representation for long-term robust visual localization. Eng. Appl. Artif. Intell. 2022, 111, 104793. [Google Scholar] [CrossRef]
- Zhao, X.; Wang, C.; Ang, M.H. Real-time visual-inertial localization using semantic segmentation towards dynamic environments. IEEE Access 2020, 8, 155047–155059. [Google Scholar] [CrossRef]
- Su, P.; Luo, S.; Huang, X. Real-time dynamic SLAM algorithm based on deep learning. IEEE Access 2022, 10, 87754–87766. [Google Scholar] [CrossRef]
- Zou, Z.-X.; Huang, S.-S.; Mu, T.-J.; Wang, Y.-P. ObjectFusion: Accurate object-level SLAM with neural object priors. Graph. Model. 2022, 123, 101165. [Google Scholar] [CrossRef]
- Mumuni, F.; Mumuni, A.; Amuzuvi, C.K. Deep learning of monocular depth, optical flow and ego-motion with geometric guidance for UAV navigation in dynamic environments. Mach. Learn. Appl. 2022, 10, 100416. [Google Scholar] [CrossRef]
- Li, Q.; Cao, R.; Zhu, J.; Fu, H.; Zhou, B.; Fang, X.; Jia, S.; Zhang, S.; Liu, K.; Li, Q. Learn then match: A fast coarse-to-fine depth image-based indoor localization framework for dark environments via deep learning and keypoint-based geometry alignment. ISPRS J. Photogramm. Remote Sens. 2023, 195, 169–177. [Google Scholar] [CrossRef]
- Zhao, C.; Sun, L.; Yan, Z.; Neumann, G.; Duckett, T.; Stolkin, R. Learning Kalman Network: A deep monocular visual odometry for on-road driving. Robot. Auton. Syst. 2019, 121, 103234. [Google Scholar] [CrossRef]
- Ye, X.; Ji, X.; Sun, B.; Chen, S.; Wang, Z.; Li, H. DRM-SLAM: Towards dense reconstruction of monocular SLAM with scene depth fusion. Neurocomputing 2020, 396, 76–91. [Google Scholar] [CrossRef]
- Tao, C.; Gao, Z.; Yan, J.; Li, C.; Cui, G. Indoor 3D semantic robot VSLAM based on mask regional convolutional neural network. IEEE Access 2020, 8, 52906. [Google Scholar] [CrossRef]
- Memon, A.R.; Wang, H.; Hussain, A. Loop closure detection using supervised and unsupervised deep neural networks for monocular SLAM systems. Robot. Auton. Syst. 2020, 126, 103470. [Google Scholar] [CrossRef]
- Chang, J.; Dong, N.; Li, D.; Qin, M. Triplet loss based metric learning for closed loop detection in VSLAM system. Expert Syst. Appl. 2021, 185, 115646. [Google Scholar] [CrossRef]
- Duan, R.; Feng, Y.; Wen, C.-Y. Deep pose graph-matching-based loop closure detection for semantic visual SLAM. Sustainability 2022, 14, 11864. [Google Scholar] [CrossRef]
- Osman, H.; Darwish, N.; Bayoumi, A. PlaceNet: A multi-scale semantic-aware model for visual loop closure. Eng. Appl. Artif. Intell. 2023, 119, 105797. [Google Scholar] [CrossRef]
- Leonardi, M.; Fiori, L.; Stahl, A. Deep learning based keypoint rejection system for underwater visual ego-motion estimation. IFAC-PapersOnLine 2020, 53, 9471–9477. [Google Scholar] [CrossRef]
- Yi, K.M.; Trulls, E.; Lepetit, V.; Fua, P. LIFT: Learned invariant feature transform. In Computer Vision–ECCV 2016; LNCS; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer: Cham, Switzerland, 2016; Volume 9910, pp. 467–483. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; Koltun, V. CARLA: An open urban driving simulator. In Proceedings of the 1st Annual Conference on Robot Learning, Mountain View, CA, USA, 13–15 November 2017; pp. 1–16. [Google Scholar]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Computer Vision–ECCV 2018; LNCS; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer: Cham, Switzerland, 2018; Volume 11211, pp. 833–851. [Google Scholar] [CrossRef]
- Deng, C.; Qiu, K.; Xiong, R.; Zhou, C. Comparative study of deep learning based features in SLAM. In Proceedings of the 2019 4th Asia-Pacific Conference on Intelligent Robot Systems (ACIRS 2019), Nagoya, Japan, 13–15 July 2019; pp. 250–254. [Google Scholar] [CrossRef]
- Engel, J.; Koltun, V.; Cremers, D. Direct sparse odometry. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 611–625. [Google Scholar] [CrossRef]
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636. [Google Scholar]
- Kendall, A.; Grimes, M.; Cipolla, R. PoseNet: A convolutional network for realtime 6-DoF camera relocalization. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2938–2946. [Google Scholar]
- Brahmbhatt, S.; Gu, J.; Kim, K.; Hays, J.; Kautz, J. MapNet: Geometry-aware learning of maps for camera localization. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2616–2625. [Google Scholar]
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790. [Google Scholar]
- Efe, U.; Ince, K.G.; Alatan, A. DFM: A performance baseline for deep feature matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021) Workshops, Nashville, TN, USA, 20–25 June 2021; pp. 4284–4293. [Google Scholar]
- Wu, J.; Ma, L.; Hu, X. Delving deeper into convolutional neural networks for camera relocalization. In Proceedings of the IEEE International Conference on Robotics and Automation, Singapore, 29 May–3 June 2017; pp. 5644–5651. [Google Scholar] [CrossRef]
- Kreuzig, R.; Ochs, M.; Mester, R. DistanceNet: Estimating traveled distance from monocular images using a recurrent convolutional neural network. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2019), Long Beach, CA, USA, 16–17 June 2019; pp. 1–9. [Google Scholar]
- Ai, Y.; Rui, T.; Lu, M.; Fu, L.; Liu, S.; Wang, S. DDL-SLAM: A robust RGB-D SLAM in dynamic environments combined with deep learning. IEEE Access 2020, 8, 162335–162342. [Google Scholar] [CrossRef]
- Han, S.; Xi, Z. Dynamic scene semantics SLAM based on semantic segmentation. IEEE Access 2020, 8, 43563–43570. [Google Scholar] [CrossRef]
- Mishra, P.; Jain, U.; Choudhury, S.; Singh, S.; Pandey, A.; Sharma, A.; Singh, R.; Pathak, V.K.; Saxena, K.K.; Gehlot, A. Footstep planning of humanoid robot in ROS environment using Generative Adversarial Networks (GANs) deep learning. Robot. Auton. Syst. 2022, 158, 104269. [Google Scholar] [CrossRef]
- Tu, X.; Xu, C.; Liu, S.; Xie, G.; Huang, J.; Li, R.; Yuan, J. Learning depth for scene reconstruction using an encoder-decoder model. IEEE Access 2020, 8, 89300–89317. [Google Scholar] [CrossRef]
- Jin, Q.; Meng, Z.; Pham, T.D.; Chen, Q.; Wei, L.; Su, R. DUNet: A deformable network for retinal vessel segmentation. Knowl. Based Syst. 2019, 178, 149–162. [Google Scholar] [CrossRef]
- Zhao, H.; Shi, J.; Qi, X.; Wangh, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
- Lan, E. A novel deep learning architecture by integrating visual simultaneous localization and mapping (VSLAM) into CNN for real-time surgical video analysis. In Proceedings of the 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI 2022), Kolkata, India, 28–31 March 2022; pp. 1–5. [Google Scholar] [CrossRef]
- Aslan, M.F.; Durdu, A.; Sabanci, K. Visual-Inertial Image-Odometry Network (VIIONet): A Gaussian process regression-based deep architecture proposal for UAV pose estimation. Measurement 2022, 194, 111030. [Google Scholar] [CrossRef]
- Shamwell, J.E.; Leung, S.; Nothwang, W.D. Vision-aided absolute trajectory estimation using an unsupervised deep network with online error correction. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018), Madrid, Spain, 1–5 October 2018; pp. 1–9. [Google Scholar] [CrossRef]
- Xiao, Y.; Li, L.; Li, X.; Yao, J. DeepMLE: A robust deep maximum likelihood estimator for two-view structure from motion. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022), Kyoto, Japan, 23–27 October 2022; pp. 1–8. [Google Scholar]
- Zhai, G.; Liu, L.; Zhang, L.; Liu, Y.; Jiang, Y. PoseConvGRU: A monocular approach for visual ego-motion estimation by learning. Pattern Recognit. 2020, 102, 107187. [Google Scholar] [CrossRef]
- Zhu, R.; Yang, M.; Liu, W.; Song, R.; Yan, B.; Xiao, Z. DeepAVO: Efficient pose refining with feature distilling for deep visual odometry. Neurocomputing 2022, 467, 22–35. [Google Scholar] [CrossRef]
- Wang, S.; Clark, R.; Wen, H.; Trigoni, N. DeepVO: Towards end-to-end visual odometry with deep recurrent convolutional neural networks. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation, Singapore, 29 May–3 June 2017; pp. 2043–2050. [Google Scholar] [CrossRef]
- Li, R.; Wang, S.; Long, Z.; Gu, D. UnDeepVO: Monocular visual odometry through unsupervised deep learning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation, Brisbane, Australia, 21–25 May 2018; pp. 7286–7291. [Google Scholar] [CrossRef]
- Aslan, M.F.; Durdu, A.; Yusefi, A.; Yilmaz, A. HVIOnet: A deep learning based hybrid visual–inertial odometry. Neural Netw. 2022, 155, 461–474. [Google Scholar] [CrossRef]
- Almalioglu, Y.; Turan, M.; Saputra, M.R.U.; de Gusmão, P.P.B.; Markham, A.; Trigoni, N. SelfVIO: Self-supervised deep monocular visual–inertial odometry and depth estimation. Neural Netw. 2022, 150, 119–136. [Google Scholar] [CrossRef]
- Burguera, A. Lightweight underwater visual loop detection and classification using a Siamese convolutional neural network. IFAC-PapersOnLine 2021, 54, 410–415. [Google Scholar] [CrossRef]
- Lu, Y.; Chen, Y.; Zhao, D.; Li, D. MGRL: Graph neural network based inference in a Markov network with reinforcement learning for visual navigation. Neurocomputing 2021, 421, 140–150. [Google Scholar] [CrossRef]
- Liu, Y.; Xie, K.; Huang, H. VGF-Net: Visual-geometric fusion learning for simultaneous drone navigation and height mapping. Graph. Model. 2021, 116, 101108. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Fischer, P.; Ilg, E.; Hausser, P.; Hazirbas, C.; Golkov, V.; van der Smagt, P.; Cremers, D.; Brox, T. FlowNet: Learning optical flow with convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Washington, DC, USA, 7–13 December 2015; pp. 2758–2766. [Google Scholar] [CrossRef]
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Computer Vision–ECCV 2018; LNCS; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer: Cham, Switzerland, 2018; Volume 11211, pp. 3–19. [Google Scholar] [CrossRef]
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2012), Providence, RI, USA, 16–21 June 2012; pp. 3354–3361. [Google Scholar] [CrossRef]
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237. [Google Scholar] [CrossRef]
- Voigtlaender, P.; Krause, M.; Osep, A.; Luiten, J.; Sekar, B.B.G.; Geiger, A.; Leibe, B. MOTS: Multi-object tracking and segmentation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), Long Beach, CA, USA, 15–20 June 2019; pp. 7942–7951. [Google Scholar]
- The KITTI Vision Benchmark Suite. Available online: https://www.cvlibs.net/datasets/kitti/index.php (accessed on 25 January 2023).
- Gaidon, A.; Wang, Q.; Cabon, Y.; Vig, E. VirtualWorlds as proxy for multi-object tracking analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4340–4349. [Google Scholar] [CrossRef]
- Virtual KITTI 2 Dataset. Available online: https://europe.naverlabs.com/research/computer-vision/proxy-virtual-worlds-vkitti-2 (accessed on 12 February 2023).
- Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A benchmark for the evaluation of RGB-D SLAM systems. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura, Algarve, Portugal, 7–12 October 2012; pp. 573–580. [Google Scholar] [CrossRef]
- RGB-D SLAM Dataset and Benchmark. Available online: https://vision.in.tum.de/data/datasets/rgbd-dataset (accessed on 12 February 2023).
- Shotton, J.; Glocker, B.; Zach, C.; Izadi, S.; Criminisi, A.; Fitzgibbon, A. Scene coordinate regression forests for camera relocalization in RGB-D images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 2930–2937. [Google Scholar] [CrossRef]
- RGB-D Dataset 7-Scenes. Available online: https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes (accessed on 12 February 2023).
- EuRoC MAV Dataset. Available online: https://mldta.com/dataset/euroc-mav-dataset (accessed on 12 February 2023).
- Burri, M.; Nikolic, J.; Gohl, P.; Schneider, T.; Rehder, J.; Omari, S.; Achtelik, M.; Siegwart, R. The EuRoC micro aerial vehicle datasets. Int. J. Robot. Res. 2016, 35, 1157–1163. [Google Scholar] [CrossRef]
- VaFRIC (Variable Frame-Rate Imperial College) Dataset. Available online: https://www.doc.ic.ac.uk/~ahanda/VaFRIC/iclnuim.html (accessed on 12 February 2023).
- Handa, A.; Whelan, T.; McDonald, J.B.; Davison, A.J. A benchmark for RGB-D visual odometry, 3D Reconstruction and SLAM. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA 2014), Hong Kong, China, 31 May–5 June 2014; pp. 1–9. [Google Scholar] [CrossRef]
- Silberman, N.; Hoiem, D.; Kohli, P.; Fergus, R. Indoor segmentation and support inference from RGBD images. In Computer Vision–ECCV 2012; LNCS; Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7576, pp. 746–760. [Google Scholar] [CrossRef]
- NYU Depth Dataset V2. Available online: https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html (accessed on 12 February 2023).
- Maddern, W.; Pascoe, G.; Linegar, C.; Newman, P. 1 Year, 1000km: The Oxford RobotCar dataset. Int. J. Robot. Res. 2017, 36, 3–15. [Google Scholar] [CrossRef]
- Oxford RobotCar Dataset. Available online: https://robotcar-dataset.robots.ox.ac.uk (accessed on 12 February 2023).
- The Malaga Stereo and Laser Urban Data Set. Available online: https://www.mrpt.org/MalagaUrbanDataset (accessed on 17 February 2023).
- Blanco-Claraco, J.-L.; Moreno-Dueñas, F.-Á.; González-Jiménez, J. The Malaga urban dataset: High-rate stereo and LiDAR in a realistic urban scenario. Int. J. Robot. Res. 2014, 33, 207–214. [Google Scholar] [CrossRef]
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223. [Google Scholar] [CrossRef]
- The CityScapes Dataset. Available online: https://www.cityscapes-dataset.com (accessed on 17 February 2023).
- ApolloScapes Dataset. Available online: http://apolloscape.auto/self_localization.html (accessed on 17 February 2023).
- Huang, X.; Cheng, X.; Geng, Q.; Cao, B.; Zhou, D.; Wang, P.; Lin, Y.; Yang, R. The ApolloScape dataset for autonomous driving. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2018), Salt Lake City, UT, USA, 18–22 June 2018; pp. 1067–1073. [Google Scholar] [CrossRef]
- ScanNet. Available online: http://www.scan-net.org (accessed on 12 February 2023).
- Dai, A.; Chang, A.X.; Savva, M.; Halber, M.; Funkhouser, T.A.; Nießner, M. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 2432–2443. [Google Scholar] [CrossRef]
- Mid-Air. Available online: https://midair.ulg.ac.be (accessed on 12 February 2023).
- Fonder, M.; Van Droogenbroeck, M. Mid-Air: A multi-modal dataset for extremely low altitude drone flights. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019; pp. 1–10. [Google Scholar] [CrossRef]
- AI2-THOR. Available online: https://ai2thor.allenai.org (accessed on 12 February 2023).
- Kolve, E.; Mottaghi, R.; Han, W.; VanderBilt, E.; Weihs, L.; Herrasti, A.; Gordon, D.; Zhu, Y.; Gupta, A.; Farhadi, A. AI2-THOR: An interactive 3D environment for visual AI. arXiv 2022, arXiv:1712.05474v4. [Google Scholar]
- Wald, J.; Sattler, T.; Golodetz, S.; Cavallari, T.; Tombari, F. Beyond controlled environments: 3D camera re-localization in changing indoor scenes. In Computer Vision–ECCV 2020; LNCS; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer: Cham, Switzerland, 2020; Volume 12352, pp. 467–487. [Google Scholar] [CrossRef]
Dataset | Year | Subject | Sensor | Short Description |
---|---|---|---|---|
The KITTI Visual Odometry dataset | 2012 | 22 stereo sequences | Color camera, GPS, LiDAR scanner | The dataset contains image pairs captured in outdoor scenes with ground truth labeling |
The vKITTI2 dataset | 2020 | 5 sequences | Unity game engine | A synthetic dataset for training and testing autonomous driving models |
The TUM RGB-D dataset | 2012 | 39 sequences | Kinect sensor (near-infrared laser, infrared camera, color camera) | The data was recorded as color and depth frames with ground truth trajectories |
The NYU RGB-D V2 dataset | 2012 | 464 indoor scenes | Kinect RGB-D sensor | The dataset includes physical scenes for segmenting the visible areas of objects |
The RGB-D Dataset 7-Scenes | 2013 | 7 scenes | Kinect RGB-D sensor | The dataset allows observers to evaluate dense tracking and mapping, as well as relocalization |
The Malaga dataset | 2013 | 15 outdoor scenes | Stereo camera, 5 laser scanners, IMU, GPS | The dataset presents high-resolution stereo images over a 36.8 km trajectory |
The ICL-NUIM dataset | 2014 | 2 scenes | Kinect sensor | The dataset includes labeled scenes with a living room and an office |
The EuRoC MAV dataset | 2016 | 11 indoor scenes | Grayscale stereo camera, IMU | The dataset consists of stereo images synchronized with IMU measurements |
The Oxford RobotCar dataset | 2016 | Over 130 scenes | 6 cameras, LiDAR, GPS, INS | The dataset contains long trajectories in outdoor scenes with complex weather conditions |
The Cityscapes dataset | 2016 | 50 urban scenes | Stereo camera, GPS | The dataset contains complex street scenes from 50 different cities |
The AI2-THOR dataset | 2017 | 120 indoor scenes | Physical simulation in Unity 3D | The dataset consists of photorealistic 3D indoor scenes with AI agent navigation |
The ScanNet dataset | 2017 | 707 scenes | Kinect sensor | The dataset is a set of annotated 3D indoor reconstructions |
The ApolloScape dataset | 2018 | 2 long sequences | Camera, Stereo camera, LiDAR, GPS, IMU | The dataset consists of varying conditions and traffic densities with complex scenarios |
The MidAir dataset | 2019 | 79 min of drone flight | 3 RGB cameras, accelerometer, gyroscope, GPS | The dataset provides a large amount of synchronized data corresponding to flight records |
The RIO10 dataset | 2020 | 10 sequences | Mobile phone | The dataset provides RGB and depth images with semantic maps for reference |
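Since most methods in the tables above are evaluated on the KITTI Visual Odometry benchmark, a short, hedged example of reading its ground-truth pose files and computing a translation-only absolute trajectory error (ATE RMSE) is given below. The file paths are hypothetical, and the metric is a simplified variant that assumes the estimated trajectory is already aligned and scaled to the ground truth.

```python
import numpy as np

def load_kitti_poses(path):
    """Each line of a KITTI odometry ground-truth file holds the 12 entries of a
    3x4 pose matrix (row-major). `path` is a hypothetical local file, e.g. 'poses/00.txt'."""
    poses = []
    with open(path) as f:
        for line in f:
            vals = np.array(line.split(), dtype=float).reshape(3, 4)
            poses.append(np.vstack([vals, [0.0, 0.0, 0.0, 1.0]]))
    return np.stack(poses)              # (N, 4, 4) camera-to-world transforms

def ate_rmse(gt, est):
    """Translation-only absolute trajectory error, assuming the estimate is already
    expressed in the ground-truth frame and at metric scale (no alignment step)."""
    diff = gt[: len(est), :3, 3] - est[: len(gt), :3, 3]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

# Usage (hypothetical paths):
# gt  = load_kitti_poses("dataset/poses/00.txt")
# est = load_kitti_poses("results/00_pred.txt")
# print("ATE RMSE [m]:", ate_rmse(gt, est))
```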