Search Results (126)

Search Parameters:
Keywords = monocular SLAM

23 pages, 13423 KB  
Article
A Lightweight LiDAR–Visual Odometry Based on Centroid Distance in a Similar Indoor Environment
by Zongkun Zhou, Weiping Jiang, Chi Guo, Yibo Liu and Xingyu Zhou
Remote Sens. 2025, 17(16), 2850; https://doi.org/10.3390/rs17162850 - 16 Aug 2025
Viewed by 637
Abstract
Simultaneous Localization and Mapping (SLAM) is a critical technology for robot intelligence. Compared to cameras, Light Detection and Ranging (LiDAR) sensors achieve higher accuracy and stability in indoor environments. However, LiDAR can only capture the geometric structure of the environment, and LiDAR-based SLAM often fails in scenarios with insufficient geometric features or highly similar structures. Furthermore, low-cost mechanical LiDARs, constrained by sparse point cloud density, are particularly prone to odometry drift along the Z-axis, especially in environments such as tunnels or long corridors. To address the localization issues in such scenarios, we propose a forward-enhanced SLAM algorithm. Utilizing a 16-line LiDAR and a monocular camera, we construct a dense colored point cloud input and apply an efficient multi-modal feature extraction algorithm based on centroid distance to extract a set of feature points with significant geometric and color features. These points are then optimized in the back end based on constraints from points, lines, and planes. We compare our method with several classic SLAM algorithms in terms of feature extraction, localization, and elevation constraint. Experimental results demonstrate that our method achieves high-precision real-time operation and exhibits excellent adaptability to indoor environments with similar structures. Full article
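The paper's multi-modal feature extraction is not reproduced here, but the centroid-distance idea it builds on can be sketched in a few lines: score each point by how far it lies from the centroid of its local neighbourhood, and keep the highest-scoring points as geometric features. The function name, the neighbourhood size k, and the keep ratio below are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def centroid_distance_features(points, k=20, keep_ratio=0.05):
    """Score each point by its distance to the centroid of its k nearest
    neighbours; large distances suggest salient (edge/corner-like) geometry."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k + 1)          # first neighbour is the point itself
    centroids = points[idx[:, 1:]].mean(axis=1)   # (N, 3) neighbourhood centroids
    scores = np.linalg.norm(points - centroids, axis=1)
    n_keep = max(1, int(keep_ratio * len(points)))
    return np.argsort(scores)[-n_keep:]           # indices of the most salient points

# Example: select salient points from a random cloud
cloud = np.random.rand(5000, 3)
feature_idx = centroid_distance_features(cloud, k=20, keep_ratio=0.05)
```

The same scoring could in principle be applied to stacked XYZ+RGB vectors to approximate the colored-point-cloud variant the abstract describes.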

22 pages, 4524 KB  
Article
RAEM-SLAM: A Robust Adaptive End-to-End Monocular SLAM Framework for AUVs in Underwater Environments
by Yekai Wu, Yongjie Li, Wenda Luo and Xin Ding
Drones 2025, 9(8), 579; https://doi.org/10.3390/drones9080579 - 15 Aug 2025
Viewed by 555
Abstract
Autonomous Underwater Vehicles (AUVs) play a critical role in ocean exploration. However, due to the inherent limitations of most sensors in underwater environments, achieving accurate navigation and localization in complex underwater scenarios remains a significant challenge. While vision-based Simultaneous Localization and Mapping (SLAM) provides a cost-effective alternative for AUV navigation, existing methods are primarily designed for terrestrial applications and struggle to address underwater-specific issues, such as poor illumination, dynamic interference, and sparse features. To tackle these challenges, we propose RAEM-SLAM, a robust adaptive end-to-end monocular SLAM framework for AUVs in underwater environments. Specifically, we propose a Physics-guided Underwater Adaptive Augmentation (PUAA) method that dynamically converts terrestrial scene datasets into physically realistic pseudo-underwater images for the augmentation training of RAEM-SLAM, improving the system’s generalization and adaptability in complex underwater scenes. We also introduce a Residual Semantic–Spatial Attention Module (RSSA), which utilizes a dual-branch attention mechanism to effectively fuse semantic and spatial information. This design enables adaptive enhancement of key feature regions and suppression of noise interference, resulting in more discriminative feature representations. Furthermore, we incorporate a Local–Global Perception Block (LGP), which integrates multi-scale local details with global contextual dependencies to significantly improve AUV pose estimation accuracy in dynamic underwater scenes. Experimental results on real-world underwater datasets demonstrate that RAEM-SLAM outperforms state-of-the-art SLAM approaches in enabling precise and robust navigation for AUVs. Full article
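The PUAA method itself is not specified in the abstract; as a rough illustration of how a physics-guided pseudo-underwater augmentation can work, the sketch below applies the standard underwater image-formation model I = J·t + B·(1 − t) with per-channel attenuation. The attenuation and backscatter coefficients are placeholder assumptions, not parameters from the paper.

```python
import numpy as np

def pseudo_underwater(image, depth, beta=(0.9, 0.35, 0.2), backscatter=(0.05, 0.35, 0.45)):
    """Apply a simple underwater image-formation model
    I = J * t + B * (1 - t), with per-channel transmission t = exp(-beta * d).
    `image` is float RGB in [0, 1]; `depth` is per-pixel range in metres."""
    beta = np.asarray(beta).reshape(1, 1, 3)
    B = np.asarray(backscatter).reshape(1, 1, 3)
    t = np.exp(-beta * depth[..., None])
    return np.clip(image * t + B * (1.0 - t), 0.0, 1.0)

# Example: a synthetic 480x640 scene pushed 5-15 m under water
rgb = np.random.rand(480, 640, 3)
d = np.random.uniform(5.0, 15.0, size=(480, 640))
uw = pseudo_underwater(rgb, d)
```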

22 pages, 3506 KB  
Article
UAV Navigation Using EKF-MonoSLAM Aided by Range-to-Base Measurements
by Rodrigo Munguia, Juan-Carlos Trujillo and Antoni Grau
Drones 2025, 9(8), 570; https://doi.org/10.3390/drones9080570 - 12 Aug 2025
Viewed by 255
Abstract
This study introduces an innovative refinement to EKF-based monocular SLAM by incorporating attitude, altitude, and range-to-base data to enhance system observability and minimize drift. In particular, by utilizing a single range measurement relative to a fixed reference point, the method enables unmanned aerial vehicles (UAVs) to mitigate error accumulation, preserve map consistency, and operate reliably in environments without GPS. This integration facilitates sustained autonomous navigation with estimation error remaining bounded over extended trajectories. Theoretical validation is provided through a nonlinear observability analysis, highlighting the general benefits of integrating range data into the SLAM framework. The system’s performance is evaluated through both virtual experiments and real-world flight data. The real-data experiments confirm the practical relevance of the approach and its ability to improve estimation accuracy in realistic scenarios. Full article
(This article belongs to the Section Drone Design and Development)
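As a minimal sketch of how a single range-to-base measurement enters an EKF-based monocular SLAM back end, the update below assumes the first three state entries are the UAV position expressed in the same frame as the fixed base point; the state layout and noise value are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ekf_range_update(x, P, z, p_base, sigma_r=0.1):
    """EKF update with a single range-to-base measurement.
    Assumes the first three entries of the state x are the vehicle position."""
    p = x[:3]
    diff = p - p_base
    r_pred = np.linalg.norm(diff)
    H = np.zeros((1, x.size))
    H[0, :3] = diff / r_pred                  # Jacobian of ||p - p_base|| w.r.t. p
    R = np.array([[sigma_r ** 2]])
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x_new = x + (K @ np.array([z - r_pred])).ravel()
    P_new = (np.eye(x.size) - K @ H) @ P
    return x_new, P_new

# Example: a hypothetical 9-state filter (position, velocity, attitude error)
x = np.zeros(9); x[:3] = [10.0, -4.0, 20.0]
P = np.eye(9)
x, P = ekf_range_update(x, P, z=22.5, p_base=np.zeros(3))
```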

19 pages, 3382 KB  
Article
LiDAR as a Geometric Prior: Enhancing Camera Pose Tracking Through High-Fidelity View Synthesis
by Rafael Muñoz-Salinas, Jianheng Liu, Francisco J. Romero-Ramirez, Manuel J. Marín-Jiménez and Fu Zhang
Appl. Sci. 2025, 15(15), 8743; https://doi.org/10.3390/app15158743 - 7 Aug 2025
Viewed by 389
Abstract
This paper presents a robust framework for monocular camera pose estimation by leveraging high-fidelity, pre-built 3D LiDAR maps. The core of our approach is a render-and-match pipeline that synthesizes photorealistic views from a dense LiDAR point cloud. By detecting and matching keypoints between these synthetic images and the live camera feed, we establish reliable 3D–2D correspondences for accurate pose estimation. We evaluate two distinct strategies: an Online Rendering and Tracking method that renders views on the fly, and an Offline Keypoint-Map Tracking method that precomputes a keypoint map for known trajectories, optimizing for computational efficiency. Comprehensive experiments demonstrate that our framework significantly outperforms several state-of-the-art visual SLAM systems in both accuracy and tracking consistency. By anchoring localization to the stable geometric information from the LiDAR map, our method overcomes the reliance on photometric consistency that often causes failures in purely image-based systems, proving particularly effective in challenging real-world environments. Full article
(This article belongs to the Special Issue Image Processing and Computer Vision Applications)
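Once keypoints in the live image are matched against a synthetic view whose pixels carry known 3D coordinates from the LiDAR map, the pose step reduces to PnP on 3D–2D correspondences. The sketch below uses OpenCV's solvePnPRansac for that step; the thresholds and the intrinsic matrix are assumptions for illustration only, not the paper's settings.

```python
import numpy as np
import cv2

def pose_from_map_correspondences(pts3d_map, pts2d_cam, K, dist=None):
    """Estimate the camera pose from 3D map points (taken from the rendered
    LiDAR view) matched to 2D keypoints in the live image, using PnP + RANSAC."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d_map.astype(np.float64),
        pts2d_cam.astype(np.float64),
        K, dist,
        reprojectionError=3.0, iterationsCount=200)
    if not ok:
        raise RuntimeError("PnP failed: not enough consistent correspondences")
    R, _ = cv2.Rodrigues(rvec)          # rotation from world to camera frame
    return R, tvec.ravel(), inliers

# Hypothetical pinhole intrinsics for a 640x480 camera
K = np.array([[525.0, 0, 320.0], [0, 525.0, 240.0], [0, 0, 1.0]])
```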

32 pages, 1435 KB  
Review
Smart Safety Helmets with Integrated Vision Systems for Industrial Infrastructure Inspection: A Comprehensive Review of VSLAM-Enabled Technologies
by Emmanuel A. Merchán-Cruz, Samuel Moveh, Oleksandr Pasha, Reinis Tocelovskis, Alexander Grakovski, Alexander Krainyukov, Nikita Ostrovenecs, Ivans Gercevs and Vladimirs Petrovs
Sensors 2025, 25(15), 4834; https://doi.org/10.3390/s25154834 - 6 Aug 2025
Viewed by 949
Abstract
Smart safety helmets equipped with vision systems are emerging as powerful tools for industrial infrastructure inspection. This paper presents a comprehensive state-of-the-art review of such VSLAM-enabled (Visual Simultaneous Localization and Mapping) helmets. We surveyed the evolution from basic helmet cameras to intelligent, sensor-fused inspection platforms, highlighting how modern helmets leverage real-time visual SLAM algorithms to map environments and assist inspectors. A systematic literature search was conducted targeting high-impact journals, patents, and industry reports. We classify helmet-integrated camera systems into monocular, stereo, and omnidirectional types and compare their capabilities for infrastructure inspection. We examine core VSLAM algorithms (feature-based, direct, hybrid, and deep-learning-enhanced) and discuss their adaptation to wearable platforms. Multi-sensor fusion approaches integrating inertial, LiDAR, and GNSS data are reviewed, along with edge/cloud processing architectures enabling real-time performance. This paper compiles numerous industrial use cases, from bridges and tunnels to plants and power facilities, demonstrating significant improvements in inspection efficiency, data quality, and worker safety. Key challenges are analyzed, including technical hurdles (battery life, processing limits, and harsh environments), human factors (ergonomics, training, and cognitive load), and regulatory issues (safety certification and data privacy). We also identify emerging trends, such as semantic SLAM, AI-driven defect recognition, hardware miniaturization, and collaborative multi-helmet systems. This review finds that VSLAM-equipped smart helmets offer a transformative approach to infrastructure inspection, enabling real-time mapping, augmented awareness, and safer workflows. We conclude by highlighting current research gaps, notably in standardizing systems and integrating with asset management, and provide recommendations for industry adoption and future research directions. Full article

24 pages, 988 KB  
Article
Consistency-Oriented SLAM Approach: Theoretical Proof and Numerical Validation
by Zhan Wang, Alain Lambert, Yuwei Meng, Rongdong Yu, Jin Wang and Wei Wang
Electronics 2025, 14(15), 2966; https://doi.org/10.3390/electronics14152966 - 24 Jul 2025
Viewed by 309
Abstract
Simultaneous Localization and Mapping (SLAM) has long been a fundamental and challenging task in the robotics literature, where safety and reliability are critical for successful autonomous applications of robots. Classically, the SLAM problem is tackled via probabilistic or optimization methods (such as EKF-SLAM, Fast-SLAM, and Graph-SLAM). Despite their strong performance in real-world scenarios, these methods may exhibit inconsistency caused by model linearization or the Gaussian noise assumption. In this paper, we propose an alternative monocular SLAM algorithm, iMonoSLAM, which relies on interval analysis to pursue guaranteed rather than probabilistically defined solutions. We consistently model and initialize the SLAM problem with a bounded-error parametric model. The state estimation process is then cast into an Interval Constraint Satisfaction Problem (ICSP) and resolved through interval constraint propagation techniques without any linearization or Gaussian noise assumption. Furthermore, we theoretically prove the obtained consistency and propose a versatile method for numerical validation. To the best of our knowledge, this is the first time such a proof has been proposed. A plethora of numerical experiments are carried out to validate the consistency, and a preliminary comparison with classical EKF-SLAM under different noise conditions is also presented. Our proposed iMonoSLAM shows outstanding performance in obtaining reliable solutions, highlighting its potential in safety-critical scenarios for mobile robots. Full article
(This article belongs to the Special Issue Simultaneous Localization and Mapping (SLAM) of Mobile Robots)
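To make the interval-constraint-propagation idea concrete, here is a minimal forward-backward contractor for a single constraint z = x + y on interval boxes; it tightens the boxes without linearization or any noise assumption, which is the mechanism iMonoSLAM relies on. The actual ICSP in the paper involves the full projection and motion constraints; this toy constraint is only an illustration.

```python
def contract_add(x, y, z):
    """Forward-backward contractor for the constraint z = x + y, where each
    variable is an interval (lo, hi). Returns contracted intervals; a result
    with lo > hi means the constraint is infeasible for the given boxes."""
    zx = (max(z[0], x[0] + y[0]), min(z[1], x[1] + y[1]))    # forward:  z ∩ (x + y)
    xx = (max(x[0], zx[0] - y[1]), min(x[1], zx[1] - y[0]))  # backward: x ∩ (z - y)
    yx = (max(y[0], zx[0] - xx[1]), min(y[1], zx[1] - xx[0]))
    return xx, yx, zx

# Example: a range constraint tightens one operand box without any linearization
x, y, z = contract_add((0.0, 10.0), (2.0, 3.0), (4.0, 5.0))
print(x, y, z)   # x shrinks to (1.0, 3.0); y and z are unchanged
```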

21 pages, 4044 KB  
Article
DK-SLAM: Monocular Visual SLAM with Deep Keypoint Learning, Tracking, and Loop Closing
by Hao Qu, Lilian Zhang, Jun Mao, Junbo Tie, Xiaofeng He, Xiaoping Hu, Yifei Shi and Changhao Chen
Appl. Sci. 2025, 15(14), 7838; https://doi.org/10.3390/app15147838 - 13 Jul 2025
Viewed by 581
Abstract
The performance of visual SLAM in complex, real-world scenarios is often compromised by unreliable feature extraction and matching when using handcrafted features. Although deep learning-based local features excel at capturing high-level information and perform well on matching benchmarks, they struggle with generalization in continuous motion scenes, adversely affecting loop detection accuracy. To address these issues, we present DK-SLAM, a monocular visual SLAM system with deep keypoint learning, tracking, and loop closing. Our system employs a Model-Agnostic Meta-Learning (MAML) strategy to optimize the training of keypoint extraction networks, enhancing their adaptability to diverse environments. Additionally, we introduce a coarse-to-fine feature tracking mechanism for learned keypoints. It begins with a direct method to approximate the relative pose between consecutive frames, followed by a feature matching method for refined pose estimation. To mitigate cumulative positioning errors, DK-SLAM incorporates a novel online learning module that utilizes binary features for loop closure detection. This module dynamically identifies loop nodes within a sequence, ensuring accurate and efficient localization. Experimental evaluations on publicly available datasets demonstrate that DK-SLAM outperforms leading traditional and learning-based SLAM systems, such as ORB-SLAM3 and LIFT-SLAM. DK-SLAM achieves 17.7% better translation accuracy and 24.2% better rotation accuracy than ORB-SLAM3 on KITTI and 34.2% better translation accuracy on EuRoC. These results underscore the efficacy and robustness of our DK-SLAM in varied and challenging real-world environments. Full article
(This article belongs to the Section Robotics and Automation)
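The paper's online learning module is not described in enough detail to reproduce, but the underlying operation, scoring loop-closure candidates by matching binary descriptors under the Hamming distance, can be sketched as below. ORB descriptors stand in for the learned binary features, and the ratio-test threshold is an assumed value.

```python
import cv2

def loop_candidate_score(desc_query, desc_kf, ratio=0.75):
    """Score a potential loop closure by counting ratio-test matches between
    the binary descriptors of the query frame and a candidate keyframe."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    knn = matcher.knnMatch(desc_query, desc_kf, k=2)
    good = [p[0] for p in knn
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good) / max(1, len(desc_query))

# Example with ORB descriptors from two grayscale images im0, im1 (uint8 arrays):
# orb = cv2.ORB_create(1000)
# _, d0 = orb.detectAndCompute(im0, None)
# _, d1 = orb.detectAndCompute(im1, None)
# print(loop_candidate_score(d0, d1))
```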

32 pages, 2740 KB  
Article
Vision-Based Navigation and Perception for Autonomous Robots: Sensors, SLAM, Control Strategies, and Cross-Domain Applications—A Review
by Eder A. Rodríguez-Martínez, Wendy Flores-Fuentes, Farouk Achakir, Oleg Sergiyenko and Fabian N. Murrieta-Rico
Eng 2025, 6(7), 153; https://doi.org/10.3390/eng6070153 - 7 Jul 2025
Cited by 1 | Viewed by 2886
Abstract
Camera-centric perception has matured into a cornerstone of modern autonomy, from self-driving cars and factory cobots to underwater and planetary exploration. This review synthesizes more than a decade of progress in vision-based robotic navigation through an engineering lens, charting the full pipeline from sensing to deployment. We first examine the expanding sensor palette—monocular and multi-camera rigs, stereo and RGB-D devices, LiDAR–camera hybrids, event cameras, and infrared systems—highlighting the complementary operating envelopes and the rise of learning-based depth inference. The advances in visual localization and mapping are then analyzed, contrasting sparse and dense SLAM approaches, as well as monocular, stereo, and visual–inertial formulations. Additional topics include loop closure, semantic mapping, and LiDAR–visual–inertial fusion, which enables drift-free operation in dynamic environments. Building on these foundations, we review the navigation and control strategies, spanning classical planning, reinforcement and imitation learning, hybrid topological–metric memories, and emerging visual language guidance. Application case studies—autonomous driving, industrial manipulation, autonomous underwater vehicles, planetary rovers, aerial drones, and humanoids—demonstrate how tailored sensor suites and algorithms meet domain-specific constraints. Finally, the future research trajectories are distilled: generative AI for synthetic training data and scene completion; high-density 3D perception with solid-state LiDAR and neural implicit representations; event-based vision for ultra-fast control; and human-centric autonomy in next-generation robots. By providing a unified taxonomy, a comparative analysis, and engineering guidelines, this review aims to inform researchers and practitioners designing robust, scalable, vision-driven robotic systems. Full article
(This article belongs to the Special Issue Interdisciplinary Insights in Engineering Research)

27 pages, 3462 KB  
Article
Visual-Based Position Estimation for Underwater Vehicles Using Tightly Coupled Hybrid Constrained Approach
by Tiedong Zhang, Shuoshuo Ding, Xun Yan, Yanze Lu, Dapeng Jiang, Xinjie Qiu and Yu Lu
J. Mar. Sci. Eng. 2025, 13(7), 1216; https://doi.org/10.3390/jmse13071216 - 24 Jun 2025
Viewed by 399
Abstract
A tightly coupled hybrid monocular visual SLAM system for unmanned underwater vehicles (UUVs) is introduced in this paper. Specifically, we propose a robust three-step hybrid tracking strategy. The feature-based method initially provides a rough pose estimate, then the direct method refines it, and finally, the refined results are used to reproject map points to improve the number of features tracked and stability. Furthermore, a tightly coupled visual hybrid optimization method is presented to address the inaccuracy of the back-end pose optimization. The selection of features for stable tracking is achieved through the integration of two distinct residuals: geometric reprojection error and photometric error. The efficacy of the proposed system is demonstrated through quantitative and qualitative analyses in both artificial and natural underwater environments, demonstrating excellent stable tracking and accurate localization results. Full article
(This article belongs to the Section Ocean Engineering)
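The tightly coupled back end the abstract describes combines two residual types per map point: a geometric reprojection error and a photometric error. The sketch below shows one such stacked residual with a nearest-pixel intensity lookup; the weights, the interface, and the assumption that the projected point lies inside the image are illustrative, not the paper's formulation.

```python
import numpy as np

def hybrid_residual(p_w, uv_obs, i_ref, K, R, t, img_cur, w_geo=1.0, w_photo=0.25):
    """Stacked residual for one map point: geometric reprojection error plus a
    photometric error. p_w is the 3D point, uv_obs its matched pixel, i_ref its
    reference intensity, and img_cur the current grayscale image (float array)."""
    p_c = R @ p_w + t                              # point in the current camera frame
    u, v = (K @ (p_c / p_c[2]))[:2]                # pinhole projection
    r_geo = np.array([uv_obs[0] - u, uv_obs[1] - v])
    ui, vi = int(round(u)), int(round(v))          # nearest-pixel photometric lookup
    r_photo = float(img_cur[vi, ui]) - i_ref
    return np.concatenate([w_geo * r_geo, [w_photo * r_photo]])
```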

16 pages, 3055 KB  
Article
LET-SE2-VINS: A Hybrid Optical Flow Framework for Robust Visual–Inertial SLAM
by Wei Zhao, Hongyang Sun, Songsong Ma and Haitao Wang
Sensors 2025, 25(13), 3837; https://doi.org/10.3390/s25133837 - 20 Jun 2025
Viewed by 657
Abstract
This paper presents SE2-LET-VINS, an enhanced Visual–Inertial Simultaneous Localization and Mapping (VI-SLAM) system built upon the classic Visual–Inertial Navigation System for Monocular Cameras (VINS-Mono) framework, designed to improve localization accuracy and robustness in complex environments. By integrating Lightweight Neural Network (LET-NET) for high-quality feature extraction and Special Euclidean Group in 2D (SE2) optical flow tracking, the system achieves superior performance in challenging scenarios such as low lighting and rapid motion. The proposed method processes Inertial Measurement Unit (IMU) data and camera data, utilizing pre-integration and RANdom SAmple Consensus (RANSAC) for precise feature matching. Experimental results on the European Robotics Challenges (EuRoc) dataset demonstrate that the proposed hybrid method improves localization accuracy by up to 43.89% compared to the classic VINS-Mono model in sequences with loop closure detection. In no-loop scenarios, the method also achieves error reductions of 29.7%, 21.8%, and 24.1% on the MH_04, MH_05, and V2_03 sequences, respectively. Trajectory visualization and Gaussian fitting analysis further confirm the system’s good robustness and accuracy. SE2-LET-VINS offers a robust solution for visual–inertial navigation, particularly in demanding environments, and paves the way for future real-time applications and extended capabilities. Full article
(This article belongs to the Section Navigation and Positioning)
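One of the front-end steps the abstract mentions, RANSAC-based rejection of bad feature tracks, is standard in VINS-style pipelines and can be sketched with OpenCV's fundamental-matrix estimator. The threshold and confidence values below are assumptions, and this is not the SE2 optical-flow tracking itself.

```python
import numpy as np
import cv2

def ransac_filter_tracks(pts_prev, pts_cur, thresh=1.0):
    """Reject bad optical-flow tracks with a fundamental-matrix RANSAC check.
    pts_prev and pts_cur are Nx2 float32 arrays of matched pixel coordinates."""
    F, mask = cv2.findFundamentalMat(pts_prev, pts_cur, cv2.FM_RANSAC,
                                     thresh, 0.99)
    if F is None:
        return np.zeros(len(pts_prev), dtype=bool)   # no model found: drop all tracks
    return mask.ravel().astype(bool)
```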

25 pages, 5180 KB  
Article
An Improved SLAM Algorithm for Substation Inspection Robots Based on 3D Lidar and Visual Information Fusion
by Yicen Liu and Songhai Fan
Energies 2025, 18(11), 2797; https://doi.org/10.3390/en18112797 - 27 May 2025
Viewed by 646
Abstract
Current substation inspection robots mainly use Lidar as the sensor for localization and map building. However, laser SLAM suffers from localization errors in scenes where environmental structural features are similar or missing, and the environmental maps it builds provide only limited road information for inspection robot navigation, which hinders judgment of the road scene. For this reason, this paper fuses 3D Lidar information and visual information to create a SLAM algorithm applicable to substation inspection robots, solving the above laser SLAM localization error problem and improving localization accuracy. First, to recover the scale of monocular visual localization, the algorithm uses 3D Lidar information and visual information to calculate the true position of image feature points in space. Second, the laser position and visual position are interpolated to correct the point cloud distortion caused by the motion of the Lidar. Then, a position-adaptive selection algorithm is designed to use the visual position instead of the laser inter-frame position in certain special regions, improving the robustness of the algorithm. Finally, a colored laser point cloud map of the substation is constructed to provide richer road environment information for the navigation of the inspection robot. The experimental results show that the localization accuracy and map-building quality of the VO-Lidar SLAM algorithm designed in this paper are better than those of current laser SLAM algorithms, and they verify the applicability of the colored laser point cloud map constructed by this algorithm in substation environments. Full article
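The first step the abstract describes, using 3D Lidar returns to give metric depth (and hence scale) to monocular image feature points, can be sketched as a projection-and-nearest-neighbour lookup. The extrinsic convention (R_cl, t_cl mapping Lidar points into the camera frame) and the pixel gating radius are assumptions for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

def lidar_depth_for_features(lidar_xyz, feats_uv, K, R_cl, t_cl, max_px=3.0):
    """Assign metric depth to image feature points by projecting the Lidar
    cloud into the camera and taking the nearest projected Lidar return.
    R_cl, t_cl map Lidar coordinates into the camera frame."""
    p_cam = lidar_xyz @ R_cl.T + t_cl
    p_cam = p_cam[p_cam[:, 2] > 0.1]                      # keep points in front of the camera
    uv = p_cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]                           # perspective division
    tree = cKDTree(uv)
    dist, idx = tree.query(feats_uv)
    return np.where(dist < max_px, p_cam[idx, 2], np.nan)  # NaN = no Lidar support
```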

26 pages, 5598 KB  
Article
DeepLabV3+-Based Semantic Annotation Refinement for SLAM in Indoor Environments
by Shuangfeng Wei, Hongrui Tang, Changchang Liu, Tong Yang, Xiaohang Zhou, Sisi Zlatanova, Junlin Fan, Liping Tu and Yaqin Mao
Sensors 2025, 25(11), 3344; https://doi.org/10.3390/s25113344 - 26 May 2025
Cited by 1 | Viewed by 512
Abstract
Visual SLAM systems frequently encounter challenges in accurately reconstructing three-dimensional scenes from monocular imagery in semantically deficient environments, which significantly compromises robotic operational efficiency. While conventional manual annotation approaches can provide supplemental semantic information, they are inherently inefficient, procedurally complex, and labor-intensive. This paper presents an optimized DeepLabV3+-based framework for visual SLAM that integrates image semantic segmentation with automated point cloud semantic annotation. The proposed method utilizes MobileNetV3 as the backbone network for DeepLabV3+ to maintain segmentation accuracy while reducing computational demands. In this paper, we introduce a parameter-adaptive Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering algorithm incorporating K-nearest neighbors and accelerated by KD-tree structures, effectively addressing the limitations of manual parameter tuning and erroneous annotations in conventional methods. Furthermore, a novel point cloud processing strategy featuring dynamic radius thresholding is developed to enhance annotation completeness and boundary precision. Experimental results demonstrate that our approach achieves significant improvements in annotation efficiency while preserving high accuracy, thereby providing reliable technical support for enhanced environmental understanding and navigation capabilities in indoor robotic applications. Full article
(This article belongs to the Special Issue Indoor Localization Technologies and Applications)
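A parameter-adaptive DBSCAN of the kind the abstract describes can be approximated by deriving eps from the k-nearest-neighbour distance distribution and letting scikit-learn's KD-tree backend handle the neighbourhood queries; the percentile and min_samples choices below are assumptions, not the paper's adaptive rule.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import DBSCAN

def adaptive_dbscan(points, k=10, eps_percentile=90):
    """Run DBSCAN with eps picked automatically from the k-NN distance
    distribution, rather than tuned by hand."""
    nn = NearestNeighbors(n_neighbors=k, algorithm="kd_tree").fit(points)
    kth_dist = nn.kneighbors(points)[0][:, -1]   # distance to the k-th neighbour (self included)
    eps = np.percentile(kth_dist, eps_percentile)
    labels = DBSCAN(eps=eps, min_samples=k, algorithm="kd_tree").fit_predict(points)
    return labels, eps

# Example on a random point cloud
labels, eps = adaptive_dbscan(np.random.rand(2000, 3))
```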

19 pages, 13865 KB  
Article
Monocular Initialization for Real-Time Feature-Based SLAM in Dynamic Environments with Multiple Frames
by Hexuan Dou, Bo Liu, Yinghao Jia and Changhong Wang
Sensors 2025, 25(8), 2404; https://doi.org/10.3390/s25082404 - 10 Apr 2025
Viewed by 590
Abstract
Two-view epipolar initialization for feature-based monocular SLAM with the RANSAC approach is challenging in dynamic environments. This paper presents a universal and practical method for improving the automatic estimation of initial poses and landmarks across multiple frames in real time. Image features corresponding to the same spatial points are matched and tracked across consecutive frames, and those that belong to stationary points are identified using ST-RANSAC, an algorithm designed to detect inliers based on both spatial and temporal consistency. Two-view epipolar computations are then performed in parallel among frames and corresponding features to select the most reliable initialization. The proposed method is integrated with ORB-SLAM3 and evaluated on dynamic datasets for comparative analysis with the baseline. The experimental results demonstrate that the proposed method improves the accuracy of initial pose estimations with the construction of static landmarks while significantly reducing feature extraction scale and computational cost. Full article
(This article belongs to the Section Navigation and Positioning)
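ST-RANSAC itself is the paper's contribution, but the two-view epipolar initialization it feeds is the classical pipeline: essential-matrix RANSAC, relative-pose recovery, and triangulation of inlier correspondences. A minimal OpenCV sketch, with assumed thresholds, is shown below.

```python
import numpy as np
import cv2

def two_view_init(pts0, pts1, K):
    """Classical two-view initialization: essential matrix from RANSAC,
    relative pose recovery, then landmark triangulation from the inliers.
    pts0 and pts1 are Nx2 float64 arrays of matched pixel coordinates."""
    E, mask = cv2.findEssentialMat(pts0, pts1, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    _, R, t, mask_pose = cv2.recoverPose(E, pts0, pts1, K, mask=mask)
    P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])     # reference camera
    P1 = K @ np.hstack([R, t])                             # second camera
    pts4d = cv2.triangulatePoints(P0, P1, pts0.T, pts1.T)
    landmarks = (pts4d[:3] / pts4d[3]).T                   # homogeneous -> Euclidean
    return R, t, landmarks, mask_pose.ravel().astype(bool)
```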

20 pages, 8973 KB  
Article
UE-SLAM: Monocular Neural Radiance Field SLAM with Semantic Mapping Capabilities
by Yuquan Zhang, Guangan Jiang, Mingrui Li and Guosheng Feng
Symmetry 2025, 17(4), 508; https://doi.org/10.3390/sym17040508 - 27 Mar 2025
Viewed by 1339
Abstract
Neural Radiance Fields (NeRF) have transformed 3D reconstruction by enabling high-fidelity scene generation from sparse views. However, existing neural SLAM systems face challenges such as limited scene understanding and heavy reliance on depth sensors. We propose UE-SLAM, a real-time monocular SLAM system integrating semantic segmentation, depth fusion, and robust tracking modules. By leveraging the inherent symmetry between semantic segmentation and depth estimation, UE-SLAM utilizes DINOv2 for instance segmentation and combines monocular depth estimation, radiance field-rendered depth, and an uncertainty framework to produce refined proxy depth. This approach enables high-quality semantic mapping and eliminates the need for depth sensors. Experiments on benchmark datasets demonstrate that UE-SLAM achieves robust semantic segmentation, detailed scene reconstruction, and accurate tracking, significantly outperforming existing monocular SLAM methods. The modular and symmetrical architecture of UE-SLAM ensures a balance between computational efficiency and reconstruction quality, aligning with the thematic focus of symmetry in engineering and computational systems. Full article
(This article belongs to the Section Engineering and Materials)
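The abstract combines monocular depth estimation, radiance-field-rendered depth, and an uncertainty framework into a proxy depth. A common way to realise such a fusion, shown here only as a hedged sketch rather than UE-SLAM's actual formulation, is per-pixel inverse-variance weighting.

```python
import numpy as np

def fuse_depths(d_mono, var_mono, d_render, var_render):
    """Inverse-variance fusion of monocular-network depth and radiance-field
    rendered depth into a single proxy depth, with the fused variance."""
    w_m = 1.0 / var_mono
    w_r = 1.0 / var_render
    d_fused = (w_m * d_mono + w_r * d_render) / (w_m + w_r)
    var_fused = 1.0 / (w_m + w_r)
    return d_fused, var_fused

# Example: fuse two hypothetical 480x640 depth maps with per-pixel variances
d, v = fuse_depths(np.full((480, 640), 2.0), np.full((480, 640), 0.04),
                   np.full((480, 640), 2.2), np.full((480, 640), 0.16))
```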

15 pages, 2753 KB  
Article
Monocular Object-Level SLAM Enhanced by Joint Semantic Segmentation and Depth Estimation
by Ruicheng Gao and Yue Qi
Sensors 2025, 25(7), 2110; https://doi.org/10.3390/s25072110 - 27 Mar 2025
Viewed by 965
Abstract
SLAM is regarded as a fundamental task in mobile robotics and AR, implementing localization and mapping in certain circumstances. However, with only RGB images as input, monocular SLAM systems suffer problems of scale ambiguity and tracking difficulty in dynamic scenes. Moreover, high-level semantic information can always contribute to the SLAM process due to its similarity to human vision. Addressing these problems, we propose a monocular object-level SLAM system enhanced by real-time joint depth estimation and semantic segmentation. The multi-task network, called JSDNet, is designed to predict depth and semantic segmentation simultaneously, with four contributions that include depth discretization, feature fusion, a weight-learned loss function, and semantic consistency optimization. Specifically, feature fusion facilitates the sharing of features between the two tasks, while semantic consistency aims to guarantee the semantic segmentation and depth consistency among various views. Based on the results of JSDNet, we design an object-level system that combines both pixel-level and object-level semantics with traditional tracking, mapping, and optimization processes. In addition, a scale recovery process is also integrated into the system to estimate the true scale. Experimental results on NYU depth v2 demonstrate state-of-the-art depth estimation and considerable segmentation precision under real-time performance, while the trajectory accuracy on TUM RGB-D shows fewer errors than other SLAM systems. Full article
(This article belongs to the Section Navigation and Positioning)
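The "weight-learned loss function" of JSDNet is not spelled out in the abstract; a widely used option for learning the balance between depth and segmentation losses is homoscedastic-uncertainty weighting (Kendall et al., CVPR 2018), sketched below in PyTorch as an assumed stand-in rather than the paper's actual loss.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Learn the relative weighting of the depth and segmentation losses via
    per-task homoscedastic uncertainty (Kendall et al., CVPR 2018)."""
    def __init__(self):
        super().__init__()
        self.log_var_depth = nn.Parameter(torch.zeros(()))
        self.log_var_seg = nn.Parameter(torch.zeros(()))

    def forward(self, loss_depth, loss_seg):
        w_d = torch.exp(-self.log_var_depth)   # precision term for the depth task
        w_s = torch.exp(-self.log_var_seg)     # precision term for the segmentation task
        return (w_d * loss_depth + self.log_var_depth
                + w_s * loss_seg + self.log_var_seg)

# Usage: criterion = UncertaintyWeightedLoss(); total = criterion(l_depth, l_seg)
```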
