
3D Reconstruction with RGB-D Cameras and Multi-sensors

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: 10 April 2025 | Viewed by 21559

Special Issue Editors


Dr. Pengpeng Hu
Guest Editor
Department of Electronics and Informatics, Vrije Universiteit Brussel, Brussels, Belgium
Interests: biometrics; measurement; point cloud processing; deep learning; 3D body scanning

Prof. Dr. Adrian Munteanu
Guest Editor
Electronics and Informatics Department, Vrije Universiteit Brussel, 1050 Brussels, Belgium
Interests: machine learning; computer vision; 3D graphics; anthropometry

Dr. He Wang
Guest Editor
School of Computing, University of Leeds, Leeds, UK
Interests: computer graphics; computer animation; computer vision; machine learning and robotics

Dr. Walid Darwish
Guest Editor
Civil Engineering Department, Geomatics Engineering Lab, Faculty of Engineering, Cairo University, Cairo, Egypt
Interests: RGB-D sensors; SLAM; indoor navigation; plenoptic cameras

Special Issue Information

Dear Colleagues,

Multi-sensor systems are widely used in 3D reconstruction tasks such as 3D shape reconstruction, 4D body scanning, and human activity monitoring, to name a few. Compared to single-sensor systems, multi-sensor systems can simultaneously capture data from different viewpoints, which enables real-time complete shape capture. However, multi-sensor systems are usually expensive and require professional knowledge to operate. With the advancement of commodity RGB-D cameras, there have been countless attempts to build low-cost 3D reconstruction systems. These attempts have exposed additional challenges (e.g., the calibration of multiple RGB-D sensors, human joint detection from point clouds, the low resolution of scanned images, and the compression of large-scale point clouds), which have encouraged researchers to explore more advanced algorithms.
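
To make the first of these challenges concrete: the extrinsics between two RGB-D sensors are commonly bootstrapped from a checkerboard visible to both colour cameras, chaining the two board-to-camera poses. A minimal sketch using OpenCV follows; the board geometry, image files, and intrinsics are illustrative assumptions, not a prescribed setup.

    # Sketch: relative pose between two RGB-D sensors from one shared
    # checkerboard view. Board size, images, and intrinsics are assumed.
    import cv2
    import numpy as np

    BOARD = (9, 6)      # inner corners (cols, rows) -- assumed
    SQUARE = 0.025      # square size in metres -- assumed

    # 3D board points in the board's own coordinate frame
    obj = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
    obj[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE

    # Placeholder pinhole intrinsics; real values come from calibration.
    K = np.array([[600.0, 0, 320.0], [0, 600.0, 240.0], [0, 0, 1]])
    dist = np.zeros(5)

    def board_pose(image_path):
        """Return the 4x4 board-to-camera transform for one colour image."""
        gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, BOARD)
        assert found, "checkerboard not detected"
        _, rvec, tvec = cv2.solvePnP(obj, corners, K, dist)
        T = np.eye(4)
        T[:3, :3], _ = cv2.Rodrigues(rvec)
        T[:3, 3] = tvec.ravel()
        return T

    T1 = board_pose("cam1.png")        # board -> camera 1 (hypothetical file)
    T2 = board_pose("cam2.png")        # board -> camera 2 (hypothetical file)
    T_1_to_2 = T2 @ np.linalg.inv(T1)  # camera-1-to-camera-2 extrinsics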

In this context, the objective of this Special Issue is to connect researchers in the fields of multi-sensor camera calibration, RGB-D sensing, machine learning, 3D scanning, 4D capture, and other related areas. This issue will provide a state-of-the-art overview of methods that have led to progress in the research and application of multiple sensors.

We are soliciting original, full-length, unpublished research articles and reviews focused on this research topic. Topics of interest include, but are not limited to, the following:

  • Point cloud processing
  • Multi-sensor shape capture
  • Multi-sensor human activity understanding
  • RGB-D 3D reconstruction
  • RGB-D human activity understanding
  • RGB-D calibration
  • RGB-D SLAM
  • RGB-D data processing

Dr. Pengpeng Hu
Prof. Dr. Adrian Munteanu
Dr. He Wang
Dr. Walid Darwish
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (10 papers)


Research

30 pages, 13386 KiB  
Article
Enhancing 3D Models with Spectral Imaging for Surface Reflectivity
by Adam Stech, Patrik Kamencay and Robert Hudec
Sensors 2024, 24(19), 6352; https://doi.org/10.3390/s24196352 - 30 Sep 2024
Viewed by 461
Abstract
The increasing demand for accurate and detailed 3D modeling in fields such as cultural heritage preservation, industrial inspection, and scientific research necessitates advanced techniques to enhance model quality. This paper addresses this necessity by incorporating spectral imaging data to improve the surface detail and reflectivity of 3D models. The methodology integrates spectral imaging with traditional 3D modeling processes, offering a novel approach to capturing fine textures and subtle surface variations. The experimental results demonstrate that 3D models generated with spectral imaging data exhibit significant improvements in surface detail and accuracy, particularly for objects with intricate surface patterns. These findings highlight the potential of spectral imaging to enhance 3D model quality, contributing to more precise and reliable representations of complex surfaces.

17 pages, 7063 KiB  
Article
Online Scene Semantic Understanding Based on Sparsely Correlated Network for AR
by Qianqian Wang, Junhao Song, Chenxi Du and Chen Wang
Sensors 2024, 24(14), 4756; https://doi.org/10.3390/s24144756 - 22 Jul 2024
Cited by 1 | Viewed by 792
Abstract
Real-world understanding serves as a medium that bridges the information world and the physical world, enabling the realization of virtual–real mapping and interaction. However, scene understanding based solely on 2D images faces problems such as a lack of geometric information and limited robustness against occlusion. The depth sensor brings new opportunities, but there are still challenges in fusing depth with geometric and semantic priors. To address these concerns, our method considers the repeatability of video stream data and the sparsity of newly generated data. We introduce a sparsely correlated network architecture (SCN) designed explicitly for online RGB-D instance segmentation. Additionally, we leverage the power of object-level RGB-D SLAM systems, thereby transcending the limitations of conventional approaches that solely emphasize geometry or semantics. We establish correlation over time and leverage this correlation to develop rules and generate sparse data. We thoroughly evaluate the system’s performance on the NYU Depth V2 and ScanNet V2 datasets, demonstrating that incorporating frame-to-frame correlation leads to significantly improved accuracy and consistency in instance segmentation compared to existing state-of-the-art alternatives. Moreover, using sparse data reduces data complexity while meeting the real-time requirement of 18 fps. Furthermore, by utilizing prior knowledge of object layout understanding, we demonstrate a promising augmented reality application, showcasing its potential and practicality.

11 pages, 153891 KiB  
Article
Neural Colour Correction for Indoor 3D Reconstruction Using RGB-D Data
by Tiago Madeira, Miguel Oliveira and Paulo Dias
Sensors 2024, 24(13), 4141; https://doi.org/10.3390/s24134141 - 26 Jun 2024
Cited by 2 | Viewed by 1061
Abstract
With the rise in popularity of different human-centred applications using 3D reconstruction data, the problem of generating photo-realistic models has become an important task. In a multiview acquisition system, particularly for large indoor scenes, the acquisition conditions will differ across the environment, causing colour differences between captures and unappealing visual artefacts in the produced models. We propose a novel neural-based approach to colour correction for indoor 3D reconstruction. It is a lightweight and efficient approach that can be used to harmonize colour from sparse captures over complex indoor scenes. Our approach uses a fully connected deep neural network to learn an implicit representation of the colour in 3D space, while capturing camera-dependent effects. We then leverage this continuous function as reference data to estimate the required transformations to regenerate pixels in each capture. Experiments to evaluate the proposed method on several scenes of the MP3D dataset show that it outperforms other relevant state-of-the-art approaches.
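
The paper's central component, a fully connected network that represents colour as a continuous function of 3D position while absorbing camera-dependent effects, can be sketched as below. The layer sizes and the per-capture embedding are illustrative assumptions, not the authors' exact architecture.

    # Sketch of an implicit colour field: an MLP maps a 3D point plus a
    # learned per-capture embedding to RGB. Sizes are assumptions.
    import torch
    import torch.nn as nn

    class ColourField(nn.Module):
        def __init__(self, n_captures, embed_dim=8, hidden=128):
            super().__init__()
            # one embedding per capture absorbs camera-dependent effects
            self.capture_embed = nn.Embedding(n_captures, embed_dim)
            self.mlp = nn.Sequential(
                nn.Linear(3 + embed_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 3), nn.Sigmoid(),  # RGB in [0, 1]
            )

        def forward(self, xyz, capture_id):
            e = self.capture_embed(capture_id)
            return self.mlp(torch.cat([xyz, e], dim=-1))

    # One training step: regress the observed colour at sampled points.
    model = ColourField(n_captures=12)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    xyz = torch.rand(1024, 3)              # surface samples (placeholder)
    cid = torch.randint(0, 12, (1024,))    # capture index per sample
    rgb = torch.rand(1024, 3)              # observed colours (placeholder)
    loss = nn.functional.mse_loss(model(xyz, cid), rgb)
    opt.zero_grad()
    loss.backward()
    opt.step()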

18 pages, 3738 KiB  
Article
Research on Three-Dimensional Reconstruction of Ribs Based on Point Cloud Adaptive Smoothing Denoising
by Darong Zhu, Diao Wang, Yuanjiao Chen, Zhe Xu and Bishi He
Sensors 2024, 24(13), 4076; https://doi.org/10.3390/s24134076 - 23 Jun 2024
Viewed by 876
Abstract
The traditional methods for 3D reconstruction mainly involve using image processing techniques or deep learning segmentation models for rib extraction. After post-processing, voxel-based rib reconstruction is achieved. However, these methods suffer from limited reconstruction accuracy and low computational efficiency. To overcome these limitations, this paper proposes a 3D rib reconstruction method based on point cloud adaptive smoothing and denoising. We converted voxel data from CT images to multi-attribute point cloud data. Then, we applied point cloud adaptive smoothing and denoising methods to eliminate noise and non-rib points in the point cloud. Additionally, efficient 3D reconstruction and post-processing techniques were employed to achieve high-accuracy and comprehensive 3D rib reconstruction results. Experiments demonstrated that, compared to voxel-based 3D rib reconstruction methods, the proposed method improved reconstruction accuracy by 40% and doubled computational efficiency.
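
The adaptive smoothing and denoising scheme is the paper's own; the general pattern it follows, cleaning a point cloud converted from CT voxels before reconstructing a surface, can be sketched with Open3D, using statistical outlier removal as a generic stand-in for the authors' adaptive method. The file name and parameters are assumptions.

    # Sketch: denoise a point cloud before surface reconstruction.
    # Statistical outlier removal stands in for the paper's adaptive
    # smoothing; the input file and parameters are assumptions.
    import open3d as o3d

    pcd = o3d.io.read_point_cloud("ribs.ply")   # hypothetical input cloud

    # Drop points whose mean neighbour distance deviates from the average.
    clean, kept_idx = pcd.remove_statistical_outlier(
        nb_neighbors=20, std_ratio=2.0)

    # Estimate normals, then reconstruct a mesh (Poisson as a generic choice).
    clean.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=5.0, max_nn=30))
    mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        clean, depth=8)
    o3d.io.write_triangle_mesh("ribs_mesh.ply", mesh)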

18 pages, 2095 KiB  
Article
FusionVision: A Comprehensive Approach of 3D Object Reconstruction and Segmentation from RGB-D Cameras Using YOLO and Fast Segment Anything
by Safouane El Ghazouali, Youssef Mhirit, Ali Oukhrid, Umberto Michelucci and Hichem Nouira
Sensors 2024, 24(9), 2889; https://doi.org/10.3390/s24092889 - 30 Apr 2024
Cited by 1 | Viewed by 2398
Abstract
In the realm of computer vision, the integration of advanced techniques into the pre-processing of RGB-D camera inputs poses a significant challenge, given the inherent complexities arising from diverse environmental conditions and varying object appearances. Therefore, this paper introduces FusionVision, an exhaustive pipeline adapted for the robust 3D segmentation of objects in RGB-D imagery. Traditional computer vision systems face limitations in simultaneously capturing precise object boundaries and achieving high-precision object detection on depth maps, as they are mainly proposed for RGB cameras. To address this challenge, FusionVision adopts an integrated approach by merging state-of-the-art object detection techniques with advanced instance segmentation methods. The integration of these components enables a holistic (unified analysis of information obtained from both color RGB and depth D channels) interpretation of RGB-D data, facilitating the extraction of comprehensive and accurate object information in order to improve post-processes such as object 6D pose estimation, Simultaneous Localization and Mapping (SLAM) operations, accurate 3D dataset extraction, etc. The proposed FusionVision pipeline employs YOLO for identifying objects within the RGB image domain. Subsequently, FastSAM, an innovative semantic segmentation model, is applied to delineate object boundaries, yielding refined segmentation masks. The synergy between these components and their integration into 3D scene understanding ensures a cohesive fusion of object detection and segmentation, enhancing overall precision in 3D object segmentation.
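
The detect-then-segment-then-lift pattern behind FusionVision can be sketched as follows. Here the YOLO bounding box itself serves as a coarse mask (the paper refines it into a proper mask with FastSAM), and the weights, file names, and intrinsics are illustrative assumptions.

    # Sketch of the detect -> mask -> lift-to-3D pattern. The box stands
    # in for the FastSAM mask; inputs and intrinsics are assumptions.
    import numpy as np
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")             # hypothetical weights
    depth = np.load("depth.npy")           # aligned depth in metres, HxW
    fx = fy = 600.0                        # assumed pinhole intrinsics
    cx, cy = 320.0, 240.0

    res = model("frame.png")[0]            # run detection on the RGB frame
    for box in res.boxes.xyxy.cpu().numpy():
        x0, y0, x1, y1 = box.astype(int)
        # back-project every valid depth pixel inside the box to 3D
        v, u = np.mgrid[y0:y1, x0:x1]
        z = depth[y0:y1, x0:x1].ravel()
        valid = z > 0
        u, v, z = u.ravel()[valid], v.ravel()[valid], z[valid]
        pts = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=1)
        print(f"object point cloud: {pts.shape[0]} points")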

19 pages, 10016 KiB  
Article
LNMVSNet: A Low-Noise Multi-View Stereo Depth Inference Method for 3D Reconstruction
by Weiming Luo, Zongqing Lu and Qingmin Liao
Sensors 2024, 24(8), 2400; https://doi.org/10.3390/s24082400 - 9 Apr 2024
Cited by 1 | Viewed by 1522
Abstract
With the widespread adoption of modern RGB cameras, an abundance of RGB images is available everywhere. Therefore, multi-view stereo (MVS) 3D reconstruction, which involves multi-view depth estimation and stereo matching algorithms, has been extensively applied across various fields because of its cost-effectiveness and accessibility. However, MVS tasks face noise challenges because of natural multiplicative noise and negative gain in algorithms, which reduce the quality and accuracy of the generated models and depth maps. Traditional MVS methods often struggle with noise, relying on assumptions that do not always hold true under real-world conditions, while deep learning-based MVS approaches tend to suffer from high noise sensitivity. To overcome these challenges, we introduce LNMVSNet, a deep learning network designed to enhance local feature attention and fuse features across different scales, aiming for low-noise, high-precision MVS 3D reconstruction. Through extensive evaluation on multiple benchmark datasets, LNMVSNet has demonstrated its superior performance, showcasing its ability to improve reconstruction accuracy and completeness, especially in the recovery of fine details and clear feature delineation. This advancement brings hope for the widespread application of MVS, ranging from precise industrial part inspection to the creation of immersive virtual environments.

26 pages, 11619 KiB  
Article
Neural Radiance Fields-Based 3D Reconstruction of Power Transmission Lines Using Progressive Motion Sequence Images
by Yujie Zeng, Jin Lei, Tianming Feng, Xinyan Qin, Bo Li, Yanqi Wang, Dexin Wang and Jie Song
Sensors 2023, 23(23), 9537; https://doi.org/10.3390/s23239537 - 30 Nov 2023
Cited by 3 | Viewed by 1812
Abstract
To address the fuzzy reconstruction effect on distant objects in unbounded scenes and the difficulty in feature matching caused by the thin structure of power lines in images, this paper proposes a novel image-based method for the reconstruction of power transmission lines (PTLs). The dataset used in this paper comprises PTL progressive motion sequence datasets, constructed by a visual acquisition system carried by a developed Flying–walking Power Line Inspection Robot (FPLIR). This system captures close-distance and continuous images of power lines. The study introduces PL-NeRF, an enhanced method for reconstructing PTLs based on Neural Radiance Fields (NeRF). The highlights of PL-NeRF include (1) compressing the unbounded scene of PTLs by exploiting the spatial compression of normal L; (2) encoding the direction and position of the sample points through Integrated Position Encoding (IPE) and Hash Encoding (HE), respectively. Compared to existing methods, the proposed method demonstrates good performance in 3D reconstruction, with fidelity indicators of PSNR = 29, SSIM = 0.871, and LPIPS = 0.087. Experimental results highlight that the combination of PL-NeRF with progressive motion sequence images ensures the integrity and continuity of PTLs, improving the efficiency and accuracy of image-based reconstructions. In the future, this method could be widely applied for efficient and accurate 3D reconstruction and inspection of PTLs, providing a strong foundation for automated monitoring of transmission corridors and digital power engineering.
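
For intuition on the encoding step, the plain sinusoidal positional encoding that NeRF-style methods build on is sketched below; PL-NeRF itself uses Integrated Position Encoding and Hash Encoding, which elaborate on this idea, so this is background rather than the paper's exact scheme.

    # Sketch: the frequency positional encoding underlying NeRF-style
    # methods. A simplified ancestor of IPE/HE, shown for intuition only.
    import numpy as np

    def positional_encoding(x, n_freqs=10):
        """Map coordinates in [-1, 1] to sin/cos features at octave
        frequencies, so the MLP can represent high-frequency detail."""
        feats = [x]
        for i in range(n_freqs):
            for fn in (np.sin, np.cos):
                feats.append(fn((2.0 ** i) * np.pi * x))
        return np.concatenate(feats, axis=-1)

    pt = np.array([[0.1, -0.4, 0.7]])         # one 3D sample point
    print(positional_encoding(pt).shape)      # (1, 3 + 3*2*10) = (1, 63)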

21 pages, 13877 KiB  
Article
Recognition and Counting of Apples in a Dynamic State Using a 3D Camera and Deep Learning Algorithms for Robotic Harvesting Systems
by R. M. Rasika D. Abeyrathna, Victor Massaki Nakaguchi, Arkar Minn and Tofael Ahamed
Sensors 2023, 23(8), 3810; https://doi.org/10.3390/s23083810 - 7 Apr 2023
Cited by 18 | Viewed by 4403
Abstract
Recognition and 3D positional estimation of apples during harvesting from a robotic platform in a moving vehicle are still challenging. Fruit clusters, branches, foliage, low resolution, and different illuminations are unavoidable and cause errors in different environmental conditions. Therefore, this research aimed to develop a recognition system based on training datasets from an augmented, complex apple orchard. The recognition system was evaluated using deep learning algorithms established from a convolutional neural network (CNN). The dynamic accuracy of the modern artificial neural networks involving 3D coordinates for deploying robotic arms at different forward-moving speeds from an experimental vehicle was investigated to compare the recognition and tracking localization accuracy. In this study, a RealSense D455 RGB-D camera was selected to acquire 3D coordinates of each detected and counted apple attached to artificial trees placed in the field to propose a specially designed structure for ease of robotic harvesting. A 3D camera was used together with the state-of-the-art YOLO (You Only Look Once) variants YOLOv4, YOLOv5, and YOLOv7, as well as EfficientDet, for object detection. The Deep SORT algorithm was employed for tracking and counting detected apples using perpendicular, 15°, and 30° orientations. The 3D coordinates were obtained for each tracked apple when the on-board camera in the vehicle passed the reference line and was set in the middle of the image frame. To optimize harvesting at three different speeds (0.052 m s−1, 0.069 m s−1, and 0.098 m s−1), the accuracy of the 3D coordinates was compared for three forward-moving speeds and three camera angles (15°, 30°, and 90°). The mean average precision (mAP@0.5) values of YOLOv4, YOLOv5, YOLOv7, and EfficientDet were 0.84, 0.86, 0.905, and 0.775, respectively. The lowest root mean square error (RMSE) was 1.54 cm for the apples detected by EfficientDet at a 15° orientation and a speed of 0.098 m s−1. In terms of counting apples, YOLOv5 and YOLOv7 showed a higher number of detections in outdoor dynamic conditions, achieving a counting accuracy of 86.6%. We concluded that the EfficientDet deep learning algorithm at a 15° orientation in 3D coordinates can be employed for further robotic arm development while harvesting apples in a specially designed orchard.
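
Obtaining a 3D coordinate for each tracked apple reduces to pinhole back-projection of the detection's pixel with its aligned depth. A minimal sketch follows; the intrinsics and pixel values are illustrative assumptions rather than calibrated D455 parameters.

    # Sketch: lift a detected apple's box centre to a 3D coordinate from
    # an aligned depth frame. Intrinsics and pixel values are assumed.
    import numpy as np

    def deproject(u, v, depth_m, fx, fy, cx, cy):
        """Pinhole back-projection of one pixel with known depth (metres)."""
        x = (u - cx) * depth_m / fx
        y = (v - cy) * depth_m / fy
        return np.array([x, y, depth_m])

    # Illustrative colour-camera intrinsics, not calibrated values.
    fx, fy, cx, cy = 631.0, 631.0, 640.0, 360.0
    u, v = 512, 300          # tracked apple's box centre (assumed)
    depth = 1.85             # aligned depth at (u, v), in metres (assumed)
    print(deproject(u, v, depth, fx, fy, cx, cy))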

25 pages, 20118 KiB  
Article
Light Field View Synthesis Using the Focal Stack and All-in-Focus Image
by Rishabh Sharma, Stuart Perry and Eva Cheng
Sensors 2023, 23(4), 2119; https://doi.org/10.3390/s23042119 - 13 Feb 2023
Viewed by 2228
Abstract
Light field reconstruction and synthesis algorithms are essential for improving the low spatial resolution of hand-held plenoptic cameras. Previous light field synthesis algorithms produce blurred regions around depth discontinuities, especially for stereo-based algorithms, where no information is available to fill the occluded areas in the light field image. In this paper, we propose a light field synthesis algorithm that uses the focal stack images and the all-in-focus image to synthesize a 9 × 9 sub-aperture view light field image. Our approach uses depth from defocus to estimate a depth map. Then, we use the depth map and the all-in-focus image to synthesize the sub-aperture views and their corresponding depth maps by mimicking the apparent shifting of the central image according to the depth values. We handle the occluded regions in the synthesized sub-aperture views by filling them with the information recovered from the focal stack images. We also show that, if the depth levels in the image are known, we can synthesize a high-accuracy light field image with just five focal stack images. The accuracy of our approach is compared with three state-of-the-art algorithms: one non-learning and two CNN-based approaches, and the results show that our algorithm outperforms all three in terms of PSNR and SSIM metrics.
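
The core shifting idea can be sketched as a forward warp of the all-in-focus image, displacing each pixel in proportion to its disparity and the target view's offset from the centre. Occluded gaps are simply left unfilled here, whereas the paper recovers them from the focal stack; all inputs below are placeholder assumptions.

    # Sketch: synthesize one sub-aperture view by forward-warping the
    # all-in-focus image with depth-proportional shifts. Inputs assumed.
    import numpy as np

    def synthesize_view(aif, disparity, du, dv):
        """Forward-warp the all-in-focus image by (du, dv) * disparity."""
        h, w, _ = aif.shape
        out = np.zeros_like(aif)
        v, u = np.mgrid[0:h, 0:w]
        u2 = np.clip((u + du * disparity).round().astype(int), 0, w - 1)
        v2 = np.clip((v + dv * disparity).round().astype(int), 0, h - 1)
        out[v2, u2] = aif[v, u]     # pixels nothing lands on stay black
        return out

    aif = np.random.rand(64, 64, 3)    # placeholder all-in-focus image
    disp = np.ones((64, 64))           # placeholder disparity from defocus
    view = synthesize_view(aif, disp, du=2, dv=-1)   # one off-centre view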

10 pages, 4347 KiB  
Article
3D Reconstruction Using 3D Registration-Based ToF-Stereo Fusion
by Sukwoo Jung, Youn-Sung Lee, Yunju Lee and KyungTaek Lee
Sensors 2022, 22(21), 8369; https://doi.org/10.3390/s22218369 - 1 Nov 2022
Cited by 9 | Viewed by 3167
Abstract
Depth sensing is an important issue in many applications, such as Augmented Reality (AR), eXtended Reality (XR), and the Metaverse. For 3D reconstruction, a depth map can be acquired by a stereo camera and a Time-of-Flight (ToF) sensor. We used both sensors complementarily to improve the accuracy of the 3D information in the data. First, we applied a generalized multi-camera calibration method that uses both color and depth information. Next, the depth maps of the two sensors were fused by a 3D registration and reprojection approach. Then, hole-filling was applied to refine the new depth map from the ToF-stereo fused data. Finally, a surface reconstruction technique was used to generate mesh data from the ToF-stereo fused point cloud data. The proposed procedure was implemented and tested with real-world data and compared with various algorithms to validate its efficiency.
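
The registration step can be sketched with Open3D's point-to-plane ICP, aligning the ToF cloud to the stereo cloud before merging. This is a generic stand-in for the authors' implementation; the file names and thresholds are assumptions.

    # Sketch: align the ToF cloud to the stereo cloud with ICP, then
    # merge. A generic stand-in; files and thresholds are assumptions.
    import open3d as o3d

    tof = o3d.io.read_point_cloud("tof.ply")        # hypothetical inputs
    stereo = o3d.io.read_point_cloud("stereo.ply")
    for pcd in (tof, stereo):
        # point-to-plane ICP needs normals on both clouds
        pcd.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30))

    reg = o3d.pipelines.registration.registration_icp(
        tof, stereo, max_correspondence_distance=0.02,
        estimation_method=o3d.pipelines.registration
            .TransformationEstimationPointToPlane())

    tof.transform(reg.transformation)    # apply the estimated rigid pose
    fused = stereo + tof                 # concatenate the aligned clouds
    fused = fused.voxel_down_sample(voxel_size=0.005)   # thin duplicates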
