3D Computer Vision and 3D Reconstruction

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 31 December 2025 | Viewed by 5150

Special Issue Editors


Guest Editor
School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing 100191, China
Interests: 3D display; holographic display

Guest Editor
School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing 100191, China
Interests: 3D display; holographic display; near-eye display

Guest Editor
College of Electronics and Information Engineering, Sichuan University, Chengdu, China
Interests: integral imaging

Special Issue Information

Dear Colleagues,

This Special Issue aims to explore the latest advances and innovations in 3D computer vision and 3D reconstruction techniques. Driven by rapid developments in computer science, electronic devices, optics, and related fields, 3D computer vision and reconstruction have matured considerably in recent years, injecting new vitality into areas such as intelligent manufacturing and AR/VR display. Substantial research and development efforts are currently under way in both academia and industry. Accordingly, this Special Issue seeks research papers, communications, and review articles focused on the following topics:

  1. 3D information acquisition and processing;
  2. 3D reconstruction methods;
  3. Computer vision;
  4. Deep learning for 3D systems;
  5. Emerging optical materials for 3D reconstruction.

Dr. Yilong Li
Dr. Di Wang
Prof. Dr. Huan Deng
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • 3D information acquisition and processing
  • 3D reconstruction methods
  • computer vision
  • deep learning for 3D systems
  • emerging optical materials for 3D reconstruction

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (6 papers)


Research

19 pages, 3066 KiB  
Article
WGA-SWIN: Efficient Multi-View 3D Object Reconstruction Using Window Grouping Attention in Swin Transformer
by Sheikh Sohan Mamun, Shengbing Ren, MD Youshuf Khan Rakib and Galana Fekadu Asafa
Electronics 2025, 14(8), 1619; https://doi.org/10.3390/electronics14081619 - 17 Apr 2025
Viewed by 232
Abstract
Multi-view 3D reconstruction aims to recover 3D structure from visual information captured across multiple viewpoints. Transformer networks have shown remarkable success in various computer vision tasks, including multi-view 3D reconstruction. However, reconstructing accurate 3D shapes remains challenging because features must be efficiently extracted and merged across views; existing frameworks struggle to capture the subtle relationships between views, resulting in poor reconstructions. To address this issue, we present WGA-SWIN, a new framework for 3D reconstruction from multi-view images. Our method introduces a Window Grouping Attention (WGA) mechanism that groups tokens from different views within each window attention operation, enabling efficient inter-view and intra-view feature extraction. Diversity among the groups enriches feature learning, yielding more comprehensive and robust representations. Within the encoder's Swin Transformer blocks, we integrate WGA to exploit both the hierarchical design and the shifted-window attention mechanism for efficient multi-view feature extraction. In addition, we develop a progressive hierarchical decoder that combines Swin Transformer blocks with 3D convolutions over a voxel representation, producing high-resolution 3D reconstructions with fine structural details. Experimental results on the ShapeNet and Pix3D benchmark datasets demonstrate that our work achieves state-of-the-art (SOTA) performance, outperforming existing methods in both single-view and multi-view 3D reconstruction, with leads of 0.95% in IoU and 1.07% in F-score, which demonstrates the robustness of our method. Full article
(This article belongs to the Special Issue 3D Computer Vision and 3D Reconstruction)
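As a reading aid, the following minimal PyTorch sketch illustrates the general idea of grouping tokens from several views inside a single window-attention operation, as the abstract describes at a high level. The tensor layout, window size, and the GroupedWindowAttention class are illustrative assumptions, not the authors' WGA implementation.

```python
import torch
import torch.nn as nn

class GroupedWindowAttention(nn.Module):
    """Toy window attention over tokens pooled from several views."""
    def __init__(self, dim, num_heads=4, window=7):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (batch, views, H, W, dim) feature maps from a shared backbone
        b, v, h, w, d = x.shape
        ws = self.window
        # partition each view's H x W grid into non-overlapping ws x ws windows
        x = x.view(b, v, h // ws, ws, w // ws, ws, d)
        x = x.permute(0, 2, 4, 1, 3, 5, 6)          # (b, nH, nW, v, ws, ws, d)
        x = x.reshape(-1, v * ws * ws, d)           # group all views' tokens per window
        out, _ = self.attn(x, x, x)                 # joint intra-/inter-view attention
        out = out.view(b, h // ws, w // ws, v, ws, ws, d)
        out = out.permute(0, 3, 1, 4, 2, 5, 6).reshape(b, v, h, w, d)
        return out

# toy usage: batch of 2, 3 views, 14 x 14 feature maps, 64-dim tokens
feats = torch.randn(2, 3, 14, 14, 64)
print(GroupedWindowAttention(64)(feats).shape)  # torch.Size([2, 3, 14, 14, 64])
```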

19 pages, 1037 KiB  
Article
DepthCloud2Point: Depth Maps and Initial Point for 3D Point Cloud Reconstruction from a Single Image
by Galana Fekadu Asafa, Shengbing Ren, Sheikh Sohan Mamun and Kaleb Amsalu Gobena
Electronics 2025, 14(6), 1119; https://doi.org/10.3390/electronics14061119 - 12 Mar 2025
Viewed by 1172
Abstract
Reconstructing 3D objects from single-view images has attracted significant interest due to its wide-ranging applications in robotics, autonomous vehicles, virtual reality, and augmented reality. Current methods, including voxel-based and point cloud-based approaches, face critical challenges such as irregular point distributions and an inability to preserve complex object details, which result in suboptimal reconstructions. To address these limitations, we propose DepthCloud2Point, a framework that combines depth maps, image features, and an initial point cloud to generate detailed and accurate 3D point clouds. Depth maps provide rich spatial cues that resolve depth ambiguities, while the initial point cloud serves as a geometric prior that ensures a uniform point distribution. These components are integrated into a unified pipeline in which the encoder extracts semantic and geometric features and the generator synthesizes high-fidelity 3D reconstructions. Our approach is trained end-to-end on both synthetic and real-world datasets, achieving state-of-the-art performance. Quantitative results on the ShapeNet dataset show that DepthCloud2Point outperforms 3D-LMNet by 19.07% in CD and 38.86% in EMD, and Pixel2Point by 18.77% in CD and 19.25% in EMD. A qualitative study further shows that our approach generates reconstructions that closely align with the ground truth, capturing intricate object details and maintaining spatial coherence, confirming its advantage over 3D-LMNet and Pixel2Point. Full article
(This article belongs to the Special Issue 3D Computer Vision and 3D Reconstruction)
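For readers unfamiliar with the reported metric, the short NumPy sketch below computes the standard symmetric Chamfer Distance (CD) between two point sets; it reflects the textbook definition rather than the paper's evaluation code, and the point counts are arbitrary.

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer Distance between point sets p (N, 3) and q (M, 3)."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1) ** 2  # (N, M) squared distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()              # nearest-neighbour terms both ways

pred = np.random.rand(1024, 3)   # predicted point cloud (placeholder)
gt = np.random.rand(1024, 3)     # ground-truth point cloud (placeholder)
print(chamfer_distance(pred, gt))
```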

19 pages, 26378 KiB  
Article
2D to 3D Human Skeleton Estimation Based on the Brown Camera Distortion Model and Constrained Optimization
by Lan Ma and Hua Huo
Electronics 2025, 14(5), 960; https://doi.org/10.3390/electronics14050960 - 27 Feb 2025
Viewed by 531
Abstract
In the rapidly evolving fields of computer vision and machine learning, 3D skeleton estimation is critical for applications such as motion analysis and human–computer interaction. While stereo cameras are commonly used to acquire 3D skeletal data, monocular RGB systems attract attention due to benefits including cost-effectiveness and simple deployment. However, accurately inferring depth from 2D images and reconstructing 3D structures with monocular approaches remain persistent challenges. Current 2D-to-3D skeleton estimation methods rely heavily on training with large datasets while neglecting the intrinsic structure of the human body and the principles of camera imaging. To address this, this paper introduces a 2D-to-3D gait skeleton estimation method that leverages the Brown camera distortion model and constrained optimization. Gait video was captured with the Azure Kinect depth camera, and the Azure Kinect Body Tracking SDK was employed to extract the 2D and 3D joint positions. The camera's distortion properties were analyzed using the Brown camera distortion model, which suits this scenario, and iterative methods were applied to compensate for the distortion of the 2D skeleton joints. By integrating the geometric constraints of the human skeleton, an optimization algorithm was formulated to achieve precise 3D joint estimates. Finally, the framework was validated by comparing the estimated 3D joint coordinates with corresponding measurements captured by depth sensors. Experimental evaluations confirmed that this training-free approach achieves superior precision and stability compared with conventional methods. Full article
(This article belongs to the Special Issue 3D Computer Vision and 3D Reconstruction)
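The Brown(-Conrady) distortion model referenced in the abstract is well established; the sketch below applies its radial and tangential terms to normalized image coordinates and inverts them with a simple fixed-point iteration. The coefficient values and the iteration scheme are illustrative assumptions, not the paper's exact compensation procedure.

```python
import numpy as np

def brown_distort(xn, yn, k1, k2, k3, p1, p2):
    """Map undistorted normalized coords (xn, yn) to distorted coords."""
    r2 = xn**2 + yn**2
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    xd = xn * radial + 2 * p1 * xn * yn + p2 * (r2 + 2 * xn**2)
    yd = yn * radial + p1 * (r2 + 2 * yn**2) + 2 * p2 * xn * yn
    return xd, yd

def brown_undistort(xd, yd, k1, k2, k3, p1, p2, iters=10):
    """Undistortion has no closed form; use a simple fixed-point iteration."""
    xn, yn = xd, yd
    for _ in range(iters):
        x_hat, y_hat = brown_distort(xn, yn, k1, k2, k3, p1, p2)
        xn, yn = xn + (xd - x_hat), yn + (yd - y_hat)
    return xn, yn

# placeholder distortion coefficients and a single normalized 2D joint
print(brown_undistort(0.31, -0.22, k1=0.1, k2=-0.02, k3=0.0, p1=1e-3, p2=-5e-4))
```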

21 pages, 16103 KiB  
Article
PPF-Net: Efficient Multimodal 3D Object Detection with Pillar-Point Fusion
by Lingxiao Zhang and Changyong Li
Electronics 2025, 14(4), 685; https://doi.org/10.3390/electronics14040685 - 10 Feb 2025
Viewed by 581
Abstract
Detecting objects in 3D space using LiDAR is crucial for robotics and autonomous vehicles, but the sparsity of LiDAR-generated point clouds limits performance. Camera images, rich in semantic information, can effectively compensate for this limitation. We propose a simpler yet effective multimodal fusion framework that enhances 3D object detection without complex network designs. We introduce a cross-modal GT-Paste data augmentation method to address challenges such as 2D object acquisition and occlusions from added objects. To better integrate image features with sparse point clouds, we propose Pillar-Point Fusion (PPF), which projects non-empty pillars onto image feature maps and uses an attention mechanism to map semantic features from pillars to their constituent points, fusing them with the points' geometric features. Additionally, we design the BD-IoU loss function, which measures 3D bounding-box similarity, and a joint regression loss combining BD-IoU and Smooth L1 that effectively guides model training. Our framework achieves consistent improvements across the KITTI benchmarks. On the validation set, PPF (PV-RCNN) achieves at least a 1.84% AP improvement in Cyclist detection across all difficulty levels compared with other multimodal SOTA methods. On the test set, PPF-Net excels in pedestrian detection at the moderate and hard difficulty levels and achieves the best results in low-beam LiDAR scenarios. Full article
(This article belongs to the Special Issue 3D Computer Vision and 3D Reconstruction)
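As a rough illustration of the projection step that any pillar-to-image fusion requires, the sketch below projects pillar centers into the image plane and bilinearly samples an image feature map. The function name, shapes, and placeholder projection matrix are assumptions; the paper's PPF module additionally applies an attention mechanism that is not reproduced here.

```python
import torch
import torch.nn.functional as F

def sample_image_features(pillar_xyz, feat_map, P, img_hw):
    """pillar_xyz: (N, 3) LiDAR-frame centers; feat_map: (C, Hf, Wf); P: (3, 4) projection."""
    h, w = img_hw
    homo = torch.cat([pillar_xyz, torch.ones(len(pillar_xyz), 1)], dim=1)  # homogeneous coords
    uvz = homo @ P.T                                                       # project to image plane
    uv = uvz[:, :2] / uvz[:, 2:3].clamp(min=1e-6)                          # pixel coordinates
    # normalize pixel coordinates to [-1, 1] as required by grid_sample
    grid = torch.stack([uv[:, 0] / (w - 1) * 2 - 1,
                        uv[:, 1] / (h - 1) * 2 - 1], dim=-1).view(1, 1, -1, 2)
    sampled = F.grid_sample(feat_map[None], grid, align_corners=True)      # (1, C, 1, N)
    return sampled[0, :, 0].T                                              # (N, C) per-pillar image features

feats = sample_image_features(torch.rand(100, 3) * 20,   # 100 pillar centers
                              torch.randn(64, 48, 156),  # 64-channel image feature map
                              torch.randn(3, 4),         # placeholder projection matrix
                              (375, 1242))               # original image size (H, W)
print(feats.shape)  # torch.Size([100, 64])
```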

13 pages, 6806 KiB  
Article
Dual-Branch Dynamic Object Segmentation Network Based on Spatio-Temporal Information Fusion
by Fei Huang, Zhiwen Wang, Yu Zheng, Qi Wang, Bingsen Hao and Yangkai Xiang
Electronics 2024, 13(20), 3975; https://doi.org/10.3390/electronics13203975 - 10 Oct 2024
Viewed by 787
Abstract
To address the low accuracy of dynamic object segmentation with semantic segmentation networks, a dual-branch dynamic object segmentation network based on the fusion of spatiotemporal information is proposed. First, an appearance–motion feature fusion module is designed that characterizes the motion information of objects by introducing a residual graph; it combines a co-attention mechanism with a motion correction method to enhance the extraction of appearance features for dynamic objects. Furthermore, to mitigate the boundary blurring and misclassification that occur when 2D semantic information is projected back into 3D point clouds, a majority-voting strategy based on time-series point cloud information is proposed, overcoming the limitations of post-processing on single-frame point clouds and significantly enhancing the accuracy of moving-object segmentation in practical scenarios. Test results on the public SemanticKITTI dataset demonstrate that our improved method outperforms mainstream dynamic object segmentation networks such as LMNet and MotionSeg3D, achieving an Intersection over Union (IoU) of 72.19%, an improvement of 9.68% and 4.86% over LMNet and MotionSeg3D, respectively. The proposed method has practical applications in autonomous driving perception. Full article
(This article belongs to the Special Issue 3D Computer Vision and 3D Reconstruction)
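The sketch below shows a plain majority vote over a short per-point label history, in the spirit of the time-series voting strategy the abstract describes; the frame count, label set, and point-correspondence assumptions are illustrative and do not reproduce the paper's exact scheme.

```python
import numpy as np

def majority_vote(label_history):
    """label_history: (T, N) per-frame labels for N corresponding points over T frames."""
    classes = np.unique(label_history)
    votes = np.stack([(label_history == c).sum(axis=0) for c in classes])  # (num_classes, N)
    return classes[votes.argmax(axis=0)]                                    # most frequent label per point

history = np.array([[1, 0, 1, 1],
                    [1, 1, 1, 0],
                    [0, 1, 1, 1]])   # 3 frames, 4 points, labels in {0: static, 1: moving}
print(majority_vote(history))        # [1 1 1 1]
```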

14 pages, 3852 KiB  
Article
Implementation of an FPGA-Based 3D Shape Measurement System Using High-Level Synthesis
by Tae-Hyeon Kim, Hyunki Lee and Seung-Ho Ok
Electronics 2024, 13(16), 3282; https://doi.org/10.3390/electronics13163282 - 19 Aug 2024
Viewed by 1227
Abstract
Three-dimensional (3D) shape measurement using point clouds has recently gained significant attention. Phase measuring profilometry (PMP) is widely preferred for its robustness against changes in external lighting and its high-precision results. However, PMP suffers from long computation times due to complex calculations and high memory usage. It also faces a 2π ambiguity issue, as the measured phase is limited to the 2π range; this is typically resolved with dual-wavelength methods, which in turn require separate measurements of the phase changes at two wavelengths, increasing the data volume and computation time. Our study addresses these challenges by implementing a 3D shape measurement system on a System-on-Chip (SoC)-type Field-Programmable Gate Array (FPGA). We developed a PMP algorithm with dual-wavelength methods and accelerated it through high-level synthesis (HLS) on the FPGA. This hardware implementation significantly reduces computation time while maintaining measurement accuracy. The experimental results demonstrate that our system operates correctly on the SoC-type FPGA, achieving computation speeds approximately 11.55 times higher than those of a conventional software implementation. Our approach offers a practical solution for real-time 3D shape measurement, potentially benefiting applications in fields such as quality control, robotics, and computer vision. Full article
(This article belongs to the Special Issue 3D Computer Vision and 3D Reconstruction)
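For context, the NumPy sketch below walks through the textbook building blocks named in the abstract: four-step phase-shifting phase retrieval and dual-wavelength (synthetic-wavelength) combination. The fringe pitches and shift convention are assumptions for illustration; the paper's contribution is the FPGA/HLS implementation, which is not reproduced here.

```python
import numpy as np

def wrapped_phase(I0, I1, I2, I3):
    """Four-step phase shifting: intensities at shifts of 0, 90, 180, 270 degrees."""
    return np.arctan2(I3 - I1, I0 - I2)          # phase wrapped to (-pi, pi]

def dual_wavelength_unwrap(phi1, phi2, lam1, lam2):
    """Combine wrapped phases at two fringe wavelengths into a beat phase."""
    lam_eq = lam1 * lam2 / abs(lam1 - lam2)      # longer synthetic wavelength
    phi_eq = np.mod(phi1 - phi2, 2 * np.pi)      # beat phase, unambiguous over lam_eq
    return phi_eq, lam_eq

# toy example: synthetic fringes along one scanline, pitches of 8 and 9 pixels
x = np.linspace(0, 50, 500)
phi1, phi2 = 2 * np.pi * x / 8.0, 2 * np.pi * x / 9.0
shots = [np.cos(phi1 + k * np.pi / 2) for k in range(4)]
print(dual_wavelength_unwrap(wrapped_phase(*shots),
                             np.angle(np.exp(1j * phi2)), 8.0, 9.0)[1])  # 72.0 (synthetic wavelength)
```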