Article

Indoor Multidimensional Reconstruction Based on Maximal Cliques

Institute of Machine Intelligence, University of Shanghai for Science and Technology, Shanghai 200093, China
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(9), 1400; https://doi.org/10.3390/math13091400
Submission received: 10 March 2025 / Revised: 11 April 2025 / Accepted: 23 April 2025 / Published: 25 April 2025

Abstract

Three-dimensional reconstruction is an essential skill for robots to achieve complex operation tasks, including moving and grasping. Applying deep learning models to obtain stereoscopic scene information, accompanied by algorithms such as target detection and semantic segmentation to obtain finer-grained object labels, is the dominant paradigm for robots. However, large-scale point cloud registration and pixel-level labeling are usually time-consuming. Here, a novel two-branch network architecture based on PointNet features is designed. Its feature-sharing mechanism enables point cloud registration and semantic extraction to be carried out simultaneously, which facilitates fast reconstruction of indoor environments. Moreover, it maps point cloud features into graph space instead of Euclidean space to obtain better correspondence matching. Through extensive experimentation, our method demonstrates a significant reduction in processing time, taking approximately one-tenth of the time required by the original method without a decline in accuracy. This efficiency enhancement enables the successful execution of downstream tasks such as positioning and navigation.

1. Introduction

The ability of robots to perform precise operations in complex environments is essential for their deep participation in human social scenarios, and it requires strong environmental perception capabilities [1]. Currently, robots typically run multiple independent deep learning models in parallel to obtain environmental depth and semantic information, which usually leads to redundant feature extraction [2]. For example, legged robots achieve walking and obstacle avoidance by first performing dense 3D reconstruction using LiDAR, followed by point cloud segmentation [3,4]. Robotic arms grasp objects in a known environment that has first undergone 3D reconstruction followed by semantic segmentation [5,6]. Inspired by the semantic sharing mechanisms used with multi-camera and multi-robotic-arm acquisition setups in 3D reconstruction [7,8], we argue that if these models can share features, the time spent on feature extraction can be reduced and the efficiency of robot environmental perception improved. Moreover, we have observed that in the point cloud registration stage of 3D reconstruction, searching for optimal corresponding point pairs in Euclidean space is significantly less efficient than searching in graph space [9]. Therefore, we aim to improve the robot's 3D reconstruction and semantic segmentation abilities through feature sharing and point cloud registration optimization.
Point cloud reconstruction plays a crucial role in enabling robots to understand and interact with their environment by providing a detailed, pre-established map of the working scene [10,11]. The reconstruction process [12] typically involves depth map post-processing, followed by aligning the point clouds and finally integrating the data into a unified model. Among these steps, point cloud registration is the most critical, as it ensures that individual point clouds are accurately aligned, forming a coherent and complete 3D representation of the environment. The current registration methods [13] focus on matching the extracted features and then regressing the transformation relationship. Previous studies, such as ICP [14] and RPMNet [15], have concentrated on extracting robust features, but they have overlooked the process of effectively searching for matches in the existing feature space. Zhang [9] proposed that searching for matching point features in graph space exhibits higher affinity than in Euclidean space. Inspired by this, we introduced an iterative registration module based on the maximal clique method in the reconstruction module.
After obtaining a 3D reconstructed map, semantic segmentation is necessary to further adapt the environment for robotic task execution. Traditional methods for semantic segmentation often rely on techniques like edge detection and region growing, as seen in the work of Carreira and Sminchisescu [16], which explores automatic object segmentation using constrained parametric min-cuts. In contrast, deep learning methods leverage shared MLP networks for improved accuracy and efficiency, as demonstrated by the PointNet++ architecture [17], which effectively captures local features in point cloud data for semantic segmentation tasks. It is evident that during the semantic understanding phase, the robot performs additional feature extraction in the point cloud, making it a natural idea to directly transfer features from the reconstruction phase to the segmentation phase [2].
Based on the above analysis, we propose a multidimensional reconstruction scheme that can quickly reconstruct indoor scenes with both 3D and semantic information using a dual-branch deep neural network. To this end, we design an end-to-end registration network that combines PointNet features with a maximal-clique correspondence search. Furthermore, we introduce a secondary branch that regresses semantic labels from the shared PointNet features using a shared MLP and a softmax classifier. To summarize, our contributions are as follows.
(1)
An efficient dual-branch network architecture is designed as the main framework for 3D reconstruction and semantic segmentation.
(2)
A novel point cloud registration method is developed based on maximal cliques to improve the efficiency of looking for the optimal corresponding point pairs in point clouds.
(3)
Evaluations on several datasets show that our approach can significantly reduce processing time without losing accuracy in indoor 3D reconstruction.

2. Related Work

2.1. Traditional Reconstruction Method

Three-dimensional reconstruction is a technique that generates three-dimensional models by extracting 3D information from 2D images or depth sensor data. This technology finds wide applications in various fields [18], including virtual reality, augmented reality, medical imaging, and robotic navigation. Traditional 3D reconstruction methods primarily rely on geometric operators and optimization algorithms. These methods [19,20] typically depend on feature extraction, matching, and geometric transformation steps to reconstruct point clouds in 3D space. However, the feature representation capabilities of traditional geometric operators are limited; although they offer some degree of generalization, this weakness significantly restricts their capacity to reconstruct complex scenes. Moreover, subsequent feature-based correspondence searches, employing algorithms such as RANSAC [21], are computationally intensive and inefficient. Despite these limitations, traditional operator-based methods have laid a solid foundation for 3D reconstruction. While they face challenges in handling complex scenes and dynamic environments, their performance remains robust in many static and regular scenarios [22].

2.2. Learned Reconstruction Method

Deep learning-based methods for 3D reconstruction [23,24] typically leverage global features learned by deep networks to accomplish reconstruction tasks. While deep learning approaches [25,26] generally outperform traditional methods in specific applications, they suffer from poorer interpretability compared to geometric operator-based techniques. However, given their superior performance in many scenarios, we chose to combine the strengths of both approaches, sacrificing some interpretability to enhance overall performance. Despite these advancements, common feature matching techniques still face efficiency challenges. Zhang [9] proposed that employing maximal clique-based search methods can significantly improve the feature matching process; this can be effectively integrated into our reconstruction algorithms. Building on this, our model’s powerful learning capabilities are utilized by training on large-scale datasets, enabling it to learn shape priors across various object categories. This allows for reliable reconstruction of multiple types of objects.

2.3. Segmentation in Reconstruction

Point cloud segmentation is the process of partitioning a point cloud into meaningful clusters or segments, which is essential in 3D reconstruction tasks [27,28]. Segmenting the point cloud allows for the identification and differentiation of various objects and surfaces within the scene, facilitating more accurate and efficient reconstruction.
Existing methods [29,30,31] generally process point clouds directly using point convolutions or by inputting the point cloud data directly into neural networks. The PointNet series [17,32,33] utilizes shared multi-layer perceptrons (shared MLPs) to directly input point clouds into the network for regression tasks. Point convolution-based methods convert point clouds into voxels for processing using 3D convolutional networks. Transformer-based approaches [34,35], on the other hand, require the design of specialized attention modules to focus the model’s attention effectively. However, none of these studies considered embedding their methods into 3D reconstruction tasks. Recognizing the high potential for feature reuse between point cloud registration and segmentation in reconstruction, we have developed a feature-sharing mechanism that allows both segmentation and reconstruction tasks to be performed simultaneously. This significantly improves the utilization of computational resources.
Inspired by the architecture of PointNet, we propose an innovative approach that incorporates a feature-sharing mechanism. This mechanism enables simultaneous reconstruction and semantic segmentation by sharing learned features across both tasks. This integration not only improves the efficiency of the reconstruction process but also enhances the accuracy of segmentation, thereby providing a more comprehensive understanding of the 3D scene.

3. Method

In this paper, we propose a novel fast indoor 3D reconstruction framework based on deep neural networks and maximal clique estimation, which can simultaneously perform scene reconstruction and semantic segmentation, as illustrated in Figure 1. Specifically, the framework consists of three modules: the feature extraction module, the point cloud registration module, and the semantic segmentation module. The feature extraction module extracts the features (e.g., $F_i$ and $F_j$) of adjacent point cloud segments; the point cloud registration module applies the maximal clique estimation method [9] to obtain the transformation matrix $\{R, t\}$; and the semantic segmentation module derives the semantic labels (e.g., $L_1$ and $L_2$). Finally, the reconstructed point cloud scene is generated by combining the transformation matrix and the semantic labels.
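For orientation, the sketch below outlines how the three modules interact for a sequence of point cloud segments. The helper names (pointnet_features, segmentation_head, register_max_clique) are hypothetical placeholders for the components detailed in Sections 3.1 and 3.2, not the exact implementation.

```python
import numpy as np

def reconstruct_scene(fragments):
    """fragments: list of (N_i, 3) NumPy arrays of adjacent point cloud segments."""
    scene, labels = [], []
    T = np.eye(4)                                   # cumulative pose of the current fragment
    prev_feat = None
    for k, cloud in enumerate(fragments):
        feat = pointnet_features(cloud)             # shared PointNet backbone (Section 3.1)
        labels.append(segmentation_head(feat))      # semantic branch (Section 3.2)
        if k > 0:
            # registration branch: maximal-clique correspondence search between
            # the previous fragment and the current one
            R, t = register_max_clique(fragments[k - 1], prev_feat, cloud, feat)
            Tk = np.eye(4)
            Tk[:3, :3], Tk[:3, 3] = R, t
            T = T @ Tk                              # chain pairwise transforms into a global pose
        aligned = cloud @ T[:3, :3].T + T[:3, 3]    # fragment expressed in the global frame
        scene.append(aligned)
        prev_feat = feat
    return np.vstack(scene), labels
```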

3.1. Point Cloud Registration

Given a source point cloud $P = \{ p_i \in \mathbb{R}^3 \mid i = 1, \dots, I \}$ and a reference point cloud $Q = \{ q_j \in \mathbb{R}^3 \mid j = 1, \dots, J \}$, where $I$ and $J$ denote the numbers of points, our objective is to recover the unknown rigid transformation $\{R, t\}$ between these two point clouds, where $R \in SO(3)$ is a rotation matrix and $t \in \mathbb{R}^3$ is a translation vector used to align them. Point cloud registration can be formulated by the following objective function:
\mathrm{error}(R, t) = \sum_{i=1}^{I} \sum_{j=1}^{J} \left\| p_i - (R q_j + t) \right\|^2 \tag{1}
The principle of point cloud registration, described in Equation (1), is to minimize the Euclidean distance between corresponding point pairs. Although this is intuitive and concise, it does not take into account the global and local features of the points, which makes it difficult to achieve good registration results [36,37]. For this reason, we fuse the global feature $F_p$ (extracted by PointNet) with the local feature $F_f$ (i.e., the geometric FPFH feature [38]) to obtain a fused feature $F_{fp}$ that better represents both the global and the local geometric levels. This approach not only mitigates the potential for local convergence in optimization caused by purely geometric features but also enhances the feature representation by incorporating the global PointNet features [17]. Therefore, the optimization goal can be further converted into the following:
F_{fp}(p_i) = \mathrm{concat}\left( F_f(p_i),\ F_p(p_i) \right) \tag{2}
\mathrm{error}(R, t) = \sum_{i=1}^{I} \sum_{j=1}^{J} \left\| F_{fp}(p_i) - \left( R\, F_{fp}(q_j) + t \right) \right\|^2 \tag{3}
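As a concrete illustration of Equation (2), the snippet below concatenates Open3D FPFH descriptors with per-point features from a learned backbone. The FPFH computation uses Open3D's public API, while `pointnet_global_feature` is an assumed placeholder for the PointNet branch; the radii and neighbor counts are illustrative values.

```python
import numpy as np
import open3d as o3d

def fused_features(points, voxel_size=0.05):
    """points: (N, 3) NumPy array; returns the fused per-point descriptor F_fp."""
    pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points))
    pcd.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=2 * voxel_size, max_nn=30))
    fpfh = o3d.pipelines.registration.compute_fpfh_feature(
        pcd, o3d.geometry.KDTreeSearchParamHybrid(radius=5 * voxel_size, max_nn=100))
    f_local = fpfh.data.T                           # (N, 33) geometric features F_f
    f_global = pointnet_global_feature(points)      # (N, D) learned features F_p (assumed hook)
    return np.concatenate([f_local, f_global], axis=1)   # F_fp = concat(F_f, F_p), Eq. (2)
```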
According to the maximal clique theory [39], inliers are usually geometrically compatible with each other to form cliques in the graph, and the graph space can better describe the similarity between points than the Euclidean space. Inspired by this theory, we employed maximal clique estimation in the point cloud registration module to ascertain the corresponding point pairs. Specifically, the initial correspondences $C_{\mathrm{initial}} = \{c\}$, where $c = \{(p_i, q_j) \mid p_i \in P,\ q_j \in Q\}$, are represented by covariance matrices computed from the features of each point, facilitating subsequent graph construction. Subsequently, we construct a first-order undirected graph [40] $G$ over $C_{\mathrm{initial}}$, and the connection strength $S_{cmp}(c_m, c_n)$ between point pairs $c_m$ and $c_n$ is estimated as in Equation (4), with the parameter $d_{cmp} = 10\,pr$, where $pr$ is the point cloud resolution. We discuss the determination of the $d_{cmp}$ hyperparameter in detail in the experimental section. Notably, $c_m$ represents a pair of matching points $p_i$ and $q_j$ coming from the source point cloud and the template point cloud, respectively.
S_{cmp}(c_m, c_n) = \exp\left( -\frac{S_{dist}(c_m, c_n)^2}{2\, d_{cmp}^2} \right) \tag{4}
S_{dist}(c_m, c_n) = \left|\, \| p_m - p_n \| - \| q_m - q_n \| \,\right| \tag{5}
After obtaining the initial set of corresponding point pairs, a maximal-clique search is used to filter out more accurate point pairs and thereby calculate the optimal transformation matrix, as shown in Figure 2. Specifically, a graph $G = (V, E)$ is constructed from $C_{\mathrm{initial}}$, where $E$ is the set of edges between point pairs $c_m$ and $c_n$ whose connection strength satisfies $S_{cmp}(c_m, c_n) > d_{cmp}$. We then apply the Bron–Kerbosch algorithm [41] to search for the maximal cliques in the graph $G$.
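A minimal sketch of this graph construction and clique enumeration is given below, assuming NumPy arrays for the point clouds and using networkx's find_cliques, a Bron–Kerbosch-style enumerator. The edge threshold tau and the minimum clique size are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np
import networkx as nx

def maximal_cliques(P, Q, corr, pr, tau=0.9, min_size=3):
    """corr: list of (i, j) index pairs into P and Q forming C_initial."""
    d_cmp = 10.0 * pr                              # inlier threshold: 10 x point cloud resolution
    n = len(corr)
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for m in range(n):
        for k in range(m + 1, n):
            pm, qm = P[corr[m][0]], Q[corr[m][1]]
            pn, qn = P[corr[k][0]], Q[corr[k][1]]
            s_dist = abs(np.linalg.norm(pm - pn) - np.linalg.norm(qm - qn))   # Eq. (5)
            s_cmp = np.exp(-s_dist ** 2 / (2.0 * d_cmp ** 2))                 # Eq. (4)
            if s_cmp > tau:                        # keep strongly compatible pairs as graph edges
                G.add_edge(m, k)
    # find_cliques enumerates maximal cliques (Bron-Kerbosch with pivoting)
    return [c for c in nx.find_cliques(G) if len(c) >= min_size]
```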
Each detected maximal clique contains several corresponding point pairs. Using these point pairs, the RT matrix is computed through the application of the SVD (singular value decomposition [42]) algorithm. Suppose m maximal cliques are detected; this would allow for the computation of m RT matrices. For each matrix, the RMSE (root mean square error) of the coordinates of the point clouds before and after transformation is calculated. We select the transformation matrix that yields the smallest RMSE as the optimal matrix. The above process can be described as follows:
\{ R^*, t^* \} = \arg\min_{R,\, t} \left\| P - (R Q + t) \right\|^2 \tag{6}
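The per-clique transform estimation and RMSE-based selection described above can be sketched as follows. The SVD (Kabsch) solver is the standard closed-form solution; the indexing convention (corr stores (i, j) index pairs into P and Q) is an assumption carried over from the previous sketch.

```python
import numpy as np

def best_transform(P, Q, corr, cliques):
    """Return the (R, t) mapping Q into P's frame with the smallest RMSE, Eq. (6)."""
    def kabsch(src, dst):
        src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
        U, _, Vt = np.linalg.svd(dst_c.T @ src_c)          # cross-covariance SVD
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # guard against reflections
        R = U @ D @ Vt
        t = dst.mean(0) - R @ src.mean(0)
        return R, t

    best, best_rmse = None, np.inf
    for clique in cliques:
        src = np.array([Q[corr[c][1]] for c in clique])    # points taken from Q
        dst = np.array([P[corr[c][0]] for c in clique])    # corresponding points in P
        R, t = kabsch(src, dst)
        residual = dst - (src @ R.T + t)
        rmse = np.sqrt(np.mean(np.sum(residual ** 2, axis=1)))
        if rmse < best_rmse:                               # keep the transform with smallest RMSE
            best, best_rmse = (R, t), rmse
    return best
```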
Notably, the segment pairs considered here maintain an overlap rate of over 30% during selection. By maintaining a sufficient overlap, the accuracy and robustness of the reconstruction process are significantly enhanced. This ensures that the point cloud segments are effectively utilized for reconstructing the scene.

3.2. Semantic Segmentation and Reconstruction Fusion

The semantic information in a reconstructed scene is crucial for understanding the environment. Our approach avoids redundant feature extraction by regressing the semantic labels directly: as shown in Figure 1d, we leverage the PointNet features already extracted during the registration process, so only a simple MLP classifier is required to perform semantic recognition.
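A minimal sketch of such a classifier head is shown below in PyTorch. The layer widths, the feature dimension (1088, as in a PointNet-style concatenation of per-point and global features), and the number of classes are illustrative assumptions rather than the exact configuration used in the paper.

```python
import torch
import torch.nn as nn

class SemanticHead(nn.Module):
    """Shared-MLP classifier applied to per-point features reused from registration."""
    def __init__(self, feat_dim=1088, num_classes=13):
        super().__init__()
        # 1x1 convolutions over the point dimension act as a shared MLP
        self.mlp = nn.Sequential(
            nn.Conv1d(feat_dim, 512, 1), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Conv1d(512, 256, 1), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Conv1d(256, num_classes, 1),
        )

    def forward(self, point_feats):
        # point_feats: (B, feat_dim, N) shared PointNet features
        return torch.softmax(self.mlp(point_feats), dim=1)   # per-point class probabilities
```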
Many putative loop closures detected through pairwise registration are false positives. Although our registration algorithm significantly reduces such occurrences, we still incorporate a unified least-squares objective that jointly estimates the global configuration of the scene and the validity of each constraint. Consider a pose graph whose vertices are the fragment poses and whose edges comprise odometry constraints $\{R_i\}$ and loop-closure constraints $\{T_{ij}\}$. Our goal is to compute a set of poses $\tilde{T} = \{T_i\}$ that localizes the fragments in the global coordinate frame, which can be expressed as an objective of the form given in Equation (7). To solve this objective function, Open3D provides two optimizers: Levenberg–Marquardt [43] and Gauss–Newton. We opted for the Levenberg–Marquardt algorithm, as recommended by the official Open3D documentation; its damping factor adjusts the search direction and step size, maintaining good convergence speed and numerical stability across different optimization stages.
E(\tilde{T}, L) = \sum_i f(T_i, T_{i+1}, R_i) + \sum_{i,j} l_{ij}\, f(T_i, T_j, T_{ij}) + \mu \sum_{i,j} \Psi(l_{ij}) \tag{7}
Subsequently, we refine the selected loop closures using ICP and utilize pose graph optimization to obtain the final segment poses in the global frame.
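For reference, a minimal sketch of this refinement step using Open3D's pose graph optimizer is given below. The API calls are Open3D's public interface; the numeric options (correspondence distance, pruning threshold) are illustrative assumptions.

```python
import open3d as o3d

def optimize_pose_graph(pose_graph, voxel_size=0.05):
    """Refine fragment poses with Levenberg-Marquardt global optimization."""
    option = o3d.pipelines.registration.GlobalOptimizationOption(
        max_correspondence_distance=voxel_size * 1.5,
        edge_prune_threshold=0.25,          # down-weights invalid loop closures (l_ij in Eq. (7))
        reference_node=0)
    o3d.pipelines.registration.global_optimization(
        pose_graph,
        o3d.pipelines.registration.GlobalOptimizationLevenbergMarquardt(),
        o3d.pipelines.registration.GlobalOptimizationConvergenceCriteria(),
        option)
    return pose_graph
```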

4. Experiments

4.1. Datasets

(1)
ModelNet40
We evaluated our registration algorithm on the ModelNet40 dataset [44]. This synthetic CAD dataset contains 40 categories of 3D models (such as airplanes, cars, and plants) comprising 12,311 CAD instances, of which 9843 are used for training and the remaining 2468 for testing. In the data preprocessing stage, we uniformly sample 2048 points from each model instance; uniform sampling was chosen because it effectively preserves the original data distribution and ensures a balanced representation of the data across different regions. The generated labels, namely the RT matrices, are limited in this article to rotations in $[-45^\circ, 45^\circ]$ and translations in $[-1, 1]$ (a minimal sampling and label-generation sketch is given after this dataset list).
(2)
Lounge and Bedroom Room
LoungeRGBDImages is an example dataset provided by Open3D [45] that includes 3000 color and depth image samples extracted from the Stanford Lounge RGB-D dataset. In addition, the dataset includes camera trajectory logs and a reconstructed scene model, so it is convenient for evaluating the effectiveness of various reconstruction methods.
BedroomRGBDImages is also an example dataset provided by Open3D; it includes samples from the Redwood Bedroom RGB-D dataset. This dataset is mainly used for research and development in 3D reconstruction and SLAM, and since it provides an accurate ground-truth reconstruction model, we use it to evaluate our reconstruction methods.
(3)
ICL-NUIM
The ICL-NUIM dataset [46] contains RGB-D image sequences of two room scenes (a living room and an office), both of which provide accurate ground-truth camera trajectories, but only the living room provides a ground-truth 3D reconstruction model. We therefore chose the living room scenes to test our reconstruction algorithm, using the 30 Hz frame-rate version, which better matches the working conditions of a robot; its four sequences contain 1510, 967, 882, and 1242 frames, respectively.
(4)
Real-world scene
To mitigate the potential distortions in the aforementioned dataset and the uncertainties inherent in the experimental process, we further collected real-world data from a turbine unit to validate the effectiveness of our algorithm. The equipment used for data acquisition was an Intel RealSense D435 depth camera (Intel, Santa Clara, CA, USA), with a resolution of 640 × 480, and a total of 384 frames were captured.
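As a concrete illustration of the ModelNet40 preparation described in item (1) above, the sketch below uniformly subsamples 2048 points and draws a random ground-truth transform within the stated ranges. The sampling scheme and random-number handling are illustrative assumptions, not the exact preprocessing code of the paper.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def make_pair(points, n_points=2048, seed=None):
    """points: (N, 3) array from a CAD model; returns a training pair and its (R, t) label."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), n_points, replace=False)    # uniform point subsampling
    src = points[idx]
    angles = rng.uniform(-45.0, 45.0, size=3)                 # rotation in [-45, 45] degrees
    R = Rotation.from_euler("xyz", angles, degrees=True).as_matrix()
    t = rng.uniform(-1.0, 1.0, size=3)                        # translation in [-1, 1]
    ref = src @ R.T + t                                       # reference cloud under the GT transform
    return src, ref, R, t
```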

4.2. Registration Experiments

We conducted comparative registration experiments on the ModelNet40 dataset; these served as a preliminary study for our reconstruction experiments and provided the experimental basis for integrating the registration method into the reconstruction pipeline. The comparison methods were PCRNet and ICP, as shown in Table 1 and Figure 3. Since the inlier threshold was $10\,pr$, we use $\mathrm{RMSE} < 0.01$ as the criterion for a successful registration: when the RMSE of a registered point cloud pair is below 0.01, we count it as a success; otherwise, it is counted as a failure. We evaluated our method on the ModelNet test set and plotted the CED curve for $\mathrm{Rot.\ Error} < 10^\circ$.
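The two evaluation quantities used here can be computed as in the sketch below, assuming the synthetic pairs share point ordering so that correspondences are known; the 0.01 threshold follows the success criterion stated above.

```python
import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle (degrees) between estimated and ground-truth rotations."""
    cos = np.clip((np.trace(R_est.T @ R_gt) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos))

def is_success(P, Q, R_est, t_est, thresh=0.01):
    """Registration counts as successful when post-alignment RMSE < thresh."""
    aligned = Q @ R_est.T + t_est                   # transform Q into P's frame
    rmse = np.sqrt(np.mean(np.sum((P - aligned) ** 2, axis=1)))
    return rmse < thresh
```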
We also conducted a series of qualitative analyses of the evaluation indicators. The main metrics are the rotation error and the translation error; the rotation error has the greatest impact on registration quality and is the hardest to reduce for a point cloud pair. It can be seen that the maximal-clique-based registration yields both the smallest rotation error and the smallest translation error, which gives us strong confidence for the reconstruction experiments.

4.3. Reconstruction Experiments

We conducted reconstruction experiments by integrating the aforementioned dual-branch method on the LoungeRoom scene, with additional experiments on the BedroomRGBD and ICL-NUIM LivingRoom datasets. Figure 4 shows the reconstruction result for the Open3D LoungeRoom scene; visually, the reconstruction quality is excellent.
First, we conducted a large number of experiments to determine the key parameter $d_{cmp}$, the inlier threshold, which is defined as a multiple of the point cloud resolution $pr$ (the average distance from each point in the point cloud to its nearest neighbor). For the point cloud pairs in the ICL-NUIM dataset, we set up a series of experiments to explore the impact of $d_{cmp}$ on the reconstruction error; the results are shown in Table 2. When $d_{cmp}$ is too large, point pairs that do not actually match are classified as inliers; when $d_{cmp}$ is too small, the inlier screening becomes extremely harsh, leaving too few matching point pairs. Both cases increase the rotation and translation errors. Therefore, in this study we choose $d_{cmp} = 10\,pr$.
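A minimal sketch of how $pr$ and the resulting $d_{cmp}$ can be computed is given below; the SciPy k-d tree call is standard, and the toy point cloud is purely illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def cloud_resolution(points):
    """Point cloud resolution pr: average nearest-neighbour distance."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=2)   # k=2: the nearest neighbour other than the point itself
    return float(np.mean(dists[:, 1]))

pr = cloud_resolution(np.random.rand(1000, 3))   # toy cloud, for illustration only
d_cmp = 10.0 * pr                                 # inlier threshold used in this study
```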
The biggest gain in the experiments is that our reconstruction greatly reduces the inference time. Table 3 below compares reconstruction times on the Open3D open-source dataset, and we also measured the quantitative reconstruction evaluation indicators using the ground-truth reconstruction models provided with the dataset. It can be seen that our reconstruction is many times faster while the reconstruction quality does not decrease significantly.
We performed the same experiments on another dataset, ICL-NUIM LivingRoom, and the results are given in Table 4 below.
To more accurately capture the various uncertainties and noise that may arise during the experimental process, we collected 384 depth and color images, with a resolution of 640 × 480, from a single turbine unit. The reconstruction results are visualized in Figure 5. As shown in Table 5, our method still significantly outperforms the Open3D reconstruction method when applied to real-world scenarios.
To further compare reconstruction errors, we evaluated the reconstruction quality of the Open3D reconstruction system and of our improved system separately. The heat-map error experiment provides a fine-grained analysis of the reconstruction quality of each region in the results. We performed this error analysis on the reconstructed heat maps [47] for the three datasets of the above experiments and compared them with the ground truth, as shown in Figure 6.
The same conclusion can be drawn from Table 3 and Table 4: the reconstruction quality does not decrease significantly. The differences in quality are concentrated mainly at edges and corners, because the loop-closure set formed by images of edges and corners is much smaller than that formed by point cloud segments in the middle of the scene; part of the discrepancy is therefore determined by the quality of the captured dataset. The heat maps also show that the reconstruction quality of the central regions is high, and our method is not inferior to the Open3D reconstruction system.

4.4. Complexity Analysis

The experimental results have already demonstrated that our method significantly outperforms the Open3D reconstruction system. We further validate the effectiveness of our algorithm from the perspective of algorithmic complexity. The complexity improvement of the proposed reconstruction system mainly concerns the registration module, so the analysis compares the maximal-clique-based registration algorithm with the Colored-ICP algorithm. The Colored-ICP registration method involves nearest-neighbor search ($O(n \log n)$), correspondence filtering ($O(n)$), and SVD decomposition ($O(m^3)$ with $m = n$), with an iteration count of $k = 100$; the total complexity is therefore $O(k \times (n \log n + n + n^3)) \approx O(k n^3)$. The maximal-clique registration method, after pruning acceleration, has a Bron–Kerbosch search complexity of $O(n^2)$ and an SVD decomposition complexity of $O(c \times m^3)$, where $m$ is the clique size and $c$ is the number of cliques. The single-execution complexity comparison is thus Max-Clique $O(n^2 + c \times m^3)$ vs. Colored-ICP $O(k n^3)$. Moreover, the maximal-clique registration method does not require iteration. Hence, from the perspective of algorithmic complexity, our algorithm has a significantly lower complexity than the original Open3D system.

5. Conclusions

We present a system for scene reconstruction from RGB-D video. Our approach builds on a deep-learning-based PointNet backbone, improves convergence behavior with learned fused features, and extracts semantic information from the scene. Furthermore, the use of maximal clique theory allows our method to handle more complicated situations explicitly. The experimental results show that our method yields excellent performance on the Open3D and ICL-NUIM datasets. In the future, we hope to further improve accuracy and efficiency by using higher-dimensional semantic information between frames.

Author Contributions

Conceptualization, Y.Z.; methodology, Y.Z.; software, Y.Z.; validation, Y.Z. and L.L.; formal analysis, Y.Z.; investigation, Y.Z.; resources, Y.Z.; data curation, Y.Z.; writing—original draft preparation, Y.Z.; writing—review and editing, Y.Y. and N.L.; visualization, Y.Z.; supervision, Y.Y., N.L. and Q.L.; project administration, Y.Z. and Q.L.; funding acquisition, Y.Y. and Q.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The benchmark instances for this study are available at https://www.open3d.org/docs/latest/python_api/open3d.data.html and https://www.doc.ic.ac.uk/~ahanda/VaFRIC/iclnuim.html, accessed on 10 March 2025.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Liu, X.; Gong, G.; Hu, X.; Shang, G.; Zhu, H. Cognitive Enhancement of Robot Path Planning and Environmental Perception Based on Gmapping Algorithm Optimization. Electronics 2024, 13, 818. [Google Scholar] [CrossRef]
  2. Menini, D.; Kumar, S.; Oswald, M.R.; Sandström, E.; Sminchisescu, C.; Van Gool, L. A real-time online learning framework for joint 3d reconstruction and semantic segmentation of indoor scenes. IEEE Robot. Autom. Lett. 2021, 7, 1332–1339. [Google Scholar] [CrossRef]
  3. Hu, D.; Gan, V.J.; Yin, C. Robot-assisted mobile scanning for automated 3D reconstruction and point cloud semantic segmentation of building interiors. Autom. Constr. 2023, 152, 104949. [Google Scholar] [CrossRef]
  4. Hu, D.; Gan, V.J.; Wang, T.; Ma, L. Multi-agent robotic system (MARS) for UAV-UGV path planning and automatic sensory data collection in cluttered environments. Build. Environ. 2022, 221, 109349. [Google Scholar] [CrossRef]
  5. Huang, W.; Wang, C.; Zhang, R.; Li, Y.; Wu, J.; Li, F.-F. Voxposer: Composable 3d value maps for robotic manipulation with language models. arXiv 2023, arXiv:2307.05973. [Google Scholar]
  6. Chowdhery, A.; Narang, S.; Devlin, J.; Bosma, M.; Mishra, G.; Roberts, A.; Barham, P.; Chung, H.W.; Sutton, C.; Gehrmann, S.; et al. Palm: Scaling language modeling with pathways. J. Mach. Learn. Res. 2023, 24, 240. [Google Scholar]
  7. Rozenberszki, D.; Soros, G.; Szeier, S.; Lorincz, A. 3D Semantic Label Transfer in Human-Robot Collaboration. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, BC, Canada, 11–17 October 2021. [Google Scholar]
  8. Dong, S.; Xu, K.; Zhou, Q.; Tagliasacchi, A.; Chen, B. Multi-robot collaborative dense scene reconstruction. ACM Trans. Graph. (TOG) 2019, 38, 84. [Google Scholar] [CrossRef]
  9. Zhang, X.; Yang, J.; Zhang, S.; Zhang, Y. 3D registration with maximal cliques. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 17745–17754. [Google Scholar]
  10. Hu, K.; Chen, Z.; Kang, H.; Tang, Y. 3D vision technologies for a self-developed structural external crack damage recognition robot. Autom. Constr. 2024, 159, 105262. [Google Scholar] [CrossRef]
  11. Vazquez, G.D.B.; Lacapmesure, A.M.; Martínez, S.; Martínez, O.E. SUPPOSe 3Dge: A method for super-resolved detection of surfaces in volumetric fluorescence microscopy. J. Opt. Photonics Res. 2024, 16, 123–135. [Google Scholar]
  12. Grunnet-Jepsen, A.; Tong, D. Depth Post-Processing for Intel® Realsense™ d400 Depth Cameras; New Technologies Group, Intel Corporation: Santa Clara, CA, USA, 2018; Available online: https://www.intel.com/content/www/us/en/content-details/842031/content-details.html (accessed on 22 April 2025).
  13. Bai, X.; Luo, Z.; Zhou, L.; Chen, H.; Li, L.; Hu, Z.; Fu, H.; Tai, C.L. Pointdsc: Robust point cloud registration using deep spatial consistency. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15859–15869. [Google Scholar]
  14. Segal, A.; Haehnel, D.; Thrun, S. Generalized-icp. In Proceedings of the Robotics: Science and Systems, Seattle, WA, USA, 28 June–1 July 2009; Volume 2, p. 435. [Google Scholar]
  15. Yew, Z.J.; Lee, G.H. Rpm-net: Robust point matching using learned features. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11824–11833. [Google Scholar]
  16. Carreira, J.; Sminchisescu, C. CPMC: Automatic object segmentation using constrained parametric min-cuts. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 34, 1312–1328. [Google Scholar] [CrossRef]
  17. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660. [Google Scholar]
  18. Wöhler, C. 3D Computer Vision: Efficient Methods and Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
  19. Zhou, L.; Wu, G.; Zuo, Y.; Chen, X.; Hu, H. A Comprehensive Review of Vision-Based 3D Reconstruction Methods. Sensors 2024, 24, 2314. [Google Scholar] [CrossRef]
  20. Jia, M.; Zhang, M. An Overview of Methods and Applications of 3D Reconstruction. Int. J. Comput. Sci. Inf. Technol. 2024, 3, 16–23. [Google Scholar] [CrossRef]
  21. Derpanis, K.G. Overview of the RANSAC Algorithm. Image Rochester NY 2010, 4, 2–3. [Google Scholar]
  22. Koch, A.; Dipanda, A.; Bourgeois République, C. Evolutionary-based 3D reconstruction using an uncalibrated stereovision system: Application of building a panoramic object view. Multimed. Tools Appl. 2012, 57, 565–586. [Google Scholar] [CrossRef]
  23. Choy, C.B.; Xu, D.; Gwak, J.; Chen, K.; Savarese, S. 3d-r2n2: A unified approach for single and multi-view 3d object reconstruction. In Computer Vision–ECCV 2016, Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part VIII 14; Springer: Cham, Switzerland, 2016; pp. 628–644. [Google Scholar]
  24. Mescheder, L.; Oechsle, M.; Niemeyer, M.; Nowozin, S.; Geiger, A. Occupancy networks: Learning 3d reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4460–4470. [Google Scholar]
  25. Ma, Y.; Luo, Y.; Yang, Z. Geometric operator convolutional neural network. arXiv 2018, arXiv:1809.01016. [Google Scholar]
  26. Cao, W.; Yan, Z.; He, Z.; He, Z. A comprehensive survey on geometric deep learning. IEEE Access 2020, 8, 35929–35949. [Google Scholar] [CrossRef]
  27. Xie, Y.; Tian, J.; Zhu, X.X. Linking points with labels in 3D: A review of point cloud semantic segmentation. IEEE Geosci. Remote Sens. Mag. 2020, 8, 38–59. [Google Scholar] [CrossRef]
  28. Zhang, R.; Wu, Y.; Jin, W.; Meng, X. Deep-Learning-Based Point Cloud Semantic Segmentation: A Survey. Electronics 2023, 12, 3642. [Google Scholar] [CrossRef]
  29. Li, Y.; Bu, R.; Sun, M.; Wu, W.; Di, X.; Chen, B. Pointcnn: Convolution on x-transformed points. Adv. Neural Inf. Process. Syst. 2018, 848–857. [Google Scholar]
  30. Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph cnn for learning on point clouds. ACM Trans. Graph. (TOG) 2019, 38, 1–12. [Google Scholar] [CrossRef]
  31. Thomas, H.; Qi, C.R.; Deschaud, J.E.; Marcotegui, B.; Goulette, F.; Guibas, L.J. Kpconv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6411–6420. [Google Scholar]
  32. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 2017, 3001–3010. [Google Scholar]
  33. Qian, G.; Li, Y.; Peng, H.; Mai, J.; Hammoud, H.; Elhoseiny, M.; Ghanem, B. Pointnext: Revisiting pointnet++ with improved training and scaling strategies. Adv. Neural Inf. Process. Syst. 2022, 35, 23192–23204. [Google Scholar]
  34. Zhao, H.; Jiang, L.; Jia, J.; Torr, P.H.; Koltun, V. Point transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 16259–16268. [Google Scholar]
  35. Lai, X.; Liu, J.; Jiang, L.; Wang, L.; Zhao, H.; Liu, S.; Qi, X.; Jia, J. Stratified transformer for 3d point cloud segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8500–8509. [Google Scholar]
  36. Wang, Y.; Solomon, J.M. Deep closest point: Learning representations for point cloud registration. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3523–3532. [Google Scholar]
  37. Chang, S.; Ahn, C.; Lee, M.; Oh, S. Graph-matching-based correspondence search for nonrigid point cloud registration. Comput. Vis. Image Underst. 2020, 192, 102899. [Google Scholar] [CrossRef]
  38. Rusu, R.B.; Blodow, N.; Beetz, M. Fast point feature histograms (FPFH) for 3D registration. In Proceedings of the 2009 IEEE International Conference on Robotics and Automation, Kobe, Japan, 12–17 May 2009; pp. 3212–3217. [Google Scholar]
  39. Wu, Q.; Hao, J.K. A review on algorithms for maximum clique problems. Eur. J. Oper. Res. 2015, 242, 693–709. [Google Scholar] [CrossRef]
  40. Cheng, J.; Ke, Y.; Fu, A.W.C.; Yu, J.X.; Zhu, L. Finding maximal cliques in massive networks. ACM Trans. Database Syst. (TODS) 2011, 36, 21. [Google Scholar] [CrossRef]
  41. Bron, C.; Kerbosch, J. Algorithm 457: Finding all cliques of an undirected graph. Commun. ACM 1973, 16, 575–577. [Google Scholar] [CrossRef]
  42. Oomori, S.; Nishida, T.; Kurogi, S. Point cloud matching using singular value decomposition. Artif. Life Robot. 2016, 21, 149–154. [Google Scholar] [CrossRef]
  43. Fischer, A.; Izmailov, A.F.; Solodov, M.V. The Levenberg–Marquardt method: An overview of modern convergence theories and more. Comput. Optim. Appl. 2024, 89, 33–67. [Google Scholar] [CrossRef]
  44. Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1912–1920. [Google Scholar]
  45. Zhou, Q.Y.; Park, J.; Koltun, V. Open3D: A Modern Library for 3D Data Processing. arXiv 2018, arXiv:1801.09847. [Google Scholar]
  46. Handa, A.; Whelan, T.; McDonald, J.; Davison, A.J. A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 1524–1531. [Google Scholar]
  47. Handa, A.; Newcombe, R.A.; Angeli, A.; Davison, A.J. Real-time camera tracking: When is high frame-rate best? In Computer Vision–ECCV 2012, Proceedings of the 12th European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Proceedings, Part VII 12; Springer: Cham, Switzerland, 2012; pp. 222–235. [Google Scholar]
Figure 1. Overview of the fast indoor 3D reconstruction framework (a). The process involves initial point cloud segment construction followed by feature extraction using PointNet, as shown in (b), resulting in two distinct feature sets. These features are then processed by the maximal clique estimation method, as shown in (c), to regress the $\{R, t\}$ matrix, followed by the semantic segmentation shown in (d) for semantic recognition of individual point clouds, identifying elements such as walls, floors, and furniture.
Figure 2. The overall process of finding corresponding point pairs begins by using RANSAC to obtain the initial set of corresponding pairs, $C_{\mathrm{initial}}$. Next, a graph is constructed from these pairs to identify several maximal cliques. Finally, the corresponding pairs within these cliques are refined, and the RT matrix is computed using singular value decomposition (SVD).
Figure 3. The overall layout of the CED curve indicates that our method performs better than PCRNet.
Figure 4. Indoor multidimensional reconstruction effect display based on LoungeRoom.
Figure 5. Reconstruction results based on real-world scenes.
Figure 6. Visual analysis of reconstructed heatmap.
Table 1. The mean and standard deviation of rotational and translational errors for PCRNet, MaxClique, and ICP methods, along with their respective AUC values when applicable. PCRNet shows lower rotational error compared to ICP, while MaxClique demonstrates the lowest error rates overall, indicating higher precision in alignment tasks.
Algorithm    Rot. Error (deg)            Trans. Error                AUC
             Mean         Std. Dev.      Mean         Std. Dev.
ICP          11.87        31.87          0.0282       0.0392         -
PCRNet       8.82         4.82           0.0077       0.0008         0.9544
Ours         1.03         2.56           0.0085       0.0024         0.9943
Table 2. Experiments to determine the optimal $d_{cmp}$ parameter.
d_cmp    Graph Construction    Filtered Points    RE (Rotation Error)    TE (Translation Error)
3 pr     410.2 ms              2519               3.47                   10.12
5 pr     351.6 ms              4033               1.13                   2.22
8 pr     384.0 ms              4716               1.76                   5.26
10 pr    358.7 ms              4823               0.83                   2.49
15 pr    406.1 ms              4845               1.97                   0.48
20 pr    358.9 ms              4813               2.85                   8.27
Table 3. In the two datasets (the lounge and bedroom scenes) from the Open3D dataset, our enhanced method is compared with the optimization algorithm provided by the Open3D official library to assess the inference times of the four modules in 3D reconstruction.
Dataset    Method    Make Time            Register Time     Refine Time       Integrate Time    RMS
Bedroom    Open3D    1 h 2 min 16.31 s    41 min 47.54 s    4 min 10.87 s     2 min 09.60 s     0.0169
Bedroom    Ours      1 h 2 min 16.31 s    03 min 20.35 s    24.46 s           1 min 44.96 s     0.0076
Lounge     Open3D    35 min 10.03 s       16 min 27.79 s    02 min 04.45 s    1 min 07.48 s     0.0260
Lounge     Ours      35 min 10.03 s       02 min 31.65 s    26.32 s           1 min 11.14 s     0.0567
Table 4. In the four scenes from the ICL-NUIM dataset, our enhanced method is compared with the optimization algorithm provided by the Open3D official library to assess the inference times of the four modules in 3D reconstruction.
Dataset         Method    Make Time         Register Time     Refine Time       Integrate Time    RMS
Livingroom_0    Open3D    17 min 16.67 s    03 min 41.70 s    01 min 41.02 s    36.55 s           0.943
Livingroom_0    Ours      17 min 16.67 s    38.22 s           13.39 s           38.70 s           0.015
Livingroom_1    Open3D    12 min 21.79 s    01 min 41.66 s    27.06 s           29.62 s           0.122
Livingroom_1    Ours      12 min 21.79 s    25.93 s           10.05 s           30.95 s           0.133
Livingroom_2    Open3D    10 min 09.05 s    01 min 23.64 s    16.45 s           25.12 s           0.038
Livingroom_2    Ours      10 min 09.05 s    18.80 s           15.65 s           28.72 s           0.926
Livingroom_3    Open3D    18 min 38.48 s    02 min 39.43 s    56.67 s           36.23 s           0.051
Livingroom_3    Ours      18 min 38.48 s    30.83 s           18.96 s           38.71 s           0.990
Table 5. Comparison of performance of 3D scene reconstruction based on data collected from real-world scenes.
Dataset             Method    Make Time         Register Time     Refine Time       Integrate Time
Real-world scene    Open3D    21 min 33.14 s    02 min 40.17 s    02 min 48.06 s    02 min 40.05 s
Real-world scene    Ours      21 min 33.14 s    01 min 10.25 s    02 min 36.74 s    02 min 47.91 s
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
