Article

Missing Region Completion Network for Large-Scale Laser-Scanned Point Clouds: Application to Transparent Visualization of Cultural Heritage

1 School of Artificial Intelligence, Chongqing Technology and Business University, Chongqing 400067, China
2 School of Intelligence Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
3 School of Information and Telecommunication Engineering, Tokai University, Tokyo 108-8619, Japan
4 College of Information Science and Engineering, Ritsumeikan University, Ibaraki 567-8570, Japan
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(15), 2758; https://doi.org/10.3390/rs16152758
Submission received: 17 June 2024 / Revised: 26 July 2024 / Accepted: 26 July 2024 / Published: 28 July 2024
(This article belongs to the Special Issue Application of Remote Sensing in Cultural Heritage Research II)

Abstract

The digital documentation and analysis of cultural heritage increasingly rely on high-precision three-dimensional point cloud data, which often suffer from missing regions due to limitations in acquisition conditions, hindering subsequent analyses and applications. Point cloud completion techniques, which predict and fill these missing regions, are vital for restoring the integrity of cultural heritage structures and enhancing restoration accuracy and efficiency. In this paper, to address the challenges in processing large-scale cultural heritage point clouds, particularly the slow processing speed and the visualization impairments caused by uneven point density during completion, we propose a point cloud completion method that employs centroid-based voxel feature extraction, which significantly accelerates feature extraction for massive point clouds. Coupled with an efficient upsampling module, it achieves a uniform point distribution. Experimental results show that the proposed method matches state-of-the-art (SOTA) methods in completion accuracy while surpassing them in point density uniformity, handles larger-scale point cloud data, and accelerates the processing of voluminous point clouds. In general, the proposed method markedly enhances the efficiency and quality of large-scale point cloud completion, holding significant value for the digital preservation and restoration of cultural heritage.

1. Introduction

Against the backdrop of the digital era, the three-dimensional (3D) digital preservation of cultural heritage has emerged as a crucial undertaking [1,2]. Point cloud data, serving as a bridge between the physical and digital worlds, furnishes robust support for the precise documentation and analysis of cultural heritage. However, in the context of point cloud laser scanning of extensive heritage sites, the missing regions in the original laser-scanned point cloud data become a significant obstacle for reconstruction work. These deficiencies, which may result from natural erosion, human-induced damage, or technological constraints, lead to incomplete point cloud data that hinder subsequent analysis, conservation, and exhibition endeavors.
Despite significant advancements in deep learning technologies that have propelled progress in areas such as image recognition and natural language processing, and their nascent applications to point cloud data processing [3,4], the task of shape completion for missing regions in large-scale cultural heritage point clouds remains encumbered by high computational costs, sluggish processing speeds, and suboptimal visual quality of the reconstructed results. Specifically, (1) Inefficient Computational Performance: Existing point cloud completion networks often demand substantial computational resources, particularly when dealing with realistic, large-scale point cloud data. These network architectures typically revolve around a strategy of extracting features independently for each point, which proves excessively burdensome when confronted with data comprising tens or hundreds of thousands, or even millions, of points [5]. While they may perform satisfactorily on simplified synthetic datasets, the transition to large-scale real-world scenarios reveals that this per-point feature computation significantly impedes processing speed, thereby curtailing the practicality of such methods in real-time applications or scenarios with stringent time constraints; (2) Redundancy in Feature Extraction: While point-by-point feature extraction aims to capture details in point clouds, it often wastes substantial computational resources on non-essential information when dealing with large-scale point clouds. Shape completion of point clouds does not necessitate exhaustive feature descriptions for every point, but instead relies more heavily on the precise localization of key points and their characteristic expressions [6]. Consequently, the efficient screening and utilization of these key points to curtail superfluous computation has emerged as pivotal for enhancing processing efficiency; and (3) Point Distribution Uniformity: Existing methods also face limitations concerning the uniformity of the generated completed point clouds. The completed point clouds may exhibit areas of excessive density alongside sparse regions, which not only impairs the visual quality and subsequent geometric analyses of the point cloud but can also lead to distortions in understanding the structure of the target. The absence of an effective mechanism to ensure balanced point density and consistency in the completed shapes hampers their potential application in high-fidelity 3D reconstruction, virtual reality, and similar domains.
Therefore, optimizing the design of the point cloud completion network for a more efficient feature extraction mechanism, reducing computational redundancy, and ensuring the uniformity of the completed point cloud are the important issues that urgently need to be solved in the task of point cloud completion for large-scale missing regions of cultural heritage. Thus, the new contributions of this paper can be summarized as follows: (1) We employ a centroid-based feature extraction strategy to address the efficiency bottlenecks in point cloud data processing. This approach begins with voxelization of large-scale point clouds, effectively reducing the dimensional complexity of the data. Subsequently, by computing the centroid of each voxel grid and encoding the positional information of these centroids, we can efficiently capture the macrostructural features of the point cloud. These encoded centroid features are then fed into the Transformer block, which leverages attention mechanisms to predict key point features within missing regions. (2) We employ a feature-expansion module in the generation phase of point cloud completion. This module takes the predicted key point features as input and restores the full structure of the point cloud through upsampling. Additionally, we incorporate a repulsion-loss function to regulate the distribution of generated points, thus guaranteeing a uniform point density across different regions. This effectively prevents the over-dense or under-dense areas commonly encountered in existing methods, enhancing both the visual quality and geometric consistency of the point cloud, which benefits applications such as 3D reconstruction and virtual reality.
This paper is organized as follows. In Section 2, we briefly review existing point cloud completion methods and their problems, including point feature extraction and the generation of dense point clouds from sparse point clouds. We also review our previous work on transparent visualization. In Section 3, we illustrate the network architecture of the proposed method in detail and explain the structure and role of each processing component, as well as the loss functions used in the network. In Section 4, we describe the experimental conditions and compare the proposed method with existing methods on a simple synthetic dataset to demonstrate its advantages in processing efficiency. We also present the results of applying the proposed method to large-scale cultural heritage point cloud data to demonstrate its effectiveness in improving the comprehensibility of transparent visualization results. In Section 5, we discuss the experimental results and the current limitations of our work. In Section 6, we summarize our achievements and describe future work.

2. Related Works

2.1. Deep Learning-Based Point Cloud Shape Completion

In the reconstruction task for incomplete point clouds, generating support points from the predicted local point cloud features with missing shapes and recovering the overall shape features are the key steps in realizing point cloud reconstruction. Huang et al. [7] proposed a novel multi-resolution encoder (MRE), which extracts multi-layer features from local point clouds and their low-resolution feature points using a new combined multi-layer perceptron (CMLP) feature extractor. The model takes only parts of the point cloud as input to keep the spatial structure of the original point cloud and outputs only the missing parts, thus realizing accurate and high-fidelity point cloud completion. To address the loss of structural detail in point cloud completion, Wen et al. [8] proposed a skip-attention mechanism, which can effectively capture the local structural detail features of the missing parts and selectively utilize these uncertain regions for 3D reconstruction. Xie et al. [9] proposed a grid-based residual network for point cloud completion, which introduces a 3D grid as an intermediate representation to normalize the unstructured point cloud. It extracts the features of each point in the coarse point cloud by concatenating the features of the eight vertices of the 3D grid cells where the original points are located through triple feature extraction, and then obtains the final completed point cloud through a multilayer perceptron (MLP). Although MLPs can extract local region features from an incomplete point cloud, existing works that apply MLPs directly to point clouds cannot achieve accurate support point prediction because MLPs cannot efficiently capture the geometric structure and contextual information of the point cloud. Recently, Yu et al. [10] proposed a point cloud completion network grounded in self-attention mechanisms, framing the task of point cloud completion as a set-to-set translation problem. This approach translates from the feature representations of known points to the missing portions of the point cloud. Initially, an encoder is employed to meticulously capture the structural information and inter-point relationships within the observed portion of the point cloud. Subsequently, a decoder is utilized to learn the reciprocal relations between the absent and present parts, thereby facilitating the reconstruction of the missing portion of the point cloud. Pan et al. [11] proposed a variational relational point cloud completion network comprising two cascaded sub-networks: a probabilistic model network and a relation-augmentation network. This methodology first generates a coarse point cloud shape skeleton based on the incomplete input. It then combines this preliminary coarse framework with features extracted from the incomplete point cloud to predict shape structures with enhanced relational coherence, thereby boosting the fidelity of the reconstructed point cloud detail.
However, existing point cloud completion works preprocess the target point cloud into a sparse point cloud with a fixed number of points (usually a few thousand) to facilitate fast encoding of the input. For large-scale point cloud data (usually more than 10 million points) in real surveying and mapping scenarios, downsampling the target point cloud to a fixed, small number of points before encoding loses most of the feature information. Such prediction from a limited local perspective lowers the accuracy of the final output, which makes it difficult for the above methods to handle large-scale point cloud data effectively.

2.2. Dense Point Cloud Reconstruction for Sparse Regions

The problem of dense point cloud reconstruction is essentially similar to the super-resolution problem for images. However, because point clouds have no spatial order or regular structure, unlike the image space represented by a regular grid, simple interpolation between the input points does not give satisfactory results. Early approaches tried various optimization strategies to generate dense point clouds without using deep learning models. For example, Alexa et al. [12] upsampled points by referring to Voronoi diagrams, which require surface-smoothing assumptions and are computed on a moving least squares surface. Subsequently, the locally optimal projection (LOP) operator [13,14] was shown to be effective for point resampling and surface reconstruction based on the L1 median. In recent years, deep neural networks have achieved outstanding performance in various point cloud-processing tasks, including point cloud registration [15,16], point cloud completion [7,8,9,10,11], shape classification [3,17,18], and semantic scene segmentation [19,20]. In the field of dense point cloud reconstruction, Yu et al. [21] proposed the first deep learning algorithm for point cloud upsampling, which works on patches by learning multilevel per-point features and expands the point cloud by multibranch convolution. Wang et al. [22] introduced a patch-based progressive upsampling technique for point clouds, capable of gradually upscaling input point clouds to higher sampling densities. Nonetheless, these methods are based on techniques from the image-processing domain and thereby give less consideration to the intrinsic geometric features of the input point clouds. This oversight leads to various artifacts in the generated outputs. Furthermore, surface normal vectors, being pivotal geometric attributes, are not adequately accounted for in either the upsampling process or the resultant outputs. To solve this problem, Li et al. [23] proposed a framework based on generative adversarial networks (GANs) [24] to generate high-quality dense point clouds. This framework incorporates an upsampling module within the generator that operates in an up-down-up fashion, designed to perform feature upsampling with error feedback and self-correction for point features. Qian et al. [25] proposed a dense point cloud reconstruction model that learns local parametric normal vectors for each point. The method operates by generating samples within a two-dimensional parameter space, which are subsequently transformed into three-dimensional space through a learned linear transformation. This transformation maps the reconstructed points on planar slices back onto the surface, thereby achieving dense point cloud generation. The approach incorporates an understanding of local geometry by learning per-point parameters, ensuring a more faithful representation of the original surface structure in the reconstructed dense point cloud.
However, research on dense point cloud reconstruction has predominantly focused on small-scale synthetic datasets such as ShapeNet [26] and ModelNet40 [27]. In contrast, studies on dense point cloud reconstruction for the large-scale cultural heritage point cloud with sparse regions are relatively scarce. Insufficient feature information in sparse regions significantly affects the accuracy of dense point cloud reconstruction, making it challenging to fuse features with dense regions [28] effectively, which leads to noisy or distorted dense point clouds after reconstruction, failing to represent the object’s shape and structure accurately.

2.3. Transparent Visualization for Large-Scale Point Clouds

Rapid advances in 3D surveying and mapping technology have made it possible to document complex real-world scenarios quickly and accurately. 3D scanning data is typically stored as large-scale point cloud data with a complex structure. When these objects have internal structures, and the acquired point clouds include both these internal configurations and the external contours, the overall complexity of the 3D structure becomes even more pronounced. To visualize such large point clouds, point-based rendering (PBR) [29], which treats each point as a rendering primitive, serves as the simplest and most widely used approach. Numerous works also employ point-based rendering directly in the visualization analysis of cultural heritage [30,31,32]. However, this opaque visualization technique fails to achieve perspective imaging, thereby posing challenges in acquiring internal structural information of tangible cultural heritage within intricate scenarios. Consequently, to observe 3D point clouds with intricate configurations, a transparent visualization that permits simultaneous analysis of both internal structures and external forms becomes necessary. In our previous work [33,34,35], we proposed a method for transparent visualization based on stochastic point rendering (SPBR). This method achieves precise depth perception of large-scale point clouds without necessitating any depth sorting procedures. It enables the transparent visualization of both internal structures and external forms, thereby enhancing the interpretability and visual quality of point cloud data in complex scenes.

3. Proposed Method for Missing Region Reconstruction

3.1. Overview

For large-scale laser-scanned point cloud data in the real world, the aim of our work is to achieve high-quality point cloud completion for the missing regions existing in the original point cloud and obtain a point cloud with a complete shape and uniform point density distribution to enhance its visibility in transparent visualization. Our point cloud completion network has three components: Centroid-aware Feature Extraction, Transformer Block, and Dense Point Cloud Generation. First, we perform voxelization of the original large-scale point cloud, compute the centroid position within each voxel grid, and utilize it as a center point for extracting features, then embed the centroid position information to generate the local feature for each grid. Then, the local features of all grids are used as inputs to the Transformer block, where the center points of the missing regions are generated by an Encoder-Decoder module. Finally, the center points are reconstructed into a dense point cloud with a well-distributed point density by the point cloud upsampling module. The overall framework of the proposed method is illustrated in Figure 1. We introduce our method in detail, as follows:

3.2. Centroid-Aware Feature Extraction

The KNN algorithm is computationally expensive for large-scale point cloud data: for each query point, the distances to all other points need to be calculated, and the complexity is usually $O(N)$ per query point. Voxelization transforms continuous point cloud data into structured voxel data by discretizing the 3D space, which helps to simplify the subsequent feature extraction, reduce unnecessary computations, and better capture the geometric and topological information in the point cloud data.
In our work, to improve the efficiency of voxelization for large-scale scanned point clouds, we employ the Minkowski Engine [36] to construct voxel grids on a high-dimensional sparse tensor to reduce the computational cost and memory consumption during convolution and pooling operations. Specifically, for an input point cloud $P = \{ p_n \in \mathbb{R}^{3 \times 1} \}_{n=1}^{N}$ with $N$ points, a voxel grid $G = \{ (c_i, f_i) \}_{i=1}^{I}$ is constructed on the sparse tensor, where $c_i$ is the centroid coordinate of the $i$-th grid and $f_i$ is the centroid feature of the grid. For each grid containing feature information, we compute the average of the points within the voxel grid as the centroid $c_i$. Then, inspired by the positional encoding operation in [37], we compute the relative positions of the points within the voxel with respect to the centroid and use an MLP for position encoding to produce the centroid-to-point positional encoding $e_n$ as follows:
$$e_n = \mathrm{MLP}(p_n - c_i),$$
where $p_n$ is a point within the $i$-th voxel grid and $c_i$ is the centroid of the $i$-th voxel grid corresponding to $p_n$. Then, a permutation-invariant operator $\Phi$ is used to aggregate the position encodings $e_n$ of the points in the grid as the centroid feature $f_i = \Phi(e_n)$ of each voxel grid to achieve efficient feature extraction for large-scale point clouds. Finally, we concatenate the centroid with its neighborhood feature into a grid feature vector $F_i$ as the input to the Transformer block, which can be computed as follows:
$$F_i = f_i + \omega(c_i),$$
where ω is a position embedding [10,38] operation in Transformers, which is used to capture the location information of the voxel grid.
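To make this step concrete, the sketch below reproduces the centroid-aware feature extraction with plain dense PyTorch tensor operations rather than the Minkowski Engine sparse tensors used in our implementation; the voxel size, the feature width, and the use of max-pooling as the permutation-invariant operator Φ are illustrative assumptions, not the exact configuration of the proposed network.

```python
import torch
import torch.nn as nn

class CentroidVoxelEncoder(nn.Module):
    """Minimal dense-tensor sketch of the centroid-aware feature extraction."""
    def __init__(self, voxel_size=0.05, feat_dim=128):
        super().__init__()
        self.voxel_size = voxel_size
        # MLP for the centroid-to-point positional encoding e_n = MLP(p_n - c_i).
        self.point_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        # Position embedding omega(c_i) of the voxel centroid.
        self.pos_embed = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat_dim))

    def forward(self, points):                                      # points: (N, 3)
        # 1. Assign each point to a voxel and collect the occupied voxels.
        vox = torch.floor(points / self.voxel_size).long()          # (N, 3)
        uniq, inv = torch.unique(vox, dim=0, return_inverse=True)   # inv: (N,)
        num_voxels = uniq.shape[0]

        # 2. Centroid c_i = mean of the points falling inside each occupied voxel.
        sums = torch.zeros(num_voxels, 3).index_add_(0, inv, points)
        counts = torch.zeros(num_voxels).index_add_(0, inv, torch.ones(len(points)))
        centroids = sums / counts.unsqueeze(1)                      # (I, 3)

        # 3. e_n = MLP(p_n - c_i): encode each point relative to its centroid.
        e = self.point_mlp(points - centroids[inv])                 # (N, feat_dim)

        # 4. f_i = Phi(e_n): permutation-invariant aggregation (max-pooling assumed).
        f = torch.full((num_voxels, e.shape[1]), float("-inf")).scatter_reduce(
            0, inv.unsqueeze(1).expand(-1, e.shape[1]), e, reduce="amax")

        # 5. F_i = f_i + omega(c_i): add the centroid position embedding.
        return centroids, f + self.pos_embed(centroids)             # (I, 3), (I, feat_dim)
```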

3.3. Transformer Block

Point cloud data contains extensive geometric and spatial information, which makes it difficult for the traditional Transformer to capture the relative positions and global structural relations among point clouds. Therefore, it is important to utilize these geometric properties in the encoding process of the Transformer.
In our work, we adopt the geometry-aware Transformer block proposed in [10] to model geometric relations, which helps the Transformer better leverage the inductive bias about the 3D geometric structure of point clouds and enables the model to understand and utilize the spatial and geometric relations in the point cloud. Specifically, the voxel grid features $F = \{ f_i \}_{i=1}^{I}$ extracted during voxelization are first input into the encoder to obtain the encoder feature vectors $R = \{ r_i \}_{i=1}^{I}$. Then, we generate the query embeddings dynamically, conditioned on the encoder outputs, in the query generator module. Following the strategy in [39], we generate $K \times 3$ dimensional features from $R$ using a linear projection layer, and then reshape these features into $K$ key point coordinates $C = \{ c_k \}_{k=1}^{K}$. Finally, we concatenate the global feature of the encoder with the coordinates and apply an MLP to generate the key point features $Z = \{ z_k \}_{k=1}^{K}$, which can be expressed as:
$$Z = \mathrm{MLP}\left( \left[ T_E(F), C \right] \right),$$
where $T_E$ is the encoder model, $F = \{ f_i \}_{i=1}^{I}$ are the voxel grid features, $C = \{ c_k \}_{k=1}^{K}$ are the key point coordinates of the predicted point cloud, $Z = \{ z_k \}_{k=1}^{K}$ are the predicted key point features, $[\cdot,\cdot]$ denotes concatenation, and $K$ is the number of predicted key points.
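The following is a minimal sketch of the query-generation step described above; the feature width, the number of key points, and the use of max-pooling to obtain a global encoder feature are assumptions made for illustration and may differ from the exact design in [10,39].

```python
import torch
import torch.nn as nn

class QueryGenerator(nn.Module):
    """Sketch of key point prediction: project encoder outputs to K coordinates,
    then fuse them with the global feature to form key point features Z."""
    def __init__(self, feat_dim=384, num_keypoints=224):
        super().__init__()
        self.K = num_keypoints
        # Linear projection producing K x 3 coordinates from the pooled encoder feature.
        self.coord_proj = nn.Linear(feat_dim, num_keypoints * 3)
        # MLP fusing the global encoder feature with each predicted coordinate.
        self.query_mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim))

    def forward(self, encoder_feats):                              # R: (B, I, feat_dim)
        # Global shape feature pooled over all voxel-grid tokens (assumption: max-pool).
        global_feat = encoder_feats.max(dim=1).values              # (B, feat_dim)
        # Reshape the projected features into K key point coordinates C.
        coords = self.coord_proj(global_feat).view(-1, self.K, 3)  # (B, K, 3)
        # Z = MLP([T_E(F), C]): concatenate the global feature with each coordinate.
        glob = global_feat.unsqueeze(1).expand(-1, self.K, -1)     # (B, K, feat_dim)
        key_feats = self.query_mlp(torch.cat([glob, coords], dim=-1))  # (B, K, feat_dim)
        return coords, key_feats
```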

3.4. Dense Point Cloud Generation

Generating a complete dense point cloud from predicted key points and features is the final step in point cloud completion. We consider that while using the FoldingNet-based method [40] is an easy way to generate dense point clouds, for large-scale point clouds with relatively complex structures, such a simple model would lose some local shape information and the point density distribution in the generated results would not be uniform.
Therefore, in our work, we employ the feature-expansion module from our previous work [41] to generate dense point clouds with a uniform point density distribution from the key point features. Specifically, for the key point features $Z = \{ z_k \}_{k=1}^{K}$ obtained from the Transformer block, we use the upsampling operator in [23] to produce $Z_{up}$, and then apply the downsampling operator to $Z_{up}$ to produce $Z_t$. We next use the upsampling operator to upsample the difference (denoted as $\sigma$) between $Z_{up}$ and $Z_t$, and concatenate the resulting $\sigma_{up}$ with $Z_{up}$ to produce the expanded dense point features. Finally, from the expanded features, we regress the 3D coordinates of each point through a series of fully connected layers and produce the dense point cloud result.
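A compact sketch of such an up-down-up expansion unit with error feedback, in the spirit of PU-GAN [23], is given below; the layer widths, the duplication-based up/down operators, and the exact placement of the residual computation are simplifying assumptions rather than the precise module of [41].

```python
import torch
import torch.nn as nn

class FeatureExpansion(nn.Module):
    """Sketch of an up-down-up feature expansion with error feedback."""
    def __init__(self, feat_dim=384, up_ratio=4):
        super().__init__()
        self.r = up_ratio
        self.up1 = nn.Linear(feat_dim, feat_dim)              # refine duplicated features -> Z_up
        self.down = nn.Linear(feat_dim * up_ratio, feat_dim)  # merge r copies back -> Z_t
        self.up2 = nn.Linear(feat_dim, feat_dim)              # refine the upsampled residual -> sigma_up
        # Fully connected layers regressing 3D coordinates from the expanded features.
        self.coord_head = nn.Sequential(
            nn.Linear(feat_dim * 2, 128), nn.ReLU(), nn.Linear(128, 3))

    def forward(self, key_feats):                             # key_feats Z: (B, K, C)
        B, K, C = key_feats.shape
        # Up: duplicate each key point feature r times and refine -> Z_up.
        z_up = self.up1(key_feats.repeat_interleave(self.r, dim=1))          # (B, rK, C)
        # Down: collapse the r copies of each point back to one feature -> Z_t.
        z_t = self.down(z_up.view(B, K, self.r * C))                         # (B, K, C)
        # Error feedback: residual between Z_t and the input features,
        # upsampled back to rK resolution -> sigma_up.
        sigma_up = self.up2((z_t - key_feats).repeat_interleave(self.r, dim=1))  # (B, rK, C)
        # Concatenate sigma_up with Z_up and regress dense 3D coordinates.
        return self.coord_head(torch.cat([z_up, sigma_up], dim=-1))          # (B, rK, 3)
```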

3.5. Loss Functions

In our work, to make both the local key points and the dense point cloud closer to the Ground Truth, we adopt the Chamfer Distance [42] as the reconstruction loss to evaluate the similarity. Specifically, the reconstruction loss is the Chamfer Distance between the key points $C$, the generated dense point cloud $D$, and the Ground Truth $S$:
$$L_{rec}(C) = d_{CD}(C, S) = \sum_{c_i \in C} \min_{s_j \in S} \left\| c_i - s_j \right\|_2^2 + \sum_{s_j \in S} \min_{c_i \in C} \left\| s_j - c_i \right\|_2^2,$$
$$L_{rec}(D) = d_{CD}(D, S) = \sum_{d_i \in D} \min_{s_j \in S} \left\| d_i - s_j \right\|_2^2 + \sum_{s_j \in S} \min_{d_i \in D} \left\| s_j - d_i \right\|_2^2,$$
where $c_i$ is a point in the key points $C$, $d_i$ is a point in the generated dense point cloud $D$, $s_j$ is a point in the Ground Truth $S$, and $\| \cdot \|_2$ denotes the L2 norm of a vector.
The reconstruction loss makes the dense point cloud fit the Ground Truth better. Furthermore, to improve the local uniformity of the dense point cloud, we add the repulsion loss [21], which encourages the generated points to be distributed more uniformly rather than clustering close to the original points:
$$L_{rep} = \sum_{i=0}^{N} \sum_{i' \in K_i} \eta\left( \left\| d_{i'} - d_i \right\|_2 \right) \omega\left( \left\| d_{i'} - d_i \right\|_2 \right),$$
where $K_i$ is the index set of the k-nearest neighbors of point $d_i$, $\eta(r) = -r$ is a decreasing function, and $\omega(r) = e^{-r^2 / h^2}$ [13,14] is a fast-decaying weight function; they are adopted to penalize $d_i$ when it is too close to its neighboring points in $K_i$. Here, $h$ is a hyperparameter of $\omega$, and we set it to 0.03 in our experiments.
Altogether, we train the network in an end-to-end manner by minimizing the following joint loss function:
$$L = L_{rec}(C) + L_{rec}(D) + a\, L_{rep},$$
where $a$ is the coefficient that balances the reconstruction loss and the repulsion loss.
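The following is a compact PyTorch sketch of these losses; the brute-force pairwise distances, the per-set averaging in the Chamfer term, the neighborhood size k, and the loss weight value are illustrative simplifications rather than our exact training configuration.

```python
import torch

def chamfer_distance(p, q):
    """Symmetric Chamfer Distance between point sets p (B, N, 3) and q (B, M, 3).
    The reconstruction loss above uses sums; here each term is averaged over its
    set, which is the common 'mean Chamfer Distance' variant."""
    d = torch.cdist(p, q) ** 2                       # (B, N, M) squared distances
    return d.min(dim=2).values.mean(dim=1) + d.min(dim=1).values.mean(dim=1)

def repulsion_loss(points, k=5, h=0.03):
    """Repulsion loss after PU-Net [21]: eta(r) = -r, omega(r) = exp(-r^2 / h^2),
    summed over the k nearest neighbors of every generated point (B, N, 3)."""
    d = torch.cdist(points, points)                  # (B, N, N) pairwise distances
    # k nearest neighbors, dropping the zero distance of each point to itself.
    knn_d = d.topk(k + 1, dim=2, largest=False).values[..., 1:]   # (B, N, k)
    return (-knn_d * torch.exp(-knn_d ** 2 / h ** 2)).sum(dim=(1, 2))

def total_loss(keypoints, dense, gt, a=0.01):
    """Joint loss L = L_rec(C) + L_rec(D) + a * L_rep; the weight a is illustrative."""
    return (chamfer_distance(keypoints, gt) +
            chamfer_distance(dense, gt) +
            a * repulsion_loss(dense)).mean()
```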

4. Experiments

In this section, we delineate the specific details of our experimental implementation and present the corresponding results. Section 4.1 describes the datasets used in our experiments and the experimental conditions. In Section 4.2, we conduct a comparative analysis between the proposed method and existing point cloud completion networks on a simple synthetic dataset, including both visualization results and numerical analyses. Section 4.3 illustrates the application of our proposed method to a real scanned point cloud dataset and compares it with existing methods. In Section 4.4, we show the experimental results on large-scale cultural heritage scanned point clouds, supplemented with a transparent visualization-based analysis of the reconstructed point clouds.

4.1. Datasets and Implementation Details

To effectively evaluate the performance of our proposed method in comparison with existing methods on the point cloud completion task, we performed experiments on the ShapeNet-55 synthetic dataset. Unlike previous point cloud completion datasets such as PCN, which were limited to only 8 categories, ShapeNet-55 offers a more extensive range by encompassing all 55 categories of ShapeNet. This broader selection facilitates a more comprehensive assessment of the model's capability across data with varied characteristics. We followed the strategy of [10] and divided all the data into training and validation sets at a ratio of 8:2. Consequently, 41,952 models were allocated for training, while the remaining 10,518 were reserved for evaluation. Specifically, for each Ground Truth comprising 16,384 points, we randomly select a viewpoint and then keep the 2048 points nearest to that viewpoint to obtain the partial point clouds.
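A minimal sketch of this partial-input generation step is shown below; placing the viewpoint on a unit sphere around the (assumed normalized) object is our assumption, and the protocol of [10] may differ in such details.

```python
import torch

def make_partial(gt, num_keep=2048):
    """Generate a partial input from a complete cloud gt (N, 3) by keeping the
    points nearest to a randomly chosen viewpoint; the remaining points form
    the 'missing' region that the network must complete."""
    view = torch.randn(3)
    view = view / view.norm()                 # random viewpoint on the unit sphere
    dist = (gt - view).norm(dim=1)            # distance of every point to the viewpoint
    idx = dist.topk(num_keep, largest=False).indices
    return gt[idx]
```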
To evaluate the processing efficacy of the proposed method with larger-scale real scanned point cloud data and address practical challenges encountered in real laser mapping scenarios, we collected and generated a new real scanned point cloud dataset. This dataset comprises two distinct categories: roof scanning data and ground scanning data. We initially generated partial point clouds for the roof scanning data utilizing the method employed in the ShapeNet-55. This approach emulates scenarios where distant objects remain unscanned by the laser during mapping. Subsequently, we manually generated 473 instances of partial point clouds to simulate occlusion effects. For the ground data, we selected 50 ground regions from our real scanned point cloud and manually created circular holes to emulate blind zones on the ground in mapping scenarios.
We implemented our network in PyTorch and trained it on a workstation with an Intel Xeon(R) Platinum 8255C CPU and a single NVIDIA GeForce RTX 3090 GPU. For the optimization, we trained the network for 200 epochs using the Adam algorithm [43] with a batch size of 32 and a learning rate of 0.0001.
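For reference, a minimal training-loop sketch matching this setup is given below; `model`, `loss_fn`, and `train_loader` are placeholders for the completion network, a joint loss such as the one sketched in Section 3.5, and a DataLoader yielding (partial, ground-truth) batches, and are not names from our implementation.

```python
import torch

def train(model, loss_fn, train_loader, device="cuda"):
    """Train for 200 epochs with Adam (lr 1e-4), as in the stated configuration;
    the batch size of 32 is assumed to be set in the DataLoader."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    model.to(device).train()
    for epoch in range(200):
        for partial, gt in train_loader:
            partial, gt = partial.to(device), gt.to(device)
            keypoints, dense = model(partial)      # coarse key points + dense output
            loss = loss_fn(keypoints, dense, gt)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```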
To quantitatively evaluate the quality of the reconstructed point clouds, we employed the following metrics. First, following existing work [7,9,44], we used the mean Chamfer Distance, which measures the distance between the predicted point cloud and the Ground Truth at the set level. To evaluate the reconstruction quality of the final complete point cloud, we adopted the normalized uniformity coefficient (NUC) [21] to measure the uniformity of the point density distribution of the results. In addition, for real scanned point clouds, we adopted the Hausdorff distance [45] and the Cloud-to-Cloud (C2C) distance [46] to evaluate the similarity between the final complete point cloud and the Ground Truth.
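For clarity, brute-force sketches of the Hausdorff and C2C metrics are given below; production tools such as CloudCompare use spatial indexing and may report additional statistics, so these are illustrative definitions only.

```python
import torch

def hausdorff_distance(p, q):
    """Symmetric Hausdorff distance between point sets p (N, 3) and q (M, 3):
    the largest nearest-neighbor distance in either direction."""
    d = torch.cdist(p, q)                                        # (N, M)
    return torch.max(d.min(dim=1).values.max(), d.min(dim=0).values.max())

def cloud_to_cloud(p, q):
    """One-directional C2C distance: mean nearest-neighbor distance from each
    point of the compared cloud p to the reference cloud q."""
    return torch.cdist(p, q).min(dim=1).values.mean()
```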

4.2. Synthetic Dataset Completion Results

We first conducted qualitative and quantitative comparison experiments with the existing point cloud completion networks GRNet [9] and PoinTr [10] on the simple synthetic dataset ShapeNet-55. For a fair comparison, we deployed the existing methods on the same workstation using their open-source code and ran the experiments with the best hyperparameters provided in their papers. First, we compared the performance of the proposed method with the existing methods on the overall point cloud shape completion task. Figure 2 shows the visualization results for six different categories of point clouds selected from ShapeNet-55. From Figure 2, we can see that compared with GRNet, both PoinTr and the proposed method recover the shape of the missing regions to a higher degree and generate more new points in the regions where the shape was originally missing. Meanwhile, although there is little difference between the shape-completion results of the proposed method and PoinTr, the new points generated by the proposed method in the missing regions have a more uniform point density distribution.
Table 1 presents the numerical evaluation results of the proposed method and the existing methods on the whole dataset. Our proposed method improves on GRNet in every category and in the final average value. Meanwhile, the performance of the proposed method is essentially at the same level as the current SOTA method PoinTr, and the numerical results of the proposed method are even better on slightly more complex data such as the airplane category. In addition, among the eight categories of data we selected, the categories in the first four columns contain more than 1800 samples each, while the categories in the last four columns contain no more than 300 samples each. Nevertheless, the final numerical results show no significant fluctuation between the first four columns and the last four columns, which indicates that the difference in the number of training samples does not significantly affect the final point cloud shape completion quality. This demonstrates that the proposed method is also robust to datasets with imbalanced numbers of samples.
We also evaluated the uniformity of the point density distribution of the final reconstructed point clouds on ShapeNet-55, calculated using the previously mentioned NUC metric. Table 2 lists the results of the quantitative comparison between the proposed method and the existing methods. Referring to the definition in [21], we set different values of p, the percentage of a sampled patch of a single object over the total object surface area, to evaluate the uniformity of the results. Note that we use a smaller p to evaluate local distribution uniformity in small regions, while larger values of p are used to compute NUC results that evaluate global uniformity. Since PoinTr adopts FoldingNet for the reconstruction of coarse point clouds into dense point clouds, it may underfit when dealing with complex local shape variations, which can cause the model to fail to accurately reconstruct or encode the point cloud data in some cases. In contrast, since the proposed method incorporates multi-dimensional feature information in the upsampling module and uses the repulsion-loss function to prevent aggregation of the newly generated points, it generates new points with better uniformity in the missing regions, especially in terms of local uniformity.
To evaluate the processing performance of the proposed method and the existing methods on point clouds of different scales, we randomly selected 1000 samples from ShapeNet-55, generated datasets of different scales (the number of points contained in a single sample ranges from 8192 to 524,288) by controlling the sampling rate, and then measured the time required to train the three methods on each dataset. Figure 3 shows the training time of the proposed method and the existing methods at different data scales. We can see from Figure 3 that when the point cloud is relatively small, there is little difference in processing performance among the three methods. This is because the kNN algorithm adopted by PoinTr has a low search complexity when dealing with data with a small number of points, and it does not take much time to calculate the distances between points. However, as the data size increases, the search complexity grows linearly, and calculating the distances between the target point and all other points becomes very time-consuming. Loading all the points into memory may also lead to excessive memory consumption, even beyond the limit of available memory. Compared with PoinTr, which adopts the kNN algorithm to extract the features of the input point cloud, the proposed method adopts voxelization to effectively compress the data scale of the input point cloud, which copes better with large-scale point clouds by reducing the amount of computation. As a result, under the same experimental conditions, the proposed method can handle point clouds containing up to about 550,000 points, while PoinTr and GRNet fail to train due to running out of memory once the number of points exceeds 250,000. Moreover, in terms of computational efficiency, the computation time of the proposed method grows more slowly as the number of points increases, demonstrating the proposed method's robustness in dealing with point clouds of different data scales.

4.3. Results on Real Scanned Point Cloud Datasets

To apply the proposed method to large-scale laser-scanned point cloud data from real scenes, we collected and generated a new dataset, as shown in Figure 4, which contains two categories of data that require point cloud completion in real surveying and mapping scenes. First, when scanning an outdoor building with a laser, the laser emitted by a scanner set up on the ground often has difficulty reaching the roof region, which causes the roof region in the obtained raw point cloud to be incomplete or sparse. Therefore, we collected and generated a set of Japanese-style roof scanning data to help solve this problem at Japanese-style cultural heritage sites with similar structures. Second, due to the blind spot in the field of view caused by the design of the laser scanner, circular holes appear in the scanned data on the ground where the scanner is positioned. This issue adversely impacts visualization quality. Consequently, we isolated the ground area from our scanned point cloud data and generated a category of data for completing circular holes in the ground.
Figure 5 shows the results of the proposed method and PoinTr on the real scanned dataset. From Figure 5, for incomplete point clouds of the roof region, both methods can recover the shape of the roof effectively. Although there are a few outliers at the edge contours, such outliers are almost unrecognizable in the visualization of large-scale scanned point clouds of outdoor scenes. Compared with the simple synthetic dataset ShapeNet-55, the real roof scanning data has more complex shapes and more points, and the proposed method can complete the point cloud completion task on these two very different types of datasets, which proves its effectiveness in dealing with point clouds of more diverse shapes.
We also quantitatively evaluated the performance of the proposed method and PoinTr on the real scanned datasets. The results of the quantitative evaluation of individual data and the whole dataset are shown in Table 3, where Data 1 is the point cloud of the roof with the eave parts in Figure 5, Data 2 is the point cloud of the flatter roof in Figure 5, and Data 3 is the point cloud of the ground in Figure 5. We can see from Table 3 that PoinTr is better than the proposed method in the C2C Distance and Chamfer Distance metrics. This is because the voxelization in the proposed method loses some features when dealing with large-scale point clouds, which affects the quality of the final point cloud completion. However, the gap between the proposed method and PoinTr in these two metrics is not large. We consider such a gap acceptable when dealing with large-scale point cloud data from real scenes, since such small differences are almost unrecognizable in visualization. It should also be noted that the proposed method shows a significant improvement over PoinTr in the Hausdorff Distance metric, which means that the shape of the result completed by the proposed method is closer to the real data. This is because, compared with FoldingNet, which has a larger error, the upsampling module we adopted can generate new points that are closer to the real surface.
On the real scanned point cloud dataset, we also used the NUC metric to quantitatively evaluate the quality of the point cloud completion. Table 4 lists the uniformity of the point density distributions of the two methods in local regions of different sizes. From Table 4, we can see that the point density distributions of the completed point clouds obtained by the proposed method are more uniform, both in smaller local areas and in the larger global area. Moreover, in our transparent visualization method, since the opacity is affected by the point density, a point cloud with a non-uniform point density distribution will have the problem that some regions are opaque while others are too transparent in the transparent visualization. Therefore, the completed point cloud with a more uniform point density distribution obtained by the proposed method effectively improves its visibility in transparent visualization.

4.4. Application to Large-Scale Cultural Heritage Scanning Data

In this subsection, we show the results of applying the proposed method to large-scale laser-scanned point cloud data of cultural heritage sites. In Section 4.4.1, we show the point cloud completion results and transparent visualization results for the roof region in the Waraku-an scanned data, and in Section 4.4.2, we apply the proposed method to the ground holes caused by the field-of-view blind spot in the Zuiganji Temple scanned data and show the transparent visualization results.

4.4.1. Results of Completion and Visualization for Waraku-an

Waraku-an is a teahouse located in Nijo Castle in Kyoto, a UNESCO World Heritage Site. Today, Waraku-an serves as a venue for tea ceremonies held for Japanese and overseas dignitaries and guests. It also hosts the annual grand autumn tea ceremony organized by Kyoto City for members of the public. Figure 6 shows the original point cloud data of Waraku-an obtained using a ground-based laser scanner (Z+F IMAGER 5016). Due to the limited field of view of the laser scanner on the ground, the laser cannot fully reach the higher roof regions. This results in many missing or sparse areas in the roof region of the original point cloud. Therefore, applying the proposed method to the roof regions of Waraku-an is necessary to reconstruct the complete roof shape and thus improve visibility.
To improve the point cloud completion accuracy, we adopt the strategy of splitting the large-scale point cloud data into separate parts, processing each part, and then fusing the results, as sketched below. As shown in Figure 7, we segment the roof region in the Waraku-an scanned data for separate processing. Specifically, we take the beam in the center as the boundary and split the roof in Figure 7a into front and back point clouds for separate completion. For the roof point cloud shown in Figure 7b, considering that it contains different structures, we divided it into the eave regions on both sides and the roof region in the center for separate processing.
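The sketch below illustrates this split-process-fuse strategy in its simplest form; approximating the central beam by a plane at a fixed coordinate, the choice of the splitting axis, and the placeholder `complete_fn` (standing in for the trained completion network) are all assumptions for illustration.

```python
import numpy as np

def split_complete_fuse(roof_points, boundary, complete_fn, axis=1):
    """Split a roof cloud (N, 3) into two halves along a boundary plane,
    complete each half separately, then fuse the completed parts."""
    mask = roof_points[:, axis] < boundary
    parts = [roof_points[mask], roof_points[~mask]]
    completed = [complete_fn(part) for part in parts]
    # Fuse the completed halves (the original points can be merged in as well).
    return np.concatenate(completed, axis=0)
```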
Figure 8 shows the fusion result obtained by applying the proposed method to the roof data in Figure 7. Note that our point cloud completion network does not use the RGB features of the original input point cloud, so all the output point clouds have no color information; that is, all the reconstructed point clouds are white. Here, we set the obtained white roof point cloud to a color similar to that of the original scanned roof data for better visibility in the transparent visualization. From Figure 8, we can see that the missing parts of the original roof are completed, yielding a roof with a complete shape. Furthermore, a traditional roof is composed of tiles stacked on top of each other, and the undulating tiles occlude one another from the laser, leaving gaps in the scan. By applying the proposed method, the gaps between the tiles are filled while the roof retains its undulating shape.
We fuse the reconstructed roof data with the original point cloud to compare the enhancement in the visibility of the real laser-scanned point cloud by applying the proposed method from different viewpoints. Figure 9 shows the results of viewing the original scanned point cloud from the front and the bird’s-eye viewpoint. In this case, there are a lot of gaps in the roof area due to the mutual shading of the tiles, resulting in the shape of the roof being difficult to discern in the visualization; even the interior of the house can be directly seen when viewed from the bird’s-eye viewpoint.
Figure 10 shows the fusion visualization results after applying the proposed method. We can see that after the completion, numerous new points are generated in the roof region to fill the gaps between the tiles, which improves the perception of the roof region when viewed from the bird’s-eye viewpoint. In addition, for the missing parts at the eaves, the complete shape of the eaves is restored after the completion, which improves the recognizability of the roof region in visualization.
Figure 11 and Figure 12 show the enhancement of visibility before and after applying the proposed method in the transparent visualization. We can see that the gaps in the roof regions and the sparse point density regions in the original data are almost unrecognizable in the transparent visualization. Although this helps observe the internal structure of the house, it negatively affects the perception of the overall structure of the house. After point cloud completion, the opacity of the roof regions is improved due to the new points generated in the roof regions, thus improving their visibility in the transparent visualization.
Figure 13 shows the fused visualization results for the entire Waraku-an scanned data. Compared to the original scanned data, we consider that the transparent visualization results after applying the proposed method are more realistic in restoring the appearance of Waraku-an and more conducive to the understanding of the overall structure of Waraku-an, which proves that the proposed method can effectively enhance the comprehensibility of the large-scale point cloud with missing regions in the transparent visualization.

4.4.2. Results of Completion and Visualization for Zuiganji Temple

Figure 14 shows a laser-scanned point cloud of the cave site group of Zuiganji Temple, a National Treasure of Japan. This point cloud data was scanned using a FARO Focus 3D S120 laser scanner. Unlike the case of outdoor buildings, for a cave site, we consider that the ground should also be recognized as part of the cultural heritage. Thus, completing the holes generated in the ground during laser scanning is necessary.
As shown in Figure 14, most ground holes are inside the caves, making them difficult to observe in opaque conditions. To overcome this, we applied transparent visualization to the original scanned point cloud. This allowed us to observe the internal and external structure of the entire cave site group and to realize the negative impact of ground holes on visibility. The transparent visualization of the original laser-scanned point cloud is presented in Figure 15.
Performing point cloud completion directly on the entire original large-scale scanned point cloud is impractical. Therefore, we manually extracted the regions containing ground holes for processing. The results of applying the proposed method to the extracted ground holes are shown in Figure 16. The proposed method effectively completes the missing regions for data without complex structures, such as ground holes. Furthermore, the point density distribution of the completed results is relatively uniform.
Figure 17 shows the fused transparent visualization of the completed point cloud for Zuiganji Temple. Compared with the original transparent visualization result shown in Figure 15, the ground holes of Zuiganji Temple are completed after applying the proposed method, so the original ground hole areas become less abrupt, and the internal structure of the caves is easier to understand in the visualization result.

5. Discussion

We innovatively integrated point cloud voxelization techniques with Transformer architectures and proposed a new solution for the task of missing region reconstruction in large-scale point cloud data. Experimental results demonstrate that our proposed method, without compromising point cloud completion accuracy, achieves efficient processing of large-scale point cloud data compared to existing point cloud completion models. Specifically, under equivalent experimental conditions, the proposed method performs point cloud completion significantly faster and handles point cloud data at scales far exceeding those processable by existing methods. This is because the proposed method refrains from extracting per-point features from the original input point clouds, which incurs substantial memory consumption and processing time in large-scale scenarios. Instead, by adopting voxelization, we extract the features of the centroid within each voxel grid and encode its position for input into the Transformer block for key point prediction, thus reducing the memory and time requirements of feature extraction. Notably, the experimental results showed that this strategy does not degrade the final completion quality. For data with a large number of points, point-wise features are often redundant, and representation through key point features permits swift characterization. Furthermore, evaluations on both synthetic and real-world laser-scanned datasets showed that the proposed method yields reconstructions of better quality, which is manifested in the following ways: (1) the point density distribution is more uniform; compared with a simple FoldingNet-based upsampling module, the proposed method realizes dense point cloud reconstruction through a feature-expansion module and uses a repulsion-loss function to avoid local aggregation of the generated points, resulting in better uniformity of the point density distribution in the final result; and (2) owing to this better uniformity, the visibility enhancement of the original data in transparent visualization is greater.
Meanwhile, our work still has some limitations. First, the definition of missing regions is still unclear. Both in the production of the dataset and in the comparison experiments, we cut out the missing regions from the original scanned point cloud manually. During this manual cutting process, the extent of the point cloud that needs to contain the missing regions is influenced by human factors, which has a certain impact on the point cloud completion task in different scenarios. Second, the generalizability of the real scanned point cloud dataset is limited. Currently, only traditional roof and ground scenes are selected from the laser-scanned point clouds we have for point cloud completion, which limits the applicability to other cultural heritage objects. Therefore, the main focuses of our future work are clarifying the definition of missing regions, developing automated missing-region identification methods with less human intervention to improve efficiency and consistency, and investigating how to make the model more robust to the characteristics of different types of cultural heritage, including, but not limited to, material, morphology, and degree of damage, to improve the generalizability of the model.

6. Conclusions

In this paper, we proposed a new method based on point cloud voxelization and the Transformer architecture for the task of completing missing regions in large-scale point cloud data. Traditional point cloud completion methods are often limited by the efficiency bottleneck of large-scale data processing, especially when dealing with complex, high-density point cloud data such as cultural heritage sites, where memory occupation and computation time become insurmountable obstacles. The combination of voxelization and the Transformer proposed in our work effectively alleviates this problem. Reducing the complexity of feature extraction through the voxelization strategy while utilizing the powerful pattern-recognition capability of the Transformer for key point prediction not only ensures completion accuracy, but also improves the processing speed and the upper limit of the processable data size. Experimental results show that the centroid-based features still characterize the point cloud well, even though point-by-point feature extraction is not performed for each point. We take advantage of this to achieve an efficient feature representation by reducing unnecessary computation. In addition, the application of the feature-expansion module and the repulsion-loss function further optimizes the uniformity of the point density distribution of the generated point cloud, which is crucial for the restoration of cultural heritage as it ensures the authenticity and integrity of the restored structures and improves visibility under transparent visualization. In the future, we expect to expand the size of our real scanned point cloud dataset so that high-quality point cloud completion of missing regions can be realized in more cultural heritage scenarios. We consider that our work not only pushes forward the advancement of point cloud processing at the technical level, but also provides technical support for the digital preservation and restoration of cultural heritage.

Author Contributions

Conceptualization, W.L., K.H., L.L. and S.T.; methodology, W.L., J.P., K.H., L.L. and S.T.; software, W.L.; validation, W.L.; formal analysis, W.L.; investigation, W.L.; resources, S.T.; data curation, W.L. and K.H.; writing—original draft preparation, W.L.; writing—review and editing, W.L., K.H., L.L. and S.T.; visualization, W.L.; supervision, L.L. and S.T.; project administration, S.T.; funding acquisition, W.L. and S.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Research Project of Chongqing Municipal Education Commission of China, grant number KJQN202200815; the High-level Talent Research Start-up Fund of Chongqing Technology and Business University, grant number 2256019; and JSPS KAKENHI, grant number 21H04903.

Data Availability Statement

The numerical analysis data presented in this paper are available on request from the corresponding author. The point cloud data are not publicly available because they involve cultural heritage sites and cannot be released without authorization.

Acknowledgments

The authors would like to thank Hiroshi Yamaguchi of Nara National Research Institute for Cultural Properties, the Zuiganji Temple, Matsushima-cho, and Shrewd Design Co., Ltd. for their cooperation in executing the 3D scanning.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Gomes, L.; Bellon, O.R.P.; Silva, L. 3D reconstruction methods for digital preservation of cultural heritage: A survey. Pattern Recognit. Lett. 2014, 50, 3–14.
  2. Stanco, F.; Battiato, S.; Gallo, G. Digital imaging for cultural heritage preservation. In Analysis, Restoration, and Reconstruction of Ancient Artworks; CRC Press: Boca Raton, FL, USA, 2011.
  3. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660.
  4. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 2017, 30, 5105–5114.
  5. Hackel, T.; Savinov, N.; Ladicky, L.; Wegner, J.D.; Schindler, K.; Pollefeys, M. Semantic3D.net: A new Large-scale Point Cloud Classification Benchmark. arXiv 2017, arXiv:1704.03847.
  6. Weinmann, M.; Jutzi, B.; Hinz, S.; Mallet, C. Semantic point cloud interpretation based on optimal neighborhoods, relevant features and efficient classifiers. ISPRS J. Photogramm. Remote Sens. 2015, 105, 286–304.
  7. Huang, Z.; Yu, Y.; Xu, J.; Ni, F.; Le, X. Pf-net: Point fractal network for 3d point cloud completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 7662–7670.
  8. Wen, X.; Li, T.; Han, Z.; Liu, Y.-S. Point cloud completion by skip-attention network with hierarchical folding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1939–1948.
  9. Xie, H.; Yao, H.; Zhou, S.; Mao, J.; Zhang, S.; Sun, W. Grnet: Gridding residual network for dense point cloud completion. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 365–381.
  10. Yu, X.; Rao, Y.; Wang, Z.; Liu, Z.; Lu, J.; Zhou, J. Pointr: Diverse point cloud completion with geometry-aware transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 12498–12507.
  11. Pan, L.; Chen, X.; Cai, Z.; Zhang, J.; Zhao, H.; Yi, S.; Liu, Z. Variational relational point completion network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 8524–8533.
  12. Alexa, M.; Behr, J.; Cohen-Or, D.; Fleishman, S.; Levin, D.; Silva, C.T. Computing and rendering point set surfaces. IEEE Trans. Vis. Comput. Graph. 2003, 9, 3–15.
  13. Lipman, Y.; Cohen-Or, D.; Levin, D.; Tal-Ezer, H. Parameterization-free projection for geometry reconstruction. ACM Trans. Graph. (TOG) 2007, 26, 22-es.
  14. Huang, H.; Li, D.; Zhang, H.; Ascher, U.; Cohen-Or, D. Consolidation of unorganized point clouds for surface reconstruction. ACM Trans. Graph. (TOG) 2009, 28, 1–7.
  15. Qin, Z.; Yu, H.; Wang, C.; Guo, Y.; Peng, Y.; Xu, K. Geometric transformer for fast and robust point cloud registration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11143–11152.
  16. Bai, X.; Luo, Z.; Zhou, L.; Chen, H.; Li, L.; Hu, Z.; Fu, H.; Tai, C.-L. Pointdsc: Robust point cloud registration using deep spatial consistency. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 15859–15869.
  17. Zhang, Y.; Rabbat, M. A graph-cnn for 3d point cloud classification. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 6279–6283.
  18. Liu, Y.; Fan, B.; Xiang, S.; Pan, C. Relation-shape convolutional neural network for point cloud analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 8895–8904.
  19. Landrieu, L.; Simonovsky, M. Large-scale point cloud semantic segmentation with superpoint graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4558–4567.
  20. Lai, X.; Liu, J.; Jiang, L.; Wang, L.; Zhao, H.; Liu, S.; Qi, X.; Jia, J. Stratified transformer for 3d point cloud segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8500–8509.
  21. Yu, L.; Li, X.; Fu, C.-W.; Cohen-Or, D.; Heng, P.-A. Pu-net: Point cloud upsampling network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2790–2799.
  22. Yifan, W.; Wu, S.; Huang, H.; Cohen-Or, D.; Sorkine-Hornung, O. Patch-based progressive 3d point set upsampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5958–5967.
  23. Li, R.; Li, X.; Fu, C.-W.; Cohen-Or, D.; Heng, P.-A. Pu-gan: A point cloud upsampling adversarial network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7203–7212.
  24. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680.
  25. Qian, Y.; Hou, J.; Kwong, S.; He, Y. PUGeo-Net: A geometry-centric network for 3D point cloud upsampling. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 752–769.
  26. Chang, A.X.; Funkhouser, T.; Guibas, L.; Hanrahan, P.; Huang, Q.; Li, Z.; Savarese, S.; Savva, M.; Song, S.; Su, H. Shapenet: An information-rich 3d model repository. arXiv 2015, arXiv:1512.03012.
  27. Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1912–1920. [Google Scholar]
  28. Yin, T.; Zhou, X.; Krähenbühl, P. Multimodal virtual point 3d detection. Adv. Neural Inf. Process. Syst. 2021, 34, 16494–16507. [Google Scholar]
  29. Gross, M.; Pfister, H. Point-Based Graphics; Elsevier: Amsterdam, The Netherlands, 2011. [Google Scholar]
  30. Aicardi, I.; Chiabrando, F.; Lingua, A.M.; Noardo, F. Recent trends in cultural heritage 3D survey: The photogrammetric computer vision approach. J. Cult. Herit. 2018, 32, 257–266. [Google Scholar] [CrossRef]
  31. Kersten, T.P.; Keller, F.; Saenger, J.; Schiewe, J. Automated generation of an historic 4D city model of Hamburg and its visualisation with the GE engine. In Proceedings of the Progress in Cultural Heritage Preservation: 4th International Conference, EuroMed 2012, Limassol, Cyprus, 29 October–3 November 2012; pp. 55–65. [Google Scholar]
  32. Dylla, K.; Frischer, B.; Müller, P.; Ulmer, A.; Haegler, S. Rome reborn 2.0: A case study of virtual city reconstruction using procedural modeling techniques. In Proceedings of the Computer Applications and Quantitative Methods in Archaeology, Williamsburg, VA, USA, 22–26 March 2010; pp. 62–66. [Google Scholar]
  33. Tanaka, S.; Hasegawa, K.; Okamoto, N.; Umegaki, R.; Wang, S.; Uemura, M.; Okamoto, A.; Koyamada, K. See-through imaging of laser-scanned 3D cultural heritage objects based on stochastic rendering of large-scale point clouds. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 3, 73–80. [Google Scholar] [CrossRef]
  34. Tanaka, S.; Hasegawa, K.; Shimokubo, Y.; Kaneko, T.; Kawamura, T.; Nakata, S.; Ojima, S.; Sakamoto, N.; Tanaka, H.T.; Koyamada, K. Particle-Based Transparent Rendering of Implicit Surfaces and its Application to Fused Visualization. In Proceedings of the EuroVis (Short Papers), Vienna, Austria, 5–8 June 2012; pp. 25–29. [Google Scholar]
  35. Uchida, T.; Hasegawa, K.; Li, L.; Adachi, M.; Yamaguchi, H.; Thufail, F.I.; Riyanto, S.; Okamoto, A.; Tanaka, S. Noise-robust transparent visualization of large-scale point clouds acquired by laser scanning. ISPRS J. Photogramm. Remote Sens. 2020, 161, 124–134. [Google Scholar] [CrossRef]
  36. Choy, C.; Gwak, J.; Savarese, S. 4d spatio-temporal convnets: Minkowski convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3075–3084. [Google Scholar]
  37. Park, C.; Jeong, Y.; Cho, M.; Park, J. Fast point transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 16949–16958. [Google Scholar]
  38. Bello, I.; Zoph, B.; Vaswani, A.; Shlens, J.; Le, Q.V. Attention augmented convolutional networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3286–3295. [Google Scholar]
  39. Yuan, W.; Khot, T.; Held, D.; Mertz, C.; Hebert, M. Pcn: Point completion network. In Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy, 5–8 September 2018; pp. 728–737. [Google Scholar]
  40. Yang, Y.; Feng, C.; Shen, Y.; Tian, D. Foldingnet: Point cloud auto-encoder via deep grid deformation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 206–215. [Google Scholar]
  41. Li, W.; Hasegawa, K.; Li, L.; Tsukamoto, A.; Tanaka, S. Deep Learning-Based Point Upsampling for Edge Enhancement of 3D-Scanned Data and Its Application to Transparent Visualization. Remote Sens. 2021, 13, 2526. [Google Scholar] [CrossRef]
  42. Fan, H.; Su, H.; Guibas, L.J. A point set generation network for 3d object reconstruction from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 605–613. [Google Scholar]
  43. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 5–8 May 2015. [Google Scholar]
  44. Tchapmi, L.P.; Kosaraju, V.; Rezatofighi, H.; Reid, I.; Savarese, S. Topnet: Structural point cloud decoder. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 383–392. [Google Scholar]
  45. Berger, M.; Levine, J.A.; Nonato, L.G.; Taubin, G.; Silva, C.T. A benchmark for surface reconstruction. ACM Trans. Graph. (TOG) 2013, 32, 1–17. [Google Scholar] [CrossRef]
  46. Lague, D.; Brodu, N.; Leroux, J. Accurate 3D comparison of complex topography with terrestrial laser scanner: Application to the Rangitikei canyon (NZ). ISPRS J. Photogramm. Remote Sens. 2013, 82, 10–26. [Google Scholar] [CrossRef]
Figure 1. The architecture of the proposed point cloud completion network. The network takes an incomplete point cloud as input and predicts a dense point cloud with a complete shape through centroid-aware feature extraction, a geometry-aware transformer block, and point cloud upsampling. The blue points in the bottom left are input points, and the colored squares are non-empty voxels created by voxelization. The red triangles are the centroids of the non-empty voxels together with their features.
Figure 2. Visualization results of the proposed method and existing methods on ShapeNet-55. We selected point cloud data from six different categories (chair, display, lamp, airplane, table, and bathtub) for comparison, where the table and bathtub categories contain more training data, while the display and chair categories contain less training data.
Figure 3. Comparison of the training time required on point clouds of different scales.
Figure 4. The real scanned point cloud datasets that we collected and generated. (a) is roof data with a simple structure, (b) is roof data containing complex structures such as eaves, and (c) is a ground region extracted from a real laser-scanned point cloud.
Figure 5. The visualization results of the proposed method and the SOTA method PoinTr on our collected dataset. The first row is a roof point cloud with a complex structure, the second row is a roof point cloud with a simple shape, and the third row is a ground region extracted from the laser-scanned point cloud.
Figure 6. Original laser-scanned point cloud data of Waraku-an.
Figure 7. Segmented data of the roof areas from the Waraku-an laser-scanned point cloud. (a) is the roof data of the house located to the southeast, and (b) is the roof data of the house located to the northwest.
Figure 8. Point cloud completion results for the Waraku-an roof regions. (a) is the completion result obtained by applying the proposed method to Figure 7a, and (b) is the completion result based on Figure 7b.
Figure 9. Visualization results of the original scanned point cloud of Waraku-an from different viewpoints. (a) is the southeast roof data observed from the bird’s-eye viewpoint and the front direction, and (b) is the northwest roof data observed from the bird’s-eye viewpoint and the front direction.
Figure 10. The fusion visualization results after applying the proposed method. We keep the same viewpoints as in Figure 9. (a) is the result of southeast roof data, and (b) is the result of northwest roof data.
Figure 11. Fused transparent visualization results of the house located to the northwest in Waraku-an. (a) is the transparent visualization of only the original laser-scanned point cloud, and (b) is the transparent visualization after fusing the point cloud completion result with the original point cloud.
Figure 12. Fused transparent visualization results of the house located to the southeast in Waraku-an. (a) is the transparent visualization of only the original laser-scanned point cloud, and (b) is the transparent visualization after fusing the point cloud completion result with the original point cloud.
Figure 13. Fused transparent visualization results for the entire Waraku-an. (a) is the transparent visualization of only the original laser-scanned point cloud shown in Figure 6, and (b) is the transparent visualization after fusing the point cloud reconstructed by the proposed method with the original point cloud.
Figure 14. The 3D laser-scanned point cloud data of the Zuiganji Temple cave site group, which is a Japanese National Treasure located in Miyagi Prefecture.
Figure 15. The transparent visualization of the Zuiganji Temple cave site group. While the internal structure becomes visible, the ground holes hinder the understanding of the overall structure.
Figure 16. The results of applying the proposed method to the ground holes. The top row shows ground holes manually extracted from the original laser-scanned point cloud, and the bottom row shows the corresponding completion results obtained by applying the proposed method.
Figure 17. The fused transparent visualization of the original laser-scanned point cloud of the Zuiganji Temple cave site group with the ground hole completion results.
Table 1. Numerical evaluation of the Chamfer distance on the synthetic ShapeNet-55 dataset for the proposed method and existing methods.
Method    Table   Bottle   Airplane   Bathtub   Bed    Lamp   Piano   Sofa   Overall
GRNet     3.86    4.53     5.87       3.41      5.63   4.85   2.89    3.51   3.06
PoinTr    0.95    2.03     3.16       1.14      2.84   1.58   0.74    1.29   1.36
Ours      1.24    2.38     2.97       1.30      3.05   1.81   1.12    1.62   1.57
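For readers reproducing the evaluation in Table 1, the sketch below shows one common way to compute a symmetric Chamfer distance between a completed point cloud and its ground truth. It is not the authors' implementation; the squared-L2 form, the averaging over both directions, and the use of SciPy's cKDTree are assumptions.

```python
# Minimal Chamfer distance sketch (assumed squared-L2 variant, not the authors' code).
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """pred: (N, 3) completed point cloud; gt: (M, 3) ground-truth point cloud."""
    d_pred_to_gt, _ = cKDTree(gt).query(pred)   # nearest-neighbour distance for each predicted point
    d_gt_to_pred, _ = cKDTree(pred).query(gt)   # nearest-neighbour distance for each ground-truth point
    return float(np.mean(d_pred_to_gt ** 2) + np.mean(d_gt_to_pred ** 2))
```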
Table 2. Quantitative comparison of the uniformity of point distribution between the proposed method and existing methods.
Method    NUC with different p (×10⁻³)
          0.4%     0.6%     0.8%     1.0%     1.2%
GRNet     42.71    44.59    47.35    48.91    51.22
PoinTr    16.83    17.28    18.39    20.47    21.30
Ours      13.03    13.11    13.84    14.58    15.21
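The NUC values in Tables 2 and 4 follow the normalized uniformity coefficient introduced with PU-Net [21], which measures how evenly points cover disks whose area is a fraction p of the object surface. The sketch below is a simplified Euclidean approximation, not the metric's reference implementation: the geodesic disks of the original definition are replaced by ball queries, and the surface area must be supplied by the caller.

```python
# Simplified NUC-style uniformity statistic (Euclidean approximation; the
# original metric [21] uses geodesic disks on the reconstructed surface).
import numpy as np
from scipy.spatial import cKDTree

def nuc_approx(points: np.ndarray, p: float, surface_area: float,
               n_disks: int = 1000, seed: int = 0) -> float:
    """points: (N, 3) xyz array; p: disk-area ratio, e.g. 0.004 for 0.4%."""
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), size=n_disks, replace=False)]
    radius = np.sqrt(p * surface_area / np.pi)                 # disk of area p * S
    counts = cKDTree(points).query_ball_point(centres, r=radius, return_length=True)
    normalised = np.asarray(counts, dtype=float) / (len(points) * p)  # ~1 for a uniform cloud
    return float(np.sqrt(np.mean((normalised - normalised.mean()) ** 2)))
```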
Table 3. Quantitative evaluation results of the proposed method and existing methods on real scanning datasets.
          C2C Distance (×10⁻³)                  Hausdorff Distance (×10⁻³)            Chamfer Distance (×10⁻³)
Method    Data_1   Data_2   Data_3   Overall    Data_1   Data_2   Data_3   Overall    Data_1   Data_2   Data_3   Overall
PoinTr    2.59     2.19     1.93     3.01       4.79     8.67     2.97     7.26       4.29     3.68     2.58     4.14
Ours      2.67     2.93     2.07     3.13       3.54     6.24     1.06     5.11       4.81     5.05     2.91     4.43
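Table 3 additionally reports cloud-to-cloud (C2C) and Hausdorff distances on the real scanned data. As a hedged reference, the sketch below computes them in the usual one-sided form, i.e. nearest-neighbour distances from the completed cloud to the reference cloud, averaged for C2C [46] and maximised for Hausdorff; whether the paper uses the one-sided or symmetric variants is an assumption here.

```python
# One-sided C2C and Hausdorff distances (assumed variants, not the authors' code).
import numpy as np
from scipy.spatial import cKDTree

def c2c_and_hausdorff(compared: np.ndarray, reference: np.ndarray) -> tuple[float, float]:
    """compared: (N, 3) completed cloud; reference: (M, 3) reference cloud."""
    d, _ = cKDTree(reference).query(compared)  # nearest-neighbour distance per point
    return float(d.mean()), float(d.max())     # (mean -> C2C, max -> Hausdorff)
```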
Table 4. Results of quantitative comparison between the proposed method and the existing method in the uniformity of point distribution.
Method    NUC with different p (×10⁻²)
          0.4%    0.6%    0.8%    1.0%    1.2%
PoinTr    5.33    5.61    5.82    6.17    6.42
Ours      4.75    4.92    5.11    5.40    5.73
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
