1. Introduction
Point cloud denoising is a widely used data processing technique in fields such as 3D reconstruction, indoor positioning, intelligent manufacturing, virtual and augmented reality, and geological exploration and seismic research [1]. It improves the quality and accuracy of point cloud data and thereby provides a reliable data foundation for downstream applications. Point cloud denoising aims to effectively eliminate noise points, smooth the reconstructed surface model, and preserve the original topology and geometric characteristics of the sampled surface. When the density of points in a local area of the point cloud is significantly higher or lower than that of the surrounding points, the area is referred to as high-density or low-density noise, respectively. High-density noise can be detected by computing the density or number of points in a local neighborhood, for example via nearest-neighbor distances or the K-nearest-neighbor method. In specific applications, the thresholds separating high-density and low-density noise can be established through experiments and tuned to actual requirements.
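As a concrete illustration of the neighborhood-density idea above, the following Python sketch (not from the paper; the function name, k, and the threshold factor are our own illustrative choices) scores each point by the inverse mean distance to its k nearest neighbors and flags a dense clump as high-density noise:

```python
import numpy as np

def knn_density(points, k=8):
    """Estimate a local density score for each point as the inverse of the
    mean distance to its k nearest neighbors (brute force, O(n^2))."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # sort each row; column 0 is the point itself (distance 0), so skip it
    knn = np.sort(d, axis=1)[:, 1:k + 1]
    return 1.0 / knn.mean(axis=1)

# toy example: a sparse uniform cloud plus a dense clump of "high-density noise"
rng = np.random.default_rng(0)
sparse = rng.uniform(0, 10, size=(200, 3))
clump = 5 + 0.05 * rng.standard_normal((50, 3))
cloud = np.vstack([sparse, clump])

rho = knn_density(cloud, k=8)
# an experimental threshold, in the spirit of the text: flag points whose
# density is far above the (sparse-dominated) median density
high_density = rho > 5 * np.median(rho)
```

The threshold factor here is purely empirical, which matches the paper's remark that thresholds must be established experimentally for each application.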
The popular point cloud processing platform 'Point Cloud Library' offers denoising methods such as radius outlier removal and statistical outlier removal, but these are only suitable for low-density noise [1]. Recently, an approach based on Wasserstein curvature has been proposed for point cloud denoising; however, it mistakenly identifies some real information in flat areas as noise [2].
The K-means algorithm is known for its low time complexity and fast running speed, making it suitable for processing most continuous-variable data [3]. It performs well on data sets with spherical clustering structures and can complete clustering in a short time. However, the traditional K-means algorithm considers only the Euclidean distance when measuring the similarity between samples and does not account for the statistical structure of the point cloud. This limitation makes it difficult for the algorithm to distinguish between random noise and valid data [3,4]. To address this issue, clustering algorithms based on the neighborhood density of each data point were proposed in [5,6]. On the manifold, the local geometric structures of noise and valid data are inconsistent, so data with similar local geometric characteristics should be assigned to the same cluster. In [7,8], a K-means clustering algorithm is used to denoise point clouds, but only a few metrics are adopted to measure distances on the manifold. Moreover, these studies do not analyze the influence function, which evaluates the robustness to outliers of the mean induced by each metric.
This manuscript extends our previous ideas in [8] by proposing a point cloud denoising method based on the geometry of the Gaussian distribution family manifold. The manifold is endowed with five metrics: the Euclidean metric, the affine-invariant Riemannian metric, the log-Euclidean metric, the Kullback–Leibler divergence, and the symmetrized Kullback–Leibler divergence [9]. To evaluate the robustness to outliers of the mean induced by each metric, the influence functions of the geometric metrics are calculated. As stated in [10], it can be very difficult to solve the matrix equations for the influence functions directly, so we compute approximate values of these influence functions. To evaluate the denoising effect of each metric, the true positive rate (TPR), the false positive rate (FPR), and the signal-to-noise rate growing (SNRG) are adopted. TPR is the proportion of correctly predicted positive cases among all actual positive cases; a higher TPR means less loss of real data. In general, a higher TPR, a higher SNRG, and a lower FPR indicate that the algorithm better distinguishes real data from noise. The proposed algorithm is evaluated against these three criteria.
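For reference, the five metrics can be written down concretely in the zero-mean case, where each Gaussian is identified with its symmetric positive-definite covariance matrix. The sketch below is a minimal Python rendering under common conventions (in particular, the 0.5 weighting of the symmetrized KL divergence is one of several conventions in the literature); it is illustrative, not the paper's code:

```python
import numpy as np

def _logm_spd(A):
    """Matrix logarithm of a symmetric positive-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

def dist_euclidean(A, B):          # Euclidean (Frobenius) metric, FM
    return np.linalg.norm(A - B, 'fro')

def dist_airm(A, B):               # affine-invariant Riemannian metric, AIRM
    w, V = np.linalg.eigh(A)
    A_inv_half = (V * (w ** -0.5)) @ V.T
    return np.linalg.norm(_logm_spd(A_inv_half @ B @ A_inv_half), 'fro')

def dist_lem(A, B):                # log-Euclidean metric, LEM
    return np.linalg.norm(_logm_spd(A) - _logm_spd(B), 'fro')

def kld(A, B):                     # KL divergence between N(0, A) and N(0, B)
    n = A.shape[0]
    _, logdetA = np.linalg.slogdet(A)
    _, logdetB = np.linalg.slogdet(B)
    return 0.5 * (np.trace(np.linalg.solve(B, A)) - n + logdetB - logdetA)

def skld(A, B):                    # symmetrized KL divergence, SKLD
    return 0.5 * (kld(A, B) + kld(B, A))
```

Note that, unlike the first three, the KL divergences are not true metrics (KLD is asymmetric, and neither satisfies the triangle inequality), but they still induce usable distance functions for clustering.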
The contributions in this paper are summarized as follows.
(1) A K-means clustering algorithm is proposed to denoise point clouds with high-density noise by leveraging the difference in local statistical characteristics between noise and valid data. The algorithm operates on a Gaussian distribution family manifold.
(2) By calculating the expectation and covariance matrix of the data points, we map the original point cloud onto the Gaussian distribution family manifold to form a parameter point cloud. The metrics are then assigned to the manifold, and the K-means method is applied to cluster the parameter point cloud, thereby classifying the original point cloud.
(3) To analyze the robustness of the means under the different metrics, approximate values of their influence functions are calculated. The simulation results demonstrate that the geometric metrics yield better denoising effects than the Euclidean metric, and that the mean influence functions under the geometric metrics are more robust than that under the Euclidean metric.
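A minimal sketch of the mapping-and-clustering idea in contributions (1)–(2) might look as follows, assuming k-NN neighborhoods and the log-Euclidean metric, under which K-means on the manifold reduces to ordinary K-means on vectorized matrix logarithms. All function names, the regularization term, and the initialization scheme are illustrative choices, not the paper's implementation:

```python
import numpy as np

def local_gaussians(points, k=10):
    """Map each point to the (mean, covariance) of its k-NN neighborhood,
    forming the parameter point cloud on the Gaussian family manifold."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]          # neighborhood incl. the point
    nbrs = points[idx]                          # shape (n, k, dim)
    mu = nbrs.mean(axis=1)
    C = nbrs - mu[:, None]
    cov = np.einsum('nki,nkj->nij', C, C) / (k - 1)
    cov += 1e-9 * np.eye(points.shape[1])       # regularize so log-map exists
    return mu, cov

def lem_kmeans_labels(cov, n_clusters=2, iters=50, seed=0):
    """K-means on covariance matrices under the log-Euclidean metric:
    equivalent to plain K-means on the vectorized matrix logarithms."""
    w, V = np.linalg.eigh(cov)                  # batched eigendecomposition
    logs = np.einsum('nij,nj,nkj->nik', V, np.log(w), V)
    X = logs.reshape(len(cov), -1)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(n_clusters):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return labels
```

Under the other metrics the cluster mean no longer has this closed Euclidean form, which is exactly why the paper analyzes the different means and their influence functions separately.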
The rest of this work is structured as follows. Section 2 introduces the Riemannian framework of the Gaussian distribution family manifold. Section 3 proposes a K-means clustering algorithm to denoise point clouds with high-density noise and calculates approximate values of the influence functions under the different metrics. Section 4 presents the effectiveness of the proposed denoising algorithm and verifies the obtained properties of the influence functions.
4. Simulations and Results
This section presents numerical experiments demonstrating the denoising effect of Algorithm 1 and compares the norms of the mean influence functions under FM, AIRM, LEM, KLD, and SKLD. In these simulations, all samples on the manifold are generated from random matrices produced by MATLAB.
A. Numerical Simulations
In this example, the distance function induced by FM, AIRM, LEM, KLD, or SKLD is used in Algorithm 1, and the denoising effects of these metrics are compared on the 3D point cloud Teapot.ply. Teapot.ply is a built-in data set in MATLAB and can also be obtained from open-source projects such as the Stanford 3D Scanning Repository. In MATLAB, it is an example model often used to demonstrate and test 3D graphics processing and visualization functions; it is stored in the PLY file format and contains the three-dimensional geometry of a teapot.
As shown in Figure 2a, the background noise is distributed uniformly with a signal-to-noise ratio (SNR) of 4137:1000. With the algorithm parameters fixed, Figure 2b–f present the denoised images of the original point cloud produced by Algorithm 1 under FM, AIRM, LEM, KLD, and SKLD, respectively. Figure 2b shows that Algorithm 1 based on the Euclidean metric leaves a significant number of noise points in the image. Figure 2c,d show that Algorithm 1 removes noise points well when using AIRM and LEM. Figure 2e,f illustrate that Algorithm 1 with KLD or SKLD effectively denoises the image but also removes some real data points from the teapot. Evidently, Algorithm 1 based on the Euclidean metric is not as effective as the geometric metrics.
B. Results and Discussions
To evaluate the denoising effects of these metrics, the numbers of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) are adopted, where real data points are taken as positives and noise points as negatives. The true positive rate (TPR), the false positive rate (FPR), and the signal-to-noise rate growing (SNRG) are then defined by

TPR = TP / (TP + FN),  FPR = FP / (FP + TN),  SNRG = (TP / FP) / (N_r / N_n),

where N_r represents the number of real data points and N_n stands for the number of noise points.
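These scores are easy to compute from boolean masks. The sketch below assumes the standard definitions TPR = TP/(TP+FN) and FPR = FP/(FP+TN); the SNRG line is our reading of the text (the ratio of the output SNR, TP:FP, to the input SNR, N_r:N_n), since the paper's exact SNRG formula is not reproduced here:

```python
import numpy as np

def denoise_scores(is_real, kept):
    """is_real: boolean array, True for real data points, False for noise.
    kept:      boolean array, True if the denoiser kept the point."""
    tp = np.sum(is_real & kept)          # real data correctly kept
    fn = np.sum(is_real & ~kept)         # real data wrongly removed
    fp = np.sum(~is_real & kept)         # noise wrongly kept
    tn = np.sum(~is_real & ~kept)        # noise correctly removed
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    # SNRG as read from the text: output SNR divided by input SNR
    n_r, n_n = np.sum(is_real), np.sum(~is_real)
    snrg = (tp / fp) / (n_r / n_n)
    return tpr, fpr, snrg
```

An SNRG greater than one means the denoiser improved the signal-to-noise ratio of the cloud.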
Table 2 reports results for SNRs of 4137:1000 and 4137:2000. Since a higher TPR, a higher SNRG, and a lower FPR indicate that the algorithm better distinguishes real data from noise, the highest TPR, the lowest FPR, and SNRG values above the chosen threshold are displayed in bold. From Table 2, it is evident that Algorithm 1 based on the Euclidean metric generally has a higher FPR, a lower TPR, and a lower SNRG, whereas the geometric metrics tend to yield a lower FPR, a higher TPR, and a higher SNRG. Among them, Algorithm 1 with KLD or SKLD exhibits a lower TPR, owing to the removal of some valid data points visible in Figure 2e,f. Table 2 therefore demonstrates the advantages of the geometric metrics (AIRM, LEM, KLD, and SKLD) over the Euclidean metric.
To compare the robustness of the influence functions under different metrics, we consider simulations involving 100 randomly generated symmetric positive-definite matrices together with l injected outliers. Four scenarios, l = 10, 40, 70, and 100, are examined to analyze the effect of the number of outliers on the robustness of the influence functions. Using Propositions 1–5 as the basis for computing the norms of the influence functions, we repeat the simulation for each scenario one hundred times, as shown in Figure 3. It can be seen that the norms of the influence functions corresponding to the geometric metrics (AIRM, LEM, KLD, and SKLD) are insensitive to the number of outliers: they remain close to one as l increases from 10 to 100. In contrast, the norms associated with the Euclidean metric (FM) fluctuate significantly, decreasing from around seven or eight toward one as l increases from 10 to 100. Therefore, the geometric means are almost independent of the number of outliers and are more stable than the arithmetic mean.
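The qualitative robustness gap can be reproduced with a simple experiment that approximates the effect of outliers by the shift of the mean itself, rather than by the closed-form influence functions of Propositions 1–5. The SPD generator, the scales, and the outlier count l = 10 below are all illustrative choices:

```python
import numpy as np

def rand_spd(n, scale, rng):
    """Random symmetric positive-definite matrix: scale * (B Bᵀ + n I)."""
    B = rng.standard_normal((n, n))
    return scale * (B @ B.T + n * np.eye(n))

def arithmetic_mean(mats):
    """Mean induced by the Euclidean metric (FM)."""
    return np.mean(mats, axis=0)

def lem_mean(mats):
    """Log-Euclidean mean: exponential of the averaged matrix logarithm."""
    logs = []
    for A in mats:
        w, V = np.linalg.eigh(A)
        logs.append((V * np.log(w)) @ V.T)
    L = np.mean(logs, axis=0)
    w, V = np.linalg.eigh(L)
    return (V * np.exp(w)) @ V.T

rng = np.random.default_rng(0)
clean = [rand_spd(3, 1.0, rng) for _ in range(100)]
outliers = [rand_spd(3, 100.0, rng) for _ in range(10)]   # l = 10 injected outliers

# how far does each mean move when the outliers are injected?
shift_fm = np.linalg.norm(arithmetic_mean(clean + outliers) - arithmetic_mean(clean), 'fro')
shift_lem = np.linalg.norm(lem_mean(clean + outliers) - lem_mean(clean), 'fro')
```

With these settings the arithmetic mean is dragged an order of magnitude farther than the log-Euclidean mean, mirroring the behavior of the influence-function norms in Figure 3.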
Figure 4 provides an intuitive representation by plotting the average norms across all simulations for each metric. Consistent with Figure 3, FM yields larger norm values than the geometric metrics for l = 10, 40, and 70. Furthermore, Figure 4 shows that when the number of outliers under FM reaches 100, close to the number of symmetric positive-definite matrices used in the simulations, the influence functions for FM and the geometric metrics become increasingly similar. This further supports the conclusion that the geometric means are more robust than the arithmetic mean.
C. Complexity Analysis
This subsection examines the computational complexity of the mean matrices under FM, LEM, AIRM, KLD, and SKLD, whose formulas can be obtained from (15) and Table 1, as well as the computational complexity of the corresponding influence functions, whose expressions are given in Propositions 1–5. For iterative algorithms, only the complexity of a single iteration step is considered. Suppose there are m symmetric positive-definite n × n matrices and l outliers, and assume that a single scalar operation has complexity O(1). Computing the matrix exponential of a symmetric matrix or the half power of a symmetric positive-definite matrix requires an eigenvalue decomposition, so both operations have complexity O(n^3).
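Both operations reduce to one O(n^3) eigendecomposition plus an O(n^2) rescaling of the eigenvector matrix, as the following sketch shows (`spd_fun` is an illustrative helper name, not from the paper):

```python
import numpy as np

def spd_fun(A, f):
    """Apply a scalar function f to a symmetric (positive-definite) matrix
    through its eigendecomposition A = V diag(w) Vᵀ; the O(n^3)
    eigendecomposition dominates the cost."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
expA = spd_fun(A, np.exp)     # matrix exponential exp(A)
halfA = spd_fun(A, np.sqrt)   # half power A^{1/2}
```

The same helper with `np.log` gives the matrix logarithm used by the LEM, so all of these primitives share the single O(n^3) bottleneck.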
Table 3 shows that the arithmetic mean has the lowest computation time, followed by the KLD and SKLD means. Although each iteration step of the AIRM mean has a complexity comparable to that of the LEM mean, the AIRM mean must be computed iteratively, so the LEM mean is calculated much faster than the AIRM mean.
From Table 4, it can be seen that calculating the influence function under FM takes less time than under the geometric metrics. Among the geometric metrics, the influence function corresponding to AIRM has the longest calculation time, followed by that corresponding to LEM, while the influence functions induced by KLD and SKLD are the fastest. This difference arises because the AIRM mean requires an iterative algorithm.
5. Conclusions
To conclude, a novel point cloud denoising algorithm is proposed in combination with the K-means algorithm. By calculating the expectation and covariance of the data points, the algorithm maps the original point cloud onto the Gaussian distribution family manifold, forming a parameter point cloud. Different metric structures are then constructed on the Gaussian distribution family manifold, and the K-means method is used to cluster the parameter point cloud, thereby clustering the corresponding original data. The robustness of the means under the various metrics is analyzed by calculating their approximate influence functions. Simulations show that Algorithm 1 with the geometric metrics achieves a better denoising effect than with the Euclidean metric, and that the geometric means are more robust than the arithmetic mean. Three criteria (TPR, FPR, and SNRG) are used to evaluate the denoising effect of Algorithm 1 under the different metrics. The simulations indicate that the algorithm with the Euclidean metric has a lower TPR and a higher FPR in most cases. SNRG is the most important criterion, as it represents the improvement of the signal-to-noise ratio achieved by Algorithm 1; Table 2 shows that the SNRG of Algorithm 1 with the geometric metrics is generally higher than that with the Euclidean metric.
However, it should be noted that although denoising algorithms based on the geometric metrics have the advantages of high TPR, low FPR, and high SNRG, they incur a higher computational complexity than those based on the Euclidean metric; the complexities of the algorithms based on KLD and SKLD are likewise higher. This is a drawback of denoising algorithms employing geometric metrics.
In this manuscript, Algorithm 1 is used only for numerical simulations of point cloud denoising, but it can also be applied to specific tests in the future. Ultrasonic testing with a manipulator is an important non-destructive testing method used to detect internal defects in materials with complex structures. Before the manipulator inspects a workpiece, a laser instrument collects point cloud data of the workpiece for 3D reconstruction. However, the point cloud data collected by laser instruments often contain a significant number of redundant or invalid points. Algorithm 1 can be used to remove these redundancies and improve the quality of the point cloud data, resulting in reduced noise and more accurate models and ultimately enhancing subsequent processing.