1. Introduction
With the widespread adoption of low-cost RGB-D cameras such as the Microsoft Kinect and the decreasing cost of commercial LiDAR systems, the acquisition of 3D point cloud data has become more accessible and diverse [1,2]. Point cloud processing techniques continue to evolve and are demonstrating increasingly broad application prospects across various fields.
Point cloud registration, a critical step in this processing pipeline, aims to achieve optimal alignment of point clouds expressed in different coordinate systems through spatial transformations. Depending on the application scenario, point cloud registration can be used for target pose estimation [3,4], 3D object recognition [5], and 3D reconstruction [6], and it plays a significant role in fields such as reverse engineering [7], digital cities [8], and intelligent robotics [9].
Generally, without relying on manual intervention or assistance from other instruments, point cloud registration based solely on the information inherent in the point cloud itself can be categorized into three types [10]: (1) point-based [11,12], (2) global feature-based [13,14], and (3) local feature-based [15,16,17,18]. Point-based methods operate directly on the point cloud data, performing registration through geometric alignment or random sampling strategies and iteratively optimizing the matching relationships between point pairs. These methods typically achieve high accuracy but are sensitive to initial positions and noise, prone to falling into local optima, and incur high computational costs on large-scale point clouds [19]. Global feature-based methods abstract the entire point cloud into features such as vectors or frequency representations; however, they are limited when the overlap between two point clouds is insufficient. Local feature-based methods focus on describing the regions around feature points, making them better suited for registration tasks in which the point clouds only partially overlap [10]. While recent studies on deep neural networks further demonstrate their superior performance in feature extraction and pattern recognition tasks such as image classification and object detection [20,21,22], their sensitivity to input variations necessitates careful robustness analysis in safety-critical applications [23].
The general pipeline for local feature-based point cloud registration is as follows:
1. Extract key points from the two point clouds based on a unified standard.
2. Compute 3D local feature descriptors for all key points.
3. Estimate initial matching point pairs based on feature similarity.
4. Remove incorrect matching point pairs that negatively affect registration.
5. Use the remaining correct matching point pairs to estimate the rigid transformation and complete the registration.
Among all steps in the pipeline, describing the local features of key points is the most critical. Regardless of the type of 3D local feature descriptor, it should possess rich descriptiveness to differentiate various local surfaces and sufficient robustness to resist the effects of geometric transformations and environmental disturbances [24].
Although numerous studies on constructing 3D local feature descriptors have emerged over the past decade, the performance of local feature-based point cloud-matching pipelines often suffers in practical applications due to environmental complexity and sensor limitations [25]. This is primarily because (1) noise, occlusion, repetitive structures, and systematic errors in point cloud acquisition devices [26] often lead to incorrect key point detection; (2) the inherent limitations of 3D local feature descriptors, such as the choice of support radius, often result in an initial matching set containing numerous incorrect matches [27,28]; and (3) the symmetric and multi-layer repetitive structures of certain targets (such as satellites) can lead to ambiguous matching results.
Therefore, in the fourth step of the pipeline, developing a method to remove incorrect matches plays a crucial role in enhancing the performance of local feature-based point cloud registration pipelines.
2. Related Work
In order to construct accurate feature point pairs, researchers have explored a variety of methods [29]. A common approach is to use a threshold on feature distances to quickly establish an initial set of matching pairs [30], but this method often includes a significant number of incorrect matches. To address this, some researchers have employed nearest neighbor (NN) methods to generate an initial set with fewer errors [15,31], while others have adopted a bidirectional NN approach to create a one-to-one correspondence in the initial set [13]. Additionally, ref. [32] used the nearest neighbor distance ratio (NNDR) algorithm to construct an initial set that outperforms the threshold-based method in terms of matching performance. These three approaches rely solely on the similarity of feature descriptors [33]. Although they are computationally efficient and simple to implement, they are highly sensitive to outliers due to the limitations of local descriptors. As a result, the initial matching pairs they generate still contain numerous incorrect matches, which must be filtered out using additional methods.
Random Sample Consensus (RANSAC) [34] addresses this issue by selecting a minimal set of feature correspondences to compute a rigid transformation R that aligns the model with the scene. The number of correspondences required depends on the method used to compute R. For example, if only the positions of the key points are used, three correspondences are needed; if both positions and normal vectors are used, two correspondences suffice; and if a local reference frame (LRF) is established for each key point, a single correspondence is sufficient. RANSAC iteratively samples correspondences, computes the induced transformation, and retains the one under which the largest number of transformed points are inliers with respect to a set threshold. Examples based on this method include [30,35,36]. However, directly applying RANSAC to the initial correspondence set to find consistent correspondences is not ideal, because it typically requires numerous iterations and does not always guarantee an optimal solution.
Geometric consistency (GC) utilizes external constraints based on the actual points in the point cloud, which are distinct from the feature space, to remove false matches. This approach assumes that true matches exhibit geometric similarity, while false matches do not. These methods often use the distances between matched points as key constraints for the geometric consistency of rigid transformations in the point cloud [13,37,38]. Dorai [39] defined a threshold on the distance between points to assess the geometric consistency of different pairs, iteratively checking and removing the most inconsistent pairs. In addition, some methods combine multiple geometric consistency measures to form a collective constraint. For instance, Johnson and Hebert [15,40] used a combination of distances between points and their normal vectors as collective constraints. Furthermore, some methods apply geometric consistency to construct voting sets, where different matching pairs are scored and those with higher scores are selected as true matches [41,42].
Pose clustering assumes that when the source and target point clouds are correctly matched, the computed transformations will cluster near the true transformation in transformation space. This method typically utilizes the feature points’ LRFs, allowing a single match pair to vote for a transformation. After clustering all transformations, the cluster centers are considered candidate transformations, which are then validated in descending order of their scores. Examples based on this method include [43,44,45,46,47,48].
Constraint explanation trees build consistent explanation trees for each possible transformation. As the number of nodes in a tree increases, the certainty of the hypothesis improves. Refs. [49,50] utilized this method for feature registration across different scales, incorporating features from coarse to fine scales and progressively adding nodes at each scale level. The selection of a specific sub-node means that all features corresponding to the parent nodes must satisfy the same transformation. When the transformed point cloud satisfies a certain threshold, the hypothesized transformation is deemed correct. This method effectively filters out inconsistent transformation hypotheses and gradually eliminates mismatched pairs.
Game theory applies a non-cooperative game framework to all matching pairs in the initial set, where a designed payoff function causes incompatible feature correspondences to vanish after several iterations, allowing the most reliable pairs to survive. These correspondences are then used to compute transformation hypotheses. Examples based on this method include [51,52].
The generalized Hough transform method is similar to pose clustering but differs in that it projects feature correspondences into the Hough space for voting and clustering. Each point in the Hough space represents a potential transformation between the source and target point clouds. Larger clusters in the space are considered more reliable transformations. Examples based on this method include [53,54,55].
This study addresses the limitations and challenges of existing 3D point cloud feature-matching techniques by proposing a novel matching method, 3D spatial encoding (3DSE). The purpose of 3DSE is to enhance the robustness of feature matching, drawing inspiration from image processing algorithms. By employing an inconsistency scoring mechanism, it identifies and excludes unstable local feature points so that only accurate matches are retained.
The core of 3DSE lies in leveraging the geometric consistency of true matches to prioritize the elimination of incorrect matches. It does not require a predefined initial voting set or its size but iteratively removes the most inconsistent feature points until convergence, leaving only reliable matches. Although there are precedents for directly excluding inconsistent points, these methods often fail to adequately consider the complexity of real-world point clouds, which is critical for stable feature matching.
Unlike traditional methods that require carefully designed geometric constraints, 3DSE adopts a simple and intuitive geometric constraint yet demonstrates exceptional performance. In simulated point cloud experiments, 3DSE exhibited strong robustness against various disturbances, and comparative studies further validated its advantages. Even when tested on low-quality real-world point clouds, 3DSE maintained high robustness. Most importantly, since our research focuses on the post-processing stage, our optimization method can be combined with other point cloud feature-matching schemes to further improve the accuracy of point cloud registration.
3. Methodology
Our approach is logically similar to other geometric constraint methods, all of which are based on similar ideas or observations: correct matching pairs should exhibit geometric consistency, whereas incorrect matches may not be compatible under geometric constraints. L2 distance, normal vectors, and LRF (local reference frame) are typical geometric constraints. In particular, in point cloud matching with rigid body transformations, it is intuitive to use rigidity constraints, such as the L2 distance, to eliminate incompatible matches. However, results obtained solely by using L2 distance rigidity constraints often exhibit ambiguity. Methods using normal vectors and LRF often require additional computational costs and parameter adjustments, and their results are highly susceptible to noise. Therefore, these methods often need to be paired with 3D local descriptors designed based on LRF to effectively utilize existing parameters.
For these reasons, we hypothesize that the same feature point in the source and target point clouds should have a similar spatial layout. Thus, a truly matching pair of features should not only be compatible in the quantized vectors of local descriptors but also maintain a similar directional order with other true matches. The primary function of 3DSE is to quantify the order of different matching points in the directional space. The technical details of the 3DSE method will be discussed in depth in the following.
First, the method employs a bidirectional NN search to match the local features of the two point clouds, establishing an initial set of matching pairs. Subsequently, 3DSE iteratively removes the matching pairs that are most inconsistent in terms of spatial relationships, guided by a predefined inconsistency tolerance, until all remaining pairs exhibit inconsistencies below that tolerance. During each iteration, the following steps are performed: first, a spatial relationship graph of the point cloud is constructed from the current set of matching pairs and encoded into 3D spatial representations; next, a spatial verification step eliminates outliers that fail to meet geometric consistency, producing a new, more reliable set of matching pairs. This process is repeated until the predefined inconsistency criterion is met, ensuring the accuracy and robustness of the final matching set.
3.1. Initial Matching Set Acquisition
To construct the initial matching set $C$, the source point cloud $P^s$ and the target point cloud $P^t$ are first processed using the same key point selection criteria to extract key points, resulting in the key point sets $K^s$ and $K^t$. Here, $K^s$ contains $m$ points with the feature set $F^s = \{f^s_1, \dots, f^s_m\}$, and $K^t$ contains $n$ points with the feature set $F^t = \{f^t_1, \dots, f^t_n\}$. For each point $p^s_i$ in $K^s$, an NN search is performed over the feature set $F^t$ of $K^t$ to find the point $p^t_j$ whose feature distance to $p^s_i$ is minimal and below the threshold $\epsilon$. The process is expressed as follows:

$$ j = \arg\min_{l \in \{1, \dots, n\}} \left\| f^s_i - f^t_l \right\|_2, \qquad \left\| f^s_i - f^t_j \right\|_2 < \epsilon. $$

The matched pairs $(p^s_i, p^t_j)$ that meet this condition constitute the candidate matching set $C'$:

$$ C' = \left\{ (p^s_i, p^t_j) \;\middle|\; \left\| f^s_i - f^t_j \right\|_2 < \epsilon \right\}. $$

On this basis, to eliminate one-to-many matching pairs, an NN search is performed again for each point $p^t_j$ in $K^t$, restricted to the candidate set $C'$. Specifically, for $p^t_j$, all points of $K^s$ that match it are extracted to form the set $S_j$:

$$ S_j = \left\{ p^s_i \;\middle|\; (p^s_i, p^t_j) \in C' \right\}. $$

Then, within the set $S_j$, the point $p^s_{i^*}$ that has the smallest feature distance to $p^t_j$ and is below the threshold $\epsilon$ is selected:

$$ i^* = \arg\min_{p^s_i \in S_j} \left\| f^s_i - f^t_j \right\|_2. $$

The final initial matching set $C$ is defined as

$$ C = \left\{ (p^s_{i^*}, p^t_j) \;\middle|\; p^t_j \in K^t, \; S_j \neq \varnothing \right\}. $$

This process effectively reduces the impact of one-to-many matches, ensuring the uniqueness and accuracy of the matches.
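For concreteness, a minimal sketch of this bidirectional NN matching step is given below, assuming the local descriptors have already been computed and stored as NumPy arrays; the function name `initial_matches` and the threshold argument `eps` are illustrative rather than part of the original implementation.

```python
# Minimal sketch of the bidirectional NN matching described above (Section 3.1).
# Assumes feats_s (m x d) and feats_t (n x d) are precomputed local descriptors;
# the threshold `eps` and all names are illustrative.
import numpy as np
from scipy.spatial import cKDTree

def initial_matches(feats_s, feats_t, eps):
    """Return index pairs (i, j) forming the bidirectional-NN initial set C."""
    tree_t = cKDTree(feats_t)
    # Forward NN: for every source feature, its nearest target feature.
    dist, j_of_i = tree_t.query(feats_s, k=1)
    keep = dist < eps                                  # feature-distance threshold
    # Group source candidates by the target feature they hit (the set S_j).
    best = {}                                          # j -> (distance, i)
    for i in np.flatnonzero(keep):
        j = int(j_of_i[i])
        if j not in best or dist[i] < best[j][0]:
            best[j] = (dist[i], i)                     # keep closest source per target
    return [(i, j) for j, (_, i) in best.items()]      # one-to-one initial set C
```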
3.2. The 3DSE Building
To construct the initial matching set $C$, we first perform down-sampling on the source point cloud $P^s$ and the target point cloud $P^t$. Then, 3D local feature descriptors are used to describe the features of the down-sampled point clouds. Next, brute-force feature matching is conducted based on the L2 distance metric to generate the preliminary matching set $C$. Since incorrect matches can negatively affect the estimation of the rigid transformation matrix, it is necessary to remove outliers. Typically, RANSAC or other spatial constraint relationships are used to filter out some outliers. However, as RANSAC is computationally expensive on the entire matching set, it is usually employed only once the matching set has been sufficiently reduced. Moreover, the performance of general spatial constraints is often suboptimal when the point cloud quality is low. Thus, a more relaxed representation is needed to describe the spatial constraints between local feature descriptors, allowing a global model to be created from point-pair features. Inspired by this idea [56], we constructed the 3DSE scheme.
In 3DSE, binary spatial mapping is performed for the three dimensions, yielding the maps $M_x$, $M_y$, and $M_z$, which describe the relative spatial positions of feature pairs along the $x$-, $y$-, and $z$-axes of the 3D coordinate system, respectively. Suppose $k$ distinct sample points in a point cloud $P$ have their local feature descriptors computed, forming the feature set $F = \{f_1, \dots, f_k\}$. The maps $M_x$, $M_y$, and $M_z$ are defined as $k \times k$ binary matrices:

$$ M_x(i,j) = \begin{cases} 1, & x_i < x_j \\ 0, & x_i \ge x_j \end{cases}, \qquad M_y(i,j) = \begin{cases} 1, & y_i < y_j \\ 0, & y_i \ge y_j \end{cases}, \qquad M_z(i,j) = \begin{cases} 1, & z_i < z_j \\ 0, & z_i \ge z_j. \end{cases} $$

Here, $(x_i, y_i, z_i)$ and $(x_j, y_j, z_j)$ represent the spatial coordinates of the points corresponding to features $f_i$ and $f_j$, respectively.
Figure 1 briefly illustrates the 3DSE of a point cloud with four point features and the resulting maps $M_x$, $M_y$, and $M_z$.
In $M_x$, $M_y$, and $M_z$, the $i$-th row records the spatial relationships between feature $f_i$ and the other features in the point cloud. For example, $M_x(i,j) = 0$, $M_y(i,j) = 1$, and $M_z(i,j) = 1$ indicate that feature $f_j$ is located behind, to the right of, and above feature $f_i$ (where the positive X-axis is defined as forward, the positive Y-axis as right, and the positive Z-axis as up). This mapping can also be interpreted as follows: in the $i$-th row, feature $f_i$ is selected as the origin, and the surrounding space is divided into eight octants according to the 3D coordinate system. The entries of $M_x$, $M_y$, and $M_z$ then indicate which octant each of the other features belongs to. As shown in Figure 2, for example, in the coordinate system where feature $f_1$ is the origin, features $f_2$, $f_3$, and $f_4$ are located in the first, third, and fourth octants, respectively.
To represent the relative spatial relationships between features in the point cloud, this study adopts the 3DSE method. Specifically, assuming that $k$ valid features are extracted from the point cloud, $3k(k-1)/2$ binary bits are required to encode the spatial relationships among these features. This is because for the feature pairs $(f_i, f_j)$ and $(f_j, f_i)$ (where $i \neq j$), the spatial relationships $M(i,j)$ and $M(j,i)$ are not equal, and when $i = j$, $M(i,j) = 0$. In this way, the relative spatial positions between features in the point cloud are represented as binary encodings. The spatial relationship between each pair of features is encoded by 3 bits, corresponding to the spatial relationships along the X-, Y-, and Z-axes, respectively. This provides a loose geometric constraint on the relative positions of features in the point cloud.
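As an illustration, the binary maps defined above can be obtained with a few array comparisons. The sketch below assumes the key point coordinates are stored as a (k, 3) NumPy array and uses the strict less-than convention for the ordering test, which is one possible choice consistent with the definition.

```python
# Illustrative sketch of the binary spatial maps M_x, M_y, M_z defined above.
# `pts` is a (k, 3) array of key point coordinates; the comparison direction is
# an assumption matching the convention used in the text.
import numpy as np

def spatial_maps(pts):
    """Return three k x k binary matrices encoding axis-wise ordering."""
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
    Mx = (x[:, None] < x[None, :]).astype(np.uint8)   # Mx[i, j] = 1 if x_i < x_j
    My = (y[:, None] < y[None, :]).astype(np.uint8)
    Mz = (z[:, None] < z[None, :]).astype(np.uint8)
    return Mx, My, Mz
```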
To further enhance the pose invariance of this encoding method, we take into account the impact of rotational variations on the encoding results. In 2D images, rotational invariance is typically achieved by rotating the image around an axis perpendicular to the screen and accumulating the results. However, in 3D point clouds, pose variations are more complex. To address this challenge, we generate different point cloud poses using combinations of Euler angles, performing rotations r times around the X-, Y-, and Z-axes, respectively. This process is equivalent to treating each feature point as the reference origin and dividing the space uniformly into r sectors on the projection planes perpendicular to the X-, Y-, and Z-axes, thereby creating different spatial partitioning configurations.
For each pose variation, we perform 3DSE independently and ultimately combine all encoding results to generate the final spatial encoding map. In common pose measurement scenarios, there is usually overlap between consecutive frames of the point cloud, so pose differences typically do not exceed 90°. Based on this assumption, r is set to 1 in this study to simplify calculations and maintain pose invariance.
Thus, the generalized spatial graphs $G_x$, $G_y$, and $G_z$ can be generated as follows. Assume that the original position of feature point $f_i$ is $(x_i, y_i, z_i)$. After rotation by the Euler angles $\alpha_u$, $\beta_v$, and $\gamma_w$, the new position of the feature point becomes $(x_i^{(u,v,w)}, y_i^{(u,v,w)}, z_i^{(u,v,w)})$, where $u, v, w \in \{1, \dots, r\}$.
Next, the generalized spatial graphs $G_x$, $G_y$, and $G_z$ are defined as the collections of the per-pose binary maps computed from the rotated coordinates:

$$ G_x = \left\{ M_x^{(u,v,w)} \right\}_{u,v,w=1}^{r}, \qquad G_y = \left\{ M_y^{(u,v,w)} \right\}_{u,v,w=1}^{r}, \qquad G_z = \left\{ M_z^{(u,v,w)} \right\}_{u,v,w=1}^{r}, $$

where $M_x^{(u,v,w)}$, $M_y^{(u,v,w)}$, and $M_z^{(u,v,w)}$ are obtained from the rotated coordinates exactly as in the single-pose case.
By defining the generalized spatial graphs $G_x$, $G_y$, and $G_z$, we can more rigorously describe the relative spatial positions between each pair of features.
Although constructing the entire spatial graph during execution requires using all features in the point cloud, which may consume a significant amount of memory, it is unnecessary to store the coordinates of points explicitly. Instead, only the sorting order of each feature along the three coordinate axes needs to be recorded. During point cloud feature matching, the spatial graphs can be generated based on the sorting order of these features’ coordinates, and spatial verification can then be performed.
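The following sketch illustrates one way the pose-augmented encoding could be realized, rotating the key points over a uniform Euler-angle grid and stacking the per-pose maps; the angle grid and the function name `generalized_graphs` are assumptions for illustration only.

```python
# Sketch of the generalized spatial graphs G_x, G_y, G_z: the binary maps are
# recomputed for every Euler-angle pose and stacked. The uniform angle grid is
# an illustrative choice.
import numpy as np
from scipy.spatial.transform import Rotation

def generalized_graphs(pts, r=1):
    """Return stacked per-pose binary maps (G_x, G_y, G_z), one slice per Euler pose."""
    angles = np.arange(r) * (2.0 * np.pi / r)     # r uniform steps per axis (0 only if r = 1)
    Gx, Gy, Gz = [], [], []
    for a in angles:                              # rotation step about X
        for b in angles:                          # rotation step about Y
            for c in angles:                      # rotation step about Z
                p = Rotation.from_euler("xyz", [a, b, c]).apply(pts)
                x, y, z = p[:, 0], p[:, 1], p[:, 2]
                Gx.append((x[:, None] < x[None, :]).astype(np.uint8))
                Gy.append((y[:, None] < y[None, :]).astype(np.uint8))
                Gz.append((z[:, None] < z[None, :]).astype(np.uint8))
    return np.stack(Gx), np.stack(Gy), np.stack(Gz)   # each of shape (r**3, k, k)
```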
3.3. The 3DSE Constraint
Accurately identifying and matching repetitive 3D structures is one of the key steps in 3D point cloud registration tasks. Feature matching based on key points’ local descriptors serves as the foundation for point cloud registration. The success of registration relies on the presence of a certain number of common local features between the point clouds. However, due to the complexity of point cloud data, the matching process is often accompanied by mismatches. These errors may originate from sensor noise, environmental interference, or quantization errors during data pre-processing. To improve the accuracy of matching, it is essential to effectively eliminate mismatched point pairs. This study proposes a 3DSE-based method to utilize geometric relationships between feature points to validate the consistency of the initial matching set and eliminate mismatched pairs. The specific implementation of this method is detailed as follows.
First, the initial matching set $C$ between the source point cloud $P^s$ and the target point cloud $P^t$ is obtained using NN or NNDR. Using the 3DSE method, the geometric relationships of the matched points in $P^s$ and $P^t$ are computed, generating the corresponding spatial graphs $G^s_x$, $G^s_y$, $G^s_z$ and $G^t_x$, $G^t_y$, $G^t_z$. To compare the geometric consistency between matched feature points, element-wise XOR operations are performed on the corresponding source and target graphs, resulting in the inconsistency matrices $V_x$, $V_y$, and $V_z$, defined as follows:

$$ V_x = G^s_x \oplus G^t_x, \qquad V_y = G^s_y \oplus G^t_y, \qquad V_z = G^s_z \oplus G^t_z. $$
Under ideal conditions, when the captured point clouds and their matching set $C$ are entirely correct, all entries in the inconsistency matrices $V_x$, $V_y$, and $V_z$ will be zero. If mismatches exist, these errors will result in inconsistent entries between $G^s_x$ and $G^t_x$, between $G^s_y$ and $G^t_y$, and between $G^s_z$ and $G^t_z$, causing the XOR results in $V_x$, $V_y$, and $V_z$ at the corresponding positions to be 1. Based on this, the inconsistency summation is defined as follows:

$$ S_x(i) = \sum_{j} V_x(i,j), \qquad S_y(i) = \sum_{j} V_y(i,j), \qquad S_z(i) = \sum_{j} V_z(i,j). $$
The three orthogonal inconsistency components $S_x$, $S_y$, and $S_z$ computed through spatial encoding represent the total inconsistency of each feature point along the different directions. To identify the most spatially inconsistent matches, the maximum values of these components are examined. The appearance of a maximum value indicates the possible presence of a mismatch along that direction, allowing the corresponding pair to be marked and eliminated. For eliminated matching pairs, the corresponding entries in $V_x$, $V_y$, and $V_z$ are set to 0, and $S_x$, $S_y$, and $S_z$ are recalculated, until the maximum values of these components fall below a predefined tolerance $t$.
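A compact sketch of this consistency check is given below: the source and target maps of the matched points are XOR-ed and row-summed to obtain the per-match inconsistency components; the array and function names are illustrative.

```python
# Sketch of the XOR-based inconsistency measure: V = G_src XOR G_tgt per axis, and
# S(i) = sum_j V(i, j) is the inconsistency component of the i-th matching pair.
import numpy as np

def inconsistency(G_src, G_tgt):
    """G_src, G_tgt: binary maps of the matched source/target points (same shape)."""
    V = np.bitwise_xor(G_src, G_tgt)      # 1 wherever the axis-wise orderings disagree
    S = V.sum(axis=-1)                    # row sums: per-match inconsistency
    if S.ndim > 1:                        # several poses stacked: accumulate over poses
        S = S.sum(axis=0)
    return V, S

# One call per axis, e.g.:
# Vx, Sx = inconsistency(Gx_src, Gx_tgt)
# Vy, Sy = inconsistency(Gy_src, Gy_tgt)
# Vz, Sz = inconsistency(Gz_src, Gz_tgt)
```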
The factor r is a critical parameter that controls the strictness of spatial constraints. It directly affects the balance between eliminating mismatches and preserving true matches. A larger value of r strengthens the spatial constraints, making the elimination of mismatches more stringent. However, this may also inadvertently remove some true matches. Conversely, a smaller value of r reduces the strictness of the constraints, lowering the risk of eliminating true matches but potentially failing to remove all mismatches. Therefore, selecting an appropriate value of r is crucial for ensuring efficient and accurate registration results.
In certain specific application scenarios, such as pose estimation, the choice of $r$ becomes particularly important. Typically, a smaller value of $r$ (e.g., $r = 1$) indicates a lower level of spatial constraint strictness, allowing for greater tolerance in spatial relationships. Although this setting may permit more mismatches, it provides greater flexibility to retain true matches, which is particularly critical for pose estimation tasks. In such applications, where high registration accuracy is required and point clouds typically have overlapping regions, retaining more matches effectively reduces the risk of eliminating true matches. Therefore, in pose estimation scenarios, setting $r = 1$ is a reasonable strategy. This not only simplifies the computation but also ensures efficient and accurate registration.
Additionally, under each pose variation, 3DSE is performed independently, and all encoding results are combined to generate the complete spatial encoding map. For common pose estimation scenarios, point clouds between consecutive frames generally have overlapping regions, and pose variations typically do not exceed 90°. Based on this assumption, this study selects $r = 1$ to simplify the computation.
3.4. Point-Pair Distance Constraint
The 2D spatial encoding method is primarily applied to large-scale image recognition and retrieval tasks, where matching verification is required for images undergoing affine transformations. However, point clouds undergoing rigid transformations typically do not need to consider local scaling of the target. To ensure the accuracy of 3D feature matching, additional constraints must be introduced. The proposed algorithm incorporates point-pair distance constraints to ensure invariance under rigid transformations.
In practical applications, relying solely on a single method for matching may result in some erroneous matches not being effectively removed, particularly when the point cloud undergoes significant transformations or contains noise.
To illustrate this, Figure 3 demonstrates the effects of two different matching methods on point cloud matching. In the figure, red matching pairs represent erroneous matches, while blue matching pairs represent correct matches. Specifically, Figure 3a shows that using only the 3D spatial constraints fails to eliminate the erroneous red matching pairs, while Figure 3b shows that using only the point-pair distance constraints fails to retain the correct blue matching pairs. Thus, relying solely on a single method leads to suboptimal matching results.
By combining these two methods, it is possible to eliminate erroneous matches while retaining correct matching pairs, thereby improving the accuracy of 3D point cloud matching.
In the algorithm, suppose that there exist matching pairs $(p^s_i, p^t_i)$ and $(p^s_j, p^t_j)$ between the source point cloud $P^s$ and the target point cloud $P^t$. The point-pair distance inconsistency matrix $V_d$ is defined and calculated as follows.
First, the distance difference measure between corresponding point pairs in the source and target point clouds is calculated and denoted as $\Delta d_{ij}$, with the formula as follows:

$$ \Delta d_{ij} = \bigl| \, \| p^s_i - p^s_j \|_2 - \| p^t_i - p^t_j \|_2 \, \bigr|. $$

Next, a predefined distance threshold $d_t$ is used to define the point-pair distance inconsistency matrix $V_d$. The calculation rule for this matrix is as follows:

$$ V_d(i,j) = \begin{cases} 1, & \Delta d_{ij} > d_t \\ 0, & \Delta d_{ij} \le d_t. \end{cases} $$

Finally, the distance inconsistency component $S_d$ of each matching pair with respect to the other matching pairs is calculated as follows:

$$ S_d(i) = \sum_{j} V_d(i,j). $$
Through the above calculations, we obtain the distance inconsistency component $S_d$ of each matching pair under the distance threshold $d_t$. This enables the effective identification and removal of erroneous matches with significant distance deviations.
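The point-pair distance constraint can be sketched in the same style; the function name `distance_inconsistency` and the threshold argument `d_t` are illustrative.

```python
# Sketch of the point-pair distance constraint: Delta_d(i, j) compares the intra-cloud
# distances of matches i and j; entries exceeding the threshold d_t vote as inconsistent.
import numpy as np

def distance_inconsistency(src_pts, tgt_pts, d_t):
    """src_pts, tgt_pts: (n, 3) coordinates of the matched source/target key points."""
    d_src = np.linalg.norm(src_pts[:, None, :] - src_pts[None, :, :], axis=-1)  # source pair distances
    d_tgt = np.linalg.norm(tgt_pts[:, None, :] - tgt_pts[None, :, :], axis=-1)  # target pair distances
    delta = np.abs(d_src - d_tgt)             # rigidity violation between matches i and j
    V_d = (delta > d_t).astype(np.uint8)      # binary distance inconsistency matrix
    S_d = V_d.sum(axis=1)                     # distance inconsistency component per match
    return V_d, S_d
```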
3.5. Code
The following section provides an overview of Algorithm 1, which outlines the key steps of the proposed 3DSE method for eliminating mismatches in point cloud matching.
3.6. Computational Complexity
The time complexity of the proposed 3DSE method is dominated by three key operations: spatial encoding matrix construction, distance constraint computation, and iterative outlier removal. For a point cloud with $n$ initial matching pairs and $r$ rotational discretization steps per axis, pairwise comparisons between the $n$ matches are performed along the X-, Y-, and Z-axes for each of the $r^3$ rotational poses. The spatial encoding stage therefore requires $O(r^3 n^2)$ operations to generate the spatial relationship matrices. If the value of $r$ is small or sufficient memory is available, the spatial encodings for different poses can be constructed in parallel, reducing the time complexity to $O(n^2)$. Meanwhile, all initial matches are compared pairwise to compute Euclidean distance discrepancies; the distance constraint matrix requires $O(n^2)$ operations to compute the relative Euclidean distance differences between source and target point pairs. The iterative outlier removal process requires calculating the constraint inconsistencies, finding the index of the most inconsistent point pair, and updating the matrices of the different constraint dimensions, which takes $O(n^2)$ operations in total. In summary, the computational complexity scales linearly with the number of pose variations and quadratically with the size of the correspondence set.
Total time complexity: $O(r^3 n^2)$. The space complexity remains highly efficient due to binary encoding and minimal intermediate storage. Source/target point coordinates and initial matches are preloaded, requiring no additional storage. The spatial encoding matrices $G_x$, $G_y$, and $G_z$ occupy $3 r^3 n^2$ bits (1-bit storage per entry).
Total space complexity: $O(r^3 n^2)$ bits.
Algorithm 1 Algorithm for 3DSE-based point cloud feature matching
Input: Source point cloud $P^s$ and target point cloud $P^t$. Output: Refined matching set $C$.
1. Extract key points from the source point cloud $P^s$ and the target point cloud $P^t$ using the same key point selection criteria, resulting in key point sets $K^s$ and $K^t$. Here, $K^s$ contains $m$ points with the feature set $F^s$, and $K^t$ contains $n$ points with the feature set $F^t$.
2. Generate the initial matching pair set $C$ by matching the local feature descriptors of $K^s$ and $K^t$ using Equation (5).
3. Generate the spatial graphs $G^s_x$, $G^s_y$, $G^s_z$ and $G^t_x$, $G^t_y$, $G^t_z$ for the matched points of $P^s$ and $P^t$ using Equations (10)–(12).
4. Compute the distance difference measure $\Delta d_{ij}$ using Equation (19).
5. Generate the inconsistency matrices $V_x$, $V_y$, $V_z$, and $V_d$ using Equations (13)–(15) and (20).
6. Compute the inconsistency components $S_x$, $S_y$, $S_z$, and $S_d$ across the different dimensions using Equations (16)–(18) and (21).
7. Set $S_{\max} = \max(S_x, S_y, S_z, S_d)$.
8. While $S_{\max} > t$, enter the loop:
   a. Identify the index $I$ of the matching pair that attains $S_{\max}$.
   b. Set the $I$-th rows and columns of $V_x$, $V_y$, $V_z$, and $V_d$ to 0.
   c. Recalculate $S_x$, $S_y$, $S_z$, and $S_d$.
   d. Update $S_{\max}$.
9. End the loop.
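For reference, a self-contained sketch of the loop in Algorithm 1 is given below for the simplified case r = 1 (a single pose); it takes the coordinates of the already-matched source and target key points as input, and the names `refine_matches`, `d_t`, and `tol` are illustrative.

```python
# Sketch of Algorithm 1 with r = 1: build the spatial and distance constraints for the
# matched points, then iteratively drop the most inconsistent pair until the largest
# inconsistency component falls below the tolerance.
import numpy as np

def order_maps(pts):
    """Per-axis binary ordering maps for an (n, 3) coordinate array."""
    return [(pts[:, a][:, None] < pts[:, a][None, :]).astype(np.uint8) for a in range(3)]

def refine_matches(src_pts, tgt_pts, d_t, tol=0):
    """Return indices of the matches kept after 3DSE + distance filtering."""
    keep = np.ones(len(src_pts), dtype=bool)
    # Spatial inconsistency matrices (XOR of source/target ordering maps).
    Vs = [np.bitwise_xor(ms, mt) for ms, mt in zip(order_maps(src_pts), order_maps(tgt_pts))]
    # Distance inconsistency matrix.
    d_src = np.linalg.norm(src_pts[:, None] - src_pts[None, :], axis=-1)
    d_tgt = np.linalg.norm(tgt_pts[:, None] - tgt_pts[None, :], axis=-1)
    Vs.append((np.abs(d_src - d_tgt) > d_t).astype(np.uint8))
    while True:
        S = np.stack([V.sum(axis=1) for V in Vs])          # components S_x, S_y, S_z, S_d
        S[:, ~keep] = 0                                    # removed pairs no longer counted
        if S.max() <= tol:
            break
        worst = np.unravel_index(S.argmax(), S.shape)[1]   # index of the worst matching pair
        keep[worst] = False
        for V in Vs:                                       # zero its rows and columns
            V[worst, :] = 0
            V[:, worst] = 0
    return np.flatnonzero(keep)
```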
4. Experiments
4.1. Experimental Data
This study employs both simulated and real-world data to evaluate the performance of the proposed 3DSE method in point cloud feature matching for space targets. To accurately simulate the reflective characteristics of solar panels on satellites, the point cloud density is adjusted during simulation based on the incident angle of the laser. The model of the simulated satellite is shown in Figure 4.
In addition to simulations, real-world data are employed to evaluate the algorithm’s performance. The real-world data originate from a 64-element linear array LiDAR developed by the Shanghai Institute of Technical Physics, Chinese Academy of Sciences, which was used to scan a scaled satellite model. A photograph of the scaled satellite model is shown in Figure 5. The structure of the scaled satellite model is strongly symmetric; if feature matching is performed using local feature descriptors alone, incorrect matches are likely to occur at key points sampled on the symmetric structures. The proposed algorithm is tested using both simulated and real-world data.
To verify the applicability of the algorithm across different source data, we also added a standard 3D feature-matching dataset, the Bologna Dataset5 (BoD5). The BoD5 dataset was acquired with a Kinect sensor and includes 43 model pairs with clutter and occlusions, significantly affected by noise. Figure 6 provides an example view of this dataset.
4.2. Criteria
To quantitatively assess the performance of point cloud feature-matching methods, this study employs Recall of Inliers (ROI) as the evaluation metric. This metric is defined by analyzing the relevance of the top $K$ matching pairs scored by the algorithm. Specifically, let $C_K$ represent the set of the top $K$ matching pairs according to the scoring method, and let $C$ denote the initial matching set. For a given $K$, the recall is defined as

$$ \mathrm{Recall}(K) = \frac{\left| C_K \cap C_{\mathrm{in}} \right|}{\left| C_{\mathrm{in}} \right|}, $$

where $C_{\mathrm{in}}$ denotes the set of inliers within the initial matching set $C$.
By varying the value of $K$, recall curves can be generated to reflect the proportion of inliers within the selected matching pairs for different values of $K$. A higher recall indicates a larger proportion of inliers within the selected pairs.
To determine whether a matching pair $m = (p^s, p^t)$ is an inlier, the ground-truth rotation matrix $R_{gt}$ and translation vector $t_{gt}$ are used to compute the $L_2$ distance between the transformed source point and the target point. If the following condition is satisfied, the pair $m$ is considered an inlier:

$$ \left\| R_{gt} \, p^s + t_{gt} - p^t \right\|_2 < d_{thr}. $$

Here, $d_{thr}$ is the distance threshold used to determine spatial consistency for matching pairs. In this study, $d_{thr}$ is set as a multiple of $pr$, where $pr$ represents the average resolution of the point cloud, defined as

$$ pr = \frac{1}{N} \sum_{k=1}^{N} \left\| p_k - \mathrm{NN}(p_k) \right\|_2, $$

where $N$ is the total number of points in the point cloud and $\left\| p_k - \mathrm{NN}(p_k) \right\|_2$ represents the $L_2$ distance between the $k$-th point and its nearest neighbor.
Using these evaluation metrics, the accuracy and robustness of the proposed method in point cloud feature matching can be comprehensively assessed. These metrics not only account for the number of matching pairs but also consider their spatial consistency, providing a comprehensive evaluation framework for algorithm performance.
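A short sketch of the ROI evaluation under these definitions follows; the ground-truth transform (R_gt, t_gt), the match scores, and the function names are assumed inputs for illustration.

```python
# Sketch of the Recall-of-Inliers (ROI) evaluation: a match is an inlier if the
# ground-truth transform maps its source point to within d_thr of its target point,
# and recall(K) is the fraction of all inliers recovered among the top-K scored pairs.
import numpy as np
from scipy.spatial import cKDTree

def cloud_resolution(pts):
    """Average nearest-neighbor distance pr of a point cloud."""
    d, _ = cKDTree(pts).query(pts, k=2)      # k=2: the first neighbor is the point itself
    return d[:, 1].mean()

def recall_of_inliers(src_pts, tgt_pts, scores, R_gt, t_gt, d_thr, K):
    """src_pts/tgt_pts: (n, 3) matched points; scores: higher means more confident."""
    residual = np.linalg.norm(src_pts @ R_gt.T + t_gt - tgt_pts, axis=1)
    inlier = residual < d_thr
    top_k = np.argsort(-scores)[:K]          # indices of the K best-scored matches
    return inlier[top_k].sum() / max(inlier.sum(), 1)
```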
4.3. Comparison Methods
In this study, we evaluated four point cloud feature-matching methods and compared them with the proposed 3DSE method. The four methods include (1) NN matching, (2) NNDR matching, (3) L2-based spatial encoding, and (4) progressive consistency voting (PCV) [42] optimization.
The key distinction of the 3DSE algorithm lies in its optimization of point cloud features via 3DSE, resulting in more precise and robust matching. To comprehensively evaluate the performance of different methods, we compared them across multiple aspects, focusing on accuracy, robustness, and computational efficiency in complex environments. Through this comparison, the relative advantages of the 3DSE algorithm are clearly demonstrated, especially in handling highly complex and irregular point cloud data, where it exhibits superior robustness and accuracy.
Additionally, this paper discusses the advantages and disadvantages of these methods and their applicability in real-world scenarios, providing valuable references for future research.
4.4. Parameter Selection and Ablation Experiments
Before assessing the performance of 3DSE, it is imperative to validate the efficacy of some of its critical components to ensure the interpretability of 3DSE. Experimental analysis is also required for the selection of several key parameters. These experiments are conducted on real-world datasets collected using a 64-line array LiDAR. K indicates the number of top-ranked correspondences.
The parameter $r$ controls the rotational invariance of 3DSE by discretizing the Euler angles. We experimented with different values of $r$. According to Figure 7a, there is no significant fluctuation in performance when $r$ exceeds 1. However, as illustrated in Figure 7b, the computation time increases significantly with higher $r$ values. This trade-off aligns with the characteristics of pose estimation scenarios: the limited inter-frame rotation angles maintain a certain overlap area, rendering fine-grained pose discretization ($r > 1$) unnecessary. Therefore, choosing $r = 1$ is recommended to enhance efficiency. The distance threshold $d_t$ determines the tolerance for scale differences between matching point pairs. A smaller threshold may excessively eliminate valid matches, while a larger threshold might admit anomalous matches. In this experiment, the distance threshold parameter $d_t$ is held at a fixed value.
Table 1 presents the results of our ablation study, demonstrating that the best performance is achieved when both 3DSE constraints and point-pair distance constraints are combined. The ablation experiments confirmed the necessity of spatial encoding and distance constraints. Figure 3 illustrates potential failure cases when using only a single module.
4.5. Experimental Details
In the experimental stage, the initial matching set $C$ needs to be constructed first. To generate the initial matching set, the source point cloud $P^s$ and the target point cloud $P^t$ are down-sampled. After down-sampling, we use the Signature of Histograms of Orientations (SHOT) [57] descriptor, a robust feature description method, to extract features from the down-sampled point clouds. Next, brute-force matching is performed based on the L2 distance metric to generate the preliminary matching pair set $C$. It is important to note that the purpose of the experiment is to evaluate the performance of different feature-matching methods under a unified initial matching set, rather than to optimize detectors and descriptors to produce the best possible matching set.
When evaluating feature-matching methods, this paper compares several classic approaches, including direct NN matching, the NNDR method, the L2-based spatial encoding method, and the PCV matching strategy. Each method has its strengths and limitations. This paper comprehensively compares these methods to explore their performance in practical point cloud-matching tasks.
To quantitatively compare the performance of these methods, Recall of Inliers (ROI) is selected as the evaluation criterion. By calculating the recall, the performance of different methods under various conditions can be assessed, especially in terms of the accuracy and robustness of inlier matching. Ultimately, the performance evaluation of all methods will be based on this quantitative standard to ensure the comparability of experimental results.
Through the above experimental procedures, not only can the effectiveness of different methods be evaluated, but a reliable quantitative basis can also be provided for the proposed 3DSE method to verify its advantages in feature-matching tasks.
4.6. ROI Performance Display
We tested the recall performance of each method on both simulated and real-world data to comprehensively evaluate the robustness of each algorithm. In the simulation experiments, Gaussian noise of varying intensities was added to the simulated point clouds, and different levels of down-sampling were performed to verify the performance of each method under these complex conditions.
Figure 8 illustrates the recall performance of the different algorithms on the simulated data. Figure 8a–e show the test results under different levels of Gaussian noise, while Figure 8f–h present the results under various data sampling ratios. The experimental results indicate that as the intensity of Gaussian noise increases, especially at a noise level of 0.5 pr, the 3DSE method maintains good performance despite the noise, demonstrating high robustness. In the data sampling tests, as the amount of point cloud data decreases, 3DSE still achieves high recall performance.
Additionally, Figure 9 shows the recall performance of the different algorithms on real-world data. On real-world data, the 3DSE method significantly outperforms the other compared methods, particularly excelling in complex environments. Compared to the simulated data, the real-world data contain more complex noise and data loss. However, 3DSE still demonstrates high accuracy and robustness under these conditions.
Overall, the 3DSE method exhibits strong robustness and high recall rates under various experimental conditions. It provides accurate matching results even under noise interference and data sparsity, proving its potential in practical applications.
4.7. Robustness to Different Data Sources
In addition to the simulation data and LiDAR data, we conducted experiments on the BoD5 dataset, captured using a Kinect, to test the algorithm’s applicability to point cloud data acquired from other sensors. The results are shown in Figure 10. As demonstrated in Figure 10a, the 3DSE method still exhibits certain advantages, showing strong robustness to data obtained from different sensors. We also visualized the matching set before and after the removal of incorrect matches using pseudo-coloring (with red indicating incorrect matches and blue indicating correct matches).
Moreover, leveraging the binary classification characteristic of 3DSE, we tested the classification performance of the various methods on the BoD5 dataset. We divided the initial matching set into two subsets comprising inliers and outliers. Based on the classification results of the different algorithms, we categorized them using a confusion matrix and calculated metrics such as precision, recall, and F-score to evaluate the classification performance of each method. The results are presented in Table 2.
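As a simple illustration of how such confusion-matrix metrics can be computed from ground-truth inlier labels and each method's kept/removed decisions, a short sketch is given below; the variable names are illustrative.

```python
# Sketch of the confusion-matrix metrics used in Table 2: matches kept by an algorithm
# are treated as positive predictions and compared against the ground-truth inlier labels.
import numpy as np

def classification_metrics(is_inlier, is_kept):
    """is_inlier, is_kept: boolean arrays over the initial matching set."""
    tp = np.sum(is_inlier & is_kept)          # true inliers that were kept
    fp = np.sum(~is_inlier & is_kept)         # outliers mistakenly kept
    fn = np.sum(is_inlier & ~is_kept)         # inliers mistakenly removed
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f_score = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f_score
```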
4.8. Effectiveness Display
To intuitively demonstrate the practical performance of the 3DSE algorithm, Figure 11 presents a comparison of real-world point cloud data before and after the removal of mismatches. Figure 11a shows all matching pairs in the initial matching set, in which a large number of mismatched pairs can be observed. Figure 11b illustrates the matching results after processing with the 3DSE algorithm, which significantly reduces erroneous matches, retaining only a small number of high-quality matching points.
Through comparative analysis, it is evident that the 3DSE algorithm can accurately identify and eliminate mismatched pairs, thereby significantly improving the precision and robustness of point cloud registration. This result not only validates the effectiveness of the 3DSE algorithm in processing real-world point cloud data but also provides high-quality point cloud registration results for subsequent satellite model analysis. Specifically, after eliminating mismatches, the matching results exhibit higher geometric consistency, further proving the method’s potential for application in real-world complex environments.
4.9. Time Efficiency
To evaluate the time efficiency of the different methods in handling the initial correspondence set, experiments were conducted in the MATLAB 2022b environment. The experiments tested scenarios with initial correspondence counts ranging from 50 to 1000, with each scenario repeated ten times; the average value was taken as the measured time cost. The results are shown in Table 3.
It is worth noting that the experimental algorithms were implemented in the MATLAB 2022b environment primarily for its convenience in rapid development and experimental validation rather than for high-performance optimization. Compared to lower-level languages such as C++, MATLAB typically incurs higher runtime costs. Therefore, the time costs presented in the table may be higher than those in related studies. However, this does not affect the comparative analysis of relative efficiency among the algorithms.
Specifically, as shown in Table 3, when the scale of the initial correspondence set is small (50 to 200 pairs), the NN and NNDR methods incur the least time cost due to their simplicity. As the scale of the correspondence set increases, the time costs of the L2SE, PCV, and 3DSE methods grow rapidly. In particular, the PCV method, due to its multi-round consistency voting process, exhibits significantly higher time costs when the correspondence count reaches 1000. Although the 3DSE method involves complex spatial consistency calculations, its time cost grows more moderately than that of the PCV method, demonstrating a better balance between efficiency and effectiveness.
The time cost data in this experiment can serve as a reference for algorithm selection in different application scenarios. For example, NN or NNDR methods can be prioritized in scenarios requiring high real-time performance. For scenarios requiring higher matching precision and robustness, the 3DSE method demonstrates greater application value.
5. Conclusions
This paper presents a robust 3D point cloud feature-matching method named 3DSE. The core idea is to leverage the consistency of local geometric features in true matching pairs and to eliminate unreliable mismatched pairs through spatial geometric constraints, thereby generating a high-quality set of matching pairs. Through systematic experimental evaluation and comparative analysis, the proposed method demonstrates superior robustness and matching performance in both simulated and real-world data. The following are the main characteristics and application prospects of 3DSE.
5.1. Algorithm Characteristics
Simple and efficient geometric constraint design: 3DSE imposes consistency constraints on local features through spatial geometric encoding, without requiring complex iterative optimization processes. This approach is easy to understand and implement.
Outstanding robustness and adaptability: Experimental results on different datasets demonstrate that 3DSE maintains high performance and robustness under challenging conditions such as noise interference, data sampling biases, clutter, occlusions, and changes in data modalities. In sparse and noisy low-quality Kinect datasets like BoD5, 3DSE outperforms similar methods across various classification metrics. Additionally, in LiDAR-collected point clouds, when dealing with satellite-scale models characterized by numerous symmetrical and repetitive structures and few features, 3DSE effectively utilizes the constraints of feature point spatial layout, achieving superior outlier removal compared to similar algorithms.
Performance optimization and scalability: Compared with classical methods, 3DSE significantly improves point cloud registration accuracy while removing mismatched pairs, providing a reliable foundation for subsequent point cloud processing tasks.
5.2. Experimental Validation and Practicality
Comprehensive experimental analyses on both simulated and real-world data demonstrate that 3DSE outperforms state-of-the-art methods under various matching conditions. In particular, for 3D registration tasks, by effectively improving the quality of matching pairs, 3DSE significantly enhances the overall performance of point cloud registration. Moreover, the method’s implementation process is straightforward, and its runtime efficiency demonstrates high practical value under experimental conditions.
5.3. Application Prospects
The 3DSE method has broad application potential, particularly in the following fields:
Three-dimensional reconstruction: Reliable point-to-point matches support the fusion and reconstruction of multi-view data.
Object recognition and scene understanding: Achieving precise matching between models and point clouds in complex scenes provides a foundation for object recognition tasks.
Pose estimation of non-cooperative targets: In dynamic scenes, real-time pose estimation of targets based on multi-frame point cloud matching supports tasks such as navigation and industrial inspection.
5.4. Future Work
To further enhance the performance and applicability of 3DSE, we plan to conduct research in the following directions:
Efficiency optimization: Investigate more efficient strategies for computing geometric constraints in the algorithm to reduce computation time and resource consumption while maintaining matching accuracy.
Extension to multi-modal and complex scenarios: Explore multi-modal data fusion techniques to enhance 3DSE’s adaptability and robustness when handling high-dimensional data or operating in complex sensor environments, such as applications in unmanned systems and medical imaging.
In summary, the 3DSE method not only demonstrates significant advantages in current 3D point cloud processing tasks but also provides potential and directions for future research and applications.