1. Introduction
With the widespread adoption of low-cost RGB-D cameras such as the Microsoft Kinect and the decreasing cost of commercial LiDAR systems, the acquisition of 3D point cloud data has become more accessible and diverse [1,2]. Point cloud processing techniques continue to evolve and are demonstrating increasingly broad application prospects across various fields.
Point cloud registration, a critical step in this processing pipeline, aims to achieve optimal alignment of point clouds expressed in different coordinate systems through spatial transformations. Depending on the application scenario, point cloud registration can be used for target pose estimation [3,4], 3D object recognition [5], and 3D reconstruction [6], and it plays a significant role in fields such as reverse engineering [7], digital cities [8], and intelligent robotics [9].
Generally, without relying on manual intervention or assistance from other instruments, point cloud registration based solely on the information inherent in the point cloud itself can be categorized into three types [10]: (1) point-based [11,12], (2) global feature-based [13,14], and (3) local feature-based [15,16,17,18]. Point-based methods operate directly on the point cloud data, performing registration through geometric alignment or random sampling strategies and iteratively optimizing the matching relationships between point pairs. These methods typically achieve high accuracy but are sensitive to initial positions and noise, prone to falling into local optima, and incur high computational costs on large-scale point clouds [19]. Global feature-based methods abstract the entire point cloud into features such as vectors or frequency representations; however, they are limited when the overlap between two point clouds is insufficient. Local feature-based methods focus on describing the regions around feature points, making them better suited for registration tasks in which the point clouds only partially overlap [10]. While recent studies on deep neural networks further demonstrate their superior performance in feature extraction and pattern recognition tasks such as image classification and object detection [20,21,22], their sensitivity to input variations necessitates careful robustness analysis in safety-critical applications [23].
The general pipeline for local feature-based point cloud registration is as follows:
1. Extract key points from the two point clouds based on a unified standard.
2. Compute 3D local feature descriptors for all key points.
3. Estimate initial matching point pairs based on feature similarity.
4. Remove incorrect matching point pairs that negatively affect registration.
5. Use the remaining correct matching point pairs to estimate the rigid transformation and complete the registration.
Among all steps in the pipeline, describing the local features of key points is the most critical. Regardless of the type of 3D local feature descriptor, it should possess rich descriptiveness to differentiate various local surfaces and sufficient robustness to resist the effects of geometric transformations and environmental disturbances [24].
Although numerous studies on constructing 3D local feature descriptors have emerged over the past decade, the performance of local feature-based point cloud-matching pipelines often suffers in practical applications due to environmental complexity and sensor limitations [25]. This is primarily because (1) noise, occlusion, repetitive structures, and systematic errors in point cloud acquisition devices [26] often lead to incorrect key point detection; (2) the inherent limitations of 3D local feature descriptors, such as the choice of support radius, often result in an initial matching set containing numerous incorrect matches [27,28]; and (3) the symmetric and multi-layer repetitive structures of certain targets (such as satellites) can lead to ambiguous matching results.
Therefore, in the fourth step of the pipeline, developing a method to remove incorrect matches plays a crucial role in enhancing the performance of local feature-based point cloud registration pipelines.
2. Related Work
In order to construct accurate feature point pairs, researchers have explored a variety of methods [29]. A common approach is to use a threshold on feature distances to quickly establish an initial set of matching pairs [30], but this method often includes a significant number of incorrect matches. To address this, some researchers have employed nearest neighbor (NN) methods to generate an initial set with fewer errors [15,31], while others have adopted a bidirectional NN approach to create a one-to-one correspondence in the initial set [13]. Additionally, ref. [32] used the nearest neighbor distance ratio (NNDR) algorithm to construct an initial set that outperforms the threshold-based method in terms of matching performance. These three approaches rely solely on the similarity of feature descriptors [33]. Although they are computationally efficient and simple to implement, they are highly sensitive to outliers due to the limitations of local descriptors. As a result, the initial matching pairs they generate still contain numerous incorrect matches, which must be filtered out using additional methods.
Random Sample Consensus (RANSAC) [34] addresses this issue by selecting a minimal set of feature correspondences to compute a rigid transformation R that aligns the model with the scene. The number of correspondences required depends on the method used to compute R. For example, if only the positions of the key points are used, three correspondences are needed; if both positions and normal vectors are used, two correspondences suffice; and if a local reference frame (LRF) is established for each key point, a single correspondence is sufficient. RANSAC iteratively samples correspondences, computes the induced transformation, and retains the one under which the largest number of transformed points are inliers with respect to a set threshold. Examples based on this method include [30,35,36]. However, directly applying RANSAC to the initial correspondence set to find consistent correspondences is not ideal, because it typically requires numerous iterations and does not always guarantee an optimal solution.
Geometric consistency (GC) utilizes external constraints based on the actual points in the point cloud, which are distinct from the feature space, to remove false matches. This approach assumes that true matches exhibit geometric similarity, while false matches do not. These methods often use the distances between matched points as key constraints for the geometric consistency of rigid transformations in the point cloud [13,37,38]. Dorai [39] defined a threshold on the distance between points to assess the geometric consistency of different pairs, iteratively checking and removing the most inconsistent pairs. In addition, some methods combine multiple geometric consistency measures to form a collective constraint. For instance, Johnson and Hebert [15,40] used a combination of distances between points and their normal vectors as collective constraints. Furthermore, some methods apply geometric consistency to construct voting sets, where different matching pairs are scored and those with higher scores are selected as true matches [41,42].
Pose clustering assumes that when the source and target point clouds are correctly matched, the computed transformations will cluster near the true transformation in transformation space. This method typically utilizes the feature points’ LRFs, allowing a single match pair to vote for a transformation. After clustering all transformations, the cluster centers are considered candidate transformations, which are then validated in descending order of their scores. Examples based on this method include [43,44,45,46,47,48].
Constraint explanation trees build consistent explanation trees for each possible transformation. As the number of nodes in a tree increases, the certainty of the hypothesis improves. Refs. [49,50] utilized this method for feature registration across different scales, incorporating features from coarse to fine scales and progressively adding nodes at each scale level. The selection of a specific sub-node means that all features corresponding to the parent nodes must satisfy the same transformation. When the transformed point cloud satisfies a certain threshold, the hypothesized transformation is deemed correct. This method effectively filters out inconsistent transformation hypotheses and gradually eliminates mismatched pairs.
Game theory applies a non-cooperative game framework to all matching pairs in the initial set, where a designed payoff function causes incompatible feature correspondences to vanish after several iterations, allowing the most reliable pairs to survive. These correspondences are then used to compute transformation hypotheses. Examples based on this method include [51,52].
The generalized Hough transform method is similar to pose clustering but differs in that it projects feature correspondences into the Hough space for voting and clustering. Each point in the Hough space represents a potential transformation between the source and target point clouds. Larger clusters in the space are considered more reliable transformations. Examples based on this method include [53,54,55].
This study addresses the limitations and challenges of existing 3D point cloud feature-matching techniques by proposing a novel matching method, 3D spatial encoding (3DSE). The purpose of 3DSE is to enhance the robustness of feature matching, drawing inspiration from image processing algorithms. By employing an inconsistency scoring mechanism, it identifies and excludes unstable local feature points so that only accurate matches are retained.
The core of 3DSE lies in leveraging the geometric consistency of true matches to prioritize the elimination of incorrect matches. It does not require a predefined initial voting set or its size but iteratively removes the most inconsistent feature points until convergence, leaving only reliable matches. Although there are precedents for directly excluding inconsistent points, these methods often fail to adequately consider the complexity of real-world point clouds, which is critical for stable feature matching.
Unlike traditional methods that require carefully designed geometric constraints, 3DSE adopts a simple and intuitive geometric constraint yet demonstrates exceptional performance. In simulated point cloud experiments, 3DSE exhibited strong robustness against various disturbances, and comparative studies further validated its advantages. Even when tested on low-quality real-world point clouds, 3DSE maintained high robustness. Most importantly, since our research focuses on the post-processing stage, our optimization method can be combined with other point cloud feature-matching schemes to further improve the accuracy of point cloud registration.
3. Methodology
Our approach is logically similar to other geometric constraint methods, all of which are based on similar ideas or observations: correct matching pairs should exhibit geometric consistency, whereas incorrect matches may not be compatible under geometric constraints. L2 distance, normal vectors, and LRF (local reference frame) are typical geometric constraints. In particular, in point cloud matching with rigid body transformations, it is intuitive to use rigidity constraints, such as the L2 distance, to eliminate incompatible matches. However, results obtained solely by using L2 distance rigidity constraints often exhibit ambiguity. Methods using normal vectors and LRF often require additional computational costs and parameter adjustments, and their results are highly susceptible to noise. Therefore, these methods often need to be paired with 3D local descriptors designed based on LRF to effectively utilize existing parameters.
For these reasons, we hypothesize that the same feature point in the source and target point clouds should have a similar spatial layout. Thus, a truly matching pair of features should not only be compatible in the quantized vectors of local descriptors but also maintain a similar directional order with other true matches. The primary function of 3DSE is to quantify the order of different matching points in the directional space. The technical details of the 3DSE method will be discussed in depth in the following.
First, the method employs a bidirectional NN search to match the local features of the two point clouds, establishing an initial set of matching pairs. Subsequently, 3DSE iteratively removes the matching pairs that are most inconsistent in terms of spatial relationships, guided by a predefined inconsistency tolerance, until all remaining pairs exhibit inconsistencies below that tolerance. During each iteration, the following steps are performed: first, a spatial relationship graph of the point cloud is constructed from the current set of matching pairs and encoded into 3D spatial representations; next, a spatial verification step eliminates outliers that fail to meet geometric consistency, producing a new, more reliable set of matching pairs. This process is repeated until the predefined inconsistency criterion is met, ensuring the accuracy and robustness of the final matching set.
3.1. Initial Matching Set Acquisition
To construct the initial matching set $C$, the source point cloud $P^s$ and the target point cloud $P^t$ are first processed using the same key point selection criteria to extract key points, resulting in the key point sets $K^s$ and $K^t$. Here, $K^s$ contains $m$ points with the feature set $F^s = \{f^s_1, \dots, f^s_m\}$, and $K^t$ contains $n$ points with the feature set $F^t = \{f^t_1, \dots, f^t_n\}$. For each point $p^s_i$ in $K^s$, an NN search is performed over the feature set $F^t$ of $K^t$ to find the point $p^t_j$ whose feature distance to $p^s_i$ is minimal and below the threshold $\epsilon$. The process is expressed as follows:

$$ j = \arg\min_{l \in \{1, \dots, n\}} \left\| f^s_i - f^t_l \right\|_2, \qquad \left\| f^s_i - f^t_j \right\|_2 < \epsilon. $$

The matched pairs $(p^s_i, p^t_j)$ that meet this condition constitute the candidate matching set $C'$:

$$ C' = \left\{ (p^s_i, p^t_j) \;\middle|\; \left\| f^s_i - f^t_j \right\|_2 < \epsilon \right\}. $$

On this basis, to eliminate one-to-many matching pairs, an NN search is performed again for each point $p^t_j$ in $K^t$, restricted to the candidate set $C'$. Specifically, for $p^t_j$, all points of $K^s$ that match it are extracted to form the set $S_j$:

$$ S_j = \left\{ p^s_i \;\middle|\; (p^s_i, p^t_j) \in C' \right\}. $$

Then, within the set $S_j$, the point $p^s_{i^*}$ that has the smallest feature distance to $p^t_j$ and is below the threshold $\epsilon$ is selected:

$$ i^* = \arg\min_{p^s_i \in S_j} \left\| f^s_i - f^t_j \right\|_2. $$

The final initial matching set $C$ is defined as

$$ C = \left\{ (p^s_{i^*}, p^t_j) \;\middle|\; p^t_j \in K^t, \; S_j \neq \varnothing \right\}. $$

This process effectively reduces the impact of one-to-many matches, ensuring the uniqueness and accuracy of the matches.
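For concreteness, a minimal sketch of this bidirectional NN matching step is given below, assuming the local descriptors have already been computed and stored as NumPy arrays; the function name `initial_matches` and the threshold argument `eps` are illustrative rather than part of the original implementation.

```python
# Minimal sketch of the bidirectional NN matching described above (Section 3.1).
# Assumes feats_s (m x d) and feats_t (n x d) are precomputed local descriptors;
# the threshold `eps` and all names are illustrative.
import numpy as np
from scipy.spatial import cKDTree

def initial_matches(feats_s, feats_t, eps):
    """Return index pairs (i, j) forming the bidirectional-NN initial set C."""
    tree_t = cKDTree(feats_t)
    # Forward NN: for every source feature, its nearest target feature.
    dist, j_of_i = tree_t.query(feats_s, k=1)
    keep = dist < eps                                  # feature-distance threshold
    # Group source candidates by the target feature they hit (the set S_j).
    best = {}                                          # j -> (distance, i)
    for i in np.flatnonzero(keep):
        j = int(j_of_i[i])
        if j not in best or dist[i] < best[j][0]:
            best[j] = (dist[i], i)                     # keep closest source per target
    return [(i, j) for j, (_, i) in best.items()]      # one-to-one initial set C
```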
3.2. The 3DSE Building
To construct the initial matching set $C$, we first perform down-sampling on the source point cloud $P^s$ and the target point cloud $P^t$. Then, 3D local feature descriptors are used to describe the features of the down-sampled point clouds. Next, brute-force feature matching is conducted based on the L2 distance metric to generate the preliminary matching set $C$. Since incorrect matches can negatively affect the estimation of the rigid transformation matrix, it is necessary to remove outliers. Typically, RANSAC or other spatial constraint relationships are used to filter out some outliers. However, as RANSAC is computationally expensive on the entire matching set, it is usually employed only once the matching set has been sufficiently reduced. Moreover, the performance of general spatial constraints is often suboptimal when the point cloud quality is low. Thus, a more relaxed representation is needed to describe the spatial constraints between local feature descriptors, allowing a global model to be created from point-pair features. Inspired by this idea [56], we constructed the 3DSE scheme.
In 3DSE, binary spatial mapping is performed for the three dimensions, yielding the maps $M_x$, $M_y$, and $M_z$, which describe the relative spatial positions of feature pairs along the $x$-, $y$-, and $z$-axes of the 3D coordinate system, respectively. Suppose $k$ distinct sample points in a point cloud $P$ have their local feature descriptors computed, forming the feature set $F = \{f_1, \dots, f_k\}$. The maps $M_x$, $M_y$, and $M_z$ are defined as $k \times k$ binary matrices:

$$ M_x(i,j) = \begin{cases} 1, & x_i < x_j \\ 0, & x_i \ge x_j \end{cases}, \qquad M_y(i,j) = \begin{cases} 1, & y_i < y_j \\ 0, & y_i \ge y_j \end{cases}, \qquad M_z(i,j) = \begin{cases} 1, & z_i < z_j \\ 0, & z_i \ge z_j. \end{cases} $$

Here, $(x_i, y_i, z_i)$ and $(x_j, y_j, z_j)$ represent the spatial coordinates of the points corresponding to features $f_i$ and $f_j$, respectively.
Figure 1 briefly illustrates the 3DSE of a point cloud with four point features and the resulting maps $M_x$, $M_y$, and $M_z$.
In $M_x$, $M_y$, and $M_z$, the $i$-th row records the spatial relationships between feature $f_i$ and the other features in the point cloud. For example, $M_x(i,j) = 0$, $M_y(i,j) = 1$, and $M_z(i,j) = 1$ indicate that feature $f_j$ is located behind, to the right of, and above feature $f_i$ (where the positive X-axis is defined as forward, the positive Y-axis as right, and the positive Z-axis as up). This mapping can also be interpreted as follows: in the $i$-th row, feature $f_i$ is selected as the origin, and the surrounding space is divided into eight octants according to the 3D coordinate system. The entries of $M_x$, $M_y$, and $M_z$ then indicate which octant each of the other features belongs to. As shown in Figure 2, for example, in the coordinate system where feature $f_1$ is the origin, features $f_2$, $f_3$, and $f_4$ are located in the first, third, and fourth octants, respectively.
To represent the relative spatial relationships between features in the point cloud, this study adopts the 3DSE method. Specifically, assuming that $k$ valid features are extracted from the point cloud, $3k(k-1)/2$ binary bits are required to encode the spatial relationships among these features. This is because for the feature pairs $(f_i, f_j)$ and $(f_j, f_i)$ (where $i \neq j$), the spatial relationships $M(i,j)$ and $M(j,i)$ are not equal, and when $i = j$, $M(i,j) = 0$. In this way, the relative spatial positions between features in the point cloud are represented as binary encodings. The spatial relationship between each pair of features is encoded by 3 bits, corresponding to the spatial relationships along the X-, Y-, and Z-axes, respectively. This provides a loose geometric constraint on the relative positions of features in the point cloud.
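As an illustration, the binary maps defined above can be obtained with a few array comparisons. The sketch below assumes the key point coordinates are stored as a (k, 3) NumPy array and uses the strict less-than convention for the ordering test, which is one possible choice consistent with the definition.

```python
# Illustrative sketch of the binary spatial maps M_x, M_y, M_z defined above.
# `pts` is a (k, 3) array of key point coordinates; the comparison direction is
# an assumption matching the convention used in the text.
import numpy as np

def spatial_maps(pts):
    """Return three k x k binary matrices encoding axis-wise ordering."""
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
    Mx = (x[:, None] < x[None, :]).astype(np.uint8)   # Mx[i, j] = 1 if x_i < x_j
    My = (y[:, None] < y[None, :]).astype(np.uint8)
    Mz = (z[:, None] < z[None, :]).astype(np.uint8)
    return Mx, My, Mz
```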
To further enhance the pose invariance of this encoding method, we take into account the impact of rotational variations on the encoding results. In 2D images, rotational invariance is typically achieved by rotating the image around an axis perpendicular to the screen and accumulating the results. However, in 3D point clouds, pose variations are more complex. To address this challenge, we generate different point cloud poses using combinations of Euler angles, performing rotations r times around the X-, Y-, and Z-axes, respectively. This process is equivalent to treating each feature point as the reference origin and dividing the space uniformly into r sectors on the projection planes perpendicular to the X-, Y-, and Z-axes, thereby creating different spatial partitioning configurations.
For each pose variation, we perform 3DSE independently and ultimately combine all encoding results to generate the final spatial encoding map. In common pose measurement scenarios, there is usually overlap between consecutive frames of the point cloud, so pose differences typically do not exceed 90°. Based on this assumption, r is set to 1 in this study to simplify calculations and maintain pose invariance.
Thus, the generalized spatial graphs $G_x$, $G_y$, and $G_z$ can be generated as follows. Assume that the original position of feature point $f_i$ is $(x_i, y_i, z_i)$. After rotation by the Euler angles $\alpha_u$, $\beta_v$, and $\gamma_w$, the new position of the feature point becomes $(x_i^{(u,v,w)}, y_i^{(u,v,w)}, z_i^{(u,v,w)})$, where $u, v, w \in \{1, \dots, r\}$.
Next, the generalized spatial graphs $G_x$, $G_y$, and $G_z$ are defined as the collections of the per-pose binary maps computed from the rotated coordinates:

$$ G_x = \left\{ M_x^{(u,v,w)} \right\}_{u,v,w=1}^{r}, \qquad G_y = \left\{ M_y^{(u,v,w)} \right\}_{u,v,w=1}^{r}, \qquad G_z = \left\{ M_z^{(u,v,w)} \right\}_{u,v,w=1}^{r}, $$

where $M_x^{(u,v,w)}$, $M_y^{(u,v,w)}$, and $M_z^{(u,v,w)}$ are obtained from the rotated coordinates exactly as in the single-pose case.
By defining the generalized spatial graphs $G_x$, $G_y$, and $G_z$, we can more rigorously describe the relative spatial positions between each pair of features.
Although constructing the entire spatial graph during execution requires using all features in the point cloud, which may consume a significant amount of memory, it is unnecessary to store the coordinates of points explicitly. Instead, only the sorting order of each feature along the three coordinate axes needs to be recorded. During point cloud feature matching, the spatial graphs can be generated based on the sorting order of these features’ coordinates, and spatial verification can then be performed.
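The following sketch illustrates one way the pose-augmented encoding could be realized, rotating the key points over a uniform Euler-angle grid and stacking the per-pose maps; the angle grid and the function name `generalized_graphs` are assumptions for illustration only.

```python
# Sketch of the generalized spatial graphs G_x, G_y, G_z: the binary maps are
# recomputed for every Euler-angle pose and stacked. The uniform angle grid is
# an illustrative choice.
import numpy as np
from scipy.spatial.transform import Rotation

def generalized_graphs(pts, r=1):
    """Return stacked per-pose binary maps (G_x, G_y, G_z), one slice per Euler pose."""
    angles = np.arange(r) * (2.0 * np.pi / r)     # r uniform steps per axis (0 only if r = 1)
    Gx, Gy, Gz = [], [], []
    for a in angles:                              # rotation step about X
        for b in angles:                          # rotation step about Y
            for c in angles:                      # rotation step about Z
                p = Rotation.from_euler("xyz", [a, b, c]).apply(pts)
                x, y, z = p[:, 0], p[:, 1], p[:, 2]
                Gx.append((x[:, None] < x[None, :]).astype(np.uint8))
                Gy.append((y[:, None] < y[None, :]).astype(np.uint8))
                Gz.append((z[:, None] < z[None, :]).astype(np.uint8))
    return np.stack(Gx), np.stack(Gy), np.stack(Gz)   # each of shape (r**3, k, k)
```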
3.3. The 3DSE Constraint
Accurately identifying and matching repetitive 3D structures is one of the key steps in 3D point cloud registration tasks. Feature matching based on key points’ local descriptors serves as the foundation for point cloud registration. The success of registration relies on the presence of a certain number of common local features between the point clouds. However, due to the complexity of point cloud data, the matching process is often accompanied by mismatches. These errors may originate from sensor noise, environmental interference, or quantization errors during data pre-processing. To improve the accuracy of matching, it is essential to effectively eliminate mismatched point pairs. This study proposes a 3DSE-based method to utilize geometric relationships between feature points to validate the consistency of the initial matching set and eliminate mismatched pairs. The specific implementation of this method is detailed as follows.
First, the initial matching set $C$ between the source point cloud $P^s$ and the target point cloud $P^t$ is obtained using NN or NNDR. Using the 3DSE method, the geometric relationships of the matched points in $P^s$ and $P^t$ are computed, generating the corresponding spatial graphs $G^s_x$, $G^s_y$, $G^s_z$ and $G^t_x$, $G^t_y$, $G^t_z$. To compare the geometric consistency between matched feature points, element-wise XOR operations are performed on the corresponding source and target graphs, resulting in the inconsistency matrices $V_x$, $V_y$, and $V_z$, defined as follows:

$$ V_x = G^s_x \oplus G^t_x, \qquad V_y = G^s_y \oplus G^t_y, \qquad V_z = G^s_z \oplus G^t_z. $$
Under ideal conditions, when the captured point clouds and their matching set $C$ are entirely correct, all entries in the inconsistency matrices $V_x$, $V_y$, and $V_z$ will be zero. If mismatches exist, these errors will result in inconsistent entries between $G^s_x$ and $G^t_x$, between $G^s_y$ and $G^t_y$, and between $G^s_z$ and $G^t_z$, causing the XOR results in $V_x$, $V_y$, and $V_z$ at the corresponding positions to be 1. Based on this, the inconsistency summation is defined as follows:

$$ S_x(i) = \sum_{j} V_x(i,j), \qquad S_y(i) = \sum_{j} V_y(i,j), \qquad S_z(i) = \sum_{j} V_z(i,j). $$
The three orthogonal inconsistency components $S_x$, $S_y$, and $S_z$ computed through spatial encoding represent the total inconsistency of each feature point along the different directions. To identify the most spatially inconsistent matches, the maximum values of these components are examined. The appearance of a maximum value indicates the possible presence of a mismatch along that direction, allowing the corresponding pair to be marked and eliminated. For eliminated matching pairs, the corresponding entries in $V_x$, $V_y$, and $V_z$ are set to 0, and $S_x$, $S_y$, and $S_z$ are recalculated, until the maximum values of these components fall below a predefined tolerance $t$.
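A compact sketch of this consistency check is given below: the source and target maps of the matched points are XOR-ed and row-summed to obtain the per-match inconsistency components; the array and function names are illustrative.

```python
# Sketch of the XOR-based inconsistency measure: V = G_src XOR G_tgt per axis, and
# S(i) = sum_j V(i, j) is the inconsistency component of the i-th matching pair.
import numpy as np

def inconsistency(G_src, G_tgt):
    """G_src, G_tgt: binary maps of the matched source/target points (same shape)."""
    V = np.bitwise_xor(G_src, G_tgt)      # 1 wherever the axis-wise orderings disagree
    S = V.sum(axis=-1)                    # row sums: per-match inconsistency
    if S.ndim > 1:                        # several poses stacked: accumulate over poses
        S = S.sum(axis=0)
    return V, S

# One call per axis, e.g.:
# Vx, Sx = inconsistency(Gx_src, Gx_tgt)
# Vy, Sy = inconsistency(Gy_src, Gy_tgt)
# Vz, Sz = inconsistency(Gz_src, Gz_tgt)
```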
The factor r is a critical parameter that controls the strictness of spatial constraints. It directly affects the balance between eliminating mismatches and preserving true matches. A larger value of r strengthens the spatial constraints, making the elimination of mismatches more stringent. However, this may also inadvertently remove some true matches. Conversely, a smaller value of r reduces the strictness of the constraints, lowering the risk of eliminating true matches but potentially failing to remove all mismatches. Therefore, selecting an appropriate value of r is crucial for ensuring efficient and accurate registration results.
In certain specific application scenarios, such as pose estimation, the choice of $r$ becomes particularly important. Typically, a smaller value of $r$ (e.g., $r = 1$) indicates a lower level of spatial constraint strictness, allowing for greater tolerance in spatial relationships. Although this setting may permit more mismatches, it provides greater flexibility to retain true matches, which is particularly critical for pose estimation tasks. In such applications, where high registration accuracy is required and point clouds typically have overlapping regions, retaining more matches effectively reduces the risk of eliminating true matches. Therefore, in pose estimation scenarios, setting $r = 1$ is a reasonable strategy. This not only simplifies the computation but also ensures efficient and accurate registration.
Additionally, under each pose variation, 3DSE is performed independently, and all encoding results are combined to generate the complete spatial encoding map. For common pose estimation scenarios, point clouds between consecutive frames generally have overlapping regions, and pose variations typically do not exceed 90°. Based on this assumption, this study selects $r = 1$ to simplify the computation.
3.4. Point-Pair Distance Constraint
The 2D spatial encoding method is primarily applied to large-scale image recognition and retrieval tasks, where matching verification is required for images undergoing affine transformations. However, point clouds undergoing rigid transformations typically do not need to consider local scaling of the target. To ensure the accuracy of 3D feature matching, additional constraints must be introduced. The proposed algorithm incorporates point-pair distance constraints to ensure invariance under rigid transformations.
In practical applications, relying solely on a single method for matching may result in some erroneous matches not being effectively removed, particularly when the point cloud undergoes significant transformations or contains noise.
To illustrate this, Figure 3 demonstrates the effects of two different matching methods on point cloud matching. In the figure, red matching pairs represent erroneous matches, while blue matching pairs represent correct matches. Specifically, Figure 3a shows that using only the 3D spatial constraints fails to eliminate the erroneous red matching pairs, while Figure 3b shows that using only the point-pair distance constraints fails to retain the correct blue matching pairs. Thus, relying solely on a single method leads to suboptimal matching results.
By combining these two methods, it is possible to eliminate erroneous matches while retaining correct matching pairs, thereby improving the accuracy of 3D point cloud matching.
In the algorithm, suppose that there exist matching pairs $(p^s_i, p^t_i)$ and $(p^s_j, p^t_j)$ between the source point cloud $P^s$ and the target point cloud $P^t$. The point-pair distance inconsistency matrix $V_d$ is defined and calculated as follows.
First, the distance difference measure between corresponding point pairs in the source and target point clouds is calculated and denoted as $\Delta d_{ij}$, with the formula as follows:

$$ \Delta d_{ij} = \bigl| \, \| p^s_i - p^s_j \|_2 - \| p^t_i - p^t_j \|_2 \, \bigr|. $$

Next, a predefined distance threshold $d_t$ is used to define the point-pair distance inconsistency matrix $V_d$. The calculation rule for this matrix is as follows:

$$ V_d(i,j) = \begin{cases} 1, & \Delta d_{ij} > d_t \\ 0, & \Delta d_{ij} \le d_t. \end{cases} $$

Finally, the distance inconsistency component $S_d$ of each matching pair with respect to the other matching pairs is calculated as follows:

$$ S_d(i) = \sum_{j} V_d(i,j). $$
Through the above calculations, we obtain the distance inconsistency component $S_d$ of each matching pair under the distance threshold $d_t$. This enables the effective identification and removal of erroneous matches with significant distance deviations.
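The point-pair distance constraint can be sketched in the same style; the function name `distance_inconsistency` and the threshold argument `d_t` are illustrative.

```python
# Sketch of the point-pair distance constraint: Delta_d(i, j) compares the intra-cloud
# distances of matches i and j; entries exceeding the threshold d_t vote as inconsistent.
import numpy as np

def distance_inconsistency(src_pts, tgt_pts, d_t):
    """src_pts, tgt_pts: (n, 3) coordinates of the matched source/target key points."""
    d_src = np.linalg.norm(src_pts[:, None, :] - src_pts[None, :, :], axis=-1)  # source pair distances
    d_tgt = np.linalg.norm(tgt_pts[:, None, :] - tgt_pts[None, :, :], axis=-1)  # target pair distances
    delta = np.abs(d_src - d_tgt)             # rigidity violation between matches i and j
    V_d = (delta > d_t).astype(np.uint8)      # binary distance inconsistency matrix
    S_d = V_d.sum(axis=1)                     # distance inconsistency component per match
    return V_d, S_d
```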
3.5. Code
The following section provides an overview of Algorithm 1, which outlines the key steps of the proposed 3DSE method for eliminating mismatches in point cloud matching.
3.6. Computational Complexity
The time complexity of the proposed 3DSE method is dominated by three key operations: spatial encoding matrix construction, distance constraint computation, and iterative outlier removal. For a point cloud with $n$ initial matching pairs and $r$ rotational discretization steps per axis, pairwise comparisons between the $n$ matches are performed along the X-, Y-, and Z-axes for each of the $r^3$ rotational poses. The spatial encoding stage therefore requires $O(r^3 n^2)$ operations to generate the spatial relationship matrices. If the value of $r$ is small or sufficient memory is available, the spatial encodings for different poses can be constructed in parallel, reducing the time complexity to $O(n^2)$. Meanwhile, all initial matches are compared pairwise to compute Euclidean distance discrepancies; the distance constraint matrix requires $O(n^2)$ operations to compute the relative Euclidean distance differences between source and target point pairs. The iterative outlier removal process requires calculating the constraint inconsistencies, finding the index of the most inconsistent point pair, and updating the matrices of the different constraint dimensions, which takes $O(n^2)$ operations in total. In summary, the computational complexity scales linearly with the number of pose variations and quadratically with the size of the correspondence set.
Total time complexity: $O(r^3 n^2)$. The space complexity remains highly efficient due to binary encoding and minimal intermediate storage. Source/target point coordinates and initial matches are preloaded, requiring no additional storage. The spatial encoding matrices $G_x$, $G_y$, and $G_z$ occupy $3 r^3 n^2$ bits (1-bit storage per entry).
Total space complexity: $O(r^3 n^2)$ bits.
Algorithm 1 Algorithm for 3DSE-based point cloud feature matching
Input: Source point cloud $P^s$ and target point cloud $P^t$. Output: Refined matching set $C$.
1. Extract key points from the source point cloud $P^s$ and the target point cloud $P^t$ using the same key point selection criteria, resulting in key point sets $K^s$ and $K^t$. Here, $K^s$ contains $m$ points with the feature set $F^s$, and $K^t$ contains $n$ points with the feature set $F^t$.
2. Generate the initial matching pair set $C$ by matching the local feature descriptors of $K^s$ and $K^t$ using Equation (5).
3. Generate the spatial graphs $G^s_x$, $G^s_y$, $G^s_z$ and $G^t_x$, $G^t_y$, $G^t_z$ for the matched points of $P^s$ and $P^t$ using Equations (10)–(12).
4. Compute the distance difference measure $\Delta d_{ij}$ using Equation (19).
5. Generate the inconsistency matrices $V_x$, $V_y$, $V_z$, and $V_d$ using Equations (13)–(15) and (20).
6. Compute the inconsistency components $S_x$, $S_y$, $S_z$, and $S_d$ across the different dimensions using Equations (16)–(18) and (21).
7. Set $S_{\max} = \max(S_x, S_y, S_z, S_d)$.
8. While $S_{\max} > t$, enter the loop:
   a. Identify the index $I$ of the matching pair that attains $S_{\max}$.
   b. Set the $I$-th rows and columns of $V_x$, $V_y$, $V_z$, and $V_d$ to 0.
   c. Recalculate $S_x$, $S_y$, $S_z$, and $S_d$.
   d. Update $S_{\max}$.
9. End the loop.
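For reference, a self-contained sketch of the loop in Algorithm 1 is given below for the simplified case r = 1 (a single pose); it takes the coordinates of the already-matched source and target key points as input, and the names `refine_matches`, `d_t`, and `tol` are illustrative.

```python
# Sketch of Algorithm 1 with r = 1: build the spatial and distance constraints for the
# matched points, then iteratively drop the most inconsistent pair until the largest
# inconsistency component falls below the tolerance.
import numpy as np

def order_maps(pts):
    """Per-axis binary ordering maps for an (n, 3) coordinate array."""
    return [(pts[:, a][:, None] < pts[:, a][None, :]).astype(np.uint8) for a in range(3)]

def refine_matches(src_pts, tgt_pts, d_t, tol=0):
    """Return indices of the matches kept after 3DSE + distance filtering."""
    keep = np.ones(len(src_pts), dtype=bool)
    # Spatial inconsistency matrices (XOR of source/target ordering maps).
    Vs = [np.bitwise_xor(ms, mt) for ms, mt in zip(order_maps(src_pts), order_maps(tgt_pts))]
    # Distance inconsistency matrix.
    d_src = np.linalg.norm(src_pts[:, None] - src_pts[None, :], axis=-1)
    d_tgt = np.linalg.norm(tgt_pts[:, None] - tgt_pts[None, :], axis=-1)
    Vs.append((np.abs(d_src - d_tgt) > d_t).astype(np.uint8))
    while True:
        S = np.stack([V.sum(axis=1) for V in Vs])          # components S_x, S_y, S_z, S_d
        S[:, ~keep] = 0                                    # removed pairs no longer counted
        if S.max() <= tol:
            break
        worst = np.unravel_index(S.argmax(), S.shape)[1]   # index of the worst matching pair
        keep[worst] = False
        for V in Vs:                                       # zero its rows and columns
            V[worst, :] = 0
            V[:, worst] = 0
    return np.flatnonzero(keep)
```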
4. Experiments
4.1. Experimental Data
This study employs both simulated and real-world data to evaluate the performance of the proposed 3DSE method in point cloud feature matching for space targets. To accurately simulate the reflective characteristics of solar panels on satellites, the point cloud density is adjusted during simulation based on the incident angle of the laser. The model of the simulated satellite is shown in Figure 4.
In addition to simulations, real-world data are employed to evaluate the algorithm’s performance. The real-world data originate from a 64-element linear array LiDAR developed by the Shanghai Institute of Technical Physics, Chinese Academy of Sciences, which was used to scan a scaled satellite model. A photograph of the scaled satellite model is shown in Figure 5. The structure of the scaled satellite model is strongly symmetric; if feature matching is performed using local feature descriptors alone, incorrect matches are likely to occur at key points sampled on the symmetric structures. The proposed algorithm is tested using both simulated and real-world data.
To verify the applicability of the algorithm across different source data, we also added a standard 3D feature-matching dataset, the Bologna Dataset5 (BoD5). The BoD5 dataset was acquired with a Kinect sensor and includes 43 model pairs with clutter and occlusions, significantly affected by noise. Figure 6 provides an example view of this dataset.
4.2. Criteria
To quantitatively assess the performance of point cloud feature-matching methods, this study employs Recall of Inliers (ROI) as the evaluation metric. This metric is defined by analyzing the relevance of the top $K$ matching pairs scored by the algorithm. Specifically, let $C_K$ represent the set of the top $K$ matching pairs according to the scoring method, and let $C$ denote the initial matching set. For a given $K$, the recall is defined as

$$ \mathrm{Recall}(K) = \frac{\left| C_K \cap C_{\mathrm{in}} \right|}{\left| C_{\mathrm{in}} \right|}, $$

where $C_{\mathrm{in}}$ denotes the set of inliers within the initial matching set $C$.
By varying the value of $K$, recall curves can be generated to reflect the proportion of inliers within the selected matching pairs for different values of $K$. A higher recall indicates a larger proportion of inliers within the selected pairs.
To determine whether a matching pair $m = (p^s, p^t)$ is an inlier, the ground-truth rotation matrix $R_{gt}$ and translation vector $t_{gt}$ are used to compute the $L_2$ distance between the transformed source point and the target point. If the following condition is satisfied, the pair $m$ is considered an inlier:

$$ \left\| R_{gt} \, p^s + t_{gt} - p^t \right\|_2 < d_{thr}. $$

Here, $d_{thr}$ is the distance threshold used to determine spatial consistency for matching pairs. In this study, $d_{thr}$ is set as a multiple of $pr$, where $pr$ represents the average resolution of the point cloud, defined as

$$ pr = \frac{1}{N} \sum_{k=1}^{N} \left\| p_k - \mathrm{NN}(p_k) \right\|_2, $$

where $N$ is the total number of points in the point cloud and $\left\| p_k - \mathrm{NN}(p_k) \right\|_2$ represents the $L_2$ distance between the $k$-th point and its nearest neighbor.
Using these evaluation metrics, the accuracy and robustness of the proposed method in point cloud feature matching can be comprehensively assessed. These metrics not only account for the number of matching pairs but also consider their spatial consistency, providing a comprehensive evaluation framework for algorithm performance.
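A short sketch of the ROI evaluation under these definitions follows; the ground-truth transform (R_gt, t_gt), the match scores, and the function names are assumed inputs for illustration.

```python
# Sketch of the Recall-of-Inliers (ROI) evaluation: a match is an inlier if the
# ground-truth transform maps its source point to within d_thr of its target point,
# and recall(K) is the fraction of all inliers recovered among the top-K scored pairs.
import numpy as np
from scipy.spatial import cKDTree

def cloud_resolution(pts):
    """Average nearest-neighbor distance pr of a point cloud."""
    d, _ = cKDTree(pts).query(pts, k=2)      # k=2: the first neighbor is the point itself
    return d[:, 1].mean()

def recall_of_inliers(src_pts, tgt_pts, scores, R_gt, t_gt, d_thr, K):
    """src_pts/tgt_pts: (n, 3) matched points; scores: higher means more confident."""
    residual = np.linalg.norm(src_pts @ R_gt.T + t_gt - tgt_pts, axis=1)
    inlier = residual < d_thr
    top_k = np.argsort(-scores)[:K]          # indices of the K best-scored matches
    return inlier[top_k].sum() / max(inlier.sum(), 1)
```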
4.3. Comparison Methods
In this study, we evaluated four point cloud feature-matching methods and compared them with the proposed 3DSE method. The four methods include (1) NN matching, (2) NNDR matching, (3) L2-based spatial encoding, and (4) progressive consistency voting (PCV) [42] optimization.
The key distinction of the 3DSE algorithm lies in its optimization of point cloud features via 3DSE, resulting in more precise and robust matching. To comprehensively evaluate the performance of different methods, we compared them across multiple aspects, focusing on accuracy, robustness, and computational efficiency in complex environments. Through this comparison, the relative advantages of the 3DSE algorithm are clearly demonstrated, especially in handling highly complex and irregular point cloud data, where it exhibits superior robustness and accuracy.
Additionally, this paper discusses the advantages and disadvantages of these methods and their applicability in real-world scenarios, providing valuable references for future research.
4.4. Parameter Selection and Ablation Experiments
Before assessing the performance of 3DSE, it is imperative to validate the efficacy of some of its critical components to ensure the interpretability of 3DSE. Experimental analysis is also required for the selection of several key parameters. These experiments are conducted on real-world datasets collected using a 64-line array LiDAR. K indicates the number of top-ranked correspondences.
The parameter $r$ controls the rotational invariance of 3DSE by discretizing the Euler angles. We experimented with different values of $r$. According to Figure 7a, there is no significant fluctuation in performance when $r$ exceeds 1. However, as illustrated in Figure 7b, the computation time increases significantly with higher $r$ values. This trade-off aligns with the characteristics of pose estimation scenarios: the limited inter-frame rotation angles maintain a certain overlap area, rendering fine-grained pose discretization ($r > 1$) unnecessary. Therefore, choosing $r = 1$ is recommended to enhance efficiency. The distance threshold $d_t$ determines the tolerance for scale differences between matching point pairs. A smaller threshold may excessively eliminate valid matches, while a larger threshold might admit anomalous matches. In this experiment, the distance threshold parameter $d_t$ is held at a fixed value.
Table 1 presents the results of our ablation study, demonstrating that the best performance is achieved when both 3DSE constraints and point-pair distance constraints are combined. The ablation experiments confirmed the necessity of spatial encoding and distance constraints. Figure 3 illustrates potential failure cases when using only a single module.
4.5. Experimental Details
In the experimental stage, the initial matching set $C$ needs to be constructed first. To generate the initial matching set, the source point cloud $P^s$ and the target point cloud $P^t$ are down-sampled. After down-sampling, we use the Signature of Histograms of Orientations (SHOT) [57] descriptor, a robust feature description method, to extract features from the down-sampled point clouds. Next, brute-force matching is performed based on the L2 distance metric to generate the preliminary matching pair set $C$. It is important to note that the purpose of the experiment is to evaluate the performance of different feature-matching methods under a unified initial matching set, rather than to optimize detectors and descriptors to produce the best possible matching set.
When evaluating feature-matching methods, this paper compares several classic approaches, including direct NN matching, the NNDR method, the L2-based spatial encoding method, and the PCV matching strategy. Each method has its strengths and limitations. This paper comprehensively compares these methods to explore their performance in practical point cloud-matching tasks.
To quantitatively compare the performance of these methods, Recall of Inliers (ROI) is selected as the evaluation criterion. By calculating the recall, the performance of different methods under various conditions can be assessed, especially in terms of the accuracy and robustness of inlier matching. Ultimately, the performance evaluation of all methods will be based on this quantitative standard to ensure the comparability of experimental results.
Through the above experimental procedures, not only can the effectiveness of different methods be evaluated, but a reliable quantitative basis can also be provided for the proposed 3DSE method to verify its advantages in feature-matching tasks.
4.6. ROI Performance Display
We tested the recall performance of each method on both simulated and real-world data to comprehensively evaluate the robustness of each algorithm. In the simulation experiments, Gaussian noise of varying intensities was added to the simulated point clouds, and different levels of down-sampling were performed to verify the performance of each method under these complex conditions.
Figure 8 illustrates the recall performance of the different algorithms on the simulated data. Figure 8a–e show the test results under different levels of Gaussian noise, while Figure 8f–h present the results under various data sampling ratios. The experimental results indicate that as the intensity of Gaussian noise increases, especially at a noise level of 0.5 pr, the 3DSE method maintains good performance despite the noise, demonstrating high robustness. In the data sampling tests, as the amount of point cloud data decreases, 3DSE still achieves high recall performance.
Additionally, Figure 9 shows the recall performance of the different algorithms on real-world data. On real-world data, the 3DSE method significantly outperforms the other compared methods, particularly excelling in complex environments. Compared to the simulated data, the real-world data contain more complex noise and data loss. However, 3DSE still demonstrates high accuracy and robustness under these conditions.
Overall, the 3DSE method exhibits strong robustness and high recall rates under various experimental conditions. It provides accurate matching results even under noise interference and data sparsity, proving its potential in practical applications.
4.7. Robustness to Different Data Sources
In addition to the simulation data and LiDAR data, we conducted experiments on the BoD5 dataset, captured using a Kinect, to test the algorithm’s applicability to point cloud data acquired from other sensors. The results are shown in Figure 10. As demonstrated in Figure 10a, the 3DSE method still exhibits certain advantages, showing strong robustness to data obtained from different sensors. We also visualized the matching set before and after the removal of incorrect matches using pseudo-coloring (with red indicating incorrect matches and blue indicating correct matches).
Moreover, leveraging the binary classification characteristic of 3DSE, we tested the classification performance of the various methods on the BoD5 dataset. We divided the initial matching set into two subsets comprising inliers and outliers. Based on the classification results of the different algorithms, we categorized them using a confusion matrix and calculated metrics such as precision, recall, and F-score to evaluate the classification performance of each method. The results are presented in Table 2.
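As a simple illustration of how such confusion-matrix metrics can be computed from ground-truth inlier labels and each method's kept/removed decisions, a short sketch is given below; the variable names are illustrative.

```python
# Sketch of the confusion-matrix metrics used in Table 2: matches kept by an algorithm
# are treated as positive predictions and compared against the ground-truth inlier labels.
import numpy as np

def classification_metrics(is_inlier, is_kept):
    """is_inlier, is_kept: boolean arrays over the initial matching set."""
    tp = np.sum(is_inlier & is_kept)          # true inliers that were kept
    fp = np.sum(~is_inlier & is_kept)         # outliers mistakenly kept
    fn = np.sum(is_inlier & ~is_kept)         # inliers mistakenly removed
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f_score = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f_score
```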
4.8. Effectiveness Display
To intuitively demonstrate the practical performance of the 3DSE algorithm, Figure 11 presents a comparison of real-world point cloud data before and after the removal of mismatches. Figure 11a shows all matching pairs in the initial matching set, in which a large number of mismatched pairs can be observed. Figure 11b illustrates the matching results after processing with the 3DSE algorithm, which significantly reduces erroneous matches, retaining only a small number of high-quality matching points.
Through comparative analysis, it is evident that the 3DSE algorithm can accurately identify and eliminate mismatched pairs, thereby significantly improving the precision and robustness of point cloud registration. This result not only validates the effectiveness of the 3DSE algorithm in processing real-world point cloud data but also provides high-quality point cloud registration results for subsequent satellite model analysis. Specifically, after eliminating mismatches, the matching results exhibit higher geometric consistency, further proving the method’s potential for application in real-world complex environments.
4.9. Time Efficiency
To evaluate the time efficiency of the different methods in handling the initial correspondence set, experiments were conducted in the MATLAB 2022b environment. The experiments tested scenarios with initial correspondence counts ranging from 50 to 1000, with each scenario repeated ten times; the average value was taken as the measured time cost. The results are shown in Table 3.
It is worth noting that the experimental algorithms were implemented in the MATLAB 2022b environment primarily for its convenience in rapid development and experimental validation rather than for high-performance optimization. Compared to lower-level languages such as C++, MATLAB typically incurs higher runtime costs. Therefore, the time costs presented in the table may be higher than those in related studies. However, this does not affect the comparative analysis of relative efficiency among the algorithms.
Specifically, as shown in Table 3, when the scale of the initial correspondence set is small (50 to 200 pairs), the NN and NNDR methods incur the least time cost due to their simplicity. As the scale of the correspondence set increases, the time costs of the L2SE, PCV, and 3DSE methods grow rapidly. In particular, the PCV method, due to its multi-round consistency voting process, exhibits significantly higher time costs when the correspondence count reaches 1000. Although the 3DSE method involves complex spatial consistency calculations, its time cost grows more moderately than that of the PCV method, demonstrating a better balance between efficiency and effectiveness.
The time cost data in this experiment can serve as a reference for algorithm selection in different application scenarios. For example, NN or NNDR methods can be prioritized in scenarios requiring high real-time performance. For scenarios requiring higher matching precision and robustness, the 3DSE method demonstrates greater application value.
5. Conclusions
This paper presents a robust 3D point cloud feature-matching method named 3DSE. The core idea is to leverage the consistency of local geometric features in true matching pairs and to eliminate unreliable mismatched pairs through spatial geometric constraints, thereby generating a high-quality set of matching pairs. Through systematic experimental evaluation and comparative analysis, the proposed method demonstrates superior robustness and matching performance in both simulated and real-world data. The following are the main characteristics and application prospects of 3DSE.
5.1. Algorithm Characteristics
Simple and efficient geometric constraint design: 3DSE imposes consistency constraints on local features through spatial geometric encoding, without requiring complex iterative optimization processes. This approach is easy to understand and implement.
Outstanding robustness and adaptability: Experimental results on different datasets demonstrate that 3DSE maintains high performance and robustness under challenging conditions such as noise interference, data sampling biases, clutter, occlusions, and changes in data modalities. In sparse and noisy low-quality Kinect datasets like BoD5, 3DSE outperforms similar methods across various classification metrics. Additionally, in LiDAR-collected point clouds, when dealing with satellite-scale models characterized by numerous symmetrical and repetitive structures and few features, 3DSE effectively utilizes the constraints of feature point spatial layout, achieving superior outlier removal compared to similar algorithms.
Performance optimization and scalability: Compared with classical methods, 3DSE significantly improves point cloud registration accuracy while removing mismatched pairs, providing a reliable foundation for subsequent point cloud processing tasks.
5.2. Experimental Validation and Practicality
Comprehensive experimental analyses on both simulated and real-world data demonstrate that 3DSE outperforms state-of-the-art methods under various matching conditions. In particular, for 3D registration tasks, by effectively improving the quality of matching pairs, 3DSE significantly enhances the overall performance of point cloud registration. Moreover, the method’s implementation process is straightforward, and its runtime efficiency demonstrates high practical value under experimental conditions.
5.3. Application Prospects
The 3DSE method has broad application potential, particularly in the following fields:
Three-dimensional reconstruction: Reliable point-to-point matches support the fusion and reconstruction of multi-view data.
Object recognition and scene understanding: Achieving precise matching between models and point clouds in complex scenes provides a foundation for object recognition tasks.
Pose estimation of non-cooperative targets: In dynamic scenes, real-time pose estimation of targets based on multi-frame point cloud matching supports tasks such as navigation and industrial inspection.
5.4. Future Work
To further enhance the performance and applicability of 3DSE, we plan to conduct research in the following directions:
Efficiency optimization: Investigate more efficient strategies for computing geometric constraints in the algorithm to reduce computation time and resource consumption while maintaining matching accuracy.
Extension to multi-modal and complex scenarios: Explore multi-modal data fusion techniques to enhance 3DSE’s adaptability and robustness when handling high-dimensional data or operating in complex sensor environments, such as applications in unmanned systems and medical imaging.
In summary, the 3DSE method not only demonstrates significant advantages in current 3D point cloud processing tasks but also provides potential and directions for future research and applications.