Article

PointCNT: A One-Stage Point Cloud Registration Approach Based on Complex Network Theory

1 National Key Lab of Aerospace Power System Safety and Plasma Technology, Air Force Engineering University, Xi’an 710038, China
2 School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an 710049, China
* Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(14), 3545; https://doi.org/10.3390/rs15143545
Submission received: 4 May 2023 / Revised: 11 July 2023 / Accepted: 12 July 2023 / Published: 14 July 2023

Abstract

Inspired by the parallel visual pathway model of the human neural system, we propose an efficient and high-precision point cloud registration method based on complex network theory (PointCNT). A deep neural network (DNN) design method based on complex network theory is proposed, and a multipath feature extraction network for point clouds, namely, the Complex Kernel Point Convolution Neural Network (ComKP-CNN), is designed based on this method. Self-supervision is introduced to improve the feature extraction ability of the model. A feature embedding module is proposed to explicitly embed the transformation-variant coordinate information and transformation-invariant distance information into the features. A feature fusion module is proposed to enable the source and template point clouds to perceive each other’s nonlocal features. Finally, a Multilayer Perceptron (MLP) with prominent fitting characteristics is utilized to estimate the transformation matrix. The experimental results show that the Registration Recall (RR) of PointCNT on the ModelNet40 dataset reaches 96.4%, significantly surpassing one-stage methods such as Feature-Metric Registration (FMR) and approaching two-stage methods such as Geometric Transformer (GeoTransformer). The computation speed is faster than that of two-stage methods, with a registration run time of 0.15 s. In addition, ComKP-CNN is universal and can improve the registration accuracy of other point cloud registration methods.

1. Introduction

With the rapid development of high-precision sensors such as Light Detection and Ranging (LiDAR), point clouds have become the primary data format used to represent the three-dimensional (3D) world [1]. In recent years, the demand for high-quality point cloud data has increased with the rapid development of automatic driving, digital twins, intelligent robots, industrial product quality inspection and other fields. However, these sensors can only capture 3D scene information in a certain view and cannot capture complete 3D scene information. Point cloud registration is a task that aligns two or more different point clouds by estimating the relative transformation between them [2]. Therefore, point cloud registration plays a unique and critical role in computer vision tasks.
However, there are many challenges in point cloud registration. Unlike images, point clouds are unstructured with sparsity and disorder. Point clouds have considerable noise due to the inherent shortcomings of scanning sensors. In addition, the problems of partial overlap and large differences in the 3D features of the same object from different views also bring challenges to point cloud registration.
Most traditional algorithms divide registration into two steps: first, find the correspondences, and then estimate the rigid transformation matrix according to the correspondences. Obtaining the transformation matrix is simple when the correspondences are known. Similarly, finding the correspondences is simple when the transformation matrix is known. Given these two observations, most algorithms alternate between these two steps to obtain a better result [3]. However, traditional algorithms usually have high computational complexity, require many iterations, take a long time to compute, and do not meet real-time requirements; moreover, they are nonconvex, which makes them sensitive to the initial position and prone to falling into local optima. The method proposed in this paper needs neither correspondences nor iterations, so it has good real-time performance.
Deep learning has shown great advantages due to its prominent fitting characteristics and has been widely used in automatic driving, healthcare, machine translation, damage detection and other fields [4]. Robust and Efficient Point Cloud Registration using PointNet (PointNetLK) [5] pioneered the application of deep learning to point cloud registration. The application of deep learning in point cloud registration has made great progress and significantly improved registration robustness and efficiency. Unfortunately, most deep-learning-based methods do not deviate from the traditional algorithm design. Traditional design is divided into two steps: finding correspondences (or computing a soft matching matrix) and estimating the transformation matrix. These are called “two-stage methods” in this paper. This kind of method completely separates the module for finding correspondences from the module for estimating the transformation matrix. These modules are trained separately, which causes accumulative errors. Two-stage methods have high computing costs and poor real-time performance because they need to find correspondences and calculate their confidence to eliminate outliers. An end-to-end “one-stage method” with fast computation speed and good real-time performance is proposed in this paper. Instead of finding correspondences between the source point cloud and template point cloud, a deep learning model is used to directly extract the global features of the two point clouds, and then the transformation matrix is directly estimated according to the global features. This process makes our registration method robust to noise and able to handle the partial overlap problem.
At present, there are few studies on one-stage methods. Most studies apply only the deep learning model used for two-dimensional (2D) images to point cloud registration after simple changes. Although their registration accuracy exceeds most traditional algorithms, there is still a large gap compared with the two-stage methods. According to the point cloud data characteristics, a new one-stage framework is designed to improve registration accuracy.
In this work, a new efficient and high-precision one-stage point cloud registration method based on complex network theory (PointCNT) is designed, which estimates the rigid transformation matrix by global features without searching for correspondences. An overview of PointCNT is shown in Figure 1. Our method consists of four parts. (1) Feature extraction module. Inspired by the parallel visual pathways model [6], a multipath feature extraction network for point clouds based on complex network theory is designed. A self-supervised module is introduced to improve the feature extraction ability. (2) Feature embedding module. Inspired by nonlocal neural networks [7,8], Geometric-based Self-attention (GBSelf-attention) is designed. GBSelf-attention embeds the transformation-variant coordinate information and the transformation-invariant distance information with geometric consistency between points into the point cloud feature. (3) Feature fusion module. Feature-based Cross-attention (FBCross-attention) is designed to fuse the source and template features so that the extracted features of the two point clouds can be transmitted interactively. (4) Registration module. Multilayer Perceptron (MLP) [9,10,11,12] is used to estimate the rotation and translation, and the rotation quaternion is used to represent the point cloud rotation.
To summarize, our main contributions are fourfold:
(1) An efficient, high-precision and end-to-end one-stage point cloud registration framework is proposed.
(2) A deep learning network design method based on complex network theory is proposed, and a multipath feature extraction network for point clouds is designed based on this method.
(3) A self-supervised module is introduced to improve the feature extraction ability of the network.
(4) GBSelf-attention and FBCross-attention based on nonlocal neural networks are designed.
In Section 2, we summarize related research work on point cloud registration, mainly including traditional registration methods, learning-based two-stage methods and learning-based one-stage methods. In Section 3, we introduce the point cloud registration method PointCNT designed in this paper, which mainly includes the feature extraction module, feature embedding module, feature fusion module and registration module. In Section 4, we carry out experiments to verify the effectiveness of PointCNT. We carry out ablation experiments to study the effects of different modules on the model and verify the performance of the designed feature extraction module, feature embedding module and feature fusion module. In Section 5, we discuss the research results. Section 6 is the conclusion.

2. Related Work

2.1. Traditional Registration Methods

Point cloud registration is divided into coarse registration and fine registration. Typical coarse registration algorithms include the Point Feature Histogram (PFH) [13], Fast Point Feature Histogram (FPFH) [14], 3D Shape Context (3Dsc) [15], Normal Distributions Transform (NDT) [16], 4-Points Congruent Sets (4PCS) [17] and Principal Component Analysis (PCA) [18]. The coarse registration algorithm is not sensitive to the initial pose, but its registration accuracy is low. The coarse registration can be considered a preprocessing process for point cloud initialization in fine registration. Iterative Closest Point (ICP) and its variants [19,20,21,22] are the best-known traditional fine registration algorithms. ICP alternates between finding point cloud correspondences and solving a least-squares problem to update the alignment. However, ICP-style methods are prone to local minima due to nonconvexity. To solve the above problem, a Globally Optimal Solution to 3D ICP Point-set Registration (Go-ICP) [23] uses a branch-and-bound method to search the motion space. Go-ICP outperforms local ICP methods when a global solution is desired but is several orders of magnitude slower than other ICP variants. Traditional methods do not require a large quantity of training data and have excellent generalization ability. However, they are usually sensitive to noise, have difficulty processing partially overlapping point clouds, easily converge to local optimal solutions, have low registration accuracy and have long computational times. Unlike traditional registration methods, PointCNT based on deep learning is an end-to-end algorithm. It is insensitive to noise and can process partial overlap problems with high computational efficiency and registration accuracy.

2.2. Learning-Based Two-Stage Registration Methods

At present, most research adopts two-stage methods to estimate the transformation matrix, as shown in Figure 2 and Table 1. In the first stage, the correspondences between the source and template are predicted, such as the corresponding relationship of key points, the corresponding relationship of feature points and the corresponding relationship of all points. In the second stage, the transformation matrix is estimated according to the correspondences. In this stage, Singular Value Decomposition (SVD) [19], Random Sample Consensus (RANSAC) [24] or Artificial Neural Networks (ANNs) [25] are usually used to estimate the transformation matrix.
As a classical two-stage method, Deep Closest Point (DCP) [3] first extracts the local point features in a point cloud, then establishes a soft matching matrix among points based on the extracted features, and finally uses weighted SVD to compute the transformation matrix according to the soft matching matrix. DCP is robust to noise, but its performance is poor when applied to point clouds with only partial overlap. Deep Global Registration (DGR) [26] is similar to DCP, but DGR changes the gradient propagation mode of weighted SVD. DGR takes the derivative of the loss function with respect to the weight w, reducing the computational complexity and improving the registration accuracy. Deep Virtual Corresponding Points (DeepVCP) [2], Partial Registration Network (PRNet) [27] and Geometric Transformer (GeoTransformer) [28] all use key points for matching. First, point cloud features are extracted by a DNN, and key points are obtained according to the features. Then, the correspondence matrix is established according to the key points, and the transformation matrix is estimated according to the correspondence matrix. This kind of algorithm further improves the registration accuracy and can handle the partial overlap problem. Deep Neural Network for 3D Point Registration (3DRegNet) [29] and Robust Point Cloud Registration using Deep Spatial Consistency (PointDSC) [7] directly take correspondences as input, use an ANN to eliminate the outliers, and then estimate the transformation matrix according to the correspondences with the outliers removed. This kind of algorithm focuses on the outlier elimination method and obtains correspondences with a higher proportion of inliers to improve the registration accuracy. Robust Point Cloud Registration Framework Based on Deep Graph Matching (RGM) [30] introduces the idea of a graph, such that the point features include not only the local geometric information but also the structure and topology information in a wider range, to find more correct correspondences.
Two-stage methods usually combine SVD to obtain the registration transformation matrix. Their accuracy is high, but they need to find the correspondences and compute the confidence to eliminate the outliers. Therefore, the computational cost is high, and the real-time performance is poor. Compared with the two-stage methods, our method does not need to find the corresponding point relationships. Our method directly estimates the transformation matrix according to the global features. This process avoids the accumulative errors caused by the complete separation of the feature extraction network and the module for computing the transformation matrix and improves the computational speed.

2.3. Learning-Based One-Stage Registration Methods

PointNetLK [5] pioneered the application of deep learning to point cloud registration. First, MLP and max pooling are used to extract the global features, then the inverse synthesis algorithm is used to improve the Lucas–Kanade Algorithm (LK) [31] and the improved LK algorithm is used to estimate the transformation matrix according to the global features. Point Cloud Registration Network using PointNet Encoding (PCRNet) [32] first utilizes MLP and max pooling to extract the global features of the source and template, concatenates the two global features, and then inputs the features into the ANN to estimate the transformation matrix. PCRNet is robust to noise because the transformation matrix is computed based on the global features. However, its structure is simple, and the registration accuracy is lower than that of the two-stage model. Feature-Metric Registration (FMR) [33] utilizes a self-supervised learning model composed of an encoder and decoder to extract features. Then, the transformation matrix is computed by the inverse synthesis algorithm based on the extracted features. However, this method performs poorly when dealing with partial overlap.
The one-stage method does not need to find the correspondences between the source and template but directly computes the transformation matrix according to the extracted features, which gives it a fast computation speed and good real-time performance. However, at present, there is little research on one-stage methods: existing one-stage models have simple structures, and their registration accuracy is still lower than that of two-stage models. This paper proposes a new one-stage framework for registration. Inspired by the parallel visual pathway model in the human neural system, a novel feature extraction network is designed based on complex network theory. Additionally, a self-supervised model is introduced to improve the feature extraction ability of the network. GBSelf-attention and FBCross-attention are designed to integrate the source point cloud and template point cloud features. The registration accuracy of our method is significantly higher than that of the above one-stage methods and achieves state-of-the-art performance. PointCNT is also robust to noise, suitable for partial overlap and has a high inference speed.

3. PointCNT

Given two point clouds $Q = \{ q_i \in \mathbb{R}^3 \mid i = 1, 2, \ldots, N \}$ and $P = \{ p_i \in \mathbb{R}^3 \mid i = 1, 2, \ldots, M \}$, the point cloud registration goal is to estimate a rigid transformation $T = \{ R \in SO(3), t \in \mathbb{R}^3 \}$ that aligns the two point clouds with a rotation matrix $R$ and a translation vector $t$. The transformation can be solved by
$$\hat{R}, \hat{t} = \mathop{\arg\min}_{R \in SO(3),\, t \in \mathbb{R}^3} \sum_{(p_{x_i}, q_{y_i}) \in C} \rho\left( q_{y_i}, R p_{x_i} + t \right),$$  (1)
where $C$ is the set of ground-truth correspondences between $Q$ and $P$, and $\rho(a, b)$ is some distance. However, in this paper, $C$ is not solved, and $T = \{ R \in SO(3), t \in \mathbb{R}^3 \}$ is directly solved according to the global features.
The pipeline of our network PointCNT is shown in Figure 1 and can be summarized as follows:
$$\hat{R}, \hat{t} = \mathcal{R}\left\{ \varphi\left[ E\left( \phi(P) \right), E\left( \phi(Q) \right) \right] \right\},$$  (2)
where $\phi(\cdot)$ is the feature extraction module, $E(\cdot)$ is the feature embedding module, $\varphi[\cdot]$ is the feature fusion module and $\mathcal{R}\{\cdot\}$ is the registration module.
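The composition in Equation (2) can be written as a short module pipeline. The sketch below is illustrative pseudo-PyTorch, not the authors' released code: the four submodules are placeholders standing for the components defined in Sections 3.1, 3.2, 3.3 and 3.4.

```python
# Hedged sketch of the PointCNT pipeline of Equation (2). The constructor
# arguments are assumed interfaces, not the paper's actual class names.
import torch.nn as nn

class PointCNT(nn.Module):
    def __init__(self, extractor, embedder, fuser, reg_head):
        super().__init__()
        self.extractor = extractor   # phi: ComKP-CNN feature extraction (Section 3.1)
        self.embedder = embedder     # E: GBSelf-attention geometric embedding (Section 3.2)
        self.fuser = fuser           # varphi: FBCross-attention + max pooling (Section 3.3)
        self.reg_head = reg_head     # R: MLP estimating rotation and translation (Section 3.4)

    def forward(self, source, template):
        f_p = self.embedder(self.extractor(source))
        f_q = self.embedder(self.extractor(template))
        fused = self.fuser(f_p, f_q)        # joint global feature of both clouds
        return self.reg_head(fused)         # (rotation, translation)
```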
In this section, we introduce the one-stage point cloud registration method PointCNT in detail. In Section 3.1, we design the multipath point cloud feature extraction network ComKP-CNN based on complex network theory and introduce a self-supervised method to enhance the network's ability to extract point cloud features. In Section 3.2, we design GBSelf-attention to explicitly embed the coordinate and distance information of the point cloud into the features. In Section 3.3, we design FBCross-attention to realize the interactive propagation of features between the source and template. In Section 3.4, we realize the registration of point clouds through an MLP. Section 3.5 describes the loss function used in this paper.

3.1. Feature Extraction Module

In this section, Kernel Point Convolution (KPConv) [34] is used as the basic module for extracting point cloud features. Inspired by the parallel visual pathway model in the human neural system, as shown in Figure 3, a multipath feature extraction network based on complex network theory is designed. The parallel visual pathways model considers that the high-level brain regions related to vision do not simply receive signals from the retina through one neural pathway but receive neural signals through multiple pathways, and the number of neurons between different pathways is different. The network for extracting point cloud features should have a similar topology structure to the parallel visual pathways model. It is a complex network in which features have multiple transmission pathways rather than a single pathway.
A large number of empirical studies [35,36,37] show that networks in the real world are complex networks between regular networks and random networks, as shown in Figure 4. Almost all of these networks have a small-world effect; that is, networks have a smaller average path length, as shown in Equation (3), and a larger clustering coefficient, as shown in Equation (4). However, at present, most DNNs used to extract point cloud features [34,38,39,40] or even used to extract 2D image features [41,42,43,44] are not complex networks but regular networks, which do not have a small-world effect, as shown in Table 2. Based on the above analysis, we propose a new DNN design method based on complex network theory and use this method to design a new point cloud feature extraction network.
$$L = \frac{1}{\frac{1}{2} N (N - 1)} \sum_{i > j} d_{ij},$$  (3)
where $N$ is the number of network nodes, and $d_{ij}$ is the path length between node $i$ and node $j$.
$$C = \frac{1}{N} \sum_{i=1}^{N} \frac{2 R_i}{k_i (k_i - 1)},$$  (4)
where $N$ is the number of network nodes, $R_i$ is the number of triangles formed by node $i$ and its neighbor nodes and $k_i$ is the number of first-order neighbor nodes of node $i$.
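For intuition, the two small-world statistics of Equations (3) and (4) can be computed directly on any graph. The snippet below is an illustrative sketch using networkx on a toy Watts–Strogatz graph, which simply stands in for the feature-flow graph of a network architecture; it is not part of the paper's pipeline.

```python
# Compute average path length L (Eq. 3) and clustering coefficient C (Eq. 4).
import networkx as nx

# Toy small-world graph: 30 nodes, 4 initial neighbours each, 20% rewiring.
G = nx.connected_watts_strogatz_graph(n=30, k=4, p=0.2, seed=0)

L = nx.average_shortest_path_length(G)   # Equation (3)
C = nx.average_clustering(G)             # Equation (4)

print(f"average path length L = {L:.3f}, clustering coefficient C = {C:.3f}")
# A small-world network combines a small L with a large C.
```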
The DNN design method proposed in this paper includes three steps, as shown in Figure 5. First, the existing network designed by researchers is selected as the backbone, and the network is extended to a global coupling network. Then, the network is trained to obtain each edge weight. If the edge weight is small, the edge is considered to play a small role in extracting features, so the edge is deleted. Then, the network is retrained to obtain a complex network with a small-world effect. Based on complex network theory, this method can design a DNN with excellent performance and can better extract the input data features. This design can be used not only in the design of point cloud feature extraction networks but also in the design of feature extraction networks for images, text, voice and other types of data.
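The second step of this procedure, removing small-weight edges after training the globally coupled network, can be sketched as a simple thresholding rule. The code below is a hypothetical illustration that assumes each added edge carries a learnable scalar gate; the names `edge_gates` and `threshold` are not from the paper.

```python
# Hedged sketch of edge pruning in the proposed design method, assuming one
# learnable scalar gate per added edge of the globally coupled network.
import torch

def prune_small_edges(edge_gates, threshold=0.05):
    """Return the names of edges to keep after the first training stage."""
    kept = set()
    for name, gate in edge_gates.items():
        # An edge whose trained gate magnitude is small is assumed to contribute
        # little to feature extraction and is therefore removed.
        if gate.detach().abs().item() >= threshold:
            kept.add(name)
    return kept

# Toy usage: scalar gates obtained after training the globally coupled network.
edge_gates = {
    "block1->block3": torch.nn.Parameter(torch.tensor(0.42)),
    "block1->block5": torch.nn.Parameter(torch.tensor(0.01)),   # pruned
    "block2->block4": torch.nn.Parameter(torch.tensor(0.18)),
}
print(sorted(prune_small_edges(edge_gates)))   # ['block1->block3', 'block2->block4']
```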
In this paper, a Kernel Point Convolution Neural Network (KP-CNN) [34] is used as the backbone to sample the point cloud. To prevent gradient explosion and gradient disappearance, the residual structure in KP-CNN is retained. The feature extraction module is designed using the above network design method, called the Complex Kernel Point Convolution Neural Network (ComKP-CNN). The KP-CNN used in this paper includes a KPConv Block (ConvBlock), as shown in Equation (5), and 10 Residual Blocks (ResBlock), as shown in Equation (6).
$$F_{out} = AF\left( GN\left( \Theta\left( F_{in} \right) \right) \right),$$  (5)
where $F_{in}$ is the input features, $F_{out}$ is the output features, $\Theta$ is KPConv, $GN$ is group normalization, and $AF$ is the activation function; LeakyReLU is adopted in this paper.
$$F_{out} = UB_2\left( CB\left( UB_1\left( F_{in} \right) \right) \right) + UB_3\left( Max\left( F_{in} \right) \right),$$  (6)
where $CB = AF(GN(\Theta(\cdot)))$ is the ConvBlock, $UB = AF(GN(MLP(\cdot)))$ is a unary block, which is mainly responsible for integrating the feature channels, $MLP$ is the MLP and $Max$ is max pooling. When the channels of $F_{in}$ are not equal to the channels of $F_{out}$, $UB_3(F) = AF(GN(MLP(F)))$; otherwise, $UB_3(F) = F$. When the point cloud is downsampled by the ResBlock, $Max$ is used to downsample the input features along the shortcut branch; otherwise, $Max$ is not used.
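The block structure of Equation (6) maps directly onto a residual module. The sketch below is a hedged PyTorch illustration: KPConv itself is involved, so a per-point linear layer stands in for it, and the optional Max downsampling on the shortcut is omitted; what the sketch shows is the UB1 → ConvBlock → UB2 main branch plus the UB3 shortcut.

```python
# Hedged sketch of the ResBlock of Equation (6); the ConvBlock uses a
# pointwise linear layer as a stand-in for KPConv.
import torch
import torch.nn as nn

class UnaryBlock(nn.Module):
    """UB = AF(GN(MLP(x))): per-point linear layer, group norm, LeakyReLU."""
    def __init__(self, c_in, c_out, groups=8):
        super().__init__()
        self.mlp = nn.Linear(c_in, c_out)
        self.gn = nn.GroupNorm(groups, c_out)
        self.af = nn.LeakyReLU(0.1)
    def forward(self, x):                       # x: (N_points, C)
        return self.af(self.gn(self.mlp(x)))

class ResBlock(nn.Module):
    """F_out = UB2(ConvBlock(UB1(F_in))) + UB3(Max(F_in)), without downsampling."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.ub1 = UnaryBlock(c_in, c_out // 2)
        self.conv = UnaryBlock(c_out // 2, c_out // 2)   # stand-in for the KPConv ConvBlock
        self.ub2 = UnaryBlock(c_out // 2, c_out)
        # Shortcut: identity when channel counts match, otherwise a UnaryBlock.
        self.ub3 = nn.Identity() if c_in == c_out else UnaryBlock(c_in, c_out)
    def forward(self, feats):                   # feats: (N_points, C_in)
        return self.ub2(self.conv(self.ub1(feats))) + self.ub3(feats)

# Toy usage on 128 points with 64 input channels.
feats = torch.randn(128, 64)
print(ResBlock(64, 128)(feats).shape)   # torch.Size([128, 128])
```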
KP-CNN constructs a simple chain network with features as nodes and feature extraction layers as edges. In this paper, $UB_2(CB(UB_1(F_{in})))$ (AddBlock) is used as the added edge to build a global coupling network, which is called the global coupling KP-CNN, as shown in Figure 6a. Then, we train the network, remove the edges with small weights and retrain the network to obtain a complex network with a small-world effect, called ComKP-CNN, as shown in Figure 6c.
The feature extracted from ComKP-CNN is input to the decoder, and the coordinates of each point are output. The KP-FCNN is utilized as the decoder, which consists of nearest upsampling and unary convolution. Features are transmitted from the intermediate layers of the encoder to the decoder through skip links, as shown in Figure 7.

3.2. Feature Embedding Module

3.2.1. GBSelf-Attention

Global context has proven critical in many computer vision tasks [28,45,46]. Since our model estimates the transformation matrix through the global features of the point cloud, rather than through correspondences, the model needs to obtain the transformation-variant information of the point cloud. Therefore, the point coordinates are explicitly embedded into the features so that the features have transformation-variant characteristics. Additionally, the geometric features of the overlapping part of the source point cloud and template point cloud have geometric consistency, so we explicitly embed the distance information between points, which is transformation-invariant, into the features. Inspired by nonlocal neural networks, we design GBSelf-attention, as shown in Figure 8, to learn the global correlations in both feature and geometric spaces among the downsampled points of each point cloud. We describe the computation for the downsampled points $\tilde{P}$; the same applies to $\tilde{Q}$. The feature $F_{\tilde{P}} \in \mathbb{R}^{|\tilde{P}| \times d_f}$ is taken as the input of GBSelf-attention, and the output feature $F_{\tilde{P}}^{GB} \in \mathbb{R}^{|\tilde{P}| \times d_f}$ can be computed by
$$F_{\tilde{P}}^{GB} = MLP\left( \mathrm{softmax}\left( S^{\tilde{P}\tilde{P}} \right) F_{\tilde{P}} V \right) + F_{\tilde{P}},$$  (7)
where $MLP$ is the MLP, $\mathrm{softmax}$ is a row-wise softmax function and $S^{\tilde{P}\tilde{P}} \in \mathbb{R}^{|\tilde{P}| \times |\tilde{P}|}$ is the attention score matrix, which can be computed as
$$S^{\tilde{P}\tilde{P}} = \sum_{i=1}^{|\tilde{P}|} \frac{ \left( F_{\tilde{P}} Q \right) \left( E_i R \right)^{T} + \left( F_{\tilde{P}} Q \right) \left( F_{\tilde{P}} K \right)^{T} }{ \sqrt{d_f} },$$  (8)
where $Q, K, V, R \in \mathbb{R}^{d_f \times d_f}$ are the respective projection matrices for queries, keys and values, $E \in \mathbb{R}^{|\tilde{P}| \times |\tilde{P}| \times d_f}$ is the geometric structure embedding and $E_i \in \mathbb{R}^{|\tilde{P}| \times d_f}$ is the $i$th element of $E$.
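The sketch below illustrates Equations (7) and (8) in hedged PyTorch form. It is a simplified single-head version: the exact way the geometric embedding enters the score matrix follows the structure of Equation (8) but is not claimed to match the authors' implementation.

```python
# Hedged sketch of GBSelf-attention (Equations (7)-(8)).
import torch
import torch.nn as nn

class GBSelfAttention(nn.Module):
    def __init__(self, d_f):
        super().__init__()
        self.q = nn.Linear(d_f, d_f, bias=False)   # projection Q
        self.k = nn.Linear(d_f, d_f, bias=False)   # projection K
        self.v = nn.Linear(d_f, d_f, bias=False)   # projection V
        self.r = nn.Linear(d_f, d_f, bias=False)   # projection R for the geometric embedding
        self.mlp = nn.Sequential(nn.Linear(d_f, d_f), nn.LeakyReLU(0.1), nn.Linear(d_f, d_f))
        self.d_f = d_f

    def forward(self, feats, geo_emb):
        # feats: (n, d_f) point features; geo_emb: (n, n, d_f) geometric structure embedding.
        q, k = self.q(feats), self.k(feats)
        # Geometric term: query of point i against the embedded geometry of pair (i, j).
        geo_scores = torch.einsum("id,ijd->ij", q, self.r(geo_emb))
        feat_scores = q @ k.transpose(0, 1)
        scores = (geo_scores + feat_scores) / self.d_f ** 0.5
        attn = torch.softmax(scores, dim=-1)             # row-wise softmax
        return self.mlp(attn @ self.v(feats)) + feats    # Equation (7) with residual

# Toy usage: 64 downsampled points with 32-dimensional features.
n, d_f = 64, 32
out = GBSelfAttention(d_f)(torch.randn(n, d_f), torch.randn(n, n, d_f))
print(out.shape)   # torch.Size([64, 32])
```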

3.2.2. Coordinate Embedding

The coordinate embedding $e_{i,j}^{C}$ between $\tilde{p}_i$ and $\tilde{p}_j$ is computed by Equation (9):
$$e_{i,j}^{C} = \frac{1}{2} \cdot \frac{ \left( x_i + y_i + z_i \right) + \left( x_j + y_j + z_j \right) }{ |x|_{\max} + |y|_{\max} + |z|_{\max} },$$  (9)
where $(x_i, y_i, z_i)$ and $(x_j, y_j, z_j)$ are the coordinates of $\tilde{p}_i$ and $\tilde{p}_j$, respectively, and $|x|_{\max}$, $|y|_{\max}$ and $|z|_{\max}$ are the maximum distances between point cloud $\tilde{P}$ and the origin along each coordinate axis.

3.2.3. Distance Embedding

Given any two points $\tilde{p}_i, \tilde{p}_j \in \mathbb{R}^3$ in $\tilde{P}$, define the distance between them as $d_{i,j} = \| \tilde{p}_i - \tilde{p}_j \|_2$. The distance embedding $e_{i,j}^{D}$ between them is computed by applying a sinusoidal function [47] on $d_{i,j} / \alpha_d$. Here, $\alpha_d$ is a hyperparameter used to tune the sensitivity to distance variations.
Finally, the geometric structure embedding $e_{i,j}$ is computed by aggregating the coordinate embedding and the distance embedding:
$$e_{i,j} = \mathrm{copy}\left( e_{i,j}^{C}, d_f \right) C + \mathrm{copy}\left( e_{i,j}^{D}, d_f \right) D,$$  (10)
where $\mathrm{copy}(x, d) \in \mathbb{R}^d$ represents copying $x$ as a vector with dimension $d$, and $C, D \in \mathbb{R}^{d_f \times d_f}$ are the respective projection matrices for the coordinate embedding and the distance embedding.
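A compact illustration of Equations (9) and (10) is given below. It is a hedged sketch: the sinusoidal distance encoding follows the standard transformer recipe, the learned projections $C$ and $D$ are omitted, and the values of $d_f$ and $\alpha_d$ are illustrative.

```python
# Hedged sketch of the geometric structure embedding (Equations (9)-(10)).
import torch

def coordinate_embedding(points):
    # points: (n, 3). Equation (9): mean of the summed coordinates of the pair,
    # normalised by the per-axis maximum distances from the origin.
    s = points.sum(dim=-1)                                    # x_i + y_i + z_i, shape (n,)
    denom = points.abs().max(dim=0).values.sum()              # |x|_max + |y|_max + |z|_max
    return 0.5 * (s[:, None] + s[None, :]) / denom            # (n, n)

def distance_embedding(points, d_f=32, alpha_d=0.1):
    # Pairwise distances d_ij scaled by alpha_d, passed through sin/cos encodings.
    d = torch.cdist(points, points) / alpha_d                 # (n, n)
    freqs = 1.0 / (10000.0 ** (torch.arange(0, d_f, 2) / d_f))
    ang = d[..., None] * freqs                                 # (n, n, d_f // 2)
    return torch.cat([ang.sin(), ang.cos()], dim=-1)          # (n, n, d_f)

def geometric_structure_embedding(points, d_f=32):
    # Equation (10) without the learned projections C and D: broadcast the scalar
    # coordinate term to d_f channels and add the distance term.
    e_c = coordinate_embedding(points)[..., None].expand(-1, -1, d_f)
    e_d = distance_embedding(points, d_f)
    return e_c + e_d

pts = torch.randn(64, 3)
print(geometric_structure_embedding(pts).shape)   # torch.Size([64, 64, 32])
```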

3.3. Feature Fusion Module

Given the GBSelf-attention features $F_{\tilde{P}}^{GB}$ and $F_{\tilde{Q}}^{GB}$ with the distance and coordinate embedding for $\tilde{P}$ and $\tilde{Q}$, respectively, the FBCross-attention feature $F_{\tilde{P}}^{FB}$ of $\tilde{P}$ is computed from $F_{\tilde{P}}^{GB}$ and $F_{\tilde{Q}}^{GB}$:
$$F_{\tilde{P}}^{FB} = MLP\left( \mathrm{softmax}\left( S^{\tilde{P}\tilde{Q}} \right) F_{\tilde{Q}}^{GB} V \right) + F_{\tilde{P}}^{GB},$$  (11)
where $S^{\tilde{P}\tilde{Q}} \in \mathbb{R}^{|\tilde{P}| \times |\tilde{Q}|}$ is the attention score matrix, which is computed as the feature correlation between $F_{\tilde{P}}^{GB}$ and $F_{\tilde{Q}}^{GB}$:
$$S^{\tilde{P}\tilde{Q}} = \frac{ \left( F_{\tilde{P}}^{GB} Q \right) \left( F_{\tilde{Q}}^{GB} K \right)^{T} }{ \sqrt{d_f} }.$$  (12)
GBSelf-attention embeds the transformation-variant coordinate information and the transformation-invariant distance information into each individual point cloud so that the features can explicitly capture the geometric structure information. FBCross-attention enables the two point clouds to perceive each other's features so that the geometric consistency of the overlapping part can be transmitted interactively between them. Finally, the symmetric function max pooling is used to capture the global features $F_{\tilde{P}}^{g} \in \mathbb{R}^{d_f}$ and $F_{\tilde{Q}}^{g} \in \mathbb{R}^{d_f}$, and $F_{\tilde{P}}^{g}$ and $F_{\tilde{Q}}^{g}$ are stacked in the channel dimension to obtain $F_{\tilde{P}\tilde{Q}} \in \mathbb{R}^{2 d_f}$. The process is as follows:
$$F_{\tilde{P}\tilde{Q}} = \mathrm{Cat}\left( \mathrm{Max}\left( F_{\tilde{P}}^{FB} \right), \mathrm{Max}\left( F_{\tilde{Q}}^{FB} \right) \right),$$  (13)
where $\mathrm{Cat}(a, b)$ represents stacking $a$ and $b$ in the channel dimension, and $\mathrm{Max}: \mathbb{R}^{|\tilde{P} \text{ or } \tilde{Q}| \times d_f} \rightarrow \mathbb{R}^{d_f}$ represents max pooling of the point cloud features over the point dimension.
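The fusion step of Equations (11)–(13) can be illustrated as a single-head cross-attention followed by per-cloud max pooling and channel concatenation. The sketch below is hedged: projection names mirror the text, but details such as head counts and MLP width are simplified assumptions.

```python
# Hedged sketch of FBCross-attention and global feature fusion (Equations (11)-(13)).
import torch
import torch.nn as nn

class FBCrossAttention(nn.Module):
    def __init__(self, d_f):
        super().__init__()
        self.q = nn.Linear(d_f, d_f, bias=False)
        self.k = nn.Linear(d_f, d_f, bias=False)
        self.v = nn.Linear(d_f, d_f, bias=False)
        self.mlp = nn.Sequential(nn.Linear(d_f, d_f), nn.LeakyReLU(0.1), nn.Linear(d_f, d_f))
        self.d_f = d_f

    def forward(self, f_p, f_q):
        # f_p: (|P|, d_f) source features, f_q: (|Q|, d_f) template features.
        scores = self.q(f_p) @ self.k(f_q).transpose(0, 1) / self.d_f ** 0.5   # Eq. (12)
        attn = torch.softmax(scores, dim=-1)
        return self.mlp(attn @ self.v(f_q)) + f_p                              # Eq. (11)

def fuse_global(f_p_fb, f_q_fb):
    # Equation (13): per-cloud max pooling over points, then channel concatenation.
    return torch.cat([f_p_fb.max(dim=0).values, f_q_fb.max(dim=0).values], dim=-1)

cross = FBCrossAttention(32)
f_p, f_q = torch.randn(64, 32), torch.randn(80, 32)
fused = fuse_global(cross(f_p, f_q), cross(f_q, f_p))
print(fused.shape)   # torch.Size([64]) -> 2 * d_f
```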

3.4. Registration Module

An MLP is used to estimate the transformation matrix because of its prominent fitting characteristics. The registration module has five hidden layers of sizes 1024, 1024, 512, 512 and 256 and an output layer of size $M + 3$, whose parameters represent the estimated transformation $\hat{T}$. The first $M$ output values represent the rotation, and the last three represent the translation vector $\hat{t} \in \mathbb{R}^3$. The point cloud rotation can be represented either by the rotation matrix $\hat{R} \in SO(3)$, in which case $M = 9$, or by the rotation quaternion $\hat{q}$, in which case $M = 4$. The experimental results show that PointCNT achieves better registration results when the rotation quaternion is used to represent rotation. Therefore, we use the rotation quaternion to represent point cloud rotation.
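The sketch below shows one way such a head could look in PyTorch: an MLP with the stated hidden sizes and a 7-unit output (a 4-dimensional quaternion plus a 3-dimensional translation, i.e., $M = 4$). The layer details and the quaternion normalization are illustrative assumptions, not the paper's exact implementation.

```python
# Hedged sketch of the registration head with quaternion rotation output.
import torch
import torch.nn as nn

class RegistrationHead(nn.Module):
    def __init__(self, d_in):
        super().__init__()
        sizes = [d_in, 1024, 1024, 512, 512, 256]
        layers = []
        for a, b in zip(sizes[:-1], sizes[1:]):
            layers += [nn.Linear(a, b), nn.LeakyReLU(0.1)]
        self.mlp = nn.Sequential(*layers, nn.Linear(256, 7))   # M + 3 = 7 outputs

    def forward(self, fused):
        out = self.mlp(fused)
        quat = nn.functional.normalize(out[..., :4], dim=-1)   # unit quaternion
        trans = out[..., 4:]                                   # translation vector
        return quat, trans

def quat_to_rotmat(q):
    # Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix.
    w, x, y, z = q.unbind(-1)
    return torch.stack([
        1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y),
        2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x),
        2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y),
    ], dim=-1).reshape(*q.shape[:-1], 3, 3)

quat, trans = RegistrationHead(d_in=64)(torch.randn(64))
print(quat_to_rotmat(quat).shape, trans.shape)   # torch.Size([3, 3]) torch.Size([3])
```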

3.5. Loss Function

The loss function $L$ consists of the registration loss $L_{Reg}$ and the self-supervised loss $L_{Unsup}$:
$$L = L_{Reg} + \alpha L_{Unsup},$$  (14)
where $\alpha \in (0, 1)$ is the self-supervised coefficient, which is used to balance the effect of the self-supervised module on the model.
Referring to DCP [3], we use the following loss function to measure our model’s agreement with the ground-truth rigid motions:
$$L_{Reg} = \left\| \hat{R}^{T} R_g - I \right\|_2^2 + \left\| \hat{t} - t_g \right\|_2^2 + \lambda \left\| \theta \right\|_2^2,$$  (15)
where $\hat{R}$ and $\hat{t}$ represent the rotation matrix and translation vector estimated by PointCNT, respectively, and $R_g$ and $t_g$ denote the ground truth. The first two terms define a simple distance on $SE(3)$. The third term denotes Tikhonov regularization of the PointCNT parameters $\theta$, which serves to reduce the network complexity.
A self-supervised module is introduced to enhance the feature extraction capability of our method. The loss function of the self-supervised module is as follows:
$$L_{Unsup} = \frac{1}{|P|} \sum_{i=1}^{|P|} \rho\left( p_i, \psi\left( \phi_F\left( p_i \right) \right) \right) + \frac{1}{|Q|} \sum_{i=1}^{|Q|} \rho\left( q_i, \psi\left( \phi_F\left( q_i \right) \right) \right),$$  (16)
where $\phi$ is the ComKP-CNN, $\psi$ is the decoder and $\rho(a, b)$ represents some distance between $a$ and $b$. In this paper, $\rho(a, b) = \| a - b \|_2$.
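A hedged sketch of the combined loss of Equations (14)–(16) is shown below. The Tikhonov term of Equation (15) is omitted here because it is commonly realized as optimizer weight decay; the reconstruction term uses a plain Euclidean point distance as stated above, and all function names are illustrative.

```python
# Hedged sketch of the total loss L = L_Reg + alpha * L_Unsup (Equations (14)-(16)).
import torch

def registration_loss(R_hat, t_hat, R_gt, t_gt):
    # Equation (15) without the Tikhonov regularization term.
    eye = torch.eye(3, device=R_hat.device)
    rot_term = ((R_hat.transpose(-1, -2) @ R_gt - eye) ** 2).sum()
    trans_term = ((t_hat - t_gt) ** 2).sum()
    return rot_term + trans_term

def self_supervised_loss(points, reconstructed):
    # Mean Euclidean distance between input points and decoder reconstructions.
    return (points - reconstructed).norm(dim=-1).mean()

def total_loss(R_hat, t_hat, R_gt, t_gt, P, P_rec, Q, Q_rec, alpha=0.5):
    l_reg = registration_loss(R_hat, t_hat, R_gt, t_gt)
    l_unsup = self_supervised_loss(P, P_rec) + self_supervised_loss(Q, Q_rec)
    return l_reg + alpha * l_unsup

# Toy usage with identity ground truth and slightly perturbed reconstructions.
R, t = torch.eye(3), torch.zeros(3)
P, Q = torch.randn(100, 3), torch.randn(100, 3)
print(total_loss(R, t, R, t,
                 P, P + 0.01 * torch.randn_like(P),
                 Q, Q + 0.01 * torch.randn_like(Q)).item())
```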

4. Experiments and Results

In this section, we carry out experiments to study the point cloud registration method proposed in this paper. In Section 4.1, we introduce the details of the experiment, including the dataset and evaluation metrics. In Section 4.2, our method is evaluated on the CAD simulation dataset ModelNet40 and the outdoor dataset KITTI. In Section 4.3, ablation experiments are carried out to study the effects of ComKP-CNN, self-supervised module, coordinate embedding, distance embedding, FBCross-attention, max pooling as symmetric function and rotation quaternion as the representation of point cloud rotation on the model. The improvement of ComKP-CNN on other point cloud registration methods is also studied, which verifies the performance of the DNN design method proposed in this paper.

4.1. Implementation Details

We implement PointCNT in PyTorch. The experiments were carried out on a single Graphic Processing Unit (GPU) server. The GPU is an NVIDIA GeForce RTX 3090, and the operating system is Ubuntu 20.04. The initial learning rate is set to $10^{-4}$, and the Adam [48] optimization method and the cosine annealing warm restart [49] learning rate schedule are used. All models are trained for 100 epochs.
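For reference, the stated optimizer and schedule correspond to a standard PyTorch configuration such as the sketch below; the restart period `T_0` and the stand-in model are assumptions, since they are not specified in the text.

```python
# Hedged sketch of the stated training configuration: Adam with lr = 1e-4 and
# cosine annealing with warm restarts, for 100 epochs.
import torch

model = torch.nn.Linear(8, 8)   # stands in for PointCNT
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)

for epoch in range(100):
    # ... one training epoch over the dataset would go here ...
    optimizer.step()     # illustrative; normally called once per batch
    scheduler.step()     # anneal the learning rate with periodic restarts
```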

4.1.1. Dataset Used in the Experiments

ModelNet40 [50] contains 3D CAD models from 40 categories. It is a widely used dataset for training 3D deep learning networks. We split ModelNet40 into two parts, each of which contains 20 point cloud categories. One part is split into a training set and a testing set in a proportion of 8:2 to perform same-category testing. The other part is used to perform cross-category testing. ModelNet40 is a simulation dataset with characteristics similar to those of industrial products. This paper conducts experiments on the ModelNet40 dataset because point cloud registration has been applied to industrial product quality inspection. To verify the effectiveness of our model, we also conduct experiments on the 3DMatch and KITTI datasets. 3DMatch [51] contains 62 scenes, among which 46 are used for training, 8 for validation and 8 for testing. KITTI [52] contains point clouds captured with a Velodyne HDL64 LiDAR in Karlsruhe, Germany, together with “ground truth” poses provided by a high-end GNSS/INS integrated navigation system.

4.1.2. Evaluation Metrics

We evaluate PointCNT with three metrics: (1) Relative Rotation Error (RRE), the geodesic distance between estimated and ground-truth rotation matrices; (2) Relative Translation Error (RTE), the Euclidean distance between estimated and ground-truth translation vectors; and (3) Registration Recall (RR), the fraction of point cloud pairs whose RRE and RTE are both below certain thresholds (i.e., RRE < 5° and RTE < 0.01).
$$\mathrm{RRE}\left( \hat{R} \right) = \arccos\left( \frac{ \mathrm{trace}\left( \hat{R}^{T} R_g \right) - 1 }{ 2 } \right).$$  (17)
$$\mathrm{RTE}\left( \hat{t} \right) = \left\| \hat{t} - t_g \right\|_2.$$  (18)
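These metrics translate directly into code. The sketch below is an illustrative implementation of Equations (17) and (18) and the recall criterion stated above (RRE < 5° and RTE < 0.01); it is not taken from the paper's evaluation scripts.

```python
# Hedged sketch of the evaluation metrics RRE (Eq. 17), RTE (Eq. 18) and RR.
import torch

def rre_degrees(R_hat, R_gt):
    cos = ((R_hat.transpose(-1, -2) @ R_gt).diagonal(dim1=-2, dim2=-1).sum(-1) - 1) / 2
    return torch.rad2deg(torch.arccos(cos.clamp(-1.0, 1.0)))

def rte(t_hat, t_gt):
    return (t_hat - t_gt).norm(dim=-1)

def registration_recall(R_hat, t_hat, R_gt, t_gt, rre_thr=5.0, rte_thr=0.01):
    # Fraction of pairs whose RRE and RTE both fall below the thresholds.
    ok = (rre_degrees(R_hat, R_gt) < rre_thr) & (rte(t_hat, t_gt) < rte_thr)
    return ok.float().mean()

# Toy usage on a batch of 2 perfectly estimated poses.
R = torch.eye(3).expand(2, 3, 3)
t = torch.zeros(2, 3)
print(registration_recall(R, t, R, t))   # tensor(1.)
```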

4.2. Model Evaluation Experiment

Following PointNetLK [5], we train and evaluate PointCNT on ModelNet40. During training, the rigid transformation $T_g$ is randomly generated, where the rotation is in the range of [0, 45] degrees about arbitrarily chosen axes, and the translation is in the range [0, 0.8]. For a fair comparison, initial translations for testing are in the range [0, 0.3], and initial rotations are in the range of [0, 80] degrees. The traditional method ICP, the one-stage methods PointNetLK, PCRNet and FMR, and the two-stage methods DCP, GeoTransformer and PointDSC are selected as baselines.

4.2.1. Train and Test on Same Object Categories

We use 20 ModelNet40 object categories to train our model and use the same 20 object categories to test our model. The results are shown in Figure 9 and Table 3. When the initial rotation angle is less than 40 degrees, the RRE, RTE and RR of PointCNT are close to those of the two-stage models. When the initial rotation angle is greater than 40 degrees, the RR of PointCNT is slightly lower than that of GeoTransformer and PointDSC, but it still exceeds DCP, traditional methods and one-stage methods. Compared with traditional methods and other one-stage methods, PointCNT is less sensitive to the initial position of the point cloud. This is because we utilize complex network theory to design the feature extraction network ComKP-CNN, which can better extract the point cloud features, and we explicitly embed the coordinate information and distance information into the features in the feature embedding module. The results also show that, compared with the traditional methods and other one-stage methods, our method is insensitive to the initial angle and achieves registration accuracy similar to that of two-stage methods.

4.2.2. Train and Test on Different Object Categories

We train PointCNT with 20 ModelNet40 object categories and then test PointCNT with another 20 object categories. The results are shown in Figure 10 and Table 4. The performance of our model is obviously better than that of other one-stage methods, which shows that our model has good generalization performance.

4.2.3. Gaussian Noise Experiments

We conduct experiments to study the robustness of PointCNT to noise. PointCNT is trained and tested on the same 20 object categories of ModelNet40. The range of the standard deviation of the Gaussian noise is [0, 0.05]. The results are shown in Figure 11 and Table 5. Our model is robust to noise, and the registration results are almost unaffected by noise. This is because GBSelf-attention and FBCross-attention are nonlocal neural networks that can perceive global point features.

4.2.4. Partial Overlap Experiments

Partial overlap is a problem that point cloud registration has to face. A model has practical application value only if it can achieve acceptable registration results in the case of partial overlap. We manually remove part of the point cloud to compare performance under partial overlap. PointCNT is trained and tested on the same 20 object categories of ModelNet40. The standard deviation of the Gaussian noise is 0.05. The results are shown in Table 6, and the qualitative visualization results are shown in Figure 12. PointCNT achieves registration results similar to two-stage methods, and its computation speed is faster than that of two-stage methods. This is because our model is end to end and does not need to find correspondences. The RR of our model is much higher, and the RRE and RTE are much lower, than those of traditional methods and other one-stage and two-stage methods. This is because FBCross-attention enables the source point cloud and the template point cloud to perceive each other's features so that the geometric consistency of the overlapping part can be transmitted interactively between the two point clouds.

4.2.5. Effectiveness of PointCNT

In order to verify the effectiveness of the proposed model on different types of datasets, we carried out experiments on the indoor dataset 3DMatch and the outdoor dataset KITTI. We use the 3DMatch training data preprocessed by [53]. We split KITTI into two groups, training and testing. The training group includes the 00–07 sequences, and the testing group includes the 08–10 sequences. As shown in Table 7, PointCNT achieves good point cloud registration results on 3DMatch and KITTI, which proves the effectiveness of our model. KITTI is a natural-object dataset. The excellent performance of the model on KITTI shows that its feature extraction capability is not limited by the attributes and spatial distribution characteristics of the point cloud.

4.3. Ablation Experiments

We conduct ablation experiments on ModelNet40 to study the effects of ComKP-CNN, the self-supervised module, coordinate embedding, distance embedding, FBCross-attention, max pooling as the symmetric function and the rotation quaternion as the point cloud rotation representation on the model. The effect of each module on registration accuracy improvement is verified. PointCNT is trained and tested on the same 20 object categories of ModelNet40 with partial overlap. The standard deviation of the Gaussian noise is 0.05. The results are shown in Table 8.
The results show that ComKP-CNN, the self-supervised module, coordinate embedding, distance embedding and FBCross-attention all improve the model registration accuracy. Among them, ComKP-CNN contributes the most to the model, reducing RRE and RTE by 0.3335 and 0.0011, respectively, and increasing RR by 2.3%, which indicates that the ComKP-CNN designed in this paper is effective for extracting point cloud features. The registration accuracy improvement from FBCross-attention is second only to that of ComKP-CNN. This is because FBCross-attention realizes the interactive propagation of point cloud features, including the geometric consistency and coordinate differences between the source and template. Table 8 shows that the registration effect when max pooling is used as the symmetric function is better than that when average pooling is used. This is because average pooling is too smooth, which makes the differences in the global features of different point clouds insignificant, while max pooling does not have this problem. When the rotation quaternion is used to represent the rotation of the point cloud rather than the rotation matrix, the model registration effect is better, which is consistent with the experimental result of 3DRegNet [29].
Experiments are carried out to study the influence of our designed feature extraction module (ComKP-CNN and self-supervised module), feature embedding module (coordinate embedding and distance embedding) and feature fusion module (FBCross-attention and symmetric function) on the registration results under different initial angles. PointCNT is trained and tested on the same 20 object categories of ModelNet40 with partial overlap. The standard deviation of the Gaussian noise is 0.05. The results are shown in Table 9.
The results show that our designed feature extraction module, feature embedding module and feature fusion module can improve the registration accuracy of the model when the point cloud has different initial angles. The larger the initial angle, the more pronounced the accuracy improvement provided by the designed modules. These experiments prove the effectiveness of the designed feature extraction module, feature embedding module and feature fusion module at different initial angles.

4.4. Effectiveness of ComKP-CNN

One of the important contributions of this paper is to propose a DNN design method and design a new point cloud feature extraction framework ComKP-CNN. Therefore, we use ComKP-CNN to replace the feature extraction module of other point cloud registration frameworks to verify the effectiveness of ComKP-CNN. Table 10 shows that ComKP-CNN reduces the registration errors of PointNetLK, PCRNet, FMR, DCP, PointDSC and GeoTransformer to varying degrees and improves the registration accuracy. This indicates the correctness of our DNN design idea based on complex network theory. This idea is expected to be extended to the design of deep learning frameworks in other fields.

5. Discussion

From the above extensive experiments, it can be concluded that PointCNT is a novel and competitive registration algorithm, including for partially overlapping point clouds. Some meaningful points of discussion are summarized below.
We conducted experiments on the same object categories and on different object categories of ModelNet40. These experiments show that the accuracy of deep learning methods is significantly better than that of traditional methods, that our method's registration accuracy is close to that of two-stage methods, and that it has good generalization performance. The noise experiment shows that our method is minimally affected by noise and exhibits strong robustness in point cloud registration under noise interference. Partial overlap is a problem that point cloud registration has to face. The partial overlap experiment shows that our method achieves high registration accuracy when point clouds only partially overlap, and its registration accuracy is much higher than that of other one-stage methods. The ablation experiment shows that ComKP-CNN contributes the most to the model, reducing RRE and RTE by 0.3335 and 0.0011, respectively, and increasing RR by 2.3%, which proves the effectiveness of the deep learning model design method based on complex network theory proposed in this paper.
However, the point cloud registration accuracy of the proposed method is still lower than that of the two-stage method. In addition, although the method proposed in this paper is a one-stage method with fast inference speed and real-time performance, PointCNT is still complex and not easy to deploy.

6. Conclusions

We propose an efficient and high-precision one-stage point cloud registration method. The DNN design method based on complex network theory can not only be used for the design of point cloud feature extraction networks but is also expected to be applied to the design of feature extraction networks for image, text, voice and other types of data. The results show that the designed ComKP-CNN can efficiently extract the features of point clouds, significantly reduce the error of point cloud registration and is expected to be applied to 3D target detection, semantic segmentation and other tasks. The results also show that the self-supervised module helps the model better extract the features of the point cloud. In addition, the feature embedding module explicitly embeds the geometric information into the point cloud features, which is helpful for point cloud registration. We also find that FBCross-attention makes the features of the source point cloud and template point cloud perceptible to each other and improves the point cloud registration accuracy.
The proposed method of explaining and designing a deep learning model based on complex network theory is a novel idea. The method proposed in this paper is expected to be applied to 3D reconstruction, map reconstruction, digital twinning and other fields. In the future, we will carry out further detailed research on this method in the field of image recognition. We will also carry out research on model compression and model deployment.

Author Contributions

X.W. (Xin Wu), X.W. (Xiaolong Wei) and H.X. were responsible for the overall algorithm design and experimental design and wrote the paper. C.L., Y.H. and Y.Y. were responsible for the coding and experimental execution. W.H. was responsible for correcting complex papers. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 12075319, 11805277 and 51975583.

Data Availability Statement

Data available in a publicly accessible repository.

Acknowledgments

The public datasets used in this article are ModelNet40, 3DMatch and KITTI.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
2D	Two-Dimensional
3D	Three-Dimensional
3DRegNet	Deep Neural Network for 3D Point Registration
3Dsc	3D Shape Context
4PCS	4-Points Congruent Sets
ANNs	Artificial Neural Networks
CAD	Computer-Aided Design
ComKP-CNN	Complex Kernel Point Convolution Neural Network
ConvBlock	KPConv Block
DCP	Deep Closest Point
DeepVCP	Deep Virtual Corresponding Points
DGR	Deep Global Registration
FBCross-attention	Feature-based Cross-attention
FMR	Feature-Metric Registration
FPFH	Fast Point Feature Histogram
GBSelf-attention	Geometric-based Self-attention
GeoTransformer	Geometric Transformer
Go-ICP	A Globally Optimal Solution to 3D ICP Point-set Registration
GPU	Graphic Processing Unit
ICP	Iterative Closest Point
KP-CNN	Kernel Point Convolution Neural Network
KPConv	Kernel Point Convolution
LiDAR	Light Detection and Ranging
LK	Lucas–Kanade Algorithm
MLP	Multilayer Perceptron
NDT	Normal Distributions Transform
PCA	Principal Component Analysis
PCRNet	Point Cloud Registration Network using PointNet Encoding
PFH	Point Feature Histogram
PointCNT	A One-Stage Point Cloud Registration Approach Based on Complex Network Theory
PointDSC	Robust Point Cloud Registration using Deep Spatial Consistency
PointNetLK	Robust and Efficient Point Cloud Registration using PointNet
PRNet	Partial Registration Network
RANSAC	Random Sample Consensus
ResBlock	Residual Blocks
RGM	Robust Point Cloud Registration Framework Based on Deep Graph Matching
RR	Registration Recall
RRE	Relative Rotation Error
RTE	Relative Translation Error
SVD	Singular Value Decomposition

References

  1. Huang, X.; Mei, G.; Zhang, J.; Abbas, R. A comprehensive survey on point cloud registration. arXiv 2021, arXiv:2103.02690. [Google Scholar]
  2. Lu, W.; Wan, G.; Zhou, Y.; Fu, X.; Yuan, P.; Song, S. DeepVCP: An End-to-End Deep Neural Network for Point Cloud Registration. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 12–21. [Google Scholar] [CrossRef]
  3. Wang, Y.; Solomon, J.M. Deep Closest Point: Learning Representations for Point Cloud Registration. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3522–3531. [Google Scholar] [CrossRef] [Green Version]
  4. Wu, X.; Wei, X.; Xu, H.; He, W.; Sun, C.; Zhang, L.; Li, Y.; Fang, Y. Radar-absorbing materials damage detection through microwave images using one-stage object detectors. NDT E Int. 2022, 127, 102604. [Google Scholar] [CrossRef]
  5. Aoki, Y.; Goforth, H.; Srivatsan, R.A.; Lucey, S. PointNetLK: Robust & Efficient Point Cloud Registration Using PointNet. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 7156–7165. [Google Scholar] [CrossRef] [Green Version]
  6. Liu, Y. Investigation of Parallel Visual Pathways through Pulvinar Nuclei and V2 in Macaque Monkeys. Ph.D. Thesis, Zhejiang University, Hangzhou, China, 2021. [Google Scholar]
  7. Bai, X.; Luo, Z.; Zhou, L.; Chen, H.; Li, L.; Hu, Z.; Fu, H.; Tai, C.L. PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 15854–15864. [Google Scholar] [CrossRef]
  8. Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local Neural Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803. [Google Scholar] [CrossRef] [Green Version]
  9. Ding, X.; Zhang, X.; Han, J.; Ding, G. RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition. arXiv 2021, arXiv:2105.01883. [Google Scholar]
  10. Guo, M.H.; Liu, Z.N.; Mu, T.J.; Hu, S.M. Beyond Self-Attention: External Attention Using Two Linear Layers for Visual Tasks. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 5436–5447. [Google Scholar] [CrossRef] [PubMed]
  11. Melas-Kyriazi, L. Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet. arXiv 2021, arXiv:2105.02723. [Google Scholar]
  12. Tolstikhin, I.O.; Houlsby, N.; Kolesnikov, A.; Beyer, L.; Zhai, X.; Unterthiner, T.; Yung, J.; Keysers, D.; Uszkoreit, J.; Lucic, M.; et al. MLP-Mixer: An all-MLP Architecture for Vision. In Proceedings of the Neural Information Processing Systems, Online, 6–14 December 2021; Volume 34, pp. 24261–24272. [Google Scholar]
  13. Rusu, R.B.; Blodow, N.; Marton, Z.C.; Beetz, M. Aligning point cloud views using persistent feature histograms. In Proceedings of the 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, Nice, France, 22–26 September 2008; pp. 3384–3391. [Google Scholar] [CrossRef]
  14. Rusu, R.B.; Blodow, N.; Beetz, M. Fast Point Feature Histograms (FPFH) for 3D registration. In Proceedings of the 2009 IEEE International Conference on Robotics and Automation, Kobe, Japan, 12–17 May 2009; pp. 3212–3217. [Google Scholar] [CrossRef]
  15. Frome, A.; Huber, D.; Kolluri, R.; Bülow, T.; Malik, J. Recognizing Objects in Range Data Using Regional Point Descriptors. In Computer Vision-ECCV 2004, Proceedings of the 8th European Conference on Computer Vision, Prague, Czech Republic, 11–14 May 2004; Pajdla, T., Matas, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2004; pp. 224–237. [Google Scholar]
  16. Biber, P.; Strasser, W. The Normal Distributions Transform: A new approach to laser scan matching. In Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453), Las Vegas, NV, USA, 27–31 October 2003; Volume 3, pp. 2743–2748. [Google Scholar] [CrossRef]
  17. Aiger, D.; Mitra, N.J.; Cohen-Or, D. 4-Points Congruent Sets for Robust Pairwise Surface Registration. ACM Trans. Graph. 2008, 27, 1–10. [Google Scholar] [CrossRef] [Green Version]
  18. Abdi, H.; Williams, L.J. Principal Component Analysis. In Wiley Interdisciplinary Reviews: Computational Statistics; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2010; Volume 2, pp. 433–459. [Google Scholar] [CrossRef]
  19. Besl, P.J.; McKay, N.D. Method for registration of 3-D shapes. In Proceedings of the Sensor Fusion IV: Control Paradigms and Data Structures, Boston, MA, USA, 14–15 November 1992; Volume 1611, pp. 586–606. [Google Scholar]
  20. Bouaziz, S.; Tagliasacchi, A.; Pauly, M. Sparse Iterative Closest Point. In Proceedings of the Eleventh Eurographics/ACMSIGGRAPH Symposium on Geometry Processing, Genova, Italy, 3–5 July 2013; pp. 113–123. [Google Scholar] [CrossRef] [Green Version]
  21. Bronstein, M.M.; Bruna, J.; LeCun, Y.; Szlam, A.; Vandergheynst, P. Geometric Deep Learning: Going beyond Euclidean data. IEEE Signal Process. Mag. 2017, 34, 18–42. [Google Scholar] [CrossRef] [Green Version]
  22. Su, H.; Jampani, V.; Sun, D.; Maji, S.; Kalogerakis, E.; Yang, M.H.; Kautz, J. SPLATNet: Sparse Lattice Networks for Point Cloud Processing. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2530–2539. [Google Scholar] [CrossRef] [Green Version]
  23. Yang, J.; Li, H.; Campbell, D.; Jia, Y. Go-ICP: A Globally Optimal Solution to 3D ICP Point-set Registration. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 2241–2254. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  24. Fischler, M.A.; Bolles, R.C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Commun. ACM 1981, 24, 381–395. [Google Scholar] [CrossRef]
  25. Jain, A.; Mao, J.; Mohiuddin, K. Artificial Neural Networks: A tutorial. Computer 1996, 29, 31–44. [Google Scholar] [CrossRef] [Green Version]
  26. Choy, C.; Dong, W.; Koltun, V. Deep Global Registration. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2511–2520. [Google Scholar] [CrossRef]
27. Wang, Y.; Solomon, J.M. PRNet: Self-Supervised Learning for Partial-to-Partial Registration. In Advances in Neural Information Processing Systems; Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, Inc.: New York, NY, USA, 2019; Volume 32, pp. 8812–8824.
28. Qin, Z.; Yu, H.; Wang, C.; Guo, Y.; Peng, Y.; Xu, K. Geometric Transformer for Fast and Robust Point Cloud Registration. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11133–11142.
29. Pais, G.D.; Ramalingam, S.; Govindu, V.M.; Nascimento, J.C.; Chellappa, R.; Miraldo, P. 3DRegNet: A Deep Neural Network for 3D Point Registration. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 7191–7201.
30. Fu, K.; Liu, S.; Luo, X.; Wang, M. Robust Point Cloud Registration Framework Based on Deep Graph Matching. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 8889–8898.
31. Lucas, B.D.; Kanade, T. An Iterative Image Registration Technique with an Application to Stereo Vision. In Proceedings of the IJCAI’81: 7th International Joint Conference on Artificial Intelligence, San Francisco, CA, USA, 24–28 August 1981; Volume 2, pp. 674–679.
32. Sarode, V.; Li, X.; Goforth, H.; Aoki, Y.; Srivatsan, R.A.; Lucey, S.; Choset, H. PCRNet: Point Cloud Registration Network using PointNet Encoding. arXiv 2019, arXiv:1908.07906.
33. Huang, X.; Mei, G.; Zhang, J. Feature-Metric Registration: A Fast Semi-Supervised Approach for Robust Point Cloud Registration Without Correspondences. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11363–11371.
34. Thomas, H.; Qi, C.R.; Deschaud, J.E.; Marcotegui, B.; Goulette, F.; Guibas, L. KPConv: Flexible and Deformable Convolution for Point Clouds. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6410–6419.
35. Barabási, A.L.; Albert, R. Emergence of Scaling in Random Networks. Science 1999, 286, 509–512.
36. Girvan, M.; Newman, M.E.J. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 2002, 99, 7821–7826.
37. Watts, D.J.; Strogatz, S.H. Collective dynamics of ‘small-world’ networks. Nature 1998, 393, 440–442.
38. Choy, C.; Park, J.; Koltun, V. Fully Convolutional Geometric Features. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8957–8965.
39. Qi, C.R.; Su, H.; Kaichun, M.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 77–85.
40. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5105–5114.
41. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
42. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
43. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
44. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
45. Sun, J.; Shen, Z.; Wang, Y.; Bao, H.; Zhou, X. LoFTR: Detector-Free Local Feature Matching with Transformers. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 8918–8927.
46. Yu, H.; Li, F.; Saleh, M.; Busam, B.; Ilic, S. CoFiNet: Reliable coarse-to-fine correspondences for robust pointcloud registration. Adv. Neural Inf. Process. Syst. 2021, 34, 23872–23884.
47. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010.
48. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
49. Loshchilov, I.; Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. arXiv 2016, arXiv:1608.03983.
50. Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1912–1920.
51. Zeng, A.; Song, S.; Nießner, M.; Fisher, M.; Xiao, J.; Funkhouser, T. 3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 199–208.
52. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361.
53. Huang, S.; Gojcic, Z.; Usvyatsov, M.; Wieser, A.; Schindler, K. PREDATOR: Registration of 3D Point Clouds with Low Overlap. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 4265–4274.
Figure 1. Architecture of the proposed network PointCNT.
Figure 2. The development of point cloud registration methods based on deep learning.
Figure 3. Parallel visual pathways model.
Figure 4. (a) Regular network. (b) Complex network. (c) Random network.
Figure 5. The proposed DNN design method based on complex network theory.
Figure 6. The structure of the global coupling KP-CNN (a), the global coupling KP-CNN after training (b) and ComKP-CNN (c).
Figure 7. The structure of the self-supervised model.
Figure 8. The computation graph of GBSelf-attention.
Figure 9. Comparison results of different methods under the same categories.
Figure 10. Comparison results of different methods under different categories.
Figure 11. Comparison results under different Gaussian noises.
Figure 12. Qualitative registration results of different methods under partial overlap at different initial angles. The initial angles of (a–d) are 80°, 60°, 40° and 20°, respectively. Green represents the source point cloud, red represents the template point cloud, and blue represents the registered source point cloud.
Table 1. Comparison of different point cloud registration methods.

Method | Category | Proposed Year | Advantage | Disadvantage
ICP | Traditional registration method | 1992 | No need for a large amount of training data. | Sensitive to the initial position of the point cloud and prone to falling into local optima.
Go-ICP | Traditional registration method | 2016 | Adopts a global solution, achieving higher registration accuracy. | Running speed is very slow.
DCP | Learning-based two-stage method | 2019 | Good robustness to noise. | Not applicable to partial overlap.
PointDSC | Learning-based two-stage method | 2021 | High registration accuracy; suitable for partial overlap. | Slow running speed; requires additional algorithms to find corresponding points.
GeoTransformer | Learning-based two-stage method | 2022 | High registration accuracy; suitable for partial overlap; no additional algorithms needed to find corresponding points. | Slow running speed; registration accuracy constrained by keypoint matching.
PointNetLK | Learning-based one-stage method | 2019 | First to apply deep learning to point cloud registration. | Low registration accuracy, poor robustness and poor generalization.
PCRNet | Learning-based one-stage method | 2019 | Good robustness to noise; end-to-end model; runs fast. | Simple model structure and low registration accuracy.
FMR | Learning-based one-stage method | 2020 | Uses unsupervised learning to extract point cloud features and the inverse compositional algorithm to compute the transformation matrix. | Poor registration performance on point clouds with only partial overlap.
Table 2. The average path length and clustering coefficient of typical DNNs.

Category | DNN | Average Path Length | Clustering Coefficient
DNN for images | VGG16 | 5.647 | 0
DNN for images | ResNet50 | 6.93 | 0
DNN for point clouds | KP-CNN | 3.972 | 0
DNN for point clouds | ComKP-CNN | 1.597 | 0.684
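The average path length and clustering coefficient reported in Table 2 are the two standard complex-network statistics used to characterize a network's topology. As a minimal sketch of how such statistics can be obtained, the snippet below computes both quantities with the networkx library on a small toy graph; the toy edge list is an illustrative assumption and does not reproduce the actual computational graphs of VGG16, ResNet50, KP-CNN or ComKP-CNN.

```python
import networkx as nx

# Toy "layer graph" standing in for a DNN's computational graph:
# nodes are layers, edges are feature connections.
G = nx.Graph()
G.add_edges_from([
    (0, 1), (1, 2), (2, 3), (3, 4),   # a plain chain (regular-network-like)
    (0, 2), (1, 3), (0, 4),           # extra shortcut edges (small-world-like)
])

# Average path length: mean shortest-path distance over all node pairs.
avg_path_length = nx.average_shortest_path_length(G)

# Clustering coefficient: how often a node's neighbours are also connected.
clustering = nx.average_clustering(G)

print(f"average path length = {avg_path_length:.3f}")
print(f"clustering coefficient = {clustering:.3f}")
```

A pure chain yields a long average path and zero clustering, while adding shortcut connections shortens paths and creates triangles that raise the clustering coefficient, mirroring the trend from KP-CNN to ComKP-CNN in Table 2.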
Table 3. Comparison results of different methods under the same categories.

RRE (°)
Initial Angle (°) | ICP | PointNetLK | PCRNet | FMR | DCP | GeoTransformer | PointDSC | PointCNT (Ours)
0 | 9.0583 | 0.0291 | 0.0867 | 0.0559 | 0.0484 | 0.0341 | 0.0574 | 0.0081
10 | 11.1343 | 3.1020 | 2.1663 | 1.5260 | 0.6368 | 0.6518 | 0.5333 | 0.5488
20 | 12.0865 | 5.1491 | 3.1363 | 2.6720 | 2.1669 | 1.1784 | 1.0502 | 2.1426
30 | 13.1300 | 6.0747 | 4.0494 | 3.1946 | 2.0129 | 2.0468 | 1.1013 | 2.5567
40 | 16.0468 | 7.0898 | 6.0083 | 3.1537 | 2.6155 | 2.0655 | 1.5227 | 3.1776
50 | 20.1213 | 10.0190 | 8.0598 | 4.0932 | 4.1018 | 3.6256 | 1.8021 | 3.4471
60 | 31.1141 | 16.1352 | 9.0943 | 6.0841 | 4.3775 | 4.0737 | 2.5845 | 4.0157
70 | 37.1181 | 32.1661 | 13.0590 | 10.0162 | 7.1807 | 4.0254 | 3.0133 | 8.1355
80 | 56.0147 | 58.1466 | 30.1745 | 42.0883 | 12.1429 | 5.1880 | 4.0809 | 12.1752

RTE
Initial Angle (°) | ICP | PointNetLK | PCRNet | FMR | DCP | GeoTransformer | PointDSC | PointCNT (Ours)
0 | 0.0752 | 0.0001 | 0.0001 | 0.0003 | 0.0003 | 0.0001 | 0.0002 | 0.0002
10 | 0.0702 | 0.0031 | 0.0031 | 0.0031 | 0.0020 | 0.0011 | 0.0002 | 0.0021
20 | 0.0653 | 0.0041 | 0.0031 | 0.0021 | 0.0011 | 0.0020 | 0.0012 | 0.0021
30 | 0.0661 | 0.0062 | 0.0041 | 0.0031 | 0.0018 | 0.0012 | 0.0010 | 0.0031
40 | 0.0801 | 0.0080 | 0.0062 | 0.0050 | 0.0033 | 0.0018 | 0.0020 | 0.0037
50 | 0.0903 | 0.0102 | 0.0090 | 0.0075 | 0.0043 | 0.0031 | 0.0017 | 0.0053
60 | 0.1000 | 0.0202 | 0.0171 | 0.0121 | 0.0083 | 0.0047 | 0.0033 | 0.0083
70 | 0.1201 | 0.0402 | 0.0251 | 0.0402 | 0.0111 | 0.0051 | 0.0042 | 0.0122
80 | 0.1300 | 0.0701 | 0.0452 | 0.0601 | 0.0141 | 0.0102 | 0.0072 | 0.0140
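Tables 3–5 report the Relative Rotation Error (RRE, in degrees) and the Relative Translation Error (RTE) at different initial angles. As a hedged sketch, the helpers below compute these two errors using one common definition (the geodesic rotation distance and the Euclidean distance between translation vectors); the exact formulation used in the paper may differ, so treat this as illustrative only.

```python
import numpy as np

def relative_rotation_error(R_gt: np.ndarray, R_est: np.ndarray) -> float:
    """RRE in degrees, using the geodesic distance between rotations.
    This is one common definition; the paper may use an equivalent formulation."""
    cos_theta = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    cos_theta = np.clip(cos_theta, -1.0, 1.0)  # guard against numerical drift
    return float(np.degrees(np.arccos(cos_theta)))

def relative_translation_error(t_gt: np.ndarray, t_est: np.ndarray) -> float:
    """RTE as the Euclidean distance between translation vectors."""
    return float(np.linalg.norm(t_gt - t_est))

# Toy check: a 5-degree rotation error about the z-axis and a small translation offset.
theta = np.radians(5.0)
R_est = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
print(relative_rotation_error(np.eye(3), R_est))                                # ~5.0
print(relative_translation_error(np.zeros(3), np.array([0.003, 0.0, 0.004])))   # 0.005
```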
Table 4. Comparison results of different methods under different categories.

RRE (°)
Initial Angle (°) | ICP | PointNetLK | PCRNet | FMR | DCP | GeoTransformer | PointDSC | PointCNT (Ours)
0 | 9.0959 | 0.1845 | 0.1469 | 0.0092 | 0.0133 | 0.0554 | 0.1145 | 0.0278
10 | 11.0771 | 3.1107 | 3.1902 | 1.6782 | 0.7943 | 0.5084 | 1.1477 | 1.0007
20 | 12.0457 | 5.6493 | 3.0634 | 2.6482 | 2.1269 | 1.1253 | 1.1978 | 2.3915
30 | 13.0583 | 7.0153 | 5.1618 | 3.0255 | 2.3756 | 2.4167 | 1.3391 | 2.6777
40 | 16.0784 | 8.1973 | 6.9155 | 3.2291 | 3.1534 | 3.0080 | 1.7222 | 3.0906
50 | 20.0064 | 12.1962 | 9.0828 | 8.1173 | 4.0734 | 3.8179 | 2.1880 | 5.1831
60 | 31.1160 | 17.0211 | 13.1846 | 15.0505 | 4.6581 | 4.0699 | 2.5154 | 5.3267
70 | 37.1035 | 33.1767 | 20.1443 | 28.1324 | 8.1459 | 4.0913 | 3.5691 | 9.0556
80 | 56.0563 | 62.0186 | 34.0484 | 39.1745 | 12.1727 | 6.1058 | 4.1685 | 13.0916

RTE
Initial Angle (°) | ICP | PointNetLK | PCRNet | FMR | DCP | GeoTransformer | PointDSC | PointCNT (Ours)
0 | 0.0749 | 0.0001 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
10 | 0.0699 | 0.0030 | 0.0029 | 0.0020 | 0.0020 | 0.0010 | 0.0000 | 0.0022
20 | 0.0650 | 0.0041 | 0.0029 | 0.0019 | 0.0014 | 0.0021 | 0.0011 | 0.0020
30 | 0.0659 | 0.0070 | 0.0050 | 0.0036 | 0.0018 | 0.0014 | 0.0009 | 0.0033
40 | 0.0799 | 0.0090 | 0.0080 | 0.0044 | 0.0034 | 0.0016 | 0.0024 | 0.0037
50 | 0.0901 | 0.0130 | 0.0119 | 0.0079 | 0.0050 | 0.0036 | 0.0017 | 0.0055
60 | 0.1185 | 0.0251 | 0.0190 | 0.0150 | 0.0079 | 0.0050 | 0.0031 | 0.0110
70 | 0.1150 | 0.0459 | 0.0300 | 0.0501 | 0.0130 | 0.0060 | 0.0045 | 0.0140
80 | 0.1271 | 0.0750 | 0.0549 | 0.0701 | 0.0169 | 0.0120 | 0.0079 | 0.0181
Table 5. Comparison results under different Gaussian noises.

RRE (°)
Initial Angle (°) | σ = 0 | σ = 0.01 | σ = 0.02 | σ = 0.03 | σ = 0.04 | σ = 0.05
0 | 0.0029 | 0.0123 | 0.0133 | 0.0125 | 0.0071 | 0.0002
10 | 0.5095 | 0.4869 | 0.5851 | 0.7079 | 0.6462 | 0.6709
20 | 2.0118 | 2.1917 | 2.1138 | 2.3090 | 2.2119 | 2.3873
30 | 2.5086 | 2.5101 | 2.6973 | 2.6055 | 2.7947 | 2.6089
40 | 3.0057 | 3.4008 | 3.1896 | 3.5051 | 3.4001 | 3.1924
50 | 3.3896 | 3.6013 | 3.6986 | 3.4945 | 3.7084 | 3.7896
60 | 3.9989 | 4.2122 | 4.3876 | 4.1903 | 4.4972 | 4.5954
70 | 7.9855 | 8.3915 | 8.1942 | 8.4931 | 8.7858 | 8.6007
80 | 11.9947 | 13.0138 | 13.0043 | 13.4954 | 13.1888 | 13.7911

RTE
Initial Angle (°) | σ = 0 | σ = 0.01 | σ = 0.02 | σ = 0.03 | σ = 0.04 | σ = 0.05
0 | 0.0000 | 0.0001 | 0.0000 | 0.0001 | 0.0000 | 0.0001
10 | 0.0020 | 0.0020 | 0.0017 | 0.0021 | 0.0021 | 0.0024
20 | 0.0019 | 0.0022 | 0.0025 | 0.0024 | 0.0024 | 0.0026
30 | 0.0029 | 0.0029 | 0.0035 | 0.0038 | 0.0036 | 0.0036
40 | 0.0036 | 0.0039 | 0.0036 | 0.0037 | 0.0040 | 0.0042
50 | 0.0051 | 0.0053 | 0.0054 | 0.0054 | 0.0055 | 0.0056
60 | 0.0082 | 0.0084 | 0.0083 | 0.0083 | 0.0083 | 0.0084
70 | 0.0119 | 0.0118 | 0.0130 | 0.0133 | 0.0125 | 0.0135
80 | 0.0143 | 0.0155 | 0.0150 | 0.0152 | 0.0161 | 0.0161

Note: σ denotes the Gaussian noise standard deviation.
Table 6. Comparison results of different methods in the case of partial overlap.

Model | RRE (°) | RTE | RR (%) | Time (s)
ICP | 17.3752 | 0.0253 | 82.3 | 0.12
PointNetLK | 17.3752 | 0.0253 | 82.3 | 0.12
PCRNet | 9.5863 | 0.0229 | 85.7 | 0.16
FMR | 8.8724 | 0.0183 | 88.2 | 0.08
DCP | 4.7283 | 0.0067 | 95.2 | 0.21
GeoTransformer | 3.6878 | 0.0042 | 97.1 | 0.23
PointDSC | 3.4586 | 0.0036 | 97.3 | 0.24
PointCNT (Ours) | 4.5128 | 0.0064 | 96.4 | 0.15
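Table 6 additionally reports Registration Recall (RR) and run time. RR is commonly defined as the fraction of test pairs whose rotation and translation errors both fall below preset thresholds; the sketch below uses assumed thresholds of 5° and 0.01 purely for illustration, not the thresholds used in the experiments.

```python
import numpy as np

def registration_recall(rre_deg: np.ndarray, rte: np.ndarray,
                        rre_thresh: float = 5.0, rte_thresh: float = 0.01) -> float:
    """Percentage of test pairs whose RRE and RTE both fall below the thresholds.
    The threshold values here are illustrative assumptions."""
    success = (rre_deg < rre_thresh) & (rte < rte_thresh)
    return float(success.mean()) * 100.0

# Example: 5 test pairs, 3 of which satisfy both thresholds -> RR = 60%.
rre = np.array([1.2, 3.4, 6.1, 0.8, 2.2])
rte = np.array([0.002, 0.004, 0.003, 0.012, 0.001])
print(registration_recall(rre, rte))
```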
Table 7. Point cloud registration results of PointCNT on 3DMatch and KITTI.

Dataset | RRE (°) | RTE (cm) | RR (%)
3DMatch | 0.3258 | 7.3854 | 93.6
KITTI | 0.2614 | 7.9368 | 97.2
Table 8. The results of ablation experiments.

Configuration | RRE (°) | RTE | RR (%)
Baseline | 6.7322 | 0.0135 | 85.9
CK | 5.3264 | 0.0083 | 89.3
SS | 5.7217 | 0.0087 | 87.4
CE | 5.6429 | 0.0086 | 87.6
DE | 5.8141 | 0.0088 | 86.9
FB | 5.5833 | 0.0085 | 88.1
MP | 6.0135 | 0.0090 | 86.6
RQ | 6.1078 | 0.0091 | 86.4
SS+CE+DE+FB+MP+RQ | 4.8463 | 0.0075 | 94.1
CK+CE+DE+FB+MP+RQ | 4.7234 | 0.0067 | 95.2
CK+SS+DE+FB+MP+RQ | 4.7832 | 0.0070 | 94.8
CK+SS+CE+FB+MP+RQ | 4.7138 | 0.0066 | 95.3
CK+SS+CE+DE+MP+RQ | 4.8195 | 0.0073 | 94.4
CK+SS+CE+DE+FB+AP+RQ | 4.6618 | 0.0069 | 95.6
CK+SS+CE+DE+FB+MP+RM | 4.6576 | 0.0068 | 95.8
CK+SS+CE+DE+FB+MP+RQ (Ours) | 4.5128 | 0.0064 | 96.4
Note: CK is ComKP-CNN, SS is the self-supervised module, CE is coordinate embedding, DE is distance embedding, FB is FBCross-attention, MP is max pooling, AP is average pooling, RQ is rotation quaternion and RM is rotation matrix.
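The RQ/RM rows in Table 8 compare regressing a rotation quaternion against regressing a rotation matrix as the output parameterization. A regressed quaternion can always be normalized and mapped to a valid rotation matrix; the snippet below shows the standard conversion as a generic sketch, not the paper's implementation.

```python
import numpy as np

def quaternion_to_rotation_matrix(q: np.ndarray) -> np.ndarray:
    """Convert a (w, x, y, z) quaternion to a 3x3 rotation matrix.
    Normalizing first means the regressed 4-vector need not be exactly unit length."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

# A 90-degree rotation about the z-axis.
q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
print(np.round(quaternion_to_rotation_matrix(q), 3))
```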
Table 9. Comparison results of model variants with and without each proposed module at different initial angles.

Model Variant | RRE (°) at 20° | RTE at 20° | RRE (°) at 40° | RTE at 40° | RRE (°) at 60° | RTE at 60° | RRE (°) at 80° | RTE at 80°
Using ComKP-CNN | 2.4273 | 0.0026 | 3.2351 | 0.0041 | 4.6168 | 0.0085 | 13.8413 | 0.0162
Using KP-CNN | 2.5185 | 0.0031 | 3.5017 | 0.0048 | 4.8912 | 0.0097 | 14.376 | 0.0177
Using self-supervised module | 2.4273 | 0.0026 | 3.2351 | 0.0041 | 4.6168 | 0.0085 | 13.8413 | 0.0162
No self-supervised module | 2.5032 | 0.0029 | 3.4926 | 0.0045 | 4.8128 | 0.0092 | 14.1734 | 0.0169
Using coordinate embedding | 2.4273 | 0.0026 | 3.2351 | 0.0041 | 4.6168 | 0.0085 | 13.8413 | 0.0162
No coordinate embedding | 2.4984 | 0.0028 | 3.4586 | 0.0044 | 4.8326 | 0.0091 | 14.0128 | 0.0168
Using distance embedding | 2.4273 | 0.0026 | 3.2351 | 0.0041 | 4.6168 | 0.0085 | 13.8413 | 0.0162
No distance embedding | 2.4815 | 0.0029 | 3.4125 | 0.0044 | 4.8402 | 0.0091 | 13.9821 | 0.0166
Using FBCross-attention | 2.4273 | 0.0026 | 3.2351 | 0.0041 | 4.6168 | 0.0085 | 13.8413 | 0.0162
No FBCross-attention | 2.5148 | 0.003 | 3.4824 | 0.0046 | 4.8621 | 0.0094 | 14.2675 | 0.0172
Using max pooling | 2.4273 | 0.0026 | 3.2351 | 0.0041 | 4.6168 | 0.0085 | 13.8413 | 0.0162
Using average pooling | 2.4637 | 0.0028 | 3.3861 | 0.0043 | 4.8236 | 0.0087 | 13.9643 | 0.0165
Table 10. Comparison results of different models with and without ComKP-CNN.

Metric | PointNetLK | PCRNet | FMR | DCP | GeoTransformer | PointDSC
Without ComKP-CNN, RRE (°) | 17.3752 | 9.5863 | 8.8724 | 4.7283 | 3.6878 | 3.4586
Without ComKP-CNN, RTE | 0.0253 | 0.0229 | 0.0183 | 0.0067 | 0.0042 | 0.0036
Without ComKP-CNN, RR (%) | 82.3 | 85.7 | 88.2 | 95.2 | 97.1 | 97.3
With ComKP-CNN, RRE (°) | 14.8463 | 7.8362 | 7.2156 | 3.6748 | 3.1163 | 3.0376
With ComKP-CNN, RTE | 0.0221 | 0.0204 | 0.0168 | 0.0055 | 0.0037 | 0.0034
With ComKP-CNN, RR (%) | 86.6 | 87.8 | 90.4 | 96.3 | 97.7 | 97.9