Article

An Iterative Closest Point Method for Lidar Odometry with Fused Semantic Features

1 School of Integrated Circuits, Guangdong University of Technology, Guangzhou 510006, China
2 School of Automation, Guangdong University of Technology, Guangzhou 510006, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2023, 13(23), 12741; https://doi.org/10.3390/app132312741
Submission received: 18 October 2023 / Revised: 24 November 2023 / Accepted: 24 November 2023 / Published: 28 November 2023

Abstract:
Lidar sensors play a pivotal role in many remote sensing domains and have become indispensable tools in applications such as robotics, unmanned aerial vehicles (UAVs), autonomous driving, and 3D reconstruction. The accuracy of Lidar odometry (LO), which serves as the front end of SLAM, is therefore crucial. In this paper, we propose a novel iterative closest point (ICP) technique that fuses semantic features to improve LO precision. First, a semantic segmentation neural network extracts semantic features from each frame of the point cloud. The obtained semantic features then assist in extracting the local geometric features of the point cloud. In addition, the residual blocks of the ICP algorithm's least squares are weighted by semantic confidence functions to better estimate the exact pose. Compared to LOAM, the proposed method improves accuracy by an average of 4 cm per 100 m. Experimental results illustrate the superiority of the proposed method and indicate that the fusion of semantic features robustly improves the precision of LO.

1. Introduction

Simultaneous localization and mapping (SLAM) technology is a critical component of autonomous navigation for robots and vehicles. With rapid advancements in recent years, SLAM systems have come to rely predominantly on cameras and Lidar as their primary sensors.
Therefore, current research directions can be broadly categorized into visual SLAM [1,2], Lidar-based SLAM [3,4], and their integration [5,6]. In addition, some SLAM systems incorporate auxiliary sensors such as IMUs and GPS to enhance robustness, which is particularly crucial in engineering applications. Visual SLAM relies on camera sensors to capture high-resolution images, providing rich scene information. However, it is sensitive to changes in lighting conditions and scene textures. In scenarios with weak textures and low lighting, visual SLAM may suffer from significant trajectory drift, rendering it unable to perform reliable localization and mapping. Moreover, monocular cameras introduce scale uncertainty. In contrast, Lidar-based SLAM employs Lidar sensors, which directly measure distances in the surrounding environment and generate precise 3D point cloud maps. Lidar operates by emitting laser beams and measuring the time they take to return, thereby determining the distances to objects. Its ability to provide high-quality distance and position information reduces cumulative errors, which has made Lidar SLAM widely adopted in applications such as autonomous driving, robot navigation, and drones. For example, Cartographer [7], developed by Google, is a notable system for robot navigation and map construction that shows significant progress in both 2D and 3D Lidar-based SLAM. LOAM, proposed by Zhang et al. [8] at CMU, is a 3D Lidar odometry and mapping algorithm known for its sparse mapping approach; it primarily matches edge and planar features for mapping. The algorithm comprises two threads: an odometry thread running at high frequency, and a mapping thread that matches the current point cloud with the map to provide lower-frequency, more precise pose estimates. The algorithm is straightforward, highly efficient, and, owing to its scalability, serves as the foundation for various LOAM-based extensions [9,10,11].
As part of the Lidar-based SLAM system, Lidar odometry (LO) is concerned with the accuracy of the pose trajectory obtained through feature matching, which is also the primary focus of our work. The key to an LO algorithm lies in feature matching: high-quality feature matching determines the precision of pose estimation. To ensure that the LO algorithm can stably identify good features over an extended period, we use semantic information obtained from a semantic segmentation model to assist feature extraction and feature matching within the LO system. To address the problems of traditional ICP in LO, we upgrade the inference model (https://github.com/vincenzo0603/rangenet_lib_forTensorRT8XX, accessed on 9 September 2022) of the RangeNet++ framework to accelerate the acquisition of semantic labels. We propose a novel LO approach that fuses semantic and geometric features and introduces a semantic confidence score. The main contributions are summarized as follows:
  • A robust and accurate geometric feature selection strategy is proposed, which incorporates semantic information to select plane points and corner points based on curvature.
  • To improve the matching accuracy of points, the semantic confidence score in the residual block of least squares is adopted in the proposed semantic feature-based ICP algorithm.
Section 2 introduces and discusses related work, including an overview of the core semantic segmentation algorithm and point cloud registration algorithm. Section 3 introduces the proposed LO algorithm framework. In Section 4, we elaborate on the methods and details of our framework implementation. Section 5 presents a qualitative and quantitative evaluation of our algorithm, followed by an analysis. Section 6 discusses the strengths and weaknesses of our proposed approach, exploring the future directions of SLAM and our research. Finally, Section 7 summarizes the novelty and advantages of our proposed method.

2. Related Work

2.1. Lidar Odometry Algorithm

With the capability of directly acquiring spatial distance information and working at night, 3D Lidar provides significant value and prospects for applications in earth observation missions. It is therefore crucial to increase the precision of LO and reduce its accumulated drift error. Conventional LO methods generally use geometric feature matching to solve for the six-degree-of-freedom pose in three-dimensional space. Planar and edge points are generally extracted from the point cloud as geometric features [12,13,14]; geometric features such as lines and cylinders can also be extracted in special environments [15]. Compared to cameras, Lidar produces a sparse, unstructured point cloud that lacks texture features but accurately characterizes 3D spatial structure. With the development of deep learning, much inspiring work has been achieved in the semantic segmentation of images and Lidar point clouds [16,17,18], and LO algorithms that incorporate semantic information are gradually emerging [19,20].

2.2. Semantic Segmentation Algorithm

A well-established approach to direct point cloud semantic segmentation, addressing the unstructured nature of point clouds with symmetric functions, is the PointNet model proposed by Charles et al. [21]. Specifically, PointNet learns point-wise features independently through multiple multi-layer perceptron (MLP) layers and extracts global features through a max-pooling layer. Subsequently, further semantic models such as Deep Sets [22], PointNet++ [23], and RangeNet++ [24] were proposed. RangeNet++ uses a spherical projection of the point cloud, converting 3D point cloud data into a 2D representation. Compared to 3D data, there are numerous established techniques for analyzing 2D image data. A fully convolutional semantic segmentation is performed on the 2D image, and the segmentation results are refined when the labeled 2D pixels are mapped back to the 3D point cloud. Results on the KITTI dataset show that RangeNet++ outperforms earlier semantic segmentation techniques. We use the RangeNet++ model for semantic segmentation of point clouds, obtaining semantic information that assists in extracting the high-quality features used in our subsequent feature matching.

2.3. Point Cloud Registration Method

Point cloud registration is a vital component of the LO algorithm, with the objective of computing transformations by minimizing a distance function between adjacent point clouds [25]. Commonly used point cloud registration methods in LO can be primarily categorized into three types: point-based methods, distribution-based methods [26,27], and feature-based methods [28,29,30]. The iterative closest point (ICP) algorithm [31] is perhaps the most widely applied point-based registration method.
The ICP algorithm is also one of the most classical point cloud alignment algorithms for Lidar odometry. It takes the nearest points as correspondences and optimizes the transformation parameters iteratively, stopping when the precision satisfies the specified requirements. However, ICP depends heavily on its initial parameters: as the number of iterations increases, the algorithm tends to settle into a locally optimal solution when the initial parameters are poorly set. To address these limitations, a variety of improvements to the ICP algorithm have been developed, such as point-to-line ICP [32], which employs the distance from source points to lines in the target point cloud as the error function. It reduces the number of iterations and the error but is less robust to large initial displacements. Furthermore, point-to-plane or plane-to-plane distances can also be used as error functions [33]. However, these countermeasures operate purely at the geometric feature level and can easily introduce errors, because they determine geometric features only from the curvature approximated from nearby points.

3. Overview of the System

The LO algorithm system we propose consists of three parts. The first part preprocesses the point cloud obtained by Lidar, including point cloud filtering and semantic segmentation; specific details are provided in Section 4.1 and Section 4.2. The utilization of semantic information in the point cloud is a key focus of this paper. The second part extracts feature points, including plane points and edge points, using curvature and semantic labels, as detailed in Section 4.3. Subsequently, feature matching that incorporates semantic information is employed to obtain a relatively rough pose (composed of a rotation matrix R and translation vector t describing the spatial relationship in three-dimensional space), as described in Section 4.4. Feature matching in this part is performed frame-to-frame, meaning that feature points of the current frame are matched only against those of the previous frame, which results in larger pose errors. The third part uses the pose obtained in the second part as prior information for one round of semantic label-assisted frame-to-submap feature matching. Unlike some classical LO algorithms, we propose an LO algorithm framework that integrates semantic features, as shown in Figure 1. It is noteworthy that we have designed different semantic-assisted methods for frame-to-frame and frame-to-submap feature matching despite their similar names (in frame-to-submap matching, principal component analysis is used to fit the plane during point-to-plane matching, rather than simply taking the three nearest points as the plane).
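The three parts chain together as outlined in the C++ sketch below; the function and type names are hypothetical and the bodies are stubs, so this is a structural illustration rather than the actual implementation.

```cpp
#include <vector>

// Hypothetical types for a minimal structural sketch of the three-part pipeline.
struct Point { float x, y, z, curvature; int label; };
using Cloud = std::vector<Point>;
struct Pose {                                    // rotation matrix R and translation t
    double R[9] = {1, 0, 0, 0, 1, 0, 0, 0, 1};
    double t[3] = {0, 0, 0};
};

// Part 1: point cloud filtering and semantic segmentation (Sections 4.1 and 4.2).
Cloud preprocess(const Cloud& raw) { return raw; /* stub */ }
// Part 2: curvature/semantic feature extraction and frame-to-frame matching (Sections 4.3 and 4.4).
Pose frameToFrame(const Cloud& features, const Cloud& prevFeatures) { return Pose{}; /* stub */ }
// Part 3: frame-to-submap matching seeded with the frame-to-frame pose.
Pose frameToSubmap(const Cloud& features, const Cloud& submap, const Pose& prior) { return prior; /* stub */ }

// One odometry update: coarse frame-to-frame pose first, then frame-to-submap refinement.
Pose processScan(const Cloud& raw, const Cloud& prevFeatures, const Cloud& submap) {
    Cloud features = preprocess(raw);
    Pose coarse = frameToFrame(features, prevFeatures);
    return frameToSubmap(features, submap, coarse);
}
```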

4. Methods

4.1. Point Cloud Filtering

During motion, the Lidar continually gathers information about its surroundings in the form of point clouds. We filter the point clouds beforehand to extract feature points efficiently and to decrease the burden of semantic segmentation. First, invalid points collected by the Lidar are removed. Second, points that are too close to the origin of the Lidar coordinate system are also filtered out, as they usually do not belong to the surrounding environment; they generally belong to the machine on which the Lidar is mounted, such as its surface. Therefore, we remove points within 0.15 m of the origin of the Lidar coordinate system.
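A minimal C++ sketch of this filtering step (variable names and structure are illustrative, not taken from the implementation) could look as follows:

```cpp
#include <cmath>
#include <vector>

struct Point { float x, y, z; };

// Filter a raw scan: drop invalid (non-finite) returns and points closer than
// min_range to the Lidar origin (0.15 m in this paper), which typically belong
// to the carrier platform rather than the surrounding environment.
std::vector<Point> filterCloud(const std::vector<Point>& raw, float min_range = 0.15f) {
    std::vector<Point> out;
    out.reserve(raw.size());
    for (const Point& p : raw) {
        if (!std::isfinite(p.x) || !std::isfinite(p.y) || !std::isfinite(p.z))
            continue;                                   // invalid return
        const float range = std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z);
        if (range < min_range)
            continue;                                   // too close to the sensor
        out.push_back(p);
    }
    return out;
}
```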

4.2. Point Cloud Semantic Segmentation

4.2.1. Spherical Projection of the Point Cloud onto a 2D Range Image

Due to the unordered nature of point clouds, deep learning networks cannot process them directly. At the same time, current 3D Lidar sensors (usually with 16 or 32 laser beams) capture a large amount of point cloud data in a single frame, and processing all of these points is computationally intensive. To deal with this, we obtain the pixel coordinates (u, v) of a 2D image from each point p_i = (x, y, z) by spherical projection, as follows:
$$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} \left[\, 1 - \left( \arcsin(z\, r^{-1}) + f_{\mathrm{up}} \right) f^{-1} \right] \times h \\[4pt] \tfrac{1}{2} \left[\, 1 - \arctan(y, x)\, \pi^{-1} \right] \times w \end{pmatrix}$$
where (u, v) are the image coordinates, (h, w) are the desired height and width of the range image, and f = f_up + f_down denotes the vertical field of view of the Lidar. The variable r is the Euclidean distance from the point to the laser source, i.e., r = ||p_i||_2. The projection yields a pair of image coordinates (u, v) for each point p_i. Using these indices, we extract the range r, the x, y, and z coordinates, and the reflectance of each p_i and store them in the image, creating a [5 × h × w] tensor. We also keep the relationship between each p_i and its (u, v) pair in a list, which facilitates aggregating and cleaning the semantic information in the resulting point cloud. The schematic illustration of projecting the point cloud onto a range image is depicted in Figure 2.
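For illustration, the projection can be implemented compactly as below. The sketch follows the convention of the RangeNet++ reference implementation, in which the pitch angle is shifted by the magnitude of the downward field of view before normalization; the parameter and function names are ours.

```cpp
#include <algorithm>
#include <cmath>

struct Pixel { int row, col; float range; };

// Spherical projection of a point (x, y, z) onto an [h x w] range image.
// fov_up and fov_down are the positive magnitudes (in radians) of the vertical
// field of view above and below the horizon, so f = fov_up + fov_down.
Pixel projectToRangeImage(float x, float y, float z,
                          int h, int w, float fov_up, float fov_down) {
    const float kPi = 3.14159265358979f;
    const float f = fov_up + fov_down;                 // total vertical FOV
    const float r = std::sqrt(x * x + y * y + z * z);  // range r = ||p_i||_2
    const float pitch = std::asin(z / r);              // elevation angle
    const float yaw = std::atan2(y, x);                // azimuth angle

    // Row index: 0 for the topmost beam, h - 1 for the lowest beam.
    const float u = (1.0f - (pitch + fov_down) / f) * static_cast<float>(h);
    // Column index: azimuth in [-pi, pi] mapped to [0, w).
    const float v = 0.5f * (1.0f - yaw / kPi) * static_cast<float>(w);

    Pixel px;
    px.row = std::clamp(static_cast<int>(u), 0, h - 1);
    px.col = std::clamp(static_cast<int>(v), 0, w - 1);
    px.range = r;
    return px;
}
```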

4.2.2. Fully Convolutional Semantic Segmentation

After the point cloud has been projected into a 2D range image, the image is fed into a fully convolutional network with an hourglass-shaped encoder–decoder architecture to perform semantic segmentation. The network outputs a volume of [n × h × w] logits, where n is the number of classes in our dataset, and is based on the Darknet backbone proposed in YOLOv3 [34]. During training, stochastic gradient descent and a weighted cross-entropy loss are used to optimize the network end to end. Unlike the horizontal direction, the vertical direction of the range image directly encodes the individual laser beams, so information lost there cannot be recovered. To avoid losing information in the vertical direction, the encoder downsamples only in the horizontal direction, and the decoder upsamples accordingly. The loss function L is defined as:
$$L = - \sum_{c=1}^{C} w_c \, y_c \log \hat{y}_c, \qquad \text{where } w_c = \frac{1}{\log(f_c + \epsilon)}$$
where w_c is a weight given by the inverse of the logarithm of the class frequency f_c, penalizing infrequent classes c more heavily, and ŷ_c denotes the softmax output of the last layer applied to the unbounded logits during inference.
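A small, self-contained sketch of how the inverse-log-frequency weights and the per-point weighted cross-entropy could be computed is given below; the smoothing constant ϵ is not specified in the text, so the value used here is only an assumption.

```cpp
#include <cmath>
#include <vector>

// Per-class weights w_c = 1 / log(f_c + eps), where f_c is the relative
// frequency of class c in the training set: rare classes receive larger
// weights. The default eps is an assumption (1.02 is the common ENet-style
// choice, which also keeps the logarithm positive for frequencies below 1).
std::vector<double> classWeights(const std::vector<double>& freq, double eps = 1.02) {
    std::vector<double> w(freq.size());
    for (std::size_t c = 0; c < freq.size(); ++c)
        w[c] = 1.0 / std::log(freq[c] + eps);
    return w;
}

// Weighted cross-entropy for one pixel/point with a one-hot ground truth:
// only the true class contributes, weighted by w[label].
double weightedCrossEntropy(int label, const std::vector<double>& softmax,
                            const std::vector<double>& w) {
    return -w[label] * std::log(softmax[label]);
}
```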

4.2.3. Point Cloud Reconstruction from 2D Range Image

After the fully convolutional semantic segmentation, a 2D range image with semantic labels is obtained. Because the range image has far fewer pixels than the original point cloud, multiple 3D points are mapped to the same pixel by the spherical projection; for example, projecting 120,000 3D points onto a [64 × 512] range image can represent only 32,768 of them. Mapping labels back from the range image alone would therefore leave many points without semantic labels. Instead, we generate a semantic label for each point in the point cloud by using the previously stored index relationship between each point p_i and its pixel in the range image. An end-to-end semantic segmentation of the point cloud is shown in Figure 3.
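Assuming the point-to-pixel index list from Section 4.2.1 has been kept, the label back-projection reduces to a simple lookup, as in the following sketch:

```cpp
#include <utility>
#include <vector>

// Copy per-pixel semantic labels back to every 3D point using the stored
// point-to-pixel index list from the spherical projection, so that all points
// sharing a pixel receive that pixel's predicted label.
std::vector<int> labelsToPointCloud(
        const std::vector<std::pair<int, int>>& point_to_pixel,  // (row, col) per point
        const std::vector<std::vector<int>>& image_labels) {     // [h][w] predicted labels
    std::vector<int> point_labels;
    point_labels.reserve(point_to_pixel.size());
    for (const auto& px : point_to_pixel)
        point_labels.push_back(image_labels[px.first][px.second]);
    return point_labels;
}
```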

4.3. Features Extraction

4.3.1. Geometric Features

We use P = {p_1, p_2, ..., p_n} to denote a point cloud. Each point is p_i = (x, y, z, c, l), where (x, y, z) are the 3D coordinates of the point, c denotes its curvature, and l is its semantic label. Based on the curvature of the point cloud, edge and plane features can be distinguished as the geometric features of a frame; they are denoted N_e and N_p, respectively. The curvature c of each point is computed as follows:
$$c = \frac{1}{n \cdot \left\| p_i \right\|} \left\| \sum_{j=1, j \neq i}^{n} \left( p_j - p_i \right) \right\|$$
where p_i denotes the target point and the p_j are neighboring points on the same laser beam. The points are then divided into edge points (curvature greater than α) and plane points (curvature less than α) using a threshold α. To obtain uniformly distributed features, each ring is divided into six parts. In each part, the N_e sharpest points are selected as edge features, and the N_p smoothest points that satisfy the semantic constraint are selected as plane features. Finally, the edge and plane feature points of the whole target frame are denoted N_e^all and N_p^all.
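A self-contained C++ sketch of the curvature computation and the edge/plane split is shown below; the window of five neighbours on each side follows Section 4.3.2, while the function names are ours.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { float x, y, z; };

// Curvature of point i from its neighbours on the same laser beam (five on
// each side here), following the equation above:
// c = || sum_j (p_j - p_i) || / (n * ||p_i||).
float curvatureAt(const std::vector<Point>& scanline, std::size_t i, int half_window = 5) {
    float sx = 0.0f, sy = 0.0f, sz = 0.0f;
    int n = 0;
    for (int k = -half_window; k <= half_window; ++k) {
        if (k == 0) continue;
        const long j = static_cast<long>(i) + k;
        if (j < 0 || j >= static_cast<long>(scanline.size())) continue;  // out of range
        sx += scanline[j].x - scanline[i].x;
        sy += scanline[j].y - scanline[i].y;
        sz += scanline[j].z - scanline[i].z;
        ++n;
    }
    const float norm_pi = std::sqrt(scanline[i].x * scanline[i].x +
                                    scanline[i].y * scanline[i].y +
                                    scanline[i].z * scanline[i].z);
    if (n == 0 || norm_pi == 0.0f) return 0.0f;
    return std::sqrt(sx * sx + sy * sy + sz * sz) / (static_cast<float>(n) * norm_pi);
}

// Split by the curvature threshold alpha: sharp points are edge candidates,
// smooth points are plane candidates (the semantic check comes afterwards).
bool isEdgeCandidate(float c, float alpha) { return c > alpha; }
```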

4.3.2. Semantic Constraint Plane Feature Extraction

Semantic labels provide a strong constraint when judging planar features from the curvature of the point cloud. On the same laser beam, we take the five points to the left and the five points to the right of a candidate point e_i to fit its curvature. At the same time, we count the number u of these 10 neighboring points whose semantic label matches that of e_i. The constraint indicator δ is then:
$$\delta = \begin{cases} 1, & u/10 \geq 0.5 \\ 0, & u/10 < 0.5 \end{cases}$$
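In code form, the constraint amounts to a label-majority check over the 10 neighbours, as in the following sketch (names are ours):

```cpp
#include <vector>

// Semantic constraint for a plane-feature candidate: among the 10 neighbours
// used to fit the curvature of e_i, count how many (u) share e_i's semantic
// label; delta = 1 when u / 10 >= 0.5, i.e. at least half of them agree, and
// only candidates with delta = 1 are kept as plane features.
int semanticPlaneConstraint(int candidate_label, const std::vector<int>& neighbour_labels) {
    if (neighbour_labels.empty()) return 0;
    int u = 0;
    for (int label : neighbour_labels)
        if (label == candidate_label) ++u;
    return (static_cast<double>(u) / neighbour_labels.size() >= 0.5) ? 1 : 0;
}
```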

4.3.3. Removing Unstable Features

A reliable geometric feature is obtained only if the feature point can be consistently observed. Two types of feature points are not consistently observed: (1) points on a scan plane that is approximately parallel to the laser scan line; and (2) points on a plane whose end is occluded by another plane, where the occluded area becomes observable once the Lidar moves to another viewpoint. To avoid selecting such unstable points, a surface patch is formed around each candidate point i. The candidate point i is selected only if its surface patch is not nearly parallel to the laser beam and no other point in the patch has already been selected. The overall feature extraction method is shown in Algorithm 1, where points with the same scanID are those obtained by scanning a full revolution in the horizontal direction of the same laser beam.
Algorithm 1: Feature Extraction
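Algorithm 1 is provided as an image in the original article; the following C++ sketch is a condensed reconstruction of the per-ring feature extraction based on the textual description in Section 4.3, not the original listing.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Points of one ring (same scanID) with precomputed curvature, the semantic
// constraint delta (Section 4.3.2) and a stability flag (Section 4.3.3).
struct RingPoint { float curvature; int delta; bool stable; };

// Split the ring into six parts; in each part keep the Np smoothest, stable
// points that satisfy the semantic constraint as plane features, and the Ne
// sharpest stable points as edge features.
void extractRingFeatures(std::vector<RingPoint> ring, float alpha, int Ne, int Np,
                         std::vector<RingPoint>& edges, std::vector<RingPoint>& planes) {
    if (ring.empty()) return;
    const std::size_t part = ring.size() / 6 + 1;   // six roughly equal parts
    for (std::size_t start = 0; start < ring.size(); start += part) {
        const std::size_t stop = std::min(start + part, ring.size());
        // Sort this part by curvature, smoothest first.
        std::sort(ring.begin() + start, ring.begin() + stop,
                  [](const RingPoint& a, const RingPoint& b) { return a.curvature < b.curvature; });
        // Np smoothest points below the threshold -> plane features.
        int np = 0;
        for (std::size_t i = start; i < stop && np < Np; ++i)
            if (ring[i].curvature < alpha && ring[i].stable && ring[i].delta == 1) {
                planes.push_back(ring[i]);
                ++np;
            }
        // Ne sharpest points above the threshold -> edge features.
        int ne = 0;
        for (long i = static_cast<long>(stop) - 1; i >= static_cast<long>(start) && ne < Ne; --i)
            if (ring[i].curvature > alpha && ring[i].stable) {
                edges.push_back(ring[i]);
                ++ne;
            }
    }
}
```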

4.4. Feature Matching

In Section 4.3, edge features N_e and planar features N_p are extracted from a frame of the point cloud using curvature and semantic information. Feature matching is based on a point cloud alignment algorithm. The iterative closest point (ICP) algorithm is the classical local point cloud alignment method: it takes the nearest points in the source and target point clouds as hypothetical correspondences and iterates a least-squares solution until the error function value falls below a set threshold, then outputs the corresponding R and t. It can be expressed as:
$$(R, t) = \underset{R,\, t}{\operatorname{arg\,min}} \sum_{i=1}^{n} w_i \left\| R p_i + t - q_i \right\|^2$$
where R ∈ SO(3) is an orthogonal rotation matrix and t ∈ R^3 is a translation vector. However, the ICP algorithm considers only a point-to-point distance error and does not exploit the structural information of the point cloud. To achieve high accuracy and avoid local optima when estimating the Lidar's ego-motion, the error function is instead constructed from point-to-line and point-to-plane distances.
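For reference, once correspondences and weights are fixed, the weighted point-to-point objective above has a closed-form solution in each ICP iteration. The C++/Eigen sketch below shows one such weighted alignment step using the standard Kabsch/Umeyama construction; it illustrates the point-to-point objective only and is not the point-to-line/point-to-plane solver used in our method.

```cpp
#include <Eigen/Dense>
#include <vector>

// One weighted point-to-point alignment step: given correspondences (p_i, q_i)
// with weights w_i, find R, t minimizing sum_i w_i * ||R p_i + t - q_i||^2.
void weightedAlignment(const std::vector<Eigen::Vector3d>& P,
                       const std::vector<Eigen::Vector3d>& Q,
                       const std::vector<double>& w,
                       Eigen::Matrix3d& R, Eigen::Vector3d& t) {
    // Weighted centroids of both point sets.
    double wsum = 0.0;
    Eigen::Vector3d pc = Eigen::Vector3d::Zero(), qc = Eigen::Vector3d::Zero();
    for (std::size_t i = 0; i < P.size(); ++i) {
        pc += w[i] * P[i];
        qc += w[i] * Q[i];
        wsum += w[i];
    }
    pc /= wsum;
    qc /= wsum;

    // Weighted cross-covariance of the centered sets.
    Eigen::Matrix3d H = Eigen::Matrix3d::Zero();
    for (std::size_t i = 0; i < P.size(); ++i)
        H += w[i] * (P[i] - pc) * (Q[i] - qc).transpose();

    // R from the SVD of H, with a reflection fix so that det(R) = +1.
    Eigen::JacobiSVD<Eigen::Matrix3d> svd(H, Eigen::ComputeFullU | Eigen::ComputeFullV);
    Eigen::Matrix3d D = Eigen::Matrix3d::Identity();
    if ((svd.matrixV() * svd.matrixU().transpose()).determinant() < 0.0)
        D(2, 2) = -1.0;
    R = svd.matrixV() * D * svd.matrixU().transpose();
    t = qc - R * pc;
}
```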
We denote the features of the previous frame as N̂_e, N̂_p, and the complete feature sets of the previous frame as N̂_e^all, N̂_p^all. For every strong feature point e_i or p_i (belonging to N_e or N_p), we search the nearest N points in N̂_e^all or N̂_p^all using a KD-tree and fit them to a feature line line_i or a feature plane plane_i by principal component analysis (PCA). Feature matching then solves the pose by minimizing the residual block formed by the distance d_{e_i} from e_i to line_i and the distance d_{p_i} from p_i to plane_i.
To evaluate the quality of the correspondence between a point and its fitted line or plane, we adaptively weight the constructed point-to-line and point-to-plane residuals with a semantic confidence score, computed as follows:
$$\lambda_l = \left( \sum_{k=1}^{N} \mathrm{Same}_l(k) \right) \cdot N^{-1}$$
where Same_l(k) equals 1 if the semantic label of the current feature point e_i or p_i matches that of the kth of the N nearest points found in the search, and 0 otherwise. The final residual is given by:
$$\mathrm{Residual}_i = \frac{1}{2} \left\{ \lambda_{el_i} \, d_{e_i} + \lambda_{pl_j} \, d_{p_i} \right\}$$
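A minimal sketch of how the semantic confidence and the weighted residual could be evaluated for one pair of matched features is given below (variable names are ours):

```cpp
#include <vector>

// Semantic confidence: fraction of the N nearest points (used to fit the line
// or plane) whose label matches that of the current feature point, i.e.
// lambda = (sum_k Same_l(k)) / N.
double semanticConfidence(int feature_label, const std::vector<int>& neighbour_labels) {
    if (neighbour_labels.empty()) return 0.0;
    int same = 0;
    for (int label : neighbour_labels)
        if (label == feature_label) ++same;
    return static_cast<double>(same) / neighbour_labels.size();
}

// Combined residual for a matched edge/plane pair, weighting each geometric
// distance by its semantic confidence as in the equation above.
double semanticResidual(double lambda_line, double d_edge,
                        double lambda_plane, double d_plane) {
    return 0.5 * (lambda_line * d_edge + lambda_plane * d_plane);
}
```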

5. Results

5.1. Dataset and Experimental Environment

We evaluate our approach on the KITTI odometry dataset [35], which provides 11 sequences (00–10) with ground-truth trajectories covering various complex outdoor scenes such as rural streets and highways. The experiments are conducted on the Ubuntu 20.04 operating system with the Robot Operating System (ROS), and all algorithms are implemented in C++.

5.2. Analysis of FtF Matching and FtS Matching

In Section 3, the LO algorithm system was described as consisting of three parts. The registration in the second part is performed between adjacent frames and is referred to here as frame-to-frame (FtF) matching; the matching in the third part is termed frame-to-submap (FtS) matching. We use two threads to calculate the odometry, one performing FtF matching and the other performing FtS matching. First, a less accurate relative pose is calculated by FtF matching; the more accurate FtS matching then uses the FtF poses as initial estimates to obtain the final LO.
In Figure 4a–d, the trajectory represented by the green line indicates the ground truth, whereas the trajectory represented by the red line is generated during FtF matching. In contrast to Figure 4a, the FtF matching incorporating semantic features is illustrated in Figure 4b. Within the light green rectangle, it can be observed that the pose calculated through the semantic FtF matching process exhibits a higher degree of alignment with the true pose compared to Figure 4a. This, in turn, provides a more favorable initial hypothesis for the FtS matching process. A similar set of comparisons is also depicted in Figure 4c,d. The trajectory obtained through executing FtS matching represents the final trajectory of LO, as shown in Figure 4e–h.

5.3. Results and Analysis

In order to verify the accuracy and robustness of the LO with semantic fusion, we compare it with traditional geometric feature-matching LO on the KITTI odometry benchmark. Table 1 reports the relative translation errors of the different LO frameworks; the values measure the average translational drift in meters per 100 m traveled (a simplified computation of this metric is sketched below), and the lowest value in each column marks the best-performing method for that sequence.
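The sketch below shows a simplified version of this metric for a single 100 m segment length using C++/Eigen; the official KITTI evaluation additionally averages over segment lengths from 100 m to 800 m, so this is an illustration rather than the evaluation code used for Table 1.

```cpp
#include <Eigen/Dense>
#include <vector>

// Simplified relative translational error: for every start frame, find the
// frame roughly 100 m further along the ground-truth path, compare the relative
// motion of the estimate against the ground truth, and report the translation
// error as a fraction of the segment length (0.01 corresponds to 1 m / 100 m).
double relativeTranslationError(const std::vector<Eigen::Matrix4d>& gt,
                                const std::vector<Eigen::Matrix4d>& est,
                                double segment = 100.0) {
    // Cumulative ground-truth path length.
    std::vector<double> dist(gt.size(), 0.0);
    for (std::size_t i = 1; i < gt.size(); ++i)
        dist[i] = dist[i - 1] +
                  (gt[i].block<3, 1>(0, 3) - gt[i - 1].block<3, 1>(0, 3)).norm();

    double err_sum = 0.0;
    int count = 0;
    for (std::size_t i = 0; i < gt.size(); ++i) {
        // First frame j whose path distance from frame i reaches the segment length.
        std::size_t j = i;
        while (j < gt.size() && dist[j] - dist[i] < segment) ++j;
        if (j >= gt.size()) break;
        const Eigen::Matrix4d gt_rel  = gt[i].inverse() * gt[j];
        const Eigen::Matrix4d est_rel = est[i].inverse() * est[j];
        const Eigen::Matrix4d error   = est_rel.inverse() * gt_rel;
        err_sum += error.block<3, 1>(0, 3).norm() / segment;
        ++count;
    }
    return count > 0 ? err_sum / count : 0.0;
}
```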
Lidar odometry and mapping (LOAM), a classical SLAM system without loop-closure detection; FLOAM, a variant of LOAM that excels in both accuracy and computational efficiency; and surfel-based mapping (SuMa) [36], which includes loop-closure detection, are considered for comparison. The proposed algorithm achieves higher accuracy than LOAM. SuMa performs very well in many KITTI scenarios; for example, in the country scenes of sequences 04 and 05, its translational error is 0.44 and 0.43 per 100 m, respectively. In the highway scene, however, its error rises to 1.70, whereas the proposed LO with fused semantic features achieves 1.27 and is therefore more robust.
In terms of feature extraction, the semantic constraint allows us to extract higher-quality feature points for the subsequent feature matching while effectively reducing the number of feature points and hence the computational cost. As shown in Table 2, compared to purely geometric feature extraction, semantic-assisted extraction reduces the number of extracted plane and edge features by at least 70% and 51%, respectively. While judging planar features by point cloud curvature, the semantic-assisted extraction algorithm simultaneously uses the semantic labels as an additional constraint.

6. Discussion

In terms of robustness, our proposed method exhibits superior performance in translational error accuracy, maintaining stability across various scenarios. From a scene perspective, the SuMa framework performs best in 6 out of 11 scenarios. However, SuMa’s performance varies significantly across different scenes, particularly with the highest error observed in sequence 01 (highway), suggesting its suitability for non-highway scenarios. Although our proposed method is generally more robust, the improvement in pose accuracy is not substantial across most scene environments.
Analyzing the scene types, all methods struggle in sequence 01 (highway) scenarios. Even the proposed method, which performs better than other approaches, exhibits a relatively high average translational error of up to 1.27 m. Several factors contribute to these challenges. First, highways typically lack many structures on both sides, reducing the extraction of feature points and impacting matching accuracy. Second, the higher vehicle speeds on highways result in greater point cloud distortion compared to slower-moving vehicles. Third, the presence of fast-moving dynamic objects on highways, such as other vehicles, contradicts the assumption of a static environment made by the SLAM models. If some of the extracted feature points are associated with dynamic objects, this can affect pose accuracy. In summary, highway scenes pose significant challenges due to their sparse geometry, fast-moving dynamics, and the presence of many dynamic objects. These factors contribute to the substantial errors observed in different algorithm frameworks in highway scenes. As indicated by our analysis, highways present a challenging environment, and our proposed method helps mitigate pose errors to a certain extent.
Lidar can also incorporate data from inertial measurement unit (IMU) sensors. IMUs are sensors used to measure an object’s linear acceleration and angular velocity, typically comprising accelerometers and gyroscopes. By leveraging the real-time data from IMUs alongside the precision of Lidar distance measurements, the fusion of these two sensor types significantly enhances the accuracy of pose estimation, especially in dynamic or complex environments. In addition to IMUs, the Global Positioning System (GPS) is another common supplementary sensor, particularly in outdoor scenarios. GPS provides global positioning information, aiding in global localization and map calibration. In some situations where SLAM systems might lose sensor data (e.g., Lidar or camera failure), GPS signals remain accessible. In such cases, GPS can offer an emergency means of localization to maintain the positioning of robots or devices.
In the future, we are interested in researching semantic-assisted multi-sensor fusion odometry, such as a semantic visual-Lidar odometry (SVLO) in which semantic features are fused. After camera–Lidar joint calibration, a relationship between pixels and point clouds can be established through the projection parameters (intrinsic and extrinsic), allowing semantic labels to be acquired simultaneously for both pixels and points through image semantic segmentation. A semantic-assisted visual odometry built in this way could then provide prior poses for the semantic-fusion Lidar odometry proposed in this paper. Additionally, we will explore integrating IMU sensors.

7. Conclusions

In this paper, we propose a new LO method that fuses semantic features. The method uses semantic information to improve the selection of high-quality geometric features: compared to purely geometric feature extraction, semantic-assisted extraction reduces the number of extracted plane and edge features by at least 70% and 51%, respectively. A semantic confidence score is introduced into the residual block of the least squares in the feature-matching algorithm. For environments with few scene features, the method mitigates the large drift of conventional LO and significantly improves odometry accuracy; compared to LOAM, accuracy improves by an average of 4 cm per 100 m. It also provides a modest improvement in feature-rich scenes. Experimental results show that the proposed semantic feature fusion approach can significantly improve accuracy and robustness compared to traditional LO approaches.

Author Contributions

Conceptualization, Q.C., Z.F., X.X. and S.C.; methodology, Q.C. and Z.F.; software, Y.L.; resources, H.P. and Z.D.; data curation, Z.H. and N.H.; writing—original draft preparation, Q.C.; writing—review and editing, Q.C. and Y.L.; supervision, Z.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Guangzhou Fundamental Application Research Project (2023A04J0341), National Natural Science Foundation of China (62301169).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the authors. Due to privacy considerations for subsequent research, the data are not publicly available.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Mur-Artal, R.; Montiel, J.M.M.; Tardós, J.D. ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Trans. Robot. 2015, 31, 1147–1163. [Google Scholar] [CrossRef]
  2. Mur-Artal, R.; Tardós, J.D. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras. IEEE Trans. Robot. 2017, 33, 1255–1262. [Google Scholar] [CrossRef]
  3. Qin, C.; Ye, H.; Pranata, C.E.; Han, J.; Zhang, S.; Liu, M. LINS: A Lidar-Inertial State Estimator for Robust and Efficient Navigation. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 8899–8906. [Google Scholar]
  4. Shan, T.; Englot, B.; Meyers, D.; Wang, W.; Ratti, C.; Rus, D. LIO-SAM: Tightly-coupled Lidar Inertial Odometry via Smoothing and Mapping. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 5135–5142. [Google Scholar] [CrossRef]
  5. Zuo, X.; Yang, Y.; Geneva, P.; Lv, J.; Liu, Y.; Huang, G.; Pollefeys, M. LIC-Fusion 2.0: LiDAR-Inertial-Camera Odometry with Sliding-Window Plane-Feature Tracking. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 5112–5119. [Google Scholar] [CrossRef]
  6. Shan, T.; Englot, B.; Ratti, C.; Daniela, R. LVI-SAM: Tightly-coupled Lidar-Visual-Inertial Odometry via Smoothing and Mapping. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 5692–5698. [Google Scholar]
  7. Hess, W.; Kohler, D.; Rapp, H.; Andor, D. Real-time loop closure in 2D LIDAR SLAM. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 1271–1278. [Google Scholar] [CrossRef]
  8. Zhang, J.; Singh, S. LOAM: Lidar odometry and mapping in real-time. In Proceedings of the Robotics: Science and Systems, Berkeley, CA, USA, 12–16 July 2014; Volume 2, pp. 1–9. [Google Scholar]
  9. Wang, H.; Wang, C.; Xie, L. Intensity Scan Context: Coding Intensity and Geometry Relations for Loop Closure Detection. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 2095–2101. [Google Scholar] [CrossRef]
  10. Wang, W.; Liu, J.; Wang, C.; Luo, B.; Zhang, C. DV-LOAM: Direct visual lidar odometry and mapping. Remote Sens. 2021, 13, 3340. [Google Scholar] [CrossRef]
  11. Wang, H.; Wang, C.; Chen, C.; Xie, L. F-LOAM: Fast LiDAR Odometry and Mapping. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021. [Google Scholar]
  12. Qian, J.; Mai, X.; Yuwen, X. Real-time Power Line Safety Distance Detection System Based on LOAM Slam. In Proceedings of the 2018 Chinese Automation Congress (CAC), Xi’an, China, 30 November 2018–2 December 2018; pp. 3204–3208. [Google Scholar] [CrossRef]
  13. Shan, T.; Englot, B. LeGO-LOAM: Lightweight and Ground-Optimized Lidar Odometry and Mapping on Variable Terrain. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 4758–4765. [Google Scholar] [CrossRef]
  14. Wei, X.; Lv, J.; Sun, J.; Pu, S. Ground-SLAM: Ground Constrained LiDAR SLAM for Structured Multi-Floor Environments. arXiv 2021, arXiv:2103.03713. [Google Scholar]
  15. Zhou, L.; Huang, G.; Mao, Y.; Yu, J.; Wang, S.; Kaess, M. PLC-LiSLAM: LiDAR SLAM with Planes, Lines and Cylinders. IEEE Robot. Autom. Lett. 2022, 7, 7163–7170. [Google Scholar] [CrossRef]
  16. Lawin, F.J.; Danelljan, M.; Tosteberg, P.; Bhat, G.; Khan, F.S.; Felsberg, M. Deep Projective 3D Semantic Segmentation. In Proceedings of the Computer Analysis of Images and Patterns: 17th International Conference, CAIP 2017, Ystad, Sweden, 22–24 August 2017; Proceedings, Part I 17. Springer: Berlin, Germany, 2017; pp. 95–107. [Google Scholar]
  17. Wu, B.; Wan, A.; Yue, X.; Keutzer, K. SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 1887–1893. [Google Scholar] [CrossRef]
  18. Meng, H.Y.; Gao, L.; Lai, Y.K.; Manocha, D. VV-Net: Voxel VAE Net With Group Convolutions for Point Cloud Segmentation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8499–8507. [Google Scholar] [CrossRef]
  19. Du, S.; Li, Y.; Li, X.; Wu, M. LiDAR odometry and mapping based on semantic information for outdoor environment. Remote Sens. 2021, 13, 2864. [Google Scholar] [CrossRef]
  20. Dong, N.; Chi, R.; Zhang, W. LiDAR Odometry and Mapping Based on Semantic Information for Maize Field. Agronomy 2022, 12, 3107. [Google Scholar] [CrossRef]
  21. Charles, R.Q.; Su, H.; Kaichun, M.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 77–85. [Google Scholar] [CrossRef]
  22. Zaheer, M.; Kottur, S.; Ravanbakhsh, S.; Poczos, B.; Salakhutdinov, R.R.; Smola, A.J. Deep Sets. In Proceedings of the Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
  23. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. arXiv 2017, arXiv:1706.02413. [Google Scholar]
  24. Chen, X.; Milioto, A.; Palazzolo, E.; Giguère, P.; Behley, J.; Stachniss, C. SuMa++: Efficient LiDAR-based Semantic SLAM. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019. [Google Scholar]
  25. Huang, X.; Mei, G.; Zhang, J.; Abbas, R. A comprehensive survey on point cloud registration. arXiv 2021, arXiv:2103.02690. [Google Scholar]
  26. Biber, P.; Straßer, W. The normal distributions transform: A new approach to laser scan matching. In Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No. 03CH37453), Las Vegas, NV, USA, 27–31 October 2003; Volume 3, pp. 2743–2748. [Google Scholar]
  27. Gupta, H.; Andreasson, H.; Magnusson, M.; Julier, S.; Lilienthal, A.J. Revisiting Distribution-Based Registration Methods. In Proceedings of the IEEE 2023 European Conference on Mobile Robots (ECMR), Coimbra, Portugal, 4–7 September 2023; pp. 1–6. [Google Scholar]
  28. Rusu, R.B.; Blodow, N.; Beetz, M. Fast point feature histograms (FPFH) for 3D registration. In Proceedings of the 2009 IEEE International Conference on Robotics and Automation, Kobe, Japan, 12–17 May 2009; pp. 3212–3217. [Google Scholar]
  29. Bazin, J.C.; Seo, Y.; Pollefeys, M. Globally optimal consensus set maximization through rotation search. In Proceedings of the Computer Vision–ACCV 2012: 11th Asian Conference on Computer Vision, Daejeon, Korea, 5–9 November 2012; Revised Selected Papers, Part II 11. Springer: Berlin, Germany, 2013; pp. 539–551. [Google Scholar]
  30. Yang, Z.; Wang, X.; Hou, J. A 4PCS coarse registration algorithm based on ISS feature points. In Proceedings of the IEEE 2021 40th Chinese Control Conference (CCC), Shanghai, China, 26–28 July 2021; pp. 7371–7375. [Google Scholar]
  31. Besl, P.; McKay, N.D. A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 1992, 14, 239–256. [Google Scholar] [CrossRef]
  32. Censi, A. An ICP variant using a point-to-line metric. In Proceedings of the 2008 IEEE International Conference on Robotics and Automation, Pasadena, CA, USA, 19–23 May 2008; pp. 19–25. [Google Scholar] [CrossRef]
  33. Low, K.L. Linear Least-Squares Optimization for Point-to-Plane ICP Surface Registration; University of North Carolina: Chapel Hill, NC, USA, 2004; Volume 4, pp. 1–3. [Google Scholar]
  34. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  35. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361. [Google Scholar] [CrossRef]
  36. Behley, J.; Stachniss, C. Efficient Surfel-Based SLAM using 3D Laser Range Data in Urban Environments. In Proceedings of the Robotics: Science and Systems (RSS), Pittsburgh, PA, USA, 26–30 June 2018. [Google Scholar]
Figure 1. Flowchart of the LO algorithm framework.
Figure 2. Image (A) represents the planar view of the scan-generated point cloud, while Image (B) is a three-dimensional view.
Figure 3. The visualization of the raw point cloud collected by Lidar is shown in image (a), visualized using the pcd-view tool from the PCL library. Its semantically segmented image is depicted in (c), where distinct colors denote different segmentation categories. Images (b), (d) are local zooms of (a) and (c), respectively.
Figure 4. (a–d) represent trajectories consisting of initial hypothetical poses obtained from frame-to-frame (FtF) matching; the red trajectory is calculated by FtF matching, while the green trajectory represents the ground truth. (e–h) are the final LO trajectories compared with the ground truth.
Table 1. Comparison of the translation error of different Lidar odometry/SLAM methods on the KITTI odometry benchmark.
Methods | 00 Urban | 01 Highway | 02 Urban | 03 Country | 04 Country | 05 Country | 06 Urban | 07 Urban | 08 Urban | 09 Urban | 10 Country | Average
LOAM | 0.78 | 1.43 | 0.92 | 0.86 | 0.71 | 0.57 | 0.65 | 0.63 | 1.12 | 0.77 | 0.79 | 0.84
FLOAM | 0.81 | 1.34 | 0.82 | 0.80 | 0.76 | 0.57 | 0.61 | 0.61 | 1.14 | 0.76 | 0.78 | 0.82
SuMa | 0.68 | 1.70 | 1.20 | 0.74 | 0.44 | 0.43 | 0.54 | 0.74 | 1.20 | 0.62 | 0.72 | 0.82
Proposed method | 0.71 | 1.27 | 0.95 | 0.75 | 0.69 | 0.55 | 0.67 | 0.59 | 1.16 | 0.74 | 0.71 | 0.80
Table 2. Comparison of the number of feature points between geometric feature extraction (GFE) and semantic-assisted extraction (SAE) in sequences 1–3.
Point Cloud Size | GFE Plane Features | GFE Edge Features | SAE Plane Features | SAE Edge Features
122,471 | 4226 | 832 | 1635 | 436
107,637 | 4135 | 765 | 1434 | 433
113,722 | 4374 | 803 | 1335 | 399
122,336 | 5047 | 804 | 1168 | 268
121,301 | 4737 | 789 | 1373 | 439
117,365 | 4856 | 857 | 1469 | 393
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Cao, Q.; Liao, Y.; Fu, Z.; Peng, H.; Ding, Z.; Huang, Z.; Huang, N.; Xiong, X.; Cai, S. An Iterative Closest Point Method for Lidar Odometry with Fused Semantic Features. Appl. Sci. 2023, 13, 12741. https://doi.org/10.3390/app132312741
