1. Introduction
Map construction has been a primary research focus in the area of autonomous navigation for outdoor mobile robots [1]. Map construction approaches are mainly categorized into visual SLAM [2] and LiDAR SLAM [3]. The latter has gradually become the mainstream method for outdoor environment map construction because LiDAR sensors offer a long detection range, high data accuracy, and immunity to changes in illumination [4,5]. A LiDAR SLAM method is typically divided into two parts: the front end and the back end [6]. The front end is primarily responsible for LiDAR data processing and point cloud feature extraction, including preprocessing, filtering, feature point extraction, and so on; its purpose is to improve the accuracy of the LiDAR odometry and the stability of the map. ICP (iterative closest point) [7] and NDT (normal distributions transform) [8] are two widely used front-end methods for aligning point clouds. The back end is primarily in charge of optimizing the poses and the map, including pose optimization, map updating, and loop closure detection; its purpose is to increase both the consistency and the accuracy of the localization and the map. Commonly used back-end techniques include pose graph optimization libraries such as GTSAM [9] and Ceres Solver [10], while loop closure detection typically relies on a radius nearest neighbor (RNN) search [11].
Taking the mainstream, state-of-the-art LIO-SAM algorithm as the basis of this research, there are still aspects of the front end and the back end that need improvement and refinement: (i) the front end should extract feature points that are as stable and reliable as possible, so as to ensure the alignment accuracy of the LiDAR odometry and the stability of the map, and (ii) the back end should detect loop closures in the point cloud map as quickly and accurately as possible, so as to avoid problems such as LiDAR odometry drift in large-scale scenarios.
As shown in Figure 1a, when the feature point cloud is extracted directly by curvature calculation, the result is very rough and cluttered. Moreover, because outdoor scenes are large in scale, traditional loop closure detection methods based on odometry-measured distances (e.g., the RNN algorithm) are prone to cumulative drift and loop closure failure. As shown in Figure 1b, when the RNN algorithm is used for loop closure detection in a large-scale scene, the odometry drifts and the loop closure fails. In complex outdoor environments, relying on curvature alone for point cloud feature extraction is clearly too simplistic. The point cloud data acquired by LiDAR are also highly redundant, containing unnecessary maximum-range points and a large number of unreliable outlier points, and extracting and aligning features from such points can easily degrade the odometry accuracy. Together, these factors lead to the low accuracy and poor reliability of LiDAR SLAM mapping results outdoors, which causes unnecessary trouble for subsequent robot re-localization and path planning. Therefore, there is an urgent need to optimize and improve the front-end point cloud feature extraction module and the back-end loop closure detection module of existing LiDAR SLAM systems, to further improve the accuracy and reliability of the map construction results, and to provide a guarantee for the subsequent autonomous navigation of mobile robots.
The LIO-SAM algorithm determines the LiDAR pose by aligning two neighboring frames of the point cloud, a method known as scan-to-scan. LiDAR odometry based on scan-to-scan matching is less reliable because too little information is used, which lowers the pose accuracy. Therefore, we refined the front-end feature extraction module of the LIO-SAM algorithm: we combined the distance from each laser point to the LiDAR center with the horizontal angle to extract ground points, encoded each frame of the LiDAR point cloud as a range image, and clustered the non-ground point cloud. Points that fail to cluster (outlier points) cannot all be discarded; otherwise, the constructed map would be too sparse and the environment would be represented by a single feature type. We therefore downsampled the outlier points, retaining one column out of every five in the point cloud, so that they could still take part in the subsequent feature extraction. Next, planar points were extracted from the ground points and corner points from the clustered and segmented non-ground points using curvature; after clustering segmentation, the non-ground points comprised both the successfully clustered point cloud and the downsampled outlier points. Because of LiDAR measurement noise and other factors, the measured value is not always equal to the true value; a small error is introduced each time the LiDAR odometry is solved, and this error steadily accumulates over time, a phenomenon known as cumulative drift. Cumulative drift can make the localization of the LiDAR SLAM system deviate from the true value, rendering the generated maps unusable, so the SLAM system needs loop closure detection to correct it. To address this problem, we adopt the Scan Context scene recognition method for loop closure detection. This method enhances the real-time mapping performance of the LIO-SAM back end and enables the algorithm to detect loop closures even when the odometry drift is large.
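To make this front-end pipeline concrete, the following is a minimal sketch of the range image projection and slope-based ground point extraction described above. It assumes a 16-line mechanical LiDAR, and the resolutions and angle threshold (N_SCAN, H_RES, GROUND_ANGLE_TH) are illustrative assumptions rather than the exact parameters of our implementation.

```python
import numpy as np

# Hypothetical sensor parameters (not the exact values used in our system).
N_SCAN = 16                 # number of laser rings
H_RES = 1800                # horizontal samples per revolution (0.2 deg)
VERT_FOV = (-15.0, 15.0)    # vertical field of view in degrees
GROUND_ANGLE_TH = 10.0      # max slope (deg) between adjacent rings for ground

def project_to_range_image(points):
    """Project an (N, 3) point cloud into a range image of shape (N_SCAN, H_RES).

    Each pixel stores the range of one laser return; index_img keeps the original
    point id so labels assigned on the image can be mapped back to 3D points.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.sqrt(x**2 + y**2 + z**2)
    vert = np.degrees(np.arctan2(z, np.hypot(x, y)))
    row = np.round((vert - VERT_FOV[0]) / (VERT_FOV[1] - VERT_FOV[0]) * (N_SCAN - 1)).astype(int)
    col = np.round((np.degrees(np.arctan2(y, x)) + 180.0) / 360.0 * (H_RES - 1)).astype(int)

    valid = (row >= 0) & (row < N_SCAN) & (rng > 0.1)
    range_img = np.full((N_SCAN, H_RES), -1.0)
    index_img = np.full((N_SCAN, H_RES), -1, dtype=int)
    range_img[row[valid], col[valid]] = rng[valid]
    index_img[row[valid], col[valid]] = np.nonzero(valid)[0]
    return range_img, index_img

def extract_ground(points, index_img, n_ground_rings=7):
    """Mark ground points by thresholding the slope between vertically adjacent pixels."""
    ground = np.zeros(len(points), dtype=bool)
    for c in range(H_RES):
        for r in range(n_ground_rings):          # only the lower rings can see the ground
            i, j = index_img[r, c], index_img[r + 1, c]
            if i < 0 or j < 0:
                continue
            d = points[j] - points[i]
            slope = np.degrees(np.arctan2(abs(d[2]), np.hypot(d[0], d[1])))
            if slope < GROUND_ANGLE_TH:
                ground[i] = ground[j] = True
    return ground
```

Restricting the slope test to the lower rings keeps the cost of ground extraction linear in the number of range image pixels, which is what makes the dimensionality reduction of the projection worthwhile.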
The overall framework of the algorithm proposed in this paper, built on the basis of the LIO-SAM algorithm, is shown in Figure 2. The green modules are the improvements proposed in this paper, including front-end point cloud clustering segmentation, feature extraction, and back-end loop closure detection using the Scan Context algorithm. Point cloud clustering segmentation includes ground point extraction, non-ground point clustering, and so on. The main innovations of this study are as follows:
- (1)
We projected the original LiDAR point cloud as a range image and performed ground point extraction on the range image. This method effectively reduced the dimensionality of the point cloud data, improved the efficiency of point cloud processing, and could accurately identify the ground points.
- (2)
We used the BFS algorithm to cluster and segment the non-ground points, finding the points with the same label by traversing layer by layer, which improved the clustering speed of the point cloud, avoided the large computation brought by a global search, and established the foundation for the lightweight front end of the algorithm (a minimal sketch of this clustering step is given after this list).
- (3)
We introduce the Scan Context loop closure detection method in the back end of the LIO-SAM algorithm, which effectively improves the accuracy of loop closure detection and enhances the robustness of loop closure detection in the back end of the algorithm.
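For innovation (2) above, the following is a minimal sketch of BFS-based clustering on the range image. Two neighboring pixels are merged when the angle formed by their two laser beams is large, a criterion commonly used in range image segmentation; the thresholds (SEG_ANGLE_TH, MIN_CLUSTER_SIZE) and the angular resolutions are illustrative assumptions, not the exact values of our implementation.

```python
from collections import deque
import numpy as np

# Hypothetical clustering parameters; the thresholds used in the paper may differ.
SEG_ANGLE_TH = np.radians(60.0)   # minimum beam angle for merging two neighbouring pixels
MIN_CLUSTER_SIZE = 30             # clusters smaller than this are treated as outliers
ANG_RES_X = np.radians(0.2)       # horizontal angular resolution
ANG_RES_Y = np.radians(2.0)       # vertical angular resolution

def bfs_cluster(range_img):
    """Label the pixels of a range image with a breadth-first search.

    Returns a label image: 0 means invalid/unvisited, -1 marks outlier points,
    and positive integers identify clusters.
    """
    n_rows, n_cols = range_img.shape
    labels = np.zeros((n_rows, n_cols), dtype=int)
    next_label = 1
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if labels[r0, c0] != 0 or range_img[r0, c0] <= 0:
                continue
            queue, members = deque([(r0, c0)]), [(r0, c0)]
            labels[r0, c0] = next_label
            while queue:                          # layer-by-layer traversal
                r, c = queue.popleft()
                for dr, dc, alpha in ((1, 0, ANG_RES_Y), (-1, 0, ANG_RES_Y),
                                      (0, 1, ANG_RES_X), (0, -1, ANG_RES_X)):
                    rr, cc = r + dr, (c + dc) % n_cols   # wrap around horizontally
                    if not (0 <= rr < n_rows) or labels[rr, cc] != 0 or range_img[rr, cc] <= 0:
                        continue
                    d1 = max(range_img[r, c], range_img[rr, cc])
                    d2 = min(range_img[r, c], range_img[rr, cc])
                    beta = np.arctan2(d2 * np.sin(alpha), d1 - d2 * np.cos(alpha))
                    if beta > SEG_ANGLE_TH:       # locally smooth surface: same cluster
                        labels[rr, cc] = next_label
                        queue.append((rr, cc))
                        members.append((rr, cc))
            if len(members) < MIN_CLUSTER_SIZE:
                for rc in members:                # too few points: mark as outliers
                    labels[rc] = -1
            else:
                next_label += 1
    return labels
```

Clusters that fall below MIN_CLUSTER_SIZE are treated as outlier points; as described earlier, these are not discarded but downsampled (e.g., keeping one column out of every five) before taking part in feature extraction.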
2. Related Work
LiDAR point cloud feature extraction methods have advanced significantly over the past few years. H. Hoppe et al. (1992) represented the curvature and orientation information of a point cloud by calculating the normal vector of each LiDAR point [12]. A. Golovinskiy et al. (2009) proposed a shape-based point cloud recognition method that used shape descriptors to represent objects in a point cloud and machine learning algorithms to classify it [13]. B. Jian et al. (2011) used a Gaussian mixture model to represent the point set, which was aligned and filtered based on the distance between the Gaussian models [14]. The above methods provide good research ideas for the point cloud feature extraction modules of subsequent LiDAR SLAM algorithms. Zhang J et al. (2014) published the LOAM algorithm with LiDAR as an odometry, which put LiDAR point cloud feature extraction to practical use within a LiDAR SLAM algorithm [15]. The LOAM algorithm extracts feature points, such as corner points and planar points, from LiDAR point cloud data using curvature and then aligns the LiDAR point clouds by creating constraints between the feature points to achieve map construction and localization. Furthermore, the LOAM algorithm compensates for the motion of each feature point by calculating the relative pose between the start and end moments of the LiDAR scan, which solves the problem of point cloud distortion caused by LiDAR motion. However, the LOAM algorithm lacks a loop closure detection module, which makes it prone to drift and increased error accumulation in large-scale scenes. Zhang J et al. (2015) proposed the V-LOAM algorithm to improve the robustness of LiDAR SLAM systems in the absence of visual features and in challenging motion scenarios [16]. The V-LOAM algorithm uses high-speed but low-accuracy visual odometry to estimate robot motion and then refines the motion estimation and point cloud alignment with LiDAR scan matching; it is highly robust under high-speed motion and illumination changes, but its performance for outdoor mapping is only moderate. It is worth mentioning that, in recent years, deep-learning-based LiDAR point cloud clustering and segmentation methods have developed significantly. The PointNet series of methods captures the local structure of the point cloud with a pointwise network that learns deep point set features for point cloud clustering and segmentation [17,18]. The PointNet method can segment point clouds quickly and efficiently, but it cannot adequately capture the structural information in ground point cloud data and is sensitive to noise. The RandLA-Net method can directly segment the point cloud point by point and, thanks to random point sampling, saves computational resources and memory [19]. The RandLA-Net method performs point cloud segmentation efficiently, but during ground point cloud segmentation, the local feature aggregation module it uses tends to lose part of the ground features, and it adapts poorly to sparse or dense point clouds.
Alongside the development of LiDAR point cloud feature extraction methods, SLAM back-end loop closure detection methods have likewise been widely studied and have progressed. G. Grisetti et al. (2010) proposed the GraphSLAM method, which uses a graph to represent the robot's trajectory and a map of the environment and estimates the pose of the loop closure by optimizing the structure and edges of the graph [20]. Latif Y et al. (2013) proposed a method using a probabilistic graphical model that allows loop closure detection across different time steps [21]. The method estimates the pose of the loop closure by minimizing the error between the loop closure constraints and the other constraints, thus improving the accuracy and robustness of the SLAM system. Shan Tixiao et al. (2018) formally integrated the loop closure detection module into the LiDAR SLAM back end and proposed the LeGO-LOAM algorithm [22]. The LeGO-LOAM algorithm enhances the LOAM algorithm framework by including a loop closure detection and optimization module based on the RNN algorithm.
Although the LiDAR SLAM algorithm framework has improved, relying on 3D LiDAR alone for mapping and localization inevitably leads to mapping drift, localization failure, and other problems. To solve this problem, researchers introduced the LiDAR-inertial SLAM framework, which exploits the complementary characteristics of LiDAR and IMU sensing [23]. Ye Haoyang et al. (2019) proposed the LIO-mapping algorithm, which tightly couples LiDAR and IMU [24]. The algorithm minimizes the cost functions from the LiDAR and the IMU simultaneously to compensate for the insufficient mapping and localization of a single LiDAR, and it provides more accurate state estimation for the LiDAR odometry. Shibo Zhao et al. (2019) proposed the LIOM algorithm [25], which uses a CNN to remove dynamic targets from each LiDAR scan, a Kalman filter to fuse the LiDAR and IMU data, and a high-frequency output for LiDAR pose estimation. Shan Tixiao et al. (2020), building on research into the LeGO-LOAM algorithm, proposed the LIO-SAM algorithm, which is more advanced and widely used today [26]. The LIO-SAM algorithm fuses multi-sensor data such as IMU and LiDAR measurements; builds observation models such as absolute measurements, relative measurements, and loop closure detection into factors; and integrates these factors into a complete factor graph for SLAM back-end optimization, which greatly improves LiDAR SLAM mapping accuracy and robustness compared with other algorithms. However, the front-end point cloud feature extraction method of the LIO-SAM algorithm is relatively simplistic and coarse, which costs the algorithm its lightweight character and leaves room for improvement in mapping accuracy. In addition, the LIO-SAM algorithm inherits the loop closure detection module of the LeGO-LOAM algorithm, so its LiDAR odometry is prone to cumulative drift in large-scale scenarios.
To address the problems of mainstream LiDAR SLAM algorithms, namely simplistic and unreliable point cloud feature extraction, time-consuming back-end loop closure detection, and easily drifting odometry, this paper proposes a LiDAR-inertial SLAM method using point cloud segmentation and Scan Context, built on the state-of-the-art LIO-SAM algorithm. First, we projected and encoded each frame of the LiDAR point cloud to obtain a point cloud range image. Then, in order to extract reliable point cloud features more accurately, we extracted ground points based on the range image. It is worth mentioning that this method extracts ground points better than the RANSAC algorithm in outdoor scenes [27,28]: outdoor scenes are large in scale, and the amount of point cloud data acquired by LiDAR is large, so ground segmentation with RANSAC is computationally inefficient, easily falls into local optima, is time-consuming, and rarely yields ideal segmentation results. Next, we used the BFS method to cluster and label the non-ground points [29] and downsampled the points that failed to cluster (i.e., the outlier points). By calculating the curvature, the point cloud clusters with the largest and smallest smoothness were selected from the point clouds under the different labels for subsequent point cloud map alignment, which reduced the number of point cloud matches and kept the algorithm lightweight. Finally, we improved the speed and capability of the algorithm's back-end loop closure detection through the Scan Context loop closure detection algorithm [30] and alleviated the odometry drift in large-scale scenarios. To validate the effectiveness of our proposed method, we tested it on several real datasets, such as the KITTI dataset, Walking, and Park, and the results showed that our method achieved better mapping accuracy and real-time performance than other state-of-the-art LiDAR SLAM algorithms.
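As an illustration of the back-end loop closure module, the following is a minimal sketch of the Scan Context descriptor and its rotation-invariant distance, following the formulation of the original Scan Context work [30]; the ring/sector sizes, maximum range, and candidate threshold are typical defaults and not necessarily the parameters used in our implementation.

```python
import numpy as np

# Typical Scan Context dimensions; treat these as defaults, not our exact configuration.
NUM_RING, NUM_SECTOR, MAX_RANGE = 20, 60, 80.0

def make_scan_context(points):
    """Encode one LiDAR scan as a NUM_RING x NUM_SECTOR matrix of maximum point heights."""
    sc = np.zeros((NUM_RING, NUM_SECTOR))
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.hypot(x, y)
    theta = np.mod(np.arctan2(y, x), 2 * np.pi)
    ring = np.minimum((rho / MAX_RANGE * NUM_RING).astype(int), NUM_RING - 1)
    sector = np.minimum((theta / (2 * np.pi) * NUM_SECTOR).astype(int), NUM_SECTOR - 1)
    keep = rho < MAX_RANGE
    for r, s, h in zip(ring[keep], sector[keep], z[keep]):
        sc[r, s] = max(sc[r, s], h)               # keep the highest point in each bin
    return sc

def scan_context_distance(sc1, sc2):
    """Rotation-invariant distance: try every column shift of sc2 and take the best
    mean column-wise cosine distance. A small value suggests a loop closure candidate."""
    best = 1.0
    for shift in range(NUM_SECTOR):
        shifted = np.roll(sc2, shift, axis=1)
        num = np.sum(sc1 * shifted, axis=0)
        den = np.linalg.norm(sc1, axis=0) * np.linalg.norm(shifted, axis=0)
        valid = den > 0
        if valid.any():
            best = min(best, 1.0 - np.mean(num[valid] / den[valid]))
    return best
```

Because the descriptor is compared after searching over column shifts, the match is insensitive to the yaw of the revisited viewpoint, which is why this kind of detector keeps working even when the odometry has already drifted.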
The research methods reviewed above provide strong support and a valuable reference for improving the LIO-SAM algorithm, but the problems related to the algorithm's front-end feature extraction and back-end loop closure are still not effectively solved. Therefore, building on the above methods, this paper proposes a LiDAR-inertial SLAM method based on point cloud segmentation and Scan Context, so as to provide an effective solution for constructing outdoor environment maps.
5. Experimental Results
In this section, we present an experimental validation of the map construction accuracy and the real-time performance of back-end mapping. We tested the accuracy of our algorithm on the KITTI dataset, which provides a ground truth, and compared it with other state-of-the-art algorithms. To ensure adequate validation, we also compared and evaluated our algorithm on recently collected real-world datasets such as Walking and Park. The experiments were run on a computer equipped with an Intel Core i7-10710U processor (Intel Corporation, Santa Clara, CA, USA), using the Melodic version of the Robot Operating System (ROS) under Ubuntu 18.04; the algorithms involved in this experiment were implemented in C++.
5.1. Dataset
We fully validated the algorithm's accuracy through extensive testing on the KITTI dataset [33]. The KITTI dataset was converted into a data packet containing the LiDAR point clouds, the IMU data, and the GPS absolute measurements. Additionally, we fully validated the algorithm's real-time performance on recently collected real-world datasets, such as Walking and Park, provided with the LIO-SAM algorithm.
5.2. Evaluation
To accurately measure the accuracy of the algorithms in this study, we used the EVO evaluation tool to compare the absolute trajectory error (ATE), which focuses on the global consistency of the algorithm outcomes. ATE compares the estimated poses output by the algorithm with the ground truth of the dataset frame by frame according to their timestamps and then calculates the positional difference between the estimate and the ground truth at each pair of corresponding timestamps. It is therefore an important measure of the global consistency of the trajectory and is well suited to evaluating the performance of a SLAM system and the accuracy of its trajectory.
Assuming that the trajectory pose estimated by the SLAM algorithm at moment $i$ is $P_i$ and the pose of the real trajectory at that moment is $Q_i$, the ATE at that moment is defined as follows:

$$F_i = Q_i^{-1} P_i$$

In order to better reflect the relative error, with the total number of poses $n$ and the time interval known, the root mean squared error (RMSE) is used to aggregate the errors of all frames into an overall value:

$$\mathrm{RMSE}(F_{1:n}) = \left( \frac{1}{n} \sum_{i=1}^{n} \left\| \operatorname{trans}(F_i) \right\|^2 \right)^{1/2}$$

where $F_i = Q_i^{-1} P_i$, and $\operatorname{trans}(\cdot)$ denotes that the translation portion is taken for the variable inside the parentheses.
In conclusion, we used the KITTI dataset to evaluate the absolute trajectory errors of the algorithms in this research.
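As a small worked example of the metric defined above, the following sketch computes the ATE RMSE from timestamp-associated positions (in practice, the EVO tool additionally aligns the two trajectories with a rigid-body transform before computing the error); the synthetic data in the usage example are purely illustrative.

```python
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """Root mean squared absolute trajectory error over timestamp-aligned positions.

    est_xyz, gt_xyz: (N, 3) arrays of estimated and ground truth positions that
    have already been associated frame by frame via their timestamps.
    """
    err = np.linalg.norm(est_xyz - gt_xyz, axis=1)   # per-frame translation error
    return np.sqrt(np.mean(err ** 2))

if __name__ == "__main__":
    # Synthetic example: a ground truth curve and an estimate with ~3 cm noise per frame.
    t = np.linspace(0.0, 10.0, 101)
    gt = np.stack([t, np.sin(t), np.zeros_like(t)], axis=1)
    est = gt + 0.03 * np.random.randn(*gt.shape)
    print(f"ATE RMSE: {ate_rmse(est, gt):.3f} m")
```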
5.3. Results
In order to evaluate the effectiveness of our method, we performed various tests of the proposed algorithm on the KITTI dataset, with a special focus on the sequences 0016 (No. 04), 0027 (No. 07), 0034 (No. 02), and 0042 (No. 01). Using these sequences, in the order of the KITTI dataset numbering, we performed a careful comparison with several state-of-the-art and typical methods. The results of the trajectory accuracy comparison are shown in Figure 14 and Figure 15.
As seen in Figure 14 and Figure 15, compared with the A-LOAM, LeGO-LOAM, and LIO-SAM algorithms, the trajectories estimated by our method matched the ground truth of the dataset better overall, and the fitting results were better. The LeGO-LOAM data are not plotted in Figure 14a because the LeGO-LOAM algorithm drifted during the KITTI-0042 sequence mapping process. The trajectories estimated by the four algorithms all deviated somewhat from the ground truth, but as can be seen in the locally enlarged views of the trajectory comparisons for sequence 0034 and sequence 0027, our method stayed closer to the ground truth trajectory and achieved a better pose estimation result both on straight segments and at corners. First, because the features extracted by our method in the front end were more accurate and reliable, the point cloud incurred less error during map matching, which reduced the mapping error. Second, the back end of our method had a robust loop closure detection capability that was not affected by the cumulative error of the LiDAR odometry and could further eliminate the cumulative error in the mapping process. As a result, our mapping results fit the ground truth more satisfactorily.
As can be seen in Figure 16, our algorithm obtained the smallest ATE.Max results on the KITTI-0042, KITTI-0034, and KITTI-0016 sequences, namely 36.54 m, 19.54 m, and 0.864 m, respectively. Comparing the color-coded error results against the ground truth shows that our algorithm fit the ground truth better overall and had a lower error level than the A-LOAM, LeGO-LOAM, and LIO-SAM algorithms, which directly illustrates that our trajectory was much closer to the ground truth and that its maximum error was smaller. The ATE error data are shown in Table 1, Table 2, Table 3 and Table 4.
It can be seen from Table 1, Table 2, Table 3 and Table 4 that the A-LOAM and LIO-SAM algorithms achieved better results on a small number of indicators. However, compared with the A-LOAM, LeGO-LOAM, and LIO-SAM algorithms, our method achieved better results on most of the ATE indicators. Our algorithm had obvious advantages in metrics such as ATE.Mean, ATE.RMSE, and ATE.SSE, which further validates that the trajectories predicted by our method were closer to the ground truth with less error. In some places, the local point cloud features were too sparse, which resulted in a larger ATE.Min value for our method; however, the ATE.Max and ATE.Mean values were smaller, which also shows that our method produced more consistent mapping results with higher accuracy.
Next, in order to effectively verify the loop closure detection effect of our method, we conducted a loop closure detection comparison experiment with the RNN algorithm, and the experimental results are shown in
Figure 17. For ease of observation, we rendered the two mapping results using different color schemes.
As can be seen in Figure 17a,b, the traditional RNN-based loop closure detection suffered several loop closure failures, whereas the loop closure detection results based on our method were better, and the desired loop closures were achieved. The main reason for this difference is that the traditional RNN-based loop closure detection method is susceptible to the cumulative error of the LiDAR odometry, whereas the Scan Context-based loop closure detection method is rotation invariant, which shields loop closure detection from that cumulative error. In addition, comparing Figure 17a and Figure 17b shows that the map constructed by our method was clearer, which indirectly verifies the map construction accuracy of our method: compared with the other algorithms, the method in this paper extracts reliable point cloud features at the front end and therefore constructs the map with high accuracy.
In order to fully validate the real-time performance of our method, in addition to the KITTI dataset, we also used the Walking and Park datasets, which were collected in real scenarios and provided with the LIO-SAM algorithm. The mapping results of our method on the Walking and Park datasets are shown in Figure 18, with locally enlarged views appended to the mapping results.
As can be seen in Figure 18, the mapping results of our method did not show large or obvious drift, and the enlarged local mapping results clearly show the environmental structure and details. In particular, Figure 18a shows clear local building structures when compared with the LOAM, LIOM, and LIO-SAM mapping results reported in the literature [23]. This also directly indicates that our method builds maps with high accuracy.
Table 5 shows the average running time of the four algorithms to register a LiDAR frame on the KITTI dataset and the Walking and Park datasets.
As shown in Table 5, our method's runtime decreased compared with the A-LOAM, LeGO-LOAM, and LIO-SAM algorithms, making it more suitable for deployment on low-power platforms. On the above datasets, the minimum time for our method to process a single-frame point cloud was 19.8 ms, and the maximum was 92.1 ms. The mapping time was reduced by an average of 69.57% compared with the A-LOAM algorithm, and by 11.38% even compared with the state-of-the-art LIO-SAM algorithm.
6. Conclusions
Building on the LIO-SAM algorithm, we proposed a map construction method suitable for outdoor environments. First, we realized ground point extraction based on a range image transformation, which reduces the dimensionality of the point cloud and improves the reliability of ground point extraction. Second, we traversed and clustered the non-ground point cloud with a BFS algorithm, which effectively avoids the increased computation of a global search and can significantly improve the clustering speed when the point cloud distribution is not uniform. Finally, we introduced the Scan Context loop closure detection method in the back end of the algorithm, which improves the accuracy of the algorithm's loop closure detection. We fully validated the overall mapping accuracy and real-time performance of the proposed method on a variety of datasets, including KITTI, Walking, and Park.
In future work, we will draw on deep learning methods to adaptively adjust the angle threshold used in the point cloud clustering segmentation process and will construct a new type of loop closure detection descriptor to further improve the robustness of the algorithm when mapping outdoor environments.