#### *3.1. Label Planar Feature Information*

The sensor carrier moves on the ground and the LIDAR is mounted horizontally, so the ground is observed by the downward-pointing beams. A rough but fast estimate of the ground plane can be obtained from the lower rows of the image matrix. Within this estimated plane, accurate ground points are marked by judging the angle of each point with respect to the ground.

$$a = \tan^{-1} \frac{dz}{\sqrt{(dx)^2 + (dy)^2}} = \tan^{-1}\frac{OP_2}{OP_1} \tag{1}$$

As shown in Figure 2, *P*<sub>1</sub> and *P*<sub>2</sub> are the reflection points of two laser beams. If there is no obstacle, the angle *a* corresponding to the points of adjacent laser beams should be small, so points on the ground can be marked according to the size of this included angle. Here *dx*, *dy*, and *dz* denote the differences between the two reflection points in the three coordinate directions.

**Figure 2.** Illustration of the planar feature information. (**a**) The angle between the lines used to judge the surface; (**b**) red points are the planar features in the scan.
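As an illustration of how Equation (1) can label ground points in an organized range image, consider the following Python sketch. The array layout, the number of ground-candidate rows, and the 10° angle threshold are our own illustrative assumptions rather than values given in the text.

```python
import numpy as np

def label_ground_points(cloud, ground_rows=8, angle_thresh_deg=10.0):
    """Mark ground points in an organized point cloud using Eq. (1).

    cloud: (rows, cols, 3) Cartesian points, one row per laser ring,
    ordered from the lowest beam upward. Only the lower `ground_rows`
    rows (beams pointing below the horizon) are examined. The row count
    and the 10 deg threshold are illustrative assumptions.
    """
    rows, cols, _ = cloud.shape
    is_ground = np.zeros((rows, cols), dtype=bool)
    for r in range(min(ground_rows, rows - 1)):
        for c in range(cols):
            d = cloud[r + 1, c] - cloud[r, c]   # vector between adjacent beams
            # Eq. (1): angle of the connecting line above the horizontal plane.
            a = np.degrees(np.arctan2(abs(d[2]), np.hypot(d[0], d[1])))
            if a < angle_thresh_deg:            # flat enough: mark both points
                is_ground[r, c] = True
                is_ground[r + 1, c] = True
    return is_ground
```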

#### *3.2. Label Rod-Shape Feature Information*

Figure 3 shows the top view: OC is the range measured by the first beam and OD is the range measured by the second.

$$\beta = \tan^{-1} \frac{r_2 \sin \alpha}{r_1 - r_2 \cos \alpha} = \tan^{-1}\frac{MD}{OM} \tag{2}$$

**Figure 3.** Illustration of the rod-shaped feature clustering. (**a**) The fast and accurate segmentation method; (**b**) green points are the edge features in the scan; (**c**) the BFS search method in the current scan.

In this formula, *r*<sub>1</sub> and *r*<sub>2</sub> are the depth values measured by the two beams, and *α* is the angle between them. For the Velodyne VLP-16, for example, the angular resolution is 0.2° in the horizontal direction and 2° in the vertical direction.

If C and D lie on different objects, the angle *β* between OC and OD will be greater than a certain threshold; in this work, we set this threshold to 20°. With this criterion, points belonging to the same object can be clustered quickly and accurately.

Because the point cloud is organized as a two-dimensional image, it can be traversed quickly. We traverse each pixel of the 2D image, calculate the included angle to the four points around it, and search the pixels with the BFS algorithm [22].

As shown in Figure 3c, the surrounding points are the left, right, lower, and upper pixels. If the angle between a pixel and a surrounding point is smaller than the threshold, the two are assigned to the same object, as sketched below. Once a pixel is marked it is skipped, so the algorithm complexity is Θ(N), where N is the number of image pixels.
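A minimal Python sketch of this BFS clustering is given below, assuming the organized depth image of Section 3 and the same-object rule exactly as stated above (neighbors are merged while β stays below the 20° threshold). The horizontal wrap-around and the use of a single angular step α are simplifying assumptions.

```python
from collections import deque
import numpy as np

def cluster_range_image(ranges, alpha=np.radians(0.2), beta_thresh=np.radians(20.0)):
    """BFS clustering of an organized depth image using the angle of Eq. (2).

    ranges: (rows, cols) depth image; values <= 0 mark invalid pixels.
    alpha:  angular step between the two beams (0.2 deg horizontally on a
            VLP-16; a vertical step of 2 deg would apply to row neighbors).
    Returns an integer label image; -1 marks invalid pixels.
    """
    rows, cols = ranges.shape
    labels = np.full((rows, cols), -1, dtype=int)
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0, c0] != -1 or ranges[r0, c0] <= 0:
                continue
            labels[r0, c0] = next_label
            queue = deque([(r0, c0)])
            while queue:                       # grow the cluster breadth-first
                r, c = queue.popleft()
                for dr, dc in ((0, -1), (0, 1), (-1, 0), (1, 0)):
                    rn, cn = r + dr, (c + dc) % cols   # wrap around horizontally
                    if not 0 <= rn < rows or labels[rn, cn] != -1:
                        continue
                    r1 = max(ranges[r, c], ranges[rn, cn])
                    r2 = min(ranges[r, c], ranges[rn, cn])
                    if r2 <= 0:
                        continue
                    # Eq. (2): angle formed at the farther measurement.
                    beta = np.arctan2(r2 * np.sin(alpha), r1 - r2 * np.cos(alpha))
                    if beta < beta_thresh:     # same object per the text's rule
                        labels[rn, cn] = next_label
                        queue.append((rn, cn))
            next_label += 1
    return labels
```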

This removes the influence of scattered and unreliable points on the subsequent point cloud processing. For example, while the carrier is driving, leaves, grass, weeds, and other small objects can be removed; such points are difficult to observe consistently across two consecutive scans and are a main factor degrading the pose solution.

#### *3.3. Feature Extraction*

Building on the segmentation and clustering of planar and rod-shaped information in the previous steps, edge points and planar points are extracted from the rod-shaped and planar feature sets, respectively. Curvature is defined as follows:

$$c = \frac{1}{|S| \cdot \left\| \mathrm{Pr}_i^L \right\|} \left\| \sum_{j \in S,\, j \neq i} \left( \mathrm{Pr}_j^L - \mathrm{Pr}_i^L \right) \right\| \tag{3}$$

In this formula, *S* is the set of continuous points from the same row of the 2D image and Pr denotes the point range. In this work, |*S*| is set to 10, and *c* is the curvature value.

Similar to LOAM, the depth information of each point is used to eliminate parallel and occluded points, which would otherwise affect the subsequent solution. In the feature extraction process, each frame is divided into 6 sub-images, each with a resolution of 16 × 300. Edge points and planar points are extracted from each sub-image according to the thresholds *c<sub>th</sub>* and *p<sub>th</sub>*, chosen here as 1 and 0.1, respectively. The edge point set and planar point set extracted from frame *k* are *ε<sub>k</sub>* and *s<sub>k</sub>*. A sketch of this step is given below.
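The following sketch illustrates Equation (3) and the threshold test on one sub-image. The handling of invalid pixels and the restriction to the labeled planar/rod-shaped masks of Section 3.2 are simplified here, and the function names are our own.

```python
import numpy as np

def extract_features(ranges, S=10, c_th=1.0, p_th=0.1):
    """Curvature of Eq. (3) along each image row, then threshold tests.

    ranges: (rows, cols) depth sub-image (e.g. 16 x 300). Points with
    c > c_th become edge candidates; points with c < p_th become planar
    candidates. Invalid pixels (range <= 0) are skipped.
    """
    rows, cols = ranges.shape
    half = S // 2
    edge_idx, planar_idx = [], []
    for r in range(rows):
        for c in range(half, cols - half):
            r_i = ranges[r, c]
            if r_i <= 0:
                continue
            window = ranges[r, c - half:c + half + 1]
            # Eq. (3): normalized magnitude of the summed range differences.
            curv = np.abs(np.sum(window - r_i)) / (S * r_i)
            if curv > c_th:
                edge_idx.append((r, c))
            elif curv < p_th:
                planar_idx.append((r, c))
    return edge_idx, planar_idx
```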

After this process, we selectively obtain stable features and reduce the computational load at the back-end. At the same time, this step improves the reliability of front-end scan registration.

#### **4. The LIO Odometry Module**

#### *4.1. IMU Pre-Integration*

LIDAR and IMU work at different frequencies: typically the LIDAR runs at 10 to 30 Hz and the IMU at 100 to 500 Hz. Pre-integration integrates the IMU measurements between each pair of adjacent LIDAR frames and expresses them as a single relative-motion value. Through this step, we obtain the outputs of the two sensors at the same frequency.

The IMU measures the angular velocity *ω̂<sub>B</sub>*(*t*) and the acceleration *â<sub>B</sub>*(*t*), both expressed in the body coordinate system B. The measurement equations can be modeled as:

$$\hat{\omega}_B(t) = \omega_B(t) + b^{\omega}(t) + \eta^{\omega}(t) \tag{4}$$

$$\hat{a}_B(t) = R_W^B(t)\left(a_W(t) - g^W\right) + b^a(t) + \eta^a(t) \tag{5}$$

The measured values are affected by the slowly varying biases *b*(*t*) and white noise *η*(*t*). The gravity vector in the world frame, *g<sup>W</sup>* = [0, 0, *g*]<sup>T</sup>, also affects the acceleration measurement and must therefore be subtracted.

In our work, the noise is ignored and the biases are treated as constant during the pre-integration period. The current state can then be obtained from the pre-integration derivation. Let *j* be the current frame and *i* the previous frame. The rotation matrix *R<sup>j</sup><sub>WB</sub>*, velocity *v<sup>j</sup><sub>WB</sub>*, and position *p<sup>j</sup><sub>WB</sub>* can be expressed as:

$$R_{WB}^{j} = R_{WB}^{i}\, \Delta R_{ij}\, \mathrm{Exp}\left( J_{\Delta R_{ij}}^{\omega}\, \delta b_{\omega}^{i} \right) \tag{6}$$

$$v_{WB}^{j} = v_{WB}^{i} + g^{W} \Delta t_{ij} + R_{WB}^{i}\left( \Delta v_{ij} + J_{\Delta v_{ij}}^{\omega}\, \delta b_{\omega}^{i} + J_{\Delta v_{ij}}^{a}\, \delta b_{a}^{i} \right) \tag{7}$$

$$p_{WB}^{j} = p_{WB}^{i} + v_{WB}^{i}\, \Delta t_{ij} + \frac{1}{2} g^{W} \Delta t_{ij}^{2} + R_{WB}^{i}\left( \Delta p_{ij} + J_{\Delta p_{ij}}^{\omega}\, \delta b_{\omega}^{i} + J_{\Delta p_{ij}}^{a}\, \delta b_{a}^{i} \right) \tag{8}$$

*J* denotes a Jacobian; the details can be found in [23]. The terms *J<sup>ω</sup><sub>(·)</sub>δb<sub>ω</sub>* and *J<sup>a</sup><sub>(·)</sub>δb<sub>a</sub>* are first-order approximations of the effect of a change in the biases, which avoids repeated integration. Meanwhile, the pre-integration terms Δ*R<sub>ij</sub>*, Δ*v<sub>ij</sub>*, and Δ*p<sub>ij</sub>* can be computed between frames *i* and *j*:

$$\Delta R_{ij} = \prod_{k=i}^{j-1} \mathrm{Exp}\left( \left( \hat{\omega}_B^k - b_{\omega}^i \right) \Delta t \right) \tag{9}$$

$$\Delta v_{ij} = \sum_{k=i}^{j-1} \Delta R_{ik} \left( \hat{a}_B^k - b_a^i \right) \Delta t \tag{10}$$

$$\Delta p_{ij} = \sum_{k=i}^{j-1} \left[ \Delta v_{ik}\, \Delta t + \frac{1}{2} \Delta R_{ik} \left( \hat{a}_B^k - b_a^i \right) \Delta t^2 \right] \tag{11}$$
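A compact sketch of Equations (9)–(11), assuming raw gyroscope and accelerometer samples at a fixed period and constant biases, could look as follows; the `scipy` rotation helper stands in for the Exp map.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def preintegrate(omega_meas, acc_meas, dt, b_omega, b_acc):
    """Pre-integrate IMU samples between LIDAR frames i and j (Eqs. 9-11).

    omega_meas, acc_meas: (K, 3) gyro and accelerometer samples in frame B.
    dt: sample period; b_omega, b_acc: bias estimates (held constant).
    Returns (dR, dv, dp); measurement noise is ignored, as in the text.
    """
    dR = np.eye(3)
    dv = np.zeros(3)
    dp = np.zeros(3)
    for w, a in zip(omega_meas, acc_meas):
        a_corr = a - b_acc
        # Eq. (11): position uses the rotation/velocity accumulated so far.
        dp += dv * dt + 0.5 * dR @ a_corr * dt**2
        # Eq. (10): velocity increment rotated into frame i.
        dv += dR @ a_corr * dt
        # Eq. (9): right-multiply the incremental rotation Exp((w - b) dt).
        dR = dR @ Rotation.from_rotvec((w - b_omega) * dt).as_matrix()
    return dR, dv, dp
```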

#### *4.2. Build Local Map*

In point cloud registration, the iterative closest point (ICP) algorithm is the most commonly used scan-registration method. However, because urban scenes contain many moving targets, ICP, which operates directly on raw point data, has a high registration failure rate. It is also unsuitable for real-time localization and mapping because of its large computational cost. In LOAM, pose estimation relies on scan-to-scan matching for quick estimation, but this method is prone to cumulative error. In our work, the current scan and a local map are matched using the predicted value from IMU pre-integration, and the scan-to-map result is in turn used to correct the accumulated IMU errors.

A local map associated with the current LIDAR frame is constructed: a sliding window selects a fixed number of keyframes within a certain range. The local map is transformed into the W coordinate system. The edge points and planar points of the local map form a voxel map, and the points are down-sampled to eliminate duplicated features. To speed up point cloud matching, the feature information of the local map is stored in a k-d tree [24] for subsequent search, as sketched below.
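A minimal sketch of this local-map construction follows; the window length of 20 keyframes and the 0.4 m voxel size are illustrative assumptions, and `scipy`'s `cKDTree` stands in for the k-d tree of [24].

```python
import numpy as np
from scipy.spatial import cKDTree

class LocalMap:
    """Sliding-window local map with voxel down-sampling and a k-d tree.

    Keyframe feature clouds (already in the W frame) are concatenated,
    duplicated features are removed by voxel hashing, and a k-d tree is
    built for the nearest-neighbor queries of Section 4.3.
    """
    def __init__(self, window=20, voxel=0.4):
        self.window = window
        self.voxel = voxel
        self.keyframes = []          # list of (N_i, 3) arrays in frame W
        self.points = None
        self.tree = None

    def add_keyframe(self, cloud_w):
        self.keyframes.append(cloud_w)
        if len(self.keyframes) > self.window:
            self.keyframes.pop(0)    # slide the window
        pts = np.vstack(self.keyframes)
        # Keep one point per voxel to eliminate duplicated features.
        _, idx = np.unique(np.floor(pts / self.voxel).astype(np.int64),
                           axis=0, return_index=True)
        self.points = pts[idx]
        self.tree = cKDTree(self.points)

    def nearest(self, p, k=5):
        # Five nearest map points, as used by the point-to-line/plane residuals.
        _, idx = self.tree.query(p, k=k)
        return self.points[idx]
```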

Therefore, this paper adopts a registration method based on feature points. After feature points of the same type are obtained through preprocessing, a graph optimization model is used to iteratively register the current scan against the local map.

#### *4.3. Pose Estimation*

For each edge point *p<sub>ε</sub>* ∈ *ε<sub>k</sub>*, we search for the five nearest points on the local map and compute the mean and covariance matrix of these five points. When the distribution of the points approximates a straight line, one eigenvalue of the covariance matrix is significantly larger than the rest. The eigenvector *u<sup>sm</sup><sub>ε</sub>* corresponding to this largest eigenvalue is the main direction of the line, and *p<sup>sm</sup><sub>ε</sub>* is the geometric center of the five points (Figure 4). If the line feature satisfies this condition, the distance between the current edge point and the line can be calculated, and the best pose estimate of the current point in the local map is obtained by minimizing this distance. The distance is computed as:

$$f_{\varepsilon}(p_{\varepsilon}) = p_n \cdot \left( (T_k p_{\varepsilon} - p_{\varepsilon}^{sm}) \times u_{\varepsilon}^{sm} \right) \tag{12}$$

where · denotes the dot product, × the cross product, and *p<sub>n</sub>* the unit normal vector:

$$p_n = \frac{(T_k p_{\varepsilon} - p_{\varepsilon}^{sm}) \times u_{\varepsilon}^{sm}}{\left\| (T_k p_{\varepsilon} - p_{\varepsilon}^{sm}) \times u_{\varepsilon}^{sm} \right\|} \tag{13}$$

**Figure 4.** Illustration of the edge point-to-line residual. *u<sup>sm</sup><sub>ε</sub>* is the main direction of the line, and *p<sup>sm</sup><sub>ε</sub>* is the geometric center of the five nearest points in the local map.

In the same way, for each planar point *p<sub>s</sub>* ∈ *s<sub>k</sub>* in the current scan, we search for five points on the local map to form a plane. The difference is that the eigenvector corresponding to the smallest eigenvalue of the five-point covariance matrix is the normal vector of this plane. As shown in Figure 5, *u<sup>sm</sup><sub>s</sub>* is the direction of the normal vector and *p<sup>sm</sup><sub>s</sub>* is the geometric center of the five planar points.

$$f_s(p_s) = (T_k p_s - p_s^{sm}) \cdot u_s^{sm} \tag{14}$$

**Figure 5.** Illustration of the planar point-to-plane residual. *p<sup>sm</sup><sub>s</sub>* is the geometric center of the five planar points, and *u<sup>sm</sup><sub>s</sub>* is the direction of the normal vector.
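To make Equations (12)–(14) concrete, the following sketch evaluates both residuals for a given pose estimate. The line/plane fitting via the eigen-decomposition of the five-point covariance follows the description above; the function names are our own.

```python
import numpy as np

def fit_direction_and_center(pts, largest=True):
    """Mean and an eigenvector of five map points: the largest-eigenvalue
    direction for lines (Figure 4), the smallest for plane normals (Figure 5)."""
    center = pts.mean(axis=0)
    w, v = np.linalg.eigh(np.cov((pts - center).T))  # eigenvalues ascending
    return center, v[:, -1] if largest else v[:, 0]

def edge_residual(T, p_eps, p_sm, u_sm):
    """Point-to-line residual of Eq. (12); p_n is built as in Eq. (13)."""
    q = (T @ np.append(p_eps, 1.0))[:3] - p_sm   # transformed point minus center
    cross = np.cross(q, u_sm)
    p_n = cross / np.linalg.norm(cross)          # unit normal, Eq. (13)
    return p_n @ cross                           # = ||cross||: distance to line

def planar_residual(T, p_s, p_sm, u_sm):
    """Point-to-plane residual of Eq. (14); u_sm is the plane normal."""
    q = (T @ np.append(p_s, 1.0))[:3] - p_sm
    return q @ u_sm
```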

Therefore, this optimization problem can be constructed as:

$$\min_{T_k} \left\{ \sum f_{\varepsilon}(p_{\varepsilon}) + \sum f_{s}(p_{s}) \right\} \tag{15}$$

A graph optimization algorithm is used to solve this nonlinear optimization problem, from which an accurate state estimate is obtained. The Jacobian can be derived from the left-perturbation model with *δξ* ∈ se(3) [25].

$$J_p = \frac{\partial (TP)}{\partial \delta \xi} = \lim_{\delta \xi \to 0} \frac{\mathrm{Exp}(\delta \xi)(TP) - (TP)}{\delta \xi} = \begin{bmatrix} I_{3 \times 3} & -[TP]_{\times} \\ 0_{1 \times 3} & 0_{1 \times 3} \end{bmatrix} \tag{16}$$

where [*TP*]<sub>×</sub> converts the homogeneous point expression (*x*, *y*, *z*, 1) into the 3D point (*x*, *y*, *z*) and takes its skew-symmetric matrix. The Jacobian of the edge residual can be derived as:

$$J_{\varepsilon} = \frac{\partial f_{\varepsilon}(p_{\varepsilon})}{\partial (TP)} \frac{\partial (TP)}{\partial \delta \xi} = p_n \cdot \left( u_{\varepsilon}^{sm} \times J_p \right) \tag{17}$$

In the same way, we can also derive:

$$J_s = \frac{\partial f_s(p_s)}{\partial (TP)} \frac{\partial (TP)}{\partial \delta \xi} = u_s^{sm} \cdot J_p \tag{18}$$

According to the above formulas, the estimate is refined by iterative optimization until convergence. In this work, the local map is limited to a 50 m radius. We propose a new optimization model, derive the corresponding residuals and Jacobians, and improve the solving speed significantly compared with other algorithms (see Section 6). A sketch of the Jacobian computation follows.
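The sketch below implements the Jacobians of Equations (16)–(18). Signs depend on the chosen cross-product convention, so it should be read as a derivation aid under our stated conventions, not as the exact implementation.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix [v]_x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def jacobian_point(T, p):
    """J_p of Eq. (16): derivative of the transformed point TP with respect
    to a left perturbation in se(3); the zero rows of the homogeneous form
    are dropped, leaving the 3x6 block [ I | -[TP]_x ]."""
    tp = (T @ np.append(p, 1.0))[:3]
    return np.hstack([np.eye(3), -skew(tp)])

def jacobian_edge(T, p_eps, p_n, u_sm):
    """J_eps of Eq. (17): chain rule through the cross product. Since
    (q x u) = -[u]_x q, the residual's derivative w.r.t. TP is p_n^T (-[u]_x)."""
    return (p_n @ (-skew(u_sm))) @ jacobian_point(T, p_eps)   # 1x6 row

def jacobian_planar(T, p_s, u_sm):
    """J_s of Eq. (18): the plane normal projected through J_p."""
    return u_sm @ jacobian_point(T, p_s)                      # 1x6 row
```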

#### **5. The Global Optimization Module**

If the motion change of the current scan relative to the previous scan is greater than a certain threshold (10° in rotation or 0.5 m in translation), the current frame is judged to be a keyframe and enters the global optimization, which is based on a sliding window; a sketch of this test is given below.
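The keyframe test itself is straightforward; a sketch using the stated 10° and 0.5 m thresholds follows (the rotation-angle extraction via `scipy` is our choice).

```python
import numpy as np
from scipy.spatial.transform import Rotation

def is_keyframe(T_prev, T_curr, rot_thresh_deg=10.0, trans_thresh=0.5):
    """Keyframe test: relative motion since the last keyframe exceeds
    10 deg in rotation or 0.5 m in translation (Section 5 thresholds)."""
    dT = np.linalg.inv(T_prev) @ T_curr            # relative 4x4 transform
    ang = np.degrees(np.linalg.norm(
        Rotation.from_matrix(dT[:3, :3]).as_rotvec()))
    return ang > rot_thresh_deg or np.linalg.norm(dT[:3, 3]) > trans_thresh
```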

In this paper, the state vector in the sliding window is defined as *χ* = [*x*<sub>0</sub>, *x*<sub>1</sub>, *x*<sub>2</sub>, ..., *x<sub>n</sub>*], with *x<sub>i</sub>* = [*p<sup>W</sup><sub>b<sub>i</sub></sub>*, *q<sup>W</sup><sub>b<sub>i</sub></sub>*, *v<sup>W</sup><sub>b<sub>i</sub></sub>*, *b<sub>a</sub>*, *b<sub>g</sub>*]. For a window of *n* keyframes, these states are obtained by minimizing:

$$\min_{\chi} \left\{ \left\| R_p(\chi) \right\|^2 + \sum_{k=1}^{n} L\left( \hat{z}_{b_j}^{b_i}, \chi \right) + \sum_{k=1}^{n} \kappa\left( \hat{z}_{b_j}^{b_i}, \chi \right) + \sum_{k=1}^{n} F\left( \hat{z}_{b_j}^{b_{loop}}, \chi \right) \right\} \tag{19}$$

In this formula, *R<sub>p</sub>*(*χ*) is the prior residual from the measurements that are marginalized out of the sliding window. *L*(·), *κ*(·), and *F*(·) denote the LIDAR, IMU, and loop-closure error terms, respectively. Figure 6 shows the optimization process.

**Figure 6.** The optimization contains a prior term, LIDAR term, IMU term, and loop term. The prior term is generated by marginalization. LIDAR observations provide scan-to-scan constraints, and IMU pre-integration forms the constraints between keyframes. Loop closure reduces drift during long-term operation.

#### *5.1. Prior Term*

The purpose of marginalization is to bound the computational complexity. States that leave the sliding window cannot simply be thrown away, as this would destroy the original constraint relationships and lose constraint information. This work selectively marginalizes *x<sub>i</sub>* out of the sliding window via the Schur complement [26] and converts the measurements corresponding to the marginalized states into the prior.

#### *5.2. LIDAR Term*

Through the scan-to-map calculation performed for each scan (see Section 4.3), the LIDAR state change between two adjacent frames is added to the graph optimization model as a scan-to-scan constraint.

$$L\left(\hat{z}_{b_j}^{b_i}, \chi\right) = \Delta T_{ij} = T_i^{-1} T_j \tag{20}$$

Here *j* and *i* are the current and previous frames, respectively. This term inhibits the accumulation of drift over long runs.

#### *5.3. IMU Term*

When IMU measurements are available, the residual between two consecutive frames can be calculated; it is defined as:

$$\kappa\left(\hat{z}_{b_j}^{b_i}, \chi\right) = \begin{bmatrix} \delta \alpha_{b_j}^{b_i} \\ \delta \beta_{b_j}^{b_i} \\ \delta \theta_{b_j}^{b_i} \\ \delta b^{a} \\ \delta b^{\omega} \end{bmatrix} = \begin{bmatrix} R_{W}^{b_i}\left( p_{b_j}^{W} - p_{b_i}^{W} - v_{b_i}^{W}\Delta t + \frac{1}{2} g^{W} \Delta t^{2} \right) - \hat{\alpha}_{b_j}^{b_i} \\ R_{W}^{b_i}\left( v_{b_j}^{W} - v_{b_i}^{W} + g^{W}\Delta t \right) - \hat{\beta}_{b_j}^{b_i} \\ 2\left[ \left( \hat{q}_{b_j}^{b_i} \right)^{-1} \otimes \left( q_{b_i}^{W} \right)^{-1} \otimes q_{b_j}^{W} \right]_{xyz} \\ b_{j}^{a} - b_{i}^{a} \\ b_{j}^{\omega} - b_{i}^{\omega} \end{bmatrix} \tag{21}$$

where [·]<sub>*xyz*</sub> is the imaginary part of a quaternion. *α̂<sup>b<sub>i</sub></sup><sub>b<sub>j</sub></sub>*, *β̂<sup>b<sub>i</sub></sup><sub>b<sub>j</sub></sub>*, and *q̂<sup>b<sub>i</sub></sup><sub>b<sub>j</sub></sub>* are the pre-integrated position, velocity, and rotation between frames *i* and *j* under the assumption that *b<sub>a</sub>* and *b<sub>ω</sub>* are stable. A sketch of this residual follows.
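The following sketch of Equation (21) assumes the states and pre-integrated terms are already unpacked and uses `scipy` rotations for the quaternion algebra; the variable packing is our own assumption.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def imu_residual(p_i, v_i, q_i, b_a_i, b_w_i,
                 p_j, v_j, q_j, b_a_j, b_w_j,
                 alpha_hat, beta_hat, q_hat, g_w, dt):
    """15-dimensional IMU residual of Eq. (21) between keyframes i and j.

    q_i, q_j, q_hat: scipy Rotations (body-to-world and pre-integrated
    rotation); alpha_hat, beta_hat: pre-integrated position and velocity
    terms; g_w: gravity vector in W, e.g. np.array([0.0, 0.0, 9.81]).
    """
    R_wi = q_i.as_matrix().T                       # R^{b_i}_W
    r_p = R_wi @ (p_j - p_i - v_i * dt + 0.5 * g_w * dt**2) - alpha_hat
    r_v = R_wi @ (v_j - v_i + g_w * dt) - beta_hat
    # Third row of Eq. (21): 2 * imaginary part of q_hat^-1 (x) q_i^-1 (x) q_j.
    dq = q_hat.inv() * q_i.inv() * q_j
    r_q = 2.0 * dq.as_quat()[:3]                   # scipy order: (x, y, z, w)
    return np.concatenate([r_p, r_v, r_q, b_a_j - b_a_i, b_w_j - b_w_i])
```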

#### *5.4. Loop Term*

Loop closure is an important step for correcting the accumulated error in a SLAM system. In this study, it is realized by distance detection: for the current frame, we search for nearby keyframes by position, and frames within a 15 m radius are marked as candidate loop-closure frames.

We select the nearest candidate frame as the previous keyframe *T<sup>W</sup><sub>loop</sub>*. A certain number of point clouds near this previous keyframe are then gathered into a small local map. Because the number of points involved is small, ICP is used to compute the relative transformation *T<sup>W</sup><sub>W′</sub>* between the similar scenes. The residual between the previous keyframe and the current frame is then:

$$F\left(\hat{z}_{b_j}^{b_{loop}}, \chi\right) = \Delta T_{loop,j} = \left(T_{loop}^W\right)^{-1} T_{W'}^W T_j^{W'} \tag{22}$$

To avoid affecting real-time performance, loop closure detection and the ICP mapping run in a separate thread. A sketch of the candidate search is given below.
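The distance-based candidate search can be sketched as follows; the minimum index gap that prevents matching immediate neighbors is our own guard, not a value from the text.

```python
import numpy as np
from scipy.spatial import cKDTree

def find_loop_candidate(keyframe_positions, current_idx, radius=15.0, min_gap=50):
    """Distance-based loop detection (Section 5.4).

    keyframe_positions: (N, 3) keyframe translations in W. Keyframes within
    a 15 m radius of the current one are candidates; the nearest candidate
    at least `min_gap` frames old is returned, or None if there is none.
    """
    tree = cKDTree(keyframe_positions)
    cur = keyframe_positions[current_idx]
    candidates = [i for i in tree.query_ball_point(cur, radius)
                  if current_idx - i > min_gap]
    if not candidates:
        return None
    dists = np.linalg.norm(keyframe_positions[candidates] - cur, axis=1)
    return candidates[int(np.argmin(dists))]
```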

#### **6. Evaluation**

To verify our algorithm, we conducted experiments on a public dataset and in the real world. The proposed algorithm runs on a laptop with an Intel i7 CPU and 16 GB of memory, under Ubuntu 18.04 and ROS Melodic [27]. We use evo [28] to evaluate accuracy. The optimization libraries used are GTSAM [29] and Ceres [30].

#### *6.1. Validation of M2DGR Datasets*

The M2DGR [31] datasets were recorded using ground robots. As shown in Figure 7, a Velodyne HDL-32E LiDAR (labeled 3 in the figure) scans the surrounding environment to obtain 3D point clouds. The IMU is a Handsfree A9 (labeled 5 in the figure), a 9-axis sensor. Outdoors, satellite visibility is good, so the GNSS-RTK suite (labeled 4 in the figure) provides high-precision ground truth; for indoor environments, the ground-truth trajectories are recorded with a motion-capture system consisting of twelve high-speed tracking cameras. The spatial relationships among the sensors have been calibrated.

To test the robustness of our algorithm, we evaluate different scenarios; the dataset information is shown in Table 1. Street\_01, 04, 07, and 10 were collected on streets, where buildings appear intermittently. The structured environment offers rich geometric feature information, but switching into unstructured scenes has an unpredictable influence on LIDAR odometry; in Street\_04, collected around a lawn, weeds, leaves, and other environmental factors affect the positioning accuracy. Loop scenes are included for loop-closure detection, which is important for validating the back-end graph optimization, and the zigzag motion state challenges inter-frame motion estimation. We regard runs longer than 500 s as long-term tests of stability and robustness. Gate\_02 was collected around a large circular gate, where the loop-closure condition is easily satisfied. In Circle\_02 the ground robot rotates continuously, which makes feature matching difficult, and Hall\_05 was collected indoors, with a large amount of overlap and structured features throughout the experiment.

**Figure 7.** The sensor suite of M2DGR. The sensors are strictly calibrated and share the same time stamps.

**Table 1.** The datasets cover various scene types.


We compare our system with ALOAM, LeGO-LOAM, and LIO-SAM. ALOAM depends only on LIDAR; its core is the same as LOAM, but it is implemented with Ceres for code readability. LeGO-LOAM uses IMU data to help remove motion distortion from point clouds. LIO-SAM is a tightly coupled LIDAR-IMU approach, but it performs no front-end point cloud processing, and its traditional scan-to-map solver is more time consuming.

We choose typical scenarios such as zigzag motion, rotation, and loops, in which point cloud mismatching often occurs due to the violent motion of the carrier. Long-term runs verify the elimination of the accumulated error of the LIO system.

In Table 2, the bold italic values indicate the minimum error. The seven groups of experiments show that our algorithm improves accuracy in most scenarios, especially in street scenes. In the sequence "Circle\_02", our LIO system has a higher error than LeGO-LOAM, because "Circle\_02" is collected in a fixed scene in which the ground robot rotates continuously; this motion state slightly degrades the IMU pre-integration. In the other sequences, our system benefits from the tight coupling of inertial and LIDAR information. In "Gate\_02" and "Hall\_05", the four algorithms perform equally well, as these two scenes are simple and rich in structural features. In the street sequences, however, our algorithm greatly improves performance.


**Table 2.** Absolute trajectory error (ATE) RMSE (m) of the four algorithms in seven datasets.

6.1.1. Positioning Performance Analysis

Street\_01 is chosen for our analysis. Figure 8 shows the trajectories of the four algorithms on street\_01 in the X-Y plane; the accumulated error of each algorithm can be seen in the detail view. Our system makes reasonable use of feature information, which effectively improves the accuracy of point cloud matching: planar points are extracted from the ground-surface feature information, and edge points are extracted from the rod feature information. We note that ALOAM, LeGO-LOAM, and LIO-SAM drift and deviate substantially at turns, whereas in our system the point cloud registration is greatly improved.

**Figure 8.** The trajectories of the four algorithms on street\_01. Our trajectory is closest to the ground truth. The detail view shows the region near the end of the test.

More detailed evaluations were conducted. Figure 9 shows the absolute trajectory error over time for the four algorithms on street\_01. ALOAM, lacking IMU measurements, has the largest error, and its scan-to-scan method tends to feed inaccurate information into the scan-to-map module. LeGO-LOAM applies a two-step scan-to-scan method, which improves efficiency but still introduces considerable error; moreover, without an IMU constraint, a loosely coupled LIO system such as LeGO-LOAM cannot fully exploit the sensor observations. LIO-SAM has no point cloud preprocessing stage, so many unstable observations degrade its point cloud registration.

**Figure 9.** Illustration of the absolute trajectory error over time. Our system remains at a low error level.

This work does not evaluate only the RMSE of Table 2. Figure 10 displays each evaluation metric, and our system performs well across the different indicators. Figure 11 shows box plots, which display the dispersion of the data; our system also has the lowest deviation.

**Figure 10.** Illustration of a variety of indicators. The superiority of our algorithm can be concluded from the statistical graphs.

**Figure 11.** Box plots reflecting the distribution characteristics of the errors.

Having compared the four algorithms, we now analyze the difference between our system and the ground truth. Figure 12 shows our trajectory against the ground truth in the X-Y plane.

**Figure 12.** The detailed trajectories of the ground truth and our system on street\_01. They are well aligned.

Figure 13 shows the error in the three directions over the whole run. The system shows an obvious deviation at the start, because the optimization takes time to converge and correct, and the local map takes time to build. After a long run, the error remains low, which verifies the robustness and high precision of our system.

**Figure 13.** Error in three directions. The initial error is slightly larger; afterwards the error remains small.
