**3. Multi-Sensor Loosely Coupled System Based on LIDAR**

The emergence of loosely coupled systems opened a new stage in the development of multi-sensor fusion. To this day, such systems remain widely applied on low-cost platforms with limited computational power. Loose coupling is mostly applied in three ways: multi-sensor staged pose estimation, raw-data-based information complementarity, and sensor-assisted pose constraints.

### *3.1. LIDAR-IMU Loosely Coupled System*

Most loosely coupled systems appear in earlier works. The popularity and wide attention of LIDAR-based 3D SLAM technology can be attributed to Zhang's original work, the LOAM algorithm [11]. One of its important contributions was extracting effective edge and planar feature points from the complex point cloud; the point-to-line and point-to-plane distances are then used to construct an error function, and the pose is obtained by solving the resulting nonlinear optimization problem, as shown in Figure 4. Even this earliest LIDAR SLAM system already embodied the concept of sensor fusion: it integrates the gyroscope and accelerometer measurements of a six-axis IMU to obtain a prior pose, which further improves the accuracy of the LIDAR odometry. However, LOAM has no loop closure detection or global pose optimization at the back end. Many later LIDAR-IMU loosely coupled systems were improved and perfected on the basis of LOAM. The work presented in this section focuses not only on sensor data fusion, but also on improving the point cloud registration of the front end and the overall optimization of the back end.
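The point-to-line and point-to-plane distances that LOAM minimizes can be sketched as follows (a minimal geometric illustration, not LOAM's actual implementation; NumPy is assumed):

```python
import numpy as np

def point_to_line_residual(p, a, b):
    """Distance from point p to the line through edge points a and b,
    as used for LOAM-style edge-feature residuals."""
    d = np.cross(p - a, p - b)                 # parallelogram area vector
    return np.linalg.norm(d) / np.linalg.norm(b - a)

def point_to_plane_residual(p, a, b, c):
    """Distance from point p to the plane through planar points a, b, c,
    as used for LOAM-style planar-feature residuals."""
    n = np.cross(b - a, c - a)                 # plane normal
    n = n / np.linalg.norm(n)
    return abs(np.dot(p - a, n))
```

In LOAM these distances, evaluated over all matched feature points, form the error function that the nonlinear optimizer drives to zero by adjusting the inter-frame pose.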

Shan [12] proposed the LeGO-LOAM algorithm on the basis of LOAM, which introduces point cloud clustering and ground segmentation into data preprocessing to speed up point cloud registration [13]. At the same time, a simple acceleration formula is used to process the IMU data for point cloud distortion correction and to provide a prior pose. The IMU plays the same role in line/plane-feature-based LOAM and two-stage LOAM [14,15], where feature extraction has drawn more attention: both methods use the point normal vector to extend the feature types. Differently from the above, CSS-based LOAM and ALeGO-LOAM enhance the feature quality of LOAM [16,17]: with the curvature scale space method and the adaptive cloud sampling method, more accurate edge and planar points are extracted. However, this loose integration does not effectively exclude the influence of the IMU's own measurement bias, and the IMU remains merely a supplementary means.
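The ground separation step can be illustrated with a deliberately simplified height-and-slope filter (the thresholds, the function name, and the method itself are illustrative assumptions; LeGO-LOAM's actual segmentation works on the range image, per scan ring):

```python
import numpy as np

def segment_ground(points, sensor_height=1.8, slope_thresh_deg=10.0):
    """Toy ground filter: keep points near the expected ground plane whose
    elevation angle, seen from the ground beneath the sensor, is small.
    `points` is an (N, 3) array in the sensor frame (z up)."""
    xy_dist = np.linalg.norm(points[:, :2], axis=1)
    # Elevation angle of each point relative to the ground plane under the sensor.
    elev = np.degrees(np.arctan2(points[:, 2] + sensor_height, xy_dist))
    near_ground = np.abs(points[:, 2] + sensor_height) < 0.3
    flat = np.abs(elev) < slope_thresh_deg
    ground_mask = near_ground & flat
    return points[ground_mask], points[~ground_mask]
```

Removing ground points before clustering shrinks the candidate set for edge/planar feature matching, which is where the registration speed-up comes from.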

**Figure 4.** Inter-frame pose estimation in LOAM systems.

In addition to improvements in feature extraction, improved ICP variants and GPU acceleration have also been explored. ICP is widely used in the SLAM front end, and many recent works propose faster and more robust ICP variants to guarantee the real-time (10 Hz) performance of LIDAR odometry in realistic scenarios. The latest work [18] introduces a symmetric KL divergence into the front-end ICP algorithm; its optimization objective includes not only the distance between points, but also the difference in distribution shape. To ensure both the real-time performance and the accuracy of the front end, GPU acceleration is applied to point cloud computation in SuMa [19], Elastic-LiDAR Fusion [20], and the work of Droeschel [21]. The range data are approximated as a set of disks called surfels [22], which is convenient for solving the point-to-plane problem on the GPU. Thanks to GPU acceleration, dense point clouds can be rendered directly; the visual content is easier to understand and, to some extent, carries semantic meaning, so these methods can also be used in virtual and augmented reality. However, portable devices and mobile robots still lack powerful GPUs.
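The common core shared by these ICP variants, alternating nearest-neighbour association with a closed-form rigid alignment, can be sketched as a minimal point-to-point ICP (brute-force association for clarity; real front ends use KD-trees and robust point-to-plane or distribution-based metrics):

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Closed-form (SVD/Kabsch) rigid alignment of matched point sets."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)               # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                    # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cd - R @ cs
    return R, t

def icp(src, dst, iters=20):
    """Point-to-point ICP with brute-force nearest-neighbour association."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        # Associate each source point with its nearest target point.
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matched = dst[d2.argmin(axis=1)]
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

The variants surveyed above change the association rule and the cost (point-to-plane distance, surfel disks, distribution shape), but keep this associate-then-align loop.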

It is essential for a SLAM system to adopt overall optimization at the back end. In the latest work [23], Yue enriched the feature types, using the Multi-metric Linear Least Squares Iterative Closest Point (MULLS ICP) algorithm on categorized feature points to efficiently estimate ego-motion, together with submap-based PGO (Pose Graph Optimization) at the back end. Effective loop closure detection is a significant procedure in SLAM. In [24,25], the geometric and intensity information of the point cloud is encoded into a global point cloud descriptor to implement a rotation-invariant loop-closure matching algorithm, which clarifies the appropriate optimization timing for the SLAM back end. In [26], a 2D histogram, converted from all point cloud key frames, determines where loop closure occurs by calculating the similarity between the current frame and historical key frames; this loop detection is added to LOAM and LeGO-LOAM.
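The 2D-histogram similarity idea of [26] might be sketched as follows (the bird's-eye binning, the extent, and the normalized-correlation score are assumptions for illustration, not the paper's exact construction):

```python
import numpy as np

def bev_histogram(points, bins=32, extent=40.0):
    """Bird's-eye-view occupancy histogram of an (N, 3) point cloud,
    normalized so the bins sum to one."""
    h, _, _ = np.histogram2d(points[:, 0], points[:, 1],
                             bins=bins, range=[[-extent, extent]] * 2)
    return h / max(h.sum(), 1.0)

def histogram_similarity(h1, h2):
    """Normalized correlation between two histograms (1.0 = identical)."""
    a, b = h1 - h1.mean(), h2 - h2.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

A loop closure candidate is declared when the similarity between the current frame's histogram and a historical key frame's histogram exceeds a threshold, which then triggers back-end pose graph optimization.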

In addition, there is a branch of 3D LIDAR: solid-state LIDAR. It has the advantages of stable performance and low cost compared with traditional multi-line LIDARs such as the VLP-16 and VLP-32. However, with its irregular scan pattern and smaller field of view, this kind of LIDAR tends to suffer from motion blur. Solid-state-LIDAR-based SLAM is a new topic, and LOAM-Livox [27] is one of the most representative works. Based on the unique scanning method and sensor characteristics of the Livox radar, the authors designed a SLAM system suited to Livox with LOAM as a reference. The system removes unqualified points and extracts line and surface features; the pose is solved iteratively by constructing residuals of the line and surface distances. However, the IMU is not used in this method.

In loosely coupled LIDAR-inertial systems, the IMU data are processed for point cloud distortion correction and to provide a prior pose. In this framework, the effect of sensor fusion is limited; thus, most existing algorithms improve the front end and the back end instead. The related work on LIDAR-IMU loosely coupled systems is compared in Table 2.

**Table 2.** LI loosely coupled system.


It can be seen from the table that most LI loosely coupled systems in recent years focus on the completion and optimization of the overall system. Finding accurate front-end matching and efficient back-end optimization methods is their major innovation. Although this line of work did not make outstanding contributions to data fusion, it provided a stable platform and interface for subsequent work and accelerated the development of SLAM technology. A summary of the methods in this section according to different strategies is shown in Figure 4.

### *3.2. LIDAR-Visual-IMU Loosely Coupled System*

LIDAR odometry degrades in unstructured and repetitive environments; even with the assistance of an IMU, positioning cannot work properly for long periods there. In contrast, vision sensors do not require specific structural features such as edges and planes; instead, they rely on sufficient texture and color information for localization. However, vision sensors cannot obtain depth information directly. Combining a camera with LIDAR therefore provides a complementary solution. The LO and VO modules of LIDAR-Visual-IMU loosely coupled systems mostly operate independently, but they share positioning information with each other for pose correction and smoother estimation.

The authors of LOAM extended their algorithm by combining monocular feature tracking with an IMU in V-LOAM [28]. They associated feature-point depths with the point cloud to produce a visual-inertial odometry used for LIDAR scan matching. However, the VO in this work only provides a prior pose for the LO, and the final error formulation is exactly the same as LOAM, without vision coupling. Subsequently, the authors of V-LOAM released an iterative version [29]. The improved method employs a sequential, parallel processing flow to solve motion estimation from coarse to fine: visual-inertial coupling estimates the motion, and scan matching refines the motion estimate and the map further. The resulting system enables high-frequency, low-latency motion estimation as well as dense, accurate 3D map registration.

Tim [30] took a different approach, performing visual localization in a known point cloud map. The method tracks the pose of a monocular camera in a given 3D LIDAR map: its local-BA-based visual odometry reconstructs sparse 3D points from image features and continuously matches them with the map to track the camera pose online. However, this approach requires the map to be obtained in advance; such fusion runs contrary to the original intention of SLAM and is more like an odometer. Zhang [31] combined LOAM-based LIDAR odometry with optical-flow-tracking-based visual odometry. After culling dynamic objects, the system weights and fuses the pose results of the two according to the number of valid features. However, this method cannot run in real time, and pose optimization is still performed independently, without data association.

With the rapid development of VIO systems [32–34], visual-inertial odometry has gradually become a research hotspot in SLAM thanks to its high performance-to-price ratio and high positioning accuracy, and it has had a profound impact on multi-sensor fusion in LO systems. Wang [35] proposed a LIDAR-Visual-Inertial SLAM system based on V-LOAM and VINS-MONO. He used a V-LOAM-based approach for odometry estimation and back-end global pose graph optimization by maintaining a key frame database. In this approach, the pose estimation result of the LO can correct the VIO system.

Shao [36] uses a binocular camera to form a LIDAR-Visual-Inertial system, which is divided into two parts: binocular-based VIO and LIDAR-based LO. The binocular VIO employs stereo matching and IMU measurements to perform IMU pre-integration and tight coupling of pose graphs, marginalizing lagging frames, which provides the LO with an accurate and reliable initial pose. Based on LOAM, the LO adds vision-based closed-loop detection and pose-graph-based back-end optimization, and uses iSAM2 [37,38] to incrementally optimize the LIDAR odometry and closed-loop factors. This work approaches tightly coupled systems, but the VIO and LO remain relatively independent. Efficient closed-loop detection and back-end optimization make up for these shortcomings and lay the foundation for the large number of tightly coupled systems that appeared later.

Khattak [39] proposed another loosely coupled method similar to V-LOAM, which uses the inertial prior from visual and thermal imaging for LIDAR scan matching. To adapt to a variety of complex environments, the authors employed visual- and thermal-imaging inertial odometry to work in long tunnels without illumination. In [40], the authors combined the VO and LO systems with a leg odometer. At its core, an Extended Kalman Filter (EKF) fuses IMU and legged odometry measurements for attitude and velocity estimation; the system also integrates attitude corrections from the VO and LO and corrects attitude drift in a loosely coupled manner. This method localizes well on legged robot platforms.
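The loosely coupled EKF scheme of [40] can be illustrated on a deliberately simplified 1D constant-acceleration model, where IMU acceleration drives the prediction and low-rate odometry positions act as corrections (the state, noise values, and class name are illustrative assumptions, not the paper's filter):

```python
import numpy as np

class LooseEKF1D:
    """Toy 1D EKF: state [position, velocity]. IMU acceleration drives the
    high-rate prediction; odometry position measurements correct it."""
    def __init__(self, q=0.1, r=0.05):
        self.x = np.zeros(2)            # [p, v]
        self.P = np.eye(2)
        self.q, self.r = q, r           # process / measurement noise

    def predict(self, accel, dt):
        """High-rate propagation with an IMU acceleration sample."""
        F = np.array([[1.0, dt], [0.0, 1.0]])
        self.x = F @ self.x + np.array([0.5 * dt**2, dt]) * accel
        self.P = F @ self.P @ F.T + self.q * np.eye(2)

    def update(self, pos_meas):
        """Low-rate correction with an odometry position measurement."""
        H = np.array([[1.0, 0.0]])
        S = H @ self.P @ H.T + self.r
        K = self.P @ H.T / S
        self.x = self.x + (K * (pos_meas - H @ self.x)).ravel()
        self.P = (np.eye(2) - K @ H) @ self.P
```

The real filter in [40] carries full 3D attitude and velocity and accepts corrections from several odometry sources, but the predict/update split is the same loosely coupled pattern.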

Lowe [41] proposed a LIDAR-aided vision SLAM system, which employs a novel feature depth and depth uncertainty estimation method. The system uniformly parameterizes three different types of visual features using measurements from LIDAR, camera, and IMU, simultaneously. The system has good adaptability to handheld devices.

CamVox [42] is the first Livox-LIDAR-assisted visual SLAM system. Built on ORB-SLAM2 [43], it uses Livox to provide more accurate depth estimation for the camera. Unlike LOAM-Livox, the IMU is used for distortion correction of the non-repetitively scanned point clouds. In addition, the authors exploit the non-repetitive scanning of the Livox LIDAR to calibrate the camera and LIDAR automatically in uncontrolled scenes. The system achieves better pose estimation than VINS-MONO and ORB-SLAM2. Shin et al. [44] argue that a relatively sparse point cloud is not meaningful for associating depth with visual features. They applied the direct method [45] to the combination of low-line-count LIDAR and a camera to implement a loosely coupled SLAM system, addressing the sparsity problem in data association.

The latest representative loosely coupled work [46] proposes a system composed of multiple odometry methods, taking point clouds and images as inputs. Its pose estimation algorithms include GICP (Generalized Iterative Closest Point) [47], P2P-ICP (Point-to-Plane Iterative Closest Point) [48], NDT (Normal Distributions Transform) [49], ColorICP (Color Iterative Closest Point) [50], and Huang's method of combining LiDAR and camera data [51]. The system utilizes multiple odometries to improve integrity and robustness, and defines point-cloud-based localization evaluation methods and scoring criteria to select the optimal pose result. However, it has no data association or sharing between the odometries. Wang [52] proposed a LIDAR-assisted VIO system that relies on a voxel map structure to efficiently assign LIDAR depth information to visual features. Moreover, this work innovatively introduces the vanishing-point information in the image into the visual odometry to further reduce rotation drift. The localization accuracy of this method is superior to state-of-the-art VIO and LIO systems. Table 3 summarizes the related works on LIDAR-Visual-IMU loosely coupled systems.
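The depth-assignment step that such LIDAR-assisted VIO systems share can be sketched as follows (the pinhole projection, the function name, and the nearest-neighbour assignment are a simplification of the voxel-map association in [52]):

```python
import numpy as np

def assign_feature_depths(features_uv, lidar_pts_cam, K, max_px_dist=5.0):
    """Give each 2D image feature the depth of the nearest projected LIDAR point.

    features_uv:   (M, 2) pixel coordinates of tracked visual features.
    lidar_pts_cam: (N, 3) LIDAR points already transformed into the camera frame.
    K:             3x3 pinhole intrinsics (an assumed camera model).
    Returns an (M,) array of depths; NaN where no LIDAR point projects nearby.
    """
    pts = lidar_pts_cam[lidar_pts_cam[:, 2] > 0]   # keep points in front of the camera
    proj = (K @ pts.T).T
    uv = proj[:, :2] / proj[:, 2:3]                # perspective division
    depths = np.full(len(features_uv), np.nan)
    for i, f in enumerate(features_uv):
        d2 = ((uv - f) ** 2).sum(axis=1)
        j = d2.argmin()
        if d2[j] <= max_px_dist ** 2:
            depths[i] = pts[j, 2]
    return depths
```

Features that receive a LIDAR depth become well-constrained 3D landmarks in the VIO optimization, which is the main accuracy benefit this class of systems reports.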


**Table 3.** LVI loosely coupled system.



This part of the work becomes more flexible with the introduction of vision: the point cloud stabilizes the depth acquisition of visual features, and the robustness of system positioning is also stronger. However, loose coupling leaves vision and LIDAR independent, and the constraints between their data are not strong enough.

**4. Multi-Sensor Tightly Coupled System Based on LIDAR**

Positioning and mapping techniques are being applied to increasingly complex and changeable scenes with the rapid development of robotics. The loosely coupled systems above have the advantages of real-time operation and low computational complexity, but it is still difficult for them to guarantee accuracy during high-speed motion or in degraded scenarios. With its high-frequency motion response, the IMU has always been an indispensable sensor for mobile robots. For tightly coupled systems, the key issue is how to effectively fuse the IMU with the other odometries.

Tightly coupled systems based on IMU assistance first made breakthroughs in visual odometry [53,54]. In these works, the IMU pre-integration formula, the error-transfer model, and the residual definitions are derived, which had a profound impact on the subsequent development of LIO and VIO; these equations and models became the theoretical basis for the joint optimization of tightly coupled systems. The world coordinate system is defined as W; the robot coordinate system, bound to the IMU coordinate system, is defined as B. The state vector **x** of the robot can be defined as:

$$\mathbf{x} = \begin{bmatrix} \mathbf{R}^{\mathrm{T}}, \mathbf{p}^{\mathrm{T}}, \mathbf{v}^{\mathrm{T}}, \mathbf{b}^{\mathrm{T}} \end{bmatrix}^{\mathrm{T}} \tag{1}$$

where **R** ∈ SO(3) is the rotation, **p** and **v** are the position and velocity vectors of the robot, and **b** is the bias of the IMU. The IMU measurements can be written as:

$$\hat{\omega}_t = \omega_t + \mathbf{b}_t^{\omega} + \mathbf{n}_t^{\omega} \tag{2}$$

$$\hat{\mathbf{a}}_t = \mathbf{R}_t^{\mathrm{BW}} (\mathbf{a}_t - \mathbf{g}) + \mathbf{b}_t^{a} + \mathbf{n}_t^{a} \tag{3}$$

where *ω*ˆ and **a**ˆ represent the angular velocity and acceleration measurements of the IMU; **b** and **n** represent the bias and noise of the gyroscope and accelerometer, respectively; *ω* and **a** are the true values; **g** is the local gravitational acceleration; **R**BW is the rotation matrix from the world coordinate system to the IMU coordinate system; and *t* denotes time. Using the discrete pre-integration method in [53], the relative motion between frames *i* and *j* can be obtained. Δ**v***ij*, Δ**p***ij*, and Δ**R***ij* can be expressed as:

$$\Delta \mathbf{v}_{ij} = \mathbf{R}_i^{\mathrm{T}} \left( \mathbf{v}_j - \mathbf{v}_i - \mathbf{g} \Delta t_{ij} \right) \tag{4}$$

$$\Delta \mathbf{p}_{ij} = \mathbf{R}_i^{\mathrm{T}} \left( \mathbf{p}_j - \mathbf{p}_i - \mathbf{v}_i \Delta t_{ij} - \frac{1}{2} \mathbf{g} \Delta t_{ij}^2 \right) \tag{5}$$

$$\Delta \mathbf{R}_{ij} = \mathbf{R}_i^{\mathrm{T}} \mathbf{R}_j \tag{6}$$

The residuals for the terms are expressed as:

$$\mathbf{r}_{ij} = \begin{bmatrix} \mathbf{r}_{\Delta \mathbf{v}_{ij}}, \mathbf{r}_{\Delta \mathbf{R}_{ij}}, \mathbf{r}_{\Delta \mathbf{p}_{ij}}, \mathbf{r}_{\mathbf{b}_{ij}^{\omega}}, \mathbf{r}_{\mathbf{b}_{ij}^{a}} \end{bmatrix} \tag{7}$$

For the specific derivation of the residuals, please refer to [53,54]. With the pre-integration formula and the definition of the error terms, the coupling between the IMU and the world coordinate system can be decoupled during joint optimization. The system can update the IMU biases so that the IMU data continue to be added and optimized; hence, a measured value closer to the true one is obtained. The residual definition also makes it easy to combine the IMU with the residual terms of other sensor odometries to create a more complete error function. This is the rationale of tightly coupled optimization. The factor graph representation of a tightly coupled system is shown in Figure 5.
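The discrete accumulation behind Equations (4)–(6) can be sketched as follows (a minimal illustration of pre-integration in the spirit of [53], omitting the noise-covariance propagation and bias Jacobians that the full derivation carries):

```python
import numpy as np

def so3_exp(w):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-10:
        return np.eye(3)
    k = w / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K

def preintegrate(gyro, accel, dt, bg=np.zeros(3), ba=np.zeros(3)):
    """Accumulate the relative motion dR, dv, dp between frames i and j from
    bias-corrected IMU samples, expressed in frame i. Gravity is not subtracted
    here: as in Equations (4)-(5), it enters on the estimator side."""
    dR, dv, dp = np.eye(3), np.zeros(3), np.zeros(3)
    for w, a in zip(gyro, accel):
        a_corr = a - ba
        dp = dp + dv * dt + 0.5 * (dR @ a_corr) * dt**2
        dv = dv + (dR @ a_corr) * dt
        dR = dR @ so3_exp((w - bg) * dt)
    return dR, dv, dp
```

Because the accumulated quantities depend on the bias estimates only through `bg` and `ba`, the estimator can re-linearize them when the biases are updated instead of re-integrating raw IMU data, which is what makes pre-integration efficient in joint optimization.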

**Figure 5.** The factor graph for tightly coupled systems.
