#### *4.1. LIDAR-IMU Tightly Coupled System*

Building on the theoretical basis of pre-integration, a large body of LIO-related work has emerged in recent years. One of the early approaches to tightly couple LIDAR and IMU was proposed in LIPS [55], which employs a graph-based optimization framework. In this framework, a closest-point plane representation is proposed: sets of points are parameterized as plane features, and the residual function is formed from the differences between the plane parameters of two frames, which, together with the IMU pre-integration residual term, constitutes the final optimization objective. Deeply influenced by the VINS series, this tightly coupled approach began to appear in the field of LIDAR SLAM and laid a solid foundation for subsequent LIO and LVI tightly coupled systems.
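Schematically, such a joint objective combines plane residuals with IMU pre-integration residuals; the following is only a generic sketch in the spirit of [55], not its exact notation:

```latex
\min_{\mathcal{X}} \;
\sum_{(i,j)} \big\| \boldsymbol{\pi}_{j} - h\!\left(\mathbf{T}_{ij}, \boldsymbol{\pi}_{i}\right) \big\|^{2}_{\boldsymbol{\Sigma}_{\pi}}
\;+\;
\sum_{k} \big\| \mathbf{r}_{\mathcal{I}}\!\left(\Delta\mathbf{R}_{k},\Delta\mathbf{v}_{k},\Delta\mathbf{p}_{k},\mathcal{X}\right) \big\|^{2}_{\boldsymbol{\Sigma}_{\mathcal{I}}}
```

where $\boldsymbol{\pi}$ denotes the closest-point plane parameters, $h(\cdot)$ transforms them between frames using the relative pose $\mathbf{T}_{ij}$, and $\mathbf{r}_{\mathcal{I}}$ is the IMU pre-integration residual over the states $\mathcal{X}$.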

In Gentil's work [56], IMU pre-integration is used to remove motion distortion from the raw point cloud. The method tightly integrates IMU and LIDAR data into a batch on-manifold optimization, which describes the motion during a LIDAR scan based on the LIDAR-IMU extrinsic parameters. The system also considers the first-order dependence of the pre-integration error on the time offset, addressing hardware time asynchrony. Ye [57] proposed a tightly coupled LIDAR-inertial localization and mapping framework, LIOM (LIO-mapping), which jointly optimizes measurements from LIDAR and IMU. A sliding-window model is further used to bound the scale of the optimization. The accuracy of LIOM is better than that of LOAM. However, real-time performance cannot be achieved, since LIOM is designed to process the measurements from all sensors.
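As a rough illustration of the de-skewing idea, the pose predicted by IMU propagation at the start and end of a sweep can be interpolated to each point's timestamp and the points re-expressed in a common frame. The snippet below is a minimal sketch under simplified linear/SLERP interpolation (scipy is used for the rotation interpolation; the function name is illustrative), not the batch formulation of [56]:

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def deskew_scan(points, timestamps, T_start, T_end, t_start, t_end):
    """Re-express every point in the frame of the scan start.

    points        -- (N, 3) LIDAR points in the sensor frame at their capture time
    timestamps    -- (N,) per-point capture times, all within [t_start, t_end]
    T_start/T_end -- 4x4 sensor poses at t_start / t_end (e.g., from IMU propagation)
    """
    # Interpolate rotation with SLERP and translation linearly.
    key_rots = Rotation.from_matrix([T_start[:3, :3], T_end[:3, :3]])
    slerp = Slerp([t_start, t_end], key_rots)
    alphas = (timestamps - t_start) / (t_end - t_start)

    R_t = slerp(timestamps).as_matrix()                                   # (N, 3, 3)
    p_t = (1 - alphas)[:, None] * T_start[:3, 3] + alphas[:, None] * T_end[:3, 3]

    # World coordinates of each point, then back into the scan-start frame.
    pts_world = np.einsum('nij,nj->ni', R_t, points) + p_t
    return (pts_world - T_start[:3, 3]) @ T_start[:3, :3]
```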

Inspired by Hess's work [58], Ding [59] introduced the submap representation of 2D SLAM into 3D LIDAR odometry and added inertial data to establish motion predictions and constraints between frames. In this system, a 3D occupancy grid replaces the 2D occupancy grid so that the full 6-degree-of-freedom pose can be estimated. Finally, the problem is solved iteratively with the Ceres solver [60]. The system innovatively adds an environmental change detection (ECD) module, which can detect whether the known surrounding environment has changed. However, this feature is not used to eliminate the influence of unknown dynamic environments on SLAM.

The authors of LeGO-LOAM combined IMU-related theory and released the follow-up LIO work LIO-SAM [61]. The system builds the LIDAR-inertial odometry on a factor graph, and multiple relative and absolute measurements, including loop closures, are incorporated into the system as factors, as shown in Figure 6. The innovation of LIO-SAM is to marginalize old poses and point clouds and to match scans against local maps instead of a global map, which significantly improves real-time performance. In addition, the system adds a GPS absolute positioning factor [62] to correct long-term drift. However, since feature extraction relies on geometric structure in the environment, this method still cannot operate for long periods in open scenes.
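A factor graph of this kind can be illustrated with a GTSAM snippet (a hypothetical minimal sketch: keys, noise values, and the choice of factors are placeholders and do not reproduce the actual LIO-SAM implementation):

```python
import numpy as np
import gtsam
from gtsam.symbol_shorthand import X

graph = gtsam.NonlinearFactorGraph()
values = gtsam.Values()

prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 1e-3))
odom_noise  = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 1e-1))
gps_noise   = gtsam.noiseModel.Isotropic.Sigma(3, 2.0)

# Prior on the first keyframe pose.
graph.add(gtsam.PriorFactorPose3(X(0), gtsam.Pose3(), prior_noise))
values.insert(X(0), gtsam.Pose3())

# LIDAR odometry factor between consecutive keyframes (relative pose from scan matching).
T_01 = gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(1.0, 0.0, 0.0))
graph.add(gtsam.BetweenFactorPose3(X(0), X(1), T_01, odom_noise))
values.insert(X(1), T_01)

# Absolute GPS factor correcting long-term drift.
graph.add(gtsam.GPSFactor(X(1), gtsam.Point3(1.0, 0.0, 0.0), gps_noise))

# A loop closure is simply another BetweenFactorPose3 between non-consecutive keyframes.
result = gtsam.LevenbergMarquardtOptimizer(graph, values).optimize()
print(result.atPose3(X(1)))
```

In a full system, an IMU pre-integration factor (e.g., gtsam.ImuFactor) would additionally link consecutive pose, velocity, and bias states, and an incremental solver such as iSAM2 is typically preferred over batch optimization for real-time operation.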

The recent LIRO [63] proposed a sensor fusion scheme combining LIO with UWB ranging, which can be deployed with minimal cost and time. The system tightly couples IMU, LIDAR, and UWB data with timestamped robot states in a sliding window to construct a cost function consisting of UWB, LO, and IMU pre-integration residuals. Finally, a factor graph model is used to incrementally marginalize and update the data within the window. However, UWB is limited to specific usage scenarios; the system loses its advantage in large-scale scenes with heavy occlusion.

The tightly coupled LIO system proposed by Chen [64] further refines the front end. An efficient algorithm is proposed to simultaneously extract mixed explicit features from the raw point cloud, including ground, edge, and planar features. The system also introduces the deep-learning-based LPD-Net to generate global descriptors for point clouds, so that loop closure detection can be accomplished by searching the keyframe database; this greatly improves loop closure accuracy. To ensure real-time performance, Li [65] proposed a quantitative evaluation method for point cloud feature constraints and a screening algorithm for key features, striking an effective compromise between accuracy and computational cost. Lv [66] proposed a high-accuracy continuous-time trajectory estimation framework for LIO systems to efficiently fuse high-frequency asynchronous sensor data. The system uses a non-rigid registration method for continuous-time trajectory estimation, and dynamic and static control points are defined to further refine the trajectory. In addition, a two-stage loop closure correction method is proposed to update the loop closure pose and the control points, respectively. However, the computational cost of loop closure is not reported, nor is the uncertainty of motions that may suffer from degradation addressed.
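Continuous-time frameworks of this kind commonly represent the trajectory with SE(3) control poses interpolated by a cumulative B-spline; the following generic form is shown only as an illustration and is not necessarily the exact parameterization of [66]:

```latex
\mathbf{T}(u) \;=\; \mathbf{T}_{i}\,\prod_{j=1}^{k-1}
\exp\!\Big(\tilde{B}_{j}(u)\,\log\!\big(\mathbf{T}_{i+j-1}^{-1}\,\mathbf{T}_{i+j}\big)\Big),
\qquad u = \frac{t - t_{i}}{\Delta t}\in[0,1)
```

where the $\mathbf{T}_{i}$ are control poses, the $\tilde{B}_{j}$ are cumulative basis functions of a spline of order $k$, and querying the spline at any sensor timestamp $t$ yields the pose used to form that measurement's residual.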

**Figure 6.** Factor graph structure of LIO-SAM.

RF-LIO [67] is a dynamic SLAM framework built on LIO-SAM. The system adaptively constructs multi-resolution range images from the point clouds and removes moving objects using the tightly coupled LIDAR-inertial odometry; the LIDAR scans are then matched against submaps. Therefore, it can obtain accurate pose estimates even in highly dynamic environments. LIO tightly coupled systems based on solid-state LIDAR have also been gradually attracting attention. However, LIO degradation still occurs when moving in open scenes for a long time. FAST-LIO [68] proposed an efficient and robust LIO framework based on a tightly coupled iterated Kalman filter for UAV systems. However, the system discards the influence of historical data on the current state, so global pose correction cannot be performed.
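The measurement update at the heart of such filter-based LIO can be sketched as a generic iterated (re-linearized) Kalman update. The snippet below is a simplified Euclidean-state sketch, not FAST-LIO's actual on-manifold, sparsity-exploiting implementation:

```python
import numpy as np

def iterated_ekf_update(x, P, z, h, H_jac, R, iters=5, tol=1e-6):
    """Generic iterated EKF measurement update.

    x, P  -- predicted state (n,) and covariance (n, n) from IMU propagation
    z     -- stacked measurement vector (m,), e.g., LIDAR point-to-plane residual targets
    h     -- measurement model h(x) -> (m,)
    H_jac -- Jacobian of h evaluated at a state -> (m, n)
    R     -- measurement noise covariance (m, m)
    """
    x_i = x.copy()
    for _ in range(iters):
        H = H_jac(x_i)
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        # Re-linearize about the current iterate (Gauss-Newton-like step).
        dx = K @ (z - h(x_i) - H @ (x - x_i))
        x_new = x + dx
        converged = np.linalg.norm(x_new - x_i) < tol
        x_i = x_new
        if converged:
            break
    # Covariance update with the final linearization point.
    H = H_jac(x_i)
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_i, P_new
```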

Tightly coupling the inertial system undoubtedly increases the computational burden while improving accuracy. Most existing algorithms improve computing speed by marginalizing historical data or limiting the size of the local map. The back-end optimization generally builds only the LIDAR pose graph, without including the IMU biases and velocities in the optimized state. These methods achieve excellent results in most scenarios. However, owing to the dependence on geometric features, once the inertial system loses the LO constraint in open, unstructured scenes, the SLAM system suffers serious drift and degradation. Related work on LIDAR-IMU tightly coupled systems is compared in Table 4.


**Table 4.** LIDAR-IMU tightly coupled systems.

With the development and improvement of IMU pre-integration theory, the LO system can establish stronger constraints with the IMU, and the localization accuracy of SLAM systems has been further improved. However, tight coupling involves a large amount of computation, and finding a balance between speed and precision is the main difficulty at this stage.

#### *4.2. LIDAR-Visual-IMU Tightly Coupled System*

Although the study of visual SLAM started relatively late, it quickly became a research hotspot due to its advantages of small sensor size and low cost, and many VIO works have been reported in recent years. Vision complements LIDAR well because it is not constrained by the geometric structure of the scene. Therefore, LVI systems have received increasing attention owing to their stronger robustness in sensor-degraded scenarios.

The strong data association between point clouds and images enables the system to tightly combine multiple effective features in the preprocessing stage, and these features play an important role in the matching and optimization process. LVI tightly coupled systems can be divided into optimization-based and filter-based coupling methods. The optimization-based approach tightly integrates the error models of the individual sensors and reduces the sensitivity to time synchronization by using local maps or sliding windows. This method simultaneously optimizes historical poses and achieves real-time performance through BA. In addition, this paper classifies tight association at the sensor data level as a special category of tight coupling. Although no joint optimization is performed during pose solving, these works strongly correlate the data through preprocessing, so that the key information from each sensor is already contained in a single objective function. They are, however, closed systems, which are less scalable and less compatible with additional sensors. Filter-based approaches use only the sensor data of the current frame and rely on time synchronization among the sensors. Since the influence of historical data on the current pose is not considered, the amount of computation is relatively small and the scalability is relatively good.

LIMO [69] is a representative multi-sensor fusion localization system. It strongly correlates point clouds and images through a variety of data preprocessing methods to achieve a stable and robust system. The system performs foreground segmentation, plane fitting, and ground fitting on the point clouds for different scenes so as to obtain the best depth estimate for each visual feature. It combines 3D-2D PnP (Perspective-n-Point) [70] pose estimation with the 2D-2D epipolar constraint [71] to achieve good localization. Reference [72] is another example of strong data correlation: point and line features are extracted in the image, their positions are obtained by a method similar to LIMO, and a 3D-3D ICP model together with reprojection error functions of points and lines is constructed, achieving more accurate pose estimation. Recent work [73] fuses data from the two sensors in a higher-level space using geometric features co-detected in the LIDAR scan and the image, determining correspondences between 3D lines extracted from the LIDAR scan and 2D lines detected in the image. Wang [74] proposed DSP-SLAM by combining object detection with SLAM. This work uses a DeepSDF network to generate object shape vectors and 7-DoF poses from keyframe point cloud and image data. Sparse point clouds and image segmentation results are used as observations to minimize surface and depth rendering loss functions. Object reconstructions and pose updates are added to the ORB-SLAM2-based BA factor graph to simultaneously optimize camera poses, map points, and object poses. These works contribute meaningful data association, which makes the whole system more robust. However, they clearly lack tight coupling in the optimization process; combining raw-data correlation with joint error optimization would lead to a stronger system.
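A minimal sketch of this style of data association is shown below: projected LIDAR points around an image feature are used to fit a local plane, the feature's viewing ray is intersected with that plane to recover its depth, and the resulting 3D-2D correspondences can then be fed to a PnP solver such as OpenCV's solvePnPRansac. Function and variable names are illustrative, and this is not the exact LIMO pipeline:

```python
import numpy as np

def feature_depth_from_lidar(feat_uv, lidar_uv, lidar_xyz, K, radius_px=5.0):
    """Assign a 3D position (camera frame) to an image feature from nearby LIDAR points.

    feat_uv   -- (2,) pixel location of the visual feature
    lidar_uv  -- (M, 2) pixel locations of LIDAR points projected into the image
    lidar_xyz -- (M, 3) the same LIDAR points expressed in the camera frame
    K         -- (3, 3) camera intrinsic matrix
    """
    d = np.linalg.norm(lidar_uv - feat_uv, axis=1)
    near = lidar_xyz[d < radius_px]
    if len(near) < 3:
        return None

    # Fit a local plane to the neighbours (smallest singular vector = plane normal).
    centroid = near.mean(axis=0)
    _, _, vt = np.linalg.svd(near - centroid)
    normal = vt[-1]

    # Intersect the feature's viewing ray with that plane.
    ray = np.linalg.inv(K) @ np.array([feat_uv[0], feat_uv[1], 1.0])
    denom = normal @ ray
    if abs(denom) < 1e-9:
        return None
    return (normal @ centroid) / denom * ray

# With 3D-2D correspondences collected this way (hypothetical arrays pts3d, pts2d),
# a pose could be estimated with, e.g.:
#   ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
```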

On the other hand, the mature framework of LIO systems paves the way for LVI tightly coupled systems, which has led to a large number of optimization-based LIDAR-Visual-IMU tightly coupled systems in the past two years. GR-Fusion [75] uses a camera, IMU, LIDAR, GNSS, and chassis encoders as its main sensors to build a factor graph model in a sliding window. The LIDAR factor, visual factor, IMU factor, and odometry factor are added as the primary constraints of the factor graph. Meanwhile, local constraints are tightly coupled with GNSS constraints to optimize the global state of the robot. The system can detect sensor degradation in real time and flexibly switch among multiple working modes, which makes it suitable for a wide range of scenarios. LVIO-Fusion [76] adopts a similar architecture, but uses a stereo camera as the visual sensor. It innovatively proposes different optimization strategies for straight-line and turning motion, and introduces a reinforcement learning network to adaptively adjust the sensor weights in different scenes. LVI-SAM [77] is the latest work by the authors of LeGO-LOAM and LIO-SAM. The system consists of a VIS (Visual-Inertial System) and an LIS (LIDAR-Inertial System); the VIS provides a pose prior for the LIS, while the LIS provides pose estimates and accurate feature depth for VIS initialization. However, this system does not consider marginalization of the LIO subsystem or the problem of timestamp synchronization.

Some recent odometry systems also employ a tightly coupled approach to obtain low-drift pose estimates. Super Odometry [78] and MetroLoc [79] both adopt an IMU-centric LVI architecture. The system consists of three parts: IMU odometry, VIO, and LIO. The observations provided by VIO and LIO constrain the IMU biases; in turn, the constrained IMU odometry provides predictions for VIO and LIO to achieve coarse-to-fine pose estimation, as shown in Figure 7. The system can also be extended with GPS and wheel odometry and is robust to sensor degradation. Wisth [80] proposed a tightly coupled system based on LIDAR and a stereo camera. The factor graph optimization problem is composed of an initial prior factor, visual factors, line factors, plane factors, and IMU pre-integration factors, and is solved with GTSAM [81]. The system shares a unified representation for vision-based point features and point-cloud-based line and plane features; reprojection error functions are then defined by parameterizing the different features. In addition, based on the camera timestamps, the system splits and merges adjacent LIDAR scans to realize hard time synchronization at the software level. There is no doubt that these works achieve good real-time localization. However, a pure odometry system often discards historical data and focuses only on current or local observations, losing the correlation and optimization of global data. Therefore, both an excellent front-end odometry and a reasonable back-end optimization are necessary for a SLAM system.

**Figure 7.** Factor graph structure of Super Odometry.

Filter-based methods also play an important role in multi-sensor fusion. For joint state estimation, many methods use the EKF or the MSCKF framework [82]. Yang [83] used the MSCKF to tightly couple planar features from RGB-D sensors, IMU measurements, and visual point features within 3.5 m. To limit the size of the state vector, the system linearly marginalizes most of the point features and retains only a few point features with plane-enhanced constraints in the state vector as SLAM features. LIC-Fusion, proposed by Zuo [84], adopts the MSCKF fusion framework and tightly combines IMU measurements, extracted LIDAR edge features, and sparse visual features. Subsequently, in the follow-up work LIC-Fusion 2.0 [85], the authors introduced a sliding-window-based plane-feature tracking method to efficiently process 3D LIDAR point clouds in real time. R2LIVE [86] is a tightly coupled work based on solid-state LIDAR. The system combines an error-state iterated Kalman filter front end with a factor-graph-based sliding-window optimization that refines the visual poses and landmark estimates. It achieves high accuracy and robustness in harsh scenarios such as indoors, outdoors, tunnels, and high-speed motion. These methods are fast and computationally inexpensive, but they are sensitive to time synchronization: out-of-order measurements can degrade or break the filter, so a dedicated sorting mechanism is required to guarantee that the measurements of the different sensors are processed in the correct order.
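One simple way to realize such a sorting mechanism is a priority queue keyed on timestamps that only releases measurements older than a latency margin, so late-arriving data from slower sensors can still be inserted in order. The class and threshold below are a minimal illustrative sketch, not taken from any of the cited systems:

```python
import heapq

class MeasurementBuffer:
    """Releases sensor measurements to the filter strictly in timestamp order."""

    def __init__(self, max_latency=0.1):
        self._heap = []            # entries: (timestamp, insertion_id, sensor, payload)
        self._id = 0               # insertion id breaks ties so payloads are never compared
        self.max_latency = max_latency

    def push(self, timestamp, sensor, payload):
        heapq.heappush(self._heap, (timestamp, self._id, sensor, payload))
        self._id += 1

    def pop_ready(self, now):
        """Yield measurements older than `now - max_latency`, oldest first."""
        while self._heap and self._heap[0][0] <= now - self.max_latency:
            t, _, sensor, payload = heapq.heappop(self._heap)
            yield t, sensor, payload

buf = MeasurementBuffer()
buf.push(0.02, "imu", None)
buf.push(0.01, "lidar", None)      # arrives late but is older
for t, sensor, _ in buf.pop_ready(now=0.2):
    print(t, sensor)               # 0.01 lidar, then 0.02 imu
```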

In this paper, LVI tightly coupled systems are divided into three categories: strong data correlation, nonlinear optimization-based tight coupling, and state filtering. Among them, the optimization-based tightly coupled front end is the mainstream implementation. Table 5 compares related work on LIDAR-Visual-IMU tightly coupled systems.


**Table 5.** LVI tightly coupled systems.



The emergence of complete systems in which LIDAR, vision, and IMU cooperate and complement each other is a milestone for multi-sensor fusion SLAM. The integration is not limited to these three sensors: wheel/leg odometry and GNSS have also been effectively integrated. As before, the increase in computational complexity is one of the toughest problems. In addition, several situations still need better handling, such as dynamic environments, unstructured environments, and rain and snow.

#### **5. Performance Evaluation**

#### *5.1. SLAM Datasets*

Evaluating the performance of SLAM algorithms is rarely possible without open-source datasets. The mobile carriers for research and application of 3D LIDAR SLAM include unmanned ground vehicles, unmanned ships, and unmanned aerial vehicles; however, current LIDAR point cloud datasets mainly target autonomous driving scenarios. Data collection in outdoor scenes is complex and cumbersome, involving time synchronization and calibration of the coordinate frames among the various sensors. Public datasets save data preparation time for algorithm research, and the sequences and benchmarking frameworks they provide also facilitate algorithm development.

The current public LIDAR-based datasets in the field include the following. The KITTI dataset [87] is currently the largest international evaluation dataset for autonomous driving scenarios and the most commonly used dataset in academia. The Waymo dataset [88] is an open-data project of the autonomous driving company Waymo. The PandaSet dataset [89] targets the development of safe and controllable autonomous driving in complex environments and extreme weather. The Oxford RobotCar dataset [90] is a public dataset released by the Oxford University robotics laboratory. The UrbanNav dataset [91] provides a challenging data source to the community to further accelerate the study of accurate and robust positioning in urban canyons. It includes complex and dynamic urban road environments as well as enclosed tunnel environments, and it provides ground-truth poses from a GNSS system as a reference. Compared with the commonly used KITTI dataset, the collection environment of UrbanNav is closer to the complex environments encountered in autonomous driving. The UrbanNav team also provides an overview of LIDAR odometry [92] that uses this dataset to evaluate open-source point-cloud-based and feature-based algorithms.

Complex urban scenes magnify the advantages and disadvantages of SLAM algorithms. Therefore, this paper uses the UrbanNav dataset to evaluate several open-source algorithms in the following section.
