1. Introduction
SLAM algorithms are complex methods that allow a robot, without any external system other than its own sensors, to create a map of the environment and locate itself into this map. There are a large amount of non-linearities and imperfections in the mobile robot system (e.g., robot drifts, sensor noise, irregular environment) that could lead the SLAM algorithms to a bad representation of the environment, getting lost on this representation, or spending a considerable amount of computational resources [
1,
2]. Therefore, since these are the main difficulties a robot with a SLAM algorithm must overcome, this work focuses on characterizing, calibrating, and comparing five different 2D SLAM algorithms towards creating a good map, having a good track of its pose (position and orientation), but also spending the less possible CPU and memory while doing so.
For longer than two decades, SLAM has been in the spotlight of many robotics researchers, due its many possible applications such as autonomous driving [
3,
4], search and rescue [
5], autonomous underwater vehicles [
6,
7], and collaborative robotics [
8], which is why, nowadays, there are many different approaches trying to solve the same problem [
9]. Below are shown the most frequent SLAM algorithms approaches.
A first approach to solve the SLAM problem was based on the extended Kalman filters (EKF) [
1]. Kalman filters [
10] are based in the implementation of observers, which are mathematical models of the linearized system that help estimate the behavior of the real system, and in the utilization of an optimal state estimator, that considers white noise in the measurements of the system [
11]. For the SLAM problem, the EKF first predict the robot state (pose and a map represented by a series of landmarks or features) [
1,
12] using a mathematical model of the robot movement and the environment, and then uses the sensor data to correct the prediction. The sensor normally used in this approach is a LiDAR, but there are solutions using sonars or monocular cameras [
3].
Another strategy in the SLAM solution was made by using particle filters. This is a modern approach, but its conceptualization is not since its origins are approximately around 1949 with the Monte Carlo method [
13]. The main methodology difference towards Kalman filters is the data distribution type that this method can deal with. Kalman filters are intended to deal with linear Gaussian distributions [
10,
14], while particle filters can deal with arbitrary non-Gaussian distributions and non-linear process models [
1]. Particle filters in SLAM use a set of particles, each being a concrete guess of the robot state (pose and map) [
9]. As the robot moves into the environment and uses information from the sensors, the filter removes erroneous particles (with low probability of occurrence) and adds new particles close to those with the best probability of occurrence [
15]. After a certain time, the erroneous particles will have been eliminated while the correct ones will be similar between them (similar pose and map estimates) [
16]. The sensor normally used in this approach is a LiDAR [
9]. The algorithms Gmapping [
17] and HECTOR-SLAM [
18] are modern examples of the SLAM particle filter solution.
A more recent approach considers using graph-based methodologies. This proposes to use a graph [
3] whose nodes correspond to the robot’s poses at different points in time and whose edges represent restrictions between the poses. The graph is obtained from observations of the environment or movement actions conducted by the robot. When this graph is assembled, the map can be calculated by finding the spatial configuration of the nodes that is most consistent with the measurements modeled by the edges [
19,
20], this solution is usually obtained with standard optimization methods (e.g., Gauss-Newton, Levenberg–Marquardt) [
19] or with nonlinear sparse optimization [
9]. The sensor normally used in this approach is a LiDAR, but there are solutions using some time-of-flight cameras (also known as RGB-Dept cameras) such as the Microsoft’s Kinect [
9]. The algorithms Cartographer [
21], KARTO-SLAM [
22] and the original RTAB-Map [
23] are modern examples of the graph-based SLAM solution.
There are also modern methods that can be used for 3D SLAM, which can use different sensor types, such as Visual SLAM (vSLAM) that use low-cost cameras (e.g., monocular, stereo, and RGB-Dept cameras) to capture the environment data as the robot navigates, and then extract the relevant information to solve the SLAM problem using the EKF, the particle filter or the graph-based approach [
24]. For example, the latest version of the RTAB-Map SLAM algorithm also supports visual slam [
23]. There is also the Visual-inertial simultaneous localization and mapping (VI-SLAM) algorithm that fuses the information obtained from the camera with the data obtained from an Inertial Measurement Unit (IMU), such as the orientation and the change in the pose, to improve the accuracy of the SLAM solution, that is obtained using a filter or an optimization approach [
25,
26]. Additionally, direct 3D Slam methods exists, that use more modern 3D LiDAR systems, which are applied to improve the SLAM algorithm performance in challenging environments (e.g., smoke in the surroundings, fog or rainy situations) [
27,
28]. Finally, there are methods that combine the vision and LiDAR approaches in order to improve the SLAM performance in cases of aggressive motion, lack of light, or lack of visual features. These algorithms employ 2D or 3D LiDAR sensors and the EKF or the graph-based methodologies to obtain the SLAM solution [
27].
In this paper we focus on the comparison of 2D Slam algorithms with similar pose and map representation. These are based on the previously described SLAM solution approaches, but with different capabilities and strategies to obtain the best possible map and pose adjustment, or even better resources usage optimization. These capabilities are important when dealing with different environments, such as robots with limited resources, which might require an algorithm with the highest resources usage optimization possible, while cases with robots dealing with complex environments might better select an algorithm that has deeply optimized the pose and map calculations. In the subsequent sections the selected algorithms will be described with deeper emphasis.
In this work, four metrics are used for the comparison of 2D Slam algorithms, they were created and processed in MATLAB, and are explained in the following paragraphs.
The map accuracy was measured using k-nearest neighbor method [
29], by measuring the euclidean distance from each of the ground truth points to the nearest map point generated by the SLAM algorithm under test. A mathematical representation of the metric can be found in Equation (
1), where
N is the amount of points to sample,
and
represent the x-coordinate and y-coordinate difference between the ground truth point and the nearest map point generated by the algorithm, respectively. The measurement units used for this metric are centimeters.
Pose tracking accuracy was developed by a set of iterative loops calculating the euclidean distance between the ground-truth pose and the estimated pose [
30]. It can also be represented by equation (
1), but with a modified interpretation of the variables. For this metric
N is the number of poses to sample,
and
represent the x-coordinate and y-coordinate difference between the ground truth pose and the estimated pose generated by the algorithm, respectively. The measurement units used for this metric are meters.
Finally, CPU and memory usage were recorded using Python
psutil library [
31]. These both metrics are mathematically represented by averaging the whole measurements taken during the test run, and their units are percentage of for CPU usage, where a number beyond 100% means it is using more than a single core, and MB for memory usage.
Lastly, there are many SLAM comparison investigations done previously, such as Ref. [
32], which focuses on the algorithms processing time; Ref. [
29] which evaluates map accuracy, and CPU usage; Ref. [
20] which evaluates map accuracy, CPU and memory usage; Ref. [
33] which only measures pose and map accuracy, and Ref. [
34] which analyzes map accuracy and CPU usage.
Based on the reviewed works, there are two differentiating factors of the method proposed in this paper, which puts our investigation a step ahead:
The existing works focus only on map accuracy, pose accuracy, memory or CPU usage, but none of them considers all of them together. Our investigation considers all of them, giving a wider point of view to better characterize, calibrate, and compare the SLAM algorithms.
None of the current methods takes a statistical approach to provide confidence levels on the results obtained. With our investigation we can guarantee with 90% confidence that each condition will happen when the populations are considered. In addition with 95% confidence level that the characterization and calibration of the parameters is the best fit for the ranges tested.
2. Materials and Methods
2.1. Generalities
For all these experiments, since the trials and algorithms were simulated, the only equipment needed was a computer running
Ubuntu 18 with
ROS Melodic, the computer was a server with an
Intel Xeon Silver 4114 2.2 GHz. To simulate the environment, a software called
GAZEBO 11.0.0 release was used to simulate the test environment, while a robot named
TurtleBot 3 Burger was the one selected to be simulated in this work, because of its 2D LiDAR sensor and its differential driving mode, but other configurations can be used, such as a mecanum omnidirectional robot [
35].
2.2. Simulation Needs
Regarding the ROS nodes, there are some nodes that were tailored for our needs, other than simulated robot that can be easily implemented based on the TurtleBot 3 wiki [
36]. The first of them is the so-called
Robot Pose Publisher, which basically reads the data published by GAZEBO and stores every convenient time (20 times per second in this case) the actual pose of the robot [
37], second, a node that monitors the CPU and memory usage by the SLAM algorithm [
38], and last but not least, a node that makes the robot follow a fixed path, to guarantee that all the samples were performed under the same conditions [
39].
2.3. Data Processing Needs
Next,
MATLAB 2020B was used to convert the data provided through
rosbags in a manner that can be easily analyzed and synthesized, the scripts used are
Ground Truth Generator, which takes the environment created through GAZEBO and builds a high resolution 2D version of it [
40]. This well-known
Ground truth plot is then compared to the SLAM algorithm result by using a script that takes advantage of
knn-search method provided by MATLAB [
41], its output is the descriptive statistics of the whole comparison.
There are two other important scripts, in the first it is compared the real pose towards the estimated pose of the robot, and returns some meaningful descriptive statistics about the comparison [
42], and a script that analyzes the CPU and memory usage by the algorithm [
43].
2.4. Data Analysis Needs
The data analysis software used to provide sufficient statistical evidence of the results provided, was Minitab statistical tool version 2018.
2.5. SLAM Algorithms Used
There are five algorithms used, all of them as a 2D algorithm because of the robot sensor limitation wanted, these are described in the following subsections.
2.5.1. Cartographer
Cartographer was created by
Google and released for free worldwide access since October 2016 [
21]. The main idea with this algorithm was to improve the efficiency, by optimizing the way to process the data from particle filters. So, instead of creating a big map, it divides them by shorter sub-maps, which then are inserted on the way, besides a pose optimization, concluding in an error reduction that is carried over from robot pose [
44].
This algorithm is based in the combination of two separated 2D SLAM, one of them working locally, and the other working globally, both using a LiDAR sensor and optimized independently. Local SLAM is based in the collection and creation of sub-maps, one of them is the recollection and alignment of multiple scans with respect to initial position. Sub-maps are created like a dot net with an specific resolution, and with a probability associated that one of its dots is blocked. This probability depends if it was measured previously and if it is kept while more sub-maps are created. Once sub-map is created, it is passed by an algorithm to find the optimal position to match with the rest of the sub-maps, and then extrapolate the rest of them [
45].
The second part of the algorithm, the global SLAM, is focused in the sub-maps feedback. Once these sub-maps are created, all of them have robot poses associated. which are used to improve the maps, making a reduction of the accumulated SLAM error. This is well-known as loop closure [
45].
By using the well-known optimization called
Spare Pose Adjusment (SPA), every time a sub-map is generated, a map-scanner is executed to close the loop and insert the just-created sub-map into the graphic. Below are shown two formulas that determine if a cell is saved as busy, empty, or empty into a map cell [
46].
where:
The intention is to minimize the functional cost of updating the cells value that compose the map.
where:
is the cell value x, softened by the neighbor values.
is the laser reading related to cell.
is the matrix transformation that displaces the point to .
is the posture vector .
This model is configured based on different parameters of the algorithm. Below, in the
Table 1 are shown the main parameters that have incidence in the functionality of the algorithm [
47].
2.5.2. Gmapping
This algorithm is based in the principles described in the particle filter with
Rao-Blackwellization, which makes the math to get the actual posture of the robot, right from the probability given by the information collected in the past; with the help of this posture and the past maps made. It also has the capability of correcting estimations by the odometry and the calculation of the weights and the map [
17].
This is one of the most studied types of SLAM algorithms, it came right after many years of investigation around particle filters, using the Rao-Blackwellized particle filter approach [
48] to solve more efficiently the SLAM algorithm, reducing the number of particles required for the estimation [
48]. In addition, the robot pose uncertainty is greatly decreased in this algorithm. However, it has a higher computational resource requirement, as it usually has an elevated processing time and memory consumption when compared to the EKF filter approach.
The main parameters responsible of the functionality of the algorithm are listed in the
Table 2, according to [
49].
2.5.3. HECTOR-SLAM
This algorithm is named because of its development team, which is
Heterogeneous Cooperating Team Of Robots, an as it is explained in [
18], it was developed because of the necessity of an algorithm for
Urban Search and Rescue scenarios (USAR).
HECTOR-SLAM was developed from a 2D SLAM using a LiDAR sensor that had attached an IMU, this sensor provides the measurements for the navigation filter, and also gives the capability to perform 3D mapping. This is the reason why HECTOR-SLAM can be used into either 2D or 3D strategies.
As shown in [
18], the algorithm uses an occupation grid map. Since LiDAR has 6 degrees of freedom, the scanned points must be transformed to a local coordinates framework using the estimated behavior from the LiDAR. Reason why, using the estimated pose, the scanned points are converted in a point cloud. With this point cloud, it is performed a pre-processing of the data,
HECTOR-SLAM uses a
z axis filtering of the final point, with this only the final points of the
plane are considered.
Regarding the list of parameters of
HECTOR-SLAM, these are defined in the
Table 3, they were taken from [
50].
2.5.4. KARTO-SLAM
KARTO-SLAM is an optimized SLAM algorithm, it was developed by
SRI International’s Karto Robotics with a ROS extension, as an open source code. Its working base lies in the decomposition of Cholesky matrices to minimize the error, giving an optimized robot pose and trajectory [
22].
KARTO-SLAM builds the map by using nodes that save the location points of the robot trajectory and the dataset of sensor measurements. Graph borders are represented by transformations or trajectories between two consecutive poses in the space. when a new node is added, the map will be reprocessed and updated according to the border restriction in the space. These restrictions will be linearized as an scatter graph [
51,
52].
A loop closure condition can be shown if the robot revisits the same point twice or more times in the same run. In other words, a border that connects two nodes with the same world perception is made. Aligning these perceptions produces a virtual transformation. Based on this information it is determined if the algorithm can adjust its estimations and represents the environment with a good enough confidence level [
53].
An optimization is used to calculate the most likely pose from the nodes collected, to get the most probable graph. To use the optimization methods, it is necessary to define an error function between the measurements obtained. Assuming
is the nodes vector in the graph, and
the odometry between nodes
and
. A border
is produced, with an error expression that meets the Equation (
3).
Together with the inverse covariation matrix
, an error function is established, given by the Equation (
4).
The goal is to compute a posture
x, in a way that the Equation (
4) goes to its minimum, in a way that Equation (
5) is accomplished.
At this point it is necessary to describe the algorithm parameters, these are shown in the
Table 4 and were taken from [
54].
2.5.5. RTAB-Map
RTAB-Map comes from
Real-Time Appearance-Based Mapping, it is a graph-based SLAM algorithm, composed by a C++ library and a ROS package. This library is an open source library, and has been improved and extended since its beginning in a way that the closed loop algorithm implements a memory management strategy [
23].
Its processing requires some distributed storage systems, these are short-term memory, work memory, and long-term memory. These all together optimize the localization and mapping for long periods or in wide spaces, because they limit the size of the space processed, so that the loop closure can be executed in a short time lapse [
55,
56].
RTAB-Map implementation is based in a simultaneous processing. For graph-based SLAM, as the map grows, the processing, optimization, assembly, and CPU load also grows. Reason why,
RTAB-Map stablishes a maximum response time at SLAM output, once it has received the sensors data [
23,
57]. As the latest version of the algorithm admits 2D and 3D LiDAR sensors and is capable of performing visual SLAM, the RTAB-Map 2D LiDAR based SLAM option [
23] was used for the tests performed in this work.
The list of parameters of
RTAB-Map are shown in the
Table 5, they were taken from [
58].
2.6. Arenas Used
Three different arenas simulated through GAZEBO were created to test the SLAM algorithms. The differences between them are mainly based on the number of irregularities per area that they have, and also, by the kind of path that they force the robot to follow.
2.6.1. Common Environments Arena
This arena simulates an apartment with a set of rooms and regular geometry objects on it, in every single place there is a quite good number of irregularities, so that the robot can easily handle the SLAM task, see
Figure 1 for reference.
2.6.2. Training Arena
This arena is used for algorithms characterization and calibration, but also for the comparison trials. It is shown in
Figure 2. It can be considered as a middle point between
Common Environments Arena and
Labyrinth Arena, since it has regular figures as
Common environments Arena does, but also has long corridors around the zero coordinate of the arena, as
Labyrinth Arena does. These are the reasons why it is used for characterization and calibration of the algorithms.
The arena tries to challenge the algorithms with some sort of general asymmetry, and with angled obstacles to see how good it is dealing with this kind of obstacles, the arena itself is nothing but a corridor with a center room containing a single obstacle, however, something that can slip past is that the number of irregularities per area is a bit lower than with Common Environments Arena, but higher than Labyrinth Arena.
This arena is also considered for comparison trials, to reflect how good the characterization and calibration was.
2.6.3. Labyrinth Arena
This arena is the hardest of the three for the SLAM algorithms, at glance it shows a labyrinth easy to follow, however, it is a very difficult environment to map by any SLAM algorithm, as this arena challenges the algorithms with more complex obstacles and with long corridors, without any irregularity that could help the algorithms to easily locate themselves and recreate the environment map. These two reasons make this arena the hardest for the test performed in the algorithm comparison. For reference see
Figure 3.
2.7. Trajectories Used
There were fifteen trajectories used, six for
Common environments Arena, six for
Training Arena, and three for
Labyrinth Arena. The main objective of the trajectories is to make the robot follow the arenas in diverse ways, first starting from coordinate zero (geometric center of the arenas), then starting from a non-zero coordinate, and finally following twice the trajectory starting from coordinate zero. All these three trajectories are followed in the forward direction and then in reverse, except for the
Labyrinth Arena, in which reverse trajectories are the same than the forward direction, so only three trajectories were used in this arena. The match between observations and scenario is shown in the
Table 6.
All these trajectories are shown in a simplified version of each arena in the
Figure 4a–c for
Training Arena, in the
Figure 5a–c for
Common Environments Arena, and in
Figure 6a,b for
Labyrinth Arena. In these figures, the yellow arrow indicates the starting point and direction of the forward trajectory, and the tip of the red arrow indicates the finishing point of this path.
2.8. Characterization and Calibration Methods Used
To characterize each of the algorithms, a statistical approach was taken, it is not sensible to the type of SLAM algorithm or sensors used, it is only sensible to the data provided by each of the metrics for the trials, so that even SLAM approaches using sensors other than LiDAR can be calibrated following this method, as long as the map representation is compatible with the application of the knn-search metric, and the robot pose is obtained in matching measurement units. However, a 2D LiDAR sensor approach was taken to match the analysis with actual equipment available, and because these are the most common sensor for the 2D SLAM approach, and especially applicable to low-cost robotic platforms.
The methodology focuses on finding statistical evidence of the effects of the algorithms parameters on the output means of Pose Accuracy, Map Accuracy, CPU usage, and Memory usage, it is important to highlight that this paper have used mean measurements for characterization and calibration, but other descriptive statistical values can be used if wanted.
There are three different stages for calibration, these are described below.
The first stage is only used when the algorithm has a large amount of parameters that must be tuned, here comes into play the first statistical tool, which is a Plackett–Burman experiment, which is a kind of Design of Experiments with a reduced amount of samples, but with the weakness that only takes into account main effects, since main effects are aliased with 2-way interactions (only the effect of each variable by itself can be obtained). With this tool it can be ensured with some confidence level defined when analyzing the experiment results, that a variable has an effect over an output.
Next, the second stage is when calibration comes into play. This part of the process considers only the variables that demonstrated that, by themselves, have an effect over at least one of the four outputs we are measuring, a full-factorial Design of Experiments is used, it makes the combination of all the parameters in the ranges defined by the user, and returns a Pareto chart and an equation, with these both we can determine which is the best combination that reduces the error of the localization and mapping, or reduces the resources usage, also with some confidence level defined when analyzing the experiment results.
Finally, to compare the algorithms, since the data obtained from each run not necessarily shapes a Gaussian’s curve, the central limit theorem is used, population data are considered the whole different tests that can be performed on these arenas with this robot and with each algorithm, so that, calculating the mean of the means we can then compare this value between the values obtained from other algorithms (for a full-data comparison), using the statistical tools that can be used with Gaussian-behaving samples, in this case using hypothesis tests for the mean and the standard deviation of the means (Two-Sample T for the mean, and Two-Sample Standard Deviation for the standard deviation of the means).