1. Introduction
Autonomous navigation requires a mobile robot to localize itself autonomously in a dynamic environment. To obtain the current location on a known map, various techniques and algorithms have been developed. Normally, the localization problem is solved at three levels: position tracking, global localization, and the robot kidnapping problem [1]. As far as practical application is concerned, besides solving the three sub-problems, real-time performance and accuracy also need to be considered. Approaches applying various sensor modalities are challenged by specific problems. On the one hand, Wireless-Sensor-Network (WSN)-based methods utilizing Ultra Wide Band (UWB) [2], WiFi [3,4], and Bluetooth Low Energy (BLE) [5] can localize the robot with the Received Signal Strength Indicator (RSSI), which is unique at a specific location. Such methods rely on Access Point (AP) deployment, and their accuracy is sensitive to surrounding noise [6,7]. On the other hand, exteroceptive-sensor-based methods determine the robot's state (including position and orientation) by perceiving the surroundings with sensors mounted on the robot. Perceptual data are fused with algorithms such as the Kalman Filter (KF) and its extensions (the extended and unscented Kalman filters [8]), grid localization, and the Particle Filter (PF). Among them, the PF is a widely used method with a multi-modal probability density function (pdf); it is a prevalent approach to nonlinear and non-Gaussian state estimation.
AMCL is a typical PF developed from Monte Carlo Localization (MCL) to incorporate various sensor modalities. MCL is a recursive form of the Bayesian Filter (BF). Since the probability of the true pose is described with a fixed number of particles, real-time performance in a large area is challenged, and MCL cannot deal with the robot kidnapping problem. Hence, AMCL was proposed by fusing an adaptive technique with MCL, so that the particle number varies when solving different sub-problems. For instance, position tracking is resolved with a minimum of particles, and when it comes to global localization or the robot kidnapping problem, more particles are applied to cover potential states [9]. However, conventional AMCL faces the robot kidnapping problem during long-term operation in symmetrical environments with repetitive features. Taking the environment shown in Figure 1 as an example, the algorithm models the motion with odometers and perceives the surroundings with LiDAR. When the accumulated errors of the odometers are too high, the re-localization mechanism can be triggered by a sudden turn or a dynamic environment. Since the adaptive mechanism of conventional AMCL determines the particle number considering only the LiDAR measurement likelihood [10,11,12], this directly brings up the particle premature convergence problem in symmetrical environments.
By investigating the underlying relationship between the LiDAR measurement and the algorithm, this paper reveals the causes of catastrophic failures in global localization and the robot kidnapping problem in a specific environment. General measurements are shown in the two conditions: since scans are matched against the existing map, the limited distance measurements of LiDAR produce several plausible matching results. For instance, the circles represent the robot's real pose, and the perceived surroundings are demonstrated in Condition 1 and Condition 2, respectively. Then, to correct the probabilistic estimation, the observation is matched against the known map. In both conditions, there are several confusing matching results due to the repetitive features in the environment. The robot cannot determine which location is correct until a distinguishable feature is observed. During the autonomous navigation mission in Condition 1, the motion is chaotic before arriving at the T-junction, and there is no guarantee that the perception is correct, so matching the LiDAR observation to the map is not reliable [13]. When the robot is in a long corridor, as demonstrated in Condition 2, a similar local submap can be obtained at many locations, as shown by the yellow stars. Therefore, once the right particle is eliminated during this stage, the algorithm is challenged by the problem of particle degeneracy. To obtain better particle diversity, resampling methods were developed in previous works. Lin et al. proposed to improve FastSLAM with adaptive bat-inspired resampling [14]. Manizheh et al. preserved particle diversity by optimizing the proposal distribution [15]. More resampling techniques were introduced in [9,16,17]. In previous works, approaches have been proposed to solve the robot kidnapping problem, but accuracy and long-term performance were not considered [18,19].
To address the above challenges, this paper presents a low-cost and convenient method to deal with the global localization problem and the robot kidnapping problem in a specific environment. The conventional LiDAR-based AMCL [10] is improved to incorporate discrete visual observations efficiently. The contributions of this work are as follows:
- (a)
The resampling stage of conventional AMCL is improved to reposition particles in regions determined by visual sensors. The discrete visual localization results are tightly coupled with the filter. Compared to LiDAR-based AMCL, the proposed method is more robust in repetitive environments, while its efficiency is preserved.
- (b)
The correction mechanism can be triggered automatically without interrupting the continuous operation of the robot. Besides achieving high accuracy, the robot kidnapping problem can also be avoided and solved. Since the visual part runs in parallel with AMCL, position tracking can be maintained even when the visual part fails temporarily or landmarks are not available.
- (c)
Abundant experiments were implemented to validate that the framework achieves high accuracy and efficiency during long-term operation. The improved AMCL can also be generalized to other systems with different forms of data.
The remainder of this paper is organized as follows. Section 2 introduces related works. Section 3 introduces the improved AMCL assisted by the absolute method. Experiments are presented in Section 4. Finally, Section 5 concludes this work.
2. Related Works
The indoor localization problem has been explored extensively in previous works, and algorithms were developed separately from different perspectives, including accuracy [20,21], global localization [22], and the robot kidnapping problem [23]. WSN-based methods have mainly been improved for accuracy. Sandra et al. resolved the localization problem in multi-room indoor environments with UWB, combining trilateration and fingerprinting to cope with the Received Signal Strength Indicator (RSSI); the specific AP deployment strategy limits the method to small environments [2]. Yu et al. combined BLE with other sensors for 3D localization [24]. Such RSSI-based methods are vulnerable to fluctuating signals and AP deployment. Aiming to improve accuracy, Ullah et al. provided a reference study on the effect of various factors [25]. Bharadwaj and Koul proposed to mitigate the unreliable nature of the RSSI, which is easily plagued by surrounding noise [26]. It can be seen that the above methods are limited by AP deployment and signal stability; this nature makes them unsuitable for the specific environment shown in Figure 1.
To obtain a convenient localization framework, exteroceptive sensors such as LiDARs and cameras have been widely used. To be exact, performance has been evaluated by balancing accuracy, the capacity to solve global localization, and the robot kidnapping problem. Visual–LiDAR-fusion-based methods have been extensively applied to SLAM [27]. Li et al. proposed a high-precision positioning method to improve localization accuracy by fusing an inertial measurement unit (IMU), a monocular camera, and LiDAR. The system consisted of IMU–3D laser data filtering and IMU–monocular data filtering, and the federated filtering for the multi-sensor integrated navigation system was based on the KF. The approach was implemented to track the robot's position [22]. In [20,21], the accuracy of LiDAR odometry and visual odometry was studied. Though these methods can localize the robot continuously, the accumulated error and computational burden make them unsuitable for long-term operation.
Wang et al. utilized a 2D LiDAR to solve the global localization problem in a probabilistic way [28]. Their algorithm uses a qualitative motion model, and its capability to cope with global localization was verified in experiments, while the convergence speed and the robustness to dynamic objects were left to be improved. Guan et al. incorporated Kullback–Leibler Divergence (KLD) sampling with the Random Finite Set (RFS) model into MCL. A Doppler–azimuth radar was applied to provide the global pose, and their method was tested in an area of 12 m × 12 m with seven landmarks [29]. Chen et al. proposed a heuristic Monte Carlo method for local and global localization. The prior map obtained by radar was processed by an image-processing algorithm, then the Discrete Hough Transform (DHT) was implemented to match the visual features. During the localization stage, the particles obtained from the matching results were used as the heuristic distribution of MCL [30].
The kidnapping problem, which is considered the most difficult one, is usually resolved in two steps: kidnap detection and recovery. Chien et al. worked on the global localization problem in symmetrical environments and investigated the premature convergence of MCL in such settings. Multi-Objective Particle Swarm Optimization (MOPSO) was introduced to detect the failure and to resample the particles with balanced weights, and the Pareto front was incorporated with MCL. The algorithm was proven with simulations, while the run-time performance, which is related to practical applications, was left to be improved [19]. Campbell and Whitty aimed to detect all kidnapping events during the autonomous navigation of a robot. Various metrics were evaluated in a combined detector, then the most suitable metrics were determined by optimal thresholds. The detection accuracy relied on the classification of the kidnapping problem and the chosen thresholds [31]. Su et al. solved the kidnapping problem of AMCL by integrating a LiDAR and a camera. To retrieve pose proposals directly, the scene was represented by a grid map and Gist descriptors. Image retrieval and keyframe clustering were applied to refine the robot's pose in the LiDAR likelihood framework. Continuous localization was completed by the probabilistic method, and visual localization was applied to supply global poses for position tracking [23]. Yilmaz and Temeltas improved Self-Adaptive Monte Carlo Localization (SAMCL) to make the algorithm suitable for AGVs equipped with 2D or 3D LiDARs; their experiments showed that when unexpected collisions happened, recovery needed 14–16 s for global localization [12]. Hence, a balanced approach capable of solving practical autonomous localization in the specific environment shown in Figure 1 is still needed.
3. Proposed Method
The proposed method locates the robot globally by incorporating visual observation into the filter calculation. The architecture mainly includes three parts, as shown in Figure 2. Firstly, the visual localization calculates the discrete global pose with the aid of landmarks. To improve the efficiency and robustness of the image processing, Histogram of Oriented Gradients (HOG) descriptors [32] and k-Nearest Neighbor (KNN) [33] are applied for real-time descriptor detection. Once a target is detected, the corresponding global location can be obtained, and Perspective-n-Point (PnP) is implemented to calculate the pose relative to the markers, whose dimensions are known. After that, the KLD sampling part is exploited to incorporate the visual localization results in a probabilistic manner. Specific particles are generated based on the visual localization and added to the existing particle set maintained by the conventional AMCL, so the threshold of the particle number can be set more reasonably to prevent the premature convergence problem in symmetrical environments. Therefore, the uncertainty can be evaluated considering both visual and LiDAR measurements. The correction mechanism is triggered once the Euclidean distance between the conventional AMCL estimation and the visual localization result is higher than the threshold.
3.1. Improved AMCL Framework
The parameters utilized to introduce the algorithm are listed in the Nomenclature. The improved AMCL can be divided into four steps: initialization, updating the sensor observation, resampling, and estimation considering visual perception. The framework is shown in Algorithm 1.
Algorithm 1 Improved probabilistic algorithm
Input: previous particle set, control input, and sensor observations (Initialization)
Output: updated particle set and the threshold of the particle number [10]
1: propagate each particle with the sample motion model;
2: weight each particle with the likelihood field model (Update);
3: add the weighted particles to the temporary set (Resampling);
4: if the distance d between the AMCL estimate and the visual localization result exceeds the threshold then
5: generate a specific particle set centered on the visual localization result (Incorporate visual observation);
6: merge the specific particles into the temporary set;
7: else add random particles;
8: end if
9: add particles to the final set with probability proportional to their weights
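The loop described by Algorithm 1 can be sketched in Python. This is an illustrative reduction, not the paper's implementation: the motion and likelihood models are passed in as callables, poses are 2D points, and the 0.3 m Gaussian spread for the injected particles is an assumed value.

```python
import math
import random

def improved_amcl_step(particles, u, z, visual_pose, d_th, motion_model, likelihood):
    """One iteration: propagate, weight, optionally inject visual particles, resample.

    particles: list of (pose, weight); visual_pose: (x, y) from the marker pipeline.
    """
    # Steps 1-2: propagate each particle and weight it against the LiDAR observation.
    moved = [motion_model(p, u) for p, _ in particles]
    weighted = [(p, likelihood(z, p)) for p in moved]
    # Step 3: normalize the importance weights.
    total = sum(w for _, w in weighted) or 1.0
    weighted = [(p, w / total) for p, w in weighted]
    # Current AMCL estimate: weighted mean of the particle positions.
    est = tuple(sum(p[i] * w for p, w in weighted) for i in range(2))
    d = math.dist(est, visual_pose[:2])
    # Steps 4-8: if the estimate drifts from the visual fix, inject particles
    # around the visual localization result (Gaussian spread is illustrative).
    if d > d_th:
        n_extra = len(weighted)
        extra = [((random.gauss(visual_pose[0], 0.3),
                   random.gauss(visual_pose[1], 0.3)), 1.0 / n_extra)
                 for _ in range(n_extra)]
        weighted = weighted + extra
    # Step 9: resample proportionally to weight, back to the original set size.
    poses = [p for p, _ in weighted]
    ws = [w for _, w in weighted]
    resampled = random.choices(poses, weights=ws, k=len(particles))
    return [(p, 1.0 / len(resampled)) for p in resampled]
```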
Initialization: As a recursive method, the algorithm is initialized with assumptions. The conventional AMCL distributes particles on the whole map with an average weight for global localization, then the current pose needs to be input or calculated for position tracking. For such occasions, the robot needs to move a little to obtain the surroundings’ information. However, in the symmetrical environment shown in
Figure 1, the initialization is time-consuming and confusing. Aided by camera observation, the visual localization results can be applied to determine the initial pose for position tracking.
Update and resampling: In detail, the control input u_t and the observation z_t are recursively utilized to update the state for tracking. Assume there are M particles in the initial set; the robot's pose is calculated in a probabilistic manner. The weight w_t^i of the set X_t is calculated by Equation (1). x_t^i denotes the pose of the ith particle at the tth moment, as shown in Equation (2), and it is calculated based on the previous state and u_t. Then, the local measurements of the LiDAR are matched against the known map, which is represented by the importance factor w_t^i calculated by Equation (3). In this stage, the samples with low w_t^i are eliminated. In an environment with repetitive features, or even one that is symmetrical in structure, the wrong samples share measurements similar to the correct one. When the algorithm becomes trapped in a wrong local optimum, premature convergence is induced.
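The elimination of low-weight samples is typically done with a low-variance (systematic) resampler, which draws all particles from a single random offset and so preserves diversity better than independent draws. A minimal sketch, with illustrative names, under the assumption of nonnegative weights:

```python
import random

def low_variance_resample(poses, weights):
    """Systematic resampling: one random offset, evenly spaced pointers."""
    n = len(poses)
    total = sum(weights)
    step = total / n
    r = random.uniform(0.0, step)   # single random start in [0, total/n)
    out = []
    c = weights[0]                  # running cumulative weight
    i = 0
    for m in range(n):
        u = r + m * step            # evenly spaced pointer into the CDF
        while u > c:
            i += 1
            c += weights[i]
        out.append(poses[i])
    return out
```

Because consecutive pointers are only `total/n` apart, a particle with weight far below average is dropped deterministically, which matches the elimination behavior described above.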
Incorporating visual observation: To maintain better diversity, the visual localization results are considered as a reference. When the Euclidean distance d between the visual localization result and the current AMCL estimation exceeds the threshold, specific particles are generated to improve the reliability of the measurement before evaluating the uncertainty. Theoretically, by distributing the particles in a wider area, one can obtain better diversity. To increase the weight of the visual-based localization, the number of specific particles is generally larger than the existing set. Then, assigned the weight calculated by Equation (4), the specific particles are preserved in the set. For instance, when the area is defined as a square whose radius is r, the specific particles are generated centered on the visual localization result with a Gaussian distribution.
3.2. Visual-Based Global Localization
Aside from unexpected collisions and manual intervention, the symmetrical nature of the environment is also a non-negligible factor that results in high measurement uncertainty. As plausible particles share similar weights in a long corridor with few features, once the right pose is eliminated, the algorithm may keep track of the wrong pose. Vision sensors with redundant information can solve such ambiguities. The RoboMaster S1 vision markers with known dimensions were deployed in the environment, as shown in
Figure 3. Once a marker is perceived, the robot’s absolute pose relative to the marker can be calculated. Three stages are included: feature extraction, descriptor recognition, and the PnP algorithm.
The observed images are described with the RGB color palette, consisting of Red (R), Green (G), and Blue (B) channels. Then, grayscale conversion and a gamma transform are applied to extract the HOG descriptors efficiently. All the RGB images are also described with the Hue, Saturation, Value (HSV) parameters. Assuming that G_x and G_y are the gradients of the horizontal and vertical edges, the amplitude M and angle θ can be expressed by the formulas in Equations (5) and (6).
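The gradient computation can be sketched with NumPy, assuming the standard HOG formulation (magnitude M = sqrt(G_x² + G_y²), unsigned angle θ = arctan(G_y/G_x) folded into [0°, 180°)); the [-1, 0, 1] central-difference kernels and function name are illustrative:

```python
import numpy as np

def gradient_magnitude_angle(gray):
    """Per-pixel gradient amplitude and unsigned orientation of a grayscale image."""
    gray = gray.astype(float)
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    # Central differences with [-1, 0, 1] kernels (borders left at zero).
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]
    mag = np.sqrt(gx ** 2 + gy ** 2)                  # amplitude M
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0      # unsigned angle in [0, 180)
    return mag, ang
```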
Then, a window I, which consists of cells, is applied to find the gradient, and each cell has a local histogram of the 1D gradient directions. To compute the cell gradient histogram, the magnitude M and the angle θ are taken as the vote weight and the bin axis of the histogram, respectively, and each pixel in the cell is mapped accordingly. The detected HOG features with different cell sizes are presented in Figure 4.
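The per-cell voting can be sketched as follows: each pixel contributes its magnitude to the orientation bin containing its angle. This is a simplified sketch (hard binning, no bilinear interpolation between bins), with an assumed 9-bin layout over 180°:

```python
import numpy as np

def cell_histogram(mag, ang, n_bins=9):
    """Orientation histogram of one cell: magnitude-weighted votes into angle bins."""
    bin_width = 180.0 / n_bins
    idx = (ang // bin_width).astype(int) % n_bins   # hard bin assignment
    hist = np.zeros(n_bins)
    for b in range(n_bins):
        hist[b] = mag[idx == b].sum()               # magnitude-weighted vote
    return hist
```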
The extracted descriptors are recognized by KNN. Denote the training dataset as T. Each marker is labeled with a global position on the grid map and a signature, and the finite set of signatures defines the domain of the instances. The markers applied are presented in Figure 3. The images of the markers observed from different views are preserved in the training dataset. KNN obtains the neighbor set by calculating the similarity between the query descriptor and the training samples with Equation (7). In the experiment, each marker was sampled 500 times, and the accuracy peaked at a particular value of k.
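The recognition step can be sketched as a plain k-NN vote over descriptor vectors. The distance function stands in for the similarity of Equation (7); Euclidean distance is an assumption here, and all names are illustrative:

```python
from collections import Counter

def knn_classify(query, samples, labels, k=5):
    """Label a query descriptor by majority vote among its k nearest samples."""
    def dist(a, b):
        # Euclidean distance between two descriptor vectors (assumed metric).
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    ranked = sorted(range(len(samples)), key=lambda i: dist(query, samples[i]))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]
```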
With a recognized acquisition, the corresponding marker can be determined, as shown in Figure 5. To calculate the camera's relative pose in the marker's coordinate system, the four 3D corners of the marker's outer frame are extracted. To recover the depth information of the monocular camera, the markers and the camera are mounted at the same height. Then, the distance d between the camera and the marker can be obtained by Equation (8). Finally, the problem is formulated and solved as a P4P problem [34].
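With the marker dimensions known and the camera at marker height, the depth can be sketched from pinhole similar triangles: a marker of physical width W that appears w pixels wide under focal length f (in pixels) lies at distance d ≈ f·W/w. This stands in for the role of Equation (8) and is an assumption, not the paper's exact formula; the values below are illustrative:

```python
def marker_distance(focal_px, marker_width_m, pixel_width):
    """Pinhole depth estimate: d = f * W / w (f and w in pixels, W in meters)."""
    return focal_px * marker_width_m / pixel_width
```

In practice the full 6-DoF pose would come from solving the P4P problem on the four extracted corners (e.g., with a PnP solver); this one-liner only recovers the range used for the same-height case.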
3.3. Correction Mechanism with Visual Localization Results
LiDAR-based AMCL corrects the pose according to the matching result of the likelihood field, whereas the proposed approach corrects the pose with both LiDAR and visual measurements. From the perspective of premature convergence, the solution is analyzed in terms of uncertainty assessment and the calculation of the particle number.
The uncertainty level is assessed from two aspects. The general method is to monitor the measurement probability. Since the importance factor is also formulated based on the measurement model, the average weight of the particle set can be considered a stochastic estimation of the measurement probability, as described in Equation (4). Meanwhile, a short-term average w_fast and a long-term average w_slow are maintained to relieve the impact of accidental sensor noise. Their decay rates satisfy 0 < α_slow ≪ α_fast. When the robot runs smoothly, the two factors satisfy w_fast ≈ w_slow. Occasions such as the robot being manually moved to another room result in a sudden change of the average weight, and then w_fast ≪ w_slow. This means the perceived surroundings cannot be matched to the local map estimated by the algorithm. To cope with the high uncertainty, more particles are distributed to cover the possible areas. For example, in the initialization and global localization stages, the uncertainty is high and more particles are needed to cover the whole map, while few particles are applied for position tracking with a lower uncertainty. The algorithm is unable to correct the estimation result directly. On high-uncertainty occasions, the particle number threshold is raised and more random particles are supplied for matching. However, there is no guarantee that this can relocate the particles successfully in symmetrical environments. In comparison, the deviation can be detected and corrected directly with the help of visual observations. With specific particles added, the visual-based locations are incorporated with grid localization by adjusting the belief interval.
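The short- and long-term averages follow the augmented-MCL recipe: both track the mean particle weight with exponential smoothing, and random particles are injected with probability max(0, 1 − w_fast/w_slow). A minimal sketch, with illustrative decay-rate values:

```python
class UncertaintyMonitor:
    """Augmented-MCL style monitor: w_slow/w_fast averages of the mean weight."""

    def __init__(self, a_slow=0.05, a_fast=0.5):
        # Decay rates must satisfy 0 < a_slow << a_fast (values are illustrative).
        self.a_slow, self.a_fast = a_slow, a_fast
        self.w_slow = self.w_fast = None

    def update(self, w_avg):
        """Feed the current average particle weight; return injection probability."""
        if self.w_slow is None:                      # first observation seeds both
            self.w_slow = self.w_fast = w_avg
        self.w_slow += self.a_slow * (w_avg - self.w_slow)
        self.w_fast += self.a_fast * (w_avg - self.w_fast)
        # Smooth running: w_fast ~ w_slow, probability ~ 0.
        # Sudden weight drop (e.g., kidnapping): w_fast << w_slow, probability > 0.
        return max(0.0, 1.0 - self.w_fast / self.w_slow)
```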
The particle number is calculated by the KLD [10]. Since a high uncertainty level means a dispersive distribution, correspondingly, more state spaces (bins) are occupied by particles. Assume k represents the number of nonempty bins; the required particle number can be calculated by Equation (9). With the error bound ε and the confidence parameter δ, the value of the quantile z_{1-δ} can be obtained from the standard normal distribution.
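This bound can be sketched with the standard KLD-sampling formula, M = (k−1)/(2ε) · [1 − 2/(9(k−1)) + sqrt(2/(9(k−1))) · z_{1−δ}]³, which is assumed here to be the role of Equation (9); the default ε and z values are illustrative:

```python
import math

def kld_particle_bound(k, eps=0.05, z=2.326):
    """Required particle count for k nonempty bins (z = z_{1-delta}, here delta=0.01)."""
    if k <= 1:
        return 1                       # degenerate case: everything in one bin
    a = 2.0 / (9.0 * (k - 1))
    return math.ceil((k - 1) / (2.0 * eps) * (1.0 - a + math.sqrt(a) * z) ** 3)
```

As the particles spread over more bins (higher uncertainty), k grows and the bound rises roughly linearly, which is exactly the adaptive behavior described above.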