1. Introduction
Autonomous driving technologies are evolving rapidly, with the ultimate goal of developing a safe, reliable, and fully autonomous vehicle, i.e., SAE level 4 and eventually level 5 [1]. A real-time, accurate, and robust positioning system is the backbone of a fully autonomous vehicle and of many Advanced Driver Assistance Systems (ADAS). It is the basis for environment perception, path planning, and autonomous decision making. Global Navigation Satellite Systems (GNSS) are the most widely used technology for vehicle positioning; GPS, GLONASS, Galileo, and BeiDou are all examples of GNSS. However, these systems are not always reliable, as they depend on satellite visibility. Obstruction of GNSS signals by trees and large buildings, or GNSS signals that are reflected before reaching the receiver (multipath error), severely degrades the receiver's performance [2,3]. Researchers have fused inertial measurement unit (IMU) data with GNSS data to increase precision and reliability [4,5]. Positioning systems that fuse IMU and GNSS data are referred to as Inertial Navigation Systems (INS). State-of-the-art INS use tactical-grade IMUs with Real-Time Kinematic (RTK) GNSS receivers to estimate positions accurately, and their data are often postprocessed to achieve centimeter-level accuracy. However, these systems are too expensive to deploy in consumer-grade vehicles. This article focuses on improving the availability, accuracy, and reliability of vehicle positioning by fusing low-cost automotive-grade sensor data.
The most popular approaches to sensor fusion for vehicle positioning are filter-based or graph-based. Filter-based approaches typically employ variants of the Bayesian filter, which take a recursive approach and usually adopt the Markov assumption: all past measurements are summarized by a state that is refined each time a new measurement arrives, and the measurements are discarded after being processed. This approach limits computational cost and is particularly popular in real-time settings [6,7,8,9,10,11]. Graph-based methods, typically pose-graphs, formulate a nonlinear least-squares optimization problem over a set of measurements rather than processing them one by one [12,13]. A pose-graph consists of nodes, which represent the state (position and orientation, or pose), and edges, each of which connects two nodes and represents the measurement between them. In the optimization process, the state estimates are optimized by considering a set of measurements rather than relying on a Markov assumption. This approach has been demonstrated to be more accurate in various problems such as Simultaneous Localization and Mapping (SLAM) [14,15,16,17] and bundle adjustment [18,19], and filter-based solutions have since become much less popular. For a comparison of filter-based and graph-based methods in the context of visual SLAM, we refer to [20].
One major drawback of pose-graph optimization is that it is computationally demanding: at best, the optimization time increases linearly with the size of the pose-graph. Real-time or online execution is possible only if the number of nodes and measurements is limited. To tackle this problem, researchers have proposed sliding-window pose-graph optimization strategies, which limit the size of the pose-graph by considering a subset of measurements, making optimization computationally tractable [21,22,23]. Whereas global optimization requires optimizing over all nodes in the graph, sliding-window pose-graph optimization techniques optimize the sequence of vehicle poses over only the n most recent nodes and therefore perform local optimization instead.
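Selecting the subset of poses that a local optimizer touches can be sketched as follows. This is a minimal illustration, assuming poses are indexed by their cumulative distance travelled (a distance-based window, as used later in this article); the function name and inputs are ours:

```python
def window_nodes(cum_dist, w):
    """Indices of the poses inside a sliding window of the last w meters.

    cum_dist: cumulative distance travelled at each pose (monotonically
    increasing). All poses outside the returned set would stay fixed
    during local optimization.
    """
    end = cum_dist[-1]
    return [i for i, d in enumerate(cum_dist) if end - d <= w]
```

For example, with poses at cumulative distances [0, 100, 250, 400, 500] m and a 300 m window, only the last three poses are optimized.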
The benefit of absolute GNSS positioning, compared to integrating relative positioning (vehicle odometry and visual odometry), is that the position errors are bounded, whereas integrating relative positioning accumulates error indefinitely (drift). Vehicle odometry is typically estimated using Micro-Electro-Mechanical Systems (MEMS) gyroscopes, IMUs, and wheel encoders. Low-cost MEMS IMUs are not accurate and do not exhibit long-term accuracy, i.e., their accuracy decreases significantly over distances on the order of 1 km. Wheel encoders are robust, but their accuracy degrades with wheel slip and changing tire pressure. Visual odometry estimates the camera/vehicle pose by tracking static visual features between consecutive image frames taken from single- or multi-camera systems [24,25,26,27]. These systems have proven accurate in low-traffic urban scenarios where there are many static image features to track, but they often fail in dark environments, real-world dense-traffic scenarios, and highways, where there are considerably fewer static features to track. Relative position estimates do not contain direct information on the absolute position; instead, they act as soft constraints between absolute poses. If the relative pose estimates contained no errors (i.e., acted as hard constraints) and if GNSS readings were not correlated in time, then absolute position errors could be reduced on the order of $\sigma/\sqrt{n}$, with $\sigma$ the standard deviation of the GNSS error and $n$ the number of GNSS readings. In reality, this model is highly naive because: (1) the relative pose estimates do contain errors and should only be used as soft constraints, and (2) the GNSS readings are highly correlated in time [28], reducing the statistical information that a single GNSS reading contributes when fused with other positioning sources.
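The idealized $\sigma/\sqrt{n}$ reduction above can be illustrated with a small Monte-Carlo sketch, assuming independent zero-mean Gaussian GNSS errors (exactly the assumption the text calls naive); the function name and trial count are ours:

```python
import random
import statistics

def mean_error_std(sigma, n, trials=20000):
    """Empirical standard deviation of the mean of n independent
    zero-mean Gaussian GNSS errors with standard deviation sigma.
    Under the independence assumption this approaches sigma / sqrt(n)."""
    random.seed(0)  # deterministic sketch
    means = [statistics.fmean(random.gauss(0.0, sigma) for _ in range(n))
             for _ in range(trials)]
    return statistics.pstdev(means)
```

With sigma = 2 m and n = 16 readings, the empirical value is close to 2/4 = 0.5 m; correlated errors, as observed in practice, would break this reduction.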
In this article, we propose and evaluate a pose-graph-optimization-based vehicle positioning and mapping framework using automotive-grade GNSS receiver, stereo camera, and in-vehicle yaw-rate and velocity sensor data, depicted in Figure 1. In contrast to SLAM, where loop-closure detection is used to compensate for drift and generate an accurate map, we fuse absolute GNSS measurements with relative odometry measurements. The GNSS measurements act as loop closures, which reduces the dependency on image-based landmark and feature detection algorithms. We combine and extend our previous work [29,30] on pose-graph-based vehicle localization. In [29], we propose and evaluate multiple off-line pose-graph modeling strategies for fusing vehicle odometry and GNSS data for robust vehicle positioning. In [30], we extended the work of [29] and proposed a real-time pose-graph generation and optimization framework, "Incremental Hopping Window Pose-Graph Optimization", for vehicle positioning. This article extends the framework to model stereo visual odometry, vehicle odometry, and GNSS measurements as a pose-graph, which is then optimized using the incremental hopping window fusion strategy [30]. We also extensively compare the framework's performance with (a) vehicle odometry and GNSS fusion and (b) stereo visual odometry, vehicle odometry, and GNSS fusion for real-time vehicle positioning on large datasets covering more than 180 km in different scenarios, such as urban canyons and highways. An overview of the framework is depicted in Figure 4.
The rest of the article is structured as follows. In Section 2, we describe related work. A brief explanation of pose-graph optimization and the pose-graph structure is provided in Section 3 and Section 4, respectively. The pose-graph generation and real-time optimization processes are described in Section 5 and Section 6, respectively. The experiments and results are provided in Section 7 and the conclusion in Section 8.
2. Related Work
A lot of research has been done to enhance vehicle positioning in urban scenarios using a GNSS receiver. Hieu et al. [31] present a loosely coupled model for INS/GPS integration using an extended Kalman filter. They show that accurate positioning and navigation are possible over GPS outages of 9 to 14 s, with position errors of 3 to 10 m (root mean square). Howard [25] proposes a stereo visual odometry algorithm for estimating frame-to-frame camera motion from successive stereo image pairs using a dense stereo matching algorithm. This approach generalized and simplified the approach of Hirschmüller [32], which uses feature matching rather than tracking and employs stereo range data for inlier detection, by introducing a complete inlier detection scheme (based on the notion of cliques) and simplifying the point-to-point inlier test to permit faster comparisons. Agarwal et al. [33] proposed a real-time, low-cost mobile robot localization system for outdoor environments. Their system relies on stereo vision to robustly estimate frame-to-frame motion in real time; the motion estimation problem (visual odometry) is formulated in disparity space, and inertial measurements fill in motion estimates when visual odometry fails. The motion is then fused with a low-cost GPS sensor using a Kalman filter. This system was mainly designed for low-speed outdoor mobile robotics applications.
Rehder et al. [34] present a pose-graph-optimization-based approach to estimate the global pose of a vehicle using stereo visual odometry, which is affected by bias due to a lack of close-range features, and very infrequent GPS measurements. They show that the graph-based state estimation framework is capable of inferring global orientation using a unified representation of local and global measurements, and that it recovers from inaccurate initial estimates of the state. Chiu et al. [35] tackled the real-time pose-graph optimization problem by combining a long-term smoother and a short-term smoother using Sliding-Window Factor Graphs in iSAM2 (Kaess et al. [36]). Indelman et al. [37] use the incremental smoothing technique from [36] to fuse multiple odometry and pose sources. They choose a graph representation similar to the one proposed in this contribution, with the difference that they keep the full graph in memory over the entire trajectory, making the approach more memory-consuming. Cucci and Matteucci [38] propose the graph-based ROAMFREE framework for multi-sensor pose tracking and sensor calibration. They keep the size of the graph bounded by simply discarding older nodes and edges, thus potentially obtaining overconfident estimates. Merfels et al. [21] propose a pose-graph-optimization-based multi-sensor fusion approach that combines measurements from multiple localization systems in a plug-and-play manner. They formulate the problem as a sliding-window pose-graph optimization, enabling efficient optimization and providing accurate pose estimates with high availability, and they use a novel marginalization approach that marginalizes the information of the last optimization window into a single prior node before generating a new window. In this article, our goal is to develop a graph-based real-time vehicle positioning and mapping framework. We research different pose-graph modeling approaches that model relative and absolute sensor measurements into a pose-graph. We propose an incremental hopping window pose-graph optimization strategy for real-time vehicle positioning and perform extensive evaluations on a dataset covering 180 km of vehicle trajectories. Our proposed approach is conceptually similar to [21] but differs in the following aspects: (a) we model GNSS measurements considering their uncertainty, whereas they give constant weight to all GNSS measurements; (b) we define the size of the optimization window with respect to distance travelled, whereas they define it in time; and (c) they marginalize the information of the last optimization window into a single prior node before optimizing a new window, whereas we use a section of the last optimization window as a prior.
3. Pose Graph
In this section and Section 4, we provide a brief introduction to pose-graph optimization and pose-graph structures before explaining the proposed framework in Section 5 and Section 6. The sensor fusion algorithm optimizes a pose-graph that models the vehicle's motion from vehicle odometry, visual odometry, and GNSS receiver readings, using least-squares optimization to estimate the vehicle's pose in real time. The pose-graph consists of nodes, denoted by $X$, which model the absolute vehicle pose by elements of $SE(2)$, i.e., Euclidean motions in 2-D, and of edges, denoted by $Z$, which model the relative poses between nodes, also with elements of $SE(2)$. We have chosen to use $SE(2)$ instead of $SE(3)$ because the automotive-grade low-cost GNSS receiver can only provide reliable 2-D position estimates, i.e., latitude and longitude; the altitude estimates from these receivers are unreliable. Each measurement $Z$ is accompanied by an uncertainty expressed in the tangent space of $SE(2)$ using an information matrix denoted by $\Omega$. An edge always connects two nodes, i.e., the edge $Z_{ij}$ denotes a relative pose that moves node $X_i$ onto $X_j$. The error $e_{ij}$ between the poses of nodes $X_i$ and $X_j$ with respect to the measured relative pose $Z_{ij}$ is computed as:

$$e_{ij} = \log\left(Z_{ij}^{-1}\,(X_i^{-1} X_j)\right),$$

where $\log(\cdot)$ denotes the logarithmic map from $SE(2)$ to its tangent space, i.e., $e_{ij}$ is a three-dimensional vector consisting of the angular and positional difference between $Z_{ij}$ and $X_i^{-1} X_j$, as shown in Figure 2. The goal of graph optimization is to minimize the following nonlinear objective function:

$$F(X) = \sum_{(i,j) \in C} e_{ij}^{\top}\,\Omega_{ij}\,e_{ij},$$

where $C$ represents the set of all index pairs for which measurements are available. This optimization task can be performed with the usual nonlinear solvers such as Levenberg–Marquardt, Gauss–Newton, or Dogleg [39]. In our work, we use the Gauss–Newton solver contained in the g2o graph optimization framework developed by Kuemmerle et al. [12].
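The error and objective above can be sketched in code. This is a minimal illustration, not the g2o implementation: poses are (x, y, theta) tuples, the log map is simplified to the residual's translation plus wrapped angle, and the information matrices are taken diagonal:

```python
import math

def compose(a, b):
    """SE(2) composition: apply pose b in the frame of pose a = (x, y, theta)."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, at + bt)

def inverse(a):
    """SE(2) inverse: (R, p)^-1 = (R^T, -R^T p)."""
    x, y, t = a
    c, s = math.cos(t), math.sin(t)
    return (-c * x - s * y, s * x - c * y, -t)

def edge_error(Xi, Xj, Zij):
    """e_ij = log(Z_ij^-1 (X_i^-1 X_j)); here the tangent vector is taken
    as the residual transform's (dx, dy, dtheta) with the angle wrapped."""
    ex, ey, et = compose(inverse(Zij), compose(inverse(Xi), Xj))
    et = math.atan2(math.sin(et), math.cos(et))  # wrap to (-pi, pi]
    return (ex, ey, et)

def objective(errors, infos):
    """F(X) = sum_ij e_ij^T Omega_ij e_ij, with each Omega a diagonal 3-tuple."""
    return sum(w * e * e
               for err, omega in zip(errors, infos)
               for e, w in zip(err, omega))
```

For a perfect measurement the error vanishes; an edge measuring 2 m forward when the nodes are only 1 m apart contributes a 1 m residual to the objective.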
To explain the graph structure, we use a graphical notation, which is introduced next. Nodes, or absolute poses in the graph, are visualized using solid circles. Whenever a node is kept fixed, i.e., its pose is not optimized, the circle contains a cross. Edges, or relative poses, are visualized using arrows. To better visualize the actual measurement contained in an edge and its error w.r.t. the absolute nodes, we visualize the measurement as a dashed circle and the error as a red dashed line. Figure 2 shows a simple graph before and after optimization. In this example, the error is minimized by moving the node from its initial position to the position corresponding to the measurement contained in the edge.
5. Pose Graph Generation
The vehicle positioning and mapping framework, depicted in Figure 4, realizes the in-vehicle multi-sensor fusion process. The framework is composed of four blocks: vehicle sensors, odometry source selector, front-end pose-graph generator, and back-end pose-graph optimizer. The odometry source selector block estimates the motion of the vehicle using a sequence of stereo-camera images, which is validated against the vehicle's velocity and yaw-rate sensors. If the difference between the motions computed from the two sources exceeds a threshold, the block switches from the stereo camera to the velocity and yaw-rate sensors. The selected odometry source and the GNSS measurements are used to model a pose-graph in the front-end pose-graph generator block, and the pose-graph is optimized in real time in the back-end pose-graph optimizer block. The GNSS receiver is used to estimate the vehicle's pose in the Universal Transverse Mercator (UTM) coordinate system. The process of modeling the vehicle sensor data into a pose-graph is explained in this section.
5.1. Visual Odometry
The process of estimating the relative poses, or motion, of a camera from a sequence of images is called visual odometry. In this article, we use a front-facing stereo-camera setup to estimate the motion of the vehicle, commonly referred to as stereo visual odometry (SVO). A modified multi-threaded version of the libviso2 library by Geiger et al. [40] is used for SVO. The motion of the camera is estimated by detecting and matching static image feature points between two consecutive stereo-image pairs (the previous and current stereo images). The feature points are detected using a corner and a blob detector and are then matched between the four images of the two stereo-image pairs using a Sobel filter response [40]. Figure 5 shows the "circle" feature matching strategy. First, for each feature-point candidate in the current left stereo image, we find the best match in the previous left stereo image within an M × M search window. The points are then matched with the previous right stereo image, the current right stereo image, and finally the current left stereo image again. When matching candidates between the left and right images, an epipolar constraint with an error tolerance of 1 pixel is used. A candidate is considered a valid match if the last feature coincides with the first. The valid candidates in the previous stereo frame are reprojected onto the current stereo frame, and the camera motion is estimated by minimizing the sum of reprojection errors. An outlier rejection scheme based on random sample consensus (RANSAC) is applied before the final motion optimization step. The obtained motion estimates are then refined using a Kalman filter [40].
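The circle-matching consistency check can be sketched as follows, assuming pairwise matches have already been computed; the dictionaries mapping feature indices between images are illustrative stand-ins for libviso2's internal data structures:

```python
def circle_match(start_idx, match_lc_lp, match_lp_rp, match_rp_rc, match_rc_lc):
    """Follow matches current-left -> previous-left -> previous-right ->
    current-right -> current-left; the candidate is valid only if the chain
    returns to the feature it started from."""
    try:
        i = match_lc_lp[start_idx]   # current left  -> previous left
        i = match_lp_rp[i]           # previous left -> previous right
        i = match_rp_rc[i]           # previous right -> current right
        i = match_rc_lc[i]           # current right -> current left
    except KeyError:
        return False                 # chain broken: no match in some image
    return i == start_idx
```

A chain that closes on its starting feature is accepted; a chain that breaks or lands on a different feature is rejected before motion estimation.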
5.2. Odometry Source Selector
Stereo visual odometry estimates the motion of the camera/vehicle by matching static feature points in consecutive frames. In ideal scenarios, it drifts less and is more accurate than odometry derived from automotive-grade IMUs and odometers. However, in heavy-traffic scenarios, when the camera's field of view (FOV) is occluded by large moving objects such as trucks, its performance degrades: if most of the matched feature points lie on a moving truck, SVO estimates the motion of the vehicle relative to the truck. Applying RANSAC is not enough to recover the true motion from such biased data. We use the in-vehicle velocity and yaw-rate sensor data to validate the motion estimates from the SVO, assuming that the velocity and yaw-rate sensors produce far fewer outliers than SVO. The transformation estimated between two consecutive stereo frames is compared with the transformation obtained by integrating the velocity and yaw-rate measurements over the same period. If the SVO and vehicle odometry estimates are similar, SVO is assumed to be more accurate and is therefore preferred; otherwise, the velocity and yaw-rate sensor measurements are used.
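A minimal sketch of this selection rule, assuming the frame-to-frame motion from both sources is summarized as (dx, dy, dyaw); the tolerance values are illustrative, not the framework's tuned thresholds:

```python
import math

def select_odometry(svo_motion, wheel_motion, trans_tol, rot_tol):
    """Prefer the SVO frame-to-frame motion estimate when it agrees with the
    integrated velocity/yaw-rate (wheel) estimate; otherwise fall back to
    the wheel estimate. Motions are (dx, dy, dyaw) tuples."""
    dt = math.hypot(svo_motion[0] - wheel_motion[0],
                    svo_motion[1] - wheel_motion[1])
    dr = abs(svo_motion[2] - wheel_motion[2])
    agree = dt <= trans_tol and dr <= rot_tol
    return svo_motion if agree else wheel_motion
```

When a truck occludes the FOV and SVO reports near-zero motion while the wheels report 1 m of travel, the disagreement exceeds the tolerance and the wheel estimate is used.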
5.3. Edge Generator
The inputs to the front-end pose-graph generator block are the velocity and yaw-rate measurements of the vehicle and its GNSS receiver measurements. The block models these measurements into the nodes and edges of a pose-graph. The pseudocode of the pose-graph generation process for the graph modeling approach G2 is given in Algorithm 1. The odometry nodes are generated using the yaw-rate and velocity measurements from the odometry source selector block. We preintegrate the yaw-rate and velocity measurements [41] to compute the change in heading and the distance traveled by the vehicle. A node containing the vehicle's pose is generated when the heading changes by more than 5 deg, when the vehicle has traveled at least a threshold distance in meters, or when a GNSS measurement is received. The edge between two consecutive odometry nodes is obtained by computing the transformation matrix between the two poses. A GNSS node is generated for each GNSS fix from the GNSS receiver, which operates at 1 Hz. The GNSS edge represents the measurement between the origin node (the UTM tile origin) and the GNSS receiver antenna. An identity measurement, or edge, is generated between each GNSS node and the corresponding odometry node.
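The node-creation trigger can be sketched as follows; the 5-deg heading threshold follows the text, while the distance threshold value is an assumed placeholder, since the paper's value is not reproduced here:

```python
import math

def new_node_due(d_heading, d_dist, gnss_fix,
                 heading_th=math.radians(5.0), dist_th=10.0):
    """A new pose node is created on a heading change above the threshold,
    after the travelled-distance threshold, or when a GNSS fix arrives.
    dist_th (meters) is an illustrative assumption."""
    return abs(d_heading) > heading_th or d_dist >= dist_th or gnss_fix
```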
5.4. Information Matrix Determination
The GNSS receiver estimates the expected accuracy of its fix for latitude, longitude, and altitude at a given confidence bound. These expected accuracies are provided in meters and are computed from the GNSS reading's Dilution of Precision (DOP). The information matrix values for the UTM-X and UTM-Y coordinate measurements of each GNSS edge are computed from the respective expected accuracies as the inverse of their variances, where we assume that there is no correlation between the two directions.
The information matrix for each odometry edge is computed as the inverse of the covariance matrix of the odometry measurement. The covariance matrix is derived from the average drift of 1.1% of the total distance traveled. We assume that when the velocity is zero, the vehicle cannot translate or rotate from its position; in this case, we give the odometry a very high certainty (large values for the corresponding elements in the information matrix) for the corresponding edges. This prevents the vehicle from exhibiting abnormal movements after pose-graph optimization; for example, the vehicle will not show any lateral displacement when the longitudinal displacement is zero.
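A sketch of the information determination under the assumptions above; the information entries are kept as diagonal (x, y, heading) 3-tuples, and the odometry heading uncertainty is an illustrative assumption, not a value from the paper:

```python
def gnss_information(acc_x, acc_y):
    """Diagonal information entries (x, y, heading) for a GNSS edge from the
    receiver's expected accuracies in meters, assuming uncorrelated
    UTM-X/UTM-Y errors; a GNSS fix carries no heading information."""
    return (1.0 / acc_x ** 2, 1.0 / acc_y ** 2, 0.0)

def odometry_information(dist, drift_rate=0.011, sigma_heading=0.01):
    """Diagonal information entries for an odometry edge, with the position
    standard deviation taken as the average 1.1% drift of the distance
    travelled; sigma_heading (rad) is an illustrative assumption."""
    sigma = max(drift_rate * dist, 1e-6)  # avoid division by zero at rest
    w = 1.0 / sigma ** 2
    return (w, w, 1.0 / sigma_heading ** 2)
```

A zero-velocity edge would instead receive very large (near-infinite) information entries, pinning the pose in place as described above.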
5.5. Adaptive GNSS Outlier Rejection
So far, we have assumed that the GNSS receiver provides a good estimate of its uncertainty through its expected-accuracy values. However, this only holds when there is sufficient satellite visibility. The accuracy of the GNSS receiver is severely degraded when trees and buildings block the line-of-sight to satellites or when multipath error is induced by signal reflection, and the reported accuracy values do not necessarily reflect this. Ignoring this would severely degrade the performance of the fusion framework. We call these erroneous GNSS measurements with overconfident accuracy values outliers, and we propose an approach to detect and ignore them before fusing. The approach is based on the fact that GNSS readings have low short-term accuracy, whereas the yaw-rate and velocity sensors in the vehicle have very high short-term accuracy; thus, the vehicle odometry can be used as an observer to detect GNSS outliers. We do this by computing relative measurements from the absolute GNSS readings for each second and comparing them with the relative measurements of the vehicle odometry over the same time span. If the errors in the change of heading and in displacement are below their respective thresholds (in degrees and 3 m, respectively), the GNSS reading is incorporated into the pose-graph. We have tuned these thresholds for high precision, i.e., we try to ensure that all outliers are rejected, at the expense of also rejecting some good measurements.
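The outlier test can be sketched as follows, assuming the odometry predicts the length and bearing of the 1-s displacement; the 3 m displacement threshold follows the text, while the heading threshold value is an assumed placeholder, and comparing displacement bearings is a simplified analogue of the heading check:

```python
import math

def gnss_is_inlier(gnss_prev, gnss_cur, odo_dist, odo_bearing,
                   dist_th=3.0, heading_th=math.radians(10.0)):
    """Compare the 1-s relative motion implied by two consecutive GNSS fixes
    (UTM x, y tuples) with the odometry prediction over the same span;
    reject the fix if either error exceeds its threshold.
    heading_th is an illustrative assumption."""
    dx = gnss_cur[0] - gnss_prev[0]
    dy = gnss_cur[1] - gnss_prev[1]
    dist_err = abs(math.hypot(dx, dy) - odo_dist)
    head_err = abs(math.atan2(dy, dx) - odo_bearing)
    return dist_err < dist_th and head_err < heading_th
```

A fix that jumps 20 m in one second while the odometry reports 10 m of travel is rejected, regardless of the confidence the receiver attaches to it.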
Algorithm 1: Pose-graph generation
7. Experiments and Results
We analyze the performance of the pose-graph-based vehicle positioning and mapping framework on eight datasets covering more than 180 km of driving. The datasets were recorded with our test vehicle in driving sessions of 30 to 50 min in different environments, such as urban canyons, highways, and rural areas around the Eindhoven region in the Netherlands. The trajectories of all datasets are shown in Figure 8. We perform the following experiments for a detailed analysis:
GNSS data analysis: evaluation of the quality of the GNSS readings from the receiver with respect to the RTK-GNSS. This sets the baseline for our other experiments.
Graph modeling: we compare and discuss the performance of the three graph modeling approaches described in Section 4.
Incremental hopping window performance analysis: we analyze the performance of the incremental hopping window approach described in Section 6 in terms of positioning accuracy and robustness for different window sizes, batch sizes, and processing times.
Performance analysis of the proposed framework: we analyze the performance of the incremental hopping window approach using multiple odometry sources and a GNSS receiver.
7.1. Vehicle Setup
Our experimental vehicle is equipped with a stereo camera, a U-Blox GNSS receiver with PPP, an RTK-GNSS system, and a CAN interface to access the vehicle ECU messages. The stereo camera, composed of two PointGrey Firefly cameras, has a baseline of 30 cm and captures images at 640 × 480 resolution at 60 Hz. The RTK-GNSS is the industry standard for accurate GNSS-based positioning and is postprocessed to obtain centimeter-level accuracy. The yaw-rate (rad/s) and velocity (m/s) are received at 25 Hz through the CAN interface from the vehicle ECU, and the U-Blox GNSS receiver operates at 1 Hz. The pulse-per-second (PPS) signal from the GNSS receiver is used to synchronize the clock of the computer in the vehicle.
7.2. Performance Metrics
To evaluate the performance of our sensor fusion algorithm, we compare the results with the post-processed RTK-GNSS using the metrics described below. For both the GNSS readings and the fusion results, we compute these metrics for the poses corresponding to the PPS, which we call PPS-Poses.
Maximum offset error ($E_{max}$) in meters: the maximum Euclidean offset distance over all PPS-Poses of a dataset, computed with respect to the corresponding RTK-GNSS positions. It gives an indication of the magnitude of the outliers in a dataset. It is computed as

$$E_{max} = \max_{i=1,\dots,n} \sqrt{(x_i - x_i^{RTK})^2 + (y_i - y_i^{RTK})^2},$$

where $n$ is the total number of poses in a dataset, $x_i$, $y_i$ are the coordinate values of the positioning system under consideration (GNSS or GNSS-odometry fusion), and $x_i^{RTK}$, $y_i^{RTK}$ are those of the corresponding RTK-GNSS points.

Accuracy ($A$) in meters: the structural offset between the RTK-GNSS and the positioning system under consideration. It is the Euclidean distance of the point computed from the average offset errors along the UTM-X and UTM-Y axes of the UTM coordinate system for each dataset. It is computed as

$$A = \sqrt{\mu_x^2 + \mu_y^2},$$

where $\mu_x$ and $\mu_y$ are the mean offsets of the considered positioning system, computed as

$$\mu_x = \frac{1}{n}\sum_{i=1}^{n}(x_i - x_i^{RTK}), \qquad \mu_y = \frac{1}{n}\sum_{i=1}^{n}(y_i - y_i^{RTK}).$$

Precision ($P$) in meters: the standard deviation of the distance of each point from the computed mean offset error along the UTM-X and UTM-Y axes for each dataset. It represents the variation, or dispersion, of the readings of the considered positioning system around its mean for each dataset. It is computed as

$$P = \sqrt{\frac{1}{n}\sum_{i=1}^{n} d_i^2},$$

where $d_i$ is the distance of each point from the mean offset, i.e.,

$$d_i = \sqrt{(x_i - x_i^{RTK} - \mu_x)^2 + (y_i - y_i^{RTK} - \mu_y)^2}.$$

For these metrics, we also report the averages over all datasets and the relative percentage improvements of the pose-graph fusion with respect to the GNSS. As the relative information contained in the (visual) odometry cannot contribute absolute position information, we expect that our sensor fusion cannot improve the accuracy (which measures the absolute position) but that it can improve the maximum offset error and the precision.
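The three metrics can be computed directly from matched point lists. A minimal sketch, where precision is taken as the root-mean-square distance from the mean offset, one reading of the definition above:

```python
import math

def metrics(est, ref):
    """Maximum offset error, accuracy (structural offset), and precision
    for matched lists of (x, y) points: estimates vs. RTK-GNSS reference."""
    off = [(ex - rx, ey - ry) for (ex, ey), (rx, ry) in zip(est, ref)]
    e_max = max(math.hypot(ox, oy) for ox, oy in off)
    mu_x = sum(o[0] for o in off) / len(off)
    mu_y = sum(o[1] for o in off) / len(off)
    accuracy = math.hypot(mu_x, mu_y)
    d = [math.hypot(ox - mu_x, oy - mu_y) for ox, oy in off]
    precision = math.sqrt(sum(x * x for x in d) / len(d))
    return e_max, accuracy, precision
```

For two estimates offset by 1 m and 3 m along UTM-X, the maximum offset error is 3 m, the accuracy (mean offset) is 2 m, and the precision (spread around the mean) is 1 m.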
7.3. Results
First, we analyze the GNSS receiver's performance on the dataset against the RTK-GNSS receiver measurements in Section 7.3.1; this is the baseline against which the improvements in the performance metrics of the proposed framework are estimated. Secondly, we compare the performance of the three graph modeling approaches on our dataset and select the best performing one in Section 7.3.2. Then, the incremental hopping window optimization performance on the selected pose-graph model is evaluated to determine the appropriate batch size and optimization window size for real-time application in Section 7.3.3. Finally, in Section 7.3.4, we analyze the performance of the proposed framework with (i) vehicle odometry and GNSS fusion and (ii) SVO, vehicle odometry, and GNSS fusion, using the selected pose-graph model, batch size, and optimization window size.
7.3.1. GNSS Data Analysis
First, we evaluate the quality of the GNSS data from the receiver with respect to the RTK-GNSS for all datasets. The performance metrics for the GNSS readings and the characteristics of the environment in which the majority of each dataset was recorded are shown in Table 1. The GNSS receiver performs well in highway and rural environments. The precision for the different datasets is on the order of a couple of meters, as expected, while the maximum offset error varies a lot; this is because the GNSS signals suffered from reflection and occlusion in urban-canyon areas and underpasses. The accuracy of all the datasets is never close to zero due to a bias in the GNSS readings. This bias is clearly visible in the error scatter plots depicted in Figure 9. It is evident that, over time spans relevant for automotive applications, i.e., minutes to hours, the GNSS error distribution does not exhibit zero-mean behavior and the GNSS errors are highly correlated in time [29]. The existence of this bias is exactly the reason why RTK-GNSS base stations need at least 24 h or more of averaging time to achieve a positioning accuracy of 2 cm. Fusing odometry measurements with GNSS readings can improve the precision of the positioning system. However, it cannot remove this bias, and therefore it cannot improve the accuracy, as the vehicle odometry only provides information about relative motions [42].
7.3.2. Graph Modeling
The performance metrics of the modeling approaches G1, G2, and G3 are shown in Table 2. For the first graph modeling approach, G1, the average maximum offset error decreased, but the average precision value increased, which is unwanted. The average accuracy remained the same as the average GNSS accuracy, as expected. Out of the eight experiments, only three converged for approach G1. We believe this convergence issue is caused by the overly rigid modeling of the pose-graph, which hampers convergence when the vehicle poses, initialized from the odometry, are far off from the GNSS readings.
The pose-graph modeling approach G2 performed much better than G1: the optimization converged for all eight experiments. Table 2 shows the performance metrics for this approach. Both the average maximum offset error and the precision value decreased, and the approach performed well in both highway and urban-area scenarios. The improvement with respect to the first model, G1, is due to the fact that model G2 has more flexibility, as the GNSS positions are also optimized. This improves convergence in scenarios with a challenging initialization, i.e., when there is a large deviation between the odometry heading and the GNSS heading.
The pose-graph modeling approach G3 also performed well, but worse than G2. Table 2 shows the performance metrics of this approach. The average maximum offset error and the precision value decreased. As with G1 and G2, there is no significant change in the average accuracy, as expected.
7.3.3. Incremental Hopping Window Analysis
In the previous experiments, we have optimized the entire pose-graph offline, where the modeling approach
G2 performed best. In this section, we study the performance achieved when optimizing the pose-graph structure
G2 online with the incremental hopping window optimization strategy. It provides valuable insight into the reliability and real-time applicability of the localization system. We also study the impact of different window sizes
w and batch sizes
b on the performance metrics. The incremental hopping window optimization strategy is evaluated on eight datasets with five different window sizes (500, 1000, 1500, 2000, and 2500 m). For each window size, we perform optimization for 16 different batch sizes (5, 10, 20, 25, 40, 50, 66, 100, 111, 125, 142, 166, 200, 250, 333, and 500 m). The proposed strategy runs on a single CPU core of a multicore Intel CPU running at 2.3 GHz.
Table 3 shows the average of the performance metrics of all datasets for a window size of 1000 m and a batch size of 5 m. As expected, both precision and the maximum error are improved significantly with respect to GNSS, but the accuracy is not improved due to the bias in GNSS errors.
Figure 10a shows the influence of the batch size on the average precision over all datasets. The incremental hopping window strategy provides the best results when optimizing with a batch size of 5 m for different window sizes. It performs in total 200 Gauss-Newton iterations for each window of size 1000 m. The precision is similar for batch sizes between 5 and 66 m but degrades for larger batch sizes. This shows that the precision of the system is highly dependent on the batch size rather than the window size. It is observed that the difference in precision between incremental hopping window optimization and global optimization is only 1 centimeter, which is in range of the accuracy of the ground truth. Therefore, in our experiments, both global optimization using all measurements and local optimization using the proposed framework achieve similar accuracy, as desired.
Figure 10b shows the batch size versus the total computation time over all datasets. The computation time increases with decreasing batch size and follows a similar pattern for the different window sizes. This is because the total number of Gauss–Newton iterations increases as the batch size decreases.
Figure 11 provides the computation times and update rates for different batch sizes to demonstrate the real-time capability of the hopping-window optimization strategy. The per-batch computation time increases roughly linearly with the batch size. Keeping in mind that the vehicle odometry is provided at 25 Hz, practically all batch sizes meet this real-time constraint. An optimization window of 1500 m and a batch size of 40 m provide a good trade-off between precision and computation time. This configuration is marked with the black dot in Figure 10.
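The real-time check itself is simple arithmetic; a minimal sketch under illustrative assumptions (the 130 km/h speed and 0.3 s optimization time below are made-up example values, not figures from the paper): a new batch of b meters completes every b/v seconds at speed v, and the optimization of one batch must finish within that period.

```python
def batch_period_s(batch_m, speed_mps):
    """Time between two consecutive batch completions at constant speed."""
    return batch_m / speed_mps


def meets_realtime_budget(opt_time_s, batch_m, speed_mps):
    """The optimizer must finish one batch before the next one completes."""
    return opt_time_s <= batch_period_s(batch_m, speed_mps)


# Illustrative: at 130 km/h a 40 m batch completes roughly every 1.1 s,
# leaving ample headroom for an optimization taking a few hundred ms.
ok = meets_realtime_budget(0.3, batch_m=40.0, speed_mps=130.0 / 3.6)
```

Note that this per-batch budget is separate from the 25 Hz odometry stream, whose samples must each be ingested within 40 ms regardless of the batch size.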
7.3.4. Performance Analysis of the Proposed Framework
The previous experiments show that the graph model G2 performs best on our large dataset, and that incremental hopping window pose-graph optimization with a batch size of 40 m and a window size of 1500 m provides a balanced configuration for real-time application. We choose these settings to analyze the performance of the proposed positioning and mapping framework with (i) vehicle odometry and GNSS fusion and (ii) SVO, vehicle odometry, and GNSS fusion. The performance metrics of the proposed framework are shown in Table 4. GO1 denotes offline global optimization and LO1 denotes incremental hopping window optimization, both using vehicle odometry and GNSS data as input. GO2 denotes offline global optimization and LO2 denotes incremental hopping window optimization, both using SVO, vehicle odometry, and GNSS data as input. Both GO1 and LO1 show a significant improvement over the GNSS data, with GO1 performing marginally better than LO1. Compared with the automotive-grade GNSS receiver, GO1 achieves a 69.53% improvement in maximum offset error and a 17.79% improvement in the error's standard deviation, while LO1 achieves a 60.52% and a 17.18% improvement, respectively. This confirms that incremental hopping window optimization with a batch size of 40 m and a window size of 1500 m performs comparably to global optimization of the pose-graph.
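The improvement percentages quoted here follow the usual relative-reduction formula; a small helper with made-up illustrative error values (not the values underlying Table 4):

```python
def improvement_pct(baseline_err, fused_err):
    """Relative reduction of an error metric w.r.t. the GNSS-only baseline,
    in percent; positive means the fused estimate is better."""
    return 100.0 * (baseline_err - fused_err) / baseline_err


# Hypothetical numbers: a maximum offset error of 10.0 m for GNSS alone
# reduced to 3.047 m by fusion corresponds to a 69.53% improvement.
val = improvement_pct(10.0, 3.047)
```

The same formula applies to the standard-deviation figures, with the automotive-grade GNSS receiver's error statistics as the baseline in every case.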
We then perform experiments fusing SVO, vehicle odometry, and GNSS data. We observe trends similar to the previous experiments: both GO2 and LO2 show a significant improvement over the GNSS data, with GO2 performing marginally better than LO2. Compared with the automotive-grade GNSS receiver, GO2 achieves a 65.49% improvement in maximum offset error and a 20.86% improvement in the error's standard deviation, while LO2 achieves a 65.45% and a 20.25% improvement, respectively. We also observe that GO2 performs better than GO1 and LO2 performs better than LO1, which shows that SVO improves the performance of the positioning system. It also indicates that SVO is more accurate than vehicle odometry, although vehicle odometry is more robust than SVO.
Figure 12 shows the results of all fusion approaches projected onto Google Maps. Figure 12a,b show that the GNSS readings, represented by the yellow line, degraded while the vehicle was traveling under a bridge; however, all fusion approaches handled this situation well and remained close to the RTK-GNSS readings, represented by the red line. Figure 12c shows a similar situation in which the GNSS readings degraded while the vehicle was traveling under trees, and the fusion algorithms again remained close to the RTK-GNSS readings. Figure 12d–f show the results of the fusion approaches projected onto Google Street View.
8. Conclusions
We have proposed and evaluated a pose-graph-based real-time multi-sensor fusion framework for vehicle positioning and mapping using a stereo camera, the vehicle's yaw-rate and velocity sensors, and a GNSS receiver. The framework is extensively evaluated on a dataset with 180 km of vehicle trajectories recorded in highway, urban, and rural areas. The graph model in which the GNSS readings are modeled as optimizable nodes (approach G2) achieves the best results in our experiments, as it allows for more flexibility and thereby improves convergence. The precision of incremental hopping window optimization is shown to be very similar to that of global optimization; the difference is only 1 cm, which indicates that the most valuable information for sensor fusion for vehicle positioning is contained in the sensor readings of the last 500 m of the vehicle trajectory. Performing frequent iterations in the incremental hopping window strategy by using a small batch size improves the precision at the cost of increased optimization time. A window size of 1500 m and a batch size of 40 m produced a balanced result in terms of computation time and positioning precision. We have analyzed the framework's performance with (i) vehicle odometry and GNSS fusion and (ii) stereo visual odometry, vehicle odometry, and GNSS fusion. The results show that the pose-graph approach that fuses stereo visual odometry (SVO), vehicle odometry, and GNSS data and is optimized offline (approach GO2) performs best overall on our dataset. It shows a 20.86% improvement in precision compared to a GNSS receiver, whereas the accuracy remains the same. This is expected, as the accuracy of the positioning system is bounded by the accuracy of the GNSS receiver.
We conclude that stereo visual odometry improves the precision of the positioning system compared to vehicle odometry and GNSS fusion alone, and that incremental hopping window pose-graph optimization (approaches LO1 and LO2) performs similarly to offline global optimization (approaches GO1 and GO2) on our dataset.