1. Introduction
Localization plays an important role in many applications such as robotics, autonomous driving, and unmanned aerial vehicles (UAVs). Fusing information from multiple sensors is the current mainstream approach to localization. Sensor fusion has been studied for decades, and existing approaches fall into two streams: loosely coupled and tightly coupled. Loosely coupled approaches [1,2] process the measurements of each sensor individually and then fuse the individual results to obtain the final estimate. Tightly coupled approaches [3,4] combine the measurements from all sensors and process them jointly with a single estimator. In general, tightly coupled approaches achieve higher accuracy than loosely coupled ones.
Visual odometry (VO) has received increasing attention due to the low price of cameras. Davison et al. [5] introduced MonoSLAM, the first monocular visual SLAM system, which extracts Shi-Tomasi corners [6] and actively searches for matching point features within projected ellipses. However, because of its high computational complexity, MonoSLAM can only handle small scenes. Klein et al. [7] proposed PTAM, the first optimization-based monocular visual SLAM system, which executes feature tracking and mapping as two independent tasks in two parallel threads.
Because PTAM was designed for small-scale AR scenes, it also has some disadvantages: it can only handle small-scale scenes, and the tracked features are easily lost. A milestone work after PTAM is ORB-SLAM [8] and its successor ORB-SLAM2 [9]. ORB-SLAM improves on PTAM by detecting and tracking ORB features and adding loop closure, which greatly improves the positioning precision of VO. However, ORB-SLAM can only build sparse point cloud maps and cannot track keyframes during pure rotation, which limits its application.
In addition, in environments with repeated textures, point features cannot express the environmental structure well. In contrast, line features carry more information about the environmental structure and can overcome the interference caused by repeated textures; fusing point and line features therefore makes a VO system more robust. He et al. [10] proposed PL-VIO, a VIO system based on point-line features that uses the LSD line segment detector [11] from OpenCV [12] to extract line features. However, LSD is a bottleneck for real-time performance due to its high computational cost [13], which seriously affects the performance of PL-VIO. To improve the speed and positioning precision of PL-VIO, Fu et al. [14] proposed PL-VINS, another VIO framework based on point and line features, which modifies the LSD algorithm through hidden parameter tuning and a length rejection strategy; the modified LSD runs at least three times faster than the original. PLF-VINS [15] is a further point-line VIO system that introduces two methods for fusing point and line features: the first uses the positional similarity of point and line features to search for relationships between them, and the second fuses 3D parallel lines. The residuals formed by the two methods are then added to the VIO system. The results of PLF-VINS show that its positioning precision is greatly improved compared with classic SLAM systems such as OKVIS [4] and VINS-Mono [16].
Line features can improve the positioning precision of a VO system. However, in weakly textured or dark scenes, the system still cannot extract a sufficient number of point and line features, which causes large positioning errors. To overcome this shortcoming, fusing IMU measurements with images is very effective. MSCKF [17,18] is an EKF-based VIO system that adds camera poses to the system state. When the cameras observe landmarks, constraints form between camera poses, and the system is updated through an observation model derived from these geometric constraints. Since the number of camera poses is much smaller than the number of landmarks, this greatly reduces the time complexity of the system. VINS-Mono [16] is a tightly coupled, nonlinear optimization-based method that obtains high-precision positioning results; thanks to loop closure and 4-DoF pose graph optimization, it remains accurate even in large-scale scenes. In addition, to improve the performance of ORB-SLAM and eliminate its errors, ORB-SLAM3 [19] extends ORB-SLAM with IMU measurements, making it more robust and more precise than ORB-SLAM.
A VIO system is more robust than a VO system. Nevertheless, VIO has four unobservable directions, namely x, y, z, and yaw, which lead to accumulated drift. An effective approach to eliminating this drift is to combine VIO with GNSS measurements. Lee et al. [20] demonstrated that a GPS-aided VIO system is fully observable in the ENU frame. GVINS [21] is a tightly coupled GNSS-VIO state estimator that fuses VIO with GNSS pseudorange and Doppler frequency shift measurements; it achieves high positioning accuracy and is a state-of-the-art method. Li et al. [22] introduced a semi-tightly coupled framework based on GNSS Precise Point Positioning (PPP) and stereo VINS (S-VINS), in which S-VINS provides accurately predicted positions in GNSS-unfriendly areas. Liu et al. [23] also proposed a tightly coupled GNSS-VIO state estimator that fuses GNSS raw measurements with VIO; however, it drops all GNSS measurements in GNSS-degraded scenes, which limits its positioning precision.
However, fusing information from multiple sensors poses several challenges. First, the VIO system cannot extract enough point features in areas with repeated textures. Second, GNSS Single Point Positioning (SPP) relies on pseudorange measurements and can only achieve meter-level positioning accuracy, so directly fusing pseudorange measurements with VIO does not greatly improve the positioning accuracy. Finally, since the GNSS receiver and the IMU are mounted at different spatial positions, the extrinsic parameter between them must be estimated.
In response to the above problems, this paper proposes a new tightly coupled GNSS-VIO system that is drift-free and provides global coordinates. The main contributions of this paper are as follows:
To obtain environmental structure information and handle environments with repeated textures, we extract line features on the basis of point features.
To combine the merits of the pseudorange and carrier phase measurements, we use the carrier phase smoothed pseudorange instead of the raw pseudorange, which allows the GNSS-VIO system to run in real time while improving the positioning accuracy.
We demonstrate that the states represented in the ECEF frame are fully observable and that the tightly coupled GNSS-VIO state estimator is consistent.
The rest of this paper is organized as follows: Section 2 introduces the implementation of GNSS-VIO in detail, including line features, GNSS raw measurement processing, and observability analysis; Section 3 summarizes the detailed structure of our system; Section 4 conducts experiments on public datasets; finally, conclusions are given in Section 5.
2. Methods
2.1. Frames and Notations
The frames involved in our system consist of:
Sensor Frame: In our system, the three sensor frames are the camera frame, the body frame, and the GNSS receiver frame.
Local World Frame: The origin of the local world frame is the position where the VIO system starts running, and its z-axis is aligned with gravity, as illustrated in Figure 1.
ENU Frame: The ENU frame is also called the East-North-Up frame; its x-, y-, and z-axes point to the east, north, and up directions, respectively. In our system, the ENU frame shares its origin with the local world frame, and the z-axes of the two frames are aligned (Figure 1).
Geodetic Frame: As shown in Figure 1, the geodetic coordinate of a point p is represented by geodetic longitude L, geodetic latitude B, and geodetic height H. The geodetic longitude L of p is the angle between the reference meridian and the meridian that passes through p. The geodetic latitude B of p is the angle between the equatorial plane and the ellipsoid normal that passes through p. The geodetic height H of p is the minimum distance between p and the reference ellipsoid.
ECEF Frame: The Earth-Centered, Earth-Fixed (ECEF) frame is fixed to Earth. As depicted in Figure 1, its origin is at Earth's center of mass, the x-axis points to the intersection of the Equator and the Prime Meridian, the z-axis is perpendicular to the equatorial plane and points toward the North Pole, and the y-axis completes a right-handed coordinate system with the x- and z-axes.
In this paper, the notation covers: the rotation and translation of the body frame with respect to the local world frame, together with the quaternion form of the rotation; the velocity of the origin of the body frame measured in the local world frame; the accelerometer and gyroscope biases; the extrinsic parameters between the camera and the IMU; the receiver clock error and receiver clock drifting rate; the yaw offset between the local world frame and the ENU frame and the corresponding rotation matrix of the local world frame with respect to the ENU frame; the translation of the ENU frame with respect to the ECEF frame; the extrinsic translation parameter between the GNSS receiver and the IMU; and the skew-symmetric matrix operator for 3D vectors.
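For concreteness, the formula sketches in the remainder of this section use conventional VINS-style symbols; these are illustrative assumptions chosen for readability, not notation recovered from the original:

```latex
% Assumed notation (illustrative only):
%   R^w_b, p^w_b, q^w_b : rotation, translation, quaternion of body frame b in local world frame w
%   v^w_b, b_a, b_g     : body velocity in w, accelerometer bias, gyroscope bias
%   R^b_c, p^b_c        : camera-IMU extrinsic rotation and translation
%   \delta t_r, \delta\dot{t}_r : receiver clock error and clock drift rate
%   \psi, R^n_w         : yaw offset and rotation of w with respect to the ENU frame n
%   p^e_n, p^b_r        : ENU origin in the ECEF frame e; GNSS receiver lever arm in b
%   [x]_\times          : skew-symmetric matrix of a 3D vector x
```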
2.2. Line Feature
2.2.1. Plücker Coordinates
In our system, we describe a spatial line with Plücker coordinates: a spatial line is represented by the normal vector of the plane determined by the line and the origin, together with the direction vector of the line. A line expressed in the local world frame can be transformed into the camera frame by:
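A sketch of this transform in the standard Plücker formulation (assumed symbols in the style of [10], not the original notation):

```latex
\mathcal{L}^{c} =
\begin{bmatrix} \mathbf{n}^{c} \\ \mathbf{d}^{c} \end{bmatrix}
=
\begin{bmatrix}
\mathbf{R}^{c}_{w} & [\mathbf{p}^{c}_{w}]_{\times}\,\mathbf{R}^{c}_{w} \\
\mathbf{0} & \mathbf{R}^{c}_{w}
\end{bmatrix}
\begin{bmatrix} \mathbf{n}^{w} \\ \mathbf{d}^{w} \end{bmatrix}
% n: normal vector of the plane through the line and the origin; d: line direction vector.
```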
2.2.2. Line Feature Triangulation
Assume that a spatial line is observed by two cameras, appearing in their normalized image planes as two line segments; each segment is denoted by its two endpoints, as shown in Figure 2.
A 3D plane can be modeled by its homogeneous coordinates, i.e., a normal vector together with a scalar offset, and any point on the plane satisfies the corresponding incidence equation (see the sketch at the end of this subsection).
According to Equations (2) and (3), we can obtain the two planes. As shown in Figure 2, the normal vector of the first plane is computed from the optical center of the first camera and the two observed endpoints; to do so, the endpoints are transformed into the first camera frame using poses obtained by visual-inertial alignment. Similarly, the normal vector of the second plane is calculated using the translation from the second camera frame to the first camera frame.
After the two plane normals are computed, we can obtain the Plücker coordinates of the spatial line from the dual Plücker matrix, as sketched below:
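A sketch of this construction with assumed symbols (homogeneous planes through the two camera centers and their observed segments; not the original notation):

```latex
% Homogeneous plane through a camera center and an observed segment:
%   \pi = [\pi_x, \pi_y, \pi_z, \pi_w]^T, with \pi_x x + \pi_y y + \pi_z z + \pi_w = 0
% Dual Pluecker matrix assembled from the two planes; n and d are read off its blocks:
L^{*} = \boldsymbol{\pi}_1 \boldsymbol{\pi}_2^{\top} - \boldsymbol{\pi}_2 \boldsymbol{\pi}_1^{\top}
      = \begin{bmatrix} [\mathbf{d}]_{\times} & \mathbf{n} \\ -\mathbf{n}^{\top} & 0 \end{bmatrix}
```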
2.2.3. Orthonormal Representation
Since spatial lines have only 4 DoF, representing them with the six-parameter Plücker coordinates is an overparameterization. In contrast, the orthonormal representation is more suitable for nonlinear optimization. It consists of a 3D rotation matrix and a 2D rotation matrix: the 3D rotation is parameterized by the rotation angles around the x-, y-, and z-axes of the camera frame, and the 2D rotation by a single angle. Therefore, we can define the orthonormal representation of a spatial line by the four-dimensional vector of these angles. In addition, given an orthonormal representation, the corresponding Plücker coordinates can be recovered from the columns of the two rotation matrices. The two representations differ by a scale factor, but they denote the same spatial line.
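A compact sketch of this correspondence with assumed symbols (standard in the point-line SLAM literature, e.g., [10]; not recovered from the original):

```latex
% Orthonormal representation (U, W) of Pluecker coordinates (n, d):
\mathbf{U} = \begin{bmatrix}
\frac{\mathbf{n}}{\lVert\mathbf{n}\rVert} &
\frac{\mathbf{d}}{\lVert\mathbf{d}\rVert} &
\frac{\mathbf{n}\times\mathbf{d}}{\lVert\mathbf{n}\times\mathbf{d}\rVert}
\end{bmatrix},
\qquad
\mathbf{W} = \frac{1}{\sqrt{\lVert\mathbf{n}\rVert^{2}+\lVert\mathbf{d}\rVert^{2}}}
\begin{bmatrix} \lVert\mathbf{n}\rVert & -\lVert\mathbf{d}\rVert \\ \lVert\mathbf{d}\rVert & \lVert\mathbf{n}\rVert \end{bmatrix}
% Minimal 4-DoF parameter vector: three angles for U and one angle \phi for W.
% Recovery up to scale, with w1 = cos(phi), w2 = sin(phi), u_i the i-th column of U:
\mathcal{L} \sim \bigl[\, w_{1}\,\mathbf{u}_{1}^{\top},\; w_{2}\,\mathbf{u}_{2}^{\top} \,\bigr]^{\top}
```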
2.2.4. Line Feature Reprojection Residual
The line feature reprojection residual is defined in terms of the point-to-line distance. A spatial line can be projected onto the image plane by the line projection matrix [24]. Finally, assume that the jth spatial line is observed by the ith camera frame; the spatial line reprojection residual is then modeled from the two endpoints of the matched line segment on the normalized image plane, using the point-to-line distance from each endpoint to the projected line (see the sketch below). For the corresponding Jacobian matrix, we follow the routine of [10].
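A sketch of the projection and residual with assumed symbols (the projected line l and the matched endpoints s and e are illustrative names):

```latex
% Line projection: only the normal part n^c of the Pluecker coordinates is needed;
% K_L denotes the line projection matrix built from the camera intrinsics.
\mathbf{l} = [l_{1}, l_{2}, l_{3}]^{\top} = \mathcal{K}_{L}\,\mathbf{n}^{c}
% Point-to-line distance for a homogeneous image point s = [s_u, s_v, 1]^T:
d(\mathbf{s}, \mathbf{l}) = \frac{\mathbf{s}^{\top}\mathbf{l}}{\sqrt{l_{1}^{2} + l_{2}^{2}}}
% Reprojection residual stacked from the two matched endpoints s and e:
\mathbf{r}_{\mathcal{L}} = \begin{bmatrix} d(\mathbf{s}, \mathbf{l}) \\ d(\mathbf{e}, \mathbf{l}) \end{bmatrix}
```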
2.3. GNSS Measurements
GNSS measurements include pseudorange, carrier phase, and Doppler frequency shift.
2.3.1. Pseudorange Measurement
The pseudorange is defined as the measured distance obtained by multiplying the travel time of the satellite signal by the speed of light. Because it is affected by the satellite clock error, the receiver clock error, and ionospheric and tropospheric delays, it carries the prefix "pseudo" to distinguish it from the true distance between the satellite and the GNSS receiver. Although the pseudorange measurement only provides meter-level positioning precision (about 10 m for the P code and 20-30 m for the C/A code), it is available in real time and has no ambiguity, so it remains very important for GNSS positioning. For a given satellite and GNSS receiver at time k, the pseudorange is modeled as the geometric range between the satellite and receiver positions in the ECEF frame, plus the clock errors scaled by the speed of light and the tropospheric and ionospheric delays; the satellite clock error and the atmospheric delays can be computed from the satellite ephemeris. The remaining term denotes the multipath and random error of the pseudorange measurement, which follows a zero-mean Gaussian distribution.
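A sketch of this model with assumed symbols (superscript j for the satellite, subscript r for the receiver):

```latex
P^{j}_{r,k} = \lVert \mathbf{p}^{j}_{k} - \mathbf{p}_{r,k} \rVert
            + c\,\bigl(\delta t_{r,k} - \delta t^{j}_{k}\bigr)
            + T^{j}_{k} + I^{j}_{k} + \epsilon_{P}
% p^j_k, p_{r,k}: satellite and receiver ECEF positions; c: speed of light;
% \delta t: clock errors; T, I: tropospheric and ionospheric delays; \epsilon_P: noise.
```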
2.3.2. Carrier Phase Measurement
Although pseudorange positioning is an important GNSS method, its error is too large for applications that require high-precision positioning. In contrast, because the carrier wavelength is short (about 19 cm on GPS L1 and 24 cm on L2), positioning with the carrier phase measurement can achieve centimeter-level accuracy. However, since the carrier is a periodic sinusoidal signal and the GNSS receiver can only measure the fractional part of less than one wavelength, the whole-cycle ambiguity must be resolved, which makes the positioning process time-consuming. The carrier phase is defined as the difference between the phase transmitted by the satellite and the reference phase generated by the GNSS receiver. Like the pseudorange, the carrier phase measurement is related to the positions of the satellite and the receiver; its model additionally involves the carrier wavelength and the whole-cycle ambiguity, and its multipath and random error follows a zero-mean Gaussian distribution.
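A sketch with assumed symbols (lambda: wavelength; N: whole-cycle ambiguity; note that in the standard model the ionospheric term enters with the opposite sign to the pseudorange):

```latex
\lambda\,\phi^{j}_{r,k} = \lVert \mathbf{p}^{j}_{k} - \mathbf{p}_{r,k} \rVert
                        + c\,\bigl(\delta t_{r,k} - \delta t^{j}_{k}\bigr)
                        + T^{j}_{k} - I^{j}_{k} + \lambda N^{j}_{r} + \epsilon_{\phi}
```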
2.3.3. Doppler Frequency Shift Measurement
The Doppler effect is a wave-propagation phenomenon: the wavelength of the radiation emitted by an object changes due to the relative motion of the wave source and the observer. As the source moves toward the observer, the wave is compressed, so the wavelength becomes shorter and the frequency higher; as the source moves away, the wavelength becomes longer and the frequency lower. The same effect occurs between a satellite and a GNSS receiver: as the satellite orbits Earth, the relative motion between the satellite and the receiver shifts the frequency of the received signal. This frequency change is called the Doppler frequency shift. Its model involves the unit line-of-sight vector from the receiver to the satellite, the satellite and receiver velocities at time k in the ECEF frame, and the satellite clock drifting rate, which can be calculated from the satellite ephemeris; the remaining term represents the multipath and random error of the Doppler frequency shift measurement, which follows a zero-mean Gaussian distribution.
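A sketch with assumed symbols (one common sign convention; receivers differ):

```latex
-\lambda\,D^{j}_{r,k} = \bigl(\mathbf{e}^{j}_{r,k}\bigr)^{\top}\bigl(\mathbf{v}^{j}_{k} - \mathbf{v}_{r,k}\bigr)
                      + c\,\bigl(\delta\dot{t}_{r,k} - \delta\dot{t}^{j}_{k}\bigr) + \epsilon_{D}
% e^j_{r,k}: unit line-of-sight vector from receiver to satellite;
% v^j_k, v_{r,k}: ECEF velocities; \delta\dot{t}: clock drift rates.
```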
2.4. Carrier Phase Smoothed Pseudorange
As mentioned above, the pseudorange measurement is fast and efficient but only achieves meter-level positioning precision. In contrast, the carrier phase measurement can achieve centimeter-level precision but requires resolving the whole-cycle ambiguity, which is complex and time-consuming. Therefore, to combine the merits of the two measurements, we use the carrier phase smoothed pseudorange (hereafter, smoothed pseudorange) to improve the positioning precision; in general, its precision is several times better than that of the raw pseudorange. For a single-frequency receiver, when the whole-cycle ambiguity and ionospheric delay are nearly constant within a period of time, the pseudorange can be smoothed by using the carrier phase. The Hatch filter is the most widely used approach: it assumes that the ionospheric delay is nearly constant among GNSS epochs and averages the multiepoch whole-cycle ambiguity and ionospheric delay to improve the positioning accuracy of the pseudorange measurement. The smoothing interval m of the Hatch filter is usually 20 to 200 epochs, and the multipath and random error of the smoothed pseudorange also follows a zero-mean Gaussian distribution. According to Equation (15), the variance of the smoothed pseudorange is related to the variances of the pseudorange and carrier phase measurements; assuming that the errors at different times are independent, the variance of the smoothed pseudorange can be modeled accordingly (Equation (16)).
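Below is a minimal sketch of the Hatch filter recursion, assuming no cycle slips; the function and variable names are illustrative, not from the original implementation:

```python
import numpy as np

def hatch_filter(pseudorange, carrier_phase, wavelength, m=100):
    """Carrier phase smoothed pseudorange via the Hatch filter (no cycle slips assumed).

    pseudorange:   raw pseudoranges per epoch [m]
    carrier_phase: carrier phase per epoch [cycles]
    wavelength:    carrier wavelength [m]
    m:             smoothing window in epochs (typically 20-200)
    """
    pr = np.asarray(pseudorange, dtype=float)
    phase_m = wavelength * np.asarray(carrier_phase, dtype=float)  # phase in meters
    smoothed = np.empty_like(pr)
    smoothed[0] = pr[0]  # seed the filter with the first raw pseudorange
    for k in range(1, len(pr)):
        w = 1.0 / min(k + 1, m)  # weight grows until the window fills, then stays 1/m
        # Recursive Hatch update: blend the raw pseudorange with the previous
        # smoothed value propagated by the carrier phase increment.
        smoothed[k] = w * pr[k] + (1.0 - w) * (smoothed[k - 1] + phase_m[k] - phase_m[k - 1])
    return smoothed
```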
2.5. Factor Graph Representation
The factor graph of our system is shown in Figure 3, which depicts the factors in a sliding window: point feature factors, line feature factors, IMU factors, carrier phase smoothed pseudorange factors, and Doppler frequency shift factors. The visual observations consist of the point and line features detected by our system. In nonlinear optimization, the states are optimized according to the residuals of these factors. The states of our system include the body poses, velocities, and IMU biases of the frames in the sliding window, the inverse depths of the point features, the orthonormal representations of the line features, and the GNSS-related states, where n is the sliding window size, i is the number of point features, and j is the number of line features.
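A sketch of the sliding-window state vector under the assumed notation introduced in Section 2.1 (the grouping is illustrative):

```latex
\mathcal{X} = \bigl[\, \mathbf{x}_{0}, \ldots, \mathbf{x}_{n},\;
                      \lambda_{0}, \ldots, \lambda_{i},\;
                      \mathcal{O}_{0}, \ldots, \mathcal{O}_{j},\;
                      \delta t_{r},\; \delta\dot{t}_{r},\; \psi,\; \mathbf{p}^{b}_{r} \,\bigr]
\qquad
\mathbf{x}_{k} = \bigl[\, \mathbf{p}^{w}_{b_{k}},\, \mathbf{v}^{w}_{b_{k}},\,
                          \mathbf{q}^{w}_{b_{k}},\, \mathbf{b}_{a},\, \mathbf{b}_{g} \,\bigr]
% \lambda: inverse depths of point features; \mathcal{O}: orthonormal line parameters.
```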
The IMU preintegration and point feature residuals can be obtained according to [16], and the line feature residual can be obtained from Equation (10). In the following, we derive the smoothed pseudorange and Doppler frequency shift residuals in detail. The position of the receiver at time k in the ECEF frame is obtained from its position in the local world frame, which is the body position plus the GNSS-IMU extrinsic translation rotated into the local world frame. The rotation matrix of the local world frame with respect to the ENU frame has only 1 DoF: since the two frames are gravity-aligned, it is determined solely by the yaw offset. The rotation matrix of the ENU frame with respect to the ECEF frame is determined by the longitude L and latitude B of the ENU origin. The chain of transforms is sketched below.
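Under the assumed notation, the receiver position propagates through the frames as follows (a sketch, not the original equations):

```latex
\mathbf{p}^{w}_{r} = \mathbf{p}^{w}_{b} + \mathbf{R}^{w}_{b}\,\mathbf{p}^{b}_{r},
\qquad
\mathbf{R}^{n}_{w} = \begin{bmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix},
\qquad
\mathbf{p}^{e}_{r} = \mathbf{p}^{e}_{n} + \mathbf{R}^{e}_{n}\,\mathbf{R}^{n}_{w}\,\mathbf{p}^{w}_{r}
```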
Given a position in the ECEF frame, it can be converted to the geodetic frame by an iterative computation, where a and b denote the semimajor axis and the semiminor axis of the reference ellipsoid, respectively, the radius of curvature in the prime vertical appears in the iteration, and e is the eccentricity, which is determined by a and b. A sketch of the conversion follows.
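A minimal sketch of this conversion, iterating on the latitude with WGS-84 constants; names are illustrative:

```python
import math

# WGS-84 reference ellipsoid constants
A = 6378137.0                 # semimajor axis a [m]
F = 1.0 / 298.257223563       # flattening
E2 = F * (2.0 - F)            # eccentricity squared, e^2 = (a^2 - b^2) / a^2

def ecef_to_geodetic(x, y, z, iterations=5):
    """Convert an ECEF position to geodetic longitude L, latitude B, height H."""
    lon = math.atan2(y, x)
    p = math.hypot(x, y)                  # distance from the rotation axis
    lat = math.atan2(z, p * (1.0 - E2))   # initial guess ignoring height
    for _ in range(iterations):           # fixed-point iteration on the latitude
        n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)  # prime-vertical radius
        h = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1.0 - E2 * n / (n + h)))
    n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)
    h = p / math.cos(lat) - n
    return lon, lat, h                     # radians, radians, meters
```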
After the longitude L and latitude B of the ENU origin are computed, the rotation matrix of the ENU frame with respect to the ECEF frame can be obtained by:
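A sketch of this rotation (its columns are the east, north, and up axes expressed in ECEF; assumed symbols):

```latex
\mathbf{R}^{e}_{n} =
\begin{bmatrix}
-\sin L & -\sin B \cos L & \cos B \cos L \\
 \cos L & -\sin B \sin L & \cos B \sin L \\
 0      &  \cos B        & \sin B
\end{bmatrix}
```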
According to Equation (15), the smoothed pseudorange residual can be represented as the difference between the predicted and the measured smoothed pseudorange. In nonlinear optimization, the Jacobian matrix of the smoothed pseudorange residual with respect to the states must also be calculated; it follows from Equation (23) by the chain rule. The derivation details are provided in Appendix A.
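A sketch of the residual and the position-dependent part of its Jacobian under the assumed notation:

```latex
% Residual: predicted minus measured smoothed pseudorange:
r_{\bar{P}}(\mathcal{X}) = \lVert \mathbf{p}^{j}_{k} - \mathbf{p}^{e}_{r,k}(\mathcal{X}) \rVert
  + c\,\bigl(\delta t_{r,k} - \delta t^{j}_{k}\bigr) + T^{j}_{k} + I^{j}_{k} - \bar{P}^{j}_{r,k}
% Chain rule through the receiver's ECEF position:
\frac{\partial r_{\bar{P}}}{\partial \mathcal{X}}
  = \frac{\partial r_{\bar{P}}}{\partial \mathbf{p}^{e}_{r,k}}
    \frac{\partial \mathbf{p}^{e}_{r,k}}{\partial \mathcal{X}},
\qquad
\frac{\partial r_{\bar{P}}}{\partial \mathbf{p}^{e}_{r,k}}
  = -\frac{\bigl(\mathbf{p}^{j}_{k} - \mathbf{p}^{e}_{r,k}\bigr)^{\top}}
          {\lVert \mathbf{p}^{j}_{k} - \mathbf{p}^{e}_{r,k} \rVert}
```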
As in Equation (14), the velocity of the receiver in the ECEF frame is transformed from its velocity in the local world frame, and the Doppler frequency shift residual is obtained accordingly. The Jacobian matrix of the Doppler frequency shift residual is derived in the same way as that of the smoothed pseudorange and is therefore not repeated here.
After the residuals of all factors are obtained, the system optimizes the states with the Ceres solver [25]. The cost function of our system combines the prior residual from marginalization, the IMU preintegration residuals over the sliding window, the point and line feature residuals (wrapped in a Cauchy robust function to suppress outliers), and the smoothed pseudorange and Doppler frequency shift residuals:
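A sketch of this cost under the assumed notation (rho: Cauchy robust function; calligraphic sets: the IMU, point, line, and GNSS measurements in the window):

```latex
\min_{\mathcal{X}} \;
  \lVert \mathbf{r}_{p} \rVert^{2}
+ \sum_{k \in \mathcal{B}} \lVert \mathbf{r}_{\mathcal{B},k} \rVert^{2}
+ \sum_{m \in \mathcal{P}} \rho\bigl(\lVert \mathbf{r}_{\mathcal{P},m} \rVert^{2}\bigr)
+ \sum_{l \in \mathcal{L}} \rho\bigl(\lVert \mathbf{r}_{\mathcal{L},l} \rVert^{2}\bigr)
+ \sum_{g \in \mathcal{G}} \bigl( \lVert r_{\bar{P},g} \rVert^{2} + \lVert r_{D,g} \rVert^{2} \bigr)
```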
2.6. GNSS-IMU Calibration
The extrinsic translation parameter between the GNSS receiver and the IMU can be calibrated after our system performs ENU origin estimation and yaw estimation. We estimate it as the initial value for nonlinear optimization through the following optimization problem (sketched after this paragraph). After the GNSS-IMU calibration is successfully initialized, our system performs the nonlinear optimization.
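A sketch of one plausible form of this initialization problem (an assumption, not the original formulation): the lever arm is chosen to best explain the smoothed pseudoranges given the trajectory estimated so far:

```latex
\hat{\mathbf{p}}^{b}_{r} = \arg\min_{\mathbf{p}^{b}_{r}}
  \sum_{k} \sum_{j} \bigl\lVert r_{\bar{P}}\bigl(\mathbf{p}^{b}_{r};\,
  \mathbf{R}^{w}_{b_{k}}, \mathbf{p}^{w}_{b_{k}}\bigr) \bigr\rVert^{2}
```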
2.7. Observability Analysis of Tightly Coupled GNSS-VIO System
A SLAM system can be described by a state equation and an output equation, where the input and output are the external variables of the system and the state comprises its internal variables. If the states of the system can be completely determined from the output, the system is fully observable; otherwise, it is not. Observability plays a very important role in state estimation: if some states are unobservable, the positioning precision degrades when the system runs over long trajectories. Observability can be analyzed through the observability matrix; if the dimension of its null space is equal to 0, the system is fully observable. To facilitate the observability analysis of our proposed system, some simplifications were required. First, the accelerometer and gyroscope biases were not included in the states, because the biases are observable and do not change the results of the observability analysis [26]. Second, we considered a single point and a single line feature [27]. Third, the GNSS-IMU translation parameter was assumed to be successfully calibrated. The discrete-time linear error-state model and residual of the system then take the standard form below, in which the error state is propagated by the error-state transition matrix, the residual is related to the error state by the measurement Jacobian matrix, and the system and measurement noise processes are modeled as zero-mean white Gaussian processes.
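In symbols (assumed notation):

```latex
\tilde{\mathbf{x}}_{k+1} = \boldsymbol{\Phi}_{k}\,\tilde{\mathbf{x}}_{k} + \mathbf{w}_{k},
\qquad
\mathbf{r}_{k} = \mathbf{H}_{k}\,\tilde{\mathbf{x}}_{k} + \mathbf{n}_{k}
% \tilde{x}: error state; \Phi: error-state transition matrix;
% H: measurement Jacobian; w, n: system and measurement noise processes.
```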
According to [28], the observability matrix can be obtained as:
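A sketch of the standard construction (assumed symbols, stacking the measurement Jacobians propagated by the transition matrix):

```latex
\mathbf{M} =
\begin{bmatrix}
\mathbf{H}_{k} \\
\mathbf{H}_{k+1}\,\boldsymbol{\Phi}_{k} \\
\vdots \\
\mathbf{H}_{k+t}\,\boldsymbol{\Phi}_{k+t-1}\cdots\boldsymbol{\Phi}_{k}
\end{bmatrix}
```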
In Equation (34), the observability matrix is defined as a function of the error-state transition matrix and the measurement Jacobian matrix.
Therefore, given the linearized system in Equation (33), its observability can be demonstrated according to Equation (34). The proof is as follows:
Theorem 1. The states represented in the ECEF frame are fully observable.
Proof of Theorem 1. The simplified states include:
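A sketch of what these simplified states plausibly comprise under the assumed notation (biases excluded per the simplifications above):

```latex
\mathbf{x} = \bigl[\, (\mathbf{p}^{e}_{b})^{\top},\; (\mathbf{v}^{e}_{b})^{\top},\;
                     (\mathbf{q}^{e}_{b})^{\top},\; \lambda,\; \mathcal{O}^{\top} \,\bigr]^{\top}
% p, v, q: body position, velocity, orientation;
% \lambda: inverse depth of the point feature; \mathcal{O}: orthonormal line parameters.
```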
Generally, the raw accelerometer and gyroscope measurements from the IMU consist of the true specific force and angular velocity corrupted by noise; here the measurements are represented in the local world frame. Given two time instants, the position, velocity, and orientation states can be propagated by the IMU measurements, with orientations chained through the quaternion multiplication operation:
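A sketch of the continuous-time propagation under the assumed notation (hatted a and omega: accelerometer and gyroscope measurements; g: gravity in the local world frame):

```latex
\dot{\mathbf{p}}^{w}_{b} = \mathbf{v}^{w}_{b},
\qquad
\dot{\mathbf{v}}^{w}_{b} = \mathbf{R}^{w}_{b}\,\hat{\mathbf{a}} - \mathbf{g}^{w},
\qquad
\dot{\mathbf{q}}^{w}_{b} = \tfrac{1}{2}\,\mathbf{q}^{w}_{b} \otimes
\begin{bmatrix} 0 \\ \hat{\boldsymbol{\omega}} \end{bmatrix}
```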
Equation (37) is the continuous-time state propagation model. To analyze the observability of the system, the continuous-time model must be discretized; we used the Euler method to obtain the discrete-time model of Equation (37):
According to Equation (38), the error propagation equations for position, velocity, and orientation can be obtained; the details of Equation (39) are provided in Appendix B.
Similarly, when the states are represented in the ECEF frame, Equation (39) still holds. Therefore, the discrete-time error-state model of Equation (35) can be obtained. Since the one-step transition matrix is upper triangular, the error-state transition matrix from time step k to k + t is also upper triangular, with nonzero entries on and above the diagonal.
For the measurement Jacobian matrix, our system includes visual observations and GNSS measurements. GNSS measurements comprise pseudorange, carrier phase, and Doppler frequency shift, and any of them yields the same observability result; in the following, we use the smoothed pseudorange measurement for the analysis.
The Jacobian matrix of the smoothed pseudorange measurement with respect to the states in Equation (35) can be obtained along the lines of Appendix A. Since the smoothed pseudorange measurement does not involve the velocity, the orientation, the inverse depth of the point feature, or the orthonormal representation of the line feature, the corresponding entries of the Jacobian matrix are zero.
In addition, the Jacobian matrix of the visual observations can be obtained following the derivations in [10].
Therefore, we can assemble the entries of the observability matrix. According to Equation (46), we can clearly observe that the dimension of the null space of the observability matrix is equal to zero, which means the states represented in the ECEF frame are fully observable and the tightly coupled GNSS-VIO state estimator is consistent. □
The fact that the tightly coupled GNSS-VIO system is fully observable means that even when the system runs over long trajectories, the accumulated error can be eliminated. By leveraging the global measurements from GNSS, our system achieves high-precision, drift-free positioning, in contrast to a pure VIO system with its four unobservable directions.
3. System Overview
The architecture of our proposed system is shown in Figure 4.
The proposed system implements four threads: data input, preprocessing, initialization, and nonlinear optimization. As shown in Figure 4, the white blocks represent work already implemented in VINS-Mono [16] and GVINS [21], and the green blocks represent our improvements. The inputs of our system are image, IMU, and GNSS measurements. In the preprocessing step, point and line features are detected and tracked, IMU measurements are preintegrated, and the pseudorange measurements are smoothed by the carrier phase. In the initialization step, we follow the routine of VINS-Mono [16] for visual-inertial alignment (VI-Alignment). After VI-Alignment is completed, we perform GNSS initialization in four stages: ENU Origin Estimation (Coarse), Yaw Estimation, ENU Origin Estimation (Fine), and GNSS-IMU Calibration; the first three stages are implemented in GVINS [21]. Finally, nonlinear optimization refines the states in Equation (17) by leveraging the residuals and Jacobian matrices of the different factors computed in Section 2.5.
5. Conclusions
In this paper, we proposed a tightly coupled GNSS-VIO system based on point-line features. First, since line features contain more environmental structure information, we added line features to the system. Second, the pseudorange measurement only achieves meter-level positioning precision but is fast and unambiguous, whereas the carrier phase measurement achieves centimeter-level precision but requires solving the time-consuming whole-cycle ambiguity; we therefore combined the advantages of the two measurements by replacing the pseudorange with the carrier phase smoothed pseudorange, which greatly improves the performance of our system. Third, we considered the extrinsic translation parameter between the GNSS receiver and the IMU, which our system calibrates in real time. Finally, we demonstrated that if the states are represented in the ECEF frame, the tightly coupled GNSS-VIO system is fully observable. Experiments on public datasets show that our system achieves high-precision, robust, and real-time localization. In the future, we will focus on further improving tightly coupled GNSS-VIO systems.