#### 4.2. IMU Factor

The IMU states of the $k$th and $(k+1)$th frames in the global coordinate system are defined as:

$$\begin{aligned} \mathbf{x}_{k} &= \left[\mathbf{p}_{b_{k}}^{G},\ \mathbf{q}_{b_{k}}^{G},\ \mathbf{v}_{b_{k}}^{G},\ \mathbf{b}_{a_k},\ \mathbf{b}_{g_k}\right] \\ \mathbf{x}_{k+1} &= \left[\mathbf{p}_{b_{k+1}}^{G},\ \mathbf{q}_{b_{k+1}}^{G},\ \mathbf{v}_{b_{k+1}}^{G},\ \mathbf{b}_{a_{k+1}},\ \mathbf{b}_{g_{k+1}}\right] \end{aligned} \tag{9}$$

For example, the state of the $k$th frame, $\mathbf{x}_k$, consists of the position $\mathbf{p}_{b_k}^{G}$, the rotation (a quaternion) $\mathbf{q}_{b_k}^{G}$, the velocity $\mathbf{v}_{b_k}^{G}$, the accelerometer bias $\mathbf{b}_{a_k}$, and the gyroscope bias $\mathbf{b}_{g_k}$.
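The state in Equation (9) can be held in a small container. The sketch below is a minimal illustration only, assuming a Hamilton quaternion stored as `[w, x, y, z]`; the class name `ImuState` and its field names are hypothetical, not taken from the paper. Because the quaternion has four parameters but only three rotational degrees of freedom, the parameter block has 16 entries while the corresponding error state has 15.

```python
from dataclasses import dataclass, field
from typing import ClassVar, List


def _zeros() -> List[float]:
    return [0.0, 0.0, 0.0]


@dataclass
class ImuState:
    """Hypothetical container for the keyframe state x_k of Eq. (9), expressed in frame G."""

    p: List[float] = field(default_factory=_zeros)   # position p^G_{b_k}
    # rotation q^G_{b_k}; [w, x, y, z] Hamilton ordering is an assumption
    q: List[float] = field(default_factory=lambda: [1.0, 0.0, 0.0, 0.0])
    v: List[float] = field(default_factory=_zeros)   # velocity v^G_{b_k}
    ba: List[float] = field(default_factory=_zeros)  # accelerometer bias b_{a_k}
    bg: List[float] = field(default_factory=_zeros)  # gyroscope bias b_{g_k}

    # Error-state size: 3 (p) + 3 (rotation) + 3 (v) + 3 (b_a) + 3 (b_g).
    ERROR_DIM: ClassVar[int] = 15

    def to_vector(self) -> List[float]:
        """Flatten into the 16-parameter block [p, q, v, b_a, b_g]."""
        return [*self.p, *self.q, *self.v, *self.ba, *self.bg]
```

Using `default_factory` gives every keyframe its own lists, so updating one state in the sliding window never aliases another.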

Next, the IMU residual between two consecutive keyframes is constructed as:

$$\mathbf{r}_B\left(\mathbf{z}_{k+1}^{k}, \mathcal{X}\right) = \begin{bmatrix} \mathbf{r}_p \\ \mathbf{r}_q \\ \mathbf{r}_v \\ \mathbf{r}_{b_a} \\ \mathbf{r}_{b_g} \end{bmatrix} = \begin{bmatrix} \mathbf{R}_G^{B_k}\left(\mathbf{p}_{b_{k+1}}^{G} - \mathbf{p}_{b_k}^{G} + \frac{1}{2}\mathbf{g}\,\Delta t_k^{2} - \mathbf{v}_{b_k}^{G}\,\Delta t_k\right) - \hat{\mathbf{p}}_{k+1}^{k} \\ 2\left[\left(\mathbf{q}_{b_k}^{G}\right)^{-1} \otimes \mathbf{q}_{b_{k+1}}^{G} \otimes \left(\hat{\mathbf{q}}_{k+1}^{k}\right)^{-1}\right]_{xyz} \\ \mathbf{R}_G^{B_k}\left(\mathbf{v}_{b_{k+1}}^{G} + \mathbf{g}\,\Delta t_k - \mathbf{v}_{b_k}^{G}\right) - \hat{\mathbf{v}}_{k+1}^{k} \\ \mathbf{b}_{a_{k+1}} - \mathbf{b}_{a_k} \\ \mathbf{b}_{g_{k+1}} - \mathbf{b}_{g_k} \end{bmatrix} \tag{10}$$

where $\mathbf{r}_p$, $\mathbf{r}_q$, $\mathbf{r}_v$, $\mathbf{r}_{b_a}$, and $\mathbf{r}_{b_g}$ are the residuals of position, rotation, velocity, accelerometer bias, and gyroscope bias, respectively, between two consecutive keyframes in the sliding window; $\mathbf{R}_G^{B_k}$ is the rotation matrix that transforms vectors from the global (GNSS) coordinate system into the IMU body frame of the $k$th frame; $\mathbf{g}$ is the gravity vector in the global frame; $[\cdot]_{xyz}$ extracts the vector part of a quaternion; and $\hat{\mathbf{p}}_{k+1}^{k}$, $\hat{\mathbf{q}}_{k+1}^{k}$, and $\hat{\mathbf{v}}_{k+1}^{k}$ are the IMU pre-integration terms between the two keyframes over the interval $\Delta t_k$.
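A minimal pure-Python sketch of Equation (10) follows. It assumes unit Hamilton quaternions stored as `[w, x, y, z]`, takes $\mathbf{R}_G^{B_k}$ as the transpose of the rotation matrix of $\mathbf{q}_{b_k}^{G}$, and keeps the quaternion product order written in Equation (10); the names `State`, `imu_residual`, and the argument layout are hypothetical. A real estimator would also weight this residual by the pre-integration covariance and supply Jacobians, both omitted here.

```python
from typing import List, NamedTuple, Sequence

Vec = List[float]


class State(NamedTuple):
    """Hypothetical keyframe state; q is a unit quaternion [w, x, y, z] (assumed)."""
    p: Vec
    q: Vec
    v: Vec
    ba: Vec
    bg: Vec


def quat_mul(a: Sequence[float], b: Sequence[float]) -> Vec:
    """Hamilton product a (x) b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return [
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ]


def quat_inv(q: Sequence[float]) -> Vec:
    """Inverse of a unit quaternion (its conjugate)."""
    return [q[0], -q[1], -q[2], -q[3]]


def quat_to_rot(q: Sequence[float]) -> List[Vec]:
    """Rotation matrix of a unit quaternion (body -> global for q^G_b)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def mat_t_vec(R: Sequence[Sequence[float]], v: Sequence[float]) -> Vec:
    """Compute R^T v, i.e. apply the inverse rotation."""
    return [sum(R[r][c] * v[r] for r in range(3)) for c in range(3)]


def imu_residual(xk: State, xk1: State, p_hat: Vec, q_hat: Vec, v_hat: Vec,
                 g: Vec, dt: float) -> Vec:
    """Stack [r_p, r_q, r_v, r_ba, r_bg] (15 values) as in Eq. (10)."""
    R_bk_to_G = quat_to_rot(xk.q)  # R^G_{b_k}; its transpose is R_G^{B_k}
    dp = [xk1.p[i] - xk.p[i] + 0.5 * g[i] * dt * dt - xk.v[i] * dt for i in range(3)]
    r_p = [a - b for a, b in zip(mat_t_vec(R_bk_to_G, dp), p_hat)]
    dq = quat_mul(quat_mul(quat_inv(xk.q), xk1.q), quat_inv(q_hat))
    r_q = [2.0 * c for c in dq[1:]]  # 2 * [.]_xyz
    dv = [xk1.v[i] + g[i] * dt - xk.v[i] for i in range(3)]
    r_v = [a - b for a, b in zip(mat_t_vec(R_bk_to_G, dv), v_hat)]
    r_ba = [a - b for a, b in zip(xk1.ba, xk.ba)]
    r_bg = [a - b for a, b in zip(xk1.bg, xk.bg)]
    return r_p + r_q + r_v + r_ba + r_bg
```

When the states agree with the pre-integrated motion, the first nine entries vanish and only bias changes remain.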

#### 4.3. Visual Feature Factor

The visual feature factor is the re-projection error of a visual feature, i.e., the difference between the predicted projection of the feature and its actual observation. To remain consistent with the coordinate system unified in Section 3.3, we define the re-projection error on the unit sphere rather than on the generalized image plane, as illustrated in Figures 6 and 7.

**Figure 6.** Re-projection error of visual point features.

**Figure 7.** Re-projection error of visual line features.

#### 4.3.1. Visual Point Feature Factor

In this study, the visual feature factors are built following VINS-Mono [5]. As shown in Figure 6, the re-projection error of a visual point feature is defined as the difference between the point projected onto the unit sphere and the distortion-corrected observation. For the $i$th feature in the $j$th frame, let $\hat{\mathbf{f}}_i^{j} = [\hat{u}_i^{j}, \hat{v}_i^{j}, 1]^T$ be the predicted normalized projection and $\mathbf{f}_i^{j} = [u_i^{j}, v_i^{j}, 1]^T$ the observation. Using the first observation of the feature, $\mathbf{f}_{i0}^{j} = [u_{i0}^{j}, v_{i0}^{j}, 1]^T$, the visual point feature factor is defined as:

$$\begin{cases} \mathbf{r}_f\left(\mathbf{z}_i^{j}, \mathcal{X}\right) = \begin{bmatrix} \hat{u}_i^{j} - u_i^{j} \\ \hat{v}_i^{j} - v_i^{j} \end{bmatrix} \\ \hat{\mathbf{P}}_i^{j} = \mathbf{R}_B^{V}\left(\mathbf{R}_G^{B_j}\left(\mathbf{R}_{B_0}^{G}\left(\mathbf{R}_V^{B}\,\dfrac{1}{\kappa_i}\begin{bmatrix} u_{i0}^{j} \\ v_{i0}^{j} \\ 1 \end{bmatrix} + \mathbf{p}_V^{B}\right) + \mathbf{p}_{b_0}^{G} - \mathbf{p}_{b_j}^{G}\right) - \mathbf{p}_V^{B}\right) \\ \left[\hat{u}_i^{j},\ \hat{v}_i^{j},\ 1\right]^T = \hat{\mathbf{P}}_i^{j} \big/ \left[\hat{\mathbf{P}}_i^{j}\right]_z \end{cases} \tag{11}$$

where $\mathbf{R}_B^{V}$ is the extrinsic rotation between the IMU and the camera, obtained by calibration, and $\mathbf{R}_V^{B} = (\mathbf{R}_B^{V})^T$; $\mathbf{R}_G^{B_j}$ is the rotation from the global coordinate system to the IMU body frame of the $j$th frame; $\mathbf{R}_{B_0}^{G}$ is the rotation from the IMU body frame at the first observation to the global coordinate system; $\kappa_i$ is the inverse depth of the $i$th feature at its first observation; and $\mathbf{p}_V^{B}$ is the extrinsic translation from the IMU coordinate system to the camera coordinate system. Finally, $\mathbf{p}_{b_0}^{G}$ and $\mathbf{p}_{b_j}^{G}$ are the positions of the IMU at the first observation and at the $j$th frame, respectively, in the global coordinate system.
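The chain of transformations behind the point factor can be sketched as follows: back-project the first observation with depth $1/\kappa_i$, move it from camera to body 0, to the global frame, to body $j$, and to camera $j$, then divide by depth. This is a minimal sketch assuming rotations are passed as 3×3 nested lists, that $\mathbf{R}_G^{B_j}$ is the transpose of the body-to-global rotation of frame $j$, and that $\mathbf{p}_V^{B}$ is the camera origin expressed in the IMU frame; `point_residual` and its argument names are hypothetical.

```python
from typing import List, Sequence

Vec = List[float]
Mat = Sequence[Sequence[float]]


def mat_vec(R: Mat, v: Sequence[float]) -> Vec:
    """Compute R v for a 3x3 matrix."""
    return [sum(R[r][c] * v[c] for c in range(3)) for r in range(3)]


def mat_t_vec(R: Mat, v: Sequence[float]) -> Vec:
    """Compute R^T v (the inverse rotation)."""
    return [sum(R[r][c] * v[r] for r in range(3)) for c in range(3)]


def add(a: Sequence[float], b: Sequence[float]) -> Vec:
    return [x + y for x, y in zip(a, b)]


def sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return [x - y for x, y in zip(a, b)]


def point_residual(uv0, kappa, uv_obs, R_B_V, p_V_B, R_b0_G, p_b0_G, R_bj_G, p_bj_G) -> Vec:
    """Normalized-plane re-projection residual [u_hat - u, v_hat - v].

    R_B_V  : R_B^V, rotation IMU body -> camera (extrinsic).
    p_V_B  : p_V^B, camera origin expressed in the IMU body frame.
    R_b0_G, p_b0_G : body-to-global pose at the first observation.
    R_bj_G, p_bj_G : body-to-global pose of frame j.
    """
    u0, v0 = uv0
    P_c0 = [u0 / kappa, v0 / kappa, 1.0 / kappa]   # back-project, depth = 1/kappa
    P_b0 = add(mat_t_vec(R_B_V, P_c0), p_V_B)     # camera -> body 0 via R_V^B
    P_G = add(mat_vec(R_b0_G, P_b0), p_b0_G)      # body 0 -> global
    P_bj = mat_t_vec(R_bj_G, sub(P_G, p_bj_G))    # global -> body j via R_G^{B_j}
    P_cj = mat_vec(R_B_V, sub(P_bj, p_V_B))       # body j -> camera j
    u_hat, v_hat = P_cj[0] / P_cj[2], P_cj[1] / P_cj[2]  # divide by depth
    return [u_hat - uv_obs[0], v_hat - uv_obs[1]]
```

For example, a feature first seen at $(0.2, -0.1)$ with inverse depth $0.5$ (depth 2) lands at $(0.4, -0.2)$ after the body moves 1 unit forward along the optical axis, so that observation gives a zero residual.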

#### 4.3.2. Visual Line Feature Factor

As shown in Figure 7, the re-projection error of a visual line feature is defined analogously to that of a point feature. Given a line in space, a unit sphere is constructed with the end point of the line segment as its center, and the re-projection error is the difference between the line projected onto this unit sphere and the observed line. According to Equation (2), let the $i$th line observed in the $j$th frame be expressed in the camera coordinate system as $\mathbf{l}_{ci}^{j} = \left[(\mathbf{n}_{ci}^{j})^T, (\mathbf{v}_{ci}^{j})^T\right]^T$. Projecting it onto the unit sphere gives the projected line:

$$\hat{\mathbf{l}}_{ci}^{j} = \begin{bmatrix} l_1 \\ l_2 \\ l_3 \end{bmatrix} = \mathbf{K}\,\mathbf{n}_{ci}^{j} \in \mathbb{R}^3 \tag{12}$$

where $\mathbf{K}$ is the line projection matrix formed from the camera intrinsic parameters. Equation (12) shows that the projected line depends only on the normal component $\mathbf{n}_{ci}^{j}$. Let $\mathbf{a}_i^{j}$ and $\mathbf{b}_i^{j}$ be the two end points of the observed line segment in homogeneous coordinates; the re-projection error of the line feature is then the point-to-line distance from each end point to the projected line:
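The paper does not list the entries of $\mathbf{K}$. A common choice, assumed here, is the line projection matrix derived from pinhole intrinsics $f_x, f_y, c_x, c_y$: it equals $f_x f_y \mathbf{K}_p^{-T}$, where $\mathbf{K}_p$ is the usual 3×3 pinhole intrinsic matrix, so the image line $\mathbf{l} = \mathbf{K}\mathbf{n}$ satisfies $\mathbf{l}^T\mathbf{x} = 0$ for every pixel $\mathbf{x}$ on the line. The helper names below are hypothetical, and the normal is built from two 3D points on the line purely for illustration.

```python
from typing import List, Sequence

Vec = List[float]


def line_projection_matrix(fx: float, fy: float, cx: float, cy: float) -> List[Vec]:
    """Assumed form of K: fx*fy times the inverse-transpose of the pinhole matrix."""
    return [
        [fy, 0.0, 0.0],
        [0.0, fx, 0.0],
        [-fy * cx, -fx * cy, fx * fy],
    ]


def cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def line_normal(X1: Sequence[float], X2: Sequence[float]) -> Vec:
    """Normal n of the plane through the camera center and the 3D line.

    With a point X1 and direction d = X2 - X1, n = X1 x d, which equals X1 x X2.
    """
    return cross(X1, X2)


def project_line(K: Sequence[Sequence[float]], n: Sequence[float]) -> Vec:
    """Eq. (12): l = K n; only the normal n is used, the direction v is not."""
    return [sum(K[r][c] * n[c] for c in range(3)) for r in range(3)]
```

Because only $\mathbf{n}$ enters the projection, the direction component of the line never affects the residual directly.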

$$\begin{cases} \mathbf{r}_l\left(\mathbf{z}_i^{j}, \mathcal{X}\right) = \left[d\left(\mathbf{a}_i^{j}, \hat{\mathbf{l}}_{ci}^{j}\right),\ d\left(\mathbf{b}_i^{j}, \hat{\mathbf{l}}_{ci}^{j}\right)\right]^T \\ d\left(\mathbf{a}_i^{j}, \hat{\mathbf{l}}_{ci}^{j}\right) = \dfrac{\left(\mathbf{a}_i^{j}\right)^T \hat{\mathbf{l}}_{ci}^{j}}{\sqrt{l_1^2 + l_2^2}} \\ d\left(\mathbf{b}_i^{j}, \hat{\mathbf{l}}_{ci}^{j}\right) = \dfrac{\left(\mathbf{b}_i^{j}\right)^T \hat{\mathbf{l}}_{ci}^{j}}{\sqrt{l_1^2 + l_2^2}} \end{cases} \tag{13}$$
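The endpoint-to-line distances of the line factor reduce to a dot product and a normalization. The sketch below assumes the end points are given as homogeneous coordinates `[x, y, 1]` and that the distance is kept signed, as the formula has no absolute value; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def point_line_distance(pt: Sequence[float], l: Sequence[float]) -> float:
    """Signed distance from homogeneous point [x, y, 1] to line [l1, l2, l3]."""
    return (pt[0] * l[0] + pt[1] * l[1] + pt[2] * l[2]) / math.hypot(l[0], l[1])


def line_residual(a: Sequence[float], b: Sequence[float],
                  l_hat: Sequence[float]) -> List[float]:
    """Two-element line residual: distances of both observed end points to l_hat."""
    return [point_line_distance(a, l_hat), point_line_distance(b, l_hat)]
```

Dividing by $\sqrt{l_1^2 + l_2^2}$ makes the residual invariant to the arbitrary scale of $\hat{\mathbf{l}}_{ci}^{j}$, so the output of $\mathbf{K}\mathbf{n}$ does not need to be normalized first. For instance, the line $y = 0$ written as `[0, 5, 0]` gives distances `2.0` and `-0.5` for the points `[3, 2, 1]` and `[-1, -0.5, 1]`.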
