**1. Introduction**

Autonomous driving, unmanned aerial vehicles (UAVs), and the Internet of Things (IoT) are technologies that have developed rapidly in recent years. Precise navigation and positioning in complex environments are receiving increasing attention. However, any one sensor alone is not able to provide position solutions with high accuracy, availability, reliability, and continuity at any time and in all environments [1]. The integration of different sensors, for example, the integration of GNSS, INS, and vision, has become a trend [2–4].

GNSS (global navigation satellite system) is an efficient tool for providing precise positioning regardless of time and location and it is widely used in transportation. Zumberge et al. [5] proposed the precise point positioning (PPP) technique, which has received much attention in recent years for its low costs, global coverage, and high accuracy [6,7]. Though PPP can provide centimeter-level positioning for real-time kinematic applications, a nearly 30 min convergence time has limited its applications in UAVs and other technologies. Thus, great efforts have been focused on improving the PPP performance, especially to accelerate its convergence, and promoting various methods, e.g., multi-GNSS combination, ambiguity resolution, and atmospheric augmentation. Lou et al. [8] presented a comprehensive analysis of quad-constellations with PPP. The results showed that in comparison with the GPS-only solution, the four-system combined PPP can reduce the convergence

**Citation:** Gu, S.; Dai, C.; Mao, F.; Fang, W. Integration of Multi-GNSS PPP-RTK/INS/Vision with a Cascading Kalman Filter for Vehicle Navigation in Urban Areas. *Remote Sens.* **2022**, *14*, 4337. https://doi.org/ 10.3390/rs14174337

Academic Editor: Andrzej Stateczny

Received: 13 August 2022 Accepted: 26 August 2022 Published: 1 September 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

time by more than 60% on average in kinematic mode. For ambiguity resolution, Ge et al. [9] proposed the uncalibrated phase delay (UPD) method. Then, Laurichesse et al. [10] proposed the integer phase clock method, and Collins et al. [11] proposed the decoupled clock model to facilitate PPP ambiguity resolution (PPP-AR). It was proved that these three PPP-AR methods can dramatically accelerate the convergence and improve the positioning accuracy of PPP [9–12]. The undifferenced and uncombined data processing strategy has received increasing interest [13–18]. First proposed by Gerhard Wübbena et al. [19], PPP with ambiguity resolution and atmospheric augmentation is nowadays widely regarded as PPP-RTK (real-time kinematic). As PPP-RTK weakens the influence of the long convergence time in PPP and regional service coverage in RTK, it is regarded as a promising technique for high-precision navigation in mass market applications, including vehicle platforms. As a result, some regional authorities have developed their own PPP-RTK augmentation services, e.g., QZSS centimeter-level accuracy service (QZSS CLAS) began offering PPP-RTK services in 2018 [20]. Chinese BDS also intends to provide its satellite-based PPP-RTK service in the future.

However, the performance of GNSS is limited by non-line-of-sight (NLOS) conditions, which means PPP-RTK cannot work very well in challenging environments such as urban areas [21]. When the satellite signals are blocked by buildings or other structures, PPP-RTK fails to provide positioning results if there are less than four satellites available and the performance is terrible due to frequent re-convergence and gross errors. Inertial navigation systems (INS) are immune to interference and can output navigation states continuously without external information. However, the accuracy degrades fast over time due to the accumulated errors. Integrating GNSS with INS can minimize their respective drawbacks and improve the performance of GNSS or INS alone. There are two common integration strategies for PPP with INS, tightly coupled integration and loosely coupled integration [22]. Moreover, it has been proved that the tightly coupled integration of PPP/INS performs better than loosely coupled integration, especially under GNSSchallenged environments [23]. Furthermore, Rabbou M A [24] studied the integration of GPS PPP and MEMS (micro-electro-mechanical System)-based inertial system, the results of which suggested that decimeter-level positioning accuracy was achievable for GPS outages within 30 s. Gao et al. [23] analyzed the integration of multi-GNSS PPP with MEMS IMU. The results showed that the position RMS improved from 23.3 cm, 19.8 cm, and 14.9 cm for the GPS PPP/INS tightly coupled integration to 7.9 cm, 3.3 cm, and 5.1 cm for the multi-GNSS PPP/INS in the north, east, and up components, respectively. PPP-AR/INS tightly coupled integration is able to realize stable centimeter-level positioning after the first AR and achieve fast re-convergence and re-resolving after a short period of GNSS outage [25]. Han et al. [26] analyzed the performance of the tightly coupled integration of RTK/INS constrained with the ionospheric and tropospheric models. Gu et al. [27] realized the tightly coupled integration of multi-GNSS PPP/INS with atmospheric augmentation. Taking the advantages of PPP-RTK over PPP and RTK into consideration, the integration of multi-GNSS PPP-RTK and MEMS IMU still needs further research.

The performance of GNSS/INS tightly coupled integration could deteriorate even if there were short periods of GNSS signal outages and the positioning accuracy was terrible during long periods of GNSS signal outages, as the drift of INS accumulates rapidly. Therefore, other aiding sensors are required to limit the drift errors of INS when the GNSS signals are blocked. On the one hand, a camera is suitable for solving this problem since visual odometry (VO) can estimate the motion of a vehicle with a slow drift. On the other hand, the model of the monocular camera is relatively simple, but it lacks the metric scale, which can be recovered by IMU. Consequently, a monocular camera is usually integrated with IMU to achieve accurate pose estimations. The fusion algorithms of IMU and vision are usually based on an extended Kalman filter (EKF) [28–30] or nonlinear optimization [31,32]. The former method usually carries out linearization only once, so there may be obvious linearization errors for vision data processing. The latter method utilizes iterative linearization, which can achieve higher estimation accuracy but is subject to an increased computational burden. The multi-state constraint Kalman filter (MSCKF) is a popular EKF-based visual–inertial odometry (VIO) approach, which is capable of high-precision pose estimations in large-scale real-world environments [28]. MSCKF maintains several previous camera poses in the state vector by a sliding window and forms the constraints among multiple camera poses by using visual measurements of the same feature point across multiple camera views. Accordingly, the computational complexity is linear with the number of feature points.

On the one hand, VIO can provide accurate pose estimations when GNSS is unavailable. On the other hand, VIO or VI-SLAM (simultaneous localization and mapping) can only achieve an estimation of motion and provide the relative position and attitude and there are unavoidable accumulated drifts over time. Consequently, the integration of GNSS, INS, and vision is receiving increasing interest [33–36]. Kim et al., 2005 used a six-degreesof-freedom (DOF) SLAM to aid GNSS/INS navigation by providing reliable navigation solutions in denied and unknown environments GNSS. Then, Won et al. integrated GNSS with vision for low GNSS visibility [34], and proposed the selective integration of GNSS, INS, and vision under GNSS-challenged environments [2], which was able to improve the positioning accuracy compared with nonselective integration. However, in most of these studies, only the position provided by the GNSS or pseudo-range measurements were utilized in the fusion of GNSS, INS, and vision. The application of the carrier phase in multi-sensor fusion is less studied. More recently, Liu [35] proposed the tightly coupled integration of a GNSS/INS/stereo vision/map matching system for land vehicle navigation, but only the positioning results of PPP were integrated with the INS, stereo vision, and map matching system. Li et al. [36] further conducted the tightly coupled integration of RTK, MEMS-IMU, and monocular cameras. Obviously, more efforts should be focused on PPP-RTK/INS/vision integration to fully explore the potential of GNSS for further research.

There are many dynamic objects in urban areas which interfere with VIO. Dynamic objects can provide dynamic feature points, but mainstream SLAM uses static feature points to recover the motion. There are a lot of researches about dynamic object removal in VIO or SLAM, but most of them mainly focus on vision [37,38]. Thus, a simple dynamic feature points removal model based on position is proposed in this paper with the help of GNSS. As VIO has accumulation errors, a model based on position does not work well without GNSS.

This paper aims to evaluate the navigation performance of the integration of multi-GNSS PPP-RTK, MEMS-IMU, and monocular cameras with a cascading filter and the dynamic object removal algorithm in urban areas. The remainder of this paper is organized as follows: first, Section 2 presents the mathematical models of PPP-RTK, the MEMS-IMU, and monocular camera integration based on the MSCKF, integration of multi-GNSS PPP-RTK, INS, and vision as well as the dynamic object removal model, and introduces the structure of the proposed model. Then, the details of the test are demonstrated in Section 3 and the efficiency of different techniques in urban vehicle navigation is analyzed in Section 4. Finally, Section 5 presents the conclusions.

#### **2. Methods**

The undifferenced and uncombined PPP-RTK, INS model, PPP-RTK/INS tightly coupled integration model, as well as the vision model, are presented in this section in order to derive the integration model of multi-GNSS PPP-RTK/INS/vision with a cascading filter. According to the suggestion in RINEX 3.02 (https://kb.igs.org/hc/en-us/articles/ 115003980628-RINEX-3-02 (accessed on 12 August 2022)), the GPS and BDS systems are denoted as G and C, respectively.

*2.1. PPP-RTK Model*

The raw observations of the GNSS pseudo-range and carrier phase can be expressed as follows [39]:

$$\begin{cases} P\_{r,f}^s = \rho\_r^s + t\_{r,sys} + a\_r^s T\_Z + \frac{40.3}{f^2} \gamma\_r^s I\_r^s - b^{s,f} + b\_{r,f} + \varepsilon\_P\\ \Phi\_{r,f}^s = \rho\_r^s + t\_{r,sys} + a\_r^s T\_Z - \frac{40.3}{f^2} \gamma\_r^s I\_r^s + \lambda N\_{r,f}^s + \varepsilon\_\Phi \end{cases} \tag{1}$$

in which *P<sup>s</sup> <sup>r</sup>*, *<sup>f</sup>* and <sup>Φ</sup>*<sup>s</sup> <sup>r</sup>*, *<sup>f</sup>* are the pseudo-range and carrier phase at frequency *f* corresponding to receiver *r* and satellite *s* in length units, respectively; *ρ<sup>s</sup> <sup>r</sup>* is the geometric distance between receiver *r* and satellite *s*; *tr*,*sys* is the receiver clock error corresponding to the system *sys* ∈ (*G C*) in the length units, respectively; *<sup>α</sup><sup>s</sup> <sup>r</sup>* and *γ<sup>s</sup> <sup>r</sup>* are the mapping functions of the tropospheric and ionospheric delays, respectively; *TZ* and *I<sup>s</sup> <sup>r</sup>* stand for the zenith tropospheric delay and the zenith total electron content (TEC); *bs*, *<sup>f</sup>* and *br*, *<sup>f</sup>* denote the hardware delay for satellite *s* and receiver *r*, respectively; *λ* and *N<sup>s</sup> <sup>r</sup>*, *<sup>f</sup>* are the carrier phase wavelength and float ambiguity; *<sup>ε</sup><sup>p</sup>* and *<sup>ε</sup>*<sup>Φ</sup> represent the measurement noise of pseudo-range and carrier phase including the unmodeled multipath error, respectively. Additionally, it is assumed that other errors, such as satellite orbit and clock errors and relativistic effects, are corrected in advance.

After correcting the hardware delay for the satellite and linearization, Equation (1) can be written as

$$\begin{aligned} \Delta P\_{r,f}^{s} &= h\_{s}^{r} \delta \mathbf{x}\_{\text{GNSS}}^{\mathbf{r}} + t\_{r, \text{sys}} + a\_{r}^{s} \delta T\_{\text{w}} + \frac{40.3}{f^{2}} \gamma\_{r}^{s} I\_{r}^{s} + b\_{r,f} + \varepsilon\_{p} \\ \Delta \Phi\_{r,f}^{s} &= h\_{s}^{r} \delta \mathbf{x}\_{\text{GNSS}}^{\mathbf{r}} + t\_{r, \text{sys}} + a\_{r}^{s} \delta T\_{\text{w}} - \frac{40.3}{f^{2}} \gamma\_{r}^{s} I\_{r}^{s} + \lambda N\_{r,f}^{s} + \varepsilon\_{\Phi} \end{aligned} \tag{2}$$

where Δ*P<sup>s</sup> <sup>r</sup>*, *<sup>f</sup>* and ΔΦ*<sup>s</sup> <sup>r</sup>*, *<sup>f</sup>* are the OMC (observed-minus-computed) of the pseudo-range and carrier phase, respectively; superscript · *<sup>e</sup>* represents the *e*-frame (earth-centered earthfixed frame); *<sup>δ</sup>xe GNSS* and *<sup>h</sup><sup>s</sup> <sup>r</sup>* are the correction vectors of the receiver position and the corresponding direction cosine vector; δ*Tw* denotes the residual of the zenith tropospheric wet delay. Additionally, the DESIGN (deterministic plus stochastic ionosphere model for GNSS) model is adopted in this study as [39,40]

$$\begin{aligned} I\_r^s &= a\_0 + a\_1 dL + a\_2 dB + a\_3 dL^2 + a\_4 dB^2 + r\_r^s \\ \tilde{I}\_r^s &= a\_0 + a\_1 dL + a\_2 dB + a\_3 dL^2 + a\_4 dB^2 + r\_r^s + \varepsilon\_{\overline{I}\_r^s} \end{aligned} \tag{3}$$

in which *I<sup>s</sup> <sup>r</sup>* is the virtual observation of ionospheric delay and can be obtained from the ionospheric delay prior models of high-precision ionospheric products; *ai* (*i* = 0, 1, 2, 3, 4) describes the spatial distribution of the ionospheric delay; and *dL* and *dB* represent the difference in longitude and latitude between the approximate location of the station and the ionospheric pierce point (IPP), respectively. *r<sup>s</sup> <sup>r</sup>* describes the stochastic behavior of the ionospheric delay in the time domain and *<sup>ε</sup>I<sup>s</sup> <sup>r</sup>* is the corresponding noise of the virtual observation.

Then, the state vector *xPPP*−*RTK* can be written as

$$\mathbf{x}\_{\text{PPP}-\text{RTK}} = \begin{pmatrix} \delta \mathbf{x}\_{\text{GNSS}}^{\mathbf{r}} & t\_r & \delta T\_w & b\_r & \mathbf{N}\_r & a\_r \mathbf{r}\_r \end{pmatrix}^T \tag{4}$$

where *Nr* <sup>=</sup> *Nr*,1 *Nr*,2*<sup>T</sup>* denotes the float ambiguity on frequency *<sup>f</sup>*<sup>1</sup> and *<sup>f</sup>*2; *ar* <sup>=</sup> (*a*<sup>0</sup> *<sup>a</sup>*<sup>1</sup> *a*<sup>2</sup> *a*<sup>3</sup> *a*4) <sup>T</sup> and *rr* <sup>=</sup> - *r*1 *<sup>r</sup>* ...*r j r <sup>T</sup>* means the deterministic and stochastic parameters of the DESIGN, respectively.

The float ambiguity *N<sup>s</sup> <sup>r</sup>*, *<sup>f</sup>* should be further formulated for the PPP ambiguity resolution. It can be expressed as

$$N\_{r,f}^{s} = n - d\_r + d^s \tag{5}$$

in which *n* means the integer ambiguity and *dr* and *d<sup>s</sup>* are the UPD for the receiver and satellite. After the float ambiguity is obtained by Equation (2), the UPD can be removed and then the integer property of the ambiguity can be recovered. Moreover, the LAMBDA (leastsquares ambiguity decorrelation adjustment) method is applied to search for the optimal fixed value of the ambiguity [41]. Finally, the integer ambiguity is used as constraints to obtain the PPP solution with fixed ambiguity.
