#### **4. Evaluation Results**

As analyzed in the previous section, the measurement-update phase, which involves operations on large matrices, is the bottleneck for real-time applications. However, the measurement-update phase of an information-based filter is parallelizable and can be implemented on a parallel processor. We tested the proposed MC*H*∞IF-VIO system on a laptop with a GeForce 940M GPU (NVIDIA, Santa Clara, CA, USA) and an i7-5600U CPU (Intel, Santa Clara, CA, USA). We implemented our method in CUDA (Version 8.0, NVIDIA, Santa Clara, CA, USA, 2016); the code is provided in the Supplementary Materials. The time-update phase was executed on the CPU and the measurement-update phase on the GPU. We used the Technical University of Munich (TUM) RGBD dataset [37] for evaluation. The TUM RGBD dataset contains multiple video sequences collected with an RGBD camera under different environmental conditions. The RGB and depth images are recorded at 30 Hz with a resolution of 640 × 480. Since the TUM dataset does not contain IMU measurements, we used the first and second derivatives of the ground-truth trajectory, with random noise added, as simulated IMU measurements.
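
To illustrate why this phase maps naturally onto the GPU, the sketch below accumulates per-pixel information contributions in parallel with one CUDA thread per measurement. It is a minimal illustration rather than the released supplementary code; the state dimension `STATE_DIM`, the memory layout, and the scalar per-pixel noise are assumptions made for this example only.

```cuda
// Minimal CUDA sketch of the parallel measurement update of an information
// filter: each pixel intensity measurement adds an independent rank-1 term to
// the information matrix Y and information vector y, so the sum can be formed
// in parallel on the GPU. This is NOT the released supplementary code; the
// state dimension, memory layout, and scalar per-pixel noise are assumptions.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

#define STATE_DIM 15   // assumed error-state dimension (pose, velocity, biases)

// d_H   : M x STATE_DIM stacked Jacobian rows (row-major), one row per pixel
// d_res : M intensity residuals
// d_rInv: M inverse measurement variances
// d_Y   : STATE_DIM x STATE_DIM information matrix (already holds the prior)
// d_y   : STATE_DIM information vector (already holds the prior)
__global__ void accumulateInformation(const float* d_H, const float* d_res,
                                      const float* d_rInv, int M,
                                      float* d_Y, float* d_y)
{
    int m = blockIdx.x * blockDim.x + threadIdx.x;
    if (m >= M) return;

    const float* h = d_H + m * STATE_DIM;   // Jacobian row of pixel m
    float w = d_rInv[m];                    // scalar R^{-1} for this pixel

    // Y += h^T * w * h  and  y += h^T * w * res_m, one rank-1 term per pixel.
    for (int i = 0; i < STATE_DIM; ++i) {
        atomicAdd(&d_y[i], h[i] * w * d_res[m]);
        for (int j = 0; j < STATE_DIM; ++j)
            atomicAdd(&d_Y[i * STATE_DIM + j], h[i] * w * h[j]);
    }
}

int main()
{
    const int M = 640 * 480;   // one measurement per pixel, as an example
    std::vector<float> H(M * STATE_DIM, 0.01f), res(M, 0.5f), rInv(M, 1.0f);

    float *d_H, *d_res, *d_rInv, *d_Y, *d_y;
    cudaMalloc(&d_H, H.size() * sizeof(float));
    cudaMalloc(&d_res, M * sizeof(float));
    cudaMalloc(&d_rInv, M * sizeof(float));
    cudaMalloc(&d_Y, STATE_DIM * STATE_DIM * sizeof(float));
    cudaMalloc(&d_y, STATE_DIM * sizeof(float));

    cudaMemcpy(d_H, H.data(), H.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_res, res.data(), M * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_rInv, rInv.data(), M * sizeof(float), cudaMemcpyHostToDevice);
    // In the filter, the prior information from the CPU time update would be
    // copied into d_Y / d_y here; zeros are used for this standalone example.
    cudaMemset(d_Y, 0, STATE_DIM * STATE_DIM * sizeof(float));
    cudaMemset(d_y, 0, STATE_DIM * sizeof(float));

    int threads = 256;
    accumulateInformation<<<(M + threads - 1) / threads, threads>>>(
        d_H, d_res, d_rInv, M, d_Y, d_y);
    cudaDeviceSynchronize();

    float Y00;
    cudaMemcpy(&Y00, d_Y, sizeof(float), cudaMemcpyDeviceToHost);
    printf("Y[0][0] = %f\n", Y00);

    cudaFree(d_H); cudaFree(d_res); cudaFree(d_rInv);
    cudaFree(d_Y); cudaFree(d_y);
    return 0;
}
```

Because every term in the sum is independent, the accumulation scales with the number of threads rather than the number of pixels, which is what makes the dense intensity-based measurement update feasible in real time on a modest GPU.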

We compared the proposed MC*H*∞IF-VIO method with RGBDSLAM v2.0 [38] and the lightweight visual tracking (LVT) method [12]. The results of RGBDSLAM were computed with the corresponding ROS package, and the results of LVT were taken directly from the original paper. We evaluated MC*H*∞IF-VIO with five different patch sizes: 2, 4, 8, 16, and 32; powers of two were chosen for convenience of the CUDA implementation. The absolute trajectory error (ATE) and relative pose error (RPE) defined in [37] were used as error metrics. The ATE measures the global consistency of the estimated trajectory and is suitable for visual SLAM methods, whereas the RPE measures the local accuracy of the trajectory over a fixed time interval and is suited to visual odometry methods. We used five sequences from the TUM dataset to test the three methods. The basic parameters of the five sequences are listed in Table 3, and the test results are shown in Figures 3–7. In these figures, MC*H*∞IF-VIO(*n*) denotes MC*H*∞IF-VIO with a patch size of *n*.
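
For reference, the two metrics from [37] can be restated compactly as follows, where *P*₁:ₙ is the estimated trajectory, *Q*₁:ₙ the ground truth, Δ the fixed interval used for the RPE (with *m* = *n* − Δ), trans(·) the translational part of a rigid-body transform, and *S* the least-squares rigid-body alignment between the two trajectories:

```latex
E_i = \left(Q_i^{-1} Q_{i+\Delta}\right)^{-1}\left(P_i^{-1} P_{i+\Delta}\right),
\qquad
\mathrm{RPE} = \left(\frac{1}{m}\sum_{i=1}^{m}\left\|\operatorname{trans}(E_i)\right\|^{2}\right)^{1/2}

F_i = Q_i^{-1}\, S\, P_i,
\qquad
\mathrm{ATE} = \left(\frac{1}{n}\sum_{i=1}^{n}\left\|\operatorname{trans}(F_i)\right\|^{2}\right)^{1/2}
```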


**Table 3.** Parameters of the sequences used.

The RPE of MC*H*∞IF-VIO increased with the patch size in all of the tests, and the ATE of MC*H*∞IF-VIO followed a similar trend with a few exceptions. This is in line with the common observation that more measurements lead to better accuracy in visual navigation. The MC*H*∞IF-VIO presented in this paper and LVT are dead-reckoning algorithms, while RGBDSLAM is a complete SLAM algorithm with joint optimization and loop closure. RGBDSLAM is therefore more globally consistent than MC*H*∞IF-VIO and LVT according to the ATE results. In our tests, MC*H*∞IF-VIO achieved a smaller ATE than LVT with only one exception: the ATE of MC*H*∞IF-VIO(32) was larger than that of LVT on *fr1\_room*. MC*H*∞IF-VIO(2) had the smallest RPE in all of the tests. The estimation errors caused by the IMU measurement noise accumulated over time. Moreover, the simple frame-to-frame alignment used in the measurement-update phase was less robust than the optimization with global and local maps used in RGBDSLAM and LVT. Therefore, the advantage of MC*H*∞IF-VIO was less pronounced in long-duration tests such as *fr3\_office*.

**Figure 3.** Test results on the sequence *fr1\_desk*: (**a**) ATE; (**b**) RPE.

**Figure 4.** Test results on the sequence *fr1\_desk2*: (**a**) ATE; (**b**) RPE.

**Figure 5.** Test results on the sequence *fr1\_room*: (**a**) ATE; (**b**) RPE.

**Figure 7.** Test results on the sequence *fr3\_office*: (**a**) ATE; (**b**) RPE.

#### **5. Conclusions**

In this paper, we presented our visual–inertial navigation system, MC*H*∞IF-VIO, which uses a raw intensity-based measurement model. By analyzing the integrated navigation system, we constructed a numerically stable mixed-degree cubature information filter scheme for the state-estimation problem and combined it with an *H*∞ filter to handle the non-Gaussian noise in the intensity measurements. The proposed VIO system is suited to RGBD camera–IMU systems. The system was evaluated on the TUM RGBD dataset and compared with the RGBDSLAM and LVT systems.

Given the raw visual measurement model, with intensities used directly as measurements in frame-to-frame alignment, the proposed MC*H*∞IF-VIO system, which accounts for the system nonlinearity and non-Gaussian measurement noise, achieved good performance in our tests. Although the method is better suited to short durations, the implementation with a patch size of two outperformed RGBDSLAM and LVT even in the long-duration test. By adjusting the patch size, the proposed method can be tuned from an accurate, dense algorithm to a fast, sparse one.

This paper aimed to build a robust state-estimation framework based on the fundamental characteristics of the visual–inertial navigation system. The camera measurements were used in a raw, direct way, without heuristics for selecting feature descriptors or maintaining a sliding window. Other well-designed intensity-based and feature-based measurement models can also be applied within this hybrid filter-based framework.

**Supplementary Materials:** The CUDA code used in the evaluations is available online at https://1drv.ms/f/s!ApzqIuEnpaPHiDMOyneS8UV2KO6N.

**Author Contributions:** Investigation, C.S. and X.W.; Methodology, C.S.; Project Administration, X.W.; Resources, N.C.; Software, C.S.; Supervision, N.C.; Writing – Original Draft, C.S.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.
