*3.3. Experimental Analysis of Helmert Variance Component Estimation*

We ran the OKVIS–Mono, VINS–Mono, PL–VIO, and IPL–VIO systems on the EuRoc MAV datasets to evaluate their accuracy. Table 2 shows the root mean square error (RMSE) of the translation part (m) and rotation part (degrees) of the trajectories estimated by the four systems; the numbers in bold indicate the estimated trajectory that is closest to the groundtruth. The same results are summarized as histograms in Figure 8. As shown in Table 2, in terms of translation, the IPL–VIO system is more accurate than the other systems on MH\_02\_easy, MH\_05\_difficult, V1\_03\_difficult, V2\_01\_easy, and V2\_02\_medium. In terms of rotation, IPL–VIO is more accurate on MH\_02\_easy, MH\_04\_difficult, V1\_03\_difficult, V2\_01\_easy, and V2\_02\_medium.
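Trajectory RMSE of this kind can be computed as in the following sketch. This is a generic illustration, not the evaluation code used in the paper, and it assumes the estimated and groundtruth trajectories are already time-aligned and expressed in the same frame:

```python
import numpy as np

def translation_rmse(est_xyz, gt_xyz):
    """RMSE (m) over per-pose translation errors; poses assumed aligned."""
    err = np.linalg.norm(est_xyz - gt_xyz, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

def rotation_rmse(est_R, gt_R):
    """RMSE (degrees) of per-pose rotation angle errors."""
    angles = []
    for Re, Rg in zip(est_R, gt_R):
        dR = Re.T @ Rg                                   # relative rotation
        cos_a = np.clip((np.trace(dR) - 1.0) / 2.0, -1.0, 1.0)
        angles.append(np.degrees(np.arccos(cos_a)))      # rotation angle of dR
    return float(np.sqrt(np.mean(np.square(angles))))
```

The rotation error per pose is the angle of the relative rotation between estimate and groundtruth, recovered from its trace.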


**Table 2.** The root mean square error (RMSE) results on several EuRoc MAV datasets.


*Remote Sens.* **2020**, *12*, x FOR PEER REVIEW 13 of 19


**Figure 8.** RMSEs for OKVIS–Mono, VINS–Mono without loop closure, PL–VIO, and the proposed IPL–VIO using the EuRoc MAV datasets. (**a**) RMSEs in translation. (**b**) RMSEs in rotation.


However, Table 2 also contains datasets whose accuracy decreases after the Helmert variance component method is used. As shown in Figure 9, the V1\_01\_easy scene contains many weakly textured regions, so the quality of the extracted point features is relatively low; it also contains repetitive textures that make the line features prone to mismatches. The RMSE of the translation part of PL–VIO is 0.07792 m and the RMSE of the rotation part is 5.82240 degrees. After applying the Helmert variance component estimation, the results are susceptible to these errors, and the accuracy decreases: the RMSE of the translation part of IPL–VIO is 0.08778 m and the RMSE of the rotation part is 5.85792 degrees.

**Figure 9.** V1\_01\_easy visual feature extraction: (**a**) line feature extraction, (**b**) point feature extraction.

Another representative dataset is MH\_03\_medium. Compared with VINS–Mono, the accuracy of PL–VIO with added line features decreased. This is because MH\_03\_medium contains mismatched line features, as shown in Figure 10; the line features in the scene are also relatively short and fragmented, which increases the error. However, as can be seen from Table 2, after Helmert variance component estimation the accuracy of the translation part of IPL–VIO improved from 0.26095 to 0.25248 m compared with PL–VIO.
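The reweighting idea behind this improvement can be sketched as follows. This is a minimal, simplified form of Helmert variance component estimation for two observation groups (point and line residuals), assuming diagonal weight matrices and given redundancy numbers `r_point` and `r_line`; the paper's own simplified formula may differ in detail:

```python
import numpy as np

def helmert_reweight(v_point, P_point, r_point, v_line, P_line, r_line):
    """One simplified Helmert iteration for two groups with diagonal weights.

    Each group's unit variance is estimated as v' P v / r; weights are then
    rescaled toward a common reference variance. In practice this is iterated
    inside the sliding-window optimization until the components converge.
    """
    s2_point = (v_point @ (P_point * v_point)) / r_point  # point unit variance
    s2_line = (v_line @ (P_line * v_line)) / r_line       # line unit variance
    s2_0 = (s2_point + s2_line) / 2.0                     # reference variance
    return P_point * (s2_0 / s2_point), P_line * (s2_0 / s2_line)
```

A group whose residuals are large relative to its current weights (e.g. mismatched lines) receives a larger estimated variance and hence a smaller posterior weight.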


**Figure 10.** MH\_03\_medium line feature matching. The line features of the two frames at the previous time (**a**) and the next time (**b**) are matched. Lines of the same color represent corresponding matched lines, and the red boxes on the left and right mark the mismatched line features.

To present a more intuitive result, we drew the trajectory error heat maps of both PL–VIO and IPL–VIO in the same figure for the MH\_05\_difficult and V2\_02\_medium datasets. As shown in Figures 11 and 12, the redder the trajectory, the larger its translation error. It can be seen that, by adjusting the weights of the point and line features, IPL–VIO achieves higher accuracy than PL–VIO.

**Figure 11.** Comparison of trajectory translation errors between IPL–VIO and PL–VIO for the MH\_05\_difficult dataset: (**a**) PL–VIO trajectory error details, overall diagram, and bird's eye view; (**b**) IPL–VIO trajectory error details, overall diagram, and bird's eye view.

**Figure 12.** Comparison of trajectory translation errors between IPL–VIO and PL–VIO for the V2\_02\_medium dataset: (**a**) PL–VIO trajectory error details, overall diagram, and bird's eye view; (**b**) IPL–VIO trajectory error details, overall diagram, and bird's eye view.


When the carrier undergoes significant rotation changes or runs along straight lines, as shown in Figure 11a,b, using the Helmert variance component estimation to weight the points and lines significantly improves the trajectory accuracy. From Figure 12a,b, we can see that under continuous rapid rotation changes, adjusting the weights of the point and line features also effectively improves the accuracy.

The PennCOSYVIO dataset contains various scenes such as obvious changes in lighting, rapid rotation, and repeated texture. For these challenges, the point and line features have different characteristics, so we used this dataset to compare and analyze the accuracy and time consumption of PL–VIO and IPL–VIO.

It can be seen from Figure 13 that the dataset contains a large number of repetitive linear textures, as well as scenes that change between bright and dark illumination, which makes it well suited to verifying the method proposed in this article. Using the Helmert variance component estimation method to weight the two visual features significantly improves the trajectory accuracy. As shown in Table 3, we compared the APE and RPE of the trajectories after running PL–VIO and IPL–VIO. The rotation errors for the APE and RPE are expressed in degrees. The translation errors are expressed along the *x*, *y*, and *z* axes; the APE of the translation part is expressed in meters, while the RPE of the translation part is expressed in percentages. The numbers in bold indicate the estimated trajectory that is closer to the groundtruth. We can see that the trajectory accuracy improves significantly in terms of both APE and RPE.
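Simplified versions of these two metrics can be sketched as follows. This is a generic illustration rather than the exact evaluation used for Table 3; `delta` is a hypothetical frame offset, and the trajectories are assumed to be already aligned:

```python
import numpy as np

def ape_translation(est_xyz, gt_xyz):
    """Absolute pose error in translation: per-axis RMSE (m), poses aligned."""
    return np.sqrt(np.mean((est_xyz - gt_xyz) ** 2, axis=0))  # x, y, z

def rpe_translation_percent(est_xyz, gt_xyz, delta=1):
    """Relative pose error: drift between poses delta apart, as % of distance."""
    d_est = est_xyz[delta:] - est_xyz[:-delta]   # estimated relative motions
    d_gt = gt_xyz[delta:] - gt_xyz[:-delta]      # groundtruth relative motions
    drift = np.linalg.norm(d_est - d_gt, axis=1)
    dist = np.linalg.norm(d_gt, axis=1)
    mask = dist > 1e-9                           # skip stationary segments
    return float(100.0 * np.mean(drift[mask] / dist[mask]))
```

APE reflects global consistency of the whole trajectory, while RPE measures local drift independently of accumulated error, which is why both are reported.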



**Figure 13.** Point and line feature matching in the PennCOSYVIO dataset: (**a**,**b**) show the matching of line features, where lines of the same color are matched line features; (**c**,**d**) show the matching of point features, where points of the same color are tracked by optical flow [12].

**Table 3.** Absolute and relative pose error (APE and RPE) of the trajectory by running PL–VIO and IPL–VIO on the PennCOSYVIO dataset.


Table 4 shows the time consumption of each module in IPL–VIO. For line feature extraction and matching, the original method takes 74 ms per frame on average, while the method proposed in this article takes 60 ms. At the back end, optimization without the Helmert variance component estimation method takes 23 ms, and with it 24 ms. Thus, the added time is negligible.

**Table 4.** The running time of each module of PL–VIO and IPL–VIO.
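Per-module averages of the kind reported in Table 4 can be obtained with a simple wall-clock harness; this is a generic sketch, not the instrumentation actually used in the paper:

```python
import time

def avg_time_ms(fn, *args, repeats=100):
    """Average wall-clock time per call of fn(*args), in milliseconds."""
    t0 = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - t0) * 1000.0 / repeats
```

Averaging over many frames, as done here, smooths out per-frame variation from scheduling and feature-count differences.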



#### **4. Discussion**

In this paper, an improved point–line coupled VIO system (IPL–VIO) was proposed. IPL–VIO has two main improvements. Firstly, geometric information such as the position and angle of the line features, together with the gray information of the pixels around them, was explored; we comprehensively used the geometric information and the correlation coefficient to match the line features. Secondly, the Helmert variance component estimation method was introduced into the sliding-window optimization, which ensures that more reasonable weights are assigned to the point and line features. Compared with point features, line features are high-dimensional visual features that carry structural and geometric information, but matching them is more time consuming; our proposed line feature matching method shortens the matching time without any loss of accuracy. In addition, in the sliding-window optimization, the Helmert variance component estimation method determines more reasonable posterior weights for the point and line features, improving the accuracy of the visual information in the VIO system.
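The front-end matching strategy summarized above can be sketched as follows. Here lines are represented as hypothetical `(angle_deg, midpoint)` pairs with one gray-value patch per line; the gating thresholds and patch layout are illustrative assumptions, not the paper's exact descriptor:

```python
import numpy as np

def ncc(patch_a, patch_b):
    """Normalized correlation coefficient of two gray-value patches."""
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def match_lines(lines_prev, lines_cur, patches_prev, patches_cur,
                max_angle_deg=10.0, max_midpoint_dist=30.0, min_ncc=0.8):
    """Gate candidates by angle and midpoint distance, then pick the best NCC.

    The cheap geometric gates prune most candidates before the (costlier)
    gray-value correlation is evaluated, which is how matching time drops
    relative to descriptor-only approaches.
    """
    matches = []
    for i, (ang_p, mid_p) in enumerate(lines_prev):
        best_j, best_c = -1, min_ncc
        for j, (ang_c, mid_c) in enumerate(lines_cur):
            if abs(ang_p - ang_c) > max_angle_deg:
                continue  # orientation gate
            if np.linalg.norm(np.asarray(mid_p) - np.asarray(mid_c)) > max_midpoint_dist:
                continue  # position gate
            c = ncc(patches_prev[i], patches_cur[j])
            if c > best_c:
                best_j, best_c = j, c
        if best_j >= 0:
            matches.append((i, best_j))
    return matches
```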

In order to verify the effectiveness of the proposed IPL–VIO system, a series of experiments was conducted. The improved line feature matching method was compared with the traditional LBD descriptor matching method on the EuRoc MAV datasets: it achieved the same accuracy as the traditional method but reduced the running time to about a quarter. We then compared IPL–VIO with the current mainstream VIO systems: OKVIS–Mono, VINS–Mono, and PL–VIO. The results on the EuRoc MAV datasets showed that the proposed IPL–VIO system outperformed the other systems on most datasets. There are also datasets with reduced accuracy, such as V1\_01\_easy, whose scenes contain many weak and repetitive textures; since the quality of both the point and line features is poor there, the trajectory accuracy decreased after adjusting the weights. From the error heat maps of the trajectories, it can be seen that IPL–VIO improves the trajectory accuracy both in smooth running and under continuous large-angle rotation. We also compared the proposed IPL–VIO system with the PL–VIO system on the PennCOSYVIO dataset, which contains challenging scenes such as significant lighting changes, large-angle rotation, and repeated textures; IPL–VIO improved the final trajectory accuracy after re-weighting the point and line features with the Helmert variance component estimation method. Furthermore, we measured the speed of each module of IPL–VIO and PL–VIO. The improved line feature matching method reduces the time consumption of the front end, and the Helmert variance component estimation added at the back end is effective while its additional load is almost negligible, which proves the effectiveness of the proposed IPL–VIO system.

The algorithm in this paper was improved on the basis of PL–VIO; therefore, Tables 2–4 present a comprehensive comparison of PL–VIO and IPL–VIO. As shown in Table 2, IPL–VIO had higher accuracy than PL–VIO on most datasets, which shows that the algorithm in this paper performs better across different scenarios. As can be seen from Table 3, the errors of IPL–VIO along the *x*, *y*, and *z* axes were smaller than those of PL–VIO. It can be seen from Table 4 that the method proposed in this paper shortened the matching time of the line features, leaving more time for the operation of other modules.

#### **5. Conclusions**

This paper proposed an improved point–line VIO system, IPL–VIO. The IPL–VIO system has two main improved modules: the front end and the back end. In the front-end module, an improved line feature matching algorithm is proposed, which comprehensively uses the geometric information and the pixel gray information of the line features for matching. In the back-end module, we use the Helmert variance component estimation method to determine the weights of the point and line features. We compared IPL–VIO with OKVIS–Mono [9], VINS–Mono [10], and PL–VIO [24], and verified the effectiveness of the algorithm on the EuRoc MAV [31] and PennCOSYVIO [32] datasets. According to the analysis and results, there are two further conclusions:


We also look forward to future work. At the back end, we use the simplified formula of the Helmert variance component estimation method, which introduces a certain degree of error; in the future, we would like to study how to improve the accuracy of weight determination without increasing the back-end overhead. Moreover, we only use the Helmert variance component estimation method to estimate the weights of the visual features; in the future, we will investigate how to better determine the relative weights of the visual and IMU information.

**Author Contributions:** B.X. and Y.C. conceived and designed the algorithm; B.X. performed the experiments, analyzed the data, and drafted the paper; J.W. contributed analysis tools; Y.C. and S.Z. revised the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Key R&D Program of China (Grant No. 2017YFC0803801) and the National Key R&D Program of China (Grant No. 2016YFB0501803). We owe great appreciation to the anonymous reviewers for their critical, helpful, and constructive comments and suggestions.

**Conflicts of Interest:** The authors declare no conflict of interest.
