Article

Study on the Influence of Image Noise on Monocular Feature-Based Visual SLAM Based on FFDNet

Hubei Key Laboratory of Waterjet Theory and New Technology, Wuhan University, Wuhan 430072, China
* Author to whom correspondence should be addressed.
Sensors 2020, 20(17), 4922; https://doi.org/10.3390/s20174922
Submission received: 17 July 2020 / Revised: 28 August 2020 / Accepted: 28 August 2020 / Published: 31 August 2020

Abstract

Noise appears in images captured by real cameras. This paper studies the influence of noise on monocular feature-based visual Simultaneous Localization and Mapping (SLAM). First, an open-source synthetic dataset with different noise levels is introduced. The images in the dataset are then denoised using the Fast and Flexible Denoising convolutional neural Network (FFDNet), and the matching performance of Scale Invariant Feature Transform (SIFT), Speeded Up Robust Features (SURF), and Oriented FAST and Rotated BRIEF (ORB), the features most commonly used in feature-based SLAM, is compared. The results show that ORB has a higher correct matching rate than SIFT and SURF, and that the denoised images have a higher correct matching rate than the noisy images. Next, the Absolute Trajectory Error (ATE) of the noisy and denoised sequences is evaluated on ORB-SLAM2, and the results show that the denoised sequences perform better than the noisy sequences at every noise level. Finally, the completely clean sequence in the dataset and the sequences of the KITTI dataset are denoised and compared with the original sequences through comprehensive experiments. For the clean sequence, the Root-Mean-Square Error (RMSE) of the ATE decreases by 16.75% after denoising; for the KITTI sequences, 7 out of 10 denoised sequences have a lower RMSE than the original sequences. The results show that denoised images can achieve higher accuracy in monocular feature-based visual SLAM under certain conditions.

1. Introduction

Simultaneous Localization and Mapping (SLAM) has been an important research direction in the fields of computer vision and robotics. It is the basic module for many localization applications, such as mobile robots, micro aerial vehicles, autonomous driving, virtual reality, augmented reality, and so forth. SLAM can be implemented with various sensors, such as GPS, LiDAR, IMU, and cameras. Cameras based on image sensors have the advantages of small size, low weight, low power, and low cost; in addition, they provide rich information and are easy to use. Visual SLAM methods using image sensors have seen tremendous improvements in accuracy, robustness, and efficiency, and have gained increasing popularity over recent years [1]. They can be classified into monocular, binocular, and RGB-D visual SLAM according to the camera used. A monocular visual SLAM system uses only a single camera, which makes it a pure vision problem. In low-cost monocular visual SLAM, image noise, as well as techniques for removing it, is an important issue that needs more investigation.
In visual SLAM, there are three popular formulations of visual odometry, namely direct, semi-direct, and feature-based methods. Among the representative works, DSO [2,3] and LSD-SLAM [4,5] are direct methods, SVO [6,7] is a semi-direct method, and ORB-SLAM [8,9] is a feature-based method. Many factors have a strong influence on the performance of SLAM, such as photometric calibration, motion bias, the rolling shutter effect [1], and camera noise. To better understand the influence of noise on the SLAM system, the other conditions are fixed as control variables in this paper. Existing methods deal with image noise differently. In LSD-SLAM [4,5], the image noise is assumed to be Gaussian, but there is no quantitative analysis of how the noise intensity affects the system. In DSO [2], geometric and photometric noise is studied, and the results show that DSO, which models the noise, is more robust to photometric noise than ORB-SLAM, because feature matching easily fails when the photometric noise is high. In SVO (a semi-direct method) and ORB-SLAM [8,9] (a feature-based method), the feature points are assumed to be robust to noise, so image noise has not been discussed separately. Image denoising is an effective tool in visual SLAM: Cho et al. denoised the image sequences first and improved the visual SLAM performance in a turbid environment [10]. Chen et al. used DCNNs for semantic segmentation and restricted feature matching to the pixel area of the same object in different frames, thereby reducing the influence of noise on feature matching [11]. Liang proposed a precise Iterative Closest Point (ICP) algorithm to overcome noise and outliers and achieve precise point cloud registration [12]. Zhang et al. performed feature matching with two past frames to reduce the sensitivity of the features to noise [13].
Generally, visual SLAM methods are based on the assumption that the images have a low noise level. Under normal conditions, noise is difficult to notice, even with standard cameras. However, it rises quickly when the exposure time has to be long, especially in dark scenes. This is a very challenging situation for existing monocular feature-based SLAM. Existing visual SLAM methods all suffer from accuracy degradation and even failure when the image noise reaches a certain intensity. How, then, does image noise affect monocular feature-based visual SLAM, and what happens to the accuracy and robustness after the images are denoised?
To answer these questions, a dataset containing images with different noise levels is needed. There are many datasets for evaluating visual SLAM algorithms at present, such as the TUM dataset [14,15,16], the EuRoC dataset [17], the KITTI dataset [18,19], and the ICL-NUIM RGB-D dataset [20]. However, the images in these datasets all have low noise levels and do not provide different levels of noise. Therefore, a dataset for evaluating the effect of noise on visual SLAM methods is extended from the previous publication WHU-RSVI [21]. WHU-RSVI only provides a typical noise level; to study the influence of noise on monocular visual SLAM, different noise levels and types are needed. According to the noise model and the clean images introduced in [21], 33 sequences with different noise levels and types are generated, and the method to add image noise with customized noise types and levels is open-sourced. To reduce the impact of noise on the visual SLAM system, this paper uses the Fast and Flexible Denoising convolutional neural Network (FFDNet) [22] to denoise the images, and then the performance of the original sequences and the denoised sequences is evaluated.
Scale Invariant Feature Transform (SIFT) [23,24], Speeded Up Robust Features (SURF) [25], and Oriented FAST and Rotated BRIEF (ORB) [26] are the three methods most commonly used for feature extraction and matching in visual SLAM. The SIFT descriptor is scale invariant and can select the correct match for a key point from a large database of other key points, but its computational cost is relatively high. SURF is an accelerated robust feature method that uses integral images for image convolution; it is also a scale- and rotation-invariant feature description method. ORB addresses the lack of orientation in the FAST [27,28] detector and uses BRIEF [29] to describe the features. Compared with SIFT and SURF, ORB is rotation invariant and resistant to noise, and it greatly reduces the time required for extracting features.
To evaluate the impact of different levels of noise on matching performance, the three commonly used features were evaluated on the proposed dataset. Then, according to the evaluation, a method using FFDNet to improve ORB-SLAM2 is proposed.
The contributions of this paper are as follows:
(1) An open-source dataset extended from WHU-RSVI [21] for evaluating the influence of noise on monocular visual SLAM systems or for denoising tasks.
(2) An open-source software tool for adding image noise with customized noise types and levels.
(3) A quantitative evaluation of the influence of different levels of noise on the feature matching of SIFT, SURF, and ORB.
(4) An accuracy improvement of the ORB-SLAM2 system through image denoising.
All data, documents, programs, and scripts can be used under the Creative Commons Attribution-ShareAlike 3.0 Unported License at http://aric.whu.edu.cn/whu-noise-dataset.html.

2. Methods

2.1. Noise Dataset

To evaluate the influence of noise on visual SLAM methods, a dataset with different levels of image noise is generated. The previous work of this dataset is WHU-RSVI [21], in which the open-source ray-tracing software POV-Ray is used to generate the images.
The scene model is the “office” from “The Persistence of Ignorance” (http://www.ignorancia.org/), and the size of the model is 800 × 500 × 250 cm³. The resolution of the images in the dataset is 640 × 480, the camera's field of view is 90 degrees, and the intrinsic matrix in right-handed coordinates can be expressed as
K = \begin{bmatrix} 320.0 & 0.0 & 319.5 \\ 0.0 & 320.0 & 239.5 \\ 0.0 & 0.0 & 1.0 \end{bmatrix}.
The function that maps the irradiance to the brightness of the image pixel values is called the Camera Response Function (CRF); it captures the various linear and non-linear relationships that cameras exhibit during imaging. Real CRFs were analyzed in detail and a diverse database of real-world camera response functions was collected by Grossberg and Nayar [30]. A linear CRF is a common response model [31], and the CRF is set to linear in this paper.
Unlike images captured by real cameras, the images generated by POV-Ray are noise-free. There are many types of noise in an image, such as photon shot noise, dark current shot noise, offset fixed pattern noise, dark current fixed pattern noise, source follower noise, quantization noise, and sense node reset noise [32]. The noise model is simplified by Liu et al. [33], and it can be expressed as
I = f(L + n_s + n_c) + n_q,
where I is the pixel brightness from 0 to 255, f(·) represents the CRF, n_s is the noise component that depends on irradiance, n_c denotes the irradiance-independent noise, and n_q is the additional quantization noise. Since most cameras can achieve very low quantization noise, n_q is ignored in the model of this paper. The expectation of the irradiance-dependent noise is E(n_s) = 0 and its variance is Var(n_s) = L σ_s²; the expectation of the irradiance-independent noise is E(n_c) = 0 and its variance is Var(n_c) = σ_c², where L represents the irradiance. Grossberg and Nayar [30] normalized the irradiance L and the RGB values to the range from 0 to 1; in this paper, the irradiance L and the RGB values are discrete from 0 to 255, as mentioned in [34], and σ_s and σ_c are dimensionless values.
By taking different values of σ_s and σ_c, images with different noise levels can be obtained according to Equation (2). It is assumed that the noise levels are the same on the R, G, and B channels, and the image noise is divided into three types: irradiance-independent, irradiance-dependent, and a mixture of the two. Irradiance-independent noise is the so-called Gaussian noise; noisy images of different levels are obtained by adjusting σ_c. In this paper, σ_c ranges from 0.005 to 0.10; except for the value σ_c = 0.005, σ_c is incremented with a step size of 0.01. The irradiance-dependent noise is obtained by adjusting σ_s, whose value ranges from 0.01 to 0.20; except for the value σ_s = 0.01, the step size is 0.02. For the noise mixing the irradiance-independent and irradiance-dependent components, the value of σ_s is twice the value of σ_c, and the steps are the same as when each is varied separately. Images at different noise levels in the final sequences are shown in Figure 1.
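The following sketch illustrates how noise of a chosen type and level can be synthesized according to Equation (2); it is a minimal example rather than the released generation tool, and the normalization of the image to [0, 1] before adding noise is an assumption made here so that σ_s and σ_c stay dimensionless.

```python
import numpy as np

def add_noise(img_u8, sigma_s=0.0, sigma_c=0.0, rng=None):
    """Synthesize image noise following Equation (2) with a linear CRF.

    img_u8  : clean 8-bit image (H x W or H x W x 3).
    sigma_s : irradiance-dependent level, Var(n_s) = L * sigma_s**2.
    sigma_c : irradiance-independent (Gaussian) level, Var(n_c) = sigma_c**2.
    """
    rng = rng or np.random.default_rng()
    L = img_u8.astype(np.float64) / 255.0                  # normalized irradiance (assumption)
    n_s = rng.standard_normal(L.shape) * np.sqrt(L) * sigma_s
    n_c = rng.standard_normal(L.shape) * sigma_c
    noisy = np.clip(L + n_s + n_c, 0.0, 1.0)               # quantization noise n_q is neglected
    return np.round(noisy * 255.0).astype(np.uint8)

# Example: the medium mixed noise level of Figure 1.
# noisy = add_noise(clean, sigma_s=0.06, sigma_c=0.03)
```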

By modeling the trajectories of the sequence through B-spline curves, the ground truth of the trajectories of each sequence can be obtained. The format of the ground truth in the dataset is compatible with the TUM dataset. All sequences with different noise levels in the dataset have the same ground truth and the statistical information of the sequence is shown in Table 1.
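For reference, the TUM trajectory format stores one pose per line as “timestamp tx ty tz qx qy qz qw” (translation in meters, orientation as a unit quaternion). A minimal writer for ground truth in this format might look as follows; the function name and the pose container are illustrative only.

```python
def write_tum_trajectory(path, poses):
    """Write poses in the TUM format: 'timestamp tx ty tz qx qy qz qw' per line.

    poses: iterable of (timestamp, (tx, ty, tz), (qx, qy, qz, qw)) tuples.
    """
    with open(path, "w") as f:
        for t, (tx, ty, tz), (qx, qy, qz, qw) in poses:
            f.write(f"{t:.6f} {tx:.6f} {ty:.6f} {tz:.6f} "
                    f"{qx:.6f} {qy:.6f} {qz:.6f} {qw:.6f}\n")
```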

2.2. Image Denoising

Image denoising is an important research direction in low-level vision. Noise inevitably occurs during the imaging process, which will seriously affect the quality of the image. Therefore, image denoising is an essential step in many image processing and computer vision tasks.
FFDNet [22,35] is used to denoise the images in this paper, with the model trained by Zhang et al. [22] in the open-source machine learning framework PyTorch (https://pytorch.org/). FFDNet is a recently developed image denoising method based on a convolutional neural network, and it was proposed on the basis of DnCNN (denoising convolutional neural network) [36]. Compared with the existing denoising methods DnCNN [36] and BM3D [37], FFDNet has a faster execution time and a smaller memory footprint, and it can use a single network model to deal with various noise levels. It has a low computational cost and can handle spatially variant noise. Its structure is shown in Figure 2.
As shown in Figure 2, the input tensor is fed into a series of 3 × 3 convolutional layers, each of which combines up to three operations: convolution (Conv), the rectified linear unit (ReLU), and Batch Normalization (BN) [38]. Specifically, the first layer uses Conv + ReLU, the middle layers use Conv + BN + ReLU, and the last layer uses Conv only. Zero padding is used in each convolution to keep the size of the feature maps unchanged. After the CNN, an upscaling operation reverses the downsampling performed at the input stage and produces a denoised image with the same size as the original input image.
FFDNet takes a tunable noise level map as input, which makes the denoising model flexible with respect to the noise level. To improve the efficiency of the denoiser, a reversible downsampling operator is introduced to reshape the input image of size W × H × C into four downsampled sub-images of size W/2 × H/2 × 4C, where W and H are the width and height of the input image and C is the number of channels (C = 1 for grayscale images and C = 3 for color images). The downsampling significantly improves the training speed without reducing the modeling ability. Considering the balance between complexity and performance, the number of convolution layers is empirically set to 15 for grayscale images and 12 for color images, and the number of feature-map channels is set to 64 for grayscale images and 96 for color images. Besides, unlike DnCNN, denoising the downsampled sub-images effectively enlarges the receptive field and thus allows a suitable network depth. After the downsampling operation, the sub-images are fed into the CNN together with the tunable noise level map, so that images with different noise levels can be processed. Examples of denoised images are shown in (m)–(p) of Figure 1.
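A minimal PyTorch sketch of this input pipeline and layer pattern is given below. It follows the description above (reversible 2× downsampling into four sub-images plus a uniform noise level map, Conv/BN/ReLU layers), but the exact tensor layout and layer counts of the released FFDNet model may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def ffdnet_input(img, sigma):
    """Build the FFDNet input: four downsampled sub-images plus a noise level map.

    img   : tensor of shape (N, C, H, W), H and W even, values in [0, 1].
    sigma : scalar noise level on the same scale as the pixel values.
    Returns a tensor of shape (N, 4C + 1, H/2, W/2).
    """
    sub = F.pixel_unshuffle(img, 2)                       # reversible 2x downsampling
    n, _, h, w = sub.shape
    noise_map = torch.full((n, 1, h, w), float(sigma), device=img.device)
    return torch.cat([sub, noise_map], dim=1)

def make_cnn(in_ch, mid_ch=96, n_layers=12, out_ch=12):
    """Conv+ReLU, (n_layers-2) x Conv+BN+ReLU, Conv -- the layer pattern described above."""
    layers = [nn.Conv2d(in_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True)]
    for _ in range(n_layers - 2):
        layers += [nn.Conv2d(mid_ch, mid_ch, 3, padding=1),
                   nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True)]
    layers.append(nn.Conv2d(mid_ch, out_ch, 3, padding=1))
    return nn.Sequential(*layers)

# For a color image: cnn = make_cnn(in_ch=3 * 4 + 1, mid_ch=96, n_layers=12, out_ch=3 * 4)
# denoised = F.pixel_shuffle(cnn(ffdnet_input(img, sigma)), 2)   # upscaling back to H x W
```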
Zhang et al. [22] evaluated the running time of FFDNet (The evaluation was performed in Matlab (R2015b) environment on a computer with a six-core Intel(R) Core(TM) i7-5820K CPU @ 3.3 GHz, 32 GB of RAM and an Nvidia Titan X Pascal GPU). The authors find that for a gray image with a size of 512 × 512, the denoising time is 0.012 s (at 83.3 Hz). The image size of our dataset is 640 × 480, which is close to 512 × 512 in terms of data size. It indicates that when the computer has a high-performance GPU, FFDNet can meet the real-time requirements of visual SLAM.
The average Peak Signal to Noise Ratio (PSNR) values of the noisy sequences and the denoised sequences are shown in Table 2.
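For reference, the PSNR values in Table 2 follow the standard definition for 8-bit images, which can be computed as below (this is the textbook formula, not code from the paper).

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak Signal to Noise Ratio in dB between two 8-bit images of the same size."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```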

2.3. Feature Matching

The extraction and matching algorithms provided in OpenCV (https://opencv.org/) are used to compare the matching performance of the three features on the noise dataset. SIFT, SURF, and ORB are used to extract nearly 500 features from every image. Both SIFT and ORB allow the number of extracted features to be specified, whereas SURF cannot be set to extract a fixed number of features; the number is determined by the Hessian threshold, and the same Hessian threshold results in different feature numbers for different images. The Hessian threshold is therefore adjusted for each image so that 500 features are extracted in this paper.
Brute-force matching is performed after feature extraction: the distances between the descriptors of the features in the two images are measured and sorted, and the closest descriptor is taken as the matched pair. The distance between descriptors expresses the similarity between two features. For SIFT and SURF, the distance is generally the Euclidean distance; for ORB, the Hamming distance is often used because ORB uses a binary descriptor. The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols differ.
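A minimal OpenCV sketch of this extraction-and-matching step is shown below. Note that SIFT and SURF live in different OpenCV modules depending on the version (SURF requires the contrib build), and the Hessian threshold shown is only a starting value that is tuned per image in the paper.

```python
import cv2

def match_features(img1, img2, detector="ORB", n_features=500):
    """Extract ~n_features keypoints per image and brute-force match their descriptors."""
    if detector == "ORB":
        det, norm = cv2.ORB_create(nfeatures=n_features), cv2.NORM_HAMMING   # binary descriptor
    elif detector == "SIFT":
        det, norm = cv2.SIFT_create(nfeatures=n_features), cv2.NORM_L2
    else:  # SURF: the feature count is controlled indirectly through the Hessian threshold
        det, norm = cv2.xfeatures2d.SURF_create(hessianThreshold=400), cv2.NORM_L2
    kp1, des1 = det.detectAndCompute(img1, None)
    kp2, des2 = det.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(norm)                      # nearest-descriptor (brute-force) matching
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    return kp1, kp2, matches
```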
The ground truth is used to verify the quality of feature matching. Depth of the images is obtained by MegaPOV (http://megapov.inetart.net/), and the projection model of the pinhole camera can be expressed as
d_1 \begin{bmatrix} u_1 \\ v_1 \\ 1 \end{bmatrix} = K (R_1 P + t_1)
= \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} R_1^{00} x + R_1^{01} y + R_1^{02} z + t_1^0 \\ R_1^{10} x + R_1^{11} y + R_1^{12} z + t_1^1 \\ R_1^{20} x + R_1^{21} y + R_1^{22} z + t_1^2 \end{bmatrix}
= \begin{bmatrix} (f_x R_1^{00} + c_x R_1^{20}) x + (f_x R_1^{01} + c_x R_1^{21}) y + (f_x R_1^{02} + c_x R_1^{22}) z + f_x t_1^0 + c_x t_1^2 \\ (f_y R_1^{10} + c_y R_1^{20}) x + (f_y R_1^{11} + c_y R_1^{21}) y + (f_y R_1^{12} + c_y R_1^{22}) z + f_y t_1^1 + c_y t_1^2 \\ R_1^{20} x + R_1^{21} y + R_1^{22} z + t_1^2 \end{bmatrix},
where K is the camera's intrinsic matrix, P = (x, y, z)^T is the point in space, R_1 is the rotation matrix of the first image, t_1 is the translation vector, and d_1 is the depth of the pixel. According to Cramer's rule, the linear equations can be solved; we define
D = \begin{vmatrix} f_x R_1^{00} + c_x R_1^{20} & f_x R_1^{01} + c_x R_1^{21} & f_x R_1^{02} + c_x R_1^{22} \\ f_y R_1^{10} + c_y R_1^{20} & f_y R_1^{11} + c_y R_1^{21} & f_y R_1^{12} + c_y R_1^{22} \\ R_1^{20} & R_1^{21} & R_1^{22} \end{vmatrix},
D_x = \begin{vmatrix} u_1 d_1 - f_x t_1^0 - c_x t_1^2 & f_x R_1^{01} + c_x R_1^{21} & f_x R_1^{02} + c_x R_1^{22} \\ v_1 d_1 - f_y t_1^1 - c_y t_1^2 & f_y R_1^{11} + c_y R_1^{21} & f_y R_1^{12} + c_y R_1^{22} \\ d_1 - t_1^2 & R_1^{21} & R_1^{22} \end{vmatrix},
D_y = \begin{vmatrix} f_x R_1^{00} + c_x R_1^{20} & u_1 d_1 - f_x t_1^0 - c_x t_1^2 & f_x R_1^{02} + c_x R_1^{22} \\ f_y R_1^{10} + c_y R_1^{20} & v_1 d_1 - f_y t_1^1 - c_y t_1^2 & f_y R_1^{12} + c_y R_1^{22} \\ R_1^{20} & d_1 - t_1^2 & R_1^{22} \end{vmatrix},
D_z = \begin{vmatrix} f_x R_1^{00} + c_x R_1^{20} & f_x R_1^{01} + c_x R_1^{21} & u_1 d_1 - f_x t_1^0 - c_x t_1^2 \\ f_y R_1^{10} + c_y R_1^{20} & f_y R_1^{11} + c_y R_1^{21} & v_1 d_1 - f_y t_1^1 - c_y t_1^2 \\ R_1^{20} & R_1^{21} & d_1 - t_1^2 \end{vmatrix},
where D is a nonzero determinant; then x, y, and z have a unique solution given by

x = \frac{D_x}{D}, \quad y = \frac{D_y}{D}, \quad z = \frac{D_z}{D}.
According to x, y, z and the known R_2 and t_2, the pixel of the point P in the second image can be calculated by

d_2 \begin{bmatrix} u_2 \\ v_2 \\ 1 \end{bmatrix} = K \left( R_2 \begin{bmatrix} x \\ y \\ z \end{bmatrix} + t_2 \right).
The error between the matched points can then be expressed as

e_{match} = \sqrt{(u_2 - u_1)^2 + (v_2 - v_1)^2}.
It is assumed that the match is correct when the error is within 4 pixels. Correctly matched points are counted as inliers; otherwise, they are counted as outliers.
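The verification described in Equations (3)–(10) can be sketched in a few lines; the generic linear solver used here is numerically equivalent to the Cramer's-rule solution above, and all variable names are illustrative.

```python
import numpy as np

def reprojection_error(u1, v1, d1, K, R1, t1, R2, t2, u2, v2):
    """Ground-truth check of a match: back-project from image 1, reproject into image 2.

    Solves K (R1 P + t1) = d1 * (u1, v1, 1)^T for the 3D point P, projects P into the
    second image, and returns e_match, the pixel distance to the matched point (u2, v2).
    """
    A = K @ R1                                        # coefficient matrix of the linear system
    b = d1 * np.array([u1, v1, 1.0]) - K @ t1         # right-hand side
    P = np.linalg.solve(A, b)                         # unique solution when det(A) != 0
    p2 = K @ (R2 @ P + t2)
    u2_proj, v2_proj = p2[0] / p2[2], p2[1] / p2[2]   # divide by the projected depth d2
    return np.hypot(u2_proj - u2, v2_proj - v2)

# A match is counted as an inlier when reprojection_error(...) <= 4 pixels.
```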

3. Results and Discussions

The experimental platform of this paper is an Intel Core i7-7700HQ CPU (4 cores @ 2.80 GHz) with 16 GB of memory and an NVIDIA GeForce GTX 1050Ti GPU. The mean execution time of FFDNet on the proposed dataset is 0.153 s per frame, and the tracking time of ORB-SLAM is 0.025 s per frame.

3.1. Results of Feature Matching

Multiple pairs of images from different scenes in the dataset are selected for image matching, and the statistical analysis on irradiance-independent noise, irradiance-dependent noise, and mixed noise of the two is performed.
The results of irradiance-independent Gaussian noise are shown in Figure 4. It can be seen that the matching rate of ORB is the highest of the three, while SURF is second and SIFT is the lowest.
The results for irradiance-dependent noise and the mixture of the two are shown in Figure 5 and Figure 6. As with Gaussian noise, ORB also achieves the highest matching rate under irradiance-dependent and mixed noise. Unlike Gaussian noise, however, the matching rate of the image pairs decreases more significantly as the intensity of the irradiance-dependent and mixed noise increases. For the denoised images, the matching rate is higher than that of the original images for ORB, SURF, and SIFT alike.
The results of feature matching performed on the original images and the denoised images are summarized in Table 3. They show that a higher correct matching rate can be achieved on the denoised images.

3.2. Results of Trajectories

ORB-SLAM2 [9] is used to evaluate the dataset sequences. In total, 1000 features are extracted from each image, the scale factor of the image pyramid is set to 1.2, and the number of pyramid levels is set to 8. The image pyramid downsamples the image at successive levels to obtain images with different resolutions.
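For illustration, an image pyramid with these settings can be built as in the simplified sketch below; this is not ORB-SLAM2's internal implementation.

```python
import cv2

def build_pyramid(img, scale_factor=1.2, n_levels=8):
    """Downsample the image by scale_factor per level, as in the ORB extractor settings."""
    pyramid = [img]
    for level in range(1, n_levels):
        s = 1.0 / (scale_factor ** level)
        pyramid.append(cv2.resize(img, None, fx=s, fy=s, interpolation=cv2.INTER_LINEAR))
    return pyramid
```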
The three types of image sequences: irradiance-independent noise, irradiance-dependent noise, and mixed noise are evaluated and compared with the sequence after image denoising.

3.2.1. Noised Sequences

For the noisy sequences in the dataset, each sequence is evaluated 10 times, and the mean values after excluding outliers are used to represent the evaluation results. The irradiance-independent noisy sequences can be tracked successfully when σ_c is below 0.05, while tracking failure occurs when σ_c is greater than 0.05. As the noise intensity increases, the RMSE of the ATE increases significantly: from σ_c = 0.005 to σ_c = 0.05, it grows from 6.7 mm to 26.2 mm, an increase of 19.5 mm. For the denoised images, all sequences in the dataset can be tracked successfully, and the ATE is maintained at about 6.6 mm. This also reflects that FFDNet works well on irradiance-independent Gaussian noise. The evaluation results are shown in Table 4.
For irradiance-dependent noise, it is found that the noisy sequences keep the ATE stable when the noise intensity σ_s < 0.04. As the noise intensity continues to increase, the RMSE of the ATE increases significantly: it is 7.3 mm at σ_s = 0.01 and grows to 27.5 mm at σ_s = 0.08, an increase of 20.2 mm. The denoised sequences can all be tracked successfully before the noise intensity reaches σ_s = 0.16, and the RMSE remains stable over this range, as shown in Table 4.
The mixed noisy sequences are evaluated after the irradiance-independent and irradiance-dependent sequences; the results are shown in Table 4. Tracking failure occurs when the noise intensity exceeds σ_s = 0.06, σ_c = 0.03. The RMSE of the ATE is 7.1 mm at the noise intensity σ_s = 0.01, σ_c = 0.005, and it increases to 14.8 mm at σ_s = 0.04, σ_c = 0.02, which indicates that the mixed noise has the largest effect on the robustness of ORB-SLAM2.
The evaluation results show that the denoised sequences have significantly improved robustness. In terms of accuracy, no matter the noise level, the accuracy of the denoised sequences is higher than that of the noisy sequences.

3.2.2. Clean Sequence

As shown in Figure 7, the clean images (noise intensity of zero) differ from the clean images after denoising. Generally, the denoised image looks smoother and softer: because FFDNet estimates the noise level of the image, some image content that is not noise is treated as noise and removed. As a result, the clean images after denoising by FFDNet are not the same as the original clean images.
The clean sequence is evaluated separately 20 times, and the results of the evaluation are shown in Table 5.
The evo tools [39] are used to plot the estimated trajectories, and the two runs with the RMSE closest to the average are selected for plotting. A visual comparison of the estimated trajectories of the original and denoised sequences is shown in Figure 8, and the comparison on each axis of the clean sequence and the denoised clean sequence is shown in Figure 9. Although the difference between the two is small, it can be seen from the z-axis that the error of the denoised clean sequence is smaller than that of the original clean sequence. The statistics in Table 5 show that for the denoised sequence, the maximum of the ATE decreased by 10.68%, the mean by 17.23%, the median by 19.55%, the minimum by 6.35%, and the RMSE by 16.75%.
To analyze whether the denoised sequence performs better than the clean sequence, both are evaluated 20 times; the denoised sequence performs better on 15 occasions and the original clean sequence on 5 occasions.
The probability P of observing k runs in which the original sequence performs better, out of n tests in which this outcome has probability p, is given by the binomial distribution:
P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}.
Then it can be checked whether the original sequences perform better than the denoised sequences:
Null hypothesis ( H 0 ): The original sequences perform better than the denoised sequences, with probability p = 0.5.
Test statistic: Number of original sequences (perform better).
Alpha level (designated threshold of significance): 0.05.
Observation O: 5 original sequences (perform better) out of 20 experiments.
Left-tailed p-value of observation O given H0 is
P = \sum_{i=0}^{5} \binom{20}{i} 0.5^{i} (1 - 0.5)^{20 - i} = 0.0207.
Since 2.07% < 5%, the null hypothesis is rejected at the 0.05 level, so it can be concluded that the denoised sequence performs better than the original sequence.
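For reference, this left-tailed p-value can be reproduced with scipy (not part of the paper's pipeline); the same call with (36, 100, 0.5) and (302, 1458, 0.5) reproduces the tests in the following subsections.

```python
from scipy.stats import binom

# P(X <= 5) for X ~ Binomial(n=20, p=0.5): the probability of at most 5
# "original performs better" outcomes under the null hypothesis.
p_value = binom.cdf(5, 20, 0.5)
print(f"{p_value:.4f}")   # 0.0207 -> reject H0 at the 0.05 level
```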

3.2.3. KITTI Sequences

The KITTI dataset [18,19] is a combined dataset of images, LiDAR scans, GPS measurements, and inertial measurement unit readings recorded from a mobile platform while driving near Karlsruhe, Germany.
Table 6 reports the results of the denoised sequences and the original sequences. Since the metric scale is not observable in monocular SLAM, only the RMSE after translation and scale alignment with the ground-truth trajectory is reported.
The typical visual comparison between the original sequence and the denoised sequence (KITTI 00) is shown in Figure 10a, and the typical visual comparison on each axis of the original sequence and denoised sequence (KITTI 00) is shown in Figure 10b.
The 10 KITTI sequences are each evaluated 10 times, and the results are weighted according to the number of frames. In the 100 evaluations, the denoised sequences perform better on 64 occasions and the original sequences on 36 occasions.
Similarly, it can be checked whether the original sequences perform better than the denoised sequences:
Null hypothesis ( H 0 ): The original sequences perform better than the denoised sequences, with probability p = 0.5.
Test statistic: Number of original sequences (perform better).
Alpha level (designated threshold of significance): 0.05.
Observation O: 36 original sequences (perform better) out of 100 experiments.
Left-tailed p-value of observation O given H0 is
P = \sum_{i=0}^{36} \binom{100}{i} 0.5^{i} (1 - 0.5)^{100 - i} = 0.0033.
Since 0.33% < 5%, the null hypothesis is rejected at the 0.05 level, so it can be concluded that the denoised sequences perform better than the original sequences.

3.3. Feature Stability

Feature matching lies at the base of many computer vision problems. In feature-based SLAM, the higher the matching accuracy, the better the tracking. To find the reason why the matching rate of the denoised sequences is higher, the descriptors of the matched image pairs are analyzed. The features in two images are matched by the distance between their descriptors, where SIFT and SURF use the Euclidean distance and ORB uses the Hamming distance; the closer the distance, the higher the similarity of the features.
Among the matched pairs, if the two descriptors of a match are completely identical, the features are considered to be the same feature and to have better stability. These features are called “stable features”. The greater the number of “stable features”, the better the stability of feature matching.
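Under this definition, counting “stable features” for ORB amounts to counting matches whose binary descriptors are bit-for-bit identical (Hamming distance zero); a possible sketch, with illustrative names, is given below.

```python
import numpy as np

def count_stable_features(des1, des2, matches):
    """Count matches whose ORB descriptors are identical ("stable features").

    des1, des2 : uint8 descriptor arrays returned by OpenCV's detectAndCompute.
    matches    : list of cv2.DMatch objects from brute-force matching.
    """
    return sum(
        1 for m in matches
        if np.array_equal(des1[m.queryIdx], des2[m.trainIdx])   # Hamming distance 0
    )
```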
In this paper, the clean sequence and the ORB descriptor are used for testing, and statistical analysis is performed on all 1514 adjacent image pairs. Among the 1514 matched pairs, the average number of “stable features” per pair is 354.02 in the denoised sequence, compared with 346.94 in the original clean sequence. After denoising, 1156 pairs have more stable features, 302 pairs have fewer, and the remaining 56 pairs have the same number of “stable features”.
The difference between the numbers of “stable features” of the denoised pairs and the original clean pairs is also counted; the results are shown in Figure 11. It can be seen from Figure 11 that the denoised pairs usually have 4–20 more “stable features” than the original clean pairs. Similarly, it can be checked whether the original pairs perform better than the denoised pairs:
Null hypothesis ( H 0 ): The original clean pairs perform better than the denoised pairs, with probability p = 0.5.
Test statistic: Number of original pairs (perform better).
Alpha level (designated threshold of significance): 0.05.
Observation O: 302 original pairs (perform better) out of 1458 experiments.
Left-tailed p-value of observation O given H0 is
P = \sum_{i=0}^{302} \binom{1458}{i} 0.5^{i} (1 - 0.5)^{1458 - i} = 4.55 \times 10^{-118}.
Since 4.55 × 10⁻¹¹⁸ < 5%, the null hypothesis is rejected at the 0.05 level, so it can be concluded that the features in the denoised pairs are more stable.
When images are to be co-registered, the noise in the images often differs in type and level. To study whether the denoised sequence has more stable features when the noise intensity differs, the original image and the denoised image in the clean sequence are used as references, and images with different noise levels are matched against each reference image. The numbers of stable features are shown in Figure 12: the denoised images have more stable features than the original ones. To some extent, this suggests that denoising is helpful for image co-registration across different noise levels.

4. Conclusions

(1) This paper presents an open-source synthetic dataset to evaluate the noise of the visual SLAM methods by extending the WHU-RSVI dataset. The dataset contains 33 noisy sequences with various noise levels and the method to add image noise with customized noise types and levels is open-sourced.
(2) This paper uses FFDNet to denoise the images in the dataset, and SIFT, SURF, and ORB are then employed for feature extraction and matching. ORB has the highest matching rate at all noise levels, followed by SURF and then SIFT. At all noise levels, the denoised images always have a higher matching rate than the original images.
(3) This paper evaluates the performance of the noisy sequences and the denoised sequences with ORB-SLAM2. The results show that the denoised sequences have a smaller ATE than the original image sequences. For irradiance-independent Gaussian noise, the ATE of the denoised sequences is essentially unaffected, while for irradiance-dependent noise and mixed noise, the accuracy of the denoised sequences decreases only slightly. The denoised sequences are also more robust than the original sequences.
(4) The clean sequence in the dataset is denoised by FFDNet and then evaluated with ORB-SLAM2. Compared with the original sequence, the RMSE of the ATE of the denoised clean sequence decreases by 16.75%, indicating that, when the features are sufficient, denoising even a clean sequence can improve the performance of current monocular feature-based visual SLAM methods.
(5) The commonly used KITTI sequences are also evaluated; compared with the original sequences, 7 out of 10 denoised sequences have a lower RMSE.
For an image without any noise, denoising loses some information. However, it is found that after image denoising, the tracking accuracy of ORB-SLAM2 is improved regardless of whether there is noise in the image. According to the analysis in Section 3.3, this is because information that perturbs the feature descriptors is removed, so the features are usually more stable after denoising.
The work of this paper is loosely coupled with existing SLAM methods and can easily be applied to them. Moreover, other denoising methods may also be useful (we used DnCNN to denoise the clean sequence and then evaluated it with ORB-SLAM2; in 16 of the 20 evaluations the results are better than those of the original sequence (p-value: 0.59% < 5%), and the RMSE is 0.00660 m, an average drop of 32.83%). The image denoising in this paper belongs to the image pre-processing stage of the SLAM system and has a similar role to image enhancement. Therefore, an image denoising module can be added between the image capture module and the feature tracking module when engineers develop related projects. Generally, FFDNet needs extra time to denoise images, but in offline feature-based SLAM, using FFDNet to denoise the image sequences can improve the accuracy of the system. In structure from motion, the accuracy of the system is usually more important than the running time, so researchers can also use FFDNet to pre-process the images first.
In future work, we will convert FFDNet's PyTorch model to TorchScript and run it in C++, so that FFDNet can be integrated with ORB-SLAM2 and the denoised images can be fed directly into ORB-SLAM2 for feature extraction and matching. Visual SLAM is a computationally intensive task, and it is difficult to achieve real-time performance in resource-limited environments such as micro aerial vehicles and mobile robots. We therefore plan to build a cloud computing service platform: the images captured by the camera will be transferred to the platform immediately, and image denoising, feature matching, bundle adjustment, loop detection, and other computationally intensive tasks will be performed there. Besides, we will study the performance of the method under fast motion with motion blur. The noise dataset in this paper is monocular; binocular and RGB-D sequences are considered for future extensions.

Author Contributions

L.C. and X.X. conceived the idea and designed the methods and experiments; L.C. performed the experiments, analyzed the data and wrote the paper; J.L. and X.X. revised the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key R&D Program of China (2018YFB2100903).

Acknowledgments

The authors would like to thank Jaime Vives Piqueres who provided us with the open-source office scene.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ATE       Absolute Trajectory Error
BN        Batch Normalization
CNN       Convolutional Neural Network
CRF       Camera Response Function
DnCNN     Denoising Convolutional Neural Network
DSO       Direct Sparse Odometry
FAST      Features from Accelerated Segment Test
FFDNet    Fast and Flexible Denoising convolutional neural Network
LSD-SLAM  Large-Scale Direct monocular SLAM
ORB       Oriented FAST and Rotated BRIEF
ORB-SLAM  ORB feature-based SLAM
POV-Ray   Persistence Of Vision Raytracer
PSNR      Peak Signal to Noise Ratio
RANSAC    RANdom SAmple Consensus
ReLU      Rectified Linear Unit
RGB-D     Red Green Blue-Depth
RMSE      Root Mean Square Error
SIFT      Scale Invariant Feature Transform
SLAM      Simultaneous Localization and Mapping
SURF      Speeded Up Robust Features
SVO       Semidirect Visual Odometry

References

1. Yang, N.; Wang, R.; Gao, X.; Cremers, D. Challenges in monocular visual odometry: Photometric calibration, motion bias, and rolling shutter effect. IEEE Robot. Autom. Lett. 2018, 3, 2878–2885.
2. Engel, J.; Koltun, V.; Cremers, D. Direct sparse odometry. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 611–625.
3. Gao, X.; Wang, R.; Demmel, N.; Cremers, D. LDSO: Direct sparse odometry with loop closure. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 2198–2204.
4. Engel, J.; Schöps, T.; Cremers, D. LSD-SLAM: Large-scale direct monocular SLAM. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 5–12 September 2014; pp. 834–849.
5. Engel, J.; Stückler, J.; Cremers, D. Large-scale direct SLAM with stereo cameras. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October 2015; pp. 1935–1942.
6. Forster, C.; Pizzoli, M.; Scaramuzza, D. SVO: Fast semi-direct monocular visual odometry. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 15–22.
7. Forster, C.; Zhang, Z.; Gassner, M.; Werlberger, M.; Scaramuzza, D. SVO: Semidirect visual odometry for monocular and multicamera systems. IEEE Trans. Robot. 2016, 33, 249–265.
8. Mur-Artal, R.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Trans. Robot. 2015, 31, 1147–1163.
9. Mur-Artal, R.; Tardós, J.D. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans. Robot. 2017, 33, 1255–1262.
10. Cho, Y.; Kim, A. Channel invariant online visibility enhancement for visual SLAM in a turbid environment. J. Field Robot. 2018, 35, 1080–1100.
11. Chen, Y.; He, H.; Chen, H.; Zhu, T. Improving Registration of Augmented Reality by Incorporating DCNNS into Visual SLAM. Int. J. Pattern Recognit. Artif. Intell. 2018, 32, 1855022.
12. Liang, L. Precise Iterative Closest Point Algorithm for RGB-D Data Registration with Noise and Outliers. Neurocomputing 2020, 399, 361–368.
13. Zhang, G.; Liu, H.; Dong, Z.; Jia, J.; Wong, T.T.; Bao, H. Efficient non-consecutive feature tracking for robust structure-from-motion. IEEE Trans. Image Process. 2016, 25, 5957–5970.
14. Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A benchmark for the evaluation of RGB-D SLAM systems. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Algarve, Portugal, 7–12 October 2012; pp. 573–580.
15. Engel, J.; Usenko, V.; Cremers, D. A photometrically calibrated benchmark for monocular visual odometry. arXiv 2016, arXiv:1607.02555.
16. Schubert, D.; Goll, T.; Demmel, N.; Usenko, V.; Stückler, J.; Cremers, D. The TUM VI benchmark for evaluating visual-inertial odometry. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1680–1687.
17. Burri, M.; Nikolic, J.; Gohl, P.; Schneider, T.; Rehder, J.; Omari, S.; Achtelik, M.W.; Siegwart, R. The EuRoC micro aerial vehicle datasets. Int. J. Robot. Res. 2016, 35, 1157–1163.
18. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361.
19. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
20. Handa, A.; Whelan, T.; McDonald, J.; Davison, A.J. A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 1524–1531.
21. Cao, L.; Ling, J.; Xiao, X. The WHU rolling shutter visual-inertial dataset. IEEE Access 2020, 8, 50771–50779.
22. Zhang, K.; Zuo, W.; Zhang, L. FFDNet: Toward a fast and flexible solution for CNN-based image denoising. IEEE Trans. Image Process. 2018, 27, 4608–4622.
23. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the International Conference on Computer Vision, Kerkyra, Corfu, Greece, 20–25 September 1999; Volume 99, pp. 1150–1157.
24. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
25. Bay, H.; Tuytelaars, T.; Van Gool, L. SURF: Speeded up robust features. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 404–417.
26. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G.R. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; Volume 11, p. 2.
27. Rosten, E.; Drummond, T. Machine learning for high-speed corner detection. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 430–443.
28. Rosten, E.; Porter, R.; Drummond, T. Faster and better: A machine learning approach to corner detection. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 32, 105–119.
29. Calonder, M.; Lepetit, V.; Strecha, C.; Fua, P. BRIEF: Binary robust independent elementary features. In Proceedings of the European Conference on Computer Vision, Heraklion, Crete, Greece, 5–11 September 2010; pp. 778–792.
30. Grossberg, M.D.; Nayar, S.K. Modeling the space of camera response functions. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 1272–1282.
31. Handa, A.; Newcombe, R.A.; Angeli, A.; Davison, A.J. Real-time camera tracking: When is high frame-rate best? In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; pp. 222–235.
32. Konnik, M.; Welsh, J. High-level numerical simulations of noise in CCD and CMOS photosensors: Review and tutorial. arXiv 2014, arXiv:1412.4031.
33. Liu, C.; Freeman, W.T.; Szeliski, R.; Kang, S.B. Noise estimation from a single image. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), New York, NY, USA, 17–22 June 2006; Volume 1, pp. 901–908.
34. Tsin, Y.; Ramesh, V.; Kanade, T. Statistical calibration of CCD imaging process. In Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001), Vancouver, BC, Canada, 7–14 July 2001; Volume 1, pp. 480–487.
35. Tassano, M.; Delon, J.; Veit, T. An Analysis and Implementation of the FFDNet Image Denoising Method. Image Process. Line 2019, 9, 1–25.
36. Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155.
37. Dabov, K.; Foi, A.; Katkovnik, V.; Egiazarian, K. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process. 2007, 16, 2080–2095.
38. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167.
39. Grupp, M. evo: Python Package for the Evaluation of Odometry and SLAM. Available online: https://github.com/MichaelGrupp/evo (accessed on 30 August 2020).
Figure 1. Noise dataset overview. (a–d) Clean images; (e–h) noisy images at the medium noise intensity σ_s = 0.06, σ_c = 0.03; (i–l) noisy images at the high noise intensity σ_s = 0.2, σ_c = 0.1; (m–p) denoised versions of the medium-noise images.
Figure 2. The structure of FFDNet.
Figure 3. Feature matching overview. (a–c) Feature matching by SIFT, SURF, and ORB on the noisy images; (d–f) feature matching by SIFT, SURF, and ORB on the denoised images.
Figure 4. The matching rate of feature matching under irradiance-independent noise.
Figure 5. The matching rate of feature matching under irradiance-dependent noise.
Figure 6. The matching rate of feature matching under mixed noise.
Figure 7. (a) Example of the original clean image. (b) Example of the clean image after denoising.
Figure 8. The estimated trajectories of the clean sequence and the denoised clean sequence, together with their ground truth.
Figure 9. Comparison of the clean sequence, the denoised clean sequence, and their ground truth on each coordinate axis.
Figure 10. (a) Estimation statistics of the KITTI 00 sequence. (b) Comparison of the KITTI 00 sequence and its denoised sequence with the ground truth on each coordinate axis.
Figure 11. Frequency histogram of the difference between the numbers of “stable features” of the denoised and original clean pairs.
Figure 12. Stable features at different noise levels.
Table 1. Statistical information of the trajectories.

Frames    Duration    Length     Avg. Vel.
2895      96.50 s     50.11 m    0.52 m/s
Table 2. The average Peak Signal to Noise Ratio (PSNR) values (dB) of the noisy sequences and the denoised sequences.

(σc, σs)        Noisy     Denoised  |  (σc, σs)      Noisy     Denoised  |  (σc, σs)        Noisy     Denoised
(0.005, 0.0)    45.26     31.773    |  (0.0, 0.01)   43.35     31.771    |  (0.005, 0.01)   41.491    31.77
(0.01, 0.0)     39.835    31.769    |  (0.0, 0.02)   37.712    31.765    |  (0.01, 0.02)    35.717    31.761
(0.02, 0.0)     33.983    31.758    |  (0.0, 0.04)   31.804    31.753    |  (0.02, 0.04)    29.779    31.748
(0.03, 0.0)     30.500    31.75     |  (0.0, 0.06)   28.313    31.748    |  (0.03, 0.06)    26.277    31.751
(0.04, 0.0)     28.020    31.748    |  (0.0, 0.08)   25.838    31.75     |  (0.04, 0.08)    23.803    31.761
(0.05, 0.0)     26.097    31.753    |  (0.0, 0.10)   23.924    31.745    |  (0.05, 0.10)    21.895    31.739
(0.06, 0.0)     24.528    31.765    |  (0.0, 0.12)   22.368    31.698    |  (0.06, 0.12)    20.35     31.234
(0.07, 0.0)     23.205    31.783    |  (0.0, 0.14)   21.06     31.283    |  (0.07, 0.14)    19.058    29.167
(0.08, 0.0)     22.064    31.803    |  (0.0, 0.16)   19.936    30.011    |  (0.08, 0.16)    17.955    26.029
(0.09, 0.0)     21.062    31.792    |  (0.0, 0.18)   18.953    27.96     |  (0.09, 0.18)    17.000    23.135
(0.10, 0.0)     20.172    31.602    |  (0.0, 0.20)   18.083    25.621    |  (0.10, 0.20)    16.164    20.918
Table 3. Statistics of feature matching results for Scale Invariant Feature Transform (SIFT), Speeded Up Robust Features (SURF), and Oriented FAST and Rotated BRIEF (ORB).

                         Original    Denoised
SIFT    N Points         990201      990225
        Matching Rate    0.4174      0.5016
SURF    N Points         989880      989825
        Matching Rate    0.6144      0.6424
ORB     N Points         990000      990000
        Matching Rate    0.7194      0.7399
Table 4. The Root-Mean-Square Error (RMSE) of the Absolute Trajectory Error (ATE), in meters, of noisy sequences and denoised sequences at different noise levels (“×” means the tracking is lost).

(σc, σs)        Noisy      Denoised  |  (σc, σs)      Noisy      Denoised  |  (σc, σs)        Noisy      Denoised
(0.005, 0.0)    0.00666    0.00670   |  (0.0, 0.01)   0.00728    0.00638   |  (0.005, 0.01)   0.00705    0.00632
(0.01, 0.0)     0.00699    0.00642   |  (0.0, 0.02)   0.00722    0.00648   |  (0.01, 0.02)    0.00734    0.00731
(0.02, 0.0)     0.00741    0.00690   |  (0.0, 0.04)   0.00764    0.00628   |  (0.02, 0.04)    0.0148     0.00652
(0.03, 0.0)     0.01499    0.00668   |  (0.0, 0.06)   0.02285    0.00669   |  (0.03, 0.06)    0.0113     0.00685
(0.04, 0.0)     0.00925    0.00698   |  (0.0, 0.08)   0.02752    0.00724   |  (0.04, 0.08)    ×          0.00683
(0.05, 0.0)     0.02619    0.00654   |  (0.0, 0.10)   ×          0.00678   |  (0.05, 0.10)    ×          0.00665
(0.06, 0.0)     ×          0.00669   |  (0.0, 0.12)   ×          0.00657   |  (0.06, 0.12)    ×          0.00743
(0.07, 0.0)     ×          0.00664   |  (0.0, 0.14)   ×          0.00816   |  (0.07, 0.14)    ×          0.00801
(0.08, 0.0)     ×          0.00672   |  (0.0, 0.16)   ×          0.00788   |  (0.08, 0.16)    ×          ×
(0.09, 0.0)     ×          0.00675   |  (0.0, 0.18)   ×          0.00867   |  (0.09, 0.18)    ×          ×
(0.10, 0.0)     ×          0.00689   |  (0.0, 0.20)   ×          ×         |  (0.10, 0.20)    ×          ×
Table 5. Estimation statistics of the clean sequence.

Trajectories             Max Err (m)   Mean Err (m)   Median Err (m)   Min Err (m)   RMSE Err (m)   Std Err (m)
Clean sequence ATE       0.022381      0.008707       0.00847          0.001498      0.00983        0.00453
Denoised sequence ATE    0.01999       0.007207       0.006814         0.001403      0.008184       0.003854
Table 6. RMSE of ATE in meters after translation and scale alignment on the KITTI dataset (average over ten runs).

Sequences   KITTI-00    KITTI-01    KITTI-02    KITTI-03    KITTI-04
Denoised    6.218838    367.9401    22.4678     1.758998    0.841975
Original    8.0375      375.0625    25.23833    1.171983    0.906354
Sequences   KITTI-05    KITTI-06    KITTI-07    KITTI-08    KITTI-09
Denoised    5.746126    16.42175    2.515805    49.98243    42.10563
Original    5.847668    14.66747    2.451807    51.24599    43.4026
