Article
Peer-Review Record

Optical Flow and Expansion Based Deep Temporal Up-Sampling of LIDAR Point Clouds

Remote Sens. 2023, 15(10), 2487; https://doi.org/10.3390/rs15102487
by Zoltan Rozsa and Tamas Sziranyi
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Reviewer 4:
Submission received: 13 February 2023 / Revised: 21 April 2023 / Accepted: 6 May 2023 / Published: 9 May 2023
(This article belongs to the Special Issue Computer Vision and Image Processing)

Round 1

Reviewer 1 Report

This paper presents a method for temporal upsampling of LiDAR point clouds using camera images that can be acquired at a higher frame rate. This topic is worth researching due to its practical relevance, particularly in automotive applications. The paper is rather well written and presents sound results with an appropriate analysis and comparison to the baselines. There are a few minor issues that should be resolved before publication: you cite [26] for extrinsic camera-LiDAR calibration, while the considered applications seem to demand spatio-temporal calibration between the two sensors. Please comment on this and add relevant related work. Also, Fig. 1 presents a multi-camera setup, but the paper seems to consider only a LIDAR-camera pair. Did you build (or at least try to build) a panoramic image for this system? Also, the treatment of dynamic obstacles should be commented on more extensively; it is not covered in the related work and in fact not explained in the section about the pipeline (Sections 3 and 3.4 in particular). Lastly, there are a few typos/grammar errors, e.g., at line 167; RANSAC usually expands to Random Sample Consensus, not as you wrote in line 206.

Author Response

Response to Reviewer 1 Comments

This paper presents a method for temporal upsampling of LiDAR point clouds using camera images that can be acquired at a higher frame rate. This topic is worth researching due to its practical relevance, particularly in automotive applications. The paper is rather well written and presents sound results with an appropriate analysis and comparison to the baselines.

Thank you very much for acknowledging our work.

There are a few minor issues that should be resolved before publication:

We have addressed each of your comments as follows.

You cite [26] for extrinsic camera-LiDAR calibration, while the considered applications seem to demand spatio-temporal calibration between the two sensors. Please comment on this and add relevant related work.

In our tests, different synchronized sub-datasets of the KITTI database are used. However, you are right; in a general application, spatio-temporal LIDAR-camera calibration should be applied to avoid synchronization problems. We have corrected the reference to [1]. Thank you for bringing it to our attention.

[1] S. Yoon, S. Ju, H. M. Nguyen, S. Park and J. Heo, "Spatiotemporal Calibration of Camera-LiDAR Using Nonlinear Angular Constraints on Multiplanar Target," IEEE Sensors Journal, vol. 22, no. 11, pp. 10995-11005, 1 June 2022, doi: 10.1109/JSEN.2022.3168860.

Also, Fig. 1 presents a multi-camera setup, but the paper seems to consider only a LIDAR-camera pair. Did you build (or at least try to build) a panoramic image for this system?

You are right; the paper considers a LIDAR-camera pair. Fig. 1 visualizes a multi-camera setup from the Argoverse dataset to illustrate how the method can generate a circular LIDAR frame given an appropriate arrangement of multiple cameras. We have not tried building a panoramic image instead of processing multiple images simultaneously, but we will consider this as future work; this is now mentioned in the Conclusions section of the paper. Thank you for the idea.

Also, the treatment of dynamic obstacles should be commented on more extensively. It is not covered in the related work and in fact not explained in the section about the pipeline (Sections 3 and 3.4 in particular).

Differentiating static and dynamic objects is unnecessary with the proposed pipeline (only ground points are handled differently among the stationary points). In fact, this is one of the advantages of our pipeline compared to alternatives.

As the 3D scene flow is estimated for each point, movement estimation is included point-wise. All points of static objects should have the same scene flow value (the ego-motion vector in the opposite direction), while points of a dynamic object should have a different value (shared by all points of that object). This phenomenon can be observed in Fig. 6 (Fig. 8 in the original paper), where points of the approaching vehicle have an estimated displacement of about 2 m, while static environment points (including parked cars) have an estimated displacement of roughly 1 m.

This is now explained in more detail in Section 3 and indicated in the Contribution subsection (Section 1.1).
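For illustration only, here is a minimal NumPy sketch (not the code of the paper; the function and variable names are ours, invented for this response) of the point above: subtracting the ego-motion from the per-point scene flow leaves a near-zero residual for static points and a clearly non-zero residual for points on dynamic objects, so no explicit static/dynamic segmentation is needed.

```python
import numpy as np

def residual_flow(scene_flow: np.ndarray, ego_translation: np.ndarray) -> np.ndarray:
    """scene_flow: (N, 3) per-point displacement; ego_translation: (3,) ego motion."""
    # Static points move by roughly -ego_translation in the sensor frame,
    # so their residual is close to zero; dynamic points keep a larger residual.
    return scene_flow + ego_translation

# Toy example: the ego vehicle moves 1 m forward, one vehicle approaches at 1 m per step.
ego = np.array([0.0, 0.0, 1.0])
flow_static = np.tile(-ego, (5, 1))                  # static environment points (~1 m displacement)
flow_dynamic = np.tile([0.0, 0.0, -2.0], (3, 1))     # approaching vehicle points (~2 m displacement)
flow = np.vstack([flow_static, flow_dynamic])

residual = np.linalg.norm(residual_flow(flow, ego), axis=1)
print(residual)   # ~0 for static points, ~1 m for the approaching vehicle
```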

Lastly, there are a few typos/grammar errors, e.g., at line 167; RANSAC usually expands to Random Sample Consensus, not as you wrote in line 206.

You are right about the RANSAC acronym. We have corrected this and also spell-checked the paper. Thank you for the constructive criticisms.

Reviewer 2 Report

This paper presents a framework for generating virtual point clouds based on optical flow, which can improve the temporal resolution of point clouds while maintaining their spatial resolution, and can help ADAS improve the responsiveness of target recognition. The main contributions of this paper are threefold. First, this pipeline generates a predicted virtual point cloud with the help of images to solve the problem that the spatial resolution and temporal resolution are difficult to balance during the point cloud measurement. Second, this method can enable real-time running of the system and temporal up-sampling of LIDAR measurements. Third, the generated point cloud has a certain improvement in accuracy.

     This paper is well-motivated. I would recommend that we accept this paper subject to the following modifications:

 

  1. There are some mistakes in all Equations, such as adding a comma after the formula and requiring no space before “where”. Moreover, in lines 146 to 152, maybe you need to add “;” after every sentence.
  2. It is suggested to add illustrations in Figure 5 and Figure 6, which make it easier for readers to understand. Moreover, you can put these two figures together. 
  3. There are some misspellings in line 11 (upsampling) and before line 178 (results). 
  4. The sentence in line 121 could merge into another paragraph.
  5. In the first and second lines of Table 5, there are format errors in the references. The label should be placed uniformly after the method name. 
  6. In Section 3.3, please provide more details about how to set the threshold of finding points approximately at the same distance on this plane or the threshold.
  7. In Section 4, the accuracy and errors of different methods can be presented together by comparing their visual results.
  8. Could the proposed method provide more practical testing?
  9. Some recent works still need to be included and added in the revised version, such as A feature-preserving framework for point cloud denoising, Computer-Aided Design. For example, “In the past two decades, the development of technologies of point cloud denoising has promoted the rapid development of research in 3D object recognition.”

Author Response

Response to Reviewer 2 Comments:

This paper presents a framework for generating virtual point clouds based on optical flow, which can improve the temporal resolution of point clouds while maintaining their spatial resolution, and can help ADAS improve the responsiveness of target recognition. The main contributions of this paper are threefold. First, this pipeline generates a predicted virtual point cloud with the help of images to solve the problem that the spatial resolution and temporal resolution are difficult to balance during the point cloud measurement. Second, this method can enable real-time running of the system and temporal up-sampling of LIDAR measurements. Third, the generated point cloud has a certain improvement in accuracy.

     This paper is well-motivated. I would recommend that we accept this paper subject to the following modifications:

Thank you very much for acknowledging our work and its contributions. We have addressed each of your comments as follows.

  1. There are some mistakes in all Equations, such as adding a comma after the formula and requiring no space before “where”. Moreover, in lines 146 to 152, maybe you need to add “;” after every sentence.

Thank you for pointing out these mistakes. We have added the missing commas and spaces to the equations and text.

  2. It is suggested to add illustrations in Figure 5 and Figure 6, which make it easier for readers to understand. Moreover, you can put these two figures together.

Thank you for the suggestion. We have combined Fig. 5 and Fig. 6 with Fig. 4 for better understanding, as the former Fig. 4 illustrated the inputs for the optical flow and motion-in-depth estimations.

  3. There are some misspellings in line 11 (upsampling) and before line 178 (results).

Thank you for noticing. We have corrected the misspellings.

  4. The sentence in line 121 could merge into another paragraph.

Thank you. We have merged the sentence into the previous paragraph.

  5. In the first and second lines of Table 5, there are format errors in the references. The label should be placed uniformly after the method name.

Thank you for bringing it to our attention. We have placed the labels uniformly after the method name.

  6. In Section 3.3, please provide more details about how to set the threshold of finding points approximately at the same distance on this plane or the threshold.

We have applied a 0.2 m threshold to find inlier points of the given plane. This is now written in Section 3.3.
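As an illustration of this step (a sketch under assumed notation, not the exact implementation of the paper), the inliers of a plane n·x + d = 0 can be selected with the 0.2 m distance threshold as follows; a RANSAC-style fit would supply the plane parameters.

```python
import numpy as np

def plane_inliers(points: np.ndarray, normal: np.ndarray, d: float,
                  threshold: float = 0.2) -> np.ndarray:
    """points: (N, 3); normal: unit plane normal; d: plane offset in metres."""
    distances = np.abs(points @ normal + d)   # point-to-plane distances
    return distances < threshold              # boolean inlier mask

# Example with a roughly horizontal ground plane at z = 0.
pts = np.array([[1.0, 2.0, 0.05],    # 5 cm above the plane -> inlier
                [3.0, 1.0, 0.30],    # 30 cm above the plane -> outlier
                [0.5, 4.0, -0.10]])  # 10 cm below the plane -> inlier
print(plane_inliers(pts, normal=np.array([0.0, 0.0, 1.0]), d=0.0))   # [ True False  True]
```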

  7. In Section 4, the accuracy and errors of different methods can be presented together by comparing their visual results.

Thank you for the suggestion. We have moved the visual results into the same subsection where the quantitative results are presented.

  8. Could the proposed method provide more practical testing?

The quantitative results of the paper cover the predictions for about 8000 frames (KITTI odometry: 6860 frames; KITTI depth completion: 999 frames). In addition, we have provided qualitative results on the Argoverse dataset (to illustrate 360° LIDAR frame generation and applicability to different LIDAR sensors). In future work, we plan to extend our tests; this is now mentioned in the Conclusions section of the paper. All of the above tests are based on real-life measurements and scenarios.

  9. Some recent works still need to be included and added in the revised version, such as A feature-preserving framework for point cloud denoising, Computer-Aided Design. For example, “In the past two decades, the development of technologies of point cloud denoising has promoted the rapid development of research in 3D object recognition.”

Thank you for the advice; we have followed it. We have added a recent work on depth completion [1] and included the suggested literature, with the proposed context, in the Introduction.

[1] Ö. Zováthi, B. Pálffy, Z. Jankó and C. Benedek, "ST-DepthNet: A spatio-temporal deep network for depth completion using a single non-repetitive circular scanning Lidar," in IEEE Robotics and Automation Letters, doi: 10.1109/LRA.2023.3266670.

Reviewer 3 Report

This paper proposes to use the motion-in-depth network to estimate the scene flow of LiDAR point clouds.

Overall, it seems to be just a combination of several modules, so the novelty is limited.

In detail, the optical flow and the motion-in-depth results are generated by existing networks.

Only the 3D scene flow has originality but not novelty. Moreover, there is not enough explanation.

Also, the concept of maintaining the ground seems to be good, but if only dynamic objects are predicted, the novelty in the mapping is reduced.

In addition, the disparity of the image is reflected directly in the LiDAR, so the accuracy will be lower.

Author Response

Response to Reviewer 3 Comments:

This paper proposes to use the motion-in-depth network to estimate the scene flow of LiDAR point clouds.

Overall, it seems to be just a combination of several modules, so the novelty is limited.

In detail, the optical flow and the motion-in-depth results are generated by existing networks.

Only the 3D scene flow has originality but not novelty. Moreover, there is not enough explanation.

Thank you for the constructive criticism. You are right that we apply existing networks and system components in our pipeline; however, we have extended these components (see Section 5.3). Most importantly, our main scientific contribution is a novel pipeline that solves the temporal up-sampling problem of LIDAR point clouds, which is highly relevant for decreasing delays in automotive systems.

We have provided further explanation and literature about 3D scene flow in Sections 3.4 and 3.5.

Also, the concept of maintaining the ground seems to be good, but if only dynamic objects are predicted, the novelty in the mapping is reduced.

Thank you.

Dynamic objects are predicted with the proposed pipeline. In fact, the main reason for the up-sampling is to monitor them frequently, not the mapping. A separate evaluation is provided just for dynamic objects in Section 5.2 (where we outperform the alternatives). Also, most of our illustrations contain dynamic objects and our predictions of their locations, e.g., the blue and green enlargements in Fig. 1 (with preceding and subsequent poses) and Figs. 7 and 8 (Figs. 9 and 10 in the original paper).

The 3D scene flow is estimated for each point, so movement estimation is included point-wise. All points of static objects should have the same scene flow value (the inverse of the ego-motion vector), while points of a dynamic object should have a different value (shared by all points of that object). This phenomenon can be observed in Fig. 6 (Fig. 8 in the original paper), where points of the approaching vehicle have an estimated displacement of about 2 m, while static environment points (including parked cars) have an estimated displacement of roughly 1 m.

The above details about dynamic objects may not have been stated clearly enough in the paper. We now state this in Section 1.1 (Contribution) and have extended Section 3 with a description of dynamic object handling.

In addition, the disparity of the image is reflected directly in the LiDAR, so the accuracy will be lower.

We do not estimate disparities in the pipeline; we use depth values coming from the LIDAR (at time t-1, as input) and estimate the depths (at time t, as output) of a virtual LIDAR frame at time instants when no measurements are available. The disparity image was used only to represent the input depth values in Fig. 4 for better visualization.

It may not have been clearly stated in the paper, so we corrected this in Section 3.
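To make the above explicit, the following minimal sketch (assumed notation and a generic pinhole model, not our exact formulation) shows how a LIDAR depth from time t-1, the estimated optical flow, and the motion-in-depth ratio can be combined to predict a 3D point of the virtual frame at time t.

```python
import numpy as np

def predict_point(u, v, z, du, dv, tau, K):
    """u, v, z: pixel and LIDAR depth at t-1; du, dv: optical flow;
    tau: motion-in-depth ratio z_t / z_(t-1); K: 3x3 camera intrinsic matrix."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    z_new = tau * z                    # depth predicted from motion-in-depth
    u_new, v_new = u + du, v + dv      # pixel position predicted from optical flow
    x = (u_new - cx) / fx * z_new      # back-projection with the pinhole model
    y = (v_new - cy) / fy * z_new
    return np.array([x, y, z_new])

# Example with illustrative intrinsics and one point 20 m ahead.
K = np.array([[700.0, 0.0, 620.0],
              [0.0, 700.0, 190.0],
              [0.0, 0.0, 1.0]])
print(predict_point(u=800.0, v=200.0, z=20.0, du=-5.0, dv=1.0, tau=0.95, K=K))
```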

Reviewer 4 Report

The paper resolves the low-resolution problem of Lidar point clouds using an efficient and low-cost method that sounds effective and meaningful. The paper is well organized and well expressed and can be published after some format corrections.

Author Response

Response to Reviewer 4 Comments:

The paper resolves the low-resolution problem of Lidar point clouds using an efficient and low-cost method that sounds effective and meaningful. The paper is well organized and well expressed and can be published after some format corrections.

Thank you very much for acknowledging our work and its strengths. We have corrected the paper. 

The major changes in the paper have been colored blue and underlined in the supplementary document of the submission.

Round 2

Reviewer 3 Report

While the overall content remains unchanged and therefore lacks novelty, the results of the study are interesting, and I believe that the manuscript is sufficient for publication as most of the previous comments have been answered and the missing explanations have been filled in.

 


 
