Article
Peer-Review Record

A Novel Real-Time Edge-Guided LiDAR Semantic Segmentation Network for Unstructured Environments

Remote Sens. 2023, 15(4), 1093; https://doi.org/10.3390/rs15041093
by Xiaoqing Yin, Xu Li *, Peizhou Ni, Qimin Xu and Dong Kong
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 12 January 2023 / Revised: 10 February 2023 / Accepted: 13 February 2023 / Published: 16 February 2023
(This article belongs to the Special Issue Development and Application for Laser Spectroscopies)

Round 1

Reviewer 1 Report

This paper proposes a novel LiDAR semantic segmentation network for unstructured environments. The authors present an edge-guided method to alleviate the problem of edge blurring between classes in unstructured environments. In their method, the authors design a supervised edge segmentation module for extracting accurate high-resolution edge features and an edge-guided fusion module for fusing edge features and point cloud semantic features. They also add edge guidance to the loss function. The proposed method is validated on different unstructured environment datasets, and its performance is compared with already published work.
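Since the summary above mentions adding edge guidance to the loss function, here is a minimal sketch of one generic way such guidance can be expressed, an edge-weighted cross-entropy; the function name, the `edge_mask` input, and the weighting scheme are illustrative assumptions, not the authors' loss.

```python
import torch
import torch.nn.functional as F

def edge_weighted_ce(logits, target, edge_mask, edge_weight=2.0):
    """Cross-entropy with extra weight on pixels near class edges (illustrative only).

    logits    : (B, C, H, W) raw class scores
    target    : (B, H, W) integer class labels
    edge_mask : (B, H, W) boolean mask marking edge pixels
    """
    per_pixel = F.cross_entropy(logits, target, reduction="none")   # (B, H, W)
    weights = 1.0 + (edge_weight - 1.0) * edge_mask.float()         # boost edge pixels
    return (per_pixel * weights).mean()
```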

The paper is interesting. I recommend publication if the following issues are solved.

 

1) The proposed method is tested with different datasets. However, the paper does not state the LiDAR models used to collect the corresponding datasets. Do the inputs of the network need to be adapted to different LiDARs?

 

2) Why are the width and height of the input range images in the proposed network set to 2048 and 64? Please explain the basis for this setting in the text.
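For context on comments 1) and 2), below is a minimal sketch of the spherical projection commonly used for rotating LiDARs in range-image pipelines (e.g., RangeNet++-style preprocessing), assuming a 64-beam sensor; the vertical field-of-view values, function name, and array layout are illustrative assumptions, not taken from the paper. It shows where the height 64 (number of beams) and width 2048 (azimuth bins) enter, and which parameters would change for a different LiDAR.

```python
import numpy as np

def spherical_projection(points, H=64, W=2048, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) point cloud onto an H x W range image (illustrative).

    fov_up / fov_down are the sensor's vertical field of view in degrees;
    the values here are placeholders for a 64-beam sensor.
    """
    fov_up_rad = np.radians(fov_up)
    fov_down_rad = np.radians(fov_down)
    fov = abs(fov_up_rad) + abs(fov_down_rad)

    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points[:, :3], axis=1)

    yaw = -np.arctan2(y, x)                                   # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(depth, 1e-8), -1.0, 1.0))

    # normalize angles to [0, 1] and scale to image coordinates
    u = 0.5 * (yaw / np.pi + 1.0) * W                         # column: azimuth bin
    v = (1.0 - (pitch + abs(fov_down_rad)) / fov) * H         # row: elevation/beam

    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

    range_image = np.full((H, W), -1.0, dtype=np.float32)     # -1 marks empty pixels
    range_image[v, u] = depth                                 # later points overwrite earlier ones
    return range_image, (v, u)
```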

 

3) Figures 6, 7, and 8 show the visual comparison with different methods on the two datasets. The classes in these two datasets are not the same, but many of the same colors are used across the three figures, which is confusing. I suggest adding a legend indicating the color of each class in the figures.

 

4) The method uses a spherical projection preprocessing step. Does the single-frame prediction time in Tables 1 and 2 include the data preprocessing time?
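As an illustration of what comment 4) asks for, here is a small timing sketch that reports the projection step and the network forward pass separately; `model` is a placeholder, `spherical_projection` refers to the sketch above, and the CUDA synchronization calls are only needed when timing GPU inference.

```python
import time
import torch

def timed_inference(model, points, device="cuda"):
    """Report preprocessing and network time per frame separately (illustrative only)."""
    t0 = time.perf_counter()
    range_image, _ = spherical_projection(points)               # preprocessing
    inp = torch.from_numpy(range_image)[None, None].to(device)  # (1, 1, H, W)
    if device == "cuda":
        torch.cuda.synchronize()
    t1 = time.perf_counter()

    with torch.no_grad():
        _ = model(inp)                                          # network forward pass
    if device == "cuda":
        torch.cuda.synchronize()                                # wait for GPU kernels
    t2 = time.perf_counter()

    return {"preprocess_ms": (t1 - t0) * 1e3, "inference_ms": (t2 - t1) * 1e3}
```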

 

5) What is the appropriate threshold (Equation 14) that can be used in everyday practice?

 

6) The abbreviation LiDAR is not spelled out when first introduced, and its spelling is inconsistent (it changes twice) within the paper.

 

7) The writing needs to be polished again to correct mistakes and avoid confusion, for example:

• In 1. introduction, "The main contributions of this article are concluded as followed" should be "follows" not "followed".

• In 2. related work, 2.1. Semantic Segmentation of Large-Scale Point Clouds, "This approach can effectively decrease the amount of data and realize semantic segmentation in real-time" should be "real time" not "real-time".

• In 2. related work, 2.3. Edge Improved Semantic Segmentation, "TAKIKAWA et al.[35] design a edge detection stream" should be "an" not "a".

 

Author Response

Please see the attachment.

 

Author Response File: Author Response.pdf

Reviewer 2 Report

A novel real-time edge-guided LiDAR semantic segmentation network for unstructured environments

To address the phenomenon of blurred class edges in LiDAR-based segmentation of unstructured environments, this paper proposes a novel edge-guided network for real-time LiDAR semantic segmentation in unstructured environments. The paper designs two modules to reduce blurred class edges: 1) an Edge Segmentation Module, which extracts edge features, and 2) an Edge-Guided Fusion module, which fuses edge features and main-branch features through channel attention. In particular, the network achieves state-of-the-art performance in the segmentation of drivable areas and large-area static obstacles in unstructured environments. The reviewer's opinions are given below.
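As a rough illustration of the channel-attention fusion idea summarized above, the sketch below shows one possible SE-style gating of concatenated edge and main-branch features; it is an assumption about the general mechanism, not the authors' Edge-Guided Fusion module, and all channel sizes are placeholders.

```python
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    """Fuse edge features with main-branch features via channel attention (illustrative)."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                              # squeeze: global context per channel
            nn.Conv2d(2 * channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),                                         # per-channel gating weights
        )
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, main_feat, edge_feat):
        fused = torch.cat([main_feat, edge_feat], dim=1)          # (B, 2C, H, W)
        weights = self.attn(fused)                                # (B, C, 1, 1)
        return self.proj(fused) * weights + main_feat             # gated fusion with residual
```

The residual connection simply keeps the main-branch features intact when the attention weights are small; that detail is a design choice of this sketch, not something stated in the paper.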

Pros:

1)  This article illustrated the challenge and motivation well.

2)  The method is written in detail, and the reasons for designing the network model are analyzed.

3)  The results are convincing, showing improved performance in unstructured environments.

Cons:

1)  Compared to the baseline, does your method harm performance in structured regions? Perhaps you should use the full test set to verify robustness.

2)  You should report the FLOPs and parameter count of the model (a brief sketch of how these could be obtained follows this list).

3)  Please state the limitations of your method.
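Regarding comment 2), a minimal sketch of how the trainable-parameter count could be reported in PyTorch is shown below; the model name and the 1 × 64 × 2048 range-image input shape are assumptions, and FLOPs/MACs would typically come from a third-party profiler (e.g., the `thop` package), which is only mentioned here rather than shown.

```python
import torch

def count_parameters(model):
    """Total number of trainable parameters of a PyTorch model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Hypothetical usage, assuming a 64 x 2048 single-channel range image as input:
# model = MySegmentationNet()
# print(f"params: {count_parameters(model) / 1e6:.2f} M")
# dummy = torch.randn(1, 1, 64, 2048)   # would also serve as a FLOPs-profiler input
```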

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

This paper proposes using an edge prediction task to assist the semantic segmentation task in 3D unstructured environments. The segmentation results show improvement compared to existing works. The paper's overall structure is well organized and clearly presents the work. I have the following questions and comments:

1. The related work is not comprehensive; there are other edge-guided methods for 3D point clouds, e.g., JSENet. I suggest you compare the proposed method with existing methods.

(Hu, Z., Zhen, M., Bai, X., Fu, H., & Tai, C. (2020). JSENet: Joint Semantic Segmentation and Edge Detection Network for 3D Point Clouds. ECCV 2020. https://doi.org/10.1007/978-3-030-58565-5_14)

2. In Figure 1, it would be beneficial to 1) include the blocks used in the main branch (currently only the feature maps are shown) and corresponding legends as done in other modules, 2) label the output size for each layer, and 3) add the missing loss functions to the figure.

3. Formulation (1) is similar to RangeNet++, and a few other formulations, such as the loss functions, are identical to SalsaNet. I suggest you avoid this and focus on your original contribution.

4. In Figure 3, it would help if you enlarged the region to better show 1) the affected edge-map area and 2) the differences between the results without and with inpainting.

5. In the main branch method section, if I understand correctly, you use three 1×1 convolution operations instead of the original setting in SalsaNext. Could you explain this change? It would be beneficial to add an ablation study comparing these two settings to showcase the benefits of your design.
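For reference, the sketch below shows what a stack of three 1×1 convolutions of the kind mentioned in comment 5 could look like in PyTorch; the channel sizes, activation choice, and function name are guesses for illustration, not the authors' implementation.

```python
import torch.nn as nn

def one_by_one_block(in_ch, mid_ch, out_ch):
    """Three stacked 1x1 convolutions with nonlinearities (illustrative placeholder)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, kernel_size=1),
        nn.LeakyReLU(inplace=True),
        nn.Conv2d(mid_ch, mid_ch, kernel_size=1),
        nn.LeakyReLU(inplace=True),
        nn.Conv2d(mid_ch, out_ch, kernel_size=1),
    )
```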

6. Again in the main branch method section, this part is not clear to me. It would be beneficial to provide more detail on the architecture in the appendix. Also, the Pixel-Shuffle layer has already been used in SalsaNext; it would be helpful to remark on that. If it is a new proposal of yours, it would be useful to provide an ablation study to highlight its advantages.
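To make the Pixel-Shuffle remark in comment 6 concrete, here is a minimal sketch of a typical pixel-shuffle upsampling block in PyTorch, as used in SalsaNext-style decoders; the convolution preceding the shuffle and the channel sizes are illustrative assumptions.

```python
import torch.nn as nn

class PixelShuffleUp(nn.Module):
    """Upsample a (B, C, H, W) feature map by a factor `scale` using pixel shuffle (illustrative)."""

    def __init__(self, in_ch, out_ch, scale=2):
        super().__init__()
        # expand channels so PixelShuffle can trade them for spatial resolution
        self.conv = nn.Conv2d(in_ch, out_ch * scale * scale, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)   # (B, C*s^2, H, W) -> (B, C, s*H, s*W)

    def forward(self, x):
        return self.shuffle(self.conv(x))
```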

7. In the EAB module, how can you concatenate inputs of different sizes? If I understand correctly, the outputs are always (Ci, H, W) in each EAB layer. However, one of the inputs (the feature map from the main branch) has a different size at each encoder stage. It would be helpful to provide more details in the appendix or release your code as open source.
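On the size-mismatch question in comment 7, one common way to concatenate feature maps with different spatial sizes is to resample them to a shared resolution first; the sketch below shows that generic pattern and is only an assumption about what the EAB module might do.

```python
import torch
import torch.nn.functional as F

def concat_multiscale(features, target_hw):
    """Resize a list of (B, Ci, Hi, Wi) maps to target_hw and concatenate along channels."""
    resized = [
        F.interpolate(f, size=target_hw, mode="bilinear", align_corners=False)
        for f in features
    ]
    return torch.cat(resized, dim=1)   # (B, sum(Ci), H, W)
```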

8. I suggest you provide a reference for the inpainting algorithm. If it is your own idea, you need to provide pseudocode.
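To make the pseudocode request in comment 8 concrete, below is a rough sketch of a simple neighbour-averaging fill for empty range-image pixels; it is a generic example of such an inpainting step, not the authors' algorithm, and the empty-pixel marker and iteration count are arbitrary.

```python
import numpy as np

def fill_empty_pixels(range_image, empty_val=-1.0, max_iters=3):
    """Iteratively fill empty pixels with the mean of their valid 4-neighbours (illustrative)."""
    filled = range_image.copy()
    for _ in range(max_iters):
        empty = filled == empty_val
        if not empty.any():
            break
        padded = np.pad(filled, 1, mode="edge")
        neighbours = np.stack([
            padded[:-2, 1:-1], padded[2:, 1:-1],   # up, down
            padded[1:-1, :-2], padded[1:-1, 2:],   # left, right
        ])
        valid = neighbours != empty_val
        counts = valid.sum(axis=0)
        sums = np.where(valid, neighbours, 0.0).sum(axis=0)
        can_fill = empty & (counts > 0)
        filled[can_fill] = sums[can_fill] / counts[can_fill]
    return filled
```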

9. I could not find a description of the un-projection method for mapping the 2D range image back to the 3D point cloud. As you use an inpainting method when projecting 3D point clouds into range images, will this affect the un-projection process? If you interpolate some pixels, will the generated point cloud differ from the ground truth? If so, how do you calculate test accuracy?
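For comment 9, one standard approach (used in RangeNet++-style pipelines) is to store the (row, column) index of every original 3D point during projection and read the predicted labels back at those indices, so inpainted or interpolated pixels never create new points and accuracy is still computed on the original point cloud; the sketch below assumes the projection step returns those indices, as in the earlier projection sketch, and is not a description of the authors' method.

```python
import numpy as np

def unproject_labels(label_image, point_indices):
    """Assign a semantic label to every original 3D point (illustrative).

    label_image   : (H, W) per-pixel class predictions
    point_indices : (v, u) arrays from the projection step, one entry per point
    """
    v, u = point_indices
    return label_image[v, u]          # (N,) labels; no interpolated points are created
```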

10. In the results section, it would be better to provide the generated edge maps to showcase your results (if there are too many, you can put them in the appendix section).

11. I suggest strengthening the discussion section and separating it from the conclusions to improve the quality of the paper.

 

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 3 Report

The authors have addressed the suggestions made.
