Article
Peer-Review Record

Unstructured Road Segmentation Based on Road Boundary Enhancement Point-Cylinder Network Using LiDAR Sensor

Remote Sens. 2021, 13(3), 495; https://doi.org/10.3390/rs13030495
by Zijian Zhu 1, Xu Li 1,*, Jianhua Xu 2, Jianhua Yuan 3 and Ju Tao 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 30 December 2020 / Revised: 27 January 2021 / Accepted: 29 January 2021 / Published: 30 January 2021

Round 1

Reviewer 1 Report

In the paper ‘Unstructured Road Segmentation Based on Road Boundary Enhancement Point-Cylinder Network Using LiDAR Sensor’ the authors present a neural network for road segmentation based on a cylindrical voxelization structure. To further improve the results, the network makes use of an additional input from a preliminary road boundary point extraction based on a traditional RANSAC algorithm. The linguistic presentation is acceptable for large parts of the paper, but certain passages definitely need a thorough revision. From a scientific point of view the paper is valuable, but it is in some parts incomplete or even erroneous, which needs to be fixed before publication.

A short section on how the coordinate system of the LiDAR data is defined is missing. For readers who are unfamiliar with data for self-driving technology this is not obvious (the scanner centre is the coordinate system's origin and the xy-plane is parallel to the road surface!). The definition of the coordinate system is important for both the 3D RANSAC-Boundary algorithm and the Point-Cylinder substructure. A small figure would be helpful. On the other hand, computing a 3D plane from three points is so basic that the formulae do not need to be presented (in case paper length is an issue). By the way, why not use vector notation – it would be far more compact and easier to read.

The 3D RANSAC-Boundary algorithm is basically fine, but in its current form it will not work if the scanner xy-plane is tilted against the road surface (e.g. caused by a speed bump or slope change). Doing a simple plane adjustment and then computing the (signed) distance to the adjusted plane would resolve this issue. Furthermore, the algorithm will fail in case of vertically curved road surfaces. If not tackled, these two aspects should at least be mentioned in the paper.

There are several issues with formula (5). So the cylinder coordinates are l(k), theta(k) and z(k), right? Explicitly mention that, maybe even before the formula. The floor operation does a discretization (i.e. voxelization) of the cylinder coordinates, right? But why is there no discretization for theta? Is this because the data are already captured that way? This should be mentioned. The last term “floor(z(k) * v) = w” should be “floor(z(k) * w) = v”, shouldn't it? And w is the ‘voxel resolution’ in the z direction, right? w is not mentioned. Or should it be ‘r’?

The class labels (6 classes?) are defined confusingly. Around line 361: Background? I guess this should be ground. People = person? A colour legend for figure 4 and figure 5 would also be helpful. Roads are rather magenta than red (as described in the figure 4 caption).

Figure 1 very nicely shows the overall structure of the proposed network. Since the boundary-enhanced network performs better than the non-enhanced network, I tried to understand what the network actually “sees” through the duplication of the boundary points. Is there an analogy to classical mathematical methods, e.g. increasing the weight of the “boundary points”? I also asked myself whether a similar improvement could be achieved by adding the boundary flag as an additional input channel. Please comment on that (maybe in 3.3.2 in the first paragraph)

The results of the ‘ideal’ paper should be reproducible. For NN papers this is especially difficult if the authors do not provide the network weights in some sort of repository. I know that is not commonly done yet, but the authors should think about it for future research. Nevertheless, the presented algorithm uses several parameters and thresholds whose values are never mentioned, e.g. distanceTol, zTol, voxel resolution r, number of epochs for training, etc. Do list them somewhere!

Some abbreviations were not introduced: LiDAR, RANSAC, IoU, SGD. By the way, the spelling of LiDAR changes several times within the paper.

Further minor comments are made directly within the attached PDF.

Comments for author File: Comments.pdf

Author Response

Response to Reviewer 1 Comments

 

Dear Reviewer:

Thanks a lot for your precious time and thorough review of the manuscript entitled “Unstructured Road Segmentation Based on Road Boundary Enhancement Point-Cylinder Network Using LiDAR Sensor” (ID: remotesensing-1076834). Here, we would like to express our sincere appreciation to you for the valuable comments, which are very important to improve the quality of the paper.

 

 

Point 1: A short section on how the coordinate system of the LiDAR data is defined is missing. For readers who are unfamiliar with data for self-driving technology this is not obvious (the scanner centre is the coordinate system's origin and the xy-plane is parallel to the road surface!). The definition of the coordinate system is important for both the 3D RANSAC-Boundary algorithm and the Point-Cylinder substructure. A small figure would be helpful. On the other hand, computing a 3D plane from three points is so basic that the formulae do not need to be presented (in case paper length is an issue). By the way, why not use vector notation – it would be far more compact and easier to read.


 

Response 1: Thanks a lot for your valuable comment. As you suggested, these coordinate systems are not familiar to some readers, and if they are not clarified, readers may misunderstand. Therefore, we added a description of the LiDAR point cloud data coordinate system in lines 58-61 of the text, and a description of the cylinder coordinate system in lines 273-275.

    Indeed, as you said, calculating the 3D plane from three points is very basic, so the formula has been deleted. The remaining formulas are expressed in a form close to the code, which makes programming easier for readers, so we did not switch to vector notation.

 

 

Point 2: The 3D RANSAC-Boundary algorithm is basically fine, but in its current form it will not work if the scanner xy-plane is tilted against the road surface (e.g. caused by a speed bump or slope change). Doing a simple plane adjustment and then computing the (signed) distance to the adjusted plane would resolve this issue. Furthermore, the algorithm will fail in case of vertically curved road surfaces. If not tackled, these two aspects should at least be mentioned in the paper.

 

Response 2: Many thanks for your thorough consideration and valuable suggestion. For the situation where the scanner xy-plane is tilted against the road surface, we modified the algorithm pseudo-code to resolve this case. However, for vertically curved road surfaces, this article's boundary extraction algorithm cannot extract the rough road boundary very well, so we added a description in lines 499-503.
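
For clarity, the adjusted procedure can be sketched as follows. This is only a minimal illustrative sketch, not the paper's implementation: the function names, the iteration count, the thresholds distance_tol, z_tol and z_max, and the curb-band criterion are illustrative assumptions.

import numpy as np

def ransac_ground_plane(points, iters=100, distance_tol=0.2):
    """Fit the dominant plane (assumed to be the ground) with RANSAC.
    points: (N, 3) LiDAR coordinates. Returns (normal, d) with the plane
    written as normal . p + d = 0 and |normal| = 1."""
    best_inliers, best_plane = 0, None
    rng = np.random.default_rng(0)
    for _ in range(iters):
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p2 - p1, p3 - p1)
        norm = np.linalg.norm(normal)
        if norm < 1e-6:                      # degenerate (near-collinear) sample
            continue
        normal /= norm
        d = -normal @ p1
        dist = np.abs(points @ normal + d)   # unsigned point-to-plane distance
        inliers = np.count_nonzero(dist < distance_tol)
        if inliers > best_inliers:
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane

def rough_boundary_candidates(points, normal, d, z_tol=0.1, z_max=1.0):
    """Keep points whose signed height above the adjusted plane lies in
    (z_tol, z_max), i.e. slightly above the ground but below the clipping
    threshold, as rough (curb-like) boundary candidates."""
    if normal[2] < 0:                        # orient the plane normal upward
        normal, d = -normal, -d
    signed_h = points @ normal + d           # signed distance to the adjusted plane, not raw z
    mask = (signed_h > z_tol) & (signed_h < z_max)
    return points[mask]

Working with the signed distance to the fitted plane, rather than with the raw z coordinate, is what makes the criterion insensitive to a tilt between the scanner xy-plane and the road surface.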

    The current algorithm only discusses some unstructured roads in the urban environment, and there are still some scenarios that have not been considered for the time being. In the future, we will continue our work so that the algorithm can segment the road well in more possible unstructured scenarios.

 

 

Point 3: There are several issues with formula (5). So the cylinder coordinates are l(k), theta(k) and z(k), right? Explicitly mention that, maybe even before the formula. The floor operation does a discretization (i.e. voxelization) of the cylinder coordinates, right? But why is there no discretization for theta? Is this because the data are already captured that way? This should be mentioned. The last term “floor(z(k) * v) = w” should be “floor(z(k) * w) = v”, shouldn't it? And w is the ‘voxel resolution’ in the z direction, right? w is not mentioned. Or should it be ‘r’?

 

Response 3: Thanks a lot for reviewing our manuscript carefully. Indeed, due to our negligence, there are some problems with the formula. We have corrected it and added a description of the relevant coordinate system in lines 273-275. The angle is not discretized in the formula because, in the actual implementation, the calculated angle is directly rounded to an integer degree; that is, the discretization resolution of the angle is pi/180. The expression of the last term was incorrect and has been corrected. Your understanding is correct: in this paper, the resolution in the z direction is also r, which is set to 0.05 in the implementation. The specific values of these parameters have also been added in lines 283-285 of the paper.
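
To make the corrected discretization concrete, a minimal sketch is given below. Only the value r = 0.05 and the 1-degree angular step are taken from this response; the function name, the use of the same resolution r for both the radial and the vertical axis, and the convention of dividing by the resolution (rather than multiplying by its reciprocal, as in the paper's notation) are assumptions made for illustration.

import numpy as np

def cylinder_voxel_index(points, r=0.05, angle_res=np.pi / 180.0):
    """Map Cartesian LiDAR points (x, y, z) to discrete cylinder-voxel
    indices (l_idx, theta_idx, z_idx)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    l = np.sqrt(x**2 + y**2)                              # radial distance
    theta = np.arctan2(y, x)                              # azimuth angle in (-pi, pi]
    l_idx = np.floor(l / r).astype(int)                   # radial bin
    theta_idx = np.floor(theta / angle_res).astype(int)   # 1-degree angular bin
    z_idx = np.floor(z / r).astype(int)                   # vertical bin, resolution r = 0.05
    return np.stack([l_idx, theta_idx, z_idx], axis=1)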

 

 

Point 4: The class labels (6 classes?) are defined confusingly. Around line 361: Background? I guess this should be ground. People = person? A colour legend for figure 4 and figure 5 would also be helpful. Roads are rather magenta than red (as described in the figure 4 caption).

 

Response 4: Thanks a lot for your insightful suggestion. We are sorry that the inconsistent terminology caused some misunderstanding. In the description, 'background' is more appropriately called 'others' and has been corrected. The actual classification is road, buildings, vegetation, vehicles, people, and others. Lines 368-374 have been corrected and unified. At the same time, the colour description of roads in the caption of Figure 4 has also been revised.

By the way, let us explain the dataset you asked about in the PDF comment at line 442. In fact, we made a 32-beam LiDAR dataset (1000 frames of point cloud data, 2 scenes) in accordance with the KITTI format. To reduce the amount of manual annotation, we mapped the original 19 KITTI classes to 6 classes and then labelled, trained, and tested on the 32-beam LiDAR data. These two scenes are independent of the original KITTI data; following the 21 scenes of the original KITTI data, we named them scenes 22 and 23. In the simple scenario of the first experiment, our model is still trained on the KITTI data (19 classes). In the second, more complex experiment, we added the two 32-beam LiDAR scenes for training (6 classes).
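
The mapping itself can be sketched as a simple lookup table. The grouping below is purely hypothetical and only illustrates the idea; the actual assignment of the 19 fine-grained KITTI classes to the 6 coarse classes is not spelled out here.

# Hypothetical fine-to-coarse label mapping, for illustration only.
FINE_TO_COARSE = {
    "road": "road", "parking": "road",
    "building": "buildings", "fence": "buildings",
    "vegetation": "vegetation", "trunk": "vegetation", "terrain": "vegetation",
    "car": "vehicles", "truck": "vehicles", "bicycle": "vehicles",
    "person": "people", "bicyclist": "people",
}

def map_label(fine_label: str) -> str:
    """Collapse a fine-grained label to one of the 6 coarse classes;
    anything not listed falls into 'others'."""
    return FINE_TO_COARSE.get(fine_label, "others")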

When drawing the figures, a colour legend would indeed be a good idea, but the focus is on the distinction between the road and the surrounding environment (regardless of whether the output has 19 or 6 classes); roads, buildings, and vegetation are drawn uniformly after mapping. A complete colour legend might therefore distract readers, so only the colours corresponding to roads, buildings, and vegetation are described.

 

 

 

Point 5: Figure 1 very nicely shows the overall structure of the proposed network. Since the boundary-enhanced network performs better than the non-enhanced network, I tried to understand what the network actually “sees” through the duplication of the boundary points. Is there an analogy to classical mathematical methods, e.g. increasing the weight of the “boundary points”? I also asked myself whether a similar improvement could be achieved by adding the boundary flag as an additional input channel. Please comment on that (maybe in 3.3.2 in the first paragraph).

 

Response 5: Thanks a lot for your professional comment. In the 2D image field, the paper "Embedding Structured Contour and Location Prior in Siamesed Fully Convolutional Networks for Road Detection" uses the Canny operator to extract the image's edge information to enhance the neural network. We think there is a certain analogy for LiDAR point clouds: the extracted boundary can be used as a "prior value" to help the network classify different adjacent objects more accurately. Because it is difficult to extract edge information from LiDAR data in large scenes, this paper uses a simple method to extract the road's possible boundaries in real time as the "prior value" of the network's boundary information. As you said, such an algorithm increases the weight of the boundary points, making them more likely to be correctly classified; in semantic segmentation, the boundaries between different objects are often the hardest parts to distinguish. This explanation has been added to lines 303-307, in the first paragraph of Section 3.3.2.

As for the boundary marker as an input channel, this may indeed be effective and could reduce the network's size. However, if such a flag is only fused at the raw data level, how to perform point convolution and cylinder or voxel convolution on data of shape (N, 2) requires a more sophisticated design.
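
The two variants can be contrasted with a small sketch. The (x, y, z, intensity) input layout and the function names are assumptions for illustration; this is not the paper's implementation.

import numpy as np

def enhance_by_duplication(points, boundary_mask):
    """Variant in the spirit of the paper: append the extracted boundary
    points to the input again, which effectively raises their weight.
    points: (N, 4) array (x, y, z, intensity); boundary_mask: (N,) bool."""
    return np.concatenate([points, points[boundary_mask]], axis=0)  # (N + Nb, 4)

def enhance_by_flag_channel(points, boundary_mask):
    """Alternative raised by the reviewer: keep the N points and add a
    0/1 boundary flag as an extra input channel instead."""
    flag = boundary_mask.astype(points.dtype)[:, None]               # (N, 1)
    return np.concatenate([points, flag], axis=1)                    # (N, 5)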

 

 

Point 6: The results of the ‘ideal’ paper should be reproducible. For NN papers this is especially difficult if the authors do not provide the network weights in some sort of repository. I know that is not commonly done yet, but the authors should think about it for future research. Nevertheless, the presented algorithm uses several parameters and thresholds whose values are never mentioned, e.g. distanceTol, zTol, voxel resolution r, number of epochs for training, etc. Do list them somewhere!

 

Response 6: Thanks a lot for your insightful suggestion. As you say, releasing the code helps the development of the open-source community and the related technologies. Regrettably, this paper is one of the main research results of a funded project, and one of the participating enterprises believes that BE-PCFCN has potential commercial application value. At the same time, our current work is to port the algorithm to C++ and run experiments on the actual vehicle platform, so unfortunately it cannot be open-sourced for the time being.

    More detailed parameters in the algorithm have also been added to lines 283-285 and 391.

 

 

Point 7: Some abbreviations were not introduced: LiDAR, RANSAC, IoU, SGD. By the way, the spelling of LiDAR changes several times within the paper.

 

Response 7: Thanks a lot for your insightful suggestion. All abbreviations are now introduced in the article: LiDAR (line 50), RANSAC (line 216), IoU (line 334), SGD (line 345). At the same time, we unified the spelling of LiDAR throughout the paper. Thank you again for your careful reminder.

 

 

Point 8: Further minor comments are directly made within the attached pdf.

 

Response 8: Thank you very much for making so many careful suggestions on our article. We have read your comments very carefully and addressed them one by one.

 

 

Finally, the authors would like to express our gratitude again to the reviewer for the valuable comments and suggestions, as well as the time and efforts spent in the review. With the insightful comments and suggestions made by the editors and reviewers, we are able to enhance and improve the quality of the paper.

If there are other problems or further requirements, please do not hesitate to contact us.

Sincerely yours,

The authors.

 

Author Response File: Author Response.docx

Reviewer 2 Report

This paper implements a semantic segmentation algorithm for unstructured roads, based on a point-cylinder network combined with a road enhancement module. My main comments and questions are the following:

1. Line 70-72: The algorithm is based to a large extent on height differences to extract road boundaries. But what if the boundaries of an unstructured road do not present any clear height differences compared to the road surface itself? Roads without boundaries of higher vegetation or buildings exist: searching for 'countryside road' on the internet easily yields some example images. These perhaps do not constitute the majority of roads, but when it comes to the safety of self-driving cars, all possible cases should be considered.

What is a suitable value for the threshold (line 209) that could be used in everyday practice?

This brings us to the more general question of whether the proposed algorithm is reliable enough to segment highly unstructured roads without the aid of a different kind of sensor (e.g. a camera). Where is the border between the roads it can reliably handle and the roads it cannot handle? Although the results in Table 2 in my view do not seem good enough to always guarantee a safe situation, the authors seem to conclude that the algorithm can safely handle all situations by itself (lines 469-470: 'It has strong robustness and is suitable for scenes that require high road segmentation accuracy'; see also lines 485-486).

The above questions should be discussed in the text as they have important consequences for the use of these algorithms in practice.

2. In lines 132-134, it is stated that 'these methods' achieve good results in structured road segmentation, but do not perform well on unstructured road segmentation. Which methods are meant here? Of the several methods mentioned in the previous paragraph, the publications (e.g. [14, 17, 18, 19]) do not include tests on road segmentation. Therefore, the authors should clarify on what basis they conclude that these methods perform well on structured road segmentation but not on unstructured road segmentation. They should add here some references in which tests on unstructured roads have been performed and the results were negative.

3. It would be good to add some references in the 'methods' section, on which the current manuscript is based:

- reference to point-voxel CNN [18];

- reference to the RANSAC algorithm (lines 205-206);

- reference to 3D cylinder convolution [28] (to be added in section 3.3.1 and near equation 5).

In this way the reader can better distinguish which parts of the method are existing work, and which parts the authors have developed themselves.

4. The approach is based on deep learning algorithms, although it might be interesting to know the result of a simple segmentation approach based only on the module explained in section 3.2. It would be good to include this in the experiments in section 4, to give an idea to the reader.

5. Is the assumption in line 204 ('there is no plane larger than the ground in an unstructured road scene') always valid? Suppose that I am driving on a road along a lake (a flat surface). Please describe how the algorithm would behave in this case.

Other remarks:

- Line 51: 'color and texture features used by the camera sensor are not robust enough': please explain why this is, and add reference(s) to support this statement.

- Line 63: 'KITTI dataset': although there is a reference to this dataset in the paper ([32]), it is preferable that this reference is added here, at the first occurrence.

- Line 144-145: 'However, due to the limitation ... reduce the demand for the data.' Please clarify this sentence.

- Line 145: Gao et al. is ref. [26], not [32].

- Line 149: The author's surname is Holder, ref. [27], not [21].

- Line 170-171: 'poor data locality and irregularity': please explain what is meant here.

- Line 213-217: please explain in words what the vectors v1, v2 and the temporary variable i, j and k denote, to make this more easily understandable.

- Line 329: Lovasz loss: please explain briefly how this works.

- Line 338: SGD: please write in full.

- Line 385: 'SPVNAS is the best algorithm for semantic segmentation among open source codes': Is this so in general? Does this statement include all the work cited in the paper? Please specify and add a reference on which this statement is based.

Illustrations:

- Fig. 2 in my view does not illustrate the boundary extraction process in a clear way, since in image (a) it is not clear where the roads are situated. Please use a different lidar image that illustrates this process more clearly, or add a camera image to make Fig. 2 clearer, or some other image or diagram in which the roads are clearly indicated.

References:

Please correct or complete the references (a few examples are given below, but please check all references):

[6] in: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, November 4-8, 2019

[14] in: ACM Transactions on Graphics, Vol. 37, No. 6, 217 (2018).

[17] Add conference information: Long Beach, CA, USA, 15-20 June 2019; Add pages: 9623-9632

[19] Add conference information: Salt Lake City, UT, USA, 18-23 June 2018; Add pages: 9224-9232

[24] Check spelling of first author's name

[29] Add pages.

Language and spelling:

- Lines 61-63: 'Some algorithms (...) directly extract ... road segmentation algorithms': the structure of this sentence is not entirely clear. Please clarify.

- Line 70: 'determine'

- Line 104: 'self-driving': a word needs to be added?

- Line 121-122: 'While PointNet (...) feature extraction': this sentence cannot stand on its own; please correct.

- Line 132: '[21-24]e': drop 'e'

- Line 260-261: 'legitimacy'

- Line 344: 'typical'

Author Response

Response to Reviewer 2 Comments

 

Dear Reviewer:

Thanks a lot for your precious time and thorough review of the manuscript entitled “Unstructured Road Segmentation Based on Road Boundary Enhancement Point-Cylinder Network Using LiDAR Sensor” (ID: remotesensing-1076834). Here, we would like to express our sincere appreciation to you for the valuable comments, which are very important to improve the quality of the paper.

 

 

Point 1: Line 70-72: The algorithm is based to a large extent on height differences to extract road boundaries. But what if the boundaries of an unstructured road do not present any clear height differences compared to the road surface itself? Roads without boundaries of higher vegetation or buildings exist: searching for 'countryside road' on the internet easily yields some example images. These perhaps do not constitute the majority of roads, but when it comes to the safety of self-driving cars, all possible cases should be considered.


 

Response 1: Thanks a lot for your valuable comment. As you said, if there is no obvious boundary between the road and the surrounding environment, this enhancement will to a certain extent be ineffective. This paper does have some limitations in the selection of unstructured road scenes, and most of our discussion concerns the urban environment (lines 47-49); even if roads are partially damaged, there are still boundaries. If there is no clear road boundary, the data need to be manually labelled for training; that is, the algorithm degenerates to the situation without prior enhancement.

The essence of the enhancement algorithm in this paper is to give the network a prior value of the boundary so that it can better distinguish between different kinds of adjacent objects, which is the main difficulty of the segmentation problem. These statements were added to lines 303-307 of the paper.

Simultaneously, this paper also compares the performance of a single-branch network without boundary enhancement. The result shows that the IoU of road segmentation can still reach more than 90% when the 64-beam LiDAR is used as input. These statements were added to lines 425-427 of the paper.

In general, this paper's enhancement algorithm can achieve enhancement when there is a boundary, and it will degenerate into a network without road enhancement when there is no boundary. These statements were added to lines 499-503 of the paper.

Finally, thank you again for your valuable suggestions. In view of the degradation of the existing algorithm in some scenes, we will continue our work in the future so that the algorithm can segment the road well in more possible unstructured scenes.

 

 

Point 2: What is a suitable value for the threshold (line 209) that could be used in everyday practice?

 

Response 2: Thank you very much for your question. The parameters in the text are given specific values in line 390, and the threshold for exceeding the ground plane is set to 1.

 

 

Point 3: This brings us to the more general question of whether the proposed algorithm is reliable enough to segment highly unstructured roads without the aid of a different kind of sensor (e.g. a camera). Where is the border between the roads it can reliably handle and the roads it cannot handle? Although the results in Table 2 in my view do not seem good enough to always guarantee a safe situation, the authors seem to conclude that the algorithm can safely handle all situations by itself (lines 469-470: 'It has strong robustness and is suitable for scenes that require high road segmentation accuracy'; see also lines 485-486).

 

 

Response 3: Thank you very much for your comment. We are sorry for the confusing presentation. The comparison results in Table 2 are based on 32-beam LiDAR as input, whereas Table 1 is based on the test results of 64-beam LiDAR on the KITTI dataset. It can be seen that when the 64-beam LiDAR is used as input, the road segmentation result is excellent, but with the 32-beam LiDAR as input it is somewhat lacking. The specific analysis is in lines 454-465 of the article.

    What this article wants to express is that, when using the same sensor, the proposed algorithm BE-PCFCN has higher robustness and higher accuracy. Even when the sensor itself cannot provide enough information, the algorithm in this paper still achieves better results than the other algorithms. Therefore, the conclusions drawn in this article should be read as relative comparisons rather than absolute, unconditional claims.

    Thank you very much again for your suggestion. We re-expressed the results and added the analysis of Table 2 in lines 461-465, and changed the statement of the conclusion in lines 479-482 and 497-498.

 

 

Point 4: In lines 132-134, it is stated that 'these methods' achieve good results in structured road segmentation, but do not perform well on unstructured road segmentation. Which methods are meant here? Of the several methods mentioned in the previous paragraph, the publications (e.g. [14, 17, 18, 19]) do not include tests on road segmentation. Therefore, the authors should clarify on what basis they conclude that these methods perform well on structured road segmentation but not on unstructured road segmentation. They should add here some references in which tests on unstructured roads have been performed and the results were negative.

 

Response 4: Thank you very much for your professional advice. The original expression was indeed inaccurate. Based on your suggestion, we changed the description to: "However, these algorithms, such as KPConv, RandLA-Net, and SPVNAS, are all aimed at point cloud semantic segmentation. And there is a lack of relevant experiments to prove whether these algorithms can be applied to unstructured roads directly." (lines 136-139).

    The actual meaning is that, due to the lack of improvements for unstructured road scenes and of related datasets, there are few algorithms for unstructured road segmentation. At the same time, there is a lack of relevant experiments to prove whether semantic segmentation methods developed for structured scenes can be applied to unstructured roads.

 

 

Point 5: It would be good to add some references in the 'methods' section, on which the current manuscript is based:

 

- reference to point-voxel CNN [18];

 

- reference to the RANSAC algorithm (lines 205-206);

 

- reference to 3D cylinder convolution [28] (to be added in section 3.3.1 and near equation 5).

 

In this way the reader can better distinguish which parts of the method are existing work, and which parts the authors have developed themselves.

 

Response 5: Thank you very much for your suggestions. We have added the relevant references in the methods section: a reference to cylinder convolution in line 183, a reference to RANSAC in line 216, and a reference to point-voxel CNN in line 259.

 

 

Point 6: The approach is based on deep learning algorithms, although it might be interesting to know the result of a simple segmentation approach based only on the module explained in section 3.2. It would be good to include this in the experiments in section 4, to give an idea to the reader.

 

Response 6: Thank you for your suggestion. First of all, we are sorry that we did not elaborate on the function of the module in Section 3.2, which caused a misunderstanding. A discussion of the module's role in the network has been added to lines 303-307. This module alone does not directly output segmentation results; it can only be concatenated onto the feature map as a prior value so that a greater response is obtained at the boundary.

    We have done a comparative test of adding and removing the road enhancement module in the experimental part. You can see it in Figure 4, and there is a comparison in Table 1, which is also explained in lines 401-403.

 

 

Point 7: Is the assumption in line 204 ('there is no plane larger than the ground in an unstructured road scene') always valid? Suppose that I am driving on a road along a lake (a flat surface). Please describe how the algorithm would behave in this case.

 

Response 7: Thank you for your question. As you said, such an assumption is not always true, but we still believe that it holds in the urban environment studied in this article (lines 47-49). Currently, this article does have such scenario limitations. When the assumption does not hold, the enhanced point cloud will be placed at an unrelated location. Since the point cloud labels do not change, even if the weight of some positions is enhanced, this does not negatively impact the segmentation result. Related discussions have been added to lines 211-214 and lines 499-503 of the paper.

 

 

Point 8: Line 51: 'color and texture features used by the camera sensor are not robust enough': please explain why this is, and add reference(s) to support this statement.

 

Response 8: Thank you for your suggestion. The camera sensor is not robust because it fails at night and is strongly affected by reflections during the day. We modified the statement and added references in lines 52-54.

 

 

Point 9: Line 63: 'KITTI dataset': although there is a reference to this dataset in the paper ([32]), it is preferable that this reference is added here, at the first occurrence.

 

Response 9: Thank you very much for the reminder. A reference has been added at line 68.

 

 

Point 10: Line 144-145: 'However, due to the limitation ... reduce the demand for the data.' Please clarify this sentence.

Response 10: Thank you very much for the reminder. Due to the lack of data, some algorithms that use deep learning for unstructured road segmentation employ techniques that rely on the data as little as possible. Two examples are given there: weakly supervised learning based on region growing, and transfer learning. These discussions are in lines 150-155.

 

 

Point 11: Line 145: Gao et al. is ref. [26], not [32].

 

Response 11: Thank you very much for the reminder. The reference has been corrected at line 150.

 

 

Point 12: Line 149: The author's surname is Holder, ref. [27], not [21].

 

Response 12: Thank you very much for the reminder. The reference has been corrected at line 153.

 

 

Point 13: Line 170-171: 'poor data locality and irregularity': please explain what is meant here.

 

Response 13: Thank you very much for the reminder. A related discussion has been added to lines 174-186 of the article: "The voxel-based method requires O(n) random memory accesses, in which n is the number of points. This method only needs to iterate over all points once to scatter them to their corresponding voxel grids. For the point-based method, however, gathering all the neighbor points requires at least O(kn) random memory accesses, in which k is the number of neighbors. To conclude, the point-based method has irregularity and poor data locality."
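
The two access patterns can be contrasted with a toy sketch (illustrative only, not taken from the paper's implementation; the function names are hypothetical):

import numpy as np

def voxel_scatter(points, voxel_idx, num_voxels):
    """Voxel-based style: one pass over the N points, scattering each point
    feature into its voxel -> O(n) random memory accesses."""
    grid = np.zeros((num_voxels, points.shape[1]))
    counts = np.zeros(num_voxels)
    for feat, v in zip(points, voxel_idx):          # single pass over all points
        grid[v] += feat
        counts[v] += 1
    return grid / np.maximum(counts, 1)[:, None]    # mean feature per voxel

def point_gather(points, neighbor_idx):
    """Point-based style: for every point, gather its k neighbours
    -> O(k*n) random memory accesses with poor data locality."""
    return np.stack([points[idx] for idx in neighbor_idx])   # (N, k, C)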

 

 

Point 14: Line 213-217: please explain in words what the vectors v1, v2 and the temporary variable i, j and k denote, to make this more easily understandable.

 

Response 14: Thank you very much for your suggestion. v1 and v2 are two vectors used in the plane calculation; i, j, and k are intermediate variables in the calculation process.

    Because the other reviewer considered the plane fit from three points too basic and recommended deleting it, we decided, after consideration, to remove these basic calculation formulas; we hope you can understand.

 

Point 15: Line 329: Lovasz loss: please explain briefly how this works.

 

Response 15: Thank you for your suggestion. A related explanation has been added to lines 336-338 of the article: "Lovász loss is the Lovász extension of the Jaccard (IoU) loss, which performs better. It is suitable for multi-class segmentation tasks that use IoU as the evaluation index."
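
For reference, the quantity being optimized can be written as follows (standard definition of the per-class Jaccard index and loss; the Lovász extension is a piecewise-linear convex surrogate of this loss that can be minimized directly):

J_c(y, \hat{y}) = \frac{\left|\{y = c\} \cap \{\hat{y} = c\}\right|}{\left|\{y = c\} \cup \{\hat{y} = c\}\right|},
\qquad
\Delta_{J_c}(y, \hat{y}) = 1 - J_c(y, \hat{y})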

 

 

Point 16: Line 338: SGD: please write in full.

 

Response 16: Thank you for your suggestion. The full form of SGD (stochastic gradient descent) is now written out in line 345.

 

 

Point 17: Fig. 2 in my view does not illustrate the boundary extraction process in a clear way, since in image (a) it is not clear where the roads are situated. Please use a different lidar image that illustrates this process more clearly, or add a camera image to make Fig. 2 clearer, or some other image or diagram in which the roads are clearly indicated.

 

Response 17: Thank you very much for pointing out the problem with Fig. 2 so professionally. Based on your suggestion, we have redrawn image (a) in Figure 2. In the image, pink represents roads and green represents the other classes.

 

 

Point 18: Please correct or complete the references (a few examples are given below, but please check all references):

[6] in: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, November 4-8, 2019

 

[14] in: ACM Transactions on Graphics, Vol. 37, No. 6, 217 (2018).

 

[17] Add conference information: Long Beach, CA, USA, 15-20 June 2019; Add pages: 9623-9632

 

[19] Add conference information: Salt Lake City, UT, USA, 18-23 June 2018; Add pages: 9224-9232

 

[24] Check spelling of first author's name

 

[29] Add pages.

 

Response 18: Thank you very much for raising these questions carefully. All references have been reviewed and revised.

 

 

Point 19: Language and spelling:

- Lines 61-63: 'Some algorithms (...) directly extract ... road segmentation algorithms': the structure of this sentence is not entirely clear. Please clarify.

 

- Line 70: 'determine'

 

- Line 104: 'self-driving': a word needs to be added?

 

- Line 121-122: 'While PointNet (...) feature extraction': this sentence cannot stand on its own; please correct.

 

- Line 132: '[21-24]e': drop 'e'

 

- Line 260-261: 'legitimacy'

 

- Line 344: 'typical'

 

Response 19: Thank you very much for pointing out these issues so carefully. All the grammar and sentence issues you raised have been revised.

 

Finally, the authors would like to express our gratitude again to the reviewer for the valuable comments and suggestions, as well as the time and efforts spent in the review. With the insightful comments and suggestions made by the editors and reviewers, we are able to enhance and improve the quality of the paper.

If there are other problems or further requirements, please do not hesitate to contact us.

Sincerely yours,

The authors.

 

 

Author Response File: Author Response.docx
