Article
Peer-Review Record

Local Transformer Network on 3D Point Cloud Semantic Segmentation

Information 2022, 13(4), 198; https://doi.org/10.3390/info13040198
by Zijun Wang, Yun Wang *, Lifeng An, Jian Liu and Haiyang Liu
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 19 February 2022 / Revised: 25 March 2022 / Accepted: 2 April 2022 / Published: 14 April 2022
(This article belongs to the Topic Big Data and Artificial Intelligence)

Round 1

Reviewer 1 Report

This manuscript contains exciting work on a local transformer network for 3D point cloud semantic segmentation. However, the manuscript is written in a style more like a report than a research article. The global innovativeness of the research has not been fully presented. The figures and tables that involve worldwide novel research should be described and discussed in more detail to emphasize the state-of-the-art novelty of the work. Please cite the newest (2018-2022) Web of Science journal papers.

Author Response

Dear Reviewer:

We are very grateful for your comments and valuable suggestions, which have further improved our paper. We sincerely hope that this revised manuscript has addressed all your comments and suggestions.

Sincerely,

Ms. Wang

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper presents interesting topics, but it is hampered by the missing meanings of the mathematical symbols used in many formulas. Therefore, the following points should be observed carefully for a better presentation:

- in formula 3, what does the concatenation operation ⊕ stand for?

- in formula 4, what does MLP stand for?

- in figure 2, please specify the role of KNN

- in formula 5, what do ??(???1), ??(???2), ??(???3) stand for? Please explain what batch normalization is

- what is V in formula 17 and ff.?

- Chinese text at row 253!!!

- in figure 6, what does up-sampling stand for?

- in formula 23, what does mij stand for? And what is Mij in formula 22? Maybe mij = Mij?

- what is the formula for obtaining the numbers in Tables 1, 2, 3, and 4?
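
Most of the symbols questioned above refer to standard point-cloud building blocks. For orientation only, the following PyTorch-style sketch shows how KNN grouping, concatenation (⊕) of relative-position features, and a shared MLP with batch normalization typically fit together in a local feature encoder; all names, shapes, and the choice of k are illustrative assumptions, not the authors' actual implementation.

    # Hedged sketch of a generic local feature encoder for point clouds:
    # KNN grouping, concatenation (⊕) of relative positions with neighbor
    # features, and a shared MLP with batch normalization. Illustrative only.
    import torch
    import torch.nn as nn

    def knn_group(xyz, k=16):
        """Return the indices of the k nearest neighbors of every point.

        xyz: (B, N, 3) point coordinates.
        """
        dist = torch.cdist(xyz, xyz)                         # (B, N, N) pairwise distances
        return dist.topk(k, dim=-1, largest=False).indices   # (B, N, k)

    class LocalFeatureEncoder(nn.Module):
        """Shared MLP (1x1 conv + BatchNorm + ReLU) applied to each neighbor."""
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Conv2d(in_dim, out_dim, kernel_size=1),
                nn.BatchNorm2d(out_dim),   # normalizes each channel over the batch
                nn.ReLU(inplace=True),
            )

        def forward(self, xyz, feats, k=16):
            B, N, _ = xyz.shape
            idx = knn_group(xyz, k)                              # (B, N, k)
            batch = torch.arange(B, device=xyz.device).view(B, 1, 1)
            neigh_xyz = xyz[batch, idx]                          # (B, N, k, 3)
            neigh_feat = feats[batch, idx]                       # (B, N, k, C)
            rel_pos = neigh_xyz - xyz.unsqueeze(2)               # relative coordinates
            # "⊕": concatenate relative position with neighbor features
            grouped = torch.cat([rel_pos, neigh_feat], dim=-1)   # (B, N, k, 3 + C)
            grouped = grouped.permute(0, 3, 1, 2)                # (B, 3 + C, N, k)
            return self.mlp(grouped)                             # (B, out_dim, N, k)

For C-dimensional input features, such an encoder would be built as LocalFeatureEncoder(3 + C, 64) and called with (xyz, feats); the concrete channel sizes in the paper may of course differ.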

Author Response

Dear Reviewer:

We are very grateful for your comments and valuable suggestions, which have further improved our paper. We sincerely hope that this revised manuscript has addressed all your comments and suggestions.

Sincerely,

Ms. Wang

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments

1 – The paper is well written and well organized.

2 – Line 90. Please define the "mIoU" acronym.

3 – Section 1 should end with a paragraph stating the organization of the remainder of the paper.

4 – Line 112. Please define the MLP acronym.

5 – Line 127. Please define the DGCNN acronym.

6 – Lines 129 and 130. We have "proposed a graph attention convolution (GAV), ...". I think it should be GAC.

7 – Line 163. We have the "normal of every point". Please explain what the normal is in this context.

8 – Please treat the equations as elements of text. The punctuation rules also apply to equations. After equations (1) and (2), we should have a ",". Equations (3) and (4) should end with a final dot. Equations (6) and (11) are missing a final dot. Please check all the equations.

9 – Line 177. We have "is the ℓ1 distance." It should be "is the ℓ1 norm.".

10 – Please do not use symbols with a double meaning. In equation (1), 'n' is the normal vector. In line 262, 'n' is the number of neighbors. Please check on this.

11 – On the experimental results from Figure 7 to Figure 11:

  1. In the captions, please change "line" to "row".
  2. Explain what the difference is between the (a), (b), and (c) cases.

12 – On the experimental results from Figure 9 to Figure 11, please change "Third line is the results ..." -> "The third row has the results ...".


Writing

1 – Line 22: "sematic" -> "semantic".

2 – Line 30: "Convolutional neural network has" -> "Convolutional neural network (CNN) has"; "researchers considered how" -> "with researchers considering how".

3 – Line 34: "projected the 3D point clouds onto the 2D plane, generated the bird's" -> "project the 3D point clouds onto the 2D plane, generating the bird's".

4 – Line 43: "segmentation accuracy; When the voxel" -> "segmentation accuracy. When the voxel".

5 – Line 58: "This paper proposed a local transformer" -> "This paper proposes a local transformer".

6 – Line 60: "we used transformer to learn" -> "we use transformer to learn".

7 – Line 70: "adopted encoder-decoder structure." -> "adopts encoder-decoder structure.".

8 – Line 74: "In decoder layer, we used the" -> "In the decoder layer, we use the".

9 – Line 85: "We proposed two different key" -> "We propose two different key".

10 – Line 99: "Squeezeseg[8] converts" -> "SqueezeSeg [8] converts".

11 – Line 103: "range image (RV) and bird's view image (BEV)," -> "range view (RV) image and bird's eye view (BEV) image,".

12 – Line 105: "The network divided the point" -> "The network divides the point".

13 – Line 106: "and then extracted the features" -> "and then extracts the features".

14 – Line 124: "and then obtained" -> "and then obtains".

15 – Line 125: "local feature map; Finally" -> "local feature map. Finally".

16 – Line 127: "resorted to" -> "resorts to".

17 – Line 131: "more accurate represent of local features" -> "more accurate representations of local features".

18 – Line 146: "In this paper, we proposed a local transformer" -> "In this paper, we propose a local transformer".

19 – Line 171: "of points are concatenate by the normalized" -> "of points are concatenated by the normalized".

20 – Line 173: "and encode each neighbor" -> "and encodes each neighbor".

21 – Line 178: "Then original input feature" -> "Then, original input feature".

22 – Line 188: "is shown in Figure 2:" -> "is shown in Figure 2.".

23 – Line 202: "of the Q, K, V matrix." -> "of the Q, K, V matrices.".

24 – Line 204: "product between Q and K, The attention score" -> "product between Q and K. The attention score".

25 – Line 217: "defined as follow:" -> "defined as follows:".

26 – Line 232: "has not learned enough feature yet to obtain" -> "has not learned enough features yet to obtain".

27 – Line 245: "can be finally obtain by the operation" -> "can be finally obtained by the operation".

28 – Line 253: We have "shown in Figure 5 错误!未找到引用源。" (a Word error string meaning "Error! Reference source not found."). Please correct this. Also: "to the encoder module obtained" -> "to the encoder module are obtained".

29 – Line 260: "is the coordinate set" -> "are the coordinate set".

30 – Line 261: "at decoder block. respectively." -> "at decoder block, respectively.".

31 – Line 272: "datasets is a dataset for semantic task" -> "datasets address semantic task".

32 – Line 292: "We set the number of encoder layers is 7" -> "We set the number of encoder layers as 7".

33 – Line 294: "Since the random down" -> "The random down".

34 – Line 295: "is most efficient than other" -> "is more efficient than other".

35 – Line 311: "and SemanticKIITI respectively." -> "and SemanticKIITI, respectively.".

36 – Line 352. Caption of Figure 9: the sentence "First line is the ground truth." appears twice.

Author Response

Dear Reviewer:

We are very grateful for your comments and valuable suggestions, which have further improved our paper. We sincerely hope that this revised manuscript has addressed all your comments and suggestions.

Sincerely,

Ms. Wang

Author Response File: Author Response.pdf

Reviewer 4 Report

This paper presents a transformer-based architecture for semantic segmentation of 3D point clouds. It utilizes local content information to make the processing of large-scale point cloud data possible. A cross-skip selection is proposed to expand the receptive field without increasing the computational load. The experiments are carried out on several public datasets and comparison is performed with other techniques. There are some merits in this work, but several issues still need to be addressed or fixed.

First, the major contribution should be emphasized more clearly. Although some key features are presented, it is also expected to see the improvement over previous transformer-based methods.

Second, several figures in the paper are not clear enough. Please make the figures easy to read in terms of text size, etc. If some network structures are borrowed from previous works, please provide the sources.

Third, semantic segmentation of 3D point clouds is an important research topic with many practical applications. This paper presents results on outdoor traffic scenes and indoor environments. However, adoption in robotics is also an important application scenario, for example, in the recent work "BiLuNetICP: a deep neural network for object semantic segmentation and 6D pose recognition," IEEE Sensors Journal, May 2021. This should be properly addressed or compared.

Fourth, the writing of this work is not straightforward to follow. It is suggested to substantially rewrite it in a more structured way, especially providing more reasoning instead of only a plain statement of the implementation details. Also, one reference is missing on page 8.

Fifth, as most related works provide open source code for testing and validation, it is also suggested that the authors make the code publicly available to increase the impact of this paper.
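
For concreteness, the "local" design the report summarizes restricts self-attention to each point's k-nearest neighborhood, so the cost scales with N·k rather than N². The sketch below is a generic neighborhood-attention block written under that assumption; the layer names, dimensions, and the precomputed neighbor indices (where a cross-skip selection would plug in) are illustrative guesses, not the authors' implementation.

    # Hedged sketch of local (neighborhood) self-attention on point features:
    # each point attends only to its k precomputed neighbors. Illustrative only.
    import torch
    import torch.nn as nn

    class LocalSelfAttention(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.to_q = nn.Linear(dim, dim)
            self.to_k = nn.Linear(dim, dim)
            self.to_v = nn.Linear(dim, dim)
            self.scale = dim ** -0.5

        def forward(self, feats, neighbor_idx):
            """feats: (B, N, C); neighbor_idx: (B, N, k) precomputed KNN indices."""
            B, N, C = feats.shape
            batch = torch.arange(B, device=feats.device).view(B, 1, 1)
            neigh = feats[batch, neighbor_idx]          # (B, N, k, C) neighbor features
            q = self.to_q(feats).unsqueeze(2)           # (B, N, 1, C) query per point
            k = self.to_k(neigh)                        # (B, N, k, C) keys from neighbors
            v = self.to_v(neigh)                        # (B, N, k, C) values from neighbors
            # attention scores over the k neighbors only -> O(N * k) instead of O(N^2)
            attn = torch.softmax((q * k).sum(-1) * self.scale, dim=-1)   # (B, N, k)
            return (attn.unsqueeze(-1) * v).sum(dim=2)  # (B, N, C) aggregated feature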

Author Response

Dear Reviewer:

We are very grateful for your comments and valuable suggestions, which have further improved our paper. We sincerely hope that this revised manuscript has addressed all your comments and suggestions.

Sincerely,

Ms. Wang

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

I recommend paying more attention to describing the worldwide innovativeness of the research in the future.

Reviewer 2 Report

The authors have replied positively to the remarks of the reviewer; their corrections are exhaustive.
