Article
Peer-Review Record

Local Transformer Network on 3D Point Cloud Semantic Segmentation

Information 2022, 13(4), 198; https://doi.org/10.3390/info13040198
by Zijun Wang, Yun Wang *, Lifeng An, Jian Liu and Haiyang Liu
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 19 February 2022 / Revised: 25 March 2022 / Accepted: 2 April 2022 / Published: 14 April 2022
(This article belongs to the Topic Big Data and Artificial Intelligence)

Round 1

Reviewer 1 Report

This manuscript contains exciting work on a local transformer network for 3D point cloud semantic segmentation. However, the manuscript is written in a style more like a report than a research article. The global innovativeness of the research has not been fully presented. The figures and tables that involve worldwide novel research should be described and discussed in more detail to emphasize the state-of-the-art novelty of the work. Please cite the newest (2018-2022) Web of Science journal papers.

Author Response

Dear Reviewer:

We are very grateful for your comments and valuable suggestions, which have further improved our paper. We sincerely hope that this revised manuscript has addressed all your comments and suggestions.

Sincerely,

Ms. Wang

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper presents interesting topics, but it is hampered by the missing meanings of the mathematical symbols used in many formulas. Therefore, the following points should be observed carefully for a better presentation:

- in formula 3, what does the concatenation operation ⊕ stand for?

- in formula 4, what does MLP stand for?

- in figure 2, please specify the role of KNN

- in formula 5, what do ??(???1), ??(???2), ??(???3) stand for? Please explain what batch normalization is

- what is V in formula 17 and ff.?

- Chinese text at row 253!!!

- in figure 6, what does up-sampling stand for?

- in formula 23, what does mij stand for? And what is Mij in formula 22? Maybe mij = Mij?

- what is the formula for obtaining the numbers in Tables 1, 2, 3, and 4?
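
Most of the symbols questioned above refer to standard point-cloud building blocks. For orientation only, the following PyTorch-style sketch shows how KNN grouping, concatenation (⊕) of relative-position features, and a shared MLP with batch normalization typically fit together in a local feature encoder; all names, shapes, and the choice of k are illustrative assumptions, not the authors' actual implementation.

    # Hedged sketch of a generic local feature encoder for point clouds:
    # KNN grouping, concatenation (⊕) of relative positions with neighbor
    # features, and a shared MLP with batch normalization. Illustrative only.
    import torch
    import torch.nn as nn

    def knn_group(xyz, k=16):
        """Return the indices of the k nearest neighbors of every point.

        xyz: (B, N, 3) point coordinates.
        """
        dist = torch.cdist(xyz, xyz)                         # (B, N, N) pairwise distances
        return dist.topk(k, dim=-1, largest=False).indices   # (B, N, k)

    class LocalFeatureEncoder(nn.Module):
        """Shared MLP (1x1 conv + BatchNorm + ReLU) applied to each neighbor."""
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Conv2d(in_dim, out_dim, kernel_size=1),
                nn.BatchNorm2d(out_dim),   # normalizes each channel over the batch
                nn.ReLU(inplace=True),
            )

        def forward(self, xyz, feats, k=16):
            B, N, _ = xyz.shape
            idx = knn_group(xyz, k)                              # (B, N, k)
            batch = torch.arange(B, device=xyz.device).view(B, 1, 1)
            neigh_xyz = xyz[batch, idx]                          # (B, N, k, 3)
            neigh_feat = feats[batch, idx]                       # (B, N, k, C)
            rel_pos = neigh_xyz - xyz.unsqueeze(2)               # relative coordinates
            # "⊕": concatenate relative position with neighbor features
            grouped = torch.cat([rel_pos, neigh_feat], dim=-1)   # (B, N, k, 3 + C)
            grouped = grouped.permute(0, 3, 1, 2)                # (B, 3 + C, N, k)
            return self.mlp(grouped)                             # (B, out_dim, N, k)

For C-dimensional input features, such an encoder would be built as LocalFeatureEncoder(3 + C, 64) and called with (xyz, feats); the concrete channel sizes in the paper may of course differ.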

Author Response

Dear Reviewer:

We are very grateful for your comments and valuable suggestions, which have further improved our paper. We sincerely hope that this revised manuscript has addressed all your comments and suggestions.

Sincerely,

Ms. Wang

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments

1 – The paper is well written and well organized.

2 – Line 90. Please define the "mIoU" acronym.

3 – Section 1 should end with a paragraph stating the organization of the remainder of the paper.

4 – Line 112. Please define the MLP acronym.

5 – Line 127. Please define the DGCNN acronym.

6 – Lines 129 and 130. We have "proposed a graph attention convolution (GAV), ...". I think it should be GAC.

7 – Line 163. We have the "normal of every point". Please explain what the normal is in this context.

8 – Please treat the equations as elements of text. The punctuation rules also apply to equations. After equations (1) and (2), we should have a ",". Equations (3) and (4) should end with a final dot. Equations (6) and (11) are missing a final dot. Please check all the equations.

9 – Line 177. We have "is the ℓ1 distance." It should be "is the ℓ1 norm.".

10 – Please do not use symbols with a double meaning. In equation (1), 'n' is the normal vector. In line 262, 'n' is the number of neighbors. Please check on this.

11 – On the experimental results from Figure 7 to Figure 11:

  1. In the captions, please change "line" to "row".
  2. Explain what the difference is between the (a), (b), and (c) cases.

12 – On the experimental results from Figure 9 to Figure 11, please change "Third line is the results ..." -> "The third row has the results ...".


Writing

1 – Line 22: "sematic" -> "semantic".

2 – Line 30: "Convolutional neural network has" -> "Convolutional neural network (CNN) has"; "researchers considered how" -> "with researchers considering how".

3 – Line 34: "projected the 3D point clouds onto the 2D plane, generated the bird's" -> "project the 3D point clouds onto the 2D plane, generating the bird's".

4 – Line 43: "segmentation accuracy; When the voxel" -> "segmentation accuracy. When the voxel".

5 – Line 58: "This paper proposed a local transformer" -> "This paper proposes a local transformer".

6 – Line 60: "we used transformer to learn" -> "we use transformer to learn".

7 – Line 70: "adopted encoder-decoder structure." -> "adopts encoder-decoder structure.".

8 – Line 74: "In decoder layer, we used the" -> "In the decoder layer, we use the".

9 – Line 85: "We proposed two different key" -> "We propose two different key".

10 – Line 99: "Squeezeseg[8] converts" -> "SqueezeSeg [8] converts".

11 – Line 103: "range image (RV) and bird's view image (BEV)," -> "range view (RV) image and bird's eye view (BEV) image,".

12 – Line 105: "The network divided the point" -> "The network divides the point".

13 – Line 106: "and then extracted the features" -> "and then extracts the features".

14 – Line 124: "and then obtained" -> "and then obtains".

15 – Line 125: "local feature map; Finally" -> "local feature map. Finally".

16 – Line 127: "resorted to" -> "resorts to".

17 – Line 131: "more accurate represent of local features" -> "more accurate representations of local features".

18 – Line 146: "In this paper, we proposed a local transformer" -> "In this paper, we propose a local transformer".

19 – Line 171: "of points are concatenate by the normalized" -> "of points are concatenated by the normalized".

20 – Line 173: "and encode each neighbor" -> "and encodes each neighbor".

21 – Line 178: "Then original input feature" -> "Then, original input feature".

22 – Line 188: "is shown in Figure 2:" -> "is shown in Figure 2.".

23 – Line 202: "of the Q, K, V matrix." -> "of the Q, K, V matrices.".

24 – Line 204: "product between Q and K, The attention score" -> "product between Q and K. The attention score".

25 – Line 217: "defined as follow:" -> "defined as follows:".

26 – Line 232: "has not learned enough feature yet to obtain" -> "has not learned enough features yet to obtain".

27 – Line 245: "can be finally obtain by the operation" -> "can be finally obtained by the operation".

28 – Line 253: We have "shown in Figure 5 错误!未找到引用源。" (a Word error string meaning "Error! Reference source not found."). Please correct this. Also: "to the encoder module obtained" -> "to the encoder module are obtained".

29 – Line 260: "is the coordinate set" -> "are the coordinate set".

30 – Line 261: "at decoder block. respectively." -> "at decoder block, respectively.".

31 – Line 272: "datasets is a dataset for semantic task" -> "datasets address semantic task".

32 – Line 292: "We set the number of encoder layers is 7" -> "We set the number of encoder layers as 7".

33 – Line 294: "Since the random down" -> "The random down".

34 – Line 295: "is most efficient than other" -> "is more efficient than other".

35 – Line 311: "and SemanticKIITI respectively." -> "and SemanticKIITI, respectively.".

36 – Line 352. Caption of Figure 9: the sentence "First line is the ground truth." appears twice.

Author Response

Dear Reviewer:

We are very grateful for your comments and valuable suggestions, which have further improved our paper. We sincerely hope that this revised manuscript has addressed all your comments and suggestions.

Sincerely,

Ms. Wang

Author Response File: Author Response.pdf

Reviewer 4 Report

This paper presents a transformer-based architecture for semantic segmentation of 3D point clouds. It utilizes local content information to make the processing of large-scale point cloud data possible. A cross-skip selection is proposed to expand the receptive field without increasing the computational load. The experiments are carried out on several public datasets and comparison is performed with other techniques. There are some merits in this work, but several issues still need to be addressed or fixed.

First, the major contribution should be emphasized more clearly. Although some key features are presented, it is also expected to see the improvement over previous transformer-based methods.

Second, several figures in the paper are not clear enough. Please make the figures easy to read in terms of text size, etc. If some network structures are borrowed from previous works, please provide the sources.

Third, semantic segmentation of 3D point clouds is an important research topic with many practical applications. This paper presents results on outdoor traffic scenes and indoor environments. However, adoption in robotics is also an important application scenario, for example, in the recent work "BiLuNetICP: a deep neural network for object semantic segmentation and 6D pose recognition," IEEE Sensors Journal, May 2021. This should be properly addressed or compared.

Fourth, the writing of this work is not straightforward to follow. It is suggested to substantially rewrite it in a more structured way, especially providing more reasoning instead of only a plain statement of the implementation details. Also, one reference is missing on page 8.

Fifth, as most related works provide open source code for testing and validation, it is also suggested that the authors make the code publicly available to increase the impact of this paper.
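
For concreteness, the "local" design the report summarizes restricts self-attention to each point's k-nearest neighborhood, so the cost scales with N·k rather than N². The sketch below is a generic neighborhood-attention block written under that assumption; the layer names, dimensions, and the precomputed neighbor indices (where a cross-skip selection would plug in) are illustrative guesses, not the authors' implementation.

    # Hedged sketch of local (neighborhood) self-attention on point features:
    # each point attends only to its k precomputed neighbors. Illustrative only.
    import torch
    import torch.nn as nn

    class LocalSelfAttention(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.to_q = nn.Linear(dim, dim)
            self.to_k = nn.Linear(dim, dim)
            self.to_v = nn.Linear(dim, dim)
            self.scale = dim ** -0.5

        def forward(self, feats, neighbor_idx):
            """feats: (B, N, C); neighbor_idx: (B, N, k) precomputed KNN indices."""
            B, N, C = feats.shape
            batch = torch.arange(B, device=feats.device).view(B, 1, 1)
            neigh = feats[batch, neighbor_idx]          # (B, N, k, C) neighbor features
            q = self.to_q(feats).unsqueeze(2)           # (B, N, 1, C) query per point
            k = self.to_k(neigh)                        # (B, N, k, C) keys from neighbors
            v = self.to_v(neigh)                        # (B, N, k, C) values from neighbors
            # attention scores over the k neighbors only -> O(N * k) instead of O(N^2)
            attn = torch.softmax((q * k).sum(-1) * self.scale, dim=-1)   # (B, N, k)
            return (attn.unsqueeze(-1) * v).sum(dim=2)  # (B, N, C) aggregated feature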

Author Response

Dear Reviewer:

We are very grateful for your comments and valuable suggestions, which have further improved our paper. We sincerely hope that this revised manuscript has addressed all your comments and suggestions.

Sincerely,

Ms. Wang

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

I recommend paying more attention to describing the worldwide innovativeness of the research in the future.

Reviewer 2 Report

The authors have replied positively to the remarks of the reviewer; their corrections are exhaustive.
