Peer-Review Record

Mixed Feature Prediction on Boundary Learning for Point Cloud Semantic Segmentation

Remote Sens. 2022, 14(19), 4757; https://doi.org/10.3390/rs14194757
by Fengda Hao, Jiaojiao Li *, Rui Song, Yunsong Li and Kailang Cao
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 28 July 2022 / Revised: 30 August 2022 / Accepted: 15 September 2022 / Published: 23 September 2022

Round 1

Reviewer 1 Report

Review

In the article "Mixed Feature Prediction on Boundary Learning for Point Cloud Semantic Segmentation", the authors propose a model for semantic segmentation of point clouds built around a Dynamic Feature Aggregation (DFA) module. The authors conducted many experiments and, as a result of their research, obtained better results than those reported in the recent literature.

The authors have carried out an extensive literature review: 76 references, most of them from the last few years. The article is interesting and written according to the conventions of scientific articles. The subject matter, however, is not new.

I have a few minor comments:

1. In Section 3.3.1, Equation (9): there is no information on how the threshold parameter is determined.

2. Figure 7: What is the resolution of the input images? The input image at the bottom is blurred.

   There is no legend for the colors (e.g., black for board, etc.).

3. Table 2 reports overall accuracy (OA), average accuracy (mAcc), and mean IoU (mIoU); the formulas for calculating these metrics are missing.
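
For reference, the conventional definitions of these metrics, written in terms of the per-class true positives TP_c, false positives FP_c, false negatives FN_c, the number of classes C, and the total number of points N, are sketched below; the authors should confirm whether their usage matches these standard forms.

```latex
\mathrm{OA}   = \frac{\sum_{c=1}^{C}\mathrm{TP}_c}{N}, \qquad
\mathrm{mAcc} = \frac{1}{C}\sum_{c=1}^{C}\frac{\mathrm{TP}_c}{\mathrm{TP}_c+\mathrm{FN}_c}, \qquad
\mathrm{mIoU} = \frac{1}{C}\sum_{c=1}^{C}\frac{\mathrm{TP}_c}{\mathrm{TP}_c+\mathrm{FP}_c+\mathrm{FN}_c}
```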

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

In this paper, the authors proposed a pre-training-based approach for point cloud segmentation. During the pre-training phase, some of the object boundary points were swapped with their farthest local neighbors as input to build the pretext task. Different neighborhood ranges were considered and stacked for different regions of the object through a Dynamic Feature Aggregation (DFA) module. After pre-training, the pre-trained model was used for downstream tasks such as semantic segmentation, object classification, and object detection to test the model quality under different settings on public datasets. Extensive comparisons with state-of-the-art approaches were included. I have some concerns that need to be addressed.

1. What is the probability/proportion of feature swapping in the pre-training task? For example, in BERT pre-training with the MLM loss, 15% of tokens are masked. Please elaborate on how you decided on the mixing percentage for this task.

2. The centroid and boundary centroids denoted in Fig. 3 are confusing. Are boundary centroids simply boundary points, and are centroid points those calculated by Equation (1)? Were you trying to swap the 3D coordinates of centroid points with their farthest local neighbors? Please clarify. In addition, if Section 3.2.3 describes the high-pass filter, it would be more informative to name it "3.2.3. High-pass filter" directly.
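
To make point 2 concrete, the following minimal sketch shows the reading I have in mind (replace each boundary centroid's 3D coordinates with those of the farthest point inside its local k-neighborhood). The function and variable names are hypothetical; this is not claimed to match the authors' implementation.

```python
import numpy as np

def swap_boundary_with_farthest_neighbor(points, boundary_idx, k=16):
    """Hypothetical reading of the mixing step: replace the 3D coordinates
    of each boundary centroid with those of the farthest point inside its
    local k-nearest-neighbor region."""
    mixed = points.copy()
    for i in boundary_idx:
        # Brute-force k-nearest neighbors of point i (excluding the point itself).
        d = np.linalg.norm(points - points[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]
        farthest = neighbors[-1]      # last sorted neighbor = farthest within the neighborhood
        mixed[i] = points[farthest]   # one-directional coordinate replacement
    return mixed

# Toy usage: 1024 random points, with the first 10 treated as boundary centroids.
pts = np.random.rand(1024, 3).astype(np.float32)
mixed_pts = swap_boundary_with_farthest_neighbor(pts, boundary_idx=np.arange(10))
```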

3. In Section 3.2.3, how are the relations between nodes defined (A in graph G)?
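
For context, a common convention for point-cloud graphs, which may or may not be what the paper uses, is to build A from k-nearest-neighbor connectivity with Gaussian edge weights, as in this illustrative sketch (all names are assumptions, not the paper's notation).

```python
import numpy as np

def knn_adjacency(points, k=16, sigma=1.0):
    """One common (here only assumed) definition of the relations A in a
    point-cloud graph G: connect each point to its k nearest neighbors and
    weight each edge with a Gaussian of the pairwise distance."""
    n = points.shape[0]
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    A = np.zeros((n, n), dtype=np.float32)
    for i in range(n):
        neighbors = np.argsort(dists[i])[1:k + 1]   # exclude the point itself
        A[i, neighbors] = np.exp(-dists[i, neighbors] ** 2 / (2.0 * sigma ** 2))
    return np.maximum(A, A.T)                        # symmetrize the adjacency

# Toy usage on a small cloud (the O(n^2) distance matrix is only for illustration).
A = knn_adjacency(np.random.rand(256, 3), k=8)
```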

4. In Fig. 2, what is the purpose of adding four MLPs after the DFA module in the pretext task? I would also recommend that the authors show the details of the encoder and decoder for a better reading experience.

5. In Sections 5.3 and 5.5, what task/dataset is used in Tables 9 and 11? Why not use ScanNet v2 and S3DIS, as in Sections 5.2 and 5.4, for a consistent comparison?

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

This paper presents an efficient framework for point cloud semantic segmentation. First, the authors adopt a self-supervised paradigm to learn boundary points: they design a pretext task that predicts the original sharp features of the point clouds from mixed boundary features. Then, they develop a Dynamic Feature Aggregation (DFA) module for high-level spatial representations. Finally, they introduce a new boundary-label consistent loss to maintain the global distribution of the boundary regions.

The paper is well written, and the contributions are significant.

Minor grammar and spelling errors need to be corrected.

How does the inference time of the proposed method compare with that of other methods in the literature?

We suggest the following: 

1) Improving the abstract to make it more concise and focused on explaining the contributions.

2) Including the inference time analysis in Table 11.

3) Making the code that generated the results publicly available and adding its link to the paper.

4) Expanding the results to more datasets.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The authors addressed my concerns.
