Article
Peer-Review Record

Hybrid 3D Reconstruction of Indoor Scenes Integrating Object Recognition

Remote Sens. 2024, 16(4), 638; https://doi.org/10.3390/rs16040638
by Mingfan Li 1, Minglei Li 1,2,*, Li Xu 1 and Mingqiang Wei 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 18 December 2023 / Revised: 6 February 2024 / Accepted: 7 February 2024 / Published: 8 February 2024
(This article belongs to the Special Issue Point Cloud Processing with Machine Learning)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper presents a novel method for creating lightweight 3D models of indoor environments, including both room shapes and indoor objects, from point clouds.

 

Internal and External Segmentation: The paper proposes to segment the point cloud into internal and external components, and use different reconstruction strategies for each part. The external part is reconstructed by extracting planes and selecting the optimal subset of intersecting faces. The internal part is reconstructed by classifying and fitting CAD models to the segmented objects.
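For context on the plane-extraction step described above, the following is a minimal illustrative sketch of RANSAC plane fitting on a point cloud. It is not the authors' implementation; the function name, tolerances, and iteration count are assumptions for illustration only.

```python
import numpy as np

def fit_plane_ransac(points, n_iters=200, inlier_tol=0.02, rng=None):
    """Fit a single plane to an (N, 3) point array with RANSAC.

    Returns (normal, d, inlier_mask) for the plane n.x + d = 0.
    Illustrative only; the paper's exact extraction procedure may differ.
    """
    rng = np.random.default_rng(rng)
    best_mask, best_model = None, None
    for _ in range(n_iters):
        # Sample 3 distinct points and form a candidate plane.
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-12:  # degenerate (collinear) sample
            continue
        n /= norm
        d = -np.dot(n, p0)
        # Points within the distance tolerance count as inliers.
        mask = np.abs(points @ n + d) < inlier_tol
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask, best_model = mask, (n, d)
    return best_model[0], best_model[1], best_mask
```

In a full pipeline, planes would be extracted repeatedly (removing inliers each round) before selecting the optimal subset of intersecting faces.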

 

Object-Level Features and Model Fitting: The paper designs a set of features based on shape, spatial, statistical, and proprietary properties of the point clouds, and trains a random forest classifier to recognize indoor objects. The paper also introduces a model fitting method that involves extracting object-level key points and minimizing the distance between corresponding key points.
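To make the feature-based classification step concrete for readers, here is a minimal sketch of a random-forest object classifier over simple per-object point-cloud descriptors. The descriptors below (bounding-box extents, height statistics, covariance eigenvalues) are generic stand-ins, not the paper's actual shape, spatial, statistical, and proprietary features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def object_features(points):
    """Simple per-object descriptors from an (N, 3) point cloud.

    Generic stand-ins for illustration: bounding-box extents,
    height mean/std, and sorted covariance eigenvalues.
    """
    extents = points.max(0) - points.min(0)
    z = points[:, 2]
    eigvals = np.sort(np.linalg.eigvalsh(np.cov(points.T)))[::-1]
    return np.concatenate([extents, [z.mean(), z.std()], eigvals])

# Hypothetical setup: rows of X are feature vectors, y holds labels
# such as chair / table / cabinet / sofa / clutter.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
```

A trained classifier of this form predicts a label per segmented object, after which a matching CAD model can be fitted.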

 

Experimental Evaluation and Comparison: The paper evaluates the effectiveness and performance of the proposed method on point clouds from different indoor scenes, and compares it with some state-of-the-art methods. The paper shows that the proposed method can generate accurate and lightweight 3D models, and handle complex scenes with occlusions and irregularities.

 

Some of the challenges of the paper are:

 

Data Quality and Completeness: The paper relies on the quality and completeness of the input point cloud, which may be affected by noise, outliers, and missing data. The paper does not address how to deal with these issues or how they affect the reconstruction results.

Object Recognition and Classification: The paper uses a supervised learning approach to recognize and classify indoor objects, which requires a large amount of labeled data and may not generalize well to unseen or novel objects. The paper does not discuss how to handle object variations, ambiguities, or uncertainties in the classification process.

Model Fitting and Alignment: The paper uses a model fitting approach to place CAD models at the target positions of the indoor objects, which assumes the availability and suitability of the CAD models for the objects. The paper does not consider the possible errors or inconsistencies between the CAD models and the point clouds, or how to refine or adjust the models to fit the data better.
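For reference, minimizing the distance between corresponding key points is commonly solved in closed form by the Kabsch algorithm; the sketch below illustrates that standard solution. The paper's actual fitting objective may include scaling or other terms not modeled here.

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid transform (R, t) mapping src key points to dst.

    Classic Kabsch solution via SVD; illustrative only, not
    necessarily the solver used in the paper.
    """
    src_c, dst_c = src.mean(0), dst.mean(0)
    # Cross-covariance of the centered correspondences.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Sign correction guards against a reflection solution.
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    t = dst_c - R @ src_c
    return R, t
```

With noiseless, correctly matched key points this recovers the exact pose; with noisy or mismatched correspondences the residual quantifies the fitting error the reviewer mentions.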

 

The authors present some qualitative and quantitative comparisons of their method with other state-of-the-art methods, such as Polyfit, Polyfit (Bbox), and RfD-Net. They use the S3DIS dataset and the LiDAR point cloud collected by Leica BLK360 as the input data, and the Scan2CAD as the CAD model library.

The authors use several metrics to evaluate their method, such as scene completeness, fitting error, and computational efficiency. They also show some visual results of object and scene reconstruction under different levels of occlusion and complexity.

 

The authors claim that their method outperforms some state-of-the-art methods in terms of accuracy, completeness, and efficiency. They compare their method with Polyfit [19], Polyfit with Bbox, and RfD-Net [11] on various indoor scenes from the S3DIS dataset [38] and the LiDAR point cloud collected by the Leica BLK360. They use metrics such as fitting error, object number, and runtime to evaluate the performance of different methods. The results show that their method can generate visually appealing and lightweight models with reasonable object placement and low fitting error, and that it can handle complex scenes with occlusions and varying point density. The method is also efficient, completing both the classification and reconstruction tasks within a few minutes.

 

The authors claim that their method can achieve better results than the other methods in terms of generating lightweight models, preserving details, and handling occlusions. They also state that their method is more efficient and robust than the data-driven methods that require large training datasets and are sensitive to noise and outliers.

However, the authors do not provide any statistical analysis or significance tests to support their claims. They also do not discuss the limitations or drawbacks of their method, such as the dependency on the quality of the CAD models, the assumption of the piecewise planarity of the room shape, and the possible failure cases or scenarios.

The authors could improve their paper by addressing these issues and providing more rigorous and comprehensive evaluation of their method. They could also compare their method with more recent and relevant works in the field of indoor scene reconstruction.

 

Some possible suggestions to improve the paper are:

 

* Clarify the novelty and contribution of the proposed method compared to existing works. The paper does not clearly state what the main advantages or differences of the hybrid reconstruction method are compared with other methods that use geometric primitives or instance segmentation.

 

* Provide more details and analysis on the feature extraction and classification steps for indoor object modeling. The paper briefly mentions the shape, spatial, statistical, and proprietary features used for the random forest classifier, but does not explain how they are computed or why they are effective for different objects.

 

* Evaluate the robustness and scalability of the proposed method on more challenging and diverse indoor scenes. The paper only shows results on six rooms with relatively simple and regular structures. It would be interesting to see how the method performs on scenes with more complex and irregular shapes, occlusions, clutter, and noise.

 

* Compare the performance and efficiency of the proposed method with other state-of-the-art methods for indoor scene reconstruction. The paper only compares the results qualitatively with Polyfit and RfD-Net, but does not provide any quantitative metrics or runtime analysis. It would be helpful to show some numerical comparisons and discuss the strengths and limitations of the proposed method.

 

* The number of points in the point cloud samples is an important factor that affects the performance and quality of the reconstruction methods. Therefore, it would be helpful if the authors specify the exact number of points in their samples, or at least provide some statistics or ranges of the point cloud density and completeness. This would allow the readers to better understand the characteristics and challenges of the input data, and to compare the results with other methods more fairly and objectively.

 

*  The authors used six scenes from the S3DIS dataset and one scene from the LiDAR point cloud collected by Leica BLK360. They did not specify how many samples they used from the Scan2CAD dataset, but they mentioned that they chose the CAD models from this dataset. Therefore, the total number of samples (scenes) from each type of dataset is:

 

S3DIS: 6 scenes

LiDAR (Leica BLK360): 1 scene?

Scan2CAD: unknown?

 

I urge the authors to: a) introduce S3DIS, since not everyone knows its scope and the resolution of its data (number of points, covered area, etc.); b) pinpoint the data from S3DIS included in this study (i.e., list the labels of the scenes evaluated with the proposed method) and the rationale for selecting these specific scenes; c) explain how they incorporate the LiDAR readings from the operational Leica device; d) describe what parts of Scan2CAD were used and how: in what scope, which units, how many units, and, if possible, their labels. Introducing Scan2CAD is equally urgent.

In the abstract, the phrase “internal and external points based on scene distribution and geometric shapes” is unclear and vague. What does scene distribution mean? How are the geometric shapes used to segment the points? A more specific and concise description is needed.

In the introduction, the sentence “This paper presents a hybrid indoor reconstruction method integrating object-level segmentation and recognition.” is too broad and does not reflect the main contribution of the paper. A more specific and informative sentence would be: “This paper presents a hybrid indoor reconstruction method that segments the room point cloud into internal and external components, and then reconstructs the room shape using intersecting faces and the indoor objects using CAD model fitting.”

In Section 3.1, the sentence “The method uses the geometric characteristics and internal relations of objects to generate lightweight indoor models including both individual objects and room shapes, as shown in Figure 1.” should be moved to the end of the section, as it summarizes the main idea of the section. Also, the term internal relations is not well-defined and should be explained or replaced with a more concrete term.

Comments on the Quality of English Language

Nothing worth mentioning.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

Dear Authors,

 

Thank you for submitting your valuable manuscript.

Here are some comments that will improve the quality of your manuscript:

 

1. The framing of the problem you are trying to solve is too subjective. The motivating issues are mentioned in the introduction, but few previous studies are cited, and the stated limitations are not supported by references. The absence of sources suggests that the limitations were derived by the authors themselves rather than drawn from other studies. Therefore, it is questionable whether the problem being solved is one the research community at large is trying to solve.

2. P.16, Figure 12: if occlusion is 90%, it is difficult to recognize the shape. Frankly, (d) in Figure 12 should not be recognized as a chair. That a shape which hardly anyone would judge to be a chair is nevertheless classified as one raises reasonable doubt. Otherwise, comparative results for false negatives and false positives should be presented.

 

3. P.16 4.2. Quantitative Comparisons.

One of the aims of this study appears to be enhancing object recognition. Therefore, it should be shown how many objects were recognized correctly and how many were recognized incorrectly among all objects, and the accuracy results should be compared with previous studies. The superiority of the classification accuracy of the authors' algorithm cannot be confirmed from the table presented in the manuscript alone.
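A per-class tally of the kind requested here is typically reported as a confusion matrix plus overall accuracy. A minimal sketch, with entirely hypothetical labels for illustration:

```python
from sklearn.metrics import confusion_matrix, accuracy_score

labels = ["chair", "table", "cabinet", "sofa", "clutter"]
# Hypothetical ground-truth and predicted labels, for illustration only.
y_true = ["chair", "chair", "table", "sofa", "clutter", "table"]
y_pred = ["chair", "table", "table", "sofa", "clutter", "table"]

# Rows: true class; columns: predicted class.
cm = confusion_matrix(y_true, y_pred, labels=labels)
acc = accuracy_score(y_true, y_pred)
```

Such a table makes false positives and false negatives per class directly visible and comparable across methods.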

 

Sincerely,

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

Abstract​​

  • The abstract is well-written, providing a concise overview of the work.
  • However, it could benefit from a brief mention of the key results or findings to provide readers with a sense of the paper's impact.

Introduction​​

  • The introduction effectively sets the context and highlights the significance of indoor 3D reconstruction.
  • It would be advantageous to include a more explicit statement of the research problem and how your work addresses it.
  • Adding a brief overview of your method's novelty or advantages early in the introduction can capture the reader's interest.

Related Work​​

  • The review of related work is comprehensive, covering various techniques in indoor 3D reconstruction.
  • Consider discussing how your method compares to or improves upon these existing techniques more explicitly.
  • Highlighting more explicitly the limitations of current methods that your work overcomes would strengthen this section.

Methodology​​

  • The methodology is well-explained, with clear descriptions of the techniques used.
  • The use of figures to illustrate the process is commendable.
  • However, more detail on the algorithmic choices and their justifications would be beneficial.
  • The methodology section could be improved with a more critical analysis of potential limitations or challenges of your approach.

Results and Evaluation​​

  • The paper presents a good qualitative comparison with other methods.
  • Inclusion of quantitative results, such as accuracy metrics or performance benchmarks, would significantly enhance this section.
  • Discussing the implications of the results and how they validate or challenge existing theories or practices would add depth to the analysis.

Language and Grammar

  • The paper is generally well-written, but there are occasional grammatical errors and awkward phrasing. A thorough proofreading is recommended.
  • Ensure consistency in the use of technical terms and acronyms throughout the paper.

For example:

  1. Abstract​​:
    • Original: "We segment the room point cloud into internal and external points based on scene distribution and geometric shapes."
    • Suggestion: Consider rephrasing for clarity, e.g., "We segment the point cloud of the room into internal and external points based on the scene's distribution and geometric shapes."
  2. Introduction​​:
    • Original: "Nevertheless the reconstruction of complete indoor scenes from imperfect point clouds characterized by inherent noise and incompleteness remains a persistent challenge."
    • Suggestion: Add a comma for better readability, e.g., "Nevertheless, the reconstruction of complete indoor scenes from imperfect point clouds, characterized by inherent noise and incompleteness, remains a persistent challenge."
  3. Methodology Section​​:
    • Original: "We use KNN improved the RANSAC-based algorithm to extract planes."
    • Suggestion: Revise for grammatical correctness, e.g., "We use KNN to improve the RANSAC-based algorithm for plane extraction."
  4. Results and Evaluation​​:
    • Original: "The majority of objects featured in our test dataset predominantly consist of furniture items such as chairs tables cabinets and sofas."
    • Suggestion: Add commas for clarity, e.g., "The majority of objects featured in our test dataset predominantly consist of furniture items, such as chairs, tables, cabinets, and sofas."

These examples illustrate minor issues with phrasing, punctuation, and clarity. Addressing these throughout the document will enhance its overall readability and professionalism.

Conclusion and Future Work

  • The conclusion effectively summarizes the paper but could be strengthened by highlighting the main contributions more explicitly.
  • Suggestions for future work are somewhat limited. Expanding on potential research directions or applications of your work would be beneficial.

Overall Assessment

  • The paper presents a novel and interesting approach to indoor 3D reconstruction.
  • Strengthening the comparative analysis with existing methods and providing more quantitative results would enhance the paper's impact.
  • Attention to detail in terms of language and clearer exposition of the methodology's rationale will improve the paper's clarity and readability.

 

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The paper has improved significantly over the review rounds. Therefore, it can be published as is.

Author Response

Dear reviewers and editor(s),

 

Thank you very much for taking the time to review this article and provide valuable comments. We appreciate your full consideration of our manuscript. Your expertise and suggestions have been instrumental in shaping and enhancing the quality of our research work. We truly value your professional input, which will have a profound impact on our future work.

 

Sincerely,

All authors

Reviewer 2 Report

Comments and Suggestions for Authors

Dear Authors,

Thank you for your kind responses.

Your responses are highly appreciated.

 

About Comments 1:

The issues to be solved are now well substantiated.

 

About Comments 2:

You assumed that each object in the room is a chair, table, cabinet, sofa, or clutter. Your method therefore seems adaptable only to meeting rooms, which may lower the impact of your research. If a research output can be used only in a specific situation or a special case, its impact is not as high as that of a research output for a general situation or a general case.

 

About Comments 3:

Thank you for providing the matrix; the results are now demonstrated well.

 

Sincerely,

Author Response

Dear reviewers and editor(s),

 

Thank you very much for taking the time to review this article and provide valuable comments. We appreciate your full consideration of our manuscript. Your expertise and suggestions have been instrumental in shaping and enhancing the quality of our research work. We truly value your professional input, which will have a profound impact on our future work.

 

About Comments 2:

 

We appreciate your concern regarding the potential limitations of our research. As mentioned in Section 1 (Introduction), automated indoor scene modelling is highly challenging due to the complexity of indoor environments. In the past, indoor scene reconstruction relied predominantly on manual methods; in recent years, automated approaches have gradually emerged. Meeting rooms and offices are among the most common scenes we encounter and are provided as essential facilities by our university, giving us convenient access to the corresponding scene data. These scenes typically feature core objects such as chairs, tables, cabinets, sofas, and clutter, which make up the majority of indoor environments, even in residential settings. Thus, we believe that by first addressing the indoor reconstruction of these objects, we can effectively tackle the majority of indoor scene challenges. While our initial focus was on specific objects within meeting rooms, we are committed to expanding the scope of our research to ensure broader applicability, and we will take proactive steps to address these considerations in our future work. For example, more diverse indoor scenes, such as bedrooms and living rooms, may require more challenging point cloud instance segmentation methods; in future work we will adopt deep learning methods to explore the depth characteristics of objects in indoor scenes to address this problem. Thank you again for your valuable comments.

 

Sincerely,

All authors
