1. Introduction
In recent years, digital twin technology has come to play a critical role not only in industrial settings but also in aging facilities, improving productivity, enabling predictive and preventive maintenance, and reducing operational costs [
1]. A geometric digital twin creates an accurate virtual model based on the shape, scale, and structure of real objects or systems, which can then be used for simulation or monitoring purposes [
2,
3]. In the architecture sector, this is commonly achieved through Building Information Modeling (BIM), which enables a detailed geometric representation of buildings. BIM enables efficient management of all phases of a building’s lifecycle, including design, construction, and maintenance, while integrating up-to-date information about the structure [
4]. Consequently, many institutions worldwide are adopting BIM, with models for new buildings typically developed from Computer-Aided Design (CAD) files. Furthermore, BIM is widely used in construction as an Industry Foundation Classes (IFC) file, which supports interoperability across various software applications. However, for older buildings, CAD files are often unavailable or outdated, resulting in incomplete architectural information. Thus, creating accurate 3D models for these structures requires alternative methods that rely on real-world data rather than CAD-based files [
5,
6,
7].
High-precision 3D point cloud data can be efficiently acquired using Simultaneous Localization and Mapping (SLAM)-based laser scanners such as the Faro Orbis or NavVis VLX, and the process of creating BIM models from these 3D data is known as scan-to-BIM. Until recently, most of this process was manual, requiring modelers to construct geometric structures by referencing 3D point cloud data [
8,
9]. The approach to 3D modeling varies by modeler, and the manual nature of the work demands considerable time and expense. Significant research has focused on automation, using deep learning models to segment 3D point cloud data at the point or object level for application in 3D modeling [
10,
11,
12,
13,
14,
15,
16].
Figure 1 illustrates the scan-to-BIM process: (a) shows the 3D point cloud data acquired via laser scanning, (b) demonstrates the results of semantic segmentation using a deep learning model, and (c) presents the final BIM generated from the semantic segmentation results. Achieving high accuracy in BIM modeling within scan-to-BIM workflows depends crucially on the performance of semantic segmentation enabled by deep learning models. Consequently, efforts to enhance the performance of semantic segmentation are essential [
17,
18,
19,
20].
Various deep learning models have been employed in scan-to-BIM processes, including PointNet [
21], RandLA-Net [
22], and Point Transformer [
23], with ongoing research further advancing these techniques [
24,
25,
26]. A key factor in leveraging these deep learning models effectively is access to large-scale, labeled datasets from diverse domains, particularly datasets containing labeled point cloud data. Fortunately, several indoor point cloud datasets, such as Stanford 3D Indoor Scene Dataset (S3DIS) [
27], ScanNet [
28], and LiDARNet [
29], are publicly available. However, these datasets lack sufficient diversity and representativeness, making it challenging to achieve robust generalization for deep learning models. Addressing data scarcity is therefore essential for enhancing the generalization performance of these models, and recent studies have proposed automatic dataset generation methods that utilize both BIM models and point clouds [
30,
31,
32,
33]. While some research explores generating synthetic point clouds, these data often differ significantly from real-world characteristics, limiting their usefulness in improving model generalization [
34,
35]. Consequently, constructing datasets based solely on real point cloud data remains a crucial approach.
This paper proposes a framework to address the scarcity of labeled 3D point cloud data by semi-automatically generating training datasets for semantic segmentation using 3D point clouds and BIM. By integrating BIM models with 3D point cloud data of buildings, the framework enables the extraction of point clouds corresponding to structural elements, such as doors, windows, floors, ceilings, walls, columns, and beams. Additionally, a preprocessing method for dividing floors and spaces is introduced to enhance training efficiency and maximize the use of 3D point cloud data in model training. Using the dataset generated through this framework, we demonstrate improved performance in 3D semantic segmentation and provide an analysis of the framework’s impact on the generalization of deep learning models.
The structure of this paper is as follows:
Section 2 reviews related work relevant to this study.
Section 3 describes the proposed method in detail. In
Section 4, we present the dataset constructed using the proposed approach and analyze the results of 3D semantic segmentation.
Section 5 discusses the effectiveness and limitations of the proposed framework. Finally,
Section 6 provides conclusions and suggests directions for future work.
2. Related Work
This section reviews related work on dataset creation, scan-to-BIM, and semantic segmentation.
2.1. Dataset Creation
Deep learning models for semantic segmentation of 3D data still require substantial amounts of data. Although large-scale public datasets such as S3DIS [
27], ScanNet [
28], and LiDARNet [
29] exist, the volume of available 3D data is far smaller than that of 2D data [
36].
Recently, extensive research has focused on generating labeled data for semantic segmentation directly from original point clouds. This approach involves transferring labels from data represented in CAD or BIM models to point clouds, typically by sampling point clouds from these CAD or BIM sources. While some studies rely solely on synthetic data, others propose methods that combine synthetic data with real-world data.
Huang et al. [
32] proposed a method to generate synthetic training data using parametric BIM models. Their approach simulates indoor structures using parametric BIM models, defines the required spatial information and locations, and converts the BIM model to a triangular mesh format, sampling point clouds from the surface of each component. Labels corresponding to each component are assigned, and point clouds are generated from various viewpoints, with Gaussian noise added to simulate realistic laser scanning data. However, this approach only generates data and does not validate it with a deep learning model. Conversely, Ma et al. [
34] used three commercial software tools to convert BIM models into synthetic point clouds and validated their approach on PointNet [
21]. While synthetic data were used to augment real data and generalize models for various indoor structures, limitations remain due to differences from real-world data, making it challenging to achieve performance improvements with synthetic data alone.
With advancements in user-friendly 3D scanners such as NavVis VLX, research has increasingly focused on automating dataset creation using real point cloud data [
30,
33,
37,
38]. Geyter et al. [
30] constructed training datasets by automatically combining synthetic point clouds sampled from BIM models with real point cloud data, validating the approach on deep learning models; however, they treated labels for doors and windows as clutter. Abreu et al. [
37] created a dataset but did not conduct validation on deep learning models, while Birkeland et al. [
33] proposed a semi-automated workflow for real point cloud data from industrial settings, including validation on deep learning models. However, this study focused on industrial components like pipes rather than building data. Other research efforts provided multiple methods of automated labeling but focused on limited datasets for specific elements, such as beams and hooks, and conducted validations on these small datasets [
38]. In contrast, this study constructs a large-scale building dataset in a semi-automated manner and tests it with state-of-the-art deep learning models to analyze their generalization performance.
2.2. Scan-to-BIM
Scan-to-BIM refers to the process of creating BIM models from raw point cloud data collected via 3D laser scanning. Traditionally, this process has been largely manual in construction and industrial fields, but the high time and cost demands have driven significant research into automation methods [
39,
40]. Initial approaches focused on automatically generating general structures or floor plans by detecting walls within 3D point cloud data, rather than directly producing BIM models, often using geometric methods rather than deep learning [
8,
9]. For example, GeoRec [
18] introduced a multi-task neural network architecture that performs three main tasks—layout estimation, object detection, and object mesh generation—using RGB-D data to produce occlusion-robust object models. However, this method is closer to mesh generation than true BIM modeling. Kim et al. [
41] proposed an approach for automatically generating material information from existing scan data, extracting geometric data from point clouds and using semantic segmentation on panoramic images to identify material properties. While this method creates 3D models with material information, reconstructing structured building elements remains challenging. Li et al. [
19] suggested a method for generating BIM models of existing buildings using RGB-D sensors, modifying a fully convolutional network (FCN) to adapt to RGB-D data, thus enhancing efficiency compared to traditional deep learning approaches. Nonetheless, limitations persist in complex environments. Finally, Perez-Perez et al. [
20] proposed Scan2BIM-Net, an end-to-end learning method composed of two Convolutional Neural Networks (CNNs) and one Recurrent Neural Network (RNN). However, their experiments were limited to five classes, and validation in large-scale environments has yet to be conducted.
Several studies have leveraged deep learning models trained on large datasets, primarily utilizing semantic segmentation results. Mahmoud et al. [
13] applied an enhanced RandLA-Net [
22] model for point cloud classification, employing the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm for spatial clustering, the RANdom Sample Consensus (RANSAC) algorithm for plane detection, and the Alpha Shape algorithm to extract wall boundaries. The generated data are then processed through Revit’s Dynamo algorithm, enabling the automated creation of 3D BIM models that include walls, ceilings, floors, and even furniture. While the framework significantly improves BIM model quality, challenges remain in distinguishing closely situated objects of the same class, highlighting a need for further refinement of density-based clustering algorithms. Another framework [
15] proposes a method to automatically generate BIM models from point clouds using PointNet [
21], distinguishing 13 classes of indoor objects and generating individual object instances. Unlike prior research, this framework uses clustering methods like DBSCAN to identify object instances. Additionally, by leveraging Dynamo to match bounding boxes with standard objects from a BIM library, this approach automates BIM model generation while preserving object relationships within the model.
Tang et al. [
16] applied a deep learning model to the original point cloud data, classifying it into 13 primary elements and removing furniture such as desks and chairs. From the classified point cloud, key planes were extracted, and a Markov Random Field was used to clarify spatial boundaries and the relationships between structural elements, optimizing spatial composition. Each spatial form underwent normalization, with architectural elements represented parametrically. The parametric components were then converted into IFC format to produce the final BIM model. Experiments conducted on various indoor datasets demonstrated the framework’s ability to generate high-precision, high-completeness BIM models.
2.3. 3D Semantic Segmentation
With the increasing application of semantic segmentation in the scan-to-BIM process, its importance has become more pronounced. PointNet [
21] was the first method to introduce semantic segmentation at the point level, proposing a neural network with permutation invariance and achieving a mean Intersection over Union (mIoU) of 47.6 on the S3DIS dataset. However, due to limitations in recognizing fine-grained patterns, PointNet++ [
42] was subsequently developed to enhance generalization in complex scenes through a hierarchical neural network that applies PointNet recursively. PointNet and PointNet++ were widely used for semantic segmentation in scan-to-BIM. Recently, RandLA-Net [
22] has become prominent. RandLA-Net highlighted the quadratic time complexity of the Farthest Point Sampling (FPS) algorithm used in prior studies and proposed a more efficient down-sampling method based on random sampling. Additionally, to maintain the representational integrity of point clouds despite the randomness of point selection, RandLA-Net introduced a local feature aggregation module. This approach improved both speed and accuracy, achieving an mIoU of 70.0 on the S3DIS dataset.
Since the introduction of the transformer architecture [
43], significant efforts have been made to adapt it to 3D point cloud data, with PointTransformer [
23] being a prominent example. Early transformer-based approaches applied self-attention across the entire point cloud, resulting in a substantial increase in computational cost. PointTransformer addressed this issue by limiting self-attention to local regions and incorporating an appropriate positional encoding, achieving an mIoU of 73.5 on the S3DIS dataset. Released in 2020, PointTransformer has since undergone continual refinement, evolving into PointTransformer v2 [
44] and v3 [
45]. PointTransformer v2 [
44] introduced optimizations to reduce computational costs and addressed limitations of positional encoding, proposing new grouped vector attention and partition-based pooling techniques. PointTransformer v3 [
45] further enhanced performance through point cloud serialization, delivering speeds three times faster and ten times more memory-efficient than PointTransformer v2, and achieving superior performance on both indoor and outdoor datasets. Consequently, this study adopts the latest state-of-the-art model, PointTransformer v3, for semantic segmentation in the scan-to-BIM process.
3. Method
The framework proposed in this paper aims to assign large-scale semantic labels to real point cloud datasets, utilizing pairs of manually constructed BIM models and real point clouds derived from 3D scans.
Figure 2 illustrates the proposed framework, which consists of two main components. First, a preprocessing method is introduced to partition floors and spaces, enabling the division of a single floor’s data into multiple rooms, similar to the S3DIS dataset. Second, an automated labeling method is provided for structural elements within the segmented spaces, including doors, windows, floors, ceilings, columns, and beams.
3.1. Pre-Processing
The preprocessing of real-world point cloud data begins with noise removal to retain only the structural elements from the original 3D point cloud. After noise reduction, floor segmentation is applied, followed by room partitioning within each floor. Once room segmentation is completed, manual corrections are performed to address any missegmentations. This segmentation of floors and rooms is essential to create data of an appropriate size for input into deep learning models for 3D semantic segmentation, aiming to concentrate dense point cloud data within smaller spatial areas. This approach enables the development of a high-resolution dataset, providing a high-quality foundation for further model training and analysis.
3.1.1. Floor Segmentation
Floor segmentation involves isolating each floor of a multi-story building as an independent point cloud segment. This process is challenging, as floor heights vary across buildings, and even within individual structures. To address these variations, this paper adopts a method used in other studies that leverages point density histograms to achieve floor segmentation [
45]. This method generates point density histograms across different height levels within the entire point cloud to identify potential floor and ceiling planes.
Figure 3a illustrates these histograms for various heights in the building’s point cloud, with black dashed lines indicating threshold values, set according to a predefined ratio, which mark potential ceiling and floor candidates.
However, due to variations in histogram binning intervals and threshold values, the candidate planes for floors and ceilings may differ, leading to ambiguities in floor segmentation. In this study, the histogram interval was experimentally set to 0.05 m, and the threshold was defined as the mean plus twice the standard deviation of each histogram. These experimentally chosen parameters allowed for easier identification of high-density regions likely to represent floors and ceilings. To distinguish floors from ceilings, bins that exceed the threshold are initially identified, and bins within 20 cm of each other are assumed to belong to the same floor and are grouped together. As shown in the indices on the left side of
Figure 3a, the odd-numbered indices (marked by green circles) represent floors, which are used for segmentation.
This process results in an approximate floor segmentation, illustrated by the orange regions in
Figure 3b. However, this approach is prone to noise, as seen in the black box at the bottom of
Figure 3b, where errors can arise from noise misinterpretation. Consequently, manual adjustments are necessary to refine the segmentation based on the initial results.
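For reference, this histogram-based procedure can be summarized in a short script. The sketch below is illustrative only (it assumes NumPy and an (N, 3) coordinate array; the function and variable names are not taken from the authors' implementation): it bins the z-coordinates at 0.05 m, thresholds the bin counts at the mean plus twice the standard deviation, merges peaks lying within 20 cm, and pairs consecutive slabs as floor and ceiling boundaries.
```python
import numpy as np

def segment_floors(points, bin_size=0.05, merge_dist=0.20):
    """Split a building point cloud into storeys using a z-density histogram."""
    z = points[:, 2]
    bins = np.arange(z.min(), z.max() + bin_size, bin_size)
    counts, edges = np.histogram(z, bins=bins)

    # Threshold: mean plus twice the standard deviation of the bin counts.
    threshold = counts.mean() + 2.0 * counts.std()
    peak_idx = np.where(counts > threshold)[0]

    # Bins within `merge_dist` (20 cm) of each other belong to the same slab.
    centres = (edges[peak_idx] + edges[peak_idx + 1]) / 2.0
    slabs, group = [], [centres[0]]
    for c in centres[1:]:
        if c - group[-1] <= merge_dist:
            group.append(c)
        else:
            slabs.append(float(np.mean(group)))
            group = [c]
    slabs.append(float(np.mean(group)))

    # In Figure 3a's 1-based numbering, odd slabs are floors and the slab above
    # each is its ceiling; pair consecutive slabs into (floor_z, ceiling_z).
    return [(slabs[i], slabs[i + 1]) for i in range(0, len(slabs) - 1, 2)]
```
The points belonging to one storey can then be extracted as those whose z-coordinate lies between a floor slab and the corresponding ceiling slab (plus a small margin), before the manual corrections described above are applied.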
3.1.2. Room Segmentation
For room segmentation, this study applied the 2D room segmentation method from [
46] to the 3D point cloud data. This process requires generating a 2D occupancy map from the 3D point cloud, applying the Robust Structure Identification and Room Segmentation (ROSE2) [
46] algorithm to this 2D occupancy map, and subsequently reconstructing the results back into 3D point clouds, thereby achieving room segmentation for a single floor’s point cloud. This method was applied to the building data acquired for this study.
This paper presents a method for generating a 2D occupancy map from 3D point cloud data. To create this map, it is essential to distinguish between occupied, free, and unknown regions. For the 2D occupancy map, the 3D point cloud data were converted into a 2D grid map with a resolution of 0.05 m. This grid-based approach is required for input into the ROSE2 algorithm, which operates on 2D images. Initially, all cells in the occupancy map are assumed to be unknown, and every cell covered by the point cloud is marked as free, resulting in a draft occupancy map as shown in
Figure 4a.
To identify occupied areas, wall information is extracted from the 3D point cloud [
47,
48]. Since wall-related point clouds generally extend vertically from floor to ceiling, the point cloud for each floor is divided into 11 layers by height, with the 6th layer—representing the mid-height—used to create a 0.05 m grid map. This grid map is assumed to contain wall information, as illustrated in
Figure 4b. Finally, the draft occupancy map is combined with the wall structure grid map to produce the 2D occupancy map shown in
Figure 4c.
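A minimal sketch of this occupancy-map construction is given below; the cell values and helper names are assumptions made for illustration rather than the exact implementation. It marks every cell covered by a point as free and rasterizes the mid-height (6th of 11) layer as occupied wall cells.
```python
import numpy as np

UNKNOWN, FREE, OCCUPIED = 0, 255, 128   # illustrative cell values

def build_occupancy_map(points, resolution=0.05, n_layers=11, wall_layer=5):
    """Build a 2D occupancy grid for one storey from its 3D point cloud."""
    xy = points[:, :2]
    origin = xy.min(axis=0)
    size = np.ceil((xy.max(axis=0) - origin) / resolution).astype(int) + 1
    grid = np.full((size[1], size[0]), UNKNOWN, dtype=np.uint8)

    # Draft map: every cell covered by at least one point is marked as free.
    cells = ((xy - origin) / resolution).astype(int)
    grid[cells[:, 1], cells[:, 0]] = FREE

    # Wall map: split the storey into 11 height layers and rasterize the 6th
    # (mid-height) layer, whose points are assumed to belong to walls.
    z = points[:, 2]
    edges = np.linspace(z.min(), z.max(), n_layers + 1)
    in_mid = (z >= edges[wall_layer]) & (z < edges[wall_layer + 1])
    wall_cells = ((xy[in_mid] - origin) / resolution).astype(int)
    grid[wall_cells[:, 1], wall_cells[:, 0]] = OCCUPIED

    return grid, origin
```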
The generated 2D occupancy map is used as input for the ROSE2 room segmentation algorithm, which proceeds according to the standard ROSE2 process. For this study, the default ROSE2 settings were applied. The algorithm extracts wall representative lines from the 2D occupancy map and performs clustering based on DBSCAN to detect room segments. The final room segmentation results are then obtained from this detection process, following the steps shown in
Figure 4d to produce the 2D room segmentation output. This segmentation result is then projected onto the 3D point cloud data to achieve 3D room segmentation. However, as with floor segmentation, ROSE2-based room segmentation is not entirely accurate, and manual corrections are necessary to refine the results.
Figure 5 illustrates examples of manual adjustments to the room segmentation results. In cases where data for a single room are divided into multiple segments, these segments must be merged. Conversely, when multiple rooms are incorrectly grouped as a single room, manual intervention is required to separate them.
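For completeness, projecting the 2D segmentation produced by ROSE2 back onto the 3D points reduces to a grid lookup, as in the illustrative sketch below (it assumes the 2D result is available as a per-cell room-ID raster aligned with the occupancy map; the helper name is hypothetical).
```python
import numpy as np

def project_room_labels(points, room_map, origin, resolution=0.05):
    """Assign each 3D point the room ID of the 2D grid cell it falls into."""
    cells = ((points[:, :2] - origin) / resolution).astype(int)
    cols = np.clip(cells[:, 0], 0, room_map.shape[1] - 1)
    rows = np.clip(cells[:, 1], 0, room_map.shape[0] - 1)
    return room_map[rows, cols]  # one room ID per point
```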
3.2. Element Selection by Point Intersection
This section introduces a method for segmenting each object within the room-divided 3D point cloud data to align with the BIM model. It is essential to determine whether each point in the 3D point cloud is adjacent to or contained within a specific BIM model object. To facilitate this process, this study utilizes IfcOpenShell [49] (https://github.com/IfcOpenShell/IfcOpenShell, accessed on 28 May 2024), an open-source software library designed for handling IFC file formats in scripting environments. IfcOpenShell employs Boundary Representation (B-Rep) and an unbalanced binary tree (UB tree) structure, allowing efficient exploration of each object within the IFC model.
For each BIM object, data are organized using B-Rep to create bounding boxes, which are then arranged into a UB tree. The root node represents the bounding box of the entire building, while leaf nodes correspond to the bounding boxes of individual objects. Intermediate nodes contain bounding boxes that encompass multiple objects, forming the overall UB tree structure illustrated in
Figure 6. By traversing this tree, it is possible to identify the BIM object associated with each point in the 3D point cloud.
However, even with precise scan-to-BIM construction, alignment issues may arise due to manual or measurement errors, resulting in imperfect contact between BIM objects and the 3D point cloud. To account for these discrepancies, the 3D points are treated not as single points but as spherical volumes with a radius experimentally set to 5 cm. The UB tree encompassing all BIM objects in the building is constructed, and the search through the 3D point cloud data is conducted using functions provided by IfcOpenShell.
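A minimal sketch of this query using the IfcOpenShell Python bindings is shown below. The iterator and tree construction and the select call with an extend radius follow our reading of the library's documented interface; exact signatures may differ between versions, and the function name is illustrative.
```python
import ifcopenshell
import ifcopenshell.geom

def label_points_with_ifc(ifc_path, points, radius=0.05):
    """Query each scanned point against the BIM and return the IFC class it hits."""
    model = ifcopenshell.open(ifc_path)
    settings = ifcopenshell.geom.settings()

    # Build the bounding-volume (UB) tree over all element geometries in the model.
    tree = ifcopenshell.geom.tree()
    it = ifcopenshell.geom.iterator(settings, model)
    if it.initialize():
        while True:
            tree.add_element(it.get_native())
            if not it.next():
                break

    labels = []
    for p in points:
        # extend=radius treats the point as a 5 cm sphere so that small
        # registration and modelling offsets still yield an intersection.
        hits = tree.select(tuple(float(v) for v in p[:3]), extend=radius)
        labels.append(hits[0].is_a() if hits else "clutter")
    return labels
```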
4. Experimental Results
This section is divided into two parts. The first part presents an example of new training data generated using the proposed framework, demonstrating the partitioning of floors and rooms within the complete building point cloud data and the labeling of each point cloud segment with minimal manual intervention. The second part reports the results of applying state-of-the-art 3D semantic segmentation models to both the S3DIS dataset and the newly generated dataset. These results highlight the contribution of the dataset constructed through the proposed framework to the generalization of deep learning models.
4.1. Creation of Training Dataset
The dataset used in this study originates from the scan-to-BIM process and includes diverse structures such as classrooms and offices. The laser scanning process was conducted using the NavVis VLX, achieving an accuracy of 10 mm. The acquired point cloud data include coordinates, color, intensity, and normals. BIM models were constructed by 3D designers directly on top of the 3D point cloud data. Five buildings were selected for dataset construction, with two floors chosen from each building, resulting in a dataset containing coordinates, color, and normals. In total, 10 areas were created as part of this dataset.
To generate training data, non-building elements within the point cloud—such as ground, trees, and grass—were manually removed, after which the data were processed through the proposed framework. The framework settings were as follows: for floor segmentation, the histogram interval was set to 0.05 m, with the threshold defined as the mean plus twice the standard deviation of each histogram. For room segmentation, the grid size for occupancy map creation was set to 0.05 m, and the ROSE2 parameters used for 2D room segmentation (https://github.com/goldleaf3i/declutter-reconstruct, accessed on 28 July 2024) were those specified by the authors. In applying element selection by point intersection, a 0.05 m tolerance was established to account for data acquisition and IFC file generation errors, ensuring that points were assigned to nearby objects.
Figure 7 illustrates the training dataset generated using the proposed framework. The results of floor segmentation and room segmentation for each building are presented, with the final ground truth achieved through element selection by point intersection. Floor and room segmentation, which are part of the preprocessing steps, were initially performed automatically, followed by manual adjustments to correct any missegmented or merged sections. In contrast, element selection by point intersection was conducted in a fully automated manner to segment individual objects.
The generated training dataset includes structural elements such as walls, ceilings, floors, doors, windows, columns, and beams, while clutter was excluded due to the difficulty of instancing remaining point cloud elements. The colors representing each class are shown in
Figure 1b. Buildings #1 through #4 were partially employed for training the deep learning model, whereas Building #5 was solely designated for evaluation purposes. Furthermore, although the dataset was constructed from only five buildings, it comprises a total of 94 rooms.
4.2. PointTransformer v3
In this study, PointTransformer v3 was selected from various deep learning architectures for semantic segmentation. As a state-of-the-art (SOTA) model, PointTransformer v3 demonstrates strong performance not only on indoor datasets like S3DIS and ScanNet but also on outdoor datasets, making it a promising choice for diverse datasets. The structure and labeling of the generated dataset were designed to align with S3DIS. However, given the scan-to-BIM focus of this study, the dataset was refined from the original 13 categories (ceiling, floor, wall, beam, column, window, door, table, chair, sofa, bookcase, board, clutter) to 8 categories, excluding table, chair, sofa, bookcase, and board. Labels in the S3DIS dataset were similarly consolidated, with non-essential labels categorized as clutter. For evaluation, the S3DIS dataset was used as a baseline, with additional building data included to train and assess each model from scratch.
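This label consolidation can be expressed as a simple lookup table, as sketched below; the class ordering is an assumption about the data loader and is shown only for illustration.
```python
import numpy as np

# Assumed S3DIS label order; actual integer IDs depend on the data loader used.
S3DIS_CLASSES = ["ceiling", "floor", "wall", "beam", "column", "window",
                 "door", "table", "chair", "sofa", "bookcase", "board", "clutter"]
TARGET_CLASSES = ["ceiling", "floor", "wall", "beam", "column", "window",
                  "door", "clutter"]

# Furniture-like classes collapse into "clutter"; structural classes keep their name.
LOOKUP = np.array([TARGET_CLASSES.index(c) if c in TARGET_CLASSES
                   else TARGET_CLASSES.index("clutter")
                   for c in S3DIS_CLASSES])

def remap_labels(labels):
    """Remap per-point 13-class S3DIS labels to the consolidated 8-class scheme."""
    return LOOKUP[np.asarray(labels)]
```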
4.2.1. Model Training
Model training and testing were conducted on four Nvidia H100 GPUs, each with 80 GB of memory. The execution environment was configured according to the guidelines for running PointTransformer v3 [
50]. Training was performed using the indoor dataset S3DIS combined with the newly generated dataset. The model was trained for 3000 epochs, with evaluations conducted every 100 epochs. The learning rate was set to 0.006, with a cosine scheduler applied. A batch size of 16 was used, and the optimizer was AdamW with a weight decay of 0.05. CrossEntropy and LovaszLoss were applied as the loss criteria.
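For clarity, these optimization settings correspond to the plain PyTorch sketch below. The actual experiments were run with the Pointcept codebase [50]; the function shown here is only an illustrative equivalent, and the Lovász term is taken from an external implementation in practice.
```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

EPOCHS, BATCH_SIZE = 3000, 16  # values reported above

def build_optimization(model):
    """Optimizer, scheduler, and primary loss matching the reported settings."""
    optimizer = AdamW(model.parameters(), lr=0.006, weight_decay=0.05)
    scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)
    # The Lovász-softmax loss is added to this cross-entropy term during training.
    criterion = torch.nn.CrossEntropyLoss(ignore_index=-1)
    return optimizer, scheduler, criterion
```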
The datasets used for model training are summarized in
Table 1. S3DIS Areas 1, 2, 3, 4, and 6 were used for training, while Area 5 was reserved for evaluation. Dataset A consists solely of S3DIS data. Datasets B, C, D, and E include data from Buildings #1 through #4, respectively. Additionally, training was conducted on Dataset F, which combines data from various domains to enhance diversity.
4.2.2. Model Results for S3DIS
This section provides a quantitative evaluation and analysis of the semantic segmentation results.
Table 2 presents the semantic segmentation performance of PointTransformer v3, evaluated on S3DIS Area 5, using IoU metrics for each dataset. The results of models trained on additional datasets were analyzed in comparison to Dataset A, which was trained exclusively on S3DIS. Datasets B and C exhibited lower performance, while Datasets D, E, and F demonstrated higher performance. Specifically, Dataset B and Dataset C achieved mIoUs of 67.68 and 67.51, respectively, slightly lower than Dataset A, suggesting that data from Buildings #1 and #2 may not enhance model performance. In contrast, Dataset D and Dataset E, which include data from Buildings #3 and #4, showed improved mIoUs of 68.5 and 69.73, respectively, indicating a positive impact on performance. This suggests that data from Buildings #3 and #4 are of similar quality to S3DIS data. Notably, Dataset E achieved the highest mIoU, with strong performance across most classes and particularly high IoU for the column class, implying that Building #4 data share domain characteristics with S3DIS.
An analysis of each class reveals trends across the datasets. For the Beam class, all datasets achieved an IoU of 0, likely due to the difficulty of model learning despite the presence of three beam objects in the Area 5 dataset. The clutter class, trained solely on S3DIS data, showed minimal variation between datasets. Similarly, the ceiling, floor, and wall classes exhibited low variance across datasets, likely due to the high volume of points in these classes, contributing to consistent performance. Notably, Datasets B and D achieved over 8.0 IoU points higher than other datasets in the door class, and an IoU improvement of approximately 5.0 in the window class, suggesting that data from Buildings #1 and #3 aided feature extraction for these classes. Dataset E outperformed Dataset A, which was trained only on S3DIS data, in all classes except floor, indicating that Building #4 data not only align closely with the S3DIS domain but also appear to be of higher quality.
Dataset F was constructed by combining the data from all buildings except those included in Dataset E, which achieved the highest performance on S3DIS Area 5. As shown in
Table 1, Dataset F includes Areas 7, 9, and 11. It achieved an mIoU of 69.08, marking the highest performance among datasets other than Dataset E. Similar to Datasets B, C, and D, Dataset F exhibited strong performance on the window and door classes, with IoUs of 62.40 and 84.29, respectively, as well as an IoU of 42.28 for the column class. These results suggest that incorporating data from diverse domains, even in smaller quantities, may contribute to performance improvement compared to training on individual datasets alone.
The results from Datasets B, C, and D indicate that adding more data does not necessarily improve deep learning model performance. In contrast, the results from Dataset E demonstrate that training with data from similar domains can enhance performance. Additionally, the results from Dataset F suggest that using data from multiple domains, rather than a single domain, can contribute to overall performance gains and improved model generalization.
4.2.3. Model Results for the Created Dataset
In this section, we conduct additional experiments and provide analysis on a subset of the extended datasets. The purpose of these experiments is to determine whether training with mixed-domain data yields better performance than training with data from a single domain. For this reason, Dataset F (which combines data from different domains), Dataset A (which includes only S3DIS), and Dataset E (which performed well on S3DIS Area 5) were selected for comparison. Testing was conducted on Areas 8, 10, 12, 15, and 16, which were excluded from the training data. Beam objects are not included in Areas 10 and 12 due to occlusion, and the data generated by the proposed framework exclude clutter objects.
Table 3 presents the results of testing models trained on Datasets A, E, and F on Areas 8, 10, 12, 15, and 16. As clutter is not included in the additional datasets, its IoU is 0.0. Similarly, beam objects are excluded from Areas 10 and 16, resulting in an IoU of 0.0 for this class.
Datasets A and E, which were trained on data not closely aligned with the domains of the test Areas 8, 10, and 12, exhibited lower performance on these areas. In contrast, Dataset F, which includes similar domain data, achieved over twice the mIoU of Datasets A and E on these test areas, despite its smaller scale. For the door class, models trained on Datasets A and E performed poorly compared to Dataset F, showing similar results for the window class. Notably, models trained on Datasets A and E showed significant performance drops on ceiling and floor objects in Areas 10 and 12, highlighting the advantages of using mixed-domain data for improved generalization.
The test results for Area 15 and Area 16, which belong to a different domain from the training data, are presented in
Table 3. The model trained on the diverse domain data of Dataset F achieved scores of 63.00 and 54.17 for Area 15 and Area 16, respectively, outperforming the models trained on Dataset A and Dataset E. These results demonstrate that models trained on diverse domain data exhibit superior generalization performance when applied to new domain data.
Figure 8 visualizes the results of deep learning models trained on each dataset. In the results for Area 5, the model trained on Dataset E—yielding the highest performance—demonstrates superior wall delineation compared to models trained on Datasets A and F, as indicated by the black dotted line. This suggests that Building #4’s Areas 13 and 14 likely share a similar domain with the S3DIS dataset.
In the results for Areas 8, 10, and 12, the black dotted sections reveal that the model trained on Dataset F outperforms the others in accurately segmenting floor objects. This performance improvement is not solely due to training on similar data but rather due to the distinct characteristics of the data. S3DIS Areas are typically situated close to ground level, resulting in floor heights near the origin. In contrast, the dataset created in this study includes data from multiple floors with varying heights. Consequently, the test buildings incorporate these floor height variations, and models trained on Dataset E—which includes data with similar domain characteristics—demonstrate superior performance in these areas.
The results for Area 15 indicate that models trained with Datasets A, E, and F successfully classify floors, ceilings, and columns. However, as shown in the black dashed regions in the fifth row of
Figure 8, models trained on Datasets A and E fail to segment beam objects, in contrast to the model trained on Dataset F. Similarly, the results for Area 16 reveal that models trained on Datasets A and E missegment windows as doors, whereas the model trained on Dataset F correctly identifies windows but occasionally missegments certain walls as windows. Despite these limitations, the model trained on the combined Dataset F demonstrates enhanced qualitative performance compared to the others.
In summary, training on diverse domain data can enhance the generalization performance of deep learning models, underscoring the need for generating data across various domains. By providing such diverse domain data, the framework proposed in this study can therefore help improve the performance and generalization of deep learning models.
5. Discussion
The automatic generation of Building Information Modeling (BIM) remains challenging for practical implementation. These challenges stem from technical limitations, insufficient data, and the complexity of diverse architectural structures. Although current technologies have achieved a certain level of success in segmentation and object extraction based on point cloud data, they are not yet capable of producing fully automated BIM models that meet the demands of practical applications.
The semi-automated framework proposed in this paper enables the creation of labeled point cloud datasets by utilizing pairs of BIM models and 3D point clouds generated through scan-to-BIM. This framework is designed to use real-world point clouds directly, allowing sensor noise and occlusions to be incorporated, which better reflects real-world conditions. The quality of the BIM model is crucial to the effectiveness of the proposed framework. If there is insufficient overlap between the real point cloud and the BIM model, deep learning models may face significant challenges in learning from the dataset. For instance, the BIM models for Buildings #2 and #3 were constructed earlier than others, resulting in less overlap with the current point clouds. However, given the inherent difficulties in creating perfectly overlapping BIM models due to sensor inaccuracies and manual modeling errors, a margin of 0.05 m was applied to account for these discrepancies.
The proposed framework is a semi-automated approach that incorporates some manual processes rather than being fully automated. Therefore, further research is needed to refine the manual components of floor and room segmentation. Most parameters, including threshold values, were determined experimentally due to limited prior research. For floor segmentation, density-based histograms were generated for the entire point cloud, with points exceeding a defined threshold identified as potential floor or ceiling candidates. The first threshold-exceeding point from the bottom was assumed to be the floor. However, as illustrated in
Figure 3, variations in floor or ceiling positions within the same level can result in misclassifications, where sections may be treated as noise and removed. Additionally, in buildings with open ceilings or irregular structures, accurate floor segmentation may be compromised, highlighting areas for further improvement.
For room segmentation, the proposed process reduces the 3D point cloud to a 2D occupancy map, applies 2D room segmentation techniques, and then restores it to the 3D point cloud. This approach is suitable when an occupancy map is unavailable in pre-acquired data. However, the effectiveness of room segmentation within the proposed framework relies heavily on the ROSE2 algorithm; inaccuracies in occupancy map generation could negatively impact results. Nonetheless, it can be effectively automated with appropriate parameter tuning.
Additionally, this study did not sufficiently analyze the time required for the manual work compared to the automated process. As the time and quality of manual work vary with the individual worker, a quantitative analysis of these factors has been identified as a topic for future research.
As demonstrated by the experimental results, training deep learning models on smaller datasets from multiple domains can enhance generalization performance more effectively than training on a large dataset from a single domain. However, since the overall height of the data can influence semantic segmentation, future datasets should be constructed with the floors positioned close to the origin to minimize this effect.
The proposed framework does not generate data for furniture, such as chairs and desks, as the focus of this study is on improving semantic segmentation performance for structural elements like walls, floors, and ceilings. Testing on diverse datasets confirmed that the framework contributes to enhanced deep learning model performance. Additionally, with paired BIM models and corresponding 3D point cloud data, this framework could be applicable beyond architecture, extending to fields such as industrial applications. Future research should include an analysis of the experimentally set parameters and investigate the framework’s adaptability to non-architectural domains.
6. Conclusions
Traditionally, creating 3D models has been a labor-intensive and costly process due to the manual effort required. To address this, research has increasingly focused on automating scan-to-BIM processes. In particular, advancements in deep learning have enabled methods based on 3D semantic segmentation to extract valuable information from point clouds. Large-scale datasets such as S3DIS and LiDARNet have been released to support these methods, yet the availability of 3D data remains limited compared to image datasets. To provide high-quality, labeled 3D point cloud data, this paper proposes a semi-automated framework that leverages BIM models and 3D point clouds to construct datasets for 3D semantic segmentation. This framework divides the data by floors and rooms and employs the open-source IfcOpenShell to facilitate dataset creation. A total of 10 areas were generated from two floors across five buildings.
Using the proposed framework, six datasets were created by extracting distinct Areas. Each dataset was used to train the state-of-the-art deep learning model, PointTransformer v3, with tests conducted on S3DIS Area 5 as well as additional Areas. Performance changes were analyzed with the addition of new data, revealing that training with data from similar domains improved model performance. Further experiments demonstrated that training with mixed-domain data also achieved superior performance compared to single-domain data. This study demonstrates the necessity of training on diverse domain data to achieve generalization in deep learning. The proposed framework thus offers a way to address the scarcity of high-quality labeled 3D data. Furthermore, the findings highlight the potential of the framework to contribute to improving the generalization capabilities of deep learning models.
Author Contributions
Conceptualization, J.H.L. and S.L.; methodology, H.Y. and J.H.L.; software, H.Y. and Y.K.; validation, J.-H.R. and S.L.; formal analysis, H.Y.; investigation, H.Y. and J.H.L.; resources, S.L.; data curation, J.H.L. and H.Y.; writing—original draft preparation, H.Y.; writing—review and editing, H.Y. and Y.K.; visualization, J.-H.R.; supervision, J.H.L.; project administration, S.L.; funding acquisition, S.L. All authors have read and agreed to the published version of the manuscript.
Funding
This work is supported by the Korea Agency for Infrastructure Technology Advancement (KAIA) grant funded by the Ministry of Land, Infrastructure and Transport (Grant RS-2022-00143584).
Data Availability Statement
The S3DIS dataset [
27] is available at:
https://redivis.com/datasets/9q3m-9w5pa1a2h or
http://buildingparser.stanford.edu/dataset.html (accessed on 1 March 2024). The proposed database is available to those who make a request by email, but it may not be used for commercial purposes. If requestors alter, transform, or build upon this work, they may distribute the resulting work only under the same license. The created datasets (Building #1~#5) are partially available on request from the corresponding author due to obtaining the data with the cooperation of local governments.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Singh, M.; Srivastava, R.; Fuenmayor, E.; Kuts, V.; Qiao, Y.; Murray, N.; Devine, D. Applications of Digital Twin across Industries: A Review. Appl. Sci. 2022, 12, 5727. [Google Scholar] [CrossRef]
- VanDerHorn, E.; Mahadevan, S. Digital Twin: Generalization, characterization and implementation. Decis. Support Syst. 2021, 145, 113524. [Google Scholar] [CrossRef]
- Attaran, M.; Celik, B.G. Digital Twin: Benefits, Use Cases, Challenges, and Opportunities. Decis. Anal. J. 2023, 6, 100165. [Google Scholar] [CrossRef]
- Okonta, E.D.; Vukovic, V.; Hayat, E. Prospective Directions in the Computer Systems Industry Foundation Classes (IFC) for Shaping Data Exchange in the Sustainability and Resilience of Cities. Electronics 2024, 13, 2297. [Google Scholar] [CrossRef]
- Acharya, D.; Khoshelham, K.; Winter, S. BIM-PoseNet: Indoor camera localisation using a 3D indoor model and deep learning from synthetic images. ISPRS J. Photogramm. Remote Sens. 2019, 150, 245–258. [Google Scholar] [CrossRef]
- Ochmann, S.; Vock, R.; Wessel, R.; Klein, R. Automatic reconstruction of parametric building models from indoor point clouds. Comput. Graph. 2016, 54, 94–103. [Google Scholar] [CrossRef]
- Wang, C.; Cho, Y.K.; Kim, C. Automatic BIM component extraction from point clouds of existing buildings for sustainability applications. Autom. Constr. 2015, 56, 1–13. [Google Scholar] [CrossRef]
- Murali, S.; Speciale, P.; Oswald, M.R.; Pollefeys, M. Indoor Scan2BIM: Building information models of house interiors. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 6126–6133. [Google Scholar] [CrossRef]
- Tchuinkou Kwadjo, D.; Tchinda, N.; Bobda, C.; Menadjou, N.; Fotsing, C.; Nziengam, N. From PC2BIM: Automatic Model generation from Indoor Point Cloud. In Proceedings of the 13th International Conference on Distributed Smart Cameras, Trento, Italy, 9–11 September 2019; pp. 1–6. [Google Scholar] [CrossRef]
- Gao, X.; Yang, R.; Chen, X.; Tan, J.; Liu, Y.; Wang, Z.; Tan, J.; Liu, H. A New Framework for Generating Indoor 3D Digital Models from Point Clouds. Remote Sens. 2024, 16, 3462. [Google Scholar] [CrossRef]
- Luo, Z.; Xie, Z.; Wan, J.; Zeng, Z.; Liu, L.; Tao, L. Indoor 3D Point Cloud Segmentation Based on Multi-Constraint Graph Clustering. Remote Sens. 2023, 15, 131. [Google Scholar] [CrossRef]
- Singh, T.; Mahmoodian, M.; Wang, S. Enhancing Open BIM Interoperability: Automated Generation of a Structural Model from an Architectural Model. Buildings 2024, 14, 2475. [Google Scholar] [CrossRef]
- Mahmoud, M.; Chen, W.; Yang, Y.; Li, Y. Automated BIM generation for large-scale indoor complex environments based on deep learning. Autom. Constr. 2024, 162, 105376. [Google Scholar] [CrossRef]
- Yang, F.; Zhou, G.; Su, F.; Zuo, X.; Tang, L.; Liang, Y.; Zhu, H.; Li, L. Automatic Indoor Reconstruction from Point Clouds in Multi-room Environments with Curved Walls. Sensors 2019, 19, 3798. [Google Scholar] [CrossRef] [PubMed]
- Park, J.; Kim, J.; Lee, D.; Jeong, K.; Lee, J.; Kim, H.; Hong, T. Deep Learning–Based Automation of Scan-to-BIM with Modeling Objects from Occluded Point Clouds. J. Manag. Eng. 2022, 38, 04022025. [Google Scholar] [CrossRef]
- Tang, S.; Li, X.; Zheng, X.; Wu, B.; Wang, W.; Zhang, Y. BIM generation from 3D point clouds by combining 3D deep learning and improved morphological approach. Autom. Constr. 2022, 141, 104422. [Google Scholar] [CrossRef]
- Yue, H.; Wang, Q.; Zhao, H.; Zeng, N.; Tan, Y. Deep learning applications for point clouds in the construction industry. Autom. Constr. 2024, 168, 105769. [Google Scholar] [CrossRef]
- Huan, L.; Zheng, X.; Gong, J. GeoRec: Geometry-enhanced semantic 3D reconstruction of RGB-D indoor scenes. ISPRS J. Photogramm. Remote Sens. 2022, 186, 301–314. [Google Scholar] [CrossRef]
- Li, Y.; Li, W.; Tang, S.; Darwish, W.; Hu, Y.; Chen, W. Automatic Indoor As-Built Building Information Models Generation by Using Low-Cost RGB-D Sensors. Sensors 2020, 20, 293. [Google Scholar] [CrossRef]
- Perez-Perez, Y.; Golparvar-Fard, M.; El-Rayes, K. Scan2BIM-NET: Deep Learning Method for Segmentation of Point Clouds for Scan-to-BIM. J. Constr. Eng. Manag. 2021, 147, 04021107. [Google Scholar] [CrossRef]
- Charles, R.Q.; Su, H.; Kaichun, M.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 77–85. [Google Scholar] [CrossRef]
- Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; Markham, A. RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020. [Google Scholar] [CrossRef]
- Zhao, H.; Jiang, L.; Jia, J.; Torr, P.H.; Koltun, V. Point transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 16259–16268. [Google Scholar] [CrossRef]
- Qian, G.C.; Li, Y.C.; Peng, H.W.; Mai, J.J.; Hammoud, H.; Elhoseiny, M.; Ghanem, B. Pointnext: Revisiting pointnet++ with improved training and scaling strategies. Adv. Neural Inf. Process. Syst. 2022, 35, 23192–23204. [Google Scholar] [CrossRef]
- Choy, C.; Gwak, J.; Savarese, S. 4d spatio-temporal convnets: Minkowski convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 3075–3084. [Google Scholar] [CrossRef]
- Thomas, H.; Qi, C.R.; Deschaud, J.E.; Marcotegui, B.; Goulette, F.; Guibas, L.J. KPConv: Flexible and Deformable Convolution for Point Clouds. arXiv 2019, arXiv:1904.08889. [Google Scholar] [CrossRef]
- Armeni, I.; Sener, O.; Zamir, A.R.; Jiang, H.; Brilakis, I.; Fischer, M.; Savarese, S. 3d semantic parsing of large-scale indoor spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1534–1543. [Google Scholar] [CrossRef]
- Liu, Y.; Dai, Q.; Xu, W. A point-cloud-based multiview stereo algorithm for free-viewpoint video. IEEE Trans. Vis. Comput. Graph. 2009, 16, 407–418. [Google Scholar] [CrossRef]
- Guo, Y.; Li, Y.; Ren, D.; Zhang, X.; Li, J.; Pu, L.; Ma, C.; Zhan, X.; Guo, J.; Wei, M.; et al. LiDAR-Net: A Real-scanned 3D Point Cloud Dataset for Indoor Scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024. [Google Scholar] [CrossRef]
- Geyter, S.; Bassier, M.; Vergauwen, M. Automated Training Data Creation for Semantic Segmentation of 3d Point Clouds. In The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences; ISPRS: Prague, Czech Republic, 2022; Volume XLVI-5/W1-2022, pp. 59–67. [Google Scholar] [CrossRef]
- Noichl, F.; Braun, A.; Borrmann, A. “BIM-to-Scan” for Scan-to-BIM: Generating Realistic Synthetic Ground Truth Point Clouds Based on Industrial 3D Models. In Proceedings of the 2021 European Conference on Computing in Construction, Online, 19–28 July 2021. [Google Scholar] [CrossRef]
- Huang, H.S.; Tang, S.J.; Wang, W.X.; Li, X.M.; Guo, R.Z. From BIM To Pointcloud: Automatic Generation of Labeled Indoor Pointcloud. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2022, XLIII-B5-2022, 73–78. [Google Scholar] [CrossRef]
- Birkeland, A.A.; Udnæs, M. Semi-automated Dataset Creation for Semantic and Instance Segmentation of Industrial Point Clouds. Comput. Ind. 2024, 155, 104064. [Google Scholar] [CrossRef]
- Ma, J.W.; Czerniawski, T.; Leite, F. Semantic Segmentation of Point Clouds of Building Interiors with Deep Learning: Augmenting Training Datasets with Synthetic BIM-Based Point Clouds. Autom. Constr. 2020, 113, 103144. [Google Scholar] [CrossRef]
- Zhang, H.; Wang, T. An Efficient Method for Producing Deep Learning Point Cloud Datasets based on BIM 3D Model and Computer Simulation. In Proceedings of the Second International Symposium on Computer Technology and Information Science (ISCTIS 2022), Guilin, China, 10–12 June 2022; Volume 12474, p. 2653608. [Google Scholar] [CrossRef]
- Griffiths, D.; Boehm, J. A Review on Deep Learning Techniques for 3D Sensed Data Classification. Remote Sens. 2019, 11, 1499. [Google Scholar] [CrossRef]
- Abreu, N.; Pinto, A.; Matos, A.; Pires, M. Procedural Point Cloud Modelling in Scan-to-BIM and Scan-vs-BIM Application: A Review. ISPRS Int. J. Geo Inf. 2023, 12, 260. [Google Scholar] [CrossRef]
- Humblot-Renaux, G.; Jensen, S.B.; Møgelmose, A. From CAD Models to Soft Point Cloud Labels: An Automatic Annotation Pipeline for Cheaply Supervised 3D Semantic Segmentation. Remote Sens. 2023, 15, 3578. [Google Scholar] [CrossRef]
- Andrea, M.; Egidio, L. Automatic Construction of Structural Meshes from Photographic and Laser Surveys. In Proceedings of the AIMETA Conference, Palermo, Italy, 4–8 September 2022; pp. 251–256. [Google Scholar] [CrossRef]
- Andriasyan, M.; Moyano, J.; Nieto-Julián, J.E.; Antón, D. From point cloud data to building information modelling: An automatic parametric workflow for heritage. Remote Sens. 2020, 12, 1094. [Google Scholar] [CrossRef]
- Kim, S.; Jeong, K.; Hong, T.; Lee, J. Deep Learning–Based Automated Generation of Material Data with Object–Space Relationships for Scan to BIM. J. Manag. Eng. 2023, 39, 3. [Google Scholar] [CrossRef]
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. arXiv 2017, arXiv:1706.02413. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2017; pp. 5998–6008. [Google Scholar] [CrossRef]
- Wu, X.; Lao, Y.; Jiang, L.; Liu, X.; Zhao, H. Point transformer v2: Grouped vector attention and partition-based pooling. Adv. Neural Inf. Process. Syst. 2022, 35, 33330–33342. Available online: https://dl.acm.org/doi/10.5555/3600270.3602685 (accessed on 12 May 2024).
- Martens, J.; Blankenbach, J. VOX2BIM+—A Fast and Robust Approach for Automated Indoor Point Cloud Segmentation and Building Model Generation. PFG-J. Photogramm. Remote Sens. Geoinf. Sci. 2023, 91, 273–294. [Google Scholar] [CrossRef]
- Luperto, M.; Kucner, T.P.; Tassi, A.; Magnusson, M.; Amigoni, F. Robust Structure Identification and Room Segmentation of Cluttered Indoor Environments from Occupancy Grid Maps. arXiv 2022. Available online: http://arxiv.org/abs/2203.03519 (accessed on 28 July 2024). [CrossRef]
- Fotsing, C.; Hahn, P.; Cunningham, D.; Bobda, C. Volumetric wall detection in unorganized indoor point clouds using continuous segments in 2D grids. Autom. Constr. 2022, 141, 104462. [Google Scholar] [CrossRef]
- Yoo, H.; Rhee, G.; Ryu, J.; Lee, S.; Lee, J. Method for Extracting Wall Objects for 3D Modeling Using a 2D Grid-Based Approach. J. Korea Multimed. Soc. 2023, 26, 1583–1593. Available online: https://www.dbpia.co.kr/journal/articleDetail?nodeId=NODE11652474 (accessed on 15 April 2024). [CrossRef]
- IfcOpenShell. The Open Source IFC Toolkit and Geometry Engine. Available online: http://ifcopenshell.org/ (accessed on 28 May 2024).
- Pointcept. Point Cloud Perception Codebase. Available online: https://github.com/Pointcept/Pointcept (accessed on 7 July 2024).
Figure 1.
The main elements of Scan-to-BIM: (a) Acquisition of 3D point cloud data through laser scanning, (b) 3D semantic segmentation using a deep learning model, and (c) BIM model generation based on segmentation information.
Figure 2.
The preprocessing stage consists of floor segmentation and room segmentation, resulting in completed 3D room segmentation. The second stage is the Point-Intersection Element Selection process, where an IfcOpenShell-based script is used to determine which BIM objects contain each point.
Figure 3.
This figure shows the results of floor segmentation applied to the complete point cloud of a building. (a) z-axis histogram and (b) entire building point cloud. Green circles indicate bins corresponding to floors, which are used as reference points for segmentation. However, this method may produce mismatches, as seen in the black box at the bottom of the figure.
Figure 4.
This figure illustrates the ROSE2-based room segmentation process applied to floor-segmented data. The steps are as follows: (a) creation of a draft occupancy map from the floor-segmented data, (b) extraction of wall structures, (c) generation of a 2D occupancy map, and (d) room segmentation using ROSE2. ROSE2 performs 2D room segmentation on the occupancy map, which is then projected back onto the 3D point cloud, resulting in completed 3D room segmentation.
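The projection between the 3D point cloud and the 2D occupancy map, and the back-projection of room labels, can be sketched as follows. The grid resolution is a placeholder, and `room_of_cell` stands in for the 2D room labels that ROSE2 would produce.

```python
# Minimal sketch of the 2D occupancy map and the back-projection of room labels;
# cell size is illustrative and room_of_cell stands in for ROSE2's 2D output.
import numpy as np

def to_occupancy_grid(points, cell=0.05):
    """Project (N, 3) points onto the x-y plane as a binary occupancy grid."""
    origin = points[:, :2].min(axis=0)
    idx = np.floor((points[:, :2] - origin) / cell).astype(int)   # (N, 2) cell indices
    grid = np.zeros(tuple(idx.max(axis=0) + 1), dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1]] = 1                                 # mark occupied cells
    return grid, origin, idx

def backproject_rooms(idx, room_of_cell):
    """Assign each 3D point the room id of the 2D grid cell it falls into."""
    return room_of_cell[idx[:, 0], idx[:, 1]]                      # (N,) room labels
```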
Figure 5.
Manual adjustments to the room segmentation results. Because the ROSE2-based room segmentation is not entirely accurate, manual operations are required to merge or split rooms as needed.
Figure 6.
The tree structure and BIM object representation provided by IfcOpenShell [49]. The BIM model is represented through boundary representation (B-Rep), and bounding boxes are extracted to build an unbalanced binary tree: non-leaf nodes contain bounding boxes, while leaf nodes represent the boundaries of individual objects.
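To illustrate the bounding boxes that populate the tree's non-leaf nodes, the sketch below derives an axis-aligned box from an element's triangulated B-Rep geometry with IfcOpenShell. The file name is a placeholder, and the settings call follows the classic API, which may differ in newer IfcOpenShell releases.

```python
# Minimal sketch: axis-aligned bounding boxes from triangulated B-Rep geometry,
# as stored in the tree's non-leaf nodes. "building.ifc" is a placeholder; the
# settings call follows the classic IfcOpenShell API and may differ in newer releases.
import numpy as np
import ifcopenshell
import ifcopenshell.geom

model = ifcopenshell.open("building.ifc")
settings = ifcopenshell.geom.settings()
settings.set(settings.USE_WORLD_COORDS, True)        # vertices in model coordinates

for wall in model.by_type("IfcWall"):
    shape = ifcopenshell.geom.create_shape(settings, wall)
    verts = np.asarray(shape.geometry.verts).reshape(-1, 3)
    aabb_min, aabb_max = verts.min(axis=0), verts.max(axis=0)
    print(wall.GlobalId, aabb_min, aabb_max)
```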
Figure 7.
The dataset generated using the proposed framework. The reference BIM models were created manually via scan-to-BIM, and the dataset includes seven types of architectural elements: walls, floors, ceilings, columns, beams, doors, and windows. Two floors from each building were included. The preprocessing stage, comprising floor and room segmentation, required final manual adjustments to complete the 3D room segmentation; labeling of each area was then fully automated using IfcOpenShell. Buildings #1 to #4 were partially used for training the deep learning model, while Building #5 was reserved exclusively for evaluation.
Figure 8.
Qualitative evaluation results for Areas 5, 8, 10, and 12. Black dotted lines highlight regions where model performance varies significantly across the datasets.
Table 1.
A list of the datasets constructed for training the deep learning model in this study. For each dataset, the areas used for training are marked with the symbol “O”. Dataset A uses only S3DIS data, while the remaining datasets incorporate additional data generated through the framework proposed in this paper. Area 5 of S3DIS and Areas 15 and 16 of Building #5 were used exclusively for testing.
| Dataset | S3DIS (w/o Area 5) | Building #1: Area 7 | Building #1: Area 8 | Building #2: Area 9 | Building #2: Area 10 | Building #3: Area 11 | Building #3: Area 12 | Building #4: Area 13 | Building #4: Area 14 | Building #5: Area 15 | Building #5: Area 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Dataset A | O | - | - | - | - | - | - | - | - | - | - |
| Dataset B | O | O | O | - | - | - | - | - | - | - | - |
| Dataset C | O | - | - | O | O | - | - | - | - | - | - |
| Dataset D | O | - | - | - | - | O | O | - | - | - | - |
| Dataset E | O | - | - | - | - | - | - | O | O | - | - |
| Dataset F | O | O | - | O | - | O | - | - | - | - | - |
Table 2.
IoU validation results for S3DIS Area 5. Each dataset was used to train the deep learning model, and the IoU performance on S3DIS Area 5 was compared across datasets.
| Dataset | Ceiling | Floor | Wall | Beam | Column | Window | Door | Clutter | mIoU |
|---|---|---|---|---|---|---|---|---|---|
| Dataset A | 92.41 | 98.37 | 85.51 | 0.0 | 52.61 | 56.88 | 74.10 | 87.98 | 68.48 |
| Dataset B | 92.23 | 98.32 | 84.58 | 0.0 | 32.49 | 61.96 | 82.56 | 89.32 | 67.68 |
| Dataset C | 93.06 | 98.32 | 85.66 | 0.0 | 39.41 | 60.54 | 74.48 | 88.59 | 67.51 |
| Dataset D | 93.03 | 98.29 | 85.79 | 0.0 | 35.46 | 64.35 | 82.13 | 88.98 | 68.50 |
| Dataset E | 93.08 | 98.24 | 85.73 | 0.0 | 54.73 | 61.10 | 75.33 | 89.62 | 69.73 |
| Dataset F | 93.04 | 97.81 | 84.24 | 0.0 | 42.28 | 62.40 | 84.29 | 88.60 | 69.08 |
Table 3.
Evaluation results of models trained on each dataset for Areas 8, 10, 12, 15, and 16. The generated data do not include any clutter, and Areas 10 and 12 also exclude beams; as a result, an Intersection over Union (IoU) of 0.0 is reported for these classes in the corresponding areas.
| Dataset | Area | Ceiling | Floor | Wall | Beam | Column | Window | Door | Clutter | mIoU |
|---|---|---|---|---|---|---|---|---|---|---|
| Dataset A | Area 8 | 87.21 | 88.16 | 63.72 | 0.04 | 32.09 | 0.53 | 13.73 | 0.0 | 35.68 |
| | Area 10 | 62.27 | 37.49 | 78.13 | 0.0 | 35.18 | 0.16 | 0.0 | 0.0 | 26.66 |
| | Area 12 | 59.95 | 38.78 | 68.05 | 7.67 | 13.01 | 1.03 | 0.1 | 0.0 | 23.57 |
| | Area 15 | 75.45 | 70.89 | 78.12 | 0.0 | 48.61 | 0.02 | 21.22 | 0.0 | 36.79 |
| | Area 16 | 92.51 | 67.39 | 68.37 | 0.0 | 68.90 | 0.0 | 10.01 | 0.0 | 38.40 |
| Dataset E | Area 8 | 93.35 | 95.42 | 72.55 | 0.12 | 23.47 | 41.12 | 17.72 | 0.0 | 42.99 |
| | Area 10 | 59.29 | 27.44 | 79.4 | 0.0 | 4.87 | 4.83 | 0.34 | 0.0 | 22.02 |
| | Area 12 | 56.20 | 27.75 | 74.25 | 21.06 | 4.7 | 7.30 | 1.70 | 0.0 | 24.12 |
| | Area 15 | 77.42 | 79.22 | 78.22 | 0.12 | 36.73 | 40.29 | 38.27 | 0.0 | 43.78 |
| | Area 16 | 93.80 | 94.70 | 72.49 | 0.0 | 67.98 | 37.02 | 38.82 | 0.0 | 50.60 |
| Dataset F | Area 8 | 95.71 | 97.21 | 85.28 | 86.68 | 71.11 | 81.66 | 54.42 | 0.0 | 71.51 |
| | Area 10 | 95.47 | 97.34 | 89.97 | 0.0 | 76.23 | 6.83 | 85.95 | 0.0 | 56.47 |
| | Area 12 | 90.23 | 96.87 | 82.55 | 2.13 | 41.66 | 24.80 | 66.20 | 0.0 | 50.56 |
| | Area 15 | 79.16 | 80.79 | 83.29 | 75.79 | 52.32 | 72.72 | 59.95 | 0.0 | 63.00 |
| | Area 16 | 93.40 | 94.73 | 71.83 | 0.0 | 54.32 | 62.51 | 56.56 | 0.0 | 54.17 |