Article

Semantic Segmentation of Heavy Construction Equipment Based on Point Cloud Data

Department of Railroad Infrastructure Engineering, Korea National University of Transportation, Uiwang-si 16106, Republic of Korea
*
Author to whom correspondence should be addressed.
Buildings 2024, 14(8), 2393; https://doi.org/10.3390/buildings14082393
Submission received: 7 June 2024 / Revised: 27 July 2024 / Accepted: 31 July 2024 / Published: 2 August 2024
(This article belongs to the Special Issue Advanced Research on Intelligent Building Construction and Management)

Abstract

Most of the currently developed 3D point cloud data-based object recognition algorithms have been designed for small indoor objects, posing challenges when applied to large-scale 3D point cloud data in outdoor construction sites. To address this issue, this research selected four high-performance deep learning-based semantic segmentation algorithms for large-scale 3D point cloud data: RandLA-Net, KPConv Rigid, KPConv Deformable, and SCF-Net. These algorithms were trained and validated using 3D digital maps of earthwork sites to build semantic segmentation models, and their performance was tested and evaluated. The results of this research represent the first application of 3D semantic segmentation algorithms to large-scale 3D digital maps of earthwork sites. It was experimentally confirmed that object recognition technology can be implemented in the construction industry using 3D digital maps composed of large-scale 3D point cloud data.

1. Introduction

1.1. Research Background and Objectives

To address the challenge of low labor productivity in the construction industry, various initiatives have been suggested globally, including “regulatory innovation and enhanced transparency”, “enhancing construction capabilities”, “development of new construction materials”, and “adoption of novel digital technologies” [1,2,3,4,5,6,7,8,9,10]. Recently, driven by advancements in artificial intelligence (AI), considerable research has focused on enhancing construction labor productivity through automation via machine learning and deep learning technologies [4,11,12,13,14].
In the construction industry, several AI-driven computer vision technologies have been employed to monitor safety and processes, as well as to assess concrete damage, primarily through the analysis of 2D images and videos [15,16]. Furthermore, recent research has focused on object recognition, which entails simultaneous identification and classification of various items and structures on construction sites. This involves utilizing 2D images and video data to ensure the proper usage of safety helmets and belts [17]; differentiating between construction personnel and the general public; identifying forklifts [18]; inspecting cracks in concrete structures, such as roads, buildings, and bridges; and assessing potentially hazardous situations [5,19,20,21,22,23].
While research on object recognition using 2D image data in the construction industry has achieved high object recognition rates and fast processing speeds, contributing to construction labor productivity, the utilization of 3D point cloud data is increasing. This is due to the growing need for accurate height values and 3D geometric information in recent research areas, such as terrain change detection and earthwork volume calculation using UAVs (Unmanned Aerial Vehicles) and UGVs (Unmanned Ground Vehicles), construction equipment location monitoring, autonomous driving path planning, and the securing of safe routes [24,25,26]. Unlike 2D image data, 3D point cloud data includes Z-value information, ensuring the accuracy of 3D spatial location information. This makes it suitable for research requiring 3D geometric information, such as 3D digital map creation and utilization and robotics [11,27]. However, 3D point cloud data still requires continuous research before it can be utilized across the construction industry, owing to its larger file size, lower object recognition accuracy, and longer processing time compared with 2D data [28,29].
Studies on object recognition using 3D point cloud data in the construction industry are relatively limited compared with those using 2D image data. However, research in this area is gradually increasing, with several studies focusing on heavy construction equipment, which plays a crucial role in the entire construction process, affecting the construction quality, duration, manpower management, and safety. Therefore, research on object recognition for heavy construction equipment, development of machine guidance and machine control technologies, and use of unmanned heavy construction equipment is being actively conducted [12,22,23,24].
Object recognition studies for heavy construction equipment can be broadly categorized into those that use 2D image and 3D point cloud data. Studies utilizing 2D images of heavy construction equipment include those that generate image datasets, such as ImageNet [25] and the Alberta Construction Image Dataset (ACID) [26], as well as those focused on their classification [30]. In contrast, few studies have focused on object recognition of heavy construction equipment using 3D point cloud data, largely because the characteristics of such data are closely tied to the specific attributes of the data obtained from construction sites [8,31]. Generally, 2D image data is relatively lightweight and consists of uniformly arranged pixels, making it easy to handle. In contrast, 3D point clouds are unordered, unstructured, and irregular in density, which has made it difficult to apply deep learning and machine learning techniques compared to traditional 2D images [32,33]. Moreover, because construction sites are mostly outdoor, large-scale, and irregular, the 3D point cloud data collected from earthwork sites are high-density and voluminous, adding further complexity to related research [31]. These characteristics make it challenging to apply machine learning and deep learning models originally developed for indoor environments. However, recent deep learning algorithms, such as PointNet [28], RandLA-Net [29], and KPConv [34], have exhibited remarkable object classification and semantic segmentation performance on large-scale datasets. Moreover, several large-scale 3D point cloud datasets, including Semantic3D [35], Paris-Lille-3D [36], and Toronto-3D [33], have also been published. Additionally, the development of 3D point cloud data libraries, such as the Point Cloud Library (PCL) [37], Open3D [38], and the Point Data Abstraction Library (PDAL) [39], has facilitated deep learning research using extensive 3D point cloud data of construction sites.
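As a brief illustration of how such libraries lower the barrier to working with large-scale point clouds, the following minimal Python sketch uses Open3D to load a point cloud and reduce its density by voxel downsampling; the file name is a hypothetical placeholder, not data from this study:

```python
import open3d as o3d

# Load a photogrammetry-derived point cloud ("site.ply" is a
# hypothetical file name used purely for illustration).
pcd = o3d.io.read_point_cloud("site.ply")
print(f"Raw points: {len(pcd.points):,}")

# Voxel downsampling merges all points inside each 5 cm voxel,
# shrinking a large-scale scan to a size tractable for deep learning.
down = pcd.voxel_down_sample(voxel_size=0.05)
print(f"Downsampled points: {len(down.points):,}")
```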
In line with rapidly evolving technological advancements, this research addresses the question, “Is it possible to efficiently apply 3D semantic segmentation algorithms to large-scale 3D point cloud data in the form of 3D digital maps generated at earthwork sites?” To conduct this research, we first utilized the 3D-ConHE dataset (3D Point Clouds of Construction Heavy Equipment) [40], built in a previous study, together with 3D point cloud data from UAV-captured earthwork sites to construct a 3D digital map composed of large-scale 3D point cloud data of heavy construction equipment at earthwork sites. The constructed 3D digital map was then used to train and validate four semantic segmentation algorithms to create a 3D semantic segmentation model for heavy construction equipment. Subsequently, we tested and evaluated the performance of the 3D semantic segmentation models generated in this research. This research aimed to provide specific application directions for 3D deep learning technology using large-scale 3D point cloud data, which is increasingly utilized in the construction industry. Based on the results of this study, we propose a high-performance semantic segmentation algorithm based on 3D point cloud data, applicable to heavy construction equipment located on large-scale 3D digital maps of earthwork sites.

1.2. Research Scope and Methods

In this study, 14 digital maps comprising 3D point cloud data of road construction sites at the earthwork stage were created. Photogrammetry was conducted using UAVs at two road construction sites in Gwangju, Gyeonggi-do, and Jeungpyeong-gun, Chungcheongbuk-do, South Korea. Specifically, the maps exhibited various construction sections and topographical features. Before conducting 3D labeling, the 3D-ConHE dataset was integrated and aligned with the 3D digital maps. The integrated dataset comprised five types of primary heavy construction equipment commonly employed for earthwork: excavators, bulldozers, graders, dump trucks, and rollers. For 3D labeling, semantic segmentation labeling was performed based on 3D point cloud data by categorizing the integrated maps into three classes, resulting in 14 semantic segmentation datasets. These datasets were further divided into training and test sets and applied to the four semantic segmentation models to evaluate their performances.
First, a literature review was conducted to select the appropriate semantic segmentation algorithms, which were then trained and validated on semantic segmentation datasets comprising 3D point cloud data of heavy construction equipment. Subsequently, the models were tested on the test sets, and their performances were compared to determine the most efficient semantic segmentation algorithms for heavy construction equipment at outdoor construction sites. This paper is structured as follows. Section 1, the introduction, outlines the research background and objectives, scope, and methodology. Section 2 presents related works, analyzing major previous studies in line with the historical context. Section 3 covers the methodology used in this research, including the framework and performance measures for evaluating the semantic segmentation models. Section 4 details the data generation and construction of the semantic segmentation models. Section 5 focuses on testing and evaluating the performance of the constructed semantic segmentation models. Finally, Section 6 presents the conclusion.

2. Related Works

2.1. Object Recognition in the Construction Industry Using 2D Image Data

Several studies have aimed to conduct object recognition using 2D image data during both the construction and maintenance stages in the construction industry. Before machine learning and deep learning were adopted in the construction industry, genetic algorithms were employed in the field of architectural design, mainly for optimizing building floor plans, facade designs, and energy-efficient designs [41]. However, as the performance of machine learning and deep learning techniques has improved, these techniques are now used more commonly in the construction field than genetic algorithms. For bridge maintenance, Lee et al. (2018) conducted object detection to identify cracks and spalls in concrete bridge structures, work that typically involves considerable heights [42]. The study used AlexNet, a deep learning-based artificial neural network, with 2D image data. Other studies have attempted to enhance safety management at construction sites by using 2D image data of collisions between heavy construction equipment and workers, as well as security camera footage, for object recognition [18,43]. Lee et al. (2022) developed an object detection model using data from security cameras installed at various sites to identify hazardous situations wherein workers collide with trucks or forklifts [43]. They achieved recognition accuracies of 88% and 92% for collisions between forklifts and workers in indoor and outdoor environments, respectively. Additionally, Jeong et al. (2021) developed an automatic recognition model for the autonomous operation of heavy construction equipment using computer vision technology [18]. They used footage from security cameras installed at construction sites for safety management to develop a Faster R-CNN model for identifying workers, signalmen, and forklifts through object detection [18]. They achieved an accuracy of 83.4% for detecting heavy construction equipment and workers, 84.2% for workers and signalmen, and 95.1% for heavy construction equipment and signalmen. As mentioned above, AI-driven object recognition models in construction have primarily employed 2D images or footage from security cameras for maintenance tasks, such as crack and spall detection in concrete structures, and for safety management, such as recognizing and warning workers about collisions with heavy equipment at construction sites [15,27]. Beyond the construction field, recent research on 3D object detection for autonomous vehicles includes Point RCNN, which integrates image information with point cloud data and demonstrates excellent object detection performance for autonomous vehicles [44].
Some studies have also conducted object recognition using 2D image data of heavy construction equipment at earthwork sites. Xiao et al. (2020) explored deep learning-based object detection and created the ACID for object classification [26]. This is a representative 2D image-based dataset of heavy construction equipment and comprises 100,000 labeled images for 10 types of heavy machinery: excavators, compactors, dozers, graders, dump trucks, concrete mixer trucks, wheel loaders, backhoe loaders, tower cranes, and mobile cranes [26]. It was used to train and test four image-based object recognition algorithms: region-based fully convolutional network (R-FCN)-ResNet101, Faster-RCNN-ResNet101, YOLO-v3, and inception-single shot multi-box detector (SSD), which achieved mean average precisions (mAPs) of 88.8%, 89.2%, 87.8%, and 83.0%, respectively. Moreover, YOLO-v3 exhibited the highest processing speed of 26.3 frames per second (fps) [26].
Arabi et al. (2019) conducted deep learning-based object detection using 2D images of heavy construction equipment by developing a MobileNet-SSD model for six types of heavy construction equipment: loaders, excavators, dump trucks, concrete-mixing trucks, rollers, and graders [45]. They deployed the trained model on embedded systems, namely the Jetson TX2 and Raspberry Pi 3B, to reduce processing time and enhance detection efficiency [45]. Thus, many studies have used databases such as ACID [26], AIM [45], ImageNet [25], and Microsoft COCO [46] that contain 2D images of heavy construction equipment. Among them, Xiao et al. (2020) [26] and Arabi et al. (2019) [45] created custom datasets comprising images of heavy construction equipment, selected 2D image-based object recognition algorithms with notable performance, and then trained and tested the models using these datasets. These studies are therefore academically significant for the application of object detection technology to heavy construction equipment at construction sites.

2.2. Object Recognition in the Construction Industry Using 3D Point Cloud Data

Most previous studies have employed 2D image and video data for object recognition during the construction and maintenance stages of construction projects. Although studies using 2D image data have been prevalent for some time, those on object recognition using 3D point cloud data emerged later but are steadily gaining traction. Most studies aimed at utilizing 3D point cloud data for the construction industry have concentrated on object classification of heavy equipment or structural elements, including walls, ceilings, doorways, and windows, or on semantic segmentation of bridges, railway tunnels, and temporary structures.
Xiong et al. (2013) classified building components, including walls, ceilings, floors, clutter, doorways, and windows, through a support vector machine (SVM), which is a widely used machine learning algorithm, to analyze point cloud data generated from building scans [47]. Kim et al. (2013) built 3D scan data for the construction site of a four-story concrete building; they classified the ground, second, third, and roof floors using an SVM classifier and compared the 3D scan data with a 4D building information model (BIM) for process management [48].
Subsequently, studies on semantic segmentation of construction sites have primarily targeted large-scale facilities, such as bridges and railway tunnels [24,49,50]. Kim et al. (2020) employed a semantic segmentation algorithm based on point cloud data to analyze the components of road bridges, such as abutments, slabs, piers, and girders [24]. Additionally, Ryu et al. (2021) employed semantic segmentation techniques to analyze linings, drains, tracks, sleepers, and floors in railway tunnels [50], whereas Kim et al. (2022) created 3D point cloud data by scanning scaffolds at construction sites and applied semantic segmentation to generate a BIM model for scaffolds, using it for safety inspections [51].
Although various studies have focused on classification and segmentation using 3D point cloud data from construction sites, few have used these data for object detection. Current studies have mainly focused on applying object detection algorithms such as YOLO to 2D images of heavy construction equipment, construction personnel, and buildings [26,43,52]. The scarcity of object detection studies using 3D point cloud data in the construction industry is attributable to the challenges posed by their large size, high density, and high levels of noise and occlusion [31], which make it difficult to apply object detection algorithms [47]. Therefore, further research is essential for developing object detection algorithms that can handle the large and highly dense point cloud data of construction sites.
Furthermore, Chen et al. [31] conducted a prominent object recognition study using 3D point cloud data of heavy construction equipment, classifying this equipment using machine learning. They developed a principal axes descriptor (PAD) and applied it to an SVM to classify five types of construction equipment: backhoe loaders, bulldozers, dump trucks, excavators, and front loaders. In a follow-up study, they developed an additional 3D descriptor alongside the PAD and employed machine learning to classify objects commonly found at construction sites, such as trailers, trucks, workers, excavators, and mobile cranes [53]. These studies are significant because they were the first to apply machine learning-based object classification to 3D point cloud data of heavy construction equipment, whereas most prior research on classifying such equipment relied on 2D image data [31,53]. However, because they generated the point clouds by converting AutoCAD files into scanned files, their approach is difficult to apply to semantic segmentation of large-scale 3D digital maps of earthwork sites created with advanced scanning devices, such as airborne laser scanning, mobile scanning technology, and terrestrial laser scanning (TLS). Moreover, AI technology progressed rapidly in 2017 and 2018, coinciding with the publication of these studies, and by 2023, when the present study was conducted, numerous deep learning technologies with significantly enhanced performance had been developed. Therefore, this study conducted semantic segmentation of heavy construction equipment using 3D point cloud data and the latest deep learning object recognition technologies.

2.3. Semantic Segmentation Based on Large-Scale 3D Point Cloud Data

Unlike the evolution of CNN-based object recognition methods, which primarily utilize 2D image data with uniform widths and heights, early research on 3D object recognition encountered several challenges, such as data loss. To address these challenges, PointNet, introduced in 2017, was the first deep learning-based object recognition algorithm designed to process 3D point cloud data directly [28]. PointNet exploits properties required for 3D point cloud data, namely permutation and transformation invariance, to address the non-uniform density, lack of structure, and unordered nature of point clouds in 3D object recognition tasks. In PointNet, permutation invariance is integrated into a deep neural network, ensuring that each point and its associated coordinate data remain consistent even when the indices of the 3D point cloud data are rearranged [54]. Transformation invariance is maintained using a transformation network (T-Net) that consolidates the reference coordinate axes into a unified perspective, ensuring that all input data are observed from a consistent viewpoint regardless of their orientation [29]. These two fundamental principles enable PointNet to perform classification, part segmentation, and semantic segmentation on 3D point cloud data [28]. However, it can process a maximum of only 2048 points simultaneously, and its object recognition rate (mean intersection over union (mIoU)) is lower than that of 2D image-based object recognition methods [29,54].
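To illustrate the permutation-invariance idea at the heart of PointNet, the following minimal NumPy sketch (our own illustration, not the original implementation) shows how a symmetric aggregation such as max pooling yields the same global feature regardless of the order in which the points are listed:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((2048, 3))  # one cloud of 2048 (x, y, z) points

def global_feature(pts: np.ndarray) -> np.ndarray:
    """Toy per-point transform followed by max pooling.

    Max pooling is a symmetric function, so the output is identical
    no matter how the input points are ordered.
    """
    weights = np.arange(12.0).reshape(3, 4)  # stand-in for learned MLP weights
    per_point = pts @ weights                # (2048, 4) per-point features
    return per_point.max(axis=0)             # order-independent global feature

shuffled = rng.permutation(points, axis=0)   # same points, different order
assert np.allclose(global_feature(points), global_feature(shuffled))
print("Global feature is permutation-invariant.")
```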
Thus, although PointNet is groundbreaking because it facilitates object classification and semantic segmentation using 3D point cloud data, its limit of 2048 points per pass renders it unsuitable for semantic segmentation of extensive 3D point cloud data from earthwork sites. Therefore, this study further reviewed recent research on deep learning-based semantic segmentation algorithms for 3D point cloud data, specifically those targeting urban areas, because such data typically share characteristics with data obtained from large-scale earthwork sites. Among the latest algorithms reviewed, we selected RandLA-Net [29], spatial contextual feature (SCF)-Net [55], and kernel point convolution (KPConv) [34]. These met our criteria: they can be trained and tested with the Semantic3D dataset employed in this study, and they can use 3D point cloud datasets obtained through a UAV with properties similar to those of a static TLS. Accordingly, we conducted a literature review focused on the features of these algorithms.
RandLA-Net is tailored to handle large-scale 3D point cloud data and exhibits remarkable performance, with the aim of addressing the constraints of PointNet that were prevalent at that time. The 3D scanning data of urban regions are large-scale 3D point cloud data that are unsuitable for applying object recognition technologies designed for indoor objects. RandLA-Net [29] offers various advantages, such as handling large-scale 3D point cloud data, processing significant amounts of data simultaneously, and improved accuracy and processing speed [54]. Moreover, it is suitable for semantic segmentation tasks involving datasets such as Semantic3D [35], SemanticKITTI [56], and Stanford 3D Indoor Scene Dataset (S3DIS) [57]. In an experiment using the Semantic3D dataset, which closely resembles the characteristics of the objects employed in this study, RandLA-Net achieved an mIoU of 77.4% and overall accuracy (OA) of 94.8%, indicating excellent performance for semantic segmentation tasks involving 3D point cloud data [29].
However, this algorithm has the drawback of long processing times owing to low GPU utilization during training and testing on the Semantic3D dataset. To address these limitations, SCF-Net was designed as a semantic segmentation algorithm for large-scale 3D point cloud data [55]; in a related study, it outperformed RandLA-Net in a semantic segmentation experiment using the Semantic3D dataset. Finally, KPConv is a semantic segmentation technique that can handle the extensive 3D point cloud data employed in this study [34]. Unlike grid-based methods, such as VoxelNet [58], or multilayer perceptron (MLP)-based methods, such as PointNet, KPConv applies convolution directly to 3D point cloud data without converting it into 2D image data [28,34,58]. Thus, KPConv can be applied to large-scale 3D point cloud datasets such as ScanNet, Semantic3D, S3DIS, and Paris-Lille-3D (PL3D). Additionally, rigid KPConv yielded a higher mIoU than deformable KPConv on the Semantic3D dataset [34]. In this section, we reviewed large-scale 3D point cloud data-based semantic segmentation algorithms and the problems each addressed over time. Among them, PointNet stands out: unlike earlier semantic segmentation methods developed for small indoor objects, it pioneered deep learning-based semantic segmentation directly on 3D point cloud data, paving the way for large-scale approaches. Following PointNet, newer methods such as RandLA-Net, SCF-Net, and KPConv have been developed, and our literature review confirms that these latest algorithms make it possible to apply semantic segmentation technology to large-scale 3D digital maps of earthwork sites composed of point cloud data.

3. Methodology

3.1. Framework

This section proposes the framework, illustrated in Figure 1, for identifying a high-performance semantic segmentation algorithm based on 3D point cloud data of heavy construction equipment obtained from 3D digital maps of earthwork sites. The framework is divided into four steps for efficient semantic segmentation of large-scale 3D point cloud data of earthwork sites.
First, in the “selection of deep learning-based semantic segmentation algorithms for heavy construction equipment” step, deep learning-based semantic segmentation algorithms suitable for large-scale 3D point cloud data of earthwork sites were reviewed and selected. Large-scale 3D point cloud data in the form of 3D digital maps of earthwork sites are unsuitable for semantic segmentation algorithms developed for indoor environments; only algorithms developed for datasets collected from outdoor urban areas are applicable, which makes identifying a suitable high-performance algorithm challenging. Therefore, we conducted a comprehensive literature review and preliminary testing to identify a high-performance semantic segmentation algorithm suitable for 3D digital maps of earthwork sites, and we examined customizable 3D point cloud data that could be applied to this algorithm. The relevant details are provided in Section 2.
Next, in the “generation of earthwork site 3D digital maps” step, photogrammetry was conducted using UAVs to collect topographic data from the sites. The collected data were then preprocessed to construct 3D digital maps of earthwork sites composed of 3D point cloud data. To achieve this, road construction sites undergoing earthwork were selected, and UAV photogrammetry and data preprocessing converted the collected data into the 3D point cloud format, resulting in 14 3D digital maps of the earthwork sites. Subsequently, the 3D digital maps were postprocessed in the “constructing 3D PCD-based heavy construction equipment semantic segmentation datasets” step, in which the heavy construction equipment data from 3D-ConHE were integrated into the 3D digital maps and 3D labeled. These steps were crucial for creating datasets containing adequate 3D point cloud data of heavy construction equipment for training and testing the deep learning-based semantic segmentation models, because 3D digital maps generated through UAV photogrammetry alone contain too little equipment data; the proposed data construction method thereby addresses the problem of overfitting owing to insufficient training data. Finally, in the “creation of deep learning prediction models for heavy construction equipment and their performance evaluation” step, four semantic segmentation models based on 3D point cloud data were trained, validated, and tested on heavy construction equipment, and the results were compared and analyzed to evaluate model performance. The steps in the proposed semantic segmentation framework are structured chronologically.

3.2. Performance Metrics

We adopted the confusion matrix as the basis for evaluating the 3D semantic segmentation models trained in this study. Metrics derived from it, namely OA, IoU, and mIoU, are widely used to assess the performance of semantic segmentation models based on 3D point cloud data [29,33,34]. Equations (1)–(3) were used to compute the OA, IoU, and mIoU, respectively, to validate the trained semantic segmentation models. True positives (TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs) were defined based on the confusion matrix for each class n. Equation (3) is the arithmetic mean of the per-class IoU values, calculated by dividing their sum by the number of classes (N):
$$\mathrm{OA} = \frac{TP_n + TN_n}{TP_n + FP_n + TN_n + FN_n} \quad (1)$$

$$\mathrm{IoU}_n = \frac{TP_n}{TP_n + FP_n + FN_n} \quad (2)$$

$$\mathrm{mIoU} = \frac{1}{N} \sum_{n=1}^{N} \mathrm{IoU}_n \quad (3)$$
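As a concrete illustration of Equations (1)–(3), the following Python sketch (our own, computing OA as the fraction of correctly labeled points and IoU per class in a one-vs-rest fashion) derives all three metrics from flat arrays of predicted and ground-truth labels:

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray, num_classes: int):
    """Compute OA, per-class IoU, and mIoU from flat label arrays."""
    oa = float(np.mean(pred == gt))           # fraction of correctly labeled points
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))  # true positives for class c
        fp = np.sum((pred == c) & (gt != c))  # false positives for class c
        fn = np.sum((pred != c) & (gt == c))  # false negatives for class c
        denom = tp + fp + fn
        ious.append(float(tp / denom) if denom else 0.0)  # Equation (2)
    return oa, ious, float(np.mean(ious))     # Equation (3)

# Toy example with three classes (e.g., Ground, Non-ground, HCE)
gt = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 1, 1, 1, 2, 0])
print(segmentation_metrics(pred, gt, num_classes=3))
```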

4. Data Generation and Semantic Segmentation Model Construction

4.1. Generation of a 3D Digital Map of Earthwork Sites

4.1.1. UAV Photogrammetry of Earthwork Site and Data Preprocessing

The 3D point cloud data from two road construction sites were acquired via UAV photogrammetry using the DJI Phantom 4 RTK connected to an RTK GNSS base station. The selected sites were close to Gwangju, Gyeonggi-do, and Jeungpyeong-gun, Chungcheongbuk-do, South Korea, and photogrammetry was performed at these sites on two separate occasions. Fourteen 3D digital maps encompassing the diverse topographical characteristics and types of heavy construction equipment employed at these sites were generated. To ensure data consistency during the creation of the 3D digital maps, UAV photogrammetry was planned at an altitude of 100 m; however, owing to varying wind conditions, the UAV flew at altitudes ranging from 80 to 120 m. The 2D images of the earthwork sites obtained through UAV photogrammetry were then converted into 3D point cloud data via preprocessing, resulting in 14 3D digital maps in the LAS format. Finally, noise and outliers were removed from the maps using the open-source CloudCompare software (v2.12 alpha, 64-bit) [59].
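The study performed this cleanup interactively in CloudCompare; for readers who prefer a scripted pipeline, a comparable statistical outlier filter is available in Open3D, as the following hedged sketch shows (the file name is a hypothetical placeholder for one of the converted digital maps):

```python
import open3d as o3d

# The study removed noise with CloudCompare; an analogous programmatic
# filter is Open3D's statistical outlier removal ("map01.ply" is a
# hypothetical file converted from one of the LAS digital maps).
pcd = o3d.io.read_point_cloud("map01.ply")

# Discard points whose mean distance to their 20 nearest neighbors
# deviates from the global mean by more than 2 standard deviations.
clean, kept_idx = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
print(f"Removed {len(pcd.points) - len(clean.points):,} outlier points")
```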

4.1.2. Changing the Format of 3D Digital Map Data

The 14 3D digital maps of the earthwork sites created in this study were converted into a dataset format suitable for training and testing before being used with the point cloud data-based semantic segmentation models. The Semantic3D dataset format was selected because its characteristics are similar to those of the dataset used in this study: large-scale and captured using stationary TLS. Consequently, the 3D digital maps were converted into the Semantic3D dataset format by converting their data fields into the [X, Y, Z, I, R, G, B] format. This process was conducted on a system running Windows 10 and Python 3.6. Standardizing the data fields of the 3D digital maps to match the Semantic3D dataset allowed the application of the 3D point cloud data-based semantic segmentation algorithms selected for this study.
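The paper does not publish its conversion script; a minimal sketch of such a LAS-to-Semantic3D conversion, assuming the laspy library (our choice, not named in the paper) and LAS files that carry intensity and color fields, could look as follows:

```python
import laspy
import numpy as np

# Read one LAS digital map ("map01.las" is a hypothetical file name).
las = laspy.read("map01.las")

# Assemble the Semantic3D-style field order [X, Y, Z, I, R, G, B]
# (the R, G, B fields require a LAS point format that stores color).
fields = np.column_stack([
    las.x, las.y, las.z,           # scaled real-world coordinates
    las.intensity,                 # return intensity
    las.red, las.green, las.blue,  # per-point color
])

# Semantic3D stores one space-separated point per line in plain text.
np.savetxt("map01.txt", fields, fmt="%.3f %.3f %.3f %d %d %d %d")
```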

4.1.3. 3D-ConHE Dataset Integration and Custom Dataset Construction

After converting the 3D digital maps of the earthwork sites into the Semantic3D dataset format, the 3D point cloud data of heavy construction equipment from the 3D-ConHE dataset created in a previous study [40] were integrated into the 3D digital maps to generate a custom dataset. This was done because the 3D digital maps created in this study did not include sufficient data for the five types of heavy construction equipment, resulting in insufficient 3D point cloud data to adequately train the semantic segmentation models. To address this limitation, specific parts of the 3D-ConHE dataset were integrated into the construction section of the 3D digital maps. This process generated a custom dataset suitable for training and testing the semantic segmentation models. The same method was used to integrate 15 pieces of 3D-ConHE data into the 3D digital maps to create custom datasets comprising 14 files with a combined size of 5.046 GB and 145,584,227 points. Additionally, all the 3D digital files in the custom dataset had dimensions equal to or smaller than 600 × 500 × 90 m in width, length, and height, respectively. Each file was ≤500 MB and contained ≤15,000,000 points.
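Conceptually, this integration step amounts to placing equipment point clouds at plausible positions within a map's construction section and concatenating the arrays. The following simplified NumPy sketch illustrates the idea; the coordinates and file names are hypothetical, and the study's actual placement was done with reference to the real construction sections:

```python
import numpy as np

# [X, Y, Z, I, R, G, B] arrays: one digital map and one excavator scan
# from 3D-ConHE (hypothetical converted text files).
site = np.loadtxt("map01.txt")
excavator = np.loadtxt("excavator.txt")

# Translate the equipment so it sits at a chosen spot in the construction
# section; the site's minimum elevation is a crude stand-in for the
# local ground height at that spot.
target_xy = np.array([352.0, 214.0])  # illustrative map coordinates
offset = np.zeros(7)
offset[:2] = target_xy - excavator[:, :2].mean(axis=0)
offset[2] = site[:, 2].min() - excavator[:, 2].min()
placed = excavator + offset           # I, R, G, B offsets stay zero

# The custom dataset is the union of terrain and equipment points.
merged = np.vstack([site, placed])
np.savetxt("map01_custom.txt", merged, fmt="%.3f %.3f %.3f %d %d %d %d")
```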

4.2. 3D Semantic Labeling

4.2.1. Labeling of Large-Scale 3D Point Cloud Data

The 14 custom datasets were subjected to 3D point cloud labeling. The Semantic Segmentation Editor (Version 1.6.0) is software commonly used to label semantic segmentation datasets of general 3D point cloud data [60] and is often employed in semantic segmentation or object detection tasks, particularly for autonomous vehicle data. However, the custom datasets created in this study were large-scale 3D digital maps containing 3D point cloud data obtained from earthwork sites, which differ from the typical smaller-scale 3D point cloud datasets of autonomous vehicles; consequently, this tool could not be used for 3D labeling in this study. Instead, semantic segmentation labeling of the custom datasets was conducted using labeling software designed for TLS data.
To conduct labeling, three classes were defined to represent the diverse geographical characteristics of the earthwork sites: Class 1 (Ground), Class 2 (Non-ground), and Class 3 (Heavy Construction Equipment, HCE). Point clouds representing construction sections were labeled “Ground” (Class 1), whereas those representing forests, roads, and fields were labeled “Non-ground” (Class 2). Finally, the point clouds of all heavy construction equipment in 3D-ConHE were labeled “heavy construction equipment” (Class 3). Adequate training data were ensured by labeling all point clouds corresponding to the five types of heavy construction equipment as Class 3, which prevented overfitting during the training of the semantic segmentation models and facilitated stable training. Point clouds that did not fall into Classes 1, 2, or 3 were categorized as unlabeled points, denoted as Class 0. Following this, all 14 custom datasets were labeled and inspected to ensure the acquisition of high-quality labeled 3D point cloud data.

4.2.2. Creating a Heavy Construction Equipment Dataset Based on 3D Point Cloud Data

We conducted 3D point cloud data labeling to create 14 heavy construction equipment datasets (10 for training, 4 for testing) with 3D point cloud data suitable for training and testing the semantic segmentation algorithms. All data in these datasets were formatted in the TXT and LABELS formats: the TXT data fields contained the [X, Y, Z, I, R, G, B] values corresponding to the Semantic3D dataset, whereas the LABELS data fields contained the class value [C]. The 14 datasets had a combined file size of 5.046 GB and a total point count of 145,584,227, representing the 3D digital maps generated from outdoor earthwork sites. All 3D digital files had dimensions of ≤600 m × 500 m × 90 m in width, length, and height, respectively.
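For concreteness, the paired-file convention can be sketched as follows (a hypothetical example following the Semantic3D convention, in which each line of the .labels file holds the class index of the corresponding point line in the .txt file):

```python
import numpy as np

points = np.loadtxt("map01_custom.txt")                # [X, Y, Z, I, R, G, B]
labels = np.loadtxt("map01_custom.labels", dtype=int)  # one class id per point

assert len(points) == len(labels)  # the two files are row-aligned
for c, name in enumerate(["Unlabeled", "Ground", "Non-ground", "HCE"]):
    print(f"Class {c} ({name}): {np.sum(labels == c):,} points")
```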

4.3. Construction of Semantic Segmentation Models

4.3.1. Overview

In this study, the KPConv algorithm stood out due to its kernel point fully convolutional neural network (KP-FCNN) architecture, which is classified as rigid or deformable based on the kernel type used. To determine the most suitable version for earthwork sites, the algorithm was divided into rigid and deformable versions for the experiment. Accordingly, four algorithms were used for training, validation, and testing: RandLA-Net, KPConv Rigid, KPConv Deformable, and SCF-Net. All four algorithms were trained and validated simultaneously to develop semantic segmentation models for heavy construction equipment based on 3D point cloud data, and were subsequently subjected to sequential testing. The RandLA-Net and KPConv algorithms were implemented on a system with an Nvidia RTX 2080 Ti 11 GB GPU, whereas SCF-Net was implemented on one with an Nvidia GTX 1080 Ti 11 GB GPU.

4.3.2. Training and Validation

All four models were trained and validated simultaneously using a batch size of two, a maximum of 100 epochs, and a learning rate of 0.001. We used identical hyperparameters for all models to determine the optimal model. For the experiment, 10 training datasets of 3D point cloud data of heavy construction equipment were used, comprising 110,591,765 points with a file size of 3.813 GB. In contrast to 2D image data-based object recognition algorithms, the models in this study were trained on the point clouds of each class rather than on the number of files in the training data. Therefore, although the number of files in the training data may appear limited, the semantic segmentation models were trained on 10 datasets comprising 110,591,765 points, ensuring adequate training. The models were trained one epoch at a time, and their IoU, mIoU, and OA scores were evaluated at the end of each epoch. Training continued for the maximum allowable number of epochs, and the best-performing epoch was selected for each of the four semantic segmentation algorithms.
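The shared training configuration and best-epoch selection can be summarized in the following schematic Python skeleton; the training and evaluation functions are placeholders standing in for the four published implementations, not the actual algorithms:

```python
import random

# Identical hyperparameters for all four models (Section 4.3.2).
CONFIG = {"batch_size": 2, "max_epochs": 100, "learning_rate": 0.001}
MODELS = ["RandLA-Net", "KPConv Rigid", "KPConv Deformable", "SCF-Net"]

def train_one_epoch(model: str, config: dict) -> None:
    """Placeholder for one pass over the 10 training datasets."""

def evaluate(model: str) -> float:
    """Placeholder returning the validation mIoU after an epoch."""
    return random.random()

best_miou = {}
for model in MODELS:
    best = 0.0
    for epoch in range(CONFIG["max_epochs"]):
        train_one_epoch(model, CONFIG)
        best = max(best, evaluate(model))  # keep the best-performing epoch
    best_miou[model] = best
print(best_miou)
```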

5. Results and Discussion

5.1. Test

The performance of the four semantic segmentation models trained on 3D point cloud data was evaluated using a batch size of two and a step size of 100 on four test sets from the 3D point cloud datasets of heavy construction equipment. The test data had a combined size of 1.233 GB and comprised 34,992,462 points. Figure 2 illustrates Test Dataset 1 along with its ground truth-labeled version. In Figure 2b, Class 1 (Ground) is shown in blue, Class 2 (Non-ground) in green, and Class 3 (Heavy Construction Equipment, HCE) in red.

5.2. Test Results and Discussions

In this section, the performance of each semantic segmentation model is compared and analyzed by consolidating the results of all four models. First, their prediction results for the four test datasets used in the experiment are visually compared.
Figure 3 shows a visual comparison of the results obtained for the first test dataset. Evidently, RandLA-Net outperformed the other models for Class 1 (Ground) and Class 2 (Non-ground). Furthermore, for Class 3 (Heavy Construction Equipment), SCF-Net and RandLA-Net, which have the highest values for Test Dataset 1 in Table 1, exhibited similar results. In Figure 3, Class 1 (Ground) is shown in blue, Class 2 (Non-ground) in green, and Class 3 (Heavy Construction Equipment, HCE) in red.
The mIoU, OA, and per-class IoU values of the four trained semantic segmentation models across the four test datasets are listed in Table 1, together with the overall mIoU for each class. Evidently, RandLA-Net achieved the highest overall mIoU of 79.8%, followed closely by SCF-Net at 78.1%. Moreover, RandLA-Net exhibited the highest OA of 84.4%, followed by KPConv Rigid and SCF-Net at 82.8%. In terms of per-class mIoU, RandLA-Net achieved the highest value of 80.3% for Class 1 (Ground), whereas SCF-Net achieved the highest value of 86.4% for Class 2 (Non-ground) and the best performance of 89.5% for Class 3 (Heavy Construction Equipment). In summary, across the four test datasets, RandLA-Net exhibited superior performance in terms of overall mIoU, OA, and Class 1 mIoU, whereas SCF-Net surpassed RandLA-Net in terms of the mIoU for Classes 2 and 3.
The tests of the four semantic segmentation models in this research yielded mIoU values ranging from 63.9% to 79.8% and overall accuracy ranging from 82.6% to 84.4%. Compared with the object recognition accuracy of 2D image data, which ranges from 88% to 96% in the related works reviewed [15,18,26,39,40,41,43,44], these results are relatively low. This gap reflects the differences between 2D image-based and 3D point cloud data-based object recognition. For instance, the RandLA-Net algorithm selected in this study is known to have the best performance among currently developed 3D point cloud data-based semantic segmentation algorithms; yet, according to the original paper, it achieves an mIoU of 77.4% and an OA of 94.8% on the Semantic3D dataset, which is somewhat lower than 2D image-based object recognition. Various reasons may account for these results. According to PointNet [28], whereas 2D image data are composed of pixels of uniform size, 3D point cloud data are characterized by irregular density, an unstructured nature, and a lack of order among data points; these characteristics have delayed the development of related technologies. Therefore, 3D point cloud data-based object recognition is known to have relatively lower object recognition rates and higher error rates than 2D image-based object recognition. However, with the development of new 3D point cloud data-based semantic segmentation algorithms, such as OpenTrench3D [61], the object recognition rate is gradually increasing and the error rate is decreasing. This trend is expected to narrow the performance gap between 2D image-based and 3D point cloud data-based object recognition over time.
Based on the results obtained from this research, we confirmed that although their accuracy is somewhat lower than that of 2D object recognition algorithms, 3D point cloud data-based semantic segmentation algorithms can be applied efficiently, and they are expected to become more applicable as object recognition rates improve [60]. Furthermore, the results provide a direction for applying deep learning technology based on large-scale 3D point cloud data, which is anticipated to see increasing utility within the construction industry, underscoring the importance of these findings as foundational research in the field.

6. Conclusions

With recent advancements in object recognition and preprocessing technologies based on 3D point cloud data, it has become possible to conduct research utilizing large-scale 3D point cloud data, which was previously difficult to handle. Consequently, various studies using 3D point cloud data have been conducted in the construction industry; however, research on object recognition targeting 3D digital maps of earthwork sites remains limited. Therefore, this study selected semantic segmentation algorithms suitable for large-scale 3D digital maps of earthwork sites from among various algorithms based on large-scale 3D point cloud data and conducted semantic segmentation research using these data. The methodology first reviewed 3D point cloud data-based semantic segmentation algorithms applicable to earthwork sites. Then, the 3D-ConHE dataset was integrated into the 3D digital maps of the earthwork sites to construct semantic segmentation datasets based on 3D point cloud data. Next, to build datasets for training, validating, and testing the selected algorithms (RandLA-Net, KPConv Rigid, KPConv Deformable, and SCF-Net), labeling was conducted on the 3D digital maps with the integrated 3D-ConHE data, resulting in a large and extensive semantic segmentation dataset consisting of 14 3D digital maps of 3D point cloud data. The 14 constructed semantic segmentation datasets were then applied to the selected algorithms to build semantic segmentation models for heavy construction equipment, and the constructed models were tested. The test results were quantified using the performance evaluation metrics OA, IoU, and mIoU, and a performance evaluation was conducted to compare the four semantic segmentation models trained in this study.
According to the performance evaluation results, RandLA-Net showed the highest mIoU (79.8%) among the four models, as well as the highest OA (84.4%). In terms of per-class mIoU, RandLA-Net showed the highest value for Class 1 (Ground) at 80.3%, while SCF-Net showed the highest values for Class 2 (Non-ground) at 86.4% and Class 3 (Heavy Construction Equipment) at 89.5%. These results indicate that the RandLA-Net and SCF-Net algorithms perform well when applied to 3D digital maps of earthwork sites composed of large-scale and extensive 3D point cloud data. The results of this study experimentally confirm the applicability of semantic segmentation techniques to 3D digital maps composed of large-scale 3D point cloud data, which have been known to be difficult to handle and to require significant time and effort. The study contributes academically as the first research on large-scale 3D semantic segmentation in the construction field using 3D digital maps of earthwork sites, thereby facilitating subsequent research on various semantic segmentation applications. Moreover, it proposes ways to enhance the utilization of 3D point cloud data generated at construction sites and to improve the efficiency of AI research, serving as a foundational study for the growing body of research on semantic segmentation of heavy construction equipment using large-scale 3D digital maps of earthwork sites. However, compared with 2D image-based object recognition research, the lower accuracy and applicability of 3D object recognition remain limitations of this study, requiring continuous research. Nevertheless, this study is expected to serve as a foundation for technologies such as construction robot and UGV location tracking and safety assurance, leveraging the accurate depth and 3D geometric information of 3D point cloud data. The results are expected to be utilized in various areas within the construction industry, such as terrain change detection, earthwork volume calculation, and terrain interpolation based on object recognition of 3D digital maps.

Author Contributions

Writing—original draft, S.P.; Writing—review & editing, S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

This research was conducted with the support of the “National R&D Project for Smart Construction Technology (No. RS-2020-KA158708)” funded by the Korea Agency for Infrastructure Technology Advancement under the Ministry of Land, Infrastructure and Transport, and managed by the Korea Expressway Corporation.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Barbosa, F.; Mischke, J.; Parsons, M. Improving Construction Productivity. 2017. Available online: https://www.mckinsey.com/capabilities/operations/our-insights/improving-construction-productivity (accessed on 25 October 2023).
  2. Durdyev, S.; Ismail, S. Offsite Manufacturing in the Construction Industry for Productivity Improvement. Eng. Manag. J. 2019, 31, 35–46. [Google Scholar] [CrossRef]
  3. Jang, W.-S.; Skibniewski, M.J. Cost-Benefit Analysis of Embedded Sensor System for Construction Materials Tracking. J. Constr. Eng. Manag. 2009, 135, 378–386. [Google Scholar] [CrossRef]
  4. Cai, S.; Ma, Z.; Skibniewski, M.J.; Bao, S. Construction Automation and Robotics for High-Rise Buildings over the Past Decades: A Comprehensive Review. Adv. Eng. Inform. 2019, 42, 100989. [Google Scholar] [CrossRef]
  5. Hatami, M.; Flood, I.; Franz, B.; Zhang, X. State-of-the-Art Review on the Applicability of AI Methods to Automated Construction. In Proceedings of the Computing in Civil Engineering 2019: Data, Sensing, and Analytics, Atlanta, GA, USA, 17–19 June 2019; pp. 105–113. [Google Scholar]
  6. Bamfo-Agyei, E.; Thwala, D.W.; Aigbavboa, C. Performance Improvement of Construction Workers to Achieve Better Productivity for Labour-Intensive Works. Buildings 2022, 12, 1593. [Google Scholar] [CrossRef]
  7. Park, S.; Kim, S.; Seo, H. Study on Representative Parameters of Reverse Engineering for Maintenance of Ballasted Tracks. Appl. Sci. 2022, 12, 5973. [Google Scholar] [CrossRef]
  8. Park, S.; Kim, S. Analysis of Overlap Ratio for Registration Accuracy Improvement of 3D Point Cloud Data at Construction Sites. J. KIBIM 2021, 11, 1–9. [Google Scholar]
  9. Park, S.; Kim, S. Performance Evaluation of Denoising Algorithms for the 3D Construction Digital Map. J. KIBIM 2020, 10, 32–39. [Google Scholar]
  10. Kim, Y.; Park, S.; Choi, Y.; Kim, S. Performance Verification of Integrated Module for Automatic Analysis of Digital Maps in Earthwork Sites. J. Constr. Autom. Robot. 2023, 2, 1–6. [Google Scholar] [CrossRef]
  11. Kim, Y.-G.; Park, S.-Y.; Kim, S. Development of Framework for Digital Map Time Series Analysis of Earthwork Sites. J. KIBIM 2023, 13, 22–32. [Google Scholar] [CrossRef]
  12. Choi, Y.; Park, S.; Kim, S. GCP-Based Automated Fine Alignment Method for Improving the Accuracy of Coordinate Information on UAV Point Cloud Data. Sensors 2022, 22, 8735. [Google Scholar] [CrossRef]
  13. Choi, Y.; Park, S.; Kim, S. Development of Point Cloud Data-Denoising Technology for Earthwork Sites Using Encoder-Decoder Network. KSCE J. Civ. Eng. 2022, 26, 4380–4389. [Google Scholar] [CrossRef]
  14. Li, X.; Yi, W.; Chi, H.L.; Wang, X.; Chan, A.P.C. A Critical Review of Virtual and Augmented Reality (VR/AR) Applications in Construction Safety. Autom. Constr. 2018, 86, 150–162. [Google Scholar] [CrossRef]
  15. Mostafa, K.; Hegazy, T. Review of Image-Based Analysis and Applications in Construction. Autom. Constr. 2021, 122, 103516. [Google Scholar] [CrossRef]
  16. Lasky, T.A.; Ravani, B. Sensor-Based Path Planning and Motion Control for a Robotic System for Roadway Crack Sealing. IEEE Trans. Control Syst. Technol. 2000, 8, 609–622. [Google Scholar] [CrossRef]
  17. Li, H.; Lu, M.; Hsu, S.C.; Gray, M.; Huang, T. Proactive Behavior-Based Safety Management for Construction Safety Improvement. Saf. Sci. 2015, 75, 107–117. [Google Scholar] [CrossRef]
  18. Jeong, I.; Kim, J.; Chi, S.; Roh, M.; Biggs, H. Solitary Work Detection of Heavy Equipment Using Computer Vision. KSCE J. Civ. Environ. Eng. Res. 2021, 41, 441–447. [Google Scholar]
  19. Hsieh, Y.-A.; Tsai, Y.J. Machine Learning for Crack Detection: Review and Model Performance Comparison. J. Comput. Civ. Eng. 2020, 34, 04020038. [Google Scholar] [CrossRef]
  20. Cha, Y.J.; Choi, W.; Büyüköztürk, O. Deep Learning-Based Crack Damage Detection Using Convolutional Neural Networks. Comput. Civ. Infrastruct. Eng. 2017, 32, 361–378. [Google Scholar] [CrossRef]
  21. Ekanayake, B.; Wong, J.K.W.; Fini, A.A.F.; Smith, P. Computer Vision-Based Interior Construction Progress Monitoring: A Literature Review and Future Research Directions. Autom. Constr. 2021, 127, 103705. [Google Scholar] [CrossRef]
  22. Dang, L.M.; Wang, H.; Li, Y.; Park, Y.; Oh, C.; Nguyen, T.N.; Moon, H. Automatic Tunnel Lining Crack Evaluation and Measurement Using Deep Learning. Tunn. Undergr. Space Technol. 2022, 124, 104472. [Google Scholar] [CrossRef]
  23. Oh, K.; Yoo, M.; Jin, N.; Ko, J.; Seo, J.; Joo, H.; Ko, M. A Review of Deep Learning Applications for Railway Safety. Appl. Sci. 2022, 12, 572. [Google Scholar] [CrossRef]
  24. Kim, H.; Kim, C. Deep-Learning-Based Classification of Point Clouds for Bridge Inspection. Remote Sens. 2020, 12, 3757. [Google Scholar] [CrossRef]
  25. Fei-Fei, L.; Deng, J.; Li, K. ImageNet: Constructing a Large-Scale Image Database. J. Vis. 2010, 9, 1037. [Google Scholar] [CrossRef]
  26. Xiao, B.; Kang, S.-C. Development of an Image Data Set of Construction Machines for Deep Learning Object Detection. J. Comput. Civ. Eng. 2021, 35, 05020005. [Google Scholar] [CrossRef]
  27. Guo, Y.; Wang, H.; Hu, Q.; Liu, H.; Liu, L.; Bennamoun, M. Deep Learning for 3D Point Clouds: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 4338–4364. [Google Scholar] [CrossRef]
  28. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 652–660. [Google Scholar]
  29. Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; Markham, A. Randla-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11105–11114. [Google Scholar] [CrossRef]
  30. Kim, H.; Kim, H.; Hong, Y.W.; Byun, H. Detecting Construction Equipment Using a Region-Based Fully Convolutional Network and Transfer Learning. J. Comput. Civ. Eng. 2018, 32, 04017082. [Google Scholar] [CrossRef]
  31. Chen, J.; Fang, Y.; Cho, Y.K.; Kim, C. Principal Axes Descriptor for Automated Construction-Equipment Classification from Point Clouds. J. Comput. Civ. Eng. 2017, 31, 04016058. [Google Scholar] [CrossRef]
  32. Bello, S.A.; Yu, S.; Wang, C.; Adam, J.M.; Li, J. Review: Deep Learning on 3D Point Clouds. Remote Sens. 2020, 12, 1729. [Google Scholar] [CrossRef]
  33. Tan, W.; Qin, N.; Ma, L.; Li, Y.; Du, J.; Cai, G.; Yang, K.; Li, J. Toronto-3D: A Large-Scale Mobile LiDAR Dataset for Semantic Segmentation of Urban Roadways. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 797–806. [Google Scholar] [CrossRef]
  34. Thomas, H.; Qi, C.R.; Deschaud, J.E.; Marcotegui, B.; Goulette, F.; Guibas, L. KPConv: Flexible and Deformable Convolution for Point Clouds. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6410–6419. [Google Scholar] [CrossRef]
  35. Hackel, T.; Savinov, N.; Ladicky, L.; Wegner, J.D.; Schindler, K.; Pollefeys, M. Semantic3D.net: A New Large-Scale Point Cloud Classification Benchmark. arXiv 2017, arXiv:1704.03847. [Google Scholar] [CrossRef]
  36. Roynard, X.; Deschaud, J.E.; Goulette, F. Paris-Lille-3D: A Large and High-Quality Ground-Truth Urban Point Cloud Dataset for Automatic Segmentation and Classification. Int. J. Rob. Res. 2018, 37, 545–557. [Google Scholar] [CrossRef]
  37. Rusu, R.B.; Cousins, S. 3D Is Here: Point Cloud Library (PCL). In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 1–4. [Google Scholar] [CrossRef]
  38. Zhou, Q.-Y.; Park, J.; Koltun, V. Open3D: A Modern Library for 3D Data Processing. arXiv 2018, arXiv:1801.09847. [Google Scholar]
  39. Butler, H.; Chambers, B.; Hartzell, P.; Glennie, C. PDAL: An Open Source Library for the Processing and Analysis of Point Clouds. Comput. Geosci. 2021, 148, 104680. [Google Scholar] [CrossRef]
  40. Park, S.; Kim, S. 3D Point Cloud Dataset of Heavy Construction Equipment. Appl. Sci. 2024, 14, 3599. [Google Scholar] [CrossRef]
  41. Baduge, S.K.; Thilakarathna, S.; Perera, J.S.; Arashpour, M.; Sharafi, P.; Teodosio, B.; Shringi, A.; Mendis, P. Artificial Intelligence and Smart Vision for Building and Construction 4.0: Machine and Deep Learning Methods and Applications. Autom. Constr. 2022, 141, 104440. [Google Scholar] [CrossRef]
  42. Lee, Y.I.; Kim, B.; Cho, S. Image-Based Spalling Detection of Concrete Structures Using Deep Learning. J. Korea Concr. Inst. 2018, 30, 91–99. [Google Scholar] [CrossRef]
  43. Lee, T.; Kim, S.; Hwang, C.-H.; Jung, H. Worker Collision Safety Management System Using Object Detection. J. Korea Inst. Inf. Commun. Eng. 2022, 26, 1259–1265. [Google Scholar]
  44. Hao, N. 3D Object Detection from Point Cloud Based on Deep Learning. Wirel. Commun. Mob. Comput. 2022, 2022, 6228797. [Google Scholar] [CrossRef]
  45. Arabi, S.; Haghighat, A.; Sharma, A. A Deep Learning Based Solution for Construction Equipment Detection: From Development to Deployment. arXiv 2019, arXiv:1904.09021. [Google Scholar]
  46. Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision (ECCV 2014), Zurich, Switzerland, 6–12 September 2014; pp. 740–755. [Google Scholar]
  47. Xiong, X.; Adan, A.; Akinci, B.; Huber, D. Automatic Creation of Semantically Rich 3D Building Models from Laser Scanner Data. Autom. Constr. 2013, 31, 325–337. [Google Scholar] [CrossRef]
  48. Kim, C.; Kim, C.; Son, H. Automated Construction Progress Measurement Using a 4D Building Information Model and 3D Data. Autom. Constr. 2013, 31, 75–82. [Google Scholar] [CrossRef]
  49. Lee, J.S.; Park, J.; Ryu, Y.M. Semantic Segmentation of Bridge Components Based on Hierarchical Point Cloud Model. Autom. Constr. 2021, 130, 103847. [Google Scholar] [CrossRef]
  50. Ryu, Y.-M.; Kim, B.-K.; Park, J. Effect of Learning Data on the Semantic Segmentation of Railroad Tunnel Using Deep Learning. J. Korean Geotech. Soc. 2021, 37, 107–118. [Google Scholar]
  51. Kim, J.; Chung, D.; Kim, Y.; Kim, H. Deep Learning-Based 3D Reconstruction of Scaffolds Using a Robot Dog. Autom. Constr. 2022, 134, 104092. [Google Scholar] [CrossRef]
  52. Nath, N.D.; Behzadan, A.H. Deep Convolutional Networks for Construction Object Detection Under Different Visual Conditions. Front. Built Environ. 2020, 6, 97. [Google Scholar] [CrossRef]
  53. Chen, J.; Fang, Y.; Cho, Y.K. Performance Evaluation of 3D Descriptors for Object Recognition in Construction Applications. Autom. Constr. 2018, 86, 44–52. [Google Scholar] [CrossRef]
  54. Lee, D.-K.; Ji, S.-H.; Park, B.-Y. PointNet and RandLA-Net Algorithms for Object Detection Using 3D Point Clouds. J. Soc. Nav. Archit. Korea 2022, 59, 330–337. [Google Scholar] [CrossRef]
  55. Fan, S.; Dong, Q.; Zhu, F.; Lv, Y.; Ye, P.; Wang, F.Y. SCF-Net: Learning Spatial Contextual Features for Large-Scale Point Cloud Segmentation. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14499–14508. [Google Scholar] [CrossRef]
  56. Behley, J.; Garbade, M.; Milioto, A.; Quenzel, J.; Behnke, S.; Stachniss, C.; Gall, J. SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9296–9306. [Google Scholar] [CrossRef]
  57. Thabet, A.; Alwassel, H.; Ghanem, B. Self-Supervised Learning of Local Features in 3D Point Clouds. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 4048–4052. [Google Scholar] [CrossRef]
  58. Zhou, Y.; Tuzel, O. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4490–4499. [Google Scholar] [CrossRef]
  59. Girardeau-Montaut, D. CloudCompare. 2019. Available online: https://www.danielgm.net/cc/ (accessed on 10 November 2023).
  60. Hitachi Automotive and Industry Laboratory. Semantic-Segmentation-Editor. Available online: https://github.com/Hitachi-Automotive-And-Industry-Lab/semantic-segmentation-editor (accessed on 10 November 2023).
  61. Hansen, L.H.; Jensen, S.B.; Philipsen, M.P.; Møgelmose, A.; Bodum, L.; Moeslund, T.B. OpenTrench3D: A Photogrammetric 3D Point Cloud Dataset for Semantic Segmentation of Underground Utilities. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops 2024, Seattle, WA, USA, 17–21 June 2024; pp. 7646–7655. [Google Scholar]
Figure 1. Semantic segmentation framework of this research.
Figure 2. Data used for testing the four semantic segmentation models: (a) Test Dataset 1 and (b) its ground truth labels.
Figure 3. Semantic segmentation results of (a) RandLA-Net, (b) KPConv rigid, (c) KPConv deform, and (d) SCF-Net.
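Renderings such as those in Figure 3 can be produced by coloring each point with its predicted class label. A minimal sketch using Open3D [38] is given below; the color palette, file names, and label format are illustrative assumptions for this example, not part of the authors' pipeline.

import numpy as np
import open3d as o3d

# Illustrative per-class colors for the three equipment classes (c1, c2, c3).
palette = np.array([[0.8, 0.2, 0.2],   # class c1
                    [0.2, 0.8, 0.2],   # class c2
                    [0.2, 0.2, 0.8]])  # class c3

points = np.load("points.npy")        # (N, 3) XYZ coordinates, assumed file name
labels = np.load("pred_labels.npy")   # (N,) predicted class indices, assumed file name

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)
pcd.colors = o3d.utility.Vector3dVector(palette[labels])  # one RGB color per point
o3d.visualization.draw_geometries([pcd])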
Table 1. Test results of the four semantic segmentation models for heavy construction equipment.
Methods       | Test Data | mIoU (%) | OA (%) | IoU c1 (%) | IoU c2 (%) | IoU c3 (%)
--------------|-----------|----------|--------|------------|------------|-----------
RandLA-Net    | Total     | 79.8     | 84.4   | 80.3       | 82.6       | 76.6
              | 1         | 71.4     | 73.9   | 71.6       | 68.3       | 74.3
              | 2         | 84.1     | 89.2   | 84.7       | 90.0       | 77.5
              | 3         | 82.9     | 84.6   | 81.4       | 83.4       | 83.4
              | 4         | 80.9     | 89.9   | 83.3       | 88.8       | 70.6
KPConv rigid  | Total     | 65.4     | 82.8   | 60.1       | 85.8       | 50.4
              | 1         | 65.0     | 80.4   | 74.5       | 77.7       | 42.9
              | 2         | 67.4     | 82.9   | 73.5       | 83.2       | 45.5
              | 3         | 70.9     | 93.4   | 58.8       | 96.0       | 57.9
              | 4         | 58.4     | 74.7   | 33.8       | 86.2       | 55.4
KPConv deform | Total     | 63.9     | 82.6   | 58.6       | 84.3       | 48.9
              | 1         | 54.1     | 73.6   | 58.2       | 81.8       | 22.2
              | 2         | 68.9     | 82.7   | 72.0       | 83.6       | 52.0
              | 3         | 67.4     | 94.3   | 51.0       | 87.5       | 63.8
              | 4         | 65.0     | 79.9   | 53.3       | 84.4       | 57.4
SCF-Net       | Total     | 78.1     | 82.8   | 58.5       | 86.4       | 89.5
              | 1         | 80.7     | 80.2   | 67.4       | 83.4       | 91.3
              | 2         | 78.3     | 81.4   | 68.0       | 82.4       | 84.6
              | 3         | 83.7     | 94.6   | 66.1       | 95.8       | 89.3
              | 4         | 69.7     | 74.9   | 32.5       | 83.8       | 92.8
OA denotes overall accuracy; IoU c1–c3 are the per-class IoU values.
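As a cross-check on Table 1, each mIoU value is the arithmetic mean of the three per-class IoUs in its row; for example, the RandLA-Net Total row gives (80.3 + 82.6 + 76.6) / 3 ≈ 79.8. The sketch below shows how per-class IoU, mIoU, and OA are conventionally computed from per-point labels. It is a minimal illustration in Python/NumPy; the function name and the toy three-class data are assumptions for this example, not the authors' evaluation code.

import numpy as np

def segmentation_metrics(gt, pred, num_classes):
    # Per-point confusion matrix: rows = ground truth, columns = prediction.
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(conf, (gt, pred), 1)
    tp = np.diag(conf).astype(float)   # correctly labeled points per class
    fp = conf.sum(axis=0) - tp         # points wrongly assigned to each class
    fn = conf.sum(axis=1) - tp         # points of each class assigned elsewhere
    iou = tp / (tp + fp + fn)          # per-class IoU (undefined if a class is absent)
    miou = iou.mean()                  # mIoU: unweighted mean over classes
    oa = tp.sum() / conf.sum()         # OA: fraction of correctly labeled points
    return iou, miou, oa

# Toy example with three classes (c1, c2, c3), mirroring Table 1's layout.
gt = np.array([0, 0, 1, 1, 2, 2, 2, 1])
pred = np.array([0, 1, 1, 1, 2, 2, 0, 1])
iou, miou, oa = segmentation_metrics(gt, pred, num_classes=3)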
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
