1. Introduction
1.1. Background and Purpose of Research
The application of AI in construction is gaining momentum, with AI being utilized not only in architecture and structural engineering but also in intelligent disaster prevention and reduction [
1]. To effectively deploy AI across various settings, it is crucial to gather data specific to each environment. For instance, developing an AI solution for detecting cracks in concrete requires a vast array of crack images collected from different settings [
2]. However, in reality, data are finite, and obtaining training data from real-world environments demands considerable time and effort [
3]. Consequently, securing training data has become an increasingly recognized priority, particularly for construction sites. Nonetheless, existing research on acquiring training data for construction sites faces several limitations. First, construction sites are inherently dynamic and hazardous, characterized by frequent changes and a large workforce. The traditional approach to collecting AI training data at construction sites, i.e., direct onsite data acquisition, therefore poses significant risks. Given the construction industry’s reputation as one of the most hazardous industries [
4], gathering data directly from these environments is fraught with safety concerns. An alternative, gathering data in a virtual environment instead of through onsite photography or videography, offers a safer and more quantifiable method. However, data obtained from existing virtual environments typically suffer from limited backgrounds, because background creation is labor-intensive and time-consuming. Consequently, such data fail to reflect the complex and dynamic background characteristics of actual construction sites and do not capture the diversity and continuous change inherent to construction-site environments. According to an existing study, changing backgrounds make it challenging to identify objects consistently [
5]. Therefore, the efficacy of object recognition can greatly vary with the dynamics of the site’s background, underscoring the need for data collection methods that incorporate the authentic and variable backgrounds of construction sites to achieve meaningful outcomes.
This study aimed to address the limitations of previous research and capture the dynamics of construction sites by separating and synthesizing objects and backgrounds to generate synthetic data of scaffolding, a frequent source of fall accidents on construction sites. Throughout this process, we investigated the effectiveness of synthetic data creation, the potential of synthetic data to resolve data scarcity, and various composition ratios for enhancing object recognition performance. Additionally, by creating backgrounds that match the object’s environment, the goal was to secure data suitable for each site and to reduce the need for on-site collection of real data through an optimal mix of synthetic and real data in a hybrid dataset. By constructing hybrid datasets and assessing accuracy across different composition ratios, this research sought to empirically validate how hybrid datasets can address data scarcity and quality issues, and aimed to realize a cost-effective and efficient data construction method through the generation of synthetic data.
1.2. Research Scope and Method
The objective of this study was to acquire data through synthesis, effectively distinguishing between the object and the background to capture the specific characteristics of the construction site. The background represents the construction environment, while the focal object is defined as the frame scaffolding. Frame scaffolding is extensively utilized on construction sites, and falls from such structures are a common occurrence. AI-based recognition of frame scaffolding is imperative because of its potential to enhance worker safety and to detect errors in scaffolding assembly. Accordingly, acquiring training data for frame scaffolding is essential to improve AI recognition performance. Because frame scaffolding has a linear rather than a planar structure, the scene positioned behind it remains visible through the frame. Consequently, recognizing frame scaffolding as a linear object is anticipated to be more difficult than recognizing other objects.
The dataset used to benchmark training performance was captured through video recordings at the construction site. This study’s methodology and procedures are illustrated in
Figure 1. In the initial phase, the frame scaffolding, identified as the object of interest, was segmented from its background in the acquired real-world images. For the background, a comprehensive training dataset was developed by gathering photographs and videos that accurately depict the atmosphere of construction sites. The isolated objects were then composited onto these construction-site backgrounds. The resulting synthetic training dataset was further augmented using Roboflow, a platform for image dataset management that streamlines model development and offers features such as image annotation, labeling, and augmentation. Following augmentation, the enriched synthetic dataset was labeled and used to train the YOLOv5 algorithm. The research included an analysis of the outcomes obtained by varying the proportion of synthetic to real data within the dataset, evaluating the dataset’s efficacy at different ratios. The structure of the remainder of this document is as follows.
Section 2 engages in a literature review pertinent to training datasets.
Section 3 details the methodology employed for generating a synthetic dataset specific to construction sites, encompassing strategies for data collection, synthesis, labeling, and augmentation.
Section 4 outlines the results from the performance evaluation of the hybrid dataset.
Section 5 concludes the document with a summary of our findings and their implications.
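The training step of the pipeline outlined above uses YOLOv5, whose documented workflow is driven by a small dataset configuration file. The fragment below is an illustrative sketch of such a file for this study's single-class task; the paths, root directory, and class name are assumptions, not values from the study.

```yaml
# scaffold.yaml — illustrative YOLOv5 dataset config; paths and names are assumed
path: datasets/hybrid         # dataset root directory
train: images/train           # training images (mix of real and synthetic data)
val: images/val               # validation images (real data, for a fair benchmark)
nc: 1                         # number of classes
names: ["frame_scaffolding"]  # single object class of interest
```

Training then follows YOLOv5’s standard command line, e.g. `python train.py --img 640 --batch 16 --epochs 100 --data scaffold.yaml --weights yolov5s.pt`, repeated once per synthetic-to-real composition ratio under evaluation.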
2. Literature Review
Research is being conducted to improve object recognition performance on construction sites by developing neural network-based architectures and by generating synthetic data. In [
6,
7], a network was proposed that predicts parameters such as the center point, width, and angle of construction vehicles to detect them accurately and efficiently. Additionally, ref. [
8] introduced a detection method based on the ENet neural network for identifying fences at construction sites. However, such algorithms must be reconstructed to suit new objects, and network design likewise demands a wide variety of data. Thus, this study focused on generating synthetic data to improve recognition performance and conducted a literature review to explore this method. The construction of an AI training dataset is categorized as illustrated in
Figure 2. It primarily involves two approaches: collecting data from real environments and generating virtual training data through synthesis. In the case of data collected from real environments, the process can be challenging due to the dynamic nature of the surroundings or the presence of hazardous conditions. Consequently, significant research is focused on the synthesis of data within virtual environments as a strategy to amass adequate training data, even under adverse conditions.
Table 1 summarizes the methods for generating synthetic data identified in previous studies. The key term ‘Synthetic Image Data Generation’ was used to conduct searches on Google Scholar and SCIENCE ON. The studies listed in the table span 2017 to 2023, i.e., the past seven years. These papers are categorized by the synthetic data generation method employed. There are three primary synthesis approaches: in a 3D environment, in a 2D environment, and a hybrid of both. Research on 3D-environment synthesis utilizes three-dimensional (3D) programs and game engines. For instance, data from tools like the Unity Engine, commonly used as a game development platform, are transformed from 3D graphical images or videos into 2D image data for use as training material. One notable study, focused on image synthesis for underwater object recognition using Convolutional Neural Networks (CNNs), rendered the objects after 3D CAD modeling [
9]. However, employing game engines or 3D programs for synthesis is time-consuming, technically demanding, and thus costly. To address these constraints, ‘object infusion’ and ‘learning-based’ synthesis methods have been developed. For example, research on generating synthetic data involves modeling objects in a 3D environment and integrating them into 2D background images [
10]. Nevertheless, this approach is limited to generating synthetic data only for the objects that have been modeled, and it lacks realism, failing to accurately represent physical objects. The ‘learning-based’ method, particularly using Generative Adversarial Networks (GANs), offers a solution by allowing for the generation of synthetic images from existing data. This approach not only facilitates the expansion of insufficient training datasets through GANs but also significantly increases the sample size of the dataset, showcasing its potential for enhancing data diversity and quality.
An early exploration of generating training data with Generative Adversarial Networks (GANs) led to the creation of a synthetic image dataset for medical technology applications [
14]. Similarly, a previous study generated synthetic data for jellyfish in marine environments using deep learning techniques [
15]. Moreover, numerous studies, such as the generation of dental image data employing Deep Convolutional GANs (DCGANs) [
19], are underway. These efforts are significant, as they address the shortage of training data for deep learning by enlarging the dataset and enhancing efficiency. The strategy of deploying algorithms like GAN and DCGAN to enrich images and amplify the sample size through modifications in object characteristics primarily aims to increase data volume, rather than its diversity. Achieving diversity through extensive alterations, such as variations in the background environment and object positioning, proves challenging when augmenting with a constrained dataset. In contexts like construction sites, not only is a large sample size vital but also data from diverse environments. As such, synthesizing varied backgrounds and objects through the ‘insert object’ method becomes crucial for attaining significant diversity.
The ‘insert object’ technique allows for the incorporation of objects into pre-existing background images without the necessity for configuring specific settings via a 3D program. This approach facilitates the acquisition of a substantial volume of training data and the enhancement of data diversity, circumventing the need for background 3D modeling, even in scenarios of data scarcity. For instance, in a study applying the object insertion technique, data were synthesized by isolating traffic cones from images of roads submerged under water [
18]. By leveraging the object insertion method, it is feasible to generate novel data by synthesizing backgrounds and objects, thereby achieving diversity through the alteration of object types and locations.
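The object-insertion idea described above can be sketched in a few lines: a segmented object cutout, stored with an alpha channel that encodes its mask, is blended onto a background image at a chosen position, and the paste location directly yields a bounding-box label. This is a minimal illustration, not the study's actual implementation; the function name and array conventions are assumptions.

```python
import numpy as np

def insert_object(background, obj_rgba, x, y):
    """Alpha-blend an RGBA object cutout onto an RGB background at (x, y).

    Assumes the cutout fits entirely within the background. Returns the
    composite image and the bounding box (x1, y1, x2, y2) for labeling.
    """
    bg = background.astype(np.float32)          # fresh float copy of the background
    h, w = obj_rgba.shape[:2]
    alpha = obj_rgba[:, :, 3:4].astype(np.float32) / 255.0  # per-pixel opacity in [0, 1]
    region = bg[y:y + h, x:x + w]               # view into the paste region
    region[:] = alpha * obj_rgba[:, :, :3] + (1.0 - alpha) * region
    return bg.astype(np.uint8), (x, y, x + w, y + h)
```

Varying `x`, `y`, the cutout's scale, and the background image produces the placement and environment diversity that the object-insertion method targets.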
Table 2 contrasts the processes involved in constructing synthetic datasets. The asterisk (*) in the table denotes tasks that required investments of time and labor. Generating a training dataset via deep learning techniques such as GANs primarily enlarges the sample size, often at the expense of diversity, and also demands significant time and technical expertise. Hence, for a construction site, with its constantly evolving environment, a method that prioritizes diversity, such as object insertion, becomes crucial. Synthetic data must align with the specific environmental requirements, meaning that AI applications for construction sites demand training data that accurately represent the site’s visual characteristics. Previous research on the ‘object insertion’ method has been limited by a lack of consideration for the diversity of construction environments and backgrounds, and there is an absence of studies on creating synthetic data for linear objects like scaffolding, which is associated with high rates of fall accidents. In the constantly changing environment of construction sites, securing varied training data through the ‘object insertion’ method appears increasingly necessary. Therefore, this study explored the creation of synthetic data using the object insertion method for scaffolding, which has a high rate of fall incidents but a low object recognition rate due to its linear characteristics, to confirm the potential of hybrid datasets for obtaining construction-site data.
5. Conclusions
This study experimentally investigated the potential of synthetic data generation for scaffolding, a linear object that contributes to fall accidents on construction sites. The aim was to find the optimal ratio for mixing synthetic and real data to achieve high object recognition performance, to verify the effectiveness of the hybrid dataset through experimentation, and thereby to overcome the challenges of collecting training data at construction sites. We generated meaningful data at low cost by synthesizing 2D backgrounds and objects. Training with a hybrid dataset comprising real and synthetic data at a 2:8 ratio yielded performance similar to that of real data alone, validating the efficacy of the synthesized data. By reducing the proportion of data requiring direct collection at the construction site by 80% through the construction of a hybrid dataset, we mitigated the risk associated with data collection. The limitations of this study include the difficulty of making accurate comparisons among research papers, owing to the varied topics of each study, such as synthesized backgrounds, objects, and data quantities. Additional limitations are the small size of the dataset and the fact that the tested proportions of synthetic to real data do not cover all cases. Future research aims to increase the size of the dataset and diversify the mixing ratios of real and synthetic data to determine more precisely how these ratios affect object recognition performance. Despite these challenges, this study confirms the impact of hybrid datasets on performance improvement by creating synthetic data for linear objects, such as the scaffolds commonly found at construction sites.
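Assembling a hybrid training set at a given real-to-synthetic ratio, as in the 2:8 composition examined here, amounts to a simple sampling step. The sketch below is illustrative only; the function name, seed handling, and item lists are assumptions rather than the study's actual tooling.

```python
import random

def build_hybrid_dataset(real_items, synthetic_items, real_ratio, size, seed=0):
    """Sample a training set of `size` items with the given share of real data."""
    rng = random.Random(seed)                 # fixed seed for reproducible splits
    n_real = round(size * real_ratio)         # e.g. real_ratio=0.2 -> 2:8 mix
    n_synthetic = size - n_real
    return rng.sample(real_items, n_real) + rng.sample(synthetic_items, n_synthetic)
```

Sweeping `real_ratio` over several values and retraining on each resulting split is one straightforward way to compare recognition performance across composition ratios.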
As there are limitations to collecting all necessary data directly from construction sites, the significance of this work lies in generating low-cost, high-efficiency data by reducing the need for actual data collection through synthetic data creation. This research shows that synthetic data, combined with real data at appropriate ratios, can reduce the labor involved in direct on-site data collection, particularly for data that are challenging to obtain due to linear object characteristics. Future studies are needed to improve the accuracy of object recognition using synthetic datasets. In future research, we plan to enhance detection performance and increase data diversity by incorporating various types of object data collected at construction sites, as well as background data differentiated by environmental factors. We anticipate that including more diverse background data, tailored to specific environmental requirements and object types, will make the data more suitable for construction sites. The more closely synthetic data resemble real data, the greater the expected accuracy, which calls for additional imaging technology to bring synthetic data closer to reality. Image-processing techniques can adjust shadows, contrast, and color to make synthetic data more similar to real data. Furthermore, since real objects are three-dimensional, recognition can vary with an object’s position and angle. To address this, synthetic object data collected from various angles, rather than a single angle, are expected to yield higher object recognition rates. Moreover, combining AI with an ontology to assess whether a recognized object is in a safe or unsafe state can significantly enhance AI’s applicability [
20]. Simultaneous research on situation judgment, incorporating synthetic data generation and ontology, could lead to more accurate situation assessment. Beyond object recognition, if the safety or danger of a situation arising during work on an object can be judged, the accident rate at construction sites can be expected to decrease.
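The brightness, contrast, and color adjustments suggested above for narrowing the gap between synthetic and real imagery can be sketched as a simple photometric-jitter pass. This is an illustrative example, not a method used in this study; the function name and parameter conventions are assumptions.

```python
import numpy as np

def photometric_jitter(image, brightness=0.0, contrast=1.0, color=(1.0, 1.0, 1.0)):
    """Apply a brightness shift, contrast scaling, and per-channel color gain."""
    img = image.astype(np.float32)
    mean = img.mean()                                  # contrast pivots around the mean intensity
    img = (img - mean) * contrast + mean + brightness  # scale spread, then shift
    img = img * np.asarray(color, dtype=np.float32)    # per-channel (R, G, B) gain
    return np.clip(img, 0, 255).astype(np.uint8)
```

Applying such jitter with randomized parameters to each composited image is one low-cost way to vary lighting and color so that synthetic samples better cover the appearance range of real site footage.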