1. Introduction
Over the years, several industrial revolutions have taken place, transforming the industrial scene. Each of these evolutionary phases introduced the progressive integration of novel technologies into manufacturing processes, aiming to enhance efficiency, productivity, and economic growth. Today, Industry 5.0 is emerging, placing humans at the center of production. Specifically, human-centered Industry 5.0 emphasizes the integration of smart technologies, automation, and data exchange in manufacturing. Beyond its human-centric focus, the fifth revolution also promotes increased resilience and a stronger emphasis on sustainability.
The human-centric Industry 5.0 places a high priority on employee safety in manufacturing, emphasizing a knowledge-driven approach to human–machine–environment safety. Intelligent safety management that goes beyond conventional measures is necessary to deal with complex human–machine–environment interactions. Moreover, given the capabilities and opportunities of Industry 5.0, it becomes pivotal to design and develop safety management strategies that not only address the unique challenges of each manufacturing setting but are also robust enough to be adapted across varying operational landscapes.
By leveraging advanced technologies, an efficient, flexible, and adaptable monitoring system can be established. Specifically, object detection algorithms can be integrated into video surveillance systems to analyze footage in real time and identify potential hazards within an industrial environment. However, the accuracy of such a complex, AI-based process depends strongly on the quality and quantity of the training data.
The acquisition of real data poses significant challenges in terms of both cost and safety, particularly in cases where human participation is required. Especially in hazardous environments, such as manufacturing, obtaining real data involves significant risks, mainly concerning the safety of the personnel involved in data collection. These challenges can be addressed by utilizing game development platforms, such as Unity, to generate high-quality synthetic data via virtual reality (VR). This not only diminishes the time spent on data collection and annotation but also substantially reduces the requisite human effort and cost. Moreover, synthetic datasets can be tailored to the specifics of various industrial scenarios. VR acts as an immersive simulation tool, offering safe and controlled environments that mimic real-world scenarios. VR is utilized in various fields, such as education, healthcare, and engineering, with potential enhancements through the sensors available in VR systems [1].
One of the fundamental methods to protect workers is to monitor and control their exposure to hazards, as well as to detect and identify potential risks in the workplace. According to the National Institute for Occupational Safety and Health (NIOSH) in the U.S., personal protective equipment (PPE), which refers to specialized gear or clothing intended to protect individuals from potential hazards in the workplace, constitutes the final tier in the hierarchy of control measures, applied when hazards cannot be fully eliminated or engineered out. PPE is used to minimize the risk of injury or exposure to physical and other types of hazards; examples include safety helmets, vests and other protective clothing, and safety goggles.
In this research paper, we delve into the importance of a flexible and adjustable safety hazard detection system for Industry 5.0. Initially, we analyze the five industrial revolutions, with particular focus on Industries 4.0 and 5.0, presenting their benefits as well as the needs that drove the advent of the fifth revolution. Additionally, considering the human-centric nature of Industry 5.0, we examine the necessity of flexible and adaptable safety management methods in manufacturing that leverage advanced technologies and fully respect human–machine–environment interactions.
Moreover, we propose a flexible and adjustable detection system that factory safety management can exploit to detect hazards in real time. Considering the importance of PPE for personnel safety, we focus on detecting such equipment to enhance industrial safety; however, the proposed system is independent of the use case and can be applied to various scenarios and environments. The first stage of our system is the synthetic data generation methodology, which involves the creation of large-scale annotated datasets using 3D software, such as Blender, and a game development platform, such as Unity. The individual steps are detailed, and every aspect can be adjusted: the generated data can be modified and restructured to suit evolving requirements or to simulate new environments, enhancing the adaptability of the proposed methodology. The second stage concerns the training and evaluation of a detection model that can be deployed on video surveillance systems to identify the target hazards in real time. We evaluate the methodology in a practical scenario by comparing the performance of AI object detection models trained on real-world data from the CHV dataset and on our synthetic data. Additionally, we conduct a series of experiments to determine the optimal ratio of synthetic to real data in the training set of object detectors, aiming to achieve the highest possible performance with the minimum number of real-world samples. Finally, using the proposed methodology, we create a synthetic dataset of four PPE classes, namely vest, helmet, glove, and goggle. We train an object detector on this dataset and employ real-world images for inference, achieving real-time detections and demonstrating that the system can be exploited for real-time applications. The detection system is illustrated in
Figure 1. To the best of our knowledge, this is the first study to apply a synthetic dataset generation methodology based on a game development platform to PPE detection in manufacturing, offering insights into the capabilities and limitations of exploiting synthetic data for real-world applications in this domain.
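As a concrete illustration of the annotation format involved, the sketch below shows the standard conversion from pixel-space bounding boxes, as produced by a rendering pipeline, to the normalized label format expected by YOLO-family detectors. The helper name and example coordinates are illustrative, not part of the actual pipeline; only the class list matches the four PPE classes used in this work.

```python
CLASSES = ["vest", "helmet", "glove", "goggle"]  # the four PPE classes

def to_yolo_bbox(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert pixel corner coordinates to normalized YOLO (cx, cy, w, h)."""
    cx = (x_min + x_max) / 2 / img_w   # box center, normalized by image size
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w        # box size, normalized by image size
    h = (y_max - y_min) / img_h
    return cx, cy, w, h

# One label line per object: "<class_id> <cx> <cy> <w> <h>"
cx, cy, w, h = to_yolo_bbox(100, 50, 300, 250, 640, 480)
label_line = f"{CLASSES.index('helmet')} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
```

Because the coordinates are normalized, the same label remains valid regardless of the resolution at which the synthetic frame is rendered.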
The remainder of the paper is organized as follows:
Section 2 presents the different industrial evolutions, focusing on Industry 5.0 and its requirements for flexible and adaptable safety management methods.
Section 3 provides the related work and background for synthetic dataset generation methods as well as AI/ML object detection algorithms.
Section 4 introduces our proposed smart detection system, including the generation of annotated data and the creation of an AI object detection model, while
Section 5 presents the experiments and the results. Finally, the paper concludes in
Section 6.
5. Experiments and Results
The proposed system is evaluated in a series of experiments to assess its robustness in generating synthetic data that can be used to train AI object detectors for real-time applications.
5.3. Experimenting with Synthetic and Limited Amount of Real-World Data
As described in
Section 4.2, an AI model trained exclusively on a synthetic dataset cannot match the performance of one trained on a real-world dataset. However, in many real-world scenarios, acquiring large amounts of real data is infeasible, posing a significant challenge to the development of an effective AI model. Combining real and synthetic data in the training dataset is therefore a promising solution to the problem of limited real data, making it crucial to define the optimal ratio of real-world to synthetic data that maximizes the model’s performance.
Through a series of experiments, we examine the potential benefits and limitations of combining real and synthetic images in a training set used to train AI models for real-world applications. To this end, we design experiments that begin with a relatively small number of real images and progressively augment the dataset with synthetic images. The real images are sourced from the CHV dataset, of which 50 are allocated to the training set and 25 to the validation set. In each experiment, we train the YOLOv5s model on the combined training dataset and evaluate it on the CHV test set, comprising 133 images, to ensure a consistent benchmark.
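A minimal sketch of how such a mixed training set could be assembled is shown below, assuming the images are available as file lists. The paths, function name, and sampling strategy are illustrative, not the exact pipeline used in the experiments.

```python
import random

def build_mixed_training_set(real_images, synthetic_images,
                             n_real=50, n_synth=0, seed=0):
    """Combine a fixed subset of real images with a sampled pool of synthetic ones."""
    rng = random.Random(seed)                      # reproducible sampling
    real = real_images[:n_real]                    # fixed real subset, reused across experiments
    synth = rng.sample(synthetic_images, n_synth)  # growing synthetic subset
    mixed = real + synth
    rng.shuffle(mixed)                             # avoid ordering bias during training
    return mixed

# Hypothetical file lists standing in for the CHV and synthetic datasets
real = [f"chv/train_{i}.jpg" for i in range(50)]
synth = [f"synthetic/render_{i}.png" for i in range(1200)]

# e.g., a mix of 50 real and 600 synthetic images
train_set = build_mixed_training_set(real, synth, n_real=50, n_synth=600)
```

Keeping the real subset fixed across experiments isolates the effect of the synthetic volume, which is the variable under study.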
Table 4 presents the number of real and synthetic images constituting the training set for each experiment. In the first experiment, working under the constraints of limited data availability, the model is trained exclusively on a set of 50 real images. In the six subsequent experiments, the number of real images remains the same while the number of synthetic images is incrementally increased.
YOLOv5s is trained on the training dataset of each experiment and evaluated on the same CHV test set.
Table 5 presents the mAP values of the trained models for each experiment. In the initial experiment, the model, trained with only a small number of real images, achieves a mAP of 14.3%. In the second experiment, the incorporation of synthetic data leads to a slight improvement, increasing the overall mAP to 16.3%. In the third experiment, a significant rise in mAP is observed, reaching 71.2%. The highest mAP value of 84.1% is achieved in the sixth experiment, in which the training set contains 50 real images and 600 synthetic ones.
In the last experiment, where the number of synthetic images in the training set is doubled to 1200, the mAP value slightly decreases to 81.0%. One explanation for this drop is the model’s over-adaptation to the characteristics of the synthetic data: when exposed to a large volume of synthetic data, the model develops a bias, becoming particularly adept at recognizing objects under virtual conditions, which can compromise its ability to generalize to real-world scenarios. This trend suggests a saturation point beyond which adding more synthetic data no longer yields performance gains and may, in fact, diminish the model’s efficacy in real-world conditions.
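The saturation effect is easy to see programmatically. The snippet below encodes only the mAP values that the text explicitly ties to a synthetic-image count (50 real images in every case; intermediate experiments are omitted) and recovers the best-performing mix.

```python
# mAP (%) per number of synthetic images added to the 50 real ones,
# for the experiments whose synthetic counts are stated in the text
map_by_n_synth = {0: 14.3, 600: 84.1, 1200: 81.0}

# The mix that maximizes mAP marks the apparent saturation point
best_n_synth = max(map_by_n_synth, key=map_by_n_synth.get)
```

Past this point, additional synthetic images shift the training distribution further from the real-world test conditions rather than adding useful variety.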
To ensure the robustness and reliability of our findings, we repeat the experiments listed in
Table 4 two more times, each time selecting a different random set of 75 images from the CHV dataset. The methodology remains consistent. Specifically, we begin with 50 real images and then gradually introduce synthetic data to observe the impact on performance. As illustrated in
Figure 8, the models’ performance follows similar trends across all experiments, corroborating our initial results. Notably, the best performance of all models is observed when combining the 50 real training images with 600 synthetic images: this configuration yields an average mAP of 84.3%, with a standard deviation of 0.4%.
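The aggregate statistics over the repeated runs can be computed with the standard sample estimators. The per-run values below are hypothetical, chosen only to be consistent with the reported average of 84.3% and standard deviation of 0.4%; the actual per-run scores are not listed in the text.

```python
from statistics import mean, stdev

# Hypothetical per-run mAP (%) for the 50-real + 600-synthetic configuration
runs = [84.0, 84.1, 84.8]

avg = mean(runs)       # average mAP across the three runs
spread = stdev(runs)   # sample standard deviation
```

The small spread across independently sampled real subsets indicates that the 50-real/600-synthetic result is not an artifact of one particular draw from the CHV dataset.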
Remarkably, by employing a training set comprising just 50 real images combined with synthetic data, we manage to bridge the performance gap, achieving a result that is only 4.7% behind a model trained exclusively on real images.