1. Introduction
The construction industry is among the most hazardous industries due to the highly open and dynamic nature of its work sites [1]. According to global occupational injury statistics, the construction industry employs only 7% of the global workforce, yet it accounts for 30–40% of annual occupational fatalities [2]. Even in developed EU countries, construction-related fatalities represent 20% of all industrial deaths each year [3], highlighting the importance of occupational safety in the construction industry.
The severity of occupational fatalities in Taiwan’s construction industry is even higher than the international figures. According to the latest data analysis from the Occupational Safety and Health Administration (OSHA) of the Ministry of Labor [4], statistics from 2012 to 2023 show that the construction industry averaged 151.3 deaths per year from occupational accidents, accounting for 48.1% of all occupational fatalities (Table 1). The fatality rate per 1000 construction workers (0.1835) was 6.31 times the national average across all industries (0.0291) [4] and 3–4 times that of other developed countries [2]. This highlights an urgent need to strengthen occupational safety management in the construction industry. In addition, “falls and rolling” accidents averaged 100.8 fatalities per year, accounting for two-thirds (66.6%) of construction fatalities and nearly one-third of all occupational fatalities, underscoring the severe threat to workers’ lives. In response to this serious hazard rate, the government declared 2024 the “Year of Fall Prevention in Construction” [5].
According to the Domino Theory [6], fall accidents can be prevented by eliminating the factors that lead to falls, known as potential hazards [7]. Cao and Xie [8] identified two main factors causing occupational accidents: unsafe behavior and unsafe conditions. Unsafe behavior accounted for 67.3% of accidents, with “improper use of personal protective equipment” (e.g., not wearing or incorrectly attaching safety harnesses) being the most common subtype at 40.3%. Unsafe conditions accounted for 38.5%, with “lack of safety devices on machinery/equipment” (e.g., missing or failing protective devices) being the most common at 40.5%. Ji et al. [9] found that improper use of protective equipment and removal of safety devices (e.g., removing guardrails or braces without replacement) are key factors in construction fall accidents.
2. Research Objectives
To reduce occupational accidents, advanced technologies have been integrated to enhance worker safety. Over the past decade, rapid advancements in deep learning have led to the widespread use of information technology for construction site safety management, particularly in applying computer vision (CV) for hazard monitoring [10]. Most construction sites are equipped with CCTV for 24 h video surveillance, enabling the integration of video data with CV for safety monitoring.
Unlike traditional manual inspections, computer vision allows for continuous safety monitoring. However, challenges remain in using computer vision for fall accident detection because hazardous scenarios often differ from safe ones only in subtle features [11]. Recognizing these details is complex and prone to misjudgment, which limits effective safety monitoring, yet such recognition is required for timely hazard detection and alerts [12]. For example, detecting scaffolding transverse brace installation requires recognizing thin, elongated components against complex backgrounds (Figure 1), while verifying correct safety lifeline hook-ups requires identifying brief and subtle actions (Figure 2). Therefore, specialized computer vision methods are necessary for construction hazard scenarios [13].
Based on these problems, we identified two primary hazardous scenarios causing severe fall and rolling hazards in construction accidents: as unsafe behavior, a safety lifeline that is not hooked up; and as an unsafe condition, a scaffolding transverse brace that is removed without restoration. We developed a deep learning model with optimized hyperparameters using transfer learning to accurately recognize the subtle features of these two hazards and to exceed human-level recognition performance.
3. Literature Review
Katsigiannis et al. [14] used a MobileNet v2 transfer learning approach to detect small cracks in brick walls. This lightweight, efficient model suits mobile or resource-limited environments and provides accurate, real-time crack analysis with excellent performance. Its ability to capture subtle crack variations, such as shape and size, is crucial, and its scalability enhances adaptability to different materials and lighting conditions, broadening its application scope. Zeng et al. [15] used a ResNet50 transfer learning model to identify maritime vessels, focusing on their fine features at sea. Known for strong feature extraction, ResNet50’s deep structure effectively captures the image details essential for vessel recognition in turbulent marine environments, improving identification accuracy and classification performance under varying sea and lighting conditions.
Junzhe et al. [16] applied transfer learning with YOLO v5 to detect small marine debris. YOLO v5’s excellent object detection performance offers high real-time accuracy, making it appropriate for multi-target scenarios and particularly effective for identifying small floating debris. The model detects stably under varying light and wave conditions, supporting environmental protection efforts with a flexible and reliable detection system. Sanida et al. [17] used VGGNet with transfer learning to capture the fine features of tomatoes, identifying growth changes such as height, leaf count, fruit size, and color. VGGNet’s deep structure allows accurate growth monitoring under different environmental conditions, supporting agricultural management decisions with stable recognition.
The key features, source models, and accuracy and recall rates of related works applying computer vision to subtle feature detection, all considered in this study, are summarized in Table 2.
4. Methodology
We developed a transfer learning method to detect two target subtle hazards.
4.1. Procedure
The developed transfer learning method proceeds as follows (Figure 3):
1. Select pre-trained source models, using the Matlab® (R2024a) Deep Network Designer to choose pre-trained deep learning models with different parameter counts.
2. Remove the last few layers of the pre-trained network.
3. Add a fully connected layer and an output layer adjusted to the dataset, with two output classes (True/False).
4. Adjust the hyperparameters and train the model, tuning the learning rate, L2 regularization, epochs, and batch size.
5. Test the models under different settings; if accuracy is low, adjust the parameters or select a new network.
6. Validate the models and output those reaching 95% accuracy; if the desired accuracy cannot be obtained, training is ended.
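The study carried out this procedure in Matlab® Deep Network Designer. As an illustration only, the iterative tune-test-validate loop described above can be sketched in Python; the `train_and_evaluate` function here is a stub returning made-up deterministic pseudo-accuracies, not a real training run:

```python
# Illustrative skeleton only: the actual study fine-tuned pre-trained
# networks in Matlab Deep Network Designer. This sketch shows just the
# control flow of trying hyperparameter settings until the 95% target
# accuracy is reached (or the candidates are exhausted).

TARGET_ACCURACY = 0.95  # validation threshold used in the study

def train_and_evaluate(model_name, hp):
    """Stub standing in for fine-tuning a pre-trained model whose last
    layers were replaced by a fully connected layer and a True/False
    output layer. Returns a made-up, deterministic pseudo-accuracy."""
    base = {"MobileNetV2": 0.92, "GoogleNet": 0.90}.get(model_name, 0.88)
    return base + (0.02 if hp["epochs"] >= 60 else 0.0)

def tune(model_name, candidate_settings):
    """Try settings until the target accuracy is reached; otherwise
    report the best attempt before training ends."""
    best_acc, best_hp = 0.0, None
    for hp in candidate_settings:
        acc = train_and_evaluate(model_name, hp)
        if acc >= TARGET_ACCURACY:
            return acc, hp            # desired accuracy obtained
        if acc > best_acc:
            best_acc, best_hp = acc, hp
    return best_acc, best_hp          # training ends with the best model

settings = [{"learn_rate": lr, "epochs": e}
            for lr in (0.001, 0.0001) for e in (30, 60)]
acc, hp = tune("MobileNetV2", settings)
print(round(acc, 2), hp["epochs"])  # 0.94 60
```

With the stubbed accuracies, no setting reaches 95%, so the loop ends by reporting the best combination found, mirroring the procedure's fallback branch.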
4.2. Selection of Pre-Trained Source Models
The Matlab® Deep Network Designer App provides 19 types of pre-trained source models [18]. When selecting source models, factors such as graphics processing unit (GPU) speed, memory capacity, and data requirements must be considered. Larger models with more parameters require more training data and demand higher GPU speed and memory capacity than smaller models. After a comprehensive evaluation, we selected four pre-trained networks as source models: MobileNet v2, GoogleNet, Inception v3, and ResNet50. The parameters of these four models are shown in Table 3.
The pre-trained models selected for transfer learning in this study are described as follows:
MobileNet v2: Introduced by Sandler et al. from Google in 2018 [19], MobileNet v2 is a lightweight deep learning model with a small size and low computational cost, making it suitable for resource-limited applications.
GoogleNet: Introduced by Szegedy et al. in 2014 [20] and also known as Inception v1, GoogleNet offers a novel deep learning architecture aimed at addressing issues such as overfitting, gradient vanishing, and gradient explosion, which are seen in earlier deep learning networks like AlexNet and VGG.
Inception v3: Developed by Szegedy et al. [21] as an improved version of GoogleNet, Inception v3 is a medium-large network featuring Inception modules for multi-level feature extraction, label smoothing to prevent overfitting, and symmetric and asymmetric blocks for enhanced feature representation.
ResNet50: A variant of the residual neural network introduced by He et al. [22], ResNet50 uses residual blocks with skip connections to address gradient vanishing. Its deep 50-layer architecture effectively captures complex data and image features, excelling in image classification, object detection, and segmentation tasks.
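As a minimal illustration (not part of the study's Matlab implementation), the skip connection that lets ResNet-style networks avoid vanishing gradients can be sketched in a few lines of Python, with plain lists standing in for feature maps:

```python
def relu(v):
    # Element-wise rectified linear unit
    return [max(0.0, x) for x in v]

def residual_block(x, transform):
    """y = ReLU(F(x) + x): the block learns a residual F(x) on top of the
    identity, so the input (and its gradient) can flow through the skip
    connection unchanged even when F contributes little."""
    fx = transform(x)                       # F(x): the learned transformation
    assert len(fx) == len(x), "skip connection needs matching dimensions"
    return relu([a + b for a, b in zip(fx, x)])

# Even if the learned transform collapses to zero, the block still passes
# its input through (after ReLU), which keeps very deep stacks trainable.
out = residual_block([1.0, -2.0, 3.0], lambda v: [0.0] * len(v))
print(out)  # [1.0, 0.0, 3.0]
```

Stacking many such blocks is what allows the 50-layer ResNet50 to train without the gradient degradation seen in comparably deep plain networks.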
4.3. Data Collection for Hazard Scenarios
In transfer learning, a dataset specific to the target problem is essential. Following the “collaborative strategy” of transfer learning, data from various construction sites must be collected to enhance model generalization. We collected data from five construction sites: three for scaffolding transverse brace installation images and two for safety lifeline hook-up images. The statistics for each site are shown in Table 4. The two fall hazard scenarios were defined in this study as follows:
Scaffolding transverse brace installation: Scaffolding often leads to falls when transverse braces are not reinstalled after temporary removal for wall work or material transport. We collected 395 images of correctly installed braces (Figure 1a) and 660 images of removed braces (Figure 1b), totaling 1055 images.
Safety lifeline hook-up: Safety harnesses must be properly attached to anchor points to prevent falls. However, workers often resist hooking their harnesses to lifelines to avoid movement restrictions, compromising protection. This study collected 529 images of correct hook-ups (Figure 2a) and 526 images of incorrect ones (Figure 2b), totaling 1055 images.
4.4. Hyperparameters for Model Fine-Tuning
After data collection, the parameters were tuned for transfer learning. In Matlab® Deep Network Designer, the relevant parameters for transfer learning are as follows:
Input image size: To simulate standard CCTV construction site monitoring, the image resolution is kept moderate, with the input images set to 299 × 299 × 3 RGB format.
The weight learning rate (WLR) in the fully connected layers is used to adjust the learning rate of the weights in the neural network layers, allowing users to set a rate different from the default for specific layers. This enables new layers to learn at different speeds during transfer learning. The default value is 1.
The bias learning rate (BLR) is used to adjust the learning rate for biases in the neural network layers, allowing users to set specific rates for bias updates. This is especially useful in transfer learning and model fine-tuning. The default value is 1.
Weight L2 regularization (WL2R) is used to adjust the intensity of L2 regularization for the weights in the neural network layers. L2 regularization prevents overfitting by adding a penalty term (the sum of squared weights) to the loss function. The WeightL2Factor value is used to determine the weight of this penalty, or the strength of regularization, with a default value of 1.
The initial learning rate (ILR) sets the starting learning rate for neural network training. It controls the step size of weight updates early in training, affecting training speed and convergence: a higher rate converges faster but less stably, while a lower rate trains stably but converges more slowly. A small default, such as 0.01, is often used to ensure stability.
Epochs represent a complete cycle in neural network training, where the model trains on the entire dataset once. The number of epochs depends on the desired accuracy and dataset size, with a default value of 30.
MiniBatchSize (MBS) defines the number of training samples used in each iteration to update the model parameters. An appropriate mini-batch size improves training efficiency and significantly affects model convergence and stability. The default value depends on GPU speed and memory and is typically set to 32 or 64.
Among the parameters, only the default input image size was fixed. The others were adjusted progressively during the transfer learning process to achieve optimal recognition accuracy.
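The interplay of these settings can be sketched outside Matlab. In a single SGD step, the per-layer weight learning-rate factor multiplies the global initial rate, and the L2 factor scales the regularization penalty added to the gradient. The sketch below is illustrative only (the `global_l2` value and the numbers in the example are assumptions, not values from the study):

```python
# Hedged sketch of how WLR, WL2R, and the initial learning rate combine in
# one gradient step; the study itself set these in Matlab Deep Network
# Designer rather than writing update rules by hand.

def sgd_step(w, grad, *, ilr, wlr_factor, wl2_factor, global_l2=1e-4):
    """Effective learning rate = initial rate x per-layer weight factor;
    L2 regularization adds wl2_factor * global_l2 * w to each gradient."""
    lr = ilr * wlr_factor                  # e.g. 0.001 * 10 for new layers
    l2 = wl2_factor * global_l2
    return [wi - lr * (gi + l2 * wi) for wi, gi in zip(w, grad)]

# Hypothetical configuration mirroring the parameters described above
options = {
    "input_size": (299, 299, 3),   # moderate CCTV-like resolution
    "initial_learn_rate": 0.001,   # lowered from the 0.01 default
    "weight_lr_factor": 10,        # new layers learn faster than frozen ones
    "bias_lr_factor": 10,
    "weight_l2_factor": 1,
    "max_epochs": 30,
    "mini_batch_size": 128,
}

w = sgd_step([0.5, -0.2], [0.1, 0.3],
             ilr=options["initial_learn_rate"],
             wlr_factor=options["weight_lr_factor"],
             wl2_factor=options["weight_l2_factor"])
print(w)
```

Raising `weight_lr_factor` from its default of 1 to 10 thus makes the newly added layers update ten times faster than the frozen pre-trained layers, which is the standard motivation for these per-layer factors in transfer learning.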
5. Results and Discussion
5.1. Testing
5.1.1. Test Platform and Performance Metrics
The specifications of the server used in the model training phase included (1) CPU: Intel(R) Xeon(R) E5-2620v4 @ 2.10 GHz; (2) RAM: 40 GB, 2400 MHz; (3) OS: Microsoft Windows 10; and (4) GPU: NVIDIA Quadro P2000 (5 GB). Given the numerous experiments in this study, the performance of the recognition results was evaluated by the prediction accuracy (%), computed with Equation (1) [23]:

Accuracy (%) = (TP + TN) / (TP + TN + FP + FN) × 100  (1)

where true positive (TP) refers to cases predicted as positive (P) and correctly identified (T); true negative (TN) refers to cases predicted as negative (N) and correctly identified; false positive (FP) refers to cases predicted as positive but incorrectly identified (F); and false negative (FN) refers to cases predicted as negative but incorrectly identified.
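Using the confusion-matrix definitions above, the metric is a one-liner; the counts in the example are hypothetical, chosen only to match the 540-image test set size:

```python
# Prediction accuracy per Equation (1): correct cases over all cases.
def accuracy(tp, tn, fp, fn):
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)

# Hypothetical example: 500 correct predictions out of 540 test images
print(round(accuracy(tp=260, tn=240, fp=25, fn=15), 2))  # 92.59
```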
5.1.2. Experiment Planning and Parameter Adjustment
Four experiments were designed based on the two subtle hazard scenarios and the dataset size:
Test 1(a): Preliminary test of scaffolding transverse brace recognition with a dataset of 540 images.
Test 1(b): Advanced test of scaffolding transverse brace recognition for correct cross-brace installation with the full dataset of 1055 images.
Test 2(a): Preliminary test of safety lifeline hook-up recognition with a small dataset of 545 images.
Test 2(b): Advanced test of safety lifeline hook-up recognition with the full dataset of 1055 images.
To optimize system performance, preliminary tests were conducted with a MiniBatchSize of 128 to obtain faster and better results for this dataset size. Additionally, the initial learning rate was set to 0.001 (instead of the default of 0.01) to improve learning outcomes. These two parameters were fixed, and the remaining parameters were tested incrementally, adjusted using the simplified Taguchi method [24]. MobileNet v2 was first tested with the default values, and each parameter was then adjusted in fivefold increments to find the optimal settings. The best combinations for each model and experiment are shown in Table 5, with the weight learning rate (WLR; 10, 15, 20, or 25), bias learning rate (BLR; 10, 15, 20, or 25), L2 regularization (1, 2, or 3), and epochs (30 or 60).
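To give a sense of the search space these candidate values span, the full factorial can be enumerated in a few lines; the simplified Taguchi method evaluates only a small, structured fraction of these 96 combinations rather than all of them:

```python
from itertools import product

# Candidate hyperparameter values reported for fine-tuning (Table 5)
WLR = (10, 15, 20, 25)   # weight learning-rate factor
BLR = (10, 15, 20, 25)   # bias learning-rate factor
WL2R = (1, 2, 3)         # L2 regularization factor
EPOCHS = (30, 60)

# Full factorial: every combination of the candidate values
grid = [dict(zip(("wlr", "blr", "wl2r", "epochs"), combo))
        for combo in product(WLR, BLR, WL2R, EPOCHS)]
print(len(grid))  # 4 * 4 * 3 * 2 = 96
```

This is why an orthogonal-array style method pays off: each combination costs a full training run, and 96 runs per model and per scenario would be prohibitive on the study's hardware.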
5.2. Testing Results
5.2.1. Test 1(a)
In Test 1(a), a small dataset of 540 images was used to test each model’s accuracy in identifying correct scaffolding transverse brace installation (Table 6). MobileNet v2 achieved the highest accuracy at 92%; however, none of the models reached the target accuracy of 95%.
5.2.2. Test 1(b)
In Test 1(b), the full dataset of 1055 images was used for testing. GoogleNet achieved the highest accuracy at 95.2%, meeting the target accuracy, as shown in Table 7.
5.2.3. Test 2(a)
Using a smaller dataset of 545 images, ResNet50 achieved the highest accuracy at 92.66% (Table 8). However, none of the models reached the target accuracy of 95%.
5.2.4. Test 2(b)
In Test 2(b), the full dataset of 1055 images was used for testing. Both ResNet50 and MobileNet v2 achieved accuracies higher than 96%, as shown in Table 9.
5.3. Summary and Discussion
With transfer learning, the two subtle fall hazard scenarios were successfully identified. After transfer learning with fine-tuning, the pre-trained deep learning models outperformed human recognition in the scenarios of “scaffolding transverse brace installation” and “correct safety lifeline hook-up”. This result confirms the potential of transfer learning for industrial applications in subtle hazard detection. Hazard recognition ability varied by task across the pre-trained models: GoogleNet performed best for “cross-brace installation” detection, while MobileNet v2 and Inception v3 excelled in “safety lifeline hook-up” detection. MobileNet v2 was also more efficient, requiring only 14.6% of the parameters of Inception v3. The simplified Taguchi method provided near-optimal parameter combinations and was effective for determining the hyperparameters for model fine-tuning. A “Collection” strategy for data augmentation did not improve model accuracy; in contrast, the “Collaboration” strategy improved accuracy by gathering datasets from various sites and increasing the sample size.
6. Conclusions
Computer vision is increasingly adopted in construction safety monitoring, where high accuracy is essential to protect workers. Accidents often result from subtle hazards that are misjudged, alerted too late, or not monitored at all. To address this issue, we applied transfer learning with deep learning models pre-trained on ImageNet, fine-tuning them with target datasets to capture subtle features and achieve better accuracy than human recognition. To validate the method, two typical subtle hazard scenarios—“scaffolding transverse brace installation” and “safety lifeline hook-up”—were tested using the pre-trained models MobileNet v2, GoogleNet, Inception v3, and ResNet-50. In the tests, GoogleNet achieved the highest accuracy for cross-brace detection (95.2%), while MobileNet v2 and Inception v3 excelled in lifeline hook-up detection (>96%), both surpassing human capability. The results confirm the effectiveness of transfer learning for subtle hazard detection, providing a reference for transfer strategies and parameter optimization in construction projects.
While this study demonstrates the potential of transfer learning for detecting two subtle hazards, time and resource constraints limited testing on other subtle hazards. Future research should apply the developed method to other subtle hazards in construction projects, such as scaffolding safety pins or crane anti-slip clips. Additionally, beyond the four pre-trained models tested here—MobileNet v2, GoogleNet, Inception v3, and ResNet-50—other models should be explored to enhance the recognition of subtle features.
Author Contributions
Research conceptualization, W.-D.Y.; conceptualization refinement, W.-T.H.; methodology, W.-D.Y. and W.-T.H.; model testing, C.-Y.T.; validation, W.-T.H.; formal analysis, W.-T.H.; investigation, W.-T.H.; data curation, W.-T.H.; writing—original draft preparation, W.-D.Y. and W.-T.H.; writing—review and editing, A.B.; visualization, W.-D.Y.; supervision, W.-T.H.; project administration, W.-D.Y.; funding acquisition, W.-D.Y. All authors have read and agreed to the published version of the manuscript.
Funding
This project (MOST 111-2221-E-324-011-MY3) was funded by the National Science and Technology Council of Taiwan. The authors gratefully acknowledge the support.
Institutional Review Board Statement
Not applicable, as this study does not involve humans or animals.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data are available from the corresponding author upon reasonable request.
Acknowledgments
The images for the case study were collected from the construction sites of anonymous project owners. The authors would like to express their sincere gratitude to the project owners.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Xia, X.; Xiang, P.; Khanmohammadi, S.; Gao, T.; Arashpour, M. Predicting Safety Accident Costs in Construction Projects Using Ensemble Data-Driven Models. J. Constr. Eng. Manag. 2024, 150, 04024054. [Google Scholar] [CrossRef]
- International Labour Organization (ILO). Safety and Health in the Construction Sector—Overcoming the Challenges; ILO: Geneva, Switzerland, 2014. [Google Scholar]
- Eurostat. Accidents at Work Statistics. Available online: http://ec.europa.eu/eurostat/statistics-explained/index.php/Accidents_at_work_statistics (accessed on 8 June 2024).
- Occupational Safety and Health Administration (OSHA), Ministry of Labor. Annual Labor Inspection Reports. Available online: https://www.osha.gov.tw/48110/48331/48333/48339/lpsimplelist (accessed on 8 June 2024).
- Occupational Safety and Health Administration (OSHA), Ministry of Labor. Strengthened Disaster Reduction Measures for the “Year of Fall Prevention in Construction, 2024”. OSHA Press Release. Available online: https://www.osha.gov.tw/48110/48417/48419/163014/post (accessed on 7 February 2024).
- Heinrich, H.W. Industrial Accident Prevention; McGraw-Hill: New York, NY, USA, 1931. [Google Scholar]
- Ministry of Labor. Taiwan Occupational Safety and Health Management System Guidelines. Available online: https://www.osha.gov.tw/48110/48713/48735/60262/ (accessed on 23 January 2024).
- Cao, C.; Xie, B. Analysis of Occupational Hazard Factors and Protective Strategies in Taiwan’s Manufacturing and Construction Industries—2017 Research Project (ILOSH 106-S313); Final Report; Institute of Labor, Occupational Safety and Health, Ministry of Labor, Taiwan: Taipei, Taiwan, 2017. (In Chinese) [Google Scholar]
- Ji, J.; Yang, H.; Chen, W.; Liu, K.; Zhang, T.; Ding, X. Analysis of Major Fall Scenarios and Prevention Strategies in the Construction Industry. J. Saf. Health 2008, 16, 383–400. [Google Scholar] [CrossRef]
- Seo, J.; Han, S.; Lee, S.; Kim, H. Computer Vision Techniques for Construction Safety and Health Monitoring. Adv. Eng. Inform. 2015, 29, 239–251. [Google Scholar] [CrossRef]
- Golovina, O.; Perschewski, M.; Teizer, J.; König, M. Algorithm for Quantitative Analysis of Close Call Events and Personalized Feedback in Construction Safety. Autom. Constr. 2019, 99, 206–222. [Google Scholar] [CrossRef]
- Li, Z.; Shi, A.; Li, X.; Dou, J.; Li, S.; Chen, T.; Chen, T. Deep Learning-Based Landslide Recognition Incorporating Deformation Characteristics. Remote Sens. 2024, 16, 992. [Google Scholar] [CrossRef]
- Fang, W.; Ding, L.; Luo, H.; Love, P.E.D. Falls from Heights: A Computer Vision-Based Approach for Safety Harness Detection. Autom. Constr. 2018, 91, 53–61. [Google Scholar] [CrossRef]
- Katsigiannis, S.; Seyedzadeh, S.; Agapiou, A.; Ramzan, N. Deep Learning for Crack Detection on Masonry Façades Using Limited Data and Transfer Learning. J. Build. Eng. 2023, 76, 107105. [Google Scholar] [CrossRef]
- Zeng, G.; Wang, R.; Yu, W.; Lin, A.; Li, H.; Shang, Y. A Transfer Learning-Based Approach to Maritime Warships Re-Identification. Eng. Appl. Artif. Intell. 2023, 125, 106696. [Google Scholar] [CrossRef]
- Junzhe, Z.; Fuqiang, J.; Yupeng, C.; Weiyi, W.; Qing, W. A Water Surface Garbage Recognition Method Based on Transfer Learning and Image Enhancement. Results Eng. 2023, 19, 101340. [Google Scholar] [CrossRef]
- Sanida, T.; Sideris, A.; Sanida, M.V.; Dasygenis, M. Tomato Leaf Disease Identification via Two–Stage Transfer Learning Approach. Smart Agric. Technol. 2023, 5, 100275. [Google Scholar] [CrossRef]
- MathWorks. Compare Pretrained Neural Networks. Matlab® R2024a Documentation. Available online: https://www.mathworks.com/help/deeplearning/ug/pretrained-convolutional-neural-networks.html (accessed on 18 June 2024).
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar] [CrossRef]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015. [Google Scholar] [CrossRef]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar] [CrossRef]
- Wikipedia. Confusion Matrix. Available online: https://en.wikipedia.org/wiki/Confusion_matrix (accessed on 20 June 2024).
- Tsai, W.-Y. A Study on Safety Recognition of Protective Openings Using Faster R-CNN Technology. Master’s Thesis, Department of Construction Engineering, Chaoyang University of Technology, Taichung, Taiwan, June 2020.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).