Article

Exploring Edge Computing for Sustainable CV-Based Worker Detection in Construction Site Monitoring: Performance and Feasibility Analysis

1 Shenzhen THS Hi-Tech Co., Ltd., Shenzhen 518057, China
2 School of Civil Engineering and Architecture, Zhejiang University of Science and Technology, Hangzhou 310023, China
3 Faculty of Society and Design, Bond University, Robina, QLD 4226, Australia
4 Department of Building and Real Estate, The Hong Kong Polytechnic University, Hong Kong 999077, China
5 Institute of Quality Development Strategy, Wuhan University, Wuhan 430072, China
* Author to whom correspondence should be addressed.
Buildings 2024, 14(8), 2299; https://doi.org/10.3390/buildings14082299
Submission received: 10 May 2024 / Revised: 3 July 2024 / Accepted: 9 July 2024 / Published: 25 July 2024

Abstract

This research explores edge computing for construction site monitoring using computer vision (CV)-based worker detection methods. The feasibility of edge computing is validated by testing worker detection models (Yolov5 and Yolov8) on a local computer and three edge computing devices (Jetson Nano, Raspberry Pi 4B, and Jetson Xavier NX). The results show comparable mAP values across all devices, with the local computer processing frames six times faster than the Jetson Xavier NX. This study contributes by proposing an edge computing solution to address data security, installation complexity, and time delay issues in CV-based construction site monitoring. The approach also enhances data sustainability by mitigating risks associated with data loss, privacy breaches, and network connectivity issues. Additionally, it illustrates the practicality of employing edge computing devices for automated visual monitoring and provides valuable information for construction managers to select an appropriate device.

1. Introduction

The construction industry is data-intensive since heterogeneous data are generated continuously as construction progresses [1]. Consequently, data sustainability is crucial in this industry: it safeguards the long-term availability, integrity, efficient transmission, and privacy of project data. Upholding data sustainability throughout the construction lifecycle enables accurate tracking and efficient resource utilization, resulting in improved project outcomes.
Various automated methods have been used to enhance the effectiveness of construction management. In particular, the location, pose, and context information are collected with terminal devices such as sensors and cameras. The collected data are transferred to the local computer and processed using machine learning or deep learning (DL) methods for equipment and workers’ safety and productivity management. Computer vision (CV)-based DL methods have been extensively utilized for automatic construction management. For instance, some researchers [2,3,4,5] applied CV-based object detection and classification methods for workers’ and equipment safety and productivity monitoring on construction sites. These methods have greatly enhanced construction process monitoring efficiency through workforce reduction and cost savings.
Data sustainability is crucial for CV-based DL methods in construction site monitoring due to the significant requirements for extensive video data capture, storage, and transfer. Traditional data processing and communication solutions have to transfer the data collected from terminal devices (i.e., sensors, cameras) to a high-capacity system for storage and analysis. Data transfer of this nature could lead to cybersecurity concerns, time delays, and huge investments. For example, transmitting high-quality live videos from multiple cameras may cause a time delay [6]. Moreover, deploying high-performance on-site local computers requires massive labor and investment (e.g., cooling, power, place) [1]. Therefore, traditional data communication and analysis based on local computing and servers are insufficient for the diverse needs of current construction digitalization. The emerging method of edge computing is utilized to achieve high-efficiency data management in the construction industry.
Edge computing refers to a decentralized computing approach where data storage and processing are brought closer to the point of data generation [7]. Compared with traditionally used cloud computing, edge computing has the following advantages: (1) Security and privacy. There is a significant risk of data leakage or security breaches during the data transmission to the cloud. Data storage and processing at the edge device can protect the privacy and commercial secrets of the construction participants. (2) Real-time response. The proximity of data processing to the data source will significantly reduce the service response time, achieving nearly real-time performance. In certain application scenarios demanding real-time feedback (e.g., safety monitoring, traffic monitoring) [8], cloud computing fails to meet these needs. (3) Energy consumption. Edge computing alleviates network bandwidth constraints and mitigates the high energy consumption inherent in cloud computing [8].
CV-based methods have been extensively utilized to enhance construction productivity and safety through real-time site monitoring [2,3,4,5]. However, analyzing construction visual data using CV-based deep learning methods demands significant computational resources [2,3,4,5]. In this context, edge computing has emerged as a transformative solution for worker detection in construction monitoring. By enabling real-time video analysis directly on edge devices, it reduces the reliance on constant data transmission and promotes data sustainability. This paradigm shift not only improves the efficiency of analyzing worker activities but also enhances data privacy by minimizing dependence on cloud-based processing. However, the performance and feasibility of edge computing for CV-based worker detection, compared with traditional local computers, remain uncertain. To address this challenge and enable CV-based smart construction management, this study investigates the feasibility of edge computing approaches for automating the detection of construction workers.
This research aims to examine how edge computing supports CV-based worker detection in construction monitoring. The structure of this study is outlined as follows: First, the most widely used CV-based construction worker detection task is selected for the performance comparison. Then, the chosen models are evaluated on a local computer and three edge devices. Finally, the worker detection performance of each device, including processing speed, GPU usage, mean average precision (mAP), and other metrics, is compared and extensively discussed to validate the potential of leveraging edge computing for automated construction site surveillance.

2. Literature Review

Over the years, the computer science field has witnessed rapid development of artificial intelligence methods, enabling computers to record, understand, and interpret valuable visual information in images and videos. Surveillance cameras are widely installed on construction sites to record daily construction progress. Therefore, researchers and engineers have developed numerous CV-based artificial intelligence methods to analyze construction videos for site monitoring and management. This section introduces current research into CV-based methods for automatic construction management.
Object detection methods are widely used for detecting construction objects such as workers, equipment, materials, and structural defects [2,3,4,9]. Some researchers have employed deep learning-based detectors to detect construction workers and personal protective equipment for construction safety management. For example, Wu et al. [10] utilized the Single Shot MultiBox Detector (SSD) model to detect hardhats worn on construction sites and further classified the detected hardhats into four colors (blue, white, yellow, and red); the proposed method achieved an mAP of 83.89%. Chen et al. [9] utilized the You Only Look Once (YOLO)-v5 model to recognize workers’ hardhat use on an edge computing device, with the detection model achieving an mAP of 86.8%. Going beyond hardhat detection, Nath et al. [11] detected hardhats and safety vests with Yolo-v3 and obtained an mAP of 72.3%. Object detection methods are also widely used for structural defect detection [5,12]. For instance, Zhang et al. [12] applied the Yolo model to detect cracks on bridge surfaces with a precision of 90.88%, and Jiang et al. [13] used Yolo-v3 and SSD models to identify concrete surface damage and classify it into cracks, spots, rebar exposure, and spalling.
Object tracking is another visual analyzing method commonly employed for visual monitoring in construction projects. The tracking method is usually integrated with the detection approach to obtain the trajectory of objects within the video. For instance, Xiao [14] created a multiple construction equipment tracker and combined it with the Yolo-v3 detector to track the trajectory of excavators and trucks in construction videos. Kim et al. [15] also used a Tracking–Learning–Detecting (TLD) tracker to monitor the movements of a specific truck across multiple camera feeds, enabling the assessment of the truck’s productivity. Zhu et al. [16] introduced a visual-based framework that combined detection techniques and tracking methods for real-time construction workforce and equipment tracking.
Activity recognition methods are widely used for construction safety and productivity control. Some researchers used the activity recognition method to recognize workers’ abnormal activities (e.g., falling, laying) in construction videos, thus identifying the unsafe conditions and protecting workers in advance [17,18]. Activity information can also be used for productivity monitoring. For example, Luo et al. [19,20] proposed spatial–temporal Convolutional Neural Network (CNN) models to recognize workers’ activities such as rebar connecting, moving, preparing, placing framework, etc. Then, by analyzing the time of the activities, workers’ productivity can be estimated. In addition to workers’ activity recognition, some researchers focus on equipment activity recognition [21,22,23]. Kim and Chi [21] applied the CNN model with a double-layer Long Short-Term Memory (LSTM) structure for activity recognition of excavators and analysis of the work cycles. Similarly, Chen et al. [23] introduced a framework that uses a zero-shot learning method to classify the activities of excavators and loaders and estimate the excavators’ productivity based on the sequential relationship of activities.
Currently, visual analysis methods are widely used in automatic construction management. However, existing methods mainly rely on traditional local computers for data analysis, which requires extensive development efforts for on-site installation and high bandwidth for large visual data transmission. Such constraints hinder the implementation of computer vision-based methods in real-world construction management projects. To address these problems, this work proposes an edge computing-based method and compares the performance of edge computing devices in executing CV-based tasks for detecting construction workers.

3. Methods

This study entailed training widely used object detection models for testing purposes. Subsequently, these models were deployed onto three edge computing devices to execute construction object detection tasks based on computer vision (CV). Finally, the proposed method’s performance was compared on local computers and edge computing devices. Figure 1 illustrates the framework of this study.

3.1. CV-Based Construction Object Detection Models

Two representative deep learning models for CV-based construction object detection were selected and trained for testing on edge devices. Object detection has been widely applied in the construction industry to identify workers and equipment. Considering this study’s relevance and aims, both the Yolov5 [24] and Yolov8 [25] models were selected for comparison: Yolov5 for its consistent performance in prior studies [9,26], and Yolov8 as the latest advancement in object detection technology. Detailed information on the two models can be retrieved from Jocher [24] and Reis et al. [25], respectively.
As shown in Figure 2, the network architecture of the Yolov5 model has three main components: backbone, neck, and head. To perform object detection, an image is first fed into the backbone to extract features, which are then passed to the neck; the YOLO head then generates feature maps from which the detection results are produced. The Yolov8 network likewise consists of a backbone, a neck, and a head. As shown in Figure 3, the backbone is built on a 53-layer deep convolutional network to extract features from images efficiently. The neck connects the backbone to the head and reduces the size of the features to enable efficient processing. The head consists of multiple convolutional layers that predict the class and location of objects in images. This architecture enables Yolov8 to dynamically adjust its focus on various image components based on their relative importance and to detect both large and small objects by performing detection at multiple scales.

3.2. Edge Computing Devices

This study selected three edge computing devices for testing: Raspberry Pi 4B (Raspberry Pi, London, UK), Jetson Nano (NVIDIA, Santa Clara, CA, USA), and Jetson Xavier NX (NVIDIA, Santa Clara, CA, USA). Raspberry Pi 4B was chosen for its low cost and small size. Jetson Nano and Jetson Xavier NX were selected because they are widely used edge computing devices with excellent performance; the Jetson Xavier NX outperforms the Jetson Nano in computational performance but at a much higher price point. The detailed specifications of the devices are shown in Table 1. The NVIDIA Jetson Nano is a powerful small-sized computing device that allows developers to build practical artificial intelligence (AI) applications. It has a quad-core ARM Cortex-A57 MPCore CPU, a 128-core NVIDIA Maxwell GPU, 4 GB of memory, and 32 GB of data storage, and measures 69.6 mm × 45 mm. The Raspberry Pi 4B is the latest Raspberry Pi board, offering computing comparable to a local computer in a much smaller package; it measures 85 mm × 56 mm and has a 1.8 GHz 64-bit quad-core ARM Cortex-A72 CPU without a dedicated GPU. The Jetson Xavier NX has a powerful CPU and GPU that bring its computing performance close to that of a local computer: a 384-core NVIDIA Volta GPU with 48 Tensor Cores and a 6-core NVIDIA Carmel ARM v8.2 64-bit CPU, with 128 GB of storage and 8 GB of memory; it also measures 69.6 mm × 45 mm. Images of the edge computing devices are shown in Figure 4.

3.3. Evaluation Criteria and Strategies

The widely used mAP criterion is adopted to measure worker detection performance [9]. The mAP value is calculated from the intersection-over-union (IoU), a universal standard for evaluating the overlap between predicted and ground-truth bounding boxes. The detailed computation of the IoU value can be retrieved from the paper of Xiao et al. [27]. An object is regarded as correctly detected when the IoU is larger than 0.5. In addition, precision is calculated with Equation (1) and recall with Equation (2) [9]. A true positive (TP) is an object whose category is correctly classified and whose bounding box is correctly predicted. A detection is a false positive (FP) if the bounding box or the category is incorrectly predicted. A false negative (FN) is a ground-truth object that the detector misses, i.e., no predicted box overlaps it sufficiently.
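The IoU criterion above can be made concrete with a short sketch (a minimal illustration assuming axis-aligned boxes in (x1, y1, x2, y2) form, not the exact implementation of [27]):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A detection counts as correct when IoU with a ground-truth box exceeds 0.5.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # half-overlapping boxes -> 1/3
```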
Precision = TP / (TP + FP)        (1)
Recall = TP / (TP + FN)        (2)
The precision–recall curve is drawn from the precision and recall values. Precision quantifies the proportion of TPs among all detected objects, and recall measures the proportion of TPs among all actual objects in the image. The 11-point interpolation method is then used to approximate the area under the curve: the recall axis is divided into 11 points, and the AP value is computed by averaging the interpolated precision at these points, as illustrated in Equation (3). The mAP value is then the average AP over all object categories, calculated with Equation (4) [9].
AP = (1/11) Σ_{Recall_i} Precision(Recall_i)        (3)
mAP = (1/N) Σ_{k=1}^{N} AP_k        (4)
where AP_k is the AP of class k and N is the number of object categories.
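Equations (1)–(4) can be sketched directly in Python (a minimal illustration; `pr_points` is a hypothetical list of (recall, precision) pairs taken from the precision–recall curve):

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from detection counts, Equations (1) and (2)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def ap_11_point(pr_points):
    """11-point interpolated AP, Equation (3): average the highest precision
    observed at recall >= t for t = 0.0, 0.1, ..., 1.0."""
    total = 0.0
    for t in (i / 10 for i in range(11)):
        candidates = [p for r, p in pr_points if r >= t]
        total += max(candidates) if candidates else 0.0
    return total / 11

def mean_ap(ap_per_class):
    """mAP, Equation (4): the mean AP over all N object categories."""
    return sum(ap_per_class) / len(ap_per_class)

# Example: a detector with 8 TPs, 2 FPs, and 2 FNs on one class.
print(precision_recall(8, 2, 2))  # (0.8, 0.8)
```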

4. Results and Discussion

The models were trained on a local computer equipped with an NVIDIA GTX 1660 graphics card (NVIDIA, Santa Clara, CA, USA) and an Intel Core i5-9300H CPU (Intel, Santa Clara, CA, USA). The price of the local computer was HK$13,000. The proposed method was implemented on a 64-bit Ubuntu system using Python. The Yolov5 and Yolov8 models were trained on a dataset of 5000 images recorded from real construction sites in Hong Kong and labeled manually with the LabelImg annotation tool. For training, the dataset was divided into a training set of 4000 images and a test set of 1000 images. The Yolov5 model was run in a PyTorch environment, with CUDA toolkit 11.6 and cuDNN 8.3.2 used to accelerate training. The training configuration of Yolov5 was a batch size of 16, 100 training iterations, and a learning rate of 0.01; training took 4.16 h. Figure 5b,c show the precision and recall curves, respectively, and Figure 5a shows that the mAP value is 0.852, equal to the area under the curve. The training configuration of Yolov8 was the same: a batch size of 16, 100 training iterations, and a learning rate of 0.01. It took 1.11 h to train, and the mAP value is 0.824. The mAP, precision, and recall values of the two detection models are compared in Table 2, and Table 3 shows examples of test results.
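The 4000/1000 split described above can be sketched as follows (the file names and seed are hypothetical placeholders standing in for the labeled site images; the seed is an assumption for reproducibility):

```python
import random

# Hypothetical file names standing in for the 5000 labeled site images.
images = [f"img_{i:04d}.jpg" for i in range(5000)]

random.seed(0)  # assumed seed, for a reproducible split
random.shuffle(images)
train, test = images[:4000], images[4000:]

print(len(train), len(test))  # 4000 1000
```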

4.1. Test and Comparison Results

A video collected from a real construction site was used to test worker detection performance. A total of 18,000 video frames, spanning 20 min, were used for the test. The video was recorded at 15 frames per second (FPS) with a 720 × 1280 pixel resolution, and each frame was resized to 640 × 640 pixels before being fed into the neural networks. The test results on the edge computing devices and the local computer are detailed in this section; performance indicators such as processing speed, mAP, GPU consumption, and CPU consumption are compared, with the comparison results shown in Table 3. The test results show that the mAP values on the local computer and the edge computing devices are the same. The processing speed of one video frame on the local computer was 0.9 ms for Yolov5 and 0.8 ms for Yolov8. Among the three edge computing devices, the Xavier NX had the fastest computing speeds: 5.4 ms for Yolov5 and 5.8 ms for Yolov8. The Raspberry Pi 4B was the slowest, at 45.9 ms and 25.7 ms for Yolov5 and Yolov8, respectively, while the Jetson Nano processed one frame in 14.2 ms with Yolov5 and 13.2 ms with Yolov8. The GPU consumption of the Jetson Nano was the largest among all devices for both models, and the Raspberry Pi 4B had the largest CPU consumption since it lacks a GPU. Figure 6 shows several example frames of the tested video: 14 workers in the frame were detected correctly by Yolov5, whereas one worker at the middle top of the frame was missed by Yolov8.

4.2. Discussion

This work demonstrates the feasibility of using edge computing devices for construction worker detection through comparative tests, thus contributing to the research community. It also provides detailed performance figures for worker detection on a local computer and different types of edge computing devices, which can support the application of edge computing in real building projects. The test results show that Yolov8 is faster than Yolov5 on all tested devices except the Xavier NX, which nevertheless achieved the fastest computing speed among the edge devices. The local computer is approximately six times faster than the Xavier NX.
Comparing the GPU and CPU consumption of Yolov8 and Yolov5 across the tested devices shows that Yolov8 requires higher CPU consumption than Yolov5, while its GPU consumption is lower. Therefore, if the Yolov5 model is used for detection, a device equipped with a higher-performance GPU is the best choice.
To achieve real-time processing of 15 fps video, the computing device should be able to process each video frame within about 67 ms (1000 ms / 15 frames). The comparison results in Table 3 show that all the edge computing devices tested in this work meet this requirement. Therefore, based on Table 2 and considering device cost, the Raspberry Pi 4B is the most cost-efficient choice for construction monitoring; considering device size, which eases on-site installation, the Jetson Nano should be selected.
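As a quick sanity check, the 15 fps requirement can be expressed in code (a minimal sketch using the per-frame times reported in Section 4.1):

```python
# Real-time budget for 15 fps video: each frame must be processed
# within 1000 / 15 ~= 66.7 ms.
TARGET_FPS = 15
budget_ms = 1000 / TARGET_FPS

# Per-frame inference times (ms) reported for the edge devices.
times_ms = {
    ("Jetson Xavier NX", "Yolov5"): 5.4,
    ("Jetson Xavier NX", "Yolov8"): 5.8,
    ("Jetson Nano", "Yolov5"): 14.2,
    ("Jetson Nano", "Yolov8"): 13.2,
    ("Raspberry Pi 4B", "Yolov5"): 45.9,
    ("Raspberry Pi 4B", "Yolov8"): 25.7,
}

def meets_real_time(ms_per_frame, budget=budget_ms):
    """True when the device keeps up with the 15 fps video stream."""
    return ms_per_frame <= budget

for (device, model), ms in times_ms.items():
    status = "real-time" if meets_real_time(ms) else "too slow"
    print(f"{device} / {model}: {ms:5.1f} ms -> {status}")
```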

5. Conclusions

This study proposes an edge computing approach to enhance the implementation of CV-based worker detection in construction monitoring scenarios and confirms the viability of employing edge computing for automated construction oversight. First, the most widely used CV-based construction worker detection task was selected for the test. Then, the worker detection models Yolov5 and Yolov8 were tested on a local computer and three recently developed edge computing devices (i.e., Jetson Nano, Raspberry Pi 4B, and Jetson Xavier NX) to evaluate and compare their performance. The test results on construction videos showed that the mAP values on the local computer and the edge devices are the same. The processing speed on the local computer is 0.8 ms per video frame, roughly six times faster than the Jetson Xavier NX, the fastest among the three tested edge computing devices.
This study makes three key contributions. Firstly, it proposes an edge computing solution to address data security, complex site installation, and time delays commonly encountered in traditional CV-based construction site monitoring. Secondly, it evaluates the feasibility of using edge computing for automated visual surveillance of construction sites by testing a CV-based worker detection task on local computers and edge computing devices. Lastly, this study compares the performance of visual tasks on four different edge computing devices, offering valuable insights for construction managers in selecting suitable devices for future construction management. Overall, by providing performance benchmarks between local and edge computing, this study is a valuable resource for industry stakeholders. Insights gained from this research can guide the adoption of edge computing solutions in construction management, promoting innovation and enhancing overall project productivity and safety. This is crucial for improving the scalability and practicality of CV-based construction site monitoring systems.
These contributions have both theoretical and practical implications. In theoretical terms, a more efficient and safer edge computing method has been proposed to improve the efficiency of automatic construction management, promote innovation, and enhance the overall productivity and safety of the whole project. Furthermore, this research benchmarks the performance of local computing versus edge computing for CV tasks; this comparative analysis provides a foundational understanding of the computational requirements and capabilities of edge computing in construction management applications, potentially informing the development of future theoretical models and computational frameworks. In practical terms, this study’s comparison of edge computing devices’ performance offers guidance to construction managers, enabling informed decisions on selecting hardware that best fits their monitoring needs. This practical contribution can lead to more effective CV implementations in construction, optimizing resource allocation and improving safety standards, which are crucial for improving the scalability and practicality of CV-based construction site monitoring systems.
Several limitations of this research should also be noted. First, this study exclusively examined one type of CV-based deep learning task: worker detection. Since there are various types of CV-based construction tasks, more tasks should be tested to validate their performance on edge computing devices. Second, the Yolov5 and Yolov8 models used for testing were designed for local computer use, so their performance on local computers is naturally superior to that on edge computing devices; testing models specifically designed for edge deployment would be preferable. Finally, because the primary objective of this study is to explore the performance of edge computing devices in worker detection within the construction industry, it does not assess deployment costs, environmental conditions, or variability among different device units.
Based on the limitations, more CV-based construction tasks such as worker and equipment tracking, activity recognition, and instance segmentation should also be tested to compare their performances on local computers and edge computing devices. In addition, an improved model adapted to lightweight edge computing devices should be created in future works. It might exhibit superior performance on edge computing devices. Moreover, an edge computing system that includes camera and edge computing device placement, data transfer methods, and network structure should be designed and validated in an actual construction environment.

Author Contributions

Methodology, X.X.; software, C.C.; validation, X.X. and M.S.; formal analysis, C.C.; resources, H.L. and Y.D.; writing—original draft, X.X. and C.C.; writing—review and editing, M.S. and Y.D.; visualization, X.X. and Y.D.; supervision, M.S. and H.L.; project administration, C.C.; funding acquisition, H.L. All authors have read and agreed to the published version of the manuscript.

Funding

The research was funded by the Shenzhen Science and Technology Innovation Committee [grant No. SGDX20211123114400001] and the Hong Kong Innovation and Technology Commission [grant No. GHP/166/21SZ]. The Shenzhen Science and Technology Innovation Committee funded the APC.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Xue Xiao was employed by the company Shenzhen THS Hi-Tech Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Bello, S.A.; Oyedele, L.O.; Akinade, O.O.; Bilal, M.; Delgado, J.M.D.; Akanbi, L.A.; Ajayi, A.O.; Owolabi, H.A. Cloud computing in construction industry: Use cases, benefits and challenges. Autom. Constr. 2021, 122, 103441. [Google Scholar] [CrossRef]
  2. Xiao, B.; Zhang, Y.; Chen, Y.; Yin, X. A semi-supervised learning detection method for vision-based monitoring of construction sites by integrating teacher-student networks and data augmentation. Adv. Eng. Inform. 2021, 50, 101372. [Google Scholar] [CrossRef]
  3. Arabi, S.; Haghighat, A.; Sharma, A. A deep-learning-based computer vision solution for construction vehicle detection. Comput.-Aided Civ. Infrastruct. Eng. 2020, 35, 753–767. [Google Scholar] [CrossRef]
  4. Park, M.-W.; Elsafty, N.; Zhu, Z. Hardhat-Wearing Detection for Enhancing On-Site Safety of Construction Workers. J. Constr. Eng. Manag. 2015, 141, 04015024. [Google Scholar] [CrossRef]
  5. Flah, M.; Suleiman, A.R.; Nehdi, M.L. Classification and quantification of cracks in concrete structures using deep learning image-based techniques. Cem. Concr. Compos. 2020, 114, 103781. [Google Scholar] [CrossRef]
  6. Wang, T.; Zhang, G.; Liu, A.; Bhuiyan, M.Z.A.; Jin, Q. A Secure IoT Service Architecture with an Efficient Balance Dynamics Based on Cloud and Edge Computing. IEEE Internet Things J. 2019, 6, 4831–4843. [Google Scholar] [CrossRef]
  7. Satyanarayanan, M. The Emergence of Edge Computing. Computer 2017, 50, 30–39. [Google Scholar] [CrossRef]
  8. Cao, K.; Liu, Y.; Meng, G.; Sun, Q. An Overview on Edge Computing Research. IEEE Access 2020, 8, 85714–85728. [Google Scholar] [CrossRef]
  9. Chen, C.; Gu, H.; Lian, S.; Zhao, Y.; Xiao, B. Investigation of Edge Computing in Computer Vision-Based Construction Resource Detection. Buildings 2022, 12, 2167. [Google Scholar] [CrossRef]
  10. Wu, J.; Cai, N.; Chen, W.; Wang, H.; Wang, G. Automatic detection of hardhats worn by construction personnel: A deep learning approach and benchmark dataset. Autom. Constr. 2019, 106, 102894. [Google Scholar] [CrossRef]
  11. Nath, N.D.; Behzadan, A.H.; Paal, S.G. Deep learning for site safety: Real-time detection of personal protective equipment. Autom. Constr. 2020, 112, 103085. [Google Scholar] [CrossRef]
  12. Zhang, J.; Qian, S.; Tan, C. Automated bridge surface crack detection and segmentation using computer vision-based deep learning model. Eng. Appl. Artif. Intell. 2022, 115, 105225. [Google Scholar] [CrossRef]
  13. Jiang, Y.; Pang, D.; Li, C. A deep learning approach for fast detection and classification of concrete damage. Autom. Constr. 2021, 128, 103785. [Google Scholar] [CrossRef]
  14. Xiao, B.; Lin, Q.; Chen, Y. A vision-based method for automatic tracking of construction machines at nighttime based on deep learning illumination enhancement. Autom. Constr. 2021, 127, 103721. [Google Scholar] [CrossRef]
  15. Kim, J.; Chi, S. Multi-camera vision-based productivity monitoring of earthmoving operations. Autom. Constr. 2020, 112, 103121. [Google Scholar] [CrossRef]
  16. Zhu, Z.; Ren, X.; Chen, Z. Integrated detection and tracking of workforce and equipment from construction jobsite videos. Autom. Constr. 2017, 81, 161–171. [Google Scholar] [CrossRef]
  17. Yu, M.; Gong, L.; Kollias, S. Computer vision based fall detection by a convolutional neural network. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, Glasgow, UK, 13–17 November 2017; Association for Computing Machinery: New York, NY, USA, 2017; pp. 416–420. [Google Scholar] [CrossRef]
  18. Ramirez, H.; Velastin, S.A.; Meza, I.; Fabregas, E.; Makris, D.; Farias, G. Fall Detection and Activity Recognition Using Human Skeleton Features. IEEE Access 2021, 9, 33532–33542. [Google Scholar] [CrossRef]
  19. Luo, X.; Li, H.; Cao, D.; Yu, Y.; Yang, X.; Huang, T. Towards efficient and objective work sampling: Recognizing workers’ activities in site surveillance videos with two-stream convolutional networks. Autom. Constr. 2018, 94, 360–370. [Google Scholar] [CrossRef]
  20. Luo, H.; Xiong, C.; Fang, W.; Love, P.E.D.; Zhang, B.; Ouyang, X. Convolutional neural networks: Computer vision-based workforce activity assessment in construction. Autom. Constr. 2018, 94, 282–289. [Google Scholar] [CrossRef]
  21. Kim, J.; Chi, S. Action recognition of earthmoving excavators based on sequential pattern analysis of visual features and operation cycles. Autom. Constr. 2019, 104, 255–264. [Google Scholar] [CrossRef]
  22. Chen, C.; Zhu, Z.; Hammad, A. Automated excavators activity recognition and productivity analysis from construction site surveillance videos. Autom. Constr. 2020, 110, 103045. [Google Scholar] [CrossRef]
  23. Chen, C. Automatic vision-based calculation of excavator earthmoving productivity using zero-shot learning activity recognition. Autom. Constr. 2023, 146, 104702. [Google Scholar] [CrossRef]
24. Jocher, G. YOLOv5 by Ultralytics. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 10 May 2024). [Google Scholar]
  25. Reis, D.; Kupec, J.; Hong, J.; Daoudi, A. Real-Time Flying Object Detection with YOLOv8. arXiv 2023, arXiv:2305.09972. [Google Scholar] [CrossRef]
  26. Ge, P.; Chen, Y. An Automatic Detection Approach for Wearing Safety Helmets on Construction Site based on YOLOv5. In Proceedings of the 2022 IEEE 11th Data Driven Control and Learning Systems Conference (DDCLS), Chengdu, China, 3–5 August 2022; pp. 140–145. [Google Scholar] [CrossRef]
  27. Xiao, B.; Kang, S.-C. Vision-Based Method Integrating Deep Learning Detection for Tracking Multiple Construction Machines. J. Comput. Civ. Eng. 2021, 35, 04020071. [Google Scholar] [CrossRef]
Figure 1. Flowchart of the framework.
Figure 2. Structure of YOLOv5 (image reprinted/adapted with permission from Ref. [9]; 2023, MDPI).
Figure 3. Structure of YOLOv8.
Figure 4. Images of the proposed edge computing devices.
Figure 5. Training curves of mAP, precision, and recall for YOLOv5 and YOLOv8.
Figure 6. Example frames of the test video.
Table 1. Detailed specifications of edge computing devices.

| | Jetson Nano | Raspberry Pi 4B | Jetson Xavier NX |
|---|---|---|---|
| GPU | 128-core NVIDIA Maxwell architecture GPU | N/A | 384-core NVIDIA Volta architecture GPU with 48 Tensor Cores |
| CPU | Quad-core ARM Cortex-A57 MPCore | Broadcom BCM2711, quad-core Cortex-A72 (ARM v8) 64-bit SoC @ 1.8 GHz | 6-core NVIDIA Carmel ARM v8.2 64-bit CPU, 6 MB L2 + 4 MB L3 |
| Memory | 4 GB 64-bit LPDDR4, 25.6 GB/s | 8 GB LPDDR4-3200 SDRAM | 8 GB 128-bit LPDDR4x, 59.7 GB/s |
| Display | 2× multi-mode DP 1.2/HDMI 2.0; 1 × 2 DSI | 2× micro-HDMI; 1 × 2 DSI | 2× multi-mode DP 1.4/eDP 1.4/HDMI 2.0 |
| Data storage | 32 GB Micro-SD | 128 GB Micro-SD | 128 GB SSD |
| Connectivity | 1× GbE | WLAN, Bluetooth 5.0, GbE | 1× GbE |
| Size | 69.6 mm × 45 mm | 85 mm × 56 mm | 69.6 mm × 45 mm |
| Price | HK$1500 | HK$500 | HK$4500 |
Table 2. Comparison of training results of YOLOv5 and YOLOv8.

| Model | mAP | Precision | Recall |
|---|---|---|---|
| YOLOv5 | 0.852 | 0.86 | 0.79 |
| YOLOv8 | 0.824 | 0.78 | 0.82 |
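For reference, the precision and recall values reported in Table 2 can be combined into an F1 score (the harmonic mean of precision and recall). The F1 score is not a metric reported by the authors; the following is a minimal sketch using values transcribed from Table 2:

```python
# Precision and recall values transcribed from Table 2.
results = {
    "YOLOv5": {"precision": 0.86, "recall": 0.79},
    "YOLOv8": {"precision": 0.78, "recall": 0.82},
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

for model, m in results.items():
    print(f"{model}: F1 = {f1(m['precision'], m['recall']):.3f}")
```

On these numbers, YOLOv5's higher precision gives it a slightly higher F1 than YOLOv8, consistent with its higher mAP in Table 2.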
Table 3. Comparison of results for the local computer and edge computing devices.

| Device | Model | Speed (ms/Frame) | GPU Consumption (%) | CPU Consumption (%) | mAP (%) |
|---|---|---|---|---|---|
| Local computer | YOLOv5 | 0.9 | 77 | 159.5 | 85 |
| Local computer | YOLOv8 | 0.8 | 69 | 185.7 | 82 |
| Jetson Nano | YOLOv5 | 14.2 | 84 | 36.5 | 85 |
| Jetson Nano | YOLOv8 | 13.2 | 78 | 53.9 | 82 |
| Raspberry Pi 4B | YOLOv5 | 45.9 | N/A | 400 | 85 |
| Raspberry Pi 4B | YOLOv8 | 25.7 | N/A | 400 | 82 |
| Xavier NX | YOLOv5 | 5.4 | 68 | 120.3 | 85 |
| Xavier NX | YOLOv8 | 5.9 | 57 | 168.2 | 82 |
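The speed figures in Table 3 are given in milliseconds per frame. A short sketch (values transcribed from the YOLOv5 rows of Table 3; variable names are illustrative) converts them to frames per second and to the slowdown relative to the local computer, reproducing the roughly six-fold gap between the local computer and the Jetson Xavier NX noted in the abstract:

```python
# Per-frame YOLOv5 inference times (ms) transcribed from Table 3.
times_ms = {
    "Local computer": 0.9,
    "Jetson Nano": 14.2,
    "Raspberry Pi 4B": 45.9,
    "Jetson Xavier NX": 5.4,
}

def fps(ms_per_frame):
    """Frames per second from milliseconds per frame."""
    return 1000.0 / ms_per_frame

baseline = times_ms["Local computer"]
for device, ms in times_ms.items():
    # Slowdown factor relative to the local computer baseline.
    print(f"{device}: {fps(ms):.1f} FPS, {ms / baseline:.1f}x the baseline time")
```

Even the slowest device (Raspberry Pi 4B at about 21.8 FPS for YOLOv5) remains above typical surveillance-video frame rates, which supports the feasibility argument for edge deployment.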
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Xiao, X.; Chen, C.; Skitmore, M.; Li, H.; Deng, Y. Exploring Edge Computing for Sustainable CV-Based Worker Detection in Construction Site Monitoring: Performance and Feasibility Analysis. Buildings 2024, 14, 2299. https://doi.org/10.3390/buildings14082299
