**2. Literature Review**

Construction projects are uniquely complex. Completing one involves an engineering lifecycle with many linked stages, from design and construction to final acceptance. As information technology advances rapidly, innovative technologies and management systems in construction management not only help maintain control over safety and health as work progresses but also support successful project completion by reducing uncertainty while keeping the goal of sustainable development in view [26].

Artificial intelligence (AI) is an engineering discipline concerned with researching and developing intelligent entities. It uses programs and big data to make computers and machines mimic human thinking and simulate the "intelligent" behaviors of a human being. Within AI, machine learning (ML) improves the performance of specific algorithms by learning from experience, i.e., from collected data [27]. Deep learning, a technology that evolved from machine learning, processes massive amounts of data with a multilayer neural network. Through linear or nonlinear transformations across multiple processing layers, it learns to extract features representative of the data automatically, replacing the time-consuming work of traditional feature engineering [28].

Recent applications of deep learning in computer vision fall into the following classes [29], as shown in Figure 1: (1) classification: assigning an image to one of the established classes according to its content; (2) semantic segmentation: labeling pixel regions by class without distinguishing individual "instances"; (3) classification + localization: labeling a single object together with its location and size (w, h); (4) object detection: labeling multiple objects with their locations and sizes; and (5) instance segmentation: labeling individual "instances", so that objects of the same class are identified separately by location and size, particularly when they overlap.

**Figure 1.** Applications in computer vision.

Most recent object detection studies use a CNN in a typical model that first recognizes the target object, then determines the area in which it lies and marks the location of highest probability with a box, as shown in Figure 2. Two fully connected layers follow the CNN, one for classification and the other for locating the matching area. Three approaches are used to generate candidate areas: sliding window, region proposal, and grid-based regression.

**Figure 2.** Locating algorithm model.
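The two-branch arrangement described above can be illustrated with a minimal sketch. It assumes the CNN has already reduced the image to a feature vector; the layer sizes, the random weights, and the helper names (`linear`, `make_layer`, `detect_heads`) are hypothetical and chosen only for illustration, not taken from any of the cited models.

```python
import math
import random


def linear(x, weights, bias):
    """Fully connected layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_layer(n_out, n_in, rng):
    """Hypothetical untrained layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def detect_heads(features, cls_layer, box_layer):
    """Apply the classification head and the localization head."""
    class_probs = softmax(linear(features, *cls_layer))
    box = linear(features, *box_layer)  # interpreted as (x, y, w, h)
    return class_probs, box


rng = random.Random(0)
features = [rng.random() for _ in range(8)]  # stand-in for CNN output
cls_layer = make_layer(3, 8, rng)            # 3 illustrative classes
box_layer = make_layer(4, 8, rng)            # 4 box coordinates
class_probs, box = detect_heads(features, cls_layer, box_layer)
print(class_probs, box)
```

Because the weights are untrained, the printed values carry no meaning; the sketch shows only that both heads read the same shared features and produce a class distribution and a box, respectively.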

1. Sliding window: an exhaustive search method. Windows of various sizes scan the image, and feature information is extracted from every window. The features are then fed to a classifier, which estimates the probability that the window contains the object to be detected. This method is the simplest but also the most time-consuming [30], as presented in Figure 3.

**Figure 3.** Sliding window algorithm.

2. Region proposal: information in the image, such as texture, edges, and color, is used to predetermine the regions of interest (ROIs) likely to contain the object, and the probability of a match is determined for these regions. High recall is maintained while filtering thousands of regions per second. Representative algorithms are R-CNN, Fast R-CNN, and Faster R-CNN [31–34], as shown in Figure 4.

**Figure 4.** Region proposal algorithms.

3. Grid-based regression: the image is divided into a grid, and regions of various sizes are selected with the grid cells as centers. Regression then determines the probability that each bounding box contains the target. This approach is suitable for real-time detection. Representative algorithms are You Only Look Once (YOLO) and the single shot multibox detector (SSD) [35], as shown in Figure 5.

**Figure 5.** Grid-based regression algorithms.
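The exhaustive search of the sliding window method (item 1 above) can be sketched as follows. This is a minimal illustration, not the procedure of [30]: the classifier is replaced by a toy scoring function that measures overlap with a known target box, and the image size, window sizes, stride, and target are all assumed values.

```python
def sliding_windows(image_w, image_h, window_sizes, stride):
    """Yield (x, y, w, h) for every window that fits inside the image."""
    for w, h in window_sizes:
        for y in range(0, image_h - h + 1, stride):
            for x in range(0, image_w - w + 1, stride):
                yield (x, y, w, h)


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def best_window(windows, score_fn):
    """Score every window and return the highest-scoring one."""
    best, best_score = None, float("-inf")
    for win in windows:
        score = score_fn(win)
        if score > best_score:
            best, best_score = win, score
    return best, best_score


# Toy stand-in for a trained classifier: overlap with an assumed target.
target = (20, 10, 30, 30)
windows = list(sliding_windows(100, 80, [(30, 30), (60, 60)], stride=10))
best, best_score = best_window(windows, lambda win: iou(win, target))
print(len(windows), best, best_score)
```

Even this small 100 × 80 image with two window sizes and a coarse stride produces 63 windows, each of which would require a classifier evaluation; finer strides and more sizes multiply that count, which is why the method is considered time-consuming.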

You Only Look Once (YOLO) uses a single CNN to predict multiple bounding boxes and their classes, realizing end-to-end target detection and identification. This design avoids the drawback of having to train the stages of object detection separately and accelerates computation dramatically [36], as indicated in Figure 6.

**Figure 6.** Structure of a YOLO model.
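To make the grid-based idea concrete, the sketch below shows one way a ground-truth box can be assigned to a grid cell and expressed as regression targets, in the style of the original YOLO parameterization (center offsets relative to the cell, width and height relative to the image). The function names and the 448 × 448 image with a 7 × 7 grid are illustrative assumptions rather than details taken from this study.

```python
def encode_box(cx, cy, w, h, image_w, image_h, grid_size):
    """Map a box (center cx, cy; size w, h; in pixels) to its grid cell
    and to regression targets normalized to that cell and the image."""
    cell_w = image_w / grid_size
    cell_h = image_h / grid_size
    # The cell containing the box center is responsible for the object.
    col = min(int(cx // cell_w), grid_size - 1)
    row = min(int(cy // cell_h), grid_size - 1)
    x_off = cx / cell_w - col  # center offset within the cell, in [0, 1)
    y_off = cy / cell_h - row
    return row, col, (x_off, y_off, w / image_w, h / image_h)


def decode_box(row, col, target, image_w, image_h, grid_size):
    """Invert encode_box: recover the pixel-space box."""
    x_off, y_off, w_n, h_n = target
    cell_w = image_w / grid_size
    cell_h = image_h / grid_size
    return ((col + x_off) * cell_w, (row + y_off) * cell_h,
            w_n * image_w, h_n * image_h)


row, col, target = encode_box(224, 100, 128, 64, 448, 448, grid_size=7)
decoded = decode_box(row, col, target, 448, 448, grid_size=7)
print(row, col, target, decoded)
```

In this example the box center (224, 100) falls in row 1, column 3 of the grid, so only that cell's outputs would be trained to regress this box; all cells are evaluated in a single forward pass, which is what makes the approach suitable for real-time detection.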

The single shot multibox detector (SSD) is based on a feed-forward CNN that generates a set of bounding boxes and per-class scores for those boxes, followed by non-maximum suppression to produce the final detections. SSD incorporates both the regression concept of YOLO and the anchor mechanism of Faster R-CNN: regression is performed on multi-scale region features at every location in the image, which retains YOLO's speed while keeping window prediction as accurate as that of Faster R-CNN [37], as shown in Figure 7.

**Figure 7.** Default boxes in the single shot multibox detector model.
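The non-maximum suppression step mentioned above can be sketched as a simple greedy procedure. This is a generic illustration for a single class, not the SSD implementation itself; the box coordinates, scores, and the 0.5 overlap threshold are assumed values chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    Visits boxes from highest to lowest score and keeps a box only if
    it does not overlap any already-kept box by more than the
    threshold. Returns the indices of the kept boxes.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two near-duplicate detections of one object plus a separate object.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
keep = nms(boxes, scores)
print(keep)
```

Here the second box overlaps the first with an IoU of about 0.82 and is suppressed, while the third box does not overlap either and is retained, so one detection per object remains.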

Liu et al. (2016) tested the speed and accuracy of different object detection methods. The test results are shown in Table 1:

**Table 1.** Object detection algorithm speed and accuracy comparison.


Fast YOLO has a faster processing speed but a poor mAP. Although Faster R-CNN has a higher accuracy rate (73.2% mAP), it is not fast in terms of the number of images it processes per second. In contrast, the single shot multibox detector (SSD) combines a high accuracy rate with a fast image detection speed [36].

Single shot multibox detector (SSD) object recognition has been used in many engineering applications. For example, Yudin and Slavioglo [38] used SSD to test how well the model identifies traffic lights, with good results. Wang et al. [39] proposed an improved SSD capable of detecting ships against a noisy background; compared with Faster R-CNN, the enhanced SSD improved detection accuracy.

Research on image recognition using deep learning has accumulated rapidly in recent years, and deep learning is widely used to let computers handle increasingly complex image recognition problems. Table 2 summarizes applied research on deep learning for image recognition in the construction industry over the past five years.

**Table 2.** Research on the application of deep learning in construction image recognition.


Source: compiled by this study.

Past studies used deep learning algorithms to recognize three postures of construction workers: standing, bending over, and squatting [20–22]. Other studies provided engineering professionals with comprehensive deep learning solutions for detecting construction vehicles [23,24]. Each of these studies addressed only a single object type, such as people, materials, or engineering vehicles; therefore, the shapes and boundary types recognized were relatively uniform. This study uses automated image recognition to simultaneously identify workers, machinery, and materials on an active construction site, to assist the site manager in making safety judgments on the location of construction equipment, safety protection measures, and material stacking, and to monitor the construction status and maintenance of the site in order to reduce environmental hazards and control progress.

As technology continues to evolve, combining big data with machine learning and deep learning can maximize the value of data. Therefore, this research collects a data set of construction site images, feeds it to an object detection system as training data, and builds an AI model that automatically identifies the personnel, materials, and equipment on the construction site. In the future, continued learning, modification, and technical improvement may reduce or avoid labor accidents on construction sites, thereby improving construction efficiency and schedule management.
