*4.2. Model Training Data Analysis*

In machine learning and deep learning, a loss function is commonly used to measure the error between predicted and actual values. The smaller the loss, the closer the predictions are to the actual values and the more accurate the model. Commonly used loss functions include mean squared error (MSE) and cross-entropy; the former is typically used for regression and the latter for classification.
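As an illustration of how a smaller loss corresponds to a closer prediction, the following minimal Python sketch computes both losses on made-up values. The function names (`mean_squared_error`, `cross_entropy`) and all numbers are hypothetical and are not taken from the study.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(true_class, probs, eps=1e-12):
    """Negative log of the probability assigned to the correct class."""
    # eps guards against log(0) when the model assigns zero probability.
    return -math.log(max(probs[true_class], eps))

# Hypothetical regression targets with a close and a poor set of predictions.
mse_good = mean_squared_error([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
mse_bad = mean_squared_error([1.0, 2.0, 3.0], [2.0, 0.5, 4.5])

# Hypothetical three-class probabilities; the correct class is index 0.
ce_good = cross_entropy(0, [0.9, 0.05, 0.05])
ce_bad = cross_entropy(0, [0.2, 0.5, 0.3])
```

In both cases the prediction that is closer to the actual value yields the smaller loss.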

Model outcomes were evaluated using two performance indicators, the F1 measure and overall accuracy. Both indicators were computed from the four entries of the confusion matrix: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). The F1 measure is the harmonic mean of precision and recall and is expressed as:

$$\text{F1 Measure} = \frac{(2 \times \text{Precision} \times \text{Recall})}{(\text{Precision} + \text{Recall})} \tag{1}$$

The overall accuracy is defined as the ratio of correctly predicted positive and negative samples to all samples, as expressed in Equation (2):

$$\text{Overall Accuracy} = \frac{(\text{TP} + \text{TN})}{(\text{TP} + \text{FP} + \text{FN} + \text{TN})} \tag{2}$$
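Equations (1) and (2) can be sketched in Python as follows. The sketch assumes the standard definitions of precision, TP / (TP + FP), and recall, TP / (TP + FN), which this section does not state explicitly. The counts in the example are hypothetical and are not the values reported in Table 3.

```python
def precision(tp, fp):
    """Share of predicted positives that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Share of actual positives that are detected: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_measure(tp, fp, fn):
    """Equation (1): harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0

def overall_accuracy(tp, tn, fp, fn):
    """Equation (2): correctly predicted samples over all samples."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0

# Hypothetical confusion-matrix counts, for illustration only.
tp, tn, fp, fn = 8, 6, 2, 4
f1 = f1_measure(tp, fp, fn)
accuracy = overall_accuracy(tp, tn, fp, fn)
print(f"F1 measure: {f1:.2%}, overall accuracy: {accuracy:.2%}")
```

The zero-denominator guards return 0.0 by convention; that choice is an assumption of this sketch rather than part of the equations.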

The single shot multibox detector (SSD) was used to identify three classes (rebar, worker, and machine) in the images of the data set. A total of 461 images were collected. Of these, 400 photos of job site activities served as machine learning samples: 80% (320 images) were used for training, 40 images (10%) served as test samples during training, and another 40 (10%) were used as verification samples. Finally, 61 photos that the model had not seen were submitted for recognition, and a 1 × 1 confusion matrix was generated, as shown in Table 3.
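The 80%/10%/10% partition of the 400 machine learning samples can be sketched as below. This is a minimal illustration that assumes a seeded random shuffle; the text does not say how the images were actually assigned, and the file names and the `split_dataset` helper are hypothetical.

```python
import random

def split_dataset(items, train_frac=0.8, test_frac=0.1, seed=0):
    """Shuffle items and split them into training, test, and verification sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    verification = shuffled[n_train + n_test:]  # the remaining samples
    return train, test, verification

# Hypothetical file names standing in for the 400 job site photos.
image_names = [f"site_{i:03d}.jpg" for i in range(400)]
train, test, verification = split_dataset(image_names)
print(len(train), len(test), len(verification))
```

The 61 unseen photos are kept outside this split, so they play no part in training.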

**Table 3.** Confusion matrix generated by single shot multibox detector model.


The two accuracy evaluation indicators were calculated from the four factors in the confusion matrix. The F1 measure was 64%, and the overall accuracy was 66%. The details are provided in Table 4.

**Table 4.** The two accuracy evaluation indicators of the single shot multibox detector model.


The process described above shows that an SSD-based job site activity image recognition system was built by combining the collected job site image data with deep learning. This system can identify and tag essential objects in a job site image, such as workers, machines, and construction materials. With more job site activity information gained from image recognition, the proposed system may help project managers make decisions regarding construction safety, job site configuration, progress control, and quality management, thus improving industrial competitiveness.
