*4.1. Establishment and Testing of Single Shot MultiBox Detector Model*

The main feature extraction program used to establish the single shot multibox detector (SSD) model was vgg.py. Features were extracted using nine computational modules, which produced feature layers of sizes 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3, and 1 × 1 (Figure 13). The input image to the first convolutional layer was 300 × 300 pixels. This layer used 3 × 3 filters with randomly initialized weights to extract 64 features, and the ReLU activation function set negative values to zero. Batch normalization was applied next to stabilize the distribution of the data. After two convolutions, a pooling layer reduced the feature map to 150 × 150 for the second set of convolutions, whose filters extracted 128 features. The remaining sets followed the same pattern, and the feature maps were ultimately reduced to 1 × 1.
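The feature layer sizes listed above can be checked with standard output-size arithmetic for convolution and pooling layers. The minimal Python sketch below assumes a VGG-16 layer list of the kind used in common SSD300 implementations; the names `VGG_CFG`, `out_size`, `apply_layer`, and `ssd_feature_map_sizes` are hypothetical, and the actual vgg.py may be organized differently. It tracks only spatial sizes, not weights, activations, or batch normalization. In this assumed configuration, the smallest layers come from extra convolution layers appended after VGG.

```python
import math

# Assumed VGG-16 configuration in the style of common SSD300 implementations:
# an integer is a 3x3 convolution (stride 1, padding 1) with that many filters,
# "M" is 2x2 max pooling with stride 2 (floor rounding),
# "C" is the same pooling with ceil rounding (so 75 -> 38 rather than 37).
VGG_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "C",
           512, 512, 512, "M", 512, 512, 512]


def out_size(n, k, s, p, ceil_mode=False):
    """Spatial output size of a convolution or pooling layer."""
    x = (n + 2 * p - k) / s
    return (math.ceil(x) if ceil_mode else math.floor(x)) + 1


def apply_layer(layer, size):
    """Return the spatial size after one entry of VGG_CFG."""
    if layer == "M":
        return out_size(size, k=2, s=2, p=0)
    if layer == "C":
        return out_size(size, k=2, s=2, p=0, ceil_mode=True)
    return out_size(size, k=3, s=1, p=1)  # 3x3 conv with padding 1 keeps the size


def ssd_feature_map_sizes(input_size=300):
    """Sizes of the six feature layers used for prediction (assumed layout)."""
    size = input_size
    for layer in VGG_CFG[:13]:      # up to and including conv4_3
        size = apply_layer(layer, size)
    sizes = [size]                  # conv4_3
    for layer in VGG_CFG[13:]:      # pool4 and conv5_1..conv5_3
        size = apply_layer(layer, size)
    # pool5 (3x3, stride 1, padding 1), a dilated 3x3 conv6 and a 1x1 conv7
    # all keep the spatial size, so conv7 matches conv5_3.
    sizes.append(size)
    # Extra layers: two stride-2 convolutions, then two unpadded 3x3 convolutions.
    for k, s, p in [(3, 2, 1), (3, 2, 1), (3, 1, 0), (3, 1, 0)]:
        size = out_size(size, k, s, p)
        sizes.append(size)
    return sizes


print(ssd_feature_map_sizes())  # [38, 19, 10, 5, 3, 1]
```

The ceil-rounded pooling step is what turns 75 × 75 into 38 × 38 rather than 37 × 37, which is why the first prediction layer is 38 × 38.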

The detect\_image function in the ssd.py program was used to predict and test the results. After an image was fed in, its height and width were obtained. The image was then converted to RGB format so that it matched the format expected by the pretrained weights and so that the box colors could be set conveniently. The letterbox\_image function was used to resize the image to the input size without distortion. The image was then normalized, and a batch\_size dimension was added, before it was fed into the model for bounding-box regression and class prediction.
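Letterbox resizing scales the image by a single factor so that it fits inside the 300 × 300 input while keeping its aspect ratio, and then pads the remaining border (commonly with gray). The following minimal sketch uses hypothetical function names (`letterbox_params`, `box_to_original`) and is not the actual `letterbox_image` code in ssd.py; it computes the scale and padding offsets and maps a predicted box back to original image coordinates.

```python
def letterbox_params(img_w, img_h, target_w=300, target_h=300):
    """Scale factor, resized size and padding offsets for a letterbox resize."""
    scale = min(target_w / img_w, target_h / img_h)  # one factor keeps the aspect ratio
    new_w, new_h = int(img_w * scale), int(img_h * scale)
    dx, dy = (target_w - new_w) // 2, (target_h - new_h) // 2  # center the image
    return scale, new_w, new_h, dx, dy


def box_to_original(box, scale, dx, dy):
    """Map an (x1, y1, x2, y2) box from letterboxed to original image coordinates."""
    x1, y1, x2, y2 = box
    return ((x1 - dx) / scale, (y1 - dy) / scale,
            (x2 - dx) / scale, (y2 - dy) / scale)


scale, new_w, new_h, dx, dy = letterbox_params(1920, 1080)
print(scale, new_w, new_h, dx, dy)  # 0.15625 300 168 0 66
print(box_to_original((0, 66, 300, 234), scale, dx, dy))  # (0.0, 0.0, 1920.0, 1075.2)
```

For a 1920 × 1080 frame, the image is scaled to 300 × 168 and padded by 66 pixels above and below. The mapped box height falls slightly short of 1080 because the resized height 168.75 is truncated to 168.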

In the training program train.py, the data set classes were imported through classes\_path, which defined the three image classes to be identified: rebar, worker, and machine. The pretrained weights were specified in weight\_path, and the input shape was set to 300 × 300. The prior box sizes were defined as anchors\_size = [30, 60, 111, 162, 213, 264, 315]. Training consisted of two stages, "freeze" and "unfreeze." In the freeze stage, the backbone feature extraction network was frozen and did not change, so only the rest of the network was fine-tuned and less memory was needed; this stage ran for 50 epochs with a batch size of 16. In the unfreeze stage, the backbone feature extraction network was updated as well, which required considerably more memory; this stage was set to 100 epochs with a batch size of 8.
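The seven values in anchors\_size serve six feature layers. In the usual SSD scheme, layer k uses `anchors_size[k]` as its smallest square box, the geometric mean of `anchors_size[k]` and `anchors_size[k + 1]` as a larger square box, and additional boxes stretched by the chosen aspect ratios. The sketch below assumes the aspect ratios and cell-centred box positions of the standard SSD300 setup; these choices and the names `box_shapes` and `generate_priors` are assumptions for illustration, not taken from train.py (some implementations use fixed step sizes or clip boxes to the image instead).

```python
import math

FEATURE_MAPS = [38, 19, 10, 5, 3, 1]              # feature layer sizes
ANCHORS_SIZE = [30, 60, 111, 162, 213, 264, 315]  # anchors_size from the text
# Assumed extra aspect ratios per layer (the usual SSD300 choice).
ASPECT_RATIOS = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]


def box_shapes(k):
    """(width, height) in pixels of the prior boxes on feature layer k."""
    min_s, max_s = ANCHORS_SIZE[k], ANCHORS_SIZE[k + 1]
    big = math.sqrt(min_s * max_s)
    shapes = [(min_s, min_s), (big, big)]  # two square boxes
    for ar in ASPECT_RATIOS[k]:
        r = math.sqrt(ar)
        # A wide box and a tall box for each aspect ratio.
        shapes += [(min_s * r, min_s / r), (min_s / r, min_s * r)]
    return shapes


def generate_priors(img_size=300):
    """All prior boxes as (cx, cy, w, h), normalized by the image size."""
    priors = []
    for k, f in enumerate(FEATURE_MAPS):
        step = img_size / f  # pixels covered by one feature cell
        for i in range(f):
            for j in range(f):
                cx, cy = (j + 0.5) * step, (i + 0.5) * step  # cell center
                for w, h in box_shapes(k):
                    priors.append((cx / img_size, cy / img_size,
                                   w / img_size, h / img_size))
    return priors


print(len(generate_priors()))                   # 8732
print([len(box_shapes(k)) for k in range(6)])   # [4, 6, 6, 6, 4, 4]
```

Under these assumptions the six layers contribute 4, 6, 6, 6, 4, and 4 boxes per cell, giving 8,732 prior boxes in total.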

The prediction program predict.py was used to test the trained single shot multibox detector (SSD) model; before detection, a detection mode must be selected in the program. The available modes were single images, pre-recorded video, and images captured directly from a camera. In this study, single images were used to test the prediction model.
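The mode selection in predict.py amounts to a dispatch on how input images arrive. The minimal sketch below uses hypothetical names (`Mode`, `run_prediction`, `fake_detect`); it is not the program's actual interface, and the stand-in detector simply returns a string where the trained model would return boxes.

```python
from enum import Enum
from typing import Any, Callable, Iterator


class Mode(Enum):
    """Hypothetical names for the three detection modes described in the text."""
    IMAGE = "image"    # a single picture
    VIDEO = "video"    # pre-recorded footage
    CAMERA = "camera"  # frames captured directly from a camera


def run_prediction(mode: Mode, source: Any,
                   detect: Callable[[Any], Any]) -> Iterator[Any]:
    """Yield detection results for the chosen mode.

    For IMAGE, `source` is one image; for VIDEO and CAMERA it is an iterable
    of frames (for example, decoded video frames or a camera stream).
    """
    if mode is Mode.IMAGE:
        yield detect(source)
    elif mode in (Mode.VIDEO, Mode.CAMERA):
        for frame in source:
            yield detect(frame)
    else:
        raise ValueError(f"unsupported mode: {mode!r}")


def fake_detect(image):
    # Stand-in for the trained SSD model's detection call.
    return f"boxes for {image}"


print(list(run_prediction(Mode.IMAGE, "site.jpg", fake_detect)))
# ['boxes for site.jpg']
```

Because the function is a generator, a camera source can be an unbounded stream of frames, and results are produced one frame at a time.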
