**4. Results and Discussion**

Training was performed on a server equipped with a high-performance GPU (NVIDIA GeForce RTX 2080 Super), 64 GB of DDR4 memory, and an Intel i9-10900K CPU, using the deep learning framework PyTorch 1.14.

### *4.1. Test Comparison Results of Sub-Modules of the Improved Algorithm*

In order to fully verify the rationality of the various improvements to the YOLOv4 network model in this paper, a step-by-step verification experiment is performed. Objective evaluation indexes are used to quantitatively measure the accuracy and speed of the detection algorithm. The commonly used objective metrics for evaluating object detection performance are mAP, the number of parameters, and the calculation amount (FLOPs).
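For reference, the parameter and calculation counts reported in the tables below are typically derived per layer from the layer shapes. A minimal sketch for a standard convolution layer is given here; the channel sizes and feature-map resolution are illustrative, not the paper's actual network shapes.

```python
# Sketch: how parameter and computation (MAC) counts are commonly
# derived for a standard k x k convolution layer. The shapes below
# are illustrative examples, not taken from the paper's network.

def conv2d_params(c_in, c_out, k, bias=True):
    """Parameter count of a standard k x k convolution."""
    return c_out * (c_in * k * k + (1 if bias else 0))

def conv2d_macs(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulate operations for one forward pass."""
    return c_out * h_out * w_out * c_in * k * k

# Example: a 3 x 3 convolution mapping 64 -> 128 channels on a 52 x 52 map
p = conv2d_params(64, 128, 3)        # 73,856 parameters
m = conv2d_macs(64, 128, 3, 52, 52)  # ~0.2 GMacs
print(p, m)
```

Summing these per-layer counts over the whole network gives totals in the same units (M parameters, GMacs) as the tables below.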

First, the main framework of the model is improved; the specific results are compared in Table 3. It can be seen that when only the main framework of the model is improved, the mAP reaches 96.43% of that of the original YOLOv4 model, while the model parameters and calculation amount are reduced by 83.89% and 97.12%, respectively.


**Table 3.** Improved performance comparison of main model framework.

Based on the depthwise separable convolution structure, the ordinary convolutions of the SPP structure and the PANet module are replaced with depthwise separable convolutions. Table 4 shows the performance comparison of the improved main model framework, PANet and SPP structure. It can be seen that after this replacement, the parameters are reduced from 10.31 M to 8.22 M, and the calculation amount is reduced from 1.84 GMacs to 0.69 GMacs; both the model parameters and the calculation amount are greatly reduced.

**Table 4.** Performance comparison of improved main model framework, PANet and SPP structure.

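The parameter saving from this replacement can be sketched by comparing the two counting formulas: a standard convolution needs one k × k filter per (input channel, output channel) pair, whereas a depthwise separable convolution factorizes it into a per-channel k × k depthwise step plus a 1 × 1 pointwise projection. The channel sizes below are illustrative, not the paper's actual SPP/PANet shapes.

```python
# Sketch of the parameter saving from replacing a standard convolution
# with a depthwise separable one (depthwise k x k + pointwise 1 x 1).
# Channel sizes are illustrative, not the paper's actual layer shapes.

def standard_conv_params(c_in, c_out, k):
    return c_out * c_in * k * k          # one k x k filter per in/out channel pair

def separable_conv_params(c_in, c_out, k):
    depthwise = c_in * k * k             # one k x k filter per input channel
    pointwise = c_out * c_in             # 1 x 1 projection to c_out channels
    return depthwise + pointwise

std = standard_conv_params(256, 256, 3)   # 589,824
sep = separable_conv_params(256, 256, 3)  #  67,840
print(f"reduction: {1 - sep / std:.1%}")  # ~88.5% fewer parameters
```

For a 3 × 3 kernel, the factorized form needs roughly 1/9 of the parameters of the standard convolution once the channel count is large, which is consistent with the sizable reductions reported in Table 4.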

To improve the forward inference speed of the model, the parameters of the convolutional layers and the batch normalization (BN) layers are merged. Frames per second (FPS) is used to assess the forward inference speed, and the processing speed is compared with that of the original YOLOv4 network model. The specific FPS values are shown in Table 5. It can be seen that, compared with the original YOLOv4 network model, the FPS of the improved network model is greatly increased. After the parameters of the convolution and BN layers are merged, the processing speed for a 256 × 256 pixel image increases from 101 FPS to 112 FPS.

**Table 5.** Performance comparison of merge convolution layer and BN layer.

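The merge is possible because, at inference time, BN is a fixed affine map that can be folded into the preceding convolution's weight and bias, removing one layer from the forward pass. A minimal single-channel sketch, with illustrative values, is:

```python
import math

# Sketch of conv + BN folding for one output channel. At inference,
# BN computes y = gamma * (x - mean) / sqrt(var + eps) + beta, so its
# parameters can be folded into the convolution's weight and bias.
# All numeric values below are illustrative.

def fuse_channel(w, b, gamma, beta, mean, var, eps=1e-5):
    scale = gamma / math.sqrt(var + eps)
    w_fused = [wi * scale for wi in w]   # scale every conv weight
    b_fused = (b - mean) * scale + beta  # fold mean and beta into the bias
    return w_fused, b_fused

# Verify: conv followed by BN equals the fused conv on a sample input
w, b = [0.5, -1.0, 2.0], 0.1
gamma, beta, mean, var = 1.2, -0.3, 0.05, 0.8
x = [1.0, 2.0, 3.0]

conv = sum(wi * xi for wi, xi in zip(w, x)) + b
bn = gamma * (conv - mean) / math.sqrt(var + 1e-5) + beta

wf, bf = fuse_channel(w, b, gamma, beta, mean, var)
fused = sum(wi * xi for wi, xi in zip(wf, x)) + bf
print(abs(bn - fused) < 1e-9)  # True
```

Since the fused layer produces identical outputs, accuracy is unchanged and only the per-image latency improves, which matches the FPS gain in Table 5.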

To balance the proportions of the foreground and background data samples, a modulation factor is added to the loss function; the specific mAP, model parameters and calculation amount of the model with the improved loss function are shown in Table 6. It can be seen that after adding the modulation factor, the mAP of the algorithm on the test database increases from 91.89% to 94.09%.

Through a complete comparison between the original YOLOv4 model and the improved YOLOv4 model, the tested mAP is shown in Figure 8. It can be seen that when the self-made database is used for detection, the mAP of the original YOLOv4 model is 95.50% and that of the improved model is 94.09%, so the detection performance decreases only slightly. It can be concluded from Table 6 that the mAP of the improved YOLOv4 model is 98.52% of that of the original model, while the model parameters and calculation amount are reduced by 87.43% and 99.00%, respectively.


**Table 6.** Comparison of performance of different algorithms using RSOD dataset for object detection.

**Figure 8.** mAP of the original YOLOv4 model and the improved model. (**a**) mAP of the original YOLOv4 model; (**b**) mAP of the improved model.

In this study, the model modules are improved to reduce the depth and complexity of the overall network structure. Meanwhile, depthwise separable convolution is used to replace the ordinary convolutions in the SPP and PANet modules, reducing the model parameters. The convolutional and batch normalization layers are merged to increase the model inference speed. In addition, drawing on the focal loss function, the loss function of the object detection network is improved to balance the proportions of the crack and background samples. The detection performance of the improved model remains satisfactory in terms of mAP, while the model size and calculation amount are greatly reduced.

### *4.2. Comparative Results of Frontier Algorithm Tests*

In this paper, the self-made database is used for training and testing, and frontier network models in the field of object detection are used for comparison. The comparison results are shown in Table 7. It can be concluded that the improved model exhibits almost no loss in mAP compared with the high-performance algorithms, while the model size and calculation amount are greatly reduced. Compared with the faster lightweight network models, the model sizes are close, but the FLOPs are lower and the mAP is higher than those of the classic lightweight network models. To demonstrate the detection performance of the improved model more intuitively, the images shown in Figure 9 were randomly selected from the database for testing.


**Table 7.** Comparison of object detection performance of different algorithms.

**Figure 9.** Detection results of the concrete surface cracks.
