4.3. Data
The experimental dataset in this study consisted of 3000 crack images captured by the UAV, which were divided into training and test sets at a 9:1 ratio.
In the pre-processing stage, part of the training set was augmented to improve the generalization ability of the model. The augmentation transformations, random brightness adjustment, random horizontal flipping, and random vertical flipping, were chosen so that the augmented images remained close to the tunnel crack images as originally collected. The transformation results are shown in
Figure 7.
Before being input into the network, each image was scaled and standardized. The scaled images had a width and height of 512 pixels. The per-channel RGB means used for standardization were 123.675, 116.28, and 103.53, and the corresponding standard deviations were 58.395, 57.12, and 57.375.
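The augmentation and standardization steps above can be sketched as follows; the brightness range [0.8, 1.2] and the 0.5 flip probabilities are illustrative assumptions, since the text does not state them:

```python
import numpy as np

# Per-channel normalization constants quoted in the text (RGB order).
MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)

def preprocess(img, train=True, rng=None):
    """Augment (training only) and standardize one HxWx3 uint8 image.

    Scaling to 512x512 is assumed to happen beforehand.
    """
    rng = rng if rng is not None else np.random.default_rng()
    x = img.astype(np.float32)
    if train:
        if rng.random() < 0.5:                 # random horizontal flip
            x = x[:, ::-1, :]
        if rng.random() < 0.5:                 # random vertical flip
            x = x[::-1, :, :]
        # Random brightness: scale all pixels by one random factor.
        x = np.clip(x * rng.uniform(0.8, 1.2), 0.0, 255.0)
    return (x - MEAN) / STD                    # per-channel standardization
```

At test time the same standardization is applied with the augmentations disabled (`train=False`), so the train and test distributions match.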
CenterNet locates each target by predicting the target centre point, the centre-point offset (bias), and the target size. The labels for an image therefore comprise a Gaussian heat map of target centre points, the centre-point offsets, and the target sizes, each represented by a tensor of the same size as the network output.
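These three label tensors can be built as in the following minimal single-class sketch; the fixed Gaussian sigma and the output stride of 4 are simplifying assumptions (CenterNet derives the Gaussian radius from the box size):

```python
import numpy as np

def make_targets(boxes, out_h, out_w, stride=4, sigma=2.0):
    """Build CenterNet-style targets for one image and one class.

    boxes: list of (x1, y1, x2, y2) in input-image pixels.
    """
    heat = np.zeros((out_h, out_w), dtype=np.float32)       # centre heat map
    offset = np.zeros((out_h, out_w, 2), dtype=np.float32)  # sub-pixel bias
    size = np.zeros((out_h, out_w, 2), dtype=np.float32)    # box width/height
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2 / stride, (y1 + y2) / 2 / stride
        ix, iy = int(cx), int(cy)                 # integer centre cell
        g = np.exp(-((xs - ix) ** 2 + (ys - iy) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, g)                # splat Gaussian, keep max
        offset[iy, ix] = (cx - ix, cy - iy)       # centre-point bias
        size[iy, ix] = ((x2 - x1) / stride, (y2 - y1) / stride)
    return heat, offset, size
```

The offset channel recovers the sub-pixel centre position lost by the integer downsampling, which is why it is supervised only at the centre cells.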
4.4. Training Process and Experimental Results
To ensure fair and valid comparisons, the training parameters were kept identical across all experiments in this study. The initial learning rate was 0.0001, adjusted by cosine annealing down to a minimum learning rate of 0.00001. The batch size was set to eight during training, and the model was trained for 300 epochs with the SGD optimizer.
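The cosine annealing schedule described above follows the standard closed form; a minimal sketch (whether the rate is updated per epoch or per step is an assumption here):

```python
import math

def cosine_annealing_lr(epoch, total_epochs=300, lr_max=1e-4, lr_min=1e-5):
    """Learning rate at a given epoch under cosine annealing.

    Matches torch.optim.lr_scheduler.CosineAnnealingLR over a single,
    non-restarting cycle (T_max=total_epochs, eta_min=lr_min).
    """
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1.0 + math.cos(math.pi * epoch / total_epochs))
```

The rate starts at 0.0001, passes through the midpoint 0.000055 at epoch 150, and ends at the 0.00001 floor at epoch 300.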
The training experiments were conducted in five groups: original CenterNet with the backbone network of ResNet18, CenterNet with the channel space attention mechanism, CenterNet with the feature selection module, CenterNet with target size loss improvement, and CenterNet with the above three improvements.
Table 2 compares the performance of CenterNet with the addition of the CBAM and feature selection modules, including FLOPs, FPS, and video memory usage.
During training, because features are harder to extract from some samples than from others, overlapping and missed detections occurred in part of the data, as shown in
Figure 8. To address this, the optimized model in this study strengthens feature extraction; after the feature extraction module was added, these errors decreased markedly and detection accuracy improved effectively.
The test environment of the controlled experiments was identical to the training environment, except that the batch size was set to one during testing. The ablation experiments are summarized in
Table 3. From the ablation experiment, the following results were obtained:
After the CBAM module was added, the model size increased by 1.4 MB, FPS decreased by 106.6, video memory increased by 2 MB, FLOPs remained unchanged, and AP increased by 0.072 compared with the original model.
After adding the feature selection module, the model size increased by 0.8 MB, FPS decreased by 46.3, video memory increased by 58 MB, FLOPs increased by 3.29, and AP increased by 0.101 compared with the original model.
After applying the IOU loss optimization to the original model, the model size increased by 0.5 MB, FPS decreased by 123.7, video memory increased by 31 MB, FLOPs increased by 2.2, and AP increased by 0.021 compared with the original model.
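The IOU optimization of the target-size loss can be sketched as follows. Since the predicted and target boxes share the predicted centre point, the IoU reduces to a function of widths and heights alone; this sketch assumes a plain IoU term (the exact variant used, e.g. GIoU or DIoU, is not specified here):

```python
import numpy as np

def size_iou_loss(pred_wh, target_wh, eps=1e-7):
    """IoU-style loss on predicted box sizes (hedged sketch).

    pred_wh, target_wh: arrays of shape (..., 2) holding widths and
    heights. Because both boxes are anchored at the same predicted
    centre, intersection and union depend only on the sizes.
    """
    pw, ph = pred_wh[..., 0], pred_wh[..., 1]
    tw, th = target_wh[..., 0], target_wh[..., 1]
    inter = np.minimum(pw, tw) * np.minimum(ph, th)
    union = pw * ph + tw * th - inter
    return 1.0 - inter / (union + eps)   # 0 when the sizes match exactly
```

Unlike a plain L1 term, this loss is scale-invariant: a 2-pixel size error is penalized more heavily on a small crack than on a large one.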
After the feature selection module was added, the target-size loss of the optimized model decreased faster than that of the original CenterNet, because the module adaptively selects low-level features (such as target texture and edge information) during downsampling and adds them to the feature map during upsampling. The target size could thus be learned more quickly.
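The described selection of low-level features can be sketched as a gated skip connection; the global-average-pool plus single fully connected layer shown here is an assumed SE-style form, not necessarily the exact module used:

```python
import numpy as np

def feature_select(low_feat, up_feat, fc_w, fc_b):
    """Gated skip connection sketch for the feature selection module.

    low_feat: (C, H, W) low-level map from the downsampling path
    (texture, edges); up_feat: (C, H, W) map at the same resolution in
    the upsampling path; fc_w (C, C) and fc_b (C,) are learned weights.
    A sigmoid gate scores each low-level channel, and the re-weighted
    channels are added to the upsampled feature map.
    """
    ctx = low_feat.mean(axis=(1, 2))                   # global average pool
    gate = 1.0 / (1.0 + np.exp(-(fc_w @ ctx + fc_b)))  # per-channel scores
    return up_feat + gate[:, None, None] * low_feat
```

Channels whose gate saturates near one pass edge and texture detail through to the upsampling path, while near-zero gates suppress uninformative channels.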
The target-size loss curves for the original CenterNet and for CenterNet with the feature selection module are shown in
Figure 9.
The feature selection module's adaptive use of low-level features is also evident in the actual detection results. As shown in
Figure 10, after adding the feature selection module, the optimized model predicts crack sizes more accurately because information such as crack edges is incorporated.
Both the CBAM and feature selection modules reduce the inference speed of the network, the CBAM module most significantly: because a CBAM module is inserted into every ResBlock, its attention computations accumulate across the backbone and lower the overall FPS, whereas the feature selection module causes a smaller FPS drop. The impact of the two modules on video memory usage was relatively small.
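The per-block cost can be seen in this minimal rendering of CBAM's two attention stages; the 1x1 spatial kernel and the illustrative weight shapes are simplifications of the usual 7x7-convolution formulation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cbam(x, w1, w2, conv_w):
    """Minimal CBAM sketch applied to one (C, H, W) feature map.

    w1 (C//r, C) and w2 (C, C//r) form the shared channel-attention
    MLP; conv_w (2,) stands in for the spatial-attention convolution
    (a 1x1 kernel here, instead of the usual 7x7, for brevity).
    """
    # Channel attention: avg- and max-pooled descriptors share one MLP.
    avg, mx = x.mean(axis=(1, 2)), x.max(axis=(1, 2))
    ca = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0)
                 + w2 @ np.maximum(w1 @ mx, 0.0))
    x = x * ca[:, None, None]
    # Spatial attention: pool across channels, then the 1x1 "conv".
    sp = np.stack([x.mean(axis=0), x.max(axis=0)])           # (2, H, W)
    sa = sigmoid((conv_w[:, None, None] * sp).sum(axis=0))   # (H, W)
    return x * sa[None, :, :]
```

Because one such block runs inside every ResBlock of the backbone, the extra pooling and attention passes accumulate, which accounts for the FPS drop reported in Table 3.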
The downsampling module compresses the feature information of the network, which reduces the computation of subsequent layers and increases the inference speed of the whole network. The input to the upper layers is enhanced by the feature extraction module, and the upsampling stage uses fewer convolutional layers to improve running speed. The input and output dimensions of each layer of the optimized network are listed in
Table 4.
To demonstrate the improvement in the performance of the model before and after optimization more intuitively, five groups of training processes were randomly selected for comparison, as shown in
Figure 11. Dark blue represents the data processing accuracy of the original CenterNet model, and yellow represents the improvement in accuracy brought about by the optimized CenterNet-CBAM-FS-IOU model.
After optimization, the overall processing accuracy of CenterNet improved to a certain extent, and it could effectively identify cracks in construction concrete with a shorter training time. The actual detection effect is shown in
Figure 12, where the red boxes mark the detected cracks and the numbers label each detection.
As a classic anchor-free model in computer vision, CenterNet has been widely applied and optimized across many disciplines.
Table 5 compares the AP, recall, and F1 score of the CenterNet model before and after optimization. The comparison clearly shows the improved detection performance of the optimization scheme proposed in this paper.
In order to demonstrate the effectiveness of the proposed method more clearly, it was compared with common crack detection algorithms under the same experimental conditions. The processing results of different crack detection algorithms are shown in
Table 6, and the actual detection process is shown in
Figure 13. The experimental results show that the proposed method outperforms the other methods; Mask-RCNN comes closest to the proposed method, with nearly comparable results. The comparison confirms that the proposed method surpasses the other detection algorithms in terms of AP, recall, and F1 score.