1. Introduction
The sudden loss of symmetry stability of the bridge centerline affects the bridge health’s structure and then causes accidents [
1]. As the main components commonly used in bridges, bolts can effectively reduce distortions and fatigue stresses and avoid the failure of bridge health structures [
2]. The missing and loose bolts caused by long-term exposure to harsh environments affects the overall structural health of the bridge and further cause accidents [
3]. Bolt detection usually adopts manual detection. This method has a large amount of detection, low efficiency, and large errors and requires high energy and the experience of inspection personnel [
4]. In this way, it not only consumes a lot of manpower and material resources but also has the possibility of missing bolt detection. Therefore, to maintain bridge structure health, it is necessary to develop a real-time accurate detection method for bridge bolts.
Structure health monitoring based on machine learning shows a good ability to detect defects and damage in engineering structures [
5]. As a branch of machine learning, computer vision has become an important way and development direction of structure health monitoring [
6]. Some studies have proposed that loose bolt detection based on computer vision is an important research direction in structure health monitoring [
7], and bolt loosening causes repeated load and vibration of steel structures in bridges. Computer vision object detection is used to locate bolts and realize real-time damage monitoring of bolts, which plays an important role in ensuring the structure health and stability of bridges, vehicles, and dams [
8]. The connection status of bolts has an important impact on the safety and reliability of the whole bridge structure’s health [
9], so it is necessary to strengthen the application of intelligent technology to detect bolt status in bridge structure health monitoring.
Object detection based on computer vision is a process in which the computer extracts the color, geometry, corner, texture, and other attribute features of the target objects, quickly locates the relevant information in the image, extracts and analyzes the relevant information, and finally understands the target object. Existing object detection algorithms are divided into traditional object detection methods and object detection methods based on deep learning. Traditional object detection methods obtain the features of detected objects in images by manually designing feature descriptors. This method encodes the features and uses machine learning classifiers for detection [
10,
11,
12]. The selection of different descriptors for detected object features affects the detection accuracy. The generalization ability of traditional object detection methods needs to be improved. Object detection methods based on deep learning are mainly divided into regression/classification-based frameworks and region proposal-based frameworks [
13]. The region proposal-based framework usually includes a multi-stage process such as region proposal, region classification, and region position adjustment. The region proposal-based framework has the advantages of high detection accuracy and shared calculation parameters, but it has the disadvantages of long training time and slow detection speed. The regression/classification-based framework directly detects the class probabilities and region coordinates of the objects. The framework can detect the object in one stage. Regression/classification-based framework is fast in detection and good at learning the features of objects. This framework mainly includes YOLO series [
14,
15,
16,
17] and SSD [
18]. Regression/classification-based framework is more suitable for real-time detection, but the accuracy of small-scale object detection needs to be improved.
YOLO series object detection network is widely used in the object detection of bridge structure health monitoring because of its advantages of high precision and fast speed [
19]. Deng J et al. [
20] detected concrete cracks through the YOLOv2 object detector, effectively assessing the robustness of bridge structures. At the same time, the proposed YOLOv2 detector was compared with Faster-RCNN, and the YOLOv2 detector showed better results in terms of both accuracy and detection speed. Teng S et al. [
21] used the improved YOLOv3 to detect surface defects (cracks and exposed rebar) in bridge structure health monitoring. The results showed that the improved YOLOv3 performed best in terms of detection accuracy and speed. Han Q et al. [
22] proposed to use the YOLOV3 for panoramic crack location monitoring in view of the structural health monitoring of common steel structure connections, and YOLO V3 proved good performance in both running speed and small-scale object detection. At present, bridge structural health monitoring is mostly based on object detection of cracks in concrete structures. Hou R et al. [
23] innovatively proposed the use of YOLO networks to locate trucks and detect trucks in real time to trigger the work of a bridge structural health monitoring system. Bolts are related to the nature of the bridge structure and are one of the key points of bridge structure health monitoring [
24]. The bridge structure health monitoring system triggered by real-time bolt positioning detection needs further research.
Bolts on the real scene are small-scale objects in computer vision object detection. The object detection combined with the geometry and texture features of the bolt have become the current research direction. In terms of feature description and classifier training, Ramana L et al. [
25] used histograms of oriented gradient (HOG) features and two classifiers to conduct classification training for small-scale bolts. Wang C et al. [
26] converted color images into gray images to improve detection accuracy, and they positioned bolts by convolutional neural network digit recognition and detected bolt rotation angles by Hough transform line detection. Dou Y et al. [
27] used the template matching method and geometric structure constraints to locate railway bolts. Park J et al. [
28] combined Canny edge detector, Hough transform (HT), and circular Hough transform (CHT) to detect nuts. In the region proposal-based framework, Huynh TC et al. [
29] firstly converted the color image to the gray level and then generated candidate areas based on a regional convolutional neural network (RCNN) to detect and locate bolts, which enhanced the accuracy of bolt detection. For the real-time positioning detection of bolts, Sun Y et al. [
30] marked bolts and trained YOLOv5 to locate and detect the marking circle on bolts. Although the detection accuracy is very high, manual labeling of each bolt is required. Zhao K et al. [
31] proposed an improved YOLOv3 algorithm which realizes high-precision detection of bolts through data enhancement, improved clustering algorithms, and fusion features.
Existing object detection methods based on computer vision realize bolt detection through the extraction or constraint of bolt geometric features. Although the above methods achieve high accuracy, most of them still need manual intervention. Bolt detection shows good results in regression/classification-based frameworks, which can adapt to current intelligent real-time detection requirements, but they still lack a high-precision positioning method for bridge bolts.
In this paper, the regression/classification-based framework of YOLOv5 is used to detect bolts on a bridge. The main difficulty of this study is that bolts, as a small-scale objects, make it difficult to extract features and to distribute anchors. It results in insufficient detection accuracy and waste of computing power. The contributions of this study are as follows:
(1) According to the calibration anchor boxes of bridge bolt images, we propose a method based on the K-means++ clustering algorithm to optimize the initial anchor box size and proportion parameters. This method solves the initial anchor boxes clustering problem in new datasets.
(2) To solve the problem that inefficient distribution of anchors and insufficient extraction of classification features, we use the hypercolumn technique to combine the low-level global features of the trunk with the high-level local features of three different scales and use the stairstep upsample structure to generate a single-scale output. This method enhances the detection accuracy and speeds up the detection.
3. Experiment
3.1. Dataset
A dataset of 1494 bolt images was constructed, partly from the public dataset NPU-BOLT [
33] (337 images) and partly from the web collection (1157 images). In order to better enhance the robustness of network detection, the bolt background in the dataset includes whiteboards and bridges. It also contains multi-angle and multi-resolution images. Then we used the original Mosaic data enhancement method in YOLOv5. It randomly clips and scales the four images and then randomly arranges and stitches them together to form an image. This data enhancement method not only enriches the data set but also adds small-scale objects to improve the training speed of the network. We selected 1045 (70%) images as the training set, 299 (20%) images as the val set, and 150 (10%) images as the test set. In this paper, LabelImg image annotation tool was used to label the position of bolts in the image, and the corresponding txt file was generated for training. When constructing the samples, the length of the captured images is resized to 640 pixels, and the width is adjusted accordingly to maintain the original proportion of the image.
3.2. Details
The experimental environment parameters in this paper are as follows: Windows 11 operating system, AMD R7-5800H central processing unit (CPU), Nvidia GeForce RTX 3060LP graphics processor (GPU), pytorch1.10.0+cu113 algorithm framework, and the programming language is Python3.6.
The training parameters in this paper are as follows: the number of training steps is 100 epochs; the initial learning rate is 0.01; the optimizer select SGD to update parameters; the batch size is 8; and the image size is 640 × 640.
Average precision (
AP), precision (
P), and frames per second (
FPS) are selected as evaluation indexes to evaluate the model.
FPS is used to evaluate the speed of object detection, which is the number of images detected by the network per second. The shorter the object detection time of an image is, the faster the network detection speed is. That is, the higher the
FPS value, the better the object detection network can meet the requirements of intelligent industry real-time detection. The specific calculation formulas for
P and
AP are as follows:
where
TP is the result of correctly identifying the object sample,
FP is the result of incorrectly identifying the non-object sample as the object,
FN is the result of incorrectly identifying the non-object sample as the object, and
TN is the result of correctly identifying the non-object sample as the non-object.
P is the accuracy rate,
R is the recall rate,
R measures the integrity of the test results, and
P measures the accuracy of the test.
AP is the area under the
P-
R curve.
AP is an important index to evaluate an object detection network, and the
AP value is proportional to the classification performance of the object detection network.
3.3. Experimental Results
Among the four versions of YOLOv5, the detection accuracy of YOLOv5x is the highest, and the detection speed of YOLOv5s is the fastest. Therefore, we applied the improved method in the experiment of YOLOv5x and YOLOv5s models. The experimental results are shown in
Table 1.
As can be seen from
Table 1, AP and P of the improved YOLOv5x network reach 86.3% and 87.3%, respectively, and the detection speed is also improved. The two metrics of AP and P score the highest in the improved YOLOv5x, and the FPS score the highest in the improved YOLOv5s. Compared with the original YOLOv5x, AP increased by 6.5% and P increased by 5.9%, which proved the effectiveness of the proposed method in improving the detection accuracy. Compared with the improved YOLOv5s, the improved YOLOv5x has higher detection accuracy and lower detection speed, and the FPS of improved YOLOv5x is 26.46, which meets the requirements for real-time detection. Considering the actual requirements, the improved YOLOv5x in this paper is superior to other models in the case of comprehensive consideration of detection accuracy and speed.
In order to verify the accuracy and real-time performance of our method,
Figure 7 shows the variation in precision during training epochs. A sufficient number of epochs ensures the convergence of the whole training process and enables the module to achieve the best performance under specific parameter configuration. As can be seen from
Figure 7, the methods achieve convergence, and the improved YOLOv5 converges after about 30 epochs, while YOLOv5 requires about 60 epochs to reach convergence. Moreover, P of the improved YOLOv5 is always greater than that of the original YOLOv5. Therefore, the new method proposed by us can effectively improve the precision and speed of bolt detection.
In order to more intuitively verify the validity of our method, we selected several representative images and compared the detection results with the original YOLOv5.
Figure 8 shows the visual comparison of detection results. The first row (a) is the detection result of YOLOv5x, and the second row (b) is the detection result of the improved YOLOv5x. Compared with sample 1 in the first column, the improved YOLOv5x can effectively avoid false detection. Compared with sample 2 in the second column, the improved YOLOv5x can improve the accuracy of object detection. Compared with sample 3 in the third column, the improved YOLOv5x can effectively avoid false detection and improve detection accuracy in multi-target detection and can detect some distant minimal objects. It can be seen that the K-means++ clustering algorithm and HC with stairsteps can improve the effect of bolt detection. In conclusion, the improved YOLOv5 can improve the positioning detection accuracy of bolts and reduce the misjudgment of small-scale objects, which can verify the effectiveness of the improved measures in this paper.
3.4. Ablation Experiments
In order to evaluate the influence of the two improved methods on the results, ablation comparison experiments were conducted on the models. The experimental results are shown in
Table 2.
It can be seen from
Table 2 that the addition of K-means++ clustering algorithm can effectively improve the detection accuracy for bolts, and the addition of the HC with stairsteps can not only effectively improve the detection accuracy but also improve the detection speed.
(1) By comparing the experimental results of YOLOv5x and YOLOv5x + Kmeans++, it can be seen that after the improvement of the anchor parameter by the Kmeans++ clustering algorithm, AP and P detected by the model increased by 3.7% and 2.7%, respectively, with no significant change in detection speed. The results showed that the anchor parameter obtained by the Kmeans++ clustering algorithm could improve the accuracy of bolt detection. Therefore, high AP and P values can be guaranteed by selecting appropriate anchor initialization parameters.
(2) By comparing the experimental results of YOLOv5x and YOLOv5x + HC with stairstep, we can see that after the addition of HC with stairstep structures, AP and P detected by the algorithm are improved by 2.9% and 2.1%, respectively, and the detection speed is improved. The results show that the clustering initialization of anchor and the change of network frame parameters can effectively realize the detection of small-scale objects and improve the detection accuracy. HC with stairsteps not only completes the fusion of high-level local features and low-level global features and solves the problem of unbalanced distribution of small-scale object anchor boxes of bolts in the output scale but also avoids the waste of calculation power in the three scale prediction branches.
(3) Compared with the experimental results of YOLOv5x and improved YOLOv5x, it can be seen that after combining the two improved measures, AP and P detected by the algorithm are improved by 5.9% and 6.5%, respectively, and the detection rate is improved. The results show that the combination of the above two improvement measures can improve the overall detection accuracy of the model, and the detection speed is also improved compared with the original model.
4. Conclusions
In this paper, an improved YOLOv5x model is proposed for bolt positioning detection, which is a regression/classification-based framework. Firstly, improved YOLOv5x adopts K-means++ clustering algorithm to learn the proportion and distribution of anchor boxes according to the new dataset. Secondly, the improved YOLOv5x fused the low-level global features of the trunk and the high-level local features of three different scales with an HC technique. In this way, it can solve the problem of the inefficient distribution of anchor boxes and enhance the feature extraction of small-scale objects. Finally, we built a multi-scenario bolts dataset through public datasets and network data acquisition. The improved YOLOv5x was evaluated in the comparison experiment. The results show that our improved YOLOv5x is effective. The final detection AP is 86.3%, P is 87.3%, and an FPS equaling 26.46 was reached, which meets real-time detection requirements.
At the same time, this paper also has some shortcomings. To ensure the bridge structural health monitoring, bolt positioning detection is the first step, and then identifying bolt loosening and wear is the next direction of our research. In order to monitor the health of bridge structures in real time, we will research the relative position judgment of bolts and nuts, calculations of the bolt rotation angle, bolt material judgments, etc.