Article

Spatial Adaptive Improvement Detection Network for Corroded Bolt Detection in Tunnels

by Zhiwei Guo 1, Xianfeng Cheng 2, Quanmin Xie 3,4,* and Hui Zhou 3,4

1 Beijing Municipal Development Freeway Construction & Administration Co., Ltd., Beijing 100071, China
2 Beijing MTR Corporation, Co., Ltd., Beijing 100068, China
3 State Key Laboratory of Precision Blasting, Jianghan University, Wuhan 430056, China
4 Hubei Key Laboratory of Blasting Engineering, Jianghan University, Wuhan 430056, China
* Author to whom correspondence should be addressed.
Buildings 2024, 14(8), 2560; https://doi.org/10.3390/buildings14082560
Submission received: 28 June 2024 / Revised: 23 July 2024 / Accepted: 8 August 2024 / Published: 20 August 2024

Abstract: The detection of corroded bolts is crucial for tunnel safety. However, the specific directionality and complex texture of corroded bolt defects make current YOLO series models unable to identify them accurately. This study proposes a spatial adaptive improvement detection network (SAIDN), which integrates a spatial adaptive improvement module (SAIM) that adaptively emphasizes important features and reduces interference, enhancing detection accuracy. The SAIM performs a detailed analysis and transformation of features in the spatial and channel dimensions, enhancing the model’s ability to recognize critical defect information. The use of depthwise separable convolutions and adaptive feature reweighting strategies improves detail processing capabilities and computational efficiency. Experimental results show that SAIDN significantly outperforms existing models in detection accuracy, achieving 94.4% accuracy and 98.5% recall, surpassing advanced models such as YOLOv9 and Cascade R-CNN. These findings highlight the potential of SAIDN in enhancing subway tunnels’ safety and maintenance efficiency.

1. Introduction

Due to its high carrying capacity and efficiency, rail transport has become the main mode of land transportation [1,2]. The lining structure of subway tunnels is fixed by bolts, which support the tunnel structure to ensure the safe operation of the subway, as shown in Figure 1. Bolts play a crucial role in the structure of subway tunnels, as they not only fix the lining structure but also support the stability and safety of the entire tunnel [3,4,5]. However, these bolts are exposed to the air for long periods and subjected to humidity, air pollutants, and other environmental factors, making them susceptible to corrosion [6,7,8,9,10]. The hazards of corroded bolts mainly manifest in the following aspects: (1) Reduced load-bearing capacity: Corrosion weakens the mechanical properties of bolts, significantly reducing their load-bearing capacity. Over time, corrosion decreases bolts’ material strength and toughness, rendering them unable to support the weight and pressure of the tunnel structure effectively. This reduction in load-bearing capacity directly threatens the tunnel’s structural integrity, increasing the risk of deformation or collapse. (2) Compromised structural stability: Corroded bolts not only affect the performance of individual bolts but also cause a chain reaction that impacts the stability of the entire tunnel structure. Bolt corrosion may lead to loosening of the lining structure, causing displacement and deformation of the tunnel walls. In extreme cases, severely corroded bolts may break, resulting in partial or complete tunnel structure failure and serious safety incidents. (3) Accelerated corrosion of surrounding metal components: Corroded bolts create a more aggressive electrochemical environment, accelerating the corrosion process of surrounding metal components. This phenomenon, known as “galvanic corrosion”, reduces the overall durability of the tunnel structure, further increasing maintenance complexity and costs. 
(4) Difficulties in maintenance and replacement: The generation and accumulation of corrosion products can lead to blocked bolt holes and jammed threads, complicating the replacement and maintenance process. The buildup of rust and other debris makes removing and replacing corroded bolts extremely difficult, often requiring special tools and more manual intervention, thereby increasing maintenance costs and time. (5) Impact on operational efficiency: Frequent inspection and maintenance activities consume substantial resources and disrupt the subway system’s normal operation, causing inconvenience to passengers and operational delays. The economic burden of these maintenance activities is considerable, diverting resources that could have been used for system expansion and upgrades.
In addition, maintaining subway tunnels is extremely complex and challenging. As the tunnel structure gradually ages over long-term operation, ensuring its stability and safety becomes increasingly important [11,12,13,14]. However, tunnel maintenance faces numerous difficulties, increasing the complexity and cost of maintenance work and posing serious threats to the safe operation of tunnels. Specifically, the difficulties in maintaining tunnels are mainly reflected in the following aspects: (1) High costs and low efficiency: Currently, tunnel maintenance mainly relies on manual visual inspection, which is time-consuming and labor-intensive and is sensitive to the training level and work experience of the inspection personnel as well as to the environmental conditions of the inspection. Typically, inspectors work during nonoperational hours (such as night and early morning), when insufficient light and visual fatigue greatly reduce the accuracy and efficiency of inspections. Under these conditions, a well-trained maintenance team of 10 to 15 people can only inspect two to three kilometers in about three hours, resulting in high costs and low efficiency. (2) Limitations of subjective judgment: Manual inspection is prone to subjective judgment limitations, which may lead to many bolts being misjudged as normal or corroded, affecting the accuracy and timeliness of maintenance decisions. Misjudgment may result in failing to detect and resolve corrosion issues promptly, increasing potential safety risks, or in unnecessary replacements and maintenance, which waste resources and money. (3) Environmental impact: Environmental conditions inside the tunnel, such as humidity, temperature, and air quality, can affect inspection results. The high humidity and pollutants inside the tunnel accelerate bolt corrosion, increasing the difficulty of inspection and maintenance. 
Moreover, the tunnel environment typically has poor lighting and ventilation conditions, which add complexity and challenges to inspection work and pose serious health risks to inspection personnel, such as respiratory problems, visual fatigue, and dehydration or heatstroke due to high humidity and temperature conditions. (4) Hidden and complex nature of corrosion issues: Bolt corrosion is often hidden and complex, especially in critical areas of the tunnel, making it difficult to detect and evaluate. Corrosion usually occurs inside bolts or in small cracks on the surface, which are difficult to observe directly with the naked eye. Additionally, critical areas of the tunnel, such as deep structural connection points, are challenging to access and evaluate due to their hidden location. (5) Safety risks for maintenance personnel: During tunnel maintenance, inspectors face a high-risk work environment, including high humidity, low visibility, and safety hazards associated with nighttime work. Furthermore, the complexity of the tunnel structure poses various potential dangers during the inspection and replacement of corroded bolts, such as falls, collisions, and other accidents. These factors increase the complexity and danger of tunnel maintenance work, posing significant threats to the safety of maintenance personnel.
Given the above difficulties in maintaining tunnels and the potential hazards of corroded bolts, effectively detecting corroded bolts is crucial for preventing structural failures and ensuring the safety of the subway system. Regular and accurate corrosion detection can identify potential safety hazards early, reducing the likelihood of significant safety incidents and protecting the lives of passengers and staff. Moreover, early detection and intervention can prevent the spread of corrosion issues, reducing the frequency and cost of replacements and repairs. Effective detection methods can improve maintenance efficiency and lower labor and material costs.
In recent years, many researchers have focused on applying object detection technology to detect corroded bolts [11,12,13,14,15,16,17,18,19]. Although YOLOv8 [20] and YOLOv9 [21] demonstrate excellent general object detection capabilities, they show some deficiencies in addressing specific issues related to corrosion defects. First, corroded bolt defects exhibit subtle visual differences and often have specific directionality, posing high demands on the spatial sensitivity and directional recognition capabilities of detection algorithms. As advanced object detection frameworks, YOLOv8 and YOLOv9 have significant advantages in feature extraction and rapid detection, but their standard architectures are not specifically designed to capture the detailed features and directional information in images of corrosion defects. Specifically, the YOLO series models lack a natural mechanism to encode the directional information of corroded bolts because they primarily focus on general feature extraction and object classification without targeted optimization to handle the complex spatial structures and directional characteristics of corroded bolts. Detecting these defects requires the model to recognize small defect features and understand their spatial arrangement and direction to accurately locate and identify defect types. Second, the uniqueness of corroded bolt images, including the contrast variations caused by different light absorption rates of materials under various weather conditions and the texture complexity due to different types and degrees of corrosion during use, further increases the difficulty of detection. These factors require the detection model to possess high spatial and directional sensitivity, advanced feature adaptation, and background noise suppression capabilities. 
Therefore, detecting corroded bolts requires highly sensitive and accurate spatial localization capabilities and the differentiation of defect-specific textures and shapes, where YOLO series models may have limitations in fine feature representation and distinguishing specific defect types. In summary, while YOLO performs excellently in general object detection tasks, its performance is limited in the specific scenario of corroded bolt images due to insufficient directional information encoding and fine-grained feature processing capabilities.
This paper proposes a spatial adaptive improvement detection network (SAIDN) for corroded bolt image defect detection. SAIDN has significant advantages due to its optimization for corroded bolt image analysis: it can adaptively emphasize important features, reduce interference from irrelevant information, and enhance detection accuracy. Specifically, this paper proposes a spatial adaptive improvement module (SAIM), which performs a detailed analysis and transformation of features in both the spatial and channel dimensions. Its strength lies in its comprehensive structure, comprising the spatial feature conversion layer (SFCL), direction improvement layer (DIL), and weighted allocation layer (WAL), which collectively enhance the model’s ability to recognize critical defect information in specific spatial structures. This modular approach can effectively extract and identify defects by finely analyzing input features, improving the model’s sensitivity to spatial structures and its ability to recognize defects in complex backgrounds. Additionally, incorporating trigonometric functions further enhances the model’s natural ability to encode directional information and handle periodic patterns, making it particularly suitable for identifying corroded bolts with directional or periodic characteristics. Furthermore, by strategically integrating depthwise separable convolutions and adaptive feature reweighting, the SAIM emphasizes the most critical features in the image, thereby improving the model’s accuracy in defect detection. Moreover, the SAIM can seamlessly integrate with existing deep learning architectures, making it a powerful and flexible tool to enhance corroded bolt image detection and recognition capabilities. 
Finally, the design of the depthwise separable convolutions in the SAIM enhances the model’s ability to handle details and optimizes computational efficiency, enabling SAIDN to maintain high-performance detection while achieving high operational efficiency. In summary, by integrating the SAIM, SAIDN has significant advantages in the recognition and localization of corroded bolts and performs excellently in processing efficiency and model robustness, providing an effective and precise solution for corroded bolt image analysis.
Our contributions are summarized as follows:
  • We propose SAIDN, which is specifically designed for corroded bolt defect detection. SAIDN integrates the SAIM to adaptively emphasize important directional features and reduce interference, significantly improving detection accuracy.
  • We propose the SAIM, which performs detailed analysis and transformation of features in spatial and channel dimensions, optimizing the recognition capability of specific spatial structures in corroded bolts. Its structure includes SFCL, DIL, and WAL, which collectively enhance the model’s ability to recognize critical defect information and identify defects in complex backgrounds.
  • We conduct comprehensive experiments demonstrating that SAIDN significantly surpasses current advanced object detection algorithms.

2. Related Work

2.1. Corroded Bolt Detection Based on One-Dimensional Signal

In recent years, corroded bolt detection technology has played an important role in infrastructure maintenance. This paper reviews several important detection methods, including acoustic emission, ultrasonic technology, piezoelectric sensor technology, and image recognition technology, and analyzes their respective advantages, disadvantages, and application scenarios in detail.
The acoustic emission method identifies corrosion and other defects by detecting sound waves generated by stress release within the material. This method performs excellently in detecting corrosion on bolt heads. Wang et al. [6] proposed a bolt head corrosion detection technology under external vibration based on an entropy-enhanced acoustic emission method, improving the sensitivity and accuracy of corrosion detection. Wang et al. [7] studied the identification of multiple bolt head corrosion and proposed an acoustic-ultrasonic method based on linear and nonlinear shape features, improving the accuracy of corrosion detection. Piezoelectric sensor technology utilizes the piezoelectric effect of piezoelectric ceramic materials to monitor stress changes at the bolt connection through sensors, thereby detecting corrosion and damage. Cui et al. [8] studied the use of active sensing methods with piezoelectric sensors to monitor corrosion damage at bolt connections, achieving early detection and assessment of corrosion. The advantages of this method include high sensitivity and real-time monitoring capabilities, but it also has drawbacks such as complex installation and high costs. Ultrasonic technology uses the propagation characteristics of high-frequency sound waves in the material to identify corrosion and other defects by detecting changes in reflected and transmitted waves. Lee et al. [9] conducted a blind test for bolt corrosion detection using ultrasonic technology, which can effectively detect internal corrosion defects in bolts. However, its detection accuracy is affected by the surface condition of the material and the operational technique. Wu et al. [10] studied the application of ultrasonic technology in detecting stress corrosion cracking in cable bolts, verifying the effectiveness of ultrasonic technology in detecting bolt corrosion in high-stress environments. 
In summary, acoustic emission has high detection sensitivity and is suitable for corrosion detection in complex environments; piezoelectric sensor technology is known for its high sensitivity and real-time monitoring capabilities but comes with high costs; and ultrasonic technology excels in detecting internal defects but is significantly affected by operational technique and material surface conditions.
Compared with the above technologies, the advantages of image recognition in corroded bolt detection are mainly reflected in its high precision, high efficiency, and automation capabilities. Firstly, image recognition technology can quickly process and analyze large volumes of image data, greatly improving detection efficiency. Secondly, through advanced algorithms and models, image recognition can accurately identify and locate corrosion areas, reducing subjective errors in manual judgment and improving the reliability and consistency of detection. Additionally, image recognition technology can operate stably in harsh and complex environments, with strong adaptability, effectively meeting detection needs under different lighting and background conditions. Finally, image recognition technology can automate the detection process, reducing labor costs and safety risks for maintenance personnel, providing strong technical support for the daily maintenance and management of infrastructure.

2.2. Corroded Bolt Detection Based on Two-Dimensional Images

Current deep learning-based visual bolt corrosion detection technology has developed rapidly [1,5,11,12,13,14,15,16,17,18,19]. Ta et al. [15] used regional convolutional neural network (RCNN)-based deep learning and a Hough line transform (HLT) algorithm to monitor corroded and loosened bolts in steel structures. Cha et al. [16] proposed an autonomous structural visual inspection method using an RCNN model for real-time damage detection, covering concrete cracks, steel corrosion, bolt corrosion, and steel delamination. This method significantly improves the model’s ability to identify various types of damage in complex backgrounds by introducing a region extraction network and convolutional neural networks. Ta et al. [17] used Mask RCNN to monitor and identify the degree of bolt corrosion in well-lit laboratory steel structures. Their innovation lies in achieving the precise localization of corrosion areas and quantitative assessment of the degree of corrosion through instance segmentation technology, significantly improving detection accuracy. Suh et al. [18] employed a Faster RCNN-based model to detect and locate various types of damage, including bolt corrosion. This method uses a region proposal network (RPN) to generate candidate regions and subsequent classification and regression steps to accurately identify various types of damage, including bolt corrosion. However, the aforementioned methods are two-stage models with slower inference speeds during application deployment, failing to meet the requirements for real-time detection. Additionally, in practice, it is unnecessary to sacrifice speed and cost to distinguish the target pixels of corroded bolts precisely.
Another branch of single-stage object detection algorithms, YOLO, accelerates the training and detection process. YOLO converts the object detection task into a regression problem, directly predicting the bounding boxes and categories of objects in the image, thereby significantly improving detection speed. Yang et al. [22] introduced a bolt-loosening detection method by combining the manual torque method with various versions of YOLO, and the experiments showed that the method achieved good results and strong practical value in smartphone-based scenarios. Tan et al. [1] proposed an ensemble learning method that combines the improved multiscale Retinex with color restoration (IMSRCR) and YOLO to detect corroded bolts in linings based on actual tunnel image data. This method enhances image contrast and detail through multiscale Retinex and, combined with YOLO’s efficient detection capabilities, achieves accurate detection of corroded bolts in tunnel images. Subsequently, Tan et al. [11] proposed DMDSNet, which achieves two parallel tasks of bolt detection and pixel-level rust segmentation. DMDSNet optimizes both detection and segmentation tasks through a multitask learning framework, improving the overall performance of the model.
Despite significant progress in these object detection algorithms, there are still shortcomings in addressing the directionality of corroded bolts. Current methods mainly focus on the detection and classification of corrosion areas, neglecting the directional characteristics of bolt corrosion. Bolt corrosion often has directionality, which is important in actual maintenance and repair work. Therefore, future research can further explore how to incorporate directional features into object detection algorithms to improve the accuracy and practicality of corroded bolt detection.

2.3. Object Detection

In recent years, deep learning has become an important tool for solving big data problems due to its powerful ability to process complex patterns [20,21,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39]. It has been widely used in various fields and has shown its potential to solve many pressing problems. The two-stage detection network based on candidate regions has greatly improved the detection speed and accuracy through a series of improvements such as Fast R-CNN [23], Faster R-CNN [24], Cascade R-CNN [25], Dynamic RCNN [26], Grid RCNN [27], Sparse RCNN [28], FCOS [29], and Tridentnet [31]. These advancements have leveraged innovative techniques like region proposal networks, cascade structures, dynamic adjustments, grid-based predictions, sparsity constraints, and fully convolutional object detection approaches to enhance the robustness and precision of detection models. Meanwhile, the YOLO series one-stage detection algorithm achieves rapid and accurate image detection in an end-to-end manner, continuously optimizing from YOLOv1 to YOLOv9 [20,21,32,33,34,35,36,37,38,39] by introducing techniques such as batch normalization, residual structures, multiscale feature fusion, and reparameterization strategies, significantly improving detection performance and training efficiency. YOLOv8 [20] adopts new convolution and data augmentation techniques, while YOLOv9 [21] introduces programmable gradient information and auxiliary reversible branch design, further enhancing the model’s feature retention capability and training efficiency, optimizing detection performance. These enhancements in the YOLO series have enabled high accuracy in real-time detection, making it a preferred choice for many practical applications.
In recent years, Transformer-based object detection methods have also garnered attention. Among them, Deformable DETR [40] improves the network training convergence speed and addresses the slow convergence issue of the original DETR by using a new attention module to process image feature maps. DINO [41] reduces duplicate predictions and improves prediction accuracy through dynamic anchor boxes and contrastive denoising training. DDQ DETR [42] selects unique queries for one-to-one assignments based on dense queries, combining the advantages of traditional and end-to-end detectors. However, Transformer models usually require a large amount of training data and high computational resources, which is a significant limiting factor given the data volume in defect detection and the parameter constraints in industrial applications. Therefore, for corroded bolt detection, this paper builds on an efficient single-stage architecture, aiming for high-precision detection, complex-background processing, and small-target defect recognition.
Although deep learning object detection algorithms have achieved great success, they still face challenges and certain unavoidable limitations in industrial scenarios of corroded bolt detection. Firstly, when dealing with various corroded bolts, deep learning algorithms rely on existing backbone feature extraction networks designed based on common datasets, which have relatively weak directional awareness and difficulty flexibly responding to multiple defect types. These networks often lack the specialized capability to distinguish subtle differences and orientations specific to corrosion patterns on bolts. Secondly, the variability in lighting conditions, corrosion severity, and environmental factors pose additional challenges, as the models trained on standard datasets may not generalize well to these diverse real-world conditions. In conclusion, while deep learning has revolutionized the field of object detection and made significant strides in applications such as corroded bolt detection, ongoing research and innovation are required to address these challenges. Future work could focus on developing more specialized feature extraction networks that incorporate directional awareness, enhancing model robustness to varying conditions and optimizing algorithms for deployment on resource-constrained devices. By addressing these challenges, it will be possible to further improve the accuracy, efficiency, and applicability of deep learning-based corroded bolt detection systems in industrial settings.

3. Network Structure Introduction

3.1. Spatial Adaptive Improvement Detection Network

The corroded bolt defect detection model backbone network follows the design of the YOLOv8 series, as shown in Figure 2 and Figure 3. The backbone network consists of CBS and C2f module structures, with an SPPF structure added at the end to obtain receptive fields of different sizes. The SPPF can handle objects of different scales, thereby enhancing the model’s ability to adapt to images of different resolutions. The C2f module, as the core of the network, adjusts the number of channels through a 1 × 1 CBS module on one branch and performs deep feature extraction through a 1 × 1 CBS module followed by multiple 3 × 3 CBS modules on the other, finally integrating these features to enhance the network’s learning capacity and robustness. C2f employs expansion, shuffle, and merge-cardinality operations to strengthen the network’s learning ability without disrupting the original gradient path, improving parameter utilization by increasing the number of stacked computational blocks.
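To make the C2f channel flow concrete, the following minimal NumPy sketch tracks shapes only: the split into two halves, the stacked bottlenecks (reduced here to a placeholder nonlinearity), and the concatenation of all intermediate outputs. The function name, block count, and omission of the 1 × 1 input/output projections are illustrative assumptions, not the authors’ implementation.

```python
import numpy as np

def c2f_forward(x, n_blocks=3):
    """Shape-level sketch of a C2f block: split the channels into two
    halves, run n stacked bottlenecks on one half (a placeholder tanh
    stands in for each 3x3 bottleneck), and concatenate every
    intermediate output; the 1x1 input/output projections are omitted."""
    B, C, H, W = x.shape
    a, b = x[:, : C // 2], x[:, C // 2:]
    outs = [a, b]
    for _ in range(n_blocks):
        b = np.tanh(b)  # stand-in for a 3x3 bottleneck with a skip connection
        outs.append(b)
    return np.concatenate(outs, axis=1)  # (B, (2 + n_blocks) * C // 2, H, W)

x = np.zeros((1, 8, 4, 4))
print(c2f_forward(x, n_blocks=3).shape)
```

Concatenating every intermediate branch is what gives C2f its richer gradient paths compared with C3, at the cost of a wider concatenation before the final fusion.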
The requirements for recognizing subtle features of corroded bolt images are extremely high in high-precision corroded bolt detection. This demands that the detection model not only capture deep features of the image but also have a more refined perception of the spatial distribution and directional sensitivity of these features. Based on this need, SAIDN is specifically designed to improve the detection accuracy of defects in corroded bolt images. Specifically, this network adds the SAIM to the last two layers of the C2f module in the backbone network to further enhance detection performance. Through its efficient feature learning mechanism, the C2f module can effectively extract key information from bolt images, but its performance still needs improvement when dealing with extremely complex or minute defects. This is because detecting these defects relies not only on the deep extraction of features but also on the model’s sensitivity to spatial details of features and the accurate recognition of defect direction. Therefore, by introducing the SAIM after the C2f module, the network’s understanding of corroded bolts’ spatial and directional characteristics can be further enhanced. The SAIM focuses on improving the model’s spatial sensitivity to corroded bolts, enabling it to more accurately capture the location, shape, and size of defects while effectively identifying the direction of defects. This enhancement is based on extracting deep features and, more importantly, analyzing the spatial relationship and directionality of these features, significantly improving the model’s ability to recognize complex defects. By integrating the SAIM into the end of the backbone network, this paper achieves a significant optimization of the corroded bolt detection model. This design improves the model’s detection accuracy for corroded bolts and enhances its adaptability to different types and scales of defects. The overall SAIDN architecture is built on this design.

3.2. Spatial Adaptive Improvement Module

In deep learning, especially in image processing and pattern recognition tasks, effectively extracting and utilizing feature information has become key to improving model performance. Particularly when processing images with complex backgrounds and subtle feature changes, such as corroded bolt defect images, existing single-stage networks overlook the fact that much corroded bolt information is lost when input data undergo layer-by-layer feature extraction and spatial transformation. To this end, this paper proposes a novel SAIM to enhance feature representation in deep convolutional networks by utilizing spatial location information. SAIM effectively improves the performance of corroded bolt target detection models when handling complex image tasks. The core idea of this module is to perform fine-grained analysis and transformation of input features in the spatial and channel directions to capture key defect information in bolt images and enhance the model’s sensitivity to specific spatial structures.
As shown in Figure 4, the design concept of the SAIM is based on the detailed analysis and transformation of input features in both spatial and channel dimensions, aiming to capture key defect information in bolt images and improve the model’s perception capability for specific spatial structures. The module mainly includes three key components: SFCL, DIL, and WAL. These parts work together to enhance the model’s ability to extract and recognize corroded bolt information. Given the input feature map X ∈ R^(B×C×H×W), where B, C, H, and W represent the batch size, number of channels, height, and width, respectively, the specific process of the SAIM is as follows:
(1)
SFCL: First, the input feature map X undergoes three independent convolutional layers (denoted as fc_δ, fc_w, and fc_c), which transform the input features in the height, width, and channel directions to extract spatial information from different dimensions. Depending on the model’s mode, either a 1 × 1 convolution or a 3 × 3 depthwise separable convolution is then chosen to further extract spatial features.
X_δ = fc_δ(X),  X_w = fc_w(X),  X_c = fc_c(X)
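The three SFCL projections can be sketched as 1 × 1 convolutions, which act as per-pixel linear maps over the channel axis. The following NumPy sketch uses random weights purely to illustrate shapes; `conv1x1` and the weight names are hypothetical stand-ins for fc_δ, fc_w, and fc_c.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution as a linear map over the channel axis.
    x: (B, C_in, H, W), w: (C_out, C_in) -> (B, C_out, H, W)."""
    return np.einsum('oc,bchw->bohw', w, x)

rng = np.random.default_rng(0)
B, C, H, W = 2, 8, 16, 16
X = rng.standard_normal((B, C, H, W))

# Three independent projections standing in for fc_delta, fc_w, fc_c.
w_delta, w_w, w_c = (rng.standard_normal((C, C)) for _ in range(3))
X_delta = conv1x1(X, w_delta)
X_w = conv1x1(X, w_w)
X_c = conv1x1(X, w_c)
print(X_delta.shape, X_w.shape, X_c.shape)
```

Because a 1 × 1 convolution touches no spatial neighborhood, each branch preserves the H × W grid while learning its own channel mixing for the height, width, and channel pathways.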
(2)
DIL: Then, directional enhancement of the original features is performed in the height and width directions. The introduction of cosine and sine functions provides a natural way to encode directional information and increases the ability to process periodic patterns through their inherent periodic properties. This is particularly useful for understanding and recognizing inspection objects with significant directionality or periodic characteristics. Furthermore, this method improves the model’s sensitivity to subtle changes in images, enabling the model to detect and recognize targets in complex backgrounds more accurately.
$X_h = \left[ X_h \times \cos\theta_h(X), \; X_h \times \sin\theta_h(X) \right], \quad X_w = \left[ X_w \times \cos\theta_w(X), \; X_w \times \sin\theta_w(X) \right]$
where $\theta_h$ and $\theta_w$ are transformations learned during training that extract spatial directional information from the input features $X$.
The output of the spatial feature transformation is then processed by depthwise separable convolutions (denoted $tfc_h$ and $tfc_w$). By applying 7 × 1 and 1 × 7 convolution kernels, respectively, this step effectively captures the spatial dependencies of the image in the vertical and horizontal directions.
$H = tfc_h(X_h), \quad W = tfc_w(X_w)$
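A minimal sketch of the DIL step, under our assumptions: the learnable transforms $\theta_h$ and $\theta_w$ are modeled as 1 × 1 convolutions, the cos/sin pair is concatenated along the channel axis, and grouped 7 × 1 / 1 × 7 convolutions stand in for the depthwise separable ones:

```python
import torch
import torch.nn as nn

class DIL(nn.Module):
    """Directional enhancement: learnable angle maps modulate each branch
    with cos/sin (doubling its channels), then grouped 7x1 / 1x7 convs
    capture vertical and horizontal dependencies."""
    def __init__(self, c: int):
        super().__init__()
        self.theta_h = nn.Conv2d(c, c, 1)  # learnable transform theta_h(X)
        self.theta_w = nn.Conv2d(c, c, 1)  # learnable transform theta_w(X)
        self.tfc_h = nn.Conv2d(2 * c, c, kernel_size=(7, 1), padding=(3, 0), groups=c)
        self.tfc_w = nn.Conv2d(2 * c, c, kernel_size=(1, 7), padding=(0, 3), groups=c)

    def forward(self, x, xh, xw):
        th, tw = self.theta_h(x), self.theta_w(x)
        xh = torch.cat([xh * torch.cos(th), xh * torch.sin(th)], dim=1)
        xw = torch.cat([xw * torch.cos(tw), xw * torch.sin(tw)], dim=1)
        return self.tfc_h(xh), self.tfc_w(xw)  # H, W feature maps

x = torch.randn(1, 8, 32, 32)
H, W = DIL(8)(x, x, x)  # reusing x for all branches, just for shape checking
print(H.shape, W.shape)
```

The cos/sin modulation doubles the channel count, which the grouped convolutions map back to the original width.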
(3)
WAL: The output of the DIL is fused with the output of the feature transformation in the channel direction. The fused features are reweighted through adaptive average pooling and an MLP network; the weights output by the MLP are used to adaptively reweight the different features, emphasizing the most important features in the image.
$C = X_c, \quad A = \mathrm{softmax}\left( \mathrm{MLP}\left( \mathrm{avg\_pool}(H + W + C) \right) \right)$
where $A$ is the feature reweighting coefficient, $\mathrm{MLP}$ denotes the multilayer perceptron network, and $\mathrm{softmax}$ normalizes the weights.
Finally, the reweighted features are mapped back to the original feature space through a convolution layer, and dropout is applied for regularization to prevent overfitting.
$Y = \mathrm{proj}\left( H \times A_0 + W \times A_1 + C \times A_2 \right)$
where Y is the output feature map of the SAIM, and proj represents the convolution layer. The design of WAL ensures that the SAIM can effectively integrate with existing deep learning architectures.
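The WAL step might be sketched as follows, reading $A_0$, $A_1$, $A_2$ as one softmax weight per branch; the MLP hidden width and the dropout rate are our assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WAL(nn.Module):
    """Fuse the three branch outputs, pool globally, and let a small MLP
    emit softmax weights A0, A1, A2 that reweight the branches before a
    final 1x1 projection with dropout."""
    def __init__(self, ch: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(ch, ch // 2), nn.ReLU(), nn.Linear(ch // 2, 3))
        self.proj = nn.Conv2d(ch, ch, 1)
        self.drop = nn.Dropout(0.1)

    def forward(self, h, w, c):
        a = F.adaptive_avg_pool2d(h + w + c, 1).flatten(1)  # B x C pooled features
        a = torch.softmax(self.mlp(a), dim=1)               # B x 3 branch weights
        a = a[:, :, None, None, None]                       # broadcast over C, H, W
        y = a[:, 0] * h + a[:, 1] * w + a[:, 2] * c
        return self.drop(self.proj(y))

h = w = c = torch.randn(2, 16, 8, 8)
y = WAL(16)(h, w, c)
print(y.shape)  # torch.Size([2, 16, 8, 8])
```

Because the output shape matches the input branches, this block drops into an existing backbone without adapter layers, which is what enables the easy integration noted above.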

3.3. Backbone Structure

The backbone network adopts the C2f module to replace the traditional C3 module [36], enhancing the efficiency of feature extraction and gradient flow. This design introduces more skip connections and an additional split operation, as shown in Figure 5. The C2f module improves information flow and gradient propagation through its enhanced skip connections, helping to mitigate gradient vanishing. The split operation can serve functional differentiation, allowing the network to learn different feature representations on different branches. This design promotes better feature integration; although it may increase hardware dependency, it significantly improves performance in most cases. The number of C2f modules in the backbone is then adjusted from "3-6-9-3" to "3-6-6-3" to balance model complexity and performance and to optimize the efficiency of the feature extraction stage. In addition, the neck is simplified by removing two convolutional connection layers, reducing the parameter count and computational burden and shortening the feature transmission path, thereby improving the model's operational efficiency.
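The split-and-concatenate behavior of C2f can be illustrated with a simplified sketch (the real Ultralytics block adds batch normalization, SiLU activations, and a width multiplier, all omitted here for brevity):

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Minimal residual bottleneck used inside C2f."""
    def __init__(self, c: int):
        super().__init__()
        self.cv1 = nn.Conv2d(c, c, 3, padding=1)
        self.cv2 = nn.Conv2d(c, c, 3, padding=1)

    def forward(self, x):
        return x + self.cv2(torch.relu(self.cv1(x)))

class C2f(nn.Module):
    """Split the features in two, run a chain of bottlenecks on one half,
    and concatenate every intermediate output (the extra skip connections)
    before a final 1x1 fusion conv."""
    def __init__(self, c_in: int, c_out: int, n: int = 2):
        super().__init__()
        self.c = c_out // 2
        self.cv1 = nn.Conv2d(c_in, 2 * self.c, 1)
        self.m = nn.ModuleList(Bottleneck(self.c) for _ in range(n))
        self.cv2 = nn.Conv2d((2 + n) * self.c, c_out, 1)

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))  # split into two branches
        for m in self.m:
            y.append(m(y[-1]))                  # keep every bottleneck output
        return self.cv2(torch.cat(y, dim=1))    # dense concat -> richer gradients

out = C2f(32, 64, n=2)(torch.randn(1, 32, 16, 16))
print(out.shape)  # torch.Size([1, 64, 16, 16])
```

The dense concatenation is what gives C2f its extra gradient paths relative to C3, at the cost of a wider fusion convolution.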

3.4. Loss Function

The SAIDN model uses a more flexible anchor-free approach, simplifying the model’s prediction mechanism, as shown in Figure 6. It can be seen that the previous objectness branch is no longer present, leaving only decoupled classification and regression branches. The regression branch uses the integral representation method proposed in distribution focal loss (DFL). The SAIDN model introduces the task-aligned assigner (TAA) strategy and DFL to improve the quality of positive samples and optimize the learning efficiency of difficult-to-classify samples, making model training more effective.
The matching strategy of the TAA can be summarized as selecting positive samples according to a score that jointly weights the classification score and the regression (IoU) score:
$t = s^{\alpha} \times u^{\beta}$
where $s$ is the predicted score for the labeled category, $u$ is the IoU between the predicted box and the ground-truth (GT) box, and $\alpha$ and $\beta$ are weighting exponents. The product of these two terms measures the degree of task alignment.
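Numerically, the alignment metric rewards predictions that are good at both tasks; with the TOOD-style defaults $\alpha = 1$ and $\beta = 6$ (values assumed here, not given in the text), a well-localized box outranks a confidently classified but poorly localized one:

```python
def taa_alignment(s: float, u: float, alpha: float = 1.0, beta: float = 6.0) -> float:
    """Task-aligned score t = s**alpha * u**beta (alpha/beta values assumed)."""
    return (s ** alpha) * (u ** beta)

# high classification score but mediocre IoU...
print(taa_alignment(0.9, 0.5))  # ≈ 0.0141
# ...scores far below a well-localized, moderately confident prediction
print(taa_alignment(0.6, 0.9))  # ≈ 0.3189
```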
DFL introduces the concept of probability distributions. Let the true class label $y$ follow a Bernoulli distribution $\mathrm{Ber}(\mu)$ and the predicted probability $p$ follow a beta distribution $\mathrm{Beta}(\alpha, \beta)$:
$D_{KL}\left( \mathrm{Ber}(\mu) \,\|\, \mathrm{Beta}(\alpha, \beta) \right) = \mu \ln \frac{\mu}{m} + (1 - \mu) \ln \frac{1 - \mu}{1 - m}$
where $m = \frac{\alpha}{\alpha + \beta}$ is the mean of the beta distribution. Introducing the hyperparameter $\gamma$ from focal loss, we generalize the KL divergence to obtain the final form of DFL:
$\mathrm{DFL}(\mu, \alpha, \beta; \gamma) = \mu \ln \frac{\mu^{\gamma}}{m^{\gamma}} + (1 - \mu) \ln \frac{(1 - \mu)^{\gamma}}{(1 - m)^{\gamma}}$
where $\mu$ is the parameter of the Bernoulli distribution $\mathrm{Ber}(\mu)$ followed by the true class label $y$, representing the probability that the true label is 1; $\alpha$ and $\beta$ are the two parameters of the beta distribution $\mathrm{Beta}(\alpha, \beta)$ followed by the predicted probability $p$, controlling the shape of the distribution; $m = \frac{\alpha}{\alpha + \beta}$ is the mean of the beta distribution; and $\gamma$ is the focusing parameter used to adjust the weight of hard examples.
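The generalized-KL form can be checked numerically (the default $\gamma$ below is our assumption). Two sanity checks follow directly from the formula: the loss vanishes when the beta mean $m$ equals $\mu$, and since $\ln(x^{\gamma}/y^{\gamma}) = \gamma \ln(x/y)$, DFL is exactly $\gamma$ times the plain KL divergence:

```python
import math

def dfl(mu: float, alpha: float, beta: float, gamma: float = 2.0) -> float:
    """Generalized-KL form of DFL for mu in (0, 1); gamma default assumed."""
    m = alpha / (alpha + beta)
    return (mu * math.log(mu ** gamma / m ** gamma)
            + (1 - mu) * math.log((1 - mu) ** gamma / (1 - m) ** gamma))

# the loss vanishes when the beta mean m = alpha/(alpha+beta) matches mu:
print(dfl(2 / 3, 2.0, 1.0))  # 0.0
# DFL with gamma = 2 is twice the plain (gamma = 1) KL divergence:
print(dfl(0.9, 2.0, 1.0, gamma=2.0) / dfl(0.9, 2.0, 1.0, gamma=1.0))  # ≈ 2.0
```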
Classification loss and confidence loss both use binary cross-entropy loss:
$L_{BCE} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \ln x_i + (1 - y_i) \ln (1 - x_i) \right]$
where $n$ is the total number of samples, $y_i$ is the true label of sample $i$, and $x_i$ is the model's predicted value, typically obtained after the sigmoid function.
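A direct transcription of the BCE formula (the `eps` guard against `log(0)` is our addition):

```python
import math

def bce_loss(y, x, eps: float = 1e-7) -> float:
    """Binary cross-entropy over n samples; eps guards against log(0)."""
    n = len(y)
    return -sum(yi * math.log(max(xi, eps)) + (1 - yi) * math.log(max(1 - xi, eps))
                for yi, xi in zip(y, x)) / n

# three samples: two positives predicted at 0.9/0.8, one negative at 0.1
print(bce_loss([1, 0, 1], [0.9, 0.1, 0.8]))  # ≈ 0.1446
```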
The bounding box loss uses the CIoU loss function:
$\mathrm{CIoU} = \mathrm{IoU} - \frac{\rho^2(b, b^{gt})}{c^2} - \alpha v$
where $\rho(\cdot)$ is the distance between the centers of the predicted box $b$ and the ground-truth box $b^{gt}$, $c$ is the diagonal length of the smallest enclosing box, $\alpha$ is a positive trade-off weight, and $v$ measures the consistency of the aspect ratio:
$\alpha = \frac{v}{(1 - \mathrm{IoU}) + v}$
$v = \frac{4}{\pi^2} \left( \arctan \frac{w^{gt}}{h^{gt}} - \arctan \frac{w}{h} \right)^2$
Finally, the CIoU loss function is defined as
$L_{CIoU} = 1 - \mathrm{IoU} + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$
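Putting the three CIoU terms together for corner-format boxes gives a self-contained sketch (the small epsilon in $\alpha$ is ours, to avoid 0/0 for perfectly matching boxes):

```python
import math

def ciou_loss(b, bgt):
    """CIoU loss for corner-format boxes (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = b
    g1, g2, g3, g4 = bgt
    # IoU term
    iw = max(0.0, min(x2, g3) - max(x1, g1))
    ih = max(0.0, min(y2, g4) - max(y1, g2))
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (g3 - g1) * (g4 - g2) - inter
    iou = inter / union
    # squared center distance rho^2 and enclosing-box diagonal c^2
    rho2 = ((x1 + x2 - g1 - g3) / 2) ** 2 + ((y1 + y2 - g2 - g4) / 2) ** 2
    c2 = (max(x2, g3) - min(x1, g1)) ** 2 + (max(y2, g4) - min(y1, g2)) ** 2
    # aspect-ratio consistency v and its weight alpha
    v = (4 / math.pi ** 2) * (math.atan((g3 - g1) / (g4 - g2))
                              - math.atan((x2 - x1) / (y2 - y1))) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)  # eps avoids 0/0 for identical boxes
    return 1 - iou + rho2 / c2 + alpha * v

print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0 for identical boxes
print(round(ciou_loss((0, 0, 2, 2), (1, 1, 3, 3)), 4))  # ≈ 0.9683, partial overlap
```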
The WCIoU function is an advanced version of the CIoU loss. It incorporates additional components such as angle and aspect-ratio consistency, which help predict bounding boxes more accurately in object detection tasks. The WCIoU loss can be expressed as
$L_{WCIoU} = 1 - \mathrm{WCIoU}(b, b^{gt})$
where $\mathrm{WCIoU}(b, b^{gt})$ is defined as
$\mathrm{WCIoU}(b, b^{gt}) = \mathrm{IoU}(b, b^{gt}) - \frac{\rho^2(b, b^{gt})}{c^2} - \alpha v$

4. Experimental Results and Analysis

4.1. Dataset and Evaluation Metrics

We utilized the MS100 dataset [11] for our experiments. This dataset comprises corroded bolt images taken from a Beijing subway service tunnel, totaling 1441 images, each with a resolution of 640 × 640 pixels. For evaluation purposes, we randomly selected 720 images to form the test set, on which we assessed the performance of our proposed model.

4.2. Implementation Details

The model was trained on the following hardware and software configuration: Ubuntu 22.04 operating system, an Intel Core i9-13900K processor, and an NVIDIA GeForce RTX 3090 Ti graphics card, with the model implemented in the PyTorch 1.9.0 framework. The initial learning rate was set to 0.001 and adjusted with a cosine learning rate decay strategy using a decay factor of 0.9, which facilitates rapid convergence in the early training phases and fine-tuning of parameters in later stages. Stochastic gradient descent was used for parameter updates, with a weight decay coefficient of 0.0001 and a momentum of 0.999; this setup helps mitigate overfitting while ensuring training stability and efficiency. The input image size was 640 × 640 pixels. Given the hardware and memory constraints, the batch size was set to 8, and 300 training epochs were performed to ensure the model sufficiently learned the data features and generalized robustly.
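The optimizer and schedule described above can be reproduced in PyTorch roughly as follows (a configuration sketch with a stand-in model; mapping the reported "decay factor of 0.9" onto `CosineAnnealingLR` is our assumption):

```python
import torch

model = torch.nn.Conv2d(3, 16, 3)  # stand-in for the detection network
# SGD with the paper's settings: lr 0.001, momentum 0.999, weight decay 1e-4
opt = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.999, weight_decay=1e-4)
# cosine learning-rate decay over the 300 training epochs
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=300)

for epoch in range(300):
    # ... forward/backward over 640x640 images at batch size 8 would go here ...
    opt.step()    # placeholder update
    sched.step()  # anneal the learning rate along the cosine curve

print(opt.param_groups[0]["lr"])  # annealed to ~0 by the final epoch
```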

4.3. Comparative Experimental Results

To comprehensively evaluate the performance of our proposed SAIDN method, we designed and conducted a series of comparative experiments involving various object detection algorithms. The experiments compared the detection performance, operational efficiency, and real-world applicability of the different methods from multiple dimensions. Specifically, we compared the detection effectiveness of each method across six key metrics: precision, recall, F1 score, mAP@0.5, mAP@0.5:0.95, and mIOU. Additionally, we assessed model complexity through the parameter count (Param.), floating point operations (FLOPs), and inference time (Inf.) to analyze computational efficiency. To provide a more intuitive understanding of the detection performance, we plotted PR curves and compared actual detection results. These comprehensive experiments aimed to validate the advantages of SAIDN in object detection tasks and provide strong evidence of its potential in practical applications.
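For reference, precision, recall, and F1 are computed from detection counts in the standard way (the counts below are illustrative, not taken from Table 1):

```python
def prf1(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from true/false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# illustrative counts: 944 hits, 56 false alarms, 15 missed bolts
p, r, f = prf1(944, 56, 15)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.944 0.984 0.964
```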
In this study, our proposed SAIDN method demonstrated significant advantages across multiple performance metrics, as shown in Figure 7, Figure 8 and Table 1. As shown in Table 1, SAIDN consistently outperformed existing methods in precision, recall, F1 score, mAP@0.5, mAP@0.5:0.95, and mIOU. Specifically, SAIDN achieved a precision of 94.4%, significantly higher than YOLOv9 (81.0%) and Cascade RCNN (78.5%), indicating superior performance in reducing false positives: it identifies true targets more accurately without flagging nontargets. SAIDN's recall of 98.5% far exceeded that of YOLOv9 (95.2%) and Dynamic RCNN (91.8%), demonstrating its ability to capture almost all true instances and ensuring that actual targets are rarely missed. Its F1 score, which balances precision and recall, reached 97.2%, markedly higher than YOLOv9 (90.3%) and YOLOX (90.4%), reflecting a reliable balance between accuracy and completeness that makes it well suited to practical applications. For mAP@0.5, which measures detection accuracy at a low IoU threshold, SAIDN achieved 97.1%, surpassing all compared methods, including YOLOv9 (95.8%) and Tridentnet (94.2%), ensuring high detection rates even at this lower threshold. For the more demanding mAP@0.5:0.95 metric, SAIDN achieved 53.8%, significantly higher than YOLOv9 (50.8%) and Cascade RCNN (47.0%).
This demonstrates that SAIDN maintains high detection performance across IoU thresholds, accommodating a broad range of detection needs. Finally, SAIDN's mean intersection over union (mIOU) was 68.5%, a notable improvement over YOLOv9 (63.5%) and Tridentnet (62.5%), indicating superior target localization accuracy: SAIDN delineates the boundaries of detected objects more precisely, which is crucial for tasks requiring detailed localization. In summary, SAIDN's performance across these metrics demonstrates its effectiveness and reliability in object detection tasks, making it a robust and versatile solution for various detection needs.
To further validate the performance of each method, we plotted PR curves, a histogram, and a radar chart, as shown in Figure 9, Figure 10 and Figure 11. Together with Table 1, these show that SAIDN consistently achieved higher precision than the other methods across different recall rates, further proving its robustness and stability under varying detection conditions. The PR curves in particular corroborate SAIDN's ability to maintain high precision even at high recall. Additionally, we compared the detection results of the different methods in real-world scenarios, as shown in Figure 7 and Figure 8. SAIDN accurately detects targets and maintains high detection precision and localization accuracy in complex backgrounds. Compared with the other methods, SAIDN excels at handling occlusion and complex backgrounds, validating its potential and reliability in practical applications.
We also compared the operational parameters of different methods, as shown in Table 2. Table 2 presents the performance of each method in terms of parameter count (Param.), floating point operations (FLOPs), and inference time (Inf.). Specifically, SAIDN’s parameter count is 5.3 million, significantly smaller than most other methods, such as Cascade RCNN with 68.93 million and YOLOv9 with 68.6 million, indicating a clear advantage in model complexity. A smaller parameter count means the model demands fewer storage and computational resources, beneficial for deployment in resource-constrained environments. Meanwhile, SAIDN’s FLOPs is 28.6 billion, lower than other methods like Grid RCNN with 204.49 billion and Tridentnet with 769.14 billion, demonstrating superior computational efficiency. Lower FLOPs indicates less computational load during inference, contributing to faster inference speeds and reduced energy consumption. More importantly, SAIDN’s inference time is 6.0 milliseconds, significantly faster than other methods such as Cascade RCNN with 22.6 milliseconds and Tridentnet with 35.6 milliseconds, indicating higher processing speed and real-time performance in practical applications. This performance metric is critical, especially in real-time response applications where SAIDN’s rapid inference capability can significantly enhance system responsiveness and user experience.
In summary, SAIDN exhibits excellent performance across all evaluated metrics, validating its effectiveness and robustness in object detection tasks. Compared to existing methods, SAIDN offers significant advantages in reducing false positives, improving detection comprehensiveness, and enhancing localization accuracy, better meeting the demands of practical applications. Additionally, the visualized PR curves and detection results further demonstrate SAIDN’s superior performance, showcasing its potential and reliability in real-world applications. Moreover, in operational parameter comparisons, SAIDN’s smaller parameter count, lower FLOPs, and extremely fast inference time further highlight its efficiency and practicality, providing strong support for its application in real-world scenarios.

4.4. Ablation Study Analysis

As described above, SAIDN is optimized for corroded bolt image analysis: it adaptively emphasizes important features, reduces interference from irrelevant information, and enhances detection accuracy. Its core component, the SAIM, performs a detailed analysis and transformation of features in both the spatial and channel dimensions and comprises three key layers: SFCL, DIL, and WAL. These layers collectively enhance the model's ability to recognize critical defect information in specific spatial structures.
The ablation study results, as presented in Table 3 and Figure 12, illustrate the impact of each component on the overall performance of the SAIDN model. We systematically enabled and disabled the SFCL, DIL, and WAL components to assess their individual and combined effects, as follows:
(1)
When none of the components (SFCL, DIL, and WAL) were enabled, the baseline model achieved an mAP@0.5 of 95.2%, an mAP@0.5:0.95 of 50.1%, and an mIOU of 63.0%. This serves as the foundational performance level for comparison.
(2)
When only DIL and WAL were enabled, the model's mAP@0.5 improved to 96.3%, its mAP@0.5:0.95 increased to 51.4%, and its mIOU rose to 66.3%. This demonstrates that DIL and WAL significantly enhance detection accuracy and precision by focusing on directional features and allocating weights effectively.
(3)
With SFCL and WAL enabled but DIL disabled, the model achieved an mAP@0.5 of 94.5%, an mAP@0.5:0.95 of 49.0%, and an mIOU of 65.7%. Despite the slight drop in mAP@0.5 and mAP@0.5:0.95, the mIOU improved over the baseline, indicating that WAL enhances spatial feature allocation even without DIL.
(4)
Enabling SFCL and DIL while disabling WAL resulted in an mAP@0.5 of 95.9%, an mAP@0.5:0.95 of 52.0%, and an mIOU of 67.6%. SFCL and DIL alone thus yield significant gains through spatial feature transformation and directional improvement, while the remaining gap to the full model reflects the additional contribution of WAL.
(5)
When all components (SFCL, DIL, and WAL) were enabled, the model achieved its best performance, with an mAP@0.5 of 97.1%, an mAP@0.5:0.95 of 53.8%, and an mIOU of 68.5%. This confirms that the three components are complementary and that their combination provides the SAIDN model with its overall robustness.
SFCL selectively enhances important feature channels while suppressing irrelevant ones, improving the effectiveness of feature extraction; the ablation results show that disabling SFCL decreases mAP@0.5:0.95 and mIOU, underscoring its critical role in optimizing feature selection. DIL dynamically adjusts feature representations for different instances, enhancing the model's detection flexibility and robustness; disabling DIL significantly lowers mAP@0.5 and mIOU, highlighting its importance for overall detection accuracy and precision, particularly at higher IoU thresholds. WAL assigns different weights to focus on important regions, enhancing feature representation; disabling WAL, even with the other two components enabled, reduces performance, emphasizing its crucial role in improving detection effectiveness, especially in complex backgrounds.
Through this detailed ablation study, we clarified the specific contributions of each component to the SAIDN model’s performance. Each component optimizes the model in different aspects, and their combined usage achieves the best performance, validating the effectiveness and robustness of SAIDN in corroded bolt defect detection.

5. Conclusions

In this study, we proposed SAIDN for the detection of corroded bolt defects in images. By introducing the SAIM, we significantly enhanced the model's ability to identify corroded bolt defects against complex backgrounds. The SAIM analyzes and transforms features in both the spatial and channel dimensions, boosting the model's directional sensitivity and feature reweighting capability and thereby improving detection accuracy. Experimental results demonstrate that SAIDN exhibits exceptional performance in corroded bolt defect detection: compared with existing state-of-the-art object detection models, it achieved an mAP@0.5 of 97.1%, significantly surpassing the other models. SAIDN also excelled in parameter count and computational complexity, with only 5.3M parameters and 28.6 GFLOPs, showcasing efficient computation and low resource consumption. The ablation study further validated the contributions of each SAIM component to overall performance. Specifically, the spatial feature cross-level (SFCL), directional information learning (DIL), and weighted attention learning (WAL) layers each played critical roles in enhancing the model's ability to capture and recognize defect information. Notably, the DIL component was crucial for directional sensitivity and the handling of periodic features, with its removal causing the largest performance decline. In summary, by integrating the SAIM, SAIDN demonstrates clear advantages in recognizing and locating corroded bolts, reaching new heights in detection accuracy while exhibiting excellent processing efficiency and robustness. Our research provides an effective and precise solution for corroded bolt image analysis and offers useful design references for future research and model optimization in practical applications.

6. Future Work

Future work will continue to focus on further optimizing the model structure, enhancing detection performance, and exploring its application potential in other defect detection tasks. We aim to develop more advanced techniques for improving the robustness and efficiency of SAIDN, ensuring it can be effectively applied to a broader range of real-world scenarios. This includes investigating novel approaches for feature extraction, leveraging advanced machine learning algorithms, and integrating additional contextual information to further refine detection capabilities. Additionally, the potential for deploying SAIDN in resource-constrained environments will be explored, seeking to balance performance with computational efficiency to extend its applicability. Through these efforts, we aim to solidify SAIDN’s position as a leading solution in the field of defect detection, contributing valuable insights and tools for both academic research and industrial applications.

Author Contributions

Conceptualization, Z.G.; Software, Z.G.; Validation, X.C.; Investigation, X.C.; Resources, Z.G.; Data curation, X.C.; Writing—original draft, Z.G.; Writing—review & editing, Q.X. and H.Z.; Supervision, Q.X. and H.Z.; Funding acquisition, Q.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the 2023 Wuhan Knowledge Innovation Special Basic Research Project (No. 2023020201010147), the National Natural Science Foundation of China (No. 52378399), and the 2022 Scientific Research Starting Foundation for Doctors of the Hubei (Wuhan) Institute of Explosion and Blasting Technology (Grant No. PBSKL-2022-QD-03).

Data Availability Statement

The data presented in this study are available on request from the corresponding author because the dataset is private.

Conflicts of Interest

Author Zhiwei Guo was employed by the Beijing Municipal Development Freeway Construction & Administration Co., Ltd. Author Xianfeng Cheng was employed by the Beijing MTR Corporation, Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Tan, L.; Tang, T.; Yuan, D. An ensemble learning aided computer vision method with advanced color enhancement for corroded bolt detection in tunnels. Sensors 2022, 22, 9715. [Google Scholar] [CrossRef] [PubMed]
  2. Hu, X.; Cao, Y.; Sun, Y.; Tang, T. Railway automatic switch stationary contacts wear detection under few-shot occasions. IEEE Trans. Intell. Transp. Syst. 2021, 23, 14893–14907. [Google Scholar] [CrossRef]
  3. Yuan, Y.; Jiang, X.; Liu, X. Predictive maintenance of shield tunnels. Tunn. Undergr. Space Technol. 2013, 38, 69–86. [Google Scholar] [CrossRef]
  4. Huang, H.; Shao, H.; Zhang, D.; Wang, F. Deformational responses of operated shield tunnel to extreme surcharge: A case study. Struct. Infrastruct. Eng. 2017, 13, 345–360. [Google Scholar] [CrossRef]
  5. Tan, L.; Hu, X.; Tang, T.; Yuan, D. A lightweight metro tunnel water leakage identification algorithm via machine vision. Eng. Fail. Anal. 2023, 150, 107327. [Google Scholar] [CrossRef]
  6. Wang, F.; Zhu, R. Detection of bolt head corrosion under external vibration using a novel entropy-enhanced acoustic emission method. Nonlinear Dyn. 2022, 108, 3807–3816. [Google Scholar] [CrossRef]
  7. Wang, F. Identification of multi-bolt head corrosion using linear and nonlinear shapelet-based acousto-ultrasonic methods. Smart Mater. Struct. 2021, 30, 085031. [Google Scholar] [CrossRef]
  8. Cui, E.; Zuo, C.; Fan, M.; Jiang, S. Monitoring of corrosion-induced damage to bolted joints using an active sensing method with piezoceramic transducers. J. Civ. Struct. Health Monit. 2021, 11, 411–420. [Google Scholar] [CrossRef]
  9. Lee, R.; Collett, N.; Burch, S. Stud bolt corrosion inspection blind trials using ultrasonic techniques. Insight-Non-Destr. Test. Cond. Monit. 2012, 54, 327–330. [Google Scholar] [CrossRef]
  10. Wu, S.; Chen, H.; Ramandi, H.L.; Hagan, P.C.; Hebblewhite, B.; Crosky, A.; Saydam, S. Investigation of cable bolts for stress corrosion cracking failure. Constr. Build. Mater. 2018, 187, 1224–1231. [Google Scholar] [CrossRef]
  11. Tan, L.; Chen, X.; Hu, X.; Tang, T. DMDSNet: A Computer Vision-based Dual Multi-task Model for Tunnel Bolt Detection and Corrosion Segmentation. In Proceedings of the 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Bilbao, Spain, 24–28 September 2023; pp. 4827–4833. [Google Scholar]
  12. Tan, L.; Chen, X.; Yuan, D.; Tang, T. DSNet: A Computer Vision-Based Detection and Corrosion Segmentation Network for Corroded Bolt Detection in Tunnel. Struct. Control. Health Monit. 2024, 2024, 1898088. [Google Scholar] [CrossRef]
  13. Lama, B.; Momayez, M. Review of non-destructive methods for rock bolts condition evaluation. Mining 2023, 3, 106–120. [Google Scholar] [CrossRef]
  14. Zhang, C.; Chen, X.; Liu, P.; He, B.; Li, W.; Song, T. Automated detection and segmentation of tunnel defects and objects using YOLOv8-CM. Tunn. Undergr. Space Technol. 2024, 150, 105857. [Google Scholar] [CrossRef]
  15. Ta, Q.B.; Kim, J.T. Monitoring of corroded and loosened bolts in steel structures via deep learning and Hough transforms. Sensors 2020, 20, 6888. [Google Scholar] [CrossRef] [PubMed]
  16. Cha, Y.J.; Choi, W.; Suh, G.; Mahmoudkhani, S.; Büyüköztürk, O. Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types. Comput.-Aided Civ. Infrastruct. Eng. 2018, 33, 731–747. [Google Scholar] [CrossRef]
  17. Ta, Q.B.; Huynh, T.C.; Pham, Q.Q.; Kim, J.T. Corroded bolt identification using mask region-based deep learning trained on synthesized data. Sensors 2022, 22, 3340. [Google Scholar] [CrossRef] [PubMed]
  18. Suh, G.; Cha, Y.J. Deep faster R-CNN-based automated detection and localization of multiple types of damage. In Proceedings of the Sensors and Smart Structures Technologies for Civil, Mechanical, and Aerospace Systems, SPIE 2018, Denver, CO, USA, 5–8 March 2018; Volume 10598, pp. 197–204. [Google Scholar]
  19. Yanan, S.; Hui, Z.; Li, L.; Hang, Z. Rail surface defect detection method based on YOLOv3 deep learning networks. In Proceedings of the IEEE 2018 Chinese Automation Congress (CAC), Xi’an, China, 30 November–2 December 2018; pp. 1563–1568. [Google Scholar]
  20. Glenn, J. YOLOv8, version 8.1.0; Ultralytics: Los Angeles, CA, USA, 2023.
  21. Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616. [Google Scholar]
  22. Yang, X.; Gao, Y.; Fang, C.; Zheng, Y.; Wang, W. Deep learning-based bolt loosening detection for wind turbine towers. Struct. Control. Health Monit. 2022, 29, e2943. [Google Scholar] [CrossRef]
  23. Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  24. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef]
  25. Cai, Z.; Vasconcelos, N. Cascade R-CNN: High quality object detection and instance segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 1483–1498. [Google Scholar] [CrossRef]
  26. Zhang, H.; Chang, H.; Ma, B.; Wang, N.; Chen, X. Dynamic R-CNN: Towards high quality object detection via dynamic training. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XV. Springer: Berlin/Heidelberg, Germany, 2020; pp. 260–275. [Google Scholar]
  27. Lu, X.; Li, B.; Yue, Y.; Li, Q.; Yan, J. Grid r-cnn. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7363–7372. [Google Scholar]
  28. Sun, P.; Zhang, R.; Jiang, Y.; Kong, T.; Xu, C.; Zhan, W.; Tomizuka, M.; Li, L.; Yuan, Z.; Wang, C.; et al. Sparse r-cnn: End-to-end object detection with learnable proposals. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14454–14463. [Google Scholar]
  29. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: A Simple and Strong Anchor-Free Object Detector. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 4. [Google Scholar]
  30. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I. Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
  31. Paz, D.; Zhang, H.; Christensen, H.I. Tridentnet: A conditional generative model for dynamic trajectory generation. In Proceedings of the International Conference on Intelligent Autonomous Systems, Singapore, 22–25 June 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 403–416. [Google Scholar]
  32. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  33. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
  34. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  35. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  36. Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; Kwon, Y.; Michael, K.; Fang, J.; Wong, C.; Yifu, Z.; Montes, D.; et al. ultralytics/yolov5: v6.2—YOLOv5 Classification Models, Apple M1, Reproducibility, ClearML and Deci.ai Integrations. Zenodo 2022. [Google Scholar] [CrossRef]
  37. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. Yolox: Exceeding yolo series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
  38. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
  39. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  40. Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable detr: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159. [Google Scholar]
  41. Zhang, H.; Li, F.; Liu, S.; Zhang, L.; Su, H.; Zhu, J.; Ni, L.M.; Shum, H.Y. Dino: Detr with improved denoising anchor boxes for end-to-end object detection. arXiv 2022, arXiv:2203.03605. [Google Scholar]
  42. Zhang, S.; Wang, X.; Wang, J.; Pang, J.; Lyu, C.; Zhang, W.; Luo, P.; Chen, K. Dense distinct query for end-to-end object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7329–7338. [Google Scholar]
Figure 1. Accurate detection of corroded bolts is very important for the safety of subway tunnel lining structures.
Figure 2. The spatial adaptive improvement detection network structure. K: the size of the convolution kernel; s: stride, the step length of the kernel as it moves over the input image. The solid red line is the sine function, and the dashed red line is the cosine function.
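The sine and cosine curves in the Figure 2 legend point to a sinusoidal positional encoding inside the network. As a hypothetical illustration only (the paper's exact formulation is not reproduced here), the standard transformer-style encoding interleaves sine and cosine channels over a geometric frequency schedule:

```python
import numpy as np

def sinusoidal_encoding(length: int, dim: int) -> np.ndarray:
    """Standard sine/cosine positional encoding (dim must be even)."""
    pos = np.arange(length)[:, None]           # positions 0..length-1
    i = np.arange(dim // 2)[None, :]           # frequency index
    angles = pos / (10000.0 ** (2 * i / dim))  # geometric frequency schedule
    enc = np.zeros((length, dim))
    enc[:, 0::2] = np.sin(angles)              # even channels: sine
    enc[:, 1::2] = np.cos(angles)              # odd channels: cosine
    return enc
```

Each row of the returned array encodes one spatial position; nearby positions receive similar vectors, which is what allows a spatial attention mechanism to distinguish locations.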
Figure 3. The SPPF module.
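For context, SPPF (spatial pyramid pooling-fast, from the YOLOv5 line) chains three identical small max pools and concatenates the input with each pooled output, so one pass covers several effective receptive fields. A minimal NumPy sketch of the pooling stage only (the surrounding 1×1 convolutions that adjust channel counts are omitted):

```python
import numpy as np

def maxpool2d(x, k=5):
    """Stride-1, 'same'-size max pooling on a (H, W) map via -inf padding."""
    p = k // 2
    xp = np.pad(x, p, mode="constant", constant_values=-np.inf)
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = xp[i:i + k, j:j + k].max()
    return out

def sppf(x, k=5):
    """SPPF pooling stage: chain three k x k max pools, keep every scale."""
    y1 = maxpool2d(x, k)
    y2 = maxpool2d(y1, k)  # equivalent to one (2k-1) x (2k-1) pool
    y3 = maxpool2d(y2, k)  # equivalent to one (3k-2) x (3k-2) pool
    return np.stack([x, y1, y2, y3])  # 4x channel expansion before the 1x1 conv
```

Chaining stride-1 pools is what makes SPPF faster than classic SPP: three 5×5 pools reproduce the 5/9/13 pooling pyramid at lower cost.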
Figure 4. The spatial adaptive improvement module structure. The solid red line is the sine function, and the dashed red line is the cosine function.
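The abstract attributes part of SAIDN's efficiency to depthwise separable convolutions. The saving is easy to quantify: a standard k×k convolution stores C_in · C_out · k² weights, while the depthwise-then-pointwise factorization stores C_in · k² + C_in · C_out (bias terms ignored). A quick check with a representative layer size (the actual channel counts in SAIDN are not specified here):

```python
def conv_params(c_in, c_out, k):
    # standard convolution: one k x k filter per (input, output) channel pair
    return c_in * c_out * k * k

def dwsep_params(c_in, c_out, k):
    # depthwise: one k x k filter per input channel,
    # plus a 1x1 pointwise convolution to mix channels
    return c_in * k * k + c_in * c_out

# e.g. a 3x3 layer with 128 input and 128 output channels
standard = conv_params(128, 128, 3)    # 147456 weights
separable = dwsep_params(128, 128, 3)  # 17536 weights, roughly 8x fewer
```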
Figure 5. The C3 and C2f modules.
Figure 6. The decoupled head module. K: the size of the convolution kernel; s: stride, the step length of the kernel as it moves over the input image.
Figure 7. Comparative experiments with other object detection algorithms (part one).
Figure 8. Comparative experiments with other object detection algorithms (part two).
Figure 9. The precision-recall (PR) curve.
Figure 10. Histogram: comparison of the SAIDN algorithm with other object detection algorithms.
Figure 11. Radar chart: comparison of the SAIDN algorithm with other object detection algorithms.
Figure 12. Feature map visualization of the SAIM's effect.
Table 1. Comparison of SAIDN with other object detection algorithms.

| Methods | Precision | Recall | F1 Score | mAP@0.5 | mAP@0.5:0.95 | mIOU |
|---|---|---|---|---|---|---|
| Cascade RCNN | 78.5 | 92.0 | 84.7 | 94.5 | 47.0 | 62.3 |
| Dynamic RCNN | 79.2 | 91.8 | 85.0 | 94.7 | 46.7 | 62.1 |
| Faster RCNN | 72.1 | 92.7 | 82.0 | 94.5 | 45.8 | 60.2 |
| Grid RCNN | 76.8 | 90.5 | 83.1 | 93.2 | 40.5 | 60.0 |
| Sparse RCNN | 77.0 | 89.2 | 82.7 | 91.0 | 44.3 | 59.5 |
| FCOS | 75.5 | 88.1 | 81.3 | 89.1 | 43.2 | 58.7 |
| SSD | 74.2 | 90.0 | 81.3 | 94.5 | 43.8 | 58.9 |
| Tridentnet | 79.0 | 91.5 | 85.0 | 94.2 | 47.1 | 62.5 |
| YOLOX | 80.5 | 91.0 | 90.4 | 95.6 | 46.1 | 63.0 |
| YOLOv8n | 80.0 | 90.8 | 85.1 | 95.2 | 45.4 | 62.7 |
| YOLOv9 | 81.0 | 95.2 | 90.3 | 95.8 | 50.8 | 63.5 |
| SAIDN (ours) | 94.4 | 98.5 | 97.2 | 97.1 | 53.8 | 68.5 |
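The mIOU column in Table 1 averages the intersection-over-union between predicted and ground-truth boxes. For two axis-aligned boxes, IoU is the intersection area divided by the union area; a minimal sketch, assuming the common (x1, y1, x2, y2) corner convention:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])  # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
```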
Table 2. Comparison of model operational parameters with other object detection algorithms.

| Methods | Parameters (M) | FLOPs (G) | Inf. (ms) |
|---|---|---|---|
| Cascade RCNN | 68.93 | 118.81 | 22.6 |
| Dynamic RCNN | 41.12 | 91.0 | 21.6 |
| Faster RCNN | 41.12 | 91.0 | 21.5 |
| Grid RCNN | 64.24 | 204.49 | 32.6 |
| Sparse RCNN | 105.94 | 64.63 | 21.2 |
| FCOS | 31.84 | 78.63 | 19.6 |
| SSD | 24.39 | 137.07 | 20.7 |
| Tridentnet | 32.76 | 769.14 | 35.6 |
| YOLOX | 8.94 | 13.32 | 15.8 |
| YOLOv8n | 3.0 | 8.1 | 2.0 |
| YOLOv9 | 68.6 | 241.1 | 12.8 |
| SAIDN | 5.3 | 28.6 | 6.0 |
Table 3. Comparison of ablation experiment performance. × indicates the component is not used; √ indicates it is used.

| Model | SFCL | DIL | WAL | mAP@0.5 | mAP@0.5:0.95 | mIOU |
|---|---|---|---|---|---|---|
| SAIDN | × | × | × | 95.2 | 50.1 | 63.0 |
| SAIDN | × | √ | √ | 96.3 | 51.4 | 66.3 |
| SAIDN | √ | × | √ | 94.5 | 49.0 | 65.7 |
| SAIDN | √ | √ | × | 95.9 | 52.0 | 67.6 |
| SAIDN | √ | √ | √ | 97.1 | 53.8 | 68.5 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Guo, Z.; Cheng, X.; Xie, Q.; Zhou, H. Spatial Adaptive Improvement Detection Network for Corroded Bolt Detection in Tunnels. Buildings 2024, 14, 2560. https://doi.org/10.3390/buildings14082560

