1. Introduction
Due to its high carrying capacity and efficiency, rail transport has become the main mode of land transportation [
1,
2]. The lining structure of subway tunnels is fixed by bolts, which support the tunnel structure to ensure the safe operation of the subway, as shown in
Figure 1. Bolts play a crucial role in the structure of subway tunnels, as they not only fix the lining structure but also support the stability and safety of the entire tunnel [
3,
4,
5]. However, these bolts are exposed to the air for long periods and subjected to humidity, air pollutants, and other environmental factors, making them susceptible to corrosion [
6,
7,
8,
9,
10]. The hazards of corroded bolts mainly manifest in the following aspects: (1) Reduced load-bearing capacity: Corrosion weakens the mechanical properties of bolts, significantly reducing their load-bearing capacity. Over time, corrosion decreases bolts’ material strength and toughness, rendering them unable to support the weight and pressure of the tunnel structure effectively. This reduction in load-bearing capacity directly threatens the tunnel’s structural integrity, increasing the risk of deformation or collapse. (2) Compromised structural stability: Corroded bolts not only affect the performance of individual bolts but also cause a chain reaction that impacts the stability of the entire tunnel structure. Bolt corrosion may lead to loosening of the lining structure, causing displacement and deformation of the tunnel walls. In extreme cases, severely corroded bolts may break, resulting in partial or complete tunnel structure failure and serious safety incidents. (3) Accelerated corrosion of surrounding metal components: Corroded bolts create a more aggressive electrochemical environment, accelerating the corrosion process of surrounding metal components. This phenomenon, known as “galvanic corrosion”, reduces the overall durability of the tunnel structure, further increasing maintenance complexity and costs. (4) Difficulties in maintenance and replacement: The generation and accumulation of corrosion products can lead to blocked bolt holes and jammed threads, complicating the replacement and maintenance process. The buildup of rust and other debris makes removing and replacing corroded bolts extremely difficult, often requiring special tools and more manual intervention, thereby increasing maintenance costs and time. (5) Impact on operational efficiency: Frequent inspection and maintenance activities consume substantial resources and disrupt the subway system’s normal operation, causing inconvenience to passengers and operational delays. 
The economic burden of these maintenance activities is considerable, diverting resources that could have been used for system expansion and upgrades.
In addition, maintaining subway tunnels is extremely complex and challenging. As the tunnel structure gradually ages over long-term operation, ensuring its stability and safety becomes increasingly important [
11,
12,
13,
14]. However, tunnel maintenance faces numerous difficulties that increase its complexity and cost and threaten the safe operation of tunnels. These difficulties are mainly reflected in the following aspects: (1) High costs and low efficiency: Currently, tunnel maintenance relies mainly on manual visual inspection, which is time-consuming and labor-intensive, and its results depend on the inspectors' training and work experience as well as on environmental conditions. Typically, inspectors work during nonoperational hours (such as at night and in the early morning), when insufficient light and visual fatigue greatly reduce the accuracy and efficiency of inspections. Under these conditions, a well-trained maintenance team of 10 to 15 people can only inspect two to three kilometers in about three hours, resulting in high costs and low efficiency. (2) Limitations of subjective judgment: Manual inspection depends on subjective judgment, so bolts may be misclassified as normal or corroded, affecting the accuracy and timeliness of maintenance decisions. Misjudgment may leave corrosion issues undetected and unresolved, increasing potential safety risks, or it may trigger unnecessary replacements and maintenance, which wastes resources. (3) Environmental impact: Environmental conditions inside the tunnel, such as humidity, temperature, and air quality, can affect inspection results. The high humidity and pollutants inside the tunnel accelerate bolt corrosion, increasing the difficulty of inspection and maintenance. Moreover, the tunnel environment typically has poor lighting and ventilation, which complicates inspection work and poses serious health risks to inspection personnel, such as respiratory problems, visual fatigue, and dehydration or heatstroke due to high humidity and temperature. 
(4) Hidden and complex nature of corrosion issues: Bolt corrosion is often hidden and complex, especially in critical areas of the tunnel, making it difficult to detect and evaluate. Corrosion usually occurs inside bolts or in small cracks on the surface, which are difficult to observe directly with the naked eye. Additionally, critical areas of the tunnel, such as deep structural connection points, are challenging to access and evaluate due to their hidden location. (5) Safety risks for maintenance personnel: During tunnel maintenance, inspectors face a high-risk work environment, including high humidity, low visibility, and safety hazards associated with nighttime work. Furthermore, the complexity of the tunnel structure poses various potential dangers during the inspection and replacement of corroded bolts, such as falls, collisions, and other accidents. These factors increase the complexity and danger of tunnel maintenance work, posing significant threats to the safety of maintenance personnel.
Given the above difficulties in maintaining tunnels and the potential hazards of corroded bolts, effectively detecting corroded bolts is crucial for preventing structural failures and ensuring the safety of the subway system. Regular and accurate corrosion detection can identify potential safety hazards early, reducing the likelihood of significant safety incidents and protecting the lives of passengers and staff. Moreover, early detection and intervention can prevent the spread of corrosion issues, reducing the frequency and cost of replacements and repairs. Effective detection methods can improve maintenance efficiency and lower labor and material costs.
In recent years, many researchers have focused on applying object detection technology to detect corroded bolts [
11,
12,
13,
14,
15,
16,
17,
18,
19]. Although YOLOv8 [
20] and YOLOv9 [
21] demonstrate excellent general object detection capabilities, they show some deficiencies in addressing specific issues related to corrosion defects. First, corroded bolt defects exhibit subtle visual differences and often have specific directionality, posing high demands on the spatial sensitivity and directional recognition capabilities of detection algorithms. As advanced object detection frameworks, YOLOv8 and YOLOv9 have significant advantages in feature extraction and rapid detection, but their standard architectures are not specifically designed to capture the detailed features and directional information in images of corrosion defects. Specifically, the YOLO series models lack a natural mechanism to encode the directional information of corroded bolts because they primarily focus on general feature extraction and object classification without targeted optimization to handle the complex spatial structures and directional characteristics of corroded bolts. Detecting these defects requires the model to recognize small defect features and understand their spatial arrangement and direction to accurately locate and identify defect types. Second, the uniqueness of corroded bolt images, including the contrast variations caused by different light absorption rates of materials under various weather conditions and the texture complexity due to different types and degrees of corrosion during use, further increases the difficulty of detection. These factors require the detection model to possess high spatial and directional sensitivity, advanced feature adaptation, and background noise suppression capabilities. Therefore, detecting corroded bolts requires highly sensitive and accurate spatial localization capabilities and the differentiation of defect-specific textures and shapes, where YOLO series models may have limitations in fine feature representation and distinguishing specific defect types. 
In summary, while YOLO performs excellently in general object detection tasks, its performance is limited in the specific scenario of corroded bolt images due to insufficient directional information encoding and fine-grained feature processing capabilities.
This paper proposes a spatial adaptive improvement detection network (SAIDN) for corroded bolt image defect detection. SAIDN is optimized for corroded bolt image analysis: it adaptively emphasizes important features, reduces interference from irrelevant information, and enhances detection accuracy. Specifically, this paper proposes a spatial adaptive improvement module (SAIM), which analyzes and transforms features in both the spatial and channel dimensions. The SAIM comprises the spatial feature conversion layer (SFCL), direction improvement layer (DIL), and weighted allocation layer (WAL), which collectively enhance the model’s ability to recognize critical defect information in specific spatial structures. This modular design extracts and identifies defects by finely analyzing input features, improving the model’s sensitivity to spatial structures and its ability to recognize defects in complex backgrounds. Additionally, incorporating trigonometric functions gives the model a natural way to encode directional information and handle periodic patterns, making it particularly suitable for identifying corroded bolts with directional or periodic characteristics. Furthermore, by integrating depthwise separable convolutions and adaptive feature reweighting, the SAIM emphasizes the most critical features in the image, thereby improving the model’s accuracy in defect detection. Moreover, the SAIM can be integrated into existing deep learning architectures, making it a flexible tool for enhancing corroded bolt image detection and recognition. 
Finally, the design of the depthwise separable convolutions in the SAIM enhances the model’s ability to handle details and optimizes computational efficiency, enabling SAIDN to maintain high-performance detection while achieving high operational efficiency. In summary, by integrating the SAIM, SAIDN has significant advantages in the recognition and localization of corroded bolts and performs excellently in processing efficiency and model robustness, providing an effective and precise solution for corroded bolt image analysis.
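The efficiency argument for depthwise separable convolutions can be illustrated with a simple weight count. The following is a minimal sketch, not the SAIM implementation: the layer sizes are hypothetical values chosen for illustration, and bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1
    pointwise convolution (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, used only to illustrate the comparison.
c_in, c_out, k = 128, 256, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(f"standard: {std}, depthwise separable: {sep}, ratio: {std / sep:.2f}")
```

For a 3 × 3 kernel and wide layers, the separable form needs roughly an order of magnitude fewer weights, which is the general reason such layers help keep parameter counts and FLOPs low.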
Our contributions are summarized as follows:
- We propose SAIDN, which is specifically designed for corroded bolt defect detection. SAIDN integrates the SAIM, can adaptively emphasize important directional features, and reduces interference, significantly improving detection accuracy.
- We propose the SAIM, which performs detailed analysis and transformation of features in spatial and channel dimensions, optimizing the recognition capability of specific spatial structures in corroded bolts. Its structure includes SFCL, DIL, and WAL, which collectively enhance the model’s ability to recognize critical defect information and identify defects in complex backgrounds.
- We conduct comprehensive experiments demonstrating that SAIDN significantly surpasses current advanced object detection algorithms.
2. Related Work
2.1. Corroded Bolt Detection Based on One-Dimensional Signal
In recent years, corroded bolt detection technology has played an important role in infrastructure maintenance. This section reviews several important detection methods, including acoustic emission, ultrasonic technology, piezoelectric sensor technology, and image recognition technology, and analyzes their respective advantages, disadvantages, and application scenarios.
The acoustic emission method identifies corrosion and other defects by detecting sound waves generated by stress release within the material. This method performs excellently in detecting corrosion on bolt heads. Wang et al. [
6] proposed a bolt head corrosion detection technology under external vibration based on an entropy-enhanced acoustic emission method, improving the sensitivity and accuracy of corrosion detection. Wang et al. [
7] studied the identification of multiple bolt head corrosion and proposed an acoustic-ultrasonic method based on linear and nonlinear shape features, improving the accuracy of corrosion detection. Piezoelectric sensor technology utilizes the piezoelectric effect of piezoelectric ceramic materials to monitor stress changes at the bolt connection through sensors, thereby detecting corrosion and damage. Cui et al. [
8] studied the use of active sensing methods with piezoelectric sensors to monitor corrosion damage at bolt connections, achieving early detection and assessment of corrosion. The advantages of this method include high sensitivity and real-time monitoring capabilities, but it also has drawbacks such as complex installation and high costs. Ultrasonic technology uses the propagation characteristics of high-frequency sound waves in the material to identify corrosion and other defects by detecting changes in reflected and transmitted waves. Lee et al. [
9] conducted a blind test for bolt corrosion detection using ultrasonic technology, which can effectively detect internal corrosion defects in bolts. However, its detection accuracy is affected by the surface condition of the material and the operational technique. Wu et al. [
10] studied the application of ultrasonic technology in detecting stress corrosion cracking in cable bolts, verifying the effectiveness of ultrasonic technology in detecting bolt corrosion in high-stress environments. In summary, acoustic emission has high detection sensitivity and is suitable for corrosion detection in complex environments; piezoelectric sensor technology is known for its high sensitivity and real-time monitoring capabilities but comes with high costs; and ultrasonic technology excels in detecting internal defects but is significantly affected by operational technique and material surface conditions.
Compared with the above technologies, the advantages of image recognition in corroded bolt detection are mainly reflected in its high precision, high efficiency, and automation capabilities. Firstly, image recognition technology can quickly process and analyze large volumes of image data, greatly improving detection efficiency. Secondly, through advanced algorithms and models, image recognition can accurately identify and locate corrosion areas, reducing subjective errors in manual judgment and improving the reliability and consistency of detection. Additionally, image recognition technology can operate stably in harsh and complex environments, with strong adaptability, effectively meeting detection needs under different lighting and background conditions. Finally, image recognition technology can automate the detection process, reducing labor costs and safety risks for maintenance personnel, providing strong technical support for the daily maintenance and management of infrastructure.
2.2. Corroded Bolt Detection Based on Two-Dimensional Images
Current deep learning-based visual bolt corrosion detection technology has developed rapidly [
1,
5,
11,
12,
13,
14,
15,
16,
17,
18,
19]. Ta et al. [
15] used regional convolutional neural network (RCNN)-based deep learning and a Hough line transform (HLT) algorithm to monitor corroded and loosened bolts in steel structures. Cha et al. [
16] proposed an autonomous structural visual inspection method using an RCNN model for real-time damage detection, covering concrete cracks, steel corrosion, bolt corrosion, and steel delamination. This method significantly improves the model’s ability to identify various types of damage in complex backgrounds by introducing a region extraction network and convolutional neural networks. Ta et al. [
17] used Mask RCNN to monitor and identify the degree of bolt corrosion in well-lit laboratory steel structures. Their innovation lies in achieving the precise localization of corrosion areas and quantitative assessment of the degree of corrosion through instance segmentation technology, significantly improving detection accuracy. Suh et al. [
18] employed a Faster RCNN-based model to detect and locate various types of damage, including bolt corrosion. This method uses a region proposal network (RPN) to generate candidate regions and subsequent classification and regression steps to accurately identify various types of damage, including bolt corrosion. However, the aforementioned methods are two-stage models with slower inference speeds during application deployment, failing to meet the requirements for real-time detection. Additionally, in practice, it is unnecessary to sacrifice speed and cost to distinguish the target pixels of corroded bolts precisely.
Another branch, single-stage object detection algorithms such as YOLO, accelerates the training and detection process. YOLO converts the object detection task into a regression problem, directly predicting the bounding boxes and categories of objects in the image, thereby significantly improving detection speed. Yang et al. [22] introduced a bolt-loosening detection method that combines the manual torque method with several versions of YOLO; their experiments showed good results and strong practical value for smartphone-based inspection. Tan et al. [
1] proposed an ensemble learning method that combines the improved multiscale Retinex with color restoration (IMSRCR) and YOLO to detect corroded bolts in linings based on actual tunnel image data. This method enhances image contrast and detail through multiscale Retinex and, combined with YOLO’s efficient detection capabilities, achieves accurate detection of corroded bolts in tunnel images. Subsequently, Tan et al. [
11] proposed DMDSNet, which achieves two parallel tasks of bolt detection and pixel-level rust segmentation. DMDSNet optimizes both detection and segmentation tasks through a multitask learning framework, improving the overall performance of the model.
Despite significant progress in these object detection algorithms, there are still shortcomings in addressing the directionality of corroded bolts. Current methods mainly focus on the detection and classification of corrosion areas, neglecting the directional characteristics of bolt corrosion. Bolt corrosion often has directionality, which is important in actual maintenance and repair work. Therefore, future research can further explore how to incorporate directional features into object detection algorithms to improve the accuracy and practicality of corroded bolt detection.
2.3. Object Detection
In recent years, deep learning has become an important tool for solving big data problems due to its powerful ability to process complex patterns [
20,
21,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38,
39]. It has been widely used in various fields and has shown its potential to solve many pressing problems. The two-stage detection network based on candidate regions has greatly improved the detection speed and accuracy through a series of improvements such as Fast R-CNN [
23], Faster R-CNN [
24], Cascade R-CNN [
25], Dynamic RCNN [
26], Grid RCNN [
27], Sparse RCNN [
28], FCOS [
29], and Tridentnet [
31]. These advancements have leveraged innovative techniques like region proposal networks, cascade structures, dynamic adjustments, grid-based predictions, sparsity constraints, and fully convolutional object detection approaches to enhance the robustness and precision of detection models. Meanwhile, the YOLO series one-stage detection algorithm achieves rapid and accurate image detection in an end-to-end manner, continuously optimizing from YOLOv1 to YOLOv9 [
20,
21,
32,
33,
34,
35,
36,
37,
38,
39] by introducing techniques such as batch normalization, residual structures, multiscale feature fusion, and reparameterization strategies, significantly improving detection performance and training efficiency. YOLOv8 [
20] adopts new convolution and data augmentation techniques, while YOLOv9 [
21] introduces programmable gradient information and auxiliary reversible branch design, further enhancing the model’s feature retention capability and training efficiency, optimizing detection performance. These enhancements in the YOLO series have enabled high accuracy in real-time detection, making it a preferred choice for many practical applications.
In recent years, Transformer-based object detection methods have also garnered attention. Among them, Deformable DETR [
40] improves the network training convergence speed and addresses the slow convergence issue of the original DETR by using a new attention module to process image feature maps. DINO [
41] reduces duplicate predictions and improves prediction accuracy through dynamic anchor boxes and contrastive denoising training. DDQ DETR [
42] selects unique queries for one-to-one assignments based on dense queries, combining the advantages of traditional and end-to-end detectors. However, Transformer models usually require a large amount of training data and high computational resources, which is a significant limiting factor given the data volume in defect detection and the parameter constraints in industrial applications. Therefore, in photovoltaic defect detection, this paper explores the application and optimization of a two-stage detection network based on Faster R-CNN architecture, aiming for high-precision detection, complex background processing, and small-target defect recognition in the photovoltaic defect field.
Although deep learning object detection algorithms have achieved great success, they still face challenges and certain unavoidable limitations in industrial scenarios of corroded bolt detection. Firstly, when dealing with various corroded bolts, deep learning algorithms rely on existing backbone feature extraction networks designed based on common datasets, which have relatively weak directional awareness and difficulty flexibly responding to multiple defect types. These networks often lack the specialized capability to distinguish subtle differences and orientations specific to corrosion patterns on bolts. Secondly, the variability in lighting conditions, corrosion severity, and environmental factors pose additional challenges, as the models trained on standard datasets may not generalize well to these diverse real-world conditions. In conclusion, while deep learning has revolutionized the field of object detection and made significant strides in applications such as corroded bolt detection, ongoing research and innovation are required to address these challenges. Future work could focus on developing more specialized feature extraction networks that incorporate directional awareness, enhancing model robustness to varying conditions and optimizing algorithms for deployment on resource-constrained devices. By addressing these challenges, it will be possible to further improve the accuracy, efficiency, and applicability of deep learning-based corroded bolt detection systems in industrial settings.
4. Experimental Results and Analysis
4.1. Dataset and Evaluation Metrics
We utilized the MS100 dataset [
11] for our experiments. This dataset comprises corroded bolt images taken from a Beijing subway service tunnel, totaling 1441 images, each with a resolution of 640 × 640 pixels. For evaluation purposes, we randomly selected 720 images to form the test set, on which we assessed the performance of our proposed model.
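The random selection of the test set can be sketched as follows. This is an illustrative sketch only: the seed and the function name `split_indices` are assumptions, and the text does not state how the remaining 721 images were used.

```python
import random


def split_indices(n_images: int, n_test: int, seed: int = 0):
    """Randomly pick n_test image indices for testing; return (rest, test)."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    test = sorted(rng.sample(range(n_images), n_test))
    test_set = set(test)
    rest = [i for i in range(n_images) if i not in test_set]
    return rest, test


# 1441 images in total, 720 of them selected for the test set.
rest_idx, test_idx = split_indices(1441, 720)
print(len(rest_idx), len(test_idx))
```

Fixing the seed keeps the split identical across runs, which matters when several detectors are compared on the same test images.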
4.2. Implementation Details
The model was trained in an environment with the following configuration: Ubuntu 22.04 operating system, Intel Core i9-13900K processor, and NVIDIA GeForce RTX 3090 Ti graphics card. The model was developed using the PyTorch 1.9.0 framework. The initial learning rate was set to 0.001 and adjusted using a cosine learning rate decay strategy with a decay factor of 0.9. This schedule facilitates rapid convergence in the initial training phases and fine-tuning of model parameters in later stages. Stochastic gradient descent was used for parameter updates, with a weight decay coefficient of 0.0001 and a momentum value of 0.999. This setup helps mitigate overfitting while ensuring stability and efficiency during training. The input image size for the model was 640 × 640 pixels. Given the hardware configuration and memory constraints, the batch size was set to 8. A total of 300 training iterations were performed to ensure the model sufficiently learned the data features and achieved robust generalization.
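The shape of a cosine learning rate schedule can be sketched in plain Python. This is a generic sketch under stated assumptions, not the training code: the floor `lr_min` is a hypothetical parameter, the schedule is assumed to span all 300 iterations, and the role of the 0.9 decay factor is not specified in the text, so it is not modeled here.

```python
import math


def cosine_lr(step: int, total_steps: int,
              lr_init: float = 0.001, lr_min: float = 0.0) -> float:
    """Cosine decay from lr_init (step 0) to lr_min (step total_steps)."""
    progress = min(max(step, 0), total_steps) / total_steps
    return lr_min + 0.5 * (lr_init - lr_min) * (1.0 + math.cos(math.pi * progress))


# Learning rate at every step of an assumed 300-step schedule.
schedule = [cosine_lr(s, 300) for s in range(301)]
print(schedule[0], schedule[150], schedule[300])
```

The rate falls slowly at first, fastest around the midpoint, and slowly again near the end, which matches the described behavior of quick early convergence followed by fine-tuning.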
4.3. Comparative Experimental Results
To comprehensively evaluate the performance of our proposed SAIDN method, we designed and conducted a series of comparative experiments involving various object detection algorithms. The experiments compared the detection performance, operational efficiency, and real-world applicability of different methods from multiple dimensions. Specifically, we compared the detection effectiveness of each method across six key metrics: precision, recall, F1 score, mAP@0.5, mAP@0.5:0.95, and mIOU. Additionally, we assessed model complexity through parameter count (Param.), floating point operations (FLOPs), and inference time (Inf.) to analyze computational efficiency. To provide a more intuitive understanding of the detection performance, we plotted PR curves and compared actual detection results. These comprehensive experiments aimed to validate the advantages of SAIDN in object detection tasks and provide strong evidence for its potential in practical applications.
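For reference, the standard definitions behind several of these metrics can be sketched as follows. This is a minimal illustration using the usual textbook formulas; the box format `(x1, y1, x2, y2)` and the example counts are assumptions, not values from the experiments.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical example: two overlapping boxes and made-up detection counts.
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(precision_recall_f1(tp=8, fp=2, fn=2))
```

A detection is usually counted as a true positive when its IoU with a ground-truth box reaches the chosen threshold (0.5 for mAP@0.5), so the IoU function underlies all of the other metrics.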
In this study, our proposed SAIDN method demonstrated significant advantages across multiple performance metrics, as shown in
Figure 7,
Figure 8 and
Table 1. As shown in
Table 1, SAIDN consistently outperformed existing methods in precision, recall, F1 score, mAP@0.5, mAP@0.5:0.95, and mIOU. Specifically, SAIDN achieved a precision of 94.4%, which is significantly higher than other methods such as YOLOv9 with 81.0% and Cascade RCNN with 78.5%. This indicates that SAIDN is better at reducing false positives, meaning it identifies true targets more accurately without incorrectly flagging nontargets. Additionally, SAIDN’s recall was 98.5%, exceeding other methods such as YOLOv9 with 95.2% and Dynamic RCNN with 91.8%. This demonstrates that SAIDN captures almost all true instances and rarely misses actual targets. In terms of the F1 score, which balances precision and recall, SAIDN reached 97.2%, markedly higher than YOLOv9 with 90.3% and YOLOX with 90.4%. This reflects SAIDN’s ability to maintain high accuracy while detecting the majority of true targets, making it reliable for practical applications. For mAP@0.5, which measures detection accuracy at an IoU threshold of 0.5, SAIDN achieved 97.1%, surpassing all compared methods, including YOLOv9 with 95.8% and Tridentnet with 94.2%. For the more challenging mAP@0.5:0.95 metric, SAIDN achieved 53.8%, higher than YOLOv9 with 50.8% and Cascade RCNN with 47.0%. This demonstrates that SAIDN maintains high detection performance across a range of IoU thresholds, including strict ones. Finally, SAIDN’s mean intersection over union (mIOU) was 68.5%, a notable improvement over other methods such as YOLOv9 with 63.5% and Tridentnet with 62.5%. This indicates superior target localization accuracy, showing that SAIDN delineates the boundaries of detected objects more precisely, which is crucial for tasks requiring detailed object localization. In summary, SAIDN’s high precision, recall, and F1 score, together with its results in mAP@0.5, mAP@0.5:0.95, and mIOU, demonstrate its effectiveness and reliability in object detection tasks.
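The metric that averages over IoU thresholds is conventionally computed as the mean of the average precision at ten thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below illustrates that averaging only; the per-threshold AP values are hypothetical placeholders, not results from Table 1.

```python
def iou_thresholds():
    """IoU thresholds 0.50, 0.55, ..., 0.95 (ten values)."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def mean_ap_over_thresholds(ap_by_threshold: dict) -> float:
    """Average the AP values over the ten IoU thresholds."""
    thresholds = iou_thresholds()
    return sum(ap_by_threshold[t] for t in thresholds) / len(thresholds)


# Hypothetical AP values that fall as the IoU threshold becomes stricter.
example_ap = {t: 1.0 - t for t in iou_thresholds()}
print(mean_ap_over_thresholds(example_ap))
```

Because AP typically drops sharply at strict thresholds, the averaged value is much lower than the value at 0.5 alone, which is why the two metrics differ so widely for every method.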
To further validate the performance of each method, we plotted PR curves, a histogram, and a radar chart, as shown in
Figure 9,
Figure 10 and
Figure 11. As shown in
Table 1 and
Figure 9,
Figure 10 and
Figure 11, it is evident that SAIDN consistently achieved higher precision across different recall rates than other methods, further proving its robustness and stability under varying detection conditions. The PR curve performance further corroborates SAIDN’s advantage in maintaining high precision even at high recall rates. Additionally, we compared the detection results of different methods, as shown in
Figure 7 and
Figure 8, which showcase the detection performance of each method in real-world scenarios. It is clear that SAIDN accurately detects targets and maintains high detection precision and localization accuracy in complex backgrounds. Compared to other methods, SAIDN excels in handling occlusion and complex backgrounds, validating its potential and reliability in practical applications.
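The link between a PR curve and average precision can be made concrete with a short sketch. It uses the common all-point interpolation of the precision envelope, which is an assumption here because the text does not state which interpolation was used; the input flags are hypothetical.

```python
def average_precision(is_true_positive, n_ground_truth: int) -> float:
    """Area under the PR curve for detections sorted by descending confidence.

    is_true_positive: one boolean per detection (True if it matches a
    ground-truth box at the chosen IoU threshold).
    """
    if n_ground_truth == 0 or not is_true_positive:
        return 0.0
    tp = fp = 0
    recalls, precisions = [], []
    for hit in is_true_positive:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_ground_truth)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas between consecutive recall values.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Hypothetical ranking: hit, miss, hit, with two ground-truth bolts.
print(average_precision([True, False, True], n_ground_truth=2))
```

A curve that keeps precision high as recall approaches 1 encloses a larger area, so a method whose PR curve lies above the others also obtains a higher AP.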
We also compared the operational parameters of different methods, as shown in
Table 2.
Table 2 presents the performance of each method in terms of parameter count (Param.), floating point operations (FLOPs), and inference time (Inf.). Specifically, SAIDN’s parameter count is 5.3 million, significantly smaller than most other methods, such as Cascade RCNN with 68.93 million and YOLOv9 with 68.6 million, indicating a clear advantage in model complexity. A smaller parameter count means the model demands fewer storage and computational resources, beneficial for deployment in resource-constrained environments. Meanwhile, SAIDN’s FLOPs is 28.6 billion, lower than other methods like Grid RCNN with 204.49 billion and Tridentnet with 769.14 billion, demonstrating superior computational efficiency. Lower FLOPs indicates less computational load during inference, contributing to faster inference speeds and reduced energy consumption. More importantly, SAIDN’s inference time is 6.0 milliseconds, significantly faster than other methods such as Cascade RCNN with 22.6 milliseconds and Tridentnet with 35.6 milliseconds, indicating higher processing speed and real-time performance in practical applications. This performance metric is critical, especially in real-time response applications where SAIDN’s rapid inference capability can significantly enhance system responsiveness and user experience.
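The efficiency figures quoted above can be turned into relative factors with simple arithmetic. The sketch below uses only the numbers stated in the text; the frames-per-second conversion assumes one image per inference call and ignores data loading and post-processing overhead.

```python
def frames_per_second(latency_ms: float) -> float:
    """Throughput implied by a per-image latency (one image per call assumed)."""
    return 1000.0 / latency_ms


def ratio(other: float, ours: float) -> float:
    """How many times larger the other method's value is."""
    return other / ours


# Values quoted in the text for SAIDN and selected baselines.
saidn_params_m, saidn_gflops, saidn_ms = 5.3, 28.6, 6.0

fps = frames_per_second(saidn_ms)
param_ratio_yolov9 = ratio(68.6, saidn_params_m)      # YOLOv9 parameters
flops_ratio_grid = ratio(204.49, saidn_gflops)        # Grid RCNN FLOPs
speedup_cascade = ratio(22.6, saidn_ms)               # Cascade RCNN latency
speedup_trident = ratio(35.6, saidn_ms)               # Tridentnet latency

print(f"{fps:.1f} FPS, {param_ratio_yolov9:.1f}x fewer parameters than YOLOv9")
print(f"{flops_ratio_grid:.1f}x fewer FLOPs than Grid RCNN")
print(f"{speedup_cascade:.1f}x and {speedup_trident:.1f}x faster than "
      f"Cascade RCNN and Tridentnet")
```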
In summary, SAIDN exhibits excellent performance across all evaluated metrics, validating its effectiveness and robustness in object detection tasks. Compared to existing methods, SAIDN offers significant advantages in reducing false positives, improving detection comprehensiveness, and enhancing localization accuracy, better meeting the demands of practical applications. Additionally, the visualized PR curves and detection results further demonstrate SAIDN’s superior performance, showcasing its potential and reliability in real-world applications. Moreover, in operational parameter comparisons, SAIDN’s smaller parameter count, lower FLOPs, and extremely fast inference time further highlight its efficiency and practicality, providing strong support for its application in real-world scenarios.
4.4. Ablation Study Analysis
In this section, we analyze the contribution of each component of SAIDN to corroded bolt image defect detection. SAIDN is optimized for corroded bolt image analysis, which allows it to adaptively emphasize important features, reduce interference from irrelevant information, and enhance detection accuracy. The core component of SAIDN is the SAIM, which analyzes and transforms features in both the spatial and channel dimensions. The SAIM comprises three key layers: SFCL, DIL, and WAL. These layers collectively enhance the model’s ability to recognize critical defect information in specific spatial structures.
The ablation study results, as presented in Table 3 and Figure 12, illustrate the impact of each component on the overall performance of the SAIDN model. We systematically enabled and disabled the SFCL, DIL, and WAL components to assess their individual and combined effects, as follows:
- (1)
When none of the components (SFCL, DIL, and WAL) were enabled, the baseline model achieved an mAP@0.5 of 95.2%, mAP@0.5:0.95 of 50.1%, and mIOU of 63.0%. This serves as the foundational performance level for comparison.
- (2)
When only DIL and WAL were enabled, the model’s mAP@0.5 improved to 96.3%, mAP@0.5:0.95 increased to 51.4%, and mIOU rose to 66.3%. This demonstrates that DIL and WAL significantly enhance detection accuracy and precision by focusing on directional features and allocating weights effectively.
- (3)
- (4)
Enabling SFCL and DIL while disabling WAL resulted in an mAP@0.5 of 95.9%, mAP@0.5:0.95 of 52.0%, and mIOU of 67.6%. These results show that SFCL and DIL together deliver significant gains over the baseline even without WAL, reflecting their strong impact on spatial feature transformation and directional improvement. The remaining gap to the full model indicates the additional contribution of WAL.
- (5)
When all components (SFCL, DIL, and WAL) were enabled, the model achieved its highest performance with an mAP@0.5 of 97.1%, mAP@0.5:0.95 of 53.8%, and mIOU of 68.5%. This confirms that the combined effect of all three components results in the best performance, demonstrating their complementary nature and the overall robustness they provide to the SAIDN model.
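The gains reported in the list above can be tabulated relative to the baseline configuration. The following sketch is a simple bookkeeping aid, not part of the SAIDN implementation: the `rows` layout and the `gains` helper are hypothetical names introduced here, and the configuration of item (3) is omitted because its values are not available in the text.

```python
# Ablation results as reported in the list above, in percent.
# Keys are (SFCL, DIL, WAL) enabled flags. Item (3) is omitted because
# its values are not given in the text; nothing is guessed for it.
METRICS = ("mAP@0.5", "mAP@0.5:0.95", "mIOU")
rows = {
    (False, False, False): (95.2, 50.1, 63.0),  # baseline
    (False, True, True): (96.3, 51.4, 66.3),    # DIL + WAL
    (True, True, False): (95.9, 52.0, 67.6),    # SFCL + DIL
    (True, True, True): (97.1, 53.8, 68.5),     # full SAIDN
}
baseline = rows[(False, False, False)]


def gains(row, ref):
    """Percentage-point gain of `row` over `ref` for each metric."""
    return {m: round(v - r, 1) for m, v, r in zip(METRICS, row, ref)}


for flags, row in rows.items():
    enabled = [n for n, on in zip(("SFCL", "DIL", "WAL"), flags) if on]
    label = "+".join(enabled) if enabled else "baseline"
    print(f"{label:>12}: {gains(row, baseline)}")
```

Read this way, the full model gains 1.9 points in mAP@0.5, 3.7 points in mAP@0.5:0.95, and 5.5 points in mIOU over the baseline.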
SFCL selectively enhances important feature channels while suppressing irrelevant ones, improving feature extraction effectiveness. The ablation results indicate that disabling SFCL results in decreased mAP@0.5:0.95 and mIOU, underscoring SFCL’s critical role in optimizing feature selection. DIL dynamically adjusts feature representations for different instances, enhancing the model’s detection flexibility and robustness. The results show that disabling DIL significantly lowers mAP@0.5 and mIOU, highlighting its importance in overall detection accuracy and precision, particularly at higher IoU thresholds. WAL assigns different weights to focus on important regions, enhancing feature representation. Disabling WAL, even with the other two components enabled, results in reduced performance, emphasizing WAL’s crucial role in improving detection effectiveness, especially in complex backgrounds.
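Because the discussion distinguishes between performance at an IoU threshold of 0.5 and at higher thresholds, a brief sketch of how a single predicted box fares across thresholds may be helpful. It assumes the common COCO-style convention in which mAP@0.5:0.95 averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05; the evaluation protocol is not spelled out in this section, and the boxes below are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95.
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred, gt):
    """Thresholds at which `pred` would count as a correct localization."""
    score = iou(pred, gt)
    return [t for t in THRESHOLDS if score >= t]


gt = (0, 0, 100, 100)     # hypothetical ground-truth bolt box
pred = (10, 0, 110, 100)  # hypothetical prediction, shifted by 10 px
print("IoU:", round(iou(pred, gt), 3))
print("Counts as correct at:", matched_thresholds(pred, gt))
```

A prediction shifted by only a tenth of the box width is accepted at 0.5 but rejected at the strictest thresholds, which is why improvements in localization show up more strongly in mAP@0.5:0.95 than in mAP@0.5.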
Through this detailed ablation study, we clarified the specific contributions of each component to the SAIDN model’s performance. Each component optimizes the model in different aspects, and their combined usage achieves the best performance, validating the effectiveness and robustness of SAIDN in corroded bolt defect detection.
5. Conclusions
In this study, we proposed the SAIDN for the defect detection of corroded bolts in images. By introducing the SAIM, we significantly enhanced the model’s ability to identify corroded bolt defects in complex backgrounds. The SAIM meticulously analyzes and transforms features in both spatial and channel dimensions, effectively boosting the model’s directional sensitivity and feature reweighting capability, thereby improving detection accuracy. Experimental results demonstrate that SAIDN exhibits exceptional performance in corroded bolt defect detection tasks. Compared to existing state-of-the-art object detection models, SAIDN achieved an AP of 97.1%, significantly surpassing other models. Additionally, SAIDN excelled in terms of parameter count and computational complexity, with only 5.3M parameters and 28.6 GFLOPs, showcasing efficient computational performance and low resource consumption. The ablation study further validated the contributions of each component of the SAIM to overall performance. Specifically, the spatial feature cross-level (SFCL), directional information learning (DIL), and weighted attention learning (WAL) each played critical roles in enhancing the model’s ability to capture and recognize defect information. Notably, the DIL component was crucial in improving directional sensitivity and handling periodic features, with its removal resulting in the most significant performance decline. In summary, SAIDN, by integrating the SAIM, demonstrates clear advantages in recognizing and locating corroded bolts, achieving new heights in detection accuracy while also exhibiting excellent processing efficiency and model robustness. Our research provides an effective and precise solution for corroded bolt image analysis, offering important design references for future research and model optimization in practical applications.