A Defect Detection Method for Grading Rings of Transmission Lines Based on Improved YOLOv8

Xiang, Siyu; Zhang, Linghao; Chen, Yumin; Du, Peike; Wang, Yao; Xi, Yue; Li, Bing; Zhao, Zhenbing

doi:10.3390/en17194767

Open AccessArticle

A Defect Detection Method for Grading Rings of Transmission Lines Based on Improved YOLOv8

by

Siyu Xiang

^1,2,

Linghao Zhang

¹,

Yumin Chen

¹,

Peike Du

³,

Yao Wang

³,

Yue Xi

⁴,

Bing Li

^5,*

and

Zhenbing Zhao

⁴

¹

State Grid Sichuan Electric Power Research Institute, Chengdu 610095, China

²

Power Internet of Things Key Laboratory of Sichuan Province, Chengdu 610095, China

³

State Grid Sichuan Liangshan Electric Power Company, Xichang 615000, China

⁴

School of Electrical and Electronic Engineering, North China Electric Power University, Baoding 071003, China

⁵

Department of Automation, North China Electric Power University, Baoding 071003, China

^*

Author to whom correspondence should be addressed.

Energies 2024, 17(19), 4767; https://doi.org/10.3390/en17194767

Submission received: 12 August 2024 / Revised: 15 September 2024 / Accepted: 23 September 2024 / Published: 24 September 2024

(This article belongs to the Section F: Electrical Engineering)

Download

Browse Figures

Versions Notes

Abstract

Detecting defects in aerial images of grading rings collected by drones poses challenges due to the structural similarity between normal and defective samples. The small visual differences make it hard to distinguish defects and extract key features. Additionally, critical defect features often become lost during feature fusion. To address these issues, this paper uses YOLOv8 as the baseline model and proposes an improved YOLOv8-based method for detecting grading ring defects in transmission lines. Our approach first integrates the CloAttention and C2f modules into the feature extraction network, enhancing the model’s ability to capture and identify defect features in grading rings. Additionally, we incorporate CARAFE into the feature fusion network to replace the original upsampling module, effectively reducing the loss of critical defect information during the fusion process. Experimental results demonstrate that our method achieves an average detection accuracy of 67.6% for grading ring defects, marking a 6.8% improvement over the baseline model. This improvement significantly enhances the effectiveness of defect detection in transmission line grading rings.

Keywords:

transmission line; grading ring defects; attention mechanism; CARAFE; object detection

1. Introduction

The electric power system comprises interconnected components, including power generation, transmission, and distribution. The transmission line is composed of conductors, insulators, fittings, and other components, which work together to ensure the transmission of electrical energy [1]. The grading ring is an important component of transmission lines, and its main functions are to prevent side lightning strikes and measure the circuit current. Because transmission lines have been operating in harsh natural environments for a long time [2], they are susceptible to galloping caused by extreme weather conditions, such as strong winds and rainstorms. Consequently, grading rings are prone to defects, such as displacement, fracture, and damage. Such defects not only compromise the lightning protection effectiveness of transmission lines but can also lead to safety hazards within the power system. Therefore, it is essential to implement effective defect detection methods for grading rings on transmission lines.

In recent years, the widespread adoption of drone technology and the rapid advancements in computer vision have made the intelligent inspection of transmission lines a significant trend [3]. The integration of drone-based aerial photography with computer vision technology is increasingly replacing traditional manual inspection methods. This shift is establishing drone-assisted inspections as a prominent and effective approach for evaluating high-voltage overhead transmission lines both domestically and internationally [4].

Consequently, an increasing number of researchers are focusing on applying deep learning technologies to detect defects in transmission lines.

Study [5] proposes a detection approach for various fittings. It addresses challenges such as false and repeated detections caused by a lack of contextual information. This method improves detection accuracy by incorporating structural knowledge into the model’s output. Study [6] proposes an algorithm for detecting hardware defects in transmission lines using YOLOv5. This algorithm significantly improves feature extraction and fusion by incorporating a feature extraction module based on the Swin Transformer [7] and the BiFPN [8] network within the YOLOv5 framework. Additionally, the algorithm employs DRConv to enhance key features across various semantic regions. Study [9] explores the use of deep learning for recognizing transmission line components and detecting defects. It utilizes the Fast-er-RCNN (Faster Region-Based Convolutional Neural Network) [10] model for the accurate recognition and classification of these components. Additionally, the document compares various network models based on their recognition accuracy and processing time for different transmission line components. This comparison highlights the effectiveness and reliability of deep learning methods in improving both the accuracy of component recognition and the efficiency of defect detection. This can enhance the efficiency and accuracy of transmission line inspection. Study [11] proposes a collaborative defect detection system. This system integrates edge computing with cloud computing. It employs a YOLO-Faster defect detection algorithm and uses PyQt5 to create an intuitive visual operation platform. This design allows unmanned aerial vehicles (UAVs) to conduct real-time monitoring and provide early warnings for defects during inspection processes. Study [12] presents a deep learning-based method for identifying the tilt of grading rings in transmission lines. This method addresses issues of low accuracy and imprecise tilt angle calculations in the image recognition of grading ring tilt. This approach first uses the Faster-RCNN model to detect the target locations of grading rings and insulator strings. It then employs a Cascaded Pyramid Network (CPN) to precisely locate multiple key feature points of the targets. Finally, the Text Box (TB) algorithm calculates the specific tilt angle of the grading rings. This method significantly improves the accuracy of tilt identification for grading rings in transmission lines.

Study [13] proposes a compact defect detection method for overhead transmission lines using convolutional neural networks. This method introduces a Three-Pathway Feature Fusion Network (TFFN) designed to preserve both spatial and semantic information through a cross-layer feature fusion mechanism. Additionally, it incorporates an enhanced receptive field attention (RFA+) module and a Context Perception Module (CPM). These components enable the network to effectively handle defects of varying sizes. In transmission line edge detection, there is a growing demand for high-speed and high-accuracy real-time detection algorithms. To address this need, Document [14] proposes a real-time defect detection method based on LEE-YOLOv7. This method is specifically designed for edge devices, with the goal of improving both detection efficiency and accuracy. This method first optimizes the input during the training phase using the Mosaic-9 data augmentation technique. It then introduces a Lightweight Convolutional Network, LCNet (Light Convolution Network) [15], to reduce the network’s burden. Finally, the network is deeply optimized with the Meta-ACON [16] activation function and the Wise-IoU [17] loss function. This approach achieves the precise detection of defects such as grading ring defects, suspended foreign objects, and bird nests in transmission lines.

In summary, the current research predominantly concentrates on detecting multiple types of defects in transmission lines, with comparatively few studies focusing exclusively on grading ring defects. Detecting grading ring defects presents several challenges. First, there are subtle visual differences between normal and defective grading rings. Additionally, extracting detailed defective features can be difficult. There is also a risk of losing critical defect details during the detection process. To address these challenges, this paper proposes a defect detection method for transmission line grading rings based on an improved YOLOv8 model. Based on the YOLOv8-M model, we first incorporated CloAttention (CloFormer Attention Mechanisms) [18] to enhance the C2f module in the feature extraction network. CloAttention aggregates both global and local information. The C2f module, a key part of the YOLOv8 architecture, includes multiple Bottleneck blocks. Each Bottleneck block features two convolutional layers that transform the feature maps and extract higher-level representations. The global branch of CloAttention learns long-range dependencies from deep features, while the local branch captures detailed regional information, thereby improving the model’s ability to extract features related to grading ring defects. Additionally, the paper incorporates CARAFE (Content-Aware ReAssembly of Features) [19], a universal, lightweight, and highly effective operator, to replace the original upsampling operation in YOLOv8. CARAFE uses low-level feature information to predict reassembly kernels and reconstruct features within predefined neighboring regions. This modification helps mitigate the loss of detailed information about grading ring defects during the fusion process, enabling the network to capture more comprehensive feature information.

The structure of this paper is organized into four main sections: Section 1 introduces the research background and status of grading ring defects. It highlights existing issues in defect detection. Section 2 details the methodology. It begins with an overview of the YOLOv8-M model’s architecture and then describes two specific enhancements made in this study. Section 2.1 and Section 2.2 elaborate on the necessity, structure, and methodology of these enhancements. Section 3 covers the datasets and evaluation metrics. Section 3.1 presents the experimental environment and datasets. Section 3.2 explains the evaluation metrics and their calculation methods. Section 4 showcases experimental results and analysis. Section 4.1 presents ablation experiments to verify the effectiveness of the proposed enhancements. Section 4.2 analyzes the confusion matrix results, demonstrating reduced missed and false detections. Section 4.3 conducts comparative experiments to validate the proposed model’s superiority over current mainstream detection models. Section 5 concludes the paper with a summary of the work and insights into future research directions.

2. Methods

The YOLOv8-M model is composed of three main components: the Backbone, the Neck, and the Head. The Backbone functions as the feature extraction network, tasked with extracting features from the input image and passing them to the Neck. The Neck acts as the feature fusion network, integrating features across multiple scales. The Head is responsible for predicting both the class and location of the targets, with each detection head comprising two branches dedicated to classification and regression tasks.

This paper enhances the YOLOv8-M model by integrating CloAttention into the C2f modules and replacing the last two C2f modules in the Backbone network. CloAttention combines shared and context-aware weights to process high-frequency local information and traditional attention mechanisms to manage low-frequency global information, thereby improving the model’s feature extraction capabilities. In the feature fusion network, CARAFE is used to replace the original upsampling module. CARAFE generates adaptive upsampling kernels based on the semantic information from grading ring defect feature maps, which helps mitigate the loss of detailed defect features. The improved YOLOv8-M structure is illustrated in Figure 1. In the diagram, “Conv” represents convolution, “k” denotes the size of the convolution kernel, and “s” stands for the stride. The C2f module consists of multiple Bottleneck blocks, each containing two convolutional layers. “C2f+Clo” indicates the integration of CloAttention into the Bottleneck layers of the C2f module, resulting in Bottleneck_Clo. “SPPF” stands for the Spatial Pyramid Pooling Fast module, while “CARAFE” represents the Content-Aware ReAssembly of Features. “Concat” denotes the concatenation of two feature maps, and “Detect” refers to the detection head, which utilizes two separate branches for object classification and bounding box regression prediction.

2.1. CloAttention

In the task of defect detection for transmission line grading rings, the precision of defect feature extraction is critical to the model’s ability to identify defects accurately. By applying CloAttention to the C2f modules at the end of the feature extraction network, the model’s feature representation capabilities are enhanced. This improvement helps the network better extract features from defect regions while minimizing interference from irrelevant features. CloAttention integrates global contextual information while preserving detailed local features, allowing the network to more effectively detect defects in grading rings. This approach provides the network with a deeper understanding and recognition of grading ring defects, thereby enhancing the overall comprehensiveness and accuracy of the detection.

CloAttention consists of two main components: a global branch and a local branch, as depicted in Figure 2. In the diagram, “LN” represents layer normalization, and “FC” denotes a fully connected layer. Moreover, “pool” refers to the pooling operation, a downsampling technique; “matmul” stands for matrix multiplication between two tensors; “softmax” indicates the normalization function; and “cat” represents the concatenation operation. The “mul” operation performs element-wise multiplications between tensors, also known as the Hadamard product. “DWconv” stands for depth-wise convolution. “Swish” and “Tanh” are both commonly used activation functions in deep learning.

The global branch employs traditional attention operations to help the model capture low-frequency global information. By effectively modeling the global context, it establishes long-range dependencies, enabling the transmission of feature information across distant locations and thereby enhancing the model’s detection accuracy. In contrast, the local branch uses both shared and context-aware weights to capture high-frequency local information. This design improves the model’s ability to manage relationships between different positions within the image, allowing for more precise feature extraction.

The global branch first performs a linear transformation on the input to obtain Q, K, and V. Then, it downsamples K and V to reduce the number of parameters. Finally, it applies standard attention processing to Q, K, and V to extract low-frequency global information, which is represented by Equation (1):

X_{g l o b a l} = Attntion (Q_{g}, Pool (K_{g}), Pool (V_{g}))

(1)

where

Q_{g}

,

K_{g}

, and

V_{g}

represent the query, key, and values obtained after linear transformation;

X_{g l o b a l}

represents the final low-frequency global information obtained. Attention denotes the standard attention operation, and Pool represents the downsampling operation.

The global branch can effectively capture low-frequency global information but lacks the capability to process high-frequency local information [20]. The local branch is used to address this issue. First, Q, K, and V are obtained through linear transformation, which is represented by Formula (2):

Q, K, V = FC (X_{i n})

(2)

In Formula (2), FC represents the fully connected layer, denotes the input after layer normalization, and

Q

,

K

, and

V

represent the query, key, and values obtained after the input goes through linear transformation.

Then, a depthwise convolution (DWconv) [21] is applied to aggregate local information, as represented by Formula (3):

V_{s} = DWconv (V),

(3)

Formula (3) represents the output after DWConv aggregates, where the weights of DWconv are globally shared.

After integrating the local information of V with shared weights, the context-aware weights are generated by combining Q and K. The specific steps involve using two DWconvs to aggregate the local information of Q and K separately, as represented by Formulas (4) and (5), respectively:

Q_{l} = DWconv (Q),

(4)

K_{l} = DWconv (K),

(5)

Formulas (4) and (5),

Q_{l}

and

K_{l}

represent the outputs after using DWConv to aggregate the information of

Q

and

K

respectively.

Next, the Hadamard product of

Q_{l}

and

K_{l}

is calculated, and the resulting output transforms FC, Swish, and FC. The transformed result is then divided by the number of token channels. The obtained result is passed through a Tanh function and then calculated with

V_{s}

using the Hadamard product and yielding context-aware weights ranging from −1 to 1. Finally, the generated weights are used to enhance local features. The specific process is represented by Formula (6):

X_{l o c a l} = Tan h (\frac{FC (Swish (FC (Q_{l})))}{\sqrt{d}}) ⊙ V_{s},

(6)

In Formula (6),

X_{l o c a l}

represents the final high-frequency local information obtained. Here, d is the number of token channels, and ⊙ denotes the Hardmard product. Tanh and Swish represent nonlinear functions that can generate higher-quality context-aware weights. The local branch utilizes both shared weights and context-aware weights. The context-aware weights enable the model to better adapt to the input content during the local perception process. Compared to local self-attention, the introduction of shared weights allows the model to better process high-frequency information, which can improve the performance of detecting defects in grading rings.

The outputs of the local branch and the global branch are concatenated along the channel dimension, and finally, a fully connected layer is connected, as represented by Formula (7):

X_{o u t} = FC (Concat (X_{l o c a l}, X_{g l o b a l})),

(7)

where

X_{o u t}

represents the final output after attention, and Concat denotes the concatenation operation.

2.2. CARAFE

Traditional upsampling methods, such as nearest neighbor or bilinear interpolation, rely solely on the spatial positions of pixels to determine the upsampling kernel. These methods do not utilize the semantic information of feature maps and generally have a limited receptive field. By replacing the original upsampling module in YOLOv8-M with CARAFE, we can expand the receptive field and incorporate additional contextual information from the image. This enhancement improves the feature pyramid network’s performance and reduces the loss of grading ring defect information during the fusion process, leading to the more accurate detection of defect features.

CARAFE primarily consists of two components: an upsampling kernel prediction module and a content-aware reassembly module [20], as illustrated in Figure 3. The upsampling kernel prediction module first utilizes a

1 \times 1

convolution to compress the channels of the input feature map of size

H \times W \times C

from

C

to

C_{m}

. It then employs a

k_{e n c o d e r} \times k_{e n c o d e r}

convolution layer to predict the upsampling kernels, transforming the channel count from

C_{m}

to

σ^{2} k_{u p}^{2}

. Following this, the channel dimension is expanded along the spatial dimension, generating upsampling kernels of size

σ H \times σ W \times k_{u p}^{2}

. Finally, these upsampling kernels undergo softmax normalization, ensuring that the sum of convolution kernel weights equals 1. In the content-aware reassembly module, each position in the output feature map is mapped back to the input feature map, and the region centered around that position is extracted. This region is then dot-produced with the predicted upsampling kernel for that point, resulting in the output value. Different channels at the same position share the same upsampling kernel.

3. Datasets and Evaluation Indicators

3.1. Experimental Environment and Data

The experiments in this paper were conducted on an Ubuntu 16.04 operating system with i7-5930K CPU and 1080Ti GPU. The deep learning framework used was PyTorch 1.9.0. The experimental environment is detailed in Table 1. Pre-trained weights were not used during training. The model was trained for 250 epochs with a batch size of 8.

The experimental dataset used in this paper comprises aerial images of transmission lines, totaling 1272 images of grading rings, including both normal and defective types. To effectively contrast defective samples with normal ones and enhance the model’s ability to learn defective features, normal samples were included alongside defective ones. The dataset was divided into a training set and a test set with an 80:20 ratio, resulting in 1018 images for training and 254 images for validation. Image labeling was performed using the open-source software Labelme 5.5.0.

3.2. Evaluation Indicators

This paper uses precision (P), recall (R), and mean average precision (mAP) as the evaluation metrics for the model. Precision measures the proportion of correctly predicted positive samples out of all samples predicted to be positive. The recall represents the proportion of correctly predicted positive samples among all actual positive samples. Specifically, mAP@50% reflects the average detection accuracy with the Intersection over the Union (IoU) threshold set at 0.5. The calculation formula is as follows:

P = \frac{T P}{T P + F P}

(8)

R = \frac{T P}{T P + F N}

(9)

A P = \int_{0}^{1} P (R) d R

(10)

m A P = \frac{1}{N} \sum_{i = 1}^{N} \int_{0}^{1} P_{i} (R_{i}) d R

(11)

Here,

T P

denotes the number of correctly predicted samples,

F P

refers to the number of falsely predicted samples,

F N

indicates the number of missed samples, and

N

represents the number of detection categories.

To evaluate the model’s performance, we used the number of model parameters (Params) to assess the model size, floating-point operations (FLOPs) to gauge computational complexity, and frames per second (FPS) to measure real-time detection performance.

4. Experimental Results and Analysis

4.1. Results and Analysis of Ablation Experiments

The baseline model must meet the fundamental characteristics and requirements of the power industry. The detection needs in this sector are highly specific and focus primarily on identifying faults in transmission lines, electrical equipment, and other related targets. In this context, the accuracy of the model is often prioritized over real-time performance. Incorrect detection results can lead to false alarms or missed detections, which can impact the stable operation of the power system. Given these requirements, this article selects the YOLOv8-M model as the baseline model.

To evaluate the impact of different modules incorporated in the proposed algorithm, this study conducted three ablation experiments. These experiments progressively added improvements to the baseline model to assess the effectiveness of each enhancement. The results are summarized in Table 2, where “√” indicates the inclusion of a particular method in the experiment. Method A represents the baseline model, specifically the YOLOv8-M model. Method B involved adding CloAttention to the YOLOv8-M model, while Method C incorporated both CloAttention and CARAFE into the YOLOv8-M model.

As shown in Table 2, adding both CloAttention and replacing the original upsampling operation with CARAFE improved the model’s ability to learn transmission line grading ring defect features, thereby enhancing its detection accuracy compared to the baseline YOLOv8-M model.

Compared to the baseline model (Method A), adding CloAttention (Method B) improved the detection accuracy for normal grading rings and defective grading rings by 1.1% and 1.2%, respectively, resulting in an average accuracy increase of 1.2%. When both CloAttention and CARAFE were added (Method C), the detection accuracy for normal and defective grading rings increased by 0.6% and 5.6%, respectively, leading to an average accuracy improvement of 3.1% compared to Method B. Overall, using both enhancements simultaneously (Method C) compared to the baseline model (Method A) improved the detection accuracy for normal and defective grading rings by 1.7% and 6.8%, respectively, with an average accuracy increase of 4.3%. Among these improvements, the challenge of extracting grading ring defect features significantly impacted the model’s detection capability. Incorporating the CloAttention mechanism into the deep feature extraction network allowed for the fusion of global context and local detail information, thereby enhancing the model’s ability to extract defective features. Additionally, replacing the traditional upsampling operation in the feature fusion network with CARAFE helped reduce the loss of defect feature information during upsampling. CARAFE improved the transmission of feature information, enriched the quality of fused features, and subsequently boosted the model’s recognition capability for grading ring defects. The combination of these two enhancements led to a substantial increase in the accuracy of grading ring damage detection. The modifications made lead to adjustments and computations across multiple layers of the network. As a result, there was an increase in the number of parameters, computational complexity, and detection time. Specifically, the number of parameters in the improved model rose from 25.8 M to 27.4 M. The computational cost increased from 78.7 GFLOPs to 81.3 GFLOPs. Additionally, FPS (frames per second) decreased from 91.7

f \cdot s^{- 1}

to 37.7

f \cdot s^{- 1}

. This paper prioritizes accuracy over detection speed.

4.2. Confusion Matrix

Figure 4 presents the confusion matrices for YOLOv8-M, and the enhanced YOLOv8-M with CloAttention and CARAFE (the proposed method) applied to the transmission line grading ring dataset. These matrices offer an intuitive view of missed and false detections. In Figure 4, the diagonal values of the confusion matrices show the proportion of samples correctly classified by each model. However, it is crucial to note that these diagonal values were not directly used to calculate the average precision (AP). The off-diagonal values, on the other hand, indicate the proportion of samples that were misclassified. An ideal confusion matrix would have all values along the diagonal, with zeros elsewhere.

The confusion matrix for this dataset is a 3 × 3 matrix. In this matrix, each column represents a true class, and each row represents a predicted class. For example, in Figure 4a, the value “0.35” in row 3, column 2, indicates that 35% of defective grading rings (the true class) were incorrectly classified as normal grading rings (the predicted class). This highlights a significant case of misclassification.

The comparison between Figure 4a,b reveal that the proposed method significantly improves the classification of defective grading rings. The proportion of correctly classified defective grading rings increases from 46% to 60%. Additionally, the proportion of defective grading rings misclassified as normal grading rings decreases from 35% to 23%. This method also reduces the proportion of the background that was incorrectly classified as defective grading rings from 19% to 17%. The experimental results indicate that the proposed method effectively reduced both the missed detections and false detections of defective grading rings.

4.3. Comparative Experiments

To further validate the effectiveness of the proposed method, a comparison was conducted with four mainstream detection models: YOLOv5-M, YOLOv6-M [22], YOLOv7 [23], and YOLOv9-C. The results of this comparison are presented in Table 3.

According to the analysis presented in Table 3, the YOLOv8-M model strikes the best balance between accuracy and detection speed among the compared models, making it the chosen baseline for this study. The challenge in detecting grading ring defects lies in its subtle visual differences from normal grading rings, which makes the defect features less prominent and lacking in important global context information. Additionally, traditional upsampling operations can lead to a loss of defect feature information, diminishing the quality of fused features and resulting in a lower average precision rate. To address these issues, this study incorporated CloAttention and CARAFE into the YOLOv8-M model. The test results demonstrate that this enhanced model achieves the highest scores in both mAP50 and mAP50:95 metrics. Specifically, the average mAP50 for normal and defective grading rings is 77.9%, with an mAP50:95 of 47.3%. For defective grading rings, the mAP50 reaches 67.6%, and the mAP50:95 stands at 36.6%.

The proposed method achieves superior detection accuracy for defective grading rings compared to YOLOv5-M, YOLOv6-M, YOLOv7, YOLOv8-M, and YOLOv9-C, with improvements of 9%, 10%, 15.7%, 6.8%, and 7.6%, respectively. The model’s parameters and GFLOPS are 27.4 M and 81.3 G, respectively, placing it in the mid-range for single-stage object detection algorithms. Due to the integration of CloAttention and CARAFE, the network involves additional adjustments and computations across multiple layers. Despite a slight reduction in computational parameters and detection speed, with an FPS of 37.7

f \cdot s^{- 1}

, the proposed method achieves the highest model accuracy on the grading ring dataset.

4.4. Visualization of the Results of Different Methods

To gain a more intuitive understanding of the improved algorithm’s performance, we conducted further analysis. This analysis focused on the localization effects of detection bounding boxes. We selected test images from various models for this purpose. By examining these images, we evaluated the accuracy and precision of the bounding boxes. The results of the grading ring image detection across different network models are visualized in Figure 5.

When testing the YOLOv5-M model on images from Group ① and Group ④, overlapping detection boxes were observed. Although the model successfully detected the defective grading ring on the right side of images in Group ②, the fitting of the detection box was suboptimal. During testing on images from Group ③, the defective grading ring on the right side was not detected, resulting in a missed detection. When testing Group ① images with the YOLOv6-M model, a misdetection occurred, mistakenly identifying the normal grading ring on the right as defective. During the testing on Group ③ images, there were overlapping detections, and the defective grading ring on the right was not detected. For the Group ④ images, a positioning repetition issue was observed. When testing Group ① and Group ② images with the YOLOv7 model, other components were sometimes mistakenly detected as grading rings. During testing on Group ③ images, the defective grading ring on the right was not detected, and other components were incorrectly identified as grading rings. For Group ④ images, the detection boxes did not align properly with the targets. With the YOLOv8-M model, during testing on Group ① images, detection boxes overlapped, with both defective and normal grading rings detected simultaneously on the right side. For Group ② images, while the defective grading ring on the right was detected, the detection box did not fully encompass the entire defective grading ring. In Group ③ images, the defective grading ring on the right was missed entirely, resulting in a missed detection. The testing of Group ④ images also revealed overlapping detection boxes. The YOLOv9-C model experienced overlapping detection boxes during testing on Group ① images, where the right-side target contained both defective and normal detection boxes simultaneously. For Group ② images, the defective grading ring on the right was undetected. In the testing of Group ③ images, although the defective grading ring on the right was detected, the detection box did not align properly with the target.

However, when applying the method described in this article to Figure 5, the detection results for these images improved significantly. All four groups of images were detected accurately, with no missed detections, misdetections, or overlapping detection boxes. This indicates that the proposed algorithm is more effective in capturing the defective features of grading rings, with enhanced contextual awareness and feature fusion capabilities. The improved model, thus, demonstrates superior accuracy in locating and identifying defects, outperforming other mainstream detection methods.

5. Conclusions

To address the challenges of detecting grading ring defects in the unmanned aerial vehicle (UAV) aerial photography of transmission lines, this paper proposes an enhanced detection method based on an improved YOLOv8 model. This method tackles the issues of minimal visual differences between normal and defective grading rings and the difficulties in feature extraction by integrating CloAttention into the feature extraction network. CloAttention improves the capture of both global and local information, leading to more discriminative features for detecting grading ring defects. To further address the issue of detail loss during feature fusion, CARAFE is introduced to replace the original upsampling operation in the feature fusion network. By leveraging additional image information, CARAFE significantly enhances the performance of the feature pyramid network, reducing the loss of defective features and substantially improving the model’s accuracy. Experimental results demonstrate that the proposed method achieves an average accuracy of 67.6% for detecting defective grading rings, marking a 6.8% improvement over the original model. This performance surpasses that of other mainstream object detection algorithms, making the proposed method highly effective for grading ring defect detection in transmission line aerial images and providing valuable support for intelligent transmission line inspection. In the dataset training described in this paper, pre-trained weights were not utilized, and the number of normal samples was not balanced with defective samples. If more datasets become available in the future, domain adaptation methods could enhance the performance of the target domain model. These methods could leverage information-rich source domain samples to improve results. Additionally, the combined structure of various fittings in transmission lines exhibits certain regularities. For instance, the structure composed of grading rings and heavy hammers is commonly used to counteract galloping in transmission lines. This combination helps prevent the excessive galloping amplitude of the conductor, which could otherwise lead to line tripping. It is evident that damage to the grading rings typically occurs near the heavy hammer. Effectively utilizing this structural knowledge could enhance the model’s detection capabilities. Generally, there is a trade-off between model accuracy and speed. Improving accuracy often requires an increase in computational resources, the number of parameters, and detection time. Future work may need to address this trade-off to achieve better performance.

Author Contributions

Conceptualization, S.X. and L.Z.; methodology, S.X. and L.Z.; software, Y.C.; validation, Y.C. and P.D.; formal analysis, S.X. and P.D.; investigation, Y.W. and Y.X.; resources, S.X.; data curation, S.X.; writing—original draft preparation, Y.X.; writing—review and editing, B.L. visualization, L.Z.; supervision, B.L.; project administration, Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Science and Technology Project of State Grid Sichuan Electric Power Company (No. 52199723000Q).

Data Availability Statement

We apologize that the power data cannot be disclosed due to their particularity and confidentiality; further inquiries can be directed to the corresponding authors.

Conflicts of Interest

Authors Siyu Xiang, Linghao Zhang and Yumin Chen were employed by the State Grid Sichuan Electric Power Research Institute. Authors Peike Du and Yao Wang were employed by the State Grid Sichuan Liangshan Electric Power Company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Correction Statement

This article has been republished with a minor correction to the Funding statement. This change does not affect the scientific content of the article.

Abbreviations

Abbreviation	Full Name
LN	Layer normalization
FC	Fully connected layer
pool	Pooling
DWconv	Depth-wise convolution
cat	Concat
mul	Hadamard product

References

Rodgers, W.; Cardenas, J.A.; Gemoets, L.A.; Sarfi, R.J. A smart grids knowledge transfer paradigm supported by experts’ throughput modeling artificial intelligence algorithmic processes. Technol. Forecast. Soc. Chang. 2023, 190, 122373. [Google Scholar] [CrossRef]
Zhao, Z.B.; Jiang, Z.G.; Li, Y.X.; Qi, Y.C.; Zhai, Y.J.; Zhao, W.Q.; Zhang, K. Overview of visual defect detection of transmission line components. J. Image Graph. 2021 26, 2545–2560.
Luo, P.; Wang, B.; Wang, H.; Ma, F.; Ma, H.; Wang, L. An ultrasmall bolt defect detection method for transmission line inspection. IEEE Trans. Instrum. Meas. 2023, 72, 1–12. [Google Scholar] [CrossRef]
Li, L.R.; Chen, P.; Zhang, Y.L.; Mei, B.; Gong, P.C.; Yu, H.J. Detection of power devices and abnormal objects in transmission lines based on improved CenterNet. High Volt. Eng. 2023, 49, 4757–4765. [Google Scholar]
Zhao, Z.B.; Xiong, J.; Xu, H.D.; Zhang, J.H. Integrating Structural Reasoning for Deep Model Transmission Line Fittings and Their Defects Detection. High Volt. Eng. 2023, 8, 3346–3353. [Google Scholar]
Yang, J.; Zhang, K.; Shi, C.; Zheng, F. SBD-YOLOv5: An Enhanced YOLOv5-Based Method for Transmission Line Fitting Defect Detection. In Proceedings of the 2023 China Automation Congress (CAC), Chongqing, China, 17–19 November 2023; pp. 8090–8095. [Google Scholar]
Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790. [Google Scholar]
Tang, Y.; Han, J.; Wei, W.L.; Ding, J.; Peng, X.J. Research on part recognition and defect detection of transmission line in deep learning. Electron. Meas. Technol. 2018, 41, 60–65. [Google Scholar]
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar] [CrossRef] [PubMed]
Lu, L.; Chen, Z.; Wang, R. Research on Transmission Line Defect Detection System Based on Cloud-Edge-End Collaboration. In Proceedings of the 2023 China Automation Congress (CAC), Chongqing, China, 17–19 November 2023; pp. 1829–1834. [Google Scholar]
Dai, Y.D.; Jiang, Z.J.; Wang, M.F.; Chen, S.H. Deep Learning Based Transmission Line Mean-Voltage Ring Tilt Identification. Process Autom. Instrum. 2022, 43, 106–110. [Google Scholar]
Fu, Q.; Liu, J.; Zhang, X.; Zhang, Y.; Ou, Y.; Jiao, R.; Li, C.; Mazzanti, G. A Small-Sized Defect Detection Method for Overhead Transmission Lines Based on Convolutional Neural Networks. IEEE Trans. Instrum. Meas. 2023, 72, 3524612. [Google Scholar] [CrossRef]
Hu, C.L.; Pei, S.T.; Liu, Y.P.; Yang, W.J.; Yang, R.; Zhang, H.Y.; Liu, H.F. Real-time Defect Detection Method for Transmission Line Edge end Based on LEE-YOLOv7. High Volt. Eng. 2024, 1–14. [Google Scholar]
Cui, C.; Gao, T.; Wei, S.; Du, Y.; Guo, R.; Dong, S.; Liu, B.; Zhou, Y.; Lv, X.; Liu, Q.; et al. PP-LCNet: A lightweight CPU convolutional neural network. arXiv 2021, arXiv:2109.15099. [Google Scholar]
Bhattacharyya, S.; Mtsuko, D.; Allen, C.; Coleman, C. Effects of Rashba-spin–orbit coupling on superconducting boron-doped nanocrystalline diamond films: Evidence of interfacial triplet superconductivity. New J. Phys 2020, 22, 093039. [Google Scholar] [CrossRef]
Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding box regression loss with dynamic focusing mechanism. arXiv 2023, arXiv:2301.10051. [Google Scholar]
Fan, Q.; Huang, H.; Guan, J.; He, R. Rethinking local perception in lightweight vision transformer. arXiv 2023, arXiv:2303.17803. [Google Scholar]
Wang, J.; Chen, K.; Xu, R.; Liu, Z.; Loy, C.C.; Lin, D. Carafe: Content-aware reassembly of features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3007–3016. [Google Scholar]
Si, C.; Yu, W.; Zhou, P.; Zhou, Y.; Wang, X.; Yan, S. Inception transformer. Adv. Neural Inf. Process. Syst. 2022, 35, 23495–23509. [Google Scholar]
Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]

Figure 1. Improvement of YOLOv8-M structure diagram.

Figure 2. Structural diagram of CloAttention.

Figure 3. Structural diagram of CARAFE.

Figure 4. Test results of the confusion matrix.

Figure 5. Test results of different models.

Table 1. The parameters of the experimental platform.

Parameter	Configure
Operating System	Ubuntu 16.04
Deep Learning Framework	Pytorch 1.9.0.
CPU Model	i7-5930K
Graphics Card (GPU) Model	1080Ti
CUDA	10.2
Programming Language	Python 3.8

Table 2. Results of ablation experiments.

Method	YOLOv8-M	CloAttention	CARAFE	mAP@0.5/%			$Param /$ $1 0^{6}$	$FLOPs / 10^{9}$
Method	YOLOv8-M	CloAttention	CARAFE	NormalGrading Ring	Defective GradingRing	All	$Param /$ $1 0^{6}$	$FLOPs / 10^{9}$
A	√			86.4	60.8	73.6	25.8	78.7
B	√	√		87.5	62.0	74.8	27.3	81.0
C	√	√	√	88.1	67.6	77.9	27.4	81.3

Table 3. Detection performance of different models.

Model	Detection Target	mAP50/%	mAP50:95/%	$Params / 1 0^{6}$	$FLOPs / 10^{9}$	$FPS / (f \cdot s^{- 1})$
YOLOv5-M	Normal and defective grading ring	72.2	46.4	20.8	47.9	142.8
YOLOv5-M	Defective grading ring	58.6	35.4	20.8	47.9	142.8
YOLOv6-M	Normal and defective grading ring	71.7	46.0	34.8	85.6	111.7
YOLOv6-M	Defective grading ring	57.6	35.7	34.8	85.6	111.7
YOLOv7	Normal and defective grading ring	68.3	40.2	36.4	103.2	65.3
YOLOv7	Defective grading ring	51.9	28.1	36.4	103.2	65.3
YOLOv8-M	Normal and defective grading ring	73.6	46.2	25.8	78.7	91.7
YOLOv8-M	Defective grading ring	60.8	35.5	25.8	78.7	91.7
YOLOv9-C	Normal and defective grading ring	73.9	47.0	50.7	236.6	43.3
YOLOv9-C	Defective grading ring	60.0	35.5	50.7	236.6	43.3
Ours	Normal and defective grading ring	77.9	47.3	27.4	81.3	37.7
Ours	Defective grading ring	67.6	36.6	27.4	81.3	37.7

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Xiang, S.; Zhang, L.; Chen, Y.; Du, P.; Wang, Y.; Xi, Y.; Li, B.; Zhao, Z. A Defect Detection Method for Grading Rings of Transmission Lines Based on Improved YOLOv8. Energies 2024, 17, 4767. https://doi.org/10.3390/en17194767

AMA Style

Xiang S, Zhang L, Chen Y, Du P, Wang Y, Xi Y, Li B, Zhao Z. A Defect Detection Method for Grading Rings of Transmission Lines Based on Improved YOLOv8. Energies. 2024; 17(19):4767. https://doi.org/10.3390/en17194767

Chicago/Turabian Style

Xiang, Siyu, Linghao Zhang, Yumin Chen, Peike Du, Yao Wang, Yue Xi, Bing Li, and Zhenbing Zhao. 2024. "A Defect Detection Method for Grading Rings of Transmission Lines Based on Improved YOLOv8" Energies 17, no. 19: 4767. https://doi.org/10.3390/en17194767

APA Style

Xiang, S., Zhang, L., Chen, Y., Du, P., Wang, Y., Xi, Y., Li, B., & Zhao, Z. (2024). A Defect Detection Method for Grading Rings of Transmission Lines Based on Improved YOLOv8. Energies, 17(19), 4767. https://doi.org/10.3390/en17194767

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Defect Detection Method for Grading Rings of Transmission Lines Based on Improved YOLOv8

Abstract

1. Introduction

2. Methods

2.1. CloAttention

2.2. CARAFE

3. Datasets and Evaluation Indicators

3.1. Experimental Environment and Data

3.2. Evaluation Indicators

4. Experimental Results and Analysis

4.1. Results and Analysis of Ablation Experiments

4.2. Confusion Matrix

4.3. Comparative Experiments

4.4. Visualization of the Results of Different Methods

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Correction Statement

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI