1. Introduction
Overhead power lines play a crucial role in power transmission systems, which are responsible for delivering electrical energy. However, these power lines face various natural and man-made threats, such as lightning strikes, wind and snow, birds, trees, pollution, and corrosion [1,2]. These threats can cause damage to the power lines, thereby affecting the safety and stability of power transmission. For example, lightning strikes are a common natural threat that can cause transient overvoltages in power lines, trigger arc discharges, and damage power lines and insulators. Wind and snow can increase the mechanical load on power lines, even leading to breakage. Birds may nest on power lines, causing short circuits. Trees may come into contact with power lines, causing grounding. Pollution can reduce the insulating performance of power lines, triggering breakdown discharges. Corrosion can reduce the mechanical strength of power lines, affecting their service life [3,4,5,6,7]. These threats require us to take effective preventive and responsive measures. Therefore, the timely and effective detection of damage to overhead power lines is an important measure to ensure the operation of the power grid.
In past research, the detection of damage to overhead power lines has attracted wide attention. Numerous approaches have been proposed, ranging from traditional techniques to emerging ones such as CNN, GAN, and deformable visual transformations [8,9,10]. However, these methods often face a series of challenges and limitations when dealing with the complex environment and diverse damage features of overhead power lines.
In the field of overhead power line damage detection, traditional techniques currently rely mainly on manual or drone methods for detection [11,12]. However, these methods have certain shortcomings. Manual detection is cumbersome, time consuming, and dangerous; it relies on the subjective judgment of the inspector and is difficult to adapt to complex and changing environments. Drone detection can cover a large area quickly and safely, but it is costly and greatly affected by the environment, and its data processing is complex and requires professional operation and maintenance [13,14,15].
In the research on CNN, GAN, and deformable visual transformations, although these techniques have explored overhead power line damage detection, there are also some obvious shortcomings. First, traditional convolutional neural networks (CNNs) may have limitations in capturing multi-scale, multi-shape damage features [16]. In the complex environment of overhead power lines, damage features may have huge variations, and traditional CNN architectures may struggle to effectively adapt to this diversity. Second, although generative adversarial networks (GANs) have made significant progress in image generation and processing tasks, they may be restricted by training data and the complexity of damage features in actual power line damage detection. The training process of GANs may require a large amount of annotated data [17], and in power line damage detection, obtaining a large-scale and accurately annotated dataset may be challenging. In addition, deformable visual transformation methods, although they have a certain degree of flexibility, may still have some limitations in dealing with the diversity and complexity of power line damage. These methods may need more powerful and intelligent feature extraction and transformation mechanisms to adapt to the various shapes and scene changes of power line damage. Therefore, it is particularly urgent to use computer vision technology to achieve the automatic, intelligent, and efficient damage detection of overhead power lines.
In recent years, deep learning has made major breakthroughs in the field of computer vision, especially object detection technology, which has shown strong performance and application potential in multiple fields [18,19]. The purpose of object detection technology is to locate and identify objects of interest in images, usually including two steps: object localization and object classification. Object localization is used to find out the location and range of objects in the image, which are usually represented by bounding boxes; object classification is used to judge which category the object belongs to, and it is usually represented by category labels. Object detection technology can be divided into two types: region-based methods and regression-based methods. Region-based methods first generate some candidate regions; then, they classify and regress each region to obtain the final detection result. Regression-based methods directly regress the position and category of objects from the image, do not generate candidate regions [20], and are therefore faster and more efficient. YOLO (You Only Look Once) is a typical regression-based object detection method [21,22,23]. It divides the image into multiple grids, predicts a fixed number of bounding boxes and category probabilities for each grid, and then filters out the final detection result based on confidence. YOLO has the advantages of fast speed, high accuracy, and strong generalization ability, and it has become one of the mainstream methods in the field of object detection.
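The confidence-based filtering step mentioned above can be illustrated with a minimal sketch. It is not YOLO's actual post-processing (which also includes non-maximum suppression); the `Detection` type, the `filter_by_confidence` helper, the example boxes, and the 0.5 cutoff are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One candidate detection (hypothetical container, not a YOLO type)."""
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    label: str
    confidence: float


def filter_by_confidence(dets: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Keep only detections whose confidence is at least `threshold`."""
    return [d for d in dets if d.confidence >= threshold]


# Made-up candidates for illustration only.
candidates = [
    Detection((10, 20, 50, 60), "break", 0.91),
    Detection((12, 22, 48, 58), "break", 0.32),
    Detection((100, 40, 140, 90), "lightning strike", 0.67),
]
kept = filter_by_confidence(candidates)
```

Raising the cutoff trades recall for precision: fewer low-confidence boxes survive, so fewer false alarms are reported but more true damage may be missed.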
The adaptive threshold mechanism is a dynamic technique in image processing that adjusts the detection sensitivity based on the brightness, contrast, and other features of the input image [24,25]. In overhead power line damage detection, changes in illumination, weather conditions, and other factors under different environmental conditions may affect the feature performance of the image. Therefore, introducing an adaptive threshold mechanism can help the model better adapt to different environments and improve the robustness of the model. This mechanism can learn the feature changes of the image during the training process so that the model can effectively adjust the threshold under different scenarios, thereby improving the accuracy and robustness of damage detection.
GSConv is a convolution method that introduces a more flexible way of information interaction in the model through grouping and random permutation [26,27]. In overhead power line damage detection, images may contain damage features of different scales and shapes, and traditional convolution methods may struggle to fully capture these diverse features. GSConv enhances the model’s perception of different features through grouping and random permutation, improving the model’s detection accuracy. At the same time, GSConv effectively balances the computational burden of the model while maintaining the model’s running speed, making the model more suitable for the task of overhead power line damage detection in actual scenarios.
Slim Neck is a lightweight network structure designed to reduce the complexity and computational burden of the model while maintaining good performance [28,29]. In overhead power line damage detection, due to the large amount of data and the need for the model to perform inference in real-time or near real-time scenarios, lightweight network structures become particularly important. Slim Neck optimizes the network structure, reduces redundant parameters and computational units, and achieves improved model running efficiency while maintaining detection accuracy [30]. This lightweight design makes the model more operable and better able to meet the requirements of actual applications.
This paper aims to propose an enhanced version of YOLOv8 for overhead power line damage detection. The adaptive threshold mechanism, GSConv, and Slim Neck are integrated into the YOLOv8 framework to form a new architecture. The adaptive threshold mechanism can dynamically adjust the detection threshold based on the brightness, contrast, and other features of the input image, thereby improving the robustness of the model. GSConv is a novel convolution method that can balance the running speed of the model while maintaining model accuracy. Slim Neck is a lightweight network structure that can reduce the complexity and computational burden of the model while maintaining good performance.
The main contributions of this paper are as follows. (1) For the problem of high-voltage transmission line damage detection, an enhanced version of YOLOv8 is proposed. This method integrates the adaptive threshold mechanism, GSConv, and Slim Neck into the YOLOv8 framework, thereby improving the robustness, speed, and efficiency of the model. (2) Comprehensive experiments were conducted on the high-voltage transmission line damage detection dataset. These experiments verified the effectiveness of the proposed method and, through comparison with existing methods, demonstrated its advantages and characteristics.
The structure of this paper is arranged as follows:
Section 2 describes the proposed method in detail, including the design and implementation of the adaptive threshold mechanism, GSConv, and Slim Neck.
Section 3 reports the experimental results and analysis, including the dataset, evaluation indicators, experimental settings, experimental results, and comparative experiments.
Section 4 provides a summary of the proposed method and makes predictions about its potential usefulness. It also highlights the strengths and weaknesses of the method, as well as the directions for further research that we plan to undertake.
2. Optimized YOLO v8 Technique for Defect Identification
This section introduces an optimized YOLOv8 technique for defect detection. First, the YOLOv8 algorithm is outlined, followed by a discussion of the application of adaptive threshold processing in power line damage recognition. Subsequently, an enhancement of YOLOv8 based on GSConv is presented. Finally, overhead power line damage detection based on Slim Neck is explored.
2.1. Overview of the YOLO v8 Algorithm
YOLOv8 is the latest version of an object detection algorithm known as “You Only Look Once”, which was proposed by Ultralytics in 2023. The name implies that this algorithm only needs to look at an image once to identify the objects within it, giving it a significant speed advantage.
Compared to previous versions (such as YOLO v3, v4, v5, YOLOX, and v7), YOLOv8 has improved in terms of accuracy and detection precision. This is mainly due to its three main components: input, backbone, and head [31].
‘Input’ is the image we want to detect. ‘Backbone’ is a neural network whose task is to extract useful information, known as “features”, from the input image. These features can help us understand what the objects in the image are and where they are located. ‘Head’ is another neural network whose task is to use the features extracted by the backbone to predict the type and location of objects in the image. Although YOLOv8 still uses the model structure of YOLO v5, it has been improved in many ways, making it superior in terms of developer experience and architecture. For example, YOLOv8 introduces new techniques such as spatial attention, feature fusion, and context aggregation. These techniques enable YOLOv8 to detect objects in images faster and more accurately, making it a key technology in the field of object detection.
Figure 1 contains a structure diagram of the improved YOLOv8 model [32,33].
Advantages: The YOLOv8 model surpasses other YOLO series models in terms of accuracy and detection precision. By introducing spatial attention, feature fusion, and context aggregation modules, it enables faster and more accurate object detection.
Disadvantages: Despite the improvements made in many aspects of YOLOv8, there are still some limitations. For instance, its ability to detect small targets needs to be improved, and its adaptability to complex backgrounds also needs further optimization.
2.2. Application of Adaptive Thresholding in Power Line Damage Identification
In this research, an adaptive threshold mechanism is introduced that can dynamically adjust the detection threshold [34]. This mechanism adjusts the threshold based on the ‘brightness’ and ‘contrast’ characteristics of the input image, enabling the model to adapt to various environmental conditions and thereby enhancing the robustness of the model.
It is noted that the brightness and contrast of the input image significantly affect the detection threshold. Therefore, an algorithm is designed that dynamically adjusts the detection threshold based on the brightness and contrast of the input image. If the brightness of the input image is high, the detection threshold is correspondingly increased to avoid false detections caused by overly bright pixels. Conversely, if the brightness of the input image is low, the detection threshold is lowered to avoid missing detections due to overly dark pixels. As shown in Figure 2, the image is first read, and its brightness and contrast are calculated. Then, two histograms are constructed to visualize these parameters: one for displaying brightness and the other for displaying contrast. In the histogram of brightness, the average value of brightness is marked with a red dashed line. In the histogram of contrast, the range of contrast, i.e., the standard deviation of brightness, is marked with two blue dashed lines. This visualization method can help better understand the distribution of brightness and contrast in the image, thereby providing an intuitive basis for the adaptive threshold mechanism.
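The brightness and contrast statistics described above can be computed along the following lines. This is a minimal sketch that assumes an 8-bit grayscale image held in a NumPy array, takes brightness as the mean intensity and contrast as the standard deviation of intensity, and uses a synthetic image; the function name `brightness_and_contrast` and the bin count are hypothetical.

```python
import numpy as np


def brightness_and_contrast(gray: np.ndarray) -> tuple:
    """Return (mean brightness, contrast) of an 8-bit grayscale image.

    Contrast is taken here as the standard deviation of the pixel intensities.
    """
    pixels = gray.astype(np.float64)
    return float(pixels.mean()), float(pixels.std())


# Synthetic 4x4 image: half of the pixels at 50, half at 150.
image = np.array([[50, 150] * 2] * 4, dtype=np.uint8)
brightness, contrast = brightness_and_contrast(image)

# Intensity counts in 16 equal-width bins over 0..255, i.e. the data
# that a brightness histogram would display.
hist, bin_edges = np.histogram(image, bins=16, range=(0, 256))
```

In a plot of `hist`, the mean could be marked as a vertical line at `brightness`, and the spread as two lines at `brightness - contrast` and `brightness + contrast`.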
To implement this dynamic threshold adjustment, a base threshold T0 is first defined. Then, T0 is adjusted based on the brightness B and contrast C of the input image to obtain the final detection threshold T. This adjustment process can be represented by the following formula:
In this formula, k1 = 0.1, k2 = 0.2, and T0 = 0.5 are set. These three parameters are obtained through experiments and optimization. They determine the degree of influence of brightness and contrast on the threshold.
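As an illustration of how such an adjustment could behave, the sketch below assumes a simple linear form, T = T0 + k1·b + k2·c, where b and c are the brightness and contrast rescaled to roughly unit range. Both the linear form and the rescaling are assumptions made for this example and are not taken from Formula (1); only the values T0 = 0.5, k1 = 0.1, and k2 = 0.2 come from the text. The function name `adaptive_threshold` is hypothetical.

```python
def adaptive_threshold(brightness: float, contrast: float,
                       t0: float = 0.5, k1: float = 0.1, k2: float = 0.2) -> float:
    """Return a detection threshold adjusted by image brightness and contrast.

    ASSUMPTION: the linear form and the normalisation below are illustrative
    only; they are not the paper's Formula (1).
    """
    b = brightness / 255.0   # brightness rescaled to [0, 1]
    c = contrast / 127.0     # contrast rescaled to roughly [-1, 1]
    t = t0 + k1 * b + k2 * c
    return min(max(t, 0.0), 1.0)  # keep the threshold within [0, 1]


dark = adaptive_threshold(brightness=40.0, contrast=20.0)
bright = adaptive_threshold(brightness=220.0, contrast=20.0)
```

Under this assumed form, a brighter image yields a higher threshold than a darker one with the same contrast, which matches the qualitative behavior described in the text.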
The parameters in Formula (1) were determined empirically: starting from a base threshold T0 chosen according to empirical values, a series of experiments was conducted. The experimental process and results are outlined below.
As illustrated in Table 1, a series of experiments were conducted where the base threshold T0 was varied while keeping k1 and k2 constant. The experimental results indicate that when T0 is set to 0.5, the detection accuracy rate reaches its peak at 89.7%. However, as T0 continues to increase, the detection accuracy rate begins to decline. These findings underscore the importance of selecting an appropriate value for T0 for optimal detection performance.
From the table above, it can be seen that when T0 = 0.5, k1 = 0.1, and k2 = 0.2, the detection accuracy of the model is the highest. Therefore, this set of parameters was chosen as the empirical values. Then, based on these empirical values, more tests were conducted, and the following are the experimental process and results:
As illustrated in Table 2, we conducted a series of experiments in which the values of k1 and k2 were varied while keeping T0 constant at 0.5. The experimental results indicate that the detection accuracy of the model changes when the values of k1 and k2 are altered, but it remains highest when T0 = 0.5, k1 = 0.1, and k2 = 0.2. Therefore, we selected this set of parameters as the final parameters, and these experimental results further validate our selection. With these parameters, the method significantly improved the accuracy and robustness of power line damage identification in the experiments.
To further understand this formula, further derivation is carried out. The brightness B of the input image is between 0 and 255, and the contrast C is between −127 and 127. Therefore, the following formula can be obtained:
This formula indicates that when the brightness and contrast of the input image change, the detection threshold T will also change correspondingly. This is what is referred to as the adaptive threshold mechanism. As shown in Figure 3, this threshold is dynamically adjusted based on the brightness and contrast of the image to adapt to the characteristics of the image. Then, the adjusted threshold is applied to the image, binary processing is carried out, and thus a new image is obtained.
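The binary processing step can be sketched as follows. The example assumes an 8-bit grayscale image and a threshold expressed on a 0 to 1 scale, so the image is rescaled before comparison; this scaling convention, the helper name `binarize`, and the sample pixel values are assumptions for illustration.

```python
import numpy as np


def binarize(gray: np.ndarray, threshold: float) -> np.ndarray:
    """Binarize an 8-bit grayscale image with a threshold given in [0, 1].

    Pixels whose normalized intensity exceeds the threshold become 255,
    all others become 0.
    """
    normalized = gray.astype(np.float64) / 255.0
    return np.where(normalized > threshold, 255, 0).astype(np.uint8)


image = np.array([[0, 100], [200, 255]], dtype=np.uint8)
binary = binarize(image, threshold=0.5)
```

With a threshold of 0.5, the pixels at 0 and 100 map to 0, and the pixels at 200 and 255 map to 255.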
In this manner, the model can automatically adjust the detection threshold according to the characteristics of the input image, thereby achieving good detection results under various environmental conditions. This method has demonstrated effective results in experiments, significantly enhancing the accuracy and robustness of power line damage recognition.
2.3. YOLOv8 Enhancement Based on GSConv
GSConv is a special convolution operation that uses GS (Group Separable) convolution to improve the model’s performance [35,36]. The goal of GSConv is to make the output of Depthwise Separable Convolution (DSC) as close as possible to Standard Convolution (SC) while reducing computational cost. GSConv can better balance the accuracy and speed of the model.
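The cost gap between SC and DSC that motivates this design can be made concrete by counting weights. The sketch below uses the standard parameter-count formulas for the two layer types (biases ignored); the layer size of 64 input channels, 128 output channels, and a 3×3 kernel is a hypothetical example, not a configuration from this paper.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution mixing the channels
    return depthwise + pointwise


# Hypothetical layer size chosen for illustration.
sc = standard_conv_params(64, 128, 3)
dsc = depthwise_separable_params(64, 128, 3)
ratio = sc / dsc
```

For this example layer, SC needs 73,728 weights and DSC needs 8,768, roughly an 8.4-fold reduction, which is why closing the accuracy gap between the two is attractive.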
The formula for GSConv can be expressed as:
Here, x is the input feature map, GConv is the group convolution operation, and Shuffle is the channel shuffle operation. The GConv and Shuffle operations are further defined as follows:
where Conv is the convolution operation, N is the number of input feature maps, $x_i$ is the i-th input feature map, and Shuffle applies a permutation function to the channels of the input feature map, as shown in Figure 4.
In the YOLOv8 model, this paper replaces the Standard Convolution (SC) in the backbone network with GSConv to improve the efficiency and accuracy of the model. The formula is:
where SC is the standard convolution operation, and x is the input feature map. By replacing SC with GSConv, the efficiency and accuracy of the YOLOv8 model can be improved.
To further understand the combination of GSConv and YOLOv8, this paper delves deeper into the working principle of GSConv. It was found that the key to GSConv lies in its two main components: Group Convolution (GConv) and Channel Shuffle. Group convolution is a special convolution operation that divides the input feature map into multiple groups and then performs convolution within each group. This can reduce the computational load while maintaining the representational power of the model. Channel shuffle is an operation that shuffles the output of group convolution to increase the diversity of features.
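The two components described above can be sketched in NumPy. This is not the GSConv implementation used in the paper: it restricts the convolution to a 1×1 kernel for brevity, and it uses the interleaving permutation commonly used for channel shuffle, since the text does not specify which permutation is applied. The function names and the toy input are hypothetical.

```python
import numpy as np


def group_conv_1x1(x: np.ndarray, weights: list) -> np.ndarray:
    """Grouped 1x1 convolution.

    x:       feature map of shape (C_in, H, W)
    weights: one (C_out_per_group, C_in_per_group) matrix per group
    """
    groups = len(weights)
    chunks = np.split(x, groups, axis=0)  # split the channels into groups
    outputs = [np.einsum("oc,chw->ohw", w, chunk)  # convolve within each group
               for w, chunk in zip(weights, chunks)]
    return np.concatenate(outputs, axis=0)


def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Interleave channels so that each group's outputs are spread out."""
    c, h, w = x.shape
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)


# Toy input: 4 channels of size 1x1, where channel i holds the value i.
x = np.arange(4, dtype=np.float64).reshape(4, 1, 1)
identity = [np.eye(2), np.eye(2)]  # two groups, each leaving its channels unchanged
y = channel_shuffle(group_conv_1x1(x, identity), groups=2)
order = y[:, 0, 0].tolist()
```

With identity weights, the grouped convolution leaves the channels as 0, 1, 2, 3, and the shuffle reorders them to 0, 2, 1, 3, so that a subsequent grouped layer sees channels from both groups.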
The formula for GSConv can be further expanded as:
In the YOLOv8 model, the formula for replacing SC with GSConv is expressed as:
where YOLOv8(x) represents the output of the YOLOv8 model, and x is the input feature map. The formula expresses the replacement of Standard Convolution (SC) with GSConv, whose implementation includes three steps: first, perform Standard Convolution (SC) on the input feature map x; then, perform group convolution (GConv) on the output of SC; and finally, perform channel shuffle on the result of the group convolution.
Figure 5 shows the addition of the GSConv module in the backbone structure of YOLOv8.
2.4. Overhead Power Line Damage Detection Based on Slim Neck
Overhead power line damage detection is a crucial task in power line inspection, and its accuracy directly impacts power safety. The target size of overhead power line damage varies greatly, the damaged area is small, and the detection background is complex and variable, which poses certain challenges to the detection.
The original YOLOv8 network model tends to ignore the detailed information of small targets during the feature extraction process, leading to a decrease in detection accuracy. To address this issue, the Slim Neck module is added to the neck of the YOLOv8 network.
The Slim Neck module is based on the Efficient Aggregation Network E-ELAN, which can assign higher weights to effective feature channels, suppress irrelevant background features, and reduce the impact of background noise on target detection, thereby enhancing the network model’s ability to judge the location and size of overhead power line damage [37,38]. As shown in Figure 6, the specific structure of the Slim Neck module is as follows:
Here, $X \in \mathbb{R}^{H \times W \times C}$ represents the input feature map, where $H$ denotes the height of the feature map, $W$ denotes the width of the feature map, and $C$ denotes the number of channels. The GAP operation performs average pooling on each channel to obtain a new feature map of size $1 \times 1 \times C$. The $1 \times 1$ convolution uses a convolution kernel of size 1 to convolve the feature map obtained in the previous step, obtaining a new feature map that represents the weight information of each channel. Feature map weighting multiplies the obtained weight feature map with the original feature map channel by channel to obtain the weighted feature map.
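The formula for the Slim Neck module, applied to the input feature map $X$, is as follows:
$$A = \mathrm{Conv}_{1 \times 1}(\mathrm{GAP}(X)), \qquad Y = A \odot X$$
Here, $A$ represents the weight feature map, $Y$ represents the weighted feature map, and $\odot$ denotes channel-by-channel multiplication.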
The GAP operation is an abbreviation for Global Average Pooling, which performs average pooling on each channel of the input feature map to obtain a new feature map of size $1 \times 1 \times C$, where $C$ is the number of channels. The $1 \times 1$ convolution is a convolution operation with a kernel size of $1 \times 1$, which can reduce or increase the dimension of the input feature map [39]. In the Slim Neck module, the $1 \times 1$ convolution is used to calculate the weight feature map. Feature map weighting is the multiplication of the weight feature map and the original feature map channel by channel to obtain the weighted feature map.
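The three operations described above (global average pooling, a 1×1 convolution, and channel-by-channel weighting) can be sketched in NumPy as follows. This is an illustrative reading of the description rather than the module's actual implementation: the function name `channel_weighting`, the identity kernel, and the toy input are assumptions, and no activation function is applied because the text describes none.

```python
import numpy as np


def channel_weighting(x: np.ndarray, conv_weight: np.ndarray) -> np.ndarray:
    """Reweight the channels of a feature map of shape (H, W, C).

    conv_weight is a (C, C) matrix standing in for the 1x1 convolution kernel.
    """
    pooled = x.mean(axis=(0, 1))      # GAP: one average per channel, shape (C,)
    weights = conv_weight @ pooled    # a 1x1 convolution on a 1x1xC map is a matrix-vector product
    return x * weights                # broadcast the channel weights over H and W


# Toy 2x2 feature map with 3 channels holding the constants 1, 2, and 3.
x = np.ones((2, 2, 3)) * np.array([1.0, 2.0, 3.0])
out = channel_weighting(x, conv_weight=np.eye(3))
```

With an identity kernel, each channel is scaled by its own mean, so the channel values become 1, 4, and 9; a learned kernel would instead mix the pooled statistics across channels before weighting.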
The Slim Neck module avoids the use of fully connected layers, reducing the number of model parameters. With a small number of parameters, it can effectively improve the detection performance of the network model [40]. Experimental results show that adding the Slim Neck module to the YOLOv8 network model can effectively improve the detection accuracy of overhead power line damage.
Figure 7 shows the addition of the Slim Neck module in the YOLOv8 Neck structure.
4. Conclusions
In this study, an improved YOLOv8 model has been successfully proposed, which was specifically designed for high-voltage power line damage detection. The model enhances robustness and accuracy by introducing an adaptive threshold mechanism, the GSConv convolution method, and a lightweight network structure, Slim Neck, while reducing the complexity of the model.
However, despite this progress, some potential weaknesses and challenges remain. First, although the introduction of Slim Neck reduces the complexity of the model, the adaptive threshold mechanism and GSConv may increase its computational requirements, which could pose a challenge for resource-limited devices or systems. Second, because the model needs to learn how to dynamically adjust the detection threshold based on the characteristics of the input image, it may require a longer training time. Third, the model has shown significant accuracy improvements for the “lightning strike” and “break” labels, but its detection capability may decline for other types of power line damage, such as corrosion or wear. In addition, a complex power line background or poor lighting conditions may degrade the detection performance of the model.
Despite these challenges, among all compared network models, this model achieved the best results on all indicators, demonstrating its superiority in object detection tasks. In particular, it achieved a precision of 90.3%, a recall of 89.6%, and an mAP50 of 91.1%, all of which are the highest among the compared models. The model has high robustness and accuracy, and it can dynamically adjust the detection threshold based on the characteristics of the input image, thus balancing running speed and accuracy. It is therefore well suited to the real-time detection of high-voltage power line damage. Future work will continue to optimize and improve this model to address these potential weaknesses and challenges.