Article

Insulator Defect Detection Based on YOLOv5s-KE

1 School of Mechanical and Electronic Engineering, Quanzhou University of Information Engineering, Quanzhou 362000, China
2 Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China
3 State Grid HLJ Electric Power Transmission and Transformation Engineering Co., Ltd., Harbin 150080, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(17), 3483; https://doi.org/10.3390/electronics13173483
Submission received: 5 August 2024 / Revised: 26 August 2024 / Accepted: 27 August 2024 / Published: 2 September 2024

Abstract

To tackle the issue of low detection accuracy in insulator images caused by intricate backgrounds and small defect sizes, as well as the requirement for real-time detection on embedded and mobile devices, this research introduces the YOLOv5s-KE model. Integrating multiple strategies, YOLOv5s-KE aims to significantly boost detection accuracy. First, an enhanced anchor generation method based on the K-means++ algorithm is proposed to generate anchor boxes better suited to insulator defects. Moreover, an attention mechanism is integrated into both the backbone and neck networks to strengthen the model’s focus on defect features and its resistance to interference. To improve the detection of small defects, the EIoU loss function is implemented in place of the original CIoU loss function. To meet the real-time detection needs of embedded and mobile devices, the model is further refined by integrating Ghost convolution, which uses lightweight feature extraction and linear transformations to reduce the computational burden of standard convolution. A channel pruning strategy is applied to the sparsely trained network to remove redundancy and improve model generalization. Additionally, the CARAFE operator replaces the original upsampling operator to reduce model parameters and raise detection speed. Experimental results demonstrate that YOLOv5s-KE achieves a detection accuracy of 92.3% on the Chinese transmission line insulator dataset, a 5.2% improvement over the original YOLOv5s. The lightweight version of YOLOv5s-KE achieves a detection speed of 94.3 frames per second, an improvement of 30.1 frames per second over the original model, with the model parameters compressed to 9.6 M while maintaining a detection accuracy of 91.1%. This study demonstrates the precision and efficiency of the proposed approach, suggesting that the strategies explored open new possibilities for insulator defect detection.

1. Introduction

Insulator defect detection is a crucial application within the field of object detection. The increasing utilization of artificial intelligence has sparked interest among researchers in developing methods for detecting insulator defects, with a particular focus on integrating deep learning into defect detection. A review of existing literature reveals two primary categories of research: techniques based on image analysis for detecting insulator defects and methods that leverage deep learning for the same purpose.
Insulator defect detection using image-based techniques generally comprises two primary steps. Initially, insulators are segmented from complex backgrounds by leveraging features such as color characteristics [1,2] and morphological contours. Subsequently, a mathematical model is constructed utilizing the morphological features and positional data of the insulators to identify defects. For instance, dilation and erosion algorithms can be employed to meticulously process the image edges, thereby accentuating the defect characteristics and facilitating the construction of an appropriate detection model. Furthermore, in addition to the extraction of edge features, the color attributes can also be leveraged to discern defects on insulators. Oberweger M. et al. [3] utilized the local outlier factor (LOF) algorithm to identify defect locations, evaluating whether the insulator was faulty. In computer vision-based image classification problems, support vector machine (SVM) [4,5] and neural networks are often used.
However, computer image recognition algorithms inevitably have to face the problem of complex backgrounds. Therefore, some scholars have used infrared images to detect defective insulators [6,7]. He Hongying’s team extracted nine color moment features related to defect severity from insulator images and used a radial basis function neural network to recognize insulator defects [8]. Zaripova’s team’s defect detection method was based on a combined analysis of the mean and standard deviation of the brightness distribution of a group of insulators [9].
Although image processing-based methods have achieved notable progress in insulator defect detection, several challenges persist. Firstly, feature extraction often necessitates manual intervention to achieve optimal detection results, resulting in limited adaptability and insufficient generalization capability. Furthermore, the initial detection accuracy of image processing methods is relatively low, and achieving significant improvements in subsequent stages is challenging. Additionally, these methods fail to meet practical application requirements in terms of detection speed and effectiveness. Nevertheless, these methods laid groundwork that deep learning, which quickly penetrated the field of insulator detection, later built upon.
Deep learning algorithms have become a crucial branch of computer vision due to their effectiveness in object detection. Researchers have leveraged the practical experience gained from convolutional neural networks (CNNs) in object detection and applied it to insulator defect detection [10,11]. Subsequently, numerous deep learning-based algorithms have emerged for the task of insulator defect detection. However, complex backgrounds remain a central challenge for deep learning-based methods as well.
YOLO, which stands for “You Only Look Once”, is an advanced real-time object detection system. Since its inception in 2015, it has undergone multiple iterations, with each version showing significant improvements in both speed and accuracy. It is renowned for its lightweight and efficient network architecture. Improvements to the YOLO algorithm series primarily focus on enhancing processing speed and recognition accuracy. Making the network lightweight can improve recognition speed. Wu Tao et al. improved the YOLOv3 algorithm for insulator recognition by using the lightweight MobileNet network as the backbone for feature extraction, replacing the original structure to enhance detection speed [12,13]. However, MobileNet’s feature extraction relies primarily on downsampling and lacks sufficient channel expansion, resulting in a significant loss of high-level semantic information and reduced detection accuracy. Deng et al. replaced the CSPDarknet53 backbone network of YOLOv4 with MobileNetV3 [14]. Many lightweight backbone networks have been used in insulator defect detection tasks, such as YOLOv3-tiny [15,16,17], YOLOv4-tiny [18,19], and YOLOv7-tiny [20,21,22]. Lightweight algorithms make models easier to integrate into embedded systems. Weng’s team deployed multiple models on a Jetson Xavier NX; their model was compressed to 9.4 MB and ran at a frame rate of 9.5 on the device [23]. Because embedded devices are small and light, they are easy to integrate on drones. Meng et al. proposed a lightweight target detection algorithm (NanoDet) designed to address real-time and accurate defect detection of insulators using Unmanned Aerial Vehicles (UAVs) along power lines. Given the limited computational resources of UAVs, the proposed model incorporates a segmented design in which energy-saving layers are mapped to the CPU for execution, maximizing computational resource utilization. Experimental results demonstrate that this method can reduce power consumption by up to 46.4% without violating time constraints [24]. This design concept also informed the present work.
Various algorithms have shown their respective strengths. Insulators in aerial images often present challenges such as complex backgrounds and small defect sizes. Therefore, finding an algorithm that can accurately and quickly identify insulator defects is crucial. The above research provides considerable insight; however, the problems inherent to lightweight models, namely low detection accuracy and difficulties with complex backgrounds, still need to be solved. YOLOv5 is one of the most advanced object detection algorithms available. It offers four different network structure models: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, differing mainly in the width and depth of the network configurations. Among them, the YOLOv5x network structure is more complex and demands higher device performance. In contrast, YOLOv5s is characterized by its small model size and fast detection speed, making it more convenient for practical applications and large-scale deployment. Therefore, this paper proposes an insulator defect detection algorithm based on YOLOv5s.
In this study, we focus on the detection of insulator defects, leveraging the YOLOv5s algorithm. We propose modifications to the YOLOv5s network to enhance the accuracy and speed of insulator defect detection. The remaining sections are organized as follows. Section 2 introduces the improved algorithm based on YOLOv5s. According to the characteristics of insulator detection images, it uses the K-means++ clustering method to improve the anchor boxes. We then replace parts of the backbone network of the model with GhostNet modules. After that, we prune the channels of the modified model to reduce the number of model parameters and improve the speed of the model. In Section 3, we select a Chinese power transmission line insulator dataset for testing and verify the effectiveness of the proposed algorithm through ablation experiments. Section 4 summarizes the contents of this paper. The main contributions of this paper are summarized as follows:
  • To address the challenges posed by complex backgrounds in aerial images of insulators and the small size of insulator defects, which occupy only a minor portion of the image, we present an improved insulator defect detection method called YOLOv5s-KE. Current algorithms face difficulties in achieving high detection accuracy under these circumstances. Our method commences by employing the Kmeans++ clustering technique to enhance the anchor boxes, consequently boosting the model’s capacity for generalization across defects of different scales.
  • We integrate the efficient channel attention (ECA) mechanism into the backbone and neck networks of YOLOv5s-KE, to suppress the interference from non-essential feature information. Additionally, we employ the efficient intersection over union loss (EIoU_Loss) as the loss function to improve the localization capability for small-sized defects, accelerate the convergence of the network model, and increase the accuracy of insulator defect detection.
  • We conduct experiments on the Chinese power transmission line insulator dataset and compare our method with mainstream algorithms to validate the effectiveness of the proposed YOLOv5s-KE. The experimental results demonstrate the superiority of our approach in detecting insulator defects with higher precision.
  • To meet the real-time detection requirements of insulator defects on embedded and mobile devices, we introduce a lightweight version of YOLOv5s-KE. We enhance the detection speed and reduce the model’s parameter size by incorporating the GhostNet network to construct the G_CSP1 module, replacing the CSP1 module in the YOLOv5s-KE backbone network, thereby reducing the computational load. Furthermore, we apply channel pruning techniques to remove redundant channels in the network, decreasing the model size. We also replace the original upsampling operator with the CARAFE operator, maintaining upsampling performance while reducing the model parameters and computational load, thus improving the detection speed. Finally, experiments conducted on the Chinese power transmission line insulator dataset, compared with mainstream lightweight algorithms, confirm the effectiveness of our proposed method.

2. Materials and Methods

To address the small size of insulator defects and the low detection accuracy under complex backgrounds, we propose a new YOLOv5s-KE algorithm, as shown in Figure 1.
Firstly, based on the input insulator defect images, the anchors are regenerated by the K-means++ algorithm so that they better match the characteristics of insulator defects and reduce the regression error of the bounding boxes. Secondly, in the feature fusion stage, the ECA-Net channel attention mechanism is introduced to strengthen the key features of insulator defects and enhance the model’s ability to distinguish the background from the target. Finally, the loss function is optimized: CIoU_Loss is replaced by EIoU_Loss, and the width and height differences between the prediction box and the target box are used as loss factors, which improves the positioning accuracy of the model for small targets and thus significantly improves the accuracy of insulator defect detection.

2.1. K-Means++ Anchor

The YOLOv5s model implements adaptive anchor calculation in its design: it automatically adjusts the size and proportion of anchors by analyzing the training dataset in the initial stage of training. This process usually uses the K-means clustering algorithm to analyze the ground-truth bounding boxes in the training dataset, so as to find the anchors that best represent the size distribution of objects in the dataset. However, the output of the K-means algorithm is strongly affected by the selection of the initial cluster centers, and an improper selection may lead to poor clustering results. If the initial cluster centers are not chosen well, the algorithm may not be able to effectively separate different categories; cluster centers may collapse onto a single cluster, and the algorithm then tends toward a local optimum rather than the global minimum, which affects the accuracy of clustering.
To solve the initialization problem of the K-means algorithm, this paper uses the K-means++ [25] method to improve the clustering of the insulator defect dataset. The K-means++ strategy evaluates the probability of each data point becoming a cluster center and selects a new center by weighting the distances to the already selected centers, thereby greatly improving the quality of the initial cluster centers, obtaining anchors that are better suited to the insulator defect dataset, helping to reduce the regression error of the defect bounding boxes, and improving the detection accuracy of the defect area. The specific execution steps of K-means++ are as follows (a minimal code sketch is given after the list):
  • Randomly pick a sample from the dataset as the starting cluster center $c_1$;
  • Calculate the squared Euclidean distance $D(x)$ between each sample and the existing cluster centers, determine the probability $P(x)$ of each sample becoming a new cluster center, and then select the sample with the highest probability as the new cluster center. The formula can be expressed as:
    $$P(x) = \frac{D^2(x)}{\sum_{x \in X} D^2(x)}$$
  • Repeat step 2 until all K cluster centers are selected.
  • Perform the standard K-means algorithm on the K initial cluster centers.
  • Output the anchors once the maximum number of iterations is reached.
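The steps above translate directly into a short script. The sketch below is an illustrative re-implementation rather than the authors’ released code: it clusters the labeled box widths and heights with a K-means++ initialization (here using the standard probabilistic center selection) followed by K-means refinement; the function name and parameters are placeholders.

```python
# Illustrative K-means++ anchor clustering over (width, height) pairs.
# `boxes` is assumed to be an (N, 2) array of labeled box widths and heights.
import numpy as np

def kmeanspp_anchors(boxes, k=9, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick the first center at random.
    centers = [boxes[rng.integers(len(boxes))]]
    # Steps 2-3: pick the remaining centers with probability proportional to D^2(x).
    while len(centers) < k:
        d2 = np.min(((boxes[:, None, :] - np.asarray(centers)[None, :, :]) ** 2).sum(-1), axis=1)
        centers.append(boxes[rng.choice(len(boxes), p=d2 / d2.sum())])
    centers = np.asarray(centers, dtype=float)
    # Step 4: standard K-means refinement starting from the chosen centers.
    for _ in range(iters):
        assign = np.argmin(((boxes[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = boxes[assign == j].mean(axis=0)
    # Step 5: return anchors sorted by area, three per detection layer.
    return centers[np.argsort(centers.prod(axis=1))]
```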
According to the above method, we analyze the Chinese power line insulator dataset by using the K-means++ algorithm. We obtain the K-means++ algorithm clustering result, as shown in Figure 2.
The labeled box data in the insulator defect dataset are fed to the above algorithm, and accurate prior boxes are obtained by iteration. The model has three detection layers, each with three anchor sizes, giving nine sizes in total. Multiple experiments confirm that, with three detection layers, selecting K = 9 prior boxes best fits the dataset. The anchor sequence obtained by the final clustering is shown in Table 1.
The K-means++ algorithm selects the initial cluster centers according to a probability distribution, which avoids the local-optimum problem caused by improper selection of the initial points in the standard K-means algorithm. In insulator defect detection, the size differences among defect types (such as cracks, damage, and pollution) may be large; if the initialization is poor, the anchor box sizes may not effectively cover all defects, leading to inaccurate detection of defects at certain scales. K-means++ selects better initial anchor box sizes, making the clustering results more representative and improving the detection performance of the model. Stable anchor box clustering also means that similar anchor configurations are obtained regardless of how the model is initialized during training, ensuring the robustness and accuracy of the detection model; this is important for identifying defects of different shapes and sizes. Insulator defects are often subtle, small-size targets. If the anchor box clustering cannot accurately reflect the size distribution of these small targets, the model is prone to missing them. Owing to its more reasonable initialization strategy, K-means++ better captures the size characteristics of small targets, provides more detailed anchor box clustering results, and improves the model’s ability to detect small-size defects.
Compared with the traditional K-means algorithm, K-means++ still maintains the same level of overall computational complexity, although it slightly increases some computational overhead in the initial stage. In insulator defect detection, this performance improvement is achieved without significantly increasing the computational complexity, making K-means++ an efficient and practical improvement strategy.

2.2. ECA Attention Mechanisms

An attention mechanism (AM) is a technique that simulates the focusing characteristics of human attention in neural networks. It enhances the model’s ability to process specific parts of the input data. In deep learning, the attention mechanism enables the model to dynamically focus on specific regions of a sequence while diluting its focus on the rest. In this paper, the ECA-Net (efficient channel attention network) mechanism is used to address the problem of background interference in insulator defect detection. The ECA attention mechanism assigns larger weights to insulator defect regions and smaller weights to background areas, thereby effectively suppressing background interference and improving the model’s accuracy in insulator defect detection.
In 2020, ECA-Net was proposed, which is an improved channel attention model based on SE-Net (squeeze and excitation network) [26], specially designed to improve the efficiency and effectiveness of convolutional neural networks in processing channel information. ECA-Net uses a one-dimensional convolution kernel with a size of K to obtain the local correlation between channels, thus abandoning the dependence on the fully connected layer and effectively reducing the complexity of the model and the number of parameters. It can also adaptively determine the size of the convolution kernel, which allows the model to capture the correlation between channels at different scales, rather than relying on artificial preset scales. This mechanism helps to maintain the original information of the features, while still effectively enhancing the features of the important channels and suppressing the unimportant channels, thus enhancing the network’s attention to the local feature information and more effectively distinguishing the background from the target. The performance and efficiency of the network in processing the insulator defect image are improved to a great extent. The network structure of ECA-Net is shown in Figure 3.
The value of K is calculated by Equation (2), where channels denotes the number of channels of the input feature. As shown in Figure 3, each group of five adjacent channels is aggregated by the one-dimensional convolution, and the number of channels is kept consistent by padding. The value of K is calculated as follows:
$$K = \frac{\log_2(\mathrm{channels}) + 1}{2}$$
Attention mechanisms can make the network pay attention to useful feature information during training and reduce the interference of other information. The introduction of the ECA attention mechanism can improve the performance of the model, better capture the relationship between features, and help to improve the model’s ability to extract insulator defect targets in complex backgrounds. The backbone of YOLOv5s is mainly responsible for extracting image features, while the neck is used to facilitate information interaction between deep and shallow feature maps. Therefore, this paper chooses to integrate the ECA attention module into the two network structures, respectively, to enhance the feature fusion effect, enhance the ability of the network to distinguish the background and the insulator defect target, and improve the accuracy of the network model for insulator defect detection. We add ECA to the end of CSP1_x and CSP2_x. Figure 4 shows the structure of the improved YOLOv5s network.
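As a concrete reference, the following is a minimal PyTorch sketch of an ECA block in the spirit of the published ECA-Net design; it is our illustration rather than the exact module used in YOLOv5s-KE, and the kernel size follows the (log2(channels) + 1)/2 rule quoted above, rounded to the nearest odd integer as an assumption.

```python
# Minimal ECA attention block (illustrative sketch, not the authors' code).
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        k = int(abs((math.log2(channels) + 1) / 2))
        k = k if k % 2 else k + 1                      # 1-D kernel size must be odd
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Squeeze: global average pooling -> (B, C, 1, 1)
        y = self.avg_pool(x)
        # Excite: a 1-D convolution across the channel dimension captures local
        # cross-channel interaction without any fully connected layer
        y = self.conv(y.squeeze(-1).transpose(-1, -2)).transpose(-1, -2).unsqueeze(-1)
        # Re-weight the input channels
        return x * self.sigmoid(y)
```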

2.3. Loss Function Design

In the YOLOv5s network model, the loss function, as a key component of the output layer of the algorithm, is responsible for evaluating the error between the predicted bounding box and the ground-truth box. It is used to adjust the position of the bounding box and thus improve the accuracy of insulator defect identification. The original YOLOv5s algorithm uses the CIoU loss function to calculate the bounding box loss. Other common localization loss functions include IoU, GIoU, DIoU, and EIoU. The principles, advantages, and disadvantages of these localization loss functions are introduced one by one below.
IoU 
IoU judges the position of the prediction box by calculating the intersection-over-union ratio between the prediction box and the target box. The calculation formula of IoU is as follows:
$$IoU = \frac{A \cap B}{A \cup B}$$
$$IoU_{Loss} = 1 - \ln(IoU)$$
An IoU value of 1 indicates that the predicted box coincides perfectly with the true box; a value of 0 indicates that there is no overlap. However, when the two boxes do not intersect, IoU equals 0 and cannot reflect the distance between the two objects; the IoU loss then becomes infinite and non-differentiable, so the model cannot be optimized.
GIoU 
GIoU solves the problem that the loss function becomes unbounded when the prediction box and the target box do not intersect. The calculation formula of GIoU is as follows:
$$GIoU = IoU - \frac{A_b - x}{A_b}$$
$$GIoU_{Loss} = 1 - GIoU$$
In the formula, $A_b$ represents the area of the smallest box that can enclose both the prediction box and the target box, $x$ represents the area covered by the union of the prediction box and the target box, and $A_b - x$ represents their difference set. Clearly, the smaller the difference set, the smaller the GIoU loss value, that is, the closer the prediction box is to the target box. When the prediction box and the target box do not overlap, the gap between them can still be reflected through GIoU. However, when the prediction box completely covers the target box, the GIoU remains unchanged even if the target box moves within the prediction box; in this case, GIoU cannot effectively indicate the specific positional relationship between the ground-truth box and the predicted box.
CIoU 
The CIoU loss calculates the overlap area, the Euclidean distance of the center points and the aspect ratio information between the predicted box and the real box, and accurately calculates the overlap degree between the bounding boxes. The calculation formula is as follows:
$$CIoU_{Loss} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$$
$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$$
$$\alpha = \frac{v}{1 - IoU + v}$$
where $b$ is the center coordinate of the predicted bounding box, $IoU$ is the intersection-over-union of the two boxes, $b^{gt}$ is the center coordinate of the ground-truth bounding box, $\alpha$ is a balance factor, $\rho^2(\cdot)$ is the squared Euclidean distance between the centers of the predicted and ground-truth bounding boxes, $v$ is a parameter evaluating the consistency of the aspect ratios of the predicted box and the ground-truth box, and $c$ is the diagonal length of the smallest box enclosing the predicted box and the ground-truth box.
EIoU 
It can be seen from Equation (7) that when the predicted box size changes but its aspect ratio remains unchanged, $v$ loses its effect. The gradients of $v$ with respect to $w$ and $h$ are as follows:
$$\frac{\partial v}{\partial w} = \frac{8}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right) \times \frac{h}{w^2 + h^2}$$
$$\frac{\partial v}{\partial h} = -\frac{8}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right) \times \frac{w}{w^2 + h^2}$$
The two gradients are opposite in sign; that is, during training, when one of them increases, the other decreases. EIoU splits the aspect ratio term of CIoU and directly penalizes the width and height differences between the prediction box and the target box, gradually optimizing the size of the prediction box to be closer to the target box. This reduces the real gap between the two, speeds up the convergence of the model, and improves detection accuracy. The EIoU calculation formula is as follows:
$$EIoU_{Loss} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c_w^2 + c_h^2} + \frac{\rho^2(w, w^{gt})}{c_w^2} + \frac{\rho^2(h, h^{gt})}{c_h^2}$$
where $\rho^2(\cdot)$ represents the squared Euclidean distance, $c_w$ and $c_h$ represent the width and height of the minimum enclosing box, and $b$ and $b^{gt}$ represent the center points of the prediction box and the target box. A schematic diagram of EIoU is shown in Figure 5.
The loss function consists of three parts: overlap loss, center offset loss, and size loss. The overlap loss and center offset loss are consistent with the corresponding parts of CIoU. The size loss takes the width and height differences between the prediction box and the target box as loss factors, enhancing the model’s ability to localize small-size defects. Therefore, this paper replaces the CIoU loss function in YOLOv5s with the EIoU loss function.
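For reference, the sketch below is an illustrative PyTorch re-implementation of the EIoU loss for boxes in (x1, y1, x2, y2) format, following the formula above; it is not the repository code used in the experiments, and the function name and epsilon value are assumptions.

```python
# Illustrative EIoU loss for axis-aligned boxes in (x1, y1, x2, y2) format.
import torch

def eiou_loss(pred, target, eps=1e-7):
    # Intersection over union
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Width and height (c_w, c_h) of the smallest enclosing box
    enc = torch.max(pred[:, 2:], target[:, 2:]) - torch.min(pred[:, :2], target[:, :2])
    cw, ch = enc[:, 0], enc[:, 1]

    # Center-distance term rho^2(b, b_gt) / (c_w^2 + c_h^2)
    ctr_p = (pred[:, :2] + pred[:, 2:]) / 2
    ctr_t = (target[:, :2] + target[:, 2:]) / 2
    center_term = ((ctr_p - ctr_t) ** 2).sum(dim=1) / (cw ** 2 + ch ** 2 + eps)

    # Width/height difference terms rho^2(w, w_gt)/c_w^2 and rho^2(h, h_gt)/c_h^2
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wt, ht = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    wh_term = (wp - wt) ** 2 / (cw ** 2 + eps) + (hp - ht) ** 2 / (ch ** 2 + eps)

    return 1 - iou + center_term + wh_term
```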

2.4. Lightweight Network Structure Design

Embedded or mobile devices often have limited computing power and storage capacity. To make these devices run the model efficiently, the network model needs to be lightweight. Lightweight technology is designed for deep learning models and aims to reduce the size and complexity of models to run on devices with limited computing power. By adopting a lightweight network structure and optimizing convolution operation, the number of model parameters is reduced, and the detection speed of the model is improved. GhostNet reduces parameter and computational costs through its characteristic Ghost module, without sacrificing too much accuracy. As shown in Figure 6, it is the algorithm framework of lightweight YOLOv5s-KE insulator defect detection.
Based on the characteristics of the GhostNet module, this paper constructs a G_CSP1_X module to replace the corresponding modules in the feature extraction network, and then uses channel pruning to reduce the parameters and computation of the model at comparable accuracy, thereby speeding up detection. The upsampling operator is then replaced by CARAFE to further lighten the model and improve the speed at which it detects insulator defects.

2.4.1. GhostNet Model

Traditional convolution suffers from redundant feature information, a large amount of calculation, and slow training. To address this, Huawei’s Noah’s Ark Lab proposed GhostNet in 2020, a lightweight network built on improvements to the MobileNet design whose computing performance surpasses Google’s MobileNetV3. GhostNet first generates a certain number of base feature maps (called intrinsic feature maps) using standard convolution. These base feature maps are then transformed by cheap linear operations to generate additional feature maps (ghost feature maps), and the intrinsic and ghost feature maps are fused to form the final feature map.
The Ghost module can increase the number of feature maps at a lower computational cost, while maintaining the depth and complexity of the network, reduce the number of parameters to improve the computational efficiency, and obtain a more lightweight network model.
Assume that the input is $X \in \mathbb{R}^{C \times H \times W}$, where C, H, and W are the number of channels, height, and width of the input data, and that the layer uses n convolution kernels of size $k \times k$. We compare the computation of the traditional convolutional neural network (Figure 7) with that of the GhostNet model (Figure 8). To simplify the analysis, assume that the spatial size of the output equals that of the input; the amount of computation of the conventional convolution is then $n \times c \times h \times w \times k \times k$.
Compared with the traditional convolution in the YOLOv5s network, the convolution of the Ghost module is greatly reduced. Under the same input assumption, the input features are first processed by a conventional convolution that outputs a small set of intrinsic feature maps. To generate the additional redundant feature maps, the Ghost module performs several linear operations $(\phi_1, \phi_2, \ldots, \phi_k)$ on each intrinsic feature map, so that each intrinsic feature map produces multiple similar derived feature maps. The ratio of the computation of conventional convolution to that of Ghost convolution is as follows:
$$r = \frac{n \times c \times h \times w \times k \times k}{\frac{n}{q} \times h \times w \times c \times k \times k + (q - 1) \times \frac{n}{q} \times h \times w \times k \times k}$$
where $k \times k$ is the size of the linear-operation convolution kernel, $q$ is the number of linear transformations, and $\frac{n}{q}$ is the number of output channels of the first (conventional) convolution. The factor $q - 1$ appears because the identity mapping does not need to be computed but still counts as one of the transformations. $r$ is the ratio of the computation of ordinary convolution to that of Ghost convolution.
When the size of the convolution kernel of the linear operation is the same as that of the conventional convolution operation, the formula can be further simplified as follows:
$$r \approx \frac{q \times c}{q + c - 1} \approx q$$
The ratio of the amount of computation of the Ghost module to the amount of computation of the conventional convolution module is approximately q.
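A compact PyTorch sketch of this idea is shown below; it mirrors the published GhostNet design, but the ratio q = 2, kernel sizes, and activation choice are illustrative assumptions rather than the exact configuration used in the YOLOv5s-KE backbone.

```python
# Illustrative Ghost convolution: a primary convolution produces the intrinsic
# feature maps, and a cheap depthwise "linear" operation generates the ghost maps.
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    def __init__(self, c_in, c_out, k=1, ratio=2, dw_k=3):
        super().__init__()
        # c_out is assumed to be divisible by ratio in this simplified sketch
        c_primary = c_out // ratio                      # intrinsic feature maps (n/q)
        c_ghost = c_out - c_primary                     # cheap ghost feature maps
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_primary, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_primary), nn.SiLU())
        self.cheap = nn.Sequential(                     # depthwise linear transform
            nn.Conv2d(c_primary, c_ghost, dw_k, padding=dw_k // 2,
                      groups=c_primary, bias=False),
            nn.BatchNorm2d(c_ghost), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)     # fuse intrinsic + ghost maps
```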

2.4.2. G_CSP 1_X Substitution

Based on the lightweight advantage of the Ghost module, this paper proposes a G_CSP1_X module for YOLOv5s. Ghost convolution is introduced into the residual structure (Resunit) of the CSP1_X module to construct the G_CSP1_X module, and the CSP1_X module in the backbone network is replaced by G_CSP1_X to reduce the amount of computation in the network. The improved G_CSP1 module is shown in Figure 9.
A GhostResunit is used to replace the original Resunit in the CSP1_X module. The GhostResunit first performs channel reduction through ordinary convolution to generate two groups of feature maps; Ghost convolution then replaces the ordinary convolution of the original Resunit, and the resulting feature maps are merged with the other group along the channel dimension to form the final feature map. Figure 10 shows the network structure of the YOLOv5s-KE algorithm based on Ghost optimization.

2.5. Channel Pruning

In convolutional neural networks there are usually many redundant parameters, which do not substantially contribute to the recognition ability of the model and may even reduce its detection performance. Removing these redundant parameters can improve the detection speed of the network. Channel pruning can remove redundant channels while maintaining the original convolution structure, preserving accuracy while improving detection efficiency. Therefore, this paper adopts channel pruning as the model lightweighting strategy. The pruning process consists of three stages: first, sparsity training; then the pruning operation; and finally fine-tuning of the network. The specific process is shown in Figure 11.
Channel evaluation is a crucial step in the channel pruning process; it measures the contribution of each channel in the convolutional neural network to the performance of the model, making it possible to identify the channels that have a significant impact on accuracy, detection capability, or classification performance. In the sparsity training phase, this paper focuses on the BN layers of the YOLOv5s-KE network in order to identify and weaken the weights that have little impact on the final performance, preparing the network for pruning. The importance of each channel is evaluated using the scaling factor γ of the BN layer as an indicator.
The scaling factor γ of the BN layer is learned in training, and the closer the value of γ is to zero, the less important the channel is. When pruning, we can determine which channels have the least impact on the network performance according to the value of γ , to reduce the complexity and computational cost of the model.
By applying L1-norm regularization to the scaling factors of the BN layers, part of the factors are driven toward zero and the channels are sparsified. This process helps the network automatically identify which channels are unimportant; the scaling factors of those secondary channels gradually decrease during training. The loss function of the method is defined as:
$$L = \sum_{(x, y)} l(f(x, W), y) + \lambda \sum_{\gamma \in \Gamma} g(\gamma)$$
where $(x, y)$ denotes an input and its label from the training set, the first term is the loss function of the YOLOv5s network, $W$ denotes the model weights, $\lambda$ balances the detection loss and the regularization term, and $g(\gamma)$ is the regularization penalty on the scaling factor. The experiment uses $g(\gamma) = |\gamma|$, i.e., the L1 norm, to sparsify the scaling factors and assist feature selection.
The channel pruning strategy sets a different pruning rate for each convolution layer. Assuming that the network model has L layers, we can define a vector containing the pruning ratio of each layer, $(P_1, P_2, \ldots, P_L)$, where $P_1$ represents the pruning ratio of the first layer. The pruning rate of each layer is randomly selected in a preset range between 0 and R, where R is the maximum pruning rate. Through this random selection, the statistics of the BN layers can be effectively fine-tuned, improving the prediction accuracy of the performance potential of the subnet. Figure 12 shows the process of channel pruning.
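The sketch below illustrates the two ingredients described above: an L1 penalty on BN scaling factors used during sparsity training, and a γ-threshold rule for marking redundant channels. It is a simplified, hypothetical illustration: it uses a single global threshold derived from one pruning rate, whereas the paper’s strategy draws per-layer rates at random; the function names are placeholders.

```python
# Illustrative BN-gamma-based sparsity penalty and channel selection.
import torch
import torch.nn as nn

def sparsity_penalty(model: nn.Module, lam: float = 1e-4):
    # L1 penalty on BN scaling factors, added to the detection loss during sparsity training.
    return lam * sum(m.weight.abs().sum()
                     for m in model.modules() if isinstance(m, nn.BatchNorm2d))

def bn_prune_masks(model: nn.Module, prune_rate: float = 0.45):
    # Collect |gamma| from every BatchNorm2d layer in the network.
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    # Channels whose gamma falls below the prune_rate quantile are marked redundant.
    threshold = torch.quantile(gammas, prune_rate)
    return {name: m.weight.detach().abs() > threshold        # True = keep this channel
            for name, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)}
```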

2.6. CARAFE Upsampling Operator

The CARAFE module consists of a kernel prediction module and a content-aware reassembly module. The kernel prediction module consists of three parts: a channel compressor, a content encoder, and a kernel normalizer; these sub-modules work together to generate the upsampling kernel. Specifically, the channel compressor reduces the number of channels of the input feature map with a 1 × 1 convolution, thereby reducing computation. The content encoder generates an upsampling kernel of size $k_{up} \times k_{up}$; since a different kernel is used at each location, the upsampling kernels form a four-dimensional tensor of shape $H \times W \times k_{up} \times k_{up}$, where H and W are the sizes after upsampling and $k_{up}$ is the size of the upsampling kernel. A $k_{encoder} \times k_{encoder}$ convolution layer is applied to the output of the channel compressor to predict the upsampling kernels; a larger encoder kernel yields a larger receptive field. The kernel normalizer normalizes each upsampling kernel with SoftMax. The content-aware reassembly module associates each position of the output feature map with a corresponding position in the input feature map and takes the $k_{up} \times k_{up}$ region centered at that point, performing a dot product with the upsampling kernel predicted for that point to obtain the final output feature map. The content-aware reassembly module lets the relevant feature information in a local area receive more attention, so the reassembled feature map carries more semantic information than the original feature map.
As shown in Figure 13, given an input feature map X of size $C \times H \times W$ and an upsampling rate $\sigma$, CARAFE generates a new feature map of size $C \times \sigma H \times \sigma W$, denoted $X'$. For each point $l'$ on the new feature map, there is a corresponding point $l$ on the feature map X. CARAFE is calculated as follows:
$$W_{l'} = \Psi(N(X_l, k_{encoder}))$$
$$X'_{l'} = \Phi(N(X_l, k_{up}), W_{l'})$$
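The following is a minimal, illustrative CARAFE-style upsampler in PyTorch showing how the compressed features predict per-position kernels that reassemble local neighborhoods. The compressed channel width, encoder kernel size, and class name are assumptions for illustration; optimized implementations (for example in mmcv) differ in detail.

```python
# Minimal content-aware reassembly upsampling sketch (CARAFE-style).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CARAFE(nn.Module):
    def __init__(self, c, c_mid=64, k_up=5, k_enc=3, scale=2):
        super().__init__()
        self.k_up, self.scale = k_up, scale
        self.compress = nn.Conv2d(c, c_mid, 1)                        # channel compressor
        self.encode = nn.Conv2d(c_mid, (scale * k_up) ** 2, k_enc,
                                padding=k_enc // 2)                   # content encoder

    def forward(self, x):
        b, c, h, w = x.shape
        s, k = self.scale, self.k_up
        # Predict one k_up x k_up kernel per output position and normalize it (kernel normalizer)
        kernels = F.pixel_shuffle(self.encode(self.compress(x)), s)   # (b, k*k, s*h, s*w)
        kernels = F.softmax(kernels, dim=1)
        # Gather the k_up x k_up neighborhood of each source position, then map it to the
        # sigma x sigma output positions it serves (nearest-neighbor correspondence)
        patches = F.unfold(x, k, padding=k // 2).view(b, c * k * k, h, w)
        patches = F.interpolate(patches, scale_factor=s, mode="nearest")
        patches = patches.view(b, c, k * k, s * h, s * w)
        # Content-aware reassembly: weighted sum of each neighborhood with its kernel
        return (patches * kernels.unsqueeze(1)).sum(dim=2)
```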

3. Results

3.1. Insulator Dataset

In this paper, the Chinese power line insulator dataset (CPLID) is used in the experiment. The CPLID dataset covers 600 normal insulator images and 247 insulator images with defects taken in various scenes. The quality of aerial images may be affected by environmental conditions, such as changes in lighting and shooting angles, resulting in problems such as blurring or lack of clarity. To prevent overfitting, this paper uses data enhancement to increase the number of images in the CPLID dataset to 3600, and divides them into a training set and test set according to a ratio of 8:2.
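As a small illustration of the split described above (not the authors’ script), the augmented image list can be divided 8:2 as follows; the function name and seed are placeholders.

```python
# Illustrative 8:2 train/test split of an image path list.
import random

def split_dataset(image_paths, train_ratio=0.8, seed=0):
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)        # deterministic shuffle for reproducibility
    cut = int(len(paths) * train_ratio)
    return paths[:cut], paths[cut:]           # (training set, test set)
```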

3.2. Data Enhancement

In this paper, the Mosaic data enhancement method is used to increase the diversity of input samples, improve the accuracy of the model in identifying complex and small targets, and reduce the dependence of the model on the input images. The method randomly selects four insulator images, initializes a canvas, and randomly chooses a center point; the four images are then stitched together around this point to create a new composite image. In the new image, the ground-truth box information of the target objects is preserved, so the method effectively expands the sample dataset, increases the complexity of network training, and thus enhances the generalization performance of the model. Figure 14 presents a schematic illustration of the Mosaic data enhancement.
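A simplified sketch of the stitching step is given below. It only illustrates how four images are placed around a random center point; remapping of the ground-truth boxes, scaling jitter, and the exact alignment used by YOLOv5 are omitted, and the fill value and sizes are assumptions.

```python
# Simplified Mosaic stitching sketch (box remapping omitted).
import random
import numpy as np

def mosaic(images, out_size=640):
    # images: list of four (out_size, out_size, 3) uint8 arrays
    canvas = np.full((out_size * 2, out_size * 2, 3), 114, dtype=np.uint8)
    # Random center point inside the middle half of the canvas
    cx = random.randint(out_size // 2, out_size * 3 // 2)
    cy = random.randint(out_size // 2, out_size * 3 // 2)
    regions = [(0, 0, cx, cy), (cx, 0, out_size * 2, cy),
               (0, cy, cx, out_size * 2), (cx, cy, out_size * 2, out_size * 2)]
    for img, (x1, y1, x2, y2) in zip(images, regions):
        h = min(y2 - y1, img.shape[0])        # crop each tile to fit its quadrant
        w = min(x2 - x1, img.shape[1])
        canvas[y1:y1 + h, x1:x1 + w] = img[:h, :w]
    return canvas
```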

3.3. Experiment Configuration

Training the network model requires computer equipment with powerful computing power, especially when dealing with large-scale datasets. Moreover, GPU is more efficient than CPU in processing deep learning and target detection tasks, which can significantly improve the speed of network training. Therefore, all the computing tasks in this paper are carried out on the GPU of a personal computer to speed up the calculation. Based on the Windows 10 operating system, the experimental environment is equipped with 64 GB memory, Intel (R) Core (TM) i9-13900K CPU, and NVIDIA GeForce RTX 3090 Ti GPU. Python 3.7, Anaconda3, and other deep learning-related libraries are installed in the environment for image processing (some main libraries such as torch: 2.3.1+cu121, timm: 0.6.7, urllib3: 2.2.2, pip: 24.0, numpy:1.26.4), and PyCharm (2024.1.6) are used for code writing and debugging, as well as the generation of result maps.
In this paper, the improved YOLOv5s model is used to train the network. The initial learning rate is set to 0.002 with a decay rate of 0.005, and the momentum is set to 0.8. The training batch size is 14 images, and the test program is started after 400 iterations.

3.4. Experiment Comparison and Analysis

To further confirm the feasibility of the proposed YOLOv5s-KE algorithm for insulator defect detection, the YOLOv5s-KE network is compared with other mainstream target detection network models on the CPLID dataset. The comparison algorithms include YOLOv3, YOLOv4, Faster RCNN, SSD, YOLOv5s, and YOLOv7, as shown in Table 2.
It can be seen from Table 2 that the mAP value of the YOLOv5s-KE model is the highest, reaching 92.3%. Compared with the one-stage algorithms YOLOv3, YOLOv4, SSD, and YOLOv7, the mAP value is higher by 9.2%, 6.8%, 9.3%, and 2.1%, respectively, and compared with the two-stage algorithm Faster RCNN, the detection accuracy is higher by 2.8%. Compared with the original YOLOv5s, the detection accuracy is improved by 5.2%; the detection speed is slightly reduced but still fully meets the speed requirements of the detection task.
To intuitively compare the original YOLOv5s and YOLOv5s-KE methods, a P–R curve is drawn; a larger area under the curve indicates better model performance. As shown in Figure 15, the YOLOv5s-KE network model demonstrates superior detection performance compared to the original YOLOv5s.
To further verify the effectiveness of the lightweight YOLOv5s-KE model in detecting insulator defects, it is compared with YOLOv3-tiny [16,17], YOLOv4-tiny [18,19], YOLOv7-tiny, and the original YOLOv5s network; the experimental results are shown in Table 3.
It can be seen from Table 3 that the mAP value of the Prune-YOLOv5s network model obtained with the proposed method is 91.1%, and the FPS reaches 94.3 frames/s, which is 30.1 frames/s higher than that of the original YOLOv5s network and the best among the mainstream lightweight algorithms. Figure 16 shows a visualization of the detection results on the CPLID dataset.

3.5. Ablation Experiments

In order to verify the effectiveness of the improved anchor module, ECA attention module and EIOU loss function proposed in this paper, ablation experiments are carried out on the CPLID dataset. Our goals were as follows:
  • Analyze the effectiveness of the anchor box selection method of YOLOv5s, as shown in Table 4. The improved anchor selection method is the experimental result of using the K-means++ to re-select the anchor box.
  • ECA attention mechanism was added to the backbone and neck network of YOLOv5s, respectively, and the network with ECA attention mechanism was compared with the original network model. The experimental results are shown in Table 4.
  • In YOLOv5s, the EIoU loss function is compared with the original loss function CIoU. The visualization results of the loss function are shown in Figure 17, and the experimental results are shown in Table 4.
As shown in Figure 17, in the training of YOLOv5s network, when CIoU loss function is used, the network needs 30 epochs to show the convergence trend. In contrast, the network using the EIoU loss function has shown a stable convergence property at the 20th epoch, which indicates that the EIoU loss function has a faster convergence rate.
It can be seen from Table 4 that the performance of the original model can be improved by various improvement measures. Among them, the attention mechanism ECA is the most prominent, which improves the accuracy of insulator defect detection from 87.1% to 90.4%, an increase of 3.3%. This shows that after adding the ECA attention module to the backbone and neck, the model’s ability to extract and detect target features has been enhanced, which effectively improves the ability to identify insulator defects in the complex backgrounds and reduces the situation of missed detection and false detection. After using K-means++ reclustering, the mAP of the anchor model is improved by 2.5%, which indicates that the anchor frame size suitable for insulator defect detection can be better found, and the accuracy of defect detection is improved. Finally, the mAP value of the model with EIOU loss function is increased by 2.2%, which further proves that the EIOU loss function not only accelerates the convergence of the model, but also enhances the ability to locate the insulator defects, thus effectively improving the detection accuracy of the model. Compared with the original YOLOv5s model, the YOLOv5s-KE model proposed in this paper improves the mAP value by 5.2%, and has good performance in insulator defect detection, which provides a certain reference value for the follow-up study of insulator defect detection.
To verify the effectiveness of the improved Ghost convolution module, channel pruning, and CARAFE upsampling operator proposed in this paper, ablation experiments are carried out on the CPLID dataset.
  • Ghost convolution is introduced into YOLOv5s-KE network to construct G_CSP1 and compared with the original YOLOv5s-KE network to verify the effectiveness of the improved module.
  • On the basis of all the above improvement measures, set different cutting rates to find the cutting rate that best meets the performance requirements.
  • On the basis of all the above improvements, the CARAFE operator is used to replace the original upsampling operator and compared with the original network to verify the effectiveness of the improved module.
In this paper, the G_CSP1 module is mainly used to improve the backbone network. In this experiment, the parameters, accuracy, and inference speed of the model before and after replacement with the G_CSP1 module are compared to verify the effectiveness of the optimized algorithm. The experimental results are shown in Table 5.
It can be seen from Table 5 that the YOLOv5s-KE algorithm integrated with the G_CSP1 module reduces the model size from 36.8 M to 28.6 M (a reduction of 8.2 M), reduces the mAP value from 92.3% to 92.0% (a drop of 0.3%), and increases the FPS from 53.8 to 66.3 (a gain of 12.5 FPS). The experimental results show that introducing the G_CSP1 module reduces the number of model parameters, compresses the model size, and improves inference speed at the cost of only a small loss in accuracy.
After conventional training of the improved model, the weight file undergoes sparsity training. The sparsity training epochs are consistent with conventional training, and a sparsity rate of 0.0001 is used to aggregate the model parameters, which facilitates subsequent pruning. After sparsity training, to ensure that the model is not pruned excessively, which would affect its detection accuracy, the channel pruning rate is set to 40%, 45%, 50%, and 55%, and tests at these pruning rates are carried out to find the rate best suited to the model. The experimental results are shown in Table 6.
It can be seen from Table 6 that, as the pruning rate increases, the number of parameters, the model size, and the detection accuracy gradually decrease while the FPS gradually increases. When the pruning rate reaches 45%, the model size is 15.7 M, 12.9 M lower than that of the unpruned model, and the mAP is 91.4%, an acceptable reduction of 0.6%; the FPS reaches 83.7 frames per second, an increase of 17.4 frames per second. When the pruning rate reaches 50%, compared with 45%, the FPS increases by only 1.2 frames per second while the mAP of the model decreases by a further 0.3%. Based on these considerations, 45% is selected as the final pruning rate of the model.
The upsampling part of the improved model is replaced by the CARAFE operator. The feasibility of the method is verified by comparing the parameters, accuracy, and inference speed of the model before and after the replacement. The experimental results are shown in Table 7.
As shown in Figure 18, Figure 18a shows the YOLOv5s-KE test results and Figure 18b the Prune-YOLOv5s test results. The recognition accuracy of the Prune-YOLOv5s algorithm is slightly lower than that of the YOLOv5s-KE algorithm before the lightweight adjustment, but the detection results show that the detection accuracy of the Prune-YOLOv5s model remains reliable overall, with no missed detections even for the relatively blurry insulator defect targets in the images. Overall, it still achieves a high detection accuracy.

4. Conclusions

In this study, we focus on analyzing UAV aerial images of insulators. To address the challenges posed by complex backgrounds and small defect sizes in the images, we adopt the YOLOv5s network as the foundational model for enhancing detection accuracy and speed. By customizing and refining the network based on the specific characteristics of insulator images, we aim to optimize its performance in defect detection. The main conclusions are as follows:
  • Aiming at the problems of small size and complex background of insulator defects in aerial images, the algorithm improves the detection accuracy. The anchor is improved by using the K-means++ algorithm to generate an anchor frame which is more suitable for insulator defects. Secondly, the ECA-Net attention mechanism module is embedded in the backbone network and the neck network to enhance the target recognition ability of the model in complex environments and improve detection accuracy. Finally, the EIoU loss function is used to replace the original CIoU loss function to optimize the width and height loss of the prediction frame and the target frame, and improve the ability to locate small-size defects.
  • To solve the problems of the large number of parameters and slow detection speed when the model is deployed in embedded and mobile devices, Ghost convolution is used to improve the CSP1 module, and the G-CSP1 module is constructed to replace the CSP1 structure in the original backbone network and optimize the convolution operation. Through the channel pruning technology, the redundant channels in the network are deleted, the model volume is reduced, and the detection speed is improved. In addition, the original upsampling operator is replaced by the CARAFE operator, which reduces the model parameters and computational cost while maintaining the upsampling effect and reducing the complexity of the model.
  • The experimental results on the CPLID dataset show that the proposed algorithm has a significant improvement in the accuracy of insulator defect detection, and the average detection accuracy is improved from 87.1% to 92.3%, which can well meet the needs of insulator defect detection. At the same time, the detection speed is increased from 53.8 frames per second to 94.3 frames per second, and the lightweight model reduces the size of the model from 36.8 M to 9.6 M, with almost no loss of detection accuracy.
The insulator defect detection method proposed in this paper can accurately detect the defects of insulator images with sufficient illumination. However, in order to detect insulator defects anytime and anywhere, follow-up studies need to strengthen the ability to detect images with insufficient illumination. In order to improve the robustness and generalization ability of the detection algorithm, it is necessary to expand the dataset and increase the multi-angle images and multi-spectral data of the same target.

Author Contributions

Conceptualization, G.F., X.A., Q.F., S.G.; methodology, G.F., X.A.; software, G.F., X.A.; validation G.F., X.A., Q.F.; writing—review and editing X.A., Q.F., S.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The CPLID is available online at https://github.com/InsulatorData/InsulatorDataSet (accessed on 1 September 2022).

Conflicts of Interest

Author Qi Fang was employed by the company State Grid HLJ Electric Power Transmission and Transformation Engineering Co. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Figure 1. YOLOv5s-KE framework for insulator defect detection.
Figure 2. K-means++ algorithm clustering result.
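For readers who want to reproduce the anchor clustering illustrated in Figure 2 (and summarised in Table 1), a minimal sketch using scikit-learn's K-means++ initialisation is given below. The function name cluster_anchors, the placeholder file box_wh.txt, and the assumption that ground-truth (width, height) pairs have already been extracted from the labels are all illustrative; this is not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_anchors(box_wh: np.ndarray, n_anchors: int = 9, seed: int = 0) -> np.ndarray:
    """Cluster ground-truth (width, height) pairs into anchor box sizes.

    box_wh: array of shape (N, 2) holding box widths and heights in pixels.
    Returns the n_anchors cluster centres sorted by area (small to large).
    """
    km = KMeans(n_clusters=n_anchors, init="k-means++", n_init=10, random_state=seed)
    km.fit(box_wh)
    centres = km.cluster_centers_
    # Sort by area so the first three anchors can go to the small-object head, and so on.
    return centres[np.argsort(centres[:, 0] * centres[:, 1])]

# Usage (box_wh.txt is a placeholder path with one "width height" pair per line):
# anchors = cluster_anchors(np.loadtxt("box_wh.txt"), n_anchors=9)
```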
Figure 3. Diagram of efficient channel attention (ECA) module.
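The ECA module shown in Figure 3 reduces each feature map to a channel descriptor by global average pooling, applies a 1-D convolution across channels, and gates the input with a sigmoid. A compact PyTorch sketch of such a layer follows; the fixed kernel size of 3 is a simplification (the adaptive kernel-size rule from the ECA paper is omitted), and this is an illustrative module rather than the exact one embedded in YOLOv5s-KE.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention: GAP -> 1-D conv across channels -> sigmoid gate."""
    def __init__(self, k_size: int = 3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.pool(x)                                   # (N, C, 1, 1) channel descriptor
        y = self.conv(y.squeeze(-1).transpose(1, 2))       # treat channels as a 1-D sequence
        y = self.sigmoid(y.transpose(1, 2).unsqueeze(-1))  # back to (N, C, 1, 1)
        return x * y                                       # re-weight the input channels
```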
Figure 4. Improved YOLOv5s network structure diagram.
Figure 5. EIoU schematic diagram.
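As sketched in Figure 5, EIoU extends 1 − IoU with a centre-distance penalty plus separate width and height penalties, each normalised by the smallest enclosing box. A hedged PyTorch version for boxes in (x1, y1, x2, y2) format is shown below; the function name eiou_loss and the box layout are assumptions made for illustration.

```python
import torch

def eiou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """EIoU loss for boxes given as (x1, y1, x2, y2), shape (..., 4)."""
    # Intersection and union
    ix1 = torch.max(pred[..., 0], target[..., 0])
    iy1 = torch.max(pred[..., 1], target[..., 1])
    ix2 = torch.min(pred[..., 2], target[..., 2])
    iy2 = torch.min(pred[..., 3], target[..., 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    w1, h1 = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w2, h2 = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)

    # Smallest enclosing box (width, height, squared diagonal)
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Centre-distance, width and height penalties
    rho2 = ((pred[..., 0] + pred[..., 2]) - (target[..., 0] + target[..., 2])) ** 2 / 4 \
         + ((pred[..., 1] + pred[..., 3]) - (target[..., 1] + target[..., 3])) ** 2 / 4
    return 1 - iou + rho2 / c2 + (w1 - w2) ** 2 / (cw ** 2 + eps) + (h1 - h2) ** 2 / (ch ** 2 + eps)
```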
Figure 6. Lightweight YOLOv5s-KE insulator defect detection algorithm framework.
Figure 7. Traditional convolution diagram.
Figure 8. Ghost convolution diagram.
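Figure 8 captures the idea behind Ghost convolution: an ordinary convolution produces only part of the output channels (the "intrinsic" features), and cheap depthwise operations generate the remaining "ghost" features before the two sets are concatenated. The PyTorch sketch below uses a channel ratio of 2, a 5 × 5 depthwise kernel, and SiLU activations as plausible defaults; these choices are illustrative and not necessarily those used inside the paper's G_CSP1 module.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost convolution: standard conv for intrinsic maps + cheap depthwise conv for ghost maps."""
    def __init__(self, c_in: int, c_out: int, k: int = 1, s: int = 1, ratio: int = 2):
        super().__init__()
        c_prim = c_out // ratio  # intrinsic channels from the ordinary convolution
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_prim, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_prim), nn.SiLU())
        self.cheap = nn.Sequential(
            nn.Conv2d(c_prim, c_out - c_prim, 5, 1, 2, groups=c_prim, bias=False),
            nn.BatchNorm2d(c_out - c_prim), nn.SiLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)  # intrinsic + ghost feature maps
```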
Figure 9. Improved G_CSP1 module structure diagram.
Figure 10. YOLOv5s-KE network structure diagram based on Ghost optimization.
Figure 11. Pruning flow chart.
Figure 12. Channel pruning. (a) Original net. (b) Compact net.
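The pruning pipeline in Figures 11 and 12 follows the usual BN-scaling-factor recipe: after sparsity training with an L1 penalty on the batch-normalization scale factors γ, channels whose |γ| falls below a global threshold are discarded and the compact network is fine-tuned. The sketch below only computes per-layer keep-masks; it assumes sparsity training has already been performed and is an illustration, not the authors' pruning code.

```python
import torch
import torch.nn as nn

def bn_prune_masks(model: nn.Module, prune_rate: float = 0.45) -> dict:
    """Per-BatchNorm2d keep-masks based on |gamma| against a global threshold."""
    gammas = torch.cat([m.weight.data.abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.quantile(gammas, prune_rate)   # e.g. 0.45 -> remove roughly 45% of channels

    masks = {}
    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d):
            keep = m.weight.data.abs() > threshold
            if not keep.any():                        # never prune a layer to zero channels
                keep[m.weight.data.abs().argmax()] = True
            masks[name] = keep                        # True = channel survives pruning
    return masks
```

After the masks are computed, a compact network containing only the surviving channels is rebuilt and fine-tuned, as shown in Figure 12b.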
Figure 13. The overall framework of CARAFE.
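CARAFE, whose framework appears in Figure 13, replaces fixed upsampling with content-aware reassembly: a lightweight branch predicts a normalised k_up × k_up kernel for every output position, and each output value is the kernel-weighted sum of the corresponding source neighbourhood. The self-contained PyTorch sketch below uses hyper-parameters commonly quoted for CARAFE (c_mid = 64, k_enc = 3, k_up = 5); it is an illustrative re-implementation, not the exact operator integrated into the pruned model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CARAFE(nn.Module):
    """Content-aware upsampling: kernel prediction + neighbourhood reassembly."""
    def __init__(self, c: int, scale: int = 2, c_mid: int = 64, k_enc: int = 3, k_up: int = 5):
        super().__init__()
        self.scale, self.k_up = scale, k_up
        self.compress = nn.Conv2d(c, c_mid, 1)
        self.encode = nn.Conv2d(c_mid, scale * scale * k_up * k_up, k_enc, padding=k_enc // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # 1) Predict reassembly kernels: (N, k_up^2, sH, sW), normalised over the window
        k = F.pixel_shuffle(self.encode(self.compress(x)), self.scale)
        k = F.softmax(k, dim=1)
        # 2) Gather k_up x k_up neighbourhoods of the source feature map
        patches = F.unfold(x, self.k_up, padding=self.k_up // 2)        # (N, c*k_up^2, h*w)
        patches = patches.view(n, c * self.k_up ** 2, h, w)
        patches = F.interpolate(patches, scale_factor=self.scale, mode="nearest")
        patches = patches.view(n, c, self.k_up ** 2, h * self.scale, w * self.scale)
        # 3) Content-aware reassembly: weighted sum over each window
        return (patches * k.unsqueeze(1)).sum(dim=2)
```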
Figure 14. Mosaic augmentation schematic: (a–d) are four images from the dataset that are spliced together; (b) has its contrast changed, (c) is rotated, and (d) is mirrored.
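As a rough illustration of the Mosaic augmentation in Figure 14, four equally sized tiles can be stitched into a 2 × 2 grid after simple per-tile transforms. In the NumPy sketch below the contrast gain, the 180° rotation (chosen so the tile shape is preserved), and the horizontal mirror are arbitrary example transforms, and the label-box remapping that a real pipeline needs is omitted.

```python
import numpy as np

def mosaic4(a: np.ndarray, b: np.ndarray, c: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Stitch four equally sized HxWx3 uint8 images into one 2H x 2W mosaic."""
    b = np.clip(b.astype(np.float32) * 1.4, 0, 255).astype(np.uint8)  # contrast change (example gain)
    c = np.rot90(c, 2)   # 180-degree rotation keeps the tile shape
    d = d[:, ::-1]       # horizontal mirror
    return np.vstack([np.hstack([a, b]), np.hstack([c, d])])
```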
Figure 15. P–R curves of the original YOLOv5s and YOLOv5s-KE.
Figure 16. Visual detection results on the CPLID dataset: (a) YOLOv5s; (b) YOLOv5s-KE.
Figure 17. Loss function comparison result chart.
Figure 18. Visual detection results on the CPLID dataset: (a) YOLOv5s-KE; (b) Prune-YOLOv5s-KE.
Table 1. Improved anchors.
Feature Layer | Small Size Anchor | Medium Size Anchor | Large Size Anchor
Small Feature Layer | (30, 40) | (40, 79) | (90, 60)
Medium Feature Layer | (40, 120) | (30, 180) | (165, 60)
Large Feature Layer | (40, 120) | (90, 185) | (182, 190)
Table 2. Different model test results.
Method | mAP (%) | FPS (F/s)
YOLOv3 | 83.1 | 36.0
YOLOv4 | 85.5 | 54.6
SSD | 83.0 | 40.5
Faster RCNN | 89.5 | 30.1
YOLOv5s | 87.1 | 64.2
YOLOv7 | 90.2 | 70.3
YOLOv5s-KE | 92.3 | 53.8
Table 3. The results of the experiment on CPLID.
Method | Model Size (M) | mAP (%) | FPS (F/s)
YOLOv3-tiny | 19.7 | 82.3 | 67.4
YOLOv4-tiny | 22.4 | 84.4 | 79.6
YOLOv5s | 27.1 | 87.1 | 64.2
YOLOv7-tiny | 16.3 | 87.2 | 88.5
Ours (Prune-YOLOv5s-KE) | 9.6 | 91.1 | 94.3
Table 4. Ablation test results of each module.
Method | Improved Anchor | ECA | EIoU Loss | mAP (%)
YOLOv5s | × | × | × | 87.1
Improved module 1 | ✓ | × | × | 89.6
Improved module 2 | × | ✓ | × | 90.4
Improved module 3 | × | × | ✓ | 89.3
YOLOv5s-KE | ✓ | ✓ | ✓ | 92.3
Table 5. Results of ablation experiment of G_CSP1 module.
Method | Model Size (M) | mAP (%) | FPS (F/s)
YOLOv5s-KE | 36.8 | 92.3 | 53.8
YOLOv5s-KE + G_CSP1 | 28.6 | 92.0 | 66.3
Table 6. Results of ablation experiments with different pruning rates.
Pruning Rate (%) | Model Size (M) | mAP (%) | FPS (F/s)
40 | 17.2 | 91.4 | 81.5
45 | 15.7 | 91.4 | 83.7
50 | 14.3 | 91.1 | 84.9
55 | 12.9 | 91.0 | 86.1
Table 7. Results of ablation experiment with the CARAFE upsampling operator.
Method | Model Size (M) | mAP (%) | FPS (F/s)
YOLOv5s-KE | 15.7 | 91.4 | 83.7
YOLOv5s-KE + CARAFE | 9.6 | 91.1 | 94.3