Article

Research on Insulator Defect Detection Based on an Improved MobilenetV1-YOLOv4

1 School of Electrical & Information Engineering, Anhui University of Science and Technology, Huainan 232001, China
2 School of Electrical and Opto Electronic Engineering, West Anhui University, Lu’an 237012, China
* Author to whom correspondence should be addressed.
Entropy 2022, 24(11), 1588; https://doi.org/10.3390/e24111588
Submission received: 7 October 2022 / Revised: 28 October 2022 / Accepted: 31 October 2022 / Published: 2 November 2022

Abstract

Insulators are important devices on transmission lines, and defects such as insulator bursting and string loss affect the safety of transmission lines. Traditional insulator defect detection algorithms suffer from slow detection speed and low efficiency. To improve the accuracy of insulator fault identification and the convenience of daily inspection work, we propose an insulator defect detection algorithm based on an improved MobilenetV1-YOLOv4. First, the backbone feature extraction network of YOLOv4 is replaced with the lightweight Mobilenet-V1 module. Second, the scSE attention mechanism is introduced in both the preliminary feature extraction stage and the enhanced feature extraction stage. Finally, depthwise separable convolutions replace the 3 × 3 convolutions of the enhanced feature extraction network to reduce the overall number of network parameters. The experimental results show that the weight file of the improved algorithm is 57.9 MB, 62.6% smaller than that of the MobilenetV1-YOLOv4 model; the average accuracy of insulator defect detection improves by 0.26% to 98.81%; and the detection speed reaches 190 frames per second, an increase of 37 frames per second.

1. Introduction

Insulators are special insulating devices used on overhead transmission lines to protect the power transmission lines. There are three main types of insulators: ceramic, glass, and composite. Because insulators are exposed outdoors for long periods, problems such as bursting, string loss, and contamination can occur; these are likely to cause short-circuit faults between lines and significantly harm the normal power supply and the service life of transmission lines. Therefore, insulator defect detection is important to ensure the normal operation of power lines.
With the development of artificial intelligence and machine learning technology, many scholars have studied object detection algorithms for insulator defect detection. These can be divided into two main categories: two-stage algorithms, which first generate candidate regions and then extract features with a CNN, and one-stage algorithms, which directly complete feature extraction, classification, and regression prediction [1]. The first category includes RCNN [2], Fast RCNN [3], and Faster RCNN [4]; the second includes YOLO (You Only Look Once) [5], SSD (Single Shot Multibox Detector) [6], YOLOv2 [7], YOLOv3 [8], YOLOv4 [9], etc. The authors of [10] proposed an improved YOLOv5 insulator breakage detection algorithm that improved accuracy by adding the ECA-Net (efficient channel attention) attention mechanism and by using the Soft-NMS algorithm to re-score the original candidate boxes. This strengthened the detection of overlapping targets and could meet engineering application requirements. In [11], an insulator defect detection method was proposed that adopted an RPN network for feature extraction to detect tiny defects on insulators. This method had high detection accuracy but low detection speed, only 12.8 frames per second, and could not be deployed on mobile devices. The study in [12] adopted ResNet as the backbone network and equipped it with batch-normalized convolutional block attention modules (BN-CBAM), which better utilized channel information and strengthened the contribution of different channels to the feature maps. However, this model had a large number of parameters, so mobile deployment would require more computing resources and memory. The authors of [13] proposed a defect detection algorithm deployed on an FPGA.
This algorithm was highly efficient, its power consumption was only 10 W, and it was deployed in hardware. Ref. [14] proposed an insulator fault detection algorithm based on a convolutional neural network that first cascaded a detection network and a fault classification network to locate insulators and classify their faults, then detected faulty insulators in high-resolution images with complex backgrounds, and finally completed fault detection. Unfortunately, its dataset was small and the range of identifiable faults was limited. Ref. [15] proposed a YOLOv5 insulator defect detection method; although it achieved fast detection, it sacrificed accuracy for speed, reaching only 81% accuracy, and was therefore unsuitable for practical applications. In [16], the authors proposed an insulator detection method based on YOLOv2 that could process 25 images per second, but its accuracy was only 88%, so some defects may have been missed. Ref. [17] proposed an insulator defect detection method based on YOLOv5s with a new residual module designed to reduce network parameters and extract more useful features; the method achieved high detection speed at the cost of accuracy. Ref. [18] proposed an improved YOLOv3-based insulator detection method with a new feature pyramid network, which achieved high detection accuracy for insulator defects; however, the network could not learn independently. Ref. [19] proposed insulator detection in aerial images based on an improved YOLOv3 that uses multi-level feature mapping modules; this model had better detection accuracy but increased network complexity. Ref. [20] proposed an insulator fault detection model based on an improved YOLOv3 using an SPP network and a multi-scale prediction network.
This method performed relatively well for insulator detection, but the weight file of the improved model reached 255 MB and occupied a significant amount of memory. Ref. [21] proposed an insulator fault detection model based on an improved Faster RCNN that adopted a feature pyramid network to locate and identify insulators in images with complex backgrounds; however, this method had high hardware requirements. Although the methods in [19,20,21] improved detection accuracy, they were difficult to deploy on actual devices for practical applications. In [22], an image texture segmentation algorithm was proposed for aerial insulator images. However, its recognition results were unsatisfactory because, although insulator textures differ from most backgrounds, they are very similar to those of leaves. Ref. [23] proposed a multi-scale feature insulator detection algorithm that first extracted local features of insulators and then trained on these local features to obtain spatial order features, improving the robustness of the algorithm. However, this algorithm was highly complex, time-consuming, and unable to satisfy real-time requirements. Ref. [24] proposed an insulator surface damage identification method based on a multi-scale residual neural network that could effectively detect insulators against simple backgrounds but did not address insulators against complex backgrounds. Ref. [25] proposed an improved U-Net insulator image segmentation method based on an attention mechanism, which embedded the ECA-Net attention mechanism in the encoding stage of U-Net to improve the semantic feature extraction ability of the model and thus the accuracy of insulator image segmentation. Ref. [26] proposed an insulator self-explosion detection method based on YOLOv4 that first fused the shallow feature map into the feature pyramid and then adopted the SENet structure to improve the recognition accuracy of the network.
This method achieved high accuracy with only a slight decrease in detection speed. Ref. [27] proposed a new insulator defect detection algorithm combining deep learning and morphological detection, which adopted a residual network to extract the morphological features of insulators and then used pixel clustering for image segmentation to establish a mathematical model of insulator defects. The detection accuracy of this method was good, but the network parameters were more complicated. Ref. [28] proposed an insulator string detection method based on an improved YOLOv5 that adopted an EIOU loss function and the AFK-MC2 anchor generation method. Ref. [29] proposed an insulator defect detection method based on an improved lightweight YOLOv4, reaching a detection accuracy of 93.81% and a detection speed of 53 frames per second. Ref. [30] proposed an improved insulator detection algorithm based on YOLOX that adopted an improved SIoU loss function to speed up model convergence and embedded the ECA attention mechanism; its detection accuracy reached 97.18% at a detection speed of 71 frames per second. Although both two-stage and one-stage object detection algorithms have achieved great success in insulator recognition, limits on storage space and power consumption have led many scholars to study lightweight models with reduced complexity and higher detection speed that can be deployed on small devices. A traditional CNN has high memory and computational demands, which prevents it from running on mobile and embedded devices. MobileNet [31,32,33] was proposed by a Google team in 2017 as a lightweight CNN for mobile terminals and embedded devices. It uses depthwise convolution to reduce the amount of computation and the number of parameters, and the inverted residual structure introduced in its later versions reduces memory consumption during inference.
In 2019, Wang et al. [34] proposed CSPNet, which integrates gradient changes into the feature map from beginning to end, reducing the amount of computation and improving the running speed of the model without reducing its accuracy, so it can be used for mobile deployment. In 2020, Huawei proposed another lightweight network, GhostNet [35]. Observing that redundancy in feature maps is important, its authors designed a Ghost module that generates these redundant feature maps with fewer computations, allowing the network to be deployed on mobile terminals. To make the detection model smaller, we apply such a lightweight network to insulator detection.
To further improve insulator detection accuracy and speed, reduce algorithm complexity, and make the algorithm easy to deploy on hardware (such as drones), we propose in this paper a lightweight insulator defect detection algorithm based on an improved MobilenetV1-YOLOv4. It reconstructs the backbone feature extraction network of YOLOv4 with the lightweight Mobilenet-V1 module, introduces the scSE attention mechanism to enhance feature extraction, and adopts depthwise separable convolution to reduce the overall number of parameters in the network. The effectiveness of the proposed algorithm was verified through simulation and experimental analysis.

2. Fundamentals of MobilenetV1-YOLOv4

YOLOv4 is a high-accuracy, one-stage object detection algorithm developed from YOLOv3 and published by Bochkovskiy et al. in 2020. The MobilenetV1-YOLOv4 algorithm is a lightweight network model derived from YOLOv4. It consists of three parts: the backbone feature extraction network, the feature pyramid network (PANet and SPP), and the classification and regression layer (YOLO Head). Figure 1 shows the network structure of MobilenetV1-YOLOv4.

2.1. Backbone Feature Extraction Network

The backbone feature extraction network of YOLOv4 is CSPDarknet53. It adopts the CSPNet structure, which splits the original residual block into two parts: a main body and a branch. The main body is still a residual block, while the branch is a large residual edge that is connected to the main body after only a small amount of processing. This keeps gradients from vanishing as the network becomes deeper. To reduce network complexity, CSPDarknet53 is replaced by MobilenetV1, which uses depthwise separable convolution to decompose a standard convolution into two steps, depthwise convolution and pointwise convolution, significantly reducing the size of the model. Finally, to avoid losing image feature information through repeated convolutions, the backbone still outputs feature maps at three scales: 13 × 13 × 1024, 26 × 26 × 512, and 52 × 52 × 256.

2.2. Feature Pyramid Network

The feature pyramid network further extracts features from the three outputs of the preliminary feature extraction network and consists of two main parts: SPP and PANet. The output of the last feature layer of the MobilenetV1 network, after convolution, is passed to SPP max pooling, which has four parallel branches with pooling kernel sizes of 13 × 13, 9 × 9, 5 × 5, and 1 × 1. The branch outputs are then concatenated along the channel dimension and sent to the PANet network. PANet, originally proposed for instance segmentation, extracts features both top-down and bottom-up: upsampling and downsampling change the feature-map sizes so that maps of different scales can be concatenated, and features are extracted repeatedly.
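The SPP step above can be sketched in pure Python. This is a minimal illustration, assuming stride-1 max pooling with "same" padding (so every branch keeps the input's spatial size) and channel-wise concatenation; feature maps are plain nested lists indexed [channel][row][col] rather than framework tensors, and the function names are hypothetical.

```python
def max_pool_same(fmap, k):
    """k x k max pooling with stride 1 and 'same' padding (k odd).

    Clipping the window at the borders is equivalent to padding with -inf,
    so the output has the same height and width as the input.
    """
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    return [[max(fmap[yy][xx]
                 for yy in range(max(0, y - r), min(h, y + r + 1))
                 for xx in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)]
            for y in range(h)]


def spp(channels, kernel_sizes=(13, 9, 5, 1)):
    """Pool every channel with every kernel size and concatenate along channels.

    Output order: all channels pooled 13 x 13, then 9 x 9, 5 x 5, and 1 x 1
    (the 1 x 1 branch is an identity copy of the input).
    """
    return [max_pool_same(ch, k) for k in kernel_sizes for ch in channels]


# Usage: a 2-channel 13 x 13 input becomes an 8-channel 13 x 13 output.
inp = [[[float(y * 13 + x) for x in range(13)] for y in range(13)] for _ in range(2)]
out = spp(inp)
```

Because all four branches preserve the spatial size, the only effect of the concatenation is to multiply the channel count by four, which is why SPP can sit in front of PANet without any resizing.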

2.3. Classification and Regression Layer

For each of the three outputs of the PANet network, a 3 × 3 convolution is applied first, followed by a 1 × 1 convolution, yielding three prediction results of different sizes. These raw predictions do not directly give the positions of the final prediction boxes, so they must be decoded to obtain the final prediction information.
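The decoding step can be illustrated with the standard YOLOv3/YOLOv4 box parameterization: a sigmoid offset inside the grid cell for the centre and an exponential rescaling of the anchor for the size. This is a sketch under stated assumptions, not the authors' code: the 416 × 416 input (stride 32 on the 13 × 13 map), the anchor size, and the function names are hypothetical, and YOLOv4's optional grid-sensitivity scaling is omitted.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_cell(raw, cell_x, cell_y, anchor_w, anchor_h, stride):
    """Decode one anchor's raw head output [tx, ty, tw, th, to, *class_logits].

    Returns (centre x, centre y, width, height) in input-image pixels,
    the objectness score, and per-class probabilities.
    """
    tx, ty, tw, th, to, *cls = raw
    bx = (sigmoid(tx) + cell_x) * stride   # centre offset inside the grid cell
    by = (sigmoid(ty) + cell_y) * stride
    bw = anchor_w * math.exp(tw)           # anchor size rescaled by exp(t)
    bh = anchor_h * math.exp(th)
    return (bx, by, bw, bh), sigmoid(to), [sigmoid(c) for c in cls]


def head_channels(num_anchors, num_classes):
    """Channels of the final 1 x 1 convolution: per anchor, 4 box terms,
    1 objectness score, and one score per class."""
    return num_anchors * (5 + num_classes)


# Usage: an all-zero prediction in cell (6, 6) of the 13 x 13 map lands at the
# cell centre with exactly the anchor's size (anchor 142 x 110 is hypothetical).
box, obj, cls = decode_cell([0.0] * 6, 6, 6, 142, 110, 32)
```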

3. Improved MobilenetV1-YOLOv4 Algorithm

In this paper, YOLOv4 is adopted as the base network. The original CSPDarknet53 network is replaced with the lighter Mobilenet-V1 network to speed up detection. The ordinary 3 × 3 convolutions in the PANet and SPP networks are replaced by depthwise separable convolutions to further reduce the number of network parameters. To maintain network accuracy, the scSE attention mechanism is added to the three outputs of the preliminary feature extraction and to the upsampling results in the enhanced feature extraction network. Finally, the convolution block after SPP is modified to five layers. The improved network structure is shown in Figure 2.
As shown in Figure 2, the CBR module uses a 1 × 1 convolution followed by normalization, which mitigates vanishing gradients and speeds up convergence, and an activation function, which increases the nonlinearity of the network and helps it learn better. The DBRCBR module contains two types of convolution, depthwise convolution and pointwise convolution, which together form a depthwise separable convolution; this replaces the traditional 3 × 3 convolution. The Conv2D in the DBRCBR module is the pointwise convolution, which is actually a 1 × 1 convolution: it freely changes the number of output channels and fuses the channels of the feature map produced by the depthwise convolution. The scSE attention mechanism enhances the feature information that is needed and suppresses the feature information that is not.

3.1. scSE Attention Mechanism

The concurrent spatial and channel squeeze and excitation (scSE) attention module combines two modules: spatial squeeze and channel excitation (cSE), and channel squeeze and spatial excitation (sSE).
The cSE attention module is shown in Figure 3. First, global pooling is performed on the feature map to compress its information: the height and width are compressed to 1 × 1, so the feature map of dimension [C, H, W] becomes a vector z of dimension [C, 1, 1]. Then, two fully connected layers process this vector; the first has fewer neurons than the number of input channels, and the second has as many neurons as there are input channels. The resulting C-dimensional vector is normalized with the Sigmoid activation function so that each value lies between zero and one, giving the weight of each channel of the input feature layer. Finally, these weights are multiplied with the original feature map to obtain the calibrated result, which improves the ability of the network to extract channel features.
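The cSE steps above can be sketched on a tiny feature map stored as nested lists [C][H][W]. This is a minimal illustration, not the paper's implementation: global average pooling is assumed for the "global pooling" step, and the ReLU between the two fully connected layers follows the usual squeeze-and-excitation design but is an assumption, since the text does not name that activation. The weights and function names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def cse(fmap, w1, b1, w2, b2):
    """Channel attention (cSE) on a [C][H][W] feature map.

    w1, b1: first FC layer, C -> C_r (fewer neurons than channels)
    w2, b2: second FC layer, C_r -> C (one neuron per channel)
    Returns the recalibrated map and the per-channel weights.
    """
    # Squeeze: global average pooling turns [C, H, W] into a C-vector z.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excitation: FC + ReLU (assumed), then FC + Sigmoid -> weights in (0, 1).
    hidden = [max(0.0, sum(w * zi for w, zi in zip(wrow, z)) + b)
              for wrow, b in zip(w1, b1)]
    weights = [sigmoid(sum(w * hv for w, hv in zip(wrow, hidden)) + b)
               for wrow, b in zip(w2, b2)]
    # Recalibration: scale every value of channel k by weights[k].
    out = [[[v * weights[k] for v in row] for row in ch] for k, ch in enumerate(fmap)]
    return out, weights


# Usage: two 2 x 2 channels with mean values 1 and 3, reduced to C_r = 1.
fmap = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
out, w = cse(fmap, w1=[[-1.0, 1.0]], b1=[0.0], w2=[[1.0], [-1.0]], b2=[0.0, 0.0])
```

With these hypothetical weights the hidden unit equals 2, so the first channel is kept with weight σ(2) ≈ 0.88 and the second is damped to σ(−2) ≈ 0.12.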
Figure 4 shows the sSE attention module, which squeezes the feature map along the channel dimension and excites it spatially. First, channel compression is applied to the input feature map with a 1 × 1 convolution that has a single output channel, changing the feature map from [C, H, W] to [1, H, W]. The Sigmoid activation function is then applied to obtain a new spatial weight map, which is multiplied with the original feature map to calibrate the spatial information. This operation emphasizes the spatial positions of relevant features and suppresses irrelevant ones, improving the learning of spatial feature information.
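The sSE steps above reduce to a per-pixel weighted sum over channels followed by a Sigmoid. A minimal pure-Python sketch, assuming a [C][H][W] nested-list feature map; the kernel values and function names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def sse(fmap, kernel, bias=0.0):
    """Spatial attention (sSE) on a [C][H][W] feature map.

    kernel: C weights of a 1 x 1 convolution with a single output channel.
    Returns the recalibrated map and the [H][W] spatial weight map.
    """
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    # Squeeze along channels: [C, H, W] -> [1, H, W] via the 1 x 1 convolution.
    q = [[sum(kernel[k] * fmap[k][y][x] for k in range(c)) + bias for x in range(w)]
         for y in range(h)]
    # Excitation: one weight in (0, 1) per spatial position.
    s = [[sigmoid(v) for v in row] for row in q]
    # Recalibration: every channel is multiplied by the same spatial map.
    out = [[[fmap[k][y][x] * s[y][x] for x in range(w)] for y in range(h)] for k in range(c)]
    return out, s


# Usage: a 2-channel 1 x 2 map; the left pixel is emphasized, the right suppressed.
fmap = [[[2.0, -2.0]], [[0.0, 0.0]]]
out, s = sse(fmap, kernel=[1.0, 1.0])
```

Unlike cSE, which produces one weight per channel, sSE produces one weight per pixel that is shared across all channels.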
The scSE module adds the outputs of the cSE and sSE modules element-wise to combine the two kinds of recalibrated information. In this way, scSE enhances both channel and spatial information and can improve the detection accuracy of the whole network, as shown in Figure 5.
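The three attention modules described above can be summarized compactly. In this sketch, U is the input feature map, σ the Sigmoid function, and ⊙ element-wise multiplication broadcast over the missing dimensions; global average pooling, the reduction ratio r of the first fully connected layer, and the ReLU δ between the two layers are assumptions, since the text only states that the first layer has fewer neurons than input channels.

```latex
\begin{aligned}
\mathbf{z} &= \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}\mathbf{U}_{:,h,w} \in \mathbb{R}^{C}, &
\hat{\mathbf{U}}_{cSE} &= \mathbf{U} \odot \sigma\!\left(\mathbf{W}_2\,\delta(\mathbf{W}_1\mathbf{z})\right),\\
\mathbf{q} &= \mathbf{W}_s * \mathbf{U} \in \mathbb{R}^{H \times W}, &
\hat{\mathbf{U}}_{sSE} &= \mathbf{U} \odot \sigma(\mathbf{q}),\\
\hat{\mathbf{U}}_{scSE} &= \hat{\mathbf{U}}_{cSE} + \hat{\mathbf{U}}_{sSE}, &
\mathbf{W}_1 &\in \mathbb{R}^{\frac{C}{r}\times C},\quad \mathbf{W}_2 \in \mathbb{R}^{C\times\frac{C}{r}},\quad \mathbf{W}_s \in \mathbb{R}^{1\times C}.
\end{aligned}
```

The cSE branch rescales whole channels, the sSE branch rescales individual pixels, and the sum keeps both recalibrations at the input's [C, H, W] shape.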

3.2. Depthwise Separable Convolution Module

Depthwise separable convolution combines depthwise convolution and pointwise convolution; it reduces the number of parameters to be calculated and the model size. The process is shown in Figure 6. First, depthwise convolution is applied to the input RGB image, with each convolutional kernel responsible for exactly one channel, so the number of kernels equals the number of channels in the previous-layer feature map. This operation yields three feature maps. However, because the number of feature maps after depthwise convolution equals the number of input channels, depthwise convolution alone cannot increase the number of feature maps. It also convolves each input channel separately, so it cannot exploit the feature information of different channels at the same spatial position. Pointwise convolution is therefore applied to these feature maps to generate new ones. Pointwise convolution is computed in much the same way as conventional convolution, with a kernel size of 1 × 1 × C, where C is the number of channels after depthwise convolution; here, each kernel has three channels. Each kernel is multiplied with the three feature maps produced by depthwise convolution and the products are accumulated to obtain a new feature map, so the number of new feature maps equals the number of pointwise kernels. Combining depthwise and pointwise convolutions in this way produces the new features while significantly reducing the number of network parameters and increasing detection speed.
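The two stages described above can be written out directly. This is a minimal pure-Python sketch, assuming stride 1, no padding, and no bias, normalization, or activation (the DBRCBR module in the paper adds normalization and activation around these convolutions); the function names are hypothetical.

```python
def depthwise_conv(x, kernels):
    """Depthwise convolution: one k x k kernel per input channel.

    x: [C][H][W], kernels: [C][k][k] -> output [C][H-k+1][W-k+1].
    """
    out = []
    for channel, ker in zip(x, kernels):
        k = len(ker)
        out_h, out_w = len(channel) - k + 1, len(channel[0]) - k + 1
        out.append([[sum(channel[r + i][c + j] * ker[i][j] for i in range(k) for j in range(k))
                     for c in range(out_w)] for r in range(out_h)])
    return out


def pointwise_conv(x, weights):
    """Pointwise (1 x 1) convolution: weights [C_out][C_in] mix channels per pixel."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wk[ch] * x[ch][r][c] for ch in range(len(x))) for c in range(w)]
             for r in range(h)] for wk in weights]


def depthwise_separable_conv(x, dw_kernels, pw_weights):
    return pointwise_conv(depthwise_conv(x, dw_kernels), pw_weights)


# Usage: a 3-channel 4 x 4 "RGB" input, three 3 x 3 depthwise kernels, and
# four pointwise kernels produce four 2 x 2 output feature maps.
x = [[[c + 1] * 4 for _ in range(4)] for c in range(3)]
dw = [[[1] * 3 for _ in range(3)] for _ in range(3)]
pw = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
y = depthwise_separable_conv(x, dw, pw)
```

The depthwise stage keeps three maps (one per input channel); only the pointwise stage changes the channel count, here from three to four.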
For ordinary convolution, the number of parameters is given by Formula (1):
$$N_{pc} = D_F \times D_F \times C_i \times C_o \tag{1}$$
where $D_F$ is the side length of the convolutional kernel, $C_i$ is the number of input channels, $C_o$ is the number of output channels, and $N_{pc}$ is the number of parameters.
The computation amount of the network is given by Formula (2):
$$N_{cc} = H_o \times W_o \times C_o \times D_F \times D_F \times C_i \tag{2}$$
where $H_o$ and $W_o$ are the height and width of the output feature map, respectively, and $N_{cc}$ is the network computation amount.
For depthwise separable convolution, the number of parameters is the sum of the parameters of the depthwise convolution and the pointwise convolution. The number of parameters of the depthwise convolution is given by Formula (3):
$$N_{p1} = D_F \times D_F \times C_i \tag{3}$$
and its computation amount by Formula (4):
$$N_{c1} = H_o \times W_o \times C_i \times D_F \times D_F \tag{4}$$
The number of parameters of the pointwise convolution is given by Formula (5):
$$N_{p2} = C_i \times C_o \tag{5}$$
and its computation amount by Formula (6):
$$N_{c2} = H_o \times W_o \times C_o \times 1 \times 1 \times C_i \tag{6}$$
Thus, the total number of parameters of depthwise separable convolution is given by Formula (7):
$$N_{pd} = N_{p1} + N_{p2} \tag{7}$$
and the total computation amount by Formula (8):
$$N_{cd} = N_{c1} + N_{c2} \tag{8}$$
Further, the ratio of the number of parameters of depthwise separable convolution to that of ordinary convolution is given by Formula (9), which shows that depthwise separable convolution needs only $\frac{1}{C_o} + \frac{1}{D_F^2}$ of the parameters of ordinary convolution:
$$\frac{N_{pd}}{N_{pc}} = \frac{D_F \times D_F \times C_i + C_i \times C_o}{D_F \times D_F \times C_i \times C_o} = \frac{1}{C_o} + \frac{1}{D_F^2} \tag{9}$$
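Formulas (1)–(9) can be checked numerically. The sketch below, with hypothetical function names, evaluates them for an illustrative layer (a 3 × 3 convolution from 512 to 1024 channels on a 13 × 13 output map; these sizes are chosen for the example, not taken from the paper's tables) and uses exact fractions to confirm the ratio in Formula (9). Biases are ignored, as in the formulas.

```python
from fractions import Fraction


def standard_conv_cost(d_f, c_i, c_o, h_o, w_o):
    params = d_f * d_f * c_i * c_o                  # Formula (1)
    macs = h_o * w_o * c_o * d_f * d_f * c_i        # Formula (2)
    return params, macs


def separable_conv_cost(d_f, c_i, c_o, h_o, w_o):
    p1, c1 = d_f * d_f * c_i, h_o * w_o * c_i * d_f * d_f   # depthwise, Formulas (3)-(4)
    p2, c2 = c_i * c_o, h_o * w_o * c_o * 1 * 1 * c_i       # pointwise, Formulas (5)-(6)
    return p1 + p2, c1 + c2                                 # Formulas (7)-(8)


def param_ratio(d_f, c_o):
    return Fraction(1, c_o) + Fraction(1, d_f * d_f)        # Formula (9)


# Illustrative layer: 3 x 3 kernel, 512 -> 1024 channels, 13 x 13 output.
p_std, m_std = standard_conv_cost(3, 512, 1024, 13, 13)
p_sep, m_sep = separable_conv_cost(3, 512, 1024, 13, 13)
```

For this layer the separable version needs 528,896 instead of 4,718,592 parameters, about 11.2% of the original, which is dominated by the $1/D_F^2 = 1/9$ term because $C_o$ is large. The computation amount shrinks by the same factor, since both counts share the $H_o \times W_o$ multiplier.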
In the improved MobilenetV1-YOLOv4 network, the ordinary 3 × 3 convolutions in the PANet and SPP networks are replaced by depthwise separable convolutions, and the trained weight file is 57.9 MB. Compared with the 244 MB weight file of the trained YOLOv4 network, this is a 76.3% reduction.

3.3. Mobilenet-V1 Network

In this paper, the original backbone network CSPDarknet53 is replaced by the lightweight Mobilenet-V1 network. Mobilenet is a lightweight network proposed by Google for devices with low computing power, such as embedded devices; its core operation is to build the main network from depthwise separable convolutional blocks. The Mobilenet-V1 network yields three preliminary effective feature layers of 52 × 52 × 256, 26 × 26 × 512, and 13 × 13 × 1024. Table 1 shows the Mobilenet-V1 structure, in which ordinary convolution is denoted by Conv, depthwise separable convolution by Conv dw, and the stride of the convolution by S. The last three layers of the original Mobilenet-V1, namely global average pooling, the fully connected layer, and Softmax, are removed.
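The three feature-layer sizes above follow from the strides of the Mobilenet-V1 body. The sketch below lists the standard MobileNetV1 layer configuration with the classification tail removed and tracks the feature-map size with "same" padding. The 416 × 416 input size and the tap positions are assumptions consistent with the 52/26/13 outputs, and the names are hypothetical.

```python
import math

# (layer type, output channels, stride); "dw" denotes a depthwise separable
# block (3 x 3 depthwise + 1 x 1 pointwise) whose stride is in the depthwise part.
MOBILENET_V1 = (
    [("conv", 32, 2), ("dw", 64, 1), ("dw", 128, 2), ("dw", 128, 1),
     ("dw", 256, 2), ("dw", 256, 1),                      # index 5: first tap
     ("dw", 512, 2)] + [("dw", 512, 1)] * 5 +             # index 11: second tap
    [("dw", 1024, 2), ("dw", 1024, 1)]                    # index 13: third tap
)


def feature_shapes(input_size=416, taps=(5, 11, 13)):
    """Return (H, W, C) after each tapped layer, assuming 'same' padding."""
    size, shapes = input_size, []
    for idx, (_, channels, stride) in enumerate(MOBILENET_V1):
        size = math.ceil(size / stride)
        if idx in taps:
            shapes.append((size, size, channels))
    return shapes


shapes = feature_shapes()
```

The three taps sit at overall strides 8, 16, and 32, which is why a 416 × 416 input yields 52 × 52 × 256, 26 × 26 × 512, and 13 × 13 × 1024.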

4. Experimental Platform Construction and Training

4.1. Experimental Platform

The experiments were conducted under Windows 11 on a computer with the following configuration: a 12th Gen Intel(R) Core(TM) CPU, 32 GB of memory, an NVIDIA GeForce RTX 3080 Ti GPU with 12 GB of video memory, Anaconda3, Python 3.7, and the TensorFlow 2.5 deep learning framework.

4.2. Data Collection and Processing

The dataset used in this paper comes from a Baidu open-source dataset and contains 4147 images of diverse insulators, including defective insulators with various types of defects. To enable the network model to learn the characteristics and location information of insulators and to improve the accuracy of the model output, the insulator defects in each image are annotated with the LabelImg software, which produces .xml label files. Bounding boxes are drawn as tightly as possible around the defects to reduce the influence of the background. The annotations are saved in PASCAL VOC format, and the generated XML files are stored in a pre-created folder. The ratio of the training set plus validation set to the test set is 9:1, and the ratio of the training set to the validation set is 9:1. Some dataset images are shown in Figure 7.
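The two-level 9:1 split can be sketched as follows. The text does not say how fractional counts are rounded or how images are shuffled, so the truncation and the seeded shuffle below are assumptions, and the function name is hypothetical; with 4147 images this rule gives 3358 training, 374 validation, and 415 test images.

```python
import random


def split_dataset(image_ids, trainval_percent=0.9, train_percent=0.9, seed=0):
    """Split ids so that (train + val) : test = 9 : 1 and train : val = 9 : 1.

    Counts are truncated with int(), so any remainder goes to the later split.
    """
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)              # reproducible shuffle
    n_trainval = int(len(ids) * trainval_percent)
    n_train = int(n_trainval * train_percent)
    trainval, test = ids[:n_trainval], ids[n_trainval:]
    return trainval[:n_train], trainval[n_train:], test


train, val, test = split_dataset(range(4147))
```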

4.3. Model Training

A total of four groups of comparative experiments were performed in this study. The first group compares YOLOv4, MobilenetV1-YOLOv4, and the improved MobilenetV1-YOLOv4; the second compares lightweight models, namely MobilenetV1-YOLOv4, MobilenetV2-YOLOv4, MobilenetV3-YOLOv4, and Ghostnet-YOLOv4; and the third compares the improved networks, namely the improved MobilenetV1-YOLOv4, improved MobilenetV2-YOLOv4, improved MobilenetV3-YOLOv4, and improved Ghostnet-YOLOv4. In the fourth group, the network obtained by adding the scSE attention mechanism to the depthwise separable convolutional network is compared with the current YOLOv5 algorithm.
The idea of transfer learning is applied in training: the pretraining weights are obtained by training on the VOC dataset. Each network is trained for a total of 250 epochs. For the first 50 epochs, the backbone of the model is frozen, which improves training efficiency, prevents the pretrained weights from being destroyed, and keeps video memory usage low; the batch size is set to 32 during this frozen stage. The backbone is then unfrozen for the remaining epochs, which increases video memory usage, so the batch size is set to 16. The learning rate of the model is adaptively adjusted according to the current batch size, within a range from a minimum of 0.0001 to a maximum of 0.01. The trained network outputs three YOLO Heads of different sizes, and the final prediction boxes are obtained with non-maximum suppression (NMS). The essence of NMS is to search for local maxima and suppress non-maximum elements. First, a confidence threshold is set for the candidate boxes; the threshold used in this study is 0.5. The candidate boxes are then sorted in descending order of confidence. For each identified target class, the bounding box A with the highest confidence is selected and retained, and the IoU between A and each remaining box B is calculated; if the IoU exceeds the threshold, B is removed. This step is repeated until all boxes of the target class have been processed, and the retained target boxes are output.
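The NMS procedure described above can be sketched as a greedy, per-class loop. This is an illustration under stated assumptions, not the authors' code: boxes are (x1, y1, x2, y2) corner tuples, detections are (box, score, class_id) tuples, the text's value of 0.5 is used as the confidence threshold, and the IoU threshold, which the text does not state separately, defaults to a placeholder of 0.3.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter = (max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
             * max(0.0, min(a[3], b[3]) - max(a[1], b[1])))
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, conf_threshold=0.5, iou_threshold=0.3):
    """Greedy per-class NMS over (box, score, class_id) tuples."""
    kept = []
    for cls in sorted({d[2] for d in detections}):
        # Drop low-confidence boxes, then sort the rest by confidence, descending.
        cands = sorted((d for d in detections if d[2] == cls and d[1] >= conf_threshold),
                       key=lambda d: d[1], reverse=True)
        while cands:
            best = cands.pop(0)                  # box A: highest remaining confidence
            kept.append(best)
            # Remove every remaining box B whose IoU with A exceeds the threshold.
            cands = [d for d in cands if iou(best[0], d[0]) <= iou_threshold]
    return kept


dets = [((0, 0, 10, 10), 0.9, 0), ((1, 1, 11, 11), 0.8, 0), ((20, 20, 30, 30), 0.7, 0),
        ((0, 0, 10, 10), 0.95, 1), ((5, 5, 15, 15), 0.4, 0)]
kept = nms(dets)
```

In the usage example, the 0.8 box overlaps the 0.9 box with IoU 81/119 ≈ 0.68 and is suppressed, the 0.4 box fails the confidence filter, and the class-1 box is kept because suppression only acts within a class.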
In the training process, the mosaic data augmentation method is used: four images are randomly read from the dataset at a time, scaled or cropped, and combined into a new image. The new image has a different background, which enriches the spatial semantic information and enhances the generalization ability of the model. Label smoothing is used so that the predictions are better calibrated and more accurate, and the cosine annealing algorithm is used to help the network avoid falling into a local optimum. Formula (10) gives the cosine annealing learning rate:
$$\eta_t = \eta_{\min}^{i} + \frac{1}{2}\left(\eta_{\max}^{i} - \eta_{\min}^{i}\right)\left(1 + \cos\left(\frac{T_{cur}}{T_i}\pi\right)\right) \tag{10}$$
where $i$ is the index of the current annealing cycle; $\eta_{\min}^{i}$ and $\eta_{\max}^{i}$ are the minimum and maximum learning rates, respectively; $T_{cur}$ is the number of epochs performed since the start of the current cycle; and $T_i$ is the total number of epochs in the current cycle.
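Formula (10) translates directly into a per-epoch schedule. In this sketch the learning-rate bounds 0.0001 and 0.01 come from the training description, while the 50-epoch cycle length with warm restarts and the function names are hypothetical, since the text does not give the cycle length.

```python
import math


def cosine_annealing_lr(t_cur, t_i, eta_min, eta_max):
    """Learning rate from Formula (10) at epoch t_cur of a cycle lasting t_i epochs."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


def lr_schedule(total_epochs, cycle_len, eta_min, eta_max):
    """Per-epoch learning rates, restarting at eta_max every cycle_len epochs."""
    return [cosine_annealing_lr(e % cycle_len, cycle_len, eta_min, eta_max)
            for e in range(total_epochs)]


sched = lr_schedule(250, 50, 1e-4, 1e-2)   # 250 epochs as in the text; cycle length assumed
```

Within each cycle the rate falls smoothly from $\eta_{\max}$ toward $\eta_{\min}$; the jump back to $\eta_{\max}$ at each restart is what is meant to help the optimizer leave a poor local optimum.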

4.4. Loss Function

The loss function of the improved MobilenetV1-YOLOv4 consists of three parts, namely the position regression loss, the object confidence loss, and the classification loss, as shown in Formula (11):
$$Loss = Loss_{loc} + Loss_{obj} + Loss_{cls} \tag{11}$$
Formula (12) gives the position regression loss:
$$Loss_{loc} = \lambda_{coord} \sum_{i=0}^{s \times s} \sum_{j=0}^{M} I_{ij}^{obj} \left(2 - w_i \times h_i\right)\left(1 - CIOU\right) \tag{12}$$
where CIOU, β, and ν are computed by Formulas (13)–(15), respectively:
$$CIOU = IOU - \frac{\rho^2(b, b^{gt})}{c^2} - \beta\nu \tag{13}$$
$$\beta = \frac{\nu}{(1 - IOU) + \nu} \tag{14}$$
$$\nu = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2 \tag{15}$$
Formula (16) gives the object confidence loss:
$$Loss_{obj} = -\sum_{i=0}^{s \times s} \sum_{j=0}^{M} I_{ij}^{obj}\left[C_i \log(\hat{C}_i) + (1 - C_i)\log(1 - \hat{C}_i)\right] - \lambda_{noobj} \sum_{i=0}^{s \times s} \sum_{j=0}^{M} I_{ij}^{noobj}\left[C_i \log(\hat{C}_i) + (1 - C_i)\log(1 - \hat{C}_i)\right] \tag{16}$$
Formula (17) gives the classification loss:
$$Loss_{cls} = -\sum_{i=0}^{s \times s} \sum_{j=0}^{M} I_{ij}^{obj} \sum_{c \in classes}\left[P_i(c)\log(\hat{P}_i(c)) + (1 - P_i(c))\log(1 - \hat{P}_i(c))\right] \tag{17}$$
In Formulas (12)–(17), $\lambda_{coord}$ and $\lambda_{noobj}$ are weight coefficients; $s \times s$ is the number of grid cells; M is the number of anchor boxes per grid cell; $w_i$ and $h_i$ are the width and height of the prediction box; IOU is the intersection over union of the prediction box and the ground-truth box; $\rho(b, b^{gt})$ is the Euclidean distance between the center points $b$ and $b^{gt}$ of the prediction box and the ground-truth box; $c$ is the diagonal length of the smallest enclosing region that contains both boxes; $\beta$ is a positive trade-off parameter; $\nu$ measures the consistency of the aspect ratio; $\hat{C}_i$ is the predicted confidence and $C_i$ is the actual confidence; $\hat{P}_i(c)$ and $P_i(c)$ are the predicted and actual probabilities of class c; $I_{ij}^{obj}$ indicates that the jth prediction box of the ith grid cell is responsible for predicting a target; and $I_{ij}^{noobj}$ indicates that the jth prediction box of the ith grid cell does not predict a target. Figure 8 shows the loss curves of the improved MobilenetV1-YOLOv4. The smaller the loss, the better the robustness of the model.
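Formulas (13)–(15) and the per-box factor of Formula (12) can be sketched for a single box pair. Boxes are assumed to be (centre x, centre y, width, height) tuples, and w and h in the (2 − w × h) factor are assumed to be normalized to [0, 1]; λ_coord and the sums over grid cells and anchors are left out. The function names are hypothetical.

```python
import math


def ciou(pred, gt):
    """CIoU between two (cx, cy, w, h) boxes, following Formulas (13)-(15)."""
    def corners(b):
        return b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2

    px1, py1, px2, py2 = corners(pred)
    gx1, gy1, gx2, gy2 = corners(gt)
    inter = (max(0.0, min(px2, gx2) - max(px1, gx1))
             * max(0.0, min(py2, gy2) - max(py1, gy1)))
    iou = inter / (pred[2] * pred[3] + gt[2] * gt[3] - inter)
    # rho^2: squared centre distance; c^2: squared diagonal of the enclosing box.
    rho2 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    c2 = (max(px2, gx2) - min(px1, gx1)) ** 2 + (max(py2, gy2) - min(py1, gy1)) ** 2
    # nu: aspect-ratio consistency (15); beta: trade-off weight (14).
    nu = 4 / math.pi ** 2 * (math.atan(gt[2] / gt[3]) - math.atan(pred[2] / pred[3])) ** 2
    beta = nu / ((1 - iou) + nu) if nu > 0 else 0.0
    return iou - rho2 / c2 - beta * nu


def box_loss(pred, gt, w, h):
    """One term of Formula (12) without lambda_coord: (2 - w*h) * (1 - CIoU)."""
    return (2 - w * h) * (1 - ciou(pred, gt))


same = (5.0, 5.0, 4.0, 2.0)
shifted = (6.0, 5.0, 4.0, 2.0)   # same size, centre moved by 1: IoU = 0.6, c^2 = 29
```

Unlike plain IoU, CIoU stays informative for non-overlapping boxes: the centre-distance term makes it negative and still sensitive to how far apart the boxes are. The (2 − w × h) factor gives small boxes a larger weight.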
As shown in Figure 8, the losses on both the training set and the validation set decrease markedly. After 200 epochs, the loss gradually converges to a fixed value, and by 250 epochs it is essentially unchanged. This indicates that the improved network converges quickly and trains well.

5. Experimental Results and Discussion

In the experiments, precision (P), recall (R), mean average precision (mAP), and frame rate (FPS) are used as the main evaluation indicators of the proposed algorithm. They are defined as follows:
$$P = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
$$AP = \int_{0}^{1} P(R)\,dR$$
$$mAP = \frac{1}{n}\sum_{i=1}^{n} AP_i$$
$$f_{FPS} = \frac{N}{t}$$
Here, TP is the number of correctly predicted positive samples, FP is the number of samples incorrectly predicted as positive, and FN is the number of positive samples incorrectly predicted as negative (missed detections); n is the number of object detection categories, N is the number of detected images, and t is the detection time. The intersection over union (IoU), that is, the area of the intersection of the predicted box and the ground-truth box divided by the area of their union, is used to decide whether a prediction is correct, and its threshold is set to 0.5 in this experiment. Table 2 provides the experimental results of the first group, containing YOLOv4, MobilenetV1-YOLOv4, and the improved MobilenetV1-YOLOv4.
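The evaluation metrics defined above can be computed as sketched below. The sketch assumes that detections have already been matched to ground truth at the 0.5 IoU threshold, so each detection arrives as a (confidence, is_true_positive) pair. AP is taken as the area under the precision envelope with all-point interpolation, as in PASCAL VOC 2010 and later; the paper does not state which interpolation it uses. The function names are hypothetical.

```python
def precision_recall(tp, fp, fn):
    return tp / (tp + fp), tp / (tp + fn)


def average_precision(scored, num_gt):
    """AP for one class from (confidence, is_true_positive) pairs."""
    scored = sorted(scored, key=lambda s: s[0], reverse=True)
    tp = fp = 0
    rec, prec = [], []
    for _, is_tp in scored:
        tp += is_tp
        fp += not is_tp
        rec.append(tp / num_gt)
        prec.append(tp / (tp + fp))
    # Sentinel points, then make precision non-increasing from right to left.
    mrec = [0.0] + rec + [1.0]
    mpre = [0.0] + prec + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas; steps where recall does not change contribute zero.
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


def mean_ap(ap_per_class):
    return sum(ap_per_class) / len(ap_per_class)


def fps(num_images, seconds):
    return num_images / seconds


# Two ground-truth defects; detections ranked TP, FP, TP.
ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_gt=2)
```

In the example, precision is 1 up to recall 0.5 and the envelope is 2/3 from recall 0.5 to 1, so AP = 0.5 × 1 + 0.5 × 2/3 = 5/6.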
As can be seen from Table 2, the recall of the improved MobilenetV1-YOLOv4 is 21.7% higher than that of YOLOv4, and its detection speed reaches 190 frames/s, which may be due to the addition of the scSE attention mechanism and the depthwise separable convolution. Compared with the YOLOv4 algorithm, the detection speed of the improved MobilenetV1-YOLOv4 also increases by 2.6 times and its detection precision improves by 4.14%; compared with MobilenetV1-YOLOv4, it has both higher detection speed and higher detection precision; and its model weight is significantly smaller than those of YOLOv4 and MobilenetV1-YOLOv4. This is because scSE automatically learns the effective features of an image in both the spatial and channel dimensions, suppresses useless redundant features, and better retains image edge information. More useful features are then obtained through feature concatenation and convolution, so the improved algorithm identifies insulator defects better.
Table 3 compares the improved MobilenetV1-YOLOv4 with MobilenetV1-YOLOv4 and the lightweight algorithms MobilenetV2-YOLOv4, MobilenetV3-YOLOv4, and Ghostnet-YOLOv4.
As can be seen from Table 3, the recall of the improved MobilenetV1-YOLOv4 is higher than that of the other lightweight algorithms, and its mAP is higher than that of all of them except MobilenetV2-YOLOv4 (98.81% vs. 98.95%). Its detection speed is also the highest. Although MobilenetV1-YOLOv4, MobilenetV2-YOLOv4, MobilenetV3-YOLOv4, and Ghostnet-YOLOv4 use lightweight backbones, the 3 × 3 convolutions in their enhanced feature extraction networks are still standard convolutions, and the resulting large number of network parameters leads to slower detection and larger model weights. The improved network is composed mainly of depthwise separable convolutions and 1 × 1 convolutions, which improves the real-time performance of the algorithm, while the scSE attention mechanism helps keep the accuracy from declining.
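The parameter savings from replacing a standard 3 × 3 convolution with a depthwise separable one can be checked with simple arithmetic. The sketch below counts weights only (bias and batch normalization ignored), and the channel pairs (128→256, 256→512, 512→1024) are assumed values typical of the 3 × 3 convolutions at the three scales of a YOLOv4-style neck, not figures reported in this paper.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (bias and batch norm ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Assumed (input, output) channel pairs of the 3 x 3 convolutions at the three neck scales
for c_in, c_out in [(128, 256), (256, 512), (512, 1024)]:
    std = standard_conv_params(3, c_in, c_out)
    dws = depthwise_separable_params(3, c_in, c_out)
    print(f"{c_in}->{c_out}: standard={std:,} separable={dws:,} ratio={dws / std:.3f}")

# 128->256: standard=294,912 separable=33,920 ratio=0.115
# 256->512: standard=1,179,648 separable=133,376 ratio=0.113
# 512->1024: standard=4,718,592 separable=528,896 ratio=0.112
```

The ratio equals 1/C_out + 1/k², which approaches 1/9 for 3 × 3 kernels with many output channels; this roughly ninefold reduction per layer is consistent with the much smaller model weight of the improved network.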
Table 4 shows the third group of comparative experiments, in which the scSE attention mechanism is added to every network. In addition, in each network the 3 × 3 convolutions in the five convolution layers that follow the SPP module are replaced with depthwise separable convolutions.
As can be seen from Table 4, the improved MobilenetV1-YOLOv4 still achieves the best detection accuracy and detection speed among the improved lightweight networks. The improved Ghostnet-YOLOv4, MobilenetV2-YOLOv4, and MobilenetV3-YOLOv4 have fewer network parameters (model weights of 43.8–48.9 MB versus 57.9 MB), which may explain their lower accuracy. Overall, the improved MobilenetV1-YOLOv4 has the highest detection speed, reaching an mAP of 98.81% at 190 frames/s, and thus significantly improves insulator defect detection.
Table 5 compares different numbers and placements of scSE modules. In the first configuration (2 scSEs), the scSE attention mechanism is added only after the upsampling results; in the second (3 scSEs), scSE is added after the preliminary feature extraction; the full improved model uses 5 scSEs. We argue that embedding scSE attention modules into different parts of the network produces different results. The preliminarily extracted feature maps carry limited semantic information, but they retain the low- and mid-level texture and contour information of the target, which is very important for object detection. Embedding the scSE attention module after the preliminary extraction at the three scales can therefore better enhance the spatial and channel features of the target in the feature maps. In the PANet structure, by contrast, the feature maps carry richer semantic information, have larger receptive fields, and are smaller in scale, so the scSE module is less able to distinguish important spatial and channel features in these highly fused, small-scale feature maps.
In this experiment, as the number of scSE modules increases, mAP rises (98.07%, 98.18%, and 98.81%) while detection speed falls (206, 196, and 190 frames/s), because each additional module adds parameters to the same network. Compared with YOLOv5, the improved MobilenetV1-YOLOv4 has lower accuracy (mAP 98.81% vs. 99.67%), but its detection speed (190 vs. 109 frames/s) and model weight (57.9 vs. 81.8 MB) are still better.
Figure 9a–d show the P-R curves of the above four groups of experiments; the insulator defect class is labeled "1" in the plots.
In a P-R curve, recall (R) is plotted on the horizontal axis and precision (P) on the vertical axis, so the curve directly shows the trade-off between the precision and recall of the detector. The area under the curve can therefore be used to compare algorithms: a larger area indicates better overall performance. Figure 9a–c show that the curve of the improved MobilenetV1-YOLOv4 encloses a large area, indicating good detection accuracy. The green line represents YOLOv5, which encloses the largest area because it has the highest accuracy.
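The area under the P-R curve corresponds to the AP integral in the evaluation metrics. The sketch below computes it from a ranked list of detections using all-point interpolation, the scheme used by the PASCAL VOC evaluation since 2010; the paper does not state which interpolation it uses, so this is an illustrative choice, and the function names are hypothetical. Each detection is assumed to have already been labeled TP or FP at the 0.5 IoU threshold.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], is_tp: Sequence[bool], num_gt: int) -> float:
    """Area under the P-R curve with all-point interpolation."""
    if num_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:                      # walk down the ranked detections
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Sentinel points at recall 0 and recall 1
    r = [0.0] + recalls + [1.0]
    p = [0.0] + precisions + [0.0]
    # Precision envelope: precision never increases as recall grows
    for k in range(len(p) - 2, -1, -1):
        p[k] = max(p[k], p[k + 1])
    # Sum rectangle areas wherever recall changes
    return sum((r[k] - r[k - 1]) * p[k] for k in range(1, len(r)))


def mean_average_precision(ap_per_class: Sequence[float]) -> float:
    """mAP: mean of the per-class AP values over n classes."""
    return sum(ap_per_class) / len(ap_per_class)


# Three detections for one class with two ground-truth boxes: TP, FP, TP
print(average_precision([0.9, 0.8, 0.7], [True, False, True], num_gt=2))
```

In the example, the P-R points are (0.5, 1.0), (0.5, 0.5), and (1.0, 2/3); after the envelope step the area is 0.5 × 1.0 + 0.5 × 2/3 = 5/6. With a single defect class (n = 1), as in this paper, mAP equals the AP of that class.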
Figure 10 shows the insulator defect prediction results. Both large and small insulator defects in the figure are detected, with no missed detections. In addition, in each image the confidence score of the improved algorithm is higher than that of YOLOv4, which is consistent with its higher detection accuracy. When a defect lies at the edge of the image, the improved algorithm also detects it better than YOLOv4.

6. Conclusions

In this paper, we propose an insulator defect detection algorithm based on an improved MobilenetV1-YOLOv4. Comparative experiments with other algorithms demonstrate the advantages of the proposed model, which achieves good detection accuracy while maintaining high detection speed.
(1) The insulator defect detection algorithm based on the improved MobilenetV1-YOLOv4 reaches an mAP of 98.81% at a detection speed of 190 frames/s, which is 116, 37, 51, 61, and 72 frames/s faster than YOLOv4, MobilenetV1-YOLOv4, MobilenetV2-YOLOv4, MobilenetV3-YOLOv4, and Ghostnet-YOLOv4, respectively. Compared with the original YOLOv4, the mAP of the proposed algorithm increases from 94.41% to 98.81%.
(2) We obtain a new network structure by replacing the backbone network, adding the scSE attention mechanism, and using depthwise separable convolutions to reduce the number of network parameters. The experimental results show that the proposed algorithm can effectively detect insulator defects with improved detection accuracy and speed.
(3) The proposed lightweight detection algorithm offers clear advantages for real-time detection of insulator defects, a direction worthy of further research. In the future, we plan to study other insulator faults such as bursting and contamination. As new object detection algorithms emerge, the proposed algorithm will need further optimization. Hardware deployment has not yet been completed and will take some time. We also plan to apply the model to other detection tasks to verify its real-time performance and generalization ability.

Author Contributions

Methodology, S.X., J.D., Y.H., L.L. and T.H.; conceived and designed the experiments, S.X. and J.D.; performed the experiments, J.D. and Y.H.; conceptualization and investigation, J.D. and L.L.; analyzed the data, J.D. and T.H.; wrote the paper, S.X. and J.D.; funding acquisition, Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 61772033 and the Research and the Development Fund of Institute of Environmental Friendly Materials and Occupational Health, Anhui University of Science and Technology, grant number ALW2021YF03.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

The data that support the findings of this study are available on request from the corresponding author. The paper has obtained the written informed consent of the patient to publish the paper.

Data Availability Statement

The data that support the findings of this study are available on request from the corresponding author.

Acknowledgments

The authors want to thank the editor and anonymous reviewers for their valuable suggestions for improving this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Dong, W.; Liang, H.; Liu, G.; Hu, Q.; Yu, X. Review of Deep Convolution Applied to Target Detection Algorithm. J. Front. Comput. Sci. Technol. 2022, 16, 1025–1042. [Google Scholar]
  2. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision & Pattern Recognition IEEE Computer Society, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  3. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  4. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  5. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  6. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Cheng-Yang, F.; Alexander, C. Ssd: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 21–37. [Google Scholar]
  7. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
  8. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  9. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  10. Han, G.; He, M.; Gao, M.; Yu, J.; Liu, K.; Qin, L. Insulator Breakage Detection Based on Improved YOLOv5. Sustainability 2022, 14, 6066. [Google Scholar] [CrossRef]
  11. Wang, S.; Liu, Y.; Qing, Y.; Wang, C.; Lan, T.; Yao, R. Detection of Insulator Defects with Improved ResNeSt and Region Proposal Network. IEEE Access 2020, 8, 184841–184850. [Google Scholar] [CrossRef]
  12. Gao, Z.; Yang, G.; Li, E.; Liang, Z. Novel Feature Fusion Module-Based Detector for Small Insulator Defect Detection. IEEE Sens. J. 2021, 21, 16807–16814. [Google Scholar] [CrossRef]
  13. Yu, L.; Zhu, J.; Zhao, Q.; Wang, Z. An Efficient YOLO Algorithm with an Attention Mechanism for Vision-Based Defect Inspection Deployed on FPGA. Micromachines 2022, 13, 1058. [Google Scholar] [CrossRef]
  14. Wang, Z.; Liu, X.; Peng, H.; Zheng, L.; Gao, J.; Bao, Y. Railway Insulator Detection Based on Adaptive Cascaded Convolutional Neural Network. IEEE Access 2021, 9, 115676–115686. [Google Scholar] [CrossRef]
  15. Feng, Z.; Guo, L.; Huang, D.; Li, R. Electrical insulator defects detection method based on yolov5. In Proceedings of the 2021 IEEE 10th Data Driven Control and Learning Systems Conference (DDCLS), Suzhou, China, 14–16 May 2021; IEEE: Manhattan, NY, USA, 2021; pp. 979–984. [Google Scholar]
  16. Sadykova, D.; Pernebayeva, D.; Bagheri, M.; James, A. IN-YOLO: Real-Time Detection of Outdoor High Voltage Insulators Using UAV Imaging. IEEE Trans. Power Deliv. 2019, 35, 1599–1601. [Google Scholar] [CrossRef]
  17. Liquan, Z.; Mengjun, Z.; Ying, C.; Yanfei, J. Fast Detection of Defective Insulator Based on Improved YOLOv5s. Comput. Intell. Neurosci. 2022, 2022, 8955292. [Google Scholar] [CrossRef] [PubMed]
  18. Zhang, X.; Zhang, Y.; Liu, J.; Zhang, C.; Xue, X.; Zhang, H.; Zhang, W. InsuDet: A Fault Detection Method for Insulators of Overhead Transmission Lines Using Convolutional Neural Networks. IEEE Trans. Instrum. Meas. 2021, 70, 1–12. [Google Scholar] [CrossRef]
  19. Liu, C.; Wu, Y.; Liu, J.; Sun, Z. Improved YOLOv3 Network for Insulator Detection in Aerial Images with Diverse Background Interference. Electronics 2021, 10, 771. [Google Scholar] [CrossRef]
  20. Liu, J.; Liu, C.; Wu, Y.; Xu, H.; Sun, Z. An Improved Method Based on Deep Learning for Insulator Fault Detection in Diverse Aerial Images. Energies 2021, 14, 4365. [Google Scholar] [CrossRef]
  21. Zhao, W.; Xu, M.; Cheng, X.; Zhao, Z. An Insulator in Transmission Lines Recognition and Fault Detection Model Based on Improved Faster RCNN. IEEE Trans. Instrum. Meas. 2021, 70, 1–8. [Google Scholar] [CrossRef]
  22. Wu, Q.; An, J.; Lin, B. A Texture Segmentation Algorithm Based on PCA and Global Minimization Active Contour Model for Aerial Insulator Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2012, 5, 1509–1518. [Google Scholar] [CrossRef]
  23. Liao, S.; An, J. A Robust Insulator Detection Algorithm Based on Local Features and Spatial Orders for Aerial Images. IEEE Geosci. Remote Sens. Lett. 2014, 12, 963–967. [Google Scholar] [CrossRef]
  24. She, L.; Fan, Y.; Wang, J.; Cai, L.; Xue, J.; Xu, M. Insulator Surface Breakage Recognition Based on Multiscale Residual Neural Network. IEEE Trans. Instrum. Meas. 2021, 70, 1–9. [Google Scholar] [CrossRef]
  25. Han, G.; Zhang, M.; Wu, W.; He, M.; Liu, K.; Qin, L.; Liu, X. Improved U-Net based insulator image segmentation method based on attention mechanism. Energy Rep. 2021, 7, 210–217. [Google Scholar] [CrossRef]
  26. He, H.; Huang, X.; Song, Y.; Zhang, Z.; Wang, M.; Chen, B.; Yan, G. An insulator self-blast detection method based on YOLOv4 with aerial images. Energy Rep. 2022, 8, 448–454. [Google Scholar] [CrossRef]
  27. Zhang, Z.; Huang, S.; Li, Y.; Li, H.; Hao, H. Image Detection of Insulator Defects Based on Morphological Processing and Deep Learning. Energies 2022, 15, 2465. [Google Scholar] [CrossRef]
  28. Ding, J.; Cao, H.; Ding, X.; An, C. High Accuracy Real-Time Insulator String Defect Detection Method Based on Improved YOLOv5. Front. Energy Res. 2022, 10, 889. [Google Scholar] [CrossRef]
  29. Qiu, Z.; Zhu, X.; Liao, C.; Shi, D.; Qu, W. Detection of Transmission Line Insulator Defects Based on an Improved Lightweight YOLOv4 Model. Appl. Sci. 2022, 12, 1207. [Google Scholar] [CrossRef]
  30. Han, G.; Li, T.; Li, Q.; Zhao, F.; Zhang, M.; Wang, R.; Yuan, Q.; Liu, K.; Qin, L. Improved Algorithm for Insulator and Its Defect Detection Based on YOLOX. Sensors 2022, 22, 6186. [Google Scholar] [CrossRef]
  31. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  32. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
  33. Howard, A.; Sandler, M.; Chu, G.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; Quoc, V.; et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
  34. Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 390–391. [Google Scholar]
  35. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. Ghostnet: More Features from Cheap Operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589. [Google Scholar]
Figure 1. The network structure of MobilenetV1-YOLOv4.
Figure 2. Improved MobilenetV1-YOLOv4 network.
Figure 3. Spatial squeeze and channel excitation (cSE).
Figure 4. Channel squeeze and spatial excitation (sSE).
Figure 5. Concurrent spatial and channel squeeze and channel excitation (scSE).
Figure 6. Depthwise separable convolution.
Figure 7. Several sample images.
Figure 8. Loss function.
Figure 9. P-R curves based on YOLOv4.
Figure 10. Insulator defect prediction result graph.
Table 1. Parameters of the Mobilenet-V1 network.
MobileNet-V1 body:

| Type/Stride | Filter Shape | Output Size |
|---|---|---|
| Input | | 416 × 416 × 3 |
| Conv/s2 | 3 × 3 × 3 × 32 | 208 × 208 × 32 |
| Conv dw/s1 | 3 × 3 × 32 dw | 208 × 208 × 32 |
| Conv/s1 | 1 × 1 × 32 × 64 | 208 × 208 × 64 |
| Conv dw/s2 | 3 × 3 × 64 dw | 104 × 104 × 64 |
| Conv/s1 | 1 × 1 × 64 × 128 | 104 × 104 × 128 |
| Conv dw/s1 | 3 × 3 × 128 dw | 104 × 104 × 128 |
| Conv/s1 | 1 × 1 × 128 × 128 | 104 × 104 × 128 |
| Conv dw/s2 | 3 × 3 × 128 dw | 52 × 52 × 128 |
| Conv/s1 | 1 × 1 × 128 × 256 | 52 × 52 × 256 |
| Conv dw/s1 | 3 × 3 × 256 dw | 52 × 52 × 256 |
| Conv/s1 | 1 × 1 × 256 × 256 | 52 × 52 × 256 |
| Conv dw/s2 | 3 × 3 × 256 dw | 26 × 26 × 256 |
| Conv/s1 | 1 × 1 × 256 × 512 | 26 × 26 × 512 |
| 5 × (Conv dw/s1, Conv/s1) | 3 × 3 × 512 dw, 1 × 1 × 512 × 512 | 26 × 26 × 512 |
| Conv dw/s2 | 3 × 3 × 512 dw | 13 × 13 × 512 |
| Conv/s1 | 1 × 1 × 512 × 1024 | 13 × 13 × 1024 |
| Conv dw/s1 | 3 × 3 × 1024 dw | 13 × 13 × 1024 |
| Conv/s1 | 1 × 1 × 1024 × 1024 | 13 × 13 × 1024 |
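As a quick consistency check on Table 1, the sketch below walks the listed layers and tracks the output shape for a 416 × 416 × 3 input, assuming "same" padding so that each stride-2 layer halves the spatial size. It also picks out the last feature map at each of the three smallest resolutions, on the assumption (typical of MobilenetV1-YOLOv4 implementations, but not stated in this section) that these are the maps passed on for enhanced feature extraction. The function names are illustrative.

```python
from typing import Dict, List, Tuple


def mobilenet_v1_body() -> List[Tuple[str, int, int]]:
    """(layer type, stride, output channels) for each row of Table 1."""
    layers = [("conv", 2, 32)]
    # (depthwise stride, pointwise output channels) for each dw/pw pair, in table order
    blocks = [(1, 64), (2, 128), (1, 128), (2, 256), (1, 256), (2, 512)]
    blocks += [(1, 512)] * 5          # the "5 x" row
    blocks += [(2, 1024), (1, 1024)]
    for dw_stride, out_ch in blocks:
        in_ch = layers[-1][2]
        layers.append(("dw", dw_stride, in_ch))  # 3x3 depthwise keeps the channel count
        layers.append(("pw", 1, out_ch))         # 1x1 pointwise changes the channel count
    return layers


def output_shapes(input_size: int = 416) -> List[Tuple[int, int, int]]:
    """Output (height, width, channels) after every layer, assuming 'same' padding."""
    size, shapes = input_size, []
    for _, stride, ch in mobilenet_v1_body():
        size = (size + stride - 1) // stride
        shapes.append((size, size, ch))
    return shapes


def neck_inputs(input_size: int = 416) -> List[Tuple[int, int, int]]:
    """Last feature map at each of the three smallest resolutions (assumed neck inputs)."""
    last_at: Dict[int, Tuple[int, int, int]] = {}
    for shape in output_shapes(input_size):
        last_at[shape[0]] = shape      # later layers overwrite earlier ones
    return [last_at[s] for s in sorted(last_at, reverse=True)][-3:]


print(output_shapes()[-1])  # (13, 13, 1024)
print(neck_inputs())        # [(52, 52, 256), (26, 26, 512), (13, 13, 1024)]
```

Because each depthwise convolution keeps its input channel count, the depthwise filters in the "5 ×" row operate on the 512 channels produced by the preceding 1 × 1 × 256 × 512 convolution.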
Table 2. Original contrast experiment (the first group).

| Algorithm | Recall (R)/% | mAP/% | FPS/(frames/s) | Model Weight/MB |
|---|---|---|---|---|
| YOLOv4 | 77.59 | 94.41 | 74 | 244 |
| MobilenetV1-YOLOv4 | 92.29 | 98.55 | 153 | 155 |
| Improved MobilenetV1-YOLOv4 | 93.25 | 98.81 | 190 | 57.9 |
Table 3. Comparison experiment of mainstream lightweight networks (the second group).

| Algorithm | Recall (R)/% | mAP/% | FPS/(frames/s) | Model Weight/MB |
|---|---|---|---|---|
| MobilenetV1-YOLOv4 | 92.29 | 98.55 | 153 | 155 |
| MobilenetV2-YOLOv4 | 92.53 | 98.95 | 139 | 148 |
| MobilenetV3-YOLOv4 | 90.12 | 97.74 | 129 | 152 |
| Ghostnet-YOLOv4 | 88.92 | 98.02 | 118 | 150 |
| Improved MobilenetV1-YOLOv4 | 93.25 | 98.81 | 190 | 57.9 |
Table 4. Comparative experiment after network improvement (the third group).

| Algorithm | Recall (R)/% | mAP/% | FPS/(frames/s) | Model Weight/MB |
|---|---|---|---|---|
| Improved Ghostnet-YOLOv4 | 91.33 | 97.98 | 135 | 43.8 |
| Improved MobilenetV2-YOLOv4 | 92.77 | 98.13 | 168 | 45.5 |
| Improved MobilenetV3-YOLOv4 | 90.84 | 97.98 | 164 | 48.9 |
| Improved MobilenetV1-YOLOv4 | 93.25 | 98.81 | 190 | 57.9 |
Table 5. Comparative experiments (the fourth group).

| Algorithm | Number of scSE | Recall (R)/% | mAP/% | FPS/(frames/s) | Model Weight/MB |
|---|---|---|---|---|---|
| Improved MobilenetV1-YOLOv4 | 2 | 90.84 | 98.07 | 206 | 52 |
| Improved MobilenetV1-YOLOv4 | 3 | 90.84 | 98.18 | 196 | 57.6 |
| Improved MobilenetV1-YOLOv4 | 5 | 93.25 | 98.81 | 190 | 57.9 |
| YOLOv5 | | 98.55 | 99.67 | 109 | 81.8 |

Share and Cite

MDPI and ACS Style

Xu, S.; Deng, J.; Huang, Y.; Ling, L.; Han, T. Research on Insulator Defect Detection Based on an Improved MobilenetV1-YOLOv4. Entropy 2022, 24, 1588. https://doi.org/10.3390/e24111588

AMA Style

Xu S, Deng J, Huang Y, Ling L, Han T. Research on Insulator Defect Detection Based on an Improved MobilenetV1-YOLOv4. Entropy. 2022; 24(11):1588. https://doi.org/10.3390/e24111588

Chicago/Turabian Style

Xu, Shanyong, Jicheng Deng, Yourui Huang, Liuyi Ling, and Tao Han. 2022. "Research on Insulator Defect Detection Based on an Improved MobilenetV1-YOLOv4" Entropy 24, no. 11: 1588. https://doi.org/10.3390/e24111588

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
