LA_YOLOx: Effective Model to Detect the Surface Defects of Insulative Baffles

Li, Quanyang; Luo, Zhongqiang; He, Xiangjie; Chen, Hongbo

doi:10.3390/electronics12092035


¹ School of Automation and Information Engineering, Sichuan University of Science and Engineering, Yibin 644000, China

² Artificial Intelligence Key Laboratory of Sichuan Province, Sichuan University of Science and Engineering, Yibin 644000, China

³ Sichuan Shuneng Electric Power Co., Ltd., Chengdu 610000, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Electronics 2023, 12(9), 2035; https://doi.org/10.3390/electronics12092035

Submission received: 12 March 2023 / Revised: 12 April 2023 / Accepted: 26 April 2023 / Published: 27 April 2023


Abstract

Defect detection based on YOLO models is widely used in industry. In practice, however, the surface defects of insulative baffles are still inspected manually. This method is inefficient because detection depends entirely on the human eye. Considering the excellent performance of YOLOx, we propose an intelligent detection method based on YOLOx. First, after analyzing the defect characteristics of insulative baffles, we selected the CIOU loss function instead of the IOU loss function. In addition, considering the limited model resources in application scenarios, we propose a lightweight YOLOx model: we replaced YOLOx’s backbone with lightweight backbones (MobileNetV3 and GhostNet) and used depthwise separable convolution instead of conventional convolution. This reduces the number of network parameters by about 42% compared with the original YOLOx network, but the mAP decreases by about 0.8%. Finally, an attention mechanism is introduced into the feature fusion module to recover this loss, and we call the lightweight YOLOx with an attention module LA_YOLOx. The mAP of LA_YOLOx reaches 95.60%, compared with 95.31% for the original YOLOx model, which demonstrates the effectiveness of the LA_YOLOx model.

Keywords:

defect detection; YOLOx; insulative baffles; lightweight structures; attention module

1. Introduction

Electric power is indispensable to modern life. However, electrical equipment that operates for a long time can develop defects and break down, which may trigger accidents. At present, electric power enterprises mainly rely on maintenance to ensure the safe operation of electrical equipment [1]. Insulative baffles are one of the essential tools for maintenance work: they isolate workers from electrical equipment, help workers complete maintenance tasks, and protect their safety. However, improper operations during manufacturing, long-term use, and harsh operating environments may lead to defects in insulative baffles, and these defects may bring hidden dangers to electric power operation. To ensure the quality and safety of insulative baffles, it is necessary to detect their surface defects.

Surface defect detection is a common task in industrial quality inspection and is key to controlling product quality [2]. Manual visual inspection is the earliest surface defect detection method, but it consumes manpower, time and energy, and is not efficient. With the development of machine vision technology and growing industrial requirements, eddy current detection [3], ultrasonic detection [4], magnetic flux leakage detection [5] and other surface defect detection methods attracted the attention of many scholars in the middle and late 20th century. These methods are mainly used to detect high-sensitivity devices made of metal materials. They are fast and inexpensive, and they also allow remote detection. However, their application is limited: only a few specific types of defects can be identified, which makes it difficult to meet the needs of surface defect detection for other industrial devices. In 1999, Lowe proposed the scale-invariant feature transform (SIFT) feature extraction algorithm [6,7]. In 2005, Dalal [8] proposed the histogram of oriented gradients (HOG) algorithm, which extracts edge gradient features in different directions. Image classification methods such as the support vector machine (SVM) and random forest (RF) can be combined with these feature extraction algorithms to build a traditional machine vision surface defect detection system. However, traditional machine vision methods involve tedious steps and have poor robustness, and their detection results in complex environments are not ideal. The defect detection task for insulative baffles is characterized by small targets, multiple defects and non-obvious defect features, and it requires high precision, good real-time performance and good robustness. Therefore, these traditional surface defect detection methods are not suitable for the defect detection of insulative baffles.

Defect detection belongs to the field of object detection. In 2014, Girshick first constructed the RCNN network for the object detection task [9]. Subsequently, to address the shortcomings of RCNN, Girshick created Fast RCNN [10] and He Kaiming’s team proposed Faster RCNN [11]. These RCNN (Regions with CNN features) networks are two-stage networks: they first generate candidate boxes and then classify them. Generating the candidate boxes takes a long time, and classifying each candidate box afterwards still requires a large amount of computation. In view of the speed, real-time performance and computational cost of the RCNN series, one-stage networks were then proposed, such as YOLO (You Only Look Once) [12] and SSD (Single Shot MultiBox Detector) [13]. YOLO networks achieve excellent real-time performance at the expense of some accuracy. However, with continuous research and improvement, YOLOv4 [14] and YOLOv5 can complete the target detection task with high accuracy. In 2021, Megvii proposed YOLOx [15] on the basis of the YOLOv3 network [16], and its performance on the COCO dataset exceeded that of YOLOv4 and YOLOv5. Therefore, this paper conducts its research based on the YOLOx model. Because of their high real-time performance and good accuracy, the YOLO series are often used for industrial surface defect detection, replacing manual and traditional defect detection methods with efficient and intelligent detection.

However, in actual detection tasks, researchers usually encounter problems such as “small defects, insignificant defect features, and a large number of network training parameters”, and they improve the network model to address these problems. Cheng et al. (2021) [17] improved the YOLOv3 algorithm to solve the problem of missed detections caused by small target size and unclear features in metal surface defect detection, for example by adding a feature layer and adopting the DIoU loss. The algorithm was tested on the NEU-DET dataset, and the results show that it outperforms the algorithm before improvement. Xu et al. (2021) [18] drew on the attention modules in MobileNetV3 and GhostNet and integrated them into the YOLO structure to make the YOLOv4 network lightweight, greatly reducing the number of parameters and improving the accuracy to a certain extent. Lim et al. (2022) [19] used the YOLOv4-Tiny algorithm to detect wood defects. Because many wood manufacturers still rely on manual visual inspection and because YOLOv4 has a large number of network parameters, the authors based their detection method on the lightweight YOLOv4-Tiny architecture. Kim et al. (2021) [20] proposed a method to improve the small target detection performance of YOLOv5 in aerial images, where low resolution makes small targets difficult to detect. Specifically, they introduced the ECA module, removed the large object detection layer, and used transposed convolution instead of upsampling. Their experiments show that the mAP increased by 6.9% compared with the original YOLOv5. Zheng et al. (2021) [21], aiming at the difficulty of detecting subtle defects on bearing caps, introduced the SE attention module and dilated convolution and added a feature layer to YOLOv3. The final test results are excellent.

Studying the surface defects of insulative baffles is of great significance: it can effectively prevent low-quality baffles from entering the market and can reveal, in a timely manner, insulative baffles with potential safety risks, so as to ensure the safety of electric power operation. However, the task of insulative baffle defect detection faces problems similar to those above. For example, application scenarios impose restrictions on model resources, and some target defects are small or not obvious. Inspired by the above literature, this paper replaces part of the original YOLOx structure with lightweight structures to reduce the number of parameters and the computation of the original model. In general, however, lightweight operations degrade detection performance. To reduce this negative impact, an attention mechanism is introduced to make the network focus on defect features and thereby improve the detection performance of the lightweight network. The contribution of this paper is the lightweight and attentive YOLOx model (LA_YOLOx), which is verified through experimental comparisons of different lightweight structures and attention modules. The proposed scheme can replace the existing inefficient methods and solve the above problems in the detection process without degrading the detection performance of the original YOLOx.

The remainder of this paper is organized as follows: Section 2 analyzes the detection task of insulative baffles’ surface defects, including the analysis of the dataset and of the deep learning detection algorithm. Section 3 introduces the lightweight structures and depthwise separable convolution. Section 4 introduces the principles of the three attention modules, and summarizes and visualizes the structure of the LA_YOLOx model. Section 5 describes the experimental scheme and the results obtained. Finally, Section 6 summarizes the paper and the experiments, and discusses the problems that need further study in the detection of surface defects on insulative baffles.

2. Related Work

At present, when quality inspectors check the surface defects of insulative baffles, their concentration decreases and their fatigue increases over long working hours, which leads to missed and wrong inspections.

There is no relevant research on the surface defect detection task of insulative baffles. Therefore, this paper will study this task to realize an intelligent detection process. Section 2.1 analyzes the categories and characteristics of the surface defects of the insulative baffles. Section 2.2 expounds on the current YOLO series algorithms and analyzes the YOLOx algorithm.

2.1. Analysis of Surface Defects of Insulative Baffles

For the task of intelligent detection of the surface defects of the insulative baffles, a total of 2054 images of defective insulation baffles were created and collected. In order to facilitate the subsequent experiments, all of the surface defect images of insulative baffles are scaled to a uniform size in this paper.

In the dataset, according to the surface physical characteristics of the insulative baffles, the common surface defects fall into the following four categories: “scratches (SC), burn marks (BM), pit points (PP), voltage breakdowns (VB)”. Pit points and scratches are caused by improper operations during manufacturing in the factory. In addition, burn marks and voltage breakdowns can arise during use in high-temperature and high-pressure environments. These defects pose hidden safety hazards to electric power operation, which may threaten the personal safety of workers and cause economic losses. The surface defects of the insulative baffles are shown in Figure 1.

2.2. Algorithm Analysis

2.2.1. Object Detection Algorithm Based on YOLO

Object detection algorithms based on deep learning can be divided into one-stage and two-stage networks. Representative one-stage models are YOLO and SSD, while representative two-stage models are the RCNN series [22]. A two-stage network first generates candidate regions and then performs convolutional feature extraction on them. Finally, the objects are classified according to the obtained features and the position regression of the bounding box is performed. However, this method is time-consuming, and the time needed to process the different parts becomes the bottleneck of real-time object detection [23]. Redmon proposed the YOLO model [12] in 2016. The YOLO model maps image pixels directly to the coordinates and classification probabilities of the target boxes, which makes it a one-stage network. The core idea of YOLO is to transform object detection into a regression problem: the whole image is used as the input of the network, and a single pass through one neural network yields the position of each bounding box and its category, as shown in Figure 2. The YOLO model meets real-time requirements at the expense of some recognition accuracy. This accuracy decline is mainly caused by poor detection performance on small targets, and YOLO has been improved successively to address it. At present, there are several versions of YOLO, among which YOLOx has the best performance on the COCO dataset. Therefore, this paper makes improvements based on the YOLOx model and verifies its detection performance on our dataset.

2.2.2. YOLOx

Just as with other YOLO series networks, the network structure of YOLOx is still divided into three parts: backbone network, neck and head.

The backbone network of YOLOx inherits the CSPDarkNet of YOLOv4 and YOLOv5 and uses the Focus structure of YOLOv5. YOLOx adopts the SiLU activation function used in YOLOv5, while YOLOv4 uses Mish. ReLU is given in Equation (1) for reference, and the Mish and SiLU formulas are shown in Equations (2) and (3). Both Mish and SiLU are unbounded above, bounded below, smooth and non-monotonic, and can be regarded as smooth variants of the ReLU activation function. In addition, Spatial Pyramid Pooling (SPP) [24] is used in the neck of YOLOv4, while YOLOv5 and YOLOx add it to the last scale layer of the backbone network. The neck consists of a Path Aggregation Network (PANet) [25] and a Feature Pyramid Network (FPN) [26], where upsampling and downsampling operations are performed for feature fusion. The head finally outputs predictions at three different scales.

f_{ReLU}(x) = \max(0, x) \qquad (1)

f_{Mish}(x) = x \cdot \tanh(\ln(1 + e^{x})) \qquad (2)

f_{SiLU}(x) = x \cdot \mathrm{sigmoid}(x) \qquad (3)
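The three activation functions in Equations (1)–(3) can be evaluated directly. The following is a minimal sketch using only the Python standard library; the helper names (`relu`, `mish`, `silu`, `softplus`, `sigmoid`) are illustrative and are not taken from the YOLOx code base.

```python
import math


def relu(x: float) -> float:
    """Equation (1): max(0, x)."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softplus(x: float) -> float:
    """ln(1 + e^x), written to avoid overflow for large x."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Equation (2): x * tanh(ln(1 + e^x))."""
    return x * math.tanh(softplus(x))


def silu(x: float) -> float:
    """Equation (3): x * sigmoid(x)."""
    return x * sigmoid(x)


if __name__ == "__main__":
    for value in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(value, relu(value), mish(value), silu(value))
```

Unlike ReLU, both Mish and SiLU return small negative values for negative inputs, which is where their smooth, non-monotonic shape comes from.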

2.2.3. Analysis and Optimization of Loss Function

In the output of the head, the regression of the predicted box to the real box originally adopts an IOU (Intersection over Union) loss function [27]. The IOU loss optimizes the prediction boxes well when they overlap the real boxes. However, when a real box and a prediction box do not overlap, the IOU is zero regardless of how far apart they are, so the prediction box cannot be optimized. The defects of insulative baffles differ in size, shape and spacing, so mismatches between real boxes and prediction boxes, such as one box containing the other or the two not overlapping at all, can occur. Therefore, the IOU loss function was replaced by the CIOU (Complete Intersection over Union) [28] loss function in the experimental part, as shown in Equations (4)–(6). The CIOU loss takes into account three geometric factors of box regression (overlapping area, center distance and aspect ratio), which allows the predicted boxes to be further optimized even when they are mismatched with the real boxes.

CIOU_{loss} = 1 - IOU + \frac{ρ^{2}(b, b^{GT})}{c^{2}} + α υ \qquad (4)

α = \frac{υ}{υ + (1 - IOU)} \qquad (5)

υ = \frac{4}{π^{2}} \left( \arctan\frac{w^{GT}}{h^{GT}} - \arctan\frac{w}{h} \right)^{2} \qquad (6)

Here, ρ(·) represents the Euclidean distance; b and b^{GT} represent the center points of the prediction box and the real box, respectively; c represents the diagonal length of the minimum enclosing rectangle of the prediction box and the real box; α is the trade-off parameter; and υ is the aspect ratio consistency parameter.
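To make Equations (4)–(6) concrete, the sketch below computes the CIOU loss for a single pair of boxes in plain Python. It is an illustration rather than the authors' training code: the box layout `(cx, cy, w, h)`, the function name `ciou_loss` and the small constant `eps` (added to avoid division by zero) are assumptions made for this example.

```python
import math


def ciou_loss(pred, gt, eps=1e-9):
    """CIOU loss for two boxes given as (cx, cy, w, h); layout is an assumption."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt

    # Corner coordinates of both boxes.
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    g_x1, g_y1, g_x2, g_y2 = gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2

    # Overlapping area and IOU.
    inter_w = max(0.0, min(p_x2, g_x2) - max(p_x1, g_x1))
    inter_h = max(0.0, min(p_y2, g_y2) - max(p_y1, g_y1))
    inter = inter_w * inter_h
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)

    # Squared center distance rho^2 and squared enclosing-box diagonal c^2.
    rho2 = (px - gx) ** 2 + (py - gy) ** 2
    c_w = max(p_x2, g_x2) - min(p_x1, g_x1)
    c_h = max(p_y2, g_y2) - min(p_y1, g_y1)
    c2 = c_w ** 2 + c_h ** 2

    # Aspect ratio consistency (Equation (6)) and trade-off weight (Equation (5)).
    v = (4.0 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / (v + (1.0 - iou) + eps)

    # Equation (4).
    return 1.0 - iou + rho2 / (c2 + eps) + alpha * v


if __name__ == "__main__":
    print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))   # identical boxes
    print(ciou_loss((0, 0, 2, 2), (10, 0, 2, 2)))  # no overlap, nearer
    print(ciou_loss((0, 0, 2, 2), (20, 0, 2, 2)))  # no overlap, farther
```

For the two non-overlapping pairs the IOU term is zero in both cases, yet the loss still grows with the center distance, which is the property that lets the prediction box keep being optimized when there is no overlap.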

The differences between the traditional YOLO models and YOLOx are as follows:

The traditional YOLO model adopts the Anchor-Based method, which is adopted by both YOLOv4 and YOLOv5 models, that is, the prior box is set in advance and then the prediction box is output by training. The Anchor-Free idea adopted by YOLOx means that there is no setting of a prior box. Instead, the size information of the downsampling is skillfully converted into the size of the output prediction box (If the sample is downsampled five times, the size of the final prediction box is two to the fifth, that is, 32 × 32). Then, the final output prediction box is matched with the real box, and the positive sample prediction box is further selected.

The traditional positive sample allocation scheme usually allocates the same number of positive samples to different targets in the same scene. YOLOv4 adopts a single positive sample matching strategy, in which each real box is predicted by only one prior box during training; this is not conducive to fast convergence. YOLOv5 adopts a multi-positive sample matching strategy to improve training efficiency, that is, each real box can be predicted by multiple prior boxes during training. However, this strategy can assign inappropriate or redundant positive samples to a target. Considering that targets of different sizes appear in the same scene, YOLOx adopts SimOTA [29] to dynamically match positive samples, that is, different numbers of positive samples are set for targets of different sizes, which solves the problems of the positive sample allocation strategies in YOLOv4 and YOLOv5.

In the head of the traditional YOLO model, classification and regression are implemented in a 1 × 1 convolution, which, according to the author of YOLOx, brings adverse effects to network recognition. Therefore, the head of YOLOx is different from the previous YOLO model. YOLOx decouples the head into three parts: detecting whether objects are included, position regression and category. The loss function is also calculated in parts. Then, it integrates the prediction results together when predicting. Since the size of the defect target is inconsistent, taking the input of 416 × 416 image size as an example, we can obtain three output scales of features through the feature fusion module of the neck, which are 52 × 52, 26 × 26 and 13 × 13, respectively. These three scales will detect small, medium and large types of targets.
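The relationship between the input size and the three output scales can be checked with a few lines of Python. This is a small sketch: the strides 8, 16 and 32 are inferred from the 52 × 52, 26 × 26 and 13 × 13 outputs quoted above for a 416 × 416 input, and the function names are hypothetical.

```python
def head_grid_sizes(input_size, strides=(8, 16, 32)):
    """Side length of each output feature map for a square input image."""
    return [input_size // stride for stride in strides]


def total_predictions(input_size, strides=(8, 16, 32)):
    """Number of grid cells over all scales.

    With an anchor-free head, each grid cell yields one prediction box.
    """
    return sum(size * size for size in head_grid_sizes(input_size, strides))


if __name__ == "__main__":
    print(head_grid_sizes(416))    # grid side lengths for the three scales
    print(total_predictions(416))  # candidate boxes before positive-sample matching
```

The largest stride, 32, corresponds to five downsampling steps (two to the fifth), which matches the anchor-free description given earlier in this section.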

The entire YOLOx network structure is shown in Figure 3. YOLOx, as with YOLOv5, can be divided into four versions according to the depth and width of the network model: “S”, “M”, “L” and “X”, as well as the official lightweight structure “Nano” and “Tiny”. However, in general, the detection accuracy of the lightweight structure will decrease. Due to the limitation of model resources in application scenarios and the requirement of detection accuracy, the model based on YOLOx(s) is selected for training in this paper.

3. Lightweight Structures for YOLOx

The number of training parameters of YOLOv4 reaches 25 M, while that of YOLOv5 is 7.2 M and that of YOLOx is 9 M. The detection accuracy of the YOLOx model on the COCO dataset is better than that of YOLOv4 and YOLOv5. Since YOLOx has more parameters than YOLOv5, and considering the limited model resources in application scenarios, lightweight backbone networks are used to replace the original backbone network. In addition, depthwise separable convolution (DSC) is introduced to replace the conventional convolution in the original neck and head of YOLOx. Together, these changes form the lightweight YOLOx network proposed in this paper.

The computational and structural complexity of most convolutional neural networks brings great pressure to mobile deployment. Therefore, MobileNet [30,31,32] was proposed by Google with three versions: MobileNetV1, MobileNetV2, and MobileNetV3.

3.1. MobileNetV3

MobileNetV3 combines the advantages of the previous two versions: it retains the DSC structure and the inverted residual structure with a linear bottleneck, introduces a lightweight attention module, and replaces the Swish function with the H-Swish function. MobileNet is an efficient model for mobile and embedded devices. Therefore, we chose MobileNetV3 to replace the backbone network of YOLOx. Bneck, the basic unit of MobileNetV3, is shown in Figure 4.

3.2. GhostNet

The authors of GhostNet [33] visualized the feature maps output by the network and observed that some feature maps are very similar, that is, they contain a certain redundancy. Rather than eliminating this redundancy, the authors raised a question: “Can a cheap operation replace the original convolutional layer to generate these redundant feature maps?”. GhostNet was born from this question. It replaces the original convolutional layer with a cheap operation that produces similar feature maps, thus reducing the amount of computation and the number of parameters. A lightweight CNN model named GhostNet was thus built on the basis of MobileNet. The basic unit of GhostNet is shown in Figure 5.

3.3. Depthwise Separable Convolution

Depthwise separable convolution (DSC) is often used in lightweight structures. Its distinguishing feature is that a conventional convolution is decomposed into two steps: a depthwise convolution, which filters each input channel separately, and a pointwise (1 × 1) convolution, which combines the channels. For the same output size, the amount of computation is greatly reduced, and the saved budget can be used to deepen the neural network and strengthen its learning ability. The calculation process of the DSC structure is shown in Figure 6.

As shown in Figure 6, a depthwise convolution is first applied to the input features, followed by a pointwise convolution. The output has the same shape as that of a conventional convolution. The lightweight scheme for YOLOx replaces the backbone network of YOLOx with MobileNetV3 or GhostNet, and uses the DSC structure instead of conventional convolution in the feature fusion module and the decoupled head module.
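The saving from DSC can be illustrated by counting weights. The sketch below compares a conventional k × k convolution with its depthwise separable counterpart; bias terms are ignored, and the layer size in the example (128 input channels, 256 output channels, 3 × 3 kernel) is a hypothetical choice, not a layer taken from the paper.

```python
def conv_params(c_in, c_out, k):
    """Weights in a conventional k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dsc_params(c_in, c_out, k):
    """Weights in a depthwise separable convolution (bias ignored).

    Depthwise step: one k x k filter per input channel.
    Pointwise step: a 1 x 1 convolution mapping c_in channels to c_out.
    """
    return k * k * c_in + c_in * c_out


if __name__ == "__main__":
    c_in, c_out, k = 128, 256, 3  # hypothetical layer size
    standard = conv_params(c_in, c_out, k)
    separable = dsc_params(c_in, c_out, k)
    print(standard, separable, separable / standard)
```

The ratio of the two counts equals 1/c_out + 1/k², so for 3 × 3 kernels a DSC layer needs roughly one ninth of the weights of the conventional layer it replaces.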

4. Attention Modules for Lightweight YOLOx

Usually, the lightweight network will bring a certain decrease in accuracy. In order to improve the detection performance of the lightweight YOLOx, the attention mechanism is introduced in the feature fusion part of the YOLOx model to guide the network to pay attention to defects.

Attention mechanism is a common trick in deep learning, and its core is to make the network focus on the places in the picture that need more attention. Generally speaking, attention mechanism can be divided into channel attention mechanism, spatial attention mechanism, and the combination of them. This paper conducts experiments on three attention modules, which are SE (Squeeze and Excitation) [34], CBAM (Convolutional Block Attention Module) [35], and ECA (Efficient Channel Attention) [36], respectively.

4.1. SE

The SE module uses the Squeeze-and-Excitation method to learn a weight for each channel of the input feature. The principle of SE is to emphasize channel features with large weights and suppress channel features with small weights. Its schematic diagram is shown in Figure 7. First, global average pooling is applied to the input features (C × H × W), which is called “Squeeze” (C × 1 × 1). Then, two consecutive fully connected layers follow. The first fully connected layer has fewer channels, and the second has the same number of channels as the input feature, which is the “Excitation” (C × 1 × 1). Finally, the Sigmoid function normalizes the channel weights, and the normalized weights are multiplied with the input feature layer to obtain the weighted channel features (C × H × W).
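The squeeze, excitation and scaling steps can be sketched with nested Python lists. This is a toy illustration under stated assumptions: the weight matrices `w1` and `w2` are hypothetical inputs, bias terms are omitted, and ReLU is assumed between the two fully connected layers.

```python
import math


def se_block(x, w1, w2):
    """Toy SE block on a feature map x stored as C channels of H x W lists.

    w1 has shape (C // r) x C and w2 has shape C x (C // r); both are
    hypothetical weights supplied by the caller, with no bias terms.
    """
    # Squeeze: global average pooling, C x H x W -> C values.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]

    # Excitation: FC (reduce) -> ReLU -> FC (restore) -> Sigmoid, giving C weights.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    scale = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]

    # Scale: multiply every value of a channel by that channel's weight.
    out = [[[v * s for v in row] for row in ch] for ch, s in zip(x, scale)]
    return out, scale


if __name__ == "__main__":
    feature = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]  # C=2, H=W=2
    weighted, weights = se_block(feature, w1=[[0.5, 0.5]], w2=[[1.0], [-1.0]])
    print(weights)
    print(weighted)
```

The output keeps the C × H × W shape of the input; only the relative strength of each channel changes.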

4.2. CBAM

CBAM [35] combines the channel attention mechanism and the spatial attention mechanism. The schematic diagram of CBAM is shown in Figure 8. First, the module applies global average pooling and global maximum pooling to the input features (C × H × W to C × 1 × 1). Second, each pooled result passes through two fully connected layers. Finally, the two results are added and passed through the Sigmoid function (C × 1 × 1), and the output is multiplied with the input feature layer to obtain the feature map weighted by channel attention (C × H × W). Next, spatial attention weighting is applied to this channel-weighted feature map. Specifically, the maximum and average values across the channels are taken at each feature point (1 × H × W each), and a “concat” operation is performed on the two results (2 × H × W). A 1 × 1 convolution then adjusts the number of channels (1 × H × W), and the Sigmoid function is applied to obtain the weight of each feature point (1 × H × W). This weight is multiplied by the channel-weighted feature map to obtain the feature map weighted by both channel and spatial attention (C × H × W). In summary, channel attention weighting is first applied to the input feature map, and spatial attention weighting is then applied to the channel-weighted features.
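The spatial-attention step of CBAM, as described in this section, can be sketched in plain Python. The example covers only the spatial half and assumes its input is already channel-weighted. A 1 × 1 convolution over the two concatenated maps reduces to a weighted sum per position, so it is represented here by the hypothetical scalars `w_max`, `w_avg` and `bias`.

```python
import math


def spatial_attention(x, w_max, w_avg, bias=0.0):
    """Toy spatial attention on x stored as C channels of H x W lists.

    w_max, w_avg and bias stand in for the weights of the 1 x 1 convolution
    applied to the concatenated max and average maps (hypothetical values).
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])

    weights = []
    for i in range(height):
        row = []
        for j in range(width):
            values = [x[c][i][j] for c in range(channels)]
            max_value = max(values)              # max over channels
            avg_value = sum(values) / channels   # average over channels
            logit = w_max * max_value + w_avg * avg_value + bias
            row.append(1.0 / (1.0 + math.exp(-logit)))  # Sigmoid
        weights.append(row)

    # Multiply every channel by the same 1 x H x W weight map.
    out = [
        [[x[c][i][j] * weights[i][j] for j in range(width)] for i in range(height)]
        for c in range(channels)
    ]
    return out, weights


if __name__ == "__main__":
    feature = [[[1.0, 2.0]], [[3.0, 0.0]]]  # C=2, H=1, W=2
    weighted, weight_map = spatial_attention(feature, w_max=1.0, w_avg=1.0)
    print(weight_map)
    print(weighted)
```

Every channel is multiplied by the same weight map, so this step changes where the network looks rather than which channels it favors.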

4.3. ECA

The ECA module, as the name indicates, is mainly aimed at improving the operation of channel attention, specifically the improvement of the SE module. After the pooling operation (C × 1 × 1), the SE module uses two consecutive fully connected layers to change the channel dimension, which the authors of ECA believed will lose part of the feature information. Therefore, different from the SE module, ECA removes the fully connected layer and replaces it with 1D convolution for learning, and the remaining structure and operation are the same as those of the SE module (C × 1 × 1). Finally, the feature after 1D convolution is multiplied by the original feature to obtain the feature after weighted channel attention (C × H × W). The size of 1D convolution will affect the number of channels to be considered in the calculation of each weight of the attention mechanism, that is, the coverage of cross-channel interaction. In this module, the size selection of 1D convolution is very important. The structure of ECA is shown in Figure 9.
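The effect of the 1D convolution size on cross-channel interaction can be seen in a small sketch. The function below takes the pooled channel descriptors (C × 1 × 1, flattened to a list) and returns one attention weight per channel. The kernel values are hypothetical, zero padding is assumed at the channel boundaries, and the Sigmoid is applied to the convolution output as in the SE module.

```python
import math


def eca_weights(z, kernel):
    """Channel weights from pooled descriptors z using a 1D convolution.

    kernel is an odd-length list of hypothetical weights; its length is the
    number of neighbouring channels that contribute to each weight.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = k // 2
    padded = [0.0] * pad + list(z) + [0.0] * pad  # zero padding (assumption)

    # Slide the kernel across the channel dimension.
    conv = [
        sum(kernel[j] * padded[i + j] for j in range(k))
        for i in range(len(z))
    ]
    return [1.0 / (1.0 + math.exp(-c)) for c in conv]


if __name__ == "__main__":
    pooled = [1.0, 2.0, 3.0, 4.0]
    print(eca_weights(pooled, [0.0, 1.0, 0.0]))  # each channel sees only itself
    print(eca_weights(pooled, [1.0, 1.0, 1.0]))  # each channel sees two neighbours
```

A kernel of length k holds only k weights regardless of the channel count, which is why this module adds so few parameters compared with two fully connected layers.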

4.4. Overview of Improved YOLOx

In order to reduce the amount of computation and parameters of the original YOLOx model, the original CSPDarkNet is replaced by the lightweight model. Additionally, the feature pyramid structure of the original YOLOx is retained. At the same time, the DSC method is used in the feature fusion module and prediction module. In addition, in order to improve the performance of the lightweight network, attention mechanism is introduced in the feature fusion part. We called the final model LA_YOLOx. Figure 10 shows the structure diagram of the LA_YOLOx model.

5. Experimental Results and Discussion

This experiment uses the Windows 10 operating system, the PyCharm development environment, Python 3.6, CUDA 10.1, cuDNN 7.6.0, tensorflow-gpu 2.2.0, and a TITAN Xp GPU. Since the backbone network is replaced while the SPP structure of the original YOLOx is retained, the pre-trained model of the original YOLOx cannot be used. Therefore, the lightweight backbone network is pre-trained, and the pre-trained weights are used for transfer learning and the subsequent experiments.

5.1. Experimental Scheme

5.1.1. Experimental Setup

The training process of the experiment was set as 200 iterations, and transfer learning technology was introduced. The first 100 iterations were freezing training, and the training batch was 32. The last 100 iterations were unfreeze training, and the training batch was also 32. We chose a stochastic gradient descent optimization algorithm with momentum (SGD-M). The initial learning rate was 0.001, and the learning rate decay mode was a cosine annealing algorithm.
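A cosine annealing schedule of this kind can be written in a few lines. The sketch below uses the standard cosine annealing formula with the initial learning rate of 0.001 and the 200 iterations stated above. The minimum learning rate `lr_min` is not given in the text and is an assumed parameter, as is the choice to anneal over the whole run rather than restarting when the backbone is unfrozen.

```python
import math


def cosine_annealing_lr(step, total_steps, lr_init=0.001, lr_min=0.0):
    """Learning rate at a given step under cosine annealing.

    lr_min is an assumption; the text only specifies the initial rate.
    """
    cosine = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_init - lr_min) * (1.0 + cosine)


def backbone_frozen(step, freeze_steps=100):
    """True during the freezing phase (the first 100 iterations in the text)."""
    return step < freeze_steps


if __name__ == "__main__":
    for step in (0, 50, 100, 150, 200):
        print(step, backbone_frozen(step), cosine_annealing_lr(step, 200))
```

The rate starts at its initial value, falls slowly at first, fastest around the midpoint, and flattens out again near the end of training.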

5.1.2. Dataset and Preprocessing

The image size of insulative baffles datasets is 416 × 416. Finally, the defects of the dataset were annotated and made into a VOC format by Labelimg software.

The dataset is divided into a training set, a validation set and a test set. The ratio of the combined training–validation set to the test set is 8:2, and the ratio of the training set to the validation set is also 8:2. That is, the training set contains 1314 images, the validation set 329 images, and the test set 411 images.
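The split sizes follow from applying the 8:2 ratio twice to the 2054 collected images. A short sketch of the arithmetic is given below; rounding down at each step is an assumption, chosen because it reproduces the reported counts.

```python
def split_counts(total, trainval_ratio=0.8, train_ratio=0.8):
    """Return (train, validation, test) sizes for a two-level 8:2 split.

    Fractions are rounded down at each level (an assumption).
    """
    trainval = int(total * trainval_ratio)  # training + validation images
    train = int(trainval * train_ratio)
    validation = trainval - train
    test = total - trainval
    return train, validation, test


if __name__ == "__main__":
    print(split_counts(2054))
```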

Due to the relatively small number of datasets, we enhanced the datasets in the code by using the technology of Mosaic data enhancement and conventional data enhancement before the end of training. Since the Mosaic data enhancement deviates from the real defect distribution of the defect data image, only the conventional data enhancement method is used in the last 60 iterations of training.

5.2. Surface Defect Detection of Insulative Baffles of LA_YOLOx

In order to verify the performance of the improved lightweight YOLOx model, the datasets are, respectively, sent to the YOLOx model, the official lightweight structure YOLOx_Nano, YOLOx_Tiny and the proposed network based on YOLOx_MobileNetV3 and YOLOx_GhostNet for training. We used the number of training parameters, average precision (AP), mean average precision (mAP), Recall and Precision as indicators to verify the performance of each network model in the detection of surface defects on insulative baffles.

5.2.1. Experimental Comparisons of Different Lightweight YOLOx

Through experimental tests, the experimental results of each model are shown in Table 1.

The number of training parameters: It reflects the computation of the network, and also reflects the lightweight degree of the network.

Recall and Precision: Recall refers to the proportion of real boxes that are correctly detected. Precision refers to the proportion of prediction boxes whose targets are correctly classified. Since the Recall and Precision values are affected by the Non-Maximum-Suppression (NMS) threshold, the Recall and Precision of the experiment are obtained based on the threshold value of 0.5.

AP and mAP: AP refers to the average detection accuracy of the model for each target in all images, and mAP refers to taking the mean value of AP.
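For reference, the indicators above reduce to simple ratios once detections have been matched to real boxes. The sketch below uses the standard definitions of Precision and Recall in terms of true positives, false positives and false negatives, and takes mAP as the mean of the per-class AP values. All numbers in the example are hypothetical placeholders and are not results from Table 1.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def mean_average_precision(ap_per_class):
    """mAP as the mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


if __name__ == "__main__":
    # Hypothetical counts for one defect class.
    print(precision_recall(tp=8, fp=2, fn=2))
    # Hypothetical AP values for the four defect classes.
    print(mean_average_precision({"SC": 0.9, "BM": 1.0, "PP": 0.8, "VB": 0.9}))
```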

In Table 1, YOLOx has the largest number of parameters. After the corresponding parts of YOLOx are replaced with lightweight structures (a lightweight backbone together with DSC), the number of model parameters decreases by 3.7–4.0M, which makes it comparable to the parameter count of YOLOx_Tiny. YOLOx_Nano has the fewest parameters. Meanwhile, YOLOx_MobileNetV3 and YOLOx_GhostNet, in which only the backbone is replaced, maintain almost the same accuracy as the original YOLOx model. We attribute this to the attention modules and residual connections in MobileNetV3 and GhostNet, which keep the detection accuracy from decreasing. In addition, after part of the conventional convolution in the neck and head is replaced with DSC, the mAP decreases by 0.8–1.6 percentage points. In particular, the Recall of the small target PP and of the large target SC both decrease by about 2.5 percentage points.
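The parameter and accuracy changes quoted above can be recomputed from Table 1. The sketch below uses only the Params and mAP columns; the dictionary layout and the function name `deltas` are illustrative choices.

```python
# (Params in millions, mAP in %) taken from Table 1.
table1 = {
    "YOLOx": (8.93, 95.31),
    "YOLOx (M&DSC)": (5.26, 94.50),
    "YOLOx (G&DSC)": (4.96, 93.70),
}

def deltas(model, baseline="YOLOx"):
    """Return (params saved in M, relative saving in %, mAP drop in points)."""
    base_params, base_map = table1[baseline]
    params, m_ap = table1[model]
    saved = base_params - params
    return saved, 100.0 * saved / base_params, base_map - m_ap

for name in ("YOLOx (M&DSC)", "YOLOx (G&DSC)"):
    saved, rel, drop = deltas(name)
    print(f"{name}: -{saved:.2f}M params ({rel:.1f}%), mAP -{drop:.2f} points")
```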

In conclusion, lightweight structures can greatly reduce the number of parameters and the computational cost of the original model. However, the detection accuracy decreases to some extent, mainly in the Recall of the small target defect PP and the large target defect SC. Therefore, we further improve the lightweight YOLOx model to recover its detection accuracy, especially the Recall of PP and SC.

Furthermore, based on the detection performance of the above models, MobileNetV3 performs better in YOLOx than GhostNet. Specifically, YOLOx_MobileNetV3 has the highest mAP. When the DSC structure is added to the lightweight YOLOx models (YOLOx_MobileNetV3 and YOLOx_GhostNet), YOLOx_M&DSC has the higher mAP of the two, and its Recall and Precision are also the best for some defects. Therefore, we chose the YOLOx_M&DSC model as the basis for further improvement.

5.2.2. Experimental Comparisons of Different LA_YOLOx Models

To address the accuracy loss of the lightweight network (YOLOx_M&DSC), we add the SE, CBAM and ECA attention modules to the small-target output and to the large-target output of the feature fusion part, and carry out an experiment for each configuration. The experimental results are shown in Table 2. In Table 2, SE(s) denotes that the SE module is added to the small-target output and SE(l) denotes that the SE module is added to the large-target output; the other names follow the same pattern. Because the three attention mechanisms add very few parameters, the parameter count is about 5.26M in every case, approximately the same as that of the lightweight YOLOx network. Therefore, Table 2 does not include a parameter column.
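To make the attention step concrete, the following is a minimal pure-Python sketch of a squeeze-and-excitation (SE) block applied to a single feature map, such as one output of the feature fusion part. It is not the authors' implementation: biases are omitted, the weights are placeholders rather than learned values, and the names `se_block`, `w1` and `w2` are assumptions.

```python
import math

def se_block(feature, w1, w2):
    """Apply squeeze-and-excitation to a C x H x W feature map (nested lists).

    w1: (C // r) x C weights of the first fully connected layer.
    w2: C x (C // r) weights of the second fully connected layer.
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature]
    # Excitation: FC -> ReLU -> FC -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(feature, gates)]
    return scaled, gates

# Toy input: 2 channels of size 1 x 2, reduction ratio r = 2 (placeholder weights).
feature = [[[2.0, 4.0]], [[6.0, 8.0]]]
scaled, gates = se_block(feature, w1=[[0.1, 0.2]], w2=[[0.5], [-0.5]])
print(gates)
```

Each gate lies between 0 and 1, so the block can only rescale channels relative to one another; it adds only the two small weight matrices, which is consistent with the parameter count staying at about 5.26M.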

It can be concluded from Table 2 that the SE module performs best. Among the SE variants, adding attention to the small-target output is less effective than adding it to the large-target output. When the SE module is added to both outputs, SE(sl), the mAP is lower than with either output alone. This may be because the up-sampling, down-sampling and feature fusion applied to the attention-weighted features introduce negative information, which affects the learning of the model. Therefore, we ultimately chose to add the SE module only to the large-target output.
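The selection described above amounts to picking the Table 2 configuration with the highest mAP. A small sketch using the mAP column of Table 2 follows; the dictionary layout is an illustrative choice.

```python
# mAP (%) of each attention placement on YOLOx_M&DSC, taken from Table 2.
table2_map = {
    "SE (s)": 95.09, "SE (l)": 95.68, "SE (sl)": 95.05,
    "CBAM (s)": 94.70, "CBAM (l)": 94.65,
    "ECA (s)": 95.01, "ECA (l)": 94.84,
}

baseline_map = 94.50  # YOLOx_M&DSC without attention, taken from Table 1

best = max(table2_map, key=table2_map.get)
gain = table2_map[best] - baseline_map
print(f"best: {best}, mAP {table2_map[best]:.2f}% (+{gain:.2f} points)")
```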

5.2.3. Detection Effect and Analysis

By analyzing Table 1 and Table 2, we summarize the detection performance of three models in Table 3: the original YOLOx model, the YOLOx_M&DSC model and our LA_YOLOx model. In this paper, the LA_YOLOx model is YOLOx_M&DSC with the SE(l) attention module, trained on our dataset.

Table 3 shows that our LA_YOLOx model not only reduces the complexity of the model, but also effectively solves the problem of Recall degradation for PP and SC.

Setting different thresholds yields different (P, R) points, and connecting these points generates the PR curve. In general, Recall is the abscissa and Precision is the ordinate. Geometrically, AP is the area in the first quadrant enclosed by the PR curve and the coordinate axes. Therefore, the PR curve directly reflects the Recall and Precision performance. The PR curve of our LA_YOLOx model is shown in Figure 11.
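As a sketch of this geometric interpretation, AP can be computed numerically as the area under a PR curve. The function below uses the all-point interpolation that is common in PASCAL VOC-style evaluation; whether the authors used exactly this interpolation is an assumption, and the sample points are made up for illustration.

```python
def average_precision(recalls, precisions):
    """Area under the PR curve using all-point interpolation.

    `recalls` must be sorted in ascending order, with `precisions` giving
    the precision measured at each recall value.
    """
    # Add sentinel points at recall 0 and recall 1.
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangles between consecutive recall values.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))

# Made-up (R, P) points obtained at different thresholds.
recalls = [0.2, 0.4, 0.6, 0.8, 1.0]
precisions = [1.0, 0.9, 0.95, 0.7, 0.5]
print(f"AP = {average_precision(recalls, precisions):.3f}")
```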

The detection results for insulative baffles from the three YOLOx models in this paper are shown in Figure 12. The four types of defects are located with rectangular boxes of different colors, and the category and probability value are displayed at the upper left corner of each box. The images used to show the detection results are test images of insulative baffle defects.

As can be seen from Figure 12, the lightweight YOLOx model misses some small targets (PP) and large targets (SC). After the attention mechanism is introduced, however, the LA_YOLOx model in this paper detects all of them. Our LA_YOLOx model thus solves the missed-detection problem of the lightweight network, and the probability values of some targets are higher than those of the lightweight model without attention. The Recall values in Table 3 also reflect the effectiveness of the proposed LA_YOLOx model.

6. Discussion, Conclusions, and Future Works

Insulative baffles are among the most commonly used pieces of electric power safety equipment at electric power sites. It is very important to ensure their high quality and good appearance, to prevent defective products from entering the market, and to ensure the safety of power workers and the normal operation of the grid. At present, surface defects of insulative baffles are detected by manual visual inspection, and no manufacturers or enterprises carry out intelligent detection. In this paper, a surface defect detection method based on YOLOx is proposed to replace the traditional manual method. Our method makes the detection of insulative baffles intelligent, greatly improves the detection efficiency, and helps ensure the normal use of the insulative baffles and the normal operation of electric power maintenance work.

At the same time, the application scenario places certain limits on model resources. This paper therefore replaces the corresponding parts of the original YOLOx model with lightweight structures, which reduces the number of parameters by about 3.7M compared with the original YOLOx. However, the mAP also falls by about 0.8 percentage points. To recover the detection accuracy, an attention mechanism is introduced into the feature fusion module, and the final mAP of the LA_YOLOx model reaches 95.68%, about 0.4 percentage points higher than that of the original YOLOx. The experimental results show that we have made the network lightweight without degrading its detection performance. The overall detection performance of LA_YOLOx is comparable to that of the original YOLOx, which demonstrates the effectiveness of the proposed LA_YOLOx.

However, the experimental results show that the Recall of the small target defect PP and the large target defect SC is relatively low compared with the other two defects. Our LA_YOLOx model keeps the original detection accuracy but cannot improve it further. This may be because small, insignificant and complex features are hard for lightweight models to learn. Our future research will therefore aim to solve this problem with the LA_YOLOx model. In addition, we have only a small amount of data on insulative baffle defects. Although we trained with Mosaic data augmentation, the images produced by this technique are far from the true distribution of defects on insulative baffles. In the future, the dataset will be enlarged and used for training to enhance the generalization ability of the model. The experimental results also show that attention on the small-target output is less effective than attention on the large-target output, which is why the SE attention mechanism was added only to the large-target output. Therefore, we will use the L model to study how to improve the detection of small targets in the future. Finally, because of limited time and equipment, only three attention modules were tested; we will study other attention modules in the future to see whether any perform better.

Author Contributions

Conceptualization, Q.L. and X.H.; methodology, Q.L.; software, Q.L.; validation, Q.L., Z.L., X.H. and H.C.; investigation, Q.L. and X.H.; resources, H.C.; data curation, Q.L., X.H. and H.C.; writing—original draft preparation, Q.L.; writing—review and editing, Q.L. and Z.L.; visualization, Q.L.; supervision, Z.L.; project administration, Z.L. and H.C.; funding acquisition, H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 61801319, in part by the Sichuan Science and Technology Program under Grants 2020JDJQ0061 and 2021YFG0099, in part by the Sichuan University of Science and Engineering Talent Introduction Project under Grant 2020RC33, in part by the Innovation Fund of Chinese Universities under Grant 2020HYA04001, in part by the Artificial Intelligence Key Laboratory of Sichuan Province Project under Grant 2021RZJ03, and in part by the 2021 Graduate Innovation Fund of Sichuan University of Science & Engineering under Grant y2021069.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to restrictions (e.g., privacy or ethical restrictions).

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study or in the collection, analyses or interpretation of data.

Abbreviations

The following abbreviations are used in this manuscript:

LA_YOLOx	Lightweight and attentive YOLOx model
YOLOx_MobileNetV3	MobileNetV3 backbone in YOLOx
YOLOx_M&DSC	MobileNetV3 backbone and Depthwise separable convolution in YOLOx
YOLOx_GhostNet	GhostNet backbone in YOLOx
YOLOx_G&DSC	GhostNet backbone and Depthwise separable convolution in YOLOx

References

Yao, J.G.; Xiao, H.Y.; Zhang, J.; Jiang, Y.; Yao, P.; Sun, G.Q. Design of Electric Equipment Operation Security Condition Assessment System. J. Power Syst. Autom. 2009, 1, 52–58. [Google Scholar]
Su, H.; Zhang, J.B.; Zhang, B.H.; Wei, Z. Review of research on the inspection of surface defect based on visual perception. Comput. Integr. Manuf. Syst. 2009, 29, 169–191. [Google Scholar]
Cao, M.; Zhang, W.; Zeng, W.; Chen, C. Research on the device of differential excitation type eddy current testing for metal defect detection. In Proceedings of the IEEE Far East Forum on Nondestructive Evaluation/Testing: New Technology and Application, Jinan, China, 17–20 June 2013; pp. 155–158. [Google Scholar]
Zeng, W.; Wang, H.; Tian, G.; Hu, G.X.; Yang, X.M.; Wan, M. Application of a non-contact laser ultrasonic imaging technique in surface defect detection. In Proceedings of the IEEE Far East Forum on Nondestructive Evaluation/Testing, Chengdu, China, 20–23 June 2014; pp. 30–33. [Google Scholar]
Pan, M.; Zhou, D.; Chang, X. Analysis of surface defect detection characteristics of new pulsed MagneticFlux Leakage detection method. Sensors Microsystems 2017, 36, 32–35. [Google Scholar]
Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; Volume 2, pp. 1150–1157. [Google Scholar]
Lowe, D.G. The title of the cited article. Distinctive Image Featur. Scale-Invariant Keypoints 2004, 60, 91–110. [Google Scholar]
Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–25 June 2005; pp. 886–893. [Google Scholar]
Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
Girshick, R. Fast RCNN. arXiv 2015, arXiv:1504.08083. [Google Scholar]
Ren, S.Q.; He, K.M.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 36, 1137–1149. [Google Scholar] [CrossRef]
Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NE, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. arXiv 2016, arXiv:1512.02325. [Google Scholar]
Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
Ge, Z.; Liu, S.T.; Wang, F.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
Cheng, J.Y.; Duan, X.H.; Zhu, W. Research on Metal Surface Defect Detection by Improved YOLOv3. Comput. Eng. Appl. 2021, 57, 252–258. [Google Scholar]
Xu, Y.J.; Li, C. Lightweight object detection network based on optimized YOLO. Comput. Sci. 2021, 48, 265–269. [Google Scholar]
Lim, W.H.; Bonab, M.B.; Chua, K.H. An Optimized Lightweight Model for Real-Time Wood Defects Detection based on YOLOv4-Tiny. In Proceedings of the 2022 IEEE International Conference on Automatic Control and Intelligent Systems (I2CACIS), Shah Alam, Malaysia, 25–25 June 2022; pp. 186–191. [Google Scholar]
Kim, M.; Jeong, J.; Kim, S. ECAP-YOLO: Efficient Channel Attention Pyramid YOLO for Small Object Detection in Aerial Image. Remote. Sens. 2021, 13, 4851. [Google Scholar] [CrossRef]
Zheng, Z.; Zhao, J.; Li, Y. Research on Detecting Bearing-Cover Defects Based on Improved YOLOv3. IEEE Access 2021, 9, 10304–10315. [Google Scholar] [CrossRef]
Tao, X.; Hou, W.; Xu, D. A Survey of Surface Defect Detection Methods Based on Deep Learning. Acta Autom. Sin. 2021, 47, 1017–1034. [Google Scholar]
Fan, L.L.; Zhao, H.W.; Zhao, H.Y.; Hu, H.; Wang, Z. Survey of target detection based on deep convolutional neural networks. Opt. Precis. Eng. 2020, 28, 1152–1164. [Google Scholar]
He, K.M.; Zhang, X.Y.; Ren, S.Q. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef]
Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. arXiv 2018, arXiv:1803.01534. [Google Scholar]
Lin, T.Y.; Dollár, P.; Girshick, R. Feature Pyramid Networks for Object Detection. arXiv 2017, arXiv:1612.03144. [Google Scholar]
Yu, J.H.; Jiang, Y.N.; Wang, Z.Y.; Huang, T. UnitBox: An Advanced Object Detection Network. arXiv 2016, arXiv:1608.01471. [Google Scholar]
Zheng, Z.H.; Wang, P.; Ren, D.W.; Liu, W.; Ye, R.; Hu, Q.; Zuo, W. Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation. arXiv 2021, arXiv:2005.03572. [Google Scholar] [CrossRef]
Ge, Z.; Liu, S.T.; Li, Z.M.; Yoshie, O.; Sun, J. Ota: Optimal transport assignment for object detection. arXiv 2021, arXiv:2103.14259. [Google Scholar]
Howard, A.; Zhu, M.L.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:2103.14259. [Google Scholar]
Sandler, M.; Howard, A.; Zhu, M.L.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. arXiv 1801, arXiv:1801.04381. [Google Scholar]
Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. arXiv 2019, arXiv:10.48550/1905.02244. [Google Scholar]
Han, K.; Wang, Y.H.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features from Cheap Operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1577–1586. [Google Scholar]
Hu, J.; Shen, L.; Albanie, S. Squeeze-and-Excitation Networks. arXiv 2018, arXiv:1709.01507. [Google Scholar]
Woo, S.; Park, J.; Lee, J. CBAM: Convolutional Block Attention Module. arXiv 2018, arXiv:1807.06521. [Google Scholar]
Wang, Q.L.; Wu, B.G.; Zhu, P.F.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. arXiv 2020, arXiv:1910.03151. [Google Scholar]

Figure 1. Defect examples of insulative baffles. (a) Scratch (SC). (b) Burn mark (BM). (c) Pit points (PP). (d) Voltage breakdown (VB).

Figure 2. Detecting process of YOLO models.

Figure 3. YOLOx structure.

Figure 4. Bneck.

Figure 5. (a) Convolution layer. (b) Ghost module.

Figure 6. Calculation process of DSC.

Figure 7. SE Module.

Figure 8. CBAM Module.

Figure 9. ECA Module.

Figure 10. LA_YOLOx structure.

Figure 11. P-R curve of four defects.

Figure 12. Detection effects of different YOLOx models.

Table 1. Performance of lightweight YOLOx based on our dataset.

Model	Index	BM	PP	SC	VB	Params (M)	mAP (%)
YOLOx	R (%)	98.95	91.67	85.54	97.50	8.93	95.31
	P (%)	98.95	94.83	91.90	96.30
	AP (%)	99.19	94.57	91.13	96.36
YOLOx (Tiny)	R (%)	99.21	90.00	85.38	97.50	5.03	94.72
	P (%)	98.95	93.10	91.89	97.50
	AP (%)	99.47	91.68	91.49	96.36
YOLOx (Nano)	R (%)	98.43	84.76	82.00	96.25	2.25	91.87
	P (%)	98.94	89.22	94.50	95.06
	AP (%)	99.28	84.54	88.86	94.80
YOLOx (M)	R (%)	98.69	92.14	84.46	96.25	7.47	95.60
	P (%)	98.43	93.93	93.21	98.72
	AP (%)	98.43	94.67	91.42	97.87
YOLOx (M&DSC)	R (%)	98.69	89.29	82.77	97.50	5.26	94.50
	P (%)	99.21	93.25	96.76	98.73
	AP (%)	99.18	90.59	90.74	97.49
YOLOx (G)	R (%)	96.69	91.19	85.23	98.75	7.17	95.42
	P (%)	98.95	94.57	90.82	96.34
	AP (%)	98.79	94.34	90.90	97.63
YOLOx (G&DSC)	R (%)	98.69	88.33	83.23	96.25	4.96	93.70
	P (%)	99.47	92.06	95.92	97.47
	AP (%)	99.00	88.04	90.95	96.79

Table 2. Performance of different attention modules based on YOLOx(M&DSC).

Attention	Index	BM	PP	SC	VB	mAP (%)
SE (s)	R (%)	98.69	91.19	83.54	96.25	95.09
	P (%)	99.21	95.51	94.76	98.72
	AP (%)	98.83	93.37	91.16	97.01
SE (l)	R (%)	98.69	91.90	84.66	97.50	95.68
	P (%)	99.21	95.31	95.64	98.72
	AP (%)	98.75	94.11	92.02	97.85
CBAM (s)	R (%)	98.69	90.00	83.08	96.25	94.70
	P (%)	99.47	93.56	94.41	97.47
	AP (%)	99.31	91.32	90.44	97.75
CBAM (l)	R (%)	98.43	91.19	82.46	96.25	94.65
	P (%)	99.21	93.87	95.89	98.72
	AP (%)	98.65	90.99	91.49	97.46
ECA (s)	R (%)	98.95	90.95	82.62	96.25	95.01
	P (%)	99.47	93.40	95.21	98.72
	AP (%)	99.10	92.90	90.88	97.16
ECA (l)	R (%)	98.43	90.71	83.08	96.25	94.84
	P (%)	99.21	94.54	95.91	97.47
	AP (%)	98.68	92.27	91.21	97.21
SE (sl)	R (%)	98.43	90.95	83.23	93.75	95.05
	P (%)	99.21	94.09	95.41	97.40
	AP (%)	98.68	93.02	91.51	97.00

Table 3. Overview performance of three YOLOx models based on our dataset (Ablation experiments).

Model	Index	BM	PP	SC	VB	Params (M)	mAP (%)
YOLOx	R (%)	98.95	91.67	85.54	97.50	8.93	95.31
	P (%)	98.95	94.83	91.90	96.30
	AP (%)	99.19	94.57	91.13	96.36
YOLOx (M&DSC)	R (%)	98.69	89.29	82.77	97.50	5.26	94.50
	P (%)	99.21	93.25	96.76	98.73
	AP (%)	99.18	90.59	90.74	97.49
LA_YOLOx	R (%)	98.69	91.90	84.66	97.50	5.26	95.68
	P (%)	99.21	95.31	95.64	98.72
	AP (%)	98.75	94.11	92.02	97.85

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
