Article

Object Detection Related to Irregular Behaviors of Substation Personnel Based on Improved YOLOv4

College of Computer Science, Sichuan University, Chengdu 610065, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(9), 4301; https://doi.org/10.3390/app12094301
Submission received: 7 March 2022 / Revised: 17 April 2022 / Accepted: 23 April 2022 / Published: 24 April 2022
(This article belongs to the Special Issue Applied Artificial Intelligence (AI))

Abstract

The accurate and timely detection of irregular behavior of substation personnel plays an important role in maintaining personal safety and preventing power outage accidents. This paper proposes a method for irregular behavior detection (IBD) of substation personnel based on an improved YOLOv4, which uses MobileNetV3 to replace the CSPDarkNet53 feature extraction network, applies depthwise separable convolution and efficient channel attention (ECA) to optimize the SPP and PANet networks, and fuses feature maps at four scales to improve detection accuracy. First, an image dataset was constructed using video data and still photographs preprocessed by the gamma correction method. Then, the improved YOLOv4 model was trained using Mosaic data augmentation, cosine annealing, and label smoothing techniques. Several detection cases were carried out, and the experimental results showed that the proposed improved YOLOv4 model has high accuracy, with a mean average precision (mAP) of 83.51%, as well as a fast detection speed, with a frames per second (FPS) rate of 38.06 pictures/s. This represents better performance than other object detection methods, including Faster RCNN, SSD, YOLOv3, and YOLOv4. This study offers a reference for the IBD of substation personnel and provides an automated intelligent monitoring method.

1. Introduction

Power substations are the main location of voltage transformation, current transformation, and power distribution in the power system [1], and their safe and stable operation is the basis for reducing the occurrence of outage accidents. Because substation personnel work in the same environment for long periods, their vigilance gradually relaxes and their safety awareness inevitably weakens; as a result, they often violate operating regulations during practical operation, resulting in various accidents [2]. Therefore, monitoring operating procedures, safety personnel, and personal protective equipment at the work site has become one of the key issues in reducing workplace accidents and worker injuries. The most common irregular behaviors include the absence of safety personnel, the lack of safe operating tools, not wearing insulated gloves, and not wearing safety helmets, all of which pose great safety hazards to both operators and electrical equipment. Thus, it is necessary to monitor workers' appropriate use of safe operating tools and protective equipment in real time through video surveillance [3]; object detection technology based on deep learning can meet this demand.
At present, detection methods focused on irregular behavior by substation personnel can be divided into two categories: traditional machine learning-based methods and deep learning-based methods. Detection methods based on traditional machine learning usually rely on image processing technology, feature definition, and feature extraction, where the extracted features include the color, texture, shape, and other characteristics of local or overall targets [4]. For instance, Li et al. [5] developed a novel and practical safety helmet wearing detection system that used the color features of the HSV color space to identify whether workers were wearing safety helmets, achieving good results in common substation situations. Li et al. [6] used the ViBe background modeling algorithm to detect and extract moving targets captured by the video surveillance system in a substation; histogram of oriented gradients (HOG) features were then extracted to train a support vector machine (SVM) for pedestrian classification, and finally safety helmet detection was performed using color features. Liu [7] and Hu [8] divided worker images into three units, then extracted HOG and histogram of color (HOC) features from these units to train a linear SVM classifier, improving the detection precision of substation workers' protective clothing. Traditional machine learning-based methods usually require image processing along with complex feature definition, extraction, and selection processes. However, the detection accuracy of these methods drops drastically in complex conditions such as uneven illumination, varying weather, and unusual viewing angles, and therefore cannot meet the requirements of irregular behavior detection (IBD) of substation personnel.
Recently, with the introduction of deep learning, object detection technologies have developed substantially and feature extraction ability has improved significantly, avoiding the tedious process of manual feature definition through layer-by-layer feature extraction and nonlinear transformation. The development of deep learning object detection technology provides a new automatic detection method for image-based object detection of power equipment [9], personnel safety, foreign objects [10], etc., in substations. Wang et al. [11] quickly identified substation operators by training a single shot multi-box detector (SSD) model, then combined this with the geometric features of the human body, the HSV color space, and morphological processing to detect safety helmet wearing conditions. Chen et al. [12] proposed an improved Faster R-CNN algorithm based on ResNet101; combined with Retinex image enhancement technology and the K-means++ clustering algorithm, the mean average precision (mAP) for helmet wearing detection reached 94.3% and the detection speed reached 11.62 images per second. Yao et al. [13] proposed an enhanced RCNN model that combined Faster RCNN and the Wasserstein generative adversarial network (GAN) to enhance low-resolution images of substations; the mAP and intersection over union (IoU) for people, bicycles, cars, and other targets in substations were 82.45% and 71.26%, respectively. Zhao et al. [14] proposed a wear-enhanced YOLOv3 algorithm that used gamma correction preprocessing, data augmentation, and the K-means++ algorithm; the mAP and frames per second (FPS) of this method were 81.95% and 19.4, respectively. Wang [15] simplified the feature extraction and feature fusion networks on the basis of YOLOv4 while reducing the output branches of the network and using depthwise separable convolution to replace standard convolution; this model reached a precision of 81.13% and a recall of 84.97% for safety helmet detection. However, most of the above-mentioned methods are intended only for the detection of safety helmet wearing, which is not the only kind of irregular behavior in substations. At the same time, the above methods have low detection speed and poor real-time performance, which may prevent inspectors from averting safety accidents in time.
In view of these problems, this paper proposes a method for the IBD of substation personnel based on an improved YOLOv4. First, a dataset of irregular behaviors of substation personnel was constructed from video surveillance footage and still photographs, and the images were preprocessed by gamma correction. Then, MobileNetV3 was used to replace the CSPDarkNet53 of YOLOv4 as the feature extraction network in order to reduce the number of parameters. At the same time, feature maps at four scales were extracted and input to the SPP and PANet networks, which were optimized with depthwise separable convolution and efficient channel attention (ECA) to improve the performance of the detection model. The experimental results show that the method proposed in this paper achieves better accuracy and faster detection speed than other methods, and can provide a reference for real-time detection of the IBD of substation personnel.

2. Irregular Behavior Detection Dataset of Substation Personnel

2.1. Dataset Construction and Annotation

The IBD dataset of substation personnel used in this paper was partly derived from video data captured by the substation's video surveillance system, which were converted into a total of 494 images. Another 2491 images were collected from photographs taken by substation workers while they were working. The IBD dataset of substation personnel included three categories, namely, safety protective equipment, safety operating equipment, and safety personnel, specifically covering safety helmets, insulating gloves, ordinary operators, safety personnel, insulated operating rods, and electroscopes. Detecting the objects that substation personnel should wear and carry when working can reveal whether they are engaging in irregular behaviors. If the control center detects these irregular behaviors, it can stop them in time to avoid personal safety accidents.
The IBD dataset of substation personnel needed to be annotated before training the object detection model. In this paper, the LabelImg labeling tool [16] was used to annotate the location and category of each object using a rectangular bounding box. Both ordinary operators and safety personnel were labeled as "person"; for safety personnel, the badge they wear was additionally labeled as "badge". Safety helmets, insulating gloves, insulated operating rods, and electroscopes were labeled as "helmet", "glove", "operatingbar", and "powerchecker", respectively. The annotation format used in this paper is Pascal VOC [17], a common format in the field of computer vision, in which the target location is given by the upper left vertex coordinates (xmin, ymin) and lower right vertex coordinates (xmax, ymax) of the rectangular bounding box.
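As an illustration of this annotation format, a minimal Python sketch for reading one LabelImg/Pascal VOC XML file is given below; the file path and usage are hypothetical and serve only to show how the class labels and bounding boxes described above are stored.

```python
import xml.etree.ElementTree as ET

def parse_voc_annotation(xml_path):
    """Parse one Pascal VOC XML file into a list of (class_name, box) tuples.

    Each box is (xmin, ymin, xmax, ymax) in pixel coordinates, as written by
    LabelImg. The path in the usage example below is hypothetical.
    """
    root = ET.parse(xml_path).getroot()
    objects = []
    for obj in root.iter("object"):
        name = obj.find("name").text               # e.g. "person", "helmet", "glove"
        box = obj.find("bndbox")
        coords = tuple(int(float(box.find(tag).text))
                       for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((name, coords))
    return objects

# Hypothetical usage:
# for name, (xmin, ymin, xmax, ymax) in parse_voc_annotation("annotations/img_0001.xml"):
#     print(name, xmin, ymin, xmax, ymax)
```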

2.2. Gamma Correction Preprocessing

The images in the IBD dataset of substation personnel were highly diverse, as they were shot at different times, in different environments, and from different angles; consequently, problems such as uneven illumination and low contrast had to be addressed. To deal with these problems, gamma correction [18] was used to preprocess the images in order to reduce irrelevant information, improve image contrast, and speed up the training process of the object detection model.
Gamma correction is a method for performing non-linear tone editing on an image based on the gamma curve: the dark and light parts of the image signal are detected and their ratio is adjusted, thereby improving the contrast of the image. In the field of computer graphics, the gamma curve [19] describes the transform relationship between the screen output voltage and the corresponding brightness. Taking a single substation operator image as input, each pixel in the image was gamma corrected according to the following steps:
(1)
Normalization: the value of each pixel in the substation personnel irregular behavior image was converted into the range between 0 and 1 according to $I_N = (I + 0.5)/256$, where $I_N$ is the normalized pixel value and $I$ is the original pixel value.
(2)
Pre-compensation: the pre-compensated value $I_N'$ was obtained from $I_N' = I_N^{1/\gamma}$, where $\gamma$ is the gamma correction parameter.
(3)
Denormalization: the pre-compensated value $I_N'$ was converted back to the original range between 0 and 255.
Through the above three steps, a gamma-corrected preprocessed image can be obtained; the effect is shown in Figure 1. When γ < 1, images become brighter as γ decreases, while when γ > 1, images become darker as γ increases. When γ approaches 0 or becomes very large, the image becomes extremely bright or extremely dark, respectively, and such excessive correction results in low image contrast.
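The three steps above map directly onto a few lines of NumPy. The following is a minimal sketch of the procedure, not the exact implementation used in this study; the default γ of 1.2 corresponds to the best-performing setting reported later in Section 4.

```python
import numpy as np

def gamma_correct(image, gamma=1.2):
    """Apply gamma correction to an 8-bit image array (H x W or H x W x 3).

    Follows the three steps described above: normalization, pre-compensation
    with exponent 1/gamma, and denormalization back to [0, 255].
    """
    image = image.astype(np.float32)
    normalized = (image + 0.5) / 256.0                    # step 1: normalization
    compensated = np.power(normalized, 1.0 / gamma)       # step 2: pre-compensation
    denormalized = compensated * 256.0 - 0.5              # step 3: denormalization
    return np.clip(denormalized, 0, 255).astype(np.uint8)
```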

3. Improved YOLOv4 Lightweight Detection Model

A lightweight improved YOLOv4 model is proposed in this paper to achieve fast and accurate detection of irregular behaviors of substation personnel. The overall detection process is shown in Figure 2. First, gamma correction preprocessing was used to reduce interference from irrelevant information and improve image contrast. Then, features at four scales were extracted by MobileNetV3 and fed into the ECA-SPP and ECA-PANet networks to fuse and enhance the feature information, after which the detection layer used the fused features to predict locations and categories. Finally, non-maximum suppression (NMS) was used to eliminate redundant prediction bounding boxes, completing the detection of irregular behaviors of substation personnel; a small sketch of this NMS step follows this overview. The structure and principle of the lightweight improved YOLOv4 model are introduced below.
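As a minimal illustration of the NMS step, the following sketch performs greedy suppression of overlapping prediction boxes. The 0.3 IoU threshold matches the NMS setting listed later in Table 1; the implementation itself is a generic sketch rather than the exact code used in this study.

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much.

    `boxes` is an (N, 4) array of (xmin, ymin, xmax, ymax); `scores` is an (N,) array.
    Returns the indices of the boxes that are kept.
    """
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(scores)[::-1]                      # highest confidence first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        # IoU between the best box and all remaining boxes
        xx1 = np.maximum(boxes[best, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[best, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[best, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[best, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[best] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_threshold]           # discard redundant boxes
    return keep
```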

3.1. YOLOv4 Algorithm

The YOLOv4 [20] model is an end-to-end object detection algorithm; compared to the YOLOv3 model, it absorbs many effective optimization strategies from the convolutional neural network (CNN) field, covering the feature extraction network, model training, the loss function, etc., and achieves significant improvements in both accuracy and speed. Therefore, we used YOLOv4 as the basic network for building a lightweight model to facilitate real-time detection in the substation. The YOLOv4 model is mainly composed of the three parts shown in Figure 3, namely, the feature extraction network (CSPDarknet53), the feature enhancement network (SPP and PANet), and the prediction network (YOLOHead).
First, the images from the IBD dataset of substation personnel were input into CSPDarkNet53, which consists of a CBM module and five CSP modules. Then, features at three scales (52 × 52, 26 × 26, and 13 × 13) were extracted and fused using spatial pyramid pooling (SPP) [21] and the path aggregation network (PANet) [22] to enhance the feature information. Finally, the category and location of targets were predicted through the 3 × 3 and 1 × 1 convolutions in YOLOHead.
Next, the predicted category and location results must be decoded in order to display the detection results intuitively. Generally, the prediction results are processed into feature information of dimension S × S × M by the 1 × 1 convolution in YOLOHead, where S is the feature map size at a given scale and M encodes the location and category information, defined as follows:
$M = N_{\mathrm{anchors}} \times (N_{\mathrm{classes}} + L_{x,y,w,h} + C)$ (1)
where $N_{\mathrm{anchors}}$ is the number of anchors (three) allocated to each scale of feature map, $N_{\mathrm{classes}}$ is the total number of categories, $L_{x,y,w,h}$ denotes the four location parameters of the anchors, and $C$ is the confidence score. The confidence is defined as
$C = \Pr(\mathrm{object}) \times \mathrm{IoU}_{\mathrm{pred}}^{\mathrm{truth}}$ (2)
where $\mathrm{IoU}_{\mathrm{pred}}^{\mathrm{truth}}$ measures the consistency between the ground-truth bounding box and the predicted bounding box; its value is the ratio of their intersection area to their union area. $\Pr(\mathrm{object})$ reflects whether the anchor contains a target: if there is a target, its value is 1; otherwise, it is 0.
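As a concrete illustration of Equation (2), the following sketch computes the IoU between a ground-truth box and a predicted box given in (xmin, ymin, xmax, ymax) format; the coordinate values are made-up examples.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Made-up example: an anchor that contains a target has Pr(object) = 1,
# so its confidence equals the IoU with the ground-truth box.
truth, pred = (50, 60, 150, 200), (55, 70, 160, 210)
confidence = 1 * iou(truth, pred)
```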

3.2. MobileNetV3

As the latest version of the MobileNet series, MobileNetV3 [23] combines the depthwise separable convolution of MobileNetV1 [24] with the inverted residual and linear bottleneck of MobileNetV2 [25]. In addition, the squeeze-and-excitation network (SENet) [26] attention mechanism and the hard-swish activation function are introduced to improve the feature extraction capacity of the model.
MobileNetV3 is mainly composed of a CBH module and fifteen bneck modules. The CBH module includes a convolutional layer (Conv), a batch normalization (BN) layer, and the hard-swish activation function. The hard-swish activation function is a new activation function proposed in MobileNetV3; its definition is shown in Equation (3). Model efficiency is improved and inference speed accelerated by replacing the sigmoid function in swish [27] with the ReLU6 function. The bneck module is designed on the basis of depthwise separable convolution and the inverted residual with linear bottleneck. The SENet lightweight attention module is introduced into certain specific bneck modules to increase the weights of channels carrying important features; the structure is shown in Figure 4.
$\mathrm{hard\text{-}swish}(x) = x \cdot \dfrac{\mathrm{ReLU6}(x + 3)}{6}, \quad \mathrm{ReLU6}(x) = \min[6, \max(0, x)]$ (3)
According to Figure 4, the input is first passed through a pointwise convolution layer to increase its dimensionality, followed by the BN layer and the hard-swish activation function. Next, a 3 × 3 depthwise convolution extracts the features of each input channel, and a weight coefficient for each channel is generated by the SENet attention mechanism, which consists of global average pooling (GAP), a fully connected (FC) layer, a ReLU activation function, another FC layer, and a hard-sigmoid activation function. The weight coefficient is then multiplied with all elements of the corresponding channel to enhance important features and suppress unimportant ones. Finally, the last pointwise convolution reduces the dimensionality and performs a linear mapping to reduce feature loss and obtain better detection results.
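For readers wishing to reproduce this building block, a simplified tf.keras sketch of a stride-1 bneck module is given below. The layer order follows the description above (pointwise expansion, 3 × 3 depthwise convolution, optional SENet reweighting, pointwise linear projection); the expansion width and the SE reduction factor of 4 are assumptions taken from the general MobileNetV3 design, not values reported in this paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def hard_swish(x):
    """hard-swish(x) = x * ReLU6(x + 3) / 6, as in Equation (3)."""
    return x * tf.nn.relu6(x + 3.0) / 6.0

def se_block(x, reduction=4):
    """SENet attention: GAP -> FC -> ReLU -> FC -> hard-sigmoid -> channel-wise scaling."""
    channels = x.shape[-1]
    w = layers.GlobalAveragePooling2D()(x)
    w = layers.Dense(channels // reduction, activation="relu")(w)
    w = layers.Dense(channels, activation="hard_sigmoid")(w)
    w = layers.Reshape((1, 1, channels))(w)
    return layers.Multiply()([x, w])

def bneck(x, expand_channels, out_channels, use_se=True):
    """Simplified stride-1 bneck: expand -> depthwise conv -> (SE) -> linear projection."""
    shortcut = x
    y = layers.Conv2D(expand_channels, 1, padding="same", use_bias=False)(x)   # pointwise expansion
    y = layers.BatchNormalization()(y)
    y = layers.Activation(hard_swish)(y)
    y = layers.DepthwiseConv2D(3, padding="same", use_bias=False)(y)           # per-channel features
    y = layers.BatchNormalization()(y)
    y = layers.Activation(hard_swish)(y)
    if use_se:
        y = se_block(y)                                                        # reweight channels
    y = layers.Conv2D(out_channels, 1, padding="same", use_bias=False)(y)      # linear projection
    y = layers.BatchNormalization()(y)
    if shortcut.shape[-1] == out_channels:
        y = layers.Add()([shortcut, y])                                        # inverted residual
    return y
```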

3.3. Efficient Channel Attention

As the backgrounds of images obtained from video surveillance are very complex, interference caused by the environment must be suppressed; this was achieved by adding an attention mechanism to improve detection accuracy. In order to balance accuracy and computation, an efficient channel attention (ECA) [28] module was used to improve the SPP and PANet networks; it adds minimal computation while significantly improving the performance of the model in various aspects. The principle and structure of the ECA module are shown in Figure 5.
While maintaining the channel dimension, the ECA module performs GAP on the feature channels and then applies a one-dimensional convolution to generate channel weights, where the convolution kernel size k is determined adaptively; that is, the kernel size k grows with the channel dimension. The kernel size k can be calculated from the channel dimension C by Equation (4), as follows:
$k = \psi(C) = \left| \dfrac{\log_2(C)}{a} + \dfrac{b}{a} \right|_{\mathrm{odd}}$ (4)
where $k$ is adaptively determined by a mapping of the channel dimension, $|\cdot|_{\mathrm{odd}}$ denotes the nearest odd number, and $a$ and $b$ are the coefficients of the linear mapping, set to 2 and 1, respectively.
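A minimal tf.keras sketch of the ECA module is shown below, assuming channels-last feature maps and the coefficients a = 2 and b = 1 from Equation (4); it only illustrates the GAP → one-dimensional convolution → sigmoid → channel reweighting pipeline and is not the exact implementation used in this study.

```python
import math
import tensorflow as tf
from tensorflow.keras import layers

def eca_kernel_size(channels, a=2, b=1):
    """Adaptive kernel size from Equation (4): nearest odd value of log2(C)/a + b/a."""
    k = int(abs(math.log2(channels) / a + b / a))
    return k if k % 2 == 1 else k + 1

def eca_block(x):
    """Efficient channel attention: GAP -> 1D convolution across channels -> sigmoid -> scale."""
    channels = x.shape[-1]
    k = eca_kernel_size(channels)
    w = layers.GlobalAveragePooling2D()(x)                 # (batch, C)
    w = layers.Reshape((channels, 1))(w)                   # treat channels as a 1D sequence
    w = layers.Conv1D(1, kernel_size=k, padding="same", use_bias=False)(w)
    w = layers.Activation("sigmoid")(w)
    w = layers.Reshape((1, 1, channels))(w)
    return layers.Multiply()([x, w])                       # reweight each channel

# Example: a 512-channel feature map gives an adaptive kernel size of 5.
```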

3.4. Improved YOLOv4 Model

The implementation of YOLOv4 has high requirements in terms of computing performance and memory space. In order to improve the detection speed to meet the real-time requirements of practical engineering, an improved YOLOv4 lightweight model is proposed, the structure of which is shown in Figure 6. The specific principles are described in detail as follows.
(1) Feature extraction network: the CSPDarknet53 feature extraction network in YOLOv4 was replaced by the MobileNetV3 lightweight convolutional neural network, and the input size required by the initial MobileNetV3 was changed from 224 × 224 to 416 × 416. As shown in Figure 6, the original three feature scales were increased to four; the outputs after the bneck3, bneck3, bneck6, and bneck3 module groups were used as feature 1, feature 2, feature 3, and feature 4, respectively, where the number after "bneck" indicates how many bneck modules are stacked in the group. Moreover, SENet was used only in the second bneck3 group, the last two bnecks of the bneck6 group, and the last bneck3 group. The sizes of features 1, 2, 3, and 4 in the improved YOLOv4 model are 104 × 104, 52 × 52, 26 × 26, and 13 × 13, respectively, and they are used for feature fusion and enhancement in the SPP and PANet networks.
(2) ECA-SPP and ECA-PANet: in order to further improve detection accuracy, the ECA attention mechanism was added behind the four extracted feature scales; depthwise separable convolution (DSC) was used to replace the 3 × 3 standard convolutions in SPP and PANet, and the ECA attention mechanism was embedded in SPP and PANet. The resulting networks were named ECA-SPP and ECA-PANet, as shown in Figure 6. The details are as follows: (i) the CBR × 3 in front of and behind the SPP in Figure 3 was replaced with a CDC module; (ii) the ECA attention mechanism was added behind the max-pooling layers with four kinds of pooling kernel sizes (1 × 1, 5 × 5, 9 × 9, 13 × 13) in SPP; (iii) the CBR × 5 in the PANet of YOLOv4 in Figure 3 was replaced with an MFC module; (iv) the ECA attention mechanism was added behind each upsampling and downsampling operation.
(3) Prediction network: the output feature maps of ECA-SPP and ECA-PANet were input to the prediction network (YOLOHead), in which the CBR modules were replaced by DSC to further reduce the computation of the model. The fused features of the three output scales were then reshaped to 52 × 52 × 33, 26 × 26 × 33, and 13 × 13 × 33, where the third dimension, 33, can be split into 3 × (6 + 5); 3 represents the anchors assigned to each scale of feature map, 6 represents the total number of categories in the irregular behavior dataset of substation personnel, and 5 can be divided into 1 + 1 + 1 + 1 + 1, representing, respectively, the abscissa and ordinate of the center point, the width and height of the prediction box, and the confidence score.
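The following sketch illustrates how one YOLOHead output tensor of this shape can be split into its box, confidence, and class components; the function name and dummy input are illustrative only and are not part of the published implementation.

```python
import tensorflow as tf

def split_head_output(raw, num_anchors=3, num_classes=6):
    """Split a YOLOHead output of shape (batch, S, S, 33) into box, confidence, and class parts.

    33 = 3 anchors x (4 box parameters + 1 confidence score + 6 class scores).
    """
    s = raw.shape[1]
    raw = tf.reshape(raw, (-1, s, s, num_anchors, 5 + num_classes))
    box_xywh = raw[..., 0:4]        # center x, center y, width, height
    confidence = raw[..., 4:5]      # objectness / confidence score
    class_scores = raw[..., 5:]     # one score per category
    return box_xywh, confidence, class_scores

# Example with the smallest scale from the paper (13 x 13):
dummy = tf.random.normal((1, 13, 13, 33))
boxes, conf, classes = split_head_output(dummy)
# shapes: (1, 13, 13, 3, 4), (1, 13, 13, 3, 1), (1, 13, 13, 3, 6)
```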

4. Experimental Results and Analysis

In order to obtain the optimal improved YOLOv4 detection model and verify its superiority, the IBD case study for substation personnel was carried out in a software environment consisting of Visual Studio Code 2016, TensorFlow 2.4.0, OpenCV 4.2.0.34, CUDA 11.0, and cuDNN 8.0.5.39, with the hardware environment of an Nvidia GeForce RTX 3060 GPU with 6 GB of video memory. The results obtained under different influencing factors, with gamma correction, and with different methods were then discussed and compared to verify the performance of the proposed model.

4.1. Experimental Process and Parameter Configuration

The case implementation process is shown in Figure 7. The images of irregular behaviors of substation personnel were preprocessed by gamma correction, and the images with their corresponding annotations were divided into a train_val set and a test set at a certain ratio. Then, the K-means clustering method was used to cluster the ground-truth boxes in the train_val set to obtain the anchor sizes (11:11; 17:37; 25:17; 32:64; 41:146; 43:27; 73:48; 82:177; 161:260); a sketch of this clustering step is given below.
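The following is a minimal sketch of such anchor clustering on the (width, height) pairs of the ground-truth boxes. It assumes the 1 − IoU distance commonly used for YOLO anchor clustering, since the exact distance metric is not stated here, and it is illustrative rather than the implementation used in this study.

```python
import numpy as np

def kmeans_anchors(wh, k=9, iterations=100, seed=0):
    """Cluster ground-truth box (width, height) pairs into k anchor sizes.

    `wh` is an (N, 2) array of box widths and heights in pixels; the distance
    between a box and an anchor is 1 - IoU, assuming co-centered boxes.
    """
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), k, replace=False)].astype(np.float64)
    for _ in range(iterations):
        inter = np.minimum(wh[:, None, 0], anchors[None, :, 0]) * \
                np.minimum(wh[:, None, 1], anchors[None, :, 1])
        union = wh[:, 0:1] * wh[:, 1:2] + anchors[:, 0] * anchors[:, 1] - inter
        assignment = np.argmax(inter / union, axis=1)      # highest IoU = smallest 1 - IoU
        new_anchors = np.array([wh[assignment == i].mean(axis=0) if np.any(assignment == i)
                                else anchors[i] for i in range(k)])
        if np.allclose(new_anchors, anchors):
            break
        anchors = new_anchors
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]  # sorted by area
```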
During model training, the preprocessed images were augmented by geometric and optical transformations to expand the samples, and pre-trained weights, obtained by training the improved YOLOv4 model on the public ImageNet dataset, were used to accelerate model training. Next, the train_val set was divided into a training set and a validation set at a ratio of 9:1 and input to the improved YOLOv4 model for training, in combination with Mosaic data augmentation, cosine annealing, and label smoothing to enhance the effectiveness and generalization of model training. At the same time, a stage-wise training method was adopted in order to save training resources. Finally, the optimal improved YOLOv4 model was used to detect the images in the test set, with non-maximum suppression (NMS) applied to eliminate redundant prediction boxes. The parameters used for model training are shown in Table 1; the initial learning rate was first applied with warmup and was then adjusted by cosine annealing to make the model more stable and accelerate convergence, as sketched below. The loss curves during model training are shown in Figure 8.
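The warmup and cosine annealing schedule can be sketched as follows, using the first-stage values from Table 1; the exact schedule shape (linear warmup followed by cosine decay to the minimum learning rate) is an assumption, since only the parameter values are reported.

```python
import math

def warmup_cosine_lr(epoch, total_epochs=100, warmup_epochs=10,
                     lr_init=1e-3, lr_min=1e-6):
    """Learning rate for a given epoch: linear warmup, then cosine annealing.

    Values mirror the first training stage in Table 1.
    """
    if epoch < warmup_epochs:
        return lr_init * (epoch + 1) / warmup_epochs                  # linear warmup
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return lr_min + 0.5 * (lr_init - lr_min) * (1 + math.cos(math.pi * progress))

# The rate rises to 1e-3 over the first 10 epochs, then decays toward 1e-6.
schedule = [warmup_cosine_lr(e) for e in range(100)]
```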
In order to facilitate the comparison of detection performance between different algorithm models, four evaluation indexes commonly used in the field of object detection were introduced. The first evaluation index is average precision (AP), which measures the recognition accuracy for each class. The second is mean average precision (mAP), which is the mean of the AP values of all classes. The third is frames per second (FPS), which represents the number of images detected per second and reflects the detection speed of the model. The fourth is the F1-score, which is the harmonic mean of precision and recall and reflects the detection accuracy of the model for each class.
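As a small worked example of the fourth index, the sketch below computes the F1-score from hypothetical detection counts for a single class; the counts are made up for illustration.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1-score as the harmonic mean of precision and recall."""
    precision = true_positives / (true_positives + false_positives + 1e-12)
    recall = true_positives / (true_positives + false_negatives + 1e-12)
    return 2 * precision * recall / (precision + recall + 1e-12)

# Hypothetical counts for one class at a fixed confidence threshold:
print(round(f1_score(true_positives=90, false_positives=15, false_negatives=20), 3))  # ~0.837
```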

4.2. Influence Factors and Result Analysis

4.2.1. Detection Results under Different Improvement Methods

The basic YOLOv4 object detection algorithm was improved for lightweight modelling by using the lightweight convolutional neural network MobileNetV3 as the feature extraction network, increasing the scales of feature extraction, and optimizing the SPP and PANet networks by using depthwise separable convolution and attention mechanisms. To demonstrate the effectiveness of these improvements, ablation experiments were performed without gamma correction preprocessing; the results are shown in Table 2 and Figure 9, with √ representing the method used to improve the model.
As can be seen in Table 2, when only the CSPDarkNet53 backbone feature extraction network was replaced with MobileNetV3, the mAP reached only 78.49%. When the three attention mechanisms (SENet, CBAM, and ECA) were used to improve the SPP and PANet networks, the mAP improved by more than 1%, with the introduction of ECA achieving the highest mAP of 80.98%. Moreover, when the features were increased from three scales to four, the mAP reached 81.44% after feature fusion and enhancement with ECA-SPP and ECA-PANet. At the same time, the optimal improvement method (Group 5) significantly improved the AP of each category compared to Group 1, while the mAP of the model increased by 2.95% and the detection speed (FPS) was reduced by only 2.67 pictures/s, meaning that the detection time for a single picture increased by only 1.72 milliseconds.

4.2.2. Detection Results under Different Sample Ratios

In order to analyze the influence of different sample ratios of the IBD dataset on detection precision, the ratios of the train_val set and the test set were set as 9:1, 8:2, 7:3, 6:4, and 5:5. The performance indexes, including the AP of each class and the mAP of the test results under different sample ratios, are shown in Table 3.
As can be seen from Table 3, as the ratio of the train_val set to the test set increases, the AP of each class and the mAP generally show an upward trend, with the 9:1 ratio having the greatest impact on the operatingbar and powerchecker categories. When the ratio is 9:1, all performance indexes reach their maximum values and the mAP reaches 81.44%. To improve the generalization ability and practical detection accuracy of the model, as many training samples as possible should be used; thus, the sample ratio in this paper was set to 9:1.

4.2.3. Detection Results Using Different Detection Methods

In order to verify the effectiveness of the improved YOLOv4 lightweight detection model proposed in this paper, four mainstream object detection models were constructed for comparison: Faster RCNN, SSD, YOLOv3, and YOLOv4. Then, the same training set consisting of the irregular behavior dataset of substation personnel was used to train these models, which were then used to detect images in the same test set. The detection results are shown in Table 4.
From the results in Table 4, it can be seen that the YOLO family of detection algorithms can accurately and quickly detect irregular behaviors of substation personnel by virtue of their structural advantages; their mAP values are over 76%, higher than those of Faster RCNN and SSD. Moreover, within the YOLO family, the mAP of YOLOv4 reaches 79.57%, which is 3.1 percentage points higher than that of YOLOv3, although the detection speed (FPS) is reduced on account of the more complex structure of YOLOv4. The improved YOLOv4 lightweight detection model proposed in this paper increases the mAP to 81.44% while raising the FPS to 38.06 pictures/s; that is, the time required to detect an image is about 26.27 ms, and the FPS is almost 1.72 times that of YOLOv4. From the perspective of the AP of each class, although the detection accuracy of YOLOv4 for the person, helmet, and glove categories is slightly higher than that of the improved YOLOv4, the improved YOLOv4 achieves more accurate detection of difficult target categories such as badge, operatingbar, and powerchecker. From the results in Table 5, it can be seen that the improved YOLOv4 is generally better than Faster RCNN, SSD, YOLOv3, and YOLOv4; its F1-score in the glove category is only 0.08 lower than that of YOLOv4, and its F1-score in the person category is only 0.02 lower than that of YOLOv3.

4.2.4. Detection Results under Gamma Correction

Because the images of irregular behaviors of substation personnel were collected in complex and diverse environments, the problems of uneven illumination and of image regions that are either very bright or very dark needed to be solved. Therefore, on the basis of the improved YOLOv4 lightweight model, the gamma correction preprocessing method was used to adjust image brightness and enhance image contrast, thereby improving the accuracy of the model. The detection results when using gamma correction are shown in Table 6 and Figure 10.
It can be seen from the above results that when the γ of gamma correction preprocessing is less than 1.0, the AP of each class and the overall mAP decrease greatly as γ decreases; that is, the detection effect becomes worse as the image becomes brighter. When γ is equal to 1.2, the overall mAP is the highest, at 83.51%, and the APs of all classes are also the highest. When γ is greater than 1.2, both the APs of all classes and the overall mAP decrease, with the APs of the glove, operatingbar, and powerchecker classes decreasing most drastically. This tendency indicates that there are many very bright images in the dataset, which are corrected to normal brightness when γ is equal to 1.2; the image contrast is then appropriate, making it easier for the model to detect the targets.

4.3. Robustness and Generalization Ability Verification

Due to drastic changes in time, environment, angle, and other factors during the shooting process, the images were prone to uneven illumination and blurring; thus, images captured under special conditions such as exposure, multiple targets, and motion blur were used to verify the robustness and generalization ability of the improved YOLOv4 lightweight detection model. The detection results were compared with those of the YOLOv4, YOLOv3, Faster RCNN, and SSD models, and are shown in Figure 11.
With respect to multi-target detection, both SSD and Faster RCNN missed a person, and SSD incorrectly detected the operating bar as the power checker, while YOLOv4 and YOLOv3 both missed the operating bar. With respect to detection under exposure, SSD and Faster RCNN missed too many targets, YOLOv3 could not detect the gloves, and YOLOv4 missed one glove. Finally, with respect to detection under motion blur, Faster RCNN could not detect the gloves or the badge, while YOLOv4, YOLOv3, and SSD all incorrectly detected the hand as a glove. In contrast, the improved YOLOv4 lightweight detection model accurately detected the targets in all of the above special cases, verifying that the proposed method has good detection accuracy in complex and diverse substation environments.
Meanwhile, when the gamma correction preprocessing method is used with the improved YOLOv4 lightweight detection model, overexposed images caused by uneven lighting or improper shooting can be adjusted and image contrast can be enhanced. As shown in Figure 12, when the images were preprocessed by gamma correction with γ = 1.2, the targets in the images were clearly highlighted, and the person in Figure 12a and the power checker in Figure 12c were accurately detected. These results effectively demonstrate that the proposed method is helpful for monitoring irregular behavior of substation personnel.

5. Conclusions

In order to realize the automatic monitoring of irregular behavior of substation personnel, this paper proposes an improved YOLOv4 lightweight detection model based on MobileNetV3 and the ECA attention mechanism. Case studies were carried out, and the detection results were compared and analyzed across different improvement methods, object detection models, and gamma correction preprocessing settings. The following conclusions can be drawn.
(1)
Using MobileNetV3 as the feature extraction network to replace CSPDarkNet53, increasing the number of feature extraction scales, and using depthwise separable convolution and the ECA attention mechanism to optimize the SPP and PANet networks effectively enhanced the accuracy of the model, which can provide a reference for other researchers seeking to improve such models.
(2)
The improved YOLOv4 lightweight model proposed in this paper had good performance in detecting irregular behavior of substation personnel, with high detection accuracy and fast detection speed. The mAP and FPS reached 83.51% and 38.06 pictures/s, respectively, and the overall performance was better than that of Faster RCNN, SSD, YOLOv3, and YOLOv4.
(3)
Gamma correction preprocessing can effectively enhance the image samples, improving the detection accuracy of the improved YOLOv4 lightweight detection model and increasing the mAP to 83.51%. The proposed method is conducive to accurate, automated, and fast detection of irregular behaviors of substation personnel and is suitable for the requirements of real-time detection in practical engineering contexts.
Certain problems remain to be studied and solved in the future. The indoor and outdoor environments of substations result in images with widely varying backgrounds, so effective and automatic image preprocessing methods need to be studied; as the accuracy and robustness of detection models strongly depend on the training image samples, more image samples should also be added to the dataset. In addition, the object detection model should be deployed on computing devices to develop a visual image analysis and real-time detection system; thus, related hardware issues and practical application effects will need to be studied and verified in the future.

Author Contributions

Conceptualization, X.L. and J.F.; methodology, J.F.; software, J.F.; validation, X.L. and J.F.; formal analysis, J.F.; investigation, J.F.; resources, J.F.; data curation, X.L.; writing—original draft preparation, J.F.; writing—review and editing, X.L.; supervision, X.L.; project administration, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data used in this research can be provided upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cai, W.; Le, J.; Jin, C.; Liu, K. Real-Time Image-Identification-Based Anti-Manmade Misoperation System for Substations. IEEE Trans. Power Deliv. 2012, 27, 1748–1754. [Google Scholar] [CrossRef]
  2. Pedro, H.; Juan, M.; Helon, D. An Outliers Processing Module Based on Artificial Intelligence for Substations Metering System. IEEE Trans. Power Syst. 2020, 35, 3400–3409. [Google Scholar]
  3. Zhong, J.; Li, W.; Roy, B.; Yu, J. Incorporating a Condition Monitoring Based Aging Failure Model of a Circuit Breaker in Substation Reliability Assessment. IEEE Trans. Power Syst. 2015, 30, 3407–3415. [Google Scholar] [CrossRef]
  4. Peng, G.; Du, B.; Cao, C.; He, D. Pointer-type instrument positioning method of intelligent inspection system for substation. J. Electron. Imaging 2022, 31, 013001. [Google Scholar] [CrossRef]
  5. Li, K.; Zhao, X.; Bian, J.; Tan, M. Automatic Safety Helmet Wearing Detection. In Proceedings of the 2017 IEEE 7th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), Honolulu, HI, USA, 31 July–4 August 2017; pp. 617–622. [Google Scholar]
  6. Li, J.; Liu, H.; Wang, T.; Jiang, M.; Wang, S.; Li, K.; Zhao, X. Safety helmet wearing detection based on image processing and machine learning. In Proceedings of the 2017 Ninth International Conference on Advanced Computational Intelligence (ICACI), Doha, Qatar, 4–6 February 2017; pp. 201–205. [Google Scholar]
  7. Liu, W.; He, Z. Smart video access control system with hybrid features in complicated environment. In Proceedings of the 2016 International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR), Jeju, Korea, 10–13 July 2016; pp. 201–207. [Google Scholar]
  8. Hu, J.; Lin, X.; Li, C.; Zhou, J.; Li, G.; Feng, Y. Protective Clothing Detection of Substation Workers Using S-HOG+C Operator. In Proceedings of the 2018 International Conference on Power System Technology (POWERCON), Guangzhou, China, 6–8 November 2018; pp. 4181–4185. [Google Scholar]
  9. Liu, Y.; Ji, X.; Pei, S.; Ma, R.; Zhang, G.; Lin, Y.; Chen, Y. Research on automatic location and recognition of insulators in substation based on YOLOv3. High Volt. 2020, 5, 62–68. [Google Scholar] [CrossRef]
  10. Xu, L.; Song, Y.; Zhang, W.; An, Y.; Wang, Y.; Ning, H. An efficient foreign objects detection network for power substation. Image Vis. Comput. 2021, 109, 104159. [Google Scholar] [CrossRef]
  11. Komuro, N.; Hashiguchi, T.; Hirai, K.; Ichikawa, M. Development of Wireless Sensor Nodes to Monitor Working Environment and Human Mental Conditions. In IT Convergence and Security; Springer: Berlin/Heidelberg, Germany, 2021; pp. 123–129. [Google Scholar]
  12. Chen, S.; Tang, W.; Ji, T.; Zhu, H.; Ouyang, Y.; Wang, W. Detection of Safety Helmet Wearing Based on Improved Faster R-CNN. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–7. [Google Scholar]
  13. Yao, N.; Shan, G.; Zhu, X. Substation Object Detection Based on Enhance RCNN Model. In Proceedings of the 2021 6th Asia Conference on Power and Electrical Engineering (ACPEE), Chongqing, China, 8–11 April 2021; pp. 463–469. [Google Scholar]
  14. Zhao, B.; Lan, H.; Niu, Z.; Zhu, H.; Qian, T.; Tang, W. Detection and Location of Safety Protective Wear in Power Substation Operation Using Wear-Enhanced YOLOv3 Algorithm. IEEE Access 2021, 9, 125540–125549. [Google Scholar] [CrossRef]
  15. Wang, S. Substation Personnel Safety Detection Network Based on YOLOv4. In Proceedings of the 2021 IEEE 2nd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE), Nanchang, China, 26–28 March 2021; pp. 877–881. [Google Scholar]
  16. LabelImg. LabelImg Is a Graphical Image Annotation Tool and Label Object Bounding Boxes in Images. Available online: https://github.com/tzutalin/labelImg (accessed on 3 December 2018).
  17. Mark, E.; Luc, V.; Christopher, K.I.W.; John, W.; Andrew, Z. The PASCAL Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar]
  18. Ajay, K.; Ghosh, D. Fuzzy rule-based image exposure level estimation and adaptive gamma correction for contrast enhancement in dark images. In Proceedings of the 2012 IEEE 11th International Conference on Signal Processing, Beijing, China, 21–25 October 2012; pp. 667–672. [Google Scholar]
  19. Gonzalez, R.; Woods, E. Digital Image Processing, 3rd ed.; Prentice-Hall: Englewood Cliffs, NJ, USA, 2009. [Google Scholar]
  20. Alexey, B.; Wang, C.; Liao, H. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  21. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  22. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
  23. Andrew, H.; Mark, S.; Chen, B.; Wang, W.; Chen, L.; Tan, M.; Chu, G.; Vijay, V.; Zhu, Y.; Pang, R.; et al. Searching for MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
  24. Andrew, H.; Zhu, M.; Chen, B.; Dmitry, K.; Wang, W.; Tobias, W.; Marco, A.; Hartwig, A. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  25. Mark, S.; Andrew, H.; Zhu, M.; Andrey, Z.; Chen, L. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
  26. Hu, J.; Shen, L.; Samuel, A.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 42, 2011–2023. [Google Scholar] [CrossRef] [Green Version]
  27. Prajit, R.; Barret, Z.; Quoc, V. Searching for Activation Functions. arXiv 2017, arXiv:1710.05941. [Google Scholar]
  28. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11531–11539. [Google Scholar]
Figure 1. Correction effect under different values of γ: (a) γ = 0.2; (b) γ = 0.4; (c) γ = 0.6; (d) γ = 0.8; (e) γ = 1.0; (f) γ = 1.2; (g) γ = 1.4; (h) γ = 1.6; (i) γ = 1.8; (j) γ = 2.0.
Figure 2. Overall detection process of the proposed method.
Figure 3. The structure of YOLOv4.
Figure 4. Structural diagram of the bneck module.
Figure 5. Structural diagram of the ECA module.
Figure 6. Structure of the improved YOLOv4 lightweight network.
Figure 7. The case implementation process.
Figure 8. Loss curves for model training and validation.
Figure 9. The AP of each class using different improvement methods.
Figure 10. The mAP and AP of each class under gamma correction.
Figure 11. Detection results under special conditions.
Figure 12. Detection results with gamma correction: (a,c) original images; (b,d) gamma corrected images.
Table 1. Parameter settings for model training.

Parameter Name                            | First Stage | Second Stage
Batch size                                | 8           | 4
Initial learning rate                     | 1 × 10−3    | 1 × 10−4
Number of epochs                          | 100         | 100
Warmup epochs                             | 10
Label smoothing value                     | 0.01
Minimum learning rate of cosine annealing | 1 × 10−6
NMS threshold                             | 0.3
Table 2. Detection results using different improvement methods.

Group | MobileNetV3 | Multi-Scales | SENet | CBAM | ECA | mAP (%) | FPS (Pictures/s)
1     | √           |              |       |      |     | 78.49   | 40.73
2     | √           |              | √     |      |     | 80.30   | 39.25
3     | √           |              |       | √    |     | 79.72   | 40.02
4     | √           |              |       |      | √   | 80.98   | 39.69
5     | √           | √            |       |      | √   | 81.44   | 38.06

(SENet, CBAM, and ECA are the attention mechanisms used to improve the SPP and PANet networks.)
Table 3. Detection results under different sample ratios (AP of each class and mAP, %).

Ratios | Badge | Glove | Helmet | Operatingbar | Person | Powerchecker | mAP
5:5    | 82.15 | 74.51 | 92.97  | 62.27        | 92.02  | 59.94        | 77.31
6:4    | 83.53 | 76.40 | 94.40  | 63.63        | 92.41  | 63.03        | 78.90
7:3    | 84.01 | 74.80 | 94.93  | 65.84        | 91.95  | 60.31        | 78.64
8:2    | 83.92 | 76.88 | 95.51  | 68.56        | 93.15  | 70.07        | 81.35
9:1    | 84.88 | 75.96 | 94.30  | 74.96        | 92.80  | 65.76        | 81.44
Table 4. Detection results using different detection methods (AP of each class and mAP, %; FPS in pictures/s).

Methods         | Badge | Glove | Helmet | Operatingbar | Person | Powerchecker | mAP   | FPS
Faster RCNN     | 34.76 | 46.40 | 80.97  | 61.63        | 85.01  | 36.15        | 57.49 | 12.98
SSD             | 47.76 | 60.10 | 84.72  | 60.19        | 90.96  | 54.84        | 66.43 | 28.30
YOLOv3          | 80.23 | 69.46 | 93.78  | 63.29        | 93.88  | 58.16        | 76.47 | 29.48
YOLOv4          | 80.96 | 81.86 | 94.58  | 66.47        | 93.84  | 59.73        | 79.57 | 22.08
Improved YOLOv4 | 84.88 | 75.96 | 94.30  | 74.96        | 92.80  | 65.76        | 81.44 | 38.06
Table 5. The F1-scores of each class using different detection methods.

Methods         | Badge | Glove | Helmet | Operatingbar | Person | Powerchecker
Faster RCNN     | 0.29  | 0.50  | 0.79   | 0.64         | 0.78   | 0.41
SSD             | 0.51  | 0.56  | 0.86   | 0.60         | 0.90   | 0.53
YOLOv3          | 0.82  | 0.69  | 0.91   | 0.59         | 0.91   | 0.55
YOLOv4          | 0.81  | 0.82  | 0.93   | 0.61         | 0.90   | 0.58
Improved YOLOv4 | 0.83  | 0.74  | 0.93   | 0.74         | 0.89   | 0.62
Table 6. Detection results using different values of γ (AP of each class and mAP, %).

γ   | Badge | Glove | Helmet | Operatingbar | Person | Powerchecker | mAP
0.2 | 71.71 | 48.99 | 90.21  | 64.52        | 85.44  | 57.61        | 69.75
0.4 | 83.34 | 66.61 | 93.22  | 72.97        | 91.41  | 65.28        | 78.80
0.6 | 82.56 | 72.91 | 94.05  | 74.33        | 92.35  | 67.07        | 80.55
0.8 | 83.04 | 73.59 | 94.26  | 75.39        | 92.58  | 67.16        | 81.00
1.0 | 84.88 | 75.96 | 94.30  | 74.96        | 92.80  | 65.76        | 81.44
1.2 | 85.01 | 80.95 | 95.33  | 76.08        | 94.01  | 69.65        | 83.51
1.4 | 84.55 | 75.78 | 94.28  | 74.25        | 92.68  | 65.40        | 81.16
1.6 | 83.92 | 74.22 | 94.80  | 74.42        | 92.29  | 65.29        | 80.82
1.8 | 82.51 | 71.68 | 94.43  | 74.14        | 91.12  | 65.12        | 79.83
2.0 | 81.42 | 67.55 | 93.80  | 73.37        | 90.79  | 64.29        | 78.85